float('inf')
float('-inf')
# ^+-*/ and ><= etc work as expected, can lead to nan (e.g. inf-inf)
float('nan')
x is y # identity check (not just equal, but the exact same reference)
# == is always a deep (value) comparison, e.g.
[1, 2, [3, 4]] == [1, 2, [3, 4]] # True
Counter(s) == Counter(t) # to check for anagrams

when you define a new class, python automatically creates (among others) these methods for you unless you decide to override them: __eq__, __hash__
- the default __eq__ and __hash__ compare objects BY ID (address in memory, id(obj))
- if you OVERRIDE __eq__ only, __hash__ is disabled by python for this class (set to None) - you need to override it too if you want the objects to stay hashable
- contract: if a == b then hash(a) == hash(b) must be true
- immutability: during the object's lifetime the value of __hash__ should always be the same
- __hash__ must return an int (not a float or other type)
- the default hash is derived from the object's address in memory cast to int. for a 64-bit system up to 2^64-1, but less in practice (the OS usually restricts addresses to 48 bits)
- can an instance be used as a dict key? yes (if it does NOT have __eq__ or __hash__ overridden, or if both of them are overridden by you [reasonable assumptions apply, e.g. hash is consistent through the lifetime of the object, etc])
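A minimal sketch of the __eq__/__hash__ interaction above (the class name ByValue is illustrative):

```python
# Overriding __eq__ without __hash__ makes instances unhashable:
# python sets __hash__ = None on the class.
class ByValue:
    def __init__(self, v):
        self.v = v

    def __eq__(self, other):
        return isinstance(other, ByValue) and self.v == other.v
    # no __hash__ defined -> disabled

a, b = ByValue(1), ByValue(1)
print(a == b)                 # True - value comparison now
print(ByValue.__hash__)       # None - hashing was disabled
try:
    hash(a)
except TypeError:
    print("unhashable")       # can't be used as a dict/set key
```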
- bisect_left - first idx so that a[:idx] < x and a[idx:] >= x
import bisect # part of standard library
idx = bisect.bisect_left(array, num)
# bisect.bisect_left(a, x, lo=0, hi=len(a), *, key=None)
a = [0, 1, 5, 5, 5, 5, 6]
print(bisect.bisect(a, 5)) # (prints 6) # first > value index
print(bisect.bisect_left(a, 5)) # (prints 2) # first >= value index

copy.deepcopy has an internal way to track references of objects (a memo dict) - so it will do a proper deepcopy for looped structures with cross-references as well
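A quick sketch of the deepcopy-with-cycles behaviour described above:

```python
import copy

# deepcopy memoizes already-copied objects, so a self-referencing
# structure is copied correctly instead of recursing forever.
a = [1, 2]
a.append(a)               # a now contains itself
b = copy.deepcopy(a)
print(b is a)             # False - a fresh object
print(b[2] is b)          # True - the cycle is preserved in the copy
```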
- tuple is immutable list, no other tricks. can be used as key to hashtables/etc
- list is ~similar to std::vector but over-allocates ~12% on append if it exceeds the currently allocated memory. append is amortised O(1). it doesn't shrink down, so popping the last element is just O(1). insert/remove in the middle is O(N).
- it stores references to the elements (so for x[i] you chase 2 references, which is slower than vector, which stores its elements inline. though python isn't about performance).
- b = a[:] will create a (shallow) copy, and b = a[:][:][::-1] creates 3 copies. [1] + [2] + [3] will create 5 lists in total
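A small sketch of what "shallow" means here - the outer list is copied, the inner objects are shared:

```python
# a[:] copies the list object but not its elements - nested lists
# are shared between the original and the copy.
a = [1, [2, 3]]
b = a[:]                  # shallow copy
b[1].append(4)            # mutates the SHARED inner list
print(a)                  # [1, [2, 3, 4]] - original sees the change
print(b is a)             # False - the outer lists are distinct
print(b[1] is a[1])       # True - the inner list is the same object
```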
list has
list.append(x) -> None
list.extend(iterable) -> None
list.insert(i, x) -> None
list.remove(x) -> None # removes first occurrence
list.pop([i]) -> Any # default is -1 (last item)
list.clear() -> None
list.index(x[, start[, end]]) -> int
list.count(x) -> int
list.sort(*, key=None, reverse=False) -> None
list.reverse() -> None
list.copy() -> list

deque
- doubly linked list of blocks of 64 (to fit a cache line well) BUT ONLY 62 data refs + 2 refs to the next/prev block
- if the block is freed it does return memory back (to python memory manager)
- that's also why ^ blocks hold many elements: if every add/pop created/freed a new block, performance would be worse (still O(1), just inefficient)
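A minimal usage sketch - the point of deque is O(1) at BOTH ends (list.pop(0) is O(n)):

```python
from collections import deque

d = deque([1, 2, 3])
d.appendleft(0)           # O(1) at the left end
d.append(4)               # O(1) at the right end
print(d.popleft())        # 0
print(d.pop())            # 4
print(list(d))            # [1, 2, 3]
```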
- set is storing only keys (doesn't store values)
- there's frozenset (immutable, so it can be used as a key when order doesn't matter for the inner objects. rarely seen in the wild; suggested use cases: storing edges of a graph, caching results of functions that take a set as an argument, nested sets [a set of sets is only possible with frozenset])
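A sketch of the two frozenset use cases mentioned above (graph edges and nesting):

```python
# frozenset is hashable, so it can key a dict or nest inside a set.
edge_weight = {frozenset({'a', 'b'}): 5}       # undirected edge - order irrelevant
print(edge_weight[frozenset({'b', 'a'})])      # 5 - same key either way

nested = {frozenset({1, 2}), frozenset({3})}   # a set of sets needs frozenset
print(len(nested))                             # 2
```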
- dict is a hashtable with open addressing collision resolution (pseudo-random probing based on a perturbation of the hash) - amortised O(1) for lookup/insert/remove. on remove, the slot is marked as a 'tombstone' by a special marker
- to be a proper key, must (1) have __hash__ and __eq__ defined (2) be effectively immutable (must not change its hash during the lifetime of the object)
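A sketch of a class that satisfies both requirements (Point is an illustrative name):

```python
# A proper dict key: __eq__ and __hash__ both defined, and the fields
# feeding the hash are never mutated (effectively immutable).
class Point:
    def __init__(self, x, y):
        self.x, self.y = x, y

    def __eq__(self, other):
        return isinstance(other, Point) and (self.x, self.y) == (other.x, other.y)

    def __hash__(self):
        return hash((self.x, self.y))  # consistent with __eq__

d = {Point(1, 2): "found"}
print(d[Point(1, 2)])     # "found" - equal points hash alike
```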
dict has
dict.clear() -> None
dict.copy() -> dict
dict.fromkeys(iterable[, value]) -> dict # class method
dict.get(key[, default]) -> Any
dict.items() -> dict_items
dict.keys() -> dict_keys
dict.pop(key[, default]) -> Any
dict.popitem() -> tuple # (key, value), LIFO since 3.7
dict.setdefault(key[, default]) -> Any
dict.update([other], **kwargs) -> None
dict.values() -> dict_values

set has
# note: a *_update variant is usually available if you want to do the operation in place (without it, a new set is created)
set.add(x) -> None
set.clear() -> None
set.copy() -> set
set.difference(*others) -> set
set.difference_update(*others) -> None
set.discard(x) -> None # no error if x not present
set.intersection(*others) -> set
set.intersection_update(*others) -> None
set.isdisjoint(other) -> bool
set.issubset(other) -> bool
set.issuperset(other) -> bool
set.pop() -> Any # arbitrary element
set.remove(x) -> None # KeyError if not present
set.symmetric_difference(other) -> set
set.symmetric_difference_update(other) -> None
set.union(*others) -> set
set.update(*others) -> None

collections.Counter
- .most_common(k) internally uses either sorted(...) (when k is omitted or covers all elements) or an nlargest heap (complexity O(n log k) !!!)
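A quick sketch of the two Counter tricks above (anagram check and top-k):

```python
from collections import Counter

# anagram check: same multiset of characters
print(Counter("listen") == Counter("silent"))   # True

# top-k frequencies (k < n goes through an internal nlargest heap)
c = Counter("abracadabra")
print(c.most_common(2))    # [('a', 5), ('b', 2)]
```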
heapq
- min-heap, works as a flat array, children of index i are at 2i+1 and 2i+2. on push - appends to the end and sifts up, on pop - moves the last item to the root and sifts it down
- used in PriorityQueue
- for leetcode used in e.g. merge N sorted lists
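A sketch of that merge-N-sorted-lists pattern - a heap of (value, list_index, element_index) tuples keeps the overall smallest at the top:

```python
import heapq

lists = [[1, 4, 7], [2, 5], [3, 6, 8]]

# seed the heap with the head of each non-empty list
heap = [(lst[0], i, 0) for i, lst in enumerate(lists) if lst]
heapq.heapify(heap)

merged = []
while heap:
    val, i, j = heapq.heappop(heap)
    merged.append(val)
    if j + 1 < len(lists[i]):               # advance within the same list
        heapq.heappush(heap, (lists[i][j + 1], i, j + 1))

print(merged)   # [1, 2, 3, 4, 5, 6, 7, 8]
```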
heapq.heapify(x) -> None # transforms list x into a valid min-heap, O(n)
heapq.heappush(heap, item) -> None # pushes item onto heap, maintaining heap property, O(log n)
heapq.heappop(heap) -> item # pops and returns smallest item, O(log n)
heapq.heappushpop(heap, item) -> item # push then pop (more efficient than separate), O(log n)
heapq.heapreplace(heap, item) -> item # pop then push (more efficient for replacement), O(log n)
heapq.nlargest(n, iterable[, key]) -> list # returns n largest elements, O(n log k)
heapq.nsmallest(n, iterable[, key]) -> list # returns n smallest elements, O(n log k)
heapq.merge(*iterables, key=None, reverse=False) -> iterator
# lazily merges multiple sorted inputs, O(n log k) where k = number of iterables

all queues are thread safe and blocking (they use a lock inside their implementation). in contrast - base structures (list, deque, etc) are not thread safe
- queue.Queue (based on deque, amortized O(1) push/pop)
- queue.PriorityQueue (based on heap - log for push/pop)
- queue.LifoQueue - stack. based on list, same as append/pop the last item of a list
- queue.SimpleQueue (like Queue but with simpler locking internals and no task tracking. Queue supports maxsize (and blocks the producer when at max capacity) while SimpleQueue is always unbounded. use case: simple unbounded consumer/producer)
- multiprocessing.Queue - queue but for multiple processes (all prev ones were thread-safe, not process-safe)
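A minimal producer/consumer sketch with queue.Queue - the queue does the locking, so no manual synchronization is needed (the None sentinel is just a common convention, not part of the API):

```python
import queue
import threading

q = queue.Queue(maxsize=8)   # bounded - put() blocks when full
results = []

def consumer():
    while True:
        item = q.get()       # blocks until an item is available
        if item is None:     # sentinel value: stop consuming
            break
        results.append(item * 2)

t = threading.Thread(target=consumer)
t.start()
for i in range(5):
    q.put(i)
q.put(None)                  # signal the consumer to exit
t.join()
print(results)               # [0, 2, 4, 6, 8]
```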
- .sort() or sorted() uses Timsort (insertion sort for small runs + mergesort to combine them - works well on real data, esp. partially sorted arrays - O(n) in that case)
- it has O(NlogN) time complexity (can be around O(N) best case for mostly sorted subarrays)
- it has 'optimized' space complexity but not O(1) - worst case is O(N). if you ever need true O(1) extra space in python, you need to implement something custom, e.g. heapsort
- it is STABLE (i.e. it PRESERVES ORDER of 'equal' elements)
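A sketch of what stability buys you - equal keys keep their original relative order, which is what makes multi-key sorting by chaining work:

```python
# sorting by age only: bob and cid tie at 25, so their original
# relative order (bob before cid) is PRESERVED.
people = [("bob", 25), ("amy", 30), ("cid", 25)]
by_age = sorted(people, key=lambda p: p[1])
print(by_age)   # [('bob', 25), ('cid', 25), ('amy', 30)]
```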
- python uses function-level scoping (not block-level scoping like c++). so e.g. this will work
for i in range(5):
pass
print(i) # prints 4

- BUT list comprehensions have THEIR OWN SCOPE, e.g.
i = 10
[i for i in range(5)]
print(i) # prints 10

Just NEVER do any of that:
x = {1: 'a', 2: 'b'}
for k in x.keys(): # list(x.keys()) would work; iterating .values(), just x, etc fails the same way
    x[3] = 'c' # 💥 RuntimeError: dictionary changed size during iteration

a = [1, 2, 3, 4]
for x in a:
    if x == 2:
        a.remove(x) # 😬 - random bad things can happen
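The safe patterns for both cases: iterate over a snapshot, or build a new collection instead of mutating in place:

```python
# list: build a new list instead of removing while iterating
a = [1, 2, 3, 4]
a = [x for x in a if x != 2]
print(a)                      # [1, 3, 4]

# dict: iterate over a snapshot of the keys, then it's safe to mutate
d = {1: 'a', 2: 'b'}
for k in list(d):
    if k == 1:
        del d[k]
print(d)                      # {2: 'b'}
```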
- all objects live on the heap (even for x = 42 the int is allocated on the heap; only x, the variable, is on the stack) - the heap is not limited technically (limited by the 32/64-bit address space depending on the system; for 64-bit practically limited by the RAM available to the machine - easily hundreds of gigabytes)
- memory on the stack IS LIMITED (implementation-dependent, but for CPython ~8MB on Linux/macOS and ~1MB on Windows)
- there's a limit on max recursion depth (default ~1k; it can be raised, but be careful as you might run out of stack memory. when the limit is reached, a RecursionError is thrown). so in practice you should use a stack/queue for dfs/bfs
- for recursion - python doesn't have tail-call optimization (where return f(x) doesn't actually recurse in the implementation)
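A sketch of the stack-based DFS the note above recommends (the graph here is a toy adjacency list, not from the source):

```python
# Iterative DFS with an explicit stack - no recursion limit to hit.
graph = {1: [2, 3], 2: [4], 3: [4], 4: []}

def dfs(start):
    seen, order, stack = set(), [], [start]
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        order.append(node)
        # reversed so neighbors are visited in their listed order
        stack.extend(reversed(graph[node]))
    return order

print(dfs(1))   # [1, 2, 4, 3]
```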
@lru_cache(maxsize=None) # None <-> not limited (if omitted, the default is 128); otherwise only the last maxsize values are kept. internally uses a mix of dict and doubly linked list
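A minimal memoization sketch with lru_cache (arguments must be hashable, per the key rules earlier):

```python
from functools import lru_cache

@lru_cache(maxsize=None)      # None = unbounded cache
def fib(n):
    return n if n < 2 else fib(n - 1) + fib(n - 2)

print(fib(30))                # 832040 - linear work instead of exponential
print(fib.cache_info())       # hits/misses/currsize statistics
```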
- python uses pymalloc for small objects (up to 512 bytes, served from pooled arenas) and plain malloc for larger blocks. when gc has run, memory is returned to the python memory manager and usually still kept: it MIGHT be returned to the system, but it is NOT GUARANTEED
- if you have to force it - use separate subprocess for memory-hungry operations. when that process terminates it will free memory to the system
- python counts references for each object. as soon as the counter reaches 0 it immediately frees the memory (returns it to the python memory allocator, not necessarily to the OS)
- for cyclic references - python periodically runs check for unreachable loops and removes them as well