float('inf')
float('-inf')
# ^+-*/ and ><= etc work as expected, can lead to nan (e.g. inf-inf)
float('nan')
x is y # identity check (not just equal, but the exact same reference)
# == is always a deep (value) comparison, e.g.
[1, 2, [3, 4]] == [1, 2, [3, 4]] # True
Counter(s) == Counter(t) # to check for anagrams

when you define a new class, python automatically creates (among others) these methods for you unless you decide to override them: __eq__, __hash__
- the default __eq__ and __hash__ compare objects BY ID (address in memory, id(obj))
- if you OVERRIDE __eq__ only, __hash__ is disabled by python for this class (set to None) - you need to override it too if you want the objects to stay hashable
- contract: if a == b then hash(a) == hash(b) must be true
- immutability: during the object's lifetime the value of __hash__ should always be the same
- __hash__ must return an int (not a float or other type)
- the default hash is derived from the object's address in memory cast to int. for a 64-bit system up to 2^64-1, but less in practice (the OS usually restricts addresses to 48 bits)
- can an instance be used as a dict key? yes (if it does NOT have __eq__ or __hash__ overridden, or if both of them are overridden by you [reasonable assumptions apply, e.g. hash is consistent through the lifetime of the object, etc])
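A minimal sketch of the __eq__/__hash__ interaction above (the class name ByValue is illustrative):

```python
# Overriding __eq__ without __hash__ makes instances unhashable:
# python sets __hash__ = None on the class.
class ByValue:
    def __init__(self, v):
        self.v = v

    def __eq__(self, other):
        return isinstance(other, ByValue) and self.v == other.v
    # no __hash__ defined -> disabled

a, b = ByValue(1), ByValue(1)
print(a == b)                 # True - value comparison now
print(ByValue.__hash__)       # None - hashing was disabled
try:
    hash(a)
except TypeError:
    print("unhashable")       # can't be used as a dict/set key
```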
- bisect_left - first idx so that a[:idx] < x and a[idx:] >= x
import bisect # part of standard library
idx = bisect.bisect_left(array, num)
# bisect.bisect_left(a, x, lo=0, hi=len(a), *, key=None)
a = [0, 1, 5, 5, 5, 5, 6]
print(bisect.bisect(a, 5)) # (prints 6) # first > value index
print(bisect.bisect_left(a, 5)) # (prints 2) # first >= value index

copy.deepcopy has an internal way to track references of objects (a memo dict) - so it will do a proper deepcopy for looped structures with cross-references as well
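A quick sketch of the deepcopy-with-cycles behaviour described above:

```python
import copy

# deepcopy memoizes already-copied objects, so a self-referencing
# structure is copied correctly instead of recursing forever.
a = [1, 2]
a.append(a)               # a now contains itself
b = copy.deepcopy(a)
print(b is a)             # False - a fresh object
print(b[2] is b)          # True - the cycle is preserved in the copy
```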
- tuple is immutable list, no other tricks. can be used as key to hashtables/etc
- list is ~similar to std::vector but over-allocates ~12% on append if it exceeds the currently allocated memory. append is amortised O(1). it doesn't shrink down, so popping the last element is just O(1). insert/remove in the middle is O(N).
- it stores references to the elements (so for x[i] you chase 2 references, which is slower than vector, which stores its elements inline. though python isn't about performance).
- b = a[:] will create a (shallow) copy, and b = a[:][:][::-1] creates 3 copies. [1] + [2] + [3] will create 5 lists in total
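A small sketch of what "shallow" means here - the outer list is copied, the inner objects are shared:

```python
# a[:] copies the list object but not its elements - nested lists
# are shared between the original and the copy.
a = [1, [2, 3]]
b = a[:]                  # shallow copy
b[1].append(4)            # mutates the SHARED inner list
print(a)                  # [1, [2, 3, 4]] - original sees the change
print(b is a)             # False - the outer lists are distinct
print(b[1] is a[1])       # True - the inner list is the same object
```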
list has
list.append(x) -> None
list.extend(iterable) -> None
list.insert(i, x) -> None
list.remove(x) -> None # removes first occurrence
list.pop([i]) -> Any # default is -1 (last item)
list.clear() -> None
list.index(x[, start[, end]]) -> int
list.count(x) -> int
list.sort(*, key=None, reverse=False) -> None
list.reverse() -> None
list.copy() -> list

deque
- doubly linked list of blocks of 64 (to fit a cache line well) BUT ONLY 62 data refs + 2 refs to the next/prev block
- if the block is freed it does return memory back (to python memory manager)
- that's also why ^ blocks hold many elements: if every add/pop created/freed a new block, performance would be worse (still O(1), just inefficient)
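A minimal usage sketch - the point of deque is O(1) at BOTH ends (list.pop(0) is O(n)):

```python
from collections import deque

d = deque([1, 2, 3])
d.appendleft(0)           # O(1) at the left end
d.append(4)               # O(1) at the right end
print(d.popleft())        # 0
print(d.pop())            # 4
print(list(d))            # [1, 2, 3]
```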
- set is storing only keys (doesn't store values)
- there's frozenset (immutable, so it can be used as a key when order doesn't matter for the inner objects. rarely seen in the wild; suggested use cases: storing edges of a graph, caching results of functions that take a set as an argument, nested sets [a set of sets is only possible with frozenset])
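A sketch of the two frozenset use cases mentioned above (graph edges and nesting):

```python
# frozenset is hashable, so it can key a dict or nest inside a set.
edge_weight = {frozenset({'a', 'b'}): 5}       # undirected edge - order irrelevant
print(edge_weight[frozenset({'b', 'a'})])      # 5 - same key either way

nested = {frozenset({1, 2}), frozenset({3})}   # a set of sets needs frozenset
print(len(nested))                             # 2
```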
- dict is a hashtable with open addressing collision resolution (pseudo-random probing based on a perturbation of the hash) - amortised O(1) for lookup/insert/remove. on remove, the slot is marked as a 'tombstone' by a special marker
- to be a proper key, must (1) have __hash__ and __eq__ defined (2) be effectively immutable (must not change its hash during the lifetime of the object)
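A sketch of a class that satisfies both requirements (Point is an illustrative name):

```python
# A proper dict key: __eq__ and __hash__ both defined, and the fields
# feeding the hash are never mutated (effectively immutable).
class Point:
    def __init__(self, x, y):
        self.x, self.y = x, y

    def __eq__(self, other):
        return isinstance(other, Point) and (self.x, self.y) == (other.x, other.y)

    def __hash__(self):
        return hash((self.x, self.y))  # consistent with __eq__

d = {Point(1, 2): "found"}
print(d[Point(1, 2)])     # "found" - equal points hash alike
```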
dict has
dict.clear() -> None
dict.copy() -> dict
dict.fromkeys(iterable[, value]) -> dict # class method
dict.get(key[, default]) -> Any
dict.items() -> dict_items
dict.keys() -> dict_keys
dict.pop(key[, default]) -> Any
dict.popitem() -> tuple # (key, value), LIFO since 3.7
dict.setdefault(key[, default]) -> Any
dict.update([other], **kwargs) -> None
dict.values() -> dict_values

set has
# note: a *_update variant is usually available if you want to do the operation in place (without it, a new set is created)
set.add(x) -> None
set.clear() -> None
set.copy() -> set
set.difference(*others) -> set
set.difference_update(*others) -> None
set.discard(x) -> None # no error if x not present
set.intersection(*others) -> set
set.intersection_update(*others) -> None
set.isdisjoint(other) -> bool
set.issubset(other) -> bool
set.issuperset(other) -> bool
set.pop() -> Any # arbitrary element
set.remove(x) -> None # KeyError if not present
set.symmetric_difference(other) -> set
set.symmetric_difference_update(other) -> None
set.union(*others) -> set
set.update(*others) -> None

collections.Counter
- .most_common(k) internally uses either sorted(...) (when k is omitted or covers all elements) or an nlargest heap (complexity O(n log k) !!!)
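A quick sketch of the two Counter tricks above (anagram check and top-k):

```python
from collections import Counter

# anagram check: same multiset of characters
print(Counter("listen") == Counter("silent"))   # True

# top-k frequencies (k < n goes through an internal nlargest heap)
c = Counter("abracadabra")
print(c.most_common(2))    # [('a', 5), ('b', 2)]
```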
heapq
- min-heap, works as a flat array, children of index i are at 2i+1 and 2i+2. on push - appends to the end and sifts up, on pop - moves the last item to the root and sifts it down
- used in PriorityQueue
- for leetcode used in e.g. merge N sorted lists
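A sketch of that merge-N-sorted-lists pattern - a heap of (value, list_index, element_index) tuples keeps the overall smallest at the top:

```python
import heapq

lists = [[1, 4, 7], [2, 5], [3, 6, 8]]

# seed the heap with the head of each non-empty list
heap = [(lst[0], i, 0) for i, lst in enumerate(lists) if lst]
heapq.heapify(heap)

merged = []
while heap:
    val, i, j = heapq.heappop(heap)
    merged.append(val)
    if j + 1 < len(lists[i]):               # advance within the same list
        heapq.heappush(heap, (lists[i][j + 1], i, j + 1))

print(merged)   # [1, 2, 3, 4, 5, 6, 7, 8]
```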
heapq.heapify(x) -> None # transforms list x into a valid min-heap, O(n)
heapq.heappush(heap, item) -> None # pushes item onto heap, maintaining heap property, O(log n)
heapq.heappop(heap) -> item # pops and returns smallest item, O(log n)
heapq.heappushpop(heap, item) -> item # push then pop (more efficient than separate), O(log n)
heapq.heapreplace(heap, item) -> item # pop then push (more efficient for replacement), O(log n)
heapq.nlargest(n, iterable[, key]) -> list # returns n largest elements, O(n log k)
heapq.nsmallest(n, iterable[, key]) -> list # returns n smallest elements, O(n log k)
heapq.merge(*iterables, key=None, reverse=False) -> iterator
# lazily merges multiple sorted inputs, O(n log k) where k = number of iterables

all queues are thread safe and blocking (they use a lock inside their implementation). in contrast - base structures (list, deque, etc) are not thread safe
- queue.Queue (based on deque, amortized O(1) push/pop)
- queue.PriorityQueue (based on heap - log for push/pop)
- queue.LifoQueue - stack. based on list, same as append/pop the last item of a list
- queue.SimpleQueue (like Queue but with simpler locking internals and no task tracking. Queue supports maxsize (and blocks the producer when at max capacity) while SimpleQueue is always unbounded. use case: simple unbounded consumer/producer)
- multiprocessing.Queue - queue but for multiple processes (all prev ones were thread-safe, not process-safe)
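A minimal producer/consumer sketch with queue.Queue - the queue does the locking, so no manual synchronization is needed (the None sentinel is just a common convention, not part of the API):

```python
import queue
import threading

q = queue.Queue(maxsize=8)   # bounded - put() blocks when full
results = []

def consumer():
    while True:
        item = q.get()       # blocks until an item is available
        if item is None:     # sentinel value: stop consuming
            break
        results.append(item * 2)

t = threading.Thread(target=consumer)
t.start()
for i in range(5):
    q.put(i)
q.put(None)                  # signal the consumer to exit
t.join()
print(results)               # [0, 2, 4, 6, 8]
```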
- .sort() or sorted() uses Timsort (insertion sort for small runs + mergesort to combine them - works well on real data, esp. partially sorted arrays - O(n) in that case)
- it has O(NlogN) time complexity (can be around O(N) best case for mostly sorted subarrays)
- it has 'optimized' space complexity but not O(1) - worst case is O(N). if you ever need true O(1) extra space in python, you need to implement something custom, e.g. heapsort
- it is STABLE (i.e. it PRESERVES ORDER of 'equal' elements)
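A sketch of what stability buys you - equal keys keep their original relative order, which is what makes multi-key sorting by chaining work:

```python
# sorting by age only: bob and cid tie at 25, so their original
# relative order (bob before cid) is PRESERVED.
people = [("bob", 25), ("amy", 30), ("cid", 25)]
by_age = sorted(people, key=lambda p: p[1])
print(by_age)   # [('bob', 25), ('cid', 25), ('amy', 30)]
```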
- python uses function-level scoping (not block-level scoping like c++). so e.g. this will work
for i in range(5):
pass
print(i) # prints 4

- BUT list comprehensions have THEIR OWN SCOPE, e.g.
i = 10
[i for i in range(5)]
print(i) # prints 10

Just NEVER do any of that:
x = {1: 'a', 2: 'b'}
for k in x.keys(): # list(x.keys()) would work; iterating .values(), just x, etc fails the same way
    x[3] = 'c' # 💥 RuntimeError: dictionary changed size during iteration

a = [1, 2, 3, 4]
for x in a:
    if x == 2:
        a.remove(x) # 😬 - random bad things can happen
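The safe patterns for both cases: iterate over a snapshot, or build a new collection instead of mutating in place:

```python
# list: build a new list instead of removing while iterating
a = [1, 2, 3, 4]
a = [x for x in a if x != 2]
print(a)                      # [1, 3, 4]

# dict: iterate over a snapshot of the keys, then it's safe to mutate
d = {1: 'a', 2: 'b'}
for k in list(d):
    if k == 1:
        del d[k]
print(d)                      # {2: 'b'}
```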
- all objects live on the heap (even for x = 42 the int is allocated on the heap; only x, the variable, is on the stack) - the heap is not limited technically (limited by the 32/64-bit address space depending on the system; for 64-bit practically limited by the RAM available to the machine - easily hundreds of gigabytes)
- memory on the stack IS LIMITED (implementation-dependent, but for CPython ~8MB on Linux/macOS and ~1MB on Windows)
- there's a limit on max recursion depth (default ~1k; it can be raised, but be careful as you might run out of stack memory. when the limit is reached, a RecursionError is thrown). so in practice you should use a stack/queue for dfs/bfs
- for recursion - python doesn't have tail-call optimization (where return f(x) doesn't actually recurse in the implementation)
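A sketch of the stack-based DFS the note above recommends (the graph here is a toy adjacency list, not from the source):

```python
# Iterative DFS with an explicit stack - no recursion limit to hit.
graph = {1: [2, 3], 2: [4], 3: [4], 4: []}

def dfs(start):
    seen, order, stack = set(), [], [start]
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        order.append(node)
        # reversed so neighbors are visited in their listed order
        stack.extend(reversed(graph[node]))
    return order

print(dfs(1))   # [1, 2, 4, 3]
```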
@lru_cache(maxsize=None) # None <-> not limited (if omitted, the default is 128); otherwise only the last maxsize values are kept. internally uses a mix of dict and doubly linked list
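A minimal memoization sketch with lru_cache (arguments must be hashable, per the key rules earlier):

```python
from functools import lru_cache

@lru_cache(maxsize=None)      # None = unbounded cache
def fib(n):
    return n if n < 2 else fib(n - 1) + fib(n - 2)

print(fib(30))                # 832040 - linear work instead of exponential
print(fib.cache_info())       # hits/misses/currsize statistics
```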
- python uses pymalloc for small objects (up to 512 bytes, served from pooled arenas) and plain malloc for larger blocks. when gc has run, memory is returned to the python memory manager and usually still kept: it MIGHT be returned to the system, but it is NOT GUARANTEED
- if you have to force it - use separate subprocess for memory-hungry operations. when that process terminates it will free memory to the system
- python counts references for each object. as soon as the counter reaches 0 it immediately frees the memory (returns it to the python memory allocator, not necessarily to the OS)
- for cyclic references - python periodically runs check for unreachable loops and removes them as well