
@asmekal
Created December 6, 2025 16:22

useful for interview prep

simple python

numbers

float('inf')
float('-inf')
# ^ +, -, *, / and <, >, == etc. work as expected; some operations produce nan (e.g. inf - inf)
float('nan')
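A quick sketch of the nan quirks worth remembering: nan compares unequal to everything, itself included, so `math.isnan` is the reliable test.

```python
import math

inf = float('inf')
nan = float('nan')

print(inf - inf)        # nan: arithmetic on infinities can produce nan
print(nan == nan)       # False: nan is not equal to anything, itself included
print(math.isnan(nan))  # True: the reliable way to test for nan
```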

how to check that variables a and b reference the same object

(not just equal values, but the exact same object)

x is y
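A minimal illustration of identity vs equality (the variable names are just for the demo):

```python
a = [1, 2, 3]
b = a          # b references the same list object
c = list(a)    # c is a new list with equal contents

print(a is b)  # True: same object
print(a is c)  # False: different objects
print(a == c)  # True: equal contents
```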

how == works for containers (dict, list, etc)

always compared element-wise by value (recursively), e.g.

from collections import Counter
[1, 2, [3, 4]] == [1, 2, [3, 4]]  # True
Counter(s) == Counter(t)  # anagram check: same characters with same counts

==, id, hash for custom objects

when you define a new class, Python automatically creates (among others) these methods for you unless you override them

  • __eq__, __hash__
  • if you override __eq__ without also defining __hash__, Python sets __hash__ to None for that class, so instances become unhashable (define __hash__ yourself if you still need it)
  • the default __eq__ and __hash__ compare objects BY IDENTITY (memory address), i.e. by id(obj)

__hash__ invariants to comply with

  1. if a == b, then hash(a) == hash(b) must hold
  2. immutability - hash(obj) must stay the same for the object's whole lifetime
  3. must return an int (not a float or any other type)
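A sketch of a class that satisfies these invariants by overriding __eq__ and __hash__ together (the Point class here is purely illustrative):

```python
class Point:
    """Value object: equal coordinates mean equal points."""
    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __eq__(self, other):
        if not isinstance(other, Point):
            return NotImplemented
        return (self.x, self.y) == (other.x, other.y)

    def __hash__(self):
        # Invariant 1: a == b implies hash(a) == hash(b);
        # hashing the same tuple of fields used by __eq__ guarantees this.
        return hash((self.x, self.y))

d = {Point(1, 2): "origin-ish"}
print(d[Point(1, 2)])  # lookup by an equal (not identical) key works
```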

about id(x)

  • in CPython it returns the object's memory address cast to int. on a 64-bit system that can be up to 2^64-1, but is less in practice (the OS typically uses only 48 bits of virtual address space)

can you use custom object as key of a dict?

yes, if it does NOT override __eq__ or __hash__ at all, or if you override both of them consistently (reasonable assumptions apply, e.g. the hash stays constant through the object's lifetime, etc.)

algorithms

binary search in sorted array

  • bisect_left - first idx such that everything in a[:idx] < x and everything in a[idx:] >= x
import bisect # part of standard library
idx = bisect.bisect_left(array, num)
# bisect.bisect_left(a, x, lo=0, hi=len(a), *, key=None)

a = [0, 1, 5, 5, 5, 5, 6]
print(bisect.bisect(a, 5)) # (prints 6) # first > value index
print(bisect.bisect_left(a, 5)) # (prints 2) # first >= value index
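bisect also covers insertion into a sorted list; a small sketch using bisect.insort, plus the left/right pair to slice out all equal elements:

```python
import bisect

a = [0, 1, 5, 5, 6]
bisect.insort(a, 3)   # insert keeping the list sorted: O(log n) search + O(n) shift
print(a)              # [0, 1, 3, 5, 5, 6]

# bisect_left/bisect_right give the half-open slice of equal elements:
lo = bisect.bisect_left(a, 5)
hi = bisect.bisect_right(a, 5)
print(a[lo:hi])       # [5, 5]
```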

copy.deepcopy vs loops

copy.deepcopy keeps an internal memo dict of already-copied objects, so it correctly copies cyclic structures and shared cross-references as well
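A quick demonstration with a self-referential list:

```python
import copy

a = [1, 2]
a.append(a)            # self-referential list

b = copy.deepcopy(a)   # the memo dict detects the cycle instead of recursing forever
print(b is a)          # False: a real copy
print(b[2] is b)       # True: the cycle is reproduced inside the copy
```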

data structures

basic

list/tuple

  • tuple is an immutable list, no other tricks. can be used as a key in hash tables/etc
  • list is ~similar to std::vector but over-allocates ~12.5% extra on append when it exceeds the current capacity, so append is amortised O(1). it doesn't shrink automatically; pop from the end is O(1), but remove/insert in the middle is O(N)
    • it stores references to its elements (so x[i] follows two pointers, which is slower than a vector that stores its elements inline - though python isn't about performance)
    • b = a[:] creates a shallow copy, and b = a[:][:][::-1] creates 3 copies
    • [1] + [2] + [3] creates 5 lists in total (three literals plus two concatenation results)

list has

list.append(x)              -> None
list.extend(iterable)       -> None
list.insert(i, x)           -> None
list.remove(x)              -> None       # removes first occurrence
list.pop([i])               -> Any        # default is -1 (last item)
list.clear()                -> None
list.index(x[, start[, end]]) -> int
list.count(x)               -> int
list.sort(*, key=None, reverse=False) -> None
list.reverse()              -> None
list.copy()                 -> list

deque

a doubly linked list of fixed-size blocks: each block holds 64 element slots plus two pointers linking it to the prev/next block (sized to play well with cache lines)

  • when a block is freed, its memory does go back to the python memory manager
  • because of ^, an add/pop pattern that keeps creating/freeing a block right at the boundary performs worse (still O(1), just inefficient)
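Typical deque usage, for contrast with list (O(1) at both ends):

```python
from collections import deque

d = deque([1, 2, 3])
d.appendleft(0)     # O(1) at both ends, unlike list.insert(0, x) which is O(n)
d.append(4)
print(d.popleft())  # 0
print(d.pop())      # 4
print(list(d))      # [1, 2, 3]
```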

dict/set

  • set stores only keys (no values)
    • there's frozenset (hashable, so it can be used as a key when element order doesn't matter. rarely seen in practice; typical uses: undirected graph edges, caching results of functions that take a set argument, nested sets [a set of sets is only possible with frozenset])
  • dict is a hash table with open-addressing collision resolution (~quadratic-style probing) - amortised O(1) lookup/insert/remove. on removal the slot is marked as a 'tombstone' by a special marker
  • to be a proper key
    • must (1) have __hash__ and __eq__ defined (2) be effectively immutable (the hash must not change during the object's lifetime)

dict has

dict.clear()                        -> None
dict.copy()                         -> dict
dict.fromkeys(iterable[, value])   -> dict   # class method
dict.get(key[, default])           -> Any
dict.items()                       -> dict_items
dict.keys()                        -> dict_keys
dict.pop(key[, default])           -> Any
dict.popitem()                     -> tuple   # (key, value), LIFO since 3.7
dict.setdefault(key[, default])    -> Any
dict.update([other], **kwargs)     -> None
dict.values()                      -> dict_values

set has

# note: most operations have an *_update variant that works in place; without it, a new set is created
set.add(x)                         -> None
set.clear()                        -> None
set.copy()                         -> set
set.difference(*others)           -> set
set.difference_update(*others)    -> None
set.discard(x)                     -> None    # no error if x not present
set.intersection(*others)         -> set
set.intersection_update(*others)  -> None
set.isdisjoint(other)             -> bool
set.issubset(other)               -> bool
set.issuperset(other)             -> bool
set.pop()                          -> Any     # arbitrary element
set.remove(x)                      -> None    # KeyError if not present
set.symmetric_difference(other)   -> set
set.symmetric_difference_update(other) -> None
set.union(*others)                -> set
set.update(*others)               -> None

specialized dicts

collections.Counter

  • .most_common(k) internally uses either sorted(...) (when k is omitted or ~n) or an nlargest heap - complexity O(n log k) !!!
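A small Counter example (the anagram check repeats the trick from the == section above):

```python
from collections import Counter

c = Counter("abracadabra")
print(c.most_common(2))  # [('a', 5), ('b', 2)] - top-k; ties keep first-seen order
print(Counter("listen") == Counter("silent"))  # True: anagram check
```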

heap (heapq on top of list)

  • min-heap stored as a flat array; children of index i are at 2i+1 and 2i+2. on push - appends to the end and sifts up; on pop - moves the last item to the root and sifts down
  • used in PriorityQueue
  • for leetcode used in e.g. merge N sorted lists
heapq.heapify(x)            -> None        # transforms list x into a valid min-heap, O(n)
heapq.heappush(heap, item)  -> None        # pushes item onto heap, maintaining heap property, O(log n)
heapq.heappop(heap)         -> item        # pops and returns smallest item, O(log n)
heapq.heappushpop(heap, item) -> item     # push then pop (more efficient than separate), O(log n)
heapq.heapreplace(heap, item) -> item     # pop then push (more efficient for replacement), O(log n)
heapq.nlargest(n, iterable[, key]) -> list  # returns n largest elements, O(n log k)
heapq.nsmallest(n, iterable[, key]) -> list # returns n smallest elements, O(n log k)
heapq.merge(*iterables, key=None, reverse=False) -> iterator
                                        # lazily merges multiple sorted inputs, O(n log k) where k = number of iterables
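heapq has no max-heap variant; the usual workaround is negating values. A sketch (the bounded-heap loop below is a common top-k pattern, not from the notes above):

```python
import heapq

nums = [5, 1, 9, 3]

# heapq is a min-heap; negate values to simulate a max-heap
heap = [-x for x in nums]
heapq.heapify(heap)
print(-heapq.heappop(heap))  # 9, the maximum

# keeping the k smallest seen so far with a bounded max-heap:
k, top = 2, []
for x in nums:
    heapq.heappush(top, -x)
    if len(top) > k:
        heapq.heappop(top)   # drops the largest of the kept values
print(sorted(-x for x in top))  # [1, 3], the 2 smallest
```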

queues

all queue.* queues are thread-safe and blocking (they use locks inside their implementation). in contrast, the base structures (list, deque, etc) are not designed for thread-safe coordination

  • queue.Queue (based on deque, amortized O(1) push/pop)
  • queue.PriorityQueue (based on heap - log for push/pop)
  • queue.LifoQueue - a stack; based on list, same as append/pop from the end of a list
  • queue.SimpleQueue (like Queue but simpler: 1 lock + 1 condition instead of 2 conditions. Queue supports maxsize (and blocks the producer when at capacity) while SimpleQueue is always unbounded. use case: simple unbounded producer/consumer)
  • multiprocessing.Queue - queue but for multiple processes (all prev ones were thread-safe, not process-safe)
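A minimal producer/consumer sketch with queue.Queue (the sentinel-based shutdown is one common convention, not the only one):

```python
import queue
import threading

q = queue.Queue(maxsize=10)   # bounded: put() blocks when full
results = []

def worker():
    while True:
        item = q.get()        # blocks until an item is available
        if item is None:      # sentinel: tell the worker to stop
            break
        results.append(item * 2)
        q.task_done()

t = threading.Thread(target=worker)
t.start()
for i in range(5):
    q.put(i)
q.put(None)                   # send the sentinel
t.join()
print(sorted(results))        # [0, 2, 4, 6, 8]
```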

algorithms

  • .sort() or sorted() uses Timsort (insertion sort for small runs, merge sort otherwise - works well on real data, especially partially sorted arrays)
    • it is O(N log N) worst case (and can approach O(N) best case on mostly sorted data)
    • its space usage is 'optimized' but not O(1) - worst case is O(N). if you ever need true O(1) extra space in python, you need to implement something custom, e.g. heapsort
    • it is STABLE (i.e. it PRESERVES ORDER of 'equal' elements)
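A short demonstration of key= and of exploiting stability for multi-criteria sorts:

```python
words = ["pear", "fig", "apple", "kiwi"]

# key is computed once per element; ties keep original order (stability)
print(sorted(words, key=len))
# ['fig', 'pear', 'kiwi', 'apple'] - 'pear' stays before 'kiwi' (both len 4)

# multi-criteria sort: sort on the secondary key first, then the primary;
# stability of the second sort preserves the first sort's order within ties
pairs = [(2, 'b'), (1, 'a'), (2, 'a')]
pairs.sort(key=lambda p: p[1])   # secondary key
pairs.sort(key=lambda p: p[0])   # primary key
print(pairs)                     # [(1, 'a'), (2, 'a'), (2, 'b')]
```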

dangerous things

looping over values

  • python uses function-level scoping (not block-level scoping like C++), so e.g. this works
for i in range(5):
    pass
print(i)
  • BUT list comprehensions have THEIR OWN SCOPE, e.g.
i = 10
[i for i in range(5)]
print(i) # prints 10

looping over python objects which CHANGE inside loop

Just NEVER do any of that

x = {1: 'a', 2: 'b'}
for k in x.keys(): # iterating over list(x.keys()) would be safe; plain x, .values(), .items() all fail the same way
    x[3] = 'c'  # 💥 RuntimeError: dictionary changed size during iteration
a = [1, 2, 3, 4]
for x in a:
    if x == 2:
        a.remove(x)  # 😬 - random bad things can happen
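Safe alternatives: iterate over a snapshot, or build a new container instead of mutating in place:

```python
# dict: snapshot the keys before the loop, then mutate freely
x = {1: 'a', 2: 'b'}
for k in list(x):          # list(x) copies the keys up front
    x[k + 10] = x[k]
print(sorted(x))           # [1, 2, 11, 12]

# list: filtering is cleaner as a comprehension than remove() inside the loop
a = [1, 2, 3, 4]
a = [v for v in a if v != 2]
print(a)                   # [1, 3, 4]
```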

memory management

  • all objects live on the heap (even for x = 42 the int object is heap-allocated; only the reference x lives on the stack)
  • the heap is not technically limited (bounded by the 32/64-bit address space; on 64-bit it is practically limited by the RAM available to the machine - easily hundreds of gigabytes)
  • memory on the stack IS LIMITED (implementation-dependent, but for CPython ~8 MB on Linux/macOS and ~1 MB on Windows)
    • recursion depth is capped (default ~1000; it can be raised via sys.setrecursionlimit, but be careful - you may still exhaust the C stack. when the limit is reached, a RecursionError is thrown). so in practice you should use an explicit stack/queue for DFS/BFS
    • python does not do tail-call optimization (a `return f(x)` still consumes a stack frame per call)
    • @lru_cache(maxsize=None) (None means unlimited; otherwise only the last maxsize results are kept). internally it uses a mix of a dict and a doubly linked list
  • python uses pymalloc for small objects (requests up to 512 bytes, served from pools inside 256 KB arenas) and the system malloc for larger blocks. when the gc frees objects, memory is returned to the python memory manager and usually kept there; it MIGHT be returned to the system but that is NOT GUARANTEED
    • if you have to force it - use a separate subprocess for memory-hungry operations; when that process terminates, its memory is freed back to the system
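Because of the recursion cap, DFS is usually written with an explicit stack; a sketch (dfs_iterative and the sample graph are illustrative):

```python
def dfs_iterative(graph, start):
    """Depth-first traversal with an explicit stack - no recursion,
    so it is safe for depths far beyond sys.getrecursionlimit()."""
    seen, order, stack = set(), [], [start]
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        order.append(node)
        # reversed() so neighbors are explored in their listed order
        stack.extend(reversed(graph.get(node, [])))
    return order

g = {1: [2, 3], 2: [4], 3: [], 4: []}
print(dfs_iterative(g, 1))  # [1, 2, 4, 3]
```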

garbage collection

  • python reference-counts every object; as soon as the count reaches 0, the memory is freed immediately (returned to the python memory allocator, not necessarily to the OS)
  • for cyclic references - python periodically runs a cycle detector that finds unreachable reference loops and frees them as well
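The cycle detector can be observed directly (the Node class is just an illustration; gc.collect() returns the number of unreachable objects it found):

```python
import gc

class Node:
    pass

a, b = Node(), Node()
a.partner, b.partner = b, a   # reference cycle: refcounts never reach 0

del a, b                      # the cycle is now unreachable...
collected = gc.collect()      # ...and the cycle detector reclaims it
print(collected >= 2)         # True: at least the two Node objects were found
```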