Skip to content

Instantly share code, notes, and snippets.

@curiousest
Created October 15, 2015 11:28
Show Gist options
  • Select an option

  • Save curiousest/ed3e595148f48ca427e3 to your computer and use it in GitHub Desktop.

Select an option

Save curiousest/ed3e595148f48ca427e3 to your computer and use it in GitHub Desktop.
Memory efficient Django Queryset Iterator
import gc
# inspiration: https://djangosnippets.org/snippets/1949/
def queryset_memreduce_iterator(queryset, field='pk', queryset_filter='pk__gt',
ordering_function=lambda x, y: x < y,
chunksize=1000):
'''''
Iterate over a Django Queryset ordered by field (kwarg)
This method loads a maximum of chunksize (default: 1000) rows in it's
memory at the same time while django normally would load all rows in it's
memory. Using the iterator() method only causes it to not preload all the
classes.
ordering_function: defaults to ascending, don't use >= or <= here or you
will get an infinite loop
queryset_filter: defaults to ascending, set the field here again if you're
not using pk as the field
'''
assert(queryset.count() > 1)
current_item = getattr(queryset.first(), field)
last_item = getattr(queryset.last(), field)
while ordering_function(current_item, last_item):
chunk = queryset.filter(**{queryset_filter: current_item})[:chunksize]
for row in chunk:
current_item = getattr(row, field)
yield row
gc.collect()
# Usage example (date field on Job model, desc):
jobs = Job.objects.all().order_by('-date')
memreduced_jobs = queryset_memreduce_iterator(
jobs, field='date', queryset_filter='date__lte',
ordering_function=lambda x, y: x > y
)
for job in memreduced_jobs:
pass
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment