@analytik
Forked from turtlemonvh/get_kafka_size.py
Last active September 8, 2021 07:15
Get the size of Kafka topics on disk, ordered by largest first
#!/usr/bin/env python3
"""Print the on-disk size of each Kafka topic in MiB, largest first."""
import os
import subprocess
from collections import defaultdict

kafka_log_dir = "/var/lib/kafka/data/"
size_unit = 1024.0 ** 2  # bytes per MiB

topic_sizes = defaultdict(int)
for entry in os.listdir(kafka_log_dir):
    fullpath = os.path.join(kafka_log_dir, entry)
    if not os.path.isdir(fullpath):
        continue
    # Partition directories are named "<topic>-<partition>"; skip anything without a "-"
    if "-" not in entry:
        continue
    groupname = entry.rsplit("-", 1)[0]
    # -s prints one total per directory; -b reports apparent size in bytes (GNU du)
    size = subprocess.check_output(["du", "-sb", fullpath]).split()[0]
    topic_sizes[groupname] += int(size)

# Sort topics by total size, largest first
for topic, topic_size in sorted(topic_sizes.items(), key=lambda item: item[1], reverse=True):
    print(int(topic_size / size_unit), topic)
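
The script above relies on `du -b`, which is a GNU option and may be missing on other platforms. As a rough alternative, here is a minimal pure-Python sketch of the same grouping. The helper names `topic_of`, `dir_size_bytes`, and `sizes_by_topic` are hypothetical and not part of the original gist. Requiring a numeric partition suffix is an added assumption; the original only checks for a "-". Totals may differ slightly from `du`, since only file sizes are summed here and not the directory entries themselves.

```python
import os
from collections import defaultdict


def topic_of(partition_dir):
    """Return the topic for a '<topic>-<partition>' directory name, else None."""
    topic, sep, partition = partition_dir.rpartition("-")
    # Assumption: a partition directory ends in "-<digits>"
    if not sep or not topic or not partition.isdigit():
        return None
    return topic


def dir_size_bytes(path):
    """Sum the apparent sizes of all regular files under path."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            file_path = os.path.join(root, name)
            if not os.path.islink(file_path):
                total += os.path.getsize(file_path)
    return total


def sizes_by_topic(log_dir):
    """Map each topic to the total bytes used by its partition directories."""
    sizes = defaultdict(int)
    for entry in os.listdir(log_dir):
        full_path = os.path.join(log_dir, entry)
        topic = topic_of(entry)
        if topic is None or not os.path.isdir(full_path):
            continue
        sizes[topic] += dir_size_bytes(full_path)
    return dict(sizes)
```

Calling `sizes_by_topic("/var/lib/kafka/data/")` returns a plain dict of byte counts, which can be sorted and printed the same way as `topic_sizes` in the script.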