Yan Lyesin yanlesin

yanlesin / python_sec_13f_list.py
Created October 21, 2024 00:08
Python parsing of the SEC 13(f) list
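The script declares `cusip`, `issuer_name`, `issuer_description`, and `status` columns, but the lines shown here only collect raw page text. Below is a minimal sketch of one possible first step: deciding which extracted lines start with a CUSIP. It assumes, without verification against the PDF, that a data line begins with the 9-character CUSIP, possibly printed as space-separated 6/2/1-character groups. `cusip_check_digit` and `extract_cusip` are hypothetical helpers and are not part of the original script. The check uses the standard CUSIP modulus-10 "double-add-double" scheme, which rejects header and footer text that only looks like an identifier.

```python
import re
from typing import Optional


def cusip_check_digit(base: str) -> int:
    """Hypothetical helper: check digit for the first 8 characters of a CUSIP."""
    total = 0
    for i, ch in enumerate(base[:8].upper()):
        if ch.isdigit():
            v = int(ch)
        elif "A" <= ch <= "Z":
            v = ord(ch) - ord("A") + 10  # A=10 ... Z=35
        elif ch == "*":
            v = 36
        elif ch == "@":
            v = 37
        elif ch == "#":
            v = 38
        else:
            raise ValueError(f"invalid CUSIP character: {ch!r}")
        if i % 2 == 1:  # characters at positions 2, 4, 6, 8 are doubled
            v *= 2
        total += v // 10 + v % 10  # add the digits of each value
    return (10 - total % 10) % 10


# Assumption: a data line starts with a CUSIP, either compact ("037833100")
# or split into groups ("037833 10 0"); the real PDF layout is not verified.
CUSIP_PREFIX = re.compile(r"^\s*([0-9A-Z*@#]{6})\s?([0-9A-Z*@#]{2})\s?([0-9])\b")


def extract_cusip(line: str) -> Optional[str]:
    """Hypothetical helper: return the leading CUSIP if its check digit is valid."""
    m = CUSIP_PREFIX.match(line)
    if m is None:
        return None
    cusip = "".join(m.groups())
    if cusip_check_digit(cusip) != int(cusip[8]):
        return None
    return cusip
```

Using the script's `df_text`, a call such as `df_text['text'].map(extract_cusip)` would return the CUSIP for lines that look like list entries and `None` for everything else. That result could seed the `cusip` column before the remaining fields are split out.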
from poppler import load_from_file, PageRenderer
import pandas as pd
import polars as pl
import numpy as np
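# Load the SEC 13(f) list PDF (2024 Q3 edition, per the filename)
pdf_document = load_from_file("/Users/yanlyesin/Downloads/13flist2024q3.pdf")
# Empty frame with the target columns; rows are added per extracted text line
df_from_pdf = pd.DataFrame(columns=['text', 'page', 'cusip', 'issuer_name', 'issuer_description', 'status', 'text_length'])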
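# Page indices are 0-based; starting at 2 skips the first two pages of the PDF
for page in range(2, pdf_document.pages):
    # Extract the page text and split it into individual lines
    text_from_pdf = pdf_document.create_page(page).text().split("\n")
    df_text = pd.DataFrame({'text': text_from_pdf, 'page': [page] * len(text_from_pdf)})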