
@Markus-de-Koster
Created July 1, 2024 10:45
Extract the Zotero PDF files cited in the LaTeX bibliography of a document
```python
import glob
import os
import re
import shutil

from pybtex.database.input import bibtex

# Adjust these paths to your project
AUXFILE = 'PQSE.aux'   # .aux file generated when compiling the LaTeX document
BIBFILE = 'test.bib'   # .bib file exported from Zotero
DEST_FOLDER = 'pdfs'   # folder the cited PDFs are copied to

# Resolve all relative paths against the directory of this script
script_location = os.path.dirname(os.path.abspath(__file__))
os.chdir(script_location)

# Extract citation keys from the .aux file (biblatex form: \abx@aux@cite{<n>}{<key>})
with open(AUXFILE, 'r') as f:
    aux_content = f.read()
citation_keys_aux = re.findall(r'\\abx@aux@cite\{[0-9]+\}\{([^\}]+)\}', aux_content)

# Read the .bib file using pybtex
parser = bibtex.Parser()
bib_data = parser.parse_file(BIBFILE)

# Collect all .bib keys and map each entry to a PDF in its files/<number>/ folder
citation_keys_bib = []
file_dict = {}
for entry_key, entry_value in bib_data.entries.items():
    citation_keys_bib.append(entry_key)
    if 'file' in entry_value.fields:
        file_info = entry_value.fields['file']
        file_path_match = re.search(r'files/([0-9]+)/([^:]+):application/pdf', file_info)
        if file_path_match:
            folder_number = file_path_match.group(1)
            folder_path = os.path.join('files', folder_number)
            # Take the first PDF in the folder (sorted, so the choice is deterministic)
            pdf_files = sorted(glob.glob(os.path.join(folder_path, '*.pdf')))
            if pdf_files:
                file_dict[entry_key] = pdf_files[0]

# Match cited keys to PDFs and copy them; create the destination folder if it is missing
os.makedirs(DEST_FOLDER, exist_ok=True)
for key in dict.fromkeys(citation_keys_aux):  # drop duplicate keys, keep order
    if key in file_dict:
        src_path = file_dict[key]
        dest_path = os.path.join(DEST_FOLDER, os.path.basename(src_path))
        shutil.copy(src_path, dest_path)
```
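The copy step keeps each PDF's original file name, so two PDFs with the same name would overwrite each other, and cited keys without a PDF are skipped silently. A minimal sketch of an alternative, using a hypothetical helper `copy_cited_pdfs` (not part of the original script), names each copy after its citation key and returns the keys it could not find. It assumes `file_dict` maps citation keys to PDF paths, as the script builds it.

```python
import os
import re
import shutil
from typing import Dict, Iterable, List


def copy_cited_pdfs(cited_keys: Iterable[str], file_dict: Dict[str, str],
                    dest_folder: str) -> List[str]:
    """Copy the PDF of each cited key to dest_folder as '<key>.pdf' (hypothetical helper).

    Returns the cited keys that have no entry in file_dict, in citation order.
    """
    os.makedirs(dest_folder, exist_ok=True)
    missing: List[str] = []
    for key in dict.fromkeys(cited_keys):  # drop duplicate keys, keep first-citation order
        src_path = file_dict.get(key)
        if src_path is None:
            missing.append(key)
            continue
        # Replace characters that may be invalid in file names (e.g. ':' on Windows)
        safe_name = re.sub(r'[^\w.-]', '_', key)
        shutil.copy(src_path, os.path.join(dest_folder, safe_name + '.pdf'))
    return missing
```

In the script, `missing = copy_cited_pdfs(citation_keys_aux, file_dict, DEST_FOLDER)` could replace the final loop, and printing `missing` shows which cited entries still lack a PDF.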

Extract PDFs

Zotero does not have an option to export all PDF files used in a LaTeX document. However, when publishing a paper or thesis, you may need to provide the complete list of cited literature, including the PDFs. Instead of selecting each PDF manually, you can use the script in this gist: it parses the .aux file generated by LaTeX to find the cited keys and looks up each key's PDF through the "file" field of the exported .bib file. Make sure the "file" field is included in the Zotero export (it is by default).
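The regular expression in the script expects "file" entries such as `Full Text PDF:files/12/paper.pdf:application/pdf`. Assuming the field holds one or more `description:path:mimetype` entries separated by `;`, a small parser can extract the PDF paths explicitly. This is a sketch under that assumed format; `parse_file_field` and `pdf_paths` are hypothetical names, not part of pybtex or Zotero, so check the format against your own .bib file first.

```python
from typing import List, NamedTuple


class Attachment(NamedTuple):
    """One attachment from a BibTeX "file" field (hypothetical helper type)."""
    description: str
    path: str
    mimetype: str


def parse_file_field(file_field: str) -> List[Attachment]:
    """Split a "file" field into attachments.

    Assumes entries are separated by ';' and have the form description:path:mimetype,
    with no ':' in the description or MIME type. Splitting on the first and last ':'
    keeps paths with a drive letter (e.g. 'C:/...') intact.
    """
    attachments = []
    for part in file_field.split(';'):
        part = part.strip()
        if part.count(':') < 2:
            continue  # not in the assumed three-part form; skip it
        description, rest = part.split(':', 1)
        path, mimetype = rest.rsplit(':', 1)
        attachments.append(Attachment(description, path, mimetype))
    return attachments


def pdf_paths(file_field: str) -> List[str]:
    """Return only the paths of the PDF attachments."""
    return [a.path for a in parse_file_field(file_field) if a.mimetype == 'application/pdf']
```

Using the parsed path directly, rather than taking the first PDF found in the files/<number>/ folder as the script does, would select the right file when an entry has more than one PDF attached.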

Python script

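The Python script is included in this gist. Set AUXFILE, BIBFILE and DEST_FOLDER at the top of the script to match your project. The exported files folder must be located next to the script, because the script resolves the files/<number>/ paths from the "file" field relative to its own directory. Depending on the biblatex version, the syntax of the .aux file may differ; if no PDFs are copied, adjust the regular expressions accordingly.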
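If no keys are found, it helps to check which citation commands the .aux file actually contains. The sketch below is a more tolerant variant of the key extraction. It accepts the two-argument biblatex form `\abx@aux@cite{<refsection>}{<key>}` that the script targets, an assumed older one-argument form `\abx@aux@cite{<key>}`, and BibTeX's `\citation{...}`, which may hold several comma-separated keys. The function name `extract_citation_keys` is hypothetical; compare the patterns with your own .aux file before relying on them.

```python
import re
from typing import List

# Citation commands assumed to occur in .aux files (verify against your own file):
#   newer biblatex:  \abx@aux@cite{<refsection>}{<key>}
#   older biblatex:  \abx@aux@cite{<key>}
#   BibTeX/natbib:   \citation{<key>} or \citation{<key1>,<key2>}
CITE_PATTERN = re.compile(
    r'\\abx@aux@cite\{[0-9]+\}\{(?P<new>[^}]+)\}'
    r'|\\abx@aux@cite\{(?P<old>[^}]+)\}(?!\{)'  # lookahead: not the two-argument form
    r'|\\citation\{(?P<bibtex>[^}]+)\}'
)


def extract_citation_keys(aux_content: str) -> List[str]:
    """Return the cited keys in order of first appearance, without duplicates."""
    keys: List[str] = []
    seen = set()
    for match in CITE_PATTERN.finditer(aux_content):
        raw = match.group('new') or match.group('old') or match.group('bibtex')
        for key in raw.split(','):
            key = key.strip()
            # '*' comes from \nocite{*} and is not a real key
            if key and key != '*' and key not in seen:
                seen.add(key)
                keys.append(key)
    return keys
```

In the script, `citation_keys_aux = extract_citation_keys(aux_content)` would replace the single `re.findall` call.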

Contributing

If you find an easier way to achieve this or have a fix, please leave a comment below.
