Skip to content

Instantly share code, notes, and snippets.

@wadoon
Created February 3, 2025 05:49
Show Gist options
  • Select an option

  • Save wadoon/5914f7a70c4c943d6da2b7da87ed20b9 to your computer and use it in GitHub Desktop.

Select an option

Save wadoon/5914f7a70c4c943d6da2b7da87ed20b9 to your computer and use it in GitHub Desktop.
Scripts for testing URLs inside of docx files for the status code.
#!/usr/bin/env python3
# pip install --user python-docx
from docx import Document
from docx.opc.constants import RELATIONSHIP_TYPE as RT
import sys
def iter_hyperlink_rels(rels):
for rel in rels:
if rels[rel].reltype == RT.HYPERLINK:
yield rels[rel]._target
for f in sys.argv[1:]:
document = Document(f)
rels = document.part.rels
for url in iter_hyperlink_rels(rels):
print(url)
#!/usr/bin/env sh
python testdocxurl.py $1 > urls.txt
while read LINE || [ -n "$LINE" ]; do
#curl --insecure -o /dev/null --silent --head --write-out "%{http_code} $LINE\n" "$LINE"
# if you want to follow also redirects (to detect 404s), add "-L --max-redirs 500"
curl -L --max-redirs 500 --insecure -o /dev/null --silent --head --write-out "%{http_code} $LINE\n" "$LINE"
done < urls.txt
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment