@bossdown123
Created August 21, 2025 10:28

Docs to HTML Converter

This repository contains convert_tex_to_html.py, a self-contained Python script that scans a source folder of LaTeX documents and converts the main documents to HTML using tex4ht tooling. It prefers make4ht and falls back to htlatex when needed. It also generates a simple HTML viewer and index to browse the converted docs.

See SETUP.md for installation requirements.

What It Does

  • Finds .tex files under a source directory (default tmp_Docs).
  • Heuristically selects “main” documents that contain a \documentclass (unless --all-tex is set).
  • Attempts HTML conversion with make4ht (HTML5 output with either MathML+MathJax or TeX+MathJax) and falls back to htlatex if needed.
  • Handles minted by ensuring -shell-escape is used, preferring cached output, and surfacing errors clearly.
  • Copies generated HTML and any local assets to a destination folder (default tmp_Docs_html), adds small CSS overrides, and optionally injects MathJax into HTML.
  • Produces docs_index.json and an index.html viewer to navigate all converted documents.

Command-Line Usage

python convert_tex_to_html.py [options]

Options:
  --src SRC            Source directory with .tex files (default: tmp_Docs)
  --dst DST            Output directory for HTML (default: tmp_Docs_html)
  --all-tex            Convert every .tex file, not just main docs
  --clean              Clean each output subfolder before converting
  --math-output {mathml,mathjax}
                       Math format for make4ht output:
                       - mathml  : write MathML; rendered via MathJax SVG
                       - mathjax : keep TeX; rendered via MathJax SVG
  --shell-escape       Enable -shell-escape for LaTeX runs (default)
  --no-shell-escape    Disable -shell-escape for LaTeX runs
  --engine {pdflatex,lualatex,xelatex}
                       LaTeX engine to use (default: pdflatex)
  --postbuild-fix-images
                       Run the post-build image fixer to copy/normalize
                       equation PNGs referenced by *_htmlwrap.html

Typical runs:

  • Convert all main documents: python convert_tex_to_html.py
  • Force convert every .tex: python convert_tex_to_html.py --all-tex
  • Clean outputs first: python convert_tex_to_html.py --clean
  • Use MathML flavor: python convert_tex_to_html.py --math-output mathml
  • Convert and auto-fix image references: python convert_tex_to_html.py --postbuild-fix-images

Outputs for each .tex go under DST/<subdir>/<stem>/. Each conversion also writes a build.log in the document output folder with the detailed tool logs.
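
As a minimal sketch of that layout, the snippet below mirrors how convert_one derives the output folder from the source and destination roots. The helper name output_dir_for and the sample paths are assumptions made for illustration only.

```python
from pathlib import Path


def output_dir_for(tex: Path, src_root: Path, dst_root: Path) -> Path:
    """Map SRC/<subdir>/<stem>.tex to DST/<subdir>/<stem>/ (hypothetical helper)."""
    rel_parent = tex.parent.relative_to(src_root)
    return dst_root / rel_parent / tex.stem


out = output_dir_for(
    Path("tmp_Docs/ModuleA/notes.tex"),  # sample path, not from the repository
    Path("tmp_Docs"),
    Path("tmp_Docs_html"),
)
print(out.as_posix())  # tmp_Docs_html/ModuleA/notes
```

The build.log for that document would then be expected at tmp_Docs_html/ModuleA/notes/build.log.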

Post-Build Image Fixes

Some tex4ht outputs reference equation images as <stem>_htmlwrapNx.png while the LaTeX run produces <stem>Nx.png in the source folder. To prevent 404s, run:

make postbuild-fix-images

This runs scripts/post_build_fix_images.py, which:

  • Copies math/equation PNGs from tmp_Docs/<Module>/ to the corresponding tmp_Docs_html/.../ locations, renaming to the expected _htmlwrapNx.png pattern.
  • Normalizes case-only file name mismatches (e.g., .PNG vs. .png) next to the HTML files when needed.
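
scripts/post_build_fix_images.py itself is not shown here, so the following is only a sketch of the renaming step under stated assumptions: equation images are named <stem>Nx.png with N a decimal number, and the function name copy_equation_pngs is hypothetical.

```python
import re
import shutil
from pathlib import Path
from typing import List


def copy_equation_pngs(src_dir: Path, html_dir: Path, stem: str) -> List[str]:
    """Copy <stem>Nx.png from src_dir into html_dir as <stem>_htmlwrapNx.png.

    Hypothetical helper; the real fixer may match files differently.
    """
    pattern = re.compile(re.escape(stem) + r"(\d+x\.png)")
    copied: List[str] = []
    for png in sorted(src_dir.iterdir()):
        m = pattern.fullmatch(png.name)
        if not (m and png.is_file()):
            continue
        target = html_dir / f"{stem}_htmlwrap{m.group(1)}"
        if not target.exists():  # never overwrite an image that is already in place
            html_dir.mkdir(parents=True, exist_ok=True)
            shutil.copy2(png, target)
            copied.append(target.name)
    return copied
```

Returning the list of copied names makes a second run a visible no-op, which is convenient when the fixer is invoked after every build.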

How It Works (Pipeline)

  1. Detection: For each .tex, detect_main_tex scans roughly the first 200 lines, skipping blank and comment-only lines, and flags files containing \documentclass as main docs (unless --all-tex is used, in which case every .tex file is converted).

  2. Environment setup: The script populates:

    • TEXINPUTS to include the .tex parent and source roots, ensuring local includes are found.
    • PATH with ~/.local/bin (for user-installed pygmentize) and ./bin.
  3. make4ht attempt: try_tex4ht_html tries make4ht first with:

    • Format: html5+mathjax or html5+mathml.
    • Shell-escape: Always passes -interaction=nonstopmode in the LaTeX options; when shell-escape is enabled, also adds -s to make4ht and -shell-escape to the LaTeX options.
    • Minted handling: If the source contains \usepackage{minted}, convert_one creates a tiny wrapper <stem>_htmlwrap.tex that does:
      • \PassOptionsToPackage{frozencache,cachedir=_minted-<stem>}{minted}
      • \input{<orig>.tex}
      The wrapper is built with the jobname set to the original <stem>, so the cache directory name matches. This minimizes re-running pygmentize by reusing cached outputs when present.
    • Error detection: If the make4ht logs contain common minted errors (a missing -shell-escape flag or missing Pygments output), the script triggers the fallback even if an HTML file was produced.
  4. Fallback to htlatex: If needed, htlatex is called. htlatex takes its options positionally, in this order:

    • htlatex <file> "tex4ht.sty opts" "tex4ht opts" "t4ht opts" "latex opts"
    • Output dir is passed in the t4ht argument (e.g., -d<dir>), and LaTeX options include -interaction=nonstopmode and (if enabled) -shell-escape.
    • The environment variables LATEX and TEX are set to the chosen engine and -shell-escape when enabled, for wrappers that respect them.
  5. Result handling: The produced HTML is copied into the destination folder (if generated elsewhere), together with any linked local assets (CSS/JS/images) and an assets directory named like the document stem.

  6. Post-process: A small CSS file (zzz_overrides.css) is added and linked into each page. When make4ht succeeds, the script injects a MathJax <script> snippet for math rendering; for htlatex fallback we keep MathJax injection off, as tex4ht often converts math to images.

  7. Catalog and Viewer: All successful conversions are gathered into docs_index.json and a simple index.html viewer is written to the destination root for browsing.
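
The decision between steps 3 and 4 can be sketched as a small pure function. The three marker strings are the ones the script searches for in the make4ht output; the names MINTED_MARKERS, has_minted_failure, and choose_path are assumptions for illustration and do not exist in the script.

```python
from typing import Tuple

# Substrings the script looks for in make4ht's combined stdout/stderr.
MINTED_MARKERS: Tuple[str, ...] = (
    "Package minted Error",
    "You must invoke LaTeX with the -shell-escape flag",
    "Missing Pygments output",
)


def has_minted_failure(log: str) -> bool:
    """True if the log contains any known minted/shell-escape error marker."""
    return any(marker in log for marker in MINTED_MARKERS)


def choose_path(make4ht_code: int, html_found: bool, log: str) -> str:
    """Return 'make4ht' when its result is usable, otherwise 'htlatex'."""
    if make4ht_code == 0 and html_found and not has_minted_failure(log):
        return "make4ht"
    return "htlatex"


print(choose_path(0, True, "all good"))                 # make4ht
print(choose_path(0, True, "Missing Pygments output"))  # htlatex
```

Note that a zero exit code with an HTML file present is still rejected when a minted marker appears, which matches the "even if an HTML file was produced" rule above.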

Function Reference

  • detect_main_tex(tex_path: Path) -> bool

    • Heuristic: returns True if \documentclass is found in the first ~200 lines, ignoring blank and comment-only lines.
  • which(cmd: str) -> Optional[str]

    • Wrapper for shutil.which.
  • run(cmd: List[str], cwd: Path, env: dict, timeout: int) -> Tuple[int, str, str]

    • Executes a subprocess with given working directory and environment, returns (exit_code, stdout, stderr); kills on timeout.
  • _collect_local_links(html: Path) -> Set[str]

    • Parses HTML and extracts local src/href references for asset copying.
  • _inject_overrides(html_file: Path, css_name: str, mj_snippet: Optional[str]) -> None

    • Inserts a link to css_name and an optional MathJax snippet into a single HTML file.
  • _inject_in_all_html(out_dir: Path, mj_snippet: Optional[str]) -> None

    • Writes a small zzz_overrides.css (if missing) and injects overrides/MathJax into all HTML files in out_dir.
  • try_tex4ht_html(tex: Path, out_dir: Path, env: dict, timeout: int = 3600, math_output: str = 'mathjax', shell_escape: bool = True, engine: str = 'pdflatex', extra_latex_opts: str = '', jobname: Optional[str] = None) -> Tuple[bool, str, Optional[Path]]

    • Orchestrates the conversion using make4ht first, then htlatex as a fallback. Accepts extra LaTeX options and a jobname (used by the minted wrapper).
  • convert_one(tex: Path, src_root: Path, dst_root: Path, clean: bool = False, math_output: str = 'mathjax', shell_escape: bool = True, engine: str = 'pdflatex') -> Tuple[bool, str, Path]

    • Prepares the output folder, environment, and minted wrapper (if needed), then calls try_tex4ht_html and returns a tuple (ok, summary, html_path_or_dir).
  • _extract_title(html_file: Path) -> str

    • Retrieves an HTML page title from <title> or the first <h1>/<h2>; falls back to the filename stem.
  • _write_catalog(dst_root: Path, items) -> int

    • Writes docs_index.json with the list of successfully converted docs and their titles; returns the count written.
  • _write_viewer(dst_root: Path) -> None

    • Writes a simple, static index.html that loads the catalog and lets you browse documents.
  • main()

    • CLI entry point. Parses arguments, discovers .tex files, converts them, and writes the catalog/viewer.
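
To make the asset-copying filter concrete, here is a string-based sketch of the rule _collect_local_links applies: same regular expression, same exclusions, but operating on text instead of a file. The name local_links and the sample HTML are assumptions for illustration.

```python
import re
from typing import Set

LINK_RE = re.compile(r"(?:src|href)\s*=\s*['\"]([^'\"]+)['\"]", re.IGNORECASE)


def local_links(html_text: str) -> Set[str]:
    """Collect relative src/href targets, skipping anchors, URLs, and absolute paths."""
    links: Set[str] = set()
    for m in LINK_RE.finditer(html_text):
        href = m.group(1).strip()
        if not href or href.startswith(("#", "/", "data:", "mailto:")) or "://" in href:
            continue
        links.add(href)
    return links


sample = (
    '<link href="doc.css"><img src="fig/a.png">'
    '<a href="https://example.org">x</a><a href="#sec1">s</a>'
)
print(sorted(local_links(sample)))  # ['doc.css', 'fig/a.png']
```

Only the two relative references survive, which are exactly the files that would need to be copied next to the HTML.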

Minted, Pygments, and Caching

minted calls pygmentize via shell escape. The converter enables -shell-escape by default, and creates a wrapper that enables frozencache in minted to reuse previously generated .pyg outputs. If your environment disallows shell escape, run a one-time pdflatex -shell-escape on the source documents to populate _minted-<jobname> cache directories; subsequent HTML conversions can then reuse the cache.
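
Before relying on frozencache in a locked-down environment, it can help to check that the cache is actually populated. The sketch below assumes the jobname equals the .tex stem (as the wrapper arranges) and makes no assumption about the names of the files inside the cache; minted_cache_ready is a hypothetical helper.

```python
from pathlib import Path


def minted_cache_ready(tex: Path) -> bool:
    """True if a non-empty _minted-<stem> directory sits next to the .tex file."""
    cache_dir = tex.parent / f"_minted-{tex.stem}"
    return cache_dir.is_dir() and any(cache_dir.iterdir())
```

If this returns False for a document that uses minted, the one-time pdflatex -shell-escape run described above is the step to perform first.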

Troubleshooting

  • “Package minted Error: You must invoke LaTeX with the -shell-escape flag.”

    • Ensure your TeX run permits -shell-escape (the script requests it by default).
    • Ensure pygmentize is installed and on PATH.
  • “Missing Pygments output; \inputminted was …”

    • Same as above; also verify the _minted-<stem> cache exists or prebuild it by running pdflatex -shell-escape once.
  • Math not rendering in the browser

    • The make4ht path injects MathJax from a CDN; ensure network access. The htlatex fallback typically uses images for math and should work offline.
  • Broken relative images/links

    • The script copies assets listed in HTML src/href. If your document references assets generated by external tools during LaTeX runs, ensure those tools ran and outputs exist before conversion.
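
Several of the problems above come down to a tool missing from PATH. A quick check, using only shutil.which from the standard library, might look like the sketch below; the function name find_tools is an assumption, and the tool list simply names the three executables this document mentions.

```python
import shutil
from typing import Dict, Optional, Sequence

TOOLS = ("make4ht", "htlatex", "pygmentize")


def find_tools(names: Sequence[str] = TOOLS) -> Dict[str, Optional[str]]:
    """Map each tool name to its resolved path, or None if it is not on PATH."""
    return {name: shutil.which(name) for name in names}


for name, path in find_tools().items():
    print(f"{name:12} {path or 'NOT FOUND'}")
```

The converter also prepends ~/.local/bin and ./bin to PATH for its own subprocesses, so a tool reported missing here may still be found during conversion if it lives in one of those folders.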

Development Notes

  • The script uses only the Python standard library; no third-party packages are required.
  • Conversion logs are written per-document to help diagnose toolchain issues.
  • Changes were made to ensure proper argument ordering for htlatex and to pass shell-escape reliably to both make4ht and htlatex paths.
#!/usr/bin/env python3
import argparse
import os
import shutil
import subprocess
import sys
from pathlib import Path
import json
from typing import List, Optional, Tuple, Set


def detect_main_tex(tex_path: Path) -> bool:
    """
    Heuristic: consider a .tex file a main document if it contains \\documentclass
    in the first ~200 lines ignoring comment-only lines.
    """
    try:
        with tex_path.open('r', encoding='utf-8', errors='ignore') as f:
            for i, line in enumerate(f):
                if i > 200:
                    break
                s = line.strip()
                if not s or s.startswith('%'):
                    continue
                if '\\documentclass' in s:
                    return True
        return False
    except Exception:
        return False


def which(cmd: str) -> Optional[str]:
    return shutil.which(cmd)


def run(cmd: List[str], cwd: Path, env: dict, timeout: int) -> Tuple[int, str, str]:
    proc = subprocess.Popen(
        cmd,
        cwd=str(cwd),
        env=env,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        text=True,
    )
    try:
        out, err = proc.communicate(timeout=timeout)
        return proc.returncode, out, err
    except subprocess.TimeoutExpired:
        # Kill the child and drain its pipes; 124 mirrors the exit code of timeout(1)
        proc.kill()
        out, err = proc.communicate()
        return 124, out, err + "\n[timeout]"


def _collect_local_links(html: Path) -> Set[str]:
    """Parse a simple set of local asset links (src/href) from an HTML file."""
    import re
    data = html.read_text(encoding='utf-8', errors='ignore')
    links: Set[str] = set()
    for m in re.finditer(r"(?:src|href)\s*=\s*['\"]([^'\"]+)['\"]", data, flags=re.IGNORECASE):
        href = m.group(1).strip()
        if not href or href.startswith('#'):
            continue
        if '://' in href or href.startswith('mailto:'):
            continue
        if href.startswith('data:'):
            continue
        # Avoid absolute paths; keep relative only
        if href.startswith('/'):
            continue
        links.add(href)
    return links


def _inject_overrides(html_file: Path, css_name: str, mj_snippet: Optional[str]) -> None:
    try:
        html = html_file.read_text(encoding='utf-8', errors='ignore')
    except Exception:
        return
    lines = []
    inserted_css = False
    inserted_mj = False
    mj_present = ('MathJax' in html)
    want_mj = (mj_snippet is not None) and (not mj_present)
    css_link = f'<link rel="stylesheet" type="text/css" href="{css_name}" />'
    for line in html.splitlines():
        is_css_line = 'stylesheet' in line and '.css' in line
        if '</head>' in line:
            # Last chance to stay inside <head>: add whatever is still missing
            # right before the closing tag, so the CSS link and the MathJax
            # snippet can both land here.
            if (not inserted_css) and (not is_css_line):
                lines.append(css_link)
                inserted_css = True
            if want_mj and (not inserted_mj):
                lines.append(mj_snippet)
                inserted_mj = True
        lines.append(line)
        # After the first existing stylesheet link, add our override link once
        if is_css_line and (not inserted_css):
            lines.append(css_link)
            inserted_css = True
    # Fallbacks if no <head> landmarks were found
    if not inserted_css:
        lines.insert(0, css_link)
    if want_mj and (not inserted_mj):
        lines.insert(0, mj_snippet)
    try:
        html_file.write_text('\n'.join(lines), encoding='utf-8')
    except Exception:
        pass


def _inject_in_all_html(out_dir: Path, mj_snippet: Optional[str]) -> None:
    fix_css = out_dir / 'zzz_overrides.css'
    css_snippet = (
        '/* overrides */\n'
        'div.center, div.center div.center { text-align: center; }\n'
        'div.center { margin-left: 0 !important; margin-right: 0 !important; }\n'
        'figure.figure { margin-left: auto; margin-right: auto; }\n'
        '/* Upscale figure images produced by tex4ht (often carry tiny width/height attributes) */\n'
        'img[alt="PIC"] { display:block; width: auto !important; height: auto !important; max-width: min(100%, 900px) !important; margin: 1rem auto; }\n'
    )
    try:
        if not fix_css.exists():
            fix_css.write_text(css_snippet, encoding='utf-8')
        else:
            cur = fix_css.read_text(encoding='utf-8', errors='ignore')
            # If an older rule was written earlier, append the improved sizing constraints
            if 'max-width: min(100%' not in cur:
                fix_css.write_text(cur.rstrip() + '\n' + css_snippet, encoding='utf-8')
    except Exception:
        return
    for html in sorted(out_dir.glob('*.html')):
        _inject_overrides(html, fix_css.name, mj_snippet=mj_snippet)


def try_tex4ht_html(
    tex: Path,
    out_dir: Path,
    env: dict,
    timeout: int = 3600,
    math_output: str = 'mathjax',
    shell_escape: bool = True,
    engine: str = 'pdflatex',
    extra_latex_opts: str = '',
    jobname: Optional[str] = None,
) -> Tuple[bool, str, Optional[Path]]:
    """
    Convert a LaTeX document to HTML using tex4ht tools.
    Preference order:
    1) make4ht (modern wrapper)
    2) htlatex (legacy wrapper)
    We run in the source directory to preserve relative includes, and
    then copy the produced HTML and assets into out_dir.
    """
    out_dir.mkdir(parents=True, exist_ok=True)
    # The tools run with cwd=tex.parent, so hand them an absolute output path
    out_abs = out_dir.resolve()
    logs: List[str] = []

    def locate_html(and_from_parent: bool = True) -> Optional[Path]:
        candidates = [
            out_dir / 'index.html',
            out_dir / f'{tex.stem}.html',
        ]
        if and_from_parent:
            candidates.extend([
                tex.parent / 'index.html',
                tex.parent / f'{tex.stem}.html',
            ])
        for c in candidates:
            if c.exists() and c.stat().st_size > 0:
                return c
        found = list(out_dir.rglob('*.html'))
        if found:
            return found[0]
        if and_from_parent:
            found = list(tex.parent.rglob('*.html'))
            if found:
                return found[0]
        return None

    def copy_results(html_path: Path) -> Path:
        # Copy HTML to out_dir if needed
        if html_path.parent != out_dir:
            target_html = out_dir / html_path.name
            target_html.write_text(html_path.read_text(encoding='utf-8', errors='ignore'), encoding='utf-8')
            # Copy linked local assets
            for rel in sorted(_collect_local_links(html_path)):
                src = (html_path.parent / rel).resolve()
                if src.is_file():
                    dst = out_dir / rel
                    dst.parent.mkdir(parents=True, exist_ok=True)
                    try:
                        shutil.copy2(src, dst)
                    except Exception:
                        pass
            # Copy <stem>/ assets directory if present
            assets_dir = html_path.parent / tex.stem
            if assets_dir.exists() and assets_dir.is_dir():
                dst_assets_dir = out_dir / assets_dir.name
                shutil.copytree(assets_dir, dst_assets_dir, dirs_exist_ok=True)
            return target_html
        return html_path

    # 1) Try make4ht
    exe = which('make4ht')
    if exe:
        # Use HTML5 output; choose math extension
        fmt = f"html5+{'mathml' if math_output == 'mathml' else 'mathjax'}"
        # Try to propagate -shell-escape to LaTeX used by make4ht
        env2 = env.copy()
        if shell_escape:
            latex_cmd = f"{engine} -shell-escape"
        else:
            latex_cmd = engine
        env2['LATEX'] = latex_cmd
        env2['TEX'] = latex_cmd
        # Pass LaTeX options explicitly as the 4th positional argument too,
        # since make4ht may ignore LATEX/TEX env vars in some setups.
        latex_opts = '-interaction=nonstopmode'
        if extra_latex_opts:
            latex_opts += f' {extra_latex_opts}'
        if shell_escape:
            latex_opts += ' -shell-escape'

        def make4ht_cmd(fmt_name: str) -> List[str]:
            cmd = [exe]
            if shell_escape:
                cmd.append('-s')  # enable shell-escape in LaTeX runs
            if jobname:
                cmd.extend(['-j', jobname])
            # Positional order after the file name:
            # "tex4ht.sty opts" "tex4ht opts" "t4ht opts" "latex opts"
            cmd.extend(['-d', str(out_abs), '-f', fmt_name, tex.name, '', '', '', latex_opts])
            return cmd

        cmd = make4ht_cmd(fmt)
        code, out, err = run(cmd, cwd=tex.parent, env=env2, timeout=timeout)
        # Fallback: if mathml requested but extension is missing, retry with mathjax
        if (math_output == 'mathml') and ('Cannot load extension: mathml' in (out + err)):
            fmt = 'html5+mathjax'
            cmd = make4ht_cmd(fmt)
            code, out2, err2 = run(cmd, cwd=tex.parent, env=env2, timeout=timeout)
            out, err = out + "\n[FALLBACK to mathjax]\n" + out2, err + err2
        logs.append(f"== make4ht ==\n$ {' '.join(cmd)}\n\n[stdout]\n{out}\n\n[stderr]\n{err}\n")
        html_path = locate_html(and_from_parent=True)
        # Detect minted/shell-escape issues and force fallback to htlatex even if make4ht produced output
        combined = out + err
        minted_fail = (
            ('Package minted Error' in combined)
            or ('You must invoke LaTeX with the -shell-escape flag' in combined)
            or ('Missing Pygments output' in combined)
        )
        if (code == 0) and (html_path is not None) and (not minted_fail):
            copied = copy_results(html_path)
            # Add overrides & MathJax to all paginated pages in this doc dir
            if math_output == 'mathml':
                mj_snippet = (
                    '<script>window.MathJax = {svg:{fontCache:"global"}};</script>'
                    '<script async src="https://cdn.jsdelivr.net/npm/mathjax@3/es5/mml-svg.js"></script>'
                )
            else:
                mj_snippet = (
                    '<script>window.MathJax = {tex:{inlineMath:[["$","$"],["\\(","\\)"]]},svg:{fontCache:"global"}};</script>'
                    '<script async src="https://cdn.jsdelivr.net/npm/mathjax@3/es5/tex-svg.js"></script>'
                )
            _inject_in_all_html(out_dir, mj_snippet=mj_snippet)
            (out_dir / 'build.log').write_text('\n\n'.join(logs), encoding='utf-8')
            return True, 'ok', copied
        elif minted_fail:
            logs.append('[[ make4ht produced output but with minted/shell-escape errors; falling back to htlatex ]]')

    # 2) Fallback to htlatex (run a couple of times to stabilize refs)
    exe = which('htlatex')
    if not exe:
        (out_dir / 'build.log').write_text('\n\n'.join(logs + ['htlatex not found in PATH']), encoding='utf-8')
        return False, 'tex4ht not available', None
    log_parts: List[str] = []
    # Prepare env for htlatex: some wrappers respect LATEX/TEX env
    env_ht = env.copy()
    if shell_escape:
        env_ht['LATEX'] = f"{engine} -shell-escape"
        env_ht['TEX'] = f"{engine} -shell-escape"
    latex_opts = '-interaction=nonstopmode'
    if extra_latex_opts:
        latex_opts += f' {extra_latex_opts}'
    if shell_escape:
        latex_opts += ' -shell-escape'
    for i in range(2):
        # htlatex expects arguments in this order:
        # htlatex <file> "<tex4ht.sty opts>" "<tex4ht opts>" "<t4ht opts>" "<latex opts>"
        # Pass output dir via t4ht opts (-d<dir>/, with a trailing slash because
        # t4ht prepends the value to each file name) and LaTeX options last.
        cmd = [exe, tex.name, 'html,2', '', f'-d{out_abs}/', latex_opts]
        code, out, err = run(cmd, cwd=tex.parent, env=env_ht, timeout=timeout)
        log_parts.append(f"# pass {i+1}\n$ {' '.join(cmd)}\n\n[stdout]\n{out}\n\n[stderr]\n{err}\n")
        # continue even if code != 0 to try to produce partial output
    logs.append("== htlatex ==\n" + '\n'.join(log_parts))
    html_path = locate_html(and_from_parent=True)
    ok = html_path is not None and html_path.exists()
    copied = copy_results(html_path) if html_path else None
    # Add CSS overrides; MathJax injection is kept optional but htlatex output normally uses images
    if copied is not None:
        _inject_in_all_html(out_dir, mj_snippet=None)
    (out_dir / 'build.log').write_text('\n\n'.join(logs), encoding='utf-8')
    return ok, ('ok' if ok else 'fail'), copied


def convert_one(
    tex: Path,
    src_root: Path,
    dst_root: Path,
    clean: bool = False,
    math_output: str = 'mathjax',
    shell_escape: bool = True,
    engine: str = 'pdflatex',
) -> Tuple[bool, str, Path]:
    rel_parent = tex.parent.relative_to(src_root)
    out_dir = dst_root / rel_parent / tex.stem
    if clean and out_dir.exists():
        try:
            shutil.rmtree(out_dir)
        except Exception:
            pass
    out_dir.mkdir(parents=True, exist_ok=True)
    # Prepare environment: extend TEXINPUTS so LaTeX can find includes
    env = os.environ.copy()
    texinputs = env.get('TEXINPUTS', '')
    extra_paths = [str(tex.parent), str(src_root), str(src_root.parent)]
    env['TEXINPUTS'] = os.pathsep.join(extra_paths + [texinputs, ''])  # trailing '' adds default path
    # Ensure pygmentize and other user-level tools are discoverable
    home = Path.home()
    user_bin = str(home / '.local' / 'bin')
    repo_bin = str(Path.cwd() / 'bin')
    env['PATH'] = os.pathsep.join([user_bin, repo_bin, env.get('PATH', '')])
    # If the source uses minted, create a lightweight wrapper that freezes cache
    extra_latex_opts = ''
    jobname = None
    tex_for_build = tex
    try:
        uses_minted = '\\usepackage{minted}' in tex.read_text(encoding='utf-8', errors='ignore')
    except Exception:
        uses_minted = False
    if uses_minted:
        wrapper = tex.parent / f"{tex.stem}_htmlwrap.tex"
        try:
            wrapper.write_text(
                '% auto-generated wrapper for HTML conversion\n'
                f'\\PassOptionsToPackage{{frozencache,cachedir=_minted-{tex.stem}}}{{minted}}\n'
                f'\\input{{{tex.name}}}\n',
                encoding='utf-8'
            )
            tex_for_build = wrapper
            # Ensure jobname is the original stem so minted cache dir matches
            extra_latex_opts = f'-jobname={tex.stem}'
            jobname = tex.stem
        except Exception:
            pass
    ok, status, html_path = try_tex4ht_html(
        tex_for_build,
        out_dir,
        env,
        math_output=math_output,
        shell_escape=shell_escape,
        engine=engine,
        extra_latex_opts=extra_latex_opts,
        jobname=jobname,
    )
    if html_path is None:
        html_path = out_dir
    return ok, f'tex4ht:{status}', html_path


def _extract_title(html_file: Path) -> str:
    try:
        data = html_file.read_text(encoding='utf-8', errors='ignore')
    except Exception:
        return html_file.stem
    # Try HTML <title>
    import re
    m = re.search(r"<title>(.*?)</title>", data, flags=re.IGNORECASE | re.DOTALL)
    if m:
        t = re.sub(r"\s+", " ", m.group(1)).strip()
        if t:
            return t
    # Try first h1/h2
    m = re.search(r"<h[12][^>]*>(.*?)</h[12]>", data, flags=re.IGNORECASE | re.DOTALL)
    if m:
        t = re.sub(r"<[^>]+>", " ", m.group(1))
        t = re.sub(r"\s+", " ", t).strip()
        if t:
            return t
    return html_file.stem


def _write_catalog(dst_root: Path, items: List[Tuple[Path, bool, str, Path]]) -> int:
    """Write docs_index.json at dst_root with entries for successful conversions."""
    catalog = []
    for tex, ok, summary, html_path in items:
        if not ok:
            continue
        if not html_path.exists() or html_path.is_dir():
            continue
        rel = html_path.relative_to(dst_root)
        title = _extract_title(html_path)
        catalog.append({
            'title': title,
            'path': str(rel).replace('\\', '/'),
            'source': str(tex),
        })
    (dst_root / 'docs_index.json').write_text(json.dumps(catalog, ensure_ascii=False, indent=2), encoding='utf-8')
    return len(catalog)


def _write_viewer(dst_root: Path) -> None:
    html = """<!DOCTYPE html>
<html lang=\"en\">
<head>
<meta charset=\"utf-8\" />
<meta name=\"viewport\" content=\"width=device-width, initial-scale=1\" />
<title>Docs Viewer</title>
<style>
html, body { height: 100%; margin: 0; font-family: system-ui, -apple-system, Segoe UI, Roboto, sans-serif; overflow: hidden; }
.layout { display: grid; grid-template-columns: 320px 1fr; height: 100vh; height: 100dvh; }
.sidebar { border-right: 1px solid #ddd; display: flex; flex-direction: column; min-width:0; overflow: hidden; }
.search { padding: 12px; border-bottom: 1px solid #eee; }
.search input { width: 100%; padding: 8px 10px; font-size: 14px; border: 1px solid #ccc; border-radius: 6px; }
.list { overflow: auto; padding: 8px; }
.group { border-radius:8px; margin-bottom:8px; border:1px solid #e5e7eb; }
.group summary { list-style:none; cursor:pointer; padding:6px 8px; font-weight:600; display:flex; align-items:center; gap:8px; }
.group summary::-webkit-details-marker{ display:none; }
.group[open] summary { background:#f3f5f7; border-bottom:1px solid #eee; }
.group-items { padding:6px; }
.item { padding: 6px 8px; border-radius: 6px; cursor: pointer; line-height: 1.2; }
.item:hover { background: #f3f5f7; }
.item.active { background: #e6f0ff; }
.item small { display:block; color:#666; }
.sections { margin: 6px 0 8px 8px; display:none; }
.item.active + .sections { display:block; }
.section-link { display:block; padding:4px 8px; margin:2px 0; border-radius:6px; background:#f8fafc; color:#0f172a; text-decoration:none; font-size:12px; }
.section-link.active { background:#dbeafe; color:#0f172a; box-shadow: inset 0 0 0 1px #bfdbfe; }
.section-link:hover { background:#eef2ff; }
.lvl1 { font-weight:600; }
.lvl2 { padding-left: 12px; }
.lvl3 { padding-left: 20px; }
.lvl4 { padding-left: 28px; }
.main { display:flex; flex-direction:column; height:100%; min-height:0; overflow: hidden; }
.viewer { border: 0; width: 100%; height: auto; flex: 1 1 0; min-height: 0; display:block; }
.topbar { display:flex; align-items:center; gap:8px; padding:8px 12px; border-bottom:1px solid #eee; }
.topbar .title { font-weight:600; flex:1; overflow:hidden; text-overflow:ellipsis; white-space:nowrap; }
.topbar a { font-size:13px; color:#0366d6; text-decoration:none; }
</style>
<script>
async function loadCatalog() {
const res = await fetch('docs_index.json');
const data = await res.json();
// sort by path then title
data.sort((a,b)=> (a.path < b.path ? -1 : a.path > b.path ? 1 : a.title.localeCompare(b.title)));
return data;
}
const tocCache = new Map();
async function loadTocFor(path) {
if (tocCache.has(path)) return tocCache.get(path);
try {
const res = await fetch(path);
const html = await res.text();
const parser = new DOMParser();
const doc = parser.parseFromString(html, 'text/html');
const hs = Array.from(doc.querySelectorAll('h1,h2,h3,h4,h5'));
const items = [];
for (const h of hs) {
const level = parseInt(h.tagName.substring(1), 10) || 2;
let id = h.id;
if (!id) {
const a = h.querySelector('a[id]');
if (a) id = a.id;
}
if (!id) {
const prev = h.previousElementSibling;
if (prev && prev.id) id = prev.id;
}
const text = (h.textContent || '').trim().replace(/\\s+/g,' ');
if (id && text) items.push({ id, text, level });
}
tocCache.set(path, items);
return items;
} catch (e) {
tocCache.set(path, []);
return [];
}
}
// Global helpers for section flashing inside the iframe
let pendingFlashId = null;
function ensureIframeHighlightStyles(doc){
try {
if (!doc) return;
if (doc.getElementById('flash-highlight-style')) return;
const style = doc.createElement('style');
style.id = 'flash-highlight-style';
style.textContent = `@keyframes flashBg{0%{background:#fff3bf;}50%{background:#ffe066;}100%{background:transparent;}}
.flash-highlight{animation:flashBg 1.6s ease-in-out 1; outline:2px solid #f59e0b; outline-offset:2px;}`;
(doc.head || doc.documentElement).appendChild(style);
} catch {}
}
function flashSectionInIframe(id){
try {
const frame = document.querySelector('iframe.viewer');
if (!frame) return;
const doc = frame.contentDocument;
if (!doc) return;
ensureIframeHighlightStyles(doc);
let el = doc.getElementById(id) || doc.querySelector(`[id="${id}"]`) || doc.querySelector(`a[name="${id}"]`);
if (el && el.tagName && el.tagName.toLowerCase() === 'a' && el.parentElement) {
el = el.parentElement;
}
if (!el) return;
el.classList.remove('flash-highlight');
void el.offsetWidth;
el.classList.add('flash-highlight');
try { el.scrollIntoView({behavior:'smooth', block:'start', inline:'nearest'}); } catch {}
setTimeout(()=>{ try { el.classList.remove('flash-highlight'); } catch {} }, 1600);
} catch {}
}
function renderList(data, container, onSelect) {
container.innerHTML = '';
const groups = new Map();
for (const d of data) {
const top = (d.path.split('/')||[''])[0] || 'Other';
if (!groups.has(top)) groups.set(top, []);
groups.get(top).push(d);
}
const sortedGroups = Array.from(groups.entries()).sort((a,b)=> a[0].localeCompare(b[0]));
for (const [name, docs] of sortedGroups) {
const details = document.createElement('details');
details.className = 'group';
details.open = true;
const summary = document.createElement('summary');
summary.textContent = name + ' (' + docs.length + ')';
details.appendChild(summary);
const items = document.createElement('div');
items.className = 'group-items';
for (const doc of docs.sort((a,b)=> a.title.localeCompare(b.title))) {
const div = document.createElement('div');
div.className = 'item';
div.dataset.path = doc.path;
div.innerHTML = `<strong>${escapeHtml(doc.title)}</strong><small>${escapeHtml(doc.path)}</small>`;
div.onclick = async () => {
onSelect(doc, div);
// Populate section links below the active item
const sec = div.nextSibling && div.nextSibling.classList && div.nextSibling.classList.contains('sections') ? div.nextSibling : document.createElement('div');
sec.className = 'sections';
sec.innerHTML = '<div style="color:#475569;font-size:12px;padding:4px 8px;">Sections</div>';
const toc = await loadTocFor(doc.path);
for (const t of toc.slice(0, 200)) { // guard excessive items
const a = document.createElement('a');
a.className = 'section-link lvl' + Math.min(Math.max((t.level||2)-0,1),4);
a.textContent = t.text;
a.href = '#' + t.id;
a.onclick = (ev) => {
ev.preventDefault();
const iframe = document.querySelector('iframe.viewer');
// Preserve current doc path; update hash
const base = doc.path.split('#')[0];
iframe.src = base + '#' + t.id;
// Highlight selected section
const sibs = Array.from(sec.querySelectorAll('.section-link'));
sibs.forEach(el => el.classList.remove('active'));
a.classList.add('active');
// Prepare/trigger flash in iframe
pendingFlashId = t.id;
setTimeout(() => flashSectionInIframe(t.id), 80);
};
sec.appendChild(a);
}
if (div.nextSibling !== sec) {
div.parentNode.insertBefore(sec, div.nextSibling);
}
};
items.appendChild(div);
// Placeholder for sections below item
const secPh = document.createElement('div');
secPh.className = 'sections';
items.appendChild(secPh);
}
details.appendChild(items);
container.appendChild(details);
}
}
function escapeHtml(s){
return s.replace(/[&<>"']/g, function(c){
return ({
'&': '&amp;',
'<': '&lt;',
'>': '&gt;',
'"': '&quot;',
"'": '&#39;'
})[c];
});
}
window.addEventListener('DOMContentLoaded', async () => {
const listEl = document.querySelector('.list');
const searchEl = document.querySelector('.search input');
const frame = document.querySelector('iframe.viewer');
const titleEl = document.querySelector('.topbar .title');
const openEl = document.querySelector('.topbar .open');
function ensureIframeHighlightStyles(doc){
try {
if (!doc) return;
if (doc.getElementById('flash-highlight-style')) return;
const style = doc.createElement('style');
style.id = 'flash-highlight-style';
style.textContent = `@keyframes flashBg{0%{background:#fff3bf;}50%{background:#ffe066;}100%{background:transparent;}}
.flash-highlight{animation:flashBg 1.6s ease-in-out 1; outline:2px solid #f59e0b; outline-offset:2px;}`;
(doc.head || doc.documentElement).appendChild(style);
} catch {}
}
function flashSectionInIframe(id){
try {
const doc = frame.contentDocument;
if (!doc) return;
ensureIframeHighlightStyles(doc);
let el = doc.getElementById(id) || doc.querySelector(`[id="${id}"]`) || doc.querySelector(`a[name="${id}"]`);
if (el && el.tagName && el.tagName.toLowerCase() === 'a' && el.parentElement) {
el = el.parentElement;
}
if (!el) return;
el.classList.remove('flash-highlight');
void el.offsetWidth; // restart animation
el.classList.add('flash-highlight');
try { el.scrollIntoView({behavior:'smooth', block:'start', inline:'nearest'}); } catch {}
setTimeout(()=>{ try { el.classList.remove('flash-highlight'); } catch {} }, 1600);
} catch {}
}
frame.addEventListener('load', () => {
try {
const hash = (frame.contentWindow && frame.contentWindow.location && frame.contentWindow.location.hash) ? frame.contentWindow.location.hash : '';
const id = pendingFlashId || (hash ? hash.substring(1) : '');
if (id) {
setTimeout(() => flashSectionInIframe(id), 60);
}
pendingFlashId = null;
} catch {}
});
const all = await loadCatalog();
let filtered = all.slice();
let activeEl = null;
const select = (doc, el) => {
if (activeEl) activeEl.classList.remove('active');
activeEl = el; if (activeEl) activeEl.classList.add('active');
frame.src = doc.path; titleEl.textContent = doc.title; openEl.href = doc.path;
};
const applyFilter = () => {
const q = searchEl.value.toLowerCase().trim();
if (!q) { filtered = all.slice(); }
else {
filtered = all.filter(d => d.title.toLowerCase().includes(q) || d.path.toLowerCase().includes(q));
}
renderList(filtered, listEl, select);
};
searchEl.addEventListener('input', applyFilter);
applyFilter();
if (filtered.length) {
// Auto-select first doc
const firstItem = listEl.querySelector('.item');
if (firstItem) firstItem.click();
}
});
</script>
</head>
<body>
<div class=\"layout\">
<div class=\"sidebar\">
<div class=\"search\"><input type=\"search\" placeholder=\"Search title or path...\" /><div class=\"count\"></div></div>
<div class=\"list\"></div>
</div>
<div class=\"main\">
<div class=\"topbar\"><div class=\"title\">Select a document</div><a class=\"open-src\" target=\"_blank\" href=\"#\">Open source</a><a class=\"open\" target=\"_blank\" href=\"#\">Open doc</a><button class=\"theme\" type=\"button\">Theme: System</button></div>
<iframe class=\"viewer\"></iframe>
</div>
</div>
</body>
</html>
"""
    (dst_root / 'index.html').write_text(html, encoding='utf-8')


def main():
    parser = argparse.ArgumentParser(description='Convert LaTeX documents in a folder to HTML.')
    parser.add_argument('--src', default='tmp_Docs', help='Source directory with .tex files (default: tmp_Docs)')
    parser.add_argument('--dst', default='tmp_Docs_html', help='Output directory for HTML (default: tmp_Docs_html)')
    parser.add_argument('--all-tex', action='store_true', help='Attempt to convert every .tex file (not only those with \\documentclass)')
    parser.add_argument('--clean', action='store_true', help='Clean each output subfolder before converting')
    parser.add_argument('--math-output', choices=['mathml', 'mathjax'], default='mathjax',
                        help="Math format for make4ht: 'mathml' (generate MathML; rendered via MathJax SVG) or 'mathjax' (keep TeX; rendered via MathJax SVG)")
    # Shell-escape is needed for packages like minted that spawn external tools
    parser.add_argument('--shell-escape', dest='shell_escape', action='store_true', help='Enable -shell-escape for LaTeX runs')
    parser.add_argument('--no-shell-escape', dest='shell_escape', action='store_false', help='Disable -shell-escape for LaTeX runs')
    parser.set_defaults(shell_escape=True)
    parser.add_argument('--engine', choices=['pdflatex', 'lualatex', 'xelatex'], default='pdflatex', help='LaTeX engine to use')
    parser.add_argument('--postbuild-fix-images', action='store_true',
                        help='Run post-build image fixer to copy/normalize equation PNGs referenced by *_htmlwrap.html')
    args = parser.parse_args()

    src_root = Path(args.src).resolve()
    if not src_root.exists():
        # Be a bit forgiving for case: try tmp_Docs/tmp_docs
        alt = Path('tmp_docs')
        if alt.exists():
            src_root = alt.resolve()
        else:
            print(f"Source directory not found: {args.src}", file=sys.stderr)
            sys.exit(2)
    dst_root = Path(args.dst).resolve()
    dst_root.mkdir(parents=True, exist_ok=True)

    tex_files = sorted(src_root.rglob('*.tex'))
    if not tex_files:
        print('No .tex files found.', file=sys.stderr)
        sys.exit(1)
    print(f"Found {len(tex_files)} .tex files under {src_root}")

    converted = 0
    failed = 0
    skipped = 0
    results = []
    for tex in tex_files:
        is_main = detect_main_tex(tex)
        if not (args.all_tex or is_main):
            skipped += 1
            continue
        print(f"Converting: {tex.relative_to(src_root)}")
        ok, summary, html_path = convert_one(tex, src_root, dst_root, clean=args.clean, math_output=args.math_output, shell_escape=args.shell_escape, engine=args.engine)
        results.append((tex, ok, summary, html_path))
        if ok:
            converted += 1
            print(f" ✓ Success via {summary}. Output: {html_path.relative_to(dst_root)}")
        else:
            failed += 1
            print(f" ✗ Failed ({summary}). See build.log under {html_path.relative_to(dst_root)}")

    print('\nConversion summary:')
    print(f" Converted: {converted}")
    print(f" Failed: {failed}")
    print(f" Skipped: {skipped}")

    # Write viewer and catalog for easy navigation
    try:
        count = _write_catalog(dst_root, results)
        _write_viewer(dst_root)
        print(f"\nWrote catalog with {count} entries: {dst_root / 'docs_index.json'}")
        print(f"Open the viewer: {dst_root / 'index.html'}")
    except Exception as e:
        print(f"Warning: failed to write viewer/catalog: {e}", file=sys.stderr)

    # Optional: run post-build image fixer
    if args.postbuild_fix_images:
        try:
            repo_root = Path(__file__).resolve().parent
            fixer = repo_root / 'scripts' / 'post_build_fix_images.py'
            if fixer.exists():
                print("\nRunning post-build image fixer...")
                subprocess.run([sys.executable, str(fixer)], check=False)
            else:
                print(f"post-build fixer not found at {fixer}")
        except Exception as e:
            print(f"Warning: post-build fixer failed: {e}", file=sys.stderr)

    # Provide a helpful exit code
    sys.exit(0 if failed == 0 else 1)


if __name__ == '__main__':
    main()
PY ?= python3

.PHONY: postbuild-fix-images
postbuild-fix-images:
	$(PY) scripts/post_build_fix_images.py
#!/usr/bin/env python3
"""
Post-build fixer for HTML image assets produced by tex4ht.
It performs two fixes:
1) Copies math/equation PNGs alongside each *_htmlwrap.html, renaming from
<base>{Nx}.png (in tmp_Docs/<Module>/) to <base>_htmlwrap{Nx}.png as referenced
by the generated HTML under tmp_Docs_html.
2) Normalizes case-only mismatches (e.g. .PNG vs .png) by creating a lowercase
copy next to the HTML when the referenced image is missing but a case-variant
exists.
Safe to run multiple times; it only copies missing targets.
"""
from __future__ import annotations
import os
import re
import shutil
from pathlib import Path
RE_HTMLWRAP_IMG = re.compile(r"([A-Za-z0-9_]+)_htmlwrap([0-9]+x)\.png$", re.I)
def fix_equation_images(html_base: Path, docs_base: Path) -> int:
    """Copy <base>{Nx}.png from tmp_Docs/<Module>/ to match *_htmlwrap{Nx}.png
    next to each *_htmlwrap.html under tmp_Docs_html.
    Returns number of files copied.
    """
    copied = 0
    for html in html_base.rglob("*_htmlwrap.html"):
        html_dir = html.parent
        try:
            rel = html.relative_to(html_base)
        except Exception:
            continue
        parts = rel.parts
        if not parts:
            continue
        module = parts[0]
        # Read HTML and find PNG references
        try:
            txt = html.read_text(errors="ignore")
        except Exception:
            continue
        for m in re.finditer(r'src="([^"]+\.png)"', txt, flags=re.I):
            img = m.group(1)
            base = os.path.basename(img)
            m2 = RE_HTMLWRAP_IMG.search(base)
            if not m2:
                continue
            base_name = m2.group(1)
            suffix = m2.group(2)  # like 0x, 1x, ...
            dst = html_dir / f"{base_name}_htmlwrap{suffix}.png"
            if dst.exists():
                continue
            # Source: tmp_Docs/<Module>/<base>{Nx}.png
            src = docs_base / module / f"{base_name}{suffix}.png"
            if src.exists():
                dst.parent.mkdir(parents=True, exist_ok=True)
                shutil.copyfile(src, dst)
                copied += 1
    return copied


def fix_case_mismatches(html_base: Path) -> int:
    """For each *_htmlwrap.html, if a referenced .png is missing, search for a
    file in the same directory with the same basename but different case (e.g. .PNG)
    and copy it to the expected lowercase name.
    Returns number of files copied.
    """
    copied = 0
    for html in html_base.rglob("*_htmlwrap.html"):
        html_dir = html.parent
        try:
            txt = html.read_text(errors="ignore")
        except Exception:
            continue
        for m in re.finditer(r'src="([^"]+\.png)"', txt, flags=re.I):
            rel_img = m.group(1)
            dst = html_dir / rel_img
            if dst.exists():
                continue
            # Look for case-insensitive match in the same directory
            candidate = None
            rel_name = os.path.basename(rel_img)
            for entry in html_dir.glob("*"):
                if not entry.is_file():
                    continue
                if entry.name.lower() == rel_name.lower():
                    candidate = entry
                    break
            if candidate and candidate != dst:
                dst.parent.mkdir(parents=True, exist_ok=True)
                shutil.copyfile(candidate, dst)
                copied += 1
    return copied


def main() -> None:
    repo_root = Path(__file__).resolve().parents[1]
    html_base = repo_root / "tmp_Docs_html"
    docs_base = repo_root / "tmp_Docs"
    if not html_base.exists():
        print(f"No {html_base} found; nothing to do.")
        return
    total = 0
    total += fix_equation_images(html_base, docs_base)
    total += fix_case_mismatches(html_base)
    print(f"post_build_fix_images: copied {total} files")


if __name__ == "__main__":
    main()

Setup

This project converts LaTeX documents to HTML using tex4ht tooling. It relies on a LaTeX distribution with tex4ht and (optionally) the minted package, plus Pygments for code highlighting.

Below are concise, platform-specific install instructions and a complete dependency checklist.

Quick Install

  • Debian/Ubuntu (recommended minimal set):

    • sudo apt update
    • sudo apt install -y texlive texlive-latex-recommended texlive-latex-extra texlive-pictures texlive-fonts-recommended texlive-plain-generic texlive-tex4ht texlive-xetex texlive-luatex
    • sudo apt install -y python3-pygments # provides pygmentize
  • Debian/Ubuntu (easy, larger install):

    • sudo apt install -y texlive-full
    • sudo apt install -y python3-pygments
  • Fedora:

    • sudo dnf install -y texlive-scheme-medium texlive-tex4ht texlive-minted texlive-collection-latexrecommended texlive-collection-latexextra texlive-collection-pictures python3-pygments
  • Arch Linux:

    • sudo pacman -S --needed texlive texlive-langenglish python-pygments # the texlive group replaced texlive-most in 2023
  • macOS (Homebrew):

    • brew install --cask mactex # large; or install BasicTeX (brew install --cask basictex) and add the packages you need with tlmgr
    • brew install pygments
    • Ensure /Library/TeX/texbin is in PATH (for TeX tools)
  • Windows:

    • Install MiKTeX (with tex4ht and minted packages) and ensure miktex/bin is in PATH
    • Install Python and Pygments: py -m pip install --user Pygments (this installs pygmentize into %USERPROFILE%\AppData\Roaming\Python\PythonXY\Scripts, where XY is your Python version)
    • Make sure that Scripts folder is in your PATH

Dependency Checklist

Required command-line tools:

  • python3 (3.8+)
  • make4ht (tex4ht wrapper)
  • htlatex (tex4ht legacy wrapper)
  • pdflatex (or lualatex, xelatex if you choose those engines)
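
Before running the converter, it can help to confirm that the tools listed above resolve on PATH. The following is a minimal sketch, not part of the gist: the names REQUIRED_TOOLS, find_tools, and missing_tools are hypothetical, and the only library call used is the standard shutil.which.

```python
import shutil
from typing import Callable, Dict, Iterable, List, Optional

# Tool names taken from the checklist above; swap pdflatex for
# lualatex or xelatex if you plan to pass a different --engine.
REQUIRED_TOOLS = ("make4ht", "htlatex", "pdflatex")


def find_tools(
    names: Iterable[str],
    which: Callable[[str], Optional[str]] = shutil.which,
) -> Dict[str, Optional[str]]:
    """Map each tool name to its resolved path, or None if it is not on PATH."""
    return {name: which(name) for name in names}


def missing_tools(
    names: Iterable[str],
    which: Callable[[str], Optional[str]] = shutil.which,
) -> List[str]:
    """Return the tool names that could not be resolved, in input order."""
    return [name for name, path in find_tools(names, which).items() if path is None]


if __name__ == "__main__":
    for name, path in find_tools(REQUIRED_TOOLS).items():
        print(f"{name:10s} {path or 'NOT FOUND'}")
```

The which parameter exists only so the lookup can be replaced in tests; in normal use the default shutil.which is enough.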

Required LaTeX packages (commonly available via TeX Live/MiKTeX collections):

  • Core: the article document class, plus hyperref, graphicx, xcolor, amsmath, amssymb, amsfonts, longtable, supertabular
  • HTML conversion: tex4ht and friends (installed via texlive-tex4ht, provides make4ht, htlatex, t4ht)
  • Code listings: minted (optional but used by these documents)
    • Requires -shell-escape enabled in LaTeX runs
    • Requires pygmentize on PATH
  • Other commonly referenced packages in the docs: listings, floatrow, caption, siunitx, tabu, pbox, booktabs, todonotes, tikz/pgf, epstopdf
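
One way to check whether the packages above are installed is to ask kpsewhich, the file-lookup tool shipped with TeX Live and MiKTeX, for each .sty file. This is a rough sketch under that assumption; the helper names kpsewhich and missing_packages are hypothetical and do not appear in the converter. Document classes such as article would need a .cls lookup instead.

```python
import subprocess
from typing import Callable, Iterable, List


def kpsewhich(filename: str) -> str:
    """Return the path kpsewhich prints for filename, or '' if nothing is found."""
    try:
        proc = subprocess.run(
            ["kpsewhich", filename],
            capture_output=True,
            text=True,
            check=False,
        )
    except FileNotFoundError:
        # kpsewhich itself is not installed or not on PATH.
        return ""
    return proc.stdout.strip()


def missing_packages(
    packages: Iterable[str],
    lookup: Callable[[str], str] = kpsewhich,
) -> List[str]:
    """Return the package names whose <name>.sty file could not be located."""
    return [pkg for pkg in packages if not lookup(f"{pkg}.sty")]
```

For example, missing_packages(["minted", "siunitx", "booktabs"]) returns the subset that the local TeX installation cannot find; an empty list means all three resolved.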

Optional/conditional tools:

  • pygmentize (from Python Pygments). Install via your OS package manager (python3-pygments on Debian/Ubuntu) or pip install --user Pygments. Ensure ~/.local/bin (Linux), the Python Scripts directory (Windows), or Homebrew prefix (macOS) is in PATH.
  • gnuplot (only if the source documents actually use gnuplottex; many docs include it but don’t necessarily execute it during HTML builds).
  • Ghostscript (for epstopdf if EPS images must be converted).

PATH and Shell Escape

  • Ensure the directory containing pygmentize is in your PATH. On Linux, this is commonly ~/.local/bin when installed via pip --user.
  • The converter enables -shell-escape for LaTeX runs by default; pass --no-shell-escape to turn it off. If your TeX installation forbids shell escape, either allow it or pre-generate the minted caches by running pdflatex -shell-escape once on the source .tex files in an environment where it is permitted.
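
The two minted requirements above can be checked per document before a build. The sketch below is illustrative only: uses_minted and minted_problems are hypothetical helpers, the regular expression is a simple heuristic that ignores commented-out lines, and it does not account for an existing minted cache, which may make pygmentize unnecessary.

```python
import re
import shutil
from typing import Callable, List, Optional

# Heuristic: a \usepackage line (optionally with [options]) that names minted,
# with no % comment marker earlier on the same line.
MINTED_RE = re.compile(
    r"^[^%\n]*\\usepackage(?:\[[^\]]*\])?\{[^}]*\bminted\b[^}]*\}",
    re.M,
)


def uses_minted(tex_source: str) -> bool:
    """Return True if the LaTeX source appears to load the minted package."""
    return bool(MINTED_RE.search(tex_source))


def minted_problems(
    tex_source: str,
    shell_escape: bool,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> List[str]:
    """List reasons a minted document is likely to fail; empty if none apply."""
    problems: List[str] = []
    if not uses_minted(tex_source):
        return problems
    if not shell_escape:
        problems.append("minted is loaded but -shell-escape is disabled")
    if which("pygmentize") is None:
        problems.append("minted is loaded but pygmentize is not on PATH")
    return problems
```

A caller could read each main .tex file, pass its text together with the chosen shell-escape setting, and print any returned messages before starting the conversion.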

Network Access Notes

  • When make4ht succeeds, the converter injects MathJax via a CDN for rendering math in the browser. Viewing those HTML pages with math requires network access. If network access is unavailable, the fallback htlatex output often includes math as images and is viewable offline.
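
To see which converted pages depend on the network, one option is to scan the output folder for script tags that load MathJax from a remote URL. This sketch assumes the injected tag has a src attribute containing "mathjax" and an http(s) or protocol-relative URL; the exact tag the converter writes is not shown here, so treat the pattern and the names needs_network and pages_needing_network as assumptions.

```python
import re
from pathlib import Path
from typing import List

# Assumed shape of the injected tag: <script ... src="https://.../mathjax...">
SCRIPT_SRC_RE = re.compile(
    r'<script[^>]*\bsrc\s*=\s*["\'](?:https?:)?//[^"\']*mathjax[^"\']*["\']',
    re.I,
)


def needs_network(html_text: str) -> bool:
    """Return True if the page loads MathJax from a remote URL."""
    return bool(SCRIPT_SRC_RE.search(html_text))


def pages_needing_network(root: Path) -> List[Path]:
    """Return the .html files under root that reference a remote MathJax script."""
    pages = []
    for page in sorted(root.rglob("*.html")):
        text = page.read_text(encoding="utf-8", errors="ignore")
        if needs_network(text):
            pages.append(page)
    return pages
```

Calling pages_needing_network(Path("tmp_Docs_html")) would list the pages that need a connection to render math; pages produced by the htlatex fallback with image-based math should not appear.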