This document captures lessons learned from translating German PowerPoint presentations to English while preserving formatting.
This guide uses the PPTX skill from Anthropic's open-source Claude Code skills. The skill provides scripts for unpacking, validating, and repacking PPTX files by manipulating the underlying Office Open XML (OOXML) format.
Install location: ~/.claude/skills/pptx/
Key scripts used:
- `ooxml/scripts/unpack.py` - Extract PPTX to XML files
- `ooxml/scripts/validate.py` - Validate XML structure
- `ooxml/scripts/pack.py` - Repack XML to PPTX
Critical insight: A translation script can only replace text in XML. It cannot:
- See how text actually renders in the slide
- Detect text overflow, wrapping, or overlap
- Know if translated text fits in fixed-width boxes
- Identify visual collisions between elements
The only reliable way to catch rendering issues is visual inspection of actual rendered slides.
- English translations are often longer than German originals
- Fixed-width text boxes cause overflow/wrapping when translation is longer
- Script/decorative fonts are especially problematic due to larger character width
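A rough pre-check can surface risky strings before rendering. The sketch below assumes that character count is a usable proxy for rendered width, which is only approximately true and least reliable for script fonts. The function name `flag_long_translations` and the ratio threshold are illustrative choices, not part of the PPTX skill.

```python
def flag_long_translations(translations, max_ratio=1.2):
    """Return (german, english, ratio) for entries whose English text is
    longer than max_ratio times the German text, largest ratio first."""
    flagged = []
    for german, english in translations.items():
        if not german:
            continue  # skip empty keys to avoid division by zero
        ratio = len(english) / len(german)
        if ratio > max_ratio:
            flagged.append((german, english, round(ratio, 2)))
    return sorted(flagged, key=lambda item: item[2], reverse=True)


translations = {
    "Vorschläge": "Suggestions",
    "fast jedes": "nearly every",
    "Merkmale von Fakes": "Fake signs",
}
for german, english, ratio in flag_long_translations(translations, max_ratio=1.0):
    print(f"{german!r} -> {english!r} ({ratio}x)")
```

Flagged entries are candidates for a closer look, not confirmed overflows; the rendered slide remains the only reliable check.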
Critical: Not all text in a slide is editable. Some content is embedded as images:
- Charts - Often embedded as images with labels baked in
- Timelines - Frequently embedded infographics
- Infographics - Complex graphics with text inside images
- Screenshots - Social media posts, news articles, etc.
How to identify embedded images:
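- Search the slide XML for the German text. If it is not found, the text is most likely part of an embedded image (it can also be split across several `<a:r>` runs, so check the neighboring runs first)
- Look for `<p:pic>` elements with `title="Chart"` or similar
- Check `unpacked/ppt/media/` for image files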
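The checks above can be sketched with the standard library's `xml.etree.ElementTree`, used here for simplicity. The helper name `describe_slide` and the sample XML are hypothetical; the namespace URIs are the standard DrawingML and PresentationML ones.

```python
import xml.etree.ElementTree as ET

NS = {
    "a": "http://schemas.openxmlformats.org/drawingml/2006/main",
    "p": "http://schemas.openxmlformats.org/presentationml/2006/main",
}


def describe_slide(slide_xml):
    """Return (texts, pictures): all editable <a:t> strings, and the
    (name, title) attributes of every <p:pic> on the slide."""
    root = ET.fromstring(slide_xml)
    texts = [t.text for t in root.iter(f"{{{NS['a']}}}t") if t.text]
    pictures = []
    for pic in root.iter(f"{{{NS['p']}}}pic"):
        props = pic.find("p:nvPicPr/p:cNvPr", NS)
        if props is not None:
            pictures.append((props.get("name"), props.get("title")))
    return texts, pictures


# Hypothetical minimal slide: one editable text run and one picture.
sample = (
    '<p:sld xmlns:a="http://schemas.openxmlformats.org/drawingml/2006/main" '
    'xmlns:p="http://schemas.openxmlformats.org/presentationml/2006/main" '
    'xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships">'
    '<p:sp><p:txBody><a:p><a:r><a:t>Vorschläge</a:t></a:r></a:p></p:txBody></p:sp>'
    '<p:pic><p:nvPicPr><p:cNvPr id="601" name="Google Shape;601;p81" title="Chart"/>'
    '</p:nvPicPr><p:blipFill><a:blip r:embed="rId6"/></p:blipFill></p:pic>'
    '</p:sld>'
)
texts, pictures = describe_slide(sample)
print("editable text:", texts)
print("pictures:", pictures)
```

If German text is visible on a rendered slide but missing from `texts`, and `pictures` is non-empty, the text is probably baked into one of those images.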
Acceptable to leave as-is: German text inside embedded images cannot be changed through XML editing, so leave it untranslated.
uv venv && source .venv/bin/activate
uv pip install defusedxml

python ~/.claude/skills/pptx/ooxml/scripts/unpack.py input.pptx unpacked

# Use markitdown to extract text content
python -m markitdown input.pptx

Or extract from XML directly:
import re
with open('unpacked/ppt/slides/slide1.xml', 'r', encoding='utf-8') as f:
    content = f.read()
# Collect the text of every <a:t> run on the slide
texts = re.findall(r'<a:t>([^<]+)</a:t>', content)

- Translate text while being mindful of character length
- For script/decorative text, use SHORTER translations that fit the same space
- For headings in fixed-width boxes, abbreviate if necessary
- Include full context strings (e.g., "09:00–09:30 Anmeldung" not just "Anmeldung")
Replace text within <a:t>...</a:t> tags using regex:
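from xml.sax.saxutils import escape
# A function replacement keeps backslashes in the English text literal; escape() assumes plain (unescaped) input
pattern = re.compile(r'(<a:t>)' + re.escape(german) + r'(</a:t>)')
content = pattern.sub(lambda m: m.group(1) + escape(english) + m.group(2), content)

For text that overflows, find the shape in XML and increase cx (width):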
<a:ext cx="1520400" cy="708000"/> <!-- Increase cx value -->

python ~/.claude/skills/pptx/ooxml/scripts/validate.py unpacked --original input.pptx
python ~/.claude/skills/pptx/ooxml/scripts/pack.py unpacked output.pptx

Critical: After translation, ALWAYS verify by:
- Converting both the original and the translated deck to images
- Comparing all slides side by side
- Using parallel Sonnet subagents to check batches of slides
- Manually verifying any reported issues, since agents may produce false positives
# Generate individual slide images
soffice --headless --convert-to pdf --outdir slides_translated output.pptx
/opt/homebrew/bin/pdftoppm -jpeg -r 150 slides_translated/output.pdf slides_translated/slide

Launch 4 parallel Sonnet agents to verify slides in batches:
- Agent 1: Slides 1-10
- Agent 2: Slides 11-20
- Agent 3: Slides 21-30
- Agent 4: Slides 31-40
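For decks that are not exactly 40 slides, the batch ranges can be computed rather than written by hand. This is a minimal sketch; `slide_batches` is a hypothetical helper, and splitting into contiguous, near-equal ranges is an assumption about how the work is best divided.

```python
def slide_batches(total_slides, agents):
    """Split slides 1..total_slides into up to `agents` contiguous,
    near-equal (first, last) ranges."""
    base, extra = divmod(total_slides, agents)
    batches, start = [], 1
    for i in range(agents):
        size = base + (1 if i < extra else 0)  # spread the remainder over the first batches
        if size == 0:
            continue  # more agents than slides
        batches.append((start, start + size - 1))
        start += size
    return batches


for number, (first, last) in enumerate(slide_batches(40, 4), start=1):
    print(f"Agent {number}: Slides {first}-{last}")
```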
Important instructions for agents:
- Embedded images with German text are ACCEPTABLE - do not flag
- Only report issues with EDITABLE text
- Check for: untranslated text, text overflow, text overlap, formatting issues
Agents may produce false positives:
- Confusing embedded images with editable text
- Misreading translated text as German
- Looking at wrong slides
Always manually verify reported issues by reading the actual slide images before fixing.
Continue the loop until zero issues:
- Fix reported issues
- Re-unpack, re-apply translations, re-pack
- Re-generate slide images
- Re-verify with parallel agents
- Manually confirm any reported issues
- Repeat until clean
After running translation scripts, you MUST:
soffice --headless --convert-to pdf output.pptx
pdftoppm -jpeg -r 150 output.pdf workspace/slides/slide

Look at each rendered slide image for:
- Text overflow/wrapping causing overlap
- Decorative text colliding with headers or content
- Text cut off at boundaries
- Missing translations
For each visual issue found:
- Identify the shape in the slide XML by searching for the text
- Find position/size attributes:
  - `<a:off x="..." y="..."/>` - position
  - `<a:ext cx="..." cy="..."/>` - size (width, height)
- Edit values to fix the issue:
- Move text: change x, y values
- Widen box: increase cx value
- Shorten translation if positioning doesn't help
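The "widen box" fix can be scripted as a plain string edit, which leaves the rest of the file byte-for-byte unchanged. The sketch below assumes each shape is written as `<p:sp>` with no attributes and that the text sits in a single `<a:t>` run; `set_shape_width` is a hypothetical helper, not one of the skill's scripts.

```python
import re
from xml.sax.saxutils import escape


def set_shape_width(slide_xml, text, new_cx):
    """Set cx on the first <a:ext ... cx="..."> of the <p:sp> that contains
    an <a:t> run equal to `text`. Returns the XML unchanged if none matches."""
    needle = f"<a:t>{escape(text)}</a:t>"
    for match in re.finditer(r"<p:sp>.*?</p:sp>", slide_xml, flags=re.DOTALL):
        shape = match.group(0)
        if needle not in shape:
            continue
        patched = re.sub(
            r'(<a:ext\b[^>]*?\bcx=")\d+(")',
            lambda m: f"{m.group(1)}{new_cx}{m.group(2)}",
            shape,
            count=1,
        )
        return slide_xml[:match.start()] + patched + slide_xml[match.end():]
    return slide_xml


# Hypothetical two-shape slide fragment.
slide = (
    '<p:sp><p:spPr><a:xfrm><a:off x="3678913" y="1651655"/>'
    '<a:ext cx="1520400" cy="708000"/></a:xfrm></p:spPr>'
    '<p:txBody><a:p><a:r><a:t>nearly</a:t></a:r></a:p></p:txBody></p:sp>'
    '<p:sp><p:spPr><a:xfrm><a:off x="0" y="0"/>'
    '<a:ext cx="500000" cy="300000"/></a:xfrm></p:spPr>'
    '<p:txBody><a:p><a:r><a:t>other</a:t></a:r></a:p></p:txBody></p:sp>'
)
patched = set_shape_width(slide, "nearly", 2200000)
print('cx="2200000"' in patched)
```

Moving a shape works the same way with the `x` and `y` attributes of `<a:off>`. Either edit still needs a re-render to confirm the result.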
python ~/.claude/skills/pptx/ooxml/scripts/pack.py unpacked output.pptx
soffice --headless --convert-to pdf output.pptx
pdftoppm -jpeg -r 150 -f N -l N output.pdf workspace/slides/slide # specific slides

| Slide | Problem | XML Fix |
|---|---|---|
| 9 | "nearly every" wrapped, overlapped "fourth child" | Changed text to "nearly", widened cx from 1520400 to 2200000 |
| 36 | "Fake signs" overlapped with "FAKES" header | Moved x from 1121076 to 200000, y from 634160 to 1250000 |
| 37 | "If suspected fake:" overlapped with "FAKES" | Shortened to "Check:", moved x to 150000, y to 700000 |
Symptom: Words wrap to next line, overlapping other content

Solution:
- Use shorter translation
- OR increase text box width (cx attribute)
- OR shift text box position (x attribute)
- OR enable text autofit in XML
Symptom: Handwritten-style text overlaps with other elements

Solution:
- Use significantly shorter translations
- "Vorschläge" → "Ideas" (not "Suggestions")
- "Antworten zum Wissenspiel" → "Quiz answers" (not "Answers to the knowledge game")
Symptom: Titles escape boundaries or get cut off

Solution:
- Shorten translation
- Adjust text box position/width in XML
Symptom: Some text doesn't appear in translated version

Cause: Translation pattern not matched (often due to HTML entities or context)

Solution:
- Include HTML entity variants in translation dict
- Include full context strings (with time prefixes, punctuation, etc.)
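Entity variants can be generated instead of typed by hand. The sketch below assumes the XML writes non-ASCII characters as decimal numeric character references (for example `&#228;` for ä); if a deck uses hexadecimal references instead, the format string would need to change. Both helper names are hypothetical.

```python
def entity_variants(text):
    """Return the spellings of `text` that may appear inside <a:t>: the raw
    string, plus a copy with every non-ASCII character written as a decimal
    numeric character reference."""
    numeric = "".join(ch if ord(ch) < 128 else f"&#{ord(ch)};" for ch in text)
    return [text] if numeric == text else [text, numeric]


def expand_translations(translations):
    """Add an entity-encoded key for every German key that needs one."""
    expanded = {}
    for german, english in translations.items():
        for variant in entity_variants(german):
            expanded[variant] = english
    return expanded


expanded = expand_translations({
    "Vorschläge": "Ideas",
    "09:00–09:30 Anmeldung": "09:00–09:30 Registration",
})
for key in expanded:
    print(key)
```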
Symptom: Items like "09:00–09:30 Anmeldung" not translated

Cause: Translation dict only has "Anmeldung" without time prefix

Solution: Add full string with time prefix:
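"09:00–09:30 Anmeldung": "09:00–09:30 Registration",

Character references that commonly appear in the XML (shown here as decimal numeric references):

- `&#228;` = ä
- `&#246;` = ö
- `&#252;` = ü
- `&#223;` = ß
- `&#8211;` = – (en-dash)
- `&#8222;` = „ (German opening quote)
- `&#8220;` = “ (closing quote)
- `&#160;` = non-breaking space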
<p:sp>
<p:nvSpPr>
<p:cNvPr id="578" name="Google Shape;578;p79"/>
</p:nvSpPr>
<p:spPr>
<a:xfrm>
<a:off x="3678913" y="1651655"/> <!-- Position -->
<a:ext cx="1520400" cy="708000"/> <!-- Size: width, height -->
</a:xfrm>
</p:spPr>
<p:txBody>
<a:bodyPr>
<a:noAutofit/> <!-- Change to <a:normAutofit/> for auto-shrink -->
</a:bodyPr>
<a:p>
<a:r>
<a:t>Text content here</a:t>
</a:r>
</a:p>
</p:txBody>
</p:sp>

<!-- Charts are often embedded images -->
<p:pic>
<p:nvPicPr>
<p:cNvPr id="601" name="Google Shape;601;p81" title="Chart"/>
</p:nvPicPr>
<p:blipFill>
<a:blip r:embed="rId6"/> <!-- Reference to image file -->
</p:blipFill>
</p:pic>

- `<a:noAutofit/>` - No auto-sizing (default, text can overflow)
- `<a:normAutofit/>` - Shrink text to fit
- `<a:spAutoFit/>` - Expand shape to fit text
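To see which shapes are candidates for overflow, the autofit mode of each shape can be listed and, where appropriate, switched. This is a sketch under the same assumptions as the shape XML above (attribute-free `<p:sp>` tags, self-closing `<a:noAutofit/>`); both function names are hypothetical, and whether a given renderer applies the shrink immediately should be confirmed on the rendered slide.

```python
import re

AUTOFIT_TAGS = ("noAutofit", "normAutofit", "spAutoFit")


def autofit_modes(slide_xml):
    """Map the first text run of each shape to its autofit element name,
    or None when the shape declares no autofit element."""
    modes = {}
    for shape in re.findall(r"<p:sp>.*?</p:sp>", slide_xml, flags=re.DOTALL):
        run = re.search(r"<a:t>([^<]*)</a:t>", shape)
        if run is None:
            continue  # shape without text
        mode = next(
            (tag for tag in AUTOFIT_TAGS if re.search(rf"<a:{tag}\b", shape)),
            None,
        )
        modes[run.group(1)] = mode
    return modes


def shrink_on_overflow(shape_xml):
    """Switch a shape from no autofit to shrink-text-to-fit."""
    return shape_xml.replace("<a:noAutofit/>", "<a:normAutofit/>")


# Hypothetical shape fragment.
shape = (
    '<p:sp><p:txBody><a:bodyPr><a:noAutofit/></a:bodyPr>'
    '<a:p><a:r><a:t>Quiz answers</a:t></a:r></a:p></p:txBody></p:sp>'
)
print(autofit_modes(shape))
print(autofit_modes(shrink_on_overflow(shape)))
```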
| German | Too Long (causes overflow) | Short Enough |
|---|---|---|
| fast jedes | almost every / nearly every | nearly |
| Vorschläge | Suggestions | Ideas |
| Antworten zum Wissenspiel | Answers to the knowledge game | Quiz answers |
| Bei Verdacht auf Fake prüfen | When suspecting a fake, check / If fake suspected, check | Check: |
| Merkmale von Fakes | Characteristics of Fakes / Fake characteristics | Fake signs |
| für die aktive Teilnahme | for your active participation | for your active participation! |
Key insight: When decorative/script fonts are involved, even "reasonable" translations may be too long. Be aggressive with shortening.
After translation, check EVERY slide for:
- Text overflow/wrapping
- Text escaping boundaries
- Missing text (not translated)
- Overlapping text
- Layout shifts
- Font rendering issues
- Embedded images (acceptable to have German)
- markitdown: Extract text from PPTX
- LibreOffice soffice: Convert PPTX to PDF
- pdftoppm: Convert PDF pages to images (use `/opt/homebrew/bin/pdftoppm` on macOS)
- PPTX skill scripts: unpack.py, validate.py, pack.py, thumbnail.py
- uv: Python virtual environment manager
- pdftoppm path: `/opt/homebrew/bin/pdftoppm`
- LibreOffice: `soffice --headless --convert-to pdf`
- Use a virtual environment because of the externally-managed-environment restriction: `uv venv && source .venv/bin/activate`