PowerPoint Translation Guide

Overview

This document captures lessons learned from translating German PowerPoint presentations to English while preserving formatting.

Prerequisites: PPTX Skill

This guide uses the PPTX skill from Anthropic's open-source Claude Code skills. The skill provides scripts for unpacking, validating, and repacking PPTX files by manipulating the underlying Office Open XML (OOXML) format.

Install location: ~/.claude/skills/pptx/

Key scripts used:

ooxml/scripts/unpack.py - Extract PPTX to XML files
ooxml/scripts/validate.py - Validate XML structure
ooxml/scripts/pack.py - Repack XML to PPTX

Key Principles

1. Scripts Cannot See Rendered Output - Visual Inspection is MANDATORY

Critical insight: A translation script can only replace text in XML. It cannot:

See how text actually renders in the slide
Detect text overflow, wrapping, or overlap
Know if translated text fits in fixed-width boxes
Identify visual collisions between elements

The only reliable way to catch rendering issues is visual inspection of actual rendered slides.

2. Text Length Awareness

English translations can be longer than the German originals, especially for short labels and phrases
Fixed-width text boxes cause overflow/wrapping when translation is longer
Script/decorative fonts are especially problematic due to larger character width
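
A rough way to spot risky strings before rendering is to compare character counts. The sketch below is only a proxy: the function name and the max_ratio threshold are illustrative assumptions, and character count does not account for font or glyph width, so it complements visual inspection rather than replacing it.

```python
def flag_long_translations(translations, max_ratio=1.0):
    """Return (german, english, ratio) for pairs whose English text exceeds the ratio."""
    flagged = []
    for german, english in translations.items():
        ratio = len(english) / max(len(german), 1)
        if ratio > max_ratio:
            flagged.append((german, english, round(ratio, 2)))
    # Worst offenders first
    return sorted(flagged, key=lambda item: item[2], reverse=True)


candidates = {
    "fast jedes": "nearly every",
    "Vorschläge": "Ideas",
    "Merkmale von Fakes": "Fake signs",
}
for german, english, ratio in flag_long_translations(candidates):
    print(f"{german!r} -> {english!r} is {ratio}x the original length")
```

Only "nearly every" is flagged here, which matches the overflow seen on slide 9.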

3. Embedded Images vs Editable Text

Critical: Not all text in a slide is editable. Some content is embedded as images:

Charts - Often embedded as images with labels baked in
Timelines - Frequently embedded infographics
Infographics - Complex graphics with text inside images
Screenshots - Social media posts, news articles, etc.

How to identify embedded images:

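Search the slide XML for the German text - if it is not found, the text is most likely part of an embedded image (a phrase split across several <a:r> runs can also fail to match, so try a single word first)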
Look for <p:pic> elements with title="Chart" or similar
Check unpacked/ppt/media/ for image files
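
The checks above can be partly automated by listing every picture element in a slide. This is a minimal sketch using the standard library parser; list_pictures is a hypothetical helper, not part of the PPTX skill, and defusedxml.ElementTree could be swapped in for untrusted files.

```python
import xml.etree.ElementTree as ET

NS = {
    "p": "http://schemas.openxmlformats.org/presentationml/2006/main",
    "a": "http://schemas.openxmlformats.org/drawingml/2006/main",
    "r": "http://schemas.openxmlformats.org/officeDocument/2006/relationships",
}


def list_pictures(slide_xml):
    """Return name, title, and relationship id for every <p:pic> in a slide."""
    root = ET.fromstring(slide_xml)
    pictures = []
    for pic in root.iter(f"{{{NS['p']}}}pic"):
        props = pic.find("p:nvPicPr/p:cNvPr", NS)
        blip = pic.find("p:blipFill/a:blip", NS)
        pictures.append({
            "name": props.get("name") if props is not None else None,
            "title": props.get("title") if props is not None else None,
            "embed": blip.get(f"{{{NS['r']}}}embed") if blip is not None else None,
        })
    return pictures
```

A slide whose German text is missing from the XML but which has entries in this list is a strong candidate for text baked into an image.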

Acceptable to leave as-is: German text inside embedded images cannot be translated through XML editing, so it stays untranslated.

4. Translation Process

Step 1: Set Up Virtual Environment

uv venv && source .venv/bin/activate
uv pip install defusedxml

Step 2: Unpack the PPTX

python ~/.claude/skills/pptx/ooxml/scripts/unpack.py input.pptx unpacked

Step 3: Extract All Text

# Use markitdown to extract text content
python -m markitdown input.pptx

Or extract from XML directly:

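import re

# Read the slide XML as UTF-8 (the encoding OOXML parts normally use)
with open('unpacked/ppt/slides/slide1.xml', 'r', encoding='utf-8') as f:
    content = f.read()
# Collect the raw contents of every <a:t> run; entities such as &#8211; stay undecoded
texts = re.findall(r'<a:t>([^<]+)</a:t>', content)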
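
To cover the whole deck rather than one slide, the same idea can be run over every slide file. The sketch below is an assumption-laden variant, not the skill's own tooling: extract_texts and slide_number are hypothetical helpers, and it uses an XML parser instead of a regex, so entities are decoded (&#8211; comes back as –).

```python
import glob
import os
import re
import xml.etree.ElementTree as ET

A_T = "{http://schemas.openxmlformats.org/drawingml/2006/main}t"


def slide_number(path):
    """Pull the number out of a name like slide12.xml for numeric sorting."""
    return int(re.search(r"slide(\d+)\.xml$", path).group(1))


def extract_texts(unpacked_dir):
    """Map each slide file name to the non-empty <a:t> strings it contains."""
    pattern = os.path.join(unpacked_dir, "ppt", "slides", "slide*.xml")
    texts = {}
    for path in sorted(glob.glob(pattern), key=slide_number):
        root = ET.parse(path).getroot()
        texts[os.path.basename(path)] = [
            node.text for node in root.iter(A_T) if node.text and node.text.strip()
        ]
    return texts
```

Calling extract_texts("unpacked") gives a per-slide inventory to translate from. Because the strings are decoded, they may differ from the raw XML wherever the file uses entities, which matters when matching with a regex later.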

Step 4: Create Translations

Translate text while being mindful of character length
For script/decorative text, use SHORTER translations that fit the same space
For headings in fixed-width boxes, abbreviate if necessary
Include full context strings (e.g., "09:00–09:30 Anmeldung" not just "Anmeldung")

Step 5: Apply Translations

Replace text within <a:t>...</a:t> tags using regex:

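# german/english: one XML-escaped source/target pair; content: the slide XML (needs `import re`)
pattern = re.compile(r'(<a:t>)(' + re.escape(german) + r')(</a:t>)')
# A function replacement keeps any backslashes in the English text literal
content = pattern.sub(lambda m: m.group(1) + english + m.group(3), content)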
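
Applied across a whole translation dict, it helps to record which keys never matched, since those are the strings that will silently stay German. A minimal sketch, assuming the keys and values are already XML-escaped; apply_translations is a hypothetical helper name.

```python
import re


def apply_translations(content, translations):
    """Replace whole <a:t> bodies that equal a key; return new XML and unmatched keys."""
    unmatched = []
    for german, english in translations.items():
        pattern = re.compile(r"(<a:t>)" + re.escape(german) + r"(</a:t>)")
        # A function replacement keeps backslashes in the English text literal
        content, count = pattern.subn(
            lambda m, text=english: m.group(1) + text + m.group(2), content
        )
        if count == 0:
            unmatched.append(german)
    return content, unmatched
```

Because the pattern matches the full run, "Anmeldung" does not touch a run that reads "09:00&#8211;09:30 Anmeldung". That run needs its own dict entry.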

Step 6: Adjust Text Box Widths (if needed)

For text that overflows, find the shape in XML and increase cx (width):

<a:ext cx="1520400" cy="708000"/>  <!-- Increase cx value -->
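
The cx and cy values are in English Metric Units (914400 per inch). The edit can be scripted as in the sketch below, where widen_shape is a hypothetical helper. It assumes the shape is a plain <p:sp> with its own <a:xfrm>; placeholders that inherit their size from the layout have no <a:ext> to change and come back unmodified.

```python
import re

EMU_PER_INCH = 914400  # English Metric Units per inch in OOXML


def widen_shape(content, text, new_cx):
    """Set cx on the first <a:ext> of the <p:sp> shape whose <a:t> equals `text`."""
    for shape in re.finditer(r"<p:sp>.*?</p:sp>", content, flags=re.DOTALL):
        block = shape.group(0)
        if f"<a:t>{text}</a:t>" not in block:
            continue
        widened = re.sub(
            r'(<a:ext cx=")\d+(")', rf"\g<1>{new_cx}\g<2>", block, count=1
        )
        return content[: shape.start()] + widened + content[shape.end():]
    raise ValueError(f"no shape contains {text!r}")
```

For example, widen_shape(content, "nearly", 2200000) reproduces the slide 9 fix, taking the box from about 1.66 to about 2.41 inches wide.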

Step 7: Validate and Repack

python ~/.claude/skills/pptx/ooxml/scripts/validate.py unpacked --original input.pptx
python ~/.claude/skills/pptx/ooxml/scripts/pack.py unpacked output.pptx

Step 8: ALWAYS Verify Visually

Critical: After translation, ALWAYS verify the result:

Convert both the original and the translated deck to images
Compare all slides side by side
Use parallel Sonnet subagents to check batches of slides
Manually verify any reported issues - agents may produce false positives

# Generate individual slide images
soffice --headless --convert-to pdf --outdir slides_translated output.pptx
/opt/homebrew/bin/pdftoppm -jpeg -r 150 slides_translated/output.pdf slides_translated/slide
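
For the side-by-side comparison, the rendered images can be paired by page number. The sketch below assumes the original deck was rendered the same way into a second folder (slides_original is a hypothetical name) and relies on pdftoppm's output naming, which appends a hyphen and the page number to the prefix.

```python
import os
import re


def pair_slide_images(original_dir, translated_dir):
    """Match images from two folders by the page number at the end of each file name."""
    def by_page(folder):
        pages = {}
        for name in os.listdir(folder):
            match = re.search(r"-(\d+)\.jpg$", name)
            if match:
                pages[int(match.group(1))] = os.path.join(folder, name)
        return pages

    original, translated = by_page(original_dir), by_page(translated_dir)
    return [(n, original[n], translated[n]) for n in sorted(original) if n in translated]
```

pair_slide_images("slides_original", "slides_translated") returns (page, original_path, translated_path) tuples in slide order, ready to hand to a reviewer or agent.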

Verification Workflow

Parallel Agent Verification

Launch 4 parallel Sonnet agents to verify slides in batches:

Agent 1: Slides 1-10
Agent 2: Slides 11-20
Agent 3: Slides 21-30
Agent 4: Slides 31-40
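
For decks that are not exactly 40 slides, the batch ranges can be computed instead of hard-coded. A minimal sketch; slide_batches is a hypothetical helper and the default of 4 agents simply mirrors the split above.

```python
def slide_batches(total_slides, agents=4):
    """Split slides 1..total_slides into contiguous, near-equal (first, last) ranges."""
    size, extra = divmod(total_slides, agents)
    batches, start = [], 1
    for index in range(agents):
        # The first `extra` batches take one additional slide
        count = size + (1 if index < extra else 0)
        if count == 0:
            break
        batches.append((start, start + count - 1))
        start += count
    return batches


print(slide_batches(40))  # [(1, 10), (11, 20), (21, 30), (31, 40)]
```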

Important instructions for agents:

Embedded images with German text are ACCEPTABLE - do not flag
Only report issues with EDITABLE text
Check for: untranslated text, text overflow, text overlap, formatting issues

Agent Verification Caveats

Agents may produce false positives:

Confusing embedded images with editable text
Misreading translated text as German
Looking at wrong slides

Always manually verify reported issues by reading the actual slide images before fixing.

Fix-Verify Loop

Continue the loop until zero issues:

Fix reported issues
Re-unpack, re-apply translations, re-pack
Re-generate slide images
Re-verify with parallel agents
Manually confirm any reported issues
Repeat until clean

Manual Fix Workflow (Post-Translation)

After running translation scripts, you MUST:

Step 1: Generate Slide Images

mkdir -p workspace/slides  # pdftoppm does not create the output folder
soffice --headless --convert-to pdf output.pptx
pdftoppm -jpeg -r 150 output.pdf workspace/slides/slide

Step 2: Visually Inspect EVERY Slide

Look at each rendered slide image for:

Text overflow/wrapping causing overlap
Decorative text colliding with headers or content
Text cut off at boundaries
Missing translations

Step 3: Fix Rendering Issues by Editing XML Directly

For each visual issue found:

Identify the shape in the slide XML by searching for the text
Find position/size attributes:
- <a:off x="..." y="..."/> - position
- <a:ext cx="..." cy="..."/> - size (width, height)
Edit values to fix the issue:
- Move text: change x, y values
- Widen box: increase cx value
- Shorten translation if positioning doesn't help
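
Before editing, it helps to read the current values for the shape in question. The sketch below does the lookup with a regex; shape_geometry is a hypothetical helper, and it assumes the text sits in a plain <p:sp> shape that carries its own <a:off> and <a:ext>.

```python
import re


def shape_geometry(content, text):
    """Return x, y, cx, cy (in EMU) for the first <p:sp> whose <a:t> equals `text`."""
    for shape in re.finditer(r"<p:sp>.*?</p:sp>", content, flags=re.DOTALL):
        block = shape.group(0)
        if f"<a:t>{text}</a:t>" not in block:
            continue
        off = re.search(r'<a:off x="(-?\d+)" y="(-?\d+)"', block)
        ext = re.search(r'<a:ext cx="(\d+)" cy="(\d+)"', block)
        if off and ext:
            return {
                "x": int(off.group(1)), "y": int(off.group(2)),
                "cx": int(ext.group(1)), "cy": int(ext.group(2)),
            }
    return None  # text not found, or the shape inherits its geometry
```

A None result usually means the text is split across runs or the shape takes its position from the slide layout, in which case the XML has to be inspected by hand.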

Step 4: Rebuild and Re-verify

python ~/.claude/skills/pptx/ooxml/scripts/pack.py unpacked output.pptx
soffice --headless --convert-to pdf output.pptx
pdftoppm -jpeg -r 150 -f N -l N output.pdf workspace/slides/slide  # specific slides

Real Examples from This Project

Slide	Problem	XML Fix
9	"nearly every" wrapped, overlapped "fourth child"	Changed text to "nearly", widened cx from 1520400 to 2200000
36	"Fake signs" overlapped with "FAKES" header	Moved x from 1121076 to 200000, y from 634160 to 1250000
37	"If suspected fake:" overlapped with "FAKES"	Shortened to "Check:", moved x to 150000, y to 700000

Common Formatting Issues

Issue 1: Text Wrapping in Fixed Boxes

Symptom: Words wrap to the next line, overlapping other content
Solution:

Use shorter translation
OR increase text box width (cx attribute)
OR shift text box position (x attribute)
OR enable text autofit in XML

Issue 2: Script/Decorative Text Overlap

Symptom: Handwritten-style text overlaps with other elements
Solution:

Use significantly shorter translations
"Vorschläge" → "Ideas" (not "Suggestions")
"Antworten zum Wissenspiel" → "Quiz answers" (not "Answers to the knowledge game")

Issue 3: Title Text Too Long

Symptom: Titles escape boundaries or get cut off
Solution:

Shorten translation
Adjust text box position/width in XML

Issue 4: Missing Text After Translation

Symptom: Some text doesn't appear in translated version
Cause: Translation pattern not matched (often due to HTML entities or context)
Solution:

Include HTML entity variants in translation dict
Include full context strings (with time prefixes, punctuation, etc.)
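
Entity variants can be generated rather than typed by hand. The sketch below adds a copy of each pair with non-ASCII characters written as decimal character references. Both function names are hypothetical, and hexadecimal references (such as &#xE4;) are not covered.

```python
def numeric_entities(text):
    """Rewrite every non-ASCII character as a decimal character reference."""
    return "".join(ch if ord(ch) < 128 else f"&#{ord(ch)};" for ch in text)


def with_entity_variants(translations):
    """Add an entity-encoded copy of each pair so either spelling in the XML matches."""
    expanded = dict(translations)
    for german, english in translations.items():
        expanded.setdefault(numeric_entities(german), numeric_entities(english))
    return expanded


print(numeric_entities("09:00–09:30 Anmeldung"))  # 09:00&#8211;09:30 Anmeldung
```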

Issue 5: Schedule/List Items Not Matched

Symptom: Items like "09:00–09:30 Anmeldung" not translated
Cause: Translation dict only has "Anmeldung" without time prefix
Solution: Add full string with time prefix:

"09:00&#8211;09:30 Anmeldung": "09:00&#8211;09:30 Registration",

HTML Entities Reference

&#228; = ä
&#246; = ö
&#252; = ü
&#223; = ß
&#8211; = – (en-dash)
&#8222; = „ (German opening quote)
&#8220; = “ (German closing quote)
&#160; = non-breaking space

Text Box XML Structure

<p:sp>
  <p:nvSpPr>
    <p:cNvPr id="578" name="Google Shape;578;p79"/>
  </p:nvSpPr>
  <p:spPr>
    <a:xfrm>
      <a:off x="3678913" y="1651655"/>  <!-- Position -->
      <a:ext cx="1520400" cy="708000"/> <!-- Size: width, height -->
    </a:xfrm>
  </p:spPr>
  <p:txBody>
    <a:bodyPr>
      <a:noAutofit/>  <!-- Change to <a:normAutofit/> for auto-shrink -->
    </a:bodyPr>
    <a:p>
      <a:r>
        <a:t>Text content here</a:t>
      </a:r>
    </a:p>
  </p:txBody>
</p:sp>

Embedded Image Detection

<!-- Charts are often embedded images -->
<p:pic>
  <p:nvPicPr>
    <p:cNvPr id="601" name="Google Shape;601;p81" title="Chart"/>
  </p:nvPicPr>
  <p:blipFill>
    <a:blip r:embed="rId6"/>  <!-- Reference to image file -->
  </p:blipFill>
</p:pic>
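
The r:embed id points into the slide's relationships file (for slide1.xml that is ppt/slides/_rels/slide1.xml.rels), which names the actual media file. A minimal sketch for resolving it; image_targets is a hypothetical helper and slide_dir assumes the standard package layout.

```python
import posixpath
import xml.etree.ElementTree as ET

REL_NS = "{http://schemas.openxmlformats.org/package/2006/relationships}"


def image_targets(rels_xml, slide_dir="ppt/slides"):
    """Map relationship ids to package paths for the image entries of a .rels file."""
    targets = {}
    for rel in ET.fromstring(rels_xml).iter(f"{REL_NS}Relationship"):
        if rel.get("Type", "").endswith("/image"):
            # Targets are relative to the slide's folder, e.g. ../media/image3.png
            targets[rel.get("Id")] = posixpath.normpath(
                posixpath.join(slide_dir, rel.get("Target"))
            )
    return targets
```

Opening the resolved file under the unpacked folder confirms whether the German text is baked into that image.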

Autofit Options

<a:noAutofit/> - No auto-sizing (default, text can overflow)
<a:normAutofit/> - Shrink text to fit
<a:spAutoFit/> - Expand shape to fit text
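
Switching a shape to shrink-to-fit can be scripted as below. This is a sketch under assumptions: enable_shrink_to_fit is a hypothetical helper, it only handles the self-closing <a:noAutofit/> form, and how much the text actually shrinks is decided by the rendering application, so the slide still needs to be re-rendered and inspected.

```python
import re


def enable_shrink_to_fit(content, text):
    """Swap <a:noAutofit/> for <a:normAutofit/> in the shape whose <a:t> equals `text`."""
    for shape in re.finditer(r"<p:sp>.*?</p:sp>", content, flags=re.DOTALL):
        block = shape.group(0)
        if f"<a:t>{text}</a:t>" in block and "<a:noAutofit/>" in block:
            updated = block.replace("<a:noAutofit/>", "<a:normAutofit/>", 1)
            return content[: shape.start()] + updated + content[shape.end():]
    return content  # nothing to change
```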

Length-Aware Translation Examples

German	Too Long (causes overflow)	Short Enough
fast jedes	almost every / nearly every	nearly
Vorschläge	Suggestions	Ideas
Antworten zum Wissenspiel	Answers to the knowledge game	Quiz answers
Bei Verdacht auf Fake prüfen	When suspecting a fake, check / If fake suspected, check	Check:
Merkmale von Fakes	Characteristics of Fakes / Fake characteristics	Fake signs
für die aktive Teilnahme	for your active participation	for your active participation!

Key insight: When decorative/script fonts are involved, even "reasonable" translations may be too long. Be aggressive with shortening.

Verification Checklist

After translation, check EVERY slide for untranslated text, text overflow, text overlap, and formatting issues.

Tools Used

markitdown: Extract text from PPTX
LibreOffice soffice: Convert PPTX to PDF
pdftoppm: Convert PDF pages to images (use /opt/homebrew/bin/pdftoppm on macOS)
PPTX skill scripts: unpack.py, validate.py, pack.py, thumbnail.py
uv: Python virtual environment manager

macOS-Specific Notes

pdftoppm path: /opt/homebrew/bin/pdftoppm
LibreOffice: soffice --headless --convert-to pdf
Use a virtual environment because of the externally-managed-environment restriction:
```
uv venv && source .venv/bin/activate
```
