@vlad-ds
Last active January 5, 2026 19:49
PowerPoint Translation Guide - Translating PPTX files while preserving formatting using Anthropic's PPTX skill

PowerPoint Translation Guide

Overview

This document captures lessons learned from translating German PowerPoint presentations to English while preserving formatting.

Prerequisites: PPTX Skill

This guide uses the PPTX skill from Anthropic's official, open-source Claude Code skills. The skill provides scripts for unpacking, validating, and repacking PPTX files by manipulating the underlying Office Open XML (OOXML) format.

Install location: ~/.claude/skills/pptx/

Key scripts used:

  • ooxml/scripts/unpack.py - Extract PPTX to XML files
  • ooxml/scripts/validate.py - Validate XML structure
  • ooxml/scripts/pack.py - Repack XML to PPTX

Key Principles

1. Scripts Cannot See Rendered Output - Visual Inspection is MANDATORY

Critical insight: A translation script can only replace text in XML. It cannot:

  • See how text actually renders in the slide
  • Detect text overflow, wrapping, or overlap
  • Know if translated text fits in fixed-width boxes
  • Identify visual collisions between elements

The only reliable way to catch rendering issues is visual inspection of actual rendered slides.

2. Text Length Awareness

  • English translations can be longer than the German originals, especially literal ones
  • Fixed-width text boxes cause overflow/wrapping when translation is longer
  • Script/decorative fonts are especially problematic due to larger character width
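
A rough pre-check can flag risky translations before any slide is rendered. The sketch below is an illustration only: the function name and the `max_ratio` threshold are assumptions, and character count is just a proxy that ignores font width, so it does not replace visual inspection.

```python
def flag_long_translations(translations, max_ratio=1.0):
    """Return (german, english, ratio) for pairs where the English text
    is longer than max_ratio times the German text (by character count)."""
    flagged = []
    for german, english in translations.items():
        if not german:
            continue  # nothing to compare against
        ratio = len(english) / len(german)
        if ratio > max_ratio:
            flagged.append((german, english, round(ratio, 2)))
    return flagged


candidates = {
    "Vorschläge": "Suggestions",         # 11 vs 10 characters: flagged
    "Merkmale von Fakes": "Fake signs",  # shorter than the original: fine
}
for german, english, ratio in flag_long_translations(candidates):
    print(f"{german!r} -> {english!r} is {ratio}x the original length")
```

For script or decorative fonts a lower threshold is probably safer, since those glyphs are wider.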

3. Embedded Images vs Editable Text

Critical: Not all text in a slide is editable. Some content is embedded as images:

  • Charts - Often embedded as images with labels baked in
  • Timelines - Frequently embedded infographics
  • Infographics - Complex graphics with text inside images
  • Screenshots - Social media posts, news articles, etc.

How to identify embedded images:

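  1. Search the slide XML for the German text - if it is not found, the text is probably part of an embedded image (also try the entity-encoded form, e.g. &#228; for ä, before concluding this)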
  2. Look for <p:pic> elements with title="Chart" or similar
  3. Check unpacked/ppt/media/ for image files

Acceptable to leave as-is: German text in embedded images cannot be translated through XML editing and should be accepted.
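
The identification steps above can be sketched in Python. This is a minimal sketch, not part of the PPTX skill: the function names are hypothetical, and it assumes the slide XML is passed in as a string without an encoding declaration. Joining the runs of each paragraph also finds text that is split across several `<a:t>` elements.

```python
import xml.etree.ElementTree as ET

# Standard OOXML namespace URIs for DrawingML (a:) and PresentationML (p:)
NS = {
    "a": "http://schemas.openxmlformats.org/drawingml/2006/main",
    "p": "http://schemas.openxmlformats.org/presentationml/2006/main",
}


def list_pictures(slide_xml):
    """Return (name, title) for every <p:pic> element in the slide."""
    root = ET.fromstring(slide_xml)
    pictures = []
    for pic in root.iter("{%s}pic" % NS["p"]):
        props = pic.find("p:nvPicPr/p:cNvPr", NS)
        if props is not None:
            pictures.append((props.get("name"), props.get("title")))
    return pictures


def has_editable_text(slide_xml, needle):
    """True if `needle` occurs in the joined runs of any <a:p> paragraph."""
    root = ET.fromstring(slide_xml)
    for para in root.iter("{%s}p" % NS["a"]):
        text = "".join(t.text or "" for t in para.iter("{%s}t" % NS["a"]))
        if needle in text:
            return True
    return False
```

If `has_editable_text` returns False for a German phrase that is visible on the rendered slide and `list_pictures` is non-empty, the phrase is most likely baked into one of those pictures.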

4. Translation Process

Step 1: Set Up Virtual Environment

uv venv && source .venv/bin/activate
uv pip install defusedxml

Step 2: Unpack the PPTX

python ~/.claude/skills/pptx/ooxml/scripts/unpack.py input.pptx unpacked

Step 3: Extract All Text

# Use markitdown to extract text content
python -m markitdown input.pptx

Or extract from XML directly:

import re

# Read one slide as UTF-8 and collect the raw text of every <a:t> run
with open('unpacked/ppt/slides/slide1.xml', 'r', encoding='utf-8') as f:
    content = f.read()
texts = re.findall(r'<a:t>([^<]+)</a:t>', content)
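
To cover every slide rather than just slide1.xml, the same regex can be run over the whole unpacked folder. The sketch below assumes the standard `ppt/slides/slideN.xml` layout produced by unpacking; the helper names are hypothetical. It also decodes entities with `html.unescape` so the extracted text is readable for the translator.

```python
import html
import re
from pathlib import Path

RUN_TEXT = re.compile(r"<a:t>([^<]*)</a:t>")


def slide_number(path):
    """Numeric part of a name like 'slide12.xml', used for natural sorting."""
    return int(re.search(r"(\d+)", Path(path).stem).group(1))


def decode_runs(xml_text):
    """Return the decoded, non-blank text runs found in one slide's XML."""
    runs = [html.unescape(raw) for raw in RUN_TEXT.findall(xml_text)]
    return [run for run in runs if run.strip()]


def extract_texts(unpacked_dir):
    """Map each slide file name to its text runs, in slide order."""
    slides_dir = Path(unpacked_dir) / "ppt" / "slides"
    result = {}
    for path in sorted(slides_dir.glob("slide*.xml"), key=slide_number):
        result[path.name] = decode_runs(path.read_text(encoding="utf-8"))
    return result
```

Calling `extract_texts("unpacked")` gives one list of strings per slide. Keep the raw (entity-encoded) form at hand as well, because that is what the replacement step has to match.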

Step 4: Create Translations

  • Translate text while being mindful of character length
  • For script/decorative text, use SHORTER translations that fit the same space
  • For headings in fixed-width boxes, abbreviate if necessary
  • Include full context strings (e.g., "09:00–09:30 Anmeldung" not just "Anmeldung")

Step 5: Apply Translations

Replace text within <a:t>...</a:t> tags using regex:

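# Match only when the German string is the whole <a:t> run; the lambda keeps backslashes in `english` literal
pattern = re.compile(r'(<a:t>)' + re.escape(german) + r'(</a:t>)')
content = pattern.sub(lambda m: m.group(1) + english + m.group(2), content)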
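
Applied to a whole translation dict, the replacement can also report which keys never matched, which is the usual cause of text that silently stays German. This is a sketch under one assumption: keys and values are written in the same raw form as the XML (entities included), as in the examples in this guide. The function name is hypothetical.

```python
import re


def apply_translations(content, translations):
    """Replace whole <a:t> runs and return (new_content, unmatched_keys)."""
    unmatched = []
    for german, english in translations.items():
        pattern = re.compile(r"(<a:t>)" + re.escape(german) + r"(</a:t>)")
        # A function replacement inserts the English text literally,
        # so backslashes in it are never read as group references.
        content, count = pattern.subn(
            lambda m, text=english: m.group(1) + text + m.group(2), content
        )
        if count == 0:
            unmatched.append(german)
    return content, unmatched


slide = (
    "<a:r><a:t>Vorschl&#228;ge</a:t></a:r>"
    "<a:r><a:t>09:00&#8211;09:30 Anmeldung</a:t></a:r>"
)
new_slide, missed = apply_translations(
    slide, {"Vorschl&#228;ge": "Ideas", "Anmeldung": "Registration"}
)
print(missed)  # keys that matched no complete run
```

Here "Anmeldung" is reported as unmatched because the run also contains the time prefix. Printing the unmatched list after every pass makes such gaps visible before rendering.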

Step 6: Adjust Text Box Widths (if needed)

For text that overflows, find the shape in XML and increase cx (width):

<a:ext cx="1520400" cy="708000"/>  <!-- Increase cx value -->
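
The width change can be scripted with plain string editing, which leaves the rest of the file byte-for-byte intact. A minimal sketch with a hypothetical function name, assuming shapes are written as `<p:sp>` without attributes and carry their own `<a:xfrm>` (shapes that inherit their size from a layout have no `<a:ext cx=...>` and are left unchanged):

```python
import re

SHAPE = re.compile(r"<p:sp>.*?</p:sp>", re.DOTALL)
EXT_CX = re.compile(r'(<a:ext cx=")(\d+)(")')


def widen_shape(content, text, new_cx):
    """Set cx on the first <a:ext> of each <p:sp> that has a run equal to `text`."""
    needle = "<a:t>" + text + "</a:t>"

    def fix(match):
        shape = match.group(0)
        if needle not in shape:
            return shape  # not the shape we are looking for
        return EXT_CX.sub(
            lambda m: m.group(1) + str(new_cx) + m.group(3), shape, count=1
        )

    return SHAPE.sub(fix, content)
```

For example, `widen_shape(content, "nearly", 2200000)` rewrites only the box that holds the run "nearly". The new width is a guess until the slide is rendered again.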

Step 7: Validate and Repack

python ~/.claude/skills/pptx/ooxml/scripts/validate.py unpacked --original input.pptx
python ~/.claude/skills/pptx/ooxml/scripts/pack.py unpacked output.pptx

Step 8: ALWAYS Verify Visually

Critical: After translation, ALWAYS verify by:

  1. Converting both original and translated to images
  2. Comparing all slides side-by-side
  3. Using parallel Sonnet subagents to check batches of slides
  4. Manually verifying any reported issues - agents may produce false positives

# Generate individual slide images
soffice --headless --convert-to pdf --outdir slides_translated output.pptx
/opt/homebrew/bin/pdftoppm -jpeg -r 150 slides_translated/output.pdf slides_translated/slide

Verification Workflow

Parallel Agent Verification

Launch 4 parallel Sonnet agents to verify slides in batches:

  • Agent 1: Slides 1-10
  • Agent 2: Slides 11-20
  • Agent 3: Slides 21-30
  • Agent 4: Slides 31-40
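
The batches above fit a 40-slide deck. For other deck sizes the ranges can be computed; the helper below is a hypothetical sketch that splits the slides into contiguous, near-equal ranges.

```python
def slide_batches(total_slides, agents=4):
    """Split slides 1..total_slides into contiguous (first, last) ranges."""
    base, extra = divmod(total_slides, agents)
    batches = []
    start = 1
    for i in range(agents):
        # The first `extra` batches take one additional slide each
        size = base + (1 if i < extra else 0)
        if size == 0:
            continue  # fewer slides than agents
        batches.append((start, start + size - 1))
        start += size
    return batches


for number, (first, last) in enumerate(slide_batches(40), start=1):
    print(f"Agent {number}: Slides {first}-{last}")
```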

Important instructions for agents:

  1. Embedded images with German text are ACCEPTABLE - do not flag
  2. Only report issues with EDITABLE text
  3. Check for: untranslated text, text overflow, text overlap, formatting issues

Agent Verification Caveats

Agents may produce false positives:

  • Confusing embedded images with editable text
  • Misreading translated text as German
  • Looking at wrong slides

Always manually verify reported issues by reading the actual slide images before fixing.

Fix-Verify Loop

Continue the loop until zero issues:

  1. Fix reported issues
  2. Re-unpack, re-apply translations, re-pack
  3. Re-generate slide images
  4. Re-verify with parallel agents
  5. Manually confirm any reported issues
  6. Repeat until clean

Manual Fix Workflow (Post-Translation)

After running translation scripts, you MUST:

Step 1: Generate Slide Images

mkdir -p workspace/slides  # pdftoppm does not create the output directory
soffice --headless --convert-to pdf output.pptx
pdftoppm -jpeg -r 150 output.pdf workspace/slides/slide

Step 2: Visually Inspect EVERY Slide

Look at each rendered slide image for:

  • Text overflow/wrapping causing overlap
  • Decorative text colliding with headers or content
  • Text cut off at boundaries
  • Missing translations

Step 3: Fix Rendering Issues by Editing XML Directly

For each visual issue found:

  1. Identify the shape in the slide XML by searching for the text
  2. Find position/size attributes:
    • <a:off x="..." y="..."/> - position
    • <a:ext cx="..." cy="..."/> - size (width, height)
  3. Edit values to fix the issue:
    • Move text: change x, y values
    • Widen box: increase cx value
    • Shorten translation if positioning doesn't help

Step 4: Rebuild and Re-verify

python ~/.claude/skills/pptx/ooxml/scripts/pack.py unpacked output.pptx
soffice --headless --convert-to pdf output.pptx
pdftoppm -jpeg -r 150 -f N -l N output.pdf workspace/slides/slide  # specific slides

Real Examples from This Project

| Slide | Problem | XML Fix |
|-------|---------|---------|
| 9 | "nearly every" wrapped, overlapped "fourth child" | Changed text to "nearly", widened cx from 1520400 to 2200000 |
| 36 | "Fake signs" overlapped with "FAKES" header | Moved x from 1121076 to 200000, y from 634160 to 1250000 |
| 37 | "If suspected fake:" overlapped with "FAKES" | Shortened to "Check:", moved x to 150000, y to 700000 |

Common Formatting Issues

Issue 1: Text Wrapping in Fixed Boxes

Symptom: Words wrap to next line, overlapping other content
Solution:

  • Use shorter translation
  • OR increase text box width (cx attribute)
  • OR shift text box position (x attribute)
  • OR enable text autofit in XML

Issue 2: Script/Decorative Text Overlap

Symptom: Handwritten-style text overlaps with other elements
Solution:

  • Use significantly shorter translations
  • "Vorschläge" → "Ideas" (not "Suggestions")
  • "Antworten zum Wissenspiel" → "Quiz answers" (not "Answers to the knowledge game")

Issue 3: Title Text Too Long

Symptom: Titles escape boundaries or get cut off
Solution:

  • Shorten translation
  • Adjust text box position/width in XML

Issue 4: Missing Text After Translation

Symptom: Some text doesn't appear in translated version
Cause: Translation pattern not matched (often because the XML stores characters as entities such as &#8211;, or because the key lacks its surrounding context)
Solution:

  • Include HTML entity variants in translation dict
  • Include full context strings (with time prefixes, punctuation, etc.)

Issue 5: Schedule/List Items Not Matched

Symptom: Items like "09:00–09:30 Anmeldung" not translated
Cause: Translation dict only has "Anmeldung" without time prefix
Solution: Add full string with time prefix:

"09:00&#8211;09:30 Anmeldung": "09:00&#8211;09:30 Registration",

HTML Entities Reference

  • &#228; = ä
  • &#246; = ö
  • &#252; = ü
  • &#223; = ß
  • &#8211; = – (en-dash)
  • &#8222; = „ (German opening quote)
  • &#8220; = “ (German closing quote)
  • &#160; = non-breaking space
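
Entity variants do not have to be typed by hand. The sketch below derives them from a dict written with literal characters; the function names are hypothetical. It assumes the XML uses decimal character references for all non-ASCII characters, as in the list above, and it keeps the literal keys too in case a file stores the characters directly.

```python
import html


def to_char_refs(text):
    """Encode every non-ASCII character as a decimal character reference."""
    return "".join(ch if ord(ch) < 128 else "&#%d;" % ord(ch) for ch in text)


def with_entity_variants(translations):
    """Return a copy of the dict with an entity-encoded twin of each pair."""
    expanded = dict(translations)
    for german, english in translations.items():
        expanded[to_char_refs(german)] = to_char_refs(english)
    return expanded


print(to_char_refs("09:00–09:30 Anmeldung"))
print(html.unescape("Vorschl&#228;ge"))  # the reverse direction
```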

Text Box XML Structure

<p:sp>
  <p:nvSpPr>
    <p:cNvPr id="578" name="Google Shape;578;p79"/>
  </p:nvSpPr>
  <p:spPr>
    <a:xfrm>
      <a:off x="3678913" y="1651655"/>  <!-- Position -->
      <a:ext cx="1520400" cy="708000"/> <!-- Size: width, height -->
    </a:xfrm>
  </p:spPr>
  <p:txBody>
    <a:bodyPr>
      <a:noAutofit/>  <!-- Change to <a:normAutofit/> for auto-shrink -->
    </a:bodyPr>
    <a:p>
      <a:r>
        <a:t>Text content here</a:t>
      </a:r>
    </a:p>
  </p:txBody>
</p:sp>

Embedded Image Detection

<!-- Charts are often embedded images -->
<p:pic>
  <p:nvPicPr>
    <p:cNvPr id="601" name="Google Shape;601;p81" title="Chart"/>
  </p:nvPicPr>
  <p:blipFill>
    <a:blip r:embed="rId6"/>  <!-- Reference to image file -->
  </p:blipFill>
</p:pic>
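
The `r:embed` id can be resolved to an actual file through the slide's relationships part, which in a standard OOXML package sits at `ppt/slides/_rels/slideN.xml.rels` (that path comes from the package format, not from this guide's scripts). A minimal sketch with a hypothetical function name, taking the rels XML as a string:

```python
import posixpath
import xml.etree.ElementTree as ET

# Namespace of relationship parts in the Open Packaging Conventions
REL_NS = "http://schemas.openxmlformats.org/package/2006/relationships"


def image_targets(rels_xml, slides_dir="ppt/slides"):
    """Map relationship ids (e.g. 'rId6') to image paths inside the package."""
    root = ET.fromstring(rels_xml)
    targets = {}
    for rel in root.iter("{%s}Relationship" % REL_NS):
        if rel.get("Type", "").endswith("/image"):
            # Targets are relative to the slide, e.g. '../media/image3.png'
            joined = posixpath.join(slides_dir, rel.get("Target"))
            targets[rel.get("Id")] = posixpath.normpath(joined)
    return targets
```

Opening the resolved file under `unpacked/` shows whether the German text is baked into that image.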

Autofit Options

  • <a:noAutofit/> - No auto-sizing (default, text can overflow)
  • <a:normAutofit/> - Shrink text to fit
  • <a:spAutoFit/> - Expand shape to fit text
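
Switching one shape from `<a:noAutofit/>` to `<a:normAutofit/>` can be done with the same string-editing approach used for sizes. This is a sketch with a hypothetical function name; it assumes `<p:sp>` elements without attributes and a self-closing `<a:noAutofit/>`. How much the text actually shrinks is decided by the rendering application, so the result still needs a visual check.

```python
import re

SHAPE = re.compile(r"<p:sp>.*?</p:sp>", re.DOTALL)


def enable_shrink_on_overflow(content, text):
    """Swap noAutofit for normAutofit in each <p:sp> with a run equal to `text`."""
    needle = "<a:t>" + text + "</a:t>"

    def fix(match):
        shape = match.group(0)
        if needle not in shape:
            return shape  # leave all other shapes untouched
        return shape.replace("<a:noAutofit/>", "<a:normAutofit/>")

    return SHAPE.sub(fix, content)
```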

Length-Aware Translation Examples

| German | Too Long (causes overflow) | Short Enough |
|--------|----------------------------|--------------|
| fast jedes | almost every / nearly every | nearly |
| Vorschläge | Suggestions | Ideas |
| Antworten zum Wissenspiel | Answers to the knowledge game | Quiz answers |
| Bei Verdacht auf Fake prüfen | When suspecting a fake, check / If fake suspected, check | Check: |
| Merkmale von Fakes | Characteristics of Fakes / Fake characteristics | Fake signs |
| für die aktive Teilnahme | for your active participation | for your active participation! |

Key insight: When decorative/script fonts are involved, even "reasonable" translations may be too long. Be aggressive with shortening.

Verification Checklist

After translation, check EVERY slide for:

  • Text overflow/wrapping
  • Text escaping boundaries
  • Missing text (not translated)
  • Overlapping text
  • Layout shifts
  • Font rendering issues
  • Embedded images (acceptable to have German)

Tools Used

  • markitdown: Extract text from PPTX
  • LibreOffice soffice: Convert PPTX to PDF
  • pdftoppm: Convert PDF pages to images (use /opt/homebrew/bin/pdftoppm on macOS)
  • PPTX skill scripts: unpack.py, validate.py, pack.py, thumbnail.py
  • uv: Python virtual environment manager

macOS-Specific Notes

  • pdftoppm path: /opt/homebrew/bin/pdftoppm
  • LibreOffice: soffice --headless --convert-to pdf
  • Use virtual environment due to externally-managed-environment restriction:
    uv venv && source .venv/bin/activate