@ewlarson
Created December 10, 2025 19:20
Gemini Prompt — Extract text from maps, determine placenames, guess bbox
Prompt:
Digitized Historical Map OCR + Placenames + Bounding Box + Description
Your task is to analyze a digitized historical map image and produce a high-accuracy extraction of printed textual content, geographic interpretation, and an accessibility description, along with explicit reasoning metadata.
Follow these instructions exactly.
1️⃣ Text Extraction (Printed Text Only)
Focus only on printed text on the map (ignore handwriting unless it clearly functions as a map label).
Extract:
Main title and subtitles
Inset titles and labels
Legend text (category labels, symbols, explanatory notes)
Scale bar text, directional text (e.g., “N”, “North”), coordinate labels
Labels for geographic features, boundaries, regions, cities, waterbodies, etc.
Preserve:
Original spelling, punctuation, capitalization, accents/diacritics
Visible abbreviations
Line breaks only when important for meaning (e.g., multi-line titles)
Exclude:
Archival stickers, barcodes, library stamps, scanner metadata, and watermarks that are clearly not part of the original cartographic content.
Do not hallucinate text that is not clearly readable. If partially legible, use a ? placeholder only for uncertain characters.
Represent each extracted text instance as an object in a JSON array under the key text:
```json
"text": [
  {
    "content": "Full text exactly as printed",
    "approx_bbox": [x_min, y_min, x_max, y_max],
    "confidence": 0.0,
    "role": "title | subtitle | legend | label | scale | coordinate | directional | other",
    "reasoning": "Short explanation of how you read this text and why you assigned this role."
  }
]
```
Notes:
approx_bbox is optional but recommended (normalized 0–1 coordinates or pixel coordinates).
confidence is a float (0.0 to 1.0).
reasoning should be concise (1–2 sentences).
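To make the constraints above concrete, here is a minimal validation sketch a consumer of the model's output might run against each entry in the `text` array. The function name and sample data are illustrative, not part of the prompt spec; only the field names and allowed role values come from the schema above.

```python
# Sketch: validate one entry from the "text" array against the schema above.
# Role and field names come from the prompt; everything else is illustrative.

ALLOWED_ROLES = {"title", "subtitle", "legend", "label",
                 "scale", "coordinate", "directional", "other"}

def validate_text_entry(entry: dict) -> list[str]:
    """Return a list of problems found in a single text entry (empty = valid)."""
    problems = []
    if not isinstance(entry.get("content"), str) or not entry["content"]:
        problems.append("content must be a non-empty string")
    conf = entry.get("confidence")
    if not isinstance(conf, float) or not 0.0 <= conf <= 1.0:
        problems.append("confidence must be a float in [0.0, 1.0]")
    if entry.get("role") not in ALLOWED_ROLES:
        problems.append(f"unknown role: {entry.get('role')!r}")
    bbox = entry.get("approx_bbox")  # optional per the notes above
    if bbox is not None and (not isinstance(bbox, list) or len(bbox) != 4):
        problems.append("approx_bbox must be [x_min, y_min, x_max, y_max]")
    return problems

entry = {
    "content": "LAKE SUPERIOR",
    "approx_bbox": [0.62, 0.10, 0.95, 0.18],
    "confidence": 0.97,
    "role": "label",
    "reasoning": "Large serif type following the waterbody outline.",
}
print(validate_text_entry(entry))  # → []
```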
2️⃣ Placename Identification
From the extracted text array:
Identify all substrings that appear to be geographic placenames (countries, cities, waterbodies, mountains, parks, etc.).
Use surrounding context and cartographic conventions to infer validity.
When a text object contains multiple placenames, create multiple entries referencing the same source_text_index.
Store these in a placenames array:
```json
"placenames": [
  {
    "name": "Placename exactly as printed",
    "type": "unknown | country | region | state_province | county | city | town | village | waterbody | mountain | landmark | other",
    "source_text_index": 0,
    "confidence": 0.0,
    "reasoning": "Why you believe this is a placename and why you chose this type."
  }
]
```
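The `source_text_index` cross-reference can be exercised with a short Python sketch. The sample title and placenames are invented for illustration; the point is that one text object may yield multiple placename entries sharing the same index, and each can be resolved back to its source OCR string.

```python
# Sketch: resolve each placename back to its source OCR text via source_text_index.
# Field names follow the prompt; the sample data is invented for illustration.

text = [
    {"content": "DULUTH AND LAKE SUPERIOR", "role": "title", "confidence": 0.95,
     "reasoning": "Largest type at top center."},
]

placenames = [
    {"name": "Duluth", "type": "city", "source_text_index": 0,
     "confidence": 0.90, "reasoning": "Known city name appearing in the title."},
    {"name": "Lake Superior", "type": "waterbody", "source_text_index": 0,
     "confidence": 0.93, "reasoning": "Known waterbody; 'Lake' prefix is conventional."},
]

# Two placename entries reference the same text object (index 0).
for p in placenames:
    src = text[p["source_text_index"]]
    assert p["name"].lower() in src["content"].lower()
    print(f"{p['name']} ({p['type']}) <- text[{p['source_text_index']}]")
```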
3️⃣ Bounding Box Estimation (Map Extent in Geographic Coordinates)
Infer the approximate geographic bounding box of the map’s main area of interest in decimal degrees.
Use: Explicit coordinate labels, graticule lines, scale bars, or recognizable geographic shapes/placenames.
Prioritize: Explicit labels over visual inference.
Output:
```json
"map_bbox_estimate": {
  "west": -180.0,
  "south": -90.0,
  "east": 180.0,
  "north": 90.0,
  "confidence": 0.0,
  "method": "explicit_labels | inferred_from_placenames | mixed | rough_contextual_guess",
  "reasoning": "Step-by-step explanation of evidence used (coordinates, placenames, etc.)."
}
```
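A downstream consumer will usually want a sanity check on the returned extent. A minimal sketch, assuming WGS84 decimal degrees as the prompt specifies (the Duluth-area values are invented for illustration):

```python
# Sketch: sanity-check a map_bbox_estimate object (decimal degrees assumed).
# Note: deliberately rejects extents crossing the antimeridian (west > east).
def bbox_is_plausible(b: dict) -> bool:
    return (-180.0 <= b["west"] < b["east"] <= 180.0
            and -90.0 <= b["south"] < b["north"] <= 90.0)

estimate = {  # illustrative values for a Duluth-area map
    "west": -92.3, "south": 46.6, "east": -91.9, "north": 46.9,
    "confidence": 0.6, "method": "inferred_from_placenames",
    "reasoning": "No graticule; extent inferred from labeled cities and shoreline shape.",
}
print(bbox_is_plausible(estimate))  # → True
```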
4️⃣ Accessibility Description
Synthesize a cohesive textual description of the map (maximum 250 words) suitable for low-vision users or as an academic catalog entry.
Tone: Adopt a formal, descriptive tone suitable for an undergraduate research context.
Content Focus:
Composition: Describe the layout, orientation (if not North-up), and visual hierarchy (e.g., "The map is dominated by the blue expanse of Lake Superior on the right...").
Geography: Summarize the primary region depicted, noting major water bodies, urban centers, and transportation networks.
Style: Mention the cartographic style (e.g., cadastral details, grid layouts, color coding for subdivisions, relief techniques).
Metadata: Weave in the title, publisher, and date (if available) naturally.
Avoid listing every single street name; focus on the "big picture" spatial relationships.
Store this as a single string under the key description.
5️⃣ Global Debug / Reasoning Metadata
Include a top-level debug object describing your overall approach:
```json
"debug": {
  "ocr_strategy": "How you segmented text and handled difficult labels.",
  "placename_extraction_strategy": "How you distinguished placenames from other text.",
  "bbox_inference_strategy": "How you calculated the coordinates.",
  "limitations": "Known issues (resolution, damage, density)."
}
```
Final Output Format (Required)
You must respond only with a single JSON object with this exact top-level structure:
```json
{
  "text": [...],
  "placenames": [...],
  "map_bbox_estimate": { ... },
  "description": "Your ~250 word accessibility description here.",
  "debug": { ... }
}
```
Do not include any natural language explanation outside of this JSON object.
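Because the model must respond with JSON only, the whole contract can be enforced in a few lines on the consuming side. A sketch with an invented `check_response` helper; the required key set and the 250-word limit are taken from the prompt above:

```python
import json

REQUIRED_KEYS = {"text", "placenames", "map_bbox_estimate", "description", "debug"}

def check_response(raw: str) -> dict:
    """Parse a model response and verify the required top-level structure."""
    obj = json.loads(raw)  # raises ValueError if anything but JSON was emitted
    missing = REQUIRED_KEYS - obj.keys()
    if missing:
        raise ValueError(f"missing top-level keys: {sorted(missing)}")
    if len(obj["description"].split()) > 250:
        raise ValueError("description exceeds the 250-word limit")
    return obj

sample = json.dumps({
    "text": [], "placenames": [],
    "map_bbox_estimate": {"west": -92.3, "south": 46.6, "east": -91.9,
                          "north": 46.9, "confidence": 0.5,
                          "method": "mixed", "reasoning": "sample"},
    "description": "A short sample description.",
    "debug": {"ocr_strategy": "sample", "placename_extraction_strategy": "sample",
              "bbox_inference_strategy": "sample", "limitations": "sample"},
})
print(sorted(check_response(sample).keys()))
```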
@ewlarson (Author):
TODO: Add another step to extract dominant colors and bucket them into a pre-defined color set, like the iOS "crayons" palette.
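The color-bucketing step in the TODO above could be sketched as nearest-neighbor matching in RGB space. The palette names and values below are invented stand-ins, not the actual iOS crayon set:

```python
# Sketch of the TODO above: bucket sampled map colors into a small, fixed
# palette by nearest squared-Euclidean RGB distance. Palette values are
# invented stand-ins, not the real iOS "crayons".

CRAYONS = {
    "Ocean":  (52, 120, 246),
    "Fern":   (79, 143, 0),
    "Maize":  (255, 219, 88),
    "Sienna": (136, 45, 23),
    "Snow":   (255, 255, 255),
}

def nearest_crayon(rgb: tuple[int, int, int]) -> str:
    """Return the palette name closest to rgb by squared Euclidean distance."""
    return min(CRAYONS, key=lambda name: sum((a - b) ** 2
                                             for a, b in zip(rgb, CRAYONS[name])))

print(nearest_crayon((60, 110, 230)))  # prints "Ocean"
```

A perceptual color space (e.g. CIELAB) would bucket more faithfully than raw RGB, at the cost of a conversion step.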
