@ewlarson
Created December 10, 2025 19:20
Gemini Prompt — Extract text from maps, determine placenames, guess bbox
Prompt:
Digitized Historical Map OCR + Placenames + Bounding Box + Description
Your task is to analyze a digitized historical map image and produce a high-accuracy extraction of printed textual content, geographic interpretation, and an accessibility description, along with explicit reasoning metadata.
Follow these instructions exactly.
1️⃣ Text Extraction (Printed Text Only)
Focus only on printed text on the map (ignore handwriting unless it clearly functions as a map label).
Extract:
Main title and subtitles
Inset titles and labels
Legend text (category labels, symbols, explanatory notes)
Scale bar text, directional text (e.g., “N”, “North”), coordinate labels
Labels for geographic features, boundaries, regions, cities, waterbodies, etc.
Preserve:
Original spelling, punctuation, capitalization, accents/diacritics
Visible abbreviations
Line breaks only when important for meaning (e.g., multi-line titles)
Exclude:
Archival stickers, barcodes, library stamps, scanner metadata, and watermarks that are clearly not part of the original cartographic content.
Do not hallucinate text that is not clearly readable. If partially legible, use a ? placeholder only for uncertain characters.
Represent each extracted text instance as an object in a JSON array under the key text:
```json
"text": [
  {
    "content": "Full text exactly as printed",
    "approx_bbox": [x_min, y_min, x_max, y_max],
    "confidence": 0.0,
    "role": "title | subtitle | legend | label | scale | coordinate | directional | other",
    "reasoning": "Short explanation of how you read this text and why you assigned this role."
  }
]
```
Notes:
approx_bbox is optional but recommended (normalized 0–1 coordinates or pixel coordinates).
confidence is a float (0.0 to 1.0).
reasoning should be concise (1–2 sentences).
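To make the constraints above concrete, here is a minimal validation sketch a consumer of the model's output might run against each entry in the `text` array. The function name and sample data are illustrative, not part of the prompt spec; only the field names and allowed role values come from the schema above.

```python
# Sketch: validate one entry from the "text" array against the schema above.
# Role and field names come from the prompt; everything else is illustrative.

ALLOWED_ROLES = {"title", "subtitle", "legend", "label",
                 "scale", "coordinate", "directional", "other"}

def validate_text_entry(entry: dict) -> list[str]:
    """Return a list of problems found in a single text entry (empty = valid)."""
    problems = []
    if not isinstance(entry.get("content"), str) or not entry["content"]:
        problems.append("content must be a non-empty string")
    conf = entry.get("confidence")
    if not isinstance(conf, float) or not 0.0 <= conf <= 1.0:
        problems.append("confidence must be a float in [0.0, 1.0]")
    if entry.get("role") not in ALLOWED_ROLES:
        problems.append(f"unknown role: {entry.get('role')!r}")
    bbox = entry.get("approx_bbox")  # optional per the notes above
    if bbox is not None and (not isinstance(bbox, list) or len(bbox) != 4):
        problems.append("approx_bbox must be [x_min, y_min, x_max, y_max]")
    return problems

entry = {
    "content": "LAKE SUPERIOR",
    "approx_bbox": [0.62, 0.10, 0.95, 0.18],
    "confidence": 0.97,
    "role": "label",
    "reasoning": "Large serif type following the waterbody outline.",
}
print(validate_text_entry(entry))  # → []
```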
2️⃣ Placename Identification
From the extracted text array:
Identify all substrings that appear to be geographic placenames (countries, cities, waterbodies, mountains, parks, etc.).
Use surrounding context and cartographic conventions to infer validity.
When a text object contains multiple placenames, create multiple entries referencing the same source_text_index.
Store these in a placenames array:
```json
"placenames": [
  {
    "name": "Placename exactly as printed",
    "type": "unknown | country | region | state_province | county | city | town | village | waterbody | mountain | landmark | other",
    "source_text_index": 0,
    "confidence": 0.0,
    "reasoning": "Why you believe this is a placename and why you chose this type."
  }
]
```
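The `source_text_index` cross-reference can be exercised with a short Python sketch. The sample title and placenames are invented for illustration; the point is that one text object may yield multiple placename entries sharing the same index, and each can be resolved back to its source OCR string.

```python
# Sketch: resolve each placename back to its source OCR text via source_text_index.
# Field names follow the prompt; the sample data is invented for illustration.

text = [
    {"content": "DULUTH AND LAKE SUPERIOR", "role": "title", "confidence": 0.95,
     "reasoning": "Largest type at top center."},
]

placenames = [
    {"name": "Duluth", "type": "city", "source_text_index": 0,
     "confidence": 0.90, "reasoning": "Known city name appearing in the title."},
    {"name": "Lake Superior", "type": "waterbody", "source_text_index": 0,
     "confidence": 0.93, "reasoning": "Known waterbody; 'Lake' prefix is conventional."},
]

# Two placename entries reference the same text object (index 0).
for p in placenames:
    src = text[p["source_text_index"]]
    assert p["name"].lower() in src["content"].lower()
    print(f"{p['name']} ({p['type']}) <- text[{p['source_text_index']}]")
```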
3️⃣ Bounding Box Estimation (Map Extent in Geographic Coordinates)
Infer the approximate geographic bounding box of the map’s main area of interest in decimal degrees.
Use: Explicit coordinate labels, graticule lines, scale bars, or recognizable geographic shapes/placenames.
Prioritize: Explicit labels over visual inference.
Output:
```json
"map_bbox_estimate": {
  "west": -180.0,
  "south": -90.0,
  "east": 180.0,
  "north": 90.0,
  "confidence": 0.0,
  "method": "explicit_labels | inferred_from_placenames | mixed | rough_contextual_guess",
  "reasoning": "Step-by-step explanation of evidence used (coordinates, placenames, etc.)."
}
```
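A downstream consumer will usually want a sanity check on the returned extent. A minimal sketch, assuming WGS84 decimal degrees as the prompt specifies (the Duluth-area values are invented for illustration):

```python
# Sketch: sanity-check a map_bbox_estimate object (decimal degrees assumed).
# Note: deliberately rejects extents crossing the antimeridian (west > east).
def bbox_is_plausible(b: dict) -> bool:
    return (-180.0 <= b["west"] < b["east"] <= 180.0
            and -90.0 <= b["south"] < b["north"] <= 90.0)

estimate = {  # illustrative values for a Duluth-area map
    "west": -92.3, "south": 46.6, "east": -91.9, "north": 46.9,
    "confidence": 0.6, "method": "inferred_from_placenames",
    "reasoning": "No graticule; extent inferred from labeled cities and shoreline shape.",
}
print(bbox_is_plausible(estimate))  # → True
```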
4️⃣ Accessibility Description
Synthesize a cohesive textual description of the map (maximum 250 words) suitable for low-vision users or as an academic catalog entry.
Tone: Adopt a formal, descriptive tone suitable for an undergraduate research context.
Content Focus:
Composition: Describe the layout, orientation (if not North-up), and visual hierarchy (e.g., "The map is dominated by the blue expanse of Lake Superior on the right...").
Geography: Summarize the primary region depicted, noting major water bodies, urban centers, and transportation networks.
Style: Mention the cartographic style (e.g., cadastral details, grid layouts, color coding for subdivisions, relief techniques).
Metadata: Weave in the title, publisher, and date (if available) naturally.
Avoid listing every single street name; focus on the "big picture" spatial relationships.
Store this as a single string under the key description.
5️⃣ Global Debug / Reasoning Metadata
Include a top-level debug object describing your overall approach:
```json
"debug": {
  "ocr_strategy": "How you segmented text and handled difficult labels.",
  "placename_extraction_strategy": "How you distinguished placenames from other text.",
  "bbox_inference_strategy": "How you calculated the coordinates.",
  "limitations": "Known issues (resolution, damage, density)."
}
```
Final Output Format (Required)
You must respond only with a single JSON object with this exact top-level structure:
```json
{
  "text": [...],
  "placenames": [...],
  "map_bbox_estimate": { ... },
  "description": "Your ~250 word accessibility description here.",
  "debug": { ... }
}
```
Do not include any natural language explanation outside of this JSON object.
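Because the model must respond with JSON only, the whole contract can be enforced in a few lines on the consuming side. A sketch with an invented `check_response` helper; the required key set and the 250-word limit are taken from the prompt above:

```python
import json

REQUIRED_KEYS = {"text", "placenames", "map_bbox_estimate", "description", "debug"}

def check_response(raw: str) -> dict:
    """Parse a model response and verify the required top-level structure."""
    obj = json.loads(raw)  # raises ValueError if anything but JSON was emitted
    missing = REQUIRED_KEYS - obj.keys()
    if missing:
        raise ValueError(f"missing top-level keys: {sorted(missing)}")
    if len(obj["description"].split()) > 250:
        raise ValueError("description exceeds the 250-word limit")
    return obj

sample = json.dumps({
    "text": [], "placenames": [],
    "map_bbox_estimate": {"west": -92.3, "south": 46.6, "east": -91.9,
                          "north": 46.9, "confidence": 0.5,
                          "method": "mixed", "reasoning": "sample"},
    "description": "A short sample description.",
    "debug": {"ocr_strategy": "sample", "placename_extraction_strategy": "sample",
              "bbox_inference_strategy": "sample", "limitations": "sample"},
})
print(sorted(check_response(sample).keys()))
```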
@ewlarson (Author):
TODO: Add another step to extract dominant colors and bucket them into a pre-defined color set, like the iOS "crayons" palette.
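The color-bucketing step in the TODO above could be sketched as nearest-neighbor matching in RGB space. The palette names and values below are invented stand-ins, not the actual iOS crayon set:

```python
# Sketch of the TODO above: bucket sampled map colors into a small, fixed
# palette by nearest squared-Euclidean RGB distance. Palette values are
# invented stand-ins, not the real iOS "crayons".

CRAYONS = {
    "Ocean":  (52, 120, 246),
    "Fern":   (79, 143, 0),
    "Maize":  (255, 219, 88),
    "Sienna": (136, 45, 23),
    "Snow":   (255, 255, 255),
}

def nearest_crayon(rgb: tuple[int, int, int]) -> str:
    """Return the palette name closest to rgb by squared Euclidean distance."""
    return min(CRAYONS, key=lambda name: sum((a - b) ** 2
                                             for a, b in zip(rgb, CRAYONS[name])))

print(nearest_crayon((60, 110, 230)))  # prints "Ocean"
```

A perceptual color space (e.g. CIELAB) would bucket more faithfully than raw RGB, at the cost of a conversion step.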
