Parse a blood lab PDF and extract all test results into a structured JSON format.
Input: a PDF file containing blood lab results from any lab (German, English, or other languages).
1. Read the PDF and identify all test results.
2. Extract metadata: date, lab name/location, sample IDs, collection time.
3. For each test result, extract:
   - Original marker name (as printed)
   - Value (numeric or qualitative)
   - Unit (as printed)
   - Reference range (min/max or expected value)
   - Method (if listed)
4. Normalize markers to canonical English identifiers.
5. Normalize units to the conventional system (see unit conversion below).
6. Determine status (normal, low, or high) based on the reference range.
7. Identify panels included and missing.
8. Generate a summary with flags for attention items.
Critical: Output Encoding
Always output valid UTF-8 JSON. Preserve special characters correctly:
German: ä ö ü ß Ä Ö Ü
French: é è ê ë à â ç
Do NOT output mojibake such as "HÃ¤matologie"; output "Hämatologie".
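A minimal sketch of the encoding rule, assuming Python and its standard json module. The helper names repair_mojibake and dump_json are illustrative and not part of the schema, and the repair step only covers the common case where UTF-8 bytes were misread as Latin-1.

```python
import json


def repair_mojibake(text: str) -> str:
    """Undo UTF-8-read-as-Latin-1 corruption; return text unchanged if that does not apply."""
    try:
        return text.encode("latin-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # Correctly decoded text such as "Hämatologie" fails the round trip and is kept as-is
        return text


def dump_json(data: dict) -> str:
    # ensure_ascii=False keeps ä, ö, ü, ß as real characters instead of \uXXXX escapes
    return json.dumps(data, ensure_ascii=False, indent=2)


def save_json(data: dict, path: str) -> None:
    # Always state the encoding explicitly; the platform default is not guaranteed to be UTF-8
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(dump_json(data))
```

Running repair_mojibake on every extracted string before building the JSON leaves already-correct text untouched, so it is safe to apply unconditionally under the stated assumption.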
Map original marker names to normalized identifiers. Use snake_case, English terms.
| Original (German/English variants) | Normalized marker |
| --- | --- |
| Hämoglobin, Hemoglobin, Hb | hemoglobin |
| Leukozyten, Leukocytes, WBC | leukocytes |
| Erythrozyten, Erythrocytes, RBC | erythrocytes |
| Thrombozyten, Platelets, PLT | platelets |
| Hämatokrit, Hematocrit, HCT | hematocrit |
| GPT/ALAT, ALT, SGPT | alt |
| GOT/ASAT, AST, SGOT | ast |
| Gamma-GT, GGT, γ-GT | ggt |
| Alkalische Phosphatase, ALP, AP | alp |
| Kreatinin, Creatinine, Crea | creatinine |
| Harnstoff, Urea, BUN | urea |
| Harnsäure, Uric Acid | uric_acid |
| Cholesterin, Cholesterol | cholesterol_total |
| Triglyceride, Triglycerides | triglycerides |
| HDL-Cholesterin, HDL Cholesterol, HDL | hdl |
| LDL-Cholesterin, LDL Cholesterol, LDL | ldl |
| Non-HDL-Cholesterin | non_hdl |
| Ferritin | ferritin |
| Eisen, Iron | iron |
| Transferrin | transferrin |
| Transferrinsättigung, Transferrin Saturation | transferrin_saturation |
| Vitamin D (25-OH), 25-Hydroxyvitamin D, Vitamin D3 | vitamin_d_25oh |
| Vitamin B12, Cobalamin | vitamin_b12 |
| Folsäure, Folate, Folic Acid | folate |
| TSH, TSH basal | tsh |
| fT3, Free T3, freies T3 | ft3 |
| fT4, Free T4, freies T4 | ft4 |
| HbA1c | hba1c |
| HbA1C (n. IFCC), HbA1c IFCC | hba1c_ifcc |
| Glucose, Glukose (nüchtern/fasting) | glucose_fasting |
| Glucose, Glukose (random/nicht nüchtern) | glucose |
| CRP, C-reaktives Protein, C-Reactive Protein | crp |
| hs-CRP, hochsensitives CRP | crp_hs |
| Bilirubin (gesamt), Total Bilirubin | bilirubin_total |
| Bilirubin (direkt), Direct Bilirubin | bilirubin_direct |
| Natrium, Sodium, Na | sodium |
| Kalium, Potassium, K | potassium |
| Calcium, Ca | calcium |
| Magnesium, Mg | magnesium |
| Phosphat, Phosphate, P | phosphate |
| Chlorid, Chloride, Cl | chloride |
| GFR (MDRD), GFR (CKD-EPI), eGFR | gfr |
| Gesamteiweiß, Total Protein | total_protein |
| Albumin | albumin |
| HBs-Antigen, HBsAg | hbs_antigen |
| Anti-HCV, HCV-Ak | anti_hcv |
| Anti-HBs, HBs-Ak | anti_hbs |
| Anti-HBc, HBc-Ak | anti_hbc |
For differential counts, use:
Percentage: neutrophils_percent, lymphocytes_percent, monocytes_percent, eosinophils_percent, basophils_percent
Absolute: neutrophils_absolute, lymphocytes_absolute, monocytes_absolute, eosinophils_absolute, basophils_absolute
For markers not in this list, create a reasonable snake_case identifier.
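The lookup described above could be sketched as follows. This is an illustration under assumptions, not a prescribed implementation: ALIASES holds only a few rows of the mapping, and the fallback in to_snake_case transliterates umlauts (ä to ae, and so on) rather than translating the name into English, which a real run would still need to do by judgment.

```python
import re
import unicodedata

# Excerpt of the alias mapping; the remaining rows follow the same shape.
ALIASES = {
    "hemoglobin": ["Hämoglobin", "Hemoglobin", "Hb"],
    "leukocytes": ["Leukozyten", "Leukocytes", "WBC"],
    "alt": ["GPT/ALAT", "ALT", "SGPT"],
    "ggt": ["Gamma-GT", "GGT", "γ-GT"],
    "uric_acid": ["Harnsäure", "Uric Acid"],
}


def _key(name: str) -> str:
    # Case-insensitive, whitespace-tolerant lookup key
    return " ".join(unicodedata.normalize("NFC", name).casefold().split())


LOOKUP = {_key(alias): marker for marker, aliases in ALIASES.items() for alias in aliases}


def to_snake_case(name: str) -> str:
    """Fallback identifier for markers that are not in the alias table."""
    text = unicodedata.normalize("NFC", name).casefold()  # casefold also turns "ß" into "ss"
    for src, dst in (("ä", "ae"), ("ö", "oe"), ("ü", "ue")):
        text = text.replace(src, dst)
    # Strip any remaining accents, then collapse everything non-alphanumeric to "_"
    text = unicodedata.normalize("NFKD", text).encode("ascii", "ignore").decode("ascii")
    return re.sub(r"[^a-z0-9]+", "_", text).strip("_")


def normalize_marker(original: str) -> str:
    return LOOKUP.get(_key(original)) or to_snake_case(original)
```

Keeping the printed name in marker_original means a poor fallback identifier can always be corrected later without re-parsing the PDF.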
Unit Normalization (Critical for Cross-Lab Comparison)
Different labs use different unit systems. Always normalize to conventional units for consistency, while preserving the original values.
| Marker | SI Unit | Conventional Unit | Conversion (SI → conventional) |
| --- | --- | --- | --- |
| Hemoglobin | mmol/l | g/dl | × 1.611 |
| Hematocrit | l/l | % | × 100 |
| Glucose | mmol/l | mg/dl | × 18.02 |
| Cholesterol (total, HDL, LDL) | mmol/l | mg/dl | × 38.67 |
| Triglycerides | mmol/l | mg/dl | × 88.57 |
| Creatinine | µmol/l | mg/dl | ÷ 88.4 |
| Urea | mmol/l | mg/dl | × 6.006 |
| Uric Acid | µmol/l | mg/dl | ÷ 59.48 |
| Bilirubin | µmol/l | mg/dl | ÷ 17.1 |
| Iron | µmol/l | µg/dl | × 5.587 |
| Calcium | mmol/l | mg/dl | × 4.008 |
| Magnesium | mmol/l | mg/dl | × 2.431 |
| Phosphate | mmol/l | mg/dl | × 3.097 |
| Total Protein | g/l | g/dl | ÷ 10 |
| Albumin | g/l | g/dl | ÷ 10 |
| MCH | fmol | pg | × 16.11 |
| MCHC | mmol/l | g/dl | × 1.611 |
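As a rough sketch of how these factors might be applied, the snippet below fills both the conventional and the SI fields from whichever unit the lab printed. The names CONVERSIONS, clean_unit and with_both_units are hypothetical, only a few markers are included, and rounding to two decimals is an assumption rather than a rule from this document.

```python
MICRO = "\u00b5"     # micro sign, as used in the conversion list above
GREEK_MU = "\u03bc"  # look-alike character that PDF text extraction may yield instead

# marker -> (SI unit, conventional unit, factor), with conventional = SI * factor
CONVERSIONS = {
    "hemoglobin": ("mmol/l", "g/dl", 1.611),
    "hematocrit": ("l/l", "%", 100.0),
    "glucose": ("mmol/l", "mg/dl", 18.02),
    "cholesterol_total": ("mmol/l", "mg/dl", 38.67),
    "creatinine": (f"{MICRO}mol/l", "mg/dl", 1 / 88.4),
    "total_protein": ("g/l", "g/dl", 1 / 10),
}


def clean_unit(unit: str) -> str:
    # Lower-case, drop spaces, and unify the two "mu" code points
    return unit.strip().lower().replace(" ", "").replace(GREEK_MU, MICRO)


def with_both_units(marker: str, value: float, unit: str, digits: int = 2) -> dict:
    si_unit, conv_unit, factor = CONVERSIONS[marker]
    reported = clean_unit(unit)
    if reported == si_unit:
        conventional, si = value * factor, value
    elif reported == conv_unit:
        conventional, si = value, value / factor
    else:
        # Refuse to guess: an unexpected unit should be reviewed, not silently converted
        raise ValueError(f"unexpected unit {unit!r} for {marker}")
    return {
        "value": round(conventional, digits),
        "unit": conv_unit,
        "value_si": round(si, digits),
        "unit_si": si_unit,
    }
```

The same function can be applied to reference_min and reference_max so that the reference range and the value always share a unit.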
Leukocyte/Cell Count Equivalents
These are the same value, just different notation:
10³/µl = Gpt/l = ×10⁹/l = thousand/µl
10⁶/µl = Tpt/l = ×10¹²/l = million/µl
/µl = cells/µl (absolute count, no multiplier)
Normalize to: 10³/µl for WBC/platelets, 10⁶/µl for RBC, /µl for absolute differentials.
Enzyme activities: convert µkat/l to U/l (IU/l) by multiplying by 60.
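Because the cell-count notations above differ only in spelling, normalizing them is a lookup rather than a calculation. The sketch below assumes the hypothetical helpers canonical_count_unit and ukat_to_u_per_l and covers only the spellings listed in this section; anything else raises an error instead of being guessed.

```python
MICRO = "\u00b5"     # micro sign
GREEK_MU = "\u03bc"  # look-alike that may appear in extracted text

_THOUSAND = f"10³/{MICRO}l"  # WBC, platelets
_MILLION = f"10⁶/{MICRO}l"   # RBC
_PLAIN = f"/{MICRO}l"        # absolute differentials

# Spellings are stored lower-cased, without spaces and without a leading "×"
COUNT_UNITS = {
    _THOUSAND: _THOUSAND,
    "gpt/l": _THOUSAND,
    "10⁹/l": _THOUSAND,
    f"thousand/{MICRO}l": _THOUSAND,
    _MILLION: _MILLION,
    "tpt/l": _MILLION,
    "10¹²/l": _MILLION,
    f"million/{MICRO}l": _MILLION,
    _PLAIN: _PLAIN,
    f"cells/{MICRO}l": _PLAIN,
}


def canonical_count_unit(unit: str) -> str:
    """Map an equivalent notation to its canonical spelling; the numeric value is unchanged."""
    key = unit.strip().lower().replace(" ", "").replace(GREEK_MU, MICRO).lstrip("×x")
    if key not in COUNT_UNITS:
        raise ValueError(f"unknown cell-count unit: {unit!r}")
    return COUNT_UNITS[key]


def ukat_to_u_per_l(value_ukat_per_l: float) -> float:
    # 1 µkat/l = 60 U/l
    return value_ukat_per_l * 60
```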
Assign each marker to a panel:
| Panel | Markers |
| --- | --- |
| complete_blood_count | leukocytes, erythrocytes, hemoglobin, hematocrit, platelets, mcv, mch, mchc, rdw, mpv, neutrophils_*, lymphocytes_*, monocytes_*, eosinophils_*, basophils_* |
| lipid_panel | cholesterol_total, triglycerides, hdl, ldl, non_hdl, ldl_hdl_ratio, vldl |
| liver_function | alt, ast, ggt, alp, bilirubin_total, bilirubin_direct, albumin |
| kidney_function | creatinine, urea, gfr, cystatin_c, uric_acid |
| thyroid | tsh, ft3, ft4, t3, t4, anti_tpo, anti_tg |
| iron_studies | ferritin, iron, transferrin, transferrin_saturation, tibc |
| glycemic | glucose, glucose_fasting, hba1c, hba1c_ifcc, insulin, c_peptide |
| vitamin_d | vitamin_d_25oh, vitamin_d_1_25oh |
| vitamin_b12 | vitamin_b12, holotranscobalamin |
| folate | folate |
| inflammation_markers | crp, crp_hs, esr, il6 |
| electrolytes | sodium, potassium, chloride, calcium, magnesium, phosphate |
| hepatitis_screening | hbs_antigen, anti_hbs, anti_hbc, anti_hcv |
| metabolic_basic | glucose_fasting, total_protein, albumin, uric_acid |
{
  "date": "YYYY-MM-DD",
  "lab": {
    "id": "lab_identifier_snake_case",
    "name": "Full Lab Name",
    "address": "Address if available",
    "phone": "Phone if available"
  },
  "sample_ids": ["id1", "id2"],
  "collection_time": "HH:MM",
  "panels_included": ["panel1", "panel2"],
  "panels_missing": ["panel3", "panel4"],
  "results": [
    {
      "category": "Category as printed (preserved)",
      "panel": "panel_name",
      "marker": "normalized_marker_id",
      "marker_original": "Original Name As Printed",
      "value": 15.1,
      "unit": "g/dl",
      "value_si": 9.37,
      "unit_si": "mmol/l",
      "reference_min": 13.5,
      "reference_max": 17.2,
      "reference_min_si": 8.38,
      "reference_max_si": 10.68,
      "status": "normal",
      "method": "Method if listed",
      "clinical_note": "Optional note for interpretation"
    },
    {
      "category": "Category",
      "panel": "panel_name",
      "marker": "qualitative_marker",
      "marker_original": "Original Name",
      "value_text": "negative",
      "reference_expected": "negative",
      "status": "normal",
      "method": "Method"
    }
  ],
  "flags": ["marker_status"],
  "summary": {
    "total_markers": 38,
    "normal": 37,
    "low": 1,
    "high": 0,
    "attention_items": [
      "Human readable note about items needing attention"
    ]
  },
  "source_file": "original_filename.pdf",
  "parsed_at": "ISO8601 timestamp",
  "schema_version": "1.1"
}
Primary values (value, unit, reference_min, reference_max):
- Always in conventional units (g/dl, mg/dl, %, etc.)
- If the lab reports in SI units, convert to conventional units and store the converted values here

SI values (value_si, unit_si, reference_min_si, reference_max_si):
- Always in SI units (mmol/l, µmol/l, l/l, etc.)
- If the lab reports in conventional units, convert to SI and store the converted values here
- Omit these fields if no conversion is needed (e.g., percentages, counts)

Qualitative tests:
- Use value_text instead of numeric value
- Use reference_expected instead of min/max
normal: value within reference range
low: value below reference_min
high: value above reference_max
critical_low: value <50% of reference_min
critical_high: value >200% of reference_max
For qualitative tests, compare value_text to reference_expected.
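The status rules can be sketched as two small functions. Several choices here are assumptions rather than rules from this document: the critical thresholds are checked before plain low/high, a missing bound (null) is treated as "no limit on that side", and a qualitative mismatch is labeled "abnormal", a name this document does not define.

```python
from typing import Optional


def numeric_status(value: float,
                   reference_min: Optional[float] = None,
                   reference_max: Optional[float] = None) -> str:
    if reference_min is not None and value < reference_min:
        # Below 50% of the lower bound counts as critical
        return "critical_low" if value < 0.5 * reference_min else "low"
    if reference_max is not None and value > reference_max:
        # Above 200% of the upper bound counts as critical
        return "critical_high" if value > 2.0 * reference_max else "high"
    return "normal"


def qualitative_status(value_text: str, reference_expected: str) -> str:
    # Compare case-insensitively and ignore surrounding whitespace
    same = value_text.strip().casefold() == reference_expected.strip().casefold()
    return "normal" if same else "abnormal"  # "abnormal" is a placeholder label
```

For example, numeric_status(18.5, 20, 50) yields "low", and a total cholesterol of 156 with only an upper bound of 200 yields "normal".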
Add clinical_note for:
- Values technically out of range but clinically favorable (e.g., low HbA1c is good)
- Values at the edge of normal that may warrant monitoring
- Important context about the marker
- Conversion notes if the original used unusual units
1. Verify extraction: Print a summary table for the user to review.
2. Highlight flags: Show any values outside the reference range.
3. Save JSON: The output filename should be YYYY-MM-DD.json, based on the test date.
4. Use UTF-8: Ensure proper encoding of all special characters.
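The flags, summary counts and output filename can all be derived from the results list. The sketch below uses the hypothetical helpers build_summary and output_filename; folding critical_low and critical_high into the low and high counts is an assumption, since the schema only defines the three counters.

```python
from collections import Counter
from datetime import date


def build_summary(results: list[dict]) -> dict:
    counts = Counter(r["status"] for r in results)
    # Flags follow the "<marker>_<status>" pattern, e.g. "vitamin_d_25oh_low"
    flags = [f'{r["marker"]}_{r["status"]}' for r in results if r["status"] != "normal"]
    return {
        "flags": flags,
        "summary": {
            "total_markers": len(results),
            "normal": counts["normal"],
            "low": counts["low"] + counts["critical_low"],
            "high": counts["high"] + counts["critical_high"],
        },
    }


def output_filename(test_date: str) -> str:
    # fromisoformat raises ValueError for anything that is not a valid ISO date
    return f"{date.fromisoformat(test_date).isoformat()}.json"
```

The attention_items texts are left out on purpose: they are human-readable notes and need judgment rather than a formula.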
Parsed: Example Lab Name (2025-07-15)
Sample IDs: 12345678
Collection: 09:30
Markers extracted: 42
Panels: complete_blood_count, lipid_panel, liver_function, kidney_function, thyroid, vitamin_d
⚠️ Attention:
- Vitamin D: 18.5 ng/ml [LOW] (ref: 20-50) - consider supplementation
- Ferritin: 28 ng/ml [NORMAL but low-end] (ref: 22-275) - monitor if vegetarian/vegan
✓ 40 markers within normal range
Unit conversions applied:
- Hemoglobin: 9.2 mmol/l → 14.8 g/dl
- Cholesterol: 4.04 mmol/l → 156 mg/dl
- Glucose: 4.83 mmol/l → 87 mg/dl
Saved to: 2025-07-15.json
{
  "date": "2025-07-15",
  "lab": {
    "id": "example_lab_berlin",
    "name": "Example Lab Berlin GmbH",
    "address": "Musterstraße 123, 10115 Berlin"
  },
  "sample_ids": ["12345678"],
  "collection_time": "09:30",
  "panels_included": ["complete_blood_count", "lipid_panel", "thyroid", "vitamin_d", "hepatitis_screening"],
  "panels_missing": ["inflammation_markers"],
  "results": [
    {
      "category": "Hämatologie",
      "panel": "complete_blood_count",
      "marker": "hemoglobin",
      "marker_original": "Hämoglobin",
      "value": 14.8,
      "unit": "g/dl",
      "value_si": 9.2,
      "unit_si": "mmol/l",
      "reference_min": 13.5,
      "reference_max": 17.2,
      "reference_min_si": 8.38,
      "reference_max_si": 10.68,
      "status": "normal",
      "method": "Photometry"
    },
    {
      "category": "Lipide",
      "panel": "lipid_panel",
      "marker": "cholesterol_total",
      "marker_original": "Cholesterin",
      "value": 156,
      "unit": "mg/dl",
      "value_si": 4.04,
      "unit_si": "mmol/l",
      "reference_min": null,
      "reference_max": 200,
      "reference_min_si": null,
      "reference_max_si": 5.17,
      "status": "normal",
      "method": "Enzymatic"
    },
    {
      "category": "Vitamine",
      "panel": "vitamin_d",
      "marker": "vitamin_d_25oh",
      "marker_original": "Vitamin D3 (25-OH)",
      "value": 18.5,
      "unit": "ng/ml",
      "reference_min": 20,
      "reference_max": 50,
      "status": "low",
      "method": "ECLIA",
      "clinical_note": "Below optimal range - consider supplementation, especially in winter months"
    },
    {
      "category": "Serologie",
      "panel": "hepatitis_screening",
      "marker": "hbs_antigen",
      "marker_original": "HBs-Antigen",
      "value_text": "negative",
      "reference_expected": "negative",
      "status": "normal",
      "method": "CMIA",
      "clinical_note": "No evidence of Hepatitis B infection"
    }
  ],
  "flags": ["vitamin_d_25oh_low"],
  "summary": {
    "total_markers": 42,
    "normal": 41,
    "low": 1,
    "high": 0,
    "attention_items": [
      "Vitamin D: 18.5 ng/ml is below reference range (20-50) - consider supplementation"
    ]
  },
  "source_file": "lab_results_2025-07-15.pdf",
  "parsed_at": "2025-07-15T14:30:00Z",
  "schema_version": "1.1"
}