AI methodology: How our AI lab test analyzer actually works
Full transparency on architecture, training, evaluation, and the guardrails we put around our health AI. We publish this because health AI without transparency is not trustworthy.
1. Architecture overview
The blood-test.life analyzer is not a single language model. It is a four-stage pipeline, each stage independently testable and replaceable.
- Parser. A library of 400+ deterministic templates for known lab formats, with a vision-language fallback for unfamiliar layouts.
- Normalizer. Unit conversion, LOINC mapping, and reference-range adjustment (age, sex, pregnancy, ethnicity where validated).
- Clinical-rules engine. Validated deterministic logic (e.g., diabetes thresholds, CKD staging, lipid risk calculators). The rules engine has authority over the language model — the language model cannot overrule it.
- Narrative model. A fine-tuned, domain-adapted language model that writes the patient-facing report, constrained by a phrase library reviewed by our medical advisory board.
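As an illustration of how the stages might compose, and of what "authority over the language model" can mean in practice, here is a minimal Python sketch. It is not the production implementation: the names Result, normalize, rules_engine, final_flag, and stub_model are hypothetical, and the single rule shown (HbA1c of 6.5 % or higher, the ADA diagnostic threshold) stands in for a full rule set. The normalizer step uses the published IFCC-to-NGSP master equation for HbA1c.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Result:
    name: str    # biomarker name produced by the parser stage
    value: float
    unit: str


def normalize(result: Result) -> Result:
    """Normalizer stage: convert HbA1c from IFCC mmol/mol to NGSP percent."""
    if result.name == "HbA1c" and result.unit == "mmol/mol":
        # IFCC-to-NGSP master equation: NGSP % = 0.09148 * IFCC + 2.152
        return Result("HbA1c", round(0.09148 * result.value + 2.152, 1), "%")
    return result


def rules_engine(result: Result) -> Optional[str]:
    """Rules stage: return a flag when a deterministic rule applies, else None."""
    # Illustrative rule only (assumption): HbA1c >= 6.5 % is flagged abnormal.
    if result.name == "HbA1c" and result.unit == "%" and result.value >= 6.5:
        return "abnormal"
    return None


def final_flag(result: Result, model_flag: Callable[[Result], str]) -> str:
    """Normalize, then apply rules; the model's flag is used only if no rule fires."""
    normalized = normalize(result)
    rule_flag = rules_engine(normalized)
    return rule_flag if rule_flag is not None else model_flag(normalized)


def stub_model(result: Result) -> str:
    """Stand-in for the narrative model; always answers 'normal'."""
    return "normal"
```

In this sketch, final_flag(Result("HbA1c", 53.0, "mmol/mol"), stub_model) returns "abnormal" even though the stub model says "normal", because the rule fires first. Each stage is a plain function, so it can be tested or swapped in isolation.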
2. Training data
The narrative model is fine-tuned on a curated dataset of medical writing — guideline excerpts, clinical reasoning examples, and patient-friendly explanations — all reviewed by the medical board for clinical accuracy and tone. We do not train on user uploads. Patient lab reports are never added to the training set, ever.
3. Reference data sources
Reference ranges are sourced from validated population studies: CALIPER (pediatric), NORIP (Nordic adult), CDC NHANES (US adult), and the major specialty-society reference ranges (ATA for thyroid, AHA/ACC for lipids, ADA for diabetes, KDIGO for kidney, ACOG for pregnancy). Where multiple sources disagree, we use the most current guideline and document the choice.
4. Evaluation
We evaluate the analyzer on a 12,400-report anonymized validation set covering 22 lab providers across 4 continents. Reports are stratified by patient age, sex, pregnancy status, and ethnicity to surface fairness issues. As of the June 2026 evaluation:
- Biomarker extraction: 99.1 % accuracy
- Unit normalization: 99.8 % accuracy
- Flag classification (normal / borderline / abnormal): 97.4 % agreement with board-certified physicians
- Hallucination rate (clinical claim not supported by extracted data): < 0.3 %
- Disclaimer presence: 100 % of reports include the medical disclaimer
We re-evaluate quarterly. The current evaluation report is available on request to clinical partners and academic researchers.
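A stratified agreement figure of the kind reported above could be computed along these lines. This is a sketch under assumptions: the function name agreement_by_stratum and the (stratum, model flag, physician flag) record layout are hypothetical, and the sample records are toy values, not evaluation data.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Each record: (stratum label, model flag, physician flag). Layout is an assumption.
Record = Tuple[str, str, str]


def agreement_by_stratum(records: Iterable[Record]) -> Dict[str, float]:
    """Return the fraction of records in each stratum where both flags match."""
    totals: Dict[str, int] = defaultdict(int)
    matches: Dict[str, int] = defaultdict(int)
    for stratum, model_flag, physician_flag in records:
        totals[stratum] += 1
        if model_flag == physician_flag:
            matches[stratum] += 1
    return {stratum: matches[stratum] / totals[stratum] for stratum in totals}


# Toy records for illustration only.
toy_records = [
    ("adult", "normal", "normal"),
    ("adult", "abnormal", "abnormal"),
    ("pediatric", "normal", "borderline"),
    ("pediatric", "abnormal", "abnormal"),
]
per_stratum = agreement_by_stratum(toy_records)
```

Reporting the rate per stratum, rather than one pooled number, is what lets a gap between groups show up instead of being averaged away.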
5. Guardrails
The narrative model is constrained in several specific ways:
- It cannot reference biomarkers not present in the report.
- It cannot invent reference ranges — only those provided by the normalizer.
- It cannot make diagnostic statements ("you have X"); it can describe patterns ("this pattern is consistent with X").
- It cannot recommend medications.
- It must end every report with the medical disclaimer.
- When confidence drops below threshold, it must flag the section as low-confidence rather than hallucinating.
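Several of these constraints can be checked mechanically after a draft is generated. The sketch below shows one possible post-generation check in Python; it is an assumption about how such a check might look, not a description of our implementation. The disclaimer wording, the KNOWN_BIOMARKERS vocabulary, and the function name guardrail_violations are all hypothetical, and the "you have" pattern is a deliberately crude proxy for a diagnostic statement.

```python
import re
from typing import List, Set

DISCLAIMER = "This report is not a medical diagnosis."    # hypothetical wording
KNOWN_BIOMARKERS = {"hba1c", "potassium", "egfr", "tsh"}  # hypothetical vocabulary


def guardrail_violations(narrative: str, extracted: Set[str]) -> List[str]:
    """List guardrail violations found in a draft narrative (empty list = pass)."""
    violations: List[str] = []
    text = narrative.lower()

    # Biomarkers named in the draft must have been extracted from the report.
    mentioned = {b for b in KNOWN_BIOMARKERS if re.search(rf"\b{re.escape(b)}\b", text)}
    for name in sorted(mentioned - {e.lower() for e in extracted}):
        violations.append(f"references biomarker not in report: {name}")

    # Crude proxy for a diagnostic statement such as "you have X".
    if re.search(r"\byou have\b", text):
        violations.append("diagnostic statement")

    # The draft must end with the disclaimer.
    if not narrative.rstrip().endswith(DISCLAIMER):
        violations.append("missing closing disclaimer")

    return violations
```

A draft that returns a non-empty list would be rejected or regenerated rather than shown to the user. Checks like "no medication recommendations" need a richer vocabulary than this sketch carries.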
6. Safety patterns
Certain biomarker patterns trigger an immediate "see a clinician" flag rather than a friendly explanation. Examples: potassium < 2.5 or > 6.5 mEq/L; platelets < 20 ×10³/µL; HbA1c > 10 %; eGFR < 30 mL/min/1.73 m²; troponin elevation; ALT or AST > 10× the upper limit of normal. These thresholds are reviewed by our medical board and updated when guidelines shift.
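The example thresholds above translate directly into deterministic checks. The following Python sketch encodes only the examples listed in this section; the function name urgent_flags, the dictionary keys, and the choice to treat "troponin elevation" as "above the lab's upper reference limit" are assumptions made for illustration. Values are expected in the units given above.

```python
from typing import Dict, List


def urgent_flags(values: Dict[str, float], upper_limits: Dict[str, float]) -> List[str]:
    """Return the biomarkers whose values cross a 'see a clinician' threshold."""
    flags: List[str] = []

    k = values.get("potassium")                      # mEq/L
    if k is not None and (k < 2.5 or k > 6.5):
        flags.append("potassium")

    platelets = values.get("platelets")              # x10^3/uL
    if platelets is not None and platelets < 20:
        flags.append("platelets")

    hba1c = values.get("HbA1c")                      # percent
    if hba1c is not None and hba1c > 10:
        flags.append("HbA1c")

    egfr = values.get("eGFR")                        # mL/min/1.73 m^2
    if egfr is not None and egfr < 30:
        flags.append("eGFR")

    # Assumption: "elevation" means above the lab-specific upper reference limit.
    troponin, troponin_uln = values.get("troponin"), upper_limits.get("troponin")
    if troponin is not None and troponin_uln is not None and troponin > troponin_uln:
        flags.append("troponin")

    # Liver enzymes: more than 10x the lab-specific upper limit of normal.
    for enzyme in ("ALT", "AST"):
        value, uln = values.get(enzyme), upper_limits.get(enzyme)
        if value is not None and uln is not None and value > 10 * uln:
            flags.append(enzyme)

    return flags
```

For example, urgent_flags({"potassium": 6.8, "HbA1c": 7.0, "ALT": 450.0}, {"ALT": 40.0}) returns ["potassium", "ALT"], where the ALT upper limit of 40 is a made-up example value. Missing biomarkers are skipped rather than treated as normal or abnormal.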
7. Updates
Model versions are tagged and visible in every report's audit trail. Material updates — new biomarker support, new reference data, new clinical rules — are summarized on the corrections log.
8. What's still imperfect
An honest methodology page admits what doesn't work yet:
- Handwritten lab reports drop our parser accuracy by ~7 percentage points; we route these to a low-confidence flow.
- Very rare biomarkers (~30 markers) are extracted but not yet interpreted with the same depth as the standard panels.
- Reports for pregnant pediatric patients (very rare) require manual review.
- Some Eastern European lab formats use Roman numeral reference ranges; the parser was extended for this in June 2026, but coverage is still 96 %, not 99 %.
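To give a sense of what handling Roman numeral reference ranges involves, here is a minimal Python sketch. It assumes a range written as two numerals separated by a hyphen (for example "IV - X"), which is a guess at the layout and not a description of any specific lab format; the function names roman_to_int and parse_roman_range are hypothetical.

```python
from typing import Tuple

ROMAN_VALUES = {"I": 1, "V": 5, "X": 10, "L": 50, "C": 100, "D": 500, "M": 1000}


def roman_to_int(numeral: str) -> int:
    """Convert a Roman numeral to an integer using the subtractive rule."""
    values = [ROMAN_VALUES[ch] for ch in numeral.strip().upper()]
    total = 0
    for i, value in enumerate(values):
        # A smaller symbol before a larger one is subtracted (IV = 4, IX = 9).
        if i + 1 < len(values) and value < values[i + 1]:
            total -= value
        else:
            total += value
    return total


def parse_roman_range(text: str) -> Tuple[int, int]:
    """Parse 'LOW - HIGH' where both bounds are Roman numerals (assumed layout)."""
    low, high = text.split("-")
    return roman_to_int(low), roman_to_int(high)
```

Unknown characters raise a KeyError here, which in a real parser would be a natural trigger for the low-confidence flow rather than a guess.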