OpenOnco v0.1.0

Capabilities

OpenOnco takes a JSON patient profile and returns a structured treatment plan or a diagnostic brief, with a full trace of every decision and citations for every claim. There is no black box: every recommendation comes from declarative rules you can read in the knowledge base and follow through the trace. The rest of this page details how we work with data, sources, and requests.

What's already in

65
Diseases in the KB
37 hematologic · 28 solid · 64 with the full chain · 43 carry a 2L+ algorithm

Each disease carries an archetype (etiologically_driven like HCV-MZL, risk_stratified like MM, biomarker_driven like NSCLC/EGFR or HGBL/MYC-BCL2, stage_driven like cervical) which determines the algorithm logic for treatment selection. 1L is covered for all 65 diseases (65 algorithms); 2L+ is covered for 34 hematologic and 9 solid diseases (45 algorithms).

16
Clinician skills (virtual specialists)
each skill carries its own version, sources, last_reviewed

Hematologist, pathologist, infectious-disease/hepatologist, radiologist, molecular geneticist, clinical pharmacist, radiation oncologist, palliative-care specialist, and others — each is activated by specific triggers in the patient profile and contributes its own open questions and supportive-care recommendations to the plan.

24
Workups (triage)
pre-biopsy diagnostic path

When histology is not yet confirmed (CHARTER §15.2 C7 forbids a treatment Plan without it), the engine switches into diagnostic mode: it emits a Workup Brief with the test list, biopsy approach, IHC panel, and the roles required in the triage MDT. Once histology is confirmed, the diagnostic plan is promoted to a treatment plan via revise_plan(...).

381
Red flags
escalation or investigation triggers

Red flags are structured clinical conditions that automatically change the plan: RF-BULKY-DISEASE (nodal mass >7 cm) switches HCV-MZL from antiviral-first to BR + DAA; RF-MM-HIGH-RISK-CYTOGENETICS (t(4;14), del(17p), gain 1q) escalates MM from triplet VRd to quadruplet D-VRd. Every RedFlag is bound to a domain role that "catches" it in the MDT brief.
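The trigger logic behind a RedFlag can be sketched as a small boolean evaluator. The clause shapes and field names below are illustrative assumptions, not the KB's actual schema:

```python
# Minimal sketch of a RedFlag trigger evaluator (illustrative, not the real schema).
# A trigger is a nested dict of any_of / all_of / none_of clauses over flat findings.

def clause_holds(clause: dict, findings: dict) -> bool:
    """Evaluate one leaf clause like {"field": "...", "gt": 7}."""
    value = findings.get(clause["field"])
    if value is None:
        return False  # missing data never satisfies a threshold
    if "gt" in clause:
        return value > clause["gt"]
    if "eq" in clause:
        return value == clause["eq"]
    return False

def trigger_fires(trigger: dict, findings: dict) -> bool:
    if "all_of" in trigger:
        return all(clause_holds(c, findings) for c in trigger["all_of"])
    if "any_of" in trigger:
        return any(clause_holds(c, findings) for c in trigger["any_of"])
    if "none_of" in trigger:
        return not any(clause_holds(c, findings) for c in trigger["none_of"])
    return False

# Hypothetical encoding of RF-BULKY-DISEASE (nodal mass > 7 cm):
rf_bulky = {"any_of": [{"field": "dominant_nodal_mass_cm", "gt": 7}]}
print(trigger_fires(rf_bulky, {"dominant_nodal_mass_cm": 9.0}))  # True
```

The same three clause types compose arbitrarily, which is what lets 381 flags stay declarative rather than hand-coded.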

244
Treatment regimens
302 indications (173 first-line · 129 2L+)

Each regimen is a list of drugs with doses, cycle schedule, dose adjustments (renal impairment, FIB-4, frailty), premedications, mandatory supportive care, and a monitoring schedule. Every Indication (the combination of disease + line_of_therapy + biomarker / stage / demographic filters) gates a specific Regimen — e.g. MGMT-METHYLATION for GBM Stupp; CD79B / COO-Hans / IPI for DLBCL R-CHOP vs Pola-R-CHP; t(11;14) / MIPI for MCL; MYC + BCL2 rearrangements for HGBL-DH; AFP for HCC; FLIPI for FL.

216
Drugs

ATC/RxNorm-coded. Each carries its FDA/EMA/local-MoH regulatory status plus reimbursement flags (e.g. daratumumab is currently NOT reimbursed by Ukraine's NHSU — a hard blocker for D-VRd, surfaced explicitly in the plan).

95
Tests / procedures

LOINC-coded labs + imaging + histology + IHC + genomic tests. Each carries a priority_class (critical / standard / desired / calculation_based) — rendered in every Plan as the "pre-treatment investigations" table.
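Ordering that table is a simple sort over priority_class. The class values come from the text above; the ranking and the test IDs are illustrative assumptions:

```python
# Sketch: ordering tests for the "pre-treatment investigations" table by
# priority_class. The four class values are from the KB spec; the numeric
# ranking and the TEST-IDs below are invented for illustration.
PRIORITY_RANK = {"critical": 0, "standard": 1, "desired": 2, "calculation_based": 3}

def order_tests(tests: list[dict]) -> list[dict]:
    return sorted(tests, key=lambda t: PRIORITY_RANK[t["priority_class"]])

tests = [
    {"id": "TEST-FIB4", "priority_class": "calculation_based"},
    {"id": "TEST-BIOPSY-IHC", "priority_class": "critical"},
    {"id": "TEST-LDH", "priority_class": "standard"},
]
print([t["id"] for t in order_tests(tests)])
# → ['TEST-BIOPSY-IHC', 'TEST-LDH', 'TEST-FIB4']
```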

261
Sources (top-level guidelines + RCTs)
NCCN · ESMO · EHA · BSH · EASL · MoH UA · WHO · CTCAE · FDA

Behind these 261 curated sources sit 29,450+ primary clinical publications (RCTs, meta-analyses, cohort studies) and 11,611 pages of guideline text. The NCCN B-Cell Lymphomas guideline alone is ~500 pages with ~700 references. Every Indication / Regimen / RedFlag cites specific sources with a position (supports / contradicts / context), a paraphrased quote, and a page/section locator. This satisfies FDA Criterion 4: the clinician can independently verify the basis for every recommendation.

14
Specifications

CHARTER (governance + FDA positioning), CLINICAL_CONTENT_STANDARDS, KNOWLEDGE_SCHEMA, DATA_STANDARDS, SOURCE_INGESTION, REFERENCE_CASE, MDT_ORCHESTRATOR, DIAGNOSTIC_MDT, WORKUP_METHODOLOGY, SKILL_ARCHITECTURE.

Snapshot as of 2026-04-27 10:44 UTC. Reviewer sign-offs ≥ 2: 15/298 — all clinical content is marked STUB until two of three Clinical Co-Leads sign it off. This is a decision-support tool, not a medical device.

1. How a request is processed

The clinician feeds the engine a JSON patient profile (FHIR/mCODE-aligned in the future, a simplified dict in the MVP). The engine runs six sequential stages and returns a Plan with ≥2 alternative tracks (CHARTER §2 — both tracks live in the same document; alternatives are never hidden).

Stage 1
Disease + Algorithm resolve
disease.icd_o_3_morphology or disease.id → look up the Disease entity. Disease + line_of_therapy → look up the matching Algorithm.
Stage 2
Findings flattening
Merges demographics + biomarkers + findings into one flat dict for evaluation.
Stage 3
RedFlag evaluation
Each of the 381 RedFlags is checked against the findings. Boolean rule engine: any_of / all_of / none_of clauses with thresholds.
Stage 4
Algorithm walk
Decision tree, step by step. Each step → outcome → branch (result or next_step). The trace records every fired_red_flags entry at each step.
Stage 5
Tracks materialization
ALL Indications in algorithm.output_indications become their own tracks (standard / aggressive / surveillance). The first is the default; the rest are alternatives.
Stage 6
Per-track resolution
Indication → Regimen → MonitoringSchedule + SupportiveCare + Contraindications. Everything resolves from the read-only KB.

Per-patient processing time is 50–200 ms (KB load dominates). In Pyodide the first run takes 8–15 sec (runtime download); subsequent runs match a local CLI. There is no server — the engine runs locally (CLI) or in the user's browser (Pyodide). The patient JSON never leaves the machine.
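Stage 4 — the algorithm walk — is the heart of the pipeline and can be condensed into a toy walker. The node shape, the callable `decide`, and the `RESULT:` prefix are assumptions made for this sketch, not the engine's real schema:

```python
# Toy sketch of Stage 4 (algorithm walk): each step evaluates the findings,
# picks an outcome, and either yields a result or branches to the next step.
# The trace records every step, mirroring the Plan's trace field.

def walk_algorithm(steps: dict, start: str, findings: dict, fired: list[str]):
    trace, step_id = [], start
    while step_id is not None:
        step = steps[step_id]
        outcome = step["decide"](findings)      # callable → outcome label
        branch = step["branches"][outcome]      # a result or the next step id
        trace.append({"step": step_id, "outcome": outcome,
                      "branch": branch, "fired_red_flags": list(fired)})
        if branch.startswith("RESULT:"):
            return branch, trace
        step_id = branch
    return None, trace

# One-step toy algorithm echoing the HCV-MZL bulky-disease switch:
steps = {
    "check_bulk": {
        "decide": lambda f: "bulky" if f.get("dominant_nodal_mass_cm", 0) > 7 else "non_bulky",
        "branches": {"bulky": "RESULT:BR+DAA", "non_bulky": "RESULT:antiviral_first"},
    },
}
result, trace = walk_algorithm(steps, "check_bulk",
                               {"dominant_nodal_mass_cm": 9}, ["RF-BULKY-DISEASE"])
print(result)  # RESULT:BR+DAA
```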

2. How we work with patient data

The engine reads only structured fields from the patient profile. Each field has a clear semantic role: it either triggers a RedFlag, filters available Indications, or configures Regimen materialization. Unknown fields are ignored — no hidden side effects.

| Category | What we read | How we use it |
| --- | --- | --- |
| Disease (entry point) | disease.id · icd_o_3_morphology · line_of_therapy | determines which Algorithm to run |
| Diagnostic-mode trigger | disease.suspicion.lineage_hint · tissue_locations · presentation | switches output to DiagnosticPlan instead of Plan (workup brief) |
| Demographics | age · sex · ecog · fit_for_transplant · decompensated_cirrhosis · pregnancy_status | filter on Indication.applicable_to.demographic_constraints |
| Biomarkers | any BIO-X from the KB as keys: BIO-CLL-HIGH-RISK-GENETICS, BIO-MM-CYTOGENETICS-HR, BIO-HCV-RNA, … | fire RedFlags, filter Indications |
| Findings | 381+ structured fields — dominant_nodal_mass_cm, ldh_ratio_to_uln, creatinine_clearance_ml_min, blastoid_morphology, tp53_mutation, del_17p, … | thresholds inside RedFlag triggers |
| Prior tests completed | prior_tests_completed: [TEST-IDs] | excludes already-done tests from generated workup_steps |
| Clinical record (free-form) | any clinical_record envelope | not read by the engine — surfaced only by the render layer for context |
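A minimal profile touching each category might look like the following. The field names are taken from the table above; the nesting and all values are invented for illustration:

```python
# Hypothetical minimal patient profile (structure and values are assumptions;
# only the field names come from the documented categories).
profile = {
    "disease": {"id": "HCV-MZL", "line_of_therapy": "1L"},
    "demographics": {"age": 62, "sex": "F", "ecog": 1},
    "biomarkers": {"BIO-HCV-RNA": True},
    "findings": {"dominant_nodal_mass_cm": 9.0, "ldh_ratio_to_uln": 1.4},
    "prior_tests_completed": [],  # list of TEST-IDs already done (empty here)
    "clinical_record": "free text — ignored by the engine, shown by the render layer",
}
```

Unknown keys anywhere in this dict are ignored, so a richer EHR export can be passed through unmodified.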

3. How we work with sources — and why this is our key advantage

29,450+
29,450+ primary clinical publications (RCTs, meta-analyses, cohort studies) sit beneath 261 curated top-level guidelines — 11,611 pages of guideline text in total. No clinician can physically work through that volume for every patient; the engine indexes it and returns a Plan with paraphrased citations and page-level locators.

Each source is its own Source entity with a stable ID (e.g. SRC-NCCN-BCELL-2025), title, version, license, and access mode (referenced vs hosted per SOURCE_INGESTION_SPEC §1.4). The KB currently holds 261 sources:

| Source ID | Title | Pages | Primary refs | Role in the corpus |
| --- | --- | --- | --- | --- |
| SRC-NCCN-BCELL-2025 | NCCN B-Cell Lymphomas v.2.2025 | 500 | 700 | primary_guideline |
| SRC-NCCN-MM-2025 | NCCN Multiple Myeloma 2025 | 400 | 600 | primary_guideline |
| SRC-NCCN-AML-2025 | NCCN AML 2025 | 350 | 500 | primary_guideline |
| SRC-NCCN-MPN-2025 | NCCN MPN 2025 | 300 | 400 | primary_guideline |
| SRC-EASL-HCV-2023 | EASL HCV Guidelines 2023 | 80 | 250 | primary_guideline |
| SRC-WHO-LNSC-2023 | WHO Lymph Node, Spleen, Thymus Cytopathology | 150 | 200 | diagnostic_methodology |
| SRC-CTCAE-V5 | NCI CTCAE v5.0 (toxicity terminology) | 150 | 30 | terminology |
| SRC-ESMO-MZL-2024 | ESMO Marginal Zone Lymphomas 2024 | 30 | 150 | primary_guideline |
| SRC-BSH-MZL-2024 | BSH MZL Guideline 2024 | 50 | 120 | regional_guideline |
| SRC-EHA-WORKUP-2024 | EHA Practical Workup Guidelines 2024 | 40 | 100 | diagnostic_methodology |
| SRC-MOZ-UA-LYMPH-2024 | Ukraine MoH — Lymphomas (placeholder) | 60 | 50 | regional_guideline |
| SRC-ARCAINI-2014 | IELSG-19 RCT — MALT lymphoma | 10 | 50 | rct_publication |
| SRC-FDA-CDS-2026 | FDA CDS Software Guidance 2026 | 30 | 20 | regulatory |
| Total | | 11,611 | 29,450+ | |

Every clinical claim in the KB has a citation. Indication, Regimen, RedFlag, Algorithm — all carry a sources: list field where each source is annotated with:

Citation structure

  • source_id — points to the Source entity
  • position — supports / contradicts / context
  • relevant_quote_paraphrase — paraphrased statement from the guideline (not verbatim copy-paste, for license safety)
  • page_or_section — exact locator inside the document

The render layer surfaces the full list of cited sources below every Indication in the Plan. This is FDA Criterion 4 (CHARTER §15.2): the clinician can independently verify the basis for every recommendation, instead of taking the engine on faith.
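A citation entry might look like the following YAML sketch. The four keys are the ones listed above; the nesting, the section locator, and the paraphrase text are invented for illustration (the paraphrase echoes the HCV-MZL antiviral-first logic described earlier on this page):

```yaml
# Hypothetical sources: entry on an Indication — structure assumed, values invented
sources:
  - source_id: SRC-ESMO-MZL-2024
    position: supports
    relevant_quote_paraphrase: >
      Antiviral therapy is the preferred initial approach for HCV-associated
      MZL without bulky disease.
    page_or_section: "§4.2"
```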

Source hosting defaults to referenced. We do not mirror external databases (NCCN, ESMO, etc.) — we link to them. Hosting requires an explicit H1–H5 justification (CHARTER §15.2, referenced vs hosted vs mixed). Exception: regulatory PDFs (FDA CDS, CTCAE) are kept locally for archive stability.

4. How we run requests

Three ways to run the engine — none of them server-bound:

CLI
Locally on the clinician's machine

python -m knowledge_base.engine.cli --patient profile.json --render plan.html. Works offline, no network needed. The profile stays on disk.

Pyodide
In the browser (try.html)

Pyodide v0.26.4 loads the Python WebAssembly runtime, micropip installs pydantic + pyyaml, and the engine bundle (~302 KB) is unpacked into the in-memory FS. The engine runs in the browser. Patient JSON never leaves the machine.

Library
Python import

from knowledge_base.engine import generate_plan, revise_plan — integration with EHRs, CSV pipelines, batch testing. Stateless and deterministic.
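The stateless/deterministic claim can be checked with a small harness. `fake_generate_plan` below is a stand-in stub; in real use you would import generate_plan from knowledge_base.engine instead:

```python
# Sketch: verifying "same input + same KB = same output" for any plan function.
# fake_generate_plan is a stand-in stub, not the real engine.
import json

def fake_generate_plan(profile: dict) -> dict:
    return {"tracks": [{"indication": profile["disease"]["id"] + "-1L"}],
            "knowledge_base_state": {"algorithm_version": "0.1.0"}}

def is_deterministic(plan_fn, profile: dict, runs: int = 3) -> bool:
    """Run the plan function several times and check the outputs are identical."""
    outputs = {json.dumps(plan_fn(profile), sort_keys=True) for _ in range(runs)}
    return len(outputs) == 1

print(is_deterministic(fake_generate_plan, {"disease": {"id": "HCV-MZL"}}))  # True
```

The same harness works in batch pipelines: run it once per KB release as a smoke test before regenerating plans.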

Privacy by design. Patient JSON never leaves the user's machine. There are no logs, no database, no accidental leakage. Reproducibility: Plan.knowledge_base_state.algorithm_version pins the KB version — same input + same KB = same output.

5. What we return

The engine returns a Plan (treatment mode) or a DiagnosticPlan (workup brief). Each Plan contains:

| Field | Contents |
| --- | --- |
| tracks[] | ≥2 alternative tracks (default first), each with indication + regimen + monitoring + supportive_care + contraindications |
| fda_compliance | FDA Criterion 4 fields: intended_use, hcp_user_specification, patient_population_match, algorithm_summary, data_sources_summary, data_limitations, automation_bias_warning |
| trace | step-by-step history of walk_algorithm: step / outcome / branch / fired_red_flags for every step |
| knowledge_base_state | snapshot of the KB version at generation time (audit per CHARTER §10.2) |
| kb_resolved | all referenced entities (Disease, Tests, RedFlags, Algorithm) for the render layer |
| warnings | schema/ref errors, time-critical disqualifications, missing-data hints |
| supersedes / superseded_by | version chain across plans for the same patient |

Optionally, an MDT brief is added by orchestrate_mdt(): it reads the Plan + patient profile and appends required / recommended / optional roles (drawn from the 16 virtual specialists), open questions, and a provenance graph. It renders as an inline section inside the Plan HTML.

6. How a plan updates when new data arrives

revise_plan(updated_patient, previous_plan, revision_trigger) takes the updated profile and produces a new plan version with a supersedes / superseded_by chain. Three legal transitions plus one prohibition:

| From | With changes | Transition | Result |
| --- | --- | --- | --- |
| DiagnosticPlan vN | only suspicion (no histology) | diagnostic → diagnostic | DiagnosticPlan v(N+1) |
| DiagnosticPlan vN | confirmed histology | diagnostic → treatment (promotion) | Plan v1 (first treatment) |
| Plan vN | any update with histology present | treatment → treatment | Plan v(N+1) |
| Plan vN | histology removed | ILLEGAL | ValueError, CHARTER §15.2 C7 |

The previous plan is not mutated — a deep copy is returned with superseded_by filled in. The caller (CLI or EHR) decides what to do with both versions. Per CHARTER §10.2, the older version is kept indefinitely.

What triggers a new plan: any change to one of the ~30 structured fields — new biomarkers (del(17p) detected), new stage (ECOG 1 → 3), new findings (bulky disease on restaging), new infectious flags (HBV reactivation), pregnancy detected. Changes to the clinical_record free-text do not trigger regeneration (the engine does not read free text).
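The transition rules reduce to a small guard. This is a toy re-implementation for illustration (the real revise_plan also deep-copies the previous plan and fills the supersedes / superseded_by chain):

```python
# Toy sketch of the revision transition rules:
#   diagnostic → diagnostic, diagnostic → treatment (promotion), and
#   treatment → treatment are legal; removing histology from a treatment-mode
#   patient raises ValueError per CHARTER §15.2 C7.

def next_plan_kind(previous_kind: str, has_histology: bool) -> str:
    if previous_kind == "DiagnosticPlan":
        # promotion happens the moment histology is confirmed
        return "Plan" if has_histology else "DiagnosticPlan"
    if previous_kind == "Plan":
        if not has_histology:
            raise ValueError("CHARTER §15.2 C7: histology cannot be removed")
        return "Plan"
    raise ValueError(f"unknown plan kind: {previous_kind}")

print(next_plan_kind("DiagnosticPlan", True))   # Plan (promotion)
print(next_plan_kind("Plan", True))             # Plan (v N+1)
```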

7. What else we do

24
Diagnostic workups

Pre-biopsy mode: when histology is missing, the engine emits a Workup Brief with the test list, biopsy approach, IHC panel, and triage MDT roles. Per CHARTER §15.2 C7, no treatment Plan is generated without histology.

MDT
Multidisciplinary brief

The orchestrator reads the Plan + profile and assigns required / recommended / optional roles (from the 16-specialist catalog), formulates open questions (Q1–Q6 + DQ1–DQ4), and builds the decision-provenance graph.

stats
KB dashboard

python -m knowledge_base.stats — actual entity counts + per-disease coverage matrix + reviewer sign-off ratio. Available as CLI / JSON / embeddable HTML widget for the landing page.

render
A4 print-friendly HTML

Single-file HTML with embedded CSS, no external assets beyond Google Fonts. Adapts to A4 print via @page and @media print. Tracks are shown side by side; the alternative is never hidden (guarding against automation bias per CHARTER §15.2 C6).

trials
Experimental options (Phase C — done)

enumerate_experimental_options(...) queries ClinicalTrials.gov v2 by disease + biomarker + line_of_therapy and returns ExperimentalOption entries with UA-availability metadata (sites_ua, countries) — render-time only; the engine never uses a trial as a selection signal. Status filter: RECRUITING / ACTIVE_NOT_RECRUITING / ENROLLING_BY_INVITATION; integrated into generate_plan, rendered as a third Plan track; 7-day on-disk TTL cache for offline runs.
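Building such a query can be sketched as URL construction against the public v2 API. The parameter names (query.cond, query.term, filter.overallStatus, pageSize) follow the published /api/v2/studies endpoint; how the engine actually maps disease / biomarker / line_of_therapy onto query terms is an assumption here:

```python
# Sketch: building a ClinicalTrials.gov v2 query URL for the experimental track.
# The disease→condition and biomarker→term mapping is illustrative only.
from urllib.parse import urlencode

def trials_query_url(condition: str, term: str, statuses: list[str]) -> str:
    params = {
        "query.cond": condition,             # disease, as a condition search
        "query.term": term,                  # biomarker or free-text refinement
        "filter.overallStatus": ",".join(statuses),
        "pageSize": 25,
    }
    return "https://clinicaltrials.gov/api/v2/studies?" + urlencode(params)

url = trials_query_url(
    "marginal zone lymphoma", "HCV",
    ["RECRUITING", "ACTIVE_NOT_RECRUITING", "ENROLLING_BY_INVITATION"])
print(url)
```

Keeping the query a pure function of (disease, biomarker, line) is what makes the 7-day on-disk cache safe to key by URL.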

access
Access Matrix (Phase D)

Every Plan carries an AccessMatrix — per-track aggregation of UA registration, НСЗУ coverage, ₴ cost orientation, and primary AccessPathway. Render-time only (CHARTER §8.3, guaranteed by the invariant test). Stale-cost warning fires when cost_last_updated > 180 days. Trial rows are appended automatically when the experimental track is active.