Capabilities
OpenOnco takes a JSON patient profile and returns a structured treatment plan or a diagnostic brief, with a full trace of every decision and citations for every claim. There is no black box: every recommendation comes from declarative rules you can read in the knowledge base and follow through the trace. The rest of this page details how we work with data, sources, and requests.
One JSON profile → two alternative treatment plans, a citation under every recommendation.
A declarative rule engine across 65 diseases, backed by 29,450+ clinical publications under 261 top-level guidelines. No LLM in the clinical decision, no server, no logs. Patient JSON never leaves the machine.
The input profile carries disease, biomarkers, findings, and line_of_therapy; revise_plan(...) updates recommendations as new biomarkers and findings come in.
What's already in the knowledge base
Each disease carries an archetype (etiologically_driven like HCV-MZL, risk_stratified like MM, biomarker_driven like NSCLC/EGFR or HGBL/MYC-BCL2, stage_driven like cervical) which determines the algorithm logic for treatment selection. 1L is covered for all 65 diseases (65 algorithms); 2L+ is covered for 34 hematologic and 9 solid diseases (45 algorithms).
Hematologist, pathologist, infectious-disease/hepatologist, radiologist, molecular geneticist, clinical pharmacist, radiation oncologist, palliative care and others — each is activated by specific triggers in the patient profile and contributes its own open questions and supportive-care recommendations to the plan.
When histology is not yet confirmed (CHARTER §15.2 C7 forbids a treatment Plan without it), the engine switches into diagnostic mode: it emits a Workup Brief with the test list, biopsy approach, IHC panel, and the roles required in the triage MDT. Once histology is confirmed, the diagnostic plan is promoted to a treatment plan via revise_plan(...).
Red flags are structured clinical conditions that automatically change the plan: RF-BULKY-DISEASE (nodal mass >7 cm) switches HCV-MZL from antiviral-first to BR + DAA; RF-MM-HIGH-RISK-CYTOGENETICS (t(4;14), del(17p), gain 1q) escalates MM from triplet VRd to quadruplet D-VRd. Every RedFlag is bound to a domain role that "catches" it in the MDT brief.
Each regimen is a list of drugs with doses, cycle schedule, dose adjustments (renal impairment, FIB-4, frailty), premedications, mandatory supportive care, and a monitoring schedule. Every Indication (the combination of disease + line_of_therapy + biomarker / stage / demographic filters) gates a specific Regimen — e.g. MGMT-METHYLATION for GBM Stupp; CD79B / COO-Hans / IPI for DLBCL R-CHOP vs Pola-R-CHP; t(11;14) / MIPI for MCL; MYC + BCL2 rearrangements for HGBL-DH; AFP for HCC; FLIPI for FL.
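As a minimal sketch of how an Indication gates a Regimen, assuming a simplified, hypothetical field layout (the real schema lives in KNOWLEDGE_SCHEMA), an Indication can be modeled as a disease + line + biomarker filter:

```python
# Sketch of Indication gating. Field names and the BIO- ID below are
# illustrative simplifications, not the engine's actual schema.
def indication_matches(indication: dict, patient: dict) -> bool:
    """Return True if every filter on the Indication is satisfied."""
    if indication["disease"] != patient["disease"]:
        return False
    if indication["line_of_therapy"] != patient["line_of_therapy"]:
        return False
    # Every required biomarker must be present and positive in the profile.
    return all(patient.get("biomarkers", {}).get(b)
               for b in indication["required_biomarkers"])

# e.g. MGMT methylation gating the Stupp regimen for GBM (illustrative IDs)
stupp = {
    "disease": "GBM",
    "line_of_therapy": "1L",
    "required_biomarkers": ["BIO-GBM-MGMT-METHYLATION"],
    "regimen": "Stupp",
}
patient = {
    "disease": "GBM",
    "line_of_therapy": "1L",
    "biomarkers": {"BIO-GBM-MGMT-METHYLATION": True},
}
print(indication_matches(stupp, patient))  # True
```

A patient without the methylation marker fails the same filter, so the Stupp track is simply never materialized for that profile.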
Drugs are ATC/RxNorm-coded. Each carries its FDA/EMA/local-MoH regulatory status plus reimbursement flags (e.g. daratumumab is currently NOT reimbursed by Ukraine's NHSU — a hard blocker for D-VRd, surfaced explicitly in the plan).
Tests are LOINC-coded labs + imaging + histology + IHC + genomic tests. Each carries a priority_class (critical / standard / desired / calculation_based) — rendered in every Plan as the "pre-treatment investigations" table.
Behind these 261 curated sources sit 29,450+ primary clinical publications (RCTs, meta-analyses, cohort studies) and 11,611 pages of guideline text. The NCCN B-Cell Lymphomas guideline alone is ~500 pages with ~700 references. Every Indication / Regimen / RedFlag cites specific sources with a position (supports / contradicts / context), a paraphrased quote, and page/section locator. FDA Criterion 4 — the clinician independently verifies the basis for every recommendation.
CHARTER (governance + FDA positioning), CLINICAL_CONTENT_STANDARDS, KNOWLEDGE_SCHEMA, DATA_STANDARDS, SOURCE_INGESTION, REFERENCE_CASE, MDT_ORCHESTRATOR, DIAGNOSTIC_MDT, WORKUP_METHODOLOGY, SKILL_ARCHITECTURE.
Stats snapshot: 2026-04-27 10:44 UTC.
Reviewer sign-offs ≥ 2: 15/298 — all clinical content is marked STUB until two of three Clinical Co-Leads sign it off. This is a decision-support tool, not a medical device.
1. How a request is processed
The clinician feeds the engine a JSON patient profile (FHIR/mCODE-aligned in the future, a simplified dict in the MVP). The engine runs six sequential stages and returns a Plan with ≥2 alternative tracks (CHARTER §2 — both tracks live in the same document; alternatives are never hidden).
1. Resolve the disease: disease.icd_o_3_morphology or disease.id → look up the Disease entity.
2. Resolve the algorithm: Disease + line_of_therapy → look up the matching Algorithm.
3. Flatten the profile: merge demographics + biomarkers + findings into one flat dict for evaluation.
4. Evaluate triggers: any_of / all_of / none_of clauses with thresholds.
5. Walk the algorithm: each step yields a result or a next_step; the trace records every fired_red_flags entry at each step.
6. Materialize tracks: algorithm.output_indications become their own tracks (standard / aggressive / surveillance). The first is the default; the rest are alternatives.
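The any_of / all_of / none_of evaluation over the flattened profile can be sketched as follows (clause shape and helper names are illustrative, not the engine's actual schema):

```python
# Hedged sketch of trigger-clause evaluation over the flattened profile.
# The clause format ({"field": ..., "gt": ...}) is hypothetical.
def check(cond: dict, profile: dict) -> bool:
    """One threshold condition, e.g. {"field": "dominant_nodal_mass_cm", "gt": 7}."""
    v = profile.get(cond["field"])
    if v is None:
        return False
    if "gt" in cond:
        return v > cond["gt"]
    if "eq" in cond:
        return v == cond["eq"]
    return bool(v)  # bare presence check

def evaluate(trigger: dict, profile: dict) -> bool:
    ok = True
    if "all_of" in trigger:
        ok = ok and all(check(c, profile) for c in trigger["all_of"])
    if "any_of" in trigger:
        ok = ok and any(check(c, profile) for c in trigger["any_of"])
    if "none_of" in trigger:
        ok = ok and not any(check(c, profile) for c in trigger["none_of"])
    return ok

# RF-BULKY-DISEASE fires on a nodal mass > 7 cm
flat = {"dominant_nodal_mass_cm": 9.0}
trigger = {"any_of": [{"field": "dominant_nodal_mass_cm", "gt": 7}]}
print(evaluate(trigger, flat))  # True
```

A mass of 5 cm would leave the same trigger silent, and a none_of clause inverts the check, which is how exclusion criteria are expressed.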
Per-patient processing time is 50–200 ms (KB load dominates). In Pyodide the first run takes 8–15 sec (runtime download); subsequent runs match a local CLI. There is no server — the engine runs locally (CLI) or in the user's browser (Pyodide). The patient JSON never leaves the machine.
2. How we work with patient data
The engine reads only structured fields from the patient profile. Each field has a clear semantic role: it either triggers a RedFlag, filters available Indications, or configures Regimen materialization. Unknown fields are ignored — no hidden side effects.
| Category | What we read | How we use it |
|---|---|---|
| Disease (entry point) | disease.id · icd_o_3_morphology · line_of_therapy | determines which Algorithm to run |
| Diagnostic-mode trigger | disease.suspicion.lineage_hint · tissue_locations · presentation | switches output to DiagnosticPlan instead of Plan (workup brief) |
| Demographics | age · sex · ecog · fit_for_transplant · decompensated_cirrhosis · pregnancy_status | filter on Indication.applicable_to.demographic_constraints |
| Biomarkers | any BIO-X from the KB as keys: BIO-CLL-HIGH-RISK-GENETICS, BIO-MM-CYTOGENETICS-HR, BIO-HCV-RNA, … | fire RedFlags, filter Indications |
| Findings | 381+ structured fields — dominant_nodal_mass_cm, ldh_ratio_to_uln, creatinine_clearance_ml_min, blastoid_morphology, tp53_mutation, del_17p, … | thresholds inside RedFlag triggers |
| Prior tests completed | prior_tests_completed: [TEST-IDs] | excludes already-done tests from generated workup_steps |
| Clinical record (free-form) | any clinical_record envelope | not read by the engine — surfaced only by the render layer for context |
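Putting the categories above together, a minimal profile might look like this (a sketch: the disease ID and all values are hypothetical, and the MVP accepts a simplified dict rather than full FHIR/mCODE):

```python
import json

# Hypothetical minimal patient profile using the field categories above.
profile = {
    "disease": {"id": "DIS-HCV-MZL", "line_of_therapy": "1L"},  # entry point
    "demographics": {"age": 62, "sex": "F", "ecog": 1},
    "biomarkers": {"BIO-HCV-RNA": True},
    "findings": {
        "dominant_nodal_mass_cm": 8.5,       # would fire RF-BULKY-DISEASE (>7 cm)
        "creatinine_clearance_ml_min": 74,
    },
    "prior_tests_completed": [],
}

# The profile is plain JSON, so it round-trips losslessly to disk.
assert json.loads(json.dumps(profile)) == profile
```

Unknown top-level keys would simply be ignored by the engine, per the contract above.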
3. How we work with sources — and why this is our key advantage
Each source is its own Source entity with a stable ID (e.g. SRC-NCCN-BCELL-2025), title, version, license, and access mode (referenced vs hosted per SOURCE_INGESTION_SPEC §1.4). The KB currently holds 261 sources; representative entries:
| Source ID | Title | Pages | Primary refs | Role in the corpus |
|---|---|---|---|---|
| SRC-NCCN-BCELL-2025 | NCCN B-Cell Lymphomas v.2.2025 | 500 | 700 | primary_guideline |
| SRC-NCCN-MM-2025 | NCCN Multiple Myeloma 2025 | 400 | 600 | primary_guideline |
| SRC-NCCN-AML-2025 | NCCN AML 2025 | 350 | 500 | primary_guideline |
| SRC-NCCN-MPN-2025 | NCCN MPN 2025 | 300 | 400 | primary_guideline |
| SRC-EASL-HCV-2023 | EASL HCV Guidelines 2023 | 80 | 250 | primary_guideline |
| SRC-WHO-LNSC-2023 | WHO Lymph Node, Spleen, Thymus Cytopathology | 150 | 200 | diagnostic_methodology |
| SRC-CTCAE-V5 | NCI CTCAE v5.0 (toxicity terminology) | 150 | 30 | terminology |
| SRC-ESMO-MZL-2024 | ESMO Marginal Zone Lymphomas 2024 | 30 | 150 | primary_guideline |
| SRC-BSH-MZL-2024 | BSH MZL Guideline 2024 | 50 | 120 | regional_guideline |
| SRC-EHA-WORKUP-2024 | EHA Practical Workup Guidelines 2024 | 40 | 100 | diagnostic_methodology |
| SRC-MOZ-UA-LYMPH-2024 | Ukraine MoH — Lymphomas (placeholder) | 60 | 50 | regional_guideline |
| SRC-ARCAINI-2014 | IELSG-19 RCT — MALT lymphoma | 10 | 50 | rct_publication |
| SRC-FDA-CDS-2026 | FDA CDS Software Guidance 2026 | 30 | 20 | regulatory |
| Total | — | 11,611 | 29,450+ | — |
Every clinical claim in the KB has a citation. Indication, Regimen, RedFlag, Algorithm — all carry a sources: list field where each source is annotated with:
Citation structure
- source_id — points to the Source entity
- position — supports / contradicts / context
- relevant_quote_paraphrase — paraphrased statement from the guideline (not verbatim copy-paste, for license safety)
- page_or_section — exact locator inside the document
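A citation entry with those four fields can be sketched in Python as follows (the locator value and paraphrase are illustrative, and the validity check is a simplified stand-in for the KB's schema validation):

```python
# Sketch of a citation entry. Field names follow the list above;
# the locator "MZL-3" and the paraphrase text are hypothetical examples.
ALLOWED_POSITIONS = {"supports", "contradicts", "context"}

citation = {
    "source_id": "SRC-NCCN-BCELL-2025",
    "position": "supports",
    "relevant_quote_paraphrase": "BR is an accepted first-line option for advanced MZL.",
    "page_or_section": "MZL-3",
}

def is_valid_citation(c: dict) -> bool:
    """All four fields present, and position restricted to the allowed set."""
    required = {"source_id", "position", "relevant_quote_paraphrase", "page_or_section"}
    return required <= c.keys() and c["position"] in ALLOWED_POSITIONS

print(is_valid_citation(citation))  # True
```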
The render layer surfaces the full list of cited sources below every Indication in the Plan. This is FDA Criterion 4 (CHARTER §15.2): the clinician can independently verify the basis for every recommendation, instead of taking the engine on faith.
4. How we run requests
Three ways to run the engine — none of them server-bound:
CLI: python -m knowledge_base.engine.cli --patient profile.json --render plan.html. Works offline, no network needed. The profile stays on disk.
Browser: Pyodide v0.26.4 loads the Python WebAssembly runtime, micropip installs pydantic + pyyaml, and the engine bundle (~302 KB) is unpacked into the in-memory FS. The engine runs in the browser. Patient JSON never leaves the machine.
Python API: from knowledge_base.engine import generate_plan, revise_plan — integration with EHRs, CSV pipelines, batch testing. Stateless and deterministic. Plan.knowledge_base_state.algorithm_version pins the KB version — same input + same KB = same output.
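Determinism makes plans cacheable: a content hash of the profile plus the pinned KB version is a stable cache key. A minimal sketch (this helper is illustrative, not part of the engine API):

```python
import hashlib
import json

# Illustrative cache-key helper: same profile + same KB version = same plan,
# so hashing both yields a stable key for memoizing batch runs.
def plan_cache_key(patient: dict, algorithm_version: str) -> str:
    # sort_keys gives a canonical serialization regardless of dict order.
    canonical = json.dumps(patient, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(f"{algorithm_version}:{canonical}".encode()).hexdigest()

key1 = plan_cache_key({"disease": {"id": "DIS-MM"}}, "kb-2026-04-27")
key2 = plan_cache_key({"id": {"disease": "DIS-MM"}}, "kb-2026-04-27")
print(key1 != key2)  # different inputs, different keys
```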
5. What we return
The engine returns a Plan (treatment mode) or a DiagnosticPlan (workup brief). Each Plan contains:
| Field | Contents |
|---|---|
| tracks[] | ≥2 alternative tracks (default first), each with indication + regimen + monitoring + supportive_care + contraindications |
| fda_compliance | FDA Criterion 4 fields: intended_use, hcp_user_specification, patient_population_match, algorithm_summary, data_sources_summary, data_limitations, automation_bias_warning |
| trace | step-by-step history of walk_algorithm: step / outcome / branch / fired_red_flags for every step |
| knowledge_base_state | snapshot of the KB version at generation time (audit per CHARTER §10.2) |
| kb_resolved | all referenced entities (Disease, Tests, RedFlags, Algorithm) for the render layer |
| warnings | schema/ref errors, time-critical disqualifications, missing-data hints |
| supersedes / superseded_by | version chain across plans for the same patient |
Optionally, an MDT brief is added by orchestrate_mdt(): it reads the Plan + patient profile and appends required / recommended / optional roles (drawn from 16 virtual specialists), open questions, and a provenance graph. It renders as an inline section inside the Plan HTML.
6. How a plan updates when new data arrives
revise_plan(updated_patient, previous_plan, revision_trigger) takes the updated profile and produces a new plan version with a supersedes / superseded_by chain. Three legal transitions plus one prohibition:
| From | With changes | Transition | Result |
|---|---|---|---|
| DiagnosticPlan vN | only suspicion (no histology) | diagnostic → diagnostic | DiagnosticPlan v(N+1) |
| DiagnosticPlan vN | confirmed histology | diagnostic → treatment (promotion) | Plan v1 (first treatment) |
| Plan vN | any update with histology present | treatment → treatment | Plan v(N+1) |
| Plan vN | histology removed | ILLEGAL: ValueError, CHARTER §15.2 C7 | — |
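The transition table can be enforced by a small guard. A sketch, with each plan reduced to two flags (this is an illustration of the rule, not the engine's implementation):

```python
# Illustrative guard for the revision transitions above.
# previous_is_diagnostic: was the prior version a DiagnosticPlan?
# histology_was_confirmed / histology_confirmed: before vs after the update.
def check_transition(previous_is_diagnostic: bool,
                     histology_was_confirmed: bool,
                     histology_confirmed: bool) -> str:
    if histology_was_confirmed and not histology_confirmed:
        # CHARTER §15.2 C7: a treatment plan may never regress to diagnostic mode.
        raise ValueError("ILLEGAL: histology cannot be removed (CHARTER §15.2 C7)")
    if previous_is_diagnostic and histology_confirmed:
        return "diagnostic -> treatment (promotion)"
    if previous_is_diagnostic:
        return "diagnostic -> diagnostic"
    return "treatment -> treatment"

print(check_transition(True, False, True))  # diagnostic -> treatment (promotion)
```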
The previous plan is not mutated — a deep copy is returned with superseded_by filled in. The caller (CLI or EHR) decides what to do with both versions. Per CHARTER §10.2, the older version is kept indefinitely. Changes to the clinical_record free text do not trigger regeneration (the engine does not read free text).
7. What else we do
Pre-biopsy mode: when histology is missing, the engine emits a Workup Brief with the test list, biopsy approach, IHC panel, and triage MDT roles. Per CHARTER §15.2 C7, no treatment Plan is generated without histology.
The orchestrator reads the Plan + profile and assigns required / recommended / optional roles (from the 16-specialist catalog), formulates open questions (Q1–Q6 + DQ1–DQ4), and builds the decision-provenance graph.
python -m knowledge_base.stats — actual entity counts + per-disease coverage matrix + reviewer sign-off ratio. Available as CLI / JSON / embeddable HTML widget for the landing page.
Single-file HTML with embedded CSS, no external assets beyond Google Fonts. Adapts to A4 print via @page and @media print. Tracks are shown side by side; the alternative is never hidden (anti-automation-bias per CHARTER §15.2 C6).
enumerate_experimental_options(...) queries ClinicalTrials.gov v2 by disease + biomarker + line_of_therapy and returns ExperimentalOption entries with UA-availability metadata (sites_ua, countries) — render-time only; the engine never uses a trial as a selection signal. Status filter: RECRUITING / ACTIVE_NOT_RECRUITING / ENROLLING_BY_INVITATION; integrated into generate_plan, rendered as a third Plan track; 7-day on-disk TTL cache for offline runs.
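A query against the public ClinicalTrials.gov v2 REST endpoint can be sketched as URL construction (no network call here; the endpoint and parameter names are from the public v2 API, while the helper itself and its argument values are illustrative, not the engine's code):

```python
from urllib.parse import urlencode

# Illustrative builder for a ClinicalTrials.gov v2 /studies query.
def trials_query_url(condition: str, term: str, statuses: list[str]) -> str:
    params = {
        "query.cond": condition,                       # disease / condition
        "query.term": term,                            # biomarker or drug keyword
        "filter.overallStatus": ",".join(statuses),    # recruitment-status filter
        "pageSize": 50,
    }
    return "https://clinicaltrials.gov/api/v2/studies?" + urlencode(params)

url = trials_query_url("multiple myeloma", "daratumumab",
                       ["RECRUITING", "ACTIVE_NOT_RECRUITING"])
print(url.startswith("https://clinicaltrials.gov/api/v2/studies?"))  # True
```

Keeping the fetch behind a 7-day on-disk cache, as described above, means the same URL doubles as the cache key for offline runs.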
Every Plan carries an AccessMatrix — per-track aggregation of UA registration, NHSU coverage, ₴ cost orientation, and primary AccessPathway. Render-time only (CHARTER §8.3, guaranteed by the invariant test). A stale-cost warning fires when cost_last_updated is older than 180 days. Trial rows are appended automatically when the experimental track is active.