Capabilities
OpenOnco takes a JSON patient profile and returns a structured treatment plan or a diagnostic brief, with a full trace of every decision and citations for every claim. There is no black box: every recommendation comes from declarative rules you can read in the knowledge base and follow through the trace. The rest of this page details how we work with data, sources, and requests.
One JSON profile → two alternative treatment plans, a citation under every recommendation.
A declarative rule engine across 65 diseases, backed by 29,450+ clinical publications under 261 top-level guidelines. No LLM in the clinical decision, no server, no logs. Patient JSON never leaves the machine.
The input profile carries disease, biomarkers, findings, and line_of_therapy; revise_plan(...) updates recommendations as new biomarkers and findings come in.
What's already in the knowledge base
Each disease carries an archetype (etiologically_driven like HCV-MZL, risk_stratified like MM, biomarker_driven like NSCLC/EGFR or HGBL/MYC-BCL2, stage_driven like cervical) which determines the algorithm logic for treatment selection. 1L is covered for all 65 diseases (65 algorithms); 2L+ is covered for 34 hematologic and 9 solid diseases (45 algorithms).
Hematologist, pathologist, infectious-disease/hepatologist, radiologist, molecular geneticist, clinical pharmacist, radiation oncologist, palliative care and others — each is activated by specific triggers in the patient profile and contributes its own open questions and supportive-care recommendations to the plan.
When histology is not yet confirmed (CHARTER §15.2 C7 forbids a treatment Plan without it), the engine switches into diagnostic mode: it emits a Workup Brief with the test list, biopsy approach, IHC panel, and the roles required in the triage MDT. Once histology is confirmed, the diagnostic plan is promoted to a treatment plan via revise_plan(...).
Red flags are structured clinical conditions that automatically change the plan: RF-BULKY-DISEASE (nodal mass >7 cm) switches HCV-MZL from antiviral-first to BR + DAA; RF-MM-HIGH-RISK-CYTOGENETICS (t(4;14), del(17p), gain 1q) escalates MM from triplet VRd to quadruplet D-VRd. Every RedFlag is bound to a domain role that "catches" it in the MDT brief.
Each regimen is a list of drugs with doses, cycle schedule, dose adjustments (renal impairment, FIB-4, frailty), premedications, mandatory supportive care, and a monitoring schedule. Every Indication (the combination of disease + line_of_therapy + biomarker / stage / demographic filters) gates a specific Regimen — e.g. MGMT-METHYLATION for GBM Stupp; CD79B / COO-Hans / IPI for DLBCL R-CHOP vs Pola-R-CHP; t(11;14) / MIPI for MCL; MYC + BCL2 rearrangements for HGBL-DH; AFP for HCC; FLIPI for FL.
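As a minimal sketch of how an Indication gates a Regimen, assuming a simplified, hypothetical field layout (the real schema lives in KNOWLEDGE_SCHEMA), an Indication can be modeled as a disease + line + biomarker filter:

```python
# Sketch of Indication gating. Field names and the BIO- ID below are
# illustrative simplifications, not the engine's actual schema.
def indication_matches(indication: dict, patient: dict) -> bool:
    """Return True if every filter on the Indication is satisfied."""
    if indication["disease"] != patient["disease"]:
        return False
    if indication["line_of_therapy"] != patient["line_of_therapy"]:
        return False
    # Every required biomarker must be present and positive in the profile.
    return all(patient.get("biomarkers", {}).get(b)
               for b in indication["required_biomarkers"])

# e.g. MGMT methylation gating the Stupp regimen for GBM (illustrative IDs)
stupp = {
    "disease": "GBM",
    "line_of_therapy": "1L",
    "required_biomarkers": ["BIO-GBM-MGMT-METHYLATION"],
    "regimen": "Stupp",
}
patient = {
    "disease": "GBM",
    "line_of_therapy": "1L",
    "biomarkers": {"BIO-GBM-MGMT-METHYLATION": True},
}
print(indication_matches(stupp, patient))  # True
```

A patient without the methylation marker fails the same filter, so the Stupp track is simply never materialized for that profile.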
Drugs are ATC/RxNorm-coded. Each carries its FDA/EMA/local-MoH regulatory status plus reimbursement flags (e.g. daratumumab is currently NOT reimbursed by Ukraine's NHSU — a hard blocker for D-VRd, surfaced explicitly in the plan).
Tests are LOINC-coded labs + imaging + histology + IHC + genomic tests. Each carries a priority_class (critical / standard / desired / calculation_based) — rendered in every Plan as the "pre-treatment investigations" table.
Behind these 261 curated sources sit 29,450+ primary clinical publications (RCTs, meta-analyses, cohort studies) and 11,611 pages of guideline text. The NCCN B-Cell Lymphomas guideline alone is ~500 pages with ~700 references. Every Indication / Regimen / RedFlag cites specific sources with a position (supports / contradicts / context), a paraphrased quote, and page/section locator. FDA Criterion 4 — the clinician independently verifies the basis for every recommendation.
CHARTER (governance + FDA positioning), CLINICAL_CONTENT_STANDARDS, KNOWLEDGE_SCHEMA, DATA_STANDARDS, SOURCE_INGESTION, REFERENCE_CASE, MDT_ORCHESTRATOR, DIAGNOSTIC_MDT, WORKUP_METHODOLOGY, SKILL_ARCHITECTURE.
Stats snapshot: 2026-04-27 10:44 UTC.
Reviewer sign-offs ≥ 2: 15/298 — all clinical content is marked STUB until two of three Clinical Co-Leads sign it off. This is a decision-support tool, not a medical device.
1. How a request is processed
The clinician feeds the engine a JSON patient profile (FHIR/mCODE-aligned in the future, a simplified dict in the MVP). The engine runs six sequential stages and returns a Plan with ≥2 alternative tracks (CHARTER §2 — both tracks live in the same document; alternatives are never hidden).
1. Resolve the disease: disease.icd_o_3_morphology or disease.id → look up the Disease entity.
2. Resolve the algorithm: Disease + line_of_therapy → look up the matching Algorithm.
3. Flatten the profile: merge demographics + biomarkers + findings into one flat dict for evaluation.
4. Evaluate triggers: any_of / all_of / none_of clauses with thresholds.
5. Walk the algorithm: each step yields a result or a next_step; the trace records every fired_red_flags entry at each step.
6. Materialize tracks: algorithm.output_indications become their own tracks (standard / aggressive / surveillance). The first is the default; the rest are alternatives.
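The any_of / all_of / none_of evaluation over the flattened profile can be sketched as follows (clause shape and helper names are illustrative, not the engine's actual schema):

```python
# Hedged sketch of trigger-clause evaluation over the flattened profile.
# The clause format ({"field": ..., "gt": ...}) is hypothetical.
def check(cond: dict, profile: dict) -> bool:
    """One threshold condition, e.g. {"field": "dominant_nodal_mass_cm", "gt": 7}."""
    v = profile.get(cond["field"])
    if v is None:
        return False
    if "gt" in cond:
        return v > cond["gt"]
    if "eq" in cond:
        return v == cond["eq"]
    return bool(v)  # bare presence check

def evaluate(trigger: dict, profile: dict) -> bool:
    ok = True
    if "all_of" in trigger:
        ok = ok and all(check(c, profile) for c in trigger["all_of"])
    if "any_of" in trigger:
        ok = ok and any(check(c, profile) for c in trigger["any_of"])
    if "none_of" in trigger:
        ok = ok and not any(check(c, profile) for c in trigger["none_of"])
    return ok

# RF-BULKY-DISEASE fires on a nodal mass > 7 cm
flat = {"dominant_nodal_mass_cm": 9.0}
trigger = {"any_of": [{"field": "dominant_nodal_mass_cm", "gt": 7}]}
print(evaluate(trigger, flat))  # True
```

A mass of 5 cm would leave the same trigger silent, and a none_of clause inverts the check, which is how exclusion criteria are expressed.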
Per-patient processing time is 50–200 ms (KB load dominates). In Pyodide the first run takes 8–15 sec (runtime download); subsequent runs match a local CLI. There is no server — the engine runs locally (CLI) or in the user's browser (Pyodide). The patient JSON never leaves the machine.
2. How we work with patient data
The engine reads only structured fields from the patient profile. Each field has a clear semantic role: it either triggers a RedFlag, filters available Indications, or configures Regimen materialization. Unknown fields are ignored — no hidden side effects.
| Category | What we read | How we use it |
|---|---|---|
| Disease (entry point) | disease.id · icd_o_3_morphology · line_of_therapy | determines which Algorithm to run |
| Diagnostic-mode trigger | disease.suspicion.lineage_hint · tissue_locations · presentation | switches output to DiagnosticPlan instead of Plan (workup brief) |
| Demographics | age · sex · ecog · fit_for_transplant · decompensated_cirrhosis · pregnancy_status | filter on Indication.applicable_to.demographic_constraints |
| Biomarkers | any BIO-X from the KB as keys: BIO-CLL-HIGH-RISK-GENETICS, BIO-MM-CYTOGENETICS-HR, BIO-HCV-RNA, … | fire RedFlags, filter Indications |
| Findings | 381+ structured fields — dominant_nodal_mass_cm, ldh_ratio_to_uln, creatinine_clearance_ml_min, blastoid_morphology, tp53_mutation, del_17p, … | thresholds inside RedFlag triggers |
| Prior tests completed | prior_tests_completed: [TEST-IDs] | excludes already-done tests from generated workup_steps |
| Clinical record (free-form) | any clinical_record envelope | not read by the engine — surfaced only by the render layer for context |
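Putting the categories above together, a minimal profile might look like this (a sketch: the disease ID and all values are hypothetical, and the MVP accepts a simplified dict rather than full FHIR/mCODE):

```python
import json

# Hypothetical minimal patient profile using the field categories above.
profile = {
    "disease": {"id": "DIS-HCV-MZL", "line_of_therapy": "1L"},  # entry point
    "demographics": {"age": 62, "sex": "F", "ecog": 1},
    "biomarkers": {"BIO-HCV-RNA": True},
    "findings": {
        "dominant_nodal_mass_cm": 8.5,       # would fire RF-BULKY-DISEASE (>7 cm)
        "creatinine_clearance_ml_min": 74,
    },
    "prior_tests_completed": [],
}

# The profile is plain JSON, so it round-trips losslessly to disk.
assert json.loads(json.dumps(profile)) == profile
```

Unknown top-level keys would simply be ignored by the engine, per the contract above.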
3. How we work with sources — and why this is our key advantage
Each source is its own Source entity with a stable ID (e.g. SRC-NCCN-BCELL-2025), title, version, license, and access mode (referenced vs hosted per SOURCE_INGESTION_SPEC §1.4). The KB currently holds 261 sources; representative entries:
| Source ID | Title | Pages | Primary refs | Role in the corpus |
|---|---|---|---|---|
| SRC-NCCN-BCELL-2025 | NCCN B-Cell Lymphomas v.2.2025 | 500 | 700 | primary_guideline |
| SRC-NCCN-MM-2025 | NCCN Multiple Myeloma 2025 | 400 | 600 | primary_guideline |
| SRC-NCCN-AML-2025 | NCCN AML 2025 | 350 | 500 | primary_guideline |
| SRC-NCCN-MPN-2025 | NCCN MPN 2025 | 300 | 400 | primary_guideline |
| SRC-EASL-HCV-2023 | EASL HCV Guidelines 2023 | 80 | 250 | primary_guideline |
| SRC-WHO-LNSC-2023 | WHO Lymph Node, Spleen, Thymus Cytopathology | 150 | 200 | diagnostic_methodology |
| SRC-CTCAE-V5 | NCI CTCAE v5.0 (toxicity terminology) | 150 | 30 | terminology |
| SRC-ESMO-MZL-2024 | ESMO Marginal Zone Lymphomas 2024 | 30 | 150 | primary_guideline |
| SRC-BSH-MZL-2024 | BSH MZL Guideline 2024 | 50 | 120 | regional_guideline |
| SRC-EHA-WORKUP-2024 | EHA Practical Workup Guidelines 2024 | 40 | 100 | diagnostic_methodology |
| SRC-MOZ-UA-LYMPH-2024 | Ukraine MoH — Lymphomas (placeholder) | 60 | 50 | regional_guideline |
| SRC-ARCAINI-2014 | IELSG-19 RCT — MALT lymphoma | 10 | 50 | rct_publication |
| SRC-FDA-CDS-2026 | FDA CDS Software Guidance 2026 | 30 | 20 | regulatory |
| Total | — | 11,611 | 29,450+ | — |
Every clinical claim in the KB has a citation. Indication, Regimen, RedFlag, Algorithm — all carry a sources: list field where each source is annotated with:
Citation structure
- source_id — points to the Source entity
- position — supports / contradicts / context
- relevant_quote_paraphrase — paraphrased statement from the guideline (not verbatim copy-paste, for license safety)
- page_or_section — exact locator inside the document
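A citation entry with those four fields can be sketched in Python as follows (the locator value and paraphrase are illustrative, and the validity check is a simplified stand-in for the KB's schema validation):

```python
# Sketch of a citation entry. Field names follow the list above;
# the locator "MZL-3" and the paraphrase text are hypothetical examples.
ALLOWED_POSITIONS = {"supports", "contradicts", "context"}

citation = {
    "source_id": "SRC-NCCN-BCELL-2025",
    "position": "supports",
    "relevant_quote_paraphrase": "BR is an accepted first-line option for advanced MZL.",
    "page_or_section": "MZL-3",
}

def is_valid_citation(c: dict) -> bool:
    """All four fields present, and position restricted to the allowed set."""
    required = {"source_id", "position", "relevant_quote_paraphrase", "page_or_section"}
    return required <= c.keys() and c["position"] in ALLOWED_POSITIONS

print(is_valid_citation(citation))  # True
```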
The render layer surfaces the full list of cited sources below every Indication in the Plan. This is FDA Criterion 4 (CHARTER §15.2): the clinician can independently verify the basis for every recommendation, instead of taking the engine on faith.
4. How we run requests
Three ways to run the engine — none of them server-bound:
CLI: python -m knowledge_base.engine.cli --patient profile.json --render plan.html. Works offline, no network needed. The profile stays on disk.
Browser: Pyodide v0.26.4 loads the Python WebAssembly runtime, micropip installs pydantic + pyyaml, and the engine bundle (~302 KB) is unpacked into the in-memory FS. The engine runs in the browser. Patient JSON never leaves the machine.
Python API: from knowledge_base.engine import generate_plan, revise_plan — integration with EHRs, CSV pipelines, batch testing. Stateless and deterministic. Plan.knowledge_base_state.algorithm_version pins the KB version — same input + same KB = same output.
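Determinism makes plans cacheable: a content hash of the profile plus the pinned KB version is a stable cache key. A minimal sketch (this helper is illustrative, not part of the engine API):

```python
import hashlib
import json

# Illustrative cache-key helper: same profile + same KB version = same plan,
# so hashing both yields a stable key for memoizing batch runs.
def plan_cache_key(patient: dict, algorithm_version: str) -> str:
    # sort_keys gives a canonical serialization regardless of dict order.
    canonical = json.dumps(patient, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(f"{algorithm_version}:{canonical}".encode()).hexdigest()

key1 = plan_cache_key({"disease": {"id": "DIS-MM"}}, "kb-2026-04-27")
key2 = plan_cache_key({"id": {"disease": "DIS-MM"}}, "kb-2026-04-27")
print(key1 != key2)  # different inputs, different keys
```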
5. What we return
The engine returns a Plan (treatment mode) or a DiagnosticPlan (workup brief). Each Plan contains:
| Field | Contents |
|---|---|
| tracks[] | ≥2 alternative tracks (default first), each with indication + regimen + monitoring + supportive_care + contraindications |
| fda_compliance | FDA Criterion 4 fields: intended_use, hcp_user_specification, patient_population_match, algorithm_summary, data_sources_summary, data_limitations, automation_bias_warning |
| trace | step-by-step history of walk_algorithm: step / outcome / branch / fired_red_flags for every step |
| knowledge_base_state | snapshot of the KB version at generation time (audit per CHARTER §10.2) |
| kb_resolved | all referenced entities (Disease, Tests, RedFlags, Algorithm) for the render layer |
| warnings | schema/ref errors, time-critical disqualifications, missing-data hints |
| supersedes / superseded_by | version chain across plans for the same patient |
Optionally, an MDT brief is added by orchestrate_mdt(): it reads the Plan + patient profile and appends required / recommended / optional roles (drawn from 16 virtual specialists), open questions, and a provenance graph. It renders as an inline section inside the Plan HTML.
6. How a plan updates when new data arrives
revise_plan(updated_patient, previous_plan, revision_trigger) takes the updated profile and produces a new plan version with a supersedes / superseded_by chain. Three legal transitions plus one prohibition:
| From | With changes | Transition | Result |
|---|---|---|---|
| DiagnosticPlan vN | only suspicion (no histology) | diagnostic → diagnostic | DiagnosticPlan v(N+1) |
| DiagnosticPlan vN | confirmed histology | diagnostic → treatment (promotion) | Plan v1 (first treatment) |
| Plan vN | any update with histology present | treatment → treatment | Plan v(N+1) |
| Plan vN | histology removed | ILLEGAL: ValueError, CHARTER §15.2 C7 | — |
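The transition table can be enforced by a small guard. A sketch, with each plan reduced to two flags (this is an illustration of the rule, not the engine's implementation):

```python
# Illustrative guard for the revision transitions above.
# previous_is_diagnostic: was the prior version a DiagnosticPlan?
# histology_was_confirmed / histology_confirmed: before vs after the update.
def check_transition(previous_is_diagnostic: bool,
                     histology_was_confirmed: bool,
                     histology_confirmed: bool) -> str:
    if histology_was_confirmed and not histology_confirmed:
        # CHARTER §15.2 C7: a treatment plan may never regress to diagnostic mode.
        raise ValueError("ILLEGAL: histology cannot be removed (CHARTER §15.2 C7)")
    if previous_is_diagnostic and histology_confirmed:
        return "diagnostic -> treatment (promotion)"
    if previous_is_diagnostic:
        return "diagnostic -> diagnostic"
    return "treatment -> treatment"

print(check_transition(True, False, True))  # diagnostic -> treatment (promotion)
```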
The previous plan is not mutated — a deep copy is returned with superseded_by filled in. The caller (CLI or EHR) decides what to do with both versions. Per CHARTER §10.2, the older version is kept indefinitely. Changes to the clinical_record free text do not trigger regeneration (the engine does not read free text).
7. What else we do
Pre-biopsy mode: when histology is missing, the engine emits a Workup Brief with the test list, biopsy approach, IHC panel, and triage MDT roles. Per CHARTER §15.2 C7, no treatment Plan is generated without histology.
The orchestrator reads the Plan + profile and assigns required / recommended / optional roles (from the 16-specialist catalog), formulates open questions (Q1–Q6 + DQ1–DQ4), and builds the decision-provenance graph.
python -m knowledge_base.stats — actual entity counts + per-disease coverage matrix + reviewer sign-off ratio. Available as CLI / JSON / embeddable HTML widget for the landing page.
Single-file HTML with embedded CSS, no external assets beyond Google Fonts. Adapts to A4 print via @page and @media print. Tracks are shown side by side; the alternative is never hidden (anti-automation-bias per CHARTER §15.2 C6).
enumerate_experimental_options(...) queries ClinicalTrials.gov v2 by disease + biomarker + line_of_therapy and returns ExperimentalOption entries with UA-availability metadata (sites_ua, countries) — render-time only; the engine never uses a trial as a selection signal. Status filter: RECRUITING / ACTIVE_NOT_RECRUITING / ENROLLING_BY_INVITATION; integrated into generate_plan, rendered as a third Plan track; 7-day on-disk TTL cache for offline runs.
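A query against the public ClinicalTrials.gov v2 REST endpoint can be sketched as URL construction (no network call here; the endpoint and parameter names are from the public v2 API, while the helper itself and its argument values are illustrative, not the engine's code):

```python
from urllib.parse import urlencode

# Illustrative builder for a ClinicalTrials.gov v2 /studies query.
def trials_query_url(condition: str, term: str, statuses: list[str]) -> str:
    params = {
        "query.cond": condition,                       # disease / condition
        "query.term": term,                            # biomarker or drug keyword
        "filter.overallStatus": ",".join(statuses),    # recruitment-status filter
        "pageSize": 50,
    }
    return "https://clinicaltrials.gov/api/v2/studies?" + urlencode(params)

url = trials_query_url("multiple myeloma", "daratumumab",
                       ["RECRUITING", "ACTIVE_NOT_RECRUITING"])
print(url.startswith("https://clinicaltrials.gov/api/v2/studies?"))  # True
```

Keeping the fetch behind a 7-day on-disk cache, as described above, means the same URL doubles as the cache key for offline runs.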
Every Plan carries an AccessMatrix — per-track aggregation of UA registration, NHSU coverage, ₴ cost orientation, and primary AccessPathway. Render-time only (CHARTER §8.3, guaranteed by the invariant test). A stale-cost warning fires when cost_last_updated is older than 180 days. Trial rows are appended automatically when the experimental track is active.