
ADR-0030: Conversation v6.2-flexible — single-call SOP-contract-guided triage with per-case sticky dispatch

Context

The conversational agent has gone through three architectural generations. Each was gated by a Flagsmith flag so the previous generation stayed available as an instant rollback target:

  • v4 — phase×layer composition. The system prompt is assembled from a phase template crossed with per-layer context blocks, and a separate chain of extractor calls pulls structured state out of the conversation. Gated by prompt_arch="v4". This is the rollback floor: every newer dispatcher falls back to v4 on any error.
  • v6 — stages.yaml + knowledge addendums. compose_v6 (app/services/prompt_loader_v6/composer.py:65) assembles the prompt from a stage resolver plus injected knowledge/SOP addendums; extraction is still a separate concern. Gated by prompt_arch="v6". This is the current GA conversation path.
  • v6.2-flexible — single-LLM-call, SOP-contract-guided triage. One model call per turn both converses and emits a structured state delta, guided by a procedure-specific SOPContract that declares the fields, documents, and gates the procedure requires. Gated by prompt_arch_v6_2_flexible and consolidated_state_backfill_complete.

The v6 path works, but it carries two structural costs that v6.2 was built to remove:

  1. Multi-call fan-out per turn. Composition and extraction are separate LLM round-trips. That is latency and token spend on every turn, and two places for the model's view of state to diverge.
  2. State scattered across metadata layers. v4/v6 store extracted state across several case.metadata.layer_* blocks. Reading "what do we know about this case" means stitching layers together, and every new procedure requirement is another bespoke extractor.

v6.2-flexible collapses both: one call per turn, and one consolidated state document per case driven by a declarative per-procedure contract.

This ADR captures the architecture as a single decision because it landed across ~30 PRs over four phases (Phase 0 replay harness through Phase 4 document interpretation). The per-PR specs document each slice; this is the durable top-level record of why the architecture is shaped this way and how the rollout is gated.

Decision

Single-call triage guided by an SOPContract

Each turn runs through run_triage_turn_v6_2 (app/agents/triage_v6_2/dispatch.py:590), invoked from app/routers/chat.py (the live chat path) and app/agents/orchestrator_phases/intake_triage.py. One LLM call produces both the patient-facing reply and a StateDelta (dispatch.py:56) — the structured change to case state for this turn.
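The single-call shape can be illustrated with a minimal sketch. Everything here is hypothetical: StateDeltaSketch, TurnResultSketch, apply_delta, and run_turn are illustrative stand-ins, not the real StateDelta or run_triage_turn_v6_2, and the model call is passed in as a plain callable.

```python
from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass
class StateDeltaSketch:
    """Hypothetical stand-in for StateDelta: what changed this turn."""
    set_fields: dict[str, Any] = field(default_factory=dict)
    documents_received: list[str] = field(default_factory=list)


@dataclass
class TurnResultSketch:
    """One model response carrying both the reply and the delta."""
    reply: str
    delta: StateDeltaSketch


def apply_delta(state: dict[str, Any], delta: StateDeltaSketch) -> dict[str, Any]:
    """Return a new state with the delta merged in; the input is not mutated."""
    merged = {**state, "fields": {**state.get("fields", {}), **delta.set_fields}}
    documents = list(state.get("documents", []))
    for doc in delta.documents_received:
        if doc not in documents:
            documents.append(doc)
    merged["documents"] = documents
    return merged


def run_turn(
    state: dict[str, Any],
    user_message: str,
    call_model: Callable[[dict[str, Any], str], TurnResultSketch],
) -> tuple[str, dict[str, Any]]:
    """A single model call yields the reply and the state change together."""
    result = call_model(state, user_message)
    return result.reply, apply_delta(state, result.delta)
```

Because the reply and the delta come from the same response, there is no second extraction call whose view of the conversation could drift from the first.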

The call is shaped by a SOPContract (app/services/sop_contract.py:321), loaded per procedure from config/prompts/sops/*.yaml. The contract declares what a procedure needs — required intake fields, required documents, the document_findings_complete matcher gate, and (once Dr. Naidu signs off, #1376) the clinical_safety_rules that can pause the funnel. The agent is guided by the contract rather than by hand-written per-procedure extractor code.
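As a rough illustration of what "guided by the contract" means, the sketch below models a reduced contract and computes what a case still lacks. ContractSketch, outstanding, and the example field and document names are assumptions made for illustration; the real SOPContract schema lives in app/services/sop_contract.py and may differ.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class ContractSketch:
    """Hypothetical, reduced stand-in for SOPContract."""
    procedure: str
    required_fields: tuple[str, ...] = ()
    required_documents: tuple[str, ...] = ()
    clinical_safety_rules: tuple[dict, ...] = ()  # empty until clinical sign-off


def outstanding(contract: ContractSketch, state: dict[str, Any]) -> dict[str, list[str]]:
    """List the required fields and documents the case state does not yet hold."""
    fields = state.get("fields", {})
    documents = set(state.get("documents", []))
    return {
        "fields": [name for name in contract.required_fields
                   if fields.get(name) in (None, "")],
        "documents": [doc for doc in contract.required_documents
                      if doc not in documents],
    }


contract = ContractSketch(
    procedure="example_procedure",  # illustrative name only
    required_fields=("age", "medical_history"),
    required_documents=("passport", "recent_labs"),
)
state = {"fields": {"age": 42}, "documents": ["passport"]}
print(outstanding(contract, state))
# {'fields': ['medical_history'], 'documents': ['recent_labs']}
```

Adding a requirement to a procedure then becomes a change to declarative data rather than a new extractor.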

State consolidated into one column

Case state for v6.2 lives in a single JSONB column, Case.consolidated_state (app/models/case.py:144), instead of scattered metadata.layer_* blocks. New cases get '{}' immediately; existing cases are migrated by scripts/backfill_consolidated_state.py. A dual-write shim (Phase 1) keeps the legacy layers and the consolidated column in sync during the transition so the read path can be swapped without a flag-day cutover.
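A dual-write shim of this kind might look roughly like the following. This is a sketch under stated assumptions: DualWriteShimSketch and write_consolidated are hypothetical names, a case is modelled as a plain dict rather than the ORM model, and the swallow counter is assumed to count consolidated writes that raised and were suppressed.

```python
from typing import Any, Callable


class DualWriteShimSketch:
    """Hypothetical shim: write the legacy layer, mirror to the consolidated store."""

    def __init__(self, write_consolidated: Callable[[dict, str, Any], None]):
        self._write_consolidated = write_consolidated
        self.swallowed = 0  # assumed meaning: suppressed consolidated-write failures

    def write(self, case: dict, layer: str, key: str, value: Any) -> None:
        # The legacy layer write always happens first.
        metadata = case.setdefault("metadata", {})
        metadata.setdefault(f"layer_{layer}", {})[key] = value
        try:
            self._write_consolidated(case, key, value)
        except Exception:
            # Swallow so a mirror-write bug cannot break the live turn, but
            # count it: a non-zero counter means the two stores have diverged.
            self.swallowed += 1


def write_consolidated(case: dict, key: str, value: Any) -> None:
    """Illustrative consolidated writer backed by a dict, not a JSONB column."""
    case.setdefault("consolidated_state", {})[key] = value
```

A counter that stays at zero is evidence that both stores received every write, which is what makes swapping the read path safe.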

Per-case sticky dispatch

Architecture is resolved once per case, at the first turn, by resolve_prompt_arch_for_case (dispatch.py:154, dispatched from app/agents/prompt_dispatcher.py:224). The resolved value is written to case.workflow_state.prompt_arch with prompt_arch_source="first_resolve".

Flag flips do not affect in-flight conversations. A case that started on v6.1 stays on v6.1 until it closes, even if prompt_arch_v6_2_flexible is flipped on mid-conversation. This is the single most important operational property of the dispatcher: a rollout (or rollback) only changes which arch new cases resolve to. It makes the canary safe — flipping the flag can never re-route a patient mid-conversation onto a different engine.
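The sticky property reduces to "resolve once, persist, never re-read the flags". A minimal sketch, assuming workflow_state behaves like a dict; resolve_arch_sticky is a hypothetical name, not the signature of resolve_prompt_arch_for_case, and the arch strings are illustrative values.

```python
from typing import Callable


def resolve_arch_sticky(workflow_state: dict, resolve_from_flags: Callable[[], str]) -> str:
    """Return the arch pinned on the case, resolving from flags only on first use."""
    pinned = workflow_state.get("prompt_arch")
    if pinned:
        return pinned  # flags are not consulted again for this case
    arch = resolve_from_flags()
    workflow_state["prompt_arch"] = arch
    workflow_state["prompt_arch_source"] = "first_resolve"
    return arch


# Usage: a flag flip mid-conversation only affects cases that have not resolved yet.
flags = {"v6_2": False}
pick = lambda: "v6.2-flexible" if flags["v6_2"] else "v6.1"

in_flight = {}
resolve_arch_sticky(in_flight, pick)   # pins "v6.1" on the first turn
flags["v6_2"] = True                   # rollout begins
resolve_arch_sticky(in_flight, pick)   # still "v6.1" for the in-flight case
resolve_arch_sticky({}, pick)          # a new case resolves to "v6.2-flexible"
```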

Dual-flag gate

A v6.2 turn is served only when both flags are true:

| Flag | Role | Config |
| --- | --- | --- |
| prompt_arch_v6_2_flexible | Master gate. New cases route to run_triage_turn_v6_2 when on (per-tenant via Flagsmith identity). | config/feature_flags.yaml:887, default false |
| consolidated_state_backfill_complete | Operator sentinel. Set true only after backfill_consolidated_state.py has run across all prod cases and the shim swallow-counter has held at zero for ≥24h. | config/feature_flags.yaml:901, default false |

If consolidated_state_backfill_complete is false, the read path falls through to v6.1 silently with metric prompt_arch_fallback_backfill_incomplete. The backfill sentinel exists so the v6.2 read path can never run against a case whose consolidated_state was never populated — the engine assumes the column is authoritative, and the sentinel proves it is before the engine trusts it.
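The gate logic described above can be sketched as a small pure function. The flag names and the metric name come from this ADR; gate_v6_2, the dict-of-flags input, the emit_metric callable, and the returned arch strings are assumptions for illustration. The sketch also ignores the separate prompt_arch flag that selects between v4 and v6.

```python
from typing import Callable


def gate_v6_2(flags: dict[str, bool], emit_metric: Callable[[str], None]) -> str:
    """Serve v6.2 only when both the master gate and the backfill sentinel are true."""
    if not flags.get("prompt_arch_v6_2_flexible", False):
        return "v6.1"  # master gate off: nothing to report
    if not flags.get("consolidated_state_backfill_complete", False):
        # Master gate on but state not proven populated: fall through silently
        # for the patient, visibly for operators.
        emit_metric("prompt_arch_fallback_backfill_incomplete")
        return "v6.1"
    return "v6.2-flexible"
```

Emitting the metric only in the "gate on, sentinel off" branch keeps it meaningful: it fires when an operator intended v6.2 and did not get it.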

Both flags are flipped in both Flagsmith Production and Development environments together (dual-env rule). Rollback is flipping prompt_arch_v6_2_flexible false in both envs — no redeploy.

Consequences

Rollback is layered, not binary

There are three rollback surfaces, in increasing blast radius:

  1. Per-case — automatic. Any dispatcher error falls back to v4 (the v6_fallback_reason trace tag records why).
  2. v6.2 → v6.1 — flip prompt_arch_v6_2_flexible false. New cases resolve to v6.1; in-flight v6.2 cases finish on v6.2 (sticky).
  3. v6 → v4 — flip prompt_arch to "v4". The phase×layer floor.

Because dispatch is sticky, none of these rollbacks can yank a patient mid-conversation onto a different engine. The cost is that a bad v6.2 case already in flight runs to completion on v6.2 — mitigated by the per-turn fallback to v4 on hard errors.
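The automatic fallback to the v4 floor can be pictured as a thin wrapper around the primary engine. This is a hedged sketch: run_with_v4_floor is a hypothetical helper, the engines are plain callables, and recording the exception class name under v6_fallback_reason is an assumption about what the tag holds.

```python
from typing import Any, Callable


def run_with_v4_floor(
    primary: Callable[..., Any],
    v4_turn: Callable[..., Any],
    trace_tags: dict,
    *args: Any,
) -> Any:
    """Run the primary engine; on any error, tag the trace and serve the turn via v4."""
    try:
        return primary(*args)
    except Exception as exc:
        # Assumed tag content: the exception class name.
        trace_tags["v6_fallback_reason"] = type(exc).__name__
        return v4_turn(*args)
```

The wrapper never touches the arch pinned on the case, so a fallback turn does not change which engine the case resolves to.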

The canary is gated on clinical sign-off, not just engineering

The SOP contracts ship with empty clinical_safety_rules: []. The gating machinery is wired but fed nothing, so a red flag in an uploaded document cannot pause the elective-travel funnel until the rules are populated. Populating them requires Dr. Naidu's thresholds, tracked in #1376. The v6_sop_enabled flag and the v6.2 canary therefore stay OFF until that clinical gate clears (see Clinical Sign-off Governance). This is deliberate: the engine is production-ready; the clinical content it gates on is the remaining blocker.

State migration is a one-way door with a sentinel

Once consolidated_state is the read source, the legacy metadata.layer_* blocks become write-only history. The dual-write shim and the consolidated_state_backfill_complete sentinel exist specifically so the cutover is observable and reversible up to the point the sentinel flips. After the sentinel is true platform-wide, reverting to reading legacy layers would itself need a migration. The 24h zero-swallow-counter precondition on the sentinel is the guard against flipping it prematurely.
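The sentinel precondition can be expressed as a simple check. This is a sketch only: sentinel_may_flip and its inputs (when counter monitoring began, when the counter last incremented) are hypothetical, and how the real counter is observed is not specified in this ADR.

```python
from datetime import datetime, timedelta
from typing import Optional


def sentinel_may_flip(
    backfill_done: bool,
    monitoring_since: datetime,
    last_swallow_at: Optional[datetime],
    now: datetime,
    window: timedelta = timedelta(hours=24),
) -> bool:
    """True only if the backfill ran and the counter has been quiet for the full window."""
    if not backfill_done:
        return False
    # The quiet period starts at the later of "monitoring began" and "last swallow".
    quiet_since = monitoring_since
    if last_swallow_at is not None and last_swallow_at > quiet_since:
        quiet_since = last_swallow_at
    return now - quiet_since >= window
```

Counting the quiet period from when monitoring began, not from the epoch, avoids treating "no data yet" as "24 hours at zero".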

Document interpretation is part of the contract (Phase 4)

v6.2 folds document understanding into the same contract model: a vision preprocessor + document_interpreter service extract findings, an OCR pipeline hook feeds them in, and the SOPContract document_findings_complete matcher gate decides when a case has the documents it needs to advance. This is why the SOP contract — not a separate document subsystem — is the single declarative source for "what does this procedure need."
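One plausible reading of the document_findings_complete gate is "every required document has been interpreted and yielded every finding the contract asks for". The sketch below implements that reading only; the function signature, the shape of the findings dict, and the example document and finding names are all assumptions, not the real matcher.

```python
from typing import Any


def document_findings_complete(
    required: dict[str, list[str]],
    findings: dict[str, dict[str, Any]],
) -> bool:
    """True when each required document type has all of its required findings."""
    for doc_type, finding_keys in required.items():
        extracted = findings.get(doc_type)
        if extracted is None:
            return False  # document not uploaded or not yet interpreted
        if any(extracted.get(key) is None for key in finding_keys):
            return False  # interpreted, but a required finding is missing
    return True


# Illustrative requirement: a labs document must yield two findings.
required = {"recent_labs": ["hemoglobin", "collected_on"]}
partial = {"recent_labs": {"hemoglobin": 13.5}}
complete = {"recent_labs": {"hemoglobin": 13.5, "collected_on": "2024-01-02"}}
```

Under this reading, `partial` keeps the case waiting and `complete` lets it advance.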

Test coverage

  • Replay harness (_live_replay_turn_v6_2) — replays recorded sessions through the real v6.2 pipeline with full state-diff assertions.
  • Dispatcher unit tests — sticky resolution, dual-flag gate, backfill-incomplete fallback.
  • FastAPI TestClient integration — chat handler → dispatcher → writer per the dispatcher architecture.
  • CI regression guards — mock/real Repository signature parity; except Exception with DB mutation must include rollback.

Why an ADR was overdue

v6.2 shipped across four phases and ~30 PRs, each with its own slice spec, but the top-level architecture — single-call triage, consolidated state, sticky dual-flag dispatch — was documented only in the implementation specs. ADRs stopped at 0029. This record establishes v6.2-flexible as a single architectural decision so the canary rollout, and any future change to the dispatcher, pushes against a written baseline.