Agent System
Overview
Curaway's AI agent system is the brain of the platform. It orchestrates multi-step workflows that combine clinical understanding, patient interaction, provider matching, and natural-language explanation. The system is built on three pillars:
- LangGraph -- Orchestration framework for multi-node, stateful AI workflows
- LangChain -- Tool wrappers that give agents access to databases, APIs, and external services
- Langfuse -- Observability platform for tracing, prompt management, and cost tracking
Healthcare Safety Principle
Every agent has a deterministic fallback path. If an LLM call fails, times out, or returns invalid output, the system falls back to rule-based logic. Healthcare workflows cannot be broken by LLM failure.
Architecture
Conversation engine: v6.2-flexible
The conversational turn is no longer driven by a fixed 8-phase state machine. It is dispatched to one of three architecture generations (v4 / v6 / v6.2-flexible) per case. v6.2-flexible — single-call, SOP-contract-guided triage — is the current target architecture. See ADR-0030 for the full decision and the v6.2 specs. The specialized agents described below (clinical context, matching, explanation) remain the functional components invoked as a case progresses.
Single Entry Point
All patient interactions flow through one API endpoint, shown as POST /chat in the dispatch diagram below.
The handler (app/routers/chat.py) does not run a phase state machine. It resolves the conversation architecture for the case and dispatches the turn. Architecture is resolved once per case, at the first turn, and written to case.workflow_state.prompt_arch — flag flips never re-route an in-flight conversation (see sticky dispatch).
Conversation architecture dispatch
The chat handler dispatches each turn to one of three architecture generations, selected per case via Flagsmith and made sticky on first resolve:
| Arch | Engine | How a turn is handled | Flag |
|---|---|---|---|
| v6.2-flexible | run_triage_turn_v6_2 (app/agents/triage_v6_2/dispatch.py) | One LLM call both converses and emits a structured StateDelta, guided by a per-procedure SOPContract. State lives in the consolidated Case.consolidated_state column. | prompt_arch_v6_2_flexible + consolidated_state_backfill_complete |
| v6 | compose_v6 (app/services/prompt_loader_v6/composer.py) | Prompt assembled from a stage resolver + injected knowledge/SOP addendums; extraction is a separate concern. Current GA path. | prompt_arch="v6" |
| v4 | phase×layer composition | Phase template crossed with per-layer context; separate extractor chain. The rollback floor — every newer dispatcher falls back to v4 on hard error. | prompt_arch="v4" |
graph TD
Entry["POST /chat"] --> Dispatch[Architecture dispatcher]
Dispatch -->|sticky per-case| Resolve{"workflow_state.prompt_arch"}
Resolve -->|v6.2-flexible| T62["run_triage_turn_v6_2<br/>(single call + SOPContract)"]
Resolve -->|v6| C6["compose_v6<br/>(stages + addendums)"]
Resolve -->|v4 / fallback| V4["phase×layer<br/>(rollback floor)"]
T62 --> Agents["Specialized agents:<br/>Clinical Context · Match · Explanation"]
C6 --> Agents
V4 --> Agents
style Dispatch fill:#008B8B,color:#fff
style T62 fill:#FF7F50,color:#fff
style Agents fill:#FF7F50,color:#fff
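The rollback-floor behaviour in the table and diagram can be sketched roughly as follows. This is a minimal illustration, not the actual chat handler: `dispatch_turn`, the `ENGINES` registry, and the three stub engines are hypothetical names, and treating any raised exception as a "hard error" is an assumption.

```python
from typing import Callable, Dict

# Hypothetical stand-ins for the real engines: each takes a message, returns a reply.
def run_v6_2(message: str) -> str:
    raise RuntimeError("simulated hard error in the v6.2 engine")

def run_v6(message: str) -> str:
    return f"[v6] {message}"

def run_v4(message: str) -> str:
    return f"[v4] {message}"

ENGINES: Dict[str, Callable[[str], str]] = {
    "v6.2-flexible": run_v6_2,
    "v6": run_v6,
    "v4": run_v4,
}

def dispatch_turn(prompt_arch: str, message: str) -> str:
    """Run the engine for prompt_arch, dropping to v4 if it fails."""
    engine = ENGINES.get(prompt_arch, run_v4)  # unknown arch: use the rollback floor
    if engine is run_v4:
        return run_v4(message)
    try:
        return engine(message)
    except Exception:
        # Assumption: any exception from a newer engine counts as a hard error.
        return run_v4(message)
```

With the simulated failure above, a `v6.2-flexible` turn is answered by the v4 stub, while a `v6` turn is answered by its own engine.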
Sticky dispatch. resolve_prompt_arch_for_case resolves the architecture once and persists it on the case. A case that starts on v6 finishes on v6 even if prompt_arch_v6_2_flexible is flipped mid-conversation. A rollout (or rollback) changes only which arch new cases resolve to; it can never move a patient mid-conversation onto a different engine. This is the property that makes the canary safe.
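A resolve-once-then-persist rule of this kind might look like the sketch below. It is illustrative only: `resolve_sticky_arch` is a hypothetical name (the real `resolve_prompt_arch_for_case` signature is not shown here), the flags are passed in as a plain dict rather than read from Flagsmith, and defaulting to `"v6"` when the v6.2 flags are off is an assumption.

```python
from typing import Dict

def resolve_sticky_arch(workflow_state: Dict[str, str], flags: Dict[str, bool]) -> str:
    """Return the case's architecture, resolving and persisting it only once."""
    existing = workflow_state.get("prompt_arch")
    if existing:
        # Already resolved on an earlier turn: later flag flips are ignored.
        return existing

    v6_2_ready = (
        flags.get("prompt_arch_v6_2_flexible", False)
        and flags.get("consolidated_state_backfill_complete", False)
    )
    arch = "v6.2-flexible" if v6_2_ready else "v6"  # assumption: v6 is the default
    workflow_state["prompt_arch"] = arch  # persist so the choice is sticky
    return arch
```

Calling it twice on the same `workflow_state` returns the first result both times, even if the flags change between calls; only a fresh case picks up the new flag values.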
First-Message Attachments
When a user sends their first message with a file attachment (e.g. "I need a knee replacement" + blood work PDF), the agent identifies the procedure and processes the attachments in a single turn — it does not ask for records that were just uploaded. The procedure confirmation response is combined with the document analysis.
The Four Agents
1. Clinical Context Agent
The Clinical Context Agent processes medical documents and extracts structured clinical data. It is the most complex agent, implemented as a 4-node LangGraph workflow.
Purpose: Transform unstructured medical documents into structured FHIR R4 resources.
Model: Claude Sonnet 4.6 (requires high-accuracy clinical reasoning)
graph LR
A[extract_clinical_entities] --> B[map_to_medical_codes]
B --> C[generate_fhir_resources]
C --> D[store_resources]
style A fill:#008B8B,color:#fff
style B fill:#008B8B,color:#fff
style C fill:#008B8B,color:#fff
style D fill:#008B8B,color:#fff
Node Details:
| Node | Input | Output | Fallback |
|---|---|---|---|
| extract_clinical_entities | Raw OCR text | Structured entities (conditions, labs, meds) | Regex-based extraction patterns |
| map_to_medical_codes | Extracted entities | ICD-10, CPT, LOINC codes | Lookup table mapping |
| generate_fhir_resources | Coded entities | FHIR R4 JSON resources | Template-based FHIR generation |
| store_resources | Validated FHIR | Database confirmation | Direct SQL insert |
State Schema:
from typing import TypedDict

class ClinicalContextState(TypedDict):
    """State passed between Clinical Context Agent nodes."""
    document_id: str
    tenant_id: str
    patient_id: str
    case_id: str
    raw_text: str
    extracted_entities: dict    # From node 1
    medical_codes: dict         # From node 2
    fhir_resources: list[dict]  # From node 3
    store_confirmation: dict    # From node 4
    errors: list[str]
    fallback_used: bool
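To make the data flow between the four nodes concrete, here is a plain-Python stand-in for the workflow. It deliberately does not use LangGraph: the nodes are stubbed functions run in sequence, the lookup data is hypothetical, and the generated resources are simplified dicts rather than complete FHIR R4 resources.

```python
from typing import Any, Callable, Dict, List

State = Dict[str, Any]

# Hypothetical lookup data for the stubs below.
KNOWN_CONDITIONS = ("hypertension",)
ICD10_LOOKUP = {"hypertension": "I10"}  # I10: essential (primary) hypertension

def extract_clinical_entities(state: State) -> State:
    # Stub: the real node calls an LLM; this only scans for known terms.
    text = state["raw_text"].lower()
    found = [c for c in KNOWN_CONDITIONS if c in text]
    return {**state, "extracted_entities": {"conditions": found}}

def map_to_medical_codes(state: State) -> State:
    conditions = state["extracted_entities"]["conditions"]
    codes = {c: ICD10_LOOKUP[c] for c in conditions if c in ICD10_LOOKUP}
    return {**state, "medical_codes": codes}

def generate_fhir_resources(state: State) -> State:
    # Simplified Condition-like dicts, not complete FHIR R4 resources.
    resources = [
        {"resourceType": "Condition", "code": {"text": name, "icd10": code}}
        for name, code in state["medical_codes"].items()
    ]
    return {**state, "fhir_resources": resources}

def store_resources(state: State) -> State:
    # Stub: pretend to persist and report how many resources were stored.
    return {**state, "store_confirmation": {"stored": len(state["fhir_resources"])}}

PIPELINE: List[Callable[[State], State]] = [
    extract_clinical_entities,
    map_to_medical_codes,
    generate_fhir_resources,
    store_resources,
]

def run_clinical_context(state: State) -> State:
    """Run the four nodes in order, threading the state through each."""
    for node in PIPELINE:
        state = node(state)
    return state
```

Each node returns a new state with one more key filled in, which mirrors the "From node N" comments in the state schema.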
Comorbidity Detection
Comorbidity detection is rule-based, not LLM-based. The system maintains a lookup table of common comorbidity pairs (e.g., diabetes + hypertension, obesity + sleep apnea) and flags them deterministically. This costs $0 per case.
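A deterministic pair lookup of this kind could be sketched as follows. Only the two example pairs come from the text above; the function name, the normalisation step, and the data layout are assumptions.

```python
from typing import FrozenSet, Iterable, List, Tuple

# The two example pairs named in the text; a real table would be larger.
COMORBIDITY_PAIRS: Tuple[FrozenSet[str], ...] = (
    frozenset({"diabetes", "hypertension"}),
    frozenset({"obesity", "sleep apnea"}),
)

def detect_comorbidities(conditions: Iterable[str]) -> List[Tuple[str, ...]]:
    """Return every known pair whose two conditions are both present."""
    present = {c.strip().lower() for c in conditions}
    # A pair matches when it is a subset of the patient's conditions.
    return [tuple(sorted(pair)) for pair in COMORBIDITY_PAIRS if pair <= present]
```

Because the check is a set-subset test against a fixed table, the same input always yields the same flags and no model call is involved.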
2. Intake Agent
The Intake Agent conducts conversational intake to gather patient preferences, travel constraints, and treatment requirements.
Purpose: Collect structured preferences through natural conversation.
Model: Claude Haiku 4.5 (conversational, low-cost)
graph LR
A[classify_message] --> B[collect_preferences]
B --> C[suggest_options]
C --> D[update_case]
style A fill:#008B8B,color:#fff
style B fill:#008B8B,color:#fff
style C fill:#008B8B,color:#fff
style D fill:#008B8B,color:#fff
Node Details:
| Node | Input | Output | Fallback |
|---|---|---|---|
| classify_message | Patient message | Intent classification | Keyword matching |
| collect_preferences | Classified intent | Structured preference data | Form-based collection |
| suggest_options | Current preferences | Contextual suggestions | Static suggestion list |
| update_case | Confirmed preferences | Updated case record | Direct DB update |
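The keyword-matching fallback listed for classify_message might look something like this sketch. The intent labels and keyword lists are invented for illustration, and plain substring matching is an assumption; a production matcher would likely need word boundaries and localisation.

```python
from typing import Dict, Tuple

# Hypothetical intents and keywords; the real set is not documented here.
INTENT_KEYWORDS: Dict[str, Tuple[str, ...]] = {
    "budget": ("budget", "cost", "price", "afford"),
    "destination": ("country", "travel to", "abroad"),
    "timing": ("when", "date", "month", "soon"),
}

def classify_by_keywords(message: str) -> str:
    """Return the first intent whose keywords appear in the message."""
    text = message.lower()
    for intent, keywords in INTENT_KEYWORDS.items():
        if any(keyword in text for keyword in keywords):
            return intent
    return "unknown"  # no keyword matched
```

Intents are checked in dictionary order, so a message that mentions both cost and timing is classified as `budget` in this sketch.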
Collected Preferences:
from datetime import date
from typing import Optional

from pydantic import BaseModel  # assuming Pydantic's BaseModel

class PatientPreferences(BaseModel):
    """Preferences collected by the Intake Agent."""
    budget_range_usd: Optional[tuple[int, int]]
    preferred_countries: list[str]  # ISO 3166-1 alpha-3
    excluded_countries: list[str]
    preferred_languages: list[str]
    travel_date_range: Optional[tuple[date, date]]
    companion_count: int = 0
    dietary_restrictions: list[str]
    accessibility_needs: list[str]
    insurance_provider: Optional[str]
    previous_medical_travel: bool = False
    priority: str = "balanced"  # "cost", "quality", "speed", "balanced"
State Schema:
from typing import TypedDict

class IntakeState(TypedDict):
    """State passed between Intake Agent nodes."""
    case_id: str
    tenant_id: str
    patient_id: str
    message: str
    intent: str                # From node 1
    current_preferences: dict  # Existing preferences
    new_preferences: dict      # From node 2
    suggestions: list[str]     # From node 3
    update_confirmation: dict  # From node 4
    conversation_history: list[dict]
    missing_fields: list[str]
    errors: list[str]
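One way the missing_fields entry could be derived from current_preferences and new_preferences is sketched below. Which fields count as required is an assumption made for the example, as is the rule that new values override existing ones.

```python
from typing import Any, Dict, List, Tuple

# Assumption: these three preferences must be known before intake is complete.
REQUIRED_FIELDS: Tuple[str, ...] = (
    "budget_range_usd",
    "preferred_countries",
    "travel_date_range",
)

def find_missing_fields(current: Dict[str, Any], new: Dict[str, Any]) -> List[str]:
    """List required fields still empty after merging new over current."""
    merged = {**current, **new}  # newly collected values win
    return [
        field for field in REQUIRED_FIELDS
        if merged.get(field) in (None, [], "")  # absent or empty counts as missing
    ]
```

The result can drive the next question in the conversation: an empty list means every required preference has been collected.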
3. Match Agent
The Match Agent orchestrates the provider matching workflow, combining graph traversal, semantic search, and weighted scoring.
Purpose: Find and rank the best providers and doctors for a patient's case.
Model: Claude Haiku 4.5 (orchestration) + deterministic scoring
graph LR
A[analyze_requirements] --> B[gather_requirements]
B --> C[execute_scoring]
C --> D[rerank_and_explain]
style A fill:#FF7F50,color:#fff
style B fill:#FF7F50,color:#fff
style C fill:#FF7F50,color:#fff
style D fill:#FF7F50,color:#fff
Node Details:
| Node | Input | Output | Fallback |
|---|---|---|---|
| analyze_requirements | Case data, FHIR resources | Structured matching criteria | Rule-based criteria extraction |
| gather_requirements | Matching criteria | Provider candidates from Neo4j + Qdrant | Direct Neo4j query |
| execute_scoring | Candidates + criteria | Scored and ranked results | Weighted rules scoring |
| rerank_and_explain | Scored results | Final ranking with explanations | Template-based explanations |
State Schema:
from typing import TypedDict

class MatchState(TypedDict):
    """State passed between Match Agent nodes."""
    case_id: str
    tenant_id: str
    patient_id: str
    clinical_data: dict         # FHIR resources
    patient_preferences: dict
    matching_criteria: dict     # From node 1
    candidates: list[dict]      # From node 2
    scored_results: list[dict]  # From node 3
    final_results: list[dict]   # From node 4
    strategy_used: str
    errors: list[str]
    fallback_used: bool
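The deterministic "weighted rules" scoring used by execute_scoring can be illustrated with a small sketch. The dimension names and weights are hypothetical; the real scoring dimensions and their weights are not given in this section.

```python
from typing import Any, Dict, List

# Hypothetical dimensions and weights (summing to 1.0) for illustration only.
WEIGHTS: Dict[str, float] = {"clinical": 0.5, "cost": 0.3, "travel": 0.2}

def score_candidate(dimension_scores: Dict[str, float]) -> float:
    """Weighted sum of per-dimension scores in [0, 1]; missing dimensions score 0."""
    total = sum(weight * dimension_scores.get(dim, 0.0) for dim, weight in WEIGHTS.items())
    return round(total, 4)

def rank_candidates(candidates: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Attach a score to each candidate and sort best-first."""
    scored = [{**c, "score": score_candidate(c["dimensions"])} for c in candidates]
    return sorted(scored, key=lambda c: c["score"], reverse=True)
```

Because the score is a fixed weighted sum, the ranking is reproducible and can be explained dimension by dimension without any model call.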
4. Explanation Agent
The Explanation Agent generates natural-language explanations of matching results, tailored to the patient's locale and language.
Purpose: Make AI matching decisions transparent and understandable.
Model: Claude Haiku 4.5 (natural language generation)
Capabilities:
- Generates per-provider explanations (why this provider was recommended)
- Generates per-dimension explanations (why the clinical score is X)
- Adapts language to patient's preferred_language
- Adapts complexity to patient's indicated health literacy level
- Highlights strengths and potential concerns for each match
from typing import Optional

from pydantic import BaseModel  # assuming Pydantic's BaseModel

class ExplanationOutput(BaseModel):
    """Output from the Explanation Agent."""
    provider_id: str
    summary: str                            # 2-3 sentence overview
    strengths: list[str]                    # Top 3 strengths
    considerations: list[str]               # Things to be aware of
    dimension_explanations: dict[str, str]  # Per-scoring-dimension
    confidence_note: Optional[str]          # If data completeness is low
    locale: str                             # Language code used
Locale-Aware Explanations
The Explanation Agent detects the patient's preferred language from their profile and generates explanations in that language. For the MVP, English, Hindi, Arabic, Turkish, and Thai are supported.
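Locale selection for the five MVP languages might be handled along these lines. The use of two-letter ISO 639-1 codes, the handling of region suffixes, and the fallback to English for unsupported languages are all assumptions; the section does not say how unsupported locales are treated.

```python
from typing import Dict, Optional

# ISO 639-1 codes for the five MVP languages named above.
SUPPORTED_LOCALES: Dict[str, str] = {
    "en": "English",
    "hi": "Hindi",
    "ar": "Arabic",
    "tr": "Turkish",
    "th": "Thai",
}
DEFAULT_LOCALE = "en"  # assumption: unsupported or missing languages fall back to English

def resolve_locale(preferred_language: Optional[str]) -> str:
    """Map a profile language such as 'tr-TR' to a supported locale code."""
    if not preferred_language:
        return DEFAULT_LOCALE
    # Keep only the base language: 'tr-TR' and 'tr_TR' both become 'tr'.
    base = preferred_language.strip().lower().replace("_", "-").split("-")[0]
    return base if base in SUPPORTED_LOCALES else DEFAULT_LOCALE
```

The resolved code is what would end up in the `locale` field of the explanation output.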
Pre-Operative Risk Assessor (Rule-Based)
At the end of every EHR rebuild, ehr_builder_agent runs app/services/risk_assessor.py — a pure-function, rule-based pre-operative risk classifier (no LLM). It mirrors the lab_analyzer pattern: deterministic, auditable, healthcare-safe.
The assessor inspects four buckets and writes the result to ehr_snapshot.risk_factors:
| Bucket | Examples | Severity |
|---|---|---|
| Age | ≥70 moderate, ≥80 high | moderate / high |
| Comorbidities | Diabetes, AFib, heart failure, CKD, COPD, OSA, anemia | low → high |
| Medications | Anticoagulants (blocking), antiplatelets, immunosuppressants, NSAIDs, diabetes meds | low → blocking |
| Labs | HbA1c ≥9% (blocking), Hgb <8 (blocking), eGFR <30 (high), INR >1.5 (blocking), platelets <100k (blocking) | high / blocking |
Each risk record carries source provenance (which document / observation / med record it came from) and an is_blocking flag. Blocking risks halt forwarding until resolved; the frontend EHR drawer surfaces a "BLOCKING" badge when any risk is blocking. Covered by 26 unit tests in tests/test_risk_assessor.py.
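The Labs bucket lends itself to a table-driven sketch. The thresholds below are taken from the table above; the lab key names, the `LabRule` structure, and the shape of the returned records are hypothetical and are not the contents of risk_assessor.py. Source provenance is omitted for brevity.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Tuple

@dataclass(frozen=True)
class LabRule:
    lab: str                           # hypothetical key for the lab value
    triggers: Callable[[float], bool]  # True when the value is a risk
    severity: str
    is_blocking: bool

# Thresholds from the Labs row of the table; platelets are in thousands.
LAB_RULES: Tuple[LabRule, ...] = (
    LabRule("hba1c_pct", lambda v: v >= 9.0, "blocking", True),
    LabRule("hgb", lambda v: v < 8.0, "blocking", True),
    LabRule("egfr", lambda v: v < 30.0, "high", False),
    LabRule("inr", lambda v: v > 1.5, "blocking", True),
    LabRule("platelets_k", lambda v: v < 100.0, "blocking", True),
)

def assess_labs(labs: Dict[str, float]) -> List[Dict[str, Any]]:
    """Return one risk record per lab value that crosses its threshold."""
    risks: List[Dict[str, Any]] = []
    for rule in LAB_RULES:
        value = labs.get(rule.lab)
        if value is not None and rule.triggers(value):
            risks.append({
                "lab": rule.lab,
                "value": value,
                "severity": rule.severity,
                "is_blocking": rule.is_blocking,
            })
    return risks
```

Keeping the rules in a data table rather than in branching code makes each threshold easy to audit and to cover with a unit test.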
Deterministic Fallbacks
Every agent node has a fallback implementation that runs without LLM calls:
import logging

logger = logging.getLogger(__name__)

# LLMError, ValidationError, llm_extract, and regex_extract are project-level
# names defined elsewhere in the codebase.
async def extract_clinical_entities(state: ClinicalContextState) -> ClinicalContextState:
    """Extract clinical entities from document text."""
    try:
        # Primary: LLM-based extraction
        state["extracted_entities"] = await llm_extract(state["raw_text"])
    except (LLMError, TimeoutError, ValidationError) as e:
        # Fallback: regex + lookup-table extraction
        logger.warning("LLM extraction failed, using fallback: %s", e)
        state["extracted_entities"] = regex_extract(state["raw_text"])
        state["fallback_used"] = True
        state["errors"].append(f"Fallback used for extraction: {e}")
    return state
| Agent | Primary Path | Fallback Path | Fallback Quality |
|---|---|---|---|
| Clinical Context | Claude Sonnet extraction | Regex + lookup tables | ~70% of LLM accuracy |
| Intake | Claude Haiku conversation | Form-based collection | Functional but rigid |
| Match | LLM-enhanced scoring | Weighted rules only | ~90% of LLM accuracy |
| Explanation | Claude Haiku generation | Template-based text | Functional but generic |
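Since every node follows the same try-primary-then-fallback shape, the pattern could be factored into a small helper. The sketch below is one possible form, not the project's code: `call_with_fallback` and the two stub coroutines are hypothetical, and catching every `Exception` (including the timeout raised by `asyncio.wait_for`) is a simplifying assumption.

```python
import asyncio
from typing import Awaitable, Callable, Tuple, TypeVar

T = TypeVar("T")

async def call_with_fallback(
    primary: Callable[[], Awaitable[T]],
    fallback: Callable[[], T],
    timeout_s: float,
) -> Tuple[T, bool]:
    """Return (result, fallback_used), using fallback on error or timeout."""
    try:
        result = await asyncio.wait_for(primary(), timeout=timeout_s)
        return result, False
    except Exception:
        # Assumption: any failure of the primary path triggers the rule-based path.
        return fallback(), True

# Hypothetical stand-ins for an LLM call that succeeds and one that hangs.
async def fast_llm() -> str:
    return "llm result"

async def slow_llm() -> str:
    await asyncio.sleep(1.0)
    return "llm result"
```

For example, `asyncio.run(call_with_fallback(slow_llm, lambda: "rule result", 0.01))` times out and yields the rule-based result with `fallback_used` set to `True`.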
MCP Server
Curaway exposes an MCP (Model Context Protocol) server with 6 tools for external AI assistants to interact with the platform:
| Tool | Description | Parameters |
|---|---|---|
| search_patients | Find patients by name, email, or ID | query, tenant_id |
| get_patient_clinical_summary | Get FHIR-based clinical summary | patient_id, tenant_id |
| search_providers | Search providers by specialty, location, accreditation | criteria, tenant_id |
| run_match | Execute matching for a case | case_id, tenant_id |
| get_match_explanation | Get explanation for a match result | match_id, tenant_id |
| check_consent | Verify patient consent status | patient_id, consent_type, tenant_id |
# MCP tool registration
@mcp_server.tool("search_providers")
async def search_providers(criteria: ProviderSearchCriteria, tenant_id: str):
    """Search for healthcare providers matching the given criteria."""
    results = await provider_service.search(
        tenant_id=tenant_id,
        specialty=criteria.specialty,
        country=criteria.country,
        accreditation=criteria.accreditation,
        max_results=criteria.max_results or 10,
    )
    return [provider.to_mcp_response() for provider in results]
Feature Flags
Agent behavior is controlled by Flagsmith feature flags:
| Flag | Default | Description |
|---|---|---|
| agent_enhanced_matching | false | Use Match Agent instead of pure deterministic matching |
| agent_explanations_enabled | true | Generate LLM explanations (vs. template-based) |
| clinical_context_agent_enabled | true | Use LangGraph clinical extraction pipeline |
| intake_agent_conversational | true | Conversational intake vs. form-based |
| mcp_server_enabled | false | Expose MCP tools externally |
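The defaults in this table could serve as a local safety net when the flag service is unreachable. The sketch below assumes that behaviour and does not use the Flagsmith SDK: `is_enabled` is a hypothetical helper, and `remote` stands in for whatever flag values were last fetched.

```python
from typing import Dict, Mapping, Optional

# Defaults copied from the table above.
FLAG_DEFAULTS: Dict[str, bool] = {
    "agent_enhanced_matching": False,
    "agent_explanations_enabled": True,
    "clinical_context_agent_enabled": True,
    "intake_agent_conversational": True,
    "mcp_server_enabled": False,
}

def is_enabled(flag: str, remote: Optional[Mapping[str, bool]] = None) -> bool:
    """Prefer the remotely fetched value; otherwise use the documented default."""
    if remote is not None and flag in remote:
        return bool(remote[flag])
    # Unknown flags raise KeyError rather than silently defaulting.
    return FLAG_DEFAULTS[flag]
```

Failing loudly on an unknown flag name is a design choice in this sketch: a typo in a flag name surfaces immediately instead of quietly disabling an agent.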
Observability
Events Table
Every agent action is logged to the events table:
await log_event(
    tenant_id=tenant_id,
    event_type="agent.clinical_context.extraction_complete",
    case_id=case_id,
    payload={
        "document_id": doc_id,
        "entities_found": len(entities),
        "fallback_used": False,
        "duration_ms": elapsed,
    },
)
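The `duration_ms` value in the payload has to be measured somewhere. One possible way to collect it is a small context manager, sketched here; `timed_payload` is a hypothetical helper and the field values are placeholders.

```python
import time
from contextlib import contextmanager
from typing import Any, Dict, Iterator

@contextmanager
def timed_payload(**fields: Any) -> Iterator[Dict[str, Any]]:
    """Yield a payload dict and add duration_ms when the block exits."""
    payload: Dict[str, Any] = dict(fields)
    start = time.perf_counter()
    try:
        yield payload
    finally:
        # Recorded even if the block raises, so failed runs are timed too.
        payload["duration_ms"] = int((time.perf_counter() - start) * 1000)

# Usage: the work done inside the block is what gets timed.
with timed_payload(document_id="doc-123", fallback_used=False) as payload:
    payload["entities_found"] = 3  # placeholder for the real extraction result
```

After the block, `payload` holds the fields shown in the log_event call above and can be passed to it directly.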
Langfuse Traces
Each agent invocation creates a Langfuse trace with:
- Trace: Full agent execution (e.g., clinical_context_agent)
- Spans: Individual node executions (e.g., extract_clinical_entities)
- Generations: LLM calls with input/output tokens and cost
- Scores: Quality metrics (extraction accuracy, explanation helpfulness)
graph TD
T[Trace: clinical_context_agent] --> S1[Span: extract_clinical_entities]
T --> S2[Span: map_to_medical_codes]
T --> S3[Span: generate_fhir_resources]
T --> S4[Span: store_resources]
S1 --> G1[Generation: claude-sonnet-4.6]
S2 --> G2[Generation: claude-haiku-4.5]
S3 --> G3[Generation: claude-haiku-4.5]
style T fill:#008B8B,color:#fff
style S1 fill:#4A90D9,color:#fff
style S2 fill:#4A90D9,color:#fff
style S3 fill:#4A90D9,color:#fff
style S4 fill:#4A90D9,color:#fff
style G1 fill:#FF7F50,color:#fff
style G2 fill:#FF7F50,color:#fff
style G3 fill:#FF7F50,color:#fff
Model Selection
| Agent / Task | Model | Rationale |
|---|---|---|
| Clinical Context Agent (extraction) | Claude Sonnet 4.6 | Highest accuracy needed for medical data |
| Clinical Context Agent (coding) | Claude Haiku 4.5 | Lookup-heavy, lower complexity |
| Intake Agent | Claude Haiku 4.5 | Conversational, high-volume |
| Match Agent (orchestration) | Claude Haiku 4.5 | Mostly deterministic scoring |
| Match Agent (reranking) | Claude Sonnet 4.6 | Complex multi-factor reasoning |
| Explanation Agent | Claude Haiku 4.5 | Natural language generation |
| MCP Tools | Claude Haiku 4.5 | External tool responses |