Agent System

Overview

Curaway's AI agent system is the brain of the platform. It orchestrates multi-step workflows that combine clinical understanding, patient interaction, provider matching, and natural-language explanation. The system is built on three pillars:

LangGraph -- Orchestration framework for multi-node, stateful AI workflows
LangChain -- Tool wrappers that give agents access to databases, APIs, and external services
Langfuse -- Observability platform for tracing, prompt management, and cost tracking

Healthcare Safety Principle

Every agent has a deterministic fallback path. If an LLM call fails, times out, or returns invalid output, the system falls back to rule-based logic, so an LLM failure never breaks a healthcare workflow.

Architecture

Conversation engine: v6.2-flexible

The conversational turn is no longer driven by a fixed 8-phase state machine. It is dispatched to one of three architecture generations (v4 / v6 / v6.2-flexible) per case. v6.2-flexible — single-call, SOP-contract-guided triage — is the current target architecture. See ADR-0030 for the full decision and the v6.2 specs. The specialized agents described below (clinical context, matching, explanation) remain the functional components invoked as a case progresses.

Single Entry Point

All patient interactions flow through one API endpoint:

POST /api/v1/cases/{case_id}/chat

The handler (app/routers/chat.py) does not run a phase state machine. It resolves the conversation architecture for the case and dispatches the turn. Architecture is resolved once per case, at the first turn, and written to case.workflow_state.prompt_arch — flag flips never re-route an in-flight conversation (see sticky dispatch).

Conversation architecture dispatch

The chat handler dispatches each turn to one of three architecture generations, selected per case via Flagsmith and made sticky on first resolve:

Arch	Engine	How a turn is handled	Flag
v6.2-flexible	`run_triage_turn_v6_2` (`app/agents/triage_v6_2/dispatch.py`)	One LLM call both converses and emits a structured `StateDelta`, guided by a per-procedure `SOPContract`. State lives in the consolidated `Case.consolidated_state` column.	`prompt_arch_v6_2_flexible` + `consolidated_state_backfill_complete`
v6	`compose_v6` (`app/services/prompt_loader_v6/composer.py`)	Prompt assembled from a stage resolver + injected knowledge/SOP addendums; extraction is a separate concern. Current GA path.	`prompt_arch="v6"`
v4	phase×layer composition	Phase template crossed with per-layer context; separate extractor chain. The rollback floor — every newer dispatcher falls back to v4 on hard error.	`prompt_arch="v4"`

graph TD
    Entry["POST /chat"] --> Dispatch[Architecture dispatcher]
    Dispatch -->|sticky per-case| Resolve{"workflow_state.prompt_arch"}
    Resolve -->|v6.2-flexible| T62["run_triage_turn_v6_2<br/>(single call + SOPContract)"]
    Resolve -->|v6| C6["compose_v6<br/>(stages + addendums)"]
    Resolve -->|v4 / fallback| V4["phase×layer<br/>(rollback floor)"]
    T62 --> Agents["Specialized agents:<br/>Clinical Context · Match · Explanation"]
    C6 --> Agents
    V4 --> Agents

    style Dispatch fill:#008B8B,color:#fff
    style T62 fill:#FF7F50,color:#fff
    style Agents fill:#FF7F50,color:#fff
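The routing in the table and diagram above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the handler in app/routers/chat.py: the `Turn` shape, the `HardError` exception, and the three handler functions are hypothetical stand-ins, and treating an unknown arch as v4 is an assumption drawn from the "v4 / fallback" edge. Only the arch names and the rule that newer engines fall back to v4 on hard error come from the source.

```python
from dataclasses import dataclass
from typing import Callable


class HardError(Exception):
    """Hypothetical marker for an unrecoverable engine failure."""


@dataclass
class Turn:
    case_id: str
    message: str


def handle_v4(turn: Turn) -> str:
    # Stand-in for phase x layer composition (the rollback floor).
    return f"v4:{turn.message}"


def handle_v6(turn: Turn) -> str:
    # Stand-in for compose_v6.
    return f"v6:{turn.message}"


def handle_v6_2(turn: Turn) -> str:
    # Stand-in for run_triage_turn_v6_2; it fails here to show the fallback.
    raise HardError("simulated engine failure")


HANDLERS: dict[str, Callable[[Turn], str]] = {
    "v4": handle_v4,
    "v6": handle_v6,
    "v6.2-flexible": handle_v6_2,
}


def dispatch_turn(prompt_arch: str, turn: Turn) -> str:
    """Route a turn by its sticky arch; newer engines fall back to v4."""
    handler = HANDLERS.get(prompt_arch, handle_v4)
    try:
        return handler(turn)
    except HardError:
        if handler is handle_v4:
            raise  # nothing below the floor to fall back to
        return handle_v4(turn)
```

Keeping v4 as the single floor means every newer engine needs only one recovery path, which keeps the failure behavior easy to reason about.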

Sticky dispatch. resolve_prompt_arch_for_case resolves the architecture once and persists it on the case. A case that starts on v6 finishes on v6 even if prompt_arch_v6_2_flexible is flipped mid-conversation. A rollout (or rollback) only changes which arch new cases resolve to; it never moves a patient mid-conversation onto a different engine. This property is what makes the canary safe.
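The resolve-once behavior can be illustrated with a small sketch. The function below is a hypothetical stand-in for resolve_prompt_arch_for_case, whose real signature is not shown here: it assumes `workflow_state` is a plain dict and that flag values arrive as a dict. The flag names are taken from the dispatch table; the order in which they are consulted is an assumption.

```python
def resolve_arch_sticky(workflow_state: dict, flags: dict) -> str:
    """Resolve the arch on the first turn, persist it, and reuse it afterwards."""
    # Sticky: a previously persisted value always wins over current flags.
    if "prompt_arch" in workflow_state:
        return workflow_state["prompt_arch"]

    # First turn only: consult the flags (hypothetical precedence).
    if flags.get("prompt_arch_v6_2_flexible") and flags.get(
        "consolidated_state_backfill_complete"
    ):
        arch = "v6.2-flexible"
    else:
        arch = flags.get("prompt_arch", "v4")

    workflow_state["prompt_arch"] = arch  # persisted on the case
    return arch


# An in-flight case keeps its arch even after the flags change.
case_state: dict = {}
first = resolve_arch_sticky(case_state, {"prompt_arch": "v6"})
later = resolve_arch_sticky(
    case_state,
    {"prompt_arch_v6_2_flexible": True, "consolidated_state_backfill_complete": True},
)
```

In this sketch both `first` and `later` resolve to the same value, while a new case created after the flip would resolve to v6.2-flexible.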

First-Message Attachments

When a user sends their first message with a file attachment (e.g. "I need a knee replacement" + blood work PDF), the agent identifies the procedure and processes the attachments in a single turn — it does not ask for records that were just uploaded. The procedure confirmation response is combined with the document analysis.

The Four Agents

1. Clinical Context Agent

The Clinical Context Agent processes medical documents and extracts structured clinical data. It is the most complex agent, implemented as a 4-node LangGraph workflow.

Purpose: Transform unstructured medical documents into structured FHIR R4 resources.

Model: Claude Sonnet 4.6 (requires high-accuracy clinical reasoning)

graph LR
    A[extract_clinical_entities] --> B[map_to_medical_codes]
    B --> C[generate_fhir_resources]
    C --> D[store_resources]

    style A fill:#008B8B,color:#fff
    style B fill:#008B8B,color:#fff
    style C fill:#008B8B,color:#fff
    style D fill:#008B8B,color:#fff

Node Details:

Node	Input	Output	Fallback
`extract_clinical_entities`	Raw OCR text	Structured entities (conditions, labs, meds)	Regex-based extraction patterns
`map_to_medical_codes`	Extracted entities	ICD-10, CPT, LOINC codes	Lookup table mapping
`generate_fhir_resources`	Coded entities	FHIR R4 JSON resources	Template-based FHIR generation
`store_resources`	Validated FHIR	Database confirmation	Direct SQL insert

State Schema:

class ClinicalContextState(TypedDict):
    """State passed between Clinical Context Agent nodes."""
    document_id: str
    tenant_id: str
    patient_id: str
    case_id: str
    raw_text: str
    extracted_entities: dict          # From node 1
    medical_codes: dict               # From node 2
    fhir_resources: list[dict]        # From node 3
    store_confirmation: dict          # From node 4
    errors: list[str]
    fallback_used: bool
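To show how state flows through the four nodes, here is a plain-Python stand-in that runs them in sequence. It deliberately does not use the LangGraph API, and every node body is a placeholder: the one-entry code table, the line-per-condition extraction, and the stripped-down Condition dict (which is not a valid FHIR R4 resource) are assumptions made for illustration.

```python
from typing import Callable, TypedDict


class PipelineState(TypedDict, total=False):
    raw_text: str
    extracted_entities: dict
    medical_codes: dict
    fhir_resources: list[dict]
    store_confirmation: dict
    errors: list[str]


Node = Callable[[PipelineState], PipelineState]


def extract_clinical_entities(state: PipelineState) -> PipelineState:
    # Placeholder: treat each non-empty line as one condition.
    lines = [ln.strip() for ln in state["raw_text"].splitlines() if ln.strip()]
    state["extracted_entities"] = {"conditions": lines}
    return state


def map_to_medical_codes(state: PipelineState) -> PipelineState:
    # Placeholder lookup table with a single ICD-10 entry.
    table = {"type 2 diabetes": "E11.9"}
    conditions = state["extracted_entities"]["conditions"]
    state["medical_codes"] = {c: table.get(c.lower()) for c in conditions}
    return state


def generate_fhir_resources(state: PipelineState) -> PipelineState:
    # Simplified Condition-like dicts; unmapped conditions are skipped.
    state["fhir_resources"] = [
        {"resourceType": "Condition", "code": {"text": name, "coding": [{"code": code}]}}
        for name, code in state["medical_codes"].items()
        if code is not None
    ]
    return state


def store_resources(state: PipelineState) -> PipelineState:
    # Placeholder for persistence: report how many resources would be stored.
    state["store_confirmation"] = {"stored": len(state["fhir_resources"])}
    return state


PIPELINE: list[Node] = [
    extract_clinical_entities,
    map_to_medical_codes,
    generate_fhir_resources,
    store_resources,
]


def run_pipeline(raw_text: str) -> PipelineState:
    """Thread one state dict through the nodes in order."""
    state: PipelineState = {"raw_text": raw_text, "errors": []}
    for node in PIPELINE:
        state = node(state)
    return state
```

Each node reads only the keys written by earlier nodes, which is what the "From node N" comments in the state schema describe.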

Comorbidity Detection

Comorbidity detection is rule-based, not LLM-based. The system maintains a lookup table of common comorbidity pairs (e.g., diabetes + hypertension, obesity + sleep apnea) and flags them deterministically. Because no LLM call is involved, detection adds no model cost per case.
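A pair lookup of this kind might look like the sketch below. The table contents beyond the two example pairs, the normalization step, and the function name are assumptions; the actual table and matching rules are not shown in this document.

```python
from itertools import combinations

# Hypothetical table; only these two example pairs come from the text above.
COMORBIDITY_PAIRS: set[frozenset[str]] = {
    frozenset({"diabetes", "hypertension"}),
    frozenset({"obesity", "sleep apnea"}),
}


def detect_comorbidities(conditions: list[str]) -> list[tuple[str, str]]:
    """Return every known comorbidity pair present in the condition list."""
    # Normalize and de-duplicate so matching is order- and case-insensitive.
    normalized = sorted({c.strip().lower() for c in conditions})
    return [
        (a, b)
        for a, b in combinations(normalized, 2)
        if frozenset({a, b}) in COMORBIDITY_PAIRS
    ]
```

Storing each pair as a `frozenset` makes the lookup independent of the order in which the two conditions were extracted.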

2. Intake Agent

The Intake Agent conducts conversational intake to gather patient preferences, travel constraints, and treatment requirements.

Purpose: Collect structured preferences through natural conversation.

Model: Claude Haiku 4.5 (conversational, low-cost)

graph LR
    A[classify_message] --> B[collect_preferences]
    B --> C[suggest_options]
    C --> D[update_case]

    style A fill:#008B8B,color:#fff
    style B fill:#008B8B,color:#fff
    style C fill:#008B8B,color:#fff
    style D fill:#008B8B,color:#fff

Node Details:

Node	Input	Output	Fallback
`classify_message`	Patient message	Intent classification	Keyword matching
`collect_preferences`	Classified intent	Structured preference data	Form-based collection
`suggest_options`	Current preferences	Contextual suggestions	Static suggestion list
`update_case`	Confirmed preferences	Updated case record	Direct DB update
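The keyword-matching fallback for `classify_message` could be as simple as the following sketch. The intent labels and keyword lists are hypothetical; the real intent taxonomy is not documented here.

```python
# Hypothetical intent labels and keywords, for illustration only.
INTENT_KEYWORDS: dict[str, tuple[str, ...]] = {
    "budget": ("budget", "cost", "price", "afford"),
    "travel": ("travel", "flight", "visa", "country"),
    "documents": ("report", "scan", "upload", "records"),
}


def classify_message_fallback(message: str) -> str:
    """Pick the intent with the most keyword hits; 'unknown' if none match."""
    text = message.lower()
    scores = {
        intent: sum(word in text for word in words)
        for intent, words in INTENT_KEYWORDS.items()
    }
    # On a tie, max() keeps the first intent in dictionary order.
    best = max(scores, key=lambda intent: scores[intent])
    return best if scores[best] > 0 else "unknown"
```

Returning an explicit "unknown" lets the caller route the message to form-based collection instead of guessing an intent.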

Collected Preferences:

class PatientPreferences(BaseModel):
    """Preferences collected by the Intake Agent."""
    budget_range_usd: Optional[tuple[int, int]]
    preferred_countries: list[str]           # ISO 3166-1 alpha-3
    excluded_countries: list[str]
    preferred_languages: list[str]
    travel_date_range: Optional[tuple[date, date]]
    companion_count: int = 0
    dietary_restrictions: list[str]
    accessibility_needs: list[str]
    insurance_provider: Optional[str]
    previous_medical_travel: bool = False
    priority: str = "balanced"               # "cost", "quality", "speed", "balanced"

State Schema:

class IntakeState(TypedDict):
    """State passed between Intake Agent nodes."""
    case_id: str
    tenant_id: str
    patient_id: str
    message: str
    intent: str                              # From node 1
    current_preferences: dict                # Existing preferences
    new_preferences: dict                    # From node 2
    suggestions: list[str]                   # From node 3
    update_confirmation: dict                # From node 4
    conversation_history: list[dict]
    missing_fields: list[str]
    errors: list[str]

3. Match Agent

The Match Agent orchestrates the provider matching workflow, combining graph traversal, semantic search, and weighted scoring.

Purpose: Find and rank the best providers and doctors for a patient's case.

Model: Claude Haiku 4.5 (orchestration) and Claude Sonnet 4.6 (reranking), plus deterministic scoring

graph LR
    A[analyze_requirements] --> B[gather_requirements]
    B --> C[execute_scoring]
    C --> D[rerank_and_explain]

    style A fill:#FF7F50,color:#fff
    style B fill:#FF7F50,color:#fff
    style C fill:#FF7F50,color:#fff
    style D fill:#FF7F50,color:#fff

Node Details:

Node	Input	Output	Fallback
`analyze_requirements`	Case data, FHIR resources	Structured matching criteria	Rule-based criteria extraction
`gather_requirements`	Matching criteria	Provider candidates from Neo4j + Qdrant	Direct Neo4j query
`execute_scoring`	Candidates + criteria	Scored and ranked results	Weighted rules scoring
`rerank_and_explain`	Scored results	Final ranking with explanations	Template-based explanations

State Schema:

class MatchState(TypedDict):
    """State passed between Match Agent nodes."""
    case_id: str
    tenant_id: str
    patient_id: str
    clinical_data: dict                      # FHIR resources
    patient_preferences: dict
    matching_criteria: dict                  # From node 1
    candidates: list[dict]                   # From node 2
    scored_results: list[dict]               # From node 3
    final_results: list[dict]                # From node 4
    strategy_used: str
    errors: list[str]
    fallback_used: bool
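The deterministic "weighted rules" scoring used by `execute_scoring` can be pictured as a weighted sum over per-dimension scores. The dimension names, the weights, and the candidate shape below are all hypothetical; the actual scoring configuration is not given in this document.

```python
# Hypothetical dimensions and weights; the real scoring config is not shown here.
WEIGHTS: dict[str, float] = {"clinical": 0.5, "cost": 0.3, "travel": 0.2}


def score_candidate(dimension_scores: dict[str, float]) -> float:
    """Weighted sum of per-dimension scores in [0, 1]; missing dimensions count as 0."""
    return sum(w * dimension_scores.get(dim, 0.0) for dim, w in WEIGHTS.items())


def rank_candidates(candidates: list[dict]) -> list[dict]:
    """Attach a rounded score to each candidate and sort best-first."""
    scored = [
        {**c, "score": round(score_candidate(c["scores"]), 4)} for c in candidates
    ]
    # Ties are broken by provider_id so the ordering is reproducible.
    return sorted(scored, key=lambda c: (-c["score"], c["provider_id"]))
```

A deterministic tie-break matters here because the same inputs should always produce the same ranking, with or without the LLM reranking step.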

4. Explanation Agent

The Explanation Agent generates natural-language explanations of matching results, tailored to the patient's locale and language.

Purpose: Make AI matching decisions transparent and understandable.

Model: Claude Haiku 4.5 (natural language generation)

Capabilities:

Generates per-provider explanations (why this provider was recommended)
Generates per-dimension explanations (why the clinical score is X)
Adapts language to patient's preferred_language
Adapts complexity to patient's indicated health literacy level
Highlights strengths and potential concerns for each match

class ExplanationOutput(BaseModel):
    """Output from the Explanation Agent."""
    provider_id: str
    summary: str                             # 2-3 sentence overview
    strengths: list[str]                     # Top 3 strengths
    considerations: list[str]                # Things to be aware of
    dimension_explanations: dict[str, str]   # Per-scoring-dimension
    confidence_note: Optional[str]           # If data completeness is low
    locale: str                              # Language code used

Locale-Aware Explanations

The Explanation Agent detects the patient's preferred language from their profile and generates explanations in that language. For the MVP, English, Hindi, Arabic, Turkish, and Thai are supported.
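Locale selection can be sketched as a lookup against the supported set. The two-letter codes are the ISO 639-1 codes for the five MVP languages; defaulting to English for unsupported or missing languages, and accepting region-tagged values such as "tr-TR", are assumptions.

```python
from typing import Optional

# ISO 639-1 codes for English, Hindi, Arabic, Turkish, and Thai.
SUPPORTED_LOCALES = {"en", "hi", "ar", "tr", "th"}
DEFAULT_LOCALE = "en"  # assumption: English is the fallback language


def resolve_locale(preferred_language: Optional[str]) -> str:
    """Map a profile language tag (e.g. 'tr-TR') to a supported locale."""
    if not preferred_language:
        return DEFAULT_LOCALE
    # Keep only the primary language subtag and compare case-insensitively.
    base = preferred_language.replace("_", "-").split("-")[0].lower()
    return base if base in SUPPORTED_LOCALES else DEFAULT_LOCALE
```

The resolved value is what would populate the `locale` field of `ExplanationOutput`.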

Pre-Operative Risk Assessor (Rule-Based)

At the end of every EHR rebuild, ehr_builder_agent runs app/services/risk_assessor.py — a pure-function, rule-based pre-operative risk classifier (no LLM). It mirrors the lab_analyzer pattern: deterministic, auditable, healthcare-safe.

The assessor inspects four buckets and writes the result to ehr_snapshot.risk_factors:

Bucket	Examples	Severity
Age	≥70 moderate, ≥80 high	moderate / high
Comorbidities	Diabetes, AFib, heart failure, CKD, COPD, OSA, anemia	low → high
Medications	Anticoagulants (blocking), antiplatelets, immunosuppressants, NSAIDs, diabetes meds	low → blocking
Labs	HbA1c ≥9% (blocking), Hgb <8 (blocking), eGFR <30 (high), INR >1.5 (blocking), platelets <100k (blocking)	high / blocking

Each risk record carries source provenance (which document / observation / med record it came from) and an is_blocking flag. Blocking risks halt forwarding until resolved; the frontend EHR drawer surfaces a "BLOCKING" badge when any risk is blocking. Covered by 26 unit tests in tests/test_risk_assessor.py.
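The age and lab rules in the table translate naturally into pure functions. The sketch below is not the code in app/services/risk_assessor.py: the `Risk` record shape, the lab key names, and the function names are hypothetical, and only the thresholds and severities are taken from the table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Risk:
    bucket: str
    detail: str
    severity: str  # "moderate", "high", or "blocking" in this sketch
    source: str    # provenance: which record the value came from

    @property
    def is_blocking(self) -> bool:
        return self.severity == "blocking"


def assess_age(age: int, source: str) -> list[Risk]:
    """Age >= 80 is high risk, age >= 70 is moderate risk."""
    if age >= 80:
        return [Risk("age", f"age {age}", "high", source)]
    if age >= 70:
        return [Risk("age", f"age {age}", "moderate", source)]
    return []


# (hypothetical lab key, breach predicate, severity) per the table above.
LAB_RULES = [
    ("hba1c_pct", lambda v: v >= 9.0, "blocking"),
    ("hemoglobin", lambda v: v < 8.0, "blocking"),
    ("egfr", lambda v: v < 30.0, "high"),
    ("inr", lambda v: v > 1.5, "blocking"),
    ("platelets_k", lambda v: v < 100.0, "blocking"),
]


def assess_labs(labs: dict[str, float], source: str) -> list[Risk]:
    """Return one Risk per lab value that breaches its threshold."""
    return [
        Risk("labs", f"{name}={labs[name]}", severity, source)
        for name, breached, severity in LAB_RULES
        if name in labs and breached(labs[name])
    ]
```

For example, `assess_age(82, "demographics") + assess_labs({"hba1c_pct": 9.4}, "doc-1")` yields one high and one blocking risk, and `any(r.is_blocking for r in risks)` is the check that would halt forwarding.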

Deterministic Fallbacks

Every agent node has a fallback implementation that runs without LLM calls:

import logging

logger = logging.getLogger(__name__)

# llm_extract, regex_extract, LLMError, and ValidationError are assumed to be
# imported from elsewhere in the application.
async def extract_clinical_entities(state: ClinicalContextState) -> ClinicalContextState:
    """Extract clinical entities from document text, falling back to regex on LLM failure."""
    try:
        # Primary: LLM-based extraction
        state["extracted_entities"] = await llm_extract(state["raw_text"])
    except (LLMError, TimeoutError, ValidationError) as e:
        # Fallback: regex + lookup table extraction; record that it was used
        logger.warning("LLM extraction failed, using fallback: %s", e)
        state["extracted_entities"] = regex_extract(state["raw_text"])
        state["fallback_used"] = True
        state["errors"].append(f"Fallback used for extraction: {e}")
    return state

Agent	Primary Path	Fallback Path	Fallback Quality
Clinical Context	Claude Sonnet extraction	Regex + lookup tables	~70% of LLM accuracy
Intake	Claude Haiku conversation	Form-based collection	Functional but rigid
Match	LLM-enhanced scoring	Weighted rules only	~90% of LLM accuracy
Explanation	Claude Haiku generation	Template-based text	Functional but generic
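Because the primary and fallback paths follow the same shape for every agent, the pattern can be factored into a helper that also enforces a deadline. This is a sketch under stated assumptions rather than the project's implementation: `with_fallback` is a hypothetical name, and `ValueError` stands in for whatever exception signals invalid LLM output.

```python
import asyncio
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")


async def with_fallback(
    primary: Callable[[], Awaitable[T]],
    fallback: Callable[[], T],
    timeout_s: float,
) -> tuple[T, bool]:
    """Run primary under a deadline; return (result, fallback_used)."""
    try:
        result = await asyncio.wait_for(primary(), timeout=timeout_s)
    except (asyncio.TimeoutError, ValueError):
        # Timeout or invalid output: switch to the deterministic path.
        return fallback(), True
    return result, False


async def slow_llm_call() -> str:
    # Simulates an LLM call that exceeds the deadline.
    await asyncio.sleep(1)
    return "llm"


outcome = asyncio.run(with_fallback(slow_llm_call, lambda: "rules", timeout_s=0.01))
```

Returning the `fallback_used` flag alongside the result lets the caller set the matching field in the agent state and in the logged event.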

MCP Server

Curaway exposes an MCP (Model Context Protocol) server with 6 tools for external AI assistants to interact with the platform:

Tool	Description	Parameters
`search_patients`	Find patients by name, email, or ID	`query`, `tenant_id`
`get_patient_clinical_summary`	Get FHIR-based clinical summary	`patient_id`, `tenant_id`
`search_providers`	Search providers by specialty, location, accreditation	`criteria`, `tenant_id`
`run_match`	Execute matching for a case	`case_id`, `tenant_id`
`get_match_explanation`	Get explanation for a match result	`match_id`, `tenant_id`
`check_consent`	Verify patient consent status	`patient_id`, `consent_type`, `tenant_id`

# MCP tool registration
@mcp_server.tool("search_providers")
async def search_providers(criteria: ProviderSearchCriteria, tenant_id: str):
    """Search for healthcare providers matching the given criteria."""
    results = await provider_service.search(
        tenant_id=tenant_id,
        specialty=criteria.specialty,
        country=criteria.country,
        accreditation=criteria.accreditation,
        max_results=criteria.max_results or 10,
    )
    return [provider.to_mcp_response() for provider in results]

Feature Flags

Agent behavior is controlled by Flagsmith feature flags:

Flag	Default	Description
`agent_enhanced_matching`	`false`	Use Match Agent instead of pure deterministic matching
`agent_explanations_enabled`	`true`	Generate LLM explanations (vs. template-based)
`clinical_context_agent_enabled`	`true`	Use LangGraph clinical extraction pipeline
`intake_agent_conversational`	`true`	Conversational intake vs. form-based
`mcp_server_enabled`	`false`	Expose MCP tools externally
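One way to keep agent behavior predictable when the flag service is unreachable is to resolve each flag against the documented defaults. The helper below is a hypothetical sketch that does not call the Flagsmith SDK; `remote_values` stands in for whatever flag state was fetched, and the defaults are copied from the table above.

```python
from typing import Optional

# Defaults copied from the flag table above.
FLAG_DEFAULTS: dict[str, bool] = {
    "agent_enhanced_matching": False,
    "agent_explanations_enabled": True,
    "clinical_context_agent_enabled": True,
    "intake_agent_conversational": True,
    "mcp_server_enabled": False,
}


def is_enabled(flag: str, remote_values: Optional[dict[str, bool]] = None) -> bool:
    """Prefer the fetched value; otherwise use the documented default."""
    default = FLAG_DEFAULTS[flag]  # raises KeyError on an unknown flag name
    if remote_values is None:
        return default
    return remote_values.get(flag, default)
```

Raising on unknown names is a deliberate choice in this sketch: a misspelled flag fails loudly instead of silently evaluating to a default.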

Observability

Events Table

Every agent action is logged to the events table:

await log_event(
    tenant_id=tenant_id,
    event_type="agent.clinical_context.extraction_complete",
    case_id=case_id,
    payload={
        "document_id": doc_id,
        "entities_found": len(entities),
        "fallback_used": False,
        "duration_ms": elapsed,
    }
)

Langfuse Traces

Each agent invocation creates a Langfuse trace with:

Trace: Full agent execution (e.g., clinical_context_agent)
Spans: Individual node executions (e.g., extract_clinical_entities)
Generations: LLM calls with input/output tokens and cost
Scores: Quality metrics (extraction accuracy, explanation helpfulness)

graph TD
    T[Trace: clinical_context_agent] --> S1[Span: extract_clinical_entities]
    T --> S2[Span: map_to_medical_codes]
    T --> S3[Span: generate_fhir_resources]
    T --> S4[Span: store_resources]
    S1 --> G1[Generation: claude-sonnet-4.6]
    S2 --> G2[Generation: claude-haiku-4.5]
    S3 --> G3[Generation: claude-haiku-4.5]

    style T fill:#008B8B,color:#fff
    style S1 fill:#4A90D9,color:#fff
    style S2 fill:#4A90D9,color:#fff
    style S3 fill:#4A90D9,color:#fff
    style S4 fill:#4A90D9,color:#fff
    style G1 fill:#FF7F50,color:#fff
    style G2 fill:#FF7F50,color:#fff
    style G3 fill:#FF7F50,color:#fff

Model Selection

Agent / Task	Model	Rationale
Clinical Context Agent (extraction)	Claude Sonnet 4.6	Highest accuracy needed for medical data
Clinical Context Agent (coding)	Claude Haiku 4.5	Lookup-heavy, lower complexity
Intake Agent	Claude Haiku 4.5	Conversational, high-volume
Match Agent (orchestration)	Claude Haiku 4.5	Mostly deterministic scoring
Match Agent (reranking)	Claude Sonnet 4.6	Complex multi-factor reasoning
Explanation Agent	Claude Haiku 4.5	Natural language generation
MCP Tools	Claude Haiku 4.5	External tool responses