Troubleshooting¶
Common issues and their resolutions for the Curaway platform.
Git Sync Issues¶
"Already up to date" but code is stale¶
Symptom: git pull says up to date, but local code doesn't match what's deployed.
Fix: Use the gsync alias.
Deployment Issues¶
Cloud Run health check timeout¶
Symptom: Cloud Run deploy fails — the new revision never becomes ready and traffic isn't routed to it.
Cause: The startup probe points at /health, which is too slow (11 DB queries + 3 HTTP calls) and exceeds the probe window.
Fix: Point the Cloud Run startup probe at the lightweight /ready endpoint (the container listens on port 8000):
gcloud run services update curaway-backend \
--project=curaway-dev --region=asia-south1 \
--startup-probe=httpGet.path=/ready,httpGet.port=8000,timeoutSeconds=10,periodSeconds=10,failureThreshold=3
The canonical probe config lives in service.yaml in the curaway-ai/curaway-gcp-infra repo (applied by the Cloud Deploy pipeline) — prefer fixing it there so a pipeline deploy doesn't overwrite a manual gcloud override.
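What makes /ready suitable for a startup probe is that it answers without touching the database or any upstream service. The sketch below illustrates that idea only. It assumes the backend is served as an ASGI application, which the text above does not state, and the names app and call are hypothetical.

```python
import asyncio
import json


async def app(scope, receive, send):
    """Minimal ASGI app: /ready answers immediately, with no DB or HTTP calls."""
    if scope["type"] != "http":
        return
    if scope["path"] == "/ready":
        status, payload = 200, {"status": "ready"}
    else:
        status, payload = 404, {"detail": "not found"}
    body = json.dumps(payload).encode()
    await send({
        "type": "http.response.start",
        "status": status,
        "headers": [
            (b"content-type", b"application/json"),
            (b"content-length", str(len(body)).encode()),
        ],
    })
    await send({"type": "http.response.body", "body": body})


def call(path):
    """Drive the app once without a server and return (status, decoded body)."""
    sent = []

    async def receive():
        return {"type": "http.request", "body": b"", "more_body": False}

    async def send(message):
        sent.append(message)

    scope = {"type": "http", "method": "GET", "path": path}
    asyncio.run(app(scope, receive, send))
    return sent[0]["status"], json.loads(sent[1]["body"])
```

Deep dependency checks (the 11 DB queries and 3 HTTP calls) can stay on /health for monitoring, where a slow response does not block a rollout.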
API Issues¶
Cloudflare proxy returning HTML instead of JSON¶
Symptom: API calls return HTML challenge pages instead of JSON.
Cause: Cloudflare proxy mode (orange cloud) intercepts requests.
Fix: Use DNS-only mode (grey cloud) for all Cloudflare records. See ADR-0008.
File Upload Issues¶
Agent "can't see attachment"¶
Root causes (in order):
- OCR not completed: Async OCR queued but not finished when chat runs. Fix: synchronous PyMuPDF inline. See ADR-0010.
- has_issues status: Document validator flagged issues but status not counted as analyzed. Fix: count has_issues as analyzed.
- Scanned PDF: PyMuPDF returns 0 chars. Fix: falls back to Unstructured.io or Claude Vision.
- Toxic history: Old "can't see file" messages poison LLM context. Fix: filter toxic messages.
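Two of the fixes above (counting has_issues as analyzed, and filtering toxic history) can be sketched as below. This is an illustration under assumptions, not the platform's code: the status set, the phrase list, the message shape (dicts with role and content), and both function names are hypothetical.

```python
# Assumed status names; only "has_issues" appears in the text above.
ANALYZED_STATUSES = {"analyzed", "has_issues"}

# Hypothetical phrases that mark a stale "can't see the file" reply.
TOXIC_PHRASES = ("can't see", "cannot see", "don't see any attachment")


def is_analyzed(status):
    """Treat documents flagged by the validator as analyzed, not pending."""
    return status in ANALYZED_STATUSES


def filter_toxic_history(messages):
    """Drop assistant messages that claim an attachment is not visible."""
    kept = []
    for msg in messages:
        text = msg.get("content", "").lower()
        is_toxic = msg.get("role") == "assistant" and any(
            phrase in text for phrase in TOXIC_PHRASES
        )
        if not is_toxic:
            kept.append(msg)
    return kept
```

Only assistant messages are filtered, so a user's own "can't see" wording still reaches the model.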
First message with attachment — agent asks for records it already has¶
Cause: Orchestrator routed to procedure identification before processing attachments. The procedure ID handler always appended "do you have records?" without checking.
Fix (Session 27B): Orchestrator now detects attachments during procedure identification and processes them inline, replacing the records request with document analysis.
EHR showing duplicate conditions (e.g. 53 instead of 7)¶
Cause: Clinical Context Agent ran multiple times on the same document (QStash retries, re-analysis). Each run created duplicate FHIR Condition resources. EHR builder appended them all without dedup.
Fix (Session 27B): Two layers — EHR builder deduplicates by ICD-10 code (or name) at construction time. EHR API endpoint also deduplicates at response time for existing dirty snapshots.
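The deduplication rule (ICD-10 code first, name as fallback) can be sketched as below. This is a minimal illustration, assuming each condition is a dict with icd10_code and name keys. Those field names and the function name are hypothetical, and the real FHIR Condition resources are more deeply nested.

```python
def dedupe_conditions(conditions):
    """Keep the first condition per ICD-10 code, falling back to the name."""
    seen = set()
    unique = []
    for cond in conditions:
        code = (cond.get("icd10_code") or "").strip().upper()
        name = (cond.get("name") or "").strip().lower()
        # Prefer the code as the identity; use the name only when no code exists.
        key = ("code", code) if code else ("name", name)
        if key in seen:
            continue
        seen.add(key)
        unique.append(cond)
    return unique
```

Because the function is idempotent, the same helper could serve both layers: once when the EHR is built and again when an existing snapshot is returned.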
Document matching wrong count¶
Fix: Matching is embedding-based as of Session 25B. Re-seed the embeddings with: python -m app.seed_embeddings
Frontend Issues¶
React error #310¶
Cause: useState is called after an early return, so the number of hooks called differs between renders. Fix: Move ALL hooks before any conditional returns.
Blank page after signup¶
Cause: A ReferenceError from a variable used outside the scope where it is defined. Check the browser console to find the failing identifier.
Flagsmith warning flood¶
Cause: The code checks flags that do not exist in Flagsmith, and each check logs a warning. Fix: a 60s cache in feature_flags.py.
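A 60-second cache of this kind can be sketched as follows. This is an illustration, not the contents of feature_flags.py: FlagCache, the fetch callable, and the injectable clock are hypothetical names, and fetch stands in for whatever call reaches Flagsmith.

```python
import time


class FlagCache:
    """Cache flag lookups, including misses, for a fixed time-to-live."""

    def __init__(self, fetch, ttl_seconds=60.0, clock=time.monotonic):
        self._fetch = fetch        # hypothetical callable: flag name -> bool
        self._ttl = ttl_seconds
        self._clock = clock        # injectable so the TTL can be tested
        self._entries = {}         # flag name -> (expires_at, value)

    def is_enabled(self, name):
        now = self._clock()
        entry = self._entries.get(name)
        if entry is not None and entry[0] > now:
            return entry[1]
        # Miss or expired: ask the backend once and remember the answer,
        # even when the flag does not exist and the answer is False.
        value = self._fetch(name)
        self._entries[name] = (now + self._ttl, value)
        return value
```

Caching the negative result is what limits the warnings: a nonexistent flag triggers at most one lookup per TTL window instead of one per check.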
Database Issues¶
EHR showing 50% with nothing populated¶
Cause: The completeness check used len(comorbidities) >= 0, which is always true, so an empty list counted as populated. Fix: changed the comparison to > 0.
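The effect of the comparison is easy to reproduce. The sketch below uses a hypothetical two-field record to show how one always-true check yields 50% on an empty EHR. The field names and both functions are illustrative, not the real completeness calculation.

```python
def completeness_buggy(ehr):
    """Old behaviour: the comorbidities check can never be False."""
    checks = [
        len(ehr["conditions"]) > 0,
        len(ehr["comorbidities"]) >= 0,  # always true: a length is never negative
    ]
    return 100 * sum(checks) // len(checks)


def completeness_fixed(ehr):
    """Fixed behaviour: a field only counts when it holds at least one item."""
    checks = [
        len(ehr["conditions"]) > 0,
        len(ehr["comorbidities"]) > 0,
    ]
    return 100 * sum(checks) // len(checks)


empty_ehr = {"conditions": [], "comorbidities": []}
print(completeness_buggy(empty_ehr))  # 50
print(completeness_fixed(empty_ehr))  # 0
```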
Alembic migration conflicts¶
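alembic current        # Show the revision the database is currently on
alembic heads          # List head revisions; more than one means branches must be merged
alembic stamp head     # If stuck: mark the database as at head WITHOUT running migrations (only when the schema already matches)
alembic upgrade head   # Apply any pending migrations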
Agent Issues¶
Duplicate assistant messages¶
Cause: The QStash callback raced the synchronous chat path, so the assistant message was inserted twice. Fix: removed the async message insertion.