v6.2-flexible Canary Rollout Runbook
Operational guide for flipping `prompt_arch_v6_2_flexible` ON for a single tenant, followed by a 24h soak. Code shipped 2026-06-03 (#1333); the rollout is a manual Flagsmith step gated on Dr. Naidu's clinical sign-off (#1376) and the post-dispatch wiring (#1398 / PR #1449, shipped 2026-06-19).
Prerequisites (all must be ✅ before flip)
| Gate | State | Owner | Issue |
|---|---|---|---|
| v6.2-flexible code in main | ✅ shipped 2026-06-03 | eng | #1333 |
| `prompt_version` metadata stamping + backfill | ✅ shipped + 47 rows backfilled | eng | #1443/#1444 (PR #1445) |
| FORMATTING strengthening (bold adherence) | ✅ shipped 2026-06-19 | eng | #1441 (PR #1447) |
| Voice-rule glad-reaching-out catch | ✅ shipped 2026-06-19 | eng | #1442 (PR #1446) |
| Post-dispatch matching + forward wiring | ✅ shipped 2026-06-19 | eng | #1398 (PR #1449) |
| Dr. Naidu clinical sign-off — Category A items | ⏸ pending | SD/Naidu | #1376 |
| Backfill audit for `prompt_arch_v6_2_flexible` flag config in Flagsmith | ⏸ pending | SD | this runbook |
Pre-flight checks (T-1h before flip)
# 1. Confirm latest Cloud Run revision carries the post-dispatch wiring (#1398)
gcloud run revisions list --service=curaway-backend \
  --project=curaway-dev --region=asia-south1 --limit=3
# Expect: most recent revision is ACTIVE and the image SHA matches main HEAD.
# 2. Audit the Flagsmith flag state — must be OFF in BOTH envs pre-flip
# (per [[feedback_flagsmith_dual_env]] — both Production AND Development)
# Production: https://flagsmith.curaway.ai/...
# Development: ditto, ensure dual-env consistency
# 3. Spot-check a v6.2_flexible message in prod via the rename-aware filter
export DATABASE_URL_ADMIN="$(gcloud secrets versions access latest \
  --secret=DATABASE_URL_ADMIN --project=curaway-dev)"
python -c "
import os, asyncio, asyncpg

async def main():
    # asyncpg expects a plain postgresql:// DSN, so strip the SQLAlchemy driver suffix
    conn = await asyncpg.connect(os.environ['DATABASE_URL_ADMIN'].replace('postgresql+asyncpg://','postgresql://'))
    try:
        n = await conn.fetchval(\"SELECT COUNT(*) FROM messages WHERE role='assistant' AND metadata->>'prompt_version'='v6.2_flexible' AND created_at > NOW() - INTERVAL '24 hours'\")
        print(f'v6.2_flexible messages last 24h (canonical filter): {n}')
    finally:
        await conn.close()

asyncio.run(main())
"
# 4. Confirm Sentry events are arriving (catches the SDK-init regression
# orchestrator flagged earlier)
# Check: any unresolved Sentry issue created in last 24h on prod environment.
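The flag audit in step 2 is a manual UI check. As a rough sketch, it could also be scripted against the Flagsmith flags endpoint. Everything here is an assumption for illustration: that the self-hosted instance serves the standard `/flags/` route under an API base you supply (the runbook's URL is truncated, so no base path is hardcoded), that the payload is a list of flag states with `feature.name` and `enabled`, and the helper names themselves. The environment-level endpoint reports the environment default only; a segment override for the canary tenant would likely need an identity-scoped lookup instead.

```python
import json
import urllib.request

FLAG_NAME = "prompt_arch_v6_2_flexible"


def fetch_flags(api_base: str, environment_key: str) -> list:
    """GET <api_base>/flags/ with the environment key header (not called at import)."""
    req = urllib.request.Request(
        f"{api_base.rstrip('/')}/flags/",
        headers={"X-Environment-Key": environment_key},
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.load(resp)


def flag_enabled(flags: list, name: str = FLAG_NAME) -> bool:
    """Return the environment-level `enabled` value; raise if the flag is absent."""
    for state in flags:
        if state.get("feature", {}).get("name") == name:
            return bool(state.get("enabled"))
    raise KeyError(f"flag {name!r} not found in environment")


def envs_not_off(flags_by_env: dict, name: str = FLAG_NAME) -> list:
    """Environments where the flag is already ON; should be empty before the flip."""
    return [env for env, flags in flags_by_env.items() if flag_enabled(flags, name)]
```

Feeding `envs_not_off` the payloads fetched for Production and Development gives a single pre-flip assertion: an empty list means both envs are OFF.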
Flip steps (T+0, flip moment)
- Pick the canary tenant. Typical first canary: a single internal/sister test tenant + Sindhu (per prior session notes), NOT the full multi-tenant fleet. The default canary tenant ID lives in CLAUDE.md under "Test-Bench Hygiene".
- Flip the Flagsmith flag for both envs.
  - Flag: `prompt_arch_v6_2_flexible`
  - Production env: OFF → ON (segment: `tenant-id = <canary-tenant-id>`)
  - Development env: OFF → ON (same segment)

  Per [[feedback_flagsmith_dual_env]], both envs flip together so the developer's local always matches prod for the canary tenant.
- Smoke-test 1 conversation immediately. Open app.curaway.ai, sign in as a canary-tenant test patient, start a fresh conversation, and send the opening message ("I need a knee replacement" or similar). Verify:
  - The first-turn assistant response arrives without error.
  - The response contains bold markdown on key clinical nouns (per #1441 FORMATTING strengthening; was 54.5%, target >90% post-strengthen).
  - Metadata in the messages table for the new turn carries `prompt_arch="v6.2_flexible"` AND `prompt_version="v6.2_flexible"` (canonical underscore form; see #1443).
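The three smoke-test checks above can be expressed as one small function over the new turn's content and metadata. This is a minimal sketch, not the project's own tooling: `smoke_check` is a hypothetical helper, and the bold regex is deliberately stricter than the `position('**' in content)` test used in the health-check SQL, since it requires a closed `**...**` span.

```python
import re

EXPECTED = "v6.2_flexible"  # canonical underscore form
BOLD_RE = re.compile(r"\*\*[^*\n]+\*\*")  # a closed **bold** span on one line


def smoke_check(content: str, metadata: dict) -> list:
    """Return a list of human-readable problems; an empty list means the turn passes."""
    problems = []
    if not content.strip():
        problems.append("empty assistant response")
    if not BOLD_RE.search(content):
        problems.append("no **bold** span in response")
    for key in ("prompt_arch", "prompt_version"):
        if metadata.get(key) != EXPECTED:
            problems.append(f"{key}={metadata.get(key)!r}, expected {EXPECTED!r}")
    return problems
```

Returning the full list of problems rather than stopping at the first one keeps the output useful when a turn is missing both the bold span and the metadata stamp.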
24h soak observability targets (T+0 → T+24h)
| Surface | Target | Where to watch |
|---|---|---|
| Bold-formatting adherence | >90% of v6.2 assistant turns carry `**bold**` | DB query at end of runbook |
| Voice-rule violations | Zero "I'm glad you reach(ed/ing) out" hits in `messages.content` | DB query at end |
| Sentry error rate | No SEV-1 / SEV-2 from `v6_2_*` modules | Sentry UI |
| Telegram `v6_2.forwarded:*` alerts | 1 per forwarded case (no silent dedup-suppression; per #1398 PR #1449 round-2 fix) | Telegram |
| `workflow_state.matching_complete` flip rate | Climbs as v6.2 turns pass intake_complete + procedure_identified | DB query at end |
| `workflow_state.forwarded` flip rate | Climbs as v6.2 cases get consent_given + matching_complete | DB query at end |
| Stuck-in-SKIP cases (per #1452 followup) | No case re-hits ValueError-skip for >5 consecutive turns | Log scan |
| Langfuse trace filter | `prompt_version:v6.2_flexible` returns the live + backfilled cohort | Langfuse UI |
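The stuck-in-SKIP row is the only target without a query, so a sketch of the log scan may help. It assumes the logs have already been parsed into `(case_id, skipped)` pairs in chronological order; the real log format is not shown in this runbook, and `stuck_cases` is a hypothetical name. A case is reported only when its longest run of skipped turns exceeds the limit, which matches the ">5 consecutive turns" wording.

```python
from collections import defaultdict


def stuck_cases(events, max_consecutive=5):
    """events: iterable of (case_id, skipped) tuples in chronological order.

    Returns the sorted case IDs whose longest streak of skipped turns
    is strictly greater than max_consecutive.
    """
    current = defaultdict(int)  # running skip streak per case
    worst = defaultdict(int)    # longest skip streak seen per case
    for case_id, skipped in events:
        # A non-skipped turn resets the streak for that case only.
        current[case_id] = current[case_id] + 1 if skipped else 0
        worst[case_id] = max(worst[case_id], current[case_id])
    return sorted(case for case, streak in worst.items() if streak > max_consecutive)
```

Tracking streaks per case means interleaved turns from other cases do not reset or inflate a case's count.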
Health-check SQL (run periodically during soak)
# Save as scripts/v6_2_canary_health.py and run locally with DATABASE_URL_ADMIN set:
# export DATABASE_URL_ADMIN=$(gcloud secrets versions access latest --secret=DATABASE_URL_ADMIN --project=curaway-dev)
import asyncio
import os

import asyncpg


async def main():
    # asyncpg expects a plain postgresql:// DSN, so strip the SQLAlchemy driver suffix.
    url = os.environ['DATABASE_URL_ADMIN'].replace('postgresql+asyncpg://', 'postgresql://')
    conn = await asyncpg.connect(url)
    try:
        # 1. Bold-formatting adherence rate (last 24h, v6.2_flexible only)
        r = await conn.fetchrow("""
            SELECT
                COUNT(*) AS n,
                COUNT(*) FILTER (WHERE position('**' in content) > 0) AS bold,
                ROUND(100.0 * COUNT(*) FILTER (WHERE position('**' in content) > 0) / NULLIF(COUNT(*),0), 1) AS pct
            FROM messages
            WHERE role='assistant'
              AND metadata->>'prompt_version'='v6.2_flexible'
              AND created_at > NOW() - INTERVAL '24 hours'
        """)
        print(f"Bold adherence (24h): {r['bold']}/{r['n']} = {r['pct']}% (target >90%)")

        # 2. Voice-rule violations
        n_glad = await conn.fetchval("""
            SELECT COUNT(*) FROM messages
            WHERE role='assistant'
              AND metadata->>'prompt_version'='v6.2_flexible'
              AND created_at > NOW() - INTERVAL '24 hours'
              AND (content ILIKE '%glad you%reach%')
        """)
        print(f"'glad you reach...' opener hits (24h): {n_glad} (target 0)")

        # 3. matching_complete + forwarded flip counts
        #    (the ? operator tests for key presence in workflow_state, not the key's value)
        flip_rows = await conn.fetch("""
            SELECT
                (workflow_state ? 'matching_complete') AS has_match,
                (workflow_state ? 'forwarded') AS has_fwd,
                COUNT(*) AS n
            FROM cases
            WHERE workflow_state->>'prompt_arch'='v6.2_flexible'
              AND created_at > NOW() - INTERVAL '24 hours'
            GROUP BY 1, 2
        """)
        print("v6.2_flexible cases last 24h:")
        for r in flip_rows:
            print(f"  matching_complete={r['has_match']} forwarded={r['has_fwd']}: {r['n']}")
    finally:
        # Always release the admin connection, even if a query fails.
        await conn.close()


asyncio.run(main())
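The `ILIKE '%glad you%reach%'` filter in query 2 is intentionally broad: it matches any turn where "glad you" is followed anywhere later by "reach". When the count is non-zero, a tighter pattern can help separate true openers from incidental matches. The sketch below is illustrative only; the regex is an assumption about what the voice rule in #1442 targets, not a copy of it, and `classify` is a hypothetical helper.

```python
import re

# Assumed shape of the banned opener: "glad (that) you('ve|'re| have| are) reach(ed|ing) out".
STRICT_RE = re.compile(
    r"\bglad\s+(?:that\s+)?you(?:['’](?:ve|re)|\s+(?:have|are))?\s+reach(?:ed|ing)?\s+out\b",
    re.IGNORECASE,
)


def ilike_glad_reach(content: str) -> bool:
    """Python mirror of: content ILIKE '%glad you%reach%'."""
    lowered = content.lower()
    start = lowered.find("glad you")
    # The first "glad you" leaves the longest tail, so checking it is sufficient.
    return start != -1 and "reach" in lowered[start + len("glad you"):]


def classify(content: str) -> str:
    """'violation' for the strict phrase, 'review' for a broad-only hit, else 'clean'."""
    if STRICT_RE.search(content):
        return "violation"
    if ilike_glad_reach(content):
        return "review"
    return "clean"
```

Rows classified as `review` are the ones worth reading by hand before deciding whether the zero-hit target was actually missed.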
Rollback path
If anything in the soak targets table above trips RED during the soak:
- Flip `prompt_arch_v6_2_flexible` OFF immediately for the canary tenant (Flagsmith, both envs).
- Capture the failing case_id(s), a short SQL snapshot of the workflow_state, and the failing assistant turn content for the post-mortem.
- Re-run the health-check SQL to confirm the metric recovers within 5 minutes (Flagsmith propagation is ~30s; a full recovery confirms that the flip, not a deploy regression, was the cause).
- File a post-mortem GH issue linked from #1376 and this runbook, tagged `v6.2-canary-rollback`.
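For the capture step above, one option is to bundle the evidence into a single JSON document that can be pasted into the post-mortem issue. This is a minimal sketch under stated assumptions: `build_snapshot` and its field names are hypothetical, and the caller is expected to have already read the case's workflow_state and the failing turn from the database.

```python
import json
from datetime import datetime, timezone


def build_snapshot(case_id, workflow_state, failing_turn, reason, now=None):
    """Serialize rollback evidence for one case as pretty-printed JSON.

    failing_turn is a dict with at least a 'content' key and optionally 'metadata'.
    """
    captured = (now or datetime.now(timezone.utc)).isoformat()
    return json.dumps(
        {
            "case_id": str(case_id),
            "reason": reason,
            "captured_at": captured,
            "workflow_state": workflow_state,
            "failing_turn": {
                "content": failing_turn["content"],
                "metadata": failing_turn.get("metadata", {}),
            },
        },
        indent=2,
        sort_keys=True,
        default=str,  # UUIDs, datetimes, and similar values fall back to str()
    )
```

Capturing the snapshot before the flag is flipped back preserves the state exactly as the failing turn saw it.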
Post-soak expansion (T+24h)
If all metrics stay GREEN for 24h:
- Re-measure bold adherence at T+24h and T+72h vs the 54.5% pre-fix baseline. If >90% sustained, declare #1441 closed.
- Expand canary to 2-3 additional tenants (Flagsmith segment widen). Soak another 24h.
- Full rollout — flip the flag default ON in both envs.
- File the v6.2 rollout-complete summary as a comment on #1376 (per the project_v6_2_flexible_complete memory).
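The "sustained" condition for closing #1441 can be written down as a small gate, which makes it explicit that a missing T+72h measurement does not count as a pass. This is a sketch only; `adherence_sustained` and the checkpoint labels are assumptions for illustration, while the 54.5% baseline and >90% target come from this runbook.

```python
BASELINE_PCT = 54.5  # pre-fix bold adherence quoted in this runbook
TARGET_PCT = 90.0    # the ">90%" target


def adherence_sustained(samples: dict, checkpoints=("T+24h", "T+72h"), target=TARGET_PCT) -> bool:
    """True only if every checkpoint was measured and is strictly above target."""
    return all(
        samples.get(cp) is not None and samples[cp] > target
        for cp in checkpoints
    )


def uplift_vs_baseline(pct: float, baseline=BASELINE_PCT) -> float:
    """Percentage-point improvement over the pre-fix baseline."""
    return round(pct - baseline, 1)
```

The strict comparison mirrors the ">90%" wording, so a reading of exactly 90.0 does not pass.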
Related
- [[project_v6_2_flexible_complete]] — v6.2 code shipping summary
- [[feedback_flagsmith_dual_env]] — dual-env flip discipline
- [[feedback_yaml_default_vs_flagsmith_live_drift]] — YAML default ≠ live; Flagsmith stored value wins
- #1376: Dr. Naidu clinical sign-off (the human gate)
- #1398: v6.2 handoff bridge (Phase 3 matching + Phase 7 forward shipped 2026-06-19)
- #1441: Bold-formatting adherence (shipped 2026-06-19; T+24h re-measure pending)