v6.2-flexible Canary Rollout Runbook¶

Operational guide for flipping prompt_arch_v6_2_flexible ON for a single tenant + 24h soak. Code shipped 2026-06-03 (#1333); the rollout is a manual Flagsmith step gated on Dr. Naidu's clinical sign-off (#1376) + the post-dispatch wiring (#1398 / PR #1449 — shipped 2026-06-19).

Prerequisites (all must be ✅ before flip)¶

Gate	State	Owner	Issue
v6.2-flexible code in main	✅ shipped 2026-06-03	eng	#1333
`prompt_version` metadata stamping + backfill	✅ shipped + 47 rows backfilled	eng	#1443/#1444 (PR #1445)
FORMATTING strengthening (bold adherence)	✅ shipped 2026-06-19	eng	#1441 (PR #1447)
Voice-rule glad-reaching-out catch	✅ shipped 2026-06-19	eng	#1442 (PR #1446)
Post-dispatch matching + forward wiring	✅ shipped 2026-06-19	eng	#1398 (PR #1449)
Dr. Naidu clinical sign-off — Category A items	⏸ pending	SD/Naidu	#1376
Backfill audit for `prompt_arch_v6_2_flexible` flag config in Flagsmith	⏸ pending	SD	this runbook

Pre-flight checks (T-1h before flip)¶

# 1. Confirm latest Cloud Run revision carries the post-dispatch wiring (#1398)
gcloud run revisions list --service=curaway-backend \
  --project=curaway-dev --region=asia-south1 --limit=3
# Expect: most recent revision is ACTIVE and the image SHA matches main HEAD.

# 2. Audit the Flagsmith flag state — must be OFF in BOTH envs pre-flip
# (per [[feedback_flagsmith_dual_env]] — both Production AND Development)
# Production: https://flagsmith.curaway.ai/...
# Development: ditto, ensure dual-env consistency

# 3. Spot-check a v6.2_flexible message in prod via the rename-aware filter
export DATABASE_URL_ADMIN="$(gcloud secrets versions access latest \
  --secret=DATABASE_URL_ADMIN --project=curaway-dev)"
python -c "
import os, asyncio, asyncpg
async def main():
    conn = await asyncpg.connect(os.environ['DATABASE_URL_ADMIN'].replace('postgresql+asyncpg://','postgresql://'))
    n = await conn.fetchval(\"SELECT COUNT(*) FROM messages WHERE role='assistant' AND metadata->>'prompt_version'='v6.2_flexible' AND created_at > NOW() - INTERVAL '24 hours'\")
    print(f'v6.2_flexible messages last 24h (canonical filter): {n}')
asyncio.run(main())
"

# 4. Confirm Sentry events are arriving (catches the SDK-init regression
# orchestrator flagged earlier)
# Check: any unresolved Sentry issue created in last 24h on prod environment.

Flip steps (T+0 — flip moment)¶

Pick the canary tenant. Typical first canary: a single internal/sister test tenant + Sindhu (per prior session notes), NOT the full multi-tenant fleet. Default canary tenant ID lives in CLAUDE.md under "Test-Bench Hygiene".
Flip the Flagsmith flag for both envs.
```
Flag: prompt_arch_v6_2_flexible
Production env: OFF → ON (segment: tenant-id = <canary-tenant-id>)
Development env: OFF → ON (same segment)
```
Per [[feedback_flagsmith_dual_env]] — both envs flip together so the developer's local always matches prod for the canary tenant.
Smoke-test 1 conversation immediately. Open app.curaway.ai, sign in as a canary-tenant test patient, start a fresh conversation, send the opening message ("I need a knee replacement" or similar). Verify:
First turn assistant response arrives without error
Response contains bold markdown on key clinical nouns (per #1441 FORMATTING strengthening — was 54.5%; target >90% post-strengthen)
Metadata in the messages table for the new turn carries prompt_arch="v6.2_flexible" AND prompt_version="v6.2_flexible" (canonical underscore form — see #1443)

24h soak observability targets (T+0 → T+24h)¶

Surface	Target	Where to watch
Bold-formatting adherence	>90% of v6.2 assistant turns carry `bold`	DB query at end of runbook
Voice-rule violations	Zero "I'm glad you reach(ed/ing) out" hits in `messages.content`	DB query at end
Sentry error rate	No SEV-1 / SEV-2 from `v6_2_*` modules	Sentry UI
*Telegram `v6_2.forwarded:` alerts**	1 per forwarded case (no silent dedup-suppression — per #1398 PR #1449 round-2 fix)	Telegram
`workflow_state.matching_complete` flip rate	Climbs as v6.2 turns pass intake_complete + procedure_identified	DB query at end
`workflow_state.forwarded` flip rate	Climbs as v6.2 cases get consent_given + matching_complete	DB query at end
Stuck-in-SKIP cases (per #1452 followup)	No case re-hits ValueError-skip for >5 consecutive turns	Log scan
Langfuse trace filter	`prompt_version:v6.2_flexible` returns the live + backfilled cohort	Langfuse UI

Health-check SQL (run periodically during soak)¶

# Save as scripts/v6_2_canary_health.py — run locally with DATABASE_URL_ADMIN set
# export DATABASE_URL_ADMIN=$(gcloud secrets versions access latest --secret=DATABASE_URL_ADMIN --project=curaway-dev)
import os, asyncio, asyncpg
async def main():
    url = os.environ['DATABASE_URL_ADMIN'].replace('postgresql+asyncpg://','postgresql://')
    conn = await asyncpg.connect(url)

    # 1. Bold-formatting adherence rate (last 24h, v6.2_flexible only)
    r = await conn.fetchrow("""
        SELECT
          COUNT(*) AS n,
          COUNT(*) FILTER (WHERE position('**' in content) > 0) AS bold,
          ROUND(100.0 * COUNT(*) FILTER (WHERE position('**' in content) > 0) / NULLIF(COUNT(*),0), 1) AS pct
        FROM messages
        WHERE role='assistant'
          AND metadata->>'prompt_version'='v6.2_flexible'
          AND created_at > NOW() - INTERVAL '24 hours'
    """)
    print(f"Bold adherence (24h): {r['bold']}/{r['n']} = {r['pct']}% — target >90%")

    # 2. Voice-rule violations
    n_glad = await conn.fetchval("""
        SELECT COUNT(*) FROM messages
        WHERE role='assistant'
          AND metadata->>'prompt_version'='v6.2_flexible'
          AND created_at > NOW() - INTERVAL '24 hours'
          AND (content ILIKE '%glad you%reach%')
    """)
    print(f"'glad you reach...' opener hits (24h): {n_glad} — target 0")

    # 3. matching_complete + forwarded flip counts
    flip_rows = await conn.fetch("""
        SELECT
          (workflow_state ? 'matching_complete') AS has_match,
          (workflow_state ? 'forwarded') AS has_fwd,
          COUNT(*) AS n
        FROM cases
        WHERE workflow_state->>'prompt_arch'='v6.2_flexible'
          AND created_at > NOW() - INTERVAL '24 hours'
        GROUP BY 1, 2
    """)
    print("v6.2_flexible cases last 24h:")
    for r in flip_rows:
        print(f"  matching_complete={r['has_match']} forwarded={r['has_fwd']}: {r['n']}")

    await conn.close()

asyncio.run(main())

Rollback path¶

If anything in the table above trips RED during the soak:

Flip prompt_arch_v6_2_flexible OFF immediately for the canary tenant (Flagsmith — both envs).
Capture the failing case_id(s) + a short SQL snapshot of the workflow_state + the failing assistant turn content for post-mortem.
Re-run the health-check SQL to confirm the metric recovers within 5 minutes (Flagsmith propagation is ~30s; full recovery confirms the flip not a deploy regression).
File a post-mortem GH issue linked from #1376 + this runbook, tagged v6.2-canary-rollback.

Post-soak expansion (T+24h)¶

If all metrics stay GREEN for 24h:

Re-measure bold adherence at T+24h and T+72h vs the 54.5% pre-fix baseline. If >90% sustained, declare #1441 closed.
Expand canary to 2-3 additional tenants (Flagsmith segment widen). Soak another 24h.
Full rollout — flip the flag default ON in both envs.
File the v6.2 rollout-complete summary as a comment on #1376 (per the project_v6_2_flexible_complete memory).

[[project_v6_2_flexible_complete]] — v6.2 code shipping summary
[[feedback_flagsmith_dual_env]] — dual-env flip discipline
[[feedback_yaml_default_vs_flagsmith_live_drift]] — YAML default ≠ live; Flagsmith stored value wins
1376 — Dr. Naidu clinical sign-off (the human gate)¶
1398 — v6.2 handoff bridge (Phase 3 matching + Phase 7 forward shipped 2026-06-19)¶
1441 — Bold-formatting adherence (shipped 2026-06-19; T+24h re-measure pending)¶

v6.2-flexible Canary Rollout Runbook¶

Prerequisites (all must be ✅ before flip)¶

Pre-flight checks (T-1h before flip)¶

Flip steps (T+0 — flip moment)¶

24h soak observability targets (T+0 → T+24h)¶

Health-check SQL (run periodically during soak)¶

Rollback path¶

Post-soak expansion (T+24h)¶

Related¶

1376 — Dr. Naidu clinical sign-off (the human gate)¶

1398 — v6.2 handoff bridge (Phase 3 matching + Phase 7 forward shipped 2026-06-19)¶

1441 — Bold-formatting adherence (shipped 2026-06-19; T+24h re-measure pending)¶