
v6.2-flexible Canary Rollout Runbook

Operational guide for flipping prompt_arch_v6_2_flexible ON for a single tenant + 24h soak. Code shipped 2026-06-03 (#1333); the rollout is a manual Flagsmith step gated on Dr. Naidu's clinical sign-off (#1376) + the post-dispatch wiring (#1398 / PR #1449 — shipped 2026-06-19).

Prerequisites (all must be ✅ before flip)

| Gate | State | Owner | Issue |
|---|---|---|---|
| v6.2-flexible code in main | ✅ shipped 2026-06-03 | eng | #1333 |
| prompt_version metadata stamping + backfill | ✅ shipped + 47 rows backfilled | eng | #1443/#1444 (PR #1445) |
| FORMATTING strengthening (bold adherence) | ✅ shipped 2026-06-19 | eng | #1441 (PR #1447) |
| Voice-rule glad-reaching-out catch | ✅ shipped 2026-06-19 | eng | #1442 (PR #1446) |
| Post-dispatch matching + forward wiring | ✅ shipped 2026-06-19 | eng | #1398 (PR #1449) |
| Dr. Naidu clinical sign-off (Category A items) | ⏸ pending | SD/Naidu | #1376 |
| Backfill audit for prompt_arch_v6_2_flexible flag config in Flagsmith | ⏸ pending | SD | this runbook |
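The "all must be ✅" rule is mechanical enough to script as a last check before the flip. A minimal sketch, assuming the gate rows are transcribed by hand from the table (the `PREREQUISITES` list and the `pending_gates` helper are hypothetical, not part of the codebase):

```python
from typing import List, Tuple

# Hypothetical hand transcription of the Prerequisites table: (gate, state) pairs.
PREREQUISITES: List[Tuple[str, str]] = [
    ("v6.2-flexible code in main", "✅"),
    ("prompt_version metadata stamping + backfill", "✅"),
    ("FORMATTING strengthening (bold adherence)", "✅"),
    ("Voice-rule glad-reaching-out catch", "✅"),
    ("Post-dispatch matching + forward wiring", "✅"),
    ("Dr. Naidu clinical sign-off (Category A items)", "⏸"),
    ("Backfill audit for prompt_arch_v6_2_flexible flag config in Flagsmith", "⏸"),
]


def pending_gates(rows: List[Tuple[str, str]]) -> List[str]:
    """Return every gate not yet marked ✅; the flip is a NO-GO unless this is empty."""
    return [gate for gate, state in rows if state != "✅"]


if __name__ == "__main__":
    blockers = pending_gates(PREREQUISITES)
    if blockers:
        print("NO-GO, pending gates:")
        for gate in blockers:
            print(f"  - {gate}")
    else:
        print("GO: all prerequisites are ✅")
```

Treating anything other than ✅ as blocking (rather than matching ⏸ specifically) means a typo or a new state such as ❌ also stops the flip.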

Pre-flight checks (T-1h before flip)

# 1. Confirm latest Cloud Run revision carries the post-dispatch wiring (#1398)
gcloud run revisions list --service=curaway-backend \
  --project=curaway-dev --region=asia-south1 --limit=3
# Expect: most recent revision is ACTIVE and the image SHA matches main HEAD.

# 2. Audit the Flagsmith flag state — must be OFF in BOTH envs pre-flip
# (per [[feedback_flagsmith_dual_env]] — both Production AND Development)
# Production: https://flagsmith.curaway.ai/...
# Development: ditto, ensure dual-env consistency

# 3. Spot-check a v6.2_flexible message in prod via the rename-aware filter
export DATABASE_URL_ADMIN="$(gcloud secrets versions access latest \
  --secret=DATABASE_URL_ADMIN --project=curaway-dev)"
python - <<'PY'
import os, asyncio, asyncpg

async def main():
    # asyncpg wants a plain postgresql:// DSN, not the SQLAlchemy "+asyncpg" form
    dsn = os.environ['DATABASE_URL_ADMIN'].replace('postgresql+asyncpg://', 'postgresql://')
    conn = await asyncpg.connect(dsn)
    try:
        n = await conn.fetchval("""
            SELECT COUNT(*) FROM messages
            WHERE role = 'assistant'
              AND metadata->>'prompt_version' = 'v6.2_flexible'
              AND created_at > NOW() - INTERVAL '24 hours'
        """)
    finally:
        await conn.close()
    print(f'v6.2_flexible messages last 24h (canonical filter): {n}')

asyncio.run(main())
PY

# 4. Confirm Sentry events are arriving (catches the SDK-init regression the
# orchestrator flagged earlier).
# Check: the prod environment in Sentry shows at least one issue or event from
# the last 24h; zero events suggests the SDK is not initialising.
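Check 2 above and the flip itself both depend on the two Flagsmith environments agreeing. A minimal sketch of a drift check, assuming the per-environment ON/OFF values for the canary segment have already been read out of Flagsmith (that step is deliberately left out; `dual_env_problems` is a hypothetical helper name):

```python
from typing import List, Mapping

ENVS = ("Production", "Development")


def dual_env_problems(states: Mapping[str, bool], expected: bool) -> List[str]:
    """Report every environment whose flag state differs from `expected`.

    `states` maps environment name to whether prompt_arch_v6_2_flexible is ON for
    the canary segment. A missing environment is reported rather than assumed OFF.
    """
    def label(on: bool) -> str:
        return "ON" if on else "OFF"

    problems = []
    for env in ENVS:
        if env not in states:
            problems.append(f"{env}: state not captured")
        elif states[env] != expected:
            problems.append(f"{env}: expected {label(expected)}, found {label(states[env])}")
    return problems
```

Call it with `expected=False` for the pre-flight audit and `expected=True` right after the flip; an empty list means both environments match the plan.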

Flip steps (T+0 — flip moment)

  1. Pick the canary tenant. The typical first canary is a single internal/sister test tenant plus Sindhu (per prior session notes), NOT the full multi-tenant fleet. The default canary tenant ID lives in CLAUDE.md under "Test-Bench Hygiene".

  2. Flip the Flagsmith flag for both envs.

    Flag: prompt_arch_v6_2_flexible
    Production env: OFF → ON (segment: tenant-id = <canary-tenant-id>)
    Development env: OFF → ON (same segment)
    
    Per [[feedback_flagsmith_dual_env]], both envs flip together so a developer's local environment always matches prod for the canary tenant.

  3. Smoke-test 1 conversation immediately. Open app.curaway.ai, sign in as a canary-tenant test patient, start a fresh conversation, send the opening message ("I need a knee replacement" or similar). Verify:

    • The first-turn assistant response arrives without error.
    • The response contains bold markdown on key clinical nouns (per the #1441 FORMATTING strengthening: bold adherence was 54.5% before the fix; the target is >90%).
    • The metadata in the messages table for the new turn carries prompt_arch="v6.2_flexible" AND prompt_version="v6.2_flexible" (canonical underscore form; see #1443).
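The three smoke-test checks can be applied to the new turn's `content` and `metadata` once they are read from the messages table. A minimal sketch (the helper and constant names are hypothetical); it also applies the voice-rule check from the soak targets, using a Python equivalent of the health-check script's `ILIKE '%glad you%reach%'` pattern:

```python
import re
from typing import Any, List, Mapping

CANONICAL_VERSION = "v6.2_flexible"
# Python equivalent of ILIKE '%glad you%reach%': case-insensitive, and DOTALL
# lets '.' span newlines the way '%' does in SQL.
GLAD_OPENER = re.compile(r"glad you.*reach", re.IGNORECASE | re.DOTALL)


def smoke_turn_problems(content: str, metadata: Mapping[str, Any]) -> List[str]:
    """Return everything wrong with one smoke-test assistant turn (empty list = pass)."""
    problems = []
    if not content.strip():
        problems.append("empty assistant response")
    # Same crude heuristic as the health-check SQL: any '**' counts as bold markdown.
    if "**" not in content:
        problems.append("no bold markdown")
    if GLAD_OPENER.search(content):
        problems.append("voice-rule violation: 'glad you ... reach' opener")
    for key in ("prompt_arch", "prompt_version"):
        value = metadata.get(key)
        if value != CANONICAL_VERSION:
            problems.append(f"metadata {key}={value!r}, expected {CANONICAL_VERSION!r}")
    return problems
```

The metadata comparison is exact on purpose, so a hyphenated `v6.2-flexible` stamp (the pre-#1443 form) shows up as a failure rather than slipping through.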

24h soak observability targets (T+0 → T+24h)

| Surface | Target | Where to watch |
|---|---|---|
| Bold-formatting adherence | >90% of v6.2 assistant turns carry `**bold**` | Health-check SQL (below) |
| Voice-rule violations | Zero "I'm glad you reach(ed/ing) out" hits in messages.content | Health-check SQL (below) |
| Sentry error rate | No SEV-1 / SEV-2 from `v6_2_*` modules | Sentry UI |
| Telegram `v6_2.forwarded:*` alerts | 1 per forwarded case (no silent dedup-suppression, per the #1398 / PR #1449 round-2 fix) | Telegram |
| workflow_state.matching_complete flip rate | Climbs as v6.2 turns pass intake_complete + procedure_identified | Health-check SQL (below) |
| workflow_state.forwarded flip rate | Climbs as v6.2 cases get consent_given + matching_complete | Health-check SQL (below) |
| Stuck-in-SKIP cases (per #1452 follow-up) | No case re-hits ValueError-skip for >5 consecutive turns | Log scan |
| Langfuse trace filter | prompt_version:v6.2_flexible returns the live + backfilled cohort | Langfuse UI |
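The threshold-style rows of the soak table reduce to a GREEN/RED decision once the numbers are collected. A minimal sketch, assuming the values are gathered by hand from the health-check script, Sentry, Telegram and the log scan (`SoakSnapshot` and `red_flags` are hypothetical names). The two "climbs" rows and the Langfuse row are trends left to human judgement, and this sketch treats "no v6.2 turns yet" as not GREEN:

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SoakSnapshot:
    """Hypothetical bundle of the threshold-style soak signals."""
    bold_pct: Optional[float]     # health-check query 1; None when there were no v6.2 turns
    glad_hits: int                # health-check query 2
    sev12_issues: int             # SEV-1/SEV-2 Sentry issues from v6_2_* modules
    telegram_forward_alerts: int  # v6_2.forwarded:* alerts seen in Telegram
    forwarded_cases: int          # cases whose workflow_state gained 'forwarded'
    max_consecutive_skips: int    # longest ValueError-skip run for any single case


def red_flags(s: SoakSnapshot) -> List[str]:
    """Return the soak targets currently RED; an empty list means GREEN."""
    flags = []
    if s.bold_pct is None:
        flags.append("no v6.2_flexible turns yet; bold adherence unmeasured")
    elif s.bold_pct <= 90:
        flags.append(f"bold adherence {s.bold_pct}% (target >90%)")
    if s.glad_hits > 0:
        flags.append(f"{s.glad_hits} 'glad you reach...' hits (target 0)")
    if s.sev12_issues > 0:
        flags.append(f"{s.sev12_issues} SEV-1/SEV-2 issues from v6_2_* modules")
    if s.telegram_forward_alerts != s.forwarded_cases:
        flags.append(
            f"{s.telegram_forward_alerts} forward alerts for {s.forwarded_cases} "
            "forwarded cases (expected exactly 1 per case)"
        )
    if s.max_consecutive_skips > 5:
        flags.append(f"a case hit ValueError-skip {s.max_consecutive_skips} turns in a row (limit 5)")
    return flags
```

Comparing alert and case counts directly assumes both are taken over the same window; a case forwarded seconds before the snapshot may briefly show as a mismatch.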

Health-check SQL (run periodically during soak)

# Save as scripts/v6_2_canary_health.py and run locally with DATABASE_URL_ADMIN set:
# export DATABASE_URL_ADMIN="$(gcloud secrets versions access latest --secret=DATABASE_URL_ADMIN --project=curaway-dev)"
import os, asyncio, asyncpg

async def main():
    # asyncpg wants a plain postgresql:// DSN, not the SQLAlchemy "+asyncpg" form
    url = os.environ['DATABASE_URL_ADMIN'].replace('postgresql+asyncpg://','postgresql://')
    conn = await asyncpg.connect(url)
    try:
        # 1. Bold-formatting adherence rate (last 24h, v6.2_flexible only)
        r = await conn.fetchrow("""
            SELECT
              COUNT(*) AS n,
              COUNT(*) FILTER (WHERE position('**' in content) > 0) AS bold,
              ROUND(100.0 * COUNT(*) FILTER (WHERE position('**' in content) > 0) / NULLIF(COUNT(*),0), 1) AS pct
            FROM messages
            WHERE role='assistant'
              AND metadata->>'prompt_version'='v6.2_flexible'
              AND created_at > NOW() - INTERVAL '24 hours'
        """)
        # pct is NULL when there were no v6.2 turns (NULLIF avoids division by zero)
        pct_text = f"{r['pct']}%" if r['pct'] is not None else "n/a (no v6.2 turns)"
        print(f"Bold adherence (24h): {r['bold']}/{r['n']} = {pct_text} (target >90%)")

        # 2. Voice-rule violations ("I'm glad you reached out" and variants)
        n_glad = await conn.fetchval("""
            SELECT COUNT(*) FROM messages
            WHERE role='assistant'
              AND metadata->>'prompt_version'='v6.2_flexible'
              AND created_at > NOW() - INTERVAL '24 hours'
              AND content ILIKE '%glad you%reach%'
        """)
        print(f"'glad you reach...' opener hits (24h): {n_glad} (target 0)")

        # 3. matching_complete + forwarded flip counts.
        # jsonb `?` tests key presence only; if these keys can exist with a false
        # value, compare the value instead.
        flip_rows = await conn.fetch("""
            SELECT
              (workflow_state ? 'matching_complete') AS has_match,
              (workflow_state ? 'forwarded') AS has_fwd,
              COUNT(*) AS n
            FROM cases
            WHERE workflow_state->>'prompt_arch'='v6.2_flexible'
              AND created_at > NOW() - INTERVAL '24 hours'
            GROUP BY 1, 2
        """)
        print("v6.2_flexible cases last 24h:")
        for row in flip_rows:
            print(f"  matching_complete={row['has_match']} forwarded={row['has_fwd']}: {row['n']}")
    finally:
        await conn.close()

if __name__ == "__main__":
    asyncio.run(main())
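The health-check script does not cover the stuck-in-SKIP row of the soak table, which is watched via log scan. A minimal sketch of the detection logic, assuming the log lines have already been reduced to chronological `(case_id, skipped)` pairs, where `skipped` is True when a turn hit the ValueError-skip path (the extraction step and the `stuck_in_skip` name are hypothetical):

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def stuck_in_skip(turns: Iterable[Tuple[str, bool]], limit: int = 5) -> Dict[str, int]:
    """Map case_id -> longest run of consecutive skipped turns, for runs longer than `limit`."""
    current: Dict[str, int] = defaultdict(int)
    longest: Dict[str, int] = defaultdict(int)
    for case_id, skipped in turns:
        # A non-skipped turn resets the run. Runs are tracked per case, so
        # interleaved turns from different cases do not interfere.
        current[case_id] = current[case_id] + 1 if skipped else 0
        longest[case_id] = max(longest[case_id], current[case_id])
    return {case_id: run for case_id, run in longest.items() if run > limit}
```

Any case in the result breaches the ">5 consecutive turns" target; the default `limit=5` mirrors that threshold.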

Rollback path

If any target in the soak observability table trips RED during the soak:

  1. Flip prompt_arch_v6_2_flexible OFF immediately for the canary tenant (Flagsmith — both envs).
  2. Capture the failing case_id(s) + a short SQL snapshot of the workflow_state + the failing assistant turn content for post-mortem.
  3. Re-run the health-check SQL to confirm the metric recovers within 5 minutes (Flagsmith propagation takes ~30s; a full recovery confirms the problem came from the flag rather than from a deploy regression).
  4. File a post-mortem GH issue linked from #1376 + this runbook, tagged v6.2-canary-rollback.
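Because the health-check queries aggregate over a 24h window, a more direct sign that the OFF flip took effect is that no new v6.2_flexible assistant turns are written once Flagsmith has propagated. A minimal sketch, assuming the `created_at` values of v6.2_flexible assistant turns since the flip are pulled with a variant of the pre-flight spot-check query (the helper name is hypothetical):

```python
from datetime import datetime, timedelta
from typing import Iterable, List


def turns_after_rollback(turn_times: Iterable[datetime], flipped_off_at: datetime,
                         propagation: timedelta = timedelta(seconds=30)) -> List[datetime]:
    """Return v6.2_flexible turn timestamps written after the flag should have propagated.

    A non-empty result means the canary tenant was still served v6.2 after the OFF
    flip plus the ~30s Flagsmith propagation allowance.
    """
    cutoff = flipped_off_at + propagation
    return sorted(t for t in turn_times if t > cutoff)
```

Keep `turn_times` and `flipped_off_at` consistently timezone-aware (or consistently naive); Python refuses to compare the two kinds.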

Post-soak expansion (T+24h)

If all metrics stay GREEN for 24h:

  1. Re-measure bold adherence at T+24h and T+72h vs the 54.5% pre-fix baseline. If >90% sustained, declare #1441 closed.
  2. Expand the canary to 2-3 additional tenants by widening the Flagsmith segment. Soak another 24h.
  3. Full rollout — flip the flag default ON in both envs.
  4. File the v6.2 rollout-complete summary as a comment on #1376 (per the project_v6_2_flexible_complete memory).
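Step 1's close-out rule (>90% at both T+24h and T+72h, compared against the 54.5% pre-fix baseline) fits in a few lines. A sketch with hypothetical helper names; the percentages would come from the bold-adherence query in the health-check script:

```python
from typing import List, Mapping

BASELINE_PCT = 54.5  # pre-fix bold adherence quoted for #1441
TARGET_PCT = 90.0
CHECKPOINTS = ("T+24h", "T+72h")


def bold_fix_sustained(measurements: Mapping[str, float]) -> bool:
    """True only when every checkpoint has been measured and clears the >90% target."""
    return all(cp in measurements and measurements[cp] > TARGET_PCT for cp in CHECKPOINTS)


def bold_report(measurements: Mapping[str, float]) -> List[str]:
    """One line per checkpoint, showing the change against the 54.5% baseline."""
    lines = []
    for cp in CHECKPOINTS:
        if cp not in measurements:
            lines.append(f"{cp}: not measured yet")
        else:
            pct = measurements[cp]
            lines.append(f"{cp}: {pct:.1f}% ({pct - BASELINE_PCT:+.1f} pts vs {BASELINE_PCT}% baseline)")
    return lines
```

Requiring both checkpoints keeps #1441 open after a single good day; a missing T+72h reading counts as "not sustained" rather than as a pass.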

Related

  • [[project_v6_2_flexible_complete]]: v6.2 code shipping summary
  • [[feedback_flagsmith_dual_env]]: dual-env flip discipline
  • [[feedback_yaml_default_vs_flagsmith_live_drift]]: YAML default ≠ live; the Flagsmith stored value wins
  • #1376: Dr. Naidu clinical sign-off (the human gate)
  • #1398: v6.2 handoff bridge (Phase 3 matching + Phase 7 forward shipped 2026-06-19)
  • #1441: Bold-formatting adherence (shipped 2026-06-19; T+24h re-measure pending)