
Spend Report Dashboard — Setup Runbook

The /landscape page has a Spend (30d) tab that aggregates real cost data from every paid (and free-tier) service in the stack. This doc walks through enabling each data source.

Live URL: https://api.curaway.ai/landscape → click "Spend (30d)" tab
Endpoint: GET /landscape/spend.json (returns the cached aggregator output)


Quick reference — environment variables

All optional. The dashboard works incrementally — set whichever keys you have and the rest will show as "No credentials" with a setup link.

Service                       | Env vars                                     | Notes
------------------------------|----------------------------------------------|--------------------------------
Langfuse                      | LANGFUSE_PUBLIC_KEY + LANGFUSE_SECRET_KEY    | Already set on Cloud Run
Anthropic (incl. Claude Code) | ANTHROPIC_ADMIN_KEY                          | Distinct from ANTHROPIC_API_KEY
OpenAI                        | OPENAI_ADMIN_KEY                             | Distinct from project keys
Cloudflare R2                 | CLOUDFLARE_API_TOKEN + CLOUDFLARE_ACCOUNT_ID | Token needs R2:Read
Upstash                       | UPSTASH_MGMT_API_KEY + UPSTASH_MGMT_EMAIL    | Management API

Railway + Vercel removed (GCP cutover, 2026-06). The backend now runs on Cloud Run and the frontend/docs on Firebase Hosting, so the per-service Railway and Vercel spend fetchers were deleted (PR #1494). GCP-native costs (Cloud Run, Firebase Hosting, Cloud SQL, Secret Manager) are tracked in the GCP Billing console for project curaway-dev, not in this dashboard yet — see the note at the end of this doc.

Setting a spend key on Cloud Run

Each key below is a secret. Store it in Secret Manager, then attach it to the curaway-backend Cloud Run service. The recipe is the same for every key — only the name changes:

# 1. Create (or add a new version to) the secret in Secret Manager.
printf '%s' 'sk-ant-admin-...' | gcloud secrets create ANTHROPIC_ADMIN_KEY \
  --project=curaway-dev --data-file=- \
  || printf '%s' 'sk-ant-admin-...' | gcloud secrets versions add ANTHROPIC_ADMIN_KEY \
       --project=curaway-dev --data-file=-

# 2. Attach it to the service (triggers a new revision).
gcloud run services update curaway-backend \
  --project=curaway-dev --region=asia-south1 \
  --update-secrets=ANTHROPIC_ADMIN_KEY=ANTHROPIC_ADMIN_KEY:latest

printf '%s' (not echo) avoids appending a trailing newline to the secret value. The sections below name the specific key for each service; the recipe is otherwise identical.


1. Langfuse (already configured)

No action needed. LANGFUSE_PUBLIC_KEY and LANGFUSE_SECRET_KEY are already set on Cloud Run. Langfuse is the authoritative source for LLM spend because every Claude/OpenAI call in the stack is auto-traced via the LangChain callback handler. The other LLM providers below are cross-checks.

To verify or rotate them: cloud.langfuse.com → Settings → API Keys.


2. Anthropic Admin Key (includes Claude Code)

The standard ANTHROPIC_API_KEY (which the backend uses to call Claude Haiku/Sonnet) does not have access to the org cost endpoints. You need a separate Admin Key.

  1. Visit https://console.anthropic.com/settings/admin-keys
  2. Click Create Admin Key
  3. Name it curaway-spend-report (or similar)
  4. Copy the key (starts with sk-ant-admin-...)
  5. Add ANTHROPIC_ADMIN_KEY to Cloud Run via Secret Manager (see Setting a spend key on Cloud Run).

What this unlocks:

  • 30-day daily cost breakdown for the Anthropic workspace
  • Claude Code spend, if Claude Code bills against the same workspace (it usually does; check console.anthropic.com → Billing → Usage to confirm)
  • API source: GET /v1/organizations/cost_report
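As a rough illustration of what a request against the cost endpoint could look like, here is a minimal sketch that only builds the HTTP request. The function name build_anthropic_cost_request is hypothetical, and the query parameter names (starting_at, ending_at) and the anthropic-version header value are assumptions to verify against the Anthropic Admin API reference; this is not the dashboard's actual fetcher.

```python
import urllib.parse
import urllib.request
from datetime import datetime, timedelta, timezone
from typing import Optional

ANTHROPIC_COST_URL = "https://api.anthropic.com/v1/organizations/cost_report"


def build_anthropic_cost_request(
    admin_key: str, days: int = 30, now: Optional[datetime] = None
) -> urllib.request.Request:
    """Build (but do not send) a GET request for the last `days` days of cost."""
    now = now or datetime.now(timezone.utc)
    # Start the window at midnight UTC so daily buckets line up with calendar days.
    start = (now - timedelta(days=days)).replace(
        hour=0, minute=0, second=0, microsecond=0
    )
    # Assumed parameter names; check them against the provider documentation.
    query = urllib.parse.urlencode(
        {
            "starting_at": start.strftime("%Y-%m-%dT%H:%M:%SZ"),
            "ending_at": now.strftime("%Y-%m-%dT%H:%M:%SZ"),
        }
    )
    return urllib.request.Request(
        f"{ANTHROPIC_COST_URL}?{query}",
        # The Admin key goes in x-api-key, not the regular ANTHROPIC_API_KEY.
        headers={"x-api-key": admin_key, "anthropic-version": "2023-06-01"},
        method="GET",
    )
```

Passing the result to urllib.request.urlopen would send it; a 403 at that point is consistent with using a non-Admin key.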


3. OpenAI Admin Key

OpenAI's project keys can't query org-level cost. You need a service-account admin key.

  1. Visit https://platform.openai.com/settings/organization/admin-keys
  2. Click Create new secret key → set type to Admin
  3. Copy the key (starts with sk-svcacct-... or sk-admin-...)
  4. Add OPENAI_ADMIN_KEY to Cloud Run via Secret Manager (see Setting a spend key on Cloud Run).

What this unlocks:

  • 30-day daily cost breakdown for the OpenAI org
  • API source: GET /v1/organization/costs
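To make "daily cost breakdown" concrete, the sketch below folds a costs response into per-day totals. The payload shape used here (a data list of buckets, each with a Unix start_time and a results list whose entries carry amount.value) is an assumption about the endpoint's response and should be checked against the OpenAI API reference; daily_totals is a hypothetical helper, not the dashboard's code.

```python
from datetime import datetime, timezone
from typing import Any, Dict


def daily_totals(payload: Dict[str, Any]) -> Dict[str, float]:
    """Sum the cost of every result in each bucket, keyed by UTC date."""
    totals: Dict[str, float] = {}
    for bucket in payload.get("data", []):
        day = (
            datetime.fromtimestamp(bucket["start_time"], tz=timezone.utc)
            .date()
            .isoformat()
        )
        bucket_cost = sum(
            float(result["amount"]["value"]) for result in bucket.get("results", [])
        )
        # Days with no results still get an entry so the series has no gaps.
        totals[day] = totals.get(day, 0.0) + bucket_cost
    return totals


# Hand-written sample in the assumed shape (not real billing data).
SAMPLE_PAYLOAD = {
    "data": [
        {
            "start_time": 1735689600,  # 2025-01-01T00:00:00Z
            "results": [
                {"amount": {"value": 0.25, "currency": "usd"}},
                {"amount": {"value": 0.5, "currency": "usd"}},
            ],
        },
        {"start_time": 1735776000, "results": []},  # 2025-01-02, no spend
    ]
}
```

Because OpenAI is only the fallback path, a series of zeros like the second bucket is the expected steady state.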

Note: Curaway uses very little OpenAI in production (it's the fallback for Claude failures). This number should usually be near zero.


4. Cloudflare R2

You need a Cloudflare API token with the Workers R2 Storage:Read permission, plus the account ID (which is already in R2_ACCOUNT_ID on Cloud Run).

  1. Visit https://dash.cloudflare.com/profile/api-tokens
  2. Click Create Token → Custom token
  3. Permissions: Account → Workers R2 Storage → Read
  4. Account resources: include the Curaway account
  5. Copy the token
  6. Add CLOUDFLARE_API_TOKEN to Cloud Run via Secret Manager (see Setting a spend key on Cloud Run). CLOUDFLARE_ACCOUNT_ID is auto-derived from R2_ACCOUNT_ID if not set.

What this unlocks:

  • Real R2 storage used (vs the 10 GB free tier limit)
  • The dashboard shows free-tier-used percentage so you know when to worry
  • API source: GET /accounts/{id}/r2/usage
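The free-tier-used percentage is simple arithmetic, sketched below. Two things here are assumptions rather than facts from this runbook: that the 10 GB limit is counted in decimal gigabytes (10^9 bytes), and the 80% "worry" threshold, which is purely illustrative. The function names are hypothetical.

```python
# Assumption: the 10 GB free tier is measured in decimal GB (10**9 bytes).
R2_FREE_TIER_BYTES = 10 * 1000**3

# Hypothetical threshold; pick whatever level should prompt a closer look.
WORRY_THRESHOLD_PCT = 80.0


def free_tier_used_pct(used_bytes: int, limit_bytes: int = R2_FREE_TIER_BYTES) -> float:
    """Return storage used as a percentage of the free-tier limit, to one decimal."""
    if limit_bytes <= 0:
        raise ValueError("limit_bytes must be positive")
    return round(100.0 * used_bytes / limit_bytes, 1)


def should_worry(used_bytes: int) -> bool:
    """True once usage reaches the illustrative threshold."""
    return free_tier_used_pct(used_bytes) >= WORRY_THRESHOLD_PCT
```

For example, 2.5 GB stored is 25.0% of the limit, well under the threshold.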


5. Upstash Management API

Used for both Redis (cache) and QStash (async messages).

  1. Visit https://console.upstash.com/account/api
  2. Click Create API Key
  3. Copy both the key and the account email it's bound to
  4. Add UPSTASH_MGMT_API_KEY and UPSTASH_MGMT_EMAIL to Cloud Run via Secret Manager (see Setting a spend key on Cloud Run).

What this unlocks:

  • Daily Redis command count vs the 10K/day free tier
  • Same for QStash messages (500/day free)
  • API source: https://api.upstash.com/v2/redis/databases
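Both the key and the email are needed because the Upstash management API authenticates with HTTP Basic auth, using the account email as the username and the API key as the password. A minimal sketch of building such a request follows; build_upstash_request is a hypothetical helper, and the real fetcher may be structured differently.

```python
import base64
import urllib.request

UPSTASH_DATABASES_URL = "https://api.upstash.com/v2/redis/databases"


def build_upstash_request(email: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a Basic-auth GET request listing Redis databases."""
    # Basic auth is base64("username:password"); here that is email:api_key.
    credentials = f"{email}:{api_key}".encode("utf-8")
    token = base64.b64encode(credentials).decode("ascii")
    return urllib.request.Request(
        UPSTASH_DATABASES_URL,
        headers={"Authorization": f"Basic {token}"},
        method="GET",
    )
```

If UPSTASH_MGMT_EMAIL does not match the account the key was created under, expect an HTTP 401 from this request.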


Free-tier services (dashboard spot-check only)

These services have no programmatic usage API at the free tier. The Spend tab shows them as a table with direct links to each provider's dashboard so you can manually verify usage:

  • Neo4j Aura — 200K nodes free
  • Qdrant Cloud — 1 GB free
  • Clerk — 10K MAU free
  • Flagsmith — 50K req/mo free
  • Resend — 3K emails/mo free
  • Voyage AI — 50M tokens/mo free
  • PostHog — 1M events/mo free
  • GitHub — free

If usage on any of these crosses the threshold, you'll get an email from the provider — no need to monitor proactively.


How the dashboard works

  1. Frontend tab (/landscape → "Spend (30d)" button) makes a single GET /landscape/spend.json call when the tab is first clicked.
  2. Backend endpoint checks Redis cache (landscape:spend:30 key, 1h TTL).
  3. On miss, calls app.services.spend_report_service.collect_spend_report(days=30) which fans out to all 5 fetchers in parallel via asyncio.gather.
  4. Each fetcher returns a normalized dict with status ∈ {ok, no_credentials, error, free_tier}.
  5. The aggregator combines daily series, computes totals, and returns the payload.
  6. Cached for 1 hour to avoid hammering the external APIs.

The dashboard renders five things:

  • KPI tiles: 30-day total, biggest line item, paid services reporting, free-tier service count, generated-at timestamp
  • Daily spend stacked bar chart (last 30 days, one stack per provider)
  • Provider donut chart (share of total spend by provider)
  • Paid services table with per-service status badges + setup links for any unconfigured providers
  • Free tier services table with dashboard links

Force-refresh

The cache TTL is 1 hour. To force a fresh pull:

# Either delete the Redis key directly...
redis-cli -u "$UPSTASH_REDIS_URL" DEL landscape:spend:30

# ...or wait for the TTL to expire.

Troubleshooting

"No credentials" for a service I configured: a Cloud Run secret/env change only takes effect on a new revision. Confirm the gcloud run services update created a revision and that traffic routes to it: gcloud run services describe curaway-backend --project=curaway-dev --region=asia-south1 --format='value(status.latestReadyRevisionName)'. Check gcloud run services logs read curaway-backend --project=curaway-dev --region=asia-south1 for startup status.

"Error: HTTP 401": The key is invalid or expired. Regenerate from the provider console.

"Error: HTTP 403": The key is valid but lacks permission. For Anthropic and OpenAI, make sure you generated an Admin key, not a project/secret key.

Spend tab is blank: Check the browser console for /landscape/spend.json errors. The endpoint returns the aggregator output even when individual fetchers fail — a fully blank response means the cache or aggregator itself crashed.

"Connection timeout": External APIs occasionally hang. Each fetcher has a 15s timeout. The aggregator is wrapped in asyncio.gather(..., return_exceptions=True) so a single hang doesn't kill the others.

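The timeout behaviour described under "Connection timeout" can be sketched as follows. This is an illustration under assumptions, not the service's code: run_fetchers and the three stub fetchers are hypothetical, and the error dict shape is invented for the example. It uses asyncio.wait_for for the per-fetcher timeout and return_exceptions=True so one failure does not cancel the rest.

```python
import asyncio
from typing import Any, Awaitable, Callable, Dict

FETCH_TIMEOUT_SECONDS = 15.0

Fetcher = Callable[[], Awaitable[Dict[str, Any]]]


async def run_fetchers(
    fetchers: Dict[str, Fetcher], timeout: float = FETCH_TIMEOUT_SECONDS
) -> Dict[str, Dict[str, Any]]:
    names = list(fetchers)
    # return_exceptions=True turns failures into values instead of raising,
    # so a hung or broken fetcher cannot take the others down with it.
    outcomes = await asyncio.gather(
        *(asyncio.wait_for(fetchers[name](), timeout) for name in names),
        return_exceptions=True,
    )
    report: Dict[str, Dict[str, Any]] = {}
    for name, outcome in zip(names, outcomes):
        if isinstance(outcome, asyncio.TimeoutError):
            report[name] = {"status": "error", "detail": "Connection timeout"}
        elif isinstance(outcome, Exception):
            report[name] = {"status": "error", "detail": str(outcome)}
        else:
            report[name] = outcome
    return report


# Hypothetical fetchers covering the three outcomes.
async def ok_fetcher() -> Dict[str, Any]:
    return {"status": "ok", "total_usd": 1.0}


async def hanging_fetcher() -> Dict[str, Any]:
    await asyncio.sleep(60)  # simulates an external API that never answers
    return {"status": "ok", "total_usd": 0.0}


async def failing_fetcher() -> Dict[str, Any]:
    raise RuntimeError("HTTP 401")
```

With this structure the slowest fetcher bounds the total wait at roughly one timeout period, since all fetchers run concurrently.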

GCP-native costs (not in the dashboard yet)

Since the GCP cutover the platform's own infra runs on Google Cloud (Cloud Run, Firebase Hosting, Cloud SQL, Secret Manager, Artifact Registry). These are not collected by the Spend dashboard — the Railway and Vercel fetchers that previously covered backend/frontend hosting were removed in PR #1494.

Until a GCP billing fetcher is wired up, check these costs directly:

  • GCP Billing console → project curaway-dev → Billing → Reports (group by service to see Cloud Run vs Cloud SQL vs Firebase Hosting).
  • For programmatic access, enable a BigQuery billing export and query the daily cost table — this would be the integration point for a future _fetch_gcp fetcher in app/services/spend/.
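As a starting point for that future fetcher, the sketch below builds the kind of query it might run against a billing export table. Everything here is an assumption: build_gcp_daily_cost_query is a hypothetical helper, the table ID must be supplied by whoever sets up the export, and the column names (usage_start_time, service.description, project.id, cost) follow the standard usage cost export schema, which should be confirmed against the actual table before use.

```python
import re

# Table IDs look like "project.dataset.table"; reject anything else so the
# value can be interpolated into the query text safely.
_TABLE_ID_PATTERN = re.compile(r"^[A-Za-z0-9_.\-]+$")


def build_gcp_daily_cost_query(
    table_id: str, project_id: str = "curaway-dev", days: int = 30
) -> str:
    """Return a BigQuery SQL string for daily cost per service."""
    if not _TABLE_ID_PATTERN.match(table_id):
        raise ValueError(f"unexpected characters in table id: {table_id!r}")
    if not _TABLE_ID_PATTERN.match(project_id):
        raise ValueError(f"unexpected characters in project id: {project_id!r}")
    days = int(days)  # ensures only a number reaches the query text
    return (
        "SELECT\n"
        "  DATE(usage_start_time) AS usage_day,\n"
        "  service.description AS service_name,\n"
        "  ROUND(SUM(cost), 2) AS cost\n"
        f"FROM `{table_id}`\n"
        f"WHERE project.id = '{project_id}'\n"
        "  AND usage_start_time >= "
        f"TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL {days} DAY)\n"
        "GROUP BY usage_day, service_name\n"
        "ORDER BY usage_day, service_name"
    )
```

The resulting string could be run in the BigQuery console first to validate the schema before any fetcher code is written around it.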