Domain Specifications¶
Behavior-driven scenarios for verifying the AI Workloads Platform. Each scenario follows the WHEN / THEN structure.
Provider Connections¶
Register provider API key¶
The system SHALL accept an API key for a supported provider (OpenAI, Anthropic, OpenRouter), validate the key against the provider's usage API, encrypt it via AWS Secrets Manager, and store only the Secrets Manager ARN in the database. The raw API key SHALL never be persisted in the database or returned in any API response.
Scenario: Valid OpenAI admin key registration
WHEN a user submits a POST /api/v1/connections with provider "openai" and a valid admin API key (prefix sk-admin-)
THEN the system validates the key by probing the OpenAI usage API, stores the key in Secrets Manager, creates a ProviderConnection with status "active", auto-creates a Default project if none exists, and returns the connection ID without the API key
Scenario: Invalid or expired key rejected
WHEN a user submits a connection request with an invalid or expired API key
THEN the system returns HTTP 400 with a descriptive error message and does not store the key
Scenario: Duplicate provider connection rejected
WHEN a user submits a connection for a provider that already has an active connection in the same organization
THEN the system returns HTTP 409
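A minimal sketch of this flow, assuming boto3 for Secrets Manager access; `probe_provider_usage_api` is a hypothetical stand-in for the per-provider validation probe:

```python
import boto3

secrets = boto3.client("secretsmanager")

def probe_provider_usage_api(provider: str, api_key: str) -> bool:
    """Hypothetical stand-in: call the provider's usage API to validate the key."""
    raise NotImplementedError

def register_connection(org_id: str, provider: str, api_key: str) -> str:
    """Validate the key, store it in Secrets Manager, and return only the ARN."""
    if not probe_provider_usage_api(provider, api_key):
        raise ValueError("invalid or expired API key")  # surfaces as HTTP 400
    resp = secrets.create_secret(
        Name=f"connections/{org_id}/{provider}",
        SecretString=api_key,  # the raw key lives only in Secrets Manager
    )
    return resp["ARN"]  # persist this on the ProviderConnection row, never the key
```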
List and retrieve connections¶
The system SHALL provide endpoints to list all connections for the authenticated organization and retrieve a single connection by ID, including provider, status, last poll time, and associated project.
Scenario: List connections for organization
WHEN an authenticated user sends GET /api/v1/connections
THEN the system returns all connections belonging to the user's organization with provider, status, last_polled_at, and project information
Scenario: Connection not found
WHEN a user requests a connection ID that does not exist or belongs to another organization
THEN the system returns HTTP 404
Delete connection with secret cleanup¶
The system SHALL allow users to delete a connection, which schedules the Secrets Manager secret for deletion and deactivates the associated workload. Historical telemetry events SHALL be preserved.
Scenario: Delete connection
WHEN a user sends DELETE /api/v1/connections/{id}
THEN the system schedules the Secrets Manager secret for deletion (30-day recovery window), deactivates the workload, and returns HTTP 204
Trigger manual sync¶
The system SHALL allow users to trigger an immediate usage sync for a connection, rate-limited to once per 5 minutes per connection.
Scenario: Manual sync enqueued
WHEN a user sends POST /api/v1/connections/{id}/sync for an active connection
THEN the system enqueues a background poll job and returns HTTP 202
Scenario: Manual sync rejected for errored or disabled connection
WHEN a user sends POST /api/v1/connections/{id}/sync for a connection in "error" or "disabled" status
THEN the system returns HTTP 409
Remap connection to different project¶
The system SHALL allow users to remap a connection to a different project, creating a new active workload under the target project while deactivating the previous workload. Historical telemetry SHALL remain on the original workload.
Scenario: Remap connection
WHEN a user sends PUT /api/v1/connections/{id}/project with a valid project_id
THEN the system creates a new Workload under the target project, marks it active, deactivates the previous workload, and returns the updated connection
Connection error handling¶
The system SHALL classify provider API errors as transient (retry with exponential backoff) or permanent (stop polling and alert user). After 5 consecutive permanent failures, the connection status SHALL transition to "disabled".
Scenario: Transient error with retry
WHEN a provider API call fails with a transient error (timeout, 429, 503)
THEN the system retries with exponential backoff and increments consecutive_failures
Scenario: Permanent failure threshold reached
WHEN a connection accumulates 5 consecutive permanent failures
THEN the system transitions the connection to "disabled" status and stops polling until the user re-authenticates
Scenario: Successful poll resets failure counter
WHEN a provider API call succeeds after previous failures
THEN the system resets consecutive_failures to 0
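A sketch of the transient/permanent split and backoff loop under these rules; `ProviderError` and the status-code mapping are assumptions, not names from the platform:

```python
import time

class ProviderError(Exception):
    """Assumed error type carrying the provider's HTTP status (None on timeout)."""
    def __init__(self, status_code=None):
        super().__init__(f"provider error: {status_code}")
        self.status_code = status_code

TRANSIENT_STATUSES = {429, 503}

def classify(status_code) -> str:
    # Timeouts (no status) and 429/503 are transient; anything else is permanent.
    if status_code is None or status_code in TRANSIENT_STATUSES:
        return "transient"
    return "permanent"

def poll_with_backoff(poll_once, max_attempts: int = 4):
    """Retry transient failures with exponential backoff (1s, 2s, 4s, ...)."""
    for attempt in range(max_attempts):
        try:
            return poll_once()
        except ProviderError as exc:
            if classify(exc.status_code) == "permanent" or attempt == max_attempts - 1:
                raise  # caller increments consecutive_failures and may disable
            time.sleep(2 ** attempt)
```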
Telemetry Ingestion¶
Hourly automated polling¶
The system SHALL poll all active provider connections hourly using cursor-based incremental retrieval. Each provider connector SHALL maintain a sync_cursor to fetch only new data since the last poll.
Scenario: Hourly poll cycle
WHEN the hourly polling cron fires
THEN the system iterates all connections with status "active", retrieves the API key from Secrets Manager, calls the provider's usage API from the sync_cursor forward, ingests new events, and updates the sync_cursor and last_polled_at
Scenario: Poll with no new data
WHEN the provider returns no new usage data since the last cursor
THEN the system updates last_polled_at without creating any new telemetry events
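One poll cycle might look like the following sketch; `db`, `connectors`, and `ingest` are assumed interfaces, while `get_secret_value` is the real boto3 Secrets Manager call:

```python
from datetime import datetime, timezone

def hourly_poll(db, secrets, connectors, ingest) -> None:
    """One cycle: fetch only usage newer than each connection's sync_cursor."""
    for conn in db.active_connections():
        # The raw key is fetched from Secrets Manager at poll time, never from the DB.
        api_key = secrets.get_secret_value(SecretId=conn.secret_arn)["SecretString"]
        events, new_cursor = connectors[conn.provider].fetch_usage(
            api_key, since=conn.sync_cursor
        )
        for event in events:
            ingest(conn, event)  # idempotent upsert; see deduplication below
        conn.sync_cursor = new_cursor            # advances even when events is empty
        conn.last_polled_at = datetime.now(timezone.utc)
        db.commit()
```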
Idempotent event deduplication¶
The system SHALL deduplicate telemetry events using an idempotency hash (SHA-256 of provider:org_id:model:bucket_start_hour) with an upsert strategy (last-write-wins). Token counts MAY be updated on re-poll; the hash, model, and bucket_start SHALL never be modified after creation, as sketched below.
Scenario: First ingestion of a usage bucket
WHEN a new usage bucket is polled with a hash not yet in the database
THEN the system creates a new TelemetryEvent with all token counts and the idempotency hash
Scenario: Re-poll of existing bucket with updated data
WHEN the same bucket is re-polled with a matching idempotency hash but different token counts
THEN the system updates the token counts and raw_payload via upsert without creating a duplicate
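The idempotency hash is straightforward to reproduce; the upsert below is an illustrative PostgreSQL shape (table and column names beyond those in the spec are assumptions):

```python
import hashlib

def idempotency_hash(provider: str, org_id: str, model: str, bucket_start_hour: str) -> str:
    """SHA-256 over the colon-joined identity fields named in the spec."""
    key = f"{provider}:{org_id}:{model}:{bucket_start_hour}"
    return hashlib.sha256(key.encode("utf-8")).hexdigest()

# Last-write-wins upsert keyed on the hash:
UPSERT_SQL = """
INSERT INTO telemetry_events
    (idempotency_hash, model, bucket_start, input_tokens_uncached,
     input_tokens_cached, input_tokens_cache_creation, output_tokens, raw_payload)
VALUES
    (:hash, :model, :bucket_start, :uncached, :cached, :cache_creation, :output, :raw)
ON CONFLICT (idempotency_hash) DO UPDATE SET
    input_tokens_uncached       = EXCLUDED.input_tokens_uncached,
    input_tokens_cached         = EXCLUDED.input_tokens_cached,
    input_tokens_cache_creation = EXCLUDED.input_tokens_cache_creation,
    output_tokens               = EXCLUDED.output_tokens,
    raw_payload                 = EXCLUDED.raw_payload
-- hash, model, and bucket_start never appear in the UPDATE list
"""
```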
Provider-specific token mapping¶
The system SHALL map Anthropic's three-way token split (input_tokens as uncached, cache_creation_input_tokens, cache_read_input_tokens) and treat all OpenAI and OpenRouter input tokens as uncached.
Scenario: Anthropic token mapping
WHEN Anthropic usage data is ingested
THEN the system maps input_tokens → input_tokens_uncached, cache_creation_input_tokens → input_tokens_cache_creation, and cache_read_input_tokens → input_tokens_cached
Scenario: OpenAI token mapping
WHEN OpenAI usage data is ingested
THEN the system maps all input tokens to input_tokens_uncached with input_tokens_cached and input_tokens_cache_creation set to 0
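A sketch of the normalization step; the Anthropic field names come from the spec, while reading the other providers' single input figure from the same `input_tokens` key is an assumption:

```python
def map_tokens(provider: str, usage: dict) -> dict:
    """Normalize provider usage into the platform's three-way input split."""
    if provider == "anthropic":
        return {
            "input_tokens_uncached": usage.get("input_tokens", 0),
            "input_tokens_cache_creation": usage.get("cache_creation_input_tokens", 0),
            "input_tokens_cached": usage.get("cache_read_input_tokens", 0),
        }
    # OpenAI and OpenRouter report a single input figure: treat all of it as uncached.
    return {
        "input_tokens_uncached": usage.get("input_tokens", 0),
        "input_tokens_cache_creation": 0,
        "input_tokens_cached": 0,
    }
```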
Trailing reconciliation¶
The system SHALL run a T+24h trailing reconciliation daily at 03:00 UTC that re-polls the last 24-hour window for all active connections to catch late-arriving provider data.
Scenario: Reconciliation updates stale data
WHEN the daily reconciliation job runs and a provider returns revised token counts for a previously polled bucket
THEN the system upserts the telemetry event with the updated counts and recalculates the associated emissions
Dashboard telemetry queries¶
The system SHALL provide API endpoints to retrieve telemetry summaries (total CO2, per-model breakdown, daily chart, cached vs. uncached split), paginated event lists, and distinct model aggregations, filterable by date range and project.
Scenario: Summary query with date filter
WHEN an authenticated user sends GET /api/v1/telemetry/summary with start_date and end_date
THEN the system returns aggregated CO2, per-model breakdown, daily chart, and cached vs. uncached split for the specified period
Scenario: Summary filtered by project
WHEN the user adds a project_id query parameter
THEN the system returns telemetry only for workloads belonging to that project
Emissions Engine¶
Token to CO2 calculation¶
The system SHALL calculate carbon emissions for each telemetry event using a three-phase pipeline: tokens → energy (joules), energy → kWh, kWh → CO2 (kg). The calculation SHALL account for prefill (uncached input), decode (output), and cached (cache_read) token phases separately, applying different energy-per-token rates for each.
Scenario: Standard emissions calculation
WHEN a telemetry event is ingested with uncached input tokens, output tokens, and zero cached tokens
THEN the system calculates energy as (uncached × prefill_rate + output × decode_rate), converts to kWh, multiplies by grid_intensity × PUE, and stores a CarbonCalculation record
Scenario: Cached tokens produce lower emissions
WHEN a telemetry event includes cached tokens (Anthropic cache_read)
THEN the cached phase uses approximately 10% of the prefill energy rate, producing materially lower CO2 than equivalent uncached usage
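The three-phase pipeline reduces to a few lines; all numeric rates here are placeholders for values supplied by the CarbonFactors lookup described in the next requirement:

```python
JOULES_PER_KWH = 3_600_000  # 1 kWh = 3.6 MJ

def co2_for_event(uncached: int, output: int, cached: int,
                  rates: dict, grid_kgco2_per_kwh: float, pue: float) -> float:
    """Three-phase pipeline: tokens -> joules -> kWh -> kg CO2.

    `rates` holds joules-per-token for each phase; per the spec, the cached
    phase runs at roughly 10% of the prefill rate.
    """
    joules = (
        uncached * rates["prefill"]
        + output * rates["decode"]
        + cached * rates["prefill"] * 0.10  # cache reads cost ~10% of prefill
    )
    kwh = joules / JOULES_PER_KWH
    return kwh * grid_kgco2_per_kwh * pue  # datacenter overhead applied via PUE
```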
Versioned carbon factors¶
The system SHALL use a versioned CarbonFactors lookup table to map models to tiers via fnmatch glob patterns. Each factors version SHALL be immutable: new versions are appended, never mutated. Calculations SHALL reference the factors_version used, for reproducibility.
Scenario: Model tier mapping via glob
WHEN a telemetry event for model "gpt-4o-2024-05-13" is calculated
THEN the system matches the model name against CarbonFactors patterns (e.g., gpt-4* → tier_3) and applies the corresponding energy rates
Scenario: Unknown model falls back to medium tier
WHEN a telemetry event references a model that matches no glob pattern
THEN the system applies the medium energy tier as a conservative default
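Tier matching with fnmatch, using illustrative patterns; the real pattern table lives in the versioned CarbonFactors rows, and the tier names here are assumptions:

```python
from fnmatch import fnmatch

# Illustrative pattern table; entries are checked in order.
TIER_PATTERNS = [
    ("gpt-4*", "tier_3"),
    ("claude-3-opus*", "tier_3"),
    ("claude-3-haiku*", "tier_1"),
]

def tier_for_model(model: str) -> str:
    for pattern, tier in TIER_PATTERNS:
        if fnmatch(model, pattern):
            return tier
    return "tier_medium"  # conservative default for unmatched models

assert tier_for_model("gpt-4o-2024-05-13") == "tier_3"
```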
Uncertainty bounds¶
The system SHALL calculate uncertainty bounds for each emission estimate as co2_kg × (1 ± uncertainty_pct/100), stored as co2_lower_bound_kg and co2_upper_bound_kg on the CarbonCalculation record.
Scenario: Uncertainty bounds stored
WHEN a CarbonCalculation is created with co2_kg = 0.10 and uncertainty_pct = 30
THEN co2_lower_bound_kg = 0.07 and co2_upper_bound_kg = 0.13
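The bound arithmetic, with the scenario's numbers as a check:

```python
def uncertainty_bounds(co2_kg: float, uncertainty_pct: float) -> tuple[float, float]:
    """co2_kg * (1 - pct/100) and co2_kg * (1 + pct/100)."""
    delta = co2_kg * uncertainty_pct / 100
    return co2_kg - delta, co2_kg + delta

low, high = uncertainty_bounds(0.10, 30)  # the scenario's inputs
assert (round(low, 2), round(high, 2)) == (0.07, 0.13)
```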
Project Organization¶
Project CRUD¶
The system SHALL allow users to create, list, retrieve, update, and delete projects within their organization. Project names SHALL be unique per organization. Each organization SHALL have an auto-created "Default" project.
Scenario: Create project
WHEN a user sends POST /api/v1/projects with name "Production App"
THEN the system creates the project and returns its ID, name, and is_default: false
Scenario: Duplicate project name rejected
WHEN a user creates a project with a name that already exists in the organization
THEN the system returns HTTP 409
Scenario: Delete project with active workloads rejected
WHEN a user sends DELETE on a project that still has active workloads
THEN the system returns HTTP 409 requiring workloads be remapped first
Scenario: Default project cannot be deleted
WHEN a user sends DELETE on the Default project
THEN the system returns HTTP 400
Auto-create default project¶
The system SHALL auto-create a "Default" project (is_default: true) when a connection is registered without a project_id. All connections without explicit project assignment SHALL route to the Default project.
Scenario: Default project auto-creation
WHEN a user creates a connection without specifying project_id and no Default project exists
THEN the system creates a Default project, creates a Workload under it, and links the connection
Per-project analytics¶
The system SHALL provide project detail endpoints showing only that project's telemetry: per-model breakdown, daily chart, CO2 totals, and the connections assigned to the project.
Scenario: Project detail with telemetry
WHEN a user sends GET /api/v1/projects/{id} with date filters
THEN the system returns the project's connections and telemetry summary scoped only to that project's workloads
Data export with project filter¶
The system SHALL support CSV and JSON export of telemetry data, filterable by project. Exports SHALL be available on all tiers (free and paid).
Scenario: CSV export filtered by project
WHEN a user sends GET /api/v1/export/telemetry?format=csv&project_id={id}
THEN the system returns a CSV file containing only telemetry events from the specified project's workloads, with a project_name column
Billing & Subscriptions¶
Stripe subscription tiers¶
The system SHALL support five billing tiers (Free, Starter, Growth, Scale, Enterprise) managed via Stripe subscriptions. Free-tier users SHALL access analytics without payment. Paid-tier users SHALL receive automatic credit retirement and receipts.
Scenario: Free tier by default
WHEN a new organization is created
THEN the organization's plan_tier is set to "free" with full analytics access and no receipts
Scenario: Upgrade to paid tier
WHEN a user sends POST /api/v1/billing/upgrade with plan "starter"
THEN the system creates a Stripe Checkout session and returns the checkout URL
Scenario: Enterprise tier requires contact
WHEN a user requests upgrade to "enterprise"
THEN the system returns a contact form URL instead of a checkout session
Billing period lifecycle¶
The system SHALL maintain monthly billing periods with a state machine (open → closing → closed → failed). The "closing" state SHALL be triggered by Stripe's invoice.payment_succeeded webhook. Receipt generation SHALL be deferred until T+48h after closing to allow reconciliation.
Scenario: Payment succeeded triggers closing
WHEN Stripe sends an invoice.payment_succeeded webhook
THEN the system transitions the billing period to "closing" and schedules a billing close job deferred by 48 hours
Scenario: Payment failed marks period as failed
WHEN Stripe sends an invoice.payment_failed webhook
THEN the system transitions the billing period to "failed" without retiring credits
Scenario: Subscription cancelled downgrades to free
WHEN Stripe sends a customer.subscription.deleted webhook
THEN the system updates the organization's plan_tier to "free"
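A sketch of the state machine and the webhook mapping; which states may enter "failed" beyond those named above is an assumption, and `billing` is a hypothetical service interface:

```python
# Legal billing-period transitions (open -> closing -> closed, with failure exits).
TRANSITIONS = {
    "open": {"closing", "failed"},
    "closing": {"closed", "failed"},
}

def transition(period, new_state: str) -> None:
    if new_state not in TRANSITIONS.get(period.state, set()):
        raise ValueError(f"illegal transition {period.state} -> {new_state}")
    period.state = new_state

def handle_stripe_webhook(event: dict, billing) -> None:
    """Map the Stripe event types named in the spec onto period transitions."""
    kind = event["type"]
    if kind == "invoice.payment_succeeded":
        billing.move_current_period(to="closing")
        billing.schedule_close_job(delay_hours=48)  # T+48h receipt deferral
    elif kind == "invoice.payment_failed":
        billing.move_current_period(to="failed")    # no credits retired
    elif kind == "customer.subscription.deleted":
        billing.set_plan_tier("free")
```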
Billing status endpoint¶
The system SHALL provide an endpoint to view current billing status including plan tier, current period, and past periods with receipt references.
Scenario: View billing status
WHEN an authenticated user sends GET /api/v1/billing/status
THEN the system returns plan_tier, current open period with running CO2 total, and past periods with receipt serial numbers
Customer portal access¶
The system SHALL provide a link to Stripe's Customer Portal for subscription management (plan changes, payment method updates, cancellation).
Scenario: Get portal link
WHEN a user sends POST /api/v1/billing/portal
THEN the system returns a Stripe Customer Portal session URL
Carbon Receipts¶
Credit retirement and receipt generation¶
The system SHALL retire carbon credits from internal inventory at billing period close, generate a receipt with serial number (format CL-YYYYMM-XXXXX), sign it with Ed25519, produce a branded PDF, and store the receipt with credit serial numbers.
Scenario: Billing period closes with credit retirement
WHEN the T+48h billing close job runs for a period in "closing" status
THEN the system aggregates CO2 for the period, retires equivalent credits from inventory, assigns a serial number, signs the receipt, generates a PDF, transitions the period to "closed", and stores the receipt
Scenario: Insufficient credit inventory
WHEN the billing close job runs but credit inventory is insufficient
THEN the system marks the period as "failed", alerts operations, and defers the receipt
Ed25519 receipt signing¶
The system SHALL sign each receipt using Ed25519 (PyNaCl) with SHA-256 payload hashing. The signing flow SHALL serialize the receipt payload as canonical JSON (sorted keys), hash the payload with SHA-256, and sign the hash with the Ed25519 private key. Each receipt SHALL store the signature, payload_hash, public_key, and key_version to support key rotation.
Scenario: Receipt signing produces valid signature
WHEN a receipt is generated
THEN the stored signature verifies successfully against the payload_hash using the stored public_key via any Ed25519 library
Scenario: Key rotation preserves old receipt validity
WHEN the signing key is rotated to a new version
THEN old receipts remain verifiable using their stored public_key and key_version
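The signing flow in PyNaCl terms; the exact canonicalization settings (compact separators, UTF-8) are assumptions beyond the "sorted keys" the spec requires:

```python
import hashlib
import json

from nacl.signing import SigningKey

def sign_receipt(payload: dict, signing_key: SigningKey, key_version: int) -> dict:
    """Canonical JSON -> SHA-256 -> Ed25519, per the flow described above."""
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":")).encode("utf-8")
    payload_hash = hashlib.sha256(canonical).digest()
    signed = signing_key.sign(payload_hash)
    return {
        "payload_hash": payload_hash.hex(),
        "signature": signed.signature.hex(),
        "public_key": signing_key.verify_key.encode().hex(),
        "key_version": key_version,  # stored so rotated keys stay verifiable
    }
```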
Public receipt verification¶
The system SHALL expose a public endpoint (no authentication) at GET /public/receipts/verify/{serial_number} that returns the receipt metadata, signature, public_key, and verification instructions. This endpoint SHALL be rate-limited to 60 requests per minute per IP.
Scenario: Public verification of valid receipt
WHEN anyone sends GET /public/receipts/verify/CL-202603-00001
THEN the system returns the receipt metadata, signature, public_key, key_version, and verification instructions with verified: true
Scenario: Invalid serial number
WHEN a non-existent serial number is queried
THEN the system returns HTTP 404
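The complementary check anyone can run against the fields the endpoint returns, again with PyNaCl:

```python
from nacl.exceptions import BadSignatureError
from nacl.signing import VerifyKey

def verify_receipt(payload_hash_hex: str, signature_hex: str, public_key_hex: str) -> bool:
    """Re-check a receipt signature from the public verify endpoint's fields."""
    try:
        VerifyKey(bytes.fromhex(public_key_hex)).verify(
            bytes.fromhex(payload_hash_hex), bytes.fromhex(signature_hex)
        )
        return True
    except BadSignatureError:
        return False
```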
Receipt listing and PDF download¶
The system SHALL provide authenticated endpoints to list receipts (paginated) and download individual receipt PDFs. Free-tier users SHALL receive HTTP 403 with an upgrade prompt.
Scenario: List receipts for paid org
WHEN a paid-tier user sends GET /api/v1/receipts
THEN the system returns paginated receipts with serial numbers, CO2 retired, credit references, and PDF URLs
Scenario: Free-tier user denied receipts
WHEN a free-tier user sends GET /api/v1/receipts
THEN the system returns HTTP 403 with an upgrade URL
Prior-period adjustments¶
The system SHALL handle late-arriving telemetry that falls in a closed billing period by creating a PriorPeriodAdjustment entry linked to the next open period. The adjustment SHALL include the CO2 delta and reason.
Scenario: Late telemetry after period close
WHEN a telemetry event is ingested with event_timestamp falling in a closed billing period
THEN the system creates a PriorPeriodAdjustment entry with co2_delta_kg and reason "late_telemetry" linked to the next open period
Audit & Compliance¶
Monthly audit pack generation¶
The system SHALL generate monthly audit packs (zip files) for paid-tier organizations on the 3rd of each month. Each pack SHALL contain all receipts (PDF), calculation breakdowns (JSON), retirement confirmations, and the methodology version used, with a manifest file whose SHA-256 hash is stored for integrity verification.
Scenario: Audit pack generated on schedule
WHEN the monthly audit pack cron runs on the 3rd
THEN the system iterates paid-tier organizations with closed periods in the prior month, generates a zip with receipts and calculations, uploads to S3, and stores an AuditPack record with the manifest hash
Scenario: No closed periods for org in prior month
WHEN the audit pack cron runs for an org with no closed billing periods in the prior month
THEN the system skips that organization without creating an audit pack
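A sketch of pack assembly with the standard library; the manifest layout (JSON of per-file SHA-256 hashes) is an assumption consistent with the integrity check described above:

```python
import hashlib
import json
import zipfile

def build_audit_pack(path: str, files: dict[str, bytes]) -> str:
    """Zip the pack contents plus a manifest; return the manifest's SHA-256.

    The returned hash is what gets stored on the AuditPack record for
    later integrity verification.
    """
    manifest = {name: hashlib.sha256(data).hexdigest() for name, data in files.items()}
    manifest_bytes = json.dumps(manifest, sort_keys=True, indent=2).encode("utf-8")
    with zipfile.ZipFile(path, "w", zipfile.ZIP_DEFLATED) as pack:
        for name, data in files.items():
            pack.writestr(name, data)
        pack.writestr("manifest.json", manifest_bytes)
    return hashlib.sha256(manifest_bytes).hexdigest()
```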
Audit pack download¶
The system SHALL provide an authenticated endpoint to download audit packs by year and month. Free-tier organizations SHALL receive HTTP 403.
Scenario: Download audit pack
WHEN a paid-tier user sends GET /api/v1/export/audit-pack/2026/02
THEN the system returns the zip file and the contained files match the stored manifest
Scenario: Audit pack not yet generated
WHEN a user requests an audit pack for a month that has not been generated yet
THEN the system returns HTTP 404
Scenario: Free-tier user denied
WHEN a free-tier user requests an audit pack download
THEN the system returns HTTP 403
Health & Observability¶
Health endpoint¶
The system SHALL expose a public /health endpoint (no authentication) that checks database connectivity, Redis availability, and the recency of the last poll timestamp. The endpoint SHALL return HTTP 200 when all checks pass and HTTP 503 when any check fails, for integration with load balancer health checks.
Scenario: All systems healthy
WHEN a client sends GET /health and database, Redis, and last poll are all nominal
THEN the system returns HTTP 200 with status "healthy" and individual check results with latency
Scenario: Redis unavailable
WHEN Redis is unreachable
THEN the system returns HTTP 503 with status "degraded" and the Redis check marked as "error"
Scenario: Stale polling detected
WHEN the last poll timestamp is older than 90 minutes
THEN the last_poll check status is "warning"; if older than 180 minutes, status is "error"
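The staleness thresholds and the overall aggregation reduce to small helpers (HTTP wiring omitted; the framework is not specified here):

```python
from datetime import datetime, timedelta, timezone

def last_poll_status(last_polled_at: datetime) -> str:
    """Spec thresholds: warning past 90 minutes, error past 180."""
    age = datetime.now(timezone.utc) - last_polled_at
    if age > timedelta(minutes=180):
        return "error"
    if age > timedelta(minutes=90):
        return "warning"
    return "ok"

def overall(checks: dict[str, str]) -> tuple[int, str]:
    """HTTP 503 / "degraded" if any check errored, else 200 / "healthy"."""
    if "error" in checks.values():
        return 503, "degraded"
    return 200, "healthy"
```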
Structured logging with correlation IDs¶
The system SHALL use structured logging (JSON format) with correlation IDs that propagate across API requests and background jobs. Sensitive fields (API keys, Bearer tokens) SHALL be redacted from all log output.
Scenario: API request logging
WHEN an API request is processed
THEN the system logs the request with a unique correlation_id that appears in all related log entries (request, service calls, database queries, response)
Scenario: Background job logging
WHEN a background job executes (polling, reconciliation, billing close)
THEN the system logs with a job-specific correlation_id linking all operations within that job
Scenario: Sensitive field redaction
WHEN a log entry would contain an API key (sk-, sk-ant-) or Bearer token
THEN the system redacts the value, replacing it with [REDACTED]
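A sketch of the redaction pass; the exact regexes are assumptions built around the key prefixes named above:

```python
import re

# sk- and sk-ant- key prefixes from the spec, plus Bearer tokens.
REDACTION_PATTERNS = [
    re.compile(r"sk-ant-[A-Za-z0-9_-]+"),
    re.compile(r"sk-[A-Za-z0-9_-]+"),
    re.compile(r"(?i)bearer\s+[A-Za-z0-9._-]+"),
]

def redact(message: str) -> str:
    """Replace any matching secret with a fixed placeholder before logging."""
    for pattern in REDACTION_PATTERNS:
        message = pattern.sub("[REDACTED]", message)
    return message

assert redact("auth: Bearer abc.def") == "auth: [REDACTED]"
```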