
Domain Specifications

Behavior-driven scenarios for verifying the AI Workloads Platform. Each scenario follows the WHEN / THEN structure: a WHEN clause states the triggering action, and a THEN clause states the expected outcome.


Provider Connections

Register provider API key

The system SHALL accept an API key for a supported provider (OpenAI, Anthropic, OpenRouter), validate the key against the provider's usage API, encrypt it via AWS Secrets Manager, and store only the Secrets Manager ARN in the database. The raw API key SHALL never be persisted in the database or returned in any API response.

Scenario: Valid OpenAI admin key registration

WHEN a user submits a POST /api/v1/connections with provider "openai" and a valid admin API key (prefix sk-admin-)

THEN the system validates the key by probing the OpenAI usage API, stores the key in Secrets Manager, creates a ProviderConnection with status "active", auto-creates a Default project if none exists, and returns the connection ID without the API key

Scenario: Invalid or expired key rejected

WHEN a user submits a connection request with an invalid or expired API key

THEN the system returns HTTP 400 with a descriptive error message and does not store the key

Scenario: Duplicate provider connection rejected

WHEN a user submits a connection for a provider that already has an active connection in the same organization

THEN the system returns HTTP 409

List and retrieve connections

The system SHALL provide endpoints to list all connections for the authenticated organization and retrieve a single connection by ID, including provider, status, last poll time, and associated project.

Scenario: List connections for organization

WHEN an authenticated user sends GET /api/v1/connections

THEN the system returns all connections belonging to the user's organization with provider, status, last_polled_at, and project information

Scenario: Connection not found

WHEN a user requests a connection ID that does not exist or belongs to another organization

THEN the system returns HTTP 404

Delete connection with secret cleanup

The system SHALL allow users to delete a connection, which schedules the Secrets Manager secret for deletion and deactivates the associated workload. Historical telemetry events SHALL be preserved.

Scenario: Delete connection

WHEN a user sends DELETE /api/v1/connections/{id}

THEN the system schedules the Secrets Manager secret for deletion (30-day recovery window), deactivates the workload, and returns HTTP 204

Trigger manual sync

The system SHALL allow users to trigger an immediate usage sync for a connection, rate-limited to once per 5 minutes per connection.

Scenario: Manual sync enqueued

WHEN a user sends POST /api/v1/connections/{id}/sync for an active connection

THEN the system enqueues a background poll job and returns HTTP 202

Scenario: Manual sync on error connection rejected

WHEN a user sends POST /api/v1/connections/{id}/sync for a connection in "error" or "disabled" status

THEN the system returns HTTP 409

Remap connection to different project

The system SHALL allow users to remap a connection to a different project, creating a new active workload under the target project while deactivating the previous workload. Historical telemetry SHALL remain on the original workload.

Scenario: Remap connection

WHEN a user sends PUT /api/v1/connections/{id}/project with a valid project_id

THEN the system creates a new Workload under the target project, marks it active, deactivates the previous workload, and returns the updated connection

Connection error handling

The system SHALL classify provider API errors as transient (retry with exponential backoff) or permanent (stop polling and alert user). After 5 consecutive permanent failures, the connection status SHALL transition to "disabled".

Scenario: Transient error with retry

WHEN a provider API call fails with a transient error (timeout, 429, 503)

THEN the system retries with exponential backoff and increments consecutive_failures

Scenario: Permanent failure threshold reached

WHEN a connection accumulates 5 consecutive permanent failures

THEN the system transitions the connection to "disabled" status and stops polling until the user re-authenticates

Scenario: Successful poll resets failure counter

WHEN a provider API call succeeds after previous failures

THEN the system resets consecutive_failures to 0
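The error-classification and failure-counter behavior above can be sketched as follows. This is a minimal in-memory sketch, not the platform's implementation: the status-code sets, the backoff base, and the `dict`-based connection record are illustrative assumptions.

```python
import random
from typing import Optional

TRANSIENT_STATUSES = {429, 503}   # assumption: timeouts arrive as status None
MAX_CONSECUTIVE_FAILURES = 5      # disable threshold from the spec
BASE_DELAY_S = 30                 # illustrative backoff base, in seconds

def classify(status_code: Optional[int]) -> str:
    """Classify a provider API failure; None represents a network timeout."""
    if status_code is None or status_code in TRANSIENT_STATUSES:
        return "transient"
    return "permanent"

def backoff_delay(attempt: int) -> float:
    """Exponential backoff with full jitter for transient retries."""
    return random.uniform(0, BASE_DELAY_S * (2 ** attempt))

def record_failure(connection: dict) -> dict:
    """Increment the failure counter; disable once the threshold is reached."""
    connection["consecutive_failures"] += 1
    if connection["consecutive_failures"] >= MAX_CONSECUTIVE_FAILURES:
        connection["status"] = "disabled"  # stop polling until re-auth
    return connection

def record_success(connection: dict) -> dict:
    """A successful poll resets the counter, per the spec."""
    connection["consecutive_failures"] = 0
    return connection
```

A production version would persist the counter transactionally alongside the poll result rather than mutating a dict.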


Telemetry Ingestion

Hourly automated polling

The system SHALL poll all active provider connections hourly using cursor-based incremental retrieval. Each provider connector SHALL maintain a sync_cursor to fetch only new data since the last poll.

Scenario: Hourly poll cycle

WHEN the hourly polling cron fires

THEN the system iterates all connections with status "active", retrieves the API key from Secrets Manager, calls the provider's usage API from the sync_cursor forward, ingests new events, and updates the sync_cursor and last_polled_at

Scenario: Poll with no new data

WHEN the provider returns no new usage data since the last cursor

THEN the system updates last_polled_at without creating any new telemetry events
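The cursor-advance behavior described above can be sketched as one poll cycle. The `fetch_usage` and `ingest_event` callables and the `bucket_start` field are illustrative stand-ins for the real provider connector, not a defined API.

```python
from datetime import datetime, timezone

def poll_connection(connection, fetch_usage, ingest_event, now=None):
    """One poll cycle: fetch usage buckets after the stored sync_cursor,
    ingest anything new, then advance the cursor and last_polled_at.

    fetch_usage(cursor) -> list of buckets, each carrying a 'bucket_start'.
    Returns the number of buckets ingested.
    """
    now = now or datetime.now(timezone.utc)
    buckets = fetch_usage(connection["sync_cursor"])
    for bucket in buckets:
        ingest_event(connection, bucket)
    if buckets:
        # Advance the cursor only when new data arrived.
        connection["sync_cursor"] = max(b["bucket_start"] for b in buckets)
    connection["last_polled_at"] = now  # always updated, even with no new data
    return len(buckets)
```

Note that `last_polled_at` is updated unconditionally, matching the "poll with no new data" scenario, while the cursor only moves when buckets were returned.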

Idempotent event deduplication

The system SHALL deduplicate telemetry events using an idempotency hash (SHA-256 of provider:org_id:model:bucket_start_hour) with an upsert strategy (last-write-wins). Token counts MAY be updated on re-poll; the hash, model, and bucket_start SHALL never be modified after creation.

Scenario: First ingestion of a usage bucket

WHEN a new usage bucket is polled with a hash not yet in the database

THEN the system creates a new TelemetryEvent with all token counts and the idempotency hash

Scenario: Re-poll of existing bucket with updated data

WHEN the same bucket is re-polled with a matching idempotency hash but different token counts

THEN the system updates the token counts and raw_payload via upsert without creating a duplicate
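The dedup-and-upsert behavior above can be sketched with an in-memory store standing in for the database table. The hash input follows the `provider:org_id:model:bucket_start_hour` recipe from the spec; the record field names are illustrative.

```python
import hashlib

def idempotency_hash(provider, org_id, model, bucket_start_hour):
    """SHA-256 over the colon-joined identity fields, per the spec."""
    key = f"{provider}:{org_id}:{model}:{bucket_start_hour}"
    return hashlib.sha256(key.encode("utf-8")).hexdigest()

def upsert_event(store, provider, org_id, model, bucket_start_hour, counts, raw):
    """Last-write-wins upsert keyed on the idempotency hash.
    `store` is a dict standing in for the telemetry_events table."""
    h = idempotency_hash(provider, org_id, model, bucket_start_hour)
    if h in store:
        # Identity fields (hash, model, bucket_start) stay immutable;
        # only token counts and the raw payload may change on re-poll.
        store[h]["token_counts"] = counts
        store[h]["raw_payload"] = raw
    else:
        store[h] = {
            "idempotency_hash": h,
            "model": model,
            "bucket_start": bucket_start_hour,
            "token_counts": counts,
            "raw_payload": raw,
        }
    return store[h]
```

In a real database this would be a single `INSERT ... ON CONFLICT (idempotency_hash) DO UPDATE` rather than two code paths.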

Provider-specific token mapping

The system SHALL map Anthropic's three-way token split (input_tokens as uncached, cache_creation_input_tokens, cache_read_input_tokens) and treat all OpenAI and OpenRouter input tokens as uncached.

Scenario: Anthropic token mapping

WHEN Anthropic usage data is ingested

THEN the system maps input_tokens → input_tokens_uncached, cache_creation_input_tokens → input_tokens_cache_creation, and cache_read_input_tokens → input_tokens_cached

Scenario: OpenAI token mapping

WHEN OpenAI usage data is ingested

THEN the system maps all input tokens to input_tokens_uncached with input_tokens_cached and input_tokens_cache_creation set to 0
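The two mapping scenarios can be sketched as one normalization function. The output field names mirror those used in the scenarios; treating every non-Anthropic provider identically is an assumption consistent with the requirement text.

```python
def map_tokens(provider: str, usage: dict) -> dict:
    """Normalize provider usage into the platform's input-token fields."""
    if provider == "anthropic":
        # Anthropic's three-way split maps field-for-field.
        return {
            "input_tokens_uncached": usage.get("input_tokens", 0),
            "input_tokens_cache_creation": usage.get("cache_creation_input_tokens", 0),
            "input_tokens_cached": usage.get("cache_read_input_tokens", 0),
            "output_tokens": usage.get("output_tokens", 0),
        }
    # OpenAI and OpenRouter: all input tokens are treated as uncached.
    return {
        "input_tokens_uncached": usage.get("input_tokens", 0),
        "input_tokens_cache_creation": 0,
        "input_tokens_cached": 0,
        "output_tokens": usage.get("output_tokens", 0),
    }
```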

Trailing reconciliation

The system SHALL run a T+24h trailing reconciliation daily at 03:00 UTC that re-polls the last 24-hour window for all active connections to catch late-arriving provider data.

Scenario: Reconciliation updates stale data

WHEN the daily reconciliation job runs and a provider returns revised token counts for a previously polled bucket

THEN the system upserts the telemetry event with the updated counts and recalculates the associated emissions

Dashboard telemetry queries

The system SHALL provide API endpoints to retrieve telemetry summaries (total CO2, per-model breakdown, daily chart, cached vs. uncached split), paginated event lists, and distinct model aggregations, filterable by date range and project.

Scenario: Summary query with date filter

WHEN an authenticated user sends GET /api/v1/telemetry/summary with start_date and end_date

THEN the system returns aggregated CO2, per-model breakdown, daily chart, and cached vs. uncached split for the specified period

Scenario: Summary filtered by project

WHEN the user adds a project_id query parameter

THEN the system returns telemetry only for workloads belonging to that project


Emissions Engine

Token to CO2 calculation

The system SHALL calculate carbon emissions for each telemetry event using a three-phase pipeline: tokens → energy (joules), energy → kWh, kWh → CO2 (kg). The calculation SHALL account for prefill (uncached input), decode (output), and cached (cache_read) token phases separately, applying different energy-per-token rates for each.

Scenario: Standard emissions calculation

WHEN a telemetry event is ingested with uncached input tokens, output tokens, and zero cached tokens

THEN the system calculates energy as (uncached × prefill_rate + output × decode_rate), converts to kWh, multiplies by grid_intensity × PUE, and stores a CarbonCalculation record

Scenario: Cached tokens produce lower emissions

WHEN a telemetry event includes cached tokens (Anthropic cache_read)

THEN the cached phase uses approximately 10% of the prefill energy rate, producing materially lower CO2 than equivalent uncached usage
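The three-phase pipeline above can be sketched numerically. The per-token energy rates and the rate-table shape are illustrative assumptions (real values would come from the CarbonFactors table), and cache-creation tokens are omitted because the scenarios here only name the prefill, decode, and cached phases.

```python
J_PER_KWH = 3_600_000  # joules per kilowatt-hour

def calculate_emissions(tokens, rates, grid_intensity_kg_per_kwh, pue):
    """Three-phase pipeline: tokens -> joules -> kWh -> kg CO2.
    The cached phase uses 10% of the prefill rate, per the spec."""
    joules = (
        tokens["input_tokens_uncached"] * rates["prefill_j_per_token"]
        + tokens["output_tokens"] * rates["decode_j_per_token"]
        + tokens["input_tokens_cached"] * rates["prefill_j_per_token"] * 0.10
    )
    kwh = joules / J_PER_KWH
    co2_kg = kwh * grid_intensity_kg_per_kwh * pue
    return {"energy_kwh": kwh, "co2_kg": co2_kg}
```

With 1,000 uncached and 1,000 cached input tokens at the same prefill rate, the cached tokens contribute only a tenth of the energy, which is the "materially lower CO2" effect the second scenario asserts.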

Versioned carbon factors

The system SHALL use a versioned CarbonFactors lookup table to map models to tiers via fnmatch glob patterns. Each factors version SHALL be immutable: new versions are appended, never mutated. Calculations SHALL reference the factors_version used for reproducibility.

Scenario: Model tier mapping via glob

WHEN a telemetry event for model "gpt-4o-2024-05-13" is calculated

THEN the system matches the model name against CarbonFactors patterns (e.g., gpt-4* → tier_3) and applies the corresponding energy rates

Scenario: Unknown model falls back to medium tier

WHEN a telemetry event references a model that matches no glob pattern

THEN the system applies the medium energy tier as a conservative default
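Glob-based tier resolution can be sketched with the standard-library `fnmatch` module the spec names. The pattern table and tier labels here are illustrative; real entries would be loaded from the versioned CarbonFactors table.

```python
from fnmatch import fnmatch

# Illustrative pattern table; first match wins, so order matters.
TIER_PATTERNS = [
    ("gpt-4*", "tier_3"),
    ("claude-3-opus*", "tier_3"),
    ("claude-3-haiku*", "tier_1"),
]
DEFAULT_TIER = "tier_2"  # the "medium" conservative fallback

def resolve_tier(model: str) -> str:
    """Return the tier for the first glob pattern the model matches,
    falling back to the medium tier for unknown models."""
    for pattern, tier in TIER_PATTERNS:
        if fnmatch(model, pattern):
            return tier
    return DEFAULT_TIER
```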

Uncertainty bounds

The system SHALL calculate uncertainty bounds for each emission estimate as co2_kg × (1 ± uncertainty_pct/100), stored as co2_lower_bound_kg and co2_upper_bound_kg on the CarbonCalculation record.

Scenario: Uncertainty bounds stored

WHEN a CarbonCalculation is created with co2_kg = 0.10 and uncertainty_pct = 30

THEN co2_lower_bound_kg = 0.07 and co2_upper_bound_kg = 0.13
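The bound formula is a direct computation; a sketch, with rounding added only to keep floating-point noise out of the stored values:

```python
def uncertainty_bounds(co2_kg: float, uncertainty_pct: float) -> tuple:
    """Symmetric bounds: co2_kg * (1 ± uncertainty_pct / 100)."""
    delta = co2_kg * uncertainty_pct / 100
    return (round(co2_kg - delta, 10), round(co2_kg + delta, 10))
```

For the scenario's inputs, `uncertainty_bounds(0.10, 30)` yields `(0.07, 0.13)`.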


Project Organization

Project CRUD

The system SHALL allow users to create, list, retrieve, update, and delete projects within their organization. Project names SHALL be unique per organization. Each organization SHALL have an auto-created "Default" project.

Scenario: Create project

WHEN a user sends POST /api/v1/projects with name "Production App"

THEN the system creates the project and returns its ID, name, and is_default: false

Scenario: Duplicate project name rejected

WHEN a user creates a project with a name that already exists in the organization

THEN the system returns HTTP 409

Scenario: Delete project with active workloads rejected

WHEN a user sends DELETE on a project that still has active workloads

THEN the system returns HTTP 409 requiring workloads be remapped first

Scenario: Default project cannot be deleted

WHEN a user sends DELETE on the Default project

THEN the system returns HTTP 400

Auto-create default project

The system SHALL auto-create a "Default" project (is_default: true) when a connection is registered without a project_id. All connections without explicit project assignment SHALL route to the Default project.

Scenario: Default project auto-creation

WHEN a user creates a connection without specifying project_id and no Default project exists

THEN the system creates a Default project, creates a Workload under it, and links the connection

Per-project analytics

The system SHALL provide project detail endpoints showing only that project's telemetry: per-model breakdown, daily chart, CO2 totals, and associated connections.

Scenario: Project detail with telemetry

WHEN a user sends GET /api/v1/projects/{id} with date filters

THEN the system returns the project's connections and telemetry summary scoped only to that project's workloads

Data export with project filter

The system SHALL support CSV and JSON export of telemetry data, filterable by project. Exports SHALL be available on all tiers (free and paid).

Scenario: CSV export filtered by project

WHEN a user sends GET /api/v1/export/telemetry?format=csv&project_id={id}

THEN the system returns a CSV file containing only telemetry events from the specified project's workloads, with a project_name column


Billing & Subscriptions

Stripe subscription tiers

The system SHALL support five billing tiers (Free, Starter, Growth, Scale, Enterprise) managed via Stripe subscriptions. Free-tier users SHALL access analytics without payment. Paid-tier users SHALL receive automatic credit retirement and receipts.

Scenario: Free tier by default

WHEN a new organization is created

THEN the organization's plan_tier is set to "free" with full analytics access and no receipts

Scenario: Upgrade to paid tier

WHEN a user sends POST /api/v1/billing/upgrade with plan "starter"

THEN the system creates a Stripe Checkout session and returns the checkout URL

Scenario: Enterprise tier requires contact

WHEN a user requests upgrade to "enterprise"

THEN the system returns a contact form URL instead of a checkout session

Billing period lifecycle

The system SHALL maintain monthly billing periods with a state machine (open → closing → closed → failed). The "closing" state SHALL be triggered by Stripe's invoice.payment_succeeded webhook. Receipt generation SHALL be deferred until T+48h after closing to allow reconciliation.

Scenario: Payment succeeded triggers closing

WHEN Stripe sends an invoice.payment_succeeded webhook

THEN the system transitions the billing period to "closing" and schedules a billing close job deferred by 48 hours

Scenario: Payment failed marks period as failed

WHEN Stripe sends an invoice.payment_failed webhook

THEN the system transitions the billing period to "failed" without retiring credits

Scenario: Subscription cancelled downgrades to free

WHEN Stripe sends a customer.subscription.deleted webhook

THEN the system updates the organization's plan_tier to "free"

Billing status endpoint

The system SHALL provide an endpoint to view current billing status including plan tier, current period, and past periods with receipt references.

Scenario: View billing status

WHEN an authenticated user sends GET /api/v1/billing/status

THEN the system returns plan_tier, current open period with running CO2 total, and past periods with receipt serial numbers

Customer portal access

The system SHALL provide a link to Stripe's Customer Portal for subscription management (plan changes, payment method updates, cancellation).

Scenario: Get portal link

WHEN a user sends POST /api/v1/billing/portal

THEN the system returns a Stripe Customer Portal session URL


Carbon Receipts

Credit retirement and receipt generation

The system SHALL retire carbon credits from internal inventory at billing period close, generate a receipt with serial number (format CL-YYYYMM-XXXXX), sign it with Ed25519, produce a branded PDF, and store the receipt with credit serial numbers.

Scenario: Billing period closes with credit retirement

WHEN the T+48h billing close job runs for a period in "closing" status

THEN the system aggregates CO2 for the period, retires equivalent credits from inventory, assigns a serial number, signs the receipt, generates a PDF, transitions the period to "closed", and stores the receipt

Scenario: Insufficient credit inventory

WHEN the billing close job runs but credit inventory is insufficient

THEN the system marks the period as "failed", alerts operations, and defers the receipt

Ed25519 receipt signing

The system SHALL sign each receipt using Ed25519 (PyNaCl) with SHA-256 payload hashing. The signing flow SHALL serialize the receipt payload as canonical JSON (sorted keys), hash the payload with SHA-256, and sign the hash with the Ed25519 private key. Each receipt SHALL store the signature, payload_hash, public_key, and key_version to support key rotation.

Scenario: Receipt signing produces valid signature

WHEN a receipt is generated

THEN the stored signature verifies successfully against the payload_hash using the stored public_key via any Ed25519 library

Scenario: Key rotation preserves old receipt validity

WHEN the signing key is rotated to a new version

THEN old receipts remain verifiable using their stored public_key and key_version

Public receipt verification

The system SHALL expose a public endpoint (no authentication) at GET /public/receipts/verify/{serial_number} that returns the receipt metadata, signature, public_key, and verification instructions. This endpoint SHALL be rate-limited to 60 requests per minute per IP.

Scenario: Public verification of valid receipt

WHEN anyone sends GET /public/receipts/verify/CL-202603-00001

THEN the system returns the receipt metadata, signature, public_key, key_version, and verification instructions with verified: true

Scenario: Invalid serial number

WHEN a non-existent serial number is queried

THEN the system returns HTTP 404

Receipt listing and PDF download

The system SHALL provide authenticated endpoints to list receipts (paginated) and download individual receipt PDFs. Free-tier users SHALL receive HTTP 403 with an upgrade prompt.

Scenario: List receipts for paid org

WHEN a paid-tier user sends GET /api/v1/receipts

THEN the system returns paginated receipts with serial numbers, CO2 retired, credit references, and PDF URLs

Scenario: Free-tier user denied receipts

WHEN a free-tier user sends GET /api/v1/receipts

THEN the system returns HTTP 403 with an upgrade URL

Prior-period adjustments

The system SHALL handle late-arriving telemetry that falls in a closed billing period by creating a PriorPeriodAdjustment entry linked to the next open period. The adjustment SHALL include the CO2 delta and reason.

Scenario: Late telemetry after period close

WHEN a telemetry event is ingested with event_timestamp falling in a closed billing period

THEN the system creates a PriorPeriodAdjustment entry with co2_delta_kg and reason "late_telemetry" linked to the next open period


Audit & Compliance

Monthly audit pack generation

The system SHALL generate monthly audit packs (zip files) for paid-tier organizations on the 3rd of each month. Each pack SHALL contain all receipts (PDF), calculation breakdowns (JSON), retirement confirmations, and the methodology version used, with a manifest file whose SHA-256 hash is stored for integrity verification.

Scenario: Audit pack generated on schedule

WHEN the monthly audit pack cron runs on the 3rd

THEN the system iterates paid-tier organizations with closed periods in the prior month, generates a zip with receipts and calculations, uploads to S3, and stores an AuditPack record with the manifest hash

Scenario: No closed periods for org in prior month

WHEN the audit pack cron runs for an org with no closed billing periods in the prior month

THEN the system skips that organization without creating an audit pack
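The pack-plus-manifest construction can be sketched with the standard library. The manifest layout (per-file SHA-256 hashes in a `manifest.json`) is an illustrative assumption consistent with the requirement that the manifest's hash is stored for integrity checks.

```python
import hashlib
import io
import json
import zipfile

def build_audit_pack(files: dict) -> tuple:
    """Zip the pack contents plus a manifest of per-file SHA-256 hashes.
    `files` maps archive names to bytes. Returns (zip_bytes, manifest_sha256),
    the latter being what the AuditPack record would store."""
    manifest = {
        name: hashlib.sha256(data).hexdigest()
        for name, data in sorted(files.items())
    }
    manifest_bytes = json.dumps(manifest, sort_keys=True, indent=2).encode("utf-8")
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        for name, data in sorted(files.items()):
            zf.writestr(name, data)
        zf.writestr("manifest.json", manifest_bytes)
    return buf.getvalue(), hashlib.sha256(manifest_bytes).hexdigest()
```

On download, re-hashing each extracted file and comparing against `manifest.json`, then hashing the manifest itself against the stored value, verifies the "contained files match the stored manifest" scenario.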

Audit pack download

The system SHALL provide an authenticated endpoint to download audit packs by year and month. Free-tier organizations SHALL receive HTTP 403.

Scenario: Download audit pack

WHEN a paid-tier user sends GET /api/v1/export/audit-pack/2026/02

THEN the system returns the zip file and the contained files match the stored manifest

Scenario: Audit pack not yet generated

WHEN a user requests an audit pack for a month that has not been generated yet

THEN the system returns HTTP 404

Scenario: Free-tier user denied

WHEN a free-tier user requests an audit pack download

THEN the system returns HTTP 403


Health & Observability

Health endpoint

The system SHALL expose a public /health endpoint (no authentication) that checks database connectivity, Redis availability, and the recency of the last poll timestamp. The endpoint SHALL return HTTP 200 when all checks pass and HTTP 503 when any check fails, for integration with load balancer health checks.

Scenario: All systems healthy

WHEN a client sends GET /health and database, Redis, and last poll are all nominal

THEN the system returns HTTP 200 with status "healthy" and individual check results with latency

Scenario: Redis unavailable

WHEN Redis is unreachable

THEN the system returns HTTP 503 with status "degraded" and the Redis check marked as "error"

Scenario: Stale polling detected

WHEN the last poll timestamp is older than 90 minutes

THEN the last_poll check status is "warning"; if older than 180 minutes, status is "error"
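The staleness thresholds in this scenario map directly to a small check function; a sketch with the 90- and 180-minute cutoffs from the spec:

```python
from datetime import datetime, timedelta, timezone

WARN_AFTER = timedelta(minutes=90)
ERROR_AFTER = timedelta(minutes=180)

def last_poll_status(last_polled_at: datetime, now: datetime = None) -> str:
    """Map polling staleness onto a health-check status."""
    now = now or datetime.now(timezone.utc)
    age = now - last_polled_at
    if age > ERROR_AFTER:
        return "error"
    if age > WARN_AFTER:
        return "warning"
    return "ok"
```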

Structured logging with correlation IDs

The system SHALL use structured logging (JSON format) with correlation IDs that propagate across API requests and background jobs. Sensitive fields (API keys, Bearer tokens) SHALL be redacted from all log output.

Scenario: API request logging

WHEN an API request is processed

THEN the system logs the request with a unique correlation_id that appears in all related log entries (request, service calls, database queries, response)

Scenario: Background job logging

WHEN a background job executes (polling, reconciliation, billing close)

THEN the system logs with a job-specific correlation_id linking all operations within that job

Scenario: Sensitive field redaction

WHEN a log entry would contain an API key (sk-, sk-ant-) or Bearer token

THEN the system redacts the value, replacing it with [REDACTED]
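The redaction scenario can be sketched as a regex filter applied to log messages before emission. The exact token alphabets are assumptions for illustration; only the `sk-`, `sk-ant-`, and `Bearer` prefixes come from the spec. Ordering the `sk-ant-` alternative first keeps it from being partially consumed by the shorter `sk-` pattern.

```python
import re

# Matches Anthropic-style keys, OpenAI-style keys, and Bearer tokens.
SENSITIVE = re.compile(r"(sk-ant-[A-Za-z0-9_-]+|sk-[A-Za-z0-9_-]+|Bearer\s+\S+)")

def redact(message: str) -> str:
    """Replace any matched credential substring with [REDACTED]."""
    return SENSITIVE.sub("[REDACTED]", message)
```

In a structured-logging setup this would typically run as a processor in the logging pipeline, so redaction applies uniformly to API request logs and background-job logs alike.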