Budgets & Spend Tracking¶
DeltaLLM records usage cost for proxied requests and can stop traffic when a budget is exhausted.
Quick Path¶
For a safe first rollout:
- Make sure each deployed model has pricing metadata
- Set a hard budget on the API key you want to protect
- Send test traffic through the gateway
- Check the Usage & Spend page or spend APIs to confirm cost is being recorded
Example API key with a hard cap:
curl -X POST http://localhost:8000/ui/api/keys \
  -H "Authorization: Bearer YOUR_MASTER_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "key_name": "budget-key",
    "max_budget": 50.0
  }'
When the key reaches its configured budget, DeltaLLM rejects new requests with a budget_exceeded error.
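A client can treat this rejection as a signal to stop retrying. The sketch below is a hypothetical helper, not part of DeltaLLM: it assumes only that the string budget_exceeded appears somewhere in the error body, because the exact status code and response shape are not documented here.

```python
import json


def mentions_budget_exceeded(body_text: str) -> bool:
    """Return True if an error body contains 'budget_exceeded' anywhere.

    Assumption: the response shape is unknown, so this searches every
    key and string value instead of reading one specific field.
    """
    try:
        payload = json.loads(body_text)
    except json.JSONDecodeError:
        # Not JSON: fall back to a plain substring check.
        return "budget_exceeded" in body_text

    def walk(node) -> bool:
        if isinstance(node, str):
            return "budget_exceeded" in node
        if isinstance(node, dict):
            return any(walk(k) or walk(v) for k, v in node.items())
        if isinstance(node, list):
            return any(walk(item) for item in node)
        return False

    return walk(payload)
```

Retrying a budget rejection cannot succeed until the budget is raised or reset, so a caller would typically surface the error instead of backing off and retrying.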
How Spend Is Calculated¶
Spend is based on the pricing metadata attached to the served model deployment.
model_list:
  - model_name: gpt-4o-mini
    deltallm_params:
      provider: openai
      model: openai/gpt-4o-mini
      api_key: os.environ/OPENAI_API_KEY
    model_info:
      input_cost_per_token: 0.00000015
      output_cost_per_token: 0.0000006
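As a worked example of per-token pricing, the sketch below multiplies token counts by the two rates from the config above. The function name and the token counts are illustrative, and the formula (input tokens times input rate plus output tokens times output rate) is the conventional reading of these fields, not a statement of DeltaLLM's internal accounting.

```python
from decimal import Decimal

# Rates copied from the model_info example above.
INPUT_COST_PER_TOKEN = Decimal("0.00000015")
OUTPUT_COST_PER_TOKEN = Decimal("0.0000006")


def request_cost(input_tokens: int, output_tokens: int) -> Decimal:
    """Estimate the cost of one request from its token counts."""
    return (
        input_tokens * INPUT_COST_PER_TOKEN
        + output_tokens * OUTPUT_COST_PER_TOKEN
    )


# Hypothetical request: 1000 input tokens and 500 output tokens.
cost = request_cost(1000, 500)

# How many such requests fit in a max_budget of 50.0 (whole requests only).
requests_in_budget = int(Decimal("50.0") / cost)
```

Here the input side contributes 0.00015 and the output side 0.0003, for 0.00045 per request, so a budget of 50.0 covers roughly 111,000 requests of this size. Decimal is used to avoid floating-point drift on very small per-token rates.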
DeltaLLM also supports other pricing units when they fit the model type:
- input_cost_per_character
- input_cost_per_second
- batch pricing fields such as batch_input_cost_per_token
Budget Levels¶
Hard budgets can be enforced at these levels:
There is also support for team-per-model hard budgets, which is useful when one team can use several models but one of them needs its own cap.
View Spend¶
Use the Admin UI for the fastest view, or query the admin endpoints directly.
Summary:
Breakdown report:
curl "http://localhost:8000/ui/api/spend/report?group_by=model" \
-H "Authorization: Bearer YOUR_MASTER_KEY"
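The same report request can be built from Python with the standard library. This sketch only constructs the request. The helper name is hypothetical, the response format is not described here, and model is the only group_by value shown in this page.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_spend_report_request(
    base_url: str, master_key: str, group_by: str = "model"
) -> Request:
    """Build a GET request for the spend breakdown report."""
    query = urlencode({"group_by": group_by})
    url = f"{base_url.rstrip('/')}/ui/api/spend/report?{query}"
    return Request(
        url,
        headers={"Authorization": f"Bearer {master_key}"},
        method="GET",
    )


req = build_spend_report_request("http://localhost:8000", "YOUR_MASTER_KEY")
# To send it, pass req to urllib.request.urlopen and read the response body.
```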
The legacy master-key spend endpoints also exist under /global/spend, /global/spend/report, /global/spend/keys, and /global/spend/teams.
Soft Budgets and Resets¶
DeltaLLM also supports soft-budget alerting for keys, users, teams, and organizations. A soft budget does not block traffic; it triggers an alert through the configured notification flow.
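The difference between the two budget types can be sketched as a small decision function. This is an illustration, not DeltaLLM's enforcement code: the function name and return values are hypothetical, and the use of "greater than or equal" for the soft threshold is an assumption.

```python
from typing import Optional


def evaluate_budget(
    spend: float,
    max_budget: Optional[float],
    soft_budget: Optional[float],
) -> str:
    """Classify tracked spend as 'block', 'alert', or 'allow'."""
    # Hard budget: traffic stops once spend reaches the cap.
    if max_budget is not None and spend >= max_budget:
        return "block"
    # Soft budget: traffic continues, but an alert is raised (assumed >=).
    if soft_budget is not None and spend >= soft_budget:
        return "alert"
    return "allow"
```

For example, with max_budget of 50.0 and soft_budget of 40.0, spend of 45.0 yields "alert" and requests keep flowing, while spend of 50.0 yields "block".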
Soft-budget notifications are:
- opt-in
- disabled by default
- email-based
- deduplicated within the configured TTL window
To enable them:
general_settings:
  email_enabled: true
  governance_notifications_enabled: true
  budget_notifications_enabled: true
  budget_alert_ttl_seconds: 3600
Budget notifications require email delivery to be configured first.
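Deduplication within a TTL window means an entity that stays over its soft budget produces at most one alert per window. The sketch below illustrates that idea with an in-memory map. The class and its storage are hypothetical and say nothing about how DeltaLLM tracks sent alerts.

```python
from typing import Dict


class AlertDeduplicator:
    """Suppress repeat alerts for the same entity within a TTL window."""

    def __init__(self, ttl_seconds: int = 3600) -> None:
        self.ttl_seconds = ttl_seconds
        self._last_sent: Dict[str, float] = {}

    def should_send(self, entity_id: str, now: float) -> bool:
        """Return True and record the send time if no alert is still live."""
        last = self._last_sent.get(entity_id)
        if last is not None and now - last < self.ttl_seconds:
            return False  # An alert was already sent inside the window.
        self._last_sent[entity_id] = now
        return True
```

With the budget_alert_ttl_seconds value of 3600 shown above, a key that remains over its soft budget would trigger an email at most once per hour under this model.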
If an entity has both budget_duration and budget_reset_at, the runtime can reset tracked spend automatically when the reset window is reached. Durations use a positive integer up to 10000 followed by h, d, or mo, for example 1h, 7d, 30d, and 1mo.
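The duration format can be checked with a short validator. This is a sketch of the rule as stated above. The function name is hypothetical, and rejecting leading zeros such as 01h is an assumption, since the text only requires a positive integer.

```python
import re
from typing import Tuple

# A positive integer (no leading zero, by assumption) followed by h, d, or mo.
_DURATION_RE = re.compile(r"([1-9][0-9]*)(h|d|mo)")


def parse_budget_duration(value: str) -> Tuple[int, str]:
    """Split a duration like '7d' into (7, 'd'), or raise ValueError."""
    match = _DURATION_RE.fullmatch(value)
    if match is None:
        raise ValueError(f"invalid budget duration: {value!r}")
    count = int(match.group(1))
    if count > 10000:
        raise ValueError(f"duration count exceeds 10000: {value!r}")
    return count, match.group(2)
```

Under this sketch, 1h, 7d, 30d, and 1mo all parse, while 0d, 10001h, and 5w are rejected.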
Organization monthly reset is available from the organization create page, organization list edit modal, and organization detail settings. Monthly reset timestamps are UTC. Monthly reset is lazy: it runs when budget enforcement checks the organization after the configured reset time. It clears the tracked organization spend counter and advances the next reset by calendar month. The selected UTC day of month is preserved; if the next month is shorter, the reset clamps for that month only, for example January 30 -> February 28 -> March 30. It does not carry unused budget forward.
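The clamping rule can be illustrated with a small date calculation. This is a sketch of the behavior described above, not DeltaLLM's code: the function name is hypothetical, and the selected day of month is passed in separately so that a clamped month does not change later months.

```python
import calendar
from datetime import datetime, timezone


def next_monthly_reset(current: datetime, anchor_day: int) -> datetime:
    """Advance a reset timestamp by one calendar month.

    anchor_day is the originally selected UTC day of month. If the next
    month is shorter, the day is clamped for that month only.
    """
    year = current.year + (current.month // 12)  # December rolls the year.
    month = current.month % 12 + 1
    last_day = calendar.monthrange(year, month)[1]
    return current.replace(year=year, month=month, day=min(anchor_day, last_day))


jan = datetime(2025, 1, 30, tzinfo=timezone.utc)
feb = next_monthly_reset(jan, anchor_day=30)  # clamped to February 28
mar = next_monthly_reset(feb, anchor_day=30)  # back to March 30
```

Keeping the anchor day separate from the current reset timestamp is what lets the schedule return to the 30th in March instead of drifting to the 28th.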
Organization Soft Budgets¶
Organizations now support soft_budget alongside max_budget.
You can manage organization soft budgets from:
- the organization create flow
- the organization detail page
- the organization admin API
Admin UI¶
The Usage & Spend page is the main operator view for:
- total spend
- request volume
- per-model and per-key breakdowns
- detailed request logs