Observability

DeltaLLM exposes health endpoints, Prometheus metrics, spend views, and callback integrations so you can monitor both gateway behavior and provider traffic.

Quick Path

For a practical first setup:

Check /health after startup
Scrape /metrics from Prometheus
Use the Usage & Spend page for request and cost trends
Add callback integrations only when you need external sinks such as S3, Langfuse, or OpenTelemetry

Health Endpoints

These endpoints are the fastest way to confirm the service is alive and dependencies are reachable.

Endpoint	Purpose
`GET /health`	Combined liveness and readiness view
`GET /health/liveliness`	Process is up
`GET /health/readiness`	Redis and database readiness
`GET /health/deployments`	Deployment health summary
`GET /health/fallback-events`	Recent retry and failover events
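
The endpoints in the table above can be polled from a small script, for example as a post-deploy smoke test. The following is a minimal sketch using only the Python standard library. It assumes that a healthy endpoint answers with a 2xx status and that no authentication header is needed; neither is confirmed here, so adjust to match your deployment. The helper names `http_status` and `probe` are illustrative, not part of DeltaLLM.

```python
import urllib.error
import urllib.request

# Paths taken from the endpoint table above.
HEALTH_PATHS = ["/health/liveliness", "/health/readiness"]


def http_status(url, timeout=5.0):
    """Return the HTTP status code for a GET request, or None if unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as exc:
        return exc.code  # the server answered, but with an error status
    except (urllib.error.URLError, OSError):
        return None  # connection refused, DNS failure, timeout, ...


def probe(base_url, paths=HEALTH_PATHS, fetch=http_status):
    """Map each health path to True when it answers with a 2xx status.

    Assumption: 2xx means healthy. `fetch` is injectable so the logic
    can be exercised without a running gateway.
    """
    results = {}
    for path in paths:
        status = fetch(base_url.rstrip("/") + path)
        results[path] = status is not None and 200 <= status < 300
    return results
```

Calling `probe("http://localhost:8000")` returns a dictionary keyed by path, with `False` for any endpoint that is unreachable or answers with a non-2xx status. The base URL is an assumption; use the address where your gateway listens.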

Enable background deployment checks if you want proactive health updates:

general_settings:
  background_health_checks: true
  health_check_interval: 300

Prometheus Metrics

Metrics are exposed at /metrics in Prometheus format.

Example scrape config:

scrape_configs:
  - job_name: deltallm
    scrape_interval: 15s
    static_configs:
      - targets: ["localhost:8000"]
    metrics_path: /metrics

Core metrics include:

Metric	Type	Meaning
`deltallm_requests_total`	Counter	Total proxied requests
`deltallm_request_failures_total`	Counter	Failed requests by error type
`deltallm_input_tokens_total`	Counter	Input tokens processed
`deltallm_output_tokens_total`	Counter	Output tokens processed
`deltallm_spend_total`	Counter	Recorded spend
`deltallm_cache_hit_total`	Counter	Cache hits
`deltallm_cache_miss_total`	Counter	Cache misses
`deltallm_request_total_latency_seconds`	Histogram	End-to-end latency
`deltallm_llm_api_latency_seconds`	Histogram	Provider-only latency
`deltallm_deployment_state`	Gauge	Deployment health state
`deltallm_deployment_active_requests`	Gauge	In-flight requests per deployment
`deltallm_deployment_cooldown`	Gauge	Whether a deployment is cooled down
`deltallm_prompt_resolutions_total`	Counter	Prompt registry resolution results
`deltallm_prompt_resolution_latency_seconds`	Histogram	Prompt resolution latency
`deltallm_audit_queue_depth`	Gauge	Audit ingestion backlog
`deltallm_audit_write_failures_total`	Counter	Audit write failures
`deltallm_audit_events_dropped_total`	Counter	Dropped audit events
`deltallm_audit_ingestion_latency_seconds`	Histogram	Audit write latency
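
For a quick look without a Prometheus server, the counters above can be read straight from the `/metrics` response body. The sketch below parses the plain-text format with a simple regular expression, sums each metric across all label sets, and derives two ratios. It assumes that `deltallm_requests_total` also counts failed requests, which is not confirmed here. The function names are illustrative, and the parser is deliberately simplified rather than a full implementation of the exposition format.

```python
import re
from collections import defaultdict

# Matches sample lines such as: name{label="v"} 12.5   or   name 3
SAMPLE_RE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{.*\})?\s+(\S+)')


def sum_by_metric(exposition_text):
    """Sum sample values per metric name, collapsing all label sets."""
    totals = defaultdict(float)
    for line in exposition_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # skip HELP/TYPE comments
            continue
        match = SAMPLE_RE.match(line)
        if match is None:
            continue
        try:
            totals[match.group(1)] += float(match.group(2))
        except ValueError:
            continue  # ignore values that are not numeric
    return dict(totals)


def ratio(numerator, denominator):
    """Return numerator / denominator, or None when the denominator is zero."""
    return numerator / denominator if denominator else None


def summarize(totals):
    """Derive cache-hit and failure ratios from the summed counters."""
    hits = totals.get("deltallm_cache_hit_total", 0.0)
    misses = totals.get("deltallm_cache_miss_total", 0.0)
    requests = totals.get("deltallm_requests_total", 0.0)
    failures = totals.get("deltallm_request_failures_total", 0.0)
    return {
        "cache_hit_ratio": ratio(hits, hits + misses),
        # Assumption: requests_total includes failed requests.
        "failure_ratio": ratio(failures, requests),
    }
```

Because counters are cumulative, these ratios cover the whole period since the counters last reset. For rates over a time window, compare two scrapes or query Prometheus instead.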

Callback Integrations

DeltaLLM supports built-in callback integrations for:

prometheus
langfuse
otel
opentelemetry
s3

Example S3 logging:

deltallm_settings:
  success_callback:
    - s3
  callback_settings:
    s3:
      bucket: os.environ/DELTALLM_S3_BUCKET
      region: us-east-1
      prefix: deltallm-logs/
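
The `bucket` value above uses an `os.environ/NAME` reference. Assuming this means the value is read from the environment variable `NAME` when the config is loaded, a preflight check can catch unset variables before the gateway starts. The sketch below scans the raw config text for such references; the function name `missing_env_refs` is hypothetical and not part of DeltaLLM.

```python
import os
import re

# Assumption: a value written as os.environ/NAME is read from the
# environment variable NAME when the config is loaded.
ENV_REF_RE = re.compile(r"os\.environ/([A-Za-z_][A-Za-z0-9_]*)")


def missing_env_refs(config_text, environ=None):
    """Return referenced variable names that are unset or empty, sorted."""
    env = os.environ if environ is None else environ
    names = set(ENV_REF_RE.findall(config_text))
    return sorted(name for name in names if not env.get(name))
```

Passing the contents of the config file to `missing_env_refs` returns an empty list when every referenced variable is set, so the result can gate a startup script.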

Message Logging Privacy

To keep request and response message content out of the standard logging payloads, turn off message logging:

deltallm_settings:
  turn_off_message_logging: true

Spend, token, and metadata tracking continue unaffected.