DeltaLLM¶
An open-source LLM gateway that provides a unified API for multiple LLM providers with enterprise-grade features.
DeltaLLM acts as a proxy between your applications and LLM providers like OpenAI, Anthropic, Azure OpenAI, Groq, and more. It gives you a single OpenAI-compatible API while adding powerful features on top.
Key Features¶
- Unified API — One OpenAI-compatible endpoint for 100+ LLM providers and models
- Virtual API Keys — Issue scoped keys with budgets, rate limits, and model restrictions
- Routing & Failover — Multiple routing strategies with automatic failover and retries
- MCP Gateway & Tooling — Register external MCP servers, expose approved tools, and bridge them into chat flows
- Guardrails — Built-in PII detection and prompt injection protection
- Spend Tracking — Per-key, per-team, and per-model cost attribution
- Rate Limiting — Hierarchical limits at organization, team, user, and key levels
- Caching — Response caching with memory, Redis, or S3 backends
- Batch API — Asynchronous embedding and chat workloads with production worker sizing guidance
- RBAC — Role-based access control with platform, organization, and team scopes
- Authentication — Session-based login, invitations, password recovery, MFA, and SSO (Microsoft Entra, Google, Okta, OIDC)
- Email Lifecycle — Durable outbox-backed delivery for invitations, password reset, and operator test email
- Admin Dashboard — Full-featured web UI for managing the gateway
- Observability — Prometheus metrics, request logging, and spend analytics
How It Works¶
┌──────────────┐ ┌──────────────────────────────────┐ ┌──────────────┐
│ │ │ DeltaLLM │ │ OpenAI │
│ Your App │────▶│ Auth → Rate Limit → Guardrails │────▶│ Anthropic │
│ (OpenAI SDK)│◀────│ Route → Cache → Provider Call │◀────│ Azure │
│ │ │ Spend Track → Callbacks │ │ Groq ... │
└──────────────┘ └──────────────────────────────────┘ └──────────────┘
Your applications use the standard OpenAI SDK — just change the base_url to point at DeltaLLM. The gateway handles authentication, routing, reliability, and cost tracking transparently.
Quick Links¶
- Docker Compose — Fastest way to run DeltaLLM locally
- Installation — Full local setup for development and contribution
- Quick Start — Use the gateway with curl, Python, and JavaScript
- MCP Quick Start — Register a server, expose a tool, and test
/mcp - Configuration Reference — Starter
config.yamland full settings reference - Batch API & Production Setup — Async batch API usage, scheduler behavior, and production worker sizing
- API Reference — OpenAI-compatible and admin API endpoints
- Admin UI Guide — Managing the gateway through the web dashboard