Models¶
The Models page is where operators create and manage concrete provider-backed deployments.
Each deployment defines:
- the public model name clients will call
- the provider identity used for routing visibility, spend reports, callbacks, and dashboards
- the upstream provider model ID
- credentials and connection details, either inline or through a shared named credential
- the workload type, such as chat or embeddings
- optional pricing and default request parameters

Quick Success Workflow¶
- Open AI Gateway > Models
- Add one deployment for a model you already have provider access to
- Confirm the deployment becomes healthy
- Verify the model appears in
GET /v1/models - Send a test proxy request
What You Manage Here¶
model_name: the public name clients usedeployment_id: the stable internal identifier for this deployment- provider settings in
deltallm_params - workload mode in
model_info.mode - access groups in
model_info.access_groups - pricing metadata for spend tracking
- default request parameters
Recommended First Deployment¶
For a simple first deployment:
- Set
model_nameto the public name you want clients to use - Choose the provider
- Set
deltallm_params.modelto the upstream model ID - Add the provider API key inline, or select a shared named credential
- Keep the default mode as
chatunless this is an embeddings, image, audio, or rerank model
If you do not set a deployment_id, DeltaLLM creates one automatically.
Chat Batch Execution¶
For chat deployments, the model form includes Batch Execution controls in the Chat Settings section.
- Leave Mode as Default concurrent for the standard behavior: one upstream request per batch item, bounded by worker concurrency
- Select Concurrent when you want to persist an explicit per-deployment
max_in_flightlimit - Select Sync microbatch only for providers with a proven synchronous microbatch API that returns one response and exact usage per input item
- Select Disabled when the deployment should never use chat microbatch grouping
Blank numeric fields are treated as unset. For example, leaving Max
In-Flight empty does not send 0; it leaves that limit to the worker default.
When Mode is Default concurrent and all chat batching numeric fields are
empty, the UI clears any stored deltallm_params.chat_batching override. If
Max In-Flight is set while Mode remains Default concurrent, the UI saves
an explicit mode: concurrent override with that limit. Disabled ignores
numeric batching fields. If Sync microbatch is selected, Upstream Max
Size is required and must be at least 2.
Access Groups¶
The model form includes an Access Groups field for authorization grouping. Enter group keys such as beta or support when scopes should be able to grant access to a set of callable targets instead of selecting each model separately.
Access groups are attached to the public model name, not a single provider deployment. If several deployments share the same model_name, keep their access group lists identical so group expansion remains deterministic.
Do not use access groups for routing. Deployment tags remain routing metadata and can be matched by request metadata.tags; tags do not make a model visible to an organization, team, key, or user.
What the Table Tells You¶
- Model Name: the public runtime model name
- Provider: explicit deployment provider such as OpenAI or Groq
- Type: runtime mode such as
chatorembedding - Deployment ID: the internal ID used by route groups and policies
- Health: whether the runtime currently sees the deployment as healthy
When You Need Route Groups¶
You do not need a route group for a single deployment.
Create a route group when:
- you want multiple deployments behind one logical target
- you want explicit routing policy
- you want controlled failover behavior
- you want to bind prompt behavior at the route-group level
Custom Upstream Auth Headers¶
For these OpenAI-compatible providers, the model form supports inline upstream auth-header overrides:
openaiopenroutergroqtogetherfireworksdeepinfraperplexityvllmlmstudioollama
In the model form:
- Choose Inline credentials
- Enter
api_keyand anyapi_base - Expand the provider connection fields
- Fill
Auth Header NameandAuth Header Formatif the upstream does not acceptAuthorization: Bearer ...
Example inline deployment payload:
{
"model_name": "support-vllm",
"deltallm_params": {
"provider": "vllm",
"model": "vllm/meta-llama/Llama-3.1-8B-Instruct",
"api_key": "gateway-key",
"api_base": "https://vllm.example/v1",
"auth_header_name": "X-API-Key",
"auth_header_format": "{api_key}"
},
"model_info": {
"mode": "chat"
}
}
For shared gateway credentials, Named Credentials remain the recommended workflow. If a deployment references a named credential and also carries overlapping connection fields locally, the named credential values win.
Operational Notes¶
- DeltaLLM validates provider and mode compatibility when you create or update a deployment
- The explicit provider is the source of truth for dashboards, spend reports, callbacks, and metrics
- Shared provider credentials are best managed through Named Credentials
- Creating, updating, and deleting deployments requires admin access
- Readable deployment IDs make later route-group work easier
- Visibility to organizations, teams, keys, and users is governed through callable-target bindings, access-group bindings, and scope policies