Router Settings¶
Use router_settings for the gateway-wide defaults that control deployment selection, retries, timeouts, and aliases.
Quick Path¶
Start with a small, predictable config:
router_settings:
routing_strategy: simple-shuffle
num_retries: 1
retry_after: 1
timeout: 600
cooldown_time: 60
allowed_fails: 0
That is enough for most first deployments.
Tip: check the effective allowed_fails value in the config your deployment actually loads. Depending on how you run DeltaLLM, the source of truth is usually your config.yaml from DELTALLM_CONFIG_PATH, your Helm values.yaml, or the rendered Kubernetes ConfigMap.
Reference¶
| Setting | Default | What it controls |
|---|---|---|
routing_strategy |
simple-shuffle |
Global default strategy for choosing a deployment |
num_retries |
0 |
Extra retry attempts after the first failure |
retry_after |
0 |
Base backoff delay in seconds |
timeout |
600 |
Request timeout in seconds |
cooldown_time |
60 |
Seconds a failing deployment stays out of rotation |
allowed_fails |
0 |
Failures allowed before cooldown starts |
enable_pre_call_checks |
false |
Skip deployments already over configured RPM or TPM metadata |
model_group_alias |
{} |
Friendly names that map to real model groups |
route_groups |
[] |
File-defined route groups and membership |
Supported Strategies¶
These strategy names are valid today:
simple-shuffleleast-busylatency-based-routingcost-based-routingusage-based-routingtag-based-routingpriority-based-routingweightedrate-limit-aware
Short version:
simple-shuffle: best default when deployments are equivalentweighted: use for planned traffic splitspriority-based-routing: use for primary and standby routingleast-busy: use for burst balancinglatency-based-routing: use for latency-sensitive trafficcost-based-routing: use for lowest-cost routingusage-based-routing: use to spread quota usagerate-limit-aware: use to avoid hot deployments near RPM or TPM capstag-based-routing: use when tags decide eligibility and you want the route group to make that explicit
For non-text workloads, usage-aware routing can also use these deployment fields when they are configured:
image_pm_limitaudio_seconds_pm_limitchar_pm_limitrerank_units_pm_limit
See Routing & Failover for the full behavior and setup examples.
Route-Group Policy Support¶
Route-group policies currently support:
modestrategymemberstimeouts.global_msortimeouts.global_secondsretry.max_attemptsretry.retryable_error_classes
Helpful shortcut modes:
weightedmaps toweightedfallbackmaps topriority-based-routing
Do not treat conditional or adaptive as active runtime policy behaviors today.
Fallback Configuration¶
Fallback chains live under deltallm_settings, not router_settings:
deltallm_settings:
fallbacks:
- gpt-4o:
- gpt-4o-mini
context_window_fallbacks:
- gpt-4o-mini:
- gpt-4o
content_policy_fallbacks:
- gpt-4o:
- claude-3-sonnet
Model Aliases¶
Aliases let clients request a stable name while you map it to the real group: