Route Groups

Route Groups let you place multiple deployments behind one stable runtime target.

Use a route group when one public model name should:

balance across several deployments
fail over in a controlled way
carry its own routing policy
bind to a prompt at the group level

[Screenshot: Route Groups List]

[Screenshot: Route Group Detail]

Quick Success Workflow

1. Create the route group shell
2. Add one or more member deployments
3. Keep the default routing behavior at first
4. Mark the group live
5. Use the generated call example to test traffic

For most teams, this is the right first path. You do not need an advanced policy on day one.

What the Group Owns

A route group defines:

a stable group key
the workload type, such as chat or embeddings
which deployments are members
whether the group is live
optional prompt binding
optional routing policy history and overrides

Route groups are also callable targets. Their runtime visibility is governed through the same callable-target bindings and scope policies used for public model names.
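
As a rough mental model, the properties listed above can be pictured as a small record. The sketch below is illustrative only: the class name, field names, and defaults are assumptions, not the product's actual schema.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class RouteGroup:
    """Hypothetical shape of a route group; all names here are illustrative."""
    group_key: str                       # stable key that callers target
    workload_type: str                   # for example "chat" or "embeddings"
    member_deployment_ids: List[str] = field(default_factory=list)
    live: bool = False                   # assumed default: not live until marked
    prompt_binding: Optional[str] = None # optional prompt bound at group level
    routing_policy: Optional[dict] = None  # optional; None means default routing


# Mirrors the quick workflow: create the shell, add members, mark it live.
group = RouteGroup(group_key="support-chat", workload_type="chat")
group.member_deployment_ids += ["dep-primary", "dep-standby"]
group.live = True
```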

What the List Page Shows

group key and display name
workload type
whether the group is live
member count
current routing state

What the Detail Page Lets You Do

edit the basic group metadata
add and remove member deployments
see the current usage example for calling the group
bind a prompt
inspect and publish routing policy changes

When You Need an Advanced Policy

Start with the default behavior unless you need one of these:

ordered failover
weighted traffic splits
a specific routing strategy
a draft, publish, rollback, or simulation workflow for routing changes

Routing Policy Basics

A route-group policy should stay small and explicit.

In practice, that means:

choose one routing strategy
optionally override member enabled, weight, or priority
optionally override the group timeout
optionally override retry behavior

The supported policy fields today are:

mode
strategy
members
timeouts.global_ms or timeouts.global_seconds
retry.max_attempts
retry.retryable_error_classes
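
One way to keep a policy small and explicit is to check it against the field list above before submitting it. This is a client-side sketch only; the function name is hypothetical, and the backend's own validation may behave differently.

```python
# Field names taken from the supported list above.
ALLOWED_TOP_LEVEL = {"mode", "strategy", "members", "timeouts", "retry"}
ALLOWED_NESTED = {
    "timeouts": {"global_ms", "global_seconds"},
    "retry": {"max_attempts", "retryable_error_classes"},
}


def unknown_policy_fields(policy):
    """Return the dotted paths of fields that are not in the supported list."""
    unknown = [key for key in policy if key not in ALLOWED_TOP_LEVEL]
    for section, allowed in ALLOWED_NESTED.items():
        nested = policy.get(section) or {}
        unknown += [f"{section}.{key}" for key in nested if key not in allowed]
    return sorted(unknown)


ok = unknown_policy_fields({"strategy": "weighted", "timeouts": {"global_ms": 500}})
bad = unknown_policy_fields({"mode": "weighted", "retry": {"backoff": 2}, "canary": True})
```

Here `ok` is an empty list, while `bad` names the two unsupported fields, `canary` and `retry.backoff`.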

Policy Modes

The UI can present policy modes as shortcuts:

weighted: use weighted traffic splitting
fallback: use ordered primary and standby behavior

How they behave:

weighted maps to the weighted strategy if you do not set a strategy explicitly
fallback maps to priority-based-routing if you do not set a strategy explicitly

Do not plan around these as live runtime modes yet:

conditional
adaptive

Those are not active route-policy behaviors in the runtime today.
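
The mode-to-strategy mapping described in this section can be sketched as a small resolver. The function name is hypothetical, and rejecting conditional and adaptive with an error is a choice made for this sketch; how the backend itself treats those modes is not specified here.

```python
MODE_DEFAULT_STRATEGY = {
    "weighted": "weighted",
    "fallback": "priority-based-routing",
}
INACTIVE_MODES = {"conditional", "adaptive"}


def effective_strategy(policy):
    """Resolve the strategy a policy would use under the rules above."""
    mode = policy.get("mode")
    if mode in INACTIVE_MODES:
        # Sketch-only behavior: fail loudly rather than silently ignore.
        raise ValueError(f"mode {mode!r} is not an active runtime mode")
    if policy.get("strategy"):
        return policy["strategy"]  # an explicit strategy wins over the mode
    # None means the policy does not select a strategy (default behavior).
    return MODE_DEFAULT_STRATEGY.get(mode)
```

For example, `effective_strategy({"mode": "fallback"})` returns `"priority-based-routing"`, while a policy that sets both `mode` and `strategy` resolves to the explicit strategy.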

Which Policy Should I Use?

Choose by goal:

use simple-shuffle when the deployments are roughly equal
use weighted when you want a controlled traffic split
use priority-based-routing (or the fallback mode, which maps to it) when one deployment should be primary
use least-busy when you are smoothing burst traffic
use latency-based-routing when end-user latency matters most
use cost-based-routing when cost matters most
use rate-limit-aware when provider limits are the problem

For most teams, one of these three is enough:

simple-shuffle
weighted
priority-based-routing

Simple Policy Examples

Weighted rollout:

{
  "mode": "weighted",
  "members": [
    {"deployment_id": "dep-primary", "weight": 9},
    {"deployment_id": "dep-canary", "weight": 1}
  ]
}
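
To see what split the weighted rollout above implies, the expected traffic shares can be computed by hand. The sketch assumes weights are relative and that each member's share is proportional to its weight; the function name is hypothetical.

```python
def traffic_shares(members):
    """Expected share per member, assuming share is proportional to weight."""
    total = sum(m["weight"] for m in members)
    if total <= 0:
        raise ValueError("total weight must be positive")
    return {m["deployment_id"]: m["weight"] / total for m in members}


shares = traffic_shares([
    {"deployment_id": "dep-primary", "weight": 9},
    {"deployment_id": "dep-canary", "weight": 1},
])
```

Under that assumption, weights of 9 and 1 give `dep-primary` 90% of traffic and `dep-canary` 10%.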

Primary plus standby:

{
  "mode": "fallback",
  "members": [
    {"deployment_id": "dep-primary", "priority": 0},
    {"deployment_id": "dep-standby", "priority": 1}
  ]
}
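
The primary-plus-standby policy above can be read as an ordered list of candidates. The sketch below assumes that a lower priority value is tried first, which matches the example naming dep-primary as 0 but is not confirmed by this page; the function name and the `healthy` set are hypothetical.

```python
def failover_order(members, healthy):
    """Order members by priority (lower first, assumed) and keep healthy ones."""
    ordered = sorted(members, key=lambda m: m["priority"])
    return [m["deployment_id"] for m in ordered if m["deployment_id"] in healthy]


members = [
    {"deployment_id": "dep-standby", "priority": 1},
    {"deployment_id": "dep-primary", "priority": 0},
]
normal = failover_order(members, healthy={"dep-primary", "dep-standby"})
degraded = failover_order(members, healthy={"dep-standby"})
```

With both members healthy the primary comes first; when the primary is unhealthy, only the standby remains.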

Latency-sensitive route group:

{
  "strategy": "latency-based-routing",
  "timeouts": {"global_seconds": 45},
  "retry": {"max_attempts": 1}
}
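
Because the timeout can be given as either timeouts.global_ms or timeouts.global_seconds, tooling that reads policies may want a single unit. A minimal sketch, assuming the two fields are equivalent up to the unit; rejecting a policy that sets both is a choice made here, since this page does not say which would take precedence.

```python
def global_timeout_ms(policy):
    """Return the group timeout in milliseconds, or None if not overridden."""
    timeouts = policy.get("timeouts") or {}
    if "global_ms" in timeouts and "global_seconds" in timeouts:
        # Precedence is undocumented here, so this sketch refuses to guess.
        raise ValueError("set only one of global_ms or global_seconds")
    if "global_ms" in timeouts:
        return int(timeouts["global_ms"])
    if "global_seconds" in timeouts:
        return round(timeouts["global_seconds"] * 1000)
    return None


latency_policy = {
    "strategy": "latency-based-routing",
    "timeouts": {"global_seconds": 45},
    "retry": {"max_attempts": 1},
}
timeout_ms = global_timeout_ms(latency_policy)
```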

Quota-aware route group:

{
  "strategy": "rate-limit-aware"
}

Member Overrides

Member overrides let the group behave differently without editing the underlying deployment definition.

enabled: take a member out of rotation without removing it from the group
weight: change the member's share of traffic under the weighted strategy
priority: control the order in which members are used under priority-based-routing or the fallback mode
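
The idea that overrides change group behavior without touching the deployment definition can be sketched as a non-mutating merge. The function names and the shape of the base member records are assumptions for illustration.

```python
OVERRIDABLE_FIELDS = ("enabled", "weight", "priority")


def apply_member_overrides(base_members, overrides):
    """Return new member records with policy overrides applied on top."""
    by_id = {o["deployment_id"]: o for o in overrides}
    merged_members = []
    for member in base_members:
        override = by_id.get(member["deployment_id"], {})
        merged = dict(member)  # copy, so the base definition stays untouched
        for name in OVERRIDABLE_FIELDS:
            if name in override:
                merged[name] = override[name]
        merged_members.append(merged)
    return merged_members


def in_rotation(members):
    """IDs of members still eligible for traffic (enabled assumed True by default)."""
    return [m["deployment_id"] for m in members if m.get("enabled", True)]


base = [
    {"deployment_id": "dep-primary", "weight": 1},
    {"deployment_id": "dep-canary", "weight": 1},
]
effective = apply_member_overrides(base, [
    {"deployment_id": "dep-primary", "weight": 9},
    {"deployment_id": "dep-canary", "enabled": False},
])
```

After the merge, `dep-canary` is out of rotation but still a member, and `base` is unchanged.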

Good Operating Pattern

Use this workflow:

1. Create the route group and add members
2. Start with simple-shuffle or weighted
3. Simulate before publishing policy changes
4. Publish only after the selection summary looks right
5. Check /health/deployments and /health/fallback-events after rollout
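
The post-rollout check in the last step could be scripted along these lines. Only the two paths come from this page; the base URL, the assumption that the endpoints return JSON, and the function names are hypothetical, and any authentication your deployment requires is omitted.

```python
import json
from urllib.parse import urljoin
from urllib.request import urlopen

# Paths named in the workflow above.
HEALTH_PATHS = ("/health/deployments", "/health/fallback-events")


def health_urls(base_url):
    """Build the full health-check URLs for a given base URL."""
    root = base_url.rstrip("/") + "/"
    return [urljoin(root, path.lstrip("/")) for path in HEALTH_PATHS]


def fetch_health(base_url, timeout=10.0):
    """Fetch both endpoints; assumes each responds with a JSON body."""
    results = {}
    for url in health_urls(base_url):
        with urlopen(url, timeout=timeout) as response:
            results[url] = json.loads(response.read().decode("utf-8"))
    return results


# gateway.example.com is a placeholder host, not a real endpoint.
urls = health_urls("https://gateway.example.com")
```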

The simulation view is especially useful for:

checking weighted splits
confirming fallback order
confirming prompt-derived tag routing
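
Before opening the simulation view, a weighted split can also be sanity-checked locally. The sketch below is not the product's simulator: it draws random selections with Python's `random.choices`, assuming members are picked with probability proportional to weight.

```python
import random
from collections import Counter


def simulate_weighted(members, requests, seed=0):
    """Count how many of `requests` draws land on each member."""
    rng = random.Random(seed)  # seeded so repeated runs are comparable
    ids = [m["deployment_id"] for m in members]
    weights = [m["weight"] for m in members]
    return Counter(rng.choices(ids, weights=weights, k=requests))


counts = simulate_weighted(
    [
        {"deployment_id": "dep-primary", "weight": 9},
        {"deployment_id": "dep-canary", "weight": 1},
    ],
    requests=10_000,
)
primary_share = counts["dep-primary"] / 10_000
```

With weights of 9 and 1, `primary_share` should land close to 0.9; a large deviation suggests the weights are not what you intended.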

Prompt Binding

Prompt binding belongs on the route group because the group decides where a prompt is applied.

That means:

Prompt Registry defines the prompt template and its versions
Route Groups decide which prompt is active for live traffic

If a prompt is bound, the usage example on the page should include the variables needed to call it correctly.
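
A caller can check that it supplies every variable a bound prompt needs before sending traffic. This sketch assumes, purely for illustration, that templates use Python-style `{name}` placeholders; the Prompt Registry's real template syntax may differ, and the function names are hypothetical.

```python
from string import Formatter


def required_variables(template):
    """Names of the {placeholders} found in a template string."""
    return {name for _, name, _, _ in Formatter().parse(template) if name}


def missing_variables(template, provided):
    """Variables the template needs that the caller did not supply."""
    return sorted(required_variables(template) - set(provided))


template = "Hello {customer_name}, your ticket {ticket_id} is open."
missing = missing_variables(template, {"customer_name": "Ada"})
```

Here `missing` contains `ticket_id`, the one variable the call would still need.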

The backend exposes route-group endpoints for:

listing and editing groups
managing group members
reading and publishing routing policy
validating and simulating policy changes

See Admin Endpoints for the route-group API reference.