Proxy Endpoints

DeltaLLM's proxy API is OpenAI-compatible. In most clients, you only change the base_url and API key.

Quick Start

Every proxy endpoint uses the same auth header, Authorization: Bearer YOUR_API_KEY.

Check which models are available:

curl http://localhost:8000/v1/models \
  -H "Authorization: Bearer YOUR_API_KEY"

Send a chat request:

curl http://localhost:8000/v1/chat/completions \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o-mini",
    "messages": [
      {"role": "user", "content": "Hello from DeltaLLM"}
    ]
  }'
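The two curl calls above carry over to any HTTP client. As a minimal sketch using only Python's standard library: build_request and call are hypothetical helper names, and the base URL and key are the same placeholders used in the curl examples.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # placeholder from the curl examples
API_KEY = "YOUR_API_KEY"            # placeholder; substitute a real key


def build_request(path, payload=None):
    """Build an authenticated request: GET when payload is None, else POST JSON."""
    headers = {"Authorization": f"Bearer {API_KEY}"}
    data = None
    if payload is not None:
        headers["Content-Type"] = "application/json"
        data = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(BASE_URL + path, data=data, headers=headers)


def call(path, payload=None):
    """Send the request and decode the JSON response (needs a running proxy)."""
    with urllib.request.urlopen(build_request(path, payload)) as resp:
        return json.load(resp)
```

With a proxy running, call("/v1/models") lists models and call("/v1/chat/completions", {...}) sends the chat body shown above.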

Endpoint Map

Endpoint	Purpose
`POST /v1/chat/completions`	Chat completions, including streaming
`POST /v1/messages`	Anthropic Messages API-compatible subset (see Anthropic Messages)
`POST /v1/completions`	Legacy prompt-style completions
`POST /v1/responses`	Responses API-compatible subset
`POST /v1/embeddings`	Text embeddings
`POST /v1/images/generations`	Image generation
`POST /v1/audio/speech`	Text-to-speech
`POST /v1/audio/transcriptions`	Speech-to-text
`POST /v1/rerank`	Reranking
`GET /v1/models`	Available public model names
`POST /v1/files`	Upload batch input files
`GET /v1/files/{file_id}`	Inspect batch files
`GET /v1/files/{file_id}/content`	Download batch file content
`POST /v1/batches`	Create embedding or non-streaming chat completion batches
`GET /v1/batches`	List batches
`GET /v1/batches/{batch_id}`	Inspect one batch
`POST /v1/batches/{batch_id}/cancel`	Cancel a batch

Text Endpoints

Chat Completions

POST /v1/chat/completions

This is the main endpoint most applications should start with.

Chat requests also support DeltaLLM-managed MCP tools through tools: [{ "type": "mcp", ... }] on non-streaming requests. See MCP Gateway & Tooling.
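The endpoint map lists streaming for chat completions, but this page does not describe the wire format. Assuming DeltaLLM streams in the same server-sent-events shape as the OpenAI API (lines of the form data: {json}, ending with data: [DONE]), a client-side parser might look like the sketch below. iter_stream_text is a hypothetical helper name.

```python
import json


def iter_stream_text(lines):
    """Yield content deltas from an iterable of decoded SSE lines."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and SSE comments
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break  # assumed end-of-stream sentinel, as in the OpenAI API
        chunk = json.loads(data)
        for choice in chunk.get("choices", []):
            text = choice.get("delta", {}).get("content")
            if text:
                yield text
```

Feed it the decoded lines of a response to a request sent with "stream": true and join the yielded pieces to rebuild the full message.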

Completions (Legacy)

POST /v1/completions

Use this only if you still have prompt-based clients. Internally, DeltaLLM translates the prompt field into a chat-style user message.

Requests that use any of these unsupported fields currently return HTTP 400:

echo
best_of > 1
logprobs
suffix
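The translation and the rejected fields can be pictured with a small sketch. It only illustrates the documented behavior and is not DeltaLLM's code: completion_to_chat is a hypothetical name, a ValueError stands in for the HTTP 400, and the handling of falsy values (echo: false, best_of: 1) is an assumption.

```python
UNSUPPORTED = ("echo", "best_of", "logprobs", "suffix")


def completion_to_chat(body):
    """Translate a legacy /v1/completions body into a chat-style body."""
    rejected = []
    if body.get("echo"):
        rejected.append("echo")
    if (body.get("best_of") or 1) > 1:
        rejected.append("best_of")
    if body.get("logprobs") is not None:
        rejected.append("logprobs")
    if body.get("suffix") is not None:
        rejected.append("suffix")
    if rejected:
        # The proxy answers with HTTP 400; an exception stands in for it here.
        raise ValueError("unsupported fields: " + ", ".join(rejected))

    # Carry over everything except the prompt and the legacy-only fields.
    chat = {k: v for k, v in body.items() if k != "prompt" and k not in UNSUPPORTED}
    chat["messages"] = [{"role": "user", "content": body["prompt"]}]
    return chat
```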

Responses

POST /v1/responses

DeltaLLM supports a compatible subset of the Responses API and translates these requests into chat completions internally.

curl http://localhost:8000/v1/responses \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o-mini",
    "input": "Write a one-line summary of DeltaLLM.",
    "stream": false
  }'

Responses requests also support DeltaLLM-managed MCP tools on non-streaming requests. See MCP Gateway & Tooling.

Embeddings

POST /v1/embeddings

curl http://localhost:8000/v1/embeddings \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "text-embedding-3-small",
    "input": "The quick brown fox"
  }'

Multimodal and Specialized Endpoints

Image Generation

POST /v1/images/generations

curl http://localhost:8000/v1/images/generations \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "dall-e-3",
    "prompt": "A sunset over mountains",
    "size": "1024x1024"
  }'

Audio Speech

POST /v1/audio/speech

This endpoint returns audio bytes, not JSON.

Public speech requests remain OpenAI-compatible even when the selected deployment is ElevenLabs. For ElevenLabs, DeltaLLM maps input to native text, resolves the ElevenLabs voice ID from request voice or deployment model_info.default_params.voice_id, and calls POST /v1/text-to-speech/{voice_id} upstream with xi-api-key.

curl http://localhost:8000/v1/audio/speech \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "tts-1",
    "input": "Hello world",
    "voice": "alloy",
    "response_format": "mp3"
  }' \
  --output speech.mp3

Example using an ElevenLabs-backed public model:

curl http://localhost:8000/v1/audio/speech \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "elevenlabs-tts",
    "input": "Hello from ElevenLabs through DeltaLLM",
    "voice": "21m00Tcm4TlvDq8ikWAM",
    "response_format": "mp3"
  }' \
  --output elevenlabs-speech.mp3
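The ElevenLabs mapping described in this section can be sketched as a pure function. It is an illustration, not DeltaLLM's implementation: to_elevenlabs_speech and the shape of the returned dict are hypothetical, and giving the request voice precedence over the deployment default is an assumption.

```python
def to_elevenlabs_speech(request, model_info, provider_key):
    """Map a public /v1/audio/speech body to an upstream ElevenLabs call."""
    defaults = model_info.get("default_params", {})
    # Assumed precedence: request voice first, then the deployment default.
    voice_id = request.get("voice") or defaults.get("voice_id")
    if not voice_id:
        raise ValueError("no voice in request and no default voice_id configured")
    return {
        "method": "POST",
        "path": f"/v1/text-to-speech/{voice_id}",
        "headers": {"xi-api-key": provider_key},
        "json": {"text": request["input"]},  # public input becomes native text
    }
```

Because the client still sends the OpenAI-shaped body, switching a public model between an OpenAI-style and an ElevenLabs deployment should need no client changes.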

Audio Transcription

POST /v1/audio/transcriptions

This endpoint accepts multipart form data.

Public transcription requests remain OpenAI-compatible even when the selected deployment is ElevenLabs. For ElevenLabs, DeltaLLM sends the uploaded audio to native POST /v1/speech-to-text, maps public language to language_code, maps temperature, and reshapes the provider response back into the requested public response format.

curl http://localhost:8000/v1/audio/transcriptions \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -F "file=@sample.wav" \
  -F "model=whisper-large" \
  -F "response_format=json"

Example using an ElevenLabs-backed public model:

curl http://localhost:8000/v1/audio/transcriptions \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -F "file=@sample.wav" \
  -F "model=elevenlabs-stt" \
  -F "language=en" \
  -F "response_format=verbose_json"
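curl's -F flag builds the multipart body for you. Clients without a multipart helper have to encode it themselves. The sketch below shows one way to do that with Python's standard library; encode_multipart is a hypothetical helper, and the audio/wav content type is an assumption that matches the sample.wav example.

```python
import uuid


def encode_multipart(fields, file_field, filename, file_bytes, content_type="audio/wav"):
    """Encode text fields plus one file as multipart/form-data.

    Returns (body_bytes, content_type_header_value).
    """
    boundary = uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        parts.append(
            f'--{boundary}\r\nContent-Disposition: form-data; name="{name}"'
            f"\r\n\r\n{value}\r\n".encode("utf-8")
        )
    # The file part carries a filename and its own Content-Type header.
    parts.append(
        f'--{boundary}\r\nContent-Disposition: form-data; name="{file_field}"; '
        f'filename="{filename}"\r\nContent-Type: {content_type}\r\n\r\n'.encode("utf-8")
    )
    parts.append(file_bytes)
    parts.append(f"\r\n--{boundary}--\r\n".encode("utf-8"))
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"
```

Send the returned bytes as the POST body, set Content-Type to the returned header value, and add the usual Authorization header.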

Rerank

POST /v1/rerank

curl http://localhost:8000/v1/rerank \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "rerank-english-v2.0",
    "query": "What is machine learning?",
    "documents": [
      "Machine learning is a subset of AI.",
      "The weather is sunny today.",
      "Deep learning uses neural networks."
    ],
    "top_n": 2
  }'

Batch Endpoints

Files

POST /v1/files
GET /v1/files/{file_id}
GET /v1/files/{file_id}/content

Use files as the input and output artifacts for batch jobs.

curl http://localhost:8000/v1/files \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -F "purpose=batch" \
  -F "file=@input.jsonl"
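This page does not show what input.jsonl contains. Assuming DeltaLLM accepts the OpenAI batch input format (one JSON object per line with custom_id, method, url, and body), the file could be generated as sketched below. build_batch_lines and the item-N ID scheme are hypothetical.

```python
import json


def build_batch_lines(model, texts, url="/v1/embeddings"):
    """Return JSONL text with one embeddings request per input string."""
    lines = []
    for i, text in enumerate(texts):
        lines.append(json.dumps({
            "custom_id": f"item-{i}",  # hypothetical ID scheme; must be unique per line
            "method": "POST",
            "url": url,
            "body": {"model": model, "input": text},
        }))
    return "\n".join(lines) + "\n"
```

Write the returned string to input.jsonl, then upload it with the curl command above.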

Batches

POST /v1/batches
GET /v1/batches
GET /v1/batches/{batch_id}
POST /v1/batches/{batch_id}/cancel

Create a batch:

curl http://localhost:8000/v1/batches \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "input_file_id": "file_123",
    "endpoint": "/v1/embeddings",
    "completion_window": "24h"
  }'

DeltaLLM supports the OpenAI-compatible 24h batch completion window. Missing or null completion_window values default to 24h; unsupported values return HTTP 400.

Inspect a batch:

curl http://localhost:8000/v1/batches/batch_123 \
  -H "Authorization: Bearer YOUR_API_KEY"

Current behavior:

batch endpoints are available only when general_settings.embeddings_batch_enabled: true
supported endpoints are "/v1/embeddings" and "/v1/chat/completions"
public batch responses include OpenAI-compatible completion_window, errors, expires_at, failed_at, and expired_at fields
chat batch requests must be non-streaming; stream: true is rejected
MCP tools are not supported in chat batch requests yet
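The rules above can be checked on the client before submitting a batch. The sketch below mirrors the documented behavior and is not DeltaLLM's code: both function names are hypothetical, and ValueError stands in for the proxy's rejection.

```python
SUPPORTED_ENDPOINTS = {"/v1/embeddings", "/v1/chat/completions"}


def normalize_batch_request(body):
    """Apply the documented default and checks to a POST /v1/batches body."""
    endpoint = body.get("endpoint")
    if endpoint not in SUPPORTED_ENDPOINTS:
        raise ValueError(f"unsupported endpoint: {endpoint!r}")
    window = body.get("completion_window")
    if window is None:  # missing or null defaults to 24h
        window = "24h"
    if window != "24h":
        raise ValueError(f"unsupported completion_window: {window!r}")
    return {**body, "completion_window": window}


def check_chat_batch_item(request_body):
    """Reject chat batch items that the proxy would refuse."""
    if request_body.get("stream"):
        raise ValueError("chat batch requests must be non-streaming")
    tools = request_body.get("tools") or []
    if any(tool.get("type") == "mcp" for tool in tools):
        raise ValueError("MCP tools are not supported in chat batch requests")
```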

Model Discovery

List Models

GET /v1/models

DeltaLLM returns the public model names that clients can request. This list is built from the current runtime model registry.

{
  "object": "list",
  "data": [
    {
      "id": "gpt-4o-mini",
      "object": "model",
      "owned_by": "deltallm"
    }
  ]
}
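A client can use this response to confirm that a model name is routable before sending traffic. A minimal sketch against the response shape shown above, with model_ids as a hypothetical helper name:

```python
def model_ids(listing):
    """Return the public model names from a GET /v1/models response body."""
    return [
        entry["id"]
        for entry in listing.get("data", [])
        if entry.get("object") == "model"
    ]
```

For the sample response above, model_ids returns a list containing only gpt-4o-mini.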