
vLLM - Batch + Files API

LiteLLM supports vLLM's Batch and Files APIs for processing large volumes of requests asynchronously.

| Feature | Supported |
|---------|-----------|
| /v1/files | ✅ |
| /v1/batches | ✅ |
| Cost Tracking | ✅ |

Quick Start

1. Setup config.yaml

Define your vLLM model in config.yaml. LiteLLM uses the model name to route batch requests to the correct vLLM server.

model_list:
  - model_name: my-vllm-model
    litellm_params:
      model: hosted_vllm/meta-llama/Llama-2-7b-chat-hf
      api_base: http://localhost:8000 # your vLLM server

2. Start LiteLLM Proxy

litellm --config /path/to/config.yaml

3. Create Batch File

Create a JSONL file with your batch requests:

{"custom_id": "request-1", "method": "POST", "url": "/v1/chat/completions", "body": {"model": "my-vllm-model", "messages": [{"role": "user", "content": "Hello!"}]}}
{"custom_id": "request-2", "method": "POST", "url": "/v1/chat/completions", "body": {"model": "my-vllm-model", "messages": [{"role": "user", "content": "How are you?"}]}}

4. Upload File & Create Batch

Model Routing

LiteLLM needs to know which model (and therefore which vLLM server) to use for batch operations. Specify the model using the x-litellm-model header when uploading files. LiteLLM will encode this model info into the file ID, so subsequent batch operations automatically route to the correct server.

See Multi-Account / Model-Based Routing for more details.

Upload File

curl http://localhost:4000/v1/files \
-H "Authorization: Bearer sk-1234" \
-H "x-litellm-model: my-vllm-model" \
-F purpose="batch" \
-F file="@batch_requests.jsonl"

Create Batch

curl http://localhost:4000/v1/batches \
-H "Authorization: Bearer sk-1234" \
-H "Content-Type: application/json" \
-d '{
"input_file_id": "file-abc123",
"endpoint": "/v1/chat/completions",
"completion_window": "24h"
}'

Check Batch Status

curl http://localhost:4000/v1/batches/batch_abc123 \
-H "Authorization: Bearer sk-1234"

Supported Operations

| Operation | Endpoint | Method |
|-----------|----------|--------|
| Upload file | /v1/files | POST |
| List files | /v1/files | GET |
| Retrieve file | /v1/files/{file_id} | GET |
| Delete file | /v1/files/{file_id} | DELETE |
| Get file content | /v1/files/{file_id}/content | GET |
| Create batch | /v1/batches | POST |
| List batches | /v1/batches | GET |
| Retrieve batch | /v1/batches/{batch_id} | GET |
| Cancel batch | /v1/batches/{batch_id}/cancel | POST |
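
For example, an in-progress batch can be canceled through the cancel endpoint (the batch ID is illustrative):

curl -X POST http://localhost:4000/v1/batches/batch_abc123/cancel \
-H "Authorization: Bearer sk-1234"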

Environment Variables

# Set vLLM server endpoint
export HOSTED_VLLM_API_BASE="http://localhost:8000"

# Optional: API key if your vLLM server requires authentication
export HOSTED_VLLM_API_KEY="your-api-key"
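
If your vLLM server requires authentication, you can also reference the key from config.yaml using LiteLLM's os.environ/ syntax, so the proxy reads it from the environment. This is a sketch extending the config from step 1:

model_list:
  - model_name: my-vllm-model
    litellm_params:
      model: hosted_vllm/meta-llama/Llama-2-7b-chat-hf
      api_base: http://localhost:8000
      api_key: os.environ/HOSTED_VLLM_API_KEY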

How Model Routing Works

When you upload a file with x-litellm-model: my-vllm-model, LiteLLM:

  1. Encodes the model name into the returned file ID
  2. Uses this encoded model info to automatically route subsequent batch operations to the correct vLLM server
  3. Lets you create batches and retrieve results without specifying the model again

This enables multi-tenant batch processing where different teams can use different vLLM deployments through the same LiteLLM proxy.
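
For instance, the config.yaml from step 1 could define a second deployment; the second model name, underlying model, and port below are hypothetical:

model_list:
  - model_name: my-vllm-model
    litellm_params:
      model: hosted_vllm/meta-llama/Llama-2-7b-chat-hf
      api_base: http://localhost:8000
  - model_name: my-vllm-model-team-b
    litellm_params:
      model: hosted_vllm/mistralai/Mistral-7B-Instruct-v0.2
      api_base: http://localhost:8001

Files uploaded with x-litellm-model: my-vllm-model-team-b, and the batches created from them, are then routed to the vLLM server on port 8001.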

Learn more: Multi-Account / Model-Based Routing