Azure Responses API

| Property | Details |
|----------|---------|
| Description | Azure OpenAI Responses API |
| `custom_llm_provider` on LiteLLM | `azure/` |
| Supported Operations | `/v1/responses` |
| Azure OpenAI Responses API | Azure OpenAI Responses API ↗ |
| Cost Tracking, Logging Support | ✅ LiteLLM will log and track cost for Responses API requests |
| Supported OpenAI Params | ✅ All OpenAI params are supported. See here |

Usage

Create a model response

Non-streaming

Azure Responses API
import os
import litellm

# Non-streaming response
response = litellm.responses(
    model="azure/o1-pro",
    input="Tell me a three sentence bedtime story about a unicorn.",
    max_output_tokens=100,
    api_key=os.getenv("AZURE_RESPONSES_OPENAI_API_KEY"),
    api_base="https://litellm8397336933.openai.azure.com/",
    api_version="2023-03-15-preview",
)

print(response)
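The returned object mirrors the OpenAI Responses shape, where `response.output` is a list of output items whose message items carry `content` parts with a `text` field. The helper below is a sketch under that assumption (field names may differ across LiteLLM versions — verify against yours); it uses a stand-in object so no API call is needed:

```python
from types import SimpleNamespace

def first_output_text(response):
    """Pull the first text segment out of a Responses API result.

    Assumes the OpenAI Responses shape: response.output is a list of
    items, message items carry .content, and text parts carry .text.
    Field names are assumptions -- check your LiteLLM version.
    """
    for item in getattr(response, "output", []):
        for part in getattr(item, "content", None) or []:
            text = getattr(part, "text", None)
            if text:
                return text
    return None

# Stand-in response object shaped like a Responses API result:
fake = SimpleNamespace(output=[
    SimpleNamespace(type="message", content=[
        SimpleNamespace(type="output_text", text="Once upon a time...")
    ])
])
print(first_output_text(fake))  # Once upon a time...
```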

Streaming

Azure Responses API
import os
import litellm

# Streaming response
response = litellm.responses(
    model="azure/o1-pro",
    input="Tell me a three sentence bedtime story about a unicorn.",
    stream=True,
    api_key=os.getenv("AZURE_RESPONSES_OPENAI_API_KEY"),
    api_base="https://litellm8397336933.openai.azure.com/",
    api_version="2023-03-15-preview",
)

for event in response:
print(event)
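The streaming loop yields typed event objects. To reassemble the generated text, you can filter for the `response.output_text.delta` event type from the OpenAI Responses streaming protocol and concatenate the `delta` fragments. This is a sketch — the event type string and `delta` field are assumptions about the streamed shape, so confirm them against your LiteLLM version; stand-in events are used here so it runs without an API call:

```python
from types import SimpleNamespace

def collect_output_text(events):
    """Accumulate generated text from Responses API stream events.

    Assumes OpenAI Responses streaming semantics, where
    'response.output_text.delta' events carry a text fragment
    in .delta (an assumption -- verify for your version).
    """
    chunks = []
    for event in events:
        if getattr(event, "type", None) == "response.output_text.delta":
            chunks.append(event.delta)
    return "".join(chunks)

# Stand-in events shaped like a Responses API stream:
fake_events = [
    SimpleNamespace(type="response.output_text.delta", delta="Once upon "),
    SimpleNamespace(type="response.output_text.delta", delta="a time..."),
    SimpleNamespace(type="response.completed"),
]
print(collect_output_text(fake_events))  # Once upon a time...
```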

Azure Codex Models

Codex models use Azure's new /v1/preview API, which provides ongoing access to the latest features without requiring you to update the api-version each month.

LiteLLM sends your requests to the /v1/preview endpoint when you set api_version="preview".

Non-streaming

Azure Codex Models
import os
import litellm

# Non-streaming response with Codex models
response = litellm.responses(
    model="azure/codex-mini",
    input="Tell me a three sentence bedtime story about a unicorn.",
    max_output_tokens=100,
    api_key=os.getenv("AZURE_RESPONSES_OPENAI_API_KEY"),
    api_base="https://litellm8397336933.openai.azure.com",
    api_version="preview",  # 👈 key difference
)

print(response)

Streaming

Azure Codex Models
import os
import litellm

# Streaming response with Codex models
response = litellm.responses(
    model="azure/codex-mini",
    input="Tell me a three sentence bedtime story about a unicorn.",
    stream=True,
    api_key=os.getenv("AZURE_RESPONSES_OPENAI_API_KEY"),
    api_base="https://litellm8397336933.openai.azure.com",
    api_version="preview",  # 👈 key difference
)

for event in response:
print(event)

Calling via /chat/completions

You can also call the Azure Responses API via the /chat/completions endpoint.

from litellm import completion
import os

os.environ["AZURE_API_BASE"] = "https://my-endpoint-sweden-berri992.openai.azure.com/"
os.environ["AZURE_API_VERSION"] = "2023-03-15-preview"
os.environ["AZURE_API_KEY"] = "my-api-key"

response = completion(
    model="azure/responses/my-custom-o1-pro",
    messages=[{"role": "user", "content": "Hello world"}],
)

print(response)