# v1.81.9 - Control which MCP Servers are exposed on the Internet
## Deploy this version

**Docker**

```shell
docker run \
  -e STORE_MODEL_IN_DB=True \
  -p 4000:4000 \
  docker.litellm.ai/berriai/litellm:main-v1.81.9
```

**Pip**

```shell
pip install litellm==1.81.9
```
## Key Highlights

- **Claude Opus 4.6** - Full support across Anthropic, AWS Bedrock, Azure AI, and Vertex AI, with adaptive thinking and a 1M-token context window
- **A2A Agent Gateway** - Call A2A (Agent-to-Agent) registered agents through the standard `/chat/completions` API (see the example below)
- **Expose MCP servers on the public internet** - Launch MCP servers with public/private visibility and IP-based access control for internet-facing deployments
- **Performance Optimizations** - Multiple improvements, including ~40% lower Prometheus CPU usage, LRU caching, and optimized logging paths
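As a minimal sketch of the A2A flow, an agent registered on the gateway can be addressed like any other model through the OpenAI-compatible `/chat/completions` endpoint. The agent alias, proxy URL, and key below are placeholders; the exact alias depends on how the agent is registered on your gateway.

```python
# Minimal sketch: call an A2A-registered agent via the standard /chat/completions API.
# "my-a2a-agent" is a hypothetical agent alias - replace it with the name the agent is
# registered under on your LiteLLM gateway.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:4000",  # your LiteLLM proxy
    api_key="sk-1234",                 # a LiteLLM virtual key
)

response = client.chat.completions.create(
    model="my-a2a-agent",  # the registered agent, addressed like a model
    messages=[{"role": "user", "content": "Summarize today's open incidents."}],
)
print(response.choices[0].message.content)
```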
## MCP Servers on the Public Internet
This release makes it safe to expose MCP servers on the public internet by adding public/private visibility and IP-based access control. You can now run internet-facing MCP services while restricting access to trusted networks and keeping internal tools private.
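As a rough illustration (a sketch only - the configuration keys shown are assumptions and may differ; see the "Expose MCPs on public internet" documentation listed under Documentation Updates), an internet-facing deployment could mark one MCP server as publicly reachable behind an IP allowlist while keeping another private:

```yaml
# Illustrative sketch only - key names are assumptions, not the documented schema.
# "available_on_public_internet" mirrors the new LiteLLM_MCPServerTable column;
# "allowed_ips" is a hypothetical name for the IP-based restriction.
mcp_servers:
  public_search_tools:
    url: "https://mcp.example.com/mcp"
    available_on_public_internet: true    # reachable from the internet
    allowed_ips:
      - "203.0.113.0/24"                  # only trusted networks may connect
  internal_db_tools:
    url: "http://internal-mcp.internal:8080/mcp"
    available_on_public_internet: false   # private - internal callers only
```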
Let's dive in.
## New Models / Updated Models

### New Model Support (13 new models)
| Provider | Model | Context Window | Input ($/1M tokens) | Output ($/1M tokens) |
|---|---|---|---|---|
| Anthropic | claude-opus-4-6 | 1M | $5.00 | $25.00 |
| AWS Bedrock | anthropic.claude-opus-4-6-v1 | 1M | $5.00 | $25.00 |
| Azure AI | azure_ai/claude-opus-4-6 | 200K | $5.00 | $25.00 |
| Vertex AI | vertex_ai/claude-opus-4-6 | 1M | $5.00 | $25.00 |
| Google Gemini | gemini/deep-research-pro-preview-12-2025 | 65K | $2.00 | $12.00 |
| Vertex AI | vertex_ai/deep-research-pro-preview-12-2025 | 65K | $2.00 | $12.00 |
| Moonshot | moonshot/kimi-k2.5 | 262K | $0.60 | $3.00 |
| OpenRouter | openrouter/qwen/qwen3-235b-a22b-2507 | 262K | $0.07 | $0.10 |
| OpenRouter | openrouter/qwen/qwen3-235b-a22b-thinking-2507 | 262K | $0.11 | $0.60 |
| Together AI | together_ai/zai-org/GLM-4.7 | 200K | $0.45 | $2.00 |
| Together AI | together_ai/moonshotai/Kimi-K2.5 | 256K | $0.50 | $2.80 |
| ElevenLabs | elevenlabs/eleven_v3 | - | $0.18/1K chars | - |
| ElevenLabs | elevenlabs/eleven_multilingual_v2 | - | $0.18/1K chars | - |
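For reference, the new Anthropic model can be called through the LiteLLM SDK like any other model. A minimal sketch, assuming `ANTHROPIC_API_KEY` is set in the environment; swap the prefix (e.g. `bedrock/`, `vertex_ai/`) to route through the other providers in the table.

```python
# Minimal sketch: call the newly added Claude Opus 4.6 via the LiteLLM SDK.
# Requires ANTHROPIC_API_KEY in the environment.
import litellm

response = litellm.completion(
    model="anthropic/claude-opus-4-6",
    messages=[{"role": "user", "content": "Give me a one-line status summary."}],
)
print(response.choices[0].message.content)
```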
### Features

- **Bedrock**
    - Add 1hr tiered caching costs for long-context models - PR #20214
    - Support TTL (1h) field in prompt caching for Bedrock Claude 4.5 models - PR #20338
    - Add Nova Sonic speech-to-speech model support - PR #20244
    - Fix empty assistant message for Converse API - PR #20390
    - Fix content-blocked handling - PR #20606
- **Gemini / Vertex AI**
    - Add Gemini Deep Research model support - PR #20406
    - Fix Vertex AI Gemini streaming content_filter handling - PR #20105
    - Allow using OpenAI-style tools for `web_search` with Vertex AI/Gemini models - PR #20280
    - Fix `supports_native_streaming` for Gemini and Vertex AI models - PR #20408
    - Add mapping for responses tools in file IDs - PR #20402
- **Cohere**
    - Support `dimensions` param for Cohere embed v4 (see the sketch after this list) - PR #20235
- **Cerebras**
    - Add reasoning param support for GPT OSS Cerebras - PR #20258
- **Moonshot**
    - Add Kimi K2.5 model entries - PR #20273
- **OpenRouter**
    - Add Qwen3-235B models - PR #20455
- **Together AI**
    - Add GLM-4.7 and Kimi-K2.5 models - PR #20319
- **ElevenLabs**
    - Add `eleven_v3` and `eleven_multilingual_v2` TTS models - PR #20522
- Add missing capability flags to models - PR #20276
- **GitHub Copilot**
    - Fix system prompts being dropped and auto-add required Copilot headers - PR #20113
- **GigaChat**
    - Fix incorrect merging of consecutive user messages - PR #20341
- **xAI**
    - Add `/realtime` API support - works with the LiveKit SDK - PR #20381
- **OpenAI**
    - Add `gpt-5-search-api` model and docs clarifications - PR #20512
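A minimal sketch of the new `dimensions` pass-through for Cohere embed v4. The model id `cohere/embed-v4.0` is an assumption (check the Cohere provider docs for the exact name), and `COHERE_API_KEY` is expected in the environment.

```python
# Minimal sketch: request lower-dimensional vectors from Cohere embed v4 by passing
# the OpenAI-style `dimensions` parameter through LiteLLM. The model id is assumed.
import litellm

resp = litellm.embedding(
    model="cohere/embed-v4.0",
    input=["LiteLLM v1.81.9 release notes"],
    dimensions=512,  # ask the provider for 512-dimensional embeddings
)
print(len(resp.data[0]["embedding"]))
```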
### Bug Fixes

- Fix "extra inputs not permitted" error for `provider_specific_fields` - PR #20334
- Fix Managed Batches inconsistent state management for list and cancel batches - PR #20331
- Fix `open_ai_embedding_models` to have `custom_llm_provider` set to None - PR #20253
## LLM API Endpoints

### Features

- Filter unsupported Claude Code beta headers for non-Anthropic providers - PR #20578
- Fix inconsistent response format in `anthropic.messages.acreate()` when using non-Anthropic providers - PR #20442
- Fix 404 on the `/api/event_logging/batch` endpoint that caused Claude Code "route not found" errors - PR #20504
- Add support for file delete and GET via `file_id` for Gemini - PR #20329
## Management Endpoints / UI

### Features

- **SSO Configuration**
- **Auth / SDK**
    - Add `proxy_auth` for automatic OAuth2/JWT token management in the SDK - PR #20238
- **Virtual Keys**
- **Teams & Budgets**
- **UI Improvements**
    - Default Team Settings: Migrate to use Reusable Model Select - PR #20310
    - Navbar: Option to Hide Community Engagement Buttons - PR #20308
    - Show team alias on the Models health page - PR #20359
    - Admin Settings: Add option for Authentication for public AI Hub - PR #20444
    - Adjust daily spend date filtering for the user's timezone - PR #20472
- **SCIM**
    - Add base `/scim/v2` endpoint for SCIM resource discovery - PR #20301
- **Proxy CLI**
    - CLI arguments for RDS IAM auth - PR #20437
### Bugs
- Fix: Remove unnecessary key blocking on UI login that prevented access - PR #20210
- UI - Team Settings: Disable Global Guardrail Persistence - PR #20307
- UI - Model Info Page: Fix Input and Output Labels - PR #20462
- UI - Model Page: Column Resizing on Smaller Screens - PR #20599
- Fix `/key/list` `user_id` empty-string edge case - PR #20623
- Add array type checks for model, agent, and MCP hub data to prevent UI crashes - PR #20469
- Fix unique constraint on daily tables + logging when updates fail - PR #20394
## Logging / Guardrail / Prompt Management Integrations

### Bug Fixes (3 fixes)

- Fix Langfuse OTEL trace export failing when spans contain null attributes - PR #20382
- Fix incorrect failure metrics labels causing miscounted error rates - PR #20152
- Fix Slack alert delivery failing for certain budget threshold configurations - PR #20257
### Guardrails (7 updates)

- **Custom Code Guardrails**
- **Team-Based Guardrails**
    - Implement team-based guardrail isolation and management - PR #20318
- Ensure the OpenAI Moderations Guard works with OpenAI Embeddings - PR #20523
- Fix fail-open for GraySwan and pass metadata to the Cygnal API endpoint - PR #19837
- **General**
## Spend Tracking, Budgets and Rate Limiting

- **Support 0-cost models** - Allow zero-cost model entries for internal/free-tier models (see the sketch below) - PR #20249
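A minimal sketch of the idea, assuming the SDK-side custom-pricing path via `litellm.register_model`; the model name and provider below are placeholders for an internal deployment, and the proxy config can set the same cost fields on a model entry.

```python
# Minimal sketch: register an internal/free-tier model with zero cost so spend
# tracking records $0 for it. "my-internal-llm" is a hypothetical model name.
import litellm

litellm.register_model({
    "my-internal-llm": {
        "input_cost_per_token": 0.0,
        "output_cost_per_token": 0.0,
        "litellm_provider": "openai",  # e.g. served behind an OpenAI-compatible endpoint
        "mode": "chat",
    }
})
```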
## MCP Gateway (9 updates)

- **MCP Semantic Filtering** - Filter MCP tools using semantic similarity to reduce tool sprawl for LLM calls - PR #20296, PR #20316
- **UI - MCP Semantic Filtering** - Add support for configuring MCP Semantic Filtering in the UI - PR #20454
- **MCP IP-Based Access Control** - Set MCP servers as private or publicly available on the internet, with IP-based restrictions - PR #20607, PR #20620
- Fix MCP "Session not found" error on VSCode reconnect - PR #20298
- Fix OAuth2 "Capabilities: none" bug for upstream MCP servers - PR #20602
- Include config-defined search tools in `/search_tools/list` - PR #20371
- UI - Search Tools: Show config-defined search tools - PR #20436
- Ensure MCP permissions are enforced when using JWT Auth - PR #20383
- Fix `gcs_bucket_name` not being passed correctly for MCP server storage configuration - PR #20491
## Performance / Loadbalancing / Reliability improvements (14 improvements)

- **Prometheus ~40% CPU reduction** - Parallelize budget metrics, fix a caching bug, and reduce CPU usage - PR #20544
- Prevent closed-client errors by reverting httpx client caching - PR #20025
- Avoid unnecessary Router creation when no models or search tools are configured - PR #20661
- Optimize `wrapper_async` with `CallTypes` caching and reduced lookups - PR #20204
- Cache `_get_relevant_args_to_use_for_logging()` at module level - PR #20077
- LRU cache for `normalize_request_route` - PR #19812
- Optimize `get_standard_logging_metadata` with set intersection - PR #19685
- Early-exit guards in `completion_cost` for unused features - PR #20020
- Optimize `get_litellm_params` with sparse kwargs extraction - PR #19884
- Guard debug log f-strings and remove redundant dict copies - PR #19961
- Replace enum construction with frozenset lookup - PR #20302
- Guard debug f-string in `update_environment_variables` - PR #20360
- Warn when budget lookup fails, to surface silent caching misses - PR #20545
- Add INFO-level session reuse logging per request for better observability - PR #20597
## Database Changes

### Schema Updates

| Table | Change Type | Description | PR | Migration |
|---|---|---|---|---|
| LiteLLM_TeamTable | New Column | Added `allow_team_guardrail_config` boolean field for team-based guardrail isolation | PR #20318 | Migration |
| LiteLLM_DeletedTeamTable | New Column | Added `allow_team_guardrail_config` boolean field | PR #20318 | Migration |
| LiteLLM_TeamTable | New Column | Added `soft_budget` (double precision) for soft budget alerting | PR #20530 | Migration |
| LiteLLM_DeletedTeamTable | New Column | Added `soft_budget` (double precision) | PR #20653 | Migration |
| LiteLLM_MCPServerTable | New Column | Added `available_on_public_internet` boolean for MCP IP-based access control | PR #20607 | Migration |
## Documentation Updates (14 updates)
- Add FAQ for setting up and verifying LITELLM_LICENSE - PR #20284
- Model request tags documentation - PR #20290
- Add Prisma migration troubleshooting guide - PR #20300
- MCP Semantic Filtering documentation - PR #20316
- Add CopilotKit SDK doc as supported agents SDK - PR #20396
- Add documentation for Nova Sonic - PR #20320
- Update Vertex AI Text to Speech doc to show use of audio - PR #20255
- Improve Okta SSO setup guide with step-by-step instructions - PR #20353
- Langfuse doc update - PR #20443
- Expose MCPs on public internet documentation - PR #20626
- Add blog post: Achieving Sub-Millisecond Proxy Overhead - PR #20309
- Add blog post about litellm-observatory - PR #20622
- Update Opus 4.6 blog with adaptive thinking - PR #20637
- `gpt-5-search-api` docs clarifications - PR #20512
## New Contributors
- @Quentin-M made their first contribution in PR #19818
- @amirzaushnizer made their first contribution in PR #20235
- @cscguochang made their first contribution in PR #20214
- @krauckbot made their first contribution in PR #20273
- @agrattan0820 made their first contribution in PR #19784
- @nina-hu made their first contribution in PR #20472
- @swayambhu94 made their first contribution in PR #20469
- @ssadedin made their first contribution in PR #20566

