NLP (120) A Look at LiteLLM: A Powerful Tool for Building a Unified LLM Interface

This article shows how to use LiteLLM to call the APIs of mainstream LLMs, including OpenAI, Anthropic, and Google, in a unified OpenAI style. Concrete code demonstrates the Completion interface and the routing mechanism, as well as how to plug any model into the OpenAI Agents framework. LiteLLM greatly reduces the complexity of multi-model calls and improves the efficiency and stability of integrating LLMs from multiple vendors.

Introduction

When we call LLM APIs in everyday work, some are third-party APIs offered by the model vendors themselves, such as the LLM APIs of OpenAI, Anthropic, and Google; some are self-hosted, for example LLM APIs deployed with HuggingFace or vLLM; some are provided by third-party aggregators; and there are also API calls for models from Groq, Mistral, and others.

These calling conventions vary widely: different APIs take different request parameters and require different SDKs or HTTP interfaces. LiteLLM changes this messy, fragmented state of affairs by letting you call any LLM in the OpenAI request style.

LiteLLM's official website is https://docs.litellm.ai/. Its Python package currently supports calling more than 100 LLMs with OpenAI-style inputs and outputs. Its main capabilities are:

  • Translates inputs to each provider's completion (text generation), embedding (vector embedding), and image_generation (image generation) endpoints (a minimal sketch follows this list).
  • Keeps the output format consistent: the text response is always available at ['choices'][0]['message']['content'].
  • Supports retries and fallbacks across multiple deployments (e.g., Azure/OpenAI), handled by the Router module.
  • Tracks spend per project and sets budgets via the LiteLLM Proxy Server.
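
To illustrate the first two points, here is a minimal sketch of calling the embedding interface through the same unified entry point. The model name is just an example, and the sketch assumes OPENAI_API_KEY is set in the environment:

from litellm import embedding

# The call shape stays the same regardless of provider; only the model string changes.
response = embedding(
    model="text-embedding-3-small",  # example model; any supported provider works
    input=["LiteLLM unifies provider APIs."],
)

# The response follows the OpenAI embedding format for every provider.
print(len(response.data[0]["embedding"]))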

Below, I will show how to use LiteLLM to call the LLM APIs of OpenAI, Anthropic, and Google, introduce LiteLLM's routing mechanism, and use these three vendors' LLMs in the OpenAI Agents framework.

Completion

With LiteLLM, calling the LLM APIs of OpenAI, Anthropic, and Google is straightforward. The Python code is as follows:

import os
from dotenv import load_dotenv
from litellm import completion

load_dotenv()


# set ENV variables
os.environ["OPENAI_API_KEY"] = os.getenv("OPENAI_API_KEY")
os.environ["ANTHROPIC_API_KEY"] = os.getenv("ANTHROPIC_API_KEY")
os.environ["GEMINI_API_KEY"] = os.getenv("GEMINI_API_KEY")

messages = [{"content": "Who are you?", "role": "user"}]

# openai call
response = completion(model="openai/gpt-4o", messages=messages)
print("openai response:")
print(response.choices[0].message.content)

# anthropic call
response = completion(model="anthropic/claude-3-sonnet-20240229", messages=messages)
print("\nanthropic response:")
print(response.choices[0].message.content)

# gemini call
response = completion(model="gemini/gemini-2.0-flash", messages=messages)
print("\ngemini response:")
print(response.choices[0].message.content)

The output is as follows:

openai response:
I'm an AI language model developed by OpenAI, known as ChatGPT. I'm here to assist you with information, answer questions, and help with a wide range of topics. How can I assist you today?

anthropic response:
I am an artificial intelligence created by Anthropic. I am a large language model trained to assist with a variety of tasks like analysis, writing, research, answering questions and more. I have access to a large amount of information but I don't have subjective experiences or a physical form. I'm an AI assistant aimed at being helpful, honest and harmless.

gemini response:
I am a large language model, trained by Google.

The code above calls each model's Completion interface through one uniform API. Other features, such as async clients, streaming output, error handling, observability, and cost tracking, are likewise exposed in the OpenAI style. With this, we can use the familiar OpenAI conventions to drive every feature of every model.
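
As one example, streaming uses the same call shape for every provider. A minimal sketch, reusing a model from the example above and assuming the same API keys are set:

from litellm import completion

# stream=True yields OpenAI-style chunks for any provider.
response = completion(
    model="anthropic/claude-3-sonnet-20240229",
    messages=[{"content": "Who are you?", "role": "user"}],
    stream=True,
)
for chunk in response:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="")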

Routing Mechanism

LiteLLM's routing mechanism (Router) provides rich management features:

  • Load balancing across multiple deployments (e.g., Azure/OpenAI).
  • Prioritizing important requests (i.e., queueing) so that they do not fail.
  • Basic reliability logic: cooldowns, fallbacks, timeouts, and retries (fixed-interval and exponential backoff) across multiple deployments/providers (a configuration sketch follows this list).

These are all common concerns when calling LLMs, and the routing mechanism goes a long way toward keeping an LLM API service stable.
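
As a sketch of how these reliability options are wired up (parameter names follow the LiteLLM Router documentation; treat the exact values and the fallback alias as placeholders):

from litellm import Router

router = Router(
    model_list=model_list,  # deployments, as defined in the example below
    num_retries=3,          # retry a failed call up to 3 times
    timeout=30,             # per-request timeout in seconds
    cooldown_time=60,       # rest a deployment after repeated failures
    fallbacks=[{"gcg": ["backup-alias"]}],  # placeholder fallback alias for "gcg"
)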

Let's take the routing strategy as an example; the default is weighted random selection ("simple-shuffle"). Sample Python code:

import asyncio
from litellm import Router

# Two deployments share the alias "gcg" with equal weights.
model_list = [
    {
        "model_name": "gcg",  # model alias
        "litellm_params": {
            "model": "openai/gpt-4o",
            "api_key": "sk-proj-xxx",
            "weight": 1,
        },
    },
    {
        "model_name": "gcg",  # model alias
        "litellm_params": {
            "model": "anthropic/claude-3-5-sonnet-20240620",
            "api_key": "sk-ant-xxx",
            "weight": 1,
        },
    },
]

router = Router(model_list=model_list, routing_strategy="simple-shuffle")


async def router_acompletion(i):
    print(f"Running {i} task...")
    response = await router.acompletion(
        model="gcg",
        messages=[{"role": "user", "content": "Who are you? reply in one sentence."}],
    )
    print(f"\nmodel: {response.model}, response: {response.choices[0].message.content}")


async def run():
    tasks = [router_acompletion(i) for i in range(5)]
    await asyncio.gather(*tasks)


if __name__ == "__main__":
    asyncio.run(run())

In the five coroutines above, each coroutine calls the model aliased gcg, which is backed by two deployments with equal weights, so requests are split randomly between them (a sketch with unequal weights follows the output below).

The output is as follows:

Running 0 task...
Running 1 task...
Running 2 task...
Running 3 task...
Running 4 task...

model: claude-3-5-sonnet-20240620, response: I am an AI assistant created by Anthropic to be helpful, harmless, and honest.

model: claude-3-5-sonnet-20240620, response: I am an AI assistant created by Anthropic to be helpful, harmless, and honest.

model: claude-3-5-sonnet-20240620, response: I am an artificial intelligence created by Anthropic to engage in conversation and assist with tasks.

model: gpt-4o-2024-08-06, response: I am an AI language model developed by OpenAI, designed to assist with information and answer questions across a wide range of topics.

model: gpt-4o-2024-08-06, response: I am an AI language model created by OpenAI, here to assist you with information and answer your questions.
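
If we wanted traffic skewed instead of even, only the weights need to change. A sketch building on the model_list above (with simple-shuffle, selection probability is proportional to weight):

# Roughly three out of four requests now go to the OpenAI deployment.
model_list[0]["litellm_params"]["weight"] = 3
model_list[1]["litellm_params"]["weight"] = 1
router = Router(model_list=model_list, routing_strategy="simple-shuffle")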

LiteLLM's routing mechanism is very useful; I will cover it in detail in a separate post.

Using Any LLM with OpenAI Agents

In the OpenAI Agents framework, LiteLLM makes it possible to call any LLM. Install it with:

pip install "openai-agents[litellm]"

Sample Python code:

# -*- coding: utf-8 -*-
import os
import asyncio
import base64
import logfire
from random import choice
from datetime import datetime
from dotenv import load_dotenv
from agents import Agent, Runner, function_tool
from agents.extensions.models.litellm_model import LitellmModel


load_dotenv()

os.environ["OPENAI_API_KEY"] = os.getenv("OPENAI_API_KEY")
os.environ["ANTHROPIC_API_KEY"] = os.getenv("ANTHROPIC_API_KEY")
os.environ["GEMINI_API_KEY"] = os.getenv("GEMINI_API_KEY")

# Build Basic Auth header.
LANGFUSE_AUTH = base64.b64encode(
    f"{os.environ.get('LANGFUSE_PUBLIC_KEY')}:{os.environ.get('LANGFUSE_SECRET_KEY')}".encode()
).decode()

# Configure OpenTelemetry endpoint & headers.
os.environ["OTEL_EXPORTER_OTLP_ENDPOINT"] = os.environ.get("LANGFUSE_HOST") + "/api/public/otel"
os.environ["OTEL_EXPORTER_OTLP_HEADERS"] = f"Authorization=Basic {LANGFUSE_AUTH}"

# Configure logfire instrumentation.
logfire.configure(
    service_name='my_agent_service',
    send_to_logfire=False,
)
# This method automatically patches the OpenAI Agents SDK to send logs via OTLP to Langfuse.
logfire.instrument_openai_agents()

openai_model_name = "openai/gpt-4o"
anthropic_model_name = "anthropic/claude-3-sonnet-20240229"
gemini_model_name = "gemini/gemini-2.0-flash"

model = LitellmModel(model=openai_model_name)


@function_tool
def get_weather(city: str) -> str:
    result = ['sunny', 'cloudy', 'rainy', 'snowy']
    return f"The weather in {city} is {choice(result)}."


@function_tool
def get_now_time() -> str:
    now_time = datetime.now().strftime("%Y-%m-%d %H:%M:%S")
    return f"The current time is {now_time}."


weather_agent = Agent(
    name="Weather agent",
    instructions="You are a weather agent.",
    model=model,
    tools=[get_weather],
)

time_agent = Agent(
    name="Time agent",
    instructions="You are a time agent.",
    model=model,
    tools=[get_now_time],
)

agent = Agent(
    name="Agent",
    instructions="You are a helpful agent.",
    model=model,
    handoffs=[weather_agent, time_agent],
)


async def main():
    result1 = await Runner.run(agent, input="What's the weather in Tokyo?")
    print(result1.final_output)

    result2 = await Runner.run(agent, input="What's the time now?")
    print(result2.final_output)


if __name__ == "__main__":
    asyncio.run(main())

Taking the Gemini model as an example, the output is as follows:

13:45:12.025 OpenAI Agents trace: Agent workflow
13:45:12.026 Agent run: 'Agent'
13:45:12.027 Chat completion with 'gemini/gemini-2.0-flash' [LLM]
13:45:12.972 Handoff: Agent → None
13:45:12.973 Agent run: 'Weather agent'
13:45:12.974 Chat completion with 'gemini/gemini-2.0-flash' [LLM]
13:45:14.013 Function: get_weather
13:45:14.015 Chat completion with 'gemini/gemini-2.0-flash' [LLM]
The weather in Tokyo is cloudy.

13:45:14.615 OpenAI Agents trace: Agent workflow
13:45:14.615 Agent run: 'Agent'
13:45:14.616 Chat completion with 'gemini/gemini-2.0-flash' [LLM]
13:45:15.752 Handoff: Agent → None
13:45:15.752 Agent run: 'Time agent'
13:45:15.753 Chat completion with 'gemini/gemini-2.0-flash' [LLM]
13:45:16.420 Function: get_now_time
13:45:16.422 Chat completion with 'gemini/gemini-2.0-flash' [LLM]
Right now, it's 2025-04-25 21:45:16.

Summary

LiteLLM provides a lightweight, unified interface over multiple models, greatly lowering the technical barrier to switching between and integrating different LLMs. By mirroring OpenAI's API structure, it lets developers connect to models from vendors such as OpenAI, Anthropic, and Google with almost no changes to existing code, making it suitable for both rapid prototyping and production deployment.

Its built-in model routing offers strong support for multi-model scenarios, with strategies such as prioritization, retry on failure, and load balancing, helping developers strike a flexible balance between performance, stability, and cost. This matters especially in real applications where model quality and pricing are uncertain.

In addition, LiteLLM's compatibility with OpenAI Agents opens the door to building multi-modal, multi-model agent systems. With simple configuration, users can have an agent use a model from any vendor, further improving a system's extensibility and flexibility.

Going forward, it is worth exploring practical experience with LiteLLM in async task processing, call-chain monitoring, and cost optimization, deepening its use in production environments.

References

  1. LiteLLM - Getting Started: https://docs.litellm.ai/
  2. Using any model via LiteLLM: https://openai.github.io/openai-agents-python/models/litellm/


