NLP (120) A Look at LiteLLM: A Powerful Tool for Building a Unified LLM Interface

This article shows how to use LiteLLM to call the APIs of mainstream LLMs, including OpenAI, Anthropic, and Google, in a unified OpenAI style. Concrete code demonstrates the Completion interface and the routing mechanism, as well as how to plug any model into the OpenAI Agents framework. LiteLLM greatly reduces the complexity of multi-model calls and improves the efficiency and stability of integrating LLMs from multiple vendors.

Introduction

When we call LLM APIs in everyday work, some are third-party APIs offered by the model vendors themselves, such as the LLM APIs of OpenAI, Anthropic, and Google; some are self-hosted, for example LLM APIs deployed with HuggingFace or vLLM; some are provided by third-party aggregators; and there are also API calls for models from Groq, Mistral, and others.

These calling conventions vary widely: different APIs take different request parameters and require different SDKs or HTTP interfaces. LiteLLM changes this messy, fragmented state of affairs by letting you call any LLM in the OpenAI request style.

LiteLLM's official website is https://docs.litellm.ai/. Its Python package currently supports calling more than 100 LLMs with OpenAI-style inputs and outputs. Its main capabilities are:

  • Translates inputs to each provider's completion (text generation), embedding (vector embedding), and image_generation (image generation) endpoints (a minimal sketch follows this list).
  • Keeps the output format consistent: the text response is always available at ['choices'][0]['message']['content'].
  • Supports retries and fallbacks across multiple deployments (e.g., Azure/OpenAI), handled by the Router module.
  • Tracks spend per project and sets budgets via the LiteLLM Proxy Server.
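
To illustrate the first two points, here is a minimal sketch of calling the embedding interface through the same unified entry point. The model name is just an example, and the sketch assumes OPENAI_API_KEY is set in the environment:

from litellm import embedding

# The call shape stays the same regardless of provider; only the model string changes.
response = embedding(
    model="text-embedding-3-small",  # example model; any supported provider works
    input=["LiteLLM unifies provider APIs."],
)

# The response follows the OpenAI embedding format for every provider.
print(len(response.data[0]["embedding"]))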

Below, I will show how to use LiteLLM to call the LLM APIs of OpenAI, Anthropic, and Google, introduce LiteLLM's routing mechanism, and use these three vendors' LLMs in the OpenAI Agents framework.

Completion

With LiteLLM, calling the LLM APIs of OpenAI, Anthropic, and Google is straightforward. The Python code is as follows:

import os
from dotenv import load_dotenv
from litellm import completion

load_dotenv()


# set ENV variables
os.environ["OPENAI_API_KEY"] = os.getenv("OPENAI_API_KEY")
os.environ["ANTHROPIC_API_KEY"] = os.getenv("ANTHROPIC_API_KEY")
os.environ["GEMINI_API_KEY"] = os.getenv("GEMINI_API_KEY")

messages = [{"content": "Who are you?", "role": "user"}]

# openai call
response = completion(model="openai/gpt-4o", messages=messages)
print("openai response:")
print(response.choices[0].message.content)

# anthropic call
response = completion(model="anthropic/claude-3-sonnet-20240229", messages=messages)
print("\nanthropic response:")
print(response.choices[0].message.content)

# gemini call
response = completion(model="gemini/gemini-2.0-flash", messages=messages)
print("\ngemini response:")
print(response.choices[0].message.content)

The output is as follows:

openai response:
I'm an AI language model developed by OpenAI, known as ChatGPT. I'm here to assist you with information, answer questions, and help with a wide range of topics. How can I assist you today?

anthropic response:
I am an artificial intelligence created by Anthropic. I am a large language model trained to assist with a variety of tasks like analysis, writing, research, answering questions and more. I have access to a large amount of information but I don't have subjective experiences or a physical form. I'm an AI assistant aimed at being helpful, honest and harmless.

gemini response:
I am a large language model, trained by Google.

The code above calls each model's Completion interface through one uniform API. Other features, such as async clients, streaming output, error handling, observability, and cost tracking, are likewise exposed in the OpenAI style. With this, we can use the familiar OpenAI conventions to drive every feature of every model.
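
As one example, streaming uses the same call shape for every provider. A minimal sketch, reusing a model from the example above and assuming the same API keys are set:

from litellm import completion

# stream=True yields OpenAI-style chunks for any provider.
response = completion(
    model="anthropic/claude-3-sonnet-20240229",
    messages=[{"content": "Who are you?", "role": "user"}],
    stream=True,
)
for chunk in response:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="")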

Routing Mechanism

LiteLLM's routing mechanism (Router) provides rich management features:

  • Load balancing across multiple deployments (e.g., Azure/OpenAI).
  • Prioritizing important requests (i.e., queueing) so that they do not fail.
  • Basic reliability logic: cooldowns, fallbacks, timeouts, and retries (fixed-interval and exponential backoff) across multiple deployments/providers (a configuration sketch follows this list).

These are all common concerns when calling LLMs, and the routing mechanism goes a long way toward keeping an LLM API service stable.
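
As a sketch of how these reliability options are wired up (parameter names follow the LiteLLM Router documentation; treat the exact values and the fallback alias as placeholders):

from litellm import Router

router = Router(
    model_list=model_list,  # deployments, as defined in the example below
    num_retries=3,          # retry a failed call up to 3 times
    timeout=30,             # per-request timeout in seconds
    cooldown_time=60,       # rest a deployment after repeated failures
    fallbacks=[{"gcg": ["backup-alias"]}],  # placeholder fallback alias for "gcg"
)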

Let's take the routing strategy as an example; the default is weighted random selection ("simple-shuffle"). Sample Python code:

import asyncio
from litellm import Router

# Two deployments share the alias "gcg" with equal weights.
model_list = [
    {
        "model_name": "gcg",  # model alias
        "litellm_params": {
            "model": "openai/gpt-4o",
            "api_key": "sk-proj-xxx",
            "weight": 1,
        },
    },
    {
        "model_name": "gcg",  # model alias
        "litellm_params": {
            "model": "anthropic/claude-3-5-sonnet-20240620",
            "api_key": "sk-ant-xxx",
            "weight": 1,
        },
    },
]

router = Router(model_list=model_list, routing_strategy="simple-shuffle")


async def router_acompletion(i):
    print(f"Running {i} task...")
    response = await router.acompletion(
        model="gcg",
        messages=[{"role": "user", "content": "Who are you? reply in one sentence."}],
    )
    print(f"\nmodel: {response.model}, response: {response.choices[0].message.content}")


async def run():
    tasks = [router_acompletion(i) for i in range(5)]
    await asyncio.gather(*tasks)


if __name__ == "__main__":
    asyncio.run(run())

In the five coroutines above, each coroutine calls the model aliased gcg, which is backed by two deployments with equal weights, so requests are split randomly between them (a sketch with unequal weights follows the output below).

The output is as follows:

Running 0 task...
Running 1 task...
Running 2 task...
Running 3 task...
Running 4 task...

model: claude-3-5-sonnet-20240620, response: I am an AI assistant created by Anthropic to be helpful, harmless, and honest.

model: claude-3-5-sonnet-20240620, response: I am an AI assistant created by Anthropic to be helpful, harmless, and honest.

model: claude-3-5-sonnet-20240620, response: I am an artificial intelligence created by Anthropic to engage in conversation and assist with tasks.

model: gpt-4o-2024-08-06, response: I am an AI language model developed by OpenAI, designed to assist with information and answer questions across a wide range of topics.

model: gpt-4o-2024-08-06, response: I am an AI language model created by OpenAI, here to assist you with information and answer your questions.
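
If we wanted traffic skewed instead of even, only the weights need to change. A sketch building on the model_list above (with simple-shuffle, selection probability is proportional to weight):

# Roughly three out of four requests now go to the OpenAI deployment.
model_list[0]["litellm_params"]["weight"] = 3
model_list[1]["litellm_params"]["weight"] = 1
router = Router(model_list=model_list, routing_strategy="simple-shuffle")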

LiteLLM's routing mechanism is very useful; I will cover it in detail in a separate post.

Using Any LLM with OpenAI Agents

In the OpenAI Agents framework, LiteLLM makes it possible to call any LLM. Install it with:

pip install "openai-agents[litellm]"

Sample Python code:

# -*- coding: utf-8 -*-
import os
import asyncio
import base64
import logfire
from random import choice
from datetime import datetime
from dotenv import load_dotenv
from agents import Agent, Runner, function_tool
from agents.extensions.models.litellm_model import LitellmModel


load_dotenv()

os.environ["OPENAI_API_KEY"] = os.getenv("OPENAI_API_KEY")
os.environ["ANTHROPIC_API_KEY"] = os.getenv("ANTHROPIC_API_KEY")
os.environ["GEMINI_API_KEY"] = os.getenv("GEMINI_API_KEY")

# Build Basic Auth header.
LANGFUSE_AUTH = base64.b64encode(
    f"{os.environ.get('LANGFUSE_PUBLIC_KEY')}:{os.environ.get('LANGFUSE_SECRET_KEY')}".encode()
).decode()

# Configure OpenTelemetry endpoint & headers.
os.environ["OTEL_EXPORTER_OTLP_ENDPOINT"] = os.environ.get("LANGFUSE_HOST") + "/api/public/otel"
os.environ["OTEL_EXPORTER_OTLP_HEADERS"] = f"Authorization=Basic {LANGFUSE_AUTH}"

# Configure logfire instrumentation.
logfire.configure(
    service_name='my_agent_service',
    send_to_logfire=False,
)
# This method automatically patches the OpenAI Agents SDK to send logs via OTLP to Langfuse.
logfire.instrument_openai_agents()

openai_model_name = "openai/gpt-4o"
anthropic_model_name = "anthropic/claude-3-sonnet-20240229"
gemini_model_name = "gemini/gemini-2.0-flash"

model = LitellmModel(model=openai_model_name)


@function_tool
def get_weather(city: str) -> str:
    result = ['sunny', 'cloudy', 'rainy', 'snowy']
    return f"The weather in {city} is {choice(result)}."


@function_tool
def get_now_time() -> str:
    now_time = datetime.now().strftime("%Y-%m-%d %H:%M:%S")
    return f"The current time is {now_time}."


weather_agent = Agent(
    name="Weather agent",
    instructions="You are a weather agent.",
    model=model,
    tools=[get_weather],
)

time_agent = Agent(
    name="Time agent",
    instructions="You are a time agent.",
    model=model,
    tools=[get_now_time],
)

agent = Agent(
    name="Agent",
    instructions="You are a helpful agent.",
    model=model,
    handoffs=[weather_agent, time_agent],
)


async def main():
    result1 = await Runner.run(agent, input="What's the weather in Tokyo?")
    print(result1.final_output)

    result2 = await Runner.run(agent, input="What's the time now?")
    print(result2.final_output)


if __name__ == "__main__":
    asyncio.run(main())

Taking the Gemini model as an example, the output is as follows:

13:45:12.025 OpenAI Agents trace: Agent workflow
13:45:12.026 Agent run: 'Agent'
13:45:12.027 Chat completion with 'gemini/gemini-2.0-flash' [LLM]
13:45:12.972 Handoff: Agent → None
13:45:12.973 Agent run: 'Weather agent'
13:45:12.974 Chat completion with 'gemini/gemini-2.0-flash' [LLM]
13:45:14.013 Function: get_weather
13:45:14.015 Chat completion with 'gemini/gemini-2.0-flash' [LLM]
The weather in Tokyo is cloudy.

13:45:14.615 OpenAI Agents trace: Agent workflow
13:45:14.615 Agent run: 'Agent'
13:45:14.616 Chat completion with 'gemini/gemini-2.0-flash' [LLM]
13:45:15.752 Handoff: Agent → None
13:45:15.752 Agent run: 'Time agent'
13:45:15.753 Chat completion with 'gemini/gemini-2.0-flash' [LLM]
13:45:16.420 Function: get_now_time
13:45:16.422 Chat completion with 'gemini/gemini-2.0-flash' [LLM]
Right now, it's 2025-04-25 21:45:16.

Summary

LiteLLM provides a lightweight, unified interface over multiple models, greatly lowering the technical barrier to switching between and integrating different LLMs. By mirroring OpenAI's API structure, it lets developers connect to models from vendors such as OpenAI, Anthropic, and Google with almost no changes to existing code, making it suitable for both rapid prototyping and production deployment.

Its built-in model routing offers strong support for multi-model scenarios, with strategies such as prioritization, retry on failure, and load balancing, helping developers strike a flexible balance between performance, stability, and cost. This matters especially in real applications where model quality and pricing are uncertain.

In addition, LiteLLM's compatibility with OpenAI Agents opens the door to building multi-modal, multi-model agent systems. With simple configuration, users can have an agent use a model from any vendor, further improving a system's extensibility and flexibility.

Going forward, it is worth exploring practical experience with LiteLLM in async task processing, call-chain monitoring, and cost optimization, deepening its use in production environments.

References

  1. LiteLLM - Getting Started: https://docs.litellm.ai/
  2. Using any model via LiteLLM: https://openai.github.io/openai-agents-python/models/litellm/


