This article shows how to use LiteLLM to call many mainstream LLM APIs, including OpenAI, Anthropic, and Google, in a unified OpenAI style. Concrete code demonstrates the Completion interface and the routing mechanism, as well as how to plug any model into the OpenAI Agents framework. LiteLLM greatly reduces the complexity of multi-model calls and improves the efficiency and stability of integrating LLMs from multiple providers.
Preface
When we call LLM APIs in everyday work, some are third-party APIs offered by the model vendors themselves, such as the LLM APIs of OpenAI, Anthropic, and Google; some are self-hosted APIs, such as models deployed with HuggingFace or vLLM; some come from third-party aggregators; and there are also APIs for models such as Groq and Mistral.
These calling conventions vary widely: different APIs take different request parameters and require different SDKs or HTTP interfaces. LiteLLM changes this messy, fragmented status quo by letting you call any LLM in the OpenAI request style.
LiteLLM's official site is https://docs.litellm.ai/, and its Python package currently supports calling 100+ LLMs with OpenAI-style inputs and outputs. Its main capabilities:
- Translates inputs to each provider's completion (text generation), embedding (vector embedding), and image_generation (image generation) endpoints.
- Keeps the output format consistent: the response text is always available at ['choices'][0]['message']['content'] (see the sketch after this list).
- Supports retries and fallbacks across multiple deployments (e.g., Azure/OpenAI), handled by the Router module.
- Tracks per-project spend and sets budgets via the LiteLLM Proxy Server.
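As a minimal sketch of the first two points, the same unified calling style covers embeddings as well as chat completions. The model names below are assumptions chosen for illustration; the response-indexing path follows the LiteLLM convention quoted above:

```python
from litellm import completion, embedding

# Chat completion: the reply text sits at the documented uniform path.
resp = completion(
    model="openai/gpt-4o",  # assumed model name
    messages=[{"role": "user", "content": "Hello!"}],
)
print(resp["choices"][0]["message"]["content"])

# Embedding: same style, different endpoint.
emb = embedding(model="openai/text-embedding-3-small", input=["Hello!"])  # assumed model name
print(len(emb.data[0]["embedding"]))  # vector dimension
```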
Below, I will show how to use LiteLLM to call the LLM APIs of OpenAI, Anthropic, and Google, explain LiteLLM's routing mechanism, and use these three vendors' LLMs in the OpenAI Agents framework.
Completion
With LiteLLM, calling the LLM APIs of OpenAI, Anthropic, and Google is straightforward. The Python code is as follows:
```python
import os
from dotenv import load_dotenv
from litellm import completion

load_dotenv()
os.environ["OPENAI_API_KEY"] = os.getenv("OPENAI_API_KEY")
os.environ["ANTHROPIC_API_KEY"] = os.getenv("ANTHROPIC_API_KEY")
os.environ["GEMINI_API_KEY"] = os.getenv("GEMINI_API_KEY")

messages = [{"content": "Who are you?", "role": "user"}]

response = completion(model="openai/gpt-4o", messages=messages)
print("openai response:")
print(response.choices[0].message.content)

response = completion(model="anthropic/claude-3-sonnet-20240229", messages=messages)
print("\nanthropic response:")
print(response.choices[0].message.content)

response = completion(model="gemini/gemini-2.0-flash", messages=messages)
print("\ngemini response:")
print(response.choices[0].message.content)
```
The output is as follows:
```
openai response:
I'm an AI language model developed by OpenAI, known as ChatGPT. I'm here to assist you with information, answer questions, and help with a wide range of topics. How can I assist you today?

anthropic response:
I am an artificial intelligence created by Anthropic. I am a large language model trained to assist with a variety of tasks like analysis, writing, research, answering questions and more. I have access to a large amount of information but I don't have subjective experiences or a physical form. I'm an AI assistant aimed at being helpful, honest and harmless.

gemini response:
I am a large language model, trained by Google.
```
The code above calls each model's Completion interface through a single unified convention. Other features, such as the async client, streaming output, error handling, observability, and cost tracking, are exposed in the same unified OpenAI style. With this in place, we can use the familiar OpenAI style to call every feature of every model.
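As a quick illustration of two of those features, here is a minimal sketch of the async client and streaming, reusing the same gpt-4o model name as above; acompletion and the stream=True flag are LiteLLM's documented entry points:

```python
import asyncio
from litellm import acompletion, completion

messages = [{"role": "user", "content": "Who are you?"}]

# Async client: the awaitable counterpart of completion().
async def main():
    response = await acompletion(model="openai/gpt-4o", messages=messages)
    print(response.choices[0].message.content)

asyncio.run(main())

# Streaming: iterate over chunks; the final delta may be None.
for chunk in completion(model="openai/gpt-4o", messages=messages, stream=True):
    print(chunk.choices[0].delta.content or "", end="")
```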
Routing
LiteLLM's routing mechanism (Router) provides rich management features:
- Load balancing across multiple deployments (e.g., Azure/OpenAI).
- Prioritizing important requests (i.e., queueing) so that they do not fail.
- Basic reliability logic: cooldowns, fallbacks, timeouts, and retries (fixed-interval and exponential backoff) across multiple deployments/providers.
These are all common needs when calling LLMs, and the Router goes a long way toward keeping an LLM API service stable; a configuration sketch of these reliability options follows.
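The sketch below is hedged: the parameter names follow the LiteLLM Router docs, while the API keys and the gcg/gcg-backup aliases are placeholders:

```python
from litellm import Router

model_list = [
    {"model_name": "gcg",
     "litellm_params": {"model": "openai/gpt-4o", "api_key": "sk-proj-xxx"}},
    {"model_name": "gcg-backup",
     "litellm_params": {"model": "anthropic/claude-3-5-sonnet-20240620", "api_key": "sk-ant-xxx"}},
]

router = Router(
    model_list=model_list,
    num_retries=2,                        # retry a failed call up to twice
    timeout=30,                           # per-request timeout, in seconds
    cooldown_time=60,                     # rest a failing deployment for 60s
    fallbacks=[{"gcg": ["gcg-backup"]}],  # route to the backup alias on failure
)
```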
Let's take the routing strategy as an example; the default is weighted random selection ("simple-shuffle"). Sample Python code:
```python
import os
import asyncio
from litellm import Router

model_list = [
    {
        "model_name": "gcg",
        "litellm_params": {
            "model": "openai/gpt-4o",
            "api_key": "sk-proj-xxx",
            "weight": 1,
        },
    },
    {
        "model_name": "gcg",
        "litellm_params": {
            "model": "anthropic/claude-3-5-sonnet-20240620",
            "api_key": "sk-ant-xxx",
            "weight": 1,
        },
    },
]

router = Router(model_list=model_list, routing_strategy="simple-shuffle")

async def router_acompletion(i):
    print(f"Running {i} task...")
    response = await router.acompletion(
        model="gcg",
        messages=[{"role": "user", "content": "Who are you? reply in one sentence."}],
    )
    print(f"\nmodel: {response.model}, response: {response.choices[0].message.content}")

async def run():
    tasks = [router_acompletion(i) for i in range(5)]
    await asyncio.gather(*tasks)

if __name__ == "__main__":
    asyncio.run(run())
```
Each of the five coroutines above calls the model aliased gcg, and the Router randomly picks one of the two equally weighted deployments behind that alias for each reply.
The output is as follows:
```
Running 0 task...
Running 1 task...
Running 2 task...
Running 3 task...
Running 4 task...

model: claude-3-5-sonnet-20240620, response: I am an AI assistant created by Anthropic to be helpful, harmless, and honest.

model: claude-3-5-sonnet-20240620, response: I am an AI assistant created by Anthropic to be helpful, harmless, and honest.

model: claude-3-5-sonnet-20240620, response: I am an artificial intelligence created by Anthropic to engage in conversation and assist with tasks.

model: gpt-4o-2024-08-06, response: I am an AI language model developed by OpenAI, designed to assist with information and answer questions across a wide range of topics.

model: gpt-4o-2024-08-06, response: I am an AI language model created by OpenAI, here to assist you with information and answer your questions.
```
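Besides the default simple-shuffle, the Router ships several other routing strategies. A hedged sketch, with strategy names taken from the LiteLLM docs:

```python
from litellm import Router

# Reusing the model_list from the example above; only the strategy changes.
router = Router(
    model_list=model_list,
    routing_strategy="latency-based-routing",  # or "least-busy",
                                               # "usage-based-routing",
                                               # "simple-shuffle" (the default)
)
```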
LiteLLM's routing mechanism is very useful; I will cover it in more detail in a dedicated post.
Using Any LLM in OpenAI Agents
Within the OpenAI Agents framework, LiteLLM can be used to call any LLM. Install the integration with:
```bash
pip install "openai-agents[litellm]"
```
Sample Python code:
```python
import os
import asyncio
import base64
import logfire
from random import choice
from datetime import datetime
from dotenv import load_dotenv
from agents import Agent, Runner, function_tool
from agents.extensions.models.litellm_model import LitellmModel

load_dotenv()
os.environ["OPENAI_API_KEY"] = os.getenv("OPENAI_API_KEY")
os.environ["ANTHROPIC_API_KEY"] = os.getenv("ANTHROPIC_API_KEY")
os.environ["GEMINI_API_KEY"] = os.getenv("GEMINI_API_KEY")

# Export traces to Langfuse over OTLP; logfire instruments OpenAI Agents.
LANGFUSE_AUTH = base64.b64encode(
    f"{os.environ.get('LANGFUSE_PUBLIC_KEY')}:{os.environ.get('LANGFUSE_SECRET_KEY')}".encode()
).decode()
os.environ["OTEL_EXPORTER_OTLP_ENDPOINT"] = os.environ.get("LANGFUSE_HOST") + "/api/public/otel"
os.environ["OTEL_EXPORTER_OTLP_HEADERS"] = f"Authorization=Basic {LANGFUSE_AUTH}"

logfire.configure(
    service_name='my_agent_service',
    send_to_logfire=False,
)
logfire.instrument_openai_agents()

openai_model_name = "openai/gpt-4o"
anthropic_model_name = "anthropic/claude-3-sonnet-20240229"
gemini_model_name = "gemini/gemini-2.0-flash"
model = LitellmModel(model=openai_model_name)

@function_tool
def get_weather(city: str) -> str:
    result = ['sunny', 'cloudy', 'rainy', 'snowy']
    return f"The weather in {city} is {choice(result)}."

@function_tool
def get_now_time() -> str:
    now_time = datetime.now().strftime("%Y-%m-%d %H:%M:%S")
    return f"The current time is {now_time}."

weather_agent = Agent(
    name="Weather agent",
    instructions="You are a weather agent.",
    model=model,
    tools=[get_weather],
)

time_agent = Agent(
    name="Time agent",
    instructions="You are a time agent.",
    model=model,
    tools=[get_now_time],
)

# Triage agent: hands off to the weather or time agent as appropriate.
agent = Agent(
    name="Agent",
    instructions="You are a helpful agent.",
    model=model,
    handoffs=[weather_agent, time_agent],
)

async def main():
    result1 = await Runner.run(agent, input="What's the weather in Tokyo?")
    print(result1.final_output)
    result2 = await Runner.run(agent, input="What's the time now?")
    print(result2.final_output)

if __name__ == "__main__":
    asyncio.run(main())
```
Taking the Gemini model as an example, the output is as follows:
```
13:45:12.025 OpenAI Agents trace: Agent workflow
13:45:12.026 Agent run: 'Agent'
13:45:12.027 Chat completion with 'gemini/gemini-2.0-flash' [LLM]
13:45:12.972 Handoff: Agent → None
13:45:12.973 Agent run: 'Weather agent'
13:45:12.974 Chat completion with 'gemini/gemini-2.0-flash' [LLM]
13:45:14.013 Function: get_weather
13:45:14.015 Chat completion with 'gemini/gemini-2.0-flash' [LLM]
The weather in Tokyo is cloudy.
13:45:14.615 OpenAI Agents trace: Agent workflow
13:45:14.615 Agent run: 'Agent'
13:45:14.616 Chat completion with 'gemini/gemini-2.0-flash' [LLM]
13:45:15.752 Handoff: Agent → None
13:45:15.752 Agent run: 'Time agent'
13:45:15.753 Chat completion with 'gemini/gemini-2.0-flash' [LLM]
13:45:16.420 Function: get_now_time
13:45:16.422 Chat completion with 'gemini/gemini-2.0-flash' [LLM]
Right now, it's 2025-04-25 21:45:16.
```
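Switching providers only requires pointing LitellmModel at a different model name; the agents, tools, and handoffs stay unchanged. The Gemini run above corresponds to a one-line change in the example:

```python
from agents.extensions.models.litellm_model import LitellmModel

# Same model names as defined in the example above; only this line changes.
model = LitellmModel(model="gemini/gemini-2.0-flash")
```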
Summary
LiteLLM provides a lightweight, unified multi-model interface that greatly lowers the technical barrier to switching between and integrating different LLMs. Because it mirrors the OpenAI API structure, developers can connect to models from OpenAI, Anthropic, Google, and other vendors with almost no changes to existing code, which suits both rapid prototyping and production deployment.
Its built-in model routing offers strong support for multi-model scenarios, with strategies such as request prioritization, failure retries, and load balancing that let developers trade off performance, stability, and cost flexibly. This matters especially in real applications, where model quality and cost are uncertain.
In addition, LiteLLM's compatibility with OpenAI Agents opens a path toward multi-modal, multi-model collaborative agent systems. With simple configuration, an Agent can use models from any vendor, further improving system extensibility and flexibility.
Future work could explore practical experience with LiteLLM in async task handling, call-chain observability, and cost optimization, deepening its use in production environments.
References
LiteLLM - Getting Started: https://docs.litellm.ai/
Using any model via LiteLLM: https://openai.github.io/openai-agents-python/models/litellm/