NLP (113): Using Langfuse to Improve the Observability of LLMs and Agents

This article describes how to use Langfuse to improve the observability of large language models (LLMs) and Agents.

When working with LLMs and Agents day to day, we often need to track their responses and costs. Langfuse lets us dramatically improve the observability of LLMs and Agents, enabling finer-grained monitoring and optimization.

Introduction

Langfuse is an open-source engineering platform for large language models (LLMs), focused on giving developers and researchers a flexible and efficient environment for building with LLMs. It aims to address the engineering challenges of LLM applications, covering model training, deployment, monitoring, and optimization.

Key features:

  • Highly customizable: Langfuse offers rich configuration options and a flexible API, letting users tailor LLM functionality and performance to their actual needs.

  • Efficient resource management: users can easily manage and schedule compute resources, improving utilization.

  • Complete monitoring and operations: built-in monitoring and ops tooling tracks an LLM application's runtime status and performance metrics in real time.

  • Broad functionality: LLM observability, prompt management, LLM evaluation, dataset management, LLM metrics and analytics, and more.

Use cases

Langfuse is suited to building production-grade LLM applications, especially where custom conversational systems, machine translation systems, and similar applications need to be developed and optimized quickly. It supports multiple programming languages and frameworks, lowering the barrier to entry so that even beginners can get started fast.

Below, I will look at how Langfuse improves the observability of LLMs and of Agents, in turn.

Langfuse with LLMs

First, create a Langfuse account and generate the LANGFUSE_SECRET_KEY, LANGFUSE_PUBLIC_KEY, and LANGFUSE_HOST variables. We put these, together with the LLM API keys, into environment variables.
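
A minimal .env sketch of what the examples below assume (all key values are placeholders; https://cloud.langfuse.com is Langfuse's hosted endpoint, so substitute your own URL if you self-host):

```
LANGFUSE_SECRET_KEY=sk-lf-...
LANGFUSE_PUBLIC_KEY=pk-lf-...
LANGFUSE_HOST=https://cloud.langfuse.com
OPENAI_API_KEY=sk-...
ANTHROPIC_API_KEY=sk-ant-...
```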

  • Langfuse with OpenAI

Langfuse has built-in OpenAI support, so the integration is very simple:

```python
from dotenv import load_dotenv
from langfuse.decorators import observe
from langfuse.openai import openai  # drop-in wrapper around the OpenAI SDK

load_dotenv()


@observe()
def story():
    response = openai.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": "What is Langfuse?"}],
    )
    return response.choices[0].message.content


@observe()
def main():
    return story()


main()
```

Langfuse provides a Tracing feature that helps us track an LLM's responses and costs.

In the resulting trace view, we can see that the call path is main() -> story() -> OpenAI-generation, together with the model used, the call parameters, the response time, the input and output token counts, the token cost, and more, all of which is a great help in understanding how the LLM responds.
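
One practical note: the decorator-based SDK sends trace data to Langfuse asynchronously in the background, so short-lived scripts should flush before exiting. A minimal sketch using langfuse_context from the same module as observe:

```python
from langfuse.decorators import langfuse_context

main()
# Block until all buffered trace events have been delivered to Langfuse.
langfuse_context.flush()
```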

  • Langfuse with any LLM

More powerfully still, Langfuse can trace any LLM. Taking Anthropic's Claude models as an example:

```python
import os

import anthropic
from dotenv import load_dotenv
from langfuse.decorators import observe, langfuse_context

load_dotenv()

anthropic_client = anthropic.Anthropic(api_key=os.getenv("ANTHROPIC_API_KEY"))


# Wrap the LLM call with the decorator and mark it as a generation.
@observe(as_type="generation")
def anthropic_completion(**kwargs):
    # Optional: extract some fields from kwargs to enrich the observation.
    kwargs_clone = kwargs.copy()
    _input = kwargs_clone.pop('messages', None)
    model = kwargs_clone.pop('model', None)
    langfuse_context.update_current_observation(
        input=_input,
        model=model,
        metadata=kwargs_clone
    )

    response = anthropic_client.messages.create(**kwargs)

    # See the docs for more details on token counts and USD cost in Langfuse:
    # https://langfuse.com/docs/model-usage-and-cost
    langfuse_context.update_current_observation(
        usage_details={
            "input": response.usage.input_tokens,
            "output": response.usage.output_tokens
        }
    )
    return response.content[0].text


@observe()
def main():
    return anthropic_completion(
        model="claude-3-opus-20240229",
        max_tokens=1024,
        messages=[
            {"role": "user", "content": "Hello, Claude"}
        ]
    )


main()
```

From the code above, we can see that the trace records the messages, the model, the input and output token counts, and the remaining parameters.
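
Because the wrapper forwards **kwargs straight to messages.create, any other argument of the Anthropic API, such as a system prompt, can be passed through unchanged; everything except messages and model lands in the observation's metadata. A hypothetical call:

```python
# `system` is a standard Anthropic API parameter; since the wrapper only pops
# `messages` and `model`, it is recorded in the observation's metadata.
anthropic_completion(
    model="claude-3-opus-20240229",
    max_tokens=512,
    system="Answer in one short sentence.",
    messages=[{"role": "user", "content": "What is Langfuse?"}],
)
```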

  • Custom configuration

In Langfuse you can set the name of an LLM call, its trace_id, its session_id, and other attributes yourself. Here is an example using Langfuse's low-level SDK:

```python
import os
from uuid import uuid4

from dotenv import load_dotenv
from langfuse import Langfuse
from openai import OpenAI

load_dotenv()


def story(**kwargs):
    langfuse = Langfuse(environment="development")
    trace = langfuse.trace(
        id=kwargs.get("langfuse_observation_id"),
        name=kwargs.get("name"),
        tags=kwargs.get("tags"),
        session_id=kwargs.get("session_id")
    )

    model_name = "gpt-4o"
    max_tokens = 1000
    temperature = 0.5
    messages = [{"role": "user", "content": "What is Langfuse?"}]
    # Create a generation under the trace.
    generation = trace.generation(
        name="my-first-generation",
        model=model_name,
        model_parameters={"maxTokens": max_tokens, "temperature": temperature},
        input=messages
    )
    # Create the chat completion.
    client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))
    response = client.chat.completions.create(
        model=model_name,
        messages=messages
    )
    # Update the generation and set its end_time.
    generation.end(
        output=response.choices[0].message.content,
        usage_details=response.usage
    )

    return response.choices[0].message.content


def main():
    name = "My first trace v5"
    custom_observation_id = str(uuid4())
    session_id = str(uuid4())
    tags = ["langfuse", "openai", "gpt-4o"]
    print("trace_id: ", custom_observation_id)
    print("session_id: ", session_id)
    return story(langfuse_observation_id=custom_observation_id, name=name,
                 tags=tags, session_id=session_id)


main()
```

The resulting trace can then be inspected in the Langfuse UI.
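
If you prefer to stay with the @observe decorator rather than the low-level SDK, the same attributes can also be set from inside an observed function via langfuse_context.update_current_trace. A minimal sketch, reusing names from the example above (the session id here is a hypothetical placeholder):

```python
from langfuse.decorators import observe, langfuse_context
from langfuse.openai import openai


@observe()
def story():
    # Attach a custom name, session id and tags to the trace created by @observe.
    langfuse_context.update_current_trace(
        name="My first trace v5",
        session_id="your-session-id",  # hypothetical placeholder
        tags=["langfuse", "openai", "gpt-4o"],
    )
    response = openai.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": "What is Langfuse?"}],
    )
    return response.choices[0].message.content
```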

Langfuse with Agents

A few days ago, OpenAI open-sourced its multi-agent framework openai-agents-python, which makes it easy to build multi-agent applications. Install it with:

```bash
pip install openai-agents
```
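
The tracing examples below also use Pydantic's logfire package to export OpenTelemetry spans to Langfuse, so install it as well:

```bash
pip install logfire
```
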
  • Single Agent

Starting with a single agent, the Langfuse-instrumented code is as follows:

```python
import os
import base64
import logfire
import asyncio
from dotenv import load_dotenv
from agents import Agent, Runner

load_dotenv()

# Build the Basic Auth header.
LANGFUSE_AUTH = base64.b64encode(
    f"{os.environ.get('LANGFUSE_PUBLIC_KEY')}:{os.environ.get('LANGFUSE_SECRET_KEY')}".encode()
).decode()

# Configure the OpenTelemetry endpoint & headers.
os.environ["OTEL_EXPORTER_OTLP_ENDPOINT"] = os.environ.get("LANGFUSE_HOST") + "/api/public/otel"
os.environ["OTEL_EXPORTER_OTLP_HEADERS"] = f"Authorization=Basic {LANGFUSE_AUTH}"

# Configure logfire instrumentation.
logfire.configure(
    service_name='my_agent_service',
    send_to_logfire=False
)
# This automatically patches the OpenAI Agents SDK to send logs via OTLP to Langfuse.
logfire.instrument_openai_agents()


async def main():
    agent = Agent(
        name="Assistant",
        instructions="You are a helpful assistant.",
    )
    result = await Runner.run(agent, "What is the capital of France?")
    print(result.final_output)


if __name__ == "__main__":
    asyncio.run(main())
```

The output is:

```
13:34:39.454 OpenAI Agents trace: Agent workflow
13:34:39.455 Agent run: 'Assistant'
13:34:39.460 Responses API with 'gpt-4o'
The capital of France is Paris.
```

The resulting trace is shown in the Langfuse UI (figure: single agent).

  • Multi Agents

Moving on to multiple agents (this example builds three agents, handling Chinese-to-English translation, English-to-Chinese translation, and agent selection, respectively), the Langfuse-instrumented code is:

```python
# -*- coding: utf-8 -*-
import os
import base64
import logfire
import asyncio
from dotenv import load_dotenv
from agents import Agent, Runner


load_dotenv()

# Build the Basic Auth header.
LANGFUSE_AUTH = base64.b64encode(
    f"{os.environ.get('LANGFUSE_PUBLIC_KEY')}:{os.environ.get('LANGFUSE_SECRET_KEY')}".encode()
).decode()

# Configure the OpenTelemetry endpoint & headers.
os.environ["OTEL_EXPORTER_OTLP_ENDPOINT"] = os.environ.get("LANGFUSE_HOST") + "/api/public/otel"
os.environ["OTEL_EXPORTER_OTLP_HEADERS"] = f"Authorization=Basic {LANGFUSE_AUTH}"

# Configure logfire instrumentation.
logfire.configure(
    service_name='my_agent_service',
    send_to_logfire=False
)
# This automatically patches the OpenAI Agents SDK to send logs via OTLP to Langfuse.
logfire.instrument_openai_agents()

zh2en_agent = Agent(
    name="Chinese to English agent",
    instructions="You are a translator from Chinese to English.",
)

en2zh_agent = Agent(
    name="English to Chinese agent",
    instructions="You are a translator from English to Chinese.",
)

translation_agent = Agent(
    name="Translation agent",
    instructions="You are a translation agent. If the input is in Chinese, translate it to English."
                 " If the input is in English, translate it to Chinese.",
    handoffs=[zh2en_agent, en2zh_agent],
)


async def main():
    result = await Runner.run(translation_agent, input="The Shawshank Redemption")
    print(result.final_output)


if __name__ == "__main__":
    asyncio.run(main())
```

The output is as follows:

```
13:38:56.614 OpenAI Agents trace: Agent workflow
13:38:56.615 Agent run: 'Translation agent'
13:38:56.620 Responses API with 'gpt-4o'
13:38:58.060 Handoff: Translation agent -> None
13:38:58.061 Agent run: 'English to Chinese agent'
13:38:58.061 Responses API with 'gpt-4o'
《肖申克的救赎》
```

The resulting trace is shown in the Langfuse UI (figure: multi agents).

  • Multi Agents with function calling

Alongside multiple agents, the openai-agents-python framework also supports tool use (function calling). As an example, a Weather Agent and a Time Agent call the get_weather and get_now_time functions, respectively:

```python
import os
import base64
import logfire
import asyncio
from datetime import datetime
from dotenv import load_dotenv
from agents import Agent, Runner, function_tool


load_dotenv()

# Build the Basic Auth header.
LANGFUSE_AUTH = base64.b64encode(
    f"{os.environ.get('LANGFUSE_PUBLIC_KEY')}:{os.environ.get('LANGFUSE_SECRET_KEY')}".encode()
).decode()

# Configure the OpenTelemetry endpoint & headers.
os.environ["OTEL_EXPORTER_OTLP_ENDPOINT"] = os.environ.get("LANGFUSE_HOST") + "/api/public/otel"
os.environ["OTEL_EXPORTER_OTLP_HEADERS"] = f"Authorization=Basic {LANGFUSE_AUTH}"

# Configure logfire instrumentation.
logfire.configure(
    service_name='my_agent_service',
    send_to_logfire=False
)
# This automatically patches the OpenAI Agents SDK to send logs via OTLP to Langfuse.
logfire.instrument_openai_agents()


@function_tool
def get_weather(city: str) -> str:
    return f"The weather in {city} is sunny."


@function_tool
def get_now_time() -> str:
    now_time = datetime.now().strftime("%Y-%m-%d %H:%M:%S")
    return f"The current time is {now_time}."


weather_agent = Agent(
    name="Weather agent",
    instructions="You are a weather agent.",
    tools=[get_weather],
)

time_agent = Agent(
    name="Time agent",
    instructions="You are a time agent.",
    tools=[get_now_time],
)

agent = Agent(
    name="Agent",
    instructions="You are a helpful agent.",
    handoffs=[weather_agent, time_agent],
)


async def main():
    result1 = await Runner.run(agent, input="What's the weather in Tokyo?")
    print(result1.final_output)
    # The weather in Tokyo is sunny.
    result2 = await Runner.run(agent, input="What's the time now?")
    print(result2.final_output)


if __name__ == "__main__":
    asyncio.run(main())
```

The agent invocations are shown in the Langfuse UI (figure: multi agents with function calling).

The tool-call outputs are likewise captured (figure: function output).

Summary

This article showed how to use Langfuse to improve the observability of large language models (LLMs) and multi-agent systems. As an open-source LLM engineering platform, Langfuse provides powerful Tracing, letting developers monitor the key aspects of an LLM call in detail: the call path, inputs and outputs, response time, and cost. We integrated Langfuse into OpenAI and Anthropic LLM calls, then used custom configuration to enhance observability further.

We also looked at combining Langfuse with the OpenAI Agents framework, covering single-agent, multi-agent, and tool-use (function calling) scenarios. With Langfuse, developers can clearly follow an agent's decision path, its interactions, and its external API calls, making it easier to optimize the stability and performance of LLM applications.

Of course, Langfuse can do far more than what is covered here, and I will keep exploring it in future posts.

Welcome to follow my WeChat official account NLP奇幻之旅, where my original technical articles are published first.

You are also welcome to join my Knowledge Planet community 自然语言处理奇幻之旅, where I am working hard to build my own technical community.

