name: buttercup-langfuse

description: Buttercup-specific Langfuse wiring. Use when adding LLM tracing to a new LangChain/LangGraph entry point in this repo, attaching tags/metadata to existing chains, troubleshooting "no traces in Langfuse" against this codebase, or touching the Helm/values surface (`global.langfuse.*`, `langfuse-secret.yaml`, `common-env.yaml`). For platform-level Langfuse usage (CLI, instrumentation patterns generally, prompt migration, SDK upgrades), use the `langfuse` skill instead.

Langfuse wiring in Buttercup

This skill covers how Buttercup integrates Langfuse. For general Langfuse platform, CLI, or instrumentation guidance, defer to the langfuse skill; only Buttercup-local conventions are covered here.

Tracing is always optional at runtime: if the env vars are unset, the keys are rejected, or the host is unreachable, callbacks resolve to an empty list and code paths are unaffected. Follow the existing pattern and do not introduce a parallel integration.

The one helper to use

buttercup.common.llm.get_langfuse_callbacks() -> list[BaseCallbackHandler]

Defined in common/src/buttercup/common/llm.py
Decorated with @functools.cache, so it is cheap to call repeatedly: the auth/connectivity probe runs only once per process
Returns [] when Langfuse is disabled, misconfigured, or unreachable
Internally: runs is_langfuse_available() (checks LANGFUSE_HOST, then HTTP-probes /api/public/ingestion), constructs langfuse.langchain.CallbackHandler(), then verifies credentials via langfuse_auth_check()

Do not instantiate CallbackHandler directly or read the env vars yourself. Always go through get_langfuse_callbacks().
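The contract described above (cached, and an empty list whenever tracing is unavailable) can be illustrated with a stdlib-only sketch. This is not the code in common/src/buttercup/common/llm.py: `get_callbacks_sketch` and `_probe_host` are hypothetical stand-ins, the HTTP probe is replaced by a stub that always fails, and the sketch assumes the decorator is the standard-library `functools.cache`.

```python
import functools
import os

PROBE_CALLS = 0  # counts how often the stubbed probe actually runs


def _probe_host(host: str) -> bool:
    """Hypothetical stand-in for the real connectivity/auth probe."""
    global PROBE_CALLS
    PROBE_CALLS += 1
    return False  # pretend the host is unreachable


@functools.cache
def get_callbacks_sketch() -> list:
    """Illustrative only: return [] when tracing is disabled or unreachable."""
    host = os.environ.get("LANGFUSE_HOST")
    if not host:
        return []  # not configured: tracing becomes a no-op
    if not _probe_host(host):
        return []  # unreachable: still a no-op, callers are unaffected
    return ["<callback handler would go here>"]


# Placeholder host used only for this demonstration.
os.environ["LANGFUSE_HOST"] = "http://langfuse.invalid"
first = get_callbacks_sketch()
second = get_callbacks_sketch()  # served from the cache, no second probe
```

Because the result is cached, callers can invoke the helper wherever a chain is compiled without paying for the probe again. The flip side is that the first answer sticks for the lifetime of the process.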

Standard wiring pattern

Attach the callbacks via RunnableConfig on the compiled chain/graph. Canonical examples in the repo:

patcher/src/buttercup/patcher/agents/leader.py — LangGraph with tags + metadata
seed-gen/src/buttercup/seed_gen/seed_explore.py — minimal LangGraph
seed-gen/src/buttercup/seed_gen/task.py, vuln_base_task.py, seed_init.py — variants

Minimal form:

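from langchain_core.runnables import RunnableConfig
from buttercup.common.llm import get_langfuse_callbacks

# Returns [] when Langfuse is disabled, so it is safe to call unconditionally.
llm_callbacks = get_langfuse_callbacks()
# Attach tags and callbacks once, on the compiled graph, not on individual LLM calls.
chain = workflow.compile().with_config(
    RunnableConfig(tags=["<short-task-name>"], callbacks=llm_callbacks),
)
chain.invoke(state)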

Rich form (patcher-style) — tags for filtering, metadata for searchable fields in the Langfuse UI:

chain = patch_team.compile().with_config(
    RunnableConfig(
        callbacks=llm_callbacks,
        tags=["patch_team", challenge.name, task_id, internal_patch_id],
        metadata={
            "task_id": task_id,
            "internal_patch_id": internal_patch_id,
            "challenge_project_name": challenge.name,
        },
        recursion_limit=RECURSION_LIMIT,
        configurable={...},
    ),
)

Conventions:

First tag is the workflow/team name ("patch_team", "seed-explore", …). Identifies the kind of run in the Langfuse UI.
Subsequent tags are high-cardinality identifiers (task_id, challenge name, patch id) for drill-down.
metadata mirrors the identifier tags as structured fields — tags are best for filtering, metadata for inspection.
Don't reach for langfuse.observe() decorators or other SDK entry points — this codebase routes everything through the LangChain callback handler.
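The tag and metadata conventions above can be captured in a small helper. This is only a sketch: `build_trace_fields` is a hypothetical name that does not exist in the repo, and the identifier values in the usage lines are made-up placeholders.

```python
def build_trace_fields(
    workflow: str, **identifiers: str
) -> tuple[list[str], dict[str, str]]:
    """Hypothetical helper following the conventions described above.

    The workflow/team name is the first tag, the identifier values follow
    as further tags, and metadata mirrors the identifiers as named fields.
    """
    tags = [workflow, *identifiers.values()]
    metadata = dict(identifiers)
    return tags, metadata


# Placeholder values for illustration only.
tags, metadata = build_trace_fields(
    "patch_team",
    challenge_project_name="example-project",
    task_id="task-123",
    internal_patch_id="patch-7",
)
```

The two return values would then be passed as `tags=` and `metadata=` to `RunnableConfig`, alongside the callbacks.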

When you're adding a NEW LangChain/LangGraph entry point

Import get_langfuse_callbacks from buttercup.common.llm.
Call it once near where you compile the chain/graph.
Pass the result in via RunnableConfig(callbacks=..., tags=[...], metadata={...}).
Pick a workflow tag that doesn't collide with existing ones (patch_team, seed-explore, seed-init, vuln-discovery, …).
If the component runs in k8s, confirm its Deployment template pulls in the Langfuse env block (see below). Most existing components already do.

Deployment / config surface

Env vars (consumed by get_langfuse_callbacks()):

LANGFUSE_HOST — base URL, e.g. https://cloud.langfuse.com
LANGFUSE_PUBLIC_KEY (pk-lf-...)
LANGFUSE_SECRET_KEY (sk-lf-...)
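When debugging from a shell inside a container, a quick preflight over these three variables can rule out the simplest misconfigurations. The sketch below is a debugging aid only; application code should still go through get_langfuse_callbacks(). `check_langfuse_env` is a hypothetical name, and the prefix checks are assumptions drawn from the examples above (an http(s) base URL, `pk-lf-` and `sk-lf-` keys).

```python
import os
from typing import Mapping

# Expected prefixes per variable (assumed from the examples in this document).
EXPECTED_PREFIXES = {
    "LANGFUSE_HOST": ("http://", "https://"),
    "LANGFUSE_PUBLIC_KEY": ("pk-lf-",),
    "LANGFUSE_SECRET_KEY": ("sk-lf-",),
}


def check_langfuse_env(env: Mapping[str, str]) -> list[str]:
    """Return human-readable problems; an empty list means nothing obvious is wrong.

    Values are never included in the messages, so secrets are not leaked.
    """
    problems = []
    for name, prefixes in EXPECTED_PREFIXES.items():
        value = env.get(name, "")
        if not value:
            problems.append(f"{name} is unset")
        elif not value.startswith(prefixes):
            problems.append(f"{name} does not start with {' or '.join(prefixes)}")
    return problems


problems = check_langfuse_env(os.environ)
```

An empty result only says the variables look plausible; it does not prove the host is reachable or that the keys are valid.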

Where they're set:

Local docker-compose: dev/docker-compose/env.template
Local make-based deploy: deployment/env.template (gated by LANGFUSE_ENABLED)
Kubernetes: deployment/k8s/values.yaml under global.langfuse.{enabled,host,publicKey,secretKey}
- Rendered into the <release>-langfuse-secrets Secret by deployment/k8s/templates/langfuse-secret.yaml
- Injected into pods via the buttercup.env.langfuse helper in deployment/k8s/templates/common-env.yaml
- A new component's Deployment template must include that helper to get traces

Python dependency: langfuse ~=4.0.1, declared under [project.optional-dependencies] full in common/pyproject.toml. Components that emit traces depend on common[full] (as patcher and seed-gen already do). Don't pin langfuse independently; reuse the extra.

Debugging "I don't see traces"

Walk the chain top-to-bottom; the first failing step short-circuits everything downstream:

LANGFUSE_HOST unset. Logs: "LangFuse not configured". Set the env var.
Host unreachable / not Langfuse. is_langfuse_available() POSTs to /api/public/ingestion and expects HTTP 401 (unauthenticated). Anything else → False. Check network reachability from the pod and that the URL is correct.
Bad keys. Logs: "LangFuse authentication failed". langfuse_auth_check() POSTs the same endpoint with basic auth and expects HTTP 400 (authenticated but bad payload). 401 means wrong keys. Rotate the secret and re-apply.
Callbacks not attached. get_langfuse_callbacks() returned a handler, but the runnable was invoked without RunnableConfig(callbacks=...). Grep the call site — easy to miss when refactoring.
Wrong runnable. Only LangChain/LangGraph runnables route through callbacks. Direct HTTP calls to OpenAI/LiteLLM bypass Langfuse entirely. If the new code path uses httpx.post(...) or similar, it won't trace — that's expected.
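Cache staleness. All three helpers (is_langfuse_available(), langfuse_auth_check(), get_langfuse_callbacks()) are decorated with @functools.cache, so their results are fixed per process. If env vars change at runtime, the process must restart. In k8s this means rolling the pod after a secret change.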
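The first three checks in this list can be summarized as a small decision function over the two probe responses. This is a sketch for reasoning about the failure modes, not the repo's implementation: `diagnose` is a hypothetical name, the status codes are supplied by the caller rather than fetched, and the "unexpected response" branch is an assumption for codes the text above does not cover.

```python
from typing import Optional


def diagnose(
    host: Optional[str],
    unauth_status: Optional[int],
    auth_status: Optional[int],
) -> str:
    """Map probe results to the first failing step, in top-to-bottom order.

    unauth_status: HTTP status of the unauthenticated POST (None if no response).
    auth_status:   HTTP status of the POST with basic auth (None if no response).
    """
    if not host:
        return "LANGFUSE_HOST unset"
    if unauth_status != 401:
        # The unauthenticated probe must be rejected with 401.
        return "host unreachable or not Langfuse"
    if auth_status == 401:
        return "bad keys"
    if auth_status != 400:
        # Assumption: anything other than 400 is treated as a failure here.
        return "unexpected response to authenticated probe"
    return "probe ok: check that callbacks are attached to the runnable"


result = diagnose("https://cloud.langfuse.com", 401, 400)
```

Once both probes pass, the remaining causes (callbacks not attached, wrong runnable, stale cache) have to be found at the call site rather than in the network path.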

What NOT to do

Don't add langfuse as a direct dependency in a sibling component's pyproject.toml. Depend on common[full] instead (as patcher and seed-gen do) so it arrives via the extra.
Don't gate code paths on is_langfuse_available() — the empty-callback-list pattern already makes Langfuse a no-op when disabled.
Don't pass the callback list into individual llm.invoke() calls when there's a parent chain — attach at the highest reasonable scope (the compiled workflow) so child runs nest correctly in the Langfuse UI.
Don't conflate Langfuse with OpenTelemetry. They coexist: Langfuse for LLM trace nesting/prompt inspection, OTel (buttercup.common.telemetry) for system-level spans. seed_explore.py shows both wrapped around the same chain.invoke(...).