Test your AI agent with simulation-based scenarios. Covers writing scenario test code (Scenario SDK), creating platform scenarios via the `langwatch` CLI, and red teaming for security vulnerabilities. Auto-detects whether to use the code or the platform approach based on context.

2026-07-22

agent-improve

software-developers

Turns production evidence into tested improvements for your AI agent. Forms hypotheses from real traces and analytics, explains the reasoning behind each one, then executes with the user: scenario tests that reproduce production failures, prompt and code changes as reviewable PRs, new evaluators and monitors that capture production signals, and experiments that settle open questions. Use when you want to know what to do next to improve your agent.

2026-07-21

agent-performance

software-developers

Deep-dive diagnosis of how your AI agent behaves in production. Explores LangWatch analytics and traces end to end to map failure patterns, dissatisfied users, token cost hotspots, edge cases, behavior changes, and outliers, then delivers an HTML report where every finding links to real example traces. Use when you want to truly understand what your agent is doing in production.

2026-07-21

debug-with-langwatch

software-developers

Root-cause production errors and misbehaving agent runs with LangWatch. Finds errored traces, inspects spans, checks monitor and evaluator scores, then narrows to a root cause. Use when something is failing or misbehaving in production — errors, bad answers, latency spikes.

2026-07-21

eval-triage

software-developers

Investigate failing experiments and evaluations with LangWatch. Triage a failing experiment run to the exact rows and evaluator scores that regressed, then to a root cause. Use when an experiment fails, scores drop, or evaluations regress.

2026-07-21

setup-lw

network-and-computer-systems-administrators

Set up and troubleshoot the LangWatch CLI — login (cloud and self-hosted), endpoint configuration, project selection, and connection problems. Use when the CLI isn't authenticated, can't reach LangWatch, or talks to the wrong project.

2026-07-21

debug-instrumentation

software-developers

Debug and improve your LangWatch traces. Inspects production traces for missing input/output, disconnected spans, unlabeled traces, and missing metadata. Use when traces look broken or incomplete.

2026-07-20

evaluations

computer-occupations-all-other

Compatibility router for LangWatch evaluation requests. Use only when the user asks for evaluations without making it clear whether they mean pre-deployment experiments or production online evaluations. Routes the request to the focused companion skill and does not implement either workflow itself.

2026-07-20

Showing top 8 of 26 collected skills in this repository.

#002

skills

6 skills · 21 · updated 2026-07-17

17% of creator

| skill | occupation | description | updated |
| --- | --- | --- | --- |
| scenarios | software-quality-assurance-analysts-and-testers | | 2026-07-17 |
| analytics | software-developers | Analyze your AI agent's performance using LangWatch analytics. Use when the user wants to understand costs, latency, error rates, usage trends, or debug specific traces. Works with any LangWatch-instrumented agent. | 2026-06-11 |
| level-up | software-developers | Take your AI agent to the next level with full LangWatch integration. Adds tracing, prompt versioning, evaluation experiments, and simulation tests in one go. Use when the user wants comprehensive observability, testing, and prompt management for their agent. | 2026-06-11 |
| prompts | software-developers | Version and manage your agent's prompts with LangWatch Prompts CLI. Use for both onboarding (set up prompt versioning for an entire codebase) and targeted operations (version a specific prompt, create a new prompt version). Supports Python and TypeScript. | 2026-06-11 |
| debug-instrumentation | software-developers | Debug and improve your LangWatch traces. Inspects production traces for missing input/output, disconnected spans, unlabeled traces, and missing metadata. Use when traces look broken or incomplete. | 2026-04-24 |
| improve-setup | software-developers | Expert AI engineering consultant for your LangWatch setup. Audits your codebase, traces, evaluations, and scenarios, then guides you to improve — starting from low-hanging fruit and going deeper. Use when you want to level up your agent's engineering quality. | 2026-04-24 |

#003

tokenmaxxer

3 skills · 40 · updated 2026-06-12

8.3% of creator

| skill | occupation | description | updated |
| --- | --- | --- | --- |
| evaluations | software-developers | Set up comprehensive evaluations for your AI agent with LangWatch — experiments (batch testing), evaluators (scoring functions), datasets, online evaluation (production monitoring), and guardrails (real-time blocking). Supports both code (SDK) and platform (CLI) approaches. Use when the user wants to evaluate, test, benchmark, monitor, or safeguard their agent. | 2026-06-12 |
| scenarios | software-quality-assurance-analysts-and-testers | | 2026-06-12 |
| tracing | software-developers | Add LangWatch tracing and observability to your code. Use for both onboarding (instrument an entire codebase) and targeted operations (add tracing to a specific function or module). Supports Python and TypeScript with all major frameworks. | 2026-06-12 |

#004

kanban-code

1 skill · 29731 · updated 2026-04-18

2.8% of creator

| skill | occupation | description | updated |
| --- | --- | --- | --- |
| kanban-code | computer-occupations-all-other | Inspect cards, orchestrate Claude sessions, and chat with other agents over Kanban Code's channels. Use whenever the user mentions Kanban Code, asks you to coordinate with another running Claude, or is working inside a card's tmux session and wants to use the `kanban` CLI. Covers channels (Slack-like rooms), DMs, handles, and history. | 2026-04-18 |

Showing 4 of 4 repositories
