llmobs-testing

Name: Llmobs Testing
Author: DataDog

Use when writing, modifying, or debugging tests for an LLMObs plugin in dd-trace-js. Triggers: "write LLMObs tests", "test an LLMObs plugin", "assertLlmObsSpanEvent", "useLlmObs", "getEvents", any MOCK_* matcher ("MOCK_STRING" / "MOCK_NOT_NULLISH" / "MOCK_NUMBER" / "MOCK_OBJECT"), "VCR cassette", "vcr proxy", "127.0.0.1:9126", any LlmObsCategory test ("LLM_CLIENT" / "MULTI_PROVIDER" / "ORCHESTRATION" / "INFRASTRUCTURE").

stars:811

forks:391

updated: May 28, 2026, 18:55

LLM Observability Testing Skill

Determine the package category first

Before writing any test, determine the package's LlmObsCategory. The category determines the test strategy (VCR or not), the span kind, and the test structure. Choosing the wrong category produces tests that pass against the wrong contract: VCR cassettes for a workflow library produce empty recordings, and pure-function tests for an HTTP-call wrapper miss the network surface entirely.

Quick check:

Direct HTTP calls to an LLM provider? → LLM_CLIENT or MULTI_PROVIDER: use VCR.
Workflow / graph orchestration with state? → ORCHESTRATION: no VCR, pure functions, mocked LLM responses returned from the orchestration node.
Protocol / server implementation? → INFRASTRUCTURE: use a mock server.

See references/category-strategies.md for the FORBIDDEN-vs-REQUIRED matrix per category.
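
The quick check above can be written down as a small lookup. This is only an illustrative sketch: the category names come from the text, while `pickTestStrategy` and the `useVcr` / `approach` fields are hypothetical names that do not exist in dd-trace-js.

```javascript
// Hypothetical lookup table. Category names are taken from the quick check;
// the field names (useVcr, approach) are illustrative only.
const STRATEGY_BY_CATEGORY = new Map([
  ['LLM_CLIENT', { useVcr: true, approach: 'replay VCR cassettes' }],
  ['MULTI_PROVIDER', { useVcr: true, approach: 'replay VCR cassettes' }],
  ['ORCHESTRATION', { useVcr: false, approach: 'pure functions' }],
  ['INFRASTRUCTURE', { useVcr: false, approach: 'mock server' }]
])

// Returns the strategy for a category, failing loudly on an unknown name
// so a typo cannot silently select the wrong test contract.
function pickTestStrategy (category) {
  const strategy = STRATEGY_BY_CATEGORY.get(category)
  if (!strategy) {
    throw new Error(`Unknown LlmObsCategory: ${category}`)
  }
  return strategy
}

console.log(pickTestStrategy('ORCHESTRATION').approach)
```

Throwing on an unknown category is a deliberate choice in this sketch: a silent default would hide exactly the wrong-category mistake this section warns about.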

Core Testing Concepts

1. Test Structure

LLMObs tests use dedicated helpers to capture and validate span events.

Key components:

useLlmObs() - Initializes LLMObs test environment
getEvents() - Retrieves captured span events
assertLlmObsSpanEvent() - Validates span structure with flexible matchers

Basic test flow:

Initialize test environment with useLlmObs({ plugin: 'name' })
Call instrumented method (chat completion, workflow execution, etc.)
Get captured span events with getEvents()
Validate span structure with assertLlmObsSpanEvent()

See references/test-structure.md for complete test file templates.
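
The four-step flow can be illustrated with in-memory stand-ins. The sketch below does not use the real helpers: `createFakeLlmObs` and `fakeChatCompletion` are hypothetical substitutes for `useLlmObs()` and an instrumented client method, and the event shape is assumed from the assertable fields listed in this document. It only shows the order of operations.

```javascript
const { strictEqual } = require('assert')

// Stand-in for the test environment: collects span events in memory.
// The real useLlmObs() helper lives in the dd-trace-js test utilities.
function createFakeLlmObs () {
  const events = []
  return {
    record: (event) => events.push(event),
    getEvents: () => events.slice()
  }
}

// Stand-in for an instrumented client method (hypothetical).
async function fakeChatCompletion (llmobs, prompt) {
  const reply = `echo: ${prompt}`
  llmobs.record({
    spanKind: 'llm',
    name: 'fake.chatCompletion',
    inputMessages: [{ content: prompt, role: 'user' }],
    outputMessages: [{ content: reply, role: 'assistant' }]
  })
  return reply
}

async function runFlow () {
  const llmobs = createFakeLlmObs() // 1. initialize the test environment
  await fakeChatCompletion(llmobs, 'hello') // 2. call the instrumented method
  const events = llmobs.getEvents() // 3. get the captured span events
  strictEqual(events.length, 1) // 4. validate the span structure
  strictEqual(events[0].spanKind, 'llm')
  return events
}

runFlow().then((events) => console.log(events[0].name))
```

In a real spec file the same four steps would sit inside the test framework's hooks and test cases; see references/test-structure.md for the actual template.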

2. VCR Cassettes

VCR records real API calls and replays them in tests for deterministic testing without external dependencies.

Purpose:

Record real LLM API responses once
Replay deterministically in CI without API keys
No external dependencies after recording

How it works:

Configure proxy baseURL: http://127.0.0.1:9126/vcr/{provider}
Run tests with real API keys (first time only)
VCR proxy records requests/responses to cassette files
Subsequent test runs replay from cassettes (no API keys needed)

Cassette location: test/llmobs/plugins/{integration}/cassettes/
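
As a rough illustration of the two path templates above, the helpers below build the proxy base URL and the cassette directory for a given name. `vcrBaseUrl` and `cassetteDir` are hypothetical helper names, and `'openai'` is only an example value; how the base URL is handed to a client library depends on that library's own configuration option.

```javascript
const path = require('path')

// Origin of the local VCR proxy, as given in the text.
const VCR_PROXY_ORIGIN = 'http://127.0.0.1:9126'

// Fills in the {provider} placeholder of the proxy baseURL template.
function vcrBaseUrl (provider) {
  return `${VCR_PROXY_ORIGIN}/vcr/${encodeURIComponent(provider)}`
}

// Fills in the {integration} placeholder of the cassette location template.
// path.posix keeps forward slashes regardless of the host OS.
function cassetteDir (integration) {
  return path.posix.join('test', 'llmobs', 'plugins', integration, 'cassettes')
}

console.log(vcrBaseUrl('openai')) // http://127.0.0.1:9126/vcr/openai
console.log(cassetteDir('openai')) // test/llmobs/plugins/openai/cassettes
```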

When to use VCR:

✅ LlmObsCategory.LLM_CLIENT (Direct API wrappers)
✅ LlmObsCategory.MULTI_PROVIDER (Multi-provider frameworks)
❌ LlmObsCategory.ORCHESTRATION (Pure functions, no API calls)
❌ LlmObsCategory.INFRASTRUCTURE (Mock servers instead)

See references/vcr-cassettes.md for recording process and troubleshooting.

3. Category-Specific Test Strategies

The category-determination block at the top maps each category to a strategy. Non-obvious details per category:

LLM_CLIENT / MULTI_PROVIDER: VCR proxy baseURL is http://127.0.0.1:9126/vcr/{provider}. Span kind: 'llm'. Cassettes record once with real API keys; CI replays them.
ORCHESTRATION: Span kind: 'workflow' or 'agent', never 'llm'. No VCR and no real API calls: the orchestrator itself doesn't make HTTP calls; it coordinates libraries that do. Mock LLM responses as plain return values from the node so the test exercises the workflow execution, not the provider API.
INFRASTRUCTURE: Mock server, protocol-specific validation, no VCR.

See references/category-strategies.md for per-category patterns.
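
For the ORCHESTRATION case, mocking the LLM as a plain return value might look like the sketch below. The tiny `runWorkflow` runner, the node names, and the state shape are all hypothetical; a real orchestration library would supply its own graph or workflow API.

```javascript
// Hypothetical sequential workflow runner: each node receives the current
// state and returns a partial update that is merged into it.
async function runWorkflow (nodes, initialState) {
  let state = initialState
  for (const node of nodes) {
    state = { ...state, ...(await node(state)) }
  }
  return state
}

// The "LLM" node returns a canned value instead of calling a provider,
// so no HTTP request (and no VCR cassette) is involved.
const mockLlmNode = async (state) => ({ answer: `summary of: ${state.question}` })

// A second, purely deterministic node that consumes the mocked answer.
const formatNode = async (state) => ({ formatted: state.answer.toUpperCase() })

runWorkflow([mockLlmNode, formatNode], { question: 'x' })
  .then((state) => console.log(state.formatted))
```

Because the node output is fixed, any assertion failure points at the workflow instrumentation rather than at provider behaviour.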

4. Assertion Patterns

assertLlmObsSpanEvent(actual, expected)

Validates span structure with flexible matchers for non-deterministic values.

Available matchers:

MOCK_STRING - Matches any non-empty string (use for output text)
MOCK_NOT_NULLISH - Matches any value that is not null or undefined (use for token counts)
MOCK_NUMBER - Matches any number
MOCK_OBJECT - Matches any object (use for errors)

Assertable fields:

spanKind (required) - Span type from LlmObsSpanKind enum
name - Operation name
modelName - Model identifier (for LLM spans)
modelProvider - Provider name (for LLM spans)
inputMessages - Input messages in [{content, role}] format
outputMessages - Output messages in [{content, role}] format
metrics - Token usage (input_tokens, output_tokens, total_tokens)
metadata - Model parameters (temperature, max_tokens, etc.)
error - Error object (if operation failed)

Partial validation: only the fields specified in expected are checked; all others are ignored.

See references/assertion-helpers.md for complete API and patterns.
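
To make the matcher and partial-validation ideas concrete, here is a minimal re-implementation of the concept. It is not the dd-trace-js code: the `MOCK_*` symbols below are local stand-ins for three of the matchers, and `matches` is a hypothetical function written for this illustration.

```javascript
// Local stand-ins for three of the matchers (not the real exports).
const MOCK_STRING = Symbol('MOCK_STRING')
const MOCK_NUMBER = Symbol('MOCK_NUMBER')
const MOCK_OBJECT = Symbol('MOCK_OBJECT')

// Recursively compares `actual` against `expected`, treating the symbols
// above as wildcards for non-deterministic values.
function matches (actual, expected) {
  if (expected === MOCK_STRING) return typeof actual === 'string' && actual.length > 0
  if (expected === MOCK_NUMBER) return typeof actual === 'number'
  if (expected === MOCK_OBJECT) return typeof actual === 'object' && actual !== null
  if (Array.isArray(expected)) {
    return Array.isArray(actual) &&
      actual.length === expected.length &&
      expected.every((item, i) => matches(actual[i], item))
  }
  if (typeof expected === 'object' && expected !== null) {
    if (typeof actual !== 'object' || actual === null) return false
    // Partial validation: only keys present in `expected` are compared.
    return Object.keys(expected).every((key) => matches(actual[key], expected[key]))
  }
  return actual === expected
}

const event = {
  spanKind: 'llm',
  name: 'example.chat',
  outputMessages: [{ content: 'Hi there', role: 'assistant' }],
  metrics: { input_tokens: 3, output_tokens: 2, total_tokens: 5 }
}

const expected = {
  spanKind: 'llm',
  outputMessages: [{ content: MOCK_STRING, role: 'assistant' }],
  metrics: { input_tokens: MOCK_NUMBER, output_tokens: MOCK_NUMBER }
}

console.log(matches(event, expected)) // true
```

Fields such as `name` and `total_tokens` are present on the event but absent from `expected`, so they do not affect the result.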

Test File Organization

Location: test/llmobs/plugins/{integration}/index.spec.js

Structure:

Import helpers from '../../util'
Initialize LLMObs test environment
Load modules in beforeEach() for fresh state
Group tests by method (describe('chat completions', ...))
Cover all instrumented methods
Test error cases

Standard imports:

useLlmObs, assertLlmObsSpanEvent, MOCK_STRING, MOCK_NOT_NULLISH, MOCK_NUMBER, MOCK_OBJECT

See references/test-structure.md for complete template.

Key Testing Points

Coverage Requirements

Test all instrumented methods with:

✅ Basic operation (single message/call)
✅ Multi-turn conversations (if applicable)
✅ Error cases
✅ All required span fields (spanKind, name, modelName, modelProvider)
✅ Message format validation ({content, role} structure)
✅ Metrics validation (token counts exist and are truthy)
✅ Metadata validation (parameters passed through)

Span Kind Validation

Match span kind to operation type using LlmObsSpanKind enum:

Chat/completions → 'llm'
Workflow execution → 'workflow'
Agent runs → 'agent'
Tool calls → 'tool'
Embeddings → 'embedding'
Retrieval → 'retrieval'
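
The mapping above can be kept next to the tests as a small table so that every spec asserts the same span kind for the same operation type. The operation keys and the `expectedSpanKind` helper are hypothetical; only the span kind strings come from the list.

```javascript
// Operation type (hypothetical keys) to span kind (values from the list above).
const SPAN_KIND_BY_OPERATION = new Map([
  ['chat', 'llm'],
  ['completion', 'llm'],
  ['workflow', 'workflow'],
  ['agent', 'agent'],
  ['tool', 'tool'],
  ['embedding', 'embedding'],
  ['retrieval', 'retrieval']
])

// Looks up the span kind a test should assert for an operation type.
function expectedSpanKind (operation) {
  const kind = SPAN_KIND_BY_OPERATION.get(operation)
  if (!kind) {
    throw new Error(`No span kind mapped for operation: ${operation}`)
  }
  return kind
}

console.log(expectedSpanKind('chat')) // llm
```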

Error Handling

On errors, validate:

Empty output messages: [{content: '', role: ''}]
Error object exists: error: MOCK_OBJECT
Span still created (not dropped)
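
The three error checks can be demonstrated against a stand-in recorder. In this sketch `recordCall` is a hypothetical wrapper that imitates what an instrumented method might do on failure; the shape of the recorded `error` object is an assumption, which is why a real test would match it loosely with `MOCK_OBJECT`.

```javascript
const { deepStrictEqual, strictEqual } = require('assert')

// Hypothetical wrapper: records one span event per call, whether the
// wrapped function succeeds or throws.
function recordCall (events, fn) {
  const event = { spanKind: 'llm', outputMessages: [{ content: '', role: '' }] }
  try {
    const reply = fn()
    event.outputMessages = [{ content: reply, role: 'assistant' }]
    return reply
  } catch (err) {
    event.error = { name: err.name, message: err.message } // assumed shape
    throw err
  } finally {
    events.push(event) // the span is recorded even on failure
  }
}

const events = []
let thrown = null
try {
  recordCall(events, () => { throw new Error('rate limited') })
} catch (err) {
  thrown = err
}

strictEqual(events.length, 1) // span still created
deepStrictEqual(events[0].outputMessages, [{ content: '', role: '' }]) // empty output
strictEqual(typeof events[0].error, 'object') // error object exists
console.log(thrown.message)
```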

References

For detailed information, see:

references/test-structure.md - Complete test file templates and organization
references/vcr-cassettes.md - VCR recording process, cassette management, troubleshooting
references/assertion-helpers.md - Complete assertLlmObsSpanEvent API, matchers, patterns
references/category-strategies.md - Detailed test strategies for each LlmObsCategory

package.json

"author": "DataDog"

"repository": "DataDog/dd-trace-js"
