원클릭으로 Manus에서 모든 스킬 실행

$pwd:

genai-conformance

Name: Genai Conformance
Author: Arize-ai

// Run, interpret, and iterate on the OpenInference GenAI conformance MVP at python/openinference-instrumentation/scripts/conformance/. Use when the user mentions GenAI conformance, OTel GenAI semantic conventions, Weaver registry live-check, the dual-write conversion (`_genai_conversion.py`, `enable_genai_semconv`), `gen_ai.*` attribute coverage, or asks to add new providers / scenarios to the conformance harness.

Manus에서 실행

$ git log --oneline --stat

stars:998

forks:245

updated:2026년 5월 14일 16:29

SKILL.md

readonly

related-skills.json

같은 저장소

python-canary-fix.md

from "Arize-ai/openinference"

Investigate and propose fixes for Python canary cron failures in the openinference repo. Use when the user mentions Python canary failures, Python cron failures, or when the auto-fix CI job reports Python instrumentation canary issues.

2026-05-05998

js-docs-sync.md

from "Arize-ai/openinference"

Keep hand-written docs/ documentation in JS packages accurate and up to date with their source code. Use this skill whenever: (1) source files in a JS package that has a docs/ folder are modified — especially exports, function signatures, types, or public API changes, (2) the user asks to "update docs", "sync docs", "check if docs are accurate", "review the documentation", or similar, (3) new exports or features are added to a JS package and the docs need to reflect them. Also trigger when the user mentions documentation drift, stale examples, or missing API coverage in any JS package under js/packages/.

2026-04-03998

java-code-reviewer.md

from "Arize-ai/openinference"

Review Java OpenInference instrumentation code for correctness and completeness. Use this skill when reviewing a Java instrumentor package — whether it's a new instrumentor, a PR that modifies one, or when the user asks to audit/review/check an existing instrumentor's code quality. Trigger on phrases like "review the instrumentor", "check the Java code", "audit the package", "is this instrumentor correct", or any request to validate an OpenInference Java instrumentation package against project standards.

2026-03-21998

python-code-reviewer.md

from "Arize-ai/openinference"

Review Python OpenInference instrumentation code for correctness and completeness. Use this skill when reviewing a Python instrumentor package — whether it's a new instrumentor, a PR that modifies one, or when the user asks to audit/review/check an existing instrumentor's code quality. Trigger on phrases like "review the instrumentor", "check the code", "audit the package", "is this instrumentor correct", or any request to validate an OpenInference Python instrumentation package against project standards.

2026-03-11998

package.json

"author": "Arize-ai"

"repository": "Arize-ai/openinference"

GitHub 저장소 열기 Creator 저장소 보기

$ install --global

$ download --local

Manus에서 실행

$ useful --forSOC

소프트웨어 품질 보증 분석가·테스터컴퓨터 및 수학직15-1253L4

name

genai-conformance

description

Run, interpret, and iterate on the OpenInference GenAI conformance MVP at python/openinference-instrumentation/scripts/conformance/. Use when the user mentions GenAI conformance, OTel GenAI semantic conventions, Weaver registry live-check, the dual-write conversion (`_genai_conversion.py`, `enable_genai_semconv`), `gen_ai.*` attribute coverage, or asks to add new providers / scenarios to the conformance harness.

GenAI Conformance

The repo ships a self-contained conformance harness at python/openinference-instrumentation/scripts/conformance/ that exercises OpenInference instrumentors against deterministic mock provider APIs, exports OTLP traces to weaver registry live-check, and prints a console summary of registry attributes seen / missing / advice-level counts. It validates the dual-write logic in _genai_conversion.py that translates OpenInference's native attributes (llm.*, input.*, output.*, openinference.*) into the OTel GenAI semantic conventions (gen_ai.*).

When to Use

User asks to run the conformance harness, "test conformance", or "run weaver".
User asks to maximize / improve gen_ai.* registry coverage.
User wants to extend the dual-write conversion in _genai_conversion.py.
User wants to add a new provider, a new test scenario, or a new mock endpoint.
User mentions specific gen_ai.* attributes (response.id, system_instructions, tool.call., retrieval., etc.) and whether they're being emitted.

Layout

scripts/conformance/
├── run.py                 # orchestrator (PEP 723, stdlib only)
├── mock_server.py         # Flask mock with all providers' endpoints
├── anthropic_conformance.py      # PEP 723 + editable [tool.uv.sources]
├── openai_conformance.py         # PEP 723 + editable [tool.uv.sources]
├── google_genai_conformance.py   # PEP 723 + editable [tool.uv.sources]
├── README.md
└── results/                      # gitignored Weaver output

Each provider script declares its deps as PEP 723 inline metadata and pins the local OpenInference packages via [tool.uv.sources.<pkg>] blocks (multi-section dotted-key form — single-line inline tables exceed ruff's 100-char limit). run.py invokes everything via uv run. Filenames avoid the bare provider name (openai.py, anthropic.py) because that would shadow the SDK package on sys.path[0].

run.py lives in PROVIDER_SCRIPTS — a tuple iterated for both prewarm and execution. To add a provider, append to PROVIDER_SCRIPTS and add the corresponding <provider>_conformance.py and any new mock endpoints.

Running

uv run python/openinference-instrumentation/scripts/conformance/run.py

First run downloads pinned weaver v0.22.1 and semantic-conventions v1.40.0 to ~/.cache/oi-conformance/; subsequent runs are fast. uv caches each provider script's env by PEP 723 metadata hash.

Interpreting the summary

Registry attributes seen — gen_ai.* (and a few service.* / telemetry.sdk.*) attrs the run emitted, with sample counts.
Non-registry attributes seen — OpenInference's native vocabulary. These show up as Weaver missing_attribute violations by design — they aren't (and shouldn't be) in the OTel registry.
Missing registry attributes (gen_ai.*) — registry attrs the run did not emit. Categorize each one:
1. Real dual-write gap — provider API has the data, instrumentor captures it as an OI attr, but _genai_conversion.py doesn't map it. Fixable in conversion.
2. Test scenario gap — conversion handles it, but the test doesn't exercise the relevant scenario (e.g. gen_ai.tool.call.* need a TOOL span; gen_ai.embeddings.* need an EMBEDDING span). Fixable in <provider>_conformance.py.
3. Mock data gap — instrumentor would capture it if the response included it (e.g. gen_ai.usage.cache_read.input_tokens requires cache_read_input_tokens in the mock's usage block). Fixable in mock_server.py.
4. Provider doesn't support it — e.g. Anthropic has no frequency_penalty. Document and skip.
5. Application-level / not auto-emittable — gen_ai.agent.*, gen_ai.evaluation.*, gen_ai.prompt.name, gen_ai.data_source.id. Require explicit user attribution; out of scope for SDK instrumentation.
6. Metric-only — gen_ai.token.type lives on gen_ai.client.token.usage metric, not spans.
Advice levels — violation counts are predominantly missing_attribute for the OI native vocab (expected); improvement counts are not_stable warnings for development-stage gen_ai.* attrs (also expected). The dual-write itself is well-formed — Weaver does not flag type/shape/value errors on the emitted gen_ai.* attrs.

Iterating to maximize coverage

For category 1 (dual-write gap):

Inspect results/live_check.json to see exactly what OI attributes the instrumentor emitted (look for the relevant span's attributes array).
Decide where to extend _genai_conversion.py (get_genai_request_attributes, get_genai_response_attributes, etc.).
Always add a unit test in test_genai.py for the new path. The existing tests cover the major span kinds; mirror that style.
Re-run the conformance harness. Verify the missing list shrinks and no existing gen_ai.* attribute regressed.

For category 2 (test scenario gap):

Use OITracer(trace.get_tracer(__name__), TraceConfig(enable_genai_semconv=True)) to manually emit non-LLM spans (TOOL, RETRIEVER, EMBEDDING, AGENT) inside a provider script. The Anthropic script already does this for TOOL / RETRIEVER / EMBEDDING — copy the pattern.

For category 3 (mock data gap):

Mock responses are simple dicts at the top of mock_server.py. The Anthropic mock already returns cache_creation_input_tokens / cache_read_input_tokens; the OpenAI mock returns prompt_tokens_details.cached_tokens. Add fields the SDK will surface and the OI instrumentor will turn into LLM_TOKEN_COUNT_PROMPT_DETAILS_CACHE_*.

Bumping the semconv version

The harness pins SEMCONV_VERSION (currently v1.41.1) and WEAVER_VERSION (currently v0.23.0) in run.py. When OTel cuts a new semconv release, walk this checklist:

Check for a newer Weaver release too — always run gh release list --repo open-telemetry/weaver --limit 5 alongside the semconv check. Weaver and the registry version independently; the harness depends on both. Bump WEAVER_VERSION whenever a newer release exists, and skim its notes for live-check-relevant fixes.
Bump the constants in run.py: SEMCONV_VERSION and WEAVER_VERSION to the latest releases.
Run the harness once (uv run python/openinference-instrumentation/scripts/conformance/run.py) so it downloads the new registry into ~/.cache/oi-conformance/semconv/<new-version>/.
Refresh the vendored JSON schemas at tests/fixtures/genai_schemas/ from ~/.cache/oi-conformance/semconv/<new-version>/docs/gen-ai/gen-ai-{input,output}-messages.json.
Run unit tests (pytest tests/test_genai.py). The _load_json_attribute validator runs the new schemas against every emitted message payload — any breaking shape change surfaces here.
Skim the semconv changelog for these specific risks (each one usually requires a code change in _genai_conversion.py):
- New required fields on ChatMessage / OutputMessage parts (TextPart, ToolCallRequestPart, etc.) → builder functions need to populate them.
- New Role enum values → _normalize_message_role may need a mapping.
- New FinishReason enum values → _normalize_finish_reason may need a mapping.
- Added gen_ai.* registry attrs → opportunity for new dual-write mappings; re-run the harness and look at the "Missing registry attributes" summary.
- Removed / renamed gen_ai.* attrs → drop from _genai_attributes.py and stop emitting in _genai_conversion.py.
Refresh inline version refs: the semconv-version mentions in test_genai.py (schema-source comment), README.md (caveats section, includes Weaver version too), and _genai_conversion.py (the encoding comment inside get_genai_message_attributes).
Re-run the conformance harness end-to-end; verify no gen_ai.* attribute regressed and no genuine shape errors appear in results/live_check.json (advice with id != "missing_attribute").

Gotchas

PEP 723 inline-table line length: [tool.uv.sources.<pkg>] { path = "...", editable = true } on one line easily exceeds 100 chars and trips ruff E501. Use the multi-section form ([tool.uv.sources.<pkg>]\npath = "..."\neditable = true).
The conformance dir is excluded from package-level checks: pyproject.toml excludes scripts/* from mypy and scripts from pytest's norecursedirs. ruff still lints it. Don't add Python imports that mypy/pytest can't resolve in the lint env (e.g. provider SDKs) outside this directory.
Weaver inactivity timeout is 90s. First-run uv installs of OpenTelemetry/OpenAI/Google SDKs can take a while — run.py prewarms each provider env via --prewarm early-exit before starting Weaver. If you add a new provider script, give it a --prewarm early-exit too.
Single mock server for all providers: one Flask app handles /v1/messages (Anthropic), /v1/chat/completions + /v1/embeddings + /v1/responses (OpenAI), and /v1beta/models/<path:model> (Google). Each endpoint discriminates between text and tool variants by checking body.get("tools").
Editable installs are mandatory: PyPI versions of openinference-instrumentation and the per-provider packages won't have in-progress dual-write changes. The [tool.uv.sources] blocks in each <provider>_conformance.py pin the local repo paths.
System messages stay in gen_ai.input.messages. The dual-write does not emit gen_ai.system_instructions; system instructions are assumed to flow through as a system-role entry in LLM_INPUT_MESSAGES. The gen_ai.input.messages JSON schema explicitly admits "system" as a valid role.
tool_call (singular) finish_reason — the conversion normalizes both tool_calls (OpenAI plural) and function_call (legacy) to "tool_call". The OTel registry doesn't constrain values for gen_ai.response.finish_reasons, but OTel's conventional value is plural tool_calls. If you change the normalization target, update the asserting tests in test_genai.py too.
Don't conflate violations with shape errors: the headline "violation: N" counts missing_attribute advice on OI native attrs (expected) plus any genuine shape mismatches on gen_ai.* attrs (real bugs). To find genuine shape errors, parse results/live_check.json and look at advice with id != "missing_attribute".

genai-conformance

이 저장소의 다른 Skills

이 저장소의 다른 Skills

GenAI Conformance

When to Use

Layout

Running

Interpreting the summary

Iterating to maximize coverage

Bumping the semconv version

Gotchas

GenAI Conformance

When to Use

Layout

Running

Interpreting the summary

Iterating to maximize coverage

Bumping the semconv version

Gotchas