rosetta-add-framework

Name: Rosetta Add Framework
Author: Arize-ai

Add a new agent framework to the Rosetta Stone repo: researches the framework, builds all three observability tiers (no-observability, phoenix, ax), tests each, runs a Playwright smoke test against the UI, updates README + TODO, and raises a PR. Trigger when the user asks to "add the <framework> framework", "implement <framework>", "wire up <framework>", or similar. The framework must be one of the Arize-supported agent frameworks (see TODO below).

updated: May 19, 2026 at 20:56

related-skills.json

same repository

rosetta-add-framework-discover.md

from "Arize-ai/project-rosetta-stone"

Refresh the list of agent frameworks supported by Arize tracing and diff against what's already in the repo. Pulls live data from https://arize.com/docs/llms.txt and produces a clean to-do list. Part of the rosetta-add-framework flow; can also be invoked standalone to answer "what frameworks are left to add?"

2026-05-19

rosetta-add-framework-tier-build.md

from "Arize-ai/project-rosetta-stone"

Build a single tier (no-observability, phoenix, or ax) for a new framework. Clones the closest existing tier, swaps in framework-specific agent.py / tools.py / requirements.txt, and (for observability tiers) adds tracing.py + main.py wiring + eval-harness scripts. Part of the rosetta-add-framework flow; can be invoked standalone to rebuild a single tier from scratch.

2026-05-19

rosetta-add-framework-playwright.md

from "Arize-ai/project-rosetta-stone"

Run a public-flow Playwright smoke test against a freshly-built framework tier's Next.js frontend. Covers home page rendering + product browsing — the parts that don't require X/Twitter OAuth. The Playwright project (package.json, config, tests) lives inside this skill directory and is checked into the repo. Part of the rosetta-add-framework flow.

2026-05-19

rosetta-add-framework-docs.md

from "Arize-ai/project-rosetta-stone"

Finalise a newly-added framework — updates the README's supported-frameworks table, directory tree, and per-framework "what differs" section, marks off the framework in the orchestrator skill's embedded TODO, commits per tier, and raises a PR. Part of the rosetta-add-framework flow.

2026-05-19

rosetta-add-framework-tier-test.md

from "Arize-ai/project-rosetta-stone"

Test a freshly-built tier for a new framework — boots the backend, smoke-tests the chat endpoint, runs synthetic requests, and (for phoenix) runs the eval harness. Verifies traces land in the right project. Part of the rosetta-add-framework flow; can be invoked standalone after a build to validate.

2026-05-19

rosetta-demo-capture.md

from "Arize-ai/project-rosetta-stone"

Record a full Wonder Toys demo by running a canned 3-turn conversation (search dragons → buy plushie → ship), then opening Arize AX in Safari and screenshotting the session view plus every trace in it. Use when the user asks to "capture a demo", "record screenshots of an Arize session", "demo the agent flow", or any similar phrasing. macOS only — uses AppleScript and `screencapture`.

2026-05-19

package.json

"author": "Arize-ai"

"repository": "Arize-ai/project-rosetta-stone"

Rosetta — Add Framework Orchestrator

End-to-end workflow for bringing a new agent framework into the repo. Mirrors the existing structure: every new framework gets a no-observability/<framework>-py/ (or -ts/) directory, a phoenix/<framework>-py/ directory, and an ax/<framework>-py/ directory, each running an identical Wonder Toys shopping agent.

Inputs

<framework> — framework name in the form used by directory naming, e.g. agno, google-adk, smolagents, crewai. Use the slug from Arize's docs (https://arize.com/docs/ax/integrations/python-agent-frameworks/<slug>/<slug>-tracing).
<language> — py for Python, js/ts for TypeScript, java for Java. Affects which existing tier is used as the clone source. Note: Java tiers don't fit the current Node/Python repo shape — see the Java section under TODO before attempting.
--start-from <phase> (optional) — resume from discover | research | build-noobs | build-phoenix | build-ax | playwright | docs. Useful when a phase fails mid-flow.
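
As a rough illustration of how the inputs above might be parsed and how --start-from could select the remaining phases, here is a minimal sketch. The command-line wrapper is hypothetical (the skill does not define a script), and treating js and ts as two separate accepted values is an assumption; only the phase names come from the list above.

```python
import argparse

# Phase names exactly as listed for --start-from in the Inputs section.
PHASES = [
    "discover", "research", "build-noobs", "build-phoenix",
    "build-ax", "playwright", "docs",
]


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical CLI wrapper; the skill itself defines no such script."""
    parser = argparse.ArgumentParser(prog="rosetta-add-framework")
    parser.add_argument("framework", help="slug, e.g. agno or google-adk")
    # Assumption: js and ts are both accepted spellings for TypeScript.
    parser.add_argument("language", choices=["py", "js", "ts", "java"])
    parser.add_argument("--start-from", choices=PHASES, default=PHASES[0])
    return parser


def phases_to_run(start_from: str) -> list[str]:
    """Return the named phase and every phase after it, in order."""
    return PHASES[PHASES.index(start_from):]


args = build_parser().parse_args(["agno", "py", "--start-from", "build-phoenix"])
print(phases_to_run(args.start_from))
```

Slicing an ordered list keeps the resume logic trivial: a failed phase can be re-entered without repeating the phases before it.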

TODO — frameworks not yet implemented

Snapshot (last refreshed 2026-05-18 from https://arize.com/docs/llms.txt). Each phase-1 discover run re-fetches and shows the diff.

Python (Tier A — clear agent frameworks)

TypeScript (Tier A)

BeeAI

Java (Tier A)

⚠️ Adding a Java tier is a significantly larger lift than adding another Python or TypeScript framework. The repo's current shape assumes a Node/Python stack — every existing tier is either a Next.js monolith or a FastAPI Python backend + Next.js frontend. A Java tier needs:

A JVM build setup (Maven or Gradle) added to the repo
A new backend serving the same /chat SSE protocol from Java (e.g. Spring Boot)
Either a separate Java→Next.js frontend bridge, or a Java-rendered UI
Java-specific tracing setup matching the Arize Java docs
Adjustments to scripts/start.sh patterns since the existing one is uv/npm-only

Decide whether the project wants this dimension before starting. If yes, the per-framework work is then:

LangChain4j
Spring AI
Arconia

Borderline — decide before building

Guardrails AI (output validation)
Instructor (structured output)
MCP / Model Context Protocol (wire protocol)
Portkey (LLM gateway)
Together AI (LLM provider)
OpenInference Annotation Tracing (Java) — @Chain/@LLM/@Tool/@Agent decorators applied via ByteBuddy for hand-built agents; not a framework
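
The diff that the discover phase shows against this snapshot can be pictured as plain set arithmetic. The sketch below is illustrative only: the function name and the three input sets are hypothetical, and it does not attempt to parse llms.txt, whose format is not described here.

```python
def diff_todo(snapshot: set[str], fresh: set[str], implemented: set[str]) -> dict[str, list[str]]:
    """Compare the stored TODO snapshot with a freshly fetched framework list.

    snapshot    -- slugs recorded in the TODO at the last refresh
    fresh       -- slugs found in the newly fetched supported-framework list
    implemented -- slugs that already have tier directories in the repo
    """
    remaining = fresh - implemented
    return {
        "added": sorted(remaining - snapshot),   # new since the last refresh
        "removed": sorted(snapshot - fresh),     # no longer listed upstream
        "todo": sorted(remaining),               # the refreshed to-do list
    }


result = diff_todo(
    snapshot={"beeai", "langchain4j"},
    fresh={"beeai", "langchain4j", "agno", "crewai"},
    implemented={"crewai"},
)
print(result)
```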

Phases

Execute in order. Each phase has a dedicated skill — read it and follow.

#	Phase	Skill
1	Discover + refresh TODO	`.claude/skills/rosetta-add-framework-discover/SKILL.md`
2	Pre-flight viability gate	(inline — see "Viability gate" below). Aborts cleanly if the framework can't be built as designed.
3	Research framework API + tracing	(inline — see "Research checklist" below)
4	Build no-observability tier	`.claude/skills/rosetta-add-framework-tier-build/SKILL.md` with `tier=no-observability`
5	Test no-observability tier	`.claude/skills/rosetta-add-framework-tier-test/SKILL.md` with `tier=no-observability`
6	Build phoenix tier	`.claude/skills/rosetta-add-framework-tier-build/SKILL.md` with `tier=phoenix`
7	Test phoenix tier (incl. evals)	`.claude/skills/rosetta-add-framework-tier-test/SKILL.md` with `tier=phoenix`
8	Build ax tier	`.claude/skills/rosetta-add-framework-tier-build/SKILL.md` with `tier=ax`
9	Test ax tier (synthetic only, evals UI-configured)	`.claude/skills/rosetta-add-framework-tier-test/SKILL.md` with `tier=ax`
10	Session.id verification + fix	(inline — see "Session.id audit" below)
11	Playwright UI smoke	`.claude/skills/rosetta-add-framework-playwright/SKILL.md`
12	Docs + commit + PR	`.claude/skills/rosetta-add-framework-docs/SKILL.md`

Viability gate (phase 2)

Before any code is written, verify four things. If any check fails, abort with a structured skip-with-reason report (see "Skip-with-reason vocabulary" below) and do not create any tier directories.

Check 1 — Anthropic Claude support

The repo standardises on Anthropic Claude. The framework must support it (directly via an Anthropic provider, via LiteLLM, via Bedrock, or similar).

Web-fetch the framework's "models" / "providers" docs page
Look for: an Anthropic provider class, a model string like anthropic/..., a LiteLlm / LiteLLM wrapper, or a Bedrock Anthropic integration
If none found → skip with reason no-anthropic-support

Check 2 — Streaming text API

The Wonder Toys agent streams SSE deltas to the UI. The framework must expose a token-level streaming API for text responses (not just final aggregates).

Look for: an async for chunk in agent.run_stream(...) shape, an event-bus with text-delta events, a RunConfig(streaming_mode=SSE)-style flag, or callbacks on partial tokens
If the framework only returns final aggregated responses → skip with reason no-streaming-api
If it streams only audio/video (e.g. Pipecat) → skip with reason not-chat-shaped

Check 3 — OpenInference instrumentation

The framework must have a published openinference-instrumentation-<framework> (or equivalent) package, or be auto-instrumented by a more general OpenInference instrumentor.

# Quick existence check via PyPI
curl -sf "https://pypi.org/pypi/openinference-instrumentation-${FRAMEWORK}/json" > /dev/null \
  && echo "instrumentation: yes" || echo "instrumentation: NO"

If the package doesn't exist AND no general instrumentor covers it → skip with reason no-instrumentation

Check 4 — Framework shape fits "chat-to-purchase"

Some frameworks aren't conversational by design — they assume batch task execution (CrewAI was on the edge of this; we adapted), declarative pipelines (DSPy), voice/video streams (Pipecat), or batch inference. Apply judgment:

If the framework has no conversational primitive (session/thread/message history) and no easy way to fake one by threading history into a task description → skip with reason not-chat-shaped
Borderline cases: build it but document the awkwardness inline (CrewAI's per-turn fresh Crew is an acceptable adaptation)
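
One way to picture the gate is as a function that maps the findings from the four checks to the first applicable skip reason. This is a minimal sketch: the ViabilityFindings fields and the function name are hypothetical, and the findings themselves would still come from reading the framework's docs by hand.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ViabilityFindings:
    """Hypothetical record of what the four checks found."""
    anthropic_support: bool     # Check 1
    text_streaming: bool        # Check 2: token-level text deltas
    audio_video_only: bool      # Check 2: streams only audio/video
    has_instrumentation: bool   # Check 3
    conversational: bool        # Check 4: has or can fake message history


def skip_reason(f: ViabilityFindings) -> Optional[str]:
    """Return the first failing check's skip reason, or None if viable."""
    if not f.anthropic_support:
        return "no-anthropic-support"
    if f.audio_video_only:
        return "not-chat-shaped"
    if not f.text_streaming:
        return "no-streaming-api"
    if not f.has_instrumentation:
        return "no-instrumentation"
    if not f.conversational:
        return "not-chat-shaped"
    return None


viable = ViabilityFindings(True, True, False, True, True)
voice_only = ViabilityFindings(True, False, True, True, False)
print(skip_reason(viable), skip_reason(voice_only))
```

Returning a single reason keeps the skip marker unambiguous even when a framework fails several checks at once.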

Skip-with-reason vocabulary

When a framework can't be built, write a structured marker so the TODO list and PR history stay honest. Record the applicable reason in the orchestrator's TODO and in MEMORY.md:

Reason	Meaning	Recovery
`no-anthropic-support`	Framework doesn't reach Claude at all	Revisit if framework adds Anthropic
`no-streaming-api`	No token-level text streaming	Revisit if framework adds streaming
`no-instrumentation`	No OpenInference package	Revisit if Arize adds it
`not-chat-shaped`	Voice/video, declarative, batch, etc.	Likely never — document why
`dep-conflict`	Installing the framework breaks an existing tier's deps in the shared venv (see "Regression check" below)	Use per-framework venv or pin transitive deps before retrying

Record skips in .claude/skills/rosetta-add-framework/SKILL.md (this file) under the TODO section by changing [ ] to [~] and appending the reason — e.g. - [~] Pipecat — skipped: not-chat-shaped (voice/video only). The discover phase preserves these markers across runs.
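
The marker rewrite described above could be automated along these lines. This is a sketch under assumptions: mark_skipped is a hypothetical helper, and it assumes each TODO entry sits on its own line as "- [ ] <Name>" with nothing after the name.

```python
import re


def mark_skipped(todo_text: str, framework: str, reason: str, note: str = "") -> str:
    """Flip '- [ ] <framework>' to '- [~] <framework> ... skipped: <reason>'."""
    pattern = re.compile(
        rf"^- \[ \] {re.escape(framework)}[ \t]*$", re.MULTILINE
    )
    suffix = f" ({note})" if note else ""
    # \u2014 is the dash used in the marker format shown in this file.
    replacement = f"- [~] {framework} \u2014 skipped: {reason}{suffix}"
    # A callable replacement avoids backslash/group interpretation.
    return pattern.sub(lambda _match: replacement, todo_text, count=1)


todo = "- [ ] BeeAI\n- [ ] Pipecat\n"
print(mark_skipped(todo, "Pipecat", "not-chat-shaped", "voice/video only"))
```

Entries that are already marked, or that do not match the assumed line shape, are left untouched, so re-running the helper is harmless.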

Research checklist (phase 3)

Before building, gather these answers. Save anything non-obvious to project memory so the next framework's research is faster.

Framework API

Agent construction — class name, constructor signature, how to pass system prompt and Anthropic Claude
Tool registration — decorator or constructor list? Does it accept Annotated[..., Field(description=...)] for params?
Streaming API — what method/event stream emits text deltas? How is a tool-call event distinguished from a text-delta event?
Conversation memory — does it have a session/thread primitive, or does the caller manage history? How is history passed across turns?
Anthropic Claude — model string format. Does the framework require a specific Sonnet version (some require strict-tools support, only available on 4.5+)?
Installed version — pin to a tested version in requirements.txt

Arize tracing

Phoenix pattern — fetch https://arize.com/docs/phoenix/integrations/python/<framework>/<framework>-tracing or equivalent. What instrumentor + span processor? Does register() route to the right project, or is the manual TracerProvider(resource=Resource.create({PROJECT_NAME: ...})) pattern required?
AX pattern — fetch https://arize.com/docs/ax/integrations/python-agent-frameworks/<framework>/<framework>-tracing. Same questions.
Session propagation — does the framework's OpenInference instrumentation auto-emit session.id, or is a using_session(user_id) wrap required? Grep the installed openinference-instrumentation-<framework> package for session.id references.

Check memory first

Before web-fetching, check project memory for framework_<name>_* keys — earlier work on similar frameworks may have already documented these answers.
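
The memory lookup can be thought of as a glob match over stored keys. The sketch below assumes project memory is available as a plain dictionary, which is a simplification; the real storage mechanism is not described here, and cached_research is a hypothetical name.

```python
from fnmatch import fnmatchcase


def cached_research(memory: dict[str, str], framework: str) -> dict[str, str]:
    """Return every memory entry whose key matches framework_<name>_*."""
    pattern = f"framework_{framework}_*"
    return {key: value for key, value in memory.items() if fnmatchcase(key, pattern)}


# Illustrative entries only; real keys and values will differ.
memory = {
    "framework_crewai_streaming": "per-turn fresh Crew",
    "framework_crewai_model": "needed Sonnet 4.5",
    "framework_agno_tools": "decorator-based",
}
print(sorted(cached_research(memory, "crewai")))
```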

Session.id audit (phase 10)

After all 3 tiers are built and tested:

Run the AX tier, fire a single chat with x-user-id: session-audit-$(date +%s)
Wait 30s for ingestion
Query: ax traces list <project> -s $SPACE -o json --limit 10 and check whether the most recent trace has attributes.session.id set
If set — done. Session works automatically via the framework's primitives or its OpenInference instrumentation.

If not set, add a using_session(user_id) wrap to agent.py, following the CrewAI pattern:

from contextlib import nullcontext
try:
    from openinference.instrumentation import using_session
except ImportError:
    # no-observability tier: OpenInference isn't installed, so fall back to a no-op
    def using_session(session_id: str):
        return nullcontext()
...
with using_session(user_id):
    # the agent call that emits spans
    ...

Apply identically to all 3 tiers (the try/except keeps no-observability working). Re-test.
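
The "is session.id set on the most recent trace" step of the audit could be scripted roughly as follows. The JSON shape is an assumption, not the documented output of the ax CLI: the sketch assumes a list of trace objects, most recent first, each with an attributes mapping. Verify the real shape before relying on it.

```python
import json
from typing import Any


def has_session_id(trace: dict[str, Any]) -> bool:
    """True if the trace carries a non-empty session.id attribute."""
    attrs = trace.get("attributes") or {}
    value = attrs.get("session.id")
    if value is None and isinstance(attrs.get("session"), dict):
        # Assumed alternative: attributes nested as {"session": {"id": ...}}.
        value = attrs["session"].get("id")
    return bool(value)


def latest_trace_tagged(raw_json: str) -> bool:
    """Check the first trace in the list, assumed to be the most recent."""
    traces = json.loads(raw_json)
    return bool(traces) and has_session_id(traces[0])


# Hand-written sample standing in for CLI output.
sample = json.dumps([
    {"attributes": {"session.id": "session-audit-1700000000"}},
    {"attributes": {}},
])
print(latest_trace_tagged(sample))
```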

Failure handling

Discover fails (Arize docs unreachable, framework not in supported list) → abort, surface the diff
Viability gate fails → record the skip-with-reason in this file's TODO (change [ ] to [~] with reason), commit only that change, don't create any tier directories
Research surfaces a blocker not caught by the gate → same handling as viability-gate failure; tighten the gate's checks so the next framework catches it earlier
Build/test phase fails → fix the agent.py / requirements.txt, re-run just that phase via --start-from
Session.id audit fails after fix → document inline in agent.py comments and in commit message; mark the framework as having a session-tagging caveat in the PR description

Failure isolation (rollback) — for long autonomous runs

A long-running orchestrator pass should leave the repo in a clean, committable state regardless of whether any single framework succeeded. If a framework fails mid-build or mid-test:

Kill backend processes on ports 8001 (FastAPI), 3000 (Next.js), 6006 (Phoenix). Leave ChromaDB on 8000 alone — it's shared and slow to re-init.
Drop any half-built tier directories for this framework. The cleanest check: a tier counts as "complete" only after its corresponding test phase passes. Anything else is removable. Specifically: rm -rf <tier>/<FRAMEWORK_DIR> for any tier whose tier-test skill didn't return a PASS for this run.
Remove the framework's .env.local files from any directories that survived the cleanup.
Reset shared venv if the failure was a dep-conflict (see Regression check below). Reinstall the previous framework's requirements.txt to restore.
Branch hygiene — if the orchestrator created feat/add-<framework> and committed anything, hard-reset the branch back to origin/main or delete the branch outright. Do not push a partial branch.
Record the failure under the framework's TODO entry with [!] and a short reason (e.g. - [!] Smolagents — failed: streaming events never carry text deltas). Different marker from [~] so a human can see "tried but broke" vs "skipped on principle".

After rollback, the orchestrator may proceed to the next framework in the TODO (autonomous-run mode) or stop and surface the failure (default).
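
The "complete only after its test phase passes" rule lends itself to a small helper that lists which tier directories are candidates for removal. This sketch is hypothetical (the names are illustrative) and deliberately deletes nothing: it only computes paths, leaving the actual rm -rf to the caller.

```python
from pathlib import Path

TIERS = ("no-observability", "phoenix", "ax")


def removable_tier_dirs(repo_root: Path, framework_dir: str, passed: set[str]) -> list[Path]:
    """List this framework's tier directories whose tier-test did not PASS."""
    return [
        repo_root / tier / framework_dir
        for tier in TIERS
        if tier not in passed
    ]


# Example: only the no-observability tier passed its test in this run.
candidates = removable_tier_dirs(Path("/repo"), "agno-py", {"no-observability"})
for path in candidates:
    print(path)
```

Keeping the computation separate from the deletion makes it easy to log or review the list before anything is removed.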

Regression check (after each Build phase)

After installing a new framework's requirements.txt into the shared .venv, verify that no previously-implemented Python tier's agent.py has broken imports. Dependency conflicts have surfaced before: for example, installing CrewAI pinned opentelemetry-semconv==0.55b1, which broke Pydantic AI's instrumentation (it needs >=0.62b1).

# Assumes a git checkout with the shared .venv at the repo root.
REPO_ROOT=$(git rev-parse --show-toplevel)
cd "$REPO_ROOT" || exit 1
for tier in no-observability phoenix ax; do
  for fwk_dir in "$tier"/*-py/; do
    [ "$fwk_dir" = "$TIER/$FRAMEWORK_DIR/" ] && continue   # skip the tier we just built
    fwk_name=$(basename "$fwk_dir")
    # Import each existing tier's agent module; any exception means broken deps.
    "$REPO_ROOT/.venv/bin/python" -c "
import sys; sys.path.insert(0, '$fwk_dir')
try:
    import backend.agent
    print('  OK: $fwk_name')
except Exception as e:
    print(f'  BROKEN: $fwk_name — {type(e).__name__}: {e}')
    sys.exit(1)
" || exit 1
  done
done

If any existing tier breaks, the new framework's install introduced a dep conflict. Mark the new framework as [!] with reason dep-conflict, run the rollback procedure above, and surface the conflict.

(Per-framework venv isolation would prevent this entirely — that's a future refactor. For now, the regression check is the cheap defence.)

State to thread through phases

The orchestrator should maintain these throughout:

FRAMEWORK — the slug (google-adk)
FRAMEWORK_DIR — derived (google-adk-py for Python, google-adk for TS — match existing naming)
LANGUAGE — py, ts, or java, as passed in <language>
CLONE_SOURCE — closest existing tier to copy from. For Python: prefer pydantic-ai-py (simplest agent.py); for TS: prefer vercel-ai-sdk. The build skill takes this as input.
MODEL_OVERRIDE — optional. If the framework can't use the project default claude-sonnet-4-20250514 (e.g. CrewAI needed Sonnet 4.5), set this here and propagate.

Print these at the start of the run so a human can resume.
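
A compact way to carry this state and print it at startup might look like the sketch below. RunState and its methods are hypothetical names; the directory rule follows the FRAMEWORK_DIR examples above (a -py suffix for Python, the bare slug for TypeScript) and should be checked against the repo's existing naming.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RunState:
    """Hypothetical container for the values threaded through the phases."""
    framework: str
    language: str
    clone_source: str
    model_override: Optional[str] = None

    @property
    def framework_dir(self) -> str:
        # Assumed rule: Python dirs get a -py suffix, TypeScript uses the bare slug.
        return f"{self.framework}-py" if self.language == "py" else self.framework

    def banner(self) -> str:
        """Render the state as KEY=value lines for a human to resume from."""
        fields = {
            "FRAMEWORK": self.framework,
            "FRAMEWORK_DIR": self.framework_dir,
            "LANGUAGE": self.language,
            "CLONE_SOURCE": self.clone_source,
            "MODEL_OVERRIDE": self.model_override or "(project default)",
        }
        return "\n".join(f"{key}={value}" for key, value in fields.items())


state = RunState("google-adk", "py", "pydantic-ai-py")
print(state.banner())
```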

End-of-run output

Framework added: <FRAMEWORK>
  Tiers: no-observability, phoenix, ax — all green
  Traces verified: phoenix ✓, ax ✓
  Sessions tagged: ✓ (auto | via using_session wrap)
  Playwright smoke: ✓
  Docs updated: README.md, TODO snapshot in this skill
  PR: <URL>
