
capx-agentic-robotics

Agentic robotics with CaP-X — LLM-driven robot manipulation via code generation. Use when: (1) Setting up CaP-X / CaP-Gym environments for robot manipulation benchmarks, (2) Running CaP-Bench evaluations across LLMs/VLMs on robotic tasks, (3) Building or extending CaP-Agent0 agentic harnesses (skill libraries, visual differencing, parallel reasoning), (4) Training robot coding agents with CaP-RL (GRPO on code generation), (5) Developing perception APIs (SAM3, Molmo, depth, point clouds) or control APIs (IK solvers, grasp planners), (6) Sim-to-real transfer for Franka Panda, R1Pro humanoid, or other robot platforms, (7) Designing auto-synthesized skill libraries for physical manipulation (Voyager-style), (8) Integrating agentic robotics with Life Agent OS (Arcan orchestration, Spaces agent networking, Lago persistence), (9) Any task involving LLM-based robot control, manipulation benchmarks, robotic code synthesis, or embodied AI agents.

Updated: 2026-06-04 17:09

Source: broomva/skills

CaP-X Agentic Robotics

LLM agents that write Python code to control real robots — from zero-shot manipulation to RL-trained coding agents with sim-to-real transfer.

Based on CaP-X (NVIDIA, Berkeley, Stanford, CMU). arXiv: 2603.22435. MIT License.

Quick Start

1. Clone and Install

```bash
git clone https://github.com/capgym/cap-x.git
cd cap-x

# Requires Python 3.10 (BEHAVIOR) or 3.12 (RL/LIBERO)
uv sync          # uses uv (Astral) for package management
```

2. Start Perception Services

CaP-X runs perception models as microservices:

```bash
# SAM3 segmentation (port 8114)
python -m capx.perception.sam3_server

# Molmo 2 pointing (port 8117)
python -m capx.perception.molmo_server

# ContactGraspNet 6-DOF grasps (port 8115)
python -m capx.perception.grasp_server

# OWL-ViT detection (port 8118)
python -m capx.perception.owlvit_server
```

These servers require a CUDA GPU. For inverse kinematics and motion planning:

PyRoKi (port 8116) — CPU-friendly inverse kinematics
cuRobo — GPU-accelerated motion planning (NVIDIA, requires CUDA)
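Before running evaluations, you can confirm that the services are listening. The following is a minimal sketch. It assumes each service accepts plain TCP connections on localhost at the default ports listed above. The `PERCEPTION_PORTS` keys and both helper functions are illustrative names, not CaP-X identifiers.

```python
import socket

# Default ports as listed above (names are illustrative, not CaP-X identifiers).
# Assumes each service listens on plain TCP on localhost.
PERCEPTION_PORTS = {
    "sam3": 8114,
    "contact_graspnet": 8115,
    "pyroki_ik": 8116,
    "molmo": 8117,
    "owlvit": 8118,
}


def is_port_open(host: str, port: int, timeout: float = 0.5) -> bool:
    """Return True if a TCP connection to (host, port) succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers connection refused and timeouts
        return False


def check_services(host: str = "127.0.0.1") -> dict[str, bool]:
    """Map each service name to whether its port accepts connections."""
    return {name: is_port_open(host, port) for name, port in PERCEPTION_PORTS.items()}
```

Any `False` entry from `check_services()` points to the server that still needs to be started. This check only confirms that the port is open. It does not confirm that the model inside the server has finished loading.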

3. Run a Benchmark Task

```bash
# Single task evaluation (zero-shot, 100 trials)
python scripts/eval.py \
  --task cube_lift \
  --model openai/gpt-5.2 \
  --tier S1 \
  --num_trials 100

# Full CaP-Bench sweep
python scripts/eval_capbench.py --model anthropic/claude-opus-4.5
```

Architecture

```text
CaP-X
├── CaP-Gym ─── Gymnasium interface wrapping 187 tasks
│   ├── RoboSuite (7 core) ── Franka Panda tabletop/bimanual
│   ├── LIBERO-PRO (130+) ── Franka Panda kitchen/living
│   └── BEHAVIOR (50) ────── R1Pro humanoid, Isaac Sim
│
├── CaP-Bench ── 8 tiers (S1-S4 single, M1-M4 multi-turn)
│   ├── Varies: perception noise, API abstraction, visual grounding
│   └── 12 frontier LLMs/VLMs benchmarked
│
├── CaP-Agent0 ── Training-free agentic harness
│   ├── Visual Differencing Module (VDM)
│   ├── Auto-synthesized skill library (Voyager lineage)
│   └── Parallel ensembled reasoning (multi-model)
│
└── CaP-RL ──── GRPO post-training on code generation
    ├── 7B model: 25% → 80% in 50 iterations
    └── Zero-shot sim-to-real transfer
```

Control Flow

The agent never outputs raw joint commands. Instead:

LLM → generates Python code → composes perception + control APIs → robot executes

API abstraction levels (Franka example):

FrankaControlApi — Full high-level (perception + IK control)
FrankaControlPrivilegedApi — Oracle state (no perception noise)
FrankaControlReducedApi — Low-level primitives
FrankaControlReducedSkillLibraryApi — Low-level + auto-synthesized skills
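The control flow above can be shown with a toy harness. The LLM's reply is a string of Python, and the harness executes it in a namespace that exposes an API object instead of raw joint commands. This is a minimal sketch. `ToyFrankaApi`, its `get_object_pose` method, and `run_generated_code` are hypothetical stand-ins, not CaP-X classes. A real harness would also sandbox the execution.

```python
from dataclasses import dataclass, field


@dataclass
class ToyFrankaApi:
    """Hypothetical stand-in for a high-level control API; it only records calls."""
    log: list = field(default_factory=list)

    def get_object_pose(self, name: str):
        # Pretend perception: fixed position (metres) and identity quaternion (w, x, y, z)
        poses = {"red_cube": ((0.5, 0.0, 0.02), (1.0, 0.0, 0.0, 0.0))}
        return poses[name]

    def goto_pose(self, pos, quat):
        self.log.append(("goto_pose", tuple(pos)))

    def open_gripper(self):
        self.log.append(("open_gripper",))

    def close_gripper(self):
        self.log.append(("close_gripper",))


def run_generated_code(code: str, api) -> None:
    """Execute LLM-generated policy code with `api` (plus Python builtins) in scope; no sandboxing."""
    exec(code, {"api": api})


# What an LLM might return for "lift the red cube" (illustrative)
GENERATED = """
pos, quat = api.get_object_pose("red_cube")
api.open_gripper()
api.goto_pose((pos[0], pos[1], pos[2] + 0.10), quat)   # pre-grasp above the cube
api.goto_pose(pos, quat)
api.close_gripper()
api.goto_pose((pos[0], pos[1], pos[2] + 0.20), quat)   # lift
"""

api = ToyFrankaApi()
run_generated_code(GENERATED, api)
```

In this design the model only ever sees method names and their semantics. That is why the same generated program can run against a privileged, reduced, or skill-library API as long as the method surface matches.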

CaP-Bench Tiers

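| Tier | Mode | Perception | Abstraction | Visual Grounding |
|------|------|------------|-------------|------------------|
| S1 | Single-turn | Noiseless | High | -- |
| S2 | Single-turn | Noisy | High | -- |
| S3 | Single-turn | Noisy | Low | -- |
| S4 | Single-turn | Noisy | Low | -- |
| M1 | Multi-turn | Noisy | High | -- |
| M2 | Multi-turn | Noisy | High | Multimodal feedback |
| M3 | Multi-turn | Noisy | Low | -- |
| M4 | Multi-turn | Noisy | Low | VDM |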

Run specific tiers:

```bash
python scripts/eval.py --task cube_stack --model google/gemini-3-pro --tier M4
```

CaP-Agent0: Training-Free Harness

Three components inspired by Voyager (Wang et al., 2023):

1. Visual Differencing Module (VDM)

Converts before/after scene images into structured text describing what changed. This addresses the cross-modal alignment failures observed in the M2 tier, where the model receives raw multimodal feedback.
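The actual VDM works on images. The sketch below illustrates only the output side, the structured "what changed" text. It assumes object positions have already been extracted from the before and after frames, for example by the perception stack. `describe_scene_change`, `format_pos`, and the movement tolerance are hypothetical.

```python
import math

Position = tuple[float, float, float]  # metres, in a shared world frame (assumed)


def format_pos(p: Position) -> str:
    return "(" + ", ".join(f"{v:.2f}" for v in p) + ")"


def describe_scene_change(before: dict[str, Position],
                          after: dict[str, Position],
                          move_tol: float = 0.01) -> list[str]:
    """Turn before/after object positions into short text statements about changes."""
    lines = []
    for name in sorted(before.keys() | after.keys()):
        if name not in after:
            lines.append(f"{name} is no longer visible")
        elif name not in before:
            lines.append(f"{name} appeared at {format_pos(after[name])}")
        else:
            dist = math.dist(before[name], after[name])
            if dist > move_tol:
                dz = after[name][2] - before[name][2]
                direction = "up" if dz > move_tol else "down" if dz < -move_tol else "horizontally"
                lines.append(f"{name} moved {dist:.3f} m ({direction})")
    return lines or ["no change detected"]
```

A text diff such as "red_cube moved 0.100 m (up)" gives a code-writing LLM something it can check directly against the intent of the code it just ran. This is easier than having the model compare two raw images.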

2. Auto-Synthesized Skill Library

Reusable functions are extracted from successful execution traces and persist across trials. Compile a library after an evaluation run, then pass it to later evaluations:

```bash
# After running evaluations, compile skill library from successful traces
python scripts/skill_library_compilation/compile.py \
  --eval_outputs results/cube_lift/ \
  --output skills/

# Use compiled library in evaluation
python scripts/eval.py --task cube_stack --tier M4 --skill_library skills/
```

Compilation discovered 9 task-agnostic skills spanning geometric utilities, grasp filters, and quaternion helpers.
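One way to picture the compilation step is to parse the code from successful traces, keep the top-level function definitions, and deduplicate them by normalized source. This is a hedged sketch that uses Python's standard `ast` module. The real `compile.py` may select and generalize skills differently, for example with an LLM. `extract_skills` and `render_for_prompt` are hypothetical names.

```python
import ast


def extract_skills(successful_traces: list[str]) -> dict[str, str]:
    """Collect top-level functions from successful policy code, keyed by function name.

    Sources are normalized with ast.unparse, so differences in formatting or comments
    do not create duplicates. On a name clash the first version seen is kept.
    """
    library: dict[str, str] = {}
    seen_sources: set[str] = set()
    for code in successful_traces:
        try:
            tree = ast.parse(code)
        except SyntaxError:
            continue  # skip traces that are not valid Python
        for node in tree.body:
            if isinstance(node, ast.FunctionDef):
                src = ast.unparse(node)
                if src not in seen_sources and node.name not in library:
                    library[node.name] = src
                    seen_sources.add(src)
    return library


def render_for_prompt(library: dict[str, str]) -> str:
    """Concatenate skills in name order so they can be prepended to the LLM prompt."""
    return "\n\n".join(library[name] for name in sorted(library))
```

Parsing instead of executing the traces keeps compilation side-effect free. `ast.parse` never runs the robot calls that surround the helper functions.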

3. Parallel Ensembled Reasoning

Multiple models generate candidate solutions:

```bash
python scripts/eval_agent0.py \
  --task cube_stack \
  --models "google/gemini-3-pro,openai/gpt-5.2,anthropic/claude-opus-4.5" \
  --ensemble
```

Result: CaP-Agent0 matches or exceeds human expert code on 4 of 7 tasks and is competitive with trained VLA policies (OpenVLA, π0) despite requiring no training.
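The parallel-generation pattern can be sketched as follows. The sketch assumes each model is wrapped as a callable that returns candidate code, and that candidates are ranked by a scoring function, for example success in a simulated rollout. The selection rule used here, highest score wins, is an assumption. CaP-Agent0's actual ensembling may combine candidates differently. `ensemble_generate` is a hypothetical name.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

CodeGenerator = Callable[[str], str]  # prompt -> candidate policy code
Scorer = Callable[[str], float]       # candidate code -> score, higher is better


def ensemble_generate(prompt: str,
                      models: dict[str, CodeGenerator],
                      score: Scorer) -> tuple[str, str, float]:
    """Query every model in parallel; return (model_name, code, score) of the best candidate."""
    if not models:
        raise ValueError("need at least one model")
    with ThreadPoolExecutor(max_workers=len(models)) as pool:
        futures = {name: pool.submit(gen, prompt) for name, gen in models.items()}
    # Leaving the `with` block waits for all futures to finish.
    candidates = {}
    for name, fut in futures.items():
        try:
            candidates[name] = fut.result()
        except Exception:
            pass  # a model that errors out simply contributes no candidate
    if not candidates:
        raise RuntimeError("no model produced a candidate")
    scored = [(name, code, score(code)) for name, code in candidates.items()]
    # max() returns the first of any tied candidates (dict insertion order)
    return max(scored, key=lambda item: item[2])
```

Threads are sufficient here because each model call is typically a network request to an API endpoint. The work is I/O-bound, not CPU-bound.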

CaP-RL: RL on Code Generation

GRPO (Group Relative Policy Optimization) applied to Qwen2.5-Coder-7B:

```bash
# Train on privileged tier S1 for stable convergence
python scripts/train_rl.py \
  --task cube_lift \
  --base_model Qwen/Qwen2.5-Coder-7B-Instruct \
  --group_size 15 \
  --train_iterations 50 \
  --gpu_type h100
```
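The core of GRPO is the group-relative advantage. For each prompt, a group of completions is sampled (here, `--group_size 15` programs). Each completion receives a scalar reward, and that reward is normalized against its own group's mean and standard deviation, so no learned value function is needed. The sketch below shows only this normalization step and assumes binary task-success rewards. It omits the clipped policy-gradient update and KL penalty of a full trainer. It uses the population standard deviation, and implementations vary on that choice.

```python
import statistics


def group_relative_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """GRPO advantage within one group: A_i = (r_i - mean(r)) / (std(r) + eps)."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)  # population std; some implementations use sample std
    return [(r - mean) / (std + eps) for r in rewards]


# One group of 15 sampled programs for the same task: 1.0 = task succeeded.
rewards = [1.0] * 5 + [0.0] * 10
adv = group_relative_advantages(rewards)
```

With 5 of 15 successes, each successful program gets an advantage of about +1.41 and each failure about -0.71. If every program in a group succeeds, or every one fails, all advantages are zero and that group contributes no learning signal. This is one reason the command above starts on the privileged S1 tier, where some successes are likely.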

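| Task | Base (7B) | + CaP-RL (50 iters) | Human Expert |
|------|-----------|---------------------|--------------|
| Cube Lift (sim) | 25% | 80% | 93% |
| Cube Stack (sim) | 4% | 44% | 73% |
| Spill Wipe (sim) | 30% | 93% | 100% |
| Cube Lift (real) | 24% | 84% | 92% |
| Cube Stack (real) | 12% | 76% | 84% |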

The RL-trained models transfer zero-shot to real robots because they reason over abstract APIs rather than raw pixels.

See references/caprl-training.md for full GRPO configuration, GPU requirements, and training recipes.

Sim-to-Real Transfer

```text
Simulation (CaP-Gym) ──[abstract APIs]──> Real Robot
                                          ├── Franka Panda (primary)
                                          ├── AgiBot G1 (bimanual demos)
                                          └── R1Pro (mobile manipulation)
```

Real-world requirements:

Stereo camera with calibrated metric-scale depth (e.g., ZED)
robots_realtime package for Franka Panda control
Same perception service stack (SAM3, Molmo, ContactGraspNet)
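Metric-scale depth matters because perception outputs are in pixel coordinates, such as a Molmo point or the centroid of a SAM3 mask. They must be lifted into 3D before the control API can target them. Below is a minimal sketch of standard pinhole back-projection. It assumes an undistorted image and known intrinsics `fx, fy, cx, cy`. The numbers are illustrative, not ZED calibration values, and `Intrinsics` and `deproject` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Intrinsics:
    fx: float  # focal length in pixels (x)
    fy: float  # focal length in pixels (y)
    cx: float  # principal point x (pixels)
    cy: float  # principal point y (pixels)


def deproject(u: float, v: float, depth_m: float, K: Intrinsics) -> tuple[float, float, float]:
    """Back-project pixel (u, v) with metric depth into camera-frame XYZ in metres."""
    if depth_m <= 0:
        raise ValueError("depth must be positive (0 usually means no valid reading)")
    x = (u - K.cx) * depth_m / K.fx
    y = (v - K.cy) * depth_m / K.fy
    return (x, y, depth_m)


K = Intrinsics(fx=700.0, fy=700.0, cx=640.0, cy=360.0)  # illustrative values
point_cam = deproject(710.0, 360.0, 0.5, K)  # 70 px right of centre, 0.5 m away
```

The result is in the camera frame. A real pipeline would then transform it into the robot base frame using the camera-to-base extrinsic calibration. If the depth scale is wrong, every grasp target will be off by that same scale factor.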

Extending CaP-X

Add a New Task

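```python
# Register in capx/envs/your_suite/
# CaPGymTask, PerceptionStack and FrankaControlApi come from the capx package;
# see the existing suites under capx/envs/ for their import paths.
class MyTask(CaPGymTask):
    """Gymnasium-compatible task wrapper."""

    def __init__(self):
        super().__init__()
        self.perception = PerceptionStack(sam3=True, molmo=True, depth=True)
        self.control = FrankaControlApi(ik_solver="pyroki")

    def get_prompt(self) -> str:
        # Natural-language instruction handed to the code-writing LLM
        return "Pick up the red cube and place it on the green platform."

    def compute_reward(self, obs, action, next_obs) -> float:
        # Verifiable environment reward for RLVR: 1.0 on success, 0.0 otherwise.
        # _check_cube_on_platform is a task-specific success check you implement.
        return float(self._check_cube_on_platform(next_obs))
```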

Add a New Robot

Implement the control API interface. See references/api-spec.md for the full specification:

goto_pose(pos, quat) — Move end-effector to target pose via IK
grasp() / open_gripper() / close_gripper() — Gripper control
get_ee_pose() — Current end-effector state
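You can write the interface down as a `typing.Protocol`, which lets a new robot backend be type-checked without inheriting from a base class. This sketch assumes only the method names listed above. The argument types, the `bool` return values, and the (w, x, y, z) quaternion ordering are assumptions, so check references/api-spec.md for the real signatures. `RobotControlApi` and `MockRobot` are hypothetical names.

```python
from typing import Protocol, runtime_checkable

Vec3 = tuple[float, float, float]
Quat = tuple[float, float, float, float]  # assumed (w, x, y, z) ordering


@runtime_checkable
class RobotControlApi(Protocol):
    def goto_pose(self, pos: Vec3, quat: Quat) -> bool: ...
    def grasp(self) -> bool: ...
    def open_gripper(self) -> None: ...
    def close_gripper(self) -> None: ...
    def get_ee_pose(self) -> tuple[Vec3, Quat]: ...


class MockRobot:
    """In-memory test double: 'moves' instantly and tracks gripper state."""

    def __init__(self) -> None:
        self._pose: tuple[Vec3, Quat] = ((0.0, 0.0, 0.5), (1.0, 0.0, 0.0, 0.0))
        self.gripper_closed = False

    def goto_pose(self, pos: Vec3, quat: Quat) -> bool:
        self._pose = (tuple(pos), tuple(quat))
        return True  # a real backend would return False when IK fails

    def grasp(self) -> bool:
        self.close_gripper()
        return True  # a real backend would check finger width or force

    def open_gripper(self) -> None:
        self.gripper_closed = False

    def close_gripper(self) -> None:
        self.gripper_closed = True

    def get_ee_pose(self) -> tuple[Vec3, Quat]:
        return self._pose
```

A mock like this lets you test generated policy code and skill libraries on a machine with no robot attached before wiring in the real driver.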

Add a Perception Module

Input: RGB image (+ optional depth)
Output: Structured detection/segmentation result
Follow the pattern of the existing servers in capx/perception/.
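Following that input/output contract, a perception module can be a small HTTP service. The sketch below uses only the Python standard library. The `/detect` route, the JSON fields (`image_b64`, `prompt`, `detections`), and the placeholder `run_model` are all hypothetical. Align them with the existing servers in capx/perception/, whose actual protocol is not shown here.

```python
import base64
import json
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


def run_model(rgb_bytes: bytes, prompt: str) -> list[dict]:
    """Placeholder for real inference (e.g. a segmentation or detection model)."""
    return [{"label": prompt, "bbox": [0, 0, 10, 10], "score": 0.0}]


class PerceptionHandler(BaseHTTPRequestHandler):
    def do_POST(self) -> None:
        if self.path != "/detect":
            self.send_error(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        try:
            payload = json.loads(self.rfile.read(length))
            rgb = base64.b64decode(payload["image_b64"])
            result = {"detections": run_model(rgb, payload.get("prompt", ""))}
            body, status = json.dumps(result).encode(), 200
        except (ValueError, KeyError, TypeError) as exc:  # bad JSON, bad base64, missing field
            body, status = json.dumps({"error": str(exc)}).encode(), 400
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, fmt, *args) -> None:
        pass  # keep the console quiet


def serve(port: int = 0) -> ThreadingHTTPServer:
    """Create the server on localhost; port 0 lets the OS pick a free port."""
    return ThreadingHTTPServer(("127.0.0.1", port), PerceptionHandler)
```

Run it with `serve(8119).serve_forever()`. Port 8119 is an arbitrary choice outside the ports used above. Keeping each model behind its own process means a GPU-heavy model can be restarted or moved to another machine without touching the agent.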

Integration with Life Agent OS

CaP-X connects to the Broomva Agent OS stack at three points:

Arcan Orchestration

Orchestrate multi-step robotic workflows as Arcan task graphs:

```rust
// Arcan pipeline for robotic manipulation
let pipeline = arcan::Pipeline::new()
    .step("perceive", capx_perception_step)   // SAM3 + Molmo
    .step("plan", capx_agent_step)            // LLM code generation
    .step("execute", capx_control_step)       // Robot actuation
    .step("verify", capx_vdm_step)            // Visual differencing
    .on_failure("reflect", capx_reflect_step) // Self-correction loop
    .build();
```

Spaces Agent Networking

Robotic agents publish events to Spaces channels:

```text
#robot-logs    <- Execution traces, success/failure, skill library updates
#agent-logs    <- Session summaries (standard bstack integration)
#perception    <- Scene descriptions, object detections, grasp candidates
```

Multiple robot agents can share skill libraries via Spaces, so a skill discovered by one robot can be reused by the others.
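The skill-sharing idea can be made concrete with an in-memory stand-in for a channel bus. This sketch does not use the Spaces API. `ChannelBus`, `RobotAgent`, the `skill_update` event shape, and the robot names are hypothetical. It only shows the flow in which one robot publishes a discovered skill to `#robot-logs` and another robot merges it into its own library.

```python
from collections import defaultdict
from typing import Callable

Event = dict  # e.g. {"type": "skill_update", "robot": ..., "name": ..., "source": ...}


class ChannelBus:
    """Minimal synchronous publish/subscribe keyed by channel name."""

    def __init__(self) -> None:
        self._subs: defaultdict[str, list[Callable[[Event], None]]] = defaultdict(list)

    def subscribe(self, channel: str, handler: Callable[[Event], None]) -> None:
        self._subs[channel].append(handler)

    def publish(self, channel: str, event: Event) -> None:
        for handler in self._subs[channel]:
            handler(event)


class RobotAgent:
    def __init__(self, name: str, bus: ChannelBus) -> None:
        self.name, self.bus, self.skills = name, bus, {}
        bus.subscribe("#robot-logs", self._on_event)

    def discover_skill(self, skill_name: str, source: str) -> None:
        self.skills[skill_name] = source
        self.bus.publish("#robot-logs", {"type": "skill_update", "robot": self.name,
                                         "name": skill_name, "source": source})

    def _on_event(self, event: Event) -> None:
        # Ignore our own events; keep any local version already present.
        if event.get("type") == "skill_update" and event["robot"] != self.name:
            self.skills.setdefault(event["name"], event["source"])
```

Using `setdefault` means a robot never silently overwrites a locally validated skill with a remote one. A production system would more likely version skills, for example by content hash.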

Lago Persistence

Store and version:

Skill libraries — SHA-256 hashed Python functions as Lago blobs
Evaluation traces — Execution logs for CaP-RL training data
Benchmark results — CaP-Bench scores across models and tiers
Trained checkpoints — RL-tuned model weights
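Content addressing is what lets skill libraries stored as SHA-256 hashed blobs deduplicate and version cleanly. A blob's key is the hash of its bytes, so identical functions map to the same key and any edit produces a new key. Below is a minimal in-memory sketch using `hashlib`. `BlobStore` and its per-name version log are hypothetical and are not Lago's API.

```python
import hashlib


class BlobStore:
    """In-memory content-addressed store with a per-name version history."""

    def __init__(self) -> None:
        self.blobs: dict[str, bytes] = {}         # sha256 hex digest -> content
        self.versions: dict[str, list[str]] = {}  # logical name -> ordered digests

    def put(self, name: str, content: bytes) -> str:
        digest = hashlib.sha256(content).hexdigest()
        self.blobs.setdefault(digest, content)    # identical content stored once
        history = self.versions.setdefault(name, [])
        if not history or history[-1] != digest:  # only record actual changes
            history.append(digest)
        return digest

    def get(self, name: str, version: int = -1) -> bytes:
        """Fetch a version of `name` by index into its history (default: latest)."""
        return self.blobs[self.versions[name][version]]
```

Because the digest depends only on the content, two robots that independently discover the same function produce the same key. Merging their libraries therefore needs no coordination.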

Companion Skills

```bash
# ORCA Hand (17-DOF tendon-driven hand, ETH Zurich)
npx skills add broomva/skills --skill orcahand -g -y

# Remote GPU (offload perception/training to NUC or cloud)
npx skills add broomva/skills --skill remote-gpu -g -y
```

Resources

references/

references/api-spec.md — Full perception and control API specification
references/capbench-results.md — Benchmark results across 12 models, 8 tiers
references/caprl-training.md — CaP-RL training guide (GRPO config, GPU requirements, recipes)

scripts/

scripts/setup-perception.sh — Start all perception microservices
scripts/run-benchmark.sh — Full CaP-Bench evaluation sweep

Key Papers

CaP-X: arXiv:2603.22435 — Framework paper
Voyager: arXiv:2305.16291 — Skill library + self-reflection origin
Code as Policies: arXiv:2209.07753 — LLM code to robot control
SayCan: arXiv:2204.01691 — Grounding LLMs in robotic affordances