一键导入
team-massgen-run
// Launch parallel MassGen step mode processes — the lead manages all background tasks directly, tracks answers/votes, detects consensus, and synthesizes the result.
// Launch parallel MassGen step mode processes — the lead manages all background tasks directly, tracks answers/votes, detects consensus, and synthesizes the result.
Quick one-shot evaluation of work against quality criteria. Generates criteria, runs the round-evaluator, and presents the verdict. Lighter than /refine — does not auto-improve, just evaluates. Use for pre-PR quality checks or getting a critical assessment.
Launch a MassGen multi-agent run. Multiple LLM backends (codex, gemini, claude, grok) collaborate on a task via voting and consensus. Runs in Docker containers by default.
Run a MassGen-style quality refinement loop. Generates eval criteria, produces/improves answers with subagent help, regression-guards before submission, then evaluates with round-evaluator and trace-analyzer in parallel. Invoke with /refine.
Analyze media files (images, video, audio) using AI vision and audio models with a critical-first lens. Use for output-first verification — render your deliverable, then read_media to see what it actually looks like. Distinguishes fundamental issues from surface-level fixes.
| name | team-massgen-run |
| description | Launch parallel MassGen step mode processes — the lead manages all background tasks directly, tracks answers/votes, detects consensus, and synthesizes the result. |
| user-invocable | true |
| argument-hint | <query> [with codex, gemini, grok, claude] [--no-docker] [--max-rounds N] [--timeout N] [--cost-budget N] |
| allowed-tools | ["Read","Write","Edit","Bash","Glob","Grep","Agent"] |
You are orchestrating a multi-model run. You are the lead and sole orchestrator. You launch all MassGen step mode processes as background tasks, track their results, detect consensus, and synthesize the final answer. No Agent Teams, no teammates — you manage everything directly.
.massgen-quality/environment.json — verify massgen.availableapi_keys in environment.json — verify which providers have authopenai/* or codex/* → OPENAI_API_KEYgemini/* → GOOGLE_API_KEY or GEMINI_API_KEYclaude/* or claude_code/* → ANTHROPIC_API_KEYgrok/* → XAI_API_KEYmassgen --step is supported (read plugin_dir from environment.json):
bash "<plugin_dir>/scripts/run-massgen.sh" --help 2>&1 | grep -q "\-\-step"
query: The task (required)models: Which backends to use. Can be specified naturally — no flags needed.--no-docker: Use local execution (default: Docker if available)--max-rounds: Max rounds per agent (default: 5)--timeout: Total timeout in seconds (default: 1800)--cost-budget: Maximum total cost in dollars across all agents (no default — unlimited if unset)Models can be mentioned anywhere in the prompt naturally. All of these work:
/team-massgen-run "Build a landing page"
/team-massgen-run "Build a landing page" with codex, gemini, and grok
/team-massgen-run "Build a landing page" using codex and claude
/team-massgen-run "Build a landing page" --models codex gemini
Short names expand to default models:
| Short name | Expands to |
|---|---|
codex | codex/gpt-5.4 |
gemini | gemini/gemini-3.1-pro-preview |
claude | claude/claude-opus-4-6 |
grok | grok/grok-4.20-0309-reasoning |
Full specs like codex/gpt-5.4 are also accepted for specific model versions.
Read api_keys from .massgen-quality/environment.json and pick up to 3 available backends:
codex/gpt-5.4gemini/gemini-3.1-pro-previewclaude/claude-opus-4-6grok/grok-4.20-0309-reasoningPick the first 3 that have keys. If fewer than 2 have keys, abort.
Create team session directory (ID format: team_<YYYYMMDD_HHMMSS>):
.massgen-quality/sessions/<team_id>/
config.json
session/agents/ (empty)
configs/ (generated per-step configs)
synthesis/ (final answer)
Write config.json:
{
"session_id": "team_<timestamp>",
"query": "<query>",
"models": ["codex/gpt-5.4", "gemini/gemini-3.1-pro-preview"],
"agent_mapping": {
"agent_a": "codex/gpt-5.4",
"agent_b": "gemini/gemini-3.1-pro-preview"
},
"max_rounds": 5,
"timeout_seconds": 1800
}
Read plugin_dir from .massgen-quality/environment.json.
For each backend, generate a config using the massgen wrapper:
bash "<plugin_dir>/scripts/run-massgen.sh" \
--quickstart --headless \
--config .massgen-quality/sessions/<team_id>/configs/<agent_id>_step.yaml \
--config-backend <type> \
--config-model <model> \
--config-agent-id <agent_id>
Add --config-docker if Docker is available. Split the backend spec on /
(e.g., codex/gpt-5.4 → --config-backend codex --config-model gpt-5.4).
Use absolute paths for --config.
Generate evaluation criteria (via generate_eval_criteria MCP tool) and
convert to massgen's --eval-criteria format. The MCP tool outputs
{id, name, description, weight} but massgen expects {text, category}.
Convert each criterion:
{"text": "<description>", "category": "must"}
Write the converted array to .massgen-quality/sessions/<team_id>/session/eval_criteria.json.
Massgen --eval-criteria format (required fields):
[
{"text": "criterion description", "category": "must"},
{"text": "another criterion", "category": "should", "verify_by": "how to check"}
]
text: description of the criterion (required)category: "must" | "should" | "could" (required)verify_by: how to verify (optional)For each agent, launch massgen --step as a background task. All paths must
be absolute. Step mode runs take 5-30 minutes, so you MUST use
run_in_background: true on the Bash tool.
Launch all N processes simultaneously in a single message with multiple Bash tool calls:
bash "<plugin_dir>/scripts/run-massgen.sh" \
--step \
--session-dir "<session_dir>" \
--config "<config_path>" \
--eval-criteria "<criteria_path>" \
--automation \
"<query>"
The wrapper handles API key sourcing and massgen resolution automatically. Record the background task ID for each agent. You will be automatically notified when each completes — no polling needed.
As each background task completes (you receive auto-notification):
SESSION_DIR: /path/to/session
ACTION: new_answer|vote
STATUS: /path/to/session/agents/<agent_id>/last_action.json
<session_dir>/agents/<agent_id>/last_action.jsonagent_a: { status: "answered", latest_answer_step: 1 }
agent_b: { status: "voted", latest_answer_step: null, voted_for: "agent_a", seen_steps: {...} }
Wait until ALL agents have completed their step before checking consensus.
At any point, run the team status script for a formatted overview:
bash scripts/team-status.sh <session_dir> --config <config.json>
This shows: agent table, answer previews, vote graph, consensus status, and a timeline swimlane of all rounds.
CRITICAL: Only check consensus when ALL agents have completed their step.
If any agent submitted a new_answer this round:
vote.json → check seen_steps against current answer stepsseen_steps[X] < current answer step for agent X → vote is staleOnly when ALL agents have voted with non-stale votes:
If no consensus or stale votes exist:
If max_rounds, timeout_seconds, or --cost-budget reached:
When an agent submits a new_answer:
new_answer this round waitThis mirrors MassGen's production restart_pending logic.
After consensus:
agents/<winner>/last_action.json → note workspace_pathsession_dir/agents/<winner>/<step>/workspace/
(workspace paths in answer text point to this location — auto-replaced on save).massgen-quality/sessions/<team_id>/synthesis/final_answer.mdcost from each agent's last_action.json to get
cumulative spendtimeout_seconds reached: force consensus using the answer with the most
votes; if no votes, use the first answer submittedmax_rounds reached: same forced consensus logic--cost-budget exceeded: same forced consensus logic — report the total
spend to the userRead these for protocol details:
skills/team-massgen-run/references/step-mode.md — MassGen step mode interfaceskills/team-massgen-run/references/team-protocol.md — Lead-managed orchestration protocolsession_dir/agents/ directly — only massgen writes there