auto-research-blueprint-execute-swarm

Name: Auto Research Blueprint Execute Swarm
Author: modelscope

Description: Execute AgentJet reinforcement learning experiments using experiment blueprints in swarm mode. Handles the full lifecycle: generate a blueprint if needed, launch the experiment in tmux, monitor progress, analyze errors, collect results, and write a finish flag. Use when the user wants to run or debug AgentJet training experiments.


stars: 209

forks: 22

updated: 2026-05-13 05:03


related-skills.json

Same repository

auto-research-blueprint-execute-classic.md

from "modelscope/AgentJet"

Execute AgentJet reinforcement learning experiments using experiment blueprints in classic (non-swarm) mode. Handles full lifecycle: launch experiment in tmux, monitor progress, analyze errors, collect results, and write finish flag. Use when the user wants to run AgentJet training experiments without the swarm distributed framework.

2026-05-13, 209 stars

conda-install-agentjet-swarm-server.md

from "modelscope/AgentJet"

Install AgentJet swarm server using Conda. Handles Python 3.10 environment creation, dependency installation with the verl training backbone, flash-attn compilation, and optional PyPI mirror for China users.

2026-05-13, 209 stars

docker-install-agentjet-swarm-server.md

from "modelscope/AgentJet"

Install and run the AgentJet Swarm Server in a Docker container with NVIDIA GPU support. Use when the user wants to deploy a swarm server on a GPU machine via Docker, including GPU driver setup, Docker mirror configuration, model weight mounting, and server startup.

2026-05-13, 209 stars

download-from-swanlab-url.md

from "modelscope/AgentJet"

Download per-step time-series metric data (reward, entropy, response length, etc.) from a SwanLab cloud run URL as a pandas.DataFrame. Use when the user provides a SwanLab URL and wants to fetch or analyze training curves.

2026-05-13, 209 stars

install-agentjet-client.md

from "modelscope/AgentJet"

Install AgentJet client for connecting to a swarm server. Use when the user only needs to run the AgentJet client (not a swarm server) and does not need to run models locally, e.g. on a laptop. Installs basic requirements via `pip install -e .`.

2026-05-13, 209 stars

map-verl-config.md

from "modelscope/AgentJet"

Map VERL training configuration to AgentJet configuration. Find VERL config in verl_default.yaml, check for existing mappings in config_auto_convertion_verl.jsonc, add new mappings to ajet_default.yaml and the conversion schema, and optionally add parameters to AgentJetJob.

2026-05-13, 209 stars

package.json

"author": "modelscope"

"repository": "modelscope/AgentJet"



# Experiment Blueprint

## [exp_purpose]
- description:
- hint:
- content 1:
- content 2:
- content 3:

## [exp_codebase_dir]
- description:
- hint:
- content 1:
- content 2:
- content 3:
- warning 1:
- warning 2:

## [exp_venv_exe]
- description:
- hint:
- content 1:
- content 2:
- content 3:
- warning 1:
- warning 2:

## [exp_yaml_path]
- description:
- hint:
- content 1:
- content 2:
- content 3:
- warning 1:
- warning 2:

## [exp_launch_command]
- description:
- hint:
- content 1:
- content 2:
- content 3:
- warning 1:
- warning 2:

## [exp_result_dir]
- description:
- hint:
- content 1:
- content 2:
- content 3:
- warning 1:
- warning 2:

## [exp_max_time]
- description:
- hint:
- content 1:
- content 2:
- content 3:
- warning 1:
- warning 2:

## Other Notes
- description:
- note 1:
- note 2:
- note 3:
- note 4:
- note 5:
....
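The blueprint template above is plain markdown with one `## [key]` heading per experiment field, so it can be read mechanically. The following is a minimal sketch, assuming the blueprint keeps exactly that layout (bracketed section headings followed by `- name: value` bullets). The function name `parse_blueprint` and the regular expressions are hypothetical and are not part of AgentJet.

```python
import re

# Hypothetical patterns, derived only from the template layout shown above.
SECTION_RE = re.compile(r"^##\s+\[(?P<key>[A-Za-z0-9_]+)\]\s*$")
FIELD_RE = re.compile(r"^-\s*(?P<name>[^:]+):\s*(?P<value>.*)$")


def parse_blueprint(text: str) -> dict[str, dict[str, str]]:
    """Map each bracketed section (e.g. exp_yaml_path) to its bullet fields."""
    sections: dict[str, dict[str, str]] = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        section = SECTION_RE.match(line)
        if section:
            current = sections.setdefault(section.group("key"), {})
            continue
        if line.startswith("#"):
            # Any other heading (the title, "## Other Notes") closes the section.
            current = None
            continue
        field = FIELD_RE.match(line)
        if field and current is not None:
            current[field.group("name").strip()] = field.group("value").strip()
    return sections
```

Splitting each bullet at its first colon keeps values that themselves contain colons (paths, commands with URLs) intact. Free-form sections such as `Other Notes` are skipped on purpose in this sketch.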

---
name: monitor-with-tmux
description: Monitor training progress by reading tmux content at exponential-backoff intervals (30 seconds, 1 minute, 2 minutes, 4 minutes, 8 minutes, 16 minutes), analyze logs when anomalies occur, and suggest fixes
license: Complete terms in LICENSE.txt
---

# Monitor with Tmux

Monitor training in tmux, detect anomalies, analyze errors, and suggest fixes.

## Step zero

Create the sleep script used for tmux monitoring:

1. Create `./tmp/tmux_wait.py`

```python
import argparse
import subprocess
import time

# Pane commands that mean "nothing is running, we are back at a shell prompt".
SHELLS = {"bash", "zsh", "sh", "fish", "csh", "tcsh", "ksh", "dash", "ash"}


def smart_sleep(session: str, seconds: float, check_every: float = 2.0) -> bool:
    """
    Replacement for time.sleep() that returns early when the command ends.

    Returns:
        True  - normal timeout (the command is still running)
        False - early return (the command ended or the session is gone)
    """
    end_time = time.time() + seconds
    while time.time() < end_time:
        try:
            r = subprocess.run(
                ["tmux", "list-panes", "-F", "#{pane_current_command}", "-t", session],
                capture_output=True, text=True, timeout=5
            )
            if r.returncode != 0:
                return False  # the session is gone
            cmds = [line.strip().lower() for line in r.stdout.splitlines() if line.strip()]
            if not any(c not in SHELLS for c in cmds):
                return False  # the command ended, every pane is back at a shell
        except Exception:
            return False
        # Clamp at zero: the deadline may pass between the loop check and this call,
        # and time.sleep() raises ValueError on a negative argument.
        time.sleep(max(0.0, min(check_every, end_time - time.time())))
    return True


def main():
    parser = argparse.ArgumentParser(description="Wait for a tmux session with smart early-exit.")
    parser.add_argument("session", help="tmux session name")
    parser.add_argument("seconds", type=float, help="total seconds to wait")
    args = parser.parse_args()
    timed_out = smart_sleep(args.session, args.seconds, 2)
    raise SystemExit(0 if timed_out else 1)


if __name__ == "__main__":
    main()
```

## Start monitoring

When you need to monitor a tmux window, run:

```bash
python ./tmp/tmux_wait.py my_ajet_session_name 30
```

This means:

1. Monitor the tmux session named my_ajet_session_name
2. Wait 30 seconds

- Exit code 0: normal timeout (the command is still running)
- Exit code 1: the command ended early or the session disappeared

## Using SSH

When using SSH, always use a local tmux window to establish the SSH connection.

## When you want to delay before reading the tmux window again

The wait must return immediately when something fails, so do not use `sleep xxx`. Use `python ./tmp/tmux_wait.py my_ajet_session_name xxx` instead.

Do not use: `sleep 60 && tmux capture-pane -t my_ajet_session_name -p | tail -80`

Use instead: `python ./tmp/tmux_wait.py my_ajet_session_name 30 && tmux capture-pane -t my_ajet_session_name -p | tail -80`

- 60 seconds is too long
- Always use `python ./tmp/tmux_wait.py` to wait

## Examples

### Examples without SSH

0 examples available

### Examples with SSH

1 example available

```agent
--- example user instruction begin ---
login
ssh root@localhost -p 8824
goto
/mnt/data_cpfs/qingxu.fu/agentjet/good-luck-agentjet
run with venv and monitor
source .venv/bin/activate && python -m ajet.launcher --conf tests/bench/benchmark_math/benchmark_math.yaml --autokill
--- example user instruction end ---

--- example agent reaction begin ---
The user wants me to run a training job and monitor it in a tmux session. Let me break down what they asked for:
1. SSH into the remote machine (root@localhost -p 8824)
2. Go to the specified directory
3. Activate the virtual environment and run the Python command
4. Monitor the tmux session

# Write ./tmp/tmux_wait.py
....

# Create the tmux session for training (note: the session name must always contain the keyword `ajet`)
$ tmux new-session -d -s ajet_session -c "/mnt/data_cpfs/qingxu.fu/agentjet/good-luck-agentjet"
# Note: never append a command after `tmux new-session`, otherwise the session can easily exit
...

# Send the SSH command to the tmux session
$ tmux send-keys -t ajet_session "ssh root@localhost -p 8824" Enter
...

# Wait and check the SSH connection
$ python3 ./tmp/tmux_wait.py ajet_session 10 && tmux capture-pane -t ajet_session -p | tail -20

The SSH connection succeeded. Now I need to navigate to the directory and run the training command. Let me send the commands to the tmux session.
...

# Navigate to the working directory
$ tmux send-keys -t ajet_session "cd /mnt/data_cpfs/qingxu.fu/agentjet/good-luck-agentjet" Enter
...

# Start training with the venv and the launcher
$ tmux send-keys -t ajet_session "source .venv/bin/activate && python -m ajet.launcher --conf tests/bench/benchmark_math/benchmark_math.yaml --autokill" Enter
...

# Capture the initial training output
$ tmux capture-pane -t ajet_session -p | tail -40
...

# Wait 30 seconds and check training progress
$ python3 ./tmp/tmux_wait.py ajet_session 30 && tmux capture-pane -t ajet_session -p | tail -50
...

Training is running. The command is executing correctly. Let me keep monitoring at a 1-minute interval.

# Wait 1 minute and check training progress
$ python3 ./tmp/tmux_wait.py ajet_session 60 && tmux capture-pane -t ajet_session -p | tail -50
...

# Wait 2 minutes and check training progress
$ python3 ./tmp/tmux_wait.py ajet_session 120 && tmux capture-pane -t ajet_session -p | tail -60
...

Training is progressing well. The checkpoint has loaded, the vLLM server has started, and swanlab is tracking the run. There are no errors so far, only warnings. Let me keep monitoring.

# Wait 4 minutes and check training progress
$ python3 ./tmp/tmux_wait.py ajet_session 240 && tmux capture-pane -t ajet_session -p | tail -60
...

... more monitoring rounds ...
--- example agent reaction end ---

# Destroy the tmux session
tmux kill-session -t ajet_session
```
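The monitoring rounds in the example follow the backoff schedule named in the skill description (30 seconds, then doubling up to 16 minutes). Below is a minimal sketch of that loop in Python, assuming the wait script lives at `./tmp/tmux_wait.py` and signals "still running" with exit code 0 as described above. The helper names `backoff_intervals`, `wait_tmux`, `capture_tail`, and `monitor` are hypothetical, and repeating the 16-minute cap after the sixth round is an assumption, since the skill does not say what happens beyond it.

```python
import subprocess
from typing import Callable, Iterator


def backoff_intervals(start: float = 30.0, factor: float = 2.0,
                      cap: float = 960.0) -> Iterator[float]:
    """Yield 30, 60, 120, ... seconds, then keep yielding the cap (assumed)."""
    delay = start
    while True:
        yield min(delay, cap)
        delay = min(delay * factor, cap)


def wait_tmux(session: str, seconds: float) -> bool:
    """Run the wait script; True means the command is still running (exit code 0)."""
    result = subprocess.run(["python", "./tmp/tmux_wait.py", session, str(seconds)])
    return result.returncode == 0


def capture_tail(session: str, lines: int = 80) -> str:
    """Return the last `lines` lines currently visible in the tmux pane."""
    result = subprocess.run(["tmux", "capture-pane", "-t", session, "-p"],
                            capture_output=True, text=True)
    return "\n".join(result.stdout.splitlines()[-lines:])


def monitor(session: str,
            wait: Callable[[str, float], bool] = wait_tmux,
            capture: Callable[[str], str] = capture_tail,
            max_rounds: int = 6) -> list[str]:
    """Collect one pane snapshot per round; stop once the command has ended."""
    snapshots = []
    for _, delay in zip(range(max_rounds), backoff_intervals()):
        still_running = wait(session, delay)
        snapshots.append(capture(session))
        if not still_running:
            break
    return snapshots
```

Unlike the `&&` one-liners, this sketch captures the pane even when the wait returns early, so the final output of a crashed or finished command is not lost. Passing `wait` and `capture` as parameters also lets the loop be exercised without a live tmux session.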

auto-research-blueprint-execute-swarm

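Your task

Experiment blueprint:

YAML configuration content hints:

Run the experiment with tmux

Do not lightly abort a running experiment

Skill for monitoring the experiment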
