debug

Stars: 1,129

Forks: 133

Updated: May 19, 2026, 23:30

Debug code bugs or Iris/Zephyr/TPU infrastructure faults with a structured debug log.

Source

marin-community

marin-community/marin

Skill: Debug

Systematic debugging for code-level bugs and Marin infrastructure faults. For infrastructure symptoms, route to the right OPS.md section first; for code bugs, keep a structured debug log.

Infrastructure faults

Read lib/iris/AGENTS.md or lib/zephyr/AGENTS.md for context, then follow the matching OPS.md section:

| Symptom | Read |
|---|---|
| Stuck job, scheduling failure, resource leak, controller stalled | `lib/iris/OPS.md` → SQL Queries, Process Inspection & Profiling, Known Bugs, Troubleshooting |
| Iris task misbehaving, container inspection, profiling a running task | `lib/iris/OPS.md` → Task Operations, Process Inspection & Profiling |
| Zephyr pipeline slow / stragglers / data skew / worker failures | `lib/zephyr/OPS.md` → Diagnostic Patterns, Observability |
| TPU bad node (`No accelerator found`, `FAILED_PRECONDITION`, `Device or resource busy`) | `lib/iris/OPS.md` → TPU Bad-Node Recovery |
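
The routing in the table above can be sketched as a small keyword lookup. This is only an illustration: the `ROUTES` list, the keyword choices, and the `route` helper are hypothetical and not part of the Marin repository; only the file paths, section names, and TPU error strings come from the table.

```python
# Hypothetical sketch: map a free-text symptom to the OPS.md sections
# listed in the table. Keywords are illustrative, not an official list.
ROUTES = [
    # TPU bad-node error strings are checked first because they are the most specific.
    (("no accelerator found", "failed_precondition", "device or resource busy"),
     "lib/iris/OPS.md", ["TPU Bad-Node Recovery"]),
    (("zephyr", "straggler", "data skew", "worker failure"),
     "lib/zephyr/OPS.md", ["Diagnostic Patterns", "Observability"]),
    (("task misbehaving", "container", "profiling"),
     "lib/iris/OPS.md", ["Task Operations", "Process Inspection & Profiling"]),
    (("stuck job", "scheduling failure", "resource leak", "controller stalled"),
     "lib/iris/OPS.md",
     ["SQL Queries", "Process Inspection & Profiling", "Known Bugs", "Troubleshooting"]),
]


def route(symptom: str):
    """Return (ops_file, sections) for the first matching row, or None."""
    text = symptom.lower()
    for keywords, ops_file, sections in ROUTES:
        if any(keyword in text for keyword in keywords):
            return ops_file, sections
    return None  # No match: treat it as a code bug and keep a debug log.


if __name__ == "__main__":
    print(route("worker log: No accelerator found"))
    print(route("Zephyr pipeline has stragglers"))
```

Returning `None` for unmatched symptoms mirrors the split in this skill: anything that is not a recognized infrastructure symptom falls through to the code-bug workflow.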

Operational guardrails (never modify the controller DB, prefer `iris process profile` over SSH, never run a full iris cluster restart without approval) live next to the relevant commands in `OPS.md`; read those sections. After a TPU recovery or zephyr fix, return to the active babysit loop (`babysit-job` or `babysit-zephyr`).

Code bugs

For code-level bugs that are not infrastructure faults, maintain a debug log at docs/debug-log-<task-name>.md:

# Debugging log for <task>

<goal>

## Initial status
<initial status, as reported or observed>

## <Hypothesis N>
The suspected source of the bug, or a change needed to isolate it.

## Changes to make
Which files you are altering and how.

## Results
Test results and any new hypotheses. Repeat the Hypothesis/Results cycle as needed.

## Future work
- [ ] Cleanups observed along the way
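
As a convenience, the template above could be scaffolded with a short script. This is a minimal sketch under stated assumptions: the `start_debug_log` helper and its parameters are hypothetical, and only the `docs/debug-log-<task-name>.md` path and the section headings come from the skill text.

```python
from pathlib import Path

# Section headings follow the template in this skill; the angle-bracket
# placeholders are left for the person debugging to fill in.
TEMPLATE = """\
# Debugging log for {task}

{goal}

## Initial status
{status}

## Hypothesis 1
<suspected source of the bug, or a change needed to isolate it>

## Changes to make
<which files you are altering and how>

## Results
<test results and any new hypotheses>

## Future work
- [ ] Cleanups observed along the way
"""


def start_debug_log(task_name: str, goal: str, status: str, root=".") -> Path:
    """Create docs/debug-log-<task_name>.md under root if it does not exist yet."""
    path = Path(root) / "docs" / f"debug-log-{task_name}.md"
    path.parent.mkdir(parents=True, exist_ok=True)
    if not path.exists():  # Never overwrite an existing log.
        text = TEMPLATE.format(task=task_name, goal=goal, status=status)
        path.write_text(text, encoding="utf-8")
    return path


if __name__ == "__main__":
    log = start_debug_log(
        "example-task",  # hypothetical task name
        "Describe the goal here.",
        "Describe the initial status here.",
    )
    print(f"Debug log at {log}")
```

The helper refuses to overwrite an existing file, so rerunning it mid-investigation keeps earlier hypotheses and results; later Hypothesis/Results cycles are appended by hand.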
