debug

Stars: 1,129

Forks: 133

Last updated: 19 May 2026 at 23:30

Debug code bugs or Iris/Zephyr/TPU infrastructure faults with a structured debug log.

Source

marin-community

marin-community/marin

SKILL.md

name: debug
description: Debug code bugs or Iris/Zephyr/TPU infrastructure faults with a structured debug log.

Skill: Debug

Systematic debugging for code-level bugs and Marin infrastructure faults. For infrastructure symptoms, route to the right OPS.md section first; for code bugs, keep a structured debug log.

Infrastructure faults

Read `lib/iris/AGENTS.md` or `lib/zephyr/AGENTS.md` for context, then follow the matching `OPS.md` section:

| Symptom | Read |
| --- | --- |
| Stuck job, scheduling failure, resource leak, controller stalled | `lib/iris/OPS.md` → SQL Queries, Process Inspection & Profiling, Known Bugs, Troubleshooting |
| Iris task misbehaving, container inspection, profiling a running task | `lib/iris/OPS.md` → Task Operations, Process Inspection & Profiling |
| Zephyr pipeline slow / stragglers / data skew / worker failures | `lib/zephyr/OPS.md` → Diagnostic Patterns, Observability |
| TPU bad node (`No accelerator found`, `FAILED_PRECONDITION`, `Device or resource busy`) | `lib/iris/OPS.md` → TPU Bad-Node Recovery |
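The TPU bad-node row is the only one keyed on literal error strings, so it can be checked mechanically. The following is a minimal sketch, assuming you already have the task's log output as a string; the function names `tpu_bad_node_signatures` and `suggest_reading` are hypothetical helpers for illustration and are not part of Iris or Marin.

```python
from typing import List, Optional

# Error strings copied from the symptom table above.
TPU_BAD_NODE_SIGNATURES = (
    "No accelerator found",
    "FAILED_PRECONDITION",
    "Device or resource busy",
)

# Where the table says to read when one of those strings shows up.
OPS_DOC = "lib/iris/OPS.md"
OPS_SECTION = "TPU Bad-Node Recovery"


def tpu_bad_node_signatures(log_text: str) -> List[str]:
    """Return the bad-node signatures found in log_text, in table order."""
    return [sig for sig in TPU_BAD_NODE_SIGNATURES if sig in log_text]


def suggest_reading(log_text: str) -> Optional[str]:
    """Hypothetical helper: name the OPS.md section to read, or None if no match."""
    hits = tpu_bad_node_signatures(log_text)
    if not hits:
        return None
    return f"{OPS_DOC} -> {OPS_SECTION} (matched: {', '.join(hits)})"
```

A plain substring match is deliberately crude: it only tells you which section to open, and the recovery steps themselves stay in `OPS.md`.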

Operational guardrails (never modify the controller DB, prefer `iris process profile` over SSH, never run a full iris cluster restart without approval) live next to the relevant commands in `OPS.md`; read those sections. After a TPU recovery or Zephyr fix, return to the active babysit loop (`babysit-job` or `babysit-zephyr`).

Code bugs

For code-level bugs that are not infrastructure faults, maintain a debug log at `docs/debug-log-<task-name>.md`:

# Debugging log for <task>

<goal>

## Initial status
<initial status, as reported or observed>

## <Hypothesis N>
The suspected source of the bug, or a change needed to isolate it.

## Changes to make
Which files you are altering and how.

## Results
Test results and any new hypotheses. Repeat the Hypothesis/Results cycle as needed.

## Future work
- [ ] Cleanups observed along the way
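
The template above can be scaffolded with a few lines of Python. This is a sketch under stated assumptions, not tooling that ships with the repository: `create_debug_log` and its parameters are hypothetical, and the `root` argument is assumed to point at the repository checkout. The first hypothesis heading is pre-numbered as `Hypothesis 1`.

```python
from pathlib import Path

# Mirrors the template above; the {task}, {goal} and {status} slots are filled in.
TEMPLATE = """\
# Debugging log for {task}

{goal}

## Initial status
{status}

## Hypothesis 1
The suspected source of the bug, or a change needed to isolate it.

## Changes to make
Which files you are altering and how.

## Results
Test results and any new hypotheses. Repeat the Hypothesis/Results cycle as needed.

## Future work
- [ ] Cleanups observed along the way
"""


def create_debug_log(task_name: str, goal: str, initial_status: str, root=".") -> Path:
    """Create docs/debug-log-<task-name>.md under root unless it already exists."""
    path = Path(root) / "docs" / f"debug-log-{task_name}.md"
    path.parent.mkdir(parents=True, exist_ok=True)
    if path.exists():
        # Never overwrite an existing log: it may already hold hypotheses and results.
        return path
    path.write_text(
        TEMPLATE.format(task=task_name, goal=goal, status=initial_status),
        encoding="utf-8",
    )
    return path
```

For example, `create_debug_log("flaky-loader", "Find why the loader test fails intermittently", "Test fails roughly one run in five")` would write `docs/debug-log-flaky-loader.md` relative to the current directory; the task name and descriptions here are made up for illustration.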
