
autoresearch-debug

Name: Autoresearch Debug
Author: wjgoarxiv

Scientific bug hunting using falsifiable hypotheses. Forms hypotheses, designs falsifying tests, eliminates candidates systematically, and logs the full investigation trail in a structured debug/ folder. TRIGGER when: user has a bug to investigate scientifically; user wants systematic root-cause analysis; user says "debug", "investigate", "root cause", "why is this failing"; user invokes /autoresearch:debug. DO NOT TRIGGER when: user wants to optimize a metric (use /autoresearch); user wants to fix a known error automatically (use /autoresearch:fix); user just wants a quick one-line answer about what a function does.


stars:16

forks:2

updated: April 5, 2026, 06:56


SKILL.md


name	autoresearch:debug
description	Scientific bug hunting using falsifiable hypotheses. Forms hypotheses, designs falsifying tests, eliminates candidates systematically, and logs the full investigation trail in a structured debug/ folder. TRIGGER when: user has a bug to investigate scientifically; user wants systematic root-cause analysis; user says "debug", "investigate", "root cause", "why is this failing"; user invokes /autoresearch:debug. DO NOT TRIGGER when: user wants to optimize a metric (use /autoresearch); user wants to fix a known error automatically (use /autoresearch:fix); user just wants a quick one-line answer about what a function does.
allowed-tools	["Read","Write","Edit","Bash","WebFetch","WebSearch"]

autoresearch:debug — Scientific Bug Investigation

Root-cause analysis using the scientific method: form falsifiable hypotheses, design tests that could disprove them, eliminate candidates, and converge on confirmed root causes. Every step is logged — nothing is assumed, nothing is skipped.

Core Principle

A hypothesis is only useful if it can be falsified. "Something is wrong" is not a hypothesis. "The cache is returning stale data because the TTL is not being reset on write" is a hypothesis — it can be tested and disproved.

Output Structure

Create debug/ in the working directory (or alongside the failing artifact):

File	Purpose
`debug/hypotheses.md`	Active candidates under investigation
`debug/eliminated.md`	Ruled-out hypotheses with proof of elimination
`debug/findings.md`	Confirmed root causes with reproduction case

Initialize all three files before the first iteration.
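A minimal Python sketch of the initialization step. It assumes the three logs are plain Markdown files and that an existing debug/ folder should be reused rather than overwritten; the function name `init_debug_dir` and the one-line headers are illustrative, not part of the skill.

```python
from pathlib import Path

# Illustrative headers; the skill only prescribes the three file names.
DEBUG_FILES = {
    "hypotheses.md": "# Active hypotheses\n",
    "eliminated.md": "# Eliminated hypotheses\n",
    "findings.md": "# Findings\n",
}


def init_debug_dir(base: Path) -> Path:
    """Create base/debug/ and the three log files, leaving existing logs untouched."""
    debug_dir = base / "debug"
    debug_dir.mkdir(parents=True, exist_ok=True)
    for name, header in DEBUG_FILES.items():
        path = debug_dir / name
        if not path.exists():  # never clobber an investigation already in progress
            path.write_text(header, encoding="utf-8")
    return debug_dir
```

Calling `init_debug_dir(Path.cwd())` would create the folder in the working directory; passing the failing artifact's parent directory places it alongside the artifact instead.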

Investigation Loop

Repeat until stop condition:

[Observe] --> [Hypothesize] --> [Design Test] --> [Run Test] --> [Update] --> [Log]
     ^                                                                          |
     |__________________________________________________________________________|

Stage 1 — Observe

Collect all available evidence before forming any hypothesis:

Read error messages, stack traces, logs — verbatim, not paraphrased
Identify: What is the symptom? When does it appear? When does it NOT appear?
Identify: What changed recently? (git log, config changes, dependency updates)
Identify: Is the bug deterministic or intermittent?
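The deterministic-or-intermittent question can often be answered mechanically by rerunning the failing command. This sketch assumes the bug surfaces as a non-zero exit code and that ten runs is enough to expose intermittency; `classify_determinism` and both defaults are hypothetical choices.

```python
import subprocess


def classify_determinism(cmd: list[str], runs: int = 10, timeout_s: float = 60.0) -> str:
    """Rerun the failing command and classify the bug by how often it fails."""
    failures = 0
    for _ in range(runs):
        result = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout_s)
        if result.returncode != 0:  # assumption: failure == non-zero exit code
            failures += 1
    if failures == runs:
        return "deterministic"
    if failures == 0:
        return "not reproduced"
    return f"intermittent ({failures}/{runs} runs failed)"
```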

Write a symptom summary at the top of debug/findings.md:

## Symptom
[Exact error message or behavior]

## Observed conditions
- Occurs: [when/where]
- Does NOT occur: [contrasting case if known]
- First observed: [commit/date/event]

Stage 2 — Hypothesize

Form at least 2 candidate hypotheses before testing any of them. More candidates = less confirmation bias.

Hypothesis format:

H-N: [X] causes [Y] because [Z].
  Test: [specific action that would disprove this if false]
  Confidence: low | medium | high

Example:

H-1: The database connection pool is exhausted because max_connections is set too low.
  Test: Print active connection count during the failure window. If count < max_connections, this hypothesis is false.
  Confidence: medium

H-2: The timeout is triggered by a slow DNS lookup, not the actual request.
  Test: Replace the hostname with its IP address in the connection string. If the bug persists, H-2 is false.
  Confidence: low

Write all active hypotheses to debug/hypotheses.md.

Prioritization rule: Test high-confidence, low-cost hypotheses first. A cheap test that eliminates a hypothesis is more valuable than an expensive test that confirms one.
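One way to make the hypothesis format and the prioritization rule concrete. The `cost_minutes` field is a hypothetical estimate of how long the test takes (the skill does not define one), and sorting by cost first, then by confidence, is an assumed ordering that leans toward the cheap-elimination preference stated above.

```python
from dataclasses import dataclass

CONFIDENCE_RANK = {"low": 0, "medium": 1, "high": 2}


@dataclass
class Hypothesis:
    hid: int             # the N in H-N
    claim: str           # "[X] causes [Y] because [Z]."
    test: str            # action that would disprove the claim if it is false
    confidence: str      # "low" | "medium" | "high"
    cost_minutes: float  # hypothetical: estimated cost of running the test

    def render(self) -> str:
        """Render in the H-N format used by debug/hypotheses.md."""
        return (f"H-{self.hid}: {self.claim}\n"
                f"  Test: {self.test}\n"
                f"  Confidence: {self.confidence}")


def prioritize(hypotheses: list[Hypothesis]) -> list[Hypothesis]:
    """Cheapest test first; among equal costs, highest confidence first."""
    return sorted(hypotheses, key=lambda h: (h.cost_minutes, -CONFIDENCE_RANK[h.confidence]))
```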

Stage 3 — Design Falsifying Test

For each hypothesis under test, design the minimal experiment that could disprove it.

Ask: "If this hypothesis is FALSE, what would I observe?"

Design the test to produce that observable. If the test does NOT produce the falsifying observation, the hypothesis survives (not confirmed — survives).

Good test design:

Changes exactly one variable
Has a clear pass/fail criterion defined BEFORE running
Can be run in under 5 minutes
Leaves the system in the state it started in (reversible)
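A sketch of a test specification that encodes these properties, assuming the single variable under test is an environment variable. The pass/fail predicate is a required field, so it must exist before the test runs, and the experiment's environment is built as a copy so the caller's state is never mutated. `FalsifyingTest` and its fields are illustrative names.

```python
from dataclasses import dataclass
from typing import Callable, Mapping


@dataclass(frozen=True)
class FalsifyingTest:
    hypothesis_id: int
    command: tuple[str, ...]
    variable: str  # the single environment variable this test changes
    value: str     # the value it is set to for the experiment
    # Pass/fail criterion fixed before running: (exit_code, stdout, stderr) -> falsified?
    falsified_if: Callable[[int, str, str], bool]
    timeout_s: float = 300.0  # "can be run in under 5 minutes"

    def environment(self, base: Mapping[str, str]) -> dict[str, str]:
        """Return a copy of base with only self.variable set to self.value.

        base itself is never modified, so nothing needs undoing afterwards.
        """
        env = dict(base)
        env[self.variable] = self.value
        return env
```

For H-2, a hypothetical spec could set a `DB_HOST` variable to an IP address and declare `lambda code, out, err: code != 0` as the falsifying observation (the bug persists).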

Stage 4 — Run Test

Execute the test. Record:

Exact command(s) run
Exact output (stdout, stderr, exit code)
Whether the falsifying observation was produced

Do not interpret yet. Record what happened, literally.
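A minimal recorder for this stage, assuming the test is a command list run through Python's `subprocess.run` without a shell. It captures stdout, stderr, and the exit code verbatim and evaluates only the criterion fixed during test design; `run_test` and `RunRecord` are hypothetical names.

```python
import subprocess
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass
class RunRecord:
    command: list[str]  # exact command that was run
    exit_code: int
    stdout: str         # captured verbatim
    stderr: str         # captured verbatim
    falsified: bool     # did the pre-declared falsifying observation occur?


def run_test(command: Sequence[str],
             falsified_if: Callable[[int, str, str], bool],
             env: Optional[dict[str, str]] = None,
             timeout_s: float = 300.0) -> RunRecord:
    """Run one falsifying test and record what happened, without further interpretation."""
    proc = subprocess.run(list(command), capture_output=True, text=True,
                          env=env, timeout=timeout_s)
    return RunRecord(
        command=list(command),
        exit_code=proc.returncode,
        stdout=proc.stdout,
        stderr=proc.stderr,
        falsified=falsified_if(proc.returncode, proc.stdout, proc.stderr),
    )
```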

Stage 5 — Update Hypothesis Set

If the test produced the falsifying observation:

The hypothesis is eliminated
Move it from debug/hypotheses.md to debug/eliminated.md
Record why: "H-1 eliminated: connection count was 3/100 during failure — pool exhaustion is not the cause"

If the test did NOT produce the falsifying observation:

The hypothesis survives
Increase its confidence level
Narrow its scope: "H-2 now more specifically: slow DNS on IPv6 lookups only"

If the test produced unexpected output:

Add a new hypothesis to explain the unexpected result
Do not discard it — unexpected observations are often the path to the root cause
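The three update rules can be sketched as a small state transition. This assumes confidence rises one level per surviving test and that a hypothesis added for an unexpected observation starts at `low`; narrowing a survivor's scope is left as a manual rewrite of its claim. All names are illustrative.

```python
from dataclasses import dataclass, field
from typing import Optional

LEVELS = ["low", "medium", "high"]


@dataclass
class Candidate:
    claim: str
    confidence: str  # "low" | "medium" | "high"


@dataclass
class HypothesisSet:
    active: dict[str, Candidate] = field(default_factory=dict)
    eliminated: dict[str, str] = field(default_factory=dict)  # id -> proof of elimination
    next_id: int = 1

    def add(self, claim: str, confidence: str = "low") -> str:
        hid = f"H-{self.next_id}"
        self.next_id += 1
        self.active[hid] = Candidate(claim, confidence)
        return hid


def update(hs: HypothesisSet, hid: str, falsified: bool,
           proof: str = "", unexpected: Optional[str] = None) -> None:
    """Apply the Stage 5 rules to the result of one falsifying test."""
    if falsified:
        # Eliminated: move it out of the active set, keeping the proof.
        del hs.active[hid]
        hs.eliminated[hid] = proof
    else:
        # Survived (not confirmed): raise confidence one level, capped at "high".
        cand = hs.active[hid]
        cand.confidence = LEVELS[min(LEVELS.index(cand.confidence) + 1, len(LEVELS) - 1)]
    if unexpected is not None:
        # Unexpected output becomes a new candidate rather than being discarded.
        hs.add(f"Explains unexpected observation: {unexpected}")
```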

Stage 6 — Log

After each iteration, update all three files:

debug/hypotheses.md — current active candidates (sorted by confidence, high first)
debug/eliminated.md — append the eliminated hypothesis with proof
debug/findings.md — append iteration summary

Then immediately begin Stage 1 of the next iteration.
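A sketch of the log step, assuming the three files live in one directory and that each iteration summary gets its own `## Iteration N` heading in findings.md (that heading text is an assumption, not something the skill prescribes). hypotheses.md is rewritten in full, high confidence first, while the other two files are only appended to.

```python
from pathlib import Path

RANK = {"high": 0, "medium": 1, "low": 2}


def log_iteration(debug_dir: Path, iteration: int,
                  active: dict[str, tuple[str, str]],
                  newly_eliminated: dict[str, str],
                  summary: str) -> None:
    """Update all three logs for one iteration.

    active maps id -> (claim, confidence); newly_eliminated maps id -> proof.
    """
    ordered = sorted(active.items(), key=lambda item: RANK[item[1][1]])
    (debug_dir / "hypotheses.md").write_text(
        "".join(f"{hid}: {claim}\n  Confidence: {conf}\n" for hid, (claim, conf) in ordered),
        encoding="utf-8",
    )
    with (debug_dir / "eliminated.md").open("a", encoding="utf-8") as f:
        for hid, proof in newly_eliminated.items():
            f.write(f"{hid} eliminated: {proof}\n")
    with (debug_dir / "findings.md").open("a", encoding="utf-8") as f:
        f.write(f"\n## Iteration {iteration}\n{summary}\n")
```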

Stop Conditions

Success: Root cause confirmed — stop when ALL of these are true:

At least one hypothesis has been confirmed (not just survived — confirmed by a positive test)
A minimal reproduction case exists: the smallest code/config/command that triggers the bug
The fix has been identified (even if not yet implemented)

Write confirmed root cause to debug/findings.md:

## Root Cause (Confirmed)
[Hypothesis text]

## Evidence
[Test that confirmed it]

## Reproduction case
[Minimal steps/code to reproduce]

## Proposed fix
[What needs to change]

## Next step
Run `/autoresearch:fix` to implement and verify the fix iteratively.

Budget exhausted: If max_iterations is reached without confirmation, write a partial findings report with the strongest surviving hypothesis and the evidence collected so far.
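Both valid stops can be checked in one function. This sketch assumes the three success criteria are tracked as optional strings and uses the default budget of 15 from Initial Setup; `InvestigationState` and `stop_reason` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class InvestigationState:
    confirmed_hypothesis: Optional[str] = None  # set only after a positive confirming test
    reproduction_case: Optional[str] = None     # smallest code/config/command that triggers the bug
    proposed_fix: Optional[str] = None          # identified, not necessarily implemented


def stop_reason(state: InvestigationState, iteration: int,
                max_iterations: int = 15) -> Optional[str]:
    """Return why the loop should stop after `iteration` completed iterations, or None."""
    if state.confirmed_hypothesis and state.reproduction_case and state.proposed_fix:
        return "root cause confirmed"
    if iteration >= max_iterations:
        return "budget exhausted"  # write a partial findings report
    return None
```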

Stuck Protocol

If more than 3 iterations pass without eliminating any hypothesis:

Do not continue making minor variations. Switch observation technique entirely.

Available technique changes (see investigation-techniques.md for details):

Currently using log analysis → switch to bisect (binary search over commits)
Currently using bisect → switch to minimal reproduction (shrink the failing case)
Currently using code reading → switch to instrumentation (add logging/metrics)
Currently using instrumentation → switch to differential diagnosis (compare working vs broken)
Currently using one environment → switch to strace/dtrace (system-call level)
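A sketch of the stuck check and the switch table above. It reads "more than 3 iterations" as more than three consecutive iterations with zero eliminations, which is one interpretation of the rule; the function names are illustrative.

```python
# Technique switches, transcribed from the list above.
SWITCHES = {
    "log analysis": "bisect",
    "bisect": "minimal reproduction",
    "code reading": "instrumentation",
    "instrumentation": "differential diagnosis",
    "one environment": "strace/dtrace",
}


def is_stuck(eliminations_per_iteration: list[int], threshold: int = 3) -> bool:
    """True when more than `threshold` most recent iterations in a row eliminated nothing."""
    streak = 0
    for count in reversed(eliminations_per_iteration):
        if count:
            break
        streak += 1
    return streak > threshold


def next_technique(current: str) -> str:
    return SWITCHES[current]
```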

Log the technique switch:

## TECHNIQUE SWITCH — Iteration N
- Previous technique: [name]
- Reason: 3 iterations, 0 hypotheses eliminated
- New technique: [name]
- Rationale: [why this technique is more likely to produce new evidence]

Autonomy Directive

Once the investigation begins:

Do not stop to ask permission. Form hypotheses and run tests autonomously.
Do not summarize and wait. After logging an iteration, begin the next one.
Do not declare "root cause" without positive confirmation. Surviving a test is not confirmation.
Do not skip the log step. Every iteration must update all three files.

The only valid stops are: root cause confirmed, or budget exhausted.

Initial Setup

When /autoresearch:debug is invoked:

Ask: "What is the bug? Paste the error message or describe the failing behavior."
Ask: "What is the last known good state? (last commit that worked, last config that worked)"
Ask: "How many investigation iterations should I run before stopping? (default: 15)"
Create debug/ directory and initialize the three files.
Begin Stage 1 — Observe.

Do not ask more questions after setup. The investigation is autonomous from Stage 1 onward.
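A small sketch of turning the third setup answer into an iteration budget, assuming an empty answer means "use the default of 15" and anything else must be a positive integer; `parse_budget` is a hypothetical helper.

```python
def parse_budget(answer: str, default: int = 15) -> int:
    """Convert the reply to 'How many investigation iterations...?' into a budget."""
    text = answer.strip()
    if not text:
        return default  # no answer given: fall back to the documented default
    budget = int(text)  # raises ValueError on non-numeric input
    if budget < 1:
        raise ValueError("iteration budget must be at least 1")
    return budget
```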

related-skills.json

Same repository

autoresearch-skill.md

from "wjgoarxiv/autoresearch-skill"

Autonomous research and experimentation toolkit with 10 commands. Core loop inspired by Karpathy's autoresearch — generalizes to any domain with mechanical evaluation, overnight persistence, and zero dependencies. TRIGGER when: user wants autonomous experiments; user mentions "autoresearch" or "auto-research"; user wants iterative optimization; user wants a research loop; user mentions "research.md"; user wants to iterate until some condition; user wants to optimize code, prompts, configs, or parameters iteratively; user invokes any /autoresearch:* subcommand. DO NOT TRIGGER when: user wants a one-shot answer; user wants manual step-by-step guidance; user just wants to read a single paper; user wants a simple web search.

2026-04-05 · stars: 16

pdf.md

from "wjgoarxiv/autoresearch-skill"

Comprehensive PDF manipulation toolkit for extracting text and tables, creating new PDFs, merging/splitting documents, and handling forms. Use it when the LLM (Claude, ChatGPT, Gemini, or others) needs to fill in a PDF form or to programmatically process, generate, or analyze PDF documents at scale.

2026-04-05 · stars: 16

autoresearch.md

from "wjgoarxiv/autoresearch-skill"

Core autonomous research loop. Reads research.md, proposes hypotheses, runs experiments, evaluates results mechanically, keeps improvements, discards failures, and iterates until the target metric is achieved or the iteration budget is exhausted. TRIGGER when: user invokes "autoresearch" (no subcommand); research.md exists; user wants the 5-stage loop; user wants iterative optimization overnight.

2026-04-05 · stars: 16

autoresearch-fix.md

from "wjgoarxiv/autoresearch-skill"

Iterative error-crusher loop that auto-stops at 0 errors. Cascade-aware: fixes dependency errors before their dependents. Refuses anti-patterns that hide errors instead of fixing them. TRIGGER when: user has errors or failures to fix iteratively; user asks to "fix all errors"; user has a failing test suite; user has compilation errors; user has linter errors; user wants systematic error elimination; user invokes /autoresearch:fix. DO NOT TRIGGER when: user wants a one-shot fix for a single obvious bug; user wants debugging guidance only; user wants code review without fixing.

2026-04-05 · stars: 16

autoresearch-plan.md

from "wjgoarxiv/autoresearch-skill"

7-step setup wizard that produces a complete, ready-to-run research.md without executing the research loop. Walks the user through goal, metric, search space, constraints, evaluator design, and baseline measurement, then writes the file. TRIGGER when: user wants to set up a research project; user wants to plan before running the loop; user says "plan my research"; user has a goal but no research.md; user invokes /autoresearch:plan. DO NOT TRIGGER when: research.md already exists and the user wants to run the loop; user wants a one-shot answer; user wants to debug, not optimize.

2026-04-05 · stars: 16

autoresearch-predict.md

from "wjgoarxiv/autoresearch-skill"

Multi-perspective deliberation engine. Gathers independent positions from diverse personas, runs cross-examination and rebuttal rounds, detects herd behavior, and synthesizes a neutral judge verdict with confidence levels. TRIGGER when: user wants multi-perspective prediction, forecasting, scenario analysis, decision analysis, "what will happen if", "should we", "predict the outcome of", structured devil's advocacy, or any question benefiting from adversarial deliberation.

2026-04-05 · stars: 16

package.json

"author": "wjgoarxiv"

"repository": "wjgoarxiv/autoresearch-skill"


$ useful --forSOC

Software Quality Assurance Analysts and Testers · Computer and Mathematical Occupations · 15-1253 · L4
