
wagf-experiment-designer

Name: Wagf Experiment Designer
Author: WenyuChiou

Turn a WAGF research question into a reproducible experiment matrix (model × governance × seed × metric × artefact path). Use when the user says "design an experiment", "plan an ablation", "compare strict vs disabled", "set up cross-model evaluation", or wants a runnable matrix written to .research/.


Stars: 0

Forks: 0

Updated: April 26, 2026, 01:14

File explorer

5 files

SKILL.md

readonly

related-skills.json

Same repository

llm-agent-audit-trace-analyzer.md

from "WenyuChiou/WAGF"

Turn raw WAGF audit traces (household_governance_audit.csv + raw/*.jsonl) into paper-ready governance metrics — IBR, EHE, rejection taxonomy, retry outcomes, model-condition comparisons. Use when the user says "analyze these traces", "compute governance metrics", "summarize rejection and retry outcomes", or hands over a results directory and asks "what does this say".

2026-05-26

wagf-domain-builder.md

from "WenyuChiou/WAGF"

Walk a researcher (PhD, collaborator, lab-mate) through building their first single-agent WAGF domain — from "I have a research question + maybe an external model" to "I have a working WAGF experiment producing audit traces." Conducts a structured S0-S7 interview, invokes `broker.tools.scaffold_domain` at S4, guides 4 surgical edits in S5, and runs `broker.tools.validate_prompt` after every change. Hands off to `wagf-coupling-designer` for any coupling work and to `wagf-experiment-designer` / `abm-reproducibility-checker` once the domain runs green. Use when the user says "I want to build a WAGF model for <my domain>", "help me set up a new domain", "I'm new to WAGF and have a research question", or "scaffold a domain from scratch".

2026-05-26

model-coupling-contract-checker.md

from "WenyuChiou/WAGF"

Verify the contract between WAGF/ABM agents and an external model (flood, hydrology, irrigation, seismic, catastrophe) — units, time steps, state mutation direction, feedback-loop double-counting. Use when the user says "check ABM-model coupling", "audit feedback loop", "verify units between WAGF and X model", or asks to confirm an external-model integration is safe.

2026-05-17

wagf-coupling-designer.md

from "WenyuChiou/WAGF"

Walk a researcher through designing the LLM↔external-model interface — decision flow IN, observation flow OUT — for a single-agent WAGF domain. Emits a coupling contract, a working mock adapter, and a pattern-specific real-model adapter scaffold so the WAGF side can be built and smoke-tested BEFORE the real model is wired in. Use when the user says "I want to couple my LLM agents to <my simulator>", "help me design the WAGF↔X interface", "scaffold the external model adapter", "draft a coupling contract", "I have a Python / R / CSV-based model and want WAGF to drive it". Sister skill to `model-coupling-contract-checker` (which AUDITS existing contracts; this one DESIGNS new ones).

2026-05-17

abm-reproducibility-checker.md

from "WenyuChiou/WAGF"

Verify another researcher can reproduce a WAGF experiment — manifests, seeds, configs, runnable commands, data provenance vs git blame, figure-script outputs match references. Use when the user says "audit reproducibility", "prepare for submission", "check this experiment folder", or any time a results directory needs a pre-publication integrity sweep.

2026-04-26

wagf-quickstart.md

from "WenyuChiou/WAGF"

First-time WAGF setup walkthrough — environment check, smoke test, first experiment, and handoff to the four lifecycle skills. Use when the user says "I just cloned WAGF", "set up WAGF", "first WAGF run", "I'm new to this", "where do I start with WAGF", or opens a Claude Code session in a freshly-cloned WAGF repo without a clear task.

2026-04-26

package.json

"author": "WenyuChiou"

"repository": "WenyuChiou/WAGF"


Useful for (SOC): Data Scientists · Computer and Mathematical Occupations · 15-2051 · L4

name	wagf-experiment-designer
description	Turn a WAGF research question into a reproducible experiment matrix (model × governance × seed × metric × artefact path). Use when the user says "design an experiment", "plan an ablation", "compare strict vs disabled", "set up cross-model evaluation", or wants a runnable matrix written to .research/.

WAGF: Experiment Designer

Convert a research question into a reproducible experiment matrix that the WAGF runner can execute. The skill produces three artefacts:

.research/wagf_experiment_matrix.yml — the (model × governance × seed × …) cross-product to run.
.research/metrics_plan.md — the metric-to-artefact mapping that the analyzer skill will read.
.research/run_plan.md — runnable bat / shell commands derived from the matrix, with idempotent skip-checks.

When to Use

Load this skill when the user says:

"Design an experiment to test whether governance reduces hallucinated actions."
"Compare strict vs relaxed governance across models."
"Plan a cross-model WAGF ablation."
"I want to test [hypothesis] — set up the experiment matrix."
"Lay out runs for [model list] × [seeds]."

Do NOT use this skill for:

Analysing existing audit traces → llm-agent-audit-trace-analyzer.
Reproducibility audit of completed runs → abm-reproducibility-checker.
Generic project planning → project-planner.

Inputs

The skill needs answers to ALL of:

Research question — one-sentence claim or comparison.
Hypothesis — what direction of effect is expected, with sign.
Domain — flood (single_agent), irrigation (irrigation_abm), or multi_agent_flood. If unsure, ask. Do not guess.
Candidate models — exact Ollama tags (e.g., gemma3:4b, gemma4:e4b, ministral-3:8b). Refuse to invent unavailable models; check via ollama list if unsure.
Governance conditions — list from examples/<domain>/agent_types.yaml's governance_profile keys. Common values: strict, disabled, relaxed, plus any ablation variant present in the domain config.
Seed budget — integer count; default 5 (paper convention) but ask for confirmation. Specify which seeds (typically 42–46).
Time horizon — number of simulation years (flood: 10; irrigation: 42; user override OK).
Agent count — flood: 100; irrigation: 78 (CRSS); MA flood: 400.
Metric set — drawn from references/metrics_catalog.md. Refuse to invent metrics not in the catalogue.

If any input is missing, ask. Do not assume.
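The completeness check above can be sketched as follows. This is a minimal illustration and not part of WAGF. The snake_case keys in REQUIRED_INPUTS and the missing_inputs helper are hypothetical names for the nine inputs listed above. An empty string or an empty list counts as unanswered.

```python
from __future__ import annotations

# Hypothetical snake_case names for the nine required inputs (not a WAGF schema).
REQUIRED_INPUTS = (
    "research_question",
    "hypothesis",
    "domain",
    "candidate_models",
    "governance_conditions",
    "seed_budget",
    "time_horizon_years",
    "agent_count",
    "metric_set",
)


def missing_inputs(answers: dict) -> list[str]:
    """Return the required inputs that are absent or empty, in the order listed above."""
    missing = []
    for key in REQUIRED_INPUTS:
        value = answers.get(key)
        # None, "" and empty collections count as unanswered; 0 is a real answer.
        if value is None or (isinstance(value, (str, list, tuple, set, dict)) and not value):
            missing.append(key)
    return missing


# Example: only two inputs answered, so the skill asks about the other seven.
answers = {"research_question": "Does strict governance lower IBR?", "domain": "flood"}
for key in missing_inputs(answers):
    print(f"Please specify: {key}")
```

Each name the helper returns becomes one clarification question, which keeps the rule "ask, do not assume" mechanical.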

Workflow

Clarify: confirm all 9 inputs above; if missing, ask.
Build matrix: cross-product of (model × condition × seed) with shared time horizon and agent count. Each row = one run.
Map metrics → artefacts: each metric in the user's set must point to (a) the canonical analysis script (per references/metrics_catalog.md), and (b) the expected output path (e.g., analysis/<metric>_summary.md).
Write run plan: one bat / shell command per matrix row, each guarded by a skip-check on <output>/simulation_log.csv (if exist in bat, [ -f <output>/simulation_log.csv ] in shell). An interrupted batch then resumes without rerunning finished runs. Use an existing per-domain bat as a template (e.g., examples/irrigation_abm/run_gemma4_e2b_batch.bat).
Write the three artefacts.
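Here is a minimal sketch of the build-matrix step, assuming the inputs have already been confirmed. Run and build_matrix are hypothetical names, not WAGF APIs. The model tags and the flood defaults (10 years, 100 agents) come from the Inputs section.

```python
from __future__ import annotations

from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Run:
    """One matrix row = one simulation run (hypothetical record type)."""
    model: str
    governance: str
    seed: int
    time_horizon_years: int
    agent_count: int


def build_matrix(models, conditions, seeds, time_horizon_years: int, agent_count: int) -> list[Run]:
    """Cross-product of model x condition x seed; horizon and agent count are shared."""
    return [
        Run(model, condition, seed, time_horizon_years, agent_count)
        for model, condition, seed in product(models, conditions, seeds)
    ]


rows = build_matrix(
    ["gemma3:4b", "gemma4:e4b", "ministral-3:8b"],
    ["strict", "disabled"],
    range(42, 47),  # seeds 42-46
    time_horizon_years=10,
    agent_count=100,
)
print(len(rows))  # 30
```

itertools.product varies its last argument fastest, so seed is the innermost loop. All seeds for a given (model, condition) pair therefore sit next to each other in the matrix.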

Outputs (mandatory artefacts)

`.research/wagf_experiment_matrix.yml`

research_question: "<one-sentence>"
hypothesis: "<directional claim>"
domain: flood | irrigation | multi_agent_flood
agent_count: <int>
time_horizon_years: <int>

models:
  - tag: gemma4:e4b
    governance:
      - strict
      - disabled
    seeds: [42, 43, 44, 45, 46]

  - tag: gemma3:4b
    governance:
      - strict
      - disabled
    seeds: [42, 43, 44, 45, 46]

metrics:
  - name: ibr
    canonical_script: examples/single_agent/analysis/gemma4_nw_crossmodel_analysis.py
    formula_ref: references/metrics_catalog.md#ibr
    output_artefact: analysis/governance_metrics.csv
  - name: ehe
    canonical_script: examples/irrigation_abm/analysis/nw_bootstrap_ci.py:shannon_entropy
    formula_ref: references/metrics_catalog.md#ehe
    output_artefact: analysis/governance_metrics.csv

statistical_comparisons:
  - name: governed_vs_disabled_per_model
    test: paired_t
    metric: ibr
    df_per_model: 4   # n_seeds - 1
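In models[], each model carries its own governance and seed lists. The run count is therefore a sum over models, not a single product. The sketch below does not parse the file, which would need a YAML library such as PyYAML. It uses a plain dict that mirrors the YAML above. count_runs and paired_df are hypothetical helper names.

```python
from __future__ import annotations

# Plain-dict mirror of the models: section above (no YAML parsing, to stay stdlib-only).
matrix = {
    "models": [
        {"tag": "gemma4:e4b", "governance": ["strict", "disabled"], "seeds": [42, 43, 44, 45, 46]},
        {"tag": "gemma3:4b", "governance": ["strict", "disabled"], "seeds": [42, 43, 44, 45, 46]},
    ]
}


def count_runs(matrix: dict) -> int:
    """Each model contributes len(governance) * len(seeds) runs."""
    return sum(len(m["governance"]) * len(m["seeds"]) for m in matrix["models"])


def paired_df(model: dict) -> int:
    """Degrees of freedom of a paired t-test across seeds: n_seeds - 1."""
    return len(model["seeds"]) - 1


print(count_runs(matrix))                        # 20
print([paired_df(m) for m in matrix["models"]])  # [4, 4]
```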

`.research/metrics_plan.md`

Free-form markdown with one section per metric:

## IBR (Irrational Behaviour Rate)
- Definition: <restate from metrics_catalog.md>
- Canonical script: <path>
- Input data path: <path glob>
- Output: <path>
- Statistical comparison: paired t (governed vs disabled, n=5 seeds)
- Acceptance: ΔIBR with 95% CI excludes 0 → governance effect confirmed
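The ΔIBR acceptance rule can be sketched as a paired t interval over seeds. This is an illustration under stated assumptions, not the canonical analysis script:

- paired_delta_ci and governance_effect_supported are hypothetical names.
- The critical values are the standard two-sided 95% Student's t quantiles for df 1 to 10.
- Both inputs must be ordered by seed, so each governed run is paired with the disabled run of the same seed.

```python
from __future__ import annotations

import math
import statistics

# Two-sided 95% critical values of Student's t (the 0.975 quantile) for small df.
# For larger df, scipy.stats.t.ppf(0.975, df) returns the same quantity.
T_975 = {1: 12.706, 2: 4.303, 3: 3.182, 4: 2.776, 5: 2.571,
         6: 2.447, 7: 2.365, 8: 2.306, 9: 2.262, 10: 2.228}


def paired_delta_ci(governed: list[float], disabled: list[float]) -> tuple[float, float, float]:
    """Mean of (governed - disabled) per seed and its 95% confidence interval.

    Both lists must be ordered by the same seeds, so index i pairs seed i.
    """
    if len(governed) != len(disabled):
        raise ValueError("need exactly one governed and one disabled value per seed")
    diffs = [g - d for g, d in zip(governed, disabled)]
    n = len(diffs)
    if n - 1 not in T_975:
        raise ValueError(f"no tabulated t quantile for df={n - 1}")
    mean = statistics.mean(diffs)
    half_width = T_975[n - 1] * statistics.stdev(diffs) / math.sqrt(n)
    return mean, mean - half_width, mean + half_width


def governance_effect_supported(ci_low: float, ci_high: float) -> bool:
    """Acceptance rule from the plan: the 95% CI for the difference excludes 0."""
    return ci_low > 0 or ci_high < 0


# Illustrative per-seed IBR values (made up for the example, not results).
governed = [0.10, 0.12, 0.08, 0.11, 0.09]
disabled = [0.30, 0.28, 0.35, 0.31, 0.33]
delta, low, high = paired_delta_ci(governed, disabled)
print(governance_effect_supported(low, high))  # True
```

With five seeds, df = 4, which matches df_per_model in the matrix. A negative ΔIBR means the governed runs produced fewer irrational actions.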

`.research/run_plan.md`

Runnable command list with idempotency:

# Phase 1: gemma4:e4b governed
for seed in 42 43 44 45 46; do
    if [ -f "examples/single_agent/results/JOH_FINAL_v2/gemma4_e4b/Group_C/Run_${seed}/simulation_log.csv" ]; then
        echo "skip: gemma4:e4b governed seed=${seed}"
    else
        python examples/single_agent/run_flood.py \
            --model gemma4:e4b --seed ${seed} \
            --governance-mode strict \
            --output examples/single_agent/results/JOH_FINAL_v2/gemma4_e4b/Group_C/Run_${seed} \
            --num-ctx 8192 --num-predict 1536 \
            > .../seed${seed}.stdout.log 2>&1
    fi
done

Refusal Protocol

The skill MUST refuse to:

Invent model tags. If the user gives "Gemma" or "Claude" without a specific Ollama tag, ask. Do not guess.
Invent metrics. Ask the user to choose from references/metrics_catalog.md or to specify a new formula. A new formula is out of scope for this skill; it must first be implemented as a new analysis script.
Invent governance modes. Read available modes from the domain's agent_types.yaml; ask if the requested mode is not configured.
Run with a seed budget below 3 unless the user explicitly states the experiment is exploratory. With fewer than 3 seeds, a paired t-test has at most 1 degree of freedom and is unreliable; flag this.
Mix domains in one matrix. Each matrix is single-domain by default; cross-domain comparison requires a separate matrix per domain plus a comparison plan.
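The refusal rules can be collected into one pre-flight check. This is a sketch under assumptions:

- refusal_reasons and the request keys are hypothetical.
- installed_tags is assumed to come from the NAME column of ollama list.
- catalogue is assumed to come from references/metrics_catalog.md, and configured_modes from the domain's agent_types.yaml.
- The example values below, including retry_rate, are illustrative.

An empty result means the matrix can be built. Otherwise, each reason becomes a question for the user.

```python
from __future__ import annotations


def refusal_reasons(request: dict, installed_tags: set[str], catalogue: set[str],
                    configured_modes: set[str]) -> list[str]:
    """Reasons to ask or refuse before building a matrix; an empty list means proceed."""
    reasons = []
    for tag in request["models"]:
        if tag not in installed_tags:  # e.g. "Gemma" instead of an exact tag
            reasons.append(f"model {tag!r} is not an installed Ollama tag; ask for the exact tag")
    for metric in request["metrics"]:
        if metric not in catalogue:
            reasons.append(f"metric {metric!r} not in catalogue; pick one or implement a script first")
    for mode in request["governance"]:
        if mode not in configured_modes:
            reasons.append(f"governance mode {mode!r} is not configured for this domain")
    if len(request["seeds"]) < 3 and not request.get("exploratory", False):
        reasons.append("fewer than 3 seeds; paired t is unreliable unless flagged exploratory")
    if len(set(request["domains"])) != 1:
        reasons.append("expected exactly one domain; build one matrix per domain")
    return reasons


# Illustrative lookups; in practice these are read from ollama list, the catalogue and agent_types.yaml.
installed = {"gemma3:4b", "gemma4:e4b", "ministral-3:8b"}
catalogue = {"ibr", "ehe", "retry_rate"}
modes = {"strict", "disabled", "relaxed"}

request = {"models": ["Gemma"], "metrics": ["coherence_index"], "governance": ["strict"],
           "seeds": [42], "domains": ["flood"]}
for reason in refusal_reasons(request, installed, catalogue, modes):
    print(reason)
```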

Output structure contract

metrics_plan.md MUST have one section per metric with these fields in order: Definition, Canonical script, Input data path, Output, Statistical comparison, Acceptance.

run_plan.md MUST have one command per matrix row with idempotent skip-check (no command may overwrite an existing simulation_log.csv without explicit user opt-in).

wagf_experiment_matrix.yml MUST validate against the matrix schema. Its top-level keys research_question, hypothesis, domain, models[] and metrics[] are all mandatory.
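Here is a minimal checker for the metrics_plan.md and matrix contracts. It assumes the exact layout of the metrics_plan.md example above: one "## " heading per metric, followed by "- Field:" bullets. check_metrics_plan and missing_matrix_keys are hypothetical helpers, not part of WAGF.

```python
from __future__ import annotations

import re

METRIC_FIELDS = ("Definition", "Canonical script", "Input data path",
                 "Output", "Statistical comparison", "Acceptance")
MATRIX_KEYS = ("research_question", "hypothesis", "domain", "models", "metrics")


def check_metrics_plan(text: str) -> list[str]:
    """Check that every '## ' section lists the six fields in the contracted order."""
    problems = []
    # Split on level-2 headings; element 0 is whatever precedes the first section.
    sections = re.split(r"^## ", text, flags=re.MULTILINE)[1:]
    if not sections:
        problems.append("no '## ' metric sections found")
    for section in sections:
        title = section.splitlines()[0].strip()
        fields = re.findall(r"^- ([^:]+):", section, flags=re.MULTILINE)
        if tuple(fields) != METRIC_FIELDS:
            problems.append(f"{title}: fields {fields} != {list(METRIC_FIELDS)}")
    return problems


def missing_matrix_keys(matrix: dict) -> list[str]:
    """Top-level matrix keys that are absent."""
    return [key for key in MATRIX_KEYS if key not in matrix]


plan = """## IBR (Irrational Behaviour Rate)
- Definition: <restate from metrics_catalog.md>
- Canonical script: <path>
- Input data path: <path glob>
- Output: <path>
- Statistical comparison: paired t (governed vs disabled, n=5 seeds)
- Acceptance: delta-IBR with 95% CI excludes 0
"""
print(check_metrics_plan(plan))  # []
```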

Bundled resources

references/matrix_template.md — copy-paste-ready wagf_experiment_matrix.yml skeleton with annotated comments.
references/metrics_catalog.md — every metric the WAGF analyser can produce, with file:line refs to the canonical implementation.
references/governance_modes.md — the available governance modes per domain, derived from each agent_types.yaml.
references/anti_overclaim.md — refusal patterns and clarification prompts to avoid scope-creep.

Acceptance criteria

The skill is ready when:

For the input "Plan a cross-model WAGF flood ablation across Gemma-3 4B, Gemma-4 e4b, Ministral 8B with 5 seeds", the skill produces three artefacts:
- a valid wagf_experiment_matrix.yml with 3 models × 2 conditions × 5 seeds = 30 rows;
- a metrics_plan.md covering IBR, EHE and retry-rate;
- a run_plan.md whose skip-checks point to JOH_FINAL_v2/.
For the input "Compare governed vs ungoverned" with no models specified, the skill asks rather than guesses.
For the input "Test the WAGF Coherence Index across drought scenarios", the skill refuses ("metric not in catalogue") and asks the user to pick a metric from the catalogue or to define a new one.
