audit-result-to-claim

Stage 2 audit (ARIS §3.1). For each experimental claim, decide supported / partially / invalidated against the logs.

Install command

npx skills add https://github.com/PAMF2/oniro-colab --skill audit-result-to-claim

Copy this command and paste it into Claude Code to install the skill.

Source

PAMF2/oniro-colab

Stars: 0

Forks: 0

Updated: May 12, 2026, 14:16

SKILL.md

name: audit-result-to-claim
description: Stage 2 audit (ARIS §3.1). For each experimental claim, decide supported / partially / invalidated against the logs.
trigger: reviewer-stage-2
allowed_tools: ["read_file","grep","glob"]
output: json {verdict, items}

audit-result-to-claim

You are the Reviewer (Stage 2). You hold the run's quantitative claims to the raw logs.

Procedure

1. Extract every quantitative claim in the run's report (numbers, deltas, p-values).
2. For each claim, locate the supporting line in metrics.jsonl / wandb.log / the wiki variant frontmatter.
3. Verdict per claim:
   - supported: numbers match within rounding tolerance.
   - partially_supported: claim is true on a subset of the eval set, not all of it.
   - invalidated: no log line supports the claim.
4. Aggregate to a single verdict for the bundle:
   - supported iff every individual claim is supported.
   - invalidated iff any single claim is invalidated.
   - else partially_supported.
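
The per-claim and bundle rules above can be sketched in Python. This is a minimal illustration, not the skill's actual implementation: the names `matches`, `claim_verdict`, and `aggregate` are hypothetical, "rounding tolerance" is assumed to mean half a unit in the claim's last reported decimal, and each claim is assumed to arrive with one logged value per eval subset.

```python
from typing import Sequence

SUPPORTED = "supported"
PARTIAL = "partially_supported"
INVALIDATED = "invalidated"


def matches(claimed: float, logged: float, decimals: int) -> bool:
    """True if the logged value rounds to the claim at the claim's precision.

    Assumption: tolerance is half a unit in the last reported decimal,
    plus a tiny slack for floating-point error.
    """
    return abs(claimed - logged) <= 0.5 * 10 ** (-decimals) + 1e-12


def claim_verdict(claimed: float, decimals: int, logged_by_subset: Sequence[float]) -> str:
    """Verdict for one claim, given the logged values found for it (hypothetical input shape)."""
    if not logged_by_subset:
        return INVALIDATED  # no log line supports the claim
    hits = [matches(claimed, value, decimals) for value in logged_by_subset]
    if all(hits):
        return SUPPORTED
    if any(hits):
        return PARTIAL  # true on a subset of the eval set only
    return INVALIDATED


def aggregate(verdicts: Sequence[str]) -> str:
    """Bundle verdict: any invalidated claim wins, then all-supported, else partial.

    An empty bundle comes out as "supported" (vacuous truth); the procedure
    does not say how to treat that case.
    """
    if any(v == INVALIDATED for v in verdicts):
        return INVALIDATED
    if all(v == SUPPORTED for v in verdicts):
        return SUPPORTED
    return PARTIAL


# A 22% claim that holds on three subsets and reads 14% on the fourth.
per_claim = [
    claim_verdict(0.32, 2, [0.3204]),
    claim_verdict(0.22, 2, [0.22, 0.22, 0.22, 0.14]),
]
bundle = aggregate(per_claim)
```

Checking for an invalidated claim first keeps the two "iff" rules from conflicting when a bundle mixes supported and invalidated claims.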

Output

{
  "verdict": "supported" | "partially_supported" | "invalidated",
  "items": [
    "claim 'ARC-3 dev 0.32' supported by metrics.jsonl line 4812",
    "claim 'Gödel gate accept rate 22%' partially_supported: 22% on weeks 1-3, 14% on week 4"
  ]
}
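
A report in the shape above can be checked mechanically before it is passed on. The sketch below is one possible validator, assuming that `verdict` and `items` are the only required keys, that the `|` in the template separates the allowed verdict strings, and that every item is a plain string; the function name `validate_report` is hypothetical.

```python
import json

ALLOWED_VERDICTS = {"supported", "partially_supported", "invalidated"}


def validate_report(raw: str) -> dict:
    """Parse a JSON report and check it against the shape shown in the Output section."""
    report = json.loads(raw)
    if not isinstance(report, dict):
        raise ValueError("report must be a JSON object")
    verdict = report.get("verdict")
    if not isinstance(verdict, str) or verdict not in ALLOWED_VERDICTS:
        raise ValueError(f"unexpected verdict: {verdict!r}")
    items = report.get("items")
    if not isinstance(items, list) or not all(isinstance(i, str) for i in items):
        raise ValueError("items must be a list of strings")
    return report


# The two example items from the Output section: one supported and one
# partially supported claim aggregate to "partially_supported".
example = json.dumps(
    {
        "verdict": "partially_supported",
        "items": [
            "claim 'ARC-3 dev 0.32' supported by metrics.jsonl line 4812",
            "claim 'Gödel gate accept rate 22%' partially_supported: "
            "22% on weeks 1-3, 14% on week 4",
        ],
    },
    ensure_ascii=False,  # keep 'Gödel' readable instead of escaping it
)
report = validate_report(example)
```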

More from this repository (PAMF2/oniro-colab)

- audit-experiment-integrity: Stage 1 audit (ARIS §3.1). Verify the training run actually converged, seeds were logged, no data leakage. (2026-05-12, 0 stars)
- audit-paper-claim: Stage 3 audit (ARIS §3.1). Fresh zero-context reviewer re-reads wiki narrative; cross-checks against raw results. (2026-05-12, 0 stars)
- novelty-bonus: Compute novelty score for a candidate descriptor via k-NN distance in archive descriptor space. (2026-05-12, 0 stars)
- post-mortem: On REJECT verdict, write a structured entry to the failure ledger so future executors skip the closed branch. (2026-05-12, 0 stars)
- propose-mutation: Read wiki frontier + failure ledger, emit one typed mutation as unified diff. Used by Executor agent. (2026-05-12, 0 stars)
- qd-archive-update: After a mutation passes (or just falls in an empty novelty cell), update the MAP-Elites archive. (2026-05-12, 0 stars)

Useful for (SOC): Other life scientists, Natural and social sciences, 19-1029, L4
