audit-experiment-integrity

Stage 1 audit (ARIS §3.1). Verify the training run actually converged, seeds were logged, no data leakage.

Install command

npx skills add https://github.com/PAMF2/oniro-colab --skill audit-experiment-integrity

Copy this command and paste it into Claude Code to install the skill.

Source

PAMF2/oniro-colab

Stars: 0

Forks: 0

Updated: May 12, 2026 at 14:16

SKILL.md

name	audit-experiment-integrity
description	Stage 1 audit (ARIS §3.1). Verify the training run actually converged, seeds were logged, no data leakage.
trigger	reviewer-stage-1
allowed_tools	["read_file","grep","glob","list_dir"]
output	json {verdict, items}

audit-experiment-integrity

You are the Reviewer (Stage 1). You evaluate whether the run is trustworthy before downstream stages look at its results.

Failure modes to check

Non-convergence — loss curve plateaus above target, NaN/Inf encountered.
Seed not logged — torch.manual_seed not called, results irreproducible.
Model-derived reference labels — labels come from a related model, not GT.
Self-normalized scores — metric normalized by the candidate's own outputs.
Phantom results — metric values present without corresponding logs.
Dead-code inflation — claimed components disabled at runtime via flags.
Scope inflation — claim covers eval set wider than what was actually run.
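
A few of the failure modes above lend themselves to mechanical checks. The following is a minimal Python sketch, not the skill's actual implementation: the function names, the `window` and `tol` parameters, and the assumption that the loss curve and logged metrics have already been parsed into plain Python lists are all hypothetical. It covers only non-convergence, a missing seed call, and phantom metrics; the remaining modes need a reading of the code and eval configuration that a short script cannot replace.

```python
import math
import re

# Matches a call such as `torch.manual_seed(0)`. Note that this also matches a
# commented-out call, so a hit still needs a human (or agent) look.
SEED_CALL = re.compile(r"\btorch\.manual_seed\s*\(")


def check_convergence(losses, target, window=5):
    """Flag NaN/Inf values and a loss tail that never reaches `target`.

    `losses` is a list of floats in training order; `window` (hypothetical
    parameter) is how many trailing points count as "the plateau".
    """
    findings = []
    if any(math.isnan(x) or math.isinf(x) for x in losses):
        findings.append("non-convergence: NaN/Inf encountered in loss curve")
    finite_tail = [x for x in losses[-window:] if math.isfinite(x)]
    if finite_tail and min(finite_tail) > target:
        findings.append(
            f"non-convergence: loss plateaus at {finite_tail[-1]} above target {target}"
        )
    return findings


def check_seed_logged(source_text):
    """Flag training source that never calls torch.manual_seed."""
    if SEED_CALL.search(source_text):
        return []
    return ["seed not logged: no torch.manual_seed call found"]


def check_phantom_metric(name, claimed, logged_values, tol=1e-6):
    """Flag a claimed metric value that no logged value backs up."""
    if not logged_values:
        return [f"phantom metric: claim mentions {name}={claimed} but no logged value found"]
    if all(abs(claimed - v) > tol for v in logged_values):
        best = max(logged_values)
        return [f"phantom metric: claim mentions {name}={claimed} but logs only show {best}"]
    return []
```

Each helper returns a list of finding strings, so the results can be concatenated directly into the `items` array of the output.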

Output

{
  "verdict": "supported" | "partially_supported" | "invalidated",
  "items": [
    "phantom metric: claim mentions slot_purity=0.78 but logs only show 0.71",
    "..."
  ]
}

Return supported only if NONE of the seven failure modes fire.
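
The rule above fixes only one case: an empty findings list yields `supported`. How to choose between `partially_supported` and `invalidated` is not specified here, so the sketch below makes that split an explicit assumption: any finding whose text starts with one of the hypothetical `fatal_prefixes` invalidates the run, and anything else downgrades it to partially supported. The function name and that parameter are illustrative, not part of the skill.

```python
import json


def build_report(items, fatal_prefixes=("non-convergence", "phantom")):
    """Assemble the {verdict, items} object from a list of finding strings.

    `fatal_prefixes` is an assumed severity rule; the skill text only says
    that `supported` requires zero findings.
    """
    if not items:
        verdict = "supported"
    elif any(item.startswith(fatal_prefixes) for item in items):
        verdict = "invalidated"
    else:
        verdict = "partially_supported"
    return {"verdict": verdict, "items": list(items)}


if __name__ == "__main__":
    report = build_report(
        ["phantom metric: claim mentions slot_purity=0.78 but logs only show 0.71"]
    )
    print(json.dumps(report, indent=2))
```

Keeping the severity rule in a single parameter makes it easy to swap in whatever policy the reviewer stage actually uses.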

More from this repository

audit-paper-claim

PAMF2/oniro-colab

Stage 3 audit (ARIS §3.1). Fresh zero-context reviewer re-reads wiki narrative; cross-checks against raw results.

2026-05-12 · 0

audit-result-to-claim

PAMF2/oniro-colab

Stage 2 audit (ARIS §3.1). For each experimental claim, decide supported / partially / invalidated against the logs.

2026-05-12 · 0

novelty-bonus

PAMF2/oniro-colab

Compute novelty score for a candidate descriptor via k-NN distance in archive descriptor space.

2026-05-12 · 0

post-mortem

PAMF2/oniro-colab

On REJECT verdict, write a structured entry to the failure ledger so future executors skip the closed branch.

2026-05-12 · 0

propose-mutation

PAMF2/oniro-colab

Read wiki frontier + failure ledger, emit one typed mutation as unified diff. Used by Executor agent.

2026-05-12 · 0

qd-archive-update

PAMF2/oniro-colab

After a mutation passes (or just falls in an empty novelty cell), update the MAP-Elites archive.

2026-05-12 · 0

Useful for (SOC)

Data Scientists · Computer and Mathematical Occupations · 15-2051 · L4
