phira-baseline-parity

Stars: 10

Forks: 3

Updated: March 1, 2026 at 11:38

Checklist for proving baseline behavior is unchanged (or changes are intentional) via controlled A/B runs.

Source

phira-ai

phira-ai/Phira

SKILL.md

name	phira-baseline-parity
description	Checklist for proving baseline behavior is unchanged (or changes are intentional) via controlled A/B runs.

Use this skill when implementing a feature that should be opt-in, when defaults must remain identical, or when reviewers ask "did you change baseline behavior?".

This skill is intended to be lazy-loaded on demand: run the parity A/B commands only if the researcher has approved running them.

Goal: demonstrate parity between baseline and modified code under the same configuration.

Core principle

If the acceptance criteria do not explicitly change defaults, your implementation should preserve baseline behavior by default.

Parity checklist

Feature gating

Ensure the new behavior is behind a config/flag.
Default should match baseline behavior unless the task explicitly changes defaults.
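
A minimal sketch of the gating pattern, assuming a dataclass-based config. The names `TrainConfig`, `use_new_scheduler`, and the warmup schedule are hypothetical and only illustrate a flag whose default keeps the old path:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainConfig:
    # Hypothetical config; field names are illustrative only.
    learning_rate: float = 1e-3
    use_new_scheduler: bool = False  # opt-in flag; the default matches baseline


def baseline_lr(step: int, cfg: TrainConfig) -> float:
    """Old path: constant learning rate."""
    return cfg.learning_rate


def new_lr(step: int, cfg: TrainConfig) -> float:
    """New path: linear warmup over the first 10 steps (illustrative)."""
    return cfg.learning_rate * min(1.0, (step + 1) / 10)


def lr_at(step: int, cfg: TrainConfig) -> float:
    # The flag is the only branch point, so flag-off runs call the
    # baseline function untouched.
    if cfg.use_new_scheduler:
        return new_lr(step, cfg)
    return baseline_lr(step, cfg)
```

Keeping the old function intact and branching in one place makes the flag-off path easy to audit in review.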

Controlled A/B

Pick one minimal config that exercises the relevant path.
Run A: baseline code (before your change; old path, flag absent or off).
Run B: new code, same config, flag off, to prove defaults are unchanged.
Run C (optional): new code, same config, flag on, to prove the feature does something.
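
The three runs can be written down as a small plan before anything is executed, which also gives the researcher something concrete to approve. This is a sketch under assumptions: `Run`, `ab_plan`, and the `"baseline"`/`"modified"` labels are hypothetical, and the baseline checkout is assumed not to know about the new flag.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class Run:
    label: str
    codebase: str            # "baseline" or "modified" checkout
    config: dict[str, Any]   # resolved config for this run


def ab_plan(base_config: dict[str, Any], flag: str,
            include_c: bool = True) -> list[Run]:
    """Build runs A, B and optionally C from one shared config."""
    runs = [
        # A: old code; the flag is omitted because the baseline
        # checkout is assumed not to define it.
        Run("A", "baseline", dict(base_config)),
        # B: new code, same config, flag explicitly off.
        Run("B", "modified", {**base_config, flag: False}),
    ]
    if include_c:
        # C: new code, same config, flag on.
        runs.append(Run("C", "modified", {**base_config, flag: True}))
    return runs
```

For example, `ab_plan({"seed": 0, "batch_size": 8}, "use_new_scheduler")` yields three runs whose configs differ only in the flag.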

Fix confounders

Same seed(s), same data split, same batch size, same precision settings.
If determinism is not guaranteed, compare invariants and qualitative traces (e.g., loss decreases, shapes, number of steps, identical config resolution).
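
As a toy illustration of both cases, the sketch below uses only the standard library's `random` module. `toy_losses` and `invariants_hold` are hypothetical stand-ins for a real training loop and its checks; a real project would also need to seed whatever numeric framework it uses.

```python
import random


def toy_losses(seed: int, steps: int = 5) -> list[float]:
    """Stand-in for a training run: seeded start, then steady decay."""
    rng = random.Random(seed)      # local RNG, no global state touched
    loss = 1.0 + rng.random()      # seed-dependent initial loss
    losses = []
    for _ in range(steps):
        loss *= 0.9
        losses.append(loss)
    return losses


def invariants_hold(a: list[float], b: list[float]) -> bool:
    """Weaker check for when exact equality cannot be expected."""
    same_steps = len(a) == len(b)
    both_decrease = a[-1] < a[0] and b[-1] < b[0]
    return same_steps and both_decrease


# Same seed: the runs are directly comparable.
same = toy_losses(0) == toy_losses(0)
# Different seeds: values differ, but the invariants still hold.
weak = invariants_hold(toy_losses(0), toy_losses(1))
```

The exact comparison is the stronger evidence; the invariant check is a fallback, and the report should say which one was used.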

What to compare (choose what exists)

Config resolution output / logged hyperparameters
First-step loss (or first N steps) within tolerance
Model parameter count / module tree
Output tensor shapes and dtypes
Checkpoint metadata
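
Two of these comparisons can be sketched with the standard library alone. The helper names `config_diff` and `losses_match` and the default tolerances are assumptions chosen for illustration, not values prescribed by the skill:

```python
import math
from typing import Any

_MISSING = object()  # sentinel so a key absent on one side counts as a diff


def config_diff(a: dict[str, Any], b: dict[str, Any]) -> dict[str, tuple]:
    """Return {key: (value_in_a, value_in_b)} for every differing key."""
    diff = {}
    for key in sorted(set(a) | set(b)):
        va, vb = a.get(key, _MISSING), b.get(key, _MISSING)
        if va != vb:
            diff[key] = (
                "<missing>" if va is _MISSING else va,
                "<missing>" if vb is _MISSING else vb,
            )
    return diff


def losses_match(a: list[float], b: list[float], n: int = 3,
                 rel_tol: float = 1e-6, abs_tol: float = 1e-8) -> bool:
    """Compare the first n losses of two runs within tolerance."""
    if len(a) < n or len(b) < n:
        return False  # not enough steps logged to compare
    return all(
        math.isclose(x, y, rel_tol=rel_tol, abs_tol=abs_tol)
        for x, y in zip(a[:n], b[:n])
    )
```

An empty `config_diff` result together with `losses_match(...)` returning `True` is the kind of evidence the report below asks for; the tolerances should be stated alongside the outcome.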

Reporting (copy/paste)

Baseline parity
- Default behavior preserved: <yes/no/unknown>
- Evidence:
  - A/B commands: <commands>
  - Compared: <what you compared>
  - Outcome: <match/tolerance/diff>
- If not preserved: <why + where documented + migration notes>
