trainer-election

Updated March 29, 2026 at 21:21

Elect the strongest prompt or skill candidate from an existing evaluation workspace. Use this skill whenever a workflow already has multiple scored configurations and needs a separate leader-selection pass over grading, timing, or benchmark artifacts, especially when comparing optimize outputs without pushing that selection logic back into trainer-optimize.

Source

Tyler-R-Kendrick

Tyler-R-Kendrick/copilot-auto-training

SKILL.md

name	trainer-election
description	Elect the strongest prompt or skill candidate from an existing evaluation workspace. Use this skill whenever a workflow already has multiple scored configurations and needs a separate leader-selection pass over grading, timing, or benchmark artifacts, especially when comparing optimize outputs without pushing that selection logic back into trainer-optimize.
license	MIT
compatibility	Requires Python 3.11+. Reads eval workspaces that follow the Agent Skills evaluation layout with eval metadata, grading.json, timing.json, and optional benchmark.json artifacts.
metadata	{"author":"your-org","version":"0.1.0"}

Election

Use this skill to elect a winner from existing evaluation artifacts. Treat it as the standalone selection step after candidates have already been run and graded. It does not generate candidates, perform research, synthesize evals, or re-run optimization.

When to use this skill

A workflow already produced multiple candidate configurations and now needs a winner chosen from scored artifacts.
Multiple candidate configurations already exist as with_skill, without_skill, old_skill, or other config directories inside a skill-eval workspace.
Each candidate has already been run against authored evals and saved grading.json and optional timing.json artifacts.
You need a separate election pass that picks the strongest configuration from workspace results instead of generating new candidates.
The workflow explicitly needs comparison across multiple optimize outputs, prompt rewrites, or skill revisions without folding that comparison into trainer-optimize.

Do not use this skill to gather datasets, synthesize evals, optimize prompts, or run missing evaluations from scratch. Those remain separate skills.

Inputs

workspace_dir: root workspace path, a specific iteration directory, or a direct eval directory
iteration: optional iteration selector when the workspace contains multiple iterations
manifest_file: optional authored evals/evals.json path for expected eval coverage

If manifest_file is omitted, the runtime searches evals/evals.json next to the iteration directory, then one level higher.

Accepted workspace layouts

The runtime accepts these shapes:

a workspace root that contains iteration-N/ directories
a direct iteration directory
a direct eval directory when only one eval folder is available
iteration directories that keep evals at the top level or under runs/

Config directories may contain grading.json and timing.json directly or inside nested run-N/ subdirectories. A configuration is treated as a baseline when it is named baseline, without_skill, or old_skill, or when its name ends in _baseline.
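
The baseline naming rule is small enough to express directly. This sketch assumes case-sensitive matching, which the skill text does not specify; `is_baseline` is a hypothetical name.

```python
# Exact names listed in the skill text.
BASELINE_NAMES = {"baseline", "without_skill", "old_skill"}


def is_baseline(config_name: str) -> bool:
    """Return True when a configuration directory name denotes a baseline."""
    return config_name in BASELINE_NAMES or config_name.endswith("_baseline")
```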

Election behavior

Read the requested iteration directory or the latest iteration in the workspace.
Discover eval directories and configuration directories that follow the Agent Skills evaluation structure.
Resolve expected eval coverage from the explicit manifest when provided, otherwise from the nearest evals/evals.json, otherwise from the discovered eval keys.
Load scored runs from raw grading.json and timing.json artifacts.
Fall back to benchmark.json only when raw run artifacts are unavailable.
Aggregate mean pass rate, coverage ratio, penalty, mean time, mean token count, and mean error count per configuration.
Penalize incomplete eval coverage so partially graded candidates do not beat fully validated ones by omission.
Elect the leader by adjusted score, using raw score, lower error count, lower time, lower token usage, and stable name ordering as tie-breakers.
Persist candidate metadata so callers can explain the winner and locate the winning prompt artifact when one exists.
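
The ranking steps above can be sketched as a single sort key. The tie-breaker order follows the list, but the penalty formula is an assumption: the skill text does not define it, so this sketch multiplies the mean pass rate by the coverage ratio. `Candidate`, `adjusted_score`, and `elect` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    name: str
    mean_pass_rate: float  # raw score
    evals_scored: int      # evals that have a grading.json
    mean_errors: float = 0.0
    mean_time: float = 0.0
    mean_tokens: float = 0.0


def adjusted_score(c: Candidate, expected_eval_count: int) -> float:
    # Assumed penalty: scale the raw score by the fraction of expected evals covered.
    if expected_eval_count <= 0:
        return 0.0
    coverage = min(c.evals_scored / expected_eval_count, 1.0)
    return c.mean_pass_rate * coverage


def elect(candidates: list[Candidate], expected_eval_count: int) -> Candidate:
    if not candidates:
        raise ValueError("no scored candidate runs found")

    def key(c: Candidate):
        # Higher scores first (negated), then lower errors, time, tokens, then name.
        return (
            -adjusted_score(c, expected_eval_count),
            -c.mean_pass_rate,
            c.mean_errors,
            c.mean_time,
            c.mean_tokens,
            c.name,
        )

    return min(candidates, key=key)
```

Under this assumed penalty, a candidate that passes every check on half the evals ranks below one that passes 80% of checks on all of them, which is the intent of penalizing incomplete coverage.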

Output contract

The runtime returns JSON with these top-level fields:

winner: winning configuration name
best_prompt: prompt text when a prompt artifact is discoverable
best_prompt_file: path to the winning prompt artifact when present
best_candidate: the winning candidate record
persisted_candidates: all candidate summaries with is_winner, coverage, score, and cost metadata
selection_source: workspace when raw run artifacts were used, otherwise benchmark
iteration_dir, manifest_file, and expected_eval_count for caller traceability
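
A caller might consume the result along these lines. The payload is a made-up illustration that uses only field names listed above; the values are invented, and `describe` is a hypothetical helper rather than part of the skill.

```python
import json

# Illustrative payload only: the values are invented for this example.
EXAMPLE = """
{
  "winner": "with_skill",
  "selection_source": "workspace",
  "expected_eval_count": 4,
  "persisted_candidates": [{"is_winner": true}, {"is_winner": false}]
}
"""


def describe(result: dict) -> str:
    """Summarize an election result for logs or review comments."""
    pool = result.get("persisted_candidates", [])
    winners = sum(1 for c in pool if c.get("is_winner"))
    if winners != 1:
        raise ValueError(f"expected exactly one winner, found {winners}")
    if result["selection_source"] == "workspace":
        basis = "raw run artifacts"
    else:
        basis = "benchmark.json fallback"
    return f"{result['winner']} elected from {len(pool)} candidates using {basis}"


print(describe(json.loads(EXAMPLE)))
```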

Prompt artifacts are discovered from outputs/ or the run directory using prompt-like filenames such as prompt.md, candidate.md, or *.prompt.md.

Guardrails

Prefer full eval coverage over partial wins.
Treat missing grading.json as unscored instead of inventing results.
Do not assume optimize internals or regenerate candidates from election.
Keep baseline configurations in the pool so comparisons remain explainable.
Stop with a clear error if no scored candidate runs can be found.
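
Two of these guardrails, skipping unscored configurations and failing loudly when nothing is scored, can be sketched together. `NoScoredRunsError` and `load_gradings` are hypothetical names, and the sketch reads grading.json only from the top of each config directory, ignoring the nested run-N/ layout for brevity.

```python
import json
from pathlib import Path


class NoScoredRunsError(RuntimeError):
    """Raised when a workspace contains no graded candidate runs."""


def load_gradings(config_dirs: list[Path]) -> dict[str, dict]:
    """Load grading.json per configuration, leaving ungraded ones unscored."""
    scored: dict[str, dict] = {}
    for config_dir in config_dirs:
        grading_path = config_dir / "grading.json"
        if not grading_path.is_file():
            continue  # missing grading means unscored; never invent a result
        scored[config_dir.name] = json.loads(grading_path.read_text(encoding="utf-8"))
    if not scored:
        searched = ", ".join(str(d) for d in config_dirs) or "(no config directories)"
        raise NoScoredRunsError(f"no grading.json found under: {searched}")
    return scored
```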

Running the runtime

python skills/trainer-election/scripts/run_election.py <workspace_dir> \
  [--iteration <iteration-number-or-path>] \
  [--manifest-file <path-to-evals.json>] \
  [--output-file <path-to-result.json>]

Use skills/trainer-election/references/leader-election.md for the short rationale behind the scoring model.

Naming rationale

election is still the right public name because the skill owns the act of electing a winner from a scored field of candidates. The key change is that the field now comes from evaluation artifacts rather than from internal optimize-side search.
