autoany

Evaluator-Governed Recursive Improvement (EGRI) framework for turning ambiguous goals into safe, measurable, rollback-capable recursive improvement systems. Use when the user wants to: (1) build a self-improving system for any domain (ML, RAG, workflows, ETL, UI, compiler tuning, etc.), (2) formalize a vague optimization goal into a bounded loop with evaluator, harness, and promotion policy, (3) create an autoresearch-style system beyond ML training, (4) design a mutable-artifact + immutable-evaluator architecture, (5) scaffold a problem-spec for recursive improvement, (6) turn "make X better" into a safe, auditable optimization process. Triggers on: "self-improving", "autoresearch", "autoany", "EGRI", "recursive improvement", "optimization loop", "evaluator-governed", "harness + evaluator", "mutable artifact", "problem compiler", "benchmark loop", "mutation surface".

Stars: 3

Forks: 0

Updated: April 25, 2026, 01:16

Source: broomva/autoany (creator: broomva)

Useful for (SOC): Software Developers, Computer and Mathematical Occupations, 15-1252, L4

SKILL.md

name: autoany

Autoany — EGRI Skill

Turn ambiguous user goals into safe, measurable, rollback-capable recursive improvement systems.

Core Principle

Do not grant an agent more mutation freedom than your evaluator can reliably judge.

Operating Procedure

Phase 1: Problem Compilation

Extract from the user's goal:

Objective — metric(s) to optimize (scalar or vector)
Hard constraints — what must never be violated (memory, latency, cost, compliance)
Mutable artifacts — what the loop may change (the train.py equivalent)
Immutable artifacts — what stays fixed (the prepare.py equivalent)
Evaluator — how to score candidates reliably enough to compare them
Execution backend — where candidates run (local, container, simulator, API)
Budget — time, tokens, money, or trial count per candidate
Promotion policy — keep-if-improves, Pareto, threshold, human-gate
Autonomy mode — suggestion, sandbox, auto-promote, or portfolio

Produce a problem-spec.yaml. See assets/problem-spec.template.yaml for the schema and references/PROBLEM-SPEC.md for field-by-field semantics.
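The nine fields above can be held in a small typed record before they are written out as problem-spec.yaml. The sketch below is illustrative only: the class name `ProblemSpec`, its attribute names, the example values, and the `problems()` checks are assumptions made here, not the schema defined in assets/problem-spec.template.yaml.

```python
from dataclasses import dataclass

# Allowed values are taken from the Phase 1 list above.
PROMOTION_POLICIES = {"keep-if-improves", "pareto", "threshold", "human-gate"}
AUTONOMY_MODES = {"suggestion", "sandbox", "auto-promote", "portfolio"}


@dataclass(frozen=True)
class ProblemSpec:
    """Hypothetical in-memory mirror of the nine Phase 1 fields."""
    objective: str
    hard_constraints: dict
    mutable_artifacts: list
    immutable_artifacts: list
    evaluator: str
    execution_backend: str
    budget: dict
    promotion_policy: str
    autonomy_mode: str = "sandbox"  # the skill defaults to sandbox

    def problems(self) -> list:
        """Return a list of issues; an empty list means the spec looks usable."""
        found = []
        if not self.mutable_artifacts:
            found.append("no mutable artifact declared")
        overlap = set(self.mutable_artifacts) & set(self.immutable_artifacts)
        if overlap:
            found.append(f"both mutable and immutable: {sorted(overlap)}")
        if self.evaluator in self.mutable_artifacts:
            found.append("evaluator must not be mutable")
        if not self.budget:
            found.append("no budget declared")
        if self.promotion_policy not in PROMOTION_POLICIES:
            found.append(f"unknown promotion policy: {self.promotion_policy}")
        if self.autonomy_mode not in AUTONOMY_MODES:
            found.append(f"unknown autonomy mode: {self.autonomy_mode}")
        return found


# Example values are invented for illustration.
spec = ProblemSpec(
    objective="maximize recall@10",
    hard_constraints={"p95_latency_ms": 200},
    mutable_artifacts=["retriever_config.yaml"],
    immutable_artifacts=["eval/benchmark.jsonl", "eval/score.py"],
    evaluator="eval/score.py",
    execution_backend="container",
    budget={"max_trials": 50},
    promotion_policy="keep-if-improves",
)
print(spec.problems())
```

Checking that the evaluator never appears among the mutable artifacts at spec time is one way to catch a violation of the core principle before any trial runs.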

Phase 2: Evaluator-First Design

Before touching the mutable artifact:

Define the evaluator — what it measures, how it scores, what thresholds matter
Build or identify the benchmark / replay set / test suite
Establish baseline score by running the current artifact through the evaluator
Confirm the evaluator is trusted — if not, fix it before proceeding

Law: The evaluator must exist and produce a baseline score before any mutation begins.
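One way to enforce this law in code is a gate that refuses to hand out a baseline until the evaluator has actually scored the current artifact. The sketch below is a minimal illustration: `BaselineGate`, `toy_evaluator`, and the choice to treat "trusted" as "repeatable within a tolerance" are all assumptions made here, not part of the skill.

```python
from typing import Callable, Optional


class BaselineGate:
    """Blocks mutation until a repeatable baseline score exists."""

    def __init__(self, evaluator: Callable[[str], float],
                 repeats: int = 3, tolerance: float = 1e-9):
        self._evaluator = evaluator
        self._repeats = repeats
        self._tolerance = tolerance
        self.baseline: Optional[float] = None

    def establish(self, artifact: str) -> float:
        """Score the unmodified artifact several times and keep the result."""
        scores = [self._evaluator(artifact) for _ in range(self._repeats)]
        if max(scores) - min(scores) > self._tolerance:
            # Assumed trust criterion: same input must give the same score.
            raise RuntimeError("evaluator is not repeatable; fix it first")
        self.baseline = scores[0]
        return self.baseline

    def require_baseline(self) -> float:
        """Call this at the top of any mutation step."""
        if self.baseline is None:
            raise RuntimeError("no baseline score; run the evaluator first")
        return self.baseline


def toy_evaluator(artifact: str) -> float:
    """Hypothetical evaluator: fraction of required keywords present."""
    required = ("retry", "timeout", "log")
    return sum(word in artifact for word in required) / len(required)


gate = BaselineGate(toy_evaluator)
```

Calling `gate.require_baseline()` before `gate.establish(...)` raises, which makes the ordering in Phase 2 a hard precondition rather than a convention.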

Phase 3: Harness Construction

Build the immutable execution shell:

Execution script — runs the candidate artifact deterministically
Scoring script — invokes the evaluator, outputs structured results
Constraint checker — rejects candidates violating hard constraints
Rollback mechanism — restores previous state on failure or rejection
Telemetry — logs trial metadata (duration, resource use, errors)
Ledger — append-only record of all trials (see assets/ledger.schema.json)
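Two of the harness pieces above, the constraint checker and the ledger, are small enough to sketch directly. The example assumes a JSON Lines file as the ledger format and invents the record fields (`trial`, `status`, `violations`, `metrics`); the real field set is whatever assets/ledger.schema.json specifies.

```python
import json
import tempfile
from pathlib import Path


class Ledger:
    """Append-only trial log stored as one JSON object per line."""

    def __init__(self, path: Path):
        self.path = path

    def append(self, trial: dict) -> None:
        # Opening in "a" mode only ever adds to the end of the file.
        with self.path.open("a", encoding="utf-8") as fh:
            fh.write(json.dumps(trial, sort_keys=True) + "\n")

    def read_all(self) -> list:
        if not self.path.exists():
            return []
        with self.path.open(encoding="utf-8") as fh:
            return [json.loads(line) for line in fh if line.strip()]


def violated_constraints(metrics: dict, limits: dict) -> list:
    """Return the names of hard constraints the candidate breaks.

    A metric that was not reported counts as a violation (fail closed).
    """
    return [name for name, limit in limits.items()
            if metrics.get(name, float("inf")) > limit]


# Hypothetical limits and trial metrics, for illustration only.
limits = {"latency_ms": 200, "memory_mb": 512}
ledger = Ledger(Path(tempfile.mkdtemp()) / "ledger.jsonl")

for trial_id, metrics in enumerate([
    {"latency_ms": 150, "memory_mb": 300},
    {"latency_ms": 250, "memory_mb": 300},
]):
    violations = violated_constraints(metrics, limits)
    ledger.append({
        "trial": trial_id,
        "status": "rejected" if violations else "accepted",
        "violations": violations,
        "metrics": metrics,
    })
```

Treating a missing metric as a violation keeps the checker conservative when telemetry is incomplete.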

Phase 4: Mutation Surface Definition

Identify artifact type (code, config, prompt, graph, parameters)
Define mutation operators (edit, replace, compose, parameterize, restructure)
Start with the smallest viable mutation surface — expand only after baseline is stable
Mark everything else as immutable
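A mutation surface can be enforced mechanically as an allowlist: any path a candidate touches that is not on the list causes the candidate to be refused. The helper below is a sketch; the function name `outside_surface` and the example paths are hypothetical, and glob matching via the standard-library `fnmatch` is just one possible way to express the surface.

```python
from fnmatch import fnmatch


def outside_surface(changed_paths, mutable_patterns):
    """Return the changed paths that are NOT covered by any mutable pattern."""
    return [
        path for path in changed_paths
        if not any(fnmatch(path, pattern) for pattern in mutable_patterns)
    ]


# Smallest viable surface: a single config file (hypothetical path).
surface = ["config/retriever.yaml"]

candidate_changes = ["config/retriever.yaml", "eval/score.py"]
illegal = outside_surface(candidate_changes, surface)
if illegal:
    print("refuse candidate, it touches immutable paths:", illegal)
```

Widening the surface later is then a one-line change to `surface`, which keeps the expansion explicit and reviewable.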

Phase 5: Loop Execution

x_t = current best artifact state
x_t_score = baseline score established in Phase 2
while budget remains:
    m = propose_mutation(x_t, ledger, strategy)
    x' = apply(m, x_t)
    result = execute(x', harness)
    score = evaluate(result)
    if violates_constraints(result): discard(x'), log("rejected")
    elif promotion_policy(score, x_t_score): promote(x'), x_t = x', x_t_score = score
    else: discard(x'), log("no improvement")
    record(ledger, trial_metadata)
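The loop above can be exercised end to end on a toy problem. In this sketch everything concrete is an assumption made for illustration: the artifact is a dict with one number `x`, the evaluator rewards values near 3, the hard constraint is `x <= 5`, the budget is a trial count, and the promotion policy is keep-if-improves. Execution through a harness is collapsed into a direct call to `evaluate`.

```python
import random


def evaluate(artifact: dict) -> float:
    """Toy evaluator: higher is better, best possible score is 0 at x = 3."""
    return -(artifact["x"] - 3.0) ** 2


def violates_constraints(artifact: dict) -> bool:
    """Toy hard constraint: x must never exceed 5."""
    return artifact["x"] > 5.0


def propose_mutation(artifact: dict, rng: random.Random) -> dict:
    """Return a new candidate; the current best is never edited in place."""
    return {"x": artifact["x"] + rng.uniform(-1.0, 1.0)}


def run_loop(initial: dict, max_trials: int, seed: int = 0):
    rng = random.Random(seed)
    best, best_score = initial, evaluate(initial)  # baseline comes first
    ledger = []
    for trial in range(max_trials):  # the loop cannot exceed the budget
        candidate = propose_mutation(best, rng)
        score = evaluate(candidate)
        if violates_constraints(candidate):
            outcome = "rejected"
        elif score > best_score:  # keep-if-improves
            best, best_score = candidate, score
            outcome = "promoted"
        else:
            outcome = "no improvement"
        # Every trial is recorded, including failures and rejections.
        ledger.append({"trial": trial, "x": candidate["x"],
                       "score": score, "outcome": outcome})
    return best, best_score, ledger


best, best_score, ledger = run_loop({"x": 0.0}, max_trials=50)
print(best, best_score, len(ledger))
```

Because candidates are built as new objects, discarding one is simply a matter of not adopting it, so the last promoted state is always available for rollback.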

Phase 6: Ledger Review and Strategy Distillation

After each batch of trials:

Review ledger for patterns (what helped, what failed, what is exhausted)
Induce reusable abstractions ("depth increases hurt under this budget")
Update search strategy based on accumulated evidence
Decide: continue, branch, simplify, or escalate to human
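A first pass at ledger review can be as simple as grouping trials by mutation operator and counting promotions. The sketch below assumes each ledger record carries an `operator` and an `outcome` field, and it uses an arbitrary rule (three or more trials with no promotion) to flag an operator as exhausted; both are illustrative choices, not part of the skill.

```python
from collections import defaultdict


def summarize_by_operator(ledger, exhausted_after=3):
    """Count trials and promotions per operator and flag exhausted ones."""
    stats = defaultdict(lambda: {"trials": 0, "promoted": 0})
    for trial in ledger:
        entry = stats[trial["operator"]]
        entry["trials"] += 1
        if trial["outcome"] == "promoted":
            entry["promoted"] += 1
    return {
        op: {**s, "exhausted": s["trials"] >= exhausted_after
             and s["promoted"] == 0}
        for op, s in stats.items()
    }


# Invented sample records, for illustration only.
sample_ledger = [
    {"operator": "edit", "outcome": "promoted"},
    {"operator": "edit", "outcome": "no improvement"},
    {"operator": "restructure", "outcome": "rejected"},
    {"operator": "restructure", "outcome": "no improvement"},
    {"operator": "restructure", "outcome": "no improvement"},
]
summary = summarize_by_operator(sample_ledger)
print(summary)
```

A summary like this feeds the continue, branch, simplify, or escalate decision: operators flagged as exhausted are candidates to drop from the search strategy.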

Autonomy Modes

Mode	Mutate	Execute	Promote	When to use
Suggestion	Propose only	No	No	Evaluator untrusted or high-risk domain
Sandbox	Yes	Yes	No	Evaluator exists but promotion needs human review
Auto-promote	Yes	Yes	Yes	Strong evaluator, bounded damage, clear constraints
Portfolio	Yes	Yes	Yes	Multiple loops, budget allocation across subproblems

Default to sandbox. Escalate only with explicit user approval.
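The table and the default rule translate naturally into a lookup plus a small resolver. In the sketch below, the names `Mode`, `Permissions`, and `resolve_mode` are hypothetical, and "Propose only" is interpreted as "mutations are proposed but never applied"; the permission values themselves follow the table.

```python
from enum import Enum
from typing import NamedTuple, Optional


class Permissions(NamedTuple):
    apply_mutation: bool
    execute: bool
    promote: bool


class Mode(Enum):
    SUGGESTION = "suggestion"
    SANDBOX = "sandbox"
    AUTO_PROMOTE = "auto-promote"
    PORTFOLIO = "portfolio"


PERMISSIONS = {
    Mode.SUGGESTION: Permissions(False, False, False),  # propose only
    Mode.SANDBOX: Permissions(True, True, False),
    Mode.AUTO_PROMOTE: Permissions(True, True, True),
    Mode.PORTFOLIO: Permissions(True, True, True),
}

# Modes that can promote on their own are the ones that need sign-off.
NEEDS_APPROVAL = {Mode.AUTO_PROMOTE, Mode.PORTFOLIO}


def resolve_mode(requested: Optional[Mode] = None,
                 user_approved: bool = False) -> Mode:
    """Default to sandbox; escalate only with explicit user approval."""
    if requested is None:
        return Mode.SANDBOX
    if requested in NEEDS_APPROVAL and not user_approved:
        return Mode.SANDBOX
    return requested


print(resolve_mode(Mode.AUTO_PROMOTE))                      # falls back
print(resolve_mode(Mode.AUTO_PROMOTE, user_approved=True))  # escalates
```

Stepping down to suggestion mode needs no approval in this sketch, since it grants strictly fewer permissions than sandbox.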

Safety Rules

Never mutate evaluator and artifact in the same trial
Never promote without constraint checks passing
Never exceed budget — fail closed, not open
Always maintain rollback capability to last promoted state
Log every trial, including failures and rejections
If evaluator is suspected gamed, halt and escalate
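Two of these rules lend themselves to hard guards in code: the budget rule and the rule against changing evaluator and artifact together. The sketch below is one possible shape, with hypothetical names (`TrialBudget`, `BudgetExceeded`, `assert_single_target`) and a trial count standing in for whatever budget unit the problem spec uses.

```python
class BudgetExceeded(RuntimeError):
    """Raised instead of silently running past the budget."""


class TrialBudget:
    def __init__(self, max_trials: int):
        if max_trials < 0:
            raise ValueError("max_trials must be non-negative")
        self.max_trials = max_trials
        self.used = 0

    def charge(self) -> None:
        """Reserve one trial BEFORE it runs; fail closed when exhausted."""
        if self.used >= self.max_trials:
            raise BudgetExceeded(
                f"budget of {self.max_trials} trials is spent")
        self.used += 1


def assert_single_target(changed_paths, evaluator_paths) -> None:
    """Refuse a trial that changes evaluator files and other files together."""
    touched_eval = [p for p in changed_paths if p in evaluator_paths]
    touched_other = [p for p in changed_paths if p not in evaluator_paths]
    if touched_eval and touched_other:
        raise ValueError(
            f"trial mutates evaluator {touched_eval} "
            f"and artifact {touched_other} together")


budget = TrialBudget(max_trials=2)
budget.charge()
budget.charge()
try:
    budget.charge()
except BudgetExceeded as exc:
    print("halting:", exc)
```

Charging the budget before a trial starts, rather than after it finishes, is what makes the failure mode closed: an over-budget trial never begins.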

Domain Adaptation

Read references/DOMAIN-MAPPINGS.md for concrete artifact/harness/evaluator choices per domain.

Formal Model

Read references/REFERENCE.md for full EGRI formal model: Pi = (X, M, H, E, J, C, B, P, L).

Nested Loops and Meta-Optimization

Read references/META-LOOP.md for Level 1-3 loops (policy, portfolio, org).

Scaffold Initialization

python3 scripts/autoany_init.py <project-name> --domain <code|rag|workflow|etl|ui|generic> --path <output-dir>
