optimize

Stars: 16,098

Forks: 2,216

Updated: April 30, 2026, 16:09

Autonomous optimization loop — hill-climb any target. Code with metrics, or skills/prompts/agents with LLM-as-judge eval. USE WHEN optimize, autoresearch, hill climb, improve metric, reduce latency, improve performance, benchmark optimization, bundle size, page speed, autonomous improvement loop, optimize skill, optimize prompt, improve quality, eval mode.

Source

danielmiessler

danielmiessler/LifeOS

SKILL.md

name	Optimize
description	Autonomous optimization loop — hill-climb any target. Code with metrics, or skills/prompts/agents with LLM-as-judge eval. USE WHEN optimize, autoresearch, hill climb, improve metric, reduce latency, improve performance, benchmark optimization, bundle size, page speed, autonomous improvement loop, optimize skill, optimize prompt, improve quality, eval mode.
disable-model-invocation	true
effort	medium

/optimize — Autonomous Optimization v2

Run an autonomous optimization loop against any target. Two modes:

Metric mode — code targets with a shell command that produces a number (the original)
Eval mode — skills, prompts, agents, or any text target judged by LLM-as-judge binary evals

The agent modifies the target, measures the result, keeps improvements, discards failures, and repeats.
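The keep-or-discard loop described above can be sketched in a few lines. This is a minimal illustration rather than the skill's actual implementation: `propose` and `measure` are hypothetical stand-ins for the agent's edit step and the `--measure` command, and the target is a plain number.

```python
from typing import Callable, TypeVar

T = TypeVar("T")


def hill_climb(
    start: T,
    propose: Callable[[T], T],      # hypothetical: returns a modified copy of the target
    measure: Callable[[T], float],  # hypothetical: stands in for the measure command
    higher_is_better: bool = True,
    max_experiments: int = 50,
) -> tuple[T, float]:
    """Keep a candidate only when it strictly improves the metric."""
    best, best_score = start, measure(start)  # baseline measurement
    for _ in range(max_experiments):
        candidate = propose(best)
        score = measure(candidate)
        improved = score > best_score if higher_is_better else score < best_score
        if improved:
            best, best_score = candidate, score  # keep the change
        # otherwise the candidate is dropped and `best` stays as it was
    return best, best_score


# Toy run: push an integer toward 5 by proposing +1 each time.
best, score = hill_climb(0, lambda x: x + 1, lambda x: -abs(x - 5), max_experiments=20)
print(best, score)  # 5 0
```

Because a candidate is only ever compared against the current best, a rejected experiment leaves no trace in the target, which is what makes long unattended runs safe to resume.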

Inspired by Karpathy's autoresearch and extended with LLM-as-judge evaluation.

Invocation

Metric Mode (code targets)

/optimize --metric "lighthouse_score" --higher-is-better \
  --measure "npx lighthouse http://localhost:3000 --output=json --output-path=lighthouse.json" \
  --extract "jq '.categories.performance.score * 100' lighthouse.json" \
  --files "src/**/*.tsx,src/**/*.css" \
  --budget 120

/optimize --resume        # Resume a previous optimization loop
/optimize --status        # Show results summary from last/current run

Eval Mode (skill/prompt/agent targets)

/optimize --target "~/.claude/skills/ExtractWisdom"
/optimize --target "~/.claude/skills/Research/Workflows/QuickResearch.md"
/optimize --target "prompts/my-prompt.md"
/optimize --target "~/.claude/skills/ExtractWisdom" --max-experiments 20

In eval mode, the system automatically:

Detects the target type (skill, prompt, agent, code, function)
Reads the target to understand its purpose and constraints
Generates 3-6 binary eval criteria and 3-5 test inputs
Presents criteria + inputs for your approval before starting
Runs the optimization loop using LLM-as-judge scoring
Presents a recommendation (apply/reject/partial) when done
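One way to picture binary LLM-as-judge scoring: every (test input, criterion) pair yields a yes/no verdict, and the score is the fraction of yes verdicts. The sketch below assumes that aggregation rule, which the source does not spell out. `run_target` and `judge` are hypothetical callables standing in for model calls, and `keyword_judge` is a toy substitute so the example runs offline.

```python
from typing import Callable

# Hypothetical judge signature: (criterion, output_text) -> True if the criterion holds.
Judge = Callable[[str, str], bool]


def eval_score(
    run_target: Callable[[str], str],  # hypothetical: produces the target's output for an input
    inputs: list[str],
    criteria: list[str],
    judge: Judge,
) -> float:
    """Fraction of (input, criterion) pairs the judge marks as passing."""
    verdicts = [
        judge(criterion, run_target(test_input))
        for test_input in inputs
        for criterion in criteria
    ]
    return sum(verdicts) / len(verdicts)


def keyword_judge(criterion: str, output: str) -> bool:
    """Toy judge for offline use: passes when the criterion word appears in the output."""
    return criterion.lower() in output.lower()


score = eval_score(
    run_target=lambda text: f"Summary with sources: {text}",
    inputs=["alpha", "beta"],
    criteria=["sources", "sections"],
    judge=keyword_judge,
)
print(score)  # 2 of 4 verdicts pass -> 0.5
```

Binary criteria keep each verdict cheap to judge and easy to audit, at the cost of needing several criteria to capture nuance.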

What Happens

This skill runs the PAI Algorithm with mode: optimize. Each Algorithm phase maps onto the optimization run as follows:

- OBSERVE: Define or auto-detect the target, set eval_mode
- THINK: Analyze codebase/skill, generate hypothesis queue
- PLAN: Prioritize hypotheses by expected impact
- BUILD: Phase 0, TARGET ANALYSIS (see optimize-loop.md)
  - Detect target type, auto-generate eval criteria (eval mode), set up sandbox, baseline
- EXECUTE: The autonomous loop (optimize-loop.md)
  - Hypothesize → Modify target → Measure (metric or eval) → Keep/Revert → Repeat
  - Metric mode: ~12 experiments/hour (at 5-min budget)
  - Eval mode: ~6-8 experiments/hour (multi-run judging is slower)
- VERIFY: Phase 9, RECOMMEND (diff, summary, apply/reject/partial options)
- LEARN: Phase 10, EXTRACT LEARNINGS (what worked, what didn't, structured insights)
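The metric-mode throughput figure follows from the per-experiment budget: a 300-second budget fits 3600 / 300 = 12 experiments into an hour. A small sketch of that arithmetic, assuming every experiment consumes its whole budget (real runs also spend time on hypothesis generation and edits, and may finish early):

```python
def experiments_per_hour(budget_seconds: int) -> float:
    """Experiments per hour when every experiment uses its full time budget."""
    if budget_seconds <= 0:
        raise ValueError("budget must be positive")
    return 3600 / budget_seconds


print(experiments_per_hour(300))  # 12.0, the default budget
print(experiments_per_hour(120))  # 30.0, as with --budget 120
```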

Arguments — Metric Mode

| Argument | Required | Default | Description |
|---|---|---|---|
| `--metric NAME` | yes | | Human-readable metric name |
| `--measure COMMAND` | yes | | Shell command that produces the metric |
| `--files GLOB` | yes | | Files the agent may modify (comma-separated) |
| `--higher-is-better` | | (default) | Higher metric values are better |
| `--lower-is-better` | | | Lower metric values are better |
| `--extract COMMAND` | | Last number in stdout | Extract metric from output |
| `--budget SECONDS` | | 300 | Time budget per experiment |
| `--target VALUE` | | none | Stop when metric reaches this value |
| `--max-experiments N` | | none | Stop after N experiments |
| `--locked GLOB` | | none | Files the agent must NOT modify |
| `--constraints TEXT` | | none | Additional rules (e.g., "tests must pass") |
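When `--extract` is omitted, the metric defaults to the last number in stdout. The source does not say how that number is parsed, so the following is only a plausible sketch: the regular expression and the `last_number` helper are assumptions, not the skill's code.

```python
import re
from typing import Optional

# Matches integers, decimals, and exponents such as 42, -3.5, or 1e-3.
_NUMBER = re.compile(r"[-+]?(?:\d+\.?\d*|\.\d+)(?:[eE][-+]?\d+)?")


def last_number(stdout: str) -> Optional[float]:
    """Return the last numeric token in the text, or None if there is none."""
    matches = _NUMBER.findall(stdout)
    return float(matches[-1]) if matches else None


print(last_number("build finished\n123456\n"))  # 123456.0
print(last_number("no metric here"))            # None
```

A measure command whose output ends with unrelated digits (a timestamp, a version string) would defeat this default, which is the case for passing an explicit `--extract`.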

Arguments — Eval Mode

| Argument | Required | Default | Description |
|---|---|---|---|
| `--target PATH` | yes | | Path to skill directory, prompt file, or agent definition |
| `--max-experiments N` | | none | Stop after N experiments |
| `--runs N` | | 3 | Runs per experiment (more = more reliable, slower) |
| `--criteria "Q1" "Q2"` | | auto-generated | Override auto-generated eval criteria |
| `--inputs "I1" "I2"` | | auto-generated | Override auto-generated test inputs |
| `--budget SECONDS` | | 300 | Time budget per experiment |
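The trade-off behind `--runs` can be made concrete with a rough noise estimate. The sketch below assumes the eval score is a pass rate over runs × criteria × inputs binary verdicts and that verdicts are independent. Both are simplifying assumptions, not documented behavior.

```python
import math


def pass_rate_standard_error(pass_rate: float, runs: int, criteria: int, inputs: int) -> float:
    """Standard error of a pass rate over runs * criteria * inputs binary verdicts.

    Assumes independent verdicts, which real judge outputs only approximate.
    """
    n = runs * criteria * inputs
    if n <= 0:
        raise ValueError("need at least one verdict")
    return math.sqrt(pass_rate * (1.0 - pass_rate) / n)


for runs in (1, 3, 5):
    se = pass_rate_standard_error(0.5, runs, criteria=4, inputs=4)
    print(f"runs={runs}: +/-{se:.3f}")
```

Under these assumptions the noise shrinks with the square root of the number of runs, so going from 3 to 5 runs helps less than going from 1 to 3 while costing the same extra judging time per run.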

Shared Arguments

| Argument | Description |
|---|---|
| `--resume` | Resume a previous optimization run |
| `--status` | Show results summary |

Algorithm Integration

When /optimize is invoked, the Algorithm enters with mode: optimize in the ISA frontmatter. The eval_mode is set based on arguments:

--measure provided → eval_mode: metric (git branch sandbox); here --target VALUE, if present, is only a stop condition
--target PATH provided without --measure → eval_mode: eval (directory sandbox)
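The mode selection can be sketched as a small function. Because metric mode also accepts `--target VALUE` as a stop condition, the sketch checks `--measure` first. That precedence is inferred from the examples in this document, and the `args` dictionary is a hypothetical representation of the parsed flags.

```python
def select_eval_mode(args: dict) -> str:
    """Pick the mode from parsed arguments (hypothetical dict of flag name -> value)."""
    if "measure" in args:
        return "metric"  # runs in a git branch sandbox
    if "target" in args:
        return "eval"    # runs in a directory sandbox
    raise ValueError("provide --measure (metric mode) or --target (eval mode)")


print(select_eval_mode({"measure": "bun run build", "target": 95}))  # metric
print(select_eval_mode({"target": "prompts/my-prompt.md"}))          # eval
```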

ISC criteria become guard rails: assertions that must hold across ALL experiments, for the entire run. A violation triggers an automatic revert, regardless of any score improvement.
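The guard-rail rule amounts to a two-part acceptance test: every assertion must hold, and only then does the score matter. A minimal sketch, in which each guard rail is a hypothetical zero-argument check (for example, a wrapper around a test suite):

```python
from typing import Callable, Sequence

# Hypothetical guard rail: returns True while its assertion still holds.
GuardRail = Callable[[], bool]


def should_keep(
    new_score: float,
    best_score: float,
    guard_rails: Sequence[GuardRail],
    higher_is_better: bool = True,
) -> bool:
    """Keep an experiment only if every guard rail holds AND the score improved."""
    if not all(check() for check in guard_rails):
        return False  # violation: revert regardless of the score
    return new_score > best_score if higher_is_better else new_score < best_score


tests_pass = lambda: True
api_unchanged = lambda: False  # simulated violation
print(should_keep(99.0, 50.0, [tests_pass, api_unchanged]))  # False
print(should_keep(60.0, 50.0, [tests_pass]))                 # True
```

Checking guard rails before comparing scores means a large improvement can never outweigh a broken invariant.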

Reference files:

~/.claude/PAI/ALGORITHM/optimize-loop.md — the full loop protocol
~/.claude/PAI/ALGORITHM/eval-guide.md — how to write good eval criteria
~/.claude/PAI/ALGORITHM/target-types.md — target detection and ISC generation

Examples

Metric Mode

Optimize page speed (Lighthouse performance score):

/optimize --metric "lighthouse_perf" --higher-is-better \
  --measure "npx lighthouse http://localhost:3000 --output=json --output-path=lh.json" \
  --extract "jq '.categories.performance.score * 100' lh.json" \
  --files "src/**/*.tsx,src/**/*.css" \
  --target 95 --budget 120

Optimize bundle size:

/optimize --metric "bundle_bytes" --lower-is-better \
  --measure "bun run build 2>&1 && du -sb dist/ | cut -f1" \
  --files "src/**/*.ts" \
  --constraints "all tests must pass"
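The `du -sb` call in the measure command relies on GNU coreutils; the `-b` flag is not available in the BSD `du` shipped with macOS. A portable alternative is a small script that sums file sizes, sketched below. The script name `dir_bytes.py` is hypothetical, and the total can differ slightly from `du`, which also counts directory entries.

```python
import os
import sys


def directory_bytes(root: str) -> int:
    """Sum the sizes of all regular files under root (symlinks are skipped)."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if not os.path.islink(path):
                total += os.path.getsize(path)
    return total


if __name__ == "__main__":
    # Prints one number so the default "last number in stdout" extraction applies.
    print(directory_bytes(sys.argv[1] if len(sys.argv) > 1 else "dist"))
```

Saved as `dir_bytes.py`, it could replace the `du` pipeline with `--measure "bun run build 2>&1 && python3 dir_bytes.py dist"`.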

ML training (Karpathy-style):

/optimize --metric "val_bpb" --lower-is-better \
  --measure "uv run train.py > run.log 2>&1 && grep '^val_bpb:' run.log | cut -d' ' -f2" \
  --files "train.py" \
  --locked "prepare.py" \
  --budget 300

Eval Mode

Optimize a skill's Extract workflow:

/optimize --target "~/.claude/skills/ExtractWisdom" --max-experiments 15

Optimize a standalone prompt:

/optimize --target "prompts/summarize-article.md" --runs 5

Optimize with custom criteria:

/optimize --target "~/.claude/skills/Research/Workflows/QuickResearch.md" \
  --criteria "Does the output contain specific facts with sources?" \
            "Is the output structured with clear sections?" \
            "Does the output avoid generic filler?" \
  --inputs "research quantum computing breakthroughs 2025" \
           "quick research on supply chain security" \
           "find recent developments in AI agents"

Gotchas

Hill-climbing can get stuck in local optima. If the score plateaus, consider restarting from different initial conditions.
Eval mode vs. metric mode: use metric mode for quantifiable targets (latency, size) and eval mode for qualitative targets (skill quality, prompt effectiveness).
Regression tolerance prevents catastrophic changes. Don't set it to 0: some regression in secondary metrics is acceptable if the primary metric improves significantly.
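The local-optimum problem, and the effect of restarting from different initial conditions, shows up even on a toy one-dimensional landscape. The functions below are illustrative only: `two_peaks` is an invented score with a low peak and a high peak, and the restart wrapper is a generic technique rather than something the skill is documented to do.

```python
from typing import Callable, Sequence


def climb(start: float, score: Callable[[float], float], step: float, iters: int) -> float:
    """Greedy local search: move to a neighbor only when it scores strictly higher."""
    x = start
    for _ in range(iters):
        for neighbor in (x - step, x + step):
            if score(neighbor) > score(x):
                x = neighbor
    return x


def climb_with_restarts(
    score: Callable[[float], float],
    starts: Sequence[float],
    step: float = 0.1,
    iters: int = 200,
) -> float:
    """Run the greedy search from several starting points and keep the best result."""
    return max((climb(s, score, step, iters) for s in starts), key=score)


def two_peaks(x: float) -> float:
    """Toy landscape: local peak of height 1 at x = -2, global peak of height 3 at x = 2."""
    return max(1 - (x + 2) ** 2, 3 - (x - 2) ** 2)


stuck = climb(-3.0, two_peaks, step=0.1, iters=200)   # ends near the local peak at -2
best = climb_with_restarts(two_peaks, [-3.0, 1.0])    # a second start finds the peak at 2
print(round(stuck, 1), round(best, 1))
```

A single run from the left never crosses the valley between the peaks, while the same search started on the other side reaches the higher one.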
