Jeden Skill in Manus ausführen
mit einem Klick

Jeden Skill in Manus mit einem Klick ausführen

alphaproof-nexus

Knowledge scaffold for building, using, or reasoning about AlphaProof Nexus — Google DeepMind's LLM-aided formal proof search system (arXiv:2605.22763). Always use this skill for ANY of the following: AI-driven theorem proving in Lean 4, reproducing or extending the AlphaProof Nexus agent architecture, solving open mathematics problems with formal verification, integrating evolutionary algorithms with LLM proof search, applying the system to Erdős problems / OEIS conjectures / algebraic geometry / optimization / graph theory, understanding the EVOLVE-BLOCK / EVOLVE-VALUE prompt interface, comparing the four agent configurations (A/B/C/D), or the Elo/P-UCB sketch rating mechanism. Also trigger for adjacent queries like "automate math research with AI", "connect Lean compiler feedback to an LLM loop", "cheapest way to prove hard math with AI", "reproduce a DeepMind theorem prover", "LLM + formal verification pipeline", or anything about AlphaProof, AlphaEvolve applied to proofs, or the Formal Conjectures benchm

In Manus ausführen

Überblick

Installationsbefehl

npx skills add https://github.com/daedalus/skills --skill alphaproof-nexus

Kopieren Sie diesen Befehl und fügen Sie ihn in Claude Code ein, um den Skill zu installieren

Quelle

daedalus/skills

Sterne1

Forks0

Aktualisiert24. Mai 2026 um 03:57

SKILL.md

readonly

Mehr aus diesem Repository

gleiches Repository

ai-code-review

daedalus/skills

Orchestrate multi-agent AI code review on a git diff or merge request. Use this skill whenever the user wants to review code changes with AI, analyze a diff, audit a pull request or merge request, check for bugs/security issues/performance problems, or set up an automated code review pipeline. Trigger even for casual phrasing like "can you review this PR", "check my diff for issues", "look over these changes", or "what do you think of this code change". Always use this skill when code review, diff analysis, or MR/PR review is involved — do not attempt ad-hoc review without it. Do NOT trigger for reviewing prose, essays, documentation-only files, or non-code content.

2026-05-301

adhd-reasoning-mode

daedalus/skills

Apply exploratory, curiosity-driven reasoning inspired by ADHD-associated cognitive traits — including curiosity-biased attention, associative jumps across distant domains, interrupt-driven anomaly detection, hyperfocus under uncertainty, and parallel weak-stream ideation. Use this skill whenever the user asks for: creative brainstorming, cross-domain analogies, unconventional problem-solving, research hypothesis generation, adversarial/security thinking, scientific discovery tasks, or any time the user says "think outside the box", "what am I missing", "explore weird angles", "be creative", "ADHD mode", or "exploratory reasoning". Also trigger when a conventional answer would be too narrow, too domain-local, or when the problem space benefits from wide associative search before convergence. Trigger mid-task too: if reasoning has stayed in one domain for several steps without surprise, this skill applies even if it wasn't requested upfront.

2026-05-281

python-project-scaffold

daedalus/skills

Full Python project bootstrapping workflow. Use this skill whenever the user wants to build a new Python tool, library, CLI, or module from scratch — especially when they mention "create X", "build X in Python", "write a Python project for X", or ask for a proper project with tests, linting, versioning, or git setup. Triggers on any request to scaffold, initialize, or structure a new Python project. Even if the user only says "build me X in Python", apply this skill — it encodes the full professional workflow: SPEC → implementation → pytest → README → lint → git. Always use this skill rather than improvising a one-off script when the deliverable is a reusable project.

2026-05-261

os-bootstrap

daedalus/skills

Bootstrap the creation of a POSIX-like operating system kernel from scratch. Use this skill whenever someone wants to build, start, or plan a kernel or OS — including requests like "help me write an OS", "I want to build a kernel", "start an operating system project", "implement POSIX syscalls", "build a process scheduler", "write a VFS layer", "implement memory management for my kernel", "create a bootable system", or any request involving kernel internals (interrupts, paging, scheduling, file systems, system calls). Also trigger when someone wants to extend an existing hobby OS with a new kernel subsystem. This skill covers both project scaffolding AND deep technical implementation guidance — use it for either or both.

2026-05-241

ai-code-detection

daedalus/skills

Detect whether a piece of code or an entire software project was written by a human, AI, or some hybrid thereof. Use this skill whenever the user wants to audit a file, snippet, repo, or commit history for AI authorship signals; phrases like "is this AI-generated", "was this written by ChatGPT", "detect LLM code", "human or AI?", "check for AI authorship", "is this vibe-coded", or any request to judge, score, or explain the provenance of code. Also trigger when the user pastes code and asks "did a human write this?" or "does this look AI-generated?" — even casually phrased.

2026-05-191

social-engineering-jailbreak

daedalus/skills

Analyze, reproduce, and defend against social engineering jailbreaks on LLMs — attacks that exploit psychological compliance patterns rather than technical prompt injection. Use this skill whenever the user wants to: map a manipulation sequence move-by-move, generate a social engineering attack transcript against a target model, evaluate a model's resistance to psychosocial pressure, audit a conversation for coercive structure, or build robustness evals for manipulation-vector attacks. Trigger on phrases like: "jailbreak without injection", "psychopathy jailbreak", "social engineering an LLM", "manipulation sequence", "coercive compliance", "identity reframe", "authority jailbreak", "test model against social pressure", "does the model resist gaslighting", "analyze this conversation for manipulation", "incremental escalation", "grooming pattern", "commitment and consistency exploit", "why did the model comply", or any request to understand why an LLM failed to hold a boundary under conversational pressure rat

2026-05-191

Quelle

daedalus

daedalus/skills

GitHub-Repository öffnen Creator-Repositorys ansehen

Installationsbefehl

Download

In Manus ausführen

Nützlich fürSOC

SoftwareentwicklerInformatik- und Mathematikberufe15-1252L4

name

alphaproof-nexus

description

AlphaProof Nexus — Skill

Overview

AlphaProof Nexus is a framework from Google DeepMind (May 2026, arXiv:2605.22763) that combines frontier LLMs with the Lean 4 proof assistant and an optional evolutionary search layer to autonomously solve open mathematics research problems. Key results: 9/353 open Erdős problems solved, 44/492 OEIS conjectures proved, plus contributions to algebraic geometry, convex optimization, graph theory, and quantum optics — all at a cost of a few hundred USD per problem.

Core Concepts

Lean 4 as Verifier

Every proof step is machine-checked; no hallucinations can silently survive.
The sorry tactic closes goals without proof — a solved problem has zero sorry occurrences.
The Lean compiler returns structured error messages reused as LLM feedback.

Proof Sketch Interface

The user supplies a .lean file with:

The target theorem (with sorry as placeholder) — never modified by the agent.
-- EVOLVE-BLOCK-START / END markers: regions the agent may rewrite freely (helper lemmas, definitions, tactic blocks).
-- EVOLVE-VALUE-START / END markers: single scalar expressions or answer terms (e.g., True/False, a numerical bound, an algorithm step-size) the agent may substitute. Not for arbitrary code — use EVOLVE-BLOCK for that.
Optional natural-language context and domain knowledge encoded in Lean.

The agent outputs a sorry-free proof inside those markers, touching nothing else.

OEIS / sequence conjectures: Before attempting the main conjecture, require the agent to prove "test lemmas" that verify the first few terms of the sequence against its formal definition. This guards against misformalization silently producing a trivially-provable but wrong statement.

Agent Architecture (A → D)

Agent	Core loop	AlphaProof tool	Evolution
(A) Basic	Ralph loop (multi-turn LLM + search-replace + Lean compiler)	✗	✗
(B) Basic + AP	Same as A	✓	✗
(C) Basic + Evolution	Shares population DB; Elo-rated sketches	✗	✓
(D) Full	All of the above; coordinates via P-UCB sampling	✓	✓

Default recommendation: Use (B) for most problems (cost-efficient, high solve rate). Use (D) for problems where simpler agents plateau (e.g., Erdős #125 required the evolutionary layer for reliable convergence). Avoid (C) alone — evolution without AlphaProof is generally dominated by (A)/(B) on both cost and solve rate; the evolutionary machinery only pays off when combined with AlphaProof in (D).

Ralph Loop (Basic subagent)

episode:
  LLM session (Gemini 3.1 Pro, chain-of-thought)
    ↓ search_replace tool → edit sketch
    ↓ Lean compiler → error / success feedback
    ↓ repeat until budget or sorry-free
  if still sorry: write lessons-learned comment → next episode

N subagents run independently in parallel; first success terminates all others.

Full Agent (D) Controller Loop

DB sampling — select root sketch S_root + 2 inspiration sketches via P-UCB.
Prompt assembly — problem spec + Lean source + prior AlphaProof feedback + diversity injection ("decompose unsolved goals", "try new approach", …).
Prover subagent — multi-turn LLM episode; can call AlphaProof mid-episode (max 5 calls, 90 search-replace edits per episode).
Validation — SafeVerify checks: no axiom injection, no altered theorem statement, proof compiles.
DB registration — sketch + AlphaProof subgoal feedback added; Elo updated asynchronously by rater subagents.

AlphaProof Integration

Called with unsolved Lean subgoals; returns proof / disproof / failure message.
Runs in low-compute tree-search inference mode (~400 simulations, bounded RPC timeout).
Global goal cache: subgoals hashed by exact Lean state; proven/disproven results reused across the population without re-querying.
Estimated cost: ~$60 USD per problem on v6e TPUs (not included in reported LLM costs).

Elo Rating System

Rater subagents (Gemini 3.0 Flash) run P=7-sketch tournaments continuously.
Ranking model: Plackett-Luce with hierarchical Gamma prior; Gibbs sampling (I=1000, B=200 burn-in).
Elo formula: Elo = 1200 + 400·log10(λ_mean)
Criteria (descending priority): strategic robustness & generalizability, decomposition quality of remaining sorry gaps, logical correctness.
Key rater insight: "AlphaProof failure ≠ bad sketch." A sketch with well-chosen sorry gaps beats one with no gaps but a dead-end strategy.

P-UCB Sampling

score = q + c · sqrt(ΣV_i / (v + 1))

q = Elo normalized to [0,1] over top-64 sketches.
v = visit count for this sketch; ΣV_i = total visits in elite tier.
c = 0.2 (empirically tuned).
Prevents collapse to a single lineage while exploiting top candidates.

Cost & Scaling Guidelines

Config	Parallel subagents K	Typical cost range	Best for
A/B (K=1–10)	Independent reruns	$10–$600	Easy–medium problems
D (K=10)	10 async generators	$50–$1000+	Hard problems

Per-problem cost has high variance (stochastic search); budget for 3–5× median.
Exploring all 353 Erdős problems to identify tractable ones was itself a large compute investment — factor this into project planning.
Smaller models (Gemini 3.0 Flash, Gemini 3.1 Flash-Lite) as provers: 0/9 Erdős problems solved — don't substitute for 3.1 Pro on the prover.
AlphaProof standalone tree search (64 TPU-hours/problem): 0/9 solved — needs LLM orchestration.

Prompt Engineering Notes

Prover Subagent Prompt (Agent D) — Key Directives

Think like a mathematician: focus on key insights, not brute-force casework.
Decompose hard subgoals into simpler lemmas before calling AlphaProof — monolithic goals fail.
Use have for intermediate steps; let for local definitions.
Aim for 8–10 search_replace calls per episode; use them exhaustively.
Never use a single sorry to cover multiple reasoning steps.
Write Lean comments (--) explaining high-level intuition alongside each step.
Do NOT add import statements — Mathlib is loaded by default.

Rater Prompt — Key Directives

Rank from best to worst using <decision>2 > 1 = 3</decision> format.
Penalize "overfitting" (brute-force computation that won't generalize).
Reward diversity of strategy across the population.
Ignore timeouts/service errors in feedback.

Basic Agent Prompt — Key Directives

George Dantzig framing: approach with confidence, as if the problem is solvable.
Never end a session with non-compiling code.
If stuck, explore specializations/generalizations as helper lemmas for insight.
All findings/plans go as Lean comments — only file content persists to the next episode.

Proven Results by Domain

Domain	Result
Erdős Problems	9/353 solved (incl. two open since 1970: #12(i), #12(ii))
OEIS	44/492 open conjectures proved
Optimization	O(1/t) convergence of Anchored GDA (agent simultaneously found proof + novel learning schedule)
Graph Theory	Bipartite reconstruction conjecture variant; Graffiti conjecture #2 (spanning tree leaves)
Algebraic Geometry	Log-concavity of pure O-sequences, codimension 3 type 2 (open ~15 years)
Additive Combinatorics	Counterexample to Green's problem #57 (complex-valued case)
Quantum Optics	Monochromatic quantum graph existence for N=d∈{4,6,10}

Known Failure Modes

Circular sorry: Agent offloads core difficulty into a helper lemma that simply restates the target. Explicit prompting against this helps but doesn't eliminate it.
Hallucinated lemmas: Agent cites non-existent established results as sorry-ed lemmas. End-to-end formal verification is the only reliable filter.
Misformalization: Ambiguous natural-language terms (e.g., "density" vs. "lower density" vs. "upper density") can lead to wrong Lean statements. Agent can serve as a diagnostic tool — if it proves the wrong version easily, check the formalization.
Scope limitation: Current successes concentrated where Mathlib is mature (combinatorics, number theory, convex optimization). Problems requiring extensive new theory remain out of reach.

Implementation Details

Language: Python with asyncio; Docker sandboxes for Lean compilation (Lean v4.27 + Pantograph).
Lean interface: Pantograph for stateful proof interaction; SafeVerify for final integrity checks.
LLM backend: Gemini 3.1 Pro (prover), Gemini 3.0 Flash (rater).
Code repo: https://github.com/google-deepmind/alphaproof-nexus-results

Quick Reference: EVOLVE Marker Usage

-- EVOLVE-BLOCK-START
-- Helper lemmas and definitions go here
-- EVOLVE-BLOCK-END

theorem target_theorem : answer(
  -- EVOLVE-VALUE-START
  default          -- ← agent fills in the answer value
  -- EVOLVE-VALUE-END
) ↔ <condition> := by
  -- EVOLVE-BLOCK-START
  sorry            -- ← agent fills in the proof
  -- EVOLVE-BLOCK-END

For optimization problems, place the algorithm parameter inside EVOLVE-VALUE so the agent jointly discovers proof + optimal parameter.

References

Paper: arXiv:2605.22763 (Tsoukalas et al., Google DeepMind, May 2026)
AlphaProof (RL-based olympiad prover): arXiv via Hubert et al., Nature 2025
AlphaEvolve (evolutionary algorithm backbone): arXiv:2506.13131
Formal Conjectures repo (Erdős/OEIS Lean formalizations): github.com/google-deepmind/formal-conjectures
Terence Tao's AI contributions wiki: github.com/teorth/erdosproblems/wiki/AI-contributions
ErdosProblems catalog: erdosproblems.com