| name | alphaproof-nexus |
| description | Knowledge scaffold for building, using, or reasoning about AlphaProof Nexus — Google DeepMind's LLM-aided formal proof search system (arXiv:2605.22763). Always use this skill for ANY of the following: AI-driven theorem proving in Lean 4, reproducing or extending the AlphaProof Nexus agent architecture, solving open mathematics problems with formal verification, integrating evolutionary algorithms with LLM proof search, applying the system to Erdős problems / OEIS conjectures / algebraic geometry / optimization / graph theory, understanding the EVOLVE-BLOCK / EVOLVE-VALUE prompt interface, comparing the four agent configurations (A/B/C/D), or the Elo/P-UCB sketch rating mechanism. Also trigger for adjacent queries like "automate math research with AI", "connect Lean compiler feedback to an LLM loop", "cheapest way to prove hard math with AI", "reproduce a DeepMind theorem prover", "LLM + formal verification pipeline", or anything about AlphaProof, AlphaEvolve applied to proofs, or the Formal Conjectures benchmark.
|
AlphaProof Nexus — Skill
Overview
AlphaProof Nexus is a framework from Google DeepMind (May 2026, arXiv:2605.22763)
that combines frontier LLMs with the Lean 4 proof assistant and an optional
evolutionary search layer to autonomously solve open mathematics research problems.
Key results: 9/353 open Erdős problems solved, 44/492 OEIS conjectures proved,
plus contributions to algebraic geometry, convex optimization, graph theory, and
quantum optics — all at a cost of a few hundred USD per problem.
Core Concepts
Lean 4 as Verifier
- Every proof step is machine-checked; no hallucinations can silently survive.
- The
sorry tactic closes goals without proof — a solved problem has zero sorry occurrences.
- The Lean compiler returns structured error messages reused as LLM feedback.
Proof Sketch Interface
The user supplies a .lean file with:
- The target theorem (with
sorry as placeholder) — never modified by the agent.
-- EVOLVE-BLOCK-START / END markers: regions the agent may rewrite freely (helper lemmas, definitions, tactic blocks).
-- EVOLVE-VALUE-START / END markers: single scalar expressions or answer terms (e.g., True/False, a numerical bound, an algorithm step-size) the agent may substitute. Not for arbitrary code — use EVOLVE-BLOCK for that.
- Optional natural-language context and domain knowledge encoded in Lean.
The agent outputs a sorry-free proof inside those markers, touching nothing else.
OEIS / sequence conjectures: Before attempting the main conjecture, require the agent to prove "test lemmas" that verify the first few terms of the sequence against its formal definition. This guards against misformalization silently producing a trivially-provable but wrong statement.
Agent Architecture (A → D)
| Agent | Core loop | AlphaProof tool | Evolution |
|---|
| (A) Basic | Ralph loop (multi-turn LLM + search-replace + Lean compiler) | ✗ | ✗ |
| (B) Basic + AP | Same as A | ✓ | ✗ |
| (C) Basic + Evolution | Shares population DB; Elo-rated sketches | ✗ | ✓ |
| (D) Full | All of the above; coordinates via P-UCB sampling | ✓ | ✓ |
Default recommendation: Use (B) for most problems (cost-efficient, high solve rate). Use (D) for problems where simpler agents plateau (e.g., Erdős #125 required the evolutionary layer for reliable convergence). Avoid (C) alone — evolution without AlphaProof is generally dominated by (A)/(B) on both cost and solve rate; the evolutionary machinery only pays off when combined with AlphaProof in (D).
Ralph Loop (Basic subagent)
episode:
LLM session (Gemini 3.1 Pro, chain-of-thought)
↓ search_replace tool → edit sketch
↓ Lean compiler → error / success feedback
↓ repeat until budget or sorry-free
if still sorry: write lessons-learned comment → next episode
N subagents run independently in parallel; first success terminates all others.
Full Agent (D) Controller Loop
- DB sampling — select root sketch S_root + 2 inspiration sketches via P-UCB.
- Prompt assembly — problem spec + Lean source + prior AlphaProof feedback + diversity injection ("decompose unsolved goals", "try new approach", …).
- Prover subagent — multi-turn LLM episode; can call AlphaProof mid-episode (max 5 calls, 90 search-replace edits per episode).
- Validation — SafeVerify checks: no axiom injection, no altered theorem statement, proof compiles.
- DB registration — sketch + AlphaProof subgoal feedback added; Elo updated asynchronously by rater subagents.
AlphaProof Integration
- Called with unsolved Lean subgoals; returns proof / disproof / failure message.
- Runs in low-compute tree-search inference mode (~400 simulations, bounded RPC timeout).
- Global goal cache: subgoals hashed by exact Lean state; proven/disproven results reused across the population without re-querying.
- Estimated cost: ~$60 USD per problem on v6e TPUs (not included in reported LLM costs).
Elo Rating System
- Rater subagents (Gemini 3.0 Flash) run P=7-sketch tournaments continuously.
- Ranking model: Plackett-Luce with hierarchical Gamma prior; Gibbs sampling (I=1000, B=200 burn-in).
- Elo formula:
Elo = 1200 + 400·log10(λ_mean)
- Criteria (descending priority): strategic robustness & generalizability, decomposition quality of remaining
sorry gaps, logical correctness.
- Key rater insight: "AlphaProof failure ≠ bad sketch." A sketch with well-chosen
sorry gaps beats one with no gaps but a dead-end strategy.
P-UCB Sampling
score = q + c · sqrt(ΣV_i / (v + 1))
q = Elo normalized to [0,1] over top-64 sketches.
v = visit count for this sketch; ΣV_i = total visits in elite tier.
c = 0.2 (empirically tuned).
- Prevents collapse to a single lineage while exploiting top candidates.
Cost & Scaling Guidelines
| Config | Parallel subagents K | Typical cost range | Best for |
|---|
| A/B (K=1–10) | Independent reruns | $10–$600 | Easy–medium problems |
| D (K=10) | 10 async generators | $50–$1000+ | Hard problems |
- Per-problem cost has high variance (stochastic search); budget for 3–5× median.
- Exploring all 353 Erdős problems to identify tractable ones was itself a large compute investment — factor this into project planning.
- Smaller models (Gemini 3.0 Flash, Gemini 3.1 Flash-Lite) as provers: 0/9 Erdős problems solved — don't substitute for 3.1 Pro on the prover.
- AlphaProof standalone tree search (64 TPU-hours/problem): 0/9 solved — needs LLM orchestration.
Prompt Engineering Notes
Prover Subagent Prompt (Agent D) — Key Directives
- Think like a mathematician: focus on key insights, not brute-force casework.
- Decompose hard subgoals into simpler lemmas before calling AlphaProof — monolithic goals fail.
- Use
have for intermediate steps; let for local definitions.
- Aim for 8–10
search_replace calls per episode; use them exhaustively.
- Never use a single
sorry to cover multiple reasoning steps.
- Write Lean comments (
--) explaining high-level intuition alongside each step.
- Do NOT add
import statements — Mathlib is loaded by default.
Rater Prompt — Key Directives
- Rank from best to worst using
<decision>2 > 1 = 3</decision> format.
- Penalize "overfitting" (brute-force computation that won't generalize).
- Reward diversity of strategy across the population.
- Ignore timeouts/service errors in feedback.
Basic Agent Prompt — Key Directives
- George Dantzig framing: approach with confidence, as if the problem is solvable.
- Never end a session with non-compiling code.
- If stuck, explore specializations/generalizations as helper lemmas for insight.
- All findings/plans go as Lean comments — only file content persists to the next episode.
Proven Results by Domain
| Domain | Result |
|---|
| Erdős Problems | 9/353 solved (incl. two open since 1970: #12(i), #12(ii)) |
| OEIS | 44/492 open conjectures proved |
| Optimization | O(1/t) convergence of Anchored GDA (agent simultaneously found proof + novel learning schedule) |
| Graph Theory | Bipartite reconstruction conjecture variant; Graffiti conjecture #2 (spanning tree leaves) |
| Algebraic Geometry | Log-concavity of pure O-sequences, codimension 3 type 2 (open ~15 years) |
| Additive Combinatorics | Counterexample to Green's problem #57 (complex-valued case) |
| Quantum Optics | Monochromatic quantum graph existence for N=d∈{4,6,10} |
Known Failure Modes
- Circular sorry: Agent offloads core difficulty into a helper lemma that simply restates the target. Explicit prompting against this helps but doesn't eliminate it.
- Hallucinated lemmas: Agent cites non-existent established results as
sorry-ed lemmas. End-to-end formal verification is the only reliable filter.
- Misformalization: Ambiguous natural-language terms (e.g., "density" vs. "lower density" vs. "upper density") can lead to wrong Lean statements. Agent can serve as a diagnostic tool — if it proves the wrong version easily, check the formalization.
- Scope limitation: Current successes concentrated where Mathlib is mature (combinatorics, number theory, convex optimization). Problems requiring extensive new theory remain out of reach.
Implementation Details
- Language: Python with
asyncio; Docker sandboxes for Lean compilation (Lean v4.27 + Pantograph).
- Lean interface: Pantograph for stateful proof interaction; SafeVerify for final integrity checks.
- LLM backend: Gemini 3.1 Pro (prover), Gemini 3.0 Flash (rater).
- Code repo: https://github.com/google-deepmind/alphaproof-nexus-results
Quick Reference: EVOLVE Marker Usage
-- EVOLVE-BLOCK-START
-- Helper lemmas and definitions go here
-- EVOLVE-BLOCK-END
theorem target_theorem : answer(
-- EVOLVE-VALUE-START
default -- ← agent fills in the answer value
-- EVOLVE-VALUE-END
) ↔ <condition> := by
-- EVOLVE-BLOCK-START
sorry -- ← agent fills in the proof
-- EVOLVE-BLOCK-END
For optimization problems, place the algorithm parameter inside EVOLVE-VALUE so the agent jointly discovers proof + optimal parameter.
References
- Paper: arXiv:2605.22763 (Tsoukalas et al., Google DeepMind, May 2026)
- AlphaProof (RL-based olympiad prover): arXiv via Hubert et al., Nature 2025
- AlphaEvolve (evolutionary algorithm backbone): arXiv:2506.13131
- Formal Conjectures repo (Erdős/OEIS Lean formalizations): github.com/google-deepmind/formal-conjectures
- Terence Tao's AI contributions wiki: github.com/teorth/erdosproblems/wiki/AI-contributions
- ErdosProblems catalog: erdosproblems.com