تشغيل أي مهارة في Manus بنقرة واحدة

$pwd:

lean-verify

Name: Lean Verify
Author: greyhaven-ai

// Use when running the experimental Lean formal-proof harness, validating machine-checked proof candidates, benchmarking playbook transfer, or invoking the pi-autocontext-lean-verify package. Triggers: Lean proof repair, formal proof, theorem proving, verifier-backed proof, autocontext formal, proof playbook, pre-repair hints.

تشغيل في Manus

$ git log --oneline --stat

stars:١

forks:٠

updated:١٤ مايو ٢٠٢٦ في ٢١:٤٧

SKILL.md

readonly

package.json

"author": "greyhaven-ai"

"repository": "greyhaven-ai/pi-autocontext-lean-verify"

فتح مستودع GitHub عرض مستودعات المنشئ

$ install --global

$ download --local

تشغيل في Manus

$ useful --forSOC

محللو ضمان جودة البرمجيات والمختبرونمهن الحاسوب والرياضيات15-1253L4

تشغيل أي مهارة بنقرة واحدة

name: lean-verify summary: Verifier-backed Lean proof repair using autocontext and Pi. description: Use when running the experimental Lean formal-proof harness, validating machine-checked proof candidates, benchmarking playbook transfer, or invoking the pi-autocontext-lean-verify package. Triggers: Lean proof repair, formal proof, theorem proving, verifier-backed proof, autocontext formal, proof playbook, pre-repair hints.

Formal Proof Harness

Use this skill for the external Lean proof-repair experiment harness. The package wraps support/formal-proof-lean-pilot and exposes the autocontext_lean_verify Pi tool.

Core rule

Lean is the correctness oracle. Never count prose, mathematical-looking text, or an LLM's claim of success as a theorem proof. A proof succeeds only when the fixed Lean theorem template compiles with the candidate proof body.

Guardrails

The harness rejects or avoids:

sorry
admit
new axioms
unsafe
theorem weakening
changing theorem statements
new imports / proof-by-importing-answer
candidates that are not proof bodies for the {{PROOF}} placeholder

Recommended workflow

Run, benchmark, and attribution actions default to short temp result paths under the system temp directory (pi-autocontext-lean-verify/...) (or $AUTOCONTEXT_LEAN_VERIFY_RESULTS_ROOT) so npm-loaded packages avoid Pi session path-length failures.

Use challenge_v6_frontier and challenge_v7_frontier for the first packaged verifier-only frontier suites. v7 is the first packaged suite where seeded autocontext is intentionally unstable while unseeded/direct solve none. Version 0.1.10 packages challenge_v8_diagnostics, challenge_v9_composition_gradient, challenge_v10_stats_reification, challenge_v11_metric_composition, and challenge_v12_simultaneous_metrics. Version 0.1.11 packages challenge_v13_decomposition_order and challenge_v14_metric_order_permutations; version 0.1.12 packages challenge_v15_proof_shape_hints. Version 0.1.13 packages challenge_v16_compact_reassembly_hints and challenge_v17_proof_plan_hints; version 0.1.14 packages challenge_v18_prompt_only_skeleton_hints, challenge_v19_bare_skeleton_names, and challenge_v20_description_only_skeleton to test whether detailed descriptions, bare labels, or both can live only in seeded playbook context without theorem-level True hypotheses. Version 0.1.15 packages challenge_v21_neutral_anchor_skeleton and the process-group timeout/short-run-root hardening used before v22 code-like anchor probes. Version 0.1.16 packages challenge_v22_code_like_anchor_skeleton, challenge_v23_exact_labels_without_plan_prefix, challenge_v24_plan_prefix_generic_labels, and challenge_v25_step_prefix_exact_labels; v22-v25 collectively show that snake_case anchors, exact roots without the original prefix, generic prefix labels, and alternate-prefixed exact roots do not reproduce v18 stability. Version 0.1.16 also carries a second timeout hardening patch that bounds post-kill waits and force-closes leaked subprocess pipes. Version 0.1.17 updates the explicit autocontext runtime pin to autocontext==0.5.1 to inherit the upstream Pi CLI timeout/process cleanup hardening.

Run setup or preflight first:

{
  "action": "setup"
}

or:

{
  "action": "preflight"
}

setup runs preflight and a minimal Lean smoke proof.

For the current strongest repair-only mode, use a named fixture group:

{
  "action": "run",
  "mode": "pre_repair_hint",
  "fixtureGroup": "broader"
}

This maps to the validated harness flags:

--no-pregenerate
--structured-alternate-retry
--structured-hint-candidates
--pre-repair-hint-candidates
--max-attempts 2
--rounds 2
--timeout 60

For expanded negative controls, prefer maxAttempts=3:

{
  "action": "run",
  "mode": "pre_repair_hint",
  "fixtureGroup": "negative_controls",
  "maxAttempts": 3,
  "rounds": 2,
  "timeoutSeconds": 60
}

To reproduce the proof-transfer benchmark, compare seeded autocontext/Pi against direct Pi repair-loop on no-expected-proof challenge fixtures:

{
  "action": "benchmark",
  "fixtureGroup": "challenge_v3_generalization",
  "maxAttempts": 2,
  "rounds": 2,
  "timeoutSeconds": 120
}

This mode uses --no-pregenerate and --structured-alternate-retry; synthetic hint candidates are disabled.

For seeded/unseeded/direct attribution on v5 fixtures:

{
  "action": "attribution",
  "fixtureGroup": "challenge_v5_attribution",
  "maxAttempts": 2,
  "rounds": 2,
  "timeoutSeconds": 120
}

Summarize a run with:

{
  "action": "summarize",
  "runRoot": "results/<run>"
}

Fixture groups

smoke: one quick theorem, add_zero_right.
broader: seven broader fixtures covering Nat distribution, Bool/List recursion, list helpers, tree helpers, and reverse/append.
heldout: seven original held-out transfer fixtures.
combined: broader + heldout (14 fixtures).
negative_controls: six fixtures where generic Nat/List hint candidates should fail Lean before fallback: distribution, multiplication associativity/commutativity, addition cancellation, and filter length bound.
challenge_v2_no_helper: three no-expected-proof fixtures requiring local helper lemma discovery.
challenge_v3_generalization: four no-expected-proof theorem-generalization fixtures requiring accumulator and multi-helper proof plans.
challenge_transfer: all v2/v3 challenge fixtures (7 fixtures).
challenge_v4_count: four no-expected-proof count/list/tree fixtures.
challenge_v5_attribution: four no-expected-proof simultaneous-invariant and multi-helper composition fixtures.
challenge_v5_tree_tally: one hard tree/tally mirror fixture.
challenge_v6_frontier: four frontier fixtures covering multi-accumulator reverse invariants, nested tree partition stats, successor-map reverse count/sum, and mutual tree stats.
challenge_v7_frontier: four frontier-plus fixtures combining partition/tree mirror composition with stats accumulator normalization.
challenge_v8_diagnostics: four diagnostic fixtures isolating the v7 partition-heavy accumulator frontier into smaller components.
challenge_v9_composition_gradient: six composition-gradient fixtures splitting keep/drop partition-through-mirror into scalar and paired length/sum obligations.
challenge_v10_stats_reification: six statsAcc-reification fixtures isolating tuple normalization, metric extensionality, count metrics, and stats equality from metric hypotheses.
challenge_v11_metric_composition: six metric-composition fixtures combining keep/drop metric bundles with statsAcc extensionality boundaries.
challenge_v12_simultaneous_metrics: seven simultaneous-metric fixtures splitting keep/drop count-only, length-only, sum-only, pair, and triple metric bundles.
challenge_v13_decomposition_order: six decomposition/order fixtures isolating v12 triple-bundle ordering, append/filter sub-lemmas, tree-only metrics, and conjunction reassembly.
challenge_v14_metric_order_permutations: six metric-order permutation fixtures covering all count/length/sum orderings in the full simultaneous keep/drop triple bundle.
challenge_v15_proof_shape_hints: four proof-shape hint fixtures exposing metric-bundle, side-bundle, append/filter, and final reassembly hypotheses.
challenge_v16_compact_reassembly_hints: four compact reassembly-hint fixtures replacing the large v15 higher-order hypothesis with generic/named packers.
challenge_v17_proof_plan_hints: four harmless True proof-plan skeleton hint fixtures naming induction, case split, simp, and Nat add normalization steps.
challenge_v18_prompt_only_skeleton_hints: two v16 theorem-shape fixtures where detailed proof-plan skeleton names are present only in seeded playbook context.
challenge_v19_bare_skeleton_names: ablation over the clean v18 fixtures using only bare skeleton-name bullets in the seeded playbook.
challenge_v20_description_only_skeleton: ablation over the clean v18 fixtures using compact skeleton descriptions without plan_... labels in the seeded playbook.
challenge_v21_neutral_anchor_skeleton: ablation over the clean v18 fixtures using neutral step anchors plus compact skeleton descriptions without plan_... labels in the seeded playbook.
challenge_v22_code_like_anchor_skeleton: ablation over the clean v18 fixtures using compact descriptions plus code-like snake_case step anchors without the earlier detailed-plan prefix.
challenge_v23_exact_labels_without_plan_prefix: ablation over the clean v18 fixtures using v18's exact detailed labels with only the prefix removed.
challenge_v24_plan_prefix_generic_labels: ablation over the clean v18 fixtures using compact descriptions plus shorter generic plan_ labels.
challenge_v25_step_prefix_exact_labels: ablation over the clean v18 fixtures using compact descriptions plus step_ prefixed exact v18 label roots.
challenge_extended_transfer: all v2/v3/v4/v5/v6/v7/v8/v9/v10/v11/v12/v13/v14/v15/v16/v17/v18 challenge fixtures (78 fixtures).

Modes

seeded_pregenerate: allow Pi/autocontext to pregenerate from the verified playbook, then Lean-check.
pre_repair_hint: no pregeneration; try Lean-checked strategy candidates before primary Pi repair.
post_repair_hint: no pregeneration; run primary Pi repair, then Lean-checked hints before alternate Pi fallback.
structured_retry: no pregeneration; structured alternate Pi retry without hint candidates.

Interpretation guidance

Challenge fixtures intentionally omit expected_proof.lean; success must come from the generated proof body passing Lean.
Use action="attribution" when comparing seeded, unseeded isolated, and direct Pi repair-loop outcomes.
Treat pre_repair_hint as a hybrid experiment mode, not pure LLM proof generation.
Lean-rejected hint candidates are good safety evidence when fallback recovers.
Compare methods using proof success, Pi calls, Lean verifier attempts, repair edits, and token edit distance.
Keep artifacts indexed with results/index.json and results/EXPERIMENTS.md.

Packaging status

This package is npm-distributed as pi-autocontext-lean-verify. Current patch releases use GitHub trusted publishing with npm provenance. The package remains experimental: report proof-transfer results as verifier-backed evidence, not as a claim that autocontext is a standalone theorem prover.