agi-framework-chollet

Provides François Chollet's framework for understanding intelligence, AGI development paths, and the limitations of current AI approaches. Use this skill when users ask about: (1) What intelligence really means and how to define AGI, (2) Why scaling pre-training alone won't achieve AGI, (3) The difference between memorized skills and fluid intelligence, (4) Test-time adaptation and its role in AGI, (5) The ARC benchmark and what it measures, (6) Type 1 vs Type 2 abstraction in AI systems, (7) Program synthesis approaches to intelligence, (8) Evaluating claims about AGI progress, or (9) Understanding the conceptual foundations needed for building generally intelligent systems.

Stars: 0

Forks: 2

Updated: 28 January 2026 at 03:45

Source: jona/ycombinator-skills

SKILL.md

name

agi-framework-chollet

description

François Chollet's Framework for AGI

This skill encapsulates François Chollet's theoretical framework for understanding intelligence and the path to AGI, based on his analysis of AI progress and the ARC benchmark.

Core Concepts

Two Definitions of Intelligence

Minsky-style view (task-based):

AI performs tasks normally done by humans
Corporate AGI definition: "model that could perform most economically valuable tasks"
Focus on capability benchmarks

McCarthy-style view (adaptation-based):

Intelligence is the ability to understand something new on the fly
Focus on fluid reasoning, not memorized skills
Measures generalization to novel situations

Key distinction: Memorized skills (static, task-specific) vs. fluid general intelligence (dynamic, adaptable).

The Pre-Training Scaling Era (2020-2024)

What worked:

Self-supervised text modeling at scale
Predictable benchmark improvements with more compute/data
Crushing performance on traditional benchmarks

Why it plateaued for AGI:

Benchmarks measured memorized skills, not fluid intelligence
A roughly 50,000x scale-up of pre-training (2019→2024) moved ARC scores only from about 0% to about 10%
Humans score >95% on ARC with no training
Confusion between benchmark performance and true understanding
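To put the scale-up figure above in perspective, the arithmetic below converts it to orders of magnitude. It is only a back-of-the-envelope sketch: the 0% and 10% endpoints are the approximate values quoted above, and the per-order figure is an average, not a fitted trend.

```python
import math

scale_factor = 50_000                  # pre-training scale-up cited above (2019 -> 2024)
score_before, score_after = 0.0, 10.0  # approximate ARC accuracy, in percent

# How many factors of 10 the scale-up represents.
orders_of_magnitude = math.log10(scale_factor)

# Average accuracy gained per tenfold increase in scale.
points_per_order = (score_after - score_before) / orders_of_magnitude

print(f"{orders_of_magnitude:.2f} orders of magnitude")       # 4.70
print(f"{points_per_order:.2f} points per order of magnitude")  # 2.13
```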

Test-Time Adaptation Era (2024+)

Core principle: Models modify their own behavior dynamically based on specific data encountered during inference.

Key techniques:

Test-time training
Program synthesis
Chain-of-thought synthesis
Self-reprogramming for tasks at hand

Evidence of progress:

OpenAI's o3 (a version fine-tuned on ARC) achieved human-level ARC performance
All high-performing ARC approaches use test-time adaptation
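Program synthesis, one of the techniques listed above, can be illustrated with a toy example. The sketch below searches a tiny, hypothetical DSL of grid operations for a program consistent with a task's demonstration pairs, then applies it to a new input. The primitives, the `synthesize` helper, and the brute-force enumeration are all assumptions made for illustration; real ARC solvers use far richer languages and search strategies.

```python
from itertools import product

# Hypothetical primitives of a toy DSL over 2D grids (lists of lists).
def identity(g):
    return [row[:] for row in g]

def flip_h(g):
    return [row[::-1] for row in g]

def flip_v(g):
    return [row[:] for row in g[::-1]]

def transpose(g):
    return [list(col) for col in zip(*g)]

PRIMITIVES = {
    "identity": identity,
    "flip_h": flip_h,
    "flip_v": flip_v,
    "transpose": transpose,
}

def run(program, grid):
    """Apply a sequence of primitive names to a grid, left to right."""
    for name in program:
        grid = PRIMITIVES[name](grid)
    return grid

def synthesize(examples, max_len=2):
    """Return the first shortest program matching every demonstration pair."""
    for length in range(1, max_len + 1):
        for program in product(PRIMITIVES, repeat=length):
            if all(run(program, x) == y for x, y in examples):
                return program
    return None

# Demonstration pairs for an unseen task (a 90-degree clockwise rotation).
demos = [
    ([[1, 2], [3, 4]], [[3, 1], [4, 2]]),
    ([[0, 5], [0, 0]], [[0, 0], [0, 5]]),
]

program = synthesize(demos)                   # found at test time, not memorized
test_output = run(program, [[7, 0], [0, 0]])
print(program, test_output)
```

No single primitive solves the task; the search recombines two of them on the fly, which is the behavior the test-time adaptation approaches above aim for.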

Type 1 vs Type 2 Abstraction

Type 1 (Perceptual):

Pattern recognition from raw sensory data
Continuous, gradient-based learning
What deep learning excels at

Type 2 (Discrete/Programmatic):

Symbolic, compositional reasoning
Discrete program-like structures
Requires on-the-fly recombination of concepts

AGI requirement: Both types working together, with deep learning for perception and program search for reasoning.

Analytical Workflows

Evaluate AGI Claims

When assessing claims about AGI progress:

Identify the benchmark being used
Determine if it measures memorized skills or fluid intelligence
Check if the model was trained on similar data
Ask: "Would performance hold on genuinely novel problems?"
Look for test-time adaptation vs. pure pre-training

Red flags:

Claims based solely on traditional benchmarks
No evaluation on out-of-distribution tasks
Conflating task performance with general intelligence
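The checklist and red flags above can be written down as a small data structure. This is a minimal sketch: the `Claim` class, its field names, and the `red_flags` helper are hypothetical, and each field records a reviewer's judgment, not something computed automatically.

```python
from dataclasses import dataclass

@dataclass
class Claim:
    """A reviewer's notes on one AGI-progress claim (hypothetical structure)."""
    benchmark: str
    measures_fluid_intelligence: bool
    trained_on_similar_data: bool
    evaluated_out_of_distribution: bool
    equates_task_score_with_generality: bool

def red_flags(claim):
    """Return the concerns from the checklist that apply to this claim."""
    flags = []
    if not claim.measures_fluid_intelligence:
        flags.append("benchmark measures memorized skills")
    if claim.trained_on_similar_data:
        flags.append("model was trained on similar data")
    if not claim.evaluated_out_of_distribution:
        flags.append("no out-of-distribution evaluation")
    if claim.equates_task_score_with_generality:
        flags.append("conflates task performance with general intelligence")
    return flags

claim = Claim(
    benchmark="a traditional exam-style benchmark",
    measures_fluid_intelligence=False,
    trained_on_similar_data=True,
    evaluated_out_of_distribution=False,
    equates_task_score_with_generality=True,
)
flags = red_flags(claim)
for flag in flags:
    print("-", flag)
```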

Analyze AI System Architecture

When examining an AI system's potential for general intelligence:

Assess pre-training approach
- What knowledge is baked in?
- How task-specific is the training?
Evaluate test-time capabilities
- Can the system adapt during inference?
- Does it use chain-of-thought or program synthesis?
- Can it modify its behavior for novel problems?
Check for both abstraction types
- Type 1: Perceptual pattern matching
- Type 2: Discrete symbolic reasoning
Determine generalization potential
- How does it perform on ARC-style tasks?
- Can it handle problems outside training distribution?

Assess Intelligence Metrics

When evaluating whether a metric truly measures intelligence:

Apply the Chollet criteria:
- Does it require understanding novel situations?
- Can it be solved by memorization?
- Is the solution space too narrow?
Check for the "ARC test":
- Would 50,000x more compute dramatically improve scores?
- If no, the metric likely measures fluid intelligence
- If yes, it may just measure knowledge retrieval
Consider human baseline:
- Humans with no training should outperform on true intelligence tests
- A large gap in favor of humans suggests the AI system relies on memorization
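The scale check and the human-baseline check above can be combined into one helper. The sketch below is only an illustration: `assess_metric` is a hypothetical function, and the `big_gain` cutoff for a "dramatic" improvement is an assumed value, since the text gives no number. The example call uses the ARC figures quoted earlier, with 95% taken as a lower bound on the human score.

```python
def assess_metric(score_before, score_after, human_score, big_gain=30.0):
    """Apply the scale test and the human-baseline check to one metric.

    Scores are percentages. `big_gain` is an assumed cutoff, in percentage
    points, for what counts as a dramatic improvement from scale.
    """
    gain_from_scale = score_after - score_before
    if gain_from_scale >= big_gain:
        verdict = "may mostly measure knowledge retrieval"
    else:
        verdict = "likely measures fluid intelligence"
    return {
        "gain_from_scale": gain_from_scale,
        "human_gap": human_score - score_after,
        "verdict": verdict,
    }

# ARC figures quoted earlier: about 0% -> 10% across the scale-up, humans above 95%.
arc = assess_metric(score_before=0.0, score_after=10.0, human_score=95.0)
print(arc)
```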

Key Frameworks

The Compute Cost Trajectory

Historical pattern since 1940:

Compute cost falls ~100x per decade
No sign of stopping
Implication: Compute alone doesn't solve intelligence
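For intuition, the per-decade figure above can be converted into an annual rate. The sketch below assumes the decline is a smooth exponential, which is a simplification of the historical record.

```python
import math

per_decade = 100.0                    # cost falls ~100x per decade (figure above)

# Equivalent constant yearly factor under a smooth exponential decline.
per_year = per_decade ** (1 / 10)

# Fraction by which cost drops each year, and the time for cost to halve.
annual_drop = 1 - 1 / per_year
halving_time_years = math.log(2) / math.log(per_year)

print(f"cost falls {per_year:.3f}x per year")                   # 1.585x
print(f"about {annual_drop:.1%} cheaper each year")             # 36.9%
print(f"cost halves every {halving_time_years:.2f} years")      # 1.51
```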

Why Pre-Training Scaling Failed for AGI

Scaling pre-training → Better benchmark scores
Better benchmark scores ≠ General intelligence
General intelligence requires → Novel problem adaptation
Novel problem adaptation requires → Test-time learning

The AGI Architecture Hypothesis

AGI = Deep Learning (Type 1) + Program Search (Type 2)
     + Test-Time Adaptation
     
Where:
- Deep Learning handles perception and pattern matching
- Program Search handles discrete reasoning and composition
- Test-Time Adaptation enables on-the-fly learning
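One way to read the hypothesis above is that a fast pattern-matching component ranks candidates so that the discrete search tries fewer programs. The sketch below illustrates that division of labor on a toy grid task. Everything in it is an assumption made for illustration: the three primitives, the `guided_search` helper, and especially `prior`, which is a hand-written heuristic standing in for a trained network.

```python
from itertools import product

# Hypothetical primitives of a toy DSL over 2D grids (lists of lists).
PRIMITIVES = {
    "flip_h": lambda g: [row[::-1] for row in g],
    "flip_v": lambda g: [row[:] for row in g[::-1]],
    "transpose": lambda g: [list(col) for col in zip(*g)],
}

def run(program, grid):
    """Apply a sequence of primitive names to a grid, left to right."""
    for name in program:
        grid = PRIMITIVES[name](grid)
    return grid

def prior(name, examples):
    """Stand-in for a learned Type 1 model: score how promising a primitive looks.

    Heuristic: only transpose changes a grid's shape, so favor it exactly
    when the first demonstration's input and output shapes differ.
    """
    x, y = examples[0]
    shape_changes = (len(x), len(x[0])) != (len(y), len(y[0]))
    return 1.0 if (name == "transpose") == shape_changes else 0.0

def guided_search(examples, max_len=2):
    """Type 2 search over programs, visiting high-prior primitives first."""
    ranked = sorted(PRIMITIVES, key=lambda n: -prior(n, examples))
    tried = 0
    for length in range(1, max_len + 1):
        for program in product(ranked, repeat=length):
            tried += 1
            if all(run(program, x) == y for x, y in examples):
                return program, tried
    return None, tried

demos = [([[1, 2, 3], [4, 5, 6]], [[1, 4], [2, 5], [3, 6]])]
program, tried = guided_search(demos)
print(program, "found after", tried, "candidate(s)")
```

With the primitives in their declared order, an unguided search would reach the same program only on its third candidate; the ranking step moves it to the front.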

Common Questions

"Is AGI already here with o3?"

Evaluate by asking:

Was the model fine-tuned specifically on ARC-like data?
Does performance generalize to other novel reasoning tasks?
Is it truly adapting, or just memorizing better?

o3 shows promising signs, but because it was fine-tuned on ARC, how far it truly generalizes remains uncertain.

"What's missing from current LLMs?"

Key gaps:

True test-time learning (not just prompting)
Program synthesis capabilities
Type 2 discrete abstraction
Generalization without task-specific fine-tuning

"How should we measure AGI progress?"

Use metrics that:

Cannot be solved by memorization
Require novel problem understanding
Show ceiling effects with scale alone
Demonstrate human-like fluid reasoning

ARC exemplifies these properties.

Terminology Reference

Term	Definition
Fluid intelligence	Ability to understand novel situations without prior training
Memorized skills	Static, task-specific capabilities from training
Test-time adaptation	Model modifying behavior during inference
ARC	Abstraction and Reasoning Corpus benchmark
Type 1 abstraction	Perceptual, continuous pattern recognition
Type 2 abstraction	Discrete, programmatic reasoning
Program synthesis	Generating programs to solve problems
Scaling laws	Predictable performance gains from more compute/data

Application Guidelines

When Discussing AGI Timelines

Distinguish capability claims from intelligence claims
Ask what benchmarks support the claim
Check for test-time adaptation in the architecture
Consider whether the system generalizes beyond training

When Designing AI Systems for Generalization

Include test-time adaptation mechanisms
Don't rely solely on pre-training scale
Incorporate program synthesis where possible
Test on out-of-distribution problems like ARC
Balance Type 1 and Type 2 capabilities

When Evaluating AI Research Directions

Prioritize approaches that:

Enable learning at inference time
Combine neural and symbolic methods
Demonstrate novel problem-solving
Go beyond benchmark optimization
