ai-product-building-heller

Guide for building successful AI startups based on Jake Heller's Casetext journey ($650M exit). Use when users need help with: (1) Selecting AI startup ideas by identifying jobs people pay humans to do, (2) Building reliable AI products through systematic evaluation and prompt iteration, (3) Pricing AI products based on value delivered, (4) Marketing AI products through product quality rather than sales tactics, (5) Understanding the assistance/replacement/unthinkable framework for AI opportunities, (6) Creating evaluation frameworks for AI prompts, or (7) Bridging the trust gap with enterprise customers for AI products.


Updated: January 28, 2026, 03:45

Source: jona/ycombinator-skills




AI Startup Building Framework

Build successful AI startups by picking ideas based on existing paid work, iterating obsessively on evaluation, and letting product quality drive growth.

Core Principles

Job-as-Market Framework

Instead of inventing what people might want, identify what people currently pay other people to do. Because existing spending already proves the demand, this removes most product-market fit risk.

Key insight: AI has made finding product-market fit easier—we already know what people want because they're paying humans to do it.

TAM Expansion

Traditional SaaS: TAM = software seats × subscription price

AI products: TAM = combined salaries of all workers doing the job being replaced/assisted

This can mean markets 10 to 1,000 times larger than those of traditional SaaS.
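To make the comparison concrete, here is a minimal sketch of the two TAM formulas above. Every figure in it (seat count, subscription price, salary) is a hypothetical placeholder chosen for illustration, not data from the source, and it assumes one software seat corresponds to one worker.

```python
def saas_tam(seats: int, annual_price_per_seat: float) -> float:
    """Traditional SaaS TAM: software seats x subscription price."""
    return seats * annual_price_per_seat


def ai_tam(workers: int, average_annual_salary: float) -> float:
    """AI product TAM: combined salaries of the workers doing the job."""
    return workers * average_annual_salary


# Hypothetical inputs, for illustration only.
workers = 100_000
price_per_seat = 1_200.0   # assumed yearly subscription price per seat
avg_salary = 60_000.0      # assumed yearly salary for the same role

traditional = saas_tam(workers, price_per_seat)
ai_based = ai_tam(workers, avg_salary)
multiple = ai_based / traditional

print(f"SaaS TAM: ${traditional:,.0f}")
print(f"AI TAM:   ${ai_based:,.0f}")
print(f"Multiple: {multiple:.0f}x")
```

With these assumed inputs the multiple is 50x. Under the one-seat-per-worker assumption, the ratio reduces to salary divided by seat price, so the result depends entirely on the numbers you plug in.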

Idea Selection Workflow

Step 1: Identify Target Jobs

Look for jobs people are already outsourcing or paying humans to do:

Customer support representatives
Insurance adjusters
Paralegals and legal researchers
Personal trainers
Executive assistants
Data entry clerks
Content moderators

Best signal: Jobs being outsourced to other countries indicate price sensitivity and clear task definition—prime AI targets.

Step 2: Categorize the Opportunity

Classify your idea into one of three categories:

| Category | Description | Example | Complexity |
|---|---|---|---|
| Assistance | Help professionals do tasks faster | AI legal research for lawyers | Medium |
| Replacement | Become the service provider directly | AI-powered customer support | High |
| Previously Unthinkable | Tasks too expensive for humans at scale | Personalized tutoring for every student | Highest |

Decision guidance:

Start with assistance if entering a regulated industry (law, medicine, finance)
Consider replacement for commoditized services with clear quality metrics
Pursue unthinkable only with strong technical differentiation

Step 3: Validate Domain Expertise

You must understand how professionals actually do the work. Ask:

What are the specific steps in this workflow?
Where do humans make judgment calls?
What does "good" look like to an expert?
What mistakes would be unacceptable?

If you cannot answer these questions, partner with domain experts or spend time learning the profession deeply.

Building Reliable AI Products

Workflow Decomposition

Break professional tasks into specific steps. For each step, decide:

Is this step deterministic?
├── Yes → Implement as code (no LLM needed)
└── No → Does it require judgment?
    ├── Yes → Create a prompt with evaluation
    └── No → Can it be rule-based?
        ├── Yes → Implement as code
        └── No → Create a prompt with evaluation

Mental model: Each prompt represents injecting human-level intelligence at a specific decision point.
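The decision tree above can be sketched as a small function. This is one possible encoding, not a prescribed implementation: the `WorkflowStep` fields and the example steps are hypothetical names introduced here for illustration.

```python
from dataclasses import dataclass


@dataclass
class WorkflowStep:
    """One step of a professional workflow (hypothetical structure)."""
    name: str
    deterministic: bool
    requires_judgment: bool = False
    rule_based: bool = False


def choose_implementation(step: WorkflowStep) -> str:
    """Walk the decision tree and return "code" or "prompt"."""
    if step.deterministic:
        return "code"      # deterministic: no LLM needed
    if step.requires_judgment:
        return "prompt"    # judgment call: prompt with evaluation
    if step.rule_based:
        return "code"      # expressible as rules: plain code
    return "prompt"        # everything else: prompt with evaluation


# Hypothetical steps from a legal-research workflow.
steps = [
    WorkflowStep("parse uploaded PDF", deterministic=True),
    WorkflowStep("assess case relevance", deterministic=False, requires_judgment=True),
    WorkflowStep("route by document type", deterministic=False, rule_based=True),
]

for step in steps:
    print(f"{step.name}: {choose_implementation(step)}")
```

Every step that returns "prompt" is a step that needs its own evaluation suite.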

Best Expert Framework

Design AI workflows by asking: "How would the best person in this field approach this task if they had unlimited time and 1000 AI instances working simultaneously?"

This reframes constraints—you're not limited by human availability or time pressure.

Evaluation Framework

The Eval-Driven Development Process

Most AI products fail because builders stop at demos with 60-70% accuracy. Follow this process instead:

Phase 1: Initial Development (12 evals per prompt)

Write initial prompt
Create 12 diverse test cases covering:
  - Common scenarios (6 cases)
  - Edge cases (3 cases)
  - Adversarial inputs (3 cases)
Run evaluations
Iterate until all 12 pass

Phase 2: Expansion (reach 100 evals)

Add 10 more test cases after achieving 100% on the initial set
Identify failure patterns
Iterate on prompt until new tests pass
Repeat until you have 100 evaluations per prompt
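As a rough illustration of how the suite grows, the sketch below computes its size after each expansion round. The starting size (12), batch size (10), and target (100) come from the text; the function name is hypothetical.

```python
def expansion_schedule(initial: int = 12, batch: int = 10, target: int = 100) -> list[int]:
    """Suite size after each expansion round, until the target is reached."""
    sizes = [initial]
    while sizes[-1] < target:
        sizes.append(sizes[-1] + batch)
    return sizes


schedule = expansion_schedule()
print(schedule)
print(f"expansion rounds: {len(schedule) - 1}")
```

Under these numbers, nine rounds of ten take the suite from 12 to 102 cases, with each round starting only after the existing suite passes in full.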

Phase 3: Holdout Validation

Keep 20% of evaluations as a holdout set
Never look at holdout results during development
Use holdout only for final validation
If holdout fails, return to Phase 2
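One way to keep the holdout set honest is to split it off once, with a fixed seed, before any iteration starts. The sketch below assumes evaluations are identified by string IDs; `split_holdout` and the case IDs are hypothetical names used only for illustration.

```python
import random


def split_holdout(cases: list[str], holdout_fraction: float = 0.2, seed: int = 0):
    """Shuffle a copy with a fixed seed, then reserve a fraction as holdout.

    Returns (development_set, holdout_set). The input list is not modified.
    """
    shuffled = cases[:]
    random.Random(seed).shuffle(shuffled)
    n_holdout = round(len(shuffled) * holdout_fraction)
    return shuffled[n_holdout:], shuffled[:n_holdout]


# Hypothetical IDs standing in for 100 real evaluations.
all_cases = [f"case-{i:03d}" for i in range(100)]
dev_set, holdout_set = split_holdout(all_cases)

print(len(dev_set), len(holdout_set))
```

Fixing the seed makes the split reproducible, so the same 20% stays hidden across development sessions.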

Evaluation Criteria

For each test case, define:

test_case:
  input: "The specific input to test"
  expected_behavior: "What the output should contain/do"
  failure_conditions:
    - "Specific failure mode 1"
    - "Specific failure mode 2"
  pass_threshold: 0.97  # 97% minimum for production
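The structure above could be mirrored in code roughly as follows. The source does not say how `pass_threshold` is applied, so this sketch assumes it means the share of repeated runs of one test case that must pass; `EvalCase` and `meets_threshold` are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class EvalCase:
    """Mirrors the test_case fields shown above."""
    input: str
    expected_behavior: str
    failure_conditions: list[str] = field(default_factory=list)
    pass_threshold: float = 0.97  # 97% minimum for production


def meets_threshold(case: EvalCase, run_results: list[bool]) -> bool:
    """True when the share of passing runs reaches the case's threshold.

    Assumption: each entry in run_results is one graded run of the case.
    """
    if not run_results:
        return False
    pass_rate = sum(run_results) / len(run_results)
    return pass_rate >= case.pass_threshold


case = EvalCase(
    input="The specific input to test",
    expected_behavior="What the output should contain/do",
    failure_conditions=["Specific failure mode 1", "Specific failure mode 2"],
)

# 97 passing runs out of 100 meets the threshold; 96 does not.
print(meets_threshold(case, [True] * 97 + [False] * 3))
print(meets_threshold(case, [True] * 96 + [False] * 4))
```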

The Two-Week Grind

Critical insight: The willingness to spend two weeks sleeplessly iterating on a single prompt separates successful products from demos.

When accuracy matters (finance, medicine, law):

Block two weeks for prompt refinement
Accept no compromises below 97% accuracy
Document every iteration and why it failed

Converting Complaints to Tests

Post-beta launch workflow:

Receive customer complaint
Reproduce the issue
Create new test case capturing the failure
Add to evaluation suite
Iterate until test passes
Verify no regression on existing tests

Real user behavior is your best evaluation source.
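The complaint-to-test loop can be sketched with a toy example. Everything here is hypothetical: the two `model_*` functions stand in for a prompt before and after a fix, and the uppercase task is a placeholder for real product behavior.

```python
from typing import Callable

# A regression case pairs an input with a check on the model's output.
RegressionCase = tuple[str, Callable[[str], bool]]


def failing_cases(suite: list[RegressionCase], model: Callable[[str], str]) -> list[str]:
    """Return the inputs whose output fails its check."""
    return [text for text, check in suite if not check(model(text))]


def model_v1(text: str) -> str:
    """Stand-in for the original prompt (does not trim whitespace)."""
    return text.upper()


def model_v2(text: str) -> str:
    """Stand-in for the iterated prompt (trims whitespace first)."""
    return text.strip().upper()


# Existing suite.
suite: list[RegressionCase] = [("hello", lambda out: out == "HELLO")]

# A customer complaint is reproduced and captured as a new test case.
suite.append(("refund please  ", lambda out: out == "REFUND PLEASE"))

print(failing_cases(suite, model_v1))  # the new case fails before the fix
print(failing_cases(suite, model_v2))  # no failures, no regression
```

Running the whole suite after each change, not only the new case, is what catches regressions on tests that passed earlier.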

Pricing AI Products

Value-Based Pricing

Price based on value delivered, not SaaS conventions:

AI Product Price = (Human cost for equivalent work) × (0.1 to 0.5)

Example: If a paralegal costs $50/hour and takes 4 hours for research ($200 total), price AI at $20-100 per equivalent task.
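The pricing formula and the paralegal example can be checked with a short calculation. The function name and parameter names are hypothetical; the 0.1 and 0.5 multipliers and the $50/hour, 4-hour figures come from the text.

```python
def price_range(hourly_rate: float, hours: float,
                low: float = 0.1, high: float = 0.5) -> tuple[float, float]:
    """Return (low, high) AI prices as fractions of the human cost."""
    human_cost = hourly_rate * hours
    return human_cost * low, human_cost * high


low_price, high_price = price_range(hourly_rate=50.0, hours=4.0)
print(f"${low_price:.0f} to ${high_price:.0f} per task")
```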

Discovery Process

Ask customers directly: "How would you like to pay for this?"

Common models:

Per-task pricing (for discrete, measurable outputs)
Seat-based (for ongoing assistance tools)
Usage-based (for variable consumption patterns)
Outcome-based (for replacement products)

Avoiding the PRR Trap

Pilot Recurring Revenue (PRR): Revenue from pilot programs that may not convert to real ARR.

Warning signs:

Pilots that keep extending without conversion
Usage metrics that don't match payment
Customers who praise but don't deploy

Focus on actual customer adoption and usage, not pilot revenue.

Marketing and Sales

Product-is-Everything Principle

Your product isn't just pixels on screen—it includes:

Customer support quality
Onboarding experience
Training materials
Customer success interactions
Founder involvement in early deals

Great products generate word-of-mouth. Product quality drives marketing success more than marketing investment.

Bridging the Trust Gap

Enterprise buyers face uncertainty moving from controllable humans (trainable, fireable) to unknown AI.

Trust-building tactics:

Side-by-side comparisons: Offer head-to-head tests against existing human services during pilots
Controlled pilots: Let customers test with real work in a controlled environment
Published studies: Create case studies with measurable outcomes
Gradual rollout: Start with low-risk tasks, expand as trust builds

Forward Deployed Engineers

For enterprise customers, place engineers who sit with customers to:

Ensure products work in their specific environment
Gather real feedback on failure modes
Build relationships and trust
Identify expansion opportunities

Building Defensibility

Defensibility comes from accumulated iteration complexity, not proprietary models.

Sources of defensibility:

Thousands of evaluations refined over years
Deep understanding of domain-specific edge cases
Integrated workflows that are painful to switch
Brand trust built through consistent quality
Data flywheel from customer usage

Common Mistakes to Avoid

| Mistake | Why It Fails | Better Approach |
|---|---|---|
| Stopping at 70% accuracy | Unusable for professional work | Iterate until 97%+ |
| Using agent frameworks without understanding | Adds complexity without reliability | Build simple, testable pipelines |
| Over-investing in sales vs. product | Unsustainable growth | Let product quality drive growth |
| Pricing like SaaS | Leaves value on the table | Price based on human cost replacement |
| Ignoring domain expertise | Misses critical failure modes | Partner with or become domain experts |
| Counting PRR as real revenue | Inflates metrics, hides problems | Track actual deployment and usage |

Quick Reference: Evaluation Tools

Promptfoo (open source, command line):

Run batch evaluations
Compare prompt versions
Track regressions over time

Basic usage pattern:

promptfoo eval --config eval_config.yaml

Structure evaluations in YAML with inputs, expected outputs, and grading criteria.