dspy-better-together

Stars: 6

Forks: 1

Updated: June 13, 2026, 13:41

Use when you have already tried prompt-only optimization and want the next level — jointly tuning prompts and model weights for maximum quality. Common scenarios - you have maxed out prompt optimization and need the next level, combining instruction tuning with weight tuning for maximum quality, making a small model match a large model through joint optimization, or squeezing the last few percent of accuracy. Related - ai-fine-tuning, ai-improving-accuracy, ai-cutting-costs. Also used for dspy.BetterTogether, joint prompt and weight optimization, beyond prompt engineering, combine fine-tuning with prompt optimization, maximum possible quality from DSPy, hybrid optimization strategy, prompt optimization hit a ceiling, fine-tune and optimize prompts at the same time, advanced DSPy optimization, best possible accuracy, what to try after MIPROv2, next level AI quality.

Source: lebsral/DSPy-Programming-not-prompting-LMs-skills (GitHub)

BetterTogether: Joint Prompt + Weight Optimization

Guide the user through using dspy.BetterTogether to get the best possible quality by combining prompt optimization and model fine-tuning in alternating rounds. Each round builds on the improvements from the previous one, creating compounding gains that beat either approach alone.

What it is

BetterTogether is a DSPy optimizer that alternates between prompt optimization (instructions, few-shot examples) and weight optimization (fine-tuning). Instead of running these independently, it chains them so each phase builds on the previous one's improvements:

Prompt optimization discovers effective task decompositions and reasoning strategies
Weight optimization specializes the model to execute those discovered patterns efficiently
Repeated rounds compound the gains -- each phase benefits from the prior improvements

The BetterTogether paper (arXiv 2407.10930v2) reports that alternating the two outperforms either approach alone, with gains of 5-78% over the individual techniques. In a Databricks case study on IE Bench, GEPA alone added 2.1 points over baseline and fine-tuning alone added 1.9 points, while the combination added 4.8 points.

When to use

You have 500+ labeled examples and a reliable metric
You've already tried prompt optimization (MIPROv2) and fine-tuning (BootstrapFinetune) separately and want more
You want the absolute best quality and have the compute budget for multiple optimization rounds
Fine-tuning alone didn't close the gap to your quality target
You need a production-grade model and can afford longer optimization time

When NOT to use

You have fewer than 500 examples -- use MIPROv2 or BootstrapFewShot instead (see /ai-improving-accuracy)
You haven't tried prompt optimization yet -- start with /ai-improving-accuracy
Your baseline is below 50% -- fix your task definition or data first
You're still iterating on what the task is -- BetterTogether is expensive to re-run
You don't have access to a fine-tunable model (OpenAI gpt-4o-mini/gpt-4o, or local models)

Prerequisites

Before starting, confirm:

Data: 500+ labeled examples (1000+ recommended), split 80/10/10 (train/dev/test)
Baseline: Measured accuracy from prompt optimization (MIPROv2) and/or fine-tuning (BootstrapFinetune)
Metric: Automated metric that scores predictions
Fine-tunable model: OpenAI fine-tuning API, Databricks, or local models with GPU
Budget: Multiple optimization rounds cost 2-3x more than a single optimizer run
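The 80/10/10 split above can be sketched in plain Python. This is a minimal illustration, not part of DSPy: `split_examples` is a hypothetical helper, and the integers stand in for your labeled examples.

```python
import random


def split_examples(examples, seed=0, train_frac=0.8, dev_frac=0.1):
    """Shuffle a copy of `examples` and split it into train/dev/test lists."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is reproducible
    n_train = round(len(shuffled) * train_frac)
    n_dev = round(len(shuffled) * dev_frac)
    train = shuffled[:n_train]
    dev = shuffled[n_train:n_train + n_dev]
    test = shuffled[n_train + n_dev:]  # whatever remains is the test split
    return train, dev, test


# Integers stand in for labeled examples here.
train, dev, test = split_examples(range(1000))
print(len(train), len(dev), len(test))  # 800 100 100
```

Keep the test split untouched until the very end, so the final number is not influenced by any optimization round.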

Basic usage

import dspy

lm = dspy.LM("openai/gpt-4o-mini")  # must be a model you can fine-tune (see Prerequisites)
dspy.configure(lm=lm)

# Define your program
class Classify(dspy.Signature):
    """Classify the support ticket into a category."""
    text: str = dspy.InputField()
    category: str = dspy.OutputField()

program = dspy.ChainOfThought(Classify)

# IMPORTANT: All predictors must have explicit LMs assigned
program.set_lm(lm)

# Define your metric
def metric(example, prediction, trace=None):
    return prediction.category.strip().lower() == example.category.strip().lower()

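# Prepare data: `data` is your own list of dicts with "text" and "category" keys
examples = [dspy.Example(text=x["text"], category=x["category"]).with_inputs("text") for x in data]
trainset = examples[:800]
valset = examples[800:900]  # held out, no overlap with trainset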

# Run BetterTogether with defaults
optimizer = dspy.BetterTogether(metric=metric)
compiled = optimizer.compile(program, trainset=trainset, valset=valset)

By default, BetterTogether uses:

p: BootstrapFewShotWithRandomSearch for prompt optimization
w: BootstrapFinetune for weight optimization
Strategy: "p -> w -> p" (prompts, then weights, then prompts again)

How it combines prompt and weight tuning

BetterTogether executes a strategy string that defines the order of optimization phases:

"p -> w -> p"
 |    |    |
 |    |    +-- Re-optimize prompts for the fine-tuned model
 |    +------- Fine-tune weights using the optimized prompts
 +------------ Optimize prompts first (instructions + few-shot)

At each step:

1. Shuffle the trainset (prevents overfitting to data order)
2. Run the designated optimizer on the current best program
3. Evaluate the result on the validation set
4. Record the candidate program and score
5. Move to the next step in the strategy

After all steps, BetterTogether returns the best-scoring candidate across all phases (ties broken by earlier position).
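The loop described above can be sketched without DSPy. This is a simplified illustration under stated assumptions, not the library's implementation: `run_strategy` is a hypothetical function, the optimizers are stand-in callables, and each step is assumed to continue from the previous step's output.

```python
import random


def run_strategy(program, trainset, strategy, optimizers, evaluate, seed=0):
    """Run optimizers in strategy order and keep every candidate with its score."""
    rng = random.Random(seed)
    candidates = []
    current = program
    for key in (part.strip() for part in strategy.split("->")):
        shuffled = list(trainset)
        rng.shuffle(shuffled)  # shuffle before each step
        current = optimizers[key](current, shuffled)
        candidates.append({"program": current, "score": evaluate(current), "key": key})
    # max() returns the first maximal item, so ties go to the earlier step.
    best = max(candidates, key=lambda c: c["score"])
    return best, candidates


# Stand-ins: a "program" is a string and each optimizer appends its key.
optimizers = {
    "p": lambda prog, data: prog + "p",
    "w": lambda prog, data: prog + "w",
}
scores = {"p": 70.0, "pw": 82.0, "pwp": 82.0}  # made-up validation scores

best, candidates = run_strategy("", [1, 2, 3], "p -> w -> p", optimizers, scores.get)
print(best["program"])  # pw
```

The last two candidates tie at 82.0 in this toy run, so the earlier one ("pw") is returned, mirroring the tie-breaking rule stated above.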

Why alternating works

Prompt optimization finds the right "recipe" -- effective instructions, good examples, useful reasoning patterns
Weight optimization bakes those patterns into the model so it executes them reliably and cheaply
Re-optimizing prompts after fine-tuning discovers new strategies that the specialized model can now handle

Custom optimizers

Pass your own optimizers as keyword arguments. The keys become identifiers in the strategy string:

optimizer = dspy.BetterTogether(
    metric=metric,
    p=dspy.GEPA(metric=metric, auto="medium"),
    w=dspy.BootstrapFinetune(metric=metric),
)

program.set_lm(lm)
compiled = optimizer.compile(
    program,
    trainset=trainset,
    valset=valset,
    strategy="p -> w -> p",
)

You can use any DSPy Teleprompter as an optimizer. Common choices:

| Key | Optimizer | Best for |
| --- | --- | --- |
| `p` | `GEPA` | Instruction tuning, fewer examples |
| `p` | `MIPROv2` | Best general prompt optimization |
| `p` | `BootstrapFewShotWithRandomSearch` | Fast prompt optimization (default) |
| `w` | `BootstrapFinetune` | Weight optimization (default) |

Key parameters

Constructor: `BetterTogether(metric, **optimizers)`

| Parameter | Type | Description |
| --- | --- | --- |
| `metric` | `Callable` | Evaluation function `(example, prediction, trace=None) -> numeric` |
| `**optimizers` | keyword args | Custom optimizers. Keys become strategy identifiers (e.g., `p=GEPA(...)`, `w=BootstrapFinetune(...)`) |

Compile: `optimizer.compile(student, *, trainset, ...)`

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| `student` | `Module` | required | Program to optimize. All predictors must have LMs via `set_lm()` |
| `trainset` | `list[Example]` | required | Training examples |
| `valset` | `list[Example]` | `None` | Validation set. If `None`, splits from trainset |
| `valset_ratio` | `float` | `0.1` | Fraction of trainset to use as valset when `valset=None` |
| `strategy` | `str` | `"p -> w -> p"` | Optimizer execution order using keys from constructor |
| `teacher` | `Module` or `list[Module]` | `None` | Optional teacher program(s) for distillation |
| `num_threads` | `int` | `None` | Parallel threads for evaluation |
| `shuffle_trainset_between_steps` | `bool` | `True` | Shuffle trainset before each step |
| `seed` | `int` | `None` | Random seed for reproducibility |
| `optimizer_compile_args` | `dict` | `None` | Per-optimizer custom compile arguments |

Return value

The compiled program has two extra attributes:

candidate_programs: List of dicts with 'program', 'score', 'strategy' keys, sorted by score descending
flag_compilation_error_occurred: Boolean indicating if any step failed

Strategy patterns

| Strategy | Rounds | Use case |
| --- | --- | --- |
| `"p -> w -> p"` | 3 | Default. Best balance of quality and cost |
| `"p -> w"` | 2 | Simpler, cheaper. Good starting point |
| `"w -> p"` | 2 | When your model needs weight tuning first |
| `"p -> w -> p -> w"` | 4 | Maximum quality, highest cost |

Computational cost

BetterTogether runs multiple optimization rounds, so it costs more than individual optimizers:

| Strategy | Approximate cost | Time |
| --- | --- | --- |
| `"p -> w"` | 1x prompt opt + 1x fine-tune | Hours |
| `"p -> w -> p"` (default) | 2x prompt opt + 1x fine-tune | Hours to half a day |
| `"p -> w -> p -> w"` | 2x prompt opt + 2x fine-tune | Half a day to a day |

Fine-tuning is the expensive part. Each fine-tuning round involves:

Bootstrapping traces from training data
Uploading traces to the fine-tuning provider
Waiting for fine-tuning to complete (minutes to hours depending on provider)
Evaluating the fine-tuned model
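The "approximate cost" figures for each strategy follow from counting how often each key appears in the strategy string. A small sketch, assuming the default keys `p` and `w`; `describe_cost` is a hypothetical helper, not a DSPy function.

```python
from collections import Counter


def describe_cost(strategy):
    """Count prompt ("p") and weight ("w") phases in a strategy string."""
    counts = Counter(part.strip() for part in strategy.split("->"))
    return f"{counts['p']}x prompt opt + {counts['w']}x fine-tune"


for strategy in ("p -> w", "p -> w -> p", "p -> w -> p -> w"):
    print(f"{strategy!r}: {describe_cost(strategy)}")
```

Since fine-tuning dominates the bill, the number of `w` phases is the figure to watch when choosing a strategy.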

Reducing cost

Start with "p -> w" to see if two rounds are enough
Use a smaller valset (but keep at least 50-100 examples)
Use optimizer_compile_args to limit individual optimizer budgets

BetterTogether vs individual optimizers

| Approach | Data needed | Quality | Cost | When to use |
| --- | --- | --- | --- | --- |
| MIPROv2 alone | 200+ | Good | Low | First optimization attempt |
| BootstrapFinetune alone | 500+ | Better | Medium | When prompts hit a ceiling |
| BetterTogether | 500+ | Best | High | When you need maximum quality |

Rule of thumb: Try MIPROv2 first. If you're still short of your quality target, try BootstrapFinetune. If you need more, use BetterTogether.
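The rule of thumb, together with the "When NOT to use" conditions, can be written down as a small decision helper. This is only a sketch of the guidance in this document: `recommend_optimizer` and its arguments are hypothetical, and the thresholds are the ones quoted above.

```python
def recommend_optimizer(n_examples, baseline_accuracy, tried_prompt_opt,
                        tried_finetune, can_finetune):
    """Map the guidance in this document to a suggested next step."""
    if baseline_accuracy < 0.5:
        return "fix task definition or data first"
    if not tried_prompt_opt or n_examples < 500 or not can_finetune:
        return "MIPROv2"  # prompt-only optimization
    if not tried_finetune:
        return "BootstrapFinetune"
    return "BetterTogether"


print(recommend_optimizer(300, 0.70, True, False, True))   # MIPROv2
print(recommend_optimizer(1000, 0.80, True, False, True))  # BootstrapFinetune
print(recommend_optimizer(1000, 0.80, True, True, True))   # BetterTogether
```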

Important requirements

Explicit LM assignment: All predictors in your program must have LMs assigned via set_lm(). Global dspy.configure(lm=...) is not enough for BetterTogether.

program = dspy.ChainOfThought(MySignature)
program.set_lm(lm)  # Required

Fine-tunable model: The weight optimizer needs a model that supports fine-tuning (OpenAI, Databricks, or local models with GPU).
Validation data: Provide either an explicit valset or set valset_ratio > 0. Without validation data, BetterTogether returns the latest program instead of the best one.
Strategy keys must match: Keys in the strategy string must match the keyword argument names from the constructor.
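The last requirement is easy to check before starting an expensive run. The sketch below validates a strategy string against the optimizer keys you plan to pass to the constructor. `parse_strategy` is a hypothetical pre-flight helper, not part of DSPy, and the library may perform its own validation.

```python
def parse_strategy(strategy, optimizer_keys):
    """Split a strategy string into steps and reject keys without an optimizer."""
    steps = [part.strip() for part in strategy.split("->")]
    unknown = [step for step in steps if step not in optimizer_keys]
    if unknown:
        raise ValueError(
            f"Strategy {strategy!r} uses unknown keys {unknown}; "
            f"available keys: {sorted(optimizer_keys)}"
        )
    return steps


print(parse_strategy("p -> w -> p", {"p", "w"}))  # ['p', 'w', 'p']

try:
    parse_strategy("p -> x", {"p", "w"})
except ValueError as err:
    print(err)
```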

Inspecting results

After compilation, examine all candidate programs:

compiled = optimizer.compile(program, trainset=trainset, valset=valset)

# See all candidates ranked by score
for candidate in compiled.candidate_programs:
    print(f"Strategy step: {candidate['strategy']}, Score: {candidate['score']:.1f}%")

# Check if any errors occurred
if compiled.flag_compilation_error_occurred:
    print("Warning: one or more optimization steps failed")

Error handling

BetterTogether has built-in resilience. If any optimization step fails:

It logs the error and continues to the next step
Returns the best program found before the failure
Sets flag_compilation_error_occurred = True on the result

Always check this flag in production workflows.
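One way to make that check routine is a small report helper that reads the two attributes described under "Return value". This is a sketch: `summarize_compilation` is hypothetical, and the `SimpleNamespace` object with made-up scores and labels only stands in for a real compiled program.

```python
from types import SimpleNamespace


def summarize_compilation(compiled):
    """Report the error flag and the recorded candidates of a compiled program."""
    candidates = getattr(compiled, "candidate_programs", [])
    return {
        "had_error": bool(getattr(compiled, "flag_compilation_error_occurred", False)),
        "n_candidates": len(candidates),
        "steps": [c["strategy"] for c in candidates],
    }


# Stand-in for a compiled program whose later step failed (values are made up).
fake_compiled = SimpleNamespace(
    candidate_programs=[{"program": None, "score": 81.0, "strategy": "p"}],
    flag_compilation_error_occurred=True,
)

report = summarize_compilation(fake_compiled)
if report["had_error"]:
    print(f"Warning: only {report['n_candidates']} candidate(s) recorded: {report['steps']}")
```

In a production workflow you might raise an exception instead of printing, so a partially failed run cannot be deployed unnoticed.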

Gotchas

Claude forgets set_lm() and relies on global dspy.configure(). BetterTogether requires every predictor to have an explicit LM assignment via program.set_lm(lm). Without it, the weight optimizer cannot identify which model to fine-tune and raises an error. Always call set_lm() on the program before compile().
Claude jumps straight to BetterTogether without trying simpler optimizers first. BetterTogether costs 2-3x more than a single optimizer and takes hours. If you have not tried MIPROv2 or BootstrapFinetune individually first, start there — BetterTogether is only worth it when individual optimizers have plateaued.
Claude omits the validation set. Without a valset (or valset_ratio > 0), BetterTogether returns the latest program instead of the best-scoring one across all phases. Always provide a valset or leave valset_ratio=0.1 so the optimizer can select the best candidate.
Claude uses the same trainset for both training and validation. If valset overlaps with trainset, the optimizer selects based on inflated scores. Use a held-out split or let BetterTogether auto-split via valset_ratio.
Claude does not check flag_compilation_error_occurred after compile. If a fine-tuning step fails silently (API timeout, quota exceeded), BetterTogether returns the best program found before the failure. Always check compiled.flag_compilation_error_occurred and inspect compiled.candidate_programs to verify which steps completed.
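The train/validation overlap gotcha can be caught with a quick check before calling compile(). A minimal sketch: `find_overlap` is a hypothetical helper, and plain dicts stand in for examples here. For real example objects, pass a `key` function that reads the input field you care about.

```python
def find_overlap(trainset, valset, key=lambda ex: ex["text"]):
    """Return validation examples whose key also appears in the training set."""
    train_keys = {key(ex) for ex in trainset}
    return [ex for ex in valset if key(ex) in train_keys]


# Plain dicts stand in for labeled examples.
trainset = [{"text": "refund please", "category": "billing"},
            {"text": "app crashes", "category": "bug"}]
valset = [{"text": "app crashes", "category": "bug"},
          {"text": "reset password", "category": "account"}]

leaked = find_overlap(trainset, valset)
print(len(leaked))  # 1
```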

Additional resources

dspy.BetterTogether API docs
reference.md — constructor parameters, compile() method, key behaviors
examples.md — combined optimization workflow, two-phase strategy with custom optimizers

Cross-references

Install any skill: npx skills add lebsral/DSPy-Programming-not-prompting-LMs-skills --skill <name>

For the full fine-tuning workflow, see /ai-fine-tuning
For prompt optimization alone, see /ai-improving-accuracy
For evaluation and metrics, see /dspy-evaluate
For data preparation, see /dspy-data
Install /ai-do if you do not have it — it routes any AI problem to the right skill and is the fastest way to work: npx skills add lebsral/DSPy-Programming-not-prompting-LMs-skills --skill ai-do
