dspy-ensemble

Stars: 6

Forks: 1

Updated: June 13, 2026, 13:41

Use when you have multiple optimized versions of a program and want to combine them — voting, averaging, or routing across program variants for more robust outputs. Common scenarios - you have optimized several versions of a program and want to combine the best ones, using majority voting across multiple programs for higher accuracy, building a robust system by routing to different specialized programs, or reducing variance by averaging outputs. Related - ai-improving-accuracy, ai-making-consistent, dspy-bootstrap-rs. Also used for dspy.Ensemble, combine multiple optimized programs, majority voting across models, ensemble of DSPy programs, voting for reliability, reduce variance with multiple programs, aggregate predictions, combine outputs from different optimizers, when one program is not reliable enough, model committee, ensemble for production robustness, multiple programs one answer.

Source

lebsral

lebsral/DSPy-Programming-not-prompting-LMs-skills

Combine Programs with dspy.Ensemble

Guide the user through using DSPy's Ensemble optimizer to combine multiple optimized programs into a single ensemble that aggregates their outputs. This is useful when you have run several optimization passes (different optimizers, different hyperparameters, different random seeds) and want to combine them for more robust predictions.

What is Ensemble

dspy.Ensemble is an optimizer (teleprompter) that takes a list of DSPy programs and returns a single EnsembledProgram. When you call the ensembled program, it runs each constituent program on the same inputs and aggregates the results using a reduce function you provide.

Program A ──┐
Program B ──┼──> Run all ──> reduce_fn ──> Single output
Program C ──┘

Unlike other optimizers that tune prompts or weights, Ensemble does not change the programs themselves. It combines their outputs at inference time.
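
The run-all-then-reduce flow in the diagram can be sketched in plain Python. This is an illustrative stand-in rather than DSPy's implementation: make_ensemble, plurality, and the three lambda "programs" are hypothetical names, and ordinary functions take the place of DSPy modules.

```python
from collections import Counter

def make_ensemble(programs, reduce_fn=None):
    """Return a callable that runs every program on the same inputs and reduces the outputs."""
    def ensembled(**inputs):
        outputs = [program(**inputs) for program in programs]
        # With no reduce function, hand back the raw list of outputs
        return reduce_fn(outputs) if reduce_fn else outputs
    return ensembled

def plurality(outputs):
    """Pick the most common output."""
    return Counter(outputs).most_common(1)[0][0]

# Three hypothetical "programs" that disagree on one input
program_a = lambda question: "Berlin"
program_b = lambda question: "Berlin"
program_c = lambda question: "Munich"

ensemble = make_ensemble([program_a, program_b, program_c], reduce_fn=plurality)
answer = ensemble(question="What is the capital of Germany?")
print(answer)  # Berlin
```

The programs themselves are never modified; only their outputs are combined, which is the property the paragraph above describes.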

When to use Ensemble

You ran multiple optimization passes (e.g., several BootstrapFewShot runs over differently shuffled training sets) and want to combine the best of each
You want majority voting -- run several programs and pick the most common answer for higher reliability
You want to average numeric outputs -- combine scores or probabilities from multiple models
Different optimizers produced different strengths -- one program is good at precision, another at recall, and you want both
You need a quick reliability boost -- ensembling is a well-known technique to reduce variance

Do not use Ensemble when:

You only have one program (nothing to ensemble)
Latency is critical -- ensembling runs every program, multiplying your inference time
Cost is a hard constraint -- you pay for every program in the ensemble
Your programs produce complex structured outputs that are hard to aggregate

Basic usage

import dspy

lm = dspy.LM("openai/gpt-4o-mini")  # or "anthropic/claude-sonnet-4-5-20250929", etc.
dspy.configure(lm=lm)

# 1. Define your base program
qa = dspy.ChainOfThought("question -> answer")

# 2. Create a training set and metric
trainset = [
    dspy.Example(question="What is the capital of France?", answer="Paris").with_inputs("question"),
    dspy.Example(question="What is 2 + 2?", answer="4").with_inputs("question"),
    # ... more examples
]

def exact_match(example, pred, trace=None):
    return pred.answer.strip().lower() == example.answer.strip().lower()

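# 3. Run multiple optimization passes to get different programs.
#    Identical runs tend to produce identical programs, so vary something
#    between passes; here each pass sees a differently shuffled trainset.
import random

programs = []
for seed in range(3):
    shuffled = list(trainset)
    random.Random(seed).shuffle(shuffled)
    optimizer = dspy.BootstrapFewShot(
        metric=exact_match,
        max_bootstrapped_demos=4,
        max_labeled_demos=4,
    )
    optimized = optimizer.compile(qa, trainset=shuffled)
    programs.append(optimized)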

# 4. Combine with Ensemble using majority voting
ensemble_optimizer = dspy.Ensemble(reduce_fn=dspy.majority, size=None)
ensemble_program = ensemble_optimizer.compile(programs)

# 5. Use the ensemble like any module
result = ensemble_program(question="What is the capital of Germany?")
print(result.answer)

Constructor parameters

dspy.Ensemble(
    reduce_fn=None,     # Function to aggregate outputs from all programs
    size=None,          # How many programs to sample (None = use all)
    deterministic=False,  # Must be False (deterministic mode not yet implemented)
)

Parameter	Type	Description
`reduce_fn`	`Callable | None`	Aggregation function applied to the list of outputs. If `None`, returns the raw list of predictions.
`size`	`int | None`	Number of programs to randomly sample from the ensemble. `None` means use all programs.
`deterministic`	`bool`	Reserved for future use. Must be `False`.

compile method

ensemble_optimizer.compile(programs)

Parameter	Type	Description
`programs`	`list[dspy.Module]`	List of DSPy programs to ensemble

Returns an EnsembledProgram that runs the selected programs and applies reduce_fn.

Reduce functions

The reduce function determines how outputs from multiple programs are combined into a single result.

dspy.majority (built-in)

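The most common reduce function. It implements majority voting: by default it votes on the last output field of the predictions (answer for a "question -> answer" signature), normalizes the text before counting, and returns a prediction carrying the most frequent value.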

ensemble = dspy.Ensemble(reduce_fn=dspy.majority)

Use dspy.majority when:

Outputs are categorical (classification labels, short factual answers, yes/no)
You want the most robust answer -- the one most programs agree on
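
As a rough illustration of voting on a single field, the sketch below counts normalized answers and returns the first prediction whose answer matches the winner. It uses types.SimpleNamespace as a stand-in for prediction objects; vote_on_field and its strip-and-lowercase normalization are assumptions made for this example and are not the exact behavior of dspy.majority.

```python
from collections import Counter
from types import SimpleNamespace

def vote_on_field(predictions, field="answer"):
    """Return the first prediction whose normalized `field` value is the most common one."""
    normalize = lambda value: str(value).strip().lower()
    counts = Counter(normalize(getattr(p, field)) for p in predictions)
    winner, _ = counts.most_common(1)[0]
    return next(p for p in predictions if normalize(getattr(p, field)) == winner)

# Stand-ins for the predictions of three programs
predictions = [
    SimpleNamespace(answer="Paris"),
    SimpleNamespace(answer=" paris "),
    SimpleNamespace(answer="Lyon"),
]
result = vote_on_field(predictions)
print(result.answer)  # Paris
```

Normalizing before counting matters for free-text answers: without it, "Paris" and " paris " would split the vote.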

Custom reduce: averaging numeric outputs

import dspy

def average_scores(predictions):
    """Average a numeric output field across all predictions."""
    scores = [float(p.score) for p in predictions]
    avg = sum(scores) / len(scores)
    # Return a new Prediction carrying the averaged score as a float
    return dspy.Prediction(score=avg)

ensemble = dspy.Ensemble(reduce_fn=average_scores)

Custom reduce: weighted voting

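from collections import defaultdict
import dspy

def weighted_vote(predictions):
    """Pick the answer with the highest total confidence across programs."""
    totals = defaultdict(float)
    for p in predictions:
        # Assumes each program may output a numeric `confidence` field;
        # predictions without one count as a plain vote of weight 1.0
        totals[p.answer] += float(getattr(p, "confidence", 1.0))
    winner = max(totals, key=totals.get)
    # Return a prediction with the winning answer
    return dspy.Prediction(answer=winner)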

ensemble = dspy.Ensemble(reduce_fn=weighted_vote)

No reduce function

If you pass reduce_fn=None, the ensembled program returns the raw list of predictions from all programs. This is useful when you want to implement custom post-processing logic outside the ensemble.

ensemble = dspy.Ensemble(reduce_fn=None)
ensemble_program = ensemble.compile(programs)

# Returns a list of predictions
all_predictions = ensemble_program(question="What is DSPy?")
# Process them yourself
for pred in all_predictions:
    print(pred.answer)
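
One kind of post-processing that is easier to do outside the ensemble is abstaining when the programs disagree too much. The sketch below is a hypothetical helper (answer_with_agreement and the 0.6 threshold are assumptions for illustration) applied to SimpleNamespace stand-ins for the raw list of predictions.

```python
from collections import Counter
from types import SimpleNamespace

def answer_with_agreement(predictions, min_agreement=0.6):
    """Return (answer, agreement), with answer set to None when consensus is too low."""
    votes = Counter(p.answer for p in predictions)
    answer, count = votes.most_common(1)[0]
    agreement = count / len(predictions)
    return (answer if agreement >= min_agreement else None), agreement

# Stand-ins for the list an ensemble returns when reduce_fn=None
all_predictions = [
    SimpleNamespace(answer="a framework"),
    SimpleNamespace(answer="a framework"),
    SimpleNamespace(answer="a database"),
]
answer, agreement = answer_with_agreement(all_predictions)
```

Here two of three programs agree, so the answer is kept; raising min_agreement to 0.9 would make the helper return None for the same inputs, which a caller could treat as a signal to escalate or retry.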

Combining different optimizers

One of the most powerful uses of Ensemble is combining programs from different optimization strategies. Each optimizer may find different strengths.

import dspy

lm = dspy.LM("openai/gpt-4o-mini")  # or "anthropic/claude-sonnet-4-5-20250929", etc.
dspy.configure(lm=lm)

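qa = dspy.ChainOfThought("question -> answer")

# Assumes `trainset` and a metric function `metric` are defined as in
# Basic usage (for example, metric = exact_match)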

# Program 1: Optimized with BootstrapFewShot
opt1 = dspy.BootstrapFewShot(metric=metric, max_bootstrapped_demos=4)
prog1 = opt1.compile(qa, trainset=trainset)

# Program 2: Optimized with MIPROv2
opt2 = dspy.MIPROv2(metric=metric, auto="light")
prog2 = opt2.compile(qa, trainset=trainset)

# Program 3: Optimized with BootstrapFewShotWithRandomSearch
opt3 = dspy.BootstrapFewShotWithRandomSearch(
    metric=metric,
    max_bootstrapped_demos=4,
    num_candidate_programs=5,
)
prog3 = opt3.compile(qa, trainset=trainset)

# Ensemble all three
ensemble = dspy.Ensemble(reduce_fn=dspy.majority)
combined = ensemble.compile([prog1, prog2, prog3])

result = combined(question="What is the tallest mountain?")
print(result.answer)

This approach works because different optimizers explore different parts of the prompt space. BootstrapFewShot searches for good demonstrations, MIPROv2 searches over instructions together with demonstrations, and combining the resulting programs via voting smooths out individual weaknesses.

Sampling with size

When you have many optimized programs (e.g., from a large random search), you can use size to randomly sample a subset at inference time. This reduces cost while still benefiting from diversity.

# You have 10 programs from BootstrapFewShotWithRandomSearch
programs = [...]  # 10 optimized programs

# Only run 3 of them per inference call (randomly sampled)
ensemble = dspy.Ensemble(reduce_fn=dspy.majority, size=3)
ensemble_program = ensemble.compile(programs)

Each call to ensemble_program randomly picks 3 of the 10 programs, runs them, and applies majority voting. This balances diversity against cost.
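
The per-call sampling described above can be sketched without DSPy. In this illustration, make_sampling_ensemble and make_program are hypothetical helpers, the "programs" are plain functions that record which of them ran, and a seeded random.Random is passed in only to make the example repeatable.

```python
import random

def make_sampling_ensemble(programs, size, reduce_fn, rng=random):
    """Run a random subset of `size` programs on each call, then reduce their outputs."""
    def ensembled(**inputs):
        chosen = rng.sample(programs, size)  # sampling without replacement
        return reduce_fn([program(**inputs) for program in chosen])
    return ensembled

# Ten hypothetical programs that record which of them ran
calls = []

def make_program(index):
    def program(question):
        calls.append(index)
        return "42"
    return program

programs = [make_program(i) for i in range(10)]
ensemble = make_sampling_ensemble(
    programs, size=3, reduce_fn=lambda outputs: outputs[0], rng=random.Random(0)
)
result = ensemble(question="What is 6 x 7?")
print(len(calls))  # 3
```

Only three of the ten programs run per call, so cost scales with size rather than with the number of programs available.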

Cost and latency considerations

Ensemble multiplies your inference cost and latency by the number of programs (or size if set):

Programs	Cost multiplier	Latency (sequential)
3	3x	3x
5	5x	5x
10	10x	10x

Ways to manage this:

Use size to cap the number of programs run per inference call
Use cheaper models for the ensemble members and reserve expensive models for critical paths
Ensemble at evaluation time only to pick the single best program, then deploy that one program in production
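Parallelize if your infrastructure supports concurrent LM calls -- the programs are independent, but the built-in ensembled program calls them one after another, so running them concurrently takes a wrapper of your own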
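
For the parallelization option, a minimal sketch with Python's standard thread pool might look like the following. The run_concurrently and plurality helpers are hypothetical, and plain functions stand in for programs; whether real DSPy modules can be called from worker threads this way is something to check against the library's documentation before relying on it.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

def run_concurrently(programs, reduce_fn, **inputs):
    """Call every program with the same inputs in parallel threads, then reduce."""
    with ThreadPoolExecutor(max_workers=len(programs)) as pool:
        futures = [pool.submit(program, **inputs) for program in programs]
        outputs = [future.result() for future in futures]  # keeps program order
    return reduce_fn(outputs)

def plurality(outputs):
    """Pick the most common output."""
    return Counter(outputs).most_common(1)[0][0]

# Hypothetical programs standing in for ensemble members
programs = [lambda question: "4", lambda question: "4", lambda question: "5"]
answer = run_concurrently(programs, plurality, question="What is 2 + 2?")
```

Running members concurrently reduces wall-clock latency toward that of the slowest program, but the cost multiplier stays the same because every program is still called.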

Ensemble vs BestOfN

Both combine multiple outputs, but they work differently:

	Ensemble	BestOfN
What it combines	Different optimized programs	Multiple runs of the same program
Selection method	Voting / averaging across programs	Reward function picks the best single run
Diversity source	Different prompts/demos from optimization	Temperature sampling of the same prompt
When to use	You have multiple optimized programs	You have one program and a scoring metric
Where it operates	At the program level (an optimizer that wraps several programs)	At the inference level (a module that wraps one program)

You can even stack them: ensemble multiple optimized programs, then wrap the ensemble with BestOfN for additional quality.
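
The difference between the two selection methods can be made concrete with a simplified sketch. The ensemble_vote and best_of_n functions below are hypothetical illustrations of the two ideas, not the signatures of dspy.Ensemble or dspy.BestOfN, and the "programs" are plain functions.

```python
import itertools
from collections import Counter

def ensemble_vote(programs, **inputs):
    """Ensemble style: one run each of several different programs, combined by voting."""
    outputs = [program(**inputs) for program in programs]
    return Counter(outputs).most_common(1)[0][0]

def best_of_n(program, n, reward_fn, **inputs):
    """Best-of-N style: n runs of one program, keeping the run with the highest reward."""
    runs = [program(**inputs) for _ in range(n)]
    return max(runs, key=reward_fn)

# Hypothetical programs; `sampler` returns a different draft on each call
programs = [lambda text: "positive", lambda text: "positive", lambda text: "negative"]
drafts = itertools.cycle(["ok", "a longer answer", "mid"])
sampler = lambda text: next(drafts)

voted = ensemble_vote(programs, text="Great product")
best = best_of_n(sampler, n=3, reward_fn=len, text="Great product")
```

Voting needs no scoring function but requires outputs that can be compared for equality; best-of-N needs a reward function (here simply len, chosen only for the demonstration) but works with a single program.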

Gotchas

Claude passes a single program instead of a list to compile(). Ensemble.compile() expects a list[dspy.Module], not a single module. Always wrap even two programs in a list: ensemble.compile([prog1, prog2]).
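Claude assumes each ensemble member remembers the LM it was optimized with. Optimizing a program under a particular dspy.configure(lm=...) call does not, in general, bind that LM to the program: at call time each predictor uses its own lm if one was set explicitly and otherwise falls back to the globally configured LM. To run ensemble members on different LMs, bind each one explicitly (for example with program.set_lm(lm)) before compiling the ensemble, and make sure a default LM is configured for the rest.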
Claude sets deterministic=True expecting reproducible sampling. The deterministic parameter is reserved but not yet implemented -- setting it to True fails an assertion in the constructor. Leave it at the default False.
Claude uses Ensemble when BestOfN is the right tool. Ensemble combines different optimized programs. If you have one program and want to run it multiple times with temperature sampling and pick the best output, use dspy.BestOfN instead.
Claude builds a custom reduce function that returns a raw string instead of a Prediction. The reduce_fn receives a list of dspy.Prediction objects and must return a dspy.Prediction (or compatible object). Returning a plain string breaks downstream field access.
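
The last gotcha is easy to reproduce in isolation. In this sketch, types.SimpleNamespace stands in for dspy.Prediction and both reduce functions are hypothetical; in real code the second one would return dspy.Prediction(answer=...).

```python
from types import SimpleNamespace

def reduce_to_string(predictions):
    """Problematic: returns a bare string, so the result has no .answer field."""
    return predictions[0].answer

def reduce_to_prediction(predictions):
    """Better: returns an object exposing the field that callers expect."""
    return SimpleNamespace(answer=predictions[0].answer)

# Stand-ins for the predictions passed to a reduce function
predictions = [SimpleNamespace(answer="Paris"), SimpleNamespace(answer="Paris")]

good = reduce_to_prediction(predictions)
bad = reduce_to_string(predictions)

print(good.answer)             # Paris
print(hasattr(bad, "answer"))  # False: a str has no attribute "answer"
```

Any caller that does result.answer on the string-returning version raises AttributeError, which is why the reduce function should return a prediction-like object.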

Additional resources

Ensemble API docs
reference.md -- constructor parameters, compile method, reduce function protocol
examples.md -- worked examples with majority voting and multi-model ensembles

Cross-references

Install any skill: npx skills add lebsral/DSPy-Programming-not-prompting-LMs-skills --skill <name>

BestOfN for picking the best from multiple runs of a single program -- see /dspy-best-of-n
BootstrapFewShot for generating the programs to ensemble -- see /ai-improving-accuracy
MIPROv2 for instruction optimization -- see /ai-improving-accuracy
Evaluate for measuring ensemble quality with metrics -- see /dspy-evaluate
For worked examples, see examples.md
Install /ai-do if you do not have it — it routes any AI problem to the right skill and is the fastest way to work: npx skills add lebsral/DSPy-Programming-not-prompting-LMs-skills --skill ai-do