ai-cutting-costs

Stars: 6

Forks: 1

Updated: June 13, 2026, 13:41

Reduce your AI API bill. Use when AI costs are too high, API calls are too expensive, you want to use cheaper models, optimize token usage, reduce LLM spending, route easy questions to cheap models, or make your AI feature more cost-effective. Also used for GPT-4 costs too much for production, AI bill keeps growing, how to reduce OpenAI costs, optimize LLM token usage, smart model routing saves money, prompt is too long and expensive, cheaper than GPT-4 with same quality.

Source: lebsral/DSPy-Programming-not-prompting-LMs-skills

Cut Your AI Costs

Guide the user through reducing AI API costs without sacrificing quality. The strategies below range from quick wins to advanced techniques.

Step 1: Understand where the money goes

Ask the user:

Which provider/model are you using? (GPT-4o, Claude, etc.)
How many API calls per day/month?
Is there a specific module or step that's most expensive?

Quick cost audit

import dspy

# Run your program and check token usage
lm = dspy.LM("openai/gpt-4o-mini")  # or "anthropic/claude-sonnet-4-5-20250929", etc.
dspy.configure(lm=lm)

result = my_program(question="test")
dspy.inspect_history(n=3)  # Prints the last 3 prompts and responses; per-call token usage is recorded in lm.history
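To turn the audit into numbers, the recorded calls can be totalled. The sketch below assumes each record in `lm.history` is a dict with a `usage` mapping that holds `prompt_tokens` and `completion_tokens` (OpenAI-style names); the exact shape may differ by provider and DSPy version, so the sample records are stand-ins rather than real output.

```python
from typing import Iterable, Mapping


def summarize_usage(history: Iterable[Mapping]) -> dict:
    """Sum prompt and completion tokens over LM call records."""
    totals = {"calls": 0, "prompt_tokens": 0, "completion_tokens": 0}
    for entry in history:
        usage = entry.get("usage") or {}
        totals["calls"] += 1
        totals["prompt_tokens"] += usage.get("prompt_tokens", 0) or 0
        totals["completion_tokens"] += usage.get("completion_tokens", 0) or 0
    return totals


# Stand-in records shaped like the assumed history entries.
sample_history = [
    {"usage": {"prompt_tokens": 420, "completion_tokens": 35}},
    {"usage": {"prompt_tokens": 380, "completion_tokens": 50}},
    {"usage": {}},  # e.g. a call that reported no usage
]
print(summarize_usage(sample_history))
```

With a real program, passing `lm.history` in place of `sample_history` should give per-run totals, provided the assumed record shape holds.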

Step 2: Quick wins

Use a cheaper model everywhere

The simplest fix — switch to a cheaper model and see if quality holds:

# Instead of GPT-4o (~$5/M input tokens; provider prices change, so check current rates)
lm = dspy.LM("openai/gpt-4o-mini")  # or "anthropic/claude-haiku-4-5-20251001", etc. — ~$0.15/M input tokens

# Or use an open-source model
lm = dspy.LM("together_ai/meta-llama/Llama-3-70b-chat-hf")  # or any provider DSPy supports

Always measure quality before and after with /ai-improving-accuracy. When you switch models, re-optimize your prompts — they don't transfer. See /ai-switching-models for the full workflow.
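A quick back-of-the-envelope calculation shows what a model switch is worth before you run any evaluation. This sketch covers input tokens only and uses the approximate per-million prices quoted above; the workload numbers are hypothetical.

```python
def monthly_input_cost(calls_per_day: int, tokens_per_call: int,
                       price_per_million: float, days: int = 30) -> float:
    """Dollars spent on input tokens per month at a flat per-token price."""
    total_tokens = calls_per_day * tokens_per_call * days
    return total_tokens / 1_000_000 * price_per_million


# Hypothetical workload: 10,000 calls a day at 1,000 input tokens each.
CALLS_PER_DAY, TOKENS_PER_CALL = 10_000, 1_000
big = monthly_input_cost(CALLS_PER_DAY, TOKENS_PER_CALL, 5.00)
small = monthly_input_cost(CALLS_PER_DAY, TOKENS_PER_CALL, 0.15)
print(f"large model: ${big:,.2f}/month, small model: ${small:,.2f}/month")
```

Output tokens are usually priced higher than input tokens, so a full estimate would add a second term for them.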

Enable caching

DSPy caches LM calls by default. Make sure you're not disabling it:

# Caching is ON by default — same inputs won't re-call the API
lm = dspy.LM("openai/gpt-4o-mini")  # or "anthropic/claude-haiku-4-5-20251001", etc. — cached automatically

# To verify caching is working, run the same input twice
# and check that the second call is instant
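The "run it twice" check can be scripted. The helper below is a generic sketch that times two identical calls to any callable; `fake_program` is a hypothetical stand-in that mimics a cache miss followed by a hit, so the example runs without an API key.

```python
import time
from typing import Any, Callable


def time_two_calls(fn: Callable[..., Any], *args: Any, **kwargs: Any) -> tuple[float, float]:
    """Call fn twice with identical inputs; return (first_seconds, second_seconds)."""
    timings = []
    for _ in range(2):
        start = time.perf_counter()
        fn(*args, **kwargs)
        timings.append(time.perf_counter() - start)
    return timings[0], timings[1]


# Stand-in for a cached program: slow on a miss, near-instant on a hit.
_cache: dict[str, str] = {}


def fake_program(question: str) -> str:
    if question not in _cache:
        time.sleep(0.05)  # simulates the network round trip
        _cache[question] = f"answer to {question}"
    return _cache[question]


first, second = time_two_calls(fake_program, question="test")
print(f"first: {first:.3f}s, second: {second:.3f}s")
```

Swapping `fake_program` for your own program should show the same pattern when caching is active: a slow first call and a much faster second one.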

Step 3: Use different models for different tasks

Not every step in your pipeline needs the expensive model. Use dspy.context or set_lm to assign cheaper models to simpler steps:

expensive_lm = dspy.LM("openai/gpt-4o")  # or "anthropic/claude-sonnet-4-5-20250929", etc.
cheap_lm = dspy.LM("openai/gpt-4o-mini")  # or "anthropic/claude-haiku-4-5-20251001", etc.

dspy.configure(lm=expensive_lm)  # default

class MyPipeline(dspy.Module):
    def __init__(self):
        super().__init__()
        self.classify = dspy.ChainOfThought(ClassifySignature)
        self.generate = dspy.ChainOfThought(GenerateSignature)

    def forward(self, text):
        # Use cheap model for simple classification
        with dspy.context(lm=cheap_lm):
            category = self.classify(text=text)

        # Use expensive model only for complex generation
        return self.generate(text=text, category=category.label)

Per-module LM assignment

# Pin an LM to specific modules; set_lm applies it to every predictor inside the module
my_program.classify.set_lm(cheap_lm)
my_program.generate.set_lm(expensive_lm)

Step 4: Smart routing — cheap model for easy inputs, expensive for hard ones

Instead of sending everything to the expensive model, classify inputs by difficulty and route accordingly. This is the idea behind approaches such as FrugalGPT, whose authors report up to 98% cost reduction while matching GPT-4's performance on their benchmarks:

Route by complexity

from typing import Literal

class ComplexityRouter(dspy.Module):
    def __init__(self):
        super().__init__()
        self.assess = dspy.Predict(AssessComplexity)
        self.simple_handler = dspy.Predict(AnswerQuestion)
        self.complex_handler = dspy.ChainOfThought(AnswerQuestion)

    def forward(self, question):
        # Use the cheap model to decide complexity
        with dspy.context(lm=cheap_lm):
            assessment = self.assess(question=question)

        # Route to the right model
        if assessment.complexity == "simple":
            with dspy.context(lm=cheap_lm):
                return self.simple_handler(question=question)
        else:
            with dspy.context(lm=expensive_lm):
                return self.complex_handler(question=question)

class AssessComplexity(dspy.Signature):
    """Assess if this question needs a powerful model or a simple one can handle it."""
    question: str = dspy.InputField()
    complexity: Literal["simple", "complex"] = dspy.OutputField(
        desc="simple = factual/straightforward, complex = reasoning/nuanced"
    )

Cascading — try cheap first, fall back to expensive

class CascadingPipeline(dspy.Module):
    def __init__(self):
        super().__init__()
        self.answer = dspy.ChainOfThought(AnswerQuestion)
        self.verify = dspy.Predict(CheckConfidence)

    def forward(self, question):
        # Try cheap model first
        with dspy.context(lm=cheap_lm):
            result = self.answer(question=question)
            check = self.verify(question=question, answer=result.answer)

        # If cheap model isn't confident, escalate to expensive
        if not check.is_confident:
            with dspy.context(lm=expensive_lm):
                result = self.answer(question=question)

        return result

class CheckConfidence(dspy.Signature):
    """Is this answer confident and complete, or should we escalate to a better model?"""
    question: str = dspy.InputField()
    answer: str = dspy.InputField()
    is_confident: bool = dspy.OutputField()

Savings of 50-90% are typical when most traffic consists of simple questions that a cheap model handles well; the actual figure depends on the share of easy inputs.
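The two patterns have different cost structures: routing pays for the router on every request, while cascading pays for the cheap attempt and the confidence check on every request. The sketch below compares the expected cost per request under hypothetical per-call prices; none of the numbers are measurements.

```python
def routed_cost(p_easy: float, router: float, cheap: float, expensive: float) -> float:
    """Expected cost per request when a router picks one model per input."""
    return router + p_easy * cheap + (1 - p_easy) * expensive


def cascaded_cost(p_accept: float, cheap: float, verify: float, expensive: float) -> float:
    """Expected cost per request when every input tries the cheap model first."""
    return cheap + verify + (1 - p_accept) * expensive


# Hypothetical per-call costs in dollars.
ROUTER, CHEAP, VERIFY, EXPENSIVE = 0.0001, 0.0003, 0.0001, 0.01

baseline = EXPENSIVE  # always calling the expensive model
routed = routed_cost(0.7, ROUTER, CHEAP, EXPENSIVE)
cascaded = cascaded_cost(0.7, CHEAP, VERIFY, EXPENSIVE)
print(f"routing saves {1 - routed / baseline:.0%}, "
      f"cascading saves {1 - cascaded / baseline:.0%}")
```

Here `p_easy` is the share of inputs the router sends to the cheap model and `p_accept` is the share of cheap answers that pass the confidence check; both have to be measured on your own traffic.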

Step 5: Reduce prompt length

Long prompts = more tokens = more cost.

Reduce few-shot examples

# Fewer demos = shorter prompts = lower cost
optimizer = dspy.BootstrapFewShot(
    metric=metric,
    max_bootstrapped_demos=2,   # down from the default of 4
    max_labeled_demos=2,        # down from the default of 16
)
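To see how much the demo count matters, the prompt size can be estimated as fixed instructions plus a per-demo cost plus the user input. The token counts below are hypothetical placeholders; measure your own from a real prompt.

```python
def prompt_tokens(instructions: int, per_demo: int, n_demos: int, user_input: int) -> int:
    """Rough prompt size: fixed instructions, n demos, and the user's input."""
    return instructions + per_demo * n_demos + user_input


# Hypothetical sizes: 150-token instructions, 200 tokens per demo, 100-token input.
before = prompt_tokens(150, 200, n_demos=8, user_input=100)  # 4 bootstrapped + 4 labeled
after = prompt_tokens(150, 200, n_demos=4, user_input=100)   # 2 bootstrapped + 2 labeled
print(f"{before} -> {after} tokens per call ({1 - after / before:.0%} fewer)")
```

Because demos often dominate the prompt, halving them can remove a large share of the input tokens, which is why quality needs to be re-measured afterwards.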

Reduce retrieved passages

# Fewer passages = shorter context
class DocSearch(dspy.Module):
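    def __init__(self):
        super().__init__()
        self.retrieve = dspy.Retrieve(k=2)  # down from 5
        self.answer = dspy.ChainOfThought(AnswerSignature)

    def forward(self, question):
        passages = self.retrieve(question).passages  # requires a configured retriever
        return self.answer(context=passages, question=question)  # assumes these input fields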

Simplify signatures

# Verbose — costs more tokens
class Verbose(dspy.Signature):
    """Given the following text, carefully analyze the content and provide a detailed classification."""
    text: str = dspy.InputField(desc="The full text content to be analyzed and classified")
    label: str = dspy.OutputField(desc="The classification label for this text")

# Concise — same quality, fewer tokens
class Concise(dspy.Signature):
    """Classify the text."""
    text: str = dspy.InputField()
    label: str = dspy.OutputField()

Step 6: Fine-tune a cheap model (advanced)

The biggest cost saver: train a small cheap model to do what the expensive model does. Distill from an expensive teacher to a cheap student:

# Build and optimize with the expensive model, then fine-tune a cheap one
optimizer = dspy.BootstrapFinetune(metric=metric, num_threads=24)
finetuned = optimizer.compile(my_program, trainset=trainset, teacher=teacher_optimized)

Requirements: 500+ training examples, a fine-tunable model. Typical savings: 10-50x cost reduction with 85-95% quality retention.
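Fine-tuning has an upfront cost, so it only pays off above a certain call volume. The sketch below computes that break-even point; the training cost and per-call prices are hypothetical, and the function ignores hosting and retraining costs.

```python
import math


def breakeven_calls(training_cost: float, cost_before: float, cost_after: float) -> int:
    """Number of calls after which per-call savings repay the training cost."""
    saving = cost_before - cost_after
    if saving <= 0:
        raise ValueError("the fine-tuned model must be cheaper per call")
    return math.ceil(training_cost / saving)


# Hypothetical: $200 of training, $0.01 per call before, $0.0005 per call after.
calls = breakeven_calls(200.0, 0.01, 0.0005)
print(f"break-even after {calls:,} calls")
```

If your monthly volume is well below the break-even figure, the cheaper steps above are likely the better investment.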

For the complete model distillation workflow (decision framework, prerequisites, BetterTogether, troubleshooting), see /ai-fine-tuning.

Step 7: Use `Predict` instead of `ChainOfThought` where possible

ChainOfThought adds a reasoning step which uses extra tokens. For simple tasks, Predict may be sufficient:

# ChainOfThought — more tokens, better for complex tasks
classifier = dspy.ChainOfThought(ClassifySignature)

# Predict — fewer tokens, fine for simple tasks
classifier = dspy.Predict(ClassifySignature)

Test with /ai-improving-accuracy to make sure quality doesn't drop.

Saturation-aware early stopping

When running prompt optimization (especially with GEPA or MIPROv2), monitor for score plateaus. Stopping early when the optimizer saturates can save 30-40% of optimization compute. See /dspy-gepa for saturation diagnosis details.
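A plateau check can be as simple as comparing the best recent score with the best earlier one. The function below is a generic sketch, not part of the GEPA or MIPROv2 API; `patience` and `min_delta` are hypothetical knobs to tune for your metric.

```python
from typing import Sequence


def saturated(scores: Sequence[float], patience: int = 3, min_delta: float = 0.005) -> bool:
    """True when the last `patience` rounds improved the best score by less than min_delta."""
    if len(scores) <= patience:
        return False  # not enough history to judge
    best_before = max(scores[:-patience])
    best_recent = max(scores[-patience:])
    return best_recent - best_before < min_delta


# Hypothetical validation scores from successive optimization rounds.
history = [0.60, 0.68, 0.72, 0.721, 0.722, 0.72]
print(saturated(history))
```

One way to use it is to log the validation score after each round and stop, or lower the remaining budget, once the check returns true.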

Cost reduction checklist

Switch to a cheaper model (measure quality first)
Verify caching is enabled
Use cheap models for simple steps, expensive for complex
Route easy inputs to cheap models, hard ones to expensive (Step 4)
Reduce few-shot examples (2 instead of 4)
Reduce retrieved passages
Use Predict instead of ChainOfThought for simple tasks
Fine-tune a cheap model for production (if 500+ examples available)

Gotchas

Don't reuse prompts optimized for the old model after switching. Claude tends to keep the expensive model's optimized prompts when switching to a cheaper model. Prompts don't transfer between models, so always re-run your optimizer after changing the LM. See /ai-switching-models.
Don't use ChainOfThought for the complexity router itself. The router in Step 4 should use dspy.Predict, not dspy.ChainOfThought — adding reasoning to the routing step defeats the purpose of saving tokens on easy inputs.
Don't cut demos to zero and expect quality to hold. Reducing max_bootstrapped_demos from 4 to 2 is fine; setting it to 0 removes the bootstrapped few-shot examples, and quality often drops sharply. Keep at least 1-2 demos.
Don't forget to measure before and after every cost change. Claude often applies multiple cost optimizations at once without baselining. Run dspy.Evaluate on a fixed dev set before and after each change so you can attribute quality drops to the specific optimization that caused them.
Don't cache non-deterministic calls and expect reproducibility. If temperature > 0, cached results lock in one sample. Set temperature=0 for deterministic caching, or disable caching for calls where you want diversity.
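Measuring one change at a time is easier with a small log of runs. The sketch below records a score and a cost for each change and reports the deltas against the previous run; the `Run` fields and the sample numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Run:
    change: str   # what was changed since the previous run
    score: float  # metric on a fixed dev set
    cost: float   # dollars per 1,000 calls


def report(runs: list[Run]) -> list[tuple[str, float, float]]:
    """Return (change, score delta, cost delta) relative to the previous run."""
    rows = []
    for prev, cur in zip(runs, runs[1:]):
        rows.append((cur.change,
                     round(cur.score - prev.score, 4),
                     round(cur.cost - prev.cost, 4)))
    return rows


runs = [
    Run("baseline", 0.86, 10.0),
    Run("cheaper model", 0.84, 1.2),
    Run("Predict instead of ChainOfThought", 0.78, 0.9),
]
for change, d_score, d_cost in report(runs):
    print(f"{change}: score {d_score:+.2f}, cost {d_cost:+.2f}")
```

In this made-up log the second change costs far more quality than it saves in dollars, which is the kind of trade-off that stays hidden when several optimizations are applied at once.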

When NOT to optimize costs

Do not cut costs if you have not baselined quality first. Optimizing costs on a system that already underperforms just locks in bad results at a lower price. Fix accuracy first with /ai-improving-accuracy, then reduce costs.

Do not route to cheap models if your traffic is uniformly complex. The routing pattern (Step 4) saves money when most inputs are easy — if 90% of your inputs genuinely need the expensive model, routing adds latency and complexity for minimal savings.
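The break-even point follows from the expected cost: routing wins when router + p * cheap + (1 - p) * expensive is below expensive, which rearranges to p > router / (expensive - cheap). A minimal sketch with hypothetical per-call costs:

```python
def min_easy_fraction(router: float, cheap: float, expensive: float) -> float:
    """Smallest share of easy inputs at which routing beats always using the expensive model."""
    if expensive <= cheap:
        raise ValueError("the expensive model must cost more than the cheap one")
    return router / (expensive - cheap)


# Hypothetical per-call costs in dollars.
threshold = min_easy_fraction(router=0.001, cheap=0.001, expensive=0.011)
print(f"routing pays off once more than {threshold:.0%} of inputs are easy")
```

This only covers dollars; the added latency and the risk of misrouting hard inputs argue for a comfortable margin above the threshold.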

Do not fine-tune to save money if your use case changes frequently. Fine-tuned models are frozen in time — if your categories, policies, or domain shift monthly, the retraining cost and lag outweigh the per-call savings. Use prompt optimization instead.

Cross-references

Install any skill: npx skills add lebsral/DSPy-Programming-not-prompting-LMs-skills --skill <name>

Multi-step pipelines with per-stage model assignment — see /ai-building-pipelines
Measure quality before and after cost cuts — see /ai-improving-accuracy
Debug breakage from cost optimization — see /ai-fixing-errors
Switch models without breaking prompts — see /ai-switching-models
DSPy modules (Predict vs ChainOfThought tradeoffs) — see /dspy-modules
Fine-tuning workflow and decision framework — see /ai-fine-tuning
Install /ai-do if you do not have it — it routes any AI problem to the right skill and is the fastest way to work: npx skills add lebsral/DSPy-Programming-not-prompting-LMs-skills --skill ai-do
