dspy-advanced-workflow

Name: Dspy Advanced Workflow
Author: intertwine

// Drive a complete DSPy 3.2.x project end-to-end — spec → program → metric → baseline → GEPA optimize → export → deploy. Orchestrates the other four DSPy skills (dspy-fundamentals, dspy-evaluation-harness, dspy-gepa-optimizer, dspy-rlm-module) in the correct order. Use this for any non-trivial DSPy build from scratch.


stars:245

forks:22

updated: 2026-05-25 05:28

Same repository

dspy-evaluation-harness.md

from "intertwine/dspy-agent-skills"

Build DSPy evaluation harnesses with rich-feedback metrics that are essential for GEPA optimization. Use when writing a metric function, calling dspy.Evaluate, splitting dev/val sets, debugging "why is my optimizer not improving?", or designing CI-ready DSPy eval suites.

2026-05-25 · stars: 245

dspy-fundamentals.md

from "intertwine/dspy-agent-skills"

Write idiomatic DSPy 3.2.x programs — typed Signatures, dspy.Module subclasses, Predict/ChainOfThought/ReAct/ProgramOfThought, and save/load. Use this when starting any new DSPy project or when fixing non-idiomatic DSPy code (hard-coded prompts, ad-hoc string templates, untyped outputs, non-serializable classes).

2026-05-25 · stars: 245

dspy-gepa-optimizer.md

from "intertwine/dspy-agent-skills"

Optimize DSPy programs with dspy.GEPA — the reflective/evolutionary optimizer that is the 2026 gold standard for DSPy (beats MIPROv2 on complex tasks with far fewer rollouts when the metric returns rich feedback). Use when the user says optimize, compile, GEPA, reflective optimization, or "make this program better" and a DSPy program + metric + trainset exist.

2026-05-25 · stars: 245

dspy-rlm-module.md

from "intertwine/dspy-agent-skills"

Use dspy.RLM (Recursive Language Model) for reasoning over contexts too large to fit in an LLM's working window — entire codebases, long logs, massive documents, or multi-step data exploration that needs a sandboxed Python REPL. Use when the input is >100k tokens, needs recursive chunking, or benefits from the LLM writing and running code to probe data.

2026-04-21 · stars: 245

package.json

"author": "intertwine"

"repository": "intertwine/dspy-agent-skills"


name	dspy-advanced-workflow
description	Drive a complete DSPy 3.2.x project end-to-end — spec → program → metric → baseline → GEPA optimize → export → deploy. Orchestrates the other four DSPy skills (dspy-fundamentals, dspy-evaluation-harness, dspy-gepa-optimizer, dspy-rlm-module) in the correct order. Use this for any non-trivial DSPy build from scratch.
when_to_use	User wants to build, optimize, and ship a new DSPy pipeline; says "full workflow" / "end to end" / "from scratch"; or needs the standard loop applied to a greenfield task.

DSPy Advanced Workflow (2026)

This skill runs the seven-step loop that turns a natural-language task description into an optimized, saved, deployable DSPy program. Every step delegates to a specific skill — invoke them in order.

The seven steps

1. Spec

Rephrase the user's task in one sentence. Identify the inputs, the outputs, the quality axis that matters, and any constraints (latency, cost, tool access, context size). Then pick the predictor that matches the task shape:

| Task shape | Predictor |
| --- | --- |
| Single-step structured I/O | `dspy.Predict` / `dspy.ChainOfThought` |
| Tool use / multi-step | `dspy.ReAct` |
| Code execution | `dspy.ProgramOfThought` |
| Long context / codebase | `dspy.RLM` → `dspy-rlm-module` |
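
The table can be read as a small decision function. The sketch below is illustrative only: `choose_predictor`, its parameters, and the order in which the checks run are assumptions, and the 100,000-token cutoff is borrowed from the dspy-rlm-module description rather than from any DSPy API. It returns predictor names as strings, so it runs without DSPy installed.

```python
def choose_predictor(needs_tools: bool = False,
                     runs_code: bool = False,
                     context_tokens: int = 0,
                     long_context_threshold: int = 100_000) -> str:
    """Return the predictor name the table suggests (hypothetical helper)."""
    # Assumption: long context is checked first, since it constrains everything else.
    if context_tokens > long_context_threshold:
        return "dspy.RLM"
    if needs_tools:
        return "dspy.ReAct"
    if runs_code:
        return "dspy.ProgramOfThought"
    # Default row of the table: single-step structured I/O.
    return "dspy.ChainOfThought"


print(choose_predictor(needs_tools=True))
```

A task that matches several rows at once (for example, tool use over a very large codebase) needs a judgment call that a lookup like this cannot make.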

2. Program

Write the typed dspy.Signature + dspy.Module subclass per dspy-fundamentals. No hard-coded prompts. Keep predictors named so GEPA can target them.

3. Data

Build trainset and a separate valset, each a list of dspy.Example(...).with_inputs(...). For GEPA, maximize the trainset size and keep the valset just large enough to represent downstream behavior. Report on the held-out testset only once, at the end. See dspy-evaluation-harness.
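
One way to get a reproducible split is to shuffle once with a fixed seed and carve off small validation and test sets, leaving the rest for training. This is a minimal sketch: `split_examples`, the row layout, and the sizes are all hypothetical, and plain dicts stand in for examples so the block runs without DSPy.

```python
import random


def split_examples(rows, val_size, test_size, seed=0):
    """Shuffle with a fixed seed, then return (trainset, valset, testset)."""
    if val_size + test_size >= len(rows):
        raise ValueError("val_size + test_size must leave room for a trainset")
    shuffled = list(rows)                      # copy so the caller's list is untouched
    random.Random(seed).shuffle(shuffled)      # local RNG keeps the split reproducible
    testset = shuffled[:test_size]
    valset = shuffled[test_size:test_size + val_size]
    trainset = shuffled[test_size + val_size:]  # everything else goes to training
    return trainset, valset, testset


# Hypothetical rows; in a DSPy project each one would be wrapped as
# dspy.Example(**row).with_inputs("question") before use.
rows = [{"question": f"q{i}", "answer": f"a{i}"} for i in range(100)]
trainset, valset, testset = split_examples(rows, val_size=15, test_size=15)
print(len(trainset), len(valset), len(testset))  # 70 15 15
```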

4. Rich metric

Write rich_metric(gold, pred, trace=None, pred_name=None, pred_trace=None) so that it returns dspy.Prediction(score=..., feedback=...), where score is a float in [0, 1] and feedback is a natural-language critique. The feedback is load-bearing: it is what GEPA's reflection LM learns from. A dict with the same fields crashes dspy.Evaluate; only dspy.Prediction aggregates correctly. See dspy-evaluation-harness.
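
The scoring logic inside such a metric can be kept as a plain function that returns both the number and the critique. The sketch below is one possible scoring rule (token recall against the gold answer), not the rule this skill prescribes; `score_with_feedback` is a hypothetical helper. Inside `rich_metric`, its result would be wrapped as `dspy.Prediction(score=score, feedback=feedback)`.

```python
def score_with_feedback(gold_answer: str, pred_answer: str) -> tuple[float, str]:
    """Return (score in [0, 1], critique) using token recall against the gold answer."""
    gold_tokens = set(gold_answer.lower().split())
    pred_tokens = set(pred_answer.lower().split())
    if not gold_tokens:
        return 0.0, "Gold answer is empty; this example cannot be scored."

    missing = sorted(gold_tokens - pred_tokens)
    extra = sorted(pred_tokens - gold_tokens)
    score = len(gold_tokens & pred_tokens) / len(gold_tokens)

    # The critique names concrete problems so a reflection step has something to act on.
    notes = []
    if missing:
        notes.append("Missing expected terms: " + ", ".join(missing) + ".")
    if extra:
        notes.append("Unsupported extra terms: " + ", ".join(extra) + ".")
    if not notes:
        notes.append("Answer matches the gold answer.")
    return score, " ".join(notes)


score, feedback = score_with_feedback("Paris France", "paris")
print(score, feedback)  # 0.5 Missing expected terms: france.
```

Feedback that says what was missing or unsupported is more useful to a reflection step than a bare "wrong".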

5. Baseline

evaluator = dspy.Evaluate(devset=valset, metric=rich_metric,
                          num_threads=8, display_progress=True,
                          provide_traceback=True,
                          save_as_json="runs/baseline.json")
baseline = evaluator(program)
print("Baseline:", baseline.score)

6. GEPA optimize

reflection_lm = dspy.LM("openai/gpt-5", temperature=1.0, max_tokens=32000)
optimizer = dspy.GEPA(
    metric=rich_metric,
    auto="medium",
    reflection_lm=reflection_lm,
    candidate_selection_strategy="pareto",
    track_stats=True,
    track_best_outputs=True,
    log_dir="./gepa_logs",
    num_threads=8,
    seed=0,
)
optimized = optimizer.compile(student=program, trainset=trainset, valset=valset)
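# Use a second evaluator so the post-optimization run does not overwrite
# runs/baseline.json written in step 5.
post_evaluator = dspy.Evaluate(devset=valset, metric=rich_metric,
                               num_threads=8, display_progress=True,
                               provide_traceback=True,
                               save_as_json="runs/optimized.json")
print("Optimized:", post_evaluator(optimized).score)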

Run auto="light" first as a sanity check; move to auto="medium"/"heavy" for the final run. See dspy-gepa-optimizer.

If you need a deliberate multi-stage compile loop, DSPy 3.2.x also exposes dspy.BetterTogether(metric=..., bootstrap=..., gepa=...) for chaining named optimizers after you have a clean baseline GEPA setup.

7. Export & deploy

optimized.save("artifacts/program.json", save_program=False)     # state, portable
# or for full deployment artifact:
optimized.save("artifacts/program_dir/", save_program=True)

Deploy:

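Load the full artifact with dspy.load("artifacts/program_dir/"), or reconstruct the module class and call .load("artifacts/program.json") on the instance.
Wrap the loaded program in a FastAPI service or a CLI.
Enable track_usage=True (as in dspy.configure(..., track_usage=True)) for cost/latency observability.
Log runs with MLflow (mlflow.dspy.autolog()) or W&B in CI.
Keep an offline regression test that runs the evaluator against the saved program and fails CI when the score falls below a threshold.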
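
The regression test in the last item can be reduced to a gate that turns a score into a process exit code. This is a sketch under assumptions: `regression_gate` and the numbers are hypothetical, and the threshold must use the same units as the score the evaluator reports.

```python
def regression_gate(score: float, threshold: float) -> int:
    """Return a process exit code: 0 if score meets the threshold, 1 otherwise."""
    if score < threshold:
        print(f"FAIL: score {score:.2f} is below threshold {threshold:.2f}")
        return 1
    print(f"OK: score {score:.2f} meets threshold {threshold:.2f}")
    return 0


# In CI this would typically end with something like
#   sys.exit(regression_gate(evaluator(loaded_program).score, threshold=...))
# Here the score is a made-up number so the sketch runs on its own.
exit_code = regression_gate(score=71.5, threshold=70.0)
```

Returning the code instead of calling `sys.exit` inside the function keeps the gate easy to unit-test.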

Full orchestration template

"""DSPy end-to-end pipeline — spec → optimize → deploy."""

import dspy
from pathlib import Path

# ----- 1–2. Spec & program (dspy-fundamentals) -----
class MyTask(dspy.Signature):
    """<one-line instruction from the spec>."""
    input_field: str = dspy.InputField()
    output_field: str = dspy.OutputField()

class MyProgram(dspy.Module):
    def __init__(self):
        super().__init__()
        self.step = dspy.ChainOfThought(MyTask)
    def forward(self, **kw):
        return self.step(**kw)

# ----- 3. Data (dspy-evaluation-harness) -----
trainset = [...]   # list[dspy.Example(...).with_inputs(...)]
valset   = [...]

# ----- 4. Rich metric (dspy-evaluation-harness) -----
def rich_metric(gold, pred, trace=None, pred_name=None, pred_trace=None):
    score = ...          # compute 0..1
    feedback = ...       # detailed critique
    return dspy.Prediction(score=score, feedback=feedback)  # NOT a dict

# ----- 5. Baseline -----
dspy.configure(lm=dspy.LM("openai/gpt-4o"), track_usage=True)
Path("runs").mkdir(exist_ok=True)  # make sure the save_as_json target directory exists
evaluator = dspy.Evaluate(devset=valset, metric=rich_metric, num_threads=8,
                          display_progress=True, provide_traceback=True,
                          save_as_json="runs/baseline.json")
program = MyProgram()
print("Baseline:", evaluator(program).score)

# ----- 6. GEPA optimize (dspy-gepa-optimizer) -----
optimizer = dspy.GEPA(
    metric=rich_metric,
    auto="medium",
    reflection_lm=dspy.LM("openai/gpt-5", temperature=1.0, max_tokens=32000),
    candidate_selection_strategy="pareto",
    track_stats=True, track_best_outputs=True,
    log_dir="./gepa_logs", num_threads=8, seed=0,
)
optimized = optimizer.compile(student=program, trainset=trainset, valset=valset)
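# Second evaluator: keeps runs/baseline.json intact and saves the new results separately.
post_evaluator = dspy.Evaluate(devset=valset, metric=rich_metric, num_threads=8,
                               display_progress=True, provide_traceback=True,
                               save_as_json="runs/optimized.json")
print("Optimized:", post_evaluator(optimized).score)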

# ----- 7. Export (dspy-fundamentals) -----
Path("artifacts").mkdir(exist_ok=True)
optimized.save("artifacts/program.json", save_program=False)

Guardrails

Never skip step 4 (rich metric). GEPA without feedback ≈ random search.
Always baseline before optimizing — no baseline, no claim.
Save both pre- and post-optimization metrics to JSON for auditability.
If the held-out test score drops after optimization, the valset is probably too narrow and the optimizer has overfit to it. Expand the valset and re-run.
Freeze optimized program with module._compiled = True before multi-stage re-compilation.
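
The auditability and held-out guardrails can share one small record. The sketch below is an assumption about what such a record might hold: `write_audit`, the field names, and the scores are all hypothetical, and it uses only the standard library.

```python
import json
import tempfile
from pathlib import Path


def write_audit(path, baseline_score, optimized_score,
                heldout_before=None, heldout_after=None):
    """Write pre/post scores to JSON and flag a held-out regression if one is visible."""
    record = {
        "baseline_score": baseline_score,
        "optimized_score": optimized_score,
        "delta": optimized_score - baseline_score,
    }
    if heldout_before is not None and heldout_after is not None:
        record["heldout_before"] = heldout_before
        record["heldout_after"] = heldout_after
        # True means the valset likely needs widening before trusting the gain.
        record["heldout_regressed"] = heldout_after < heldout_before
    out = Path(path)
    out.parent.mkdir(parents=True, exist_ok=True)
    out.write_text(json.dumps(record, indent=2))
    return record


# Made-up scores, written to a temporary directory so the sketch leaves no files behind.
with tempfile.TemporaryDirectory() as tmp:
    audit_path = Path(tmp) / "runs" / "audit.json"
    record = write_audit(audit_path, 62.5, 70.0, heldout_before=61.0, heldout_after=59.0)
    saved = json.loads(audit_path.read_text())

print(saved["delta"], saved["heldout_regressed"])  # 7.5 True
```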

Runnable scaffold → example_pipeline.py