| name | dspy-primitives |
| description | DSPy typed wrappers (dspy.Image, dspy.Audio, dspy.Code, dspy.History, dspy.File, dspy.Reasoning, dspy.Tool, dspy.ToolCalls) for multimodal data, files, and structured outputs in signatures. Use when working with non-text inputs like images, audio, or code, passing PDFs or documents to the LM, capturing native reasoning traces from reasoning models, building multimodal AI pipelines, processing images alongside text, handling audio transcription inputs, working with code files as typed inputs, or managing conversation history in multi-turn chatbots. Also used for multimodal DSPy, image input in DSPy signature, process images with DSPy, audio input in DSPy, dspy.File, pass PDF to language model, document input in DSPy, dspy.Reasoning, capture thinking traces, native reasoning output, dspy.Tool, dspy.ToolCalls, typed fields in signatures, non-text data in DSPy, vision model with DSPy, Claude vision with DSPy, multimodal pipeline, image classification with DSPy, pass images to language model, conversation history type, structured types beyond strings. |
DSPy Primitives
Guide the user through DSPy's built-in primitive types for multimodal inputs, code handling, and conversation history.
Step 1: Understand the task
Before using primitives, clarify:
- What kind of non-text data? Images, audio, code, or conversation history: each has its own primitive type.
- Does the LM support it natively? dspy.Image needs a vision model (GPT-4o, Claude 3+, Gemini). dspy.Audio needs an audio model (GPT-4o-audio-preview, Gemini). dspy.Code and dspy.History work with any LM.
- Is the data an input, output, or both? Most primitives can be declared as either input or output fields, but some patterns are more natural (e.g., dspy.Code for code generation output, dspy.Image for image analysis input).
What are primitives
Primitives are DSPy's custom types that go beyond plain strings. They let you pass images, audio, code, files, and conversation history directly into signatures, and capture structured outputs like native reasoning traces. DSPy handles the formatting, encoding, and adapter logic so the LM receives the data in the right format for its provider.
The core primitives:
| Primitive | Purpose | Typical use case |
|---|---|---|
| dspy.Image | Images from URLs, files, or bytes | Vision tasks, image analysis, multimodal Q&A |
| dspy.Audio | Audio from files, URLs, or arrays | Transcription, audio classification |
| dspy.Code | Code with language annotation | Code generation, code review, analysis |
| dspy.History | Conversation turns | Chatbots, multi-turn dialogue, follow-up questions |
| dspy.File | Files (PDFs, documents) from a path, bytes, or upload ID | Document Q&A, PDF summarization |
| dspy.Reasoning | Native reasoning/thinking traces from reasoning models | Capturing o1/o3/R1/extended-thinking output |
| dspy.Tool | Wraps a Python callable as a tool the LM can call | Agents, ReAct (see /dspy-tools) |
| dspy.ToolCalls | The LM's tool-call requests as a structured output | Agents, ReAct (see /dspy-tools) |
Two more types are the core input/output containers: dspy.Example holds a single labeled input/output pair (for training and few-shot data), and dspy.Prediction is what a module returns. You will use both constantly, but they are containers rather than field types like the primitives above; see /dspy-data for dspy.Example in depth.
dspy.Image
Wraps an image from any source into a format the LM can process. DSPy normalizes the input into a base64 data URI or plain URL automatically.
Constructor
dspy.Image(url=<source>, download=False, verify=True)
Parameters:
- url: the image source. Accepts:
  - str: HTTP/HTTPS URL, GS URL, or local file path
  - bytes: raw image bytes
  - PIL.Image.Image: a PIL image instance
  - dict: {"url": value} (legacy form)
  - An already-encoded data URI
- download (bool, default False): whether to download remote URLs to infer MIME type
- verify (bool, default True): whether to verify SSL certificates. Set False for self-signed certs.
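To make the normalization concrete, here is a minimal standard-library sketch of the idea: bytes become a base64 data URI, while URLs and existing data URIs pass through unchanged. The function name `to_image_url` and the default MIME type are assumptions made for illustration. This is not DSPy's implementation, and it deliberately skips local file paths and PIL images.

```python
import base64

def to_image_url(source, mime_type="image/png"):
    """Illustrative only: return a URL-style string for an image source.

    http(s) URLs and existing data URIs pass through unchanged;
    raw bytes are base64-encoded into a data URI.
    """
    if isinstance(source, bytes):
        encoded = base64.b64encode(source).decode("ascii")
        return f"data:{mime_type};base64,{encoded}"
    if isinstance(source, str) and source.startswith(("http://", "https://", "data:")):
        return source
    raise TypeError(f"unsupported image source: {type(source).__name__}")

# The first four bytes of the PNG signature, used as stand-in image data.
png_prefix = b"\x89PNG"
print(to_image_url(png_prefix))
print(to_image_url("https://example.com/photo.jpg"))
```

In real code, dspy.Image(url=...) does this work for you; the sketch only shows what "data URI or plain URL" means in practice.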
Usage in signatures
import dspy
from PIL import Image as PILImage

lm = dspy.LM("openai/gpt-4o")
dspy.configure(lm=lm)

class DescribeImage(dspy.Signature):
    """Describe what you see in the image."""
    image: dspy.Image = dspy.InputField(desc="Image to analyze")
    description: str = dspy.OutputField(desc="Detailed description of the image")

describer = dspy.Predict(DescribeImage)

# From a URL
result = describer(image=dspy.Image(url="https://example.com/photo.jpg"))
print(result.description)

# From a local file path
result = describer(image=dspy.Image(url="/path/to/photo.png"))

# From a PIL image
pil_img = PILImage.open("photo.png")
result = describer(image=dspy.Image(url=pil_img))
Multiple images
class CompareImages(dspy.Signature):
    """Compare two images and describe the differences."""
    image_a: dspy.Image = dspy.InputField(desc="First image")
    image_b: dspy.Image = dspy.InputField(desc="Second image")
    differences: str = dspy.OutputField(desc="Key differences between the images")
dspy.Audio
Wraps audio data for LMs that support native audio input. Audio is encoded as base64 internally.
Creating Audio objects
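import numpy as np
# From a local file
audio = dspy.Audio.from_file("recording.wav")
# From a URL
audio = dspy.Audio.from_url("https://example.com/clip.mp3")
# From a NumPy array of samples (placeholder: one second of silence at 16 kHz)
samples = np.zeros(16000, dtype=np.float32)
audio = dspy.Audio.from_array(samples, sampling_rate=16000, format="wav")
# From an already base64-encoded string
audio = dspy.Audio(data="<base64-string>", audio_format="wav")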
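As a rough illustration of what "encoded as base64" involves, the sketch below builds a small WAV file in memory with the standard library and base64-encodes it. The helper name `silent_wav_bytes` is hypothetical, and this is a stand-in for the idea rather than DSPy's own encoding path.

```python
import base64
import io
import wave

def silent_wav_bytes(seconds=1, sampling_rate=16000):
    """Build an in-memory mono 16-bit PCM WAV file containing silence."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)              # mono
        wav.setsampwidth(2)              # 2 bytes per sample = 16-bit PCM
        wav.setframerate(sampling_rate)
        wav.writeframes(b"\x00\x00" * sampling_rate * seconds)
    return buffer.getvalue()

wav_bytes = silent_wav_bytes()
encoded = base64.b64encode(wav_bytes).decode("ascii")
print(len(wav_bytes), "bytes of WAV ->", len(encoded), "base64 characters")
```

A string like `encoded` is the kind of value the data argument expects when you construct dspy.Audio from pre-encoded audio; the from_file, from_url, and from_array helpers take care of this step for you.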
Usage in signatures
import dspy
lm = dspy.LM("openai/gpt-4o-audio-preview")
dspy.configure(lm=lm)
class TranscribeAudio(dspy.Signature):
    """Transcribe the spoken content in the audio."""
    audio: dspy.Audio = dspy.InputField(desc="Audio recording to transcribe")
    transcript: str = dspy.OutputField(desc="Transcribed text")
transcriber = dspy.Predict(TranscribeAudio)
result = transcriber(audio=dspy.Audio.from_file("meeting.wav"))
print(result.transcript)
Audio classification
from typing import Literal

class ClassifyAudio(dspy.Signature):
    """Classify the type of audio content."""
    audio: dspy.Audio = dspy.InputField(desc="Audio clip to classify")
    category: Literal["speech", "music", "ambient", "silence"] = dspy.OutputField()
    language: str = dspy.OutputField(desc="Detected language if speech, else 'N/A'")
dspy.Code
Wraps code with a language annotation. DSPy formats it as a markdown code block so the LM sees properly delimited, syntax-aware code.
Language specification
Use bracket notation to specify the language:
dspy.Code["python"]
dspy.Code["java"]
dspy.Code["sql"]
dspy.Code["rust"]
The language tag tells DSPy to format the code as a fenced markdown block (```python ... ```) and guides the LM on syntax expectations.
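The fenced-block convention can be sketched with two small helpers: one wraps code in a language-tagged fence, the other strips a fence from a model response. Both function names are hypothetical, and the sketch illustrates the format only; it is not how DSPy implements dspy.Code.

```python
# Built programmatically so this example can itself sit inside a fenced block.
FENCE = "`" * 3

def fence_code(code, language):
    """Wrap source code in a fenced markdown block tagged with its language."""
    return f"{FENCE}{language}\n{code.strip()}\n{FENCE}"

def unfence_code(text):
    """Strip one surrounding fence, if present, and return the bare code."""
    lines = text.strip().splitlines()
    if len(lines) >= 2 and lines[0].startswith(FENCE) and lines[-1].strip() == FENCE:
        return "\n".join(lines[1:-1])
    return text.strip()

block = fence_code("SELECT 1;", "sql")
print(block)
print(unfence_code(block))
```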
Usage in signatures
Code generation (output):
import dspy
lm = dspy.LM("openai/gpt-4o-mini")
dspy.configure(lm=lm)
class GenerateCode(dspy.Signature):
    """Generate Python code that solves the given problem."""
    problem: str = dspy.InputField(desc="Problem description")
    code: dspy.Code["python"] = dspy.OutputField(desc="Working Python solution")
generator = dspy.Predict(GenerateCode)
result = generator(problem="Write a function that checks if a string is a palindrome")
print(result.code)
Code analysis (input):
from typing import Literal

class ReviewCode(dspy.Signature):
    """Review the code for bugs, performance issues, and style problems."""
    code: dspy.Code["python"] = dspy.InputField(desc="Code to review")
    issues: list[str] = dspy.OutputField(desc="List of issues found")
    severity: Literal["clean", "minor", "major", "critical"] = dspy.OutputField()

reviewer = dspy.ChainOfThought(ReviewCode)
# A plain string is accepted for a dspy.Code field
result = reviewer(code="def fib(n):\n    if n <= 1: return n\n    return fib(n-1) + fib(n-2)")
print(result.issues)
print(result.severity)
Code transformation (input and output):
class ConvertCode(dspy.Signature):
    """Convert the Python code to equivalent Java code."""
    python_code: dspy.Code["python"] = dspy.InputField(desc="Python source code")
    java_code: dspy.Code["java"] = dspy.OutputField(desc="Equivalent Java code")
dspy.History
Represents conversation history as a list of message turns. Use it to build multi-turn chatbots and follow-up interactions in DSPy.
Creating History objects
# Generic role/content format
history = dspy.History(messages=[
    {"role": "user", "content": "What is the capital of France?"},
    {"role": "assistant", "content": "Paris"},
])
# Keys matching the signature's field names
history = dspy.History(messages=[
    {"question": "What is the capital of France?", "answer": "Paris"},
    {"question": "What is the capital of Germany?", "answer": "Berlin"},
])
History objects are immutable (frozen). Create a new History to add turns.
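The "create a new object to add a turn" pattern can be illustrated with a frozen dataclass from the standard library. `FrozenHistory` and `with_turn` are hypothetical names used only for this sketch. dspy.History itself is a Pydantic model, so the exception it raises on mutation differs from the one shown here.

```python
from dataclasses import dataclass, FrozenInstanceError

@dataclass(frozen=True)
class FrozenHistory:
    """Hypothetical stand-in for an immutable history container."""
    messages: tuple = ()

    def with_turn(self, question, answer):
        """Return a new history with one extra turn; self is left untouched."""
        new_turn = {"question": question, "answer": answer}
        return FrozenHistory(self.messages + (new_turn,))

empty = FrozenHistory()
one_turn = empty.with_turn("What is the capital of France?", "Paris")

try:
    one_turn.messages = ()  # reassigning a field on a frozen instance fails
except FrozenInstanceError:
    print("cannot mutate a frozen history")

print(len(empty.messages), len(one_turn.messages))
```

With dspy.History the equivalent move is to build a new list of turns and pass it to a fresh dspy.History(messages=...).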
Usage in signatures
import dspy
lm = dspy.LM("openai/gpt-4o-mini")
dspy.configure(lm=lm)
class Chat(dspy.Signature):
    """Answer the user's question given the conversation history."""
    history: dspy.History = dspy.InputField(desc="Prior conversation turns")
    question: str = dspy.InputField(desc="Current user question")
    answer: str = dspy.OutputField(desc="Response to the user")
chatbot = dspy.Predict(Chat)
Building conversation incrementally
Capture each response and append it to history for the next turn:
chatbot = dspy.Predict(Chat)

# Turn 1: start with an empty history
result = chatbot(
    history=dspy.History(messages=[]),
    question="What is the capital of France?",
)
print(result.answer)
first_answer = result.answer

# Turn 2: carry the first exchange forward
history = dspy.History(messages=[
    {"question": "What is the capital of France?", "answer": first_answer},
])
result = chatbot(
    history=history,
    question="What is its population?",
)
print(result.answer)

# Turn 3: build a new History containing both earlier turns
history = dspy.History(messages=[
    {"question": "What is the capital of France?", "answer": first_answer},
    {"question": "What is its population?", "answer": result.answer},
])
result = chatbot(
    history=history,
    question="How does that compare to London?",
)
Helper pattern for managing history
class Chatbot(dspy.Module):
    def __init__(self):
        super().__init__()
        self.respond = dspy.ChainOfThought(Chat)
        self.turns = []  # accumulated question/answer dicts

    def forward(self, question):
        # Build a fresh History from the turns collected so far
        history = dspy.History(messages=self.turns)
        result = self.respond(history=history, question=question)
        self.turns.append({"question": question, "answer": result.answer})
        return result
dspy.File
Wraps a file (PDF, document, etc.) so you can pass it to an LM that supports file inputs. DSPy encodes the file as a base64 data URI (data:<mime_type>;base64,<...>) following the OpenAI file content-part spec. Added in DSPy 3.1.
Creating File objects
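# From a local path
file = dspy.File.from_path("./research.pdf")
# From raw bytes already in memory (pdf_bytes is a bytes object you have loaded)
file = dspy.File.from_bytes(pdf_bytes, filename="research.pdf", mime_type="application/pdf")
# From a provider upload ID
file = dspy.File.from_file_id("file-abc123", filename="research.pdf")
# From an already-encoded data URI
file = dspy.File(file_data="data:application/pdf;base64,<base64-string>")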
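For a sense of what the data:<mime_type>;base64,<...> form looks like, the following standard-library sketch guesses a MIME type from the filename and encodes the bytes. The helper `file_data_uri` is a hypothetical name, and the eight bytes used as input are just a PDF header, not a valid document. DSPy's own encoding may differ in detail.

```python
import base64
import mimetypes

def file_data_uri(file_bytes, filename):
    """Encode bytes as a data URI, guessing the MIME type from the filename."""
    mime_type, _ = mimetypes.guess_type(filename)
    mime_type = mime_type or "application/octet-stream"  # fallback for unknown types
    encoded = base64.b64encode(file_bytes).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"

# Stand-in content: only the header bytes of a PDF file.
uri = file_data_uri(b"%PDF-1.4", "research.pdf")
print(uri)
```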
Usage in signatures
import dspy
lm = dspy.LM("openai/gpt-4o")
dspy.configure(lm=lm)
class SummarizeDoc(dspy.Signature):
    """Summarize the key findings in the document."""
    file: dspy.File = dspy.InputField(desc="Document to summarize")
    summary: str = dspy.OutputField(desc="Concise summary of the document")
summarizer = dspy.Predict(SummarizeDoc)
result = summarizer(file=dspy.File.from_path("./research.pdf"))
print(result.summary)
Use dspy.File when you need to feed a whole document to the LM (PDF Q&A, contract review, report summarization) rather than extracting and pasting its text. The model must support file inputs.
dspy.Reasoning
Captures the native reasoning/thinking trace emitted by reasoning models (o1/o3, DeepSeek-R1, Claude extended thinking) as a structured output field. When the configured model supports native reasoning, DSPy pulls the reasoning trace directly from the response; otherwise it falls back to a generated reasoning field, so the same signature works across reasoning and non-reasoning models. Added in DSPy 3.1.
dspy.Reasoning behaves like a string (you can index, slice, concatenate, and call string methods on it) while also being a typed primitive.
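One way a typed wrapper can feel like a string is by forwarding string operations to the text it holds. The class below is a hypothetical sketch of that idea using plain Python. It is not dspy.Reasoning and makes no claim about how DSPy implements it.

```python
class StringLike:
    """Hypothetical wrapper that holds text and forwards string operations to it."""

    def __init__(self, content):
        self.content = content

    def __str__(self):
        return self.content

    def __getitem__(self, key):
        return self.content[key]            # indexing and slicing

    def __add__(self, other):
        return self.content + str(other)    # wrapper + str

    def __radd__(self, other):
        return str(other) + self.content    # str + wrapper

    def __getattr__(self, name):
        # Called only for attributes not found normally, e.g. .upper or .split.
        if name == "content":
            raise AttributeError(name)      # avoid recursion before __init__ runs
        return getattr(self.content, name)

trace = StringLike("60 km in 0.75 h is 80 km/h")
print(trace[:5])
print("Trace: " + trace)
print(trace.upper())
```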
Usage in signatures
import dspy
lm = dspy.LM("openai/o3-mini")
dspy.configure(lm=lm)
class SolveProblem(dspy.Signature):
    """Solve the math problem."""
    problem: str = dspy.InputField()
    reasoning: dspy.Reasoning = dspy.OutputField(desc="The model's native reasoning")
    answer: str = dspy.OutputField()
solver = dspy.Predict(SolveProblem)
result = solver(problem="If a train travels 60 km in 45 minutes, what is its speed in km/h?")
print(result.reasoning)
print(result.answer)
Use dspy.Reasoning when you want access to a reasoning model's native thinking as a first-class output. For a plain generated rationale that works on any LM, use dspy.ChainOfThought instead — see /dspy-chain-of-thought.
dspy.Tool and dspy.ToolCalls
dspy.Tool wraps a Python callable so the LM can invoke it as a tool, and dspy.ToolCalls represents the LM's tool-call requests as a structured output type. These are the building blocks for agents and tool use (ReAct). They are covered in depth — including registration, argument schemas, and execution loops — in /dspy-tools; reach for that skill rather than constructing these primitives directly.
Combining primitives in signatures
You can mix primitives with regular typed fields in the same signature:
class AnalyzeScreenshot(dspy.Signature):
    """Analyze a UI screenshot and generate test code for the visible elements."""
    screenshot: dspy.Image = dspy.InputField(desc="Screenshot of the UI")
    framework: str = dspy.InputField(desc="Test framework to use, e.g. 'playwright'")
    test_code: dspy.Code["python"] = dspy.OutputField(desc="Generated test code")
    element_count: int = dspy.OutputField(desc="Number of interactive elements found")

class AudioChat(dspy.Signature):
    """Respond to a user's audio message in a conversation."""
    history: dspy.History = dspy.InputField(desc="Prior conversation turns")
    audio_message: dspy.Audio = dspy.InputField(desc="User's spoken message")
    response: str = dspy.OutputField(desc="Text response to the user")
Provider requirements
Not all LM providers support all primitives natively:
| Primitive | Requires |
|---|---|
| dspy.Image | A vision-capable model (GPT-4o, Claude 3+, Gemini, etc.) |
| dspy.Audio | An audio-capable model (GPT-4o-audio-preview, Gemini, etc.) |
| dspy.Code | Any LM (formatted as markdown code blocks) |
| dspy.History | Any LM (formatted as conversation turns) |
| dspy.File | A model that supports file inputs (GPT-4o, Claude, Gemini, etc.) |
| dspy.Reasoning | Best with a reasoning model (o1/o3, DeepSeek-R1, Claude extended thinking); falls back to a generated field on others |
DSPy's adapter system handles the provider-specific formatting. You write the signature once; DSPy translates it for the target LM.
Gotchas
- Claude uses the deprecated dspy.Image.from_file() / from_url() class methods. Use dspy.Image(url="path-or-url") directly instead; the constructor accepts file paths, URLs, bytes, and PIL images via the url parameter.
- Claude passes raw strings where a primitive is expected. For a field typed as dspy.Code["python"], a plain string is fine because DSPy's validate_input coerces it. For dspy.Image and dspy.Audio, you must construct the primitive object explicitly. Raw strings will not be auto-converted.
- Claude uses role/content keys in History messages instead of signature field names. History messages should use keys matching the signature fields (e.g., {"question": ..., "answer": ...}), not the generic role/content format. Using role/content works but produces worse prompt formatting because DSPy cannot map the history entries to the right signature fields.
- Claude forgets that History is frozen (immutable). Do not try to extend an existing History object in place. Create a new dspy.History(messages=[*old_turns, new_turn]) each time. Reassigning a field on an existing History raises a ValidationError.
- Claude uses dspy.Image with a non-vision model. If the configured LM does not support vision (e.g., GPT-3.5 Turbo or Claude 2), image inputs are silently ignored or cause errors. Always verify the model supports the primitive type.
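If your stored conversation already uses role/content messages, one option is to convert it into signature-keyed turns before building the History. The helper below is a hypothetical sketch in plain Python. It assumes a simple alternating user/assistant log and the question/answer field names used in this guide, and it drops any trailing user message that has no reply yet.

```python
def to_signature_turns(messages, input_key="question", output_key="answer"):
    """Pair each user message with the assistant reply that follows it."""
    turns = []
    pending_question = None
    for message in messages:
        if message["role"] == "user":
            pending_question = message["content"]
        elif message["role"] == "assistant" and pending_question is not None:
            turns.append({input_key: pending_question, output_key: message["content"]})
            pending_question = None
    return turns

chat_log = [
    {"role": "user", "content": "What is the capital of France?"},
    {"role": "assistant", "content": "Paris"},
    {"role": "user", "content": "What is its population?"},  # no reply yet, so skipped
]
turns = to_signature_turns(chat_log)
print(turns)
```

The resulting list can be passed as the messages argument when constructing a dspy.History.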
Additional resources
Cross-references
Install any skill: npx skills add lebsral/DSPy-Programming-not-prompting-LMs-skills --skill <name>
- Defining signatures: see /dspy-signatures
- Using signatures with modules: see /dspy-modules, /dspy-predict, /dspy-chain-of-thought
- Tools and agents (dspy.Tool, dspy.ToolCalls): see /dspy-tools
- dspy.Example and training data: see /dspy-data
- Building chatbots with History: see /ai-building-chatbots
- Install /ai-do if you do not have it. It routes any AI problem to the right skill and is the fastest way to work: npx skills add lebsral/DSPy-Programming-not-prompting-LMs-skills --skill ai-do