// |
| name | openai-responses |
| description | Build agentic AI applications with OpenAI's Responses API - the stateful successor to Chat Completions. Preserves reasoning across turns for 5% better multi-turn performance and 40-80% improved cache utilization. Use when: building AI agents with persistent reasoning, integrating MCP servers for external tools, using built-in Code Interpreter/File Search/Web Search, managing stateful conversations, implementing background processing for long tasks, or migrating from Chat Completions to gain polymorphic outputs and server-side tools. |
| license | MIT |
Status: Production Ready
Last Updated: 2025-10-25
API Launch: March 2025
Dependencies: openai@5.19.1+ (Node.js) or fetch API (Cloudflare Workers)
The Responses API (/v1/responses) is OpenAI's unified interface for building agentic applications, launched in March 2025. It fundamentally changes how you interact with OpenAI models by providing stateful conversations and a structured loop for reasoning and acting.
Unlike Chat Completions, where reasoning is discarded between turns, Responses keeps the notebook open: the model's step-by-step thought process survives into the next turn, improving performance by approximately 5% on TAUBench and enabling better multi-turn interactions.
| Feature | Chat Completions | Responses API | Benefit |
|---|---|---|---|
| State Management | Manual (you track history) | Automatic (conversation IDs) | Simpler code, less error-prone |
| Reasoning | Dropped between turns | Preserved across turns | Better multi-turn performance |
| Tools | Client-side round trips | Server-side hosted | Lower latency, simpler code |
| Output Format | Single message | Polymorphic (messages, reasoning, tool calls) | Richer debugging, better UX |
| Cache Utilization | Baseline | 40-80% better | Lower costs, faster responses |
| MCP Support | Manual integration | Built-in | Easy external tool connections |
# Sign up at https://platform.openai.com/
# Navigate to API Keys section
# Create new key and save securely
export OPENAI_API_KEY="sk-proj-..."
Why this matters:
npm install openai
import OpenAI from 'openai';
const openai = new OpenAI({
apiKey: process.env.OPENAI_API_KEY,
});
const response = await openai.responses.create({
model: 'gpt-5',
input: 'What are the 5 Ds of dodgeball?',
});
console.log(response.output_text);
CRITICAL:
Model: gpt-5 (can also use gpt-5-mini, gpt-4o, etc.)
input can be a string or an array of messages
// No SDK needed - use fetch()
const response = await fetch('https://api.openai.com/v1/responses', {
method: 'POST',
headers: {
'Authorization': `Bearer ${env.OPENAI_API_KEY}`,
'Content-Type': 'application/json',
},
body: JSON.stringify({
model: 'gpt-5',
input: 'Hello, world!',
}),
});
const data = await response.json();
console.log(data.output_text);
Why fetch?
Use Responses API when:
Use Chat Completions when:
Chat Completions Flow:
User Input → Model → Single Message → Done
(Reasoning discarded, state lost)
Responses API Flow:
User Input → Model (preserved reasoning) → Polymorphic Outputs
↓ (server-side tools)
Tool Call → Tool Result → Model → Final Response
(Reasoning preserved, state maintained)
Cache Utilization:
Reasoning Performance:
The Responses API can automatically manage conversation state using conversation IDs.
// Create conversation with initial message
const conversation = await openai.conversations.create({
metadata: { user_id: 'user_123' },
items: [
{
type: 'message',
role: 'user',
content: 'Hello!',
},
],
});
console.log(conversation.id); // "conv_abc123..."
// First turn
const response1 = await openai.responses.create({
model: 'gpt-5',
conversation: 'conv_abc123',
input: 'What are the 5 Ds of dodgeball?',
});
console.log(response1.output_text);
// Second turn - model remembers previous context
const response2 = await openai.responses.create({
model: 'gpt-5',
conversation: 'conv_abc123',
input: 'Tell me more about the first one',
});
console.log(response2.output_text);
// Model automatically knows "first one" refers to first D from previous turn
Why this matters:
If you need full control, you can manually manage history:
let history = [
{ role: 'user', content: 'Tell me a joke' },
];
const response = await openai.responses.create({
model: 'gpt-5',
input: history,
store: true, // Optional: store for retrieval later
});
// Add response messages to history.
// Note: response.output may also contain reasoning and tool-call items
// that have no role/content, so keep only message items.
history = [
  ...history,
  ...response.output
    .filter(el => el.type === 'message')
    .map(el => ({
      role: el.role,
      content: el.content,
    })),
];
// Next turn
history.push({ role: 'user', content: 'Tell me another' });
const secondResponse = await openai.responses.create({
model: 'gpt-5',
input: history,
});
When to use manual management:
The Responses API includes server-side hosted tools that eliminate costly backend round trips.
| Tool | Purpose | Use Case |
|---|---|---|
| Code Interpreter | Execute Python code | Data analysis, calculations, charts |
| File Search | RAG without vector stores | Search uploaded files for answers |
| Web Search | Real-time web information | Current events, fact-checking |
| Image Generation | DALL-E integration | Create images from descriptions |
| MCP | Connect external tools | Stripe, databases, custom APIs |
Execute Python code server-side for data analysis, calculations, and visualizations.
const response = await openai.responses.create({
model: 'gpt-5',
input: 'Calculate the mean, median, and mode of: 10, 20, 30, 40, 50',
tools: [{ type: 'code_interpreter' }],
});
console.log(response.output_text);
// Model writes and executes Python code, returns results
Advanced Example: Data Analysis
const response = await openai.responses.create({
model: 'gpt-5',
input: 'Analyze this sales data and create a bar chart showing monthly revenue: [data here]',
tools: [{ type: 'code_interpreter' }],
});
// Check output for code execution results
response.output.forEach(item => {
if (item.type === 'code_interpreter_call') {
console.log('Code executed:', item.input);
console.log('Result:', item.output);
}
});
Why this matters:
Search through uploaded files without building your own RAG pipeline.
// 1. Upload files first (one-time setup)
const file = await openai.files.create({
file: fs.createReadStream('knowledge-base.pdf'),
purpose: 'assistants',
});
// 2. Use file search
const response = await openai.responses.create({
model: 'gpt-5',
input: 'What does the document say about pricing?',
tools: [
{
type: 'file_search',
file_ids: [file.id],
},
],
});
console.log(response.output_text);
// Model searches file and provides answer with citations
Supported File Types:
Get real-time information from the web.
const response = await openai.responses.create({
model: 'gpt-5',
input: 'What are the latest updates on GPT-5?',
tools: [{ type: 'web_search' }],
});
console.log(response.output_text);
// Model searches web and provides current information with sources
Why this matters:
Generate images directly in the Responses API.
const response = await openai.responses.create({
model: 'gpt-5',
input: 'Create an image of a futuristic cityscape at sunset',
tools: [{ type: 'image_generation' }],
});
// Find image in output
response.output.forEach(item => {
if (item.type === 'image_generation_call') {
console.log('Image URL:', item.output.url);
}
});
Models Available:
The Responses API has built-in support for Model Context Protocol (MCP) servers, allowing you to connect external tools.
MCP is an open protocol that standardizes how applications provide context to LLMs. It allows you to:
const response = await openai.responses.create({
model: 'gpt-5',
input: 'Roll 2d6 dice',
tools: [
{
type: 'mcp',
server_label: 'dice',
server_url: 'https://example.com/mcp',
},
],
});
// Model discovers available tools on MCP server and uses them
console.log(response.output_text);
const response = await openai.responses.create({
model: 'gpt-5',
input: 'Create a $20 payment link',
tools: [
{
type: 'mcp',
server_label: 'stripe',
server_url: 'https://mcp.stripe.com',
authorization: process.env.STRIPE_OAUTH_TOKEN,
},
],
});
console.log(response.output_text);
// Model uses Stripe MCP server to create payment link
CRITICAL:
const response = await openai.responses.create({
model: 'gpt-5',
input: 'Roll 2d4+1',
tools: [
{
type: 'mcp',
server_label: 'dice',
server_url: 'https://dmcp.example.com',
},
],
});
// Inspect tool calls
response.output.forEach(item => {
if (item.type === 'mcp_call') {
console.log('Tool:', item.name);
console.log('Arguments:', item.arguments);
console.log('Output:', item.output);
}
if (item.type === 'mcp_list_tools') {
console.log('Available tools:', item.tools);
}
});
Output Types:
mcp_list_tools - Tools discovered on server
mcp_call - Tool invocation and result
message - Final response to user
The Responses API preserves the model's internal reasoning state across turns, unlike Chat Completions, which discards it.
Visual Analogy:
TAUBench Results (GPT-5):
Why This Matters:
The Responses API provides reasoning summaries at no additional cost.
const response = await openai.responses.create({
model: 'gpt-5',
input: 'Solve this complex math problem: [problem]',
});
// Inspect reasoning
response.output.forEach(item => {
if (item.type === 'reasoning') {
console.log('Model reasoning:', item.summary[0].text);
}
if (item.type === 'message') {
console.log('Final answer:', item.content[0].text);
}
});
Use Cases:
For tasks that take longer than standard timeout limits, use background mode.
const response = await openai.responses.create({
model: 'gpt-5',
input: 'Analyze this 500-page document and summarize key findings',
background: true,
tools: [{ type: 'file_search', file_ids: [fileId] }],
});
// Returns immediately with status
console.log(response.status); // "in_progress"
console.log(response.id); // Use to check status later
// Poll for completion
const checkStatus = async (responseId) => {
const result = await openai.responses.retrieve(responseId);
if (result.status === 'completed') {
console.log(result.output_text);
} else if (result.status === 'failed') {
console.error('Task failed:', result.error);
} else {
// Still running, check again later
setTimeout(() => checkStatus(responseId), 5000);
}
};
checkStatus(response.id);
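The fixed 5-second poll above works, but for long-running jobs a capped exponential backoff wastes fewer requests. The schedule itself is plain arithmetic, independent of the API:

```javascript
// Delay (ms) before the n-th status check: 1s, 2s, 4s, ... capped at 30s.
// Base and cap values are illustrative, not prescribed by the API.
function pollDelay(attempt, baseMs = 1000, capMs = 30000) {
  return Math.min(baseMs * 2 ** attempt, capMs);
}

console.log(pollDelay(0));  // 1000
console.log(pollDelay(3));  // 8000
console.log(pollDelay(10)); // 30000 (capped)
```

Swap the `setTimeout(..., 5000)` in the poller above for `setTimeout(..., pollDelay(attempt))`, incrementing `attempt` on each check.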
When to Use:
Timeout Limits:
The Responses API returns multiple output types instead of a single message.
| Type | Description | Example |
|---|---|---|
message | Text response to user | Final answer, explanation |
reasoning | Model's internal thought process | Step-by-step reasoning summary |
code_interpreter_call | Code execution | Python code + results |
mcp_call | Tool invocation | Tool name, args, output |
mcp_list_tools | Available tools | Tool definitions from MCP server |
file_search_call | File search results | Matched chunks, citations |
web_search_call | Web search results | URLs, snippets |
image_generation_call | Image generation | Image URL |
const response = await openai.responses.create({
model: 'gpt-5',
input: 'Search the web for the latest AI news and summarize',
tools: [{ type: 'web_search' }],
});
// Process different output types
response.output.forEach(item => {
switch (item.type) {
case 'reasoning':
console.log('Reasoning:', item.summary[0].text);
break;
case 'web_search_call':
console.log('Searched:', item.query);
console.log('Sources:', item.results);
break;
case 'message':
console.log('Response:', item.content[0].text);
break;
}
});
// Or use helper for text-only
console.log(response.output_text);
Why This Matters:
| Feature | Chat Completions | Responses API | Migration |
|---|---|---|---|
| Endpoint | /v1/chat/completions | /v1/responses | Update URL |
| Parameter | messages | input | Rename parameter |
| State | Manual (messages array) | Automatic (conversation ID) | Use conversation IDs |
| Tools | tools array with functions | Built-in types + MCP | Update tool definitions |
| Output | choices[0].message.content | output_text or output array | Update response parsing |
| Streaming | data: {"choices":[...]} | SSE with multiple item types | Update stream parser |
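The streaming row in the table above is the least mechanical change: instead of bare `choices` deltas, Responses streams typed SSE events. As a sketch of the client-side accumulation logic (event type names here assume the `response.output_text.delta` / `response.completed` pattern; verify against the current event schema):

```javascript
// Accumulate streamed text from Responses-style SSE events.
// Event shapes are illustrative assumptions, not a verified schema.
function collectStreamedText(events) {
  let text = '';
  for (const event of events) {
    if (event.type === 'response.output_text.delta') {
      text += event.delta;   // incremental text chunk
    } else if (event.type === 'response.completed') {
      break;                 // final event carries the full response object
    }
  }
  return text;
}

// Example with mock events:
const mockEvents = [
  { type: 'response.output_text.delta', delta: 'Hello' },
  { type: 'response.output_text.delta', delta: ', world' },
  { type: 'response.completed', response: {} },
];
console.log(collectStreamedText(mockEvents)); // "Hello, world"
```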
Before (Chat Completions):
const response = await openai.chat.completions.create({
model: 'gpt-5',
messages: [
{ role: 'system', content: 'You are a helpful assistant.' },
{ role: 'user', content: 'Hello!' },
],
});
console.log(response.choices[0].message.content);
After (Responses):
const response = await openai.responses.create({
model: 'gpt-5',
input: [
{ role: 'developer', content: 'You are a helpful assistant.' },
{ role: 'user', content: 'Hello!' },
],
});
console.log(response.output_text);
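One concrete difference between the two snippets is the message shape. A tiny shim (a sketch; it handles only the `system` → `developer` rename) can adapt an existing Chat Completions `messages` array:

```javascript
// Convert a Chat Completions `messages` array into a Responses `input` array.
// Only difference handled here: the `system` role becomes `developer`.
function messagesToInput(messages) {
  return messages.map(m =>
    m.role === 'system' ? { ...m, role: 'developer' } : m
  );
}

console.log(messagesToInput([
  { role: 'system', content: 'You are a helpful assistant.' },
  { role: 'user', content: 'Hello!' },
]));
// [{ role: 'developer', ... }, { role: 'user', ... }]
```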
Key Differences:
chat.completions.create → responses.create
messages → input
system role → developer role
choices[0].message.content → output_text
Migrate now if:
Stay on Chat Completions if:
Error:
Conversation state not maintained between turns
Cause:
Solution:
// Create conversation once
const conv = await openai.conversations.create();
// Reuse conversation ID for all turns
const response1 = await openai.responses.create({
model: 'gpt-5',
conversation: conv.id, // ✅ Same ID
input: 'First message',
});
const response2 = await openai.responses.create({
model: 'gpt-5',
conversation: conv.id, // ✅ Same ID
input: 'Follow-up message',
});
Error:
{
"error": {
"type": "mcp_connection_error",
"message": "Failed to connect to MCP server"
}
}
Causes:
Solutions:
// 1. Verify URL is correct
const response = await openai.responses.create({
model: 'gpt-5',
input: 'Test MCP',
tools: [
{
type: 'mcp',
server_label: 'test',
server_url: 'https://api.example.com/mcp', // ✅ Full URL
authorization: process.env.AUTH_TOKEN, // ✅ Valid token
},
],
});
// 2. Test server URL manually
const testResponse = await fetch('https://api.example.com/mcp');
console.log(testResponse.status); // Should be 200
// 3. Check token expiration
console.log('Token expires:', parseJWT(token).exp);
Error:
{
"error": {
"type": "code_interpreter_timeout",
"message": "Code execution exceeded time limit"
}
}
Cause:
Solution:
// Use background mode for long-running code
const response = await openai.responses.create({
model: 'gpt-5',
input: 'Process this large dataset',
background: true, // ✅ Extended timeout
tools: [{ type: 'code_interpreter' }],
});
// Poll for results
const result = await openai.responses.retrieve(response.id);
Error:
{
"error": {
"type": "rate_limit_error",
"message": "DALL-E rate limit exceeded"
}
}
Cause:
Solution:
// Implement retry with exponential backoff
const generateImage = async (prompt, retries = 3) => {
try {
return await openai.responses.create({
model: 'gpt-5',
input: prompt,
tools: [{ type: 'image_generation' }],
});
} catch (error) {
if (error.type === 'rate_limit_error' && retries > 0) {
      const delay = Math.pow(2, 3 - retries) * 1000; // 1s, 2s, 4s
await new Promise(resolve => setTimeout(resolve, delay));
return generateImage(prompt, retries - 1);
}
throw error;
}
};
Problem:
Solution:
// Use more specific queries
const response = await openai.responses.create({
model: 'gpt-5',
input: 'Find sections about pricing in Q4 2024 specifically', // ✅ Specific
// NOT: 'Find pricing' (too vague)
tools: [{ type: 'file_search', file_ids: [fileId] }],
});
// Or filter results manually
response.output.forEach(item => {
if (item.type === 'file_search_call') {
const relevantChunks = item.results.filter(
chunk => chunk.score > 0.7 // ✅ Only high-confidence matches
);
}
});
Problem:
Explanation:
Solution:
// Monitor usage
const response = await openai.responses.create({
model: 'gpt-5',
input: 'Hello',
store: false, // ✅ Don't store if not needed
});
console.log('Usage:', response.usage);
// {
// prompt_tokens: 10,
// completion_tokens: 20,
// tool_tokens: 5,
// total_tokens: 35
// }
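The usage object above can feed a rough cost estimate. The per-token prices below are illustrative placeholders, not actual OpenAI rates; substitute the current pricing for your model:

```javascript
// Rough cost estimator for a usage object.
// PRICES are hypothetical placeholders ($ per token), not real rates.
const PRICES = { prompt: 1.25 / 1e6, completion: 10 / 1e6 };

function estimateCost(usage) {
  return (
    usage.prompt_tokens * PRICES.prompt +
    usage.completion_tokens * PRICES.completion
  );
}

console.log(estimateCost({ prompt_tokens: 1000, completion_tokens: 500 }));
// ≈ $0.00625 at the placeholder rates above
```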
Error:
{
"error": {
"type": "invalid_request_error",
"message": "Conversation conv_xyz not found"
}
}
Causes:
Solution:
// Verify conversation exists before using
const conversations = await openai.conversations.list();
const exists = conversations.data.some(c => c.id === 'conv_xyz');
if (!exists) {
// Create new conversation
const newConv = await openai.conversations.create();
// Use newConv.id
}
Problem:
Solution:
// Use helper methods
const response = await openai.responses.create({
model: 'gpt-5',
input: 'Search for AI news',
tools: [{ type: 'web_search' }],
});
// Helper: Get text-only output
console.log(response.output_text);
// Manual: Inspect all outputs
response.output.forEach(item => {
console.log('Type:', item.type);
console.log('Content:', item);
});
1. Use Conversation IDs (Cache Benefits)
// ✅ GOOD: Reuse conversation ID
const conv = await openai.conversations.create();
const response1 = await openai.responses.create({
model: 'gpt-5',
conversation: conv.id,
input: 'Question 1',
});
// 40-80% better cache utilization
// ❌ BAD: New manual history each time
const response2 = await openai.responses.create({
model: 'gpt-5',
input: [...previousHistory, newMessage],
});
// No cache benefits
2. Disable Storage When Not Needed
// For one-off requests
const response = await openai.responses.create({
model: 'gpt-5',
input: 'Quick question',
store: false, // ✅ Don't store conversation
});
3. Use Smaller Models When Possible
// For simple tasks
const response = await openai.responses.create({
model: 'gpt-5-mini', // ✅ 50% cheaper
input: 'Summarize this paragraph',
});
const createResponseWithRetry = async (params, maxRetries = 3) => {
for (let i = 0; i < maxRetries; i++) {
try {
return await openai.responses.create(params);
} catch (error) {
if (error.type === 'rate_limit_error' && i < maxRetries - 1) {
const delay = Math.pow(2, i) * 1000; // Exponential backoff
console.log(`Rate limited, retrying in ${delay}ms`);
await new Promise(resolve => setTimeout(resolve, delay));
} else {
throw error;
}
}
}
};
const monitoredResponse = async (input) => {
const startTime = Date.now();
try {
const response = await openai.responses.create({
model: 'gpt-5',
input,
});
// Log success metrics
console.log({
status: 'success',
latency: Date.now() - startTime,
tokens: response.usage.total_tokens,
model: response.model,
conversation: response.conversation_id,
});
return response;
} catch (error) {
// Log error metrics
console.error({
status: 'error',
latency: Date.now() - startTime,
error: error.message,
type: error.type,
});
throw error;
}
};
import OpenAI from 'openai';
const openai = new OpenAI({
apiKey: process.env.OPENAI_API_KEY,
});
export async function handleRequest(input: string) {
const response = await openai.responses.create({
model: 'gpt-5',
input,
tools: [{ type: 'web_search' }],
});
return response.output_text;
}
Pros:
Cons:
export default {
async fetch(request: Request, env: Env): Promise<Response> {
const { input } = await request.json();
const response = await fetch('https://api.openai.com/v1/responses', {
method: 'POST',
headers: {
'Authorization': `Bearer ${env.OPENAI_API_KEY}`,
'Content-Type': 'application/json',
},
body: JSON.stringify({
model: 'gpt-5',
input,
tools: [{ type: 'web_search' }],
}),
});
const data = await response.json();
return new Response(data.output_text, {
headers: { 'Content-Type': 'text/plain' },
});
},
};
Pros:
Cons:
Use conversation IDs for multi-turn interactions
const conv = await openai.conversations.create();
// Reuse conv.id for all related turns
Handle all output types in polymorphic responses
response.output.forEach(item => {
if (item.type === 'reasoning') { /* log */ }
if (item.type === 'message') { /* display */ }
});
Use background mode for long-running tasks
const response = await openai.responses.create({
background: true, // ✅ For tasks >30s
...
});
Provide authorization tokens for MCP servers
tools: [{
type: 'mcp',
authorization: process.env.TOKEN, // ✅ Required
}]
Monitor token usage for cost control
console.log(response.usage.total_tokens);
Never expose API keys in client-side code
// ❌ DANGER: API key in browser
const response = await fetch('https://api.openai.com/v1/responses', {
headers: { 'Authorization': 'Bearer sk-proj-...' }
});
Never assume single message output
// ❌ BAD: Ignores reasoning, tool calls
console.log(response.output[0].content);
// ✅ GOOD: Use helper or check all types
console.log(response.output_text);
Never reuse conversation IDs across users
// ❌ DANGER: User A sees User B's conversation
const sharedConv = 'conv_123';
Never ignore error types
// ❌ BAD: Generic error handling
try { ... } catch (e) { console.log('error'); }
// ✅ GOOD: Type-specific handling
catch (e) {
if (e.type === 'rate_limit_error') { /* retry */ }
if (e.type === 'mcp_connection_error') { /* alert */ }
}
Never poll faster than 1 second for background tasks
// ❌ BAD: Too frequent
setInterval(() => checkStatus(), 100);
// ✅ GOOD: Reasonable interval
setInterval(() => checkStatus(), 5000);
templates/ - Working code examples
references/responses-vs-chat-completions.md - Feature comparison
references/mcp-integration-guide.md - MCP server setup
references/built-in-tools-guide.md - Tool usage patterns
references/stateful-conversations.md - Conversation management
references/migration-guide.md - Chat Completions → Responses
references/top-errors.md - Common errors and solutions
templates/basic-response.ts - Simple example
templates/stateful-conversation.ts - Multi-turn chat
templates/mcp-integration.ts - External tools
references/top-errors.md - Avoid common pitfalls
references/migration-guide.md - If migrating from Chat Completions
Happy building with the Responses API! 🚀