azure-openai-patterns
Azure OpenAI API patterns for rate limiting, function calling, error handling, and token optimization
| Field | Value |
|---|---|
| name | azure-openai-patterns |
| description | Azure OpenAI API patterns for rate limiting, function calling, error handling, and token optimization |
| tier | standard |
| applyTo | `**/*openai*,**/*chat*,**/*llm*,**/*gpt*` |
Rate limiting, function calling, error handling, and token optimization for Azure OpenAI API.
Version: 1.0.0
Azure OpenAI uses dual rate limits: Tokens Per Minute (TPM) and Requests Per Minute (RPM). The ratio is typically 6 RPM per 1000 TPM.
| Model | Tier | TPM | RPM | Ratio |
|---|---|---|---|---|
| gpt-4o | Default | 450K | 2.7K | 6 RPM/1K TPM |
| gpt-4o-mini | Default | 2M | 12K | 6 RPM/1K TPM |
| gpt-4o | Enterprise | 30M | 180K | 6 RPM/1K TPM |
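The ratio in the table lets you derive a deployment's request limit from its token quota. A minimal sketch, assuming the 6 RPM per 1,000 TPM ratio holds for your deployment; `rpmFromTpm` is an illustrative name, not part of any SDK:

```typescript
// Assumed ratio from the table above: 6 requests/minute per 1,000 tokens/minute.
const RPM_PER_1K_TPM = 6;

// Hypothetical helper: the RPM limit implied by a TPM quota.
function rpmFromTpm(tpm: number): number {
  return (tpm / 1000) * RPM_PER_1K_TPM;
}

console.log(rpmFromTpm(450000));  // 2700
console.log(rpmFromTpm(2000000)); // 12000
```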
TPM is estimated before processing based on:
- The `max_tokens` parameter setting
- The `best_of` parameter setting (if used)

The rate limit estimate is NOT the same as actual token consumption for billing.
RPM is enforced over small time windows (1-10 seconds):
600 RPM deployment = max 10 requests per second
If you send 15 requests in 1 second → 429 error
Even though 15/min < 600/min
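The arithmetic above can be checked before a burst is sent. A rough sketch, assuming the limit is spread evenly across the minute; the helper names and the one-second default window are illustrative, since the text only says windows span 1-10 seconds:

```typescript
// Hypothetical helper: per-second allowance implied by an RPM limit.
function maxRequestsPerSecond(rpm: number): number {
  return rpm / 60;
}

// True if sending `count` requests inside one window would exceed the allowance.
// windowSeconds is an assumption; the service does not publish the exact window.
function exceedsBurstLimit(rpm: number, count: number, windowSeconds = 1): boolean {
  return count > maxRequestsPerSecond(rpm) * windowSeconds;
}

console.log(maxRequestsPerSecond(600));  // 10
console.log(exceedsBurstLimit(600, 15)); // true
console.log(exceedsBurstLimit(600, 10)); // false
```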
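// apiUrl and token are assumed to be defined by the caller's configuration.
async function chatWithTools(
  messages: ChatCompletionMessage[],
  tools: Tool[],
  onStatusUpdate?: (status: string) => void
): Promise<ChatCompletionResponse> {
  const maxRetries = 5;
  for (let attempt = 1; attempt <= maxRetries; attempt++) {
    const response = await fetch(apiUrl, {
      method: 'POST',
      headers: {
        'Content-Type': 'application/json',
        'Authorization': `Bearer ${token}`,
      },
      body: JSON.stringify({ messages, tools, tool_choice: 'auto' }),
    });
    if (response.ok) {
      return response.json();
    }
    // Retry only on 429, and only while attempts remain.
    if (response.status === 429 && attempt < maxRetries) {
      // Prefer the server's Retry-After hint (seconds); otherwise back off 2s, 4s, 8s, 16s.
      const retryAfter = Number(response.headers.get('Retry-After'));
      const waitTime = retryAfter > 0 ? retryAfter : Math.pow(2, attempt);
      onStatusUpdate?.(`Rate limited. Waiting ${waitTime}s...`);
      await new Promise(resolve => setTimeout(resolve, waitTime * 1000));
      continue;
    }
    throw new Error(`API error: ${response.status}`);
  }
  // Not reached in practice; ensures every code path returns or throws.
  throw new Error('Retries exhausted');
}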
// Bad: An unnecessarily high max_tokens counts in full against the TPM estimate, even for short responses
const badRequest = { messages: [...], max_tokens: 4000 };
// Good: Set appropriate max_tokens for expected response length
const goodRequest = { messages: [...], max_tokens: 500 };
// Bad: Send one request per tool result (consumes RPM quota)
for (const toolCall of toolCalls) {
  const result = await executeFunction(toolCall);
  await sendToolResult(result);
}
// Good: Collect all results and send once
const results = await Promise.all(
  toolCalls.map(tc => executeFunction(tc))
);
await sendToolResults(results); // Single request
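To send every result in one request, each result needs to be paired with the `id` of the tool call that produced it. The sketch below shows one way to build those messages using the Chat Completions `tool` role; the local types are trimmed for illustration and `buildToolMessages` is a hypothetical helper, not an SDK function:

```typescript
// Minimal local types for illustration; real SDK types carry more fields.
interface ToolCall {
  id: string;
  function: { name: string; arguments: string };
}

interface ToolMessage {
  role: 'tool';
  tool_call_id: string;
  content: string;
}

// Pair each tool call with its result so all results go out in one follow-up request.
function buildToolMessages(toolCalls: ToolCall[], results: unknown[]): ToolMessage[] {
  if (toolCalls.length !== results.length) {
    throw new Error('Each tool call needs exactly one result');
  }
  return toolCalls.map((call, i) => ({
    role: 'tool' as const,
    tool_call_id: call.id,
    content: JSON.stringify(results[i]),
  }));
}

const toolMessages = buildToolMessages(
  [{ id: 'call_1', function: { name: 'get_resources', arguments: '{}' } }],
  [{ name: 'vm-01', type: 'vm', status: 'running' }]
);
console.log(toolMessages[0].tool_call_id); // call_1
```

The returned messages would be appended to the conversation after the assistant message that requested the tool calls, then sent in a single request.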
const headers = response.headers;
const remainingRequests = headers.get('x-ratelimit-remaining-requests');
const remainingTokens = headers.get('x-ratelimit-remaining-tokens');
const resetRequests = headers.get('x-ratelimit-reset-requests');
const resetTokens = headers.get('x-ratelimit-reset-tokens');
const retryAfter = headers.get('Retry-After'); // Only on 429
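Header values arrive as strings and may be absent, so it helps to normalize them before acting on them. A minimal sketch, assuming the header names listed above; `readRateLimits`, `shouldThrottle`, and the threshold of 5 are illustrative choices rather than documented behavior:

```typescript
// Anything with a get() method works here, including the Fetch API Headers object.
interface HeaderReader {
  get(name: string): string | null;
}

interface RateLimitSnapshot {
  remainingRequests: number | null;
  remainingTokens: number | null;
  retryAfterSeconds: number | null;
}

// Convert a header value to a number, or null when missing or non-numeric.
function toNumber(value: string | null): number | null {
  if (value === null || value.trim() === '') return null;
  const n = Number(value);
  return Number.isFinite(n) ? n : null;
}

function readRateLimits(headers: HeaderReader): RateLimitSnapshot {
  return {
    remainingRequests: toNumber(headers.get('x-ratelimit-remaining-requests')),
    remainingTokens: toNumber(headers.get('x-ratelimit-remaining-tokens')),
    // Retry-After may also be an HTTP date; this sketch handles only the seconds form.
    retryAfterSeconds: toNumber(headers.get('Retry-After')),
  };
}

// Hypothetical policy: slow down once few requests remain in the current window.
function shouldThrottle(snapshot: RateLimitSnapshot, minRequests = 5): boolean {
  return snapshot.remainingRequests !== null && snapshot.remainingRequests < minRequests;
}
```

Calling `readRateLimits(response.headers)` after each response gives the client a chance to slow down before the service starts returning 429.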
// Bad: Return entire resource objects
{ name: 'get_resources', description: 'Get all Azure resources' }
// Returns: huge JSON with all properties
// Good: Return only necessary fields
{ name: 'get_resources', description: 'Get Azure resource summary' }
// Returns: { name, type, status } only
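Trimming the tool's return value before it goes back to the model is what keeps the token count down. A short sketch, assuming each resource carries `name`, `type`, and `status` alongside other properties; the interfaces and `summarizeResources` are illustrative:

```typescript
// Illustrative shape: three fields the model needs plus arbitrary extras.
interface AzureResource {
  name: string;
  type: string;
  status: string;
  [extra: string]: unknown; // tags, properties, SKU details, and so on
}

interface ResourceSummary {
  name: string;
  type: string;
  status: string;
}

// Keep only the fields the model needs; everything else is dropped.
function summarizeResources(resources: AzureResource[]): ResourceSummary[] {
  return resources.map(({ name, type, status }) => ({ name, type, status }));
}

const summary = summarizeResources([
  { name: 'vm-01', type: 'virtualMachine', status: 'running', tags: { env: 'prod' } },
]);
console.log(JSON.stringify(summary));
// [{"name":"vm-01","type":"virtualMachine","status":"running"}]
```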
const request = {
  messages,
  tools,
  parallel_tool_calls: true, // Default: true in newer models
};
// Model may call multiple tools in one response, reducing round trips
class RequestQueue {
  private queue: Array<() => Promise<void>> = [];
  private processing = false;
  private minDelayMs = 100; // 10 req/sec max

  async enqueue<T>(request: () => Promise<T>): Promise<T> {
    return new Promise((resolve, reject) => {
      this.queue.push(async () => {
        try { resolve(await request()); }
        catch (e) { reject(e); }
        await this.delay(this.minDelayMs);
      });
      this.process();
    });
  }

  private async process() {
    if (this.processing) return;
    this.processing = true;
    while (this.queue.length > 0) {
      const next = this.queue.shift();
      await next?.();
    }
    this.processing = false;
  }

  private delay(ms: number) {
    return new Promise(r => setTimeout(r, ms));
  }
}
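The 100 ms value of `minDelayMs` matches a 600 RPM deployment (10 requests per second). For other deployments the spacing can be derived from the RPM limit. A small sketch, assuming requests should be spread evenly; `minDelayMsForRpm` is a hypothetical helper:

```typescript
// Hypothetical helper: minimum spacing between requests for a given RPM limit.
// Rounds up so the queue never runs faster than the limit allows.
function minDelayMsForRpm(rpm: number): number {
  if (rpm <= 0) throw new Error('rpm must be positive');
  return Math.ceil(60000 / rpm);
}

console.log(minDelayMsForRpm(600));  // 100
console.log(minDelayMsForRpm(2700)); // 23
```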
| Code | Meaning | Action |
|---|---|---|
| 429 | Rate limited | Exponential backoff, check Retry-After |
| 400 | Invalid request | Check request format, content filter |
| 401 | Authentication error | Refresh token |
| 403 | Forbidden (insufficient permissions or quota exceeded) | Check access permissions; wait or upgrade tier |
| 500 | Server error | Retry with backoff |
| 503 | Service unavailable | Retry with longer backoff |
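The table can be turned into a single lookup so calling code does not scatter status checks. A minimal sketch that mirrors the rows above; the action names and both function names are illustrative, and any status not listed is treated as non-retryable:

```typescript
type ErrorAction =
  | 'retry'        // exponential backoff
  | 'retry-long'   // backoff with a longer delay
  | 'refresh-auth' // obtain a new token
  | 'fix-request'  // inspect the request body or content filter result
  | 'check-quota'  // wait or upgrade tier
  | 'fail';        // not covered by the table

// Map an HTTP status code to the action suggested in the table.
function actionForStatus(status: number): ErrorAction {
  switch (status) {
    case 429:
    case 500:
      return 'retry';
    case 503:
      return 'retry-long';
    case 401:
      return 'refresh-auth';
    case 400:
      return 'fix-request';
    case 403:
      return 'check-quota';
    default:
      return 'fail';
  }
}

// True when the same request is worth sending again unchanged.
function isRetryable(status: number): boolean {
  const action = actionForStatus(status);
  return action === 'retry' || action === 'retry-long';
}

console.log(actionForStatus(429)); // retry
console.log(isRetryable(400));     // false
```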
if (response.status === 400) {
  const error = await response.json();
  if (error.error?.code === 'content_filter') {
    return { message: 'Content was filtered by safety policy.', filtered: true };
  }
}
| Setting | Value | Rationale |
|---|---|---|
| max_tokens | 500-2000 | Sized for expected response |
| temperature | 0.3-0.7 | Lower for tool calling, higher for creative |
| retry attempts | 5 | Handles transient rate limits |
| base delay | 2000ms | Start at 2s for backoff |
| max delay | 60000ms | Cap at 1 minute |
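The retry settings above translate into a delay schedule of 2s, 4s, 8s, 16s, 32s, capped at 60s. One way to sketch that, assuming a server-supplied `Retry-After` value in seconds should take precedence when present; `backoffDelayMs` and its option names are illustrative:

```typescript
interface BackoffOptions {
  baseDelayMs: number;
  maxDelayMs: number;
}

// Defaults taken from the settings table above.
const DEFAULT_BACKOFF: BackoffOptions = { baseDelayMs: 2000, maxDelayMs: 60000 };

// Delay before retry number `attempt` (1-based).
// Assumption: a Retry-After hint from the server overrides the computed delay.
function backoffDelayMs(
  attempt: number,
  retryAfterSeconds: number | null = null,
  opts: BackoffOptions = DEFAULT_BACKOFF
): number {
  if (retryAfterSeconds !== null && retryAfterSeconds >= 0) {
    return retryAfterSeconds * 1000;
  }
  const exponential = opts.baseDelayMs * Math.pow(2, attempt - 1);
  return Math.min(exponential, opts.maxDelayMs);
}

console.log(backoffDelayMs(1));    // 2000
console.log(backoffDelayMs(5));    // 32000
console.log(backoffDelayMs(6));    // 60000
console.log(backoffDelayMs(3, 7)); // 7000
```

Adding random jitter to the computed delay is a common refinement when many clients retry at once; it is left out here to keep the schedule easy to verify.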
| Trigger | Response |
|---|---|
| "azure openai", "rate limit", "429" | Full skill activation |
| "function calling", "tool calling" | Function Calling Patterns section |
| "token optimization", "max_tokens" | Pattern 2 + Recommended Settings |
| "retry", "backoff" | Pattern 1 + Error Codes |
| "request queue", "high volume" | Pattern 3 |