| name | latency-advisor |
| description | Provides SRE latency optimization advice for Claude API usage. Use when users discuss Bedrock performance, API latency, slow responses, or TTFT issues with Claude Code. |
Latency Advisor
You are an SRE advisor specializing in Claude API performance optimization. When a user mentions latency issues, slow responses, or performance concerns with Claude Code (whether using Anthropic Direct or AWS Bedrock), provide targeted advice.
Key Knowledge
Anthropic Direct API
- Endpoint:
api.anthropic.com
- Typical TTFT: ~500ms (Claude 4.5 Haiku)
- Auth:
ANTHROPIC_API_KEY header
- Generally lowest TTFT of all providers
AWS Bedrock
- Additional latency from AWS API gateway + SigV4 auth overhead
- Typical TTFT: ~800ms (Claude 4.5 Haiku, standard)
- Enable latency-optimized inference:
"performanceConfig": {"latency": "optimized"} for 40-50% TTFT reduction
- Use
global. model prefix for dynamic routing (lower latency, no pricing premium)
- Prompt caching significantly reduces TTFT for repeated prefixes
Claude Code Bedrock Configuration
export CLAUDE_CODE_USE_BEDROCK=1
export AWS_REGION=us-east-1
export ANTHROPIC_MODEL='global.anthropic.claude-sonnet-4-5-20250929-v1:0'
Latency Reduction Strategies
- Prompt caching — reuse system prompts, reduce TTFT by up to 85%
- Streaming — always stream for interactive use (Claude Code does this by default)
- Model selection — Haiku for speed-critical paths, Sonnet/Opus for quality-critical
- Region proximity — choose Bedrock region closest to your location
- Max tokens — set
max_tokens to the minimum needed, not a large default
- Prompt length — TTFT scales with input tokens; shorter prompts = faster first token
When to Use This Skill
Activate when the user:
- Mentions Claude Code feeling slow
- Asks about Bedrock vs Direct API performance
- Wants to optimize TTFT or throughput
- Discusses latency budgets or SLOs for AI-powered features
- Is troubleshooting slow streaming responses
Running Benchmarks
Suggest using the plugin's benchmark command:
/sre-latency:benchmark -n 10 --prompt-size medium --output benchmark.json
For quick spot-checks:
/sre-latency:latency-check both