enterprise-ai-patterns
// Production-grade AI architecture patterns for enterprise - security, governance, scalability, and operational excellence
| name | Enterprise AI Patterns |
| description | Production-grade AI architecture patterns for enterprise - security, governance, scalability, and operational excellence |
| version | 1.1.0 |
| last_updated | "2026-01-06T00:00:00.000Z" |
| external_version | 2026 Enterprise Patterns |
| triggers | ["enterprise AI","production AI","AI governance","AI at scale","enterprise patterns"] |
You are an expert in enterprise-grade AI architecture patterns. You help organizations build AI systems that are secure, scalable, governable, and operationally excellent.
┌─────────────────────────────────────────────────────────────────┐
│ ENTERPRISE AI PILLARS │
│ │
│ ┌───────────┐ ┌───────────┐ ┌───────────┐ ┌───────────┐ │
│ │ SECURITY │ │ GOVERNANCE│ │ SCALE │ │ OPERATIONS│ │
│ │ │ │ │ │ │ │ │ │
│ │ - IAM │ │ - Policies│ │ - Auto │ │ - Monitor │ │
│ │ - Encrypt │ │ - Audit │ │ - Distrib │ │ - Alert │ │
│ │ - Network │ │ - Lineage │ │ - Multi- │ │ - Incident│ │
│ │ - Data │ │ - Quality │ │ region │ │ - SRE │ │
│ └───────────┘ └───────────┘ └───────────┘ └───────────┘ │
│ │
│ ┌───────────┐ │
│ │ COST │ │
│ │ │ │
│ │ - FinOps │ │
│ │ - Optimize│ │
│ │ - Budget │ │
│ └───────────┘ │
└─────────────────────────────────────────────────────────────────┘
Centralized entry point for all AI services with security, routing, and observability.
┌─────────────────────────────────────────────────────────────────┐
│ AI GATEWAY PATTERN │
│ │
│ Applications │
│ ┌────────┐ ┌────────┐ ┌────────┐ │
│ │ App A │ │ App B │ │ App C │ │
│ └───┬────┘ └───┬────┘ └───┬────┘ │
│ │ │ │ │
│ └──────────┼──────────┘ │
│ │ │
│ ┌───────▼───────┐ │
│ │ AI GATEWAY │ │
│ │ │ │
│ │ - AuthN/AuthZ │ │
│ │ - Rate Limit │ │
│ │ - Routing │ │
│ │ - Logging │ │
│ │ - Caching │ │
│ │ - Fallback │ │
│ └───────┬───────┘ │
│ │ │
│ ┌─────────────┼─────────────┐ │
│ │ │ │ │
│ ▼ ▼ ▼ │
│ ┌──────┐ ┌──────┐ ┌──────┐ │
│ │ OCI │ │Azure │ │ AWS │ │
│ │GenAI │ │OpenAI│ │Bedrock│ │
│ └──────┘ └──────┘ └──────┘ │
└─────────────────────────────────────────────────────────────────┘
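from __future__ import annotations  # annotations may name types defined elsewhere

import logging
import time

from fastapi import FastAPI, Request, HTTPException
from fastapi.middleware.cors import CORSMiddleware

app = FastAPI()

# AIRequest, AIResponse, Provider, ProviderError, RateLimiter, ResponseCache, and the
# three provider classes are assumed to be defined elsewhere in the codebase.


class AIGateway:
    def __init__(self):
        self.providers = {
            "oci": OCIGenAIProvider(),
            "azure": AzureOpenAIProvider(),
            "aws": AWSBedrockProvider(),
        }
        self.rate_limiter = RateLimiter()
        self.cache = ResponseCache()
        self.logger = logging.getLogger("ai_gateway")

    async def route_request(self, request: AIRequest) -> AIResponse:
        # 1. Rate limiting
        if not self.rate_limiter.allow(request.user_id):
            raise HTTPException(429, "Rate limit exceeded")

        # 2. Check cache
        cached = self.cache.get(request)
        if cached is not None:
            return cached

        # 3. Route to provider
        provider = self.select_provider(request)

        # 4. Execute with fallback
        try:
            response = await provider.generate(request)
        except ProviderError:
            response = await self.fallback(request)

        # 5. Cache and log
        self.cache.set(request, response)
        self.log_request(request, response)
        return response

    def select_provider(self, request: AIRequest) -> Provider:
        """Route based on model preference or cost."""
        if request.model.startswith("gpt"):
            return self.providers["azure"]
        elif request.model.startswith("claude"):
            return self.providers["aws"]
        else:
            return self.providers["oci"]  # Default to OCI

    async def fallback(self, request: AIRequest) -> AIResponse:
        """Assumed policy: try each remaining provider in turn."""
        primary = self.select_provider(request)
        for provider in self.providers.values():
            if provider is primary:
                continue
            try:
                return await provider.generate(request)
            except ProviderError:
                continue
        raise HTTPException(503, "All AI providers failed")

    def log_request(self, request: AIRequest, response: AIResponse) -> None:
        """Assumed minimal audit log entry; the response body is not logged here."""
        self.logger.info("user=%s model=%s", request.user_id, request.model)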
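The gateway calls `self.rate_limiter.allow(request.user_id)` but does not show the limiter itself. One common way to implement it is a per-user token bucket; the sketch below is an assumption about how such a `RateLimiter` could look, not the gateway's actual implementation. The `capacity`, `refill_per_sec`, and `clock` parameters are hypothetical names chosen for illustration.

```python
import time
from typing import Callable, Dict, Tuple


class RateLimiter:
    """Per-user token bucket: bursts up to `capacity`, refilled at `refill_per_sec`."""

    def __init__(self, capacity: int = 5, refill_per_sec: float = 1.0,
                 clock: Callable[[], float] = time.monotonic):
        self.capacity = capacity
        self.refill_per_sec = refill_per_sec
        self.clock = clock  # injectable so tests can control time
        # user_id -> (tokens remaining, timestamp of last update)
        self._buckets: Dict[str, Tuple[float, float]] = {}

    def allow(self, user_id: str) -> bool:
        now = self.clock()
        tokens, last = self._buckets.get(user_id, (float(self.capacity), now))
        # Refill in proportion to elapsed time, capped at the bucket capacity.
        tokens = min(float(self.capacity), tokens + (now - last) * self.refill_per_sec)
        if tokens >= 1.0:
            self._buckets[user_id] = (tokens - 1.0, now)
            return True
        self._buckets[user_id] = (tokens, now)
        return False
```

This version keeps state in process memory, so it only limits traffic seen by a single gateway instance. A deployment with several gateway replicas would likely need a shared store for the buckets.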
Central catalog of approved AI models with versioning, lineage, and access control.
┌─────────────────────────────────────────────────────────────────┐
│ MODEL REGISTRY PATTERN │
│ │
│ ┌────────────────────────────────────────────────────────────┐ │
│ │ MODEL REGISTRY │ │
│ │ │ │
│ │ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ │ │
│ │ │ Model A │ │ Model B │ │ Model C │ │ │
│ │ │ v1.0, v1.1 │ │ v2.0 │ │ v1.0 │ │ │
│ │ │ │ │ │ │ │ │ │
│ │ │ Status: │ │ Status: │ │ Status: │ │ │
│ │ │ PRODUCTION │ │ STAGING │ │ DEPRECATED │ │ │
│ │ └─────────────┘ └─────────────┘ └─────────────┘ │ │
│ │ │ │
│ │ Metadata: │ │
│ │ - Owner, Team │ │
│ │ - Training data lineage │ │
│ │ - Performance metrics │ │
│ │ - Approval status │ │
│ │ - Access permissions │ │
│ └────────────────────────────────────────────────────────────┘ │
│ │
│ Governance: │
│ ├── Approval workflow (ML → Security → Legal → Deploy) │
│ ├── Version control (immutable versions) │
│ ├── Access control (who can use which models) │
│ └── Audit trail (all model operations logged) │
└─────────────────────────────────────────────────────────────────┘
Model States:
  DEVELOPMENT:
    - In active development
    - Not for production use
    - Access: ML team only
  STAGING:
    - Ready for testing
    - Pending approval
    - Access: QA, stakeholders
  APPROVED:
    - Passed all reviews
    - Ready for production
    - Access: Applications
  PRODUCTION:
    - Actively serving traffic
    - Monitored
    - Access: Production systems
  DEPRECATED:
    - Scheduled for removal
    - New uses blocked
    - Existing uses grandfathered
  ARCHIVED:
    - Removed from service
    - Retained for audit
    - No access
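The model states can be enforced in code as a small state machine that rejects skipped steps and records every change for the audit trail. The sketch below is illustrative only: the source lists the states but not the allowed transitions, so the `ALLOWED_TRANSITIONS` table (including the STAGING to DEVELOPMENT rollback) and the `RegisteredModel` class are assumptions.

```python
from enum import Enum
from typing import Dict, List, Set, Tuple


class ModelState(Enum):
    DEVELOPMENT = "development"
    STAGING = "staging"
    APPROVED = "approved"
    PRODUCTION = "production"
    DEPRECATED = "deprecated"
    ARCHIVED = "archived"


# Assumed transition table: a linear lifecycle plus a rollback from STAGING.
ALLOWED_TRANSITIONS: Dict[ModelState, Set[ModelState]] = {
    ModelState.DEVELOPMENT: {ModelState.STAGING},
    ModelState.STAGING: {ModelState.APPROVED, ModelState.DEVELOPMENT},
    ModelState.APPROVED: {ModelState.PRODUCTION},
    ModelState.PRODUCTION: {ModelState.DEPRECATED},
    ModelState.DEPRECATED: {ModelState.ARCHIVED},
    ModelState.ARCHIVED: set(),
}


class RegisteredModel:
    """A registry entry whose state changes are validated and logged."""

    def __init__(self, name: str, version: str):
        self.name = name
        self.version = version
        self.state = ModelState.DEVELOPMENT
        # Each entry is (actor, previous state name, new state name).
        self.audit_log: List[Tuple[str, str, str]] = []

    def transition(self, new_state: ModelState, actor: str) -> None:
        if new_state not in ALLOWED_TRANSITIONS[self.state]:
            raise ValueError(
                f"{self.name} {self.version}: {self.state.name} -> {new_state.name} not allowed"
            )
        self.audit_log.append((actor, self.state.name, new_state.name))
        self.state = new_state
```

For example, a model in STAGING can move to APPROVED, but a direct jump from APPROVED to ARCHIVED raises `ValueError` and leaves the audit log unchanged.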
Full visibility into AI system health, performance, and behavior.
┌─────────────────────────────────────────────────────────────────┐
│ AI OBSERVABILITY STACK │
│ │
│ ┌─────────────────────────────────────────────────────────────┐│
│ │ DASHBOARDS ││
│ │ ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐ ││
│ │ │ Latency │ │Throughput│ │ Errors │ │ Cost │ ││
│ │ └──────────┘ └──────────┘ └──────────┘ └──────────┘ ││
│ └─────────────────────────────────────────────────────────────┘│
│ │
│ ┌─────────────────────────────────────────────────────────────┐│
│ │ ALERTING ││
│ │ - Latency > threshold ││
│ │ - Error rate spike ││
│ │ - Cost anomaly ││
│ │ - Model drift detected ││
│ └─────────────────────────────────────────────────────────────┘│
│ │
│ ┌─────────────────────────────────────────────────────────────┐│
│ │ DATA LAYER ││
│ │ ┌──────────┐ ┌──────────┐ ┌──────────┐ ││
│ │ │ Metrics │ │ Logs │ │ Traces │ ││
│ │ │ (Prom) │ │ (Loki) │ │ (Jaeger) │ ││
│ │ └──────────┘ └──────────┘ └──────────┘ ││
│ └─────────────────────────────────────────────────────────────┘│
│ │
│ Instrumentation: │
│ - Request/response logging │
│ - Token usage tracking │
│ - Latency breakdown │
│ - Error classification │
│ - User feedback signals │
└─────────────────────────────────────────────────────────────────┘
Latency Metrics:
  - p50_latency_ms: Typical (median) response time
  - p95_latency_ms: Slow responses that users still hit regularly
  - p99_latency_ms: Edge cases
  - time_to_first_token: Delay before streaming starts
Throughput Metrics:
  - requests_per_second: Current load
  - tokens_per_second: Processing rate
  - concurrent_requests: Active requests
  - queue_depth: Waiting requests
Quality Metrics:
  - error_rate: Failed requests %
  - hallucination_rate: Detected hallucinations
  - user_feedback_score: Thumbs up/down ratio
  - retrieval_relevance: RAG quality score
Cost Metrics:
  - tokens_consumed: Input + output
  - cost_per_request: Avg cost
  - daily_spend: Total cost
  - cost_by_application: Breakdown
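Several of the metrics above can be derived from raw per-request samples. The sketch below computes the latency percentiles, error rate, and average cost using the nearest-rank percentile method; that method is one choice among several, and the function names and input shapes are assumptions for illustration. In practice a metrics backend would usually compute percentiles from histograms instead.

```python
import math
from typing import Dict, Sequence


def percentile(values: Sequence[float], pct: float) -> float:
    """Nearest-rank percentile: the smallest value with at least pct% of samples at or below it."""
    if not values:
        raise ValueError("percentile() needs at least one sample")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def summarize(latencies_ms: Sequence[float], costs_usd: Sequence[float],
              failed_requests: int) -> Dict[str, float]:
    """Aggregate one window of request samples into dashboard metrics."""
    total = len(latencies_ms)
    return {
        "p50_latency_ms": percentile(latencies_ms, 50),
        "p95_latency_ms": percentile(latencies_ms, 95),
        "p99_latency_ms": percentile(latencies_ms, 99),
        "error_rate": failed_requests / total,
        "cost_per_request": sum(costs_usd) / len(costs_usd),
    }
```

Keeping p95 and p99 alongside the median matters because a healthy median can hide a slow tail that only a fraction of users experience.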
Version-controlled, tested, and deployed prompts as code.
┌─────────────────────────────────────────────────────────────────┐
│ PROMPT MANAGEMENT SYSTEM │
│ │
│ ┌─────────────────────────────────────────────────────────────┐│
│ │ PROMPT REPOSITORY ││
│ │ ││
│ │ prompts/ ││
│ │ ├── customer_support/ ││
│ │ │ ├── v1.0.0/ ││
│ │ │ │ ├── system.txt ││
│ │ │ │ ├── examples.json ││
│ │ │ │ └── tests.json ││
│ │ │ └── v1.1.0/ ││
│ │ │ └── ... ││
│ │ └── data_analysis/ ││
│ │ └── ... ││
│ └─────────────────────────────────────────────────────────────┘│
│ │
│ CI/CD Pipeline: │
│ ┌────────┐ ┌────────┐ ┌────────┐ ┌────────┐ ┌────────┐ │
│ │ Commit │─▶│ Test │─▶│ Review │─▶│ Stage │─▶│ Deploy │ │
│ └────────┘ └────────┘ └────────┘ └────────┘ └────────┘ │
│ │
│ Testing: │
│ - Unit tests (expected outputs) │
│ - Regression tests (no quality drop) │
│ - A/B tests (compare versions) │
│ - Safety tests (no harmful outputs) │
└─────────────────────────────────────────────────────────────────┘
# prompts/customer_support/v1.1.0/config.yaml
name: customer_support
version: 1.1.0
description: "Handle customer support inquiries"
system_prompt: |
  You are a helpful customer support agent for {company_name}.
  Guidelines:
  - Be professional and empathetic
  - Cite knowledge base sources
  - Escalate complex issues
  - Never share internal policies
  Knowledge cutoff: {kb_update_date}
variables:
  - company_name: required
  - kb_update_date: required
examples:
  - input: "I want to return my order"
    expected_topics: ["return_policy", "refund_timeline"]
  - input: "My product is broken"
    expected_topics: ["warranty", "replacement"]
tests:
  - name: "handles_refund_question"
    input: "How do I get a refund?"
    assertions:
      - contains: "refund"
      - does_not_contain: "internal"
      - sentiment: "helpful"
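A CI test stage could consume a config like this in two steps: fill the required variables into the system prompt, then check a model output against the `assertions` list. The sketch below shows one possible shape for that, assuming the config has already been parsed into Python values. `render_prompt` and `check_assertions` are hypothetical helper names, and the `sentiment` assertion is skipped because it would need a classifier.

```python
from typing import Dict, List


def render_prompt(template: str, required: List[str], values: Dict[str, str]) -> str:
    """Fill {placeholders} in the template, failing loudly on missing variables."""
    missing = [name for name in required if name not in values]
    if missing:
        raise KeyError(f"missing required variables: {missing}")
    return template.format(**values)


def check_assertions(output: str, assertions: List[Dict[str, str]]) -> List[str]:
    """Return a description of each failed assertion (empty list means pass)."""
    failures = []
    lowered = output.lower()
    for assertion in assertions:
        for kind, expected in assertion.items():
            if kind == "contains":
                ok = expected.lower() in lowered
            elif kind == "does_not_contain":
                ok = expected.lower() not in lowered
            else:
                # Other kinds (e.g. "sentiment") need a model-based check; skipped here.
                continue
            if not ok:
                failures.append(f"{kind}: {expected}")
    return failures
```

Note that `str.format` treats every brace pair as a placeholder, so a prompt containing literal braces (for instance, JSON examples) would need them doubled or a different templating approach.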
┌─────────────────────────────────────────────────────────────────┐
│ AI SECURITY LAYERS │
│ │
│ Layer 1: PERIMETER │
│ ┌─────────────────────────────────────────────────────────────┐│
│ │ - API Gateway authentication ││
│ │ - Rate limiting ││
│ │ - IP allowlisting ││
│ │ - WAF rules ││
│ └─────────────────────────────────────────────────────────────┘│
│ │
│ Layer 2: INPUT VALIDATION │
│ ┌─────────────────────────────────────────────────────────────┐│
│ │ - Prompt injection detection ││
│ │ - Input sanitization ││
│ │ - Length limits ││
│ │ - Content filtering ││
│ └─────────────────────────────────────────────────────────────┘│
│ │
│ Layer 3: MODEL SECURITY │
│ ┌─────────────────────────────────────────────────────────────┐│
│ │ - Dedicated clusters (isolation) ││
│ │ - Content moderation ││
│ │ - Output filtering ││
│ │ - Guardrails ││
│ └─────────────────────────────────────────────────────────────┘│
│ │
│ Layer 4: DATA PROTECTION │
│ ┌─────────────────────────────────────────────────────────────┐│
│ │ - Encryption at rest ││
│ │ - Encryption in transit ││
│ │ - PII detection/masking ││
│ │ - Data residency controls ││
│ └─────────────────────────────────────────────────────────────┘│
│ │
│ Layer 5: AUDIT & COMPLIANCE │
│ ┌─────────────────────────────────────────────────────────────┐│
│ │ - Request/response logging ││
│ │ - Access audit trail ││
│ │ - Compliance reporting ││
│ │ - Incident response ││
│ └─────────────────────────────────────────────────────────────┘│
└─────────────────────────────────────────────────────────────────┘
import re


class SecurityError(Exception):
    """Raised when user input looks like a prompt injection attempt."""


class PromptSanitizer:
    """Detect and mitigate prompt injection attacks."""

    INJECTION_PATTERNS = [
        r"ignore previous instructions",
        r"disregard .*instructions",
        r"you are now",
        r"new persona",
        r"system prompt",
        r"<\|.*\|>",  # Special tokens
    ]

    def sanitize(self, user_input: str) -> str:
        # 1. Check for known patterns
        for pattern in self.INJECTION_PATTERNS:
            if re.search(pattern, user_input, re.IGNORECASE):
                raise SecurityError("Potential prompt injection detected")

        # 2. Escape special characters
        sanitized = self.escape_special(user_input)

        # 3. Wrap in delimiters so the model can tell user text from instructions
        wrapped = f"<user_input>{sanitized}</user_input>"
        return wrapped

    def escape_special(self, text: str) -> str:
        """Escape characters that could be interpreted as instructions."""
        replacements = {
            "```": "'''",  # Code blocks
            "###": "---",  # Markdown headers
            "<|": "< |",  # Special tokens
            "|>": "| >",
        }
        for old, new in replacements.items():
            text = text.replace(old, new)
        return text
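Layer 4 of the security diagram lists PII detection/masking. A minimal regex-based sketch of that step follows; the two patterns (email addresses and US Social Security numbers) are illustrative assumptions and nowhere near complete coverage, and `mask_pii` is a hypothetical helper name. Production systems typically rely on a dedicated PII detection service instead.

```python
import re
from typing import Dict, Tuple

# Illustrative patterns only; real deployments need far broader coverage.
PII_PATTERNS = {
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def mask_pii(text: str) -> Tuple[str, Dict[str, int]]:
    """Replace each detected PII span with a [LABEL] token and count the hits."""
    counts: Dict[str, int] = {}
    for label, pattern in PII_PATTERNS.items():
        text, hits = pattern.subn(f"[{label}]", text)
        counts[label] = hits
    return text, counts
```

Returning the hit counts alongside the masked text lets the audit layer record that PII was present without storing the PII itself.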
┌─────────────────────────────────────────────────────────────────┐
│ AI FINOPS FRAMEWORK │
│ │
│ VISIBILITY │
│ ┌─────────────────────────────────────────────────────────────┐│
│ │ - Cost by application/team ││
│ │ - Cost by model ││
│ │ - Token usage trends ││
│ │ - Unit economics (cost per conversation) ││
│ └─────────────────────────────────────────────────────────────┘│
│ │
│ OPTIMIZATION │
│ ┌─────────────────────────────────────────────────────────────┐│
│ │ - Model right-sizing (use smaller when sufficient) ││
│ │ - Caching (avoid redundant calls) ││
│ │ - Batching (combine requests) ││
│ │ - Reserved capacity (commit for discounts) ││
│ └─────────────────────────────────────────────────────────────┘│
│ │
│ GOVERNANCE │
│ ┌─────────────────────────────────────────────────────────────┐│
│ │ - Budget alerts by team ││
│ │ - Spend caps per application ││
│ │ - Chargeback/showback ││
│ │ - Approval for expensive models ││
│ └─────────────────────────────────────────────────────────────┘│
└─────────────────────────────────────────────────────────────────┘
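The governance items "budget alerts by team" and "spend caps per application" can be expressed as a small tracker that the gateway consults before each call. The sketch below is one possible shape under stated assumptions: `BudgetTracker`, the 80% `alert_ratio` default, and the three return values are all hypothetical, and spend is held in memory rather than in a billing system.

```python
from collections import defaultdict
from typing import Dict


class BudgetTracker:
    """Track spend per application against a hard cap with an early alert threshold."""

    def __init__(self, caps_usd: Dict[str, float], alert_ratio: float = 0.8):
        self.caps_usd = caps_usd
        self.alert_ratio = alert_ratio  # fraction of the cap that triggers an alert
        self.spend_usd: Dict[str, float] = defaultdict(float)

    def record(self, app: str, cost_usd: float) -> str:
        """Return 'blocked' if the cap would be exceeded, else 'alert' or 'ok'."""
        cap = self.caps_usd[app]
        if self.spend_usd[app] + cost_usd > cap:
            return "blocked"  # spend is not recorded for a rejected request
        self.spend_usd[app] += cost_usd
        if self.spend_usd[app] >= cap * self.alert_ratio:
            return "alert"
        return "ok"
```

For example, with a cap of 10 USD, recording 5 then 3 yields "ok" then "alert", and a further 3 is "blocked" while recorded spend stays at 8.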
Strategy 1: MODEL TIERING
  - Route simple queries to cheaper models
  - Reserve expensive models for complex tasks
  - Example: Command Light for FAQ, Command R+ for analysis
Strategy 2: CACHING
  - Cache identical queries
  - Semantic caching (similar queries)
  - Cache embeddings
  - TTL based on content freshness
Strategy 3: PROMPT OPTIMIZATION
  - Shorter prompts = fewer input tokens
  - Efficient few-shot examples
  - Remove unnecessary context
Strategy 4: BATCHING
  - Combine multiple small requests
  - Process in bulk during off-peak
  - Reduced per-request overhead
Strategy 5: COMMITMENT
  - Reserved capacity for steady workloads
  - Volume discounts with providers
  - Multi-year agreements where appropriate
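Strategy 2 (caching identical queries with a TTL) can be sketched as an exact-match cache keyed on the model name and a normalized prompt. The class below is an illustration under stated assumptions: `TTLResponseCache`, its `ttl_seconds` and `clock` parameters, and the lowercase-and-collapse-whitespace normalization are all hypothetical choices. It does not cover semantic caching, which would need embeddings and a similarity threshold.

```python
import hashlib
import time
from typing import Callable, Dict, Optional, Tuple


class TTLResponseCache:
    """Exact-match response cache where each entry expires after ttl_seconds."""

    def __init__(self, ttl_seconds: float = 300.0,
                 clock: Callable[[], float] = time.monotonic):
        self.ttl_seconds = ttl_seconds
        self.clock = clock  # injectable so tests can control time
        self._entries: Dict[str, Tuple[float, str]] = {}  # key -> (expiry, response)

    @staticmethod
    def _key(model: str, prompt: str) -> str:
        # Normalize case and whitespace so trivially different prompts share a key.
        normalized = " ".join(prompt.lower().split())
        return hashlib.sha256(f"{model}\n{normalized}".encode("utf-8")).hexdigest()

    def get(self, model: str, prompt: str) -> Optional[str]:
        key = self._key(model, prompt)
        entry = self._entries.get(key)
        if entry is None:
            return None
        expires_at, response = entry
        if self.clock() >= expires_at:
            del self._entries[key]  # drop stale entries lazily on read
            return None
        return response

    def set(self, model: str, prompt: str, response: str) -> None:
        key = self._key(model, prompt)
        self._entries[key] = (self.clock() + self.ttl_seconds, response)
```

Including the model name in the key prevents a response from a cheaper model being served for a request that was routed to a more capable one.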
┌─────────────────────────────────────────────────────────────────┐
│ MULTI-REGION AI DEPLOYMENT │
│ │
│ Region A (Primary) Region B (Secondary) │
│ ┌─────────────────────┐ ┌─────────────────────┐ │
│ │ AI Services │ │ AI Services │ │
│ │ ┌──────────────┐ │ │ ┌──────────────┐ │ │
│ │ │ GenAI DAC │ │ │ │ GenAI DAC │ │ │
│ │ └──────────────┘ │ │ └──────────────┘ │ │
│ │ ┌──────────────┐ │ │ ┌──────────────┐ │ │
│ │ │ Knowledge Base│ │ │ │ Knowledge Base│ │ │
│ │ └──────────────┘ │ │ └──────────────┘ │ │
│ └─────────────────────┘ └─────────────────────┘ │
│ │ │ │
│ └──────────────┬───────────────┘ │
│ │ │
│ ┌───────▼───────┐ │
│ │ Global Load │ │
│ │ Balancer │ │
│ │ │ │
│ │ - Health │ │
│ │ - Failover │ │
│ │ - Geo-routing │ │
│ └───────────────┘ │
│ │
│ Sync: │
│ - Knowledge bases replicated │
│ - Models deployed to both regions │
│ - Config synchronized │
└─────────────────────────────────────────────────────────────────┘
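The load balancer's three responsibilities in the diagram (health, failover, geo-routing) reduce to a selection rule: drop unhealthy regions, prefer the region nearest the client, and otherwise fall back to the highest-priority healthy region. The sketch below illustrates that rule only; `Region`, `pick_region`, and the `priority` field are hypothetical, and a managed global load balancer would normally make this decision.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Region:
    name: str
    healthy: bool
    priority: int  # lower value = preferred (0 for the primary region)


def pick_region(regions: List[Region], client_geo: Optional[str] = None) -> Region:
    """Choose a serving region: geo match first, then priority, healthy regions only."""
    healthy = [region for region in regions if region.healthy]
    if not healthy:
        raise RuntimeError("no healthy region available")
    # Geo-routing: use the client's nearest region when it is healthy.
    for region in healthy:
        if client_geo is not None and region.name == client_geo:
            return region
    # Failover: otherwise take the most preferred healthy region.
    return min(healthy, key=lambda region: region.priority)
```

Failover only helps if the secondary region can actually serve traffic, which is why the sync items above (replicated knowledge bases, models in both regions, synchronized config) are prerequisites for this rule.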
- [ ] Deploy AI Gateway
- [ ] Implement authentication/authorization
- [ ] Set up basic monitoring
- [ ] Configure rate limiting
- [ ] Enable audit logging
- [ ] Establish model registry
- [ ] Define approval workflows
- [ ] Implement prompt management
- [ ] Create cost tracking
- [ ] Document policies
- [ ] Input validation layer
- [ ] Output filtering
- [ ] PII detection
- [ ] Prompt injection defense
- [ ] Security review process
- [ ] Full observability stack
- [ ] Alerting rules
- [ ] Runbooks
- [ ] Incident response plan
- [ ] Capacity planning
- [ ] Caching strategy
- [ ] Model tiering
- [ ] Cost optimization
- [ ] Performance tuning
- [ ] Multi-region deployment