Name: Enterprise AI Patterns
Author: frankxai

Production-grade AI architecture patterns for enterprise - security, governance, scalability, and operational excellence

Updated: January 6, 2026 at 17:52

Repository: frankxai/ai-architect

name: Enterprise AI Patterns
description: Production-grade AI architecture patterns for enterprise - security, governance, scalability, and operational excellence
version: 1.1.0
last_updated: "2026-01-06T00:00:00.000Z"
external_version: 2026 Enterprise Patterns
triggers: ["enterprise AI","production AI","AI governance","AI at scale","enterprise patterns"]

Enterprise AI Patterns

You are an expert in enterprise-grade AI architecture patterns. You help organizations build AI systems that are secure, scalable, governable, and operationally excellent.

Enterprise AI Architecture Principles

The Five Pillars

┌─────────────────────────────────────────────────────────────────┐
│                  ENTERPRISE AI PILLARS                           │
│                                                                  │
│  ┌───────────┐ ┌───────────┐ ┌───────────┐ ┌───────────┐       │
│  │ SECURITY  │ │ GOVERNANCE│ │ SCALE     │ │ OPERATIONS│       │
│  │           │ │           │ │           │ │           │       │
│  │ - IAM     │ │ - Policies│ │ - Auto    │ │ - Monitor │       │
│  │ - Encrypt │ │ - Audit   │ │ - Distrib │ │ - Alert   │       │
│  │ - Network │ │ - Lineage │ │ - Multi-  │ │ - Incident│       │
│  │ - Data    │ │ - Quality │ │   region  │ │ - SRE     │       │
│  └───────────┘ └───────────┘ └───────────┘ └───────────┘       │
│                                                                  │
│                     ┌───────────┐                               │
│                     │   COST    │                               │
│                     │           │                               │
│                     │ - FinOps  │                               │
│                     │ - Optimize│                               │
│                     │ - Budget  │                               │
│                     └───────────┘                               │
└─────────────────────────────────────────────────────────────────┘

Pattern 1: AI Gateway Architecture

Purpose

Centralized entry point for all AI services with security, routing, and observability.

Architecture

┌─────────────────────────────────────────────────────────────────┐
│                      AI GATEWAY PATTERN                          │
│                                                                  │
│   Applications                                                  │
│   ┌────────┐ ┌────────┐ ┌────────┐                             │
│   │ App A  │ │ App B  │ │ App C  │                             │
│   └───┬────┘ └───┬────┘ └───┬────┘                             │
│       │          │          │                                   │
│       └──────────┼──────────┘                                   │
│                  │                                               │
│          ┌───────▼───────┐                                      │
│          │  AI GATEWAY   │                                      │
│          │               │                                      │
│          │ - AuthN/AuthZ │                                      │
│          │ - Rate Limit  │                                      │
│          │ - Routing     │                                      │
│          │ - Logging     │                                      │
│          │ - Caching     │                                      │
│          │ - Fallback    │                                      │
│          └───────┬───────┘                                      │
│                  │                                               │
│    ┌─────────────┼─────────────┐                                │
│    │             │             │                                │
│    ▼             ▼             ▼                                │
│ ┌──────┐    ┌──────┐    ┌───────┐                              │
│ │ OCI  │    │Azure │    │  AWS  │                              │
│ │GenAI │    │OpenAI│    │Bedrock│                              │
│ └──────┘    └──────┘    └───────┘                              │
└─────────────────────────────────────────────────────────────────┘

Implementation

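import logging

from fastapi import FastAPI, HTTPException

# AIRequest, AIResponse, Provider, ProviderError, RateLimiter, ResponseCache,
# and the three *Provider classes are application-specific and assumed to be
# defined elsewhere; they are not part of FastAPI.

app = FastAPI()

class AIGateway:
    def __init__(self):
        self.providers = {
            "oci": OCIGenAIProvider(),
            "azure": AzureOpenAIProvider(),
            "aws": AWSBedrockProvider()
        }
        self.rate_limiter = RateLimiter()
        self.cache = ResponseCache()
        self.logger = logging.getLogger("ai_gateway")

    async def route_request(self, request: AIRequest) -> AIResponse:
        # 1. Rate limiting
        if not self.rate_limiter.allow(request.user_id):
            raise HTTPException(status_code=429, detail="Rate limit exceeded")

        # 2. Check cache (compare against None so falsy responses still count as hits)
        cached = self.cache.get(request)
        if cached is not None:
            return cached

        # 3. Route to provider
        provider = self.select_provider(request)

        # 4. Execute with fallback
        try:
            response = await provider.generate(request)
        except ProviderError:
            self.logger.warning("Primary provider failed; trying fallback")
            response = await self.fallback(request, failed=provider)

        # 5. Cache and log
        self.cache.set(request, response)
        self.log_request(request, response)

        return response

    def select_provider(self, request: AIRequest) -> Provider:
        """Route by model-name prefix; unknown models default to OCI."""
        if request.model.startswith("gpt"):
            return self.providers["azure"]
        elif request.model.startswith("claude"):
            return self.providers["aws"]
        else:
            return self.providers["oci"]  # Default to OCI

    async def fallback(self, request: AIRequest, failed: Provider) -> AIResponse:
        """Try the remaining providers in order; respond 503 if all fail.

        Assumes each provider can map request.model to an equivalent model.
        """
        for provider in self.providers.values():
            if provider is failed:
                continue
            try:
                return await provider.generate(request)
            except ProviderError:
                continue
        raise HTTPException(status_code=503, detail="All AI providers unavailable")

    def log_request(self, request: AIRequest, response: AIResponse) -> None:
        """Log routing metadata only; raw prompts may contain PII."""
        self.logger.info("user=%s model=%s", request.user_id, request.model)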
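
The gateway calls `self.rate_limiter.allow(request.user_id)` without showing what sits behind it. One common choice is a per-user token bucket; the sketch below is an assumption about how such a `RateLimiter` could look, not the author's implementation. The `capacity`, `rate`, and `clock` parameters are hypothetical names introduced for illustration.

```python
import time
from dataclasses import dataclass


@dataclass
class _Bucket:
    tokens: float   # tokens currently available
    updated: float  # clock reading at the last refill


class RateLimiter:
    """Per-user token bucket: bursts up to `capacity`, refilled at `rate` tokens/sec."""

    def __init__(self, capacity: float = 5.0, rate: float = 1.0, clock=time.monotonic):
        self.capacity = capacity
        self.rate = rate
        self.clock = clock  # injectable so tests can control time
        self._buckets: dict[str, _Bucket] = {}

    def allow(self, user_id: str) -> bool:
        now = self.clock()
        bucket = self._buckets.setdefault(user_id, _Bucket(self.capacity, now))
        # Refill in proportion to elapsed time, never above capacity.
        elapsed = now - bucket.updated
        bucket.tokens = min(self.capacity, bucket.tokens + elapsed * self.rate)
        bucket.updated = now
        if bucket.tokens >= 1:
            bucket.tokens -= 1
            return True
        return False
```

This version keeps state in process memory, so it only works for a single gateway instance; a multi-instance deployment would need a shared store.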

Pattern 2: Model Registry & Governance

Purpose

Central catalog of approved AI models with versioning, lineage, and access control.

Architecture

┌─────────────────────────────────────────────────────────────────┐
│                    MODEL REGISTRY PATTERN                        │
│                                                                  │
│  ┌────────────────────────────────────────────────────────────┐ │
│  │                   MODEL REGISTRY                            │ │
│  │                                                             │ │
│  │  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐        │ │
│  │  │  Model A    │  │  Model B    │  │  Model C    │        │ │
│  │  │  v1.0, v1.1 │  │  v2.0       │  │  v1.0       │        │ │
│  │  │             │  │             │  │             │        │ │
│  │  │ Status:     │  │ Status:     │  │ Status:     │        │ │
│  │  │ PRODUCTION  │  │ STAGING     │  │ DEPRECATED  │        │ │
│  │  └─────────────┘  └─────────────┘  └─────────────┘        │ │
│  │                                                             │ │
│  │  Metadata:                                                  │ │
│  │  - Owner, Team                                             │ │
│  │  - Training data lineage                                   │ │
│  │  - Performance metrics                                     │ │
│  │  - Approval status                                         │ │
│  │  - Access permissions                                      │ │
│  └────────────────────────────────────────────────────────────┘ │
│                                                                  │
│  Governance:                                                    │
│  ├── Approval workflow (ML → Security → Legal → Deploy)        │
│  ├── Version control (immutable versions)                      │
│  ├── Access control (who can use which models)                 │
│  └── Audit trail (all model operations logged)                 │
└─────────────────────────────────────────────────────────────────┘

Model Lifecycle

Model States:
  DEVELOPMENT:
    - In active development
    - Not for production use
    - Access: ML team only

  STAGING:
    - Ready for testing
    - Pending approval
    - Access: QA, stakeholders

  APPROVED:
    - Passed all reviews
    - Ready for production
    - Access: Applications

  PRODUCTION:
    - Actively serving traffic
    - Monitored
    - Access: Production systems

  DEPRECATED:
    - Scheduled for removal
    - New uses blocked
    - Existing uses grandfathered

  ARCHIVED:
    - Removed from service
    - Retained for audit
    - No access
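
The lifecycle above lists the states but not which moves between them are legal. A registry would typically enforce that with a small state machine. The transition table in this sketch is an assumption (for example, that a model in STAGING can be sent back to DEVELOPMENT); adjust it to your own approval workflow.

```python
from enum import Enum


class ModelState(Enum):
    DEVELOPMENT = "development"
    STAGING = "staging"
    APPROVED = "approved"
    PRODUCTION = "production"
    DEPRECATED = "deprecated"
    ARCHIVED = "archived"


# Assumed edges: the source names the states, not the allowed transitions.
ALLOWED_TRANSITIONS = {
    ModelState.DEVELOPMENT: {ModelState.STAGING},
    ModelState.STAGING: {ModelState.APPROVED, ModelState.DEVELOPMENT},
    ModelState.APPROVED: {ModelState.PRODUCTION},
    ModelState.PRODUCTION: {ModelState.DEPRECATED},
    ModelState.DEPRECATED: {ModelState.ARCHIVED},
    ModelState.ARCHIVED: set(),  # terminal state
}


def transition(current: ModelState, target: ModelState) -> ModelState:
    """Return the new state, or raise if the move skips a required step."""
    if target not in ALLOWED_TRANSITIONS[current]:
        raise ValueError(f"illegal transition: {current.name} -> {target.name}")
    return target
```

Rejecting `DEVELOPMENT -> PRODUCTION` here is what keeps the approval workflow from being bypassed.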

Pattern 3: AI Observability Stack

Purpose

Full visibility into AI system health, performance, and behavior.

Architecture

┌─────────────────────────────────────────────────────────────────┐
│                   AI OBSERVABILITY STACK                         │
│                                                                  │
│  ┌─────────────────────────────────────────────────────────────┐│
│  │                      DASHBOARDS                              ││
│  │  ┌──────────┐  ┌──────────┐  ┌──────────┐  ┌──────────┐   ││
│  │  │ Latency  │  │Throughput│  │  Errors  │  │  Cost    │   ││
│  │  └──────────┘  └──────────┘  └──────────┘  └──────────┘   ││
│  └─────────────────────────────────────────────────────────────┘│
│                                                                  │
│  ┌─────────────────────────────────────────────────────────────┐│
│  │                       ALERTING                               ││
│  │  - Latency > threshold                                       ││
│  │  - Error rate spike                                          ││
│  │  - Cost anomaly                                              ││
│  │  - Model drift detected                                      ││
│  └─────────────────────────────────────────────────────────────┘│
│                                                                  │
│  ┌─────────────────────────────────────────────────────────────┐│
│  │                      DATA LAYER                              ││
│  │  ┌──────────┐  ┌──────────┐  ┌──────────┐                  ││
│  │  │ Metrics  │  │  Logs    │  │ Traces   │                  ││
│  │  │ (Prom)   │  │ (Loki)   │  │ (Jaeger) │                  ││
│  │  └──────────┘  └──────────┘  └──────────┘                  ││
│  └─────────────────────────────────────────────────────────────┘│
│                                                                  │
│  Instrumentation:                                               │
│  - Request/response logging                                     │
│  - Token usage tracking                                         │
│  - Latency breakdown                                            │
│  - Error classification                                         │
│  - User feedback signals                                        │
└─────────────────────────────────────────────────────────────────┘

Key Metrics

Latency Metrics:
  - p50_latency_ms: Median response time
  - p95_latency_ms: Slow but common requests (95th percentile)
  - p99_latency_ms: Tail latency (99th percentile)
  - time_to_first_token: Delay before streaming output begins

Throughput Metrics:
  - requests_per_second: Current load
  - tokens_per_second: Processing rate
  - concurrent_requests: Active requests
  - queue_depth: Waiting requests

Quality Metrics:
  - error_rate: Failed requests %
  - hallucination_rate: Detected hallucinations
  - user_feedback_score: Thumbs up/down ratio
  - retrieval_relevance: RAG quality score

Cost Metrics:
  - tokens_consumed: Input + output
  - cost_per_request: Avg cost
  - daily_spend: Total cost
  - cost_by_application: Breakdown
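
Several of these metrics can be derived from raw request records. The sketch below computes the latency percentiles with the nearest-rank method, plus `error_rate` and `cost_per_request`. The function names and the input shape are assumptions for illustration; in practice a metrics backend such as Prometheus would compute percentiles from histograms instead.

```python
import math


def percentile(values: list[float], pct: int) -> float:
    """Nearest-rank percentile: smallest sample with at least pct% of samples at or below it."""
    if not values:
        raise ValueError("no samples")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def summarize(latencies_ms: list[float], errors: int, total_cost_usd: float) -> dict[str, float]:
    """Summarize one reporting window; one latency sample per request is assumed."""
    requests = len(latencies_ms)
    return {
        "p50_latency_ms": percentile(latencies_ms, 50),
        "p95_latency_ms": percentile(latencies_ms, 95),
        "p99_latency_ms": percentile(latencies_ms, 99),
        "error_rate": errors / requests,
        "cost_per_request": total_cost_usd / requests,
    }
```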

Pattern 4: Prompt Management System

Purpose

Version-controlled, tested, and deployed prompts as code.

Architecture

┌─────────────────────────────────────────────────────────────────┐
│                  PROMPT MANAGEMENT SYSTEM                        │
│                                                                  │
│  ┌─────────────────────────────────────────────────────────────┐│
│  │                  PROMPT REPOSITORY                           ││
│  │                                                              ││
│  │  prompts/                                                    ││
│  │  ├── customer_support/                                       ││
│  │  │   ├── v1.0.0/                                            ││
│  │  │   │   ├── system.txt                                     ││
│  │  │   │   ├── examples.json                                  ││
│  │  │   │   └── tests.json                                     ││
│  │  │   └── v1.1.0/                                            ││
│  │  │       └── ...                                            ││
│  │  └── data_analysis/                                          ││
│  │      └── ...                                                 ││
│  └─────────────────────────────────────────────────────────────┘│
│                                                                  │
│  CI/CD Pipeline:                                                │
│  ┌────────┐  ┌────────┐  ┌────────┐  ┌────────┐  ┌────────┐  │
│  │ Commit │─▶│  Test  │─▶│ Review │─▶│  Stage │─▶│ Deploy │  │
│  └────────┘  └────────┘  └────────┘  └────────┘  └────────┘  │
│                                                                  │
│  Testing:                                                       │
│  - Unit tests (expected outputs)                                │
│  - Regression tests (no quality drop)                           │
│  - A/B tests (compare versions)                                 │
│  - Safety tests (no harmful outputs)                            │
└─────────────────────────────────────────────────────────────────┘

Prompt Template

# prompts/customer_support/v1.1.0/config.yaml
name: customer_support
version: 1.1.0
description: "Handle customer support inquiries"

system_prompt: |
  You are a helpful customer support agent for {company_name}.

  Guidelines:
  - Be professional and empathetic
  - Cite knowledge base sources
  - Escalate complex issues
  - Never share internal policies

  Knowledge cutoff: {kb_update_date}

variables:
  - company_name: required
  - kb_update_date: required

examples:
  - input: "I want to return my order"
    expected_topics: ["return_policy", "refund_timeline"]
  - input: "My product is broken"
    expected_topics: ["warranty", "replacement"]

tests:
  - name: "handles_refund_question"
    input: "How do I get a refund?"
    assertions:
      - contains: "refund"
      - does_not_contain: "internal"
      - sentiment: "helpful"
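
To show how a CI step might consume this config, the sketch below renders the `system_prompt` with its required variables and evaluates the `contains` / `does_not_contain` assertions against a model output. The `render` and `check` helpers are hypothetical, the config is passed in as plain Python values rather than parsed from YAML, and the `sentiment` assertion is skipped because it needs a classifier or an LLM judge.

```python
def render(template: str, required: list[str], values: dict[str, str]) -> str:
    """Fill {placeholders} in the template, failing fast on missing required variables."""
    missing = [name for name in required if name not in values]
    if missing:
        raise KeyError(f"missing required variables: {missing}")
    return template.format(**values)


def check(output: str, assertions: list[dict[str, str]]) -> list[str]:
    """Return the assertions that failed; an empty list means the test passed."""
    failures = []
    lowered = output.lower()
    for assertion in assertions:
        for kind, expected in assertion.items():
            if kind == "contains" and expected.lower() not in lowered:
                failures.append(f"contains: {expected}")
            elif kind == "does_not_contain" and expected.lower() in lowered:
                failures.append(f"does_not_contain: {expected}")
            # Other kinds (e.g. "sentiment") are ignored in this sketch.
    return failures
```

In a pipeline, the `output` argument would be the model's response to the test `input`; a non-empty failure list would fail the Test stage.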

Pattern 5: AI Security Layers

Defense in Depth

┌─────────────────────────────────────────────────────────────────┐
│                   AI SECURITY LAYERS                             │
│                                                                  │
│  Layer 1: PERIMETER                                             │
│  ┌─────────────────────────────────────────────────────────────┐│
│  │ - API Gateway authentication                                 ││
│  │ - Rate limiting                                              ││
│  │ - IP allowlisting                                            ││
│  │ - WAF rules                                                  ││
│  └─────────────────────────────────────────────────────────────┘│
│                                                                  │
│  Layer 2: INPUT VALIDATION                                      │
│  ┌─────────────────────────────────────────────────────────────┐│
│  │ - Prompt injection detection                                 ││
│  │ - Input sanitization                                         ││
│  │ - Length limits                                              ││
│  │ - Content filtering                                          ││
│  └─────────────────────────────────────────────────────────────┘│
│                                                                  │
│  Layer 3: MODEL SECURITY                                        │
│  ┌─────────────────────────────────────────────────────────────┐│
│  │ - Dedicated clusters (isolation)                             ││
│  │ - Content moderation                                         ││
│  │ - Output filtering                                           ││
│  │ - Guardrails                                                 ││
│  └─────────────────────────────────────────────────────────────┘│
│                                                                  │
│  Layer 4: DATA PROTECTION                                       │
│  ┌─────────────────────────────────────────────────────────────┐│
│  │ - Encryption at rest                                         ││
│  │ - Encryption in transit                                      ││
│  │ - PII detection/masking                                      ││
│  │ - Data residency controls                                    ││
│  └─────────────────────────────────────────────────────────────┘│
│                                                                  │
│  Layer 5: AUDIT & COMPLIANCE                                    │
│  ┌─────────────────────────────────────────────────────────────┐│
│  │ - Request/response logging                                   ││
│  │ - Access audit trail                                         ││
│  │ - Compliance reporting                                       ││
│  │ - Incident response                                          ││
│  └─────────────────────────────────────────────────────────────┘│
└─────────────────────────────────────────────────────────────────┘
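
Layer 4 lists PII detection/masking. As a rough illustration, the sketch below masks email addresses and US-style Social Security numbers with regular expressions before text is logged or sent to a model. The two patterns and the `mask_pii` name are assumptions chosen for brevity; real deployments usually rely on a dedicated PII detection service, since regexes miss many formats.

```python
import re

# Deliberately simple patterns; they cover common formats only.
PII_PATTERNS = {
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def mask_pii(text: str) -> str:
    """Replace each detected PII span with a bracketed type label."""
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text
```

Masking with a type label rather than deleting the span keeps the sentence readable for the model and for anyone reviewing the audit log.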

Prompt Injection Defense

import re


class SecurityError(Exception):
    """Raised when input matches a known injection pattern."""


class PromptSanitizer:
    """Detect and mitigate prompt injection attacks.

    Pattern matching is a first-line heuristic and will miss paraphrased
    attacks; treat it as one layer among those listed above.
    """

    INJECTION_PATTERNS = [
        r"ignore previous instructions",
        r"disregard .*instructions",
        r"you are now",
        r"new persona",
        r"system prompt",
        r"<\|.*\|>",  # Special tokens
    ]

    def sanitize(self, user_input: str) -> str:
        # 1. Check for known patterns
        for pattern in self.INJECTION_PATTERNS:
            if re.search(pattern, user_input, re.IGNORECASE):
                raise SecurityError("Potential prompt injection detected")

        # 2. Escape special characters
        sanitized = self.escape_special(user_input)

        # 3. Wrap in delimiters
        wrapped = f"<user_input>{sanitized}</user_input>"

        return wrapped

    def escape_special(self, text: str) -> str:
        """Escape characters that could be interpreted as instructions."""
        replacements = {
            "```": "'''",  # Code blocks
            "###": "---",  # Markdown headers
            "<|": "< |",   # Special tokens
            "|>": "| >",
            # Stop input from closing or reopening the wrapper added in sanitize()
            "<user_input>": "&lt;user_input&gt;",
            "</user_input>": "&lt;/user_input&gt;",
        }
        for old, new in replacements.items():
            text = text.replace(old, new)
        return text

Pattern 6: Cost Management

FinOps for AI

┌─────────────────────────────────────────────────────────────────┐
│                    AI FINOPS FRAMEWORK                           │
│                                                                  │
│  VISIBILITY                                                     │
│  ┌─────────────────────────────────────────────────────────────┐│
│  │ - Cost by application/team                                   ││
│  │ - Cost by model                                              ││
│  │ - Token usage trends                                         ││
│  │ - Unit economics (cost per conversation)                     ││
│  └─────────────────────────────────────────────────────────────┘│
│                                                                  │
│  OPTIMIZATION                                                   │
│  ┌─────────────────────────────────────────────────────────────┐│
│  │ - Model right-sizing (use smaller when sufficient)           ││
│  │ - Caching (avoid redundant calls)                            ││
│  │ - Batching (combine requests)                                ││
│  │ - Reserved capacity (commit for discounts)                   ││
│  └─────────────────────────────────────────────────────────────┘│
│                                                                  │
│  GOVERNANCE                                                     │
│  ┌─────────────────────────────────────────────────────────────┐│
│  │ - Budget alerts by team                                      ││
│  │ - Spend caps per application                                 ││
│  │ - Chargeback/showback                                        ││
│  │ - Approval for expensive models                              ││
│  └─────────────────────────────────────────────────────────────┘│
└─────────────────────────────────────────────────────────────────┘
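
The governance items (budget alerts, spend caps per application) reduce to tracking token cost against a cap. The sketch below is one possible shape, assuming per-1K-token pricing passed in by the caller; the `SpendTracker` class, its `alert_ratio` parameter, and the returned status strings are hypothetical, and no real provider prices are implied.

```python
from collections import defaultdict


class SpendTracker:
    """Accumulate spend per application and compare it against a cap."""

    def __init__(self, caps_usd: dict[str, float], alert_ratio: float = 0.8):
        self.caps_usd = caps_usd
        self.alert_ratio = alert_ratio  # fraction of the cap that triggers an alert
        self.spent_usd: dict[str, float] = defaultdict(float)

    def record(self, app: str, input_tokens: int, output_tokens: int,
               usd_per_1k_in: float, usd_per_1k_out: float) -> str:
        """Add one request's cost and return "ok", "alert", or "blocked"."""
        cost = (input_tokens / 1000) * usd_per_1k_in + (output_tokens / 1000) * usd_per_1k_out
        self.spent_usd[app] += cost
        cap = self.caps_usd.get(app)
        if cap is None:
            return "ok"  # no cap configured for this application
        if self.spent_usd[app] >= cap:
            return "blocked"
        if self.spent_usd[app] >= cap * self.alert_ratio:
            return "alert"
        return "ok"
```

The same per-application totals can feed chargeback or showback reports.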

Cost Optimization Strategies

Strategy 1: MODEL TIERING
  - Route simple queries to cheaper models
  - Reserve expensive models for complex tasks
  - Example: Command Light for FAQ, Command R+ for analysis

Strategy 2: CACHING
  - Cache identical queries
  - Semantic caching (similar queries)
  - Cache embeddings
  - TTL based on content freshness

Strategy 3: PROMPT OPTIMIZATION
  - Shorter prompts = fewer input tokens
  - Efficient few-shot examples
  - Remove unnecessary context

Strategy 4: BATCHING
  - Combine multiple small requests
  - Process in bulk during off-peak
  - Reduced per-request overhead

Strategy 5: COMMITMENT
  - Reserved capacity for steady workloads
  - Volume discounts with providers
  - Multi-year agreements where appropriate
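
Strategy 2 (cache identical queries with a TTL) can be sketched in a few lines. The class below is an in-memory exact-match cache keyed on a hash of the model name and a normalized prompt. The `ExactMatchCache` name and its `(model, prompt)` interface are assumptions for illustration and differ from the `ResponseCache` used by the gateway; semantic caching would need embeddings and is not shown.

```python
import hashlib
import time


class ExactMatchCache:
    """In-memory exact-match cache with a per-entry TTL (illustrative only)."""

    def __init__(self, ttl_seconds: float = 300.0, clock=time.monotonic):
        self.ttl_seconds = ttl_seconds
        self.clock = clock  # injectable so expiry can be tested without sleeping
        self._store: dict[str, tuple[float, str]] = {}

    @staticmethod
    def _key(model: str, prompt: str) -> str:
        # Normalize case and whitespace so trivially different prompts share a key.
        normalized = " ".join(prompt.lower().split())
        return hashlib.sha256(f"{model}\n{normalized}".encode("utf-8")).hexdigest()

    def get(self, model: str, prompt: str):
        key = self._key(model, prompt)
        entry = self._store.get(key)
        if entry is None:
            return None
        expires_at, response = entry
        if self.clock() >= expires_at:
            del self._store[key]  # drop stale entries lazily on read
            return None
        return response

    def set(self, model: str, prompt: str, response: str) -> None:
        self._store[self._key(model, prompt)] = (self.clock() + self.ttl_seconds, response)
```

Lowercasing the prompt raises the hit rate but is unsafe when case carries meaning (code, identifiers); drop that normalization for such workloads.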

Pattern 7: Multi-Region Resilience

Architecture

┌─────────────────────────────────────────────────────────────────┐
│                  MULTI-REGION AI DEPLOYMENT                      │
│                                                                  │
│   Region A (Primary)              Region B (Secondary)          │
│   ┌─────────────────────┐        ┌─────────────────────┐       │
│   │  AI Services        │        │  AI Services        │       │
│   │  ┌──────────────┐   │        │  ┌──────────────┐   │       │
│   │  │ GenAI DAC    │   │        │  │ GenAI DAC    │   │       │
│   │  └──────────────┘   │        │  └──────────────┘   │       │
│   │  ┌──────────────┐   │        │  ┌──────────────┐   │       │
│   │  │Knowledge Base│   │        │  │Knowledge Base│   │       │
│   │  └──────────────┘   │        │  └──────────────┘   │       │
│   └─────────────────────┘        └─────────────────────┘       │
│             │                              │                    │
│             └──────────────┬───────────────┘                    │
│                            │                                    │
│                    ┌───────▼───────┐                           │
│                    │ Global Load   │                           │
│                    │ Balancer      │                           │
│                    │               │                           │
│                    │ - Health      │                           │
│                    │ - Failover    │                           │
│                    │ - Geo-routing │                           │
│                    └───────────────┘                           │
│                                                                  │
│   Sync:                                                        │
│   - Knowledge bases replicated                                 │
│   - Models deployed to both regions                            │
│   - Config synchronized                                        │
└─────────────────────────────────────────────────────────────────┘
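
The global load balancer's health and failover behavior can be summarized as a selection rule: use the primary while it is healthy, otherwise fall back to the best healthy alternative. The sketch below assumes health and latency are already measured elsewhere; `Region`, `pick_region`, and the region names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Region:
    name: str
    healthy: bool
    latency_ms: float


def pick_region(regions: list[Region], primary: str) -> Region:
    """Prefer the primary when healthy; otherwise the healthy region with lowest latency."""
    healthy = [region for region in regions if region.healthy]
    if not healthy:
        raise RuntimeError("no healthy region available")
    for region in healthy:
        if region.name == primary:
            return region
    return min(healthy, key=lambda region: region.latency_ms)
```

Failover is only safe if the secondary already holds the replicated knowledge bases, models, and config listed under Sync.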

Implementation Checklist

Phase 1: Foundation

- [ ] Deploy AI Gateway
- [ ] Implement authentication/authorization
- [ ] Set up basic monitoring
- [ ] Configure rate limiting
- [ ] Enable audit logging

Phase 2: Governance

- [ ] Establish model registry
- [ ] Define approval workflows
- [ ] Implement prompt management
- [ ] Create cost tracking
- [ ] Document policies

Phase 3: Security

- [ ] Input validation layer
- [ ] Output filtering
- [ ] PII detection
- [ ] Prompt injection defense
- [ ] Security review process

Phase 4: Operations

- [ ] Full observability stack
- [ ] Alerting rules
- [ ] Runbooks
- [ ] Incident response plan
- [ ] Capacity planning

Phase 5: Optimization

- [ ] Caching strategy
- [ ] Model tiering
- [ ] Cost optimization
- [ ] Performance tuning
- [ ] Multi-region deployment

Resources


  PRODUCTION:
    - Actively serving traffic
    - Monitored
    - Access: Production systems

  DEPRECATED:
    - Scheduled for removal
    - New uses blocked
    - Existing uses grandfathered

  ARCHIVED:
    - Removed from service
    - Retained for audit
    - No access
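
The lifecycle above can be enforced as a small state machine so that a model cannot skip a review stage. The sketch below is illustrative only: the transition table is an assumption (forward moves in the order listed, plus a rejection path from STAGING back to DEVELOPMENT), and `ModelState`, `ALLOWED_TRANSITIONS`, and `transition` are hypothetical names.

```python
from enum import Enum


class ModelState(Enum):
    DEVELOPMENT = "development"
    STAGING = "staging"
    APPROVED = "approved"
    PRODUCTION = "production"
    DEPRECATED = "deprecated"
    ARCHIVED = "archived"


# Assumed transition table: forward moves follow the order of the lifecycle,
# plus a rejection path from STAGING back to DEVELOPMENT.
ALLOWED_TRANSITIONS = {
    ModelState.DEVELOPMENT: {ModelState.STAGING},
    ModelState.STAGING: {ModelState.APPROVED, ModelState.DEVELOPMENT},
    ModelState.APPROVED: {ModelState.PRODUCTION},
    ModelState.PRODUCTION: {ModelState.DEPRECATED},
    ModelState.DEPRECATED: {ModelState.ARCHIVED},
    ModelState.ARCHIVED: set(),
}


def transition(current: ModelState, target: ModelState) -> ModelState:
    """Return the new state, or raise ValueError if the move is not allowed."""
    if target not in ALLOWED_TRANSITIONS[current]:
        raise ValueError(f"{current.name} -> {target.name} is not allowed")
    return target


state = ModelState.DEVELOPMENT
for nxt in (ModelState.STAGING, ModelState.APPROVED, ModelState.PRODUCTION):
    state = transition(state, nxt)
print(state.name)
```

Attempting `transition(ModelState.DEVELOPMENT, ModelState.PRODUCTION)` raises `ValueError`, which is the point: promotion has to pass through STAGING and APPROVED.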

Pattern 3: AI Observability Stack

Purpose

Full visibility into AI system health, performance, and behavior.

Architecture

┌─────────────────────────────────────────────────────────────────┐
│                   AI OBSERVABILITY STACK                         │
│                                                                  │
│  ┌─────────────────────────────────────────────────────────────┐│
│  │                      DASHBOARDS                              ││
│  │  ┌──────────┐  ┌──────────┐  ┌──────────┐  ┌──────────┐   ││
│  │  │ Latency  │  │Throughput│  │  Errors  │  │  Cost    │   ││
│  │  └──────────┘  └──────────┘  └──────────┘  └──────────┘   ││
│  └─────────────────────────────────────────────────────────────┘│
│                                                                  │
│  ┌─────────────────────────────────────────────────────────────┐│
│  │                       ALERTING                               ││
│  │  - Latency > threshold                                       ││
│  │  - Error rate spike                                          ││
│  │  - Cost anomaly                                              ││
│  │  - Model drift detected                                      ││
│  └─────────────────────────────────────────────────────────────┘│
│                                                                  │
│  ┌─────────────────────────────────────────────────────────────┐│
│  │                      DATA LAYER                              ││
│  │  ┌──────────┐  ┌──────────┐  ┌──────────┐                  ││
│  │  │ Metrics  │  │  Logs    │  │ Traces   │                  ││
│  │  │ (Prom)   │  │ (Loki)   │  │ (Jaeger) │                  ││
│  │  └──────────┘  └──────────┘  └──────────┘                  ││
│  └─────────────────────────────────────────────────────────────┘│
│                                                                  │
│  Instrumentation:                                               │
│  - Request/response logging                                     │
│  - Token usage tracking                                         │
│  - Latency breakdown                                            │
│  - Error classification                                         │
│  - User feedback signals                                        │
└─────────────────────────────────────────────────────────────────┘

Key Metrics

Latency Metrics:
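  - p50_latency_ms: Median (typical) response time
  - p95_latency_ms: Slow-but-common tail; 1 in 20 requests is slower
  - p99_latency_ms: Rare worst cases; 1 in 100 requests is slower
  - time_to_first_token: Delay before the first streamed token arrives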

Throughput Metrics:
  - requests_per_second: Current load
  - tokens_per_second: Processing rate
  - concurrent_requests: Active requests
  - queue_depth: Waiting requests

Quality Metrics:
  - error_rate: Failed requests %
  - hallucination_rate: Detected hallucinations
  - user_feedback_score: Thumbs up/down ratio
  - retrieval_relevance: RAG quality score

Cost Metrics:
  - tokens_consumed: Input + output
  - cost_per_request: Avg cost
  - daily_spend: Total cost
  - cost_by_application: Breakdown
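
As a rough illustration of how the latency and error metrics could be derived from raw samples, the sketch below uses a nearest-rank percentile. The sample values and the `percentile` and `error_rate` helpers are hypothetical; a production stack would normally compute these in the metrics backend rather than in application code.

```python
import math


def percentile(samples: list[float], pct: float) -> float:
    """Nearest-rank percentile: smallest sample with at least pct% of samples at or below it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def error_rate(total: int, failed: int) -> float:
    """Failed requests as a percentage of all requests."""
    return 100.0 * failed / total if total else 0.0


# Hypothetical latency samples in milliseconds.
latencies_ms = [120, 95, 110, 480, 105, 130, 100, 115, 900, 125]

summary = {
    "p50_latency_ms": percentile(latencies_ms, 50),
    "p95_latency_ms": percentile(latencies_ms, 95),
    "p99_latency_ms": percentile(latencies_ms, 99),
    "error_rate": error_rate(total=200, failed=3),
}
print(summary)
```

With these ten samples the median is 115 ms while p95 and p99 both land on the 900 ms outlier, which shows why tail percentiles need far more samples than the median to be meaningful.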

Pattern 4: Prompt Management System

Purpose

Prompts treated as code: version-controlled, tested, and deployed through a CI/CD pipeline.

Architecture

┌─────────────────────────────────────────────────────────────────┐
│                  PROMPT MANAGEMENT SYSTEM                        │
│                                                                  │
│  ┌─────────────────────────────────────────────────────────────┐│
│  │                  PROMPT REPOSITORY                           ││
│  │                                                              ││
│  │  prompts/                                                    ││
│  │  ├── customer_support/                                       ││
│  │  │   ├── v1.0.0/                                            ││
│  │  │   │   ├── system.txt                                     ││
│  │  │   │   ├── examples.json                                  ││
│  │  │   │   └── tests.json                                     ││
│  │  │   └── v1.1.0/                                            ││
│  │  │       └── ...                                            ││
│  │  └── data_analysis/                                          ││
│  │      └── ...                                                 ││
│  └─────────────────────────────────────────────────────────────┘│
│                                                                  │
│  CI/CD Pipeline:                                                │
│  ┌────────┐  ┌────────┐  ┌────────┐  ┌────────┐  ┌────────┐  │
│  │ Commit │─▶│  Test  │─▶│ Review │─▶│  Stage │─▶│ Deploy │  │
│  └────────┘  └────────┘  └────────┘  └────────┘  └────────┘  │
│                                                                  │
│  Testing:                                                       │
│  - Unit tests (expected outputs)                                │
│  - Regression tests (no quality drop)                           │
│  - A/B tests (compare versions)                                 │
│  - Safety tests (no harmful outputs)                            │
└─────────────────────────────────────────────────────────────────┘

Prompt Template

# prompts/customer_support/v1.1.0/config.yaml
name: customer_support
version: 1.1.0
description: "Handle customer support inquiries"

system_prompt: |
  You are a helpful customer support agent for {company_name}.

  Guidelines:
  - Be professional and empathetic
  - Cite knowledge base sources
  - Escalate complex issues
  - Never share internal policies

  Knowledge cutoff: {kb_update_date}

variables:
  - company_name: required
  - kb_update_date: required

examples:
  - input: "I want to return my order"
    expected_topics: ["return_policy", "refund_timeline"]
  - input: "My product is broken"
    expected_topics: ["warranty", "replacement"]

tests:
  - name: "handles_refund_question"
    input: "How do I get a refund?"
    assertions:
      - contains: "refund"
      - does_not_contain: "internal"
      - sentiment: "helpful"
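
A template like this needs two small pieces of tooling: a renderer that fails when a required variable is missing, and a runner for the `contains` / `does_not_contain` assertions. The following is a minimal sketch under stated assumptions: the config is an in-memory dict rather than a parsed YAML file, `render_prompt` and `check_assertions` are hypothetical helpers, no model is called, and the `sentiment` assertion is skipped because it would need a classifier.

```python
# Hypothetical in-memory copy of a prompt config; a real loader would read the YAML file.
config = {
    "system_prompt": "You are a helpful customer support agent for {company_name}.\n"
                     "Knowledge cutoff: {kb_update_date}",
    "variables": ["company_name", "kb_update_date"],
}


def render_prompt(cfg: dict, **values: str) -> str:
    """Fill the template, failing fast if a required variable is missing."""
    missing = [name for name in cfg["variables"] if name not in values]
    if missing:
        raise KeyError(f"missing required variables: {missing}")
    return cfg["system_prompt"].format(**values)


def check_assertions(output: str, assertions: list[dict]) -> list[str]:
    """Return a list of failed assertions; an empty list means the test passed."""
    failures = []
    text = output.lower()
    for assertion in assertions:
        for kind, expected in assertion.items():
            if kind == "contains" and expected.lower() not in text:
                failures.append(f"expected to contain {expected!r}")
            elif kind == "does_not_contain" and expected.lower() in text:
                failures.append(f"expected not to contain {expected!r}")
            # Other kinds (such as sentiment) need a classifier and are skipped here.
    return failures


prompt = render_prompt(config, company_name="Acme", kb_update_date="2026-01-01")
# Stand-in for a model response; no model is called in this sketch.
sample_output = "You can request a refund within 30 days of purchase."
failures = check_assertions(
    sample_output,
    [{"contains": "refund"}, {"does_not_contain": "internal"}, {"sentiment": "helpful"}],
)
print(prompt)
print(failures)
```

Returning a list of failures rather than a boolean lets the CI step report every broken assertion for a prompt version in one run.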

Pattern 5: AI Security Layers

Defense in Depth

┌─────────────────────────────────────────────────────────────────┐
│                   AI SECURITY LAYERS                             │
│                                                                  │
│  Layer 1: PERIMETER                                             │
│  ┌─────────────────────────────────────────────────────────────┐│
│  │ - API Gateway authentication                                 ││
│  │ - Rate limiting                                              ││
│  │ - IP allowlisting                                            ││
│  │ - WAF rules                                                  ││
│  └─────────────────────────────────────────────────────────────┘│
│                                                                  │
│  Layer 2: INPUT VALIDATION                                      │
│  ┌─────────────────────────────────────────────────────────────┐│
│  │ - Prompt injection detection                                 ││
│  │ - Input sanitization                                         ││
│  │ - Length limits                                              ││
│  │ - Content filtering                                          ││
│  └─────────────────────────────────────────────────────────────┘│
│                                                                  │
│  Layer 3: MODEL SECURITY                                        │
│  ┌─────────────────────────────────────────────────────────────┐│
│  │ - Dedicated clusters (isolation)                             ││
│  │ - Content moderation                                         ││
│  │ - Output filtering                                           ││
│  │ - Guardrails                                                 ││
│  └─────────────────────────────────────────────────────────────┘│
│                                                                  │
│  Layer 4: DATA PROTECTION                                       │
│  ┌─────────────────────────────────────────────────────────────┐│
│  │ - Encryption at rest                                         ││
│  │ - Encryption in transit                                      ││
│  │ - PII detection/masking                                      ││
│  │ - Data residency controls                                    ││
│  └─────────────────────────────────────────────────────────────┘│
│                                                                  │
│  Layer 5: AUDIT & COMPLIANCE                                    │
│  ┌─────────────────────────────────────────────────────────────┐│
│  │ - Request/response logging                                   ││
│  │ - Access audit trail                                         ││
│  │ - Compliance reporting                                       ││
│  │ - Incident response                                          ││
│  └─────────────────────────────────────────────────────────────┘│
└─────────────────────────────────────────────────────────────────┘

Prompt Injection Defense

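import re


class SecurityError(Exception):
    """Raised when input looks like a prompt injection attempt."""


class PromptSanitizer:
    """Detect and mitigate prompt injection attacks."""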

    INJECTION_PATTERNS = [
        r"ignore previous instructions",
        r"disregard .*instructions",
        r"you are now",
        r"new persona",
        r"system prompt",
        r"<\|.*\|>",  # Special tokens
    ]

    def sanitize(self, user_input: str) -> str:
        # 1. Check for known patterns
        for pattern in self.INJECTION_PATTERNS:
            if re.search(pattern, user_input, re.IGNORECASE):
                raise SecurityError("Potential prompt injection detected")

        # 2. Escape special characters
        sanitized = self.escape_special(user_input)

        # 3. Wrap in delimiters
        wrapped = f"<user_input>{sanitized}</user_input>"

        return wrapped

    def escape_special(self, text: str) -> str:
        """Escape characters that could be interpreted as instructions."""
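        replacements = {
            "```": "'''",  # Code blocks
            "###": "---",  # Markdown headers
            "<|": "< |",   # Special tokens
            "|>": "| >",
            # Stop the input from closing or reopening the wrapper added in sanitize()
            "</user_input>": "< /user_input>",
            "<user_input>": "< user_input>",
        }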
        for old, new in replacements.items():
            text = text.replace(old, new)
        return text

Pattern 6: Cost Management

FinOps for AI

┌─────────────────────────────────────────────────────────────────┐
│                    AI FINOPS FRAMEWORK                           │
│                                                                  │
│  VISIBILITY                                                     │
│  ┌─────────────────────────────────────────────────────────────┐│
│  │ - Cost by application/team                                   ││
│  │ - Cost by model                                              ││
│  │ - Token usage trends                                         ││
│  │ - Unit economics (cost per conversation)                     ││
│  └─────────────────────────────────────────────────────────────┘│
│                                                                  │
│  OPTIMIZATION                                                   │
│  ┌─────────────────────────────────────────────────────────────┐│
│  │ - Model right-sizing (use smaller when sufficient)           ││
│  │ - Caching (avoid redundant calls)                            ││
│  │ - Batching (combine requests)                                ││
│  │ - Reserved capacity (commit for discounts)                   ││
│  └─────────────────────────────────────────────────────────────┘│
│                                                                  │
│  GOVERNANCE                                                     │
│  ┌─────────────────────────────────────────────────────────────┐│
│  │ - Budget alerts by team                                      ││
│  │ - Spend caps per application                                 ││
│  │ - Chargeback/showback                                        ││
│  │ - Approval for expensive models                              ││
│  └─────────────────────────────────────────────────────────────┘│
└─────────────────────────────────────────────────────────────────┘
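
The governance controls above (budget alerts and spend caps per application) could be enforced at the gateway roughly as follows. This is a sketch under assumptions: `SpendTracker`, the 80% alert threshold, the application name, and the dollar figures are all hypothetical, and a real system would persist spend in shared storage rather than in process memory.

```python
from collections import defaultdict


class SpendTracker:
    """Track spend per application and enforce a hard cap with an alert threshold."""

    def __init__(self, caps_usd: dict, alert_ratio: float = 0.8):
        self.caps_usd = caps_usd
        self.alert_ratio = alert_ratio  # assumed threshold for a budget alert
        self.spent_usd = defaultdict(float)

    def record(self, app: str, cost_usd: float) -> str:
        """Record a charge and return 'ok', 'alert', or 'blocked'."""
        cap = self.caps_usd[app]
        if self.spent_usd[app] + cost_usd > cap:
            return "blocked"  # over the cap: reject without recording the charge
        self.spent_usd[app] += cost_usd
        if self.spent_usd[app] >= self.alert_ratio * cap:
            return "alert"
        return "ok"


tracker = SpendTracker({"support-bot": 100.0})
statuses = [tracker.record("support-bot", amount) for amount in (50.0, 35.0, 20.0)]
print(statuses)
```

The third charge is refused because it would push the application past its cap, while the second only triggers the alert.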

Cost Optimization Strategies

Strategy 1: MODEL TIERING
  - Route simple queries to cheaper models
  - Reserve expensive models for complex tasks
  - Example: Command Light for FAQ, Command R+ for analysis

Strategy 2: CACHING
  - Cache identical queries
  - Semantic caching (similar queries)
  - Cache embeddings
  - TTL based on content freshness

Strategy 3: PROMPT OPTIMIZATION
  - Shorter prompts = fewer input tokens
  - Efficient few-shot examples
  - Remove unnecessary context

Strategy 4: BATCHING
  - Combine multiple small requests
  - Process in bulk during off-peak
  - Reduced per-request overhead

Strategy 5: COMMITMENT
  - Reserved capacity for steady workloads
  - Volume discounts with providers
  - Multi-year agreements where appropriate
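
Strategy 2 in its simplest form is an exact-match cache with a TTL. The sketch below is one possible shape, not a reference implementation: `TTLCache` is a hypothetical class, the model name and prompt are made up, and semantic caching (matching similar rather than identical queries) would additionally need an embedding lookup that is not shown.

```python
import hashlib
import time


class TTLCache:
    """Exact-match response cache with a per-entry time-to-live."""

    def __init__(self, ttl_seconds: float, clock=time.monotonic):
        self.ttl_seconds = ttl_seconds
        self.clock = clock  # injectable so expiry can be tested without sleeping
        self._entries = {}

    @staticmethod
    def _key(model: str, prompt: str) -> str:
        # Hash model and prompt together so identical queries share an entry.
        return hashlib.sha256(f"{model}\x00{prompt}".encode("utf-8")).hexdigest()

    def get(self, model: str, prompt: str):
        key = self._key(model, prompt)
        entry = self._entries.get(key)
        if entry is None:
            return None
        stored_at, response = entry
        if self.clock() - stored_at > self.ttl_seconds:
            del self._entries[key]  # expired: drop it and report a miss
            return None
        return response

    def set(self, model: str, prompt: str, response: str) -> None:
        self._entries[self._key(model, prompt)] = (self.clock(), response)


now = [0.0]  # fake clock so the example runs instantly
cache = TTLCache(ttl_seconds=60, clock=lambda: now[0])
cache.set("small-model", "What is your return policy?", "cached answer")
hit = cache.get("small-model", "What is your return policy?")
now[0] = 61.0
miss = cache.get("small-model", "What is your return policy?")
print(hit, miss)
```

Including the model name in the key prevents a response generated by one model from being served for a request that asked for another.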

Pattern 7: Multi-Region Resilience

Architecture

┌─────────────────────────────────────────────────────────────────┐
│                  MULTI-REGION AI DEPLOYMENT                      │
│                                                                  │
│   Region A (Primary)              Region B (Secondary)          │
│   ┌─────────────────────┐        ┌─────────────────────┐       │
│   │  AI Services        │        │  AI Services        │       │
│   │  ┌──────────────┐   │        │  ┌──────────────┐   │       │
│   │  │ GenAI DAC    │   │        │  │ GenAI DAC    │   │       │
│   │  └──────────────┘   │        │  └──────────────┘   │       │
│   │  ┌──────────────┐   │        │  ┌──────────────┐   │       │
│   │  │Knowledge Base│   │        │  │Knowledge Base│   │       │
│   │  └──────────────┘   │        │  └──────────────┘   │       │
│   └─────────────────────┘        └─────────────────────┘       │
│             │                              │                    │
│             └──────────────┬───────────────┘                    │
│                            │                                    │
│                    ┌───────▼───────┐                           │
│                    │ Global Load   │                           │
│                    │ Balancer      │                           │
│                    │               │                           │
│                    │ - Health      │                           │
│                    │ - Failover    │                           │
│                    │ - Geo-routing │                           │
│                    └───────────────┘                           │
│                                                                  │
│   Sync:                                                        │
│   - Knowledge bases replicated                                 │
│   - Models deployed to both regions                            │
│   - Config synchronized                                        │
└─────────────────────────────────────────────────────────────────┘
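
The load balancer's health and failover behaviour can be summarized in a few lines. The sketch below is an assumption about one reasonable policy (prefer the primary while it is healthy, otherwise pick the healthy region with the lowest measured latency); `Region`, `pick_region`, and the region names are hypothetical, and a managed global load balancer would normally provide this behaviour.

```python
from dataclasses import dataclass


@dataclass
class Region:
    name: str
    healthy: bool
    latency_ms: float  # last measured latency from the caller's location


def pick_region(regions: list[Region], primary: str) -> Region:
    """Prefer the healthy primary; otherwise fail over to the fastest healthy region."""
    healthy = [r for r in regions if r.healthy]
    if not healthy:
        raise RuntimeError("no healthy region available")
    for region in healthy:
        if region.name == primary:
            return region
    return min(healthy, key=lambda r: r.latency_ms)


regions = [Region("region-a", True, 40.0), Region("region-b", True, 85.0)]
normal = pick_region(regions, primary="region-a").name

regions[0].healthy = False  # simulate a failed health check in the primary
failover = pick_region(regions, primary="region-a").name
print(normal, failover)
```

Failover only helps if the secondary region already holds replicated knowledge bases, deployed models, and synchronized config, as listed under Sync in the diagram.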

Implementation Checklist

Phase 1: Foundation

- [ ] Deploy AI Gateway
- [ ] Implement authentication/authorization
- [ ] Set up basic monitoring
- [ ] Configure rate limiting
- [ ] Enable audit logging

Phase 2: Governance

- [ ] Establish model registry
- [ ] Define approval workflows
- [ ] Implement prompt management
- [ ] Create cost tracking
- [ ] Document policies

Phase 3: Security

- [ ] Input validation layer
- [ ] Output filtering
- [ ] PII detection
- [ ] Prompt injection defense
- [ ] Security review process

Phase 4: Operations

- [ ] Full observability stack
- [ ] Alerting rules
- [ ] Runbooks
- [ ] Incident response plan
- [ ] Capacity planning

Phase 5: Optimization

- [ ] Caching strategy
- [ ] Model tiering
- [ ] Cost optimization
- [ ] Performance tuning
- [ ] Multi-region deployment
