一键导入
genai-dac-specialist
// Expert in OCI Generative AI Dedicated AI Clusters - deployment, fine-tuning, optimization, and production operations
// Expert in OCI Generative AI Dedicated AI Clusters - deployment, fine-tuning, optimization, and production operations
Expert guidance on Oracle Cloud Infrastructure services, cloud architecture patterns, cost optimization, deployment strategies, and OCI best practices for enterprise solutions
Build production agentic applications on OCI using Oracle Agent Development Kit with multi-agent orchestration, function tools, and enterprise patterns
Design framework-agnostic AI agents using Oracle's Open Agent Specification for portable, interoperable agentic systems with JSON/YAML definitions
Expert in OCI Generative AI Dedicated AI Clusters - deployment, fine-tuning, optimization, and production operations
Design and implement Model Context Protocol servers for standardized AI-to-data integration with resources, tools, prompts, and security best practices
Build production agentic applications on OCI using Oracle Agent Development Kit with multi-agent orchestration, function tools, and enterprise patterns
| name | GenAI DAC Specialist |
| description | Expert in OCI Generative AI Dedicated AI Clusters - deployment, fine-tuning, optimization, and production operations |
| version | 1.1.0 |
| last_updated | "2026-01-06T00:00:00.000Z" |
| external_version | OCI GenAI GA, Cohere Command R+, Llama 3.1/3.2 |
| triggers | ["dedicated ai cluster","DAC","genai cluster","fine-tuning","model hosting"] |
You are an expert in Oracle Cloud Infrastructure's Generative AI Dedicated AI Clusters (DACs). You help enterprises deploy, configure, optimize, and operate private GPU clusters for LLM hosting and fine-tuning.
Use Dedicated AI Clusters when:
- Data isolation required (private GPUs)
- Predictable, high-volume workloads
- Fine-tuning with proprietary data
- SLA requirements (guaranteed performance)
- Multi-model deployment (up to 50 endpoints)
- Regulatory compliance needs
Use On-Demand when:
- Development and experimentation
- Low-volume, unpredictable usage
- Testing before production commitment
- Quick prototyping
┌─────────────────────────────────────────────────────────────────┐
│ MODEL SELECTION MATRIX │
├──────────────────┬─────────────┬─────────────┬─────────────────┤
│ Use Case │ Recommended │ Alternative │ Why │
├──────────────────┼─────────────┼─────────────┼─────────────────┤
│ Complex reasoning│ Command R+ │ Llama 405B │ Best reasoning │
│ General chat │ Command R │ Llama 70B │ Good balance │
│ Simple tasks │ Command │ Llama 8B │ Cost efficient │
│ High volume │ Command Light│ Llama 8B │ Fast, cheap │
│ Embeddings/RAG │ Cohere Embed│ - │ Purpose-built │
│ Multi-modal │ Llama 3.2 │ - │ Vision support │
└──────────────────┴─────────────┴─────────────┴─────────────────┘
Traffic Estimate → Units Needed:
Light (< 10 req/sec): 2-5 units
Medium (10-50 req/sec): 5-15 units
Heavy (50-200 req/sec): 15-30 units
Enterprise (200+ req/sec): 30-50 units
Each unit = 1 endpoint slot
Cluster max = 50 units (50 endpoints)
Dataset Size → Cluster Recommendation:
Small (< 10K examples): 2 units, ~2-4 hours
Medium (10K-100K): 4 units, ~4-8 hours
Large (100K-1M): 8 units, ~8-24 hours
Fine-tuning is batch - pay for duration
resource "oci_generative_ai_dedicated_ai_cluster" "hosting" {
compartment_id = var.compartment_id
type = "HOSTING"
unit_count = var.hosting_units
unit_shape = var.model_family # "LARGE_COHERE" or "LARGE_GENERIC"
display_name = "${var.project}-hosting-cluster"
freeform_tags = {
Environment = var.environment
Project = var.project
}
}
resource "oci_generative_ai_endpoint" "primary" {
compartment_id = var.compartment_id
dedicated_ai_cluster_id = oci_generative_ai_dedicated_ai_cluster.hosting.id
model_id = var.model_id
display_name = "${var.project}-endpoint"
content_moderation_config {
is_enabled = var.enable_moderation
}
}
# Fine-tuning cluster
resource "oci_generative_ai_dedicated_ai_cluster" "finetuning" {
compartment_id = var.compartment_id
type = "FINE_TUNING"
unit_count = 4
unit_shape = "LARGE_COHERE"
display_name = "${var.project}-finetuning-cluster"
}
# Training dataset in Object Storage
resource "oci_objectstorage_bucket" "training_data" {
compartment_id = var.compartment_id
namespace = data.oci_objectstorage_namespace.ns.namespace
name = "${var.project}-training-data"
access_type = "NoPublicAccess"
}
// training_data.jsonl format
{"prompt": "Your custom prompt here", "completion": "Expected response"}
{"prompt": "Another example", "completion": "Another response"}
1. QUANTITY
- Minimum: 100 high-quality examples
- Recommended: 500-2000 examples
- More isn't always better - quality > quantity
2. DIVERSITY
- Cover all expected use cases
- Include edge cases
- Vary prompt styles
3. CONSISTENCY
- Same format throughout
- Consistent tone and style
- Clear completion boundaries
4. VALIDATION
- Hold out 10-20% for testing
- Review samples manually
- Test before full training
# Conservative (start here)
learning_rate: 0.0001
epochs: 3
batch_size: 8
# Aggressive (if underfitting)
learning_rate: 0.0003
epochs: 5
batch_size: 16
# Careful (if overfitting)
learning_rate: 0.00005
epochs: 2
batch_size: 4
Latency Metrics:
- p50_latency_ms: Typical response time
- p95_latency_ms: Worst case (95th percentile)
- p99_latency_ms: Edge cases
Throughput Metrics:
- requests_per_second: Current load
- tokens_per_second: Processing rate
- queue_depth: Pending requests
Health Metrics:
- error_rate: Failed requests %
- cluster_utilization: GPU usage %
- endpoint_status: UP/DOWN
resource "oci_monitoring_alarm" "high_latency" {
compartment_id = var.compartment_id
display_name = "GenAI-High-Latency"
namespace = "oci_generativeai"
query = "Latency[1m].p95() > 5000"
severity = "CRITICAL"
message_format = "ONS_OPTIMIZED"
destinations = [var.notification_topic_id]
}
resource "oci_monitoring_alarm" "high_error_rate" {
compartment_id = var.compartment_id
display_name = "GenAI-High-Errors"
namespace = "oci_generativeai"
query = "ErrorRate[5m].mean() > 0.05"
severity = "WARNING"
destinations = [var.notification_topic_id]
}
1. MODEL SELECTION
- Use lighter models for simple tasks
- Command Light: 3-5x cheaper than Command R+
- Match model capability to task complexity
2. CLUSTER RIGHT-SIZING
- Start small, scale based on actual usage
- Monitor utilization before adding units
- Consider time-of-day patterns
3. FINE-TUNING ROI
- Fine-tuned smaller model often beats larger base
- Train once, use many times
- Calculate break-even point
4. ENDPOINT CONSOLIDATION
- Share endpoints across similar workloads
- Use up to 50 endpoints per cluster
- Avoid single-purpose clusters
Monthly Hosting Cost ≈ Cluster Units × Unit Price × Hours
Monthly Fine-Tuning ≈ Training Units × Unit Price × Training Hours
Example (rough):
10-unit hosting cluster, 24/7
= 10 × ~$X/hour × 720 hours
= ~$Y/month (check current OCI pricing)
Issue: High Latency
Causes:
- Cluster undersized for traffic
- Long prompts/completions
- Network issues
Solutions:
- Add cluster units
- Optimize prompt length
- Check VCN configuration
Issue: Fine-Tuning Fails
Causes:
- Invalid training data format
- Insufficient examples
- Resource quota exceeded
Solutions:
- Validate JSONL format
- Add more training examples
- Request quota increase
Issue: Endpoint Not Responding
Causes:
- Endpoint being created (takes time)
- Cluster maintenance
- IAM permission issues
Solutions:
- Wait for ACTIVE state
- Check cluster status
- Verify IAM policies
# GenAI Administrators
Allow group GenAI-Admins to manage generative-ai-family in compartment AI
# GenAI Users (inference only)
Allow group GenAI-Users to use generative-ai-endpoints in compartment AI
# Fine-Tuning Team
Allow group ML-Engineers to manage generative-ai-dedicated-ai-clusters in compartment AI
Allow group ML-Engineers to read objectstorage-objects in compartment Training-Data
import oci
config = oci.config.from_file()
client = oci.generative_ai_inference.GenerativeAiInferenceClient(config)
response = client.generate_text(
generate_text_details=oci.generative_ai_inference.models.GenerateTextDetails(
compartment_id=compartment_id,
serving_mode=oci.generative_ai_inference.models.DedicatedServingMode(
endpoint_id=endpoint_id
),
inference_request=oci.generative_ai_inference.models.CohereLlmInferenceRequest(
prompt="Explain quantum computing",
max_tokens=500,
temperature=0.7
)
)
)
print(response.data.inference_response.generated_texts[0].text)
from langchain_community.llms import OCIGenAI
llm = OCIGenAI(
model_id="cohere.command-r-plus",
service_endpoint="https://inference.generativeai.us-chicago-1.oci.oraclecloud.com",
compartment_id=compartment_id,
provider="cohere",
auth_type="API_KEY"
)
response = llm.invoke("What are best practices for cloud architecture?")