| name | ai-agent-development |
| description | Build production-ready AI agents with Microsoft Foundry and Agent Framework. Covers agent architecture, model selection, orchestration, tracing, and evaluation. |
AI Agent Development
Purpose: Build production-ready AI agents with Microsoft Foundry and Agent Framework.
Scope: Agent architecture, model selection, orchestration, observability, evaluation.
Quick Start
Installation
Python (Recommended):
pip install agent-framework-azure-ai --pre
.NET:
dotnet add package Microsoft.Agents.AI.AzureAI --prerelease
dotnet add package Microsoft.Agents.AI.Workflows --prerelease
Model Selection
Top Production Models (Microsoft Foundry):
| Model | Best For | Context | Cost/1M |
|---|
| gpt-5.2 | Enterprise agents, structured outputs | 200K/100K | TBD |
| gpt-5.1-codex-max | Agentic coding workflows | 272K/128K | $3.44 |
| claude-opus-4-5 | Complex agents, coding, computer use | 200K/64K | $10 |
| gpt-5.1 | Multi-step reasoning | 200K/100K | $3.44 |
| o3 | Advanced reasoning | 200K/100K | $3.5 |
Deploy Model: Ctrl+Shift+P → AI Toolkit: Deploy Model
Agent Patterns
Single Agent
from agent_framework.openai import OpenAIChatClient
client = OpenAIChatClient(
model="gpt-5.1",
api_key=os.getenv("FOUNDRY_API_KEY"),
endpoint=os.getenv("FOUNDRY_ENDPOINT")
)
agent = {
"name": "Assistant",
"instructions": "You are a helpful assistant.",
"tools": []
}
response = await client.chat(
messages=[{"role": "user", "content": "Hello"}],
agent=agent
)
Multi-Agent Orchestration
from agent_framework.workflows import SequentialWorkflow
researcher = {"name": "Researcher", "instructions": "Gather information."}
writer = {"name": "Writer", "instructions": "Write based on research."}
workflow = SequentialWorkflow(
agents=[researcher, writer],
handoff_strategy="on_completion"
)
result = await workflow.run(query="Write about AI agents")
Advanced Patterns: Search github.com/microsoft/agent-framework for:
- Group Chat, Concurrent, Conditional, Loop
- Human-in-the-Loop, Reflection, Fan-out/Fan-in
- MCP, Multimodal, Custom Executors
Observability (Tracing)
Setup OpenTelemetry
from agent_framework.observability import configure_otel_providers
configure_otel_providers(
vs_code_extension_port=4317,
enable_sensitive_data=True
)
Open Trace Viewer: Ctrl+Shift+P → AI Toolkit: Open Trace Viewer
⚠️ CRITICAL: Open trace viewer BEFORE running your agent.
Evaluation
Workflow
- Upload dataset (JSONL)
- Define evaluators (built-in or custom)
- Create evaluation
- Run evaluation
- Analyze results
Prerequisites
pip install "azure-ai-projects>=2.0.0b2"
Built-in Evaluators
Agent Evaluators:
builtin.intent_resolution - Intent correctly identified?
builtin.task_adherence - Instructions followed?
builtin.task_completion - Task completed end-to-end?
builtin.tool_call_accuracy - Tools used correctly?
builtin.tool_selection - Right tools chosen?
Quality Evaluators:
builtin.coherence - Natural text flow?
builtin.fluency - Grammar correct?
builtin.groundedness - Claims substantiated? (RAG)
builtin.relevance - Answers key points? (RAG)
Evaluation Example
from azure.identity import DefaultAzureCredential
from azure.ai.projects import AIProjectClient
from openai.types.eval_create_params import DataSourceConfigCustom
from openai.types.evals.create_eval_jsonl_run_data_source_param import (
CreateEvalJSONLRunDataSourceParam, SourceFileID
)
endpoint = os.getenv("FOUNDRY_PROJECT_ENDPOINT")
model_deployment = os.getenv("MODEL_DEPLOYMENT_NAME")
with (
DefaultAzureCredential() as credential,
AIProjectClient(endpoint=endpoint, credential=credential) as project_client,
project_client.get_openai_client() as openai_client,
):
dataset = project_client.datasets.upload_file(
name="eval-data",
version="1",
file_path="data.jsonl"
)
data_source_config = DataSourceConfigCustom({
"type": "custom",
"item_schema": {
"type": "object",
"properties": {
"query": {"type": "string"},
"response": {"type": "string"}
},
"required": ["query", "response"]
},
"include_sample_schema": True
})
testing_criteria = [
{
"type": "azure_ai_evaluator",
"name": "coherence",
"evaluator_name": "builtin.coherence",
"data_mapping": {
"query": "{{item.query}}",
"response": "{{item.response}}"
},
"initialization_parameters": {"deployment_name": model_deployment}
}
]
evaluation = openai_client.evals.create(
name="agent-eval",
data_source_config=data_source_config,
testing_criteria=testing_criteria
)
run = openai_client.evals.runs.create(
eval_id=evaluation.id,
name="eval-run",
data_source=CreateEvalJSONLRunDataSourceParam(
type="jsonl",
source=SourceFileID(type="file_id", id=dataset.id)
)
)
while run.status not in ["completed", "failed"]:
run = openai_client.evals.runs.retrieve(run_id=run.id, eval_id=evaluation.id)
time.sleep(3)
print(f"Report: {run.report_url}")
Custom Evaluators
Code-based (objective metrics):
code_evaluator = project_client.evaluators.create_version(
name="response_length_check",
evaluator_version={
"name": "response_length_check",
"definition": {
"type": "CODE",
"code_text": """
def grade(sample, item):
length = len(item.get("response", ""))
return 1.0 if 100 <= length <= 500 else 0.5
""",
}
}
)
Prompt-based (subjective metrics):
prompt_evaluator = project_client.evaluators.create_version(
name="friendliness_check",
evaluator_version={
"name": "friendliness_check",
"definition": {
"type": "PROMPT",
"prompt_text": """
Rate friendliness (1-5):
Query: {{query}}
Response: {{response}}
Output JSON: {"result": <int>, "reason": "<text>"}
""",
}
}
)
Best Practices
Development
✅ DO:
- Plan agent architecture before coding (Research → Design → Implement)
- Use Microsoft Foundry models for production
- Implement tracing from day one
- Test with evaluation datasets before deployment
- Use structured outputs for reliable agent responses
- Implement error handling and retry logic
- Version your agents and track changes
❌ DON'T:
- Hardcode API keys or endpoints
- Skip tracing setup (critical for debugging)
- Deploy without evaluation
- Use GitHub models in production (free tier has limits)
- Ignore token limits and context windows
- Mix agent logic with business logic
Security
- Store credentials in environment variables or Azure Key Vault
- Validate all tool inputs and outputs
- Implement rate limiting for agent APIs
- Log agent actions for audit trails
- Use role-based access control (RBAC) for Foundry resources
- Review OWASP Top 10 for AI: owasp.org/AI-Security-and-Privacy-Guide
Performance
- Cache model responses when appropriate
- Use batch processing for multiple requests
- Monitor token usage and costs
- Implement timeout handling
- Use async/await for I/O operations
- Consider model size vs. latency tradeoffs
Monitoring
- Track key metrics: latency, success rate, token usage, cost
- Set up alerts for failures and anomalies
- Use structured logging with context
- Integrate with Azure Monitor / Application Insights
- Review traces regularly for optimization opportunities
Production Checklist
Development
Observability
Evaluation
Security & Compliance
Operations
Resources
Official Documentation:
AI Toolkit:
- Model Catalog:
Ctrl+Shift+P → AI Toolkit: Model Catalog
- Trace Viewer:
Ctrl+Shift+P → AI Toolkit: Open Trace Viewer
- Playground:
Ctrl+Shift+P → AI Toolkit: Model Playground
Security:
Related: AGENTS.md for agent behavior guidelines • Skills.md for general production practices
Last Updated: January 17, 2026