azure-ai-contentunderstanding-py

Stars: 2,607

Forks: 297

Updated: May 18, 2026 at 15:32

Azure AI Content Understanding SDK for Python. Use for multimodal content extraction from documents, images, audio, and video. Triggers: "azure-ai-contentunderstanding", "ContentUnderstandingClient", "multimodal analysis", "document extraction", "video analysis", "audio transcription".

Source

microsoft

microsoft/skills

name	azure-ai-contentunderstanding-py
description	Azure AI Content Understanding SDK for Python. Use for multimodal content extraction from documents, images, audio, and video. Triggers: "azure-ai-contentunderstanding", "ContentUnderstandingClient", "multimodal analysis", "document extraction", "video analysis", "audio transcription".
license	MIT
metadata	{"author":"Microsoft","version":"1.0.0","package":"azure-ai-contentunderstanding"}

Azure AI Content Understanding SDK for Python

Multimodal AI service that extracts semantic content from documents, video, audio, and image files for RAG and automated workflows.

Installation

pip install azure-ai-contentunderstanding

Environment Variables

CONTENTUNDERSTANDING_ENDPOINT=https://<resource>.cognitiveservices.azure.com/  # Required for all auth methods
AZURE_TOKEN_CREDENTIALS=prod # Required only if DefaultAzureCredential is used in production
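
A minimal sketch of checking the endpoint variable at startup, so a missing or malformed value fails early rather than inside the first request. `load_endpoint` is a hypothetical helper and not part of the SDK; the https check and the trailing-slash normalization are assumptions about what a reasonable check looks like.

```python
import os
from urllib.parse import urlparse


def load_endpoint(env=None):
    """Return CONTENTUNDERSTANDING_ENDPOINT, or raise ValueError if it is unusable."""
    env = os.environ if env is None else env
    endpoint = env.get("CONTENTUNDERSTANDING_ENDPOINT", "").strip()
    if not endpoint:
        raise ValueError("CONTENTUNDERSTANDING_ENDPOINT is not set")
    parsed = urlparse(endpoint)
    if parsed.scheme != "https" or not parsed.netloc:
        raise ValueError(f"Endpoint must be an https URL, got {endpoint!r}")
    # Normalize to exactly one trailing slash so logs and comparisons are stable.
    return endpoint.rstrip("/") + "/"
```

Passing a plain dict as `env` keeps the helper easy to exercise in tests without touching the real environment.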

Authentication & Lifecycle

🔑 Two rules apply to every code sample below:

1. Prefer DefaultAzureCredential. It works locally (Azure CLI / VS Code / Developer CLI) and in Azure (managed identity, workload identity) with no code change. Avoid connection strings and account/API keys, which bypass Entra audit and rotation.

   - Local dev: DefaultAzureCredential works as-is.

   - Production: set AZURE_TOKEN_CREDENTIALS=prod (or AZURE_TOKEN_CREDENTIALS=<specific_credential>) to constrain the credential chain to production-safe credentials.

2. Wrap every client in a context manager so HTTP transports, sockets, and token caches are released deterministically:

   - Sync: with <Client>(...) as client:

   - Async: async with <Client>(...) as client: and async with DefaultAzureCredential() as credential: (from azure.identity.aio)

Snippets may abbreviate this setup, but production code should always follow both rules.

import os
from azure.ai.contentunderstanding import ContentUnderstandingClient
from azure.identity import DefaultAzureCredential, ManagedIdentityCredential

endpoint = os.environ["CONTENTUNDERSTANDING_ENDPOINT"]
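# require_envvar=True makes DefaultAzureCredential raise unless AZURE_TOKEN_CREDENTIALS is set
# (for example "dev" locally, "prod" or a specific credential name in production).
# Drop the argument if local runs should work without the variable.
credential = DefaultAzureCredential(require_envvar=True)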
# Or use a specific credential directly in production:
# See https://learn.microsoft.com/python/api/overview/azure/identity-readme?view=azure-python#credential-classes
# credential = ManagedIdentityCredential()
with ContentUnderstandingClient(endpoint=endpoint, credential=credential) as client:
    analyzers = list(client.list_analyzers())

Core Workflow

Content Understanding analysis runs as a long-running operation on the service, which the SDK exposes through a poller:

1. Begin analysis: call begin_analyze(), which returns a poller.
2. Poll for results: call .result() on the poller; the SDK polls until the analysis completes.
3. Process results: read the structured output from AnalyzeResult.contents.
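
The three steps can be sketched as one small function. `run_analysis` is a hypothetical wrapper, not an SDK call; it assumes only the `begin_analyze(analyzer_id=..., inputs=...)` call and the `contents` attribute used elsewhere in this document.

```python
def run_analysis(client, analyzer_id, inputs):
    """Begin an analysis, block until it finishes, and return the contents list."""
    # Step 1: start the long-running operation; a poller comes back immediately.
    poller = client.begin_analyze(analyzer_id=analyzer_id, inputs=inputs)
    # Step 2: result() blocks while the SDK polls the service.
    result = poller.result()
    # Step 3: contents is a list; treat a missing or empty list as "nothing extracted".
    return list(result.contents or [])
```

Returning a plain list lets callers loop over the items without first checking for None.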

Prebuilt Analyzers

Analyzer	Content Type	Purpose
`prebuilt-documentSearch`	Documents	Extract markdown for RAG applications
`prebuilt-imageSearch`	Images	Extract content from images
`prebuilt-audioSearch`	Audio	Transcribe audio with timing
`prebuilt-videoSearch`	Video	Extract frames, transcripts, summaries
`prebuilt-invoice`	Documents	Extract invoice fields

Analyze Document

import os
from azure.ai.contentunderstanding import ContentUnderstandingClient
from azure.ai.contentunderstanding.models import AnalyzeInput
from azure.identity import DefaultAzureCredential

endpoint = os.environ["CONTENTUNDERSTANDING_ENDPOINT"]
with ContentUnderstandingClient(
    endpoint=endpoint,
    credential=DefaultAzureCredential()
) as client:
    # Analyze document from URL
    poller = client.begin_analyze(
        analyzer_id="prebuilt-documentSearch",
        inputs=[AnalyzeInput(url="https://example.com/document.pdf")]
    )

    result = poller.result()

    # Access markdown content (contents is a list)
    content = result.contents[0]
    print(content.markdown)
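
Reading `result.contents[0]` covers the single-input case. When a request carries several inputs, a RAG pipeline usually wants the markdown of every item. A minimal sketch, assuming each content item exposes `markdown` as shown above; `collect_markdown` is a hypothetical helper.

```python
def collect_markdown(result, separator="\n\n"):
    """Join the markdown of every content item, skipping items that have none."""
    parts = []
    for content in result.contents or []:
        markdown = getattr(content, "markdown", None)
        if markdown:
            parts.append(markdown.strip())
    return separator.join(parts)
```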

Access Document Content Details

from azure.ai.contentunderstanding.models import MediaContentKind, DocumentContent

content = result.contents[0]
if content.kind == MediaContentKind.DOCUMENT:
    document_content: DocumentContent = content  # type: ignore
    print(document_content.start_page_number)

Analyze Image

from azure.ai.contentunderstanding.models import AnalyzeInput

poller = client.begin_analyze(
    analyzer_id="prebuilt-imageSearch",
    inputs=[AnalyzeInput(url="https://example.com/image.jpg")]
)
result = poller.result()
content = result.contents[0]
print(content.markdown)

Analyze Video

from azure.ai.contentunderstanding.models import AnalyzeInput

poller = client.begin_analyze(
    analyzer_id="prebuilt-videoSearch",
    inputs=[AnalyzeInput(url="https://example.com/video.mp4")]
)

result = poller.result()

# Access video content (AudioVisualContent)
content = result.contents[0]

# Get transcript phrases with timing
for phrase in content.transcript_phrases:
    print(f"[{phrase.start_time} - {phrase.end_time}]: {phrase.text}")

# Get key frames (for video)
for frame in content.key_frames:
    print(f"Frame at {frame.time}: {frame.description}")

Analyze Audio

from azure.ai.contentunderstanding.models import AnalyzeInput

poller = client.begin_analyze(
    analyzer_id="prebuilt-audioSearch",
    inputs=[AnalyzeInput(url="https://example.com/audio.mp3")]
)

result = poller.result()

# Access audio transcript
content = result.contents[0]
for phrase in content.transcript_phrases:
    print(f"[{phrase.start_time}] {phrase.text}")
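
Transcript phrases are often too short to index one by one. The sketch below groups consecutive phrases into larger chunks for retrieval. It relies only on the `text` attribute used above; `chunk_transcript` and the `max_chars` budget are illustrative assumptions, not SDK features.

```python
def chunk_transcript(phrases, max_chars=500):
    """Group consecutive phrase texts into chunks of roughly max_chars characters.

    A single phrase longer than max_chars is kept whole as its own chunk.
    """
    chunks, current = [], ""
    for phrase in phrases:
        text = phrase.text.strip()
        if not text:
            continue
        candidate = f"{current} {text}" if current else text
        if current and len(candidate) > max_chars:
            # Adding this phrase would exceed the budget: close the current chunk.
            chunks.append(current)
            current = text
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks
```

Usage would look like `chunk_transcript(content.transcript_phrases, max_chars=800)`.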

Custom Analyzers

Create custom analyzers with field schemas for specialized extraction:

# Create custom analyzer
analyzer = client.create_analyzer(
    analyzer_id="my-invoice-analyzer",
    analyzer={
        "description": "Custom invoice analyzer",
        "base_analyzer_id": "prebuilt-documentSearch",
        "field_schema": {
            "fields": {
                "vendor_name": {"type": "string"},
                "invoice_total": {"type": "number"},
                "line_items": {
                    "type": "array",
                    "items": {
                        "type": "object",
                        "properties": {
                            "description": {"type": "string"},
                            "amount": {"type": "number"}
                        }
                    }
                }
            }
        }
    }
)

# Use custom analyzer
from azure.ai.contentunderstanding.models import AnalyzeInput

poller = client.begin_analyze(
    analyzer_id="my-invoice-analyzer",
    inputs=[AnalyzeInput(url="https://example.com/invoice.pdf")]
)

result = poller.result()

# Access extracted fields on the first content item
content = result.contents[0]
print(content.fields["vendor_name"])
print(content.fields["invoice_total"])
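
A field schema written as nested dicts is easy to get subtly wrong, so a local check before sending it can save a round trip. This is a sketch only: `schema_problems` is a hypothetical helper, and `KNOWN_TYPES` lists just the types that appear in this document, not the full set the service may accept.

```python
KNOWN_TYPES = {"string", "number", "array", "object"}  # only the types used in this document


def schema_problems(field_schema):
    """Return a list of human-readable problems found in a field_schema dict."""
    problems = []

    def check(path, spec):
        field_type = spec.get("type")
        if field_type not in KNOWN_TYPES:
            problems.append(f"{path}: unexpected type {field_type!r}")
        if field_type == "array":
            if "items" not in spec:
                problems.append(f"{path}: array needs 'items'")
            else:
                check(f"{path}[]", spec["items"])
        if field_type == "object":
            for name, sub_spec in spec.get("properties", {}).items():
                check(f"{path}.{name}", sub_spec)

    for name, spec in field_schema.get("fields", {}).items():
        check(name, spec)
    return problems
```

An empty list means the schema matches the shape used in the example above; it does not guarantee the service will accept it.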

Analyzer Management

# List all analyzers
analyzers = client.list_analyzers()
for analyzer in analyzers:
    print(f"{analyzer.analyzer_id}: {analyzer.description}")

# Get specific analyzer
analyzer = client.get_analyzer("prebuilt-documentSearch")

# Delete custom analyzer
client.delete_analyzer("my-custom-analyzer")
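
When cleaning up, it helps to separate custom analyzers from prebuilt ones. A minimal sketch, assuming prebuilt IDs all start with `prebuilt-` as in the table above; `custom_analyzer_ids` is a hypothetical helper that reads only the `analyzer_id` attribute.

```python
def custom_analyzer_ids(analyzers, prebuilt_prefix="prebuilt-"):
    """Return the sorted IDs of analyzers that do not look prebuilt."""
    return sorted(
        analyzer.analyzer_id
        for analyzer in analyzers
        if not analyzer.analyzer_id.startswith(prebuilt_prefix)
    )
```

Usage would look like `custom_analyzer_ids(client.list_analyzers())`, followed by a review of the list before any call to `delete_analyzer`.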

Async Client

import asyncio
import os
from azure.ai.contentunderstanding.aio import ContentUnderstandingClient
from azure.ai.contentunderstanding.models import AnalyzeInput
from azure.identity.aio import DefaultAzureCredential

async def analyze_document():
    endpoint = os.environ["CONTENTUNDERSTANDING_ENDPOINT"]
    async with DefaultAzureCredential() as credential:
        async with ContentUnderstandingClient(
            endpoint=endpoint,
            credential=credential
        ) as client:
            poller = await client.begin_analyze(
                analyzer_id="prebuilt-documentSearch",
                inputs=[AnalyzeInput(url="https://example.com/doc.pdf")]
            )
            result = await poller.result()
            content = result.contents[0]
            return content.markdown

asyncio.run(analyze_document())
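
The async client pays off when many inputs are analyzed at once. The sketch below bounds the number of analyses in flight with a semaphore. `analyze_many` is a hypothetical helper; `analyze_one` stands for any coroutine function that takes a URL and returns a result, such as a variant of `analyze_document` that accepts a URL and reuses one client.

```python
import asyncio


async def analyze_many(urls, analyze_one, max_concurrency=4):
    """Run analyze_one(url) for each URL with at most max_concurrency in flight."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def guarded(url):
        async with semaphore:
            return await analyze_one(url)

    # gather preserves input order; return_exceptions=True returns each failure
    # in place instead of raising the first one.
    return await asyncio.gather(*(guarded(url) for url in urls), return_exceptions=True)
```

Callers should check each returned item with `isinstance(item, Exception)` before using it.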

Content Types

Class	For	Provides
`DocumentContent`	PDF, images, Office docs	Pages, tables, figures, paragraphs
`AudioVisualContent`	Audio, video files	Transcript phrases, timing, key frames

Both derive from MediaContent, which provides basic info and a markdown representation.

Model Imports

from azure.ai.contentunderstanding.models import (
    AnalyzeInput,
    AnalyzeResult,
    MediaContentKind,
    DocumentContent,
    AudioVisualContent,
)

Client Types

Client	Purpose
`ContentUnderstandingClient`	Sync client for all operations
`ContentUnderstandingClient` (aio)	Async client for all operations

Best Practices

1. Pick sync or async and stay consistent. Do not mix azure.ai.contentunderstanding sync clients with azure.ai.contentunderstanding.aio async clients in the same call path. Choose one mode per module.
2. Always use context managers for clients and async credentials. Wrap every client in with ContentUnderstandingClient(...) as client: (sync) or async with ContentUnderstandingClient(...) as client: (async). For async DefaultAzureCredential from azure.identity.aio, also use async with credential: so tokens and transports are cleaned up.
3. Call begin_analyze with an analyzer_id and a list of AnalyzeInput objects passed as inputs.
4. Read results from result.contents, which is a list; result.contents[0] is the first item.
5. Use prebuilt analyzers for common scenarios (document/image/audio/video search).
6. Create custom analyzers only for domain-specific field extraction.
7. Use the async client with azure.identity.aio credentials for high-throughput scenarios.
8. Plan for long-running operations: video and audio analysis can take minutes.
9. Use URL sources when possible to avoid upload overhead.
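
For analyses that can take minutes, a bounded wait with progress reporting is often preferable to a bare `.result()`. A minimal sketch, assuming the poller behaves like azure-core's LROPoller and exposes `done()`, `status()`, `wait(timeout)`, and `result()`; `wait_for_result` and its parameters are hypothetical.

```python
import time


def wait_for_result(poller, max_wait=600.0, interval=5.0, on_tick=None):
    """Return poller.result(), or raise TimeoutError after roughly max_wait seconds."""
    deadline = time.monotonic() + max_wait
    while not poller.done():
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            raise TimeoutError(f"analysis still running after {max_wait} seconds")
        if on_tick is not None:
            on_tick(poller.status())
        # wait() blocks for up to the given timeout, then returns so the loop can re-check.
        poller.wait(timeout=min(interval, remaining))
    return poller.result()
```

Giving up locally does not cancel the operation on the service; the helper only stops waiting for it.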