ml-azureml-adf-automation

Name: Ml Azureml Adf Automation
Author: JosiahSiegel

// This skill should be used when the user asks to automate Azure ML and Azure Data Factory production workflows. PROACTIVELY activate for: (1) Azure ML code asset registration, azure-ai-ml SDK, AML code versions, `result.version`, requested-vs-actual versions, (2) ADF to Azure ML orchestration, ADF WebActivity, managed identity blob reads, `connectVia`, managed VNet IR, (3) code asset version pointer blobs, latest.json contracts, training/scoring code version propagation, (4) private storage firewalls, Microsoft-hosted CI agents, temporary network rules, storage data-plane smoke tests, (5) marshmallow<4 pinning, AML SDK import failures, runtime validation for Azure ML infrastructure. Provides: operationally safe Azure ML + ADF automation patterns.

stars: 39

forks: 7

updated: 28 May 2026 at 04:03

package.json

"author": "JosiahSiegel"

"repository": "JosiahSiegel/claude-plugin-marketplace"

Azure ML and ADF Automation

Overview

Use this skill for Azure Machine Learning automation that registers code assets in CI and orchestrates training, scoring, registration, or deployment through Azure Data Factory. The main invariant is that runtime systems must consume the exact Azure ML asset versions that were actually registered, not the versions a pipeline attempted to request. Validate every recommendation against runtime behavior because Azure ML, ADF, storage networking, and SDK dependency behavior can diverge from static API documentation.

Core Invariants

CI owns Azure ML code asset registration and publishes the actual SDK-returned version.
ADF receives code versions through an explicit contract, usually a storage pointer blob, instead of discovering AML code versions at runtime.
The SDK result is the source of truth: requested version, build ID, branch name, or commit-derived strings are not authoritative.
Private storage requires both correct RBAC and proven data-plane reachability from the executing runtime.
ADF WebActivity networking must be tested through the intended integration runtime, not just validated as JSON.
Dependency constraints for Azure ML automation are pinned in CI environments.
Runtime evidence beats plausible ARM paths, documentation snippets, or successful template compilation.

Azure ML Code Asset Registration

Prefer the Python SDK for registering Azure ML code assets when automation must reliably capture the registered version. Use the Azure CLI only after confirming the target environment's az ml extension supports the needed code commands and returns enough information for downstream automation.

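from azure.ai.ml import MLClient
from azure.ai.ml.entities._assets._artifacts.code import Code
from azure.identity import AzureCliCredential

# subscription_id, resource_group, workspace_name, code_name,
# requested_version, and staged_code_path are supplied by the CI job.
ml_client = MLClient(
    AzureCliCredential(),
    subscription_id,
    resource_group,
    workspace_name,
)

# _code is a private SDK attribute; re-test this call after SDK upgrades.
result = ml_client._code.create_or_update(
    Code(name=code_name, version=requested_version, path=staged_code_path)
)
# Publish the version Azure ML returned, not the one that was requested.
actual_version = result.version
print(actual_version)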

Do not assume requested_version == result.version. Azure ML code assets can deduplicate uploads by content hash and return an existing version when the staged directory matches a prior asset. That is useful storage behavior but dangerous if CI publishes a requested build identifier instead of the SDK-returned version.
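The mismatch described above can be surfaced explicitly in CI. The following is a minimal sketch, not part of the original skill: the function name choose_published_version is hypothetical, and the caller is assumed to pass result.version from the SDK call as actual_version.

```python
def choose_published_version(requested_version: str, actual_version: str) -> tuple[str, bool]:
    """Return (version_to_publish, differs_from_request).

    The SDK-returned version always wins. The boolean only tells the
    caller whether Azure ML handed back something other than what was
    requested, for example because an existing asset was reused.
    """
    if not actual_version:
        # Refuse to fall back to the requested value: that would publish
        # a version Azure ML never confirmed.
        raise ValueError("registration returned no version; refusing to publish")
    differs = str(actual_version) != str(requested_version)
    return str(actual_version), differs
```

A CI step could log a warning when the second element is true and then publish only the first element downstream.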

CI Output Variable Pattern

Publish the returned version as a pipeline output variable and wire downstream steps to that output.

print(
    "##vso[task.setvariable "
    f"variable=trainingCodeVersion;isOutput=true]{result.version}"
)

Prefer:

$registeredVersion = '$(RegisterTrainingCode.trainingCodeVersion)'

Avoid:

$registeredVersion = '$(Build.BuildId)'

If unique asset versions are operationally required even when code content repeats, stage the code directory and write a small marker file such as .aml-code-asset-version before registration. Treat this only as a dedup workaround. The real contract remains the SDK-returned result.version.
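A minimal sketch of the marker-file workaround, assuming the code lives in a local directory and that any unique string (such as a build ID) is acceptable as marker content. The helper name stage_code_with_marker is hypothetical; only the marker file name comes from the text above.

```python
import shutil
import tempfile
from pathlib import Path

MARKER_NAME = ".aml-code-asset-version"


def stage_code_with_marker(source_dir: str, unique_id: str) -> Path:
    """Copy source_dir to a fresh staging directory and add a marker file.

    The marker changes the staged content, so a content-hash comparison
    no longer matches a previously registered asset. The source
    directory itself is left untouched.
    """
    staging_root = Path(tempfile.mkdtemp(prefix="aml-code-"))
    staged = staging_root / "code"
    shutil.copytree(source_dir, staged)
    (staged / MARKER_NAME).write_text(unique_id + "\n", encoding="utf-8")
    return staged
```

The returned path would be passed as the code asset path at registration time. The version to publish is still whatever the SDK returns.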

ADF to Azure ML Version Resolution

Avoid making ADF discover AML code versions through Azure ML ARM code-container endpoints unless that exact path has passed runtime validation. Some AML management endpoints can appear valid in REST documentation but fail from ADF WebActivity at execution time with unsupported-operation behavior. Until proven otherwise, treat such a failure as service behavior rather than as an RBAC problem.

Use a CI-written pointer blob as the runtime contract between registration and orchestration:

https://<storage-account>.blob.core.windows.net/ml-globals/code-assets/training-code/latest.json

Example payload:

{
  "assetName": "training-code",
  "version": "<actual-sdk-returned-version>",
  "workspaceName": "<workspace-name>",
  "resourceGroup": "<resource-group>",
  "subscriptionId": "<subscription-id>",
  "buildId": "<build-or-run-id>",
  "sourceBranch": "<branch>",
  "sourceVersion": "<source-version>",
  "registeredAtUtc": "<timestamp>"
}

ADF reads version from this blob and passes it as a parameter to training, scoring, model registration, or deployment pipelines. The payload may include provenance fields, but downstream jobs should depend only on fields that are deliberately part of the contract.
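One way to keep that contract explicit is to build and validate the payload with the same small module on both sides. This is a sketch under assumptions: the choice of assetName and version as the only contract fields is illustrative, the function names are hypothetical, and the upload to Blob Storage is deliberately left out.

```python
import json
from datetime import datetime, timezone

# Assumption: only these fields are part of the downstream contract.
CONTRACT_FIELDS = ("assetName", "version")


def build_pointer_payload(asset_name: str, actual_version: str, **provenance: str) -> str:
    """Serialize a latest.json payload from the SDK-returned version."""
    payload = {
        **provenance,  # optional fields such as buildId or sourceBranch
        "assetName": asset_name,
        "version": str(actual_version),
        "registeredAtUtc": datetime.now(timezone.utc).isoformat(),
    }
    return json.dumps(payload, indent=2)


def read_contract_version(blob_text: str, expected_asset: str) -> str:
    """Return the version from a pointer payload, ignoring provenance fields."""
    payload = json.loads(blob_text)
    missing = [name for name in CONTRACT_FIELDS if not payload.get(name)]
    if missing:
        raise ValueError(f"pointer blob is missing contract fields: {missing}")
    if payload["assetName"] != expected_asset:
        raise ValueError(f"pointer blob describes {payload['assetName']!r}, not {expected_asset!r}")
    return payload["version"]
```

Contract fields are written after the provenance fields so that a stray provenance key cannot overwrite version.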

ADF WebActivity for Pointer Blob Reads

Read the pointer blob with managed identity authentication against Azure Storage:

{
  "name": "ReadLatestTrainingCodeVersion",
  "type": "WebActivity",
  "typeProperties": {
    "method": "GET",
    "url": {
      "type": "Expression",
      "value": "@concat('https://', pipeline().globalParameters.StorageAccountName, '.blob.core.windows.net/ml-globals/code-assets/training-code/latest.json')"
    },
    "headers": {
      "x-ms-version": "2023-11-03",
      "Accept": "application/json"
    },
    "authentication": {
      "type": "MSI",
      "resource": "https://storage.azure.com/"
    },
    "connectVia": {
      "referenceName": "<managed-vnet-ir-name>",
      "type": "IntegrationRuntimeReference"
    }
  }
}

Critical placement rule: for ADF WebActivity, connectVia belongs inside typeProperties. If it is placed at the activity root, it can be ignored, causing traffic to leave over the public internet and fail against storage accounts with defaultAction: Deny.
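The placement rule lends itself to a pre-merge lint, although a lint does not replace a runtime test through the integration runtime. The sketch below assumes the pipeline JSON stored in Git has its activities under properties.activities and only inspects top-level activities, not ones nested inside ForEach or If containers. The function name is hypothetical.

```python
def find_connect_via_problems(pipeline: dict, require_integration_runtime: bool = True) -> list[str]:
    """List WebActivity entries whose connectVia is misplaced or absent."""
    problems = []
    activities = pipeline.get("properties", {}).get("activities", [])
    for activity in activities:
        if activity.get("type") != "WebActivity":
            continue
        name = activity.get("name", "<unnamed>")
        type_properties = activity.get("typeProperties", {})
        if "connectVia" in activity:
            # Root-level placement is the failure mode described above.
            problems.append(f"{name}: connectVia found at activity root")
        elif require_integration_runtime and "connectVia" not in type_properties:
            # Only a problem when the target needs a specific runtime,
            # as with firewalled storage.
            problems.append(f"{name}: no connectVia in typeProperties")
    return problems
```

Setting require_integration_runtime to False limits the check to the root-placement mistake, which may suit factories where some Web activities legitimately call public endpoints.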

Required access commonly includes:

ADF managed identity: Storage Blob Data Reader on the pointer container or account scope.
CI service connection identity: Storage Blob Data Contributor to write pointer blobs.
CI service connection identity: Storage Account Contributor when the pipeline manages storage firewall rules.

Private Storage and Hosted CI Agents

For storage accounts with private endpoints and defaultAction: Deny, Microsoft-hosted CI agents usually egress from public per-run IP addresses. Correct RBAC is not enough if the agent cannot reach the storage data plane. Before blaming the Azure ML SDK, ADF, or IAM, prove storage reachability from the agent.

Safe CI pattern:

1. Resolve the agent public IP.
2. Add a temporary storage network rule for that IP.
3. Wait for propagation.
4. Smoke-test storage data-plane access.
5. Register code assets and write pointer blobs.
6. Remove the temporary rule in an always() cleanup step.

$agentIp = (Invoke-RestMethod -Uri 'https://api.ipify.org' -TimeoutSec 20).Trim()

az storage account network-rule add `
  --resource-group $rg `
  --account-name $storageAccount `
  --action Allow `
  --ip-address $agentIp `
  --only-show-errors

Start-Sleep -Seconds 30

az storage container list `
  --account-name $storageAccount `
  --auth-mode login `
  --only-show-errors `
  -o none

Cleanup should run even when registration fails. In Azure DevOps YAML, put network-rule removal in a step with condition: always().
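When registration runs from a single Python script, the same guarantee can be approximated with try/finally. This sketch is an illustration, not the skill's prescribed method: the helper names are hypothetical, the runner is injectable so the logic can be exercised without the Azure CLI, and the default runner assumes an az executable is on PATH (on Windows agents the invocation may need adjusting).

```python
import subprocess
import time
from contextlib import contextmanager
from typing import Callable, Iterator, Sequence

Runner = Callable[[Sequence[str]], None]


def run_az(args: Sequence[str]) -> None:
    """Default runner: invoke the Azure CLI and fail on a non-zero exit."""
    subprocess.run(["az", *args], check=True)


@contextmanager
def temporary_storage_ip_rule(
    resource_group: str,
    account_name: str,
    ip_address: str,
    run: Runner = run_az,
    propagation_seconds: float = 30.0,
) -> Iterator[None]:
    """Allow ip_address on the storage firewall for the duration of the block."""
    scope = [
        "--resource-group", resource_group,
        "--account-name", account_name,
        "--ip-address", ip_address,
    ]
    run(["storage", "account", "network-rule", "add", *scope])
    try:
        time.sleep(propagation_seconds)  # wait for the rule to propagate
        yield
    finally:
        # Runs whether the body succeeded or raised.
        run(["storage", "account", "network-rule", "remove", *scope])
```

A finally block does not run if the agent process is killed or the job is cancelled hard, so a separate cleanup step with condition: always() is still worth keeping as a backstop.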

Python Dependency Pinning

Some azure-ai-ml versions import private marshmallow symbols that are unavailable in marshmallow 4.x. Hosted agents can install an incompatible transitive version and fail before any Azure ML API call runs. Pin the SDK and transitive dependency together when using affected versions.

python -m pip install --upgrade `
  "azure-ai-ml==1.24.0" `
  "azure-identity==1.19.0" `
  "marshmallow>=3.18,<4.0"

If using a newer SDK, verify the dependency behavior in CI rather than removing the pin based on local success.
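A small import smoke test, run on the same hosted-agent image before any registration step, makes that verification repeatable. The sketch below assumes the affected SDK versions need marshmallow below 4, as described above. The function names are hypothetical and the version check only looks at the major component.

```python
from importlib import metadata


def check_marshmallow_pin(installed_version: str) -> None:
    """Raise if the installed marshmallow major version is 4 or higher."""
    major = int(installed_version.split(".")[0])
    if major >= 4:
        raise RuntimeError(
            f"marshmallow {installed_version} is installed; "
            "the pinned azure-ai-ml version expects marshmallow<4"
        )


def smoke_test_imports() -> None:
    """Fail fast in CI before any Azure ML API call is attempted."""
    check_marshmallow_pin(metadata.version("marshmallow"))
    # The import itself is the test: an incompatible transitive
    # dependency surfaces here as an ImportError.
    import azure.ai.ml  # noqa: F401
```

Calling smoke_test_imports() as the first step after pip install keeps a dependency failure from being mistaken for an Azure ML, ADF, or IAM problem later in the run.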

ADF Development Workflow

Confirm which ADF execution mode reads unpublished Git-branch state and which mode runs the published factory definition. Debug runs may exercise branch state, while scheduled and production runs typically execute the last published factory. Manual trigger behavior depends on how the factory is configured and invoked. Pick the mode that actually exercises the change being validated.

Runtime Validation Standard

Accept runtime evidence, not structural plausibility. Validate:

Azure ML registration returned result.version and CI propagated that exact value.
Pointer blobs contain the version that Azure ML actually registered.
ADF reads the blob through the intended integration runtime and managed identity.
Storage is tested with the real firewall posture and CI agent egress path.
Downstream ADF parameters flow into the AML job definition used at runtime.
The AML training or scoring job starts with the expected code asset version.
SDK imports succeed in the same hosted-agent image used by CI.

Insufficient validation includes: docs showing an endpoint exists, JSON parsing, a plausible ARM URL, a successful deployment template, a requested version printed in logs without checking result.version, or review comments without runtime evidence.

Operational Checklist

Short Rules

Use the Azure ML Python SDK for code registration unless CLI behavior is verified in the target environment.
Never assume requested version equals registered version.
Treat SDK result.version as the source of truth.
Avoid AML ARM /codes/... discovery from ADF without runtime testing.
Use blob pointers for ADF-readable ML asset version contracts.
Put ADF WebActivity connectVia inside typeProperties.
Assume hosted CI agents need temporary storage firewall access for private storage.
Pin transitive dependencies when AML SDK imports have known constraints.
Require runtime validation for ML infrastructure changes.

Sources

Azure Machine Learning documentation: https://learn.microsoft.com/azure/machine-learning/
Azure Data Factory Web activity documentation: https://learn.microsoft.com/azure/data-factory/control-flow-web-activity
Azure Storage firewall and virtual network documentation: https://learn.microsoft.com/azure/storage/common/storage-network-security
Azure Machine Learning Python SDK documentation: https://learn.microsoft.com/python/api/overview/azure/ai-ml-readme
