This skill should be used when the user asks to automate Azure ML and Azure Data Factory production workflows. PROACTIVELY activate for: (1) Azure ML code asset registration, azure-ai-ml SDK, AML code versions, `result.version`, requested-vs-actual versions, (2) ADF to Azure ML orchestration, ADF WebActivity, managed identity blob reads, `connectVia`, managed VNet IR, (3) code asset version pointer blobs, latest.json contracts, training/scoring code version propagation, (4) private storage firewalls, Microsoft-hosted CI agents, temporary network rules, storage data-plane smoke tests, (5) marshmallow<4 pinning, AML SDK import failures, runtime validation for Azure ML infrastructure. Provides: operationally safe Azure ML + ADF automation patterns.
This skill should be used when the user asks to deploy, scale, or cost-optimize ML workloads on cloud platforms. PROACTIVELY activate for: (1) AWS SageMaker Studio, Training, Processing, Pipelines, Endpoints, Model Monitor, Feature Store, Clarify, Ground Truth, EC2 GPUs, EKS, Lambda, Inferentia, Trainium, (2) GCP Vertex AI Training, Pipelines, Endpoints, Feature Store, Model Monitoring, AutoML, Matching Engine, TPU, GKE, Cloud Run, (3) Azure ML workspaces, pipelines, managed endpoints, AutoML, Responsible ML, AKS/ACI, (4) Databricks, Modal, Replicate, RunPod, Lambda Labs, Anyscale. Provides: cloud ML architecture, autoscaling, hardware, security, and cost guidance.
This skill should be used when the user asks to ingest, clean, validate, transform, version, monitor, or serve ML data and features. PROACTIVELY activate for: (1) data ingestion, preprocessing, feature engineering, leakage prevention, train/serving skew, (2) Spark, Dask, Polars, pandas, Ray Data, streaming pipelines, (3) Great Expectations, TFDV, Deequ, data quality and validation, (4) DVC, lakehouse tables, dataset versioning, lineage, reproducibility, (5) Feast, Tecton, Hopsworks feature stores, point-in-time joins, online/offline features. Provides: scalable, reproducible, leakage-safe ML data pipeline design.
This skill should be used when the user asks to productionize, track, version, govern, monitor, or automate ML systems. PROACTIVELY activate for: (1) MLflow, Weights & Biases, Neptune, Comet, ClearML experiment tracking, (2) model registry, model versioning, artifact lineage, reproducibility, (3) Kubeflow, SageMaker Pipelines, Vertex AI Pipelines, Azure ML pipelines, Databricks workflows, (4) CI/CD, continuous training/evaluation, A/B tests, canary/shadow deployments, (5) drift detection, model monitoring, data validation, responsible AI governance. Provides: end-to-end MLOps architecture and operational safeguards.
Kaggle competition notebook workflows and submissions. PROACTIVELY activate for: (1) submitting notebook outputs to competitions, (2) `kaggle competitions submit -k`, (3) downloading competition data, (4) validating submission.csv format, (5) leakage review, (6) cross-validation split design, (7) public leaderboard overfitting concerns, (8) competition rule compliance, (9) reproducible top-to-bottom notebook execution, (10) fold-aware preprocessing for ML pipelines. Provides: submission commands, validation checklist, leakage controls, and competition-ready notebook guidance.
Kaggle datasets, models, sources, and kagglehub workflows. PROACTIVELY activate for: (1) downloading datasets with kagglehub, (2) uploading datasets, (3) downloading or uploading Kaggle models, (4) competition_download, (5) notebook_output_download, (6) choosing Kaggle CLI vs kagglehub, (7) attaching dataset_sources, competition_sources, kernel_sources, or model_sources, (8) model artifact transfer, (9) source dependency cleanup, (10) kagglehub limitations for notebooks. Provides: dataset/model transfer patterns, source attachment guidance, tool selection, and limitation checks.
Kaggle runtime environment, paths, accelerators, and reproducibility. PROACTIVELY activate for: (1) `/kaggle/input` path errors, (2) `/kaggle/working` output placement, (3) local vs Kaggle notebook behavior, (4) GPU/TPU/accelerator selection, (5) internet enablement, (6) package/version pinning, (7) memory cleanup and timeout issues, (8) DEBUG flags for fast runs, (9) Kaggle Secrets usage guidance, (10) quota-consuming runtime settings. Provides: path conventions, runtime checklist, accelerator IDs, reproducibility patterns, and resource safeguards.
Kaggle `kernel-metadata.json` setup and repair. PROACTIVELY activate for: (1) creating kernel-metadata.json, (2) fixing metadata validation errors, (3) setting `id`, `title`, `code_file`, language, or kernel type, (4) configuring private/public, internet, GPU, or accelerator behavior, (5) attaching dataset_sources, competition_sources, kernel_sources, or model_sources, (6) preparing metadata before `kaggle kernels push`, (7) converting script/notebook metadata. Provides: schema checklist, valid fields, source arrays, defaults, and push-readiness review.