一键导入
ai-ml-expert
AI and ML expert covering PyTorch, TensorFlow, Hugging Face, scikit-learn, LLM integration, RAG pipelines, MLOps, and production ML systems
用 Codex 或 Claude 帮你安装 复制这段 Prompt,粘贴到 Codex、Claude 或其他助手里,让它检查 Skill 页面并帮你完成安装。
菜单
AI and ML expert covering PyTorch, TensorFlow, Hugging Face, scikit-learn, LLM integration, RAG pipelines, MLOps, and production ML systems
用 Codex 或 Claude 帮你安装 复制这段 Prompt,粘贴到 Codex、Claude 或其他助手里,让它检查 Skill 页面并帮你完成安装。
基于 SOC 职业分类
Creates structured plans from requirements. Generates comprehensive plans with steps, dependencies, risks, and success criteria. Coordinates with specialist agents for planning input and validates plan completeness. Uses template-renderer for formatted output.
Create, validate, and convert skills for the agent ecosystem. Enforces standardized structure for consistency. Enables self-evolution by creating new skills on demand, converting MCP servers and codebases to skills.
Research-backed skill refresh workflow for updating existing skills with TDD checkpoints, memory-aware integration, and EVOLVE/reflection trigger handling.
Ensure accessibility in UI components including semantic HTML, ARIA attributes, keyboard navigation, and WCAG 2.2 AA compliance.
Use when you want to improve response quality through meta-cognitive reasoning. Applies 15+ reasoning methods to reconsider and refine initial outputs.
N-round opposing-stance debates for trade-off analysis. Assigns pro/con roles to agents, runs structured debate rounds with quality scoring, and produces a moderator synthesis with confidence-rated recommendation. Generalizable to architecture, technology, security, and design decisions.
| name | ai-ml-expert |
| description | AI and ML expert covering PyTorch, TensorFlow, Hugging Face, scikit-learn, LLM integration, RAG pipelines, MLOps, and production ML systems |
| version | 2.1.0 |
| model | sonnet |
| invoked_by | both |
| user_invocable | true |
| tools | ["Read","Write","Edit","Bash","Grep","Glob","WebSearch"] |
| best_practices | ["Reproducibility first — fix random seeds, log all hyperparameters","Data quality and preprocessing as the foundation of every model","Evaluate with multiple metrics aligned to business goals","Test data never seen during training (rigorous splits)","Prefer fine-tuning and transfer learning over training from scratch"] |
| error_handling | graceful |
| streaming | supported |
| verified | true |
| lastVerifiedAt | "2026-02-19T00:00:00.000Z" |
| source | builtin |
| trust_score | 100 |
| provenance_sha | 54c7d87f033bd4e4 |
When reviewing or writing PyTorch code, apply these guidelines:
torch.nn.Module for all model definitions; avoid raw function-based modelsmodel.to(device), tensor.to(device)model.train() and model.eval() context switches appropriatelyoptimizer.zero_grad() at the top of the training looptorch.no_grad() or @torch.inference_mode() for all inference codepin_memory=True) and use multiple workers in DataLoader for GPU trainingtorch.compile() (PyTorch 2.x) for production inference speedupsF.cross_entropy over manual softmax + NLLLoss (numerically stable)When reviewing or writing TensorFlow code, apply these guidelines:
tf.data.Dataset pipelines over manual batching for scalabilitytf.function for graph execution on performance-critical pathstf.keras.mixed_precision.set_global_policy('mixed_float16')tf.saved_model for portable model export; avoid picklingWhen reviewing or writing Hugging Face code, apply these guidelines:
padding=True and truncation=True when tokenizing batchesAutoModel, AutoTokenizer, and AutoConfig for checkpoint portabilitymodel.gradient_checkpointing_enable() to reduce memory for large modelsTrainer API for standard fine-tuning; use custom loops only when Trainer is insufficientTRANSFORMERS_CACHE environment variable in CI/CD pipelinesWhen reviewing or writing scikit-learn code, apply these guidelines:
Pipeline to chain preprocessing and model steps; prevents data leakageStratifiedKFold for classification tasks with class imbalanceGridSearchCV or RandomizedSearchCV for hyperparameter tuning.fit() only on training data; transform test data with the fitted transformerjoblib.dump / joblib.load (faster than pickle for large arrays)"Think step by step...") for complex reasoning taskstemperature=0 for deterministic, fact-based outputs; increase for creative tasks# Standard RAG pipeline components
from langchain.embeddings import HuggingFaceEmbeddings
from langchain.vectorstores import FAISS # or Chroma, Pinecone, Weaviate
from langchain.chains import RetrievalQA
# 1. Embed and index documents
embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-mpnet-base-v2")
vectorstore = FAISS.from_documents(documents, embeddings)
# 2. Retrieve relevant chunks
retriever = vectorstore.as_retriever(search_kwargs={"k": 4})
# 3. Generate with retrieved context
chain = RetrievalQA.from_chain_type(llm=llm, retriever=retriever)
RAG best practices:
LCEL (LangChain Expression Language) for composable chainsRunnableParallel for concurrent retrieval stepsLangGraph for stateful multi-agent workflows with cyclesRunnableRetry for unreliable external calls# Standard PyTorch training loop with best practices
for epoch in range(num_epochs):
model.train()
for batch in train_dataloader:
optimizer.zero_grad()
inputs, labels = batch["input_ids"].to(device), batch["labels"].to(device)
outputs = model(inputs)
loss = criterion(outputs, labels)
loss.backward()
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0) # gradient clipping
optimizer.step()
scheduler.step()
# Validation loop
model.eval()
with torch.no_grad():
for batch in val_dataloader:
# evaluate...
Key standards:
max_norm=1.0) for stability in Transformer trainingsmoothing=0.1) to reduce overconfidencefrom peft import LoraConfig, get_peft_model, TaskType
lora_config = LoraConfig(
task_type=TaskType.CAUSAL_LM,
r=16, # LoRA rank
lora_alpha=32, # scaling factor
target_modules=["q_proj", "v_proj"],
lora_dropout=0.05,
)
model = get_peft_model(base_model, lora_config)
model.print_trainable_parameters() # verify < 1% parameters trainable
PEFT guidelines:
r=8 to r=64; higher rank = more capacity, more memoryimport mlflow
with mlflow.start_run():
mlflow.log_params({"learning_rate": lr, "batch_size": bs, "epochs": epochs})
mlflow.log_metrics({"train_loss": loss, "val_accuracy": acc}, step=epoch)
mlflow.pytorch.log_model(model, "model")
import wandb
wandb.init(project="my-project", config={"lr": 1e-4, "epochs": 10})
wandb.log({"train_loss": loss, "val_f1": f1_score})
wandb.finish()
MLOps standards:
torch.manual_seed(42), np.random.seed(42), random.seed(42)| Task | Primary Metrics | Secondary Metrics |
|---|---|---|
| Binary Classification | AUC-ROC, F1, Precision/Recall | Calibration (Brier Score) |
| Multi-class | Macro F1, Weighted F1, Cohen's Kappa | Confusion Matrix |
| Regression | RMSE, MAE, R² | Residual Analysis |
| NLP Generation | BLEU, ROUGE, BERTScore | Human Evaluation |
| Ranking/Retrieval | NDCG@k, MRR, MAP | Hit Rate@k |
| LLM Evaluation | LLM-as-judge, exact match, pass@k | Hallucination Rate |
torch.onnx.export(model, ...)torch.quantization.quantize_dynamic(model, ...)# Example: data drift detection with Evidently
from evidently.report import Report
from evidently.metric_preset import DataDriftPreset
report = Report(metrics=[DataDriftPreset()])
report.run(reference_data=reference_df, current_data=production_df)
report.save_html("drift_report.html")
Monitoring standards:
# Proper train/test split to avoid leakage
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(
X, y, test_size=0.2, random_state=42, stratify=y # stratify for classification
)
# Fit scaler ONLY on training data
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test) # transform only, never fit_transform
Standards:
torch.manual_seed(42), np.random.seed(42), random.seed(42) and log via MLflow/W&B..transform() test; fitting on test causes data leakage and inflated performance estimates.| Anti-Pattern | Problem | Fix |
|---|---|---|
| Ignoring class imbalance | Model biased to majority class | Stratified sampling, class weights, SMOTE |
| No validation set | Overfitting undetected | Hold out 10-20% for validation |
| Optimizing a single metric | Missing failure modes | Multiple metrics (precision, recall, F1, AUC) |
| No baseline comparison | Cannot assess model quality | Establish heuristic baseline before ML |
| Accuracy on imbalanced data | Misleading performance estimate | Use F1, precision-recall curve, ROC-AUC |
| Data leakage (test in train) | Inflated performance estimates | Fit on train only; transform test with fitted obj |
| No error analysis | Cannot improve strategically | Analyze failure cases by error type |
| Training without checkpoints | Lost progress on failure | Save best model by validation metric |
| Mutable global random state | Non-reproducible experiments | Fix all seeds; log in experiment metadata |
| Embedding model in application | Cannot update model independently | Serve model via API (REST, gRPC) |
| No latency budget | Inference too slow for production | Profile and set SLO before deployment |
Training a Transformer classifier:
from transformers import AutoTokenizer, AutoModelForSequenceClassification, Trainer, TrainingArguments
tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained("distilbert-base-uncased", num_labels=3)
def tokenize(batch):
return tokenizer(batch["text"], padding=True, truncation=True, max_length=512)
dataset = dataset.map(tokenize, batched=True)
training_args = TrainingArguments(
output_dir="./results",
num_train_epochs=3,
per_device_train_batch_size=16,
evaluation_strategy="epoch",
save_strategy="epoch",
load_best_model_at_end=True,
metric_for_best_model="f1",
)
trainer = Trainer(
model=model,
args=training_args,
train_dataset=dataset["train"],
eval_dataset=dataset["validation"],
compute_metrics=compute_metrics,
)
trainer.train()
Minimal RAG pipeline:
from langchain.embeddings import OpenAIEmbeddings
from langchain.vectorstores import Chroma
from langchain.chains import RetrievalQA
from langchain.chat_models import ChatOpenAI
vectorstore = Chroma.from_documents(docs, OpenAIEmbeddings())
retriever = vectorstore.as_retriever(search_kwargs={"k": 5})
qa = RetrievalQA.from_chain_type(ChatOpenAI(model="gpt-4o"), retriever=retriever)
answer = qa.run("What is the refund policy?")
This skill is used by:
developer — Implements ML models, data pipelines, and LLM integrationsresearcher — Investigates novel architectures and evaluates research papersarchitect — Designs ML system architecture and deployment topologysecurity-architect — Reviews data privacy, model security, and inference safetypython-backend-expert — NumPy, Pandas, async Python patternscode-analyzer — Static analysis and complexity metrics for ML codedebugging — Systematic debugging for training failures and inference errorsBefore starting:
cat .claude/context/memory/learnings.md
Check for:
After completing:
.claude/context/memory/learnings.md.claude/context/memory/issues.md.claude/context/memory/decisions.mdASSUME INTERRUPTION: Your context may reset. If it's not in memory, it didn't happen.