model-deployment
Deploy trained machine learning models as production-ready services using REST APIs, containers, serverless functions, and orchestration platforms.
| name | model-deployment |
| description | Deploy trained machine learning models as production-ready services using REST APIs, containers, serverless functions, and orchestration platforms. |
| license | MIT |
| metadata | {"author":"AI Agent Skills","version":"1.0.0"} |
This skill enables an AI agent to deploy trained machine learning models into production environments. It covers packaging models into serving APIs with FastAPI or Flask, containerizing with Docker, orchestrating with Kubernetes, and deploying to serverless platforms. The agent handles model versioning, health checks, input validation, logging, and monitoring to ensure reliable and scalable inference in production.
Serialize and package the model: Export the trained model to a portable format such as ONNX, TorchScript, SavedModel, or joblib pickle. Bundle the model artifact with its preprocessing pipeline and any required configuration files so inference is self-contained.
Build the serving API: Create a REST API using FastAPI or Flask that loads the model at startup and exposes prediction endpoints. Include a health check endpoint, request/response schemas with input validation (Pydantic models), structured logging, and error handling that returns meaningful HTTP status codes.
Containerize with Docker: Write a Dockerfile that installs dependencies from a pinned requirements.txt, copies the model artifact and serving code, and sets the entrypoint to the API server. Use multi-stage builds to minimize image size and avoid including training-only dependencies.
Configure orchestration and scaling: Define Kubernetes Deployment and Service manifests (or equivalent for your platform) with resource requests/limits, readiness and liveness probes pointing at the health check endpoint, and a Horizontal Pod Autoscaler to scale based on CPU, memory, or custom metrics like request latency.
Deploy and verify: Push the container image to a registry, apply the Kubernetes manifests or deploy to the serverless platform, and run smoke tests against the live endpoint. Validate that responses match expected outputs for a set of known inputs.
Monitor and iterate: Integrate with monitoring tools like Prometheus and Grafana to track request latency, error rates, throughput, and model-specific metrics like prediction distribution drift. Set up alerts for anomalies and establish a redeployment workflow for updated model versions using blue-green or canary strategies.
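The "prediction distribution drift" metric mentioned in the last step can be approximated in several ways. The following is a minimal sketch, assuming a classifier whose predicted labels are logged, that compares a baseline window against a live window using the Population Stability Index (PSI). The function names, the `eps` floor, and any alert threshold you attach to the result are illustrative assumptions, not part of this skill.

```python
import math
from collections import Counter
from typing import Dict, Hashable, Sequence


def class_distribution(
    predictions: Sequence[Hashable],
    classes: Sequence[Hashable],
    eps: float = 1e-6,
) -> Dict[Hashable, float]:
    """Return the share of each class, floored at eps so the log stays finite."""
    counts = Counter(predictions)
    total = max(len(predictions), 1)
    return {c: max(counts[c] / total, eps) for c in classes}


def population_stability_index(
    baseline: Sequence[Hashable],
    live: Sequence[Hashable],
    classes: Sequence[Hashable],
) -> float:
    """PSI = sum over classes of (actual - expected) * ln(actual / expected)."""
    expected = class_distribution(baseline, classes)
    actual = class_distribution(live, classes)
    return sum(
        (actual[c] - expected[c]) * math.log(actual[c] / expected[c])
        for c in classes
    )
```

A value of 0 means the two windows have identical class shares, and larger values mean a larger shift. The result could be exported as a gauge to Prometheus and alerted on, with the threshold tuned per model.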
Provide the agent with a trained model artifact, its dependencies, and the target deployment environment (local Docker, Kubernetes cluster, serverless). The agent will generate all necessary serving code, container configuration, and deployment manifests, then guide you through the deployment process.
# app.py
from contextlib import asynccontextmanager
from typing import List

import joblib
import numpy as np
from fastapi import FastAPI, HTTPException
# Pydantic v1-style decorator; Pydantic v2 deprecates it in favor of field_validator.
from pydantic import BaseModel, validator

model = None  # populated once at startup by lifespan()


@asynccontextmanager
async def lifespan(app: FastAPI):
    # Load the model once per process instead of once per request.
    global model
    model = joblib.load("model.pkl")
    yield


app = FastAPI(title="ML Model API", version="1.0.0", lifespan=lifespan)


class PredictionRequest(BaseModel):
    features: List[float]

    @validator("features")
    def validate_features(cls, v):
        if len(v) != 4:
            raise ValueError("Expected exactly 4 features")
        return v


class PredictionResponse(BaseModel):
    prediction: int
    probability: List[float]


@app.get("/health")
def health_check():
    return {"status": "healthy", "model_loaded": model is not None}


@app.post("/predict", response_model=PredictionResponse)
def predict(request: PredictionRequest):
    if model is None:
        # Report "not ready" rather than a generic server error.
        raise HTTPException(status_code=503, detail="Model not loaded")
    try:
        features = np.array(request.features).reshape(1, -1)
        prediction = int(model.predict(features)[0])
        probability = model.predict_proba(features)[0].tolist()
        return PredictionResponse(prediction=prediction, probability=probability)
    except Exception as e:
        raise HTTPException(status_code=500, detail=str(e))
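The "Deploy and verify" step asks for smoke tests that compare live responses with expected outputs for known inputs. Below is a minimal sketch of such a check using only the standard library. It assumes the `/predict` request and response shapes from `app.py` above; `http_predict`, `run_smoke_test`, and the shape of the test cases are hypothetical names introduced for illustration.

```python
import json
import urllib.request
from typing import Callable, Dict, List, Tuple

Case = Tuple[List[float], int]  # (features, expected prediction)


def http_predict(base_url: str, features: List[float], timeout: float = 5.0) -> Dict:
    """POST one feature vector to /predict and return the decoded JSON body."""
    body = json.dumps({"features": features}).encode("utf-8")
    req = urllib.request.Request(
        f"{base_url}/predict",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def run_smoke_test(
    predict: Callable[[List[float]], Dict],
    cases: List[Case],
) -> List[str]:
    """Return human-readable failures; an empty list means every case passed."""
    failures = []
    for features, expected in cases:
        try:
            got = predict(features).get("prediction")
        except Exception as exc:  # network errors, non-2xx statuses, bad JSON
            failures.append(f"{features}: request failed ({exc})")
            continue
        if got != expected:
            failures.append(f"{features}: expected {expected}, got {got}")
    return failures
```

Against a locally running container this might be called as `run_smoke_test(lambda f: http_predict("http://localhost:8000", f), cases)`, where `cases` holds input/output pairs recorded from the model before it was packaged. Passing the transport in as a callable keeps the comparison logic testable without a live endpoint.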
Dockerfile:
FROM python:3.11-slim AS builder
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
FROM python:3.11-slim
WORKDIR /app
COPY --from=builder /usr/local/lib/python3.11/site-packages /usr/local/lib/python3.11/site-packages
COPY --from=builder /usr/local/bin/uvicorn /usr/local/bin/uvicorn
COPY app.py model.pkl ./
EXPOSE 8000
CMD ["uvicorn", "app:app", "--host", "0.0.0.0", "--port", "8000"]
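The Dockerfile installs from `requirements.txt`, and the workflow calls for that file to be pinned. As a rough sketch of how a CI step might enforce this before the image is built, the hypothetical helper below flags any requirement line that lacks an exact `==` version. It is deliberately simple and assumes plain `name==version` entries; URL or VCS requirements and environment markers would need extra handling.

```python
import re
from typing import List

# Matches "name==version" with optional extras, e.g. "package[extra]==1.2.3".
_PINNED = re.compile(r"^[A-Za-z0-9._-]+(\[[A-Za-z0-9._,\s-]+\])?\s*==\s*[^\s;]+")


def unpinned_requirements(text: str) -> List[str]:
    """Return requirement lines that are not pinned to an exact version."""
    loose = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line or line.startswith("-"):
            # Skip blank lines and pip options such as -r or --hash.
            continue
        if not _PINNED.match(line):
            loose.append(line)
    return loose
```

A build script could read the file, call `unpinned_requirements`, and fail the build if the returned list is non-empty.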
k8s-deployment.yaml:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ml-model-api
spec:
  replicas: 3
  selector:
    matchLabels:
      app: ml-model-api
  template:
    metadata:
      labels:
        app: ml-model-api
    spec:
      containers:
        - name: api
          image: registry.example.com/ml-model-api:v1.0.0
          ports:
            - containerPort: 8000
          resources:
            requests: { cpu: "250m", memory: "512Mi" }
            limits: { cpu: "1000m", memory: "1Gi" }
          readinessProbe:
            httpGet: { path: /health, port: 8000 }
            initialDelaySeconds: 10
            periodSeconds: 5
          livenessProbe:
            httpGet: { path: /health, port: 8000 }
            initialDelaySeconds: 15
            periodSeconds: 10
---
apiVersion: v1
kind: Service
metadata:
  name: ml-model-api
spec:
  selector:
    app: ml-model-api
  ports:
    - port: 80
      targetPort: 8000
  type: LoadBalancer
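The workflow also mentions a Horizontal Pod Autoscaler, which the manifests above do not include. One way to produce it alongside other generated artifacts is sketched below: a small Python helper that builds an `autoscaling/v2` HorizontalPodAutoscaler targeting the `ml-model-api` Deployment and prints it as JSON, a format `kubectl apply -f` accepts. The helper name, the replica bounds, and the 70% CPU target are assumptions chosen for illustration.

```python
import json
from typing import Dict


def build_hpa(
    deployment: str,
    min_replicas: int,
    max_replicas: int,
    cpu_utilization: int,
) -> Dict:
    """Build an autoscaling/v2 HPA manifest that scales a Deployment on CPU."""
    if not 1 <= min_replicas <= max_replicas:
        raise ValueError("require 1 <= min_replicas <= max_replicas")
    return {
        "apiVersion": "autoscaling/v2",
        "kind": "HorizontalPodAutoscaler",
        "metadata": {"name": deployment},
        "spec": {
            "scaleTargetRef": {
                "apiVersion": "apps/v1",
                "kind": "Deployment",
                "name": deployment,
            },
            "minReplicas": min_replicas,
            "maxReplicas": max_replicas,
            "metrics": [
                {
                    "type": "Resource",
                    "resource": {
                        "name": "cpu",
                        # Percentage of the container's CPU *request*.
                        "target": {
                            "type": "Utilization",
                            "averageUtilization": cpu_utilization,
                        },
                    },
                }
            ],
        },
    }


# Illustrative values only: 3 to 10 replicas, scaling at 70% of requested CPU.
hpa = build_hpa("ml-model-api", min_replicas=3, max_replicas=10, cpu_utilization=70)
print(json.dumps(hpa, indent=2))
```

Because utilization is measured against the CPU request (250m in the Deployment above), changing the request also changes the point at which the autoscaler reacts.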
Pin every dependency version in requirements.txt and use deterministic Docker builds to guarantee reproducibility across environments.
Set terminationGracePeriodSeconds to allow enough time for pending requests to complete.
Load models with an explicit device mapping (e.g., torch.load(path, map_location="cpu")) and test inference on the target hardware before deployment.
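The note about `terminationGracePeriodSeconds` is about giving pending requests time to finish before a pod is killed. The sketch below illustrates the underlying idea in plain Python: count in-flight requests and, on shutdown, wait up to a deadline for the count to reach zero. `InFlightTracker` is a hypothetical helper, not part of FastAPI or uvicorn, and a real ASGI server may already provide equivalent draining.

```python
import threading


class InFlightTracker:
    """Counts requests in progress and lets a shutdown hook wait for them."""

    def __init__(self) -> None:
        self._count = 0
        self._cond = threading.Condition()

    @property
    def in_flight(self) -> int:
        with self._cond:
            return self._count

    def __enter__(self) -> "InFlightTracker":
        with self._cond:
            self._count += 1
        return self

    def __exit__(self, exc_type, exc, tb) -> bool:
        with self._cond:
            self._count -= 1
            self._cond.notify_all()  # wake any thread blocked in wait_idle()
        return False  # never swallow exceptions from the request handler

    def wait_idle(self, timeout: float) -> bool:
        """Block until no requests are in flight; False if the timeout expired."""
        with self._cond:
            return self._cond.wait_for(lambda: self._count == 0, timeout=timeout)
```

A handler would wrap its work in `with tracker:`, and a shutdown hook would call `tracker.wait_idle(timeout)` with a timeout somewhat shorter than the pod's termination grace period, so the process exits on its own before Kubernetes forces it.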