# new-service
// Create a complete new ML service from template — end-to-end scaffolding
| name | new-service |
| description | Create a complete new ML service from template — end-to-end scaffolding |
| allowed-tools | ["Read","Write","Edit","Grep","Glob","Bash(cp:*)","Bash(mkdir:*)","Bash(sed:*)","Bash(docker:*)","Bash(kubectl:*)","Bash(dvc:*)","Bash(terraform:*)"] |
| when_to_use | Use when creating a new ML microservice from scratch for a business problem. Examples: 'create a new churn prediction service', 'scaffold a fraud detection API', 'new service for loan default prediction' |
| argument-hint | <service-name> <business-problem> |
| arguments | ["service-name","business-problem"] |
| authorization_mode | {"scaffold_files":"AUTO","init_dvc":"AUTO","create_mlflow_experiment":"AUTO","wire_cicd":"AUTO","push_initial_commit":"CONSULT","escalation_triggers":[{"target_dir_exists":"STOP"},{"eda_artifacts_missing":"STOP"},{"service_name_collides":"STOP"}]} |
Guides creation of a complete, production-ready ML service using the template system.
Arguments:
- `$service-name`: Service slug (e.g., bankchurn, frauddetect)
- `$business-problem`: What the service predicts/classifies

Outcome: A fully deployed, tested, monitored ML service with all quality gates passing, drift detection running, and documentation complete.
Precondition: `templates/scripts/new-service.sh` exists and is executable.

Human checkpoint: Confirm requirements before scaffolding.
Answer these questions:
```bash
bash templates/scripts/new-service.sh "$service-name" "$service-slug"
```
Verify no remaining placeholders:
```bash
grep -r "{ServiceName}\|{service}\|{SERVICE}" $service-name/ --include="*.py" --include="*.yaml" | head -20
```
Success criteria: Directory created with zero remaining `{ServiceName}`, `{service}`, or `{SERVICE}` placeholders. If this is your first use of the template, run `examples/minimal/` to validate that it works.
- Define the Pandera schema in `src/$service-name/schemas.py`
- Track the raw dataset: `dvc add data/raw/dataset.csv`

Success criteria: Pandera schema validates sample data without errors. DVC tracking configured.
- `FeatureEngineer` class in `src/$service-name/training/features.py`
- Model definition in `src/$service-name/training/model.py`
- `Trainer.run()` in `src/$service-name/training/train.py`

Success criteria: `python -m src.$service-name.cli train --data data/raw/dataset.csv` completes with all quality gates passing.
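The template's actual quality-gate logic isn't shown here; a minimal sketch of the pattern `Trainer.run()` might follow, with hypothetical metric names and thresholds:

```python
# Illustrative quality gates checked at the end of training.
# Metric names and thresholds are assumptions, not the template's values.
GATES = {"roc_auc": 0.80, "recall": 0.60}

def check_quality_gates(metrics: dict, gates: dict = GATES) -> list:
    """Return a list of failed-gate messages; an empty list means all pass."""
    return [
        f"{name}: {metrics.get(name, 0.0):.3f} < {threshold}"
        for name, threshold in gates.items()
        if metrics.get(name, 0.0) < threshold
    ]

failures = check_quality_gates({"roc_auc": 0.85, "recall": 0.55})
# One failure: recall is below its 0.60 threshold.
```

In CI, a non-empty failure list would abort the run (e.g. `raise SystemExit` with the messages) so a regressed model never reaches packaging.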
- `app/schemas.py` defines request/response models
- `app/main.py` owns lifespan, `/health`, `/ready`, CORS, tracing, the error envelope, `/model/info`, and `/model/reload`
- `app/fastapi_app.py` owns `/predict`, `/predict_batch`, `/metrics`, model loading, feature parity, SHAP, and prediction logging

Endpoints and contract:
- `/predict` with ThreadPoolExecutor (NEVER call a sync predict directly in an async handler)
- `/predict?explain=true` with SHAP KernelExplainer
- `/predict_batch` for batch predictions (note: underscore, not slash)
- `/health` for liveness probe (200 while the process is alive)
- `/ready` for readiness probe (503 until warm-up complete — D-23)
- `/metrics` for Prometheus
- `FeatureEngineer.transform_inference()` aligned with training
- `predict_proba_wrapper` for SHAP in original feature space
- `tests/test_fastapi_template_contract.py` passing

Success criteria: `pytest tests/test_fastapi_template_contract.py tests/test_api.py -v` passes. `curl localhost:8000/health` returns healthy and `/ready` returns 200 only after the model is loaded and warmed.
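The ThreadPoolExecutor rule above exists because a CPU-bound `model.predict()` called directly inside an async handler blocks the event loop and stalls every in-flight request. A stdlib sketch of the offloading pattern (handler and model function names are illustrative; in the template this lives behind the `/predict` route in `app/fastapi_app.py`):

```python
import asyncio
from concurrent.futures import ThreadPoolExecutor

_executor = ThreadPoolExecutor(max_workers=4)

def _model_predict(features: list) -> float:
    # Stand-in for a blocking model.predict_proba() call.
    return sum(features) / len(features)

async def predict_handler(features: list) -> dict:
    loop = asyncio.get_running_loop()
    # Run the sync predict in the pool so the event loop stays responsive.
    score = await loop.run_in_executor(_executor, _model_predict, features)
    return {"churn_probability": score}

result = asyncio.run(predict_handler([0.2, 0.4, 0.6]))
```

The same shape applies in a FastAPI route: `await loop.run_in_executor(...)` inside the `async def` endpoint, with the executor created once at startup rather than per request.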
- `Dockerfile` (multi-stage, non-root, HEALTHCHECK)
- `.dockerignore` excludes `models/`, `data/raw/`, `tests/`

```bash
docker build -t $service-name:dev .
docker run -p 8000:8000 $service-name:dev
curl localhost:8000/health
```

Success criteria: Docker build succeeds. Container starts and `/health` returns 200.
- `templates/k8s/deployment.yaml`
- `templates/k8s/hpa.yaml`
- `templates/k8s/service.yaml`

Success criteria: `for o in gcp-dev gcp-staging gcp-prod aws-dev aws-staging aws-prod; do kustomize build k8s/overlays/$o; done` renders valid YAML for all 6 overlays.
In `infra/terraform/{cloud}/`: `terraform plan` → verify → `terraform apply`

Success criteria: `terraform plan` shows expected resources with no errors.
- `.github/workflows/ci.yml`
- `retrain-$service-name.yml` with quality gates

Success criteria: CI workflow triggers on PR and runs tests + lint + type check.
- `/metrics` exports `{service}_requests_total`, `{service}_request_duration_seconds`
- `templates/monitoring/grafana-dashboard.json`

Success criteria: Grafana dashboard shows live metrics. Alert rules configured.
- `drift_detection.py` with quantile-based bins

Success criteria: CronJob runs successfully. PSI metrics appear in Pushgateway.
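The quantile-binned PSI that `drift_detection.py` refers to can be sketched as follows; the bin count of 10 and the epsilon floor are assumptions, not the template's settings.

```python
import numpy as np

def psi(reference: np.ndarray, current: np.ndarray,
        n_bins: int = 10, eps: float = 1e-4) -> float:
    """Population Stability Index between a reference and a current sample."""
    # Bin edges come from the reference distribution's quantiles, so each
    # reference bin holds roughly 1/n_bins of the data even for skewed features.
    edges = np.quantile(reference, np.linspace(0, 1, n_bins + 1))
    edges[0], edges[-1] = -np.inf, np.inf  # catch out-of-range current values
    ref_frac = np.histogram(reference, bins=edges)[0] / len(reference)
    cur_frac = np.histogram(current, bins=edges)[0] / len(current)
    ref_frac = np.clip(ref_frac, eps, None)  # avoid log(0) on empty bins
    cur_frac = np.clip(cur_frac, eps, None)
    return float(np.sum((cur_frac - ref_frac) * np.log(cur_frac / ref_frac)))

rng = np.random.default_rng(0)
baseline = rng.normal(0.0, 1.0, 10_000)
same = rng.normal(0.0, 1.0, 10_000)      # no drift: PSI near 0
shifted = rng.normal(0.5, 1.0, 10_000)   # mean shift: PSI well above 0.1
```

A common rule of thumb is PSI < 0.1 stable, 0.1–0.25 moderate drift, > 0.25 significant drift; the shifted sample above lands in the drift range while the resampled baseline does not.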
- `README.md` with real metrics

Success criteria: README includes measured metrics, not estimates.
Success criteria: `pytest tests/ -v --cov=src --cov-report=term-missing` shows >= 90% coverage.
Never use `==` for ML package pinning — use `~=` (compatible release).

A service is production-ready when ALL of these pass: