containerization
Docker, Kubernetes, container orchestration, and cloud-native deployment for data applications
Updated: December 30, 2025, 12:44
| Field | Value |
|---|---|
| name | containerization |
| description | Docker, Kubernetes, container orchestration, and cloud-native deployment for data applications |
| sasmp_version | 1.3.0 |
| bonded_agent | 03-devops-engineer |
| bond_type | PRIMARY_BOND |
| skill_version | 2.0.0 |
| last_updated | 2025-01 |
| complexity | intermediate |
| estimated_mastery_hours | 120 |
| prerequisites | ["python-programming","cloud-platforms"] |
| unlocks | ["mlops","big-data"] |
Production-grade container orchestration for data engineering workloads with Docker and Kubernetes.
# Dockerfile for PySpark data application
FROM python:3.12-slim
# Install Java for Spark
RUN apt-get update && apt-get install -y openjdk-17-jdk-headless && \
    apt-get clean && rm -rf /var/lib/apt/lists/*
WORKDIR /app
# Install dependencies first (cache optimization)
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
# Copy application code
COPY src/ ./src/
COPY config/ ./config/
# Non-root user for security
RUN useradd -m appuser && chown -R appuser:appuser /app
USER appuser
ENV PYTHONPATH=/app
# This path is architecture-specific; on arm64 images the directory ends in -arm64
ENV JAVA_HOME=/usr/lib/jvm/java-17-openjdk-amd64
ENTRYPOINT ["python", "-m", "src.main"]
# Multi-stage variant: build stage
FROM python:3.12 AS builder
WORKDIR /build
COPY requirements.txt .
RUN pip wheel --no-cache-dir --wheel-dir /wheels -r requirements.txt
# Runtime stage
FROM python:3.12-slim AS runtime
COPY --from=builder /wheels /wheels
RUN pip install --no-cache-dir /wheels/* && rm -rf /wheels
COPY src/ /app/src/
WORKDIR /app
USER 1000
CMD ["python", "-m", "src.main"]
# deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: etl-worker
  labels:
    app: etl-worker
spec:
  replicas: 3
  selector:
    matchLabels:
      app: etl-worker
  template:
    metadata:
      labels:
        app: etl-worker
    spec:
      containers:
        - name: etl-worker
          image: company/etl-worker:v1.2.0
          resources:
            requests:
              memory: "512Mi"
              cpu: "500m"
            limits:
              memory: "2Gi"
              cpu: "2000m"
          env:
            - name: DATABASE_URL
              valueFrom:
                secretKeyRef:
                  name: db-credentials
                  key: url
            - name: LOG_LEVEL
              value: "INFO"
          livenessProbe:
            httpGet:
              path: /health
              port: 8080
            initialDelaySeconds: 30
            periodSeconds: 10
          readinessProbe:
            httpGet:
              path: /ready
              port: 8080
            initialDelaySeconds: 5
            periodSeconds: 5
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
            - weight: 100
              podAffinityTerm:
                labelSelector:
                  matchLabels:
                    app: etl-worker
                topologyKey: kubernetes.io/hostname
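The liveness and readiness probes above expect the container to answer HTTP GET requests on port 8080 at `/health` and `/ready`. The source does not show how the worker serves them, so the following is only a minimal sketch using the Python standard library. The names `ready`, `ProbeHandler`, and `start_probe_server` are hypothetical, not part of the original application.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical flag: set it once startup work (config load, DB connection) is done.
ready = threading.Event()


class ProbeHandler(BaseHTTPRequestHandler):
    """Answers the two probe paths used in deployment.yaml."""

    def do_GET(self):
        if self.path == "/health":
            # Liveness: the process is up and able to answer HTTP.
            self._reply(200, b"ok")
        elif self.path == "/ready":
            # Readiness: report 503 until startup has completed.
            if ready.is_set():
                self._reply(200, b"ready")
            else:
                self._reply(503, b"starting")
        else:
            self._reply(404, b"not found")

    def _reply(self, status, body):
        self.send_response(status)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        # Probes fire every few seconds; skip per-request logging.
        pass


def start_probe_server(port=8080):
    """Serve the probe endpoints on a daemon thread and return the server."""
    server = HTTPServer(("0.0.0.0", port), ProbeHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

An application entrypoint could call `start_probe_server()` early and `ready.set()` after its dependencies are reachable, so the pod receives traffic only once it can do useful work.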
# cronjob.yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: daily-etl
spec:
  schedule: "0 2 * * *"  # 2 AM daily
  concurrencyPolicy: Forbid
  successfulJobsHistoryLimit: 3
  failedJobsHistoryLimit: 3
  jobTemplate:
    spec:
      backoffLimit: 2
      activeDeadlineSeconds: 7200  # 2 hour timeout
      template:
        spec:
          restartPolicy: Never
          containers:
            - name: etl-job
              image: company/etl-pipeline:v1.0.0
              resources:
                requests:
                  memory: "4Gi"
                  cpu: "2000m"
                limits:
                  memory: "8Gi"
                  cpu: "4000m"
              env:
                - name: EXECUTION_DATE
                  # Kubernetes does not expand template expressions itself;
                  # render this placeholder with a templating tool before applying,
                  # or let the job compute the date at runtime.
                  value: "{{ .Date }}"
              volumeMounts:
                - name: config
                  mountPath: /app/config
                  readOnly: true
          volumes:
            - name: config
              configMap:
                name: etl-config
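The CronJob passes the run date to the container through the `EXECUTION_DATE` environment variable. How the pipeline consumes it is not shown in the source. One possible approach, sketched here with a hypothetical helper `resolve_execution_date`, is to parse the value as an ISO date and fall back to the current UTC date when the variable is unset or does not parse (for example, a placeholder that was never rendered).

```python
import os
from datetime import date, datetime, timezone


def resolve_execution_date(env=None):
    """Return the run date from EXECUTION_DATE, or today's UTC date as a fallback."""
    env = os.environ if env is None else env
    raw = env.get("EXECUTION_DATE", "").strip()
    try:
        # Accepts ISO dates such as "2025-01-15".
        return date.fromisoformat(raw)
    except ValueError:
        # Unset, empty, or not an ISO date: use the current UTC date instead.
        return datetime.now(timezone.utc).date()
```

Falling back silently is a design choice for this sketch. A stricter job might prefer to fail fast on an unparseable value so that a misconfigured manifest is noticed.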
# Chart.yaml
apiVersion: v2
name: data-pipeline
version: 1.0.0
appVersion: "2.0.0"
description: Data pipeline Helm chart
# values.yaml
replicaCount: 3
image:
  repository: company/data-pipeline
  tag: "2.0.0"  # pinned to match appVersion; avoid "latest"
  pullPolicy: IfNotPresent
resources:
  requests:
    memory: "1Gi"
    cpu: "500m"
  limits:
    memory: "4Gi"
    cpu: "2000m"
autoscaling:
  enabled: true
  minReplicas: 2
  maxReplicas: 10
  targetCPUUtilizationPercentage: 70
env:
  LOG_LEVEL: INFO
  BATCH_SIZE: "1000"
secrets:
  - name: DATABASE_URL
    secretName: db-credentials
    key: url
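The `autoscaling` values above would typically feed a HorizontalPodAutoscaler. That is an assumption, since the chart templates are not shown. The Kubernetes documentation describes the core scaling rule as `desiredReplicas = ceil(currentReplicas * currentMetric / targetMetric)`, skipped when the ratio is within a tolerance of 1.0 (0.1 by default) and bounded by the min and max replica counts. The sketch below reproduces only that arithmetic; the function name is hypothetical and real HPA behavior also involves stabilization windows and pod readiness handling.

```python
import math


def desired_replicas(current_replicas, current_cpu_percent,
                     target_cpu_percent=70, min_replicas=2,
                     max_replicas=10, tolerance=0.1):
    """Approximate the HPA replica calculation for a CPU utilization target."""
    ratio = current_cpu_percent / target_cpu_percent
    if abs(ratio - 1.0) <= tolerance:
        # Close enough to the target: keep the current replica count.
        desired = current_replicas
    else:
        desired = math.ceil(current_replicas * ratio)
    # Clamp to the configured bounds (minReplicas / maxReplicas).
    return max(min_replicas, min(max_replicas, desired))


print(desired_replicas(3, 90))   # 4: ceil(3 * 90 / 70)
print(desired_replicas(3, 72))   # 3: ratio is within the 10% tolerance
print(desired_replicas(8, 140))  # 10: capped by maxReplicas
```

CPU utilization here is measured relative to the container's CPU request (`500m` in these values), averaged across pods, so the request setting directly affects when scaling triggers.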
# docker-compose.yml
version: '3.8'
services:
  postgres:
    image: postgres:16-alpine
    environment:
      POSTGRES_DB: datawarehouse
      POSTGRES_USER: admin
      POSTGRES_PASSWORD: ${DB_PASSWORD}
    ports:
      - "5432:5432"
    volumes:
      - postgres_data:/var/lib/postgresql/data
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U admin"]
      interval: 5s
      timeout: 5s
      retries: 5
  redis:
    image: redis:7-alpine
    ports:
      - "6379:6379"
  airflow-webserver:
    image: apache/airflow:2.8.0-python3.11
    depends_on:
      postgres:
        condition: service_healthy
      redis:
        condition: service_started
    environment:
      AIRFLOW__CORE__EXECUTOR: CeleryExecutor
      AIRFLOW__DATABASE__SQL_ALCHEMY_CONN: postgresql+psycopg2://admin:${DB_PASSWORD}@postgres/datawarehouse
      AIRFLOW__CELERY__BROKER_URL: redis://redis:6379/0
    ports:
      - "8080:8080"
    volumes:
      - ./dags:/opt/airflow/dags
      - ./plugins:/opt/airflow/plugins
volumes:
  postgres_data:
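The Compose file interpolates `${DB_PASSWORD}` directly into the SQLAlchemy connection URL. Interpolation is plain text substitution, so a password containing characters such as `@`, `/`, or `:` would produce a URL that parses incorrectly. If application code builds the same URL itself, one option is to percent-encode the credentials first. The helper below is a hypothetical sketch; its defaults mirror the user, host, and database names from the Compose file.

```python
import os
from urllib.parse import quote_plus


def build_sqlalchemy_url(password, user="admin", host="postgres",
                         database="datawarehouse"):
    """Build a psycopg2 SQLAlchemy URL with percent-encoded credentials."""
    return (
        f"postgresql+psycopg2://{quote_plus(user)}:{quote_plus(password)}"
        f"@{host}/{database}"
    )


if __name__ == "__main__":
    # DB_PASSWORD is the same variable the Compose file reads.
    print(build_sqlalchemy_url(os.environ.get("DB_PASSWORD", "")))
```

For the Compose file itself, the simpler route is to choose a password without URL-reserved characters or to store an already-encoded value.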
| Tool | Purpose | Version (2025) |
|---|---|---|
| Docker | Containerization | 25+ |
| Kubernetes | Orchestration | 1.29+ |
| Helm | K8s package manager | 3.14+ |
| ArgoCD | GitOps deployment | 2.10+ |
| Kustomize | K8s config management | Built-in |
| containerd | Container runtime | 1.7+ |
| Podman | Docker alternative | 4.8+ |
| Issue | Symptoms | Root Cause | Fix |
|---|---|---|---|
| OOMKilled | Pod restarts, exit code 137 | Memory limit exceeded | Increase limits, optimize code |
| CrashLoopBackOff | Pod keeps restarting | App crash, bad config | Check logs: `kubectl logs <pod-name> --previous` |
| ImagePullBackOff | Pod stuck in Pending with ErrImagePull/ImagePullBackOff status | Image not found, registry auth failure | Check image name and tag, pull secrets |
| Pending Pod | Pod won't schedule | No resources, node selector | Check resources, affinity rules |
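The exit code 137 in the OOMKilled row follows the common convention that a process terminated by a signal reports `128 + signal number`, and 137 is `128 + 9` (SIGKILL). As a small illustration, the hypothetical helper below decodes a container exit code along those lines. It is a sketch for reading `kubectl describe` output, not an official mapping.

```python
import signal


def describe_exit_code(code):
    """Give a rough interpretation of a container exit code."""
    if code == 0:
        return "exited normally"
    if code > 128:
        signum = code - 128
        try:
            name = signal.Signals(signum).name
        except ValueError:
            # Not a signal number known on this platform.
            name = f"signal {signum}"
        return f"killed by {name}"
    return f"application error (exit code {code})"


print(describe_exit_code(137))  # killed by SIGKILL
print(describe_exit_code(143))  # killed by SIGTERM
print(describe_exit_code(1))    # application error (exit code 1)
```

Exit code 137 alone does not prove an out-of-memory kill, since anything that sends SIGKILL produces it. Confirm by looking for `Reason: OOMKilled` in the container's last state.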
# Check pod status and events
kubectl describe pod <pod-name>
# View logs from the previous container instance (omit --previous for the current one)
kubectl logs <pod-name> -c <container-name> --previous
# Execute shell in container
kubectl exec -it <pod-name> -- /bin/sh
# Check resource usage (requires metrics-server in the cluster)
kubectl top pods
# Debug networking
kubectl run debug --image=busybox -it --rm -- sh
# ✅ DO: Use specific image tags
FROM python:3.12.1-slim
# ✅ DO: Use non-root user
USER 1000
# ✅ DO: Use multi-stage builds
# ✅ DO: Set resource limits
# ✅ DO: Use health checks
# ❌ DON'T: Run as root
# ❌ DON'T: Use latest tag
# ❌ DON'T: Store secrets in images
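Two of the rules above (pin image tags, do not run as root) are easy to check mechanically. The following is a deliberately simple sketch of such a check, with the hypothetical name `lint_dockerfile`. It assumes every stage starts as root unless a `USER` instruction says otherwise, which is not true for base images that already switch users, and it does not handle line continuations or build arguments.

```python
def lint_dockerfile(text):
    """Return warnings for unpinned base images and a root final user."""
    warnings = []
    stage_names = set()  # aliases declared with "FROM ... AS <name>"
    final_user = None    # last USER seen in the current (final) stage

    for raw in text.splitlines():
        parts = raw.strip().split()
        if not parts or parts[0].startswith("#"):
            continue
        instruction = parts[0].upper()

        if instruction == "FROM":
            # Ignore flags such as --platform=linux/amd64.
            args = [p for p in parts[1:] if not p.startswith("--")]
            if not args:
                continue
            image = args[0]
            is_stage_ref = image in stage_names
            if len(args) >= 3 and args[1].upper() == "AS":
                stage_names.add(args[2])
            final_user = None  # assumption: a new stage starts as root
            if image == "scratch" or is_stage_ref or "@" in image:
                continue  # nothing to pin, or already pinned by digest
            last_segment = image.rsplit("/", 1)[-1]
            tag = last_segment.split(":", 1)[1] if ":" in last_segment else ""
            if tag in ("", "latest"):
                warnings.append(f"unpinned base image: {image}")
        elif instruction == "USER" and len(parts) > 1:
            final_user = parts[1].split(":", 1)[0]

    if final_user in (None, "root", "0"):
        warnings.append("final stage runs as root")
    return warnings
```

For example, `lint_dockerfile("FROM python:latest\n")` reports both an unpinned base image and a root final stage, while a file that starts from `python:3.12.1-slim` and ends with `USER 1000` produces no warnings.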
Skill Certification Checklist: