| name | aws-cost-optimization |
|---|---|
| description | Evaluate AWS costs and generate actionable optimization suggestions. Covers EC2, S3, RDS, Lambda, ECS/EKS, and data transfer. Use when analyzing AWS spending, right-sizing resources, choosing pricing models (RI/Savings Plans/Spot), implementing tagging strategies, setting budget alerts, or auditing infrastructure for cost waste. |
AWS Cost Optimization
Evaluate current AWS costs and produce actionable optimization suggestions across compute, storage, networking, and managed services.
Do not use this skill when
- The task targets Azure, GCP, or a non-AWS provider
- The task is about general application performance without a cost dimension
Instructions
- Identify the AWS services in scope (EC2, S3, RDS, Lambda, etc.).
- Apply the Cost Optimization Framework top-to-bottom: Visibility → Right-Sizing → Pricing → Architecture.
- For each service, check the Service-Specific Quick Reference below.
- Generate a prioritized list of suggestions with estimated savings.
- For Terraform/CLI/boto3 implementation details, open `references/implementation-playbook.md`.
Cost Optimization Framework
1. Visibility — Know What You Spend
| Action | Tool / Service |
|---|---|
| Cost allocation tags on every resource | AWS Tag Editor, Tag Policies |
| Monthly/daily spend dashboards | Cost Explorer, QuickSight |
| Budget alerts at 50 %, 80 %, 100 % | AWS Budgets |
| Anomaly detection | AWS Cost Anomaly Detection |
| Per-team / per-env cost breakdown | Cost Categories, Linked Accounts |
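
The 50 / 80 / 100 % budget alerts above could be created with boto3. This is a minimal sketch. It assumes boto3 is installed and the caller has AWS Budgets write permissions. The account ID, budget name, limit, and email are hypothetical placeholders. The payload builder is separate from the API call so the request can be reviewed before it is sent.

```python
"""Sketch: AWS Budgets request with 50/80/100 % actual-spend alerts."""
from typing import Any, Dict, Sequence


def build_budget_request(
    account_id: str,
    budget_name: str,
    monthly_limit_usd: float,
    alert_email: str,
    thresholds: Sequence[float] = (50.0, 80.0, 100.0),
) -> Dict[str, Any]:
    """Return keyword arguments for the Budgets create_budget call."""
    notifications = [
        {
            "Notification": {
                "NotificationType": "ACTUAL",  # actual spend, not forecast
                "ComparisonOperator": "GREATER_THAN",
                "Threshold": float(threshold),
                "ThresholdType": "PERCENTAGE",  # percent of BudgetLimit
            },
            "Subscribers": [{"SubscriptionType": "EMAIL", "Address": alert_email}],
        }
        for threshold in thresholds
    ]
    return {
        "AccountId": account_id,
        "Budget": {
            "BudgetName": budget_name,
            # The Budgets API takes the amount as a string.
            "BudgetLimit": {"Amount": f"{monthly_limit_usd:.2f}", "Unit": "USD"},
            "TimeUnit": "MONTHLY",
            "BudgetType": "COST",
        },
        "NotificationsWithSubscribers": notifications,
    }


def create_budget(request: Dict[str, Any]) -> None:
    """Send the request to AWS (needs credentials and Budgets write permissions)."""
    import boto3  # lazy import: building the payload does not need boto3

    boto3.client("budgets").create_budget(**request)


if __name__ == "__main__":
    req = build_budget_request("123456789012", "monthly-total", 1000, "team@example.com")
    print(req["Budget"]["BudgetLimit"])
```

A forecast-based alert would use `"NotificationType": "FORECASTED"` in the same structure.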
2. Right-Sizing — Stop Over-Provisioning
| Signal | Action |
|---|---|
| CPU < 20 % sustained | Downsize (smaller size in the family) or switch to the Graviton equivalent |
| Memory < 30 % sustained | Move from memory-optimized (e.g. r-family) to general purpose (m-family) |
| EBS IOPS < provisioned | Migrate to gp3 or reduce provisioned IOPS |
| Idle resource (EIP, ELB, NAT, EBS) | Delete or stop |
| Lambda memory > 2× needed | Compare configured memory (`aws lambda get-function-configuration`) with Max Memory Used in the CloudWatch Logs REPORT lines, then tune |
Use AWS Compute Optimizer and Trusted Advisor for automated right-sizing recommendations.
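The signal table can be encoded as a simple rule check per resource. This is a minimal sketch. It assumes utilization has already been aggregated into a plain dict, for example from CloudWatch or a Compute Optimizer export. The metric key names are hypothetical, and the thresholds are copied from the table.

```python
"""Sketch: map utilization signals to the right-sizing actions in the table."""
from typing import Dict, List


def rightsizing_actions(metrics: Dict[str, float]) -> List[str]:
    """Return suggested actions for one resource; an empty list means no signal fired."""
    actions: List[str] = []
    # CPU < 20 % sustained (missing metric defaults to "busy", so no action)
    if metrics.get("cpu_pct", 100.0) < 20:
        actions.append("Downsize instance or move to Graviton")
    # Memory < 30 % sustained
    if metrics.get("mem_pct", 100.0) < 30:
        actions.append("Move from memory-optimized to general purpose")
    # EBS: consumed IOPS below what is provisioned
    provisioned = metrics.get("provisioned_iops", 0.0)
    if provisioned and metrics.get("used_iops", provisioned) < provisioned:
        actions.append("Switch to gp3 or reduce provisioned IOPS")
    # Idle EIP / ELB / NAT / EBS, flagged upstream as 1.0
    if metrics.get("idle", 0.0):
        actions.append("Delete or stop idle resource")
    # Lambda configured memory more than 2x the observed peak
    configured = metrics.get("lambda_configured_mb", 0.0)
    used = metrics.get("lambda_max_used_mb", 0.0)
    if configured and used and configured > 2 * used:
        actions.append("Lower Lambda memory setting")
    return actions
```

In practice the table's "sustained" qualifier matters. Feed this with multi-week aggregates rather than single data points.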
3. Pricing Models — Pay Less Per Unit
| Model | Savings | Best For | Commitment |
|---|---|---|---|
| On-Demand | 0 % | Spiky, unpredictable | None |
| Savings Plans (Compute) | up to 66 % | Steady compute (EC2, Fargate, Lambda) | 1 or 3 yr |
| Savings Plans (EC2 Instance) | up to 72 % | Known instance family & region | 1 or 3 yr |
| Reserved Instances (Standard) | up to 72 % | Steady-state, known type | 1 or 3 yr |
| Reserved Instances (Convertible) | up to 54 % | Steady-state, flexible type | 1 or 3 yr |
| Spot Instances | up to 90 % | Fault-tolerant batch, CI/CD, HPC | None (2-min notice) |
Decision heuristic:
- Steady 24/7 → Savings Plan or RI
- Batch / stateless → Spot with On-Demand fallback
- Unknown workload → start On-Demand, analyze with Cost Explorer, then commit
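The "steady vs. unknown" call can be made quantitative with a break-even check. A commitment bills every hour at (1 − d)·P, whereas On-Demand bills only the hours used, u·P. The commitment is therefore cheaper when u > 1 − d. The sketch below applies that rule together with the heuristic above. The default 40 % discount is a hypothetical placeholder; use the rate Cost Explorer recommends for your account.

```python
"""Sketch: commitment break-even and the pricing decision heuristic."""
from typing import Optional


def break_even_utilization(discount: float) -> float:
    """Share of hours a resource must run for a commitment to beat On-Demand.

    Commitment cost per hour: (1 - discount) * P, billed whether used or not.
    On-Demand cost per hour: utilization * P. Commit when utilization > 1 - discount.
    """
    if not 0.0 <= discount < 1.0:
        raise ValueError("discount must be in [0, 1)")
    return 1.0 - discount


def recommend_pricing_model(
    utilization: Optional[float],
    interruption_tolerant: bool,
    expected_discount: float = 0.4,  # hypothetical; take the real rate from Cost Explorer
) -> str:
    """Batch/stateless -> Spot, unknown -> On-Demand first, steady -> commit."""
    if interruption_tolerant:
        return "Spot with On-Demand fallback"
    if utilization is None:
        return "On-Demand; analyze in Cost Explorer, then commit"
    if utilization > break_even_utilization(expected_discount):
        return "Savings Plan or RI"
    return "On-Demand"
```

For example, at a 40 % discount the break-even is 60 % of hours. A dev box running 50 hours a week (about 30 %) stays cheaper On-Demand.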
4. Architecture — Spend Smarter
| Pattern | Why |
|---|---|
| Serverless First | Zero idle cost; Lambda, Step Functions, EventBridge |
| Graviton (ARM) instances | Lower hourly price and up to 40 % better price-performance than comparable x86 instances |
| Multi-tier S3 storage | Auto-transition hot → IA → Glacier → Deep Archive |
| Caching (ElastiCache, CloudFront) | Reduce origin hits and data transfer |
| VPC Endpoints | Gateway endpoints (S3, DynamoDB) are free and bypass NAT Gateway data processing charges |
| Regional consolidation | Reduce cross-region transfer costs |
Service-Specific Quick Reference
EC2
- Use Graviton (`c7g`, `m7g`, `r7g`) for a lower hourly price and up to 40 % better price-performance than comparable x86 instances.
- Mix Spot and On-Demand in Auto Scaling groups using a mixed instances policy with the capacity-optimized (or price-capacity-optimized) Spot allocation strategy.
- Enable auto-scaling with target tracking (CPU 60-70 %).
- Schedule dev/staging instances off-hours with Instance Scheduler.
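The payoff from scheduling is simple arithmetic over the 168 hours in a week. The sketch below assumes cost scales linearly with running hours. It covers instance-hour charges only; EBS volumes attached to stopped instances keep billing.

```python
"""Sketch: share of compute-hour cost saved by an off-hours schedule."""

HOURS_PER_WEEK = 24 * 7  # 168


def schedule_savings(hours_per_day: float, days_per_week: int) -> float:
    """Fraction of weekly instance-hours avoided versus running 24/7."""
    if not (0 <= hours_per_day <= 24 and 0 <= days_per_week <= 7):
        raise ValueError("schedule must fit within one week")
    running_hours = hours_per_day * days_per_week
    return 1 - running_hours / HOURS_PER_WEEK


if __name__ == "__main__":
    # 12 h/day, Monday to Friday -> 60 of 168 hours running
    print(f"{schedule_savings(12, 5):.0%}")
```

A 12-hour weekday schedule runs 60 of 168 hours, avoiding roughly 64 % of the instance-hour cost.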
S3
| Storage Class | Use Case | Storage price vs Standard (us-east-1) |
|---|---|---|
| Standard | Frequently accessed | baseline |
| Standard-IA | Accessed < 1×/month | –46 % |
| One Zone-IA | Non-critical, infrequent | –57 % |
| Glacier Instant Retrieval | Quarterly access, ms retrieval | –83 % |
| Glacier Flexible Retrieval | Annual access, minutes-to-hours retrieval | –84 % |
| Deep Archive | Compliance / 7-yr retention | –96 % |
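Use S3 Intelligent-Tiering when access patterns are unpredictable. The percentages above compare storage price only. The IA and Glacier classes add per-GB retrieval fees and minimum storage durations: 30 days for Standard-IA and One Zone-IA, 90 days for Glacier Instant and Flexible Retrieval, and 180 days for Deep Archive.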
Use S3 Lifecycle rules for deterministic transitions (see playbook).
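A deterministic hot → IA → Glacier → Deep Archive rule can be expressed as the `LifecycleConfiguration` payload that boto3's `put_bucket_lifecycle_configuration` accepts. This is a minimal sketch; the day counts, the rule ID format, and the optional expiration are illustrative assumptions. That call replaces the bucket's entire existing lifecycle configuration, so merge rules before applying.

```python
"""Sketch: deterministic S3 lifecycle transitions (hot -> IA -> Glacier IR -> Deep Archive)."""
from typing import Any, Dict, Optional


def build_lifecycle_config(
    prefix: str,
    ia_days: int = 30,
    glacier_ir_days: int = 90,
    deep_archive_days: int = 365,
    expire_days: Optional[int] = None,
) -> Dict[str, Any]:
    """Return a LifecycleConfiguration dict for put_bucket_lifecycle_configuration."""
    if ia_days < 30:
        # Lifecycle transitions to Standard-IA need at least 30 days in Standard.
        raise ValueError("ia_days must be at least 30")
    if not ia_days < glacier_ir_days < deep_archive_days:
        raise ValueError("transition days must increase")
    if expire_days is not None and expire_days <= deep_archive_days:
        raise ValueError("expire_days must come after the last transition")

    rule: Dict[str, Any] = {
        "ID": f"tiering-{prefix or 'all'}",  # hypothetical naming scheme
        "Filter": {"Prefix": prefix},        # empty prefix = whole bucket
        "Status": "Enabled",
        "Transitions": [
            {"Days": ia_days, "StorageClass": "STANDARD_IA"},
            {"Days": glacier_ir_days, "StorageClass": "GLACIER_IR"},
            {"Days": deep_archive_days, "StorageClass": "DEEP_ARCHIVE"},
        ],
    }
    if expire_days is not None:
        rule["Expiration"] = {"Days": expire_days}
    return {"Rules": [rule]}


def apply_lifecycle(bucket: str, config: Dict[str, Any]) -> None:
    """Overwrite the bucket's lifecycle rules (needs boto3 and s3:PutLifecycleConfiguration)."""
    import boto3

    boto3.client("s3").put_bucket_lifecycle_configuration(
        Bucket=bucket, LifecycleConfiguration=config
    )
```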
RDS / Aurora
| Environment | Recommended Tier |
|---|---|
| Development | db.t4g.micro – db.t4g.small |
| Staging | db.t4g.medium – db.t4g.large |
| Production | db.r7g.xlarge + read replicas |
- Use Aurora Serverless v2 for variable traffic.
- Purchase RDS Reserved Instances for steady production databases.
- Use Aurora I/O-Optimized if I/O costs > 25 % of total DB cost.
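The 25 % rule needs only a monthly cost breakdown for the cluster. This is a minimal sketch. It assumes you have already split Aurora spend into instance, storage, and I/O line items, for example from Cost Explorer or the bill. The dict keys are hypothetical.

```python
"""Sketch: apply the 25 % rule for Aurora I/O-Optimized."""
from typing import Dict


def io_share(monthly_costs: Dict[str, float]) -> float:
    """Fraction of total Aurora spend that is I/O charges."""
    total = sum(monthly_costs.values())
    if total <= 0:
        raise ValueError("total cost must be positive")
    return monthly_costs.get("io", 0.0) / total


def recommend_aurora_storage(monthly_costs: Dict[str, float], threshold: float = 0.25) -> str:
    """Pick the storage configuration using the I/O-share threshold."""
    return "Aurora I/O-Optimized" if io_share(monthly_costs) > threshold else "Aurora Standard"
```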
Lambda
- Right-size memory with AWS Lambda Power Tuning.
- Use Graviton2 (`arm64` architecture). Duration is priced about 20 % lower per GB-second, and AWS cites up to 34 % better price-performance.
- Enable Provisioned Concurrency only when cold-start SLA < 100 ms.
- Prefer Step Functions over Lambdas that synchronously invoke and wait on other Lambdas, because the waiting caller is billed for idle time.
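Lambda cost is duration (GB-seconds) plus a per-request charge, so memory and architecture choices can be compared directly. This is a minimal sketch. It assumes us-east-1 first-tier list prices ($0.0000166667 per GB-second on x86_64, $0.0000133334 on arm64, $0.20 per million requests) and ignores the free tier and volume discounts.

```python
"""Sketch: monthly Lambda cost from invocations, duration, and memory."""
from typing import Dict

# Assumed us-east-1 list prices (first tier); check current pricing.
PRICE_PER_GB_SECOND: Dict[str, float] = {"x86_64": 0.0000166667, "arm64": 0.0000133334}
PRICE_PER_REQUEST = 0.20 / 1_000_000


def lambda_monthly_cost(
    invocations: int, avg_duration_ms: float, memory_mb: int, architecture: str = "x86_64"
) -> float:
    """Duration (GB-seconds) plus request charges; ignores free tier and volume tiers."""
    gb_seconds = invocations * (avg_duration_ms / 1000) * (memory_mb / 1024)
    return gb_seconds * PRICE_PER_GB_SECOND[architecture] + invocations * PRICE_PER_REQUEST


if __name__ == "__main__":
    for arch in ("x86_64", "arm64"):
        print(arch, round(lambda_monthly_cost(1_000_000, 100, 1024, arch), 2))
```

More memory also means more CPU, so duration often drops as memory rises. The cheapest setting is therefore found empirically, which is what AWS Lambda Power Tuning automates.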
ECS / EKS
- Use Fargate Spot for fault-tolerant tasks (up to 70 % savings).
- Use Compute Savings Plans for steady Fargate workloads.
- For EKS, use Karpenter for just-in-time, right-sized node provisioning.
- Right-size task/pod CPU and memory with Container Insights.
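One common way to turn observed usage into a request is a high percentile plus headroom. This is a minimal sketch. The p95 percentile and 20 % headroom are assumptions, not a Container Insights feature. The samples are assumed to be CPU or memory readings exported from Container Insights. On Fargate, round the result up to a supported task CPU/memory combination.

```python
"""Sketch: size a container CPU or memory request from observed usage."""
import statistics
from typing import Sequence


def recommend_request(samples: Sequence[float], headroom: float = 0.2) -> float:
    """p95 of observed usage plus a safety margin (both choices are assumptions)."""
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    p95 = statistics.quantiles(samples, n=100)[94]  # 95th percentile cut point
    return p95 * (1 + headroom)
```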
Data Transfer
| Path | Typical cost (us-east-1) | Mitigation |
|---|---|---|
| Same AZ | Free | Co-locate services |
| Cross-AZ | $0.01/GB each way | Use AZ-aware routing |
| Internet egress | $0.09/GB first 10 TB | Serve through CloudFront ($0.085/GB; transfer from AWS origins to CloudFront is free) |
| Cross-region | $0.02/GB (varies by region pair) | Consolidate regions |
| NAT Gateway processing | $0.045/GB | VPC Endpoints for S3/DynamoDB |
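
The table's per-GB rates turn directly into a monthly estimate once traffic volumes per path are known. This is a minimal sketch. It assumes the us-east-1 first-tier rates listed above and treats cross-AZ traffic as $0.01 charged on each side. It ignores the monthly free egress allowance and the NAT Gateway hourly charge. The path keys are hypothetical.

```python
"""Sketch: estimate monthly data transfer cost per path."""
from typing import Dict

# Rates from the table above (USD/GB, us-east-1, first tier); verify current pricing.
RATES_PER_GB: Dict[str, float] = {
    "same_az": 0.0,
    "cross_az": 0.02,          # $0.01 charged on each side of the transfer
    "internet_egress": 0.09,   # first 10 TB/month tier
    "cross_region": 0.02,
    "nat_processing": 0.045,   # on top of NAT Gateway hourly charges
}


def monthly_transfer_cost(gb_by_path: Dict[str, float]) -> float:
    """Sum rate * volume for each path; unknown paths are rejected."""
    total = 0.0
    for path, gb in gb_by_path.items():
        if path not in RATES_PER_GB:
            raise ValueError(f"unknown transfer path: {path}")
        total += RATES_PER_GB[path] * gb
    return total
```

For instance, routing 2,000 GB/month of S3 traffic through a gateway endpoint instead of the NAT Gateway removes about $90 of processing charges (2,000 × $0.045).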
Tagging Strategy (Mandatory)
Every AWS resource MUST have these tags:
| Tag Key | Example | Purpose |
|---|---|---|
| Environment | production | Filter by env |
| Project | navigator | Cost allocation |
| CostCenter | engineering | Chargeback |
| Owner | team@example.com | Accountability |
| ManagedBy | terraform | Audit |
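Enforce via AWS Organizations Tag Policies (standardize keys and allowed values) and SCPs that deny resource creation when a required tag is missing (for example, a `Null` condition on `aws:RequestTag/CostCenter`).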
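For the tag audit step, the Resource Groups Tagging API returns `ResourceARN` and `Tags` per resource, which is enough to flag gaps. This is a minimal sketch. It runs per region and treats empty tag values as missing. The Tagging API may not list resources that were never tagged, so pair it with a per-service inventory.

```python
"""Sketch: report resources missing the mandatory cost-allocation tags."""
from typing import Any, Dict, Iterable, Iterator, List, Sequence

REQUIRED_TAGS = ("Environment", "Project", "CostCenter", "Owner", "ManagedBy")


def missing_tags(
    mappings: Iterable[Dict[str, Any]], required: Sequence[str] = REQUIRED_TAGS
) -> Dict[str, List[str]]:
    """Map each non-compliant ResourceARN to its missing (or empty) tag keys."""
    report: Dict[str, List[str]] = {}
    for mapping in mappings:
        present = {t["Key"] for t in mapping.get("Tags", []) if t.get("Value")}
        missing = [key for key in required if key not in present]
        if missing:
            report[mapping["ResourceARN"]] = missing
    return report


def fetch_tag_mappings() -> Iterator[Dict[str, Any]]:
    """Yield ResourceTagMappingList entries for the current region (needs boto3)."""
    import boto3

    paginator = boto3.client("resourcegroupstaggingapi").get_paginator("get_resources")
    for page in paginator.paginate():
        yield from page["ResourceTagMappingList"]
```

Usage: `missing_tags(fetch_tag_mappings())` returns only the non-compliant ARNs.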
Cost Evaluation Workflow
When asked to evaluate costs, follow this sequence:
- Inventory: List services, instance types, storage volumes, and data flows.
- Tag audit: Check for missing cost-allocation tags.
- Utilization check: Review CloudWatch metrics (CPU, memory, IOPS, network).
- Pricing check: Compare current pricing model vs optimal (RI/SP/Spot).
- Architecture review: Identify unnecessary data transfer, missing caching, idle resources.
- Report: Produce a table of findings with:
  - Resource / Service
  - Current monthly cost (estimated or from Cost Explorer)
  - Suggested action
  - Estimated savings (% and $)
  - Effort (low / medium / high)
  - Risk (low / medium / high)
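The report table can be generated from structured findings so that ordering stays consistent. This is a minimal sketch. The `Finding` fields mirror the columns listed above. Ranking by dollar savings, then by lower effort and lower risk, is one possible prioritization, not a prescribed one.

```python
"""Sketch: prioritized findings table for the cost report."""
from dataclasses import dataclass
from typing import List

LEVEL = {"low": 0, "medium": 1, "high": 2}


@dataclass
class Finding:
    resource: str
    monthly_cost: float   # current monthly cost in USD
    action: str
    savings_pct: float    # estimated savings, 0-100
    effort: str           # low / medium / high
    risk: str             # low / medium / high

    @property
    def savings_usd(self) -> float:
        return self.monthly_cost * self.savings_pct / 100


def prioritize(findings: List[Finding]) -> List[Finding]:
    """Largest dollar savings first; ties broken by lower effort, then lower risk."""
    return sorted(findings, key=lambda f: (-f.savings_usd, LEVEL[f.effort], LEVEL[f.risk]))


def to_markdown(findings: List[Finding]) -> str:
    lines = [
        "| Resource / Service | Current monthly cost | Suggested action "
        "| Estimated savings | Effort | Risk |",
        "|---|---|---|---|---|---|",
    ]
    for f in prioritize(findings):
        lines.append(
            f"| {f.resource} | ${f.monthly_cost:,.0f} | {f.action} "
            f"| {f.savings_pct:.0f} % (${f.savings_usd:,.0f}) | {f.effort} | {f.risk} |"
        )
    return "\n".join(lines)
```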
Tools
- AWS Cost Explorer — Spend trends, forecasting, RI/SP recommendations
- AWS Compute Optimizer — EC2, EBS, Lambda right-sizing
- AWS Trusted Advisor — Idle resources, under-utilized instances
- AWS Cost Anomaly Detection — ML-based spend anomaly alerts
- AWS Budgets — Threshold alerts and auto-actions
- Kubecost — Kubernetes cost allocation (EKS)
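For the inventory step, Cost Explorer's `get_cost_and_usage` grouped by the `SERVICE` dimension gives a per-service spend ranking. This is a minimal sketch. The `End` date is exclusive, and the fetch helper ignores `NextPageToken` for brevity. The parser is kept separate so it can be checked against a saved response.

```python
"""Sketch: spend per service from Cost Explorer."""
from collections import defaultdict
from datetime import date
from typing import Any, Dict


def cost_by_service(response: Dict[str, Any]) -> Dict[str, float]:
    """Sum UnblendedCost per service across all periods, largest first."""
    totals: Dict[str, float] = defaultdict(float)
    for period in response["ResultsByTime"]:
        for group in period.get("Groups", []):
            service = group["Keys"][0]
            totals[service] += float(group["Metrics"]["UnblendedCost"]["Amount"])
    return dict(sorted(totals.items(), key=lambda kv: kv[1], reverse=True))


def fetch_cost_and_usage(start: date, end: date) -> Dict[str, Any]:
    """Call Cost Explorer (End is exclusive); needs boto3 and ce:GetCostAndUsage."""
    import boto3

    return boto3.client("ce").get_cost_and_usage(
        TimePeriod={"Start": start.isoformat(), "End": end.isoformat()},
        Granularity="MONTHLY",
        Metrics=["UnblendedCost"],
        GroupBy=[{"Type": "DIMENSION", "Key": "SERVICE"}],
    )
```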
Implementation Details
For Terraform, AWS CLI, and boto3 code examples, see `references/implementation-playbook.md`.
Related Skills
- aws-serverless: Serverless architecture patterns (Lambda, API Gateway, DynamoDB)
- production-dockerfile: Containerization best practices