| name | terraform-aws |
| description | Build, review, and troubleshoot Terraform configurations using the AWS provider. Use for provider setup, authentication patterns, tagging strategy, multi-region usage, import/migration, and AWS-specific safety issues. |
Terraform AWS Provider
Provider Baseline
terraform {
required_providers {
aws = {
source = "hashicorp/aws"
version = "~> 6.36.0"
}
}
}
provider "aws" {
region = "us-east-1"
default_tags {
tags = {
environment = "dev"
owner = "the_foundry"
terraform = "True"
}
}
}
Terragrunt Structure
The infrastructure uses two Terragrunt repos with different responsibilities:
| Repo | Purpose | State bucket |
|---|---|---|
| terragrunt-aws | Shared infra: ECR registries, KMS keys, CodeArtifact | the-foundry-terraform-dev |
| ubichat-terragrunt-aws | App deployments: ECS tasks, security groups, service discovery | the-foundry-terraform-dev |
Both use root.hcl to generate providers (AWS + Vault) and remote state (S3 backend).
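As a rough sketch of what such a root.hcl could contain: the real file is not shown here, so the state key layout, the generated file names, and the omission of the Vault provider are all assumptions.

```hcl
# root.hcl (hypothetical sketch, not the real file)

# Generate backend.tf in every unit so state lands in the shared S3 bucket.
remote_state {
  backend = "s3"
  generate = {
    path      = "backend.tf"
    if_exists = "overwrite_terragrunt"
  }
  config = {
    bucket = "the-foundry-terraform-dev"
    # Assumed key layout: one state file per unit directory.
    key     = "${path_relative_to_include()}/terraform.tfstate"
    region  = "us-east-1"
    encrypt = true
  }
}

# Generate provider.tf; a Vault provider block would be generated the same way.
generate "provider" {
  path      = "provider.tf"
  if_exists = "overwrite_terragrunt"
  contents  = <<-EOF
    provider "aws" {
      region = "us-east-1"
    }
  EOF
}
```

Each unit would then pull this in with an include block whose path is find_in_parent_folders("root.hcl"). Because both repos write to the same bucket, the key presumably needs a per-repo prefix to avoid collisions; that detail is not shown in the source.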
Adding a new ECR repository
ECR repositories are managed centrally in terragrunt-aws, not in per-app repos.
Module: modules/ecr/main.tf
resource "aws_ecr_repository" "repo" {
for_each = toset(var.repository_list)
name = each.key
image_tag_mutability = "MUTABLE"
encryption_configuration {
encryption_type = "KMS"
kms_key = aws_kms_key.ecs.arn
}
image_scanning_configuration {
scan_on_push = true
}
}
resource "aws_ecr_repository_policy" "repo_policy" {
for_each = aws_ecr_repository.repo
repository = each.value.name
policy = jsonencode({
Version = "2012-10-17"
Statement = [{
Sid = "AllowPull"
Effect = "Allow"
Principal = { AWS = var.ecr_pull_principal }
Action = [
"ecr:GetDownloadUrlForLayer",
"ecr:BatchGetImage",
"ecr:BatchCheckLayerAvailability"
]
}]
})
}
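The module reads var.repository_list and var.ecr_pull_principal, whose declarations are not shown. A minimal sketch of matching declarations plus a convenience output follows; the descriptions, the empty default, and the repository_urls output name are assumptions.

```hcl
# modules/ecr/variables.tf (assumed layout)
variable "repository_list" {
  description = "Names of the ECR repositories to create."
  type        = list(string)
  default     = []
}

variable "ecr_pull_principal" {
  description = "IAM principal ARN that is allowed to pull images."
  type        = string
}

# Hypothetical output: map of repository name => repository URL,
# handy when copying an image URI into an ECS deployment.
output "repository_urls" {
  value = { for name, repo in aws_ecr_repository.repo : name => repo.repository_url }
}
```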
To add a new repo, edit live/dev/ecr/terragrunt.hcl and append to repository_list:
# live/dev/ecr/terragrunt.hcl
inputs = {
ecr_pull_principal = "arn:aws:iam::<PROD_ACCOUNT_ID>:root"
repository_list = [
# ... existing repos ...
"my-new-service",
]
}
Then apply:
cd terragrunt-aws/live/dev/ecr
terragrunt apply
Never create ECR repos manually in the AWS Console: they will lack the KMS encryption, scan-on-push, and cross-account pull policy that the module applies, and they will not be tracked in Terraform state.
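If a repository was already created by hand, one option is to adopt it into state rather than delete it. The sketch below assumes Terraform 1.5 or later (for import blocks) and a hand-made repository named my-new-service that is also listed in repository_list; it is illustrative and not a documented team procedure.

```hcl
# Placed temporarily alongside the ECR module code for a single apply.
# Import blocks are only valid in the root module, which for a Terragrunt
# unit is the module that the unit's terraform source points at.
import {
  to = aws_ecr_repository.repo["my-new-service"]
  id = "my-new-service" # ECR repositories import by name
}
```

Review the plan carefully before applying: encryption settings on an existing repository cannot be changed in place, so a hand-made repository without KMS encryption will likely show up as a replacement.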
Deploying an ECS task (from ubichat-terragrunt-aws)
ECS tasks consume ECR image URIs as input variables:
# live/dev/my-service/terragrunt.hcl
inputs = {
name = "ubichat"
compute_subnet_ids = ["subnet-xxx", "subnet-yyy"]
vpc_id = "vpc-zzz"
image_uri = "<ACCOUNT_ID>.dkr.ecr.us-east-1.amazonaws.com/my-new-service:v0.1.0"
}
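On the module side, these inputs need matching variable declarations. The following is a sketch only; the types and the validation rule are assumptions, not the actual module code.

```hcl
variable "name" {
  type = string
}

variable "compute_subnet_ids" {
  type = list(string)
}

variable "vpc_id" {
  type = string
}

variable "image_uri" {
  description = "Fully qualified ECR image reference, including a tag."
  type        = string

  # Assumed guard: reject anything that is not <account>.dkr.ecr.<region>.amazonaws.com/<repo>:<tag>.
  validation {
    condition     = can(regex("^[0-9]{12}\\.dkr\\.ecr\\.[a-z0-9-]+\\.amazonaws\\.com/[^:]+:[^:]+$", var.image_uri))
    error_message = "The image_uri value must be a tagged ECR image URI (registry/repository:tag)."
  }
}
```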
The module references a shared ECS task module:
module "my_service" {
source = "git::ssh://git@gitlab-ncsa.ubisoft.org/the-foundry/admin/tf-modules/aws-ecs-task?ref=<commit>"
environment = var.environment
project_name = var.name
name = "my-service"
cluster_name = data.aws_ecs_cluster.ecs.cluster_name
namespace = data.aws_service_discovery_http_namespace.ecs.arn
subnets = var.compute_subnet_ids
# ...
}
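The data.aws_ecs_cluster.ecs and data.aws_service_discovery_http_namespace.ecs references above imply lookups along these lines. The cluster and namespace names are placeholders, since the real values are not given.

```hcl
# Look up the existing ECS cluster by name.
data "aws_ecs_cluster" "ecs" {
  cluster_name = "example-cluster" # placeholder
}

# Look up the existing Cloud Map HTTP namespace by name; its arn feeds the module.
data "aws_service_discovery_http_namespace" "ecs" {
  name = "example-namespace" # placeholder
}
```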
Full deploy workflow (example: atlassian-mcp-servers)
# Each cd below is assumed to start from the directory that holds all three checkouts.

# 1. Create the ECR repository (after adding it to repository_list)
cd terragrunt-aws/live/dev/ecr
terragrunt apply

# 2. Log in to ECR, then build and push one image per server
cd atlassian-mcp-servers
aws ecr get-login-password --region us-east-1 | \
  docker login --username AWS --password-stdin <ACCOUNT_ID>.dkr.ecr.us-east-1.amazonaws.com
docker build --build-arg SERVER=jira \
  -t <ACCOUNT_ID>.dkr.ecr.us-east-1.amazonaws.com/atlassian-mcp-servers:jira-mcp-v0.1.0 .
docker push <ACCOUNT_ID>.dkr.ecr.us-east-1.amazonaws.com/atlassian-mcp-servers:jira-mcp-v0.1.0
docker build --build-arg SERVER=confluence \
  -t <ACCOUNT_ID>.dkr.ecr.us-east-1.amazonaws.com/atlassian-mcp-servers:confluence-mcp-v0.1.0 .
docker push <ACCOUNT_ID>.dkr.ecr.us-east-1.amazonaws.com/atlassian-mcp-servers:confluence-mcp-v0.1.0

# 3. Deploy the ECS tasks that reference the pushed tags
cd ubichat-terragrunt-aws/live/dev/atlassian-mcp-servers
terragrunt apply
Multi-Region (v6+)
Prefer top-level region on resources/data sources over aliased providers:
resource "aws_vpc" "west" {
region = "us-west-2"
cidr_block = "10.1.0.0/16"
}
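For contrast, a sketch of both styles. The resource names and the us-east-1 CIDR are illustrative.

```hcl
# Pre-v6 style: one aliased provider per extra region.
provider "aws" {
  alias  = "west"
  region = "us-west-2"
}

resource "aws_vpc" "west_aliased" {
  provider   = aws.west
  cidr_block = "10.1.0.0/16"
}

# v6 style: the region argument is an ordinary attribute, so it can be
# driven by for_each. Terraform's provider meta-argument does not allow that.
resource "aws_vpc" "regional" {
  for_each = {
    "us-east-1" = "10.0.0.0/16"
    "us-west-2" = "10.1.0.0/16"
  }

  region     = each.key
  cidr_block = each.value
}
```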
Import in a specific region:
terraform import aws_vpc.test_vpc vpc-a01106c2@eu-west-1
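A declarative equivalent, assuming Terraform 1.5+ import blocks and that the @region suffix is accepted in the id the same way as on the command line. The CIDR is a placeholder.

```hcl
import {
  to = aws_vpc.test_vpc
  id = "vpc-a01106c2@eu-west-1"
}

resource "aws_vpc" "test_vpc" {
  region     = "eu-west-1"
  cidr_block = "10.2.0.0/16" # placeholder; must match the real VPC
}
```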
AWS-Specific Pitfalls
- IAM propagation delay: role/policy changes take seconds to propagate; expect transient auth errors after create.
- Eventual consistency: "not found" errors immediately after create are normal for IAM, networking, S3.
- Resource replacement: immutable attribute changes trigger destroy+create; always review the plan for "# forces replacement".
- ignore_changes drift: a broad ignore masks real drift; scope it narrowly and document why.
- ECR manual creation: never create ECR repos manually. They need KMS encryption, scan-on-push, and cross-account pull policies that the Terraform module provides. Manual repos block prod deployments.
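To make the ignore_changes advice concrete, here is a sketch of a narrowly scoped ignore next to the broad form to avoid. The service and its input variables are hypothetical.

```hcl
variable "cluster_arn" {
  type = string
}

variable "task_definition_arn" {
  type = string
}

resource "aws_ecs_service" "example" {
  name            = "example"
  cluster         = var.cluster_arn
  task_definition = var.task_definition_arn
  desired_count   = 1

  lifecycle {
    # Autoscaling owns desired_count after creation; ignore only that attribute.
    ignore_changes = [desired_count]

    # Avoid: ignore_changes = all
    # It hides every out-of-band change, including real drift.
  }
}
```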
ignore_tags for External Controllers
Use when external systems (e.g. Kubernetes) own tag namespaces:
provider "aws" {
ignore_tags {
key_prefixes = ["kubernetes.io/"]
}
}
Retry for Transient API Errors
provider "aws" {
region = "us-east-1"
max_retries = 25
retry_mode = "standard"
}