Manus에서 모든 스킬 실행
원클릭으로
원클릭으로
원클릭으로 Manus에서 모든 스킬 실행
시작하기data-warehousing
Snowflake, BigQuery, Redshift, dimensional modeling, and modern data warehouse architecture
스타4
포크1
업데이트2025년 12월 30일 12:44
파일 탐색기
6 개 파일SKILL.md
readonly메뉴
Snowflake, BigQuery, Redshift, dimensional modeling, and modern data warehouse architecture
FastAPI, REST APIs, GraphQL, data service design, and API best practices
Apache Spark, Hadoop, distributed computing, and large-scale data processing for petabyte-scale workloads
Portfolio building, technical interviews, job search strategies, and continuous learning
GitHub Actions, GitLab CI, Jenkins, and automated deployment pipelines
AWS, GCP, Azure data platforms, infrastructure as code, and cloud-native data solutions
Docker, Kubernetes, container orchestration, and cloud-native deployment for data applications
| name | data-warehousing |
| description | Snowflake, BigQuery, Redshift, dimensional modeling, and modern data warehouse architecture |
| sasmp_version | 1.3.0 |
| bonded_agent | 01-data-engineer |
| bond_type | PRIMARY_BOND |
| skill_version | 2.0.0 |
| last_updated | 2025-01 |
| complexity | intermediate |
| estimated_mastery_hours | 130 |
| prerequisites | ["sql-databases"] |
| unlocks | ["big-data","etl-tools"] |
Production-grade data warehouse design with Snowflake, BigQuery, and dimensional modeling patterns.
-- Snowflake Modern Data Warehouse Setup
CREATE WAREHOUSE analytics_wh
WITH WAREHOUSE_SIZE = 'MEDIUM'
AUTO_SUSPEND = 300
AUTO_RESUME = TRUE
MIN_CLUSTER_COUNT = 1
MAX_CLUSTER_COUNT = 4;
-- Create dimensional model
CREATE TABLE marts.fact_orders (
order_key BIGINT AUTOINCREMENT PRIMARY KEY,
date_key INT NOT NULL REFERENCES dim_date(date_key),
customer_key INT NOT NULL,
product_key INT NOT NULL,
quantity INT NOT NULL,
unit_price DECIMAL(10,2) NOT NULL,
total_amount DECIMAL(12,2) NOT NULL,
_loaded_at TIMESTAMP_NTZ DEFAULT CURRENT_TIMESTAMP()
) CLUSTER BY (date_key);
-- Dimension with SCD Type 2
CREATE TABLE marts.dim_customer (
customer_key INT AUTOINCREMENT PRIMARY KEY,
customer_id VARCHAR(50) NOT NULL,
customer_name VARCHAR(255),
segment VARCHAR(50),
valid_from DATE NOT NULL,
valid_to DATE DEFAULT '9999-12-31',
is_current BOOLEAN DEFAULT TRUE
);
-- Star Schema Design
-- Fact table: measurable business events
-- Dimension tables: context for analysis
-- Date dimension (conformed)
CREATE TABLE dim_date (
date_key INT PRIMARY KEY,
full_date DATE NOT NULL,
day_of_week INT,
day_name VARCHAR(10),
month_num INT,
month_name VARCHAR(10),
quarter INT,
year INT,
is_weekend BOOLEAN,
fiscal_year INT,
fiscal_quarter INT
);
-- SCD Type 2 MERGE pattern
MERGE INTO dim_customer AS target
USING staging_customer AS source
ON target.customer_id = source.customer_id AND target.is_current = TRUE
WHEN MATCHED AND (
target.customer_name != source.customer_name OR
target.segment != source.segment
) THEN UPDATE SET valid_to = CURRENT_DATE - 1, is_current = FALSE
WHEN NOT MATCHED THEN INSERT (
customer_id, customer_name, segment, valid_from
) VALUES (
source.customer_id, source.customer_name, source.segment, CURRENT_DATE
);
-- Clustering for performance
ALTER TABLE fact_orders CLUSTER BY (date_key, customer_key);
SELECT SYSTEM$CLUSTERING_INFORMATION('fact_orders');
-- Materialized views for aggregations
CREATE MATERIALIZED VIEW mv_daily_sales AS
SELECT date_key, SUM(total_amount) AS daily_revenue, COUNT(*) AS order_count
FROM fact_orders GROUP BY date_key;
-- Search optimization
ALTER TABLE fact_orders ADD SEARCH OPTIMIZATION ON EQUALITY(order_id);
-- Time travel for debugging
SELECT * FROM fact_orders AT(TIMESTAMP => '2024-01-15 10:00:00'::TIMESTAMP);
-- Zero-copy cloning
CREATE TABLE fact_orders_dev CLONE fact_orders;
-- Partitioned and clustered table
CREATE TABLE `project.dataset.fact_events`
PARTITION BY DATE(event_timestamp)
CLUSTER BY user_id, event_type
OPTIONS (partition_expiration_days = 365, require_partition_filter = TRUE)
AS SELECT * FROM source_events;
-- Efficient query with partition pruning
SELECT event_type, COUNT(*) AS event_count
FROM `project.dataset.fact_events`
WHERE DATE(event_timestamp) BETWEEN '2024-01-01' AND '2024-01-31'
GROUP BY event_type;
-- BigQuery ML inline
CREATE OR REPLACE MODEL `project.dataset.churn_model`
OPTIONS (model_type = 'LOGISTIC_REG', input_label_cols = ['churned'])
AS SELECT tenure_months, monthly_spend, churned FROM customer_features;
| Tool | Purpose | Version (2025) |
|---|---|---|
| Snowflake | Cloud data warehouse | Latest |
| BigQuery | Serverless analytics | Latest |
| Redshift | AWS data warehouse | Serverless |
| Databricks SQL | Lakehouse analytics | Latest |
| dbt | Transformation | 1.7+ |
| Monte Carlo | Data observability | Latest |
| Issue | Symptoms | Root Cause | Fix |
|---|---|---|---|
| Slow Query | Query timeout | No clustering | Add clustering key |
| High Cost | Budget exceeded | Large warehouse | Auto-suspend, right-size |
| Data Skew | Uneven processing | Poor partition key | Choose better key |
-- ✅ DO: Use surrogate keys
customer_key INT AUTOINCREMENT PRIMARY KEY
-- ✅ DO: Add audit columns
_loaded_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP()
-- ✅ DO: Cluster on filter columns
CLUSTER BY (date_key)
-- ❌ DON'T: Use natural keys as PK
-- ❌ DON'T: SELECT * in production
Skill Certification Checklist: