Name: Data Engineering Lifecycle And Principles
Author: AlexYedi

Apply Reis & Housley's Data Engineering Lifecycle (Generation → Storage → Ingestion → Transformation → Serving) plus the six undercurrents (Security, Data Management, DataOps, Data Architecture, Orchestration, Software Engineering) and the nine architecture principles (common components, plan for failure, scalability, leadership, always architecting, loose coupling, reversibility, security, FinOps). Use when scoping a new data platform, diagnosing why a data system is failing, deciding what role / team structure a company needs, or evaluating maturity. Triggers: "build a data platform", "are we doing data engineering right", "what's the data engineering lifecycle", "data team structure", "data maturity", "data engineering principles", "data engineer vs data scientist".

updated:May 6, 2026 at 19:27

Repository: AlexYedi/alex-agents-skills

Data Engineering Lifecycle and Principles

You apply Joe Reis & Matt Housley's framework from Fundamentals of Data Engineering: data engineering is a discipline organized around a lifecycle (end-to-end flow of data) and undercurrents (cross-cutting concerns), guided by nine architecture principles that hold across cloud-native data systems.

When to Use This Skill

Scoping a new data platform from scratch — what stages and undercurrents must be addressed
Diagnosing why an existing data team is firefighting — likely a missing undercurrent
Deciding team structure: when to split data engineer / analytics engineer / data scientist roles
Evaluating organizational data maturity (Starting / Scaling / Leading)
Greenfield architecture review against the nine principles
Educating a non-data leader on what data engineering actually is

The Lifecycle (Five Stages)

GENERATION → STORAGE → INGESTION → TRANSFORMATION → SERVING
                                                       │
                                                       ├── Analytics (BI, dashboards)
                                                       ├── ML (training data, features)
                                                       └── Reverse ETL (back to SaaS)

Each stage has its own concerns. Storage is the exception to the linear picture: it underpins ingestion, transformation, and serving, because data is stored at every transition.

Stage	Primary concern	Common pitfalls
Generation	Source system reliability, schema, volume	Engineers ignore upstream systems; "we'll deal with it downstream"
Storage	Cost vs latency tradeoff; retention; format	Storing raw forever; one-size-fits-all storage
Ingestion	Frequency (batch vs streaming), idempotency, ordering	Building bespoke pipelines instead of using CDC tooling
Transformation	Schema evolution, lineage, idempotency, testability	Untested SQL; no rerun safety; no semantic layer
Serving	Query patterns, freshness SLAs, access control	Serving raw data without modeling; no semantic layer

The Six Undercurrents

Cross-cutting concerns that span every lifecycle stage. A weakness in any undercurrent eventually breaks the lifecycle.

Undercurrent	What it covers	Failure looks like
Security	Access control, encryption, secrets, audit, least privilege	Compliance fire drill; data leaks; broad-default access
Data Management	Lineage, governance, quality, metadata, master data management (MDM)	"Where does this number come from?" with no answer
DataOps	Monitoring, SLOs, deployments, on-call for data	3am pages with no runbook; silent data quality failures
Data Architecture	Platform design, integration patterns, build/buy decisions	Tool sprawl; bespoke everything; can't migrate
Orchestration	DAG management, dependency tracking, retries, scheduling	Cron jobs everywhere; no dependency graph; ad-hoc reruns
Software Engineering	Code quality, testing, version control, IaC, CI/CD	Spreadsheets; SQL in Slack; no PRs

The discipline diagnostic: if your team is missing one undercurrent, that's where the next outage / compliance issue / migration disaster will originate.
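The Orchestration row contrasts scattered cron jobs with an explicit dependency graph. A minimal sketch of what that graph buys you, using only the Python standard library; the task names in `PIPELINE` are hypothetical and this is not how any particular orchestrator is implemented:

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical pipeline: each task maps to the set of tasks it depends on.
PIPELINE = {
    "ingest_orders": set(),
    "ingest_customers": set(),
    "stg_orders": {"ingest_orders"},
    "stg_customers": {"ingest_customers"},
    "fct_revenue": {"stg_orders", "stg_customers"},
    "revenue_dashboard": {"fct_revenue"},
}


def run_order(graph):
    """Return one valid execution order; raises graphlib.CycleError on a cycle."""
    return list(TopologicalSorter(graph).static_order())


def downstream_of(graph, task):
    """Return every task that must be rerun when `task` is rerun."""
    affected = set()
    frontier = [task]
    while frontier:
        current = frontier.pop()
        for node, deps in graph.items():
            if current in deps and node not in affected:
                affected.add(node)
                frontier.append(node)
    return affected
```

With the graph written down, "what do I rerun after fixing `stg_orders`?" becomes a function call rather than tribal knowledge.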

The Nine Architecture Principles

Source: Reis & Housley, who distill these principles from existing cloud architecture frameworks (the AWS Well-Architected Framework and Google Cloud's Five Principles for Cloud-Native Architecture) and add their own synthesis.

1. Choose Common Components Wisely

Pick widely adopted, interoperable building blocks. Avoid bespoke when commodity will do.

In practice: Postgres before custom-built. Airflow / Dagster / Prefect before homegrown DAG runner. Parquet / Iceberg / Delta before proprietary formats.

2. Plan for Failure

Every component fails. Define availability, durability, RTO (Recovery Time Objective), RPO (Recovery Point Objective) explicitly.

In practice: Multi-region for tier-1 data. Idempotent writes everywhere. Automated runbooks. Practiced restores, not theoretical ones.
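"Idempotent writes" means a retried batch leaves the table in the same state as a single successful run. A minimal sketch, assuming an in-memory SQLite database and a hypothetical `events` table keyed by `event_id`; a warehouse would typically use its own MERGE or upsert statement instead:

```python
import sqlite3


def load_batch(conn, rows):
    """Write rows keyed by event_id; replaying the same batch changes nothing."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS events ("
        "event_id TEXT PRIMARY KEY, amount REAL NOT NULL)"
    )
    with conn:  # commits on success, rolls back if an exception is raised
        conn.executemany(
            "INSERT OR REPLACE INTO events (event_id, amount) VALUES (?, ?)",
            rows,
        )


def row_count(conn):
    """Number of rows currently in the events table."""
    return conn.execute("SELECT COUNT(*) FROM events").fetchone()[0]


conn = sqlite3.connect(":memory:")
batch = [("evt-1", 10.0), ("evt-2", 25.5)]
load_batch(conn, batch)
load_batch(conn, batch)  # simulated retry after a partial failure
```

Because the primary key decides whether a row is new, the retry overwrites rather than duplicates, so the pipeline can be rerun without a cleanup step.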

3. Architect for Scalability

Scale up and scale down. Scale to zero is a feature; idle resources are waste.

In practice: Serverless for spiky loads. Auto-scaling for sustained ones. Separate compute from storage so each scales independently.

4. Architecture Is Leadership

Architects are senior engineers, not ivory-tower designers. They mentor, document, and decentralize decisions.

In practice: Architects on PRs. Architects writing ADRs. Architects pairing with juniors, not gatekeeping releases.

5. Always Be Architecting

The target architecture is a moving target. Continuously refactor; don't wait for "the rewrite."

In practice: Quarterly architecture reviews. Strangler-fig migrations. No "freeze for 6 months while we redesign."

6. Build Loosely Coupled Systems

Components communicate via abstraction layers (APIs, events, queues). Internal change is invisible to consumers.

In practice: Contract-tested APIs. Event-driven where producers and consumers shouldn't know each other. No shared database mutations between services.
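A small illustration of producers and consumers that do not know each other. In this sketch an in-process `queue.Queue` stands in for a real message broker, and the `order_created` event shape is a hypothetical contract chosen for the example:

```python
import json
import queue

# The queue stands in for a broker; the JSON payload is the shared contract.
events = queue.Queue()


def publish_order_created(order_id, total_cents):
    """Producer: emits an event and knows nothing about who consumes it."""
    payload = {
        "type": "order_created",
        "order_id": order_id,
        "total_cents": total_cents,
    }
    events.put(json.dumps(payload))


def consume_revenue(q):
    """Consumer: depends only on the event fields, not on producer internals."""
    revenue_cents = 0
    while not q.empty():
        event = json.loads(q.get())
        if event["type"] == "order_created":
            revenue_cents += event["total_cents"]
    return revenue_cents
```

The producer can change its database, language, or deployment without touching the consumer, as long as the event fields stay stable.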

7. Make Reversible Decisions

Default to choices that can be undone. The cost of being wrong should be hours, not years.

In practice: Open formats (Parquet over proprietary). Cloud-portable infrastructure (Terraform, Kubernetes). Multi-cloud-friendly storage abstractions. Don't sign 5-year DW contracts on month 1.

8. Prioritize Security

Security is part of design, not a final-stage checklist. Least privilege, zero trust, encryption at rest and in transit, secrets management.

In practice: No service accounts with admin. No long-lived credentials. Audit logs for every PII access. Secrets in a vault, not env vars in YAML.

9. Embrace FinOps

Cloud is consumption-based. Treat cost as a continuous engineering concern, not an end-of-quarter spreadsheet shock.

In practice: Cost-per-query dashboards. Tagged workloads. Auto-suspend for idle warehouse compute. Engineering KPI: $/business-outcome, not $/raw-resource.
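The $/business-outcome KPI is simple arithmetic once workloads are tagged. A rough sketch; the billing rows, tag names, and outcome counts below are made-up illustrative data, not figures from the source:

```python
from collections import defaultdict

# Hypothetical billing export for one period: (workload tag, cost in USD).
BILLING_ROWS = [
    ("exec_dashboards", 120.0),
    ("exec_dashboards", 80.0),
    ("churn_model", 300.0),
    ("untagged", 50.0),
]

# Hypothetical outcome counts per workload (reports viewed, predictions served).
OUTCOMES = {"exec_dashboards": 400, "churn_model": 60}


def cost_by_tag(rows):
    """Sum cost per workload tag."""
    totals = defaultdict(float)
    for tag, cost in rows:
        totals[tag] += cost
    return dict(totals)


def cost_per_outcome(rows, outcomes):
    """USD per business outcome; workloads with no recorded outcome map to None."""
    result = {}
    for tag, total in cost_by_tag(rows).items():
        count = outcomes.get(tag, 0)
        result[tag] = total / count if count else None
    return result
```

A `None` result is itself a finding: spend that cannot be tied to any outcome, such as the `untagged` bucket here, is the first place to look.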

Data Maturity Model (Three Stages)

Stage	Characteristics	Right move	Wrong move
Starting with data	Ad hoc requests, spreadsheets, single analyst, no platform	Build the foundation: warehouse + ETL + dashboards. Defer ML.	Hire a data scientist as the first data hire.
Scaling with data	Multiple teams, formal practices emerging, data platform exists, governance gaps	Invest in undercurrents: lineage, quality, DataOps. Establish ownership.	Buy 8 vendor tools to "modernize" before fixing process.
Leading with data	Mature platform, data products, decentralized ownership (often Data Mesh), constant maintenance	Optimize, extend, govern. Push compute and decisions to domains.	Build a central monolith because "consistency".

Diagnostic: Maturity is determined by how the team operates, not how much it spends. A 3-person team using dbt + Snowflake + Airflow can be at "Scaling"; a 50-person team firefighting can be at "Starting".

Data Engineer Roles — Type A vs Type B

Type A (Abstraction): Builds on managed services and common tooling. Composes a stack from existing parts. The right hire for most companies.

Type B (Build): Develops custom tools and platforms when off-the-shelf doesn't fit (FAANG-scale problems, novel domains).

A second axis is who consumes the work. Internal-facing: BI, executive dashboards, finance reports. Stable schemas; correctness over latency.

External-facing: Product analytics, transactional event data, customer-visible dashboards. Latency-sensitive; reliability over fancy modeling.

Hiring rule: Identify Type A vs Type B and Internal vs External before writing the job description. Confusing these produces bad hires and frustrated engineers.

Principles

Lifecycle thinking beats stage thinking. A failure at Serving usually originates in Generation or Storage. Trace upstream.
Undercurrents are non-negotiable. Missing one will eventually outweigh all the gains from your shiniest pipeline tool.
Business value > technology novelty. A boring Postgres + cron stack that ships analytics weekly beats a bespoke streaming Kappa setup that's 6 months from production.
Reversibility > optimization. Optimizing for a wrong assumption is worse than carrying a 20% inefficiency that you can refactor away.
Cost is a design dimension. $5K/month of unattended Snowflake warehouse is $60K/year, a real fraction of an engineer's cost. Treat it accordingly.
Data quality is a feature, not a phase. Build assertions, monitors, and data contracts in from day 1. Adding them later is 10x harder.
Production-grade software discipline applies. Tests, version control, code review, IaC. "It's just a notebook" is an anti-pattern.
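The data-quality principle above mentions assertions and data contracts. A minimal sketch of a contract check at ingestion, assuming a hypothetical `orders` feed; the field names and the `FieldRule` helper are invented for illustration and do not come from any specific contract tool:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldRule:
    name: str
    type_: type
    required: bool = True


# Hypothetical contract for an "orders" feed.
ORDERS_CONTRACT = [
    FieldRule("order_id", str),
    FieldRule("amount_cents", int),
    FieldRule("coupon", str, required=False),
]


def violations(record, contract):
    """Return human-readable contract violations for one record."""
    problems = []
    for rule in contract:
        value = record.get(rule.name)
        if value is None:
            if rule.required:
                problems.append(f"missing required field: {rule.name}")
        elif not isinstance(value, rule.type_):
            problems.append(
                f"{rule.name}: expected {rule.type_.__name__}, "
                f"got {type(value).__name__}"
            )
    return problems
```

Running a check like this on every batch at the ingestion boundary turns a silent downstream failure into an explicit, attributable rejection.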

Anti-Patterns

Premature Machine Learning

Looks like: Hiring a data scientist before you have reliable ETL. Promising "AI features" with no production-quality training data.

Why it fails: The Data Science Hierarchy of Needs is real. An estimated 70-80% of a data scientist's time goes to foundational work (collecting, cleaning, and processing data) that data engineering should own. Skipping the foundation produces unreliable models and frustrated data science hires.

The fix: Foundation first. Lifecycle stages 1-4 must be reliable before stage 5 (Serving for ML) is meaningful.

Tool-First Architecture

Looks like: "We need Snowflake/Databricks/Airflow because everyone has it." Tool selected before use case understood.

Why it fails: Tools don't solve undefined problems. Your team ends up customizing the tool to fit the unclear use case, multiplying effort.

The fix: Use case → access patterns → architecture → tool. In that order.

"Big Data" as the Default Frame

Looks like: Treating every dataset as if it were petabyte-scale. Choosing distributed everything.

Why it fails: Most companies have small data. Distributed systems are 10x more operational burden than a single large machine.

The fix: Measure your actual data volume and growth. A single Postgres instance is enough for most business datasets. Don't reach for Spark until you have to.

One-Size-Fits-All Storage

Looks like: Everything in one warehouse, or everything in one lake, or everything in one database.

Why it fails: Workloads have different access patterns, latency requirements, and cost profiles. Forcing all into one tier overpays for some and underdelivers for others.

The fix: Layered storage. Hot path (low latency, expensive). Warm path (queryable, moderate cost). Cold path (archival, cheap). Each workload uses the appropriate tier.
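One simple way to route data between tiers is by how recently it was accessed. A sketch under stated assumptions: the 7-day and 90-day thresholds are illustrative placeholders, not recommendations from the source, and real policies usually also weigh query latency needs and retrieval cost:

```python
from datetime import date

# Thresholds are illustrative assumptions; tune them to your access patterns.
HOT_MAX_AGE_DAYS = 7
WARM_MAX_AGE_DAYS = 90


def storage_tier(last_accessed, today):
    """Pick a tier from the number of days since the data was last accessed."""
    age_days = (today - last_accessed).days
    if age_days <= HOT_MAX_AGE_DAYS:
        return "hot"
    if age_days <= WARM_MAX_AGE_DAYS:
        return "warm"
    return "cold"
```

For example, with `today = date(2024, 6, 30)`, a partition last read on `date(2024, 6, 28)` stays hot, while one last read on `date(2023, 1, 1)` moves to cold storage.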

Architects as Gatekeepers

Looks like: Architects produce designs from a tower; engineers implement them and find they don't fit reality.

Why it fails: Architects lose touch with the actual system. Designs miss constraints. Engineers feel disempowered. Architecture decisions ossify.

The fix: Architects on the team, in PRs, doing some implementation. Architects mentor and decentralize, not approve and gate.

Static Architecture

Looks like: "We did the architecture review last year. We'll redo it next year."

Why it fails: Business changes faster than annual reviews. Drift accumulates. The "next architecture rewrite" balloons until it's existential.

The fix: Always Be Architecting. Quarterly small refactors. Strangler-fig migrations. Never a 6-month freeze.

Cost as Afterthought

Looks like: End-of-quarter shock at the cloud bill. Engineers rewriting last quarter's code for cost.

Why it fails: Cost surfaces are far from the engineer who controls them. By the time finance complains, the architecture is locked in.

The fix: Cost telemetry on every workload. Engineering KPI on $/outcome. Tag everything. Auto-suspend idle resources.

Decision Rules

Situation	Action
Greenfield data platform, < 50 person company	Cloud DW (Snowflake / BigQuery / Redshift) + dbt + managed orchestrator. Skip custom.
Real-time analytics required	Confirm "real time" is genuinely sub-second; usually "near-real-time" (1-10 min) is enough and 10x simpler.
First data hire	Type A internal-facing data engineer. Not data scientist. Not architect.
ML team requesting features	Confirm DE foundation is solid. If not, fix that first.
Multiple teams want different DWs	Either Data Mesh (decentralized, but requires maturity) or one DW with strong governance. Don't pretend "5 DWs is fine, they'll integrate later."
Vendor proposing 3-year contract	Reject. Reversibility principle. Negotiate a 1-year term with extension options.
Tool sprawl complaint	Audit by undercurrent. Likely some undercurrent has 4 tools that should be 1.
"We need a data scientist"	Verify lifecycle stages 1-4 are reliable. If not, hire a data engineer first.
Annual architecture review only	Switch to quarterly. Always Be Architecting.
Cloud bill surprise	Establish cost telemetry. Tag every workload. Set per-workload budget alerts.
Data quality issue blamed on engineer	Likely a missing undercurrent (governance, lineage, monitoring). Fix the system, not the person.

Worked Example: Diagnosing a Failing Data Team

Context: Series-B startup, 50 employees, 4-person data team. Constant fires: stale dashboards, exec frustration, two recent compliance escalations.

Lifecycle audit:

Stage	State	Issue
Generation	OK	Production app emits clean events
Storage	Cluttered	One Snowflake DB, no schemas, no governance
Ingestion	Bespoke	Custom Python scripts on cron; no idempotency; no monitoring
Transformation	Untested	Inline SQL in Looker; no version control
Serving	Slow	Looker queries hit raw tables; 2-min response time

Undercurrents audit:

Undercurrent	State	Issue
Security	Failing	Service account with admin; PII in shared schemas
Data Management	Failing	No lineage, no MDM, "what is a customer" varies
DataOps	Failing	No SLOs; no monitoring; on-call is one engineer's Slack
Data Architecture	OK	Snowflake choice was sound
Orchestration	Failing	Cron + Python; no DAG visibility
Software Engineering	Failing	No PR review on transformations; no tests; no IaC

Diagnosis: Architecture is OK. Five of six undercurrents are failing. This is why fires don't stop.
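The audit table above can be kept as data so the diagnosis is reproducible at each review. A minimal sketch that encodes the states from the table; the helper names `failing` and `summary` are hypothetical:

```python
# States copied from the undercurrents audit table.
UNDERCURRENT_AUDIT = {
    "Security": "Failing",
    "Data Management": "Failing",
    "DataOps": "Failing",
    "Data Architecture": "OK",
    "Orchestration": "Failing",
    "Software Engineering": "Failing",
}


def failing(audit):
    """Return the undercurrents whose state is anything other than OK."""
    return [name for name, state in audit.items() if state != "OK"]


def summary(audit):
    """One-line diagnosis suitable for a review document."""
    bad = failing(audit)
    return f"{len(bad)} of {len(audit)} undercurrents failing: {', '.join(bad)}"
```

Re-running the same audit each quarter gives a trend line instead of a one-off opinion.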

Plan (six months, in priority order):

1. DataOps: Adopt dbt + Airflow. Move SQL to version control. Add tests on every model. (Wins: the Ingestion and Transformation stages, plus the Orchestration and Software Engineering undercurrents.)
2. Data Management: Define data product owners. Adopt a metadata catalog (DataHub / Atlan / OpenMetadata). Establish data contracts at ingestion.
3. Security: Audit access. Remove admin service accounts. Mask PII in non-prod. Enforce least privilege.
4. DataOps continued: Add SLOs on freshness and quality. Monitor with synthetic queries. Establish an on-call runbook.
5. Serving: Build a semantic layer. Materialize core dimensional models. Move Looker off raw tables.
6. Architecture review: Once the fires are out, run quarterly architecture reviews to keep the undercurrents healthy.
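The plan's freshness SLOs can start as a very small check. A sketch, assuming load timestamps are already available per table; the table names, SLO values, and the fixed `NOW` are hypothetical sample data, and a missing timestamp is treated as a breach:

```python
from datetime import datetime, timedelta, timezone


def freshness_breaches(last_loaded, slos, now):
    """Return {table: lag} for tables whose lag exceeds their SLO.

    A table with no recorded load is reported with a lag of None.
    """
    breaches = {}
    for table, max_lag in slos.items():
        loaded_at = last_loaded.get(table)
        lag = now - loaded_at if loaded_at is not None else None
        if lag is None or lag > max_lag:
            breaches[table] = lag
    return breaches


# Hypothetical sample data; a fixed "now" keeps the example deterministic.
NOW = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
LAST_LOADED = {
    "orders": NOW - timedelta(minutes=30),
    "customers": NOW - timedelta(hours=5),
}
SLOS = {
    "orders": timedelta(hours=1),
    "customers": timedelta(hours=2),
    "payments": timedelta(hours=1),
}
BREACHES = freshness_breaches(LAST_LOADED, SLOS, NOW)
```

Here `customers` breaches its two-hour SLO and `payments` has never loaded, which are exactly the "stale dashboard" causes the team was previously discovering from executives.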

Lesson: Tool changes are the easy part. The undercurrents are the work. A data team that's "constantly behind" is almost always missing undercurrents, not architectural pieces.

Gotchas

Lifecycle is not waterfall. Stages happen continuously and concurrently. The diagram is conceptual.
Undercurrents aren't optional even at small scale. A 2-person team still needs DataOps; it just looks different (a Slack channel + dbt tests instead of a 24/7 on-call).
"Type A vs Type B" is not seniority. Both are senior roles, just with different focus. Confusing them is a hiring error.
Maturity is per-undercurrent. A team can be Leading on Architecture and Starting on Security simultaneously. Audit each independently.
The 9 principles can conflict. "Plan for failure" (multi-region) is in tension with "FinOps" (single-region is cheaper). Architecture is making the tradeoff explicit.
Reversibility is a tax. Reversible choices are sometimes 10-20% slower or pricier than locked-in ones. The principle says: pay the tax for important decisions, not for trivial ones.
"Data engineer" is not a unified role. Different companies mean different things. Read the job description carefully — it could be ETL plumber, analytics engineer, or platform builder.
