name	architecture-review
description	Architecture evaluation criteria and technology standards for the homelab. Preloaded into the designer agent to ground design decisions in established patterns and principles. Use when: (1) Evaluating a proposed technology addition, (2) Reviewing architecture decisions, (3) Assessing stack fit for a new component, (4) Comparing implementation approaches. Triggers: "architecture review", "evaluate technology", "stack fit", "should we use", "technology comparison", "design review", "architecture decision"
user-invocable	false

Architecture Evaluation Framework

Current Technology Stack

See references/technology-decisions.md for the current technology stack.

Evaluation Criteria

When evaluating any proposed technology addition or architecture change, score against these criteria:

1. Principle Alignment

Score each core principle (Strong/Weak/Neutral):

Enterprise at Home: Does it reflect production-grade patterns?
Everything as Code: Can it be fully represented in git?
Automation is Key: Does it reduce or increase manual toil?
Learning First: Does it teach valuable enterprise skills?
DRY and Code Reuse: Does it leverage existing patterns or create duplication?
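If a review is recorded programmatically, the Strong/Weak/Neutral tally can be captured in a few lines. The following is a minimal Python sketch: the `Score` enum and the `principle_alignment` helper are hypothetical, and the rubric does not prescribe how the counts translate into an overall verdict.

```python
from collections import Counter
from enum import Enum


class Score(Enum):
    STRONG = "Strong"
    NEUTRAL = "Neutral"
    WEAK = "Weak"


# The five core principles listed above.
PRINCIPLES = (
    "Enterprise at Home",
    "Everything as Code",
    "Automation is Key",
    "Learning First",
    "DRY and Code Reuse",
)


def principle_alignment(scores: dict[str, Score]) -> Counter:
    """Tally Strong/Neutral/Weak scores, requiring every principle to be scored exactly once."""
    missing = [p for p in PRINCIPLES if p not in scores]
    if missing:
        raise ValueError(f"unscored principles: {missing}")
    unknown = set(scores) - set(PRINCIPLES)
    if unknown:
        raise ValueError(f"unknown principles: {sorted(unknown)}")
    return Counter(scores[p] for p in PRINCIPLES)
```

Rejecting partial score sheets forces the reviewer to take a position on every principle instead of silently skipping the uncomfortable ones.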

2. Stack Fit

Does this overlap with existing tools? (e.g., adding Redis when Dragonfly exists)
Does it integrate with the GitOps workflow? (Must be Flux-deployable)
Does it work on bare-metal? (No cloud-only services)
Does it support the multi-cluster model? (dev → integration → live)
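The stack-fit questions behave like pass/fail gates. A minimal Python sketch, assuming a hypothetical `Proposal` record and a hand-maintained map of what each existing component provides (capability labels such as "redis-protocol" are illustrative and are not taken from references/technology-decisions.md):

```python
from dataclasses import dataclass


@dataclass
class Proposal:
    """Hypothetical description of a proposed component."""
    name: str
    capabilities: set[str]
    flux_deployable: bool
    runs_on_bare_metal: bool
    supports_multi_cluster: bool  # can be promoted dev -> integration -> live


def stack_fit_issues(proposal: Proposal, existing: dict[str, set[str]]) -> list[str]:
    """Return stack-fit problems; an empty list means every gate passed.

    `existing` maps each current stack component to the capabilities it provides.
    """
    issues = []
    # Overlap check: any shared capability means an existing tool may already suffice.
    for tool, caps in sorted(existing.items()):
        overlap = proposal.capabilities & caps
        if overlap:
            issues.append(f"overlaps with {tool}: {sorted(overlap)}")
    if not proposal.flux_deployable:
        issues.append("not deployable via Flux")
    if not proposal.runs_on_bare_metal:
        issues.append("requires cloud-only services")
    if not proposal.supports_multi_cluster:
        issues.append("does not fit dev -> integration -> live promotion")
    return issues
```

Applied to the example in the list, a Redis proposal checked against an existing Dragonfly entry returns a single overlap finding, which is the cue to evaluate Dragonfly first.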

3. Operational Cost

How is it monitored? (Must integrate with kube-prometheus-stack)
How is it backed up? (Must have a recovery story)
How does it handle upgrades? (Must be declarative, ideally via Renovate)
What's the failure blast radius? (Prefer isolated over cluster-wide)
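The "Must" answers in this section are hard requirements, while Renovate and a small blast radius are stated as preferences. A minimal Python sketch that keeps those two tiers separate; the `OperationalProfile` fields and the two-level `BlastRadius` scale are hypothetical simplifications.

```python
from dataclasses import dataclass
from enum import IntEnum


class BlastRadius(IntEnum):
    """Lower values are preferred: isolated failures beat cluster-wide ones."""
    ISOLATED = 1
    CLUSTER_WIDE = 2


@dataclass
class OperationalProfile:
    """Hypothetical answers to the operational-cost questions."""
    monitored_by_kube_prometheus_stack: bool
    recovery_story: str  # empty string means no documented recovery story
    declarative_upgrades: bool
    renovate_managed: bool
    blast_radius: BlastRadius


def operational_cost_findings(p: OperationalProfile) -> tuple[list[str], list[str]]:
    """Split findings into hard failures ("Must" items) and softer warnings."""
    failures, warnings = [], []
    if not p.monitored_by_kube_prometheus_stack:
        failures.append("no kube-prometheus-stack integration")
    if not p.recovery_story.strip():
        failures.append("no backup/recovery story")
    if not p.declarative_upgrades:
        failures.append("upgrades are not declarative")
    elif not p.renovate_managed:
        # Declarative but untracked upgrades pass the gate, with a nudge toward Renovate.
        warnings.append("upgrades not tracked by Renovate")
    if p.blast_radius is BlastRadius.CLUSTER_WIDE:
        warnings.append("failure blast radius is cluster-wide")
    return failures, warnings
```

Any entry in `failures` blocks the proposal outright; `warnings` feed into the complexity and failure-mode discussion rather than vetoing it.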

4. Complexity Budget

Is the complexity justified by the learning value?
Could a simpler existing tool solve the same problem?
What's the maintenance burden over 12 months?

5. Alternative Analysis

What existing stack components could solve this? (Always check first)
What are the top 2-3 alternatives in the ecosystem?
What do other production homelabs use? (Research on kubesearch)

6. Failure Modes

What happens when this component is unavailable?
How does it interact with network policies? (Default deny)
What's the recovery procedure? (Must be documented in a runbook)
Can it self-heal? (Strong preference for self-healing)

Anti-Patterns to Challenge

Anti-Pattern	Why It's Wrong	Correct Approach
"Just run a container" without monitoring	Invisible failures, no alerting	ServiceMonitor + PrometheusRule required
Adding a new tool when existing ones suffice	Stack bloat, maintenance burden	Evaluate existing stack first
Skipping observability "for now"	Technical debt that never gets paid	Monitoring is day-1, not day-2
Cloud-only services	Vendor lock-in, can't run on bare-metal	Self-hosted alternatives preferred
Single-instance without HA story	Single point of failure	At minimum, document the recovery procedure
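The first anti-pattern can be checked mechanically. A minimal Python sketch, assuming an app's manifests have already been rendered and parsed into dictionaries (for example with PyYAML's `yaml.safe_load_all`); `missing_monitoring` is a hypothetical helper, and treating Deployment, StatefulSet, and DaemonSet as the workloads that need monitoring is an assumption.

```python
# Workload kinds assumed to require monitoring (an assumption, not a rule from this document).
WORKLOAD_KINDS = {"Deployment", "StatefulSet", "DaemonSet"}
# Prometheus Operator resources named in the anti-pattern table.
REQUIRED_MONITORING_KINDS = {"ServiceMonitor", "PrometheusRule"}


def missing_monitoring(manifests: list[dict]) -> set[str]:
    """Return the required monitoring kinds absent from one app's manifests.

    Returns an empty set when the app has no workload or already ships both kinds.
    """
    kinds = {m.get("kind") for m in manifests}
    if not kinds & WORKLOAD_KINDS:
        return set()
    return REQUIRED_MONITORING_KINDS - kinds
```

Checking only `kind` keeps the sketch small; it does not verify that the ServiceMonitor actually selects the workload's Service or that the PrometheusRule contains meaningful alerts.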
