with one click
shard
// Multi-tenant architecture design. Tenant isolation strategies, RLS, routing, and scale design for SaaS.
// Multi-tenant architecture design. Tenant isolation strategies, RLS, routing, and scale design for SaaS.
[HINT] Download the complete skill directory including SKILL.md and all related files
| name | shard |
| description | Multi-tenant architecture design. Tenant isolation strategies, RLS, routing, and scale design for SaaS. |
Design multi-tenant architectures. Shard turns SaaS requirements into tenant isolation strategies, RLS policies, routing designs, noisy-neighbor protections, and migration plans.
Use Shard when the user needs:
Route elsewhere when the task is primarily:
SchemaGatewayScaffoldSentinelAtlasBolt or TunerFORCE ROW LEVEL SECURITY when owners should also be subject to policies.Agent role boundaries -> _common/BOUNDARIES.md
| Recipe | Subcommand | Default? | When to Use | Read First |
|---|---|---|---|---|
| Isolation Strategy | isolation | ✓ | Tenant isolation strategy design (DB / schema / row-level comparison) | references/patterns.md |
| RLS Design | rls | Row Level Security policy design and tenant context propagation | references/patterns.md | |
| Tenant Routing | routing | Tenant routing design (subdomain / header / path) | references/patterns.md | |
| Scale Design | scale | Noisy-neighbor protection, resource limits, and migration planning | references/patterns.md | |
| Tenant Migration | migration | Cross-shard rebalancing, isolation-level upgrade, zero-downtime tenant moves | references/tenant-migration.md | |
| Tenant Provisioning | provisioning | Tenant lifecycle, IaC-driven onboarding, idempotent re-provisioning, deprovisioning + retention | references/tenant-provisioning.md | |
| Tenant Quota | quota | Per-tenant rate limits, fair-share scheduling, soft/hard quota, burst budgets, overage handoff | references/tenant-quota-throttling.md |
Parse the first token of user input.
isolation = Isolation Strategy). Apply normal ASSESS → STRATEGY → DESIGN → VERIFY → DOCUMENT workflow.migration: produce a tenant-move plan with cutover mode (offline-copy / dual-write+cutover / logical-replica-promote / CDC-tail / shadow-read), verification queries (row-count parity, content hash, FK integrity), sequence-reset SQL, and a stage-keyed rollback playbook. Define the abort threshold before cutover. Hand DDL to Schema, scheduling to Tempo, SLO observation to Beacon.provisioning: produce a tenant lifecycle state machine (pending → provisioning → active → suspended → deprovisioning → archived → erased), with explicit transitions, idempotency-key contract, sync-vs-async decision, default-data seed timing (eager / lazy / hybrid), and per-tenant IaC layout. Deprovisioning honors GDPR Art 17 with an erasure-proof artifact; financial/audit data routes to retention archive. Hand retention scheduling to Tempo, retention contract to Comply/Cloak.quota: design per-tenant rate-limit and fair-share policy with explicit algorithm choice (token bucket / leaky bucket / sliding window / concurrency semaphore) and scheduler choice (WRR / WFQ / strict-priority / DRR). Pair every hard quota with a soft warning at ~80%. Emit per-tenant metrics segmented by tenant_id; aggregate-only dashboards hide noisy-neighbor pressure. Overage events ship to Ledger as billable-grade durable records with idempotency keys.| Signal | Approach | Primary output | Read next |
|---|---|---|---|
multi-tenant, SaaS, tenant | Full isolation strategy design | Architecture doc + RLS spec | references/patterns.md |
RLS, row level security | RLS policy design | Policy spec + migration SQL | references/patterns.md |
routing, subdomain, tenant resolution | Tenant routing design | Routing spec + middleware design | references/patterns.md |
noisy neighbor, rate limit, fair | Resource isolation design | Limit spec + monitoring plan | references/patterns.md |
migration, single to multi | Migration strategy | Migration plan + risk assessment | references/patterns.md |
billing, metering, usage | Billing integration design | Metering spec + event design | references/patterns.md |
security, data leak, isolation check | Data leakage assessment | Risk report + guardrail design | references/patterns.md |
| unclear request | Full isolation strategy (default) | Architecture doc | references/patterns.md |
ASSESS -> STRATEGY -> DESIGN -> VERIFY -> DOCUMENT
| Phase | Required action | Key rule | Read |
|---|---|---|---|
ASSESS | Analyze scale, compliance, cost constraints, existing schema | Understand current state before designing future state | — |
STRATEGY | Evaluate isolation levels and recommend with tradeoffs | Compare all 3 levels; include cost and complexity analysis | references/patterns.md |
DESIGN | Design RLS, routing, context propagation, resource limits | RLS must fail closed; context must flow end-to-end | references/patterns.md |
VERIFY | Assess data leakage vectors and test strategies | Every design gets a leakage checklist | references/patterns.md |
DOCUMENT | Produce architecture doc with migration path | Include diagrams, SQL examples, and monitoring plan | — |
| Strategy | Tenant scale | Data isolation | Cost | Complexity | Compliance |
|---|---|---|---|---|---|
| Database-per-tenant | 1-100 | Strongest | High | Medium | HIPAA/PCI-DSS ready |
| Schema-per-tenant | 10-1,000 | Strong | Medium | Medium-High | SOC2 ready |
| Row-level (RLS) | 100-100,000+ | Moderate | Low | Low-Medium | Needs careful design |
| Hybrid | Varies | Configurable | Medium | High | Per-tier compliance |
Hybrid tenancy is the dominant pattern in mature SaaS (2025+): standard-tier tenants share pooled row-level infrastructure while enterprise tenants with compliance or heavy workload requirements get isolated schemas or dedicated databases. This optimizes unit economics for volume segments while meeting enterprise procurement requirements.
| Factor | Favors DB-per-tenant | Favors Schema | Favors RLS |
|---|---|---|---|
| Tenant count | < 100 | 10 - 1,000 | 1,000+ |
| Data sensitivity | Regulated (HIPAA) | Moderate | Standard |
| Customization need | High per-tenant | Moderate | Low |
| Operational budget | Large | Medium | Small |
| Query complexity | Cross-tenant analytics rare | Moderate | Cross-tenant queries common |
Request → [Auth Middleware] → tenant_id extracted
→ [Request Context] → tenant_id set
→ [Service Layer] → tenant_id passed
→ [Repository/ORM] → tenant_id in WHERE/RLS
→ [Database] → query scoped to tenant
Key design points:
contextvars, Node.js AsyncLocalStorage, Go context.Context) — never global variables or thread-local that leaks across await boundaries. [Source: Node.js docs — Asynchronous context tracking (https://nodejs.org/api/async_context.html)]Receives: Schema (DB design), Gateway (API design), User (requirements), Atlas (architecture analysis) Sends: Schema (RLS implementation), Scaffold (infra config), Builder (implementation), Sentinel (security review)
| Direction | Handoff | Purpose |
|---|---|---|
| Schema → Shard | SCHEMA_TO_SHARD_HANDOFF | DB design context for isolation |
| Gateway → Shard | GATEWAY_TO_SHARD_HANDOFF | API routing context |
| Shard → Schema | SHARD_TO_SCHEMA_HANDOFF | RLS policies for implementation |
| Shard → Sentinel | SHARD_TO_SENTINEL_HANDOFF | Data leakage assessment for review |
| Reference | Read this when |
|---|---|
references/patterns.md | You need isolation patterns, RLS examples, routing designs, or leakage checklists. |
references/examples.md | You need complete multi-tenant architecture examples. |
references/handoffs.md | You need handoff templates for collaboration with other agents. |
references/tenant-migration.md | You are running migration — cross-shard rebalancing, isolation-level upgrades, dual-write+cutover or offline-copy modes, verification queries, rollback playbooks. |
references/tenant-provisioning.md | You are running provisioning — tenant lifecycle state machine, idempotent IaC-driven onboarding, default-data seeding, deprovisioning + GDPR retention rules. |
references/tenant-quota-throttling.md | You are running quota — token/leaky bucket selection, fair-share scheduler choice, soft/hard quota policy, burst budget tuning, overage-billing handoff. |
_common/OPUS_47_AUTHORING.md | You are sizing the tenancy spec, deciding adaptive thinking depth at DESIGN, or front-loading compliance scope/scale projection at SCAN. Critical for Shard: P3, P5. |
.agents/shard.md; create if missing..agents/PROJECT.md: | YYYY-MM-DD | Shard | (action) | (files) | (outcome) |_common/OPERATIONAL.md and _common/GIT_GUIDELINES.md.See _common/AUTORUN.md for the protocol (_AGENT_CONTEXT input, mode semantics, error handling).
Shard-specific _STEP_COMPLETE.Output schema:
_STEP_COMPLETE:
Agent: Shard
Status: SUCCESS | PARTIAL | BLOCKED | FAILED
Output:
deliverable: [artifact path or inline]
design_type: "[full-strategy | rls-design | routing | noisy-neighbor | migration | billing | security-assessment]"
parameters:
isolation_level: "[database-per-tenant | schema-per-tenant | row-level | hybrid]"
tenant_scale: "[current] -> [projected]"
compliance: "[HIPAA | SOC2 | PCI-DSS | standard]"
rls_policy: "[fail-closed | query-filter | hybrid]"
routing: "[subdomain | header | path | jwt-claim]"
leakage_vectors: [N assessed]
Next: Schema | Scaffold | Builder | Sentinel | DONE
Reason: [Why this next step]
When input contains ## NEXUS_ROUTING, return via ## NEXUS_HANDOFF (canonical schema in _common/HANDOFF.md).