async-messaging

Build reliable event-driven flows with the Transactional Outbox pattern — write state and event in one transaction, relay asynchronously, achieve at-least-once delivery + consumer idempotency. Use when an action must reliably trigger downstream work, or when events are lost on crash (dual-write problem). Not for simple background work without state+event reliability (use background-jobs) or outbound HTTP webhook specifics (use webhook-design).

Updated June 8, 2026, 13:41

Source

JayKim88

JayKim88/claude-ai-engineering

SKILL.md

name	async-messaging
description	Build reliable event-driven flows with the Transactional Outbox pattern — write state and event in one transaction, relay asynchronously, achieve at-least-once delivery + consumer idempotency. Use when an action must reliably trigger downstream work, or when events are lost on crash (dual-write problem). Not for simple background work without state+event reliability (use background-jobs) or outbound HTTP webhook specifics (use webhook-design).
license	MIT

Async Messaging (Transactional Outbox)

Purpose

Make "change state AND notify the world" reliable. The naive approach (write DB, then publish to a queue) loses events on crash between the two steps — the dual-write problem. The Outbox pattern solves it.

Universal — the Outbox pattern, the dual-write problem, and "exactly-once = at-least-once + consumer idempotency" are distributed-systems fundamentals independent of broker/language.

Procedure

Recognize the dual-write problem
- "Write DB, then publish to Kafka/Redis" has no atomicity — crash after DB write but before publish = lost event; crash after publish but before commit = phantom event
- You cannot atomically write to two systems without a distributed transaction (which you should avoid)
Apply the Transactional Outbox
- Write the business state change AND an outbox row in ONE local DB transaction (see transaction-management)
- Both commit together or neither does — atomicity guaranteed by the single DB
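As a minimal sketch of the step above, the following uses an in-memory stand-in for a single database. `FakeDb`, `Tx`, and `placeOrder` are hypothetical names introduced only for illustration; a real implementation would rely on the database's own transaction (for example `prisma.$transaction`, as noted under Implementation). The point it shows is that the order and its outbox row become visible together or not at all.

```typescript
interface Order { id: string; status: string }
interface OutboxRow { id: number; type: string; payload: string; sentAt: number | null }

interface Tx {
  orders: Map<string, Order>;
  addOutbox(type: string, payload: object): void;
}

// Hypothetical in-memory database: changes are staged and swapped in only on success.
class FakeDb {
  orders = new Map<string, Order>();
  outbox: OutboxRow[] = [];
  private nextId = 1;

  transaction(work: (tx: Tx) => void): void {
    const stagedOrders = new Map(this.orders);
    const stagedOutbox = [...this.outbox];
    let stagedNextId = this.nextId;
    work({
      orders: stagedOrders,
      addOutbox: (type, payload) => {
        stagedOutbox.push({ id: stagedNextId++, type, payload: JSON.stringify(payload), sentAt: null });
      },
    });
    // Commit point: reached only if `work` did not throw.
    this.orders = stagedOrders;
    this.outbox = stagedOutbox;
    this.nextId = stagedNextId;
  }
}

// State change and outbox row are written inside the same transaction.
function placeOrder(db: FakeDb, orderId: string, failMidway = false): void {
  db.transaction((tx) => {
    tx.orders.set(orderId, { id: orderId, status: "placed" });
    if (failMidway) throw new Error("simulated crash inside the transaction");
    tx.addOutbox("OrderPlaced", { orderId });
  });
}

const db = new FakeDb();
placeOrder(db, "o-1");
try {
  placeOrder(db, "o-2", true);
} catch {
  // The failed transaction leaves neither an order nor an outbox row behind.
}
```

After both calls, only `o-1` exists, and it has exactly one matching outbox row waiting for the relay.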
Relay the outbox asynchronously
- Polling publisher: a background-jobs task reads unsent outbox rows, publishes them to the broker, and marks each row sent only after the broker acknowledges it
- CDC (Change Data Capture, e.g., Debezium): tail the DB log, publish outbox inserts — lower latency, more infra
- Start with polling; move to CDC only if latency demands it
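A polling pass could look roughly like the sketch below. It works on an in-memory array rather than a real table, and `relayOnce`, `Publish`, and the `batchSize` parameter are assumptions made for illustration. The sketch marks a row sent only after the publish call returns, so a failure leaves the row unsent for the next poll.

```typescript
interface OutboxRow { id: number; type: string; payload: string; sentAt: number | null }

// Hypothetical publish function: it throws when the broker does not acknowledge.
type Publish = (row: OutboxRow) => void;

// One polling pass: publish unsent rows oldest first; returns how many were sent.
function relayOnce(outbox: OutboxRow[], publish: Publish, now: number, batchSize = 100): number {
  const pending = outbox
    .filter((row) => row.sentAt === null)
    .sort((a, b) => a.id - b.id)
    .slice(0, batchSize);
  let sent = 0;
  for (const row of pending) {
    try {
      publish(row);
    } catch {
      break; // Stop the pass; this row and the ones after it stay unsent for the next poll.
    }
    row.sentAt = now; // Marked only after the publish succeeded.
    sent++;
  }
  return sent;
}

const outbox: OutboxRow[] = [
  { id: 1, type: "OrderPlaced", payload: '{"orderId":"o-1"}', sentAt: null },
  { id: 2, type: "OrderPlaced", payload: '{"orderId":"o-2"}', sentAt: null },
];
const delivered: number[] = [];
let brokerUp = false;
const publish: Publish = (row) => {
  if (!brokerUp) throw new Error("broker unavailable");
  delivered.push(row.id);
};

const firstPass = relayOnce(outbox, publish, 1000); // broker down: nothing is marked sent
brokerUp = true;
const secondPass = relayOnce(outbox, publish, 2000); // broker back: both rows go out in order
```

Because the mark happens after the publish, a crash between the two steps republishes the row on the next pass. That is the source of the duplicates consumers must tolerate.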
Design consumers for at-least-once (idempotency)
- The relay guarantees at-least-once → consumers WILL see duplicates
- End-to-end "exactly-once" delivery is a myth once side effects live outside the broker; achieve effectively-once processing via at-least-once delivery + idempotent consumers (dedupe on event id / idempotency key; see resilience-patterns)
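A consumer that dedupes on event id might be sketched as follows. The `Set` stands in for a dedup table, and `IdempotentConsumer`, `EventEnvelope`, and the payment payload are hypothetical. In a real database the dedup insert and the side effect would share one transaction.

```typescript
interface EventEnvelope {
  eventId: string;
  eventType: string;
  payload: { accountId: string; amount: number };
}

class IdempotentConsumer {
  private processed = new Set<string>(); // stand-in for a dedup table keyed by event id
  balances = new Map<string, number>();

  // Returns true if the event was applied, false if it was a duplicate.
  handle(event: EventEnvelope): boolean {
    if (this.processed.has(event.eventId)) return false;
    const current = this.balances.get(event.payload.accountId) ?? 0;
    this.balances.set(event.payload.accountId, current + event.payload.amount);
    this.processed.add(event.eventId);
    return true;
  }
}

const consumer = new IdempotentConsumer();
const charge: EventEnvelope = {
  eventId: "evt-1",
  eventType: "PaymentCaptured",
  payload: { accountId: "a-1", amount: 50 },
};
const first = consumer.handle(charge);  // applied
const second = consumer.handle(charge); // redelivery of the same event: ignored
```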

4b. Version events from day one (additive-only)

- Include eventType + eventVersion (or schemaVersion) on every payload; consumers will outlive any one publisher
- Additive-only changes on a live event type: new fields are nullable, existing fields are never removed or repurposed. For a breaking change, bump the version and run old + new in parallel during cutover. Treat events like a public API contract (see api-contract)
- Never publish two semantically different shapes under the same eventType + version
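One way to let old and new versions run in parallel is to dispatch on the eventType + eventVersion pair, as in this sketch. The `OrderPlaced` shapes, the `couponCode` field, and the `dispatch` helper are hypothetical examples, not part of the skill itself.

```typescript
interface EventEnvelope<T> { eventType: string; eventVersion: number; payload: T }

interface OrderPlacedV1 { orderId: string }
// v2 only adds a nullable field; nothing from v1 is removed or repurposed.
interface OrderPlacedV2 { orderId: string; couponCode?: string | null }

type Handler = (payload: unknown) => string;

// One handler per (eventType, eventVersion) pair, so both versions stay consumable during cutover.
const handlers = new Map<string, Handler>([
  ["OrderPlaced@1", (p) => `order ${(p as OrderPlacedV1).orderId}`],
  ["OrderPlaced@2", (p) => {
    const order = p as OrderPlacedV2;
    return `order ${order.orderId} coupon ${order.couponCode ?? "none"}`;
  }],
]);

function dispatch(event: EventEnvelope<unknown>): string {
  const handler = handlers.get(`${event.eventType}@${event.eventVersion}`);
  if (!handler) throw new Error(`no handler for ${event.eventType} v${event.eventVersion}`);
  return handler(event.payload);
}

const v1Result = dispatch({ eventType: "OrderPlaced", eventVersion: 1, payload: { orderId: "o-1" } });
const v2Result = dispatch({ eventType: "OrderPlaced", eventVersion: 2, payload: { orderId: "o-2", couponCode: "SPRING" } });
const v2NoCoupon = dispatch({ eventType: "OrderPlaced", eventVersion: 2, payload: { orderId: "o-3" } });
```

An unknown version fails loudly instead of being parsed with the wrong shape, which is the silent break the versioning rule is meant to prevent.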

4c. Dead-letter queue (DLQ) + poison-message handling

- A message that still fails after N retries is a poison message: route it to a DLQ once the retry budget is spent rather than letting it block the queue forever
- The DLQ needs alerting (a growing DLQ = silent failure) + a manual replay path that keeps the original payload and the failure reason
- Cap retry attempts and use backoff + jitter (see resilience-patterns / background-jobs)
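The retry budget and DLQ routing can be sketched as below. To stay runnable without timers, the sketch records the backoff delays instead of sleeping. `processWithRetryBudget`, `backoffDelayMs`, and the base/cap values are assumptions chosen for illustration. The jitter variant shown is "full jitter" (a random delay between zero and the exponential ceiling).

```typescript
interface Message { id: string; body: string }
interface DeadLetter { message: Message; reason: string; attempts: number }

// Exponential backoff with full jitter: a random delay in [0, min(cap, base * 2^attempt)).
function backoffDelayMs(attempt: number, baseMs = 100, capMs = 10000, random: () => number = Math.random): number {
  const ceiling = Math.min(capMs, baseMs * 2 ** attempt);
  return Math.floor(random() * ceiling);
}

function processWithRetryBudget(
  message: Message,
  handler: (m: Message) => void,
  maxAttempts: number,
  dlq: DeadLetter[],
): { ok: boolean; delaysMs: number[] } {
  const delaysMs: number[] = [];
  let lastError = "unknown";
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      handler(message);
      return { ok: true, delaysMs };
    } catch (err) {
      lastError = err instanceof Error ? err.message : String(err);
      // A real worker would wait this long before the next attempt; here it is only recorded.
      if (attempt < maxAttempts - 1) delaysMs.push(backoffDelayMs(attempt));
    }
  }
  // Retry budget spent: keep the original payload and the reason for later manual replay.
  dlq.push({ message, reason: lastError, attempts: maxAttempts });
  return { ok: false, delaysMs };
}

const dlq: DeadLetter[] = [];
const parseBody = (m: Message): void => { JSON.parse(m.body); };
const poison = processWithRetryBudget({ id: "m-1", body: "not json" }, parseBody, 3, dlq);
const healthy = processWithRetryBudget({ id: "m-2", body: "{}" }, parseBody, 3, dlq);
```

The poison message ends up in `dlq` with its payload and error text intact, while the healthy message behind it is processed normally.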

Choose the broker by needs (criteria, not products)
- Simple in-app queue — low setup, good for in-process jobs
- Routing broker — work queues + routing topologies, moderate scale
- High-throughput log — event replay, multiple consumer groups, high volume
- Ordering needs a partition key: same-entity events go to the same partition (order_id, user_id). Within a partition: ordered; across partitions: no guarantees. Choose the key by the invariant you actually need
- Don't reach for a heavyweight log broker by default; it adds operational overhead that a small workload does not repay (specific products in Implementation)
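The partition-key rule can be illustrated with a small hash-based partitioner. Real brokers ship their own partitioners; the FNV-1a hash, the `partitionFor` helper, and the partition count of 8 are assumptions used here only to show that a stable key keeps same-entity events together and in order.

```typescript
// 32-bit FNV-1a over UTF-16 code units (illustrative; not any broker's actual partitioner).
function fnv1a(key: string): number {
  let hash = 0x811c9dc5;
  for (let i = 0; i < key.length; i++) {
    hash ^= key.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash >>> 0;
}

// The same key always maps to the same partition.
function partitionFor(key: string, partitionCount: number): number {
  return fnv1a(key) % partitionCount;
}

const events = [
  { orderId: "order-42", type: "OrderPlaced" },
  { orderId: "order-7", type: "OrderPlaced" },
  { orderId: "order-42", type: "OrderPaid" },
  { orderId: "order-42", type: "OrderShipped" },
];

// Append each event to its partition, keyed by order_id.
const partitions = new Map<number, string[]>();
for (const event of events) {
  const partition = partitionFor(event.orderId, 8);
  const log = partitions.get(partition) ?? [];
  log.push(`${event.orderId}:${event.type}`);
  partitions.set(partition, log);
}
```

All three `order-42` events land in one partition in publish order. Events for `order-7` may land elsewhere, and nothing orders them relative to `order-42`.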

5b. Retain only what you need

- The outbox table grows on every write: schedule a cleanup job to delete (or archive) sent rows past N days, or it eats the DB
- For broker retention: replay needs ≠ storage forever; tune by recovery window (e.g., 7 days for backfill, not 1 year "just in case")
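The cleanup rule amounts to "drop rows that were sent before the cutoff, never drop unsent rows". The sketch below expresses it over an in-memory array; `purgeSentRows` and the 7-day window are hypothetical, and a real job would issue the equivalent delete against the outbox table.

```typescript
interface OutboxRow { id: number; sentAt: Date | null }

// Returns the rows to keep. Unsent rows are always kept, however old they are.
function purgeSentRows(rows: OutboxRow[], now: Date, retentionDays: number): OutboxRow[] {
  const cutoff = now.getTime() - retentionDays * 24 * 60 * 60 * 1000;
  return rows.filter((row) => row.sentAt === null || row.sentAt.getTime() >= cutoff);
}

const now = new Date("2026-06-08T00:00:00Z");
const rows: OutboxRow[] = [
  { id: 1, sentAt: new Date("2026-05-01T00:00:00Z") }, // sent well before the window
  { id: 2, sentAt: new Date("2026-06-05T00:00:00Z") }, // sent inside the window
  { id: 3, sentAt: null },                             // not yet relayed
];
const kept = purgeSentRows(rows, now, 7);
const keptIds = kept.map((row) => row.id);
```

Row 1 is purged; rows 2 and 3 remain. Keeping unsent rows regardless of age matters because deleting them would silently drop events the relay has not delivered yet.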

Validate (validation loop)
- Crash the service between state write and relay → verify the event is still delivered (outbox row persisted)
- Deliver an event twice → verify the consumer processes it once (idempotent)
- If an event is lost → either the state write was not in the same transaction as the outbox row, or the relay marked the row sent before the publish was acknowledged; fix and re-test
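The crash and duplicate checks above can be rehearsed in a small simulation before testing against real infrastructure. Everything here is in-memory and hypothetical (`relayPass`, `consume`, `brokerLog`). The simulated crash happens after the publish but before the row is marked sent, which is the case that produces a redelivery.

```typescript
interface OutboxRow { id: number; eventId: string; sentAt: number | null }

const outbox: OutboxRow[] = [{ id: 1, eventId: "evt-1", sentAt: null }];
const brokerLog: string[] = [];  // every delivery the broker saw
const seen = new Set<string>();  // consumer dedup store
let sideEffects = 0;

// Idempotent consumer: applies the side effect once per event id.
function consume(eventId: string): void {
  if (seen.has(eventId)) return;
  seen.add(eventId);
  sideEffects++;
}

// crashAfterPublish simulates the relay dying before it marks the row sent.
function relayPass(crashAfterPublish: boolean): void {
  for (const row of outbox) {
    if (row.sentAt !== null) continue;
    brokerLog.push(row.eventId);
    consume(row.eventId);
    if (crashAfterPublish) return; // the row stays unsent
    row.sentAt = Date.now();
  }
}

relayPass(true);  // crash: published but not marked sent
relayPass(false); // restart: the same event is published again
```

The broker sees the event twice, the consumer applies it once, and the outbox row ends up marked sent, which matches the first two checks in the validation loop.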

Anti-patterns

❌ Anti-pattern	✅ Correct
Write DB then publish to broker (dual write)	Write state + outbox row in one transaction, relay async
Assuming the broker gives exactly-once	At-least-once + idempotent consumers
Publishing inside the business transaction (network I/O in tx)	Outbox row in tx; relay outside
Heavyweight log broker for a simple in-app job	Simple queue first; log broker only at real scale
Consumer with no dedup	Dedupe on event id
Two payload shapes under the same `eventType` (silent break)	`eventVersion` + additive-only changes; run old+new in parallel on cutover
Poison messages blocking the queue indefinitely	DLQ after retry budget; alert on DLQ growth; manual replay
Outbox table growing unbounded	Scheduled cleanup of sent rows past retention window
Ordering assumed without a partition key	Choose a partition key matching the invariant (same-entity events to the same partition)

Severity tiers

Tier	Examples	Action SLA
Critical	Dual-write losing payment/order events on crash; consumer with no idempotency double-charging; breaking schema change shipped on a live `eventType` (consumers crash on next message)	Fix immediately
Major	Publishing inside a transaction (lock contention); no DLQ for poison messages; outbox table grows without cleanup (DB-eating); no partition key on a flow that requires ordering	Fix this sprint
Minor	Polling interval untuned; broker over-provisioned (Kafka for tiny load); broker retention not aligned with actual replay needs	Schedule within 2 sprints

Completion Criteria

State change + event written in one transaction (Outbox)
Relay (polling or CDC) publishes + marks sent
Consumers idempotent (dedup on event id)
Every event payload carries eventVersion; schema changes are additive-only
DLQ wired + alerted on growth; manual replay path exists
Outbox cleanup scheduled; broker retention matches the actual replay window
Partition key chosen wherever ordering is required
Broker choice justified by actual throughput/replay needs
Crash + duplicate scenarios verified

Output

Outbox table + relay: schema + polling publisher (or CDC config)
Consumer handlers: idempotent, with dedup
Architecture doc: event catalog + delivery guarantees
Commit format: feat(events): add outbox for <event> / feat(events): idempotent consumer for <event>

Implementation

TypeScript + Prisma + BullMQ/Redis (default)

Outbox: outbox(id, type, payload, created_at, sent_at) written inside prisma.$transaction with the state change
Relay: a BullMQ repeatable job polls WHERE sent_at IS NULL, publishes, sets sent_at
Consumers: BullMQ workers with dedup table on event id
Scale path: Redis Streams → Kafka (KafkaJS) if replay/throughput demands

Other stacks

Python: SQLAlchemy outbox + Celery relay; or Debezium CDC → Kafka
Go: outbox + worker; Watermill library for messaging abstractions
Universal: Outbox is DB-agnostic; Debezium provides CDC for Postgres/MySQL/Mongo; the at-least-once + idempotency principle is broker-independent

Related skills

transaction-management — the outbox row is written in the same transaction as the state change
background-jobs — the outbox relay / consumers run as jobs
webhook-design — outbound webhooks can be driven by the outbox

Reference

Key insight encoded: Solve the dual-write problem by writing business state and the event to an outbox table in one local transaction, then relay asynchronously (polling or CDC) — this is what makes at-least-once delivery achievable; "exactly-once" is really at-least-once + consumer idempotency. Treat events like a public API: version from day one, additive-only changes, partition key on flows that need ordering. Operational hygiene matters: a DLQ for poison messages + an outbox-cleanup job (else the outbox eats the DB).
