Run any Skill in Manus with one click

erlang-elixir-principal-engineer

Principal/Senior-level Erlang and Elixir playbook for highly available backend systems, OTP design, actor-model concurrency, reliability, observability, and production operations. Use when: building or reviewing Elixir or Erlang services, designing OTP supervision trees, handling high-concurrency workloads, debugging distributed failures, improving resilience, or preparing BEAM systems for production.

Run Skill in Manus

Overview

Install command

npx skills add https://github.com/mOdrA40/claude-codex-skills-directory --skill erlang-elixir-principal-engineer

Copy and paste this command into Claude Code to install the skill

Source

mOdrA40/claude-codex-skills-directory

Stars5

Forks1

UpdatedMarch 21, 2026 at 15:10

File Explorer

6 files

SKILL.md

readonly

name

erlang-elixir-principal-engineer

description

Erlang / Elixir Mastery (Senior → Principal)

Operate

Start by confirming: language choice (Erlang or Elixir), OTP version, cluster topology, persistence strategy, workload shape, availability goals, and the definition of done.
Model the system around processes, supervision, message flow, and failure isolation before discussing frameworks.
Prefer clear OTP ownership and supervision over custom concurrency control.
Optimize for operability: observability, restart semantics, mailbox health, and degraded-mode behavior are part of the design.

The goal is not to show off the actor model. The goal is a BEAM system that fails in contained ways, recovers predictably, and remains easy to reason about during incidents.

Default Standards

Use OTP behaviors (GenServer, Supervisor, DynamicSupervisor, GenStage when justified) instead of ad-hoc process orchestration.
Keep Phoenix or HTTP layers thin; move policy and orchestration into testable modules.
Define message contracts and state transitions explicitly.
Treat mailbox growth, process cardinality, and backpressure as primary production risks.
Prefer supervision trees and retries at the right layer over defensive rescue blocks everywhere.
Make timeout, retry, and circuit breaker behavior explicit for outbound dependencies.
Use telemetry and structured logging consistently.

“Bad vs Good” (common production pitfalls)

# ❌ BAD: using a GenServer as a catch-all service locator.
def handle_call({:do_everything, payload}, _from, state) do
  result = BigBallOfMud.run(payload, state)
  {:reply, result, state}
end

# ✅ GOOD: keep the process focused and delegate domain behavior.
def handle_call({:create_user, attrs}, _from, state) do
  case Users.create(attrs, state.deps) do
    {:ok, user} -> {:reply, {:ok, user}, state}
    {:error, reason} -> {:reply, {:error, reason}, state}
  end
end

# ❌ BAD: unbounded Task spawning in a request path.
Enum.each(events, fn event ->
  Task.start(fn -> publish(event) end)
end)

# ✅ GOOD: use supervised, bounded concurrency.
Task.Supervisor.async_stream_nolink(MyApp.TaskSupervisor, events, &publish/1, max_concurrency: 8, timeout: 5_000)
|> Stream.run()

# ❌ BAD: catch-all rescue hides operational signals.
try do
  external_call()
rescue
  _ -> :ok
end

# ✅ GOOD: classify the failure and surface it.
case external_call() do
  {:ok, result} -> {:ok, result}
  {:error, :timeout} -> {:error, :dependency_timeout}
  {:error, reason} -> {:error, {:dependency_failed, reason}}
end

Workflow (Feature / Refactor / Bug)

Reproduce the issue or encode the expected behavior in tests.
Decide process ownership, supervision strategy, and message boundaries.
Define failure semantics: restart, retry, drop, dead-letter, or escalate.
Implement the smallest end-to-end slice.
Validate mailbox behavior, process counts, timeout paths, and telemetry.
Review cluster behavior, deploy safety, and rollback readiness.

Validation Commands

Run mix format --check-formatted.
Run mix test.
Run mix credo --strict if Credo is used.
Run mix dialyzer for type/spec analysis when configured.
Run mix test --trace during debugging.
Run mix phx.routes and endpoint smoke checks for Phoenix applications.
Run release smoke tests if the application ships as an OTP release.

OTP and Concurrency Guardrails

Every long-lived process should have a clear owner and supervision policy.
Avoid turning one GenServer into a bottleneck for unrelated responsibilities.
Bound concurrency for fan-out work; use supervised tasks or queues.
Watch mailbox growth, reduction spikes, and scheduler imbalance.
Prefer message passing over shared mutable escape hatches such as ETS misuse.
Use ETS intentionally: great for read-heavy shared data, dangerous as hidden global state.

Service and API Defaults

Validate input at the boundary before it reaches domain logic.
Map domain errors consistently to HTTP, gRPC, or queue semantics.
Use idempotency keys for retried commands with side effects.
Set explicit client timeouts and pool limits for outbound HTTP/database calls.
Do not leak stack traces or internal exception details to clients.

Reliability, Distributed Systems, and Operations

Design supervision trees around blast radius, not just code organization.
Decide what must restart together and what must fail independently.
Be explicit about node discovery, cookie management, and cluster partitions.
Treat netsplits, mailbox buildup, and dependency slowness as first-class failure modes.
Expose health checks, readiness, telemetry, and trace correlation for incident response.

Security Checklist (Minimum)

Keep secrets out of logs, process state dumps, and crash reports.
Validate payload size and shape for all external inputs.
Use least-privilege credentials and role separation.
Harden admin endpoints, LiveDashboard, and remote shell access.
Audit distributed node trust boundaries before enabling clustering across networks.

Decision Heuristics

Choose Erlang/Elixir when:
- the workload is highly concurrent and failure isolation matters
- soft real-time behavior and uptime are core requirements
- supervision and message passing are natural fits for the domain
- the team benefits from OTP and BEAM operational strengths

Prefer another backend stack when:
- the problem is simple CRUD with no resilience or concurrency pressure
- ecosystem maturity for a required niche library is missing
- the team cannot support BEAM operational knowledge yet

References

OTP design and supervision boundaries: references/otp-design.md
Phoenix and service boundaries: references/phoenix-and-boundaries.md
Distributed systems and operations: references/distributed-systems-and-operations.md
Telemetry and observability: references/telemetry-and-observability.md
Incident runbooks: references/incident-runbooks.md

Erlang / Elixir Mastery (Senior → Principal)

Operate

Start by confirming: language choice (Erlang or Elixir), OTP version, cluster topology, persistence strategy, workload shape, availability goals, and the definition of done.
Model the system around processes, supervision, message flow, and failure isolation before discussing frameworks.
Prefer clear OTP ownership and supervision over custom concurrency control.
Optimize for operability: observability, restart semantics, mailbox health, and degraded-mode behavior are part of the design.

The goal is not to show off the actor model. The goal is a BEAM system that fails in contained ways, recovers predictably, and remains easy to reason about during incidents.

Default Standards

Use OTP behaviors (GenServer, Supervisor, DynamicSupervisor, GenStage when justified) instead of ad-hoc process orchestration.
Keep Phoenix or HTTP layers thin; move policy and orchestration into testable modules.
Define message contracts and state transitions explicitly.
Treat mailbox growth, process cardinality, and backpressure as primary production risks.
Prefer supervision trees and retries at the right layer over defensive rescue blocks everywhere.
Make timeout, retry, and circuit breaker behavior explicit for outbound dependencies.
Use telemetry and structured logging consistently.

“Bad vs Good” (common production pitfalls)

# ❌ BAD: using a GenServer as a catch-all service locator.
def handle_call({:do_everything, payload}, _from, state) do
  result = BigBallOfMud.run(payload, state)
  {:reply, result, state}
end

# ✅ GOOD: keep the process focused and delegate domain behavior.
def handle_call({:create_user, attrs}, _from, state) do
  case Users.create(attrs, state.deps) do
    {:ok, user} -> {:reply, {:ok, user}, state}
    {:error, reason} -> {:reply, {:error, reason}, state}
  end
end

# ❌ BAD: unbounded Task spawning in a request path.
Enum.each(events, fn event ->
  Task.start(fn -> publish(event) end)
end)

# ✅ GOOD: use supervised, bounded concurrency.
Task.Supervisor.async_stream_nolink(MyApp.TaskSupervisor, events, &publish/1, max_concurrency: 8, timeout: 5_000)
|> Stream.run()

# ❌ BAD: catch-all rescue hides operational signals.
try do
  external_call()
rescue
  _ -> :ok
end

# ✅ GOOD: classify the failure and surface it.
case external_call() do
  {:ok, result} -> {:ok, result}
  {:error, :timeout} -> {:error, :dependency_timeout}
  {:error, reason} -> {:error, {:dependency_failed, reason}}
end

Workflow (Feature / Refactor / Bug)

Reproduce the issue or encode the expected behavior in tests.
Decide process ownership, supervision strategy, and message boundaries.
Define failure semantics: restart, retry, drop, dead-letter, or escalate.
Implement the smallest end-to-end slice.
Validate mailbox behavior, process counts, timeout paths, and telemetry.
Review cluster behavior, deploy safety, and rollback readiness.

Validation Commands

Run mix format --check-formatted.
Run mix test.
Run mix credo --strict if Credo is used.
Run mix dialyzer for type/spec analysis when configured.
Run mix test --trace during debugging.
Run mix phx.routes and endpoint smoke checks for Phoenix applications.
Run release smoke tests if the application ships as an OTP release.

OTP and Concurrency Guardrails

Every long-lived process should have a clear owner and supervision policy.
Avoid turning one GenServer into a bottleneck for unrelated responsibilities.
Bound concurrency for fan-out work; use supervised tasks or queues.
Watch mailbox growth, reduction spikes, and scheduler imbalance.
Prefer message passing over shared mutable escape hatches such as ETS misuse.
Use ETS intentionally: great for read-heavy shared data, dangerous as hidden global state.

Service and API Defaults

Validate input at the boundary before it reaches domain logic.
Map domain errors consistently to HTTP, gRPC, or queue semantics.
Use idempotency keys for retried commands with side effects.
Set explicit client timeouts and pool limits for outbound HTTP/database calls.
Do not leak stack traces or internal exception details to clients.

Reliability, Distributed Systems, and Operations

Design supervision trees around blast radius, not just code organization.
Decide what must restart together and what must fail independently.
Be explicit about node discovery, cookie management, and cluster partitions.
Treat netsplits, mailbox buildup, and dependency slowness as first-class failure modes.
Expose health checks, readiness, telemetry, and trace correlation for incident response.

Security Checklist (Minimum)

Keep secrets out of logs, process state dumps, and crash reports.
Validate payload size and shape for all external inputs.
Use least-privilege credentials and role separation.
Harden admin endpoints, LiveDashboard, and remote shell access.
Audit distributed node trust boundaries before enabling clustering across networks.

Decision Heuristics

Choose Erlang/Elixir when:
- the workload is highly concurrent and failure isolation matters
- soft real-time behavior and uptime are core requirements
- supervision and message passing are natural fits for the domain
- the team benefits from OTP and BEAM operational strengths

Prefer another backend stack when:
- the problem is simple CRUD with no resilience or concurrency pressure
- ecosystem maturity for a required niche library is missing
- the team cannot support BEAM operational knowledge yet

References

OTP design and supervision boundaries: references/otp-design.md
Phoenix and service boundaries: references/phoenix-and-boundaries.md
Distributed systems and operations: references/distributed-systems-and-operations.md
Telemetry and observability: references/telemetry-and-observability.md
Incident runbooks: references/incident-runbooks.md

erlang-elixir-principal-engineer

Erlang / Elixir Mastery (Senior → Principal)

Operate

Default Standards

“Bad vs Good” (common production pitfalls)

Workflow (Feature / Refactor / Bug)

Validation Commands

OTP and Concurrency Guardrails

Service and API Defaults

Reliability, Distributed Systems, and Operations

Security Checklist (Minimum)

Decision Heuristics

References

More from this repository

Erlang / Elixir Mastery (Senior → Principal)

Operate

Default Standards

“Bad vs Good” (common production pitfalls)

Workflow (Feature / Refactor / Bug)

Validation Commands

OTP and Concurrency Guardrails

Service and API Defaults

Reliability, Distributed Systems, and Operations

Security Checklist (Minimum)

Decision Heuristics

References

More from this repository