Run any Skill in Manus in one click


$pwd:

pony-software-design

Name: Pony Software Design
Author: ponylang

// Disciplines for software design work. Load when designing APIs, type systems, features, or system boundaries. Counters the tendency to retrieve familiar patterns instead of discovering what the problem actually needs. Has full (8-persona) and lightweight (5-persona) modes.


$ git log --oneline --stat

stars:5

forks:0

updated: May 22, 2026 at 14:07

File explorer

10 files

SKILL.md

readonly

related-skills.json

same repository

pony-ref.md

from "ponylang/llm-skills"

Load the Pony language reference (capabilities, PonyCheck, stdlib pitfalls, mort pattern). Load it before Pony coding sessions.

2026-05-31 · 5

pony-code-review.md

from "ponylang/llm-skills"

Ensemble code review with specialized reviewer personas. Has full (8-persona) and lightweight (3-persona) modes. Load when conducting a code review of a PR, branch, or local changes.

2026-05-22 · 5

pony-docs-review.md

from "ponylang/llm-skills"

Ensemble documentation review with specialized reviewer personas. Has full (8-persona) and lightweight (3-persona) modes. Load when reviewing documentation-only changes where code-focused personas don't apply.

2026-05-22 · 5

pony-ensemble.md

from "ponylang/llm-skills"

Ensemble workflow for producing higher-confidence outputs through decorrelated reasoning paths. Load when the human explicitly requests the ensemble approach.

2026-05-22 · 5

pony-pbt-patterns.md

from "ponylang/llm-skills"

Property-based and generative testing patterns. Load when writing property-based tests, generators, or generative test suites.

2026-05-22 · 5

pony-test-design.md

from "ponylang/llm-skills"

Two-stage ensemble for planning meaningful tests. Load when writing tests for new features or reviewing test quality. Counters the tendency to write tests that exercise the stdlib instead of your code. Has full (8-persona) and lightweight (5-persona) modes.

2026-05-22 · 5

package.json

"author": "ponylang"

"repository": "ponylang/llm-skills"


$ useful --forSOC

Software Developers · Computer and Mathematical Occupations · 15-1252 · L4


name	pony-software-design
description	Disciplines for software design work. Load when designing APIs, type systems, features, or system boundaries. Counters the tendency to retrieve familiar patterns instead of discovering what the problem actually needs. Has full (8-persona) and lightweight (5-persona) modes.
disable-model-invocation	false

Software Design

Load this skill when doing design work — APIs, type systems, features, system boundaries. The core problem it addresses: LLMs default to retrieving familiar patterns from training data rather than discovering what a specific problem needs. The result is designs that look right (they have the right nouns) but weren't derived from the problem.

Design is the act of discovering what is needed. It's about finding surprising affordances and avoiding candy-machine interfaces riddled with footguns.

Design is not a phase. It's a continuous loop: observe what you have, orient against what you know, decide, act, then observe again. Every decision you make reveals new information. That information might confirm prior decisions or invalidate them. Either way, you have to look. "I already decided that" is never a reason to skip re-evaluation — it was decided with less information than you have now.

Mode selection

The skill has two modes: full and lightweight. The orchestrator selects the appropriate mode based on the criteria below and proceeds. Report the mode choice when presenting results — the human can request full mode if lightweight was used and they want deeper coverage.

Full mode is the default. Use it when:

Defining new boundaries — a new subsystem, API surface, or type hierarchy
Multiple ownership or abstraction decisions need to be made simultaneously
There's genuine design space to explore — multiple viable approaches
The design must be described from first principles, not as an extension of something that already exists

Lightweight mode is for bounded design work within established patterns:

Adding a method to an existing API, a new variant in a type family, a new handler following established conventions
The boundaries are already decided — you're filling in, not redrawing
Consumer code patterns are already established by adjacent features
The task can be described as "another X that does Y" where X already exists in the codebase

When in doubt, use full mode. Lightweight is appropriate when there's a clear existing pattern being extended and the boundaries are already decided.
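
The selection criteria above can be read as a small decision function. The following is only a sketch: the `TaskTraits` fields and the `select_mode` name are hypothetical labels for the criteria listed in this section, and the real decision is a judgment call by the orchestrator rather than a boolean checklist.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaskTraits:
    """Hypothetical checklist mirroring the mode-selection criteria."""
    defines_new_boundaries: bool = False
    simultaneous_ownership_decisions: bool = False
    multiple_viable_approaches: bool = False
    needs_first_principles_description: bool = False
    extends_existing_pattern: bool = False
    boundaries_already_decided: bool = False


def select_mode(traits: TaskTraits) -> str:
    """Return "full" or "lightweight"; full is the default when in doubt."""
    full_signals = (
        traits.defines_new_boundaries
        or traits.simultaneous_ownership_decisions
        or traits.multiple_viable_approaches
        or traits.needs_first_principles_description
    )
    if full_signals:
        return "full"
    # Lightweight needs both a clear existing pattern and settled boundaries.
    if traits.extends_existing_pattern and traits.boundaries_already_decided:
        return "lightweight"
    return "full"
```

Note that any single full-mode signal wins over the lightweight conditions, which matches the "when in doubt, use full mode" rule.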

Process: full mode

Design work is where pattern-matching failures are most costly and hardest to self-detect. A single agent applying design disciplines will still pattern-match — the disciplines become post-hoc rationalizations for a retrieved design rather than actual constraints on the exploration.

The full design process runs in two stages with a feedback loop. Load pony-ensemble for the mechanical process; the personas defined in this skill replace the generic attention focuses.

Relationship to Ensemble Workflow

This skill uses the ensemble workflow with domain-specific customizations. Stage 1 (design) runs as a standard ensemble with 3 personas. Stage 2 (evaluation) runs as a second ensemble with 5 personas, using the Stage 1 synthesis output as its input. The two-stage loop and finding categorization (Rejection/Adjustment/Tension) are additions specific to this skill — the base ensemble protocol handles agent spawning, triage, and synthesis mechanics.

Orchestrator pre-spawn: understand the problem

Before spawning design personas, the orchestrator must decompose the problem statement. Ask: "What are we trying to accomplish? What pain point does this address?"

If the problem statement implies a solution rather than stating a problem — "add convenience wrapper methods for X" rather than "users struggle with X because Y" — peel it back to the underlying pain point. If you can't confidently identify the pain point, ask the human. This is not a failure; it's the most valuable question the orchestrator can ask, because every persona will anchor on whatever framing they receive.

Brief personas with the actual problem. The original problem statement can be included as context, but the briefing should lead with the pain point: what the user is struggling with and why. This gives personas room to explore different solutions rather than anchoring on the solution implied by the problem statement.

Stage 1: Design

Three design personas explore the problem space in parallel. Each applies all the disciplines below but enters the problem from a different direction. The decorrelation comes from where they start, not what they know. Persona definitions are in personas/design/.

File	Focus
`consumer-first.md`	Starts from usage code, derives API from what makes call sites clean
`skeptic.md`	Questions every abstraction, tries to subtract, proposes the smallest design
`principle-checker.md`	Hard verification of each design principle with evidence

Orchestrator post-return: evaluate exploration quality

When personas return, the orchestrator evaluates the quality of exploration before forwarding outputs to synthesis. Check:

Did the personas explore meaningfully different approaches? Read each persona's "Key decisions / alternatives considered" section. If a persona only considered one interpretation of the problem and designed for it, the exploration was insufficient.
Did they all anchor on the same narrow interpretation? If all three personas produced designs based on the same framing of the problem — e.g., all three designed wrapper methods — the ensemble failed to decorrelate. The different entry points weren't enough to overcome the anchoring effect of the problem statement.
Is the reasoning for rejecting alternatives sound? A persona that considered a wild idea and rejected it with good reasoning explored genuinely. A persona that listed an alternative and dismissed it in one sentence didn't.

If exploration quality is poor — narrow interpretation, insufficient alternatives, weak rejection reasoning — don't forward to synthesis. Refine the briefing: restate the problem more broadly, point out the anchoring you observed, and ask the human to clarify if needed. Then re-spawn.

Stage 1 synthesis

Stage 1 synthesis produces a candidate design using the standard ensemble synthesis process. The synthesis should pay special attention to:

Where the consumer-first designer's sketches conflict with the skeptic's subtractions — the tension usually reveals the right boundary
Where the principle checker found a violation that the others missed — this is the highest-value finding
Whether all three converged on the same abstraction — convergence from different starting points is strong signal
When the skeptic says "no value" and redirects: The skeptic is the only persona whose job includes saying "this shouldn't exist." When the skeptic concludes the proposed design doesn't earn its keep and redirects toward a different problem, this is not a minority position to be outvoted because the other two produced designs. The other two will always produce a design — that's their job. Evaluate the skeptic's case on its merits. If the skeptic is right that the design is a thin wrapper, the synthesis output should adopt the skeptic's redirect as the foundation: "the proposed design doesn't earn its keep — here's the actual problem worth solving" becomes the candidate, not a blended version of the thin wrapper.

Stage 2: Evaluation

Five evaluation personas stress-test the candidate design in parallel. Their input is the Integrated Result from Stage 1 synthesis — the candidate design with its consumer sketches, type definitions, and boundary decisions. They evaluate these design artifacts, not implementation code. Persona definitions are in personas/evaluation/.

File	Focus
`security.md`	Trust boundaries, attack surfaces, resource bounds in the design
`performance.md`	Architectural bottlenecks, coordination points, data structure choices
`adversarial.md`	Concrete usage scenarios that lead to bad outcomes
`testability.md`	Whether the design is verifiable — observable effects, isolatable components
`wildcard.md`	What all the other personas missed

For the wildcard persona specifically: include the identity statement (first paragraph) from each of the other personas so the wildcard knows what territory is already covered.
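
One way to assemble that part of the wildcard briefing is sketched below. It assumes each persona definition opens with its identity statement as the first paragraph, with paragraphs separated by blank lines; the function names are hypothetical and not part of the skill.

```python
import re
from typing import Dict


def identity_statement(persona_text: str) -> str:
    """Return the first non-empty paragraph, collapsed onto one line."""
    for paragraph in re.split(r"\n\s*\n", persona_text.strip()):
        if paragraph.strip():
            return " ".join(paragraph.split())
    return ""


def covered_territory(other_personas: Dict[str, str]) -> str:
    """Build the 'already covered' lines of the wildcard briefing.

    Keys are persona file names, values are the full persona texts.
    """
    lines = [
        f"{name}: {identity_statement(text)}"
        for name, text in other_personas.items()
    ]
    return "\n".join(lines)
```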

Before spawning evaluation personas, create a temporary directory for evidence files (~/tmp/design-eval-<timestamp>/). Each persona writes its detailed analysis to a file in this directory and returns a structured summary to the orchestrator. The synthesizer works from summaries and digs into evidence files only when it needs to examine a finding more closely. This prevents context overload during synthesis.
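
A minimal sketch of the evidence directory setup follows. The timestamp format and the one-file-per-persona naming are assumptions; the text only specifies the `~/tmp/design-eval-<timestamp>/` location. The `base` and `timestamp` parameters exist so the sketch can be exercised without touching the home directory.

```python
import time
from pathlib import Path
from typing import Dict, Iterable, Optional


def create_evidence_dir(
    personas: Iterable[str],
    base: Optional[str] = None,
    timestamp: Optional[str] = None,
) -> Dict[str, Path]:
    """Create the evidence directory and return one file path per persona."""
    root_base = Path(base) if base is not None else Path.home() / "tmp"
    # Assumed timestamp format; any unique, sortable stamp would do.
    stamp = timestamp or time.strftime("%Y%m%d-%H%M%S")
    root = root_base / f"design-eval-{stamp}"
    # Fail loudly if the directory already exists, so runs never share evidence.
    root.mkdir(parents=True, exist_ok=False)
    return {name: root / f"{name}.md" for name in personas}
```

The returned paths are what the orchestrator would hand to each persona and, later, to the synthesizer.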

Evaluation personas identify problems and assess impact — they do not categorize their own findings as Rejection/Adjustment/Tension. Categorization is the synthesis step's responsibility.

Evaluation persona output format

Each evaluation persona produces two artifacts:

Evidence file — written to the path provided by the orchestrator. Contains the full detailed analysis: every finding with complete evidence, full design element excerpts, detailed reasoning, and complete pass/fail evaluations. This is the authoritative record.

Summary (returned to orchestrator) — a structured summary for the synthesizer to work from:

Findings — ordered by impact (Structural > Significant > Minor). Each:

Design element: The type, boundary, API, or interaction being evaluated
Concern: What the problem is
Impact: Structural (requires rethinking the approach), significant (requires notable changes to the candidate), or minor (small adjustment)
Evidence: Brief — full evidence is in the file
Suggested change: If applicable

The impact assessment helps the synthesizer with categorization without pre-empting it. A persona's "structural" assessment is a strong signal toward Rejection, but the synthesizer may disagree if it sees the concern addressed by another persona's suggestion.

Passes — things checked that look correct. Brief.

Uncertainties — things the persona couldn't determine, and why.
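
The summary structure described above could be modeled as follows. This is an illustrative sketch only: the class and field names are hypothetical, and the skill does not prescribe a machine-readable format.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import List, Optional


class Impact(IntEnum):
    # Lower value sorts first, giving Structural > Significant > Minor.
    STRUCTURAL = 0
    SIGNIFICANT = 1
    MINOR = 2


@dataclass(frozen=True)
class Finding:
    design_element: str   # the type, boundary, API, or interaction evaluated
    concern: str          # what the problem is
    impact: Impact
    evidence: str         # brief; full evidence lives in the evidence file
    suggested_change: Optional[str] = None


@dataclass
class PersonaSummary:
    persona: str
    evidence_file: str
    findings: List[Finding]
    passes: List[str]
    uncertainties: List[str]

    def ordered_findings(self) -> List[Finding]:
        """Findings ordered by impact, most severe first."""
        return sorted(self.findings, key=lambda f: f.impact)
```

There is deliberately no Rejection/Adjustment/Tension field here, since categorization belongs to the synthesis step.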

Stage 2 synthesis works from the persona summaries and categorizes each finding. Provide the paths to each persona's evidence file so the synthesizer can dig in when it needs more context — when impact assessments conflict, when a finding's summary is ambiguous, or when it needs to verify the evidence supports the concern.

Stage 2 synthesis categorizes each finding:

Rejection: A structural problem that invalidates the design direction. The candidate cannot be fixed by adjustment — the design personas need to rethink the approach. The rejection includes why the direction fails and what constraint it violates.
Adjustment: A specific aspect that needs to change, but the overall direction is sound. Becomes a constraint for the next design iteration.
Tension: A fundamental conflict that the personas cannot resolve — it requires human judgment. Collected and presented at the end.

The loop

After stage 2 synthesis:

If there are no rejections or adjustments, the loop terminates, whether or not tensions remain. Present the design, along with any tensions, for human review.
If there are adjustments and/or rejections, feed them back to the design personas. Each design persona receives: the prior candidate design, the original problem statement, and the categorized findings. Rejections include the rationale for why the direction failed — design personas should explore a different approach, not patch the rejected one. Adjustments include the specific change needed and why — design personas should revise the candidate to incorporate them as constraints.
The design personas run again with this context, producing a revised candidate.
Evaluation runs again on the revised candidate. Evaluation personas run with fresh context (no knowledge of prior evaluations). The synthesis step receives the full history so it can track convergence.
Repeat until clean or until convergence failure is detected.
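
The control flow of the loop can be sketched as below. All names are hypothetical: `design` stands in for the Stage 1 ensemble plus synthesis, `evaluate` for the Stage 2 ensemble plus categorization, and `is_stuck` for the orchestrator's convergence check. Findings are represented as `(category, concern)` pairs purely for illustration.

```python
from typing import Callable, List, Optional, Tuple

Finding = Tuple[str, str]            # (category, concern)
History = List[Tuple[str, List[Finding]]]


def run_design_loop(
    problem: str,
    design: Callable[[str, Optional[str], List[Finding]], str],
    evaluate: Callable[[str], List[Finding]],
    is_stuck: Callable[[History], bool],
) -> Tuple[str, Optional[str], History]:
    """Iterate design and evaluation until clean or until escalation."""
    history: History = []
    candidate = design(problem, None, [])
    while True:
        # Evaluation sees only the candidate: fresh context each time.
        findings = evaluate(candidate)
        # The full history is kept so synthesis can track convergence.
        history.append((candidate, findings))
        actionable = [f for f in findings if f[0] in ("rejection", "adjustment")]
        if not actionable:
            return "accepted", candidate, history
        if is_stuck(history):
            return "escalated", None, history
        # Design personas get the prior candidate, the problem, and the findings.
        candidate = design(problem, candidate, findings)
```

On escalation the sketch returns no accepted candidate, only the history.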

Convergence failure

The orchestrator monitors the loop for signs that it isn't converging:

The same evaluation concern keeps appearing across iterations, even after design revisions attempt to address it
Rejections and adjustments are contradicting each other (fixing one evaluation concern breaks another)
The design is growing more complex with each iteration rather than settling

When a convergence failure is detected, the orchestrator stops the loop and escalates to the human: "Here's the fundamental tension — these concerns pull in opposite directions, and we need you to decide which matters more." This is not a failure of the process. Surfacing genuine tensions is one of its primary outputs.
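
Two of the three signals above lend themselves to a rough mechanical check, sketched here under loud assumptions: concerns are matched by exact string, "keeps appearing" is read as three consecutive iterations, and design complexity is approximated by a count of design elements per iteration. None of these thresholds or measures come from the skill itself, and detecting contradictory findings is left to the orchestrator's judgment.

```python
from typing import List, Sequence, Set


def recurring_concerns(
    history: Sequence[Sequence[str]], min_iterations: int = 3
) -> Set[str]:
    """Concerns present in each of the last `min_iterations` evaluations."""
    if len(history) < min_iterations:
        return set()
    recent = history[-min_iterations:]
    common = set(recent[0])
    for concerns in recent[1:]:
        common &= set(concerns)
    return common


def complexity_is_growing(
    element_counts: List[int], min_iterations: int = 3
) -> bool:
    """True if the element count rose strictly over the last iterations."""
    recent = element_counts[-min_iterations:]
    if len(recent) < min_iterations:
        return False
    return all(a < b for a, b in zip(recent, recent[1:]))
```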

Output

The final output includes:

Accepted design (if one emerged): The candidate that passed evaluation, with consumer sketches, type definitions, and boundary decisions.
Rejected designs: Each candidate that was rejected during the loop, with the rejection rationale. These are valuable — they document explored territory and why it didn't work.
Unresolved tensions: Findings categorized as tensions that require human judgment.

If the loop terminated via convergence failure rather than a clean evaluation, there is no accepted design — only the history of attempts, the rejections, and the tensions.

Process: lightweight mode

Lightweight mode uses fewer personas and a single pass. It keeps all three design personas but reduces evaluation to two personas and drops the feedback loop. Load pony-ensemble for the mechanical process.

Orchestrator pre-spawn: understand the problem

Same as full mode. Before spawning personas, decompose the problem statement to the underlying pain point. If the problem statement implies a solution, peel it back or ask the human. Brief personas with the actual problem.

Stage 1: Design

Three design personas explore the problem in parallel — the same as full mode. The same disciplines apply; the decorrelation still comes from different entry points.

File	Focus
`consumer-first.md`	Starts from usage code, derives API from what makes call sites clean
`skeptic.md`	Questions every abstraction, tries to subtract, proposes the smallest design
`principle-checker.md`	Hard verification of each design principle with evidence

Orchestrator post-return: evaluate exploration quality

Same as full mode. Evaluate whether personas explored meaningfully different approaches before forwarding to synthesis. If all three anchored on the same narrow interpretation, re-brief and re-spawn.

Stage 1 synthesis produces a candidate design using the standard ensemble synthesis process. The same synthesis guidance as full mode applies: consumer-first vs skeptic tensions reveal boundaries, principle-checker violations the others missed are the highest-value findings, and convergence from different starting points is strong signal.

Stage 2: Evaluation

Two evaluation personas stress-test the candidate. The adversarial evaluator always runs. The second slot is context-dependent — if the human specifies which evaluator to use, use that. Otherwise the orchestrator picks whichever lens is most relevant to the task:

When	Pick
Design touches trust boundaries or external input	`security.md`
Design is on a hot path or introduces coordination points	`performance.md`
Design has complex state or will be hard to test in isolation	`testability.md`

Pick whichever is closest — every design has some risk profile. If multiple conditions apply, pick the most relevant one. If the reason for picking a particular evaluator is a characteristic that also appears in the full-mode selection criteria, that's a signal the task warrants full mode — don't use the persona pick to compensate for a wrong mode selection.
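
The evaluator choice can be sketched as a small function. The relevance scores are a hypothetical stand-in for the orchestrator's judgment about which condition in the table applies most strongly; only the file names come from the text.

```python
from typing import Dict, List, Optional

SECOND_SLOT = ("security.md", "performance.md", "testability.md")


def pick_evaluators(
    relevance: Dict[str, float], human_choice: Optional[str] = None
) -> List[str]:
    """Return the two lightweight-mode evaluators; adversarial always runs."""
    if human_choice is not None:
        second = human_choice          # an explicit human choice wins
    else:
        # Highest relevance wins; ties fall back to the table order.
        second = max(SECOND_SLOT, key=lambda name: relevance.get(name, 0.0))
    return ["adversarial.md", second]
```

For example, `pick_evaluators({"performance.md": 0.9, "security.md": 0.2})` selects the adversarial and performance evaluators.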

Before spawning evaluation personas, create a temporary directory for evidence files (~/tmp/design-eval-<timestamp>/), same as full mode. Each persona writes its detailed analysis to a file in this directory and returns a structured summary. Evaluation personas use the same output format as full mode (evidence file + summary with findings ordered by impact, passes, uncertainties).

Stage 2 synthesis categorizes each finding as Rejection, Adjustment, or Tension using the same scheme as full mode.

No loop — single pass

Lightweight mode does not iterate. After Stage 2 synthesis:

Adjustments and tensions: Present the design with findings to the human. Adjustments are expected to be small enough that the orchestrator or human can apply them directly. If the adjustments collectively amount to a redesign rather than a tweak, treat that as an escalation signal, like an unexpectedly high density of findings, and present it to the human.
Rejection: The design direction is wrong. Present the rejected candidate, the rejection rationale, and all other findings to the human. The human decides what to do — escalate to full mode, fix it directly, rethink the problem statement, or something else. Lightweight doesn't prescribe the response; it presents the information.

If the review produces an unexpectedly high density of findings relative to the change size, if a finding reveals the approach is fundamentally wrong, or if a finding reveals the change touches more subsystems or has more complex interactions than the mode selection assumed, the orchestrator presents this to the human. The human decides what to do — the same options apply.

Output

The final output includes:

Candidate design: With consumer sketches and type definitions.
Adjustments: Specific changes needed, small enough to apply directly.
Tensions: Conflicts requiring human judgment.
Rejection rationale: If applicable — the structural finding and why the design direction was rejected.

Design values

When principles conflict, these values set the priority. We value the left side over the right — but the right side still matters when the left isn't at stake.

API safety over API minimality — an error-prone API should be fixed even if the fix adds surface. Prefer solutions that don't expand the API, but never leave a footgun to preserve minimality.

Correctness over performance — never sacrifice correctness for speed. Get it right first, then optimize. A faster wrong answer is still wrong.

Correctness over concision — correct but verbose beats concise but wrong. Don't simplify code or APIs at the cost of correct behavior.

Security over performance — never skip validation at trust boundaries for speed. Optimize how you validate, not whether you validate. Security is correctness.

Interface simplicity over implementation simplicity — accept a harder implementation to give users a clean interface. The consumer's experience matters more than the implementer's convenience.

Performance over interface simplicity — runtime speed matters more than programmer convenience. It's acceptable to make things harder on the user to improve performance, but never at the cost of correctness.

Simplicity over consistency — don't force artificial consistency when it makes things harder to use. If two similar things genuinely need different interfaces, let them be different.

Explicitness over implicitness — when the language allows something to work by magic (implicit conversions, convention-based wiring, unnamed dependencies), prefer the version that states what's happening. The cost of a few extra characters is less than the cost of reconstructing hidden knowledge.

Type safety over convenience — use the type system to encode constraints even when it's more work. Distinct types for distinct semantics, validated wrappers over raw primitives, explicit error vocabularies over generic errors. "We could just use a String here" is almost always wrong.

Changeability over predictive design — make designs modular and replaceable so future needs can be accommodated, but don't add abstractions, extension points, or features for changes that haven't happened yet. Easy to modify beats designed for a specific predicted modification.
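
As an illustration of the "type safety over convenience" value, the sketch below uses distinct types and a validated wrapper instead of raw strings and integers. The types are hypothetical and the email check is deliberately simplistic. Python does not enforce annotations at runtime, so swapped arguments would be caught by a static type checker rather than by the interpreter.

```python
from dataclasses import dataclass


class InvalidEmail(ValueError):
    """Explicit error vocabulary rather than a generic exception."""


@dataclass(frozen=True)
class EmailAddress:
    """Validated wrapper: an instance exists only if validation passed."""
    value: str

    def __post_init__(self) -> None:
        local, sep, domain = self.value.partition("@")
        # Simplistic check, for illustration only.
        if not (sep and local and "." in domain):
            raise InvalidEmail(f"not an email address: {self.value!r}")


@dataclass(frozen=True)
class UserId:
    """Distinct type so a user id cannot be confused with other integers."""
    value: int


def send_welcome(to: EmailAddress, user: UserId) -> str:
    # No validation needed here: the types carry the guarantee.
    return f"welcome mail for user {user.value} queued to {to.value}"
```

The "we could just use a String here" version would push validation into every function that receives the address.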

The disciplines

These are the foundation each persona builds on. Every agent applies all of them.

Start from the problem, not the solution

State what problem the user has before proposing any types, traits, or APIs. "The user needs X" comes before "here's a SessionStore trait." If you can't articulate the problem without referencing your solution, you don't understand the problem yet.

Explore before committing

If you only have one idea, you don't have any. The first interpretation of a problem statement is pattern retrieval — the LLM equivalent of "this looks like a thing I've seen before." Design starts when you generate a second interpretation that's genuinely different, not a variation on the first.

Before committing to a direction, explore the design space. Generate multiple approaches internally — different framings of what the problem is asking for, not just different implementations of the same framing. Include wild ideas. They're valuable not because you'll use them, but because they reveal what matters about the problem. An idea you reject teaches you something about why you rejected it — that "why" is design knowledge.

Present your best idea, not your first idea. The output is the winner of an internal competition. The ensemble output format has a "Key decisions: alternatives considered" section — use it to document what you explored, what you picked, why, and why you rejected the alternatives. This isn't bookkeeping; it gives the orchestrator visibility into whether real exploration happened.

The problem statement is where exploration starts, not where it ends. "Add convenience wrapper methods" is a hint about a pain point, not a design specification. What pain? Why is the current approach painful? What would "convenience" actually mean to the user? Different answers to those questions lead to genuinely different designs — one of which might be "thin wrappers" and another might be "a unified API that abstracts away the underlying complexity." Both are valid interpretations of "convenience"; only exploration reveals which one earns its keep.

Sketch consumer code first

Before designing any API, write the code that uses it. The handler, the call site, the configuration — the actual application code a user would write. Not as an afterthought example, but as the first artifact. The consumer sketch is the specification. It reveals:

What the API actually needs to provide (and what it doesn't)
Where type safety breaks down (runtime casts, stringly-typed maps)
Whether the abstraction can actually serve its purpose (can middleware do async work? can the handler access typed data?)
What the error paths look like from the consumer's perspective

If the consumer code is awkward, the API is wrong. Fix the API, not the consumer code.

When claiming consistency between two APIs (e.g., "guards use the same API as handlers"), write both consumer sketches side by side. If the method names, signatures, or interaction patterns differ, the claim is false — address the discrepancy before proceeding.
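
A small, hypothetical illustration of working consumer-first: the call site is written before the API, and the API is then the smallest thing that makes that call site work. The `Routes` class and its methods are invented for this sketch and do not describe any real framework.

```python
from typing import Callable, Dict, Optional, Tuple


# Consumer sketch, written first. It is the specification.
def show_invoice(invoice_id: int) -> str:
    return f"invoice {invoice_id}"


def build_routes(routes: "Routes") -> None:
    # The call site wants a typed parameter and no string parsing in handlers.
    routes.add("/invoices/", int, show_invoice)


# Smallest API that makes the sketch above work.
class Routes:
    def __init__(self) -> None:
        self._table: Dict[str, Tuple[Callable[[str], object], Callable[..., str]]] = {}

    def add(self, prefix: str, parse: Callable[[str], object],
            handler: Callable[..., str]) -> None:
        self._table[prefix] = (parse, handler)

    def dispatch(self, path: str) -> Optional[str]:
        for prefix, (parse, handler) in self._table.items():
            if path.startswith(prefix):
                try:
                    value = parse(path[len(prefix):])
                except ValueError:
                    return None  # malformed parameter never reaches the handler
                return handler(value)
        return None
```

The sketch already answers two of the questions listed above: the handler receives typed data, and the error path for a malformed parameter is visible from the consumer's side.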

Inventory before inventing

Before proposing a new type, trait, or abstraction, write down what already exists that addresses the same need: in the codebase, in the language's stdlib, in the ecosystem. If nothing exists, say so explicitly. If something exists, start from it — extend, adapt, or compose it rather than building a parallel structure.

On a greenfield project, "what already exists" means the language's built-in types, stdlib, and idioms. A new type that duplicates what the language provides is a smell.

This is not "reuse for reuse's sake." It's a forcing function against the pattern-matching tendency to invent new abstractions when the problem doesn't require them.

Build up incrementally

Don't design the whole thing at once. Start with the smallest coherent piece. Validate it with a consumer sketch. Then add the next piece and see if it fits.

At each step, ask:

Does the new piece fit naturally with what's already there?
Or is it fighting the existing design?
If it's fighting, is the problem upstream? Would a different foundation make this piece fit naturally?

This is how you discover the shape of the problem. A big-bang design papers over these tensions. Incremental exploration surfaces them while they're cheap to fix.

Every step changes what you can see

Every design decision is made with incomplete information. As you explore further, the territory expands. At step B you could see one option and picked it. At step D you can see two options that weren't visible from B. Step C might provide evidence that the option you didn't pick is actually better. None of this means B was wrong at the time — it means the landscape changed and you need to look again.

The way you discover this is by constantly pushing on the design: "what if we did it this other way?" "Does this conform to our principles?" "This doesn't feel right — why?" These aren't idle questions. They're how you explore more of the map. Each question might reveal new options, new evidence, or new connections between decisions you thought were independent.

When the landscape changes, trace back through prior decisions. Not to check whether they were "disproved" — that's too binary. Check whether the option space has expanded. Maybe at step B you picked X because it was the only option you could see. Now at step D you can see X and Y. Which is better given everything you've learned? Maybe X is still right. Maybe Y is clearly better. Maybe you need to explore further to tell. All three of those outcomes require you to actually go back and look rather than assuming B is settled.

This is the core of the design loop. Skipping it is how you end up with designs that look coherent on the surface but have quiet contradictions baked in — an app-level-only registration model sitting next to a radix tree that already knows the route hierarchy, because nobody went back to check whether "app-level only" still made sense after discovering what the tree could do.

The cost of revisiting decisions is real but bounded. The cost of building on a decision that should have been revisited compounds with every step forward.

Question every abstraction

For each type, trait, or interface in your design, ask: is this here because the problem requires it, or because other systems have it? "Sessions usually have a SessionStore" is not a reason. "The framework needs to persist session data" is a reason — but only if the framework actually needs to own that responsibility.

The strongest signal that you're importing rather than discovering: your design has the same nouns as Rails/Phoenix/Express/Django and you're working in a language with fundamentally different idioms.

Name things precisely

Names are the primary interface of a design — they're how it communicates intent to every future reader. A design that's structurally sound can still fail in practice because a name suggested something different from what the thing actually does, and the user built a wrong mental model from it.

This discipline earns its keep at boundaries: public API types, method names, parameter names, module names — anywhere a user encounters a name and forms an expectation about what it means. Internal names matter too, but the cost of a misleading public name is much higher because it shapes how every consumer understands the system.

At each design step, ask:

Does the name come from the problem domain or the solution domain? Problem-domain names ("invoice," "route," "subscription") connect the code to what users already understand. Solution-domain names ("manager," "handler," "processor") describe implementation roles that tell the user nothing about what the thing actually does. Prefer problem-domain names for types the user interacts with; reserve solution-domain names for internal machinery where they're the clearest description of the role. Note that some of these words become problem-domain vocabulary in specific contexts — "handler" is the natural term in web frameworks and event systems. The concern is with using them as generic suffixes that avoid naming what the thing actually handles or manages.
Does the name describe what this thing does, or just what category it belongs to? "Validator" says it validates — but what? "InvoiceAmountValidator" says what it validates. "Store" says it stores — but a UserStore that also sends email notifications lies about its responsibilities. A name that categorizes without specifying is a name that lets scope creep in without anyone noticing.
Could the name mislead a user about what this thing does? If someone reading only the name would form a wrong expectation about the behavior, the name is wrong. This includes names that are accurate but incomplete — a Cache that also writes through to the database is misleading because the name suggests read-only caching.
Are there names that sound similar but mean different things? Session and SessionState, Token and TokenData, Route and Router — when names differ by a suffix or prefix, users will assume the relationship is systematic. If SessionState is not the state of a Session, the naming implies a relationship that doesn't exist.
Are there names that sound different but mean the same thing? If the design uses "user" in one place and "account" in another for the same concept, readers will assume these are different things and spend effort trying to understand the distinction. One concept, one name, everywhere. This is about vocabulary consistency, not boundary-qualified variants like UserInput and UserRecord: those are distinct types that need distinct names per "Distinguish values with distinct semantics."

"Distinguish values with distinct semantics" ensures different concepts get different types. This discipline ensures those types get names that communicate what they actually are. A type can be correctly distinct and still misleading if its name suggests the wrong thing.

When this discipline pushes for more precise names and the skeptic questions whether the named concepts need to exist at all, surface the tension — both are needed. Good names make abstractions easier to evaluate: a precisely named type reveals its purpose, which either justifies or undermines its existence.

Reason about ownership boundaries

For every capability in the design, ask: does the framework/library own this, or does the user own this? The answer should come from analysis of the consumer sketch, not from "frameworks usually own this."

The test: if you removed this from the framework and the user did it themselves, would anything break? Would anything get worse? If the user can do it better (more type-safe, more flexible, more natural in the language), the framework shouldn't own it.

This discipline draws the line between framework and user — the external boundary. For application-level design where the question is how to organize within the user's code, see "Separate layers."

Separate layers

"Reason about ownership boundaries" asks whether the framework or the user should own a capability — the external boundary. This discipline asks the internal question: within the application, does each piece of logic live in the right layer?

The layering assumed here has three layers: domain logic (pure business rules, zero infrastructure dependencies), orchestration (combines domain logic with infrastructure such as databases, caches, and queues), and presentation (adapts orchestration for a specific protocol such as HTTP, GraphQL, or a CLI). The questions below operationalize that separation at design time.

Scope: This discipline applies when the skill is used for application-level design — systems with distinct domain, orchestration, and presentation concerns. For library or API design where there's no application layering, this discipline is not relevant. The ownership boundary discipline covers the framework/library boundary instead. Note that "infrastructure" in the questions below means infrastructure relative to the application's domain — a deployment tool's domain vocabulary naturally includes container and orchestration concepts, and "infrastructure" for that tool means the specific database or message queue it uses to do its work, not the deployment concepts it operates on.

At each design step, ask:

Which layer does this belong to? For every type, function, or interaction in the design, classify it: domain, orchestration, or presentation. If you can't classify it cleanly, the boundaries are blurred — which usually means domain logic has acquired an infrastructure dependency or presentation logic has absorbed business rules.
Does this domain type depend on any infrastructure? Domain types should have zero infrastructure dependencies — no database clients, no HTTP types, no cache interfaces. If a domain type needs to talk to infrastructure, it belongs in orchestration, or the domain type needs to express its need through an interface that orchestration satisfies.
Is orchestration leaking into domain types? The tell: a domain type that knows about connection pools, transaction boundaries, or retry policies. Domain logic defines what should happen; orchestration decides how to make it happen with real infrastructure.
Is domain logic leaking into presentation? The tell: a request handler that contains business rules instead of delegating to orchestration. When business logic lives in the presentation layer, it can't be reused by a different presentation (a CLI that needs the same logic as the HTTP API).
Could you swap the presentation layer without touching domain logic? HTTP → CLI, REST → GraphQL, synchronous → message-queue-driven. If swapping the presentation layer requires changes to domain types, those types have presentation concerns baked in. This is a thought experiment, not a requirement to actually build multiple presentations — it tests whether the boundary is clean.
Is the presentation layer coupled to orchestration internals? The tell: a request handler that manages transaction boundaries, knows about caching strategies, or handles retry logic instead of delegating those concerns to orchestration. Presentation should call orchestration and receive results — it shouldn't know how orchestration coordinates infrastructure.
Are there domain concepts that only exist because of an infrastructure choice? A PaginatedResult type in the domain layer exists because the database returns paginated results — that's infrastructure leaking into domain vocabulary. The domain might need "a bounded subset of results" but shouldn't know that the bound comes from database cursor limits.

The test for clean layer separation: can you describe the domain logic without mentioning any infrastructure technology? If explaining a business rule requires saying "database," "HTTP," "cache," or "queue," the rule has infrastructure dependencies that should live in orchestration instead.
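A minimal Python sketch of the three layers may make the questions above easier to apply. Every name here (Invoice, InvoiceStore, BillingService, render_cli, InMemoryStore) is hypothetical and chosen only for illustration; the discount rule is an assumed business rule, not one taken from any real system.

```python
from dataclasses import dataclass
from typing import Protocol


# Domain: a pure business rule with no infrastructure imports.
@dataclass(frozen=True)
class Invoice:
    subtotal_cents: int
    discount_percent: int

    def total_cents(self) -> int:
        # Integer arithmetic keeps the rule deterministic.
        return self.subtotal_cents * (100 - self.discount_percent) // 100


# Orchestration depends on an interface, not on a concrete database client.
class InvoiceStore(Protocol):
    def load(self, invoice_id: str) -> Invoice: ...


# Orchestration: combines the domain rule with infrastructure.
class BillingService:
    def __init__(self, store: InvoiceStore) -> None:
        self._store = store

    def amount_due(self, invoice_id: str) -> int:
        return self._store.load(invoice_id).total_cents()


# Presentation: adapts orchestration to one protocol (here, a CLI line).
def render_cli(service: BillingService, invoice_id: str) -> str:
    cents = service.amount_due(invoice_id)
    return f"{invoice_id}: {cents // 100}.{cents % 100:02d}"


# Hypothetical in-memory stand-in for a real database.
class InMemoryStore:
    def __init__(self, invoices: dict[str, Invoice]) -> None:
        self._invoices = invoices

    def load(self, invoice_id: str) -> Invoice:
        return self._invoices[invoice_id]
```

In this sketch the discount rule can be described without naming any infrastructure, and replacing render_cli with an HTTP handler would leave Invoice untouched, which is the swap thought experiment in miniature.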

"Check cohesion" asks whether things belong together within a type. This discipline asks a related but distinct question: do things belong together within a layer? A type can be internally cohesive but live in the wrong layer — a well-structured UserService that's cohesive in its responsibilities but mixes domain rules with database queries. Cohesion would call it fine (all methods serve "user management"); layer separation would flag the infrastructure dependency in domain logic.

"Reason about ownership boundaries" draws the line between framework and user. This discipline draws lines within the user's code. The two work at different scales but share a principle: clear boundaries make it obvious where each concern lives and prevent responsibilities from migrating to the wrong side.

When this discipline pushes toward extracting infrastructure from domain types and the skeptic questions whether the indirection is worth it, surface the tension. For small applications or early-stage code, the cost of clean separation may exceed the benefit — the answer depends on the scale and expected evolution of the application. But be honest about the tradeoff: coupling that's cheap today gets expensive as the application grows, and untangling it later costs more than separating it now. The skeptic's role here is to prevent ceremony — extracting an interface that has exactly one implementation and no prospect of a second is indirection without value. The discipline's role is to prevent entanglement — domain logic that can't be tested, reused, or understood without its infrastructure context. When both concerns are real, that's a tension worth presenting to the human.

Map state explicitly

State is where most design complexity hides. Boolean flags, nullable fields, and implicit modes are all state — just state without a name. Unnamed state can't be reasoned about systematically: you can't enumerate its transitions, verify its invariants, or ask what happens when an input arrives in a state you didn't consider.

This discipline earns its keep when a component has multiple interacting pieces of state, or when state determines which operations are valid. A single flag that independently tracks one condition doesn't need a state machine — it needs a good name. For everything else, answer these questions:

What states can it be in? Not just the happy-path states — include initialization, error recovery, shutdown, and any transitional states between them.
What are the transitions? What event or action moves the system from one state to another? Are there transitions the design allows but shouldn't?
What invariants hold in each state? What can the rest of the system assume about a component that's in state X? For invariants that span beyond individual states — cross-component relationships, ordering constraints — see "Articulate invariants."
What happens when an input arrives in a state where it doesn't apply? The design should answer this explicitly — ignore, error, queue, crash — rather than leaving it undefined.
Can the type system encode the states? If the language supports it (union types, sealed classes, enums, trait-based state machines), prefer type-level encoding over convention-level tracking. When the type system encodes the states, the compiler enforces completeness — every handler must account for every state.

The test for whether you've mapped state well: can someone new to the design draw the state diagram from your description? If not, the state model isn't explicit enough.
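As a rough illustration of type-level state encoding, the sketch below models a hypothetical connection with one class per state. The state names, the on_send and on_connect_result functions, and the choices made for inapplicable inputs (queue, reject, ignore) are all assumptions made for the example. Python only approximates compiler-enforced completeness: a static type checker can help, but at runtime the final raise is the only guard.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Disconnected:
    pass


@dataclass(frozen=True)
class Connecting:
    attempt: int


@dataclass(frozen=True)
class Connected:
    session_id: str  # invariant: only a Connected value carries a session


@dataclass(frozen=True)
class Failed:
    reason: str


ConnectionState = Union[Disconnected, Connecting, Connected, Failed]


def on_send(state: ConnectionState, payload: str) -> str:
    """State explicitly what a 'send' input does in every state."""
    if isinstance(state, Connected):
        return f"sent {payload!r} on {state.session_id}"
    if isinstance(state, Connecting):
        return "queued"  # explicit choice: queue until connected
    if isinstance(state, (Disconnected, Failed)):
        return "rejected"  # explicit choice: report an error, not silence
    raise TypeError(f"unhandled state: {state!r}")


def on_connect_result(state: ConnectionState, ok: bool) -> ConnectionState:
    """Only Connecting reacts to a connect result; other states ignore it."""
    if isinstance(state, Connecting):
        if ok:
            return Connected(session_id=f"s{state.attempt}")
        return Failed(reason="connect refused")
    return state
```

Because session_id exists only on Connected, there is no nullable field to check and no boolean flag combination that could describe a session without a connection.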

Articulate invariants

Every design establishes contracts — things that must always be true for the system to work correctly. Callers rely on ordering guarantees, components depend on data relationships, operations assume structural properties hold. When these contracts are implicit, nobody can verify them, nobody knows when they've been violated, and future changes break them without anyone realizing what happened until the symptoms surface far from the cause.

State modeling asks "what states can this component be in?" Invariant articulation asks the broader question: what must always be true across the entire design? Some invariants are specific to individual states within a single component (and "Map state explicitly" covers those). But many span the design — relationships between components, ordering guarantees across sequences of operations, structural properties that the system maintains by construction or by convention.

At each design step, ask:

What can callers rely on? What guarantees does this API make to its consumers? Not just what it returns, but what it promises about the state of the system after it returns. If a caller can't answer "what did this operation guarantee?" from the API alone, the contract is unstated.
What are the pre/post conditions of operations? What must be true before an operation is called, and what does it establish afterward? Preconditions that exist but aren't stated become silent failure modes — the operation "works" but produces wrong results because the caller didn't know what it needed.
What relationships between components are maintained by construction vs. by convention? Relationships maintained by construction (enforced by the type system, by encapsulation, by the structure of the code) are invariants callers can trust without knowing about them. Relationships maintained by convention (documented rules, expected call sequences, assumed configurations) are invariants that break when someone doesn't read the documentation.
Which invariants does the type system enforce, and which depend on correct usage? This is the enforcement boundary. Invariants the type system enforces are guarantees. Invariants that depend on correct usage are hopes — they hold until someone doesn't know about them. For each convention-dependent invariant, ask whether the type system could enforce it instead, and what the cost would be.

The test for whether invariants are well-articulated: can a new team member, reading only the design, list the contracts they must not violate? If any contract requires reading the implementation to discover, it's implicit.
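The difference between an invariant held by convention and one held by construction can be sketched with a small, hypothetical date-range example. The names nights_by_convention and DateRange are invented for illustration, and the rule "end is not before start" is an assumed contract.

```python
from dataclasses import dataclass
from datetime import date


# By convention: nothing stops a caller from passing end before start,
# so the function "works" and returns a wrong (negative) result.
def nights_by_convention(start: date, end: date) -> int:
    return (end - start).days


# By construction: an invalid range cannot be created, so callers of
# nights() can rely on the rule without knowing it exists.
@dataclass(frozen=True)
class DateRange:
    start: date
    end: date

    def __post_init__(self) -> None:
        # Precondition checked once, at the only place a range is made.
        if self.end < self.start:
            raise ValueError(f"end {self.end} is before start {self.start}")

    def nights(self) -> int:
        # Postcondition: the result is never negative.
        return (self.end - self.start).days
```

The cost of moving the invariant into the type is one extra type and a constructor that can fail; whether that cost is worth paying is the question to put to the skeptic.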

When this discipline pushes toward articulating more invariants and the skeptic questions whether those contracts are necessary, surface the tension — both are needed. The skeptic prevents over-specification (inventing invariants the design doesn't actually need), while invariant articulation prevents under-specification (leaving real contracts unstated). The design process should make the tension visible so it gets an explicit decision.

Note that the skeptic's standard subtraction test — "what breaks if we remove this?" — is misleading for invariants. Removing an invariant breaks nothing now. The cost is deferred: it shows up when a future change violates a contract nobody knew existed. The right skeptic test for invariants is "what does this prevent?" — not what breaks today, but what incorrect states or sequences does this contract make impossible.

Check cohesion

The skeptic asks "should we remove this?" — the subtraction lens. Cohesion is the complementary grouping lens: do the things that survived subtraction belong together?

A type or module can pass every other discipline and still be a grab-bag of loosely related functionality. Each piece is justified individually — the skeptic couldn't remove any of them, the names are precise, the state is mapped, the invariants are articulated. But the pieces don't cohere. They ended up in the same type because they were discovered at the same time, or because they operate on the same data, or because nobody asked whether "same type" was the right grouping.

The cost of low cohesion is indirect: changes to one responsibility ripple into code that deals with an unrelated responsibility, because they share a type. Tests for one responsibility drag in setup for the other. The type's name becomes either dishonest (describing only one of its jobs) or vague (describing the grab-bag with a word like "service" or "utils").

At each design step, ask:

Does this type have a single coherent purpose? If you had to describe what it does in one sentence without using "and," could you? If "and" connects steps in a single workflow or facets of one responsibility ("authenticates users and issues tokens"), the type is likely cohesive. If "and" connects responsibilities that could each exist independently ("authenticates users and sends email notifications"), it's doing more than one thing. This isn't about counting methods — a type with many methods can be cohesive if they all serve the same purpose.
Are there methods or fields that are only used together with a subset of the other methods or fields? If a type has two clusters of functionality where each cluster uses its own fields and rarely touches the other's, the type is two things wearing a trenchcoat. The clusters should probably be separate types.
Would a change to one responsibility force changes to unrelated code in the same unit? If modifying how authentication works requires touching code that deals with request routing — not because they interact, but because they share a type — the grouping is wrong.
Did these things end up together for a reason, or by accident? Common accidents: they were discovered at the same time, they operate on the same raw data, they're called from the same place, or they were "too small" to extract. None of these are reasons for cohesion — they're reasons for proximity, which is different.
For modules or packages: do the types serve a single coherent theme? A module whose types all participate in one domain concept is cohesive. A module that's a drawer for unrelated helpers (utils, common, misc) has the same problem at a larger scale — the types don't cohere, they just cohabitate.

The test for cohesion: if you split this type along the cluster boundaries, would either half need to reach back into the other to function? If yes, the coupling is real and the grouping may be justified. If no, the grouping is accidental and the type should be split. For data types whose fields represent attributes of a single domain concept and co-travel through the system, cohesion comes from domain identity, not field interdependence — the split test applies to behavioral clusters, not to attribute groupings.
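The split test can be tried on a small, hypothetical example. Below, UserStore holds two clusters that never touch each other's fields; UserDirectory and WelcomeMailer are what the split might look like. All three names are assumptions made for the sketch.

```python
# Before: one type, two clusters that each use only their own field.
class UserStore:
    def __init__(self) -> None:
        self._users: dict[str, str] = {}  # cluster 1: storage
        self.outbox: list[str] = []       # cluster 2: notification

    def save(self, user_id: str, name: str) -> None:
        self._users[user_id] = name

    def name_of(self, user_id: str) -> str:
        return self._users[user_id]

    def notify(self, address: str) -> None:
        self.outbox.append(address)


# After: split along the cluster boundary. Neither half reaches back
# into the other, which suggests the original grouping was accidental.
class UserDirectory:
    def __init__(self) -> None:
        self._users: dict[str, str] = {}

    def save(self, user_id: str, name: str) -> None:
        self._users[user_id] = name

    def name_of(self, user_id: str) -> str:
        return self._users[user_id]


class WelcomeMailer:
    def __init__(self) -> None:
        self.outbox: list[str] = []

    def notify(self, address: str) -> None:
        self.outbox.append(address)
```

Had notify needed to look up a name in the user table, the halves would be coupled and the original grouping might have been justified.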

"Distinguish values with distinct semantics" can independently reach the same conclusion — different responsibilities often mean different semantic guarantees, which that discipline would flag as needing separate types. When both disciplines point at the same split, it's a strong signal — but validate even strong signals against consumer sketches, because two disciplines agreeing on the wrong split is still the wrong split. When only one does, surface the tension.

"Name things precisely" often reveals cohesion failures: a type that can't be named without "and" or that requires a vague name like "context" or "manager" is usually doing too much. The naming discipline catches the symptom; this discipline identifies the structural cause.

When this discipline pushes toward splitting a type and the skeptic resists because each piece is too small to justify its own type, surface the tension. Small pieces that don't cohere are still better separated — a small focused type is easier to understand than a large incoherent one. But the skeptic's resistance may also signal that the pieces genuinely belong together and the cohesion check is being too aggressive. The design process should make the tension visible.

Surface the grain

Every design has directions that are cheap to extend and directions that are expensive to change. When you decide "variants are types implementing a trait" or "data flows through a pipeline of transforms" or "each handler owns its own state," you create a grain. Adding another type to the trait family is with the grain. Changing the data flow from push to pull is against it. Neither direction is inherently right — what matters is knowing which is which so the tradeoff is deliberate.

This is not speculation about future needs. It's understanding the shape of what you're building now. A design where adding a new output format requires touching every handler is fine — if the team knows that and decided the tradeoff was worth it. A design where that same coupling exists but nobody noticed it is a future surprise.

At each design step, ask:

What's easy to add? If this design needed one more variant, handler, or type in the family, where does it slot in? How much existing code needs to change to accommodate it? If the answer is "add a new file and implement the trait" — that's with the grain. If the answer is "touch five existing types to thread the new thing through" — the grain doesn't run that way.
What's expensive to change? If a fundamental assumption changed — data flow direction, ownership boundary, concurrency model, storage strategy — how much of the design survives? Understanding how deep each assumption is embedded lets you make informed decisions about which assumptions to commit to.
Does the grain align with the domain's natural variation? If the design makes it cheap to add new data formats but expensive to add new data sources, and the domain naturally acquires new sources more often than new formats, the grain runs the wrong way.

The consumer-first discipline naturally reveals grain: sketching what "add one more variant" looks like as consumer code shows whether the grain runs in a useful direction. If the sketch for a new variant is clean and minimal — just implement the trait, register it, done — the grain is favorable. If the sketch requires the consumer to understand and modify internals scattered across the codebase, it isn't. When applying grain awareness, extend the consumer sketch beyond just current usage to include at least one "what would adding X look like?" scenario. Pick the most mundane addition — one more variant of the same kind the design already handles, not a fundamentally different kind of thing. If even mundane additions are expensive, the grain is genuinely misaligned. If only exotic additions are expensive, that's usually the correct tradeoff — designs can't be cheap in every direction.

The test for whether you've surfaced the grain: can you state, for each major structural decision, what it makes cheap and what it makes expensive? If you can't answer that for a decision, you don't yet understand its consequences.

"Check cohesion" asks whether things belong together. This discipline asks what the grouping makes cheap or expensive. A cohesive type can still have grain that's misaligned with the domain — the grouping is right, but the extension point runs in the wrong direction. Conversely, grain analysis might reveal that a type with good cohesion should be structured differently to align extension costs with the domain's variation points — which feeds back into whether the current grouping is actually the right one.

"Map state explicitly" makes grain decisions concrete. A choice between a flat enum and a trait-based state machine is fundamentally a grain choice: the enum makes adding transitions cheap (one match in each method) but adding states expensive (touch every match site); the trait-based machine makes adding states cheap (new class, implement the trait) but adding transitions expensive (new method on every state class). Neither is better in the abstract — the right choice depends on whether the domain's variation is in new states or new transitions.

When grain analysis and the skeptic point in different directions, surface the tension — don't silently rework the design. The misalignment may be an acceptable tradeoff given other constraints. The distinction between the two lenses: grain awareness observes the structural consequences of decisions already made; the skeptic guards against acting on predicted needs that haven't materialized. If the tension forces you to articulate why the grain observation matters independent of any specific prediction, that's the process working correctly.
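The enum-versus-trait comparison from this section can be made concrete with a small sketch. It uses a hypothetical traffic light, with Python base classes standing in for traits; Light, LightState, and the function names are invented for illustration.

```python
from enum import Enum


# Shape 1: flat enum. Each operation is one function that branches on state.
class Light(Enum):
    RED = "red"
    GREEN = "green"
    YELLOW = "yellow"


def next_light(light: Light) -> Light:
    if light is Light.RED:
        return Light.GREEN
    if light is Light.GREEN:
        return Light.YELLOW
    return Light.RED


def may_cross(light: Light) -> bool:
    # With the grain: a new operation is one new function like this.
    # Against the grain: a new state means revisiting every such function.
    return light is Light.GREEN


# Shape 2: one class per state. Each state implements every operation.
class LightState:
    def next(self) -> "LightState":
        raise NotImplementedError

    def may_cross(self) -> bool:
        raise NotImplementedError


class Red(LightState):
    def next(self) -> LightState:
        return Green()

    def may_cross(self) -> bool:
        return False


class Green(LightState):
    def next(self) -> LightState:
        return Yellow()

    def may_cross(self) -> bool:
        return True


class Yellow(LightState):
    # With the grain: a new state is one new class like this.
    # Against the grain: a new operation means a new method on every class.
    def next(self) -> LightState:
        return Red()

    def may_cross(self) -> bool:
        return False
```

Both shapes behave identically today. What differs is which future edit stays local, and that is the statement the grain test asks you to be able to make.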

Look for footguns

After sketching a design, look for ways a user could do something that looks correct but fails silently or in non-obvious ways:

Can the user set up a configuration that appears valid but doesn't work?
Can the user call methods in an order that compiles but produces wrong results?
Are there boolean flag combinations that represent illegal states? If so, revisit "Map state explicitly" — the state model is incomplete.
Does the API make it easy to forget a step?
Can the user confuse two values that have different semantics but the same type? If so, revisit "Distinguish values with distinct semantics" — the type boundaries may be too coarse.
Could a name lead the user to misunderstand what something does and use it incorrectly? If so, revisit "Name things precisely" — the name may be misleading or too generic.
Can an operation silently violate an invariant, or does the API rely on an ordering guarantee it doesn't enforce? If so, revisit "Articulate invariants" — the contract may be unstated.
Does a type do multiple unrelated things, making it unclear which part of its API applies to the user's current need? If so, revisit "Check cohesion" — the type may be a grab-bag of responsibilities.
Is the design expensive to extend in a direction the domain naturally varies? If so, revisit "Surface the grain" — the design's cheap-to-extend directions may not align with where variation actually occurs.
Can a domain type only be used when specific infrastructure is available? If so, revisit "Separate layers" — domain logic has acquired an infrastructure dependency.
Is any outcome implicit (success by silence, failure by absence)?
Can the user receive an error and not know what to do with it? If so, revisit "Design error vocabularies" — the error types may be too coarse or missing context.

The questions above target specific failure modes — things that go wrong. But a design can also fail by requiring knowledge it doesn't encode. Implicit mechanisms — setup ordering, required conventions, context-dependent behavior — create knowledge that exists only in the designer's head. After checking for concrete footguns, probe for implicit design knowledge:

What does a user need to know about this design that isn't encoded in the types or API?
Does any behavior depend on context that isn't visible at the call site?
Are there conventions the user must follow that aren't enforced by the type system?
If a user read only the type signatures (no docs, no examples), what would they get wrong?

The last question is a forcing function — it surfaces every piece of implicit knowledge the design depends on. Each answer is either an acceptable documentation requirement or a design problem to fix. If the list is long, the design is too implicit.

A candy-machine interface is one where the user can put the money in the slot and push the button and get something other than what they expected. Good design makes the right thing easy and the wrong thing impossible (or at least loud).
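Two of the questions above, calling methods in the wrong order and failure by absence, can be shown in one short sketch. FragileClient, Client, and Session are hypothetical names, and the "connection" is simulated with an in-memory list.

```python
from dataclasses import dataclass


# Footgun: send() before connect() runs without error and drops the message.
class FragileClient:
    def __init__(self) -> None:
        self._connected = False
        self.delivered: list[str] = []

    def connect(self) -> None:
        self._connected = True

    def send(self, message: str) -> None:
        if self._connected:
            self.delivered.append(message)
        # Otherwise the message is silently ignored (failure by absence).


# Safer shape: send() exists only on the value that connect() returns,
# so the wrong order cannot be written at all.
@dataclass
class Session:
    delivered: list[str]

    def send(self, message: str) -> None:
        self.delivered.append(message)


class Client:
    def connect(self) -> Session:
        return Session(delivered=[])
```

With the second shape, forgetting to connect is no longer a silent runtime outcome: there is simply no send method to call until a Session exists.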

Distinguish values with distinct semantics

Two values that look similar but carry different guarantees, different lifetimes, or different validation states are not the same thing. Sharing a type between them forces callers to use out-of-band knowledge to tell them apart — which guarantee does this String carry? Is this User from input or from the database? Did this Config pass validation?

This discipline is an active counterbalance to subtraction. The skeptic asks "what breaks if we remove this type?" and the answer may be "nothing breaks right now." But that's the wrong test when two values have different semantics. The right test is: can a caller distinguish them without context they shouldn't need? If collapsing two types into one means the caller must remember which kind they're holding, the types should stay separate — even if the fields are identical today.

At each design step, ask:

Do any values in this design have different meanings but share a representation?
If two things look similar, do they carry different guarantees, different lifetimes, or different validation states?
Would collapsing two types into one force callers to use out-of-band knowledge to distinguish them?
At each data boundary (input, storage, output, inter-component), is the type actually the same concept or a similar-looking different concept?
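A small sketch of the input boundary shows what keeping two identical-looking types separate buys. UserInput and UserRecord echo the names used elsewhere in this skill; validate, send_welcome, and the deliberately simplistic email check are assumptions for illustration. Python does not stop other code from constructing a UserRecord directly, so the guarantee here rests on a static type checker plus the convention that only validate creates records.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UserInput:
    """Untrusted: straight from a form, nothing checked."""
    email: str


@dataclass(frozen=True)
class UserRecord:
    """Trusted: in this sketch, only validate() is meant to create one."""
    email: str


def validate(raw: UserInput) -> UserRecord:
    email = raw.email.strip().lower()
    # Deliberately simplistic check, for illustration only.
    if email.count("@") != 1 or email.startswith("@") or email.endswith("@"):
        raise ValueError(f"not an email address: {raw.email!r}")
    return UserRecord(email=email)


def send_welcome(user: UserRecord) -> str:
    # The signature states the guarantee; no out-of-band knowledge needed.
    return f"welcome mail queued for {user.email}"
```

The two types have identical fields today. Collapsing them would not break anything structurally, but every caller of send_welcome would then have to remember whether the value in hand had been validated.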

When this discipline and the skeptic's subtraction pull in opposite directions, that's a tension worth surfacing — not a conflict to resolve silently. Both are needed: subtraction prevents unnecessary abstraction, distinct-semantics prevents premature unification. The design process should make the tension visible so it gets an explicit decision.

Design error vocabularies

Error types are not just return values — they're an API. "Distinguish values with distinct semantics" asks the general question: do any values in this design have different meanings but share a representation? This discipline applies that question specifically to error types, where the consequences of getting it wrong are distinct: a caller that can't tell errors apart can't handle them correctly, and context lost at a wrapping boundary is gone forever.

At each layer or boundary in the design:

What are the distinct failure modes? Enumerate them. If two failure modes need different handling by the caller, they need different representations. If two failure modes always get the same treatment, they might be one.
Does each error carry enough context for the caller to act? "Not found" is reporting. "Not found: key X in store Y" is actionable. If the caller needs to do something about the error — retry, fall back, present a specific message — it needs the data to do so.
Are distinct failures collapsed into one type? A generic "parse error" that covers both "malformed input" and "unsupported version" forces the caller to guess which it is. This is the error-specific form of premature unification.
How does context propagate across layers? When a low-level error gets wrapped by a higher-level one, is the original context preserved? Can a human (or a log aggregator) trace the error back to its source? Information lost at wrapping boundaries is gone forever.
Can the consumer distinguish errors that need different handling? The consumer sketch shows what the match statement looks like. The question here is whether the error types support the distinctions the consumer needs. If the consumer has to inspect string messages or use out-of-band knowledge to tell errors apart, the vocabulary is too coarse.
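The questions above can be sketched with the "malformed input" versus "unsupported version" example. The loader, its error classes, and the supported versions (1 and 2) are hypothetical. The sketch uses Python's standard exception chaining (raise ... from ...) to keep the low-level cause attached when wrapping.

```python
class ConfigError(Exception):
    """Base for the errors this hypothetical config loader can raise."""


class MalformedInput(ConfigError):
    def __init__(self, line: int, detail: str) -> None:
        super().__init__(f"malformed input at line {line}: {detail}")
        self.line = line
        self.detail = detail


class UnsupportedVersion(ConfigError):
    def __init__(self, found: int, supported: tuple[int, ...]) -> None:
        super().__init__(f"version {found} not in supported {supported}")
        self.found = found
        self.supported = supported


def parse_version(text: str) -> int:
    try:
        version = int(text)
    except ValueError as cause:
        # 'from cause' keeps the low-level error attached as __cause__.
        raise MalformedInput(
            line=1, detail=f"version is not a number: {text!r}"
        ) from cause
    if version not in (1, 2):
        raise UnsupportedVersion(found=version, supported=(1, 2))
    return version


def describe(text: str) -> str:
    """Consumer: branches on type and fields, never on message strings."""
    try:
        return f"ok: v{parse_version(text)}"
    except UnsupportedVersion as err:
        return f"upgrade needed: found v{err.found}"
    except MalformedInput as err:
        return f"fix line {err.line}"
```

The describe function plays the role of the consumer sketch: if it had to parse the message text to choose a branch, that would be the sign that the vocabulary is too coarse.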

The consumer-first discipline sketches what error handling looks like from outside — it reveals what distinctions the caller needs. This discipline designs the error types themselves, ensuring they carry the right distinctions and enough context. Both are needed: the consumer sketch is the specification, this discipline is the implementation check.

When this discipline pushes toward more error variants and the skeptic's subtraction pushes toward fewer, surface the tension — don't resolve it silently. Each error variant added to a public API is hard to remove later; the consumer sketch shows which distinctions the caller actually needs versus which are internal detail.

When in doubt, ask

Design is full of decision points where two reasonable paths diverge. When you hit one — when a design choice could go either way and you don't have a clear reason to prefer one — stop and ask. Present the tradeoff, say what you're uncertain about, and get input before committing to a direction.

The instinct to keep moving and produce a complete design is the enemy here. An unasked question that leads to a wrong turn costs more than the pause. The whole point of collaboration is that the human has context and judgment that the model doesn't. Use it.

This applies especially to:

Ownership boundaries (should the framework own this or the user?)
Abstraction level (is this too much? too little?)
When the consumer sketch reveals a tension and you're not sure which side to resolve it on
When you're about to add something because other systems have it but you're not sure this system needs it

Design is the map, the plan is one path

Design explores the territory — the full space of what could exist, how pieces relate, where the constraints are. A plan picks one path through explored territory to implement within a specific scope. The map doesn't stop being useful once you pick a path. Keep mapping while you walk, because what you learn during implementation changes your understanding of the territory.

Design insights that go beyond the current plan's scope are still valuable. They inform constraints on what you build now (don't build something incompatible with where you're headed) and get recorded (discussions, issues, plan notes) so future work can use them. "This is the right design" and "we implement all of it now" are separate decisions.

The disciplines in this skill — consumer sketches, abstraction questioning, footgun scanning — are not a checklist you run once during a "design phase." They're lenses you keep applying. Every time the design changes, run them again on the changed parts and on the parts that depend on what changed. A discipline that only runs once is a ritual, not a practice.

Anti-patterns

These are the specific failure modes this skill exists to prevent. If you catch yourself doing any of these, stop and reorient.

Designing the complete system at once. If your first artifact is a full design document with all types, all interactions, all edge cases — you skipped the discovery process. Back up. What's the smallest piece?

Starting from solution shape instead of problem statement. If your design document opens with type definitions rather than "the user needs to..." — you're retrieving, not designing.

Committing to the first interpretation. When the problem statement implies an approach ("add convenience wrapper methods"), the failure is treating that approach as the design rather than as a hint about a pain point. If every persona produced the same kind of solution, nobody explored: they all pattern-matched on the problem statement and dressed up the first idea. The tell: the "alternatives considered" section is empty or lists only variations on the same approach, not genuinely different framings of what the problem is asking for.

Importing patterns without questioning fit. String-keyed maps, middleware chains, context objects, store traits — these exist in many frameworks. Their presence elsewhere is not evidence that your system needs them. Evaluate each one against the actual consumer code.

Consumer code as an afterthought. If the example usage appears at the end of the design document (or not at all), the API was designed without its primary constraint. Move the consumer code to the beginning.

Giving the framework too much responsibility. When in doubt, the user owns it. The framework can always take on more responsibility later; taking it back is a breaking change.

Claiming consistency without verifying it. "This uses the same API as X" is a testable claim. Write both usages side by side. If they don't match, the design has a problem — either make them actually match or drop the claim and design each on its own merits.

Naming from the solution domain instead of the problem domain. If you can't explain what a type does without restating its name — "it manages connections," "it handles requests," "it processes events" — the name is a solution-domain label, not a problem-domain description. The tell is that the name describes the type's role in the code structure rather than what it means to the user. A ConnectionManager might be fine as an internal implementation detail, but if it's in the public API, the user has to open the docs to find out what it manages and how.

Inventing when extending would suffice. Proposing a new type when the codebase, language, or stdlib already has something that serves the same purpose. The new type may feel cleaner in isolation but adds a concept the user must learn and the codebase must maintain.

Collapsing distinct semantics into a shared type. Using one type for two values that carry different guarantees — user input and a validated record, a database row and an API response, a request and a stored entity. The fields may look identical, but the values mean different things at different boundaries. The skeptic's "what breaks if we remove this?" may not catch it because nothing breaks structurally — the breakage is semantic, showing up as a caller that treats an unvalidated input as a trusted record or vice versa.
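A minimal Pony sketch of the user-input versus validated-record case, under assumptions: `UserInput`, `UserRecord`, `InvalidEmail`, and `ValidateUser` are hypothetical names, and the `@` check stands in for real validation. Both classes carry the same field, but only the validator can produce a `UserRecord`, so a caller cannot pass raw input where a trusted record is expected.

```pony
// Hypothetical types for illustration; none of these names come from the skill.
class val UserInput
  """Raw, untrusted data as it arrived from the caller."""
  let email: String

  new val create(email': String) =>
    email = email'

class val UserRecord
  """A value that has passed validation."""
  let email: String

  // Private constructor: code outside this package cannot build a UserRecord.
  new val _create(email': String) =>
    email = email'

primitive InvalidEmail

primitive ValidateUser
  // Placeholder rule; a real validator would check far more than this.
  fun apply(input: UserInput): (UserRecord | InvalidEmail) =>
    if input.email.contains("@") then
      UserRecord._create(input.email)
    else
      InvalidEmail
    end

actor Main
  new create(env: Env) =>
    match ValidateUser(UserInput("alice@example.com"))
    | let r: UserRecord => env.out.print("stored " + r.email)
    | InvalidEmail => env.out.print("rejected")
    end
```

The duplication of the `email` field is deliberate here: the two types mean different things at different boundaries even though their shape is the same.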

Treating error types as an afterthought. Designing the happy-path types and interactions first, then bolting on generic error types at the end. Error vocabularies designed this way tend to be too coarse — one error type for many distinct failure modes — because they weren't part of the design exploration. Error types should emerge alongside the domain types; they're part of the API, not an appendage.
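As an illustration only, here is what an error vocabulary designed alongside the domain type might look like in Pony. The domain (parsing an amount in cents) and every name in it (`Amount`, `EmptyInput`, `NotANumber`, `TooLarge`, `ParseAmount`) are assumptions made for the sketch. Each failure mode gets its own type, and the one that needs context carries it.

```pony
// Hypothetical domain: parsing an amount in cents. Names are illustrative only.
class val Amount
  let cents: U64

  new val create(cents': U64) =>
    cents = cents'

primitive EmptyInput
primitive NotANumber

class val TooLarge
  // Carries the limit so the consumer can tell the user what is allowed.
  let limit: U64

  new val create(limit': U64) =>
    limit = limit'

type AmountError is (EmptyInput | NotANumber | TooLarge)

primitive ParseAmount
  fun apply(text: String, limit: U64): (Amount | AmountError) =>
    if text.size() == 0 then
      EmptyInput
    else
      try
        let cents = text.u64()?
        if cents > limit then TooLarge(limit) else Amount(cents) end
      else
        NotANumber
      end
    end

actor Main
  new create(env: Env) =>
    match ParseAmount("1250", 100000)
    | let a: Amount => env.out.print("ok: " + a.cents.string())
    | EmptyInput => env.out.print("enter an amount")
    | NotANumber => env.out.print("digits only")
    | let e: TooLarge => env.out.print("limit is " + e.limit.string())
    end
```

The consumer's `match` is the test of the vocabulary: each branch can say something specific, which a single generic error type would not allow.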

Skipping evaluation. Producing a design from the design personas and going straight to implementation without running the evaluation stage. The evaluation personas catch structural problems — security gaps, performance ceilings, untestable interfaces, adversarial usage scenarios — that the design personas aren't looking for.

Representing state implicitly through field combinations. When multiple fields interact — boolean flags, optional fields, status enums paired with counters — the component has an implicit state machine. Boolean flags are the most common form: n interacting flags create 2^n combinations, most invalid (authenticated-but-not-connected is probably illegal). But any set of fields whose valid combinations are a proper subset of all possible combinations has the same problem: no transitions, no invariants, no handling for illegal combinations. Name the states instead: the valid combinations are the actual states, and making them explicit lets the type system enforce which transitions are legal.
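The contrast can be sketched in Pony as follows. The session example and all names (`FlaggedSession`, `Disconnected`, `Connected`, `Authenticated`, `SessionState`, `Describe`) are hypothetical. The flag version can hold authenticated-but-not-connected; the union version has no value that represents it.

```pony
// Implicit state machine: two flags, four combinations, one of them illegal
// (authenticated = true while connected = false).
class FlaggedSession
  var connected: Bool = false
  var authenticated: Bool = false

// Named states: only the valid combinations exist as values.
primitive Disconnected
primitive Connected

class val Authenticated
  let user: String

  new val create(user': String) =>
    user = user'

type SessionState is (Disconnected | Connected | Authenticated)

primitive Describe
  fun apply(state: SessionState): String =>
    match state
    | Disconnected => "disconnected"
    | Connected => "connected, not authenticated"
    | let a: Authenticated => "authenticated as " + a.user
    end

actor Main
  new create(env: Env) =>
    env.out.print(Describe(Authenticated("alice")))
```

Data that only makes sense in one state (the user name) lives on that state's type, so it cannot be read in a state where it does not exist.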

Leaving invariants implicit. When a design establishes contracts between components — ordering requirements, data relationships, structural properties — but never states them. The contracts exist whether or not they're documented: callers depend on them, implementations preserve them, and future changes break them. The difference between a stated invariant and an unstated one is that when the stated one breaks, you know what went wrong and why. When the unstated one breaks, you get symptoms far from the cause and no trail to follow.

Committing to structure without understanding its grain. Making structural decisions — "variants are enum cases," "handlers are registered at startup," "data flows through a transform pipeline" — without asking what those decisions make cheap and what they make expensive. Every structural choice creates a grain; ignoring the grain means discovering the tradeoff only when you need to go against it, at which point the cost is high and the decision is baked in. The result is designs where the expensive direction turns out to be the one the domain actually needs, and nobody knew until the first extension attempt.

Embedding infrastructure in domain types. When a domain type directly uses database clients, HTTP types, cache interfaces, or other infrastructure. The domain type might be correct about the business logic, but it can't be tested without the infrastructure, can't be reused in a different deployment context, and can't survive an infrastructure swap. This is distinct from "accumulating unrelated responsibilities" — the type may be cohesive in purpose but coupled to the wrong layer.

Accumulating unrelated responsibilities in a type. When a type grows by accretion — each new responsibility is too small to extract on its own, so it gets added to the nearest existing type. The result is a type that does five things, none of which is its primary purpose, because no single addition was large enough to trigger extraction. The tell: methods or fields that cluster into subsets that don't interact with each other, or a type whose name has drifted to something vague ("manager," "context," "helper") because no precise name can describe everything it does. Unlike "collapsing distinct semantics," which is about values sharing a type they shouldn't, this is about responsibilities sharing a type they shouldn't.

Pony-specific design guidance

When designing Pony APIs, libraries, or framework features:

Leverage the type system. Union types for state instead of boolean flags. Distinct types for distinct semantics. `Any val` is almost always a sign that the design is avoiding a harder type-safety question; answer that question instead.
Make illegal states unrepresentable. Pony's type system is strong enough to encode most constraints. If a combination of values is illegal, the types should prevent it from existing.
Be skeptical of patterns from dynamic-language frameworks. Middleware chains, string-keyed context maps, convention-over-configuration — these patterns compensate for weak type systems. Pony has a strong type system. Use it. The idiomatic solution in Pony often looks nothing like the idiomatic solution in Ruby or Python.
Favor trait-based state machines for stateful components. When a component has distinct phases where valid operations change between them, represent each phase as a separate type implementing a shared interface rather than using flags or optional fields. The actor becomes a thin shell that delegates to the current state object: state transitions replace the object, and the old state's resources are released automatically. See the state machine pattern in pony-ref. Marker primitives with scattered `match _state` / `if _state is` checks are a step above boolean flags but still fragment behavior across the actor's methods. Use them only for simple guards within a single behavior, not for routing different behavior across multiple events. The test: if more than one behavior needs to check the state to decide what to do, the actor needs a trait-based state machine.
Think about capabilities. Who can read this? Who can write it? Who can send it across actor boundaries? If the design requires `val` but the consumer naturally produces `ref`, there's a friction point worth examining.
Understand what composition looks like in the specific context. Don't assume a single composition pattern. In a web framework with actor-per-request, the handler actor might be the composition unit. In a parsing library, it might be function composition. In a template engine, it might be class hierarchies. Let the problem dictate the composition model.
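A rough sketch of the trait-based state machine guidance above, not the pattern as written in pony-ref. The `Connection` actor, the `_ConnState` trait, and the `_Open` and `_Closed` states are all hypothetical. The actor forwards each event to the current state object, and a transition replaces that object through a private setter.

```pony
// Each phase is its own type; behavior for an event lives with the phase.
trait _ConnState
  fun ref on_data(conn: Connection ref, data: String)
  fun ref on_close(conn: Connection ref)

class _Open is _ConnState
  var _received: USize = 0

  fun ref on_data(conn: Connection ref, data: String) =>
    _received = _received + data.size()

  fun ref on_close(conn: Connection ref) =>
    // Transition: swap the actor's current state object.
    conn._set_state(_Closed)

class _Closed is _ConnState
  fun ref on_data(conn: Connection ref, data: String) =>
    None // data arriving after close is ignored by design

  fun ref on_close(conn: Connection ref) =>
    None // closing twice is a no-op

actor Connection
  var _state: _ConnState = _Open

  be data(payload: String) =>
    _state.on_data(this, payload)

  be close() =>
    _state.on_close(this)

  // Private fields are only reachable from the owning type, so states go
  // through this package-private method to change the current state.
  fun ref _set_state(state: _ConnState) =>
    _state = state

actor Main
  new create(env: Env) =>
    let conn = Connection
    conn.data("hello")
    conn.close()
    conn.data("ignored")
```

Neither behavior on the actor inspects the state, which is the property the test in the guidance asks for.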

name	pony-software-design
description	Disciplines for software design work. Load when designing APIs, type systems, features, or system boundaries. Counters the tendency to retrieve familiar patterns instead of discovering what the problem actually needs. Has full (8-persona) and lightweight (5-persona) modes.
disable-model-invocation	false

Software Design

Design is the act of discovering what is needed. It's about finding surprising affordances and avoiding candy-machine interfaces riddled with footguns.

Mode selection

Full mode is the default. Use it when:

Defining new boundaries — a new subsystem, API surface, or type hierarchy
Multiple ownership or abstraction decisions need to be made simultaneously
There's genuine design space to explore — multiple viable approaches
The design must be described from first principles, not as an extension of something that already exists

Lightweight mode is for bounded design work within established patterns:

Adding a method to an existing API, a new variant in a type family, a new handler following established conventions
The boundaries are already decided — you're filling in, not redrawing
Consumer code patterns are already established by adjacent features
The task can be described as "another X that does Y" where X already exists in the codebase

When in doubt, use full mode. Lightweight is appropriate when there's a clear existing pattern being extended and the boundaries are already decided.

Process: full mode

The full design process runs in two stages with a feedback loop. Load pony-ensemble for the mechanical process; the personas defined in this skill replace the generic attention focuses.

Relationship to Ensemble Workflow

Orchestrator pre-spawn: understand the problem

Before spawning design personas, the orchestrator must decompose the problem statement. Ask: "What are we trying to accomplish? What pain point does this address?"

Stage 1: Design

File	Focus
`consumer-first.md`	Starts from usage code, derives API from what makes call sites clean
`skeptic.md`	Questions every abstraction, tries to subtract, proposes the smallest design
`principle-checker.md`	Hard verification of each design principle with evidence

Orchestrator post-return: evaluate exploration quality

When personas return, the orchestrator evaluates the quality of exploration before forwarding outputs to synthesis. Check:

Did the personas explore meaningfully different approaches? Read each persona's "Key decisions / alternatives considered" section. If a persona only considered one interpretation of the problem and designed for it, the exploration was insufficient.
Did they all anchor on the same narrow interpretation? If all three personas produced designs based on the same framing of the problem — e.g., all three designed wrapper methods — the ensemble failed to decorrelate. The different entry points weren't enough to overcome the anchoring effect of the problem statement.
Is the reasoning for rejecting alternatives sound? A persona that considered a wild idea and rejected it with good reasoning explored genuinely. A persona that listed an alternative and dismissed it in one sentence didn't.

Stage 1 synthesis

Stage 1 synthesis produces a candidate design using the standard ensemble synthesis process. The synthesis should pay special attention to:

Where the consumer-first designer's sketches conflict with the skeptic's subtractions — the tension usually reveals the right boundary
Where the principle checker found a violation that the others missed — this is the highest-value finding
Whether all three converged on the same abstraction — convergence from different starting points is strong signal
When the skeptic says "no value" and redirects: The skeptic is the only persona whose job includes saying "this shouldn't exist." When the skeptic concludes the proposed design doesn't earn its keep and redirects toward a different problem, this is not a minority position to be outvoted because the other two produced designs. The other two will always produce a design — that's their job. Evaluate the skeptic's case on its merits. If the skeptic is right that the design is a thin wrapper, the synthesis output should adopt the skeptic's redirect as the foundation: "the proposed design doesn't earn its keep — here's the actual problem worth solving" becomes the candidate, not a blended version of the thin wrapper.

Stage 2: Evaluation

File	Focus
`security.md`	Trust boundaries, attack surfaces, resource bounds in the design
`performance.md`	Architectural bottlenecks, coordination points, data structure choices
`adversarial.md`	Concrete usage scenarios that lead to bad outcomes
`testability.md`	Whether the design is verifiable — observable effects, isolatable components
`wildcard.md`	What all the other personas missed

For the wildcard persona specifically: include the identity statement (first paragraph) from each of the other personas so the wildcard knows what territory is already covered.

Evaluation personas identify problems and assess impact — they do not categorize their own findings as Rejection/Adjustment/Tension. Categorization is the synthesis step's responsibility.

Evaluation persona output format

Each evaluation persona produces two artifacts:

Summary (returned to orchestrator) — a structured summary for the synthesizer to work from:

Findings — ordered by impact (Structural > Significant > Minor). Each:

Design element: The type, boundary, API, or interaction being evaluated
Concern: What the problem is
Impact: Structural (requires rethinking the approach), Significant (requires notable changes to the candidate), or Minor (small adjustment)
Evidence: Brief — full evidence is in the file
Suggested change: If applicable

Passes — things checked that look correct. Brief.

Uncertainties — things the persona couldn't determine, and why.

Stage 2 synthesis categorizes each finding:

Rejection: A structural problem that invalidates the design direction. The candidate cannot be fixed by adjustment — the design personas need to rethink the approach. The rejection includes why the direction fails and what constraint it violates.
Adjustment: A specific aspect that needs to change, but the overall direction is sound. Becomes a constraint for the next design iteration.
Tension: A fundamental conflict that the personas cannot resolve — it requires human judgment. Collected and presented at the end.

The loop

After stage 2 synthesis:

If there are only tensions (no rejections or adjustments), the loop terminates. Present the design with the tensions for human review.
If there are adjustments and/or rejections, feed them back to the design personas. Each design persona receives: the prior candidate design, the original problem statement, and the categorized findings. Rejections include the rationale for why the direction failed — design personas should explore a different approach, not patch the rejected one. Adjustments include the specific change needed and why — design personas should revise the candidate to incorporate them as constraints.
The design personas run again with this context, producing a revised candidate.
Evaluation runs again on the revised candidate. Evaluation personas run with fresh context (no knowledge of prior evaluations). The synthesis step receives the full history so it can track convergence.
Repeat until clean or until convergence failure is detected.

Convergence failure

The orchestrator monitors the loop for signs that it isn't converging:

The same evaluation concern keeps appearing across iterations, even after design revisions attempt to address it
Rejections and adjustments are contradicting each other (fixing one evaluation concern breaks another)
The design is growing more complex with each iteration rather than settling

Output

The final output includes:

Accepted design (if one emerged): The candidate that passed evaluation, with consumer sketches, type definitions, and boundary decisions.
Rejected designs: Each candidate that was rejected during the loop, with the rejection rationale. These are valuable — they document explored territory and why it didn't work.
Unresolved tensions: Findings categorized as tensions that require human judgment.

If the loop terminated via convergence failure rather than a clean evaluation, there is no accepted design — only the history of attempts, the rejections, and the tensions.

Process: lightweight mode

Orchestrator pre-spawn: understand the problem

Stage 1: Design

Three design personas explore the problem in parallel — the same as full mode. The same disciplines apply; the decorrelation still comes from different entry points.

File	Focus
`consumer-first.md`	Starts from usage code, derives API from what makes call sites clean
`skeptic.md`	Questions every abstraction, tries to subtract, proposes the smallest design
`principle-checker.md`	Hard verification of each design principle with evidence

Orchestrator post-return: evaluate exploration quality

Same as full mode. Evaluate whether personas explored meaningfully different approaches before forwarding to synthesis. If all three anchored on the same narrow interpretation, re-brief and re-spawn.

Stage 2: Evaluation

When	Pick
Design touches trust boundaries or external input	`security.md`
Design is on a hot path or introduces coordination points	`performance.md`
Design has complex state or will be hard to test in isolation	`testability.md`

Stage 2 synthesis categorizes each finding as Rejection, Adjustment, or Tension using the same scheme as full mode.

No loop — single pass

Lightweight mode does not iterate. After Stage 2 synthesis:

Adjustments and tensions: Present the design with findings to the human. Adjustments are expected to be small enough that the orchestrator or human can apply them directly. If adjustments collectively amount to redesigning rather than tweaking, that's the same escalation signal as high finding density — present it to the human.
Rejection: The design direction is wrong. Present the rejected candidate, the rejection rationale, and all other findings to the human. The human decides what to do — escalate to full mode, fix it directly, rethink the problem statement, or something else. Lightweight doesn't prescribe the response; it presents the information.

Output

The final output includes:

Candidate design: With consumer sketches and type definitions.
Adjustments: Specific changes needed, small enough to apply directly.
Tensions: Conflicts requiring human judgment.
Rejection rationale: If applicable — the structural finding and why the design direction was rejected.

Design values

When principles conflict, these values set the priority. We value the left side over the right — but the right side still matters when the left isn't at stake.

API safety over API minimality — an error-prone API should be fixed even if the fix adds surface. Prefer solutions that don't expand the API, but never leave a footgun to preserve minimality.

Correctness over performance — never sacrifice correctness for speed. Get it right first, then optimize. A faster wrong answer is still wrong.

Correctness over concision — correct but verbose beats concise but wrong. Don't simplify code or APIs at the cost of correct behavior.

Security over performance — never skip validation at trust boundaries for speed. Optimize how you validate, not whether you validate. Security is correctness.

Interface simplicity over implementation simplicity — accept a harder implementation to give users a clean interface. The consumer's experience matters more than the implementer's convenience.

Simplicity over consistency — don't force artificial consistency when it makes things harder to use. If two similar things genuinely need different interfaces, let them be different.

The disciplines

These are the foundation each persona builds on. Every agent applies all of them.

Start from the problem, not the solution

Explore before committing

Sketch consumer code first

What the API actually needs to provide (and what it doesn't)
Where type safety breaks down (runtime casts, stringly-typed maps)
Whether the abstraction can actually serve its purpose (can middleware do async work? can the handler access typed data?)
What the error paths look like from the consumer's perspective

If the consumer code is awkward, the API is wrong. Fix the API, not the consumer code.
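To make the point concrete, here are two hypothetical consumer sketches in Pony for reading a user id from request context. `RequestContext` and the `"user_id"` key are invented for the example; `Map` is from the standard `collections` package. Writing the call site first shows where type safety breaks down in the string-keyed version.

```pony
use "collections"

class val RequestContext
  """Hypothetical typed alternative: field name and type are compiler-checked."""
  let user_id: U64

  new val create(user_id': U64) =>
    user_id = user_id'

actor Main
  new create(env: Env) =>
    // Sketch A: string-keyed context. The call site needs a lookup that can
    // fail and a match that can find the wrong type.
    let ctx = Map[String, Any val]
    ctx("user_id") = U64(42)
    try
      match ctx("user_id")?
      | let id: U64 => env.out.print("user " + id.string())
      else
        env.out.print("user_id holds an unexpected type")
      end
    else
      env.out.print("no user_id in context")
    end

    // Sketch B: typed context. Neither failure path exists.
    let typed = RequestContext(42)
    env.out.print("user " + typed.user_id.string())
```

Sketch A needs two error branches that the consumer has no good way to handle; that awkwardness is evidence about the API, not about the consumer.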

Inventory before inventing

On a greenfield project, "what already exists" means the language's built-in types, stdlib, and idioms. A new type that duplicates what the language provides is a smell.

This is not "reuse for reuse's sake." It's a forcing function against the pattern-matching tendency to invent new abstractions when the problem doesn't require them.

Build up incrementally

Don't design the whole thing at once. Start with the smallest coherent piece. Validate it with a consumer sketch. Then add the next piece and see if it fits.

At each step, ask:

Does the new piece fit naturally with what's already there?
Or is it fighting the existing design?
If it's fighting, is the problem upstream? Would a different foundation make this piece fit naturally?

This is how you discover the shape of the problem. A big-bang design papers over these tensions. Incremental exploration surfaces them while they're cheap to fix.

Every step changes what you can see

The cost of revisiting decisions is real but bounded. The cost of building on a decision that should have been revisited compounds with every step forward.

Question every abstraction

Name things precisely

At each design step, ask:

Does the name come from the problem domain or the solution domain? Problem-domain names ("invoice," "route," "subscription") connect the code to what users already understand. Solution-domain names ("manager," "handler," "processor") describe implementation roles that tell the user nothing about what the thing actually does. Prefer problem-domain names for types the user interacts with; reserve solution-domain names for internal machinery where they're the clearest description of the role. Note that some of these words become problem-domain vocabulary in specific contexts — "handler" is the natural term in web frameworks and event systems. The concern is with using them as generic suffixes that avoid naming what the thing actually handles or manages.
Does the name describe what this thing does, or just what category it belongs to? "Validator" says it validates — but what? "InvoiceAmountValidator" says what it validates. "Store" says it stores — but a UserStore that also sends email notifications lies about its responsibilities. A name that categorizes without specifying is a name that lets scope creep in without anyone noticing.
Could the name mislead a user about what this thing does? If someone reading only the name would form a wrong expectation about the behavior, the name is wrong. This includes names that are accurate but incomplete — a Cache that also writes through to the database is misleading because the name suggests read-only caching.
Are there names that sound similar but mean different things? Session and SessionState, Token and TokenData, Route and Router — when names differ by a suffix or prefix, users will assume the relationship is systematic. If SessionState is not the state of a Session, the naming implies a relationship that doesn't exist.
Are there names that sound different but mean the same thing? If the design uses user in one place and account in another for the same concept, readers will assume these are different things and spend effort trying to understand the distinction. One concept, one name — everywhere. This is about vocabulary consistency, not boundary-qualified variants like UserInput and UserRecord — those are distinct types that need distinct names per "Distinguish values with distinct semantics."

Reason about ownership boundaries

Separate layers

At each design step, ask:

Which layer does this belong to? For every type, function, or interaction in the design, classify it: domain, orchestration, or presentation. If you can't classify it cleanly, the boundaries are blurred — which usually means domain logic has acquired an infrastructure dependency or presentation logic has absorbed business rules.
Does this domain type depend on any infrastructure? Domain types should have zero infrastructure dependencies — no database clients, no HTTP types, no cache interfaces. If a domain type needs to talk to infrastructure, it belongs in orchestration, or the domain type needs to express its need through an interface that orchestration satisfies.
Is orchestration leaking into domain types? The tell: a domain type that knows about connection pools, transaction boundaries, or retry policies. Domain logic defines what should happen; orchestration decides how to make it happen with real infrastructure.
Is domain logic leaking into presentation? The tell: a request handler that contains business rules instead of delegating to orchestration. When business logic lives in the presentation layer, it can't be reused by a different presentation (a CLI that needs the same logic as the HTTP API).
Could you swap the presentation layer without touching domain logic? HTTP → CLI, REST → GraphQL, synchronous → message-queue-driven. If swapping the presentation layer requires changes to domain types, those types have presentation concerns baked in. This is a thought experiment, not a requirement to actually build multiple presentations — it tests whether the boundary is clean.
Is the presentation layer coupled to orchestration internals? The tell: a request handler that manages transaction boundaries, knows about caching strategies, or handles retry logic instead of delegating those concerns to orchestration. Presentation should call orchestration and receive results — it shouldn't know how orchestration coordinates infrastructure.
Are there domain concepts that only exist because of an infrastructure choice? A PaginatedResult type in the domain layer exists because the database returns paginated results — that's infrastructure leaking into domain vocabulary. The domain might need "a bounded subset of results" but shouldn't know that the bound comes from database cursor limits.
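One way the "express its need through an interface" option might look in Pony, as a sketch under assumptions: `PriceLookup`, `OrderTotal`, and `FixedPrices` are hypothetical names. The domain rule depends only on the interface, so it can be exercised with a stand-in and no infrastructure.

```pony
// The domain states what it needs; orchestration decides how to satisfy it.
interface val PriceLookup
  fun price_cents(sku: String): (U64 | None)

primitive OrderTotal
  """Domain logic: no database, HTTP, or cache types appear here."""
  fun apply(skus: Array[String] val, prices: PriceLookup): (U64 | None) =>
    var total: U64 = 0
    for sku in skus.values() do
      match prices.price_cents(sku)
      | let cents: U64 => total = total + cents
      | None => return None // an unknown SKU makes the total undefined
      end
    end
    total

class val FixedPrices
  """Stand-in for tests; a database-backed lookup would live in orchestration."""
  fun price_cents(sku: String): (U64 | None) =>
    if sku == "apple" then U64(50) else None end

actor Main
  new create(env: Env) =>
    let skus: Array[String] val = ["apple"; "apple"]
    match OrderTotal(skus, FixedPrices)
    | let cents: U64 => env.out.print("total: " + cents.string())
    | None => env.out.print("unknown sku")
    end
```

Swapping `FixedPrices` for a lookup backed by real storage would not touch `OrderTotal`, which is the thought experiment the questions above describe.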

Map state explicitly

What states can it be in? Not just the happy-path states — include initialization, error recovery, shutdown, and any transitional states between them.
What are the transitions? What event or action moves the system from one state to another? Are there transitions the design allows but shouldn't?
What invariants hold in each state? What can the rest of the system assume about a component that's in state X? For invariants that span beyond individual states — cross-component relationships, ordering constraints — see "Articulate invariants."
What happens when an input arrives in a state where it doesn't apply? The design should answer this explicitly — ignore, error, queue, crash — rather than leaving it undefined.
Can the type system encode the states? If the language supports it (union types, sealed classes, enums, trait-based state machines), prefer type-level encoding over convention-level tracking. When the type system encodes the states, the compiler enforces completeness — every handler must account for every state.

The test for whether you've mapped state well: can someone new to the design draw the state diagram from your description? If not, the state model isn't explicit enough.

Articulate invariants

At each design step, ask:

What can callers rely on? What guarantees does this API make to its consumers? Not just what it returns, but what it promises about the state of the system after it returns. If a caller can't answer "what did this operation guarantee?" from the API alone, the contract is unstated.
What are the pre/post conditions of operations? What must be true before an operation is called, and what does it establish afterward? Preconditions that exist but aren't stated become silent failure modes — the operation "works" but produces wrong results because the caller didn't know what it needed.
What relationships between components are maintained by construction vs. by convention? Relationships maintained by construction (enforced by the type system, by encapsulation, by the structure of the code) are invariants callers can trust without knowing about them. Relationships maintained by convention (documented rules, expected call sequences, assumed configurations) are invariants that break when someone doesn't read the documentation.
Which invariants does the type system enforce, and which depend on correct usage? This is the enforcement boundary. Invariants the type system enforces are guarantees. Invariants that depend on correct usage are hopes — they hold until someone doesn't know about them. For each convention-dependent invariant, ask whether the type system could enforce it instead, and what the cost would be.
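A small Pony sketch of moving an invariant from convention to construction. The `PortRange` type and its rule (`low <= high`) are hypothetical. The check happens once, in the only place that can build the value, so every holder of a `PortRange` can rely on it without re-checking.

```pony
primitive InvalidRange

class val PortRange
  """Invariant: low <= high. Established by MakePortRange, never re-checked."""
  let low: U16
  let high: U16

  // Private constructor: code outside this package must go through MakePortRange.
  new val _create(low': U16, high': U16) =>
    low = low'
    high = high'

  fun contains(port: U16): Bool =>
    (low <= port) and (port <= high)

primitive MakePortRange
  fun apply(low: U16, high: U16): (PortRange | InvalidRange) =>
    if low <= high then PortRange._create(low, high) else InvalidRange end

actor Main
  new create(env: Env) =>
    match MakePortRange(8000, 8080)
    | let r: PortRange => env.out.print(r.contains(8042).string())
    | InvalidRange => env.out.print("low must not exceed high")
    end
```

A public two-argument constructor with a documented "low must not exceed high" rule would be the convention version: it holds only until a caller has not read the documentation.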

Check cohesion

The skeptic asks "should we remove this?" — the subtraction lens. Cohesion is the complementary grouping lens: do the things that survived subtraction belong together?

At each design step, ask:

Does this type have a single coherent purpose? If you had to describe what it does in one sentence without using "and," could you? If "and" connects steps in a single workflow or facets of one responsibility ("authenticates users and issues tokens"), the type is likely cohesive. If "and" connects responsibilities that could each exist independently ("authenticates users and sends email notifications"), it's doing more than one thing. This isn't about counting methods — a type with many methods can be cohesive if they all serve the same purpose.
Are there methods or fields that are only used together with a subset of the other methods or fields? If a type has two clusters of functionality where each cluster uses its own fields and rarely touches the other's, the type is two things wearing a trenchcoat. The clusters should probably be separate types.
Would a change to one responsibility force changes to unrelated code in the same unit? If modifying how authentication works requires touching code that deals with request routing — not because they interact, but because they share a type — the grouping is wrong.
Did these things end up together for a reason, or by accident? Common accidents: they were discovered at the same time, they operate on the same raw data, they're called from the same place, or they were "too small" to extract. None of these are reasons for cohesion — they're reasons for proximity, which is different.
For modules or packages: do the types serve a single coherent theme? A module whose types all participate in one domain concept is cohesive. A module that's a drawer for unrelated helpers (utils, common, misc) has the same problem at a larger scale — the types don't cohere, they just cohabitate.

Surface the grain

At each design step, ask:

What's easy to add? If this design needed one more variant, handler, or type in the family, where does it slot in? How much existing code needs to change to accommodate it? If the answer is "add a new file and implement the trait" — that's with the grain. If the answer is "touch five existing types to thread the new thing through" — the grain doesn't run that way.
What's expensive to change? If a fundamental assumption changed — data flow direction, ownership boundary, concurrency model, storage strategy — how much of the design survives? Understanding how deep each assumption is embedded lets you make informed decisions about which assumptions to commit to.
Does the grain align with the domain's natural variation? If the design makes it cheap to add new data formats but expensive to add new data sources, and the domain naturally acquires new sources more often than new formats, the grain runs the wrong way.
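The two grains can be seen side by side in a short Pony sketch. The shapes and exporters are hypothetical examples, chosen only to show which direction each structure makes cheap.

```pony
// Closed union: adding an operation is one new primitive like Name, but
// adding a variant means revisiting every match over Shape.
primitive Circle
primitive Square
type Shape is (Circle | Square)

primitive Name
  fun apply(s: Shape): String =>
    match s
    | Circle => "circle"
    | Square => "square"
    end

// Trait: adding a variant is one new class, but adding an operation means
// touching the trait and every implementer.
trait val Exporter
  fun name(): String

class val CsvExporter is Exporter
  fun name(): String => "csv"

class val JsonExporter is Exporter
  fun name(): String => "json"

actor Main
  new create(env: Env) =>
    env.out.print(Name(Circle))
    let exporters: Array[Exporter] = [CsvExporter; JsonExporter]
    for e in exporters.values() do
      env.out.print(e.name())
    end
```

Neither structure is better in general; the question is whether the domain tends to grow new variants or new operations.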

Look for footguns

After sketching a design, look for ways a user could do something that looks correct but fails silently or in non-obvious ways:

Can the user set up a configuration that appears valid but doesn't work?
Can the user call methods in an order that compiles but produces wrong results?
Are there boolean flag combinations that represent illegal states? If so, revisit "Map state explicitly" — the state model is incomplete.
Does the API make it easy to forget a step?
Can the user confuse two values that have different semantics but the same type? If so, revisit "Distinguish values with distinct semantics" — the type boundaries may be too coarse.
Could a name lead the user to misunderstand what something does and use it incorrectly? If so, revisit "Name things precisely" — the name may be misleading or too generic.
Can an operation silently violate an invariant, or does the API rely on an ordering guarantee it doesn't enforce? If so, revisit "Articulate invariants" — the contract may be unstated.
Does a type do multiple unrelated things, making it unclear which part of its API applies to the user's current need? If so, revisit "Check cohesion" — the type may be a grab-bag of responsibilities.
Is the design expensive to extend in a direction the domain naturally varies? If so, revisit "Surface the grain" — the design's cheap-to-extend directions may not align with where variation actually occurs.
Can a domain type only be used when specific infrastructure is available? If so, revisit "Separate layers" — domain logic has acquired an infrastructure dependency.
Is any outcome implicit (success by silence, failure by absence)?
Can the user receive an error and not know what to do with it? If so, revisit "Design error vocabularies" — the error types may be too coarse or missing context.

What does a user need to know about this design that isn't encoded in the types or API?
Does any behavior depend on context that isn't visible at the call site?
Are there conventions the user must follow that aren't enforced by the type system?
If a user read only the type signatures (no docs, no examples), what would they get wrong?
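
One way to picture the boolean-flag footgun from the list above is the sketch below, which uses hypothetical connection types (every name is invented for illustration). The flagged class happily accepts both flags set at once, while the union type has no way to spell that combination.

```pony
// Footgun: two flags admit four combinations, and one of them
// (connected and closed at the same time) has no meaning.
class FlaggedConnection
  var connected: Bool = false
  var closed: Bool = false

// Alternative: one value per legal state.
primitive Pending
primitive Connected
primitive Closed

type ConnectionState is (Pending | Connected | Closed)

actor Main
  new create(env: Env) =>
    // This compiles and looks plausible, but describes an illegal state.
    let flagged = FlaggedConnection
    flagged.connected = true
    flagged.closed = true

    // With the union type, a value is always exactly one legal state.
    let state: ConnectionState = Closed
    match state
    | Pending => env.out.print("pending")
    | Connected => env.out.print("connected")
    | Closed => env.out.print("closed")
    end
```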

Distinguish values with distinct semantics

At each design step, ask:

Do any values in this design have different meanings but share a representation?
If two things look similar, do they carry different guarantees, different lifetimes, or different validation states?
Would collapsing two types into one force callers to use out-of-band knowledge to distinguish them?
At each data boundary (input, storage, output, inter-component), is the type actually the same concept or a similar-looking different concept?
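
A minimal sketch of keeping two validation states apart, assuming hypothetical RawEmail, ValidatedEmail, ValidateEmail, and Mailer names and a deliberately naive validity check (the text contains "@"). Both types wrap a String, but only the validated one is accepted by Mailer.send.

```pony
// Hypothetical types. The validity rule is a placeholder, not real
// email validation.
class val RawEmail
  let text: String
  new val create(text': String) =>
    text = text'

class val ValidatedEmail
  let text: String
  // Private constructor: code outside this package cannot build one
  // directly and has to go through ValidateEmail.
  new val _create(text': String) =>
    text = text'

primitive ValidateEmail
  fun apply(raw: RawEmail): (ValidatedEmail | None) =>
    if raw.text.contains("@") then
      ValidatedEmail._create(raw.text)
    else
      None
    end

primitive Mailer
  // Takes only the validated type, so a RawEmail is a compile error here.
  fun send(recipient: ValidatedEmail, out: OutStream) =>
    out.print("sending to " + recipient.text)

actor Main
  new create(env: Env) =>
    match ValidateEmail(RawEmail("user@example.com"))
    | let email: ValidatedEmail => Mailer.send(email, env.out)
    | None => env.out.print("rejected")
    end
```

If both states were a plain String, the caller would need out-of-band knowledge to tell whether a given value had been checked. Here the type carries that fact.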

Design error vocabularies

At each layer or boundary in the design:

What are the distinct failure modes? Enumerate them. If two failure modes need different handling by the caller, they need different representations. If two failure modes always get the same treatment, they might be one.
Does each error carry enough context for the caller to act? "Not found" is reporting. "Not found: key X in store Y" is actionable. If the caller needs to do something about the error — retry, fall back, present a specific message — it needs the data to do so.
Are distinct failures collapsed into one type? A generic "parse error" that covers both "malformed input" and "unsupported version" forces the caller to guess which it is. This is the error-specific form of premature unification.
How does context propagate across layers? When a low-level error gets wrapped by a higher-level one, is the original context preserved? Can a human (or a log aggregator) trace the error back to its source? Information lost at wrapping boundaries is gone forever.
Can the consumer distinguish errors that need different handling? The consumer sketch shows what the match statement looks like. The question here is whether the error types support the distinctions the consumer needs. If the consumer has to inspect string messages or use out-of-band knowledge to tell errors apart, the vocabulary is too coarse.
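
The questions above can be sketched with a small, hypothetical lookup API. The error types, their fields, and the advice strings are assumptions made for illustration. Each failure mode that needs different handling gets its own type, and each type carries the data the caller needs to act.

```pony
// Hypothetical error vocabulary for a key-value lookup.
class val KeyNotFound
  let key: String
  let store: String
  new val create(key': String, store': String) =>
    key = key'
    store = store'

class val StoreUnavailable
  let store: String
  let retry_after_seconds: U64
  new val create(store': String, retry_after_seconds': U64) =>
    store = store'
    retry_after_seconds = retry_after_seconds'

type LookupError is (KeyNotFound | StoreUnavailable)

primitive Advise
  // The consumer distinguishes failures by type, not by parsing messages.
  fun apply(err: LookupError): String =>
    match err
    | let e: KeyNotFound =>
      "not found: key " + e.key + " in store " + e.store +
        "; fall back to default"
    | let e: StoreUnavailable =>
      "store " + e.store + " unavailable; retry in " +
        e.retry_after_seconds.string() + "s"
    end

actor Main
  new create(env: Env) =>
    env.out.print(Advise(KeyNotFound("user:42", "sessions")))
    env.out.print(Advise(StoreUnavailable("sessions", 5)))
```

A single generic error with a message string would force the consumer to inspect text to choose between falling back and retrying, which is the coarseness the last question warns about.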

When in doubt, ask

Asking applies especially to:

Ownership boundaries (should the framework own this or the user?)
Abstraction level (is this too much? too little?)
Tensions the consumer sketch reveals, when you're not sure which side to resolve them on
Additions that other systems have, when you're not sure this system needs them

Design is the map, the plan is one path

Anti-patterns

These are the specific failure modes this skill exists to prevent. If you catch yourself doing any of these, stop and reorient.

Starting from solution shape instead of problem statement. If your design document opens with type definitions rather than "the user needs to..." — you're retrieving, not designing.

Giving the framework too much responsibility. When in doubt, the user owns it. The framework can always take on more responsibility later; taking it back is a breaking change.

Pony-specific design guidance

When designing Pony APIs, libraries, or framework features:

Leverage the type system. Union types for state instead of boolean flags. Distinct types for distinct semantics. Any val is almost always a sign that the design is avoiding a harder type-safety question — answer that question instead.
Make illegal states unrepresentable. Pony's type system is strong enough to encode most constraints. If a combination of values is illegal, the types should prevent it from existing.
Be skeptical of patterns from dynamic-language frameworks. Middleware chains, string-keyed context maps, convention-over-configuration — these patterns compensate for weak type systems. Pony has a strong type system. Use it. The idiomatic solution in Pony often looks nothing like the idiomatic solution in Ruby or Python.
Favor trait-based state machines for stateful components. When a component has distinct phases where valid operations change between them, represent each phase as a separate type implementing a shared interface rather than using flags or optional fields. The actor becomes a thin shell that delegates to the current state object — state transitions replace the object, and the old state's resources are released automatically. See the state machine pattern in pony-ref. Marker primitives with scattered match _state / if _state is checks are a step above boolean flags but still fragment behavior across the actor's methods — use them only for simple guards within a single behavior, not for routing different behavior across multiple events. The test: if more than one behavior needs to check the state to decide what to do, the actor needs a trait-based state machine.
Think about capabilities. Who can read this? Who can write it? Who can send it across actor boundaries? If the design requires val but the consumer naturally produces ref, there's a friction point worth examining.
Understand what composition looks like in the specific context. Don't assume a single composition pattern. In a web framework with actor-per-request, the handler actor might be the composition unit. In a parsing library, it might be function composition. In a template engine, it might be class hierarchies. Let the problem dictate the composition model.
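
As an illustration of the trait-based state machine shape described above, here is a small sketch built around a hypothetical door controller. It is not the pattern text from pony-ref, and every name is invented. The states are primitives only because they carry no data in this example; states that hold resources would be classes.

```pony
// Each phase is its own type. An event returns the next state.
trait val DoorState
  fun name(): String
  fun on_push(): DoorState
  fun on_lock(): DoorState

primitive DoorOpen is DoorState
  fun name(): String => "open"
  fun on_push(): DoorState => DoorShut
  fun on_lock(): DoorState => DoorOpen  // an open door cannot be locked

primitive DoorShut is DoorState
  fun name(): String => "shut"
  fun on_push(): DoorState => DoorOpen
  fun on_lock(): DoorState => DoorLocked

primitive DoorLocked is DoorState
  fun name(): String => "locked"
  fun on_push(): DoorState => DoorLocked  // pushing a locked door does nothing
  fun on_lock(): DoorState => DoorLocked

// The actor is a thin shell: it delegates to the current state and
// replaces it with whatever the state returns.
actor Door
  var _state: DoorState = DoorShut
  let _out: OutStream

  new create(out: OutStream) =>
    _out = out

  be push() =>
    _state = _state.on_push()
    _out.print("push -> " + _state.name())

  be lock() =>
    _state = _state.on_lock()
    _out.print("lock -> " + _state.name())

actor Main
  new create(env: Env) =>
    let door = Door(env.out)
    door.lock()
    door.push()
```

Neither behavior inspects the state to decide what to do, so the rules for each phase live in one type instead of being spread across the actor's methods.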
