deterministic-metric-design

Name: Deterministic Metric Design
Author: pproenca

// Inventing deterministic metrics — turning a fuzzy property like 'maintainability', 'risk', or 'how reducible this code is' into a deterministic, computable number an agent can trust and optimize. Covers the path from construct to adoption — operationalizing the construct, confronting computability limits (Kolmogorov, Rice) with sound proxies, picking the right measurement scale, proving properties (monotonicity, invariance, the Weyuker/Briand axioms), guaranteeing determinism, establishing construct validity (not just LOC in disguise), and hardening against Goodhart-style gaming when an agent optimizes the metric. Trigger when designing, reviewing, or validating a quantitative metric, score, measure, or index — and even when the user doesn't say 'metric' but wants to quantify, score, rank, or measure code/behavior, build a deterministic optimization target, or invent a measure for something previously unquantified (e.g., behavior-preserving codebase-size reduction).

stars:151

forks:9

updated:May 27, 2026 at 13:45

package.json

"author": "pproenca"

"repository": "pproenca/dot-skills"

dot-skills Deterministic Metric Design Best Practices

Design metrics that are deterministic, computable, provable, and valid — measures an agent can trust and optimize against without gaming them. The 44 rules across 8 categories take a metric from a fuzzy construct to an adoptable, machine-checkable number: define the construct, confront computability limits with sound proxies, ground it in measurement theory, prove its properties, pin its determinism, validate it empirically, harden it against optimization pressure, and package it for adoption.

A running example threads through every category — a deterministic measure of behavior-preserving codebase-size reduction (shrink code without changing how the app works). It is the ideal stress test because its ideal form is provably out of reach (Kolmogorov complexity is uncomputable; program equivalence is undecidable by Rice's theorem), so the whole craft is building a deterministic, tractable proxy with a proven guarantee.

This is the measurement-design layer that the *-algorithms skills apply (Big-O, NDCG, cyclomatic, MoJoFM) but never teach.

When to Apply

Use this skill when:

Designing a new metric, score, or index — or reviewing someone's proposed metric for rigor
Asked to "quantify", "measure", "score", or "rank" a property that has no agreed measure yet
Building a deterministic optimization target an agent will push on (e.g., reduce code size without changing behavior)
Auditing an existing metric that "feels off" — it suspiciously tracks LOC, jumps between runs, or gets gamed
Turning a research idea or formula into something computable, reproducible, and adoptable

Workflow: Define → Make Computable → Prove → Validate → Harden

The categories are ordered by cascade severity — an upstream mistake poisons everything below it. Work top-down, and jump straight to a category using this table:

| If you are… | Start in | First rule |
| --- | --- | --- |
| Starting from a fuzzy property | `def-` | def-name-the-latent-construct |
| Worried the ideal is uncomputable / undecidable | `comp-` | comp-do-not-define-metric-as-uncomputable-ideal |
| Unsure whether you can average or take ratios | `meas-` | meas-declare-the-scale-type |
| Claiming the metric behaves a certain way | `prop-` | prop-prove-monotonicity |
| Getting different numbers between runs | `det-` | det-pin-iteration-and-tie-break-order |
| Unsure it measures the real thing | `valid-` | valid-discriminant-not-just-loc |
| Letting an agent optimize the metric | `game-` | game-hard-block-construct-violating-wins |
| Publishing the metric for others | `agg-` | agg-ship-reference-impl-and-test-vectors |

Each reference file is named {category}-{slug}.md and contains: WHY the rule matters, an Incorrect example with the failure annotated, a Correct example with the minimal fix, and a reference. The incorrect/correct examples are metric definitions and procedures, not application code; the contrast is between a badly designed measure and the fixed one.

Rule Categories by Priority

| # | Category | Prefix | Impact | Rules |
| --- | --- | --- | --- | --- |
| 1 | Construct Definition & Operationalization | `def-` | CRITICAL | 6 |
| 2 | Computability & Tractability | `comp-` | CRITICAL | 7 |
| 3 | Measurement-Theoretic Foundations | `meas-` | HIGH | 5 |
| 4 | Proof of Metric Properties | `prop-` | HIGH | 6 |
| 5 | Determinism & Reproducibility | `det-` | HIGH | 5 |
| 6 | Construct Validity & Calibration | `valid-` | MEDIUM-HIGH | 6 |
| 7 | Optimization Safety & Anti-Gaming | `game-` | MEDIUM | 5 |
| 8 | Aggregation, Reporting & Adoption | `agg-` | LOW-MEDIUM | 4 |

See references/_sections.md for the full ordering rationale.

Quick Reference

1. Construct Definition & Operationalization (CRITICAL)

def-name-the-latent-construct — Name the unobservable property before writing any formula
def-separate-construct-from-proxy — Keep construct, proxy, and their assumed link distinct
def-write-falsifiable-operational-definition — Specify the exact procedure that yields the number
def-fix-unit-of-analysis — Pin the unit of analysis and the measurement boundary
def-anchor-to-the-decision — Attach the decision and action threshold the metric drives
def-operationalize-behavior-and-size — Define "behavior" (≈) and "size" so a formatter can't move them
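
As an illustration of def-operationalize-behavior-and-size, here is a minimal sketch of one possible "size" definition for Python source: a count of significant tokens, so that comments, blank lines, and whitespace cannot move the number. The function name `token_size` and the choice of ignored token kinds are assumptions made for this example; they are not taken from the skill's reference files.

```python
import io
import tokenize

# Token kinds treated as layout or commentary (an assumption of this sketch).
_IGNORED = {
    tokenize.COMMENT,
    tokenize.NL,
    tokenize.NEWLINE,
    tokenize.INDENT,
    tokenize.DEDENT,
    tokenize.ENCODING,
    tokenize.ENDMARKER,
}


def token_size(source: str) -> int:
    """Count significant tokens in Python source (hypothetical size unit)."""
    tokens = tokenize.generate_tokens(io.StringIO(source).readline)
    return sum(1 for tok in tokens if tok.type not in _IGNORED)


compact = "def f(a):\n    return a\n"
reformatted = "def f( a ):  # identity\n\n        return a\n"

# Reformatting and commenting leave the size unchanged.
print(token_size(compact), token_size(reformatted))
```

The unit here is "significant tokens"; a different language or a different notion of size (AST nodes, for instance) would need its own operational definition.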

2. Computability & Tractability (CRITICAL)

comp-do-not-define-metric-as-uncomputable-ideal — Don't define the metric as Kolmogorov complexity
comp-respect-rices-theorem-for-semantic-properties — Use sound approximations for undecidable semantic facts
comp-choose-a-decidable-observational-equivalence — Replace undecidable equivalence with a checkable ≈
comp-design-a-proxy-with-a-proven-error-direction — Give the proxy a sound bound that never over-states
comp-keep-the-metric-tractable — Pick a near-linear proxy, not an NP-hard optimum
comp-bound-approximation-error-explicitly — Quantify and report the proxy↔ideal gap
comp-prefer-monotone-confluent-transformations — Confluent, terminating rewrites give a unique fixed point
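
To make comp-choose-a-decidable-observational-equivalence concrete, the sketch below replaces "same behavior on all inputs" (undecidable in general) with "same observable result on a fixed, finite input set". The names `equivalent_on` and `_observe` are hypothetical. The check is sound in one direction only: a `False` result proves the two functions differ, while `True` means only that no difference was found on the chosen inputs.

```python
from typing import Any, Callable, Iterable, Tuple


def _observe(fn: Callable[[Any], Any], x: Any) -> Tuple[str, Any]:
    """Record what a caller can observe: a return value or an exception type."""
    try:
        return ("ok", fn(x))
    except Exception as exc:  # the exception type is part of observable behavior
        return ("raised", type(exc).__name__)


def equivalent_on(
    f: Callable[[Any], Any],
    g: Callable[[Any], Any],
    inputs: Iterable[Any],
) -> bool:
    """True if f and g agree on every input in the fixed, finite set."""
    return all(_observe(f, x) == _observe(g, x) for x in inputs)


double = lambda x: x * 2
square = lambda x: x ** 2

print(equivalent_on(double, square, [0, 2]))     # agree here, yet not equal
print(equivalent_on(double, square, [0, 2, 3]))  # refuted by the input 3
```

The first call shows why the input set belongs in the metric's definition: two different functions pass until a distinguishing input is included.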

3. Measurement-Theoretic Foundations (HIGH)

meas-declare-the-scale-type — Declare nominal/ordinal/interval/ratio before any statistic
meas-only-admissible-statistics — Use only statistics invariant under the scale's transforms
meas-establish-meaningful-zero-and-unit — Give a true zero and a named unit for ratio claims
meas-preserve-the-empirical-relation — Verify the metric orders known anchor cases correctly
meas-avoid-ad-hoc-weighted-sums — Don't sum incommensurable scales with arbitrary weights
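
The scale rules above can be sketched as a small guard that picks a summary statistic from the declared scale type instead of defaulting to the mean. The `Scale` enum and `central_tendency` helper are hypothetical names for this example; the mapping (mode for nominal, median for ordinal, mean for interval and ratio) follows the usual measurement-theory convention.

```python
from enum import Enum
from statistics import mean, median_low, mode
from typing import Sequence


class Scale(Enum):
    NOMINAL = "nominal"
    ORDINAL = "ordinal"
    INTERVAL = "interval"
    RATIO = "ratio"


def central_tendency(values: Sequence, scale: Scale):
    """Return a summary statistic that is admissible for the declared scale."""
    if not values:
        raise ValueError("no values to summarize")
    if scale is Scale.NOMINAL:
        return mode(values)        # only counting is meaningful
    if scale is Scale.ORDINAL:
        return median_low(values)  # order is meaningful, distances are not
    return mean(values)            # interval and ratio scales permit the mean


severity = [1, 2, 2, 5]  # ordinal ranks: the mean (2.5) would not be meaningful
print(central_tendency(severity, Scale.ORDINAL))
```

`median_low` is used so the ordinal summary is always one of the observed ranks rather than an interpolated value.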

4. Proof of Metric Properties (HIGH)

prop-prove-monotonicity — Prove the score moves the right way when the construct does
prop-prove-invariance-under-irrelevant-transforms — Prove invariance to renaming and formatting
prop-ensure-sensitivity-to-relevant-change — Ensure it still discriminates (no saturation)
prop-check-weyuker-briand-axioms — Check the published axioms for your measure type
prop-prove-boundedness-and-handle-empty — Prove the range; define the empty / zero-denominator case
prop-prove-or-disclaim-composability — Prove additivity before aggregating, or refuse to sum
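
For the running example, a hedged sketch of prop-prove-boundedness-and-handle-empty and prop-prove-monotonicity might look like this. The formula `(before - after) / before` and the convention for an empty baseline are assumptions of this example, not definitions taken from the skill. The exhaustive check over small sizes is evidence for the property, not a proof.

```python
def reduction_ratio(size_before: int, size_after: int) -> float:
    """Fraction of the baseline size removed; at most 1.0, negative on growth."""
    if size_before < 0 or size_after < 0:
        raise ValueError("sizes must be non-negative")
    if size_before == 0:
        if size_after == 0:
            return 0.0  # assumed convention: nothing to reduce, nothing changed
        raise ValueError("growth from an empty baseline has no defined ratio")
    return (size_before - size_after) / size_before


def check_monotone(max_size: int = 20) -> None:
    """For a fixed baseline, a smaller result must always score strictly higher."""
    for before in range(1, max_size + 1):
        scores = [reduction_ratio(before, after) for after in range(before + 1)]
        assert scores[0] == 1.0 and scores[-1] == 0.0
        assert all(a > b for a, b in zip(scores, scores[1:]))


check_monotone()
print(reduction_ratio(100, 75))
```

Stating the empty case as an explicit branch keeps the zero-denominator behavior part of the definition instead of an accident of the implementation.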

5. Determinism & Reproducibility (HIGH)

det-make-the-metric-a-pure-function — No hidden time, network, or global state
det-pin-iteration-and-tie-break-order — Sort by a total key; seed any randomness
det-pin-the-input-representation — Fix exactly which representation (AST stage) you measure
det-control-floating-point-and-accumulation — Fix summation order and rounding precision
det-version-and-record-the-toolchain — Emit metric version, tool versions, and input hash
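
The determinism rules above can be sketched together in one small function: iterate in a sorted order, sum with a method that does not depend on accumulation order, round to a fixed precision, and emit the metric version with a hash of the input. The names `METRIC_VERSION` and `total_size` and the report layout are hypothetical.

```python
import hashlib
import json
import math
from typing import Dict, Mapping

METRIC_VERSION = "0.1.0"  # hypothetical version string


def total_size(file_sizes: Mapping[str, float]) -> Dict[str, object]:
    """Aggregate per-file sizes into a reproducible report."""
    ordered = sorted(file_sizes.items())  # total order on the file path
    # math.fsum returns the correctly rounded sum, independent of input order.
    total = math.fsum(size for _, size in ordered)
    payload = json.dumps(ordered, separators=(",", ":")).encode("utf-8")
    return {
        "metric_version": METRIC_VERSION,
        "input_sha256": hashlib.sha256(payload).hexdigest(),
        "value": round(total, 6),  # fixed reporting precision
    }


a = total_size({"b.py": 0.1, "a.py": 0.2})
b = total_size({"a.py": 0.2, "b.py": 0.1})
print(a == b)  # insertion order of the mapping does not change the report
```

Tool versions (parser, formatter, interpreter) would be recorded the same way; they are omitted here to keep the sketch short.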

6. Construct Validity & Calibration (MEDIUM-HIGH)

valid-converge-with-accepted-measure — Show convergence with a trusted measure of the construct
valid-discriminant-not-just-loc — Prove incremental signal beyond LOC / size
valid-predictive-validity-against-outcome — Show it predicts the real outcome out-of-sample
valid-beat-the-trivial-baseline — Quote the lift over a dumb baseline
valid-calibrate-thresholds-to-ground-truth — Derive thresholds from data, not round numbers
valid-validate-out-of-sample — Use a holdout / temporal split to avoid overfitting the corpus
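
As a first screen for valid-discriminant-not-just-loc, one can check how strongly a candidate metric correlates with plain LOC over a corpus. The sketch below uses a hand-written Pearson correlation; the helper names and the 0.9 cut-off are placeholders for illustration (a real threshold would be derived from data, as valid-calibrate-thresholds-to-ground-truth asks). A low correlation does not prove incremental signal; it only rules out the most obvious failure.

```python
import math
from typing import Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length samples."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two samples of equal length >= 2")
    mx, my = math.fsum(xs) / n, math.fsum(ys) / n
    sxy = math.fsum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = math.fsum((x - mx) ** 2 for x in xs)
    syy = math.fsum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return sxy / math.sqrt(sxx * syy)


def is_loc_in_disguise(metric, loc, threshold: float = 0.9) -> bool:
    """Flag a metric whose values are almost a linear function of LOC."""
    return abs(pearson(metric, loc)) >= threshold


loc = [10, 20, 30, 40, 50]
print(is_loc_in_disguise([2 * n for n in loc], loc))  # a rescaled LOC count
print(is_loc_in_disguise([3, 1, 4, 1, 5], loc))       # weakly related to LOC
```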

7. Optimization Safety & Anti-Gaming (MEDIUM)

game-make-cheapest-improvement-the-right-one — Make the cheapest score gain the genuine one
game-recognize-goodhart-variants — Anticipate regressional / extremal / causal Goodhart
game-pair-with-guardrail-metrics — Add counter-metrics that veto a regressing "win"
game-hard-block-construct-violating-wins — Gate on invariants; never use a tradable soft penalty
game-detect-reward-hacking-with-audits — Spot-audit top scores; watch proxy↔outcome drift
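
The difference between a hard gate and a tradable soft penalty (game-hard-block-construct-violating-wins) can be shown on the running example. In this sketch, `Candidate`, `gated_score`, and `soft_penalty_score` are hypothetical names, and `behavior_preserved` stands for the outcome of whatever equivalence check the metric has adopted.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Candidate:
    size_before: int
    size_after: int
    behavior_preserved: bool  # outcome of the chosen equivalence check


def _raw(c: Candidate) -> float:
    return 0.0 if c.size_before == 0 else (c.size_before - c.size_after) / c.size_before


def soft_penalty_score(c: Candidate, penalty: float = 0.2) -> float:
    """Anti-pattern: a violation only costs points, so it can be bought back."""
    return _raw(c) - (0.0 if c.behavior_preserved else penalty)


def gated_score(c: Candidate) -> Optional[float]:
    """Hard gate: a behavior-breaking change has no score at all."""
    return _raw(c) if c.behavior_preserved else None


honest = Candidate(size_before=100, size_after=70, behavior_preserved=True)
cheat = Candidate(size_before=100, size_after=0, behavior_preserved=False)

print(soft_penalty_score(cheat) > soft_penalty_score(honest))  # the cheat wins
print(gated_score(cheat), gated_score(honest))                 # the cheat is rejected
```

Under the soft penalty, deleting everything outranks the honest reduction; under the gate, it cannot be ranked at all.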

8. Aggregation, Reporting & Adoption (LOW-MEDIUM)

agg-respect-scale-in-aggregation — Aggregate the way the scale permits (no mean of ordinal)
agg-report-uncertainty-not-false-precision — Report intervals / bounds, not false precision
agg-version-the-metric-publicly — Semver + changelog so consumers stay comparable
agg-ship-reference-impl-and-test-vectors — Publish test vectors so implementations agree
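
A minimal sketch of agg-ship-reference-impl-and-test-vectors: publish a small table of inputs with expected outputs next to the reference implementation, so that any reimplementation can be checked against it. The vector contents, `reference_metric`, and `failing_vectors` are assumptions made up for this example.

```python
from typing import Callable, List

# Hypothetical published vectors for a size-reduction ratio.
TEST_VECTORS = [
    {"name": "empty", "before": 0, "after": 0, "expected": 0.0},
    {"name": "no-change", "before": 40, "after": 40, "expected": 0.0},
    {"name": "quarter", "before": 40, "after": 30, "expected": 0.25},
    {"name": "all-removed", "before": 40, "after": 0, "expected": 1.0},
]


def reference_metric(before: int, after: int) -> float:
    """Reference implementation the vectors were generated from."""
    return 0.0 if before == 0 else (before - after) / before


def failing_vectors(metric: Callable[[int, int], float]) -> List[str]:
    """Names of the vectors on which an implementation disagrees."""
    return [
        v["name"]
        for v in TEST_VECTORS
        if metric(v["before"], v["after"]) != v["expected"]
    ]


# A reimplementation that picked a different convention for the empty case.
other_impl = lambda before, after: (before - after) / before if before else 1.0

print(failing_vectors(reference_metric))
print(failing_vectors(other_impl))
```

Edge cases such as the empty input are where independent implementations tend to diverge, so they are worth including in the vectors explicitly.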

How to Use

Identify where you are with the Workflow table and open the matching first rule.
Work the categories top-down — def- and comp- are CRITICAL because a fuzzy construct or an uncomputable ideal makes everything downstream noise or unusable.
When proposing or critiquing a metric, quote the rule by file path so reviewers can check the reasoning.
For a new metric, produce a one-page spec with one line for each category above: construct, proxy, scale + unit + zero, proven properties, determinism guarantees, validity evidence, guardrails, and version.
See references/_sections.md for ordering rationale and assets/templates/_template.md when adding rules.

Reference Files

| File | Description |
| --- | --- |
| references/_sections.md | Category definitions, impact levels, and ordering rationale |
| assets/templates/_template.md | Template for adding new rules |
| metadata.json | Discipline, type, and source references |

Related Skills

same-results-less-code, code-simplifier, complexity-optimizer, knip-deadcode — prescriptive code-reduction skills. This skill supplies the measurement layer they lack: a deterministic, behavior-preserving reduction metric to target and verify.
algorithmic-complexity-review, computer-science-algorithms — apply existing measures (Big-O). This skill teaches how to design new ones.
opensearch-function-scoring-algorithms — applied ranking metrics (NDCG, A/B tests). This skill is the foundational methodology beneath its eval- category.
