تشغيل أي مهارة في Manus بنقرة واحدة

$pwd:

verify-algorithms

Name: Verify Algorithms
Author: av1155

// Verify probabilistic, distributional, or random behaviour empirically before changing search-engine code. Loads when reading or editing src/houndarr/engine/. Use when a user, code review, or another AI surfaces a claim about bias, ordering, page selection, randomness, or "we are searching the same things over and over".

تشغيل في Manus

$ git log --oneline --stat

stars:٢٣٩

forks:٤

updated:٤ مايو ٢٠٢٦ في ١٧:٤٤

SKILL.md

readonly

name	verify-algorithms
description	Verify probabilistic, distributional, or random behaviour empirically before changing search-engine code. Loads when reading or editing src/houndarr/engine/. Use when a user, code review, or another AI surfaces a claim about bias, ordering, page selection, randomness, or "we are searching the same things over and over".
paths	["src/houndarr/engine/**"]

Verifying claims about algorithms

Before modifying search-engine logic, scheduling, randomisation, ordering, distribution, or any code where probability or stateful iteration governs behaviour, verify the claim empirically and analytically first. Most reported "bugs" in this class turn out to be sample noise, observation bias, or misreadings of timing-dependent state, and shipping a fix for a non-bug introduces real risk for no real gain.

When this rule fires

Apply this workflow whenever a user, a code review, or another AI surfaces a claim along the lines of:

"X picks the wrong page / item / branch"
"Y is biased / unfair / skewed toward Z"
"Random does not feel random"
"The cycle order is broken"
"We are searching the same things over and over"

It does not apply to clear logic bugs, typos, or behaviour-change requests. The trigger is specifically: claims about probabilistic or distribution-shaped behaviour where the right answer is a measured histogram, not a code reading.

Required workflow

Reproduce the algorithm in isolation against tests/mock_arr/, not against the live test instances or short-window log dumps. The live test *arrs hold tens of records, which is far below the sample size needed to distinguish bias from variance, and live state (cooldowns, hourly caps, *arr-side sort orders) confounds the measurement.
Derive analytically what each page, item, or branch's probability should be under the current code. Read the loop, write the math down, and predict the distribution shape before running anything. "I think it should be uniform" is not a prediction; "uniform with chi-square below 16.92 at df=9" is.
Run hundreds of cycles through tests/mock_arr/probe_distribution.py or a similar probe modelled on it. Compute chi-square, max/min ratio, and per-bucket standard deviation. Compare against the analytical prediction and against the 5% chi-square critical value at df = N - 1.
Decide on evidence. If the empirical result agrees with the prediction and the chi-square lands below the critical value, the claim is wrong. Document the finding, reference the probe output, and close the investigation. If the result confirms real bias, scope the fix to the smallest change that closes the measured gap, then re-run the probe to prove the gap is gone.

Tooling to use

just mock-arr port=PORT items=N seed=S launches the seeded multi-app mock server with configurable item counts and a deterministic seed; identical seeds produce byte-identical responses.
.venv/bin/python -m tests.mock_arr.probe_distribution boots the mock in-process, drives the production run_instance_search for many cycles across a sweep of library sizes, and reports per-cycle start-page distributions plus full visit histograms. Use it as the template for any new programmatic probe.
The mock exposes GET /__page_log__/{app} and GET /__commands__/{app} for ground-truth request and dispatch records, plus POST /__reset__/{app} to clear them between configurations.
For statistical-power-bound questions (100k+ trials), a short pure-Python simulation of just the algorithm beats running through HTTP. Use it when the measurement is about the math, not the integration.

What not to do

Do not treat a short-window dev-DB histogram (a few hours, dozens of cycles, a handful of items) as evidence of algorithmic bias. Cooldown phase, *arr-side sort order, and small-sample variance dominate that signal. The math you owe is a many-cycle distribution against a predicted shape.
Do not adopt an external diagnostic write-up without re-deriving the math yourself. Direction (page 1 vs page N) and magnitude (1.5x vs 5x) routinely invert in second-hand summaries of probabilistic algorithms, and shipping a fix for an inverted claim ships a regression.
Do not start coding because the claim is plausible. Plausibility is not evidence. The bar is a reproducible measurement that disagrees with the predicted distribution by more than chance.

Closing the loop

When measurement contradicts the claim, the writeup is the engineering contribution. Reference the probe output, state the measured statistics, explain what the original observation was actually picking up (cooldown saturation, recency effects, sort-order interaction, sample noise), and close the discussion. A correct "no change required" is a successful task, not a non-result.

Known emergent behaviours (already measured)

These are real but minor effects that have been verified by probe and deliberately left alone. Do not re-investigate them unless the operating point changes or a user reports a concrete regression.

Partial-last-page over-selection on missing/cutoff under random search order. When the engine's page_size does not divide totalRecords evenly, items on the (short) last page are drained every visit because the engine dispatches up to batch_size items per page. Measured at most 2x attention skew for the 1-9 items on the last page at default settings (batch=1, pageSize=10) and 4x in contrived configurations (batch=5, pageSize=20). Affects a small slice of the backlog; the only clean fix is a virtual flat-index draw which is a substantial redesign of _run_search_pass. Probe: tests/mock_arr/probe_cooldown.py.
Sonarr / Whisparr v2 windowed-rotation coverage time. The upgrade pass visits 5 series per cycle; full-library coverage takes approximately ceil(eligible_episodes * H / batch) cycles where H is the harmonic-coverage factor. Measured 91% theoretical and 85-89% empirical coverage at 60 cycles with batch=5 on 50 series. This is the intentional trade-off versus hammering one series with a single huge *arr fetch. Probe: tests/mock_arr/probe_upgrade_coverage.py.

related-skills.json

نفس المستودع

houndarr-architecture.md

from "av1155/houndarr"

Houndarr's source layout and architectural patterns at file granularity. Loads when reading or editing src/houndarr/. Covers the per-file purpose for auth/, clients/, engine/, routes/, services/; the wire-models vs domain-models split; the *arr API spec snapshots under docs/api/; and pointers to the more specific database / engine skills for narrower scopes.

2026-05-22239

houndarr-database.md

from "av1155/houndarr"

Houndarr's SQLite schema and migration discipline. Loads when reading or editing src/houndarr/database.py. Covers the schema table reference, the SCHEMA_VERSION bump checklist, and the version-locking rule for migration constants that prevents later renames from retroactively breaking earlier rebuild migrations.

2026-05-22239

houndarr-changelog.md

from "av1155/houndarr"

Houndarr's CHANGELOG.md style guide and entry rules. Loads when reading or editing CHANGELOG.md or VERSION. Covers the noun-led present-tense voice, the 80-160 character length target, vocabulary the operator can act on, banned phrasings, what does not belong in a changelog, separator rules, and the bullet-justification protocol that prevents drafting from PR titles or memory.

2026-05-04239

houndarr-ci.md

from "av1155/houndarr"

Houndarr's CI workflow reference and branch protection. Loads when reading or editing .github/workflows/. Covers the 11 required status checks (exact names matter for branch protection), the additional non-required workflows, the paths-ignore + ci-skip pattern that keeps docs-only PRs green, and the rule against changing required check job names.

2026-05-04239

houndarr-python.md

from "av1155/houndarr"

Houndarr-specific Python conventions on top of the global python skill. Loads when reading or editing .py files in this repo. Covers the noqa/nosec suppression table, the AppSettings-as-plain-dataclass decision, the frozen-dataclass-with-slots invariant for domain models, the per-module logger pattern, and the background-task error-handling shape.

2026-05-04239

houndarr-testing.md

from "av1155/houndarr"

Houndarr's pytest patterns. Loads when reading or editing tests/ files. Covers the fixture dependency graph, the FK seeding pattern for cooldowns/search_log tests, the local _login() and CSRF helpers in route tests, and the auth-state reset in test_settings.

2026-05-04239

package.json

"author": "av1155"

"repository": "av1155/houndarr"

فتح مستودع GitHub عرض مستودعات المنشئ

$ install --global

$ download --local

تشغيل في Manus

$ useful --forSOC

مطوّرو البرمجياتمهن الحاسوب والرياضيات15-1252L4

Verifying claims about algorithms

When this rule fires

Apply this workflow whenever a user, a code review, or another AI surfaces a claim along the lines of:

"X picks the wrong page / item / branch"

"Y is biased / unfair / skewed toward Z"

"Random does not feel random"

"The cycle order is broken"

"We are searching the same things over and over"

Required workflow

Reproduce the algorithm in isolation against tests/mock_arr/, not against the live test instances or short-window log dumps. The live test *arrs hold tens of records, which is far below the sample size needed to distinguish bias from variance, and live state (cooldowns, hourly caps, *arr-side sort orders) confounds the measurement.

Derive analytically what each page, item, or branch's probability should be under the current code. Read the loop, write the math down, and predict the distribution shape before running anything. "I think it should be uniform" is not a prediction; "uniform with chi-square below 16.92 at df=9" is.

Run hundreds of cycles through tests/mock_arr/probe_distribution.py or a similar probe modelled on it. Compute chi-square, max/min ratio, and per-bucket standard deviation. Compare against the analytical prediction and against the 5% chi-square critical value at df = N - 1.

Decide on evidence. If the empirical result agrees with the prediction and the chi-square lands below the critical value, the claim is wrong. Document the finding, reference the probe output, and close the investigation. If the result confirms real bias, scope the fix to the smallest change that closes the measured gap, then re-run the probe to prove the gap is gone.

Tooling to use

just mock-arr port=PORT items=N seed=S launches the seeded multi-app mock server with configurable item counts and a deterministic seed; identical seeds produce byte-identical responses.

.venv/bin/python -m tests.mock_arr.probe_distribution boots the mock in-process, drives the production run_instance_search for many cycles across a sweep of library sizes, and reports per-cycle start-page distributions plus full visit histograms. Use it as the template for any new programmatic probe.

The mock exposes GET /__page_log__/{app} and GET /__commands__/{app} for ground-truth request and dispatch records, plus POST /__reset__/{app} to clear them between configurations.

For statistical-power-bound questions (100k+ trials), a short pure-Python simulation of just the algorithm beats running through HTTP. Use it when the measurement is about the math, not the integration.

What not to do

Do not treat a short-window dev-DB histogram (a few hours, dozens of cycles, a handful of items) as evidence of algorithmic bias. Cooldown phase, *arr-side sort order, and small-sample variance dominate that signal. The math you owe is a many-cycle distribution against a predicted shape.

Do not adopt an external diagnostic write-up without re-deriving the math yourself. Direction (page 1 vs page N) and magnitude (1.5x vs 5x) routinely invert in second-hand summaries of probabilistic algorithms, and shipping a fix for an inverted claim ships a regression.

Do not start coding because the claim is plausible. Plausibility is not evidence. The bar is a reproducible measurement that disagrees with the predicted distribution by more than chance.

Closing the loop

Known emergent behaviours (already measured)

These are real but minor effects that have been verified by probe and deliberately left alone. Do not re-investigate them unless the operating point changes or a user reports a concrete regression.

Partial-last-page over-selection on missing/cutoff under random search order. When the engine's page_size does not divide totalRecords evenly, items on the (short) last page are drained every visit because the engine dispatches up to batch_size items per page. Measured at most 2x attention skew for the 1-9 items on the last page at default settings (batch=1, pageSize=10) and 4x in contrived configurations (batch=5, pageSize=20). Affects a small slice of the backlog; the only clean fix is a virtual flat-index draw which is a substantial redesign of _run_search_pass. Probe: tests/mock_arr/probe_cooldown.py.

Sonarr / Whisparr v2 windowed-rotation coverage time. The upgrade pass visits 5 series per cycle; full-library coverage takes approximately ceil(eligible_episodes * H / batch) cycles where H is the harmonic-coverage factor. Measured 91% theoretical and 85-89% empirical coverage at 60 cycles with batch=5 on 50 series. This is the intentional trade-off versus hammering one series with a single huge *arr fetch. Probe: tests/mock_arr/probe_upgrade_coverage.py.

verify-algorithms

Verifying claims about algorithms

When this rule fires

Required workflow

Tooling to use

What not to do

Closing the loop

Known emergent behaviours (already measured)

المزيد من هذا المستودع

Verifying claims about algorithms

When this rule fires

Required workflow

Tooling to use

What not to do

Closing the loop

Known emergent behaviours (already measured)

المزيد من هذا المستودع