
benchmark-and-docs-refresh

Name: Benchmark And Docs Refresh
Author: open-edge-platform

Description: Run or continue model benchmarks, collect measured results, and refresh README/docs benchmark sections from generated artifacts. Use when benchmark tables in model docs need to be created, updated, or corrected.


Stars: 5,786

Forks: 933

Updated: April 10, 2026 at 11:50


Benchmark and Docs Refresh

Use this skill to update benchmark sections in model documentation from real benchmark outputs.

Scope

This skill focuses on:

running or continuing benchmarks
collecting benchmark CSV results from results/
updating benchmark tables in model READMEs
updating matching docs pages when benchmark status changes

It does not own sample image export. Use model-sample-image-export for that.

Request changes when

incomplete benchmark coverage is presented;
README or docs benchmark status drifts from the actual run state.

Preferred Benchmark Workflow

Always prefer:

tools/experimental/benchmarking/benchmark.py

with an appropriate config file.

If the stock benchmark path is insufficient for a specific model:

derive a small helper script from the benchmark workflow
keep it model-specific unless multiple models clearly need the same pattern
save measurable outputs such as CSV files under results/
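
A model-specific helper of this kind could end by writing its measurements to a CSV file under results/. The following is only a minimal sketch: the function name `save_results`, the column names, and the row layout are hypothetical and not taken from the repository; only the `results/<model>_benchmark.csv` naming follows the artifact example given in this skill.

```python
import csv
from pathlib import Path


def save_results(rows, model, root="results"):
    """Write measured rows to <root>/<model>_benchmark.csv and return the path.

    `rows` is a non-empty list of dicts that all share the same keys.
    The column names are whatever the helper script measured (hypothetical here).
    """
    out_dir = Path(root)
    out_dir.mkdir(parents=True, exist_ok=True)
    out_path = out_dir / f"{model}_benchmark.csv"
    fieldnames = list(rows[0].keys())
    with out_path.open("w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)
    return out_path
```

Writing the file as the last step of the run means the table update can later be checked against a concrete artifact rather than against console output.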

Required Evidence

Only publish benchmark values when they come from actual artifacts, for example:

results/<model>_benchmark.csv
benchmark-generated CSV files under runs/ or results/
model-specific run outputs that clearly record the measured metrics

Never infer missing values.
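
One way to honor this rule in code is to load the artifact so that a missing cell stays explicitly missing. The sketch below assumes a CSV with a `category` column and hypothetical metric columns `image_AUROC` and `pixel_AUROC`; the real artifact columns may differ, and the sample numbers are placeholders, not measured results.

```python
import csv
import io


def load_measured(csv_text, metric_columns):
    """Return one dict per CSV row, with None for every unmeasured cell."""
    rows = []
    for record in csv.DictReader(io.StringIO(csv_text)):
        row = {"category": record.get("category", "")}
        for name in metric_columns:
            raw = (record.get(name) or "").strip()
            # An empty or absent cell stays None; it is never estimated.
            row[name] = float(raw) if raw else None
        rows.append(row)
    return rows


# Placeholder data for illustration only (hypothetical columns and values).
SAMPLE = "category,image_AUROC,pixel_AUROC\nbottle,0.5,\ncable,0.25,0.75\n"
rows = load_measured(SAMPLE, ["image_AUROC", "pixel_AUROC"])
print(rows[0]["pixel_AUROC"])  # None: the cell was empty in the artifact
```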

Update Rules

When refreshing benchmark tables:

Read the target README and matching docs page first.
Read the benchmark artifact source.
Fill only the shot-settings and metrics that actually exist.
Leave unavailable rows blank or marked TODO.
Update status wording if the benchmark is still partial or still running.
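
The fill-only-what-exists rule can be sketched as a small row formatter. This is an illustrative assumption about how a table row might be built, not the repository's tooling: the function name `format_table_row` is hypothetical, cell text is copied verbatim from the artifact, and the value shown is a placeholder.

```python
def format_table_row(label, cells):
    """Build one Markdown table row, keeping artifact text verbatim."""
    # cells holds strings copied from the artifact, or None when nothing was measured.
    shown = [cell if cell else "TODO" for cell in cells]
    return "| " + " | ".join([label, *shown]) + " |"


# Placeholder values for illustration only.
print(format_table_row("bottle", ["0.5", None]))  # | bottle | 0.5 | TODO |
```

Passing the cell text through unchanged, instead of reformatting a float, keeps the published value identical to what the artifact records.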

Table Conventions

Common sections to refresh:

### Image-Level AUC
### Pixel-Level AUC
### Image F1 Score
### Pixel F1 Score

If a README only contains placeholders, replace only the rows supported by measured results.

Docs Synchronization Rules

If the README benchmark state changes, update the matching docs page under:

docs/source/markdown/guides/reference/models/image/<model>.md
docs/source/markdown/guides/reference/models/video/<model>.md

The docs page may stay shorter than the README, but it must not contradict it.
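
To find the page that must be kept in sync, the two path patterns above can be expanded for a given model name. A minimal sketch; the helper name `candidate_docs_pages` and the model name `my_model` are hypothetical.

```python
from pathlib import PurePosixPath

DOCS_ROOT = PurePosixPath("docs/source/markdown/guides/reference/models")


def candidate_docs_pages(model):
    """Return the image and video docs paths where a model page may live."""
    return [DOCS_ROOT / kind / f"{model}.md" for kind in ("image", "video")]


for page in candidate_docs_pages("my_model"):
    print(page)
```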

Quality Checks

Before finishing:

Confirm the benchmark artifact still exists.
Confirm copied values exactly match the artifact.
Confirm averages are computed from measured values only.
Confirm incomplete rows remain clearly incomplete.
Confirm README/docs wording matches reality.
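
The check on averages can be made mechanical. The sketch below is one possible approach, with a hypothetical function name and placeholder inputs: it averages only the measured entries and also reports how many entries were measured, so a partial average can be labeled as partial.

```python
def measured_average(values):
    """Average only measured entries; return (average, measured_count, total_count)."""
    measured = [v for v in values if v is not None]
    if not measured:
        # Nothing was measured, so there is no average to publish.
        return None, 0, len(values)
    return sum(measured) / len(measured), len(measured), len(values)


# Placeholder inputs for illustration only; None marks an unmeasured entry.
average, measured_count, total_count = measured_average([0.5, None, 0.25])
print(average, measured_count, total_count)  # 0.375 2 3
```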

Reviewer checklist

Check that the artifact exists.
Check that every copied value matches.
Check that partial runs are labeled clearly.
Check README and docs wording for consistency.
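
A reviewer could automate the value comparison along these lines. This is a sketch under stated assumptions: both the artifact and the published table are represented as dicts keyed by hypothetical (row, metric) pairs, with cell text as strings, and the values are placeholders.

```python
def find_mismatches(artifact, published):
    """Return the keys whose published text differs from the artifact text."""
    problems = []
    for key, shown in published.items():
        if shown in ("", "TODO"):
            continue  # incomplete cells are allowed; invented ones are not
        if artifact.get(key) != shown:
            problems.append(key)
    return problems


# Placeholder data for illustration only.
artifact = {("bottle", "image_AUROC"): "0.5"}
published = {
    ("bottle", "image_AUROC"): "0.5",
    ("bottle", "pixel_AUROC"): "TODO",
    ("cable", "image_AUROC"): "0.9",  # no such value in the artifact
}
print(find_mismatches(artifact, published))  # [('cable', 'image_AUROC')]
```

A published value with no counterpart in the artifact is reported the same way as a value that differs, since neither is backed by a measurement.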

Repo-Specific Notes

Some benchmark jobs in this repo may require derived helper scripts.
Some long runs are better continued in tmux/background sessions.
A benchmark can be complete enough to fill a subset of rows without justifying filling every row.
Never replace TODOs with fabricated numbers.


Repository: open-edge-platform/anomalib


