Name: Benchmarks
Author: graphistry

// Internal maintainer skill for running, validating, and publishing eval benchmarks for this repository.

stars: 2

forks: 0

updated: March 30, 2026 at 06:34

SKILL.md

name	benchmarks
description	Internal maintainer skill for running, validating, and publishing eval benchmarks for this repository.
metadata	{"internal":true}

Benchmarks Skill (Internal Maintainer)

Use this for repository maintenance workflows. It is not a user-facing Graphistry domain skill.

Use This Skill For

Running eval sweeps across journeys (skill pressure, persona, guardrails, etc.)
Validating baseline isolation (skills=off must not read skill files)
Generating public-safe benchmark reports
Updating README.md with fresh benchmark numbers
Creating CHANGELOG.md entries for benchmark releases
Git version tagging (semver vX.Y.Z)

Success Criteria (Do Not Skip)

Sweeps complete without baseline contamination (verify via log inspection)
pass_bool is used for pass rate calculation (not score >= 0.8)
Reports are public-safe (source paths redacted)
README.md and benchmarks/README.md are updated with new numbers
CHANGELOG.md has entry for the sweep
Git tag created after merge

Key Metrics

pass_bool: Official pass/fail metric (deterministic checks)
Delta: skills=on pass rate minus skills=off pass rate (in percentage points)
Latency: Average response time in seconds
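
As a rough illustration of the metrics above, the sketch below computes a pass rate from `pass_bool` and the skills=on minus skills=off delta in percentage points. The `skills_mode` field name and the sample rows are assumptions made for the example; the real `rows.jsonl` schema is not shown in this document.

```python
from typing import Dict, Iterable, List


def pass_rate(rows: Iterable[Dict]) -> float:
    """Percentage of rows whose pass_bool is true (0.0 for empty input)."""
    rows = list(rows)
    if not rows:
        return 0.0
    passed = sum(1 for row in rows if row.get("pass_bool") is True)
    return 100.0 * passed / len(rows)


def skills_delta(rows: Iterable[Dict]) -> float:
    """skills=on pass rate minus skills=off pass rate, in percentage points."""
    rows = list(rows)
    # "skills_mode" is a hypothetical field name used only in this sketch.
    on: List[Dict] = [r for r in rows if r.get("skills_mode") == "on"]
    off: List[Dict] = [r for r in rows if r.get("skills_mode") == "off"]
    return pass_rate(on) - pass_rate(off)


# Made-up rows: note the second one has a high score but still fails.
sample = [
    {"skills_mode": "on", "pass_bool": True, "score": 0.9},
    {"skills_mode": "on", "pass_bool": False, "score": 0.85},
    {"skills_mode": "off", "pass_bool": False, "score": 0.4},
    {"skills_mode": "off", "pass_bool": False, "score": 0.2},
]

print(pass_rate(sample[:2]))   # pass rate for the skills=on rows
print(skills_delta(sample))    # delta in percentage points
```

The second sample row shows why a `score >= 0.8` threshold and `pass_bool` can disagree: the row clears the threshold yet counts as a fail.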

Preconditions

CLIs on PATH: codex, claude, jq
Auth configured: ~/.codex, ~/.claude
Working from repo root: graphistry-skills/
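
A minimal preflight sketch for the preconditions above, assuming it is enough to confirm that each CLI resolves on PATH and that each auth path exists. It does not verify that the stored credentials are valid, and the helper name is hypothetical.

```python
import shutil
from pathlib import Path

REQUIRED_CLIS = ("codex", "claude", "jq")
AUTH_PATHS = ("~/.codex", "~/.claude")


def missing_preconditions(clis=REQUIRED_CLIS, auth_paths=AUTH_PATHS, which=shutil.which):
    """Return a list of human-readable problems; an empty list means all checks passed."""
    problems = []
    for cli in clis:
        if which(cli) is None:
            problems.append(f"CLI not on PATH: {cli}")
    for raw_path in auth_paths:
        if not Path(raw_path).expanduser().exists():
            problems.append(f"auth path missing: {raw_path}")
    return problems


for problem in missing_preconditions():
    print(problem)
```

The `which` parameter exists only so the lookup can be stubbed in tests.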

Workflow

1) Run Eval Sweep

OUT="/tmp/graphistry_skills_sweep_$(date +%Y%m%d-%H%M%S)"
./bin/agent.sh \
  --codex --claude \
  --journeys all \
  --skills-mode both \
  --skills-delivery native \
  --max-workers 2 \
  --out "$OUT"

2) Verify Baseline Isolation

Check that the skills=off logs show no skill file reads:

grep -l "SKILL.md" "$OUT"/raw/*skills_off* 2>/dev/null && echo "CONTAMINATION DETECTED" || echo "Clean"
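
One caveat with the one-liner above: if the glob matches no files, `grep` fails and the command still prints "Clean". The sketch below is one possible stricter variant, assuming the same `raw/*skills_off*` layout; it treats an empty match as an error. The function name and demo file names are made up for illustration.

```python
import tempfile
from pathlib import Path
from typing import List


def contaminated_logs(out_dir: str, needle: str = "SKILL.md") -> List[Path]:
    """Return skills=off log files under <out_dir>/raw that mention `needle`."""
    raw = Path(out_dir) / "raw"
    logs = sorted(p for p in raw.glob("*skills_off*") if p.is_file())
    if not logs:
        # An empty match is a failed check, not a clean result.
        raise FileNotFoundError(f"no skills_off logs found under {raw}")
    return [p for p in logs if needle in p.read_text(encoding="utf-8", errors="replace")]


# Self-contained demo on throwaway files.
with tempfile.TemporaryDirectory() as tmp:
    raw_dir = Path(tmp) / "raw"
    raw_dir.mkdir()
    (raw_dir / "codex_skills_off_1.log").write_text("opened SKILL.md", encoding="utf-8")
    (raw_dir / "claude_skills_off_1.log").write_text("no skill reads", encoding="utf-8")
    demo_hits = [p.name for p in contaminated_logs(tmp)]
    print(demo_hits or "Clean")
```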

3) Generate Public-Safe Report

python3 scripts/benchmarks/make_report.py \
  --public-safe \
  --rows "$OUT/rows.jsonl" \
  --title "Eval Sweep $(date +%Y-%m-%d)" \
  --out-md benchmarks/reports/$(date +%Y-%m-%d)-sweep.md \
  --out-json benchmarks/data/$(date +%Y-%m-%d)-sweep/combined_metrics.json
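
The command above evaluates `date` three times. In the rare case that it runs across midnight, the title and paths could carry different stamps. A small sketch, assuming the same naming scheme, that derives both output paths from a single date (the helper name is hypothetical):

```python
from datetime import date
from pathlib import Path
from typing import Tuple


def report_paths(day: date, root: str = "benchmarks") -> Tuple[Path, Path]:
    """Build the report and metrics paths for one sweep from a single date."""
    stamp = day.isoformat()  # YYYY-MM-DD
    report_md = Path(root, "reports", f"{stamp}-sweep.md")
    metrics_json = Path(root, "data", f"{stamp}-sweep", "combined_metrics.json")
    return report_md, metrics_json


md_path, json_path = report_paths(date(2026, 3, 30))
print(md_path.as_posix())
print(json_path.as_posix())
```

Whether `make_report.py` creates the `benchmarks/data/<date>-sweep/` directory itself is not stated here, so it may need to exist before the run.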

4) Generate README Snippet

python3 scripts/benchmarks/readme_snippet.py \
  --rows "$OUT/rows.jsonl" \
  --title "Fresh eval sweep"

5) Update Files

Update README.md Evals section with generated snippet
Update benchmarks/README.md with new pack reference
Add entry to CHANGELOG.md under [Development] section
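
The CHANGELOG step can be sketched as follows, assuming the file has a Markdown heading line containing `[Development]` with bullet entries directly beneath it. The actual layout of this repository's CHANGELOG.md is not shown here, so treat the heading format and helper name as assumptions.

```python
def add_changelog_entry(text: str, entry: str, marker: str = "[Development]") -> str:
    """Insert `- entry` as the first bullet under the heading that contains `marker`."""
    lines = text.splitlines()
    for i, line in enumerate(lines):
        if line.lstrip().startswith("#") and marker in line:
            j = i + 1
            # Skip blank lines that directly follow the heading.
            while j < len(lines) and not lines[j].strip():
                j += 1
            lines.insert(j, f"- {entry}")
            return "\n".join(lines) + "\n"
    raise ValueError(f"no heading containing {marker!r} found")


before = "# Changelog\n\n## [Development]\n\n- Older entry\n"
after = add_changelog_entry(before, "Refresh eval sweep benchmarks")
print(after)
```

If the section uses subsections such as "Added" or "Changed", the entry would land above them and need to be moved by hand.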

6) Create PR and Tag

After PR merge:

git fetch origin main && git checkout main && git pull
git tag -a vX.Y.Z -m "Release vX.Y.Z: <summary>"
git push origin vX.Y.Z

Versioning Convention

Semver: vX.Y.Z (following Supabase MCP, Databricks AI Dev Kit patterns)
Patch (Z): Bug fixes, minor eval improvements
Minor (Y): New journeys, new skills, notable benchmark changes
Major (X): Breaking changes to eval harness or skill format
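
To make the convention above concrete, here is a small sketch that computes the next tag from the current one. It handles only plain `vX.Y.Z` tags (no pre-release or build suffixes), and the function name is hypothetical.

```python
import re

_TAG = re.compile(r"^v(\d+)\.(\d+)\.(\d+)$")


def bump(tag: str, level: str) -> str:
    """Return the next vX.Y.Z tag for a 'major', 'minor', or 'patch' bump."""
    match = _TAG.match(tag)
    if match is None:
        raise ValueError(f"not a vX.Y.Z tag: {tag!r}")
    major, minor, patch = (int(part) for part in match.groups())
    if level == "major":
        return f"v{major + 1}.0.0"   # breaking harness or skill-format change
    if level == "minor":
        return f"v{major}.{minor + 1}.0"   # new journeys, skills, notable changes
    if level == "patch":
        return f"v{major}.{minor}.{patch + 1}"   # bug fixes, minor eval improvements
    raise ValueError(f"unknown bump level: {level!r}")


print(bump("v1.2.3", "minor"))
```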

File Locations

Journeys: evals/journeys/*.json
Scripts: scripts/benchmarks/make_report.py, scripts/benchmarks/readme_snippet.py
Reports: benchmarks/reports/*.md
Data: benchmarks/data/*/combined_metrics.json
Raw artifacts (private): rows.jsonl, manifest.json, traces, logs

Guardrails

Do not check in raw rows.jsonl (contains full prompt/response text)
Do not check in manifest.json, otel_ids.json, or raw logs
Always use --public-safe flag for checked-in reports
Use pass_bool for official metrics, not score thresholds
Verify baseline isolation before publishing results
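
A possible pre-commit style check for the first two guardrails, assuming it is enough to match on file names. It covers `rows.jsonl`, `manifest.json`, and `otel_ids.json` but not raw logs, whose naming is not specified here; the helper name is hypothetical.

```python
from pathlib import PurePosixPath
from typing import Iterable, List

PRIVATE_NAMES = {"rows.jsonl", "manifest.json", "otel_ids.json"}


def private_artifacts(paths: Iterable[str]) -> List[str]:
    """Return the paths whose file name is one of the private artifact names."""
    return [p for p in paths if PurePosixPath(p).name in PRIVATE_NAMES]


# Example input; in practice this could be the output of
# `git diff --cached --name-only`, one path per line.
staged = [
    "benchmarks/reports/2026-03-30-sweep.md",
    "tmp/rows.jsonl",
    "benchmarks/data/2026-03-30-sweep/combined_metrics.json",
    "manifest.json",
]
blocked = private_artifacts(staged)
print(blocked)
```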

Related Skills

eval-otel: OTel trace validation and inspection
plan: Multi-session task planning (for complex benchmark campaigns)
release: Semver bump + changelog + tag + GitHub release workflow

related-skills.json

same repository

pygraphistry-gfql.md

from "graphistry/graphistry-skills"

Construct and run GFQL graph queries in PyGraphistry using chain-list syntax OR Cypher strings. Covers pattern matching, hop constraints, predicates, let/DAG bindings, GRAPH constructors, and remote execution. Use when requests involve subgraph extraction, path-style matching, Cypher queries, or GPU/remote graph query workflows.

2026-03-30 2

release.md

from "graphistry/graphistry-skills"

Internal maintainer skill for cutting graphistry-skills releases (changelog bump, PR merge, semver tag, and GitHub release publish).

2026-03-30 2

pygraphistry-core.md

from "graphistry/graphistry-skills"

Core PyGraphistry workflow for authentication, shaping edges/nodes/hypergraphs, and plotting. Use for first-run setup, converting tables to graphs, and producing an initial interactive graph quickly and safely.

2026-03-30 2

graphistry-rest-api.md

from "graphistry/graphistry-skills"

Graphistry Hub REST API specialist for auth, upload lifecycle, URL controls, sessions, and sharing safety. Use for curl/requests endpoint guidance independent of SDK choice.

2026-03-22 2

graphistry.md

from "graphistry/graphistry-skills"

Umbrella router for Graphistry workflows across SDK and API surfaces. Use to dispatch between Python SDK, REST API, and (future) JavaScript SDK workflows.

2026-03-22 2

pygraphistry-connectors.md

from "graphistry/graphistry-skills"

Select and use PyGraphistry connector and plugin workflows for graph databases, SQL/data platforms, SIEM/log sources, and layout/compute plugins. Use when requests involve Neo4j/Neptune/Splunk/Kusto/Databricks/SQL/TigerGraph and similar integrations.

2026-03-22 2

package.json

"author": "graphistry"

"repository": "graphistry/graphistry-skills"
