benchmarks
Internal maintainer skill for running, validating, and publishing eval benchmarks for this repository.
| Field | Value |
| --- | --- |
| name | benchmarks |
| description | Internal maintainer skill for running, validating, and publishing eval benchmarks for this repository. |
| metadata | {"internal":true} |
Use this for repository maintenance workflows. It is not a user-facing Graphistry domain skill.
- pass_bool is used for pass rate calculation (not score >= 0.8)
- skills=on pass rate minus skills=off pass rate (in percentage points)

- codex, claude, jq
- ~/.codex, ~/.claude
- graphistry-skills/

OUT="/tmp/graphistry_skills_sweep_$(date +%Y%m%d-%H%M%S)"
./bin/agent.sh \
--codex --claude \
--journeys all \
--skills-mode both \
--skills-delivery native \
--max-workers 2 \
--out "$OUT"
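Once a sweep has written rows.jsonl, the two metric definitions (pass rate from pass_bool, and skills=on minus skills=off in percentage points) can be sketched as follows. This is only an illustration: the `skills_mode` field name and its `"on"`/`"off"` values are assumptions about the row schema, the helper names are hypothetical, and the sample rows are made up rather than real results. The repository's own scripts remain the source of official numbers.

```python
import json
from collections import defaultdict


def load_rows(path):
    """Read one JSON object per non-empty line (JSONL)."""
    with open(path, encoding="utf-8") as fh:
        return [json.loads(line) for line in fh if line.strip()]


def pass_rates(rows, mode_key="skills_mode"):
    """Return {mode: pass rate in percent}, counting pass_bool only.

    mode_key is an assumed field name; adjust it to the real row schema.
    """
    totals = defaultdict(int)
    passes = defaultdict(int)
    for row in rows:
        mode = row[mode_key]
        totals[mode] += 1
        # pass_bool decides pass/fail; the numeric score is deliberately ignored.
        if row["pass_bool"]:
            passes[mode] += 1
    return {mode: 100.0 * passes[mode] / totals[mode] for mode in totals}


def skills_delta(rates):
    """skills=on pass rate minus skills=off pass rate, in percentage points."""
    return rates["on"] - rates["off"]


# Illustrative rows only (not real benchmark output). Two of them show
# pass_bool and a score >= 0.8 threshold disagreeing.
sample = [
    {"skills_mode": "on", "pass_bool": True, "score": 0.9},
    {"skills_mode": "on", "pass_bool": True, "score": 0.7},
    {"skills_mode": "off", "pass_bool": False, "score": 0.85},
    {"skills_mode": "off", "pass_bool": True, "score": 0.95},
]
rates = pass_rates(sample)
delta = skills_delta(rates)
```

With real data, `pass_rates(load_rows(path))` would take the place of the sample list, assuming the rows carry the fields named above.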
Check that the skills=off logs contain no skill file reads:
grep -l "SKILL.md" "$OUT"/raw/*skills_off* 2>/dev/null && echo "CONTAMINATION DETECTED" || echo "Clean"
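The same contamination check can be sketched in Python, for cases where the result needs to feed a script rather than a terminal. It mirrors the grep above: list every file under raw/ whose name contains skills_off and whose text mentions SKILL.md. The function name and the log file names in the demo are hypothetical.

```python
import tempfile
from pathlib import Path


def contaminated_logs(out_dir, marker="SKILL.md"):
    """Return skills_off files under out_dir/raw whose text contains marker."""
    hits = []
    for path in sorted(Path(out_dir, "raw").glob("*skills_off*")):
        # errors="replace" keeps the scan going on logs with stray bytes.
        if path.is_file() and marker in path.read_text(encoding="utf-8", errors="replace"):
            hits.append(path)
    return hits


# Demo against a throwaway directory; file names are made up for illustration.
with tempfile.TemporaryDirectory() as tmp:
    raw = Path(tmp, "raw")
    raw.mkdir()
    (raw / "codex_skills_off_a.log").write_text("read README.md\n", encoding="utf-8")
    (raw / "claude_skills_off_b.log").write_text("opened skills/x/SKILL.md\n", encoding="utf-8")
    (raw / "codex_skills_on_c.log").write_text("opened skills/x/SKILL.md\n", encoding="utf-8")
    flagged = [p.name for p in contaminated_logs(tmp)]

print("CONTAMINATION DETECTED" if flagged else "Clean")
```

An empty list corresponds to the "Clean" branch of the shell one-liner. Skills=on logs are ignored, since reading skill files is expected there.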
python3 scripts/benchmarks/make_report.py \
--public-safe \
--rows "$OUT/rows.jsonl" \
--title "Eval Sweep $(date +%Y-%m-%d)" \
--out-md benchmarks/reports/$(date +%Y-%m-%d)-sweep.md \
--out-json benchmarks/data/$(date +%Y-%m-%d)-sweep/combined_metrics.json
python3 scripts/benchmarks/readme_snippet.py \
--rows "$OUT/rows.jsonl" \
--title "Fresh eval sweep"
- README.md Evals section with generated snippet
- benchmarks/README.md with new pack reference
- CHANGELOG.md under [Development] section

After PR merge:
git fetch origin main && git checkout main && git pull
git tag -a vX.Y.Z -m "Release vX.Y.Z: <summary>"
git push origin vX.Y.Z
- evals/journeys/*.json
- scripts/benchmarks/make_report.py, scripts/benchmarks/readme_snippet.py
- benchmarks/reports/*.md
- benchmarks/data/*/combined_metrics.json
- rows.jsonl, manifest.json, traces, logs

- rows.jsonl (contains full prompt/response text)
- manifest.json, otel_ids.json, or raw logs
- --public-safe flag for checked-in reports
- pass_bool for official metrics, not score thresholds

- eval-otel: OTel trace validation and inspection
- plan: Multi-session task planning (for complex benchmark campaigns)
- release: Semver bump + changelog + tag + GitHub release workflow