sv-data
Build and manage Security Verifiers datasets. Use when asked to build E1 or E2 datasets, create test fixtures, validate data, or manage dataset files for network-logs or config-verification environments.
| name | sv-data |
| description | Build and manage Security Verifiers datasets. Use when asked to build E1 or E2 datasets, create test fixtures, validate data, or manage dataset files for network-logs or config-verification environments. |
| metadata | {"author":"security-verifiers","version":"1.0"} |
Build, validate, and manage datasets for E1 (network-logs) and E2 (config-verification) environments.
| Type | Purpose | Location | Committed |
|---|---|---|---|
| Production | Full training/eval data | environments/sv-env-*/data/ | No (private) |
| Test fixtures | CI/unit tests | environments/sv-env-*/data/ | Yes (small) |
| HuggingFace | Remote dataset access | HF Hub | Private repos |
# Full dataset (1800 examples, 60/20/20 split)
make data-e1
# Custom limit
make data-e1 LIMIT=3000
Outputs: environments/sv-env-network-logs/data/iot23-train-dev-test-v1.jsonl
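As a worked example of the 60/20/20 split, the sketch below computes per-split sizes for a given example count. The helper name `split_counts` is hypothetical and not part of the repository, and it says nothing about how the build script actually assigns examples to splits.

```python
def split_counts(total, percents=(60, 20, 20)):
    """Return (train, dev, test) sizes for `total` examples.

    Hypothetical helper for illustration only. Integer arithmetic avoids
    float rounding; any remainder is assigned to the train split so the
    three sizes always sum to `total`.
    """
    dev = total * percents[1] // 100
    test = total * percents[2] // 100
    train = total - dev - test
    return train, dev, test


print(split_counts(1800))  # prints (1080, 360, 360)
print(split_counts(3000))  # prints (1800, 600, 600)
```

Under this assumption the default build yields 1080 train, 360 dev, and 360 test examples, and `LIMIT=3000` yields 1800, 600, and 600.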
# CIC-IDS-2017 and UNSW-NB15 (600 examples each)
make data-e1-ood
# Custom count
make data-e1-ood N=1000
Outputs:
- cic-ids-2017-ood-v1.jsonl
- unsw-nb15-ood-v1.jsonl
# Small datasets for CI (20-30 examples)
make data-e1-test
E2 datasets are built from real Kubernetes and Terraform configs:
# Clone recommended repos
make clone-e2-sources
Clones to scripts/data/sources/:
- kubernetes/ - K8s YAML manifests
- terraform/ - Terraform HCL configs
# Using cloned sources
make data-e2-local
# Using custom paths
make data-e2 K8S_ROOT=/path/to/k8s TF_ROOT=/path/to/terraform
Outputs:
- environments/sv-env-config-verification/data/k8s-labeled-v1.jsonl
- environments/sv-env-config-verification/data/terraform-labeled-v1.jsonl
# Requires clone-e2-sources first
make clone-e2-sources
make data-e2-test
# All E1 production datasets
make data-all
# All test fixtures (for CI)
make data-test-all
Validate datasets with Pydantic before HuggingFace push:
# Validate E1 splits
make validate-e1-data
# Validate E2 splits
make validate-e2-data
# Validate all
make validate-data
Note: Examples below show schema structure only. Actual benchmark data is gated to prevent training contamination. See plans/ROADMAP-Q1-2026.md for the benchmark integrity policy.
E1 (network-logs) Hub/local JSONL format:
{
  "question": "<network log entry - content gated>",
  "answer": "Benign|Malicious",
  "meta": {
    "source": "<dataset source>",
    "scenario": "<capture scenario>",
    "attack_family": "<attack type if malicious>",
    "hash": "<content hash>",
    "split": "train|dev|test"
  }
}
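To make the E1 schema above concrete, here is a minimal standard-library sketch that checks one JSONL line against it. This is not the project's Pydantic model: the function `check_e1_record` is hypothetical, and it assumes the `|`-separated values in the schema are the only allowed values and that `meta.split` is always present.

```python
import json

# Allowed values, read off the schema above (assumed to be exhaustive).
E1_LABELS = {"Benign", "Malicious"}
E1_SPLITS = {"train", "dev", "test"}


def check_e1_record(line):
    """Parse one JSONL line and return a list of schema problems (empty if none)."""
    record = json.loads(line)
    problems = []
    if not isinstance(record.get("question"), str):
        problems.append("question must be a string")
    if record.get("answer") not in E1_LABELS:
        problems.append("answer must be Benign or Malicious")
    meta = record.get("meta")
    if not isinstance(meta, dict):
        problems.append("meta must be an object")
    elif meta.get("split") not in E1_SPLITS:
        problems.append("meta.split must be train, dev, or test")
    return problems


# Placeholder record; real benchmark content is gated.
sample = json.dumps({
    "question": "<placeholder log entry>",
    "answer": "Benign",
    "meta": {"source": "<dataset source>", "hash": "<content hash>", "split": "train"},
})
print(check_e1_record(sample))  # prints []
```

Returning a list of problems rather than raising on the first one makes it easier to report every issue in a file in a single pass.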
E2 (config-verification) Hub/local JSONL format:
{
  "question": "<k8s/terraform config - content gated>",
  "info": {
    "violations": [
      {
        "tool": "kube-linter|semgrep|opa",
        "rule_id": "<rule identifier>",
        "severity": "low|medium|high",
        "msg": "<violation message>",
        "loc": "<file:line if available>"
      }
    ],
    "patch": "<optional unified diff>"
  },
  "meta": {
    "lang": "k8s|terraform",
    "source": "<source repository>",
    "hash": "<content hash>"
  }
}
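The nested `violations` list is the part of the E2 schema most likely to drift, so a small check can help when hand-writing fixtures. The sketch below is an assumption-laden illustration, not the repository's validator: `check_e2_violations` is a hypothetical name, and the tool and severity sets are simply the values listed in the schema above.

```python
# Allowed values, read off the schema above (assumed to be exhaustive).
E2_TOOLS = {"kube-linter", "semgrep", "opa"}
E2_SEVERITIES = {"low", "medium", "high"}


def check_e2_violations(record):
    """Return a list of problems found in record["info"]["violations"]."""
    problems = []
    violations = record.get("info", {}).get("violations", [])
    for index, violation in enumerate(violations):
        if violation.get("tool") not in E2_TOOLS:
            problems.append(f"violations[{index}].tool is not a known tool")
        if violation.get("severity") not in E2_SEVERITIES:
            problems.append(f"violations[{index}].severity is not low/medium/high")
    return problems


# Placeholder record; real benchmark content is gated.
fixture = {
    "question": "<placeholder config>",
    "info": {
        "violations": [
            {"tool": "opa", "rule_id": "<rule identifier>", "severity": "high"},
            {"tool": "unknown-tool", "rule_id": "<rule identifier>", "severity": "low"},
        ]
    },
    "meta": {"lang": "k8s"},
}
print(check_e2_violations(fixture))
# prints ['violations[1].tool is not a known tool']
```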
When datasets are loaded, the environment converts them to internal format:
- E1: question → prompt, answer → expected label
- E2: question → prompt, info → answer (JSON string with oracle violations)
The conversion happens in _convert_e2_format() in sv_env_config_verification.py.
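The field mapping described above can be sketched in a few lines. This is an illustration only and not the body of `_convert_e2_format()`: the function names `convert_e1_record` and `convert_e2_record` and the output key names are assumptions.

```python
import json


def convert_e1_record(record):
    """E1 sketch: question becomes the prompt, answer is kept as the expected label."""
    return {"prompt": record["question"], "answer": record["answer"]}


def convert_e2_record(record):
    """E2 sketch: question becomes the prompt, info is serialized to a JSON string."""
    return {"prompt": record["question"], "answer": json.dumps(record["info"])}


# Placeholder record; real benchmark content is gated.
e2_record = {
    "question": "<placeholder config>",
    "info": {"violations": [{"tool": "opa", "severity": "high"}], "patch": ""},
}
converted = convert_e2_record(e2_record)
# The answer is a string, so consumers parse it back to reach the oracle violations.
print(json.loads(converted["answer"])["violations"][0]["tool"])  # prints opa
```

Serializing `info` to a string keeps the answer field a single scalar value for both environments, at the cost of a `json.loads` call wherever the oracle violations are read.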
| Environment | Data Directory |
|---|---|
| E1 network-logs | environments/sv-env-network-logs/data/ |
| E2 config-verification | environments/sv-env-config-verification/data/ |
Environments support multi-tier loading:
1. Local: the environment's data/ directory
2. Hub: HuggingFace Hub dataset
3. Synthetic: generated fallback data
import verifiers as vf
# Auto mode (tries local → hub → synthetic)
env = vf.load_environment("sv-env-network-logs")
# Explicit source
env = vf.load_environment("sv-env-network-logs", dataset_source="local")
env = vf.load_environment("sv-env-network-logs", dataset_source="hub")
env = vf.load_environment("sv-env-network-logs", dataset_source="synthetic")
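The auto mode's local → hub → synthetic order is a plain fallback chain. The sketch below shows that pattern in isolation; it is not the environment's loader, and `load_with_fallback` plus the three stub loaders are hypothetical stand-ins.

```python
def load_with_fallback(loaders):
    """Try each (name, loader) pair in order and return the first that succeeds."""
    errors = {}
    for name, loader in loaders:
        try:
            return name, loader()
        except Exception as exc:  # a real loader would catch narrower errors
            errors[name] = exc
    raise RuntimeError(f"all dataset sources failed: {errors}")


# Stub loaders standing in for the three tiers.
def missing_local():
    raise FileNotFoundError("no local JSONL found")


def missing_hub():
    raise PermissionError("HF_TOKEN not set")


def synthetic():
    return [{"question": "<synthetic example>", "answer": "Benign"}]


source, rows = load_with_fallback(
    [("local", missing_local), ("hub", missing_hub), ("synthetic", synthetic)]
)
print(source, len(rows))  # prints: synthetic 1
```

Returning the source name alongside the data makes it visible when a run has silently fallen back to synthetic examples, which matters when comparing eval results.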
HF_TOKEN required: Set in .env for gated dataset access.
Missing sources: Run make clone-e2-sources before E2 data builds.
Validation fails: Check schema matches expected Pydantic models in scripts/data/validate_splits_*.py.