Name: Sv Data
Author: intertwine


Updated: January 24, 2026 at 19:53

Repository: intertwine/security-verifiers

name	sv-data
description	Build and manage Security Verifiers datasets. Use when asked to build E1 or E2 datasets, create test fixtures, validate data, or manage dataset files for network-logs or config-verification environments.
metadata	{"author":"security-verifiers","version":"1.0"}

Security Verifiers Data Management

Build, validate, and manage datasets for E1 (network-logs) and E2 (config-verification) environments.

Dataset Types

| Type | Purpose | Location | Committed to git |
| --- | --- | --- | --- |
| Production | Full training/eval data | `environments/sv-env-*/data/` | No (private) |
| Test fixtures | CI/unit tests | `environments/sv-env-*/data/` | Yes (small) |
| HuggingFace | Remote dataset access | HF Hub | No (private HF repos) |

E1: Network Logs Datasets

Build Production Dataset (IoT-23)

```shell
# Full dataset (1800 examples, 60/20/20 split)
make data-e1

# Custom limit
make data-e1 LIMIT=3000
```

Output: `environments/sv-env-network-logs/data/iot23-train-dev-test-v1.jsonl`
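The default build records a 60/20/20 train/dev/test split in each record's `meta.split` field. The sketch below shows one way to make such a split reproducible. It orders examples by a SHA-256 hash of their `question` text and slices the result with integer arithmetic. `assign_splits` is a hypothetical helper, and the hash-based ordering is an assumption. The repository's build script may shuffle or stratify differently.

```python
import hashlib
from typing import Dict, List, Sequence

SPLITS = ("train", "dev", "test")


def assign_splits(examples: List[Dict], percents: Sequence[int] = (60, 20, 20)) -> List[Dict]:
    """Hypothetical helper: set meta.split on each example and return them ordered by split.

    Examples are sorted by the SHA-256 of their "question" text, so identical
    inputs always land in the same split whatever order they arrive in. The
    dicts are modified in place. Integer arithmetic keeps the counts exact;
    any rounding remainder goes to the test split.
    """
    ordered = sorted(
        examples,
        key=lambda ex: hashlib.sha256(ex["question"].encode("utf-8")).hexdigest(),
    )
    n = len(ordered)
    n_train = n * percents[0] // 100
    n_dev = n * percents[1] // 100
    for i, ex in enumerate(ordered):
        if i < n_train:
            split = SPLITS[0]
        elif i < n_train + n_dev:
            split = SPLITS[1]
        else:
            split = SPLITS[2]
        ex.setdefault("meta", {})["split"] = split
    return ordered
```

With 1800 examples, this yields 1080 train, 360 dev, and 360 test records.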

Build OOD (Out-of-Distribution) Datasets

```shell
# CIC-IDS-2017 and UNSW-NB15 (600 examples each)
make data-e1-ood

# Custom count
make data-e1-ood N=1000
```

Outputs:

- `cic-ids-2017-ood-v1.jsonl`
- `unsw-nb15-ood-v1.jsonl`
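Each OOD file should contain the requested number of examples: 600 by default, or `N`. A quick stdlib-only check counts the records per label. This is a sketch that assumes each file follows the E1 JSONL schema given under Dataset Schema, with one object per line and an `answer` of `Benign` or `Malicious`. `count_labels` is a hypothetical helper.

```python
import json
from collections import Counter
from pathlib import Path


def count_labels(path) -> Counter:
    """Hypothetical helper: count E1 "answer" labels in a JSONL file, skipping blank lines."""
    counts: Counter = Counter()
    with Path(path).open(encoding="utf-8") as fh:
        for line in fh:
            if line.strip():
                counts[json.loads(line)["answer"]] += 1
    return counts


# Usage (adjust the path to wherever the build wrote the file):
# c = count_labels("cic-ids-2017-ood-v1.jsonl")
# print(sum(c.values()), dict(c))
```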

Build Test Fixtures

```shell
# Small datasets for CI (20-30 examples)
make data-e1-test
```

E2: Config Verification Datasets

Clone Source Repositories

E2 datasets are built from real Kubernetes and Terraform configs:

```shell
# Clone recommended repos
make clone-e2-sources
```

Clones to `scripts/data/sources/`:

- `kubernetes/`: K8s YAML manifests
- `terraform/`: Terraform HCL configs
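The sketch below walks the two cloned trees and collects candidate config files per language. Treating `.yaml`/`.yml` as Kubernetes manifests and `.tf` as Terraform configs is an assumption, and the real build script may filter files differently. `find_configs` is a hypothetical helper.

```python
from pathlib import Path
from typing import Dict, List, Tuple

# Assumed extensions for each cloned tree; the real build may filter differently.
EXTENSIONS: Dict[str, Tuple[str, ...]] = {
    "kubernetes": (".yaml", ".yml"),
    "terraform": (".tf",),
}


def find_configs(root: str = "scripts/data/sources") -> Dict[str, List[Path]]:
    """Hypothetical helper: list candidate config files under each cloned tree."""
    found: Dict[str, List[Path]] = {}
    for subdir, exts in EXTENSIONS.items():
        base = Path(root) / subdir
        if not base.is_dir():
            found[subdir] = []  # tree not cloned yet
            continue
        found[subdir] = sorted(
            p for p in base.rglob("*") if p.is_file() and p.suffix in exts
        )
    return found
```

For example, `{k: len(v) for k, v in find_configs().items()}` gives a quick count of files per tree after `make clone-e2-sources`.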

Build Production Dataset

```shell
# Using cloned sources
make data-e2-local

# Using custom paths
make data-e2 K8S_ROOT=/path/to/k8s TF_ROOT=/path/to/terraform
```

Outputs:

- `environments/sv-env-config-verification/data/k8s-labeled-v1.jsonl`
- `environments/sv-env-config-verification/data/terraform-labeled-v1.jsonl`

Build Test Fixtures

```shell
# Requires clone-e2-sources first
make clone-e2-sources
make data-e2-test
```

Build All Datasets

```shell
# All E1 production datasets
make data-all

# All test fixtures (for CI)
make data-test-all
```

Data Validation

Validate datasets against Pydantic models before pushing to HuggingFace:

```shell
# Validate E1 splits
make validate-e1-data

# Validate E2 splits
make validate-e2-data

# Validate all
make validate-data
```
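The `make` targets use the Pydantic models in `scripts/data/validate_splits_*.py`. The sketch below is not those models. It is a stdlib-only illustration of the same idea, assuming one JSON object per line. It parses each line, runs a per-record check, and reports every failure with its line number so a bad record can be located before a push. `validate_jsonl` and `has_question` are hypothetical names.

```python
import json
from pathlib import Path
from typing import Callable, List, Tuple


def validate_jsonl(path, check: Callable[[dict], List[str]]) -> List[Tuple[int, str]]:
    """Return (line_number, message) pairs for every malformed or invalid record."""
    problems: List[Tuple[int, str]] = []
    with Path(path).open(encoding="utf-8") as fh:
        for lineno, line in enumerate(fh, start=1):
            if not line.strip():
                continue  # tolerate blank lines
            try:
                record = json.loads(line)
            except json.JSONDecodeError as exc:
                problems.append((lineno, f"invalid JSON: {exc.msg}"))
                continue
            if not isinstance(record, dict):
                problems.append((lineno, "record is not a JSON object"))
                continue
            problems.extend((lineno, msg) for msg in check(record))
    return problems


def has_question(record: dict) -> List[str]:
    """Example check: every record needs a non-empty string "question"."""
    q = record.get("question")
    return [] if isinstance(q, str) and q else ["missing or empty 'question'"]
```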

Dataset Schema

Note: The examples below show schema structure only. Actual benchmark data is gated to prevent training contamination. See `plans/ROADMAP-Q1-2026.md` for the benchmark integrity policy.

E1 Schema (network-logs)

Hub/local JSONL format (pretty-printed here; each record occupies a single line in the file):

```json
{
  "question": "<network log entry - content gated>",
  "answer": "Benign|Malicious",
  "meta": {
    "source": "<dataset source>",
    "scenario": "<capture scenario>",
    "attack_family": "<attack type if malicious>",
    "hash": "<content hash>",
    "split": "train|dev|test"
  }
}
```
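A record-level check mirroring the E1 structure above might look like the following. It is a hypothetical stdlib sketch, not the repository's Pydantic model. Which `meta` keys are mandatory is an assumption here: the sketch requires `source` and `hash`, checks `split` only when present, and ignores the informational `scenario` and `attack_family` fields.

```python
from typing import Any, Dict, List

E1_LABELS = {"Benign", "Malicious"}
E1_SPLITS = {"train", "dev", "test"}


def check_e1_record(record: Dict[str, Any]) -> List[str]:
    """Hypothetical check: return schema problems for one E1 record (empty list = looks valid)."""
    errors: List[str] = []
    if not isinstance(record.get("question"), str):
        errors.append("'question' must be a string")
    if record.get("answer") not in E1_LABELS:
        errors.append("'answer' must be 'Benign' or 'Malicious'")
    meta = record.get("meta")
    if not isinstance(meta, dict):
        errors.append("'meta' must be an object")
        return errors
    # Assumed mandatory meta keys.
    for key in ("source", "hash"):
        if not isinstance(meta.get(key), str):
            errors.append(f"'meta.{key}' must be a string")
    # split is only validated when present (assumption: OOD files may omit it).
    if "split" in meta and meta["split"] not in E1_SPLITS:
        errors.append("'meta.split' must be train, dev, or test")
    return errors
```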

E2 Schema (config-verification)

Hub/local JSONL format (pretty-printed here; each record occupies a single line in the file):

```json
{
  "question": "<k8s/terraform config - content gated>",
  "info": {
    "violations": [
      {
        "tool": "kube-linter|semgrep|opa",
        "rule_id": "<rule identifier>",
        "severity": "low|medium|high",
        "msg": "<violation message>",
        "loc": "<file:line if available>"
      }
    ],
    "patch": "<optional unified diff>"
  },
  "meta": {
    "lang": "k8s|terraform",
    "source": "<source repository>",
    "hash": "<content hash>"
  }
}
```
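A matching sketch for E2 records checks the enumerated fields shown above: `tool`, `severity`, and `meta.lang`. It also treats `patch` as optional. This is a hypothetical stdlib illustration, not the repository's Pydantic model, and it deliberately leaves free-text fields such as `msg` and `loc` unchecked.

```python
from typing import Any, Dict, List

TOOLS = {"kube-linter", "semgrep", "opa"}
SEVERITIES = {"low", "medium", "high"}
LANGS = {"k8s", "terraform"}


def check_e2_record(record: Dict[str, Any]) -> List[str]:
    """Hypothetical check: return schema problems for one E2 record (empty list = looks valid)."""
    errors: List[str] = []
    if not isinstance(record.get("question"), str):
        errors.append("'question' must be a string")
    info = record.get("info")
    if not isinstance(info, dict) or not isinstance(info.get("violations"), list):
        errors.append("'info.violations' must be a list")
    else:
        for i, v in enumerate(info["violations"]):
            if not isinstance(v, dict):
                errors.append(f"violation {i} must be an object")
                continue
            if v.get("tool") not in TOOLS:
                errors.append(f"violation {i}: unknown tool {v.get('tool')!r}")
            if v.get("severity") not in SEVERITIES:
                errors.append(f"violation {i}: unknown severity {v.get('severity')!r}")
            if not isinstance(v.get("rule_id"), str):
                errors.append(f"violation {i}: 'rule_id' must be a string")
        # patch is optional; None or absent both count as "no patch".
        patch = info.get("patch")
        if patch is not None and not isinstance(patch, str):
            errors.append("'info.patch' must be a string when present")
    meta = record.get("meta")
    if not isinstance(meta, dict) or meta.get("lang") not in LANGS:
        errors.append("'meta.lang' must be 'k8s' or 'terraform'")
    return errors
```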

Schema Conversion

When datasets are loaded, the environment converts them to its internal format:

- E1: `question` → prompt, `answer` → expected label
- E2: `question` → prompt, `info` → answer (JSON string with oracle violations)

The E2 conversion happens in `_convert_e2_format()` in `sv_env_config_verification.py`.
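To make the mapping concrete, here is a hypothetical sketch of it. It is not `_convert_e2_format()` itself. It assumes the internal record uses `prompt` and `answer` keys (following the arrows above), that `info` is serialized with `json.dumps`, and that `meta` is carried through unchanged.

```python
import json
from typing import Any, Dict


def convert_e2_record(record: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical sketch: question -> prompt, info -> answer as a JSON string."""
    return {
        "prompt": record["question"],
        # Oracle violations travel as a JSON string so the answer stays a plain str.
        "answer": json.dumps(record.get("info", {"violations": []})),
        "meta": record.get("meta", {}),  # assumption: metadata passed through unchanged
    }


def convert_e1_record(record: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical sketch: question -> prompt, answer kept as the expected label."""
    return {"prompt": record["question"], "answer": record["answer"], "meta": record.get("meta", {})}
```

Serializing `info` to a string lets both environments share a single string-valued answer field. A grader can then call `json.loads` on the E2 answer to recover the violations.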

Dataset Locations

| Environment | Data directory |
| --- | --- |
| E1 network-logs | `environments/sv-env-network-logs/data/` |
| E2 config-verification | `environments/sv-env-config-verification/data/` |

Loading Datasets

Environments support multi-tier loading:

- Local: JSONL files in the `data/` directory
- Hub: HuggingFace (requires `HF_TOKEN`)
- Synthetic: built-in test fixtures (fallback)

```python
import verifiers as vf

# Auto mode (tries local → hub → synthetic)
env = vf.load_environment("sv-env-network-logs")

# Explicit source
env = vf.load_environment("sv-env-network-logs", dataset_source="local")
env = vf.load_environment("sv-env-network-logs", dataset_source="hub")
env = vf.load_environment("sv-env-network-logs", dataset_source="synthetic")
```
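Auto mode's local → hub → synthetic order can be pictured as a chain of loaders in which the first tier that succeeds wins. The sketch below is only an illustration of that control flow. `load_with_fallback`, the `DatasetUnavailable` exception, and the loader callables are hypothetical stand-ins, not the environment's actual code.

```python
from typing import Callable, List, Sequence, Tuple


class DatasetUnavailable(Exception):
    """Hypothetical: raised by a tier that cannot supply data (no files, no HF_TOKEN, ...)."""


Loader = Callable[[], List[dict]]


def load_with_fallback(tiers: Sequence[Tuple[str, Loader]]) -> Tuple[str, List[dict]]:
    """Try each (name, loader) in order and return the first tier that succeeds."""
    reasons = []
    for name, loader in tiers:
        try:
            return name, loader()
        except DatasetUnavailable as exc:
            reasons.append(f"{name}: {exc}")  # remember why this tier was skipped
    raise DatasetUnavailable("; ".join(reasons))
```

Collecting the reason for each skipped tier means a failed auto-mode load can report every tier it tried rather than only the last one.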

Troubleshooting

- HF_TOKEN required: set `HF_TOKEN` in `.env` for gated dataset access.
- Missing sources: run `make clone-e2-sources` before E2 data builds.
- Validation fails: check that records match the expected Pydantic models in `scripts/data/validate_splits_*.py`.
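A small preflight script can catch the first two problems before a long build. This hypothetical sketch does not read `.env` itself. It assumes `HF_TOKEN` has already been exported into the process environment and that the cloned sources live under `scripts/data/sources/kubernetes` and `scripts/data/sources/terraform`, as in the clone step.

```python
import os
from pathlib import Path
from typing import List, Mapping, Optional


def preflight(sources_root: str = "scripts/data/sources",
              env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Hypothetical check: return human-readable problems; an empty list means all passed."""
    env = os.environ if env is None else env
    problems: List[str] = []
    if not env.get("HF_TOKEN"):
        problems.append("HF_TOKEN is not set; gated Hub datasets will not load")
    for sub in ("kubernetes", "terraform"):
        if not (Path(sources_root) / sub).is_dir():
            problems.append(f"{sources_root}/{sub} missing; run `make clone-e2-sources`")
    return problems


if __name__ == "__main__":
    for problem in preflight():
        print("preflight:", problem)
```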
