with one click
verification-quality-assurance
// Comprehensive truth scoring, code quality verification, and automatic rollback system with 0.95 accuracy threshold for ensuring high-quality agent outputs and codebase reliability.
// Comprehensive truth scoring, code quality verification, and automatic rollback system with 0.95 accuracy threshold for ensuring high-quality agent outputs and codebase reliability.
| name | Verification & Quality Assurance |
| description | Comprehensive truth scoring, code quality verification, and automatic rollback system with 0.95 accuracy threshold for ensuring high-quality agent outputs and codebase reliability. |
This skill provides a comprehensive verification and quality assurance system that ensures code quality and correctness through:
Shipped vs. aspirational. The concrete, in-CI verification stack — the 6 regression-guard jobs + the witness manifest + the tool-discoverability audit — is real and runs on every push. The truth-scoring / auto-rollback / WebSocket-dashboard surface described later in this doc is partly shipped (
ruflo verifyruns the witness checks) and partly design — treat the "CI Guards" section below as the authoritative current state.
Ruflo's regression protection is three layers, all gated before publish. Authoritative reference: verification/README.md.
| Layer | What | CI job(s) in .github/workflows/v3-ci.yml | ADR |
|---|---|---|---|
| 1 — install/behavioral smoke | Exercise user-visible failure modes against a real build | smoke-install-no-bsqlite (npm install on platforms w/o prebuilds), plugin-hooks-smoke (#1859/#1862 — hook flag parsing), mcp-protocol-smoke (#1874 — HTTP MCP wire format), memory-import-smoke (#1883/#1884 — WSL path + key sanitization), mcp-roundtrip-smoke (#1889 paired-tool round-trip + #1863 cli-no-crash + ADR-095 G2 consensus-transport) | ADR-102 |
| 1 — discoverability gate | Every MCP tool description must answer "use this over native when?" | tool-descriptions-audit — scripts/audit-tool-descriptions.mjs, baseline at verification/mcp-tool-baseline.json (monotone-decreasing: noGuidance / tooShort / duplicates) | ADR-112 |
| 2 — cryptographic witness | Every documented fix's load-bearing marker must still be present in dist; Ed25519-signed, per-OS bundles | witness-verify (ubuntu/macos/windows) — plugins/ruflo-core/scripts/witness/verify.mjs against verification/<os>/manifest.md.json | ADR-103 |
| 3 — temporal history | When was a regression introduced | verification/<os>/history.jsonl + history.mjs (summary / regressions / timeline) | ADR-103 |
# Tool-description discoverability audit (ADR-112)
node scripts/audit-tool-descriptions.mjs # fails if any baseline count rises
node scripts/audit-tool-descriptions.mjs --update-baseline # lock the new floor after a fix lands
# Behavioral smokes (each builds what it needs; safe to run individually)
node plugins/ruflo-core/scripts/test-hooks.mjs "node $PWD/v3/@claude-flow/cli/bin/cli.js"
node plugins/ruflo-core/scripts/test-mcp-protocol.mjs
node plugins/ruflo-core/scripts/test-memory-import.mjs
node plugins/ruflo-core/scripts/test-mcp-roundtrips.mjs # #1889 paired-tool round-trip
node plugins/ruflo-core/scripts/test-cli-no-crash.mjs # #1863 unhandled-exception class
node plugins/ruflo-core/scripts/test-consensus-transport.mjs # ADR-095 G2 consensus transport
# Witness manifest — regenerate + verify
node scripts/regen-witness.mjs
node plugins/ruflo-core/scripts/witness/verify.mjs --manifest verification/macos/manifest.md.json
# Temporal history
node plugins/ruflo-core/scripts/witness/history.mjs --history verification/macos/history.jsonl summary
node plugins/ruflo-core/scripts/witness/history.mjs --history verification/macos/history.jsonl regressions
plugins/ruflo-core/scripts/test-<name>.mjs. Pattern: static dist-scan first (fast, always completes), behavioral probe second with an internal timeout + a process-level watchdog so CI never hangs. Add a step to the relevant job in v3-ci.yml.scripts/audit-<name>.mjs that scans, counts violations, and fails if the count exceeds a monotone-decreasing baseline in verification/<name>-baseline.json. Support --update-baseline. Add a CI job; wire it into witness-verify needs[] if it should gate publish.{ id, desc, file, marker } to verification/witness-fixes.json, run node scripts/regen-witness.mjs. The marker must be a substring the fix specifically creates (not a generic pattern like 'function').npx ruflo@alpha)@noble/ed25519 (for the witness verifier — a single runtime dep, npm i @noble/ed25519)# View current truth scores
npx ruflo@alpha truth
# Run verification check
npx ruflo@alpha verify check
# Verify specific file with custom threshold
npx ruflo@alpha verify check --file src/app.js --threshold 0.98
# Rollback last failed verification
npx ruflo@alpha verify rollback --last-good
Display comprehensive quality and reliability metrics for your codebase and agent tasks.
Basic Usage:
# View current truth scores (default: table format)
npx ruflo@alpha truth
# View scores for specific time period
npx ruflo@alpha truth --period 7d
# View scores for specific agent
npx ruflo@alpha truth --agent coder --period 24h
# Find files/tasks below threshold
npx ruflo@alpha truth --threshold 0.8
Output Formats:
# Table format (default)
npx ruflo@alpha truth --format table
# JSON for programmatic access
npx ruflo@alpha truth --format json
# CSV for spreadsheet analysis
npx ruflo@alpha truth --format csv
# HTML report with visualizations
npx ruflo@alpha truth --format html --export report.html
Real-time Monitoring:
# Watch mode with live updates
npx ruflo@alpha truth --watch
# Export metrics automatically
npx ruflo@alpha truth --export .claude-flow/metrics/truth-$(date +%Y%m%d).json
Example dashboard output:
📊 Truth Metrics Dashboard
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Overall Truth Score: 0.947 ✅
Trend: ↗️ +2.3% (7d)
Top Performers:
verification-agent 0.982 ⭐
code-analyzer 0.971 ⭐
test-generator 0.958 ✅
Needs Attention:
refactor-agent 0.821 ⚠️
docs-generator 0.794 ⚠️
Recent Tasks:
task-456 0.991 ✅ "Implement auth"
task-455 0.967 ✅ "Add tests"
task-454 0.743 ❌ "Refactor API"
Truth Scores (0.0-1.0):
1.0-0.95: Excellent ⭐ (production-ready)0.94-0.85: Good ✅ (acceptable quality)0.84-0.75: Warning ⚠️ (needs attention)<0.75: Critical ❌ (requires immediate action)Trend Indicators:
Statistics:
Execute comprehensive verification checks on code, tasks, or agent outputs.
File Verification:
# Verify single file
npx ruflo@alpha verify check --file src/app.js
# Verify directory recursively
npx ruflo@alpha verify check --directory src/
# Verify with auto-fix enabled
npx ruflo@alpha verify check --file src/utils.js --auto-fix
# Verify current working directory
npx ruflo@alpha verify check
Task Verification:
# Verify specific task output
npx ruflo@alpha verify check --task task-123
# Verify with custom threshold
npx ruflo@alpha verify check --task task-456 --threshold 0.99
# Verbose output for debugging
npx ruflo@alpha verify check --task task-789 --verbose
Batch Verification:
# Verify multiple files in parallel
npx ruflo@alpha verify batch --files "*.js" --parallel
# Verify with pattern matching
npx ruflo@alpha verify batch --pattern "src/**/*.ts"
# Integration test suite
npx ruflo@alpha verify integration --test-suite full
The verification system evaluates:
Code Correctness
Best Practices
Security
Performance
Documentation
# Get structured JSON output
npx ruflo@alpha verify check --json > verification.json
# Example JSON structure:
{
"overallScore": 0.947,
"passed": true,
"threshold": 0.95,
"checks": [
{
"name": "code-correctness",
"score": 0.98,
"passed": true
},
{
"name": "security",
"score": 0.91,
"passed": false,
"issues": [...]
}
]
}
Automatically revert changes that fail verification checks.
Basic Rollback:
# Rollback to last known good state
npx ruflo@alpha verify rollback --last-good
# Rollback to specific commit
npx ruflo@alpha verify rollback --to-commit abc123
# Interactive rollback with preview
npx ruflo@alpha verify rollback --interactive
Smart Rollback:
# Rollback only failed files (preserve good changes)
npx ruflo@alpha verify rollback --selective
# Rollback with automatic backup
npx ruflo@alpha verify rollback --backup-first
# Dry-run mode (preview without executing)
npx ruflo@alpha verify rollback --dry-run
Rollback Performance:
Create detailed verification reports with metrics and visualizations.
Report Formats:
# JSON report
npx ruflo@alpha verify report --format json
# HTML report with charts
npx ruflo@alpha verify report --export metrics.html --format html
# CSV for data analysis
npx ruflo@alpha verify report --format csv --export metrics.csv
# Markdown summary
npx ruflo@alpha verify report --format markdown
Time-based Reports:
# Last 24 hours
npx ruflo@alpha verify report --period 24h
# Last 7 days
npx ruflo@alpha verify report --period 7d
# Last 30 days with trends
npx ruflo@alpha verify report --period 30d --include-trends
# Custom date range
npx ruflo@alpha verify report --from 2025-01-01 --to 2025-01-31
Report Content:
Run interactive web-based verification dashboard with real-time updates.
# Launch dashboard on default port (3000)
npx ruflo@alpha verify dashboard
# Custom port
npx ruflo@alpha verify dashboard --port 8080
# Export dashboard data
npx ruflo@alpha verify dashboard --export
# Dashboard with auto-refresh
npx ruflo@alpha verify dashboard --refresh 5s
Dashboard Features:
Set verification preferences in .claude-flow/config.json:
{
"verification": {
"threshold": 0.95,
"autoRollback": true,
"gitIntegration": true,
"hooks": {
"preCommit": true,
"preTask": true,
"postEdit": true
},
"checks": {
"codeCorrectness": true,
"security": true,
"performance": true,
"documentation": true,
"bestPractices": true
}
},
"truth": {
"defaultFormat": "table",
"defaultPeriod": "24h",
"warningThreshold": 0.85,
"criticalThreshold": 0.75,
"autoExport": {
"enabled": true,
"path": ".claude-flow/metrics/truth-daily.json"
}
}
}
Adjust verification strictness:
# Strict mode (99% accuracy required)
npx ruflo@alpha verify check --threshold 0.99
# Lenient mode (90% acceptable)
npx ruflo@alpha verify check --threshold 0.90
# Set default threshold
npx ruflo@alpha config set verification.threshold 0.98
Per-environment thresholds:
{
"verification": {
"thresholds": {
"production": 0.99,
"staging": 0.95,
"development": 0.90
}
}
}
GitHub Actions:
name: Quality Verification
on: [push, pull_request]
jobs:
verify:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- name: Install Dependencies
run: npm install
- name: Run Verification
run: |
npx ruflo@alpha verify check --json > verification.json
- name: Check Truth Score
run: |
score=$(jq '.overallScore' verification.json)
if (( $(echo "$score < 0.95" | bc -l) )); then
echo "Truth score too low: $score"
exit 1
fi
- name: Upload Report
uses: actions/upload-artifact@v3
with:
name: verification-report
path: verification.json
GitLab CI:
verify:
stage: test
script:
- npx ruflo@alpha verify check --threshold 0.95 --json > verification.json
- |
score=$(jq '.overallScore' verification.json)
if [ $(echo "$score < 0.95" | bc) -eq 1 ]; then
echo "Verification failed with score: $score"
exit 1
fi
artifacts:
paths:
- verification.json
reports:
junit: verification.json
Run verification automatically during swarm operations:
# Swarm with verification enabled
npx ruflo@alpha swarm --verify --threshold 0.98
# Hive Mind with auto-rollback
npx ruflo@alpha hive-mind --verify --rollback-on-fail
# Training pipeline with verification
npx ruflo@alpha train --verify --threshold 0.99
Enable real-time verification during collaborative development:
# Pair with verification
npx ruflo@alpha pair --verify --real-time
# Pair with custom threshold
npx ruflo@alpha pair --verify --threshold 0.97 --auto-fix
Monitor codebase continuously during development:
# Watch directory for changes
npx ruflo@alpha verify watch --directory src/
# Watch with auto-fix
npx ruflo@alpha verify watch --directory src/ --auto-fix
# Watch with notifications
npx ruflo@alpha verify watch --notify --threshold 0.95
Send metrics to external monitoring systems:
# Export to Prometheus
npx ruflo@alpha truth --format json | \
curl -X POST https://pushgateway.example.com/metrics/job/claude-flow \
-d @-
# Send to DataDog
npx ruflo@alpha verify report --format json | \
curl -X POST "https://api.datadoghq.com/api/v1/series?api_key=${DD_API_KEY}" \
-H "Content-Type: application/json" \
-d @-
# Custom webhook
npx ruflo@alpha truth --format json | \
curl -X POST https://metrics.example.com/api/truth \
-H "Content-Type: application/json" \
-d @-
Automatically verify before commits:
# Install pre-commit hook
npx ruflo@alpha verify install-hook --pre-commit
# .git/hooks/pre-commit example:
#!/bin/bash
npx ruflo@alpha verify check --threshold 0.95 --json > /tmp/verify.json
score=$(jq '.overallScore' /tmp/verify.json)
if (( $(echo "$score < 0.95" | bc -l) )); then
echo "❌ Verification failed with score: $score"
echo "Run 'npx ruflo@alpha verify check --verbose' for details"
exit 1
fi
echo "✅ Verification passed with score: $score"
Verification Speed:
Rollback Speed:
Dashboard Performance:
Low Truth Scores:
# Get detailed breakdown
npx ruflo@alpha truth --verbose --threshold 0.0
# Check specific criteria
npx ruflo@alpha verify check --verbose
# View agent-specific issues
npx ruflo@alpha truth --agent <agent-name> --format json
Rollback Failures:
# Check git status
git status
# View rollback history
npx ruflo@alpha verify rollback --history
# Manual rollback
git reset --hard HEAD~1
Verification Timeouts:
# Increase timeout
npx ruflo@alpha verify check --timeout 60s
# Verify in batches
npx ruflo@alpha verify batch --batch-size 10
Verification commands return standard exit codes:
0: Verification passed (score ≥ threshold)1: Verification failed (score < threshold)2: Error during verification (invalid input, system error)npx ruflo@alpha pair - Collaborative development with verificationnpx ruflo@alpha train - Training with verification feedbacknpx ruflo@alpha swarm - Multi-agent coordination with quality checksnpx ruflo@alpha report - Generate comprehensive project reports/docs/truth-scoring.md/docs/verification-criteria.md/examples/verification//docs/api/verification.mdComprehensive GitHub project management with swarm-coordinated issue tracking, project board automation, and sprint planning
Comprehensive GitHub code review with AI-powered swarm coordination
Multi-repository coordination, synchronization, and architecture management with AI swarm orchestration
Comprehensive GitHub release orchestration with AI swarm coordination for automated versioning, testing, deployment, and rollback management
Advanced GitHub Actions workflow automation with AI swarm coordination, intelligent CI/CD pipelines, and comprehensive repository management
Mean-variance portfolio optimization via Conjugate Gradient — 40-60× faster than the legacy Neumann path (ADR-126 Phase 3, ADR-123 Wedge 8)