Name: 10x Run Eval
Author: przeprogramowani

10x-run-eval

// Evaluate Przeprogramowani website implementations against benchmark criteria. Analyzes tech stack, pages, content accuracy, SEO, and responsiveness. Use when evaluating LLM-generated website attempts in the Przeprogramowani benchmark repository. IMPORTANT: Do not use this skill during the task of creating the website. Use it only to evaluate the website based on a direct request from the user.

name	10x-run-eval
description	Evaluate Przeprogramowani website implementations against benchmark criteria. Analyzes tech stack, pages, content accuracy, SEO, and responsiveness. Use when evaluating LLM-generated website attempts in the Przeprogramowani benchmark repository. IMPORTANT: Do not use this skill during the task of creating the website. Use it only to evaluate the website based on a direct request from the user.

10x Run Evaluation

Evaluate a Przeprogramowani website implementation against benchmark criteria.

10xBench Structure

10x-bench (this repository) - contains the implementation to evaluate
10x-bench-eval (companion repository) - contains the evaluation criteria and scoring methodology

What this skill does

Systematically evaluates website implementations by:

Reading benchmark criteria from 10x-bench-eval/benchmark/criteria.md
Setting up the implementation (npm install, npm run build, npm run dev)
Testing against all evaluation criteria and asking user for feedback where needed
Generating structured results in 10x-bench/eval-results/{model-name}-attempt-{number}/eval-results.csv

How to use

Invoke with the directory path to evaluate:

/10x-run-eval /path/to/implementation

Or provide the path when prompted if not specified.

Output

Generates eval-results.csv in ./eval-results/{model-name}-attempt-{number}/eval-results.csv directory

See 10x-bench-eval/benchmark/eval.md for complete evaluation guidelines.

name	10x-run-eval
description	Evaluate Przeprogramowani website implementations against benchmark criteria. Analyzes tech stack, pages, content accuracy, SEO, and responsiveness. Use when evaluating LLM-generated website attempts in the Przeprogramowani benchmark repository. IMPORTANT: Do not use this skill during the task of creating the website. Use it only to evaluate the website based on a direct request from the user.