earos-assess

Last updated: 23 March 2026 at 14:26

Run a full EAROS evaluation on an architecture artifact. Triggers when the user wants to assess, evaluate, score, or review an architecture document using the EAROS framework. Also triggers for "score this architecture", "evaluate this ADR", "run EAROS on this", "assess this capability map", "review this solution design", "is this architecture any good", "quality check this design", "grade this document", "what score would this get", or any request to evaluate, rate, or assess the quality of an architecture artifact.

Source

ThomasRohde

ThomasRohde/EAROS

More from this repository

earos-artifact-gen

ThomasRohde/EAROS

Create architecture documents through guided interview. Triggers on "create an architecture document", "generate a reference architecture", "help me write a solution architecture", "document my architecture", "new architecture document", or any request to create/write/generate architecture artifacts.

2026-04-11

earos-report

ThomasRohde/EAROS

Generate executive reports from EAROS evaluation records. Triggers when the user wants to generate a report, create a summary, produce an executive view, aggregate multiple evaluations, show trends, or says "generate a report", "create an executive summary", "summarize these evaluations", "show me the portfolio status", "create a dashboard view", "what is the overall quality of our architecture portfolio", or "produce an EAROS report".

2026-03-23

earos-calibrate

ThomasRohde/EAROS

Run EAROS calibration exercises to validate rubric reliability before production use. Use this skill whenever someone wants to calibrate a rubric, validate inter-rater reliability, compare scores against gold-standard artifacts, measure scoring consistency, or says "calibrate this rubric", "run calibration", "check if the rubric is reliable", "compare my scores to the gold set", "test this profile against examples", "is this rubric ready for production", "what is our kappa", "measure agreement between reviewers", "validate a new profile", or "how well does the rubric score consistently". Calibration is required before any new profile can move from draft to candidate status.

2026-03-22

earos-create

ThomasRohde/EAROS

Create a new EAROS rubric — core rubric, artifact profile, or cross-cutting overlay. Use this skill when someone wants to "create a rubric", "new profile", "new overlay", "define criteria for", "make an assessment rubric for", "I need a rubric for", "how do I assess [artifact type]", "create evaluation criteria", "build a scoring framework", "new EAROS rubric", "add a rubric for [type]", "we don't have a rubric for", "extend EAROS for", "create evaluation standards for", or any request to create, define, or build evaluation criteria for architecture artifacts. Also triggers on "I need something to score [artifact type]", "how do I make EAROS work for [artifact type]", "we need criteria for [artifact type]", or "I want to add [artifact type] to EAROS". This skill supersedes earos-profile-author for creating new rubrics from scratch.

2026-03-22

earos-profile-author

ThomasRohde/EAROS

Technical YAML authoring guide for EAROS profiles and overlays. Use this skill when someone has already completed rubric design (criteria defined, design method chosen) and needs help with the YAML structure, v2 field requirements, or schema compliance. Also triggers when someone asks "what are the 5 design methods", "how do I write a criterion", "what fields does a v2 criterion need", or "how do I structure overlay YAML". NOTE: For creating new rubrics from scratch — where the criteria are not yet defined — use earos-create instead. This skill focuses on the technical details of profile YAML authoring after rubric design is complete.

2026-03-22

earos-remediate

ThomasRohde/EAROS

Generate a prioritized improvement plan from an EAROS evaluation. Triggers on "how do I fix this", "improve this artifact", "remediation plan", "how to pass EAROS", "fix the assessment", "improvement plan", "what's wrong with my architecture", "how to get a better score", or any request to improve an artifact based on evaluation results.

2026-03-20

name: earos-assess

EAROS Assessment Skill

You are running a governed architecture quality evaluation. The output must be auditable — every score needs a cited evidence anchor from the artifact, not an impression. The most common failure mode in architecture assessment (human and AI alike) is scoring from vibes rather than evidence. This skill prevents that.

Before anything else: Read references/scoring-protocol.md to understand the RULERS evidence-anchoring protocol. Do this before Step 2.

Step 0 — Load the Rubric Files

Read these files before scoring anything. The rubric files contain the scoring_guide and decision_tree fields that define what each score level means. Do not score from memory — read the rubric.

Start with the manifest. Read earos.manifest.yaml (at the repo root) first — it is the authoritative registry of all available profiles and overlays. Use the path values listed in the manifest to find the files. Do not hardcode paths.

Always load:

core/core-meta-rubric.yaml — 9 dimensions, 10 criteria, applies to every artifact

Load the matching profile (if one exists): Check earos.manifest.yaml → profiles section for available profiles and their artifact_type. Select the entry whose artifact_type matches the artifact being assessed. If no match exists, use core only.

Common matches:

Solution architecture → profiles/solution-architecture.yaml
Reference architecture → profiles/reference-architecture.yaml
Architecture Decision Record → profiles/adr.yaml
Capability map → profiles/capability-map.yaml
Roadmap → profiles/roadmap.yaml
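
The selection rule above can be sketched in Python. The manifest is shown as an already-parsed dictionary. Its layout (a `profiles` list whose entries carry `artifact_type` and `path`) and the `artifact_type` values are assumptions for illustration, so check them against the real earos.manifest.yaml before relying on this.

```python
# Hypothetical parsed form of earos.manifest.yaml; the real layout may differ.
manifest = {
    "profiles": [
        {"artifact_type": "solution_architecture",
         "path": "profiles/solution-architecture.yaml"},
        {"artifact_type": "adr", "path": "profiles/adr.yaml"},
    ],
}

# The core rubric is always loaded, whatever the artifact type.
CORE_RUBRIC = "core/core-meta-rubric.yaml"


def select_rubric_files(manifest: dict, artifact_type: str) -> list[str]:
    """Return the core rubric plus the matching profile path, if one exists."""
    files = [CORE_RUBRIC]
    for entry in manifest.get("profiles", []):
        if entry.get("artifact_type") == artifact_type:
            files.append(entry["path"])  # path comes from the manifest
            break
    return files


print(select_rubric_files(manifest, "adr"))      # core + ADR profile
print(select_rubric_files(manifest, "roadmap"))  # no match here: core only
```

The profile path is read from the manifest entry rather than built from the artifact type, which keeps the lookup in line with the "do not hardcode paths" rule.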

Ask the user which overlays apply (if not specified): Check earos.manifest.yaml → overlays section for all available overlays.

overlays/security.yaml — apply when the artifact touches auth, authorization, personal data, or external integrations
overlays/data-governance.yaml — apply when the artifact describes data flows, retention, or classification
overlays/regulatory.yaml — apply when the artifact is in a regulated domain (payments, healthcare, financial reporting)

The 8-Step Evaluation DAG

The evaluation follows a directed acyclic graph. Steps must run in order — you cannot aggregate before scoring, cannot determine status before checking gates.

structural_validation → content_extraction → criterion_scoring
  → cross_reference_validation → dimension_aggregation
    → challenge_pass → calibration → status_determination
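
As a sketch of the ordering constraint, the chain above can be expressed as a dependency graph and sorted with Python's standard `graphlib` module. The step names are taken from the diagram; modelling each step as depending only on its immediate predecessor is an assumption.

```python
from graphlib import TopologicalSorter

STEPS = [
    "structural_validation",
    "content_extraction",
    "criterion_scoring",
    "cross_reference_validation",
    "dimension_aggregation",
    "challenge_pass",
    "calibration",
    "status_determination",
]

# Map each step to the set of steps it depends on (here: its predecessor).
dependencies = {step: {prev} for prev, step in zip(STEPS, STEPS[1:])}

# static_order() yields every step only after all of its dependencies.
execution_order = list(TopologicalSorter(dependencies).static_order())
print(execution_order)
```

Because the graph is a single chain, the topological order is unique and matches the step numbering used in the rest of this skill.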

Step 1 — Structural Validation

Binary gate: is the artifact reviewable at all?

Check whether these five elements are present:

Title and version identifier
Named owner or author
Purpose or scope statement
Diagrams or structural representations
Stakeholder or audience section

If 3 or more are absent: stop. Flag Not Reviewable. Explain exactly which elements are missing and what must be added before assessment can proceed. Do not assign criterion scores for an un-reviewable artifact.
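
A minimal sketch of this gate, assuming the reviewer has already decided which of the five elements are present. The element identifiers are hypothetical labels, not rubric field names.

```python
# Hypothetical identifiers for the five structural elements listed above.
REQUIRED_ELEMENTS = (
    "title_and_version",
    "owner_or_author",
    "purpose_or_scope",
    "diagrams",
    "stakeholders_or_audience",
)


def structural_validation(present: set[str]) -> dict:
    """Flag the artifact as not reviewable when 3 or more elements are absent."""
    missing = [e for e in REQUIRED_ELEMENTS if e not in present]
    return {"reviewable": len(missing) < 3, "missing": missing}


result = structural_validation({"title_and_version", "diagrams"})
print(result["reviewable"], result["missing"])
```

Returning the list of missing elements alongside the verdict supports the requirement to explain exactly what must be added.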

Step 2 — Content Extraction (RULERS Protocol)

Read references/scoring-protocol.md before this step. It contains the full RULERS protocol, evidence classification rules, and examples of correct vs. incorrect evidence extraction.

For every criterion in the loaded rubric files:

Read the criterion's required_evidence list
Search the artifact for direct quotes, references, or sections that address it
Record an evidence_anchor (section heading, page, diagram label) and an excerpt (direct quote or close paraphrase)
Classify: observed (directly stated) / inferred (reasonable interpretation) / external (judgment based on a standard outside the artifact)

If you cannot find evidence → record evidence_class: none. The absence of evidence is data. Never score from impression.
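
One way to make "absence of evidence is data" concrete is to give every criterion an evidence record, even when nothing was found. The sketch below is illustrative only: the `Evidence` class and `record_evidence` helper are hypothetical names, and only the field names and the four class values come from this skill.

```python
from dataclasses import dataclass
from typing import Optional

EVIDENCE_CLASSES = {"observed", "inferred", "external", "none"}


@dataclass
class Evidence:
    criterion_id: str
    evidence_class: str
    evidence_anchor: Optional[str] = None  # section heading, page, diagram label
    excerpt: Optional[str] = None          # direct quote or close paraphrase

    def __post_init__(self) -> None:
        if self.evidence_class not in EVIDENCE_CLASSES:
            raise ValueError(f"unknown evidence_class: {self.evidence_class!r}")


def record_evidence(criterion_id: str,
                    found: Optional[tuple[str, str, str]]) -> Evidence:
    """`found` is (evidence_class, anchor, excerpt), or None if the search was empty."""
    if found is None:
        # Nothing in the artifact addresses the criterion: record that explicitly.
        return Evidence(criterion_id, "none")
    evidence_class, anchor, excerpt = found
    return Evidence(criterion_id, evidence_class, anchor, excerpt)


print(record_evidence("CON-01", None))
```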

Step 3 — Criterion Scoring

For each criterion:

Use the scoring_guide level descriptors from the rubric YAML — these are the authoritative definitions of each score level
Use the decision_tree field to resolve ambiguous cases (it translates the scoring guide into observable conditions)
Score 0–4. Use N/A only when the criterion genuinely cannot apply, with a written justification
Report confidence (high / medium / low) separately — it does not change the numerical score
If you find contradicting evidence after assigning a score, revise the score down

Minimum output per criterion:

criterion_id: [ID]
criterion_question: "[full question text from the rubric]"
score: [0-4 or N/A]
evidence_class: [observed/inferred/external/none]
confidence: [high/medium/low]
evidence_anchor: "[section or location in artifact]"
excerpt: "[direct quote or close paraphrase]"
rationale: "[1-3 sentences citing the evidence]"

If scoring feels ambiguous, see references/scoring-protocol.md for worked examples of good and bad scoring, and how to use decision_tree fields.
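
The minimum output and the scoring rules can be checked mechanically before moving on. This is a sketch under two assumptions: scores are integers, and the N/A justification is carried in `rationale` (the skill does not name a dedicated field for it).

```python
REQUIRED_KEYS = (
    "criterion_id", "criterion_question", "score", "evidence_class",
    "confidence", "evidence_anchor", "excerpt", "rationale",
)


def validate_criterion_result(result: dict) -> list[str]:
    """Return a list of problems; an empty list means the record is well formed."""
    problems = [f"missing field: {k}" for k in REQUIRED_KEYS if k not in result]

    score = result.get("score")
    if score == "N/A":
        # Assumption: the written justification for N/A lives in `rationale`.
        if not str(result.get("rationale", "")).strip():
            problems.append("N/A score needs a written justification")
    elif not (isinstance(score, int) and not isinstance(score, bool)
              and 0 <= score <= 4):
        problems.append("score must be an integer 0-4 or 'N/A'")

    if result.get("evidence_class") not in {"observed", "inferred", "external", "none"}:
        problems.append("invalid evidence_class")
    if result.get("confidence") not in {"high", "medium", "low"}:
        problems.append("invalid confidence")
    return problems


example = {
    "criterion_id": "CON-01",
    "criterion_question": "[full question text from the rubric]",
    "score": 3,
    "evidence_class": "observed",
    "confidence": "medium",
    "evidence_anchor": "Section 4, component diagram",
    "excerpt": "placeholder excerpt",
    "rationale": "placeholder rationale citing the excerpt",
}
print(validate_criterion_result(example))
```

Confidence is validated as its own field and never feeds back into the score check, mirroring the rule that confidence does not change the numerical score.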

Step 4 — Cross-Reference Validation

Check for internal consistency issues that affect scores:

Do component names match across all diagrams and sections?
Do interface definitions agree between API specs and sequence diagrams?
Is the scope boundary consistent across all views?
Do narrative claims match the diagrams?

Inconsistencies reduce scores on CON-01 (internal consistency) in the core rubric. Note specific mismatches as evidence.

For cross-reference patterns, see references/scoring-protocol.md#cross-reference-validation.
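
The first check (component names matching across views) lends itself to a simple set comparison. The sketch below assumes component names have already been extracted per view; the view and component names are made up. A view may legitimately show only a subset of components, so treat the output as candidates for a reviewer to confirm, not as automatic deductions.

```python
def component_mismatches(views: dict[str, set[str]]) -> dict[str, set[str]]:
    """For each component not named in every view, list the views that omit it."""
    all_components = set().union(*views.values())
    mismatches = {}
    for component in sorted(all_components):
        omitted_from = {view for view, names in views.items()
                        if component not in names}
        if omitted_from:
            mismatches[component] = omitted_from
    return mismatches


# Hypothetical extraction result: the same service is named two different ways.
views = {
    "context_diagram": {"API Gateway", "Order Service"},
    "sequence_diagram": {"API Gateway", "OrderSvc"},
}
print(component_mismatches(views))
```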

Step 5 — Dimension Aggregation

For each dimension:

Average criterion scores — exclude N/A criteria from the denominator (they don't count for or against)
Apply the dimension weight from the rubric YAML
Report the weighted dimension score

A dimension score of 0.0 is not neutralized by a dimension score of 4.0. Report each dimension separately.
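
A sketch of the aggregation arithmetic. How the weight is applied is defined by the rubric YAML; multiplying the mean by the weight is an assumption made here for illustration.

```python
from typing import Optional


def dimension_mean(scores: list) -> Optional[float]:
    """Average the numeric scores, leaving N/A out of the denominator."""
    numeric = [s for s in scores if s != "N/A"]
    if not numeric:
        return None  # every criterion was N/A: the dimension has no score
    return sum(numeric) / len(numeric)


def weighted_dimension_score(scores: list, weight: float) -> Optional[float]:
    """Assumption: the rubric weight is applied as a simple multiplier."""
    mean = dimension_mean(scores)
    return None if mean is None else mean * weight


print(dimension_mean([4, 2, "N/A"]))           # N/A is ignored, not counted as 0
print(weighted_dimension_score([3, 3], 1.5))
```

Returning `None` for an all-N/A dimension, rather than 0.0, avoids an unscored dimension being mistaken for a failing one.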

Step 6 — Challenge Pass

Before finalizing, challenge your three highest and three lowest scores:

"What interpretation of the evidence would justify a higher score?"
"What interpretation would justify a lower score?"
"Am I labelling as observed something that is actually inferred?"

Revise scores where the challenge reveals weak reasoning. Flag revised scores with revised: true.

The purpose of this step is to catch over-scoring (the most common agent failure) and under-scoring (harsh treatment of well-evidenced but incomplete artifacts).

For detailed challenge methodology, see references/scoring-protocol.md#challenge-pass.
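
Choosing which scores to challenge can be sketched as a sort over the scored criteria. The criterion IDs below are placeholders, and skipping N/A criteria is an assumption.

```python
def challenge_candidates(results: list[dict], n: int = 3) -> dict[str, list[str]]:
    """Return the IDs of the n lowest- and n highest-scored criteria."""
    scored = [r for r in results if r["score"] != "N/A"]
    ranked = sorted(scored, key=lambda r: r["score"])  # ascending, stable on ties
    return {
        "lowest": [r["criterion_id"] for r in ranked[:n]],
        "highest": [r["criterion_id"] for r in reversed(ranked[-n:])],
    }


# Placeholder IDs and scores, for illustration only.
results = [
    {"criterion_id": "C1", "score": 4},
    {"criterion_id": "C2", "score": 0},
    {"criterion_id": "C3", "score": 2},
    {"criterion_id": "C4", "score": 4},
    {"criterion_id": "C5", "score": 1},
    {"criterion_id": "C6", "score": 3},
    {"criterion_id": "C7", "score": 0},
    {"criterion_id": "C8", "score": "N/A"},
]
print(challenge_candidates(results))
```

When several criteria tie at the boundary, the sort keeps input order, so which tied criterion gets challenged is arbitrary; with fewer than six scored criteria the two lists overlap.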

Step 7 — Calibration Check

Read references/calibration-benchmarks.md before this step to sanity-check your score distribution.

Quick self-checks:

An overall score > 3.5 should be genuinely exceptional — evidence-rich, decision-ready. If you're scoring above 3.5, confirm it's warranted.
An overall score < 2.0 indicates a seriously deficient, near-unusable artifact. Confirm this is warranted before finalizing.
Flag any criterion where confidence: low — these warrant independent human review.
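
The three quick self-checks can be sketched as a function that returns prompts for the evaluator. It only raises flags; it does not change any score, and it is no substitute for reading references/calibration-benchmarks.md.

```python
def calibration_flags(overall: float, results: list[dict]) -> list[str]:
    """Collect sanity-check prompts from the overall score and criterion records."""
    flags = []
    if overall > 3.5:
        flags.append("overall > 3.5: confirm the artifact is genuinely exceptional")
    if overall < 2.0:
        flags.append("overall < 2.0: confirm the artifact is near-unusable")
    for r in results:
        if r.get("confidence") == "low":
            flags.append(f"{r['criterion_id']}: low confidence, needs human review")
    return flags


for flag in calibration_flags(3.7, [{"criterion_id": "CON-01", "confidence": "low"}]):
    print(flag)
```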

Step 8 — Status Determination

Gates first — check gate criteria before computing any weighted average. A single critical gate failure blocks a passing status, no matter how high the average is. The specific outcome (reject or not_reviewable) is determined by the criterion's failure_effect.

| Gate type | Effect |
| --- | --- |
| `critical` failure | Status = `reject` or `not_reviewable` (per `failure_effect`) regardless of average |
| `major` failure | Status cannot exceed `conditional_pass` |

Then compute the weighted overall average and apply thresholds:

| Status | Threshold |
| --- | --- |
| Pass | No critical gate failure + overall ≥ 3.2 + no dimension < 2.0 |
| Conditional Pass | No critical gate failure + overall 2.4–3.19 |
| Rework Required | Overall < 2.4 or repeated weak dimensions |
| Reject | Any critical gate failure |
| Not Reviewable | Evidence too incomplete to score gate criteria |
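
The gate-then-threshold order can be sketched as follows. Several points are assumptions, not rules from this skill: the status strings `pass` and `rework_required` are formed by analogy with `reject`, `not_reviewable` and `conditional_pass`; an overall score of 3.2 or more with a dimension below 2.0 is treated as a conditional pass; and "repeated weak dimensions" is left out because the skill does not define it.

```python
from typing import Optional


def determine_status(overall: float,
                     dimension_scores: dict[str, float],
                     critical_failure_effect: Optional[str] = None,
                     major_gate_failed: bool = False) -> str:
    """Apply gates first, then the overall-score thresholds."""
    # Gates override averages: a critical failure decides the status outright.
    if critical_failure_effect is not None:
        if critical_failure_effect not in {"reject", "not_reviewable"}:
            raise ValueError(f"unknown failure_effect: {critical_failure_effect!r}")
        return critical_failure_effect

    if overall < 2.4:
        return "rework_required"

    meets_pass_bar = (overall >= 3.2
                      and all(s >= 2.0 for s in dimension_scores.values()))
    # A major gate failure caps the status at conditional_pass.
    if meets_pass_bar and not major_gate_failed:
        return "pass"
    return "conditional_pass"


print(determine_status(3.6, {"dim_a": 3.5, "dim_b": 3.8}))
print(determine_status(3.6, {"dim_a": 3.5}, critical_failure_effect="reject"))
```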

Output

Produce two outputs: a YAML evaluation record and a markdown report.

Read references/output-templates.md for full templates with field-by-field explanations before producing output. Mirror the structure of examples/example-solution-architecture.evaluation.yaml.

YAML evaluation record key fields: evaluation_id, rubric_id, artifact_ref, evaluation_date, status, overall_score, gate_failures, criterion_results, dimension_scores, narrative_summary, recommended_actions

Markdown report key sections: Traffic-light status, overall score, dimension table, gate failures, key findings (3–5 bullets), top 5 prioritized recommended actions with criterion references.

Non-Negotiable Rules

Evidence first. Every score requires a cited excerpt or reference. "The artifact seems to address this" is not evidence.
Gates override averages. One critical gate failure blocks a passing status regardless of the overall score; the outcome is Reject or Not Reviewable, per the criterion's failure_effect.
Confidence ≠ score. Low confidence lowers the weight a human reviewer places on your output. It does not lower the numerical score.
N/A requires justification. One sentence explaining why the criterion genuinely cannot apply.
Do not modify the rubric during evaluation. It is locked. Changes require a version bump.
Never collapse the three evaluation types — artifact quality, architectural fitness, and governance fit are distinct judgments. Keep them separate in the narrative.

When to Read Which Reference File

| When | Read |
| --- | --- |
| Before Step 2 (always) | `references/scoring-protocol.md` |
| When scoring is ambiguous | `references/scoring-protocol.md` |
| Before the challenge pass (Step 6) | `references/scoring-protocol.md` (section: Challenge Pass) |
| Before Step 7 (calibration check) | `references/calibration-benchmarks.md` |
| Before producing output | `references/output-templates.md` |
| Unsure what a score distribution should look like | `references/calibration-benchmarks.md` |