name	evaluate-evidence
description	Evaluate engineering artifacts against skill markers and write evidence rows for Landmark.

Evaluate Evidence

Evaluate engineering artifacts against an agent-aligned engineering standard and write evidence rows that Landmark presents.

Trigger

The prompt must contain "evaluate" and one of:

Person scope: "for {email}" — GetUnscoredArtifacts({ email })
Team scope: "for direct reports of {email}" — GetUnscoredArtifacts({ manager_email: email })
Org scope: "for all" — GetUnscoredArtifacts({ org: true })

Procedure

Parse the scope from the prompt.
Call GetUnscoredArtifacts with the parsed scope. If the result is empty, report "no unscored artifacts" and exit.
For each artifact returned: a. Call GetArtifact with the artifact's id to get its detail. b. Call GetPerson with the artifact's author email to get their profile (discipline, level, track). c. Call GetMarkersForProfile with the profile to get the markers the engineer is expected to demonstrate. Each line is tab-separated: skill_id\tlevel_id\tmarker_text. d. Evaluate the artifact against each returned marker:
- Determine matched (boolean): does the artifact demonstrate this marker?
- Write a 1-3 sentence rationale explaining your reasoning.
- matched: false rows are valid — write them to document what was checked and not found. e. Call WriteEvidence once per marker with: artifact_id, skill_id, level_id, marker_text, matched, rationale. Call multiple markers in parallel for throughput.

Constraints

Every skill_id + marker_text pair must come verbatim from GetMarkersForProfile — never invent or paraphrase markers.
Every row must have non-null rationale and level_id.
Do not re-evaluate artifacts that already have evidence — they will not appear in GetUnscoredArtifacts.
Evaluation is idempotent: WriteEvidence upserts on (artifact_id, skill_id, level_id, marker_text).

Multi-source note

GetArtifact returns different structures per source type. Evaluate based on what the artifact contains:

Pull requests: title, description, diff context, review thread
Reviews: review body, comments, verdict
Commits: commit message, changed files

After all artifacts are evaluated, report the count of evidence rows written and exit.

Evaluate Evidence

Evaluate engineering artifacts against an agent-aligned engineering standard and write evidence rows that Landmark presents.

Trigger

The prompt must contain "evaluate" and one of:

Person scope: "for {email}" — GetUnscoredArtifacts({ email })

Team scope: "for direct reports of {email}" — GetUnscoredArtifacts({ manager_email: email })

Org scope: "for all" — GetUnscoredArtifacts({ org: true })

Procedure

Parse the scope from the prompt.

Call GetUnscoredArtifacts with the parsed scope. If the result is empty, report "no unscored artifacts" and exit.

For each artifact returned: a. Call GetArtifact with the artifact's id to get its detail. b. Call GetPerson with the artifact's author email to get their profile (discipline, level, track). c. Call GetMarkersForProfile with the profile to get the markers the engineer is expected to demonstrate. Each line is tab-separated: skill_id\tlevel_id\tmarker_text. d. Evaluate the artifact against each returned marker:

Determine matched (boolean): does the artifact demonstrate this marker?
Write a 1-3 sentence rationale explaining your reasoning.
matched: false rows are valid — write them to document what was checked and not found. e. Call WriteEvidence once per marker with: artifact_id, skill_id, level_id, marker_text, matched, rationale. Call multiple markers in parallel for throughput.

Constraints

Every skill_id + marker_text pair must come verbatim from GetMarkersForProfile — never invent or paraphrase markers.

Every row must have non-null rationale and level_id.

Do not re-evaluate artifacts that already have evidence — they will not appear in GetUnscoredArtifacts.

Evaluation is idempotent: WriteEvidence upserts on (artifact_id, skill_id, level_id, marker_text).

Multi-source note

GetArtifact returns different structures per source type. Evaluate based on what the artifact contains:

Pull requests: title, description, diff context, review thread

Reviews: review body, comments, verdict

Commits: commit message, changed files

After all artifacts are evaluated, report the count of evidence rows written and exit.

evaluate-evidence