원클릭으로 Manus에서 모든 스킬 실행

$pwd:

review-data-pr

Name: Review Data Pr
Author: owid

// Review an OWID ETL data update PR end-to-end — runs the pipeline, compares snapshot fields against the previous version, verifies links, audits indicator metadata coverage, and cross-checks workflow items from /update-dataset. Trigger when the user asks to "review this PR", "review the data PR", or invokes this on an open dataset-update branch.

Manus에서 실행

$ git log --oneline --stat

stars:148

forks:31

updated:2026년 5월 20일 15:33

SKILL.md

readonly

related-skills.json

같은 저장소

vibe-app.md

from "owid/etl"

Create a vibe app — a self-contained HTML page (data report, mini dashboard, prototype, …) — and publish it to the owid/vibe-webapps repo (served at vibe.owid.io). The scaffolder is geared toward data reports (text + embedded charts) but the same shape works for any internal webapp. Use when the user wants to turn an exploration, finding, analysis, or small interactive idea into a shareable internal page. (Renamed from `/data-report`.)

2026-05-29148

check-metadata-style.md

from "owid/etl"

Check grapher chart metadata (titles, subtitles, descriptions, display names) against OWID's Writing and Style Guide. Use when the user mentions the style guide, writing guide, chart copy quality, title/subtitle review, or after editing .meta.yml files under etl/steps/data/grapher/.

2026-05-27148

owid-metadata-generation.md

from "owid/etl"

Use when creating or enriching metadata for OWID ETL datasets - generates comprehensive YAML metadata from dataset inspection, data exploration, and web research following OWID metadata standards. Trigger when writing or editing *.meta.yml files, when a garden step has empty or minimal metadata, or when user asks to improve/add/enrich metadata.

2026-05-27148

map-explorer-to-mdim.md

from "owid/etl"

Suggest a redirect mapping from a (soon-to-sunset) OWID explorer's views to the views of one or more replacement MDIMs. Pulls explorer views and MDIM views from the grapher DB, writes a CSV per source/target plus a wide joint proposal that routes each explorer view to a target MDIM view, and flags when several explorer views land on the same MDIM view. Trigger when the user says "map explorer <slug> to mdim(s) <...>", "suggest explorer->MDIM redirects", "we're sunsetting the <slug> explorer, map its views to the new multidims", or similar.

2026-05-27148

data-updates-comms.md

from "owid/etl"

Draft answers for OWID's data-updates-comms Slack template using snapshot DVC + garden metadata + staging DB queries. Use when the user wants to announce a dataset update, fill the "Message about new data update" form, or generate the FAQ-style Slack post after an ETL update. Mechanical fields (producer, dates, coverage, chart count, search URL) are filled directly; editorial fields (why it matters, caveats, what's interesting about this update) get prompts seeded with extracted context for the user to refine.

2026-05-20148

report-indicator-changes.md

from "owid/etl"

Draft a short update message to a topic-owner reviewer after landing substantial dataset or chart changes on staging. The message lists the indicator changes with staging admin links, surfaces open design questions with option tables, and closes with a Chart Diff sign-off CTA. The output is markdown so the user can paste it into either Slack or a GitHub PR comment. Use after a dataset redesign or restructure when the iteration with the reviewer is back-and-forth and you need them to verify on staging before merge. Not for the canonical comms announcement — for that, see `data-updates-comms`.

2026-05-20148

package.json

"author": "owid"

"repository": "owid/etl"

GitHub 저장소 열기 Creator 저장소 보기

$ install --global

$ download --local

Manus에서 실행

$ useful --forSOC

소프트웨어 품질 보증 분석가·테스터컴퓨터 및 수학직15-1253L4

name	review-data-pr
description	Review an OWID ETL data update PR end-to-end — runs the pipeline, compares snapshot fields against the previous version, verifies links, audits indicator metadata coverage, and cross-checks workflow items from /update-dataset. Trigger when the user asks to "review this PR", "review the data PR", or invokes this on an open dataset-update branch.
metadata	{"internal":true}

Review Data PR

End-to-end review of a dataset-update PR. Goes deeper than /review: actually runs the steps, compares to the previous version, audits metadata coverage against a fixed checklist, and reports on /update-dataset workflow status (Slack draft, Codex review, indicator upgrade, downstream deps).

Inputs

Optional PR number. If omitted, derive it from the current branch via gh pr list --head <branch>.

Workflow

1. PR metadata

gh pr view <num> --json title,body,isDraft,mergeable,statusCheckRollup,comments,reviews

Flag if PR description is empty (per user's standing rule: keep PR body in sync with substantial changes).

2. Diff and changed files

gh pr view <num> --json files --jq '.files[] | "\(.additions)+ \(.deletions)- \(.path)"'

For very large diffs (>1MB) skip gh pr diff and read the changed files directly with Read.

3. Locate the new dataset

From the changed files, identify:

New snapshot path: snapshots/<namespace>/<new_version>/<short_name>.<ext>.dvc (a .py upload script is optional — .dvc + url_download or a local file path is enough)
New step files: etl/steps/data/{meadow,garden,grapher}/<namespace>/<new_version>/<short_name>.{py,meta.yml}
Old version (from dag/archive/*.yml or by grepping for the same <short_name>)

3b. Update shape — version bump vs restructure

Before running the pipeline, classify the PR. If any of the following are true, you're reviewing a restructure, not a version bump, and several downstream checks apply differently:

The short_name changed (old version uses one name, new version uses another).
The schema changed (wide ↔ long, different file format with a different column set, new dimensions).
The set of policies/indicators changed substantially (splits, dropped composites, newly added areas).
Score semantics changed (e.g. binary → continuous 0–1, units/scale changed).

When it's a restructure:

Don't expect the auto-Indicator-Upgrader to have remapped charts. When short_names differ entirely, the upgrader has nothing to match on. Look for a hand-curated v1 title → v2 title mapping table in the PR description (or a follow-up PR thread). 🟡 if charts on the old chain are still published but no mapping plan exists.
Don't expect a .py step copy from the old version. Step files should be authored from scratch, not produced by etl update rename. If the new step files look mechanically renamed (same logic, just version-bumped strings), flag 🟡 — the author may have skipped restructure-specific decisions.
Slack + /latest drafts can legitimately be empty in the first PR of a multi-PR restructure (chart remap pending). 🟢 if PR body explicitly defers them.

4. Run the full pipeline end-to-end

.venv/bin/etlr data://grapher/<namespace>/<new_version>/<short_name>
.venv/bin/etlr grapher://grapher/<namespace>/<new_version>/<short_name> --grapher --force --only

The --grapher upload is required to verify MySQL ingestion and to enable later checks (chart count, indicator upgrade verification). Confirm:

All four steps run cleanly (snapshot pulled from S3 if .dvc is committed, otherwise re-fetched)
MySQL upload returns a dataset id and shows variable upserts
No errors / no empty tables

5. Snapshot field comparison

Read both .dvc files (old and new) and produce a side-by-side table for these fields:

Field	Check
`title`	Reasonable update if scope changed
`description`	Updated to reflect new source / scope
`date_published`	Must differ from `date_accessed` — source from `url_main` or the file. If unsure, ask.
`date_accessed`	Updated to today (or run-date)
`producer` / `attribution_short`	Same source, same values (unless changed deliberately)
`url_main`	Status check — see step 6
`url_download`	Status check; OK to remove if data is now fetched via API
`license.url`	Status check

6. Verify all links

Run the HEAD-check loop from /update-dataset § 6c on every URL in the new .dvc and .meta.yml files. Anything non-2xx is a 🔴 blocker.

7. Code clarity & docs

For each step file, check:

Snapshot script (if present): docstring explains source choice; no hidden hardcoded year/date constants without --cli-flag parametrization (or at minimum a clear update comment). Note: a .py upload script is optional — many snapshots ship with only the .dvc and a url_download. Don't flag the absence of a script.
Meadow / garden / grapher: clear top-level docstrings; no commented-out code; no silent exception handlers
Garden: harmonization uses paths.regions.harmonize_names(tb, ...) (the new API), not the legacy geo.harmonize_countries
Garden assertions: sanity checks present and not overly brittle (e.g. avoid hard-coded "X must always exceed Y" if it's not a true invariant)
Grapher meta.yml: drop it if it only duplicates the garden values — the grapher step inherits via default_metadata=ds_garden.metadata

8. Outdated practices

Run the canonical detector. The source of truth is vscode_extensions/detect-outdated-practices/src/extension.ts. Highlights to grep manually if the extension isn't running:

if __name__ == "__main__": in snapshot files — outdated. Remove it; snapshots run via etls / etl snapshot.
geo.harmonize_countries(...) in step files — replaced by paths.regions.harmonize_names(...).
dest_dir argument, paths.load_dependency(...), np.where(...) (strips origins), index.map(...) — all flagged.

When in doubt, run the /check-outdated-practices skill on the new files.

8b. Carried-over annotations & sanity_checks (review side)

/update-dataset steps 1c+6a (annotations) and 1d+5b (sanity_checks) define the catalog/resolve procedure. As reviewer, verify the outcome:

Annotations: scan the diff for any # NOTE: / # TODO: / # FIXME: / # HACK: / # XXX: that are unchanged from the old version. For each, confirm the PR body mentions whether the workaround is still needed, or that it was deleted with its code. Unresolved + undocumented = 🟡.
Sanity-check log flags: grep the diff for SHOW_SANITY_CHECK_LOGS, DEBUG, LONG_FORMAT set to True. If a debug flag was left enabled, that's a 🔴 — must be reverted.
Silent deletes: in any sanity_checks function, scan for drop, filter, tb = tb[...] — row removals that the user might miss. Make sure the PR body lists them.

9. Indicator metadata coverage & dataset block

The mandatory-fields checklist, the dataset.update_period_days requirement, and the presentation.attribution_short non-inheritance gotcha all live in /update-dataset § 6c. As reviewer, build the indicator × field matrix from that checklist and flag any missing field as 🔴.

Quick verification that presentation.attribution_short actually landed on the produced indicators (origin's value does NOT propagate):

make query SQL="SELECT shortName, attributionShort FROM variables WHERE catalogPath LIKE '%<ns>/<v>/<short_name>%'"

Any NULL row is a 🔴.

Additional reviewer-side metadata checks:

processing_level: major must come with description_processing. Grep the new garden meta.yml for processing_level: major. Each occurrence (whether on definitions.common or per-indicator) requires a description_processing field on that same scope. 🟡 mismatch.
Per-indicator description_processing should describe the indicator's own derivation, not just point at a shared generic note. When every aggregate indicator's description_processing is the exact same string (e.g. all four region indicators just reference {definitions.description_regions_processing} with no per-indicator detail), 🟡 flag — author should compose per-indicator sentences.
Long-format-with-dimensions Jinja coverage. When variables are keyed by a long-column name (e.g. proportion) with <% if <dim> == "X" %>...<% endif %> blocks for title, description_short, display.name, verify every active (dim1, dim2) cell renders a non-empty value. Easiest check: read every column from the grapher dataset and assert metadata.title is non-empty.
paths.regions.add_population(tb) / paths.regions.add_aggregates(tb, regions=[...]) auto-resolve their DAG dependencies. If the garden step loads population (or income_groups) via paths.load_dataset(...) but never passes the dataset to anything, that's dead code — 🟡. The DAG dependency still needs to be declared either way.
WB income groups in regional aggregates. When the dataset is suitable for cross-country aggregation, check that the four WB income groups (High-income countries, Upper-middle-income countries, Lower-middle-income countries, Low-income countries) are in the REGIONS list, the income_groups DAG dep is declared, and description_regions_processing references the income groups article. 🟢 informational if absent — not all datasets need this, but it's worth surfacing.

10. Metadata quality skills

Run /check-metadata-typos, /check-metadata-spacing, /check-metadata-style against the new garden + grapher .meta.yml files. See /update-dataset § 6b for the full procedure (typos / spacing / style + a manual clarity checklist for general-audience readability — apply that checklist here too). Report findings as 🟡 (or 🔴 if a violation breaks rendering or makes the text outright misleading).

11. DAG checks

The archive-and-reorder procedure is in /update-dataset § "DAG archiving & reordering". As reviewer, verify the outcome:

rg "<namespace>/<old_version>/<short_name>" dag/ -g "*.yml" | grep -v "^dag/archive"   # should be empty
rg "<namespace>/<old_version>/<short_name>" dag/archive/ -g "*.yml"                    # should match
rg "<namespace>/<new_version>/<short_name>" dag/ -g "*.yml" | grep -v "^dag/archive"   # should be in old slot, not at bottom

Visual inspection of the diff for:

Comment headers (# Source — dataset name.) preserved above both archived and new entries
Indentation consistent ( # vs # is a frequent typo)
Trailing newline on the archive YAML
New entries placed in the old block's slot, not orphaned at the bottom (🟡 if at the bottom)

12. Downstream dependency check

Procedure in /update-dataset § "Downstream dependency check". One-liner:

rg "<namespace>/<old_version>/<short_name>" dag/ -g "*.yml" | grep -v "^dag/archive"

After excluding the dataset's own chain, any remaining hits are downstream consumers — flag 🟡 unless the PR body already documents them under a "Downstream dependencies" section.

13. /update-dataset workflow status

Verify the author completed each post-step item from /update-dataset. The procedures live there — here we just confirm the outcomes:

Item	Verify by
Indicator upgrade ran (§7)	`make query SQL="SELECT COUNT(*) FROM chart_dimensions cd JOIN variables v ON cd.variableId=v.id WHERE v.catalogPath LIKE '%<ns>/<new_v>/%'"` — non-zero
Chart-diff bot result	PR comments include `<!--chart-diff-start-->` block ✅
`@codex review` posted (§9)	`gh pr view <num> --json comments` shows the trigger comment + a Codex review
Codex threads resolved (§10)	`gh api graphql -f query='{ repository(owner:"owid", name:"etl") { pullRequest(number:<num>) { reviewThreads(first:20) { nodes { isResolved } } } } }'` — all `isResolved: true`

Out of scope for review: Slack announcement and Anomalist + Chart Diff hand-off are author-side concerns, not reviewer checks.

Producer-docs vs. data consistency. If the PR description notes a discrepancy between the producer's documentation (codebook, methodology page, README, release notes — whatever is available) and the actual file shipped, that's a 🟢 informational item — the author has surfaced it for producer follow-up. Don't ask them to "fix" the data to match the docs; the PR should preserve what the source shipped and flag the discrepancy.

14. Final report

Structure the review with:

Overview — one-paragraph summary of what the PR does
Pipeline test result — ✅/❌ for each step + grapher upload
Snapshot comparison table — old vs new
Indicator metadata table — fields × indicators, ✓/❌ matrix
🔴 Blockers — must-fix before merge
🟡 Suggestions — nice-to-have
🟢 Informational — observations, no action needed
Workflow gaps from /update-dataset — PR description, Slack draft, Codex review, etc.

Severity rubric

🔴 Blocker: missing mandatory metadata field, broken link, failing pipeline step, breaking change to chart data, missing update_period_days, missing presentation.attribution_short, outdated __main__ block in snapshot, DAG reference to old version that should be archived
🟡 Suggestion: brittle assertion, hardcoded year that should be dynamic, duplicated grapher meta.yml that could be removed, non-blocking style issues
🟢 Informational: things to be aware of but not action items

Notes

The /review skill is for general PR review — this skill is the dataset-specific superset.
If the user explicitly asks to skip the pipeline run (e.g. "don't run it, just look"), still do steps 1–3 and 5–13, but skip step 4 and note that pipeline correctness is unverified.
Always include the --grapher flag when running the grapher step end-to-end — without it, MySQL ingestion is not exercised and indicator metadata in the DB is not verified.

review-data-pr

이 저장소의 다른 Skills

이 저장소의 다른 Skills

Review Data PR

Inputs

Workflow

1. PR metadata

2. Diff and changed files

3. Locate the new dataset

3b. Update shape — version bump vs restructure

4. Run the full pipeline end-to-end

5. Snapshot field comparison

6. Verify all links

7. Code clarity & docs

8. Outdated practices

8b. Carried-over annotations & sanity_checks (review side)

9. Indicator metadata coverage & dataset block

10. Metadata quality skills

11. DAG checks

12. Downstream dependency check

13. /update-dataset workflow status

14. Final report

Severity rubric

Notes

Review Data PR

Inputs

Workflow

1. PR metadata

2. Diff and changed files

3. Locate the new dataset

3b. Update shape — version bump vs restructure

4. Run the full pipeline end-to-end

5. Snapshot field comparison

6. Verify all links

7. Code clarity & docs

8. Outdated practices

8b. Carried-over annotations & sanity_checks (review side)

9. Indicator metadata coverage & dataset block

10. Metadata quality skills

11. DAG checks

12. Downstream dependency check

13. /update-dataset workflow status

14. Final report

Severity rubric

Notes