
disclosure-check

Stars: 1,301

Forks: 2,614

Last updated: 9 June 2026 at 15:43

Pre-screen analysis outputs (tables, figures, logs) built on restricted or confidential data for statistical-disclosure-limitation problems before any release. Scans for small cell counts, complementary-suppression gaps, dominance (p-percent / (n,k)), re-identifiable exact counts, PII leakage, and unrounded sensitive statistics; classifies each finding CRITICAL / WARNING / OK and gates on any CRITICAL. Use before depositing or sharing restricted-data results, or when the user says "disclosure check", "SDL scan", "is this output safe to release", "check for small cells", "disclosure avoidance", "pre-screen for the RDC", or "can I export this from the enclave".


Source

pedrohcgs

pedrohcgs/claude-code-my-workflow


SKILL.md


name	disclosure-check
description	Pre-screen analysis outputs (tables, figures, logs) built on restricted or confidential data for statistical-disclosure-limitation problems before any release. Scans for small cell counts, complementary-suppression gaps, dominance (p-percent / (n,k)), re-identifiable exact counts, PII leakage, and unrounded sensitive statistics; classifies each finding CRITICAL / WARNING / OK and gates on any CRITICAL. Use before depositing or sharing restricted-data results, or when the user says "disclosure check", "SDL scan", "is this output safe to release", "check for small cells", "disclosure avoidance", "pre-screen for the RDC", or "can I export this from the enclave".
argument-hint	[outputs-dir] [--provider census|irs|irb|generic] [--threshold N] (outputs-dir defaults to scripts/R/_outputs/)
disable-model-invocation	true
allowed-tools	["Read","Grep","Glob","Write","Bash"]
effort	high

`/disclosure-check` — Statistical-Disclosure-Limitation pre-screen

Scan analysis outputs built on restricted or confidential data (Census FSRDC, IRS SOI, administrative registers, linked health records, proprietary firm panels) for the disclosure-avoidance problems that get an export request rejected — before it reaches the data provider's official disclosure review. The skill is a pre-screen, not a substitute for that review.

Core principle: A single un-suppressed n=3 cell, an exact count that pins down one firm, or a p-percent dominance failure can re-identify a person or establishment. Catch it on your machine, not in the rejection email from the RDC analyst.

When to use

Before requesting an export from a Census FSRDC / secure data enclave / RDC.
Before depositing restricted-data results to openICPSR, a journal, or a co-author outside the enclave.
Before sharing any figure, table, or log derived from confidential microdata.
As a release gate. Pair with a pre-commit / pre-deposit invocation so no restricted-data output ships un-screened. This is the foundation of the data-management plan for any restricted-data project.

Inputs

$0 — outputs directory to scan. Defaults to scripts/R/_outputs/. Recognised siblings: scripts/stata/_outputs/, scripts/python/_outputs/, or any export-staging directory (e.g., a to_review/ folder the analyst stages for the RDC).
--provider — selects which disclosure-rule profile to load (Phase 0). One of census / irs / irb / generic. Providers differ — thresholds and rules are not interchangeable; default generic is deliberately conservative.
--threshold N — override the minimum cell count (default n<10). Census FSRDC commonly uses 10 for establishments; IRS and many IRBs differ. Always reconcile with your provider's written rules.

Workflow

Phase 0: Load the provider's disclosure rules

Read .claude/rules/confidential-data.md for the project's restricted-data handling contract and the rule-profile placeholder.
Load the --provider profile (a placeholder config the forker fills in from their signed agreement — Census, IRS, and IRB rules differ and supersede any default here):
- min cell count (default n<10),
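- dominance rules: p-percent (a cell is unsafe if its contributions, excluding the two largest, total less than p% of the largest contribution, so the runner-up could estimate the largest contributor to within p%) and (n,k) (a cell is unsafe if its top n units contribute more than k% of the total),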
- rounding required for sensitive statistics (counts, totals, ratios),
- top-coding / bottom-coding thresholds for extreme values,
- geographic minimum population for any geocoded statistic.
If no signed-rule values are recorded, fall back to the conservative generic profile and flag prominently in the report that real provider thresholds must be substituted.
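The fallback logic in Phase 0 could be sketched roughly as follows. This is a minimal illustration, not the skill's actual implementation: `RuleProfile`, `load_profile`, and `unset_rules` are hypothetical names, and the only value taken from the text above is the default minimum cell count of 10. Every other rule is left unset so that nothing is invented on the provider's behalf.

```python
from dataclasses import dataclass, fields, replace
from typing import Dict, List, Optional


@dataclass(frozen=True)
class RuleProfile:
    """Hypothetical container for the rule values listed in Phase 0."""
    provider: str
    min_cell: int = 10                    # counts with 0 < n < min_cell are flagged
    p_percent: Optional[float] = None     # p in the p-percent rule
    nk_n: Optional[int] = None            # n in the (n,k) rule
    nk_k: Optional[float] = None          # k (a percentage) in the (n,k) rule
    rounding_base: Optional[int] = None   # base for rounding sensitive statistics
    is_fallback: bool = False             # True when no signed-rule values were found


# Conservative stand-in used when the agreement's values are not recorded.
GENERIC = RuleProfile(provider="generic", is_fallback=True)


def load_profile(provider: str,
                 recorded: Dict[str, RuleProfile],
                 threshold: Optional[int] = None) -> RuleProfile:
    """Return the recorded profile, or the generic fallback if none exists."""
    profile = recorded.get(provider, GENERIC)
    if threshold is not None:             # --threshold N overrides the minimum cell count
        profile = replace(profile, min_cell=threshold)
    return profile


def unset_rules(profile: RuleProfile) -> List[str]:
    """Names of rule values still missing, for the prominent report warning."""
    return [f.name for f in fields(profile) if getattr(profile, f.name) is None]
```

A report writer could check `profile.is_fallback` and list `unset_rules(profile)` at the top of the report, so a reader sees immediately which thresholds still have to come from the signed agreement.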

Phase 1: Scan the outputs directory

Glob the outputs dir for .tex, .csv, .txt, .log, .smcl, .out, .md tables and figure-data files. For each:

Cell counts — parse table cells / frequency columns; flag any count 0 < n < threshold that is not already suppressed.
Complementary-suppression gaps — if one cell in a row/column is suppressed but the margin total and the other cells let a reader back it out by subtraction, the suppression is incomplete.
Dominance — for any total/mean cell where unit-level contributions are available (or inferable), apply the p-percent and (n,k) rules.
Exact re-identifying counts — small exact integers (e.g., "4 hospitals", "1 firm", a max/min that is a single observation) that single out a unit.
PII leakage — regex for names, SSNs (\d{3}-\d{2}-\d{4}), exact dates of birth, addresses, exact lat/long or fine geocodes, record IDs that survived into an output.
Unrounded sensitive statistics — exact unrounded counts/totals where the provider requires rounding.
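Three of the checks above are simple enough to sketch in a few lines. The functions below are illustrative only and all names are hypothetical. The dominance functions assume non-negative unit-level contributions and use common textbook formulations of the (n,k) and p-percent rules; the provider's written definitions take precedence. The SSN pattern is the one quoted in the list, with word boundaries added as an assumption to cut false matches inside longer digit strings.

```python
import re
from typing import List, Sequence

# SSN pattern from the checklist; the \b anchors are an added assumption.
SSN_RE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def small_cells(counts: Sequence[int], threshold: int = 10) -> List[int]:
    """Indices of counts with 0 < n < threshold (zeros are not flagged)."""
    return [i for i, n in enumerate(counts) if 0 < n < threshold]


def fails_nk(contributions: Sequence[float], n: int, k: float) -> bool:
    """(n,k) rule: unsafe if the top n units exceed k% of the cell total."""
    total = sum(contributions)
    if total <= 0:
        return False
    top_n = sum(sorted(contributions, reverse=True)[:n])
    return top_n > (k / 100) * total


def fails_p_percent(contributions: Sequence[float], p: float) -> bool:
    """p-percent rule: unsafe if everything except the two largest
    contributions sums to less than p% of the largest contribution."""
    xs = sorted(contributions, reverse=True)
    if not xs or xs[0] <= 0:
        return False
    second = xs[1] if len(xs) > 1 else 0
    remainder = sum(xs) - xs[0] - second
    return remainder < (p / 100) * xs[0]


def find_ssns(text: str) -> List[str]:
    """All SSN-shaped strings in a block of output text."""
    return SSN_RE.findall(text)
```

For instance, `small_cells([25, 3, 0, 10, 9])` flags positions 1 and 4, and a cell with contributions `[900, 60, 25, 15]` fails both dominance checks at the illustrative settings `n=2, k=85` and `p=10`. Those parameter values are examples, not any provider's thresholds.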

Phase 2: Classify each finding — CRITICAL / WARNING / OK

| Disposition | Meaning | Examples |
|---|---|---|
| CRITICAL | Would fail the provider's disclosure review; blocks release. | `n=3` cell un-suppressed; complementary-suppression hole; `p`-percent dominance failure; any PII; an exact count identifying ≤2 units. |
| WARNING | Plausibly safe but needs a human judgment call. | Cell at exactly the threshold; unrounded total just over a rounding base; geographic statistic near the min-population floor. |
| OK | Within the loaded rules, no action needed. | Counts ≥ threshold and rounded; dominance passes; no PII. |

When two findings interact (a suppressed cell + a recoverable margin), report them together — the gate cares about the joint disclosure risk, not each cell in isolation. Be economics-aware: DiD / event-study cell counts per (cohort × period), IV first-stage subsamples, RCT arm × stratum balance tables, and panel firm-counts are the usual offenders.
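The "suppressed cell + recoverable margin" interaction can be shown with a small sketch. It assumes a single table row whose suppressed cells are represented as `None`; the function name is hypothetical. It only covers the simplest case, one suppressed cell in a row with a published total. Back-out through column margins or across several tables needs a fuller check.

```python
from typing import List, Optional


def recoverable_by_subtraction(cells: List[Optional[int]],
                               margin_total: Optional[int]) -> Optional[int]:
    """Return the hidden value if exactly one cell is suppressed (None)
    and the published margin total lets a reader back it out; else None."""
    suppressed = [i for i, c in enumerate(cells) if c is None]
    if margin_total is None or len(suppressed) != 1:
        return None
    visible_sum = sum(c for c in cells if c is not None)
    return margin_total - visible_sum
```

With `cells = [40, None, 25]` and a published total of 68, the function returns 3: the suppression hides nothing, so the suppressed cell and the margin belong in one joint CRITICAL finding rather than two separate ones.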

Phase 3: Suggest remediation

For each CRITICAL / WARNING, propose the standard SDL fix, in order of preference:

1. Suppress the offending cell (and its complement, if a margin allows back-out).
2. Round counts/totals to the provider's base (e.g., nearest 10 or 15).
3. Top-code / bottom-code extreme values.
4. Aggregate: collapse thin categories, coarsen geography, widen bins until every cell clears the threshold.
5. Drop the statistic if no remediation preserves both safety and meaning.

Each suggestion names the file, the cell/location, the rule it violates, and the concrete edit — never auto-applies it (the analyst owns the disclosure decision).
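Two of these remediations, rounding and top-coding, are mechanical enough to illustrate. The sketch below assumes non-negative integer counts and uses hypothetical names throughout (`Suggestion`, `round_to_base`, `top_code`, `suggest_rounding`). In keeping with the rule above, it only builds a suggestion record and never edits the output file.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Suggestion:
    """Hypothetical record: what to change and why, left for the analyst to apply."""
    file: str
    location: str
    rule: str
    edit: str


def round_to_base(value: int, base: int) -> int:
    """Round a non-negative integer to the nearest multiple of base (ties go up)."""
    return base * ((value + base // 2) // base)


def top_code(values: List[int], cap: int) -> List[int]:
    """Replace every value above cap with cap."""
    return [min(v, cap) for v in values]


def suggest_rounding(file: str, location: str, observed: int, base: int) -> Suggestion:
    """Build a rounding suggestion without touching the file."""
    rounded = round_to_base(observed, base)
    return Suggestion(file, location, f"rounding (base {base})",
                      f"replace {observed} with {rounded}")
```

For example, `round_to_base(47, 10)` gives 50 and `round_to_base(47, 15)` gives 45. Which base applies is for the provider's rules to decide, not this sketch.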

Phase 4: Gate

Exit non-zero on any CRITICAL. WARNINGs surface but do not block. See Exit behavior.

Output / Report format

Write quality_reports/disclosure_check_[outputs-dir-slug].md:

# Disclosure Check: [outputs dir]

**Date:** [YYYY-MM-DD]
**Provider profile:** census | irs | irb | generic   (rules source: confidential-data.md)
**Min cell count:** [N]   **Dominance:** p=[p]%, (n,k)=([n],[k]%)   **Rounding base:** [b]

## Summary
| Disposition | Count |
|---|---|
| CRITICAL | M |
| WARNING | W |
| OK | P |
| **Verdict** | **PASS / FAIL** (FAIL iff M > 0) |

## CRITICAL (blocks release)
| File | Location | Rule violated | Observed | Suggested remediation |
|---|---|---|---|---|
| tab3_by_cohort.tex | row "2008", col "n" | min cell (n<10) | n=4 | suppress cell + suppress complement in margin |

## WARNING (human judgment)
| File | Location | Concern | Suggested action |
|---|---|---|---|

## OK
[counts only, or a short list]

## Next steps
1. Resolve every CRITICAL — suppress / round / top-code / aggregate, then re-run.
2. Review WARNINGs with the agreement's written rules in hand.
3. Re-run until zero CRITICAL, THEN submit to the provider's OFFICIAL disclosure review.

Exit behavior

Zero CRITICAL: exit 0; report printed. (WARNINGs allowed — they are surfaced, not blocking.)
Any CRITICAL: exit 1; summary to stderr. This makes the skill usable as a release / pre-deposit gate. Mirrors /audit-reproducibility's gate semantics: WARNING ≠ FAIL, only CRITICAL blocks.
No rules loaded (generic fallback): exit 0 with a prominent warning that real provider thresholds were not supplied — the pre-screen ran but at conservative defaults, not the actual agreement.
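The gate semantics described here reduce to a small function. This is a sketch under the assumption that each finding has already been labelled with one of the three disposition strings; `gate` and `report` are hypothetical names, and a real wrapper would pass the returned code to `sys.exit`.

```python
import sys
from collections import Counter
from typing import Iterable, Tuple


def gate(dispositions: Iterable[str]) -> Tuple[int, str]:
    """Return (exit_code, summary): exit 1 iff at least one CRITICAL."""
    counts = Counter(dispositions)
    critical = counts.get("CRITICAL", 0)
    verdict = "FAIL" if critical > 0 else "PASS"
    summary = (f"{verdict}: {critical} CRITICAL, "
               f"{counts.get('WARNING', 0)} WARNING, {counts.get('OK', 0)} OK")
    return (1 if critical > 0 else 0), summary


def report(dispositions: Iterable[str]) -> int:
    """Print the summary (stderr on failure, stdout otherwise) and return the code."""
    code, summary = gate(dispositions)
    print(summary, file=sys.stderr if code else sys.stdout)
    return code
```

WARNINGs appear in the summary but never change the exit code, which is what lets the skill sit in a pre-commit or pre-deposit hook without blocking on judgment calls.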

Flags

--provider <name>: Load that data provider's disclosure rules (one of census, irs, irb, generic). Default: generic, the conservative small-cell ruleset.
--threshold <n>: Override the minimum cell-count threshold (default 10, so counts with 0 < n < 10 are flagged); match your data-use agreement's actual rule.

Cross-references

.claude/rules/confidential-data.md — restricted-data handling contract + the provider-rule profiles this skill loads.
.claude/rules/replication-protocol.md — for restricted-data papers the replication package ships code + access path, not the microdata; screen every released output first.
.claude/skills/audit-reproducibility/SKILL.md — numeric paper↔code verification: run it on the retained values, this skill on the released ones.
.claude/skills/data-analysis/SKILL.md, .claude/skills/stata-replication/SKILL.md — produce the R / Stata / Python outputs this skill screens.
AEA Data Editor checklist and the DCAS standard — disclosure + access expectations for restricted-data deposits (openICPSR restricted-access stub).

What this skill does NOT do

It does not replace the data provider's official disclosure review. Census/RDC, IRS, and IRB analysts run the authoritative review; this skill pre-screens so that the official review is more likely to succeed on the first attempt. A PASS here is not clearance to release.
It does not certify your rules are correct. It applies the thresholds you load from your signed agreement; if the loaded --provider profile is wrong, the scan is wrong. Reconcile with the written agreement, not a default.
It does not move, encrypt, or transmit data, and it never exfiltrates microdata from the enclave; it reads only the staged outputs you point it at.
It does not catch every disclosure risk. Differencing across released tables, longitudinal re-identification, and model-based inferential disclosure can evade a per-file scan. A clean run is necessary, not sufficient.
