
annotate

Build and verify a PII gold set with HUMAN annotators (first-class). Launch the browser annotator, label spans per the codebook, export per-annotator label files, then compute inter-annotator agreement (Cohen's/Fleiss' kappa) and draft an adjudicated gold. Use when the user says "annotate PII", "label this transcript", "build a gold set", "inter-annotator agreement", "review annotations", "adjudicate labels", or wants to measure/defend a de-identification gold standard. Local-only: synthetic or consented data only; annotators' names and transcript text stay on the machine — only labels/stats are collected, nothing PII is re-shared.



Updated June 3, 2026 at 19:50

Source: glebis/claude-skills (by glebis)



SKILL.md

name: annotate

confide:annotate — human PII gold set + inter-annotator agreement

Humans label PII spans in a transcript; you measure how much they agree (κ) and draft an adjudicated gold from their labels. Annotators are first-class here — most of this skill is plain instructions FOR a person doing the labelling, plus a coordinator path to score it.

Privacy invariants (do not violate)

Synthetic or consented data only. Never load a real client transcript the person did not consent to share. When in doubt, anonymize first (confide:anon) and annotate the GREEN copy.
Names stay local. The annotator's labels (which contain real surface text spans) live in their browser and the exported JSON file on their own machine. Collect label files locally.
Nothing PII is re-shared. Only κ / F1 / disagreement clusters travel between people if needed. The transcript text and the original PII are never re-distributed by this skill.

Bundled assets

assets/annotator.html — zero-install browser annotation tool (EN/RU, runs offline).
references/codebook.md — the labelling rulebook (10 PII types, direct/quasi, harm).
references/tool-guide.md — how to drive the tool + scorer step by step.
scripts/score_iaa.py — Cohen's/Fleiss' κ, span-F1, disagreement queue, draft gold (stdlib).
scripts/gold_to_labels.py — turn an existing gold into a "reference annotator" to test solo.

FOR THE ANNOTATOR (no coding needed)

Open the tool. Double-click assets/annotator.html (or open it in Chrome, Firefox, or Safari). It runs entirely in your browser: nothing is uploaded, and labels stay on your machine until you Export.
Read the rules. Open references/codebook.md first. It defines the 10 types (PERSON, LOCATION, ORG, PHONE, EMAIL, ID, DATE, MEDICATION, AGE, PROFESSION), what counts as a span (the minimal identifying text), and direct vs. quasi-identifier.
Set your annotator id (e.g. A, B, or your name) and load the transcript in the tool. Use only synthetic or consented text.
Label every PII span. Select the minimal text that identifies a real person (the client or third parties they mention) and assign its type. Record direct/quasi, entity id, role, and harm as the codebook describes. Do not rewrite or redact — only label.
When unsure, log it — don't guess silently. Add a note starting with QUESTION: on the span (e.g. QUESTION: gym or city?). These flow straight into the adjudication queue.
Export. Click Export to get labels.<doc>.<annotator>.json (schema: {doc_id, annotator, text, spans:[{start,end,text,type,...}]}). Keep it local and hand only this file to the coordinator. For a meaningful κ, two or more people should label the same doc independently (blind).
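Before a file goes to the coordinator, a quick sanity check can catch offsets that drifted from the text. The sketch below is a minimal illustration, not part of the bundled scripts: validate_labels and load_labels are hypothetical helpers, and it assumes the exported schema above, 0-based end-exclusive character offsets, and type values spelled exactly as the 10 codebook names. One caveat: browser JavaScript indexes strings by UTF-16 code units, so if a transcript contains characters outside the Basic Multilingual Plane (e.g. emoji), exported offsets may not line up with Python's code-point indexing.

```python
import json
from pathlib import Path

# The 10 PII types listed in references/codebook.md.
PII_TYPES = {"PERSON", "LOCATION", "ORG", "PHONE", "EMAIL", "ID",
             "DATE", "MEDICATION", "AGE", "PROFESSION"}


def validate_labels(doc: dict) -> list[str]:
    """Return the problems found in one exported labels file (empty list if clean).

    Assumes 0-based, end-exclusive offsets, i.e. text[start:end] == span text.
    """
    problems = [f"missing top-level key: {k}"
                for k in ("doc_id", "annotator", "text", "spans") if k not in doc]
    if problems:
        return problems
    text = doc["text"]
    for i, span in enumerate(doc["spans"]):
        start, end = span.get("start"), span.get("end")
        if not (isinstance(start, int) and isinstance(end, int)
                and 0 <= start < end <= len(text)):
            problems.append(f"span {i}: bad offsets {start}..{end}")
            continue
        if text[start:end] != span.get("text"):
            problems.append(f"span {i}: text does not match text[{start}:{end}]")
        if span.get("type") not in PII_TYPES:
            problems.append(f"span {i}: unknown type {span.get('type')!r}")
    return problems


def load_labels(path: Path) -> dict:
    """Read a labels.<doc>.<annotator>.json file and fail loudly if it is inconsistent."""
    doc = json.loads(path.read_text(encoding="utf-8"))
    problems = validate_labels(doc)
    if problems:
        raise ValueError(f"{path.name}: " + "; ".join(problems))
    return doc
```

Running this locally keeps with the privacy invariants: the check reads the file on the annotator's or coordinator's own machine and prints only positions and type names.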

FOR THE COORDINATOR (measure + adjudicate)

Collect every labels.<doc>.<annotator>.json into one folder, e.g. labels/.
Score IAA:
```
python3 skills/annotate/scripts/score_iaa.py --labels-dir labels/ --out-dir results/
```
It writes, per document and overall: Cohen's κ (pairwise), Fleiss' κ (3+ annotators), span-F1, a disagreement queue (*-iaa-disagreements.json: every cluster the annotators don't fully agree on, plus any QUESTION: spans), and a draft adjudicated gold (*-adjudicated-gold-draft.json: the majority span per overlap cluster, with ties and questions marked needs_review:true). Because κ is computed per character, it sidesteps disputes about tokenization.
Target κ ≥ 0.80 for a defensible gold. A lower κ usually points to an unclear codebook rule rather than a careless annotator: fix the rule and re-label instead of discarding the labels.
Adjudicate. Walk the disagreement queue with a human adjudicator; resolve each needs_review cluster. The resulting label set is the published gold; report post-adjudication κ too. Nothing is ever auto-finalised.
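To make "character-level κ" concrete, here is a minimal sketch of how such agreement can be computed. It is an illustration under assumptions, not score_iaa.py's code: the function names are hypothetical, offsets are assumed 0-based and end-exclusive, and every character outside a span is assumed to carry the label "O". Each character becomes one rated item; Cohen's κ compares two annotators, and Fleiss' κ pools three or more using the combined label distribution.

```python
from collections import Counter


def char_labels(doc: dict) -> list[str]:
    """One label per character: the covering span's type, or "O" outside every span.

    Assumes 0-based, end-exclusive offsets; if one annotator's spans overlap,
    the later span in the list overwrites the earlier one.
    """
    labels = ["O"] * len(doc["text"])
    for span in doc["spans"]:
        for i in range(span["start"], span["end"]):
            labels[i] = span["type"]
    return labels


def cohen_kappa(a: list[str], b: list[str]) -> float:
    """Cohen's kappa for two equal-length label sequences."""
    if len(a) != len(b) or not a:
        raise ValueError("need two non-empty label sequences of equal length")
    n = len(a)
    p_observed = sum(x == y for x, y in zip(a, b)) / n
    count_a, count_b = Counter(a), Counter(b)
    # Chance agreement: product of each annotator's own label frequencies.
    p_chance = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if p_chance == 1.0:
        return 1.0  # both used one identical label everywhere
    return (p_observed - p_chance) / (1 - p_chance)


def fleiss_kappa(rows: list[list[str]]) -> float:
    """Fleiss' kappa; rows[r][i] is annotator r's label for character i."""
    if len(rows) < 2 or not rows[0] or any(len(r) != len(rows[0]) for r in rows):
        raise ValueError("need 2+ non-empty label sequences of equal length")
    m, n_items = len(rows), len(rows[0])
    totals: Counter = Counter()
    p_bar = 0.0
    for i in range(n_items):
        counts = Counter(row[i] for row in rows)
        totals.update(counts)
        # Fraction of annotator pairs that agree on item i.
        p_bar += (sum(c * c for c in counts.values()) - m) / (m * (m - 1))
    p_bar /= n_items
    # Chance agreement from the pooled label distribution across all annotators.
    p_chance = sum((t / (n_items * m)) ** 2 for t in totals.values())
    if p_chance == 1.0:
        return 1.0
    return (p_bar - p_chance) / (1 - p_chance)
```

Both label files must carry the same text (check doc_a["text"] == doc_b["text"] first), otherwise the per-character sequences do not line up. Because "O" dominates most transcripts, raw percent agreement is inflated; the chance correction in κ is what makes the 0.80 target meaningful.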

Test the loop solo (no second person yet)

Treat an existing gold JSONL as one "reference annotator", label the same doc yourself in annotator.html as another, then score the pair:

```
python3 skills/annotate/scripts/gold_to_labels.py --gold GOLD.jsonl --name gold --out-dir labels/
# label the same doc yourself in annotator.html as "me" -> drop labels.<doc>.me.json into labels/
python3 skills/annotate/scripts/score_iaa.py --labels-dir labels/ --out-dir results/
```

(--sessions-dir DIR lets gold_to_labels.py read transcript text from disk so char offsets match the gold exactly.)
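When one side of the pair is a reference gold, span precision and recall read naturally as "how much of the gold did I find, and how much of what I labelled is in the gold". A minimal sketch, assuming exact matching on (start, end, type); whether score_iaa.py uses exact or overlap matching is not stated here, and span_f1 is a hypothetical name.

```python
def span_f1(reference: list[dict], predicted: list[dict]) -> dict:
    """Exact-match span precision/recall/F1.

    A predicted span counts as correct only if start, end and type all match
    a reference span. Empty inputs yield 0.0 scores rather than raising.
    """
    ref = {(s["start"], s["end"], s["type"]) for s in reference}
    pred = {(s["start"], s["end"], s["type"]) for s in predicted}
    tp = len(ref & pred)
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(ref) if ref else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"tp": tp, "precision": precision, "recall": recall, "f1": f1}
```

For example, span_f1(gold_doc["spans"], my_doc["spans"]) with both label files loaded as dicts. Exact matching penalises a span that is off by one character; an overlap-based variant is more forgiving but needs a rule for partial credit.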

Output

IAA results (κ, F1) + a disagreement list + a draft adjudicated gold — labels/stats only. Transcript text and original PII stay local; only what's needed to adjudicate is shared.
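The "majority span per overlap-cluster" rule behind the draft gold can be sketched as follows. This is an illustration under assumptions, not score_iaa.py's code: spans from all annotators are grouped when their character ranges overlap (directly or through a chain), each cluster keeps its most-voted (start, end, type), and the cluster is flagged needs_review on a tie, on a QUESTION: note, or, as an extra rule added here, when the top span lacks a strict majority of annotators. The note field name is a guess at what the "..." in the export schema holds.

```python
from collections import Counter


def overlap_clusters(spans: list[dict]) -> list[list[dict]]:
    """Group spans whose character ranges overlap, directly or through a chain."""
    clusters: list[list[dict]] = []
    current: list[dict] = []
    current_end = -1
    for span in sorted(spans, key=lambda s: (s["start"], s["end"])):
        if current and span["start"] < current_end:
            current.append(span)
            current_end = max(current_end, span["end"])
        else:
            if current:
                clusters.append(current)
            current, current_end = [span], span["end"]
    if current:
        clusters.append(current)
    return clusters


def draft_gold(docs: list[dict]) -> list[dict]:
    """Majority (start, end, type) per overlap cluster, with a needs_review flag."""
    n_annotators = len(docs)
    tagged = [dict(s, annotator=d["annotator"]) for d in docs for s in d["spans"]]
    gold = []
    for cluster in overlap_clusters(tagged):
        # One vote per annotator per distinct span, even if they labelled it twice.
        ballots = {(s["annotator"], s["start"], s["end"], s["type"]) for s in cluster}
        votes = Counter(ballot[1:] for ballot in ballots)
        ranked = votes.most_common()
        (start, end, pii_type), top = ranked[0]
        tie = len(ranked) > 1 and ranked[1][1] == top
        # Hypothetical field name "note" for the QUESTION: annotation.
        question = any(str(s.get("note", "")).startswith("QUESTION:") for s in cluster)
        no_majority = 2 * top <= n_annotators  # assumption: not in the source rule
        gold.append({"start": start, "end": end, "type": pii_type, "votes": top,
                     "needs_review": tie or question or no_majority})
    return gold
```

Nothing here finalises the gold: the flagged clusters are exactly the ones a human adjudicator should walk through.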

