name	persona-extractor
description	Build a reusable, evidence-grounded persona artifact from raw source material such as interviews, memoirs, speeches, letters, biographies, transcripts, diaries, articles, or mixed dossiers. Use this skill when the goal is to produce an engineering-grade persona package for downstream simulation, roleplay, analysis, or evaluation rather than an informal character sketch.

Persona Extractor

This skill turns a corpus about one person into a standardized persona artifact.

The output is not just a summary. It is a structured package with:

source inventory
extraction config
normalized evidence index
persona synthesis
scenario library
simulation contract
runtime modules
load profiles

The canonical final artifact is:

personas/<persona_slug>/persona.json

The extraction pipeline is config-driven. Do not hardcode dimensions, domain assumptions, or source-specific heuristics into the final persona.

Read these references first:

references/pipeline.md
references/dimension_catalog.md
references/output_contract.md
references/quality_rubric.md

Standard Workflow

Scaffold the project. Generate a generic working folder and config:

python skills/persona-extractor/scripts/scaffold_persona_project.py \
  --output-dir personas/<persona_slug> \
  --persona-id <persona_slug> \
  --display-name "<display_name>"

This creates:

extractor_config.json
persona.template.json
notes.md

Define the extraction boundary. Fill extractor_config.json:
- corpus root
- source types
- time span
- chosen dimensions
- scenario axes
- optional seed terms per dimension
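
As a rough illustration of the fields listed above, the sketch below builds a config in Python and serializes it to JSON. Every key name and value is an assumption made for illustration; the real schema is whatever the scaffold script emits and references/pipeline.md documents. The helper seed_terms_are_scoped is likewise hypothetical.

```python
import json

# Hypothetical config: all key names and values are illustrative assumptions,
# not the schema produced by scaffold_persona_project.py.
example_config = {
    "persona_id": "example_person",
    "corpus_root": "corpus/example_person",
    "source_types": ["interview", "letter", "speech"],
    "time_span": {"start": "1950", "end": "1975"},
    "dimensions": ["decision_style", "conflict_handling"],
    "scenario_axes": ["stakes", "audience"],
    "seed_terms": {"decision_style": ["decide", "weigh", "risk"]},
}


def seed_terms_are_scoped(config: dict) -> bool:
    """Return True if every seed-term key names one of the chosen dimensions."""
    return set(config.get("seed_terms", {})) <= set(config["dimensions"])


config_text = json.dumps(example_config, indent=2)
print(config_text)
print(seed_terms_are_scoped(example_config))
```

Keeping dimensions and seed terms in the config, rather than in code, is what lets the same pipeline run unchanged on a different corpus.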

Build the evidence pack. Run:

python skills/persona-extractor/scripts/persona_evidence_builder.py \
  --source-dir <corpus_dir> \
  --output-dir personas/<persona_slug> \
  --config personas/<persona_slug>/extractor_config.json

This generates:

source_index.json
block_index.json
candidate_passages.json
candidate_evidence.md

Synthesize the persona. Write persona.json by combining:
- source facts
- normalized evidence ids
- dimension summaries
- scenario answers
- roleplay contract
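
The combination step above might look something like the following sketch. The function assemble_persona, the key names, and the sample values are all hypothetical; the authoritative layout is defined in references/output_contract.md.

```python
import json


def assemble_persona(source_facts, dimension_summaries, scenario_answers, roleplay_contract):
    """Combine the synthesis inputs into one dict (hypothetical layout)."""
    # Collect every evidence id cited by a dimension summary, in first-seen order.
    cited = []
    for summary in dimension_summaries.values():
        for evidence_id in summary["evidence_ids"]:
            if evidence_id not in cited:
                cited.append(evidence_id)
    return {
        "source_facts": source_facts,
        "evidence_ids": cited,
        "dimensions": dimension_summaries,
        "scenario_library": scenario_answers,
        "simulation_contract": roleplay_contract,
    }


# Placeholder inputs, invented purely for the example.
persona = assemble_persona(
    source_facts={"display_name": "Example Person"},
    dimension_summaries={
        "decision_style": {
            "summary": "Delays commitment until options are costed.",
            "evidence_ids": ["ev_0003", "ev_0011"],
        },
        "conflict_handling": {
            "summary": "Prefers private correction to public rebuke.",
            "evidence_ids": ["ev_0011", "ev_0020"],
        },
    },
    scenario_answers=[{"scenario": "budget cut", "dimensions": ["decision_style"]}],
    roleplay_contract={"voice": "plain, first person"},
)
print(json.dumps(persona, indent=2))
```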

Validate before use. Template-stage validation:

python skills/persona-extractor/scripts/validate_persona.py \
  personas/<persona_slug>/persona.template.json \
  --mode template

Final artifact validation:

python skills/persona-extractor/scripts/validate_persona.py \
  personas/<persona_slug>/persona.json \
  --mode final

Render human-readable mirrors.

python skills/persona-extractor/scripts/render_persona_markdown.py \
  personas/<persona_slug>/persona.json

Build runtime modules for partial loading.

python skills/persona-extractor/scripts/build_persona_modules.py \
  personas/<persona_slug>/persona.json

This writes:

personas/<persona_slug>/modules/*.json

and updates:

module_registry
load_profiles
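
To make the idea of partial loading concrete, here is a minimal sketch of how a consumer might read only the modules named by one load profile. It assumes module_registry maps module names to relative file paths and load_profiles maps profile names to lists of module names; both shapes, the module names, and the function load_profile are assumptions, not the format written by build_persona_modules.py.

```python
import json
import tempfile
from pathlib import Path


def load_profile(persona_dir: Path, registry: dict, profiles: dict, profile_name: str) -> dict:
    """Load only the module files named by one load profile."""
    loaded = {}
    for module_name in profiles[profile_name]:
        module_path = persona_dir / registry[module_name]
        loaded[module_name] = json.loads(module_path.read_text(encoding="utf-8"))
    return loaded


# Demo with throwaway files; module names and layout are illustrative assumptions.
module_names = ("voice", "values", "biography")
with tempfile.TemporaryDirectory() as tmp:
    persona_dir = Path(tmp)
    (persona_dir / "modules").mkdir()
    for name in module_names:
        module_file = persona_dir / "modules" / f"{name}.json"
        module_file.write_text(json.dumps({"module": name}), encoding="utf-8")
    registry = {name: f"modules/{name}.json" for name in module_names}
    profiles = {"quick_chat": ["voice"], "full": list(module_names)}
    quick = load_profile(persona_dir, registry, profiles, "quick_chat")

print(sorted(quick))
```

In this sketch the quick_chat profile never touches the values or biography files, which is the point of splitting the artifact into modules.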

Design Rules

Make the schema fit the corpus, not the other way around.
Prefer reusable behavioral dimensions over biographical trivia.
Separate evidence, inference, and extrapolation.
When the persona changes over time, preserve that drift instead of averaging it away.
Record uncertainty explicitly instead of forcing false precision.
Keep the artifact generic enough for any downstream simulator, not just one roleplay style.
Keep the runtime layer modular enough that later invocation can load only the slices needed for a scene.
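
One possible way to encode the rules about separating evidence from inference, preserving drift, and recording uncertainty is a small claim record. The Basis enum, the Claim fields, and the is_well_formed check are hypothetical names chosen for this sketch; the output contract may model these differently.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class Basis(Enum):
    EVIDENCE = "evidence"            # stated directly in a source
    INFERENCE = "inference"          # derived from one or more sources
    EXTRAPOLATION = "extrapolation"  # projected beyond what the corpus covers


@dataclass
class Claim:
    text: str
    basis: Basis
    evidence_ids: List[str] = field(default_factory=list)
    period: Optional[str] = None      # keeps drift over time visible
    confidence: Optional[str] = None  # None means "not assessed"


def is_well_formed(claim: Claim) -> bool:
    """Evidence claims must cite something; extrapolations must state confidence."""
    if claim.basis is Basis.EVIDENCE and not claim.evidence_ids:
        return False
    if claim.basis is Basis.EXTRAPOLATION and claim.confidence is None:
        return False
    return True


# Placeholder claims, invented for the example.
claims = [
    Claim("Drafted speeches by hand.", Basis.EVIDENCE, ["ev_0004"], period="early career"),
    Claim("Likely distrusted ghostwriters.", Basis.INFERENCE, ["ev_0004"], confidence="medium"),
    Claim("Would decline a scripted interview.", Basis.EXTRAPOLATION),
]
flags = [is_well_formed(c) for c in claims]
print(flags)  # → [True, True, False]
```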

Quality Bar

The artifact must be understandable without reopening the raw corpus.
Every major synthesis claim should trace back to evidence ids.
Dimensions should capture stable behavior patterns, not just famous slogans.
Scenario answers should reflect the extracted method, values, and voice, not cosplay.
The artifact should remain useful even if the person is not political, historical, or public-facing.
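
The traceability requirement can be checked mechanically. The sketch below assumes each claim is a dict with an id and an evidence_ids list, and that the evidence index is keyed by evidence id; these shapes and the function untraceable_claims are assumptions for illustration, not the behavior of validate_persona.py.

```python
def untraceable_claims(claims, evidence_index):
    """Return (claim_id, missing_ids) pairs for claims that cite nothing or cite unknown ids."""
    known = set(evidence_index)
    problems = []
    for claim in claims:
        cited = claim.get("evidence_ids", [])
        missing = [eid for eid in cited if eid not in known]
        if not cited or missing:
            problems.append((claim["id"], missing))
    return problems


# Placeholder data, invented for the example.
evidence_index = {"ev_0001": "letter excerpt", "ev_0002": "interview excerpt"}
claims = [
    {"id": "c1", "evidence_ids": ["ev_0001"]},
    {"id": "c2", "evidence_ids": ["ev_0002", "ev_0099"]},
    {"id": "c3", "evidence_ids": []},
]
problems = untraceable_claims(claims, evidence_index)
print(problems)  # → [('c2', ['ev_0099']), ('c3', [])]
```

A claim with an empty missing list, such as c3 here, cites no evidence at all, which is a different failure from citing an id that does not exist.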

Output Rule

persona.json is the source of truth.

Markdown files are mirrors for review.

The roleplay skill should consume the final artifact generically via the simulation contract and scenario library, not via person-specific patches.
