
extract

Stars: 2

Forks: 5

Updated: May 12, 2026 at 04:27

General extraction/ingestion skill that routes to specific workflows based on input type. Extracts structured information from documents, emails, reviews, feedback, and other sources.


Source

nicsuzor/academicOps



name	extract
type	skill
category	instruction
description	General extraction/ingestion skill that routes to specific workflows based on input type. Extracts structured information from documents, emails, reviews, feedback, and other sources.
triggers	["extract information","extract from document","ingest","extract decisions","extract training data","convert document","DOCX/PDF/XLSX conversion","/convert-to-md"]
modifies_files	true
needs_task	true
mode	execution
domain	["operations"]
allowed-tools	Read,Grep,Glob,Edit,Write,Skill,Bash
version	1.0.0
permalink	skills-extract

Extraction & Ingestion Skill

Taxonomy note: This skill provides domain expertise (HOW) for extracting structured information from documents and sources. See [[aops-core/skills/remember/references/TAXONOMY.md]] for the skill/workflow distinction.

General-purpose extraction skill that intelligently routes to specialized workflows based on input type. Extracts structured information from various sources and stores it appropriately (public framework vs. private data).

Framework Context

Universal axioms apply (enforced by rbg). P#52 (Read-Then-Write Memory) is especially relevant.

Search Before Creating

MANDATORY before creating any new extracted content: search PKB for existing knowledge on the same subject.

mcp__pkb__search(query="[topic/person/document subject]")

If a match exists → augment it rather than creating a duplicate
If people are mentioned → retrieve existing relationship context to enrich extraction
If no match → proceed with creation

This prevents duplicate memories and grounds extraction in accumulated knowledge. See [[remember]] skill's "search first" step as the model.
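The search-first rule can be sketched as a small decision function. This is an illustrative sketch only: `plan_extraction`, `fake_search`, and the `fake_pkb` list are hypothetical stand-ins, and the real lookup would go through `mcp__pkb__search`, whose return shape is not specified here.

```python
from typing import Callable


def plan_extraction(subject: str, search: Callable[[str], list[dict]]) -> dict:
    """Decide whether to augment an existing PKB entry or create a new one."""
    matches = search(subject)  # stand-in for mcp__pkb__search(query=...)
    if matches:
        # A match exists: augment it rather than creating a duplicate.
        return {"action": "augment", "target": matches[0]["id"]}
    # No match: proceed with creation.
    return {"action": "create", "target": None}


# Hypothetical in-memory stand-in for the PKB, for illustration only.
fake_pkb = [{"id": "note-1", "title": "grant application 2025"}]


def fake_search(query: str) -> list[dict]:
    return [note for note in fake_pkb if query.lower() in note["title"].lower()]


print(plan_extraction("grant application", fake_search))
print(plan_extraction("conference travel", fake_search))
```

The search callable is injected so the decision logic can be exercised without a live PKB connection.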

Purpose

Provide a unified entry point for all extraction tasks:

Training data extraction from feedback documents
Archive information extraction (emails, correspondence)
Decision extraction from task queues
Knowledge extraction from documents
Review pair extraction for LLM training

Workflow Routing

When invoked, analyze the input and route to the appropriate workflow:

1. Training Data Extraction

Signals:

Input contains review feedback + source document
User mentions "training", "extract patterns", "learning", "dataset"
Documents contain tracked changes, comments, or annotations
Goal is to build LLM training data

Route to: workflows/training-data.md

Storage:

Sensitive data (actual review content, source documents): $ACA_DATA/processed/review_training/
Generalized patterns (depersonalized principles): Framework docs or peer-review skill

2. Archive Information Extraction

Signals:

Input is email archive, correspondence, receipts
User mentions "archive", "preserve", "remember"
Goal is to capture significant events/relationships
Source is historical documents

Route to: Archive extraction logic (selective extraction, use /remember for storage)

Storage: Use Skill(skill="remember") for PKB storage

3. Decision Extraction

Signals:

User mentions "decisions", "pending", "blocking"
Goal is to surface approval/choice items
Source is task queue

Route to: Existing aops-core/skills/decision-extract/SKILL.md

Storage: Daily note with decision formatting

4. Document Knowledge Extraction

Signals:

Single document needs key information extracted
User mentions "extract", "parse", "ingest"
Not training data, not archive, not decisions
Goal is structured information retrieval

Route to: workflows/document-knowledge.md (to be created)

Storage: Depends on content - PKB or framework docs

5. Document to Markdown Conversion

Signals:

Input is a document file (DOCX, PDF, XLSX, TXT, PPTX, MSG, DOC, DOTX)
User mentions "convert", "convert to markdown", "docx to markdown", "pdf to markdown"
Invoked as /convert-to-md (alias preserved for backwards compatibility)
Goal is format conversion, not structured extraction

Route to: workflows/docs-to-md.md

Storage: Converted .md files replace originals in the same directory
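As a rough illustration of the routing rules above, the keyword signals can be checked in a fixed order. The skill itself describes the routing as an analysis of the input, so this keyword matcher is a simplification; the rule order, the `archive-extraction` label, and the `ask-user` fallback are assumptions of this sketch.

```python
# Keyword signals taken from the routing rules above; the rule order and the
# "ask-user" fallback are assumptions of this sketch.
ROUTES = [
    (("convert", "to markdown"), "workflows/docs-to-md.md"),
    (("training", "extract patterns", "learning", "dataset"), "workflows/training-data.md"),
    (("archive", "preserve", "remember"), "archive-extraction"),
    (("decisions", "pending", "blocking"), "aops-core/skills/decision-extract/SKILL.md"),
    (("extract", "parse", "ingest"), "workflows/document-knowledge.md"),
]


def route(request: str) -> str:
    """Return the first workflow whose keyword signals appear in the request."""
    text = request.lower()
    for keywords, workflow in ROUTES:
        if any(keyword in text for keyword in keywords):
            return workflow
    # Unclear input type: ask the user to clarify the extraction goal.
    return "ask-user"


for request in (
    "convert report.docx to markdown",
    "extract patterns for a training dataset",
    "which decisions are pending?",
    "ingest this document",
):
    print(f"{request!r} -> {route(request)}")
```

Document knowledge extraction is checked last because it is defined as the case that is not training data, not archive, and not decisions.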

Workflow: Training Data Extraction

Input Types

Type A: Review with Inline Comments (DOCX with tracked changes)

Example: Peer review with inline comments and suggestions
Procedure: procedures/review-inline-comments.md
Output: Training pairs + generalized principles

Type B: Separate Review + Source Documents

Example: review.txt + source.pdf + metadata.json
Workflow: Review-training extraction (match feedback to source evidence)
Output: Training pairs matching feedback to source evidence

Type C: Revision History

Example: Git history, Google Docs revision history, tracked changes
Workflow: workflows/revision-history.md (to be created)
Output: Before/after pairs with change rationales

Extraction Process

See procedures/review-inline-comments.md for detailed procedure.

Quick summary:

Convert to workable format (preserve markup)
Extract feedback units (text + comment pairs)
Categorize feedback (type, scope, action)
Identify patterns (group similar feedback)
Generalize principles (abstract to transferable form)
Separate sensitive/public (raw data vs. patterns)
Store appropriately (sensitive → $ACA_DATA, patterns → framework)
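The middle steps of this summary (extract feedback units, categorize, identify patterns) might be modeled as follows. The `FeedbackUnit` fields and the example labels such as `clarity` and `rewrite` are hypothetical; the actual schema is defined in the procedure file, not here.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class FeedbackUnit:
    text: str            # passage the reviewer commented on
    comment: str         # the reviewer's comment
    feedback_type: str   # hypothetical label, e.g. "clarity" or "evidence"
    scope: str           # hypothetical label, e.g. "sentence" or "section"
    action: str          # hypothetical label, e.g. "rewrite" or "add-citation"


def group_by_pattern(units: list[FeedbackUnit]) -> dict:
    """Group similar feedback by (type, action) as candidate patterns."""
    groups = defaultdict(list)
    for unit in units:
        groups[(unit.feedback_type, unit.action)].append(unit)
    return dict(groups)


units = [
    FeedbackUnit("Intro paragraph", "Unclear claim", "clarity", "sentence", "rewrite"),
    FeedbackUnit("Section 2", "Hard to follow", "clarity", "section", "rewrite"),
    FeedbackUnit("Results", "No source given", "evidence", "sentence", "add-citation"),
]
for pattern, members in group_by_pattern(units).items():
    print(pattern, len(members))
```

Groups with several members spanning different contexts are the natural candidates for generalization into principles.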

Storage Rules

CRITICAL: Training data often contains sensitive information (author names, unpublished work, specific critiques).

Sensitive data → $ACA_DATA/processed/review_training/{collection_name}/:

extracted_examples.json (full text/feedback pairs)
training_pairs.jsonl (machine-readable format)
collection_summary.md (with identifying information)
Source documents (if retained)

Generalized patterns → Framework (public repo):

aops-core/skills/hydrator/workflows/peer-review.md (update with principles)
aops-core/skills/*/references/ (depersonalized examples)
No names, no specific unpublished content, no identifying details
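A minimal sketch of the sensitive-data side of these rules, assuming each extracted example is a dict with hypothetical `source_text` and `comment` keys. The demo writes to a temporary directory rather than a real `$ACA_DATA`; the function name `store_collection` is illustrative.

```python
import json
import tempfile
from pathlib import Path


def store_collection(aca_data: Path, collection_name: str, examples: list[dict]) -> Path:
    """Write sensitive output under <aca_data>/processed/review_training/<collection_name>/."""
    target = aca_data / "processed" / "review_training" / collection_name
    target.mkdir(parents=True, exist_ok=True)
    # Full text/feedback pairs in a single JSON document.
    (target / "extracted_examples.json").write_text(
        json.dumps(examples, indent=2), encoding="utf-8"
    )
    # One JSON object per line for the machine-readable training pairs.
    with (target / "training_pairs.jsonl").open("w", encoding="utf-8") as handle:
        for example in examples:
            pair = {"source": example["source_text"], "feedback": example["comment"]}
            handle.write(json.dumps(pair) + "\n")
    return target


# Throwaway directory standing in for $ACA_DATA in this demo.
demo_root = Path(tempfile.mkdtemp())
out_dir = store_collection(
    demo_root,
    "demo_collection",
    [{"source_text": "The method is novel.", "comment": "Cite prior work."}],
)
print(sorted(path.name for path in out_dir.iterdir()))
```

Passing the data root explicitly keeps the function from writing into the public repository by accident.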

Quality Standards

High-quality extraction:

Clear connection between feedback and source
Sufficient context for learning
Well-categorized with teaching points
Patterns are transferable

Generalization quality:

Principles are specific enough to apply
Principles are general enough to transfer
Examples span different contexts
Limitations are documented

Workflow: Archive Information Extraction

Apply selective extraction logic.

Key principle: Most archival documents have NO long-term value. Be highly selective.

Extract: Concrete outcomes, significant relationships, financial records

Skip: Newsletters, invitations, administrative routine, mass communications

Storage: Use Skill(skill="remember") with proper tags and canonical identifiers.
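The selective stance can be expressed as a filter that skips by default. This sketch assumes each item already carries a hypothetical `kind` label from an earlier classification step; the skill does not define such labels, and in practice the judgment is made per document.

```python
# Categories named in the text; the "kind" labels themselves are hypothetical
# and would come from an earlier classification step.
EXTRACT_KINDS = {"concrete-outcome", "significant-relationship", "financial-record"}


def select_for_archive(items: list[dict]) -> tuple[list[dict], list[dict]]:
    """Split items into (kept, skipped); anything not clearly valuable is skipped."""
    kept = [item for item in items if item.get("kind") in EXTRACT_KINDS]
    skipped = [item for item in items if item.get("kind") not in EXTRACT_KINDS]
    return kept, skipped


inbox = [
    {"subject": "Grant awarded", "kind": "concrete-outcome"},
    {"subject": "Weekly newsletter", "kind": "newsletter"},
    {"subject": "Room booking reminder", "kind": "administrative-routine"},
]
kept, skipped = select_for_archive(inbox)
print(len(kept), "kept,", len(skipped), "skipped")
```

Using an allow-list rather than a block-list means unrecognized document kinds are skipped, which matches the principle that most archival documents have no long-term value.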

Workflow: Decision Extraction

Delegate to aops-core/skills/decision-extract/SKILL.md.

Key principle: Extract tasks requiring approval/choice that are blocking other work.

Storage: Daily note with formatted decision list for batch processing.

Sensitive Data Handling

What is Sensitive?

Author names and identifying information
Unpublished work content
Specific critiques of individuals' work
Email content and correspondence
Personal information
Institutional confidential information

Storage Location: `$ACA_DATA/processed/`

Directory structure:

$ACA_DATA/processed/
├── review_training/
│   ├── {collection_name}/
│   │   ├── extracted_examples.json
│   │   ├── training_pairs.jsonl
│   │   ├── collection_summary.md
│   │   └── source_documents/
│   └── ...
├── email_archive/
│   └── ...
└── ...

Access: This directory:

Is outside the public academicOps repo
Is in the personal data directory ($ACA_DATA = /home/nic/brain)
Should be backed up separately
Is not committed to git

Depersonalization for Public Framework

When adding examples to public framework docs:

Remove all names (use "Author", "Reviewer", or generic placeholders)
Remove specific work titles (use generic descriptions)
Remove institutional affiliations
Generalize to principle, not specific instance
Use constructed examples if real ones can't be depersonalized
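A simple placeholder substitution illustrates the first three rules. This is only a sketch: `depersonalize` is a hypothetical helper, the sample sentence is constructed, and string replacement alone cannot catch indirect identifying details, so manual review is still needed.

```python
import re


def depersonalize(text: str, replacements: dict[str, str]) -> str:
    """Replace known names, titles, and affiliations with generic placeholders."""
    # Replace longer strings first so a full name is handled before a first name.
    for name in sorted(replacements, key=len, reverse=True):
        pattern = rf"\b{re.escape(name)}\b"
        placeholder = replacements[name]
        # A function replacement avoids backslash processing in the placeholder.
        text = re.sub(pattern, lambda _match, p=placeholder: p, text, flags=re.IGNORECASE)
    return text


# Constructed example; no real person or institution is referenced.
sample = "Jane Doe's draft, submitted from Example University, overstates its findings."
print(depersonalize(sample, {"Jane Doe": "Author", "Example University": "[institution]"}))
```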

Integration with Other Skills

When to use `/extract` vs. specialized skills

Use /extract:

Unclear what type of extraction is needed
Multiple types of documents to process
Want intelligent routing to appropriate workflow

Use specialized skill directly:

/remember - When you know you want to add to knowledge base
/decision-extract - When specifically extracting decisions
/review-training - When processing matched review/source pairs (legacy)

Note on /convert-to-md: This trigger is now an alias for /extract. Invoking /convert-to-md routes to the workflows/docs-to-md.md workflow.

Skill Composition

/extract → analyze input → route to:
  - /remember (for archival preservation)
  - /decision-extract (for pending decisions)
  - training-data workflow (for LLM training data)
  - document-knowledge workflow (for general extraction)
  - docs-to-md workflow (for document to Markdown conversion)

Error Handling

Scenario	Behavior
Unclear input type	Ask user to clarify extraction goal
Cannot convert document format	Try alternative conversion, document failure
Ambiguous feedback	Flag with `"quality": "ambiguous"`, include with caveats
No clear extraction value	Ask user if they want to skip or force extraction
Storage location unclear	Default to `$ACA_DATA/processed/`, confirm with user
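The ambiguous-feedback row could be implemented as a tagging step. Only the `"quality": "ambiguous"` flag comes from the table; the `"clear"` value, the `flag_quality` name, and the short-comment heuristic are assumptions made for illustration.

```python
def flag_quality(unit: dict) -> dict:
    """Return a copy of the feedback unit with a "quality" field added."""
    comment = unit.get("comment", "").strip()
    # Hypothetical heuristic: comments under three words are treated as ambiguous.
    ambiguous = len(comment.split()) < 3
    # "ambiguous" is the flag named in the table; "clear" is this sketch's assumption.
    return {**unit, "quality": "ambiguous" if ambiguous else "clear"}


print(flag_quality({"comment": "?"}))
print(flag_quality({"comment": "Please cite the original study here."}))
```

Ambiguous units are kept rather than dropped, so the caveat travels with the data.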

Examples

Example 1: Peer Review Extraction

Input: DOCX file with inline comments from peer review

User: /extract /path/to/review.docx --type peer-review

Agent:

Detects inline comments → routes to the training-data workflow (procedures/review-inline-comments.md)
Converts DOCX with pandoc --track-changes=all
Extracts 18 feedback units
Identifies 10 generalizable principles
Stores sensitive data in $ACA_DATA/processed/review_training/aoir2026/
Updates aops-core/skills/hydrator/workflows/peer-review.md with depersonalized principles

Example 2: Email Archive

Input: Directory of email MSG files

User: /extract emails/2025-Q1/ --type archive

Agent:

Detects email archive → routes to archive extraction logic
Processes each email, applies judgment criteria
Extracts significant events/relationships
Uses /remember to store in PKB
Skips 90% of emails as noise

Example 3: Auto-Detection

Input: Mixed documents without type specified

User: /extract documents/

Agent:

Analyzes each document
Detects: 2 peer reviews (tracked changes), 5 emails, 1 grant application
Routes peer reviews → training-data workflow
Routes emails → archive extraction
Asks user about grant application (unclear extraction goal)

Validation Checklist

Before completing extraction:

Completeness:

All extractable items identified
All items processed (or skipped with reason)
Output files created in correct locations

Quality:

Teaching value is clear (for training data)
Categorization is accurate
Context is sufficient

Sensitivity:

Sensitive data stored in $ACA_DATA/processed/
Public framework contains only depersonalized content
No identifying information in public docs

Documentation:

Extraction process documented
Decisions and ambiguities noted
Collection summary created
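Part of the completeness check can be automated by confirming that the expected output files exist. The file names come from the storage rules for a training-data collection; `missing_outputs` is a hypothetical helper, and the quality and sensitivity items still need human judgment.

```python
import tempfile
from pathlib import Path

# File names from the storage rules for a review_training collection.
REQUIRED_FILES = ("extracted_examples.json", "training_pairs.jsonl", "collection_summary.md")


def missing_outputs(collection_dir: Path) -> list[str]:
    """List required output files that are absent from the collection directory."""
    return [name for name in REQUIRED_FILES if not (collection_dir / name).is_file()]


# Demo in a throwaway directory standing in for a real collection.
demo_dir = Path(tempfile.mkdtemp())
(demo_dir / "extracted_examples.json").write_text("[]", encoding="utf-8")
(demo_dir / "training_pairs.jsonl").write_text("", encoding="utf-8")
print(missing_outputs(demo_dir))
```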

Future Enhancements

Semi-automated pattern detection
Batch processing for multiple documents
Integration with continuous ingestion pipeline
Quality metrics and validation tools
Cross-collection pattern analysis
