airtable-knowledge-extract

Name: Airtable Knowledge Extract
Author: sonthanh

Extract knowledge from Airtable bases into the vault as linked entity pages. Sonnet worker + Opus reviewer ≥95% gate, mandatory HITL re-anchor for each new base. Use when seeding historical knowledge from company Airtable bases.

stars:16

forks:8

updated:May 6, 2026 at 06:23

Repository: sonthanh/brain-os-plugin

name	airtable-knowledge-extract
description	Extract knowledge from Airtable bases into the vault as linked entity pages. Sonnet worker + Opus reviewer ≥95% gate, mandatory HITL re-anchor for each new base. Use when seeding historical knowledge from company Airtable bases.

/airtable-knowledge-extract — Airtable → Vault Knowledge Extractor

Phase 1 MVP. Pure-TypeScript bun pipeline (scripts/run.mts) drives a per-base × vertical-slice extract from Airtable into a per-run cache (~/.claude/airtable-extract-cache/<run-id>/). Sonnet worker writes entity slices; Opus 4.6 reviewer gates each 10-slice batch at ≥95%; self-improver appends rejected examples; re-anchor enforces a mandatory HITL teach round on every new base. Settled design: {vault}/daily/grill-sessions/2026-04-27-auto-eval-scale-pipeline.md.

Vault Path

Read from ${CLAUDE_PLUGIN_ROOT}/brain-os.config.md (or user-local ~/.brain-os/brain-os.config.md, which takes precedence — same logic as hooks/resolve-vault.sh, ported to TS in scripts/outcome-log.mts). Key: vault_path:. The outcome log lands at <vault_path>/daily/skill-outcomes/airtable-knowledge-extract.log.
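
The precedence rule above can be sketched roughly as follows. This is an illustration only, not the code in scripts/outcome-log.mts: `parseVaultPath` and `resolveVaultPath` are hypothetical names, and both the one-key-per-line `vault_path: <value>` layout and the fall-through to the plugin config when the user-local file lacks the key are assumptions.

```typescript
// Hypothetical helpers; the real logic lives in scripts/outcome-log.mts.

/** Returns the value of the first `vault_path:` line, or null if absent. */
function parseVaultPath(configText: string): string | null {
  for (const line of configText.split(/\r?\n/)) {
    const match = line.match(/^\s*vault_path:\s*(.+?)\s*$/);
    if (match) return match[1];
  }
  return null;
}

/**
 * User-local config wins over the plugin-root config.
 * Pass null for a file that does not exist.
 */
function resolveVaultPath(
  userLocalConfig: string | null,
  pluginRootConfig: string | null,
): string | null {
  // Candidates are checked in precedence order; the first hit wins.
  for (const text of [userLocalConfig, pluginRootConfig]) {
    if (text === null) continue;
    const found = parseVaultPath(text);
    if (found !== null) return found;
  }
  return null;
}
```

Passing the file contents in, rather than reading them inside the helper, keeps the precedence logic testable without touching ~/.brain-os.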

RUN MODE — Pattern B: never orchestrate from session; launch via supaterm tab

This is a Pattern B pipeline (rationale: {vault}/thinking/aha/2026-04-18-orchestration-on-script.md). The bun runtime owns the per-base loop; claude -p worker subprocesses do per-slice work. A real run takes ~30–60 min per base.

Rule 1 — never orchestrate from a Claude session. Each Bash tool result re-reads the full transcript; a 30-min in-session orchestration burns Opus context-tokens at $300+/run. Do NOT poll, Monitor, or TaskUpdate during the loop.

Rule 2 — supaterm tab + env -u CLAUDECODE is how Claude launches it. sp tab new --script creates a background supaterm tab that survives the Claude session. Stripping CLAUDECODE from the env makes child claude -p workers run outside the parent session context.

When the user invokes /airtable-knowledge-extract, print the ## Run block below verbatim and stop in-session. Do not Bash it from the session. The user pastes the block (after filling in <list-of-base-ids>) into their terminal or invokes the supaterm command directly.

Run

Resolve the bases to process up-front via scripts/active-bases.mts (which internally calls list-bases.mts, applies the inclusion + exclusion rules from references/base-selection-rules.md, and emits the filtered set). Then chain the per-base pre-steps and the orchestrator inside one supaterm tab:

RUN_ID="$(date -u +%Y-%m-%d)-airtable-extract"
SKILL_DIR="${CLAUDE_PLUGIN_ROOT}/skills/airtable-knowledge-extract"
sp tab new --script "env -u CLAUDECODE bash -c '
  set -euo pipefail
  cd \"$SKILL_DIR\"
  ACTIVE_IDS=\$(AIRTABLE_RUN_ID=\"$RUN_ID\" bun run scripts/active-bases.mts --ids)
  for BASE_ID in \$ACTIVE_IDS; do
    AIRTABLE_RUN_ID=\"$RUN_ID\" bun run scripts/classify-table-type.mts  \"\$BASE_ID\" > /dev/null
    AIRTABLE_RUN_ID=\"$RUN_ID\" bun run scripts/classify-clusters.mts    \"\$BASE_ID\" > /dev/null
    AIRTABLE_RUN_ID=\"$RUN_ID\" bun run scripts/cluster-classifier.mts   \"\$BASE_ID\" > /dev/null
    AIRTABLE_RUN_ID=\"$RUN_ID\" bun run scripts/schema-analyser.mts      \"\$BASE_ID\" > /dev/null
    AIRTABLE_RUN_ID=\"$RUN_ID\" bun run scripts/legacy-link-detector.mts \"\$BASE_ID\" > /dev/null
    for CID in \$(jq -r \".[].cluster_id\" \"\$HOME/.claude/airtable-extract-cache/$RUN_ID/clusters-\$BASE_ID.json\"); do
      AIRTABLE_RUN_ID=\"$RUN_ID\" bun run scripts/seed-selection.mts \"\$BASE_ID\" \"\$CID\" > /dev/null
    done
    AIRTABLE_RUN_ID=\"$RUN_ID\" bun run scripts/rubric-author.mts \"\$BASE_ID\" > /dev/null
    bun run scripts/run.mts --run-id \"$RUN_ID\" --base \"\$BASE_ID\" -P 1 \\
      2>&1 | tee -a /tmp/airtable-extract-\"$RUN_ID\".log
  done
  sleep 10
'"

The pre-step chain feeds run.mts's internal dispatcher (added in #237/#268). Order:

classify-table-type → Stage 0 LLM table classifier (cached at ${SKILL_DIR}/cache/table-types/).
classify-clusters → link-graph clusters.
cluster-classifier → cluster-shapes (orphan|pair|chain|tree|hub-spoke|graph).
schema-analyser → natural keys + entity-table verdicts.
legacy-link-detector → cleanup-tasks.
seed-selection → per-cluster seed records. The inner jq loop reads clusters-<base-id>.json to enumerate cluster IDs; seed-selection.mts requires <base-id> <cluster-id> and exits non-zero if cluster-id is omitted.

Skipping any of these aborts run.mts with cache not found at ... from the dispatcher.

On the first invocation within a run-id, active-bases.mts writes both ~/.claude/airtable-extract-cache/<run-id>/bases.json (raw Meta API listing, via the wrapped list-bases.mts) and ~/.claude/airtable-extract-cache/<run-id>/active-bases.json (filter trace per base). On any subsequent invocation with the same run-id, the cached trace is read and the API probes are skipped.
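
To make the run-id and cache layout concrete, here is a small sketch. `runIdFor`, `runCachePaths`, and `needsApiProbe` are hypothetical helpers, not functions from active-bases.mts, and the rule "probe only when active-bases.json is missing" is an assumed reading of the caching behaviour described above.

```typescript
// Sketch only: mirrors the shell `date -u +%Y-%m-%d` prefix used for RUN_ID.
function runIdFor(now: Date): string {
  // toISOString() is always UTC, so the first 10 characters are YYYY-MM-DD.
  return `${now.toISOString().slice(0, 10)}-airtable-extract`;
}

// Hypothetical helper: `home` is passed in rather than read from the environment.
function runCachePaths(home: string, runId: string) {
  const runDir = `${home}/.claude/airtable-extract-cache/${runId}`;
  return {
    runDir,
    bases: `${runDir}/bases.json`, // raw Meta API listing
    activeBases: `${runDir}/active-bases.json`, // filter trace per base
  };
}

// Assumed cache rule: probe the API only when the filter trace is missing.
function needsApiProbe(
  existingFiles: Set<string>,
  paths: { activeBases: string },
): boolean {
  return !existingFiles.has(paths.activeBases);
}
```

Because the run-id is date-based, two invocations on the same UTC day share one cache directory, which is what lets the second invocation skip the API probes.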

For the first run on a new base, keep -P 1 so re-anchor's rung-0' shallow slice gates HITL approval before any parallel rung-1 sweep. Once the user has approved the re-anchor and the rubric, raise to -P 5 for rung-1 throughput.

rubric-author.mts defaults to an interactive readline grill — one question at a time with a concrete sample record, accepting k/d/rewrite/+/batch shortcuts. Prompts go to stderr so the > /dev/null redirect on stdout doesn't suppress the UI; the operator switches to the supaterm tab when it pauses on input. Pass --editor to fall back to the original $EDITOR-based flow for non-interactive contexts. Full shortcut table + plumbing notes: references/rubric-author-flow.md.

HITL — export-decision-and-exit (re-anchor + self-improve)

Both re-anchor.mts and self-improve.mts use the export-decision-and-exit pattern (issue #232). Workers do NOT block on stdin or poll a file mid-run. Instead:

Worker computes its decision context (slices, verdict, sample entities for re-anchor; raw + reasoning for self-improve).
Worker writes the context to a known JSON path.
Worker EXITS with code 42 (HITL_PENDING_EXIT_CODE).
run.mts propagates exit 42 → updates state.json phase=hitl-pending → exits 42 itself with a one-line operator pointer.
set -euo pipefail in the supaterm bash chain bails the for-loop.
Agent (Claude in chat) detects the *-decision-needed.json file, reads context, drives a grill conversation with concrete examples (per the rubric-author #228 pattern), and writes the decision JSON to the spec'd decision_path.
Operator re-runs the same supaterm chain; the for-loop walks confirmed bases (no-op fast-paths) until it reaches the paused base; the worker detects the decision file, consumes (and deletes) it, and proceeds.

Decision file paths and shapes:

Phase	Needed file (worker → agent)	Decision file (agent → worker)
`re-anchor`	`<runDir>/hitl/re-anchor-decision-needed.json`	`<runDir>/hitl/re-anchor-decision.json` — `{verdict: "approve"|"reject", notes, rewrites?}`
`self-improve` (per slice)	`<basePath>/corrections/correction-needed-<slice_id>.json`	`<basePath>/corrections/correction-decided-<slice_id>.json` — `{corrected, reason}`
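
The worker side of the export-decision-and-exit pattern might look like the sketch below for the re-anchor phase. It is not the code in re-anchor.mts: a `Map` stands in for the filesystem so the example runs anywhere, `reAnchorCheckpoint` is a hypothetical name, and the function returns the exit code instead of calling `process.exit`. The type of `rewrites` is not specified in the source and is left as `unknown`.

```typescript
// In-memory stand-in for the filesystem so the sketch runs anywhere.
type FileStore = Map<string, string>;

const HITL_PENDING_EXIT_CODE = 42;

interface ReAnchorDecision {
  verdict: "approve" | "reject";
  notes: string;
  rewrites?: unknown; // shape not specified in the source
}

type CheckpointResult =
  | { kind: "decided"; decision: ReAnchorDecision }
  | { kind: "pending"; exitCode: number };

function reAnchorCheckpoint(
  files: FileStore,
  runDir: string,
  context: Record<string, unknown>,
): CheckpointResult {
  const neededPath = `${runDir}/hitl/re-anchor-decision-needed.json`;
  const decisionPath = `${runDir}/hitl/re-anchor-decision.json`;

  const raw = files.get(decisionPath);
  if (raw !== undefined) {
    // Consume the decision and delete it so a later run cannot reuse it.
    files.delete(decisionPath);
    return { kind: "decided", decision: JSON.parse(raw) as ReAnchorDecision };
  }

  // No decision yet: export the context and tell the caller to exit 42.
  files.set(neededPath, JSON.stringify(context));
  return { kind: "pending", exitCode: HITL_PENDING_EXIT_CODE };
}
```

The worker never blocks: each invocation either finds a decision and proceeds, or writes its context and asks to exit, which is why re-running the same chain is enough to resume.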

Notes:

The needed-file payload includes a wake_token (uuid). It is informational; the decision file does not need to echo it.
For self-improve, selfImprove is atomic in two-pass mode: if any reject lacks a decision file, it writes needed-files for the missing ones only and throws HitlPendingError without consuming any decisions that are already present.
For non-interactive contexts (CI smoke runs, scripted tests) where the legacy stdin / file-poll behavior is required, pass --strict-stdin-hitl to re-anchor.mts or self-improve.mts.
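
The two-pass atomicity described in the notes for self-improve can be illustrated like this. It is a sketch under assumptions, not the actual `selfImprove`: `collectCorrections` is a hypothetical name, a `Map` stands in for the filesystem, the needed-file payload is reduced to a slice id, and `corrected` and `reason` are assumed to be strings.

```typescript
class HitlPendingError extends Error {
  missingSliceIds: string[];
  constructor(missingSliceIds: string[]) {
    super(`HITL pending for ${missingSliceIds.length} slice(s)`);
    this.name = "HitlPendingError";
    this.missingSliceIds = missingSliceIds;
  }
}

interface Correction {
  corrected: string; // assumed type
  reason: string; // assumed type
}

function collectCorrections(
  files: Map<string, string>, // in-memory stand-in for the filesystem
  basePath: string,
  rejectedSliceIds: string[],
): Map<string, Correction> {
  const neededPath = (id: string) =>
    `${basePath}/corrections/correction-needed-${id}.json`;
  const decidedPath = (id: string) =>
    `${basePath}/corrections/correction-decided-${id}.json`;

  // Pass 1: find rejects without a decision; consume nothing yet.
  const missing = rejectedSliceIds.filter((id) => !files.has(decidedPath(id)));
  if (missing.length > 0) {
    for (const id of missing) {
      files.set(neededPath(id), JSON.stringify({ slice_id: id }));
    }
    throw new HitlPendingError(missing);
  }

  // Pass 2: every decision is present, so consume them all.
  const corrections = new Map<string, Correction>();
  for (const id of rejectedSliceIds) {
    const path = decidedPath(id);
    corrections.set(id, JSON.parse(files.get(path)!) as Correction);
    files.delete(path);
  }
  return corrections;
}
```

Checking everything before consuming anything means a partially answered batch can be re-run without losing the decisions the operator has already written.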

Status & analysis

Both are read-only one-shots — safe to Bash from a session when the user asks "how's it going?".

# live snapshot (human-formatted)
bun run ${CLAUDE_PLUGIN_ROOT}/skills/airtable-knowledge-extract/scripts/status.mts <run-id>

# JSON snapshot for /improve / dashboards
bun run ${CLAUDE_PLUGIN_ROOT}/skills/airtable-knowledge-extract/scripts/status.mts <run-id> --json

# user-only — do NOT call from a session, it loops:
# tail -f /tmp/airtable-extract-<run-id>.log

Outputs

Path	Purpose
`~/.claude/airtable-extract-cache/<run-id>/state.json`	Orchestrator resume state (single source of truth for `--run-id` resume)
`~/.claude/airtable-extract-cache/<run-id>/cost-meter.jsonl`	Per-call token usage (guard input + cumulative reporter)
`~/.claude/airtable-extract-cache/<run-id>/bases.json`	Cached Airtable Meta API bases listing (raw, unfiltered)
`~/.claude/airtable-extract-cache/<run-id>/active-bases.json`	Filter trace per base — `{id, name, included, excluded_by, last_activity, table_count}`
`~/.claude/airtable-extract-cache/<run-id>/clusters-<base-id>.json`	Cluster classification cache (link-graph components)
`~/.claude/airtable-extract-cache/<run-id>/seed-records-<base-id>-<cluster-id>.json`	Seed record selection cache
`~/.claude/airtable-extract-cache/<run-id>/bases/<base-id>/out/<entity-type>/<slug>.md`	Extracted entity page — frontmatter + body with `[[wikilinks]]` to other extracted slugs
`~/.claude/airtable-extract-cache/<run-id>/bases/<base-id>/out/airtable-cleanup-tasks.md`	Detected legacy / broken Airtable cross-cluster links
`~/.claude/airtable-extract-cache/<run-id>/bases/<base-id>/examples.jsonl`	Self-improver per-base examples bank (raw / corrected / reason triples)
`~/.claude/airtable-extract-cache/<run-id>/metrics/base-<base-id>-rung-<n>.jsonl`	Per-base run-summary (batch outcomes, pass-rate, consec-fails)
`<vault>/daily/skill-outcomes/airtable-knowledge-extract.log`	Outcome log line per skill-spec § 11 (one row per `run.mts` invocation)

Entity pages stage in the cache, not the vault, until the user reviews via sp tab focus and merges into the live vault. This is intentional Phase 1 design — the vault is the import destination, not the live extraction target.
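
For readers consuming these files from their own tooling, the sketch below types the active-bases.json trace and builds an entity-page path. The field names come from the table above; the field types, and the helper names `includedBaseIds` and `entityPagePath`, are assumptions made for illustration.

```typescript
// Field names come from the Outputs table; the field types are assumptions.
interface ActiveBaseTrace {
  id: string;
  name: string;
  included: boolean;
  excluded_by: string | null; // assumed: the rule that excluded the base
  last_activity: string | null; // assumed: a timestamp string
  table_count: number;
}

/** Picks the IDs of included bases from a filter trace. */
function includedBaseIds(trace: ActiveBaseTrace[]): string[] {
  return trace.filter((base) => base.included).map((base) => base.id);
}

/** Hypothetical helper: cache path of one staged entity page. */
function entityPagePath(
  home: string,
  runId: string,
  baseId: string,
  entityType: string,
  slug: string,
): string {
  return `${home}/.claude/airtable-extract-cache/${runId}/bases/${baseId}/out/${entityType}/${slug}.md`;
}
```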

Contract

Model tiers pinned. Worker = claude -p --model sonnet; reviewer = claude -p --model opus. Drift caught by tests/extract-slice.test.ts and tests/review-slice.test.ts.
Reviewer gate ≥95% per 10-slice batch. Below threshold: the self-improver appends (raw, corrected, reason) to the per-base examples.jsonl, and the batch retries at parallelism Math.max(1, floor(P/2)). Two consecutive fail-after-retry outcomes → SIGTERM workers + osascript notify + exit kind=escalated-batch.
Mandatory re-anchor on every new base. Rung-0' shallow slice + HITL approve before rung-1 sweep. Rejection → exit kind=escalated-re-anchor, base re-tried on the next run-id (no auto-promotion).
Idempotent resume. Re-running the same --run-id picks up at the last persisted batch index; seed-selection.mts's deterministic cache is what makes this safe.
Pure TypeScript via bun (per feedback_scripts_ts_bun_only.md). Unit tests under tests/*.test.ts; integration smoke under tests/integration.test.ts.
Cost-meter is the source of truth for spend. Guard polls it every 30s and SIGTERMs workers on max-tokens / max-wallclock breach (calibrated from rung-0 observation, not user-set upfront).
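
The gate arithmetic in the contract can be worked through with a short sketch. The function names are hypothetical, and computing the pass-rate as passed slices divided by total slices is an assumption about how the reviewer scores a batch.

```typescript
const GATE_THRESHOLD = 0.95;

/** Assumed scoring: pass-rate = passed slices / total slices. */
function passesGate(passed: number, total: number): boolean {
  return total > 0 && passed / total >= GATE_THRESHOLD;
}

/** Retry parallelism from the contract: halve P, never below 1. */
function retryParallelism(p: number): number {
  return Math.max(1, Math.floor(p / 2));
}

type BatchOutcome = "pass" | "fail-after-retry";

/** True once the two most recent batch outcomes both failed after retry. */
function shouldEscalate(history: BatchOutcome[]): boolean {
  const n = history.length;
  return (
    n >= 2 &&
    history[n - 1] === "fail-after-retry" &&
    history[n - 2] === "fail-after-retry"
  );
}
```

Under this reading, a 10-slice batch with one rejected slice scores 90% and fails the gate, so every slice in the batch has to pass. With `-P 5` the retry runs at 2 workers; with `-P 1` it stays at 1.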

When NOT to use

Two-way sync into Airtable — read-only, Phase 1.
Cron-scheduled live re-import — one-shot per invocation in Phase 1.
Bases mid-schema-rewrite — re-anchor will reject; settle the schema first.
Datasets so small a manual copy is faster — heuristic: <50 records across <3 tables, just copy by hand.

Outcome log

Follow {vault}/skill-spec.md § 11. Two action types appear in this log — extract (main extraction loop) and dispatch (schema-driven dispatcher, scripts/dispatcher.mts, per #237).

extract action — appended by scripts/run.mts CLI main() via scripts/outcome-log.mts:

{date} | airtable-knowledge-extract | extract | ~/work/brain-os-plugin | <metrics-or-cache-path> | commit:none | {pass|partial|fail} | run_id=… base_id=… batches_done=… total_batches=… pass_rate=…

pass — kind=paused (base completed) or kind=no-bases (queue drained).
partial — kind=hitl-pending: the HITL re-anchor gate exited with code 42 and awaits an operator decision. Read <runDir>/hitl/re-anchor-decision-needed.json, drive the grill conversation using the provided context, write <runDir>/hitl/re-anchor-decision.json ({verdict: "approve"|"reject", notes}), then re-run the same supaterm chain to resume.
fail — kind=escalated-re-anchor or kind=escalated-batch. For escalated-re-anchor, the operator rejected the re-anchor shallow slice: review what the slice lacked via the re-anchor-decision-needed.json context, adjust the rubric via rubric-author.mts or fix the base schema, then retry with a new run-id (no auto-promotion). For escalated-batch, the batch pass-rate stayed below the 95% threshold after retry: check examples.jsonl for the failing slices.
commit:none — extracted pages stage in cache, not in a tracked repo, so there is no anchor commit hash for diff detection.
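
A rough sketch of how an extract line could be assembled from the fields above. This is not the code in scripts/outcome-log.mts: `resultFor` and `formatExtractLine` are hypothetical names, the date is passed in as an already formatted string, and rendering pass_rate with two decimals is an assumption.

```typescript
type ExitKind =
  | "paused"
  | "no-bases"
  | "hitl-pending"
  | "escalated-re-anchor"
  | "escalated-batch";
type Result = "pass" | "partial" | "fail";

/** Maps the orchestrator exit kind to the result column, per the list above. */
function resultFor(kind: ExitKind): Result {
  switch (kind) {
    case "paused":
    case "no-bases":
      return "pass";
    case "hitl-pending":
      return "partial";
    case "escalated-re-anchor":
    case "escalated-batch":
      return "fail";
  }
}

interface ExtractOutcome {
  date: string; // already formatted; the exact date format is not specified here
  artifactPath: string; // metrics or cache path
  kind: ExitKind;
  runId: string;
  baseId: string;
  batchesDone: number;
  totalBatches: number;
  passRate: number;
}

function formatExtractLine(o: ExtractOutcome): string {
  const detail = [
    `run_id=${o.runId}`,
    `base_id=${o.baseId}`,
    `batches_done=${o.batchesDone}`,
    `total_batches=${o.totalBatches}`,
    `pass_rate=${o.passRate.toFixed(2)}`, // two decimals is an assumption
  ].join(" ");
  return [
    o.date,
    "airtable-knowledge-extract",
    "extract",
    "~/work/brain-os-plugin",
    o.artifactPath,
    "commit:none",
    resultFor(o.kind),
    detail,
  ].join(" | ");
}
```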

dispatch action — appended by scripts/dispatcher.mts after computing the schema-driven dispatch plan for a base component:

{date} | airtable-knowledge-extract | dispatch | ~/work/brain-os-plugin | ~/.claude/airtable-extract-cache/<run-id>/dispatch-<base-id>.json | commit:none | {pass|fail} | run_id=… base_id=… component_id=… shape=… records=… est_tokens=… strategy=… interaction_handler=…

pass — dispatcher resolved the component shape and wrote the dispatch plan JSON.
fail — dispatcher could not resolve the component (unrecognized shape, no extractable tables, or NK-scan failure).
Optional fields: component_id (graph component id), shape (orphan|pair|chain|tree|hub-spoke|graph), records (record count in component), est_tokens (estimated token budget), strategy (edge-emit|single-pass|chunk-by-AP|record-by-record), interaction_handler (dispatch execution mode).

If result != pass, auto-invoke /brain-os:improve airtable-knowledge-extract only for unexpected failures (script errors, API failures). HITL escalations (partial with kind=hitl-pending, fail with kind=escalated-re-anchor) are by-design exits that need operator action, not SKILL.md changes.
