kb-vacuum
Scan vault raw/ folder for hyperlinks embedded in markdown notes; POST each unseen URL to LuminaVault server /v1/capture/safari so Hermes ingests + memorizes. Usage: /kb-vacuum [folder=raw/]
| name | kb-vacuum |
| description | Scan vault raw/ folder for hyperlinks embedded in markdown notes; POST each unseen URL to LuminaVault server /v1/capture/safari so Hermes ingests + memorizes. Usage: /kb-vacuum [folder=raw/] |
Walk the local vault's raw/ folder, extract every HTTP(S) hyperlink found in markdown notes, and POST each unseen one to LuminaVault server /v1/capture/safari so the server-side enrichment pipeline (YouTube / X / GenericOG / Jina tier-2) runs and the result lands in vault_files for Hermes memory compilation.
Companion to /kb-ingest: whereas /kb-ingest takes ONE URL/PDF/text and stages it, /kb-vacuum is the batch tool that vacuums up every URL the user has accumulated in their notes since the last run.
cat ~/.claude/kb-config.json
Required keys:
- vault_path: absolute path to the local vault root (e.g. /opt/data/obsidian-vault/FACorreia)
- server_base_url: LuminaVault server origin (e.g. https://your-tenant.luminavault.app)
- auth_token: JWT for the authenticated user (Bearer token used against /v1/capture/safari)

If any are missing, stop and tell the user which key needs to be set, then exit.
Set VAULT_PATH, SERVER_BASE_URL, AUTH_TOKEN from the config for subsequent steps.
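The required-key check above could be sketched in Python roughly as follows. The helper names `missing_keys` and `load_config` are hypothetical (they are not part of the skill), and treating an empty string as "missing" is an assumption.

```python
import json
from pathlib import Path

# The three keys the skill requires in ~/.claude/kb-config.json.
REQUIRED_KEYS = ("vault_path", "server_base_url", "auth_token")


def missing_keys(config: dict) -> list[str]:
    """Return required keys that are absent or empty, in declaration order."""
    return [key for key in REQUIRED_KEYS if not config.get(key)]


def load_config(path: Path) -> dict:
    """Read the JSON config; exit with a message naming any missing key."""
    config = json.loads(path.read_text(encoding="utf-8"))
    missing = missing_keys(config)
    if missing:
        raise SystemExit(f"missing config key(s): {', '.join(missing)}")
    return config
```

Calling `load_config(Path.home() / ".claude" / "kb-config.json")` would then either return the config or stop with the names of the keys that still need to be set.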
The argument after /kb-vacuum is the folder to scan, relative to the vault root.
- (no argument) → scan raw/ (the default)
- raw/AI → scan only that subdirectory
- notes/ → scan that folder instead

Compute SCAN_ROOT="$VAULT_PATH/<arg-or-raw>". Verify it exists:
test -d "$SCAN_ROOT" || { echo "scan root not found: $SCAN_ROOT"; exit 1; }
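For readers who prefer Python over shell, the same resolution can be sketched like this. `resolve_scan_root` is a hypothetical helper name, and it assumes the folder is passed as a plain positional argument.

```python
from pathlib import Path


def resolve_scan_root(vault_path: str, folder_arg: str = "") -> Path:
    """Join the optional folder argument onto the vault root.

    An empty or whitespace-only argument falls back to the default raw/.
    """
    folder = folder_arg.strip() or "raw/"
    return Path(vault_path) / folder
```

The caller would still check `resolve_scan_root(...).is_dir()` and stop with a "scan root not found" message when it is false, mirroring the `test -d` line above.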
The manifest lives at $VAULT_PATH/.kb/vacuum_seen.json and stores the set of URLs already POSTed in prior runs. Format:
{
  "seen": [
    "https://example.com/article-a",
    "https://www.youtube.com/watch?v=..."
  ]
}
Create the file if it doesn't exist (with {"seen": []}). Load into memory as a Set<String> for O(1) dedup.
mkdir -p "$VAULT_PATH/.kb"
test -f "$VAULT_PATH/.kb/vacuum_seen.json" || echo '{"seen": []}' > "$VAULT_PATH/.kb/vacuum_seen.json"
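Loading the manifest into a set, creating it on first run, might look like the sketch below. `load_seen` is a hypothetical name; the file location and the `{"seen": []}` shape are taken from the description above.

```python
import json
from pathlib import Path


def load_seen(vault_path: Path) -> set[str]:
    """Return the set of already-POSTed URLs, creating the manifest if needed."""
    manifest = vault_path / ".kb" / "vacuum_seen.json"
    manifest.parent.mkdir(parents=True, exist_ok=True)
    if not manifest.exists():
        # First run: start with an empty manifest.
        manifest.write_text('{"seen": []}', encoding="utf-8")
    return set(json.loads(manifest.read_text(encoding="utf-8"))["seen"])
```

A set gives constant-time membership checks, so filtering thousands of discovered URLs against prior runs stays cheap.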
Scan *.md under SCAN_ROOT. Walk recursively. For each markdown file, extract all HTTP(S) URLs. Handle both bare URLs and [label](url) Markdown link wrappers, extracting just the url part. Strip trailing punctuation (.,;:!?'"). Dedupe within the run.
Recommended approach: a small Python or jq+sed pipeline. Example Python:
import json, os, pathlib, re

vault = pathlib.Path(os.environ["VAULT_PATH"])
scan_root = pathlib.Path(os.environ["SCAN_ROOT"])
seen_path = vault / ".kb" / "vacuum_seen.json"
seen = set(json.loads(seen_path.read_text())["seen"])

# Match http(s) URLs both bare and inside Markdown link wrappers.
url_re = re.compile(r"https?://[^\s)>\]\"']+", re.IGNORECASE)
trailing_punct = ".,;:!?\"'"

discovered = []         # URLs in first-seen order
discovered_set = set()  # same URLs, for O(1) in-run dedup
for md in scan_root.rglob("*.md"):
    try:
        text = md.read_text(encoding="utf-8", errors="ignore")
    except OSError:
        continue  # unreadable file: skip it rather than abort the scan
    for raw in url_re.findall(text):
        url = raw.rstrip(trailing_punct)
        if not url or url in discovered_set:
            continue
        discovered_set.add(url)
        discovered.append(url)

new_urls = [u for u in discovered if u not in seen]
print(f"DISCOVERED={len(discovered)}")
print(f"NEW={len(new_urls)}")

# Write the URL list to a temp file the bash steps can iterate.
# End with a newline so a shell `read` loop does not drop the last URL.
out = vault / ".kb" / "vacuum_pending.txt"
out.write_text("".join(u + "\n" for u in new_urls))
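To see what the pattern does and does not capture, here is a small self-contained sketch that runs the same regex over an in-memory string instead of the vault. `extract_urls` and the sample text are illustrative only.

```python
import re

# Same pattern and punctuation set as the scan script above.
URL_RE = re.compile(r"https?://[^\s)>\]\"']+", re.IGNORECASE)
TRAILING_PUNCT = ".,;:!?\"'"


def extract_urls(text: str) -> list[str]:
    """Return unique URLs in first-seen order, trailing punctuation stripped."""
    found: list[str] = []
    for raw in URL_RE.findall(text):
        url = raw.rstrip(TRAILING_PUNCT)
        if url and url not in found:
            found.append(url)
    return found


sample = (
    "See [docs](https://example.com/a), then https://example.com/b. "
    "Again: https://example.com/a"
)
print(extract_urls(sample))
# ['https://example.com/a', 'https://example.com/b']
```

Because the character class excludes `)`, a URL that itself contains a closing parenthesis is cut off at that point. That is a trade-off of this pattern: it keeps Markdown link wrappers from leaking into the captured URL.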
For each URL in .kb/vacuum_pending.txt, POST to /v1/capture/safari:
# `|| [ -n "$url" ]` keeps the last line even if the file lacks a trailing newline.
while IFS= read -r url || [ -n "$url" ]; do
  [ -z "$url" ] && continue
  CAPTURED_AT=$(date -u +"%Y-%m-%dT%H:%M:%SZ")
  STATUS=$(curl -s -o /dev/null -w "%{http_code}" -X POST \
    "$SERVER_BASE_URL/v1/capture/safari" \
    -H "Authorization: Bearer $AUTH_TOKEN" \
    -H "Content-Type: application/json" \
    -d "$(jq -nc --arg u "$url" --arg t "$CAPTURED_AT" '{url:$u, source:"kb-vacuum", capturedAt:$t}')")
  case "$STATUS" in
    2*)
      echo "POSTED $url"
      echo "$url" >> "$VAULT_PATH/.kb/vacuum_posted.txt"
      ;;
    4*)
      echo "CLIENT_ERROR ($STATUS) $url, skipping"
      ;;
    5*|000)
      echo "TRANSIENT ($STATUS) $url, retry once"
      sleep 1
      RETRY_STATUS=$(curl -s -o /dev/null -w "%{http_code}" -X POST \
        "$SERVER_BASE_URL/v1/capture/safari" \
        -H "Authorization: Bearer $AUTH_TOKEN" \
        -H "Content-Type: application/json" \
        -d "$(jq -nc --arg u "$url" --arg t "$CAPTURED_AT" '{url:$u, source:"kb-vacuum", capturedAt:$t}')")
      if [[ "$RETRY_STATUS" =~ ^2 ]]; then
        echo "POSTED (retry) $url"
        echo "$url" >> "$VAULT_PATH/.kb/vacuum_posted.txt"
      else
        echo "FAILED ($RETRY_STATUS) $url, will retry next run"
      fi
      ;;
    *)
      # Any other status (e.g. an unexpected redirect) is reported, not dropped silently.
      echo "UNEXPECTED ($STATUS) $url, will retry next run"
      ;;
  esac
done < "$VAULT_PATH/.kb/vacuum_pending.txt"
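The request body and the status handling in the loop above can be summarized in a short Python sketch. `build_payload` and `classify` are hypothetical helpers for illustration, and the "unexpected" bucket is an assumption about how statuses outside 2xx/4xx/5xx/000 should be treated.

```python
import json
from datetime import datetime, timezone


def build_payload(url: str, now: datetime) -> str:
    """Build the JSON body that the jq expression in the loop produces.

    `now` is expected to be timezone-aware; it is rendered in UTC.
    """
    captured_at = now.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    return json.dumps({"url": url, "source": "kb-vacuum", "capturedAt": captured_at})


def classify(status: str) -> str:
    """Map a curl %{http_code} string onto the loop's branches."""
    if status.startswith("2"):
        return "posted"        # record the URL as seen
    if status.startswith("4"):
        return "client_error"  # skip, no retry
    if status.startswith("5") or status == "000":
        return "transient"     # retry once; 000 means curl got no HTTP response
    return "unexpected"
```

Only the "posted" outcome adds a URL to vacuum_posted.txt, so anything else is picked up again on the next run.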
Merge every successfully-POSTed URL from this run into the manifest:
import json, pathlib, os

vault = pathlib.Path(os.environ["VAULT_PATH"])
seen_path = vault / ".kb" / "vacuum_seen.json"
posted_path = vault / ".kb" / "vacuum_posted.txt"

seen = set(json.loads(seen_path.read_text())["seen"])
if posted_path.exists():
    for line in posted_path.read_text().splitlines():
        url = line.strip()
        if url:
            seen.add(url)

seen_path.write_text(json.dumps({"seen": sorted(seen)}, indent=2))
posted_path.unlink(missing_ok=True)
(vault / ".kb" / "vacuum_pending.txt").unlink(missing_ok=True)
<DISCOVERED> urls discovered, <NEW> new, <POSTED> posted, <FAILED> failed
Example:
27 urls discovered, 14 new, 13 posted, 1 failed
If failed > 0, list the failed URLs so the user knows what to investigate (auth, server reachability, etc).
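One way to render that report is sketched below. `format_summary` is a hypothetical helper, and how the failed URLs are collected during the POST loop is left to the caller (an assumption; the steps above only echo them).

```python
def format_summary(discovered: int, new: int, posted: int, failed_urls: list[str]) -> str:
    """Render the one-line summary, followed by one line per failed URL."""
    lines = [
        f"{discovered} urls discovered, {new} new, "
        f"{posted} posted, {len(failed_urls)} failed"
    ]
    lines.extend(f"  FAILED {url}" for url in failed_urls)
    return "\n".join(lines)
```

With no failures the result is just the single summary line, so the output stays short in the common case.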
- The local manifest (vacuum_seen.json) is primary. The server also dedupes via the VaultFile (tenant_id, source_url) unique constraint, so accidentally re-POSTing the same URL is idempotent.
- To re-POST everything, delete .kb/vacuum_seen.json and re-run. Server-side dedup keeps you safe from duplicate VaultFile rows.
- On auth failures, update auth_token in ~/.claude/kb-config.json and re-run.
- [...] /v1/capture/bulk endpoint.