| name | framedex |
| description | Build a portable knowledge base of your video (and eventually photo) archive across multiple SSDs. For each clip: GPS + reverse-geocoded place, speaker-diarized multi-lingual transcript with English translation, face detection + embeddings for later named-person queries, Claude/Gemma structured assessment (keep/review/cull rating + technical quality + lighting + time of day + dominant colors + audio quality + people count + keywords + notable timestamps), and prose scene description. Writes plain-text sidecars next to originals + persistent face DB. Non-destructive, idempotent, resumable. Use whenever you want to: index videos, tag footage, organize a drive, build the video knowledge base, transcribe audio, describe clips, rate clips, find clips by location/lighting/person/keyword, generate folder summaries, identify duplicates or cull pile. Trigger phrases: 'index this drive', 'tag my videos', 'what's on this SSD', 'rate these clips', 'find me clips of X', 'what should I cull', 'build the video knowledge base'. |
framedex: Video Archive Knowledge Base
Cross-project, cross-drive. An entire video archive turned into a portable plain-text knowledge base + queryable face DB.
Per-clip pipeline
- ffprobe → metadata
- exiftool → GPS lat/lon/altitude (iPhone, DJI, drone all supported)
- Nominatim → reverse-geocoded place name (rate-limited 1/sec, free, no key)
- ffmpeg → 5 representative JPEG frames @ 1920px max
- ffmpeg → audio extraction → WhisperX transcribe + diarization + alignment
- WhisperX translate-mode → English translation for non-English clips
- insightface (RetinaFace + ArcFace) → face detection + 512-dim embeddings on the same frames
- Vision model (Claude Haiku/Sonnet via Max CLI / API, OR local Gemma via LM Studio) → structured YAML + prose description in one call
- Write [filename].description.md sidecar + insert face rows into ~/.framedex/faces.db
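The frame-sampling step in the pipeline above can be sketched as follows. This is a minimal illustration, not framedex's actual code: the helper names `frame_timestamps` and `ffmpeg_frame_cmd` are hypothetical, sampling at segment midpoints is an assumption (the source only says 5 representative frames), and the scale filter caps the width at 1920px as one possible reading of "1920px max".

```python
from pathlib import Path


def frame_timestamps(duration: float, n: int = 5) -> list[float]:
    """Return n evenly spaced timestamps, one at the midpoint of each segment."""
    if duration <= 0 or n <= 0:
        return []
    return [(i + 0.5) * duration / n for i in range(n)]


def ffmpeg_frame_cmd(video: Path, t: float, out: Path) -> list[str]:
    """Build an ffmpeg command that grabs one JPEG frame at time t (seconds)."""
    return [
        "ffmpeg", "-y",
        "-ss", f"{t:.3f}",                 # seek on the input side (fast)
        "-i", str(video),
        "-frames:v", "1",                  # write exactly one frame
        "-vf", "scale='min(1920,iw)':-2",  # cap width at 1920px, keep aspect ratio
        str(out),
    ]
```

Each command list can be handed to `subprocess.run(cmd, check=True)`. Midpoint sampling avoids the first and last frames of a clip, which are often black or mid-transition.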
Output schema
Each sidecar's YAML frontmatter:
file: IMG_4827.mov
path: /Volumes/SSD-2024/...
parent_folder: drone
duration_seconds: 12.3
resolution: 3840x2160
codec: hvc1
size_bytes: 245678912
creation_time: 2024-08-14T07:23:11Z
location:
  lat: 37.7456
  lon: -119.5936
  altitude_m: 1842.5
  place: "Yosemite Valley, Mariposa County, USA"
language_detected: es
speaker_count: 2
rating: keep
cull_reason: ""
technical:
  focus: sharp
  exposure: strong
  stability: smooth
  motion_blur: clean
lighting: golden_hour
time_of_day: golden_hour
dominant_color_palette: "warm dusk: amber, ochre, dusty olive"
dominant_colors: [amber, ochre, olive, sky-blue]
audio_quality: clean_speech
people_count: 3
keywords: [drone, landscape, construction, golden-hour, wide-shot, speech, workers]
notable_timestamp: ""
faces:
  - cluster_id: tmp_a3f78c
    frame_time: 1.2
    bbox: [120, 80, 180, 240]
    detection_quality: high
face_count: 2
indexed_at: 2026-05-17T14:32:01
Body follows: ## Description (Scene/Subjects/Action/Mood/Shot type/Use cases prose), ## Transcript (with speaker labels if diarized), ## English translation (if applicable).
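A consumer of these sidecars first has to separate the frontmatter from the Markdown body. The sketch below assumes the frontmatter is fenced by `---` lines, which is the usual convention but is not confirmed by this document; `split_sidecar` is a hypothetical helper name and the sample text is made up for illustration.

```python
def split_sidecar(text: str) -> tuple[str, str]:
    """Split sidecar text into (frontmatter_yaml, markdown_body).

    Assumes the frontmatter is fenced by '---' lines.
    Returns ('', text) when no complete frontmatter block is found.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return "", text
    for i in range(1, len(lines)):
        if lines[i].strip() == "---":
            front = "\n".join(lines[1:i])
            body = "\n".join(lines[i + 1:]).lstrip("\n")
            return front, body
    return "", text  # opening fence without a closing one


# Made-up sample sidecar, trimmed to two frontmatter keys.
sample = """---
file: IMG_4827.mov
rating: keep
---

## Description
Drone pass over the valley.
"""
front, body = split_sidecar(sample)
```

The `front` string can then be passed to a YAML parser (for example PyYAML's `yaml.safe_load`) to get a dict of the fields shown above.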
Three vision backends
| Backend | Model | Speed | Cost | Privacy |
|---|---|---|---|---|
| cli (default) | Claude Haiku/Sonnet via Max | ~10-30s per clip | $0 (Max subscription) | Cloud (frames sent to Anthropic) |
| api | Claude Haiku/Sonnet via API | ~2-3s per clip | ~$0.002 (Haiku) / ~$0.008 (Sonnet) per clip | Cloud (frames sent to Anthropic) |
| local | Local model via LM Studio (Gemma 4, Qwen2-VL, etc.) | ~3-90s depending on model | $0 | Fully local |
--vision-model haiku|sonnet picks the Claude model for the cli and api backends. --local-model NAME picks the LM Studio model. The script strips ANTHROPIC_API_KEY from the claude -p subprocess environment, so CLI mode authenticates through Max OAuth even if an API key is set globally.
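The environment stripping described above can be sketched like this. It is an illustration under assumptions, not the script's actual code: `env_without_api_key` is a hypothetical helper name.

```python
import os
from typing import Mapping, Optional


def env_without_api_key(base: Optional[Mapping[str, str]] = None) -> dict:
    """Return a copy of the environment with ANTHROPIC_API_KEY removed.

    The caller's mapping (or os.environ) is left untouched.
    """
    env = dict(os.environ if base is None else base)
    env.pop("ANTHROPIC_API_KEY", None)
    return env


# Intended use (not executed here):
#   subprocess.run(["claude", "-p", prompt], env=env_without_api_key(), ...)
```

Copying the environment instead of mutating `os.environ` keeps the key available to the api backend in the same process.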
Face detection
On by default. ~/.framedex/faces.db is the single face database shared across all drives. Per-clip embeddings are stored as 512-dim float32 vectors, along with the bbox and detection score. Temporary cluster IDs (tmp_<hash>) will be replaced with real names by the fdx-faces clustering tool, which is not built yet. That follow-up will not require re-running the indexing pass, because all embeddings are captured here.
Skip with --no-faces if you don't want face data.
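To make the storage format concrete, here is a minimal sketch of writing one face row to SQLite with the embedding packed as float32 bytes. The table name, column names, and helper functions are all hypothetical; the actual faces.db schema is not documented here.

```python
import sqlite3
from array import array

EMBED_DIM = 512  # embedding size stated in the pipeline description


def open_face_db(path: str) -> sqlite3.Connection:
    """Open (or create) a face DB using a hypothetical single-table schema."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS faces (
               id INTEGER PRIMARY KEY,
               clip_path TEXT NOT NULL,
               frame_time REAL NOT NULL,
               bbox TEXT NOT NULL,       -- four comma-separated ints
               score REAL NOT NULL,      -- detection score
               cluster_id TEXT,          -- e.g. tmp_<hash> until labeled
               embedding BLOB NOT NULL   -- 512 float32 values = 2048 bytes
           )"""
    )
    return conn


def insert_face(conn, clip_path, frame_time, bbox, score, cluster_id, embedding):
    """Store one detected face; embedding is a sequence of 512 floats."""
    if len(embedding) != EMBED_DIM:
        raise ValueError(f"expected {EMBED_DIM} values, got {len(embedding)}")
    blob = array("f", embedding).tobytes()  # pack as 4-byte floats
    conn.execute(
        "INSERT INTO faces (clip_path, frame_time, bbox, score, cluster_id, embedding)"
        " VALUES (?, ?, ?, ?, ?, ?)",
        (clip_path, frame_time, ",".join(map(str, bbox)), score, cluster_id, blob),
    )
    conn.commit()


def load_embedding(blob: bytes) -> list:
    """Unpack a stored BLOB back into a list of floats."""
    vec = array("f")
    vec.frombytes(blob)
    return vec.tolist()
```

Storing raw embeddings rather than only cluster labels is what lets a later clustering pass run without touching the video files again.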
Companion scripts / aliases
| Alias | Script | Purpose |
|---|---|---|
| fdx | framedex.index_videos | Main indexer (this skill) |
| fdx-summary | framedex.trip_summary | Recursive folder summaries (_folder-summary.md in each ≥5-clip folder) |
| fdx-master | framedex.master_index | Drive-level _INDEX.md + _INDEX.json |
| fdx-query | framedex.query | Filter sidecars by metadata (rating, lighting, person, keyword, etc.) |
Set up once
cd ~/.claude/skills/framedex
uv pip install -e .
python3 scripts/setup.py
export HF_TOKEN=hf_...
Common run patterns
fdx /Volumes/SSD-2024 --max-files 5
fdx /Volumes/SSD-2024
fdx /Volumes/SSD-2024 --vision-model sonnet
fdx /Volumes/SSD-2024 --backend local
fdx /Volumes/SSD-2024 --max-duration 30
fdx /Volumes/SSD-2024 --force --vision-model sonnet
fdx-summary /Volumes/SSD-2024
fdx-master /Volumes/SSD-2024
fdx-query /Volumes/SSD-2024 --rating keep --time-of-day golden_hour
fdx-query /Volumes/SSD-2024 --rating cull
fdx-query /Volumes/SSD-2024 --place-contains California --language es
fdx-query /Volumes/SSD-2024 --keyword drone --keyword landscape
fdx-query /Volumes/SSD-2024 --stability smooth --people-count 0
fdx-query /Volumes/SSD-2024 --rating keep --json | jq '.[] | .path'
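The fdx-query filters above amount to matching frontmatter fields. The sketch below shows one way such a filter could work on already-parsed sidecar metadata. It is not the tool's implementation: `matches` is a hypothetical function, and treating repeated --keyword flags as "all must be present" is an assumption.

```python
def matches(meta: dict, rating=None, time_of_day=None, keywords=()) -> bool:
    """Return True if one sidecar's metadata satisfies every given filter."""
    if rating is not None and meta.get("rating") != rating:
        return False
    if time_of_day is not None and meta.get("time_of_day") != time_of_day:
        return False
    have = set(meta.get("keywords") or [])
    # Assumption: every requested keyword must be present (AND semantics).
    return all(k in have for k in keywords)


# Made-up metadata for two clips.
clips = [
    {"path": "/Volumes/SSD-2024/a.mov", "rating": "keep",
     "time_of_day": "golden_hour", "keywords": ["drone", "landscape"]},
    {"path": "/Volumes/SSD-2024/b.mov", "rating": "cull",
     "time_of_day": "midday", "keywords": ["drone"]},
]
keepers = [c["path"] for c in clips if matches(c, rating="keep", keywords=["drone", "landscape"])]
```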
Optional folder context
Drop a .video-context.md file at the root of any scan target, with a paragraph describing what's on that drive ("construction site, 2023-2026", "family travel, 2024", etc.). It is prepended to the vision prompt so descriptions are context-aware.
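A minimal sketch of the prepend step, assuming the context file is read once per scan root. `build_prompt` and the "Context for this drive:" wording are hypothetical; only the .video-context.md filename comes from this document.

```python
import tempfile
from pathlib import Path

CONTEXT_FILENAME = ".video-context.md"


def build_prompt(scan_root: Path, base_prompt: str) -> str:
    """Prepend the drive's context paragraph to the prompt, if one exists."""
    ctx_file = scan_root / CONTEXT_FILENAME
    if not ctx_file.is_file():
        return base_prompt
    context = ctx_file.read_text(encoding="utf-8").strip()
    if not context:
        return base_prompt
    return f"Context for this drive:\n{context}\n\n{base_prompt}"


# Demonstration against a throwaway directory.
with tempfile.TemporaryDirectory() as d:
    root = Path(d)
    (root / CONTEXT_FILENAME).write_text("family travel, 2024\n", encoding="utf-8")
    with_context = build_prompt(root, "Describe these frames.")
    without_context = build_prompt(root / "missing", "Describe these frames.")
```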
Privacy
| Component | Local or cloud? |
|---|---|
| ffmpeg / exiftool / Whisper / pyannote / insightface | Local |
| Nominatim reverse geocoding | Sends lat/lon (not video). Skip with --no-geocode. |
| Vision (--backend cli/api) | Frames sent to Anthropic. By default not used for training. |
| Vision (--backend local) | Fully local, fully offline. |
| Face DB (~/.framedex/faces.db) | Local only, never uploaded. Back up the file manually if you care. |
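To illustrate the Nominatim row, the sketch below builds a reverse-geocoding request that carries only coordinates and spaces calls at least one second apart, matching the 1/sec limit noted in the pipeline. The `reverse_url` and `RateLimiter` names are hypothetical, and the injectable clock and sleep parameters exist only to make the limiter easy to test.

```python
import time
from urllib.parse import urlencode

NOMINATIM_REVERSE = "https://nominatim.openstreetmap.org/reverse"


def reverse_url(lat: float, lon: float) -> str:
    """Build a reverse-geocoding URL; only the coordinates are sent."""
    query = urlencode({"lat": lat, "lon": lon, "format": "jsonv2"})
    return f"{NOMINATIM_REVERSE}?{query}"


class RateLimiter:
    """Block so that successive wait() calls are at least min_interval apart."""

    def __init__(self, min_interval=1.0, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous call, if any

    def wait(self):
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now
```

Call `limiter.wait()` before each request. The public Nominatim service also expects an identifying User-Agent header, so a real client should set one.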
Multiple SSDs
Run on each drive separately. Sidecars travel with the data; the face DB is centralized at ~/.framedex/faces.db so cross-drive person queries work.
Known limitations (v1)
- Frame sampling is evenly spaced, not scene-detected (future: ffmpeg select=gt(scene,0.4))
- pyannote diarization degrades under heavy ambient noise (wind, music, crowd)
- WhisperX runs on CPU on Apple Silicon (CTranslate2 doesn't have M-series GPU acceleration yet; 64GB of RAM is still plenty for CPU inference)
- fdx-faces (clustering + labeling tool) is not built yet: face embeddings are captured, but cluster IDs are temporary hashes until that tool ships
- RAW image formats are not supported yet (videos only; photos are coming)
File layout
~/.claude/skills/framedex/
├── SKILL.md                     # this file
├── README.md
├── pyproject.toml               # deps, ruff/mypy config, entry points
├── .pre-commit-config.yaml      # pre-commit hooks
├── .github/workflows/ci.yml     # CI (ruff + mypy)
├── scripts/
│   └── setup.py                 # system binaries + model pre-download
└── src/framedex/
    ├── __init__.py              # package init, version from pyproject.toml
    ├── index_videos.py          # main worker (fdx)
    ├── face_db.py               # face detection + SQLite face DB module
    ├── trip_summary.py          # recursive folder summaries (fdx-summary)
    ├── master_index.py          # drive-level KB (fdx-master)
    └── query.py                 # filter sidecars (fdx-query)