| name | photo-content-recognition-curation-expert |
| description | Expert in photo content recognition, intelligent curation, and quality filtering. Specializes in face/animal/place recognition, perceptual hashing for de-duplication, screenshot/meme detection, burst photo selection, and quick indexing strategies. Activate on "face recognition", "face clustering", "perceptual hash", "near-duplicate", "burst photo", "screenshot detection", "photo curation", "photo indexing", "NSFW detection", "pet recognition", "DINOHash", "HDBSCAN faces". NOT for GPS-based location clustering (use event-detection-temporal-intelligence-expert), color palette extraction (use color-theory-palette-harmony-expert), semantic image-text matching (use clip-aware-embeddings), or video analysis/frame extraction. |
| allowed-tools | ["Read","Write","Edit","Bash","Grep","Glob","mcp__firecrawl__firecrawl_search","WebFetch"] |
| integrates_with | ["event-detection-temporal-intelligence-expert","color-theory-palette-harmony-expert","collage-layout-expert","clip-aware-embeddings"] |
Expert in photo content analysis and intelligent curation. Combines classical computer vision with modern deep learning for comprehensive photo analysis.
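✅ Use for:
- Face, animal, and place recognition and clustering
- Perceptual hashing for near-duplicate detection
- Screenshot/meme and NSFW detection
- Burst photo selection
- Quick photo indexing
❌ NOT for:
- GPS-based location clustering (use event-detection-temporal-intelligence-expert)
- Color palette extraction (use color-theory-palette-harmony-expert)
- Semantic image-text matching (use clip-aware-embeddings)
- Video analysis or frame extraction
What do you need to recognize/filter?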
│
├─ Duplicate photos? ─────────────────── Perceptual Hashing
│  ├─ Exact duplicates? ──────────────── dHash (fastest)
│  ├─ Brightness/contrast changes? ───── pHash (DCT-based)
│  ├─ Heavy crops/compression? ───────── DINOHash (2025 SOTA)
│  └─ Production system? ─────────────── Hybrid (pHash → DINOHash)
│
├─ People in photos? ─────────────────── Face Clustering
│  ├─ Known thresholds? ──────────────── Apple-style Agglomerative
│  └─ Unknown data distribution? ─────── HDBSCAN
│
├─ Pets/Animals? ─────────────────────── Pet Recognition
│  ├─ Detection? ─────────────────────── YOLOv8
│  └─ Individual clustering? ─────────── CLIP + HDBSCAN
│
├─ Best from burst? ──────────────────── Burst Selection
│  └─ Score: sharpness + face quality + aesthetics
│
└─ Filter junk? ──────────────────────── Content Detection
   ├─ Screenshots? ───────────────────── Multi-signal classifier
   └─ NSFW? ──────────────────────────── Safety classifier
Problem: Camera bursts, re-saved images, and minor edits create near-duplicates.
Solution: Perceptual hashes generate similar values for visually similar images.
Method Comparison:
| Method | Speed | Robustness | Best For |
|---|---|---|---|
| dHash | Fastest | Low | Exact duplicates |
| pHash | Fast | Medium | Brightness/contrast changes |
| DINOHash | Slower | High | Heavy crops, compression |
| Hybrid | Medium | Very High | Production systems |
Hybrid Pipeline (2025 Best Practice):
Hamming Distance Thresholds:
→ Deep dive: references/perceptual-hashing.md
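As a rough illustration of the dHash row in the comparison above, the sketch below hashes a grayscale array and compares two hashes by Hamming distance. It assumes the image is already decoded into a 2-D NumPy array and uses nearest-neighbor sampling in place of a proper antialiased resize. `dhash` and `hamming` are illustrative names, not functions from any library.

```python
import numpy as np


def dhash(gray, hash_size=8):
    """Difference hash of a 2-D grayscale array, returned as a 64-bit integer."""
    gray = np.asarray(gray)
    height, width = gray.shape
    # Downsample to hash_size rows by (hash_size + 1) columns.
    # Assumption: nearest-neighbor sampling stands in for a real image resize.
    rows = np.arange(hash_size) * height // hash_size
    cols = np.arange(hash_size + 1) * width // (hash_size + 1)
    small = gray[np.ix_(rows, cols)].astype(np.float64)
    # Each bit records whether a pixel is brighter than its right-hand neighbor.
    bits = (small[:, :-1] > small[:, 1:]).flatten()
    return int("".join("1" if bit else "0" for bit in bits), 2)


def hamming(hash_a, hash_b):
    """Number of differing bits between two integer hashes."""
    return bin(hash_a ^ hash_b).count("1")
```

Because each bit depends only on the ordering of neighboring pixels, a uniform brightness shift leaves the hash unchanged, while unrelated images differ in many bits.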
Goal: Group photos by person without user labeling.
Apple Photos Strategy (2021-2025):
HDBSCAN Alternative:
Parameters:
| Setting | Agglomerative | HDBSCAN |
|---|---|---|
| Pass 1 threshold | 0.4 (cosine) | - |
| Pass 2 threshold | 0.6 (cosine) | - |
| Min cluster size | - | 3 photos |
| Metric | cosine | cosine |
→ Deep dive: references/face-clustering.md
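One way to read the two agglomerative thresholds in the table is a conservative first pass over face embeddings followed by a relaxed second pass that merges the resulting groups. The sketch below follows that reading using SciPy's hierarchical clustering. Merging cluster centroids in pass 2 and using average linkage are assumptions made for illustration, not a description of Apple's implementation, and `two_pass_face_clusters` is a hypothetical name.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage


def _agglomerate(vectors, threshold):
    """Average-linkage clustering cut at a cosine-distance threshold."""
    if len(vectors) < 2:
        # linkage() needs at least two observations.
        return np.ones(len(vectors), dtype=int)
    tree = linkage(vectors, method="average", metric="cosine")
    return fcluster(tree, t=threshold, criterion="distance")


def two_pass_face_clusters(embeddings, tight=0.4, relaxed=0.6):
    """Return one cluster label per face embedding."""
    embeddings = np.asarray(embeddings, dtype=np.float64)
    # Pass 1: conservative threshold, producing small high-purity groups.
    first = _agglomerate(embeddings, tight)
    ids = np.unique(first)
    # Pass 2 (assumption): merge pass-1 groups whose centroids are close.
    centroids = np.stack([embeddings[first == i].mean(axis=0) for i in ids])
    merged = _agglomerate(centroids, relaxed)
    # Map every face to the merged label of its pass-1 group.
    return merged[np.searchsorted(ids, first)]
```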
Problem: Burst mode creates 10-50 nearly identical photos.
Multi-Criteria Scoring:
| Criterion | Weight | Measurement |
|---|---|---|
| Sharpness | 30% | Laplacian variance |
| Face Quality | 35% | Eyes open, smiling, face sharpness |
| Aesthetics | 20% | NIMA score |
| Position | 10% | Middle frames bonus |
| Exposure | 5% | Histogram clipping check |
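The weights above can be combined as a simple weighted sum. The sketch below is a minimal version under two assumptions: every criterion has already been normalized to the range 0 to 1 (the table does not say how), and the Laplacian is approximated with a 4-neighbor stencil in NumPy rather than an image library. The function names are hypothetical.

```python
import numpy as np

# Weights copied from the scoring table.
WEIGHTS = {
    "sharpness": 0.30,
    "face_quality": 0.35,
    "aesthetics": 0.20,
    "position": 0.10,
    "exposure": 0.05,
}


def laplacian_variance(gray):
    """Variance of a 4-neighbor Laplacian; higher values mean a sharper image."""
    g = np.asarray(gray, dtype=np.float64)
    lap = (g[:-2, 1:-1] + g[2:, 1:-1] + g[1:-1, :-2] + g[1:-1, 2:]
           - 4.0 * g[1:-1, 1:-1])
    return float(lap.var())


def burst_score(criteria):
    """Weighted sum of per-criterion scores in [0, 1]; missing criteria count as 0."""
    return sum(weight * criteria.get(name, 0.0) for name, weight in WEIGHTS.items())


def pick_best(frames):
    """Index of the highest-scoring frame in a burst."""
    return max(range(len(frames)), key=lambda i: burst_score(frames[i]))
```

Raw Laplacian variance is unbounded, so it would need rescaling (for example, dividing by the maximum within the burst) before being passed in as `sharpness`.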
Burst Detection: Photos within 0.5 seconds of each other.
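A minimal sketch of this rule, assuming capture times are available as seconds (for example, parsed from EXIF) and that "within 0.5 seconds" is applied between consecutive photos so that a long burst chains together. `group_bursts` is a hypothetical helper.

```python
def group_bursts(timestamps, max_gap=0.5):
    """Group photo indices whose consecutive capture times are at most max_gap seconds apart."""
    order = sorted(range(len(timestamps)), key=lambda i: timestamps[i])
    groups = []
    for index in order:
        previous = groups[-1][-1] if groups else None
        if previous is not None and timestamps[index] - timestamps[previous] <= max_gap:
            groups[-1].append(index)  # Continues the current burst.
        else:
            groups.append([index])  # Starts a new group.
    return groups


def bursts_only(groups):
    """Keep groups with more than one photo; single photos are not bursts."""
    return [group for group in groups if len(group) > 1]
```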
→ Deep dive: references/content-detection.md
Multi-Signal Approach:
| Signal | Confidence | Description |
|---|---|---|
| UI elements | 0.85 | Status bars, buttons detected |
| Perfect rectangles | 0.75 | >5 UI buttons (90° angles) |
| High text | 0.70 | >25% text coverage (OCR) |
| No camera EXIF | 0.60 | Missing Make/Model/Lens |
| Device aspect | 0.60 | Exact phone screen ratio |
| Perfect sharpness | 0.50 | >2000 Laplacian variance |
Decision: Confidence >0.6 = screenshot
→ Deep dive: references/content-detection.md
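The table lists a confidence per signal but does not state how several signals combine into the final score. The sketch below uses a noisy-OR combination, which is an assumption chosen for illustration: each signal that fires independently reduces the probability that the image is a photo. The signal keys are hypothetical names for the table rows.

```python
# Confidence values copied from the signal table.
SIGNAL_CONFIDENCE = {
    "ui_elements": 0.85,
    "perfect_rectangles": 0.75,
    "high_text": 0.70,
    "no_camera_exif": 0.60,
    "device_aspect": 0.60,
    "perfect_sharpness": 0.50,
}


def screenshot_confidence(fired_signals):
    """Noisy-OR of the signals that fired (assumed combination rule)."""
    miss = 1.0
    for name in fired_signals:
        miss *= 1.0 - SIGNAL_CONFIDENCE[name]
    return 1.0 - miss


def is_screenshot(fired_signals, threshold=0.6):
    """Apply the decision rule: confidence above the threshold means screenshot."""
    return screenshot_confidence(fired_signals) > threshold
```

Under this rule a single weak signal such as `perfect_sharpness` is not enough on its own, while two medium signals together (missing EXIF plus an exact device aspect ratio) cross the threshold.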
Goal: Index 10K+ photos efficiently with caching.
Features Extracted:
Performance (10K photos, M1 MacBook Pro):
| Operation | Time |
|---|---|
| Perceptual hashing | 2 min |
| CLIP embeddings | 3 min (GPU) |
| Face detection | 4 min |
| Color palettes | 1 min |
| Aesthetic scoring | 2 min (GPU) |
| Clustering + dedup | 1 min |
| Total (first run) | ~13 min |
| Incremental | <1 min |
→ Deep dive: references/photo-indexing.md
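The gap between the first run and the incremental run comes from caching: only new or changed files need their features recomputed. A minimal sketch of that idea follows, assuming a JSON file as the cache and file size plus modification time as the change signature. Both choices, and all function names, are illustrative assumptions rather than the layout of the referenced implementation.

```python
import json
import os


def _signature(path):
    """Cheap change detector: file size and modification time in nanoseconds."""
    stat = os.stat(path)
    return [stat.st_size, stat.st_mtime_ns]


def load_cache(cache_path):
    """Load the cache from disk, or start empty if it does not exist yet."""
    if not os.path.exists(cache_path):
        return {}
    with open(cache_path, "r", encoding="utf-8") as handle:
        return json.load(handle)


def stale_files(paths, cache):
    """Paths that are missing from the cache or changed since they were indexed."""
    return [p for p in paths if cache.get(p, {}).get("signature") != _signature(p)]


def record(path, features, cache):
    """Store freshly computed features (must be JSON-serializable) for a file."""
    cache[path] = {"signature": _signature(path), "features": features}


def save_cache(cache, cache_path):
    """Write the cache back to disk."""
    with open(cache_path, "w", encoding="utf-8") as handle:
        json.dump(cache, handle)
```

On an incremental run, only the paths returned by `stale_files` go through hashing, embedding, and face detection again.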
What it looks like:
distance = np.linalg.norm(embedding1 - embedding2) # WRONG
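Why it's wrong: Face embeddings are normalized and compared by angle. The thresholds in this guide (0.4 and 0.6) are cosine distances, so a Euclidean distance cannot be compared against them directly.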
What to do instead:
from scipy.spatial.distance import cosine
distance = cosine(embedding1, embedding2) # Correct
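To make the difference concrete, here is a small NumPy sketch. It computes the same quantity as `scipy.spatial.distance.cosine` (one minus cosine similarity) and shows that it ignores vector magnitude, while for unit-length vectors the squared Euclidean distance equals twice the cosine distance. `cosine_distance` is an illustrative helper.

```python
import numpy as np


def cosine_distance(a, b):
    """One minus the cosine similarity of two vectors."""
    a = np.asarray(a, dtype=np.float64)
    b = np.asarray(b, dtype=np.float64)
    return 1.0 - float(a @ b) / float(np.linalg.norm(a) * np.linalg.norm(b))


a = np.array([1.0, 2.0, 3.0])
b = np.array([3.0, 1.0, 0.5])

# Scaling a vector changes the Euclidean distance but not the cosine distance.
same_direction = cosine_distance(a, b) - cosine_distance(5 * a, b)

# For unit vectors: ||u - v||^2 == 2 * cosine_distance(u, v).
u, v = a / np.linalg.norm(a), b / np.linalg.norm(b)
identity_gap = np.linalg.norm(u - v) ** 2 - 2 * cosine_distance(u, v)
```

Both `same_direction` and `identity_gap` come out at zero up to floating-point error, which is why a threshold tuned for one metric does not transfer to the other without conversion.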
What it looks like: Using same distance threshold for all face clusters.
Why it's wrong: No single threshold fits every identity. One person's embeddings can spread widely (photos across many ages), while two different people can sit very close together (twins).
What to do instead: Use HDBSCAN for automatic threshold discovery, or two-pass clustering with conservative + relaxed passes.
What it looks like:
is_duplicate = np.allclose(img1, img2) # WRONG
Why it's wrong: Re-saved JPEGs, crops, brightness changes create pixel differences.
What to do instead: Perceptual hashing (pHash or DINOHash) with Hamming distance.
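A minimal pHash-style sketch, assuming a decoded 2-D grayscale array and nearest-neighbor sampling instead of a real resize: take a 2-D DCT, keep the low-frequency 8x8 corner, and threshold at its median. The caller supplies `max_distance`, since the right Hamming threshold depends on the library and hash size in use. The function names are hypothetical.

```python
import numpy as np
from scipy.fft import dct


def phash(gray, hash_size=8, sample=32):
    """DCT-based perceptual hash of a 2-D grayscale array, as a 64-bit integer."""
    g = np.asarray(gray, dtype=np.float64)
    height, width = g.shape
    # Assumption: nearest-neighbor sampling stands in for a proper image resize.
    rows = np.arange(sample) * height // sample
    cols = np.arange(sample) * width // sample
    small = g[np.ix_(rows, cols)]
    # Separable 2-D DCT: transform along rows, then along columns.
    freq = dct(dct(small, axis=0, norm="ortho"), axis=1, norm="ortho")
    low = freq[:hash_size, :hash_size]  # Low-frequency corner.
    bits = (low > np.median(low)).flatten()
    return int("".join("1" if bit else "0" for bit in bits), 2)


def is_near_duplicate(hash_a, hash_b, max_distance):
    """True when the Hamming distance between two hashes is within max_distance."""
    return bin(hash_a ^ hash_b).count("1") <= max_distance
```

A uniform brightness shift only moves the DC coefficient, so the remaining bits stay the same, whereas a pixel-wise comparison would report a mismatch.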
What it looks like: Processing faces one photo at a time without batching.
Why it's wrong: The GPU sits underutilized, and throughput can be around 10x lower than with batching.
What to do instead: Batch process images (batch_size=32) with GPU acceleration.
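A minimal batching sketch follows. `embed_batch` is a hypothetical callable standing in for whatever model forward pass is used; it is assumed to take a list of images and return one embedding per image.

```python
def batched(items, batch_size=32):
    """Yield consecutive slices of at most batch_size items."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def embed_all(images, embed_batch, batch_size=32):
    """Run embed_batch once per batch and collect the embeddings in input order."""
    embeddings = []
    for batch in batched(images, batch_size):
        embeddings.extend(embed_batch(batch))
    return embeddings
```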
What it looks like:
for face in all_detected_faces:
    cluster(face)  # No filtering
Why it's wrong: Low-confidence detections create noise clusters (hands, objects).
What to do instead: Filter by confidence (threshold 0.9 for faces).
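A minimal sketch of the filter, assuming each detection is a dict with a `confidence` key. That shape is an assumption, since detector output formats differ between libraries.

```python
def confident_faces(detections, min_confidence=0.9):
    """Keep only detections at or above the confidence threshold."""
    return [d for d in detections if d["confidence"] >= min_confidence]


detections = [
    {"box": (10, 10, 50, 50), "confidence": 0.99},
    {"box": (80, 20, 30, 30), "confidence": 0.55},  # Likely a hand or object.
]
faces_to_cluster = confident_faces(detections)
```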
What it looks like: Assigning noise points to nearest cluster.
Why it's wrong: Solo appearances shouldn't pollute person clusters.
What to do instead: HDBSCAN/DBSCAN naturally identifies noise (label=-1). Keep noise separate.
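A small sketch of keeping noise separate, assuming `labels` is the per-face label array returned by HDBSCAN or DBSCAN, where -1 marks noise. `split_noise` is a hypothetical helper.

```python
from collections import defaultdict


def split_noise(labels):
    """Return ({cluster_label: [face indices]}, [noise indices])."""
    clusters = defaultdict(list)
    noise = []
    for index, label in enumerate(labels):
        if label == -1:
            noise.append(index)  # Unclustered face; do not force it into a person.
        else:
            clusters[int(label)].append(index)
    return dict(clusters), noise
```

The noise list can then be shown as unassigned faces instead of being merged into the nearest person.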
from photo_curation import PhotoCurationPipeline
pipeline = PhotoCurationPipeline()
# Index photo library
index = pipeline.index_library('/path/to/photos')
# De-duplicate
duplicates = index.find_duplicates()
print(f"Found {len(duplicates)} duplicate groups")
# Cluster faces
face_clusters = index.cluster_faces()
print(f"Found {len(face_clusters)} people")
# Select best from bursts
best_photos = pipeline.select_best_from_bursts(index)
# Filter screenshots
real_photos = pipeline.filter_screenshots(index)
# Curate for collage
collage_photos = pipeline.curate_for_collage(index, target_count=100)
torch transformers facenet-pytorch ultralytics hdbscan opencv-python scipy numpy scikit-learn pillow pytesseract
Version: 2.0.0 Last Updated: November 2025