| name | handtracking |
| description | Real-time hand detection in egocentric videos using victordibia/handtracking. Outputs bounding boxes for hands, specifically trained on EgoHands dataset. Supports video input/output with labeled hand boxes. Lightweight and fast for egocentric view applications. |
| license | MIT license |
| metadata | {"skill-author":"K-Dense Inc.","original-repo":"https://github.com/victordibia/handtracking","skill-category":"Computer Vision","tags":["hand-tracking","egocentric-vision","object-detection","tensorflow"]} |
HandTracking - Real-time Hand Detection
Overview
Real-time hand detection system designed specifically for egocentric (first-person) video views. Trained on the EgoHands dataset, this lightweight model detects hand bounding boxes in video streams and can output labeled videos with hand annotations. Ideal for quick prototyping of hand-based interaction systems in AR/VR and wearable computing applications.
Companion JavaScript library: Handtrack.js is available for browser-based applications (https://github.com/victordibia/handtrack.js).
When to Use This Skill
This skill should be used when:
- Analyzing egocentric video footage from wearable cameras or smart glasses
- Detecting hand presence and location in first-person perspective videos
- Building hand gesture interfaces or interaction systems
- Annotating training data for hand detection models
- Processing egocentric videos for human-computer interaction research
- Creating labeled video outputs with hand bounding box overlays
- Implementing real-time hand detection in web applications (using Handtrack.js)
- Quick prototyping of hand-based AR/VR interfaces
Choose this when: You need fast, lightweight hand detection with bounding box outputs and don't require detailed joint-level pose estimation.
Consider alternatives: If you need 3D hand pose keypoints, hand-object segmentation, or multi-view tracking, see other skills in this category.
Core Capabilities
1. Hand Detection in Egocentric Views
EgoHands-trained model: Specifically optimized for first-person perspective videos where hands are viewed from the wearer's viewpoint.
- Input: Video files (MP4, AVI) or webcam streams
- Output: Bounding boxes (x, y, width, height) with confidence scores
- Model: Lightweight TensorFlow model trained on EgoHands dataset
- Performance: Real-time processing on standard hardware
- Detection types: Left hand, right hand classification
- Visualization: Red/green bounding boxes overlaid on video
Bounding box format:
{
'bbox': [x, y, width, height],
'score': confidence,
'label': 'hand'
}
2. Video Processing and Annotation
Input video processing: Process entire video files and export annotated results.
Workflow:
git clone https://github.com/victordibia/handtracking.git
cd handtracking
pip install tensorflow==1.15.0 opencv-python numpy
python run.py \
--input_video your_egocentric.mp4 \
--output_video output_labeled.mp4 \
--threshold 0.5
Output video features:
- Hands outlined with colored bounding boxes (red/green)
- Real-time frame rate display
- Confidence scores shown on boxes
- Smooth tracking across frames
-保存带标注的视频文件
3. Real-time Webcam Detection
Live camera processing: Process webcam streams in real-time for interactive applications.
import handtracking
detector = handtracking.HandDetector()
detector.detect_from_webcam(
display=True,
save_video=False,
confidence_threshold=0.6
)
Applications:
- Live hand gesture interfaces
- Interactive installations
- Real-time hand presence detection
- Gesture-controlled systems
4. Browser-based Detection (Handtrack.js)
JavaScript companion library: Use the same model technology in web applications.
Integration:
<script src="https://cdn.jsdelivr.net/npm/handtrackjs/dist/handtrack.min.js"></script>
<script>
const model = await handTrack.load();
const video = document.getElementById('video');
const predictions = await model.detect(video);
predictions.forEach(prediction => {
console.log(prediction.bbox);
console.log(prediction.score);
});
</script>
Browser capabilities:
- Run entirely in browser (no server required)
- Real-time webcam processing
- Canvas-based visualization
- WebGL acceleration support
- Works with video files and live streams
Installation and Setup
Option 1: Python Installation
git clone https://github.com/victordibia/handtracking.git
cd handtracking
python -m venv venv
source venv/bin/activate
pip install tensorflow==1.15.0
pip install opencv-python numpy pillow
Model files: Automatically downloaded from the repository on first use (~20MB).
Option 2: JavaScript Installation (Handtrack.js)
npm install handtrackjs
Usage Examples
Example 1: Process Video with Output
import cv2
from handtracking import HandDetector
detector = HandDetector()
cap = cv2.VideoCapture('egocentric_video.mp4')
fps = int(cap.get(cv2.CAP_PROP_FPS))
width = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
height = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
fourcc = cv2.VideoWriter_fourcc(*'mp4v')
out = cv2.VideoWriter('output_labeled.mp4', fourcc, fps, (width, height))
while cap.isOpened():
ret, frame = cap.read()
if not ret:
break
detections = detector.detect_hands(frame)
for det in detections:
x, y, w, h = det['bbox']
score = det['score']
color = (0, 255, 0) if score > 0.7 else (0, 0, 255)
cv2.rectangle(frame, (x, y), (x+w, y+h), color, 2)
label = f"Hand: {score:.2f}"
cv2.putText(frame, label, (x, y-10),
cv2.FONT_HERSHEY_SIMPLEX, 0.5, color, 2)
out.write(frame)
cap.release()
out.release()
Example 2: Extract Hand Regions
import cv2
import numpy as np
from handtracking import HandDetector
detector = HandDetector()
cap = cv2.VideoCapture('egocentric.mp4')
frame_count = 0
while cap.isOpened():
ret, frame = cap.read()
if not ret:
break
detections = detector.detect_hands(frame)
for i, det in enumerate(detections):
x, y, w, h = det['bbox']
hand_roi = frame[y:y+h, x:x+w]
if det['score'] > 0.7:
cv2.imwrite(f'hand_{frame_count}_{i}.jpg', hand_roi)
frame_count += 1
cap.release()
Example 3: Real-time Detection Statistics
from handtracking import HandDetector
import cv2
detector = HandDetector()
cap = cv2.VideoCapture(0)
while True:
ret, frame = cap.read()
if not ret:
break
detections = detector.detect_hands(frame)
num_hands = len(detections)
avg_confidence = sum(d['score'] for d in detections) / num_hands if num_hands > 0 else 0
cv2.putText(frame, f"Hands: {num_hands}", (10, 30),
cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 255, 0), 2)
cv2.putText(frame, f"Avg Conf: {avg_confidence:.2f}", (10, 70),
cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 255, 0), 2)
cv2.imshow('Hand Tracking', frame)
if cv2.waitKey(1) & 0xFF == ord('q'):
break
cap.release()
cv2.destroyAllWindows()
Example 4: Browser-based Detection
const modelParams = {
flipHorizontal: true,
maxNumBoxes: 2,
iouThreshold: 0.5,
scoreThreshold: 0.6,
};
handTrack.load(modelParams).then(model => {
console.log("Model loaded");
const video = document.getElementById('video');
const canvas = document.getElementById('canvas');
const context = canvas.getContext('2d');
function detectFrame() {
model.detect(video).then(predictions => {
context.clearRect(0, 0, canvas.width, canvas.height);
context.drawImage(video, 0, 0, canvas.width, canvas.height);
predictions.forEach(prediction => {
const [x, y, width, height] = prediction.bbox;
context.strokeStyle = '#00FF00';
context.lineWidth = 4;
context.strokeRect(x, y, width, height);
context.fillStyle = '#00FF00';
context.fillText(
`Hand: ${prediction.score.toFixed(2)}`,
x, y - 10
);
});
requestAnimationFrame(detectFrame);
});
}
detectFrame();
});
Integration with Other Skills
This skill works effectively with:
- MediaPipe skills: For more advanced hand pose estimation and gesture recognition
- OpenCV-based video processing skills: For comprehensive video analysis pipelines
- Machine learning skills: For building custom hand gesture classifiers
- XR/AR framework skills: For integrating hand detection into immersive experiences
Model Specifications
Architecture: Lightweight CNN-based object detection model
- Framework: TensorFlow 1.x (compatible with older TF versions)
- Training dataset: EgoHands (4,800 egocentric images)
- Input resolution: Flexible (recommended: 640x480)
- Model size: ~20MB
- Inference speed: 30+ FPS on standard CPU (depends on resolution)
Detection performance (on EgoHands test set):
- mAP: ~85% (mean Average Precision)
- Recall: ~82%
- Precision: ~88%
Limitations and Considerations
Scope: This skill provides bounding box detection only. For more detailed analysis, consider:
- 3D hand pose estimation: Use
ap229997-hands skill for joint keypoints
- Hand-object segmentation: Use
owenzlz-egohos skill for pixel-level masks
- Multi-view tracking: Use
facebookresearch-hot3d for 3D tracking
Known limitations:
- Trained primarily on egocentric views (first-person perspective)
- May have reduced performance on third-person views
- Bounding boxes only (no joint or finger-level details)
- Requires TensorFlow 1.x (not compatible with TF 2.x without modifications)
- Model trained on diverse hands but may have bias toward certain demographics
When to upgrade:
- Need 3D hand joint positions → Use ap229997-hands
- Need hand-object interaction segmentation → Use owenzlz-egohos
- Need multi-view or high-precision 3D tracking → Use facebookresearch-hot3d
- Need browser-based 3D hand tracking → Consider MediaPipe Hands
Performance Optimization
CPU optimization:
detector = HandDetector()
frame = cv2.resize(frame, (640, 480))
detections = detector.detect_hands(frame)
GPU acceleration (if available):
import tensorflow as tf
Batch processing:
from concurrent.futures import ThreadPoolExecutor
def process_video(video_path):
detector = HandDetector()
return detector.process_video(video_path)
with ThreadPoolExecutor(max_workers=4) as executor:
results = executor.map(process_video, video_list)
Troubleshooting
Issue: Model not downloading automatically
- Solution: Manually download model files from the GitHub repository and place in the models directory
Issue: TensorFlow version conflicts
- Solution: Use virtual environment with TensorFlow 1.15.0:
pip install tensorflow==1.15.0
Issue: Low detection accuracy
- Solution: Adjust confidence threshold (try 0.5-0.7), ensure video quality is adequate, check that view is egocentric
Issue: Slow processing speed
- Solution: Reduce video resolution, close other applications, consider GPU acceleration
Issue: No hands detected
- Solution: Check if video is in egocentric view, lower confidence threshold, improve lighting conditions
References and Resources
Documentation
Related Projects
Example Applications
- Gesture-controlled interfaces
- Sign language detection
- Human-computer interaction research
- AR/VR hand tracking
- Activity recognition from egocentric video
Citation
If you use this hand tracking implementation in research, please cite:
@article{betancourt2015egohands,
title={Egohands: A dataset for egocentric hand interactions},
author={Betancourt, Alex and Orozco, Jorge and Bolaños, Mauricio},
journal={arXiv preprint arXiv:1509.06044},
year={2015}
}
And the original repository:
@software{victordibia_handtracking,
author = {Victor Dibia},
title = {Real-time Hand Detection in Python using TensorFlow},
url = {https://github.com/victordibia/handtracking},
year = {2018}
}
Best Practices
- Start with this skill for quick prototyping and proof-of-concept
- Validate on your data before committing to production use
- Consider GPU acceleration for real-time applications
- Test confidence thresholds on your specific use case
- Use appropriate resolution - balance between speed and accuracy
- Handle edge cases - no hands, occluded hands, multiple hands
- Post-process results - smoothing, filtering, temporal consistency
- Benchmark alternatives before final implementation
Future Enhancements
Consider exploring these related directions:
- Fine-tune model on domain-specific egocentric data
- Integrate with gesture recognition pipelines
- Combine with object detection for hand-object interactions
- Extend to browser applications using Handtrack.js
- Upgrade to 3D pose estimation for more detailed analysis