| name | beast3d-animal-behavioral-neural-encoding |
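| description | BEAST3D methodology: a self-supervised framework for animal behavior analysis and neural encoding based on 3D Gaussian splatting. Learns 3D representations from multi-view video and supports novel view synthesis, pose estimation, and neural encoding. Activation: BEAST3D, 3D Gaussian splatting, animal behavior, neural encoding, multi-view video, visual neuroscience. |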
BEAST3D: Animal Behavioral Analysis and Neural Encoding
Paper: arXiv:2606.02937 (Submitted: 2026-06-01)
Authors: Yanchen Wang, Lenny Aharon, Wangshu Zhu, et al.
Categories: q-bio.NC, cs.CV
Core Innovation
BEAST3D is a self-supervised pretraining framework that learns 3D visual representations from unlabeled, calibrated multi-view video for animal behavioral analysis and neural encoding.
Key Features
- 3D Gaussian Splatting + Vision Transformer
  - ViT predicts 3D Gaussian splats reconstructing held-out views via differentiable rendering
  - Simultaneously segments animal from background
- Sparse View Reconstruction
  - Works with as few as 4 views by conditioning on known camera parameters
  - Unlike general models requiring dense overlapping viewpoints
- Three Downstream Tasks
  - Novel view synthesis (validates 3D representation quality)
  - Multi-view pose estimation (sparse keypoint trajectories)
  - Neural encoding (relates 3D behavioral features to neural activity)
- Cross-Species Evaluation
  - Tested across 4 species with viewpoint-invariant features
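Sparse-view reconstruction depends on calibrated cameras. As a rough illustration of what known camera parameters provide, the sketch below projects 3D points (for example, Gaussian centers) into one camera with a standard pinhole model. The function name `project_points` and the example intrinsics are assumptions made for illustration; they are not taken from the paper.

```python
import numpy as np

def project_points(points_world, K, R, t):
    """Project Nx3 world points to Nx2 pixel coordinates with a pinhole model."""
    points_cam = points_world @ R.T + t   # world frame -> camera frame
    uvw = points_cam @ K.T                # camera frame -> homogeneous pixels
    return uvw[:, :2] / uvw[:, 2:3]       # perspective divide by depth

# Hypothetical camera: 500 px focal length, principal point at (320, 240),
# identity rotation and zero translation.
K = np.array([[500.0, 0.0, 320.0],
              [0.0, 500.0, 240.0],
              [0.0, 0.0, 1.0]])
R = np.eye(3)
t = np.zeros(3)

centers = np.array([[0.0, 0.0, 2.0],     # on the optical axis
                    [0.2, -0.1, 2.0]])   # offset from the axis
pixels = project_points(centers, K, R, t)
```

A point on the optical axis lands on the principal point, which is a quick sanity check for any calibration loaded from disk.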
Methodology Framework
1. Architecture Design
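import torch.nn as nn

class BEAST3D(nn.Module):
    """
    Self-supervised 3D representation learning framework

    Components:
    - Vision Transformer (ViT) encoder
    - 3D Gaussian splat predictor
    - Differentiable rendering decoder
    - Background segmentation module
    """
    # The submodules used below are expected to be created in __init__ (not shown).

    def forward(self, multi_view_frames, camera_params):
        # Encode every input view with the ViT.
        view_features = self.vit_encoder(multi_view_frames)
        # Predict 3D Gaussian splats, conditioned on the known camera parameters.
        splats = self.splat_predictor(view_features, camera_params)
        # Render the splats back into image space.
        rendered_views = self.differentiable_render(splats, camera_params)
        # Separate the animal from the background.
        segmentation = self.segmentation_head(view_features)
        return rendered_views, segmentation, splats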
2. Training Objective
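import torch.nn.functional as F

def loss_function(rendered, target_view, segmentation, gt_mask, lambda_seg=1.0):
    """
    Self-supervised learning objective

    lambda_seg weights the segmentation term; the default of 1.0 is a placeholder.
    """
    # Photometric error between the rendered view and the held-out view.
    recon_loss = F.mse_loss(rendered, target_view)
    # segmentation must hold probabilities in [0, 1], e.g. after a sigmoid.
    seg_loss = F.binary_cross_entropy(segmentation, gt_mask)
    total_loss = recon_loss + lambda_seg * seg_loss
    return total_loss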
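To make the two terms of the objective concrete, here is a small numeric sketch written in NumPy rather than PyTorch so that it runs without a deep learning stack. The weight `lambda_seg=0.5` and the toy arrays are assumptions chosen only to make the arithmetic easy to follow.

```python
import numpy as np

def mse(rendered, target):
    """Mean squared error between rendered and target pixels."""
    return float(np.mean((rendered - target) ** 2))

def binary_cross_entropy(prob, mask, eps=1e-7):
    """Mean binary cross-entropy between predicted probabilities and a 0/1 mask."""
    prob = np.clip(prob, eps, 1.0 - eps)  # avoid log(0)
    return float(-np.mean(mask * np.log(prob) + (1.0 - mask) * np.log(1.0 - prob)))

def total_loss(rendered, target, seg_prob, mask, lambda_seg=0.5):
    """Reconstruction term plus weighted segmentation term."""
    return mse(rendered, target) + lambda_seg * binary_cross_entropy(seg_prob, mask)

# Toy 2x2 "images": an uninformative prediction of 0.5 everywhere.
rendered = np.array([[0.5, 0.5], [0.5, 0.5]])
target = np.array([[1.0, 0.0], [1.0, 0.0]])
seg_prob = np.array([[0.5, 0.5], [0.5, 0.5]])
mask = np.array([[1.0, 0.0], [1.0, 0.0]])

loss = total_loss(rendered, target, seg_prob, mask)
```

With these inputs the reconstruction term is 0.25 and the segmentation term is ln 2, so the total is 0.25 + 0.5 · ln 2. A perfect prediction drives both terms toward zero.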
3. Downstream Transfer
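from scipy.stats import pearsonr
# Novel view synthesis: render the predicted splats from an unseen camera pose.
novel_view = model.render(splats, novel_camera_pose)
# Pose estimation: regress keypoints from the splat features.
keypoints = pose_estimator(splats.features)
# Neural encoding: predict neural activity from the behavioral features.
neural_prediction = neural_decoder(splats.behavioral_features)
correlation, p_value = pearsonr(neural_prediction, recorded_activity)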
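`scipy.stats.pearsonr` works on a pair of 1-D arrays, while neural recordings usually contain many neurons. The following sketch computes one Pearson coefficient per neuron for arrays of shape `(time, neurons)`. The helper name `pearson_per_neuron` and the toy data are illustrative assumptions.

```python
import numpy as np

def pearson_per_neuron(predicted, recorded):
    """Pearson r for each column of two (time, neurons) arrays."""
    pred_c = predicted - predicted.mean(axis=0)   # center each neuron's prediction
    rec_c = recorded - recorded.mean(axis=0)      # center each neuron's recording
    numerator = (pred_c * rec_c).sum(axis=0)
    denominator = np.sqrt((pred_c ** 2).sum(axis=0) * (rec_c ** 2).sum(axis=0))
    return numerator / denominator

# Toy data: 4 time points, 2 neurons.
recorded = np.array([[1.0, 4.0], [2.0, 3.0], [3.0, 2.0], [4.0, 1.0]])
# Neuron 0 is predicted up to a scale factor, neuron 1 with the sign flipped.
predicted = np.array([[2.0, 1.0], [4.0, 2.0], [6.0, 3.0], [8.0, 4.0]])

r = pearson_per_neuron(predicted, recorded)
```

Note that a neuron with constant activity has zero variance, so its coefficient is undefined; such columns should be filtered out before calling the helper.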
Implementation Guide
Step 1: Data Preparation
calibration = {
    'camera_params': {
        'intrinsics': K_matrix,
        'extrinsics': extrinsics_per_camera,  # placeholder: one [R|t] matrix per camera
    },
    'num_views': 4,
    'frame_rate': fps,
    'resolution': (H, W),
}
multi_view_batch = {
    'frames': [frame_cam1, frame_cam2, frame_cam3, frame_cam4],
    'camera_indices': [0, 1, 2, 3],
    'held_out_cam': 4,  # index of the view to reconstruct; not among the four inputs
}
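One way to organize held-out supervision is to split the synchronized frames into input views and a single target view. The sketch below assumes five cameras, with four used as inputs and the fifth held out; that camera count, the helper name `split_held_out`, and the file names are assumptions for illustration.

```python
def split_held_out(frames, held_out_index):
    """Return (input_frames, target_frame) with one camera view held out."""
    if not 0 <= held_out_index < len(frames):
        raise IndexError("held_out_index must refer to one of the provided frames")
    inputs = [f for i, f in enumerate(frames) if i != held_out_index]
    return inputs, frames[held_out_index]

# Hypothetical frame identifiers for one time step, one per camera.
frames = ["cam0.png", "cam1.png", "cam2.png", "cam3.png", "cam4.png"]
inputs, target = split_held_out(frames, held_out_index=4)
```

Rotating `held_out_index` across training steps would let every camera serve as the reconstruction target in turn.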
Step 2: Model Training
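import torch

model = BEAST3D(
    vit_config='ViT-L/16',
    splat_dim=64,
    num_species=4
)
# Optimizer choice and learning rate are placeholders.
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
for batch in unlabeled_multi_view_data:
    optimizer.zero_grad()  # clear gradients left over from the previous step
    rendered, seg, splats = model(batch.frames, batch.camera_params)
    loss = loss_function(
        rendered,
        batch.held_out_frame,
        seg,
        batch.segmentation_gt
    )
    loss.backward()
    optimizer.step()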
Step 3: Downstream Fine-tuning
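# Freeze the pretrained encoder so that only the task heads are trained.
for param in model.vit_encoder.parameters():
    param.requires_grad = False
# PoseEstimator and NeuralDecoder are placeholder task heads; train() here
# stands for a supervised fitting routine, not nn.Module.train().
pose_head = PoseEstimator(splat_dim=64)
pose_head.train(keypoint_annotations)
neural_decoder = NeuralDecoder(
    input_dim=64,  # matches splat_dim used during pretraining
    output_dim=num_neurons
)
neural_decoder.train(behavioral_features, neural_activity)
encoding_score = evaluate_neural_encoding(
    predicted, recorded, metric='r2_score'
)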
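The `r2_score` metric named above can be computed per neuron as the fraction of variance in the recorded activity that the prediction explains. This is a minimal NumPy sketch; the helper name `r2_per_neuron` and the toy arrays are assumptions, and `evaluate_neural_encoding` itself is not reproduced here.

```python
import numpy as np

def r2_per_neuron(predicted, recorded):
    """Coefficient of determination for each column of (time, neurons) arrays."""
    ss_res = ((recorded - predicted) ** 2).sum(axis=0)              # residual sum of squares
    ss_tot = ((recorded - recorded.mean(axis=0)) ** 2).sum(axis=0)  # total sum of squares
    return 1.0 - ss_res / ss_tot

# Toy data: 3 time points, 2 neurons.
recorded = np.array([[1.0, 2.0], [2.0, 4.0], [3.0, 6.0]])
# Neuron 0 is predicted exactly; neuron 1 is predicted by its mean only.
predicted = np.array([[1.0, 4.0], [2.0, 4.0], [3.0, 4.0]])

scores = r2_per_neuron(predicted, recorded)
```

Unlike a correlation coefficient, R² penalizes errors in scale and offset, so a prediction that only tracks the mean scores 0 and a prediction worse than the mean scores below 0.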
Experimental Results
Novel View Synthesis
- Metrics: PSNR, SSIM
- Performance: High-quality reconstruction from 4 views
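For reference, PSNR compares a rendered view with the ground-truth view through their mean squared error. The sketch below uses the standard definition for images scaled to `[0, max_value]`; the toy images are assumptions and do not reflect reported results.

```python
import numpy as np

def psnr(rendered, target, max_value=1.0):
    """Peak signal-to-noise ratio in dB for images scaled to [0, max_value]."""
    diff = rendered.astype(np.float64) - target.astype(np.float64)
    mse = np.mean(diff ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return float(10.0 * np.log10(max_value ** 2 / mse))

# Toy example: a uniform error of 0.1 on a black 4x4 image.
target = np.zeros((4, 4))
rendered = np.full((4, 4), 0.1)
value = psnr(rendered, target)
```

A uniform error of 0.1 gives an MSE of 0.01 and therefore 20 dB. Higher values mean the rendered view is closer to the held-out ground truth.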
Pose Estimation
- Output: Sparse keypoint trajectories
- Use: Standard behavioral analysis input
Neural Encoding
- Task: Relate 3D behavioral features to neural activity
- Performance: Significant correlation across 4 species
Applications
1. Behavioral Neuroscience
behavioral_features = beast3d.extract_features(multi_view_recording)
decoded_behavior = neural_decoder.inverse(activity_patterns)
2. Laboratory Automation
poses = beast3d.pose_estimator(unlabeled_video)
3. Neural Encoding Studies
encoding_model.fit(behavioral_features, neural_recordings)
predicted_activity = encoding_model.predict(novel_behavior_3d)
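The `fit` and `predict` calls above leave the encoding model unspecified. One common choice for relating features to neural activity is ridge regression, sketched below in closed form. This is a stand-in technique, not necessarily the model used with BEAST3D; the class name `RidgeEncoder`, the penalty `alpha`, and the synthetic data are all assumptions.

```python
import numpy as np

class RidgeEncoder:
    """Linear encoding model with an L2 penalty, solved in closed form."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha
        self.weights = None

    @staticmethod
    def _with_bias(features):
        # Append a column of ones so the model can learn a per-neuron offset.
        return np.hstack([features, np.ones((features.shape[0], 1))])

    def fit(self, features, activity):
        X = self._with_bias(features)
        penalty = self.alpha * np.eye(X.shape[1])
        penalty[-1, -1] = 0.0  # leave the bias term unpenalized
        self.weights = np.linalg.solve(X.T @ X + penalty, X.T @ activity)
        return self

    def predict(self, features):
        return self._with_bias(features) @ self.weights

# Synthetic data: 200 time points, 8 behavioral features, 3 neurons.
rng = np.random.default_rng(0)
features = rng.normal(size=(200, 8))
true_weights = rng.normal(size=(8, 3))
activity = features @ true_weights + 0.5  # noiseless linear response with an offset

encoder = RidgeEncoder(alpha=1e-6).fit(features, activity)
predicted = encoder.predict(features)
```

On real recordings, `alpha` would typically be selected by cross-validation, and the model would be scored on held-out time points rather than on the training data.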
Key Advantages
- Self-Supervised: No manual pose annotation
- Sparse View: Works with 4 views (vs. dense general models)
- 3D Structure: Rich viewpoint-invariant features
- Multi-Task: View synthesis + pose + neural encoding
- Cross-Species: Generalizes across 4 animal species
Limitations
- Requires calibrated camera parameters
- Minimum 4 views needed
- Segment...[truncated]