| name | dina-v1-population-activity-interpretation |
| description | DINA (Dual-Tower Image-Neural Alignment) framework for interpretable contrastive analysis of V1 population activity. Aligns visual stimuli and V1 responses in a shared latent space at the level of intermediate feature maps. Activation: DINA, V1 population activity, image-neural alignment, contrastive framework, calcium imaging decoding, visual computation. |
DINA: Dual-Tower Image-Neural Alignment for V1 Population Activity Interpretation
An interpretable contrastive framework that jointly trains a dual-tower architecture to align visual stimuli and V1 population responses in a shared latent space, enabling both accurate decoding and direct access to interpretable feature maps.
Metadata
- Source: arXiv:2605.04309
- Authors: Xin Wang, Zhuangzhi Gao, Hongyi Qin, Zhongli Wu, Feixiang Zhou, He Zhao
- Published: 2026-05-05
- Domain: Computational Neuroscience + Visual Processing
Core Methodology
Key Innovation
Traditional alignment-based approaches improve the accuracy of decoding from brain activity but offer limited insight into the neural computations behind those improvements. DINA addresses this gap by training a dual-tower architecture that aligns visual stimuli and V1 population responses at the level of intermediate feature maps, not just final representations. This enables both accurate decoding and direct access to interpretable computational mechanisms.
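As a rough illustration of aligning at intermediate levels rather than only at the output, the sketch below attaches a projection head to each hidden stage of two small MLP towers and sums an InfoNCE-style loss over levels. The `MultiLevelTower` and `multilevel_info_nce` names, the layer sizes, the use of MLPs on flattened inputs, and the equal weighting of levels are all assumptions for illustration; the towers described in the paper operate on feature maps and may differ substantially.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class MultiLevelTower(nn.Module):
    """Hypothetical tower that exposes one embedding per hidden stage."""

    def __init__(self, in_dim, hidden_dims=(512, 256), latent_dim=64):
        super().__init__()
        self.stages = nn.ModuleList()
        self.heads = nn.ModuleList()
        prev = in_dim
        for h in hidden_dims:
            self.stages.append(nn.Sequential(nn.Linear(prev, h), nn.ReLU()))
            # One projection head per stage maps into the shared latent space.
            self.heads.append(nn.Linear(h, latent_dim))
            prev = h

    def forward(self, x):
        embeddings = []
        for stage, head in zip(self.stages, self.heads):
            x = stage(x)
            embeddings.append(F.normalize(head(x), dim=-1))
        return embeddings  # list of (batch, latent_dim) tensors, one per level


def multilevel_info_nce(img_levels, neu_levels, temperature=0.07):
    """Sum a symmetric InfoNCE loss over matching levels (equal weights assumed)."""
    total = 0.0
    for z_img, z_neu in zip(img_levels, neu_levels):
        logits = z_img @ z_neu.T / temperature
        labels = torch.arange(z_img.size(0), device=z_img.device)
        total = total + 0.5 * (F.cross_entropy(logits, labels)
                               + F.cross_entropy(logits.T, labels))
    return total


torch.manual_seed(0)
image_tower = MultiLevelTower(in_dim=1024)  # flattened image features (assumed size)
neural_tower = MultiLevelTower(in_dim=300)  # 300 recorded neurons (assumed size)
images = torch.randn(8, 1024)
responses = torch.randn(8, 300)
loss = multilevel_info_nce(image_tower(images), neural_tower(responses))
loss.backward()
```

Because every level contributes its own loss term, each intermediate embedding is pushed toward the shared space, which is what makes those levels candidates for later inspection.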
Technical Framework
- Dual-Tower Architecture:
  - Image Tower: Processes visual stimuli through hierarchical feature extraction
  - Neural Tower: Processes V1 population activity through parallel architecture
  - Both towers project into a shared latent space at intermediate feature map level
- Contrastive Alignment:
  - Positive pairs: (image, corresponding V1 response) are pulled together
  - Negative pairs: mismatched image-response pairs are pushed apart
  - Alignment occurs at multiple levels of the feature hierarchy
- Interpretability Mechanism:
  - Access to intermediate feature maps reveals which visual features drive neural responses
  - Enables analysis of spatial regions contributing to alignment
  - Identifies sparse subsets of strongly responsive neurons
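For reference, the pull-together and push-apart behaviour described under Contrastive Alignment is commonly formalized as a symmetric InfoNCE objective. The form below is the standard one and is given here as an assumption; the symbols (batch size $N$, L2-normalized image embedding $u_i$, neural embedding $v_i$ of the $i$-th pair, temperature $\tau$) are introduced for illustration, and the exact loss used in the paper may differ.

```latex
\mathcal{L} = -\frac{1}{2N}\sum_{i=1}^{N}\left[
  \log\frac{\exp\!\left(u_i^{\top} v_i/\tau\right)}{\sum_{j=1}^{N}\exp\!\left(u_i^{\top} v_j/\tau\right)}
  + \log\frac{\exp\!\left(u_i^{\top} v_i/\tau\right)}{\sum_{j=1}^{N}\exp\!\left(u_j^{\top} v_i/\tau\right)}
\right]
```

The numerators score matched pairs and the denominators sum over all pairings in the batch, so mismatched pairs act as negatives. Under the same assumption, alignment at multiple levels of the hierarchy would correspond to summing one such term per level.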
Key Findings (Mouse V1 Two-Photon Calcium Imaging)
- Decoding performance is supported primarily by coarse, low-level visual structure rather than by semantic category or fine-grained detail
- Alignable feature maps emerge from multiple spatially distributed image regions
- Alignable features capture both shape and texture cues
- These features are reconstructed predominantly by sparse subsets of strongly responsive neurons and their functional interactions
Implementation Guide
Prerequisites
- Large-scale two-photon calcium imaging data from V1
- Corresponding visual stimuli (natural images)
- PyTorch or similar deep learning framework
- GPU for contrastive training
Step-by-Step
- Preprocess Neural Data: Denoise and normalize calcium traces, extract population activity vectors
- Preprocess Visual Data: Extract multi-scale visual features (edges, textures, shapes)
- Build Dual Towers: Design parallel architectures for image and neural processing
- Define Contrastive Loss: InfoNCE or similar contrastive objective for paired alignment
- Train Jointly: Optimize both towers simultaneously with shared latent space projection
- Extract Feature Maps: Access intermediate representations for interpretability analysis
- Analyze Spatial Contributions: Map which image regions drive neural alignment
- Identify Key Neurons: Find sparse subsets of neurons most responsible for alignment
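The first step in the list above could look roughly like the following sketch, which z-scores each neuron's trace and averages a post-stimulus window into one population vector per trial. The `(trials, neurons, frames)` array layout, the `response_window` frame indices, and the `population_vectors` helper are hypothetical; real pipelines typically also include motion correction, neuropil subtraction, and ΔF/F or spike estimation, all omitted here.

```python
import numpy as np


def population_vectors(traces, response_window, eps=1e-8):
    """Turn (trials, neurons, frames) traces into (trials, neurons) vectors.

    Each neuron is z-scored using its mean and standard deviation over all
    trials and frames, then averaged over the frames in response_window.
    """
    traces = np.asarray(traces, dtype=np.float64)
    mean = traces.mean(axis=(0, 2), keepdims=True)
    std = traces.std(axis=(0, 2), keepdims=True)
    z = (traces - mean) / (std + eps)
    start, stop = response_window
    return z[:, :, start:stop].mean(axis=2)


rng = np.random.default_rng(0)
# Synthetic stand-in for recorded data: 20 trials, 50 neurons, 30 frames.
traces = rng.normal(loc=5.0, scale=2.0, size=(20, 50, 30))
vectors = population_vectors(traces, response_window=(10, 20))
print(vectors.shape)  # (20, 50)
```

Normalizing per neuron keeps a few bright cells from dominating the neural tower's input, though whether z-scoring is the right choice depends on the indicator and the recording.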
Code Sketch
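import torch
import torch.nn as nn
import torch.nn.functional as F


class DINATower(nn.Module):
    """Simplified dual-tower model for image-neural alignment.

    Both towers are plain MLPs over flattened inputs and align only their
    final embeddings; alignment at intermediate feature maps is not shown.
    """

    def __init__(self, image_dim, neural_dim, latent_dim):
        super().__init__()
        # Image tower: flattened stimulus features -> shared latent space.
        self.image_tower = nn.Sequential(
            nn.Linear(image_dim, 512),
            nn.ReLU(),
            nn.Linear(512, 256),
            nn.ReLU(),
            nn.Linear(256, latent_dim),
        )
        # Neural tower: V1 population vector -> the same latent space.
        self.neural_tower = nn.Sequential(
            nn.Linear(neural_dim, 512),
            nn.ReLU(),
            nn.Linear(512, 256),
            nn.ReLU(),
            nn.Linear(256, latent_dim),
        )

    def forward(self, images, neural_responses):
        img_features = self.image_tower(images)
        neural_features = self.neural_tower(neural_responses)
        return img_features, neural_features

    def contrastive_loss(self, img_features, neural_features, temperature=0.07):
        # L2-normalize so the logits are temperature-scaled cosine similarities.
        img_features = F.normalize(img_features, dim=-1)
        neural_features = F.normalize(neural_features, dim=-1)
        similarities = torch.matmul(img_features, neural_features.T) / temperature
        # Row i's positive is column i: the response recorded for image i.
        labels = torch.arange(len(img_features), device=img_features.device)
        return F.cross_entropy(similarities, labels)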
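Given trained towers like those sketched above, one possible way to approach the "Identify Key Neurons" step is gradient-times-input attribution: differentiate the matched-pair similarity with respect to the neural input and rank neurons by their mean absolute contribution. This is a generic attribution technique, not necessarily the analysis used in the paper; the untrained stand-in towers, the dimensions, and the `neuron_saliency` helper are assumptions for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def neuron_saliency(image_tower, neural_tower, images, responses):
    """Score neurons by mean |gradient x input| of the matched-pair similarity."""
    responses = responses.clone().requires_grad_(True)
    z_img = F.normalize(image_tower(images), dim=-1).detach()
    z_neu = F.normalize(neural_tower(responses), dim=-1)
    # Cosine similarity of each image with its own response, summed over the batch.
    similarity = (z_img * z_neu).sum(dim=-1).sum()
    (grad,) = torch.autograd.grad(similarity, responses)
    return (grad * responses).abs().mean(dim=0).detach()  # shape: (n_neurons,)


torch.manual_seed(0)
n_neurons, image_dim, latent_dim = 300, 1024, 64
# Untrained stand-ins; in practice these would be the trained towers.
image_tower = nn.Sequential(
    nn.Linear(image_dim, 256), nn.ReLU(), nn.Linear(256, latent_dim)
)
neural_tower = nn.Sequential(
    nn.Linear(n_neurons, 256), nn.ReLU(), nn.Linear(256, latent_dim)
)

images = torch.randn(32, image_dim)
responses = torch.randn(32, n_neurons)
scores = neuron_saliency(image_tower, neural_tower, images, responses)
top_neurons = torch.topk(scores, k=10).indices  # the 10 highest-scoring neurons
```

On untrained towers the ranking is meaningless. With trained towers, a heavy-tailed `scores` distribution would be consistent with the sparse-subset finding listed under Key Findings, though gradient-based attributions should be cross-checked with ablations before drawing conclusions.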
Applications
- Visual Neuroscience: Understanding computational mechanisms in primary visual cortex
- Brain-Computer Interfaces: Improved decoding of visual stimuli from neural activity
- Model Validation: Testing whether artificial vision models capture biological computation
- Cross-Species Comparison: Comparing V1 computation across species using shared alignment framework
- Feature Visualization: Identifying which visual features drive neural population responses
Pitfalls
- Requires large-scale neural datasets (small datasets may not capture population dynamics)
- Contrastive alignment may capture superficial correlations rather than causal mechanisms
- Feature map interpretability depends on tower architecture design choices
- Mouse V1 findings may not directly generalize to primate/human visual cortex
- Two-photon calcium imaging has lower temporal resolution than electrophysiology, which limits what can be concluded about fast response dynamics
Related Skills
- primary-visual-cortex-v1-functions
- neural-encoding-evaluation-ground-truth
- connectome-constrained-neural-network
- eeg-visual-attention-decoding