
spatiotemporal-tdann

Spatiotemporal Topographic Deep Artificial Neural Network (TDANN) methodology for modeling dorsal stream cortical self-organization. Extends TDANN to the motion-sensitive MT area using a 3D ResNet trained with MoCo self-supervised contrastive learning on naturalistic videos plus a biologically inspired spatial loss. Brain-like direction maps and pinwheel structures emerge spontaneously. Use when: modeling visual cortex topography, self-organized cortical maps, spatiotemporal neural representations, dorsal stream modeling, MoCo-based neuroscience models. Activation: spatiotemporal tdann, MT direction maps, cortical self-organization, moco vision, topographic deep network, dorsal stream model, spatial loss neural network.


Install command

npx skills add https://github.com/hiyenwong/ai_collection --skill spatiotemporal-tdann

Copy and paste this command into Claude Code to install the skill

Source

hiyenwong/ai_collection

Updated: June 4, 2026 at 02:00




Spatiotemporal TDANN for Cortical Self-Organization

A 3D ResNet-based TDANN framework. Through spatiotemporal contrastive optimization on naturalistic videos combined with a biologically inspired spatial loss, it spontaneously generates brain-like MT direction maps and pinwheel structures, pointing to a shared computational origin for ventral and dorsal stream organization.

Metadata

Source: arXiv:2605.11718
Authors: Zhaotian Gu, Molan Li, Jie Su, Chang Liu, Tianyi Qian, Dahui Wang
Published: 2026-05-12

Core Methodology

Key Innovation

Prior TDANN frameworks successfully modeled the spatial organization of the ventral stream (object recognition) but left the dorsal stream (motion processing) unexplained. This work extends TDANN to the middle temporal (MT) area by combining:

3D ResNet architecture for spatiotemporal feature extraction from video
MoCo (Momentum Contrast) self-supervised contrastive learning on naturalistic videos
Biologically inspired spatial loss that enforces topographic continuity
Dual optimization trade-off: task-driven discriminative pressure vs. spatial regularization

The model demonstrates that MT tuning properties (strong direction selectivity + residual axial component) emerge from this strict optimization trade-off, without requiring hand-coded direction-selective units.

Technical Framework

Architecture

3D ResNet backbone: Applies spatiotemporal convolutions to video clips to capture motion dynamics
Contrastive head: MoCo-style projection for self-supervised learning
Topographic layer: 2D grid of units with a spatial loss that encourages units close together in physical space to be similar in feature space

Training Paradigm: 6-Step Progressive Strategy

Direct spatial optimization on weight-sharing CNNs is highly unstable. The paper uses a six-step progressive training strategy that is essential to successful topography emergence:

1. Representation Pre-training: Train with the contrastive loss only (L_contrastive) to establish robust motion features; the spatial loss is not applied yet
2. Initial Position Initialization: Initialize unit positions based on the biological feedforward hierarchy (retina → V1 → V2 → MT → LIP) to preserve coarse retinotopy
3. Iterative Position Pre-optimization: Rearrange unit positions on the simulated cortical sheet so that units with correlated motion responses are placed closer together
4. Position Freezing: Lock positions permanently. This step is critical; do not skip it
5. Joint Fine-tuning: Fine-tune the weights with both losses
6. Full Training: Train end to end with the composite objective
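The iterative position pre-optimization step can be illustrated with a small greedy swap procedure. This is a minimal sketch under stated assumptions, not the paper's algorithm: the cost function (correlation-weighted pairwise distance), the random pair selection, and the names `layout_cost` and `pre_optimize_positions` are all hypothetical choices made for illustration.

```python
import numpy as np

def layout_cost(positions, corr):
    """Correlation-weighted sum of pairwise distances (assumed cost).

    Lower values mean that correlated units sit closer together.
    """
    diff = positions[:, None, :] - positions[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    return float((corr * dist).sum() / 2.0)

def pre_optimize_positions(positions, corr, n_iters=2000, seed=0):
    """Greedy pairwise swaps: keep a swap only if it lowers the layout cost."""
    rng = np.random.default_rng(seed)
    positions = positions.copy()
    cost = layout_cost(positions, corr)
    n_units = len(positions)
    for _ in range(n_iters):
        i, j = rng.choice(n_units, size=2, replace=False)
        positions[[i, j]] = positions[[j, i]]      # tentatively swap two units
        new_cost = layout_cost(positions, corr)
        if new_cost < cost:
            cost = new_cost                        # keep the improvement
        else:
            positions[[i, j]] = positions[[j, i]]  # revert the swap
    return positions, cost

# Toy data: 16 hypothetical units on a 4x4 sheet, responses to 50 stimuli
rng = np.random.default_rng(1)
responses = rng.normal(size=(16, 50))
corr = np.corrcoef(responses)
grid = np.stack(np.meshgrid(np.arange(4), np.arange(4)), axis=-1)
grid = grid.reshape(-1, 2).astype(float)

start = layout_cost(grid, corr)
optimized, final = pre_optimize_positions(grid, corr, n_iters=500)
```

Because swaps are only accepted when they lower the cost, the final cost can never exceed the starting cost, and the set of occupied grid positions is unchanged. Once this stage ends, the positions would be frozen before any spatial loss is applied to the weights.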

# Conceptual training loop (step 5-6 only, steps 1-4 are prerequisites)
total_loss = contrastive_loss(video_embeddings, moco_queue) \
           + lambda_spatial * spatial_regularization_loss(topographic_grid)

# MoCo: maintain momentum encoder for negative samples
# Spatial loss: nearby units in 2D grid should have similar features
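The conceptual loop above leaves `contrastive_loss` undefined. A minimal NumPy sketch of a MoCo-style InfoNCE loss and the composite objective is shown below. The temperature of 0.07 and the `lambda_spatial` default are placeholder assumptions, and `info_nce` and `total_loss` are hypothetical names; the paper's exact formulation is in the referenced detail file.

```python
import numpy as np

def info_nce(query, key, queue, temperature=0.07):
    """InfoNCE loss: (query_i, key_i) are positives, queue rows are negatives."""
    q = query / np.linalg.norm(query, axis=1, keepdims=True)
    k = key / np.linalg.norm(key, axis=1, keepdims=True)
    neg = queue / np.linalg.norm(queue, axis=1, keepdims=True)
    pos_logits = np.sum(q * k, axis=1, keepdims=True)   # (B, 1)
    neg_logits = q @ neg.T                              # (B, K)
    logits = np.concatenate([pos_logits, neg_logits], axis=1) / temperature
    logits -= logits.max(axis=1, keepdims=True)         # numerical stability
    log_prob = logits[:, 0] - np.log(np.exp(logits).sum(axis=1))
    return float(-log_prob.mean())

def total_loss(l_contrastive, l_spatial, lambda_spatial=0.25):
    """Composite objective: L_total = L_contrastive + lambda * L_spatial."""
    return l_contrastive + lambda_spatial * l_spatial

# Toy embeddings: 8 queries, 64 queued negatives, 16 dimensions
rng = np.random.default_rng(0)
query = rng.normal(size=(8, 16))
queue = rng.normal(size=(64, 16))
aligned = info_nce(query, query, queue)                       # keys match queries
mismatched = info_nce(query, rng.normal(size=(8, 16)), queue) # unrelated keys
```

The loss is lower when each key matches its query than when keys are unrelated, which is the discriminative pressure that the spatial term then has to be balanced against.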

Physiological Quantitative Alignment

The model's emergent representations quantitatively match in vivo macaque MT physiological baselines:

Direction Selectivity Index (DSI): Matches experimental measurements
Circular Variance: Consistent with biological recordings
Pinwheel Density: Macroscopic pinwheel density matches primate anatomy

The mechanism: according to the paper, MT tuning properties (strong direction selectivity plus a residual axial component) arise from a strict optimization trade-off. The network resolves the conflict by retaining an axial bimodal component rather than enforcing complete unidirectional suppression. On this account, pinwheels are not developmental artifacts but indispensable topological hubs that provide full 360° directional coverage under wiring constraints.
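The tuning metrics named above can be computed from a unit's direction tuning curve. The sketch below uses common textbook definitions, which are an assumption here: the paper may define DSI and circular variance differently. Responses are assumed to be non-negative, and the function names are hypothetical.

```python
import cmath
import math

def direction_selectivity_index(directions_deg, responses):
    """DSI = (R_pref - R_null) / (R_pref + R_null), null = preferred + 180 deg."""
    pref = max(range(len(responses)), key=lambda i: responses[i])
    null_dir = (directions_deg[pref] + 180.0) % 360.0
    # Pick the sampled direction closest to the null direction
    null = min(
        range(len(responses)),
        key=lambda i: abs((directions_deg[i] - null_dir + 180.0) % 360.0 - 180.0),
    )
    r_pref, r_null = responses[pref], responses[null]
    return (r_pref - r_null) / (r_pref + r_null)

def circular_variance(directions_deg, responses, harmonic=1):
    """1 - |sum R * exp(i*h*theta)| / sum R.

    harmonic=1 measures directional tuning, harmonic=2 measures axial tuning.
    """
    vec = sum(
        r * cmath.exp(1j * harmonic * math.radians(d))
        for d, r in zip(directions_deg, responses)
    )
    return 1.0 - abs(vec) / sum(responses)

directions = [0, 45, 90, 135, 180, 225, 270, 315]
one_way = [1.0, 0, 0, 0, 0, 0, 0, 0]    # responds to a single direction
two_way = [1.0, 0, 0, 0, 1.0, 0, 0, 0]  # responds equally to opposite directions
```

A purely unidirectional unit gives a DSI of 1 and a directional circular variance of 0. A purely axial unit gives a DSI of 0 and a directional circular variance near 1, but an axial (`harmonic=2`) circular variance near 0. A unit with strong direction selectivity plus a residual axial component would fall between these two extremes.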

Layer-to-Biological Mapping

Model Layer	Units	Cortical Area (mm²)	Neighborhood (mm)	Biological Area
Layer 2-3	200,704	5.7	0.047	Retina
Layer 4-5	100,352	1,180	2.7	V1
Layer 6	50,176	940	4.2	V2
Layer 7	50,176	50	2.1	MT (target)
Layer 8-9	25,088	56	2.7	LIP
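Some quantities implied by the table are easier to compare once derived. The short calculation below copies the table values and computes unit density and the neighborhood size relative to the sheet. Treating each sheet as a square is an assumption made only to get a side length; the table does not state the sheet shape.

```python
import math

# (units, sheet area in mm^2, neighborhood in mm), copied from the table above
LAYERS = {
    "Retina": (200_704, 5.7, 0.047),
    "V1": (100_352, 1_180.0, 2.7),
    "V2": (50_176, 940.0, 4.2),
    "MT": (50_176, 50.0, 2.1),
    "LIP": (25_088, 56.0, 2.7),
}

def sheet_summary(units, area_mm2, neighborhood_mm):
    side_mm = math.sqrt(area_mm2)  # assumes a square sheet
    return {
        "units_per_mm2": units / area_mm2,
        "side_mm": side_mm,
        "neighborhood_fraction": neighborhood_mm / side_mm,
    }

summary = {name: sheet_summary(*row) for name, row in LAYERS.items()}

for name, row in summary.items():
    print(f"{name:6s} {row['units_per_mm2']:10.1f} units/mm^2  "
          f"neighborhood = {row['neighborhood_fraction']:.1%} of side")
```

Under the square-sheet assumption, the MT neighborhood spans roughly 30% of the sheet side, against about 2% for the retina layer, so the spatial loss acts over a much larger relative extent in MT.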

Emergent Properties

Direction-selective maps: Neurons organized by preferred motion direction
Pinwheel structures: Topological singularities where all directions converge (matching biological MT density)
Tuning statistics: DSI, circular variance, and pinwheel density fall within the macaque MT ranges summarized under Physiological Quantitative Alignment
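Pinwheel centers can be located by computing a winding number around each small loop of the preferred-direction map. The sketch below is one standard way to find such singularities, not necessarily the detection method used in the paper. It assumes the map stores preferred direction in radians with a 360° period, and `find_pinwheels` is a hypothetical name.

```python
import math

def wrap(angle):
    """Wrap an angle difference into the range [-pi, pi]."""
    return math.atan2(math.sin(angle), math.cos(angle))

def find_pinwheels(direction_map):
    """Return (row, col, charge) for each 2x2 plaquette with non-zero winding."""
    found = []
    rows, cols = len(direction_map), len(direction_map[0])
    for r in range(rows - 1):
        for c in range(cols - 1):
            # Walk the four corners of the plaquette in a closed loop
            loop = [
                direction_map[r][c],
                direction_map[r][c + 1],
                direction_map[r + 1][c + 1],
                direction_map[r + 1][c],
            ]
            total = sum(wrap(loop[(k + 1) % 4] - loop[k]) for k in range(4))
            charge = round(total / (2 * math.pi))
            if charge != 0:
                found.append((r, c, charge))
    return found

# Synthetic 6x6 map with a single pinwheel centered between grid points
demo_map = [[math.atan2(r - 2.5, c - 2.5) for c in range(6)] for r in range(6)]
pinwheels = find_pinwheels(demo_map)
```

On the synthetic map, the only plaquette with non-zero winding is the one that encloses the center. Pinwheel density then follows by dividing the count by the map area in whatever units the comparison with anatomy requires.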

Optimization Trade-Off Mechanism

The key insight is that MT tuning emerges from balancing two competing pressures:

Discriminative pressure: MoCo contrastive loss pushes representations to distinguish different motion patterns
Spatial regularization: Nearby cortical units must have similar tuning

Direction-selective maps emerge as the compromise between these two pressures: units keep strong selectivity for task performance while their tuning varies smoothly across the cortical sheet.

Implementation Guide

Prerequisites

PyTorch
Naturalistic video dataset (e.g., Kinetics, Something-Something, or custom primate-relevant stimuli)
GPU for 3D ResNet training

Step-by-Step

1. Prepare video dataset: Collect naturalistic videos with diverse motion patterns
2. Build 3D ResNet: Use a standard architecture (e.g., R3D-18 or R3D-50) with a modified topographic output layer
3. Implement MoCo queue: Maintain a momentum encoder and a queue of negative samples for contrastive learning
4. Design spatial loss:

import torch
import torch.nn.functional as F

def spatial_loss(features, grid_positions, sigma=1.0):
    """Nearby grid positions should have similar features (sigma is an assumed neighborhood width)."""
    # Pairwise distance in grid space, shape (N, N)
    grid_dist = torch.cdist(grid_positions, grid_positions)
    # Pairwise cosine similarity in feature space, shape (N, N)
    unit = F.normalize(features, dim=1)
    feat_sim = unit @ unit.T
    # Weight each pair by proximity so that close pairs dominate the penalty;
    # multiplying by raw distance would penalize distant pairs instead
    proximity = torch.exp(-grid_dist.pow(2) / (2 * sigma ** 2))
    return torch.sum(proximity * (1 - feat_sim)) / proximity.sum()

5. Train with combined objective: L_total = L_contrastive + λ * L_spatial
6. Analyze emergent maps:
- Compute preferred direction for each unit
- Identify pinwheel centers (singularities in direction preference map)
- Calculate DSI, circular variance, pinwheel density
7. Validate against biological baselines: Compare to macaque MT electrophysiology data
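The "Implement MoCo queue" step has two moving parts: a momentum (exponential moving average) update of the key encoder and a fixed-size FIFO queue of past keys. The sketch below shows both with plain Python values standing in for tensors. The momentum of 0.999, the dictionary-of-floats parameter format, and the names `momentum_update` and `NegativeQueue` are assumptions for illustration.

```python
from collections import deque

def momentum_update(key_params, query_params, momentum=0.999):
    """Exponential moving average: key <- m * key + (1 - m) * query."""
    return {
        name: momentum * key_params[name] + (1.0 - momentum) * query_params[name]
        for name in key_params
    }

class NegativeQueue:
    """Fixed-size FIFO of key embeddings; the oldest entries are dropped first."""

    def __init__(self, max_size):
        self._items = deque(maxlen=max_size)

    def enqueue(self, keys):
        self._items.extend(keys)

    def negatives(self):
        return list(self._items)

# Toy usage with scalar "parameters" and integer "embeddings"
updated = momentum_update({"w": 1.0}, {"w": 0.0}, momentum=0.9)

queue = NegativeQueue(max_size=4)
queue.enqueue([1, 2, 3])
queue.enqueue([4, 5])  # exceeds capacity, so the oldest entry (1) is evicted
```

In a PyTorch implementation the same update would typically run under `torch.no_grad()` over the encoder's parameters, and the queue would usually be a preallocated tensor with a write pointer rather than a `deque`.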

Code Example

import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision.models.video import r3d_18

class SpatiotemporalTDANN(nn.Module):
    def __init__(self, grid_size=32, feature_dim=512):
        super().__init__()
        # weights=None trains from scratch (replaces the deprecated pretrained=False)
        self.backbone = r3d_18(weights=None)
        self.backbone.fc = nn.Linear(512, feature_dim)
        self.grid_size = grid_size
        self.feature_dim = feature_dim

        # Topographic projection layer: one feature vector per grid position.
        # In this sketch the map is a free parameter, not yet tied to backbone activations.
        self.topographic_map = nn.Parameter(
            torch.randn(grid_size, grid_size, feature_dim)
        )

    def forward(self, video):
        # video: (B, C, T, H, W)
        features = self.backbone(video)  # (B, feature_dim)
        return features

    def spatial_regularization(self):
        """Enforce smooth topographic organization"""
        grid = self.topographic_map
        # Cosine dissimilarity between each unit and its upper / left neighbor
        vertical = 1 - F.cosine_similarity(grid[1:], grid[:-1], dim=-1)
        horizontal = 1 - F.cosine_similarity(grid[:, 1:], grid[:, :-1], dim=-1)
        return (vertical.sum() + horizontal.sum()) / (self.grid_size ** 2)

Applications

Dorsal stream modeling: Study motion processing hierarchy in visual cortex
Cortical self-organization: Understand how topographic maps emerge from learning rules
Neuromorphic vision: Bio-inspired motion detection for event-based cameras
Computational neuroscience: Unified framework for ventral + dorsal stream organization
Visual AI: More robust motion understanding through biologically grounded representations

Pitfalls

3D ResNet memory: Video processing is memory-intensive; use smaller batch sizes or gradient accumulation
Position initialization matters: Unit positions must be initialized based on biological hierarchy before spatial optimization. Random initialization leads to suboptimal topography.
Weight-sharing instability: Direct spatial optimization on weight-sharing CNNs is highly unstable. The progressive 6-step training strategy is essential — do not skip pre-training steps.
MoCo queue size: Large queues improve contrastive quality but require significant memory
Evaluation complexity: Pinwheel detection requires specialized algorithms for topological singularity identification
Naturalistic data: Synthetic motion stimuli may not produce the same emergent properties as real-world videos

Related Skills

kuramoto-oscillatory-phase-encoding (neuro-inspired vision)
eeg-structure-guided-diffusion (structure-guided neural modeling)
brain-inspired-attention-mechanisms (brain-inspired vision)
trace-eeg-autoregressive-routing (autoregressive MoE for EEG, different domain)

References

references/paper-detail-2605.11718.md — Spatial loss formula, MoCo loss, mechanistic insights, and limitations