tribe-v2-foundation-model

TRIBE v2 tri-modal foundation model methodology for in-silico neuroscience. Uses video/audio/language embeddings to predict whole-brain fMRI across 720 subjects.

Install command

npx skills add https://github.com/hiyenwong/ai_collection --skill tribe-v2-foundation-model

Copy and paste this command into Claude Code to install the skill

Source: hiyenwong/ai_collection

Stars: 1

Forks: 0

Updated: June 4, 2026 at 02:00

name	tribe-v2-foundation-model
category	ai_collection
description	TRIBE v2 tri-modal foundation model methodology for in-silico neuroscience. Uses video/audio/language embeddings to predict whole-brain fMRI across 720 subjects.
created	"2026-05-09T00:00:00.000Z"
updated	"2026-05-09T00:00:00.000Z"
source	arXiv 2605.04326

TRIBE v2: Tri-Modal Foundation Model for In-Silico Neuroscience

Overview

TRIBE v2 is a foundation model that predicts high-resolution fMRI brain activity from video, audio, and text stimuli. Built on a unified dataset of 1000+ hours of fMRI across 720 subjects, it enables zero-shot generalization to novel stimuli, tasks, and subjects.

arXiv: 2605.04326 (May 2026)
Authors: d'Ascoli, Rapin, Benchetrit, Brooks, Begany, Raugel, Banville, King (FAIR at Meta)
Code: https://github.com/facebookresearch/tribev2
Weights: https://huggingface.co/facebook/tribev2

Core Methodology

Architecture

Transformer encoder with modality dropout and subject-specific blocks
Tri-modal input: video (ImageBind/Vision), audio (Wav2Vec), language (LLaMA) embeddings
Unseen subject prediction: learns population-level priors for zero-shot subject generalization
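
The subject-specific blocks can be pictured as one linear map per subject applied to a shared latent. The sketch below is a toy illustration in plain Python, not the released implementation: the names `apply_adapter`, `mean_adapter`, `predict`, and the subject IDs are hypothetical, and falling back to the average adapter for an unseen subject is an assumption made here for illustration only; the source does not say how the population-level prior is realised.

```python
def apply_adapter(weights, bias, latent):
    """Linear map: out[i] = sum_j weights[i][j] * latent[j] + bias[i]."""
    return [
        sum(w * x for w, x in zip(row, latent)) + b
        for row, b in zip(weights, bias)
    ]


def mean_adapter(adapters):
    """Average weights and biases element-wise across all known subjects."""
    n = len(adapters)
    all_w = [w for w, _ in adapters.values()]
    all_b = [b for _, b in adapters.values()]
    mean_w = [[sum(vals) / n for vals in zip(*rows)] for rows in zip(*all_w)]
    mean_b = [sum(vals) / n for vals in zip(*all_b)]
    return mean_w, mean_b


# Hypothetical adapters: 2-dim shared latent -> 3 output vertices.
adapters = {
    "sub-01": ([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]], [0.0, 0.0, 0.0]),
    "sub-02": ([[3.0, 0.0], [0.0, 1.0], [1.0, 3.0]], [2.0, 0.0, 0.0]),
}


def predict(subject_id, latent):
    """Use the subject's own adapter, else the population average (assumption)."""
    weights, bias = adapters.get(subject_id) or mean_adapter(adapters)
    return apply_adapter(weights, bias, latent)


latent = [1.0, 2.0]
seen = predict("sub-01", latent)    # known subject: own adapter
unseen = predict("sub-99", latent)  # unknown subject: averaged adapter
print(seen, unseen)
```

Keeping the subject-specific part linear means individual variability is confined to a small number of parameters, while everything upstream is shared across subjects.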

Training Pipeline

Feature Extraction: Pre-trained AI models extract embeddings per modality
Modality Dropout: Randomly drop modalities during training for robustness
Subject Blocks: Per-subject linear adapters capture individual variability
High-Resolution fMRI: Predicts at vertex-level resolution (not just ROI averages)
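
Modality dropout, as listed above, can be sketched in a few lines. This is a minimal stdlib-only illustration under stated assumptions: the function name `modality_dropout`, the drop probability `p_drop`, zero-filling as the way a modality is removed, and the rule that at least one modality is always kept are all choices made for this example rather than details confirmed by the source.

```python
import random


def modality_dropout(features, p_drop=0.3, rng=random):
    """Zero out whole modalities at random, always keeping at least one.

    features: dict mapping modality name -> embedding (list of floats).
    p_drop:   hypothetical per-modality drop probability.
    """
    kept = [name for name in features if rng.random() >= p_drop]
    if not kept:
        # Never drop everything: keep one modality chosen at random.
        kept = [rng.choice(sorted(features))]
    return {
        name: (list(vec) if name in kept else [0.0] * len(vec))
        for name, vec in features.items()
    }


rng = random.Random(0)
sample = {"video": [0.2, 0.5], "audio": [0.1, 0.9], "text": [0.7, 0.3]}
dropped = modality_dropout(sample, p_drop=0.5, rng=rng)
print(dropped)
```

Training on inputs with missing modalities is what lets the same model later be queried with text-only, audio-only, or video-only stimuli.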

Four Essential Criteria

Integration: Whole-brain responses across diverse experimental conditions
Performance: Exceeds traditional linear encoding models
Generalization: Zero-shot to novel stimuli, tasks, and subjects
Interpretability: Decomposes predicted responses into components that reflect how cognitive functions are organized

Key Findings

Encoding Performance

Accurately predicts both cortical and subcortical responses across naturalistic and experimental conditions
Several-fold improvements over classic linear baselines
Scaling laws: performance increases with more training hours per subject
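
Comparisons like the ones above need a per-vertex score. A common choice in the encoding-model literature is the Pearson correlation between predicted and measured time series; the sketch below assumes that metric (the source does not name one) and uses made-up numbers, with `pearson` and `encoding_scores` as hypothetical helper names.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0.0 or sy == 0.0:
        return 0.0
    return cov / (sx * sy)


def encoding_scores(predicted, measured):
    """Inputs are time x vertex lists; returns one Pearson r per vertex."""
    return [pearson(p, m) for p, m in zip(zip(*predicted), zip(*measured))]


# Toy data: 4 time points, 2 vertices.
measured = [[1.0, 4.0], [2.0, 3.0], [3.0, 2.0], [4.0, 1.0]]
predicted = [[1.1, 1.0], [2.0, 2.0], [2.9, 3.0], [4.2, 4.0]]
scores = encoding_scores(predicted, measured)
print(scores)  # vertex 0 is tracked well, vertex 1 is anti-correlated
```

Running the same scoring function on a linear baseline and on the foundation model gives the per-vertex maps that a "several-fold improvement" claim would be read from.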

Zero-Shot Generalization

Generalizes to new subjects without fine-tuning
Generalizes to new tasks/paradigms not seen during training
Fine-tuning on half a subject's data significantly improves individualized predictions

In-Silico Experiments

Visual: Recovers face/body selectivity (FFA, EBA), object-selective cortex findings
Language: Recovers sentence > word, social > non-social, emotional > neutral contrasts
Agreement between predicted and actual z-scores across 360 HCP parcels
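
An in-silico contrast of this kind can be sketched as follows: average the response per parcel in each condition, take the difference, and compare the predicted contrast map with the measured one. This is a simplified illustration with invented numbers and three parcels instead of 360; the helper names are hypothetical, and z-scoring across parcels stands in for whatever statistical z-map the original analysis used.

```python
import math
from statistics import mean, pstdev


def contrast_map(cond_a, cond_b):
    """Per-parcel mean difference between two conditions (trial x parcel)."""
    return [mean(a) - mean(b) for a, b in zip(zip(*cond_a), zip(*cond_b))]


def zscore(values):
    """Standardise a map across parcels (simplification, see lead-in)."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]


def pearson(x, y):
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    norm = math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))
    return cov / norm


# Toy responses: 2 trials x 3 parcels per condition (faces vs objects).
pred_faces = [[2.0, 0.5, 0.1], [2.2, 0.4, 0.0]]
pred_objects = [[0.5, 0.6, 0.1], [0.7, 0.5, 0.2]]
real_faces = [[1.8, 0.3, 0.2], [2.0, 0.5, 0.2]]
real_objects = [[0.4, 0.5, 0.3], [0.6, 0.5, 0.1]]

pred_z = zscore(contrast_map(pred_faces, pred_objects))
real_z = zscore(contrast_map(real_faces, real_objects))
agreement = pearson(pred_z, real_z)
print(pred_z, real_z, agreement)
```

A high correlation between the two maps is the kind of agreement the bullet above describes; the first parcel plays the role of a face-selective region in this toy setup.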

Interpretability (ICA)

Independent Component Analysis reveals neuroscientifically relevant patterns
Components correlate with known functional networks (visual, language, default mode, etc.)

Multimodal Integration

Text-only, audio-only, video-only ablations reveal modality-specific regions
Bimodal/multimodal integration areas identified via RGB cortical mapping
Reveals fine-grained topography of multisensory integration
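
One way to read the RGB mapping is that each vertex gets a colour whose three channels encode how well each unimodal ablation predicts it. The sketch below assumes video maps to red, audio to green, and text to blue, and that each channel is scaled by its own maximum; both choices, and the name `to_rgb`, are assumptions made for illustration.

```python
def to_rgb(video_scores, audio_scores, text_scores):
    """Map per-vertex unimodal scores to (r, g, b) tuples in [0, 1]."""

    def scale(values):
        clipped = [max(v, 0.0) for v in values]  # negative scores count as 0
        top = max(clipped)
        return [v / top if top > 0 else 0.0 for v in clipped]

    channels = (scale(video_scores), scale(audio_scores), scale(text_scores))
    return list(zip(*channels))


# Toy scores for 3 vertices from video-only, audio-only, text-only models.
video = [0.5, 0.0, 0.25]
audio = [0.0, 0.5, 0.5]
text = [0.0, 0.125, 0.25]
colours = to_rgb(video, audio, text)
print(colours)
```

Under this scheme a pure red vertex is driven by video alone, while mixed colours (for example cyan for audio plus text) mark candidate bimodal regions and near-white marks trimodal integration.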

Dataset Scale

Dataset	Mode	Subjects	fMRI (h)	Purpose
CNeuroMod	A+V+T	4	268.7	Train (deep)
BoldMoments	A+V	10	61.9	Train
Lebel2023	A+T	8	85.8	Train
Wen2017	V	3	35.2	Train
NNDb	A+V+T	86	160.6	Test (wide)
LPP	A+T	112	180.2	Test
Narratives	A+T	321	146.6	Test
HCP	A+V+T	176	178.7	Test (7T)

Total: 720 subjects, 5094 sessions, 1117.7 hours of fMRI. In the Mode column, A = audio, V = video, T = text.
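
The stated totals can be cross-checked against the table rows. The short script below copies the rows from the table above and sums them; grouping by the first word of the Purpose column is a convenience chosen for this check.

```python
# Rows copied from the table: (dataset, mode, subjects, fMRI hours, purpose)
rows = [
    ("CNeuroMod", "A+V+T", 4, 268.7, "Train (deep)"),
    ("BoldMoments", "A+V", 10, 61.9, "Train"),
    ("Lebel2023", "A+T", 8, 85.8, "Train"),
    ("Wen2017", "V", 3, 35.2, "Train"),
    ("NNDb", "A+V+T", 86, 160.6, "Test (wide)"),
    ("LPP", "A+T", 112, 180.2, "Test"),
    ("Narratives", "A+T", 321, 146.6, "Test"),
    ("HCP", "A+V+T", 176, 178.7, "Test (7T)"),
]

total_subjects = sum(r[2] for r in rows)
total_hours = round(sum(r[3] for r in rows), 1)

# Subjects and hours per purpose, keyed by "Train" or "Test".
split = {}
for _, _, subjects, hours, purpose in rows:
    key = purpose.split()[0]
    n, h = split.get(key, (0, 0.0))
    split[key] = (n + subjects, h + hours)
split = {k: (n, round(h, 1)) for k, (n, h) in split.items()}

print(total_subjects, total_hours, split)
```

The sums reproduce the stated 720 subjects and 1117.7 hours. The session count (5094) is not broken down per dataset in the table, so it cannot be checked the same way.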

Application Triggers

Use this skill when:

Building brain encoding/decoding models
Designing in-silico neuroscience experiments
Working with fMRI prediction tasks
Studying multimodal brain integration
Analyzing zero-shot brain response generalization
Comparing foundation models to linear baselines

Key Concepts

Foundation model for neuroscience
Tri-modal (video/audio/text) brain encoding
In-silico hypothesis testing
Zero-shot subject generalization
Fine-grained cortical parcellation (HCP-MMP1)
Independent Component Analysis (ICA) for interpretability
Modality dropout for robustness
Scaling laws in brain encoding

Limitations

fMRI temporal resolution limits (slow hemodynamic response)
Primarily trained on healthy adult subjects
In-silico experiments approximate but don't replace empirical validation
3T vs 7T resolution differences across datasets

Related Skills

brain-dit-fmri-foundation-model-v6
meta-learning-in-context-brain-decoding-v5
multimodal-brain-connectivity-gnn
geometric-brain-dynamics-mapping-v7
