
neural-code-language-interpretability

Natural language hypothesis generation and verification for single-neuron selectivity. Combines vision-language models, neural digital twins, and text-to-image generation to automatically characterize what individual neurons encode across the visual hierarchy. Use for neuron interpretability, automated neuroscience discovery, digital twin validation, language-based neural characterization, closed-loop hypothesis testing.


Overview

Install command

npx skills add https://github.com/hiyenwong/ai_collection --skill neural-code-language-interpretability

Copy and paste this command into Claude Code to install the skill

Source: hiyenwong/ai_collection

Stars: 1

Forks: 0

Updated: June 4, 2026 at 02:00

SKILL.md


name: neural-code-language-interpretability

Neural Code Language Interpretability (NCLI)

Automated characterization of single-neuron selectivity through natural language, combining digital twins, vision-language models, and generative testing in a closed-loop framework.

Paper: "Letting the Neural Code Speak: Automated Characterization of Monkey Visual Neurons through Human Language"
Authors: Vedang Lad, Katrin Franke, Tamar Rott Shaham, Surya Ganguli, Andreas S. Tolias, Sophia Sanborn, Nikos Karantzas
arXiv: 2605.12485 (May 12, 2026)

Core Insight

Natural language can serve as a compact, interpretable coordinate system for neural selectivity. It replaces mathematical models (such as Gabor functions in V1) with semantic descriptions that generalize across the visual hierarchy.

Three-Stage Framework

Stage 1: Translate (Image → Text)

Convert images into detailed, reconstruction-faithful captions:

Input: Natural images (grayscale for V1, RGB for V4)
Model: Gemini 3.0 Pro (dense captioning)
Key instruction: Prioritize visually grounded details sufficient for reconstruction (colors, textures, spatial relations, lighting) rather than semantic summaries
Validation: Regenerate each image from its caption with a text-to-image model (Imagen 4.0), then measure the cosine similarity between original and reconstruction in DINOv3 embedding space
Quality check: Similarity between each image and its own reconstruction should far exceed similarity between randomly paired images
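The following is a minimal sketch of the Stage 1 quality check. It assumes the DINOv3 embeddings of the originals and their reconstructions have already been computed and are passed in as plain lists of floats; embedding extraction is not shown. The function names are illustrative. The random-pairing baseline here averages over all mismatched pairs rather than a sampled subset, which is also an assumption.

```python
import math
from statistics import mean


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def reconstruction_check(orig_embs, recon_embs):
    """Return (matched, random) mean similarities.

    matched: image i vs. its own reconstruction i.
    random:  image i vs. reconstruction j for every i != j.
    Embeddings are assumed to come from a vision encoder such as DINOv3.
    """
    n = len(orig_embs)
    matched = [cosine(orig_embs[i], recon_embs[i]) for i in range(n)]
    mismatched = [
        cosine(orig_embs[i], recon_embs[j])
        for i in range(n)
        for j in range(n)
        if i != j
    ]
    return mean(matched), mean(mismatched)


if __name__ == "__main__":
    # Toy embeddings: each reconstruction is a slightly perturbed copy of its original.
    originals = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    recons = [[0.9, 0.1, 0.0], [0.1, 0.9, 0.0], [0.0, 0.1, 0.9]]
    m, r = reconstruction_check(originals, recons)
    print(f"matched={m:.3f} random={r:.3f}")
```

A caption set passes this check when the matched mean is clearly larger than the random mean.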

Stage 2: Semantic Hypothesis (Text → Hypothesis)

Distill extreme responses into concise, testable descriptions:

Use a functional digital twin (neuron-specific readouts on a pretrained CNN backbone) to screen ~1.2M ImageNet images
Identify the Most Activating Images (MAIs) and Least Activating Images (LAIs) for each neuron
Select the top 15 and bottom 15 images per neuron
Feed their dense captions to an LLM to produce:
- Excitatory hypothesis: What features drive the neuron
- Suppressive hypothesis: What features suppress the neuron (only for non-sparse neurons with baseline activity)
- Dimensions of invariance
- Spatial properties (orientation, scale)
Sparsity threshold: Neurons whose response distributions have skewness < 2 count as "non-sparse" (suppressible)
Results span the visual hierarchy: V1 = oriented edges, spatial frequency; V4 = conjunctions of form, color, texture
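The screening and selection steps above can be sketched as follows. This assumes the twin's predicted responses for one neuron are available as a dict mapping image ID to response. All helper names and the prompt wording are hypothetical. Skewness is computed as the population (biased) Fisher-Pearson coefficient, because the source does not say which estimator the paper uses.

```python
def skewness(values):
    """Population (biased) Fisher-Pearson skewness: E[(x - mu)^3] / sigma^3."""
    n = len(values)
    mu = sum(values) / n
    m2 = sum((x - mu) ** 2 for x in values) / n
    if m2 == 0:
        return 0.0  # constant responses: treat as symmetric
    m3 = sum((x - mu) ** 3 for x in values) / n
    return m3 / m2 ** 1.5


def select_extremes(responses, k=15, skew_threshold=2.0):
    """Return (MAIs, LAIs, non_sparse) for one neuron.

    responses: dict image_id -> predicted response from the digital twin.
    LAIs are only returned for non-sparse neurons (skewness < skew_threshold).
    """
    ranked = sorted(responses, key=responses.get, reverse=True)
    mais = ranked[:k]                       # strongest responses first
    non_sparse = skewness(list(responses.values())) < skew_threshold
    lais = ranked[-k:][::-1] if non_sparse else []  # weakest responses first
    return mais, lais, non_sparse


def build_hypothesis_prompt(mai_captions, lai_captions):
    """Assemble a hypothetical LLM prompt from dense captions (wording is illustrative)."""
    lines = ["Captions of images that most strongly activate the neuron:"]
    lines += [f"- {c}" for c in mai_captions]
    if lai_captions:
        lines.append("Captions of images that most strongly suppress the neuron:")
        lines += [f"- {c}" for c in lai_captions]
    lines.append(
        "Describe the excitatory features, suppressive features (if any), "
        "dimensions of invariance, and spatial properties (orientation, scale)."
    )
    return "\n".join(lines)
```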

Stage 3: Verification (Hypothesis → Image → Response)

Closed-loop generative testing:

An LLM expands each hypothesis into multiple diverse text prompts
A text-to-image model synthesizes novel stimuli from the prompts
The digital twin predicts neural responses to the generated images
Spatial optimization: Affine transformations (rotation, scale, translation) find the best alignment with the receptive field
Control: The same spatial optimization is applied to random images (controls for purely spatial effects)

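Success criterion: Generated images drive the neuron's predicted response into the extreme percentiles of its response distribution over natural images (above the 95th percentile for excitatory hypotheses, below the 5th for suppressive ones)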
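The closed loop and its success criterion can be sketched as below. `expand`, `generate`, and `predict` are hypothetical stand-ins for the LLM, the text-to-image model, and the digital twin, and the spatial-optimization step is omitted. Judging success by the single most extreme generated response (the maximum for excitatory, the minimum for suppressive hypotheses) is an assumption. The source does not say how responses to multiple generated images are aggregated.

```python
from bisect import bisect_left


def percentile_rank(natural_responses, value):
    """Percentage of natural-image responses strictly below `value`."""
    ordered = sorted(natural_responses)
    return 100.0 * bisect_left(ordered, value) / len(ordered)


def hypothesis_succeeds(natural_responses, generated_responses, kind):
    """Check whether generated stimuli reach the extreme tail of the natural distribution."""
    if kind == "excitatory":
        # The strongest generated response must exceed the 95th percentile.
        return percentile_rank(natural_responses, max(generated_responses)) > 95.0
    if kind == "suppressive":
        # The weakest generated response must fall below the 5th percentile.
        return percentile_rank(natural_responses, min(generated_responses)) < 5.0
    raise ValueError(f"unknown hypothesis kind: {kind!r}")


def verify_hypothesis(hypothesis, expand, generate, predict, natural_responses, kind):
    """One pass of the loop: hypothesis -> prompts -> images -> predicted responses."""
    prompts = expand(hypothesis)                  # LLM (hypothetical callable)
    images = [generate(p) for p in prompts]       # text-to-image model (hypothetical)
    responses = [predict(img) for img in images]  # digital twin (hypothetical)
    return hypothesis_succeeds(natural_responses, responses, kind)
```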

Key Results

Metric	V1	V4
Activating hypothesis success rate (>95th percentile)	>96%	>96%
Suppressing hypothesis success rate (<5th percentile)	56%	97.6%
Random image baseline	~10%	~10%

V4 suppression is highly describable in language (97.6% success), whereas V1 suppression is not (56%)
This suggests that language is limited in how well it can express sub-lexical properties in early visual cortex
Representation Similarity Analysis (RSA) shows partial alignment among neural activity, DINOv3 visual embeddings, and Qwen 0.6B language embeddings
Vision embeddings are the most aligned with neural activity
Information lost in the text bottleneck is recovered when hypotheses are rendered back into images, so linguistic compression is lossy but semantically faithful
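The RSA comparison can be sketched as follows. First, build a representational dissimilarity matrix (RDM) for each representation over the same images. Then correlate the upper triangles of the RDMs. This sketch uses 1 - Pearson r as the dissimilarity and Spearman rank correlation to compare RDMs. Both are common RSA choices, but the source does not specify which metrics the paper used. Inputs are assumed to be per-image feature vectors, for example population responses, DINOv3 embeddings, or Qwen embeddings of captions.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def ranks(values):
    """1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result


def rdm_upper(features):
    """Upper triangle (i < j) of a 1 - Pearson-r dissimilarity matrix over images."""
    n = len(features)
    return [1 - pearson(features[i], features[j]) for i in range(n) for j in range(i + 1, n)]


def rsa_score(features_a, features_b):
    """Spearman correlation between the RDMs of two representations of the same images."""
    return pearson(ranks(rdm_upper(features_a)), ranks(rdm_upper(features_b)))
```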

Implementation Details

Digital Twin Architecture

Neuron-specific readout layers on top of a pretrained CNN backbone
Trained on single-neuron spiking data from natural-image fixation experiments
Predicts responses accurately enough to enable in-silico screening at scale
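The "neuron-specific readout on a shared backbone" design can be sketched as a per-neuron linear readout over fixed backbone features. A nonlinearity then keeps predicted firing rates positive. This is a hypothetical minimal form. The backbone is stubbed out as precomputed feature vectors, and the plain linear readout and softplus output are assumptions, not the paper's stated architecture.

```python
import math


def softplus(x):
    """Numerically stable softplus, log(1 + exp(x)); always positive."""
    return math.log1p(math.exp(-abs(x))) + max(x, 0.0)


class NeuronReadout:
    """Hypothetical per-neuron readout on top of a frozen, shared backbone."""

    def __init__(self, weights, bias=0.0):
        self.weights = weights  # one weight per backbone feature
        self.bias = bias

    def predict(self, features):
        """Map backbone features for one image to a predicted firing rate."""
        drive = sum(w * f for w, f in zip(self.weights, features)) + self.bias
        return softplus(drive)


def screen(readouts, image_features):
    """In-silico screening of every neuron against every image.

    readouts: dict neuron_id -> NeuronReadout
    image_features: dict image_id -> backbone feature vector (assumed precomputed)
    Returns {neuron_id: {image_id: predicted_response}}.
    """
    return {
        nid: {img: r.predict(feats) for img, feats in image_features.items()}
        for nid, r in readouts.items()
    }
```

Because the backbone is shared, features for the ~1.2M images only need to be computed once. Each additional neuron then adds only a cheap readout pass.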

Image Dataset

1.2 million ImageNet images
V1: grayscale stimuli (experimental protocol)
V4: full-color stimuli
Screened through digital twin to identify extreme-response sets

Language Models Used

Captioning: Gemini 3.0 Pro
Hypothesis generation: Gemini 3.0 Pro (single-pass, no iterative refinement against neural data)
Text-to-image: Imagen 4.0
Language embeddings for RSA: Qwen 0.6B

Spatial Optimization

Systematic search over: rotation, scale, translation
Two-stage: semantic generation → spatial optimization
Separates the contribution of feature content from that of spatial arrangement
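The spatial-optimization step can be sketched as an exhaustive grid search over affine parameters. The search keeps whichever transform gives the most extreme predicted response. The source states only that rotation, scale, and translation are searched systematically. The grid values, the `transform` and `predict` callables, and the toy response function below are all hypothetical placeholders. Running the same search on random images gives the spatial-only control.

```python
from itertools import product


def spatial_search(image, transform, predict, rotations, scales, shifts, maximize=True):
    """Grid-search affine parameters for one stimulus.

    transform(image, angle, scale, dx, dy) -> transformed image (hypothetical).
    predict(image) -> digital-twin response (hypothetical).
    Returns (best_response, (angle, scale, dx, dy)); the first best is kept on ties.
    """
    best = None
    for angle, scale, (dx, dy) in product(rotations, scales, shifts):
        response = predict(transform(image, angle, scale, dx, dy))
        better = best is None or (response > best[0] if maximize else response < best[0])
        if better:
            best = (response, (angle, scale, dx, dy))
    return best


# Toy stand-ins: the "transformed image" is just the parameter tuple, and the
# fake neuron prefers 45 degrees, unit scale, and a centered stimulus.
def toy_transform(img, angle, scale, dx, dy):
    return (angle, scale, dx, dy)


def toy_predict(params):
    angle, scale, dx, dy = params
    return -((angle - 45) / 45) ** 2 - (scale - 1.0) ** 2 - 0.01 * (dx * dx + dy * dy)


TOY_GRID = dict(rotations=[0, 45, 90, 135], scales=[0.5, 1.0, 2.0], shifts=[(0, 0), (4, 0), (0, 4)])

if __name__ == "__main__":
    print(spatial_search(None, toy_transform, toy_predict, **TOY_GRID))
```

Setting `maximize=False` searches for the most suppressive placement instead, which matches the suppressive-hypothesis case.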

Applications

For Neuroscience Research

Automated neuron characterization at scale (438 V1 + 205 V4 neurons demonstrated)
Hypothesis generation for higher visual areas where no mathematical framework exists
Population-level analysis of neural code geometry across modalities
Bridging predictive accuracy and mechanistic understanding

For AI Interpretability

Extends to artificial neural networks (CNNs, ViTs)
Closed-loop validation distinguishes it from correlational methods
Language as a universal coordinate system for unit selectivity

For Agentic Scientific Discovery

Framework enables autonomous hypothesis generation and testing
Combines generative models with neural digital twins
Moves toward automated, interpretable neuroscience

Limitations

Single-pass hypothesis generation: Not iteratively refined against neural responses
Language expressibility limits: V1 suppression poorly captured (56% vs 97.6% in V4)
Weaker criterion than equivalence: Success means extreme tail activation, not full equivalence to original MAIs/LAIs
Spatial positioning: Vision-language models unreliable at specifying feature location/scale/orientation in receptive field
Requires high-quality digital twin: Prediction accuracy is prerequisite
Preprint status: Not yet peer-reviewed

Related Methods

Network Dissection (Bau et al., 2017): Matches units to a fixed set of human-annotated concepts
CLIP-based neuron labeling: Correlational, not closed-loop
Activation maximization: Optimizes stimuli in pixel space; results are often hard to interpret
LLM-generated descriptions of LLM units (Bills et al., 2023): For language model internals
Agent-driven hypothesis generation (Rott Shaham et al., 2024): Extended here to biological neurons

Activation Keywords

neural code interpretability
neuron characterization
digital twin neuroscience
language-based neural analysis
closed-loop hypothesis testing
automated neuroscience
semantic hypothesis generation
visual cortex neuron selectivity
V1 V4 neuron encoding
neural digital twin
language-brain alignment
representation similarity analysis
