| name | nnsight-remote-interpretability |
| description | Provides guidance for interpreting and manipulating neural network internals using nnsight with optional NDIF remote execution. Use when needing to run interpretability experiments on massive models (70B+) without local GPU resources, or when working with any PyTorch architecture. |
| version | 1.0.0 |
| author | Orchestra Research |
| license | MIT |
| tags | ["nnsight","NDIF","Remote Execution","Mechanistic Interpretability","Model Internals"] |
| dependencies | ["nnsight>=0.5.0","torch>=2.0.0"] |
nnsight: Transparent Access to Neural Network Internals
nnsight (/ɛn.saɪt/) enables researchers to interpret and manipulate the internals of any PyTorch model, with the unique capability of running the same code locally on small models or remotely on massive models (70B+) via NDIF.
GitHub: ndif-team/nnsight (730+ stars)
Paper: NNsight and NDIF: Democratizing Access to Foundation Model Internals (ICLR 2025)
Key Value Proposition
Write once, run anywhere: The same tracing API works on GPT-2 locally or Llama-3.1-405B remotely. Just pass remote=True; only the module paths change with the architecture.
# Local: GPT-2 on your own hardware
with model.trace("Hello world"):
    hidden = model.transformer.h[5].output[0].save()
# Remote: a Llama model served by NDIF (same API, different module path)
with model.trace("Hello world", remote=True):
    hidden = model.model.layers[40].output[0].save()
When to Use nnsight
Use nnsight when you need to:
- Run interpretability experiments on models too large for local GPUs (70B, 405B)
- Work with any PyTorch architecture (transformers, Mamba, custom models)
- Perform multi-token generation interventions
- Share activations between different prompts
- Access full model internals without reimplementation
Consider alternatives when:
- You want a consistent API across models → Use TransformerLens
- You need declarative, shareable interventions → Use pyvene
- You're training SAEs → Use SAELens
- You only work with small models locally → TransformerLens may be simpler
Installation
# Basic installation
pip install nnsight
# With the optional vLLM extra
pip install "nnsight[vllm]"
For remote NDIF execution, sign up at login.ndif.us for an API key.
Core Concepts
LanguageModel Wrapper
from nnsight import LanguageModel
model = LanguageModel("openai-community/gpt2", device_map="auto")
model = LanguageModel("meta-llama/Llama-3.1-8B", device_map="auto")
Tracing Context
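The trace context manager enables deferred execution: operations written inside the block are collected into a computation graph and run when the block exits.
from nnsight import LanguageModel
model = LanguageModel("gpt2", device_map="auto")
with model.trace("The Eiffel Tower is in") as tracer:
    # Modules are accessed in the order the forward pass runs them
    # Attention weights entering the layer 5 dropout module
    attn = model.transformer.h[5].attn.attn_dropout.input[0][0].save()
    # Layer 5 hidden states (first element of the block's output tuple)
    hidden_states = model.transformer.h[5].output[0].save()
    # Intervention: zero out the layer 8 hidden states in place
    model.transformer.h[8].output[0][:] = 0
    # Final logits, shape [batch, seq, vocab]
    logits = model.lm_head.output.save()
# Saved values are available after the trace exits
print(hidden_states.shape)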
Proxy Objects
Inside trace, module accesses return Proxy objects that record operations:
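with model.trace("Hello"):
    # Module access returns a proxy for the layer 5 hidden states
    h5_out = model.transformer.h[5].output[0]
    # Operations on a proxy are recorded as well
    h5_mean = h5_out.mean(dim=-1)
    # .save() keeps the result available after the trace
    h5_saved = h5_mean.save()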
Workflow 1: Activation Analysis
Step-by-Step
from nnsight import LanguageModel
import torch
model = LanguageModel("gpt2", device_map="auto")
prompt = "The capital of France is"
with model.trace(prompt) as tracer:
    layer_outputs = []
    attn_patterns = []
    # One pass over the 12 GPT-2 blocks, in forward-pass order
    for i in range(12):
        # Attention weights entering dropout come before the block output
        attn = model.transformer.h[i].attn.attn_dropout.input[0][0].save()
        attn_patterns.append(attn)
        layer_out = model.transformer.h[i].output[0].save()
        layer_outputs.append(layer_out)
    # Final logits, shape [batch, seq, vocab]
    logits = model.lm_head.output.save()
# Inspect the saved activations after the trace
for i, layer_out in enumerate(layer_outputs):
    print(f"Layer {i} output shape: {layer_out.shape}")
    print(f"Layer {i} norm: {layer_out.norm().item():.3f}")
# Top-5 next-token predictions at the last position
probs = torch.softmax(logits[0, -1], dim=-1)
top_tokens = probs.topk(5)
for token, prob in zip(top_tokens.indices, top_tokens.values):
    print(f"{model.tokenizer.decode(token)}: {prob.item():.3f}")
Workflow 2: Activation Patching
Step-by-Step
from nnsight import LanguageModel
import torch
model = LanguageModel("gpt2", device_map="auto")
# Both prompts must tokenize to the same length for a full-tensor patch
clean_prompt = "The Eiffel Tower is in"
corrupted_prompt = "The Colosseum is in"
# Step 1: cache the clean activations at layer 8
with model.trace(clean_prompt) as tracer:
    clean_hidden = model.transformer.h[8].output[0].save()
# Step 2: run the corrupted prompt and overwrite layer 8 with the clean activations
with model.trace(corrupted_prompt) as tracer:
    model.transformer.h[8].output[0][:] = clean_hidden
    patched_logits = model.lm_head.output.save()
# Step 3: compare the probabilities of the two candidate answers
paris_token = model.tokenizer.encode(" Paris")[0]
rome_token = model.tokenizer.encode(" Rome")[0]
patched_probs = torch.softmax(patched_logits[0, -1], dim=-1)
print(f"Paris prob: {patched_probs[paris_token].item():.3f}")
print(f"Rome prob: {patched_probs[rome_token].item():.3f}")
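Overwriting a whole hidden-state tensor with one from another prompt only works when both prompts tokenize to the same number of tokens. A minimal sketch of a pre-flight check follows. `check_same_length` and `WhitespaceTokenizer` are hypothetical names introduced here, not part of nnsight, and the sketch assumes only that the tokenizer exposes an `encode(text)` method returning a sequence of token ids.

```python
from typing import Protocol, Sequence


class Encoder(Protocol):
    """Assumed interface: anything with encode(text) -> token ids."""

    def encode(self, text: str) -> Sequence[int]: ...


def check_same_length(tokenizer: Encoder, clean: str, corrupted: str) -> int:
    """Return the shared token count, or raise if the prompts differ in length.

    Hypothetical helper, not part of nnsight.
    """
    n_clean = len(tokenizer.encode(clean))
    n_corrupted = len(tokenizer.encode(corrupted))
    if n_clean != n_corrupted:
        raise ValueError(
            f"Prompts tokenize to different lengths: {n_clean} vs {n_corrupted}"
        )
    return n_clean


class WhitespaceTokenizer:
    """Toy stand-in so the sketch runs without downloading a model."""

    def encode(self, text: str) -> list:
        # One fake id per whitespace-separated word
        return list(range(len(text.split())))


toy = WhitespaceTokenizer()
shared_len = check_same_length(toy, "The cat sat", "The dog ran")
```

With a real model, `model.tokenizer` would take the place of the toy tokenizer, and the returned length can serve as the sequence length for a position-wise sweep.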
Systematic Patching Sweep
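def patch_layer_position(layer, position, clean_cache, corrupted_prompt):
    """Patch a single layer/position from the clean run into the corrupted run."""
    with model.trace(corrupted_prompt) as tracer:
        current = model.transformer.h[layer].output[0]
        current[:, position, :] = clean_cache[layer][:, position, :]
        logits = model.lm_head.output.save()
    return logits
# Assumes clean_cache maps layer index -> saved clean hidden states for that layer,
# seq_len is the shared token length of both prompts, and compute_metric is a
# user-defined function mapping logits to a scalar (e.g. a logit difference).
results = torch.zeros(12, seq_len)
for layer in range(12):
    for pos in range(seq_len):
        logits = patch_layer_position(layer, pos, clean_cache, corrupted_prompt)
        results[layer, pos] = compute_metric(logits)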
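The sweep relies on a `compute_metric` function that the text does not define. One common choice, sketched here as an assumption rather than the author's metric, is the logit difference between the correct and incorrect answer tokens, rescaled so that 0 corresponds to the corrupted baseline and 1 to the clean run. The helper names and toy numbers below are made up for illustration.

```python
from typing import Sequence


def logit_diff(last_logits: Sequence[float], correct_id: int, wrong_id: int) -> float:
    """Logit of the correct token minus logit of the wrong token."""
    return float(last_logits[correct_id]) - float(last_logits[wrong_id])


def normalized_effect(patched: float, clean: float, corrupted: float) -> float:
    """Rescale a patched metric so 0.0 = corrupted baseline and 1.0 = clean run."""
    denom = clean - corrupted
    if denom == 0:
        raise ValueError("Clean and corrupted baselines are identical")
    return (patched - corrupted) / denom


# Toy last-position logits over a 4-token vocabulary (made-up numbers)
toy_clean = [0.1, 3.0, 0.5, 0.2]
toy_corrupted = [0.1, 0.5, 3.0, 0.2]
toy_patched = [0.1, 2.0, 1.0, 0.2]
correct_id, wrong_id = 1, 2

clean_ld = logit_diff(toy_clean, correct_id, wrong_id)          # 2.5
corrupted_ld = logit_diff(toy_corrupted, correct_id, wrong_id)  # -2.5
patched_ld = logit_diff(toy_patched, correct_id, wrong_id)      # 1.0
effect = normalized_effect(patched_ld, clean_ld, corrupted_ld)  # (1.0 + 2.5) / 5.0
```

In the sweep, `compute_metric(logits)` could wrap `logit_diff(logits[0, -1], paris_token, rome_token)`, since indexing works the same way on a tensor as on a list. Normalizing makes cells in the results grid comparable across layers and positions.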
Workflow 3: Remote Execution with NDIF
Run the same experiments on massive models without local GPUs.
Step-by-Step
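from nnsight import LanguageModel
# No device_map here: remote runs do not load the weights locally
model = LanguageModel("meta-llama/Llama-3.1-70B")
# Same tracing API as local execution, with remote=True
with model.trace("The meaning of life is", remote=True) as tracer:
    layer_40_out = model.model.layers[40].output[0].save()
    logits = model.lm_head.output.save()
print(f"Layer 40 shape: {layer_40_out.shape}")
# Generation with an intervention: scale the last-token hidden state at layer 20
with model.generate(max_new_tokens=50, remote=True) as tracer:
    with tracer.invoke("What is 2+2?"):
        model.model.layers[20].output[0][:, -1, :] *= 1.5
        output = model.generator.output.save()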
NDIF Setup
- Sign up at login.ndif.us
- Get API key
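- Set it as an environment variable or pass it to nnsight:
# Option 1: environment variable
import os
os.environ["NDIF_API_KEY"] = "your_key"
# Option 2: set it through nnsight's config
from nnsight import CONFIG
CONFIG.set_default_api_key("your_key")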
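Hardcoding a key in a script makes it easy to leak through version control. A minimal sketch of reading it from the environment and failing early when it is missing follows. `load_ndif_key` is a hypothetical helper, not part of nnsight; the only name taken from the text above is the `NDIF_API_KEY` variable.

```python
import os


def load_ndif_key(env_var: str = "NDIF_API_KEY") -> str:
    """Return the API key from the environment, failing early if it is unset.

    Hypothetical helper, not part of nnsight.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set {env_var} before running remote traces")
    return key
```

Calling this once at the top of a script surfaces a missing key before any remote request is made, and the returned string can be handed to nnsight's config.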
Available Models on NDIF
- Llama-3.1-8B, 70B, 405B
- DeepSeek-R1 models
- Various open-weight models (check ndif.us for current list)
Workflow 4: Cross-Prompt Activation Sharing
Share activations between different inputs in a single trace.
from nnsight import LanguageModel
model = LanguageModel("gpt2", device_map="auto")
with model.trace() as tracer:
    # First prompt: capture the layer 6 hidden states
    with tracer.invoke("The cat sat on the"):
        cat_hidden = model.transformer.h[6].output[0].save()
    # Second prompt: overwrite layer 6 with the first prompt's activations
    with tracer.invoke("The dog ran through the"):
        model.transformer.h[6].output[0][:] = cat_hidden
        dog_with_cat = model.lm_head.output.save()
Workflow 5: Gradient-Based Analysis
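Access gradients during the backward pass.
from nnsight import LanguageModel
import torch
model = LanguageModel("gpt2", device_map="auto")
with model.trace("The quick brown fox") as tracer:
    # Save the layer 5 hidden states and keep their gradient
    hidden = model.transformer.h[5].output[0].save()
    hidden.retain_grad()
    # Logits, shape [batch, seq, vocab]
    logits = model.lm_head.output
    # Loss: negative logit of the target token at the last position
    target_token = model.tokenizer.encode(" jumps")[0]
    loss = -logits[0, -1, target_token]
    # Backward pass
    loss.backward()
# Gradient of the loss with respect to the layer 5 hidden states
grad = hidden.grad
print(f"Gradient shape: {grad.shape}")
print(f"Gradient norm: {grad.norm().item():.3f}")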
Note: Gradient access is not supported for vLLM or remote execution.
Common Issues & Solutions
Issue: Module path differs between models
# GPT-2 style path
model.transformer.h[5].output[0]
# LLaMA style path
model.model.layers[5].output[0]
# Print the underlying model to find the right path
print(model._model)
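When the same script has to run against several architectures, the path lookup can be automated. The sketch below tries a list of candidate attribute paths and returns the first one that resolves. `find_layers` and `CANDIDATE_LAYER_PATHS` are hypothetical names, and the two paths are just the ones used in this document. It is demonstrated on plain Python objects; whether it works unchanged on an nnsight-wrapped model depends on how the wrapper reports missing attributes, which is not verified here.

```python
from functools import reduce
from types import SimpleNamespace

# Candidate attribute paths to the list of transformer blocks.
# These two come from the examples in this document; extend as needed.
CANDIDATE_LAYER_PATHS = ("transformer.h", "model.layers")


def find_layers(model, paths=CANDIDATE_LAYER_PATHS):
    """Return (path, layers) for the first attribute path that exists on model."""
    for path in paths:
        try:
            # Walk the dotted path one attribute at a time
            layers = reduce(getattr, path.split("."), model)
        except AttributeError:
            continue
        return path, layers
    raise AttributeError(f"None of {paths} found; inspect the model structure")


# Toy stand-ins for a GPT-2-style and a LLaMA-style model
gpt2_like = SimpleNamespace(transformer=SimpleNamespace(h=["block0", "block1"]))
llama_like = SimpleNamespace(model=SimpleNamespace(layers=["layer0"]))

gpt2_path, gpt2_layers = find_layers(gpt2_like)
llama_path, llama_layers = find_layers(llama_like)
```

Raising when no path matches is deliberate: a silent fallback would hide the case where a new architecture needs its own entry.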
Issue: Forgetting to save
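# Wrong: without .save() the value is not available after the trace
with model.trace("Hello"):
    hidden = model.transformer.h[5].output[0]
print(hidden)
# Right: call .save() on anything you need afterwards
with model.trace("Hello"):
    hidden = model.transformer.h[5].output[0].save()
print(hidden)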
Issue: Remote timeout
# Raise the timeout for long-running remote jobs
with model.trace("prompt", remote=True, timeout=300) as tracer:
    ...  # interventions and saves go here
Issue: Memory with many saved activations
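# Problematic: saving every layer keeps all activations in memory
with model.trace("prompt"):
    for i in range(100):
        model.transformer.h[i].output[0].save()
# Better: save only the layers you need
key_layers = [0, 5, 11]
with model.trace("prompt"):
    for i in key_layers:
        model.transformer.h[i].output[0].save()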
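A rough estimate of the memory cost makes the trade-off concrete. The sketch below multiplies out the size of one hidden-state tensor per saved layer. `activation_bytes` is a hypothetical helper, and the figures assume GPT-2 small (hidden size 768, 12 layers) with float32 activations at a 1024-token sequence length.

```python
def activation_bytes(n_layers: int, batch: int, seq_len: int,
                     hidden_size: int, bytes_per_value: int = 4) -> int:
    """Bytes needed to hold one hidden-state tensor per saved layer."""
    return n_layers * batch * seq_len * hidden_size * bytes_per_value


# GPT-2 small: hidden size 768, 12 layers; float32 = 4 bytes per value
all_layers = activation_bytes(n_layers=12, batch=1, seq_len=1024, hidden_size=768)
key_layers_only = activation_bytes(n_layers=3, batch=1, seq_len=1024, hidden_size=768)

print(f"all 12 layers: {all_layers / 2**20:.1f} MiB")      # 36.0 MiB
print(f"3 key layers:  {key_layers_only / 2**20:.1f} MiB")  # 9.0 MiB
```

The cost scales linearly in every factor, so the same calculation with a larger hidden size, more layers, and a larger batch shows why saving everything quickly becomes impractical on big models.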
Issue: vLLM gradient limitation
# vLLM does not support gradient access; use the standard LanguageModel wrapper instead
model = LanguageModel("gpt2", device_map="auto")
Key API Reference
| Method/Property | Purpose |
|---|---|
| model.trace(prompt, remote=False) | Start tracing context |
| proxy.save() | Save value for access after trace |
| proxy[:] | Slice/index proxy (assignment patches) |
| tracer.invoke(prompt) | Add prompt within trace |
| model.generate(...) | Generate with interventions |
| model.output | Final model output object (logits under .logits) |
| model._model | Underlying HuggingFace model |
Comparison with Other Tools
| Feature | nnsight | TransformerLens | pyvene |
|---|---|---|---|
| Any architecture | Yes | Transformers only | Yes |
| Remote execution | Yes (NDIF) | No | No |
| Consistent API | No | Yes | Yes |
| Deferred execution | Yes | No | No |
| HuggingFace native | Yes | Reimplemented | Yes |
| Shareable configs | No | No | Yes |
Reference Documentation
For detailed API documentation, tutorials, and advanced usage, see the references/ folder.
Architecture Support
nnsight works with any PyTorch model:
- Transformers: GPT-2, LLaMA, Mistral, etc.
- State Space Models: Mamba
- Vision Models: ViT, CLIP
- Custom architectures: Any nn.Module
The key is knowing the module structure to access the right components.