| name | selective-alignment-kd-snn |
| description | Selective Alignment Knowledge Distillation (SeAl-KD) methodology for Spiking Neural Networks. Addresses the performance gap between SNNs and ANNs by selectively aligning class-level and temporal knowledge during distillation. Unlike uniform alignment across all timesteps, SeAl-KD equalizes competing logits at erroneous timesteps and reweights temporal alignment based on confidence and inter-timestep similarity. Use when: improving SNN performance via knowledge distillation, temporal alignment in SNN training, distilling ANN-to-SNN, or addressing timestep-varying predictions in spiking networks. |
SeAl-KD: Selective Alignment Knowledge Distillation for SNNs
Overview
arXiv:2605.14252 | Sun, Duan, Huang, Zhang, Smith, Gong, Kuhlmann (May 2026)
Spiking neural networks (SNNs) are highly energy efficient but still show a performance gap relative to artificial neural networks (ANNs). Knowledge distillation (KD) is commonly used to narrow this gap, but existing methods enforce uniform alignment across all timesteps, implicitly assuming that every per-timestep prediction should be treated equally. In practice, SNN predictions vary and evolve over time.
Core Insight
Not all timesteps matter equally. Intermediate timesteps need not all be individually correct when the final aggregated output is correct. Effective distillation should:
- Provide corrective guidance to erroneous timesteps
- Preserve useful temporal dynamics
- NOT force every timestep toward the same supervision target
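A small numeric illustration of this point. It assumes the final SNN output is the mean of the per-timestep logits, which is a common choice but is not stated above, and the logit values are invented for the example.

```python
import torch

# Hypothetical per-timestep logits for one sample: shape [T=4, C=3].
# Values are invented purely to illustrate the point.
logits = torch.tensor([
    [2.0, 0.5, 0.1],   # t=0: predicts class 0 (correct)
    [0.4, 1.0, 0.2],   # t=1: predicts class 1 (erroneous)
    [1.5, 0.3, 0.2],   # t=2: predicts class 0 (correct)
    [1.8, 0.6, 0.1],   # t=3: predicts class 0 (correct)
])
target = 0

per_timestep_pred = logits.argmax(dim=1)          # prediction at each timestep
final_pred = logits.mean(dim=0).argmax().item()   # assumed aggregation: mean over time
erroneous = per_timestep_pred != target           # mask of erroneous timesteps

print(per_timestep_pred.tolist())  # [0, 1, 0, 0]
print(final_pred)                  # 0
print(erroneous.tolist())          # [False, True, False, False]
```

Only timestep 1 would need corrective guidance here; the aggregated prediction is already correct.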
SeAl-KD Methodology
Class-Level Alignment
- Identify erroneous timesteps where prediction diverges from target
- Equalize competing logits (reduce gap between incorrect top predictions)
- Avoid forcing correct timesteps toward unnecessary correction
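The bullets above do not give an exact formula, so the following is only one possible reading of "equalize competing logits": for samples that are wrong at a given timestep, penalize the gap between the largest and smallest of the top-k non-target logits. The function name `equalize_competing_logits_loss` and the parameter `k` are hypothetical and introduced here for illustration.

```python
import torch
import torch.nn.functional as F

def equalize_competing_logits_loss(logits, target, k=2):
    """Hypothetical sketch: gap among the top-k incorrect logits on erroneous samples.

    logits: [B, C] logits at one timestep; target: [B] labels.
    Requires k <= C - 1. Returns a scalar; zero if every sample is correct.
    """
    erroneous = logits.argmax(dim=1) != target               # [B] mask
    if not erroneous.any():
        return logits.new_zeros(())                           # correct samples are left alone
    # Hide the target class so only incorrect classes compete.
    target_mask = F.one_hot(target[erroneous], num_classes=logits.size(1)).bool()
    wrong_logits = logits[erroneous].masked_fill(target_mask, float("-inf"))
    topk = wrong_logits.topk(k, dim=1).values                 # [N_err, k], sorted descending
    gap = topk[:, 0] - topk[:, -1]                            # spread among competitors
    return gap.mean()

logits = torch.tensor([[1.0, 3.0, 2.0],    # predicts class 1, target 0: erroneous
                       [4.0, 0.0, 1.0]])   # predicts class 0, target 0: correct
target = torch.tensor([0, 0])
loss = equalize_competing_logits_loss(logits, target)
print(loss.item())  # 1.0 (gap between incorrect logits 3.0 and 2.0)
```

Samples that are already correct contribute nothing, which matches the third bullet.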
Temporal Alignment Reweighting
- Weight temporal alignment by confidence (high-confidence timesteps matter more)
- Incorporate inter-timestep similarity (similar timesteps get aligned together)
- Preserve beneficial temporal dynamics while correcting harmful ones
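As a rough sketch of how such weights could be computed, the snippet below multiplies each timestep's confidence by its cosine similarity to the previous timestep and normalizes over time. The product form, the normalization, and the name `temporal_weights` are assumptions made for illustration; the source does not specify the weighting function.

```python
import torch
import torch.nn.functional as F

def temporal_weights(timestep_logits):
    """Hypothetical weighting: w_t proportional to confidence_t * similarity(t, t-1).

    timestep_logits: [T, B, C]. Returns [T, B] weights that sum to 1 over T.
    """
    probs = F.softmax(timestep_logits, dim=2)                  # [T, B, C]
    confidence = probs.max(dim=2).values                       # [T, B]
    # Cosine similarity between class distributions at t and t-1.
    sim = F.cosine_similarity(probs[1:], probs[:-1], dim=2)    # [T-1, B]
    # The first timestep has no predecessor; give it similarity 1 (assumption).
    sim = torch.cat([torch.ones_like(confidence[:1]), sim], dim=0)  # [T, B]
    raw = confidence * sim
    # Detach so the weights scale the loss without receiving gradients.
    return (raw / raw.sum(dim=0, keepdim=True)).detach()

torch.manual_seed(0)
w = temporal_weights(torch.randn(3, 2, 4))
print(w.shape)  # torch.Size([3, 2])
```

Because softmax outputs are non-negative, the cosine similarity stays in [0, 1] and the weights are always positive.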
Algorithm
For each timestep t:
1. Compute prediction confidence c_t
2. Identify if timestep is erroneous (prediction != target)
3. If erroneous:
   a. Equalize competing logits: reduce gap between top-k incorrect classes
   b. Apply corrective distillation loss
4. If correct:
   a. Preserve temporal dynamics
   b. Apply a reduced distillation loss, or skip it
5. Reweight temporal alignment:
   w_t = f(c_t, similarity_{t,t-1})
Loss Function Components
- Selective Distillation Loss: Only penalize erroneous timesteps
- Temporal Consistency Loss: Reweighted by confidence and similarity
- Task Loss: Standard classification loss on final output
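One way the three components above might be combined is a weighted sum, with the task loss computed on time-averaged logits. The coefficients `lambda_kd` and `lambda_temp`, the mean-over-time aggregation, and the function name `combine_losses` are illustrative assumptions rather than values taken from the paper.

```python
import torch
import torch.nn.functional as F

def combine_losses(timestep_logits, target, distill_loss, temporal_loss,
                   lambda_kd=1.0, lambda_temp=0.1):
    """Hypothetical weighted sum of task, distillation, and temporal terms.

    timestep_logits: [T, B, C]; target: [B].
    distill_loss, temporal_loss: scalar tensors computed elsewhere.
    """
    final_logits = timestep_logits.mean(dim=0)            # assumed aggregation over time
    task_loss = F.cross_entropy(final_logits, target)     # task loss on the final output
    return task_loss + lambda_kd * distill_loss + lambda_temp * temporal_loss

torch.manual_seed(0)
logits = torch.randn(4, 2, 3)
target = torch.tensor([0, 2])
zero = torch.zeros(())
out = combine_losses(logits, target, zero, zero)  # reduces to the task loss alone
```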
Implementation Guide
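import torch
import torch.nn as nn
import torch.nn.functional as F


class SeAlKD(nn.Module):
    """Simplified SeAl-KD-style loss: distill only where the student is wrong."""

    def __init__(self, temperature=2.0, confidence_threshold=0.5):
        super().__init__()
        self.temperature = temperature
        # Stored for configuration; not used by this simplified forward pass.
        self.confidence_threshold = confidence_threshold

    def forward(self, student_spikes, teacher_spikes, target, timestep_predictions):
        """
        student_spikes: [T, B, C] - student spike counts per timestep (unused here)
        teacher_spikes: [T, B, C] - teacher spike counts per timestep
        target: [B] - ground truth labels
        timestep_predictions: [T, B, C] - per-timestep logits
        """
        T, B, C = timestep_predictions.shape
        # Start from a tensor so the return type is stable even if no term is added.
        total_loss = timestep_predictions.new_zeros(())

        for t in range(T):
            logits_t = timestep_predictions[t]
            probs_t = F.softmax(logits_t / self.temperature, dim=1)
            # Confidence is used only as a weight, so it is detached from the graph.
            confidence_t = probs_t.max(dim=1).values.detach()
            predictions_t = logits_t.argmax(dim=1)
            is_erroneous = predictions_t != target

            if is_erroneous.any():
                # Distill only on the samples that are wrong at this timestep.
                teacher_probs = F.softmax(
                    teacher_spikes[t][is_erroneous] / self.temperature, dim=1
                )
                kd_loss = F.kl_div(
                    F.log_softmax(logits_t[is_erroneous] / self.temperature, dim=1),
                    teacher_probs,
                    reduction='batchmean'
                )
                # Low-confidence errors receive a larger corrective weight.
                weight = 1.0 - confidence_t[is_erroneous].mean()
                total_loss = total_loss + weight * kd_loss

            if t > 0:
                similarity = F.cosine_similarity(
                    logits_t, timestep_predictions[t - 1], dim=1
                )
                temporal_weight = (torch.sigmoid(similarity) * confidence_t).detach()
                # Assumed form of the temporal consistency term: penalize
                # dissimilarity between adjacent timesteps, weighted per sample
                # by confidence and similarity.
                total_loss = total_loss + 0.01 * (temporal_weight * (1.0 - similarity)).mean()

        return total_loss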
Comparison with Existing KD Methods
| Method | Temporal Treatment | Key Limitation |
|---|---|---|
| Uniform KD | Same loss all timesteps | Forces unnecessary corrections |
| Self-distillation | Inter-temporal consistency | Assumes all timesteps equally valid |
| SeAl-KD (proposed) | Selective per-timestep | Requires confidence estimation |
Experimental Results Summary
- Evaluated on static image datasets (CIFAR-10, CIFAR-100)
- Evaluated on neuromorphic event-based datasets (N-MNIST, DVS Gesture)
- Consistent improvements over uniform KD and self-distillation baselines
- Particularly effective when SNN has significant timestep prediction variance
When to Use
- SNN training with knowledge distillation from ANN teacher
- Cases where per-timestep predictions show high variance
- Neuromorphic applications needing both accuracy and energy efficiency
- Multi-timestep SNN architectures (LIF, IAF with temporal dynamics)
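To check whether a model falls into the "high per-timestep variance" case, one simple diagnostic is the fraction of per-timestep predictions that disagree with the aggregated prediction. This metric and the helper name `timestep_disagreement_rate` are illustrative assumptions, not something defined by the paper, and the aggregation is again assumed to be the mean over time.

```python
import torch

def timestep_disagreement_rate(timestep_logits):
    """Fraction of (timestep, sample) pairs whose prediction differs from the
    prediction of the time-averaged logits.

    timestep_logits: [T, B, C]. Returns a Python float in [0, 1].
    """
    per_step = timestep_logits.argmax(dim=2)               # [T, B]
    final = timestep_logits.mean(dim=0).argmax(dim=1)      # [B]
    return (per_step != final.unsqueeze(0)).float().mean().item()

# Toy input: one sample, two timesteps, two classes.
toy = torch.tensor([[[2.0, 0.0]],    # t=0 predicts class 0
                    [[0.0, 1.0]]])   # t=1 predicts class 1
rate = timestep_disagreement_rate(toy)
print(rate)  # 0.5 (mean logits are [1.0, 0.5], so the final prediction is class 0)
```

A rate near zero suggests per-timestep predictions already agree with the final output, in which case selective alignment has less to correct.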
Activation
- selective alignment KD, SeAl-KD, SNN knowledge distillation
- timestep-aware distillation, temporal alignment SNN
- improving SNN accuracy, ANN-to-SNN distillation