selective-alignment-kd-snn

Selective Alignment Knowledge Distillation (SeAl-KD) methodology for Spiking Neural Networks. Addresses the performance gap between SNNs and ANNs by selectively aligning class-level and temporal knowledge during distillation. Unlike uniform alignment across all timesteps, SeAl-KD equalizes competing logits at erroneous timesteps and reweights temporal alignment based on confidence and inter-timestep similarity. Use when: improving SNN performance via knowledge distillation, temporal alignment in SNN training, distilling ANN-to-SNN, or addressing timestep-varying predictions in spiking networks.

Install command

npx skills add https://github.com/hiyenwong/ai_collection --skill selective-alignment-kd-snn

Copy and paste this command into Claude Code to install the skill

Source

hiyenwong/ai_collection

Updated June 4, 2026 at 02:00

SKILL.md

name

selective-alignment-kd-snn

description

SeAl-KD: Selective Alignment Knowledge Distillation for SNNs

Overview

arXiv:2605.14252 | Sun, Duan, Huang, Zhang, Smith, Gong, Kuhlmann (May 2026)

SNNs are highly energy-efficient but still trail ANNs in accuracy. Knowledge distillation (KD) is commonly used to narrow this gap, but existing methods enforce uniform alignment across all timesteps, implicitly assuming that every per-timestep prediction should be treated equally. In practice, SNN predictions vary and evolve over time.

Core Insight

Not all timesteps matter equally. Intermediate timesteps need not all be individually correct when the final aggregated output is correct. Effective distillation should:

Provide corrective guidance to erroneous timesteps
Preserve useful temporal dynamics
NOT force every timestep toward the same supervision target
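A toy illustration of why a wrong intermediate timestep need not matter. The logits below are made up, and the SNN's final output is assumed to be the time-averaged logits (a common readout, assumed here rather than taken from the paper):

```python
import torch

# Made-up per-timestep logits for one sample: T = 4 timesteps, C = 3 classes.
# The true class is 0.
logits = torch.tensor([
    [0.2, 1.5, 0.1],   # t = 0: predicts class 1 (wrong)
    [2.5, 0.3, 0.2],   # t = 1: predicts class 0
    [2.0, 0.5, 0.1],   # t = 2: predicts class 0
    [1.8, 0.9, 0.3],   # t = 3: predicts class 0
])
target = 0

per_step_pred = logits.argmax(dim=1)        # prediction at each timestep
final_pred = logits.mean(dim=0).argmax()    # assumed readout: average over time

print(per_step_pred.tolist())  # [1, 0, 0, 0]
print(int(final_pred))         # 0
```

Forcing timestep 0 all the way to the teacher's target would be unnecessary here, since the aggregated prediction is already correct.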

SeAl-KD Methodology

Class-Level Alignment

Identify erroneous timesteps where prediction diverges from target
Equalize competing logits (reduce gap between incorrect top predictions)
Avoid applying unnecessary corrections to timesteps that are already correct
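A minimal sketch of the first two steps, assuming per-timestep logits of shape [T, B, C]. The helpers `erroneous_mask` and `equalize_competing_logits` are hypothetical names. "Equalizing competing logits" is interpreted here, as an assumption, as replacing the k largest non-target logits with their mean (following "reduce gap between top-k incorrect classes" in the algorithm); which logits the paper modifies, and its choice of k, are not specified in this summary.

```python
import torch


def erroneous_mask(student_logits: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
    """[T, B, C] per-timestep logits and [B] labels -> [T, B] bool mask (True = wrong)."""
    return student_logits.argmax(dim=-1) != target.unsqueeze(0)


def equalize_competing_logits(logits: torch.Tensor, target: torch.Tensor, k: int = 2) -> torch.Tensor:
    """Hypothetical helper: set the k largest non-target logits in each row
    to their mean, closing the gap between the strongest incorrect classes.

    logits: [B, C], target: [B], requires k <= C - 1. Returns a new tensor.
    """
    # Hide the target class so it is never selected as a competitor.
    masked = logits.scatter(1, target.unsqueeze(1), float("-inf"))
    top_vals, top_idx = masked.topk(k, dim=1)                    # [B, k] each
    mean_vals = top_vals.mean(dim=1, keepdim=True).repeat(1, k)  # [B, k]
    # Write the shared mean back into the competing positions of a copy.
    return logits.clone().scatter_(1, top_idx, mean_vals)


# Toy batch: T = 2 timesteps, B = 1 sample, C = 3 classes, true class 0.
student = torch.tensor([[[0.1, 2.0, 1.0]], [[3.0, 0.5, 0.2]]])
target = torch.tensor([0])
mask = erroneous_mask(student, target)                      # [[True], [False]]
equalized = equalize_competing_logits(student[0], target)   # [[0.1, 1.5, 1.5]]
```

Only timestep 0 is flagged, so only its logits would receive corrective treatment; timestep 1 is left alone.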

Temporal Alignment Reweighting

Weight temporal alignment by confidence (high-confidence timesteps matter more)
Incorporate inter-timestep similarity (similar timesteps get aligned together)
Preserve beneficial temporal dynamics while correcting harmful ones
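One possible weighting, sketched under an assumption: the source does not give the exact form of the reweighting function, so this uses w_t = sigmoid(cos(z_t, z_{t-1})) * c_t, where z_t are the student logits at timestep t and c_t is the maximum softened probability. `temporal_weights` is a hypothetical name.

```python
import torch
import torch.nn.functional as F


def temporal_weights(student_logits: torch.Tensor, tau: float = 2.0) -> torch.Tensor:
    """Hypothetical per-timestep alignment weights, shape [T-1, B].

    For t = 1..T-1: w_t = sigmoid(cos(z_t, z_{t-1})) * c_t, where c_t is the
    max softmax probability at temperature tau. The result is detached so it
    rescales a loss term without receiving gradients itself.
    """
    probs = F.softmax(student_logits / tau, dim=-1)          # [T, B, C]
    confidence = probs.max(dim=-1).values                    # [T, B]
    sim = F.cosine_similarity(student_logits[1:], student_logits[:-1], dim=-1)  # [T-1, B]
    return (torch.sigmoid(sim) * confidence[1:]).detach()


torch.manual_seed(0)
logits = torch.randn(4, 3, 5, requires_grad=True)  # T = 4, B = 3, C = 5
w = temporal_weights(logits)                       # shape [3, 3], values in (0, 1)
```

Confident timesteps whose logits point in the same direction as the previous step get weights close to 1; uncertain or abruptly changing timesteps are down-weighted.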

Algorithm

For each timestep t:
    1. Compute prediction confidence c_t
    2. Determine whether the timestep is erroneous (prediction != target)
    3. If erroneous:
       a. Equalize competing logits: reduce the gap between the top-k incorrect classes
       b. Apply the corrective distillation loss
    4. If correct:
       a. Preserve temporal dynamics
       b. Apply a reduced distillation loss, or skip it
    5. Reweight temporal alignment:
       w_t = f(c_t, sim_t), where sim_t is the similarity between timesteps t and t-1

Loss Function Components

Selective Distillation Loss: applied only at erroneous timesteps
Temporal Consistency Loss: reweighted by confidence and inter-timestep similarity
Task Loss: standard classification loss on the final (time-aggregated) output
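The three components can be written as one per-sample objective. The form below is an assumed combination for illustration, not the paper's stated equation: z_t are the student logits at timestep t, p_t = softmax(z_t / τ), ŷ_t = argmax z_t, c_t = max_j p_{t,j}, p^tea_t is the softened teacher distribution, and λ (temporal weight) and γ (confidence threshold) are illustrative hyperparameters.

```latex
\mathcal{L}
 = \mathrm{CE}\Big(\tfrac{1}{T}\textstyle\sum_{t=1}^{T} z_t,\; y\Big)
 + \frac{\tau^{2}}{T}\sum_{t=1}^{T}
   \mathbb{1}[\hat{y}_t \neq y]\,(1 - c_t)\,
   \mathrm{KL}\big(p^{\mathrm{tea}}_t \,\|\, p^{\mathrm{stu}}_t\big)
 + \lambda \sum_{t=2}^{T} w_t\,
   \mathrm{KL}\big(p^{\mathrm{stu}}_{t-1} \,\|\, p^{\mathrm{stu}}_t\big),
\qquad
w_t = \sigma\big(\cos(z_t, z_{t-1})\big)\, c_t\, \mathbb{1}[c_t \ge \gamma]
```

The indicator in the second term implements "only penalize erroneous timesteps", and the τ² factor is the usual temperature correction from standard KD.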

Implementation Guide

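import torch
import torch.nn as nn
import torch.nn.functional as F

class SeAlKD(nn.Module):
    """Simplified sketch of a SeAl-KD-style loss; not the paper's exact formulation.

    Returns task loss + selective KD loss + temporal consistency loss.
    """

    def __init__(self, temperature=2.0, confidence_threshold=0.5, temporal_coef=0.01):
        super().__init__()
        self.temperature = temperature
        # Timesteps below this confidence are excluded from temporal alignment
        self.confidence_threshold = confidence_threshold
        self.temporal_coef = temporal_coef

    def forward(self, student_logits, teacher_logits, target):
        """
        student_logits: [T, B, C] - per-timestep student (SNN) logits
        teacher_logits: [T, B, C] or [B, C] - teacher logits (a [B, C] ANN
                        output is broadcast across all T timesteps)
        target: [B] - ground-truth labels
        """
        T, B, C = student_logits.shape
        tau = self.temperature
        if teacher_logits.dim() == 2:
            teacher_logits = teacher_logits.unsqueeze(0).expand(T, B, C)
        teacher_probs = F.softmax(teacher_logits / tau, dim=-1).detach()

        # Task loss on the final output (time-averaged logits)
        total_loss = F.cross_entropy(student_logits.mean(dim=0), target)

        prev_logits, prev_probs = None, None
        for t in range(T):
            logits_t = student_logits[t]  # [B, C]
            log_probs_t = F.log_softmax(logits_t / tau, dim=1)
            probs_t = log_probs_t.exp()
            confidence_t = probs_t.max(dim=1).values.detach()  # [B]

            # Selective alignment: distill only samples that are wrong at step t.
            # Logit equalization is not implemented here; this is plain corrective KD.
            is_erroneous = logits_t.argmax(dim=1) != target  # [B]
            if is_erroneous.any():
                # Per-sample KL(teacher || student); tau**2 keeps the KD gradient scale
                kl = F.kl_div(log_probs_t, teacher_probs[t], reduction='none').sum(dim=1)
                # Less confident erroneous samples receive a larger corrective weight
                weight = 1.0 - confidence_t
                total_loss = total_loss + tau ** 2 * (weight * kl)[is_erroneous].sum() / (B * T)

            # Temporal consistency: align step t with step t-1, weighted by
            # inter-timestep similarity and confidence (weights carry no gradient)
            if t > 0:
                similarity = F.cosine_similarity(logits_t, prev_logits, dim=1)  # [B]
                w = torch.sigmoid(similarity) * confidence_t
                w = (w * (confidence_t >= self.confidence_threshold).float()).detach()
                consistency = F.kl_div(
                    log_probs_t, prev_probs.detach(), reduction='none'
                ).sum(dim=1)
                total_loss = total_loss + self.temporal_coef * (w * consistency).mean()

            prev_logits, prev_probs = logits_t, probs_t

        return total_loss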

Comparison with Existing KD Methods

Method	Temporal Treatment	Key Limitation
Uniform KD	Same loss at every timestep	Forces unnecessary corrections
Self-distillation	Inter-temporal consistency	Assumes all timesteps are equally valid
SeAl-KD (proposed)	Selective, per-timestep	Requires confidence estimation
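The difference between the Uniform KD and SeAl-KD rows can be seen in a simplified sketch: uniform KD averages the KD term over every timestep, while the selective variant keeps only timesteps whose own prediction is wrong. This omits SeAl-KD's logit equalization and reweighting, and the function names are hypothetical.

```python
import torch
import torch.nn.functional as F


def per_step_kd(student_logits, teacher_logits, tau=2.0):
    """tau^2 * KL(teacher || student) per timestep and sample: [T, B, C] -> [T, B]."""
    if teacher_logits.dim() == 2:                 # [B, C] teacher -> broadcast over T
        teacher_logits = teacher_logits.expand_as(student_logits)
    log_p_s = F.log_softmax(student_logits / tau, dim=-1)
    p_t = F.softmax(teacher_logits / tau, dim=-1)
    return F.kl_div(log_p_s, p_t, reduction="none").sum(dim=-1) * tau ** 2


def uniform_kd(student_logits, teacher_logits, tau=2.0):
    # Every timestep is pulled toward the same teacher target.
    return per_step_kd(student_logits, teacher_logits, tau).mean()


def selective_kd(student_logits, teacher_logits, target, tau=2.0):
    # Only timesteps that are wrong on their own contribute.
    kd = per_step_kd(student_logits, teacher_logits, tau)
    wrong = (student_logits.argmax(dim=-1) != target.unsqueeze(0)).float()
    return (kd * wrong).sum() / kd.numel()


teacher = torch.tensor([[3.0, -1.0]])                       # B = 1, C = 2
target = torch.tensor([0])
all_correct = torch.tensor([[[2.0, 0.0]], [[1.0, 0.5]]])    # T = 2, both predict 0
one_wrong = torch.tensor([[[0.0, 2.0]], [[1.0, 0.5]]])      # t = 0 predicts 1
```

With `all_correct`, uniform KD is still positive because the student's distribution differs from the teacher's, while the selective loss is zero; with `one_wrong`, only timestep 0 is penalized.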

Experimental Results Summary

Evaluated on static image datasets (CIFAR-10, CIFAR-100)
Evaluated on neuromorphic event-based datasets (N-MNIST, DVS Gesture)
Consistent improvements over uniform KD and self-distillation baselines
Particularly effective when the SNN's per-timestep predictions vary substantially

When to Use

SNN training with knowledge distillation from an ANN teacher
Cases where per-timestep predictions show high variance
Neuromorphic applications needing both accuracy and energy efficiency
Multi-timestep SNNs built on neuron models with temporal dynamics (e.g., LIF, IAF)
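To check whether a trained SNN shows the kind of per-timestep variance described above, a simple diagnostic is the fraction of (timestep, sample) pairs whose prediction disagrees with the final time-averaged prediction. `timestep_disagreement` is a hypothetical helper, the mean readout is an assumption, and the source gives no threshold for what counts as "high".

```python
import torch


def timestep_disagreement(student_logits: torch.Tensor) -> float:
    """Fraction of (timestep, sample) pairs whose per-timestep prediction
    differs from the time-averaged prediction. Input shape: [T, B, C]."""
    per_step = student_logits.argmax(dim=-1)              # [T, B]
    final = student_logits.mean(dim=0).argmax(dim=-1)     # [B]
    return (per_step != final.unsqueeze(0)).float().mean().item()


logits = torch.tensor([[[0.2, 1.5, 0.1]],
                       [[2.5, 0.3, 0.2]],
                       [[2.0, 0.5, 0.1]],
                       [[1.8, 0.9, 0.3]]])                 # T = 4, B = 1, C = 3
print(timestep_disagreement(logits))  # 0.25
```

A value near zero suggests uniform KD loses little; larger values indicate timesteps that selective alignment would treat differently.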

Activation

selective alignment KD, SeAl-KD, SNN knowledge distillation
timestep-aware distillation, temporal alignment SNN
improving SNN accuracy, ANN-to-SNN distillation
