families-linear-algebra

Framework for Advanced (Multi)Linear Infrastructure in Engineering and Science. Dense linear algebra and tensor operations framework extending BLAS/LAPACK. Use when: linear algebra library design, BLAS implementation, LAPACK algorithms, tensor operations, high-performance computing, dense matrix operations, scientific computing infrastructure, GPU-accelerated linear algebra, or multi-node/multi-GPU dense algebra.

Install command

npx skills add https://github.com/hiyenwong/ai_collection --skill families-linear-algebra

Copy and paste this command into Claude Code to install the skill

Source

hiyenwong/ai_collection

Updated: June 4, 2026 at 02:00

FAMLIES: Advanced Linear Algebra Infrastructure

Overview

This skill provides a framework for designing and implementing advanced dense linear and tensor algebra libraries. FAMLIES (Framework for Advanced (Multi)Linear Infrastructure in Engineering and Science) addresses limitations of traditional BLAS/LAPACK approaches while enabling better adaptation to new hardware architectures and applications.

Key Innovation: A more flexible way to define and implement high-performance dense linear and tensor algorithms, one that can adapt as applications, software, and hardware change.

Current Landscape & Challenges

BLAS/LAPACK Legacy

Success Factors:

De facto standard interfaces
Stringent boundary enforcement between layers
Wide adoption in scientific computing
Foundation for ML/data science

Current Limitations:

Boundary Enforcement: Impedes reducing data movement overhead
Layer Separation: Prevents loop fusion optimization
Hardware Adaptation: Slow to adapt to new architectures (GPUs)
Algorithm Implementation: Delayed adoption of new algorithms
Flexibility: Rigid structure limits innovation

Modern Challenges

Data Movement Overhead:

Data movement dominates computation cost
Boundaries prevent data reuse optimization
Loop fusion opportunities missed
Memory hierarchy not fully exploited
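
The claim that data movement dominates can be made concrete by comparing floating-point operations per byte moved. The following is a rough sketch, not a measurement: it assumes double precision (8 bytes per element) and the ideal case where every operand crosses the memory interface exactly once. The function names are hypothetical.

```python
def gemv_intensity(n, bytes_per_elem=8):
    """Flops per byte for y = A*x + y with an n-by-n matrix (ideal traffic)."""
    flops = 2 * n * n                            # one multiply and one add per entry of A
    traffic = (n * n + 3 * n) * bytes_per_elem   # read A, x, y; write y
    return flops / traffic


def gemm_intensity(n, bytes_per_elem=8):
    """Flops per byte for C = A*B + C with n-by-n matrices (ideal traffic)."""
    flops = 2 * n ** 3
    traffic = 4 * n * n * bytes_per_elem         # read A, B, C; write C
    return flops / traffic


if __name__ == "__main__":
    for n in (100, 1000, 10000):
        print(n, round(gemv_intensity(n), 3), round(gemm_intensity(n), 1))
```

Under these assumptions the matrix-vector product stays below 0.25 flops per byte for any size, while the matrix-matrix product grows as n/16. Operations in the first group are limited by memory traffic, which is where fusing across layer boundaries is most likely to help.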

New Hardware Architectures:

GPU acceleration requirements
Multi-node/multi-GPU systems
Heterogeneous computing platforms
Different memory hierarchies

Emerging Applications:

Machine learning tensor operations
Data science sparse/dense hybrids
New matrix factorization algorithms
Quantum computing interfaces

FAMLIES Framework

Core Principles

Flexibility: Adaptable to applications, software, and hardware
Optimization: Enable data movement reduction
Standards: Maintain interface standards where beneficial
Performance: High-performance implementation across architectures
Extensibility: Support new algorithms and operations

Framework Architecture

Layer Flexibility:

Allow cross-layer optimization
Enable loop fusion
Support operation composition
Flexible boundary enforcement

Hardware Adaptation:

GPU-specific optimizations
Multi-node coordination
Memory hierarchy awareness
Architecture-specific implementations

Algorithm Support:

Dense linear algebra (BLAS/LAPACK level)
Tensor operations (multi-linear algebra)
New factorization algorithms
Application-specific operations

Leveraged Projects

BLIS (BLAS-like Library Instantiation Software)

Key Features:

Flexible BLAS implementation
Architecture-specific optimization
Layer decomposition
Performance portability

Benefits:

Rapid architecture adaptation
Optimized implementations
Maintainable code structure
Community-driven development

libflame

Key Features:

LAPACK-level functionality
Algorithm-by-blocks approach
Flexible implementation
Research-oriented design

Benefits:

Advanced algorithm implementations
Block-based optimization
Research platform
Extensible framework

Dense Linear Algebra Operations

BLAS-Level Operations

Level 1: Vector operations

DOT: dot product
AXPY: scaled vector addition (y := alpha*x + y)
NRM2: Euclidean vector norm
SCAL: vector scaling

Level 2: Matrix-vector operations

GEMV: general matrix-vector multiply
GER: general rank-1 update
SYMV: symmetric matrix-vector multiply
TRSV: triangular solve

Level 3: Matrix-matrix operations

GEMM: general matrix multiply
SYMM: symmetric matrix multiply
TRMM: triangular matrix multiply
SYRK: symmetric rank-k update
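
To make the three levels concrete, here is a minimal reference sketch of one representative routine per level, written over plain Python lists. It follows the usual BLAS semantics for AXPY, GEMV, and GEMM but is only an illustration of the operation definitions, not a tuned kernel and not code from the FAMLIES paper.

```python
def axpy(alpha, x, y):
    """Level 1: y := alpha*x + y, O(n) work on O(n) data."""
    for i in range(len(x)):
        y[i] += alpha * x[i]
    return y


def gemv(alpha, A, x, beta, y):
    """Level 2: y := alpha*A*x + beta*y, O(n^2) work on O(n^2) data."""
    for i, row in enumerate(A):
        y[i] = alpha * sum(a * xj for a, xj in zip(row, x)) + beta * y[i]
    return y


def gemm(alpha, A, B, beta, C):
    """Level 3: C := alpha*A*B + beta*C, O(n^3) work on O(n^2) data."""
    k = len(B)
    for i in range(len(A)):
        for j in range(len(B[0])):
            acc = sum(A[i][p] * B[p][j] for p in range(k))
            C[i][j] = alpha * acc + beta * C[i][j]
    return C
```

Only Level 3 performs asymptotically more arithmetic than it has data, which is why optimized libraries try to cast as much work as possible in terms of GEMM-like operations.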

LAPACK-Level Operations

Factorizations:

LU: LU decomposition
Cholesky: Cholesky factorization
QR: QR decomposition
SVD: singular value decomposition

Solvers:

Linear system solve
Eigenvalue problems
Least squares
Matrix inversion

Auxiliary:

Condition estimation
Matrix norms
Equilibration
Scaling
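
As an illustration of how a factorization and a solver fit together at this level, the sketch below factors a symmetric positive definite matrix with an unblocked Cholesky algorithm and then solves a linear system with two triangular solves. The function names are hypothetical, and the code assumes small dense inputs stored as nested lists.

```python
import math


def cholesky(A):
    """Return lower-triangular L with A = L * L^T (A symmetric positive definite)."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for j in range(n):
        d = A[j][j] - sum(L[j][p] ** 2 for p in range(j))
        if d <= 0.0:
            raise ValueError("matrix is not positive definite")
        L[j][j] = math.sqrt(d)
        for i in range(j + 1, n):
            s = sum(L[i][p] * L[j][p] for p in range(j))
            L[i][j] = (A[i][j] - s) / L[j][j]
    return L


def solve_spd(A, b):
    """Solve A x = b: forward solve L y = b, then back solve L^T x = y."""
    n = len(A)
    L = cholesky(A)
    y = [0.0] * n
    for i in range(n):
        y[i] = (b[i] - sum(L[i][p] * y[p] for p in range(i))) / L[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(L[p][i] * x[p] for p in range(i + 1, n))) / L[i][i]
    return x
```

For example, `solve_spd([[4.0, 2.0], [2.0, 3.0]], [6.0, 5.0])` returns a vector close to `[1.0, 1.0]`.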

Tensor Operations

Multi-linear Operations:

Tensor contraction
Tensor decomposition (Tucker, CP)
Tensor unfolding
Tensor permutation

Tensor Algebra:

Einstein notation operations
Tensor-matrix products
Tensor-tensor products
Tensor factorizations
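
Tensor unfolding is the step that lets tensor algorithms reuse matrix kernels. Below is a minimal sketch for a 3D tensor stored as nested lists. Unfolding conventions differ between references; this one assumes the remaining two indices keep their original order with the last index varying fastest. The function name is hypothetical.

```python
def unfold_3d(X, mode):
    """Mode-`mode` unfolding of a 3D nested-list tensor.

    Row r holds every entry whose index along `mode` equals r; the two
    remaining indices keep their original order, last index fastest.
    """
    shape = (len(X), len(X[0]), len(X[0][0]))
    others = [ax for ax in range(3) if ax != mode]
    rows = []
    for r in range(shape[mode]):
        row = []
        for u in range(shape[others[0]]):
            for v in range(shape[others[1]]):
                idx = [0, 0, 0]
                idx[mode], idx[others[0]], idx[others[1]] = r, u, v
                row.append(X[idx[0]][idx[1]][idx[2]])
        rows.append(row)
    return rows
```

With `X = [[[0, 1], [2, 3]], [[4, 5], [6, 7]]]`, the mode-1 unfolding is `[[0, 1, 4, 5], [2, 3, 6, 7]]`.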

Implementation Considerations

Data Movement Optimization

Loop Fusion:

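// Traditional layered approach: two separate calls
GEMM(A, B, C);  // C = A*B
SCALE(alpha, C);  // C = alpha*C (a second full pass over C)

// FAMLIES: fused operation
GEMM_SCALE_FUSED(A, B, C, alpha);  // C = alpha*A*B
// Reduces memory traffic: C is written once instead of written, re-read, and re-written
// (matrix-level passes: 3 reads + 2 writes -> 2 reads + 1 write)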
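
The effect of fusion on traffic can be checked with a small instrumented sketch. The two hypothetical functions below compute the same scaled product over nested lists and count how many times an entry of C is stored. The counter is only a stand-in for memory traffic, not a performance measurement.

```python
def gemm_then_scale(A, B, alpha):
    """Unfused: form C = A*B, then rescale C in a second pass."""
    m, k, n = len(A), len(B), len(B[0])
    c_writes = 0
    C = [[0.0] * n for _ in range(m)]
    for i in range(m):
        for j in range(n):
            C[i][j] = sum(A[i][p] * B[p][j] for p in range(k))
            c_writes += 1
    for i in range(m):                 # second pass re-reads and re-writes all of C
        for j in range(n):
            C[i][j] *= alpha
            c_writes += 1
    return C, c_writes


def gemm_scale_fused(A, B, alpha):
    """Fused: scale each entry while its accumulated value is still local."""
    m, k, n = len(A), len(B), len(B[0])
    c_writes = 0
    C = [[0.0] * n for _ in range(m)]
    for i in range(m):
        for j in range(n):
            C[i][j] = alpha * sum(A[i][p] * B[p][j] for p in range(k))
            c_writes += 1
    return C, c_writes
```

For an m-by-n result the unfused version stores 2*m*n entries and the fused version stores m*n, while both return the same matrix.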

Operation Composition:

// Compose multiple operations
COMPOSE({
    GEMM(A, B, T1),
    GEMM(T1, C, T2),
    ADD(T2, D, E)
});
// Single kernel execution, reduced data movement
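
One way such a composition could reduce data movement is to stream rows through the whole chain so that the intermediates T1 and T2 never exist as full matrices. The sketch below assumes this row-wise strategy and a hypothetical function name. It is one possible realization, not the mechanism described in the FAMLIES paper.

```python
def matmul_chain_add(A, B, C, D):
    """Compute E = (A*B)*C + D one row at a time.

    Only one row of each temporary (T1 = A*B, T2 = T1*C) exists at any
    moment, instead of two full intermediate matrices.
    """
    E = []
    for a_row, d_row in zip(A, D):
        t1 = [sum(a * B[p][j] for p, a in enumerate(a_row)) for j in range(len(B[0]))]
        t2 = [sum(t * C[p][j] for p, t in enumerate(t1)) for j in range(len(C[0]))]
        E.append([t + d for t, d in zip(t2, d_row)])
    return E
```

The temporaries shrink from O(m*n) to O(n) storage, which makes them far more likely to stay in cache between the three steps.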

Memory Hierarchy:

Cache-aware implementations
Register blocking
Memory bandwidth optimization
Data locality maximization

Hardware-Specific Optimization

GPU Implementations:

CUDA kernel design
Shared memory utilization
Warp-level operations
Multi-stream execution

Multi-Node/Multi-GPU:

Distributed matrix storage
Communication optimization
Load balancing
Scalability considerations
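
For distributed matrix storage, a common choice is the 2D block-cyclic layout used by libraries such as ScaLAPACK. The sketch below shows the index arithmetic under the assumption that the first block belongs to process (0, 0). The function names are hypothetical.

```python
def block_cyclic_owner(i, j, block_size, grid_rows, grid_cols):
    """Process-grid coordinates that own global entry (i, j)."""
    return ((i // block_size) % grid_rows, (j // block_size) % grid_cols)


def local_index(g, block_size, nprocs):
    """Position of global index g (one dimension) in its owner's local storage."""
    local_block = g // (block_size * nprocs)   # how many of the owner's blocks precede g
    return local_block * block_size + g % block_size
```

Because consecutive blocks go to different processes in round-robin order, every process keeps a share of the trailing matrix as a factorization proceeds, which helps load balance.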

Heterogeneous Computing:

CPU-GPU coordination
Device selection strategies
Memory transfer optimization
Hybrid implementations

Algorithm-by-Blocks Approach

Concept: Decompose algorithms into block operations

Benefits:

Better cache utilization
Parallelizable blocks
Flexible implementation
Data locality

Example: LU Factorization by Blocks:

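import numpy as np
from scipy.linalg import solve_triangular

def LU_kernel(D):
    # Unblocked in-place LU of one diagonal block (no pivoting)
    for p in range(D.shape[0] - 1):
        D[p+1:, p] /= D[p, p]
        D[p+1:, p+1:] -= np.outer(D[p+1:, p], D[p, p+1:])

def LU_by_blocks(A, block_size):
    # In-place blocked LU without pivoting; A is a square float array
    n, bs = A.shape[0], block_size
    for k in range(0, n, bs):
        # Factorize diagonal block: A_kk = L_kk * U_kk
        Akk = A[k:k+bs, k:k+bs]
        LU_kernel(Akk)

        # Solve block row: U_kj = inv(L_kk) * A_kj
        for j in range(k+bs, n, bs):
            A[k:k+bs, j:j+bs] = solve_triangular(
                Akk, A[k:k+bs, j:j+bs], lower=True, unit_diagonal=True)

        for i in range(k+bs, n, bs):
            # Solve panel: L_ik = A_ik * inv(U_kk)
            A[i:i+bs, k:k+bs] = solve_triangular(
                Akk, A[i:i+bs, k:k+bs].T, trans='T').T

            # Update trailing blocks: A_ij -= L_ik * U_kj
            for j in range(k+bs, n, bs):
                A[i:i+bs, j:j+bs] -= A[i:i+bs, k:k+bs] @ A[k:k+bs, j:j+bs]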

Performance Optimization Techniques

Cache Blocking

Register Blocking: Keep data in registers
L1 Cache Blocking: Reuse data in L1 cache
L2 Cache Blocking: Maximize L2 utilization
TLB Blocking: Avoid TLB misses
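
The blocking levels listed here all rely on the same loop transformation, shown below as a single-level tiled matrix multiply over nested lists. The tile size of 2 is only a placeholder: real kernels choose tile sizes from the cache and register sizes of the target machine, and Python itself will not show the speedup. The function name is hypothetical.

```python
def gemm_tiled(A, B, tile=2):
    """C = A*B computed tile by tile so each tile's operands are reused while hot."""
    m, k, n = len(A), len(B), len(B[0])
    C = [[0.0] * n for _ in range(m)]
    for i0 in range(0, m, tile):
        for j0 in range(0, n, tile):
            for p0 in range(0, k, tile):
                # Multiply one tile of A by one tile of B into one tile of C.
                for i in range(i0, min(i0 + tile, m)):
                    for j in range(j0, min(j0 + tile, n)):
                        acc = C[i][j]
                        for p in range(p0, min(p0 + tile, k)):
                            acc += A[i][p] * B[p][j]
                        C[i][j] = acc
    return C
```

The `min(...)` bounds handle matrices whose dimensions are not multiples of the tile size. Nesting the same pattern again with a smaller tile gives the register and L1 levels.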

Parallelization

Thread-Level: Multi-threaded execution
SIMD-Level: Vector instructions
GPU-Level: GPU parallelism
Distributed-Level: Multi-node parallelism

Memory Optimization

Preconditioning: Reduce memory traffic
In-place Operations: Avoid extra memory
Strided Access: Optimal memory patterns
Alignment: Cache line alignment

Framework Components

Operation Definitions

Standard Operations: BLAS/LAPACK compatible
Extended Operations: New operation types
Tensor Operations: Multi-linear algebra
Composite Operations: Operation sequences

Implementation Layer

CPU Kernels: Optimized CPU implementations
GPU Kernels: GPU-specific kernels
Hybrid Kernels: Mixed CPU-GPU
Reference Kernels: Portable reference code

Interface Layer

Standard APIs: BLAS/LAPACK compatibility
Extended APIs: New operation interfaces
Language Bindings: C, Fortran, Python, etc.
Framework APIs: High-level framework calls
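
One way the three component layers could fit together is a registry that maps an operation name and backend to a kernel, with the portable reference kernel as the fallback. Everything in the sketch below (`register`, `dispatch`, the backend labels) is a hypothetical illustration and not an API defined by FAMLIES, BLIS, or libflame.

```python
_KERNELS = {}  # hypothetical registry: (operation name, backend) -> callable


def register(op, backend):
    """Decorator that files a kernel under (op, backend)."""
    def wrap(fn):
        _KERNELS[(op, backend)] = fn
        return fn
    return wrap


def dispatch(op, *args, backend="reference"):
    """Call the requested backend's kernel, falling back to the reference kernel."""
    fn = _KERNELS.get((op, backend)) or _KERNELS[(op, "reference")]
    return fn(*args)


@register("dot", "reference")
def dot_reference(x, y):
    return sum(a * b for a, b in zip(x, y))


@register("dot", "cpu")
def dot_unrolled(x, y):
    """Stand-in for a tuned CPU kernel: two independent accumulators."""
    s0 = s1 = 0.0
    for i in range(0, len(x) - 1, 2):
        s0 += x[i] * y[i]
        s1 += x[i + 1] * y[i + 1]
    if len(x) % 2:                     # leftover element for odd lengths
        s0 += x[-1] * y[-1]
    return s0 + s1
```

A call such as `dispatch("dot", x, y, backend="gpu")` still works when no GPU kernel is registered, because it falls back to the reference implementation.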

Research Paper Reference

Paper: "A Proposed Framework for Advanced (Multi)Linear Infrastructure in Engineering and Science (FAMLIES)"

Authors: Devin A. Matthews, Tze Meng Low, Margaret E. Myers, et al.
arXiv ID: 2604.07311
Published: April 8, 2026
Categories: cs.MS (Mathematical Software)
Link: https://arxiv.org/abs/2604.07311

Use Cases

Scientific Computing

Applications:

Numerical simulations
Finite element methods
Computational physics
Climate modeling

Requirements:

High performance
Large-scale matrices
Distributed computing
Specialized operations

Machine Learning

Applications:

Neural network training
Tensor operations
Large-scale optimization
Model inference

Requirements:

GPU acceleration
Batch processing
Tensor operations
High throughput

Data Science

Applications:

Large-scale analytics
Statistical computations
Data preprocessing
Dimensionality reduction

Requirements:

Flexible operations
Mixed sparse/dense
Memory efficiency
Ease of use

Related Skills

blas-lapack: Traditional BLAS/LAPACK usage
gpu-computing: GPU acceleration techniques
tensor-operations: Tensor algebra fundamentals
high-performance-computing: HPC best practices

Implementation Examples

Basic GEMM with Loop Fusion

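// FAMLIES-style GEMM fused with scaling and addition
// All matrices are dense and row-major: A is m x k, B is k x n; C, D, E are m x n
void gemm_scale_add(
    int m, int n, int k,
    double alpha, double beta, double gamma,
    const double *A, const double *B, const double *C, const double *D,
    double *E) {
    
    // E = alpha*A*B + beta*C + gamma*D
    // Fused: each entry of E is written once, with no intermediate matrix for A*B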
    
    for (int i = 0; i < m; i++) {
        for (int j = 0; j < n; j++) {
            double sum = 0.0;
            for (int p = 0; p < k; p++) {
                sum += A[i*k+p] * B[p*n+j];
            }
            E[i*n+j] = alpha*sum + beta*C[i*n+j] + gamma*D[i*n+j];
        }
    }
}

Tensor Contraction

import numpy as np

def tensor_contract_3d(A, B, contract_dims):
    """
    Contract tensors A and B along specified dimensions
    
    Args:
        A: tensor of shape (a1, a2, a3)
        B: tensor of shape (b1, b2, b3)
        contract_dims: dimensions to contract
    
    Returns:
        Contracted tensor
    """
    # Example: contract last dimension of A with first of B
    if contract_dims == ((-1,), (0,)):
        return np.einsum('ijk,klm->ijlm', A, B)
    
    # General contraction: np.tensordot takes the same (axes_of_A, axes_of_B) pair
    return np.tensordot(A, B, axes=contract_dims)
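
A contraction like this can also be mapped onto a single matrix multiply, which is how tensor libraries commonly reuse GEMM kernels. The sketch below does this for the `'ijk,klm->ijlm'` pattern using nested lists and only the standard library. The function name is hypothetical, and the code assumes the last axis of A has the same length as the first axis of B.

```python
def contract_last_first(A, B):
    """Contract the last axis of 3D tensor A with the first axis of 3D tensor B.

    Matches the einsum pattern 'ijk,klm->ijlm': flatten both tensors to
    matrices, run one matrix multiply, and reshape the result.
    """
    I, J, K = len(A), len(A[0]), len(A[0][0])
    L, M = len(B[0]), len(B[0][0])
    # A viewed as an (I*J) x K matrix, B as a K x (L*M) matrix.
    A_mat = [A[i][j] for i in range(I) for j in range(J)]
    B_mat = [[B[k][l][m] for l in range(L) for m in range(M)] for k in range(K)]
    C_mat = [[sum(row[k] * B_mat[k][c] for k in range(K)) for c in range(L * M)]
             for row in A_mat]
    # Reshape the (I*J) x (L*M) product back to I x J x L x M.
    return [[[[C_mat[i * J + j][l * M + m] for m in range(M)] for l in range(L)]
             for j in range(J)] for i in range(I)]
```

Here no data needs to be permuted, because the contracted index is already last in A and first in B. Other index patterns would need a transpose step before and after the multiply.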

Future Directions

Quantum Computing Integration:

Quantum linear algebra
Hybrid quantum-classical algorithms
Quantum matrix operations

Sparse-Dense Hybrids:

Mixed sparse/dense operations
Selective storage strategies
Format-aware kernels
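
As a small example of a mixed sparse/dense operation, the sketch below multiplies a sparse matrix in compressed sparse row (CSR) form by a dense matrix. The three-array CSR layout (`indptr`, `indices`, `data`) is the standard one; the function name is hypothetical.

```python
def csr_times_dense(indptr, indices, data, B):
    """C = S*B where S is sparse in CSR form and B is a dense nested-list matrix."""
    n_rows, n_cols = len(indptr) - 1, len(B[0])
    C = [[0.0] * n_cols for _ in range(n_rows)]
    for i in range(n_rows):
        for nz in range(indptr[i], indptr[i + 1]):   # stored entries of row i only
            s, b_row = data[nz], B[indices[nz]]
            for j in range(n_cols):
                C[i][j] += s * b_row[j]
    return C
```

The inner loop runs over a contiguous dense row of B, so the kernel skips zeros in S while keeping unit-stride access on the dense operand.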

Automatic Optimization:

Auto-tuning frameworks
Machine learning for optimization
Compiler-driven optimization
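
The simplest form of auto-tuning is an empirical search: time a kernel for each candidate parameter value and keep the fastest. The sketch below assumes that approach. `autotune` and the toy kernel `blocked_sum` are hypothetical names, and a real tuner would search block sizes of an actual linear algebra kernel.

```python
import time


def autotune(kernel, candidates, make_args, repeats=3):
    """Return the candidate with the lowest best-of-`repeats` run time, plus all timings."""
    timings = {}
    for param in candidates:
        best = float("inf")
        for _ in range(repeats):
            args = make_args()                    # fresh inputs for every run
            start = time.perf_counter()
            kernel(*args, param)
            best = min(best, time.perf_counter() - start)
        timings[param] = best
    return min(timings, key=timings.get), timings


def blocked_sum(values, block):
    """Toy tunable kernel: sum a list in chunks of `block` elements."""
    return sum(sum(values[i:i + block]) for i in range(0, len(values), block))
```

A call such as `autotune(blocked_sum, [1, 8, 64], lambda: (list(range(100000)),))` returns the chosen block size together with the measured times. The winner depends on the machine, so no fixed result is implied.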
