CUDA expert: GPU kernel programming, memory management (global/shared/local), warp divergence, stream concurrency, cuBLAS/cuFFT integration. Use when writing GPU-accelerated code with CUDA.
Hugging Face expert: Transformers, Datasets, PEFT (LoRA/QLoRA), model fine-tuning, GGUF quantization, Text Generation Inference, pipeline optimization. Use when working with pretrained models, fine-tuning LLMs, or building NLP applications.
Jupyter expert: magic commands, nbconvert, JupyterLab extensions, remote setup, ipywidgets, profiling, debugging, cell decorators, papermill for automation. Use when working with Jupyter notebooks, data exploration, or building ML experiments.
LangChain expert: LCEL (LangChain Expression Language), chains, agents, RAG pipelines, tool calling, memory, callbacks, output parsers, retrieval strategies. Use when building LLM applications, RAG systems, or AI agents with LangChain.
Invoke when: User needs help with LlamaIndex RAG pipelines, index types, query engines, or vector stores. Provides: Index selection, embedding configuration, retrieval strategies, and pipeline optimization.
LLM serving expert: vLLM, TensorRT-LLM, Triton Inference Server, quantization (INT8/FP8/GPTQ/AWQ), continuous batching, PagedAttention, KV cache management. Use when deploying LLMs for inference.
MLflow expert: experiment tracking, model registry, autologging, MLflow Projects, MLflow Models, model serving, A/B testing, feature store integration. Use when tracking ML experiments, managing models, or deploying ML models with MLflow.
PyTorch expert: nn.Module, training loops, distributed training (DDP), mixed precision, FSDP, torch.compile, AMP, torch.jit, TorchScript, ONNX export, custom autograd functions.