Category: Machine Learning
Attention in Transformers
Part II of An Interpretability Guide to Language Models
Zero-Layer Transformers
Part I of An Interpretability Guide to Language Models
Understanding State Space Models
A look at State Space Models like S4, H3, and Mamba.
Writing CUDA Kernels for PyTorch
This page explores the basics of programming with CUDA, and shows how to build custom PyTorch operations that run on Nvidia GPUs.
Multi-Query & Grouped-Query Attention
A description of Multi-Query Attention (MQA) and Grouped-Query Attention (GQA) in transformer models.
Mixture of Experts Pattern for Transformer Models
This page explores the Mixture of Experts pattern and its application to transformer models.
Distributed Training and DeepSpeed
An examination of distributed training techniques with PyTorch and DeepSpeed.
Language Model Fine-Tuning with LoRA
This page explores Low-Rank Adaptation (LoRA) as a method for fine-tuning language models.
BERT Embeddings
This page explains the concept of embeddings in neural networks and illustrates the function of the BERT Embedding Layer.
BERT Encoder Layer
This page explains the inner workings of the BERT Encoder Layer.
BERT Tokenization
This page examines the tokenization logic used to prepare inputs for BERT.