Category: Machine Learning
Attention in Transformers
Part II of An Interpretability Guide to Language Models
Zero-Layer Transformers
Part I of An Interpretability Guide to Language Models
Understanding State Space Models
A look at State Space Models like S4, H3, and Mamba.
Writing CUDA Kernels for PyTorch
This page explores the basics of programming with CUDA, and shows how to build custom PyTorch operations that run on Nvidia GPUs.
Multi-Query & Grouped-Query Attention
A description of Multi-Query Attention (MQA) and Grouped-Query Attention (GQA) in transformer models.
Mixture of Experts Pattern for Transformer Models
This page explores the Mixture of Experts pattern and its application to transformer models.
Distributed Training and DeepSpeed
An examination of distributed training techniques with PyTorch and DeepSpeed.
Language Model Fine-Tuning with LoRA
This page explores Low-Rank Adaptation (LoRA) as a method for fine-tuning language models.
BERT Embeddings
This page explains the concept of embeddings in neural networks and illustrates the function of the BERT Embedding Layer.
BERT Encoder Layer
This page explains the inner workings of the BERT Encoder Layer.
BERT Tokenization
This page examines the tokenization logic used to prepare inputs for BERT.