Machine Learning
A look at State Space Models such as S4, H3, and Mamba.
This page explores the basics of CUDA programming and shows how to build custom PyTorch operations that run on NVIDIA GPUs.
Cloud
A crash course on AWS Networking, covering VPCs, Subnets, Internet Gateways, NAT Gateways, and more.
A description of Multi-Query Attention (MQA) and Grouped-Query Attention (GQA) in transformer models.
This page explores the Mixture of Experts pattern and its application to transformer models.
An examination of Distributed Training Techniques with PyTorch and DeepSpeed.