A deep dive into Umar Jamil's Flash Attention video and how it solves the memory bottleneck in transformers. Learn about safe softmax, online softmax, and how to leverage shared memory for 10x faster attention computation.
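As a taste of the online-softmax idea the video covers, here is a minimal single-pass sketch in plain Python (the function name and structure are my own illustration, not code from the video): instead of first scanning for the max and then summing exponentials, we keep a running max and rescale the running denominator whenever the max changes, which is the same trick Flash Attention applies tile by tile.

```python
import math

def online_softmax(xs):
    """Numerically safe softmax in a single pass over xs.

    Maintains a running maximum m (for stability) and a running
    denominator d = sum(exp(x - m)); when a new maximum appears,
    the old partial sum is rescaled by exp(old_m - new_m).
    """
    m = float("-inf")  # running maximum seen so far
    d = 0.0            # running sum of exp(x - m)
    for x in xs:
        m_new = max(m, x)
        # rescale the old partial sum to the new max, then add this term
        d = d * math.exp(m - m_new) + math.exp(x - m_new)
        m = m_new
    # final normalization uses the global max and denominator
    return [math.exp(x - m) / d for x in xs]
```

The rescaling step is what lets Flash Attention fuse the softmax with the matrix multiplies: each tile's partial results can be corrected later without ever materializing the full attention matrix.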
Neural Networks & Night Code
Thoughts on AI, deep learning, and the art of debugging at 3am.