blog@ms.dev
$ ls -la blog/

Neural Networks & Night Code

Thoughts on AI, deep learning, and the art of debugging at 3am.

Flash Attention: Making Transformers Faster

A deep dive into Umar Jamil's Flash Attention video and how it solves the memory bottleneck in transformers. Learn about safe softmax, online softmax, and how to leverage shared memory for 10x faster attention computation.
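
As a taste of what the post covers, here's a minimal sketch of the online softmax recurrence (my own illustration, not code from the video): it folds the usual "subtract the max" safety trick into a single streaming pass by rescaling the running denominator whenever a new maximum shows up, which is what lets Flash Attention work block by block without materializing the full score matrix.

```python
import numpy as np

def online_softmax(x):
    """One-pass (online) softmax over a 1-D array.

    Keeps a running max `m` (the safe-softmax trick) and a running
    denominator `d`, rescaling `d` whenever the max grows. A short
    second pass emits the normalized probabilities.
    """
    m = float("-inf")  # running max, for numerical safety
    d = 0.0            # running denominator, scaled by exp(-m)
    for xi in x:
        m_new = max(m, xi)
        # rescale the old denominator to the new max, then add this term
        d = d * np.exp(m - m_new) + np.exp(xi - m_new)
        m = m_new
    return np.exp(x - m) / d

x = np.array([2.0, 1.0, 3.0, 0.5])
reference = np.exp(x - x.max()) / np.exp(x - x.max()).sum()
assert np.allclose(online_softmax(x), reference)
```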