Neural Networks & Night Code

Thoughts on AI, deep learning, and the art of debugging at 3am.

Flash Attention: Making Transformers Faster

Oct 11, 2025
Tags: Flash Attention, Transformers, GPU Optimization, Softmax

A deep dive into Umar Jamil's Flash Attention video and how Flash Attention solves the memory bottleneck in transformers. Learn about safe softmax, online softmax, and how to leverage shared memory for 10x faster attention computation.
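
For a quick taste of the two softmax variants named above, here is a minimal pure-Python sketch. It is not taken from the video or the post: the function names `safe_softmax` and `online_softmax` are illustrative assumptions, and a real kernel would apply this to tiles of a matrix on the GPU, not to a Python list.

```python
import math

def safe_softmax(xs):
    """Softmax with the maximum subtracted first, so exp() cannot overflow."""
    m = max(xs)                           # pass 1: find the maximum
    exps = [math.exp(x - m) for x in xs]  # pass 2: shifted exponentials
    total = sum(exps)
    return [e / total for e in exps]

def online_softmax(xs):
    """Same result, but the maximum and the normalizer are built in one pass."""
    m = float("-inf")  # running maximum
    d = 0.0            # running sum of exp(x - m)
    for x in xs:
        m_new = max(m, x)
        # Rescale the old sum to the new maximum, then add the new term.
        d = d * math.exp(m - m_new) + math.exp(x - m_new)
        m = m_new
    return [math.exp(x - m) / d for x in xs]

# Inputs this large would overflow a naive exp(x) / sum(exp(x)).
scores = [1000.0, 1001.0, 1002.0]
probs_safe = safe_softmax(scores)
probs_online = online_softmax(scores)
```

The rescaling step is what lets the normalizer be updated as new scores arrive, which is the property block-wise attention relies on.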

$ tail latest_activity.log

Published Flash Attention deep dive

Optimizing attention mechanisms

Writing about online softmax algorithms