GPU Optimization
Transformers
Softmax
Flash Attention: Making Transformers Faster
Building intuition from the memory bottleneck up: safe softmax, online softmax with its running correction factor, block matrix multiplication, and the full Flash Attention 2 forward pass, which never materializes the full attention matrix in HBM.
October 2025 · 10 min read