Writing

Learning in public. Notes on GPU kernels, Triton internals, and the systems work behind fast biological AI.

GPU Optimization · Transformers · Softmax

Flash Attention: Making Transformers Faster

Building intuition from the memory bottleneck up: safe softmax, online softmax with its running correction factor, block matrix multiplication, and the full Flash Attention 2 forward pass, which never materializes the full attention matrix in HBM.

October 2025 · 10 min read