I'm a GPU/ML systems researcher who writes portable, fused GPU kernels to accelerate biological and medical foundation models on NVIDIA and AMD hardware, turning order-of-magnitude speedups into clinically and scientifically usable systems.
Custom Triton kernels for AlphaGenome (5.05× faster, ~0.99999997 cosine similarity) and for Enformer on 196K-base DNA sequences (1.43× on MI300X, 1.82× on H100).
Accelerated NVIDIA Bio Group's protein-ligand binding model on ROCm: 41.3 s → 1.06 s on MI300X, 4.2× on H100, 100% fidelity. Throughput up from 19.4 to 752.9 samples/s, a 97% cost cut.
CUDA+Triton inference engine for Facebook's esm2_t6_8M (1.21M downloads/mo): 43.1× speedup, 97.7% latency reduction, 68.9% memory savings, $1,100+/mo/GPU saved.
GANs that synthesize T2, T1CE, and FLAIR MRI contrasts from a single T1 scan, replacing four scans with one. Cuts scan time by 30–44 min and cost by ~70% per patient, with SSIM > 0.95.
Open to GPU systems research, kernel optimization, and collaborations at the intersection of high-performance computing and biology.