Solo-authored work on GPU kernel acceleration for biological and medical foundation models, benchmarked cross-vendor on NVIDIA H100 and AMD MI300X. Each entry has a dedicated page with its key findings unless noted otherwise.
A fused GPU kernel folds irregular-sampling interpolation and SSM inference into a single launch, cutting end-to-end latency by 35.7× and meeting the sub-50 ms bedside target while improving AUROC over GRU-D.
Profiling-driven kernel fusion folds 3–8 TensorNet operations into single GPU launches, cutting kernel launches by 67–88% for a 2.82× end-to-end speedup that shortens a 13-hour molecular dynamics (MD) run to 4.6 hours, with physical accuracy preserved exactly.
A measurement study of biological foundation model inference across NVIDIA and AMD GPUs. Dedicated findings page coming soon.
One portable Triton fusion framework accelerates six biological models and algorithms (DualBind, AlphaGenome, Enformer, ESM-2, ProtBERT, Needleman-Wunsch) up to 720× across NVIDIA and AMD GPUs, with zero accuracy loss.
Per-model drop-in Triton kernels accelerate protein and genomic foundation models plus classical algorithms (Smith-Waterman, Needleman-Wunsch, k-mer indexing, BWT) up to 2,986× across NVIDIA and AMD, with correctness preserved to machine precision.
A library of 20+ write-once Triton kernels delivering speedups of 10× to 19,000× for sequence alignment, k-mer indexing, and quality control, compiling through MLIR to run identically on NVIDIA and AMD with no source changes.
Also accepted: an Arch4Health (ACM ICS 2026) talk on the ICU work. A dedicated page is being added for it.
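The launch-count and runtime figures quoted for the TensorNet fusion entry can be sanity-checked with simple arithmetic. The sketch below is a consistency check, not the method used in the work. It assumes that every group of n operations collapses into exactly one launch and that no unfused launches remain, neither of which is stated in the summary. The function names are hypothetical.

```python
def launch_reduction(ops_per_group: int) -> float:
    """Fraction of launches removed when ops_per_group launches become one.

    Assumption (not from the source): each fused group maps to exactly
    one kernel launch and all launches belong to some fused group.
    """
    return 1 - 1 / ops_per_group


def fused_runtime(baseline_hours: float, speedup: float) -> float:
    """Runtime after applying an end-to-end speedup factor."""
    return baseline_hours / speedup


low = launch_reduction(3)           # fusing 3 operations into 1 launch
high = launch_reduction(8)          # fusing 8 operations into 1 launch
hours = fused_runtime(13.0, 2.82)   # 13-hour MD run at 2.82x

print(f"{low:.1%} to {high:.1%} fewer launches")
print(f"{hours:.1f} hours")
```

Under these assumptions, fusing 3 to 8 operations per launch removes roughly two-thirds to seven-eighths of launches, which lines up with the quoted 67–88% range, and 13 hours divided by 2.82 gives about 4.6 hours.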