Research

Solo-authored work on GPU kernel acceleration for biological and medical foundation models, benchmarked cross-vendor on NVIDIA H100 and AMD MI300X. Each entry has a dedicated page with its key findings.

SD4H · ICML 2026 · State-Space Models · Clinical AI

From 805ms to 23ms: Accelerating State-Space Models for Real-Time ICU Monitoring

A fused GPU kernel folds irregular-sampling interpolation and SSM inference into a single launch, cutting end-to-end latency by 35.7× and clearing the sub-50 ms bedside target while improving AUROC over GRU-D.

ICML 2026 · South Korea · Workshop paper · View findings →

SimBioChem · EurIPS 2025 · Molecular Dynamics · ML Force Fields

Accelerating Molecular Simulations with Triton: Fused GPU Kernels for TensorNet Neural Potentials

Profiling-driven kernel fusion folds 3–8 TensorNet operations into single GPU launches, cutting kernel launches by 67–88% for a 2.82× end-to-end speedup that turns a 13-hour MD run into 4.6 hours, with physical accuracy preserved exactly.

EurIPS 2025 · Denmark · Workshop paper · View findings →

HotInfra · ISCA 2026 · Infrastructure

When the LLM-Tuned Stack Misses: An Infrastructure View of Biological Foundation Model Inference Across NVIDIA and AMD

A measurement study of biological foundation model inference across NVIDIA and AMD. Dedicated findings page coming soon.

ISCA 2026 · USA · Page in progress

ISC High Performance 2026 · Cross-Vendor · Bio Foundation Models

Portable GPU Kernel Acceleration for Biological Foundation Models & Algorithms using OpenAI Triton

One portable Triton fusion framework accelerates six biological models and algorithms (DualBind, AlphaGenome, Enformer, ESM-2, ProtBERT, Needleman-Wunsch) up to 720× across NVIDIA and AMD GPUs, with zero accuracy loss.

ISC 2026 · Germany · Poster · View findings →

RECOMB-ARCH 2026 · Hardware-Algorithm Co-design · Bioinformatics

Hardware-Portable Fused GPU Kernels for High-Throughput Biological Foundation Models

Per-model drop-in Triton kernels accelerate protein and genomic foundation models plus classical algorithms (Smith-Waterman, Needleman-Wunsch, k-mer indexing, BWT) up to 2,986× across NVIDIA and AMD, with correctness preserved to machine precision.

RECOMB-ARCH 2026 · Greece · Poster · View findings →

MLSys YPS 2026 · Write-Once Kernels · Bioinformatics

BioTriton: Portable Cross-Vendor GPU Kernels for High-Throughput Bioinformatics via OpenAI Triton

A library of 20+ write-once Triton kernels delivers speedups ranging from 10× to 19,000× for sequence alignment, k-mer indexing, and quality control, compiling through MLIR to run identically on NVIDIA and AMD with no source changes.

MLSys YPS 2026 · USA · Poster · View findings →

Also accepted: an Arch4Health (ACM ICS 2026) talk on the ICU work. A dedicated page is being added for it.