Hi, I’m Shaoyi Zheng, currently a Ph.D. student in Computer Science at the Courant Institute of Mathematical Sciences, New York University, where I am advised by Prof. Shengjie Wang. I completed my undergraduate studies at NYU Shanghai, majoring in Computer Science with a minor in Mathematics.
My research interests broadly lie in Efficient AI and generative models, focusing on model architectural design and algorithmic acceleration.
Generative models such as diffusion models and large language models have achieved remarkable capabilities, but their ever-growing scale and input size lead to significant efficiency bottlenecks. My research focuses on algorithmic improvements and architectural innovations that accelerate these models without sacrificing quality. I explore methods such as sparsity, efficient model design, and kernel optimization to reduce latency and memory usage. My broader vision is to make powerful generative models more accessible and affordable, enabling their deployment in real-world scenarios.
Under review at ICLR 2026. Proposed HilbertA, a sparse attention mechanism based on the Hilbert curve that jointly preserves 2D spatial locality and enables contiguous memory access, improving sparsity efficiency...
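The locality idea behind a Hilbert-curve token ordering can be illustrated with the classic xy-to-Hilbert-index mapping (the standard iterative algorithm; this is an illustrative sketch, not HilbertA's attention kernel): tokens laid out along the curve keep 2D neighbors close in 1D memory.

```python
def xy2d(n, x, y):
    """Map cell (x, y) of an n*n grid (n a power of two) to its
    index along the Hilbert curve (standard iterative algorithm)."""
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        d += s * s * ((3 * rx) ^ ry)
        # Rotate/flip the sub-quadrant so the recursion stays self-similar.
        if ry == 0:
            if rx == 1:
                x, y = n - 1 - x, n - 1 - y
            x, y = y, x
        s //= 2
    return d

# Order the 16 cells of a 4x4 token grid along the curve:
order = sorted(((x, y) for x in range(4) for y in range(4)),
               key=lambda p: xy2d(4, *p))
```

Consecutive cells in `order` are always grid-adjacent, which is why a contiguous 1D window over the Hilbert-ordered sequence still covers a spatially local 2D neighborhood.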
ICML 2025, PMLR 267:40930–40951. ToMA is a GPU-aligned token merging framework for diffusion models, reformulating token merging as an attention-like linear transformation with an invertible unmerge to accelerate diffusion models without degrading quality, using submodular...
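The "token merging as a linear transformation with invertible unmerge" idea can be sketched in a few lines (a toy illustration with hand-picked merge groups; ToMA's actual submodular token selection and GPU-aligned layout are described in the paper):

```python
import numpy as np

def build_merge_matrices(groups, n):
    """Merge matrix M averages each group of token indices;
    unmerge matrix U broadcasts each merged token back to its members."""
    m = len(groups)
    M = np.zeros((m, n))
    U = np.zeros((n, m))
    for g, idxs in enumerate(groups):
        M[g, idxs] = 1.0 / len(idxs)  # average the group's tokens
        U[idxs, g] = 1.0              # scatter the merged token back
    return M, U

# Toy example: 4 tokens of dim 2, merged into 2 groups of near-duplicates.
groups = [[0, 2], [1, 3]]
X = np.array([[1., 1.], [2., 2.], [1., 1.], [2., 2.]])
M, U = build_merge_matrices(groups, n=4)
merged = M @ X        # (2, 2): the expensive attention runs on fewer tokens
restored = U @ merged  # (4, 2): unmerge restores the original token count
```

Because merge and unmerge are plain matrix multiplies, they map onto GPU-friendly dense kernels, and for groups of near-duplicate tokens the round trip loses little information.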
arXiv preprint, under review at ACL 2026 (Short Paper Track). Proposed Sub-CP, a submodular, block-aware context selection framework that controls a diversity–coherence spectrum for scalable in-context learning. Designed four partition strategies: Global Diverse,...
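As a generic illustration of submodular context selection (a facility-location greedy sketch; Sub-CP's block-aware partition strategies are more involved than this), the greedy loop below picks examples that jointly cover the candidate pool:

```python
import numpy as np

def greedy_facility_location(sim, k):
    """Greedily select k items maximizing the facility-location
    objective sum_i max_{j in S} sim[j, i] (a submodular function)."""
    n = sim.shape[0]
    selected = []
    cover = np.zeros(n)  # best similarity of each item to the selected set
    for _ in range(k):
        # Marginal gain of adding each candidate row j.
        gains = np.maximum(sim, cover).sum(axis=1) - cover.sum()
        gains[selected] = -np.inf  # never re-pick
        best = int(np.argmax(gains))
        selected.append(best)
        cover = np.maximum(cover, sim[best])
    return selected
```

Because the objective is monotone submodular, this greedy selection carries the standard (1 - 1/e) approximation guarantee; on two clusters of near-duplicate examples it picks one representative from each rather than two redundant ones.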
New York University, Courant Institute of Mathematical Sciences. Advised by Prof. Shengjie Wang. Research on Efficient AI and generative models.
Proposed HilbertA (Hilbert Attention) for diffusion models, achieving up to 4.17× speedup. Under review at ICLR 2026.
GPU-aligned token merging for diffusion models published at ICML 2025. Up to 1.4× speedup on SDXL without quality loss.
Applied Research Center, working on efficient generative model deployment.
Context selection framework for in-context learning. Under review at ACL 2026.
Research Group #21, contributing to computer vision research.
New York University Shanghai. Minor in Mathematics.