Hilbert Attention for Image Generation with Diffusion Models
arXiv preprint, 2025; under review at ICLR 2026
Hilbert Attention (HilbertA) introduces a sparse attention mechanism for diffusion models that uses the Hilbert curve to provide both 2D spatial locality and contiguous memory access. The method reorders tokens along the Hilbert curve, applies tiling and sliding windows to balance local modeling with global information flow, and preserves coalesced GPU memory access for improved throughput. A custom Triton kernel fuses the sparse attention operations, and LoRA fine-tuning maximizes information flow under sparsity. Experiments show up to 4.17× speedup on Flux.1 with comparable image quality, offering a superior speed–quality trade-off relative to dense and 2D sparse attention baselines.
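
The core idea is that the Hilbert curve maps a 2D token grid to a 1D sequence while keeping spatial neighbors close, so a simple sliding window over the reordered sequence already captures 2D-local attention. Below is a minimal NumPy sketch of that reordering plus a window mask; it is illustrative only, assumes a square power-of-two grid, and the function names (`hilbert_d2xy`, `hilbert_permutation`, `sliding_window_mask`) are hypothetical, not the paper's API or its fused Triton kernel.

```python
import numpy as np

def hilbert_d2xy(n: int, d: int) -> tuple[int, int]:
    """Map distance d along a Hilbert curve over an n x n grid (n a power
    of two) to coordinates (x, y). Standard iterative construction."""
    x = y = 0
    t = d
    s = 1
    while s < n:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        if ry == 0:                      # rotate the quadrant so sub-curves connect
            if rx == 1:
                x, y = s - 1 - x, s - 1 - y
            x, y = y, x
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y

def hilbert_permutation(n: int) -> np.ndarray:
    """Permutation taking row-major token order to Hilbert-curve order."""
    perm = np.empty(n * n, dtype=np.int64)
    for d in range(n * n):
        x, y = hilbert_d2xy(n, d)
        perm[d] = y * n + x              # row-major index of the d-th curve point
    return perm

def sliding_window_mask(seq_len: int, window: int) -> np.ndarray:
    """Boolean mask: each token attends to neighbors within `window`
    positions along the Hilbert-ordered sequence."""
    idx = np.arange(seq_len)
    return np.abs(idx[:, None] - idx[None, :]) <= window

# Example: reorder a 16x16 grid of latent tokens and build a local mask.
n, dim, window = 16, 64, 8
tokens = np.random.randn(n * n, dim)     # row-major latent tokens
perm = hilbert_permutation(n)
hilbert_tokens = tokens[perm]            # 2D neighbors stay close in 1D
mask = sliding_window_mask(n * n, window)  # sparse: O(seq_len * window)
inv_perm = np.argsort(perm)              # to scatter attention output back
```

Because the reordered sequence is contiguous in memory, the windowed attention touches contiguous blocks, which is what enables coalesced GPU access; the paper implements this pattern as a fused sparse-attention Triton kernel rather than a dense mask as sketched here.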
