InfoFlow KV: Information-Flow-Aware KV Recomputation for Long Context

arXiv preprint · 2026

InfoFlow KV targets the inference cost of long-context retrieval-augmented generation by reframing selective key-value (KV) cache recomputation as an information-flow problem. The method uses attention-norm signals from queries to identify tokens that are both semantically relevant and structurally positioned to propagate information. It also introduces information-flow-guided chunk reordering and reports improvements on both language and vision-language model benchmarks.
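
To make the token-selection idea concrete, here is a minimal sketch of one way query-driven scoring could pick tokens for recomputation. It is an illustration under stated assumptions, not the paper's algorithm: it assumes an "attention-norm" score means the query-to-token attention weight scaled by the L2 norm of the token's value vector, and the function names (`attention_norm_scores`, `select_tokens_for_recompute`) and the `budget` parameter are hypothetical. It covers only scoring and selection; the recomputation itself and the chunk reordering are not shown.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention_norm_scores(queries, keys, values):
    """Score each context token by attention weight times value norm.

    ASSUMPTION: this definition of "attention-norm" is a guess made for
    illustration; the paper may define the signal differently.
    """
    d = len(keys[0])
    value_norms = [math.sqrt(sum(x * x for x in v)) for v in values]
    scores = [0.0] * len(keys)
    for q in queries:
        # Scaled dot-product attention from this query to every context token.
        logits = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(logits)
        for j, w in enumerate(weights):
            # Average the per-query contribution across all query vectors.
            scores[j] += w * value_norms[j] / len(queries)
    return scores


def select_tokens_for_recompute(queries, keys, values, budget):
    """Return the positions of the `budget` highest-scoring tokens, in order."""
    scores = attention_norm_scores(queries, keys, values)
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:budget])
```

Calling `select_tokens_for_recompute(queries, keys, values, budget=2)` with lists of equal-length float vectors returns the sorted positions of the two context tokens that score highest under this assumed signal. A real system would operate on per-layer, per-head tensors rather than Python lists.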

Citation:
Shaoyi Zheng*, Xin Teng*, Canyu Zhang*, Danyang Zhuo, Tianyi Zhou, Shengjie Wang. "InfoFlow KV: Information-Flow-Aware KV Recomputation for Long Context." arXiv preprint arXiv:2603.05353, 2026.