arXiv preprint · 2026
InfoFlow KV addresses inference bottlenecks in long-context retrieval-augmented generation by reframing selective key-value (KV) cache recomputation as an information flow problem. The method uses attention-norm signals from the query to identify tokens that are both semantically relevant and structurally capable of propagating information. It also introduces information-flow-guided chunk reordering and reports improvements on both language and vision-language model benchmarks.
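
To make the token-selection idea concrete, here is a minimal, illustrative sketch and not the paper's implementation. It assumes that an "attention-norm" score for a context token is the query-to-token attention weight multiplied by the L2 norm of that token's value vector, summed over query tokens, and that the highest-scoring tokens are the ones chosen for KV recomputation. The function names, the scoring formula, and the `budget` parameter are all hypothetical. The sketch covers only the relevance side of the signal; it models neither the structural criterion nor the chunk reordering.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention_norm_scores(queries, keys, values):
    """Score each context token (assumed formula, for illustration only).

    score[i] = sum over query tokens q of softmax(q . K / sqrt(d))[i] * ||V[i]||
    """
    dim = len(keys[0])
    value_norms = [math.sqrt(sum(x * x for x in v)) for v in values]
    scores = [0.0] * len(keys)
    for q in queries:
        logits = [sum(a * b for a, b in zip(q, k)) / math.sqrt(dim) for k in keys]
        weights = softmax(logits)
        for i, w in enumerate(weights):
            scores[i] += w * value_norms[i]
    return scores


def select_tokens_to_recompute(scores, budget):
    """Return the positions of the `budget` highest-scoring tokens, in order."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:budget])


# Toy example: three context tokens, one query token, 2-dimensional heads.
keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
values = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]]
queries = [[1.0, 0.0]]

scores = attention_norm_scores(queries, keys, values)
selected = select_tokens_to_recompute(scores, budget=2)
```

In this toy setup the third token receives the same attention weight as the first but has a larger value norm, so it ranks highest. A real system would compute these quantities per layer and per head from the model's cached tensors.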