Gaussian Representations for Video

Sachin Shah Anustup Choudhury Guan-Ming Su Jaclyn Pytlarz Christopher A. Metzler Trisha Mittal

Winter Conference on Applications of Computer Vision (WACV) — 2026

We introduce Gaussian representations for videos (GaRV), a novel video encoding and decoding scheme based on 3D Gaussians. Traditional representations encode videos as sequences of frames, and neural representations encode videos within the weights of a neural network; we instead encode videos as a collection of 3D Gaussians within a space-time volume. The key advantage of our approach is that it enables efficient and flexible rasterization-based video decoding. At the cost of a slight drop in overall compression rate, GaRV offers an 8-50x improvement in decoding time and a 2.5-15x reduction in GPU memory compared with neural counterparts. Existing Gaussian video techniques require 2-30x more disk space than GaRV while also using more GPU resources. Moreover, GaRV offers unique flexibility in how and when pixels are decoded: frames and regions can be decoded non-sequentially without penalty, and selected regions can be decoded at high resolution to enable low-cost foveated video decoding.
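The abstract does not spell out the decoding math. One standard way to turn 3D (x, y, t) Gaussians into an image is to condition each Gaussian on the query time t, which yields a 2D Gaussian whose center drifts with time and whose contribution fades with temporal distance. The NumPy sketch below assumes that formulation plus plain additive blending of per-Gaussian RGB colors; the function names (`slice_gaussians`, `pixel_grid`, `decode_region`) are hypothetical, and GaRV's actual parameterization, blending and tile-based rasterizer may differ. It is meant only to show why random access is cheap: decoding frame t, or a sub-rectangle of it, touches only that t and those pixels.

```python
import numpy as np

def slice_gaussians(means, covs, t):
    """Condition 3D (x, y, t) Gaussians on a query time t.

    means: (N, 3) centers (x, y, t); covs: (N, 3, 3) SPD covariances.
    Returns 2D means (N, 2), 2D covariances (N, 2, 2) and temporal weights (N,).
    """
    mu_xy, mu_t = means[:, :2], means[:, 2]
    s_xy = covs[:, :2, :2]   # spatial block
    s_xt = covs[:, :2, 2]    # space-time cross-covariance, (N, 2)
    s_tt = covs[:, 2, 2]     # temporal variance, (N,)
    dt = t - mu_t
    # Gaussian conditioning: the 2D center moves linearly with t ...
    mean2d = mu_xy + s_xt * (dt / s_tt)[:, None]
    # ... and the 2D covariance is the Schur complement of the s_tt block.
    cov2d = s_xy - np.einsum("ni,nj->nij", s_xt, s_xt) / s_tt[:, None, None]
    # Unnormalized temporal falloff: Gaussians far from t contribute little.
    w_t = np.exp(-0.5 * dt**2 / s_tt)
    return mean2d, cov2d, w_t

def pixel_grid(y0, y1, x0, x1, step=1.0):
    """(x, y) sample positions covering [y0, y1) x [x0, x1) in row-major order."""
    ys = np.arange(y0, y1, step, dtype=float)
    xs = np.arange(x0, x1, step, dtype=float)
    gx, gy = np.meshgrid(xs, ys)  # both shaped (len(ys), len(xs))
    return np.stack([gx.ravel(), gy.ravel()], axis=-1), (len(ys), len(xs))

def decode_region(means, covs, colors, t, y0, y1, x0, x1, step=1.0):
    """Evaluate every Gaussian at the requested samples and add their colors.

    Brute force over all N Gaussians and P samples (no culling, sorting or tiling).
    """
    pix, shape = pixel_grid(y0, y1, x0, x1, step)
    mean2d, cov2d, w_t = slice_gaussians(means, covs, t)
    d = pix[None, :, :] - mean2d[:, None, :]            # (N, P, 2) offsets
    inv = np.linalg.inv(cov2d)                          # (N, 2, 2)
    maha = np.einsum("npi,nij,npj->np", d, inv, d)      # squared Mahalanobis distance
    weights = w_t[:, None] * np.exp(-0.5 * maha)        # (N, P)
    return (weights.T @ colors).reshape(*shape, 3)      # additive RGB blend

# Toy "video": a red blob centered at frame 5 and a blue blob centered at frame 15.
means = np.array([[8.0, 8.0, 5.0], [20.0, 12.0, 15.0]])
covs = np.array([np.diag([4.0, 4.0, 9.0]), np.diag([9.0, 2.0, 4.0])])
colors = np.array([[1.0, 0.0, 0.0], [0.0, 0.0, 1.0]])

frame = decode_region(means, covs, colors, 5.0, 0, 24, 0, 32)       # full 24x32 frame
crop = decode_region(means, covs, colors, 5.0, 4, 12, 6, 18)        # one region only
fovea = decode_region(means, covs, colors, 5.0, 4, 12, 6, 18, 0.5)  # same region, 2x denser
```

In this sketch the crop matches the corresponding slice of the full frame without decoding anything else, and the "fovea" simply samples the same region on a finer grid, which is the kind of selective, non-sequential decoding the abstract describes. A practical renderer would additionally cull Gaussians whose temporal weight is negligible and rasterize in tiles on the GPU; those details are omitted here.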

@inproceedings{shah2026gaussian,
  title={Gaussian Representations for Video},
  author={Shah, Sachin and Choudhury, Anustup and Su, Guan-Ming and Pytlarz, Jaclyn and Metzler, Christopher A and Mittal, Trisha},
  booktitle={Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision},
  pages={827--837},
  year={2026}
}