Foveated Scalable Image and Video Coding
Zhou Wang and Alan C. Bovik
(a) Background of Foveation
The photoreceptors (cones and rods) and ganglion cells are non-uniformly distributed in the retina of the human eye. The density of cone receptors and ganglion cells plays an important role in determining the ability of our eyes to resolve what we see. Spatially, the resolution is highest at the fovea and drops rapidly away from that point as a function of eccentricity. As a result, when a human observer gazes at a point in a real-world image, a variable-resolution image is transmitted through the front visual channel to the high-level processing units in the human brain. The region around the point of fixation (or foveation point) is projected onto the fovea, sampled with the highest density, and perceived with the highest sensitivity. The sampling density and contrast sensitivity decrease dramatically with increasing eccentricity. In short, the human visual system (HVS) is space-variant in sampling, coding, processing and understanding visual information. (An illustration of the human visual foveation model is available here.) By contrast, traditional digital computer vision systems represent images on rectangular, uniformly sampled lattices, which have the advantages of simple acquisition, storage, indexing and computation. Nowadays, most digital images and videos are stored, processed, transmitted and displayed in rectangular matrix format, where each entry represents one sampling point.
The motivation behind foveation image processing is:
Because the HVS cannot resolve fine detail away from the point of fixation, much of the high-frequency information in the peripheral regions is redundant. A much more efficient representation of images can therefore be obtained by removing or reducing this redundancy, provided that the foveation point(s) and the viewing distance can be determined.
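As a rough sketch of how the falloff of resolution with eccentricity might be quantified, the Python snippet below computes the eccentricity of a pixel and a corresponding cutoff frequency. It assumes an exponential contrast-threshold model of the form CT(f, e) = CT0 * exp(alpha * f * (e + e2) / e2). The function names, the viewing-distance convention (measured in image widths) and the default parameter values are illustrative assumptions, not details stated in this text.

```python
import math


def eccentricity_deg(x, y, fx, fy, width, viewing_distance):
    """Eccentricity (degrees) of pixel (x, y) relative to the fixation point (fx, fy).

    Assumption: viewing_distance is expressed in multiples of the image
    width, so viewing_distance * width is the distance in pixels.
    """
    d = math.hypot(x - fx, y - fy)  # distance from the fixation point, in pixels
    return math.degrees(math.atan(d / (viewing_distance * width)))


def cutoff_frequency(e, alpha=0.106, e2=2.3, ct0=1.0 / 64):
    """Frequency (cycles/degree) at which the assumed contrast threshold reaches 1.

    Solves ct0 * exp(alpha * f * (e + e2) / e2) = 1 for f. The default
    parameter values are illustrative assumptions.
    """
    return e2 * math.log(1.0 / ct0) / (alpha * (e + e2))


if __name__ == "__main__":
    # Cutoff frequency along a horizontal line through the fixation point.
    for x in (256, 320, 384, 448, 511):
        e = eccentricity_deg(x, 256, 256, 256, width=512, viewing_distance=3.0)
        print(f"x={x:3d}  eccentricity={e:5.2f} deg  cutoff={cutoff_frequency(e):5.2f} c/deg")
```

Under these assumptions, frequencies above the cutoff at a given eccentricity are not visible to the observer, which is the redundancy that a foveated representation aims to discard.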
(b) Foveated Rate Scalable Image Coding
We designed an embedded foveation image coding (EFIC) algorithm, which attempts to order the output bitstream so that the bits with the greatest contribution to the foveated visual distortion are encoded and transmitted first. In other words, it is designed to optimize foveated visual quality at any bit rate. The encoded bitstream can be truncated at any point to exactly match the available bandwidth of the communication channel or network. Truncating the bitstream at different points generates decoded images with different bit rates, visual quality and foveation depth.
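The idea of an importance-ordered, truncatable bitstream can be illustrated with a toy example. The sketch below is not EFIC itself: it simply emits the bit planes of a few non-negative integer coefficients in order of a hypothetical foveation weight multiplied by the bit-plane value, so that any prefix of the stream can be decoded. The names `scan_order`, `encode` and `decode`, the coefficients and the weights are all assumptions made for illustration.

```python
def scan_order(weights, num_planes):
    """List (plane, index) pairs by weighted significance, largest first.

    The order depends only on the weights, so an encoder and a decoder that
    share the weights (e.g. derived from the foveation point) agree on it.
    """
    pairs = [(p, i) for p in range(num_planes) for i in range(len(weights))]
    # sorted() is stable, so ties keep their original, deterministic order.
    return sorted(pairs, key=lambda pi: -(weights[pi[1]] * (1 << pi[0])))


def encode(coeffs, weights, num_planes):
    """Emit one bit per (plane, index) pair, most important first."""
    return [(coeffs[i] >> p) & 1 for p, i in scan_order(weights, num_planes)]


def decode(bits, weights, num_planes):
    """Rebuild coefficients from any prefix of the bitstream."""
    out = [0] * len(weights)
    # zip() stops at the end of the (possibly truncated) bit list.
    for bit, (p, i) in zip(bits, scan_order(weights, num_planes)):
        out[i] |= bit << p
    return out


if __name__ == "__main__":
    coeffs = [5, 3, 7, 1]                # toy non-negative coefficients
    weights = [1.0, 0.5, 0.25, 0.125]    # hypothetical foveation weights
    stream = encode(coeffs, weights, num_planes=3)
    for n in (0, 4, 8, len(stream)):
        print(n, decode(stream[:n], weights, num_planes=3))
```

Decoding a longer prefix never increases the error of any coefficient, and the full stream reproduces the coefficients exactly; a real embedded coder adds entropy coding and a far more compact significance structure.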
Some demo images of EFIC are shown below.
Z. Wang and A. C. Bovik, "Embedded foveation image coding," IEEE Transactions on Image Processing, vol. 10, no. 10, pp. 1397-1410, Oct. 2001.
Note: All the compressed images have the same bit rate but different foveated region(s).
(c) Foveated Rate Scalable Video Coding
Rate scalable coding algorithms allow the extraction of coded visual information at continuously varying data rates from a single compressed bitstream. This feature is especially suited for video transmission over heterogeneous, multi-user, time-varying and interactive networks such as the Internet. For example, in order to provide video services over the Internet, the video server must have the ability to create variable bandwidth video streams to meet different user requirements. The traditional solutions, such as layered video, video transcoding, and simply repeated encoding, require more resources in terms of computation, storage space and/or data management. More importantly, they lack the flexibility to adapt to the time-varying network conditions and user requirements, because once the compressed video stream is generated, it becomes inconvenient to change it to an arbitrary data rate. In contrast, with a rate scalable codec, we can tightly couple the available bandwidth and the data rate of the video being delivered.
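As a minimal sketch of what coupling the data rate to the available bandwidth could look like on the server side, the snippet below truncates one embedded frame to a per-frame byte budget. The function names and the assumption that each frame is stored as an independently truncatable byte string are hypothetical and are not described in this text.

```python
def frame_budget_bytes(bandwidth_bps, frame_rate):
    """Bytes available per frame at a given bandwidth (bits/s) and frame rate (frames/s)."""
    return int(bandwidth_bps / (8 * frame_rate))


def serve_frame(embedded_frame, bandwidth_bps, frame_rate):
    """Return the prefix of an embedded frame that fits the current budget."""
    return embedded_frame[:frame_budget_bytes(bandwidth_bps, frame_rate)]


if __name__ == "__main__":
    frame = bytes(4000)  # stand-in for one embedded-coded frame
    for bps in (64_000, 240_000, 1_000_000):
        sent = serve_frame(frame, bps, frame_rate=30)
        print(f"{bps:>9} bit/s -> {len(sent)} bytes per frame")
```

Because the same stored bitstream serves every user, the budget can be recomputed for each frame as the measured bandwidth changes, without re-encoding.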
We developed a foveation scalable video coding (FSVC) system based on motion estimation (ME) and motion compensation (MC). The key techniques behind the system are: (a) foveated visual sensitivity modeling; (b) embedded rate scalable coding (with a modified SPIHT algorithm); (c) selection of the foveation point(s); and (d) foveation-based adaptive frame prediction.
Z. Wang, L. Lu, and A. C. Bovik, "Rate scalable video coding using a foveation-based human visual system model," IEEE International Conference on Acoustics, Speech, & Signal Processing, vol. III, pp. 1785-1789, May 2001.
L. Lu, Z. Wang, and A. C. Bovik, "Adaptive frame prediction for foveation scalable video coding," IEEE International Conference on Multimedia and Expo, Aug. 2001.