Foveated Scalable Image and Video Coding
Zhou Wang and Alan C. Bovik
(a) Background of Foveation
The photoreceptors (cones and rods) and ganglion cells are distributed
non-uniformly across the retina of the human eye. The densities of cone
receptors and ganglion cells play an important role in determining
how finely our eyes can resolve what we see. Spatial resolution is
highest at the fovea and drops rapidly with increasing eccentricity,
the angular distance from the fovea.
As a result, when a human observer gazes at a point in a real-world
image, a variable-resolution image is transmitted through the front-end
visual channels to the higher-level processing units of the brain.
The region around the point of fixation (or foveation point) is
projected onto the fovea, sampled with the highest density, and
perceived with the highest sensitivity. Sampling density and contrast
sensitivity decrease dramatically with increasing eccentricity. In
short, the human visual system (HVS) is space-variant in how it
samples, codes, processes and understands visual information. (An
illustration of the human visual foveation model is available here.)
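To make "resolution drops with eccentricity" concrete, the sketch below uses a contrast-threshold model of the kind commonly used in foveated coding (the Geisler and Perry form): CT(f, e) = CT0 exp(alpha f (e + e2) / e2). Setting CT to its maximum value of 1 and solving for f gives a cutoff frequency f_c(e) = e2 ln(1/CT0) / (alpha (e + e2)). The constants (alpha = 0.106, e2 = 2.3 degrees, CT0 = 1/64), the viewing geometry and the function names are assumptions for illustration, not values stated on this page.

```python
import math

# Assumed model constants (Geisler-Perry style fit); not given on this page.
ALPHA = 0.106        # spatial frequency decay constant
E2 = 2.3             # half-resolution eccentricity, in degrees
CT0 = 1.0 / 64.0     # minimum contrast threshold


def eccentricity_deg(dist_px: float, width_px: int, view_dist: float) -> float:
    """Eccentricity (degrees) of a pixel dist_px pixels from the fixation point.

    view_dist is the viewing distance measured in image widths (assumed geometry).
    """
    return math.degrees(math.atan(dist_px / (width_px * view_dist)))


def cutoff_frequency(ecc_deg: float) -> float:
    """Highest visible spatial frequency (cycles/degree) at a given eccentricity.

    Obtained by setting CT(f, e) = CT0 * exp(ALPHA * f * (e + E2) / E2) equal to 1
    (maximum contrast) and solving for f.
    """
    return E2 * math.log(1.0 / CT0) / (ALPHA * (ecc_deg + E2))


if __name__ == "__main__":
    # Hypothetical setup: 512-pixel-wide image viewed from 3 image widths away.
    for d in (0, 32, 64, 128, 256):
        e = eccentricity_deg(d, 512, 3.0)
        print(f"{d:4d} px  ecc={e:5.2f} deg  cutoff={cutoff_frequency(e):5.1f} cyc/deg")
```

Under this model the cutoff halves at e = e2 and keeps falling with eccentricity, which is the space-variant resolution described above.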
By contrast, traditional digital computer vision systems represent
images on uniformly sampled rectangular lattices, which are simple
to acquire, store, index and compute with. Today, most digital images
and videos are stored, processed, transmitted and displayed as
rectangular matrices, in which each entry represents one sampling point.
The motivation behind foveated image processing is that peripheral
regions contain considerable high-frequency information that the eye
cannot resolve. If the foveation point(s) and the viewing distance can
be determined, removing or reducing this redundancy yields a much more
efficient image representation.
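The redundancy argument can be made quantitative with a small sketch. A pixel carries high-frequency detail the eye cannot use whenever the display can show finer detail than the eye resolves at that pixel's eccentricity. For an N-pixel-wide image viewed from v image widths, the display's Nyquist limit is roughly pi N v / 360 cycles/degree. The cutoff model and its constants, the rule that each pixel is governed by its nearest foveation point, and all function names are assumptions for illustration.

```python
import math

ALPHA, E2, CT0 = 0.106, 2.3, 1.0 / 64.0   # assumed Geisler-Perry style constants


def cutoff_frequency(ecc_deg):
    """Foveation-limited cutoff frequency (cycles/degree) at eccentricity ecc_deg."""
    return E2 * math.log(1.0 / CT0) / (ALPHA * (ecc_deg + E2))


def display_nyquist(width_px, view_dist):
    """Highest frequency the display can show (cycles/degree), about pi*N*v/360."""
    return math.pi * width_px * view_dist / 360.0


def redundant_fraction(width_px, height_px, fixations, view_dist):
    """Fraction of pixels whose foveation cutoff lies below the display Nyquist limit.

    At those pixels part of the displayable high-frequency content exceeds what
    the eye resolves. With several foveation points, each pixel is assumed to be
    governed by the nearest one (smallest eccentricity).
    """
    f_display = display_nyquist(width_px, view_dist)
    count = 0
    for y in range(height_px):
        for x in range(width_px):
            d = min(math.hypot(x - fx, y - fy) for fx, fy in fixations)
            ecc = math.degrees(math.atan(d / (width_px * view_dist)))
            if cutoff_frequency(ecc) < f_display:
                count += 1
    return count / (width_px * height_px)


if __name__ == "__main__":
    # Hypothetical 256x256 image viewed from 6 image widths away.
    one = redundant_fraction(256, 256, [(128, 128)], view_dist=6.0)
    two = redundant_fraction(256, 256, [(64, 128), (192, 128)], view_dist=6.0)
    print(f"one foveation point: {one:.2f}, two foveation points: {two:.2f}")
```

Both inputs from the paragraph above matter here: moving the viewer closer lowers the display limit in cycles/degree and shrinks the redundant area, while adding foveation points leaves less of the image in the periphery.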
(b) Foveated Rate Scalable Image Coding
We designed an embedded foveation image coding (EFIC) algorithm,
which attempts to order the output bitstream so that the bits that
contribute most to reducing foveated visual distortion are encoded
and transmitted first. In other words, it is designed to optimize
foveated visual quality at any bit rate. The encoded bitstream can
be truncated at any point to exactly match the bandwidth available
on a communication channel or network. Truncating the bitstream at
different points yields decoded images with different bit rates,
visual quality and foveation depth.
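The ordering principle can be illustrated with a toy embedded coder. This is not EFIC itself; real embedded coders such as SPIHT also exploit wavelet-tree structure and entropy coding. Here each set magnitude bit of a coefficient is one unit, its importance is assumed to be the coefficient's foveation weight times the bit's magnitude, signs are assumed known to the decoder, and the function names are hypothetical. Because units are sent in decreasing importance, any prefix decodes to a valid approximation.

```python
from typing import List, Sequence, Tuple


def embedded_order(coeffs: Sequence[int], weights: Sequence[float]) -> List[Tuple[int, int]]:
    """Order magnitude bits so the most important ones come first.

    Each unit (i, p) is the set bit at bit-plane p of coefficient i. Its
    importance is weights[i] * 2**p (a simplifying assumption).
    """
    units = []
    for i, (c, w) in enumerate(zip(coeffs, weights)):
        mag = abs(c)
        p = 0
        while mag >> p:
            if (mag >> p) & 1:
                units.append((w * (1 << p), i, p))
            p += 1
    # Decreasing importance; ties broken by coefficient index.
    units.sort(key=lambda u: (-u[0], u[1]))
    return [(i, p) for _, i, p in units]


def decode_prefix(order, signs, n, budget):
    """Rebuild coefficients from the first `budget` units of the ordered stream."""
    rec = [0] * n
    for i, p in order[:budget]:
        rec[i] += 1 << p
    return [s * r for s, r in zip(signs, rec)]


def weighted_error(coeffs, rec, weights):
    """Foveation-weighted squared error between original and reconstruction."""
    return sum(w * (c - r) ** 2 for c, r, w in zip(coeffs, rec, weights))


if __name__ == "__main__":
    coeffs = [37, -12, 5, 60, -3]
    weights = [1.0, 1.0, 0.9, 0.2, 0.1]   # last two assumed far from the fixation
    signs = [1 if c >= 0 else -1 for c in coeffs]
    order = embedded_order(coeffs, weights)
    for budget in range(len(order) + 1):   # "truncate" the stream at every point
        rec = decode_prefix(order, signs, len(coeffs), budget)
        print(budget, rec, round(weighted_error(coeffs, rec, weights), 1))
```

Note that the large but heavily down-weighted coefficient 60 is refined only after the nearby coefficients, which is the behaviour the foveated ordering is meant to produce.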
Some EFIC demo images are shown below.
Relevant Publications
Z. Wang and A. C. Bovik, "Embedded foveation image coding,"
IEEE Transactions on Image Processing, vol. 10, no. 10, pp. 1397-1410,
Oct. 2001.

(a) Original "Zelda" image, 512 by 512, 8 bits/pixel
(b) Original "Zelda" image with foveated region
(c) SPIHT compression, 0.25 bpp (32:1)
(d) EFIC compression, 0.25 bpp (32:1)
(e) SPIHT compression, 0.125 bpp (64:1)
(f) EFIC compression, 0.125 bpp (64:1)
(g) SPIHT compression, 0.0625 bpp (128:1)
(h) EFIC compression, 0.0625 bpp (128:1)
(i) SPIHT compression, 0.03125 bpp (256:1)
(j) EFIC compression, 0.03125 bpp (256:1)
(k) SPIHT compression, 0.015625 bpp (512:1)
(l) EFIC compression, 0.015625 bpp (512:1)
(m) SPIHT compression, 0.0078125 bpp (1024:1)
(n) EFIC compression, 0.0078125 bpp (1024:1)

(a) Original "News" image, 352 by 288, 8 bits/pixel
(b) Foveated region selection
(c) SPIHT compression, 0.25 bpp (32:1)
(d) EFIC with 3 foveated regions, 0.25 bpp (32:1)
(e) EFIC with the left foveated region only, 0.25 bpp (32:1)
(f) EFIC with the right foveated region only, 0.25 bpp (32:1)
(g) EFIC with the upper foveated region only, 0.25 bpp (32:1)
Note: All compressed "News" images have the same bit rate (0.25 bpp);
the EFIC results differ only in the choice of foveated region(s).
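The bit rates in the captions translate directly into byte budgets: at b bits/pixel, a W x H image gets W * H * b bits, and the compression ratio relative to the 8 bits/pixel originals is 8/b. For example, 0.25 bpp allows 8192 bytes for the 512 x 512 "Zelda" image but only 3168 bytes for the 352 x 288 "News" image. The helper below (a hypothetical name) just does that arithmetic.

```python
def bit_budget(width: int, height: int, bpp: float, source_bpp: float = 8.0):
    """Return (total bits, bytes, compression ratio) for a target bit rate in bpp."""
    bits = width * height * bpp
    return bits, bits / 8, source_bpp / bpp


if __name__ == "__main__":
    for name, w, h, bpp in (("Zelda", 512, 512, 0.25),
                            ("Zelda", 512, 512, 0.0078125),
                            ("News", 352, 288, 0.25)):
        bits, nbytes, ratio = bit_budget(w, h, bpp)
        print(f"{name}: {bpp} bpp -> {nbytes:.0f} bytes, {ratio:.0f}:1")
```

At the lowest "Zelda" rate (1024:1) the whole image is described by only 256 bytes, which is why the choice of where to spend those bits matters so much.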
(c) Foveated Rate Scalable Video Coding
Rate scalable coding algorithms allow coded visual information to be
extracted at continuously varying data rates from a single compressed
bitstream. This feature is especially well suited to video transmission
over heterogeneous, multi-user, time-varying and interactive networks
such as the Internet. For example, to provide video services over the
Internet, a video server must be able to generate video streams of
varying bandwidth to meet different user requirements. Traditional
solutions, such as layered video, video transcoding and simple repeated
encoding, require more resources in terms of computation, storage space
and/or data management. More importantly, they lack the flexibility to
adapt to time-varying network conditions and user requirements, because
once a compressed video stream has been generated, it is inconvenient
to convert it to an arbitrary data rate. In contrast, a rate scalable
codec lets us tightly couple the data rate of the delivered video to
the available bandwidth.
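A minimal sketch of that coupling, assuming each frame has already been encoded into an embedded (prefix-decodable) bitstream: the server stores one stream and, for each user, cuts every frame's bits to the size that user's current bandwidth allows. The function name and inputs are hypothetical; a real system would also handle packet headers, buffering and rate control across frames.

```python
from typing import List


def truncate_for_user(frame_streams: List[bytes], bandwidth_bps: List[float],
                      fps: float) -> List[bytes]:
    """Cut each frame's embedded bitstream to fit the user's current bandwidth.

    frame_streams[k] is the embedded bitstream of frame k; bandwidth_bps[k] is
    the bandwidth (bits/s) available while frame k is sent.
    """
    out = []
    for stream, bw in zip(frame_streams, bandwidth_bps):
        budget_bytes = int(bw / (8 * fps))   # bits per frame, floored to whole bytes
        out.append(stream[:budget_bytes])    # any prefix is still decodable
    return out


if __name__ == "__main__":
    stored = [bytes(4000)] * 3   # one stored encoding of a 3-frame clip
    users = {
        "fast link": [800_000, 800_000, 800_000],
        "congested link": [400_000, 120_000, 240_000],
    }
    for name, bw in users.items():
        sizes = [len(f) for f in truncate_for_user(stored, bw, fps=30.0)]
        print(name, sizes)
```

No re-encoding or transcoding happens per user: adapting to a bandwidth drop is just a shorter slice of the same stored bits.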
We developed a foveation scalable video coding (FSVC) system based
on motion estimation (ME) and motion compensation (MC). The key
techniques behind the system are: (a) foveated visual sensitivity
modeling; (b) embedded rate scalable coding (with a modified SPIHT
algorithm); (c) foveation point(s) selection; and (d) foveation-based
adaptive frame prediction.
Relevant Publications
Z. Wang, L. Lu, and A. C. Bovik, "Rate scalable video coding
using a foveation-based human visual system model," IEEE
International Conference on Acoustics, Speech, & Signal Processing,
vol. III, pp. 1785-1789, May 2001.
L. Lu, Z. Wang, and A. C. Bovik, "Adaptive frame prediction
for foveation scalable video coding," IEEE International
Conference on Multimedia and Expo, Aug. 2001.