Laboratory for Image & Video Engineering

History of LIVE Public-domain Subjective Picture Quality Databases

Public-domain subjective picture quality databases are essential, foundational tools for developing models and algorithms that predict how human viewers perceive and rate the quality of pictures, whether they are television signals, cell phone videos, or cinematic motion pictures. The creation of a useful Subjective Picture Quality Database is a large multi-month, and sometimes multi-year, task that requires deep insights into perceptual video engineering, communications systems, picture distortion modeling, visual neuroscience, and visual psychophysics. The few successful public-domain subjective picture quality databases are masterpieces of perceptual engineering that have made possible, and continue to enable, the design, testing, and comparison of high-performance perceptual video quality measurement models and algorithms. These models and algorithms are used throughout the U.S. and global broadcast, cable, and satellite television and cinematic industries, in the development and testing of compression standards, the benchmarking of picture processing algorithms, the design of digital cameras, and so on.

The LIVE public-domain subjective picture quality databases are purely scientific efforts created without any proprietary interest, secrecy in design or algorithm testing, or monetary connection. They are made freely available to the public, without any cost or return, as a public service. Users have complete freedom, without license or royalty fees, to use them professionally, copy them, modify them, or distribute them. The only stipulations made for use of the LIVE databases (and most other public databases) concern copyright and acknowledgment of the source of the database in any publications that use the videos, subject scores, and associated data.

First Successful Public-domain Subjective Picture Quality Database

The LIVE Image Quality Assessment Database (LIVE IQA) was the world’s first large-scale, successful public-domain subjective picture quality database. It was created and made publicly available on the LIVE web page in 2003, and was first described in 2004 in the paper [1] on the Structural Similarity (SSIM) index. Since then, the database has been downloaded thousands of times by researchers, television engineers, and photographic engineers around the world. It has been used to create, compare, and test hundreds of advanced picture and video quality models. Using this breakthrough resource, in 2004 the authors of [1] were able to show that the quality prediction performance of the new SSIM model was significantly better than that of the Emmy-award-winning Sarnoff JND-Metrix model, at greatly reduced computational complexity. In a large follow-up study reported in [2], the leading picture quality prediction models of the time were rigorously tested. It was found that two models, the newer multi-scale version of SSIM [3] and the perceptually driven, information-theoretic Visual Information Fidelity (VIF) model [4], delivered nearly identical perceptual quality prediction performance, both significantly better than the Sarnoff model or any other model, and both far better than the questionable peak signal-to-noise ratio (PSNR) or MSE methods. Since then, other public-domain subjective picture quality databases have been created following the LIVE model. Some are smaller and more specific, such as the IRCCyN/IVC database [5], and others are large and comprehensive, such as the TID database [6]. The database analysis paper [7] makes a number of interesting comparisons of existing public picture quality databases.
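
To make concrete the kind of per-image, full-reference computation being benchmarked in these studies, here is a minimal Python sketch, assuming scikit-image is installed (the file names are placeholders), that computes both PSNR and SSIM on a reference/distorted image pair:

```python
# Minimal full-reference comparison of a distorted picture against its
# pristine reference, computing both PSNR and SSIM.
# Assumes scikit-image; "reference.png" / "distorted.png" are placeholders.
from skimage import img_as_float, io
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

ref = img_as_float(io.imread("reference.png", as_gray=True))
dist = img_as_float(io.imread("distorted.png", as_gray=True))

# PSNR: a purely pixel-wise fidelity measure derived from the MSE.
psnr = peak_signal_noise_ratio(ref, dist, data_range=1.0)

# SSIM: compares local luminance, contrast, and structure statistics,
# which is why it tracks human judgments far better than PSNR.
ssim = structural_similarity(ref, dist, data_range=1.0)

print(f"PSNR = {psnr:.2f} dB, SSIM = {ssim:.4f}")
```

A study like [2] repeats this computation over every distorted picture in a database and then measures how well each model’s outputs track the recorded human scores.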

Prior to LIVE IQA, the only existing large picture quality databases were the proprietary, non-public VQEG databases produced by the Video Quality Experts Group. No access to the video data was given to any researchers or developers other than the algorithm proponents themselves, who also contributed content to the database. The intention was that the video data and testing were to be limited to the proponents. Unfortunately, the first “Phase 1 FR TV” VQEG effort failed since, as explained in the VQEG report [8] and as summarized in [2],

“the nine video QA methods that it tested, which contained some of the most sophisticated algorithms at that time, were statistically indistinguishable from the simple peak signal-to-noise ratio (PSNR).”

The VQEG did make all of the VQEG Phase 1 data publicly available after the failure of the study. Unfortunately, the VQEG Phase 1 perceptual scores were of limited use, owing to the poor distribution of the picture distortion categories and severities, which resulted in a high degree of “quality clustering” and poor perceptual separability. VQEG Phase 1 is sometimes referred to as the first public picture quality database, e.g., in [7]; however, this claim is questionable for two reasons: the original intention was that the database be used only for the benefit of proponent algorithm development and testing, and the data was released only after it was found to be defective. In any case, the first successful public-domain picture quality database was LIVE IQA, which has been used vastly more heavily than all the VQEG databases combined. Because of its intrinsic problems, VQEG FR TV Phase 1 has been cited fewer than 100 times, while LIVE IQA has been cited about 3,000 times.
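
To make the benchmarking methodology concrete: model performance on a subjective database is conventionally summarized by correlating the model’s outputs against the human (differential) mean opinion scores, e.g., using Spearman and Pearson correlations, as in [2] and the VQEG reports. Below is a minimal Python sketch, with synthetic placeholder arrays standing in for real database scores:

```python
# Sketch of the standard benchmarking computation: correlate objective
# model predictions against human subjective scores from a database.
# The arrays below are synthetic placeholders, not real LIVE/VQEG data.
import numpy as np
from scipy.stats import pearsonr, spearmanr

rng = np.random.default_rng(0)
dmos = rng.uniform(0.0, 100.0, size=200)               # placeholder human DMOS
predictions = dmos + rng.normal(0.0, 10.0, size=200)   # placeholder model scores

# Spearman rank-order correlation (SROCC): monotonic prediction accuracy.
srocc, _ = spearmanr(dmos, predictions)

# Pearson linear correlation (PLCC); in practice this is usually reported
# after fitting a monotonic logistic mapping from model scores to DMOS.
plcc, _ = pearsonr(dmos, predictions)

print(f"SROCC = {srocc:.3f}, PLCC = {plcc:.3f}")
```

“Quality clustering” of the kind seen in VQEG Phase 1 is damaging precisely here: if the subjective scores bunch into a few clumps, these correlations cannot discriminate between competing models.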

The next proprietary VQEG attempt (Phase 2) fared little better: the performances of all proponent algorithms were again statistically indistinguishable, albeit slightly better than PSNR; the data was again held secret, and the videos have regrettably never been publicly released. Nevertheless, based on the study, the ITU did standardize one of the models (NTIA’s “VQM”), although it never had any commercial impact and was later shown to deliver only middling performance on the large public databases. More advanced models such as Multiscale SSIM, MOVIE, Oklahoma State’s MAD, ST-RRED, and the open-source Netflix VMAF model, which are based on sophisticated neuroscience and neurostatistical models, have been shown on LIVE VQA to provide much better performance than VQM or any other model, and vastly better performance than PSNR. Unfortunately, the VQEG Phase 2 data was never made available, except (oddly) for the subjective scores, which were of little use, since most of the videos were owned by the proponent companies, who would not release them. As explained in [10]:

Only subjective data has been made available publicly from the VQEG FRTV Phase 2 study and the videos have not been made public, due to several copyright and licensing issues. The situation with the VQEG Multimedia dataset is identical, wherein the VQEG plans to release only the subjective data in September, 2009 and the videos will not be released publicly. This is a grave concern, since unavailability of the VQEG datasets seriously limits the ability of researchers to benchmark the performance of new, objective VQA models against the VQEG evaluations.

The absurdity of releasing only subjective scores without the videos is obvious. Because of the limitations placed by the proponents involved in the VQEG studies, VQEG has not been able to produce any successful public-domain databases since then, despite its stated objectives (e.g., at http://www.its.bldrdoc.gov/vqeg/about-vqeg.aspx):

“It is also the ambition of VQEG to produce open source databases of video material and test results, as well as software tools …. VQEG has been forced to proceed with validation tests using materials that have copyright restrictions, since open source video material has been difficult to obtain”

This ambition never materialized beyond the first (defective) Phase 1 dataset. In our view, the video quality assessment tools advocated by VQEG and occasionally blessed by the ANSI, ISO, and/or ITU standards bodies on the basis of internal VQEG tests (including VQM and SwissQual’s VQuad-HD) have never caught hold in commercial practice, since they were not held to adequate public scrutiny and scientific repeatability. In contradistinction, picture quality measurement tools created, tested, and compared on the LIVE public-domain subjective picture quality databases, and on other public-domain databases like them, are used globally by every facet of the Photographic, Image Processing, Television, and Cinematic industries and beyond, and are used to control and monitor the Television experiences of hundreds of millions of viewers daily.

First Public-domain Subjective Motion Picture Quality Database

LIVE also created and made public the first public-domain subjective motion picture quality database, in mid-2009 [9]. A year later, the LIVE Video Quality Database (LIVE VQA) was used for a large-scale algorithm comparison in [10], which first showed the significantly superior performance of the new MOVIE model [11] relative to the ITU-standard (but little-used) VQM model and all other models, and even to the widely deployed Multiscale SSIM algorithm. The MOVIE model was able to achieve this level of performance using advanced models of cortical and extra-cortical neuronal function in the visual brain. The success of LIVE VQA led to later “VQA” databases, including the smaller and more specific EPFL-PoliMi database [12], released in 2010.

Prior to the pioneering LIVE Public-domain Subjective Picture Quality Databases, the only existing resources were private, industry-proprietary, or unpublished, and hence unavailable. Aside from their obvious lack of usefulness to the world at large, these kinds of databases could not be tested to demonstrate their internal consistency, accuracy, freedom from bias, or perceptual relevance. This left in question the quality of the perceptual engineering design of the video content and the video distortions, the unbiasedness of all aspects of the data, the quality and diversity of the human subject enrollment, and the perceptual accuracy of the measurements. LIVE VQA continues to be the pre-eminent resource for testing algorithms that assess or predict motion picture quality, although several newer LIVE databases have since been developed to address more specific motion picture quality issues.

Impact of LIVE Subjective Picture Quality Databases

The pioneering public LIVE databases, which were created in an academic laboratory with few intrinsic resources or infrastructure, but with significant innovation, inspiration, sustained effort, and perspiration, have attained dramatic successes. They have changed the picture quality measurement environment entirely, leading to and inspiring other public databases over the years, along with hundreds of new and often successful automatic picture quality analyzers. The LIVE Public-domain Subjective Picture Quality Databases quickly became the dominant source of data and the centerpieces of creative development in the field of Television picture quality. The series of increasingly diverse, innovative, and specific LIVE databases that have followed address such key problems as Television rate control, 3D content, flicker in videos, QoE optimization under new networking protocols such as DASH and HTTP, and wireless and mobile video quality. These databases have likewise continued to transform and shape the field of Television, Internet, and mobile picture quality measurement for more than a decade, and will continue to do so in the future.

Overall, public-domain subjective picture quality databases have been successful and transformative because of their wide deployment and validation in thousands of global studies of picture quality algorithm design, comparison, and application; the wide variety and innovation of the Television, Cinematic, and mobile applications studied (3D, SD, HD, streaming, mobile rate control protocols, and so on); their large sizes, in terms of both subject enrollment and the diversity of test videos they contain; and the high quality of their perceptual engineering design. Indeed, the largest, most widely used Public-domain Subjective Picture Quality Databases have been downloaded thousands of times, subjected to extraordinary degrees of intensive testing, measurement, and validation, and used in many thousands of published applications. This is remarkable given the large storage volume of these databases (dozens of TB).

Public-domain subjective picture quality databases have made possible the design of an extraordinary diversity of Television picture quality measurement methods for SD and HD content, 3D videos, mobile/wireless television signals, and much more. The best-known and most widely used television picture quality measurement tools, such as the Primetime Emmy-award-winning Structural Similarity (SSIM) and the more recent award-winning MOVIE model, both of which are heavily commercialized and deployed throughout the global broadcast, cablecast, satellite, and post-production industries, are able to deliver highly accurate predictions of picture quality (unlike old standards such as the PSNR). The Visual Information Fidelity model was the top performer on LIVE IQA for many years, and is now used both as an integral part of the Netflix VMAF large-scale quality prediction engine and as part of the MPEG HDR quality prediction standard. A particularly robust model known as ST-RRED, which extends the VIF concept, has shown top or near-top performance on most existing databases. More recently, no-reference blind and “completely blind” picture quality models from LIVE have become widely used. These include the extremely popular neurostatistics-based BRISQUE model, and the “opinion-unaware” or “completely blind” NIQE model, which is now being commercially marketed and used around the world, especially in Cloud-based video streaming applications. These models were all designed and tested using the distorted picture data and human subject scores available in existing public-domain subjective picture quality databases. These quality prediction tools are used daily by broadcast and post-production houses around the globe to monitor and control the quality of the Television pictures delivered to tens or hundreds of millions of broadcast, cable, and satellite Television viewers.
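
As a concrete glimpse of the neurostatistical front end shared by BRISQUE and NIQE, the following minimal Python sketch (assuming NumPy and SciPy; the input image is a synthetic placeholder) computes mean-subtracted, contrast-normalized (MSCN) coefficients, the divisively normalized luminances on which those models build their natural scene statistics features:

```python
# Sketch of the MSCN (mean-subtracted, contrast-normalized) coefficients
# underlying neurostatistics-based blind models such as BRISQUE and NIQE.
# Assumes NumPy and SciPy; the input image here is a synthetic placeholder.
import numpy as np
from scipy.ndimage import gaussian_filter

def mscn(image, sigma=7.0 / 6.0, c=1.0):
    """Divisively normalize luminance by its local mean and deviation."""
    image = image.astype(np.float64)
    mu = gaussian_filter(image, sigma)                     # local mean
    var = gaussian_filter(image * image, sigma) - mu * mu  # local variance
    sd = np.sqrt(np.maximum(var, 0.0))                     # local deviation
    return (image - mu) / (sd + c)

# On pristine natural pictures the MSCN histogram is reliably near-Gaussian;
# distortions measurably deform it, which is what these models quantify.
rng = np.random.default_rng(0)
coeffs = mscn(rng.uniform(0.0, 255.0, size=(64, 64)))
print(coeffs.mean(), coeffs.std())
```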

Overview of LIVE Public-domain Subjective Picture Quality Databases

Professor Al Bovik’s Laboratory for Image and Video Engineering (LIVE) at The University of Texas at Austin is widely acknowledged as the world’s leading center for innovation in photographic, television, cinematic, and mobile picture quality measurement. Professor Bovik and his student collaborators conceptualized and designed the following currently available (online) public-domain subjective picture and video quality databases:

  • LIVE Image Quality Assessment Database (2003)
  • LIVE Video Quality Database (2009)
  • LIVE 3D Quality Database, Phase I (2012)
  • LIVE Mobile Video Quality Database (2013)
  • LIVE 3D Quality Database, Phase II (2013)
  • LIVE Multiply Distorted Image Quality Database (2013)
  • LIVE Quality of Experience (QoE) Database for HTTP-based Video Streaming (2014)
  • LIVE Flicker Video Database (2015)
  • LIVE “In the Wild” Image Quality Challenge Database (2015)
  • LIVE Mobile Video Stall Database-I (2015)
  • LIVE-ESPL HDR Subjective Image Quality Database (2016)
  • LIVE-Qualcomm Mobile In-Capture Video Quality Database (2017)
  • LIVE-NFLX Video Quality of Experience Database (2017)
  • LIVE Mobile Video Stall Database-II (2017)
  • LIVE Immersive Image Database (2018)
  • LIVE-NFLX-II Subjective Video Quality of Experience Database (2018)
  • LIVE Video Quality Challenge Database (2018)

Several other new databases are under development and slated for release, including databases directed towards 360-degree video and high frame rate VQA. These pioneering innovations have been essential global “go-to” resources for the creation, development, and testing of new Picture and Video Quality models and algorithms by video researchers and engineers throughout the world for 15 years. Through their innovations in engineering design, these highly diverse databases encompass still (frame) pictures, SD and HD videos, mobile videos, 3D pictures, immersive videos, and HD videos subjected to a wide variety of realistic distortions and Quality of Experience (QoE) impairments. The LIVE website is the most popular video processing web destination in the world, and the LIVE subjective picture quality databases have been downloaded about 15,000 times despite their massive collective size (more than 20 TB). All are available for immediate download, with explanations, including all source videos, distorted videos, and human subject scores, absolutely free at: http://live.ece.utexas.edu/research/Quality/index.htm.

Innovations and Intellectual and Scientific Contributions of the LIVE Public-domain Subjective Picture Quality Databases

Each LIVE database represents a very large effort in terms of intellectual design, vision science, video engineering, and manpower. These databases were created under carefully controlled psychometric conditions in the LIVE picture quality laboratory, which has engaged hundreds of diverse human subjects over the years in subjective quality studies, collecting several hundred thousand human judgments of picture quality. The results are the most widely used and respected picture quality data in the world. The latest innovation, the just-released “In the Wild” Image Quality Challenge Database, engaged more than 8,000 online human subjects to collect more than 350,000 human judgments of picture quality, a move away from the controlled laboratory setting that reflects mobile user behavior.

Many key innovations have been introduced in the LIVE public-domain picture quality databases. These include the first mobile and streaming “QoE” databases, which account for the perception of variations in Television or Internet bitrate, channel conditions, and adaptive protocols such as HTTP and DASH; the first 3D picture quality databases, for assessing the perception of distortions in 3D visual signals; the first “video flicker” database, which directly addresses one of the most annoying video distortions; the first truly large-scale crowdsourced picture and video quality databases; and a unique database that allows the study of multiple interacting distortions. The results of these studies have allowed researchers, scientists, and developers to profoundly deepen their understanding of the perception of picture quality. They have also led to the creation, testing, comparison, and validation of the world’s leading picture quality measurement tools, including such commercially successful models as the Primetime Emmy Award-winning Structural Similarity (SSIM) and MS-SSIM video quality measurement tools, the Visual Information Fidelity (VIF) index, the Motion-tuned Video Integrity Evaluator (MOVIE) video quality measurement tool, and the Natural Image Quality Evaluator (NIQE).

SSIM, MOVIE, VIF, and NIQE are currently heavily commercialized into products and are relied on by nearly every broadcast encoder and statistical multiplexer manufacturer, e.g., Arris, Ericsson, Harmonic, Cisco, Envivio, RGB Networks, etc., who use them daily to ensure that the products they build deliver the best possible video quality. Most broadcast and cable program originators, including Netflix, AT&T, Comcast, Discovery, Starz, NBC, FOX, Showtime, Turner, BT, and PBS, use SSIM, VIF, and NIQE to test the products they buy, the infrastructure they own, and the quality of the video content delivered to tens of millions of US viewers daily. Since at least 2009 and continuing today, SSIM and VIF have been the dominant video quality tools used by broadcast and post-production houses throughout the global TV industry: for example, they are used in Netflix’s VMAF quality assessment system, and the satellite broadcaster DIRECTV, the Sky companies (Italy, UK, Brazil, India), Nine and Telstra in Australia, and Oi and TV Globo in Brazil (among many others) use them daily to determine the quality of their channel line-ups. Television quality delivered daily to hundreds of millions of global viewers is established and maintained by SSIM/VIF/NIQE quality tools that were developed on the LIVE Public-domain Subjective Picture Quality Databases. These tools are commercialized and offered globally in picture quality systems by many companies, including Video Clarity, Rohde & Schwarz, National Instruments, SSIMWave, Intel, and many others. SSIM is also a basic measurement and comparison tool for testing and comparing MPEG-2, H.264, and HEVC video compression standard encoder implementations and protocols, and is typically the objective standard of quality (along with the hoary PSNR) in encoder “bakeoff” contests.

Nearly every other competitive picture quality measurement tool has also been created and/or tested and compared on the LIVE subjective picture quality databases or on later databases. Widely used models such as FSIM (Hong Kong Polytechnic), VSNR (Cornell), and MAD (Oklahoma State) were all created and tested on the LIVE databases. Industry tools such as the Emmy-award-winning Sarnoff JND-Metrix and the ITU standard quality tools, including NTIA’s VQM, have been subjected to testing on the LIVE subjective picture quality databases. Indeed, the superior performance of the SSIM model relative to the popular (but now obsolete) JND-Metrix system and the little-used ITU standard VQM was first demonstrated on the LIVE database in 2006. Later, the even better quality prediction performance of the newer MOVIE, ST-RRED, and VMAF tools relative to the older SSIM model was also established on the LIVE databases, and subsequently verified many times by other developers and researchers.

On release, the LIVE databases quickly became global de facto standard tools for Television (and other) picture quality model development and testing, and they remain so today. The overall corpus of LIVE subjective picture quality databases represents many years of sustained and continuing effort (each database requires 6 to 12 months or more of effort from start to finish) and involves much more than conducting human subject tests. The scrupulous, detailed design of picture content relevant to Television, Internet, mobile, and other applications requires deep engineering knowledge of how to mathematically model and simulate, or produce, realistic video distortions: realistic packet loss patterns, compression artifacts such as those of H.264 and MPEG-2 profiles, video rate control protocols, and new video transport protocols such as HTTP and DASH, all of which must be understood in order to create streaming video quality tools. The work also requires a deep knowledge of human visual perception of Television pictures and picture distortions, in both 2D and 3D, and over time.
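
As a toy illustration of the simpler end of this distortion design work, the sketch below (assuming Pillow and NumPy; "source.png" is a placeholder) generates blurred, noisy, and JPEG-compressed versions of a picture at graded severities, the kinds of simulated distortions found in databases such as LIVE IQA. The hard perceptual engineering lies in choosing contents and severity sweeps that span the quality range evenly, and in realistically simulating channel and protocol behavior:

```python
# Toy distortion generation of the kind used in databases such as LIVE IQA:
# Gaussian blur, additive white noise, and JPEG compression, each swept over
# several severities. Assumes Pillow and NumPy; "source.png" is a placeholder.
import numpy as np
from PIL import Image, ImageFilter

src = Image.open("source.png").convert("L")

# Gaussian blur at increasing severity.
for radius in (1, 2, 4):
    src.filter(ImageFilter.GaussianBlur(radius=radius)).save(f"blur_{radius}.png")

# Additive white Gaussian noise at increasing severity.
arr = np.asarray(src, dtype=np.float64)
rng = np.random.default_rng(0)
for sd in (5, 15, 30):
    noisy = np.clip(arr + rng.normal(0.0, sd, arr.shape), 0, 255)
    Image.fromarray(noisy.astype(np.uint8)).save(f"noise_{sd}.png")

# JPEG compression at decreasing quality factors.
for q in (75, 40, 10):
    src.save(f"jpeg_{q}.jpg", quality=q)
```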

References

[1] Z. Wang, A.C. Bovik, H.R. Sheikh and E.P. Simoncelli, “Image quality assessment: From error visibility to structural similarity,” IEEE Transactions on Image Processing, vol. 13, no. 4, pp. 600-612, April 2004.

[2] H.R. Sheikh, M.F. Sabir and A.C. Bovik, “An evaluation of recent full reference image quality assessment algorithms,” IEEE Transactions on Image Processing, vol. 15, no. 11, pp. 3440-3451, November 2006.

[3] Z. Wang, E. Simoncelli and A.C. Bovik, “Multi-scale structural similarity for image quality assessment,” Thirty-Seventh Annual Asilomar Conference on Signals, Systems, and Computers, Pacific Grove, California, November 9-12, 2003.

[4] H.R. Sheikh and A.C. Bovik, “Image information and visual quality,” IEEE Transactions on Image Processing, vol. 15, no. 2, pp. 430-444, February 2006.

[5] P. Le Callet and F. Autrusseau, “Subjective quality assessment IRCCyN/IVC database,” 2005, Online at: http://www.irccyn.ec-nantes.fr/ivcdb/.

[6] N. Ponomarenko et al., “TID2008 – a database for evaluation of full-reference visual quality assessment metrics,” 2008, http://www.ponomarenko.info/tid2008.htm.

[7] S. Winkler, “Analysis of public image and video databases for quality assessment,” IEEE Journal on Selected Topics in Signal Processing, vol. 6, no. 6, pp. 616-625, October 2012.

[8] VQEG, “Final report from the Video Quality Experts Group on the validation of objective models of video quality assessment,” April 2000.

[9] LIVE Video Quality Assessment Database, 2009, http://live.ece.utexas.edu/research/quality/live_video.html.

[10] K. Seshadrinathan, R. Soundararajan, A.C. Bovik and L.K. Cormack, “Study of subjective and objective quality assessment of video,” IEEE Transactions on Image Processing, vol. 19, no. 6, pp. 1427-1441, June 2010.

[11] K. Seshadrinathan and A.C. Bovik, “Motion-tuned spatio-temporal quality assessment of natural videos,” IEEE Transactions on Image Processing, vol. 19, no. 2, pp. 335-350, February 2010.

[12] F. De Simone, M. Tagliasacchi, M. Naccari, S. Tubaro, and T. Ebrahimi, “H.264/AVC video database for the evaluation of quality metrics,” IEEE International Conference on Acoustics, Speech, and Signal Processing, March 14–19, 2010.