Laboratory for Image & Video Engineering

Regarding Visual Screening

We receive many questions regarding the use of human subjects, and how testing of those subjects is administered, in our published visual quality studies. The data in those studies can be found here, with analyzed results in our publications [1]-[16]. While we conduct deep visual testing/screening in many of our psychometric experiments (e.g., detailed eye-tracking studies), we long ago came to the conclusion that such screening isn’t needed in our large-scale subjective studies of picture and video quality (exception: we still apply the Randot 3D perception test to screen subjects during 3D picture quality studies). We also discussed this conclusion, and our decision to (in most cases) not conduct visual screening, with quite a few leaders in the picture quality community, including one of the leaders of the Video Quality Experts Group, who strongly agreed with our position.

There are a number of reasons for this decision: (1) Conducting visual tests takes time, which must ultimately detract from the time spent actually rating pictures. Subjects can easily become bored or distracted. (2) Conducting the tests on subjects can possibly bias them. For example, instead of having each subject simply rate picture/video quality as desired, they may be imbued with and affected by the idea that what they are doing is being measured with scientific precision, and that it requires that they be able to see properly and with good acuity. Certainly, the Snellen test is likely to evoke this feeling. So, we instead simply ask that they wear their corrective lenses, if need be (an obvious thing), and have them ‘go at it’ with minimal instructions to guide the particular test. (3) Since realism, not only of content and distortions but also of human subjects, is important, we also believe that we must generally seek a representative cross-section of people having variable visual capabilities. This latter aspect is of particular importance.

If someone doesn’t do well on the Snellen test, we still want them in our study. In the very rare instance that their vision is uncorrectable, either they will inform us, or the outlier rejection process will remove their data, or they will have a negligible effect on the results. If they fail an aspect of the Ishihara test, then yes, we still want them. Daltonism occurs in only about 1% of the population. Our study sizes are quite a bit bigger than ITU recommendations (usually 35-60 subjects, except when there are thousands), so these folks will very likely be statistically representative. Even an extremely rare occurrence of 3 Daltons in our group would be quite welcome. In fact, we’d welcome someone who was achromatopsic, and we wouldn’t need to know about them. (4) There isn’t any evidence that color blindness affects distortion perception very much. Color distortions are usually accompanied by gray-scale distortions, hence the effect is probably minimized.
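As a rough back-of-the-envelope check on the numbers above, the following sketch computes how likely it is that a study group contains at least one, or at least three, such subjects. It assumes the 1% prevalence quoted above and that subjects are drawn independently from the population (a simple binomial model); the function name `prob_at_least` is purely illustrative and is not part of any study protocol.

```python
from math import comb


def prob_at_least(k: int, n: int, p: float) -> float:
    """Return P(X >= k) for X ~ Binomial(n, p).

    n: number of subjects in the study (assumed independently sampled)
    p: assumed prevalence of the condition in the population
    """
    # Subtract the probability of seeing fewer than k affected subjects.
    below_k = sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k))
    return 1.0 - below_k


if __name__ == "__main__":
    prevalence = 0.01  # the "about 1%" figure quoted in the text
    for n_subjects in (35, 60):  # the typical study sizes quoted in the text
        p1 = prob_at_least(1, n_subjects, prevalence)
        p3 = prob_at_least(3, n_subjects, prevalence)
        print(f"n={n_subjects}: P(>=1)={p1:.3f}, P(>=3)={p3:.4f}")
```

Under these assumptions, a group of this size will often contain no such subject at all, and three in one group is indeed a rare event, which is consistent with the expectation of a negligible effect on the pooled results.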

While we and others have found that chromatic features in image quality predictors can help their performance a bit, it is not by much, at least in the types of studies we have thus far conducted. While screening might matter in a study of pure color distortions, again, chromatically-challenged persons are part of our population, and picture quality algorithms should be able to handle this situation. Ultimately our goal is a resource for creating and testing real-world picture/video quality models and algorithms. Anyway, that’s the long history of our decision on this, and we plan to stay the course. Naturally, we are willing to listen to alternate opinions.

As a final comment, we are sometimes asked, with regard to screening and other factors, why we are not following the published industry standards for conducting picture/video quality evaluation studies, such as those published by the ITU/ISO/VQEG. The answer to this is simple: we are vision scientists and video engineers, and to do that would be very limiting. Without disparaging those efforts, we often disagree with them, which, as scientists, we have every right to do; otherwise how can we be intellectually honest, and further, how can we advance the field? While we often part ways with the “standardized way” because its prescriptions are either unnecessary, limiting, or in our view incorrect in some regard(s), even more often we conduct studies that go beyond what the standards consider, and must then make our own protocols regarding such studies.

References

[1] Z. Wang, A.C. Bovik, H.R. Sheikh and E.P. Simoncelli, “Image quality assessment: From error visibility to structural similarity,” IEEE Transactions on Image Processing, vol. 13, no. 4, pp. 600-612, April 2004 (description of the LIVE Image Quality Database Phase 1).

[2] H.R. Sheikh, M.F. Sabir and A.C. Bovik, “An evaluation of recent full reference image quality assessment algorithms,” IEEE Transactions on Image Processing, vol. 15, no. 11, pp. 3440-3451, November 2006 (LIVE Image Quality Assessment Database).

[3] K. Seshadrinathan, R. Soundararajan, A.C. Bovik and L.K. Cormack, “Study of subjective and objective quality assessment of video,” IEEE Transactions on Image Processing, vol. 19, no. 6, pp. 1427-1441, June 2010 (LIVE Video Quality Assessment Database; first introduced and made public by Al Bovik during ICIP 2008 Plenary Talk).

[4] A.K. Moorthy, L.K. Choi, A.C. Bovik and G. de Veciana, “Video quality assessment on mobile devices: Subjective, behavioral, and objective studies,” IEEE Journal of Selected Topics in Signal Processing, vol. 6, no. 6, pp. 652-671, October 2012 (LIVE Mobile Video Quality Assessment Database).

[5] A.K. Moorthy, C.-C. Su, A. Mittal and A.C. Bovik, “Subjective evaluation of stereoscopic image quality,” Signal Processing: Image Communication, vol. 28, no. 9, pp. 870-883, September 2013 (LIVE 3D Image Quality Database).

[6] M.-J. Chen, L.K. Cormack and A.C. Bovik, “No-reference quality assessment of natural stereopairs,” IEEE Transactions on Image Processing, vol. 22, no. 9, pp. 3379-3391, September 2013 (LIVE 3D Image Quality Database Phase 2).

[7] D. Jayaraman, A. Mittal, A.K. Moorthy and A.C. Bovik, “Objective quality assessment of multiply distorted images,” Annual Asilomar Conference on Signals, Systems, and Computers, Monterey, California, November 4-7, 2012 (LIVE Multiply Distorted Image Quality Database).

[8] C. Chen, L.K. Choi, G. de Veciana, C. Caramanis, R.W. Heath, Jr. and A.C. Bovik, “A model of the time-varying subjective quality of HTTP video streams with rate adaptations,” IEEE Transactions on Image Processing, vol. 23, no. 5, pp. 2206-2221, May 2014 (LIVE QoE Database for HTTP-based Video Streaming).

[9] S. Gunasekar, J. Ghosh and A.C. Bovik, “Face detection on distorted images augmented by perceptual quality-aware features,” IEEE Transactions on Information Forensics and Security, vol. 9, no. 12, pp. 2119-2131, December 2014 (LIVE Distorted Face Database).

[10] L.K. Choi, A.C. Bovik and L.K. Cormack, “A flicker detector model of the motion silencing illusion,” Perception, vol. 43, no. 12, pp. 1286-1302, December 2014 (LIVE Flicker Visibility Database).

[11] D. Ghadiyaram and A.C. Bovik, “Massive online crowdsourced study of subjective and objective picture quality,” IEEE Transactions on Image Processing, vol. 25, no. 1, pp. 372-387, January 2016 (LIVE In-the-Wild Challenge Image Quality Database).

[12] D. Ghadiyaram, J. Pan and A.C. Bovik, “A time-varying subjective quality model for mobile streaming videos with stalling events,” SPIE Conference on Applications of Digital Image Processing, San Diego, California, August 10-13, 2015 (LIVE Mobile Video Stall Database-I).

[13] D. Kundu, D. Ghadiyaram, A.C. Bovik and B.L. Evans, “Large-scale crowdsourced study for tone-mapped HDR pictures,” IEEE Transactions on Image Processing, vol. 26, no. 10, pp. 4725-4740, October 2017 (ESPL-LIVE HDR Subjective Image Quality Database).

[14] D. Ghadiyaram, J. Pan, A.C. Bovik, A. Moorthy, P. Panda and K.C. Yang, “In-capture mobile video distortions: A study of subjective behavior and objective algorithms,” IEEE Transactions on Circuits and Systems for Video Technology, to appear (LIVE-Qualcomm Mobile In-Capture Video Quality Database).

[15] C. Bampis, Z. Li, A.K. Moorthy, I. Katsavounidis, A. Aaron and A.C. Bovik, “Study of temporal effects on subjective video quality of experience,” IEEE Transactions on Image Processing, vol. 26, no. 11, pp. 5217-5231, November 2017 (LIVE-Netflix Video Quality of Experience Database).

[16] D. Ghadiyaram, J. Pan and A.C. Bovik, “A subjective and objective study of stalling events in mobile streaming videos,” IEEE Transactions on Circuits and Systems for Video Technology, to appear (LIVE Mobile Video Stall Database-II).