Laboratory for Image & Video Engineering

Welcome to the LIVE-YouTube Text-in-Video Quality and LIVE-COCO Text Legibility Databases

LIVE-YT-TVQ and LIVE-COCO-TL Databases

Introduction

User-generated visual content (UGC) now makes up a significant fraction of internet traffic, and billions of UGC videos and pictures are uploaded daily. Among these, short-form video now accounts for most of the videos consumed by online users. Given the popularity of short-form UGC, being able to control the perceptual quality of UGC videos has emerged as an important problem. Visual UGC is subject to myriad types, severities, and combinations of distortions. While UGC video quality has been closely studied, the quality and legibility of text that is overlaid on or embedded in short-form UGC videos has received relatively little attention. However, being able to accurately predict text quality in images is important, since it affects both the overall perception of the content the text is embedded in and the messages being conveyed. It also benefits applications involving image or video text recognition, which can affect visual search and content identification. Analyzing the quality of text embedded in pictures or videos is a hard problem, since its perception is commingled with that of the surrounding visual content.

Our work contributes both to the psychophysics of embedded text quality and to computational models of its perception. We have created two subjective datasets: the LIVE-COCO Text Legibility (LIVE-COCO-TL) Database, a modification of COCO-Text, and the LIVE-YouTube Text-in-Video Quality (LIVE-YT-TVQ) Database. LIVE-COCO-TL contains 74,440 text patches with legibility annotations, while LIVE-YT-TVQ contains approximately 19K subjective quality ratings on 405 videos and 641 text patches extracted from them. We build models that predict embedded or overlaid text legibility and text quality, as well as a multi-task model that simultaneously predicts the overall quality of videos with embedded or overlaid text and the local text quality.

Download

We are making the LIVE-YouTube Text-in-Video Quality Database and the LIVE-COCO Text Legibility Database available to the research community free of charge. If you use these databases in your research, we kindly ask that you cite the papers listed below:

  • M. Mandal, N. Birkbeck, B. Adsumilli and A. C. Bovik, "Legit: Text Legibility For User-Generated Media," 2024 IEEE International Conference on Image Processing (ICIP), Abu Dhabi, United Arab Emirates, 2024, pp. 1152-1158, doi: 10.1109/ICIP51287.2024.10647498.
  • M. Mandal, N. Birkbeck, B. Adsumilli and A. C. Bovik, "Quality Prediction of Embedded and Overlaid Text in User-Generated Visual Content," in IEEE Transactions on Image Processing, 2024 (Under Review)

You can download the publicly available release of the databases by filling out THIS form. The password and the link to the databases will be provided once you complete the form.

Database Description

The LIVE-COCO-TL Database contains 74,440 text patches categorized as legible or illegible, based on ratings collected in the COCO-Text study. Of these, 34,927 text patches were labeled as legible and 39,513 as illegible. The patches were derived from the original 173.6K text regions in COCO-Text. We maintained the original COCO-Text train-validation split, which resulted in 60.4K training patches and 14K validation patches.

The LIVE-YT-TVQ Database consists of 405 videos and 641 text patches extracted from them. Subjective perceptual quality ratings on both the videos and the embedded or overlaid text patches were obtained from 28 participants in a controlled laboratory study.
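The figures quoted above can be cross-checked with a few lines of arithmetic. The Python sketch below is only an illustration and is not part of the database release. The variable names are hypothetical, the split sizes are the rounded values given in the description, and the rating total is the approximate 19K figure from the introduction, so the derived averages are rough estimates rather than reported statistics.

```python
# Sanity check of the counts quoted in the database description.
# All variable names are illustrative; only the numbers come from this page.

# LIVE-COCO-TL: legibility labels
legible = 34_927
illegible = 39_513
total_patches = legible + illegible

legible_share = legible / total_patches
illegible_share = illegible / total_patches

# LIVE-COCO-TL: train/validation split, in thousands (rounded on the page)
train_k = 60.4
val_k = 14.0
train_share = train_k / (train_k + val_k)

# LIVE-YT-TVQ: stimuli and the approximate rating total ("approximately 19K")
videos = 405
text_patches = 641
approx_ratings = 19_000  # approximate figure, so the average below is rough
stimuli = videos + text_patches
approx_ratings_per_stimulus = approx_ratings / stimuli

print(f"total patches:        {total_patches}")
print(f"legible share:        {legible_share:.1%}")
print(f"illegible share:      {illegible_share:.1%}")
print(f"training share:       {train_share:.1%}")
print(f"stimuli in YT-TVQ:    {stimuli}")
print(f"ratings per stimulus: about {approx_ratings_per_stimulus:.0f}")
```

The two label counts sum to the stated 74,440 patches, and the rounded split sizes sum to roughly 74.4K, so the quoted numbers are mutually consistent.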

Investigators

The investigators in this research are:

Copyright Notice

-----------COPYRIGHT NOTICE STARTS WITH THIS LINE------------
Copyright (c) 2016 The University of Texas at Austin
All rights reserved.

Permission is hereby granted, without written agreement and without license or royalty fees, to use, copy, modify, and distribute this database (the videos, the results and the source files) and its documentation for any purpose, provided that the copyright notice in its entirety appear in all copies of this database, and the original source of this database, Laboratory for Image and Video Engineering (LIVE, http://live.ece.utexas.edu ) at the University of Texas at Austin (UT Austin, http://www.utexas.edu ), is acknowledged in any publication that reports research using this database.

The following papers are to be cited in the bibliography whenever the database is used as:

  • M. Mandal, N. Birkbeck, B. Adsumilli and A. C. Bovik, "Legit: Text Legibility For User-Generated Media," 2024 IEEE International Conference on Image Processing (ICIP), Abu Dhabi, United Arab Emirates, 2024, pp. 1152-1158, doi: 10.1109/ICIP51287.2024.10647498.
  • M. Mandal, N. Birkbeck, B. Adsumilli and A. C. Bovik, "Quality Prediction of Embedded and Overlaid Text in User-Generated Visual Content," in IEEE Transactions on Image Processing, 2024 (Under Review)

IN NO EVENT SHALL THE UNIVERSITY OF TEXAS AT AUSTIN BE LIABLE TO ANY PARTY FOR DIRECT, INDIRECT, SPECIAL, INCIDENTAL, OR CONSEQUENTIAL DAMAGES ARISING OUT OF THE USE OF THIS DATABASE AND ITS DOCUMENTATION, EVEN IF THE UNIVERSITY OF TEXAS AT AUSTIN HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

THE UNIVERSITY OF TEXAS AT AUSTIN SPECIFICALLY DISCLAIMS ANY WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. THE DATABASE PROVIDED HEREUNDER IS ON AN "AS IS" BASIS, AND THE UNIVERSITY OF TEXAS AT AUSTIN HAS NO OBLIGATION TO PROVIDE MAINTENANCE, SUPPORT, UPDATES, ENHANCEMENTS, OR MODIFICATIONS.

-----------COPYRIGHT NOTICE ENDS WITH THIS LINE------------
