
Laboratory for Image & Video Engineering

LIVE-NFLX-II: Towards Perceptually Optimized End-to-end Adaptive Video Streaming

LIVE-NFLX-II Subjective Video QoE Database

Introduction

According to the Cisco Visual Networking Index report, video traffic from content delivery networks is expected to occupy 71% of all consumed bandwidth by 2021. The need for more bandwidth and network resources is largely fueled by large-scale video streaming applications and by increasing consumer demand for higher-quality videos and larger display resolutions. Nevertheless, the available bandwidth is sometimes volatile and insufficient, especially in mobile networks and in developing countries.

HTTP-based adaptive video streaming (HAS) has become a very common method for modern video streaming services (such as Netflix and YouTube) to cope with network variations. The main idea behind HAS is to encode the video content into multiple representations (bitrate/quality levels) and to allow for client-driven rate allocation. Under this setting, the client device is responsible for deciding the rate/quality of the chunk that will be played next. These client decisions are usually based on past network throughput values, future bandwidth estimates, and other client-side information, such as the device's buffer status.
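
To make the adaptation logic concrete, the sketch below illustrates one simple, hypothetical client-side rule (it is not one of the four algorithms used in this database): it combines a conservative throughput estimate with the current buffer level to pick the bitrate of the next chunk. The bitrate ladder, safety factor, and buffer threshold are illustrative assumptions.

    BITRATE_LADDER_KBPS = [250, 500, 1000, 2000, 4000]  # hypothetical representations

    def choose_next_bitrate(past_throughputs_kbps, buffer_level_sec,
                            low_buffer_threshold_sec=5.0, safety_factor=0.8):
        """Pick the bitrate (kbps) of the next chunk to request."""
        # Conservative throughput estimate: harmonic mean of recent samples.
        estimate = len(past_throughputs_kbps) / sum(1.0 / t for t in past_throughputs_kbps)
        if buffer_level_sec < low_buffer_threshold_sec:
            return BITRATE_LADDER_KBPS[0]  # buffer nearly empty: protect against rebuffering
        sustainable = [b for b in BITRATE_LADDER_KBPS if b <= safety_factor * estimate]
        return max(sustainable) if sustainable else BITRATE_LADDER_KBPS[0]

    print(choose_next_bitrate([2800, 3000, 3200], buffer_level_sec=12.0))  # -> 2000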

Existing QoE studies do not fully capture important aspects of actual video streaming systems; for example, they do not incorporate actual network measurements or client adaptation strategies. To this end, we have created an adaptive streaming prototype that consists of an encoding module, a network module, a video quality module, and a client module. Using this system, we built the LIVE-NFLX-II database, a large subjective QoE database that integrates perceptual video coding and quality assessment with actual measurements of network and buffer conditions and client-based adaptation.

A unique characteristic of the subjective database presented herein is that we incorporate recent developments in large-scale video encoding and adaptive streaming. To model video encoding, we deployed an encoding optimization tool that selects the optimal encoding parameters (in a rate-distortion/quality sense) for each content, an approach that can lead to significant bitrate savings compared to a fixed bitrate ladder. To guide the video encoding process, we utilize a state-of-the-art video quality algorithm (VMAF) that is currently employed to measure streaming video quality at global scale.
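
As a rough illustration of this idea (and not the actual optimization tool used for the database), the sketch below selects, among candidate (resolution, QP) encodes of a segment, the one with the highest VMAF score whose bitrate fits a target budget; the candidate list and numbers are hypothetical placeholders for real encoder and VMAF runs.

    def pick_encoding(candidates, target_bitrate_kbps):
        """candidates: dicts with 'resolution', 'qp', 'bitrate_kbps' and 'vmaf' keys."""
        feasible = [c for c in candidates if c['bitrate_kbps'] <= target_bitrate_kbps]
        if not feasible:
            return min(candidates, key=lambda c: c['bitrate_kbps'])  # cheapest fallback
        return max(feasible, key=lambda c: c['vmaf'])  # best perceptual quality within budget

    # Hypothetical pre-computed (encode, measure) results for one segment.
    candidates = [
        {'resolution': (1920, 1080), 'qp': 26, 'bitrate_kbps': 3800, 'vmaf': 94.0},
        {'resolution': (1280, 720),  'qp': 24, 'bitrate_kbps': 2100, 'vmaf': 90.5},
        {'resolution': (960, 540),   'qp': 22, 'bitrate_kbps': 1500, 'vmaf': 85.0},
    ]
    print(pick_encoding(candidates, target_bitrate_kbps=2500))  # -> the 720p encode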

To model video streaming, we use actual network measurements and a pragmatic client buffer simulator, rather than just simplistic network and buffer occupancy models. Given the plethora of network traces and adaptation strategies, the database captures multiple streaming adaptation aspects, such as video quality fluctuations, rebuffering events of varying durations and numbers, spatial resolution changes, and diverse bitrate/quality levels and video content types.
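
The sketch below shows, under simplifying assumptions (one-second trace samples, a fixed chunk duration, and playback starting from an empty buffer), how a throughput trace and a sequence of chunk bitrates can drive a simple buffer simulation that accumulates rebuffering time. It only illustrates the mechanism and is not the simulator used to build the database.

    def simulate_buffer(trace_kbps, chunk_bitrates_kbps, chunk_duration_sec=4.0):
        """Return the total rebuffering time (sec) when downloading chunks over a trace."""
        buffer_sec, rebuffer_sec, t = 0.0, 0.0, 0  # t indexes one-second trace samples
        for bitrate in chunk_bitrates_kbps:
            remaining_kbits = bitrate * chunk_duration_sec
            while remaining_kbits > 0 and t < len(trace_kbps):
                remaining_kbits -= min(remaining_kbits, trace_kbps[t])  # download this second
                drained = min(buffer_sec, 1.0)   # playback drains up to one second of buffer
                rebuffer_sec += 1.0 - drained    # the rest of the second is a stall
                buffer_sec -= drained
                t += 1
            buffer_sec += chunk_duration_sec  # the downloaded chunk is appended to the buffer
        return rebuffer_sec

    toy_trace = [2000] * 10 + [300] * 20 + [2000] * 10  # kbps, one sample per second
    print(simulate_buffer(toy_trace, chunk_bitrates_kbps=[1000] * 8))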

Download

The LIVE-NFLX-II subjective video QoE database is available to the research community free of charge. If you use it in your research, we kindly ask that you cite our paper listed below:

  • C. G. Bampis, Z. Li, I. Katsavounidis, T.-Y. Huang, C. Ekanadham, and A. C. Bovik, “Towards Perceptually Optimized End-to-end Adaptive Video Streaming,” submitted to IEEE Transactions on Image Processing.

You can download the publicly available videos together with the subjective data by clicking THIS link. Please fill out THIS FORM, and the password will be sent to you.

Database Description

The database includes 420 videos that were evaluated by 65 subjects, yielding 9,750 continuous-time and 9,750 retrospective subjective opinion scores. Continuous-time scores capture the instantaneous Quality of Experience (QoE), while retrospective scores reflect the overall viewing experience. The videos were generated from 15 video contents streamed under 7 different network conditions and 4 client adaptation strategies (15 × 7 × 4 = 420). The 7 network conditions are actual network traces from the HSDPA dataset, representing challenging 3G mobile networks. The 4 client adaptation strategies cover the most representative classes of client adaptation algorithms, such as rate-based, buffer-based, and quality-based. The selected video contents cover a diverse set of genres (action, documentary, sports, animation, and video games), and their characteristics span a wide variety, including natural and animated content, fast and slow motion, light and dark scenes, and low- and high-texture scenes.

To design the experimental interface, we relied on PsychoPy, a Python-based software package. PsychoPy makes it possible to generate and display visual stimuli with high precision, which is very important when collecting continuous, per-frame subjective data. To facilitate video quality research, we make our subjective experiment interface publicly available at THIS link.
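
As a rough illustration (and not the released interface itself), the following PsychoPy sketch plays a video while sampling a continuous rating slider once per displayed frame; the file name, window size, and rating range are assumptions, and the exact stimulus classes may vary across PsychoPy versions.

    from psychopy import visual, core, event
    from psychopy.constants import FINISHED

    win = visual.Window(size=(1920, 1080), fullscr=True, color='black', units='norm')
    movie = visual.MovieStim3(win, 'example_distorted_video.mp4', size=(2.0, 2.0))
    slider = visual.Slider(win, ticks=(0, 100), granularity=0,
                           pos=(0, -0.9), size=(1.5, 0.05))
    slider.markerPos = 50  # start the continuous rating at mid-scale

    clock = core.Clock()
    continuous_scores = []  # one (time in seconds, rating) sample per displayed frame
    while movie.status != FINISHED:
        movie.draw()
        slider.draw()
        win.flip()
        continuous_scores.append((clock.getTime(), slider.markerPos))
        if event.getKeys(keyList=['escape']):
            break
    win.close()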

A single-stimulus continuous quality evaluation study was carried out over a period of four weeks at The University of Texas at Austin’s LIVE subjective testing lab. We collected retrospective and continuous-time QoE scores on a 1080p, 16:9 computer monitor. Given the large number of videos to be evaluated and the necessary constraints on the duration of a subjective study, we showed only a portion of the distorted videos to each subject via a round-robin approach. To avoid fatigue, the study was divided into three separate 30-minute viewing sessions of 50 videos each (150 videos in total per subject), with sessions conducted at least 24 hours apart. To minimize memory effects, we ensured that no content was displayed more than once within any group of 7 displayed videos. We used the Snellen visual acuity test to ensure that all participants had normal or corrected-to-normal vision.

Metadata Description

To allow for reproducible research, we also make available the metadata for the generated video sequences. Below, we give a brief description of each variable that is made publicly available; a minimal loading example follows the list.
adaptation_algorithm: the type of streaming adaptation algorithm used to generate this specific video. There are four different algorithms in total.
content_name: the name of the source video content
content_name_acronym: acronym for the content name, following the conventions in the paper
content_spatial_information: SI measure for the corresponding source video
content_temporal_information: TI measure for the corresponding source video
continuous_zscored_mos: the continuous subjective scores, after performing z-scoring per subject and then averaging over all subjects (that watched this particular video sequence)
cropping_parameters: ffmpeg-style cropping parameters in case of black bars (useful if you want to remove them for video quality calculations)
distorted_mp4_video: name of the distorted video sequence
frame_rate: the frame rate of the video sequence
width: the width of the video (display width)
height: the height of the video (display height)
is_rebuffered_bool: a per-frame vector of zeros and ones, where 1 denotes a rebuffered frame and 0 a normally played frame
PSNR: the per-frame PSNR scores calculated between the reference and distorted videos, after removing black bars and rebuffered frames
SSIM: the per-frame SSIM scores calculated between the reference and distorted videos, after removing black bars and rebuffered frames
MSSIM: the per-frame MS-SSIM scores calculated between the reference and distorted videos, after removing black bars and rebuffered frames
STRRED: the per-frame ST-RRED scores calculated between the reference and distorted videos, after removing black bars and rebuffered frames
VMAF: the per-frame VMAF scores calculated between the reference and distorted videos, after removing black bars and rebuffered frames
N_playback_frames: the number of frames where there was no rebuffering
N_rebuffer_frames: the number of frames where there was rebuffering
N_total_frames: the total number of frames (with and without rebuffering)
per_segment_encoding_width: the encoding width for each segment in the video sequence (before adding rebuffering events, if any)
per_segment_encoding_height: the encoding height for each segment in the video sequence (before adding rebuffering events, if any)
per_segment_encoding_QP: the QP value for each segment in the video sequence (excluding rebuffering)
playback_duration_sec: duration of the video excluding rebuffering
rebuffer_duration_sec: duration of all rebuffering events in the video sequence
video_duration_sec: the total duration of the video sequence (with and without rebuffering)
playout_bitrate: the bitrate of the segment that each frame belongs to; for frames belonging to a rebuffering event, the bitrate is set to 0
rebuffer_number: the number of rebuffering events in the video sequence
reference_yuv_video: the name of the corresponding source video (in YUV 420P format)
retrospective_zscored_mos: the retrospective subjective scores, after performing z-scoring per subject and then averaging over all subjects (that watched this particular video sequence)
scene_cuts: the frames that were selected as scene boundaries, with each pair defining a segment that was encoded at a specific resolution and QP value
scene_cuts_detected: the "actual" scene cuts, i.e., the locations where the scene detection algorithm determined that a scene cut occurs. These differ from the boundaries in the scene_cuts variable, which were adjusted to enforce a maximum segment size (see paper).
throughput_trace_kbps: the corresponding network trace used to generate this video. There are seven different throughput traces overall.
throughput_trace_name: the name of the throughput trace. The naming convention is: means of transportation_start_end.
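
Below is a minimal sketch of how such metadata could be loaded and inspected; the file name and on-disk format (one Python pickle per video) are assumptions about how your copy is packaged and should be adapted to the actual distribution format (e.g., scipy.io.loadmat for .mat files).

    import pickle

    with open('example_video_metadata.pkl', 'rb') as f:  # hypothetical file name
        meta = pickle.load(f)

    # Reconstruct playback vs. rebuffering durations from the per-frame flags.
    fps = meta['frame_rate']
    rebuffered = meta['is_rebuffered_bool']
    print('playback (s):   ', sum(1 for r in rebuffered if r == 0) / fps)
    print('rebuffering (s):', sum(1 for r in rebuffered if r == 1) / fps)
    print('retrospective z-scored MOS:', meta['retrospective_zscored_mos'])
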
For any questions, please contact cbampis@gmail.com.

Investigators

The investigators in this research are C. G. Bampis, Z. Li, I. Katsavounidis, T.-Y. Huang, C. Ekanadham, and A. C. Bovik.

Copyright Notice

-----------COPYRIGHT NOTICE STARTS WITH THIS LINE------------
Copyright (c) 2018 The University of Texas at Austin
All rights reserved.

Permission is hereby granted, without written agreement and without license or royalty fees, to use, copy, modify, and distribute this software and its documentation for any purpose, provided that the copyright notice in its entirety appear in all copies of this software, and the original source of this software, Laboratory for Image and Video Engineering (LIVE, http://live.ece.utexas.edu ) at the University of Texas at Austin (UT Austin, http://www.utexas.edu ), is acknowledged in any publication that reports research using this software.

The following paper is to be cited in the bibliography whenever the software is used:

  • C. G. Bampis, Z. Li, I. Katsavounidis, T.-Y. Huang, C. Ekanadham, and A. C. Bovik, “Towards Perceptually Optimized End-to-end Adaptive Video Streaming,” submitted to IEEE Transactions on Image Processing.

IN NO EVENT SHALL THE UNIVERSITY OF TEXAS AT AUSTIN BE LIABLE TO ANY PARTY FOR DIRECT, INDIRECT, SPECIAL, INCIDENTAL, OR CONSEQUENTIAL DAMAGES ARISING OUT OF THE USE OF THIS SOFTWARE AND ITS DOCUMENTATION, EVEN IF THE UNIVERSITY OF TEXAS AT AUSTIN HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

THE UNIVERSITY OF TEXAS AT AUSTIN SPECIFICALLY DISCLAIMS ANY WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. THE SOFTWARE PROVIDED HEREUNDER IS ON AN "AS IS" BASIS, AND THE UNIVERSITY OF TEXAS AT AUSTIN HAS NO OBLIGATION TO PROVIDE MAINTENANCE, SUPPORT, UPDATES, ENHANCEMENTS, OR MODIFICATIONS.

-----------COPYRIGHT NOTICE ENDS WITH THIS LINE------------

Back to Quality Assessment Research page