Like nearly every other laboratory in the sciences and engineering, LIVE has become quite interested in deep learning, primarily as a tool to solve perception-related image processing problems. We are not of the opinion, however, that deep learning is much akin to brain science. A common conception is that CNNs resemble processing by neurons in visual cortex. This idea largely arises from the observation that, in deep convolutional networks deploying many layers of adaptation on images, the very early layers of processing often come to resemble the profiles of low-level cortical neurons in area V1, viz., directionally tuned Gabor filters. However, deep network implementations are becoming increasingly vertical, suggesting different modes of adaptation than occur in the brain. In human vision, the several million cone cell responses are distributed in parallel over a much larger retino-cortical map and processed through a relatively small number of identified functional layers. There, visual information is processed in a massively "lateral" parallel manner (unlike CNNs operating on small images of fixed dimensions) to encode low-level spatio-temporal information over diverse orientations and scales, which is then passed to multiple areas of the brain. Thus, generally accepted models of cortical processing are actually quite different from evolving deep learning systems, whose functional adaptations are distributed across increasingly vertical deep layers in a manner that is difficult to decipher. Nevertheless, the analogy is, perhaps, not an unreasonable model of low-level foveal processing when using a CNN of limited depth. Despite the tremendous excitement surrounding deep networks, it is perhaps unsurprising that arbitrarily deep networks, with universal approximation capability in each layer, optimizing millions of parameters on exceptionally efficient graphical engines, should achieve standout performance relative to simple, shallow networks.
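To make the V1 analogy concrete, the directionally tuned filters mentioned above can be sketched as a small bank of Gabor kernels (a sinusoidal carrier windowed by a Gaussian envelope). This is a minimal illustration, not code from any of the papers below; the function name and parameter choices are ours.

```python
import numpy as np

def gabor_kernel(size=21, wavelength=6.0, theta=0.0, sigma=3.0, gamma=0.5):
    """Real part of a 2-D Gabor filter: a cosine carrier times a Gaussian.

    theta is the preferred orientation; gamma controls envelope elongation.
    """
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1].astype(float)
    # Rotate coordinates so the sinusoid is tuned to orientation theta.
    xr = x * np.cos(theta) + y * np.sin(theta)
    yr = -x * np.sin(theta) + y * np.cos(theta)
    envelope = np.exp(-(xr ** 2 + (gamma * yr) ** 2) / (2 * sigma ** 2))
    carrier = np.cos(2 * np.pi * xr / wavelength)
    return envelope * carrier

# A small bank spanning four orientations, loosely analogous to the
# orientation tuning of V1 simple cells -- and to what the earliest
# layers of a trained CNN often come to resemble.
bank = [gabor_kernel(theta=t) for t in np.linspace(0, np.pi, 4, endpoint=False)]
```

Convolving an image with such a bank yields oriented band-pass responses at one scale; varying `wavelength` and `sigma` extends the bank across scales.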
What is remarkable is the generalizability of these networks, viz., their ability to "transfer" their learned knowledge from one image dataset / task to another. In our work on deep networks for picture quality, we have been emphasizing the need for, and methods of, efficient data augmentation and use, given the dearth of adequate quality-labeled picture data in existing picture quality databases. In other work we are doing on remote sensing (earth imaging), where data is often plentiful but labels (of things like rivers) are scarce, we have likewise been emphasizing the creative labeling of vast image datasets. Some of the key papers, all recent, follow:
F. Isikdogan, P. Passalacqua and A.C. Bovik, “RivaMap: An automated river analysis and mapping engine,” Remote Sensing of Environment, to appear.
F. Isikdogan, P. Passalacqua and A.C. Bovik, “DeepWaterMap: Surface water mapping by deep learning,” IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, to appear.
J. Kim, H. Zeng, D. Ghadiyaram, S. Lee, L. Zhang and A.C. Bovik, “Deep convolutional neural models for picture quality prediction,” IEEE Signal Processing Magazine, Special Issue on Deep Learning for Signal Processing, to appear.
F. Isikdogan, P. Passalacqua and A.C. Bovik, “River network extraction by deep convolutional neural networks,” IEEE Geoscience and Remote Sensing Letters, Special Issue on Remote Sensing: Learning and Computing, to appear.
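As a concrete illustration of the data-augmentation theme mentioned above, one common strategy in deep quality prediction is to crop many patches from each rated image and let every patch inherit the image's quality score, which multiplies the effective training set size. This is a hedged sketch of that general idea, not the specific pipeline of any paper listed here; the function name and parameter values are illustrative.

```python
import numpy as np

def quality_patches(image, score, patch_size=112, stride=80):
    """Crop overlapping patches from a rated image and label each patch
    with the source image's quality score. For globally applied
    distortions, perceived quality is roughly spatially homogeneous,
    which makes the inherited labels a reasonable approximation."""
    h, w = image.shape[:2]
    patches, labels = [], []
    for top in range(0, h - patch_size + 1, stride):
        for left in range(0, w - patch_size + 1, stride):
            patches.append(image[top:top + patch_size,
                                 left:left + patch_size])
            labels.append(score)
    return np.stack(patches), np.array(labels)

# Example: a single 256x256 image with (hypothetical) mean opinion
# score 3.7 yields several labeled training samples.
img = np.random.rand(256, 256, 3)
X, y = quality_patches(img, score=3.7)
```

The stride/patch-size trade-off controls how many (correlated) samples each image contributes; localized distortions would require more careful, spatially aware labeling.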