Saliency detection is an important task in image processing: it supports many downstream problems and is usually the first step in other processes. Traditionally, it has been carried out using hand-crafted features derived from principles in neuroscience. The introduction of convolutional neural networks, however, was to some degree a paradigm shift in object detection, and it affected other areas of image processing as well, including salient region detection. This project introduces two new methods based on deep learning and convolutional neural networks for pixel-level saliency detection.

The first network uses a modified version of the inception layer in addition to standard convolutional layers. This configuration is beneficial because it needs only one pass through the network to produce the final saliency map, so it is fast and can be used in real-time applications. The output is a gray-scale image, which is the final saliency map. The architecture of this network is depicted in the figure below:

The architecture of the network with inception-like layers.
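To make the idea of an inception-like layer concrete, here is a minimal sketch in plain NumPy rather than TensorFlow, so that it stays self-contained. It assumes the usual inception pattern of parallel convolution branches with different kernel sizes whose outputs are concatenated along the channel axis. The kernel sizes (1, 3, 5), the channel counts, and the function names are illustrative assumptions; the project's actual modification of the inception layer is not specified here.

```python
import numpy as np

def conv2d_same(x, w):
    """Naive zero-padded 2D convolution (cross-correlation).

    x: (H, W, C_in); w: (k, k, C_in, C_out) with odd k.
    """
    k = w.shape[0]
    p = k // 2
    H, W, _ = x.shape
    xp = np.pad(x, ((p, p), (p, p), (0, 0)))
    out = np.zeros((H, W, w.shape[3]))
    for i in range(k):
        for j in range(k):
            # Shifted window times the (C_in, C_out) slice of the kernel.
            out += xp[i:i + H, j:j + W, :] @ w[i, j]
    return out

def inception_like_block(x, branch_weights):
    """Run parallel conv + ReLU branches and concatenate their channels."""
    branches = [np.maximum(conv2d_same(x, w), 0.0) for w in branch_weights]
    return np.concatenate(branches, axis=-1)

# Hypothetical sizes: an 8x8 RGB input and three branches of 4 channels each.
rng = np.random.default_rng(0)
x = rng.standard_normal((8, 8, 3))
weights = [rng.standard_normal((k, k, 3, 4)) * 0.1 for k in (1, 3, 5)]

y = inception_like_block(x, weights)
print(y.shape)  # (8, 8, 12)
```

Because every branch keeps the spatial size of the input, the block can be stacked with ordinary convolutional layers and evaluated in a single forward pass.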

A second network with a different layer configuration has also been designed. The two configurations differ in the middle part of the network: the first uses inception-like layers, while the second uses a residual configuration. Residual blocks seem appropriate for tasks in which the output must resemble the input image. Since a saliency map shares many features with its original image, using residual blocks is justifiable. Another advantage of residual blocks is improved gradient flow, which can help network convergence. The architecture of this network is depicted in the figure below:

The architecture of the network with residual layers.
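The argument for residual blocks can be illustrated with a small NumPy sketch. It assumes the common form y = x + F(x), where F is a convolution, a ReLU, and a second convolution; the 3x3 kernels, channel count, and function names are assumptions for illustration and not necessarily the configuration used in this project. When the last convolution contributes nothing, the block reduces to the identity, which is why its output naturally resembles its input.

```python
import numpy as np

def conv3x3_same(x, w):
    """Naive zero-padded 3x3 convolution. x: (H, W, C_in), w: (3, 3, C_in, C_out)."""
    H, W, _ = x.shape
    xp = np.pad(x, ((1, 1), (1, 1), (0, 0)))
    out = np.zeros((H, W, w.shape[3]))
    for i in range(3):
        for j in range(3):
            out += xp[i:i + H, j:j + W, :] @ w[i, j]
    return out

def residual_block(x, w1, w2):
    """Return x + F(x), where F is conv -> ReLU -> conv."""
    h = np.maximum(conv3x3_same(x, w1), 0.0)
    return x + conv3x3_same(h, w2)

# Hypothetical sizes: an 8x8 feature map with 4 channels.
rng = np.random.default_rng(0)
x = rng.standard_normal((8, 8, 4))
w1 = rng.standard_normal((3, 3, 4, 4)) * 0.1
w2_zero = np.zeros((3, 3, 4, 4))

# With the second convolution zeroed, the skip connection passes x through unchanged.
y = residual_block(x, w1, w2_zero)
print(np.allclose(y, x))  # True
```

The same skip connection adds an identity term to the block's gradient with respect to its input, which is the enhanced gradient flow mentioned above.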

Training each network took about 2 hours on an NVIDIA GTX 980 GPU using the TensorFlow framework. The results of both networks on the saliency detection task are shown in the figure below:

(a) Original image (b) Saliency map from the first network (c) Binary output of salient regions from the first network (d) Saliency map from the second network (e) Binary output of salient regions from the second network (f) Ground truth.
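The binary outputs in the figure are derived from the gray-scale saliency maps. How the project does this is not stated; one simple possibility, sketched below, is to rescale the map to [0, 1] and apply a fixed threshold. The function name and the threshold of 0.5 are assumptions for illustration only.

```python
import numpy as np

def binarize_saliency(saliency, threshold=0.5):
    """Scale a gray-scale saliency map to [0, 1] and threshold it."""
    s = saliency.astype(np.float64)
    span = s.max() - s.min()
    if span == 0:
        # A constant map carries no saliency information.
        return np.zeros(s.shape, dtype=np.uint8)
    s = (s - s.min()) / span
    return (s >= threshold).astype(np.uint8)

# Toy 3x3 saliency map with a bright region on the right.
saliency = np.array([[0, 40, 200],
                     [10, 255, 220],
                     [0, 30, 180]], dtype=np.uint8)

mask = binarize_saliency(saliency)
print(mask)
# [[0 0 1]
#  [0 1 1]
#  [0 0 1]]
```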

These methods are fast and accurate, making them desirable for real-time implementation. The results of this project were presented at the 6th International Conference on Robotics and Mechatronics (ICRoM 2018).
