DeepAPI - Bringing Deep Learning at the Edge Device With a Use Case in Food Recognition
In this white paper, we present an innovative approach to real-time food product identification, based on AI and deep learning methods that deliver high accuracy without depending on the cloud or on high-end processing systems. Over the past few years, convolutional neural networks (CNNs) have become the dominant technology for real-world visual understanding tasks. Significant research effort has gone into the design of very deep architectures that can construct high-order representations of visual information. The accuracy obtained by deep architectures such as GoogLeNet and the more recent ResNet on image classification and object detection tasks has shown that depth of representation is indeed key to a successful implementation. Until now, the main focus has been on implementations for mainstream PC-like computing systems or cloud-based systems, with the aim of deploying deep learning in diverse areas such as automotive, transportation, IoT and medical applications. Meeting particular performance requirements on embedded platforms, however, is generally difficult and complex. One possible workaround is heterogeneous computing: every computing resource present on the embedded system (CPU, GPU, DSP) takes on part of the load, which increases the overall computational capacity and thus the processing speed. There are cases, however, where a multicore CPU is the only resource available on an embedded system. This raises a reasonable question: is fast inference still possible? One candidate solution is the SqueezeNet 1.1 model, which achieves classification accuracy on ImageNet similar to that of the baseline AlexNet architecture while using 50 times fewer coefficients.
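To give a feel for where such coefficient savings can come from, here is a minimal Python sketch that compares a plain 3x3 convolution with a squeeze-then-expand block of the kind described in the SqueezeNet paper. The block structure is taken from that paper, not from this abstract, and the channel counts (64 in, 16 squeeze, 64 + 64 expand) are illustrative assumptions rather than values stated here.

```python
def conv_params(c_in: int, c_out: int, kernel: int) -> int:
    """Weights plus biases of a standard 2-D convolution layer."""
    return c_in * c_out * kernel * kernel + c_out


def fire_params(c_in: int, squeeze: int, expand1x1: int, expand3x3: int) -> int:
    """Coefficients of a 1x1 squeeze layer followed by parallel 1x1 and 3x3 expand layers."""
    return (
        conv_params(c_in, squeeze, 1)
        + conv_params(squeeze, expand1x1, 1)
        + conv_params(squeeze, expand3x3, 3)
    )


# Illustrative (assumed) channel counts: 64 input channels, 128 output channels.
plain = conv_params(64, 128, 3)      # one plain 3x3 convolution
fire = fire_params(64, 16, 64, 64)   # squeeze to 16, expand to 64 + 64
print(plain, fire)                   # prints "73856 11408"
```

With these assumed sizes the block needs roughly 6.5 times fewer coefficients than the plain convolution. This is only a per-layer illustration; the 50-fold figure quoted for the whole network is relative to AlexNet and is not derived here.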
The combination of small convolutional kernels with an architecture that lets information flow along different paths makes it possible to construct sufficiently high-order image representations that suit a large variety of applications. In the food recognition use case, this CNN architecture has been trained to perform image tagging and can discriminate between the 101 food categories of the Food-101 database, which comprises some 101,000 images. The dataset is then augmented by cropping each image at 5 different frames and vertically mirroring each crop. Training was carried out with the Caffe deep learning framework, and the accuracy achieved, in terms of average recognition rate, is 72% for Rank 1, 85% for Rank 3 and 91% for Rank 5.
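The augmentation and the Rank-k metric can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not the training pipeline used with Caffe: the five frames are assumed to be the four corners plus the centre, mirroring is assumed to mean a left-right flip about the vertical axis, and the helper names (`augment`, `rank_k_accuracy`) are hypothetical.

```python
from typing import List, Sequence

Image = List[List[int]]  # toy image: a list of pixel rows


def crop(img: Image, top: int, left: int, size: int) -> Image:
    """Return the size x size window whose top-left corner is (top, left)."""
    return [row[left:left + size] for row in img[top:top + size]]


def mirror(img: Image) -> Image:
    """Flip about the vertical axis (left-right); assumed meaning of 'mirroring'."""
    return [row[::-1] for row in img]


def augment(img: Image, size: int) -> List[Image]:
    """Five crops (four corners + centre, an assumption) plus a mirrored copy of each."""
    h, w = len(img), len(img[0])
    if size > h or size > w:
        raise ValueError("crop size exceeds image size")
    offsets = [
        (0, 0), (0, w - size), (h - size, 0), (h - size, w - size),
        ((h - size) // 2, (w - size) // 2),
    ]
    crops = [crop(img, t, l, size) for t, l in offsets]
    return crops + [mirror(c) for c in crops]


def rank_k_accuracy(scores: Sequence[Sequence[float]],
                    labels: Sequence[int], k: int) -> float:
    """Fraction of samples whose true label is among the k highest-scoring classes."""
    hits = 0
    for row, label in zip(scores, labels):
        top_k = sorted(range(len(row)), key=lambda c: row[c], reverse=True)[:k]
        hits += label in top_k
    return hits / len(labels)
```

Under these assumptions each image yields 10 training samples, so a set of about 101,000 images would grow to about 1,010,000. Rank 1 counts a prediction as correct only when the top-scoring class matches the label, while Rank 3 and Rank 5 accept the label anywhere among the three or five best-scoring classes, which is why the reported rates rise with the rank.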
--- Date: 28.02.2018 Time: 2:30 PM - 3:00 PM Location: Conference Counter NCC Ost