Jeff Bier, founder of the Embedded Vision Alliance and president of BDTI (Berkeley Design Technology Inc.), delivered the keynote address at the Synopsys ARC Summit Workshop held on March 26 in Santa Clara.
The title of the talk was "How AI, Deep Learning and Machine Perception are Changing Our World."
Jeff started his talk by saying that he would like to talk about a world where every reasonable device can perceive the world as humans do. He made an exception for a hairbrush. Although I concur that a comb may not need to be intelligent, a hairbrush might be used to sense dandruff, or whether the hair is dry or oily; the amount of desired sophistication is still up to the user. Jeff's point is that embedded vision and embedded machine perception are on a development track similar to the one wireless communications followed over the last twenty years.
Jeff stated that things are in fact progressing more rapidly in embedded machine perception than in wireless communication. "My goal today is to convince you that machine perception is rapidly becoming one of the most important technologies of our lifetime. Systems with perception are vastly superior to those without. We finally have algorithms that enable machines to perceive the world as well as humans, and our ability to deploy those algorithms in systems is increasing at a very rapid pace," he said.
Machine perception enables a system to be safer, more autonomous, and easier to use. As an example, he showed an ad for an intelligent automobile that can perform very advanced operations, keeping the occupants, and especially the driver, safer and less tired. It is true that you can buy such a car today; I have one. The features are great, but they still need more development, since some of their results are confusing the first time and annoying from then on. For example, the collision avoidance system in my car does not distinguish a car turning left in the adjacent lane from one slowing in my lane. Thus my car slows down for no reason, creating a problem for the car following mine. Since my car can keep itself in its lane of travel, it should be able to understand what is going on in that lane versus what is happening in the surrounding environment and the turn lane immediately to the side.
Mr. Bier defined machine perception as "the ability of a machine to perceive the world around it in a human-like way". Computer vision is the visual subset of that function, and he claims that the most interesting computer vision is done using artificial intelligence. Machine learning refers to algorithms that learn from data, and thus from experience, in contrast to ones designed by hand for a very specific procedure. Artificial neural networks are a hot topic when discussing machine learning, and deep neural networks are simply networks with a non-trivial number of layers. At present humans cannot make use of much of the data such systems report about their surroundings, but things are changing because sensors are getting much better. Sensors are becoming ubiquitous, and this allows us to capture more ambient data and to use it to make smarter devices and systems.
"Effective machine perception means that we have algorithms that reliably extract meaning from real-world sensory data," maintains Mr. Bier. A problem that requires inductive logic to solve is difficult, and in the visual sensory space the conditions surrounding the object to be detected, such as lighting, composition, and contrast, increase the difficulty. Jeff explained that we do not really understand how our own visual perception works, so we have no framework to guide development; as a result, he noted, solving one specific problem can take hundreds of man-years. "Fortunately, deep neural networks have provided a breakthrough in the accuracy and generality of the algorithms for a wide variety of machine perception cases."
The picture shows the rate of progress in solving object recognition problems based on the PASCAL VOC benchmark. The field was making slow progress in accuracy until about 2012, when convolutional neural networks began to be applied to these problems. More importantly, this changed the slope of the line, so the accuracy of the algorithms is now increasing at a substantially faster pace. We are at the point where, for some cases, the results are as good as what humans perceive. The use of convolutional neural networks to solve computer vision problems is becoming popular, with about 66% of companies in the field using them.
Research in convolutional neural networks is very expensive because researchers use large amounts of computing power. Jeff underlined that researchers have no obvious reason to look for inexpensive implementations rather than more powerful algorithms, but in order to deploy the algorithms, cost becomes a significant factor. One way to reduce cost is to restrict the field of relevance, that is, to limit the number of functions executed by the algorithms. Mr. Bier stated that if the industry can achieve a 10X improvement in each of algorithm efficiency, processor architecture, and software development tools, then we would achieve a 1000X improvement in the efficiency of implementing the algorithms in silicon.
For example, researchers develop algorithms using 32-bit floating-point precision, but the algorithms do not really need such precision in real applications. In a picture de-noising application, Synopsys showed that a 12-bit fixed-point implementation gives very acceptable results, and with an 8-bit implementation the impact is still very minor. By retraining the algorithm to take the loss of precision into account, one can take the implementation down to four-, two-, or even one-bit arithmetic. The result is greater efficiency. Another way to simplify the algorithm is to use fewer neurons, removing those that have little impact on the result; the same effect can be obtained by limiting the number of coefficients used in the algorithm. All of these changes lower power consumption and the amount of memory required, while demanding fewer compute resources.
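The two simplifications described above, reduced-precision arithmetic and pruning of low-impact coefficients, can be illustrated with a short sketch. This is a toy NumPy illustration, not Synopsys's tool flow; the function names and the choice of fractional bits are my own assumptions.

```python
import numpy as np

def quantize_fixed_point(weights, bits, frac_bits):
    """Round weights onto a signed fixed-point grid with `bits` total bits."""
    scale = 2 ** frac_bits
    lo = -(2 ** (bits - 1)) / scale          # most negative representable value
    hi = (2 ** (bits - 1) - 1) / scale       # most positive representable value
    return np.clip(np.round(weights * scale) / scale, lo, hi)

def prune(weights, keep_fraction):
    """Zero out the smallest-magnitude coefficients, keeping `keep_fraction`."""
    threshold = np.quantile(np.abs(weights), 1 - keep_fraction)
    return np.where(np.abs(weights) >= threshold, weights, 0.0)

rng = np.random.default_rng(0)
w = rng.normal(0.0, 0.5, size=1000)          # stand-in for trained float weights

# Error grows as precision drops, which is why retraining is needed at 4 bits
for bits in (12, 8, 4):
    q = quantize_fixed_point(w, bits, frac_bits=bits - 2)
    print(f"{bits}-bit fixed point: mean quantization error "
          f"{np.abs(w - q).mean():.5f}")

sparse = prune(w, keep_fraction=0.5)         # drop half the coefficients
print("nonzero coefficients after pruning:", np.count_nonzero(sparse))
```

The pruned, low-precision weights need less memory and fewer multiply-accumulate operations, which is exactly the efficiency gain the talk describes.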
Of course, even better results can be obtained if algorithms are developed with cost taken into account from the very beginning. When such an approach is taken, results are better than what can be achieved by optimizing algorithms after they are developed.
As far as processor architecture is concerned, the industry already has well-proven techniques. Using parallel co-processors has proven to be an effective way to reduce power consumption and cost. The drawback of a special-purpose co-processor is that its development environment is generally not as robust and comprehensive as that of a general-purpose processor. Good results can be achieved by pairing a general-purpose CPU with a special-purpose co-processor: the great majority of the code runs on the general-purpose CPU, but most of the execution cycles are spent in the co-processor. Synopsys has shown an improvement over GPUs of more than 300X in performance per silicon area, which means a significant improvement in silicon utilization. Now we can have wearable devices like the Microsoft HoloLens, which, according to Jeff, is a good implementation of augmented reality.
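The division of labor described above, most of the code on the CPU but most of the cycles in the co-processor, can be sketched conceptually. In this illustration plain Python plays the role of the general-purpose CPU control code and a NumPy routine stands in for the offloaded vision kernel; the function names and the 3x3 edge kernel are illustrative assumptions, not any vendor's API.

```python
import numpy as np

def preprocess(frame):
    # "Control" code: small, flexible logic that stays on the general-purpose CPU
    return (frame - frame.mean()) / (frame.std() + 1e-8)

def convolve_on_accelerator(frame, kernel):
    # Stand-in for the offloaded hot loop: in a real system this 3x3
    # convolution would run on the vision co-processor, where nearly
    # all the execution cycles are spent.
    h, w = frame.shape
    out = np.zeros((h - 2, w - 2))
    for i in range(3):
        for j in range(3):
            out += kernel[i, j] * frame[i:i + h - 2, j:j + w - 2]
    return out

frame = np.arange(100.0).reshape(10, 10)           # dummy 10x10 "image"
edge_kernel = np.array([[0, -1, 0],
                        [-1, 4, -1],
                        [0, -1, 0]], dtype=float)  # Laplacian edge detector
result = convolve_on_accelerator(preprocess(frame), edge_kernel)
print(result.shape)  # (8, 8)
```

The point of the pattern is that the CPU-side code can remain easy to write and debug, while the few compute-heavy kernels are the only pieces that must be mapped onto the less friendly co-processor environment.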
Development tools for heterogeneous systems remain a challenge. One solution is to create domain-specific tools that support only the compilation of deep convolutional networks; taking this approach significantly simplifies the problem of developing the tool. Synopsys has developed such a tool, used to map convolutional networks onto its vision processor. MathWorks has also developed a specialized compiler to generate deep neural network code for GPUs from MATLAB. These tools show significant improvements, between three and fourteen times, in code efficiency over generic tools.
To look at what is possible within three years, Jeff extrapolated from the efficiencies already achieved. Taking conservative values, the result is 16X times 20X times 3X, or just about 1000X. Considering the mid-range of the same improvements, the result is closer to a 3000X improvement. This is really happening and is going to change our world. Machine perception is becoming available to increasingly wider markets. Mr. Bier chose the Cozmo robot toy from Anki, a San Francisco company, which sells for less than $200, as an example of a consumer product taking advantage of the new capabilities.
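The arithmetic behind the projection is simple multiplicative compounding of the three per-dimension gains quoted above:

```python
# Conservative improvement factors cited in the talk for the next three years
algorithm_gain = 16   # algorithm efficiency
processor_gain = 20   # processor architecture
tools_gain = 3        # software development tools

total = algorithm_gain * processor_gain * tools_gain
print(f"combined improvement: {total}X")  # 960X, i.e. "just about 1000X"
```

Because the gains come from independent layers of the stack, they multiply rather than add, which is why even modest per-layer factors compound into three orders of magnitude.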
The conclusions of the talk are that there is significant advantage in bringing machine perception to products, that algorithms now exist whose capabilities approach and at times exceed human perception, and that both the processor architectures and the tools to implement such systems exist.