Neural networks bring advanced object detection to embedded vision
Computer vision has been around for many years, but until recently it ran on PCs and other platforms with ample processing power, where energy consumption was not a major concern. Now there is growing interest in embedding vision processing into SoCs to perform advanced vision tasks in consumer and mobile applications. This requires dramatic reductions in power consumption and cost, as well as much greater performance, to accommodate emerging algorithms and demands for greater vision accuracy.
There’s a huge opportunity for embedded vision processors that offer low cost and low power consumption yet can handle sophisticated image-processing tasks for high-definition video streams. We see two main reasons for growing interest in embedded vision. First, we interact with the world around us through vision, and having electronic products react to us in the same way will enable much more naturalistic interactions with the digital realm. Second, today’s technology, such as very high-performance embedded processors and advanced manufacturing processes, gives us the horsepower to embed vision capabilities in an SoC at size and power consumption levels that keep costs low and don’t burn up the device.
Embedded vision opportunities
What sorts of applications are emerging? Think of today’s high-end automobiles, which already have four or five cameras to help drivers back up, parallel park or stay in the right lane. As the amount of autonomy we allow our vehicles increases, so will the need for low-cost imaging systems to gather information about their surroundings. These systems need to handle object-detection tasks at high speed on HD video streams, so that the vehicle can respond appropriately. Over the next 5 to 10 years, the level of embedded vision in cars will increase rapidly, to the point where your car may no longer need your input at all.
We’re already getting used to face detection in cameras, for setting critical focus automatically, and even expression recognition, for example smile-detection algorithms that control a camera’s shutter to take a photo at the perfect moment. But how about a mobile phone that can detect when you (and only you) look at it, and wake itself from sleep mode as you go to pick it up? Or a TV that can be controlled by gestures, a little like the screens in Tony Stark’s lair in Iron Man, so you never have to hunt for the remote again?
There are opportunities in retail, too, with digital signage that recognises regular customers and presents them with customised offers as they arrive in a store. More sophisticated systems could even update the signage as favoured customers move between departments – so they see special offers for the shirts that they were just browsing as they move on to other parts of the store.
There’s a myriad of potential applications for embedded vision processors – once their cost and power consumption can be reduced far enough. The challenge is that processing a video frame requires a lot of work to find objects or recognize gestures.
The recognition challenge
There are multiple steps in processing an image, but just the first step (known as the image pyramid) on a VGA image (640×480 pixels) requires more than 15 million operations. Doing this at 30 frames per second requires about 450 million operations per second, while on an HD image it requires more than 2 billion operations per second.
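As a rough illustration of where those figures come from, the back-of-the-envelope estimate below reproduces them in C. The pyramid scale factor and operations-per-pixel constants are assumptions chosen for illustration, not Synopsys’ published numbers, and the HD case assumes a 1920×1080 frame.

```c
#include <stdio.h>

int main(void)
{
    /* Assumptions for illustration only: a fine-grained image pyramid
     * visits roughly 5-6x the pixels of the original frame, and a small
     * smoothing kernel costs on the order of 9 operations per pixel. */
    const double vga_pixels     = 640.0 * 480.0;    /* 307,200 pixels  */
    const double hd_pixels      = 1920.0 * 1080.0;  /* ~2.07M pixels   */
    const double pyramid_factor = 5.5;              /* assumed */
    const double ops_per_pixel  = 9.0;              /* assumed */
    const double fps            = 30.0;

    double vga_ops = vga_pixels * pyramid_factor * ops_per_pixel;
    double hd_ops  = hd_pixels  * pyramid_factor * ops_per_pixel;

    printf("VGA pyramid: %.0fM ops/frame, %.0fM ops/s at 30 fps\n",
           vga_ops / 1e6, vga_ops * fps / 1e6);
    printf("HD  pyramid: %.0fM ops/frame, %.1fG ops/s at 30 fps\n",
           hd_ops / 1e6, hd_ops * fps / 1e9);
    return 0;
}
```

Under these assumed constants the output lands at roughly 15 million operations per VGA frame, about 450 million operations per second at 30 fps, and more than 2 billion operations per second for HD, in line with the figures above.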
Achieving this performance in a high-volume consumer application, or one that runs on batteries, requires a special embedded vision processor such as Synopsys’ DesignWare EV IP cores.
It is possible to do this kind of vision processing on a RISC CPU, but the trade-off for the flexibility of these architectures is that you are unlikely to achieve the performance necessary for broad use in embedded vision systems. Similarly, you could use a GPU, or a set of GPUs, to run the detection tasks, but their power consumption is measured in watts and the implementation costs would limit their application.
Our approach to bringing advanced vision capabilities to the embedded market has been to develop a programmable object-detection engine constructed from specialized vision-processing elements, plus a dynamic streaming interconnect to manage the dataflow through the engine. This engine is combined with up to four 32-bit ARC RISC CPUs, a DMA engine to import and export frame data in the background, and low-latency shared memory to enable efficient passing of information and intermediate results. The resulting DesignWare EV processors offer the flexibility of programmable solutions with the performance and efficiency of dedicated hardware.
How does the object-detection engine manage to execute vision tasks more efficiently than a RISC CPU or GPU? It can do this because it is designed to run a convolutional neural network (CNN) executable very quickly and efficiently.
The CNN is the leading vision algorithm in terms of accuracy and quality of results for object detection. With a CNN, much of the hard work of object detection is done through offline training: a server farm examines a population of, say, 100,000 images of a target object to find their common features, which are then expressed as a CNN executable. This executable is then programmed into the object-detection engine and run to find objects or gestures in images.
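To make the inference side of this concrete, the dominant operation a CNN executable performs is 2-D convolution of an input feature map with trained filter weights. The sketch below is a generic, naive convolution in C, shown only to illustrate the computation; it is not Synopsys’ implementation or the format of the CNN executable.

```c
/* Naive "valid" 2-D convolution for one output feature map of a CNN layer.
 * in:      H x W input feature map
 * weights: K x K filter coefficients learned during offline training
 * out:     (H-K+1) x (W-K+1) output feature map
 * A real engine would add multiple input/output channels, a bias term and
 * a non-linearity, and would run many such filters per layer. */
static void conv2d_valid(const float *in, int H, int W,
                         const float *weights, int K,
                         float *out)
{
    int oh = H - K + 1;
    int ow = W - K + 1;
    for (int y = 0; y < oh; ++y) {
        for (int x = 0; x < ow; ++x) {
            float acc = 0.0f;
            for (int ky = 0; ky < K; ++ky)
                for (int kx = 0; kx < K; ++kx)
                    acc += in[(y + ky) * W + (x + kx)] * weights[ky * K + kx];
            out[y * ow + x] = acc;
        }
    }
}
```

Counting the multiply-accumulates in those inner loops across every filter, layer and frame is what pushes the workload into the billions of operations per second, and is why a dedicated, highly parallel engine pays off over a general-purpose core.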
With Synopsys’ DesignWare EV processors, software programmers don’t have to worry about the details of accessing this detection capability. The object-detection engine is called from the OpenVX runtime operating on the ARC RISC cores when the CNN executable needs to be run to find an object in an image. To the programmer, the engine is just a function call that runs the CNN executable and returns a result.
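For a feel of that programming model, here is a minimal sketch using the standard OpenVX C API. The CNN node itself, hypothetically named vxMyVendorCnnNode below, stands in for whatever vendor kernel exposes the object-detection engine; the exact name is an assumption, since such a node is not part of the base OpenVX standard.

```c
#include <VX/vx.h>

/* Hypothetical vendor kernel standing in for the node that dispatches a
 * pre-trained CNN executable to the object-detection engine. */
extern vx_node vxMyVendorCnnNode(vx_graph graph, vx_image input,
                                 vx_array detections);

int detect_objects(void)
{
    vx_context context = vxCreateContext();
    vx_graph   graph   = vxCreateGraph(context);

    /* Input frame and an array to receive detected-object rectangles. */
    vx_image input      = vxCreateImage(context, 1920, 1080, VX_DF_IMAGE_U8);
    vx_array detections = vxCreateArray(context, VX_TYPE_RECTANGLE, 64);

    /* To the application, the engine is just another node in the graph. */
    vxMyVendorCnnNode(graph, input, detections);

    vx_status status = vxVerifyGraph(graph);
    if (status == VX_SUCCESS)
        status = vxProcessGraph(graph);   /* runs the CNN executable */

    vxReleaseContext(&context);           /* releases graph, image, array */
    return (status == VX_SUCCESS) ? 0 : -1;
}
```

In this model the scheduling of work onto the ARC cores and the object-detection engine is handled by the runtime; the application only builds and processes the graph.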
Synopsys’ DesignWare EV processors are configurable at build time, giving the user the flexibility to implement them as needed for their application. Using CNNs for object detection, the EV processors deliver up to 1,000 GOPS/W (billion operations per second per watt). We believe this is five times better than can be achieved with other vision processors. This configurability and performance efficiency translates into low power consumption and small size, and is combined with an easy programmer’s model, making Synopsys’ DesignWare EV52 and EV54 processors a great solution for emerging consumer and portable vision applications.
To find out more about applying advanced object detection to your embedded vision designs, follow the links below.
Further information
To find out more about Synopsys’ DesignWare EV processor family, click here.
For more information on computer vision and CNN principles, see the following:
- NVIDIA’s article, “Accelerate Machine Learning with the cuDNN Deep Neural Network Library”
- Embedded Vision 2014 Summit keynote, “Convolutional Networks: Unleashing the Potential of Machine Learning for Robust Perception Systems” by Yann LeCun, Director of AI Research at Facebook and Silver Professor of Data Science, Computer Science, Neural Science, and Electrical Engineering at New York University
- Computer vision page on Wikipedia
Author
Michael Thompson is the senior manager of product marketing for ARC processors and embedded vision at Synopsys. Thompson has more than 30 years of experience in the design and support of microprocessors, microcontrollers and IP cores, and in the development of embedded applications and tools. He has worked for Virage Logic, Actel, MIPS, ZiLOG, and Philips/Signetics. He has a BSEE from Northern Illinois University and an MBA from Santa Clara University.
Company info
Synopsys Corporate Headquarters, 690 East Middlefield Road, Mountain View, CA 94043. Tel: (650) 584-5000 / (800) 541-7737. www.synopsys.com
Sign up for more
If this was useful to you, why not make sure you’re getting our regular digests of Tech Design Forum’s technical content? Register and receive our newsletter free.