Cutting through the AI hype with OneSpin’s Raik Brinkmann
Artificial intelligence (AI) and its two main offshoots, machine learning (ML) and deep learning (DL), are hot topics right now. But in actual fact, AI can be traced back to a conference at Dartmouth College in 1956. At the time, researchers considered designing systems that resembled characteristics of the human brain. So how far have we got in applying these technologies, not only to improve the design tools themselves but also to address the new verification challenges presented by AI-based architectures?
Dr Raik Brinkmann, CEO of formal verification specialist OneSpin Solutions, is one of the EDA executives who has been publicly raising the challenges ahead. He shares the excitement about AI but also cuts through the hype to emphasize how much needs to be done.
I recently had the chance to sit down for an in-depth discussion on AI’s implications and its execution in design. This is an edited transcript of that discussion. It offers an important reality check, whether you are a tool user, an AI designer or simply interested in knowing exactly how far we are along the learning curve.
Lauro: Raik, what can you tell us about the status quo in AI with respect to EDA tools specifically?
Raik: You can look at AI from different perspectives. The obvious question is, “Can we use artificial intelligence to make design verification easier?” I think this is the most interesting application, but also the most challenging, and I don’t have a recipe yet for how to do that.
We are not at a point where you can say, “This is how you could make things easier for verification engineers,” whether in the formal, simulation or emulation areas. However, there are plenty of ideas on how to get data from verification projects, on monitoring and tracking verification activities performed by verification engineers over time, and on trying to make predictions on future verification projects. In general, our large customers are better equipped to handle the above because they work on several concurrent projects that cumulatively generate a large amount of data. They may even have tools to keep track of their progress. Verification is truly a big-data challenge.
EDA processes, especially verification, generate huge amounts of data, including the design-specific data, the verification requirements, the universal verification methodology (UVM) test environment, and others. You could conceive of using AI to analyze and correlate that data for a given design type. For example, take communication chips with a certain number of cores and a certain number of features. There, you might estimate how long it would take to verify them and what level of effort would be required.
Significant work has been done already by companies on back-end design and manufacturing processes where you have issues like yield. Data is collected over time and then analyzed for root causes. Problems in floorplanning and yield optimization are good examples. And we already do this in verification with functional coverage, although there is no learning associated with it. It is just data analysis.
The metadata in verification is an interesting target for making life easier for verification engineers by giving them better visibility into a verification project and a more realistic estimate of the effort it requires.
Lauro: Do you see any other area in verification where AI could help?
Raik: So, how can we use artificial intelligence and machine learning to optimize our tools and make better tools?
An obvious approach is to look at different ways of running your tools and try to optimize the settings. This involves running the tool on a benchmark set multiple times, automatically changing the settings or applying different conditions across the verification environment, and then collecting the data and analyzing it. It would be interesting to connect the benchmark data with the user data, but that will require overcoming the users’ reluctance to release proprietary information. Still, it will be a great help if we can tap into the EDA user’s data for machine learning without actually transferring the design.
There is work in progress on this application in the formal space. A few companies have already published papers on the topic with promising results.
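The tuning loop described above (run the tool on a benchmark set, vary the settings automatically, compare the results) might be sketched as follows. This is a minimal illustration, not how any particular EDA tool is tuned: the function `tune_settings`, the settings names `engine` and `threads`, and the stand-in cost model `fake_run_tool` are all hypothetical, and the numbers are made up rather than measured.

```python
from itertools import product


def tune_settings(run_tool, benchmarks, search_space):
    """Exhaustively try every settings combination on the benchmark set.

    run_tool(benchmark, settings) must return a cost such as runtime in seconds.
    Returns (best_settings, best_total_cost).
    """
    names = list(search_space)
    best_settings, best_cost = None, float("inf")
    for values in product(*(search_space[name] for name in names)):
        settings = dict(zip(names, values))
        # Total cost of this configuration over the whole benchmark set.
        cost = sum(run_tool(bench, settings) for bench in benchmarks)
        if cost < best_cost:
            best_settings, best_cost = settings, cost
    return best_settings, best_cost


# Hypothetical stand-in for a real tool run: an invented cost model.
def fake_run_tool(bench, settings):
    penalty = 0 if settings["engine"] == "bmc" else 5
    return bench["size"] / settings["threads"] + penalty


benchmarks = [{"size": 100}, {"size": 40}]
search_space = {"engine": ["bmc", "pdr"], "threads": [1, 2, 4]}

best, cost = tune_settings(fake_run_tool, benchmarks, search_space)
print(best, cost)
```

An exhaustive grid only scales to a handful of settings. With a larger search space, the same loop would more plausibly be driven by random sampling or a learned model of cost versus settings, which is where machine learning would come in.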
Lauro: Is there any other area where machine learning can bring value?
Raik: Another idea is, “Can you make an EDA company more effective by using machine learning?” At OneSpin, we follow an agile tool development paradigm, relying on a large amount of automation for building and testing. We also track requirements regarding features, source code and testing, following good engineering practices, as well as requirements to fulfill the demands of safety standards. This process generates a lot of data that can potentially be leveraged using machine learning. For example, it can be used to optimize test sequencing or test runtime, or to perform impact analysis for change requests. This is similar to what our customers will do when applying machine learning to SoC design projects, as mentioned earlier. We apply the same principles to EDA tool development.
Any guidelines derived this way would affect only the tool development process and not the tool itself. They would help the tool supplier become more efficient because tool development would be done in a smarter way.
Lauro: How about verifying designs that implement AI?
Raik: From a business point of view, this is probably the most interesting topic. What are the specifics of AI chips, what are the next-generation AI chips going to look like, and what are the implications for the register transfer level (RTL) design and verification process?
A requirement common to almost all AI chips is the use of floating-point units. There are different types of floating-point units: multiply-accumulate or fused multiply-accumulate, which is the main workhorse for machine learning. Many DSP IP blocks perform floating-point calculations. Some perform only fixed-point math, but many use floating-point math: half-precision, single-precision and variable-precision floating point. You want to make sure that these DSP IP blocks are actually working according to the IEEE standard.
You may think that multiplication is easy. In an ideal world, that’s true. But in the real world, it is not that easy because the multiplier has rounding modes, saturation, operation triggering, latency, and other things. Handling all these issues properly requires significant effort in writing a comprehensive testbench that assures compliance with the standard. Formal analysis is ideal to help in this regard.
In general, AI chips need much higher performance than non-AI chip designs, which may have significant implications for existing chip architectures.
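To see why a multiply-accumulate unit is harder to verify than it looks, consider a small reference model of the two variants mentioned above. This is only a sketch under stated assumptions: it models IEEE 754 half precision with round-to-nearest-even (via Python's `struct` format `e`) and ignores other rounding modes, saturation and latency. The helper names `round_half`, `mac_unfused` and `mac_fused` are hypothetical, and the operands are chosen so that every intermediate value is exactly representable as a Python float.

```python
import struct
from fractions import Fraction


def round_half(x):
    """Round x to the nearest IEEE 754 half-precision value (ties to even)."""
    return struct.unpack("<e", struct.pack("<e", float(x)))[0]


def mac_unfused(a, b, c):
    # Two rounding steps: round the product, then round the sum.
    product = round_half(Fraction(a) * Fraction(b))
    return round_half(Fraction(product) + Fraction(c))


def mac_fused(a, b, c):
    # One rounding step: a*b + c is computed exactly, then rounded once.
    return round_half(Fraction(a) * Fraction(b) + Fraction(c))


a = 1 + 2**-10      # 1 plus one half-precision ulp
b = a
c = -(1 + 2**-9)    # cancels the rounded product exactly

print("unfused:", mac_unfused(a, b, c))
print("fused:  ", mac_fused(a, b, c))
```

The exact product is 1 + 2^-9 + 2^-20. The unfused path rounds away the 2^-20 term before the addition and returns zero, while the fused path keeps it and returns 2^-20. Both results are legitimate for their respective operations, so a testbench or formal property has to know precisely which behavior the unit under test is supposed to implement.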
I attended the Future Chips Conference in Beijing in December 2017. It was enlightening to see how semiconductor companies make the case for their new AI chip architectures. As an example, there are one million cameras in Beijing monitoring people on the streets, all of them producing data at a very high data rate. If you want to transmit all that data to a computer center, it wouldn’t work. The data stream would be too much. Not only that, if you want to process that huge volume of data in a big data center, it would consume a lot of power. To correct the problem, more intelligence, more AI, has to go into the cameras themselves. You would basically upload the picture of the person into the camera and once the camera identifies that person, it would flag it.
The moral of the story is, AI needs to migrate to the edge to avoid a lot of power and throughput requirements that cannot be met with current technologies. To address these limitations, people are designing new architectures.
In general, the main problem for a chip designer is that power consumption is dominated by memory transfers. If you look at the structure of a neural network or other machine learning systems, it consists of a big tensor or matrix multiplication, namely, a huge network of arithmetic functions involving large numbers of coefficients and arguments stored in memory. The execution of these functions involves loading the coefficients and passing the arguments for each individual node through the network, which requires many cycles. This process consumes lots of power.
To cope with this, designers move the memory next to the compute or the compute into the memory, for instance, by installing local memory on the chip. But the current process technology is not good for that since you cannot seamlessly integrate logic with memory. Process-wise, it’s a mismatch.
The trend today is monolithic 3D flows. This means that you have tiny wires through the different layers in 3D. In 3D, you can put the memory on top of the compute.
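The memory-transfer argument above can be made concrete with a back-of-the-envelope count for a single fully connected layer. The sketch below is an assumption-laden estimate, not a power model: it supposes that each weight is fetched once per batch, that inputs are read once and outputs written once, and that values are two bytes wide. The function name `layer_traffic` and the layer sizes are hypothetical.

```python
def layer_traffic(rows, cols, batch=1, bytes_per_value=2):
    """Count multiply-accumulates and bytes moved for y = W @ x.

    W has shape (rows, cols); x holds `batch` input vectors of length cols.
    Assumes weights are fetched once per batch and kept on chip meanwhile.
    """
    macs = rows * cols * batch
    weight_bytes = rows * cols * bytes_per_value
    activation_bytes = (cols + rows) * batch * bytes_per_value
    total_bytes = weight_bytes + activation_bytes
    return {
        "macs": macs,
        "bytes": total_bytes,
        "macs_per_byte": macs / total_bytes,
    }


single = layer_traffic(1024, 1024, batch=1)
batched = layer_traffic(1024, 1024, batch=64)
print(single)
print(batched)
```

For a single input vector, the layer moves about two bytes for every multiply-accumulate, so the arithmetic units spend most of their time waiting for coefficients. Reusing each fetched weight across a batch raises the work done per byte, and placing memory next to the compute reduces the cost of each byte moved; both attack the same bottleneck.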
Lauro: You alluded earlier to verification of AI designs. What can you say about this?
Raik: Yes, verification of an AI system. This is a whole new, potentially big area where we are still learning how to do it. It is completely unclear to me how people want to make sure these AI designs are safe and sound.
Fundamentally, AI designs change the rules of the game. You move from an engineering discipline well thought through, starting with model creation, followed by implementation, and then verification where you can explain why it works, to a completely different paradigm that is data driven. In an AI design, you can observe every single multiplication and addition in the neural network, know any of its million parameters, but you still have no clue what it does and why it is exhibiting a particular behavior on a grand scale. You have complete transparency of the system, but you have no understanding why it works.
You may be able to verify each individual piece like the multiplications above, but what are the properties you want to verify for the whole system, and how do you approach that? There is no strict mathematical description or specification of the system. You may only have statistical expectations from a statistical system.
There is a lot of work that shows you can easily mislead an AI network into believing something that is actually not the case. For example, if there are a few black bars on a road sign, an AI system may tell you it is a 40-miles-per-hour street sign instead of a stop sign, which is totally crazy.
Here is where emulation can help because you may want to study the effect of certain changes to the data or to the network or to the coefficients in a deeper and subtler way when you are training the networks. But once you get silicon, you have even more effects, like physical effects from a functional safety standpoint and their impact on the system. As an example, if you perform safety analysis for an automotive chip with regard to physical errors, like fault propagation analysis, you work with a statistical model on the likelihood of faults to estimate if the chip gets you 99.9% reliability. A similar approach might work for data-driven systems where your data and maybe even the processing are not 100% accurate by construction.
The bottom line is, we need some sort of statistical verification, because essentially machine learning as a whole is a statistical method. It is no longer a deterministic process of formulating something that you want to achieve and then implementing it. Rather, machine learning is a statistical process, which leads to a statistical verification process.
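One simple form such a statistical argument could take is a sample-size calculation. If the true failure rate of a system is p and trials are independent, the probability of seeing n failure-free trials in a row is (1 - p)^n. Requiring that probability to be at most 1 - c gives the number of clean trials needed to claim, with confidence c, that the failure rate is no worse than p. The sketch below assumes independent, identically distributed trials, which real test data may not satisfy; the function name `trials_needed` is hypothetical.

```python
import math


def trials_needed(max_failure_rate, confidence):
    """Failure-free independent trials needed to support the claim
    'failure rate <= max_failure_rate' at the given confidence level."""
    if not 0.0 < max_failure_rate < 1.0 or not 0.0 < confidence < 1.0:
        raise ValueError("both arguments must lie strictly between 0 and 1")
    # Smallest n with (1 - max_failure_rate) ** n <= 1 - confidence.
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - max_failure_rate))


# 99.9% reliability corresponds to a failure rate of at most 0.001.
n = trials_needed(max_failure_rate=0.001, confidence=0.95)
print(n)
```

For the 99.9% reliability figure mentioned earlier and 95% confidence, this comes to roughly 3,000 failure-free trials. A single observed failure invalidates the calculation, at which point an interval estimate of the failure rate would be the more appropriate tool.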
Lauro: Raik, thank you for taking the time to talk to us. Your analysis of the potential applications for AI in the field of EDA and design verification is impressive and exhaustive.