What can AI do for verification? Speakers from two major EDA companies described some of the possibilities including those of foundation models, as well as techniques already in use, at the recent Verification Futures Europe conference.
Matt Graham, project engineering group director in the system-verification group at Cadence Design Systems, said at the conference, organized by Tessolve, that a major reason for bringing AI into verification lies in the expected increase in design activity.
Quoting estimates by IBS, Graham noted, “We’re looking at something like 4x the number of annual design starts over the next decade…When I visit customers they say ‘three years ago, we did one chip a year, now we’re doing four chips or five or six’.”
But these organizations are unable to budget for commensurate increases in staffing. “And the engineering talent may or may not exist,” he added. AI may provide the missing link. “I don’t think it’s going to replace all the engineers. Not in my lifetime anyway. I think what we can do is make the engineers more efficient.”
Primarily, Graham argued, AI is best deployed to filter the masses of data that EDA flows produce. “Think about how many log files you have, how many waveforms, how many pieces of RTL, and the output of any EDA tool that you may just ignore. It never gets used. If you think about failure analysis, how often do you look at the log files of tests that pass? It’s not because we don’t think there’s good information. It’s just that we can’t [as humans] consume that much data. AI can potentially help us leverage that.”
Ramesh Narayanaswamy, principal engineer at Synopsys, agreed: “You could be doing a lot of noise reduction. Nobody reads a log file but there’s probably one gem in there you would like to know about. The tools can help you.”
The metadata around log entries may prove as helpful as the errors or warnings themselves. AI could spot patterns in the commits most likely to lead to problems, so that those commits are tested earlier in a regression suite, reducing the compute needed to detect failures. Graham pointed to work in Cadence’s Xcelium ML verification-management tool to improve coverage with less runtime.
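The commit-prioritization idea can be made concrete with a minimal sketch. The field names, weights, and scoring heuristic below are illustrative assumptions, not taken from any EDA tool; a production system would learn such weights from regression history rather than hard-code them.

```python
# Hypothetical sketch: rank regression runs by the risk of the commit that
# triggered them, using simple commit metadata as features.
# All field names and weights here are illustrative assumptions.

def risk_score(commit):
    """Score how likely a commit is to introduce a failure (capped at 1.0)."""
    score = 0.0
    score += 0.1 * len(commit["files_changed"])     # wide changes are riskier
    score += 0.3 if commit["touches_rtl"] else 0.0  # RTL edits over testbench tweaks
    score += 0.2 * commit["recent_failures"]        # areas with a history of failures
    return min(score, 1.0)

def prioritize(commits):
    """Test the riskiest commits first so failures surface sooner."""
    return sorted(commits, key=risk_score, reverse=True)

commits = [
    {"id": "a1", "files_changed": ["alu.sv"], "touches_rtl": True, "recent_failures": 0},
    {"id": "b2", "files_changed": ["tb.sv"], "touches_rtl": False, "recent_failures": 0},
    {"id": "c3", "files_changed": ["alu.sv", "ctrl.sv"], "touches_rtl": True, "recent_failures": 2},
]
order = [c["id"] for c in prioritize(commits)]
# "c3" ranks first: multiple RTL files changed plus past failures
```

In practice the heuristic would be replaced by a trained model such as the decision trees Narayanaswamy mentions, but the scheduling logic stays the same: score, sort, run.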
Narayanaswamy explored some of the possibilities presented by large-language or foundation models such as OpenAI’s GPT. However, he cautioned that there is a lot of work to be done before they are ready, whereas more traditional ML methods such as decision trees offer better explainability and applicability to EDA problems.
He used the example of StarCoder, a code-completion large-language model. “StarCoder is interesting because it’s trained on not just the English language but on a huge collection of curated code.” Presented with the phrase “def matmul”, it will use the learned coding patterns to produce a matrix-multiplication function that is more or less ready to run.
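The kind of completion such a model might produce for the “def matmul” prompt looks something like the sketch below; actual model output varies from run to run, so this is representative rather than a captured StarCoder response.

```python
# Representative completion for the prompt "def matmul": a plain-Python
# matrix multiply over lists of lists (illustrative, not actual model output).
def matmul(a, b):
    """Multiply matrix a (m x n) by matrix b (n x p)."""
    inner, cols = len(b), len(b[0])
    assert len(a[0]) == inner, "inner dimensions must match"
    return [[sum(a[i][k] * b[k][j] for k in range(inner)) for j in range(cols)]
            for i in range(len(a))]

matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]])  # [[19, 22], [43, 50]]
```

This is exactly Narayanaswamy’s point: for well-represented patterns like matrix multiplication, the output is more or less ready to run.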
However, there are caveats. The results produced by large-language models often contain errors and “hallucinations”: connections between learned data that are not real. The models have little ability to plan and, importantly for RTL, lack the capacity that other methods can bring to bear.
The number of tokens the open-source models can handle is on the order of a few thousand. “Maybe there are some with 8K tokens. If you try to ask a question about a Verilog module and say summarize it, having 8K tokens available isn’t much. You may have to chop it up in an intelligent way to handle the problem. You can’t throw a million lines of code at it and hope it will do something. It won’t. You need to do some clever stuff around it,” Narayanaswamy argued.
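One way to “chop it up in an intelligent way” is to split at natural structural boundaries rather than at arbitrary offsets. The sketch below splits Verilog source at module boundaries so each chunk fits a context budget; the regex and the character-count budget are simplifying assumptions (a real tool would count tokens, not characters, and use a proper parser).

```python
# Sketch: split Verilog source at module boundaries so each chunk fits a
# model's context window. Regex and budget are illustrative assumptions.
import re

def split_modules(source, max_chars=8000):
    """Yield one chunk per module, slicing any module that is still too big."""
    for mod in re.findall(r"module\b.*?endmodule", source, re.DOTALL):
        if len(mod) <= max_chars:
            yield mod
        else:  # oversized module: fall back to fixed-size slices
            for i in range(0, len(mod), max_chars):
                yield mod[i:i + max_chars]

src = "module a; endmodule\nmodule b; endmodule"
chunks = list(split_modules(src))
# two chunks, one per module
```

Each chunk can then be summarized independently and the summaries combined, which is the sort of “clever stuff around it” the quote alludes to.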
The problem of introducing errors may prove to be a stumbling block, particularly for less experienced engineers. Though some research on productivity with language models cited by both Graham and Narayanaswamy has indicated bigger improvements for less experienced staff, the opposite may prove true in hardware and software design.
“If you’re a new developer and the tool gives you a slightly buggy UVM testbench, you need to be a UVM expert to know there’s a bug in there and be able to clean it up. A less-experienced verification engineer might accept it and check in some garbage,” he noted.
However, the errors might be acceptable if the overall advice is sound. Work on text-to-text transformations by Google looked at how this can improve program performance: changing the structure to take advantage of more efficient routines, such as list comprehensions in Python versus a loop structure.
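The loop-to-comprehension rewrite mentioned above is easy to show side by side. This is a generic illustration of the transformation, not code from the Google work:

```python
# The structural rewrite a model might suggest: an explicit loop replaced by
# a list comprehension, which typically runs faster in CPython because it
# avoids the repeated method lookup and call overhead of list.append.
def squares_loop(xs):
    out = []
    for x in xs:
        out.append(x * x)
    return out

def squares_comprehension(xs):
    return [x * x for x in xs]

squares_loop([1, 2, 3]) == squares_comprehension([1, 2, 3])  # True
```

Even if a suggested comprehension arrived with a small bug, the structural idea (use a comprehension here) is the valuable part, which is the point Narayanaswamy makes next.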
“If you have a human in the loop and, let’s say it gave you some slightly buggy list comprehension, you could have fixed it up and at least got the idea that the list comprehension was going to be faster. That’s probably the way to think about it: It’s an assistant for somebody with context,” Narayanaswamy noted.
In addition to Cadence and Synopsys, Siemens has been analysing the potential for AI in verification, summarising the team’s thoughts so far in a white paper published a couple of months ago.
“One thing we’re finding out about AI and EDA is that it’s not about finding a better algorithm using ChatGPT. It’s really about being data scientists, and looking at all the data we have, organising and conditioning that data and then applying some amount of AI on top of that data to do something interesting,” said Graham.
Following the European event, for the first time, Tessolve is running a US version of the Verification Futures conference in Austin, Texas on September 14th, 2023.