Cadence uses machine learning to trim constrained-random runtimes
Cadence Design Systems has developed a neural-network-based stimulus optimizer intended to cut the runtime of verification regressions that rely on constrained-random simulation.
The machine-learning algorithm in Xcelium ML steers the simulator’s randomization kernel away from regions that, based on prior runs, do not appear to improve coverage. Paul Cunningham, corporate vice president and general manager of the system verification group at Cadence, said early-access customers running distributed regressions have seen on average a five-fold reduction in compute cycles. Most use the savings to extend the range of tests they apply during regressions, as well as to run quick sanity checks on new code check-ins that are intended to pick up major problems caused by an alteration. “We’ve not seen anything much less than a 3x saving,” Cunningham claimed. “We are still in the early days for this technology so we don’t have a ton of data. There are some cases where we got a 10-20x saving.
“It’s one of the most obvious things to apply machine learning to,” Cunningham added. “The concept is very intuitive but it’s surprisingly difficult to make it work. We’ve been working on it for two years and made several attempts.”
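The article does not describe the internals of Xcelium ML, so the following is only a toy illustration of the general idea of biasing randomization by past coverage results. It swaps the neural network for a plain frequency table, and every name in it (region_weights, pick_region, the "region" labels, the floor parameter) is hypothetical rather than part of any Cadence API.

```python
import random
from collections import defaultdict


def region_weights(history, floor=0.05):
    """Turn prior-run results into sampling weights per stimulus region.

    history: iterable of (region, new_bins) pairs, where new_bins is the
    number of previously unhit coverage bins that a run in that region hit.
    floor: minimum weight (an assumption of this sketch), so that no region
    is starved completely and can become relevant again later.
    """
    totals = defaultdict(int)
    counts = defaultdict(int)
    for region, new_bins in history:
        totals[region] += new_bins
        counts[region] += 1
    if not counts:
        return {}
    means = {r: totals[r] / counts[r] for r in counts}
    peak = max(means.values()) or 1.0  # avoid dividing by zero
    return {r: max(m / peak, floor) for r, m in means.items()}


def pick_region(weights, rng):
    """Draw one region, favoring those that improved coverage before."""
    regions = sorted(weights)
    return rng.choices(regions, weights=[weights[r] for r in regions], k=1)[0]


# Hypothetical history: region "b" never added coverage in prior runs.
history = [("a", 10), ("a", 6), ("b", 0), ("b", 0), ("c", 4)]
weights = region_weights(history)
chosen = pick_region(weights, random.Random(0))
```

In this sketch region "b" keeps a small non-zero weight instead of being excluded outright, which is one simple way to leave room for values that later code changes make relevant again.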
Train early, train often
A system that learns to drop stimuli covering one part of the randomizable space because they have not revealed any problems carries a risk: later code changes can easily make those values relevant again. Cunningham said the experience of early-access customers such as Kioxia is that retraining the model every week provides a good tradeoff between simulation efficiency and safety. “Retraining needs to be an ongoing thing.”
The model training uses log files dumped to disk by each of the simulation worker nodes and can be done by a separate bank of machines – either on-premise or in the cloud – so that it does not affect the runtime of the simulation farm itself. “The actual runtime needed to dump the data is an incremental overhead,” Cunningham said. “And the training method is very parallelizable.”
Customers can tune how aggressively stimuli are dropped. Very aggressive settings have become popular for the short sanity checks used at check-in points, so that they run quickly to detect obvious issues. For the longer, overnight regressions, customers can opt for safer settings to ensure they are not accidentally letting through bugs that would be caught if pruning were not being used.
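How such a tuning knob might behave can be sketched in a few lines. This is a hedged illustration only: the article does not say how Xcelium ML exposes or applies the setting, and the prune function, the aggressiveness parameter and the per-stimulus scores below are all assumptions made for the example.

```python
def prune(candidates, scores, aggressiveness):
    """Drop the lowest-scoring fraction of candidate stimuli.

    candidates: list of stimulus identifiers (hypothetical).
    scores: dict mapping each candidate to a predicted coverage benefit.
    aggressiveness: fraction of candidates to drop, in [0.0, 1.0).
    At least one candidate is always kept when any are supplied.
    """
    if not 0.0 <= aggressiveness < 1.0:
        raise ValueError("aggressiveness must be in [0.0, 1.0)")
    ranked = sorted(candidates, key=lambda c: scores[c], reverse=True)
    dropped = int(len(ranked) * aggressiveness)
    keep = max(1, len(ranked) - dropped)
    return ranked[:keep]


# Ten hypothetical stimuli with made-up predicted scores 0.0 to 0.9.
candidates = [f"s{i}" for i in range(10)]
scores = {f"s{i}": i / 10 for i in range(10)}

sanity_run = prune(candidates, scores, aggressiveness=0.8)  # short check-in test
overnight_run = prune(candidates, scores, aggressiveness=0.2)  # safer setting
```

With these made-up numbers the aggressive setting keeps two of the ten stimuli and the safer one keeps eight, mirroring the split between quick check-in tests and overnight regressions.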