Who would have thought the New Year would start on such a rocky note for all things digital? I am talking, of course, about Spectre and Meltdown.
I admit I didn’t pick up the signs when trouble started brewing last June. Having done some research, the problems appear to have been kept deliberately under wraps, so that hackers would not immediately start looking for exploits.
The current furore has been caused by these two spectacular vulnerabilities that affect almost all processors. They are not just keeping Intel, ARM, AMD, Apple, Microsoft, Google, Amazon and Facebook awake at night. Just about anyone who uses a PC desktop, laptop, tablet, mobile phones or any other device (or cloud service) based on mainstream CPUs has reason to be concerned. I’m tossing and turning a bit myself.
That has made me think about how we got into this mess from the perspectives of both verification and validation. I find it hard to believe that processor vendors are running short of smart engineers and are under-resourced, or that they didn’t try hard enough to prevent this problem arising.
Spectre and Meltdown leave devices vulnerable to side channel attacks. These have been around for a while. They can be launched against CPUs predominantly due to the well-known speculative out-of-order execution optimization and branch target prediction features built into virtually all modern-day processors. So, I doubt that we are looking at a simple failure of intelligence or a mass oversight. Assuming that is the case, how did these vulnerabilities elude the industry for so long?
My theory comes down to a general bias in how we look at the design of electronic systems: security simply is not sufficiently seen as a hardware issue.
Security - the bigger picture
Cybersecurity is still mostly seen as problem that revolves around application software and operating systems. Hence, all the investment—products, security companies, and research—in those spaces. After all, we all know how to install and update anti-virus and anti-malware programs.
This is wrong. Very wrong, as has become painfully obvious. Suddenly, even my mother-in-law is asking, “What’s this problem about passwords leaking? I read this in the Times of India.”
Security is – and always has been – a system issue. It needs to be designed in and architected up-front. We cannot continue to use legacy hardware optimizations from the 1990s and remain wilfully ignorant of the fact that they can be compromised or assume vulnerabilities will not arise. Speculative execution and other fancy optimizations are needed for performance, and have been around for a long time; virtually all CPUs are designed that way. I’ll repeat: Side channel attacks are not new. And the knowledge that there are hackers out there is not new either.
The mindset for verification and validation needs to change. In 1999, Ken McMillan published a now classic paper on formally verifying Tomasulo’s out-of-order algorithm. Back then, security was not such a big deal. Now, the ubiquity of computers and connected devices has made it the biggest of big deals.
Verification alone is not enough. Thorough validation is also needed. Conventional techniques based on dynamic simulation cannot scale, and mindless emulation will not either. We need to deploy an array of more advanced verification techniques, such as formal, to help with verification and validation.
Let me illustrate my recommendation for a greater use of formal verification with a story drawn from my own experiences.
The spectre of nasty surprises
Not long ago, I worked in support of a young engineer verifying a design that, while far less complex than the processors just exposed, shared many of their core characteristics. We stumbled upon something interesting.
The design had a customisable CPU that was meant to support any firmware/software interface and lots of other interesting interfaces. Its designers were confident that they had a decent architecture in place and had tested the microarchitecture implementing it with basic C simulations. We picked this project because there were limited resources, and I was always looking for new ways to apply formal.
One of the first things I check in a verification plan is whether it captures the scope of the runtime features of the design. As this design could be programmed through a software interface, and had regions of memory that had privileged, as well as unprivileged access, it was clear that we needed to check that no unprivileged access to the privileged region was possible.
Although the architects were aware of this concern and had taken precautions when designing the instruction set architecture and the microarchitecture, there were plenty of instances where our formal testbench found such vulnerabilities.
In some cases, they were oversights on the part of the designer or were design bugs where the implementation did not conform to the known expectation (which, in some cases, was fully specified, but in others, had been specified only partially). However, the most important aspect of our work was that we exposed some combinations of instruction execution that were illegal from the viewpoint of security. They could have provided illegal access to regions of memory, or, in the presence of an attack, created a livelock in the design.
These validation issues got the architects and designers hooked on formal. Not only did we end up carrying out a standard functional verification of the design, but we also specifically targeted a range of security issues that we suspected could be triggered through software and firmware. We provided proofs in a formal tool that they were there, but that if the design was protected through exception recovery hardware, we could protect the processor against explicit breaches, and the best part was that we could prove this exhaustively in the formal tool
Asking the right questions
Having the ability to run a formal tool and model checks that reflect the actual suspected attacks was the most illuminating part of the work. We did not have to load the firmware or software to carry out the analysis. The formal tools generated the scenarios for us. Our job was to make sure we asked the right questions in the form of formal properties. As my former doctorate supervisor, Tom Melham, would say, “It’s all in the target goal that you’re trying to prove. If you phrase the goal well, most of the work is done.”
My conclusion then and now is that we often fail to ask the right question at the right time.
So how can this story help us now, and provide that needed teachable moment. Well, let’s look at the Spectre-and-Meltdown issue in another way.
All too often, I have heard engineers declare, “We don’t need to worry about verification. Our design is an iteration of something we have used before, so it is silicon-proven.”
Well, as Spectre and Meltdown show, ‘silicon-proven’ is not enough. What was proven in a design targeting end-use cases a decade ago can, over time, become vulnerable. Our increasingly networked world is evolving rapidly; so are the strategies used by hackers.
First then, we need to acknowledge that security is as much a hardware issue as anything else. It is one of a number of elements to consider within the system. Having done that, we then need to look for the best tools available to address the challenge.
Here, modern-day formal verification technology can be a huge help. All you need is a methodology to apply it and the expertise to ensure that you ask the right questions to reach your goal. I strongly encourage all of you to consider actively deploying formal for security in your design verification. As an industry, we can—we must—do better.