The verification of designs that transport data serially is a challenge for both simulation and formal techniques. The complexity of such designs stems from the serialized nature of the packet flow, where the state of each packet depends on the history of all the previous packets in flight. Such designs are ubiquitous: packet-based serialized data flows appear in networking routers, bus protocols, bus bridges, load-store units in CPUs, packing and unpacking designs, and SoC peripherals such as I2C, I2S, USB, UART, and Ethernet.
These designs are inherently complex, with multiple mutually interacting state machines. At a high level they are about data flow, but the control mesh that routes the data correctly can be very difficult to verify completely with directed testing or constrained random simulation. Throw into this mix the possibility of multiple clock domains (one for the input, another for the output) and obtaining a near-exhaustive result becomes nearly impossible. Nevertheless, while the formal verification of such designs is a challenging task, it can be solved through a combination of methodology and technology.
When methodology meets technology
I have used many formal tools over the past 20 years and seen the underlying technology evolve. Existing tools have improved and new players have come into the field.
Axiomise recently became a Synopsys partner, and I was keen to get my hands on VC Formal, one of its more recent tools. At its heart, VC Formal is a model checker (a.k.a. property checker) with a suite of applications (apps) stacked on top. I wanted to focus on property checking and to see how the tool coped with the design example discussed in the introduction. I was impressed by the overall quality of VC Formal and its feature-rich offering for both novice and advanced users. Not only was the tool able to find the bugs I had introduced into my design, but it was also able to converge on proofs of hard sequential properties, helped by the Axiomise methodology on scalable formal proofs. I like the fact that the VC Formal orchestration layer enables a wide variety of proof engines to co-operate to speed up their work once sufficient proof engineering support is provided by the user.
Coping with formal complexity
Abstraction is one of the main recipes to control the complexity of formal verification so that we not only find deep bugs in such designs but also obtain exhaustive proofs.
I have been pursuing research and application of abstraction for scalable formal verification for 17 years, starting in the early days with my doctoral thesis on problem-reduction techniques for scalable model checking.
Let’s look at some of the published literature on this topic. Ed Clarke’s paper is a great read and addresses abstraction techniques nicely. I have also found that, especially on serial designs, a combination of data and temporal abstraction techniques works very well in practice, and this approach dates back to the foundational work done by Tom Melham. Though a lot of early work done by Melham was in the context of theorem proving and higher order logic, the application of these concepts is highly relevant to model checking as well – and this is what has inspired me to build my own abstraction-based solutions.
Smart tracker abstraction
To verify any data transport design, the testbench must observe all the necessary data flows. In simulation this is typically done by tracking the concrete traces of data words comprising 0s and 1s. All data is tracked at every clock cycle in a scoreboard, and at the output each data word is compared against the expected value sampled at the input side.
The key idea in smart tracker abstraction is that we do not explicitly track all the data; we track only one data value. However, that data word is non-deterministically chosen by the formal tool at runtime. The formal tool will instantiate all possible data values to this non-deterministic variable; the user does not have to do anything. This contrasts with simulation where the stimulus is explicitly applied and manipulated.
In our discussion, we will call the non-deterministic variable a ‘watched value’. We use a counter to track the watched value. On every input handshake that accepts a new word into the DUT, this counter increments; it stops incrementing once the watched data has appeared on the input port. On every read, when data is read out, the counter decrements. When the counter reaches one, we expect to see the watched value on the output. If any other data appears at the output (due, say, to a design bug such as reordering, data loss, or duplication), we will detect the bug because the output data will not match the watched data.
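For a plain FIFO, the scheme above can be sketched in a few lines of SystemVerilog. This is a minimal illustration under assumed names and widths (wd, hsk_i, hsk_o, data_i, data_o, DATA_WIDTH, CNT_WIDTH), not the production code:

```systemverilog
// Minimal smart tracker sketch for a simple FIFO (illustrative only).
logic [DATA_WIDTH-1:0] wd;   // free variable: the tool explores all values
asm_wd_stable: assume property (@(posedge clk) ##1 $stable(wd));

logic seen_wd;                   // the watched value has entered the DUT
logic [CNT_WIDTH-1:0] counter;   // words in flight up to and including wd

wire incr = hsk_i && !seen_wd;          // count writes until wd is written
wire decr = hsk_o && (counter != '0);   // each read retires one word

always_ff @(posedge clk or negedge resetn) begin
  if (!resetn) begin
    seen_wd <= 1'b0;
    counter <= '0;
  end else begin
    counter <= counter + incr - decr;
    if (hsk_i && !seen_wd && (data_i == wd))
      seen_wd <= 1'b1;
  end
end

// When counter == 1 and wd is in flight, the next word out must be wd.
ast_watched_out: assert property (@(posedge clk) disable iff (!resetn)
  (hsk_o && seen_wd && (counter == 1)) |-> (data_o == wd));
```

Because wd is a rigid free variable, a single proof of ast_watched_out covers every possible data value; any loss, reordering, or duplication manifests as a mismatch at the output.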
We used this approach to verify a family of FIFOs and noted a massive performance boost. We exhaustively verified FIFOs as deep as 8192 carrying 32-bit payloads using this abstraction. Though our packet design and the numerous other serial designs were an order of magnitude more complex than FIFO implementations, the basic transaction counting method rooted in smart tracker abstraction was effectively the main vehicle behind the exhaustive verification of such designs.
The reason this tracker is called ‘smart’ is that by tracking only one symbolic data value it effectively tracks all the data values in the design. Moreover, due to the data-independent nature of data transport designs, we do not need to observe, store, or compare the other data values entering and exiting the DUT. By avoiding explicit observation, storage, and comparison of those values, we save compute time and memory.
We compared the smart tracker method with the two-transaction method, in which one tracks two distinct values. We found that the smart tracker method outperforms the two-transaction method in compute time and scalability. We discuss these techniques in detail in the Axiomise formal verification training program.
Serial packet design
Our packet-based design has three dimensions. The first is the depth of the buffer (Figure 1 shows an 8-deep buffer). Each buffer location can hold a varying number of packets, and the maximum number that can be stored is configured to a fixed value at run time. The buffer depth is controlled through a parameter as well. The packets themselves can be of any size, ranging from 1 bit to n bits, also configured and fixed at run time. Figure 1 shows a layout with an 8-deep buffer in which each buffer location can store up to four packets, each packet being 8 bits wide. The green dots indicate how many of the packet locations are occupied so far by active packets. Buffer location 0 has four active packets, while buffer location 1 has one, and so on.
Figure 2 shows the design interface. Input data is transferred in packets where each input packet is written into data_i on an input handshake hsk_i (valid_i && enable_o). Output data is read out on an output handshake hsk_o (valid_o && enable_i).
Exactly how many packets are written is defined by the input pkt_len, which is registered into the design on the very first beat of the input handshake of a new packet. Subsequent values of pkt_len (which is PKT_BITS wide) are disregarded until another new packet stream is seen. The read and write of the packets is controlled by a buffer read/write FSM, which is a function of read/write pointers indexing the buffer depth, and a packet read/write FSM, which is a function of packet read/write pointers indexing the packet locations. A new valid write starts when the packet write pointer is 0 and there is an input handshake. In this clock cycle, the value of pkt_len is registered in the design; this is how many packets will be written at the buffer index pointed to by wptr. So wptr walks along the buffer depth, while pkt_wptr indexes the individual packets at a given buffer location. Similarly, rptr reads along the buffer depth, while pkt_rptr reads the individual packets from the buffer location indexed by rptr. The design also has flags empty_o and full_o to indicate when it is empty and full respectively, and is driven to reset by an active-low resetn.
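The handshake and length-registration behavior described above can be sketched as follows. This is a hedged reconstruction, not the actual RTL; cur_len is a hypothetical name for the internal register that holds the captured length:

```systemverilog
// Handshakes as defined on the interface.
wire hsk_i = valid_i && enable_o;
wire hsk_o = valid_o && enable_i;

// Capture pkt_len on the very first beat of a new packet stream;
// subsequent values of pkt_len are ignored until the stream completes.
logic [PKT_BITS-1:0] cur_len;
always_ff @(posedge clk or negedge resetn) begin
  if (!resetn)
    cur_len <= '0;
  else if (hsk_i && (pkt_wptr == '0))
    cur_len <= pkt_len;
end
```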
Verifying packet transfer
Since this is a multi-packet design, we need an array of watched values, not just one.
We define these watched values using the logic datatype in SystemVerilog: logic [DATA_WIDTH-1:0] wd [PKT_LEN-1:0]; Here, PKT_LEN is the maximum number of packets the input pkt_len can specify, defined as 1<<PKT_BITS.
We constrain these watched values to be stable after reset. This keeps the values fixed within any one run, while across its analysis the formal tool effectively considers every possible choice of values. Figure 3 shows the four sampling registers we use for detecting the entry and exit of the watched packet stream.
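In SystemVerilog Assertions, the stability constraint can be written as an assume over each element of the array. A minimal sketch, assuming the wd array and PKT_LEN declared above:

```systemverilog
// Keep each watched value rigid after the first cycle; the tool still
// considers every possible combination of values across its analysis.
for (genvar i = 0; i < PKT_LEN; i++) begin : g_wd_stable
  asm_wd_stable: assume property (@(posedge clk) ##1 $stable(wd[i]));
end
```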
The ready_to_*_sampling_in_* signals are defined in Figure 4.
The sample_in_*_cond signals are wires, while the sample_*_started and sample_*_finished signals are registers. The sample_in_started_cond signal is high on an input handshake when the packet write pointer is 0 and the very first watched data packet wd appears on data_i. The sample_in_finish_cond signal is high on an input handshake when the packet write pointer equals the number of packets meant to be written and we are ready to finish sampling in (i.e., we had already started writing the earlier packets because those conditions were met earlier).
The sample_out_started_cond is met when the input has been sampled in completely, a read has been issued, and the tracking counter is one. The signal sampling_out_finish_cond goes high on an output handshake when the packet read pointer equals the value stored at the time of the packet write and the earlier conditions for sampling out data have been met through ready_to_start_sampling_out.
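One plausible coding of the input-side conditions is sketched below; the exact definitions live in Figures 3 to 5. The name ready_to_finish_sampling_in is an assumed instance of the ready_to_* signals from Figure 4, and watched_pkt_len is the register described later that holds the watched stream's length:

```systemverilog
// Illustrative coding of the input sampling conditions; the register
// updates are simplified (clearing between streams is omitted).
wire sample_in_started_cond =
  hsk_i && (pkt_wptr == '0) && (data_i == wd[0]);

wire sample_in_finish_cond =
  hsk_i && (pkt_wptr == watched_pkt_len) && ready_to_finish_sampling_in;

always_ff @(posedge clk or negedge resetn) begin
  if (!resetn) begin
    sample_in_started  <= 1'b0;
    sample_in_finished <= 1'b0;
  end else begin
    if (sample_in_started_cond) sample_in_started  <= 1'b1;
    if (sample_in_finish_cond)  sample_in_finished <= 1'b1;
  end
end
```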
Two other useful signals we need to express our intent are shown in Figure 5.
The smart tracker counter can now be defined as shown in Figure 6.
The related increment and decrement are shown in Figure 7. The signal cpkt_len stores the packet size on every input write handshake.
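A sketch of what the counter and its increment/decrement conditions might look like (cf. Figures 6 and 7). Here stream_written and stream_read are hypothetical single-cycle pulses marking the completion of a full packet stream on the write and read side respectively:

```systemverilog
// Tracker counter sketch: count complete packet streams written up to
// and including the watched stream, and retire one count per complete
// stream read out.
wire incr = stream_written && !sample_in_finished;
wire decr = stream_read && (counter != '0);

always_ff @(posedge clk or negedge resetn) begin
  if (!resetn) counter <= '0;
  else         counter <= counter + incr - decr;
end
```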
We sample in the size of the packets by reading in the pkt_len when the sampling condition is set which is defined by sample_in_started_cond. We copy this value into a register called the watched_pkt_len.
The property that establishes that all packets received at the input interface are delivered to the output at the correct time, without loss, reordering, or duplication, is shown in Figure 8. What we need is one such property for each packet. Here, counter_out counts how many watched packets are left to come out once they have been completely sampled in but not yet sampled out.
We use a generate loop to model these, where the loop itself is sensitive to PKT_LEN (Figure 9).
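A hedged sketch of what such a generate loop might look like; the real properties are in Figures 8 and 9, and the triggering condition here is a simplified assumption:

```systemverilog
// Per-packet delivery checks for packets 0 .. PKT_LEN-2; the final
// packet gets its own assertion, as explained below.
for (genvar i = 0; i < PKT_LEN-1; i++) begin : g_pkt_check
  ast_pkt_delivered: assert property (@(posedge clk) disable iff (!resetn)
    (hsk_o && sample_in_finished && !sample_out_finished &&
     (counter_out == 1) && (pkt_rptr == i))
    |-> (data_o == wd[i]));
end
```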
Note that the loop above ranges up to PKT_LEN-2; for the final packet, PKT_LEN-1, we write a separate assertion. This is due to a design artefact and the way we have modeled the testbench registers that track the data. For the very last packet in the cycle, the register sample_out_finished goes high. When sample_out_finished goes high, we expect to see the watched data stored at index PKT_LEN-1, or the watched packet at location 0 (for those cases where the watched packet is of length 0).
The property that checks that the very last packet has been delivered is shown in Figure 10.
Results and discussion
We carried out a range of runs using the VC Formal Property Verification (FPV) app on different configurations of buffer depth, maximum packets, and data vector width. The results for an 8-deep buffer are shown in Figure 11. We noted that, with increasing packet sizes, the proof times scaled linearly when the data width was 1 bit.
However, if you check the assertions for the entire data word at once, this tool (or, for that matter, any tool) will not converge in a predictable manner. So we sliced the word-level property into smaller bit-level properties and ran them sequentially, accumulating the overall run time for the entire word.
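Bit-slicing exploits the same data-independence property: each bit of the payload can be checked by its own small assertion, giving the solvers DATA_WIDTH small proof obligations instead of one large one. A sketch, where deliver_watched and wd_expected are hypothetical shorthands for the triggering condition and the expected watched word used in the word-level property:

```systemverilog
// Slice the word-level check into DATA_WIDTH independent bit-level
// checks; each is a much smaller proof obligation for the solvers.
for (genvar b = 0; b < DATA_WIDTH; b++) begin : g_bit_slice
  ast_bit_delivered: assert property (@(posedge clk) disable iff (!resetn)
    deliver_watched |-> (data_o[b] == wd_expected[b]));
end
```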
Of course, one can run these in parallel on multiple CPU cores. We carried out these runs on a virtual Linux machine with 6 GB of memory and a single CPU core. The Complexity Report feature in VC Formal shows that the size and depth of the buffer are a bottleneck that must be overcome.
Increasing the proof convergence rate on deeper configurations of the design involves stitching in additional helper properties that make it easier for VC Formal to reach proof convergence sooner. Once we provided the helper properties, we were able to prove a configuration instance with a buffer depth of 256 and variable packet sizes of up to 8 packets carrying 32-bit data (nearly 10^20,000 states).
The Iterative Convergence Methodology (ICM) feature in VC Formal can aid in identifying the helper properties. However, we did not use it in this case as we were familiar with what was needed.
Verification of sequential designs is a challenge for all verification technologies, including simulation and formal verification. But if you use the right methodology with a mature tool that has great solvers, a significant boost in performance can be achieved.
This topic is covered in greater depth in Axiomise training programs. More information on those is available here.