High-level synthesis (HLS) promises to deliver complex designs faster and with greater maintainability. Much of the focus has been on using architectural exploration to find ways to create denser logic blocks that meet performance goals. But congestion has emerged as an issue that Cadence Design Systems is addressing with research into its root causes.
Phil Bishop, vice president of R&D for high-level synthesis at Cadence, described the efforts at the company’s Front End Design Summit in California late last year. Explaining the motivation, he said: “SystemC is often golden at our customers. Its value proposition is that by keeping things in SystemC you are keeping less code and it’s easier to maintain.
“With high-level synthesis, some of the major optimizations are around architectural exploration. You can explore them very early in the design flow. The tools today – and we have two different technologies at Cadence – can help you optimize a whole load of different design types. And the design flow is really helping us provide strong output.”
Bishop said the market for HLS is growing very quickly now. “In terms of HLS usage, early on it was all datapath-oriented.” Recently, the usage has expanded into cores such as memory controllers, “things heavily laden with state machines”, as well as heavy usage in vertical markets, usually based around video and graphics. “These are often algorithm-rich apps that lend themselves to exploration in C++.”
“800-plus tapeouts have been done either with Cynthesizer or CtoSilicon,” Bishop added.
Having dealt with a number of designs, the Cadence team noticed that the way in which HLS works can lead to excessive congestion.
“We thought ‘we have to dig into this area’,” Bishop said. “Congestion is at the heart of a lot of design-closure challenges. We saw that it was increasing chip size, time and cost. We noticed congestion took one of two forms. Global and also local, within a specific module or block. We thought ‘let’s focus in on local’. We noticed that this congestion was mostly caused by SystemC optimizations that lead to bad logic design.”
Chief among the problems was the generation of multiplexers and demultiplexers with large numbers of connections. These place heavy demands on routing resources and so lead to congestion, which has become a major concern because it is hard to provide access to cells in processes with highly restrictive design rules.
“We looked at our HLS flow. The issue is [it is not until] way down in the flow you find the congestion problem. [Today’s] flow does not incorporate enough feedback and the back-end has limited opportunities to take away the congestion. You are doing local optimizations at that point but we feel you can do a lot more at the global level.
“We looked into some customer examples. They were very happy with [the] design, but had to tackle [the] congestion problem,” Bishop explained. The horizontal and vertical congestion in parts of the video codec, graphics processor and discrete cosine transform engines that Cadence analyzed could prove to be two orders of magnitude higher than what customers wanted. RTL Compiler could attempt local fixes to address some part of the congestion, but ideally a flow would bring the tool’s understanding of layout into HLS.
Source of the problem
“When we first delved in we thought the tool was over-optimizing, over-sharing,” Bishop explained. “Micro-architectural decisions can have a strong impact. Resource and register [sharing] during area minimization can increase congestion. But coding style ended up being a strong influence on congestion. Ninety-five per cent of all congestion was SystemC constructs leading to large amounts of muxing and demuxing.
“The muxes were being generated by part of the algorithm, typically large loop constructs. We found that any [HLS] tool that tried to transpose [the design] into RTL would have to break that loop. We found that if you wrote [the code] in a different way, you got rid of the congestion. But it wasn’t intuitive,” Bishop added.
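As a purely illustrative sketch of the kind of rewrite Bishop describes, consider a loop that reads a buffer through a computed index. The article does not show the customer code, so the filter structure, the names `CircularFir` and `ShiftFir`, and the 8-entry window are all assumptions. Once an HLS tool unrolls or pipelines such a loop, each computed-index read can become a wide multiplexer over every buffer entry. A functionally equivalent version that shifts the data past fixed tap positions needs only constant-index reads.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

constexpr std::size_t kTaps = 8;  // assumed window size, for illustration only

// Style A: circular buffer read through a computed index.
// In hardware, buf[(head + i) % kTaps] can imply a kTaps-to-1 mux per read.
struct CircularFir {
    std::array<int32_t, kTaps> buf{};
    std::size_t head = 0;  // position of the newest sample

    int32_t step(int32_t sample, const std::array<int32_t, kTaps>& coeff) {
        head = (head + kTaps - 1) % kTaps;  // move head back by one slot
        buf[head] = sample;
        int32_t acc = 0;
        for (std::size_t i = 0; i < kTaps; ++i) {
            acc += coeff[i] * buf[(head + i) % kTaps];  // variable index
        }
        return acc;
    }
};

// Style B: shift register with fixed tap positions.
// reg[i] always holds the sample from i steps ago.
struct ShiftFir {
    std::array<int32_t, kTaps> reg{};

    int32_t step(int32_t sample, const std::array<int32_t, kTaps>& coeff) {
        for (std::size_t i = kTaps - 1; i > 0; --i) {
            reg[i] = reg[i - 1];  // neighbour-to-neighbour move only
        }
        reg[0] = sample;
        int32_t acc = 0;
        for (std::size_t i = 0; i < kTaps; ++i) {
            acc += coeff[i] * reg[i];  // constant index once unrolled
        }
        return acc;
    }
};
```

Both versions return the same result for the same input stream. Whether the second actually routes better depends on the tool and the target process, which is why a report that points back to the offending construct matters.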
“When we saw that we started working on a package to run reports: find muxes with tremendous amounts of wiring in RTL then point back to where those constructs had been generated,” Bishop said.
Working with a lead customer, Cadence built a congestion-detection package that works module by module and, where it finds instances, points back to the troublesome SystemC source.
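The reporting step can be pictured with a minimal sketch. This is not Cadence's package: the `MuxRecord` fields, the `congestion_report` function and the use of input count times bit width as a stand-in for routing demand are all hypothetical, chosen only to show the idea of ranking wide muxes and keeping a pointer back to the source line.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical record for one multiplexer found in the generated RTL.
struct MuxRecord {
    std::string rtl_name;     // instance name in the RTL
    std::size_t num_inputs;   // number of data inputs
    std::size_t bit_width;    // width of each input in bits
    std::string source_file;  // SystemC file that produced the mux
    int source_line;          // line of the originating construct
};

// Rough proxy for routing demand: wires entering the mux.
inline std::size_t wire_count(const MuxRecord& m) {
    return m.num_inputs * m.bit_width;
}

// Return the muxes whose wire count exceeds the threshold, worst first.
std::vector<MuxRecord> congestion_report(std::vector<MuxRecord> muxes,
                                         std::size_t wire_threshold) {
    muxes.erase(std::remove_if(muxes.begin(), muxes.end(),
                               [&](const MuxRecord& m) {
                                   return wire_count(m) <= wire_threshold;
                               }),
                muxes.end());
    std::sort(muxes.begin(), muxes.end(),
              [](const MuxRecord& a, const MuxRecord& b) {
                  return wire_count(a) > wire_count(b);
              });
    return muxes;
}
```

Each entry in the returned list carries `source_file` and `source_line`, so a designer can go straight from a heavily wired mux to the construct that produced it.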
“Having built that design flow we looked at ten new cases brought in from customers. We tried to find different varieties of design. We have run hundreds of cases now but these ten were pretty informative. We found a large amount of area usage didn’t necessarily track congestion. Large numbers of muxes were not necessarily an issue either.
“The big problems for us were SystemC constructs where, from an HLS perspective, we were breaking a loop. If there is a lot of work done in the loop, we ended up with a tremendous number of muxes in the loop,” Bishop said.
As the congestion is primarily caused by coding style, the first step is to provide information to designers to let them correct the problem. “Users get info on what is happening and are given a list of changes that could be made. Today we have a kind of open-loop system, if you will. They know what they can change and which SystemC construct is causing the issue.
“But there are additional things we are doing in R&D, to perform more automated detection and more visualizations. And then automatically apply corrections – that’s our next move. We are also looking at scheduling and resource sharing techniques that consider routing congestion.”
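To see why resource sharing and routing pull in opposite directions, a toy model helps. It is an assumption made for illustration, not a cost function from any Cadence tool: sharing several two-operand operations on one functional unit needs one mux per operand, so the wires converging on that unit grow with the number of operations shared. The helper names below are hypothetical.

```cpp
#include <cstddef>

// Toy model: sharing `ops` two-operand operations on one functional unit
// needs one ops-to-1 mux per operand, so 2 * ops * width wires converge
// on the unit. A unit that serves a single operation needs no mux.
constexpr std::size_t shared_unit_mux_wires(std::size_t ops, std::size_t width) {
    return ops <= 1 ? 0 : 2 * ops * width;
}

// Fewest functional units such that no operand mux has more than
// max_fanin inputs. A limit of 0 is treated here as "no sharing".
constexpr std::size_t units_needed(std::size_t ops, std::size_t max_fanin) {
    return max_fanin == 0 ? ops : (ops + max_fanin - 1) / max_fanin;
}
```

Under this model, 16 shared 32-bit multiplications bring 1,024 operand wires to a single multiplier. Capping mux fan-in at four spreads them over four multipliers with 256 wires each, trading area for routability.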
Summarizing, Bishop claimed: “HLS has definitely matured and is now encompassing a broader value proposition and a greater number of design styles. Cadence has chosen to make this investment to lead the market in this area.
“We are finding that the way to have powerful high-level design is to partner very closely with the front-end design team. Using these optimizations creates a very powerful C-to-silicon design environment.”