High-level Synthesis—Quo Vadis?

Dr. Johannes Stahl, Director of Product Marketing, System-Level Solutions, Synopsys

If you have ever watched the epic 1951 movie “Quo Vadis” starring Peter Ustinov, you know what suspense is: that combination of excitement and patience. High-level synthesis (HLS) is no different. It has been available commercially for almost two decades, with its share of technology highs and lows. Every year the design community is hungry to hear when the next big breakthrough for mainstream adoption and productivity is going to hit the market. Although many different technologies have been proposed and commercialized, their benefits were not substantial enough, compared to incremental improvements in RTL methodologies, to be relevant to the hardware design community.

There has also been a lot of confusion about who should actually use high-level synthesis tools. Is it the algorithm designer, system architect or is it the hardware designer? More fundamentally, why should design teams adopt HLS? Is it to improve quality of results (QoR, which entails area, performance, power) or is it for verification productivity? It’s a tough storyline to follow. The plot is thick and there’s a large cast of characters; however, most of them have already exited the stage.

So, first let’s briefly classify the technologies available in today’s market. For the HLS technologies accessible to system engineers for FPGA-based prototyping, we have to start with their commonly used design methodologies. For communications system design, systems are predominantly captured in a model-based way. Algorithm designers use tools like SPW/System Studio from Synopsys or Simulink® from MathWorks. For rapid prototyping, they can get libraries from FPGA vendors that plug into these tools and generate RTL that maps to FPGA architectures. Some tools take a more generic approach that allows targeting FPGAs and SoCs from the same block diagram. These methodologies have been proven for many years and are in production today [1]. However, they can only partially be considered HLS, since the QoR optimizations they perform are at a low level. Still, these methodologies already fulfill one goal of HLS: verification inside a high-level design environment from which RTL is then generated.

The second class of implementation tools for communications engineers consists of model-based HLS tools, such as Synphony Model Compiler from Synopsys or Simulink HDL Coder™ from MathWorks, both of which support system definition in Simulink followed by synthesis to RTL. These tools combine block generators (for example, for filters or FFTs) with global HLS optimizations, while solving some of the tougher problems of multi-rate design implementation. Because they are contained in a modeling environment and are based on a set of library elements, they deliver predictable results and are easy for system engineers to adopt. FPGA prototyping is very straightforward with these technologies; production deployment of the RTL depends on tight integration with the backend synthesis tools for SoC and FPGA.

The other class of applications that has adopted HLS approaches is imaging and video design, which involves a very sophisticated combination of control and data processing. It’s a long-standing myth that HLS does not handle control processing well; this is actually an integral part of today’s HLS tools. What all HLS tools used in this space have in common is that they are language-based rather than model-based. The fundamental reason is that most video applications start as a software implementation on a processor, usually a reference C or C++ program. The most natural approach, therefore, is to start with this high-level C/C++ code and refine it from there for implementation.

The implementation choices are:

1. Software: Pick an off-the-shelf processor core and optimize the C code using software tool chains tuned for that processor.

2. Custom Processor: Build a dedicated programmable core, again starting with a high-level language such as LISA.

3. Hardware Accelerator: Offload some of the processing into an accelerator and use a top-down C/C++-based HLS flow, such as Synopsys Synphony C Compiler or Mentor Catapult C.

In many cases, there is a need to evaluate and verify the combination of software and HLS-designed hardware. Both the functionality, which could be validated using a virtual prototype, and the impact of moving large amounts of video data across common SoC memory and interconnect infrastructures are critical to making the final design decisions. To facilitate that, HLS tools have to be able to generate SystemC models at various abstraction levels that plug into complete SystemC platform models.

The other method for HLS being used today is a bottom-up approach, in which individual blocks are coded in SystemC, synthesized separately and then assembled for system simulation. This approach, after significant amounts of training in SystemC coding, has some attraction for hardware designers since they can consider SystemC as an HDL that has capabilities of higher abstraction and at the same time provides all low-level mechanisms to express RTL behavior.

Unfortunately, this bottom-up approach forfeits a lot of flexibility for architecture optimization and also requires a lot of manual coding work which, of course, introduces bugs that will have to be found and eliminated, thus reducing the verification benefit HLS provides in the first place.

An interesting side note on HLS progress is illustrated in a study done by BDTI [2] that compared HLS flows to software flows. One would assume that option #1 above (a tuned software implementation) would be the path of least resistance in terms of implementation effort. But the results indicate that tuning software for embedded CPUs/DSPs is itself a very difficult process and the latest HLS tools have closed the productivity gap quite a bit. Designers should find this encouraging. At a minimum, it is certainly fuel for the current sequel of the HLS story.

However, to know where the HLS story is going, you have to take a hard look at the overall problem: the challenges for SoC design center around verification productivity, including the software, and power optimization (QoR). Any HLS methodology will have to maintain close links to SystemC flows for architecture and software verification. SystemC models, as a byproduct of HLS, provide a high ROI for design teams employing SystemC-based methods. On the other hand, RTL verification for designs that require cycle-by-cycle definitions of the functionality doesn’t benefit from HLS, since such designs can be effectively specified and verified using traditional HDLs and methods. Trying to use a higher-level language, such as SystemC, for this purpose only decreases productivity, because much of the technology and methodology needed to deal with RTL is not available with SystemC.

That leaves HLS for the designs that have room for multi-cycle operation. The design teams that come up with the best architectures in the design continuum available for multi-cycle implementation will win. They will use a combination of off-the-shelf processors (thousands of cycles), custom processors (hundreds of cycles) and accelerators (tens of cycles) designed with HLS. Being able to hit ‘compile’ from a common C/C++ source and have the code either execute on the processor or be implemented as a hardware block will be the dominant and critical methodology.
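To make the idea of a common C source concrete, here is a minimal sketch of the kind of kernel that could, in principle, be compiled for a processor or handed to an HLS tool. The function name fir_step, the constant FIR_TAPS and the 4-tap filter are hypothetical, chosen only for illustration. The style (fixed loop bounds, statically sized arrays, integer arithmetic, no dynamic allocation) reflects restrictions commonly associated with synthesizable C, but what a particular HLS tool accepts is an assumption to check against its documentation.

```c
#include <stdint.h>

#define FIR_TAPS 4  /* hypothetical filter length, fixed at compile time */

/*
 * One step of a small FIR filter (illustrative only).
 * delay holds the last FIR_TAPS input samples and is updated in place;
 * coeff holds the filter coefficients. The return value is the filter
 * output for the new sample.
 */
int32_t fir_step(int16_t delay[FIR_TAPS],
                 const int16_t coeff[FIR_TAPS],
                 int16_t sample)
{
    int32_t acc = 0;
    int i;

    /* Shift the delay line by one position and insert the new sample. */
    for (i = FIR_TAPS - 1; i > 0; i--) {
        delay[i] = delay[i - 1];
    }
    delay[0] = sample;

    /* Multiply-accumulate over a loop with a compile-time bound. */
    for (i = 0; i < FIR_TAPS; i++) {
        acc += (int32_t)delay[i] * (int32_t)coeff[i];
    }
    return acc;
}
```

On a processor this is ordinary C that any compiler will build. Because the loop bounds and array sizes are known at compile time, the same source leaves a synthesis tool free to decide how much of the multiply-accumulate work to do per cycle, which is the multi-cycle trade-off described above.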

Should today’s hardware designers be concerned that the ‘compile’ button will replace their jobs? On the contrary: for the blocks that will be in the HLS space, the hardware designer’s job will be one of the most challenging and critical ones. The split between verification engineer and design engineer will go away for these functions, as design and verification are highly integrated for HLS. Also, in the same way that a DSP engineer has to know how to write good code for the processor’s C compiler, the hardware designer will have to know how to write good code for the HLS tool. Even with all the sophistication EDA companies are building into their high-level synthesis tools, the impact of the input code on the quality of the output RTL is still high. Just imagine the design opportunity at hand, since every line of C code generates 10 lines of Verilog RTL.
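As a small illustration of how input code can shape the resulting hardware, the sketch below sums eight values in two functionally identical ways. The names sum_chain, sum_tree and N_IN are hypothetical. Read literally, the first form describes a serial chain of dependent additions, while the second describes a balanced tree only three additions deep. Whether a given HLS tool rebalances the first form on its own is tool-dependent and is not claimed here.

```c
#include <stdint.h>

#define N_IN 8  /* hypothetical number of inputs */

/* Serial form: every addition depends on the result of the previous one. */
uint32_t sum_chain(const uint32_t x[N_IN])
{
    uint32_t acc = 0;
    int i;

    for (i = 0; i < N_IN; i++) {
        acc += x[i];
    }
    return acc;
}

/* Tree form: the same sum written as three levels of pairwise additions. */
uint32_t sum_tree(const uint32_t x[N_IN])
{
    /* Level 1: four independent additions. */
    uint32_t s01 = x[0] + x[1];
    uint32_t s23 = x[2] + x[3];
    uint32_t s45 = x[4] + x[5];
    uint32_t s67 = x[6] + x[7];

    /* Level 2: two independent additions. */
    uint32_t s0123 = s01 + s23;
    uint32_t s4567 = s45 + s67;

    /* Level 3: final addition. */
    return s0123 + s4567;
}
```

Because unsigned arithmetic in C wraps modulo 2^32, both functions return the same value for every input, so a designer can swap one for the other without changing behavior and then compare the generated RTL.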

The final myth that has hindered HLS for many years is that chip real estate is growing so much that it does not matter if the gate complexity coming out of HLS is a little higher. The time-to-market advantage was supposed to be more important. There is only one area for which this is actually true: FPGA-based design. This is a rapidly growing area that will see a huge amount of HLS adoption in the coming years, provided the integration of HLS with FPGA synthesis tools shortens time-to-results. However, for SoC design there is not a single customer that doesn’t want to get the best RTL, gate count and power possible. As the size of designs tackled with HLS is in the multi-million-gate range today and should easily hit 10 million gates in the next few years, it will be critical to improve the integration between HLS and the leading RTL synthesis tools.

The success of HLS will depend on continuous innovation, making its adoption easier, as well as integrating it into the system-level, verification and backend flows to produce a robust, high QoR path to implementation. Top-down, C/C++-based HLS technologies are poised to become the mainstream methodology in the years to come. We should all look forward to what is sure to be an exciting and happy ending for this story.

[1] “SAWLESS RF Transceiver Technology Meets the Multiband, Multimode Requirements for 4G/3G/2G”, Vivek Bhan, Fujitsu Semiconductor, Wireless Design Magazine, November 2010
[2] “BDTI Certified™ Results for the Synopsys Synphony C Compiler”, BDTI, 2010