Takashi Kawabe, Konica Minolta Technology Center, Inc.
Mitsuhiro Matsumoto, EVE KK
Konica Minolta Technology Center, Inc., of Tokyo, Japan, is known for its high-speed, high-performance LSI designs used in image processing. As most semiconductor companies can attest, it is finding that the hardware debugging process is getting more and more complicated.
As a consequence, it recently set out to evaluate various hardware emulators to determine the best implementation for a transaction-based co-emulation methodology. The goal was to improve hardware verification and SoC realization efforts of its hardware design team, and to drastically reduce the time it spent creating custom transactors.
The three-month evaluation set out to identify which co-emulation system best optimized a set of transactors. Konica Minolta selected EVE’s ZeBu-XXL emulator and ZEMI-3 transactor compiler.
Transaction-level modeling enables the construction of complex, system-level test environments. The parallel execution of the testbench and the emulated design under test (DUT) offers faster performance than simulation. However, there are challenges to overcome, and these were part of Konica Minolta’s evaluation criteria. For example, each transactor had been designed differently, and shortcomings in a transactor’s design or implementation can degrade co-emulation performance. A further challenge is that transactors are not the only components of the co-emulation environment that can limit performance.
According to Amdahl’s Law, the overall speedup of a system is limited by the portion of the work that is not accelerated, which means that any slow path in the co-emulation environment, serial or parallel, can become a bottleneck. To overcome these obstacles and achieve optimal performance in transaction-based co-emulation, every component must be examined, not just the transactors.
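To make the Amdahl’s Law argument concrete, here is a minimal Python sketch of the standard formula. The function name and the example numbers (90% of the runtime accelerated by a factor of 1,000) are illustrative assumptions, not figures from the Konica Minolta evaluation.

```python
def amdahl_speedup(accelerated_fraction: float, factor: float) -> float:
    """Overall speedup when only part of the runtime is accelerated.

    accelerated_fraction: share of the original runtime that gets faster (0..1)
    factor: how many times faster that share becomes (> 0)
    """
    if not 0.0 <= accelerated_fraction <= 1.0:
        raise ValueError("accelerated_fraction must be between 0 and 1")
    if factor <= 0:
        raise ValueError("factor must be positive")
    untouched = 1.0 - accelerated_fraction
    return 1.0 / (untouched + accelerated_fraction / factor)


if __name__ == "__main__":
    # Hypothetical numbers: even a 1000x faster DUT yields less than 10x
    # overall if 10% of the runtime (setup, file I/O, ...) is left alone.
    print(f"{amdahl_speedup(0.9, 1000.0):.2f}x overall")
```

The point of the sketch is that the untouched 10% caps the overall gain at 10X no matter how fast the accelerated part becomes, which is why the whole environment has to be examined.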
The design team at Konica Minolta Technology Center needed to identify and optimize additional performance bottlenecks found in transaction-based co-emulation.
The original test environment included a set of 160,000 Verilog test patterns, each with a relatively short execution time in simulation: about 1.7K seconds, or under 30 minutes. Even so, simulating the complete set serially would take roughly 3,200 days. Konica Minolta elected to use transaction-based co-emulation to accelerate its verification efforts, setting a performance target of one month to complete the entire test-pattern set. That works out to about 16 seconds per test pattern, a speedup of more than 100X over simulation.
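The arithmetic behind these figures can be reproduced with a short Python sketch. It assumes a 30-day month and treats the 1.7K-second simulation time as exactly 1,700 seconds, so the results differ slightly from the rounded numbers quoted above.

```python
NUM_PATTERNS = 160_000
SIM_SECONDS_PER_PATTERN = 1_700   # "1.7K seconds" taken as exactly 1,700
SECONDS_PER_DAY = 86_400
TARGET_DAYS = 30                  # assumption: "one month" means 30 days

# Serial simulation time for the complete test-pattern set, in days.
sim_days = NUM_PATTERNS * SIM_SECONDS_PER_PATTERN / SECONDS_PER_DAY

# Time budget per pattern if the whole set must finish within the target.
budget_seconds = TARGET_DAYS * SECONDS_PER_DAY / NUM_PATTERNS

# Speedup over simulation needed to meet that budget.
required_speedup = SIM_SECONDS_PER_PATTERN / budget_seconds

if __name__ == "__main__":
    print(f"serial simulation: {sim_days:.0f} days")
    print(f"budget per pattern: {budget_seconds:.1f} s")
    print(f"required speedup: {required_speedup:.0f}x")
```

Under these assumptions the serial simulation comes to about 3,150 days, the budget to 16.2 seconds per pattern, and the required speedup to about 105X, consistent with the rounded values in the text.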
The ZeBu-XXL “driver clock,” used for transactor processing and to derive the DUT clocks, was set at 6 MHz, and the ZEMI-3 transactors were optimized for performance based on previous experience with transactor development.
Although ZeBu accepts behavioral code, using synthesizable Verilog reduced the number of state machines and logic generated. Smaller transactions, such as individual memory writes, were grouped together to reduce the frequency of synchronization between the testbench and the emulator.
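The grouping idea can be sketched in Python as follows. This is only an illustration of batching small writes into fewer testbench-to-emulator exchanges; the `BatchingWriter` class, its `send` callback, and the batch size are hypothetical and do not represent the ZEMI-3 or ZeBu API.

```python
from typing import Callable, List, Tuple

MemWrite = Tuple[int, int]  # (address, data)


class BatchingWriter:
    """Collects individual memory writes and sends them in groups.

    `send` stands in for whatever call crosses the testbench/emulator
    boundary; each call to it represents one synchronization.
    """

    def __init__(self, send: Callable[[List[MemWrite]], None], max_batch: int = 64):
        if max_batch < 1:
            raise ValueError("max_batch must be at least 1")
        self._send = send
        self._max_batch = max_batch
        self._pending: List[MemWrite] = []

    def write(self, address: int, data: int) -> None:
        # Queue the write; only cross the boundary when the batch is full.
        self._pending.append((address, data))
        if len(self._pending) >= self._max_batch:
            self.flush()

    def flush(self) -> None:
        # Send whatever is queued as a single transaction, if anything.
        if self._pending:
            self._send(self._pending)
            self._pending = []


if __name__ == "__main__":
    sent: List[List[MemWrite]] = []
    writer = BatchingWriter(sent.append, max_batch=4)
    for i in range(10):
        writer.write(i, i * 2)
    writer.flush()  # push out the final partial batch
    print(f"10 writes sent in {len(sent)} synchronizations")
```

The trade-off is latency: a queued write is not visible to the DUT until the batch is flushed, so a real transactor would need to flush before any operation that depends on those writes.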
The initial implementation using these pre-optimized transactors resulted in a runtime of 51 seconds per test pattern, an improvement over simulation but still well short of the 16-second performance target. Further exploration was required to identify and eliminate the remaining performance bottlenecks.
Each test pattern included an initial memory load before the test and a trailing memory dump at the end of the test. A custom C function was used to measure the performance of the test environment. It showed that each memory load or dump took 6.5 seconds to complete, so the two together (13 seconds) accounted for more than 25% of the 51-second runtime.
These memory operations were initially performed using the text-based $readmemh format. Switching to a binary file format eased file handling and reduced bandwidth requirements for the memory load and dump, cutting the time to 0.16 seconds for each operation. With this optimization applied, each test pattern executed in 38.5 seconds.
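To illustrate the difference between the two representations, the following Python sketch converts a simple $readmemh-style text image into packed binary words and back. It assumes 32-bit little-endian words and handles only plain hex tokens and `//` comments; the binary layout is a hypothetical one, not the format ZeBu actually uses, and the function names are invented for this example.

```python
import struct
from typing import List


def parse_readmemh(text: str) -> List[int]:
    """Parse a simplified $readmemh image: whitespace-separated hex words.

    Not handled in this sketch: @address directives, /* */ comments,
    and x/z digits.
    """
    words: List[int] = []
    for line in text.splitlines():
        line = line.split("//", 1)[0]  # drop end-of-line comments
        for token in line.split():
            if token.startswith("@"):
                raise ValueError("address directives are not handled here")
            words.append(int(token.replace("_", ""), 16))
    return words


def pack_words(words: List[int]) -> bytes:
    """Pack words as consecutive 32-bit little-endian values (assumed layout)."""
    return struct.pack(f"<{len(words)}I", *words)


def unpack_words(blob: bytes) -> List[int]:
    """Inverse of pack_words."""
    count, remainder = divmod(len(blob), 4)
    if remainder:
        raise ValueError("binary image is not a whole number of 32-bit words")
    return list(struct.unpack(f"<{count}I", blob))


if __name__ == "__main__":
    text_image = "DEADBEEF 00000001 // header\n0000_00ff\n"
    words = parse_readmemh(text_image)
    blob = pack_words(words)
    print(f"{len(text_image)} bytes of text -> {len(blob)} bytes of binary")
```

The binary image is smaller and can be read back without tokenizing or hex conversion, which is in line with the article's observation that the binary format eased file handling and bandwidth requirements.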
Another performance bottleneck was found in the emulator setup time. ZeBu emulators are based on standard Xilinx FPGAs. When a test is initiated, these FPGAs are loaded with the DUT and transactor bitstreams, and the initial memory contents are loaded, a process that takes approximately 30 seconds.
In typical emulation usage, where the DUT executes application software or boots an operating system, this short setup time is insignificant. With a test-pattern runtime of only 38.5 seconds, however, it accounts for most of the runtime.
Since the DUT and test environment did not change between tests (only the test pattern changed), the repeated emulation setup was not required. The test environment was adjusted to set up the emulator only once and then run all of the test patterns. With setup occurring only once, executing the complete test-pattern set took 16 days, or 8.5 seconds per test pattern. This outcome was roughly twice as fast as Konica Minolta’s target, a 200X performance boost over the original simulation runs.
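The progression of per-pattern runtimes reported in this article can be tabulated with a few lines of Python. The sketch uses only the figures quoted in the text (1,700 s in simulation, then 51 s, 38.5 s and 8.5 s in co-emulation); the stage labels are descriptive names chosen for this example.

```python
NUM_PATTERNS = 160_000
SIM_SECONDS = 1_700.0
SECONDS_PER_DAY = 86_400

# (stage label, seconds per test pattern) as reported in the article.
STAGES = [
    ("initial co-emulation", 51.0),
    ("binary memory load/dump", 38.5),
    ("emulator set up once", 8.5),
]


def total_days(seconds_per_pattern: float) -> float:
    """Days needed to run the complete test-pattern set serially."""
    return NUM_PATTERNS * seconds_per_pattern / SECONDS_PER_DAY


def speedup(seconds_per_pattern: float) -> float:
    """Speedup relative to the original simulation time per pattern."""
    return SIM_SECONDS / seconds_per_pattern


def report():
    """Return (label, seconds, days, speedup) rows, rounded to one decimal."""
    return [
        (name, sec, round(total_days(sec), 1), round(speedup(sec), 1))
        for name, sec in STAGES
    ]


# Cross-check: replacing two 6.5 s memory operations with two 0.16 s ones
# should take the 51 s runtime to roughly the reported 38.5 s.
after_binary_estimate = 51.0 - 2 * (6.5 - 0.16)

if __name__ == "__main__":
    for name, sec, days, gain in report():
        print(f"{name:26s} {sec:5.1f} s  {days:5.1f} days  {gain:6.1f}x")
    print(f"estimated runtime after binary I/O: {after_binary_estimate:.2f} s")
```

The cross-check gives 38.32 seconds, close to the reported 38.5, and removing the roughly 30-second setup from 38.5 seconds gives the reported 8.5 seconds.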
The Konica Minolta evaluation report shows three types of optimization technique: structural optimization, memory/communication optimization and setup-time optimization. Moreover, the report recommends mixing these techniques effectively to achieve the maximum performance.
The conclusion may seem obvious, but it is well worth summarizing here. Achieving maximum performance in transaction-based co-emulation requires more than fast transactors and a fast emulator. Bottlenecks can occur at any stage and can come from unexpected sources. Each emulation test environment must be individually analyzed to identify and eliminate them.
The Konica Minolta Technology Center design team is quite satisfied with its transaction-based co-emulation methodology and hardware debugging process. In fact, the methodology is now a key part of its LSI design process.
Editor’s Note: Konica Minolta’s Takashi Kawabe will offer more details on the evaluation and selection of ZeBu-XXL and ZEMI-3 transactor compiler during a DAC User Track presentation Thursday, June 7, from 9-10:30 a.m. at the Moscone Center, Room 303.