Analyzing the Generation and Optimization of an FPGA Accelerator with High-Level Synthesis
In embedded system design, offloading compute-intensive functionality from the CPU to an FPGA is becoming increasingly popular because of today's performance and energy-efficiency requirements. This trend is reinforced by the availability of SoC-FPGAs such as the Xilinx Zynq-7000 and Zynq UltraScale+ MPSoC devices, which combine low-power embedded CPUs (a dual-core ARM Cortex-A9 or a quad-core ARM Cortex-A53, respectively) with programmable logic fabric for fixed-function and programmable-function acceleration. While an FPGA can offer huge acceleration at low energy cost, it also brings restrictions, e.g. relatively high cost, long development cycles and, compared to software development, difficult development with Hardware Description Languages (HDL) like VHDL or Verilog. To avoid the last two issues, an approach called High-Level Synthesis (HLS) is now available for productive use. HLS provides a way to automatically derive a register transfer level (RTL) description from a high-level description of a functionality. The high-level description may be provided in C/C++, SystemC, OpenCL, etc. The HLS tool automatically extracts the functional structure and temporal dependencies to synthesize the output RTL. However, the coding style of, e.g., C code targeting an HLS compiler differs from that of code describing the same functionality for a standard compiler such as gcc. Intellectual Property described in a high-level language is functionally validated within a test bench to ensure correct behavior of the description as well as of the resulting hardware. As the architecture of the synthesized RTL output depends not only on the functional description but, to the same degree, on additional directives to the compiler, HLS allows for rapid architecture exploration while keeping the functional code untouched. This architecture exploration is achieved by augmenting the functionally tested and correctly styled code with pragmas that instruct the compiler to apply specific design patterns, e.g. loop unrolling and pipelining on register and block level.
As an example, a highly regular but recursive algorithm, namely the cryptographic algorithm AES, is adapted from example code targeting a CPU to HLS-enabled code. The effect of multiple design (compiler) directives, i.e. pragmas, is demonstrated by showing their impact on resource usage and performance. During code adaptation and architecture exploration, we faced some problems with the tools. We share our approach to overcoming these issues on the way to an efficient and functionally correct accelerator. We also provide an overview of how the tools actually respond to specific input and how this translates to the real circuit, i.e. the placed and routed circuit implemented on the FPGA.
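To make the idea of pragma-driven architecture exploration more concrete, the following is a minimal sketch and not the code presented in the talk. It assumes pragmas in the style of Xilinx Vivado HLS (the abstract does not name the tool), and the function names, the block count and the test data are hypothetical. It shows the AddRoundKey step of AES written with fixed loop bounds and annotated with pipelining and unrolling directives, together with a small C test bench.

```c
#include <assert.h>
#include <stdint.h>

#define BLOCK_BYTES 16 /* AES state size in bytes */
#define NUM_BLOCKS 4   /* hypothetical batch size, chosen for illustration */

/* AddRoundKey: XOR the round key into every 16-byte state block.
 * Fixed loop bounds let an HLS tool determine the hardware structure
 * statically. The pragmas follow Vivado HLS syntax (an assumption);
 * a standard compiler such as gcc ignores pragmas it does not know. */
void add_round_key(uint8_t state[NUM_BLOCKS][BLOCK_BYTES],
                   const uint8_t round_key[BLOCK_BYTES])
{
    for (int b = 0; b < NUM_BLOCKS; b++) {
#pragma HLS PIPELINE II=1
        for (int i = 0; i < BLOCK_BYTES; i++) {
#pragma HLS UNROLL
            state[b][i] ^= round_key[i];
        }
    }
}

/* Test bench: returns the number of bytes that differ from the
 * expected result, so 0 means the functional check passed. */
int run_testbench(void)
{
    uint8_t key[BLOCK_BYTES];
    uint8_t state[NUM_BLOCKS][BLOCK_BYTES];
    int mismatches = 0;

    /* Arbitrary, made-up test data. */
    for (int i = 0; i < BLOCK_BYTES; i++) {
        key[i] = (uint8_t)(0xA5 ^ i);
    }
    for (int b = 0; b < NUM_BLOCKS; b++) {
        for (int i = 0; i < BLOCK_BYTES; i++) {
            state[b][i] = (uint8_t)(b * BLOCK_BYTES + i);
        }
    }

    add_round_key(state, key);

    /* Compare against the expected value computed independently. */
    for (int b = 0; b < NUM_BLOCKS; b++) {
        for (int i = 0; i < BLOCK_BYTES; i++) {
            uint8_t plain = (uint8_t)(b * BLOCK_BYTES + i);
            if (state[b][i] != (uint8_t)(plain ^ key[i])) {
                mismatches++;
            }
        }
    }

    /* XOR is its own inverse, so a second pass must restore the input. */
    add_round_key(state, key);
    for (int b = 0; b < NUM_BLOCKS; b++) {
        for (int i = 0; i < BLOCK_BYTES; i++) {
            assert(state[b][i] == (uint8_t)(b * BLOCK_BYTES + i));
        }
    }

    return mismatches;
}
```

Because a standard compiler skips the directives, the same source can be checked in software first. Changing or removing the pragmas then alters only the synthesized architecture, not the functional behavior verified by the test bench.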
--- Date: 28.02.2018 Time: 2:30 PM - 3:00 PM Location: Conference Counter NCC Ost