From Matlab To FPGA in Manageable Steps, a True Story in Double Precision
The growing computational power of our machines seems only to increase our hunger for even more teraflops, yet at the same time we strive for low power consumption and flexibility. Desktop CPUs have seen relatively little improvement over the past decade, unlike the highly capable hybrid systems that combine CPU, GPU and FPGA architectures, such as Xilinx's Zynq MPSoC. FPGA-based cloud computing provides computational power by the hour to those who need it. While FPGA devices offer a unique balance of flexibility and efficiency, programming them has usually been restricted to the handful of specialists with the necessary knowledge and skills. This has been the major limiting factor in the broad adoption of these systems, and hybrid CPU/FPGA systems only appear to raise the bar further by also requiring the engineer to cope with the complexity of coupling the subsystems together.

In this presentation I will show the complete flow from a MATLAB model to its implementation on a hybrid CPU/FPGA system, a Xilinx Zynq. All that is required is a general understanding of what an FPGA is and how it can be used to implement mathematical algorithms; no VHDL or Verilog experience is needed. The MATLAB function in question is the wavelet transform, often used in signal compression and pattern recognition. The implementation uses double-precision floating-point math, usually frowned upon by FPGA engineers, but we will see that this poses no problem for the hardware.

We ported the implementation to plain C++ (or C) code and wrote test code to verify it. This test code is used throughout the project to detect any regression. We ran the code on the target CPU platform to get a baseline benchmark. The next step was to optimize the code for a more efficient implementation, then test and benchmark it again on a desktop PC. We then added an FPGA card to this desktop machine to make it mimic the target.
We used Dyplo to produce a bitstream for that FPGA card in a few clicks; Dyplo also adds high-speed data transfer capability between the desktop CPU and the FPGA. We then passed the algorithm's C++ code to Dyplo for implementation on the FPGA, and using the existing test code we could verify its behaviour and performance. In a few iterations, the algorithm ran at the required speed and within the resource limits. The exact same software and tools generate the final software and FPGA firmware for the Zynq target. In a few days' work, we produced an implementation of the wavelet transform on a hybrid CPU/FPGA platform that outperforms the CPU-only implementation in both speed and power efficiency. Throughout the design we used test-driven software development and arrived at the set goals at a sustainable pace.
---
Date: 28.02.2018
Time: 1:30 PM - 2:00 PM
Location: Conference Counter NCC Ost