Conferences and supporting programme
Tackling the Emerging Software and System Challenges in the New Multicore Automotive Era
As Moore’s law saturates and architectures evolve into multicore systems, the problem has shifted to mapping software so as to balance power consumption, latency, throughput, and cost. Hardware architecture innovations range from homogeneous RISC cores, to hybrids that add GPUs and embedded DSPs, to more radical designs combining one or more of RISC cores, GPUs, DSPs, and embedded FPGAs. System architects have traditionally decided by hand which software should run on which compute island, but as core counts grow into the hundreds, software optimization tools are better equipped to automate the task. This is not a simple load-balancing problem: the on-chip communication fabric, whether a bus, a network-on-chip, or a hybrid, can have a huge impact on overall system performance, and the presence of direct-write shared memory or the number of caches can profoundly affect the throughput of a software solution when memory access, rather than computation, drives performance.

In the object-recognition space, training in the cloud may imply “scalable” computation, but when these workloads move to the edge, resources are constrained and the same problems reappear. Approaches such as OpenVX help, yet the dominant practice today is still trial and error, which can take months to converge on an optimized mapping, if it converges at all. Having researched this problem in the heterogeneous multicore space for over ten years, we offer a performance-estimation-based simulation technology that treats it as a multi-parameter optimization problem. The key to evaluating multiple architectures is externalizing them through a modeling- and library-based approach, which lets us rapidly support, compare, and contrast a single software solution across a variety of architectures and find the best fit for the problem at hand.
Imagine determining whether a 2-core, 4-core, or 8-core chip (or a 100-core versus a 200-core one) best meets your performance needs, without over-designing the solution, while understanding how much headroom remains in the design to absorb near-future requirements. In our talk, we will map a specific deep-learning software problem onto two different architectures and show the value of early architecture evaluation and design trade-offs. We will show real examples from major partners: Tier One suppliers and OEMs in automotive, and 5G players in the wireless industry. Embedded DSPs and embedded FPGAs are viable alternatives to GPUs and worth exploring. When power is the dominating factor in architecture design, how software is mapped onto CPUs versus DSPs can have a huge impact. Finally, thread implementations such as POSIX threads (“pthreads”) carry inherent non-determinism that can lead to deadlocks.
--- Date: 01.03.2018 Time: 2:00 PM - 2:30 PM Location: Conference Counter NCC Ost