Partitioning of Algorithms for Distributed Computation
When designing embedded systems, it is vital to determine the required computing power in advance. With today's microcontrollers, signal processors and programmable logic, the main problem is usually not obtaining sufficient processing capability, but choosing the appropriate computing platform. The majority of embedded systems do not operate in an isolated, stand-alone environment, but rather communicate and interact with others. This holds true at system scale, where for example multiple IoT sensors cooperate, as well as at intra-device scale, where a general-purpose controller teams up with digital signal processors (DSPs) for wireless communication and data acquisition. This paper describes a method to analyse the data flow and processing needs of arbitrary algorithms. Such an analysis is one of the initial steps required to distribute the various processing tasks among the components of a system. To this end, an algorithm described in a high-level language (such as C/C++) is transformed into a generalized intermediate representation (LLVM IR) using a standard compilation flow (clang). Based on the intermediate representation, the algorithm can be analysed dynamically, by executing an instrumented binary, in addition to a purely static analysis. By combining the results of the dynamic and static analyses, both the data-flow and the control-flow graph can be extracted. Furthermore, the paper describes the challenges that arise when extracting the computational effort and the amount of data from the analysis results. Examples of such challenges are: dealing with the static single assignment form of the intermediate language in the presence of loop constructs (phi nodes); handling code sequences that do not contribute to the algorithm itself, such as subroutine calls with the required parameter passing; and decomposing accesses to aggregate structures (arrays).
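A minimal example (illustrative only, not taken from the paper) shows two of these challenges at once. Compiled with, say, `clang -O1 -S -emit-llvm`, the loop below yields IR in which the counter `i` and the accumulator `s` become phi nodes, each merging its initial value from the loop preheader with the value produced by the previous iteration, while the access `a[i]` is lowered to a `getelementptr` address computation:

```c
#include <assert.h>

/* Toy analysis input (illustrative, not the paper's benchmark).
 * Compiled with e.g.  clang -O1 -S -emit-llvm sum.c  the loop
 * lowers to an IR block where i and s are phi nodes merging their
 * entry values with the values of the previous iteration, and
 * a[i] becomes a getelementptr address computation.  An analysis
 * extracting effort and data volume must fold the phi nodes back
 * into single loop-carried variables and decompose the aggregate
 * access, rather than counting them as extra data-flow edges. */
int sum(const int *a, int n) {
    int s = 0;
    for (int i = 0; i < n; ++i)
        s += a[i];
    return s;
}
```

The actual computational content here is one addition, one comparison and one memory read per iteration; everything else in the IR is bookkeeping the analysis has to recognize and discount.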
With these techniques applied, the impact of the compiler's optimization level can be minimized; the goal is to achieve similar results whether or not the compiler performs aggressive function inlining. Finally, a proof-of-concept tool has been implemented that automatically processes a C++ application, identifies the function to be analysed, runs the static and dynamic analyses, and generates a combined data- and control-flow graph. From this graph, a reasonable metric of the computational effort, as well as the data to be processed, can be calculated for each part of the algorithm. This provides the quantitative basis for formulating a suitable system partitioning. Such a partitioning might be at macroscopic scale, e.g. splitting computation between a web server and a (resource-limited) embedded client with a constrained data link in between, or at microscopic scale, e.g. a general-purpose CPU paired with a DSP.
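As a hypothetical sketch of how such quantitative data could feed a partitioning decision (the node names, costs and scoring are assumptions, not the paper's implementation): once each graph node carries an effort estimate and each edge the amount of data exchanged, a candidate two-way split, say server vs. embedded client, can be scored by the effort placed on each side and by the data crossing the cut, which is exactly what the constrained link would have to carry.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical partitioning metric on an annotated data-flow
 * graph (assumed structure, not the paper's tool).  Nodes carry
 * an effort estimate, edges the bytes exchanged; side[v] assigns
 * node v to side 0 (e.g. server) or side 1 (e.g. client). */
enum { N_NODES = 4 };

typedef struct {
    int  src, dst;   /* node indices of the data-flow edge */
    long bytes;      /* amount of data moved along the edge */
} Edge;

/* Data that must cross the constrained link for this split. */
long cut_bytes(const Edge *edges, size_t n_edges, const int *side) {
    long total = 0;
    for (size_t e = 0; e < n_edges; ++e)
        if (side[edges[e].src] != side[edges[e].dst])
            total += edges[e].bytes;
    return total;
}

/* Computational effort placed on one side of the partition. */
long side_effort(const long *effort, const int *side, int which) {
    long total = 0;
    for (int v = 0; v < N_NODES; ++v)
        if (side[v] == which)
            total += effort[v];
    return total;
}
```

Evaluating these two quantities over candidate splits makes the trade-off explicit: a cut through a low-volume edge keeps the link traffic small even if it leaves most of the effort on one side.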
--- Date: 01.03.2018 Time: 12:00 PM - 12:30 PM Location: Conference Counter NCC Ost