Deep Learning Inference on the MPPA3 Manycore Processor
We present how deep learning acceleration is performed on the Kalray MPPA3 manycore processor. Each core is tightly coupled with a coprocessor that leverages the core's load-store unit to transfer data at a rate of 32 bytes per memory access. The coprocessor uses these data blocks as either the left or right operand of a matrix multiply-accumulate unit, formatting them uniformly with a fixed number of rows and a variable number of columns that depends on the element size. The KaNN code generator takes a standard trained neural network as input. The generated code executes one layer at a time, in topological sort order of the network. For each layer, execution is distributed across the compute clusters allocated to the inference. Network parameters are read from DDR memory and multicast through the processor's RDMA NoC, while the input and output activations of the current layer are distributed across the local memories of the compute clusters.
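The layer-at-a-time schedule described above can be sketched as a topological sort of the network's layer graph. The following is a minimal illustration using Kahn's algorithm on a hypothetical four-layer network with a residual connection; the layer names and dependency structure are invented for the example, and KaNN's actual scheduler is not described beyond this ordering requirement.

```python
from collections import deque

def topological_order(layers, deps):
    """Kahn's algorithm: order layers so each runs only after its inputs.

    `layers` is a list of layer names; `deps` maps a layer to the list of
    layers that produce its input activations. (Illustrative sketch only.)
    """
    indegree = {l: len(deps.get(l, [])) for l in layers}
    successors = {l: [] for l in layers}
    for layer, preds in deps.items():
        for p in preds:
            successors[p].append(layer)
    ready = deque(l for l in layers if indegree[l] == 0)
    order = []
    while ready:
        layer = ready.popleft()
        order.append(layer)
        for s in successors[layer]:
            indegree[s] -= 1
            if indegree[s] == 0:
                ready.append(s)
    return order

# Hypothetical network: conv1 -> conv2 -> conv3, with a residual
# connection so "add" consumes both conv1 and conv3.
layers = ["conv1", "conv2", "conv3", "add"]
deps = {"conv2": ["conv1"], "conv3": ["conv2"], "add": ["conv1", "conv3"]}
print(topological_order(layers, deps))
```

In this sketch, each entry in the resulting order would correspond to one layer whose computation is split across the compute clusters before the next layer begins.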
--- Date: 26.02.2020 Time: 10:30 - 11:00 Location: Conference Counter NCC Ost