Luka Vukusic
Optimization of Reconfigurable Hardware Accelerators for Embedded Machine Learning
Abstract
This thesis investigates the acceleration of neural networks on field-programmable
gate arrays (FPGAs). Owing to their reconfigurability, FPGAs are a promising
platform for implementing neural network accelerators. The focus is on optimizing
STANN, a C++ template library for FPGA-based neural network implementations,
drawing on insights from state-of-the-art libraries such as hls4ml. The
optimization efforts center on dense layers, targeting both matrix multiplication
and the computation of activation functions.
Approximating activation functions with lookup tables yields a sixfold reduction
in their computation time. Furthermore, the matrix multiplication optimizations
reduce latency by one third for a network architecture with ten hidden layers of
128 neurons each. A comparative analysis with leading acceleration libraries
identifies further optimization opportunities for STANN, underscoring the
remaining potential of FPGA-based neural network acceleration.
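
To make the lookup-table approximation concrete, the following is a minimal, self-contained C++ sketch of the general technique, not STANN's actual code; the table size, input range, and the choice of tanh as the approximated function are illustrative assumptions.

#include <array>
#include <cmath>
#include <cstddef>
#include <iostream>

// Sketch of the lookup-table idea: the activation function is sampled once
// over a fixed input range; at inference time each activation reduces to an
// index computation and a table read, which is far cheaper than evaluating
// the function directly (on an FPGA, typically a single BRAM access).
constexpr std::size_t TABLE_SIZE = 1024;   // assumed table resolution
constexpr float RANGE_MIN = -4.0f;         // assumed clipping range
constexpr float RANGE_MAX = 4.0f;

std::array<float, TABLE_SIZE> build_tanh_table() {
    std::array<float, TABLE_SIZE> table{};
    const float step = (RANGE_MAX - RANGE_MIN) / (TABLE_SIZE - 1);
    for (std::size_t i = 0; i < TABLE_SIZE; ++i)
        table[i] = std::tanh(RANGE_MIN + i * step);
    return table;
}

float tanh_lut(float x, const std::array<float, TABLE_SIZE>& table) {
    // Saturate outside the sampled range, then map x to the nearest entry.
    if (x <= RANGE_MIN) return table.front();
    if (x >= RANGE_MAX) return table.back();
    const float pos = (x - RANGE_MIN) / (RANGE_MAX - RANGE_MIN) * (TABLE_SIZE - 1);
    return table[static_cast<std::size_t>(pos + 0.5f)];
}

int main() {
    const auto table = build_tanh_table();
    for (float x : {-2.0f, -0.5f, 0.0f, 0.5f, 2.0f})
        std::cout << x << ": lut=" << tanh_lut(x, table)
                  << " exact=" << std::tanh(x) << '\n';
}

The accuracy/resource trade-off is governed by TABLE_SIZE and the clipping range: a larger table lowers the approximation error at the cost of on-chip memory, which is the same trade-off an FPGA implementation must balance.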