Boosting Deep Neural Network Efficiency with Dual-Module Inference

ICML 2020 [Accepted]

  • Develop a lightweight auxiliary “little” module, built with random projection and weight quantization, that probes the layerwise output sparsity of a neural network (NN) to accelerate NN inference;
  • The proposed scheme can be easily applied to various types of neural networks, such as CNNs and LSTMs (e.g., on ResNet-18, it outperforms state-of-the-art solutions in FLOPs reduction, memory saving, and model accuracy);
  • The proposed scheme can also be applied to various tasks, such as object detection (SSD: Single Shot MultiBox Detector).
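
To illustrate the idea in the first bullet, here is a minimal sketch of dual-module inference for a single fully connected ReLU layer. It is not the paper's implementation, and every function name, the sign-based projection, the 4-bit quantizer, and the threshold are assumptions made for illustration. In particular, the little module here is derived directly from the big weights, whereas the real one may be obtained differently (for example, by training). The little module estimates each pre-activation cheaply, and the big module computes exact values only for outputs predicted to be nonzero.

```python
import random


def relu(v):
    return v if v > 0.0 else 0.0


def matvec(mat, vec):
    # Plain matrix-vector product on nested lists.
    return [sum(w * x for w, x in zip(row, vec)) for row in mat]


def make_projection(k, d, rng):
    # Random sign projection R (k x d), scaled so that E[R^T R] = I.
    scale = 1.0 / (k ** 0.5)
    return [[rng.choice((-scale, scale)) for _ in range(d)] for _ in range(k)]


def quantize(mat, bits=4):
    # Symmetric uniform quantization; returns dequantized floats for clarity.
    levels = 2 ** (bits - 1) - 1
    max_abs = max(abs(w) for row in mat for w in row) or 1.0
    step = max_abs / levels
    return [[round(w / step) * step for w in row] for row in mat]


def build_little(big_w, proj, bits=4):
    # Assumed construction: W_little = quantize(W R^T), shape (n_out, k).
    return quantize([matvec(proj, row) for row in big_w], bits)


def dual_module_layer(big_w, little_w, proj, x, threshold=0.0):
    # Little module: cheap estimate of W x from the projected input R x.
    approx = matvec(little_w, matvec(proj, x))
    out, computed = [], 0
    for row, estimate in zip(big_w, approx):
        if estimate > threshold:
            # Predicted active: run the exact big-module dot product.
            out.append(relu(sum(w * xi for w, xi in zip(row, x))))
            computed += 1
        else:
            # Predicted zero after ReLU: skip the big-module computation.
            out.append(0.0)
    return out, computed


if __name__ == "__main__":
    rng = random.Random(0)
    d, k, n_out = 64, 16, 10
    big_w = [[rng.uniform(-1, 1) for _ in range(d)] for _ in range(n_out)]
    x = [rng.uniform(-1, 1) for _ in range(d)]
    proj = make_projection(k, d, rng)
    little_w = build_little(big_w, proj)
    out, computed = dual_module_layer(big_w, little_w, proj, x)
    print(f"exact rows computed: {computed} of {n_out}")
```

The threshold trades accuracy for savings: lowering it sends more outputs to the big module, and a threshold of negative infinity reproduces the exact layer output.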