Operator Partitioning and Parallel Scheduling Optimization for Deep Learning Compiler

Author	Zhiyu Li, Xiang Zhou, Wenbin Weng
Abstract	TVM (Tensor Virtual Machine) is a deep learning compiler that converts machine learning models into TVM IR (intermediate representation) and optimises them to generate high-performance machine code for various hardware platforms. The traditional approach parallelises operators through loop transformations; in this paper we instead partition operator implementations in TVM and schedule the partitions in parallel to obtain faster-running operators. We design an optimisation algorithm for partitioning and parallel scheduling in TVM: operators such as two-dimensional convolution are partitioned into multiple smaller implementations, several partitioned operators are scheduled to run in parallel, and the best partitioning and parallel scheduling decisions are selected by means of performance estimation. To evaluate the algorithm, multiple instances of the two-dimensional convolution, average pooling, maximum pooling, and ReLU activation operators with different input sizes were tested on a CPU platform; the experiments show that the running time of these operators is reduced.
Year of Publication	2022
Conference Name	2022 IEEE 5th International Conference on Automation, Electronics and Electrical Engineering (AUTEEE)
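The idea summarised in the abstract (split an operator into smaller pieces, run the pieces in parallel, and pick the split by estimated performance) can be illustrated with a minimal sketch. This is not the paper's algorithm and does not use TVM: it is plain Python, the function names (`conv2d_rows`, `split_rows`, `partitioned_conv2d`, `choose_partition`) are hypothetical, the partitioning is assumed to be along output rows, and a wall-clock measurement stands in for the paper's performance estimation. Because of Python's global interpreter lock, the threads here show the structure only and are not expected to give a real speedup.

```python
from concurrent.futures import ThreadPoolExecutor
import time


def conv2d_rows(image, kernel, row_start, row_end):
    """Stride-1, no-padding 2D convolution (cross-correlation form)
    restricted to output rows [row_start, row_end)."""
    kh, kw = len(kernel), len(kernel[0])
    out_w = len(image[0]) - kw + 1
    out = []
    for i in range(row_start, row_end):
        row = []
        for j in range(out_w):
            acc = 0
            for di in range(kh):
                for dj in range(kw):
                    acc += image[i + di][j + dj] * kernel[di][dj]
            row.append(acc)
        out.append(row)
    return out


def split_rows(out_h, parts):
    """Split range(out_h) into at most `parts` contiguous, nearly equal chunks."""
    base, extra = divmod(out_h, parts)
    bounds, start = [], 0
    for p in range(parts):
        end = start + base + (1 if p < extra else 0)
        if end > start:  # skip empty chunks when parts > out_h
            bounds.append((start, end))
        start = end
    return bounds


def partitioned_conv2d(image, kernel, parts):
    """Run one smaller convolution per chunk in parallel, then stitch the rows."""
    out_h = len(image) - len(kernel) + 1  # assumed to be at least 1
    bounds = split_rows(out_h, parts)
    with ThreadPoolExecutor(max_workers=len(bounds)) as pool:
        futures = [pool.submit(conv2d_rows, image, kernel, s, e) for s, e in bounds]
        result = []
        for f in futures:  # iterate in submission order to keep row order
            result.extend(f.result())
    return result


def choose_partition(image, kernel, candidates=(1, 2, 4)):
    """Pick the candidate partition count with the lowest measured time.
    Measurement is an assumption standing in for a performance estimator."""
    timings = {}
    for parts in candidates:
        t0 = time.perf_counter()
        partitioned_conv2d(image, kernel, parts)
        timings[parts] = time.perf_counter() - t0
    return min(timings, key=timings.get), timings
```

Calling `choose_partition(image, kernel)` on a nested-list image and kernel returns the fastest candidate partition count together with the measured timings. Splitting along output rows keeps each piece independent, since adjacent pieces only share read-only input rows.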