Hardware IP for High Performance AI Data Compression

Technical introduction:

1. Project overview (research background and project summary)

    This technology realizes efficient lossless and lossy compression/decompression for neural-network weights and intermediate-layer results (activations), implemented as hardware IP that can be integrated into AI computing chips, significantly reducing the on-chip cache requirements and off-chip memory-access bandwidth requirements of AI processors. The technology has been licensed to two industry-leading companies with a combined output value of over 10 billion yuan, and patent licensing fees exceed 1 million yuan.

2. Key technologies (technical advantages and features) 

- For on-chip decompression of weights, the IP achieves lossless decompression at a rate close to Huffman coding (the total compressed size is only 3% larger than Huffman's). DC synthesis results show an operating frequency of 1 GHz (pre-layout simulation, TSMC 28 nm).

- For on-chip compression/decompression of neural-network intermediate-layer results, the IP realizes a lossless hardware coding technology at a rate close to Huffman coding (the total compressed size is only 5% larger than Huffman's). Throughput exceeds 8 pixels/cycle per core, and the logic-gate overhead of the compression/decompression module is under 50K gates per core (DC synthesis results); the operating frequency reaches 1 GHz (pre-layout simulation, TSMC 28 nm).

l  On-chip compression/decompression for the intermediate layer results of the neural

    network, which implements lossy compression/decompression hardware coding technology, needs to reuse the MAC computing unit in the processor, for the published original version (on github) The commonly used networks such as Yolo-V3, VGG16, and Resnet have achieved a compression rate of 3x-5x, and the prediction accuracy of the network does not drop by more than 1%, and the network can be finetuned to further reduce. The entire compression and decompression unit is less than 100K gates; (DC synthesis results); the operating frequency meets 0.95GHz (pre-imitation, tsmc28).
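The lossless items above are benchmarked against the Huffman compression rate. As a software reference point only (the IP's actual hardware coding scheme is not described in this brief), the following sketch computes Huffman code lengths for a skewed weight-value histogram and the resulting compressed size, which is how such a baseline rate can be estimated:

```python
import heapq
from collections import Counter

def huffman_code_lengths(freqs):
    """Return {symbol: code length in bits} for a Huffman code over `freqs`.

    Reference-only: used to estimate the Huffman baseline the hardware
    IP is compared against, not the IP's own algorithm.
    """
    if len(freqs) == 1:
        return {next(iter(freqs)): 1}
    # Heap entries: (weight, tiebreak, member symbols of this subtree)
    heap = [(w, i, [s]) for i, (s, w) in enumerate(freqs.items())]
    heapq.heapify(heap)
    lengths = {s: 0 for s in freqs}
    tiebreak = len(heap)
    while len(heap) > 1:
        w1, _, syms1 = heapq.heappop(heap)
        w2, _, syms2 = heapq.heappop(heap)
        for s in syms1 + syms2:
            lengths[s] += 1  # each merge adds one bit to members' codes
        heapq.heappush(heap, (w1 + w2, tiebreak, syms1 + syms2))
        tiebreak += 1
    return lengths

def huffman_compressed_bits(symbols):
    """Total bits needed to Huffman-code the symbol sequence."""
    freqs = Counter(symbols)
    lengths = huffman_code_lengths(freqs)
    return sum(freqs[s] * lengths[s] for s in freqs)

# Hypothetical skewed 8-bit weight distribution (quantized weights are
# typically dominated by a few values, which is what Huffman exploits).
weights = [0] * 900 + [1] * 50 + [2] * 30 + [255] * 20
raw_bits = len(weights) * 8
packed_bits = huffman_compressed_bits(weights)
ratio = raw_bits / packed_bits  # baseline lossless compression ratio
```

The weight distribution here is invented for illustration; the real baseline depends on the statistics of the quantized network being deployed.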
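For the lossy case, the brief quotes a compression ratio and a bounded prediction-accuracy drop but does not disclose the coding scheme. A minimal sketch, assuming simple uniform quantization of an activation tensor (a generic stand-in, not the IP's method), shows how such a ratio and reconstruction-error bound can be reasoned about:

```python
import numpy as np

BITS = 4  # assumed symbol width for this illustration

def lossy_compress(fmap, bits=BITS):
    """Uniformly quantize a float activation map to `bits`-bit symbols."""
    lo, hi = fmap.min(), fmap.max()
    scale = (hi - lo) / (2**bits - 1) if hi > lo else np.float32(1.0)
    q = np.round((fmap - lo) / scale).astype(np.uint8)
    return q, lo, scale

def lossy_decompress(q, lo, scale):
    """Reconstruct the activation map from quantized symbols."""
    return q.astype(np.float32) * scale + lo

rng = np.random.default_rng(0)
fmap = rng.standard_normal((1, 16, 32, 32)).astype(np.float32)
q, lo, scale = lossy_compress(fmap)
rec = lossy_decompress(q, lo, scale)
# Ratio vs. fp32 storage; the brief's 3x-5x figure is measured on real
# networks at their native precision, so this number is only illustrative.
ratio = fmap.dtype.itemsize * 8 / BITS
```

Rounding to the nearest quantization level bounds the per-element reconstruction error by `scale / 2`, which is the kind of guarantee that keeps the end-to-end accuracy drop small before any fine-tuning.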

3. Application areas and market prospects

    The hardware IP of this project can be integrated into AI computing chips, significantly reducing the on-chip cache requirements and off-chip memory-bandwidth requirements of AI processors and enabling end-to-end AI processor systems. It can significantly reduce the transmission delay and increased power consumption caused by insufficient chip memory bandwidth.

4. Cooperation Model: Technology Transfer, Authorization, Licensing 

5. Contact us: hejing@nju.edu.cn

Fig. 1 Compression ratio and network prediction-accuracy reduction under lossy compression of intermediate-layer results (network not fine-tuned)