Skip to main content Scroll Top

This new Chinese re-configurable processor gives AI a performance boost

WHY THIS MATTERS IN BRIEF

A new Hong Kong reconfigurable chip runs deep-learning models several times faster than GPUs.

 

Matthew Griffin is the World’s #1 Futurist Keynote Speaker and Global Advisor for the G7 and Fortune 500, specialising in exponential disruption across 100 countries. Book a Keynote or Advisory SessionJoin 1M+ followers on YouTube and explore his 15-book Codex of the Future series.

 


 

Deep learning is a critical computing approach that is pushing the boundaries of technology – crunching immense amounts of data and uncovering subtle patterns that humans could never discern on their own. But for optimal performance, deep learning algorithms need to be supported with the right software compiler and hardware combinations. In particular, Re-configurable Computer Chips, which allow for flexible use of hardware resources for computing as needed, are key.

 

RELATED
3D Bio-Printing robot prints human tissue within the body to treat internal injuries

 

In a recent study, researchers in Hong Kong report a new reconfigurable processor, dubbed ReAAP, that outperforms several computing platforms commonly used to support Deep Neural Networks (DNNs), a useful form of deep learning that often involves large data sets with many computationally intensive layers of data. They describe it in a published in .

Whereas ordinary processors typically allow data to be processed using specific hardware pathways, reconfigurable processors offer a more adaptive option: reconfiguring the most efficient hardware resources to process the data as needed.

“Reconfigurable processors combine the advantages of software flexibility and hardware parallelism,” explains Jianwei Zheng, a postdoctoral researcher with the department of electronic and at the Hong Kong University of Science and Technology, who was involved in the study.

 

RELATED
Google reinvents hyperscale storage

 

These advantages prompted his team to create ReAAP, which is an integrated software-hardware system. Its software compiler is responsible for evaluating and optimising diverse deep-learning workloads. Once it determines the best solution for processing data in parallel, it sends instructions to reconfigure the hardware coprocessor, which allocates appropriate hardware resources to do the parallel computing.

“As an end-to-end system, ReAAP can be deployed to accelerate various deep-learning applications just by customising a script in [the] software for each application,” explains Zheng.

In their study, the researchers compared their proposed software compiler in ReAAP to three other baseline software compilers on a Nvidia GPU and an ARM CPU. The results show that it performs 1.9 to 5.7 times as fast as the next best software complier running on the GPU and 1.6 to 3.3 times as fast as the same software compiler running on the CPU.

Additionally, Zheng notes that ReAAP achieves a consistently high utilisation of hardware resources for a wide range of diverse compute-intensive layers.

 

RELATED
Nvidia reaches $20 Billion licensing deal with AI chip maker Groq

 

While ReAAP is good at handling DNNs with typical data-intensive workloads, it is currently not well suited to support DNNs when data is sparse. Zheng says his team hopes to address this issue in the future.

What’s more, the researchers hope to build upon ReAAP so that it can better process quantised data – data that’s processed in a way that significantly reduces the memory requirement and computational cost of neural networks.

“After the extension [of ReAAP to better handle quantised data] is completed and evaluated, we will consider commercialising it together with a couple of other acceleration solutions for AI computing,” says Zheng, noting that this will make ReAAP more efficient on resource-constrained platforms such as various Internet of Things (IoT) devices.

 


 

What makes a reconfigurable processor faster for AI?
Instead of forcing data down fixed hardware paths, ReAAP's compiler works out the most efficient layout for each deep-learning workload and reshapes the chip's hardware to match – combining software flexibility with hardware parallelism to keep far more of the chip busy.

Related Posts

Leave a comment

Pin It on Pinterest

Share This