Hammered by GPU sanctions, Chinese firms cut AI inference costs by 90%

WHY THIS MATTERS IN BRIEF

Unable to get access to the best GPUs to train their AI models, Chinese companies are finding new, cheaper, and more innovative ways to train them, and many of the resulting models are now as good as their American counterparts.

 

As Chinese companies continue to face export bans on the GPUs and AI accelerator chips that are pivotal to training their latest Artificial Intelligence (AI) models, they are not only coming up with innovative ways of training those models, for example by using distributed GPU clusters, but are also fundamentally changing their training methods to stay competitive with the West.

 


One example of this is the way Chinese AI companies are driving down the cost of building competitive models as they contend with US chip restrictions and smaller budgets than their Western counterparts. Start-ups such as 01.ai and DeepSeek have reduced prices by adopting strategies such as training models on smaller data sets and hiring cheap but skilled computer engineers. Bigger technology groups such as Alibaba, Baidu and ByteDance have also engaged in a price war that has cut “inference” costs, the price of calling on a large language model to generate a response, by more than 90 per cent, to a fraction of what US counterparts charge, which is huge.

 

The Future of AI, a keynote by Matthew Griffin

 

This is despite Chinese companies having to navigate Washington’s ban on exports of the highest-end Nvidia AI chips, which are seen as crucial to developing the most cutting-edge models in the US. Beijing-based 01.ai, led by Lee Kai-Fu, the former head of Google China, said it has cut inference costs by building a model trained on smaller amounts of data, which requires less computing power, and by optimising its hardware.

“China’s strength is to make really affordable inference engines and then to let applications proliferate,” Lee told reporters. This week, 01.ai’s Yi-Lightning model came joint third among LLM companies, alongside xAI’s Grok-2 but behind OpenAI and Google, in a ranking released by researchers at UC Berkeley SkyLab and LMSYS. The evaluations are based on users who score different models’ answers to queries. Other Chinese players, including ByteDance, Alibaba and DeepSeek, have also crept up the LLM rankings. Inference on 01.ai’s Yi-Lightning costs 14 cents per million tokens, compared with 26 cents for OpenAI’s smaller GPT o1-mini model, while inference on OpenAI’s much larger GPT-4o costs $4.40 per million tokens.
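To put those prices in perspective, here is a back-of-the-envelope calculation in Python. The per-million-token prices are the ones quoted above; the traffic figures for the app are invented purely for illustration.

```python
# Back-of-the-envelope serving costs at the per-million-token prices quoted
# above. The traffic numbers below are hypothetical, purely for illustration.

PRICE_PER_M_TOKENS_USD = {
    "Yi-Lightning": 0.14,  # 01.ai, as quoted above
    "GPT o1-mini": 0.26,   # OpenAI's smaller model, as quoted above
    "GPT-4o": 4.40,        # OpenAI's larger model, as quoted above
}

def monthly_cost_usd(model: str, tokens_per_query: int, queries_per_day: int) -> float:
    """Rough monthly inference bill for one model at the quoted rate."""
    tokens_per_month = tokens_per_query * queries_per_day * 30
    return tokens_per_month / 1_000_000 * PRICE_PER_M_TOKENS_USD[model]

# Hypothetical app: 100,000 queries a day averaging 1,000 tokens each.
for model in PRICE_PER_M_TOKENS_USD:
    print(f"{model}: ${monthly_cost_usd(model, 1_000, 100_000):,.0f} per month")
```

On those numbers the same hypothetical workload costs about $420 a month on Yi-Lightning versus roughly $13,200 on GPT-4o, which is the order-of-magnitude gap being described.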

 


The number of tokens used to generate a response depends on the complexity of the query. Lee also said Yi-Lightning cost $3 million to “pre-train,” the initial model training that can then be fine-tuned or customised for different use cases.

This is a small fraction of the cost cited by the likes of OpenAI, whose large models now cost more than $200 million each to train. He added that the aim is not to have the “best model,” but a competitive one that is “five to 10 times less expensive” for developers to use to build applications. Many Chinese AI groups, including 01.ai, DeepSeek, MiniMax and Stepfun, have adopted a so-called Mixture-of-Experts approach that I’ve talked about before and which was used to train GPT-4, a strategy first popularised by US researchers.

Rather than training one “dense” model in a single run on a vast database of data scraped from the internet and other sources, the approach combines many smaller neural networks, often of multiple billion parameters each, trained on industry-specific data. Researchers view the mixture-of-experts approach as a key way to achieve the same level of intelligence as a dense model with less computing power. But the approach can be more prone to failure, as engineers have to orchestrate the training process across multiple “experts” rather than within one model.
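To make the idea concrete, here is a minimal mixture-of-experts routing sketch in Python. The dimensions, the top-2 gating rule and the random weights are simplifying assumptions for illustration; this shows the routing mechanism, not any lab’s actual training setup.

```python
# A minimal mixture-of-experts routing sketch. Dimensions, the top-2 gate,
# and random weights are illustrative assumptions, not any company's design.
import numpy as np

rng = np.random.default_rng(0)
d_model, n_experts, top_k = 16, 4, 2

# Each "expert" here is a single weight matrix; in a production LLM each is
# a multi-billion-parameter feed-forward block trained on its own data mix.
experts = [rng.standard_normal((d_model, d_model)) * 0.1 for _ in range(n_experts)]
router = rng.standard_normal((d_model, n_experts)) * 0.1  # learned gating weights

def moe_forward(x: np.ndarray) -> np.ndarray:
    """Send a token vector to its top-k experts and blend their outputs."""
    logits = x @ router                    # one routing score per expert
    top = np.argsort(logits)[-top_k:]      # pick the k best-scoring experts
    weights = np.exp(logits[top])
    weights /= weights.sum()               # softmax over the chosen experts
    # Only top_k of the n_experts actually run, which is where the compute
    # saving over an equally large dense model comes from.
    return sum(w * (x @ experts[i]) for w, i in zip(weights, top))

token = rng.standard_normal(d_model)
print(moe_forward(token).shape)            # -> (16,)
```

Because only two of the four experts run for each token, the compute per token is roughly half that of a comparable dense layer; scaled up to real models, that is both the saving and the orchestration burden described above.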

 


Given the difficulty of securing a steady and ample supply of high-end AI chips, Chinese AI players have spent the past year competing to develop the highest-quality data sets to train these “experts” and so set themselves apart from the competition. Lee said 01.ai collects data in ways that go beyond the traditional method of scraping the internet, including scanning books and crawling articles on the messaging app WeChat that are inaccessible on the open web.

“There is a lot of thankless grunt work” for engineers who label and rank the data, he said, but added that China, with its vast pool of cheap engineering talent, is better placed to do it than the US.

“China’s strength is not doing the best breakthrough research that no one has done before where the budget has no limit,” said Lee. “China’s strength is to build well, build fast, build reliably, and build cheap.”
