Bloomberg used 1.3 million GPU hours and 600 billion tokens to train BloombergGPT

By Matthew Griffin Intelligence and the Senses 26th May 2023

WHY THIS MATTERS IN BRIEF

Creating even “small” AI models takes extraordinary amounts of computing power, but as computers get cheaper and more powerful, and as AI learning algorithms get more efficient, this “effort” will drop.

Financial powerhouse Bloomberg has announced that it’s trying to prove that there are smarter ways to fine-tune Artificial Intelligence (AI) applications without the ethical or security concerns plaguing the likes of ChatGPT.

Bloomberg recently released BloombergGPT, a homegrown large language model with 50 billion parameters that is targeted at financial applications. The model is built on a knowledge base that Bloomberg has collected over the last 15 years and provides to customers as part of its Terminal product.

The model does not have the scope of ChatGPT, which is based on the 175-billion parameter GPT-3. But smaller models are the way to go when it comes to domain-specific applications like finance, researchers from Bloomberg and Johns Hopkins argued in an academic paper.

BloombergGPT also borrows the chatbot functionality from ChatGPT, and offers more accuracy than comparable models with more parameters, researchers said.

“General models cover many domains, are able to perform at a high level across a wide variety of tasks, and obviate the need for specialization during training time. However, results from existing domain-specific models show that general models cannot replace them,” wrote the researchers.

Other IT executives have also argued in favor of smaller models with a few billion parameters, specifically for scientific applications. Smaller models increase the accuracy of results, and can be trained significantly faster than one-size-fits-all models like GPT-3. Smaller models also require fewer computing resources.

Bloomberg assigned close to 1.3 million GPU hours of training time for BloombergGPT on Nvidia’s A100 GPUs in Amazon’s AWS cloud. The training ran on 64 GPU nodes, each with eight Nvidia A100 GPUs (40GB variants), for 512 GPUs in total.
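To put the GPU-hour figure in perspective, here is a rough back-of-the-envelope sketch. It assumes, purely for illustration, that every GPU runs for the whole budget, which makes the result an upper bound on wall-clock time rather than the actual training duration.

```python
# Figures quoted in the article.
GPU_HOURS = 1_300_000   # total A100 GPU hours ("close to 1.3 million")
MACHINES = 64           # eight-GPU machines
GPUS_PER_MACHINE = 8    # A100 40GB GPUs in each machine

total_gpus = MACHINES * GPUS_PER_MACHINE

# Assumption (not from the article): all GPUs are busy for the entire budget.
wall_clock_hours = GPU_HOURS / total_gpus
wall_clock_days = wall_clock_hours / 24

print(f"GPUs in total:    {total_gpus}")
print(f"Wall-clock hours: {wall_clock_hours:,.0f}")
print(f"Wall-clock days:  {wall_clock_days:.0f}")
```

Under that assumption the budget works out to roughly three and a half months of continuous running on all 512 GPUs.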

Within each eight-GPU node, the GPUs were linked by Nvidia’s proprietary NVSwitch interconnect, which has transfer speeds of 600GBps (gigabytes per second). Between nodes, Nvidia’s GPUDirect ran over AWS Elastic Fabric Adapter, which has transfer speeds of 400Gbps (gigabits per second).
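The two speeds are quoted in different units (gigabytes versus gigabits), so they are easy to misread. A small conversion sketch, using only the two figures above, shows how far apart they really are. The helper name is made up for this example.

```python
def gigabits_to_gigabytes(gbit_per_s: float) -> float:
    """Convert gigabits per second to gigabytes per second (8 bits per byte)."""
    return gbit_per_s / 8


NVSWITCH_GBYTES_PER_S = 600  # "600GBps" in the article (gigabytes)
EFA_GBITS_PER_S = 400        # "400Gbps" in the article (gigabits)

efa_gbytes_per_s = gigabits_to_gigabytes(EFA_GBITS_PER_S)
ratio = NVSWITCH_GBYTES_PER_S / efa_gbytes_per_s

print(f"Elastic Fabric Adapter: {efa_gbytes_per_s:.0f} GB/s")
print(f"NVSwitch is {ratio:.0f}x faster")
```

Taking the quoted figures at face value, the NVSwitch link is about twelve times faster than the Elastic Fabric Adapter link.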

Bloomberg used Amazon’s FSx for Lustre service for high-speed access to files. It is built on Lustre, a file system widely used in high-performance computing, and supported up to 1000 MBps of read and write throughput per TiB of storage.

BloombergGPT is one example of a company using Amazon’s cloud service to train large language models. ChatGPT runs on Nvidia’s GPUs in Microsoft’s Azure service, and Google this week published a paper on its large language models running on supercomputers with 4,096 TPUs (Tensor Processing Units).

The memory on the distributed GPUs was not enough to hold the model’s training state as it stood, so Bloomberg made optimizations to train the model. One optimization involved sharding the model’s training state across 128 GPUs, with four such copies of the model training in parallel (4 × 128 = 512 GPUs). Another was mixed-precision training: the forward and backward passes ran in BF16, which reduced the memory requirements in training, while parameters were stored and updated in FP32.
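A rough sketch shows why a single 40GB GPU cannot hold the model, and how spreading the work across 128 GPUs helps. It rests on labeled assumptions: an Adam-style optimizer that keeps two extra FP32 values per parameter, and no accounting for gradients or activations. The real footprint would be larger.

```python
PARAMS = 50e9        # 50 billion parameters, as quoted in the article
GPU_MEMORY_GB = 40   # A100 40GB variant
SHARD_GROUP = 128    # GPUs that one copy of the model is spread across
BYTES_FP32 = 4

# FP32 parameters alone, in gigabytes (1 GB = 1e9 bytes).
params_gb = PARAMS * BYTES_FP32 / 1e9

# Assumption: Adam-style optimizer with two extra FP32 values per parameter.
# Gradients and activations are deliberately left out of this estimate.
state_gb = PARAMS * BYTES_FP32 * 3 / 1e9

per_gpu_gb = state_gb / SHARD_GROUP

print(f"FP32 parameters:         {params_gb:.0f} GB (one GPU has {GPU_MEMORY_GB} GB)")
print(f"Params + optimizer:      {state_gb:.0f} GB")
print(f"Per GPU across {SHARD_GROUP} GPUs: {per_gpu_gb:.2f} GB")
```

Even under these generous assumptions the parameters alone are five times the memory of one GPU, while the share per GPU drops to a few gigabytes once the state is spread out.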

“After experimenting with various techniques, we achieve 102 TFLOPs on average and each training step takes 32.5 seconds,” researchers wrote.
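Those two numbers can be combined into a rough picture of the whole cluster. The sketch below assumes the 102 TFLOPs figure is per GPU and that all 512 GPUs (64 × 8) sustain it. It compares against 312 TFLOPs, Nvidia's published dense BF16 peak for the A100.

```python
TFLOPS_PER_GPU = 102         # quoted average; assumed here to be per GPU
STEP_SECONDS = 32.5          # quoted time per training step
TOTAL_GPUS = 64 * 8          # 512 GPUs, from the figures in the article
A100_PEAK_BF16_TFLOPS = 312  # Nvidia's published dense BF16 peak for the A100

# Aggregate throughput of the cluster, in petaFLOPs.
aggregate_pflops = TFLOPS_PER_GPU * TOTAL_GPUS / 1000

# Floating-point operations carried out in one training step.
flops_per_step = TFLOPS_PER_GPU * 1e12 * TOTAL_GPUS * STEP_SECONDS

# Fraction of the hardware's theoretical peak that was actually achieved.
utilization = TFLOPS_PER_GPU / A100_PEAK_BF16_TFLOPS

print(f"Aggregate throughput: {aggregate_pflops:.1f} PFLOPs")
print(f"Work per step:        {flops_per_step:.2e} FLOPs")
print(f"Share of A100 peak:   {utilization:.0%}")
```

On these assumptions the cluster delivers about 52 PFLOPs in aggregate, or roughly a third of the hardware's theoretical peak.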

Bloomberg drew about 51% of the data set, or 363 billion tokens, from its internal database of financial documents dating back to 2007, which includes news pieces, public filings, press releases, and Bloomberg’s own content. Preparing that data involved stripping out formatting and templates before it was fed into the training system. The remaining 345 billion tokens were sourced from publicly available data sets, including Wikipedia. Both figures count tokens, the sub-word units a language model reads, rather than whole documents.
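The split between internal and public data follows directly from the two counts quoted above. A quick check, using only those two numbers:

```python
INTERNAL_BILLIONS = 363  # from Bloomberg's internal database
PUBLIC_BILLIONS = 345    # from publicly available sources

total_billions = INTERNAL_BILLIONS + PUBLIC_BILLIONS
internal_share = INTERNAL_BILLIONS / total_billions
public_share = PUBLIC_BILLIONS / total_billions

print(f"Total:    {total_billions} billion")
print(f"Internal: {internal_share:.1%}")
print(f"Public:   {public_share:.1%}")
```

On these figures the internal data makes up a little over half of the combined data set.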

The researchers wanted every training sequence to be exactly 2,048 tokens long to maintain the highest levels of GPU utilization.
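One common way to get sequences of identical length is to treat the tokenized data as one long stream and chop it into fixed-size chunks, so no GPU time is spent on padding. The sketch below illustrates that idea under stated assumptions. The function name and the toy token stream are hypothetical, and this is not necessarily how Bloomberg's pipeline works.

```python
from typing import Iterable, List

SEQ_LEN = 2048  # sequence length quoted in the article


def pack_sequences(token_ids: Iterable[int], seq_len: int = SEQ_LEN) -> List[List[int]]:
    """Chop one long token stream into full sequences of seq_len tokens.

    A trailing remainder shorter than seq_len is dropped, so every
    returned sequence has exactly the same length.
    """
    tokens = list(token_ids)
    n_full = len(tokens) // seq_len
    return [tokens[i * seq_len:(i + 1) * seq_len] for i in range(n_full)]


# Hypothetical stream of 5,000 token ids standing in for tokenized documents.
stream = range(5000)
sequences = pack_sequences(stream)

print(f"{len(sequences)} sequences of {len(sequences[0])} tokens each")
```

Uniform shapes mean every batch does the same amount of work, which is one reason fixed-length sequences help keep GPUs busy.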

“Since we are data limited, we choose the largest model that we can, while ensuring that we can train on all our tokens and still leave ~30% of the total compute budget as a buffer for unforeseen failures, retries, and restarts,” the researchers wrote.
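The effect of that buffer on the budget is simple arithmetic. The sketch below applies the roughly 30% reserve to the roughly 1.3 million GPU hours quoted earlier. Both inputs are approximate, so the outputs are ballpark figures only, and the helper name is made up for this example.

```python
from typing import Tuple


def split_budget(total_gpu_hours: float, buffer_fraction: float) -> Tuple[float, float]:
    """Return (planned, reserve) GPU hours for a given buffer fraction."""
    reserve = total_gpu_hours * buffer_fraction
    planned = total_gpu_hours - reserve
    return planned, reserve


# Approximate figures from the article: ~1.3 million GPU hours, ~30% buffer.
planned, reserve = split_budget(1_300_000, 0.30)

print(f"Planned for training: {planned:,.0f} GPU hours")
print(f"Held in reserve:      {reserve:,.0f} GPU hours")
```

In other words, only around 0.9 million GPU hours were earmarked for the planned run, with the rest kept back for failures and restarts.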

Bloomberg said it would not release its BloombergGPT model for evaluation, which follows in the footsteps of OpenAI, which open-sourced GPT-2 but kept GPT-3 behind a paid API and is charging for access to the closed-source GPT-4 that was announced in March 2023. Bloomberg’s business model revolves around proprietary algorithms it uses to provide intelligence to traders and analysts, and opening up BloombergGPT could expose core assets like the FinPile data set, which is the main source of documents used to train the model.

The researchers also noted uncertainty around the toxicity and ethical use of large language models, concerns that have been raised more often as more users try out ChatGPT. The company is locking down BloombergGPT for security reasons.

“Each decision reflects a combination of factors, including model use, potential harms, and business decisions,” the researchers said.

The company will build on the model as it feeds more data into the system and sorts through the issues.
