
Bloomberg used 1.3 million GPU hours and over 700 billion tokens of data to train BloombergGPT



Creating even “small” AI models takes extraordinary amounts of computing power, but as computers get more powerful and cheaper, and as AI learning algorithms get more efficient, this “effort” will drop.



Financial powerhouse Bloomberg has set out to prove that there are smarter ways to build and fine-tune Artificial Intelligence (AI) applications without the ethical or security concerns plaguing the likes of ChatGPT.




Bloomberg recently released BloombergGPT, a homegrown large-language model with 50 billion parameters that is targeted at financial applications. The model is built on a knowledge base that Bloomberg has collected over the last 15 years, and which it provides to customers as part of its Terminal product.




The model does not have the scope of ChatGPT, which is based on the 175-billion parameter GPT-3. But smaller models are the way to go when it comes to domain-specific applications like finance, researchers from Bloomberg and Johns Hopkins argued in an academic paper.


BloombergGPT also borrows the chatbot functionality from ChatGPT, and offers more accuracy than comparable models with more parameters, researchers said.

“General models cover many domains, are able to perform at a high level across a wide variety of tasks, and obviate the need for specialization during training time. However, results from existing domain-specific models show that general models cannot replace them,” wrote the researchers.




Other IT executives have also argued in favor of smaller models with a few billion parameters, specifically for scientific applications. Smaller models increase the accuracy of results and can be trained significantly faster than one-size-fits-all models like GPT-3. They also require fewer computing resources.

Bloomberg allocated close to 1.3 million hours of training time for BloombergGPT on Nvidia’s A100 GPUs in Amazon’s AWS cloud. The training ran on 64 compute nodes, each with eight Nvidia A100 GPUs (40GB variants), for 512 GPUs in total.

Within each node, the GPUs were linked using Nvidia’s proprietary NVSwitch interconnect, with transfer speeds of 600GBps. Across nodes, Nvidia’s GPUDirect worked with the AWS Elastic Fabric Adapter, which had transfer speeds of 400Gbps.

Bloomberg used Amazon’s FSx for Lustre file system – Lustre is widely used in high-performance computing – for high-speed access to files. The file system supported up to 1,000 MBps of read and write throughput per TiB of storage.

BloombergGPT is one example of a company using Amazon’s cloud service to train large-language models. ChatGPT runs on Nvidia’s GPUs in Microsoft’s Azure service, and Google this week published a paper on its large-language models running on supercomputers with 4,096 TPUs (Tensor processing units).




Even distributed across GPUs, the available memory was not enough to hold the model, so Bloomberg made optimizations to train it. One optimization involved sharding the training state across 128 GPUs, with four copies of the model kept to deal with swapping or hardware glitches. Another involved switching computation to BF16 precision, which reduced memory requirements during training, while parameters were stored in FP32.

“After experimenting with various techniques, we achieve 102 TFLOPs on average and each training step takes 32.5 seconds,” researchers wrote.
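Some back-of-envelope arithmetic shows why the precision and sharding choices mattered. The sketch below uses only the figures reported in the article (50 billion parameters, 40GB A100s); the exact byte counts per value are standard for FP32 and BF16, but the totals are rough estimates, not figures from the paper.

```python
# Rough memory arithmetic for a 50B-parameter model, illustrating why
# BF16 halves the working footprint relative to FP32, and why the model
# state had to be sharded across GPUs. Estimates only.

N_PARAMS = 50e9      # 50 billion parameters
FP32_BYTES = 4       # 32-bit float: 4 bytes per value
BF16_BYTES = 2       # bfloat16: 2 bytes per value

weights_fp32_gb = N_PARAMS * FP32_BYTES / 1e9   # master weights stored in FP32
weights_bf16_gb = N_PARAMS * BF16_BYTES / 1e9   # working copy used for compute

print(f"FP32 weights: {weights_fp32_gb:.0f} GB")   # 200 GB
print(f"BF16 weights: {weights_bf16_gb:.0f} GB")   # 100 GB

# Either copy alone dwarfs a single 40GB A100, so the weights alone
# must be spread over several GPUs before activations even enter the picture.
A100_MEM_GB = 40
print(f"Minimum GPUs just to hold the FP32 weights: "
      f"{weights_fp32_gb / A100_MEM_GB:.0f}")      # 5
```

Note that this counts only the weights; optimizer state and activations multiply the footprint several times over, which is why the training state was sharded across 128 GPUs.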

Bloomberg drew 54% of the data set – 363 billion tokens of internal documents dating back to 2007 – from its own archives. The training involved stripping the formatting and templates from the data, which was then fed into the training system. The remaining 345 billion tokens were sourced from publicly available press releases, Bloomberg news pieces, public filings, and even Wikipedia. (A token is the unit of text a language model consumes – roughly a word or word fragment produced by the tokenizer.)

The researchers wanted training sequences to be 2,048 tokens long to maintain the highest levels of GPU utilization.
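The idea behind fixed-length sequences can be sketched in a few lines: a flat stream of token IDs is packed into equal-sized chunks so no GPU cycles are wasted on padding. This is a generic illustration of the technique, not Bloomberg’s actual pipeline; the function name and toy token IDs are made up.

```python
def pack_sequences(token_ids, seq_len=2048):
    """Pack a flat stream of token IDs into fixed-length training
    sequences, dropping the final partial chunk. Keeping every
    sequence exactly seq_len long means no padding tokens, so every
    position in a batch is real training signal."""
    n_full = len(token_ids) // seq_len
    return [token_ids[i * seq_len:(i + 1) * seq_len]
            for i in range(n_full)]

# Toy example with a short stream and a small seq_len:
stream = list(range(10))
batches = pack_sequences(stream, seq_len=4)
print(batches)  # [[0, 1, 2, 3], [4, 5, 6, 7]] -- the trailing [8, 9] is dropped
```

In practice, documents are usually concatenated with a separator token before packing, so a sequence may span several documents.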




“Since we are data limited, we choose the largest model that we can, while ensuring that we can train on all our tokens and still leave ~30% of the total compute budget as a buffer for unforeseen failures, retries, and restarts,” the researchers wrote.
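A rough plausibility check on that budget is possible from the article’s own figures, using the common scaling-law rule of thumb that training costs about 6 × N × D FLOPs for N parameters and D tokens. The 6ND rule is an outside assumption from the scaling-law literature, not something the article states, and the paper’s actual accounting is more detailed, so treat this as a sketch.

```python
# Plausibility check on the training budget, using figures from the
# article plus the 6*N*D FLOPs rule of thumb (an outside assumption).

N = 50e9                      # model parameters
D = 708e9                     # 363B internal + 345B public tokens
flops_per_pass = 6 * N * D    # ~2.1e23 FLOPs for one pass over the data

gpu_hours = 1.3e6             # total GPU-hour budget reported in the article
per_gpu_flops = 102e12        # 102 TFLOPs sustained per GPU
budget_flops = gpu_hours * 3600 * per_gpu_flops   # ~4.8e23 FLOPs

print(f"One pass over the data: {flops_per_pass:.2e} FLOPs")
print(f"Total compute budget:   {budget_flops:.2e} FLOPs")
print(f"Fraction of budget used: {flops_per_pass / budget_flops:.0%}")
```

By this crude estimate a single pass over all the tokens fits comfortably inside the 1.3 million GPU-hour budget, leaving the headroom the researchers describe for failures, retries, and restarts.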

Bloomberg said it would not release its BloombergGPT model for evaluation, following in the footsteps of OpenAI, which published details of GPT-3 but is charging for access to the closed-source GPT-4 that was announced the other month. Bloomberg’s business model revolves around the proprietary algorithms it uses to provide intelligence to traders and analysts, and opening up BloombergGPT could expose core assets like the FinPile database, which is the main source of documents used to train the model.

The researchers also noted uncertainty around the toxicity and ethical use of large-language models, concerns that have become more prominent as more users try out ChatGPT. The company is locking down BloombergGPT for security reasons.

“Each decision reflects a combination of factors, including model use, potential harms, and business decisions,” the researchers said.

The company will build on the model as it feeds more data into the system and sorts through the issues.
