WHY THIS MATTERS IN BRIEF Creating even “small” AI models requires extraordinary amounts of computing power, but as computers get more powerful and cheaper, and as AI learning algorithms get more efficient, this “effort” will drop.
Financial powerhouse Bloomberg has announced that it’s trying to prove that there are smarter ways to fine-tune Artificial Intelligence (AI) applications without the ethical or security concerns plaguing the likes of ChatGPT.
Bloomberg recently released BloombergGPT, a homegrown large-language model with 50 billion parameters that is targeted at financial applications. The model is built on a knowledge base that Bloomberg has collected over the last 15 years, and which it provides to customers as part of its Terminal product.
The model does not have the scope of ChatGPT, which is based on the 175-billion parameter GPT-3. But smaller models are the way to go when it comes to domain-specific applications like finance, researchers from Bloomberg and Johns Hopkins argued in an academic paper.
BloombergGPT also borrows the chatbot functionality from ChatGPT, and offers more accuracy than comparable models with more parameters, researchers said.
“General models cover many domains, are able to perform at a high level across a wide variety of tasks, and obviate the need for specialization during training time. However, results from existing domain-specific models show that general models cannot replace them,” wrote the researchers.
Other IT executives have also argued in favor of smaller models with a few billion parameters, specifically for scientific applications. Smaller models increase the accuracy of results and can be trained significantly faster than one-size-fits-all models like GPT-3. They also require fewer computing resources.
Bloomberg budgeted close to 1.3 million GPU hours of training time for BloombergGPT on Nvidia’s A100 GPUs in Amazon’s AWS cloud. Training ran on 64 nodes, each with eight Nvidia A100 GPUs (the 40GB variant), for 512 GPUs in total.
Within each node, the GPUs were linked using Nvidia’s proprietary NVSwitch interconnect, which offers transfer speeds of 600GBps. The nodes themselves were connected through AWS’s Elastic Fabric Adapter with Nvidia GPUDirect, which offers transfer speeds of 400Gbps.
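For readers curious what a multi-node setup like this looks like in practice, below is a minimal sketch of initialising a distributed training job in PyTorch with the NCCL backend, which uses NVSwitch/NVLink inside a node and the network fabric (such as EFA) between nodes. It is an illustration only, not Bloomberg’s actual training code, and the launcher-provided environment variables are assumptions.

```python
# Minimal sketch of multi-node distributed initialisation, assuming a
# PyTorch/NCCL stack launched with torchrun (one process per GPU).
# NCCL uses NVSwitch/NVLink within a node and the network fabric
# (e.g. EFA via the aws-ofi-nccl plugin) between nodes.
import os
import torch
import torch.distributed as dist

def init_distributed() -> None:
    # These environment variables are set by the launcher (e.g. torchrun).
    rank = int(os.environ["RANK"])              # global process index
    local_rank = int(os.environ["LOCAL_RANK"])  # GPU index within the node
    world_size = int(os.environ["WORLD_SIZE"])  # e.g. 64 nodes * 8 GPUs = 512

    torch.cuda.set_device(local_rank)
    dist.init_process_group(backend="nccl", rank=rank, world_size=world_size)

    if rank == 0:
        print(f"Initialised {world_size} processes across {world_size // 8} nodes")

if __name__ == "__main__":
    init_distributed()
```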
Bloomberg used Amazon’s FSx for Lustre file system – Lustre is widely used in high-performance computing – for high-speed access to training data. The file system supports up to 1,000 MBps of read and write throughput per TiB of storage.
BloombergGPT is one example of a company using Amazon’s cloud service to train large-language models. ChatGPT runs on Nvidia’s GPUs in Microsoft’s Azure service, and Google this week published a paper on its large-language models running on supercomputers with 4,096 TPUs (Tensor processing units).
The model’s training state was too large to fit comfortably in GPU memory, so Bloomberg made optimizations to train the model. One involved sharding training across 128 GPUs, with four copies of that arrangement maintained to cope with hardware failures and restarts. Another was switching computation to the BF16 format, which reduced memory requirements during training, while parameters were stored in full-precision FP32.
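As an illustration of the mixed-precision idea, here is a minimal PyTorch sketch in which the forward and backward passes run in BF16 while parameters and optimizer state stay in FP32. It assumes a placeholder model on a CUDA device and omits the 128-way sharding entirely, so read it as a sketch of the technique rather than Bloomberg’s implementation.

```python
# Minimal sketch of BF16 mixed-precision training in PyTorch, assuming a
# generic placeholder model; weights and optimizer state stay in FP32 while
# forward/backward computation runs in BF16, as the article describes.
import torch

model = torch.nn.Linear(4096, 4096).cuda()                  # FP32 weights
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)  # FP32 optimizer state

def training_step(batch: torch.Tensor, targets: torch.Tensor) -> float:
    optimizer.zero_grad(set_to_none=True)
    # Activations and matmuls run in BF16, shrinking their memory footprint;
    # BF16 keeps FP32's exponent range, so no loss scaling is required.
    with torch.autocast(device_type="cuda", dtype=torch.bfloat16):
        loss = torch.nn.functional.mse_loss(model(batch), targets)
    loss.backward()     # gradients accumulate into the FP32 parameters
    optimizer.step()    # weight update happens in full FP32 precision
    return loss.item()

# Example usage with random data (requires a CUDA-capable GPU).
x = torch.randn(8, 4096, device="cuda")
y = torch.randn(8, 4096, device="cuda")
print(training_step(x, y))
```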
“After experimenting with various techniques, we achieve 102 TFLOPs on average and each training step takes 32.5 seconds,” researchers wrote.
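Those two figures can be sanity-checked with a back-of-envelope calculation, sketched below. The ≈6 × parameters FLOPs-per-token rule of thumb, the 512-GPU count, and the reading of 102 TFLOPs as a per-GPU figure are assumptions on my part, so the result is an order-of-magnitude estimate only.

```python
# Back-of-envelope check on the reported throughput; the 6 * N FLOPs-per-token
# rule of thumb and the per-GPU reading of the 102 TFLOPs figure are
# assumptions, so treat the output as a rough estimate only.
PARAMS = 50e9            # BloombergGPT parameter count
TFLOPS_PER_GPU = 102e12  # reported sustained throughput, read as per GPU
NUM_GPUS = 64 * 8        # 64 nodes * 8 A100s = 512 GPUs
STEP_SECONDS = 32.5      # reported time per training step

flops_per_step = TFLOPS_PER_GPU * NUM_GPUS * STEP_SECONDS
flops_per_token = 6 * PARAMS                 # common transformer-training approximation
tokens_per_step = flops_per_step / flops_per_token

print(f"~{flops_per_step:.2e} FLOPs per training step")
print(f"~{tokens_per_step / 1e6:.1f}M tokens processed per step (rough estimate)")
```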
Bloomberg drew 54% of the data set – or 363 billion tokens of internal documents dating back to 2007 – from Bloomberg’s own archives. Training involved stripping the formatting and templates from these documents before feeding the text into the training system. The remaining 345 billion tokens were sourced from publicly available press releases, Bloomberg news pieces, public filings, and even Wikipedia. (A “token” is a small chunk of text – roughly a word or part of a word – and is the basic unit a language model is trained on.)
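To make the idea of a token concrete, the short sketch below runs a sentence through an off-the-shelf tokenizer from the Hugging Face transformers library. BloombergGPT uses its own tokenizer, so the exact splits here are illustrative only.

```python
# Illustration of what "tokens" are: sub-word chunks of text. This uses the
# off-the-shelf GPT-2 tokenizer from Hugging Face transformers purely as an
# example; BloombergGPT's own tokenizer will split text differently.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")

text = "Bloomberg reported quarterly revenue of $2.7 billion."
token_ids = tokenizer.encode(text)
tokens = tokenizer.convert_ids_to_tokens(token_ids)

print(tokens)       # sub-word pieces, e.g. ['Bloomberg', 'Ġreported', ...]
print(len(tokens))  # a short sentence is typically only 10-15 tokens
```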
The researchers set training sequences to a length of 2,048 tokens to maintain the highest levels of GPU utilization.
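Packing documents into fixed-length sequences is a standard way to keep every batch full; a generic sketch of the idea, using a hypothetical end-of-document token, is shown below. It is not Bloomberg’s data pipeline.

```python
# Minimal sketch of packing tokenized documents into fixed 2,048-token
# training sequences so batches stay full and GPU utilization stays high.
# Generic illustration only; the end-of-document token id is hypothetical.
from typing import Iterable, Iterator, List

SEQ_LEN = 2048   # training sequence length cited for BloombergGPT
EOD_TOKEN = 0    # hypothetical end-of-document token id

def pack_documents(docs: Iterable[List[int]]) -> Iterator[List[int]]:
    """Concatenate tokenized documents, separated by an end-of-document
    token, and slice the stream into fixed-length training sequences."""
    buffer: List[int] = []
    for doc in docs:
        buffer.extend(doc)
        buffer.append(EOD_TOKEN)
        while len(buffer) >= SEQ_LEN:
            yield buffer[:SEQ_LEN]
            buffer = buffer[SEQ_LEN:]
    # Leftover tokens shorter than SEQ_LEN are dropped here (or could be padded).

# Example: three toy "documents" packed into 2,048-token sequences.
toy_docs = [[1] * 1500, [2] * 1200, [3] * 900]
for seq in pack_documents(toy_docs):
    print(len(seq))   # always 2048
```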
“Since we are data limited, we choose the largest model that we can, while ensuring that we can train on all our tokens and still leave ~30% of the total compute budget as a buffer for unforeseen failures, retries, and restarts,” the researchers wrote.
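The sketch below illustrates the kind of budgeting arithmetic described in that quote: reserve roughly 30% of the compute, then bound the model size using the rough 6 × N × D FLOPs approximation. The sustained-throughput figure and the approximation itself are assumptions, so it produces only a crude upper bound, not Bloomberg’s actual sizing analysis, which also drew on scaling-law results.

```python
# Crude illustration of the compute-budgeting arithmetic: reserve ~30% of the
# budget, then bound model size N by C <= 6 * N * D for D training tokens.
# The throughput figure and the 6*N*D rule are assumptions, so this is an
# upper-bound ballpark only. Overheads such as activation recomputation push
# the true cost closer to 8*N*D, one reason the chosen 50B model sits below
# the bound printed here.
GPU_HOURS_TOTAL = 1.3e6           # total budgeted A100 GPU hours
BUFFER_FRACTION = 0.30            # reserved for failures, retries, restarts
SUSTAINED_FLOPS_PER_GPU = 102e12  # reported average throughput per GPU
TOKENS = 708e9                    # ~363B internal + ~345B public tokens

usable_flops = GPU_HOURS_TOTAL * (1 - BUFFER_FRACTION) * 3600 * SUSTAINED_FLOPS_PER_GPU
max_params = usable_flops / (6 * TOKENS)   # N <= C / (6 * D)

print(f"Usable compute after buffer: ~{usable_flops:.2e} FLOPs")
print(f"Crude upper bound on model size: ~{max_params / 1e9:.0f}B parameters")
```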
Bloomberg said it would not release BloombergGPT for evaluation, following in the footsteps of OpenAI, which charges for access to GPT-3 and to the closed-source GPT-4 announced the previous month. Bloomberg’s business model revolves around the proprietary algorithms it uses to provide intelligence to traders and analysts, and opening up BloombergGPT could expose core assets like the FinPile dataset, the main source of documents used to train the model.
The researchers also noted uncertainty around the toxicity and ethical use of large-language models, concerns that have come to the fore as more users try out ChatGPT. The company is also locking down BloombergGPT for security reasons.
“Each decision reflects a combination of factors, including model use, potential harms, and business decisions,” the researchers said.
The company will build on the model as it feeds more data into the system and sorts through the issues.
Matthew Griffin, described as “The Adviser behind the Advisers” and a “Young Kurzweil,” is the founder and CEO of the World Futures Forum and the 311 Institute, a global Futures and Deep Futures consultancy working between 2020 and 2070, and is an award-winning futurist and author of the “Codex of the Future” series.
Regularly featured in the global media, including AP, BBC, Bloomberg, CNBC, Discovery, RT, Viacom, and WIRED, Matthew’s ability to identify, track, and explain the impacts of hundreds of revolutionary emerging technologies on global culture, industry and society is unparalleled. Recognised for the past six years as one of the world’s foremost futurists and innovation and strategy experts, Matthew is an international speaker who helps governments, investors, multi-nationals and regulators around the world envision, build and lead an inclusive, sustainable future.
A rare talent, Matthew’s recent work includes mentoring Lunar XPrize teams, re-envisioning global education and training with the G20, and helping the world’s largest organisations envision and ideate the future of their products and services, industries, and countries.
Matthew's clients include three Prime Ministers and several governments, including the G7, Accenture, Aon, Bain & Co, BCG, Credit Suisse, Dell EMC, Dentons, Deloitte, E&Y, GEMS, Huawei, JPMorgan Chase, KPMG, Lego, McKinsey, PWC, Qualcomm, SAP, Samsung, Sopra Steria, T-Mobile, and many more.