OpenAI's mini GPT-4o model is mighty and 60% cheaper than its big brother

By Matthew Griffin Intelligence and the Senses 2nd August 2024

WHY THIS MATTERS IN BRIEF

OpenAI is going for the large foundational AI model market and also the smaller niche AI model market, and this is a big announcement that most people will pass by.

Love the Exponential Future? Join our XPotential Community, future proof yourself with courses from XPotential University, read about exponential tech and trends, connect, watch a keynote, or browse my blog.

OpenAI has made available GPT-4o Mini – a much smaller and so called Small Language Model (SLM), and much cheaper version of its GPT-4o generative Large Language Model (LLM) – via its cloud.

An AI learned to identify PTSD symptoms better than human experts

The Microsoft-backed super lab said Thursday GPT-4o Mini is like regular GPT-4o in that it’s multi-modal – it can handle more than just the written word – and has a context window of 128,000 tokens and was trained on materials dated up to October 2023. The Mini version can emit up to 16,000 tokens of output.

However, while GPT-4o, OpenAI’s top-end model, costs $5 and $15 per million input and output tokens respectively the Mini edition costs over 60% less at 15 and 60 cents respectively. And you can halve those numbers if using delayed batch processing.

The Future of AI, by keynote Matthew Griffin

I’m told the cut-down version is not fully featured yet, supporting just text and vision via its API. Other input and output formats, such as audio, are coming in the indeterminate future.

In creating GPT-4o Mini, OpenAI emphasised how safe it had made the thing, claiming to filter out offensive data from training materials and giving it the same guardrails as GPT-4o. Mini has also gained mechanisms to, ideally, thwart attempts to persuade the model to do things it’s not supposed to, such as making it ignore previous instructions and override its makers’ intentions, according to OpenAI.

Google launches Gemini 2.0 its new model for everything

“GPT-4o mini in the API is the first model to apply our instruction hierarchy method, which helps to improve the model’s ability to resist jailbreaks, prompt injections, and system prompt extractions,” the super-lab said. “This makes the model’s responses more reliable and helps make it safer to use in applications at scale. We’ll continue to monitor how GPT-4o mini is being used and improve the model’s safety as we identify new risks.”

Furthermore, OpenAI claimed GPT-4o Mini is ahead of comparable LLMs in benchmarks. Compared to Google’s lighter-weight Gemini Flash and Anthropic’s Claude Haiku, Mini was usually between five and 15 percent more accurate in tests such as MMLU. In two outliers it was nearly twice as accurate as the competition, and in another a little worse than Gemini Flash but still ahead of Claude Haiku, allegedly.

Competition between OpenAI and Anthropic has a personal edge, as the latter was co-founded and built in part by executives and engineers from the former.

Anthropic releases their OpenAI GPT-4 busting Claude AI 3 model

GPT-4o Mini performs well for sure but it doesn’t have an overall commanding lead – and that’s indicative of OpenAI’s recent loss of absolute leadership in the LLM arena. As veteran open source developer Simon Willison detailed in his keynote at the AI Engineer World’s Fair last month, 2024 has seen many of OpenAI’s competitors release their own GPT-4-class models.

“The best models are grouped together: GPT-4o, the brand new Claude 3.5 Sonnet and Google Gemini 1.5 Pro,” Willison declared. “I would classify all of these as GPT-4 class. These are the best available models, and we have options other than GPT-4 now. The pricing isn’t too bad either – significantly cheaper than in the past.”

At 82 percent accuracy in MMLU and a cost of 15 cents per million tokens, GPT-4o Mini is mostly ahead of the pack. However, Willison noted the LMSYS Chatbot Arena benchmark provides a more realistic evaluation of LLM quality because actual humans are asked to compare outputs and choose which is better – a brute-force but effective way of ranking different models.

Apple report says powerful AI models aren't reasoning at all

GPT-4o Mini is too new to be included in the tournament-style benchmark, though he noted that full-size GPT-4o is only barely ahead of its rivals. Anthropic’s flagship Claude 3.5 Sonnet currently has 1,271 points to GPT-4o’s 1,287. Gemini 1.5 Pro isn’t far behind at 1,267. Slightly less performant but still respectable models include Nvidia and Mistral’s brand-new Nemotron 4 340B Instruct at 1,209 points, and Meta’s LlaMa 3 70B Instruct at 1,201.

Willison also noted the Mini is cheaper than Claude 3 Haiku and Gemini 1.5 Flash. OpenAI may be the best, in terms of these test scores, from SLM’s to big LLMs, but it no longer has the dominating lead it once had. That’s probably a good thing – between costly AI hardware and high power usage, the last thing AI needed was an LLM monopoly.

Matthew Griffin / About Author

Matthew Griffin is a multi-award winning Futurist and expert in Disruption and Innovation, Geopolitics, Leadership, and Technology, who NASA have described as a "walking encyclopaedia of the future" and a "futurist Polymath." 15-time best selling author of the "Codex of the Future" series, Matthew is the Founder and Futurist in Chief of the 311 Institute, a global Futures and Deep Futures advisory firm working with royal households, world leaders, G7, G20, and G77 governments, NGOs, and multi-national mid and mega cap firms to help them explore, shape, and lead the next 50 years of business and society.

An award-winning YouTube creator with over a million followers, with an unrivalled global reach and impact, Matthew is a highly sought-after international keynote speaker, lecturer, and mentor who collaborates with global leaders through the United Nations Alliance of Civilizations (UNAOC) and United Nations General Assembly (UNGA) to shape pivotal initiatives such as the UN’s AI for Humanity program, the United Nations Conference of the Parties (UN COP), and the World Economic Forum in Davos.

As the former Global Head of Cloud, National Security, and Enterprise Sales for companies including Atos, Dell-EMC, and IBM, Matthew has a proven track record of building multi-billion dollar business units and turning failing divisions into market leaders. His ability to identify, analyse, and communicate the implications of hundreds of emerging technologies and trends is unparalleled, and his insights are trusted by many of the world’s most respected organisations, including ABB, Accenture, Adidas, AON, ARM, BCG, Centrica, Citi, Coca-Cola, Dentons, Deloitte, Dow Jones, EY, Google, KPMG, Lego, Legal & General, LinkedIn, Microsoft, PepsiCo, Qualcomm, RWE, Samsung, Siemens AG and Siemens Energy, T-Mobile, UBS, VISA, Walmart, Workday, Worldpay and many others.

Regularly featured in the global media including the AP, BBC, Bloomberg, CNBC, Discovery, Forbes, Khaleej Times, Telegraph, TIME, ViacomCBS, WIRED, and the WSJ, Matthews mission is to help organisations create a fair and sustainable future whose benefits are shared by everyone irrespective of their ability, background, or circumstances.