WHY THIS MATTERS IN BRIEF
OpenAI is going for the large foundational AI model market and also the smaller niche AI model market, and this is a big announcement that most people will pass by.
Love the Exponential Future? Join our XPotential Community, future proof yourself with courses from XPotential University, read about exponential tech and trends, connect, watch a keynote, or browse my blog.
OpenAI has made available GPT-4o Mini – a much smaller and so called Small Language Model (SLM), and much cheaper version of its GPT-4o generative Large Language Model (LLM) – via its cloud.
The Microsoft-backed super lab said Thursday GPT-4o Mini is like regular GPT-4o in that it’s multi-modal – it can handle more than just the written word – and has a context window of 128,000 tokens and was trained on materials dated up to October 2023. The Mini version can emit up to 16,000 tokens of output.
However, while GPT-4o, OpenAI’s top-end model, costs $5 and $15 per million input and output tokens respectively the Mini edition costs over 60% less at 15 and 60 cents respectively. And you can halve those numbers if using delayed batch processing.
The Future of AI, by keynote Matthew Griffin
I’m told the cut-down version is not fully featured yet, supporting just text and vision via its API. Other input and output formats, such as audio, are coming in the indeterminate future.
In creating GPT-4o Mini, OpenAI emphasised how safe it had made the thing, claiming to filter out offensive data from training materials and giving it the same guardrails as GPT-4o. Mini has also gained mechanisms to, ideally, thwart attempts to persuade the model to do things it’s not supposed to, such as making it ignore previous instructions and override its makers’ intentions, according to OpenAI.
“GPT-4o mini in the API is the first model to apply our instruction hierarchy method, which helps to improve the model’s ability to resist jailbreaks, prompt injections, and system prompt extractions,” the super-lab said. “This makes the model’s responses more reliable and helps make it safer to use in applications at scale. We’ll continue to monitor how GPT-4o mini is being used and improve the model’s safety as we identify new risks.”
Furthermore, OpenAI claimed GPT-4o Mini is ahead of comparable LLMs in benchmarks. Compared to Google’s lighter-weight Gemini Flash and Anthropic’s Claude Haiku, Mini was usually between five and 15 percent more accurate in tests such as MMLU. In two outliers it was nearly twice as accurate as the competition, and in another a little worse than Gemini Flash but still ahead of Claude Haiku, allegedly.
Competition between OpenAI and Anthropic has a personal edge, as the latter was co-founded and built in part by executives and engineers from the former.
GPT-4o Mini performs well for sure but it doesn’t have an overall commanding lead – and that’s indicative of OpenAI’s recent loss of absolute leadership in the LLM arena. As veteran open source developer Simon Willison detailed in his keynote at the AI Engineer World’s Fair last month, 2024 has seen many of OpenAI’s competitors release their own GPT-4-class models.
“The best models are grouped together: GPT-4o, the brand new Claude 3.5 Sonnet and Google Gemini 1.5 Pro,” Willison declared. “I would classify all of these as GPT-4 class. These are the best available models, and we have options other than GPT-4 now. The pricing isn’t too bad either – significantly cheaper than in the past.”
At 82 percent accuracy in MMLU and a cost of 15 cents per million tokens, GPT-4o Mini is mostly ahead of the pack. However, Willison noted the LMSYS Chatbot Arena benchmark provides a more realistic evaluation of LLM quality because actual humans are asked to compare outputs and choose which is better – a brute-force but effective way of ranking different models.
GPT-4o Mini is too new to be included in the tournament-style benchmark, though he noted that full-size GPT-4o is only barely ahead of its rivals. Anthropic’s flagship Claude 3.5 Sonnet currently has 1,271 points to GPT-4o’s 1,287. Gemini 1.5 Pro isn’t far behind at 1,267. Slightly less performant but still respectable models include Nvidia and Mistral’s brand-new Nemotron 4 340B Instruct at 1,209 points, and Meta’s LlaMa 3 70B Instruct at 1,201.
Willison also noted the Mini is cheaper than Claude 3 Haiku and Gemini 1.5 Flash. OpenAI may be the best, in terms of these test scores, from SLM’s to big LLMs, but it no longer has the dominating lead it once had. That’s probably a good thing – between costly AI hardware and high power usage, the last thing AI needed was an LLM monopoly.