Scroll Top

OpenAI’s mini GPT-4o model is mighty and 60% cheaper than its big brother

WHY THIS MATTERS IN BRIEF

OpenAI is going for the large foundational AI model market and also the smaller niche AI model market, and this is a big announcement that most people will pass by.

 

Love the Exponential Future? Join our XPotential Community, future proof yourself with courses from XPotential University, read about exponential tech and trendsconnect, watch a keynote, or browse my blog.

OpenAI has made available GPT-4o Mini – a much smaller and so called Small Language Model (SLM), and much cheaper version of its GPT-4o generative Large Language Model (LLM) – via its cloud.

 

RELATED
Andrew Ng has started a new AI company: DeepLearning.ai

 

The Microsoft-backed super lab said Thursday GPT-4o Mini is like regular GPT-4o in that it’s multi-modal – it can handle more than just the written word – and has a context window of 128,000 tokens and was trained on materials dated up to October 2023. The Mini version can emit up to 16,000 tokens of output.

However, while GPT-4o, OpenAI’s top-end model, costs $5 and $15 per million input and output tokens respectively the Mini edition costs over 60% less at 15 and 60 cents respectively. And you can halve those numbers if using delayed batch processing.

 

The Future of AI, by keynote Matthew Griffin

 

I’m told the cut-down version is not fully featured yet, supporting just text and vision via its API. Other input and output formats, such as audio, are coming in the indeterminate future.

In creating GPT-4o Mini, OpenAI emphasised how safe it had made the thing, claiming to filter out offensive data from training materials and giving it the same guardrails as GPT-4o. Mini has also gained mechanisms to, ideally, thwart attempts to persuade the model to do things it’s not supposed to, such as making it ignore previous instructions and override its makers’ intentions, according to OpenAI.

 

RELATED
ChatGPT is almost acing the hardest USMLE medical exams in the USA

 

“GPT-4o mini in the API is the first model to apply our instruction hierarchy method, which helps to improve the model’s ability to resist jailbreaks, prompt injections, and system prompt extractions,” the super-lab said. “This makes the model’s responses more reliable and helps make it safer to use in applications at scale. We’ll continue to monitor how GPT-4o mini is being used and improve the model’s safety as we identify new risks.”

Furthermore, OpenAI claimed GPT-4o Mini is ahead of comparable LLMs in benchmarks. Compared to Google’s lighter-weight Gemini Flash and Anthropic’s Claude Haiku, Mini was usually between five and 15 percent more accurate in tests such as MMLU. In two outliers it was nearly twice as accurate as the competition, and in another a little worse than Gemini Flash but still ahead of Claude Haiku, allegedly.

Competition between OpenAI and Anthropic has a personal edge, as the latter was co-founded and built in part by executives and engineers from the former.

 

RELATED
IBM's "flying brain" Watson heads into space

 

GPT-4o Mini performs well for sure but it doesn’t have an overall commanding lead – and that’s indicative of OpenAI’s recent loss of absolute leadership in the LLM arena. As veteran open source developer Simon Willison detailed in his keynote at the AI Engineer World’s Fair last month, 2024 has seen many of OpenAI’s competitors release their own GPT-4-class models.

“The best models are grouped together: GPT-4o, the brand new Claude 3.5 Sonnet and Google Gemini 1.5 Pro,” Willison declared. “I would classify all of these as GPT-4 class. These are the best available models, and we have options other than GPT-4 now. The pricing isn’t too bad either – significantly cheaper than in the past.”

At 82 percent accuracy in MMLU and a cost of 15 cents per million tokens, GPT-4o Mini is mostly ahead of the pack. However, Willison noted the LMSYS Chatbot Arena benchmark provides a more realistic evaluation of LLM quality because actual humans are asked to compare outputs and choose which is better – a brute-force but effective way of ranking different models.

 

RELATED
Uber used AI to automatically fire drivers and now they're going to court

 

GPT-4o Mini is too new to be included in the tournament-style benchmark, though he noted that full-size GPT-4o is only barely ahead of its rivals. Anthropic’s flagship Claude 3.5 Sonnet currently has 1,271 points to GPT-4o’s 1,287. Gemini 1.5 Pro isn’t far behind at 1,267. Slightly less performant but still respectable models include Nvidia and Mistral’s brand-new Nemotron 4 340B Instruct at 1,209 points, and Meta’s LlaMa 3 70B Instruct at 1,201.

Willison also noted the Mini is cheaper than Claude 3 Haiku and Gemini 1.5 Flash. OpenAI may be the best, in terms of these test scores, from SLM’s to big LLMs, but it no longer has the dominating lead it once had. That’s probably a good thing – between costly AI hardware and high power usage, the last thing AI needed was an LLM monopoly.

Related Posts

Leave a comment

You have Successfully Subscribed!

Pin It on Pinterest

Share This