It’s the latest achievement for the publicly available and currently free technology, which was released at the end of last year and has been the subject of non-stop coverage since.
ChatGPT is one of a class of “Generative AI” programs that can produce images, video and audio, make arguments, summarise books, tell jokes, write code, and generally be useful to people. But for educators, the technology opens the door to widespread cheating on homework and take-home assignments, and many have been scrambling to rethink the nature of assessment or otherwise discourage students from using the tool.
This week, the New York school system banned the use of ChatGPT, while Australian universities said they were reinstating “pen and paper” exams and beefing up cheating detection measures.
Now, in a pre-print study that has not yet been peer reviewed, researchers have explored the upper limits of ChatGPT’s capabilities. They say the AI tool achieved over 50 per cent in one of the most difficult standardised tests around: the US medical licensing exam (USMLE).
Just weeks after the launch of ChatGPT in December last year, researchers at a California-based healthcare provider, Ansible Health, began experimenting with the tool in their day-to-day work. They found it could help with tasks such as drafting payment notices, simplifying jargon-dense radiology reports, and even brainstorming answers for “diagnostically challenging cases”.
“Overall, our clinicians reported a 33 per cent decrease … in the time required to complete documentation and indirect patient care tasks,” the study authors wrote.
To test the program’s ability to perform clinical reasoning, they had it sit a mock, abbreviated version of the USMLE, which is required for any doctor to obtain a license to practice medicine in the US.
The USMLE consists of three exams, with the first generally taken by second-year medical students, the second by those in their fourth year, and the last by physicians after a year of postgraduate education.
For most applicants, the tests require more than a year of dedicated preparation time. The first two tests each take a day, and the last takes two days. The researchers fed questions from previous exams to ChatGPT and had the answers, ranging from open-ended written responses to multiple choice, independently scored by two physician adjudicators.
They also checked that the answers to those questions were unlikely to be in the dataset the AI tool was trained on – in other words, that ChatGPT hadn’t already seen the answers. The tool scored more than 50 per cent across all examinations, and approached the USMLE pass threshold of about 60 per cent.
“Therefore, ChatGPT is now comfortably within the passing range,” the paper concludes.
Phillip Dawson, an academic integrity researcher at Deakin University, said he wasn’t able to evaluate the study itself, but that “if the authors really did what they say they’ve done, then that’s scary stuff. There’s a sense that this is going to be even bigger than the pandemic in terms of how it changes assessment.”
Kane Murdoch, the head of academic misconduct at Macquarie University, said he was “not surprised at all” that ChatGPT could pass the USMLE. “[And] those are pretty serious and complex exams — simpler assessments would be a piece of cake.”
He and others are pushing for universities to embrace ChatGPT, rather than banning it outright.
“[ChatGPT] is like the advent of the calculator — a game changer,” he said. “Telling students that using it is forbidden won’t stop usage.
“I expect it to be very heavily used until such times as universities develop new strategies for assessment,” he added.
The Tertiary Education Quality and Standards Agency (TEQSA), which regulates higher education in Australia, appears to agree that ChatGPT shouldn’t be banned.
“That’s not a practical or sustainable strategy,” said Helen Gniel, who runs TEQSA’s higher education integrity unit. “Machine learning is only going to improve. It’s going to become quite standard.”
Policing use of the tool is made all the more difficult by the fact that its plausible academic writing is very hard for educators or existing academic integrity software to detect, although the style is bland and formulaic, and it has a habit of making up facts and references.
Kane Murdoch, whose job at Macquarie University includes detecting the use of AI text-generators, said most academics “don’t know what they’re looking for” and would fail to notice when a student has used ChatGPT. “What I’m looking for is really gross errors of fact,” he said.
And, as academics around the world find ChatGPT both amazing and horrifying at the same time, the only thing that’s certain is that this and other tools like it are going to get a lot better very fast indeed.
Matthew Griffin, described as “The Adviser behind the Advisers” and a “Young Kurzweil,” is the founder and CEO of the World Futures Forum and the 311 Institute, a global Futures and Deep Futures consultancy working on the period from 2020 to 2070. He is an award-winning futurist and author of the “Codex of the Future” series.