OpenAI Claims Major Drop in Hallucinations with GPT-5
OpenAI claims GPT-5 is ~45% less likely to produce factual errors than GPT-4o on real-world queries.

OpenAI has announced the launch of GPT-5, which the startup claims is its most advanced AI model yet. The company also says the new model delivers significantly more accurate answers than its predecessors, with a notable reduction in hallucinations.
According to the company, GPT-5 is ~45% less likely to produce factual errors than GPT-4o on real-world queries with web search enabled, and ~80% less likely when using its “thinking” mode—OpenAI’s term for deeper reasoning.
The improvements are particularly visible in complex, open-ended questions. OpenAI stress-tested GPT-5 using public factuality benchmarks like LongFact and FActScore. The results showed that “GPT-5 thinking” produces around six times fewer hallucinations than its earlier o3 model, demonstrating a significant leap in long-form factual accuracy.
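For readers unfamiliar with these benchmarks, FActScore-style evaluation decomposes a long-form answer into atomic claims and scores the fraction supported by a trusted source; the hallucination rate is roughly the complement of that score. The sketch below is a minimal illustration, not OpenAI's pipeline: decompose() and is_supported() are hypothetical stand-ins for the LLM-based claim splitter and the retrieval-backed verifier the actual benchmark uses.

```python
from typing import List, Set

def decompose(answer: str) -> List[str]:
    """Hypothetical stand-in: split a long-form answer into atomic claims.
    The real benchmark uses an LLM prompt for this step."""
    return [c.strip() for c in answer.split(".") if c.strip()]

def is_supported(claim: str, knowledge_source: Set[str]) -> bool:
    """Hypothetical stand-in: verify one claim against a knowledge source.
    The real benchmark retrieves Wikipedia passages and asks a verifier model."""
    return claim in knowledge_source

def factscore(answer: str, knowledge_source: Set[str]) -> float:
    """Fraction of atomic claims supported by the source; the
    hallucination rate is roughly 1 - factscore."""
    claims = decompose(answer)
    if not claims:
        return 0.0
    supported = sum(is_supported(c, knowledge_source) for c in claims)
    return supported / len(claims)
```

On this kind of metric, a model scoring 0.95 hallucinates in about 5% of its atomic claims; "six times fewer hallucinations" compares that complement across models.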
In addition to factual reliability, GPT-5 also shows enhanced instruction-following and reduced sycophancy. The model’s performance is especially strong in writing, coding, and health-related queries—three of ChatGPT’s most common use cases.
By contrast, OpenAI's recently released open-weight models, gpt-oss-120b and gpt-oss-20b, hallucinate significantly more than its older models.
“gpt-oss-120b and gpt-oss-20b underperform OpenAI o4-mini on both our SimpleQA and PersonQA evaluations,” OpenAI admitted.
Previously, OpenAI acknowledged that its o3 and o4-mini models hallucinate more often than older reasoning models like o1, o1-mini, and o3-mini, as well as traditional models such as GPT-4.
The startup said “more research is needed” to explain why hallucinations increase as reasoning capabilities scale.