Can you trust AI news summaries?

AI news summaries are wrong about half the time. The BBC and EBU 2025 research, the largest study of its kind, found 45 percent of AI assistant answers about the news carried at least one significant issue. Here is where they fail, where they hold up, and what a sane 2026 news stack looks like.

May 23, 2026

Kira Shishkin

AI news summaries are wrong about half the time. That headline number comes from the largest study of its kind: 22 public service media organizations in 18 countries, 14 languages, more than 3,000 AI responses, four leading assistants tested. Forty-five percent of answers carried at least one significant issue. An earlier BBC-only study from February 2025 put the figure at 51 percent. Sourcing fails in roughly one in three answers. Accuracy fails in one in five. Trust in AI news summaries has climbed among readers under 35. The reliability of those summaries has not kept pace.

How often do AI news summaries actually get it wrong?

The two largest pieces of research in the field arrive at the same range.

The BBC's February 2025 study tested ChatGPT, Copilot, Gemini, and Perplexity on 100 news questions, asked them to source from BBC News, and had 45 BBC journalists rate the answers. Fifty-one percent of responses had significant issues. Nineteen percent introduced factual errors when citing BBC content. Thirteen percent of direct quotes were either altered from the original or never appeared in the cited article.

The October 2025 follow-up, coordinated by the European Broadcasting Union and led by the BBC with 22 public service broadcasters, scaled the test. Over 3,000 responses. 18 countries. 14 languages. Forty-five percent of answers carried at least one significant issue. Thirty-one percent had serious sourcing problems. Twenty percent had major accuracy issues, including hallucinated details and outdated information. When smaller errors were counted as well, 81 percent of responses contained a mistake of some kind.

The pattern across studies is consistent. AI summaries of news are wrong roughly half the time, with the error rate spread across accuracy, attribution, context, and tone. The errors are not random tail events. They are systemic, and the researchers say so plainly.

Where do AI summaries fail most?

The errors group into five categories. In rough order of how often each shows up:

Sourcing. The single largest failure mode. About a third of answers attribute claims to outlets that did not make them, cite syndicated or copied versions of articles instead of originals, or fabricate links entirely. The chatbot's answer inherits authority from the source it claims to be using. When the attribution is wrong, that trust rests on false credentials.
Outdated information. AI models train on snapshots. News changes daily. Any story that moves between the training cutoff and the query is a candidate for the assistant to confidently report stale facts. ChatGPT was caught telling researchers that Pope Francis was still leading the Catholic Church weeks after his death. Gemini insisted no NASA astronauts had ever been stranded on the International Space Station, despite two crew members spending nine months stuck there.
Hallucinated specifics. Names, numbers, dates, and quotes invented from nothing. The BBC found that one in five answers reproduced incorrect dates, numbers, or factual statements and attributed them to sources that contained no such claim. Thirteen percent of direct quotes were either edited from the source or never existed in the cited article at all.
Opinion presented as fact. Chatbots routinely flatten the boundary between editorial framing and reporting. The researchers found multiple examples of chatbots inserting adjectives or framing into quoted material that the source had not used. Editorialization shows up most in stories with contested framing, which is exactly the territory where the distinction between opinion and fact matters most.
Missing context. A factually accurate one-sentence answer that omits the qualifier, the counterargument, or the timing detail can read as a complete answer while delivering a misleading one. Context errors are the hardest for a reader to catch because the response sounds clean and confident.

A single response can hit several of these categories at once. The researchers counted an answer as having a significant issue if it hit any one of them. The 45 percent figure counts flawed answers, not individual errors, so it is the floor, not the ceiling.
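
The overlap between categories can be made concrete with the rates quoted earlier. What follows is a minimal sketch, not a calculation from the study itself. It assumes that every answer with a serious sourcing problem or a major accuracy issue is also counted among the 45 percent with a significant issue. The function name is illustrative.

```python
def min_overlap(rate_a: float, rate_b: float, rate_any: float) -> float:
    """Smallest possible share of answers that fall in both categories.

    Inclusion-exclusion: P(A and B) = P(A) + P(B) - P(A or B).
    P(A or B) cannot exceed the share of answers with any issue,
    so the overlap is at least P(A) + P(B) - rate_any (never below zero).
    """
    return max(0.0, rate_a + rate_b - rate_any)


sourcing = 0.31   # serious sourcing problems, October 2025 study
accuracy = 0.20   # major accuracy issues, October 2025 study
any_issue = 0.45  # at least one significant issue, October 2025 study

overlap = min_overlap(sourcing, accuracy, any_issue)
print(f"At least {overlap:.0%} of answers had both problems")
```

Under that assumption, at least 6 percent of answers had both a sourcing problem and an accuracy problem. The true overlap may be larger, because the other categories also contribute to the 45 percent.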

Why is AI bad at news specifically?

Most AI tools handle synthesis tasks well. They draft emails, summarize meetings, explain concepts, and walk through technical material with reasonable accuracy. The gap on news is structural, not random.

Four reasons:

News changes faster than training data. Most models update their underlying knowledge on a slower cycle than the news cycle moves. Live search features bolt a retrieval layer on top, but the model still synthesizes what it retrieves through a static prior. When yesterday's headline contradicts last month's training, the model often picks the version it was trained on.
Confidence calibration is broken. OpenAI conceded in a September 2025 paper that its models are rewarded during training for guessing rather than admitting uncertainty. The same incentive structure that makes chatbots sound articulate makes them sound certain when they should not. Independent research from Columbia Journalism Review's Tow Center found premium chatbots produced more confidently incorrect answers than their free counterparts.
Source attribution is not enforced. A chatbot can cite a link without having read the article, paraphrase a syndicated copy as if it were the original, or invent a citation that looks plausible. The user sees a hyperlink and trusts it. The studies show the hyperlink is wrong roughly a third of the time.
News writing has voice. Reporting carries judgment about what is significant, what is contested, what counts as evidence, and what to leave out. Editorial decisions are baked into wording. A model that flattens that voice in summary will distort the story even when the literal facts survive.

The combination is unforgiving. A tool good at explaining the French Revolution can be unreliable on a story that broke this morning, because the news task has different failure modes from the synthesis task.

Which AI assistants are most accurate?

The October 2025 EBU study ranked the four leading assistants on the share of responses with significant issues:

Gemini. Significant issues in 76 percent of responses, sourcing errors in 72 percent. Worst performer by a wide margin.
ChatGPT. Significant issues in roughly a third of responses. Errors clustered in outdated information and confidently incorrect specifics.
Copilot. Significant issues in roughly a third of responses. Sourcing errors at 15 percent.
Perplexity. Best of the four on sourcing (15 percent) and overall reliability (around 30 percent significant issues). Still wrong about a third of the time.

The honest framing: there is a gap between Gemini and the rest, but the rest still fail roughly a third of the time. None of these tools is reliable enough for news. A 30 percent error rate would not be tolerated from a news source. The market is tolerating it from these tools because the interface feels like a search result rather than a journalist.
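
To see what a 30 percent rate means over repeated use, here is a back-of-the-envelope sketch. It assumes, purely for illustration, that each answer fails independently and at the same rate, which the studies do not claim. The function name is made up for this example.

```python
def chance_of_at_least_one_flawed(issue_rate: float, answers: int) -> float:
    """Probability that at least one of `answers` responses has a significant issue.

    Assumes independent failures at a constant rate (an illustrative assumption).
    """
    return 1.0 - (1.0 - issue_rate) ** answers


for n in (1, 3, 5, 10):
    p = chance_of_at_least_one_flawed(0.30, n)
    print(f"{n:>2} answers: {p:.0%} chance at least one has a significant issue")
```

Under those assumptions, a reader who asks five news questions has a better than four-in-five chance of receiving at least one answer with a significant issue.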

When are AI summaries actually fine?

The studies suggest a usable boundary. AI summaries hold up better for some tasks than others.

Where AI summaries are competitive with reading the underlying material:

Evergreen explainers. What is inflation, how does a parliamentary system work, what is the history of NATO. Settled topics with broad documentation and slow-moving facts.
Background context on long-covered stories. A war that has been reported on for two years, an ongoing legal case, a public figure with a decade-long record. Plenty of training data, stable framing, lower hallucination risk.
Concept translation. Explaining a technical news story (a Federal Reserve decision, a climate report, a court ruling) to a non-specialist. The model is doing translation work, not original reporting.

Where AI summaries are unreliable:

Breaking news. Anything that broke in the last 48 hours. The model is either guessing or pulling thin sourcing.
Direct quotes. One in eight quotes from cited articles are either altered or invented. Do not trust them without checking the source.
Recent personnel changes. Heads of state, CEOs, military leadership. Training-cutoff drift produces the most confidently wrong errors here.
Contested framing. Stories where editorial choices about what to call something matter. The model flattens the distinction.
Statistics and dates. Numbers and timestamps are the most-hallucinated specifics.

The general rule: AI summaries are useful where the underlying facts are stable and widely documented, unreliable where the story is fresh, the framing matters, or the specifics are precise. Use them where the failure modes do not apply.
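
The general rule can be written down as a checklist. The sketch below is one hypothetical way to encode it, not a tool from the studies. The `Query` fields and the function name are invented for illustration, and the 48-hour threshold is the one used in the list above.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Query:
    """What a reader is about to ask an AI assistant (hypothetical fields)."""
    hours_since_story_broke: float  # use a large number for evergreen topics
    relies_on_direct_quotes: bool = False
    involves_recent_personnel_change: bool = False
    framing_is_contested: bool = False
    relies_on_statistics_or_dates: bool = False


def verification_reasons(q: Query) -> List[str]:
    """Return the reasons a summary should be checked against a primary source."""
    reasons = []
    if q.hours_since_story_broke < 48:
        reasons.append("breaking news (under 48 hours old)")
    if q.relies_on_direct_quotes:
        reasons.append("direct quotes")
    if q.involves_recent_personnel_change:
        reasons.append("recent personnel change")
    if q.framing_is_contested:
        reasons.append("contested framing")
    if q.relies_on_statistics_or_dates:
        reasons.append("statistics or dates")
    return reasons


explainer = Query(hours_since_story_broke=24 * 365 * 10)
breaking = Query(hours_since_story_broke=6, relies_on_direct_quotes=True)

print(verification_reasons(explainer))  # no reasons flagged
print(verification_reasons(breaking))   # two reasons flagged
```

An empty list does not mean the summary is correct. It only means none of the documented failure modes obviously applies.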

What does a sane news stack look like in 2026?

The data points toward a small set of practices. Five rules:

Treat AI summaries as a starting point, not an answer. Use them to surface a topic or get fast context. Verify against a primary source before acting on any specific claim. The "use AI as a research assistant, then check" pattern handles most of the failure modes.
Read the underlying article when accuracy matters. If the topic is recent or sensitive, or the kind of thing that would change your behavior, click through to the source the assistant cites. Confirm the assistant did not invent or alter the quote.
Pick one human-edited input and read it daily. A defined-input news source you trust, written by editors who carry the legal and reputational risk of being wrong. Format matters less than the editorial layer.
Distrust confident specifics. Numbers, dates, names, quotes: assume them wrong until verified. Confidence in the chatbot's voice is not evidence of accuracy.
Avoid AI for breaking news. The first 48 hours after a major event are when AI summaries fail most. Read a human report instead.

These rules are simple. The reason they are not common practice is that the AI assistant interface is designed to feel like a finished answer, not a draft to verify. The reader has to apply the discipline. The product will not.
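
The quote check in the second rule is the easiest one to mechanize. The following is a minimal sketch, assuming the reader has already copied the article text from the cited source. Both function names are illustrative, and the comparison only normalizes case, curly quotes, and whitespace.

```python
import re
import unicodedata


def normalize(text: str) -> str:
    """Lowercase, straighten curly quotes, and collapse whitespace for comparison."""
    text = unicodedata.normalize("NFKC", text)
    for curly, straight in (("\u201c", '"'), ("\u201d", '"'),
                            ("\u2018", "'"), ("\u2019", "'")):
        text = text.replace(curly, straight)
    return re.sub(r"\s+", " ", text).strip().lower()


def quote_appears_in(quote: str, article_text: str) -> bool:
    """True only if the quote appears word for word in the article text."""
    return normalize(quote) in normalize(article_text)


# Made-up article text for illustration only.
article = "The minister said: \u201cWe will review the policy\nnext spring.\u201d"

print(quote_appears_in("We will review the policy next spring.", article))  # True
print(quote_appears_in("We will scrap the policy next spring.", article))   # False
```

A failed match is a prompt to read the article, not proof of fabrication, since a legitimately trimmed quote will also fail. A successful match says nothing about whether the quote was used in context.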

How informed.now thinks about AI in the news stack

We built informed.now on the bet that a daily brief, written and edited by humans, sent to a channel without an algorithm, would solve a different problem than the AI assistant. The AI assistant is built to maximize answer fluency. The brief is built to maximize signal-to-noise on the day's news.

The two are complementary, not interchangeable. Read the brief at 8 AM. Ask a chatbot for background on a topic you want to go deeper on. Verify the chatbot's specifics against the underlying coverage. What the data does not support is using the chatbot as the primary news input. The error rate is too high and the failure modes are too well-documented to delegate the news task to a tool that gets it wrong half the time.

The category will improve. The studies show progress between February and October 2025 within the same vendor. But improving and trustworthy are different things. The reader's defense in the meantime is the same as it has always been for any information source: pick the humans you trust, read them on a schedule, verify the rest.

The bigger pattern

The question "can you trust AI news summaries" has a clean answer in 2026. No, not as a primary news source. Yes, with verification, for specific tasks where the failure modes do not bite. The mistake is using one tool for both jobs and assuming the fluent voice is doing the work of an editor. It is not. Editors carry judgment. Chatbots carry probabilities. Read accordingly.
