Do fact-checkers actually contradict each other? What a study in Harvard's Misinformation Review found.

Published June 9, 2026 · 5 min read

6.5%

of claims overlap between Snopes and PolitiFact

~0.1%

genuine contradiction rate (1 in 749 matching claims)

13.1%

of matching claims differ only because of a rating-scale artifact

The critique comes up constantly: fact-checkers can't be trusted because they disagree with each other. It sounds compelling, but a 2023 study published in the Harvard Kennedy School (HKS) Misinformation Review tested the claim against actual data, and the finding cuts against the conventional wisdom.

Most fact-checkers are checking different things entirely

Researchers Lee, Xiong, Seo, and Lee scraped the full published corpora of four major fact-checking organizations — Snopes, PolitiFact, Logically, and the Australian Associated Press FactCheck — and built an automated system to detect when two checkers had reviewed the same underlying claim. Across Snopes and PolitiFact from 2016 to 2022, representing more than 22,000 articles combined, only 749 claims overlapped. That is a 6.5% overlap rate.

This number matters for the disagreement debate. Most apparent conflicts between fact-checkers aren't conflicts at all: the two organizations simply aren't addressing the same claim. The fact-checking landscape is fragmented and diverse, with each organization following its own agenda-setting process and geographic focus. Perceived disagreement between checkers is often a selection effect, not a judgment conflict.

When they do check the same claim, they almost never actually disagree

Of the 749 matching claims, 69.6% had identical ratings and 74.2% showed identical veracity judgments at the outset. The researchers then manually examined every disagreement to find its cause. After that analysis, only 1 claim out of 749 — approximately 0.1% — was a genuine contradiction: a real dispute over the interpretation of evidence. That single case involved a contextual disagreement over a Ben Carson quote.

"When multiple fact-checking organizations consistently agree… the public is more likely to trust their assessments."

Lee et al. (2023), HKS Misinformation Review — paraphrasing the study's framing of corroboration as the clearest trust signal in the research.

Most "disagreement" is a rating-scale artifact — not a judgment conflict

So where does the perceived disagreement come from? The study's breakdown of the 228 rating-level discrepancies is instructive (percentages are shares of all 749 matching claims):

• 98 cases (13.1%, or about 43% of the 228 discrepancies) were caused entirely by mismatched rating scales. Snopes uses fine-grained category labels like Miscaptioned, Scam, and Satire that don't map cleanly onto PolitiFact's six-point Truth-O-Meter. The underlying judgments were often identical; the stamps looked different.
• 59 cases (7.9%) reflected different emphasis or focus on the same underlying fact: not opposing conclusions, but different framings.
• 57 cases (7.6%) were similar but subtly distinct claims. The automated matching system caught topic-level overlap that human reviewers would not have coded as the same statement.
• 13 cases (1.7%) were timing differences: one checker rated "Unproven" before sufficient evidence existed; the other rated "False" days later once evidence had emerged.
• 1 case (0.1%) was a genuine contradiction, a real dispute over the meaning of evidence.

In other words, the dominant source of apparent disagreement is not that fact-checkers interpret evidence differently. It is that they use incompatible verdict labels, which makes the same judgment look like different judgments.
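The arithmetic behind this breakdown is easy to check. The following is a minimal Python sketch that uses only the counts quoted above; the dictionary keys are shorthand chosen for this example, not the study's own category names.

```python
# Counts quoted in the breakdown above. The keys are shorthand labels
# invented for this sketch, not terminology from the study.
MATCHING_CLAIMS = 749

discrepancy_counts = {
    "rating_scale_mismatch": 98,
    "different_emphasis": 59,
    "similar_but_distinct_claims": 57,
    "timing_difference": 13,
    "genuine_contradiction": 1,
}

total_discrepancies = sum(discrepancy_counts.values())


def share(count: int, base: int) -> float:
    """Return count as a percentage of base, rounded to one decimal place."""
    return round(100 * count / base, 1)


for label, count in discrepancy_counts.items():
    print(
        f"{label:<28} {count:>3}  "
        f"{share(count, MATCHING_CLAIMS):>5}% of matching claims  "
        f"{share(count, total_discrepancies):>5}% of discrepancies"
    )

# Claims with no rating discrepancy at all.
identical = MATCHING_CLAIMS - total_discrepancies
print(f"identical ratings: {share(identical, MATCHING_CLAIMS)}% of matching claims")
```

The five counts sum to 228, and the 521 remaining claims give the 69.6% identical-rating figure. The rating-scale cases are 13.1% of all matching claims, which works out to roughly 43% of the discrepancies themselves.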

What this research means for how FactGuard works

These findings directly shaped FactGuard's approach. Three specific choices trace back to what the research shows.

Corroboration receipts, not just verdicts

The study's clearest practical recommendation is that corroboration — independent checkers reaching the same conclusion — is the strongest trust signal available. Rather than asking you to trust a single automated verdict, FactGuard surfaces cross-checker agreement where it exists: when IFCN-accredited fact-checking organizations have independently reached the same conclusion on a claim, we show it, with links to each review. The research is direct on this point: "fact-checking articles gain value from the confirmation through multiple fact checks by different organizations."
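As a rough illustration of the idea (not FactGuard's production code), corroboration can be treated as a grouping problem: collect reviews per claim and surface a claim only when at least two distinct organizations reached the same verdict. The `Review` fields, the `corroborated` helper, the `min_orgs` threshold, and the example.com URLs below are all hypothetical, and the sketch assumes verdicts have already been normalized to a shared label set.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Review:
    claim_id: str      # hypothetical identifier for a matched claim
    organization: str  # the fact-checking organization that published it
    verdict: str       # assumed to be normalized to a shared label set
    url: str           # link to the published review


def corroborated(reviews: List[Review], min_orgs: int = 2) -> Dict[str, List[Review]]:
    """Return claims where every review agrees and >= min_orgs organizations reviewed."""
    by_claim: Dict[str, List[Review]] = defaultdict(list)
    for review in reviews:
        by_claim[review.claim_id].append(review)

    result: Dict[str, List[Review]] = {}
    for claim_id, group in by_claim.items():
        verdicts = {r.verdict for r in group}
        organizations = {r.organization for r in group}
        # Keep the claim only if there is a single shared verdict
        # backed by enough independent organizations.
        if len(verdicts) == 1 and len(organizations) >= min_orgs:
            result[claim_id] = group
    return result


# Made-up sample data for illustration only.
reviews = [
    Review("claim-1", "Checker A", "False", "https://example.com/a/claim-1"),
    Review("claim-1", "Checker B", "False", "https://example.com/b/claim-1"),
    Review("claim-2", "Checker A", "Mixed", "https://example.com/a/claim-2"),
]

for claim_id, receipts in corroborated(reviews).items():
    orgs = ", ".join(sorted({r.organization for r in receipts}))
    print(f"{claim_id}: {receipts[0].verdict} (agreed by {orgs})")
```

With this sample data only claim-1 is surfaced, because claim-2 has a single reviewer. A claim with conflicting verdicts is left out entirely instead of being resolved by majority vote, which keeps the signal conservative.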

Small, mappable verdict bands

The finding that 13.1% of matching claims (98 of the 228 apparent disagreements) differed only as a rating-scale artifact, not in judgment, is a warning against fine-grained, non-comparable verdict taxonomies. FactGuard uses a small, consistent set of verdict labels: True, Mostly True, Mixed, Mostly False, False, and Not Yet Verified. The nuance belongs in the supporting receipts and limitations, not in a proliferating label taxonomy that manufactures phantom conflict.
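To make the rating-scale problem concrete, here is a minimal Python sketch of normalizing checker-specific labels into the six bands listed above before comparing them. The crosswalk is a hypothetical mapping written for this example: it is not the study's coding scheme, not an official Snopes or PolitiFact equivalence, and not FactGuard's actual mapping. Labels missing from the crosswalk are deliberately left unmapped instead of guessed.

```python
from typing import Dict, Optional, Tuple

# The six verdict bands listed above.
BANDS = ("True", "Mostly True", "Mixed", "Mostly False", "False", "Not Yet Verified")

# HYPOTHETICAL crosswalk, for illustration only. Each entry is an assumption
# made for this sketch, not a published equivalence between rating scales.
CROSSWALK: Dict[Tuple[str, str], str] = {
    ("politifact", "true"): "True",
    ("politifact", "mostly true"): "Mostly True",
    ("politifact", "half true"): "Mixed",
    ("politifact", "mostly false"): "Mostly False",
    ("politifact", "false"): "False",
    ("politifact", "pants on fire"): "False",
    ("snopes", "true"): "True",
    ("snopes", "false"): "False",
    ("snopes", "miscaptioned"): "False",
    ("snopes", "unproven"): "Not Yet Verified",
}


def to_band(checker: str, label: str) -> Optional[str]:
    """Map a checker-specific label to a shared band, or None if unmapped."""
    return CROSSWALK.get((checker.strip().lower(), label.strip().lower()))


def same_judgment(a: Tuple[str, str], b: Tuple[str, str]) -> Optional[bool]:
    """Compare two (checker, label) pairs after normalization.

    Returns None when either label is unmapped, so the pair can be routed
    to human review rather than being counted as agreement or conflict.
    """
    band_a, band_b = to_band(*a), to_band(*b)
    if band_a is None or band_b is None:
        return None
    return band_a == band_b


# The raw labels differ, but the normalized bands match.
print(same_judgment(("snopes", "Miscaptioned"), ("politifact", "False")))
# "Satire" is not in this sketch's crosswalk, so no comparison is made.
print(same_judgment(("snopes", "Satire"), ("politifact", "False")))
```

The first comparison prints True and the second prints None. Comparing the raw strings would have reported a disagreement in both cases, which is the kind of phantom conflict a small shared label set is meant to avoid.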

Transparent sources for every verdict

The study specifically praises fact-checkers who publish "detailed and comprehensive investigations that include multiple sources and references" — and treats that transparency as the mechanism of credibility, not an add-on. Every FactGuard verdict links to the cited primary sources and shows the reasoning that produced the result. A verdict without visible reasoning is just a stamp. We call our approach "receipts, not stamps."

What the study doesn't resolve

The research is encouraging for confidence in major fact-checking organizations, but it has limits worth being clear about. The high agreement rate applies to cases where two checkers have both reviewed the same claim — a small fraction of all claims checked. The long tail of novel, breaking, or niche claims arrives without any prior independent verdict to compare against. That is exactly why FactGuard's "Not Yet Verified" result is a first-class outcome rather than an error state: returning an honest "the evidence is insufficient to assess this confidently" is more useful than forcing a verdict from thin data.

The research also documents that agreement drops in the ambiguous middle ground — claims that are partially true, context-dependent, or genuinely contested. Overconfident verdicts on exactly these claims are where fact-checking most commonly fails. Calibrated confidence is not a hedge; it is the more accurate answer.

Source cited in this article (CC BY 4.0):
Lee, S., Xiong, A., Seo, H., & Lee, D. (2023). "'Fact-checking' fact checkers: A data-driven approach." Harvard Kennedy School (HKS) Misinformation Review, 4(5). DOI: 10.37016/mr-2020-126. Open access at HKS Misinformation Review. Licensed under Creative Commons Attribution 4.0 (CC BY 4.0). All statistics and findings cited above are drawn from this paper. Interpretations and product choices described here are FactGuard's own.

Published 2026-06-09 · Last reviewed 2026-06-09

For general information only — not legal, medical, or financial advice. See our methodology for how FactGuard verifies claims.
