Improve Knowledge Integrity

We are working to extend the verifiability of content and increase resilience to misinformation.


Project overview

The strategic direction of “Knowledge as a Service” envisions a world in which platforms and tools are available to allies and partners to “organize and exchange free, trusted knowledge beyond Wikimedia”. Achieving this goal requires not only new infrastructure for representing, curating, linking, and disseminating knowledge, but also efficient and scalable strategies to preserve the reliability and integrity of this knowledge. Technology platforms across the web look to Wikipedia as a neutral arbiter of information, but as Wikimedia aspires to extend its scope and scale, the possibility that parties with special interests will manipulate content, or that bias will go undetected, becomes material.

We have been leading projects to help our communities represent, curate, and understand information provenance in Wikimedia projects more efficiently. We are conducting novel research on why editors source information and how readers access sources; we are developing algorithms to identify statements in need of sources and gaps in information provenance; and we are designing data structures to represent, annotate, and analyze source metadata in machine-readable formats, as well as tools to monitor, in real time, changes made to references across the Wikimedia ecosystem.

More information can be found in our white paper.

Recent updates

  1. A Comparative Study of Reference Reliability in Multiple Language Editions of Wikipedia

    We quantify the cross-lingual patterns of the perennial sources list, a collection of reliability labels for web domains identified and collaboratively agreed upon by Wikipedia editors.
  2. Reference Quality in English Wikipedia

    We operationalize the notion of reference quality by defining reference need (RN), i.e., the percentage of sentences missing a citation, and reference risk (RR), i.e., the proportion of non-authoritative references.
  3. Fair Multilingual Vandalism Detection System for Wikipedia

    We are building the next generation of ML tools for Knowledge Integrity. The model is now in production. Please check our most recent paper explaining the research behind this new tool.
  4. Designing Trust Indicators on Wikipedia

    Watch the recorded talk for our new paper on designing trust indicators for readers of Wikipedia at CHI 2022.
  5. Controversial Content in Wikidata

    We released a report studying where we find (or don't find) "controversy" in Wikidata in terms of disputed content, collaboration, and edit wars.
  6. Disinformation and AI on Wikipedia

    We released a blog post discussing the use of AI for addressing disinformation and why Wikimedia's approach is different from that of many social media platforms.
  7. A Large Scale Dataset for Content Reliability on Wikipedia

    We released Wiki-Reliability, a dataset of articles with reliability concerns on English Wikipedia, for training language models to detect content reliability issues.
  8. Tracking Knowledge Propagation Across Wikipedia Languages

    We present a dataset of inter-language knowledge propagation in Wikipedia.
  9. Social Media Traffic Report

    We are piloting a daily report of the most-visited articles on English Wikipedia from Reddit, YouTube, Twitter, and Facebook.
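To make the reference-quality metrics from item 2 concrete, here is a minimal sketch of how reference need (RN) and reference risk (RR) could be computed. This is an illustration only: the input representation (per-sentence citation flags and per-reference authoritativeness labels) and the sample data are hypothetical, not the pipeline used in the paper.

```python
def reference_need(sentences):
    """RN: percentage of sentences missing a citation.

    `sentences` is a hypothetical list of dicts with a boolean
    'has_citation' key, one entry per article sentence.
    """
    missing = sum(1 for s in sentences if not s["has_citation"])
    return 100.0 * missing / len(sentences)


def reference_risk(references):
    """RR: proportion of references labelled non-authoritative.

    `references` is a hypothetical list of dicts with a boolean
    'authoritative' key, e.g. derived from a domain reliability list.
    """
    risky = sum(1 for r in references if not r["authoritative"])
    return risky / len(references)


# Toy example: 4 sentences, 2 without citations; 3 references, 1 risky.
sentences = [
    {"has_citation": True},
    {"has_citation": False},
    {"has_citation": True},
    {"has_citation": False},
]
references = [
    {"authoritative": True},
    {"authoritative": False},
    {"authoritative": True},
]

rn = reference_need(sentences)   # 50.0 (percent)
rr = reference_risk(references)  # ~0.333 (proportion)
```

A lower RN and RR would indicate better-sourced content; in practice, the sentence labels would come from a citation-need classifier and the reference labels from a source-reliability list, rather than being given directly.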


Resources and links