Improve Knowledge Integrity

We are working to extend the verifiability of content and increase resilience to misinformation.

Project overview

The strategic direction of “Knowledge as a Service” envisions a world in which platforms and tools are available to allies and partners to “organize and exchange free, trusted knowledge beyond Wikimedia”. Achieving this goal requires not only new infrastructure for representing, curating, linking, and disseminating knowledge, but also efficient and scalable strategies to preserve the reliability and integrity of this knowledge. Technology platforms across the web are looking at Wikipedia as the neutral arbiter of information, but as Wikimedia aspires to extend its scope and scale, the possibility that parties with special interests will manipulate content, or that bias will go undetected, becomes material.

We have been leading projects to help our communities represent, curate, and understand information provenance in Wikimedia projects more efficiently. We are conducting novel research on why editors source information and how readers access sources; we are developing algorithms to identify statements in need of sources and gaps in information provenance; and we are designing data structures to represent, annotate, and analyze source metadata in machine-readable formats, as well as tools to monitor, in real time, changes made to references across the Wikimedia ecosystem.

More information can be found in our white paper.

Recent updates

  1. Disinformation and AI on Wikipedia

    We released a blog post discussing the use of AI for addressing disinformation and why Wikimedia's approach differs from that of many social media platforms.
  2. A Large Scale Dataset for Content Reliability on Wikipedia

    We release Wiki-Reliability, a dataset of articles with reliability concerns on English Wikipedia for training language models to detect content reliability issues.
  3. Tracking Knowledge Propagation Across Wikipedia Languages

    We present a dataset of inter-language knowledge propagation in Wikipedia.
  4. Social Media Traffic Report

    We are piloting a daily report of the most-visited articles on English Wikipedia from Reddit, YouTube, Twitter, and Facebook.
  5. Patrolling on Wikipedia

    We released a report on challenges around patrolling, vandalism, and related editor workflows.
  6. Citation needed coverage

    VentureBeat covered our Citation Needed study of which statements in English Wikipedia lack citations and why.
  7. Citation needed blog post

    We are using machine learning to predict whether, and why, any given sentence on Wikipedia may need a citation, to help editors identify content that violates the verifiability policy.
  8. Reader trust survey

    The first round of surveys went out for research on the role of citations in how readers evaluate Wikipedia articles.
  9. WikiCite 2018

    The third annual WikiCite conference wrapped up in Berkeley, California. Stay tuned for reports.
  10. WikiCite at TechStorm

    Miriam Redi and Antonin Delpeuch presented some fun with WikiCite in Wikidata.
  11. WikiCiteVis: exploring citations of Wikipedia

    Find out how scholarly articles are cited on Wikipedia with WikiCiteVis.
  12. Accessibility of Wikipedia references

    How many Wikipedia references are available to read? We measured the proportion of open access sources across languages and topics.
  13. Characterizing Wikipedia Citation Usage

    We're starting a new collaboration with researchers at Stanford University and EPFL to understand the role of external citations among Wikipedia readers.
  14. Wikipedia’s top-cited scholarly articles — revealed

    "Gene collections and astronomy studies dominate the list of the most-cited publications with DOIs on the popular online encyclopaedia." Nature on our dataset of citations by identifier in Wikipedia.
  15. The most-cited authors of Wikipedia had no idea

    "A single academic paper, published by three Australian researchers in 2007, has been cited by Wikipedia editors over 2.8 million times. And the researchers behind it didn't have a clue." Our dataset and analysis of citations by identifier in Wikipedia got featured in Wired.
  16. What are the ten most cited sources on Wikipedia? Let’s ask the data.

    We released a dataset with fifteen million records, documenting source usage in Wikipedia by identifier across nearly 300 languages.
  17. Unsourced statements in Wikipedia

    We are kicking off a new project and a collaboration with a team at Leibniz Universität Hannover to identify statements in Wikipedia that need an inline citation to a reliable source, using a machine-assisted framework.
  18. The WikiCite 2017 report

    We published our annual report, highlighting the accomplishments the community and our network of partner organizations have achieved this past year.
  19. Citations with context

    We published a dataset representing structured metadata and contextual information about every reference added in the history of English Wikipedia.
  20. Unlocking citations from tens of millions of scholarly papers

    We gave a keynote on our progress in liberating open citation data, and reusing it in projects like Wikidata, at SWIB17, the 2017 Conference on Semantic Web in Libraries.
  21. Wikidata as a structured repository of bibliographic data

    A video of our session on WikiCite at WikidataCon 2017 and an overview of why we're building an open knowledge base of citable sources to support free knowledge.
  22. Sockpuppet detection in Wikimedia projects

    We started a formal collaboration with researchers at Stanford University to design and evaluate algorithmic strategies for identifying potential sockpuppet accounts on Wikipedia. The aim is to develop high-precision detection models using previously identified malicious sockpuppets.
  23. Initiative for Open Citations

    We launched the Initiative for Open Citations (I4OC): an advocacy initiative and coalition co-founded by the Wikimedia Foundation, promoting the unrestricted availability of citation data.


Resources and links