Improve Knowledge Integrity

We are working to extend the verifiability of content and increase resilience to misinformation.


Project overview

The strategic direction of “Knowledge as a Service” envisions a world in which platforms and tools are available to allies and partners to “organize and exchange free, trusted knowledge beyond Wikimedia”. Achieving this goal requires not only new infrastructure for representing, curating, linking, and disseminating knowledge, but also efficient and scalable strategies to preserve the reliability and integrity of this knowledge. Technology platforms across the web look to Wikipedia as a neutral arbiter of information, but as Wikimedia aspires to extend its scope and scale, the possibility that parties with special interests will manipulate content, or that bias will go undetected, becomes material.

We have been leading projects to help our communities represent, curate, and understand information provenance in Wikimedia projects more efficiently. We are conducting novel research on why editors source information and how readers access sources; we are developing algorithms to identify statements in need of sources and gaps in information provenance; and we are designing data structures to represent, annotate, and analyze source metadata in machine-readable formats, as well as tools to monitor, in real time, changes made to references across the Wikimedia ecosystem.

More information can be found in our white paper.

Recent updates

  1. A Comparative Study of Reference Reliability in Multiple Language Editions of Wikipedia

    We quantify the cross-lingual patterns of the perennial sources list, a collection of reliability labels for web domains identified and collaboratively agreed upon by Wikipedia editors.
  2. Reference Quality in English Wikipedia

    We operationalize the notion of reference quality by defining reference need (RN), i.e., the percentage of sentences missing a citation, and reference risk (RR), i.e., the proportion of non-authoritative references.
  3. Fair Multilingual Vandalism Detection System for Wikipedia

    We are building the next generation of ML tools for Knowledge Integrity. The model is now in production. Please check our most recent paper explaining the research behind this new tool.
  4. Designing Trust Indicators on Wikipedia

    Watch the recorded talk for our new paper on designing trust indicators for readers of Wikipedia at CHI 2022.
  5. Controversial Content in Wikidata

    We released a report studying where we find (or don't find) "controversy" in Wikidata in terms of disputed content, collaboration, and edit wars.
  6. Disinformation and AI on Wikipedia

    We released a blog post discussing the use of AI for addressing disinformation and why Wikimedia's approach is different from that of many social media platforms.
  7. A Large Scale Dataset for Content Reliability on Wikipedia

    We released Wiki-Reliability, a dataset of articles with reliability concerns on English Wikipedia, for training language models to detect content reliability issues.
  8. Tracking Knowledge Propagation Across Wikipedia Languages

    We present a dataset of inter-language knowledge propagation in Wikipedia.
  9. Social Media Traffic Report

    We are piloting a daily report of the most-visited articles on English Wikipedia from Reddit, YouTube, Twitter, and Facebook.
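To make the reference-quality metrics from item 2 concrete, here is a minimal sketch of how reference need (RN) and reference risk (RR) could be computed. This is an illustration only: the input representation (per-sentence citation flags and per-reference authoritativeness labels) and the sample data are hypothetical, not the pipeline used in the paper.

```python
def reference_need(sentences):
    """RN: percentage of sentences missing a citation.

    `sentences` is a hypothetical list of dicts with a boolean
    'has_citation' key, one entry per article sentence.
    """
    missing = sum(1 for s in sentences if not s["has_citation"])
    return 100.0 * missing / len(sentences)


def reference_risk(references):
    """RR: proportion of references labelled non-authoritative.

    `references` is a hypothetical list of dicts with a boolean
    'authoritative' key, e.g. derived from a domain reliability list.
    """
    risky = sum(1 for r in references if not r["authoritative"])
    return risky / len(references)


# Toy example: 4 sentences, 2 without citations; 3 references, 1 risky.
sentences = [
    {"has_citation": True},
    {"has_citation": False},
    {"has_citation": True},
    {"has_citation": False},
]
references = [
    {"authoritative": True},
    {"authoritative": False},
    {"authoritative": True},
]

rn = reference_need(sentences)   # 50.0 (percent)
rr = reference_risk(references)  # ~0.333 (proportion)
```

A lower RN and RR would indicate better-sourced content; in practice, the sentence labels would come from a citation-need classifier and the reference labels from a source-reliability list, rather than being given directly.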


Resources and links