Address Knowledge Gaps

We are developing systems that identify and address gaps across Wikimedia projects.

Project overview

In 2030, the world’s population is projected to reach 8.6 billion, almost 80% of whom will live in Africa and Asia. Latin America’s population will continue to grow rapidly, while population growth in Europe and Northern America, today’s largest sources of contributors to and readers of Wikimedia projects, will plateau. How can we help Wikimedia projects thrive in a world that is becoming increasingly different from the one we are building for today, in terms of both the production and the consumption of content?

The Wikimedia movement has identified support for “the knowledge and communities that have been left out by structures of power and privilege” as a strategic goal. To meet this goal, we need to understand how to serve audiences, groups, and cultures that are underrepresented today in Wikipedia, Wikidata, Commons, and other Wikimedia projects in terms of participation, access, representation, and coverage.

We have begun to advance knowledge equity through a research program that addresses knowledge gaps. The program aims to deliver citable, peer-reviewed knowledge and new technology in order to generate baseline data on the diversity of the Wikimedia contributor population, understand reader needs across languages, remove barriers to contribution by underrepresented groups, and help contributors identify and expand missing content across languages and topics.

More information can be found in our roadmap.

Recent updates

Publications

Dale Zhou, Shubhankar P. Patankar, David M. Lydon-Staley, Perry Zurn, Martin Gerlach, Dani S. Bassett. 2024. Architectural styles of curiosity in global Wikipedia mobile app readership. Science Advances. 10, eadn3268.
Tomás Feith, Akhil Arora, Martin Gerlach, Debjit Paul, Robert West. 2024. Entity Insertion in Multilingual Linked Corpora: The Case of Wikipedia. Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing (EMNLP '24).
Paramita Das, Isaac Johnson, Diego Saez-Trumper, Pablo Aragón. 2024. Language-Agnostic Modeling of Wikipedia Articles for Content Quality Assessment across Languages. Proceedings of the Eighteenth International AAAI Conference on Web and Social Media (ICWSM '24).
Mo Houtti, Isaac Johnson, Morten Warncke-Wang, Loren Terveen. 2024. Leveraging Recommender Systems to Reduce Content Gaps on Peer Production Platforms. Proceedings of the Eighteenth International AAAI Conference on Web and Social Media (ICWSM '24).
Akhil Arora, Robert West, Martin Gerlach. 2024. Orphan Articles: The Dark Matter of Wikipedia. Proceedings of the Eighteenth International AAAI Conference on Web and Social Media (ICWSM '24).
Tiziano Piccardi, Martin Gerlach, Robert West. 2024. Curious Rhythms: Temporal Regularities of Wikipedia Consumption. Proceedings of the Eighteenth International AAAI Conference on Web and Social Media (ICWSM '24).
Morten Warncke-Wang, Rita Ho, Marshall Miller, Isaac Johnson. 2023. Increasing Participation in Peer Production Communities with the Newcomer Homepage. Proc. ACM Hum.-Comput. Interact. (CSCW '23). https://doi.org/10.1145/3610071
Tiziano Piccardi, Martin Gerlach, Akhil Arora, Robert West. 2023. A Large-Scale Characterization of How Readers Browse Wikipedia. ACM Transactions on the Web. https://doi.org/10.1145/3580318
Akhil Arora, Martin Gerlach, Tiziano Piccardi, Alberto García-Durán, Robert West. 2022. Wikipedia Reader Navigation: When Synthetic Data Is Enough. Proceedings of the Fifteenth ACM International Conference on Web Search and Data Mining (WSDM '22).
Narges Azizifard, Lodewijk Gelauff, Jean-Olivier Gransard-Desmond, Miriam Redi, Rossano Schifanella. 2022. Wiki Loves Monuments: Crowdsourcing the Collective Image of the Worldwide Built Heritage. J. Comput. Cult. Herit. 16, 1, Article 20 (March 2023), 27 pages. https://doi.org/10.1145/3569092
Mo Houtti, Isaac Johnson, Joel Cepeda, Soumya Khandelwal, Aviral Bhatnagar, Loren Terveen. 2022. "We Need a Woman in Music": Exploring Wikipedia's Values on Article Priority. 25th ACM Conference On Computer-Supported Cooperative Work And Social Computing (CSCW '22). https://doi.org/10.1145/3555156
Tiziano Piccardi, Martin Gerlach, Robert West. 2022. Going Down the Rabbit Hole: Characterizing the Long Tail of Wikipedia Reading Sessions. WikiWorkshop 2022: In Companion Proceedings of The Web Conference 2022 (WWW '22).
Pablo Beytía, Pushkal Agarwal, Miriam Redi, Vivek K. Singh. 2022. Visual Gender Biases in Wikipedia: A Systematic Evaluation across the Ten Most Spoken Languages. Proceedings of the Sixteenth International AAAI Conference on Web and Social Media (ICWSM '22).
Daniele Rama, Tiziano Piccardi, Miriam Redi, Rossano Schifanella. 2022. A Large Scale Study of Reader Interactions with Images on Wikipedia. EPJ Data Science. 11, Article 1. https://doi.org/10.1140/epjds/s13688-021-00312-8
Martin Gerlach, Marshall Miller, Rita Ho, Kosta Harlan, Djellel Difallah. 2021. A Multilingual Entity Linking System for Wikipedia with a Machine-in-the-Loop Approach. 30th ACM International Conference on Information and Knowledge Management (CIKM '21).
Isaac Johnson, Florian Lemmerich, Diego Sáez-Trumper, Robert West, Markus Strohmaier, Leila Zia. 2021. Global gender differences in Wikipedia readership. Proceedings of the Fifteenth International AAAI Conference on Web and Social Media (ICWSM '21).
Miriam Redi, Martin Gerlach, Isaac Johnson, Jonathan Morgan, Leila Zia. 2021. A Taxonomy of Knowledge Gaps for Wikimedia Projects (Second Draft).
Oleksii Moskalenko, Denis Parra, Diego Saez-Trumper. 2020. Scalable Recommendation of Wikipedia Articles to Editors Using Representation Learning. ComplexRec 2020, Workshop on Recommendation in Complex Scenarios at the ACM RecSys Conference on Recommender Systems (RecSys 2020).
Miriam Redi, Martin Gerlach, Isaac Johnson, Jonathan Morgan, Leila Zia. 2020. A Taxonomy of Knowledge Gaps for Wikimedia Projects (First Draft).
Valerio Lorini, Javier Rando, Diego Saez-Trumper, Carlos Castillo. 2020. Uneven Coverage of Natural Disasters in Wikipedia: The Case of Floods. 17th International Conference on Information Systems for Crisis Response and Management (ISCRAM 2020).
Kateryna Liubonko, Diego Sáez-Trumper. 2020. Matching Ukrainian Wikipedia Red Links with English Wikipedia’s Articles. WikiWorkshop 2020: In Companion Proceedings of the Web Conference 2020 (WWW '20). https://doi.org/10.1145/3366424.3383571
Ramtin Yazdanian, Leila Zia, Jonathan Morgan, Bahodir Mansurov, Robert West. 2019. Eliciting New Wikipedia Users’ Interests via Automatically Mined Questionnaires: For a Warm Welcome, Not a Cold Start. Proceedings of the Thirteenth International AAAI Conference on Web and Social Media (ICWSM '19).
Florian Lemmerich, Diego Sáez-Trumper, Robert West, Leila Zia. 2019. Why the World Reads Wikipedia: Beyond English Speakers. International ACM Conference on Web Search and Data Mining (WSDM '19). https://doi.org/10.1145/3289600.3291021
Tiziano Piccardi, Michele Catasta, Leila Zia, Robert West. 2018. Structuring Wikipedia Articles with Section Recommendations. Proceedings of the 41st International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR '18).
Philipp Singer, Florian Lemmerich, Robert West, Leila Zia, Ellery Wulczyn, Markus Strohmaier, Jure Leskovec. 2017. Why We Read Wikipedia. Proceedings of the 26th International Conference on World Wide Web (WWW '17). https://doi.org/10.1145/3038912.3052716
Ashwin Paranjape, Robert West, Leila Zia, Jure Leskovec. 2016. Improving Website Hyperlink Structure Using Server Logs. Proceedings of the Ninth ACM International Conference on Web Search and Data Mining (WSDM '16). ACM, New York, NY, USA, 615-624. https://doi.org/10.1145/2835776.2835832
Ellery Wulczyn, Robert West, Leila Zia, Jure Leskovec. 2016. Growing Wikipedia Across Languages via Recommendation. Proceedings of the 25th International Conference on World Wide Web (WWW '16). International World Wide Web Conferences Steering Committee, Republic and Canton of Geneva, Switzerland, 975-985. https://doi.org/10.1145/2872427.2883077