Address Knowledge Gaps

We are developing systems that identify and address gaps across Wikimedia projects.

Project overview

In 2030, the world’s population is projected to be 8.6 billion, almost 80% of which will live in Africa and Asia. Latin America’s population will continue to grow rapidly while population growth in Europe and Northern America—today’s largest sources of contributors and readership to Wikimedia projects—will plateau. How can we help Wikimedia projects thrive in a world that is becoming increasingly different from the one we are building for today, both in terms of production and consumption of content?

The Wikimedia movement has identified as a strategic goal supporting “the knowledge and communities that have been left out by structures of power and privilege”. In order to meet this goal, we need to understand how to serve audiences, groups, and cultures that today are underrepresented in Wikipedia, Wikidata, Commons and other Wikimedia projects—in terms of participation, access, representation, and coverage.

We have begun to advance knowledge equity with a research program to address knowledge gaps. This program aims to deliver citable, peer-reviewed knowledge and new technology in order to generate baseline data on the diversity of the Wikimedia contributor population, understand reader needs across languages, remove barriers for contribution by underrepresented groups, and help contributors identify and expand missing content across languages and topics.

More information can be found in our roadmap.

Recent updates

  1. A multilingual model for entity insertion in Wikipedia articles

    We published a new paper at EMNLP '24 in which we develop a model for automatically locating a suitable position for a new link in a Wikipedia article. This could support editors in cases where a suitable anchor text does not yet exist.
  2. Curiosity of Wikipedia readers

    We published a new paper in Science Advances where we uncover complex patterns of Wikipedia navigation and characterize reader curiosity types.
  3. A multilingual model for measuring readability

    We published a new paper at ACL '24 where we develop a multilingual model to score the readability of Wikipedia articles across languages.
  4. Language-Agnostic Modeling of Wikipedia Article Quality

    We published a dataset paper in ICWSM '24 with the results of applying our language-agnostic article quality model to millions of revisions from over 300 language editions of Wikipedia.
  5. Recommender Systems to Reduce Content Gaps

    We published a paper in ICWSM '24 describing an experiment to explore the potential role of recommender systems in reducing content gaps on Wikipedia.
  6. Impact of the Newcomer Homepage

    We published a new paper showing how the Newcomer Homepage has increased participation amongst newcomers to Wikipedia.
  7. Temporal Regularities of Wikipedia Consumption

    We published a new paper about the temporal patterns in how readers access Wikipedia articles, which helps us understand the diversity of their information needs.
  8. Orphan Articles: The Dark Matter of Wikipedia

    We published a new paper about the surprisingly large number of orphan articles in Wikipedia and how to improve their visibility.
  9. Leveraging Recommender Systems to Reduce Content Gaps on Peer Production Platforms

    We published a new paper on editors' willingness to accept more diverse recommendations, which shows promise for using recommender systems to address content gaps.

Publications

Resources and links