Address Knowledge Gaps

We are developing systems that identify and address knowledge gaps across Wikimedia projects.


Project overview

In 2030, the world’s population is projected to be 8.6 billion, almost 80% of whom will live in Africa and Asia. Latin America’s population will continue to grow rapidly, while population growth in Europe and Northern America (today’s largest sources of contributors and readers for Wikimedia projects) will plateau. How can we help Wikimedia projects thrive in a world that is becoming increasingly different from the one we are building for today, in terms of both production and consumption of content?

The Wikimedia movement has identified supporting “the knowledge and communities that have been left out by structures of power and privilege” as a strategic goal. To meet this goal, we need to understand how to serve audiences, groups, and cultures that today are underrepresented in Wikipedia, Wikidata, Commons, and other Wikimedia projects in terms of participation, access, representation, and coverage.

We have begun to advance knowledge equity with a research program to address knowledge gaps. This program aims to deliver citable, peer-reviewed knowledge and new technology in order to generate baseline data on the diversity of the Wikimedia contributor population, understand reader needs across languages, remove barriers to contribution by underrepresented groups, and help contributors identify and expand missing content across languages and topics.

More information can be found in our white paper.

Recent updates

  1. The Wikipedia image/caption matching challenge

    We released a new dataset and Kaggle competition aimed at addressing missing captions for images on Wikipedia.
  2. Evaluating list building tools for ad-hoc topic models

    We published some initial insights on how language-agnostic topic models can help curate lists of articles related to specific topics across projects and languages.
  3. A Taxonomy of Knowledge Gaps for Wikimedia Projects (Second Draft)

    We published the second draft of the Knowledge Gaps Taxonomy, taking into account extensive feedback from the community.
  4. A Taxonomy of Knowledge Gaps for Wikimedia Projects (First Draft)

    We published the first draft of the Knowledge Gaps Taxonomy, our attempt at bringing structure to what we know about how to understand and address knowledge gaps in the Wikimedia projects.
  5. Global gender differences in Wikipedia readership

    We published a preprint (now accepted to ICWSM 2021) describing the gender distribution of readers on Wikipedia and how gender manifests in records of user behavior.
  6. Understanding Engagement with Images in Wikipedia

    We published some initial insights into how readers engage with image content in Wikipedia articles.
  7. Eliciting Interests Blogpost

    We published a blogpost describing our recent work in eliciting the interests of new editors.
  8. Outstanding Problem-Solution Paper Award

    Our paper describing methods for supporting new Wikipedia editors received the Outstanding Problem-Solution Paper Award at ICWSM '19.
  9. Eliciting New Wikipedia Users' Interests via Automatically Mined Questionnaires

    Our paper describing methods for supporting new Wikipedia editors by eliciting their interests and recommending related articles is presented at ICWSM '19.
  10. Dataset released for Why the World Reads Wikipedia

    We have released the dataset of reader motivations from many different countries accompanying our WSDM '19 paper: Why the World Reads Wikipedia.
  11. Why the World Reads Wikipedia: Beyond English Speakers

    Our WSDM '19 paper on Wikipedia reader motivations across many different language editions and countries is now available.
  12. Structuring Wikipedia Articles with Section Recommendations

    Our paper describing methods for Wikipedia article expansion through recommendation of sections sourced from similar articles is accepted for publication at SIGIR '18.
  13. Using Wikipedia categories for research: opportunities, challenges, and solutions

    Our collaborator Tiziano Piccardi (EPFL) will present at our monthly showcase on language-agnostic methods to extract a pure hierarchy from the Wikipedia graph, part of our work on article expansion recommendations.
  14. Beyond Automatic Translation: Aligning Wikipedia sections across multiple languages

    In this showcase, Diego Saez-Trumper presents research on cross-language section alignment for Wikipedia articles, using Wikidata and cross-lingual word embeddings, and how we're applying these results to improve section recommendations.
  15. Visual Enrichment of Collaborative Knowledge Bases

    Miriam Redi will present at our monthly showcase on opportunities in the use of machine learning and computer vision for the visual enrichment of collaborative knowledge bases, with results from a pilot for recommending high-quality Commons images to Wikidata items.
  16. Wikipedia explains how those late-night reading binges happen

    "Most people visit another link when they look up a topic on Wikipedia." Our study is featured in Engadget.
  17. The State of the Article Expansion Recommendation System

    In this showcase, Leila Zia gives a comprehensive overview of our work on recommender systems to grow and expand Wikipedia articles across languages, including the result of the first line of experiments on the quality of such recommendations.
  18. An overview of “Why we read Wikipedia”

    A video of a presentation for the Wikimedia Foundation's Metrics and Activities Meeting, with an overview of our research, its takeaways, and future directions.
  19. Reader behavior and motivations across 14 languages

    We repeated the 2016 study, this time in 14 languages. We collected more than 210,000 responses, which we are currently analyzing.
  20. Why we read Wikipedia: New paper

    We published a paper with the research methodology and the resulting taxonomy of use cases, and the associated behavioral patterns, of readers of English Wikipedia.
  21. Voice and exit in a voluntary work environment

    We kicked off a new research project and collaboration aiming to design experimental frameworks to identify and tackle the potential causes of women's lack of participation in Wikipedia.
  22. We are live in the Content Translation tool

    Our Recommendation API is now integrated in the Content Translation tool. It is responsible for more than 10% of all articles created through the Content Translation tool.
  23. Building an article expansion recommender

    We're kicking off a new project aiming to design a recommendation system to identify missing content from already existing Wikipedia articles.
  24. GapFinder is launched

    We launched a tool helping editors identify and contribute missing content across Wikipedia languages.
  25. Growing Wikipedia across languages: New paper

    We published a paper describing an end-to-end system to find, rank, and recommend missing articles across Wikipedia languages. We show that recommendations can increase the article creation rate by a factor of 3 without compromising on quality.


Resources and links