Growing Wikipedia across Languages via Recommendations

We are developing systems that identify content gaps across Wikimedia projects, prioritize them, and recommend them to editors based on their interests.

Project overview

Wikipedia contains over 40 million articles across 293 language editions, but this content is not evenly distributed across languages. More importantly, there are major gaps in both the coverage and the quality of content across these languages.

As of 2018, only 10% of Wikipedia languages contain millions of articles, while 60% of them contain 10,000 or fewer articles. At the article level, the largest language editions are not without gaps either. Almost 40% of English Wikipedia articles are stub-level entries, with too little content to provide encyclopedic coverage of a subject. Only 1% of English Wikipedia consists of Good or Featured articles.

This project aims to address such gaps by using data mining and machine learning techniques to identify missing content across Wikimedia projects, prioritize it, and recommend it to editors based on their public edit histories.

Recent updates

  1. Outstanding Problem-Solution Paper Award

    Our paper describing methods for supporting new Wikipedia editors received the Outstanding Problem-Solution Paper Award at ICWSM '19.
  2. Eliciting New Wikipedia Users' Interests via Automatically Mined Questionnaires

    Our paper describing methods for supporting new Wikipedia editors by eliciting their interests and recommending related articles was presented at ICWSM '19.
  3. Structuring Wikipedia Articles with Section Recommendations

    Our paper describing methods for expanding Wikipedia articles by recommending sections sourced from similar articles has been accepted for publication at SIGIR '18.
  4. Using Wikipedia categories for research: opportunities, challenges, and solutions

    Our collaborator Tiziano Piccardi (EPFL) will present at our monthly showcase on language-agnostic methods to extract a pure hierarchy from the Wikipedia graph, part of our work on article expansion recommendations.
  5. Beyond Automatic Translation: Aligning Wikipedia sections across multiple languages

    In this showcase, Diego Sáez-Trumper presents research on cross-language section alignment for Wikipedia articles, using Wikidata and cross-lingual word embeddings, and explains how we're applying these results to improve section recommendations.
  6. Visual Enrichment of Collaborative Knowledge Bases

    Miriam Redi will present at our monthly showcase on opportunities in the use of machine learning and computer vision for the visual enrichment of collaborative knowledge bases, with results from a pilot for recommending high-quality Commons images to Wikidata items.
  7. The State of the Article Expansion Recommendation System

    In this showcase, Leila Zia gives a comprehensive overview of our work on recommender systems to grow and expand Wikipedia articles across languages, including the results of the first line of experiments on the quality of such recommendations.
  8. We are live in the Content Translation tool

    Our recommendation API is now integrated into the Content Translation tool. It is responsible for more than 10% of all articles created through the tool.
  9. Building an article expansion recommender

    We're kicking off a new project to design a recommendation system that identifies content missing from existing Wikipedia articles.
  10. GapFinder is launched

    We launched a tool that helps editors identify and contribute missing content across Wikipedia languages.
  11. Growing Wikipedia across languages: New paper

    We published a paper describing an end-to-end system to find, rank, and recommend missing articles across Wikipedia languages. We show that recommendations can increase the article creation rate by a factor of three without compromising quality.

Project team

Leila Zia, Miriam Redi, Diego Sáez-Trumper, Robert West


Michele Catasta (Stanford University), Jure Leskovec (Stanford University), Ashwin Paranjape (Stanford University), Tiziano Piccardi (EPFL), Ellery Wulczyn (Wikimedia Foundation)


Resources and links