Research Report Nº 9
The ninth in a series of biannual reports from Wikimedia Research, published every June and December.
Addressing knowledge gaps
We develop models and insights using scientific methods to identify, measure, and bridge Wikimedia’s knowledge gaps.
A deeper understanding of content contributions
With the availability of the scalable edit types library, our research on Wikipedia Edit Types has entered the next stage. We have started experimenting with large language models to auto-generate edit summary recommendations for edits that changed textual content on English Wikipedia. We submitted a paper with our initial results, which is now under review. (Learn more)
An improved readership experience through better new user onboarding
We worked on making the link recommendation model more accurate and scalable across different languages. We started by improving and releasing a new version of the mwtokenizer library, which supports tokenization across Wikipedia languages and helps scale the link recommendation model. (Learn more)
A model to increase the visibility of articles
Our paper characterizing orphan articles in Wikipedia has been accepted for publication at ICWSM 2024. We developed an experimental tool for developers and researchers to experiment with the output of the research. This technology can be used to support editors in de-orphanizing Wikipedia articles and to support readers by recommending orphan articles they can read next. (Learn more)
Metrics to measure knowledge gaps
Our focus was on three primary areas: ensuring the adoption and usage of the knowledge gap metrics and measurements we have researched and developed over the past years; improving the accessibility of the measurements we publish; continuing research to develop metrics for the knowledge gaps that currently do not have a metric.
One of the barriers to the adoption of the gender and geography content gap metrics in the Wikimedia Foundation was that the Foundation needed a more nuanced metric that takes into account not only the number of articles but also their quality. In response, we developed a new knowledge gaps metric: standard quality, the percentage of articles meeting a standard quality threshold. The standard quality metric for the gender and geography gaps is now used by the Wikimedia Foundation to make strategic decisions. This work was part of our contributions to one of the Foundation’s top annual plan goals, Infrastructure: Building the infrastructure of knowledge as a service.
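To make the idea concrete, here is a minimal sketch of how a "standard quality" style metric can be computed: the share of articles whose quality score meets a threshold, broken down by a gap dimension. The field names, the example data, and the 0.42 threshold are all hypothetical illustrations, not the production pipeline.

```python
def standard_quality(articles, threshold=0.42):
    """Fraction of articles whose predicted quality score meets the threshold.

    The threshold value here is an arbitrary placeholder.
    """
    if not articles:
        return 0.0
    meeting = sum(1 for a in articles if a["quality_score"] >= threshold)
    return meeting / len(articles)


def gap_by_group(articles, group_key, threshold=0.42):
    """Standard-quality metric broken down by a gap dimension (e.g. gender)."""
    groups = {}
    for a in articles:
        groups.setdefault(a[group_key], []).append(a)
    return {g: standard_quality(group, threshold) for g, group in groups.items()}


# Hypothetical example data, not real measurements.
articles = [
    {"title": "A", "gender": "female", "quality_score": 0.7},
    {"title": "B", "gender": "female", "quality_score": 0.3},
    {"title": "C", "gender": "male", "quality_score": 0.9},
]
print(gap_by_group(articles, "gender"))  # {'female': 0.5, 'male': 1.0}
```

Comparing these per-group shares (rather than raw article counts) is what makes the metric sensitive to quality, not just coverage.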
We advanced our efforts to make the Knowledge Gap Index data more accessible. First, we created a public repository which will include productionized knowledge gap metrics as they become available. We started developing an architecture and documenting the code for data pipelines that provide measurements for the content gap metrics. Lastly, to lower the barrier for utilizing this data, we also developed notebooks that can help visualize the gender and geography data gaps and act as a starting point to answer more questions using the data.
We continued research towards developing more content gap metrics for the identified gaps and advocated for the productionization of the metrics we have developed. We focused on three fronts: the readability gap, the structured data gap, and the language gap. You can learn more below.
The multilingual readability model, which can be used to develop a readability gap metric, is now in production (get readability predictions by querying the service, or use the interface to test it). We also completed the first iteration of developing a metric for the structured data gap. The metric calculates how complete a Wikidata item's labels and descriptions are. (Notebook)
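One simple way to operationalize such a completeness metric is to count how many (language, field) slots of an item are filled across a set of target languages. The sketch below is an illustrative assumption about the metric's shape, not its actual definition, and the item data is made up.

```python
def completeness(item, languages):
    """Share of (language, field) slots that are filled for this item.

    `item` mimics the shape of a Wikidata entity's labels/descriptions;
    the equal weighting of labels and descriptions is an assumption.
    """
    slots = filled = 0
    for field in ("labels", "descriptions"):
        for lang in languages:
            slots += 1
            if item.get(field, {}).get(lang):
                filled += 1
    return filled / slots if slots else 0.0


# Hypothetical item: labels in en/fr, a description only in en.
item = {
    "labels": {"en": "Douglas Adams", "fr": "Douglas Adams"},
    "descriptions": {"en": "English writer"},
}
print(completeness(item, ["en", "fr", "de"]))  # 3 of 6 slots filled -> 0.5
```

Averaging this score over many items, per language, would then surface which languages are underserved in structured data.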
We started investigating the development of metrics for the language gap, to study how well Wikimedia projects cover languages from around the world. We compiled data from different existing sources to create a “state of languages” table: a semi-automated table that shows which languages currently have which free-knowledge projects. (Learn more)
Finally, with the survey expertise that is now in the Research team, we took the opportunity to rethink how we approach survey development and data collection to serve the Movement. As a result, we prioritized work on three fronts that you can learn more about below.
Revamping Reader Surveys. Inspired by our 2019 research, we launched the Global Reader Survey, a representative survey of Wikipedia readers, fielded in 23 languages (representing at least 90% of current Wikipedia readership). The data from the survey will provide measurements for 9 of the Readers’ gaps. (Task)
Refocusing the Community Insights survey. This survey is run by WMF to learn about the communities. We reflected on how the data from this survey is used and considered the organization's needs for data. As a result of this exercise, we made a few changes to the survey’s roadmap. Specifically, we decided to reduce the scope of the survey to two types of questions related to the contributors’ representation gaps and community health, given the importance of these topics for WMF. With this clarity of scope, we prioritized work on the operations side to prepare for the launch of the updated survey in 29 languages in March 2024. The data from the survey will provide measurements for 7 of the Contributors’ gaps.
Streamlining survey analyses. With the goal of reducing the number of weeks needed for analyzing survey data, we continued building repeatable frameworks for analyzing survey data. We are using the opportunity of supporting the December 2023 Developer Satisfaction Survey as a use-case that can help us improve our framework.
A deeper understanding of the role of visual knowledge in Wikimedia Projects
Due to other priorities this project was on a pause during the period of this report.
A model for image recommendation
We concluded our support to Wikimedia Foundation’s Product teams for the “section-level image suggestion” structured task. At the end of November 2023, 4629 images had been added in 14 Wikipedia languages through this task. One limitation of the current algorithm for image suggestion is its limited coverage, because it is based on section-image associations that already exist in some Wikipedia languages. Addressing this limitation requires highly precise multimedia retrieval frameworks. To advance research on this front, we organized the AToMiC (Authoring Tools for Multimedia Content) track for the annual TREC conference.
A deeper understanding of reader navigation
We are happy to share that our paper on temporal regularities of Wikipedia consumption has been accepted for publication at ICWSM 2024 (Paper). This concludes the portion of readership research dedicated to understanding how readers navigate within the encyclopedia. (Learn more)
We started a new line of research to understand how readers enter Wikipedia. More specifically, we are looking at the relevance of Wikipedia to users of search engines, which are one of the main drivers of traffic to Wikipedia. We aim to study what proportion of search engine queries lead to a user's visit to Wikipedia. As a first step towards this goal, we are building a new dataset that combines Google Trends (what users on Google are searching for) and Wikipedia clickstream (how readers reach Wikipedia), two publicly available resources. (Learn more)
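The core of such a dataset is a join between the two sources. The toy sketch below illustrates the idea; the records and field names are invented for illustration and do not reflect the actual schemas or the team's pipeline.

```python
# Hypothetical samples of the two public sources.
trends = [  # Google Trends: what users search for (relative interest)
    {"query": "solar eclipse", "interest": 100},
    {"query": "world cup", "interest": 80},
]
clickstream = [  # Wikipedia clickstream: (source, target, clicks)
    {"source": "other-search", "target": "Solar_eclipse", "clicks": 120_000},
    {"source": "other-search", "target": "FIFA_World_Cup", "clicks": 90_000},
]


def normalize(title):
    """Turn a Wikipedia article title into a lowercase, space-separated form."""
    return title.replace("_", " ").lower()


# Naive join: match trending queries to search-referred article visits.
by_title = {normalize(r["target"]): r["clicks"] for r in clickstream}
joined = [
    {
        "query": t["query"],
        "interest": t["interest"],
        "search_clicks": by_title.get(t["query"].lower()),
    }
    for t in trends
]
print(joined)
```

Note that naive string matching already fails here ("world cup" does not match "FIFA World Cup"), which hints at why linking queries to articles is itself a research problem.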
A unified framework for equitable article prioritization
We concluded the work on our experimental framework to test the balance between personalization and content equity within recommender systems. The learnings from this research have been accepted for publication at ICWSM 2024. (Paper, Learn more)
We continued work on integrating our content tagging models into existing product workflows with the goal of promoting more equitable article improvements. We worked on the adoption of the model on three fronts. As of July 2023, all Growth features in all languages are now using our multilingual topic models to suggest articles for newcomers, greatly improving the coverage of topic labels. Additionally, after a successful pilot, we are assisting the Android team as they prepare to launch a model for AI-generated article description recommendations. We are also receiving and recording feedback for our List-Building tool, as it is being tested by campaign organizers to more easily build article worklists that are relevant to their topic areas. (Learn more).
Models for content tagging
We concluded our work on this front, and we are now working on integrating our content tagging tools in the context of other projects and product workflows. See the previous paragraph's update for work done in this space.
Large language models for text simplification
We started exploratory work on developing a model to automatically simplify Wikipedia article text using large language models. The model aims to improve the readability of articles and is a follow-up to our work on measuring readability as part of the knowledge gap metrics. We started by conducting a literature review to understand existing solutions and best practices for systematically evaluating the model’s performance. Based on these insights, we developed a working prototype by fine-tuning a pre-trained language model (T5) and evaluating it on the D-Wikipedia benchmark dataset. (Learn more)
Improve Knowledge Integrity
We develop models and insights using scientific methods to support the technology and policy needs of the Wikimedia projects in the areas of misinformation, disinformation, and content integrity.
Enhanced models for content patrolling
We primarily focused our work on improving the Language Agnostic Revert Risk model, which has been in production for some time now. We improved the model's accuracy and serving time. Furthermore, we debugged the model based on feedback from its usage by Wikimedia Enterprise. Separately, we developed and released an annotation tool through which we can collect ground-truth labels for evaluating and improving our models. Lastly, we launched a labeling campaign to evaluate and improve the Revert Risk model for Wikidata. (Learn more)
Wikipedia Knowledge Integrity Risk Observatory
We focused on the adoption and usage of the Observatory. As a result of this work, the Observatory's data and inferred insights are now being used on two fronts: first, by the Trust & Safety Disinformation team to monitor the Wikimedia projects around upcoming elections in countries across the world; second, by the Moderator Tools team to characterize high-risk revisions for Automoderator, a tool that allows moderators to automate the prevention or reversion of bad edits. (Learn more)
A project to help develop critical readers
The research is now concluded. A preprint is available and is currently under review.
Infrastructure for more efficient machine learning research and development
We have been investing in engineering over the past couple of years to better align the research environment with WMF’s production environment, with the ultimate goal of facilitating the use of research outputs in products. During the period of this report, we focused on three fronts.
First, we started developing pipelines that streamline dataset generation, with a particular focus on the datasets our team relies on for developing machine learning models. Our goal is to streamline dataset creation processes from data labeling to storage, versioning, quality assurance, and human feedback loops in order to create high-quality training and evaluation datasets. We currently have two datasets as part of this new pipeline and expect to add more in the coming months.
Second, we developed tooling to support working with embeddings (numeric vector representations that help computers compare words, sentences, or objects, for example to measure the similarity of two Wikipedia articles or two images). In particular, in support of automatically building lists for campaigns, we built a vector database index in WMF's compute infrastructure and hosted the vector database service on WMF’s open cloud computing infrastructure (Cloud VPS).
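As a toy illustration of what an embedding lookup buys you: two items can be compared by the cosine similarity of their vectors. Real embeddings have hundreds of dimensions; the three-dimensional vectors below are made up for the example.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means identical direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)


# Hypothetical 3-d "embeddings" of three articles.
article_a = [0.9, 0.1, 0.3]
article_b = [0.8, 0.2, 0.25]  # closely related to article_a
article_c = [-0.5, 0.9, 0.0]  # unrelated to article_a

print(cosine_similarity(article_a, article_b)
      > cosine_similarity(article_a, article_c))  # True
```

A vector database index makes exactly this kind of nearest-neighbor comparison fast over millions of items, which is what list-building for campaigns needs.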
Third, we started experimenting with and building exploratory tooling for using Cloud GPUs for machine learning training tasks, with a focus on fine-tuning LLMs.
Conducting foundational work
Wikimedia projects are created and maintained by a vast network of individual contributors and organizations. We focus part of our efforts on strengthening part of this network: the Wikimedia research community.
A Wikimedia Research course
A few instructors developed their session content. We continued fine-tuning what is in and out of every module and invited more instructors to develop the course’s content. However, progress on the course was slower than expected during the period of this report due to an unforeseen change in priorities for the Head of Research. If you are interested in receiving more frequent updates about this project, please subscribe to the relevant epic task.
Our July showcase focused on improving knowledge integrity in Wikimedia projects, with presentations of recent work by the Wikimedia Foundation Research team on reference quality in Wikipedia and multilingual approaches to revert prediction. We explored the role of rules on Wikipedia in our September showcase, with talks covering their implications for experiential epistemology and a comparative study of rule-making activity in the five largest wikis. The topic of our October showcase was data privacy, featuring work on synthetic data for Wikipedia reader navigation patterns and differential privacy approaches for datasets released by the Wikimedia Foundation. In November, two different pieces of bibliometrics research were presented to connect this showcase with the GLAM Wiki 2023 conference. Our final showcase of 2023 was a panel titled "A year of Generative AI: Future directions for Wikimedia". (Learn more)
We continued offering 1:1 consultations with members of our team. In the past six months, most office hours were focused on assisting prospective applicants with their Research Fund submissions. (Learn more)
We launched the third cycle of the Wikimedia Research Fund and have received 76 submissions. (Learn more)
We participated in the first in-person Wikimania in Singapore after a few years of virtual Wikimanias. Wikimania is the largest annual gathering of the Wikimedia Movement, celebrating all the free knowledge projects hosted by the Wikimedia Foundation. We led a session titled “10 Research findings and how you can use them in your works” (Slides, Video) and participated in two panels: AI advancements and the Wikimedia Projects (Video) and ChatGPT vs. WikiGPT (Video). Wikimania was a great opportunity for us to connect with volunteers and other Movement participants and discuss Wikimedia research and much more.
We participated in this year’s Wiki Indaba which took place in Morocco. We presented a talk titled “Reading Wikipedia in Swahili, Yoruba, French, and English: Insights from Sub-Saharan Africa” (Video) and engaged with Wikimedians from across the African region.
Presentations and keynotes
We engaged with research audiences through the following presentations and keynotes during the past six months.
In July, we participated in a panel discussion on The Sociopolitical Impact of AI in the Digital Humanism Summit on AI and Democratic Sustainability in Vienna organized by the Digital Humanism Initiative, the OSUN Hannah Arendt Humanities Network at Bard College and the Institute for Human Sciences (IWM). Our contribution focused on featuring the work and principles of AI/ML in the Wikimedia Research ecosystem.
In July, we gave an invited talk as part of The Weaponization of Knowledge Conference at the University of Tübingen. We were invited by research collaborators from the University of North Carolina – Chapel Hill to share findings of ongoing research on historical revisionism in Japanese Wikipedia. (Extended abstract)
In August, we presented a paper entitled “Fair multilingual vandalism detection system for Wikipedia” at the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD ’23). This paper describes our scientific approach to creating the Revert Risk models.
In October, we moderated a panel discussion on Ethical Challenges of the Democratisation of Artificial Intelligence at Decidim Fest 2023 Conference in Barcelona with panelists from Queer In AI and Universitat Oberta de Catalunya.
In October, we co-organized the CSCW Workshop on epistemic injustice in online communities. The workshop discussed the systemic silencing, exclusion, or delegitimization of certain knowledge contributions.
In November, we hosted the AToMiC track at the TREC 2023 conference. We received a total of 27 runs from 3 different teams (11 for the image promotion task, and 16 for the image suggestion task). A second edition of the AToMiC Track has been accepted and will be held with TREC 2024.
In November, we participated in a panel organized by Wikimedia Argentina, discussing the usage of AI-based technologies in the Wikimedia projects. (Video [Spanish])
In December, we participated in the “Recentering Platform Governance” Workshop hosted at Yale University’s Law school. The workshop considered ways to center not-for-profit initiatives and community-based models for platform governance in policy discussions.
Mentorship through Outreachy
We continue to participate in Outreachy and are excited to have a new intern onboard. Shriya Kamat Tarcar will be working with our Lead Design Researcher, Mike Raish, on a Multilingual Wikipedia Editor Survey project.
Mentorship through internships
All of our past internships have been completed prior to the period of this report and no new internships have been taken on. We remain committed to mentoring through internships in the future.
We have continued our commitment to formalizing our approach to ethical development and deployment of AI technologies on the Wikimedia projects through a number of initiatives.
We continued collaborating with the Human Rights team to develop a stable Human Rights AI Checklist. We are wrapping up our first round of testing of the checklist with model developers to get feedback on the ease of using it and to identify whether any major components are missing. (Task)
The pilot of the machine-assisted article descriptions tool on the Wikipedia Android app wrapped up successfully. The results were largely positive, and the pilot identified one failure case involving hallucinated dates that we were able to address ahead of a broader release. The model is being deployed on our Machine Learning platform in preparation for the expected release, in early 2024, to the 25 language editions that the model supports.
Survey support for WMF and the affiliates
Every year, the Wikimedia Foundation and the affiliates send a large number of surveys to different Wikimedia communities to gather valuable data that can inform decision making or support developing insights about the state of the communities and the projects. Developing and implementing good surveys requires knowledge about how to form survey questions, research ethics, how to sample, where to expose the survey, how to analyze the resulting data, and more. Through a new service, we aim to provide methodological and question-writing feedback, and to assist CentralNotice administrators in reviewing community surveys distributed through banners. We also manage WMF survey tools and provide access to Qualtrics for affiliates who may need more advanced survey tooling. We supported Wiki Women Camp, the Wikimania 2024 Core Organizing Team, Wikimedia Deutschland, and Wikimedia Chile on their survey projects. If you are a Wikimedia affiliate, you can request survey support by writing to email@example.com with the details of your project.
Frameworks for metric definition and dissemination
We contributed to Infrastructure: Building the infrastructure of knowledge as a service, one of the top-line goals of Wikimedia Foundation’s annual plan for 2023-2024. Our contributions on this front do not fall under the existing roadmaps of our team; as a result, we report them in this dedicated section.
Our primary contributions were two-fold. First, we organized and led a committee of experts from across WMF who developed criteria for metrics that strategic decision making and understanding of the projects rely on (essential metrics). Second, we developed a process for computing, visualizing, and presenting business critical annual plan metrics (core metrics) for decision makers.
The people on the Research team
After experimenting with a new mission and set of audiences for our team, we are now ready to share them with you.
Our mission is to develop models and insights utilizing scientific methods, and strengthen the Wikimedia research communities. We do this in order to: support technology and policy needs of the Wikimedia projects, and advance the understanding of the Wikimedia projects.
The clarity of mission and audiences allows our team to maintain focus, be clearer about the impact that we seek through our work, and prioritize incoming requests more effectively.
During the period of this report two important changes happened that we would like to share with you.
First, Design Research joined the Research team. The addition of their people and expertise enables us to investigate research questions that require mixed-methods approaches and a more diverse set of expertise than our team had previously included.
Second, we hired a Research Manager for the Knowledge Integrity and Research Engineering operations in our team. Both of these areas had benefited from dedicated individual contributors over the past few years investing their expertise in the corresponding spaces. However, we had repeatedly felt the need for dedicated management capacity to support prioritization, removal of blockers and streamlining of processes for more effective and impactful research and engineering work in these spaces. In the next section, we will share with you more about the people who have joined our team over the past months.
In July 2023 we welcomed the Design Research team, formerly part of the Design Strategy group, to our team.
Eli Asikin-Garmager joined WMF in 2019 and is now a Principal Design Researcher. Eli has worked in a range of contexts and industries as a design researcher and linguist. He received his PhD in Linguistics from the University of Iowa, and currently works with the Wikimedia Language Team to help people access and contribute knowledge in a greater number of languages. He has also supported other product teams at the Wikimedia Foundation by conducting research around reading experiences and experiments with new form factors for encyclopedic content. More broadly, Eli is interested in language variation, translation, and human experiences around AI, particularly in regards to machine translation and machine-augmented tasks.
Daisy Chen joined WMF in 2012 and is a Lead Design Researcher. She received a BA in Sociology from Stanford, where she focused on social movements and social psychology. Prior to joining WMF, she worked in public service, in immigration and corporate law, and at a start-up. Daisy joined WMF as a paralegal on the Legal and Community Advocacy team, and in 2014 pivoted to become a Design Researcher. She has extensive experience in usability research, contextual inquiry, survey research, and more.
Gabriel Escalante is the Design Research team’s manager. He is a sociologist by training and has a Master’s degree in Social Studies from Mexico’s National Autonomous University (UNAM). Gabriel has over 15 years of qualitative research experience working for different organizations in Latin and North America, Europe and Asia.
Bethany Gerdemann joined WMF in 2019 as the Program Manager for the Design Research team. In this capacity she develops, streamlines, and maintains processes that are essential for the operations of design researchers including but not limited to participant recruitment. Bethany has an MS degree in International and Development Economics from University of San Francisco where her thesis involved field research in Sierra Leone to better understand competition and cooperation in different family structures.
Claudia Lo joined WMF in 2018 and is a Senior Design Researcher with a focus on moderation and governance within the Wikimedia Movement. She served on the drafting committee for developing the Universal Code of Conduct Enforcement Guidelines which created a set of standards for enforcing the movement’s Universal Code of Conduct across all Movement-affiliated projects. She currently supports the creation of tools for volunteer moderators as well as the organization’s Trust and Safety product work. Prior to joining WMF Claudia received an MSc from MIT's Comparative Media Studies program, where she took an interdisciplinary approach to studying online volunteer moderators. Claudia's academic background combines queer and feminist media studies, and sociology of virtual spaces.
Michael Raish joined WMF in 2021 as a Lead Design Researcher. Since then he has been involved in multiple projects including Trust & Wikipedia, an experiment by the Web team for offering “trust signals” to Wikipedia readers on desktop. Mike received his PhD in 2017 from Georgetown University’s Department of Arabic and Islamic Studies, where his dissertation focused on developing methods to design Arabic language tests and measuring the proficiency of non-native Arabic learners—especially in written contexts. In his work at WMF, Mike leans heavily on his background in linguistics, qualitative research, experimental design, and discourse analysis to inform product design decisions.
Xiao Xiao joined us as a Research Manager to oversee the research engineering operations in our team as well as the implementation of the knowledge integrity roadmap. Xiao has a PhD in mathematics from New York University, where she focused on fluid dynamics and computational partial differential equations. She developed algorithms that were applied in oceanography and later in computational cell biology. As a manager, Xiao has led teams of ML scientists and engineers at Thomson Reuters and later at a fast-paced start-up in the financial compliance domain. Xiao’s focus has been on the deployment of ML models in production. She has led discussions on ML architecture and coordinated across teams to bridge the gap between theoretically sound ML models and their productionization: improving model reproducibility, reducing latency, and maintaining model stability.
Research Showcases
Every 3rd Wednesday of the month (virtual)
Join us for Wikimedia-related research presentations and discussions. The showcases are great entry points into the world of Wikimedia research and for connecting with other Wikimedia researchers. Learn more
Research Office Hours
Throughout the month (virtual)
You can book a 1:1 consultation session with a member of the Research team to seek advice on your data or research related questions. All are welcome! Book a session
Wikimedia Hackathon 2024
May 2024
The 2024 Wikimedia Hackathon is scheduled to take place in Tallinn, Estonia, from May 3 to 5, 2024. Learn more
Wiki Workshop
June 2024
The eleventh edition of Wiki Workshop, our largest Wikimedia research event of the year, will take place on June 20, 2024. Stay tuned!
Wikimania 2024
August 2024
The nineteenth edition of Wikimania, the largest Wikimedia conference of the year, will take place from August 7 to 10, 2024, in Katowice, Poland. Register
We encourage you to keep in touch with us via one or more of the methods listed in the Keep in touch section to receive more information about these and other events.
Trends to watch
We're keeping an eye on significant trends that relate to the Wikimedia projects and the broader ecosystem in which Wikimedia operates:
Policy pages as a focus of research and tooling
As captured in a recent Research Showcase, there's a growing recognition of the importance of helping editors navigate policy on Wikipedia in order to build a more inclusive and welcoming space. For example, Wikimedia Foundation’s product teams are considering how to surface relevant policies to editors during the editing process (Edit Check) to help them make adjustments prior to publishing, and potentially prevent demotivating reverts. There is also interesting recent work from the Wikimedia Research community focusing on policy-related aspects of the projects, for example providing tools and data to help communities understand how content is moderated, or studying the evolution of policies and communities.
Text summarization
Text summarization has been discussed in Wikimedia spaces as one of the most prominent applications for new large language models, with many practical use-cases for Wikipedia and its sister projects. These include summarizing sections of articles, summarizing talk page discussions, and summarizing Phabricator tickets (see, e.g., this project at the 2023 Wikimedia Hackathon). The Research team at the Wikimedia Foundation is working on models for text simplification, a special case of text summarization, which will help bridge the readability gap by generating article summaries with different levels of complexity.
Funds for the Research team are provided by donors who give to the Wikimedia Foundation in different ways. Thank you!
Keep in touch with us
The Wikimedia Foundation's Research team is part of a global network of researchers who advance our understanding of the Wikimedia projects. We invite everyone who wants to stay in touch with this network to join the public wiki-research-l mailing list, or follow us on Twitter/X.