Pinned post

In the new issue of our monthly newsletter:
• Detecting spam, and pages to protect
• Editors' intelligence test scores related to article quality
• Three papers on Wikipedia citations

In the new issue of our newsletter:
• Thanking editors makes them come back more, but not contribute more
• "Wikipedia's Network Bias" on abortion and other controversial topics
• The interests of designer drugs editors

RT @ARPHAPreprints
You heard about this idea from us! 😉 Scientists propose an interactive guide on @Wikipedia & @wikidata to help agencies, professionals & communicators reach the public quickly in emergencies like the pandemic.

"Predicting Links on Wikipedia with Anchor Text Information" a study of the transductive and inductive tasks of in-Wiki link prediction on several subsets of the English Wikipedia.

(Brochier et al., )


"COVID-19 Pandemic Wikipedia Readership" @Wikimedia releases 2 datasets on Wikipedia readership during the pandemic (Jan–June 2020): 1) COVID-19 article page views by country;
2) one-hop navigation where one of the pages is COVID-related.

"WhatTheWikiFact: Fact-Checking Claims Against Wikipedia" a system that predicts the veracity of a claim and shows the supporting evidence and confidence scores.

(Chernyavskiy et al., 2021)

tool: extmon.centralus.cloudapp.azur

Looking for research on a particular aspect of Wikipedia? Try searching the archives of our monthly newsletter, going back almost a decade:

We're very pleased to announce the availability of the TREC Fair Ranking 2021 participant instructions, corpus, and training queries, in partnership with @wikiresearch. TRECers, start your engines!

"Wiki-Reliability: A Large Scale Dataset for Content Reliability on Wikipedia" text and features of 1M English Wikipedia revisions, labeled as positive/negative with respect to the 10 most popular content reliability templates.

(Wong et al., )

RT @manoelribeiro
Sudden Attention Shifts on Wikipedia During the COVID-19 Crisis
With @krisgligoric, @peyrardMax, @f_lemmerich, @mstrohm, and @cervisiarius

RT @lcptuk
Wikipedia and Westminster
A study of the production and consumption of information on Wikipedia about UK politicians

RT @TimoTijhof
Browser usage on Wikipedia and sister projects, April 2021:

* 50%: Chrome +Mobile
* 23.8%: Safari +Mobile
* 5.2%: Firefox +Mobile
* 2.8%: Edge
* 2.6%: Samsung
* 2.0%: Chrome iOS
* 1.6%: Google app
* 0.8%: Opera
* 0.7%: IE

100% = 16.4 billion views (excluding apps and bots)
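Taking the figures in the post at face value, the shares convert to rough absolute view counts. A quick sketch (the 16.4 billion total and the percentages are from the post; since the shares are rounded, the absolute figures are estimates only):

```python
# Approximate absolute page views per browser, from the shares in the post.
# The total (excluding apps and bots) is 16.4 billion views; the percentages
# are rounded, so the derived absolute figures are estimates only.
TOTAL_VIEWS = 16.4e9

shares = {
    "Chrome (+Mobile)": 50.0,
    "Safari (+Mobile)": 23.8,
    "Firefox (+Mobile)": 5.2,
    "Edge": 2.8,
    "Samsung": 2.6,
    "Chrome iOS": 2.0,
    "Google app": 1.6,
    "Opera": 0.8,
    "IE": 0.7,
}

def absolute_views(percent: float, total: float = TOTAL_VIEWS) -> float:
    """Convert a percentage share into an absolute view count."""
    return total * percent / 100.0

for browser, pct in shares.items():
    print(f"{browser}: ~{absolute_views(pct) / 1e9:.2f} billion views")
```

The listed shares sum to 89.5%, so roughly a tenth of traffic falls under browsers below the reporting threshold.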


"QuTI! Quantifying Text-Image Consistency in Multimodal Documents" a Web application that quantifies the consistency of entity mentions (persons, locations, events) between image and text, based on .

(Springstein et al, 2021)


RT @mrlogix
Excited to know that our work, COVIWD: COVID-19 Wikidata Dashboard, is featured in the open book:

Addshore, Mietchen, D., & Willighagen, E. (2020). Wikidata Queries around the SARS-CoV-2 virus and pandemic. Maastricht, NL:

RT @arxiv_cs_cl WEC: Deriving a Large-scale Cross-document Event Coreference dataset from Wikipedia. (arXiv:2104.05022v2 [cs.CL] UPDATED)

RT @outbreaksci
Update: Meta-Research: Citation needed? Wikipedia and the COVID-19 pandemic

"Wikidata and the bibliography of life." The taxonomic community in biology lacks a centralised, curated literature database, and @wikidata's active community and sophisticated models of bibliographic information could help fill that gap.

(@rdmpage, 2021)

RT @conzept__
I wanted to explore some of these 'non-western' artists, so I hacked their names into the "hexagonal-covers" page format:

Some other day I should hack this into a proper SPARQL-based workflow.
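A SPARQL-based workflow of the kind mentioned above might start from a query against the Wikidata endpoint. A minimal sketch: the identifiers used (P106 occupation, Q1028181 painter, P27 country of citizenship, and the query.wikidata.org endpoint) are standard Wikidata ones, but the overall query shape is an illustrative assumption, not the author's actual code:

```python
import json
import urllib.parse
import urllib.request

WIKIDATA_SPARQL = "https://query.wikidata.org/sparql"

def artists_query(country_qid: str, limit: int = 20) -> str:
    """Build a SPARQL query for painters with citizenship of the given
    country QID (e.g. 'Q1033' for Nigeria). P106/Q1028181/P27 are real
    Wikidata identifiers; the query shape itself is an illustrative guess."""
    return f"""
    SELECT ?artist ?artistLabel WHERE {{
      ?artist wdt:P106 wd:Q1028181 ;    # occupation: painter
              wdt:P27 wd:{country_qid} .  # country of citizenship
      SERVICE wikibase:label {{ bd:serviceParam wikibase:language "en". }}
    }}
    LIMIT {limit}
    """

def run_query(query: str) -> dict:
    """Send the query to the Wikidata endpoint and return the parsed JSON
    results. Needs network access, so it is not exercised here."""
    url = WIKIDATA_SPARQL + "?" + urllib.parse.urlencode(
        {"query": query, "format": "json"})
    req = urllib.request.Request(url, headers={"User-Agent": "sketch/0.1"})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

if __name__ == "__main__":
    print(artists_query("Q1033"))  # painters from Nigeria
```

Swapping the country QID (or parameterising by region) would give the per-culture slices the post hand-built into the "hexagonal-covers" page.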

"Representation of Non-Western Cultural Knowledge on Wikipedia: The Case of the Visual Arts" Wikipedia, Wikidata, and Wikimedia Commons strongly favour the Western canon, giving many times more coverage to Western art.

(Poulter and Ahmed, 2021)

RT @ZetaVector
Live at : "MultiModalQA: complex question answering over text, tables and images" by @AlonTalmor @OriYoran @LahavDan et al.

TL;DR 👉 a QA dataset that requires multi-modal, multi-hop reasoning over Wikipedia text, tables, and images, accompanied by a new multi-hop model.

RT @TsinghuaNLP
MAVEN is a massive general-domain event detection dataset, which contains 4,480 Wikipedia documents, 118,732 event mentions, and 168 event types. Check out our EMNLP 2020 paper:

