In this section we define and discuss some important terms used in Wikidata Analytics, and point to documentation and further reading.


The Wikidata WDCM Reuse Statistic

The knowledge and information in Wikidata are widely reused across the WMF’s projects: Wikipedia, Wikivoyage, Wiktionary, etc. The reuse statistics tell us about the extent of that reuse in each particular project, e.g. en.wikipedia.org, it.wikivoyage.org, ru.wiktionary.org, etc. The fundamental Wikidata reuse statistic used in Wikidata Analytics is defined in the Wikidata Concepts Monitor (WDCM) project and is for that reason termed the WDCM Reuse Statistic.

Definition. The WDCM Reuse Statistic defines the Wikidata reuse of a particular Wikidata item, on a particular page of a particular WMF project as a sole “mention” of that item, on that page, in that project. That means exactly the following: if any particular Wikidata item is reused, in any form, on a particular page in a particular project, we count only one mention of that item, on that page, in that project. What follows is that the total reuse statistic for an item in a project is the count of pages in that project that make any use of that item.
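
The definition above can be illustrated with a minimal Python sketch. It is not the WDCM implementation: the function name wdcm_reuse, the tuple layout (project, page id, item, aspect), and the sample rows are all assumptions made for illustration. The point it shows is that repeated uses of the same item on the same page collapse into a single mention, so the statistic for an item in a project is the number of distinct pages using it.

```python
from collections import defaultdict


def wdcm_reuse(usages):
    """Return {(project, item): number of distinct pages that use the item}.

    `usages` is an iterable of (project, page_id, item, aspect) tuples.
    This layout is hypothetical and chosen only for illustration.
    """
    pages = defaultdict(set)
    for project, page_id, item, _aspect in usages:
        # A set keeps each page once, however many times (or in however
        # many ways) the item is used on that page.
        pages[(project, item)].add(page_id)
    return {key: len(page_ids) for key, page_ids in pages.items()}


# Hypothetical sample rows, not real usage data.
usages = [
    ("enwiki", 1, "Q42", "L"),
    ("enwiki", 1, "Q42", "S"),  # same item, same page: still one mention
    ("enwiki", 2, "Q42", "L"),
    ("enwiki", 2, "Q64", "T"),
    ("itwikivoyage", 7, "Q64", "S"),
]

reuse = wdcm_reuse(usages)
print(reuse[("enwiki", "Q42")])
```

In this sample, Q42 is used twice on page 1 of enwiki and once on page 2, and the sketch counts two pages, not three usages.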

This might sound like an oversimplification, especially given that Wikidata items can be reused in various ways across any page in the WMF universe. However, there are strong technical reasons to define Wikidata item reuse in this way. This definition is motivated as follows:

Aspect of the entity used. Well known values:

(taken from the Wikibase wbc_entity_usage schema documentation)

Hence, to avoid relying on potentially overlapping usage aspects to approximate Wikidata reuse statistics, we have opted to define item reuse merely as a mention on a page. A single item can thus be reused in many ways on the same page, but we count only one mention of that item and do not take the various aspects into account at all. The definition of the WDCM Reuse Statistic therefore does not provide the most precise measurement of Wikidata reuse, but it does provide a reliable measurement of what can be measured with certainty.

Additional efforts to better categorize Wikidata reuse in WMF projects were made by the WMF Research Team: Categorize different types of Wikidata re-use within Wikimedia projects. The complexity of the problem is clear from the discussions in this Phab ticket.


Data Sources for Wikidata Analytics

Here is a list of the data sources used to build the results presented by Wikidata Analytics:


Algorithms and Models for Wikidata Analytics

We make use of various AI/ML workhorses to uncover the patterns of Wikidata reuse across the WMF universe and to derive datasets that illustrate its complexity in ergonomic ways. Here are some of them:


Learn together

If you would like to know more about our work, contact us: goran.milovanovic_ext@wikimedia.de. If you would like to learn Data Science by using Wikidata, get in touch. If you would like us to learn or discover something new about Wikidata together, drop us an email or contact us on Phabricator. Knowledge is free, and we are more than happy to share.


Goran S. Milovanović, Data Scientist for Wikidata, Wikimedia Deutschland: goran.milovanovic_ext@wikimedia.de
This is free software: all content and code are GPL v2.0 licensed.