Supplementary materials for paper: A Study of the Quality of Wikidata (Under review)


This page provides pointers to the materials (datasets, software, and notebooks) used in a paper currently under review. More details about the publication will be announced at a later stage.

Summary: The increasing adoption of Wikidata [1] in a wide range of applications (entity linking, question answering, link prediction, etc.) motivates the need for high-quality knowledge to support them. However, we currently lack an understanding of the quality of the semantic information captured in Wikidata. In this paper, we explore two notions of data quality in Wikidata: 1) a community-based notion which captures the ongoing community consensus on the recorded knowledge, assuming that statements that have been removed and not added back are implicitly agreed to be of low quality by the community; and 2) a constraint-based notion which encodes Wikidata constraints efficiently, and detects their violations in the data. Our analysis reveals that low-quality statements can be detected with both strategies, while their cause ranges from modeling errors and factual errors to constraint incompleteness. These findings can complement ongoing efforts by the Wikidata community to improve data quality based on games and suggestions, aiming to make it easier for users and editors to find and correct mistakes.
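To make the community-based notion concrete, the following is a minimal sketch, not the paper's actual pipeline. It assumes a hypothetical edit log in which each entry records a timestamp, a statement (subject, property, value), and whether the statement was added or removed. The `Edit` type and the `removed_and_not_restored` function are illustrative names, not part of the released software.

```python
from typing import Iterable, NamedTuple, Set, Tuple

Statement = Tuple[str, str, str]  # (subject, property, value)


class Edit(NamedTuple):
    """Hypothetical edit-log entry; the real revision data may look different."""
    timestamp: int
    statement: Statement
    action: str  # "add" or "remove"


def removed_and_not_restored(edits: Iterable[Edit]) -> Set[Statement]:
    """Return statements whose most recent action in the log is a removal."""
    last_action = {}
    # Replay the log in chronological order, keeping only the latest action.
    for edit in sorted(edits, key=lambda e: e.timestamp):
        last_action[edit.statement] = edit.action
    return {stmt for stmt, action in last_action.items() if action == "remove"}


log = [
    Edit(1, ("Q1", "P31", "Q5"), "add"),
    Edit(2, ("Q1", "P31", "Q5"), "remove"),
    Edit(3, ("Q1", "P31", "Q5"), "add"),     # restored, so not flagged
    Edit(4, ("Q2", "P569", "1900"), "add"),
    Edit(5, ("Q2", "P569", "1900"), "remove"),  # never restored, so flagged
]
flagged = removed_and_not_restored(log)
```

Under this assumption, a statement counts as low quality only if its last recorded action is a removal, which mirrors the "removed and not added back" criterion in the summary.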

Datasets.


Input Datasets

We used the following datasets for our paper. We will deposit them in Zenodo upon paper acceptance (to comply with anonymity requirements). Some of the files may not have the same source date, as we performed the analysis as new data became available. However, although Wikidata is continuously evolving, all files are sufficiently close in time to be compatible with one another.

Software and Notebooks.


Pointers to the main software used can be found below:

Bibliography.


  1. Vrandečić, D., Krötzsch, M.: Wikidata: a free collaborative knowledgebase. Communications of the ACM 57(10), 78–85 (2014)

About the authors.


Authors are omitted as per the submission requirements.

Design derived from w3.css