This page provides pointers to the materials (datasets, software, and notebooks) used in a paper currently under review.
Summary: Application developers today have three choices for exploiting the knowledge present in Wikidata [1]: they can download the Wikidata dumps in JSON or RDF format, they can use the Wikidata API to get data about individual entities, or they can use the Wikidata SPARQL endpoint. None of these methods can support complex, yet common, query use cases, such as retrieval of large amounts of data or aggregations over large fractions of Wikidata. This paper introduces KGTK Kypher, a query language and processor that allows users to create personalized variants of Wikidata on a laptop. We present several use cases that illustrate the types of analyses that Kypher enables users to run on the full Wikidata KG on a laptop, combining data from external resources such as DBpedia. The Kypher queries for these use cases run much faster on a laptop than the equivalent SPARQL queries on a Wikidata clone running on a powerful server with 24h time-out limits.
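As a rough illustration of the kind of query the paper discusses, a Kypher query is issued through the KGTK `query` command against KGTK-format TSV edge files. The sketch below is a minimal, hypothetical example (the input file name and the choice of properties are assumptions, not taken from the paper):

```shell
# Hypothetical Kypher query: find up to 10 entities whose
# occupation (P106) is physicist (Q82955) in a local Wikidata
# edge file; the file name "claims.tsv.gz" is an assumption.
kgtk query -i claims.tsv.gz \
     --match '(person)-[:P106]->(:Q82955)' \
     --return 'person' \
     --limit 10
```

Behind the scenes, KGTK imports the edge file into an SQLite cache on first use, which is why subsequent queries on a laptop can be fast; the released cache linked below was built this way.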
This page and the materials described on it (excluding external references) are available at a permanent URL: https://w3id.org/kgtk_kypher.
A preprint of the paper will soon be available here.
The SQLite cache used to store the queries is available at the following DOI: https://doi.org/10.5281/zenodo.5146407
The results of the analysis are the query execution times comparing KGTK Kypher and SPARQL, as reported in the paper.
Research Lead
Research Lead at the Information Sciences Institute, University of Southern California.
Research Director
Research Director at the center on Knowledge Graphs, Information Sciences Institute, University of Southern California.
Research Scientist
Researcher at the Information Sciences Institute, University of Southern California.
Student worker
Master's student at the University of Southern California.
Design derived from w3.css.