This page provides pointers to the materials used in a paper under review at the Journal of Quantitative Science Studies.
Abstract (extracted from the submitted paper): An increasing number of researchers rely on computational methods to generate or manipulate the results described in their scientific publications. Software created to this end--scientific software--is key to understanding, reproducing, and reusing existing work in many disciplines, ranging from Geosciences to Astronomy or Artificial Intelligence. However, scientific software is usually challenging to find, set up, and compare to similar software due to its disconnected documentation (dispersed in manuals, readme files, web sites, and code comments) and the lack of structured metadata to describe it. As a result, researchers have to manually inspect existing tools in order to understand their differences and incorporate them into their work. This approach scales poorly with the number of publications and tools made available every year. In this paper we address these issues by introducing a framework for automatically extracting scientific software metadata from its documentation (in particular, their readme files); a methodology for structuring the extracted metadata in a Knowledge Graph (KG) of scientific software; and an exploitation framework for browsing, comparing and exploring the contents of the generated KG. We demonstrate our approach by creating a prototype with metadata from over ten thousand scientific software entries from public code repositories.
The paper associated with this research object describes our approach for creating a framework to 1) extract metadata from scientific software repositories; 2) create knowledge graphs of connected scientific software; and 3) exploit the created knowledge graph. Our approach produced the following resources:
Student visitor
Student at Washington University in St. Louis. Aidan participated in the NSF Research Experiences for Undergraduates (REU) program in the summer of 2020.
Researcher
Researcher at the Information Sciences Institute of the University of Southern California. Daniel's research focuses on e-Science and the Semantic Web, specifically on how to make software and scientific workflows easier to understand using provenance, metadata, intermediate results, and Linked Data.