This page describes the additional materials used for the publication "Automated Hypothesis Testing with Large Scientific Data Repositories", which is currently under review at the ACS2016 conference.
If you want an RDF description of the contents presented in this document, just parse it with your favourite RDFa parser. Alternatively, you can use content negotiation on its id (https://w3id.org/dgarijo/ro/acs2016) to retrieve it in TTL, RDF/XML, or JSON-LD format.
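The content negotiation above can be sketched as follows. This is a minimal illustration, assuming the id resolves to the three serializations listed; the media types are the standard RDF MIME types, and actually fetching a response would require network access.

```python
import urllib.request

# Identifier of this research object, as given on this page.
RO_URL = "https://w3id.org/dgarijo/ro/acs2016"

def build_request(media_type):
    """Build an HTTP request asking for a specific RDF serialization
    via the Accept header (content negotiation)."""
    return urllib.request.Request(RO_URL, headers={"Accept": media_type})

# Standard media types for the serializations mentioned above.
formats = {
    "Turtle": "text/turtle",
    "RDF/XML": "application/rdf+xml",
    "JSON-LD": "application/ld+json",
}
requests = {name: build_request(mt) for name, mt in formats.items()}

# e.g. urllib.request.urlopen(requests["Turtle"]).read() would
# retrieve the TTL serialization, network permitting.
```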
A link to the PDF of the paper will be provided here once the review process is complete.
Summary extracted from the submitted paper: "The automation of important aspects of scientific data analysis would significantly accelerate the pace of science and innovation. Although important aspects of data analysis can be automated, the hypothesize-test-evaluate discovery cycle is largely carried out by hand by researchers. This introduces a significant human bottleneck, which is inefficient and can lead to erroneous and incomplete explorations. We introduce a novel approach to automate the hypothesize-test-evaluate discovery cycle with an intelligent system that a scientist can task to test hypotheses of interest in a data repository. Our approach captures three types of data analytics knowledge: 1) common data analytic methods represented as semantic workflows; 2) meta-analysis methods that aggregate those results, represented as meta-workflows; and 3) data analysis strategies that specify for a type of hypothesis what data and methods to use, represented as lines of inquiry. Given a hypothesis specified by a scientist, appropriate lines of inquiry are triggered, which lead to retrieving relevant datasets, running relevant workflows on that data, and finally running meta-workflows on workflow results. The scientist is then presented with a level of confidence on the initial hypothesis (or a revised hypothesis) based on the data and methods applied. We have implemented this approach in the DISK system, and applied it to multi-omics data analysis."
The paper associated with this page describes the DISK framework, which aims to automate the hypothesis test-and-refine life cycle. Below is a list of the materials that we have used to test and demonstrate our approach.
|Yolanda Gil||Yolanda Gil is Director of Knowledge Technologies at the Information Sciences Institute of the University of Southern California, and Research Professor in the Computer Science Department. Her research interests include intelligent user interfaces, social knowledge collection, provenance and assessment of trust, and knowledge management in science. Her most recent work focuses on intelligent workflow systems to support collaborative data analytics at scale.|
|Daniel Garijo Verdejo||Daniel Garijo is a postdoctoral researcher at the Information Sciences Institute of the University of Southern California. His research activities focus on e-Science and the Semantic Web, specifically on how to increase the understandability of scientific workflows using provenance, metadata, intermediate results, and Linked Data.|
|Varun Ratnakar||Varun Ratnakar is a research programmer at the Information Sciences Institute of the University of Southern California. He is the main developer of the Wings workflow system.|
|Rajiv Mayani||Rajiv Mayani is a programmer analyst at the Information Sciences Institute of the University of Southern California.|
|Parag Mallik||Parag Mallik is an assistant professor (Research) of Radiology at Stanford University. Parag is also a member of the Stanford Cancer Institute and a faculty fellow of Stanford ChEM-H. After completing his PhD, he trained with Ruedi Aebersold in clinical proteomics and systems biology at the Institute for Systems Biology.|
|Ravali Adusumilli||Ravali Adusumilli is a bioinformaticist at the Mallik Lab of Stanford University. She is interested in developing tools and pipelines for multi-omic analysis.|
|Hunter Boyce||Hunter Boyce is a postdoctoral researcher at the Mallik Lab of Stanford University.|