The Workflow Motif Ontology

Release 14 June 2013

This version:
http://vocab.linkeddata.es/motifs/version/14062013/
Previous version:
http://vocab.linkeddata.es/motifs/version/15052013/
Latest version:
http://purl.org/net/wf-motifs
Revision:
Revision 1.2
Authors:
Daniel Garijo, Ontology Engineering Group, Universidad Politécnica de Madrid, Spain
Pinar Alper, University of Manchester, UK
Khalid Belhajjame, University of Manchester, UK
Contributors:
Oscar Corcho, Ontology Engineering Group, Universidad Politécnica de Madrid, Spain
Yolanda Gil, University of Southern California, USA
Carole Goble, University of Manchester, UK
Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 2.0 Generic License.

Abstract

This document explains the ontology for the Workflow Motif catalogue described in [Workflow Catalogue]. The catalogue highlights the results obtained from a manual analysis performed over a set of real-world scientific workflows from Taverna [Taverna], Wings [Wings], Galaxy [Galaxy] and Vistrails [Vistrails]. Workflow Motifs outline the kinds of data-intensive activities that are observed in workflows (data-operation motifs) and the different manners in which activities are implemented within workflows (workflow-oriented motifs). These motifs are helpful to identify the functionality of the steps in a given workflow, to develop best practices for workflow design, and to develop approaches for automated generation of workflow abstractions.

The latest OWL encoding of the Workflow Motifs Ontology can be found at http://purl.org/net/wf-motifs.

Most of the content displayed in this document has been retrieved from [Workflow Catalogue].

1. Introduction

Scientific workflows have been increasingly used in the last decade as an instrument for data-intensive science. Workflows serve a dual function: first, as detailed documentation of the scientific method used for an experiment (i.e., the input sources and processing steps taken for the derivation of a certain data item), and second, as reusable, executable artifacts for data-intensive analysis. Scientific workflows are composed of a variety of data manipulation activities, such as data movement, data transformation, data analysis and data visualization, to serve the goals of the scientific study. The composition is done through the constructs made available by the workflow system used, and is largely shaped by the function undertaken by the workflow and the environment in which the system operates.

A major difficulty in understanding workflows is their complex nature. A workflow may contain several scientifically significant analysis steps, combined with other data preparation or result delivery activities, and in different implementation styles depending on the environment and context in which the workflow is executed. This difficulty in understanding stands in the way of reusing workflows.

As a first step towards addressing this issue, [Workflow Catalogue] describes a catalogue of domain-independent conceptual abstractions for workflow steps called scientific Workflow Motifs. The catalogue was built from an empirical analysis of 260 workflow descriptions from Taverna [Taverna], Wings [Wings], Galaxy [Galaxy] and Vistrails [Vistrails]. Motifs are provided through i) a characterization of the kinds of data-operation activities that are carried out within workflows, which are referred to as data-operation motifs, and ii) a characterization of the different manners in which those activity motifs are realized/implemented within workflows, referred to as workflow-oriented motifs.

This document specifies the classes and properties of the Workflow Motifs ontology, the OWL 2 encoding of the aforementioned motif catalogue. The goal of this ontology is to provide the means to annotate workflows and their steps with the motifs of the vocabulary, without setting any restriction on how the workflows themselves are defined.

1.1. Namespace declarations

Table 2: Namespaces used in the document
owl     <http://www.w3.org/2002/07/owl#>
rdfs    <http://www.w3.org/2000/01/rdf-schema#>
wfm     <http://purl.org/net/wf-motifs#>
ex      A prefix used for the examples. It could be, for instance, <http://example.org#>
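
As a minimal sketch of how these prefixes are meant to be read, the following Python snippet expands a prefixed name such as wfm:hasMotif into a full IRI. The helper name expand and the concrete IRI chosen for the ex prefix are assumptions made for illustration only; they are not part of the ontology.

```python
# Prefix table from this section; the "ex" IRI is only an example namespace.
PREFIXES = {
    "owl": "http://www.w3.org/2002/07/owl#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "wfm": "http://purl.org/net/wf-motifs#",
    "ex": "http://example.org#",
}


def expand(prefixed_name: str) -> str:
    """Expand a prefixed name such as 'wfm:Motif' into a full IRI."""
    prefix, _, local_name = prefixed_name.partition(":")
    if prefix not in PREFIXES:
        raise KeyError(f"unknown prefix: {prefix!r}")
    return PREFIXES[prefix] + local_name


print(expand("wfm:hasMotif"))
```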

2. Workflow Motifs Overview

The Workflow Motifs Ontology represents the concepts of the Workflow Motif Catalogue [Workflow Catalogue], which are organized in the following hierarchy (Section 4.1 lists the sub-classes of each motif):

The classes and properties of the ontology can be found below. It is important to note that, in order to keep the ontology simple, the properties that link a workflow step with a motif have no domain specified. This decision was taken because workflows can be represented in many different ways, and how workflows are represented is out of the scope of this ontology. Examples of how to annotate workflows can be found in Section 3.1.

Classes

The classes identify the different motifs obtained after performing the manual analysis. They correspond to those shown in the previous hierarchy.

Object properties

The object properties relate a workflow or step of a workflow to the motif or motifs describing its functionality.

3. Workflow Motifs Description

The goal of this document is to define an OWL 2 ontology for the motif catalogue proposed in [Workflow Catalogue]. The Workflow Motif ontology provides the means to annotate scientific workflows, helping creators and curators to describe their functionality and facilitating the search for workflows with a particular purpose (e.g., retrieving workflows with merging, analysis and filtering steps).
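
To make the search use case concrete, here is a hedged sketch of retrieving workflows that contain merging, analysis and filtering steps. The annotation data, the workflow and step identifiers, and the function name workflows_with are all hypothetical; merging is mapped to the Combine motif, which replaced Merge and Join in this release.

```python
# Hypothetical annotations: workflow -> step -> set of motif local names.
ANNOTATIONS = {
    "ex:workflowA": {
        "ex:stepMerge": {"Combine"},
        "ex:stepAnalyse": {"DataAnalysis"},
        "ex:stepFilter": {"Filter"},
    },
    "ex:workflowB": {
        "ex:stepFetch": {"DataRetrieval"},
        "ex:stepPlot": {"DataVisualization"},
    },
}


def workflows_with(required: set[str]) -> list[str]:
    """Return workflows whose steps jointly cover every required motif."""
    matches = []
    for workflow, steps in ANNOTATIONS.items():
        present = set().union(*steps.values())
        if required <= present:
            matches.append(workflow)
    return matches


print(workflows_with({"Combine", "DataAnalysis", "Filter"}))
```

In a real deployment the same question would more likely be asked as a query over the RDF annotations; the dictionary here only stands in for such a store.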

The Workflow Motif ontology can also be used to annotate other types of scientific processes such as laboratory protocols or business workflows. Since these scientific processes may be defined according to different models, we define no domain for the object properties that bind a motif to a process (or a workflow step).

3.1 Describing workflows with the Workflow Motif ontology

Figure 1 shows an example of how the Workflow Motif ontology can be used, considering a workflow specification that has workflows (ex:Workflow) and workflow steps (ex:WorkflowStep). In this case we also use the notion of processes (ex:Process) to identify a generic class used to refer to both workflow steps and workflows. To annotate a workflow, the user can associate it with the corresponding Workflow Motif using the wfm:hasWorkflowMotif property. Similarly, to annotate a given workflow step, the user can associate it with the corresponding data operation motif using the wfm:hasDataOperationMotif property. Finally, in order to simplify the annotation process, the user may use the more general property wfm:hasMotif to associate a workflow or a workflow step with a motif (represented by the class wfm:Motif).

Figure 1. Example of the Workflow Motif ontology. If the workflow being annotated has ex:WorkflowSteps
and ex:Workflows, these classes could be used as domain of the properties of the Workflow Motif
ontology for annotation. The ex:Process class has been added in case the user doesn't want to
specify if the resource being annotated is a step or a workflow.
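
The pattern of Figure 1 can be sketched as plain triples. This is a minimal illustration, not the authors' serialization: the resource names (workflow1, step1, process1), the blank node labels, the chosen motif classes and the helper functions are assumptions of this example, and the ex namespace is the example one from Section 1.1.

```python
WFM = "http://purl.org/net/wf-motifs#"
EX = "http://example.org#"  # example namespace, as in Section 1.1
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Each triple is (subject, predicate, object); "_:" marks a blank node.
triples = [
    # A workflow annotated with a workflow-oriented motif.
    (EX + "workflow1", RDF_TYPE, EX + "Workflow"),
    (EX + "workflow1", WFM + "hasWorkflowMotif", "_:m1"),
    ("_:m1", RDF_TYPE, WFM + "CompositeWorkflow"),
    # A workflow step annotated with a data-operation motif.
    (EX + "step1", RDF_TYPE, EX + "WorkflowStep"),
    (EX + "step1", WFM + "hasDataOperationMotif", "_:m2"),
    ("_:m2", RDF_TYPE, WFM + "DataAnalysis"),
    # A generic process annotated with the more general property.
    (EX + "process1", RDF_TYPE, EX + "Process"),
    (EX + "process1", WFM + "hasMotif", "_:m3"),
    ("_:m3", RDF_TYPE, WFM + "Motif"),
]


def term(value: str) -> str:
    """Serialise an IRI or blank node label in N-Triples syntax."""
    return value if value.startswith("_:") else f"<{value}>"


def to_ntriples(rows) -> str:
    """Serialise (s, p, o) tuples as N-Triples lines."""
    return "\n".join(f"{term(s)} {term(p)} {term(o)} ." for s, p, o in rows)


print(to_ntriples(triples))
```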

Section 3.1.1 and Section 3.1.2 show how to annotate workflows from Taverna and Wings with the Workflow Motif ontology. The two systems use different workflow specification ontologies (wfdesc [Wfdesc] and OPMW [OPMW], respectively), but workflows from both are easily annotated with the Workflow Motif ontology.

3.1.1 Annotating a Taverna workflow defined with the wfdesc ontology

Figure 2 shows a workflow created in the Taverna workflow system for functional genomics, where different motifs have been identified in the workflow steps (dotted boxes). Three processes are stateful invocations of web services (getJobState, sleep and warp2D), two are moving data to external servers (DataUpload and DownloadResults), one performs the data analysis of the workflow (warp2D) and one augments the input for the warp2D from several input parameters of the workflow (warp2D_input).

The workflow is defined according to the wfdesc ontology [Wfdesc], where each workflow step is encoded as a wfdesc:Process and is identified by its own URI. Each wfdesc:Process has one or more inputs and produces an output. More details about the ontology are available on the wfdesc specification web page.

Figure 2. A Taverna workflow example for functional genomics, with some motifs highlighted.
The workflow transfers data files containing proteomics data to a remote server and
augments several parameters for the invocation request. Then the workflow waits for
job completion and inquires about the state of the submitted warping job. Once the
inquiry call is returned the results are downloaded from the remote server.

Figure 3 illustrates how to annotate four of the processors of the workflow in Figure 2 with their corresponding motif instances (the rest have been excluded for simplicity). Each processor (of type wfdesc:Process) is taken as the subject of the wfm:hasDataOperationMotif or wfm:hasWorkflowMotif properties, which link it to the appropriate instances of the motifs of the catalogue. In this example the motif instances are blank nodes, but any other identifier would be valid as well.

Figure 3. Annotation of part of the workflow of Figure 2 with the Workflow Motif ontology.
Each instance has the class it belongs to highlighted before its identifier.
Workflow fragment instances (processors) are represented in purple, while
workflow motif instances (blank nodes) are represented in blue.
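
The annotation style of Figure 3 can be approximated with the sketch below. It is built from the motifs named in the description of Figure 2, not from the figure itself, so the choice of four processors, the base IRI of the processors, the blank node labels and the wfdesc namespace IRI are assumptions of this example.

```python
from itertools import count

WFM = "http://purl.org/net/wf-motifs#"
WFDESC = "http://purl.org/wf4ever/wfdesc#"  # assumed wfdesc namespace
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
BASE = "http://example.org/warp2d-workflow#"  # hypothetical processor base IRI

# Processor name -> list of (annotation property, motif class), following
# the motifs named in the description of Figure 2.
PROCESSOR_MOTIFS = {
    "DataUpload": [("hasDataOperationMotif", "DataMovement")],
    "warp2D_input": [("hasDataOperationMotif", "InputAugmentation")],
    "warp2D": [
        ("hasDataOperationMotif", "DataAnalysis"),
        ("hasWorkflowMotif", "StatefulInvocation"),
    ],
    "getJobState": [("hasWorkflowMotif", "StatefulInvocation")],
}


def annotate(processor_motifs):
    """Build triples that link each processor to fresh blank-node motifs."""
    blank_ids = count(1)
    triples = []
    for name, motifs in processor_motifs.items():
        subject = BASE + name
        triples.append((subject, RDF_TYPE, WFDESC + "Process"))
        for prop, motif_class in motifs:
            node = f"_:mtf{next(blank_ids)}"
            triples.append((subject, WFM + prop, node))
            triples.append((node, RDF_TYPE, WFM + motif_class))
    return triples


for s, p, o in annotate(PROCESSOR_MOTIFS):
    print(s, p, o)
```

Note that a single processor such as warp2D can carry both a data-operation motif and a workflow-oriented motif, each through its own property.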

3.1.2 Annotating a Wings workflow defined with the OPMW ontology

Figure 4 shows a workflow created in the Wings workflow system for performing a ligand binding sites comparison of the inputs. Eight motifs have been identified in the workflow: two perform the data analysis of the input datasets (both instances of SMAPV2), two sort the obtained results (both instances of ResultSorter), two merge both branches of the workflow (Merger and SMAPAlignementMerger) and two identify a repetitive sequence (the sequence SMAPV2 plus ResultSorter occurs twice).

The workflow is defined according to the OPMW ontology [OPMW], where each step of the workflow is defined as an opmw:WorkflowTemplateProcess. Each opmw:WorkflowTemplateProcess uses one or more inputs and produces an output. More information about the OPMW ontology can be found on the ontology specification web page.

Figure 4. A Wings workflow for ligand binding sites comparison. The workflow takes as input
the proteins and homology models of TB-Drugome and compares them against the
approved drugs in parallel. The results are then sorted, filtered according
to a p-value and finally merged.

Figure 5 shows how some of the motifs identified in Figure 4 can be annotated with the Workflow Motif ontology (the rest have been omitted for clarity). The way the annotations are performed is similar to the method followed in Section 3.1.1. Each opmw:WorkflowTemplateProcess is bound to a motif instance with the wfm:hasDataOperationMotif or the wfm:hasWorkflowMotif object properties. A special case is the binding to the internal macro (_:mtf3), where the subject of the property is the sequence of the :SMAP_V2_1 and the :SMAPResultSorter_1 steps. In this case a named graph is used to group the steps (:namedGraph1) and bind them to the internal macro motif instance (_:mtf3).

Figure 5. Annotation of part of the workflow of Figure 4 with the Workflow Motif ontology. Each instance has the class
it belongs to highlighted before its identifier. Workflow fragment instances (processors) are represented
in purple, while workflow motif instances (blank nodes) are represented in blue. Two processors have
been grouped in a named graph in order to associate a motif to the sub-workflow they compose.
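
The named-graph grouping can be sketched with quads, where the fourth element names the graph a statement belongs to. This is an illustration under stated assumptions: the base IRI, the OPMW namespace IRI and the helper functions are not taken from the source, and only the identifiers :SMAP_V2_1, :SMAPResultSorter_1, :namedGraph1 and _:mtf3 come from the text above.

```python
WFM = "http://purl.org/net/wf-motifs#"
OPMW = "http://www.opmw.org/ontology/"  # assumed OPMW namespace
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
BASE = "http://example.org/smap-workflow#"  # hypothetical base IRI
GRAPH = BASE + "namedGraph1"

# Each quad is (subject, predicate, object, graph); graph None means the
# default graph.
quads = [
    # The two steps of the repeated sequence are grouped in the named graph.
    (BASE + "SMAP_V2_1", RDF_TYPE, OPMW + "WorkflowTemplateProcess", GRAPH),
    (BASE + "SMAPResultSorter_1", RDF_TYPE, OPMW + "WorkflowTemplateProcess", GRAPH),
    # The named graph itself is the subject of the motif annotation.
    (GRAPH, WFM + "hasWorkflowMotif", "_:mtf3", None),
    ("_:mtf3", RDF_TYPE, WFM + "InternalMacro", None),
]


def term(value: str) -> str:
    """Serialise an IRI or blank node label in N-Quads syntax."""
    return value if value.startswith("_:") else f"<{value}>"


def to_nquads(rows) -> str:
    """Serialise quads, omitting the graph label for the default graph."""
    lines = []
    for s, p, o, g in rows:
        parts = [term(s), term(p), term(o)]
        if g is not None:
            parts.append(term(g))
        lines.append(" ".join(parts) + " .")
    return "\n".join(lines)


print(to_nquads(quads))
```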

4. Cross reference for Workflow Motifs classes and properties

This section provides details for each class and property defined by the Workflow Motifs Ontology.

4.1 Classes

Atomic Workflow

IRI: http://purl.org/net/wf-motifs#AtomicWorkflow

Inter-workflow Motif used to characterize the workflows that perform an atomic unit of functionality, which effectively requires no sub-workflow usage. Typically these workflows are designed to be included in other workflows. Atomic Workflows are the main mechanism of modularizing functionality within scientific workflows.

has super-classes
Inter-workflow Motif

Combine

IRI: http://purl.org/net/wf-motifs#Combine

Data Preparation motif that refers to the step or group of steps in the workflow aggregating information from different sources. For example, the joining of two tables into a new one or the merging of three different files into a bigger one.

has super-classes
Data Preparation

Composite Workflow

IRI: http://purl.org/net/wf-motifs#CompositeWorkflow

Inter-workflow Motif referring to all those workflows that have one or more sub-workflows included in them (when these sub-workflows overlap they offer different views of the global workflow).

has super-classes
Inter-workflow Motif

Computational Step

IRI: http://purl.org/net/wf-motifs#ComputationalStep

Intra-Workflow Motif used to refer to activities performed by a computer. This motif applies to most of the steps of the workflow, except for those being Human Interaction Steps.

has super-classes
Intra-Workflow Motif

Data Analysis

IRI: http://purl.org/net/wf-motifs#DataAnalysis

Data Operation Motif that refers to a broad category of tasks in diverse domains. An important number of workflows are designed with the purpose of analyzing different features of input data, ranging from simple comparisons between the datasets to complex protein analysis to see whether two molecules can be docked successfully or not.

has super-classes
Data Operation Motif

Data Cleaning

IRI: http://purl.org/net/wf-motifs#DataCleaning

Data Operation Motif that refers to the step or series of steps for cleaning and curating data in a workflow. Typically these steps are undertaken by sophisticated tooling/services, or by human interactions. A Data Cleaning step preserves and enriches the content of data (e.g., by a user’s annotation of a result with additional information, detecting and removing inconsistencies on the data, etc.).

has super-classes
Data Operation Motif

Data Movement

IRI: http://purl.org/net/wf-motifs#DataMovement

Certain analysis activities that are performed via external tools or services require the submission of data to a location accessible by the service/tool (i.e., a web or a local directory respectively). In such cases the workflow contains dedicated step(s) for the upload/transfer of data to these locations. The same applies to the outputs, in which case a data download/retrieval step is used to chain the data to the next steps of the workflow.

has super-classes
Data Operation Motif

Data Preparation

IRI: http://purl.org/net/wf-motifs#DataPreparation

Data, as it is originally retrieved, may need several transformations before being able to be used in a workflow step. These steps, typically known as "Shims" [SHIMS], can be annotated using the Data Preparation Motif.

has super-classes
Data Operation Motif
has sub-classes
Combine, Filter, Format Transformation, Group, Input Augmentation, Output Extraction, Sort, Split
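
To give a feel for the eight Data Preparation sub-classes, the sketch below applies one small operation per motif to a toy table. The records, field layout, threshold and query string are hypothetical and only meant to illustrate the kind of step each motif describes.

```python
import csv
import io
from itertools import groupby

# Hypothetical (protein, organism, p-value) records.
table_a = [("P1", "human", 0.03), ("P2", "mouse", 0.20)]
table_b = [("P3", "human", 0.01)]

combined = table_a + table_b                              # Combine
filtered = [row for row in combined if row[2] < 0.05]     # Filter
ordered = sorted(combined, key=lambda row: row[2])        # Sort
first, second = combined[:2], combined[2:]                # Split

# Group: reorganise rows by organism (groupby needs sorted input).
by_organism = sorted(combined, key=lambda row: row[1])
grouped = {k: list(g) for k, g in groupby(by_organism, key=lambda row: row[1])}

# Format Transformation: same content, CSV representation.
buffer = io.StringIO()
csv.writer(buffer, lineterminator="\n").writerows(combined)
as_csv = buffer.getvalue()

# Output Extraction: keep only the identifiers from an earlier result.
identifiers = [row[0] for row in ordered]

# Input Augmentation: assemble a request from several parameters.
query = "organism={}&max_p={}".format("human", 0.05)

print(filtered, identifiers, query, sep="\n")
```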

Data Retrieval

IRI: http://purl.org/net/wf-motifs#DataRetrieval

Workflows exploit heterogeneous data sources, remote databases, repositories and other web resources mostly exposed via SOAP or REST services. Scientific data deposited in these repositories are retrieved through query and retrieval steps inside workflows. The Data Retrieval motif identifies those tasks within the workflow which are responsible for retrieving data from external sources into the workflow environment.

has super-classes
Data Operation Motif

Data Visualization

IRI: http://purl.org/net/wf-motifs#DataVisualization

Being able to show the results is as important as producing them in some workflows. Scientists use visualizations to show the conclusions of their experiments and to take important decisions in the pipeline itself. Therefore certain steps in workflows are dedicated to generation of plots and graph outputs from input data. The Data Visualization motif also includes the generation of tables and files for browsing and reading the results of the workflow.

has super-classes
Data Operation Motif

Filter

IRI: http://purl.org/net/wf-motifs#Filter

Data Preparation motif that refers to a filtering step or set of steps. For example, a file filtered by a set of keywords, a table filtered by a threshold, etc.

has super-classes
Data Preparation

Format Transformation

IRI: http://purl.org/net/wf-motifs#FormatTransformation

Workflows that bring together multiple access or analysis activities usually contain steps for Format Transformations. These steps preserve the content of the inputs while converting their representation format. Examples include a converter from CSV to VOTable format in the astronomy domain, a converter from FASTA to Swiss-Prot sequence format in biology, or an ARFF formatting component for Weka in the text analysis domain.

has super-classes
Data Preparation

Group

IRI: http://purl.org/net/wf-motifs#Group

Data Preparation motif that refers to the step or set of steps that reorganize the input into different groups. For example, grouping a table by a certain category.

has super-classes
Data Preparation

Human Interaction Step

IRI: http://purl.org/net/wf-motifs#HumanInteractionStep

Intra-Workflow Motif used to characterize the activities that require human inputs during their execution. For example, manual data curation of a table for a future step in the workflow, cleaning and filtering steps (such as selecting a specific dataset to continue the experiment), etc.

has super-classes
Intra-Workflow Motif

Input Augmentation

IRI: http://purl.org/net/wf-motifs#InputAugmentation

Data Preparation motif that refers to the step or set of steps dedicated to generating an aggregation of multiple parameters and scripts for tools and external services. For example, the generation of queries for input retrieval through an aggregation of multiple parameters, the generation of scripts to be executed in further steps, etc.

has super-classes
Data Preparation

Inter-workflow Motif

IRI: http://purl.org/net/wf-motifs#InterWorkflowMotif

Workflow Motif that relates workflows with each other by determining whether different workflows are a composition of each other (Composite Workflow) or not (Atomic Workflow), or they have a very similar composition but work for different inputs (Workflow Overloading).

has super-classes
WorkflowMotif
has sub-classes
Atomic Workflow, Composite Workflow, Workflow Overload

Internal Macro

IRI: http://purl.org/net/wf-motifs#InternalMacro

Intra-Workflow Motif that refers to those groups of steps in the workflow that correspond to repetitive patterns of combining tasks. For example, if a workflow has several branches with the same sequence of repeated steps, the sequence becomes an Internal Macro.

has super-classes
Intra-Workflow Motif
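
As a rough illustration of what an Internal Macro looks like, the following sketch finds step sequences that recur in more than one branch of a workflow. The motif catalogue itself was obtained by manual analysis, so this heuristic, the function name repeated_sequences and the min_length parameter are assumptions of this example and not part of the catalogue.

```python
from collections import Counter


def repeated_sequences(branches, min_length=2):
    """Return step sequences (length >= min_length) found in 2+ branches."""
    seen = Counter()
    for branch in branches:
        in_branch = set()
        for start in range(len(branch)):
            for end in range(start + min_length, len(branch) + 1):
                in_branch.add(tuple(branch[start:end]))
        # Count each sequence once per branch, however often it occurs there.
        seen.update(in_branch)
    return {seq for seq, n in seen.items() if n >= 2}


# Two branches with the same sequence of steps, as in the Wings example of
# Section 3.1.2.
branches = [
    ["SMAPV2", "ResultSorter"],
    ["SMAPV2", "ResultSorter"],
]
print(repeated_sequences(branches))
```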

Intra-Workflow Motif

IRI: http://purl.org/net/wf-motifs#IntraWorkflowMotif

Workflow Motif that describes a step or a series of steps within a single workflow.

has super-classes
WorkflowMotif
has sub-classes
Computational Step, Human Interaction Step, Internal Macro, Stateful (asynchronous) invocation, Stateless (synchronous) invocation

Motif

IRI: http://purl.org/net/wf-motifs#Motif

A motif is a domain-independent conceptual abstraction of one or more steps of a given workflow.

has sub-classes
Data Operation Motif, WorkflowMotif
is in range of
has motif

Data Operation Motif

IRI: http://purl.org/net/wf-motifs#DataOperationMotif

A Data Operation Motif describes the data manipulation and/or transformation carried out by a step in the workflow, a collection of steps in the workflow or a sub-workflow.

has super-classes
Motif
has sub-classes
Data Preparation, Data Cleaning, Data Analysis, Data Retrieval, Data Visualization, Data Movement
is in range of
has Data Operation Motif

Output Extraction

IRI: http://purl.org/net/wf-motifs#OutputExtraction

Data Preparation motif that refers to the step or set of steps in the workflow retrieving only the relevant portion from the output of a previous step. For example, retrieving the tag value of an XML fragment.

has super-classes
Data Preparation

Sort

IRI: http://purl.org/net/wf-motifs#Sort

Data Preparation motif that refers to the step or set of steps ordering the input by a certain parameter. For example, a quicksort algorithm that takes an unordered vector and produces a sorted output.

has super-classes
Data Preparation

Split

IRI: http://purl.org/net/wf-motifs#Split

Data Preparation motif that refers to the step or steps in the workflow separating an input into different outputs. For example, splitting a dataset into three different subsets to be processed in parallel in a workflow.

has super-classes
Data Preparation

Stateful (asynchronous) invocation

IRI: http://purl.org/net/wf-motifs#StatefulInvocation

Certain activities such as analysis or visualizations could be performed through interaction with stateful (web) services that allow for creation of jobs over remote grid environments. These are typically performed via invocation of multiple operations at a service endpoint. An example would be a BLAST job submission where the service invoker is responsible to first create a job, then submit the data, check the status and retrieve the results once it has finished.

has super-classes
Intra-Workflow Motif
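
The multi-operation interaction described for stateful invocations (create a job, submit the data, check the status, retrieve the results) can be sketched as follows. FakeJobService is a hypothetical in-memory stand-in written for this example; it does not model BLAST or any real service API.

```python
class FakeJobService:
    """In-memory stand-in for a stateful remote service (hypothetical)."""

    def __init__(self, polls_until_done=2):
        self._polls_left = polls_until_done
        self._data = None

    def create_job(self):
        return "job-1"

    def submit(self, job_id, data):
        self._data = data

    def status(self, job_id):
        if self._polls_left > 0:
            self._polls_left -= 1
            return "RUNNING"
        return "DONE"

    def results(self, job_id):
        # Upper-casing stands in for whatever the remote analysis computes.
        return self._data.upper()


def run_stateful(service, data):
    """Create a job, submit data, poll until done, then fetch the results."""
    job_id = service.create_job()
    service.submit(job_id, data)
    polls = 0
    while service.status(job_id) != "DONE":
        polls += 1  # a real client would sleep between polls
    return service.results(job_id), polls


print(run_stateful(FakeJobService(), "acgt"))
```

In a workflow, each of these operations typically appears as a separate step, which is why several steps of the Taverna example in Section 3.1.1 share the stateful invocation motif.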

Stateless (synchronous) invocation

IRI: http://purl.org/net/wf-motifs#StatelessInvocation

Intra-Workflow Motif that requires a step in the workflow for performing a service call or tool invocation. All the steps of a workflow are by default stateless (synchronous) invocations unless they are explicitly declared to be stateful.

has super-classes
Intra-Workflow Motif

WorkflowMotif

IRI: http://purl.org/net/wf-motifs#WorkflowMotif

Motif that describes how a Data Operation Motif is realized (i.e., implemented) within a workflow. For example, a visualization step (Data Operation Motif) can be realized in different ways: via a stateful multi-step invocation, through a single Stateless Invocation (depending on the environmental constraints and nature of the services), or via a sub-workflow.

has super-classes
Motif
has sub-classes
Inter-workflow Motif, Intra-Workflow Motif
is in range of
has Workflow Motif

Workflow Overload

IRI: http://purl.org/net/wf-motifs#WorkflowOverload

Inter-workflow Motif used to characterize workflows that are used to operate over different input parameter types. An example is performing an analysis over a String input parameter, or performing it over the contents of a specified File. Overloading is a direct response to the heterogeneity of environments in which workflows are used.

has super-classes
Inter-workflow Motif

4.2 Object Properties

has motif

IRI: http://purl.org/net/wf-motifs#hasMotif

Object property used to annotate a step in the workflow, a group of steps, a subworkflow, or a workflow with a motif.

has sub-properties
has Data Operation Motif, has Workflow Motif
has range
Motif

has Data Operation Motif

IRI: http://purl.org/net/wf-motifs#hasDataOperationMotif

Object property that connects a workflow step with its corresponding Data Operation Motif.

has super-properties
has motif
has range
Data Operation Motif

has Workflow Motif

IRI: http://purl.org/net/wf-motifs#hasWorkflowMotif

Object property that relates a workflow to its corresponding Workflow Motif.

has super-properties
has motif
has range
WorkflowMotif
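
Because wfm:hasDataOperationMotif and wfm:hasWorkflowMotif are sub-properties of wfm:hasMotif, every annotation made with one of them also implies a wfm:hasMotif statement. The sketch below makes that entailment explicit over plain triples; the function name and the sample subjects are hypothetical, and an OWL or RDFS reasoner would normally derive these statements instead.

```python
WFM = "http://purl.org/net/wf-motifs#"

# Sub-property axioms stated in this section.
SUPER_PROPERTY = {
    WFM + "hasDataOperationMotif": WFM + "hasMotif",
    WFM + "hasWorkflowMotif": WFM + "hasMotif",
}


def with_entailed_has_motif(triples):
    """Add the wfm:hasMotif triple implied by each sub-property triple."""
    result = set(triples)
    for s, p, o in triples:
        if p in SUPER_PROPERTY:
            result.add((s, SUPER_PROPERTY[p], o))
    return result


# Hypothetical annotations of a step and a workflow.
asserted = [
    ("ex:step1", WFM + "hasDataOperationMotif", "_:m1"),
    ("ex:workflow1", WFM + "hasWorkflowMotif", "_:m2"),
]
for triple in sorted(with_entailed_has_motif(asserted)):
    print(triple)
```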

6. References

[Galaxy]
Jeremy Goecks, Anton Nekrutenko, and James Taylor. Galaxy: a comprehensive approach for supporting accessible, reproducible, and transparent computational research in the life sciences. Genome Biology, 11(8):R86, 2010. http://genomebiology.com/2010/11/8/R86
[OPMW]
Daniel Garijo and Yolanda Gil. A new approach for publishing workflows: Abstractions, standards, and Linked Data. In Proceedings of the 6th workshop on Workflows in support of large-scale science, pages 47-56, Seattle, 2011. ACM. http://www.isi.edu/~gil/papers/garijo-gil-works11.pdf
[SHIMS]
Hull, Duncan, Robert Stevens, Phillip Lord, Chris Wroe, and Carole Goble. 2004. Treating shimantic web syndrome with ontologies. In AKT Workshop on Semantic Web Services, ed. John Domingue, Liliana Cabral, and Enrico Motta. Vol. 122. http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.127.1644.
[Taverna]
Paolo Missier, Stian Soiland-Reyes, Stuart Owen, Wei Tan, Alex Nenadic, Ian Dunlop, Alan Williams, Tom Oinn, and Carole Goble. Taverna, reloaded. In M Gertz, T Hey, and B Ludaescher, editors, Procs. SSDBM 2010, Heidelberg, Germany, 2010. http://www.taverna.org.uk/pages/wp-content/uploads/2010/04/T2Architecture.pdf
[Vistrails]
Carlos E. Scheidegger, Huy T. Vo, David Koop, Juliana Freire, and Claudio T. Silva. Querying and re-using workflows with vistrails. In Proceedings of the 2008 ACM SIGMOD international conference on Management of data, SIGMOD '08, pages 1251-1254, New York, NY, USA, 2008. ACM. http://dl.acm.org/citation.cfm?id=1376747
[Workflow Catalogue]
Common Motifs in Scientific Workflows: An Empirical Analysis. Garijo, D.; Alper, P.; Belhajjame, K.; Corcho, O.; Gil, Y.; and Goble, C. 2012. In 8th IEEE International Conference on eScience 2012, IEEE Computer Society Press, USA, Chicago. http://www.isi.edu/~gil/papers/garijo-etal-escience12.pdf.
[Wfdesc]
Khalid Belhajjame, Oscar Corcho, Daniel Garijo, Jun Zhao, Paolo Missier, David Newman, Raul Palma, Sean Bechhofer, Esteban Garcia-Cuesta, Jose-Manuel Gomez-Perez, Graham Klyne, Kevin Page, Marco Roos, Jose Enrique Ruiz, Stian Soiland-Reyes, Lourdes Verdes-Montenegro, David De Roure, and Carole Goble. Workflow-centric research objects: First class citizens in scholarly discourse. In Proceedings of Sepublica2012, pages 1-12, 2012. http://users.ox.ac.uk/~oerc0033/preprints/sepublica2012.pdf
[Wings]
Yolanda Gil, Varun Ratnakar, Jihie Kim, Pedro A. González-Calero, Paul T. Groth, Joshua Moody, and Ewa Deelman. Wings: Intelligent workflow-based design of computational experiments. IEEE Intelligent Systems, 26(1):62–72, 2011. http://www.isi.edu/~gil/papers/gil-etal-ieee-is-11.pdf

7. Acknowledgements

We would like to thank Silvio Peroni for developing the LODE framework, partially used for the cross reference section of this document, and Raul Alcazar and Miguel Angel García Delgado for their technical support.

8. Changes from the last release

  • Renamed "OperationMotif" to "DataOperationMotif".
  • Removed "StructuredDataPreparation" and "UnstructuredDataPreparation" classes.
  • Replaced "Merge" and "Join" with the "Combine" class.
  • Removed the "DataCreation" class.
  • Renamed "AsynchronousInvocation" and "SynchronousInvocation" to "StatefulInvocation" and "StatelessInvocation" respectively.
  • Renamed "implementsMotif" to "hasMotif". The change applies to its subproperties as well.
  • Renamed "ComputationalInteraction" to "ComputationalStep".