This document explains the ontology for the Workflow Motif catalogue described in [Workflow Catalogue]. The catalogue summarizes the results of a manual analysis performed over a set of real-world scientific workflows from Taverna [Taverna], Wings [Wings], Galaxy [Galaxy] and Vistrails [Vistrails]. Workflow Motifs outline the kinds of data-intensive activities that are observed in workflows (data-operation motifs) and the different manners in which activities are implemented within workflows (workflow-oriented motifs). These motifs help to identify the functionality of the steps in a given workflow, to develop best practices for workflow design, and to develop approaches for the automated generation of workflow abstractions.
The latest OWL encoding of the Workflow Motifs Ontology can be found here.
Most of the content displayed in this document has been retrieved from [Workflow Catalogue].
Scientific workflows have been increasingly used in the last decade as an instrument for data-intensive science. Workflows serve a dual function: first, as detailed documentation of the scientific method used for an experiment (i.e., the input sources and processing steps taken for the derivation of a certain data item), and second, as re-usable, executable artifacts for data-intensive analysis. Scientific workflows are composed of a variety of data manipulation activities such as data movement, data transformation, data analysis and data visualization to serve the goals of the scientific study. The composition is done through the constructs made available by the workflow system used, and is largely shaped by the function undertaken by the workflow and the environment in which the system operates.
A major difficulty in understanding workflows is their complex nature. A workflow may contain several scientifically-significant analysis steps, combined with other Data Preparation or result delivery activities, and in different implementation styles depending on the environment and context in which the workflow is executed. This difficulty in understanding stands in the way of reusing workflows.
As a first step towards addressing this issue, [Workflow Catalogue] describes a catalogue of domain-independent conceptual abstractions for workflow steps called scientific Workflow Motifs. The catalogue was built based on an empirical analysis performed over 260 workflow descriptions from Taverna [Taverna], Wings [Wings], Galaxy [Galaxy] and Vistrails [Vistrails]. Motifs are provided through i) a characterization of the kinds of data-operation activities that are carried out within workflows, which are referred to as data-operation motifs, and ii) a characterization of the different manners in which those activity motifs are realized/implemented within workflows, referred to as workflow-oriented motifs.
This document specifies the classes and properties of the Workflow Motifs ontology, the OWL 2 encoding of the aforementioned motif catalogue. The goal of this ontology is to provide the means to annotate workflows and their steps with the motifs of the vocabulary, without setting any restriction on how the workflows themselves are defined.
owl | <http://www.w3.org/2002/07/owl#> |
rdfs | <http://www.w3.org/2000/01/rdf-schema#> |
wfm | <http://purl.org/net/wf-motifs#> |
ex | A prefix used for the examples. It could be, for instance, <http://example.org#> |
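The prefix table above can be read as a simple lookup from prefix to namespace IRI. The following is a minimal sketch of that reading, not part of the ontology itself: the `expand` helper is hypothetical, and the `ex` namespace is only the illustrative value suggested in the table.

```python
# Prefix table taken from the namespace declarations of this document.
# The "ex" IRI is only an illustrative value; any other namespace would do.
PREFIXES = {
    "owl": "http://www.w3.org/2002/07/owl#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "wfm": "http://purl.org/net/wf-motifs#",
    "ex": "http://example.org#",
}


def expand(curie: str) -> str:
    """Expand a prefixed name such as 'wfm:Motif' into a full IRI."""
    prefix, sep, local = curie.partition(":")
    if not sep or prefix not in PREFIXES:
        raise ValueError(f"unknown or missing prefix in {curie!r}")
    return PREFIXES[prefix] + local


print(expand("wfm:DataAnalysis"))
# prints http://purl.org/net/wf-motifs#DataAnalysis
```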
The classes and properties of the ontology can be found below. It is important to note that, in order to keep the ontology simple, the properties that link a workflow step with a motif have no domain specified. This decision has been taken because workflows can be represented in many different ways, and how workflows are represented is out of the scope of this ontology. Examples of how to annotate workflows can be found in section 3.1.
The classes identify the different motifs obtained after performing the manual analysis. They correspond to those shown in the previous hierarchy.
The object properties relate a workflow or step of a workflow to the motif or motifs describing its functionality.
The goal of this document is to define an OWL 2 ontology for the motif catalogue proposed in [Workflow Catalogue]. The Workflow Motif ontology provides the means to annotate scientific workflows, helping creators and curators to describe their functionality and facilitating the search for workflows with a particular purpose (e.g., retrieving workflows with merging, analysis and filtering steps).
The Workflow Motif ontology can also be used to annotate other types of scientific processes such as laboratory protocols or business workflows. Since these scientific processes may be defined according to different models, we define no domain for the object properties that bind a motif to a process (or a workflow step).
Figure 1 shows an example of how the Workflow Motif ontology can be used, considering a workflow specification that has workflows (ex:Workflow) and workflow steps (ex:WorkflowStep). In this case we also use the notion of processes (ex:Process) to identify a generic class used to refer to both workflow steps and workflows. To annotate a workflow, the user can associate it with the corresponding Workflow Motif using the wfm:hasWorkflowMotif property. Similarly, to annotate a given workflow step the user can associate it with the corresponding data operation motif using the wfm:hasDataOperationMotif property. Finally, in order to simplify the annotation process, the user may use the more general property wfm:hasMotif to associate a workflow or a workflow step with a motif (represented by the class wfm:Motif).
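The annotation pattern described for Figure 1 can be sketched as a handful of RDF triples. The snippet below is only an illustration under stated assumptions: the resources ex:workflow1 and ex:step1, the example namespace, and the choice of motifs (Atomic Workflow, Data Analysis) are hypothetical, and the triples are written out as N-Triples using plain Python rather than an RDF library.

```python
WFM = "http://purl.org/net/wf-motifs#"
EX = "http://example.org#"  # hypothetical namespace for the example resources
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def term(value: str) -> str:
    """Render a blank node label as-is and anything else as an IRI."""
    return value if value.startswith("_:") else f"<{value}>"


def to_ntriples(triples) -> str:
    """Serialize (subject, predicate, object) tuples as N-Triples lines."""
    return "\n".join(f"{term(s)} {term(p)} {term(o)} ." for s, p, o in triples)


triples = [
    # A workflow annotated with a workflow-oriented motif.
    (EX + "workflow1", RDF_TYPE, EX + "Workflow"),
    (EX + "workflow1", WFM + "hasWorkflowMotif", "_:m1"),
    ("_:m1", RDF_TYPE, WFM + "AtomicWorkflow"),
    # A workflow step annotated with a data operation motif.
    (EX + "step1", RDF_TYPE, EX + "WorkflowStep"),
    (EX + "step1", WFM + "hasDataOperationMotif", "_:m2"),
    ("_:m2", RDF_TYPE, WFM + "DataAnalysis"),
]

print(to_ntriples(triples))
```

Because the properties have no domain, the same two properties could be attached to resources typed with any other workflow model in place of ex:Workflow and ex:WorkflowStep.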
Section 3.1.1 and Section 3.1.2 show how to annotate workflows from Taverna and Wings with the Workflow Motif ontology. Both systems use different workflow specification ontologies (wfdesc [Wfdesc] and OPMW [OPMW]) but they are easily annotated with the Workflow Motif ontology.
Figure 2 shows a workflow created in the Taverna workflow system for functional genomics, where different motifs have been identified in the workflow steps (dotted boxes). Three processes are stateful invocations of web services (getJobState, sleep and warp2D), two are moving data to external servers (DataUpload and DownloadResults), one performs the data analysis of the workflow (warp2D) and one augments the input for the warp2D from several input parameters of the workflow (warp2D_input).
The workflow is defined according to the wfdesc ontology [Wfdesc], where every workflow step is encoded as a wfdesc:Process with its own URI. Each wfdesc:Process has one or more inputs and produces an output.
More details about the ontology are available on the wfdesc specification web page.
Figure 3 illustrates how to annotate four of the processors of the workflow in Figure 2 with their corresponding motif instances (the rest have been excluded for simplicity). Each processor (of type wfdesc:Process) is taken as the subject of the wfm:hasDataOperationMotif or wfm:hasWorkflowMotif properties, which link it to the appropriate instances of the motifs of the catalogue. In this example the motif instances are blank nodes, but any other identifier would be valid as well.
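As a rough sketch of this kind of annotation, the following snippet builds triples for four processors of the Taverna workflow, using one blank node per motif instance. Several details are assumptions: the base IRI of the workflow is hypothetical, the wfdesc namespace IRI is assumed rather than taken from this document, and the four processors chosen here follow the description of Figure 2 and may differ from those actually shown in Figure 3.

```python
from itertools import count

WFM = "http://purl.org/net/wf-motifs#"
WFDESC = "http://purl.org/wf4ever/wfdesc#"  # assumed wfdesc namespace
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
BASE = "http://example.org/warp2d-workflow#"  # hypothetical workflow IRI

# (processor, annotation property, motif class), following the text on Figure 2.
ANNOTATIONS = [
    ("warp2D", "hasDataOperationMotif", "DataAnalysis"),
    ("warp2D", "hasWorkflowMotif", "StatefulInvocation"),
    ("DataUpload", "hasDataOperationMotif", "DataMovement"),
    ("warp2D_input", "hasDataOperationMotif", "InputAugmentation"),
    ("getJobState", "hasWorkflowMotif", "StatefulInvocation"),
]


def annotate(annotations):
    """Turn the annotation table into triples, one blank node per motif instance."""
    ids = count(1)
    typed = set()
    triples = []
    for step, prop, motif in annotations:
        subject = BASE + step
        if subject not in typed:  # type each processor only once
            triples.append((subject, RDF_TYPE, WFDESC + "Process"))
            typed.add(subject)
        node = f"_:mtf{next(ids)}"
        triples.append((subject, WFM + prop, node))
        triples.append((node, RDF_TYPE, WFM + motif))
    return triples


def steps_with_motif(triples, motif):
    """Return the local names of the processors annotated with the given motif."""
    nodes = {s for s, p, o in triples if p == RDF_TYPE and o == WFM + motif}
    return sorted(
        s.rsplit("#", 1)[1]
        for s, p, o in triples
        if p.startswith(WFM) and o in nodes
    )


triples = annotate(ANNOTATIONS)
print(steps_with_motif(triples, "StatefulInvocation"))
```

The lookup at the end hints at the retrieval use case: once steps are annotated, finding the processors that realize a given motif only requires following the two annotation properties.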
Figure 4 shows a workflow created in the Wings workflow system for performing a ligand binding sites comparison of the inputs. Eight motifs have been identified in the workflow: two perform the data analysis of the input datasets (both instances of SMAPV2), two sort the obtained results (both instances of ResultSorter), two merge both branches of the workflow (Merger and SMAPAlignementMerger) and two identify a repetitive sequence (the sequence SMAPV2 plus ResultSorter occurs twice).
The workflow is defined according to the OPMW ontology [OPMW], where each step of the workflow is defined as an opmw:WorkflowTemplateProcess. Each opmw:WorkflowTemplateProcess uses one or more inputs and produces an output. More information about the OPMW ontology can be found in the ontology specification web page.
Figure 5 shows how some of the motifs identified in Figure 4 can be annotated with the Workflow Motif ontology (the rest have been omitted for clarity). The annotations are performed in the same way as in Section 3.1.1: each opmw:WorkflowTemplateProcess is bound to a motif instance with the wfm:hasDataOperationMotif or the wfm:hasWorkflowMotif object properties. A special case is the binding to the internal macro (_:mtf3), where the subject of the property is the sequence of the :SMAP_V2_1 and the :SMAPResultSorter_1 steps. In this case a named graph (:namedGraph1) is used to group the steps and bind them to the internal macro motif instance (_:mtf3).
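One possible reading of the named-graph binding is sketched below as a small set of quads serialized as N-Quads. This is only an illustration: the base IRI is hypothetical, the OPMW namespace IRI is assumed rather than taken from this document, and which statements about the two steps are placed inside :namedGraph1 is a guess at what Figure 5 shows.

```python
WFM = "http://purl.org/net/wf-motifs#"
OPMW = "http://www.opmw.org/ontology/"  # assumed OPMW namespace
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
BASE = "http://example.org/smap-workflow#"  # hypothetical workflow IRI
GRAPH = BASE + "namedGraph1"

# (subject, predicate, object, graph); graph None means the default graph.
quads = [
    # The two steps of the repeated sequence are grouped in the named graph.
    (BASE + "SMAP_V2_1", RDF_TYPE, OPMW + "WorkflowTemplateProcess", GRAPH),
    (BASE + "SMAPResultSorter_1", RDF_TYPE, OPMW + "WorkflowTemplateProcess", GRAPH),
    # The named graph itself is the subject bound to the internal macro instance.
    (GRAPH, WFM + "hasWorkflowMotif", "_:mtf3", None),
    ("_:mtf3", RDF_TYPE, WFM + "InternalMacro", None),
]


def term(value: str) -> str:
    """Render a blank node label as-is and anything else as an IRI."""
    return value if value.startswith("_:") else f"<{value}>"


def to_nquads(quads) -> str:
    """Serialize quads as N-Quads; default-graph statements omit the graph term."""
    lines = []
    for s, p, o, g in quads:
        parts = [term(s), term(p), term(o)]
        if g is not None:
            parts.append(term(g))
        lines.append(" ".join(parts) + " .")
    return "\n".join(lines)


def members(quads, graph):
    """Return the subjects described inside the given named graph."""
    return sorted({s for s, p, o, g in quads if g == graph})


print(to_nquads(quads))
```

Using the graph IRI as the subject keeps the annotation property unchanged: the group of steps is annotated exactly like a single step would be.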
IRI: http://purl.org/net/wf-motifs#AtomicWorkflow
IRI: http://purl.org/net/wf-motifs#Combine
Data Preparation motif that refers to the step or group of steps in the workflow aggregating information from different sources. For example, the joining of two tables into a new one or the merging of three different files into a bigger one.
IRI: http://purl.org/net/wf-motifs#CompositeWorkflow
Inter-workflow Motif referring to all those workflows that have one or more sub-workflows included in them (when these sub-workflows overlap they offer different views of the global workflow).
IRI: http://purl.org/net/wf-motifs#ComputationalStep
Intra-Workflow Motif used to refer to activities performed by a computer. This motif applies to most of the steps of the workflow, except for those being Human Interaction Steps.
IRI: http://purl.org/net/wf-motifs#DataAnalysis
Data Operation Motif that refers to a broad category of tasks in diverse domains. An important number of workflows are designed with the purpose of analyzing different features of input data, ranging from simple comparisons between the datasets to complex protein analysis to see whether two molecules can be docked successfully or not.
IRI: http://purl.org/net/wf-motifs#DataCleaning
Data Operation Motif that refers to the step or series of steps for cleaning and curating data in a workflow. Typically these steps are undertaken by sophisticated tooling/services, or by human interactions. A Data Cleaning step preserves and enriches the content of data (e.g., by a user’s annotation of a result with additional information, detecting and removing inconsistencies on the data, etc.).
IRI: http://purl.org/net/wf-motifs#DataMovement
Certain analysis activities that are performed via external tools or services require the submission of data to a location accessible by the service/tool (i.e., a web or a local directory respectively). In such cases the workflow contains dedicated step(s) for the upload/transfer of data to these locations. The same applies to the outputs, in which case a data download/retrieval step is used to chain the data to the next steps of the workflow.
IRI: http://purl.org/net/wf-motifs#DataPreparation
Data, as it is originally retrieved, may need several transformations before it can be used in a workflow step. These steps, typically known as "Shims" [SHIMS], can be annotated using the Data Preparation Motif.
IRI: http://purl.org/net/wf-motifs#DataRetrieval
Workflows exploit heterogeneous data sources, remote databases, repositories and other web resources mostly exposed via SOAP or REST services. Scientific data deposited in these repositories are retrieved through query and retrieval steps inside workflows. The Data Retrieval motif identifies those tasks within the workflow which are responsible for retrieving data from external sources into the workflow environment.
IRI: http://purl.org/net/wf-motifs#DataVisualization
Being able to show the results is as important as producing them in some workflows. Scientists use visualizations to show the conclusions of their experiments and to take important decisions in the pipeline itself. Therefore certain steps in workflows are dedicated to the generation of plots and graph outputs from input data. The Data Visualization motif also includes the generation of tables and files for browsing and reading the results of the workflow.
IRI: http://purl.org/net/wf-motifs#Filter
Data Preparation motif that refers to a filtering step or set of steps. For example, a file filtered by a set of keywords, a table filtered by a threshold, etc.
IRI: http://purl.org/net/wf-motifs#FormatTransformation
Workflows that bring together multiple access or analysis activities usually contain steps for Format Transformations. These steps preserve the content of the inputs while converting their representation format. Examples include a converter from CSV to VOTable format in the astronomy domain, a converter from FASTA to Swiss-Prot sequence format in biology, or an ARFF formatting component for Weka in the text analysis domain.
IRI: http://purl.org/net/wf-motifs#Group
Data Preparation motif that refers to the step or set of steps that reorganize the input into different groups. For example, grouping a table by a certain category.
IRI: http://purl.org/net/wf-motifs#HumanInteractionStep
Intra-Workflow Motif used to characterize the activities that require human inputs during their execution. For example, manual data curation of a table for a future step in the workflow, cleaning and filtering steps (such as selecting a specific dataset to continue the experiment), etc.
IRI: http://purl.org/net/wf-motifs#InputAugmentation
Data Preparation motif that refers to the step or set of steps dedicated to generating an aggregation of multiple parameters and scripts for tools and external services. For example, the generation of queries for input retrieval through an aggregation of multiple parameters, the generation of scripts to be executed in further steps, etc.
IRI: http://purl.org/net/wf-motifs#InterWorkflowMotif
Workflow Motif that relates workflows with each other by determining whether different workflows are a composition of each other (Composite Workflow) or not (Atomic Workflow), or they have a very similar composition but work for different inputs (Workflow Overloading).
IRI: http://purl.org/net/wf-motifs#InternalMacro
Intra-Workflow Motif that refers to those groups of steps in the workflow that correspond to repetitive patterns of combining tasks. For example, if a workflow has several branches with the same sequence of repeated steps, the sequence becomes an Internal Macro.
IRI: http://purl.org/net/wf-motifs#IntraWorkflowMotif
Workflow Motif that describes a step or a series of steps within a single workflow.
IRI: http://purl.org/net/wf-motifs#Motif
A motif is a domain independent conceptual abstraction of one or more steps of a given workflow.
IRI: http://purl.org/net/wf-motifs#DataOperationMotif
A Data Operation Motif describes the data manipulation and/or transformation carried out by a step in the workflow, a collection of steps in the workflow or a sub-workflow.
IRI: http://purl.org/net/wf-motifs#OutputExtraction
Data Preparation motif that refers to the step or set of steps in the workflow retrieving only the relevant portion from the output of a previous step. For example, retrieving the tag value of an XML fragment.
IRI: http://purl.org/net/wf-motifs#Sort
Data Preparation motif that refers to the step or set of steps ordering the input by a certain parameter. For example, a quicksort algorithm that takes an unordered vector and produces a sorted output.
IRI: http://purl.org/net/wf-motifs#Split
Data Preparation motif that refers to the step or steps in the workflow separating an input into different outputs. For example, splitting a dataset into three different subsets to be processed in parallel in a workflow.
IRI: http://purl.org/net/wf-motifs#StatefulInvocation
Certain activities such as analysis or visualization can be performed through interaction with stateful (web) services that allow for the creation of jobs over remote grid environments. These are typically performed via the invocation of multiple operations at a service endpoint. An example would be a BLAST job submission where the service invoker is responsible for first creating a job, then submitting the data, checking the status and retrieving the results once the job has finished.
IRI: http://purl.org/net/wf-motifs#StatelessInvocation
Intra-Workflow Motif that requires a step in the workflow for performing a service call or tool invocation. All the steps of a workflow are by default stateless (synchronous) invocations unless they are explicitly declared to be stateful.
IRI: http://purl.org/net/wf-motifs#WorkflowMotif
Motif that describes how a Data Operation Motif is realized (i.e., implemented) within a workflow. For example, a visualization step (Data Operation Motif) can be realized in different ways: via a stateful multi-step invocation, through a single Stateless Invocation (depending on the environmental constraints and nature of the services), or via a sub-workflow.
IRI: http://purl.org/net/wf-motifs#WorkflowOverload
Inter-workflow Motif used to characterize workflows that are used to operate over different input parameter types. An example is performing an analysis over a String input parameter, or performing it over the contents of a specified File. Overloading is a direct response to the heterogeneity of environments in which workflows are used.
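The class descriptions state, for most motifs, which broader motif they belong to, so the hierarchy can be sketched as a child-to-parent table. The snippet below is a non-authoritative sketch: it only includes relations worded explicitly in the descriptions of this document, plus two links marked as assumptions in the comments, and it does not replace the OWL encoding of the ontology. The `is_a` helper is hypothetical and simply walks up the table.

```python
# Child -> parent relations, using local names in the wfm namespace.
PARENT = {
    "WorkflowMotif": "Motif",
    "DataOperationMotif": "Motif",  # assumption: not worded explicitly in the text
    "InterWorkflowMotif": "WorkflowMotif",
    "IntraWorkflowMotif": "WorkflowMotif",
    "AtomicWorkflow": "InterWorkflowMotif",
    "CompositeWorkflow": "InterWorkflowMotif",
    "WorkflowOverload": "InterWorkflowMotif",
    "ComputationalStep": "IntraWorkflowMotif",
    "HumanInteractionStep": "IntraWorkflowMotif",
    "InternalMacro": "IntraWorkflowMotif",
    "StatelessInvocation": "IntraWorkflowMotif",
    "DataAnalysis": "DataOperationMotif",
    "DataCleaning": "DataOperationMotif",
    "DataPreparation": "DataOperationMotif",  # assumption: not worded explicitly
    "Combine": "DataPreparation",
    "Filter": "DataPreparation",
    "Group": "DataPreparation",
    "InputAugmentation": "DataPreparation",
    "OutputExtraction": "DataPreparation",
    "Sort": "DataPreparation",
    "Split": "DataPreparation",
}


def ancestors(motif: str) -> list:
    """Return the chain of broader motifs, nearest first."""
    chain = []
    while motif in PARENT:
        motif = PARENT[motif]
        chain.append(motif)
    return chain


def is_a(motif: str, broader: str) -> bool:
    """True if the motif is the broader motif itself or one of its descendants."""
    return motif == broader or broader in ancestors(motif)


print(ancestors("Sort"))
# prints ['DataPreparation', 'DataOperationMotif', 'Motif']
```

A table like this is enough to support coarse-grained search, for example treating a step annotated with Sort or Filter as a match when looking for Data Preparation steps.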
IRI: http://purl.org/net/wf-motifs#hasMotif
Object property used to annotate a step in the workflow, a group of steps, a subworkflow, or a workflow with a motif.
IRI: http://purl.org/net/wf-motifs#hasDataOperationMotif
Object property that connects a workflow step with its corresponding Data Operation Motif.
IRI: http://purl.org/net/wf-motifs#hasWorkflowMotif
Object property that relates a workflow to its corresponding Workflow Motif.
We would like to thank Silvio Peroni for developing the LODE framework, partially used for the cross-reference section of this document, and Raul Alcazar and Miguel Angel García Delgado for their technical support.
Inter-workflow Motif used to characterize the workflows that perform an atomic unit of functionality, which effectively requires no sub-workflow usage. Typically these workflows are designed to be included in other workflows. Atomic Workflows are the main mechanism of modularizing functionality within scientific workflows.