[provenance-challenge] Re: review of workflows for pc3

pgroth at ISI.EDU
Tue Nov 25 14:36:51 GMT 2008


Hi Yogesh,

Thanks for the extra information. 

All: Apologies for the delay in processing messages. Hopefully, the list is now
fixed.

Thanks,
Paul

Quoting Yogesh Simmhan <yoges at microsoft.com>:

> Hi Paul,
> 
> Thanks for your comments. Regarding the ease of portability of the Pan-STARRS
> Load/Merge workflow, all our activities are either SQL queries and updates,
> or file system operations. While our current executables are for MSSQL/C#,
> the SQL activities are simple enough to port to any relational DBMS (MySQL,
> Apache Derby, ...) and programming language. The main workflows operate on 3
> relational tables with about 50 columns.
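> To give a flavour of what such an activity looks like, here is a minimal,
> hypothetical sketch using Python's built-in sqlite3 module (the real
> activities are MSSQL/C#; the staging/master tables and the object_id, ra,
> dec columns below are invented for illustration):

```python
import sqlite3

# Hypothetical stand-ins for Load/Merge activities: load a batch of rows
# into a staging table, then merge them into a master table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE staging (object_id INTEGER, ra REAL, dec REAL)")
conn.execute("CREATE TABLE master (object_id INTEGER PRIMARY KEY, ra REAL, dec REAL)")

# "Load" activity: insert a small batch of rows (in practice read from CSV).
rows = [(1, 10.5, -3.2), (2, 11.0, -3.5)]
conn.executemany("INSERT INTO staging VALUES (?, ?, ?)", rows)

# "Merge" activity: upsert the staged rows into the master table.
conn.execute("INSERT OR REPLACE INTO master SELECT object_id, ra, dec FROM staging")
conn.commit()

merged = conn.execute("SELECT COUNT(*) FROM master").fetchone()[0]
print(merged)  # 2
```

> The point is that each step is plain SQL plus a little driver code, which is
> why porting to another relational DBMS and language is straightforward.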
> 
> If selected, we can provide Java source code using Derby, in addition to the
> C# version using MSSQL. We'll also provide textual descriptions of the
> activities to enable them to be ported to other DB/languages.
> 
> While the typical Pan-STARRS workflows operate on large datasets, there is
> nothing that prevents the challenge workflows from operating on a subset of
> those. Indeed, we use small CSV files and databases (<1MB) for our own
> testing that we can provide for the challenge.
> 
> Metadata about the telescope is not part of the normal workflow pipeline, but
> we can consider incorporating supplementary annotations about the telescope
> outside the scope of the workflow to see how the provenance systems embed
> annotations in OPM and handle annotation queries.
> 
> Best,
> --Yogesh
> 
> 
> |
> | pgroth at ISI.EDU wrote:
> | > Hi,
> | >
> | > To kick start our discussion about what workflows should be used for the third
> | > provenance challenge, below are my thoughts on which would be most appropriate
> | > and some questions to the authors. First, let me say that I thought all the
> | > workflows would provide a good basis for an interesting challenge, but to be
> | > decisive I've selected two.
> | >
> | > The two selection criteria I used were the complexity of the structures within
> | > the workflows (i.e. did it have loops, hierarchies, collections, etc.) and how
> | > easy it would be for other teams to get the workflows up and running. I believe,
> | > given the complex control structures in some of these workflows, that it would
> | > be difficult to provide intermediary data sets, and thus teams would need to
> | > execute the workflows themselves, unlike previous challenges where dummy
> | > components could be used.
> | >
> | > 1. Build and test workflow
> | > In terms of being able to execute the workflows, the software build and testing
> | > workflow seems by far the easiest to get up and running. Most systems have Ant
> | > and Java, and the build file can be easily adapted to use Makefiles. Likewise,
> | > the Ant file has a multi-level hierarchy, which is an interesting structure.
> | > The downside to the workflow is its lack of complexity: it does not have
> | > collections or nested data sets. However, I think the workflow would make for a
> | > simple starting point for testing interoperability before moving on to the more
> | > complex second workflow. Furthermore, by using an Ant file the challenge does
> | > not become too workflow-specific.
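> | > (For concreteness, the multi-level target hierarchy I mean might look like
> | > this hypothetical sketch of an Ant build file; the target and directory
> | > names are invented:)

```xml
<project name="example" default="test">
  <!-- Hypothetical multi-level hierarchy: test depends on build,
       which depends on compile, which depends on init. -->
  <target name="init">
    <mkdir dir="build/classes"/>
  </target>
  <target name="compile" depends="init">
    <javac srcdir="src" destdir="build/classes"/>
  </target>
  <target name="build" depends="compile">
    <jar destfile="build/example.jar" basedir="build/classes"/>
  </target>
  <target name="test" depends="build">
    <echo message="running tests"/>
  </target>
</project>
```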
> | >
> | > 2. MSR-WSU Pan-STARRS workflow
> | > My first choice for the second workflow is the MSR-WSU Pan-STARRS workflow. It
> | > has a number of interesting workflow structures, such as if/else as well as
> | > loops over collections. I also like the idea of having multiple levels of
> | > abstraction around database tables. It would be interesting to ask for the
> | > provenance of an individual item in a table and retrieve all the modifications
> | > on each table, including modifications to individual items. The explicit use of
> | > database tables might also encourage the database community to get involved
> | > with the challenge. What do others think on this issue?
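> | > As a sketch of the kind of query I have in mind (hypothetical schema,
> | > with Python's built-in sqlite3 standing in for the real DBMS; the
> | > objects/prov_log tables and update_mag helper are invented): every update
> | > to a table also appends a row to a provenance log, and the provenance of
> | > an individual item is then a simple filter over that log.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE objects (object_id INTEGER PRIMARY KEY, mag REAL)")
# Hypothetical provenance log: one row per modification of an individual item.
conn.execute("""CREATE TABLE prov_log (
    object_id INTEGER, activity TEXT, ts TEXT DEFAULT CURRENT_TIMESTAMP)""")

def update_mag(object_id, mag, activity):
    """Apply a modification and record it in the provenance log."""
    conn.execute("INSERT OR REPLACE INTO objects VALUES (?, ?)", (object_id, mag))
    conn.execute("INSERT INTO prov_log (object_id, activity) VALUES (?, ?)",
                 (object_id, activity))

update_mag(42, 17.1, "load")
update_mag(42, 16.9, "merge-correction")

# Provenance query: all modifications that touched item 42, in order.
activities = [a for (a,) in conn.execute(
    "SELECT activity FROM prov_log WHERE object_id = ? ORDER BY rowid",
    (42,)).fetchall()]
print(activities)  # ['load', 'merge-correction']
```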
> | >
> | > I'm wondering if the questions about external details from the Neptune
> | > workflow (e.g. the types of sensor detail) could be incorporated in the
> | > Pan-STARRS workflow? For example, the telescope from which the data was
> | > collected?
> | >
> | > The major reservation I have with this workflow is how easy it would be for
> | > others to execute. Given that the Pan-STARRS workflow is designed to work with
> | > large data, can the MSR team comment on whether small data sets are available?
> | > Also, given that the implementation requires .NET, how easily could this be
> | > run on non-Windows machines? Are there non-Windows executables available?
> | >
> | > * myExperiment & Brain Imaging Workflows
> | > If the Pan-STARRS workflow cannot be executed by different teams easily, I
> | > think we should look at selecting one of these options. Can these two teams
> | > comment on how easy it would be for others to use the components within their
> | > workflows without invoking their particular workflow enactment engines?
> | >
> | > I did like the dynamic nature of the Taverna workflow, as it makes a good
> | > case for provenance (e.g. the abstracts returned from PubMed will vary over
> | > time). Could we incorporate this into our selections?
> | >
> | > With that, what do you think?
> | >
> | > Thanks,
> | > Paul
> | >
> | > --------------------------------------------------------------
> | > Paul Groth, Ph.D.
> | > Postdoctoral Research Associate
> | > Information Sciences Institute
> | > University of Southern California
> | > pgroth at isi.edu
> | > Tel:  310 448 8482  Fax: 310 822 0751
> | > http://www.isi.edu/~pgroth/
> | > http://thinklinks.wordpress.org
> | >
> |
> |
> | --
> | Professor Luc Moreau               tel:   +44 23 8059 4487
> | Electronics and Computer Science   email: l.moreau at ecs.soton.ac.uk
> | University of Southampton          www:   www.ecs.soton.ac.uk/~lavm
> | Southampton SO17 1BJ               skype: prof.luc.moreau
> | United Kingdom                     fring: Luc
> |
> |
> |
> 
> 




More information about the Provenance-challenge-ipaw-info mailing list