[provenance-challenge] Re: review of workflows for pc3

Luc Moreau L.Moreau at ecs.soton.ac.uk
Tue Nov 25 12:13:46 GMT 2008


Hi Paul,
Thanks for this.  The archive shows that quite a few messages have not
gone through:
http://www.ipaw.info/mail/archive.php/

(In particular, the kick-start of the review activity.)

About your comment: thanks for selecting the ant workflow, but I'd like
to correct one point.
It does contain collections!  xjc generates multiple Java files from an
xsd file, and javac compiles several files into many classes.  The fact
that we didn't have to enumerate them all in the ant file is the proof
that the workflow operates at the level of collections!
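
To make the point concrete, here is a minimal sketch of such an ant
fragment. It is not the actual challenge build file: the project name,
target names, directory names and schema file name are all hypothetical.
It only illustrates that neither step names an individual file, so each
step works on whatever collection the previous one produced.

```xml
<project name="collections-sketch" default="compile">
  <!-- Hypothetical locations; the real build file may differ. -->
  <property name="schema"  value="schema.xsd"/>
  <property name="gen.dir" value="gen"/>
  <property name="out.dir" value="classes"/>

  <target name="generate">
    <mkdir dir="${gen.dir}"/>
    <!-- xjc writes as many .java files as the schema requires into gen.dir -->
    <exec executable="xjc" failonerror="true">
      <arg value="-d"/>
      <arg value="${gen.dir}"/>
      <arg value="${schema}"/>
    </exec>
  </target>

  <target name="compile" depends="generate">
    <mkdir dir="${out.dir}"/>
    <!-- javac compiles every source file found under gen.dir; none is listed -->
    <javac srcdir="${gen.dir}" destdir="${out.dir}"/>
  </target>
</project>
```

In a sketch like this, the set of generated sources and the set of
compiled classes are never spelled out; they are implied by the
directories passed from one step to the next.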

I will post this on the mailing list once it has been fixed.

Luc


pgroth at ISI.EDU wrote:
> Hi,
>
> To kick start our discussion about what workflows should be used for the third
> provenance challenge, below are my thoughts on which would be most appropriate
> and some questions to the authors. First, let me say that I thought all the
> workflows would provide a good basis for an interesting challenge, but to be
> decisive I've selected two.
>
> The two selection criteria I used were the complexity of the structures within
> the workflows (i.e. did it have loops, hierarchies, collections, etc.) and how
> easy it would be for other teams to get the workflows up and running. I believe
> given the complex control structures in some of these workflows that it would
> be difficult to provide intermediary data sets and thus teams would need to
> execute the workflows themselves unlike previous challenges where dummy
> components could be used.
>
> 1. Build and test workflow
> In terms of being able to execute the workflows, the Software build and testing
> workflow seems by far the easiest to get up and running. Most systems have ant
> and java and the build file can be easily adapted to use Makefiles. Likewise,
> the ant file has a multi-level hierarchy, which is an interesting structure.
> The downside to the workflow is its lack of complexity: it does not have
> collections or nested data sets. However, I think the workflow would make for a
> simple starting point for testing interoperability before moving on to the more
> complex second workflow. Furthermore, by using an ant file the challenge does
> not become too workflow specific.
>
> 2. MSR-WSU Pan-STARRS workflow
> My first choice for the second workflow is the MSR-WSU Pan-STARRS workflow. It
> has a number of interesting workflow structures such as if/else as well as
> loops over collections. I also like the idea of having multiple levels of
> abstraction around database tables. It would be interesting to ask for the
> provenance of an individual item in a table and retrieve all the
> modifications on each table
> including modifications to individual items. The explicit use of database
> tables might also encourage the database community to get involved with the
> challenge. What do others think on this issue?
>
> I'm wondering if the questions about external details from the Neptune workflow
> (e.g. the types of sensor detail) could be incorporated in the Pan-STARRS
> workflow? For example, the telescope from which the data was collected?
>
> The major reservation I have with this workflow is how easy it would be for
> others to execute. Given the Pan-STARRS workflow is designed to work with large
> data, can the MSR team comment on whether small data sets are available? Also,
> given that the implementation requires .NET, how easily could this be run on
> non-Windows machines? Are there non-Windows executables available?
>
> * myExperiment & Brain Imaging Workflows
> If the Pan-STARRS workflow cannot be executed easily by different teams, I think
> we should look at selecting one of these options. Can these two teams comment
> on how easy it would be for others to use the components within their workflows
> without invoking their particular workflow enactment engines?
>
> I did like the dynamic nature of the Taverna workflow as it makes for a good
> case for provenance (e.g. the abstracts returned from PubMed will vary over
> time). Could we incorporate this into our selections?
>
> With that, what do you think?
>
> Thanks,
> Paul
>
> --------------------------------------------------------------
> Paul Groth, Ph.D.
> Postdoctoral Research Associate
> Information Sciences Institute
> University of Southern California
> pgroth at isi.edu
> Tel:  310 448 8482  Fax: 310 822 0751
> http://www.isi.edu/~pgroth/
> http://thinklinks.wordpress.org


-- 
Professor Luc Moreau               tel:   +44 23 8059 4487              
Electronics and Computer Science   email: l.moreau at ecs.soton.ac.uk
University of Southampton          www:   www.ecs.soton.ac.uk/~lavm
Southampton SO17 1BJ               skype: prof.luc.moreau
United Kingdom                     fring: Luc