[provenance-challenge] Re: review of workflows for pc3

Satya Sahoo sahoo.2 at wright.edu
Wed Nov 26 15:48:23 GMT 2008


Hi Luc,
We have just released the provenance algebra work as a Microsoft technical report:
http://research.microsoft.com/research/pubs/view.aspx?type=Technical%20Report&id=1587
 
The slides from Roger's presentation at the provenance in workflows workshop at Utah are available for download at: http://knoesis.wright.edu/library/presentations/BargaProvenanceWorkshop.pptx
 
Hope this helps.
 
Best,
Satya

----- Original Message -----
From: Luc Moreau <L.Moreau at ecs.soton.ac.uk>
Date: Wednesday, November 26, 2008 9:47 am
Subject: Re: [provenance-challenge] Re: review of workflows for pc3
To: "provenance-challenge at ipaw.info" <provenance-challenge at ipaw.info>
Cc: Paul Groth <pgroth at ISI.EDU>, Satya Sahoo <sahoo.2 at wright.edu>

> Thanks Yogesh. Are there any slides or papers about Roger's work?
> 
> From a challenge viewpoint, it would be useful to characterise the
> type of provenance we would ideally like to capture within the
> database. It seems that a layered model is particularly appropriate
> here: the activity-level description could constitute an OPM account,
> whereas a more fine-grained provenance (in the database sense) could
> form another account.
> 
> Luc
> 
> 
> Yogesh Simmhan wrote:
> > Hi Luc,
> >
> > In the current system, we work around having to instrument the DB by
> > having individual SQL queries wrapped as C# activities. The activities
> > pass through the input params to the parameterized SQL queries.
> > Provenance is captured at the activity level. We also capture the
> > actual queries and query plans from MSSQL server, but don't integrate
> > it with the provenance yet.
> >
> > Roger B. is working on a design and prototype for a more DB centric
> > and semantic approach using materialized views and first class
> > provenance operators. His presentation at the recent provenance in
> > workflows workshop at Utah talked about it
> > (http://wiki.esi.ac.uk/ProvenanceInWorkflows).
> >
> > Best,
> > --Yogesh
> >
> >
> > | -----Original Message-----
> > | From: provenance-challenge-ipaw-info-bounces at ipaw.info
> > | [mailto:provenance-challenge-ipaw-info-bounces at ipaw.info] On Behalf Of
> > | Luc Moreau
> > | Sent: Wednesday, November 26, 2008 4:02 AM
> > | To: provenance-challenge at ipaw.info; Paul Groth
> > | Cc: Satya Sahoo
> > | Subject: [provenance-challenge] Re: review of workflows for pc3
> > |
> > | Yogesh,
> > |
> > | There is however an interesting technical challenge (probably
> > | appropriate for a provenance challenge!).
> > | If we intend to export provenance information into the OPM format, we
> > | probably need to capture this information (in part) inside the
> > | database processing SQL queries.
> > | Are you already doing this in your system?
> > |
> > | This presents us with an opportunity to have contributions from members
> > | of the database community.
> > | Who is on this list at this moment? (James? Peter? Val? Jan? Natalia?)
> > |
> > | This will require us to structure the workflow in different "stages"
> > | where different technologies (including databases) are involved.
> > |
> > | Can you comment on this?
> > |
> > | Cheers,
> > | Luc
> > |
> > | Yogesh Simmhan wrote:
> > | > Hi Paul,
> > | >
> > | > Thanks for your comments. Regarding the ease of portability of the
> > | > Pan-STARRS Load/Merge workflow, all our activities are either SQL
> > | > queries and updates, or file system operations. While our current
> > | > executables are for MSSQL/C#, the SQL activities are simple enough to
> > | > port to any relational DBMS (MySQL, Apache Derby, ...) and programming
> > | > language. The main workflows operate on 3 relational tables with about
> > | > 50 columns.
> > | >
> > | > If selected, we can provide Java source code using Derby, in addition
> > | > to the C# version using MSSQL. We'll also provide textual descriptions
> > | > of the activities to enable them to be ported to other DB/languages.
> > | >
> > | > While the typical Pan-STARRS workflows operate on large datasets,
> > | > there is nothing that prevents the challenge workflows from operating
> > | > on a subset of those. Indeed, we use small CSV files and databases
> > | > (<1MB) for our own testing that we can provide for the challenge.
> > | >
> > | > Metadata about the telescope is not part of the normal workflow
> > | > pipeline, but we can consider incorporating supplementary annotations
> > | > about the telescope outside the scope of the workflow to see how the
> > | > provenance systems embed annotations in OPM and handle annotation
> > | > queries.
> > | >
> > | > Best,
> > | > --Yogesh
> > | >
> > | >
> > | > |
> > | > | pgroth at ISI.EDU wrote:
> > | > | > Hi,
> > | > | >
> > | > | > To kick start our discussion about what workflows should be used for
> > | > | > the third provenance challenge, below are my thoughts on which would
> > | > | > be most appropriate and some questions to the authors. First, let me
> > | > | > say that I thought all the workflows would provide a good basis for
> > | > | > an interesting challenge, but to be decisive I've selected two.
> > | > | >
> > | > | > The two selection criteria I used were the complexity of the
> > | > | > structures within the workflows (i.e. did it have loops, hierarchies,
> > | > | > collections, etc.) and how easy it would be for other teams to get
> > | > | > the workflows up and running. I believe given the complex control
> > | > | > structures in some of these workflows that it would be difficult to
> > | > | > provide intermediary data sets and thus teams would need to execute
> > | > | > the workflows themselves unlike previous challenges where dummy
> > | > | > components could be used.
> > | > | >
> > | > | > 1. Build and test workflow
> > | > | > In terms of being able to execute the workflows, the Software build
> > | > | > and testing workflow seems by far the easiest to get up and running.
> > | > | > Most systems have ant and java and the build file can be easily
> > | > | > adapted to use Makefiles. Likewise, the ant file has a multi-level
> > | > | > hierarchy, which is an interesting structure.
> > | > | > The downside to the workflow is its lack of complexity: it does not
> > | > | > have collections or nested data sets. However, I think the workflow
> > | > | > would make for a simple starting point for testing interoperability
> > | > | > before moving on to the more complex second workflow. Furthermore, by
> > | > | > using an ant file the challenge does not become too workflow
> > | > | > specific.
> > | > | >
> > | > | > 2. MSR-WSU Pan-STARRS workflow
> > | > | > My first choice for the second workflow is the MSR-WSU Pan-STARRS
> > | > | > workflow. It has a number of interesting workflow structures such as
> > | > | > if/else as well as loops over collections. I also like the idea of
> > | > | > having multiple levels of abstraction around database tables. It
> > | > | > would be interesting to ask for the provenance of an individual item
> > | > | > in a table and retrieve all the modifications on each table,
> > | > | > including modifications to individual items. The explicit use of
> > | > | > database tables might also encourage the database community to get
> > | > | > involved with the challenge. What do others think on this issue?
> > | > | >
> > | > | > I'm wondering if the questions about external details from the
> > | > | > Neptune workflow (e.g. the types of sensor detail) could be
> > | > | > incorporated in the Pan-STARRS workflow? For example, the telescope
> > | > | > which the data was collected from?
> > | > | >
> > | > | > The major reservation I have with this workflow is how easy it would
> > | > | > be for others to execute. Given the Pan-STARRS workflow is designed
> > | > | > to work with large data, can the MSR team comment on whether small
> > | > | > data sets are available? Also, given that the implementation requires
> > | > | > .Net, how easily could this be run on non-Windows machines? Are there
> > | > | > non-Windows executables available?
> > | > | >
> > | > | > * myExperiment & Brain Imaging Workflows
> > | > | > If the Pan-STARRS workflow cannot be executed by different teams
> > | > | > easily, I think we should look at selecting one of these options. Can
> > | > | > these two teams comment on how easy it would be for others to use the
> > | > | > components within their workflows without invoking their particular
> > | > | > workflow enactment engines?
> > | > | >
> > | > | > I did like the dynamic nature of the Taverna workflow as it makes for
> > | > | > a good case for provenance (e.g. the abstracts returned from PubMed
> > | > | > will vary over time). Could we incorporate this into our selections?
> > | > | >
> > | > | > With that, what do you think?
> > | > | >
> > | > | > Thanks,
> > | > | > Paul
> > | > | >
> > | > | > --------------------------------------------------------------
> > | > | > Paul Groth, Ph.D.
> > | > | > Postdoctoral Research Associate
> > | > | > Information Sciences Institute
> > | > | > University of Southern California
> > | > | > pgroth at isi.edu
> > | > | > Tel:  310 448 8482  Fax: 310 822 0751
> > | > | > http://www.isi.edu/~pgroth/
> > | > | > http://thinklinks.wordpress.org
> > | > | >
> > | > | >
> > | > | >
> > | > | >
> > | > | >
> > | > |
> > | > |
> > | > | --
> > | > | Professor Luc Moreau               tel:   +44 23 8059 4487
> > | > | Electronics and Computer Science   email: l.moreau at ecs.soton.ac.uk
> > | > | University of Southampton          www:   www.ecs.soton.ac.uk/~lavm
> > | > | Southampton SO17 1BJ               skype: prof.luc.moreau
> > | > | United Kingdom                     fring: Luc
> > | > |
> > | > |
> > | > |
> > | >
> > | >
> > | >
> > |
> > |
> > |
> > |
> > |
> >
> >
> >   
> 
> 
> 