[provenance-challenge] Re: review of workflows for pc3
Satya Sahoo
sahoo.2 at wright.edu
Wed Nov 26 15:48:23 GMT 2008
Hi Luc,
We have just released the provenance algebra work as a Microsoft technical report:
http://research.microsoft.com/research/pubs/view.aspx?type=Technical%20Report&id=1587
The slides from Roger's presentation at the Provenance in Workflows workshop at Utah are available for download at:
http://knoesis.wright.edu/library/presentations/BargaProvenanceWorkshop.pptx
Hope this helps.
Best,
Satya
----- Original Message -----
From: Luc Moreau <L.Moreau at ecs.soton.ac.uk>
Date: Wednesday, November 26, 2008 9:47 am
Subject: Re: [provenance-challenge] Re: review of workflows for pc3
To: "provenance-challenge at ipaw.info" <provenance-challenge at ipaw.info>
Cc: Paul Groth <pgroth at ISI.EDU>, Satya Sahoo <sahoo.2 at wright.edu>
> Thanks Yogesh. Are there any slides or papers about
> Roger's work?
>
> From a challenge viewpoint, it would be useful to characterise the
> type of provenance we would ideally like to capture within the
> database. It seems that a layered model is particularly appropriate
> here: the activity-level description could constitute an OPM account,
> whereas a more fine-grained provenance (in the database sense) could
> form another account.
>
> Luc
>
>
> Yogesh Simmhan wrote:
> > Hi Luc,
> >
> > In the current system, we work around having to instrument the DB by
> > having individual SQL queries wrapped as C# activities. The activities
> > pass their input parameters through to the parameterized SQL queries.
> > Provenance is captured at the activity level. We also capture the
> > actual queries and query plans from MSSQL server, but don't integrate
> > them with the provenance yet.
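
As a rough illustration of the activity-level capture described in the paragraph above, here is a minimal sketch. It assumes Python with SQLite standing in for C# with MSSQL; the names run_sql_activity and provenance_log, and the detections table with its columns, are hypothetical and do not come from the Pan-STARRS code.

```python
import sqlite3
from datetime import datetime, timezone

# Hypothetical in-memory provenance store; a real system would persist this.
provenance_log = []


def run_sql_activity(conn, name, sql, params=()):
    """Run one parameterized SQL statement as a named activity and
    record provenance for the activity as a whole (not per row)."""
    started = datetime.now(timezone.utc).isoformat()
    cursor = conn.execute(sql, params)
    rows = cursor.fetchall()  # empty list for statements that return no rows
    provenance_log.append({
        "activity": name,          # which wrapped activity ran
        "sql": sql,                # the query text that was executed
        "params": tuple(params),   # inputs passed through to the query
        "rows_returned": len(rows),
        "started": started,
    })
    return rows


# Hypothetical three-step pipeline on an illustrative table.
conn = sqlite3.connect(":memory:")
run_sql_activity(conn, "create_table",
                 "CREATE TABLE detections (id INTEGER, mag REAL)")
run_sql_activity(conn, "load_row",
                 "INSERT INTO detections VALUES (?, ?)", (1, 21.5))
bright = run_sql_activity(conn, "select_bright",
                          "SELECT id FROM detections WHERE mag < ?", (22.0,))
```

Each entry in provenance_log describes one activity and its inputs. Nothing is recorded about which individual rows were read or changed, which is the gap a finer-grained, database-level account would have to fill.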
> >
> > Roger B. is working on a design and prototype for a more DB-centric
> > and semantic approach using materialized views and first-class
> > provenance operators. His presentation at the recent Provenance in
> > Workflows workshop at Utah talked about it
> > (http://wiki.esi.ac.uk/ProvenanceInWorkflows).
> >
> > Best,
> > --Yogesh
> >
> >
> > | -----Original Message-----
> > | From: provenance-challenge-ipaw-info-bounces at ipaw.info
> > | [mailto:provenance-challenge-ipaw-info-bounces at ipaw.info] On Behalf Of
> > | Luc Moreau
> > | Sent: Wednesday, November 26, 2008 4:02 AM
> > | To: provenance-challenge at ipaw.info; Paul Groth
> > | Cc: Satya Sahoo
> > | Subject: [provenance-challenge] Re: review of workflows for pc3
> > |
> > | Yogesh,
> > |
> > | There is however an interesting technical challenge (probably
> > | appropriate for a provenance challenge!).
> > | If we intend to export provenance information into the OPM format, we
> > | probably need to capture this information (in part) inside the
> > | database processing SQL queries.
> > | Are you already doing this in your system?
> > |
> > | This presents us with an opportunity to have contributions from members
> > | of the database community.
> > | Who is on this list at this moment? (James? Peter? Val? Jan? Natalia?)
> > |
> > | This will require us to structure the workflow in different "stages"
> > | where different technologies (including databases) are involved.
> > |
> > | Can you comment on this?
> > |
> > | Cheers,
> > | Luc
> > |
> > | Yogesh Simmhan wrote:
> > | > Hi Paul,
> > | >
> > | > Thanks for your comments. Regarding the ease of portability of the
> > | > Pan-STARRS Load/Merge workflow, all our activities are either SQL
> > | > queries and updates, or file system operations. While our current
> > | > executables are for MSSQL/C#, the SQL activities are simple enough to
> > | > port to any relational DBMS (MySQL, Apache Derby, ...) and programming
> > | > language. The main workflows operate on 3 relational tables with about
> > | > 50 columns.
> > | >
> > | > If selected, we can provide Java source code using Derby, in addition
> > | > to the C# version using MSSQL. We'll also provide textual descriptions
> > | > of the activities to enable them to be ported to other DB/languages.
> > | >
> > | > While the typical Pan-STARRS workflows operate on large datasets,
> > | > there is nothing that prevents the challenge workflows from operating
> > | > on a subset of those. Indeed, we use small CSV files and databases
> > | > (<1MB) for our own testing that we can provide for the challenge.
> > | >
> > | > Metadata about the telescope is not part of the normal workflow
> > | > pipeline, but we can consider incorporating supplementary annotations
> > | > about the telescope outside the scope of the workflow to see how the
> > | > provenance systems embed annotations in OPM and handle annotation
> > | > queries.
> > | >
> > | > Best,
> > | > --Yogesh
> > | >
> > | >
> > | > |
> > | > | pgroth at ISI.EDU wrote:
> > | > | > Hi,
> > | > | >
> > | > | > To kick-start our discussion about what workflows should be used for
> > | > | > the third provenance challenge, below are my thoughts on which would
> > | > | > be most appropriate and some questions to the authors. First, let me
> > | > | > say that I thought all the workflows would provide a good basis for an
> > | > | > interesting challenge, but to be decisive I've selected two.
> > | > | >
> > | > | > The two selection criteria I used were the complexity of the
> > | > | > structures within the workflows (i.e. did they have loops,
> > | > | > hierarchies, collections, etc.) and how easy it would be for other
> > | > | > teams to get the workflows up and running. I believe that, given the
> > | > | > complex control structures in some of these workflows, it would be
> > | > | > difficult to provide intermediary data sets, and thus teams would need
> > | > | > to execute the workflows themselves, unlike previous challenges where
> > | > | > dummy components could be used.
> > | > | >
> > | > | > 1. Build and test workflow
> > | > | > In terms of being able to execute the workflows, the software build
> > | > | > and testing workflow seems by far the easiest to get up and running.
> > | > | > Most systems have Ant and Java, and the build file can be easily
> > | > | > adapted to use Makefiles. Likewise, the Ant file has a multi-level
> > | > | > hierarchy, which is an interesting structure. The downside to the
> > | > | > workflow is its lack of complexity: it does not have collections or
> > | > | > nested data sets. However, I think the workflow would make for a
> > | > | > simple starting point for testing interoperability before moving on
> > | > | > to the more complex second workflow. Furthermore, by using an Ant
> > | > | > file the challenge does not become too workflow specific.
> > | > | >
> > | > | > 2. MSR-WSU Pan-STARRS workflow
> > | > | > My first choice for the second workflow is the MSR-WSU Pan-STARRS
> > | > | > workflow. It has a number of interesting workflow structures such as
> > | > | > if/else as well as loops over collections. I also like the idea of
> > | > | > having multiple levels of abstraction around database tables. It would
> > | > | > be interesting to ask for the provenance of individual items in a
> > | > | > table and retrieve all the modifications on each table, including
> > | > | > modifications to individual items. The explicit use of database tables
> > | > | > might also encourage the database community to get involved with the
> > | > | > challenge. What do others think on this issue?
> > | > | >
> > | > | > I'm wondering if the questions about external details from the
> > | > | > Neptune workflow (e.g. the types of sensor detail) could be
> > | > | > incorporated in the Pan-STARRS workflow? For example, the telescope
> > | > | > from which the data was collected?
> > | > | >
> > | > | > The major reservation I have with this workflow is how easy it would
> > | > | > be for others to execute. Given the Pan-STARRS workflow is designed to
> > | > | > work with large data, can the MSR team comment on whether small data
> > | > | > sets are available? Also, given that the implementation requires .Net,
> > | > | > how easily could this be run on non-Windows machines? Are there
> > | > | > non-Windows executables available?
> > | > | >
> > | > | > * myExperiment & Brain Imaging Workflows
> > | > | > If the Pan-STARRS workflow cannot be executed by different teams
> > | > | > easily, I think we should look at selecting one of these options. Can
> > | > | > these two teams comment on how easy it would be for others to use the
> > | > | > components within their workflows without invoking their particular
> > | > | > workflow enactment engines?
> > | > | >
> > | > | > I did like the dynamic nature of the Taverna workflow as it makes for
> > | > | > a good case for provenance (e.g. the abstracts returned from PubMed
> > | > | > will vary over time). Could we incorporate this into our selections?
> > | > | >
> > | > | > With that, what do you think?
> > | > | >
> > | > | > Thanks,
> > | > | > Paul
> > | > | >
> > | > | > --------------------------------------------------------------
> > | > | > Paul Groth, Ph.D.
> > | > | > Postdoctoral Research Associate
> > | > | > Information Sciences Institute
> > | > | > University of Southern California
> > | > | > pgroth at isi.edu
> > | > | > Tel: 310 448 8482 Fax: 310 822 0751
> > | > | > http://www.isi.edu/~pgroth/
> > | > | > http://thinklinks.wordpress.org
> > | > | >
> > | > | >
> > | > | >
> > | > | >
> > | > | >
> > | > |
> > | > |
> > | > | --
> > | > | Professor Luc Moreau                tel:   +44 23 8059 4487
> > | > | Electronics and Computer Science    email: l.moreau at ecs.soton.ac.uk
> > | > | University of Southampton           www:   www.ecs.soton.ac.uk/~lavm
> > | > | Southampton SO17 1BJ                skype: prof.luc.moreau
> > | > | United Kingdom                      fring: Luc
> > | > |
> > | > |
> > | > |
> > | >
> > | >
> > | >
> > |
> > |
> > |
> > |
> > |
> >
> >
> >
>
>
>
>
>