[provenance-challenge] Re: review of workflows for pc3
Luc Moreau
L.Moreau at ecs.soton.ac.uk
Wed Nov 26 22:54:13 GMT 2008
yes, even before the credit crunch, it tasted like an expensive hobby ;-)
Luc
Roger Barga wrote:
> This was no doubt one of the finest whiskeys I have ever tasted. However, the saying "champagne tastes on a beer budget" comes to mind, so I will enjoy my 10 year Ardbeg and Ardbeg Nam Best.
>
> roger
> ________________________________________
> From: provenance-challenge-ipaw-info-bounces at ipaw.info [provenance-challenge-ipaw-info-bounces at ipaw.info] On Behalf Of Luc Moreau [L.Moreau at ecs.soton.ac.uk]
> Sent: Wednesday, November 26, 2008 2:43 PM
> To: provenance-challenge at ipaw.info
> Cc: Satya Sahoo; Paul Groth
> Subject: [provenance-challenge] Re: review of workflows for pc3
>
> Roger Barga wrote:
>
>> PS - the last time we met you mentioned a single malt with
>> 'provenance' in the name. Was that an Ardbeg Provenance by chance?
>> If so, I had a chance to try it on a recent trip to Edinburgh -
>> absolutely wonderful.
>>
>>
>
> It was! (and for the couple remaining glasses, no doubt will be).
>
> Trying to search for the provenance of 'provenance whisky', I found two
> interesting pages:
>
> http://www.thewhiskyexchange.com/P-6279.aspx
> http://www.whiskymag.com/whisky/brand/ardbeg/whisky615.html
>
> It is really great!
> Luc
>
>
>> ________________________________________
>> From: provenance-challenge-ipaw-info-bounces at ipaw.info
>> [provenance-challenge-ipaw-info-bounces at ipaw.info] On Behalf Of Luc
>> Moreau [L.Moreau at ecs.soton.ac.uk]
>> Sent: Wednesday, November 26, 2008 6:44 AM
>> To: provenance-challenge at ipaw.info
>> Cc: Satya Sahoo; Paul Groth
>> Subject: [provenance-challenge] Re: review of workflows for pc3
>>
>> Thanks Yogesh. Are there any slides or papers about Roger's work?
>>
>> From a challenge viewpoint, it would be useful to characterise the
>> type of provenance we would ideally like to capture within the
>> database. A layered model seems particularly appropriate here: the
>> activity-level description could constitute one OPM account, whereas
>> a more fine-grained provenance (in the database sense) could form
>> another account.
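
As a rough illustration of the layered idea above, the sketch below tags each OPM-style edge with the account it belongs to, so the same load step can be described coarsely (activity level) or finely (row level). The node names (`LoadActivity`, `DetectionTable`, `batch.csv`) and the two account labels are hypothetical, chosen only for the example; they are not taken from the Pan-STARRS workflow or from any OPM tooling.

```python
from collections import namedtuple

# An OPM-style edge points from an effect to its cause and is tagged
# with the account (view) it belongs to. All names are illustrative.
Edge = namedtuple("Edge", ["effect", "relation", "cause", "account"])

edges = [
    # Coarse account: one workflow activity reads a file and fills a table.
    Edge("LoadActivity", "used", "batch.csv", "activity"),
    Edge("DetectionTable", "wasGeneratedBy", "LoadActivity", "activity"),
    # Fine-grained account: the same step described row by row.
    Edge("InsertRow1", "used", "batch.csv:line1", "database"),
    Edge("DetectionTable:row1", "wasGeneratedBy", "InsertRow1", "database"),
    Edge("InsertRow2", "used", "batch.csv:line2", "database"),
    Edge("DetectionTable:row2", "wasGeneratedBy", "InsertRow2", "database"),
]


def view(edges, account):
    """Return only the edges that belong to one account."""
    return [e for e in edges if e.account == account]


def lineage(edges, account, node):
    """Collect every ancestor of `node`, following edges within one account."""
    seen = []
    frontier = [node]
    while frontier:
        current = frontier.pop()
        for e in edges:
            if e.account == account and e.effect == current and e.cause not in seen:
                seen.append(e.cause)
                frontier.append(e.cause)
    return seen


coarse = lineage(edges, "activity", "DetectionTable")
fine = lineage(edges, "database", "DetectionTable:row1")
```

Here `coarse` answers "which activity and input produced the table?", while `fine` answers the same question for a single row; both views coexist in one graph because each edge carries its account label.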
>>
>> Luc
>>
>>
>> Yogesh Simmhan wrote:
>>
>>> Hi Luc,
>>>
>>> In the current system, we work around having to instrument the DB by
>>> having individual SQL queries wrapped as C# activities. The activities
>>> pass through the input params to the parameterized SQL queries.
>>> Provenance is captured at the activity level. We also capture the
>>> actual queries and query plans from MSSQL server, but don't integrate
>>> them with the provenance yet.
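
A minimal sketch of the wrapping approach described above, written in Python with SQLite rather than the C#/MSSQL stack the thread mentions. The function `run_sql_activity`, the `provenance_log` list, and the `detections` table are assumptions made up for illustration; the thread does not describe the real activity interface or provenance store.

```python
import sqlite3
from datetime import datetime, timezone

# Hypothetical in-memory provenance store; the real one is not described.
provenance_log = []


def run_sql_activity(conn, name, sql, params):
    """Run one parameterized query as an activity and record its provenance.

    Provenance is recorded per activity (name, query text, inputs, result
    size), not per row touched inside the database.
    """
    started = datetime.now(timezone.utc).isoformat()
    rows = conn.execute(sql, params).fetchall()
    provenance_log.append({
        "activity": name,
        "sql": sql,
        "inputs": dict(params),
        "row_count": len(rows),
        "started": started,
    })
    return rows


# Illustrative schema and data, not the Pan-STARRS tables.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE detections (id INTEGER PRIMARY KEY, frame TEXT, flux REAL)")
conn.executemany(
    "INSERT INTO detections (frame, flux) VALUES (?, ?)",
    [("f1", 1.5), ("f1", 2.5), ("f2", 4.0)],
)

rows = run_sql_activity(
    conn,
    "SelectFrameDetections",
    "SELECT id, flux FROM detections WHERE frame = :frame ORDER BY id",
    {"frame": "f1"},
)
```

The database itself stays uninstrumented: everything the log knows comes from what passes through the wrapper, which is why row-level lineage is not available at this layer.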
>>
>>> Roger B. is working on a design and prototype for a more DB centric
>>> and semantic approach using materialized views and first class
>>> provenance operators. His presentation at the recent provenance in
>>> workflows workshop at Utah talked about it
>>> (http://wiki.esi.ac.uk/ProvenanceInWorkflows).
>>
>>> Best,
>>> --Yogesh
>>>
>>>
>>> | -----Original Message-----
>>> | From: provenance-challenge-ipaw-info-bounces at ipaw.info
>>> | [mailto:provenance-challenge-ipaw-info-bounces at ipaw.info] On Behalf Of
>>> | Luc Moreau
>>> | Sent: Wednesday, November 26, 2008 4:02 AM
>>> | To: provenance-challenge at ipaw.info; Paul Groth
>>> | Cc: Satya Sahoo
>>> | Subject: [provenance-challenge] Re: review of workflows for pc3
>>> |
>>> | Yogesh,
>>> |
>>> | There is however an interesting technical challenge (probably
>>> | appropriate for a provenance challenge!).
>>> | If we intend to export provenance information into the OPM format, we
>>> | probably need to capture this information (in part) inside the
>>> | database processing SQL queries.
>>> | Are you already doing this in your system?
>>> |
>>> | This presents us with an opportunity to have contributions from
>>> | members of the database community.
>>> | Who is on this list at this moment? (James? Peter? Val? Jan?
>>> | Natalia?)
>>
>>> |
>>> | This will require us to structure the workflow in different "stages"
>>> | where different technologies (including databases)
>>> | are involved.
>>> |
>>> | Can you comment on this?
>>> |
>>> | Cheers,
>>> | Luc
>>> |
>>> | Yogesh Simmhan wrote:
>>> | > Hi Paul,
>>> | >
>>> | > Thanks for your comments. Regarding the ease of portability of the
>>> | > Pan-STARRS Load/Merge workflow, all our activities are either SQL
>>> | > queries and updates, or file system operations. While our current
>>> | > executables are for MSSQL/C#, the SQL activities are simple enough to
>>> | > port to any relational DBMS (MySQL, Apache Derby, ...) and programming
>>> | > language. The main workflows operate on 3 relational tables with about
>>> | > 50 columns.
>>> | >
>>> | > If selected, we can provide Java source code using Derby, in addition
>>> | > to the C# version using MSSQL. We'll also provide textual descriptions
>>> | > of the activities to enable them to be ported to other DB/languages.
>>> | >
>>> | > While the typical Pan-STARRS workflows operate on large datasets,
>>> | > there is nothing that prevents the challenge workflows from operating
>>> | > on a subset of those. Indeed, we use small CSV files and databases
>>> | > (<1MB) for our own testing that we can provide for the challenge.
>>> | >
>>> | > Metadata about the telescope is not part of the normal workflow
>>> | > pipeline, but we can consider incorporating supplementary annotations
>>> | > about the telescope outside the scope of the workflow to see how the
>>> | > provenance systems embed annotations in OPM and handle annotation
>>> | > queries.
>>> | >
>>> | > Best,
>>> | > --Yogesh
>>> | >
>>> | >
>>> | > |
>>> | > | pgroth at ISI.EDU wrote:
>>> | > | > Hi,
>>> | > | >
>>> | > | > To kick-start our discussion about what workflows should be
>>> | > | > used for the third provenance challenge, below are my thoughts
>>> | > | > on which would be most appropriate and some questions to the
>>> | > | > authors. First, let me say that I thought all the workflows
>>> | > | > would provide a good basis for an interesting challenge, but to
>>> | > | > be decisive I've selected two.
>>> | > | >
>>> | > | > The two selection criteria I used were the complexity of the
>>> | > | > structures within the workflows (i.e. did it have loops,
>>> | > | > hierarchies, collections, etc.) and how easy it would be for
>>> | > | > other teams to get the workflows up and running. I believe
>>> | > | > given the complex control structures in some of these workflows
>>> | > | > that it would be difficult to provide intermediary data sets
>>> | > | > and thus teams would need to execute the workflows themselves
>>> | > | > unlike previous challenges where dummy components could be used.
>>> | > | >
>>> | > | > 1. Build and test workflow
>>> | > | > In terms of being able to execute the workflows, the Software
>>> | > | > build and testing workflow seems by far the easiest to get up
>>> | > | > and running. Most systems have Ant and Java, and the build file
>>> | > | > can be easily adapted to use Makefiles. Likewise, the Ant file
>>> | > | > has a multi-level hierarchy, which is an interesting structure.
>>> | > | > The downside to the workflow is its lack of complexity: it does
>>> | > | > not have collections or nested data sets. However, I think the
>>> | > | > workflow would make for a simple starting point for testing
>>> | > | > interoperability before moving on to the more complex second
>>> | > | > workflow. Furthermore, by using an Ant file the challenge does
>>> | > | > not become too workflow specific.
>>> | > | >
>>> | > | > 2. MSR-WSU Pan-STARRS workflow
>>> | > | > My first choice for the second workflow is the MSR-WSU
>>> | > | > Pan-STARRS workflow. It has a number of interesting workflow
>>> | > | > structures such as if/else as well as loops over collections. I
>>> | > | > also like the idea of having multiple levels of abstraction
>>> | > | > around database tables. It would be interesting to ask for the
>>> | > | > provenance of an individual item in a table and retrieve all
>>> | > | > the modifications on each table including modifications to
>>> | > | > individual items. The explicit use of database tables might
>>> | > | > also encourage the database community to get involved with the
>>> | > | > challenge. What do others think on this issue?
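
One possible way to answer the two kinds of query raised above (all modifications to a table, and modifications to one item) is a trigger-maintained history table. The sketch below uses SQLite from Python purely as an illustration; the `objects` and `objects_history` tables and both helper functions are hypothetical and do not reflect the Pan-STARRS schema or any system discussed in this thread.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE objects (id INTEGER PRIMARY KEY, magnitude REAL);

-- One history row per modification of the objects table.
CREATE TABLE objects_history (
    seq INTEGER PRIMARY KEY AUTOINCREMENT,
    object_id INTEGER,
    operation TEXT,
    old_magnitude REAL,
    new_magnitude REAL
);

CREATE TRIGGER objects_ins AFTER INSERT ON objects BEGIN
    INSERT INTO objects_history (object_id, operation, old_magnitude, new_magnitude)
    VALUES (NEW.id, 'INSERT', NULL, NEW.magnitude);
END;

CREATE TRIGGER objects_upd AFTER UPDATE ON objects BEGIN
    INSERT INTO objects_history (object_id, operation, old_magnitude, new_magnitude)
    VALUES (NEW.id, 'UPDATE', OLD.magnitude, NEW.magnitude);
END;
""")

conn.execute("INSERT INTO objects (id, magnitude) VALUES (1, 18.2)")
conn.execute("INSERT INTO objects (id, magnitude) VALUES (2, 20.1)")
conn.execute("UPDATE objects SET magnitude = 18.4 WHERE id = 1")


def table_history(conn):
    """All modifications made to the table, in order."""
    return conn.execute(
        "SELECT object_id, operation FROM objects_history ORDER BY seq"
    ).fetchall()


def item_history(conn, object_id):
    """Modifications that touched one individual row."""
    return conn.execute(
        "SELECT operation, old_magnitude, new_magnitude "
        "FROM objects_history WHERE object_id = ? ORDER BY seq",
        (object_id,),
    ).fetchall()


whole_table = table_history(conn)
one_item = item_history(conn, 1)
```

Both queries read the same history table: the table-level view lists every change, and the item-level view filters it by row identifier.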
>>> | > | >
>>> | > | > I'm wondering if the questions about external details from the
>>> | > | > Neptune workflow (e.g. the types of sensor detail) could be
>>> | > | > incorporated in the Pan-STARRS workflow? For example, the
>>> | > | > telescope which the data was collected from?
>>> | > | >
>>> | > | > The major reservation I have with this workflow is how easy it
>>> | > | > would be for others to execute. Given the Pan-STARRS workflow
>>> | > | > is designed to work with large data, can the MSR team comment
>>> | > | > on whether small data sets are available? Also, given that the
>>> | > | > implementation requires .NET, how easily could this be run on
>>> | > | > non-Windows machines? Are there non-Windows executables
>>> | > | > available?
>>> | > | >
>>> | > | > * myExperiment & Brain Imaging Workflows
>>> | > | > If the Pan-STARRS workflow cannot be executed by different
>>> | > | > teams easily, I think we should look at selecting one of these
>>> | > | > options. Can these two teams comment on how easy it would be
>>> | > | > for others to use the components within their workflows
>>> | > | > without invoking their particular workflow enactment engines?
>>> | > | >
>>> | > | > I did like the dynamic nature of the Taverna workflow as it
>>> | > | > makes for a good case for provenance (e.g. the abstracts
>>> | > | > returned from PubMed will vary over time). Could we incorporate
>>> | > | > this into our selections?
>>> | > | >
>>> | > | > With that, what do you think?
>>> | > | >
>>> | > | > Thanks,
>>> | > | > Paul
>>> | > | >
>>> | > | > --------------------------------------------------------------
>>> | > | > Paul Groth, Ph.D.
>>> | > | > Postdoctoral Research Associate
>>> | > | > Information Sciences Institute
>>> | > | > University of Southern California
>>> | > | > pgroth at isi.edu
>>> | > | > Tel: 310 448 8482 Fax: 310 822 0751
>>> | > | > http://www.isi.edu/~pgroth/
>>> | > | > http://thinklinks.wordpress.org
>>> | > | >
>>> | > | >
>>> | > | >
>>> | > | >
>>> | > | >
>>> | > |
>>> | > |
>>> | > | --
>>> | > | Professor Luc Moreau tel: +44 23 8059 4487
>>> | > | Electronics and Computer Science email: l.moreau at ecs.soton.ac.uk
>>> | > | University of Southampton www: www.ecs.soton.ac.uk/~lavm
>>> | > | Southampton SO17 1BJ skype: prof.luc.moreau
>>> | > | United Kingdom fring: Luc
>>> | > |
>>> | > |
>>> | > |
>>> | >
>>> | >
>>> | >
>>> |
>>> |
>>> |
>>> |
>>> |
>>>
>>>
>>>
>>>
>>
>>
>
>
>
>
--
Professor Luc Moreau tel: +44 23 8059 4487
Electronics and Computer Science email: l.moreau at ecs.soton.ac.uk
University of Southampton www: www.ecs.soton.ac.uk/~lavm
Southampton SO17 1BJ skype: prof.luc.moreau
United Kingdom fring: Luc