[provenance-challenge] Re: review of workflows for pc3
Luc Moreau
L.Moreau at ecs.soton.ac.uk
Thu Nov 27 09:15:55 GMT 2008
yapw! (yet another provenance whisky!)
Jose Manuel Gómez Pérez wrote:
> Luc, Roger,
>
> Sorry to break in but perhaps you are referring to this whisky? I found
> it in Inverness, though to be honest I didn't dare buy it. Anyway, I
> couldn't resist taking a picture.
>
> Cheers,
> Jose
>
>
> Luc Moreau wrote:
>
>> Roger Barga wrote:
>>
>>> PS - the last time we met you mentioned a single malt with
>>> 'provenance' in the name. Was that an Ardbeg Provenance by chance?
>>> If so, I had a chance to try it on a recent trip to Edinburgh -
>>> absolutely wonderful.
>>>
>>>
>> It was! (and for the couple remaining glasses, no doubt will be).
>>
>> Trying to search for the provenance of 'provenance whisky', I found two
>> interesting pages:
>>
>> http://www.thewhiskyexchange.com/P-6279.aspx
>> http://www.whiskymag.com/whisky/brand/ardbeg/whisky615.html
>>
>> It is really great!
>> Luc
>>
>>
>>> ________________________________________
>>> From: provenance-challenge-ipaw-info-bounces at ipaw.info
>>> [provenance-challenge-ipaw-info-bounces at ipaw.info] On Behalf Of Luc
>>> Moreau [L.Moreau at ecs.soton.ac.uk]
>>> Sent: Wednesday, November 26, 2008 6:44 AM
>>> To: provenance-challenge at ipaw.info
>>> Cc: Satya Sahoo; Paul Groth
>>> Subject: [provenance-challenge] Re: review of workflows for pc3
>>>
>>> Thanks Yogesh. Are there any slides or papers about Roger's work?
>>>
>>> From a challenge viewpoint, it would be useful to characterise the
>>> type of provenance we would ideally like to capture within the
>>> database. It seems that a layered model is particularly appropriate
>>> here: the activity-level description could constitute one OPM
>>> account, whereas a more fine-grained provenance (in the database
>>> sense) could form another account.
>>>
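The layered model suggested above can be sketched as a single graph whose edges are tagged with the account they belong to. This is only an illustration under stated assumptions: the edge kinds "used" and "wasGeneratedBy" come from OPM, but the Edge class, the edges_in helper, the account names and every node name (LoadCSV, batch.csv, DetectionsTable, InsertRow) are hypothetical and not taken from the Pan-STARRS workflow.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Edge:
    kind: str            # OPM edge type: "used" or "wasGeneratedBy"
    effect: str          # node at the effect end of the edge
    cause: str           # node at the cause end of the edge
    accounts: frozenset  # names of the accounts this edge belongs to


def edges_in(graph, account):
    """Return the edges of `graph` that belong to `account`."""
    return [e for e in graph if account in e.accounts]


# Hypothetical account names for the two layers.
ACTIVITY = frozenset({"activity"})
DATABASE = frozenset({"database"})

# The same load step described at two levels of granularity.
graph = [
    # Coarse view: one activity reads a file and produces a table.
    Edge("used", "LoadCSV", "batch.csv", ACTIVITY),
    Edge("wasGeneratedBy", "DetectionsTable", "LoadCSV", ACTIVITY),
    # Fine-grained view: one insert reads a line and produces a row.
    Edge("used", "InsertRow", "batch.csv#line3", DATABASE),
    Edge("wasGeneratedBy", "DetectionsTable#row17", "InsertRow", DATABASE),
]

coarse = edges_in(graph, "activity")
fine = edges_in(graph, "database")
```

Filtering by account name gives either the activity-level description or the database-level one, while both remain part of one graph.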
>>> Luc
>>>
>>>
>>> Yogesh Simmhan wrote:
>>>
>>>> Hi Luc,
>>>>
>>>> In the current system, we work around having to instrument the DB by
>>>> having individual SQL queries wrapped as C# activities. The activities
>>>> pass through the input params to the parameterized SQL queries.
>>>> Provenance is captured at the activity level. We also capture the
>>>> actual queries and query plans from MSSQL server, but don't integrate
>>>> them with the provenance yet.
>>>
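The approach described above (one parameterized SQL query per activity, with provenance recorded at the activity level) can be sketched roughly as follows. This is a minimal illustration, not the Pan-STARRS code: it uses Python and SQLite rather than C# and MSSQL, and the run_activity helper, the provenance_log list, the detections table and the activity names are all hypothetical.

```python
import sqlite3
from datetime import datetime, timezone

# Hypothetical in-memory provenance store; a real system would persist it.
provenance_log = []


def run_activity(conn, name, sql, params):
    """Run one parameterized SQL statement as an 'activity' and record
    activity-level provenance: name, query text, inputs, rows returned."""
    started = datetime.now(timezone.utc).isoformat()
    cur = conn.execute(sql, params)  # inputs are passed as named parameters
    rows = cur.fetchall()            # empty list for INSERT/UPDATE statements
    conn.commit()
    provenance_log.append({
        "activity": name,
        "sql": sql,
        "inputs": dict(params),
        "started": started,
        "rows_returned": len(rows),
    })
    return rows


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE detections (id INTEGER PRIMARY KEY, flux REAL)")

run_activity(
    conn, "LoadDetection",
    "INSERT INTO detections (id, flux) VALUES (:id, :flux)",
    {"id": 1, "flux": 2.5},
)
rows = run_activity(
    conn, "ReadDetection",
    "SELECT flux FROM detections WHERE id = :id",
    {"id": 1},
)
```

Because the wrapper sees only the query text and its parameters, the database itself needs no instrumentation, which is also why nothing finer than the activity is recorded.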
>>>> Roger B. is working on a design and prototype for a more DB-centric
>>>> and semantic approach using materialized views and first-class
>>>> provenance operators. His presentation at the recent provenance in
>>>> workflows workshop at Utah talked about it
>>>> (http://wiki.esi.ac.uk/ProvenanceInWorkflows).
>>>
>>>> Best,
>>>> --Yogesh
>>>>
>>>>
>>>> | -----Original Message-----
>>>> | From: provenance-challenge-ipaw-info-bounces at ipaw.info
>>>> | [mailto:provenance-challenge-ipaw-info-bounces at ipaw.info] On
>>>> | Behalf Of Luc Moreau
>>>> | Sent: Wednesday, November 26, 2008 4:02 AM
>>>> | To: provenance-challenge at ipaw.info; Paul Groth
>>>> | Cc: Satya Sahoo
>>>> | Subject: [provenance-challenge] Re: review of workflows for pc3
>>>> |
>>>> | Yogesh,
>>>> |
>>>> | There is however an interesting technical challenge (probably
>>>> | appropriate for a provenance challenge!).
>>>> | If we intend to export provenance information into the OPM format, we
>>>> | probably need to capture this information (in part) inside the
>>>> | database processing SQL queries.
>>>> | Are you already doing this in your system?
>>>> |
>>>> | This presents us with an opportunity to have contributions from
>>>> | members of the database community.
>>>> | Who is on this list at this moment? (James? Peter? Val? Jan?
>>>> | Natalia?)
>>>> |
>>>> | This will require us to structure the workflow in different "stages"
>>>> | where different technologies (including databases)
>>>> | are involved.
>>>> |
>>>> | Can you comment on this?
>>>> |
>>>> | Cheers,
>>>> | Luc
>>>> |
>>>> | Yogesh Simmhan wrote:
>>>> | > Hi Paul,
>>>> | >
>>>> | > Thanks for your comments. Regarding the ease of portability of the
>>>> | > Pan-STARRS Load/Merge workflow, all our activities are either SQL
>>>> | > queries and updates, or file system operations. While our current
>>>> | > executables are for MSSQL/C#, the SQL activities are simple enough
>>>> | > to port to any relational DBMS (MySQL, Apache Derby, ...) and
>>>> | > programming language. The main workflows operate on 3 relational
>>>> | > tables with about 50 columns.
>>>> | >
>>>> | > If selected, we can provide Java source code using Derby, in
>>>> | > addition to the C# version using MSSQL. We'll also provide textual
>>>> | > descriptions of the activities to enable them to be ported to other
>>>> | > DB/languages.
>>>> | >
>>>> | > While the typical Pan-STARRS workflows operate on large datasets,
>>>> | > there is nothing that prevents the challenge workflows from
>>>> | > operating on a subset of those. Indeed, we use small CSV files and
>>>> | > databases (<1MB) for our own testing that we can provide for the
>>>> | > challenge.
>>>> | >
>>>> | > Metadata about the telescope is not part of the normal workflow
>>>> | > pipeline, but we can consider incorporating supplementary
>>>> | > annotations about the telescope outside the scope of the workflow
>>>> | > to see how the provenance systems embed annotations in OPM and
>>>> | > handle annotation queries.
>>>> | >
>>>> | > Best,
>>>> | > --Yogesh
>>>> | >
>>>> | >
>>>> | > |
>>>> | > | pgroth at ISI.EDU wrote:
>>>> | > | > Hi,
>>>> | > | >
>>>> | > | > To kick-start our discussion about which workflows should be
>>>> | > | > used for the third provenance challenge, below are my thoughts
>>>> | > | > on which would be most appropriate, along with some questions
>>>> | > | > for the authors. First, let me say that I thought all the
>>>> | > | > workflows would provide a good basis for an interesting
>>>> | > | > challenge, but to be decisive I've selected two.
>>>> | > | >
>>>> | > | > The two selection criteria I used were the complexity of the
>>>> | > | > structures within the workflows (i.e. whether they have loops,
>>>> | > | > hierarchies, collections, etc.) and how easy it would be for
>>>> | > | > other teams to get the workflows up and running. Given the
>>>> | > | > complex control structures in some of these workflows, I
>>>> | > | > believe it would be difficult to provide intermediary data
>>>> | > | > sets, and thus teams would need to execute the workflows
>>>> | > | > themselves, unlike previous challenges where dummy components
>>>> | > | > could be used.
>>>> | > | >
>>>> | > | > 1. Build and test workflow
>>>> | > | > In terms of being able to execute the workflows, the software
>>>> | > | > build and testing workflow seems by far the easiest to get up
>>>> | > | > and running. Most systems have Ant and Java, and the build file
>>>> | > | > can be easily adapted to use Makefiles. Likewise, the Ant file
>>>> | > | > has a multi-level hierarchy, which is an interesting structure.
>>>> | > | > The downside of the workflow is its lack of complexity: it does
>>>> | > | > not have collections or nested data sets. However, I think the
>>>> | > | > workflow would make for a simple starting point for testing
>>>> | > | > interoperability before moving on to the more complex second
>>>> | > | > workflow. Furthermore, by using an Ant file the challenge does
>>>> | > | > not become too workflow specific.
>>>> | > | >
>>>> | > | > 2. MSR-WSU Pan-STARRS workflow
>>>> | > | > My first choice for the second workflow is the MSR-WSU
>>>> | > | > Pan-STARRS workflow. It has a number of interesting workflow
>>>> | > | > structures, such as if/else as well as loops over collections.
>>>> | > | > I also like the idea of having multiple levels of abstraction
>>>> | > | > around database tables. It would be interesting to ask for the
>>>> | > | > provenance of an individual item in a table and retrieve all
>>>> | > | > the modifications on each table, including modifications to
>>>> | > | > individual items. The explicit use of database tables might
>>>> | > | > also encourage the database community to get involved with the
>>>> | > | > challenge. What do others think on this issue?
>>>> | > | >
>>>> | > | > I'm wondering if the questions about external details from the
>>>> | > | > Neptune workflow (e.g. the types of sensor detail) could be
>>>> | > | > incorporated in the Pan-STARRS workflow? For example, the
>>>> | > | > telescope from which the data was collected?
>>>> | > | >
>>>> | > | > The major reservation I have with this workflow is how easy it
>>>> | > | > would be for others to execute. Given that the Pan-STARRS
>>>> | > | > workflow is designed to work with large data, can the MSR team
>>>> | > | > comment on whether small data sets are available? Also, given
>>>> | > | > that the implementation requires .NET, how easily could this be
>>>> | > | > run on non-Windows machines? Are there non-Windows executables
>>>> | > | > available?
>>>> | > | >
>>>> | > | > * myExperiment & Brain Imaging Workflows
>>>> | > | > If the Pan-STARRS workflow cannot be executed easily by
>>>> | > | > different teams, I think we should look at selecting one of
>>>> | > | > these options. Can these two teams comment on how easy it
>>>> | > | > would be for others to use the components within their
>>>> | > | > workflows without invoking their particular workflow enactment
>>>> | > | > engines?
>>>> | > | >
>>>> | > | > I did like the dynamic nature of the Taverna workflow, as it
>>>> | > | > makes a good case for provenance (e.g. the abstracts returned
>>>> | > | > from PubMed will vary over time). Could we incorporate this
>>>> | > | > into our selections?
>>>> | > | >
>>>> | > | > With that, what do you think?
>>>> | > | >
>>>> | > | > Thanks,
>>>> | > | > Paul
>>>> | > | >
>>>> | > | > --------------------------------------------------------------
>>>> | > | > Paul Groth, Ph.D.
>>>> | > | > Postdoctoral Research Associate
>>>> | > | > Information Sciences Institute
>>>> | > | > University of Southern California
>>>> | > | > pgroth at isi.edu
>>>> | > | > Tel: 310 448 8482 Fax: 310 822 0751
>>>> | > | > http://www.isi.edu/~pgroth/
>>>> | > | > http://thinklinks.wordpress.org
>>>> | > | >
>>>> | > | >
>>>> | > | >
>>>> | > | >
>>>> | > | >
>>>> | > |
>>>> | > |
>>>> | > | --
>>>> | > | Professor Luc Moreau tel: +44 23 8059 4487
>>>> | > | Electronics and Computer Science email: l.moreau at ecs.soton.ac.uk
>>>> | > | University of Southampton www: www.ecs.soton.ac.uk/~lavm
>>>> | > | Southampton SO17 1BJ skype: prof.luc.moreau
>>>> | > | United Kingdom fring: Luc
>>>> | > |
>>>> | > |
>>>> | > |
>>>> | >
>>>> | >
>>>> | >
>>>> |
>>>> |
>>>> | --
>>>> | Professor Luc Moreau tel: +44 23 8059 4487
>>>> | Electronics and Computer Science email: l.moreau at ecs.soton.ac.uk
>>>> | University of Southampton www: www.ecs.soton.ac.uk/~lavm
>>>> | Southampton SO17 1BJ skype: prof.luc.moreau
>>>> | United Kingdom fring: Luc
>>>> |
>>>> |
>>>> |
>>>>
>>>>
>>>>
>>>>
>>> --
>>> Professor Luc Moreau tel: +44 23 8059 4487
>>> Electronics and Computer Science email: l.moreau at ecs.soton.ac.uk
>>> University of Southampton www: www.ecs.soton.ac.uk/~lavm
>>> Southampton SO17 1BJ skype: prof.luc.moreau
>>> United Kingdom fring: Luc
>>>
>>>
>>
>
> --
>
> Jose Manuel Gomez-Perez
> Research Manager
> jmgomez at isoco.com
> #T +34913349778
> #M +34609077103
> Pedro de Valdivia, 10
> 28006 Madrid, Spain
>
> iSOCO
> enabling the networked economy
> www.isoco.com
>
> Please consider your environmental responsibility before printing this
> e-mail
>
--
Professor Luc Moreau tel: +44 23 8059 4487
Electronics and Computer Science email: l.moreau at ecs.soton.ac.uk
University of Southampton www: www.ecs.soton.ac.uk/~lavm
Southampton SO17 1BJ skype: prof.luc.moreau
United Kingdom fring: Luc