[provenance-challenge] Re: PC3 workflow code available and schedule

Simon Miles drsimonmiles at googlemail.com
Thu Feb 26 18:35:57 GMT 2009


Hello Paul, Yogesh,

I've got a few suggestions of queries for the provenance challenge.  I
will not be offended if you ignore those you don't find interesting
enough to use :-).  I include the queries below, but am happy to put
them on the Wiki myself if you tell me to.

=========

1. For a given detection, which CSV files contributed to it?

Basic sample answer: The CSV file containing the Detection table.

Advanced sample answer: The CSV file containing the Detection table,
CSV file containing the Image table (as the image is an attribute of
the detection), and CSV file containing the FrameMetadata table (as
the frame metadata is an attribute of the image).
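
A minimal sketch of how the two sample answers relate, assuming the provenance is available as simple "was derived from" edges. The item identifiers and the DERIVED_FROM mapping are hypothetical and are not taken from the workflow code. The basic answer follows only the direct edges from the detection, while the advanced answer follows them transitively:

```python
from collections import deque

# Hypothetical "was derived from" edges: each item maps to the items it depends on.
DERIVED_FROM = {
    "detection:42": ["Detection.csv", "image:7"],
    "image:7": ["Image.csv", "frame:3"],
    "frame:3": ["FrameMetadata.csv"],
}


def contributing_csv_files(item, derived_from):
    """Return the CSV files reachable from item by following derivation edges."""
    seen = {item}
    queue = deque([item])
    csv_files = set()
    while queue:
        current = queue.popleft()
        for source in derived_from.get(current, []):
            if source in seen:
                continue
            seen.add(source)
            if source.endswith(".csv"):
                csv_files.add(source)
            queue.append(source)
    return sorted(csv_files)


print(contributing_csv_files("detection:42", DERIVED_FROM))
# → ['Detection.csv', 'FrameMetadata.csv', 'Image.csv']
```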

=========

2. A CSV or header file is deleted during the workflow's execution.
How much time elapsed between a successful IsMatchCSVFileTables test
(when the file existed) and an unsuccessful IsExistsCSVFile test (when
the file had been deleted)?

Sample answer: 3ms

For testing the above query, it may be simplest to edit the
workflow to include deletion of the CSV file as a step.
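
As a rough illustration of what answering this query involves, here is a sketch that assumes each check execution is recorded with a timestamp and an outcome. The log layout, the file name, and the timestamps are hypothetical, and a real provenance system may record checks quite differently:

```python
from datetime import datetime, timedelta

# Hypothetical execution log: (procedure, file, succeeded, timestamp).
START = datetime(2009, 2, 26, 18, 0, 0)
LOG = [
    ("IsMatchCSVFileTables", "Image.csv", True, START),
    ("IsExistsCSVFile", "Image.csv", False, START + timedelta(milliseconds=3)),
]


def elapsed_between_checks(log, file_name):
    """Time from the latest successful IsMatchCSVFileTables test to the first
    failed IsExistsCSVFile test that follows it, or None if there is no such pair."""
    last_match = None
    for procedure, name, succeeded, when in sorted(log, key=lambda r: r[3]):
        if name != file_name:
            continue
        if procedure == "IsMatchCSVFileTables" and succeeded:
            last_match = when
        elif procedure == "IsExistsCSVFile" and not succeeded and last_match is not None:
            return when - last_match
    return None


elapsed = elapsed_between_checks(LOG, "Image.csv")
print(elapsed == timedelta(milliseconds=3))  # → True
```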

=========

3. The user considers a table to contain values they do not expect.
Was the range check (IsMatchTableColumnRanges) performed for this
table?

Sample answer: Yes

=========

4. The workflow halts due to failing an IsMatchTableColumnRanges
check.  How many tables successfully loaded before the workflow halted
due to a failed check?

Sample answer: 2

=========

Finally, a couple of questions inspired by dynamic program slicing:

5. Which operation executions were strictly necessary for the Image
table to contain a particular (non-computed) value?

Sample answer: call of ReadCSVReadyFile, call of CreateEmptyLoadDB,
2nd call of ReadCSVFileColumnNames, 2nd call of LoadCSVFileIntoTable.
(2nd calls because Image is loaded in the 2nd iteration of the for
loop; the checks are excluded because they do not change anything;
UpdateComputedColumns is excluded because the value is non-computed;
CompactDatabase is excluded because it does not affect the value.)
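
A minimal sketch of the backward-slicing idea behind this query, assuming each execution is recorded with the data it read and the data it wrote. The trace, the data names, and the "#2" suffix for second calls are hypothetical simplifications rather than the actual workflow trace. Checks write nothing, so they drop out of the slice automatically:

```python
# Hypothetical trace in execution order: (execution label, data read, data written).
TRACE = [
    ("ReadCSVReadyFile", {"csv_ready_file"}, {"file_entries"}),
    ("CreateEmptyLoadDB", set(), {"database"}),
    ("IsExistsCSVFile#2", {"file_entries"}, set()),
    ("ReadCSVFileColumnNames#2", {"file_entries"}, {"image_columns"}),
    ("LoadCSVFileIntoTable#2", {"database", "image_columns"}, {"image_table"}),
    ("CompactDatabase", {"database"}, {"compacted_database"}),
]


def backward_slice(trace, target):
    """Return the executions the target data depends on, in execution order."""
    needed = {target}
    kept = []
    # Walk backwards: an execution matters only if it wrote something still needed.
    for label, reads, writes in reversed(trace):
        if writes & needed:
            kept.append(label)
            needed = (needed - writes) | reads
    kept.reverse()
    return kept


print(backward_slice(TRACE, "image_table"))
# → ['ReadCSVReadyFile', 'CreateEmptyLoadDB', 'ReadCSVFileColumnNames#2', 'LoadCSVFileIntoTable#2']
```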

=========

6. Which pairs of procedures in the workflow could be swapped while
still obtaining the same result (given the particular data input)?

Sample answer: (I won't enumerate them all, but I think some can be
swapped as the checks in particular are not causally dependent, but we
cannot swap those inside the loop with those outside).
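
One way to find candidate pairs, sketched under the assumption that the provenance gives direct causal dependencies between executions. The DEPENDS_ON mapping is hypothetical, and IsMatchTableRowCount is assumed here to be the name of a second check. Causal independence is only a necessary condition: a pair that passes this test may still not be swappable if executions between them depend on either one.

```python
from itertools import combinations

# Hypothetical direct causal dependencies within one loop iteration.
DEPENDS_ON = {
    "LoadCSVFileIntoTable": set(),
    "IsMatchTableRowCount": {"LoadCSVFileIntoTable"},
    "IsMatchTableColumnRanges": {"LoadCSVFileIntoTable"},
}


def ancestors(node, depends_on):
    """Return everything the node depends on, directly or transitively."""
    result = set()
    stack = list(depends_on.get(node, ()))
    while stack:
        current = stack.pop()
        if current not in result:
            result.add(current)
            stack.extend(depends_on.get(current, ()))
    return result


def independent_pairs(depends_on):
    """Return pairs where neither execution causally depends on the other."""
    closure = {node: ancestors(node, depends_on) for node in depends_on}
    return [
        (a, b)
        for a, b in combinations(sorted(depends_on), 2)
        if a not in closure[b] and b not in closure[a]
    ]


print(independent_pairs(DEPENDS_ON))
# → [('IsMatchTableColumnRanges', 'IsMatchTableRowCount')]
```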

Thanks,
Simon

2009/2/6 Paul Groth <pgroth at isi.edu>:
> Hi Everyone,
>
> Yogesh Simmhan has now made the code, in both C# and Java, available
> for the PanStarrs workflow.  Everything is available at http://twiki.ipaw.info/bin/view/Challenge/ThirdProvenanceChallenge
>  . We are looking for your help in reviewing the code and proposing
> provenance queries for the challenge. We are aiming to complete this
> by the end of the month so that we can start the challenge.
>
> The proposed PC3 schedule is as follows:
>
> 1. Review of code and provenance query proposals (to Feb 27)
> - March 2 - PC3 Starts
> 2. Make the workflow work with individual team's systems [Mar 2 - Mar
> 30]
> 3. Generate provenance for the challenge workflow & run queries on it
> [Mar 30 - Apr 13]
> 4. Export OPM Graphs and import from others [Apr 13 - May 4]
> 5. Run queries on imported OPM graph [Apr 27 - Jun 1]
> 6. Prepare slides for challenge [Jun 1 - Jun 8]
> - PC3 Workshop June 10 - 11 held in Amsterdam
>
> Thanks for your participation and I look forward to seeing your
> provenance queries and comments on the code.
>
> Paul
>
>
>
> --------------------------------------------------------------
> Paul Groth, Ph.D.
> Postdoctoral Research Associate
> Information Sciences Institute
> University of Southern California
> pgroth at isi.edu
> Tel:  310 448 8482  Fax: 310 822 0751
> http://www.isi.edu/~pgroth/
> http://thinklinks.wordpress.org
>
>
>
>



-- 
Dr Simon Miles
Lecturer, Department of Computer Science
King's College London, WC2R 2LS, UK
+44 (0)20 7848 1166


