[provenance-challenge] ProvBench @ProvenanceWeek Call for Paper
Paolo Missier
pmissier at acm.org
Tue Apr 22 12:12:14 BST 2014
Dear colleagues,
a reminder that ProvBench has an open call for contributions, to be presented as part of Provenance Week.
Timeline:
Expression of interest: May 2nd, 2014.
Submission deadline: May 9th, 2014.
Notification: June 1st, 2014.
Workshop: June 13th, 2014.
Thanks! --Paolo Missier
--
Paolo Missier - Paolo.Missier at newcastle.ac.uk, pmissier at acm.org
School of Computing Science, Newcastle University, UK
professional: http://www.cs.ncl.ac.uk/people/Paolo.Missier
photography: http://scattidistratti.smugmug.com/
PGP Public key: 0x45596549 - key servers: pool.sks-keyservers.net
=--= Tempus fugit =--=
Call for contributions
========================
ProvBench (co-located with the Provenance Week (IPAW + TAPP))
https://sites.google.com/site/provbench/home/provbench-provenance-week-2014
Cologne, 13th of June, 2014
========================
ProvBench: Benchmarking Provenance Management Systems
2nd edition: Call for benchmarking datasets
Background
----------
Provenance metadata, or metadata that describes the origins of data, is now widely
recognized as a key ingredient for numerous (traditional and novel) applications. For example,
provenance can be used to inspect the quality of data provided by third parties,
to identify active members in social network analytics, and to ensure correct
attribution, citation, and licensing.
The increasing number of provenance-related proposals and systems creates the need
for a well-documented, impartial provenance corpus that researchers
and system developers can use to test and validate their provenance management
systems (ProvMS), including storage techniques for large provenance graphs, query models,
and analysis algorithms. These systems are currently being tested and assessed on
proprietary provenance datasets. This makes it difficult to benchmark and compare
different implementations.
On the other hand, benchmark datasets are already available for a wide variety of generic
DBMS, upon which many implementations of ProvMS are based. These generic systems include
RDF triple stores, native graph DBMS, relational DBMS, and more. Thus, the questions we
aim to answer include:
- Is there in fact a need for new benchmark datasets that are specific to provenance data and
that reflect its usages? For instance: system-level provenance, provenance of web pages
(MediaWiki), provenance of a software project, provenance of scientific workflows, provenance of
human processes, etc.
- Does provenance exhibit typical data or query patterns that may suggest ways to optimize
either storage or query processing?
- To what realistic sizes, and at what rate, does provenance data accumulate in different settings,
and when does size begin to pose a problem for storage and query processing?
Objective
---------
With these questions in mind, ProvBench seeks to build on the tradition of database benchmarks
(e.g. relational, RDF). Its purpose is to collect a corpus of provenance datasets, along with
associated query workloads, that are at the same time:
- broad: representative of a variety of provenance usage scenarios
- specific to provenance data (as opposed to general RDF, graph, or relational benchmarking datasets)
- challenging to provenance management systems (scalable storage, query performance)
Why do this?
------------
You will not get a formal paper publication out of this, as we cannot include your documentation
in the TAPP/IPAW proceedings. However, you will get a data publication with an official DOI,
and the datasets will be cited by members of the community who make use of them in their publications.
To encourage this practice, each dataset accepted into ProvBench will be assigned a DOI,
allocated with the help of FigShare.
Submissions
-----------
Submissions can be entirely new or they can be new versions, or refinements, of submissions to
the first edition of ProvBench.
Submissions should include a dataset and accompanying documentation, as specified below.
Contributors should email Khalid Belhajjame (kbelhajj at googlemail.com) for access to the ProvBench GitHub repository[4].
Each submission shall consist of:
- A dataset (provenance trace).
Multiple distinct datasets can be submitted. These, however, should be “similar” provenance
traces at differing scales, derived from the same original data source.
Traces can be serialized in any of the W3C PROV encodings[1], either official (PROV Notation,
PROV-O) or unofficial (PROV-XML, PROV-JSON).
- A query workload. Lacking a standard query language for provenance, queries are to be
expressed in natural language and must be sufficiently precise to allow for unambiguous implementation.
- Metadata: size (number of entities, activities, and relationships), format, authorship, etc.
- Rationale and documentation for the submission, including:
  - the type of scenario that the submission is representative of, along with any background
    information needed to understand the domain
  - what the dataset and its accompanying queries can be used to test
  - what makes the dataset distinct from generic DBMS benchmarks
  - what makes the submission challenging
  - how the dataset has been used to test specific properties of a ProvMS
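As a minimal sketch of how the requested size metadata could be derived from a submitted trace: PROV-JSON groups statements by type under top-level sections, so entities, activities, agents, and relationships can be counted directly from the parsed document. The trace below is invented for illustration (identifiers such as `ex:report` and `ex:compile` are not from any real submission), using only the Python standard library.

```python
import json

# A tiny, hypothetical PROV-JSON trace: one entity generated by one activity.
# All identifiers here are illustrative only.
trace = json.loads("""
{
  "prefix": {"ex": "http://example.org/"},
  "entity": {"ex:report": {}},
  "activity": {"ex:compile": {}},
  "wasGeneratedBy": {
    "_:gen1": {"prov:entity": "ex:report", "prov:activity": "ex:compile"}
  }
}
""")

# PROV-JSON keys its statements by type, so the size metadata requested in
# the submission guidelines (numbers of entities, activities, relationships)
# can be read off by counting members per top-level section.
NODE_TYPES = ("entity", "activity", "agent")

def size_metadata(doc):
    nodes = {t: len(doc.get(t, {})) for t in NODE_TYPES}
    relations = sum(len(v) for k, v in doc.items()
                    if k not in NODE_TYPES and k != "prefix")
    return {"nodes": nodes, "relations": relations}

print(size_metadata(trace))
# {'nodes': {'entity': 1, 'activity': 1, 'agent': 0}, 'relations': 1}
```

The same counting approach extends to larger traces serialized in PROV-JSON; traces in PROV-N or PROV-O would need a PROV-aware parser instead.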
Note: The rationale document does not constitute a paper, and will not be published in a proceedings. Companion papers, if desired, should be submitted to TAPP[2] or IPAW[3].
The Event
----------
The event will likely combine presentations, a mini-hackathon, and panel sessions, depending on the number of submissions and participants. A detailed agenda will be announced a few weeks prior to the event.
Note that you must register for Provenance Week 2014 in order to attend this event.
Important Dates (tentative)
---------------------------
Expression of interest: May 2nd, 2014.
Submission deadline: May 9th, 2014.
Notification: June 1st, 2014.
Organisers
----------
Khalid Belhajjame, Université Paris-Dauphine
Adriane Chapman, The MITRE Corporation
Hugo Firth, Newcastle University
Paolo Missier, Newcastle University
Jun Zhao, Lancaster University
References
[1]: http://www.w3.org/TR/prov-overview/
[2]: http://provenanceweek.dlr.de/tapp/call-participation/
[3]: http://provenanceweek.dlr.de/ipaw/call-participation/
[4]: https://github.com/provbench