[GOAL] Re: Some discussion points for the UK OA initiative
Peter Murray-Rust
pm286 at cam.ac.uk
Mon May 7 22:47:35 BST 2012
I hope I can give a factual analysis of your question.
On Mon, May 7, 2012 at 9:50 PM, Richard Poynder
<ricky at richardpoynder.co.uk> wrote:
> Is it not the case that there are two parts to the data issue, and these two
> parts are often conflated? There is data mining (mining the underlying data
> associated with scholarly papers), and there is text mining, pulling data
> out from the text of scholarly papers (i.e. treating papers as data). As I
> understand it, both these things present somewhat different problems, and so
> presumably require different solutions.
>
The "data" spectrum is wider than that. "Data mining" tends to be narrower
than "data analysis" or "data re-use". It implies that there are patterns in
the data that can be best revealed (or only revealed) by machine methods.
For example the analysis of genomic data could be regarded as data-mining.
In many cases single instances or data sets can be valuable and the term
"data-mining" may not be appropriate. For example most single data sets
submitted as "supplemental information" would probably not be large enough
for data-mining but could be valuable for data analysis or re-use. However
if a large number of datasets can be assembled from such supp-info then
data mining might be appropriate.
Constraints on data mining include the lack of clear metadata and, in some
cases, the lack of clear licences.
>
> For instance, I am told that researchers concerned about text mining argue
> that when their institution buys a subscription to an electronic journal
> they should be acquiring not only the right to read the papers in it, but
> the right to text mine them too. Publishers, however, do not see it that
> way. This is not the same problem as that described by Keith below I think.
>
>
There are many different approaches to data and it's probably difficult to
generalize.
> That said, not all OA publishers are text-mining friendly either.
I think the term "OA publisher" is not precise. If the publications carry a
CC-BY or equivalent licence, as they do from BMC or PLoS, then the reader
has the effective right to carry out textmining. However many publications
(sic) are "OA" in the sense that they are visible somewhere, but do not
carry a clear licence that permits textmining.
> Nature
> reports that "of the 2.4 million abstracts listed by PubMedCentral, only
> 400,000 (17%) are licensed for text-mining."
> (http://www.nature.com/news/trouble-at-the-text-mine-1.10184).
>
The licence rights on UK/PMC content are poorly defined and I don't think
anyone knows what the numbers are. Without a machine-readable licence, the
only way of knowing whether something is text-minable is whether it is
published by BMC or PLoS. The number of papers known to be fully
BOAI-compliant is less than 400,000.
Also it's important not to confuse abstracts with full papers. The full
text of many papers is not visible on UK/PMC although the abstracts are.
The rights on abstracts are usually unclear. I gather that abstracts have
had to be removed from PMC at the behest of the publishers.
>
> I hope the UK government is clear that these are different problems
> requiring different solutions.
>
> The first problem is lack of clarity and information.
> >>
>
> 4. DATA. What about authors who do not wish to make their research data
> freely accessible to all immediately, having gathered it for the purpose of
> analyzing and data-mining it themselves? Would it not be a better idea for
> the time being to merely recommend rather than require that data be made OA
> as soon as possible, rather than risk resistance from authors who are happy
> to give away their journal articles but not their data?
>
> [Keith Jeffery] you are right to raise this. Different communities /
> domains of research have different practices with embargo periods on data to
> allow the project leader / team to have publication precedence. So we have
> publishers wanting embargos for articles and communities wanting embargos
> for data (and probably also associated software which may raise issues
> concerning confidentiality / patenting). The UK funding councils are
> pushing for the same conditions on data as on publications but the document
> is not yet finalised. One solution would be to make data available openly
> but to have agreements that any researcher working on the data other than
> the original project team should (a) notify of intent to publish (b) ideally
> co-publish with the original team or (c) minimally cite the original team
> publication and dataset/software. It is all a matter of research ethics.
> The present competitive research world does not encourage such ethics.
> Again the Finch committee output will be interesting. The whole area of
> research data from publicly-funded research has been caught up with the open
> 'data.gov' (public service information, semantic web, linked open data)
> agenda. While the two certainly are related, I am not convinced the
> semantic web / LOD browsing over data to find the nearest hospital or local
> government office - or crime statistics in your neighbourhood or league
> table ratings of local schools - is the same as managing terabytes (or
> more) of research data with specialised and complex software.
>
> Best
> Keith
>
>
> _______________________________________________
> GOAL mailing list
> GOAL at eprints.org
> http://mailman.ecs.soton.ac.uk/mailman/listinfo/goal
>
--
Peter Murray-Rust
Reader in Molecular Informatics
Unilever Centre, Dep. Of Chemistry
University of Cambridge
CB2 1EW, UK
+44-1223-763069