[GOAL] Re: Some discussion points for the UK OA initiative

Richard Poynder ricky at richardpoynder.co.uk
Mon May 7 21:50:49 BST 2012


Is it not the case that there are two parts to the data issue, and these two
parts are often conflated? There is data mining (mining the underlying data
associated with scholarly papers), and there is text mining, pulling data
out from the text of scholarly papers (i.e. treating papers as data). As I
understand it, the two present somewhat different problems, and so
presumably require different solutions.

For instance, I am told that researchers concerned about text mining argue
that when their institution buys a subscription to an electronic journal
they should be acquiring not only the right to read the papers in it, but
the right to text mine them too. Publishers, however, do not see it that
way. This, I think, is not the same problem as the one Keith describes below.

That said, not all OA publishers are text-mining friendly either. Nature
reports that "of the 2.4 million abstracts listed by PubMedCentral, only
400,000 (17%) are licensed for text-mining."
(http://www.nature.com/news/trouble-at-the-text-mine-1.10184).

I hope the UK government is clear that these are different problems
requiring different solutions.

>>

4. DATA. What about authors who do not wish to make their research data
freely accessible to all immediately, having gathered it for the purpose of
analyzing and data-mining it themselves? Would it not be a better idea for
the time being to merely recommend rather than require that data be made OA
as soon as possible, rather than risk resistance from authors who are happy
to give away their journal articles but not their data?

[Keith Jeffery] you are right to raise this.  Different communities /
domains of research have different practices with embargo periods on data to
allow the project leader / team to have publication precedence.  So we have
publishers wanting embargos for articles and communities wanting embargos
for data (and probably also associated software which may raise issues
concerning confidentiality / patenting).  The UK funding councils are
pushing for the same conditions on data as on publications but the document
is not yet finalised. One solution would be to make data available openly
but to have agreements that any researcher working on the data other than
the original project team should (a) notify the team of intent to publish,
(b) ideally co-publish with the original team, or (c) minimally cite the
original team's publication and dataset/software.  It is all a matter of
research ethics.
The present competitive research world does not encourage such ethics.
Again the Finch committee output will be interesting.  The whole area of
research data from publicly-funded research has been caught up with the open
'data.gov' (public service information, semantic web, linked open data)
agenda.  While the two are certainly related, I am not convinced that
semantic web / LOD browsing over data to find the nearest hospital or local
government office (or crime statistics in your neighbourhood, or league
table ratings of local schools) is the same as managing terabytes (or
more) of research data with specialised and complex software.

Best
Keith



