[GOAL] Re: Some discussion points for the UK OA initiative

Stevan Harnad harnad at ecs.soton.ac.uk
Mon May 7 23:11:00 BST 2012


Richard, you are quite right that making research data open for all
to mine is not the same thing as making the texts of research articles
(Libre) OA for text-mining, and you are also right that there are
different problems associated with each.

What I wrote below was about making research data open for mining.
This has nothing to do with publishers; the problems concern only
the researchers' own first-exploitation rights, and the wish (indeed
the need) of many not to surrender their hard-earned first-exploitation
rights as soon as they've gathered their data (or right after their first public
report based on the data). These are author-side barriers to data OA.
There are no author-side barriers to article OA (just sluggishness,
and some groundless fears).

Making article texts open for text-mining calls for Libre OA. Publishers
are much more resistant to Libre OA than to Gratis OA (free online access);
hence the barriers to Libre OA are much higher than the barriers to
Gratis OA. The majority of journals (including most of the top journals
in most fields) already endorse immediate Gratis Green OA self-archiving
of the author's peer-reviewed final draft, by the author, in the author's
institutional repository.

But few publishers endorse Libre OA, for fear of 3rd-party free-riders.
(Moreover, some flavors of Libre OA call for further re-use rights
that even some authors would not wish to grant.)

So all in all, both data OA and Libre OA face problems that Green
Gratis OA does not face.

So let funders and institutions mandate Green Gratis OA worldwide
first, and then let's worry about data OA and Libre OA (and Gold OA).

For then we will at least (and at last!) have free online access to research
articles -- whereas we only have it to about 20% of research now.

That will be a huge step forward for research progress.

And (I suggest), it will also be the fastest and surest step toward
data OA, Libre OA and Gold OA thereafter.

Over-reaching for them now just keeps us deprived of the 100%
Gratis Green OA that is already within our grasp.

Stevan Harnad

On 2012-05-07, at 4:50 PM, Richard Poynder wrote:

> Is it not the case that there are two parts to the data issue, and these two
> parts are often conflated? There is data mining (mining the underlying data
> associated with scholarly papers), and there is text mining, pulling data
> out from the text of scholarly papers (i.e. treating papers as data). As I
> understand it, both these things present somewhat different problems, and so
> presumably require different solutions.
> 
> For instance, I am told that researchers concerned about text mining argue
> that when their institution buys a subscription to an electronic journal
> they should be acquiring not only the right to read the papers in it, but
> the right to text mine them too. Publishers, however, do not see it that
> way. This is not the same problem as that described by Keith below I think.
> 
> That said, not all OA publishers are text-mining friendly either. Nature
> reports that "of the 2.4 million abstracts listed by PubMedCentral, only
> 400,000 (17%) are licensed for text-mining."
> (http://www.nature.com/news/trouble-at-the-text-mine-1.10184).
> 
> I hope the UK government is clear that these are different problems
> requiring different solutions.
> 
>>> 
> 
> 4. DATA. What about authors who do not wish to make their research data
> freely accessible to all immediately, having gathered it for the purpose of
> analyzing and data-mining it themselves? Would it not be a better idea for
> the time being to merely recommend rather than require that data be made OA
> as soon as possible, rather than risk resistance from authors who are happy
> to give away their journal articles but not their data?
> 
> [Keith Jeffery] you are right to raise this.  Different communities /
> domains of research have different practices with embargo periods on data to
> allow the project leader / team to have publication precedence.  So we have
> publishers wanting embargos for articles and communities wanting embargos
> for data (and probably also associated software which may raise issues
> concerning confidentiality / patenting).  The UK funding councils are
> pushing for the same conditions on data as on publications but the document
> is not yet finalised. One solution would be to make data available openly
> but to have agreements that any researcher working on the data other than
> the original project team should (a) notify of intent to publish (b) ideally
> co-publish with the original team  or (c) minimally cite the original team
> publication and dataset/software.  It is all a matter of research ethics.
> The present competitive research world does not encourage such ethics.
> Again the Finch committee output will be interesting.  The whole area of
> research data from publicly-funded research has been caught up with the open
> 'data.gov' (public service information, semantic web, linked open data)
> agenda.  While  the two certainly are related, I am not convinced the
> semantic web / LOD browsing over data to find the nearest hospital or local
> government office - or crime statistics in your neighbourhood or league
> table ratings of local schools -  is the same as managing terabytes (or
> more) of research data with specialised and complex software.
> 
> Best
> Keith
> 
> 
> _______________________________________________
> GOAL mailing list
> GOAL at eprints.org
> http://mailman.ecs.soton.ac.uk/mailman/listinfo/goal



