[GOAL] Re: Text mining
Peter Murray-Rust
pm286 at cam.ac.uk
Wed May 9 10:00:48 BST 2012
On Wed, May 9, 2012 at 8:50 AM, Richard Poynder
<ricky at richardpoynder.co.uk> wrote:
> Members of the list might be interested in this article on text mining in
> The Chronicle of Higher Education:
>
> http://chronicle.com/article/Hot-Type-Elsevier-Experiments/131789/?key=TDl1JFRhMyxLNHk3NmsQZG1danFtNxgmZHEQPi0pblFRFQ%3D%3D
>
> They might also be interested in some comments on the topic made by Ann
> Okerson on Liblicense:
>
> http://listserv.crl.edu/wa.exe?A2=LIBLICENSE-L;b9df11a3.1205
>
> But should it really be necessary for librarians to act as mediators when
> researchers want to undertake a text mining project?
>
I will reply to this as someone who knows about text-mining and has also
asked publishers to provide clear statements of what is and what is not
allowed. (Summary: very little is clear and almost nothing is allowed
without "negotiation".) I assume that this is in scope for the list. Please
note that this discussion has no relevance to Green Open Access as it
relates to paid subscriptions to the literature.
For information the Liblicense discussion included:
"This brought to mind the efforts of various of us (librarians) over
the last few years to have data or text-mining language inserted into
standard library-publisher contracts, pretty much without success.
However, several publishers (including Elsevier) did tell me that,
while not able to insert such clauses, they'd be glad to work with
campus researchers on a trial basis, thus developing a better
understanding of just what such projects entail and in order to be
prepared for requests in the future. Not having researchers to bring
to the table, we got no further. One of my thoughts as I read today's
article is that a gap still exists between libraries, researchers, and
publishers -- we should have been able to work out such forays before
2012!"
PMR: I am glad to see the librarians tried, although they have had no
impact. As they are the actual purchasers they are the ones with the
immediate power.
The "experiments" they mention include UBC and Heather Piwowar (a prominent
postdoc researcher) coming to an agreement that she could carry out
content-mining under terms set by Elsevier and mutually agreed during the
phone "negotiation".
I and many others have several problems with this.
* it legitimizes the "ownership" of the content with the publishers.
* it requires the publishers to agree to all content-mining research. This
is not unique to Elsevier - at least 4 other publishers took similar (but
even woollier) positions.
* it does not scale. If there are 2000 research universities and 100
publishers then up to 200,000 separate negotiations would be needed
* it is unnecessary. It would be possible to agree a protocol for
content-mining on a per-publisher or per-university basis or even (as we
are doing in the creation of a manifesto) on a worldwide basis.
* some publishers have started to suggest additional charges for
content-mining. This must be absolutely resisted.
If the publishers withdrew the additional clauses they have inserted then
the only barriers to content-mining would be technical (server capacity,
etc.). These are not arduous, just standard information/web engineering. This
shouldn't be beyond an industry (academia) which spends 10 billion USD on
subscriptions and a lot on repositories.
I have made these views public (e.g. to the Chronicle and soon to other
organs) and offered opinions as to why publishers are so resistant to
content-mining. Of course some are enthusiastically pro - BMC, PLoS, eLife,
etc. Any imaginative publisher should be able to see that there are many
positive aspects to content-mining - more readership, more clicks, etc. But
my own survey suggests an industry which is doing very well with the
current model, and can protect it by doing nothing as academia is too
weak/fearful to challenge it (cf Michael Eisen's correct criticism of
Harvard).
What this does is to increase the tensions between the scholarly community
and publishers and marginalize the role of the universities in solving the
problem (which university has made a statement about content-mining?). If
the problem is to be solved, the solution will come from initiatives such
as those of the UK government, the EC and funders. I expect that in the UK
there will be enough authoritative public statements to effectively
legitimize CM outside of publisher control. At that stage a gold rush for
content-mining will develop.
In anticipation of this some of us are drafting a Manifesto for Open
Content Mining (RichardP and Peter Suber are involved) which we hope will
create something as simple and powerful as BOAI/BBB for content-mining.
--
Peter Murray-Rust
Reader in Molecular Informatics
Unilever Centre, Dep. Of Chemistry
University of Cambridge
CB2 1EW, UK
+44-1223-763069