[GOAL] Re: CC-BY: the wrong goal for open access, and neither necessary nor sufficient for data and text mining
Dan Stowell
dan.stowell at eecs.qmul.ac.uk
Wed Oct 10 09:14:52 BST 2012
Hi all,
Some points re this discussion:
Helen wrote:
> 1. CC-BY is not necessary for data and text-mining. Internet search engines such as google and social media companies do extensive data and text mining, and they do not limit themselves to CC-BY material. This is true even in the EU, so is not prevented by the EU's support for copyright of data. To illustrate: if data and text-mining is not permissible without CC-BY, then Google must shut down, immediately.
This point is a bit weird. Firstly, just because Google is doing
something and getting away with it, doesn't mean a lone academic can be
confident of doing something similar and getting away with it. I was
always amazed by how brazenly YouTube set up its service *before* making
agreements with the major media companies, when I would have assumed
they would have been sued out of existence.
Secondly, some sort of licensing IS generally necessary for data and
text mining. Just because it's not CC doesn't mean it's not a licence.
For example Google Books reuses content, on the basis of explicit
agreements which were apparently made with deposit libraries and
publishers (I don't know the detail of that one). Facebook uses explicit
licensing that its users sign up to. Twitter does the same, and third
parties who mine Twitter any more than a tiny amount have to agree to
specific terms. Etc etc.
Some sort of enabling licence is clearly necessary, and of course for
data-mining we wish for a licence that "pre-approves" our actions so
that we don't have to conduct a million negotiations before we analyse
an aggregated dataset.
Ross wrote:
> WRT to your point 2 "CC-BY is not sufficient for data and text-mining" (nor
> is *any* applicable licence AFAIK - I know of no licence that asserts that
> digital material must be made available in a readily machine-interpretable
> form in the licence)
Actually the GPL is a very good example. It is for software, and the GPL
authors don't recommend it be used for texts, but it offers a
delightfully clear requirement that "the preferred form of the work for
making modifications" is made available. In the world of software, this
is the source code, but if applied to data it's clear that it would
militate against providing data tables as images.
When I first heard of CC licences I was surprised that they didn't use
some form of words like this. They don't seem to "care" whether
downstream users get the perfect original or a low-quality JPEG. Since
then, I've come to the view that this relatively slack aspect of CC
licences is very good for cultural works and so forth.
But for the purposes of academic data reuse, perhaps this lack of a
format requirement is the more pertinent part of Helen's criticism.
The Open Database Licence also appears to assert "that digital material
must be made available in a readily machine-interpretable form" (see the
"Keep open" part of the summary at
<http://opendatacommons.org/licenses/odbl/summary/>), though I'm less
familiar with that one.
Best
Dan
P.S. One very minor additional point - Ross wrote:
> practically the SA clause means that other content that doesn't
> have that *exact* licence (CC-BY-NC-SA) cannot be remixed with content
> under this licence
Be careful: the way you phrased it is not quite true. You can combine
CC-BY or CC-BY-NC content into a CC-BY-NC-SA work, for example. The
resulting work must be CC-BY-NC-SA in that case.
--
Dan Stowell
Postdoctoral Research Assistant
Centre for Digital Music
Queen Mary, University of London
Mile End Road, London E1 4NS
http://www.elec.qmul.ac.uk/digitalmusic/people/dans.htm
http://www.mcld.co.uk/