[GOAL] Re: OASPA's ironic demonstration of the inadequacy of CC-BY for data mining
Heather Morrison
hgmorris at sfu.ca
Tue Mar 12 17:31:29 GMT 2013
OASPA has now released their dataset - thanks OASPA! The data is in a spreadsheet with no license on it. That's just fine - none is needed!
If there is a license for data, CC-BY is not optimal, as the attribution element becomes problematic when combining more than a couple of datasets. Data mining experts are calling for something more like public domain.
Another issue is the format. I'm glad that OASPA has released the data, but, like me, they have done so with an Excel spreadsheet. This is a proprietary format, and I'm not sure if it's optimal for data processing.
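To illustrate the format point, here is a minimal sketch, assuming the data were published as plain CSV. It uses only the Python standard library. The column names (year, articles) and the values are hypothetical placeholders, not OASPA's figures.

```python
import csv
import io

# Hypothetical sample: column names and values are placeholders,
# not taken from the OASPA spreadsheet.
SAMPLE_CSV = """year,articles
2001,10
2002,25
2003,40
"""

def load_rows(text):
    """Parse CSV text into a list of (year, count) integer pairs."""
    reader = csv.DictReader(io.StringIO(text))
    return [(int(row["year"]), int(row["articles"])) for row in reader]

def yearly_growth(rows):
    """Return (year, increase over the previous year) for each year after the first."""
    return [(y2, c2 - c1) for (y1, c1), (y2, c2) in zip(rows, rows[1:])]

rows = load_rows(SAMPLE_CSV)
print(yearly_growth(rows))
```

Reading an Excel file the same way would normally need a third-party library, which is one practical reason CSV may be easier for others to process.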
Surely a forward-thinking publishers' association would be ahead of the curve with respect to reusable formats?
Heather
On 2013-03-12, at 10:06 AM, David Prosser wrote:
> This is a slightly odd argument. I don't think that anybody has ever claimed that a CC-BY license is all that you need to data mine. Obviously, if the data are not in a format that can be mined then the license is almost irrelevant. The claim by CC-BY supporters is that it is the optimal license for data mining research papers - i.e., of all the licenses it (or its equivalent) is the one that allows the greatest freedom by the miners.
>
> (An additional complication is, of course, that we are really talking about data here rather than papers, and so perhaps a database license would be even more appropriate.)
>
> David
>
>
>
> On 12 Mar 2013, at 16:38, Heather Morrison wrote:
>
>> The Open Access Scholarly Publishers Association (OASPA)'s chart illustrating the growth of the CC-BY license ironically demonstrates the inadequacy of the license for data mining. This chart is posted in image format on a CC-BY licensed blog. The data per se has not been posted for download, and there is no explanation of the method of data capture. One could copy out the data points manually, with some estimation, for manipulation. However, this blog post illustrates very well that a work can be CC-BY licensed but virtually useless for data mining.
>>
>> I would contrast this with my similar data set for The Dramatic Growth of Open Access, which for years has been available under a CC-BY-NC-SA license. The full dataset is available for anyone, anywhere to download and manipulate. This practice is probably not optimal for several reasons. The most important from the perspective of data manipulation, I suspect, is that I use an Excel spreadsheet; CSV format would probably be more useful. I'd appreciate some advice on this; perhaps this will be an emerging role for librarians? Public domain for the data per se would make more sense. What's needed here is a way to manage granting credit to the dataset creator that doesn't impose restrictive terms on the data per se.
>>
>> Links:
>> OASPA ironically demonstrates limitations of CC-BY:
>> http://poeticeconomics.blogspot.ca/2013/03/open-access-scholarly-publishers.html
>>
>> OASPA growth in the use of the CC-BY license:
>> http://oaspa.org/growth-in-use-of-the-cc-by-license-2/
>>
>> Dramatic Growth of Open Access Series:
>> http://poeticeconomics.blogspot.ca/2006/08/dramatic-growth-of-open-access-series.html
>>
>> Where to download full data:
>> http://summit.sfu.ca/item/10990
>>
>> best,
>>
>> Heather G. Morrison, PhD
>> The Imaginary Journal of Poetic Economics
>> http://poeticeconomics.blogspot.com
>> _______________________________________________
>> GOAL mailing list
>> GOAL at eprints.org
>> http://mailman.ecs.soton.ac.uk/mailman/listinfo/goal