[GOAL] Re: CC-BY: the wrong goal for open access, and neither necessary nor sufficient for data and text mining

Heather Morrison hgmorris at sfu.ca
Tue Oct 9 19:52:53 BST 2012


On 2012-10-09, at 10:47 AM, Ross Mounce wrote:

> 
> 1.      CC-BY is not necessary for data and text-mining.
> 
> In some sense true, it is not *strictly* necessary

Glad we agree on that! 


> - but it sure does alleviate concerns over being sued!
> Google can 'get away with it' because they don't need to document the in-between steps - transparency. Researchers and academics *do* need to be able to display reproducible literature mining techniques and thus will need to reproduce some published content (in my understanding) in order to show that their methods work as described. Thus there is an easily explainable difference between Google's needs (no need for transparency, just present the results of the mining analyses without republishing the analysed content), and the needs of academic research (reproducibility/transparency demonstrated by reproducing some annotated/analysed content AND results). I'm sure there are other reasons too but AFAIK CC-BY is 'best' for mining (well, CC0 would be better, but that's not realistic for OA)

Ross, just minutes ago you were proudly asserting that you and other researchers are knowingly using illegal methods for gaining access to research literature such as asking for PDFs over twitter. Which is it, Ross - do academics need to be accountable and transparent, or can they do what they like?

> 
> As you well know other licences like CC-BY-NC leave one uncomfortably open to legal action if one posts such material on say, an ad-supported blog.

Forcing CC-BY could well leave one open to legal action. Picture, for example, a research subject whose picture is used for advertising purposes without their permission, or a scholar whose work is used in this manner who actively disagrees with the ad (e.g. a researcher whose conclusions suggest that one should avoid a drug, and a pharma company that cherry-picks a bit of the article that appears to support use of the drug). 

Speaking of open and transparent methods, are researchers telling human research subjects that their contributions may be given away on a blanket basis for third parties to sell? Would a research ethics committee even approve such an approach? Without this permission, I would argue that CC-BY, where human subjects are involved, will frequently be in violation of research ethics. 

As part of my dissertation, I did some interviews with senior people in academic publishing. The results were very interesting, and in some cases I have quoted the respondent at some length. I can assure you that I did not ask permission from these people to give away rights to sell their words to others, and if I had wanted to do so, I would have needed to clear this with research ethics first. 


> I do not believe Open Access should prevent the sharing of materials on blogs and other popular places/uses and thus CC-BY is the 'safest' licence from the re-user POV. 
> 
> Digital content placed publicly on the internet needs *a* licence, and for OA research works, CC-BY looks like the best of those available to me. You are free to suggest an alternate licence and I think it would help your argument if you actually did, rather than just criticizing one option and seemingly providing no alternative.

I do not agree that licensing is necessarily needed, or always helpful. My own position is that articulating the commons (what it can or should mean) is a long-term project, and advocating for specific licenses shuts down the conversation prematurely. Of the CC licenses, I think CC-BY-NC-SA is the strongest license for open access, as it protects OA downstream. However, there may be good reasons for not allowing derivatives, and so I do not recommend insisting that everyone use any one particular license.
> 
>  
> 2.  CC-BY is not sufficient for data and text-mining. The Creative Commons licenses are designed as a means for creators to waive rights that they would otherwise have under copyright; they do not place any obligations on the Licensor. There is nothing to stop a creator from using a CC-BY license with a locked-down PDF with extra DRM designed to prevent data and text-mining.
> 
> 
> I also see the problem described here.

Thanks - interjecting for emphasis, I think we might be getting somewhere...


> But licencing and CC-BY has nothing to do with this problem! 
> 
> The problem described here, in my words is: obfuscation. This kind of thing is commonly encountered when publishers publish non-machine interpretable tables of data as *images* in academic works rather than copy-pasteable numbers or data as they should do.  It doesn't matter what the licence is, CC-BY or even All Rights Reserved(!) - it's very difficult to mine usable correct information out of such tables/content. As a further example, they could provide all the text as a 'screenshot' style image to further hamper mining efforts. Thus I'm afraid point 2 bears no relevance to Open Access & CC-BY.

Similarly, if the aim is to encourage publication of reusable tables, then demanding CC-BY is not helpful. You can publish images with the CC-BY license.

best,

Heather Morrison

> 
> 
> Ross
>  
> -- 
> -/-/-/-/-/-/-/-/-/-/-/-/-/-/-/-/-/-/-/-/-/-/-/-/-
> Ross Mounce
> PhD Student & Open Knowledge Foundation Panton Fellow
> Fossils, Phylogeny and Macroevolution Research Group
> University of Bath, 4 South Building, Lab 1.07
> http://about.me/rossmounce
> -/-/-/-/-/-/-/-/-/-/-/-/-/-/-/-/-/-/-/-/-/-/-/-/-
> _______________________________________________
> GOAL mailing list
> GOAL at eprints.org
> http://mailman.ecs.soton.ac.uk/mailman/listinfo/goal



