<div dir="ltr"><div class="gmail_extra">On Fri, Aug 23, 2013 at 6:58 AM, Burns, Christopher S wrote:<br><div class="gmail_quote"><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">
Although a harvester would be very nice, sampling theory and some manual<br>
work does the trick too... [in <span style="font-family:arial,sans-serif;font-size:13px">my dissertation] </span>I took the sample in May 2010 and collected </blockquote><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">
bibliometric and other relevant data from Google Scholar in July 2010, July 2011, </blockquote><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">
and July 2012.<br></blockquote><div><br></div><div style>Yes, hand-sampling can and does provide valuable information. </div><div style><br></div><div style>But, as I said, for systematic ongoing monitoring of the global time-course of OA growth across institutions, disciplines and nations, hand-sampling is excruciatingly difficult and time-consuming, holding research that could greatly benefit the worldwide research community (as well as Google and Google Scholar) to a scale and pace that is more suitable for a doctoral dissertation.</div>
<div style><br></div><div style>Historically speaking, if a few projects designed to monitor the ongoing global growth and distribution of OA were allowed to do machine data-mining in Google space, the growth rate of OA would be dramatically accelerated (and thereby also the size and functionality of Google Scholar space).</div>
<div style><br></div><div style>Otherwise, efforts to enrich Google Scholar space are relegated to the same fate as attempts to enrich vendors, spammers, napsters or phishermen.</div><div style><br></div><div style>Stevan Harnad</div>
<div style><br></div><div style><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">
<br>
Key findings include:<br>
<br>
Of the 995 bibliographic references in my sample, 691 referred to<br>
journal articles.<br>
<br>
In 2012, 662 of those references were valid references and Google<br>
Scholar was able to locate them.<br>
<br>
Of those 662 references, 381 (57.55%) were retrievable in full text from<br>
Google Scholar in 2012, up from 345 (of the 648 valid, Google<br>
Scholar-locatable references) in 2010. These were retrievable without<br>
the benefit of a university library's proxy.<br>
<br>
The sources providing access were varied, though not numerous by category;<br>
universities (including institutional repositories) were the<br>
most common source of full-text access via Google Scholar. In 2012, 145<br>
(63.32%) universities provided access to 199 (52.09%) of the documents.<br>
<br>
One more bit:<br>
<br>
For the 2012 data, of those articles that were available full text via<br>
Google Scholar, the median citation count was 49. For those articles<br>
that were not available full text via Google Scholar, the median<br>
citation count was 20.<br>
<br>
I'm no longer collecting data in the same way I did for 2010, 2011, and<br>
2012. Instead, I'm preparing to sample from multiple sources in<br>
addition to CiteULike, in order to obtain even more<br>
credible results.<br>
<br>
Sean Burns<br>
<br>
--<br>
C. Sean Burns | Assistant Professor<br>
School of Library and Information Science<br>
University of Kentucky<br>
327 Little Library Building | Lexington, KY 40506-0224<br>
Phone +1 859-218-2296 | Fax +1 859-257-4205<br>
<a href="https://ci.uky.edu/lis/" target="_blank">https://ci.uky.edu/lis/</a><br>
<a href="https://sweb.uky.edu/~csbu225" target="_blank">https://sweb.uky.edu/~csbu225</a><br>
<br>
<br>
> Administrative info for SIGMETRICS (for example unsubscribe):<br>
> <a href="http://web.utk.edu/~gwhitney/sigmetrics.html" target="_blank">http://web.utk.edu/~gwhitney/sigmetrics.html</a><br>
> This is a response to a query regarding Eric Archambault's report on<br>
> OA Growth by Adam G Dunn in Science Insider: "I find it difficult to<br>
> believe that the authors of the study managed to create a harvester<br>
> that could identify and verify the pdfs linked to by Google Scholar<br>
> when Google Scholar actively blocks IP addresses when they identify<br>
> crawling."<br>
><br>
> Our own "harvester" attempts to gather the all-important data on OA<br>
> growth were blocked by Google.<br>
><br>
> It is completely understandable and justifiable that Google shields<br>
> its increasingly vital global database and search mechanisms from the<br>
> countless and incessant worldwide attempts at exploitation by<br>
> commercial interests, spammers, and malware that could bring Google to<br>
> its knees if not rigorously and relentlessly blocked.<br>
><br>
> But in the very special (and tiny) case of scientific research<br>
> articles it would be a great help not only to the worldwide research<br>
> community but also to Google (and Google Scholar) itself if Google granted<br>
> special individual exemptions for important international studies like<br>
> Eric Archambault's, which was commissioned by the European Union to<br>
> monitor the global growth rate of open access to research.<br>
><br>
> Google and Google Scholar would become all the richer as research<br>
> databases if data like Eric's (and our own) were not made so<br>
> excruciatingly difficult and time-consuming to gather by Google's<br>
> blanket blockage of automated data-mining.<br>
><br>
><br>
> (We do not trawl books, so Google's agreements with publishers are not<br>
> violated or at issue in any way. We just want to trawl for articles<br>
> whose metadata match the metadata from Web of Science or SCOPUS<br>
> and have been made freely accessible on the web; nor do we want their<br>
> full-texts: just to check whether they are there!)<br>
><br>
> Stevan Harnad<br>
><br>
><br>
<br>
</blockquote></div><br></div></div>