[GOAL] Fwd: Re: Google Scholar discoverability of repository content
Stevan Harnad
harnad at ecs.soton.ac.uk
Fri Feb 17 12:37:11 GMT 2012
Important feedback from Tim Brody, one of the developers of EPrints:
Begin forwarded message:
> From: Tim Brody <tdb2 at ecs.soton.ac.uk>
> Date: February 17, 2012 6:33:22 AM EST
> To: eprints-tech at ecs.soton.ac.uk
> Cc: JISC-REPOSITORIES at JISCMAIL.AC.UK
> Subject: [EP-tech] Re: Google Scholar discoverability of repository content
> --------
>
> Hi All,
>
> Here is some specific advice for existing repository administrators from
> Google Scholar:
> http://roar.eprints.org/help/google_scholar.html
>
> As far as I'm aware no one is running EPrints 2 any more, so
> EPrints-based repositories are already (and have been for a long time)
> "best in class" for Google Scholar.
>
>
> Right, this paper ...
>
> Table 1 is irrelevant and misleading. Scholar links first to the
> publisher and, only if there is no publisher link, directly to the IR
> version. That's a policy decision on the part of Scholar and nothing to
> do with IRs.
>
> Table 2 gives us some useful data. The headline rate for EPrints is 88%
> (based on CalTech). Unfortunately the authors haven't provided an
> analysis of what happened to the missing records. I've done a quick
> random sample of CalTech and I suspect the missing records will consist
> of:
> 1) Non-OA/non-full-text records (I'm sure a query to the CalTech
> repository admin could supply the data).
> 2) A percentage of PDFs that Scholar won't be able to parse. CalTech
> contains some old (1950s), scanned PDFs from journals. Where the article
> doesn't start at the top of the page, Scholar will struggle to parse the
> title/authors/abstract and therefore won't be able to match it to its
> records, e.g. http://authors.library.caltech.edu/5815/
>
>
> The remainder of the paper describes the authors' process of fixing
> their own IR (based on CONTENTdm).
>
>
> The authors then wrongly conclude:
>
> "Despite GS’s endorsement of three software packages, the surveys
> conducted for this paper demonstrates that software is not a deciding
> factor for indexing ratio in GS. Each of the three recommended software
> packages showed good indexing ratios for some repositories and poor
> ratios for others."
>
> The authors looked at one instance of EPrints and, despite its being a
> relatively old version, found 88% of its records indexed in GS.
>
> It is unfortunate that this paper has suggested that IR software in
> general is poorly indexed in GS. On the contrary, some badly implemented
> IR software is poorly indexed in GS.
>
>
> All that said, the most critical factor for IR visibility is
> having (BOAI definition) open access content. Hiding content behind
> search forms, click-throughs and other things that emphasise the IR at
> the expense of the content will hurt your visibility.
>
> Lastly, Google will index your metadata-only records, whereas Google
> Scholar looks only for full texts. Your GS/Google ratio will therefore
> approximate the share of your records that have an attached open access
> PDF (.doc etc.).
>
>
> Sincerely,
> Tim Brody
> (EPrints Developer)
>
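The two ratios Tim mentions can be made concrete with a little arithmetic. What follows is only a sketch: the `Record` class, its field names and all the counts are made up for illustration and are not data from CalTech or from the paper. It computes an indexing ratio in the paper's sense (records found in an index divided by records sampled) and the full-text share that, on Tim's argument, a repository's GS/Google ratio should roughly track.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Record:
    """One repository record. Field names are hypothetical, chosen for this sketch."""
    record_id: int
    has_open_fulltext: bool  # True if an open access PDF (.doc etc.) is attached


def indexing_ratio(indexed: int, sampled: int) -> float:
    """Share of sampled records that were found in a given index."""
    if sampled <= 0:
        raise ValueError("sampled must be a positive count")
    if not 0 <= indexed <= sampled:
        raise ValueError("indexed must lie between 0 and sampled")
    return indexed / sampled


def fulltext_share(records: List[Record]) -> float:
    """Share of records that carry an open access full text."""
    if not records:
        raise ValueError("records must not be empty")
    return sum(r.has_open_fulltext for r in records) / len(records)


# Made-up sample: ten records, seven of them with an attached full text.
sample = [Record(record_id=i, has_open_fulltext=(i % 10 < 7)) for i in range(10)]

print(f"indexing ratio: {indexing_ratio(35, 40):.1%}")
print(f"full-text share: {fulltext_share(sample):.1%}")
```

If a repository's measured GS/Google ratio falls well below its full-text share, that gap is the part worth investigating (unparseable PDFs, content hidden behind click-throughs and so on).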
> On Wed, 2012-02-15 at 11:31 +0000, Stevan Harnad wrote:
>> Can we enhance the google-scholar discoverability of EPrints (and
>> DSpace) repositories?
>>
>> http://linksource.ebsco.com/linking.aspx?sid=google&auinit=K&aulast=Arlitsch&atitle=Invisible+Institutional+Repositories:+Addressing+the+Low+Indexing+Ratios+of+IRs+in+Google+Scholar&title=Library+Hi+Tech&volume=30&issue=1&date=2012&spage=4&issn=0737-8831
>>
>> Kenning Arlitsch, Patrick Shawn OBrien, (2012) "Invisible Institutional
>> Repositories: Addressing the Low Indexing Ratios of IRs in Google
>> Scholar", Library Hi Tech, Vol. 30 Iss: 1
>>
>> Purpose - Google Scholar has difficulty indexing the contents of
>> institutional repositories, and the authors hypothesize the reason is
>> that most repositories use Dublin Core, which cannot express
>> bibliographic citation information adequately for academic papers.
>> Google Scholar makes specific recommendations for repositories,
>> including the use of publishing industry metadata schemas over Dublin
>> Core. This paper tests a theory that transforming metadata schemas in
>> institutional repositories will lead to increased indexing by Google
>> Scholar.
>>
>> Design/methodology/approach - The authors conducted two surveys of
>> institutional and disciplinary repositories across the United States,
>> using different methodologies. They also conducted three pilot projects
>> that transformed the metadata of a subset of papers from USpace, the
>> University of Utah's institutional repository, and examined the results
>> of Google Scholar's explicit harvests.
>>
>> Findings - Repositories that use GS recommended metadata schemas and
>> express them in HTML meta tags experienced significantly higher indexing
>> ratios. The ease with which search engine crawlers can navigate a
>> repository also seems to affect indexing ratio. The second and third
>> metadata transformation pilot projects at Utah were successful,
>> ultimately achieving an indexing ratio of greater than 90%.
>>
>> Research limitations/implications - The second survey was limited to
>> forty titles from each of seven repositories, for a total of 280 titles.
>> A larger survey that covers more repositories may be useful.
>>
>> Practical implications - Institutional repositories are achieving
>> significant mass, and the rate of author citations from those
>> repositories may affect university rankings. Lack of visibility in
>> Google Scholar, however, will limit the ability of IRs to play a more
>> significant role in those citation rates.
>>
>> Originality/value - Little or no research has been published about
>> improving the indexing ratio of institutional repositories in Google
>> Scholar. The authors believe that they are the first to address the
>> possibility of transforming IR metadata to improve indexing ratios in
>> Google Scholar.
>> *** Options: http://mailman.ecs.soton.ac.uk/mailman/listinfo/eprints-tech
>> *** Archive: http://www.eprints.org/tech.php/
>> *** EPrints community wiki: http://wiki.eprints.org/
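
The "Findings" paragraph of the quoted abstract credits higher indexing ratios to expressing bibliographic metadata in HTML meta tags. As a rough illustration of what that looks like, here is a minimal sketch that turns a simple record into the `citation_*` meta tags described in Google Scholar's inclusion guidelines. The `scholar_meta_tags` function, the dictionary keys and the PDF URL are assumptions made for this example; this is not how EPrints, DSpace or CONTENTdm actually generate their pages. The bibliographic values are those of the paper cited above.

```python
from html import escape
from typing import Dict, List


def scholar_meta_tags(record: Dict) -> List[str]:
    """Render a record as a list of citation_* <meta> tags.

    The record keys (title, authors, date, ...) are hypothetical names
    used only in this sketch.
    """
    tags: List[str] = []

    def add(name: str, value) -> None:
        # Escape the value so quotes or angle brackets cannot break the HTML.
        tags.append(f'<meta name="{name}" content="{escape(str(value), quote=True)}">')

    add("citation_title", record["title"])
    # One citation_author tag per author, in order.
    for author in record.get("authors", []):
        add("citation_author", author)

    # Optional fields are emitted only when present in the record.
    optional = [
        ("date", "citation_publication_date"),
        ("journal", "citation_journal_title"),
        ("volume", "citation_volume"),
        ("issue", "citation_issue"),
        ("first_page", "citation_firstpage"),
        ("pdf_url", "citation_pdf_url"),
    ]
    for key, tag_name in optional:
        if key in record:
            add(tag_name, record[key])
    return tags


record = {
    "title": "Invisible Institutional Repositories: Addressing the Low "
             "Indexing Ratios of IRs in Google Scholar",
    "authors": ["Arlitsch, Kenning", "OBrien, Patrick Shawn"],
    "date": "2012",
    "journal": "Library Hi Tech",
    "volume": "30",
    "issue": "1",
    "first_page": "4",
    # Placeholder URL, not a real repository address.
    "pdf_url": "http://repository.example.edu/1234/1/paper.pdf",
}

for tag in scholar_meta_tags(record):
    print(tag)
```

The tags belong in the `<head>` of each record's abstract page, and `citation_pdf_url` only helps if the PDF it points to is openly reachable by the crawler, which is Tim's main point.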