<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<HTML><HEAD>
<META content="text/html; charset=us-ascii" http-equiv=Content-Type>
<META name=GENERATOR content="MSHTML 9.00.8112.16457"></HEAD>
<BODY>
<DIV dir=ltr align=left><SPAN class=371551112-05012013><FONT color=#0000ff
size=2 face=Arial>It's my understanding that Google (and Google Scholar) find
published articles because the publishers enable crawling - whether the content
is freely available or not (if I'm oversimplifying, someone will no
doubt set me right). Are repository managers unintentionally blocking
this?</FONT></SPAN></DIV>
<DIV dir=ltr align=left><SPAN class=371551112-05012013><FONT color=#0000ff
size=2 face=Arial></FONT></SPAN> </DIV>
<DIV dir=ltr align=left><SPAN class=371551112-05012013><FONT color=#0000ff
size=2 face=Arial>Sally</FONT></SPAN></DIV>
<DIV> </DIV>
<DIV align=left><FONT size=2 face=Arial>Sally Morris</FONT></DIV>
<DIV align=left><FONT size=2 face=Arial>South House, The Street, Clapham,
Worthing, West Sussex, UK BN13 3UU</FONT></DIV>
<DIV align=left><FONT size=2 face=Arial>Tel: +44 (0)1903
871286</FONT></DIV>
<DIV align=left><FONT size=2 face=Arial>Email:
sally@morris-assocs.demon.co.uk</FONT></DIV>
<DIV> </DIV><BR>
<DIV dir=ltr lang=en-us class=OutlookMessageHeader align=left>
<HR tabIndex=-1>
<FONT size=2 face=Tahoma><B>From:</B> goal-bounces@eprints.org
[mailto:goal-bounces@eprints.org] <B>On Behalf Of </B>Stevan
Harnad<BR><B>Sent:</B> 04 January 2013 23:19<BR><B>To:</B> Global Open Access
List (Successor of AmSci)<BR><B>Subject:</B> [GOAL] Searching for OA vs.
Providing OA<BR></FONT><BR></DIV>
<DIV></DIV>On Fri, Jan 4, 2013 at 5:03 PM, Gerritsma, Wouter <SPAN
dir=ltr>&lt;<A href="mailto:Wouter.Gerritsma@wur.nl"
target=_blank>Wouter.Gerritsma@wur.nl</A>&gt;</SPAN> wrote:<BR>
<DIV class=gmail_quote>
<DIV> </DIV>
<BLOCKQUOTE
style="BORDER-LEFT: #ccc 1px solid; MARGIN: 0px 0px 0px 0.8ex; PADDING-LEFT: 1ex"
class=gmail_quote>
<DIV lang=EN-GB vlink="purple" link="blue">
<DIV>
<P class=MsoNormal> <SPAN
style="FONT-FAMILY: Verdana,sans-serif; COLOR: rgb(31,73,125); FONT-SIZE: 10pt">Google
Scholar is a very good full-text scholarly search engine, no doubt about it.
But it doesn’t find all the full text available on the web, although it does a good
job.</SPAN></P>
<P class=MsoNormal><SPAN
style="FONT-FAMILY: 'Verdana','sans-serif'; COLOR: #1f497d; FONT-SIZE: 10pt">Take
e.g. one of my articles <A
href="http://scholar.google.com/scholar?cluster=17014920805021872143&hl=en&as_sdt=0,5"
target=_blank>http://scholar.google.com/scholar?cluster=17014920805021872143&hl=en&as_sdt=0,5</A>
GS found two PDF versions but not the one in our university’s repository.
That is still not fully indexed, although it gets close: at <A
href="http://library.wur.nl/WebQuery/wurpubs/lang/380005"
target=_blank>http://library.wur.nl/WebQuery/wurpubs/lang/380005</A> it found
our metadata record, but not the full text.<U></U><U></U></SPAN></P>
<P class=MsoNormal><SPAN
style="FONT-FAMILY: 'Verdana','sans-serif'; COLOR: #1f497d; FONT-SIZE: 10pt">I
guess this is still the case with many repositories. Earlier this year it was
even reported in the literature:<U></U><U></U></SPAN></P>
<P class=MsoNormal><SPAN
style="FONT-FAMILY: 'Verdana','sans-serif'; COLOR: #1f497d; FONT-SIZE: 10pt"><U></U><U></U></SPAN> </P>
<P class=MsoNormal><SPAN
style="FONT-FAMILY: 'Verdana','sans-serif'; COLOR: #1f497d; FONT-SIZE: 10pt"
lang=NL>Arlitsch, K. & P.S. O'Brien (2012). </SPAN><SPAN
style="FONT-FAMILY: 'Verdana','sans-serif'; COLOR: #1f497d; FONT-SIZE: 10pt">Invisible
institutional repositories: addressing the low indexing ratios of IRs in
Google. Library Hi Tech, 30(1): 60-81 <A
href="http://dx.doi.org/10.1108/07378831211213210"
target=_blank>http://dx.doi.org/10.1108/07378831211213210</A><U></U><U></U></SPAN></P>
<P class=MsoNormal><SPAN
style="FONT-FAMILY: 'Verdana','sans-serif'; COLOR: #1f497d; FONT-SIZE: 10pt"><U></U><U></U></SPAN> </P>
<P class=MsoNormal><SPAN
style="FONT-FAMILY: 'Verdana','sans-serif'; COLOR: #1f497d; FONT-SIZE: 10pt">So
Google Scholar is still not the cure-all for all the OA content available in the world.
Interestingly, our repository is better indexed in the standard Google search
engine than in the Scholar version.<U></U><U></U></SPAN></P>
<P class=MsoNormal><SPAN
style="FONT-FAMILY: 'Verdana','sans-serif'; COLOR: #1f497d; FONT-SIZE: 10pt"><U></U><U></U></SPAN> </P>
<P class=MsoNormal><SPAN
style="FONT-FAMILY: 'Verdana','sans-serif'; COLOR: #1f497d; FONT-SIZE: 10pt">So
my point is that doing a search on GS and finding a lot of hits still doesn’t
guarantee finding the full text of all those papers.
<U></U><U></U></SPAN></P></DIV></DIV></BLOCKQUOTE>
<DIV><BR></DIV>Google Scholar does not find *all* the full-text papers freely
accessible online, but it finds most of them -- and incomparably more of them
than any other search engine.
<DIV><BR></DIV>
<DIV>Yes, Google Scholar coverage of institutional repositories can and will
improve. But it won't make much difference as long as most institutional
repositories are un-mandated, and hence near empty.</DIV>
<DIV><BR></DIV>
<DIV>To repeat, the problem is that Green OA self-archiving needs to be mandated
-- by institutions and funders -- worldwide. Till it is, we are mostly spinning
wheels.</DIV>
<DIV><BR></DIV>
<DIV>And if and when OA is closer to 90% than to 10%, not only will Google
Scholar's developers dramatically upgrade Google Scholar's power, but the developers
of many other OA-specific search engines will upgrade theirs too.</DIV>
<DIV><BR></DIV>
<DIV>Till then, however, it's hardly worth their while.</DIV>
<DIV><BR></DIV>
<DIV>Stevan Harnad</DIV>
<DIV><BR></DIV>
<DIV>PS: A side-bet (that I've made before): Once OA full-text is reliably near
100%, intelligent text-mining software-based tagging will outperform any
prefabricated, author-generated, librarian-generated or crowdsourced
tagging scheme for search and discovery. (But there's no motivation at all to
develop such future wonders on the impoverished corpus we have now...)</DIV>
<DIV> </DIV>
<BLOCKQUOTE
style="BORDER-LEFT: #ccc 1px solid; MARGIN: 0px 0px 0px 0.8ex; PADDING-LEFT: 1ex"
class=gmail_quote>
<DIV lang=EN-GB vlink="purple" link="blue">
<DIV>
<P class=MsoNormal><B><SPAN
style="FONT-FAMILY: 'Tahoma','sans-serif'; FONT-SIZE: 10pt"
lang=EN-US>From:</SPAN></B><SPAN
style="FONT-FAMILY: 'Tahoma','sans-serif'; FONT-SIZE: 10pt" lang=EN-US> <A
href="mailto:goal-bounces@eprints.org"
target=_blank>goal-bounces@eprints.org</A> [mailto:<A
href="mailto:goal-bounces@eprints.org"
target=_blank>goal-bounces@eprints.org</A>] <B>On Behalf Of </B>Stevan
Harnad<BR><B>Sent:</B> donderdag 3 januari 2013 2:09<BR><B>To:</B> Global Open
Access List (Successor of AmSci)<BR><B>Cc:</B> SPARC Open Access Forum; <A
href="mailto:scholcomm@ala.org" target=_blank>scholcomm@ala.org</A> T.F.;
LibLicense-L Discussion Forum<BR><B>Subject:</B> [GOAL] Re: New Year's
challenge for repository developers and managers: awesome
cross-search<U></U><U></U></SPAN></P>
<P class=MsoNormal><U></U><U></U> </P>
<DIV>
<DIV>
<P class=MsoNormal>CHEER-LEADING, CHALLENGES AND
REALITY<U></U><U></U></P></DIV>
<DIV>
<P class=MsoNormal><U></U><U></U> </P></DIV>
<DIV>
<P class=MsoNormal>What is missing and needed is not "awesome repositories
cross-search tools." <U></U><U></U></P></DIV>
<DIV>
<P class=MsoNormal><U></U><U></U> </P></DIV>
<DIV>
<P class=MsoNormal>What is missing and needed is OA repository deposits, and
OA deposit mandates. <U></U><U></U></P></DIV>
<DIV>
<P class=MsoNormal><U></U><U></U> </P></DIV>
<DIV>
<P class=MsoNormal>The repositories are mostly
empty. <U></U><U></U></P></DIV>
<DIV>
<P class=MsoNormal><U></U><U></U> </P></DIV>
<DIV>
<P class=MsoNormal>And Google Scholar finds what OA content there is --
wherever it is on the web -- incomparably better than "awesome
repositories cross-search tools."<U></U><U></U></P></DIV>
<DIV>
<P class=MsoNormal><U></U><U></U> </P></DIV>
<DIV>
<P class=MsoNormal>Here is just a sample vanity search on a relatively
uncommon name (try your own):<U></U><U></U></P></DIV>
<DIV>
<P class=MsoNormal><U></U><U></U> </P></DIV>
<DIV>
<P class=MsoNormal><B>Awesome repositories cross-search
tool:</B> Harnad <A
href="http://network.bepress.com/explore/?q=Harnad" target=_blank>140
hits</A><U></U><U></U></P></DIV>
<DIV>
<P class=MsoNormal><B>Google Scholar:</B> Harnad <A
href="http://scholar.google.ca/scholar?q=Harnad&btnG=&hl=en&as_sdt=0%2C5"
target=_blank>15,900 hits</A> (author:Harnad: <A
href="http://scholar.google.ca/scholar?q=author%3AHarnad&btnG=&hl=en&as_sdt=0%2C5"
target=_blank>1,010</A> hits)<U></U><U></U></P></DIV>
<DIV>
<P
class=MsoNormal><U></U><U></U> </P></DIV></DIV></DIV></DIV><BR>_______________________________________________<BR>GOAL
mailing list<BR><A href="mailto:GOAL@eprints.org">GOAL@eprints.org</A><BR><A
href="http://mailman.ecs.soton.ac.uk/mailman/listinfo/goal"
target=_blank>http://mailman.ecs.soton.ac.uk/mailman/listinfo/goal</A><BR><BR></BLOCKQUOTE></DIV><BR></BODY></HTML>