Something to consider with open access is open access to the search, not
just the content. You can have an open body of content, like government
information, where a paid search service such as Lexis becomes necessary:
even though the content is free of restrictions, someone must pay to
access the search, and so effectively must pay to access free content. With
something like a federated search that requires content providers to
pay to have their material indexed (this is what the Digital Commons Network
is doing), you have a different issue. If a search like that catches
on, then institutions with repositories must pay to have their
content included or else remain invisible. Paying for inclusion may be just as expensive as paying for search,
especially given how scarce good technology skills are.<br><br>Google Scholar chooses what to index partly based on when it gets around to it, partly arbitrarily, and without transparency. I don't necessarily like that.<br>
<br>However, I vastly prefer arbitrary inclusion criteria for the Google Scholar search to expensive paid inclusion in something like the Digital Commons Network. There, each archive with included content has shelled out upwards of 15K per year to buy the Digital Commons platform, and that's ballpark pricing for a tiny institution with minimal content. Most pay much more. There is an administrative reason for not including content on other platforms: the management at Digital Commons always has an up-to-date list of repositories on its own platform, but doesn't maintain an in-house list of other repositories. Significantly, there is no clear technological reason for not including content from other platforms. Two types of metadata are in play in a Digital Commons repository: Dublin Core, and some Digital Commons-specific metadata that seems to organize serials (Dublin Core doesn't account for serials). The Dublin Core metadata is fairly easy to harvest from a repository. To pull Dublin Core records from a Digital Commons site, you go to the OAI-PMH feed at (base URL)/do/oai; for example, <a href="http://lib.dr.iastate.edu/do/oai/">http://lib.dr.iastate.edu/do/oai/</a> lets you query it and get the indexing information. Then, to get the full text, you grab any attached file and extract the text from it. Using that, I could build a clunky cross-repository search as a weekend project. Every repository platform will let you do this: get the indexing information and see any attached (publicly available) files.<br>
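<br>To make the weekend-project claim concrete, here is a minimal Python sketch, assuming only the standard OAI-PMH ListRecords verb with the oai_dc metadata prefix. It builds request URLs, parses Dublin Core titles and identifiers, follows resumption tokens, and puts everything into one crude inverted index keyed by (base URL, OAI identifier), regardless of which platform each repository runs on. The function names are hypothetical, fetching is passed in as a function so no particular HTTP library is assumed, and the full-text step (downloading the files linked from dc:identifier and extracting their text) is left out, so only titles are searched.<br>

```python
import re
import urllib.parse
import xml.etree.ElementTree as ET
from collections import defaultdict

# Standard XML namespaces in OAI-PMH responses carrying oai_dc metadata.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def list_records_url(base_url, resumption_token=None):
    """Build an OAI-PMH ListRecords request for Dublin Core (oai_dc) records."""
    if resumption_token:
        # Follow-up pages carry only the verb and the resumption token.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": "oai_dc"}
    return base_url + "?" + urllib.parse.urlencode(params)


def parse_list_records(xml_text):
    """Return (records, resumption_token) for one ListRecords response page."""
    root = ET.fromstring(xml_text)
    records = []
    for rec in root.iterfind(".//oai:record", NS):
        dc = rec.find(".//oai_dc:dc", NS)
        if dc is None:  # deleted records have a header but no metadata
            continue
        records.append({
            "oai_id": rec.findtext("oai:header/oai:identifier", default="", namespaces=NS),
            "title": dc.findtext("dc:title", default="", namespaces=NS),
            # dc:identifier often links to the record page or attached file.
            "identifiers": [i.text or "" for i in dc.findall("dc:identifier", NS)],
        })
    # A missing or empty resumptionToken means this was the last page.
    token = root.findtext(".//oai:resumptionToken", default="", namespaces=NS).strip()
    return records, token or None


def harvest(base_url, fetch):
    """Yield every oai_dc record from one repository, page by page.

    `fetch` is any function mapping a URL to response text (hypothetical
    hook; e.g. a small wrapper around urllib.request.urlopen).
    """
    token = None
    while True:
        records, token = parse_list_records(fetch(list_records_url(base_url, token)))
        yield from records
        if token is None:
            break


def build_index(base_urls, fetch):
    """Harvest each repository and build a crude inverted index over titles."""
    index = defaultdict(set)
    for base_url in base_urls:
        for rec in harvest(base_url, fetch):
            for term in re.findall(r"\w+", rec["title"].lower()):
                index[term].add((base_url, rec["oai_id"]))
    return index


def search(index, query):
    """Return sorted (base_url, oai_id) pairs whose titles contain every query term."""
    terms = re.findall(r"\w+", query.lower())
    if not terms:
        return []
    return sorted(set.intersection(*(index.get(t, set()) for t in terms)))
```

Nothing here is specific to Digital Commons: any repository that exposes an OAI-PMH endpoint (for example the Iowa State feed above, used as a base URL) can go into the same `base_urls` list, which is the point about there being no technological barrier to including other platforms.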
<br>I would be much more comfortable with something like the Digital Commons Network if it pulled records from repositories on other platforms, for example by working through the registry at <a href="http://www.openarchives.org/Register/BrowseSites">http://www.openarchives.org/Register/BrowseSites</a> and harvesting those records. They could have done that, but didn't.<br>
<br>That makes institutions invisible unless they pay up.<br><br>-Wilhelmina Randtke<br><br><br><div class="gmail_quote">On Fri, Jan 4, 2013 at 4:03 PM, Gerritsma, Wouter <span dir="ltr">&lt;<a href="mailto:Wouter.Gerritsma@wur.nl" target="_blank">Wouter.Gerritsma@wur.nl</a>&gt;</span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div link="blue" vlink="purple" lang="EN-GB">
<div>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Verdana","sans-serif";color:#1f497d">Hi Stevan,<u></u><u></u></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Verdana","sans-serif";color:#1f497d"><u></u> <u></u></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Verdana","sans-serif";color:#1f497d">Google Scholar is a very good full-text scholarly search engine, no doubt about it. But although it does a good job, it doesn’t find all the full text available on the web.
<u></u><u></u></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Verdana","sans-serif";color:#1f497d">Take, e.g., one of my articles:
<a href="http://scholar.google.com/scholar?cluster=17014920805021872143&hl=en&as_sdt=0,5" target="_blank">
http://scholar.google.com/scholar?cluster=17014920805021872143&hl=en&as_sdt=0,5</a>. GS found two PDF versions, but not the one in our university’s repository, which is still not fully indexed. It does get close: for
<a href="http://library.wur.nl/WebQuery/wurpubs/lang/380005" target="_blank">http://library.wur.nl/WebQuery/wurpubs/lang/380005</a> it found our metadata record, but not the full text.<u></u><u></u></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Verdana","sans-serif";color:#1f497d">I guess this is still the case with many repositories. It was even reported in the literature last year:<u></u><u></u></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Verdana","sans-serif";color:#1f497d"><u></u> <u></u></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Verdana","sans-serif";color:#1f497d" lang="NL">Arlitsch, K. & P.S. O'Brien (2012).
</span><span style="font-size:10.0pt;font-family:"Verdana","sans-serif";color:#1f497d">Invisible institutional repositories: addressing the low indexing ratios of IRs in Google. Library Hi Tech, 30(1): 60-81 <a href="http://dx.doi.org/10.1108/07378831211213210" target="_blank">http://dx.doi.org/10.1108/07378831211213210</a><u></u><u></u></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Verdana","sans-serif";color:#1f497d"><u></u> <u></u></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Verdana","sans-serif";color:#1f497d">So Google Scholar is still not the cure-all for finding all the OA content in the world. Interestingly, our repository is better indexed in the standard Google search
engine than in Google Scholar.<u></u><u></u></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Verdana","sans-serif";color:#1f497d"><u></u> <u></u></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Verdana","sans-serif";color:#1f497d">So my point is: searching GS and finding a lot of hits still doesn’t guarantee that you will find the full text of all those papers.
<u></u><u></u></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Verdana","sans-serif";color:#1f497d"><u></u> <u></u></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Verdana","sans-serif";color:#1f497d">All the best, Wouter
<u></u><u></u></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Verdana","sans-serif";color:#1f497d"><u></u> <u></u></span></p>
<p class="MsoNormal"><b><span style="font-size:10.0pt;font-family:"Tahoma","sans-serif"" lang="EN-US">From:</span></b><span style="font-size:10.0pt;font-family:"Tahoma","sans-serif"" lang="EN-US"> <a href="mailto:goal-bounces@eprints.org" target="_blank">goal-bounces@eprints.org</a> [mailto:<a href="mailto:goal-bounces@eprints.org" target="_blank">goal-bounces@eprints.org</a>]
<b>On Behalf Of </b>Stevan Harnad<br>
<b>Sent:</b> Thursday, 3 January 2013 2:09</span></p><div class="im"><br>
<b>To:</b> Global Open Access List (Successor of AmSci)<br>
</div><b>Cc:</b> SPARC Open Access Forum; <a href="mailto:scholcomm@ala.org" target="_blank">scholcomm@ala.org</a> T.F.; LibLicense-L Discussion Forum<br>
<b>Subject:</b> [GOAL] Re: New Year's challenge for repository developers and managers: awesome cross-search<u></u><u></u><p></p><div class="im">
<p class="MsoNormal"><u></u> <u></u></p>
<div>
<div>
<p class="MsoNormal">CHEER-LEADING, CHALLENGES AND REALITY<u></u><u></u></p>
</div>
<div>
<p class="MsoNormal"><u></u> <u></u></p>
</div>
<div>
<p class="MsoNormal">What is missing and needed is not "awesome repositories cross-search tools." <u></u><u></u></p>
</div>
<div>
<p class="MsoNormal"><u></u> <u></u></p>
</div>
<div>
<p class="MsoNormal">What is missing and needed is OA repository deposits, and OA deposit mandates. <u></u><u></u></p>
</div>
<div>
<p class="MsoNormal"><u></u> <u></u></p>
</div>
<div>
<p class="MsoNormal">The repositories are mostly empty. <u></u><u></u></p>
</div>
<div>
<p class="MsoNormal"><u></u> <u></u></p>
</div>
<div>
<p class="MsoNormal">And Google Scholar finds what OA content there is -- wherever it is on the web -- incomparably better than "awesome repositories cross-search tools."<u></u><u></u></p>
</div>
<div>
<p class="MsoNormal"><u></u> <u></u></p>
</div>
<div>
<p class="MsoNormal">Here is just a sample vanity search on a relatively uncommon name (try your own):<u></u><u></u></p>
</div>
<div>
<p class="MsoNormal"><u></u> <u></u></p>
</div>
<div>
<p class="MsoNormal"><b>Awesome repositories cross-search tool:</b> Harnad <a href="http://network.bepress.com/explore/?q=Harnad" target="_blank">140 hits</a><u></u><u></u></p>
</div>
<div>
<p class="MsoNormal"><b>Google Scholar:</b> Harnad <a href="http://scholar.google.ca/scholar?q=Harnad&btnG=&hl=en&as_sdt=0%2C5" target="_blank">15,900 hits</a> (author:Harnad: <a href="http://scholar.google.ca/scholar?q=author%3AHarnad&btnG=&hl=en&as_sdt=0%2C5" target="_blank">1,010</a> hits)<u></u><u></u></p>
</div>
<div>
<p class="MsoNormal"><u></u> <u></u></p>
</div>
</div>
</div></div>
</div>
<br>_______________________________________________<br>
GOAL mailing list<br>
<a href="mailto:GOAL@eprints.org">GOAL@eprints.org</a><br>
<a href="http://mailman.ecs.soton.ac.uk/mailman/listinfo/goal" target="_blank">http://mailman.ecs.soton.ac.uk/mailman/listinfo/goal</a><br>
<br></blockquote></div><br>