[GOAL] Re: New Year's challenge for repository developers and managers: awesome cross-search
Wilhelmina Randtke
randtke at gmail.com
Mon Jan 7 16:41:43 GMT 2013
Something to consider with open access is open access to the search, not
just the content. You can have an open field, like government information,
where a paid search, like Lexis, becomes necessary: even though the
content is free of restrictions, someone must pay to access the search and
so effectively must pay to access free content. With something like a
federated search that requires content providers to pay to have their
material indexed (this is what Digital Commons Network is doing), you have
a different issue. If a search like that catches on, then the institutions
with repositories must pay to have their content included or else be
invisible. That may be just as expensive, especially given how scarce good
technology skills are.
Google Scholar chooses what to index partly by when they get around to it,
partly arbitrarily, and without transparency. I don't necessarily like that.
However, I vastly prefer arbitrary inclusion criteria for the Google
Scholar search to the expensive paid inclusion in something like the
Digital Commons Network. There, each archive with included content has
shelled out upwards of 15K per year to buy the Digital Commons platform,
and that's ballpark pricing for a tiny institution with minimal content.
Most pay much more. There is some administrative reason for not including
content on other platforms in that the management at Digital Commons always
has access to an up-to-date list of repositories on that platform, but
doesn't have an already maintained in-house list of other repositories.
Significantly, there is no clear technological reason for not including
content from other platforms. There are two types of metadata in play in a
Digital Commons repository: Dublin Core, and some Digital Commons-specific
metadata which seems to be for organizing serials (Dublin Core doesn't account
for serials). The Dublin Core metadata is fairly easy to harvest from a
repository. To pull Dublin Core records from a Digital Commons
site, you go to the OAI-PMH feed at (base URL)/do/oai. For example,
http://lib.dr.iastate.edu/do/oai/ lets you query and get the indexing info.
Then to get full text, you fetch any attached file and extract the full text
from it. Using that, I could build a clunky cross-repository search as a
weekend project. Every repository platform will let you do this: get
indexing information and see any attached (publicly available) files.
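
The harvesting step described above might look roughly like this in Python,
using only the standard library. This is a sketch, not anything Digital
Commons publishes: the function names (list_records_url, parse_records,
harvest) are made up for illustration, while verb=ListRecords,
metadataPrefix=oai_dc and resumptionToken come from the OAI-PMH protocol.
Full-text extraction from attached files is left out.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

# XML namespaces defined by OAI-PMH 2.0 and Dublin Core.
OAI_NS = "http://www.openarchives.org/OAI/2.0/"
DC_NS = "http://purl.org/dc/elements/1.1/"


def list_records_url(base_url, resumption_token=None):
    """Build a ListRecords request URL for an OAI-PMH base URL."""
    if resumption_token:
        # Follow-up pages are requested with the token alone.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": "oai_dc"}
    return base_url + "?" + urllib.parse.urlencode(params)


def parse_records(xml_text):
    """Return (records, resumption_token) from one ListRecords response."""
    root = ET.fromstring(xml_text)
    records = []
    for rec in root.iter("{%s}record" % OAI_NS):
        oai_id = rec.findtext("{%s}header/{%s}identifier" % (OAI_NS, OAI_NS))
        fields = {}
        for el in rec.iter():
            # Keep only Dublin Core elements (dc:title, dc:identifier, ...).
            if el.tag.startswith("{%s}" % DC_NS) and el.text:
                name = el.tag.split("}", 1)[1]
                fields.setdefault(name, []).append(el.text.strip())
        records.append({"oai_id": oai_id, "dc": fields})
    token = root.findtext(".//{%s}resumptionToken" % OAI_NS)
    return records, (token.strip() if token and token.strip() else None)


def fetch(url):
    """Download one response body (bytes) over HTTP."""
    with urllib.request.urlopen(url) as resp:
        return resp.read()


def harvest(base_url, fetch=fetch):
    """Yield every record from a repository, following resumption tokens."""
    token = None
    while True:
        records, token = parse_records(fetch(list_records_url(base_url, token)))
        yield from records
        if not token:
            break
```

Calling harvest("http://lib.dr.iastate.edu/do/oai/") would then iterate over
the Dublin Core records; any dc:identifier values are a reasonable place to
look for links to attached files, though that depends on how a given
repository fills in its metadata.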
I would be much more comfortable with something like Digital Commons
Network if they pulled records from repositories on other platforms, for
example, by looking at http://www.openarchives.org/Register/BrowseSites and
harvesting those records. They could have done that, but didn't.
That makes institutions invisible unless the institution pays up.
-Wilhelmina Randtke
On Fri, Jan 4, 2013 at 4:03 PM, Gerritsma, Wouter
<Wouter.Gerritsma at wur.nl> wrote:
> Hi Stevan,
>
> Google Scholar is a very good full-text scholarly search engine, no doubt
> about it. But it doesn’t find all the full text available on the web, albeit it
> does a good job.
>
> Take e.g. one of my articles
> http://scholar.google.com/scholar?cluster=17014920805021872143&hl=en&as_sdt=0,5
> GS found two PDF versions but not the one in our university's repository.
> That is still not fully indexed. Although it gets close
> http://library.wur.nl/WebQuery/wurpubs/lang/380005 it found our metadata
> record, but not the full text.
>
> I guess this is still the case with many repositories. Earlier this year
> it was even reported in the literature:
>
> Arlitsch, K. & P.S. O'Brien (2012). Invisible institutional repositories:
> addressing the low indexing ratios of IRs in Google. Library Hi Tech,
> 30(1): 60-81 http://dx.doi.org/10.1108/07378831211213210
>
> So Google Scholar is still not the cure-all for all OA available in the
> world. Interestingly our repository is better indexed in the standard
> Google search engine than in the Scholar version.
>
> So my point is, doing a search on GS, and finding a lot of hits still
> doesn’t guarantee finding all the full text of those papers.
>
> All the best Wouter
>
> *From:* goal-bounces at eprints.org [mailto:goal-bounces at eprints.org] *On
> Behalf Of *Stevan Harnad
> *Sent:* Thursday, 3 January 2013 2:09
>
> *To:* Global Open Access List (Successor of AmSci)
> *Cc:* SPARC Open Access Forum; scholcomm at ala.org T.F.; LibLicense-L
> Discussion Forum
> *Subject:* [GOAL] Re: New Year's challenge for repository developers and
> managers: awesome cross-search
>
> CHEER-LEADING, CHALLENGES AND REALITY
>
> What is missing and needed is not "awesome repositories cross-search
> tools."
>
> What is missing and needed is OA repository deposits, and OA deposit
> mandates.
>
> The repositories are mostly empty.
>
> And Google Scholar finds what OA content there is -- wherever it is on the
> web -- incomparably better than "awesome repositories cross-search tools."
>
> Here is just a sample vanity search on a relatively uncommon name (try
> your own):
>
> *Awesome repositories cross-search tool:* Harnad 140 hits <http://network.bepress.com/explore/?q=Harnad>
>
> *Google Scholar:* Harnad 15,900 hits <http://scholar.google.ca/scholar?q=Harnad&btnG=&hl=en&as_sdt=0%2C5>
> (author:Harnad: 1,010 hits <http://scholar.google.ca/scholar?q=author%3AHarnad&btnG=&hl=en&as_sdt=0%2C5>)