[BOAI] Unethical harvesters
Peter Suber
peters at earlham.edu
Wed Oct 28 16:59:51 GMT 2009
[Forwarding from Arthur Sale. --Peter Suber.]
I write to draw the list's attention to unethical
behaviour by a national harvester: the
Australian Research Online (ARO) gateway. This
gateway, operated by the National Library of
Australia, has rejected the OAI-PMH standard and
has announced a local variant. This sort of
behaviour by harvesters must be firmly stamped on
as soon as possible. International standards are
to be complied with, not modified for budgetary convenience.
Responsibility
It is the responsibility of any OAI-PMH harvester
such as ARO, ADT, ROAR, OpenDOAR, OAIster, etc., to
harvest correctly from all OAI-PMH-compliant
repositories that exist in the wild and which it
regards as its target group. Please examine that
sentence carefully: the responsibility is with a
gateway (which ARO is) to harvest from any
compliant OAI-PMH interface, and not to
misrepresent the data. The National Library fails on both counts.
Remember that international standards such as
OAI-PMH are designed to permit global interchange
of metadata. Any harvester that insists on some
individual or local restriction of the
international standard is irresponsible. I did
not expect this of the National Library of
Australia. So far it seems to be globally unique in this behaviour.
Why does it fail? In a nutshell, possible hubris
and probable laziness. As to hubris, the National
Library of Australia (NLA) has produced a set of
requirements for harvesting with which it expects
repositories to comply. Requiring each repository
to comply with its requirements, rather than the
NLA harvesting properly:
* multiplies the work as each Australian
repository has to adapt its interface or opt-out
(rather than the NLA doing the job properly once),
* introduces the chance of breaking an existing
harvesting arrangement if the repository changes its interface, and
* would be absolutely fatal to the whole global
enterprise if another harvester came up with incompatible requirements.
In the case of my university it would definitely
break our in-house one-on-one harvesting for
Government data reporting and would be likely to
have similar flow-on effects for our national PhD
thesis harvesting at the very least. If all
harvesters were to come up with idiosyncratic
requirements, the world would be in a real mess
and harvesting, not to mention search engines,
would be infeasible. Just imagine if Google were
to behave the same way in the html world! At most
these ARO requirements constitute a set of suggestions.
The probable laziness comes from programmers. It
is trivially easy to do a proper harvest from all
the repositories that exist in Australia (there
are not that many, and even fewer software packages). I
can think of at least two strategies, neither of
which would take more than an hour of a competent
programmer's time. ADT and the rest of the
world's OAI harvesters can do it; why can't the NLA?
Best Practice
I hesitated to write this section because some
will think it is important. It isn't. The main
issue is the one above. However, it is bound to
be raised by the NLA to justify their so-called
requirements. This is the argument that their
harvesting requirements are good practice. In
fact it is not difficult to mount a case that the
GNU EPrints scheme is better practice than the
ARO scheme. Consider these quotes from the Dublin
Core Initiative (the red is mine):
4.14. Identifier
Label: Resource Identifier
Element Description: An unambiguous reference to
the resource within a given context. Recommended
best practice is to identify the resource by
means of a string or number conforming to a
formal identification system. Examples of formal
identification systems include the Uniform
Resource Identifier (URI) (including the Uniform
Resource Locator (URL), the Digital Object
Identifier (DOI) and the International Standard Book Number (ISBN).
Guidelines for content creation:
This element can also be used for local
identifiers (e.g. ID numbers or call numbers)
assigned by the Creator of the resource to apply
to a particular item. It should not be used for
identification of the metadata record itself.
[Using Dublin Core - The Elements,
http://dublincore.org/documents/usageguide/elements.shtml]
3. Element Content and Controlled Vocabularies
Each Dublin Core element is optional and
repeatable, and there is no defined order of
elements. The ordering of multiple occurrences of
the same element (e.g., Creator) may have a
significance intended by the provider, but
ordering is not guaranteed to be preserved in every user environment.
[Using Dublin Core,
http://dublincore.org/documents/usageguide/]
The NLA requirements specify that the relevant
metadata must be in a dc:identifier field,
contrary to these guidelines. Further, ARO
requires that the first dc:identifier element be
the metadata identifier, despite clear indications that order does not matter.
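To illustrate the point about ordering, here is a minimal Python sketch of how a harvester might choose a record URL from an oai_dc record without relying on which dc:identifier comes first. It is not ARO's code, nor necessarily either of the strategies mentioned earlier. The sample record, the host repository.example.edu, the function names, and the "same host, not a file download" heuristic are all assumptions made for illustration; a real harvester would need per-repository tuning.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlsplit

# Namespace of unqualified Dublin Core elements inside oai_dc records.
DC_NS = "http://purl.org/dc/elements/1.1/"

# Hypothetical record: the host and all values are invented for illustration.
SAMPLE_RECORD = """\
<oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
           xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:title>An example thesis</dc:title>
  <dc:identifier>Smith, J. (2009) An example thesis. PhD thesis.</dc:identifier>
  <dc:identifier>http://repository.example.edu/123/1/thesis.pdf</dc:identifier>
  <dc:identifier>http://repository.example.edu/123/</dc:identifier>
</oai_dc:dc>
"""


def dc_identifiers(record_xml):
    """Return every non-empty dc:identifier value in the record."""
    root = ET.fromstring(record_xml)
    tag = "{%s}identifier" % DC_NS
    return [el.text.strip() for el in root.iter(tag)
            if el.text and el.text.strip()]


def looks_like_file(url):
    """Assumed heuristic: a final path segment containing '.' is a file."""
    last_segment = urlsplit(url).path.rstrip("/").rsplit("/", 1)[-1]
    return "." in last_segment


def pick_record_url(identifiers, repository_host):
    """Choose a record URL by inspecting every value, not by position."""
    on_host = [
        value for value in identifiers
        if urlsplit(value).scheme in ("http", "https")
        and urlsplit(value).netloc == repository_host
    ]
    pages = [url for url in on_host if not looks_like_file(url)]
    if pages:
        return pages[0]
    # Fall back to any URL on the repository host, or None if there is none.
    return on_host[0] if on_host else None


if __name__ == "__main__":
    ids = dc_identifiers(SAMPLE_RECORD)
    print(pick_record_url(ids, "repository.example.edu"))
```

Because the choice is made by examining each value, the same URL is selected whether the repository lists it first, last, or in between.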
Don't get me wrong. I am not on a crusade to
change the way repositories currently present
their OAI-PMH elements, unlike ARO. I really
don't care much how they interpret the standards.
But I do care about the NLA assuming such a
bullying stance in relation to Australian
repositories. Already at least two Australian
repositories have confessed to changing their
OAI-PMH interface to suit ARO! If this happens
elsewhere, the consequences for open access are
significant as incompatibilities are bound to arise.
Conclusions
1. Readers of the list should be alert for
similar unethical behaviour in their territories.
2. ARO and the NLA should start harvesting from
the Australian OAI-PMH interfaces correctly, as
soon as possible, just as the rest of the world does.
3. In the meantime, mis-harvested repositories
should be withdrawn from the ARO gateway database.
4. If ARO does not comply, Australian
repositories will need to consider boycotting the service.
Arthur Sale
Emeritus Professor of Computer Science
University of Tasmania