<html>
<body>
<font size=3>[Forwarding from Arthur Sale. --Peter Suber.]<br><br>
<br>
I write to draw the list’s attention to unethical behaviour by a national
harvester: the <i>Australian Research Online</i> (ARO) gateway. This
gateway, operated by the <i>National Library of Australia</i>, has
rejected the OAI-PMH standard and has announced a local variant. This
sort of behaviour by harvesters must be firmly stamped on as soon as
possible. International standards are to be complied with, not modified
for budgetary convenience.<br>
<br>
<b>Responsibility<br><br>
</b><u>It is the responsibility of any OAI-PMH harvester, such as ARO,
ADT, ROAR, OpenDOAR, OAIster, etc., to harvest correctly from all
OAI-PMH-compliant repositories that exist in the wild and that it regards
as its target group</u>. Please examine that sentence carefully: the
responsibility lies with a <i>gateway</i> (which ARO is) to harvest from
<i>any</i> compliant OAI-PMH interface, and not to <i>misrepresent</i>
the data. The National Library fails on both counts.<br>
<br>
Remember that international standards such as OAI-PMH are designed to
permit global interchange of metadata. Any harvester that insists on some
individual or local restriction of the international standard is
irresponsible. I did not expect this of the National Library of
Australia. So far it seems to be globally unique in this behaviour.<br>
<br>
Why does it fail? In a nutshell: possible hubris and probable laziness.
As to hubris, the National Library of Australia (NLA) has produced a set of
harvesting requirements with which it expects repositories to comply.
Requiring each repository to comply with these “requirements”, rather than
having the NLA harvest properly:<br><br>
* multiplies the work, as <u>each</u> Australian repository has to adapt
its interface or opt out (rather than the NLA doing the job properly
<u>once</u>),<br><br>
* introduces the chance of breaking an existing harvesting arrangement if
the repository changes its interface, and <br><br>
* would be absolutely fatal to the whole global enterprise if another
harvester came up with incompatible requirements. <br><br>
In the case of my university, it would definitely break our in-house
one-on-one harvesting for Government data reporting, and would be likely
to have similar flow-on effects on our national PhD thesis harvesting, at
the very least. If every harvester came up with idiosyncratic
requirements, the world would be in a real mess and harvesting, not to
mention search engines, would be infeasible. Just imagine if Google were
to behave the same way in the HTML world! At most, these ARO
“requirements” constitute a set of suggestions.<br>
<br>
The probable laziness comes from programmers. It is trivially easy to do
a proper harvest from all the repositories that exist in Australia (there
are not that many, and even fewer software packages). I can think of at
least two strategies, neither of which would take more than an hour of a
competent programmer’s time. ADT and the rest of the world’s OAI
harvesters can do it; why can’t the NLA?<br>
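<br>
The author does not spell out his two strategies, so the following is only an illustration of how little code a protocol-compliant harvest needs. It is a minimal Python sketch of a harvesting loop that uses nothing but the standard OAI-PMH 2.0 ListRecords verb and follows resumptionTokens until the last page. The function names and the injectable <code>fetch</code> parameter are hypothetical choices for this sketch; retries, rate limiting and date-based selective harvesting are omitted.<br><br>

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode
from urllib.request import urlopen

# Namespace of the OAI-PMH 2.0 response envelope.
OAI = "{http://www.openarchives.org/OAI/2.0/}"


def fetch_url(url):
    """Fetch one OAI-PMH response over HTTP and return the raw bytes."""
    with urlopen(url) as response:
        return response.read()


def list_records(base_url, metadata_prefix="oai_dc", fetch=fetch_url):
    """Yield every record element from a repository, page by page.

    Only the standard ListRecords verb is used, so the same loop works
    against any compliant interface without repository-specific tweaks.
    """
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    while True:
        root = ET.fromstring(fetch(base_url + "?" + urlencode(params)))
        error = root.find(OAI + "error")
        if error is not None:
            # noRecordsMatch is a normal empty result; anything else is a real error.
            if error.get("code") == "noRecordsMatch":
                return
            raise RuntimeError(error.get("code", "unknown") + ": " + (error.text or ""))
        page = root.find(OAI + "ListRecords")
        for record in page.findall(OAI + "record"):
            yield record
        token = page.find(OAI + "resumptionToken")
        # An absent or empty resumptionToken marks the final page.
        if token is None or not (token.text or "").strip():
            return
        # A resumption request carries only the verb and the token.
        params = {"verb": "ListRecords", "resumptionToken": token.text.strip()}
```

Because the fetch function is a parameter, the loop can be exercised against canned responses before it is pointed at a live repository's base URL.<br>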
<br>
<b>“Best Practice”<br><br>
</b>I hesitated to write this section because some will think it is
important. It isn’t. The main issue is the one above. However, the NLA is
bound to raise it to justify its so-called “requirements”: the argument
that those harvesting “requirements” represent good practice. In fact, it
is not difficult to mount a case that the GNU EPrints scheme is better
practice than the ARO scheme. Consider these quotes from the Dublin Core
Metadata Initiative:<br><br>
</font><h3><b>“4.14. Identifier</b></h3><font size=3><i>Label: Resource
Identifier<br>
Element Description:</i> An unambiguous reference to the resource within
a given context. Recommended best practice is to identify the resource by
means of a string or number conforming to a formal identification system.
Examples of formal identification systems include the Uniform Resource
Identifier (URI) (including the Uniform Resource Locator (URL), the
Digital Object Identifier (DOI) and the International Standard Book
Number (ISBN).<br>
<i>Guidelines for content creation:<br>
</i>This element can also be used for local identifiers (e.g. ID numbers
or call numbers) assigned by the Creator of the resource to apply to a
particular item. It should not be used for identification of the metadata
record itself.” <br><br>
[Using Dublin Core - The Elements,
<a href="http://dublincore.org/documents/usageguide/elements.shtml">
http://dublincore.org/documents/usageguide/elements.shtml</a>] <br><br>
</font><h3><b>“3. Element Content and Controlled
Vocabularies</b></h3><font size=3>Each Dublin Core element is optional
and repeatable, and there is no defined order of elements. The ordering
of multiple occurrences of the same element (e.g., Creator) may have a
significance intended by the provider, but ordering is not guaranteed to
be preserved in every user environment.”<br><br>
[Using Dublin Core,
<a href="http://dublincore.org/documents/usageguide/">
http://dublincore.org/documents/usageguide/</a>] <br>
<br>
The NLA “requirements” specify that the relevant metadata must be in a
dc:identifier field, contrary to these guidelines. Further, ARO “requires”
that the <u>first</u> dc:identifier element be the metadata identifier,
despite clear indications that element order cannot be relied upon.<br>
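<br>
To illustrate the alternative to imposing an element order on repositories, the following Python sketch picks a record’s landing-page URL from an oai_dc record by scanning every candidate element instead of trusting the first dc:identifier. The function name, the <code>repository_host</code> hint, the decision to also look in dc:relation, and the file-suffix rule are all assumptions made for this sketch, not documented ARO or EPrints behaviour.<br><br>

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlparse

DC = "{http://purl.org/dc/elements/1.1/}"
# Hypothetical heuristic: links ending in these suffixes are treated as
# full-text files rather than the record's landing page.
FILE_SUFFIXES = (".pdf", ".doc", ".docx", ".ps")


def landing_page(record_xml, repository_host, elements=("identifier", "relation")):
    """Pick the record's landing-page URL from an oai_dc record.

    Dublin Core elements are repeatable and their order is not guaranteed,
    so every candidate element is scanned instead of trusting position.
    """
    root = ET.fromstring(record_xml)
    host = repository_host.lower()
    candidates = []
    for name in elements:
        for el in root.iter(DC + name):
            if el.text and el.text.strip():
                candidates.append(el.text.strip())
    # First choice: a page on the repository's own host that is not a file.
    for value in candidates:
        url = urlparse(value)
        if url.hostname == host and not url.path.lower().endswith(FILE_SUFFIXES):
            return value
    # Fallback: any http(s) link at all, e.g. a DOI resolver URL.
    for value in candidates:
        if urlparse(value).scheme in ("http", "https"):
            return value
    return None
```

Swapping the order of the dc:identifier elements in a record does not change the result, which is the property a gateway needs if it is to accept any compliant interface.<br>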
<br>
Don’t get me wrong. Unlike ARO, I am not on a crusade to change the way
repositories currently present their OAI-PMH elements. I really don’t care
much how they interpret the standards. But I do care about the NLA
assuming such a bullying stance towards Australian repositories.
At least two Australian repositories have already admitted to changing
their OAI-PMH interface to suit ARO! If this happens elsewhere, the
consequences for open access will be significant, as incompatibilities are
bound to arise.<br>
<br>
<b>Conclusions<br><br>
</b>1. Readers of the list should be alert for similar unethical
behaviour in their territories.<br>
2. ARO and the NLA should start harvesting from the Australian
OAI-PMH interfaces correctly, as soon as possible, just as the rest of
the world does.<br>
3. In the meantime, mis-harvested repositories should be withdrawn
from the ARO gateway database.<br>
4. If ARO does not comply, Australian repositories will need to
consider boycotting the service.<br>
<br>
Arthur Sale<br>
Emeritus Professor of Computer Science<br>
University of Tasmania<br>
</font></body>
</html>