[EP-tech] Re: Thousands of old eprints repropagated via OAI after epadmin redo_thumbnails &co.

Florian Heß hess at ub.uni-heidelberg.de
Wed Apr 9 10:31:43 BST 2014


On 08.04.2014 11:57, Sebastien Francois wrote:
> - recommit: by definition, this action should touch lastmod

Hi Sebastien,

I am afraid I partly disagree here. A recommit should touch lastmod only
*if* substantial (i.e. user-editable) metadata columns are dirty. That
is admittedly difficult for the epadmin recommit tool to decide, because
the changes have often been made directly beforehand, bypassing the API,
which I assume is the main purpose of that tool. Hence, how about
--non-volatile-change alias --no-touch-lastmod switches (and/or their
respective positive counterparts) for epadmin recommit and the like?
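
A minimal sketch of the rule proposed above, written in Python rather
than EPrints' Perl. Nothing here is EPrints API: the function name, the
set of "user-editable" fields and the no_touch_lastmod parameter are
assumptions made up for illustration only.

```python
# Hypothetical sketch, NOT the EPrints API: decide whether a recommit
# should bump lastmod, based on which fields actually changed.

# Assumed set of user-editable ("substantial") metadata fields.
USER_EDITABLE_FIELDS = {"title", "abstract", "creators", "date"}


def should_touch_lastmod(dirty_fields, no_touch_lastmod=False):
    """Return True if lastmod should be updated for this commit."""
    if no_touch_lastmod:
        # The operator explicitly asked not to touch lastmod.
        return False
    # Touch lastmod only if at least one user-editable field is dirty.
    return bool(set(dirty_fields) & USER_EDITABLE_FIELDS)
```

With such a rule, a recommit that only regenerates derived data (a
"thumbnail" field, say) would leave lastmod alone, while an edited title
would still update it.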

Look, there are so many actions that an eprint commit trigger (e.g.
/cfg.d/eprint_fields_automatic.pl) might include that you developers
cannot possibly foresee, that you might [not] consider a misuse of the
feature contrary to its design, and that might need a "recommit" from
time to time, e.g. when newly added code requires all older eprints to
be reprocessed. Touching lastmod regardless of whether a specific data
object meets the rarely occurring conditions for a given action can
cause problems. Someone from China had problems accessing our OAI server
after, and maybe precisely because, we regenerated the thumbnails, which
potentially made them swallow half of HeiDOK.

Some more background on my OAI-harvesting aggregator scenario, so that
you understand my problem:
The aggregator database is kept small by dropping items that have not
been modified for more than 100 days. In practice, epadmin recommit is
therefore a superb tool for making our "new media" service advertise
rather old, if not obsolete, material. According to the OAI
specification (as far as I remember from reading it once),
OAI-compliant repositories should bear in mind that harvesters do not
necessarily mirror all of a data provider. In my view this implies that
the data provider should repropagate records with some caution, so as
not to irritate harvesters that only want "bleeding edge" material.
Sure, one can still argue that those would be better off giving more
weight to dc:date, but that is not always an appropriate filtering
criterion.
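
The retention rule described above can be sketched roughly as follows.
This is an illustration in Python, not the aggregator's actual code: the
function name and the example dates are assumptions, and only the
100-day window comes from the description.

```python
from datetime import datetime, timedelta

RETENTION = timedelta(days=100)  # window taken from the description above


def is_retained(lastmod, now):
    """Keep a record only if it was modified within the retention window."""
    return now - lastmod <= RETENTION


now = datetime(2014, 4, 9)

# A hypothetical old eprint that would normally have been dropped.
old_lastmod = datetime(2009, 6, 1)
print(is_retained(old_lastmod, now))  # False

# After a recommit that bumps lastmod, the same record looks fresh.
recommitted_lastmod = datetime(2014, 4, 8)
print(is_retained(recommitted_lastmod, now))  # True
```

The harvester sees only the datestamp, so it cannot distinguish a
genuine edit from a recommit that merely touched lastmod.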


Kind regards
Florian :-)


-- 
UB Heidelberg (Altstadt)
Plöck 107-109, 69117 HD
Abt. Informationstechnik
Tel. 06221 / 54 3550
http://www.ub.uni-heidelberg.de/


