[EP-tech] Re: Thousands of old eprints repropagated via OAI after epadmin redo_thumbnails &co.

Sebastien Francois sf2 at ecs.soton.ac.uk
Tue Apr 8 10:57:08 BST 2014


Hi Florian,

Interesting scenario!

On 08/04/14 07:43, Florian Heß wrote:
> Hi,
>
> is there an option (am I just missing it? I'm using EPrints v3.3.10) to
> leave the current lastmod timestamp untouched when running epadmin or a
> similar automated routine from the tools shipped with EPrints? We have
> had, and will continue to have, a need to batch-process large numbers of
> eprints, with epadmin redo_thumbnails for instance. As a result they are
> e.g. re-announced via our aggregator for freshly acquired media (the RSS
> feed and mail channel are both limited to 1000 items per request, so some
> really fresh ones might be suppressed from the list). Client OAI
> harvesters might treat them as new, too, which would not be very
> user-friendly.
In the OAI scenario, I think the OAI clients are at fault: updating the
lastmod timestamp doesn't change the resource's unique identifier, which
is what a client should use to tell whether an item is new or merely
updated.
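
To illustrate what I mean on the client side, here is a minimal sketch in
Python. It is not taken from any real harvester; the names OaiHeader,
classify and the "seen" mapping are made up for the example, and it
assumes datestamps are UTC ISO 8601 strings in one consistent format (so
that plain string comparison orders them correctly).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OaiHeader:
    """The two header values a harvester needs (hypothetical container)."""
    identifier: str  # unique OAI identifier of the record
    datestamp: str   # UTC ISO 8601 string, e.g. "2014-04-08T09:57:08Z"


def classify(header, seen):
    """Return 'new', 'updated' or 'unchanged' for a harvested header.

    `seen` maps identifier -> latest datestamp the harvester has stored.
    The identifier decides whether a record is new; the datestamp only
    decides whether a known record needs refreshing.
    """
    previous = seen.get(header.identifier)
    if previous is None:
        status = "new"
    elif header.datestamp > previous:
        status = "updated"
    else:
        status = "unchanged"
    # Remember the most recent datestamp seen for this identifier.
    if previous is None or header.datestamp > previous:
        seen[header.identifier] = header.datestamp
    return status
```

With this logic a bumped lastmod on an already harvested eprint shows up
as "updated", never as "new".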

But I agree that certain actions shouldn't update the lastmod field (cf. 
below).

> Thinking about it, I would even prefer EPrints to update it only when
> a non-admin user has acted upon an eprint, i.e. when they changed metadata.
> But sometimes the admin might indeed want to touch eprints "obviously",
> e.g. when they changed field values using the regular workflow or when
> they explicitly opt in to that.
>
> To put it in a nutshell, I'd wish I could use EPrints API this way:
>
>      use EPrints qw(no_autoupdate_lastmod);
>
>      $dataobj->commit(); # stealth update if $dataobj in storage
>      $dataobj->commit({ update_lastmod => 1 });
>          # opt-in overwrite default {update_lastmod}
>          #     = !exists $import_opts{no_autoupdate_lastmod}
>
> To make sure that changes made by an admin remain obvious for
> database-level debugging or "forensics", my idea is to have an
> API-hidden and unprocessed native TIMESTAMP field, say "sql_updated",
> and have it updated independently by the database engine itself.
> (AFAIK, MySQL implies "ON UPDATE CURRENT_TIMESTAMP()" out of the box for
> the first TIMESTAMP field of a table.)
There's a "non_volatile_change" flag you can set (grep for it in 
DataObj/EPrint.pm), which does pretty much the same as 
"no_autoupdate_lastmod".

I don't see a need for another timestamp, but I agree that the behaviour
around lastmod could be reviewed. Also, I don't think whether a field is
updated should depend on which part of the system you're using
(workflow etc.) or on which user is modifying a resource. The behaviour
should be consistent and intuitive (and handled at a low level for
such system/internal fields).

What about reviewing which actions should update lastmod and which ones 
should NOT update lastmod?

I think that lastmod should be updated when the metadata is modified
and/or when the file content is changed. Hence, for the available
epadmin functions:

- rebuild_triples: no metadata/content change => no lastmod update
- recommit: by definition, this action should touch lastmod
- reorder: re-create the order values for searching => no lastmod update
- reindex: same as above => no lastmod update
- redo_mime_type: might modify the Document's mime type => update 
lastmod when the mime type is updated
- redo_thumbnails: generation of volatile files for previewing => no 
lastmod update
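
The list above could be written down as a small policy table. This is
only a sketch in Python to make the proposal concrete; it is not how
epadmin is implemented, and LASTMOD_POLICY and should_update_lastmod are
names invented for the example.

```python
# Proposed policy per epadmin action (from the list above):
#   "always"     => the action touches lastmod by definition
#   "never"      => no metadata/content change, so leave lastmod alone
#   "if_changed" => touch lastmod only if the action modified something
LASTMOD_POLICY = {
    "rebuild_triples": "never",
    "recommit": "always",
    "reorder": "never",
    "reindex": "never",
    "redo_mime_type": "if_changed",
    "redo_thumbnails": "never",
}


def should_update_lastmod(action, changed=False):
    """Decide whether `action` should update lastmod.

    `changed` says whether the action actually modified the object
    (only relevant for "if_changed" actions such as redo_mime_type).
    Raises KeyError for actions that have not been reviewed yet.
    """
    policy = LASTMOD_POLICY[action]
    if policy == "always":
        return True
    if policy == "never":
        return False
    return bool(changed)
```

An unknown action raises an error on purpose, so that anything missing
from the review gets noticed rather than silently defaulting either way.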

What do you reckon? Which other actions need to be reviewed/included here?

> By the way, I'm guessing there is no way to restore the timestamps
> other than from backup dumps, or is there? And is there already a way to
> commit an eprint explicitly without updating the lastmod timestamp, which
> I could use in future to prevent this?
You might/should be able to recover the timestamps by querying the
"history" dataset, which keeps a record of changes to eprint objects
alongside their revision number (which is stored in the eprint).
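
Roughly, the recovery could work like the sketch below (Python, purely
illustrative). It assumes you have already exported the history records
as (eprintid, revision, timestamp) tuples, that you know when the batch
run started, and that timestamps are strings in one sortable format such
as "YYYY-MM-DD HH:MM:SS". The function name and the tuple layout are
made up for the example; they are not the EPrints API.

```python
def lastmod_before(history, cutoff):
    """Map eprint id -> latest history timestamp strictly before `cutoff`.

    `history` is an iterable of (eprintid, revision, timestamp) tuples.
    Entries at or after `cutoff` (the start of the batch run) are ignored,
    so the result is the last modification time known before the run.
    Eprints with no earlier history entry are left out of the result.
    """
    restored = {}
    for eprintid, _revision, timestamp in history:
        if timestamp >= cutoff:
            continue  # written by (or after) the batch run
        if eprintid not in restored or timestamp > restored[eprintid]:
            restored[eprintid] = timestamp
    return restored
```

The resulting values would then have to be written back to lastmod
without triggering another automatic update.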

By setting the non_volatile_change flag you should be able to stop
lastmod from being updated automatically.

I can create new GitHub issues once we're happy with the revised behaviour.

Seb.

>
>
> Regards
> Florian
>


