[EP-tech] Poor performance due to cachemap, non-SQL joins
Jan Ploski
jpl at plosquare.com
Wed Sep 5 17:56:51 BST 2012
Hi,
In EPrints 3.0.5 (old, I know) I see very poor performance when a user
ticks the checkbox to view their eprints in the live archive. Apparently
what happens is that the IDs of all eprints in the archive are first
inserted into one of the dynamically created cache tables. This means
tens of thousands of individual INSERTs at a time, which seems like a
great waste, since the INSERTs are not even batched. Afterwards, only
the user's own eprints are displayed (let's say one or two of them).
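To illustrate the alternative I have in mind, here is a minimal sketch
in Python with SQLite. It is not EPrints code. The table names
(eprint_demo, cache_demo), the columns and both function names are
assumptions made up for the example. The idea is to filter by user
before filling the cache, and to fill it with a single
INSERT ... SELECT instead of one INSERT per eprint.

```python
import sqlite3


def build_demo_db(n_eprints=1000, n_users=100):
    """Create an in-memory database with a hypothetical eprint table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE eprint_demo ("
        " eprintid INTEGER PRIMARY KEY,"
        " userid INTEGER,"
        " eprint_status TEXT)"
    )
    rows = [(i, i % n_users, "archive") for i in range(1, n_eprints + 1)]
    with conn:
        # One batched call inside one transaction, not n_eprints round trips.
        conn.executemany("INSERT INTO eprint_demo VALUES (?, ?, ?)", rows)
    return conn


def cache_user_eprints(conn, userid):
    """Fill a hypothetical cache table with only this user's eprints."""
    conn.execute("DROP TABLE IF EXISTS cache_demo")
    conn.execute(
        "CREATE TEMP TABLE cache_demo ("
        " pos INTEGER PRIMARY KEY AUTOINCREMENT,"
        " eprintid INTEGER)"
    )
    with conn:
        # The database filters and inserts in a single statement.
        conn.execute(
            "INSERT INTO cache_demo (eprintid)"
            " SELECT eprintid FROM eprint_demo"
            " WHERE eprint_status = 'archive' AND userid = ?"
            " ORDER BY eprintid",
            (userid,),
        )
    cur = conn.execute("SELECT eprintid FROM cache_demo ORDER BY pos")
    return [row[0] for row in cur]
```

With this approach the cache table only ever holds the handful of rows
that are actually displayed, whatever the size of the archive.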
I also noticed that joins (as in "database joins") are performed on huge
arrays in Perl code, which are scanned sequentially, rather than at the
SQL level. This contributes greatly to the sluggishness of
generate_views (2-3 days in an installation with 70000 eprints).
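The difference between the two join styles can be sketched as follows,
again in Python with SQLite and not taken from the EPrints source. The
tables eprint_demo and user_demo and all function names are
hypothetical. join_in_app mimics a join done by scanning arrays
sequentially in application code, while join_in_sql leaves the same
work to the database, which can use the primary key index.

```python
import sqlite3


def make_db(n_eprints=200, n_users=20):
    """Create two small hypothetical tables to join."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE user_demo (userid INTEGER PRIMARY KEY, name TEXT)"
    )
    conn.execute(
        "CREATE TABLE eprint_demo (eprintid INTEGER PRIMARY KEY, userid INTEGER)"
    )
    with conn:
        conn.executemany(
            "INSERT INTO user_demo VALUES (?, ?)",
            [(u, "user%d" % u) for u in range(n_users)],
        )
        conn.executemany(
            "INSERT INTO eprint_demo VALUES (?, ?)",
            [(e, e % n_users) for e in range(1, n_eprints + 1)],
        )
    return conn


def join_in_app(conn):
    """Fetch both tables whole, then join by sequential scan (slow)."""
    eprints = conn.execute(
        "SELECT eprintid, userid FROM eprint_demo ORDER BY eprintid"
    ).fetchall()
    users = conn.execute("SELECT userid, name FROM user_demo").fetchall()
    result = []
    for eprintid, userid in eprints:
        # Cost grows with len(eprints) * len(users).
        for uid, name in users:
            if uid == userid:
                result.append((eprintid, name))
                break
    return result


def join_in_sql(conn):
    """Let the database perform the join using its index."""
    return conn.execute(
        "SELECT e.eprintid, u.name"
        " FROM eprint_demo e JOIN user_demo u ON u.userid = e.userid"
        " ORDER BY e.eprintid"
    ).fetchall()
```

Both functions return the same rows. Only the second avoids pulling
the full tables into the application and scanning them repeatedly.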
I suppose that these issues are known. But I searched
trac.eprints.org and haven't found any conclusive answer as to whether
they still exist in the current version. I am trying to make a stronger
case for an upgrade...
Regards,
Jan Ploski