<html>
<head>
<meta content="text/html; charset=ISO-8859-1"
http-equiv="Content-Type">
</head>
<body bgcolor="#FFFFFF" text="#000000">
<div class="moz-cite-prefix">The largest that I'm aware of
(>300K) is
<a href="http://discovery.ucl.ac.uk/">http://discovery.ucl.ac.uk</a>.<br>
<br>
Andrew<br>
<br>
<br>
On 09/06/14 10:04, sf2 wrote:<br>
</div>
<blockquote
cite="mid:52f68b51f1ff3fc2968c34a95371b563@ecs.soton.ac.uk"
type="cite">
<p> </p>
<p>The Uni of Southampton has over 100k records, and the repo works
fine.</p>
<p>Bits that may not scale so well on 3.2/3.3:</p>
<p>- Searching/Indexing: indexes are stored alongside your data in
the MySQL database, and deep LEFT JOINs are generated if you're
using many fields in your simple search</p>
<p>- Too many Compound/Multiple fields: each compound/multiple
field adds an auxiliary DB table (one extra READ or WRITE for
each of those)</p>
<p>- Views: crunching the "totals" is tricky over large filtered
datasets, and there is also a lot of sorting going on, which is
slow</p>
<p>- Document relations: some bugs in EPrints 3.2 generate lots
of document relations (thumbnails etc.), which clogs the DB</p>
<p>- History: similarly, some bugs in early 3.2 releases generated
far too many "history" records (one DB record + one XML file
on disk each), which slows things down a lot</p>
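<p>To illustrate the first two points, here is a minimal sketch using
Python's built-in sqlite3 module. It assumes a hypothetical schema with
one main table and one auxiliary table per multiple-valued field; the
table names, column names and field list are invented for illustration
and are not the actual EPrints schema. It only shows how each extra
searched field adds another LEFT JOIN to the generated query.</p>

```python
import sqlite3

# Hypothetical multiple-valued fields; each one gets its own auxiliary table.
# These names are illustrative only, not EPrints' real schema.
MULTIPLE_FIELDS = ["creators", "subjects", "keywords"]


def build_schema(conn):
    """Create one main table plus one auxiliary table per multiple field."""
    conn.execute("CREATE TABLE eprint (eprintid INTEGER PRIMARY KEY, title TEXT)")
    for field in MULTIPLE_FIELDS:
        conn.execute(
            f"CREATE TABLE eprint_{field} (eprintid INTEGER, pos INTEGER, value TEXT)"
        )


def build_search_sql(fields):
    """Build a simple-search query: one LEFT JOIN per auxiliary table searched."""
    joins = "".join(
        f" LEFT JOIN eprint_{f} ON eprint_{f}.eprintid = eprint.eprintid"
        for f in fields
    )
    conditions = " OR ".join(
        ["eprint.title LIKE :q"] + [f"eprint_{f}.value LIKE :q" for f in fields]
    )
    return f"SELECT DISTINCT eprint.eprintid FROM eprint{joins} WHERE {conditions}"


def search(conn, term, fields):
    """Return the sorted ids of records matching the term in any searched field."""
    rows = conn.execute(build_search_sql(fields), {"q": f"%{term}%"}).fetchall()
    return sorted(row[0] for row in rows)


conn = sqlite3.connect(":memory:")
build_schema(conn)
conn.execute("INSERT INTO eprint VALUES (1, 'Scaling repositories')")
conn.execute("INSERT INTO eprint VALUES (2, 'Unrelated')")
conn.execute("INSERT INTO eprint_keywords VALUES (2, 0, 'scaling')")

# Three searched fields means three joins in a single query.
print(build_search_sql(MULTIPLE_FIELDS).count("LEFT JOIN"))
print(search(conn, "scaling", MULTIPLE_FIELDS))
```

<p>Searching fewer fields in the simple search keeps the join count
down, which is the trade-off described above.</p>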
<p> </p>
<p>Unlike Yuri, I don't recall any slow delivery of content - if
you look at Apache::Rewrite you'll see that EPrints releases the
file to Apache early in the request process - and that scales.</p>
<p>FYI, I want to take searching out of EPrints altogether
and use only Xapian: no more "search/indexes" data in your
metadata database -> lighter DB, searching/ordering done by a
third-party library we don't need to maintain. Also Xapian offers
lots of extras (facets, suggestions, probability match...)</p>
<p>Also, on my eprints4 branch on github you'll see a series of
patches to enable memory caching (via memcached) to read data
records (eprint, user...) from memory rather than from the DB (of
course it falls back to the DB when the record is modified).
Untested on 3.3, may work ;-)</p>
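<p>The general idea of reading records from memory and falling back to
the DB can be sketched as below. This is a generic read-through cache
with invalidation on write, not the code from the eprints4 patches: the
RecordStore class is hypothetical, and plain Python dicts stand in for
both memcached and the database.</p>

```python
class RecordStore:
    """Read-through cache in front of a database, invalidated on write.

    Hypothetical illustration: `db` and `cache` are plain dicts standing in
    for the real database and for memcached.
    """

    def __init__(self, db):
        self.db = db        # record id to record dict (stand-in for the DB)
        self.cache = {}     # record id to record dict (stand-in for memcached)
        self.db_reads = 0   # counts how often we had to go to the DB

    def get(self, record_id):
        """Return a record, from memory if cached, otherwise from the DB."""
        if record_id in self.cache:
            return self.cache[record_id]
        self.db_reads += 1
        record = self.db.get(record_id)
        if record is not None:
            self.cache[record_id] = dict(record)
        return record

    def update(self, record_id, record):
        """Write to the DB and drop the cached copy so the next read is fresh."""
        self.db[record_id] = dict(record)
        self.cache.pop(record_id, None)


store = RecordStore({1: {"title": "First title"}})
store.get(1)                 # miss: read from the DB, then cached
store.get(1)                 # hit: served from memory
store.update(1, {"title": "Second title"})
print(store.get(1)["title"], store.db_reads)
```

<p>Dropping the cached entry on write, rather than updating it in place,
keeps the DB as the single source of truth at the cost of one extra DB
read after each modification.</p>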
<p> </p>
<p>Seb</p>
<p> </p>
<p> </p>
<p><span style="font-family: 'Lucida Grande', Verdana, Arial,
Helvetica, sans-serif;">On 09.06.2014 11:53, Yuri wrote:</span></p>
<blockquote type="cite" style="padding-left:5px;
border-left:#1010ff 2px solid; margin-left:5px">
<pre>On 09/06/2014 10:09, Ian Stuart wrote:</pre>
<blockquote type="cite" style="padding-left:5px;
border-left:#1010ff 2px solid; margin-left:5px">Are there any
large-scale EPrints repos out there? (by large scale, I mean
100,000+ accessible records)</blockquote>
<pre>We have about 40,000 records in two repositories (10,000 of them with
full text).
I think the big problem is Apache's delivery of files (you also have to
tune it for both Perl and static content...); there should be a way to
serve files without using Perl, or only minimally. Another big problem is
updating views: it takes a lot of time, and I had to disable some of them
because it takes ages (days) to regenerate/update a view.
The site is often at load 1 to 1.5, most of the time serving PDFs
externally. It works, but not perfectly.</pre>
<blockquote type="cite" style="padding-left:5px;
border-left:#1010ff 2px solid; margin-left:5px">The database
technology will cope with up to 2 million records, but I don't
think the rest of EPrints will cope :D ... but what's in use,
in practice?</blockquote>
<pre>*** Options: <a moz-do-not-send="true" href="http://mailman.ecs.soton.ac.uk/mailman/listinfo/eprints-tech">http://mailman.ecs.soton.ac.uk/mailman/listinfo/eprints-tech</a>
*** Archive: <a moz-do-not-send="true" href="http://www.eprints.org/tech.php/">http://www.eprints.org/tech.php/</a>
*** EPrints community wiki: <a moz-do-not-send="true" href="http://wiki.eprints.org/">http://wiki.eprints.org/</a>
*** EPrints developers Forum: <a moz-do-not-send="true" href="http://forum.eprints.org/">http://forum.eprints.org/</a>
</pre>
</blockquote>
<p> </p>
<div> </div>
<br>
</blockquote>
<br>
<br>
<pre class="moz-signature" cols="72">--
Andrew D Bell
EPrints Services
School of Electronics and Computer Science
University of Southampton
Southampton
SO17 1BJ
+44 (0)23 8059 8814
<a class="moz-txt-link-abbreviated" href="mailto:a.d.bell@ecs.soton.ac.uk">a.d.bell@ecs.soton.ac.uk</a>
<a class="moz-txt-link-freetext" href="http://www.eprints.org/">http://www.eprints.org/</a>
<a class="moz-txt-link-freetext" href="http://eprintsservices.wordpress.com/">http://eprintsservices.wordpress.com/</a>
<a class="moz-txt-link-freetext" href="http://twitter.com/EPrintsServices">http://twitter.com/EPrintsServices</a></pre>
</body>
</html>