[EP-tech] Re: Memory usage in 3.2, Sword 1.3 and epdata packages
Tim Brody
tdb2 at ecs.soton.ac.uk
Fri Jul 12 10:51:12 BST 2013
Correct. In 3.2 the entire HTTP POST is processed in memory. In 3.3 the XML
data is streamed and written to disk as it arrives.
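
To illustrate the difference, here is a minimal sketch of the streaming idea
in Python. It is not the EPrints (Perl) code and not necessarily how 3.3 does
it; the function name decode_base64_stream and the chunk size are made up for
the example. It decodes base64 text chunk by chunk, so only one chunk is held
in memory at a time instead of the whole encoded document plus its decoded
copy.

```python
import base64
import io


def decode_base64_stream(src, dst, chunk_size=64 * 1024):
    """Decode base64 bytes read from src and write the result to dst.

    Hypothetical helper for illustration: src and dst are binary
    file-like objects. Returns the number of decoded bytes written.
    """
    pending = b""
    written = 0
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        # Strip line breaks and spaces that may be mixed into the encoded text.
        pending += b"".join(chunk.split())
        # Base64 decodes in groups of 4 characters, so hold back any
        # incomplete group until the next chunk arrives.
        usable = len(pending) - (len(pending) % 4)
        decoded = base64.b64decode(pending[:usable])
        dst.write(decoded)
        written += len(decoded)
        pending = pending[usable:]
    if pending:
        raise ValueError("base64 input ended in the middle of a 4-character group")
    return written


# Usage: in-memory streams stand in for the HTTP request body and a file on disk.
payload = b"example document content" * 1000
source = io.BytesIO(base64.encodebytes(payload))
target = io.BytesIO()
decode_base64_stream(source, target, chunk_size=1000)
```

With a real deposit, dst would be a file opened with open(path, "wb"), so the
decoded document goes to disk rather than accumulating in the httpd process.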
/Tim.
On Fri, 2013-07-12 at 08:26 +0100, Ian Stuart wrote:
> With no real knowledge, and certainly no investigation... I would
> suspect the problem is actually with how the base64 files are handled,
> rather than being an EPrints memory leak per se.
>
> From the SWORD importers I've written, the process seems to be to
> 1) read in the deposit
> 2) unpack the deposit (zip into disk space, XML into memory)
> 3) create the eprint object
> 4) attach the files
> 5) write everything out
>
> So I would suspect that what's happening is that all your base64 files
> are created (in memory) from the XML (which is also in memory).
>
> On 12/07/13 03:57, Mark Gregson wrote:
> > We’re using SWORD with epdata packages to deposit documents and
> > multimedia into our repository (3.2). This works fine for small files,
> > but CPU and memory usage increase quickly with file size until, with a
> > ~200MB file, the httpd process consumes all available memory and dies.
> > This is on a RHEL5 64-bit box with 8GB of memory and a separate DB server.
> >
> > Clearly, the epdata format is not the most appropriate for this size
> > file due to the increased file size as a result of the base64 encoding
> > and because the document is embedded within the XML. Changing package
> > format may alleviate/resolve the problem but as this is definitely going
> > to be a challenge in our environment I’m hoping it will be easier to
> > deal with the issue within EPrints.
> >
> > Note, I’ve already ascertained that this is not related to libxml2’s
> > XML_PARSE_HUGE option being disabled; the failure occurs trying to run df.
> >
> > I’m about to start hunting for memory leaks and then doing additional
> > memory profiling. If anyone has any suggestions about likely locations
> > for memory leaks in the code, information about expected memory usage
> > for SWORD with epdata packages, data from previous profiling, etc, it
> > would be very valuable.
>
>