[EP-tech] Re: Memory usage in 3.2, Sword 1.3 and epdata packages

Tim Brody tdb2 at ecs.soton.ac.uk
Fri Jul 12 10:51:12 BST 2013


Correct. In 3.2 the HTTP POST is processed entirely in memory. In 3.3 XML
data is streamed and written to disk as it arrives.
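
To illustrate the difference, here is a rough sketch in Python. It is not
EPrints code (EPrints is Perl), and the function names and chunk size are
made up for the example. The first function holds the encoded and decoded
payload in memory at the same time; the second decodes fixed-size chunks
and writes them to disk as they arrive, so memory use stays roughly
constant whatever the file size.

```python
import base64
import io
import os
import tempfile

CHUNK = 64 * 1024  # hypothetical read size; any positive value works


def decode_in_memory(b64_stream):
    """Whole-payload approach: encoded and decoded copies both sit in RAM."""
    return base64.b64decode(b64_stream.read())


def decode_streaming(b64_stream, out_path, chunk_size=CHUNK):
    """Chunked approach: decode a piece at a time and write it to disk."""
    written = 0
    pending = b""  # leftover characters that do not yet form a 4-char group
    with open(out_path, "wb") as out:
        while True:
            chunk = b64_stream.read(chunk_size)
            if not chunk:
                break
            # Drop line breaks and other whitespace, then decode only
            # complete 4-character base64 groups; carry the rest forward.
            pending += b"".join(chunk.split())
            usable = len(pending) - (len(pending) % 4)
            data = base64.b64decode(pending[:usable])
            pending = pending[usable:]
            out.write(data)
            written += len(data)
    if pending:
        raise ValueError("truncated base64 input")
    return written


if __name__ == "__main__":
    payload = os.urandom(1_000_000)
    encoded = base64.encodebytes(payload)  # line-wrapped, as often seen in XML
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "document.bin")
        size = decode_streaming(io.BytesIO(encoded), path)
        print(size == len(payload))
```

The sketch assumes the base64 text has already been pulled out of the XML;
in a real importer the chunks would come from a streaming XML parser's
character-data callbacks rather than from a plain file handle.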

/Tim.

On Fri, 2013-07-12 at 08:26 +0100, Ian Stuart wrote:
> With no real knowledge, and certainly no investigation.... I would 
> suspect the problem is actually with how the base64 files are handled, 
> rather than being an EPrints memory leak per se.
> 
> From the SWORD importers I've written, the process seems to be:
> 1) read in the deposit
> 2) unpack the deposit (zip into disk space, XML into memory)
> 3) create the eprint object
> 4) attach the files
> 5) write everything out
> 
> So I would suspect that what's happening is that all your base64 files 
> are created (in memory) from the XML (which is also in memory).
> 
> On 12/07/13 03:57, Mark Gregson wrote:
> > We’re using SWORD with epdata packages to deposit documents and
> > multimedia into our repository (3.2). This works fine for small file
> > sizes but CPU and memory usage increase quickly until, with a ~200MB file,
> > the httpd process consumes all available memory and dies.  This is on a
> > RHEL5 64bit box with 8GB memory with a separate DB server.
> >
> > Clearly, the epdata format is not the most appropriate for this size
> > file due to the increased file size as a result of the base64 encoding
> > and because the document is embedded within the XML.  Changing package
> > format may alleviate/resolve the problem but as this is definitely going
> > to be a challenge in our environment I’m hoping it will be easier to
> > deal with the issue within EPrints.
> >
> > Note, I’ve already ascertained that it is not related to libxml2’s
> > XML_PARSE_HUGE option being disabled; the failure occurs trying to run df.
> >
> > I’m about to start hunting for memory leaks and then doing additional
> > memory profiling.  If anyone has any suggestions about likely locations
> > for memory leaks in the code, information about expected memory usage
> > for SWORD with epdata packages, data from previous profiling, etc, it
> > would be very valuable.
> 
> 


