[EP-tech] Internal server error when refreshing views

David R Newman drn at ecs.soton.ac.uk
Mon Jan 17 12:15:58 GMT 2022


Hi both,

I thought I should provide a bit of context to the introduction of the 
deprecation warning.

I have been trying to extend the acceptance testing for the EPrints 
codebase as part of my CI framework.  This means testing of different 
EPrints setups.  At the moment I run separate daily testing for the zero 
and publication flavours of EPrints 3.4. Having three potential XML 
libraries (XML::LibXML, XML::GDOME and XML::DOM) used by EPrints would 
require running separate testing against each library.

Therefore, I did some investigation into the status of all three 
libraries.  My conclusions were that LibXML is effectively the industry 
standard.  It is easily available across all platforms on which EPrints 
is supported and looks the best supported going forward. The latter of 
these is very important as XML import/export for EPrints is a critical 
feature and also a potent vector for cyberattacks.  So I want to make 
sure the underlying library will be patched for vulnerabilities and 
that these patches will be rolled out in a way that makes it easy for 
those deploying EPrints to upgrade (i.e. through OS package management).

As John has identified, EPrints 3.4 (and 3.3) will try to use LibXML 
unless it is explicitly disabled (slightly confusingly by setting 
enable_libxml to 0).  If it is not disabled and XML::LibXML is not 
installed, then XML initialisation will fail rather than try to see if 
XML::GDOME is installed.  Therefore, as he suggests, grepping for 
'enable_libxml' through the directories he lists should help you find 
the config line you need to comment out.  I would also add ~/site_lib/ 
to that list, if it exists on your EPrints repository server.
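
A minimal sketch of that search, assuming a POSIX shell and a grep that 
supports -r (GNU and BSD grep both do).  The function name 
find_enable_libxml is purely illustrative and is not part of EPrints; 
the directories are the ones John lists plus site_lib.

```shell
# Hypothetical helper (not an EPrints tool): print every line mentioning
# enable_libxml under the config directories of an EPrints install.
# $1 is the EPrints root directory.
find_enable_libxml() {
    root="$1"
    for dir in "$root/lib" "$root/cfg" "$root/site_lib" "$root"/archives/*/cfg; do
        # Skip directories that do not exist on this server (e.g. site_lib),
        # including an unexpanded archives/*/cfg pattern.
        [ -d "$dir" ] || continue
        # -r: recurse into subdirectories, -n: show line numbers.
        # grep exits non-zero when nothing matches; that is not an error here.
        grep -rn 'enable_libxml' "$dir" || true
    done
    return 0
}
```

It would be called as, for example, find_enable_libxml /opt/eprints3 
(that root is taken from the paths in Jim's output below).  Any line it 
prints that sets enable_libxml to 0 is the candidate for commenting out.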

Regards

David Newman

On 17/01/2022 10:52, John Salter via Eprints-tech wrote:
> *CAUTION:* This e-mail originated outside the University of Southampton.
>
> Hi Jim,
> I've had a better look, and this is the block that checks which XML 
> modules to use, and also produces the warning you see:
> https://github.com/eprints/eprints3.4/blob/master/perl_lib/EPrints/XML.pm#L65-L86
>
> The option to select which XML library to use is:
>
> $c->{enable_libxml}
>
> From the above, EPrints will try to use LibXML if:
> $c->{enable_libxml} **does not** exist
>
> $c->{enable_libxml} exists and is set to 1
>
> If either:
> -  'enable_libxml' is set to 0
>
> - the module EPrints::XML::LibXML produces errors
>
> then the deprecation warning message is produced.
>
> So, try grepping for 'enable_libxml' in
>
> ~/lib/
>
> ~/archives/ARCHIVEID/cfg/
>
> ~/cfg/
>
> First I would try checking LibXML is happy. Try running these on the 
> commandline (as the EPrints user):
> perl -e 'use XML::LibXML 1.63;'
>
> perl -e 'use XML::LibXML::SAX;'
>
> If either of the above lines produces errors or warnings, I'd look at 
> fixing them first.
>
> Secondly, I'd try grepping for 'enable_libxml' in:
>
> [EPRINTS_ROOT]/lib/
>
> [EPRINTS_ROOT]/cfg/
>   [EPRINTS_ROOT]/archives/[ARCHIVE_ID]/cfg/
>
> If it is explicitly disabled, try commenting-out that line, and running:
>   [EPRINTS_ROOT]/bin/epadmin test
>
> To see if things look happy.
>
> Cheers,
>
> John
>
> *From:*John Salter
> *Sent:* 14 January 2022 20:38
> *To:* Jim Brinkley <brinkley at uw.edu>; eprints-tech at ecs.soton.ac.uk
> *Subject:* Re: Internal server error when refreshing views
>
> Hi Jim,
>
> If memory serves me correctly, somewhere* in the EPrints config, there 
> is an option to use either DOM, or LibXML.
>
> This faint memory ties in with your note about previous upgrades. It 
> seems like your install might not be using LibXML, even though you've 
> added the packages to the server.
>
> *the 'somewhere' is possibly the crux here. My v3.4 knowledge isn't as 
> ingrained as v3.3, and I'm not at my computer at the moment.
>
> If you try grepping for 'DOM' in:
>
> ~/lib/cfg
>
> ~/perl_lib/SystemSettings.pl
>
> ~/cfg/
>
> ~/archives/ARCHIVE_ID/cfg/
>
> do you find any options that say 'use XML::DOM' in some way (although 
> not a literal perl 'use XML::DOM' statement)?
>
> Cheers,
>
> John
>
> ------------------------------------------------------------------------
>
> *From:*Jim Brinkley <brinkley at uw.edu>
> *Sent:* 14 January 2022 20:13
> *To:* John Salter <J.Salter at leeds.ac.uk>; eprints-tech at ecs.soton.ac.uk 
> <eprints-tech at ecs.soton.ac.uk>
> *Cc:* Jim Brinkley <brinkley at uw.edu>
> *Subject:* Re: Internal server error when refreshing views
>
> PS I just noticed that the error I see in the Apache server log 
> whenever I get the Internal Server Error is exactly the same one I see 
> at the end of the output below.
>
> *From: *Jim Brinkley <brinkley at uw.edu>
> *Date: *Friday, January 14, 2022 at 12:07 PM
> *To: *John Salter <J.Salter at leeds.ac.uk>, 
> "eprints-tech at ecs.soton.ac.uk" <eprints-tech at ecs.soton.ac.uk>
> *Cc: *Jim Brinkley <brinkley at uw.edu>
> *Subject: *Re: Internal server error when refreshing views
>
> John,
>
> Thanks for your quick reply. My guess is the problem is 
> not to do with the specific subject “3-D Reconstruction” because it 
> occurs for all subjects in my list, including single word subjects, 
> such as “MindSeer”. However I just now ran
>
> [EPRINTS_ROOT]/bin/generate_views [ARCHIVE_ID] --view subjects
>
> as you suggested, and got the following output:
>
> eprints at synapse:/opt/eprints3$ bin/generate_views sigpubs --view subjects
>
> *** DEPRECATION WARNING ***
>
> In future versions, EPrints will be standardising to only support the 
> LibXML library for providing XML functionality.Please ensure LibXML is 
> installed before upgrading EPrints.
>
> Subroutine parse_xml_string redefined at 
> /opt/eprints3/bin/../perl_lib/EPrints/XML/DOM.pm line 119.
>
> Subroutine _parse_url redefined at 
> /opt/eprints3/bin/../perl_lib/EPrints/XML/DOM.pm line 144.
>
> Subroutine parse_xml redefined at 
> /opt/eprints3/bin/../perl_lib/EPrints/XML/DOM.pm line 164.
>
> Subroutine event_parse redefined at 
> /opt/eprints3/bin/../perl_lib/EPrints/XML/DOM.pm line 211.
>
> Subroutine _dispose redefined at 
> /opt/eprints3/bin/../perl_lib/EPrints/XML/DOM.pm line 248.
>
> Subroutine clone_and_own redefined at 
> /opt/eprints3/bin/../perl_lib/EPrints/XML/DOM.pm line 261.
>
> Subroutine document_to_string redefined at 
> /opt/eprints3/bin/../perl_lib/EPrints/XML/DOM.pm line 276.
>
> Subroutine make_document redefined at 
> /opt/eprints3/bin/../perl_lib/EPrints/XML/DOM.pm line 286.
>
> Subroutine version redefined at 
> /opt/eprints3/bin/../perl_lib/EPrints/XML/DOM.pm line 295.
>
> Can't use an undefined value as an ARRAY reference at 
> /opt/eprints3/bin/../perl_lib/XML/DOM/NamedNodeMap.pm line 142.
>
> I’ve seen something like this every time I’ve run generate_views. I 
> thought maybe I should update LibXML, so in Ubuntu I did apt install 
> libxml-perl, and similar for some of the other ones in the above list. 
> But from the message above it looks like the perl modules in 
> eprints3..perl_lib are redefining existing ones in Ubuntu, so that 
> last error seems to be happening in the version of XML::DOM in 
> eprints3..perl_lib.
>
> I should also say that the last time I moved sigpubs to a new server 
> (around 2019) I upgraded from Eprints 2 to 3, and I think this problem 
> has been happening ever since. I pretty much ignored it until now 
> because I didn’t have time to deal with it, and I could always tell 
> people to just refresh the browser, but now I have a bit more time and 
> it would be nice to fix this.
>
> Jim
>
> *From: *John Salter <J.Salter at leeds.ac.uk>
> *Date: *Friday, January 14, 2022 at 11:27 AM
> *To: *"eprints-tech at ecs.soton.ac.uk" <eprints-tech at ecs.soton.ac.uk>, 
> Jim Brinkley <brinkley at uw.edu>
> *Subject: *Re: Internal server error when refreshing views
>
> Hi Jim,
> At a guess, this sounds like something that is trying to group records 
> together into a browse view mis-treating the '3-D ...' text - maybe 
> incorrectly normalising it to create the A / B / C ... links at the 
> top of the view page.
>
> If you have access to the server, and run:
>
> [EPRINTS_ROOT]/bin/generate_views [ARCHIVE_ID] --view subjects
>
> does it give any additional warnings/errors?
>
> Cheers,
>
> John
>
> ------------------------------------------------------------------------
>
> *From:*eprints-tech-bounces at ecs.soton.ac.uk 
> <eprints-tech-bounces at ecs.soton.ac.uk> on behalf of Jim Brinkley via 
> Eprints-tech <eprints-tech at ecs.soton.ac.uk>
> *Sent:* 14 January 2022 19:02
> *To:* eprints-tech at ecs.soton.ac.uk <eprints-tech at ecs.soton.ac.uk>
> *Subject:* [EP-tech] Internal server error when refreshing views
>
> Hi,
>
> I run an eprints3 site at sigpubs.si.washington.edu. It generally 
> works fine except when a view regenerates I get an internal server error.
>
> To recreate this issue I can login, go to Admin:System Tools, and then 
> click on Regenerate Views, which as I understand it, causes all views 
> to be regenerated whenever I request them. For example, I can go to 
> menu Browse:Browse by Subjects, then click on any of the subjects in 
> my customized subject list, such as my first subject, "3-D 
> Reconstruction". I then get "Internal Server Error". If I then refresh 
> the browser page the error goes away and the correct view appears. 
> This view then remains correct for as long as I've tested it, but I 
> think there may be a timeout when it gets regenerated again.
>
> I looked in the apache server log and find this error whenever I get 
> the Internal Server Error:
>
> Can't use an undefined value as an ARRAY reference at 
> /opt/eprints3/perl_lib/XML/DOM/NamedNodeMap.pm line 142.\n, 
> referer: http://sigpubs.si.washington.edu/view/subjects/
>
> I am running what I think is the latest stable version of Eprints: 
> 3.4.3, on Ubuntu 20.04, perl version 5.30, apache2. Apache is running 
> as www-data.www-data. The repository is owned by eprints.eprints, and 
> www-data is in group eprints so it can write files to the repository. 
> I don't think permissions are the issue because once I refresh the 
> page the view is OK, and the files in the views directory are changed.
>
> I've searched the web and this mailing list and can't find this 
> particular situation. Any suggestions? Thanks.
>
> Jim Brinkley
>
> Structural Informatics Group
>
> University of Washington
>
> Seattle USA
>
> http://si.washington.edu
>
>
> *** Options: http://mailman.ecs.soton.ac.uk/mailman/listinfo/eprints-tech
> *** Archive: http://www.eprints.org/tech.php/
> *** EPrints community wiki: http://wiki.eprints.org/
