[EP-tech] Antwort: Re: Antwort: Re: perl module update introduced some trouble with entities

Christopher Gutteridge totl at soton.ac.uk
Tue Dec 1 15:07:28 GMT 2020


This was a bit of a fiddle to make it possible to use named entities for 
characters like £ and é, to make people's lives a little easier when 
writing the templates, which are XHTML.

The obvious other approach would be to preprocess them with something 
like this:

s/&([a-z]+);/expandentity($1)/ge

but that would go against the wisdom that you should never parse XML 
with a regular expression.
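
For what it's worth, here is a minimal sketch of that preprocessing idea, 
written in Python rather than Perl purely so that it is self-contained. 
The function name expand_named_entities is made up for illustration and is 
not part of EPrints. It rewrites named HTML entities as decimal character 
references and leaves the five entities that XML predefines alone. Being a 
regular expression, it will also rewrite matches inside comments and CDATA 
sections, which is exactly the objection above.

```python
import re
from html.entities import name2codepoint

# The five entities every XML parser already knows; leave these alone.
XML_PREDEFINED = {"amp", "lt", "gt", "quot", "apos"}

# A named entity reference such as &auml; or &pound;
ENTITY_RE = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")


def expand_named_entities(text):
    """Rewrite named HTML entities as decimal character references."""

    def replace(match):
        name = match.group(1)
        if name in XML_PREDEFINED or name not in name2codepoint:
            # Predefined or unknown: keep the reference exactly as written.
            return match.group(0)
        return "&#%d;" % name2codepoint[name]

    return ENTITY_RE.sub(replace, text)


if __name__ == "__main__":
    print(expand_named_entities("Z&uuml;rich &amp; M&auml;rz: &pound;5"))
```

Unknown names are passed through untouched so that the XML parser still 
reports them, rather than this step silently hiding a typo.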


On 01/12/2020 14:37, David R Newman via Eprints-tech wrote:
>
> Hi all,
>
> I have been blind.  EPrints (at least the latest 3.4) already has an 
> entities.dtd in lib/, and it is already used in most of the standard XML 
> template, phrase, etc. files.  I think the problem is that it is not 
> linked in properly in most if not all cases, so I will investigate how 
> that can be done better to avoid undefined entity errors.
>
> Regards
>
> David Newman
>
> On 01/12/2020 14:26, martin.braendle at uzh.ch wrote:
>> *CAUTION:* This e-mail originated outside the University of Southampton.
>>
>> The entities file we have here has the following preamble
>>
>> <!-- Portions (C) International Organization for Standardization 1986
>>      Permission to copy in any form is granted for use with
>>      conforming SGML systems and applications as defined in
>>      ISO 8879, provided this notice is included in all copies.
>> -->
>>
>> and contains more than 500 lines.
>>
>> It most probably stems from here: 
>> https://www.w3.org/TR/REC-html40-971218/sgml/entities.html
>>
>> Kind regards,
>>
>> Martin
>>
>> --
>> Dr. Martin Brändle
>> Zentrale Informatik
>> Universität Zürich
>> Stampfenbachstr. 73
>> CH-8006 Zürich
>>
>> From: "David R Newman" <drn at ecs.soton.ac.uk>
>> To: eprints-tech at ecs.soton.ac.uk
>> Cc: th.lauke at arcor.de, martin.braendle at uzh.ch
>> Date: 01/12/2020 15:19
>> Subject: Re: Antwort: Re: [EP-tech] perl module update introduced 
>> some trouble with entities
>>
>> ------------------------------------------------------------------------
>>
>>
>>
>> Hi all,
>>
>> EPrints 3.4 has had the patch applied for issues with newer versions 
>> of LibXML, and EPrints 3.4.2 onwards should have this particular issue 
>> resolved.  Regarding special characters, I will look into producing 
>> (or hopefully finding) an entities.dtd for all the special characters 
>> that EPrints repositories may want to use and then update the standard 
>> template and phrase files to use it.  In fact it is probably even 
>> worth doing this for citation and workflow files as well.  I have 
>> created an issue for EPrints 3.4 to address this:
>>
>> https://github.com/eprints/eprints3.4/issues/112
>>
>>
>> Regards
>>
>> David Newman
>>
>> On 01/12/2020 13:35, martin.braendle at uzh.ch wrote:
>>
>>     Hi Thomas,
>>
>>     there should be an entities.dtd file in [eprints_root]/lib/;
>>     maybe it is missing, or entries are missing from it?
>>
>>     A phrase file should also reference it in the
>>
>>     <!DOCTYPE phrases SYSTEM "entities.dtd">
>>
>>     declaration right at the beginning, after the XML declaration.
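
To illustrate why the DOCTYPE matters, here is a small sketch in Python 
using only the standard library parser (not the LibXML stack that EPrints 
uses). It declares a single entity in an inline DTD subset as a stand-in 
for the external entities.dtd; the sample text and the helper name parses 
are made up for the demonstration. Without a declaration the parser 
rejects &auml;, and with one the reference resolves.

```python
import xml.etree.ElementTree as ET

# No DTD: the parser has never heard of &auml;
UNDECLARED = "<phrases>M&auml;rz</phrases>"

# Inline DTD subset standing in for the external entities.dtd (assumption:
# the real file declares its entities in the same <!ENTITY ...> form).
DECLARED = (
    '<!DOCTYPE phrases [ <!ENTITY auml "&#228;"> ]>'
    "<phrases>M&auml;rz</phrases>"
)


def parses(xml_text):
    """Return True if xml_text is accepted by the parser."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


if __name__ == "__main__":
    print(parses(UNDECLARED))
    print(parses(DECLARED))
    print(ET.fromstring(DECLARED).text)
```

The inline subset is used here only because this parser does not fetch 
external DTD files; whether a SYSTEM reference gets loaded depends on the 
parser and its options.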
>>
>>     Kind regards,
>>
>>     Martin
>>
>>     --
>>     Dr. Martin Brändle
>>     Zentrale Informatik
>>     Universität Zürich
>>     Stampfenbachstr. 73
>>     CH-8006 Zürich
>>
>>     From: "David R Newman via Eprints-tech"
>>     <eprints-tech at ecs.soton.ac.uk>
>>     To: <eprints-tech at ecs.soton.ac.uk>, <th.lauke at arcor.de>
>>     Date: 01/12/2020 14:24
>>     Subject: Re: [EP-tech] perl module update introduced some trouble
>>     with entities
>>     Sent by: <eprints-tech-bounces at ecs.soton.ac.uk>
>>
>>     ------------------------------------------------------------------------
>>
>>
>>
>>     Hi Thomas,
>>
>>     Named HTML entities are not supported in XML; you need to use the
>>     decimal character reference for &auml;, which is &#228;
>>
>>     This is the same as needing to replace things like &copy; with
>>     their equivalent decimal character references.  (Only &amp;, &lt;,
>>     &gt;, &quot; and &apos; are predefined in XML and can stay as they are.)
>>
>>     Regards
>>
>>     David Newman
>>
>>     On 01/12/2020 12:00, th.lauke--- via Eprints-tech wrote:
>>
>>     Hi all,
>>
>>     Any hints on where to start digging for the reason(s) behind the following error:
>>     Failed to parse XML file:
>>     /usr/share/eprints/site_lib/lang/en/phrases/modified.xml: Entity:
>>     line 226: parser error : Entity 'auml' not defined
>>
>>     This error occurs after updating some perl modules ... :(
>>
>>     Is the 'bad' module already known?
>>     What is more effective: Fixing the module (version) or the phrase
>>     file?
>>
>>     Thanks in advance for any ideas
>>     Thomas
>>
>>     *** Options:
>>     http://mailman.ecs.soton.ac.uk/mailman/listinfo/eprints-tech
>>     *** Archive: http://www.eprints.org/tech.php/
>>     *** EPrints community wiki: http://wiki.eprints.org/

-- 
Christopher Gutteridge <totl at soton.ac.uk>
You should read our team blog at http://blog.soton.ac.uk/webteam/
