<html><head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
</head>
<body>
<p>Hi James,</p>
<p>I think you would need to look at this field in the Elements
record in its database to see how it is being stored differently
when there is an import compared to when there is manual entry.
As you said, I think the problem is partly that text box entries
get parsed and encoded before going into the database but imports
do not (or, at the very least, the process between input and output to
the Elements database is different). It would be useful to know
how they look different in the Elements database, as that may
help make EPrints more resilient to unexpected encodings in
future. <br>
</p>
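<p>Here is a minimal sketch of one way to compare the two stored versions. It assumes the field values can be pulled out of the Elements database as text. The Python below lists each non-ASCII character with its code point, Unicode name and UTF-8 bytes, so an imported value and a typed value can be compared side by side. Python is used only for illustration, and the function name <code>describe_non_ascii</code> and the sample titles are hypothetical.</p>

```python
import unicodedata


def describe_non_ascii(value):
    """Hypothetical helper: list each non-ASCII character in value
    with its code point, Unicode name and UTF-8 bytes."""
    report = []
    for ch in value:
        if not ch.isascii():
            name = unicodedata.name(ch, "UNKNOWN")
            utf8 = ch.encode("utf-8").hex(" ").upper()
            report.append(f"U+{ord(ch):04X} {name} [{utf8}]")
    return report


# Hypothetical values: the same title as imported and as typed by hand.
imported = "Smith\u2019s study"
typed = "Smith's study"
print(describe_non_ascii(imported))  # ['U+2019 RIGHT SINGLE QUOTATION MARK [E2 80 99]']
print(describe_non_ascii(typed))     # []
```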
<p>However, "\\x{2019}" is Perl's way of writing the single
character U+2019 (the right single quotation mark), not an invalid
escape. Without braces, \x takes only two hex digits (e.g. \x3a is
a colon ":"), so it can only reach code points up to 255. The
braced form \x{...} is needed for anything beyond that. The log
message means uri_escape() was handed a character above 255, which
it will not escape. uri_escape_utf8() first converts the text to
UTF-8, where U+2019 is the three bytes \xE2\x80\x99.<br>
</p>
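<p>The same point appears below as a short Python sketch. Python's <code>urllib.parse.quote</code> stands in for Perl's URI::Escape here. <code>latin1_only_escape</code> is a hypothetical helper that only imitates the refusal described in the Apache log; it is not the URI::Escape code.</p>

```python
from urllib.parse import quote

ch = "\u2019"  # RIGHT SINGLE QUOTATION MARK, written "\x{2019}" in Perl

# A two-digit \x escape reaches at most 0xFF (255); U+2019 is beyond that.
print(ord(ch) > 0xFF)  # True

# In UTF-8, U+2019 is the three bytes E2 80 99.
print(ch.encode("utf-8"))  # b'\xe2\x80\x99'


def latin1_only_escape(text):
    """Hypothetical helper: percent-encode one byte per character and
    refuse code points above 255, mimicking the error in the Apache log."""
    for c in text:
        if ord(c) > 0xFF:
            raise ValueError(f"Can't escape \\x{{{ord(c):04X}}}, encode to UTF-8 first")
    return quote(text, safe="", encoding="latin-1")


print(latin1_only_escape("caf\u00e9"))  # caf%E9
try:
    latin1_only_escape(ch)
except ValueError as err:
    print(err)  # Can't escape \x{2019}, encode to UTF-8 first

# Encoding to UTF-8 first, as uri_escape_utf8() does, succeeds:
print(quote(ch, safe=""))  # %E2%80%99
```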
<p>It would be interesting to get a bit more information about your
other issue with regular quote marks and semicolons, which are part
of the standard ASCII set rather than the extended character set.
These really should not be causing a problem.</p>
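<p>For comparison, here is a sketch of how a file name containing an apostrophe and a semicolon would be percent-encoded for a URL path, using Python's <code>urllib.parse.quote</code>. Both characters are "sub-delimiters" in RFC 3986, so software along the way may leave them unencoded or treat them specially. The file name is hypothetical, and it is not known whether the Symplectic connector encodes file names this way.</p>

```python
from urllib.parse import quote, unquote

filename = "O'Brien; notes.pdf"  # hypothetical file name

# Apostrophe, semicolon and space are all percent-encoded in a path segment.
encoded = quote(filename)
print(encoded)  # O%27Brien%3B%20notes.pdf

# Decoding restores the original name exactly.
print(unquote(encoded) == filename)  # True
```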
<p>Regards</p>
<p>David Newman<br>
</p>
<div class="moz-cite-prefix">On 17/02/2021 09:44, James Kerwin via
Eprints-tech wrote:<br>
</div>
<blockquote type="cite" cite="mid:EMEW3|cd4942bd9091e1691c4e24848fa90e3bx1G9ju14eprints-tech-bounces|ecs.soton.ac.uk|CAKkNZ9CgeZoYtTU5D0yVPy1VsYxK1gNmKPn9c9=8gBRRbc1Msw@mail.gmail.com">
<div>
<div dir="ltr">Hi All,<br>
<div><br>
</div>
<div>This is an Elements/EPrints question. Apologies that it
isn't purely EPrints, but this is probably the best place to
get an answer. I want to know whether others experience this or
whether it's some oddity of our setup.</div>
<div><br>
</div>
<div>We are using RT1 (for now) and EPrints 3.3.14 (also for
now, until we upgrade). Occasionally we get an Elements record
from Scopus, PubMed, etc. that has an odd character
in it that prevents upload. When I look in the Apache logs
it tells me the problem. Yesterday's one was the presence
of:<br>
<br>
"Unicode Character “’” (U+2019)" <br>
<br>
Which showed in the logs as:<br>
<br>
"Can't escape \\x{2019}, try uri_escape_utf8() instead at
/opt/eprints3/perl_lib/URI/Escape.pm"<br>
<br>
Importantly, if I copy the problem characters into the manual
Elements record, they don't pose a problem. There appears to be
some processing that properly encodes characters entered via a
text box, but not characters dragged in from other sources
into Elements.<br>
<br>
I've also had an issue with files containing "'" or ";"
etc. not being accessible via Elements (a very similar but
different problem).<br>
<br>
I found where I COULD fix the former issue, but it involves
changing EPrints code when I SHOULD be altering the
Symplectic connector code on the repo server.<br>
<br>
Anyway, I'm not specifically looking for a solution, but has
anybody else experienced anything similar? If so, does it
stop with RT2? I hope to raise a ticket with Symplectic over
this eventually.</div>
<div><br>
</div>
<div>Thanks,</div>
<div>James<br>
<br>
<br>
</div>
</div>
</div>
<br>
<fieldset class="mimeAttachmentHeader"></fieldset>
<pre class="moz-quote-pre" wrap="">*** Options: <a class="moz-txt-link-freetext" href="http://mailman.ecs.soton.ac.uk/mailman/listinfo/eprints-tech">http://mailman.ecs.soton.ac.uk/mailman/listinfo/eprints-tech</a>
*** Archive: <a class="moz-txt-link-freetext" href="http://www.eprints.org/tech.php/">http://www.eprints.org/tech.php/</a>
*** EPrints community wiki: <a class="moz-txt-link-freetext" href="http://wiki.eprints.org/">http://wiki.eprints.org/</a></pre>
</blockquote>
</body>
</html>