<html><head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
  </head>
  <body>
    <p>Hi James,</p>
    <p>I think you would need to look at this field in the Elements
      record in its database to see how it is stored differently when
      it comes from an import compared with manual entry.&nbsp;
      As you said, I think part of the problem is that text box entries
      get parsed and encoded before going into the database but imports
      do not (or at the very least the processing between input and
      output to the Elements database is different).&nbsp; It would be
      useful to know how the two look different in the Elements database,
      as that may help make EPrints more resilient to unexpected
      encodings in future.&nbsp; <br>
    </p>
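    <p>As for &quot;\\x{2019}&quot;, I think this is just how the error message
      writes the offending character rather than what is actually stored.&nbsp;
      In Perl, \x followed by two hex digits gives a character code up to FF,
      so \x3a is a colon &quot;:&quot;, but the braced form \x{2019} can name any
      Unicode code point, here U+2019, the right single quotation mark.&nbsp;
      The doubled backslash is probably just the Apache log escaping it.&nbsp;
      uri_escape() in URI::Escape only handles characters up to \x{FF} and
      gives this error for anything higher, which is why it suggests
      uri_escape_utf8().&nbsp; U+2019 in UTF-8 is the three bytes
      \xE2\x80\x99, so percent-encoded it should come out as %E2%80%99.<br>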
    </p>
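    <p>To illustrate the encoding point, here is a minimal sketch in Python
      rather than Perl, so it is only an analogue of what URI::Escape does and
      not the EPrints or connector code.&nbsp; The variable and function names
      are my own.&nbsp; It shows the UTF-8 bytes for U+2019 and the
      percent-encoded form you would expect once the character has been
      encoded as UTF-8 before escaping.</p>
    <pre>
```python
from urllib.parse import quote

# U+2019 RIGHT SINGLE QUOTATION MARK, the character from the Apache log.
apostrophe = "\u2019"

# Encode to UTF-8 first; this is the step the failing code path skips.
utf8_bytes = apostrophe.encode("utf-8")
utf8_hex = " ".join(f"{b:02X}" for b in utf8_bytes)  # "E2 80 99"

# Percent-encode the UTF-8 bytes (roughly what uri_escape_utf8() does in Perl).
escaped = quote(apostrophe, safe="", encoding="utf-8")  # "%E2%80%99"


def fits_in_one_byte(ch):
    """Hypothetical helper: True if the code point is in the range 0x00 to 0xFF,
    i.e. a character a byte-oriented escaper can handle without encoding."""
    return ord(ch) in range(0x100)


if __name__ == "__main__":
    print(utf8_hex)
    print(escaped)
    print(fits_in_one_byte(":"), fits_in_one_byte(apostrophe))
```
</pre>
    <p>The colon fits in a single byte but U+2019 does not, which is the
      distinction the error message is complaining about.</p>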
    <p>It would be interesting to get a bit more information about your
      other issue with regular quote marks and semi-colons, which are part
      of the standard ASCII set rather than extended characters.&nbsp;
      These really should not be causing a problem.</p>
    <p>Regards</p>
    <p>David Newman<br>
    </p>
    <div class="moz-cite-prefix">On 17/02/2021 09:44, James Kerwin via
      Eprints-tech wrote:<br>
    </div>
    <blockquote type="cite" cite="mid:EMEW3|cd4942bd9091e1691c4e24848fa90e3bx1G9ju14eprints-tech-bounces|ecs.soton.ac.uk|CAKkNZ9CgeZoYtTU5D0yVPy1VsYxK1gNmKPn9c9=8gBRRbc1Msw@mail.gmail.com">
      
      <div>
        <div dir="ltr">Hi All,<br>
          <div><br>
          </div>
          <div>This is an Elements/EPrints question. Apologies that it
            isn't purely EPrints, but this is probably the best place to
            get an answer. I want to know whether others experience this or
            whether it's some oddity of our setup.</div>
          <div><br>
          </div>
          <div>We are using RT1 (for now) and EPrints 3.3.14 (also for
            now until upgrade). Occasionally we get an Elements record
            that is from Scopus, PubMed etc. that has an odd character
            in it that prevents upload. When I look in the Apache logs
            it tells me the problem. Yesterday's one was the presence
            of:<br>
            <br>
            &nbsp;&quot;Unicode Character “’” (U+2019)&quot; <br>
            <br>
            Which showed in the logs as:<br>
            <br>
            &quot;Can't escape \\x{2019}, try uri_escape_utf8() instead at
            /opt/eprints3/perl_lib/URI/Escape.pm&quot;<br>
            <br>
            Importantly, if I copy the problem characters into a manual
            Elements record, it doesn't pose a problem. There appears
            to be some processing to properly encode characters entered via
            text box, but not characters pulled into Elements from other
            sources.<br>
            <br>
            I've also had the issue with files containing &quot;'&quot; or
            &quot;;&quot; etc. not being accessible via Elements (a very similar,
            but different problem).<br>
            <br>
            I found where I COULD fix the former issue, but it involves
            changing EPrints code when I SHOULD be altering the
            Symplectic connector code on the repo server.<br>
            <br>
            Anyway, I'm not specifically looking for a solution, but has
            anybody else experienced anything similar? If so, does it
            stop with RT2? I hope to raise a ticket with Symplectic over
            this eventually.</div>
          <div><br>
          </div>
          <div>Thanks,</div>
          <div>James<br>
            <br>
            <br>
          </div>
        </div>
      </div>
    </blockquote>
  </body>
</html>