<html xmlns:v="urn:schemas-microsoft-com:vml" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:w="urn:schemas-microsoft-com:office:word" xmlns:m="http://schemas.microsoft.com/office/2004/12/omml" xmlns="http://www.w3.org/TR/REC-html40">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<meta name="Generator" content="Microsoft Word 14 (filtered medium)">
<!--[if !mso]><style>v\:* {behavior:url(#default#VML);}
o\:* {behavior:url(#default#VML);}
w\:* {behavior:url(#default#VML);}
.shape {behavior:url(#default#VML);}
</style><![endif]--><style><!--
/* Font Definitions */
@font-face
        {font-family:Wingdings;
        panose-1:5 0 0 0 0 0 0 0 0 0;}
@font-face
        {font-family:Calibri;
        panose-1:2 15 5 2 2 2 4 3 2 4;}
@font-face
        {font-family:Tahoma;
        panose-1:2 11 6 4 3 5 4 4 2 4;}
/* Style Definitions */
p.MsoNormal, li.MsoNormal, div.MsoNormal
        {margin:0cm;
        margin-bottom:.0001pt;
        font-size:12.0pt;
        font-family:"Times New Roman","serif";}
a:link, span.MsoHyperlink
        {mso-style-priority:99;
        color:blue;
        text-decoration:underline;}
a:visited, span.MsoHyperlinkFollowed
        {mso-style-priority:99;
        color:purple;
        text-decoration:underline;}
p
        {mso-style-priority:99;
        mso-margin-top-alt:auto;
        margin-right:0cm;
        mso-margin-bottom-alt:auto;
        margin-left:0cm;
        font-size:12.0pt;
        font-family:"Times New Roman","serif";}
tt
        {mso-style-priority:99;
        font-family:"Courier New";}
span.EmailStyle19
        {mso-style-type:personal-reply;
        font-family:"Arial","sans-serif";
        color:#1F497D;}
.MsoChpDefault
        {mso-style-type:export-only;
        font-size:10.0pt;}
@page WordSection1
        {size:612.0pt 792.0pt;
        margin:72.0pt 72.0pt 72.0pt 72.0pt;}
div.WordSection1
        {page:WordSection1;}
--></style><!--[if gte mso 9]><xml>
<o:shapedefaults v:ext="edit" spidmax="1026" />
</xml><![endif]--><!--[if gte mso 9]><xml>
<o:shapelayout v:ext="edit">
<o:idmap v:ext="edit" data="1" />
</o:shapelayout></xml><![endif]-->
</head>
<body lang="SV" link="blue" vlink="purple">
<div class="WordSection1">
<p class="MsoNormal"><span lang="EN-GB" style="font-size:11.0pt;font-family:&quot;Arial&quot;,&quot;sans-serif&quot;;color:#1F497D">Hi,<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-GB" style="font-size:11.0pt;font-family:&quot;Arial&quot;,&quot;sans-serif&quot;;color:#1F497D"><o:p>&nbsp;</o:p></span></p>
<p class="MsoNormal"><span lang="EN-GB" style="font-size:11.0pt;font-family:&quot;Arial&quot;,&quot;sans-serif&quot;;color:#1F497D">Thank you! Works like a charm :-)<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-GB" style="font-size:11.0pt;font-family:&quot;Arial&quot;,&quot;sans-serif&quot;;color:#1F497D"><o:p>&nbsp;</o:p></span></p>
<p class="MsoNormal"><span lang="EN-GB" style="font-size:11.0pt;font-family:&quot;Arial&quot;,&quot;sans-serif&quot;;color:#1F497D">/Christer<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-GB" style="font-size:11.0pt;font-family:&quot;Arial&quot;,&quot;sans-serif&quot;;color:#1F497D"><o:p>&nbsp;</o:p></span></p>
<div>
<div style="border:none;border-top:solid #B5C4DF 1.0pt;padding:3.0pt 0cm 0cm 0cm">
<p class="MsoNormal"><b><span lang="EN-US" style="font-size:10.0pt;font-family:&quot;Tahoma&quot;,&quot;sans-serif&quot;">From:</span></b><span lang="EN-US" style="font-size:10.0pt;font-family:&quot;Tahoma&quot;,&quot;sans-serif&quot;"> eprints-tech-bounces@ecs.soton.ac.uk [mailto:eprints-tech-bounces@ecs.soton.ac.uk]
<b>On Behalf Of </b>martin.braendle@id.uzh.ch<br>
<b>Sent:</b> 18 February 2016 11:22<br>
<b>To:</b> eprints-tech@ecs.soton.ac.uk<br>
<b>Subject:</b> [EP-tech] Re: Re: Searching fails when database field contains Å (utf8 %c3%85)<o:p></o:p></span></p>
</div>
</div>
<p class="MsoNormal"><o:p>&nbsp;</o:p></p>
<p><span style="font-size:10.0pt;font-family:&quot;Arial&quot;,&quot;sans-serif&quot;">I have added </span>
<br>
<br>
<span style="font-size:10.0pt;font-family:&quot;Arial&quot;,&quot;sans-serif&quot;">chr(0x00c5) =&gt; 'A', &nbsp; &nbsp; # Å</span><br>
<span style="font-size:10.0pt;font-family:&quot;Arial&quot;,&quot;sans-serif&quot;">chr(0xc385) =&gt; 'A', &nbsp; &nbsp; # Å</span><br>
<br>
<span style="font-size:10.0pt;font-family:&quot;Arial&quot;,&quot;sans-serif&quot;">to the $EPrints::Index::FREETEXT_CHAR_MAPPING list in Tokenizer.pm, restarted both the web server and indexer process, and reindexed the eprint that contained Ågren as author.</span><br>
<br>
<span style="font-size:10.0pt;font-family:&quot;Arial&quot;,&quot;sans-serif&quot;">Now Advanced Search works with Å.</span><br>
<br>
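<span style="font-size:10.0pt;font-family:&quot;Arial&quot;,&quot;sans-serif&quot;">A minimal sketch of why such a mapping entry can make the search match, assuming the index term and the query term both pass through the same normalisation. The names %char_mapping, apply_mapping and normalise_term are hypothetical and are not the EPrints implementation. Note that in Perl chr(0x00c5) is the single character U+00C5 (Å), whereas chr(0xc385) is the unrelated code point U+C385 rather than the UTF-8 byte pair C3 85, so presumably only the first entry takes effect on decoded text.</span><br>
<br>
```perl
use strict;
use warnings;
use utf8;    # this source contains literal UTF-8 characters

# Hypothetical stand-in for a character mapping table (not the EPrints list).
my %char_mapping = (
    chr(0x00c5) => 'A',    # Å  LATIN CAPITAL LETTER A WITH RING ABOVE
    chr(0x00e5) => 'a',    # å  LATIN SMALL LETTER A WITH RING ABOVE
);

# Replace every mapped character and leave all other characters untouched.
sub apply_mapping {
    my ($text) = @_;
    return join '',
        map { exists $char_mapping{$_} ? $char_mapping{$_} : $_ }
        split //, $text;
}

# Assumption: index terms and query terms are folded the same way,
# so that "Ågren" and "ågren" end up as the same ASCII term.
sub normalise_term {
    my ($text) = @_;
    return lc apply_mapping($text);
}

print normalise_term('Ågren'), "\n";    # agren
print normalise_term('ågren'), "\n";    # agren
```
<br>
<span style="font-size:10.0pt;font-family:&quot;Arial&quot;,&quot;sans-serif&quot;">In this sketch a term containing an unmapped Å would keep that character and could not be matched by a term that was folded to plain ASCII, which is consistent with the behaviour reported in this thread.</span><br>
<br>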
<span style="font-size:10.0pt;font-family:&quot;Arial&quot;,&quot;sans-serif&quot;">Kind regards,</span><br>
<br>
<span style="font-size:10.0pt;font-family:&quot;Arial&quot;,&quot;sans-serif&quot;">Martin</span><br>
<br>
<span style="font-size:10.0pt;font-family:&quot;Arial&quot;,&quot;sans-serif&quot;">--</span><br>
<span style="font-size:10.0pt;font-family:&quot;Arial&quot;,&quot;sans-serif&quot;">Dr. Martin Brändle</span><br>
<span style="font-size:10.0pt;font-family:&quot;Arial&quot;,&quot;sans-serif&quot;">Zentrale Informatik</span><br>
<span style="font-size:10.0pt;font-family:&quot;Arial&quot;,&quot;sans-serif&quot;">Universität Zürich</span><br>
<span style="font-size:10.0pt;font-family:&quot;Arial&quot;,&quot;sans-serif&quot;">Stampfenbachstr. 73</span><br>
<span style="font-size:10.0pt;font-family:&quot;Arial&quot;,&quot;sans-serif&quot;">CH-8006 Zürich</span><br>
<br>
<br>
<br>
<span style="font-size:7.5pt;font-family:&quot;Arial&quot;,&quot;sans-serif&quot;;color:#5F5F5F">From:
</span><span style="font-size:7.5pt;font-family:&quot;Arial&quot;,&quot;sans-serif&quot;"><a href="mailto:martin.braendle@id.uzh.ch">martin.braendle@id.uzh.ch</a></span><br>
<span style="font-size:7.5pt;font-family:&quot;Arial&quot;,&quot;sans-serif&quot;;color:#5F5F5F">To: </span>
<span style="font-size:7.5pt;font-family:&quot;Arial&quot;,&quot;sans-serif&quot;"><a href="mailto:eprints-tech@ecs.soton.ac.uk">eprints-tech@ecs.soton.ac.uk</a></span><br>
<span style="font-size:7.5pt;font-family:&quot;Arial&quot;,&quot;sans-serif&quot;;color:#5F5F5F">Date:
</span><span style="font-size:7.5pt;font-family:&quot;Arial&quot;,&quot;sans-serif&quot;">18/02/2016 10:14</span><br>
<span style="font-size:7.5pt;font-family:&quot;Arial&quot;,&quot;sans-serif&quot;;color:#5F5F5F">Subject:
</span><span style="font-size:7.5pt;font-family:&quot;Arial&quot;,&quot;sans-serif&quot;">[EP-tech] Re: Searching fails when database field contains Å (utf8 %c3%85)</span><br>
<span style="font-size:7.5pt;font-family:&quot;Arial&quot;,&quot;sans-serif&quot;;color:#5F5F5F">Sent by:
</span><span style="font-size:7.5pt;font-family:&quot;Arial&quot;,&quot;sans-serif&quot;"><a href="mailto:eprints-tech-bounces@ecs.soton.ac.uk">eprints-tech-bounces@ecs.soton.ac.uk</a></span><o:p></o:p></p>
<div class="MsoNormal">
<hr size="2" width="100%" noshade="" style="color:#8091A5" align="left">
</div>
<p class="MsoNormal"><br>
<br>
<br>
<span style="font-size:10.0pt;font-family:&quot;Arial&quot;,&quot;sans-serif&quot;">Hi,</span><br>
<span style="font-size:10.0pt;font-family:&quot;Arial&quot;,&quot;sans-serif&quot;"><br>
we can reproduce the behavior:</span><br>
<span style="font-size:10.0pt;font-family:&quot;Arial&quot;,&quot;sans-serif&quot;"><br>
Advanced search (which goes to the SQL index): Ågren, ågren, &quot;Ågren&quot; and &quot;ågren&quot; all fail</span><br>
<span style="font-size:10.0pt;font-family:&quot;Arial&quot;,&quot;sans-serif&quot;"><br>
Quick search (which goes to the Xapian index): both creators_name:ågren and creators_name:Ågren find results (creators_name is the field name we use for authors)</span><br>
<span style="font-size:10.0pt;font-family:&quot;Arial&quot;,&quot;sans-serif&quot;"><br>
perl_lib/EPrints/Index/Tokenizer.pm contains a translation list that maps Unicode characters to ASCII, and Å is missing from it. Maybe this is the cause?</span><br>
<span style="font-size:10.0pt;font-family:&quot;Arial&quot;,&quot;sans-serif&quot;"><br>
Best regards,</span><br>
<span style="font-size:10.0pt;font-family:&quot;Arial&quot;,&quot;sans-serif&quot;"><br>
Martin</span><br>
<span style="font-size:10.0pt;font-family:&quot;Arial&quot;,&quot;sans-serif&quot;"><br>
--<br>
Dr. Martin Brändle<br>
Zentrale Informatik<br>
Universität Zürich<br>
Stampfenbachstr. 73<br>
CH-8006 Zürich</span><br>
<br>
<br>
<span style="font-size:7.5pt;font-family:&quot;Arial&quot;,&quot;sans-serif&quot;;color:#5F5F5F"><br>
From: </span><span style="font-size:7.5pt;font-family:&quot;Arial&quot;,&quot;sans-serif&quot;">Christer Enkvist &lt;<a href="mailto:christer.enkvist@slu.se">christer.enkvist@slu.se</a>&gt;<span style="color:#5F5F5F"><br>
To: </span>&quot;<a href="mailto:eprints-tech@ecs.soton.ac.uk">eprints-tech@ecs.soton.ac.uk</a>&quot; &lt;<a href="mailto:eprints-tech@ecs.soton.ac.uk">eprints-tech@ecs.soton.ac.uk</a>&gt;<span style="color:#5F5F5F"><br>
Date: </span>17/02/2016 17:20<span style="color:#5F5F5F"><br>
Subject: </span>[EP-tech] Searching fails when database field contains Å (utf8 %c3%85)<span style="color:#5F5F5F"><br>
Sent by: </span><a href="mailto:eprints-tech-bounces@ecs.soton.ac.uk">eprints-tech-bounces@ecs.soton.ac.uk</a></span><o:p></o:p></p>
<div class="MsoNormal">
<hr size="2" width="100%" noshade="" style="color:#A0A0A0" align="left">
</div>
<p class="MsoNormal" style="margin-bottom:12.0pt"><br>
<br>
<span style="font-size:10.0pt;font-family:&quot;Arial&quot;,&quot;sans-serif&quot;"><br>
Hello all!<br>
<br>
I have encountered a weird UTF-8 related problem when querying names in the advanced search. If the name of an author contains Å (A with ring above, UTF-8 %c3%85), as in Ångström, the query fails. I have not seen the problem with any other character: there is no problem with “å” (a with ring above, %c3%a5) or with any other non-A-Z letter such as ä, Ä, ö or Ö. The problem occurs when the database entry itself contains an Å, typically as the first character of a name like Ångström or in a hyphenated name like Per-Åke.<br>
<br>
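For reference, a small sketch showing where the %c3%85 and %c3%a5 values come from: they are the percent-encoded UTF-8 bytes of the code points U+00C5 and U+00E5. The helper name percent_utf8 is hypothetical and is used only for illustration.<br>
<br>
```perl
use strict;
use warnings;
use Encode qw(encode);

# Return the percent-encoded UTF-8 bytes of a single Unicode code point.
sub percent_utf8 {
    my ($codepoint) = @_;
    my $bytes = encode('UTF-8', chr($codepoint));
    return join '', map { sprintf('%%%02x', ord($_)) } split //, $bytes;
}

print percent_utf8(0x00c5), "\n";    # Å: %c3%85
print percent_utf8(0x00e5), "\n";    # å: %c3%a5
```
<br>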
Furthermore, if the query term contains an “Å”, the search fails. A few examples:<br>
<br>
Mårten -- works<br>
mårten -- works<br>
MåRTEN -- works<br>
MÅRTEN -- fails<br>
mÅrten -- fails<br>
<br>
The query field is (normally) case insensitive, so it shouldn’t matter if I write “ångström” or “Ångström”. However, whether a search hits or misses depends on whether the database entry has an Å and/or the query term contains an Å; it seems that EPrints cannot handle “Å”. The name is always displayed correctly and is written correctly to the database. The only problem is the advanced search.<br>
<br>
I should add that querying the database directly with SQL works without any problems (including all upper/lower case combinations). Any ideas what may be wrong with EPrints and where to start looking?<br>
<br>
Regards,<br>
Christer<br>
<br>
</span><b><span style="font-size:10.0pt;font-family:&quot;Calibri&quot;,&quot;sans-serif&quot;"><br>
Christer Enkvist, Ph D</span></b><span style="font-size:10.0pt;font-family:&quot;Calibri&quot;,&quot;sans-serif&quot;"><br>
System Administrator/System Librarian<br>
Division of Scholarly Communication <br>
Swedish University of Agricultural Sciences<br>
Uppsala, Sweden<br>
<br>
Telephone: 018-671042<br>
</span><tt><span style="font-size:10.0pt">*** Options: </span></tt><a href="http://mailman.ecs.soton.ac.uk/mailman/listinfo/eprints-tech"><tt><span style="font-size:10.0pt">http://mailman.ecs.soton.ac.uk/mailman/listinfo/eprints-tech</span></tt></a><span style="font-size:10.0pt;font-family:&quot;Courier New&quot;"><br>
<tt>*** Archive: </tt></span><a href="http://www.eprints.org/tech.php/"><tt><span style="font-size:10.0pt">http://www.eprints.org/tech.php/</span></tt></a><span style="font-size:10.0pt;font-family:&quot;Courier New&quot;"><br>
<tt>*** EPrints community wiki: </tt></span><a href="http://wiki.eprints.org/"><tt><span style="font-size:10.0pt">http://wiki.eprints.org/</span></tt></a><span style="font-size:10.0pt;font-family:&quot;Courier New&quot;"><br>
<tt>*** EPrints developers Forum: </tt></span><a href="http://forum.eprints.org/"><tt><span style="font-size:10.0pt">http://forum.eprints.org/</span></tt></a><br>
<o:p></o:p></p>
</div>
</body>
</html>