<html xmlns:v="urn:schemas-microsoft-com:vml" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:w="urn:schemas-microsoft-com:office:word" xmlns:m="http://schemas.microsoft.com/office/2004/12/omml" xmlns="http://www.w3.org/TR/REC-html40">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<meta name="Generator" content="Microsoft Word 14 (filtered medium)">
<!--[if !mso]><style>v\:* {behavior:url(#default#VML);}
o\:* {behavior:url(#default#VML);}
w\:* {behavior:url(#default#VML);}
.shape {behavior:url(#default#VML);}
</style><![endif]--><style><!--
/* Font Definitions */
@font-face
        {font-family:Wingdings;
        panose-1:5 0 0 0 0 0 0 0 0 0;}
@font-face
        {font-family:Calibri;
        panose-1:2 15 5 2 2 2 4 3 2 4;}
@font-face
        {font-family:Tahoma;
        panose-1:2 11 6 4 3 5 4 4 2 4;}
/* Style Definitions */
p.MsoNormal, li.MsoNormal, div.MsoNormal
        {margin:0cm;
        margin-bottom:.0001pt;
        font-size:12.0pt;
        font-family:"Times New Roman","serif";}
a:link, span.MsoHyperlink
        {mso-style-priority:99;
        color:blue;
        text-decoration:underline;}
a:visited, span.MsoHyperlinkFollowed
        {mso-style-priority:99;
        color:purple;
        text-decoration:underline;}
p
        {mso-style-priority:99;
        mso-margin-top-alt:auto;
        margin-right:0cm;
        mso-margin-bottom-alt:auto;
        margin-left:0cm;
        font-size:12.0pt;
        font-family:"Times New Roman","serif";}
tt
        {mso-style-priority:99;
        font-family:"Courier New";}
span.EmailStyle19
        {mso-style-type:personal-reply;
        font-family:"Arial","sans-serif";
        color:#1F497D;}
.MsoChpDefault
        {mso-style-type:export-only;
        font-size:10.0pt;}
@page WordSection1
        {size:612.0pt 792.0pt;
        margin:72.0pt 72.0pt 72.0pt 72.0pt;}
div.WordSection1
        {page:WordSection1;}
--></style><!--[if gte mso 9]><xml>
<o:shapedefaults v:ext="edit" spidmax="1026" />
</xml><![endif]--><!--[if gte mso 9]><xml>
<o:shapelayout v:ext="edit">
<o:idmap v:ext="edit" data="1" />
</o:shapelayout></xml><![endif]-->
</head>
<body lang="SV" link="blue" vlink="purple">
<div class="WordSection1">
<p class="MsoNormal"><span lang="EN-GB" style="font-size:11.0pt;font-family:"Arial","sans-serif";color:#1F497D">Hi,<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-GB" style="font-size:11.0pt;font-family:"Arial","sans-serif";color:#1F497D"><o:p> </o:p></span></p>
<p class="MsoNormal"><span lang="EN-GB" style="font-size:11.0pt;font-family:"Arial","sans-serif";color:#1F497D">Thank you! Works like a charm
</span><span lang="EN-GB" style="font-size:11.0pt;font-family:"Arial","sans-serif";color:#1F497D">:-)</span><span lang="EN-GB" style="font-size:11.0pt;font-family:"Arial","sans-serif";color:#1F497D"><o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-GB" style="font-size:11.0pt;font-family:"Arial","sans-serif";color:#1F497D"><o:p> </o:p></span></p>
<p class="MsoNormal"><span lang="EN-GB" style="font-size:11.0pt;font-family:"Arial","sans-serif";color:#1F497D">/Christer<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-GB" style="font-size:11.0pt;font-family:"Arial","sans-serif";color:#1F497D"><o:p> </o:p></span></p>
<div>
<div style="border:none;border-top:solid #B5C4DF 1.0pt;padding:3.0pt 0cm 0cm 0cm">
<p class="MsoNormal"><b><span lang="EN-US" style="font-size:10.0pt;font-family:"Tahoma","sans-serif"">From:</span></b><span lang="EN-US" style="font-size:10.0pt;font-family:"Tahoma","sans-serif""> eprints-tech-bounces@ecs.soton.ac.uk [mailto:eprints-tech-bounces@ecs.soton.ac.uk]
<b>On Behalf Of </b>martin.braendle@id.uzh.ch<br>
<b>Sent:</b> 18 February 2016 11:22<br>
<b>To:</b> eprints-tech@ecs.soton.ac.uk<br>
<b>Subject:</b> [EP-tech] Re: Re: Searching fails when database field contains Å (utf8 %c3%85)<o:p></o:p></span></p>
</div>
</div>
<p class="MsoNormal"><o:p> </o:p></p>
<p><span style="font-size:10.0pt;font-family:"Arial","sans-serif"">I have added </span>
<br>
<br>
<span style="font-size:10.0pt;font-family:"Arial","sans-serif"">chr(0x00c5) => 'A', # Å</span><br>
<span style="font-size:10.0pt;font-family:"Arial","sans-serif"">chr(0xc385) => 'A', # Å</span><br>
<br>
<span style="font-size:10.0pt;font-family:"Arial","sans-serif"">to the $EPrints::Index::FREETEXT_CHAR_MAPPING list in Tokenizer.pm, restarted both the web server and indexer process, and reindexed the eprint that contained Ågren as author.</span><br>
<br>
<span style="font-size:10.0pt;font-family:"Arial","sans-serif"">Advanced Search now works with Å.</span><br>
<br>
<span style="font-size:10.0pt;font-family:"Arial","sans-serif"">Kind regards,</span><br>
<br>
<span style="font-size:10.0pt;font-family:"Arial","sans-serif"">Martin</span><br>
<br>
<span style="font-size:10.0pt;font-family:"Arial","sans-serif"">--</span><br>
<span style="font-size:10.0pt;font-family:"Arial","sans-serif"">Dr. Martin Brändle</span><br>
<span style="font-size:10.0pt;font-family:"Arial","sans-serif"">Zentrale Informatik</span><br>
<span style="font-size:10.0pt;font-family:"Arial","sans-serif"">Universität Zürich</span><br>
<span style="font-size:10.0pt;font-family:"Arial","sans-serif"">Stampfenbachstr. 73</span><br>
<span style="font-size:10.0pt;font-family:"Arial","sans-serif"">CH-8006 Zürich</span><br>
<br>
<br>
<br>
<span style="font-size:7.5pt;font-family:"Arial","sans-serif";color:#5F5F5F">From:
</span><span style="font-size:7.5pt;font-family:"Arial","sans-serif""><a href="mailto:martin.braendle@id.uzh.ch">martin.braendle@id.uzh.ch</a></span><br>
<span style="font-size:7.5pt;font-family:"Arial","sans-serif";color:#5F5F5F">To: </span>
<span style="font-size:7.5pt;font-family:"Arial","sans-serif""><a href="mailto:eprints-tech@ecs.soton.ac.uk">eprints-tech@ecs.soton.ac.uk</a></span><br>
<span style="font-size:7.5pt;font-family:"Arial","sans-serif";color:#5F5F5F">Date:
</span><span style="font-size:7.5pt;font-family:"Arial","sans-serif"">18/02/2016 10:14</span><br>
<span style="font-size:7.5pt;font-family:"Arial","sans-serif";color:#5F5F5F">Subject:
</span><span style="font-size:7.5pt;font-family:"Arial","sans-serif"">[EP-tech] Re: Searching fails when database field contains Å (utf8 %c3%85)</span><br>
<span style="font-size:7.5pt;font-family:"Arial","sans-serif";color:#5F5F5F">Sent by:
</span><span style="font-size:7.5pt;font-family:"Arial","sans-serif""><a href="mailto:eprints-tech-bounces@ecs.soton.ac.uk">eprints-tech-bounces@ecs.soton.ac.uk</a></span><o:p></o:p></p>
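<p>A note on the two keys added to $EPrints::Index::FREETEXT_CHAR_MAPPING in the message above: in Perl, chr() takes a Unicode code point, so chr(0x00c5) is Å (U+00C5), while chr(0xc385) is the single character U+C385 rather than the UTF-8 byte pair C3 85. The stand-alone sketch below uses only the core Encode module, and its variable names are made up for illustration. Whether the second entry has any effect inside EPrints depends on how the tokenizer receives its input, which this thread does not establish.</p>
<pre>
```perl
use strict;
use warnings;
use Encode qw(encode);

# Illustrative variable names; nothing here is taken from EPrints itself.
my $a_ring     = chr(0x00c5);    # one character, code point U+00C5
my $other_char = chr(0xc385);    # also one character, code point U+C385

# The UTF-8 encoding of U+00C5 is the byte pair C3 85
# (the "%c3%85" in the subject line of this thread).
my $hex = uc unpack( 'H*', encode( 'UTF-8', $a_ring ) );

printf "U+00C5 encoded as UTF-8: %s\n", $hex;                  # C385
printf "chr(0xc385): length %d, code point U+%04X\n",
    length($other_char), ord($other_char);                     # 1, U+C385
print "same character? ",
    ( $a_ring eq $other_char ? "yes" : "no" ), "\n";           # no
```
</pre>
<p>Under the assumption that the mapping is applied to decoded character strings, the chr(0x00c5) entry would be the one that matches Å.</p>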
<div class="MsoNormal">
<hr size="2" width="100%" noshade="" style="color:#8091A5" align="left">
</div>
<p class="MsoNormal"><br>
<br>
<br>
<span style="font-size:10.0pt;font-family:"Arial","sans-serif"">Hi,</span><br>
<span style="font-size:10.0pt;font-family:"Arial","sans-serif""><br>
we can reproduce the behavior:</span><br>
<span style="font-size:10.0pt;font-family:"Arial","sans-serif""><br>
Advanced search (which goes to the SQL index): Ågren, ågren, "Ågren" and "ågren" all fail</span><br>
<span style="font-size:10.0pt;font-family:"Arial","sans-serif""><br>
Quick search (which goes to the Xapian index): both creators_name:ågren and creators_name:Ågren find results (creators_name is the field name we use for authors).</span><br>
<span style="font-size:10.0pt;font-family:"Arial","sans-serif""><br>
perl_lib/EPrints/Index/Tokenizer.pm contains a translation list that maps Unicode characters to ASCII, and Å is missing from it. Perhaps that is the cause?</span><br>
<span style="font-size:10.0pt;font-family:"Arial","sans-serif""><br>
Best regards,</span><br>
<span style="font-size:10.0pt;font-family:"Arial","sans-serif""><br>
Martin</span><br>
<span style="font-size:10.0pt;font-family:"Arial","sans-serif""><br>
--<br>
Dr. Martin Brändle<br>
Zentrale Informatik<br>
Universität Zürich<br>
Stampfenbachstr. 73<br>
CH-8006 Zürich</span><br>
<br>
<br>
<span style="font-size:7.5pt;font-family:"Arial","sans-serif";color:#5F5F5F"><br>
From: </span><span style="font-size:7.5pt;font-family:"Arial","sans-serif"">Christer Enkvist <<a href="mailto:christer.enkvist@slu.se">christer.enkvist@slu.se</a>><span style="color:#5F5F5F"><br>
To: </span>"<a href="mailto:eprints-tech@ecs.soton.ac.uk">eprints-tech@ecs.soton.ac.uk</a>" <<a href="mailto:eprints-tech@ecs.soton.ac.uk">eprints-tech@ecs.soton.ac.uk</a>><span style="color:#5F5F5F"><br>
Date: </span>17/02/2016 17:20<span style="color:#5F5F5F"><br>
Subject: </span>[EP-tech] Searching fails when database field contains Å (utf8 %c3%85)<span style="color:#5F5F5F"><br>
Sent by: </span><a href="mailto:eprints-tech-bounces@ecs.soton.ac.uk">eprints-tech-bounces@ecs.soton.ac.uk</a></span><o:p></o:p></p>
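<p>The hypothesis in the message above, that Å is missing from the translation list in Tokenizer.pm, can be illustrated with a small stand-alone Perl sketch. The table and the fold_chars helper are hypothetical stand-ins, not the real EPrints code, and the sketch assumes that lowercase å is mapped while uppercase Å is not.</p>
<pre>
```perl
use strict;
use warnings;

# Hypothetical, trimmed-down translation table (not the real EPrints list):
# lowercase a-ring is mapped, uppercase A-ring (U+00C5) is deliberately absent.
my %mapping = (
    chr(0x00e5) => 'a',    # lowercase a with ring above
);

# Hypothetical helper: replace mapped characters, pass everything else through.
sub fold_chars {
    my ($text) = @_;
    my $out = '';
    for my $char ( split //, $text ) {
        $out .= exists $mapping{$char} ? $mapping{$char} : $char;
    }
    return $out;
}

my $lower = fold_chars("M\x{e5}rten");    # lowercase a-ring is folded
my $upper = fold_chars("M\x{c5}RTEN");    # uppercase A-ring passes through

print "lowercase form folds to: $lower\n";                        # Marten
print "uppercase form still contains U+00C5: ",
    ( index( $upper, chr(0x00c5) ) >= 0 ? "yes" : "no" ), "\n";   # yes

# Adding the missing entry lets the uppercase form fold as well.
$mapping{ chr(0x00c5) } = 'A';
print "after adding the entry: ", fold_chars("M\x{c5}RTEN"), "\n";   # MARTEN
```
</pre>
<p>If index terms and query terms both pass through such a table, a character without an entry would stay unfolded, which would be consistent with the reported pattern of "Mårten" matching and "MÅRTEN" failing.</p>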
<div class="MsoNormal">
<hr size="2" width="100%" noshade="" style="color:#A0A0A0" align="left">
</div>
<p class="MsoNormal" style="margin-bottom:12.0pt"><br>
<br>
<span style="font-size:10.0pt;font-family:"Arial","sans-serif""><br>
Hello all!<br>
<br>
I have encountered a weird UTF-8 related problem when querying names in the advanced search. If an author's name contains Å (A with ring above, UTF-8 %c3%85), as in Ångström, the query fails. I have not seen the problem with any other character: there is no problem with “å” (a with ring above, %c3%a5) or with any other non-A-Z letter such as ä, Ä, ö, or Ö. The problem occurs when the database entry itself contains an Å, which is typically when the character is the first letter of the name, as in Ångström, or in a hyphenated name like Per-Åke.<br>
<br>
Furthermore, if the query term itself contains an “Å”, the search fails. A few examples:<br>
<br>
Mårten: works<br>
mårten: works<br>
MåRTEN: works<br>
MÅRTEN: fails<br>
mÅrten: fails<br>
<br>
The query field is (normally) case-insensitive, so it shouldn’t matter whether I write “ångström” or “Ångström”. In this case, however, whether the search hits or misses depends on whether the database entry and/or the query term contains an Å; it seems that EPrints cannot handle “Å”. The character is always displayed correctly and is written correctly to the database. The only problem is the advanced search.<br>
<br>
I should add that querying the database directly with SQL works without any problems (including all upper/lower-case combinations). Any ideas what may be wrong with EPrints and where to start looking?<br>
<br>
Regards,<br>
Christer<br>
<br>
</span><b><span style="font-size:10.0pt;font-family:"Calibri","sans-serif""><br>
Christer Enkvist, Ph D</span></b><span style="font-size:10.0pt;font-family:"Calibri","sans-serif""><br>
System Administrator/System Librarian<br>
Division of Scholarly Communication <br>
Swedish University of Agricultural Sciences<br>
Uppsala, Sweden<br>
<br>
Telephone: 018-671042<br>
</span><tt><span style="font-size:10.0pt">*** Options: </span></tt><a href="http://mailman.ecs.soton.ac.uk/mailman/listinfo/eprints-tech"><tt><span style="font-size:10.0pt">http://mailman.ecs.soton.ac.uk/mailman/listinfo/eprints-tech</span></tt></a><span style="font-size:10.0pt;font-family:"Courier New""><br>
<tt>*** Archive: </tt></span><a href="http://www.eprints.org/tech.php/"><tt><span style="font-size:10.0pt">http://www.eprints.org/tech.php/</span></tt></a><span style="font-size:10.0pt;font-family:"Courier New""><br>
<tt>*** EPrints community wiki: </tt></span><a href="http://wiki.eprints.org/"><tt><span style="font-size:10.0pt">http://wiki.eprints.org/</span></tt></a><span style="font-size:10.0pt;font-family:"Courier New""><br>
<tt>*** EPrints developers Forum: </tt></span><a href="http://forum.eprints.org/"><tt><span style="font-size:10.0pt">http://forum.eprints.org/</span></tt></a><br>
<o:p></o:p></p>
</div>
</body>
</html>