<html><body>
<p><font size="2" face="sans-serif">I have added </font><br>
<br>
<font size="2" face="sans-serif">chr(0x00c5) => 'A', # Å</font><br>
<font size="2" face="sans-serif">chr(0xc385) => 'A', # Å</font><br>
<br>
<font size="2" face="sans-serif">to the $EPrints::Index::FREETEXT_CHAR_MAPPING list in Tokenizer.pm, restarted both the web server and indexer process, and reindexed the eprint that contained Ågren as author.</font><br>
<br>
<font size="2" face="sans-serif">Advanced Search now works with Å.</font><br>
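<br>
<font size="2" face="sans-serif">For anyone checking the two entries above, here is a minimal sketch of the difference between a Unicode code point and its UTF-8 bytes in Perl. It is not the EPrints code: %char_mapping and fold_chars are hypothetical names, and it assumes the tokenizer sees decoded character strings rather than raw bytes. Under that assumption, chr(0x00c5) is the character Å, whereas chr(0xc385) is the single character at code point U+C385, not the byte pair C3 85.</font><br>
<pre>
```perl
use strict;
use warnings;
use utf8;                 # this source file contains literal non-ASCII characters
use Encode qw(encode);

# Hypothetical one-entry mapping for illustration only;
# the real list in Tokenizer.pm is much longer.
my %char_mapping = ( chr(0x00c5) => 'A' );   # U+00C5, A with ring above

# Replace every character that has a mapping entry; leave all others unchanged.
sub fold_chars {
    my ($text, $mapping) = @_;
    return join '', map { exists $mapping->{$_} ? $mapping->{$_} : $_ } split //, $text;
}

binmode STDOUT, ':encoding(UTF-8)';
printf "code point of Å: U+%04X\n", ord('Å');
printf "UTF-8 bytes of Å: %s\n", uc(unpack('H*', encode('UTF-8', 'Å')));
printf "chr(0xc385) is Å: %s\n", chr(0xc385) eq 'Å' ? 'yes' : 'no';
printf "folded: %s\n", fold_chars('Ågren', \%char_mapping);
```
</pre>
<font size="2" face="sans-serif">Saved as a UTF-8 file and run with perl, this should report U+00C5 as the code point, C385 as the UTF-8 bytes, "no" for the chr(0xc385) comparison, and Agren as the folded name.</font><br>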
<br>
<font size="2" face="sans-serif">Kind regards,</font><br>
<br>
<font size="2" face="sans-serif">Martin</font><br>
<br>
<font size="2" face="sans-serif">--</font><br>
<font size="2" face="sans-serif">Dr. Martin Brändle</font><br>
<font size="2" face="sans-serif">Zentrale Informatik</font><br>
<font size="2" face="sans-serif">Universität Zürich</font><br>
<font size="2" face="sans-serif">Stampfenbachstr. 73</font><br>
<font size="2" face="sans-serif">CH-8006 Zürich</font><br>
<br>
<br>
<br>
<font size="1" color="#5F5F5F" face="sans-serif">From:        </font><font size="1" face="sans-serif">martin.braendle@id.uzh.ch</font><br>
<font size="1" color="#5F5F5F" face="sans-serif">To:        </font><font size="1" face="sans-serif">eprints-tech@ecs.soton.ac.uk</font><br>
<font size="1" color="#5F5F5F" face="sans-serif">Date:        </font><font size="1" face="sans-serif">18/02/2016 10:14</font><br>
<font size="1" color="#5F5F5F" face="sans-serif">Subject:        </font><font size="1" face="sans-serif">[EP-tech] Re: Searching fails when database field contains Å (utf8 %c3%85)</font><br>
<font size="1" color="#5F5F5F" face="sans-serif">Sent by:        </font><font size="1" face="sans-serif">eprints-tech-bounces@ecs.soton.ac.uk</font><br>
<hr width="100%" size="2" align="left" noshade style="color:#8091A5; "><br>
<br>
<br>
<font size="2" face="sans-serif">Hi,</font><font size="3" face="serif"><br>
</font><font size="2" face="sans-serif"><br>
we can reproduce the behavior:</font><font size="3" face="serif"><br>
</font><font size="2" face="sans-serif"><br>
Advanced search (which goes to the SQL index): Ågren, ågren, "Ågren" and "ågren" all fail</font><font size="3" face="serif"><br>
</font><font size="2" face="sans-serif"><br>
Quick search (which goes to the Xapian index): both creators_name:ågren and creators_name:Ågren find results (creators_name is the field name we use for authors).</font><font size="3" face="serif"><br>
</font><font size="2" face="sans-serif"><br>
perl_lib/EPrints/Index/Tokenizer.pm contains a translation list that maps Unicode characters to ASCII - Å is missing there. Maybe this is the clue?</font><font size="3" face="serif"><br>
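</font><font size="2" face="sans-serif"><br>
To find other characters that might be missing from the translation list, a check along these lines could help. This is only a sketch: %char_mapping here is a hypothetical, deliberately incomplete stand-in for the real list in Tokenizer.pm, and unmapped_chars is a made-up helper name.</font>
<pre>
```perl
use strict;
use warnings;
use utf8;                 # this source file contains literal non-ASCII characters

# Hypothetical, deliberately incomplete mapping; NOT the real list from Tokenizer.pm.
my %char_mapping = (
    'å' => 'a',
    'ä' => 'a',
    'Ä' => 'A',
    'ö' => 'o',
    'Ö' => 'O',
);

# Return each non-ASCII character of $text (once) that has no entry in the mapping.
sub unmapped_chars {
    my ($text, $mapping) = @_;
    my %seen;
    return grep { ord($_) > 127 and not exists $mapping->{$_} and not $seen{$_}++ }
           split //, $text;
}

binmode STDOUT, ':encoding(UTF-8)';
for my $name ('Ångström', 'Per-Åke', 'Mårten') {
    my @missing = unmapped_chars($name, \%char_mapping);
    printf "%s: %s\n", $name, @missing ? "unmapped: @missing" : 'fully mapped';
}
```
</pre>
<font size="2" face="sans-serif">With this sample mapping, Ångström and Per-Åke are reported with Å unmapped, while Mårten is reported as fully mapped.</font><font size="3" face="serif"><br>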
</font><font size="2" face="sans-serif"><br>
Best regards,</font><font size="3" face="serif"><br>
</font><font size="2" face="sans-serif"><br>
Martin</font><font size="3" face="serif"><br>
</font><font size="2" face="sans-serif"><br>
--<br>
Dr. Martin Brändle<br>
Zentrale Informatik<br>
Universität Zürich<br>
Stampfenbachstr. 73<br>
CH-8006 Zürich</font><font size="3" face="serif"><br>
<br>
<br>
</font><font size="3" face="serif"><br>
</font><font size="1" color="#5F5F5F" face="sans-serif"><br>
From: </font><font size="1" face="sans-serif">Christer Enkvist &lt;christer.enkvist@slu.se&gt;</font><font size="1" color="#5F5F5F" face="sans-serif"><br>
To: </font><font size="1" face="sans-serif">"eprints-tech@ecs.soton.ac.uk" &lt;eprints-tech@ecs.soton.ac.uk&gt;</font><font size="1" color="#5F5F5F" face="sans-serif"><br>
Date: </font><font size="1" face="sans-serif">17/02/2016 17:20</font><font size="1" color="#5F5F5F" face="sans-serif"><br>
Subject: </font><font size="1" face="sans-serif">[EP-tech] Searching fails when database field contains Å (utf8 %c3%85)</font><font size="1" color="#5F5F5F" face="sans-serif"><br>
Sent by: </font><font size="1" face="sans-serif">eprints-tech-bounces@ecs.soton.ac.uk</font><font size="3" face="serif"><br>
</font><hr width="100%" size="2" align="left" noshade><font size="3" face="serif"><br>
<br>
</font><font size="2" face="Arial"><br>
Hello all!<br>
<br>
I have encountered a weird UTF-8 related problem when querying names in the advanced search. If the name of an author contains Å (UTF-8 %c3%85, A with a ring above), as in Ångström, then querying fails. I have not seen the problem for any other character, e.g. no problem with “å” (a with a ring above, %c3%a5) or any other non A-Z letter such as ä, Ä, ö, or Ö. The problem occurs when the database entry itself contains an Å, which is typically when the character is the first in the name, as in Ångström, or in a hyphenated name like Per-Åke.<br>
<br>
Furthermore, if the query term contains an “Å” then it will fail. A few examples:<br>
<br>
Mårten -- works<br>
mårten -- works<br>
MåRTEN -- works<br>
MÅRTEN -- fails<br>
mÅrten -- fails<br>
<br>
The query field is (normally) case insensitive, so it shouldn’t matter whether I write “ångström” or “Ångström”. However, hit or miss in this case depends on whether the database entry has an Å and/or the query term contains an Å; it seems that EPrints cannot handle “Å”. The name always displays correctly and is written correctly to the database. The only problem is the advanced search.<br>
<br>
I should add that querying the database using SQL works without any problems (including all upper/lower case combinations). Any ideas what may be wrong with EPrints and where to start looking? <br>
<br>
Regards,<br>
Christer<br>
<br>
</font><font size="2" face="Calibri"><b><br>
Christer Enkvist, Ph D</b></font><font size="2" face="Calibri"><br>
System Administrator/System Librarian<br>
Division of Scholarly Communication <br>
Swedish University of Agricultural Sciences<br>
Uppsala, Sweden<br>
<br>
Telephone: 018-671042<br>
</font><tt><font size="2">*** Options: </font></tt><a href="http://mailman.ecs.soton.ac.uk/mailman/listinfo/eprints-tech"><tt><font size="2" color="#0000FF"><u>http://mailman.ecs.soton.ac.uk/mailman/listinfo/eprints-tech</u></font></tt></a><tt><font size="2"><br>
*** Archive: </font></tt><a href="http://www.eprints.org/tech.php/"><tt><font size="2" color="#0000FF"><u>http://www.eprints.org/tech.php/</u></font></tt></a><tt><font size="2"><br>
*** EPrints community wiki: </font></tt><a href="http://wiki.eprints.org/"><tt><font size="2" color="#0000FF"><u>http://wiki.eprints.org/</u></font></tt></a><tt><font size="2"><br>
*** EPrints developers Forum: </font></tt><a href="http://forum.eprints.org/"><tt><font size="2" color="#0000FF"><u>http://forum.eprints.org/</u></font></tt></a><font size="3" face="serif"><br>
</font><tt><font size="2"><br>
</font></tt><br>
</body></html>