[EP-tech] Re: Searching fails when database field contains Å (utf8 %c3%85)

martin.braendle at id.uzh.ch martin.braendle at id.uzh.ch
Thu Feb 18 09:11:45 GMT 2016


Hi,

We can reproduce the behavior:

Advanced search (which goes to the SQL index): Ågren, ågren, "Ågren" and
"ågren" all fail.

Quick search (which goes to the Xapian index): both creators_name:ågren and
creators_name:Ågren find results (creators_name is the field name we use
for authors).

perl_lib/EPrints/Index/Tokenizer.pm contains a translation list that maps
Unicode characters to ASCII - Å is missing there. Maybe this is the clue?
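
To make the suspicion concrete, here is a minimal sketch of how one missing
entry in such a translation list could produce mismatched index terms. It is
not the EPrints code: the TRANSLIT table, the normalize() function and the
ASCII-only lowercasing step are all assumptions made for illustration, and
the sketch is written in Python rather than Perl. It shows one possible
mechanism, not a confirmed diagnosis.

```python
# Hypothetical transliteration table, NOT copied from Tokenizer.pm.
# The capital "Å" is deliberately left out to mimic the suspected gap.
TRANSLIT = {"å": "a", "ä": "a", "ö": "o", "Ä": "A", "Ö": "O"}


def normalize(term):
    """Map known non-ASCII letters to ASCII, then lowercase ASCII letters only.

    The ASCII-only lowercasing is an assumption: it models a tokenizer that
    relies on the table for everything outside A-Z.
    """
    mapped = "".join(TRANSLIT.get(ch, ch) for ch in term)
    return "".join(ch.lower() if ch.isascii() else ch for ch in mapped)


if __name__ == "__main__":
    indexed = normalize("Mårten")  # term as it would be stored in the index
    for query in ["Mårten", "mårten", "MåRTEN", "MÅRTEN", "mÅrten"]:
        verdict = "matches" if normalize(query) == indexed else "misses"
        print(query, verdict)
```

Under these assumptions, any term containing "Å" keeps the unmapped
character and therefore never compares equal to a term in which "å" was
reduced to "a". Adding an "Å" entry to the table makes the two forms agree
again.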

Best regards,

Martin

--
Dr. Martin Brändle
Zentrale Informatik
Universität Zürich
Stampfenbachstr. 73
CH-8006 Zürich




From:	Christer Enkvist <christer.enkvist at slu.se>
To:	"eprints-tech at ecs.soton.ac.uk" <eprints-tech at ecs.soton.ac.uk>
Date:	17/02/2016 17:20
Subject:	[EP-tech] Searching fails when database field contains Å (utf8
            %c3%85)
Sent by:	eprints-tech-bounces at ecs.soton.ac.uk



Hello all!

I have encountered a weird UTF-8 related problem when querying names in the
advanced search.  If the name of an author contains Å (A with ring above,
UTF-8 %c3%85), as in Ångström, then the query fails.  I have not seen the
problem with any other character: there is no problem with "å" (a with ring
above, %c3%a5) or with other non-A-Z letters such as ä, Ä, ö, or Ö.  The
problem occurs when the database entry itself contains an Å, which is
typically when the character is the first in the name, as in Ångström, or
in a hyphenated name like Per-Åke.
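
For reference, the byte values quoted above can be checked with a few lines
of Python. This is only a sketch for verifying the encodings; the describe()
helper is a name made up for this example and has nothing to do with
EPrints.

```python
import unicodedata
from urllib.parse import quote


def describe(ch):
    """Return (code point, percent-encoded UTF-8 bytes, Unicode name)."""
    return (f"U+{ord(ch):04X}", quote(ch), unicodedata.name(ch))


if __name__ == "__main__":
    for ch in ("\u00c5", "\u00e5"):  # capital and small a with ring above
        print(*describe(ch))
```

quote() writes the hex digits in upper case, so the capital letter comes out
as %C3%85 and the small letter as %C3%A5, which are the same bytes as the
%c3%85 and %c3%a5 mentioned above.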

Furthermore, if the query term itself contains an "Å", the search fails.  A
few examples:

Mårten -- works
mårten -- works
MåRTEN -- works
MÅRTEN -- fails
mÅrten -- fails

The query field is (normally) case insensitive, so it shouldn't matter
whether I write "ångström" or "Ångström".  However, hit or miss in this
case depends on whether the database entry has an Å and/or the query term
contains an Å; it seems that EPrints cannot handle "Å".  The name is always
displayed correctly and is written correctly into the database.  The only
problem is the advanced search.

I should add that querying the database directly with SQL works without any
problems (including all upper/lower-case combinations).  Any ideas what may
be wrong with EPrints and where to start looking?

Regards,
Christer


Christer Enkvist, Ph D
System Administrator/System Librarian
Division of Scholarly Communication
Swedish University of Agricultural Sciences
Uppsala, Sweden

Telephone: 018-671042
 *** Options: http://mailman.ecs.soton.ac.uk/mailman/listinfo/eprints-tech
*** Archive: http://www.eprints.org/tech.php/
*** EPrints community wiki: http://wiki.eprints.org/
*** EPrints developers Forum: http://forum.eprints.org/

