[EP-tech] Re: advanced search doesn't work with utf-8 characters
Tommy Ingulfsen
tommy at library.caltech.edu
Mon Jul 8 17:23:28 BST 2013
I think you may have come across the same problem that is described in
this thread:
http://www.eprints.org/tech.php/thread-17424.html
Maybe you can try Tim's patch and see if that works for you?
tommy
On 7/5/13 6:43 AM, "Dobrica Pavlinusic" <dpavlin at rot13.org> wrote:
>I have a problem with UTF-8 characters in advanced search. None of the
>queries that contain UTF-8 characters (in Croatia we have a few of them:
>šđčćž) produce any results.
>
>I have read through the wiki and this mailing list and figured out that
>$EPrints::Index::FREETEXT_CHAR_MAPPING might be to blame. I added a
>mapping for our characters, but it didn't help (it would be nice to have
>full support for all characters without needing to edit the EPrints
>source).
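
For readers unfamiliar with the idea of a character mapping, here is a
minimal sketch in Python rather than Perl. It is not the EPrints table:
the names CROATIAN_FOLD and fold are hypothetical, and it only assumes
that such a mapping replaces single characters with ASCII stand-ins.

```python
# Hypothetical folding table for illustration; NOT the actual contents of
# $EPrints::Index::FREETEXT_CHAR_MAPPING.
CROATIAN_FOLD = str.maketrans({
    "š": "s", "đ": "d", "č": "c", "ć": "c", "ž": "z",
    "Š": "S", "Đ": "D", "Č": "C", "Ć": "C", "Ž": "Z",
})


def fold(text: str) -> str:
    """Replace each mapped character with its ASCII stand-in."""
    return text.translate(CROATIAN_FOLD)


print(fold("Agić"))     # Agic
print(fold("Bolanča"))  # Bolanca
```

A table like this can only help if the same folding is applied to both
the indexed text and the incoming query. If the query is damaged before
the mapping is consulted, adding entries will not change the result.
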
>
>Digging through the EPrints source code, I noticed that my queries
>are split on UTF-8 characters. If I uncomment the line in EPrints::Search
>that calls $self->get_conditions->describe, I can see the following
>behaviour:
>
>1. search query: "Agić" (utf-8 as last char)
>
>AND(
> =($archive.metadata_visibility,"show") ... eprint,
> =($archive.eprint_status,"archive") ... eprint,
> index($archive.creators_name,"agi") ... eprint__rindex
>)
>
>As you can see, the UTF-8 character gets dropped and the search doesn't
>produce any results. I did check the eprint__rindex table and I do have
>"agić" in there.
>
>2. search query: "Bolanča" (utf-8 as next-to-last char)
>
>AND(
> =($archive.metadata_visibility,"show") ... eprint,
> =($archive.eprint_status,"archive") ... eprint,
> AND(
>  grep($archive.creators_name,"%[bolan]%[a]%-%") ... eprint__index_grep,
>  AndSubQuery(
>   index($archive.creators_name,"bolan") ... eprint__rindex,
>   index($archive.creators_name,"a") ... eprint__rindex
>  )
> )
>)
>
>This is even worse, because the search query is split into two terms at
>the UTF-8 character.
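
The two symptoms above (a dropped final character and a term split in
two) are what you would see from a tokenizer that treats anything outside
ASCII as a word separator. The sketch below reproduces that behaviour in
Python. It is an illustration of one plausible mechanism, not the EPrints
code, and the thread does not confirm that this is the actual cause. The
names ascii_terms and unicode_terms are hypothetical.

```python
import re

# Only ASCII letters and digits count as word characters here, so every
# non-ASCII character acts as a separator.
ASCII_WORD = re.compile(r"[a-z0-9]+")

# For str patterns in Python 3, \w is Unicode-aware and matches letters
# such as "ć" and "č" (assuming precomposed characters).
UNICODE_WORD = re.compile(r"\w+")


def ascii_terms(query: str) -> list[str]:
    """Split a query into terms using the ASCII-only word pattern."""
    return ASCII_WORD.findall(query.lower())


def unicode_terms(query: str) -> list[str]:
    """Split a query into terms using the Unicode-aware word pattern."""
    return UNICODE_WORD.findall(query.lower())


print(ascii_terms("Agić"))       # ['agi']
print(ascii_terms("Bolanča"))    # ['bolan', 'a']
print(unicode_terms("Agić"))     # ['agić']
print(unicode_terms("Bolanča"))  # ['bolanča']
```

The ASCII-only output matches the terms in both describe dumps ("agi",
then "bolan" and "a"), which suggests looking for a place where the query
string is tokenized before it has been decoded as UTF-8 characters.
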
>
>I have spent the last three days inserting warns here and there in the
>source code in an effort to find out where this splitting is happening,
>but I have hit a brick wall with this problem.
>
>I would appreciate any info or pointers on how to resolve this problem.
>
>--
>Dobrica Pavlinusic 2share!2flame
>dpavlin at rot13.org
>Unix addict. Internet consultant.
>http://www.rot13.org/~dpavlin