[EP-tech] Re: international character search problem

Tommy Ingulfsen tommy at library.caltech.edu
Thu Jan 24 17:20:55 GMT 2013

Hi and thanks for putting up a patch on Git so quickly. I'm sorry to say
that I ran into another problem when I patched our server with the new
perl_lib/EPrints/MetaField/Name.pm. Previously, the regular expression
that splits up initials was located after the test for whether we're doing
a simple search (as opposed to an advanced search) - this is the new
version of the code I'm talking about:

	# split up initials
	$v2 =~ s/([\p{Uppercase}])/ $1/g;

	# name searches are case sensitive
	$v2 = "\L$v2";

	if( $search_mode eq "simple" )
	{
		return EPrints::Search::Condition->new(
			$v2 );
	}

Now, if I do a simple search for e.g. "James", the substitution that
splits up initials puts a space in front of the capital "J", so the search
is performed for " James" (with a leading space), which doesn't work so
well. I'm not entirely sure what the intention of all of the code is, so I
don't have a fix for this myself yet.
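
The effect can be reproduced outside EPrints with a short standalone
script. This is only a sketch: split_initials_current and
split_initials_guarded are hypothetical helper names, not EPrints
functions, and the "guarded" variant is an untested assumption about one
way to avoid the leading space, not a confirmed fix.

```perl
use strict;
use warnings;
use utf8;

# Hypothetical helper (not part of EPrints): repeats the two
# normalisation steps from the snippet above on a plain string.
sub split_initials_current {
    my ($v2) = @_;
    $v2 =~ s/([\p{Uppercase}])/ $1/g;    # split up initials
    return "\L$v2";                      # lower-case the result
}

# Hypothetical variant (an assumption, not a tested fix): only insert
# the space when the capital directly follows a non-space character.
sub split_initials_guarded {
    my ($v2) = @_;
    $v2 =~ s/(?<=\S)(\p{Uppercase})/ $1/g;
    return "\L$v2";
}

binmode( STDOUT, ":encoding(UTF-8)" );
for my $name ( "James", "JR", "Smith, JR" ) {
    printf "%-10s current=[%s] guarded=[%s]\n",
        $name,
        split_initials_current($name),
        split_initials_guarded($name);
}
```

With the guarded substitution "JR" is still split into "j r", but a plain
"James" is left without a leading space. Whether that matches what the
rest of the search code expects is an open question.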

There was another, unrelated, issue I came across while debugging. In the
table eprint__rindex, I noticed that some of the non-ASCII characters in
creators_name are stored correctly - e.g. "zenginoğlu". But then there are
some authors whose names don't come through right. For example, when I
entered a new paper written by "Magó", the creators_name is stored as
"mago" in eprint__rindex.word. Another example I found is "Eötvös", which
is stored as "eoetvoes". I haven't looked into this one in detail myself
yet, so I don't have any pointers as to what the cause may be.
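
For what it is worth, the two stored forms are what a per-character
transliteration table would produce, while "ğ" would pass through
untouched if it had no entry in such a table. The sketch below is purely
illustrative: %TRANSLIT and fold_word are made-up names, the table
contents are an assumption chosen to match the examples above, and nothing
here is taken from the EPrints indexing code.

```perl
use strict;
use warnings;
use utf8;

# Hypothetical transliteration table; the entries are assumptions
# chosen only to reproduce the stored forms seen in eprint__rindex.
my %TRANSLIT = (
    'ó' => 'o',
    'ö' => 'oe',
);

# Hypothetical helper: lower-case a word, then replace every
# character that has an entry in the table.
sub fold_word {
    my ($word) = @_;
    $word = lc $word;
    $word =~ s/(.)/exists $TRANSLIT{$1} ? $TRANSLIT{$1} : $1/ge;
    return $word;
}

binmode( STDOUT, ":encoding(UTF-8)" );
print fold_word($_), "\n" for ( "Magó", "Eötvös", "Zenginoğlu" );
```

If something along these lines is happening at index time, the search
terms would need the same folding applied before they are compared
against eprint__rindex.word.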

Anyway, the first search issue is more pressing for us, so if anyone on
the list has any ideas for a robust solution that would be great.

Tommy, Caltech

On 1/17/13 4:38 AM, "Tim Brody" <tdb2 at ecs.soton.ac.uk> wrote:

>On Thu, 17 Jan 2013 00:46:37 +0000, Tommy Ingulfsen
><tommy at library.caltech.edu> wrote:
>> I may have found a bug in EPrints 3.3.10. One of the authors in our
>> repository is Anıl Zenginoğlu (if the name doesn't come out right in
>> email, his homepage is  http://www.tapir.caltech.edu/~anil/). Searching
>> for the surname works fine with the simple search, but with the advanced
>> search we don't get any results. I believe the problem is with line 230
>> perl_lib/EPrints/MetaField/Name.pm:
>> # remove not a-z characters (except ,)
>> $v2 =~ s/[^a-z,]/ /ig;
>> That code splits up "zenginoğlu" to "zengino lu". A possible solution
>> would be
>> use utf8;
>> $v2 =~ s/[^\p{L},]/ /ig;
>> Maybe someone with a strong encodings-fu can comment?
>I've written a fix here:
>All the best,