[EP-tech] EPrints Search - Latest Items

Christopher Gutteridge totl at soton.ac.uk
Thu Apr 30 09:29:46 BST 2020


EPrints makes some decisions on what to index. Those can be overridden, 
if I recall the old magics from the dawn of time.

https://github.com/eprints/eprints/blob/3.3/lib/defaultcfg/cfg.d/indexing.pl

That, by default, uses the EPrints word-split function
https://github.com/eprints/eprints/blob/3.3/perl_lib/EPrints/Index/Tokenizer.pm#L39
which apparently uses the Perl regexp library to decide word breaks, but 
you can write one that does what you want. freetext_seperator_chars 
seems to be utterly ignored now.
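
To make "write one that does what you want" a bit more concrete, here is a rough, standalone sketch of a splitter that keeps hyphenated terms such as "covid-19" together. The function name split_words_keep_hyphens is made up for illustration and is not part of EPrints; how you hook something like it into indexing.pl depends on your version, so treat this as a starting point rather than a drop-in replacement.

```perl
use strict;
use warnings;

# Hypothetical replacement splitter (NOT an EPrints function).
# A word is a run of letters/digits, optionally joined to further
# runs by single hyphens, so "covid-19" survives as one token.
sub split_words_keep_hyphens
{
    my( $text ) = @_;

    # In list context, m//g with one capture group returns every match.
    return $text =~ m/([\p{L}\p{N}]+(?:-[\p{L}\p{N}]+)*)/g;
}

my @words = split_words_keep_hyphens( "SARS COV-2 and covid-19 papers" );
print join( "|", @words ), "\n";
```

Bear in mind that the search side has to split terms the same way as the indexer, otherwise a query for "covid-19" will not match what was stored.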

This is still obeyed:
$c->{indexing}->{freetext_min_word_size} = 3;

It caused some issues for people with the Chinese name "Wu", which is 
too short to be indexed.

I would suggest keeping that setting, but altering indexing.pl to always 
index numbers, even if they are only one or two digits long. Something 
like this (of course, you'd then have to reindex everything):


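         # First approximation is if this word is over or equal
         # to the minimum size set in SiteInfo.
         my $ok = $wordlen >= $c->{indexing}->{freetext_min_word_size};

         # Always index purely numeric words (e.g. the "19" left over
         # when "covid-19" is split), however short they are.
         if( $word =~ m/^\d+$/ ) {
                 $ok = 1;
         }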
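
If you want to try the rule out before touching the real config, here is a minimal standalone sketch of the same check. The helper name word_is_indexable and the $min_word_size variable are stand-ins I've invented for illustration; the real indexing.pl may apply further checks that are not reproduced here.

```perl
use strict;
use warnings;

# Stand-in for $c->{indexing}->{freetext_min_word_size}.
my $min_word_size = 3;

# Hypothetical helper mirroring the suggested change: a word is
# indexed if it is long enough, or if it consists only of digits.
sub word_is_indexable
{
    my( $word ) = @_;

    my $ok = length( $word ) >= $min_word_size;
    $ok = 1 if $word =~ m/^\d+$/;

    return $ok ? 1 : 0;
}

for my $word ( qw( covid 19 Wu 2 cov ) )
{
    printf "%-6s %s\n", $word,
        word_is_indexable( $word ) ? "indexed" : "skipped";
}
```

With this rule "19" and "2" get indexed, while a short non-numeric word such as "Wu" is still skipped unless you also lower the minimum word size.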

On 30/04/2020 08:27, Yuri via Eprints-tech wrote:
>
> Hi!
>
>  I've found that the virus can also be referred to as "SARS COV-2", so 
> maybe you can add this as well. But beware that EPrints search has a 
> problem with "-": it splits the word on it.
>
> On 27/04/20 17:06, James Kerwin via Eprints-tech wrote:
>> Hello All,
>>
>> I hope everyone is well in body and mind.
>>
>> I need some help with the EPrints search function. I have been asked 
>> to add a box to the repository homepage that lists the latest 
>> coronavirus-related deposits.
>>
>> I'm hoping to search via keywords for "coronavirus" and "covid-19". I 
>> also want to search for either of these terms in titles. To do this 
>> I'm currently butchering a copy of cgi/latest_tool.
>>
>> I can get the keywords part to work using:
>>
>>     $c->{latest_rona_modes} = {
>>         default => { citation => "noauth" },
>>         fplatest => {
>>             citation => "popular", max => 5,
>>             #citation => "result", max => 3,
>>             filters => [
>>                 #{ meta_fields => [ "full_text_status","full_text_status" ], value => ("none"||"public") }
>>                 { meta_fields => [ "keywords" ], value => "covid-19"}
>>
>> This also works with "title" as you would expect.
>>
>> What I really want is to do a search where the keywords can be 
>> "covid-19" OR "coronavirus" as well as including some allowance for 
>> adding an:
>>
>>  "OR title LIKE '%covid-19%' OR title LIKE '%coronavirus%'" in MySQL-speak.
>>
>> Am I able to do this using the EPrints::Search plugin? I've tried 
>> reading the documentation and experimenting with it, but I'm not 
>> getting very far.
>>
>> If it's not possible I can think of a number of bodges for it, but 
>> decided it was best to attempt the proper way first.
>>
>> Thanks,
>> James
>>
>> *** Options: http://mailman.ecs.soton.ac.uk/mailman/listinfo/eprints-tech
>> *** Archive: http://www.eprints.org/tech.php/
>> *** EPrints community wiki: http://wiki.eprints.org/

-- 
Christopher Gutteridge <totl at soton.ac.uk>
You should read our team blog at http://blog.soton.ac.uk/webteam/
