[EP-tech] EPrints Search - Latest Items

Yuri yurj at alfa.it
Thu Apr 30 09:51:37 BST 2020


Thanks for the pointer. Maybe a check against a fixed vocabulary would 
be enough.
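
To make the fixed-vocabulary idea concrete, here is a minimal standalone sketch. It is not EPrints code: the `%always_index` list and the `word_is_indexable` helper are hypothetical names of my own, and the minimum size of 3 simply mirrors the `freetext_min_word_size` value quoted below.

```perl
use strict;
use warnings;

# Mirrors $c->{indexing}->{freetext_min_word_size} = 3 from the quoted mail.
my $min_word_size = 3;

# Hypothetical fixed vocabulary: short terms that should always be indexed.
my %always_index = map { lc( $_ ) => 1 } qw( Wu 19 2 );

# Hypothetical helper: returns 1 if the word should be indexed, 0 otherwise.
sub word_is_indexable
{
	my( $word ) = @_;

	# Words in the fixed vocabulary bypass the length check.
	return 1 if $always_index{ lc $word };

	return length( $word ) >= $min_word_size ? 1 : 0;
}

for my $word ( qw( covid 19 Wu of ) )
{
	printf "%-6s %s\n", $word, word_is_indexable( $word ) ? "indexed" : "skipped";
}
```

The same check would presumably have to live wherever indexing.pl decides whether a word is kept, so it still implies a reindex.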

This also means reindexing the whole archive. Is it possible to reindex 
only title and keywords? Reindexing the full text can be a problem if 
you have a lot of PDFs, for example.

On 30/04/20 10:29, Christopher Gutteridge via Eprints-tech wrote:
>
> EPrints makes some decisions on what to index. Those can be 
> overridden, if I recall the old magics from the dawn of time.
>
> https://github.com/eprints/eprints/blob/3.3/lib/defaultcfg/cfg.d/indexing.pl
>
> That, by default, uses the EPrints word split function 
> https://github.com/eprints/eprints/blob/3.3/perl_lib/EPrints/Index/Tokenizer.pm#L39 
> which apparently uses the Perl regexp library to decide word breaks, 
> but you can write one that does what you want. 
> freetext_seperator_chars seems utterly ignored now.
>
> This is still obeyed
> $c->{indexing}->{freetext_min_word_size} = 3;
>
> Which caused some issues for people with the Chinese name "Wu".
>
> I would suggest keeping it, but altering indexing.pl to always index 
> numbers, even if they are only one or two digits long. 
> Something like this (of course you'd then have to reindex entirely):
>
>
>         # First approximation is if this word is over or equal
>         # to the minimum size set in SiteInfo.
>         my $ok = $wordlen >= $c->{indexing}->{freetext_min_word_size};
>
>         # Always index purely numeric words, however short.
>         if( $word =~ m/^\d+$/ ) {
>             $ok = 1;
>         }
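
For anyone who wants to try the effect of that change outside EPrints, the logic above can be sketched as a standalone script. The `$c` hash here is only a stand-in for the repository configuration, and `should_index_word` is a hypothetical helper name, not a function from indexing.pl.

```perl
use strict;
use warnings;

# Stand-in for the repository configuration object that EPrints provides.
my $c = { indexing => { freetext_min_word_size => 3 } };

# Hypothetical helper: returns 1 if the word should be indexed, 0 otherwise.
sub should_index_word
{
	my( $word ) = @_;

	# First approximation: is the word at least the minimum size?
	my $ok = length( $word ) >= $c->{indexing}->{freetext_min_word_size};

	# Purely numeric words are always kept, however short.
	$ok = 1 if $word =~ m/^\d+$/;

	return $ok ? 1 : 0;
}

for my $word ( qw( covid 19 2 Wu ) )
{
	printf "%-6s %s\n", $word, should_index_word( $word ) ? "indexed" : "skipped";
}
```

With a minimum size of 3, "19" and "2" are now kept while "Wu" is still skipped, so the numeric exception alone does not help with short names.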
>
> On 30/04/2020 08:27, Yuri via Eprints-tech wrote:
>>
>> Hi!
>>
>>  I've found that the virus can also be referred to as "SARS COV-2", so 
>> maybe you can add this as well. But beware that EPrints search has a 
>> problem with "-": it splits words on it.
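
A small standalone sketch of the hyphen problem follows. The splitter is a simplified approximation that breaks on anything that is not a word character or an apostrophe; I am assuming the real tokenizer behaves similarly, and the minimum size of 3 is the `freetext_min_word_size` value mentioned in this thread.

```perl
use strict;
use warnings;

# Simplified splitter: break on any run of characters that are neither
# word characters nor apostrophes. An approximation, not the EPrints source.
sub split_words
{
	my( $text ) = @_;
	return grep { length } split /[^\w']+/, $text;
}

# Mirrors freetext_min_word_size = 3 from this thread.
my $min_word_size = 3;

for my $term ( "SARS COV-2", "covid-19" )
{
	my @tokens = split_words( $term );
	my @kept   = grep { length( $_ ) >= $min_word_size } @tokens;
	print "$term -> tokens: @tokens | kept: @kept\n";
}
```

Under these assumptions "covid-19" is effectively indexed as just "covid", because the "19" part falls below the minimum word size.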
>>
>> On 27/04/20 17:06, James Kerwin via Eprints-tech wrote:
>>> Hello All,
>>>
>>> I hope everyone is well in body and mind.
>>>
>>> I need some help with the EPrints search function. I have been asked 
>>> to add a box to the repository homepage that lists the latest 
>>> coronavirus-related deposits.
>>>
>>> I'm hoping to search via keywords for "coronavirus" and "covid-19". 
>>> I also want to search for either of these terms in titles. To do 
>>> this I'm currently butchering a copy of cgi/latest_tool.
>>>
>>> I can get the keywords part to work using:
>>>
>>>             $c->{latest_rona_modes} = {
>>>                 default => { citation => "noauth" },
>>>                 fplatest => {
>>>                     citation => "popular", max => 5,
>>>                     #citation => "result", max => 3,
>>>                     filters => [
>>>                         #{ meta_fields => [ "full_text_status", "full_text_status" ],
>>>                         #  value => ("none"||"public") }
>>>                         { meta_fields => [ "keywords" ], value => "covid-19" }
>>>                     ],
>>>                 },
>>>             };
>>>
>>> This also works with "title" as you would expect.
>>>
>>> What I really want is a search where the keywords can be 
>>> "covid-19" OR "coronavirus", with some allowance for also adding:
>>>
>>>  "OR title LIKE '%covid-19%' OR title LIKE '%coronavirus%'" in MySQL-speak.
>>>
>>> Am I able to do this using the EPrints::Search plugin? I've tried 
>>> reading the documentation and experimenting with it, but I'm not 
>>> getting very far.
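
One possible shape for such a filter is sketched below. It has not been tested against a live repository. It assumes that the fields listed together in one `meta_fields` entry are OR'd, and that the filter also honours the `match` and `merge` keys of a search field; both points should be checked against your EPrints version. The `my $c = {};` line is only a stand-in so the fragment compiles on its own, since in a cfg.d file EPrints supplies `$c`.

```perl
use strict;
use warnings;

# Stand-in for the repository config object; drop this line in a cfg.d file.
my $c = {};

$c->{latest_rona_modes} = {
	default  => { citation => "noauth" },
	fplatest => {
		citation => "popular",
		max      => 5,
		filters  => [
			{
				# One filter over both fields (assumed to be OR'd).
				meta_fields => [ "keywords", "title" ],
				# "covid" rather than "covid-19", because of the hyphen splitting
				# discussed elsewhere in this thread.
				value       => "covid coronavirus",
				match       => "IN",     # word-index match (assumed honoured here)
				merge       => "ANY",    # any one term is enough (assumed honoured here)
			},
		],
	},
};
```

If the filter does not pick up `merge`, the terms are likely to be AND'ed and the box will come back nearly empty, which is a quick way to tell whether the assumption holds.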
>>>
>>> If it's not possible I can think of a number of bodges for it, but 
>>> decided it was best to attempt the proper way first.
>>>
>>> Thanks,
>>> James
>>>
>>> *** Options: http://mailman.ecs.soton.ac.uk/mailman/listinfo/eprints-tech
>>> *** Archive: http://www.eprints.org/tech.php/
>>> *** EPrints community wiki: http://wiki.eprints.org/
>>
> -- 
> Christopher Gutteridge <totl at soton.ac.uk>
> You should read our team blog at http://blog.soton.ac.uk/webteam/
>