[EP-tech] EPrints Search - Latest Items

Christopher Gutteridge totl at soton.ac.uk
Thu Apr 30 09:55:04 BST 2020


I don't recall if you can reindex individual fields.

On 30/04/2020 09:51, Yuri via Eprints-tech wrote:
>
> Thanks for the pointer, maybe a check against a fixed vocabulary could 
> be enough.
>
> This also means reindexing the whole archive. Is it possible to reindex 
> only title and keywords? Full text can be a problem to reindex if you 
> have a lot of PDFs, for example.
>
> On 30/04/20 10:29, Christopher Gutteridge via Eprints-tech wrote:
>>
>> EPrints makes some decisions on what to index. Those can be 
>> overridden, if I recall the old magics from the dawn of time.
>>
>> https://github.com/eprints/eprints/blob/3.3/lib/defaultcfg/cfg.d/indexing.pl
>>
>> That, by default, uses EPrints word split function 
>> https://github.com/eprints/eprints/blob/3.3/perl_lib/EPrints/Index/Tokenizer.pm#L39 
>> which apparently uses the perl regexp library to decide word breaks, 
>> but you can write one that does what you want. 
>> freetext_seperator_chars seems utterly ignored now.
>>
>> This is still obeyed:
>> $c->{indexing}->{freetext_min_word_size} = 3;
>>
>> which caused some issues for people with the Chinese name "Wu".
>>
>> I would suggest keeping that setting but altering indexing.pl to 
>> always index numbers, even if they are only one or two digits long. 
>> Something like this (of course, you'd then have to reindex everything):
>>
>>
>>         # First approximation is if this word is over or equal
>>         # to the minimum size set in SiteInfo.
>>         my $ok = $wordlen >= $c->{indexing}->{freetext_min_word_size};
>>
>>         # Exception: always index purely numeric words, however short.
>>         if( $word =~ m/^\d+$/ ) {
>>             $ok = 1;
>>         }
>>
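To make the effect of the change quoted above concrete, here is a minimal standalone sketch. The helper name keep_word and its $min_size argument are hypothetical: they stand in for the check inside indexing.pl and for freetext_min_word_size, and this is not the actual EPrints code.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of EPrints: decides whether a token is indexed.
# $min_size stands in for $c->{indexing}->{freetext_min_word_size}.
sub keep_word
{
    my( $word, $min_size ) = @_;

    # First approximation: words at or above the minimum size are kept.
    my $ok = length( $word ) >= $min_size;

    # Exception: purely numeric tokens are always kept, however short.
    $ok = 1 if $word =~ m/^\d+$/;

    return $ok ? 1 : 0;
}

# "covid" and "19" are what "covid-19" becomes once split on the hyphen.
for my $word ( qw( covid 19 Wu ) )
{
    printf "%-6s %s\n", $word, keep_word( $word, 3 ) ? "indexed" : "skipped";
}
```

With a minimum size of 3, "19" is kept by the numeric exception, while "Wu" is still skipped, so the name problem would need a separate fix.
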
>> On 30/04/2020 08:27, Yuri via Eprints-tech wrote:
>>>
>>> Hi!
>>>
>>>  I've found that the virus can also be referred to as "SARS COV-2", so 
>>> maybe you can add that as well. But beware that EPrints search has a 
>>> problem with "-": it splits words on it.
>>>
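A rough illustration of why the hyphen matters, assuming a splitter that breaks on every non-word character and a minimum word size of 3. Both split_words and index_terms here are hypothetical stand-ins written for this example; the real EPrints tokenizer may behave differently.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a word splitter: break on anything that is
# not a letter, digit or underscore, and discard empty pieces.
sub split_words
{
    my( $text ) = @_;
    return grep { length } split /\W+/, $text;
}

# Drop tokens shorter than $min_size, mimicking freetext_min_word_size.
sub index_terms
{
    my( $text, $min_size ) = @_;
    return grep { length( $_ ) >= $min_size } split_words( $text );
}

for my $phrase ( "covid-19", "SARS COV-2" )
{
    printf "%-12s split: [%s]  indexed: [%s]\n", $phrase,
        join( " ", split_words( $phrase ) ),
        join( " ", index_terms( $phrase, 3 ) );
}
```

Under these assumptions the numeric part of each term is split off and then discarded for being too short, so a search for "covid-19" would effectively become a search for "covid".
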
>>> On 27/04/20 17:06, James Kerwin via Eprints-tech wrote:
>>>> Hello All,
>>>>
>>>> I hope everyone is well in body and mind.
>>>>
>>>> I need some help with the EPrints search function. I have been 
>>>> asked to add a box to the repository homepage that lists the latest 
>>>> coronavirus-related deposits.
>>>>
>>>> I'm hoping to search via keywords for "coronavirus" and "covid-19". 
>>>> I also want to search for either of these terms in titles. To do 
>>>> this I'm currently butchering a copy of cgi/latest_tool.
>>>>
>>>> I can get the keywords part to work using:
>>>>
>>>>             $c->{latest_rona_modes} = {
>>>>                 default => { citation => "noauth" },
>>>>                 fplatest => {
>>>>                     citation => "popular", max => 5,
>>>>                     #citation => "result", max => 3,
>>>>                     filters => [
>>>>                         #{ meta_fields => [ "full_text_status","full_text_status" ], value => ("none"||"public") },
>>>>                         { meta_fields => [ "keywords" ], value => "covid-19" },
>>>>                     ],
>>>>                 },
>>>>             };
>>>>
>>>> This also works with "title" as you would expect.
>>>>
>>>> What I really want is to do a search where the keywords can be 
>>>> "covid-19" OR "coronavirus" as well as including some allowance for 
>>>> adding an:
>>>>
>>>>  "OR title LIKE '%covid-19%' OR title LIKE '%coronavirus%'" in 
>>>> MySQL-speak.
>>>>
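The matching rule being asked for (any of several terms, in any of several fields, newest first) can be sketched in plain Perl as below. This is only an illustration of the OR semantics and is closer to one of the "bodges" than to the proper way: it does not use the EPrints::Search API, and the record layout, the datestamp sort key and the helper names matches_any and latest_matching are all assumptions made for the example.

```perl
use strict;
use warnings;

# Hypothetical in-memory records standing in for eprints; the field names
# follow the thread (title, keywords) and datestamp is an assumed sort key.
my @items = (
    { id => 1, title => "Modelling COVID-19 spread", keywords => "epidemiology",      datestamp => "2020-04-20" },
    { id => 2, title => "Bat ecology",               keywords => "coronavirus; bats", datestamp => "2020-04-25" },
    { id => 3, title => "Medieval trade routes",     keywords => "history",           datestamp => "2020-04-28" },
);

# True if any term occurs (case-insensitively) in any of the named fields.
sub matches_any
{
    my( $item, $fields, $terms ) = @_;
    for my $field ( @$fields )
    {
        my $value = lc( $item->{$field} // "" );
        for my $term ( @$terms )
        {
            return 1 if index( $value, lc( $term ) ) >= 0;
        }
    }
    return 0;
}

# Matching items, newest first, limited to at most $max results.
sub latest_matching
{
    my( $items, $fields, $terms, $max ) = @_;
    my @hits = sort { $b->{datestamp} cmp $a->{datestamp} }
        grep { matches_any( $_, $fields, $terms ) } @$items;
    splice( @hits, $max ) if @hits > $max;
    return @hits;
}

my @latest = latest_matching( \@items, [ "title", "keywords" ], [ "covid-19", "coronavirus" ], 5 );
print join( ", ", map { $_->{id} } @latest ), "\n";
```

Substring matching like this sidesteps the word-splitting issue with "-", at the cost of scanning every record rather than using the search index.
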
>>>> Am I able to do this using the EPrints::Search plugin? I've tried 
>>>> reading the documentation and experimenting with it, but I'm not 
>>>> getting very far.
>>>>
>>>> If it's not possible I can think of a number of bodges for it, but 
>>>> decided it was best to attempt the proper way first.
>>>>
>>>> Thanks,
>>>> James
>>>>
>>>> *** Options: http://mailman.ecs.soton.ac.uk/mailman/listinfo/eprints-tech
>>>> *** Archive: http://www.eprints.org/tech.php/
>>>> *** EPrints community wiki: http://wiki.eprints.org/
>>>
>> -- 
>> Christopher Gutteridge <totl at soton.ac.uk>
>> You should read our team blog at http://blog.soton.ac.uk/webteam/
>>
>

-- 
Christopher Gutteridge <totl at soton.ac.uk>
You should read our team blog at http://blog.soton.ac.uk/webteam/
