<html xmlns:v="urn:schemas-microsoft-com:vml" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:w="urn:schemas-microsoft-com:office:word" xmlns:m="http://schemas.microsoft.com/office/2004/12/omml" xmlns="http://www.w3.org/TR/REC-html40">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=us-ascii">
<meta name="Generator" content="Microsoft Word 15 (filtered medium)">
<style><!--
/* Font Definitions */
@font-face
        {font-family:Wingdings;
        panose-1:5 0 0 0 0 0 0 0 0 0;}
@font-face
        {font-family:"Cambria Math";
        panose-1:2 4 5 3 5 4 6 3 2 4;}
@font-face
        {font-family:Calibri;
        panose-1:2 15 5 2 2 2 4 3 2 4;}
@font-face
        {font-family:Verdana;
        panose-1:2 11 6 4 3 5 4 4 2 4;}
@font-face
        {font-family:Consolas;
        panose-1:2 11 6 9 2 2 4 3 2 4;}
/* Style Definitions */
p.MsoNormal, li.MsoNormal, div.MsoNormal
        {margin:0cm;
        margin-bottom:.0001pt;
        font-size:11.0pt;
        font-family:"Calibri",sans-serif;
        mso-fareast-language:EN-US;}
a:link, span.MsoHyperlink
        {mso-style-priority:99;
        color:#0563C1;
        text-decoration:underline;}
a:visited, span.MsoHyperlinkFollowed
        {mso-style-priority:99;
        color:#954F72;
        text-decoration:underline;}
pre
        {mso-style-priority:99;
        mso-style-link:"HTML Preformatted Char";
        margin:0cm;
        margin-bottom:.0001pt;
        font-size:10.0pt;
        font-family:"Courier New";}
span.HTMLPreformattedChar
        {mso-style-name:"HTML Preformatted Char";
        mso-style-priority:99;
        mso-style-link:"HTML Preformatted";
        font-family:Consolas;
        mso-fareast-language:EN-US;}
span.EmailStyle19
        {mso-style-type:personal;
        font-family:"Verdana",sans-serif;
        color:windowtext;}
span.EmailStyle20
        {mso-style-type:personal;
        font-family:"Calibri",sans-serif;
        color:#1F497D;}
span.EmailStyle21
        {mso-style-type:personal-reply;
        font-family:"Verdana",sans-serif;
        color:#1F497D;}
.MsoChpDefault
        {mso-style-type:export-only;
        font-size:10.0pt;}
@page WordSection1
        {size:612.0pt 792.0pt;
        margin:72.0pt 72.0pt 72.0pt 72.0pt;}
div.WordSection1
        {page:WordSection1;}
--></style><!--[if gte mso 9]><xml>
<o:shapedefaults v:ext="edit" spidmax="1026" />
</xml><![endif]--><!--[if gte mso 9]><xml>
<o:shapelayout v:ext="edit">
<o:idmap v:ext="edit" data="1" />
</o:shapelayout></xml><![endif]-->
</head>
<body lang="EN-AU" link="#0563C1" vlink="#954F72">
<div class="WordSection1">
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Verdana",sans-serif;color:#1F497D">Hi John,
<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Verdana",sans-serif;color:#1F497D"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Verdana",sans-serif;color:#1F497D">Thanks for the feedback… I think I was involved in the discussions years ago about the simple search etc…<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Verdana",sans-serif;color:#1F497D"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Verdana",sans-serif;color:#1F497D">This particular issue appears to be more subtle,
<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Verdana",sans-serif;color:#1F497D">It removes the trailing ‘s’ from the keywords (unless it is all caps).<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New""><o:p> </o:p></span></p>
<p class="MsoNormal" style="margin-left:36.0pt"><span style="font-size:10.0pt;font-family:"Courier New"">Keyword indexed term<o:p></o:p></span></p>
<p class="MsoNormal" style="margin-left:36.0pt"><span style="font-size:10.0pt;font-family:"Courier New"">platypus platypu
<o:p></o:p></span></p>
<p class="MsoNormal" style="margin-left:36.0pt"><span style="font-size:10.0pt;font-family:"Courier New"">Platypus platypu
<o:p></o:p></span></p>
<p class="MsoNormal" style="margin-left:36.0pt"><span style="font-size:10.0pt;font-family:"Courier New"">PLATYPUS platypus
<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New""><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Verdana",sans-serif;color:#1F497D">It removes the trailing ‘s’ from the search term.. But if the keywords are entered in all caps, it doesn’t remove the ‘s’ Then the search fails for that item
(as it has removed the ‘s’ from the search term).<o:p></o:p></span></p>
<p class="MsoNormal" style="margin-left:36.0pt"><span style="font-size:10.0pt;font-family:"Courier New"">Search Expression ‘platypus’<o:p></o:p></span></p>
<p class="MsoNormal" style="margin-left:36.0pt"><span style="font-size:10.0pt;font-family:"Courier New"">Search performed using ‘platypu’<o:p></o:p></span></p>
<p class="MsoNormal" style="margin-left:36.0pt"><span style="font-size:10.0pt;font-family:"Courier New""><o:p> </o:p></span></p>
<p class="MsoNormal" style="margin-left:36.0pt"><span style="font-size:10.0pt;font-family:"Courier New"">Keyword indexed term Match<o:p></o:p></span></p>
<p class="MsoNormal" style="margin-left:36.0pt"><span style="font-size:10.0pt;font-family:"Courier New"">platypus platypu Yes<o:p></o:p></span></p>
<p class="MsoNormal" style="margin-left:36.0pt"><span style="font-size:10.0pt;font-family:"Courier New"">Platypus platypu Yes<o:p></o:p></span></p>
<p class="MsoNormal" style="margin-left:36.0pt"><span style="font-size:10.0pt;font-family:"Courier New"">PLATYPUS platypus No<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Verdana",sans-serif;color:#1F497D"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Verdana",sans-serif;color:#1F497D"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Verdana",sans-serif;color:#1F497D">I will dig into it, and let you know if there is more patching required ;)<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Verdana",sans-serif;color:#1F497D"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Verdana",sans-serif;color:#1F497D">Cheers<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Verdana",sans-serif;color:#1F497D"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Verdana",sans-serif;color:#1F497D">Matt.<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Verdana",sans-serif;color:#1F497D"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Verdana",sans-serif;color:#1F497D"><o:p> </o:p></span></p>
<div>
<div style="border:none;border-top:solid #E1E1E1 1.0pt;padding:3.0pt 0cm 0cm 0cm">
<p class="MsoNormal"><b><span lang="EN-US" style="mso-fareast-language:EN-AU">From:</span></b><span lang="EN-US" style="mso-fareast-language:EN-AU"> eprints-tech-bounces@ecs.soton.ac.uk [mailto:eprints-tech-bounces@ecs.soton.ac.uk]
<b>On Behalf Of </b>John Salter<br>
<b>Sent:</b> Thursday, 1 June 2017 7:16 PM<br>
<b>To:</b> eprints-tech@ecs.soton.ac.uk<br>
<b>Subject:</b> Re: [EP-tech] Plural words in search results<o:p></o:p></span></p>
</div>
</div>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal"><span lang="EN-GB" style="color:#1F497D">Hi Matt,<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-GB" style="color:#1F497D">I've looked into a similar issue in the past - and I think it was discussed on the tech list a few years ago.<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-GB" style="color:#1F497D">I had added a fix (we're 3.3.10 too) - which recently was discovered to break things in a more subtle way.<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-GB" style="color:#1F497D"><o:p> </o:p></span></p>
<p class="MsoNormal"><span lang="EN-GB" style="color:#1F497D">If I remember the full story, it goes something like this:<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-GB" style="color:#1F497D">The 'simple' search field is broken in vanilla EPrints 3.3.10 - as it doesn't strip out short-words.<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-GB" style="color:#1F497D">This fix for this initially was to run the search terms via the $c->{extract_words} function (in cfg.d/indexer.pl).<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-GB" style="color:#1F497D">This seemed to resolve the issue (we'd been running it like this for a few years), but we discovered that for a search field looking at multiple metafield types (e.g. a text field and a name field),
if the search term ended in -ss it wouldn't find anything.<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-GB" style="color:#1F497D"><o:p> </o:p></span></p>
<p class="MsoNormal"><span lang="EN-GB" style="color:#1F497D">My current fix is: </span>
<a href="https://gist.github.com/jesusbagpuss/e096430c825d34a2ef1de671e8a7dfda"><span lang="EN-GB">https://gist.github.com/jesusbagpuss/e096430c825d34a2ef1de671e8a7dfda</span></a><span lang="EN-GB" style="color:#1F497D"><o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-GB" style="color:#1F497D">Both are 'patch' files (overwrite methods in the core EPrints modules - we try to keep these things separated - but you could just take the methods and edit the files they're patching directly).<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-GB" style="color:#1F497D"><o:p> </o:p></span></p>
<p class="MsoNormal"><span lang="EN-GB" style="color:#1F497D">There are two files - one resolves an issue with apostrophes in names (which may or may not affect you).<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-GB" style="color:#1F497D"><o:p> </o:p></span></p>
<p class="MsoNormal"><span lang="EN-GB" style="color:#1F497D">The issue you report is slightly different to the one we found - but I think the cause might be very similar - the stripping of a trailling 's' is applied during indexing, but the same is not applied
when searching.<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-GB" style="color:#1F497D"><o:p> </o:p></span></p>
<p class="MsoNormal"><span lang="EN-GB" style="color:#1F497D">Hope that gets you somewhere - some of this stuff is fairly recent in my mind (fixing the fix took a bit of tracing through the modules) - there may be more useful stuff I have in my head!<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-GB" style="color:#1F497D"><o:p> </o:p></span></p>
<p class="MsoNormal"><span lang="EN-GB" style="color:#1F497D">Cheers,<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-GB" style="color:#1F497D">John<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-GB" style="color:#1F497D"><o:p> </o:p></span></p>
<p class="MsoNormal"><span lang="EN-GB" style="color:#1F497D"><o:p> </o:p></span></p>
<div>
<div style="border:none;border-top:solid #E1E1E1 1.0pt;padding:3.0pt 0cm 0cm 0cm">
<p class="MsoNormal"><b><span lang="EN-US" style="mso-fareast-language:EN-GB">From:</span></b><span lang="EN-US" style="mso-fareast-language:EN-GB">
</span><a href="mailto:eprints-tech-bounces@ecs.soton.ac.uk"><span lang="EN-US" style="mso-fareast-language:EN-GB">eprints-tech-bounces@ecs.soton.ac.uk</span></a><span lang="EN-US" style="mso-fareast-language:EN-GB"> [</span><a href="mailto:eprints-tech-bounces@ecs.soton.ac.uk"><span lang="EN-US" style="mso-fareast-language:EN-GB">mailto:eprints-tech-bounces@ecs.soton.ac.uk</span></a><span lang="EN-US" style="mso-fareast-language:EN-GB">]
<b>On Behalf Of </b>Matthew Brady<br>
<b>Sent:</b> 01 June 2017 06:09<br>
<b>To:</b> </span><a href="mailto:eprints-tech@ecs.soton.ac.uk"><span lang="EN-US" style="mso-fareast-language:EN-GB">eprints-tech@ecs.soton.ac.uk</span></a><span lang="EN-US" style="mso-fareast-language:EN-GB"><br>
<b>Subject:</b> [EP-tech] Plural words in search results<o:p></o:p></span></p>
</div>
</div>
<p class="MsoNormal"><span lang="EN-GB"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Verdana",sans-serif">Hi All,<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Verdana",sans-serif"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Verdana",sans-serif">One of our users came across a problem, when performing some keyword searches… and assumed it was a case problem, since the all uppercase words in their testing weren’t returning
in the result set.<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Verdana",sans-serif"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Verdana",sans-serif">After testing, I have a preliminary diagnosis, (we are running 3.3.10 if it makes a difference).<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Verdana",sans-serif"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Verdana",sans-serif">It appears the index process is removing the ‘s’ off the end of the word (unless the word is all caps).<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Verdana",sans-serif">When performing a search, the system removes the ‘s’ from the search term, and performs the search… in our case this returns 2 of 3 test records.<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Verdana",sans-serif"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Verdana",sans-serif">When I took the last two letters off each eprint’s keywords, and then performed a search, it returned all three records in the results..<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Verdana",sans-serif"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New"">+----------+----------+---------------+--------------------------+<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New"">|<- details from eprint__rindex ->|<- eprint.keywords field->|<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New"">+----------+----------+---------------+--------------------------+<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New"">| eprintid | field | word | keywords |<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New"">+----------+----------+---------------+--------------------------+<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New"">| 29533 | keywords | ornithorhynch | ornithorhynch |<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New"">| 29534 | keywords | ornithorhynch | Ornithorhynch |<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New"">| 29535 | keywords | ornithorhynch | ORNITHORHYNCH |<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New"">+----------+----------+---------------+--------------------------+<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Verdana",sans-serif"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Verdana",sans-serif">The plural determination holds true for the humble Platypus as well
</span><span style="font-size:10.0pt;font-family:Wingdings">L</span><span style="font-size:10.0pt;font-family:"Verdana",sans-serif"><o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Verdana",sans-serif"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New"">+----------+----------+-----------------+---------------------------+<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New"">| eprintid | field | word | keywords |<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New"">+----------+----------+-----------------+---------------------------+<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New"">| 29533 | keywords | ornithorhynchu | ornithorhynchus, platypus |<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New"">| 29533 | keywords | platypu | ornithorhynchus, platypus |<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New"">| 29534 | keywords | ornithorhynchu | Ornithorhynchus, Platypus |<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New"">| 29534 | keywords | platypu | Ornithorhynchus, Platypus |<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New"">| 29535 | keywords | ornithorhynchus | ORNITHORHYNCHUS, PLATYPUS |<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New"">| 29535 | keywords | platypus | ORNITHORHYNCHUS, PLATYPUS |<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New"">+----------+----------+-----------------+---------------------------+<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New";color:#1F497D"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Verdana",sans-serif;color:#1F497D"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Verdana",sans-serif;color:#1F497D">Cheers<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Verdana",sans-serif;color:#1F497D"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Verdana",sans-serif;color:#1F497D">Matt.<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Verdana",sans-serif;color:#1F497D"><o:p> </o:p></span></p>
<pre><o:p> </o:p></pre>
<pre>_____________________________________________________________<o:p></o:p></pre>
<pre>This email (including any attached files) is confidential and is for the intended recipient(s) only. If you received this email by mistake, please, as a courtesy, tell the sender, then delete this email.<o:p></o:p></pre>
<pre><o:p> </o:p></pre>
<pre>The views and opinions are the originator's and do not necessarily reflect those of the University of Southern Queensland. Although all reasonable precautions were taken to ensure that this email contained no viruses at the time it was sent we accept no liability for any losses arising from its receipt.<o:p></o:p></pre>
<pre><o:p> </o:p></pre>
<pre>The University of Southern Queensland is a registered provider of education with the Australian Government.<o:p></o:p></pre>
<pre>(CRICOS Institution Code QLD 00244B / NSW 02225M, TEQSA PRV12081 )<o:p></o:p></pre>
</div>
<PRE>
_____________________________________________________________
This email (including any attached files) is confidential and is for the intended recipient(s) only. If you received this email by mistake, please, as a courtesy, tell the sender, then delete this email.
The views and opinions are the originator's and do not necessarily reflect those of the University of Southern Queensland. Although all reasonable precautions were taken to ensure that this email contained no viruses at the time it was sent we accept no liability for any losses arising from its receipt.
The University of Southern Queensland is a registered provider of education with the Australian Government.
(CRICOS Institution Code QLD 00244B / NSW 02225M, TEQSA PRV12081 )
</PRE></body>
</html>