<html>
  <head>
    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
  </head>
  <body>
    <p>Thanks for the pointer. Maybe a check against a fixed vocabulary
      would be enough.<br>
    </p>
    <p>This also means reindexing the whole archive. Is it possible to
      reindex only the title and keywords? Reindexing the full text can
      be a problem if you have a lot of PDFs, for example.<br>
    </p>
    <div class="moz-cite-prefix">On 30/04/20 10:29, Christopher
      Gutteridge via Eprints-tech wrote:<br>
    </div>
    <blockquote type="cite"
cite="mid:EMEW3|873b453e06baf3567d21799648f3bbcfw3Y9Va14eprints-tech-bounces|ecs.soton.ac.uk|53088265-0f14-fff8-e573-a29509f22563@soton.ac.uk">
      <p>EPrints makes some decisions on what to index. Those can be
        overridden, if I recall the old magics from the dawn of time.</p>
      <p><a class="moz-txt-link-freetext"
href="https://github.com/eprints/eprints/blob/3.3/lib/defaultcfg/cfg.d/indexing.pl"
          moz-do-not-send="true">https://github.com/eprints/eprints/blob/3.3/lib/defaultcfg/cfg.d/indexing.pl</a></p>
      <p>By default, that uses the EPrints word-splitting function
        <a class="moz-txt-link-freetext"
href="https://github.com/eprints/eprints/blob/3.3/perl_lib/EPrints/Index/Tokenizer.pm#L39"
          moz-do-not-send="true">https://github.com/eprints/eprints/blob/3.3/perl_lib/EPrints/Index/Tokenizer.pm#L39</a>,
        which apparently uses the Perl regexp library to decide word
        breaks, but you can write one that does what you want. <span
          class="pl-c1"><span class="text-bold bg-yellow-light rounded-1
            d-inline-block">freetext_seperator_chars seems to be utterly
            ignored now. <br>
          </span></span></p>
      <p><span class="pl-c1"><span class="text-bold bg-yellow-light
            rounded-1 d-inline-block">This is still obeyed: </span></span><br>
        <span class="pl-c1"><span class="text-bold bg-yellow-light
            rounded-1 d-inline-block"><span class="pl-smi">$c</span><span
              class="pl-k">-&gt;</span>{<span class="pl-c1">indexing</span>}<span
              class="pl-k">-&gt;</span>{<span class="pl-c1">freetext_min_word_size</span>}
            = 3;</span></span></p>
      <p><span class="pl-c1"><span class="text-bold bg-yellow-light
            rounded-1 d-inline-block">This has caused some issues for
            people with the Chinese name "Wu".</span></span></p>
      <p><span class="pl-c1"><span class="text-bold bg-yellow-light
            rounded-1 d-inline-block">I would suggest keeping that
            setting but altering indexing.pl so that numbers are always
            indexed, even if they are only one or two digits long.
            Something like this (of course you'd then have to reindex
            everything):<br>
          </span></span></p>
      <p><tt><br>
        </tt><tt>        # First approximation is if this word is over
          or equal</tt><tt><br>
        </tt><tt>        # to the minimum size set in SiteInfo.</tt><tt><br>
        </tt><tt>        my $ok = $wordlen &gt;=
          $c-&gt;{indexing}-&gt;{freetext_min_word_size};</tt><tt><br>
        </tt></p>
      <p><tt><font color="#ff0000">        if( $word =~ m/^\d+$/ ) {<br>
                                $ok = 1;<br>
                    }  </font></tt><br>
      </p>
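      <p>A self-contained sketch of that check, for trying the rule
        outside EPrints. The <tt>$c</tt> hash here is only a stand-in
        for the repository config and <tt>word_ok</tt> is a hypothetical
        helper, not an EPrints function; only
        <tt>freetext_min_word_size</tt> and the numeric test come from
        the snippet above.</p>
<pre>
```perl
use strict;
use warnings;

# Stand-in for the repository config (assumption); only the
# freetext_min_word_size key is taken from the thread.
my $c = { indexing => { freetext_min_word_size => 3 } };

# Hypothetical helper, not part of EPrints: returns 1 if a token
# would be indexed under the modified rule, 0 otherwise.
sub word_ok
{
    my( $word ) = @_;

    my $wordlen = length( $word );

    # First approximation: the word is at least the minimum size.
    my $ok = $wordlen >= $c->{indexing}->{freetext_min_word_size};

    # Purely numeric tokens are always indexed, however short.
    if( $word =~ m/^\d+$/ )
    {
        $ok = 1;
    }

    return $ok ? 1 : 0;
}

# Try a few tokens that a title such as "covid 19" might produce.
for my $word ( "covid", "19", "2", "Wu", "of" )
{
    printf "%-6s %s\n", $word, word_ok( $word ) ? "indexed" : "skipped";
}
```
</pre>
      <p>With a minimum size of 3, the numeric tokens "19" and "2" pass
        the check, while "Wu" is still skipped, because the extra test
        only covers tokens made up entirely of digits.</p>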
      <div class="moz-cite-prefix">On 30/04/2020 08:27, Yuri via
        Eprints-tech wrote:<br>
      </div>
      <blockquote type="cite"
cite="mid:EMEW3|b966302d0df9e000a031a1f8b6e8872cw3Y8YW14eprints-tech-bounces|ecs.soton.ac.uk|d32bf79e-5ac1-3c16-f6d0-c890b0b95c0d@alfa.it">
        <p>Hi!</p>
        <p>I've found that the virus can also be referred to as "SARS
          COV-2", so maybe you can add that as well. But beware that
          EPrints search has a problem with "-": it splits the word on it.<br>
        </p>
        <div class="moz-cite-prefix">On 27/04/20 17:06, James Kerwin via
          Eprints-tech wrote:<br>
        </div>
        <blockquote type="cite"
cite="mid:EMEW3|942a457e724595d3b487147e73b60d14w3QG9014eprints-tech-bounces|ecs.soton.ac.uk|CAKkNZ9Bp5Hpsxb-G9oKRtnn6-pHfcG10ob8mfYBkQ-KFcAF6Sw@mail.gmail.com">
          <div dir="ltr">Hello All,<br>
            <div><br>
            </div>
            <div>I hope everyone is well in body and mind.</div>
            <div><br>
            </div>
            <div>I need some help with the EPrints search function. I
              have been asked to add a box to the repository homepage
              that lists the latest coronavirus-related deposits.</div>
            <div><br>
            </div>
            <div>I'm hoping to search via keywords for "coronavirus" and
              "covid-19". I also want to search for either of these
              terms in titles. To do this I'm currently butchering a
              copy of cgi/latest_tool.</div>
            <div><br>
            </div>
            <div>I can get the keywords part to work using:</div>
            <div><br>
            </div>
            <blockquote style="margin:0 0 0
              40px;border:none;padding:0px">
<pre>$c-&gt;{latest_rona_modes} = {
    default =&gt; { citation =&gt; "noauth" },
    fplatest =&gt; {
        citation =&gt; "popular", max =&gt; 5,
        #citation =&gt; "result", max =&gt; 3,
        filters =&gt; [
            #{ meta_fields =&gt; [ "full_text_status","full_text_status" ], value =&gt; ("none"||"public") }
            { meta_fields =&gt; [ "keywords" ], value =&gt; "covid-19" },
        ],
    },
};</pre>
            </blockquote>
            This also works with "title" as you would expect.
            <div><br>
            </div>
            <div>What I really want is a search where the keywords
              can be "covid-19" OR "coronavirus", with some allowance
              for also adding:</div>
            <div><br>
            </div>
            <div>"OR title LIKE '%covid-19%' OR title LIKE
              '%coronavirus%'" in MySQL-speak.</div>
            <div><br>
            </div>
            <div>Am I able to do this using the EPrints::Search plugin?
              I've tried reading the documentation and experimenting
              with it, but I'm not getting very far.</div>
            <div><br>
            </div>
            <div>If it's not possible I can think of a number of bodges
              for it, but decided it was best to attempt the proper way
              first.</div>
            <div><br>
            </div>
            <div>Thanks,</div>
            <div>James</div>
          </div>
          <br>
          <fieldset class="mimeAttachmentHeader"></fieldset>
          <pre class="moz-quote-pre" wrap="">*** Options: <a class="moz-txt-link-freetext" href="http://mailman.ecs.soton.ac.uk/mailman/listinfo/eprints-tech" moz-do-not-send="true">http://mailman.ecs.soton.ac.uk/mailman/listinfo/eprints-tech</a>
*** Archive: <a class="moz-txt-link-freetext" href="http://www.eprints.org/tech.php/" moz-do-not-send="true">http://www.eprints.org/tech.php/</a>
*** EPrints community wiki: <a class="moz-txt-link-freetext" href="http://wiki.eprints.org/" moz-do-not-send="true">http://wiki.eprints.org/</a></pre>
        </blockquote>
      </blockquote>
      <pre class="moz-signature" cols="72">-- 
Christopher Gutteridge <a class="moz-txt-link-rfc2396E" href="mailto:totl@soton.ac.uk" moz-do-not-send="true">&lt;totl@soton.ac.uk&gt;</a> 
You should read our team blog at <a class="moz-txt-link-freetext" href="http://blog.soton.ac.uk/webteam/" moz-do-not-send="true">http://blog.soton.ac.uk/webteam/</a></pre>
    </blockquote>
  </body>
</html>