<html>
  <head>
    <meta content="text/html; charset=windows-1252"
      http-equiv="Content-Type">
  </head>
  <body bgcolor="#FFFFFF" text="#000000">
    <p>Hi Avi,</p>
    <p>I have seen this issue quite a lot as well.  I have tracked it
      down to a problem indexing PDF documents where an extracted word
      contains non-ASCII characters.  If the whole word consists of
      non-ASCII characters, the empty string is effectively what gets
      indexed.  If more than one word in the document is entirely
      non-ASCII, indexing fails with the error you quote below, because
      the empty string cannot be indexed twice for the same EPrint and
      field (i.e. documents).  This is because the eprint__rindex table
      has a primary key made up of three fields: field, word and
      eprintid.  As the middle one is empty, you see documents--91
      rather than something like documents-word-91 in your error
      message.  <br>
    </p>
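    <p>The collision described above can be reproduced with a small,
      self-contained Python sketch.  It is not EPrints code: the table
      is a simplified stand-in for eprint__rindex held in an in-memory
      SQLite database, and to_index_word and index_words are
      hypothetical names.  The way non-ASCII characters are dropped here
      is only an assumption about what the real indexer ends up doing.
      The sample words are written as \u escapes so the code stays pure
      ASCII.</p>
    <pre>
```python
import sqlite3


def to_index_word(raw):
    """Hypothetical stand-in for the indexer's normalisation: drop non-ASCII characters."""
    return raw.encode("ascii", "ignore").decode("ascii").lower()


def index_words(raw_words, eprintid=91, field="documents"):
    """Insert normalised words into a toy rindex table; return the DB error text, or None."""
    conn = sqlite3.connect(":memory:")
    # Simplified assumption of eprint__rindex: the three columns form the primary key.
    conn.execute(
        "CREATE TABLE rindex (field TEXT, word TEXT, eprintid INTEGER, "
        "PRIMARY KEY (field, word, eprintid))"
    )
    try:
        for raw in raw_words:
            row = (field, to_index_word(raw), eprintid)
            conn.execute("INSERT INTO rindex VALUES (?, ?, ?)", row)
    except sqlite3.IntegrityError as exc:
        return str(exc)
    finally:
        conn.close()
    return None


# One all-non-ASCII word becomes '' and is indexed once without complaint.
print(index_words(["thesis", "\u05e9\u05dc\u05d5\u05dd"]))
# Two such words both become '', so the second insert violates the primary key.
print(index_words(["\u05e9\u05dc\u05d5\u05dd", "\u03bb\u03cc\u03b3\u03bf\u03c2"]))
```
</pre>
    <p>The first call returns None, while the second returns the
      database's uniqueness error, which is the SQLite counterpart of
      the MySQL "Duplicate entry" message.</p>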
    <p>As far as I can tell, this only prevents the one badly encoded
      word from being indexed, rather than preventing all indexing for
      the whole EPrint.  I tested this by writing a script to
      completely de-index an EPrint and then running a reindex.  I could
      see the records disappear from the eprint__rindex table and then
      reappear after the reindex.</p>
    <p>I am going to see if I can get the encoding issue sorted out, as
      this is likely to be problematic for people who are indexing
      publications with non-Latin alphabets.  However, past experience
      suggests this is never straightforward.<br>
    </p>
    <p>Regards<br>
    </p>
    David Newman<br>
    <br>
    <div class="moz-cite-prefix">On 02/03/2018 10:53, Stenger, Avischai
      wrote:<br>
    </div>
    <blockquote
cite="mid:EMEW3|1e5e0e710537650cd9bed196dcb0a11eu21B4214eprints-tech-bounces|ecs.soton.ac.uk|277F4767-58F2-4F89-9C57-8FA30BEBB138@ulb.tu-darmstadt.de"
      type="cite">
      <div class="">
        <br class="">
      </div>
      <div class="">
        <div class="">
          <div class="">
            Hello to all,</div>
          <div class="">
            <br class="">
          </div>
          <div class="">
            I have some eprints that do not get rindexed. If I execute,
            for example:</div>
          <div class="">
            <br class="">
          </div>
          <div class="">
            ~/bin<span class="">/epadmin reindex REPO eprint 91   </span></div>
          <div class="">
            <span class=""><br class="">
            </span></div>
          <div class="">
            <span class="">i get The error: </span></div>
          <div class=""><br class="">
          </div>
          <div class="">
            <div class="">
              <span class="">DBD::mysql::st execute failed: Duplicate
                entry 'documents--91' for key 'PRIMARY' at
                /usr/share/eprints/bin/../perl_lib/EPrints/Database.pm
                line 1287.</span></div>
            <div class="">
              <span class=""><br class="">
              </span></div>
            <div class="">
              <span class=""><br class="">
              </span></div>
            <div class="">
              <span class=""><br class="">
              </span></div>
            <div class="">
              I noticed that if I replace the PDF document in this
              eprint, I can index it without any error message.</div>
            <div class="">
              <br class="">
            </div>
            <div class="">
              If I check the PDF with an online PDF checker, it says the
              PDF is okay.</div>
            <div class="">
              (<a moz-do-not-send="true"
                href="https://www.pdf-online.com/osa/validate.aspx"
                class="">https://www.pdf-online.com/osa/validate.aspx</a>) </div>
            <div class="">
              <br class="">
            </div>
            <div class="">
              <br class="">
            </div>
            <div class="">
              Thanks, and have a good weekend</div>
            <div class="">
              <br class="">
            </div>
            <div class="">
              <br class="">
            </div>
            <div class="">
              Avi</div>
            <div class="">
              <br class="">
            </div>
            <div class="">
              <br class="">
            </div>
            <div class="">
              <br class="">
            </div>
            <div class="">
              <br class="">
            </div>
            <div class="">
              <br class="">
            </div>
            <div class="">
              <br class="">
            </div>
            <div class="">
              <br class="">
            </div>
            <div class="">
              <div class=""><span class=""><br class="">
                </span></div>
            </div>
            <div class="">
              <br class="">
            </div>
            <div class="">
              <br class="">
            </div>
            <div class="">
              <br class="">
            </div>
            <div class="">
              <span class=""></span></div>
            <div class="">
              <span class=""><br class="">
              </span></div>
            <div class="">
              <span class=""><br class="">
              </span></div>
            <div class="">
              <span class=""><br class="">
              </span></div>
            <div class="">
              <span class=""><br class="">
              </span></div>
            <div class="">
              <span class=""><br class="">
              </span></div>
            <div class="">
              <span class=""><br class="">
              </span></div>
            <div class="">
              <span class=""><br class="">
              </span></div>
            <div class="">
               </div>
            <div class="">
              <span class=""><br class="">
              </span></div>
          </div>
          <div class="">
            <br class="">
          </div>
          <br class="Apple-interchange-newline">
        </div>
        <br class="Apple-interchange-newline">
      </div>
      <br class="">
      <br>
      <fieldset class="mimeAttachmentHeader"></fieldset>
      <br>
      <pre wrap="">*** Options: <a class="moz-txt-link-freetext" href="http://mailman.ecs.soton.ac.uk/mailman/listinfo/eprints-tech">http://mailman.ecs.soton.ac.uk/mailman/listinfo/eprints-tech</a>
*** Archive: <a class="moz-txt-link-freetext" href="http://www.eprints.org/tech.php/">http://www.eprints.org/tech.php/</a>
*** EPrints community wiki: <a class="moz-txt-link-freetext" href="http://wiki.eprints.org/">http://wiki.eprints.org/</a>
*** EPrints developers Forum: <a class="moz-txt-link-freetext" href="http://forum.eprints.org/">http://forum.eprints.org/</a>
</pre>
    </blockquote>
    <br>
  </body>
</html>