[EP-tech] Re: Xapian indexing

David R Newman drn at ecs.soton.ac.uk
Wed Dec 9 18:34:01 GMT 2015


Hi Josée,

There turns out to be a really simple answer to this question, but it
took a rather long way round to discover it.

By default namedset fields have text_index set to 0.  Therefore, if only
namedset fields are changed, the EPrint will not be queued for
re-indexing, even though the field in question will be re-indexed if you
change a non-namedset field at the same time.  The solution is to add:

text_index => 1

to the namedset field you want to be indexed.
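
As a minimal sketch, the field definition might then look something like
the entry below.  The file it lives in (for example the archive's
cfg/cfg.d/eprint_fields.pl) and the set_name value are assumptions based
on the configuration quoted further down; only the text_index line is the
actual change.

```perl
# One entry in the existing list of eprint field definitions.
# Assumed location: the archive's cfg/cfg.d/eprint_fields.pl (adapt to
# wherever statut_indexation is really defined).
{
    name       => 'statut_indexation',
    type       => 'namedset',
    set_name   => 'statut_indexation',  # assumed to match the namedsets file name
    text_index => 1,                    # queue the EPrint for re-indexing when this field changes
},
```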

I suspect the reason namedset fields are not indexed by default is that
it is not the value you see in the select box that will be added to the
index but the underlying value in the namedset file, which is often not
the same.  Also, a search on such a short term is likely to return quite
a few results where the value matches but on another indexed field.
Therefore, I think text_index is turned off by default because a free
text search on a namedset value is unlikely to return the set of results
you are expecting.  In some cases it may be appropriate, at which point
you should set text_index to 1 for this field.

Regards

David Newman 

On Wed, 2015-12-09 at 17:12 +0000, David R Newman wrote:
> Hi Josée,
> 
> I am currently looking into this issue as well, as I have identified a
> situation where a small percentage of EPrints cannot be found when you
> search for them individually by title.  I have a script for automating
> this test on multiple EPrints at once, which I can make available.
> 
> On the specific issue you describe, I can replicate the same problem on a
> 3.3.14 version of EPrints.  I have yet to dig down into what is stopping
> it from being put in the indexer queue but I do not think it will be too
> difficult to figure out.  I found that if I subsequently change another
> non-namedset field, it will schedule both that field and the namedset
> field I had previously changed for re-indexing.
> 
> I am not certain whether your issue relates to the problem I mentioned
> initially, as I think the problem is not Xapian-dependent: it is not
> until the indexing task is run later by the indexer that it knows whether
> the EPrint will be indexed using Xapian or just to the database.
> 
> Regards
> 
> David Newman
> 
> 
> On Wed, 2015-12-02 at 07:55 +0100, Lessard Josée wrote:
> > 
> > Hello,
> > we use Xapian for our simple search.
> > 
> > 
> > 
> > The Xapian indexing is correct when a reference is validated into the
> > archive (eprint_status:buffer => archive).
> > 
> > But if the correction is made to a "namedset" field, indexing of the
> > document is not launched!
> > If the modification is made to a "text" type field, indexing is
> > launched.
> > Have you ever had this problem reported?  How can we make sure
> > re-indexing is launched when a field of any type is modified?
> > 
> > Sorry for my English.
> > 
> > Sincerely
> > Josée Lessard
> > 
> > 
> > eprint_search_simple.pl
> > 
> > 
> > 
> > $c->{search}->{simple} = 
> > {
> >     search_fields => [
> >         {
> >             id => 'q',
> >             meta_fields => [
> >                 'documents',
> >                 'eprintid',
> >                 'title',
> >                 'abstract',
> >                 'date',
> >                 'type',
> >                 'statut_indexation',
> >                 'indexeur',
> > ...
> >             ]
> >         },
> >     ],
> >     preamble_phrase => 'cgi/search:preamble',
> >     title_phrase => 'cgi/search:simple_search',
> >     citation => 'result',
> >     page_size => 20,
> >     order_methods => {
> >         'byyear'      => '-date/creators_name/title',
> >         'byyearoldest'     => 'date/creators_name/title',
> >         'byname'       => 'creators_name/-date/title',
> >         'bytitle'      => 'title/creators_name/-date',
> >         'bytype'      => 'type/-date/title',
> >         'byti'             => '-full_text_status/-date/title',
> >     },
> >     default_order => 'byyear',
> >     show_zero_results => 1,
> > };
> > 
> > 
> > 
> > 
> > /opt/www/eprints-3.3.12/archives/agritrop/cfg/namedsets/statut_indexation
> > 
> > 
> > 
> > a_classer
> > a_indexer
> > a_indexer_indexeur
> > en_cours_d_indexation
> > a_indexer_electronique
> > a_indexer_papier
> > document_a_numeriser
> > notice_indexee
> > 
> > 
> > 
> > 
> > __________________________________
> > 
> > EPrints correction
> > 
> > 
> > Result:
> > 
> > 
> > 
> > title
> > 
> > "Publications et travaux du SAR 1996"
> > 
> > eprint_status
> > 
> > "archive"
> > 
> > statut_indexation
> > 
> > "en_cours_d_indexation"
> > 
> > 
> > Xapian indexing:
> > 
> >       * title:1996 
> >       * title:du 
> >       * title:et 
> >       * title:publications 
> >       * title:sar 
> >       * title:travaux 
> >       * statut_indexation:notice_indexee
> >       * lastmod:20150909
> > 
> > 
> > -- 
> > Josée Lessard
> > 
> > Documentaliste
> > 
> > Cirad-Dgdrs-Délégation à l'information scientifique et technique
> > 
> > TA 183/05 - Avenue Agropolis - 34398 Montpellier Cedex 5 (Tél: +33 4
> > 67 61 57 37)
> > 
> > 
> > *** Options: http://mailman.ecs.soton.ac.uk/mailman/listinfo/eprints-tech
> > *** Archive: http://www.eprints.org/tech.php/
> > *** EPrints community wiki: http://wiki.eprints.org/
> > *** EPrints developers Forum: http://forum.eprints.org/



