[EP-tech] Searching URLs
Martin Brändle
martin.braendle at uzh.ch
Fri Jan 20 14:42:15 GMT 2023
Hi,
we have observed in our repository that Url-type fields can only be searched for complete URLs.
As far as I understand the Metafield definition (text_index => 1, sql_index => 0, default search behavior "IN"), it should also be possible to search for single words within a URL, or for URLs truncated with %. Or am I wrong?
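To make the expected behaviour concrete, here is a small Python sketch contrasting a word-level index of a URL with the whole-URL entry that is actually observed. The tokenizer (`url_words`), the matcher (`matches`) and the example URL are all hypothetical illustrations, not EPrints code (EPrints itself is written in Perl). Splitting on non-alphanumeric characters and treating a trailing % as a prefix wildcard are assumptions about what "word" search would mean for a URL.

```python
import re


def url_words(url: str) -> list[str]:
    # Hypothetical tokenizer, NOT EPrints' own: split the URL on any run of
    # non-alphanumeric characters and lowercase the resulting words.
    return [w.lower() for w in re.split(r"[^A-Za-z0-9]+", url) if w]


def whole_url_index(url: str) -> list[str]:
    # Models the observed behaviour: the complete URL is one index entry.
    return [url.lower()]


def matches(query: str, index_entries: list[str]) -> bool:
    # A trailing % is treated as a prefix wildcard (assumption, LIKE-style);
    # otherwise the query must equal one index entry exactly.
    q = query.lower()
    if q.endswith("%"):
        return any(entry.startswith(q[:-1]) for entry in index_entries)
    return q in index_entries


url = "https://www.example.org/docs/report.pdf"
print(url_words(url))                        # ['https', 'www', 'example', 'org', 'docs', 'report', 'pdf']
print(matches("report", url_words(url)))     # True
print(matches("exam%", url_words(url)))      # True
print(matches("report", whole_url_index(url)))  # False
```

With a word-level index, a single word or a %-truncated word finds the record; with a whole-URL entry, only the complete URL matches, which is the behaviour described above.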
Also, when I examined the repository database tables, I saw that the eprint__index table contains only complete URLs for a Url-type field.
In addition, the ids column of eprint__index has a size limit that might become a problem for large repositories. The ids column stores the eprint ids for an indexed word, concatenated with colons, and its data type is "text", which allows a maximum of 64K (65,535 bytes). That leaves room for only about 10K eprint ids per word, so frequent words that are not stopwords may not be indexed completely …
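A quick back-of-the-envelope check of the ~10K figure, as a Python sketch. It assumes MySQL's TEXT type (at most 65,535 bytes), ids made of single-byte ASCII digits, and exactly one colon between consecutive ids; if the ids string also carries leading or trailing colons, the real capacity is slightly lower. `max_ids` is a hypothetical helper for illustration only.

```python
TEXT_MAX_BYTES = 65_535  # assumed: MySQL TEXT column limit


def max_ids(id_digits: int, limit: int = TEXT_MAX_BYTES) -> int:
    # n ids of d digits joined by n - 1 colons take n * (d + 1) - 1 bytes,
    # so the largest n that fits is floor((limit + 1) / (d + 1)).
    return (limit + 1) // (id_digits + 1)


for digits in (4, 5, 6):
    print(digits, max_ids(digits))
# 4 13107
# 5 10922
# 6 9362
```

For repositories whose eprint ids have five or six digits, this gives roughly 9,400 to 10,900 ids per indexed word, consistent with the estimate above.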
Kind regards,
Martin
--
Dr. Martin Brändle
Zentrale Informatik
Universität Zürich
Stampfenbachstr. 73
CH-8006 Zürich
More information about the Eprints-tech mailing list