<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
</head>
<body dir="auto">
<div>Martin, thanks for your feedback. This workflow uses a thesaurus,&nbsp;<a href="https://buildvoc.co.uk/bv/en/">https://buildvoc.co.uk/bv/en/</a>, and, for NLP,&nbsp;<a href="https://github.com/NatLibFi/Annif">https://github.com/NatLibFi/Annif</a>, which processes
 the abstract in EPrints and returns the keywords.
<div><br>
</div>
<div>I need each keyword in an individual field, so that I can tell the indexer that these fields are “phrases” containing white space.</div>
<div><br>
</div>
<div>Any ideas for a script to create 10 fields for the uncontrolled keywords?<br>
<br>
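<div>A minimal sketch of such a script, written in Python purely to show the splitting logic. EPrints itself is Perl, and reading from or writing back to the repository is not shown. It assumes the keywords are comma-separated, as in the example further down this thread. The field names keyword_1 to keyword_10 and the function name are hypothetical.</div>
<pre>
```python
MAX_FIELDS = 10  # hypothetical: the "10 fields" asked about above


def split_keywords(raw, max_fields=MAX_FIELDS):
    """Split a comma-separated uncontrolled-keywords string into a dict of
    hypothetical per-keyword fields keyword_1 .. keyword_N."""
    # Strip surrounding white space and drop empty entries (e.g. from ", ,").
    phrases = [p.strip() for p in raw.split(",") if p.strip()]
    # Keep the white space inside each phrase, collapsed to single spaces.
    phrases = [" ".join(p.split()) for p in phrases]
    fields = {}
    for i, phrase in enumerate(phrases[:max_fields], start=1):
        fields["keyword_%d" % i] = phrase
    return fields


if __name__ == "__main__":
    example = ("evacuation lift, part b - fire safety, "
               "b5 access and facilities for the fire service, "
               "fire risk assessment, residual risk")
    for name, value in split_keywords(example).items():
        print(name, "=", value)
```
</pre>
<div>This sketch silently drops any phrases after the tenth. A single multiple-valued field would avoid that fixed limit.</div>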
<div dir="ltr">
<p class="MsoNormal" style="margin: 0cm 0cm 0.0001pt;">Best Regards,</p>
<p class="MsoNormal" style="margin: 0cm 0cm 0.0001pt;"><span style="background-color: rgba(255, 255, 255, 0);">Phil Stacey&nbsp;<a href="tel:07792661738" dir="ltr">07792661738</a></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0.0001pt;"><a href="https://eprints.buildvoc.co.uk/"><font color="#000000">building regulations guidance for fire safety</font></a></p>
</div>
<div dir="ltr"><br>
<blockquote type="cite">On 25 Jan 2021, at 11:49, eprints-tech-request@ecs.soton.ac.uk wrote:<br>
<br>
</blockquote>
</div>
<blockquote type="cite">
<div dir="ltr">
<span>Today's Topics:</span><br>
<span></span><br>
<span>&nbsp;&nbsp;1. Reply: Re: Help indexing phrases (martin.braendle@uzh.ch)</span><br>
<span></span><br>
<span></span><br>
<span>----------------------------------------------------------------------</span><br>
<span></span><br>
<span>Message: 1</span><br>
<span>Date: Mon, 25 Jan 2021 12:49:00 +0100</span><br>
<span>From: &lt;martin.braendle@uzh.ch&gt;</span><br>
<span>Subject: [EP-tech] Reply: Re: Help indexing phrases</span><br>
<span>To: &lt;eprints-tech@ecs.soton.ac.uk&gt;, David R Newman</span><br>
<span>&nbsp; &nbsp;&lt;drn@ecs.soton.ac.uk&gt;</span><br>
<span>Message-ID:</span><br>
<span>&nbsp; &nbsp;&lt;OF0D573B1B.0648AFE6-ONC1258668.0040E951-C1258668.0040E953@lotus.uzh.ch&gt;</span><br>
<span>&nbsp; &nbsp;</span><br>
<span>Content-Type: text/plain; charset=&quot;utf-8&quot;</span><br>
<span></span><br>
<span>Hi Phil,</span><br>
<span></span><br>
<span>In the end, the reverse (inverted) indexes of standard search engines are based on single terms. This is a basic principle.</span><br>
<span></span><br>
<span>Xapian is fairly basic in this respect. More advanced search engines such as Elasticsearch offer field types such as &quot;keyword&quot; that can store multi-term expressions; in the end, however, the Lucene backend also stores single terms in its reverse indexes.</span><br>
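<div>To illustrate the single-term point, here is a toy inverted index in Python. It is not Xapian or Lucene code, and the records and function names are made up. When every word is its own term, a query for fire safety matches every record that contains both words anywhere. When each comma-separated phrase is indexed as one term, roughly what a keyword-type field does, only the exact phrase matches.</div>
<pre>
```python
from collections import defaultdict


def build_index(records, whole_phrases):
    """Map each term to the set of record ids containing it.

    whole_phrases=False: every word is a term (word-based indexing).
    whole_phrases=True: every comma-separated phrase is one term
    (similar in spirit to a "keyword"-type field).
    """
    index = defaultdict(set)
    for rec_id, keywords in records.items():
        for phrase in keywords.split(","):
            phrase = " ".join(phrase.lower().split())
            if not phrase:
                continue
            terms = [phrase] if whole_phrases else phrase.split()
            for term in terms:
                index[term].add(rec_id)
    return index


def search_words(index, query):
    """AND together the postings for each word of the query."""
    postings = [index.get(w, set()) for w in query.lower().split()]
    return set.intersection(*postings) if postings else set()


records = {  # hypothetical example records
    1: "fire safety, building regulations",
    2: "fire risk assessment, building safety",
}
word_index = build_index(records, whole_phrases=False)
phrase_index = build_index(records, whole_phrases=True)
print(search_words(word_index, "fire safety"))  # both records match
print(phrase_index.get("fire safety", set()))   # only record 1
```
</pre>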
<span></span><br>
<span>Still, there remains the difficulty of identifying a multi-term expression within a body of text. This is usually the domain of Natural Language Processing, which needs special tools and thesauri.</span><br>
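<div>A rough illustration of the thesaurus-based approach: a greedy longest-match lookup of multi-word thesaurus entries in running text. This is a deliberately simple stand-in and does not describe how tools such as Annif work. The thesaurus entries, abstract text and function names are hypothetical.</div>
<pre>
```python
import re


def tokenize(text):
    """Lower-case word tokens; punctuation such as '-' is dropped."""
    return re.findall(r"[a-z0-9]+", text.lower())


def find_phrases(text, thesaurus):
    """Return the (normalised) thesaurus phrases found in text, taking the
    longest entry that starts at each position (greedy left-to-right scan)."""
    entries = {tuple(tokenize(p)) for p in thesaurus}
    max_len = max((len(e) for e in entries), default=0)
    tokens = tokenize(text)
    found = []
    i = 0
    while len(tokens) > i:
        # Try the longest possible candidate first, down to a single token.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            candidate = tuple(tokens[i:i + n])
            if candidate in entries:
                found.append(" ".join(candidate))
                i += n
                break
        else:
            i += 1  # no thesaurus entry starts at this token
    return found


# Hypothetical thesaurus entries and abstract text.
thesaurus = ["fire", "fire safety", "fire risk assessment", "means of escape"]
abstract = "A fire risk assessment should consider means of escape and fire safety."
print(find_phrases(abstract, thesaurus))
# ['fire risk assessment', 'means of escape', 'fire safety']
```
</pre>
<div>Preferring the longest entry means "fire risk assessment" is reported as one phrase rather than as "fire" followed by unmatched words.</div>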
<span></span><br>
<span>Kind regards,</span><br>
<span></span><br>
<span>Martin</span><br>
<span></span><br>
<span></span><br>
<span>-----&lt;eprints-tech-bounces@ecs.soton.ac.uk&gt; wrote: -----</span><br>
<span>To: &lt;eprints-tech@ecs.soton.ac.uk&gt;, &quot;Phil Stacey&quot; &lt;phil@buildvoc.co.uk&gt;</span><br>
<span>From: &quot;David R Newman via Eprints-tech&quot;</span><br>
<span>Sent by:</span><br>
<span>Date: 25.01.2021 10:39</span><br>
<span>Subject: Re: [EP-tech] Help indexing phrases</span><br>
<span></span><br>
<span></span><br>
<span>Hi Phil,</span><br>
<span></span><br>
<span>Unfortunately, I don't think this is possible. I think you would need to create a new field that is a multiple id field and use that instead. You could probably write a script to map the values of the uncontrolled keywords field into this new multiple id field. However, even with this new field, I am not sure how well Xapian would index these as individual multi-word terms. Advanced search on this field should work as you require. In 3.4.2 I introduced the Idci MetaField, which is essentially the same as the Id MetaField but matches case-insensitively. This is useful for matching things like email addresses and usernames, where case does not usually make a functional difference.</span><br>
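<div>A Python sketch of the mapping script suggested above. It assumes each comma-separated keyword becomes one whole value of a multiple field, and that matching is exact but case-insensitive, as the Idci MetaField is described here. This is not EPrints code (EPrints is Perl). The function names and the case-insensitive de-duplication are assumptions added for illustration.</div>
<pre>
```python
def to_multiple_field(raw):
    """Map a comma-separated keywords string to a list of values for a
    hypothetical multiple id-style field, dropping blanks and duplicates
    (duplicates compared case-insensitively: an assumption)."""
    values = []
    seen = set()
    for phrase in raw.split(","):
        value = " ".join(phrase.split())  # trim and collapse white space
        key = value.casefold()
        if value and key not in seen:
            seen.add(key)
            values.append(value)
    return values


def matches(values, query):
    """Whole-value, case-insensitive match (the behaviour described for Idci)."""
    q = " ".join(query.split()).casefold()
    return any(v.casefold() == q for v in values)


kw = to_multiple_field("Fire Risk Assessment, residual risk, fire risk assessment")
print(kw)                                   # ['Fire Risk Assessment', 'residual risk']
print(matches(kw, "FIRE RISK assessment"))  # True
print(matches(kw, "fire risk"))             # False: partial phrases do not match
```
</pre>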
<span></span><br>
<span>I have been thinking about how best to implement a keywords field that is more effective across simple, advanced and faceted search, particularly for multi-word terms. I have yet to settle on a solution, as I need to better understand how Xapian indexing works to see whether it can be set up to allow EPrints to index multi-word terms effectively.</span><br>
<span></span><br>
<span>Regards</span><br>
<span></span><br>
<span>David Newman</span><br>
<span></span><br>
<span>On 25/01/2021 07:06, Phil Stacey via Eprints-tech wrote:</span><br>
<span>I am using the uncontrolled keywords field, which holds phrases separated by commas, and would like to index each whole phrase.</span><br>
<span></span><br>
<span>For example:</span><br>
<span>evacuation lift, part b - fire safety, b5 access and facilities for the fire</span><br>
<span>service, fire risk assessment, residual risk, building safety, b4 external</span><br>
<span>fire spread, means of escape, principal works, health &amp; safety strategy</span><br>
<span>https://eprints.buildvoc.co.uk/id/eprint/865/</span><br>
<span></span><br>
<span>Question: how do I configure Xapian or indexing.pl to index the whole phrase instead of the individual terms (for example fire, safety or building)?</span><br>
<span></span><br>
<span>Best Regards,</span><br>
<span>Phil Stacey</span><br>
<span>building regulations guidance for fire safety &lt;https://eprints.buildvoc.co.uk/&gt;</span><br>
<span></span><br>
<span></span><br>
<span>*** Options: http://mailman.ecs.soton.ac.uk/mailman/listinfo/eprints-tech</span><br>
<span>*** Archive: http://www.eprints.org/tech.php/</span><br>
<span>*** EPrints community wiki: http://wiki.eprints.org/</span><br>
<span></span><br>
<span></span><br>
<span>------------------------------</span><br>
<span></span><br>
<span>_______________________________________________</span><br>
<span>Eprints-tech mailing list</span><br>
<span>Eprints-tech@ecs.soton.ac.uk</span><br>
<span>http://mailman.ecs.soton.ac.uk/mailman/listinfo/eprints-tech</span><br>
<span></span><br>
<span></span><br>
<span>End of Eprints-tech Digest, Vol 148, Issue 45</span><br>
<span>*********************************************</span><br>
</div>
</blockquote>
</div>
</div>
</body>
</html>