<html xmlns:v="urn:schemas-microsoft-com:vml" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:w="urn:schemas-microsoft-com:office:word" xmlns:m="http://schemas.microsoft.com/office/2004/12/omml" xmlns="http://www.w3.org/TR/REC-html40">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=us-ascii">
<meta name="Generator" content="Microsoft Word 15 (filtered medium)">
<style><!--
/* Font Definitions */
@font-face
        {font-family:"Cambria Math";
        panose-1:2 4 5 3 5 4 6 3 2 4;}
@font-face
        {font-family:Calibri;
        panose-1:2 15 5 2 2 2 4 3 2 4;}
/* Style Definitions */
p.MsoNormal, li.MsoNormal, div.MsoNormal
        {margin:0cm;
        margin-bottom:.0001pt;
        font-size:12.0pt;
        font-family:"Times New Roman",serif;}
a:link, span.MsoHyperlink
        {mso-style-priority:99;
        color:blue;
        text-decoration:underline;}
a:visited, span.MsoHyperlinkFollowed
        {mso-style-priority:99;
        color:purple;
        text-decoration:underline;}
span.EmailStyle17
        {mso-style-type:personal;
        font-family:"Calibri",sans-serif;
        color:#1F497D;}
span.EmailStyle19
        {mso-style-type:personal-reply;
        font-family:"Calibri",sans-serif;
        color:#1F497D;}
.MsoChpDefault
        {mso-style-type:export-only;
        font-size:10.0pt;}
@page WordSection1
        {size:612.0pt 792.0pt;
        margin:72.0pt 72.0pt 72.0pt 72.0pt;}
div.WordSection1
        {page:WordSection1;}
--></style><!--[if gte mso 9]><xml>
<o:shapedefaults v:ext="edit" spidmax="1026" />
</xml><![endif]--><!--[if gte mso 9]><xml>
<o:shapelayout v:ext="edit">
<o:idmap v:ext="edit" data="1" />
</o:shapelayout></xml><![endif]-->
</head>
<body lang="EN-GB" link="blue" vlink="purple">
<div style="padding-bottom: 10px; padding-top: 5px;">
<div style="padding:12px; border:1px solid #8D3970; background-color:#F7F9FA; color:#8D3970; font-size:14px; line-height:22px; font-family: Calibri, Arial, Helvetica, sans-serif;">
<strong>CAUTION:</strong> This e-mail originated outside the University of Southampton.
</div>
</div>
<div>
<div class="WordSection1">
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:&quot;Calibri&quot;,sans-serif;color:#1F497D;mso-fareast-language:EN-US">Just another quick thought: most harvesters present a user-agent string of either:<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:&quot;Calibri&quot;,sans-serif;color:#1F497D;mso-fareast-language:EN-US">- something useful e.g. 'IRUS_metadata_harvesting_bot' or 'Unpaywall (http://unpaywall.org/; mailto:team@impactstory.org)'<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:&quot;Calibri&quot;,sans-serif;color:#1F497D;mso-fareast-language:EN-US">- something software-y e.g. 'Apache-HttpClient/4.5.1 (Java/11.0.15)', 'pyoai' or 'GuzzleHttp/6.5.5 curl/7.58.0 PHP/7.4.29'<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:&quot;Calibri&quot;,sans-serif;color:#1F497D;mso-fareast-language:EN-US"><o:p>&nbsp;</o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:&quot;Calibri&quot;,sans-serif;color:#1F497D;mso-fareast-language:EN-US">These could also be triggering a WAF (or similar mechanism) to say 'no'.<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:&quot;Calibri&quot;,sans-serif;color:#1F497D;mso-fareast-language:EN-US"><o:p>&nbsp;</o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:&quot;Calibri&quot;,sans-serif;color:#1F497D;mso-fareast-language:EN-US">As the requests are currently being blocked, they probably aren't reaching your Apache logs, but you could check older logs with something like this (assuming you're using the combined log format, which records the user-agent) to list the user-agents hitting the OAI endpoint and how many requests each has made:<br>
you@server&gt; grep 'oai2' /path/to/the/apache/access.log | cut -d\&quot; -f6 | sort | uniq -c | sort -n<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:&quot;Calibri&quot;,sans-serif;color:#1F497D;mso-fareast-language:EN-US"><o:p>&nbsp;</o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:&quot;Calibri&quot;,sans-serif;color:#1F497D;mso-fareast-language:EN-US">Using a double-quote as the delimiter feels a bit hacky, but in this case I think it's easier than splitting on whitespace or another character!<o:p></o:p></span></p>
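<p class="MsoNormal"><span style="font-size:11.0pt;font-family:&quot;Calibri&quot;,sans-serif;color:#1F497D;mso-fareast-language:EN-US">If shell tools aren't to hand, the same count could be sketched in Python. This is only an illustration: the sample log lines and the function name count_oai_user_agents are made up, and it assumes the same quote-delimited layout of Apache's combined log format that the pipeline above relies on.<o:p></o:p></span></p>
<pre>
```python
from collections import Counter

# Made-up sample lines in Apache's "combined" log format (illustration only).
SAMPLE_LOG = [
    '192.0.2.1 - - [09/Aug/2022:10:00:00 +0100] "GET /cgi/oai2?verb=Identify HTTP/1.1" 200 512 "-" "pyoai"',
    '192.0.2.2 - - [09/Aug/2022:10:00:05 +0100] "GET /cgi/oai2?verb=ListSets HTTP/1.1" 200 2048 "-" "IRUS_metadata_harvesting_bot"',
    '192.0.2.1 - - [09/Aug/2022:10:00:09 +0100] "GET /cgi/oai2?verb=ListRecords HTTP/1.1" 403 199 "-" "pyoai"',
    '192.0.2.3 - - [09/Aug/2022:10:00:12 +0100] "GET /view/year/ HTTP/1.1" 200 4096 "-" "Mozilla/5.0"',
]


def count_oai_user_agents(lines):
    """Count user-agents on lines mentioning 'oai2' (like grep | cut | sort | uniq -c)."""
    counts = Counter()
    for line in lines:
        if "oai2" not in line:
            continue
        fields = line.split('"')
        # Same indexing as `cut -d\" -f6`: the sixth quote-delimited field.
        if len(fields) >= 6:
            counts[fields[5]] += 1
    return counts


if __name__ == "__main__":
    # Least-frequent first, matching the final `sort -n` in the pipeline.
    ranked = sorted(count_oai_user_agents(SAMPLE_LOG).items(), key=lambda kv: kv[1])
    for agent, hits in ranked:
        print(hits, agent)
```
</pre>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:&quot;Calibri&quot;,sans-serif;color:#1F497D;mso-fareast-language:EN-US">To run it against a real log, pass an open file object in place of SAMPLE_LOG; the function only needs something that yields one log line at a time.<o:p></o:p></span></p>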
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:&quot;Calibri&quot;,sans-serif;color:#1F497D;mso-fareast-language:EN-US"><o:p>&nbsp;</o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:&quot;Calibri&quot;,sans-serif;color:#1F497D;mso-fareast-language:EN-US">Cheers,<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:&quot;Calibri&quot;,sans-serif;color:#1F497D;mso-fareast-language:EN-US">John<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:&quot;Calibri&quot;,sans-serif;color:#1F497D;mso-fareast-language:EN-US"><o:p>&nbsp;</o:p></span></p>
<div>
<div style="border:none;border-top:solid #E1E1E1 1.0pt;padding:3.0pt 0cm 0cm 0cm">
<p class="MsoNormal"><b><span lang="EN-US" style="font-size:11.0pt;font-family:&quot;Calibri&quot;,sans-serif">From:</span></b><span lang="EN-US" style="font-size:11.0pt;font-family:&quot;Calibri&quot;,sans-serif"> eprints-tech-bounces@ecs.soton.ac.uk [mailto:eprints-tech-bounces@ecs.soton.ac.uk]
<b>On Behalf Of </b>John Salter via Eprints-tech<br>
<b>Sent:</b> 09 August 2022 10:11<br>
<b>To:</b> eprints-tech@ecs.soton.ac.uk; James Kerwin &lt;jkerwin2101@gmail.com&gt;<br>
<b>Subject:</b> Re: [EP-tech] OAI Harvester broken by new security<o:p></o:p></span></p>
</div>
</div>
<p class="MsoNormal"><o:p>&nbsp;</o:p></p>
<div>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:&quot;Calibri&quot;,sans-serif;color:#1F497D;mso-fareast-language:EN-US">Hi James,<br>
I'm guessing the 'security changes' include a WAF (web application firewall) or similar?<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:&quot;Calibri&quot;,sans-serif;color:#1F497D;mso-fareast-language:EN-US"><o:p>&nbsp;</o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:&quot;Calibri&quot;,sans-serif;color:#1F497D;mso-fareast-language:EN-US">The OAI-PMH resumptionToken isn't that complicated - essentially, it holds the parameters that could otherwise be passed to the script directly, URL-encoded.<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:&quot;Calibri&quot;,sans-serif;color:#1F497D;mso-fareast-language:EN-US">I can see how this might trigger some WAF rules.<o:p></o:p></span></p>
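<p class="MsoNormal"><span style="font-size:11.0pt;font-family:&quot;Calibri&quot;,sans-serif;color:#1F497D;mso-fareast-language:EN-US">As a rough sketch of that idea (not the actual EPrints token layout): the parameter names and offset below are hypothetical, and the snippet only shows how a set of parameters packed into a single URL-encoded value ends up full of percent-escapes, which is the sort of thing a generic WAF rule may treat as suspicious.<o:p></o:p></span></p>
<pre>
```python
from urllib.parse import parse_qs, quote, unquote, urlencode

# Hypothetical harvest state; these names are illustrative only,
# not the real layout of an EPrints resumptionToken.
state = {"metadataPrefix": "oai_dc", "offset": "100"}

# Serialise the state as an ordinary query string ...
inner = urlencode(state)

# ... then percent-encode it so it can travel as one parameter value.
token = quote(inner, safe="")

# The follow-up request a harvester might send (relative URL for brevity).
# urlencode escapes the token again, so its percent signs are themselves encoded.
request = "/cgi/oai2?" + urlencode({"verb": "ListRecords", "resumptionToken": token})

if __name__ == "__main__":
    print(token)
    print(request)
    # Decoding the token recovers the original parameters.
    print(parse_qs(unquote(token)))
```
</pre>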
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:&quot;Calibri&quot;,sans-serif;color:#1F497D;mso-fareast-language:EN-US"><o:p>&nbsp;</o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:&quot;Calibri&quot;,sans-serif;color:#1F497D;mso-fareast-language:EN-US">I think the main approaches are:-<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:&quot;Calibri&quot;,sans-serif;color:#1F497D;mso-fareast-language:EN-US">- whitelist the OAI-PMH endpoint in the WAF<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:&quot;Calibri&quot;,sans-serif;color:#1F497D;mso-fareast-language:EN-US">- whitelist harvesters in the WAF (you might not know all harvesters that visit your repo though!)<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:&quot;Calibri&quot;,sans-serif;color:#1F497D;mso-fareast-language:EN-US">- create a ruleset for the OAI-PMH vocabulary to be included in the WAF<o:p></o:p></span></p>
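<p class="MsoNormal"><span style="font-size:11.0pt;font-family:&quot;Calibri&quot;,sans-serif;color:#1F497D;mso-fareast-language:EN-US">To illustrate the third option: the OAI-PMH vocabulary is small (six verbs and a handful of argument names), so the logic of such a ruleset can be sketched as an allowlist check. A real rule would be written in whatever rule language your WAF uses; the function name looks_like_oai_request is made up, and this Python only shows the shape of the check.<o:p></o:p></span></p>
<pre>
```python
from urllib.parse import parse_qs, urlencode

# The six verbs and the request arguments defined by OAI-PMH 2.0.
OAI_VERBS = {
    "Identify", "ListMetadataFormats", "ListSets",
    "ListIdentifiers", "ListRecords", "GetRecord",
}
OAI_ARGUMENTS = {
    "verb", "identifier", "metadataPrefix",
    "from", "until", "set", "resumptionToken",
}


def looks_like_oai_request(query_string):
    """Return True if the query string uses only the OAI-PMH vocabulary."""
    params = parse_qs(query_string, keep_blank_values=True)
    # Every argument name must be one the protocol defines.
    if not set(params).issubset(OAI_ARGUMENTS):
        return False
    # Exactly one verb, and it must be a known one.
    verbs = params.get("verb", [])
    return len(verbs) == 1 and verbs[0] in OAI_VERBS


if __name__ == "__main__":
    good = urlencode({"verb": "ListRecords", "metadataPrefix": "oai_dc"})
    bad = urlencode({"verb": "ListRecords", "cmd": "ls"})
    print(good, looks_like_oai_request(good))
    print(bad, looks_like_oai_request(bad))
```
</pre>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:&quot;Calibri&quot;,sans-serif;color:#1F497D;mso-fareast-language:EN-US">This checks names only; it deliberately says nothing about the contents of a resumptionToken, since those are repository-specific.<o:p></o:p></span></p>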
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:&quot;Calibri&quot;,sans-serif;color:#1F497D;mso-fareast-language:EN-US"><o:p>&nbsp;</o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:&quot;Calibri&quot;,sans-serif;color:#1F497D;mso-fareast-language:EN-US">The nature of an OAI-PMH harvest could look very much like a bad actor probing your server.<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:&quot;Calibri&quot;,sans-serif;color:#1F497D;mso-fareast-language:EN-US">The size of the response payloads could also mean a harvest creates peaks in server usage, which could lead automated tooling to mistake the OAI-PMH requests for a DoS-style attack.<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:&quot;Calibri&quot;,sans-serif;color:#1F497D;mso-fareast-language:EN-US"><o:p>&nbsp;</o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:&quot;Calibri&quot;,sans-serif;color:#1F497D;mso-fareast-language:EN-US">Without knowing exactly what's at play it's difficult to make more refined suggestions.
<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:&quot;Calibri&quot;,sans-serif;color:#1F497D;mso-fareast-language:EN-US">Happy to have an off-list discussion about this, seeing as it's security-related.<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:&quot;Calibri&quot;,sans-serif;color:#1F497D;mso-fareast-language:EN-US"><o:p>&nbsp;</o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:&quot;Calibri&quot;,sans-serif;color:#1F497D;mso-fareast-language:EN-US">Cheers,<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:&quot;Calibri&quot;,sans-serif;color:#1F497D;mso-fareast-language:EN-US">John<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:&quot;Calibri&quot;,sans-serif;color:#1F497D;mso-fareast-language:EN-US"><o:p>&nbsp;</o:p></span></p>
<div>
<div style="border:none;border-top:solid #E1E1E1 1.0pt;padding:3.0pt 0cm 0cm 0cm">
<p class="MsoNormal"><b><span lang="EN-US" style="font-size:11.0pt;font-family:&quot;Calibri&quot;,sans-serif">From:</span></b><span lang="EN-US" style="font-size:11.0pt;font-family:&quot;Calibri&quot;,sans-serif">
<a href="mailto:eprints-tech-bounces@ecs.soton.ac.uk">eprints-tech-bounces@ecs.soton.ac.uk</a> [<a href="mailto:eprints-tech-bounces@ecs.soton.ac.uk">mailto:eprints-tech-bounces@ecs.soton.ac.uk</a>]
<b>On Behalf Of </b>James Kerwin via Eprints-tech<br>
<b>Sent:</b> 09 August 2022 09:57<br>
<b>To:</b> <a href="mailto:eprints-tech@ecs.soton.ac.uk">eprints-tech@ecs.soton.ac.uk</a><br>
<b>Subject:</b> [EP-tech] OAI Harvester broken by new security<o:p></o:p></span></p>
</div>
</div>
<p class="MsoNormal"><o:p>&nbsp;</o:p></p>
<div>
<div>
<p class="MsoNormal">Hello all, <o:p></o:p></p>
<div>
<p class="MsoNormal"><o:p>&nbsp;</o:p></p>
</div>
<div>
<p class="MsoNormal">Hope everyone is doing well.&nbsp;<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal"><o:p>&nbsp;</o:p></p>
</div>
<div>
<p class="MsoNormal">This isn't specifically an&nbsp;EPrints problem, but as you all use EPrints there may be some experience...<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal"><o:p>&nbsp;</o:p></p>
</div>
<div>
<p class="MsoNormal">We've had some security changes at the uni recently. As a result, clicking some buttons in EPrints now takes us to our IT Services security page. So far we've handled this by accessing EPrints via the university network (e.g. VPN).<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal"><o:p>&nbsp;</o:p></p>
</div>
<div>
<p class="MsoNormal">This issue has now hit our OAI harvester, specifically under &quot;ListRecords&quot; when we click the &quot;Resume&quot; button (<a href="https://livrepository.liverpool.ac.uk/cgi/oai2?verb=ListRecords&amp;metadataPrefix=oai_dc" target="_blank">https://livrepository.liverpool.ac.uk/cgi/oai2?verb=ListRecords&amp;metadataPrefix=oai_dc</a>).
 Currently the organisations that usually harvest our content are unable to do so. I have spoken with our IT Services team to find a solution. Has anybody else experienced similar issues at their organisation, and are there any steps you think I can take to resolve
 it?<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal"><o:p>&nbsp;</o:p></p>
</div>
<div>
<p class="MsoNormal">It doesn't help that I don't know how resumption tokens work. I assume they are stored in a database somewhere? Or a file? The other instances of this in the repository occur when making changes to file metadata, though not to EPrint record
 metadata.<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal"><o:p>&nbsp;</o:p></p>
</div>
<div>
<p class="MsoNormal">Thanks,<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal">James<o:p></o:p></p>
</div>
</div>
</div>
</div>
</div>
</div>
</body>
</html>