[EP-tech] WG: Antwort: Linkcheck: HEAD method ends up in 404

David R Newman drn at ecs.soton.ac.uk
Wed Aug 5 09:13:17 BST 2020


Hi all,

I can see this happens with EPrints 3.3.12+ (and maybe earlier) but is 
not a problem with 3.4.2. Also the HEAD request works with /1234/ but 
not /id/eprint/1234/ on 3.3.12+.  If the trailing slash is missing you 
get redirected (302) to the version with a trailing slash but this then 
fails with /id/eprint/1234 on 3.3.12+ (but otherwise works).  I can take 
a look through the CRUD.pm files for 3.3.16 and 3.4.2 and see if I can 
spot what has been fixed.  I know for 3.4.2 I did some work to support 
PATCH requests but I am not aware of any recent changes/fixes for HEAD 
requests.
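
The behaviour described above can be checked with a small script. The following is only a minimal sketch, assuming Python 3 and its standard library; the function names (url_variants, head_status, probe) are hypothetical and not part of EPrints. It sends a HEAD request to each of the four URL forms, without following redirects, and collects the status code and any Location header.

```python
import http.client
from urllib.parse import urlsplit


def url_variants(base_url, eprint_id):
    """Return the four abstract-page URL forms discussed above."""
    base = base_url.rstrip("/")
    return [
        f"{base}/{eprint_id}",
        f"{base}/{eprint_id}/",
        f"{base}/id/eprint/{eprint_id}",
        f"{base}/id/eprint/{eprint_id}/",
    ]


def head_status(url, timeout=10):
    """Send one HEAD request (redirects are not followed).

    Returns a (status, Location header or None) tuple.
    """
    parts = urlsplit(url)
    conn_cls = (http.client.HTTPSConnection if parts.scheme == "https"
                else http.client.HTTPConnection)
    conn = conn_cls(parts.netloc, timeout=timeout)
    try:
        conn.request("HEAD", parts.path or "/")
        resp = conn.getresponse()
        return resp.status, resp.getheader("Location")
    finally:
        conn.close()


def probe(base_url, eprint_id, fetch=head_status):
    """Map each URL variant to whatever fetch returns for it.

    fetch is injectable so the logic can be exercised without a network.
    """
    return {url: fetch(url) for url in url_variants(base_url, eprint_id)}
```

Calling probe("http://hostname.example.org", 1234) against a real repository (the host here is a placeholder) would return one (status, Location) pair per URL form, which makes it easy to compare a 3.3 and a 3.4 installation side by side.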

That said, for 3.4 functionality was introduced to allow the default 
abstract/summary page URLs to be the /id/eprint/... version, as Google 
Scholar said it would make it easier for them to discover and index 
these.  This is because they could tell from the URL that it was an 
EPrints repository, whereas the generic http://hostname.example.org/1234/ 
could be anything, so they would not know to prioritize its analysis and 
indexing.  There is a good chance that while this work was being done, 
the HEAD request issue was either fixed by chance or detected and fixed 
as part of adding this functionality.  I was not involved directly in 
this work, so I cannot tell you exactly what happened; I am just going 
on the information I have.

Regards

David Newman

On 05/08/2020 08:51, John Salter via Eprints-tech wrote:
> One quick thought - does the trailing slash make a difference?
> In 3.3 /id/eprint/1234 redirected, but /id/eprint/1234/ was also a 404.
>
> Cheers,
> John
> ------------------------------------------------------------------------
> *From:* eprints-tech-bounces at ecs.soton.ac.uk 
> <eprints-tech-bounces at ecs.soton.ac.uk> on behalf of Martin Braendle 
> via Eprints-tech <eprints-tech at ecs.soton.ac.uk>
> *Sent:* 05 August 2020 08:43
> *To:* eprints-tech at ecs.soton.ac.uk <eprints-tech at ecs.soton.ac.uk>
> *Subject:* [EP-tech] WG: Antwort: Linkcheck: HEAD method ends up in 404
>
> Any comment on this by one of the EPrints developers @Southampton ?
>
> Kind regards,
>
> Martin
>
>
> ----- Forwarded by Martin Brändle/at/UZH on 05.08.2020 09:41 -----
>
> From: "Martin Braendle via Eprints-tech" <eprints-tech at ecs.soton.ac.uk>
> To: <eprints-tech at ecs.soton.ac.uk>
> Date: 22.07.2020 07:57
> Subject: [EP-tech] Antwort:  Linkcheck: HEAD method ends up in 404
> Sent by: <eprints-tech-bounces at ecs.soton.ac.uk>
>
> ------------------------------------------------------------------------
>
>
>
> Hi,
>
> just to bring up that topic again: perl_lib/EPrints/Apache/CRUD.pm 
> should allow HEAD requests for https://{repo}/id/eprint/{xy}/, which 
> is why we wonder why EPrints returns a 404.
>
> We observe that not only with our repo, but with other EPrints repos 
> as well, e.g.
>
> curl "https://madoc.bib.uni-mannheim.de/id/eprint/3147/"
>   yields the page
> curl --head "https://madoc.bib.uni-mannheim.de/id/eprint/3147/"
>   yields HTTP status 404!
>
> So this must be a general bug in EPrints, and it is not working 
> according to the specification in perl_lib/EPrints/Apache/CRUD.pm
>
> Kind regards,
>
> Martin
>
> --
> Dr. Martin Brändle
> Zentrale Informatik
> Universität Zürich
> Stampfenbachstr. 73
> CH-8006 Zürich
>
>
> From: "Martin Braendle via Eprints-tech" <eprints-tech at ecs.soton.ac.uk>
> To: <eprints-tech at ecs.soton.ac.uk>
> Date: 14.07.2020 14:12
> Subject: [EP-tech] Linkcheck: HEAD method ends up in 404
> Sent by: <eprints-tech-bounces at ecs.soton.ac.uk>
> ------------------------------------------------------------------------
>
>
>
> Hi out there
>
> we're working on a linkchecker to remove all dead official and related 
> links in our repo. Some of the URLs point back to our own repo, and the 
> linkchecker gets an ugly 404 although the publications exist.
>
> So, what we're doing is some LWP::UserAgent stuff: a simple HEAD 
> request for the URL, and then we analyze the response. If we got 
> '$status_code == HTTP_METHOD_NOT_ALLOWED' we would try a GET, and on 
> top of that we do some delay/retry/timeout handling. But in the end we 
> always catch a 404 :-(
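
The check described above could be sketched roughly as follows. This is an illustration in Python, not the LWP::UserAgent code the authors actually run (which is not shown in this thread); the names fetch and check_link, the retry parameters, and the fallback_statuses option are all assumptions. It tries HEAD first, follows redirects while keeping the request method, and falls back to GET when HEAD returns one of the configured statuses (405 Method Not Allowed by default).

```python
import http.client
import time
from urllib.parse import urljoin, urlsplit

REDIRECTS = (301, 302, 303, 307, 308)


def fetch(method, url, timeout=10, max_redirects=5):
    """Send `method` to `url`, following redirects with the same method.

    Returns the status of the last response received.
    """
    status = None
    for _ in range(max_redirects + 1):
        parts = urlsplit(url)
        conn_cls = (http.client.HTTPSConnection if parts.scheme == "https"
                    else http.client.HTTPConnection)
        conn = conn_cls(parts.netloc, timeout=timeout)
        try:
            path = parts.path or "/"
            if parts.query:
                path += "?" + parts.query
            conn.request(method, path)
            resp = conn.getresponse()
            status, location = resp.status, resp.getheader("Location")
        finally:
            conn.close()
        if status in REDIRECTS and location:
            # Location may be relative, e.g. "/id/eprint/1".
            url = urljoin(url, location)
            continue
        break
    return status


def check_link(url, fetch=fetch, fallback_statuses=(405,), retries=2, delay=1.0):
    """Return the HTTP status for url: HEAD first, GET on a fallback status.

    Network-level errors are retried up to `retries` times; None is
    returned if every attempt fails.
    """
    for attempt in range(retries + 1):
        try:
            status = fetch("HEAD", url)
            if status in fallback_statuses:
                status = fetch("GET", url)
            return status
        except (OSError, http.client.HTTPException):
            # Timeout, DNS failure, refused connection, malformed reply.
            if attempt < retries:
                time.sleep(delay)
    return None
```

With the default settings this reproduces the symptom reported here: a 404 on HEAD is returned as is. Passing fallback_statuses=(404, 405) would be one possible way for a linkchecker to tolerate repositories that answer HEAD with 404 for pages that exist, at the cost of an extra GET for links that really are dead.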
>
> Additional information
> - We use a 404 handler
> - We're allowed to use GET, PUT, TRACE and OPTIONS, which all work 
> fine; only the HEAD method results in a 404.
> - We use the redirect from https://www.zora.uzh.ch/1 => 
> https://www.zora.uzh.ch/id/eprint/1/ and it only seems to concern this 
> dynamic type of content; static pages work fine.
>
> Let's show some examples via CURL:
>
> [zora]$ curl -i -X HEAD -L "https://www.zora.uzh.ch/1"
>
> HTTP/1.1 303 See Other
> Date: Tue, 14 Jul 2020 11:49:08 GMT
> Server: Apache/2.4.6 (Red Hat Enterprise Linux) OpenSSL/1.0.2k-fips 
> mod_perl/2.0.11 Perl/v5.16.3
> Location: /id/eprint/1
>
> HTTP/1.1 303 See Other
> Date: Tue, 14 Jul 2020 11:49:13 GMT
> Server: Apache/2.4.6 (Red Hat Enterprise Linux) OpenSSL/1.0.2k-fips 
> mod_perl/2.0.11 Perl/v5.16.3
> Allow: GET,HEAD,PUT,OPTIONS
> Location: https://www.zora.uzh.ch/id/eprint/1/
> Strict-Transport-Security: max-age=15780000
>
> HTTP/1.1 404 Not Found
> Date: Tue, 14 Jul 2020 11:49:18 GMT
> Server: Apache/2.4.6 (Red Hat Enterprise Linux) OpenSSL/1.0.2k-fips 
> mod_perl/2.0.11 Perl/v5.16.3
> Cache-Control: no-store, no-cache, must-revalidate
> Strict-Transport-Security: max-age=15780000
> Content-Type: text/html; charset=utf-8
>
>
>
> [zora]$ curl -i -X HEAD -L "https://www.zora.uzh.ch/"
>
> HTTP/1.1 200 OK
> Date: Tue, 14 Jul 2020 11:49:31 GMT
> Server: Apache/2.4.6 (Red Hat Enterprise Linux) OpenSSL/1.0.2k-fips 
> mod_perl/2.0.11 Perl/v5.16.3
> Expires: Thu, 13 Aug 2020 11:49:31 GMT
> Cache-Control: no-store, no-cache, must-revalidate
> Vary: Accept-Encoding
> Strict-Transport-Security: max-age=15780000
> Content-Type: text/html; charset=utf-8
>
>
>
> [zora]$ curl -i -X HEAD -L "https://www.zora.uzh.ch/help/"
>
> HTTP/1.1 200 OK
> Date: Tue, 14 Jul 2020 11:49:53 GMT
> Server: Apache/2.4.6 (Red Hat Enterprise Linux) OpenSSL/1.0.2k-fips 
> mod_perl/2.0.11 Perl/v5.16.3
> Expires: Thu, 13 Aug 2020 11:49:53 GMT
> Cache-Control: no-store, no-cache, must-revalidate
> Vary: Accept-Encoding
> Strict-Transport-Security: max-age=15780000
> Content-Type: text/html; charset=utf-8
>
>
> Does anybody have any suggestions, solutions or hints?
>
> Kind regards from Zürich
> Martin & Jens
> *** Options: http://mailman.ecs.soton.ac.uk/mailman/listinfo/eprints-tech
> *** Archive: http://www.eprints.org/tech.php/
> *** EPrints community wiki: http://wiki.eprints.org/

