[EP-tech] WG: Antwort: Linkcheck: HEAD method ends up in 404

John Salter J.Salter at leeds.ac.uk
Wed Aug 5 08:51:49 BST 2020


One quick thought - does the trailing slash make a difference?
In 3.3 /id/eprint/1234 redirected, but /id/eprint/1234/ was also a 404.
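
A minimal Python sketch for testing the trailing-slash idea. It is not from the original mail: the helper names `slash_variants`, `head_status` and `compare_slash` are illustrative, and it assumes a plain HEAD request without following redirects is enough to see the difference.

```python
import http.client
import urllib.parse


def slash_variants(url):
    """Return the URL without and with a trailing slash, in that order."""
    base = url.rstrip("/")
    return [base, base + "/"]


def head_status(url, timeout=10):
    """Send a single HEAD request; http.client does not follow redirects."""
    parts = urllib.parse.urlsplit(url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.netloc, timeout=timeout)
    else:
        conn = http.client.HTTPConnection(parts.netloc, timeout=timeout)
    try:
        conn.request("HEAD", parts.path or "/")
        return conn.getresponse().status
    finally:
        conn.close()


def compare_slash(url, status_of=head_status):
    """Map each slash variant of `url` to the status code it returns."""
    return {variant: status_of(variant) for variant in slash_variants(url)}
```

Calling `compare_slash("https://repo.example.org/id/eprint/1234")` (a placeholder URL) returns one status per variant; a redirect status next to a 404 would point at the trailing slash.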

Cheers,
John
________________________________
From: eprints-tech-bounces at ecs.soton.ac.uk <eprints-tech-bounces at ecs.soton.ac.uk> on behalf of Martin Braendle via Eprints-tech <eprints-tech at ecs.soton.ac.uk>
Sent: 05 August 2020 08:43
To: eprints-tech at ecs.soton.ac.uk <eprints-tech at ecs.soton.ac.uk>
Subject: [EP-tech] WG: Antwort: Linkcheck: HEAD method ends up in 404


Any comment on this by one of the EPrints developers @Southampton ?

Kind regards,

Martin


----- Forwarded by Martin Brändle/at/UZH on 05.08.2020 09:41 -----

From: "Martin Braendle via Eprints-tech" <eprints-tech at ecs.soton.ac.uk>
To: <eprints-tech at ecs.soton.ac.uk>
Date: 22.07.2020 07:57
Subject: [EP-tech] Antwort:  Linkcheck: HEAD method ends up in 404
Sent by: <eprints-tech-bounces at ecs.soton.ac.uk>

________________________________



Hi,

just to bring up that topic again: perl_lib/EPrints/Apache/CRUD.pm should allow HEAD requests for https://{repo}/id/eprint/{xy}/, so we wonder why EPrints returns a 404.

We observe this not only with our repo, but with other EPrints repos as well, e.g.

curl "https://madoc.bib.uni-mannheim.de/id/eprint/3147/"   yields the page
curl --head "https://madoc.bib.uni-mannheim.de/id/eprint/3147/"  yields HTTP status 404!

So this must be a general bug in EPrints: it does not behave as specified in perl_lib/EPrints/Apache/CRUD.pm.

Kind regards,

Martin

--
Dr. Martin Brändle
Zentrale Informatik
Universität Zürich
Stampfenbachstr. 73
CH-8006 Zürich


From: "Martin Braendle via Eprints-tech" <eprints-tech at ecs.soton.ac.uk>
To: <eprints-tech at ecs.soton.ac.uk>
Date: 14.07.2020 14:12
Subject: [EP-tech] Linkcheck: HEAD method ends up in 404
Sent by: <eprints-tech-bounces at ecs.soton.ac.uk>
________________________________



Hi out there

we're working on a linkchecker to remove all official and related links in our repo that are gone. Some of the URLs point back to our own repo, and the linkchecker gets an ugly 404 although the publications exist.

So, what we're doing is some LWP::UserAgent stuff: a simple HEAD request for the URL, and then we analyze the response. If '$status_code == HTTP_METHOD_NOT_ALLOWED', we try a GET, and altogether we do some delay/retry/timeout handling. But in the end we always catch a 404 :-(
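
The flow described above (HEAD first, GET as a fallback, plus delay/retry handling) can be sketched roughly as follows. This is a Python illustration, not the original LWP::UserAgent code. The names `fetch_status` and `check_link` are hypothetical, and falling back to GET on a 404 as well as on a 405 is an assumption added here as a possible workaround for the behaviour reported in this thread. Redirects are followed by hand so that the request method stays the same on every hop.

```python
import time
import urllib.error
import urllib.parse
import urllib.request


class NoRedirect(urllib.request.HTTPRedirectHandler):
    """Make urllib report 3xx responses instead of following them."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        return None


def fetch_status(url, method, timeout=10, max_hops=5):
    """Follow redirects by hand, keeping `method` on every hop.

    Returns the status code of the last response received.
    """
    opener = urllib.request.build_opener(NoRedirect)
    status = None
    for _ in range(max_hops + 1):
        request = urllib.request.Request(url, method=method)
        try:
            with opener.open(request, timeout=timeout) as response:
                return response.status
        except urllib.error.HTTPError as err:
            status = err.code
            location = err.headers.get("Location")
            if status not in (301, 302, 303, 307, 308) or not location:
                return status
            # Location may be relative, so resolve it against the current URL.
            url = urllib.parse.urljoin(url, location)
    return status  # still a redirect after max_hops


def check_link(url, fetch=fetch_status, retries=2, delay=1.0,
               fallback_on=(404, 405)):
    """Try HEAD first; repeat with GET if HEAD gives a status in fallback_on.

    Returns (method, status) for the request that decided the result.
    Network errors are retried up to `retries` times, then re-raised.
    """
    for method in ("HEAD", "GET"):
        for attempt in range(retries + 1):
            try:
                status = fetch(url, method)
                break
            except OSError:  # URLError and socket timeouts are OSError subclasses
                if attempt == retries:
                    raise
                time.sleep(delay)
        if method == "HEAD" and status in fallback_on:
            continue  # do not trust this HEAD answer, ask again with GET
        return method, status
```

With this policy, a link whose HEAD request gives 404 but whose GET request gives 200 is reported as `("GET", 200)` and therefore not treated as dead. The cost is one extra GET for every link that really is gone.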

Additional information
- We use a 404 handler
- We're allowed to use Get, Put, Trace, Options - all fine, only HEAD method results in a 404 ?!?
- We use the redirect from https://www.zora.uzh.ch/1 => https://www.zora.uzh.ch/id/eprint/1/ and it only seems to concern this dynamic type of content; static pages work fine.
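
To see where in such a redirect chain the 404 appears, each hop can be requested separately with HEAD. The following is a rough Python sketch under stated assumptions: `head_once` and `trace_head` are illustrative names, not part of EPrints or of the linkchecker described here.

```python
import http.client
import urllib.parse

REDIRECT_CODES = (301, 302, 303, 307, 308)


def head_once(url, timeout=10):
    """Send one HEAD request; return (status, Location header or None)."""
    parts = urllib.parse.urlsplit(url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.netloc, timeout=timeout)
    else:
        conn = http.client.HTTPConnection(parts.netloc, timeout=timeout)
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    try:
        conn.request("HEAD", target)
        response = conn.getresponse()
        return response.status, response.getheader("Location")
    finally:
        conn.close()


def trace_head(url, head=head_once, max_hops=5):
    """Return [(url, status), ...] for every hop of a HEAD redirect chain."""
    hops = []
    for _ in range(max_hops + 1):
        status, location = head(url)
        hops.append((url, status))
        if status not in REDIRECT_CODES or not location:
            break
        # Location may be relative ("/id/eprint/1") or absolute.
        url = urllib.parse.urljoin(url, location)
    return hops
```

The `head` parameter lets the tracing logic be exercised with a canned table of responses instead of a live server.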

Let's show some examples via CURL:

[zora]$ curl -i -X HEAD -L "https://www.zora.uzh.ch/1"
HTTP/1.1 303 See Other
Date: Tue, 14 Jul 2020 11:49:08 GMT
Server: Apache/2.4.6 (Red Hat Enterprise Linux) OpenSSL/1.0.2k-fips mod_perl/2.0.11 Perl/v5.16.3
Location: /id/eprint/1

HTTP/1.1 303 See Other
Date: Tue, 14 Jul 2020 11:49:13 GMT
Server: Apache/2.4.6 (Red Hat Enterprise Linux) OpenSSL/1.0.2k-fips mod_perl/2.0.11 Perl/v5.16.3
Allow: GET,HEAD,PUT,OPTIONS
Location: https://www.zora.uzh.ch/id/eprint/1/
Strict-Transport-Security: max-age=15780000

HTTP/1.1 404 Not Found
Date: Tue, 14 Jul 2020 11:49:18 GMT
Server: Apache/2.4.6 (Red Hat Enterprise Linux) OpenSSL/1.0.2k-fips mod_perl/2.0.11 Perl/v5.16.3
Cache-Control: no-store, no-cache, must-revalidate
Strict-Transport-Security: max-age=15780000
Content-Type: text/html; charset=utf-8



[zora]$ curl -i -X HEAD -L "https://www.zora.uzh.ch/"
HTTP/1.1 200 OK
Date: Tue, 14 Jul 2020 11:49:31 GMT
Server: Apache/2.4.6 (Red Hat Enterprise Linux) OpenSSL/1.0.2k-fips mod_perl/2.0.11 Perl/v5.16.3
Expires: Thu, 13 Aug 2020 11:49:31 GMT
Cache-Control: no-store, no-cache, must-revalidate
Vary: Accept-Encoding
Strict-Transport-Security: max-age=15780000
Content-Type: text/html; charset=utf-8



[zora]$ curl -i -X HEAD -L "https://www.zora.uzh.ch/help/"
HTTP/1.1 200 OK
Date: Tue, 14 Jul 2020 11:49:53 GMT
Server: Apache/2.4.6 (Red Hat Enterprise Linux) OpenSSL/1.0.2k-fips mod_perl/2.0.11 Perl/v5.16.3
Expires: Thu, 13 Aug 2020 11:49:53 GMT
Cache-Control: no-store, no-cache, must-revalidate
Vary: Accept-Encoding
Strict-Transport-Security: max-age=15780000
Content-Type: text/html; charset=utf-8


Does anybody have any suggestions, solutions, or hints?

Kind regards from Zürich
Martin & Jens
*** Options: http://mailman.ecs.soton.ac.uk/mailman/listinfo/eprints-tech
*** Archive: http://www.eprints.org/tech.php/
*** EPrints community wiki: http://wiki.eprints.org/