[EP-tech] redirect some eprintid url to another site

Yuri Carrer yuri.carrer at unipd.it
Wed May 18 09:31:03 BST 2022


CAUTION: This e-mail originated outside the University of Southampton.

Hi!


Thanks for this, very interesting. The main goal is to remove the site from public view entirely, including the login page, styles and JavaScript. I could put traefik or another proxy in front of it if there is no way to make Apache ask for basic auth before anything else. I think this is the tricky part:


  <Directory "/usr/share/eprints3/cgi/users">
    AuthName "User Area"
    AuthType "Basic"
    PerlAuthenHandler EPrints::Apache::Auth::authen
    PerlAuthzHandler EPrints::Apache::Auth::authz
    require valid-user

    ...


I've added a


<Location /cgi/users/home>
    AuthType Basic
    AuthName "Restricted Content"
    AuthUserFile /etc/apache2/.htpasswd
    Require valid-user
</Location>


but it is ignored.


On 17/05/22 18:54, John Salter via Eprints-tech wrote:
Hi Yuri,
You could use a trigger based on this:
https://wiki.eprints.org/w/Display_a_custom_response_during_downtime
Instead of having a whitelisted IP address, you could check for a current_user:
-    if( $repository->config( "maintenance_allowed_ip" ) && $request->connection->remote_ip eq $repository->config( "maintenance_allowed_ip" ) ){
+    if( defined $repository->current_user ){


If you use the EPrints user login (i.e. not third-party SSO), you would have to let the login page be displayed by changing this line (I haven't tested the updated one):
-    elsif( $uri !~ /^$urlpath\/(style|images)\// )
+    elsif( $uri !~ m#^$urlpath/(style/|images/|cgi/users/login|cgi/users/home)# )
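
The decision described above can be sketched outside EPrints as follows. This is only an illustration in Python, not the trigger code from the wiki page: `is_request_allowed`, the `logged_in` flag and the `urlpath` default are hypothetical names, chosen to show which anonymous requests would still be served normally under the suggested pattern.

```python
import re


def is_request_allowed(uri: str, logged_in: bool, urlpath: str = "") -> bool:
    """Return True if the request should be served normally.

    Logged-in users see everything; anonymous users only reach the
    paths listed in the pattern (styles, images and the login pages).
    """
    if logged_in:
        return True
    # Same alternatives as the suggested pattern; urlpath is escaped so
    # that it is matched literally rather than as a regular expression.
    public = re.compile(
        "^" + re.escape(urlpath)
        + r"/(style/|images/|cgi/users/login|cgi/users/home)"
    )
    return public.match(uri) is not None
```

With this pattern an anonymous request for a path such as /javascript/ is not in the list, so it would also receive the custom response; whether that is wanted depends on how much of the site should stay hidden.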

Cheers,
John



From: eprints-tech-bounces at ecs.soton.ac.uk On Behalf Of Yuri via Eprints-tech
Sent: 16 May 2022 10:34
To: eprints-tech at ecs.soton.ac.uk
Subject: Re: [EP-tech] redirect some eprintid url to another site


Thanks to all, everything has worked as expected.



A question:



I need to give access to the old content for some months, so I'll protect the new URL with basic auth. Doing it in '/var/auto-apache.conf' works, but not for /cgi/users/home, because the Perl handler kicks in first (the page works, but there is no CSS/JS because they are blocked by the auth).



Is there a documented way to make an EPrints repository totally private? I mean that the whole site can be accessed only with a login and password.


On 15/03/22 12:18, David R Newman wrote:

Hi Yuri,



Assuming your repository is already HTTPS only, you only need to worry about editing ssl/securevhost.conf, which has the PerlTransHandler line; you could put a LocationMatch around the outside of it.  You can technically do this for the HTTP configuration too, but it will get overwritten if you ever run generate_apacheconf again.



I have not ever tried putting a LocationMatch around a PerlTransHandler line but I am not aware of anything in Apache that would stop that working.  Presumably you can separately write your mod rewrite rules in Apache configuration to deal with the redirects you need.  However, it sounds like if you knew some of the redirects that would not help you guess the rest so I assume you would need to programmatically generate the mod rewrite rules.



I assume by rename the archive you mean put on another hostname.  You can pretty much do what you said but I would make sure you set aside plenty of time and a backout strategy in case you have problems.  For changing the base URL you will need to edit the archive's cfg/cfg.d/10_core.pl.  Usually it is only the host and/or securehost configuration settings that need to be changed.  I would then run all of the following scripts:



(0. epadmin test)

1. generate_apacheconf

2. generate_static

3. apachectl restart (reload is probably sufficient but just to be safe I tend to use restart when I change the Apache config)

4. epadmin refresh_abstracts

5. epadmin refresh_views



I would also make sure you restart the indexer via the web admin menu.  I cannot think of a specific reason why indexer tasks would care about a hostname change but probably best to be sure.



Having a look at John Salter's suggestion, that looks like a good solution, although it still hits EPrints, so it would contribute more server load than redirecting before the request reaches EPrints.  I think that is fairly negligible, though, unless your server is always running hot.



Regards



David Newman






On 15/03/2022 10:28, Yuri wrote:

Hi David!



Since this is almost 99% of the archive, some thousands of items, it is quite difficult to maintain thousands of $c->{rewrite_exceptions} lines, but that seems the only possible path, because the Perl handler runs before RewriteRule. In other cases it is possible to use LocationMatch to set the default handler, so that the rewrite rules run.



Another option could be to rename the old archive. Then we could use the virtual host to do just the redirects, and still access the old items (we need them internally anyway).



Other than changing the base URL, changing the Apache configs and running generate_static / generate_abstracts, what would I need to do to rename the old archive?


On 14/03/22 18:00, David R Newman wrote:

Hi both,



I have been doing something similar recently, albeit for abstract pages.  I prefer the brute force approach of adding to $c->{rewrite_exceptions} and then manually adding the Apache Mod Rewrite rules to an archive level file called cfg/apache_redirects.conf and then including that in cfg/apachevhost.conf and/or ssl/securevhost.conf.  You could write a script to programmatically generate this and the cfg.d file for rewrite_exceptions from a mapping file.
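
A generator script of the kind mentioned above might look roughly like this. It is a minimal Python sketch under several assumptions: the mapping file has one "eprintid otherurl" pair per line, a permanent redirect (R=301) is wanted, and rewrite_exceptions accepts a list of URL path prefixes, which is why each entry ends in a slash so that /12/ does not also cover /123/. The function names are hypothetical, and the generated Perl line should be checked against your EPrints version before use.

```python
from typing import Dict, List


def parse_mapping(text: str) -> Dict[int, str]:
    """Parse lines of the form '<eprintid> <new url>'.

    Blank lines and lines starting with '#' are skipped.
    """
    mapping: Dict[int, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        eprintid, url = line.split(None, 1)
        mapping[int(eprintid)] = url.strip()
    return mapping


def apache_rules(mapping: Dict[int, str]) -> List[str]:
    """Build one RewriteRule per item, matching /<id> and anything below it."""
    rules = ["RewriteEngine On"]
    for eprintid in sorted(mapping):
        rules.append(
            f"RewriteRule ^/{eprintid}(/.*)?$ {mapping[eprintid]} [R=301,L]"
        )
    return rules


def rewrite_exceptions(mapping: Dict[int, str]) -> str:
    """Build a Perl line for a cfg.d file (assumed path-prefix format)."""
    paths = ", ".join(f"'/{eprintid}/'" for eprintid in sorted(mapping))
    return f"push @{{$c->{{rewrite_exceptions}}}}, ( {paths} );"
```

The output of `apache_rules` would go into a file such as cfg/apache_redirects.conf, and the output of `rewrite_exceptions` into a cfg.d file, so that both stay in step with the one mapping file.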



I had considered doing something that would allow you to add a metadata field called redirect_url or similar that you could just edit as a user (probably an admin or editor only if the item is live), which could then be used to automatically redirect off site.  However, that would require some changes at a core level, which feels a bit excessive for tackling this problem.



One option, if you want to redirect just from abstract pages, is to test whether this new redirect_url field is set and, if it is, embed some JavaScript in the abstract page that redirects to the new URL.  That is a bit hacky but makes it easier to add new items to be redirected in future rather than having to maintain a mappings list independent of the database.  However, this is no use if you want to redirect document URLs.  I am not sure whether that is what you want to do?



Regards



David Newman




On 14/03/2022 4:07 pm, Yuri via Eprints-tech wrote:

Hi John!



Thanks for sharing the gist. There are about 10,000 objects, so I should load the map from a file. Unfortunately, it is an old EPrints without EP_TRIGGER_URL_REWRITE, but I think I can just copy the code at the beginning of Rewrite.pm.



Thanks!


On 14/03/22 16:35, John Salter wrote:
Hi Yuri,
I would use the EPrints URL Rewrite trigger.

How many items are mapped to the other system?
Do you want to map landing page requests to one URL, and document requests to another URL (e.g. directly to the document in the other system)?

This gist: https://gist.github.com/jesusbagpuss/a5c574e1839612ef7e332d1d25edac42 allows you to specify the eprintid / new locations in a hash.
If all the new locations are on the same site, you could update line 19 to include the new base http URL, and just have the eprintid => otherid in the hash.

As written, it will capture requests for anything starting with the EPrintID (requests for the landing page; downloads; thumbnail requests).
You could map these URLs individually, and change the regex match on line 13 to redirect document requests to the new document URL; landing page requests to the new landing page etc.
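
The lookup described above can be illustrated with a short sketch. This is Python rather than the Perl used in the gist, and it is not the gist's code: `redirect_target`, `id_map` and `base_url` are hypothetical names, and the pattern simply assumes that item URLs start with the numeric EPrint ID.

```python
import re
from typing import Dict, Optional

# Matches /<digits> followed by a slash or the end of the path, so that
# /123/ and /123/1/paper.pdf match ID 123 but /1234/ does not.
EPRINT_PATH = re.compile(r"^/(\d+)(?:/|$)")


def redirect_target(uri: str, id_map: Dict[int, str], base_url: str) -> Optional[str]:
    """Return the new URL for a mapped EPrint ID, or None to leave the request alone."""
    match = EPRINT_PATH.match(uri)
    if match is None:
        return None
    other_id = id_map.get(int(match.group(1)))
    if other_id is None:
        return None
    # Every request under the item (landing page, downloads, thumbnails)
    # goes to the same new location in this sketch.
    return f"{base_url}/{other_id}"
```

Sending document requests to a different target than landing-page requests would mean capturing the rest of the path as well and keeping a second map for documents.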

Hope that helps - let me know if you need more info.

Cheers,
John



________________________________
From: eprints-tech-bounces at ecs.soton.ac.uk on behalf of Yuri via Eprints-tech <eprints-tech at ecs.soton.ac.uk>
Sent: 14 March 2022 13:45
To: EPrints.org Technical List <eprints-tech at ecs.soton.ac.uk>
Subject: [EP-tech] redirect some eprintid url to another site


Hi!

  we're migrating many objects from EPrints to various other platforms. I
would like to set up redirects for the URLs of these documents. For example
from myeprint.com/eprintid to another.site.com/otherid (I have a map
of eprintid to otherurl).

I'm trying to do it with RewriteMap and RewriteRule, but EPrints defines
a Perl handler (PerlTransHandler EPrints::Apache::Rewrite) to manage
URLs and handle rewrites. I would prefer not to use
cfg.d/url.pl because there are a lot of objects.

Any ideas? Should I patch Rewrite.pm to do it internally from a map file?
Return DECLINED? I don't know if it is worth the time; I would prefer a
simpler solution.


*** Options: http://mailman.ecs.soton.ac.uk/mailman/listinfo/eprints-tech
*** Archive: http://www.eprints.org/tech.php/
*** EPrints community wiki: http://wiki.eprints.org/




--
Yuri Carrer

 CAB - Centro di Ateneo per le Biblioteche, Università di Padova
 Tel: 049/827 9712 - Via Beato Pellegrino, 28 - Padova
