[EP-tech] Plan S - Persistent Identifiers
David R Newman
drn at ecs.soton.ac.uk
Wed Apr 28 10:50:45 BST 2021
Hi James,
Fortunately (or unfortunately) I have had quite a few thoughts on the
matter. I have done my best to keep them to the point.
First, I don't think it is possible to account for the same item being
in multiple repositories. As an individual institutional repository
owner you have no control over other institutional repositories who may
have shared authors on publications and have the right to make the same
publication available on their institutional repositories. From my
background in the Semantic Web, I know that trying to determine if two
things with different unique identifiers are actually the same thing is
a near impossible problem to solve definitively. The best you can do is ensure
the same unique identifier is not somehow used for two different things
and also avoid creating and using more unique identifiers than are
absolutely necessary.
EPrints has always had a unique identifier in the form of a URI (e.g.
http://eprints.example.org/id/eprint/123). I would suggest this is the
most appropriate unique identifier to use as every item in your
repository will have one but not every item will necessarily have a DOI
or similar unique identifier. You could configure your repository to
use a DOI minting service (e.g. data repositories often use DataCite)
but this rather breaks the rule of not creating more unique identifiers
than are absolutely necessary.
One potential problem I have noted with EPrints URIs is that these were
all originally http, but if you modify your HTTPS configuration to ensure
HTTPS is used everywhere, then these URIs will likely also be changed to
https, making them non-persistent, which is another big no-no. For this
reason, early on in EPrints 3.4 I introduced a configuration property,
'uri_url', so that you can modify a repository's HTTPS configuration
but, with this option set, keep the URIs as http. In the context of
being a unique identifier, you need to consider the URI as a string of
characters: if this string of characters changes, then it is no longer
the same unique identifier, even though it still describes the same
thing.
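To illustrate the point about treating the URI as a plain string, here
is a minimal Python sketch. It is not EPrints code: the names
PERSISTENT_URI_BASE, CURRENT_BASE_URL and eprint_uri are hypothetical
and only mimic the idea behind uri_url, which is to build the
identifier from a fixed base rather than from whatever scheme and
hostname the repository is currently served on.

```python
# Hypothetical illustration, not EPrints code: the identifier is built from a
# fixed base, independent of the URL the repository is currently served from.
PERSISTENT_URI_BASE = "http://eprints.example.org"  # assumed: set once, never changed
CURRENT_BASE_URL = "https://eprints.example.org"    # assumed: follows the HTTPS config


def eprint_uri(eprint_id: int, base: str = PERSISTENT_URI_BASE) -> str:
    """Return the identifier string for an eprint under the given base."""
    return f"{base}/id/eprint/{eprint_id}"


old_identifier = eprint_uri(123)
naive_identifier = eprint_uri(123, base=CURRENT_BASE_URL)

# Compared as identifiers these are different strings, even though both
# describe the same item.
print(old_identifier == naive_identifier)  # False
print(old_identifier == eprint_uri(123))   # True
```

The second comparison is the behaviour you want from a persistent
identifier: the same string every time, whatever happens to the
repository's serving configuration.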
I think you also identified another potential problem with the structure
of an EPrints URI, which is if there is a change to the hostname of the
repository itself. Again the uri_url option should allow you to ensure
URIs do not change. Unfortunately, this may lead to confusion for users
who wonder why the hostname for these URIs is different to the hostname
of the repository. Also, depending on what happens to the old hostname's
DNS registration, these URIs may become unresolvable. However, there is
no requirement for URIs, as with any unique identifier, to be resolvable.
If an item has a DOI provided by a journal, an ISBN provided by a book
publisher, etc. then this would typically be more useful than an
institutional repository's URI, as this would be used in a general
context (i.e. you would expect a DOI or ISBN to appear in the citation
for such an item). However, I think to provide the best possible
coverage there is a need for both forms of unique identifier: the one
from the original publisher (if that is not the institutional
repository, which would likely be the case for theses, etc.) and one
from the institutional repository. If you provide export formats that
include both unique identifiers and can be ingested by third-party
applications, you build a link between the two, and it becomes possible
to build a network of unique identifiers for a particular item. Then
when you get a journal article that has authors from multiple
institutions, it will be possible to see that a publication from
institution A is the same publication as from institution B.
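As a rough sketch of how a third-party application might build such a
network, the Python below groups identifiers with a small union-find
structure. Everything in it is an assumption for illustration: the
class name IdentifierNetwork, the two repository hostnames and the DOI
are made up, and a real harvester would read the identifier pairs from
each repository's export rather than have them hard-coded.

```python
class IdentifierNetwork:
    """Union-find over identifier strings; linked identifiers share a root."""

    def __init__(self):
        self._parent = {}  # maps each identifier to its parent identifier

    def _find(self, identifier):
        """Return the representative identifier of the group."""
        self._parent.setdefault(identifier, identifier)
        root = identifier
        while self._parent[root] != root:
            root = self._parent[root]
        # Path compression: point every identifier on the path at the root.
        while self._parent[identifier] != root:
            next_id = self._parent[identifier]
            self._parent[identifier] = root
            identifier = next_id
        return root

    def link(self, a, b):
        """Record that identifiers a and b describe the same item."""
        root_a, root_b = self._find(a), self._find(b)
        if root_a != root_b:
            self._parent[root_b] = root_a

    def same_item(self, a, b):
        """True if a chain of links connects a and b."""
        return self._find(a) == self._find(b)


# Made-up identifiers: each repository's export pairs its own URI with the
# publisher's DOI for the article.
URI_A = "http://eprints.institution-a.example/id/eprint/123"
URI_B = "http://eprints.institution-b.example/id/eprint/987"
DOI = "doi:10.1234/example"

network = IdentifierNetwork()
network.link(URI_A, DOI)  # from institution A's export
network.link(URI_B, DOI)  # from institution B's export

print(network.same_item(URI_A, URI_B))  # True
```

Neither repository needs to know about the other here. The shared DOI
is what connects the two institutional URIs, which is why exporting
both identifiers together matters.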
Regards
David Newman
On 28/04/2021 10:02, James Kerwin via Eprints-tech wrote:
> Hi All,
>
> For once I have not broken anything, just looking for opinions and advice.
>
> As part of Plan S we need to have persistent identifiers for scholarly
> publications. I have read this EPrints wiki:
>
> https://wiki.eprints.org/w/Plan_S
>
> At Liverpool we aren't 100% sure about this topic. DOI would be the
> obvious choice, but there are some on my team who reasonably point out
> that the same item could be in several repositories and end up having
> several separate DOIs associated with it. I'm not sure how much that
> matters.
>
> Does anybody have any thoughts on this point? We spoke with my
> predecessor, Adam, who was really helpful. Unconvinced team members
> have suggested using handle.net
> <http://handle.net/>
> which I think is overkill and doesn't necessarily meet the needs of
> Plan S in itself.
>
> Also, the URL/EPrints ID for each item, is this not a suitable
> persistent identifier? The wiki linked above does mention this.
> There's always the possibility a repository URL could change in the
> future, but I would expect some sort of redirect to overcome this.
>
> If there is a more suitable place for this type of discussion please
> send me there.
>
> Thanks,
> James
>
> *** Options: http://mailman.ecs.soton.ac.uk/mailman/listinfo/eprints-tech
> *** Archive: http://www.eprints.org/tech.php/
> *** EPrints community wiki: http://wiki.eprints.org/