[GOAL] Re: PMC & UKPMC Should Harvest From Institutional Repositories
Johanna McEntyre
mcentyre at ebi.ac.uk
Fri Apr 13 16:27:39 BST 2012
Stevan,
Thanks for these comments on how PMC & UKPMC could be improved. While I can't respond to the mandate changes suggested, I can comment on the suggestion that UKPMC should harvest/link to IR versions of papers.
We have considered doing this in some depth. However, for several reasons, it is harder to do than to say:
(1) Firstly, UKPMC is a full text article database. Harvesting protocols such as OAI-PMH deal in metadata only. UKPMC is already supplemented by PubMed, Agricola, and EPO patent abstracts (about 26 million of them), so it is unclear how much content routine harvesting would add.
(2) Secondly, there is no clean way to identify life science & related content in IRs (this is a matter of research, not production-level functionality), apart from perhaps resolving metadata to PMIDs, which of course would not add new content to UKPMC.
(3) Thirdly, because UKPMC is primarily interested in full text articles, we would want to identify those records in IRs that have full text. Again, there is no clean programmatic way of doing this that we know of. If anyone knows how to do this programmatically then we would be interested in learning how.
(4) Finally, PMC & UKPMC (and PMC Canada) archive full text articles in XML. This structured content facilitates:
(a) linking to related public life science databases such as UniProt;
(b) operations such as text mining and smart indexing (e.g. restricting searches to figure legends);
(c) ensuring the integrity of the archive, since viewed articles are rendered from the XML database to HTML on the fly; and
(d) reuse by third parties, in the case of OA articles.
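Points (1) and (3) above can be illustrated with a minimal sketch. It assumes a repository exposes simple Dublin Core (oai_dc) records over OAI-PMH; the record, the repository URLs, and the helper name full_text_hints are all invented for illustration. The most a harvester can extract from such a record is a set of hints (a declared PDF format, an identifier ending in ".pdf"), none of which establishes that the file is the full text of the article or that it is openly accessible.

```python
import xml.etree.ElementTree as ET

# Namespaces used by OAI-PMH and its simple Dublin Core (oai_dc) format.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Invented record for illustration; real IR records vary widely.
SAMPLE_RECORD = """<record xmlns="http://www.openarchives.org/OAI/2.0/">
  <header>
    <identifier>oai:repository.example.org:1234</identifier>
    <datestamp>2012-04-01</datestamp>
  </header>
  <metadata>
    <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
               xmlns:dc="http://purl.org/dc/elements/1.1/">
      <dc:title>An invented example title</dc:title>
      <dc:format>application/pdf</dc:format>
      <dc:identifier>http://repository.example.org/1234/</dc:identifier>
      <dc:identifier>http://repository.example.org/1234/1/paper.pdf</dc:identifier>
    </oai_dc:dc>
  </metadata>
</record>"""


def full_text_hints(record_xml):
    """Collect hints that a record may point at a full-text file.

    These are heuristics only (an assumption of this sketch): a PDF link
    may be a cover sheet, a supplement, or a closed-access deposit.
    """
    root = ET.fromstring(record_xml)
    formats = [e.text.strip() for e in root.iterfind(".//dc:format", NS) if e.text]
    identifiers = [e.text.strip() for e in root.iterfind(".//dc:identifier", NS) if e.text]
    pdf_links = [u for u in identifiers if u.lower().endswith(".pdf")]
    return {
        "declares_pdf_format": "application/pdf" in formats,
        "pdf_links": pdf_links,
    }


hints = full_text_hints(SAMPLE_RECORD)
print(hints)
```

Even when both hints are present, a human check or a further request to the repository would still be needed to confirm what the file is, which is why this does not amount to a clean programmatic test.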
Therefore, in the event that we could identify life science full text articles in IRs, we would want to add the ones we don't already have to UKPMC, not just link to them. For those articles, there is a lack of clarity regarding licensing information. Establishing the license of a given article currently requires a manual process and therefore is not at all scalable or sustainable. The only way around this that I can envision is for licensing information to be represented formally in structured data, with the best enabling licenses for content exchange being CC-BY or CC0.
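As a rough sketch of what formally represented licensing information could enable: assuming (hypothetically) that a repository recorded the license as a Creative Commons URL in a rights field such as dc:rights, a harvester could decide mechanically whether an article is exchangeable. The function name classify_license and the example values are invented; free-text rights statements, which are what makes the current process manual, simply fall through as unknown.

```python
from urllib.parse import urlparse


def classify_license(rights_values):
    """Return "CC-BY", "CC0", or None for a list of rights statements.

    Assumption of this sketch: the license, when machine-readable at all,
    is given as a Creative Commons URL. Anything else is treated as
    unknown and would still need manual checking.
    """
    for value in rights_values:
        parsed = urlparse(value.strip())
        host = parsed.netloc.lower()
        if host.startswith("www."):
            host = host[4:]
        if host != "creativecommons.org":
            continue
        path = parsed.path.lower()
        # "/licenses/by/" matches CC-BY only, not by-nc, by-sa, etc.
        if path.startswith("/licenses/by/"):
            return "CC-BY"
        if path.startswith("/publicdomain/zero/"):
            return "CC0"
    return None


print(classify_license(["http://creativecommons.org/licenses/by/3.0/"]))
print(classify_license(["Copyright the authors. All rights reserved."]))
```

The second call returns None: without a structured license statement there is nothing for the code to act on.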
If we harvested full-text content into UKPMC that we do not have the right to harvest, we know from experience that it would be subject to take-down requests. Harvesting content, converting it to XML, and then being asked to remove it from the repository is not a strategy we wish to follow.
Content exchange to maximize usage in different contexts need not be a one-way process. Another option to consider is to encourage authors to deposit centrally (so we can do the things listed above) and then push content from UKPMC to populate IRs, for the purpose of institutional reporting, for example. We have an FTP site of OA articles: http://ukpmc.ac.uk/ftp/oa (there are over 400,000 OA articles there now) and will soon be releasing a web service that will retrieve metadata and full text (in the case of OA articles).
I'd also like to add that we are actively exploring how UKPMC can integrate with IRs, in particular with respect to related data resources via the EBI's partnership in the OpenAIRE Plus project. We will be continuing to collaborate to explore how IRs and UKPMC can interoperate better.
Jo McEntyre
On Apr 12, 2012, at 12:05 PM, Stevan Harnad wrote:
> On 2012-04-12, at 5:44 AM, Steve Hitchcock wrote:
>
>> Do we know why PubMed apparently does not link to papers in IRs?
>> Is this PubMed policy, or is there a technical reason?
>>
>> Stephen Curry: PubMed, the first port of call for anyone searching
>> the biomedical literature, frequently links to publisher’s site but
>> never to institutional repositories
>> http://occamstypewriter.org/scurry/2012/03/18/elsevier-the-research-works-act-and-open-access-where-to-now/
>
> PubMed & PubMed Central are wonderful resources, but not nearly
> as resourceful or wonderful as they easily could be.
>
> (1) PMC & UKPMC should of course be harvesting or linking
> institutional repository (IR) versions of papers, not just
> PMC/UKPMC-deposited and publisher-hosted papers.
>
> (2) Funders should be mandating IR deposit and PMC harvesting
> rather than direct PMC deposit. By thus making funder mandates
> and institutional mandates convergent and collaborative instead
> of divergent and competitive, this will motivate and facilitate adoption
> and compliance with institutional mandates: institutions are the universal
> providers of all research output, funded and unfunded.
>
> (3) IRs should mandate immediate deposit irrespective of publisher
> OA policy: If authors wish to honor publisher OA embargoes, they
> can set access to the deposit as Closed Access during the embargo
> and rely on providing almost-OA via the IR's email eprint request button.
>
> (4) Funder mandates should require deposit by the fundee -- the one
> bound by the mandate -- rather than by the publisher, who is not
> bound by the mandate, and indeed in conflict of interest with it.
> http://openaccess.eprints.org/index.php?/archives/876-.html
>
> (5) Publishers (partly to protect from rival publisher free-loading,
> partly to discourage funder mandates, and partly out of simple
> misunderstanding of network capability) are much more likely
> to endorse immediate institutional self-archiving than institution-external
> deposit. This is yet another reason funders should mandate institutional
> deposit and metadata harvesting instead of direct institution-external deposit.
>
> Stevan Harnad
>
>
> _______________________________________________
> GOAL mailing list
> GOAL at eprints.org
> http://mailman.ecs.soton.ac.uk/mailman/listinfo/goal