[EP-tech] Altmetric explorer harvest

David R Newman drn at ecs.soton.ac.uk
Wed Nov 24 23:57:27 GMT 2021


Hi Ranju,


It looks like your abstracts contain ASCII characters that XML does not 
allow, such as Vertical Tab (0x0B).  You could apply the changes in the 
following Git commit to fix the issue for the oai_dc format, which is 
the one Altmetric would use to import from your repository's OAI-PMH 
interface:


https://github.com/eprints/eprints3.4/commit/94b2b57bb13796b812f516f0f457b43dccd047c2
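The question quoted below (how to spot affected items among several thousand records) can be approached with a simple scan. The sketch below is not part of EPrints. It is a hypothetical Python helper that assumes you have already extracted each record's abstract (or any other field) as text keyed by eprint ID, for example from one of the repository's export formats. It flags every character outside the ranges that the XML 1.0 specification permits: tab, line feed, carriage return, U+0020 to U+D7FF, U+E000 to U+FFFD and U+10000 to U+10FFFF. The function names are made up for illustration.

```python
"""Hypothetical helper (not EPrints code): report characters XML 1.0 forbids."""

from typing import Dict, List, Tuple


def is_xml10_char(cp: int) -> bool:
    """True if code point cp matches the XML 1.0 Char production."""
    return (
        cp in (0x9, 0xA, 0xD)
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )


def find_invalid_xml_chars(text: str) -> List[Tuple[int, str]]:
    """Return (offset, 'U+XXXX') pairs for every illegal character in text."""
    return [
        (i, f"U+{ord(ch):04X}")
        for i, ch in enumerate(text)
        if not is_xml10_char(ord(ch))
    ]


def scan_records(records: Dict[str, str]) -> Dict[str, List[Tuple[int, str]]]:
    """Map each record ID to its problems, omitting records that are clean."""
    report = {}
    for record_id, text in records.items():
        problems = find_invalid_xml_chars(text)
        if problems:
            report[record_id] = problems
    return report


if __name__ == "__main__":
    # Made-up sample data: record "102" contains a Vertical Tab (0x0B).
    sample = {
        "101": "A clean abstract.",
        "102": "Line one\x0bline two",
    }
    for record_id, problems in scan_records(sample).items():
        print(record_id, problems)  # 102 [(8, 'U+000B')]
```

Reporting the offset and code point, rather than only a yes/no, makes it easier to find the offending spot when editing the record by hand.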


It also introduces a new EPrints::XML::remove_invalid_chars function to 
more generally tidy up ASCII characters that cannot be represented in 
XML, mainly control codes [1].  This does not fix the rdf and mets 
formats in OAI-PMH, but I don't think these are generally used.  If I 
get a chance, I may look into whether these can be fixed similarly.
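The general idea of such a tidy-up can be sketched as follows. This is a Python illustration under stated assumptions, not the Perl code in the commit: it simply drops every character outside the XML 1.0 Char ranges, and the name remove_invalid_xml_chars is hypothetical. The real EPrints function may treat some characters differently, so check the commit for its exact behaviour.

```python
import re

# Hypothetical Python analogue of the tidy-up idea; NOT the EPrints Perl code.
# The class lists what XML 1.0 permits; the leading ^ negates it, so the
# pattern matches every character that is not allowed.
_INVALID_XML10 = re.compile(
    "[^\t\n\r\u0020-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def remove_invalid_xml_chars(text: str) -> str:
    """Return text with every character that XML 1.0 forbids removed."""
    return _INVALID_XML10.sub("", text)


if __name__ == "__main__":
    # Vertical Tab (0x0B) is dropped; legitimate non-ASCII such as "É" is kept.
    print(repr(remove_invalid_xml_chars("Line one\x0bline two")))  # 'Line oneline two'
    print(repr(remove_invalid_xml_chars("\u00c9ire\x00")))  # 'Éire'
```

Dropping the character outright can join neighbouring words, as "oneline" shows; substituting a space instead is an equally small change if that suits your data better.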


Regards


David Newman

[1] https://en.wikipedia.org/wiki/C0_and_C1_control_codes


On 24/11/2021 15:00, Ranju Upadhyay via Eprints-tech wrote:
> Hi Team,
>
> Our University has purchased Altmetric Explorer
> (https://www.altmetric.com/explorer/login) and now they want to
> harvest metadata from our IR (institutional repository).
>
> But they seem to have come across some items (I think old ones) that 
> have characters that are corrupt or not encoded properly in some 
> fields (mostly the Abstract field), and that gives an invalid XML 
> error when they try to harvest. There are several thousand items in 
> our IR, so it is a bit difficult for me to check which items might 
> have that issue. Is there any script or something that I could run 
> against our items to spot those?
>
>
> Any help is appreciated.
>
> Best regards,
> Ranju
>
> *Ranju Upadhyay Rai*
>
> Library Programmer
>
> University Library
>
> Maynooth University, Maynooth, Co. Kildare, Ireland, W23 VP22
>
>
> T: +353 1 708 3378  M: +353 87 98 43811
>
> Ranju.Upadhyay at mu.ie  W: http://www.maynoothuniversity.ie/library
>
> *** Options: http://mailman.ecs.soton.ac.uk/mailman/listinfo/eprints-tech
> *** Archive: http://www.eprints.org/tech.php/
> *** EPrints community wiki: http://wiki.eprints.org/
