[EP-tech] Elements-EPrints Odd Characters stopping upload

David R Newman drn at ecs.soton.ac.uk
Wed Feb 17 13:06:10 GMT 2021


Hi James,

Yes, that does look like a bit of a complex issue.  The URL it is 
pointing to on EPrints seems to be its representation of the record 
through Symplectic's RT1 rt4eprints handler.  That URL looks OK once you 
decode it, but it is not the same URL as the file URL for EPrints, 
although I would expect it to return the same thing.  However, the 
rt4eprints handler may then parse the file parameter from the GET 
request in some interesting way.  Semi-colons are often used as a 
separator in various contexts, so it may think you have specified two 
files:

Induction with Thymoglobulin in High-Risk Renal Transplant Patients

Beauty and the Beast.pdf

What it does in this situation is uncertain without reviewing their 
code.  It may just try to retrieve one or other of the potential files, 
neither of which exists, so it fails.  I think you are probably right 
that RT2 will fix this, as there will no longer be an rt4eprints handler 
for the URL to refer to.
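
As a rough illustration of the guess above, here is a minimal sketch in 
Python (chosen only for illustration; the handler itself is Perl and I 
have not seen its code).  It takes the rfid value from the Elements link 
in your message, with the Outlook safelinks wrapper stripped, decodes it 
until it stops changing, and then splits the file name on ";" the way a 
naive parser might.  The names fully_decode and naive_split are 
hypothetical, not anything taken from rt4eprints.

```python
from urllib.parse import unquote

# rfid value from the Elements link (safelinks wrapper removed).
# At this level the file name is percent-encoded twice (%2520, %253B).
RFID = (
    "https%3A%2F%2Flivrepository.liverpool.ac.uk%2Frt4eprints%2Ffile%2F106106%2F"
    "Induction%2520with%2520Thymoglobulin%2520in%2520High-Risk%2520Renal%2520"
    "Transplant%2520Patients%253B%2520Beauty%2520and%2520the%2520Beast.pdf"
)


def fully_decode(value, max_rounds=5):
    """Percent-decode repeatedly until the value stops changing.

    Note: this would over-decode a file name containing a literal '%'.
    """
    for _ in range(max_rounds):
        decoded = unquote(value)
        if decoded == value:
            break
        value = decoded
    return value


def naive_split(filename):
    """ASSUMED handler behaviour: treat ';' as a separator between files."""
    return [part.strip() for part in filename.split(";")]


url = fully_decode(RFID)
filename = url.rsplit("/", 1)[-1]   # last path segment is the file name
pieces = naive_split(filename)

print(filename)
print(pieces)
```

If the handler did something along these lines, it would end up looking 
for the two names listed above rather than the single file that actually 
exists on the record.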

Regards

David Newman

On 17/02/2021 12:52, James Kerwin wrote:
> *CAUTION:* This e-mail originated outside the University of Southampton.
> Right it did not take long to refamiliarise myself. The following 
> EPrints record has one file with a semicolon in the file name (";" for 
> the avoidance of doubt). I can access the file on EPrints, but in 
> Elements via Chrome and Firefox I get quite similar errors.
>
> *EPrints Record:* https://livrepository.liverpool.ac.uk/3008387/
> *File link/url:* 
> https://livrepository.liverpool.ac.uk/3008387/1/Induction%20with%20Thymoglobulin%20in%20High-Risk%20Renal%20Transplant%20Patients%3B%20Beauty%20and%20the%20Beast.pdf
>
> In Elements when I go to the record and click the file icon I would 
> expect it to download or open in the browser, instead I get an error:
>
>         The web page at
>         https://elements.liverpool.ac.uk/repository.html?com=get-file&publication-id=220114&rfid=https%3A%2F%2Flivrepository.liverpool.ac.uk%2Frt4eprints%2Ffile%2F106106%2FInduction%2520with%2520Thymoglobulin%2520in%2520High-Risk%2520Renal%2520Transplant%2520Patients%253B%2520Beauty%2520and%2520the%2520Beast.pdf
>
> Going through this with another file on another record that has no ";" 
> in the filename it works as expected.
>
> Looking at the links from both EPrints and Elements there's some sort 
> of difference in the link to the file:
>
> https://livrepository.liverpool.ac.uk/3008387/1/Induction%20with%20Thymoglobulin%20in%20High-Risk%20Renal%20Transplant%20Patients%3B%20Beauty%20and%20the%20Beast.pdf
>
> https%3A%2F%2Flivrepository.liverpool.ac.uk%2Frt4eprints%2Ffile%2F106106%2FInduction%2520with%2520Thymoglobulin%2520in%2520High-Risk%2520Renal%2520Transplant%2520Patients%253B%2520Beauty%2520and%2520the%2520Beast.pdf
>
>
> This is an example of what we get for a working record:
>
> EPrints: https://livrepository.liverpool.ac.uk/3115606/
> EPrints File Link:
> https://livrepository.liverpool.ac.uk/3115606/1/Lee%20et%20al%20Erratum%202021pdf.pdf
> Elements File Link:
> https://elements.liverpool.ac.uk/repository.html?com=get-file&publication-id=486086&rfid=https%3A%2F%2Flivrepository.liverpool.ac.uk%2Frt4eprints%2Ffile%2F371659%2FLee%2520et%2520al%2520Erratum%25202021pdf.pdf
>
> The only consistent difference is the presence of these characters, 
> but I can't really understand where it's going wrong. I had wanted to 
> take a proper look at it and get in touch with Symplectic, but people 
> have the audacity to keep finding new problems and giving me more 
> work... We are due to upgrade to RT2 soonish (2nd half of this year 
> probably) and I am hoping all of this goes away with that.
>
> Thanks,
> James
>
> On Wed, Feb 17, 2021 at 12:18 PM James Kerwin <jkerwin2101 at gmail.com> wrote:
>
>     Hi David,
>
>     Thank you for your reply. Unfortunately I don't have access to the
>     Elements database(s) but I've explained this issue to our Elements
>     people and hopefully should get a response. Meanwhile, some time
>     ago Mr Salter gave me the means to extract the Elements xml and
>     transform it via the crosswalks outside of EPrints, so I may do
>     that with the different records and see what we get. Doing this
>     has only just occurred to me, so I'll give it a go.
>
>     On the subject of the character in question... The error code
>     comes from (I think!):
>
>     eprints3/perl_lib/URI/Escape.pm
>
>     Specifically here in the _fail_hi sub:
>
>                 "sub uri_escape {
>                     my($text, $patn) = @_;
>                     return undef unless defined $text;
>                     if (defined $patn){
>                         unless (exists  $subst{$patn}) {
>                             # Because we can't compile the regex we fake it with a cached sub
>                             (my $tmp = $patn) =~ s,/,\\/,g;
>                             eval "\$subst{\$patn} = sub {\$_[0] =~ s/([$tmp])/\$escapes{\$1} || _fail_hi(\$1)/ge; }";
>                             Carp::croak("uri_escape: $@") if $@;
>                         }
>                         &{$subst{$patn}}($text);
>                     } else {
>                         $text =~ s/($Unsafe{RFC3986})/$escapes{$1} || _fail_hi($1)/ge;
>                     }
>                     $text;
>                 }
>
>                 sub _fail_hi {
>                     my $chr = shift;
>                     Carp::croak(sprintf "Can't escape \\x{%04X}, try uri_escape_utf8() instead", ord($chr));"
>
>     The FULL error log line says:
>
>                 Can't escape \\x{2019}, try uri_escape_utf8() instead
>                 at /opt/eprints3/perl_lib/URI/Escape.pm line
>                 178.\n\tURI::Escape::_fail_hi('\xe2\x80\x99') called
>                 at /opt/eprints3/perl_lib/URI/Escape.pm line
>                 171\n\tURI::Escape::uri_escape('Published by the
>                 American Physical Society under the terms of...')
>                 called at (eval 177) line
>                 82\n\tEPrints::Config::uolrepo::__ANON__('dataset',
>                 'EPrints::DataSet=HASH(0x7f21238f9358)', 'repository',
>                 'Symplectic::Wrappers::EPrintsSession=HASH(0x7f2124610710)',
>                 'dataobj',
>                 'EPrints::DataObj::EPrint=HASH(0x7f21285879b0)',
>                 'changed', 'HASH(0x7f212d684f18)') called at
>                 /opt/eprints3/perl_lib/EPrints/DataSet.pm line
>                 1517\n\tEPrints::DataSet::run_trigger('EPrints::DataSet=HASH(0x7f21238f9358)',
>                 105, 'dataobj',
>                 'EPrints::DataObj::EPrint=HASH(0x7f21285879b0)',
>                 'changed', 'HASH(0x7f212d684f18)') called at
>                 /opt/eprints3/perl_lib/EPrints/DataObj.pm line
>                 669\n\tEPrints::DataObj::commit('EPrints::DataObj::EPrint=HASH(0x7f21285879b0)',
>                 undef) called at
>                 /opt/eprints3/perl_lib/EPrints/DataObj/EPrint.pm line
>                 1011\n\tEPrints::DataObj::EPrint::commit('EPrints::DataObj::EPrint=HASH(0x7f21285879b0)')
>                 called at
>                 /opt/eprints3/perl_lib/Symplectic/RepoProcess/MetadataManager.pm
>                 line
>                 355\n\tSymplectic::RepoProcess::MetadataManager::add_preferred_bibliographic('Symplectic::RepoProcess::MetadataManager=HASH(0x7f2123858468)',
>                 'eprint',
>                 'EPrints::DataObj::EPrint=HASH(0x7f21285879b0)',
>                 'raw_record',
>                 'XML::LibXML::Document=SCALAR(0x7f212858bb60)',
>                 'types', 'ARRAY(0x7f21254315a0)', 'limit_to',
>                 'ARRAY(0x7f21215fceb8)', ...) called at
>                 /opt/eprints3/perl_lib/Symplectic/RepoProcess/MetadataManager.pm
>                 line
>                 240\n\tSymplectic::RepoProcess::MetadataManager::add_bibliographic('Symplectic::RepoProcess::MetadataManager=HASH(0x7f2123858468)',
>                 'eprint',
>                 'EPrints::DataObj::EPrint=HASH(0x7f21285879b0)',
>                 'publication',
>                 'Symplectic::PubsModel::Publication=HASH(0x7f212d6b7fe8)')
>                 called at
>                 /opt/eprints3/perl_lib/Symplectic/RepoProcess/IngestWorkflow.pm
>                 line
>                 203\n\tSymplectic::RepoProcess::IngestWorkflow::update_metadata('Symplectic::RepoProcess::IngestWorkflow=HASH(0x7f212858f348)',
>                 'eprint',
>                 'EPrints::DataObj::EPrint=HASH(0x7f21285879b0)',
>                 'publication',
>                 'Symplectic::PubsModel::Publication=HASH(0x7f212d6b7fe8)',
>                 'auth_details',
>                 'Symplectic::PubsModel::AuthDetails=HASH(0x7f212d785c38)',
>                 'record',
>                 'Symplectic::RepoModel::PublicationsRecord=HASH(0x7f212c73f510)',
>                 ...) called at
>                 /opt/eprints3/perl_lib/Symplectic/RepoProcess/PublicationManager.pm
>                 line
>                 65\n\tSymplectic::RepoProcess::PublicationManager::get_deposit_representation('Symplectic::RepoProcess::PublicationManager=HASH(0x7f212d7ac290)',
>                 'publication',
>                 'Symplectic::PubsModel::Publication=HASH(0x7f212d6b7fe8)',
>                 'auth_details',
>                 'Symplectic::PubsModel::AuthDetails=HASH(0x7f212d785c38)')
>                 called at
>                 /opt/eprints3/perl_lib/Symplectic/Process/FileDepositProcessor.pm
>                 line
>                 148\n\tSymplectic::Process::FileDepositProcessor::handle('Symplectic::Process::FileDepositProcessor=HASH(0x7f212d6d73b0)',
>                 'pid', 485375, 'auth_details',
>                 'Symplectic::PubsModel::AuthDetails=HASH(0x7f212d785c38)',
>                 'deposit_props',
>                 'Symplectic::PubsModel::DepositProperties=HASH(0x7f212e8a0440)',
>                 'atom', 'CGI::File::Temp=GLOB(0x7f212d7fae08)', ...)
>                 called at
>                 /opt/eprints3/perl_lib/Symplectic/Handlers/RepositoryHandler.pm
>                 line
>                 235\n\tSymplectic::Handlers::RepositoryHandler::post_handler('session',
>                 'Symplectic::Wrappers::EPrintsSession=HASH(0x7f2124610710)',
>                 'request',
>                 'Apache2::RequestRec=SCALAR(0x7f212e8a77a8)',
>                 'auth_details',
>                 'Symplectic::PubsModel::AuthDetails=HASH(0x7f212d785c38)')
>                 called at
>                 /opt/eprints3/perl_lib/Symplectic/Handlers/RepositoryHandler.pm
>                 line
>                 109\n\tSymplectic::Handlers::RepositoryHandler::handler_multi('Apache2::RequestRec=SCALAR(0x7f212e8a77a8)',
>                 undef) called at
>                 /opt/eprints3/perl_lib/Symplectic/Apache/Rewrite.pm
>                 line
>                 98\n\tSymplectic::Apache::Rewrite::__ANON__('Apache2::RequestRec=SCALAR(0x7f212e8a77a8)')
>                 called at -e line 0\n\teval {...} called at -e line 0\n
>
>
>     I'm making some big assumptions, but I THINK the "\\x{%04X}" is
>     saying "take 4 characters from the result of ord($chr) and put
>     them here". I'm possibly very wrong. I think any solution for this
>     needs to belong in the Symplectic code on the repo server. I don't
>     fancy altering core EPrints code for the sake of this. I'll be in
>     a whole world of hell before I know it. Yesterday when tracing
>     this I ended up at:
>
>     eprints3/symplectic/perl_lib/Symplectic/RepoProcess/MetadataManager.pm
>
>     Reading through the code it appears to identify the preferred
>     record and start processing it. Perhaps this is a good opportunity
>     to intervene and either swap bad characters for good ones or
>     encode/decode "properly" (as if I know what I'm talking about).
>     Complicated slightly by not being able to thoroughly test it. I
>     suppose another option would be to see what XSLT etc. can do with
>     regard to this and so catch the problem within the crosswalks.
>
>     If we verify the manual record in Elements it gets a higher
>     precedence than the Scopus record and so the problem disappears.
>
>     Regarding the other problem with the file link I will need to
>     refamiliarise myself with it and I'll reply later. Plus this email
>     is already wordy enough as it is!
>
>     Thanks,
>     James
>
>
>
>     On Wed, Feb 17, 2021 at 10:31 AM David R Newman
>     <drn at ecs.soton.ac.uk> wrote:
>
>         Hi James,
>
>         I think you would need to look at this field in the Elements
>         record in its database to see how it is stored differently
>         when it is imported compared with when it is entered
>         manually.  As you said, I think the problem is in part that
>         text box entries get parsed and encoded before going into the
>         database but imports do not (or at the very least the process
>         between input and output to the Elements database is
>         different).  It would be useful to know how they differ in
>         the Elements database, as that may help make EPrints more
>         resilient to unexpected encodings in future.
>
>         However "\\x{2019}" is, I think, just how the error message
>         prints the code point of the offending character, U+2019 (the
>         right single quotation mark), rather than an invalid escape.
>         uri_escape() only has escapes for characters up to \xFF, so
>         anything above that fails in this way.  U+2019 in UTF-8 would
>         be the bytes \xE2\x80\x99, which is the form
>         uri_escape_utf8() would escape.
>
>         It would be interesting to get a bit more information about
>         your other issue with regular quote marks and semi-colons,
>         which are part of the standard ASCII set rather than extended
>         characters.  These really should not be causing a problem.
>
>         Regards
>
>         David Newman
>
>         On 17/02/2021 09:44, James Kerwin via Eprints-tech wrote:
>>         Hi All,
>>
>>         This is an Elements/EPrints question. Apologies that it isn't
>>         purely EPrints, but this is probably the best place to get an
>>         answer. I want to know if others experience this or whether
>>         it's some oddity of our setup.
>>
>>         We are using RT1 (for now) and EPrints 3.3.14 (also for now
>>         until upgrade). Occasionally we get an Elements record that
>>         is from Scopus, PubMed etc. that has an odd character in it
>>         that prevents upload. When I look in the Apache logs it tells
>>         me the problem. Yesterday's one was the presence of:
>>
>>          "Unicode Character “’” (U+2019)"
>>
>>         Which showed in the logs as:
>>
>>         "Can't escape \\x{2019}, try uri_escape_utf8() instead at
>>         /opt/eprints3/perl_lib/URI/Escape.pm"
>>
>>         Importantly if I copy the problem characters to the manual
>>         elements record it doesn't pose a problem. There appears some
>>         processing to properly encode characters entered via text
>>         box, but not characters dragged in from other sources into
>>         Elements.
>>
>>         I've also had the issue with files containing "'" or ";"
>>         etc. not being accessible via Elements (a very similar, but
>>         different problem).
>>
>>         I found where I COULD fix the former issue, but it involves
>>         changing EPrints code when I SHOULD be altering the
>>         Symplectic connector code on the repo server.
>>
>>         Anyway, I'm not specifically looking for a solution, but has
>>         anybody else experienced anything similar? If so, does it
>>         stop with RT2? I hope to raise a ticket with Symplectic over
>>         this eventually.
>>
>>         Thanks,
>>         James