[EP-tech] Antwort: Re: validation on upload field

John Salter J.Salter at leeds.ac.uk
Thu Nov 30 15:43:55 GMT 2017

In GitHub, there is this:
which works alongside an addition to System.pm:

That might be useful to know about?


From: eprints-tech-bounces at ecs.soton.ac.uk [mailto:eprints-tech-bounces at ecs.soton.ac.uk] On Behalf Of martin.braendle at id.uzh.ch
Sent: 30 November 2017 13:56
To: eprints-tech at ecs.soton.ac.uk
Subject: [EP-tech] Antwort: Re: validation on upload field

Hi Alfredo,

Another way, instead of validating, is to transliterate the filenames to ASCII. We extended the sanitise subroutine in perl_lib/EPrints/System.pm like this:

Index: System.pm
--- System.pm (revision 1405)
+++ System.pm (revision 1406)
@@ -25,6 +25,7 @@
 use strict;
 use File::Copy;
 use Digest::MD5;
+use Text::Unidecode;

 =item $sys = EPrints::System->new();

@@ -540,6 +541,10 @@
  $filepath = Encode::decode_utf8( $filepath )
  if !utf8::is_utf8( $filepath );

+ # UZH CHANGE ZORA-542 2016/12/21/mb
+ $filepath = unidecode( $filepath );
+ $filepath =~ s![\x20]!_!g;
  # control characters + Win32 restricted
  $filepath =~ s![\x00-\x0f\x7f<>:"\\|?*]!_!g;
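
To see what these added lines do to a filename, here is a minimal stand-alone sketch. It is not EPrints code: sanitise_sketch and to_ascii are hypothetical names, and the fallback for systems without Text::Unidecode is an assumption added only so that the script runs with core modules.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use utf8;                          # the sample names below contain non-ASCII characters
use Unicode::Normalize qw(NFKD);

# Prefer Text::Unidecode, which is what the patch uses. The core-only
# fallback is an assumption of this sketch and is cruder: characters
# without a decomposition (e.g. "ß") become "_" instead of "ss".
my $have_unidecode = eval { require Text::Unidecode; 1 };

sub to_ascii {
    my ($s) = @_;
    if ($have_unidecode) {
        my $out = Text::Unidecode::unidecode($s);
        return $out;
    }
    $s = NFKD($s);
    $s =~ s/\p{Mn}//g;             # drop combining marks left by decomposition
    $s =~ s/[^\x00-\x7f]/_/g;      # anything still non-ASCII becomes "_"
    return $s;
}

# Hypothetical stand-alone version of the patched steps in sanitise()
sub sanitise_sketch {
    my ($filepath) = @_;
    $filepath = to_ascii($filepath);                   # transliterate to ASCII
    $filepath =~ s![\x20]!_!g;                         # blanks -> underscore
    $filepath =~ s![\x00-\x0f\x7f<>:"\\|?*]!_!g;       # control chars + Win32 restricted
    return $filepath;
}

binmode STDOUT, ':encoding(UTF-8)';
for my $name ("Brändle Übung.pdf", "plain_name.txt") {
    printf "%s -> %s\n", $name, sanitise_sketch($name);
}
```

With either code path, "Brändle Übung.pdf" should come out as "Brandle_Ubung.pdf", and a name that is already plain ASCII without blanks passes through unchanged.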

Best regards,


Dr. Martin Brändle
Zentrale Informatik
Universität Zürich
Stampfenbachstr. 73
CH-8006 Zürich

mail: martin.braendle at id.uzh.ch
phone: +41 44 63 56705
fax: +41 44 63 54505


From: th.lauke at arcor.de
To: eprints-tech at ecs.soton.ac.uk
Date: 30.11.2017 13:52
Subject: Re: [EP-tech] validation on upload field
Sent by: eprints-tech-bounces at ecs.soton.ac.uk


Hi Alfredo,

We solved a similar feature request with a document_validate.pl that is either repository-specific (i.e. in Eprints/archives/repoID/cfg/cfg.d/) or server-specific (i.e. in Eprints/site_lib/cfg.d/):

$c->{validate_document} = sub
{
       my( $document, $repository, $for_archive ) = @_;

       my @problems = ();

       my $xml = $repository->xml();

       # default checks
# :
       # site-specific checks

       # check for a proper filename, i.e. one accepted by the Tivoli backup, which ingests only ASCII filenames without blanks
       # print STDERR "main: ", $document->value( "main" )," escaped: ",URI::Escape::uri_escape_utf8($document->value( "main" ), "^A-Za-z0-9\-\._~\/");
       my $doc_name_uri = URI::Escape::uri_escape_utf8($document->value( "main" ), "^A-Za-z0-9\-\._~\/");
       if( $document->value( "main" ) ne $doc_name_uri )
       {
               my $fieldname = $repository->make_element( "span", class=>"ep_problem_field:documents" );
               $fieldname->appendChild( $document->dataset->render_name( $repository ) );

               # problem message: explanation, escaped name, then the original name
               my $prob = $repository->make_doc_fragment;
               $prob->appendChild( $repository->html_phrase( "validate:bad_filename", fieldname=>$fieldname ) );
               $prob->appendChild( $repository->make_text( $doc_name_uri ) );

               $prob->appendChild( $repository->html_phrase( "validate:original_filename") );
               $prob->appendChild( $repository->make_text( $document->value( "main" ) ) );

               push @problems, $prob;
       }

       return( @problems );
};

After defining the two new phrases
<epp:phrase id="validate:bad_filename">Please replace non-ASCII characters (e.g. 'äöü') or blanks in the name of the uploaded <epc:pin name="fieldname" /> to simplify future handling!<br/>The following filename was prepared for the repository:<br/></epp:phrase>
<epp:phrase id="validate:original_filename"><br/>It differs from the original one :(<br/></epp:phrase>
in an appropriate .../lang/en/phrases/... file, it should work :-)
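
The filename test in the validator above boils down to "escaping changes the name if and only if it contains a character outside the allowed set". A minimal stand-alone sketch of that comparison follows. It deliberately avoids URI::Escape so that it runs with core modules only: escape_name is a hypothetical hand-rolled stand-in that I assume produces the same output as uri_escape_utf8 for this character set, and filename_ok is likewise a name made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use utf8;                          # the sample names below contain non-ASCII characters
use Encode qw(encode_utf8);

# Everything outside the validator's allowed set A-Z a-z 0-9 - . _ ~ /
my $UNSAFE = qr{[^A-Za-z0-9\-._~/]};

# Hypothetical stand-in for URI::Escape::uri_escape_utf8 with that set:
# encode to UTF-8 bytes, then percent-escape every unsafe byte.
sub escape_name {
    my ($name) = @_;
    my $bytes = encode_utf8($name);
    $bytes =~ s/($UNSAFE)/sprintf('%%%02X', ord($1))/ge;
    return $bytes;
}

# A name is acceptable when escaping leaves it unchanged.
sub filename_ok {
    my ($name) = @_;
    return escape_name($name) eq $name ? 1 : 0;
}

binmode STDOUT, ':encoding(UTF-8)';
for my $name ("thesis_v2.pdf", "my thesis.pdf", "Brändle Übung.pdf") {
    printf "%-20s %-8s %s\n", $name,
        ( filename_ok($name) ? "ok" : "rejected" ), escape_name($name);
}
```

A name containing only allowed characters is reported as ok; a blank or an umlaut makes the escaped form differ from the original, which is the condition under which the validator pushes a problem.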


