<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=Windows-1252">
<style type="text/css" style="display:none;"> P {margin-top:0;margin-bottom:0;} </style>
</head>
<body dir="ltr">
<div style="padding-bottom: 10px; padding-top: 5px;">
<div style="padding:12px; border:1px solid #8D3970; background-color:#F7F9FA; color:#8D3970; font-size:14px; line-height:22px; font-family: Calibri, Arial, Helvetica, sans-serif;">
<strong>CAUTION:</strong> This e-mail originated outside the University of Southampton.
</div>
</div>
<div>
<div style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
Hi everyone,</div>
<div style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
<br>
</div>
<div style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
This might be useful for others. I solved the issue with a couple of regex substitutions:</div>
<div style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
</div>
<div>&nbsp; &nbsp; $filename =~ s/\x27/=0027/g;<br>
</div>
<div>&nbsp; &nbsp; $filename =~ s/\x22/=0022/g;</div>
<div style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
to replace the single quote and double quote in what is returned by this function:</div>
<div style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
&nbsp; &nbsp; &nbsp;file-&gt;get_value(&quot;filename&quot;)</div>
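<div style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
As a minimal sketch, the two substitutions could be wrapped in a small helper like the one below. The name escape_quotes_for_disk is hypothetical and not part of EPrints, and it only covers the two characters mentioned here, so other characters may need the same treatment.</div>
<pre>
use strict;
use warnings;

# Hypothetical helper (not part of EPrints): maps the value returned by
# get_value("filename") to the =XXXX form observed on disk, for the two
# characters discussed in this thread only.
sub escape_quotes_for_disk
{
    my( $filename ) = @_;

    $filename =~ s/\x27/=0027/g;  # apostrophe (single quote)
    $filename =~ s/\x22/=0022/g;  # double quote

    return $filename;
}

# Example call using the filename from the original report.
print escape_quotes_for_disk( "Services_techniques_a_l'Universite_Concordia.pdf" ), "\n";
</pre>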
<div style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
From a digital preservation perspective, I think it is significant to note that &quot;filename&quot; in this object:</div>
<div style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
&nbsp; &nbsp;&nbsp;<a href="https://wiki.eprints.org/w/File_Object" id="LPNoLPOWALinkPreview" style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt;">https://wiki.eprints.org/w/File_Object</a></div>
<div style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
does not necessarily refer to the &quot;filename&quot; on disk.</div>
<div style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
<br>
</div>
<div style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
Is there a function or property in EPrints objects that returns the filename exactly as it is on the filesystem?</div>
<div style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
<br>
</div>
<div style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
<br>
</div>
<div>
<div style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
Tomasz</div>
</div>
<div style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
<br>
</div>
<hr tabindex="-1" style="display:inline-block; width:98%">
<div id="divRplyFwdMsg" dir="ltr"><font face="Calibri, sans-serif" color="#000000" style="font-size:11pt"><b>From:</b> David R Newman &lt;drn@ecs.soton.ac.uk&gt;<br>
<b>Sent:</b> Sunday, February 20, 2022 4:28 PM<br>
<b>To:</b> eprints-tech@ecs.soton.ac.uk &lt;eprints-tech@ecs.soton.ac.uk&gt;; Tomasz Neugebauer &lt;Tomasz.Neugebauer@concordia.ca&gt;<br>
<b>Subject:</b> Re: [EP-tech] apostrophe in file names of uploaded/deposited file</font>
<div>&nbsp;</div>
</div>
<div>
<p><font size="4">Hi Tomasz,</font></p>
<p><font size="4">There are two ways to work around this issue.&nbsp; One has been in EPrints for quite a while; the other I introduced in 3.4.3 to help deal with this issue retrospectively.</font></p>
<p><font size="4">1. <a class="x_moz-txt-link-freetext" href="https://wiki.eprints.org/w/Optional_filename_sanitise.pl">https://wiki.eprints.org/w/Optional_filename_sanitise.pl</a> allows you to set characters that should be removed before a filename is recorded in the database or saved to disk.&nbsp; I have to admit I did not know about this until fairly recently, so I have not tested how well it works or whether it solves your problem.&nbsp; If you look at /opt/eprints3/lib/cfg.d/optional_filename_sanitise.pl there is a function that can be added under $c-&gt;{optional_filename_sanitise}.&nbsp; The default (albeit commented out) function converts white space, brackets and @ signs into underscores.&nbsp; You could add a line like the one below to deal with apostrophes.</font></p>
<p><font size="4">$filepath =~ s!\x27!_!g;</font></p>
<p><font size="4">2. The new functionality I added for 3.4.3 allows files on disk to be found under the filename &lt;fileid&gt;.bin.&nbsp; This lets you fix this sort of issue by renaming the file on disk to &lt;fileid&gt;.bin.&nbsp; You can also enable it so that future files are automatically saved in the format &lt;fileid&gt;.bin by setting:</font></p>
<p><font size="4">$c-&gt;{generic_filenames} = 1;</font></p>
<p><font size="4">I would probably advise against doing this on a live repository, especially if you have unusual uploads such as uploading multiple files at once through &quot;Upload from URL&quot;.&nbsp; If you want to test this on a development repo, then please do, as any real-world-ish feedback on this feature would be useful.</font></p>
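<p><font size="4">For option 1, an untested sketch of such a function might look like the one below.&nbsp; It assumes the callback receives the repository object and the file path and returns the sanitised path, so please check that against the commented-out default before relying on it.&nbsp; The "my $c = {};" line and the final print are only there so the snippet runs on its own outside EPrints; in a real cfg.d file $c is already provided.</font></p>
<pre>
use strict;
use warnings;

# Stand-in for the repository configuration hash; in a real cfg.d file
# EPrints provides $c itself, so this line would be dropped.
my $c = {};

# Assumption: the callback is passed the repository object and the file
# path, and returns the sanitised path.
$c->{optional_filename_sanitise} = sub
{
    my( $repo, $filepath ) = @_;

    $filepath =~ s!\s!_!g;        # white space
    $filepath =~ s![()\[\]]!_!g;  # round and square brackets
    $filepath =~ s!\@!_!g;        # @ signs
    $filepath =~ s!\x27!_!g;      # apostrophes, as suggested in option 1

    return $filepath;
};

# Stand-alone check with a made-up filename; no repository object is needed here.
print $c->{optional_filename_sanitise}->( undef, "l'Universite (v2)\@home.pdf" ), "\n";
</pre>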
<p><font size="4">Regards</font></p>
<p><font size="4">David Newman<br>
</font></p>
<div class="x_moz-cite-prefix">On 20/02/2022 20:32, Tomasz Neugebauer via Eprints-tech wrote:<br>
</div>
<blockquote type="cite">
<div>
<div class="x_WordSection1">
<p class="x_MsoNormal" style="margin: 0cm 0cm 0.0001pt; font-size: 11pt; font-family: Tahoma, sans-serif;">
Good afternoon!</p>
<p class="x_MsoNormal" style="margin: 0cm 0cm 0.0001pt; font-size: 11pt; font-family: Tahoma, sans-serif;">
&nbsp;</p>
<p class="x_MsoNormal" style="margin: 0cm 0cm 0.0001pt; font-size: 11pt; font-family: Tahoma, sans-serif;">
I’m trying to troubleshoot an issue with exporting a deposited file that has an apostrophe in the filename.</p>
<p class="x_MsoNormal" style="margin: 0cm 0cm 0.0001pt; font-size: 11pt; font-family: Tahoma, sans-serif;">
This is the issue: <a href="https://github.com/eprintsug/EPrintsArchivematica/issues/40">https://github.com/eprintsug/EPrintsArchivematica/issues/40</a></p>
<p class="x_MsoNormal" style="margin: 0cm 0cm 0.0001pt; font-size: 11pt; font-family: Tahoma, sans-serif;">
&nbsp;</p>
<p class="x_MsoNormal" style="margin: 0cm 0cm 0.0001pt; font-size: 11pt; font-family: Tahoma, sans-serif;">
Does EPrints replace apostrophes in filenames on disk with <span style="">=0027?</span></p>
<p class="x_MsoNormal" style="margin: 0cm 0cm 0.0001pt; font-size: 11pt; font-family: Tahoma, sans-serif;">
<span style="">Where in the code does that happen?</span></p>
<p class="x_MsoNormal" style="margin: 0cm 0cm 0.0001pt; font-size: 11pt; font-family: Tahoma, sans-serif;">
<span style="">The URL of the file has the apostrophe, for example:</span></p>
<p class="x_MsoNormal" style="margin: 0cm 0cm 0.0001pt; font-size: 11pt; font-family: Tahoma, sans-serif;">
<span style=""><a href="https://spectrum.library.concordia.ca/id/eprint/7066/1/Services_techniques_a_l'Universite_Concordia.pdf">https://spectrum.library.concordia.ca/id/eprint/7066/1/Services_techniques_a_l'Universite_Concordia.pdf</a></span></p>
<p class="x_MsoNormal" style="margin: 0cm 0cm 0.0001pt; font-size: 11pt; font-family: Tahoma, sans-serif;">
But unlike other Unicode characters, the apostrophe doesn’t make it into the file name on disk, and is substituted with =0027.</p>
<p class="x_MsoNormal" style="margin: 0cm 0cm 0.0001pt; font-size: 11pt; font-family: Tahoma, sans-serif;">
I’m looking for confirmation that this is how it is “supposed” to work, and for an understanding of where this happens in the code, so that I can ultimately work out how many OTHER characters are replaced in this way in the filename.</p>
<p class="x_MsoNormal" style="margin: 0cm 0cm 0.0001pt; font-size: 11pt; font-family: Tahoma, sans-serif;">
&nbsp;</p>
<p class="x_MsoNormal" style="margin: 0cm 0cm 0.0001pt; font-size: 11pt; font-family: Tahoma, sans-serif;">
Tomasz</p>
<p class="x_MsoNormal" style="margin: 0cm 0cm 0.0001pt; font-size: 11pt; font-family: Tahoma, sans-serif;">
&nbsp;</p>
</div>
</div>
<br>
</blockquote>
</div>
</div>
</body>
</html>