[EP-tech] Indexing based on case sensitive file extension check?

Rory McNicholl rory.mcnicholl at london.ac.uk
Fri Feb 7 14:54:17 GMT 2014


Hello,

Bernard from IOE noticed that if he uploaded a PDF with an uppercase
extension (i.e. .PDF) it was never indexed. If he replaced it with the
same file with a lowercase extension, it got indexed.

I managed to find the cause in
perl_lib/EPrints/Plugin/Convert/PlainText.pm

In the *can_convert* (ln 70) and *export* (ln 118) subs, there are
regexes that check the file extension before continuing. These expect
lowercase file extensions, so no indexcodes are extracted from .PDFs
or .DOCs or .HTMLs etc.
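
For anyone hitting the same thing, here is a minimal sketch of the
problem and the fix, written in Python rather than the plugin's Perl.
The pattern and the can_convert helper below are illustrations only,
not the actual code in PlainText.pm. In Perl the equivalent change
would be adding the /i modifier to the regex.

```python
import re

# Hypothetical pattern, not copied from PlainText.pm: it only accepts
# lowercase extensions, which is the behaviour described above.
CASE_SENSITIVE = re.compile(r"\.(pdf|doc|html?)$")

# Same pattern with the IGNORECASE flag (Perl's counterpart is /i).
CASE_INSENSITIVE = re.compile(r"\.(pdf|doc|html?)$", re.IGNORECASE)


def can_convert(filename, pattern):
    """Return True if the filename's extension matches the given pattern."""
    return pattern.search(filename) is not None


# Compare both patterns on lowercase, uppercase and mixed-case names.
for name in ("thesis.pdf", "thesis.PDF", "notes.DOC", "index.Html"):
    print(name, can_convert(name, CASE_SENSITIVE), can_convert(name, CASE_INSENSITIVE))
```

With the case-sensitive pattern only thesis.pdf passes the check, so
the other files would be skipped before any text extraction happens.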

Easy to fix, once found, but took me ages.

Looking in GitHub I can't see where any regression might have occurred,
so I'm wondering if it was ever thus?

Cheers,

Rory

-- 
Rory McNicholl
Lead developer, Research Repositories Team
Academic Research Technologies
University of London Computer Centre
Senate House
Malet Street
London
WC1E 7HU

t: +44 (0)20 7863 1344
e: r.mcnicholl at ulcc.ac.uk
w: http://www.ulcc.ac.uk/
b: http://dablog.ulcc.ac.uk/







