[EP-tech] digital preservation - indexing errors
Tomasz.Neugebauer at concordia.ca
Thu Jul 30 16:42:55 BST 2015
We have just completed an upgrade of our repository, which included re-indexing all of its contents.
It was a good opportunity to take note of the errors that came up during indexing.
Here is a list of the common errors that occurred during indexing:
1. Error: Illegal entry in bfrange block in ToUnicode CMap
2. Error: Invalid Font Weight
3. Error (##): Illegal character <##> in hex string
4. Error: Can't create transform
5. Error: Couldn't link the profiles
There were also some warnings and errors from doc2txt like these:
Use of uninitialized value $data in substr at /opt/eprints3/tools/../perl_lib/Text/Extract/Word.pm line 68.
Use of uninitialized value $magic in numeric eq (==) at /opt/eprints3/tools/../perl_lib/Text/Extract/Word.pm line 69.
Use of uninitialized value $magic in sprintf at /opt/eprints3/tools/../perl_lib/Text/Extract/Word.pm line 69.
This does not seem to be a Word document, but it is pretending to be one: 0 at /opt/eprints3/tools/doc2txt line 68
Error 255 from doc2txt command: [...]
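The doc2txt failure above, together with the uninitialized `$data` and `$magic` warnings, suggests that Text::Extract::Word could not read any Word data from the file. One way to triage such files before indexing is to look at their first bytes. The sketch below is a hypothetical pre-check, not part of EPrints or doc2txt, and the function names are made up for illustration. It assumes that genuine legacy .doc files start with the OLE2 compound-file signature, and it flags two common impostors (ZIP-based .docx files and RTF files) that are sometimes saved with a .doc extension.

```python
# Signature at the start of every OLE2 compound file (legacy .doc, .xls, .ppt).
OLE2_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")
ZIP_MAGIC = b"PK\x03\x04"  # .docx and other OOXML files are ZIP archives
RTF_MAGIC = b"{\\rtf"      # RTF files are often saved with a .doc extension


def classify_header(head: bytes) -> str:
    """Guess what a file claiming to be a Word document really is."""
    if not head:
        return "empty"
    if head.startswith(OLE2_MAGIC):
        return "ole2"  # plausible legacy Word file, but not guaranteed
    if head.startswith(ZIP_MAGIC):
        return "zip"   # probably a .docx mislabelled as .doc
    if head.startswith(RTF_MAGIC):
        return "rtf"
    return "unknown"


def sniff_doc(path: str) -> str:
    """Hypothetical helper: classify a file on disk by its first 8 bytes."""
    with open(path, "rb") as handle:
        return classify_header(handle.read(8))
```

An "ole2" result only means the container looks right: Excel and PowerPoint files share the same signature, so a file can pass this check and still fail in doc2txt.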
Errors #1 and #3 appear to be the most common.
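To check which errors dominate across a full re-index, a minimal sketch like the following could tally the messages in a captured log. It assumes the indexer's output was saved to a plain-text file with one message per line; the pattern list mirrors the numbered errors above, and `tally_errors` and `tally_log_file` are hypothetical names, not EPrints functions.

```python
import re
from collections import Counter
from typing import Iterable

# One pattern per error in the numbered list. The "##" placeholders in those
# messages stand for varying values, so they are matched loosely here.
ERROR_PATTERNS = {
    1: re.compile(r"Illegal entry in bfrange block in ToUnicode CMap"),
    2: re.compile(r"Invalid Font Weight"),
    3: re.compile(r"Illegal character <[^>]*> in hex string"),
    4: re.compile(r"Can't create transform"),
    5: re.compile(r"Couldn't link the profiles"),
}
# Failures reported by the doc2txt wrapper, e.g. "Error 255 from doc2txt command".
DOC2TXT_PATTERN = re.compile(r"Error \d+ from doc2txt command")


def tally_errors(lines: Iterable[str]) -> Counter:
    """Count log lines per known error type; unrecognised lines are ignored."""
    counts: Counter = Counter()
    for line in lines:
        for number, pattern in ERROR_PATTERNS.items():
            if pattern.search(line):
                counts[number] += 1
                break  # count each line under at most one error type
        else:
            # Only reached when none of the numbered patterns matched.
            if DOC2TXT_PATTERN.search(line):
                counts["doc2txt"] += 1
    return counts


def tally_log_file(path: str) -> Counter:
    """Hypothetical helper: tally a captured indexing log on disk."""
    with open(path, encoding="utf-8", errors="replace") as log:
        return tally_errors(log)
```

Calling `tally_log_file("reindex.log").most_common()` on such a log would list the error types from most to least frequent, which makes it easy to confirm which ones dominate.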
Have you encountered these types of indexing errors?
How serious are they in terms of digital preservation?
Do you use any specific strategies/workflows for dealing with these?
Do the EPrints preservation plugins (http://files.eprints.org/696/) help with identifying or solving these issues?
Thanks for any comments or suggestions about this.