[EP-tech] Fixity Check and EPrints - Digital Preservation

Tomasz Neugebauer Tomasz.Neugebauer at concordia.ca
Thu Aug 24 18:34:32 BST 2017

I believe that EPrints stores a checksum value for each uploaded file, but as far as I understand, there is no way to monitor whether the stored checksums still match the current files, and thus no way of checking for bit rot.

DSpace has the following: https://wiki.duraspace.org/display/DSDOC6x/Validating+CheckSums+of+Bitstreams

A periodic fixity check is part of the lowest level of support for digital preservation, i.e., "bit-level" preservation. Digital preservation policies generally include some variation on the following requirement: "regularly audit checksums to ensure that no files have corrupted or changed in any way. This practice ensures the ability to provide an exact copy of original files over time". Some examples:

· https://www.sfu.ca/content/dam/sfu/archives/DigitalPreservation/FormatPolicyRegistry.pdf "Regularly perform fixity checks on AIPs"

· https://digital.library.yorku.ca/documentation/fixity-procedures "York University Library are committed to maintaining the integrity of objects in its care. This includes creating checksums for all archival format objects -- plus associated datastreams -- ingested into the repository, and regular fixity checking of those objects"

· https://researchworks.lib.washington.edu/policy-preservation.html "Maintains the authenticity of the bitstream through integrity checking"

I understand that EPrints is primarily an open access platform, but I think that we should be able to provide at least the lowest, "bit-level" digital preservation support with it. Without a fixity check, I don't think we can ensure that no files are corrupted or changed over time.

Preservation Metadata for Institutional Repositories<http://preserv.eprints.org/papers/presmeta/pm-paper-draft.html>, a 2007 report looking at EPrints and digital preservation, states the following about fixity checking: "Where is fixity check first performed? Not within EPrints currently, but a script that crawls the archive comparing files with checksums is possible". Ten years later, I am wondering whether and how institutions running EPrints are implementing their fixity checks. Are you using an external tool such as Fixity (https://www.avpreserve.com/tools/fixity/)? Are you using your own custom script? Did you develop something that is integrated with the EPrints Admin interface?
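
To make concrete the kind of "script that crawls the archive comparing files with checksums" that the report describes, here is a minimal Python sketch. It is not an EPrints feature and does not use any EPrints API. It assumes that the stored checksums have already been exported by some means into a manifest mapping each file's path (relative to a document root) to its expected hex digest, and that the hash algorithm is known. The function names, the manifest layout and the "md5" default are all hypothetical choices made for illustration.

```python
import hashlib
from pathlib import Path


def file_digest(path, algorithm="md5", chunk_size=1 << 20):
    """Hash a file in chunks so large deposits need not fit in memory."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def check_fixity(root, manifest, algorithm="md5"):
    """Compare each file listed in `manifest` with its recorded checksum.

    `manifest` is a hypothetical export: {relative_path: expected_hex_digest}.
    Returns {relative_path: "ok" | "mismatch" | "missing"}.
    """
    root = Path(root)
    results = {}
    for rel_path, expected in manifest.items():
        target = root / rel_path
        if not target.is_file():
            results[rel_path] = "missing"      # file has disappeared
        elif file_digest(target, algorithm) == expected.lower():
            results[rel_path] = "ok"           # bits unchanged
        else:
            results[rel_path] = "mismatch"     # possible bit rot or edit
    return results
```

A script along these lines could be run periodically (for example from cron) and could report any path whose status is not "ok". The algorithm passed in would need to match whatever was used when the checksum was first recorded.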

