[EP-tech] Fixity Check and EPrints - Digital Preservation
Christopher Gutteridge
cjg at ecs.soton.ac.uk
Fri Aug 25 11:22:33 BST 2017
Probity is a bit like blockchain, but distributed. (I'm not an expert on
blockchain)
It never caught on, which is a pity, as the idea was sound.
I've some PHP code lying around for making a basic probity website.
On 25/08/2017 11:03, John Salter wrote:
>
> Hi Tomasz,
>
> I think we're looking into similar things at the moment :o)
>
> I think there are similarities between 'fixity' and 'probity' - so
> although there isn't integration of fixity, this might be useful info:
>
> EPrints does support 'probity' files (http://www.probity.org/), which
> include a hash of the contents.
>
> I don’t think these are generated by default, but the $doc->rehash
> command should generate them.
>
> See the EPrints::Probity module, and the 'rehash' option of bin/epadmin.
>
> Running [EPRINTS_ROOT]/bin/epadmin rehash [ARCHIVEID] [docid] will
> generate a file in the owning eprint folder e.g.
>
> [EPRINTS_ROOT]/archives/[ARCHIVEID]/documents/disk0/00/00/00/01/1.2017-08-25T09=003a55=003a29Z.xsh
>
> (for eprintid = 1, and docid = 1. Note the encoded ':'s (=003a) in
> the timestamp in the filename).
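
To illustrate the filename convention described above, here is a minimal Python sketch that turns the escaped name back into a readable timestamp. It assumes the escape scheme is '=' followed by four hex digits of the character code, which is inferred only from the single example filename above and not checked against the EPrints source. The function names are hypothetical.

```python
import re
from datetime import datetime, timezone


def decode_escaped(name: str) -> str:
    """Replace each '=XXXX' escape (assumed: four hex digits) with its character."""
    return re.sub(r"=([0-9a-fA-F]{4})", lambda m: chr(int(m.group(1), 16)), name)


def probity_timestamp(filename: str) -> datetime:
    """Parse the timestamp out of '<docid>.<escaped timestamp>.xsh'."""
    _docid, rest = filename.split(".", 1)
    stamp = decode_escaped(rest[: -len(".xsh")])
    return datetime.strptime(stamp, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)


name = "1.2017-08-25T09=003a55=003a29Z.xsh"
print(decode_escaped(name))     # 1.2017-08-25T09:55:29Z.xsh
print(probity_timestamp(name))  # 2017-08-25 09:55:29+00:00
```

Sorting a document's .xsh files by this timestamp would be one way to pick out the most recent one.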
>
> The file has the following data:
>
> <?xml version="1.0" encoding="UTF-8" ?>
> <hashlist xmlns="http://probity.org/XMLprobity">
>   <hash>
>     <name>wreo.txt</name>
>     <algorithm>MD5</algorithm>
>     <value>17f861744d77c1d9754fd7ab6f403065</value>
>     <date>2017-08-25T09:55:45Z</date>
>   </hash>
> </hashlist>
>
> You can create multiple Probity files, but I don't think there's any
> way to compare one with another, or to check that the current checksum
> matches the most recently stored one (which is the main part of your
> question).
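
As a rough sketch of the missing comparison step, the following Python reads a hashlist file of the shape shown above and recomputes each listed file's digest. This is not part of EPrints. The function names and the `files_dir` parameter are hypothetical, and it assumes the algorithm name maps directly onto a `hashlib` name once lowercased (true for the MD5 example, not checked for anything else).

```python
import hashlib
import xml.etree.ElementTree as ET
from pathlib import Path

# Namespace taken from the example hashlist above.
NS = {"p": "http://probity.org/XMLprobity"}


def read_hashlist(xsh_path):
    """Return (name, algorithm, value) for each <hash> entry in the file."""
    root = ET.parse(xsh_path).getroot()
    return [
        (
            h.findtext("p:name", namespaces=NS),
            h.findtext("p:algorithm", namespaces=NS),
            h.findtext("p:value", namespaces=NS),
        )
        for h in root.findall("p:hash", NS)
    ]


def file_digest(path, algorithm):
    """Hash a file in chunks so large documents are not loaded into memory."""
    digest = hashlib.new(algorithm.lower())
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_fixity(xsh_path, files_dir):
    """Map each listed file name to True if its current digest matches the stored one."""
    results = {}
    for name, algorithm, expected in read_hashlist(xsh_path):
        target = Path(files_dir) / name
        results[name] = (
            target.is_file() and file_digest(target, algorithm) == expected.lower()
        )
    return results
```

Calling `check_fixity(path_to_xsh, directory_holding_the_files)` returns something like `{"wreo.txt": True}`. A False value means the file is missing or its contents no longer match the stored hash. Where the document files sit relative to the .xsh file is left to the caller, since that layout is not spelled out here.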
>
> Cheers,
>
> John
>
> PS I'm also looking into DROID - as you were at some point. The Bazaar
> package needs an update or three…
>
> *From:* eprints-tech-bounces at ecs.soton.ac.uk
> [mailto:eprints-tech-bounces at ecs.soton.ac.uk] *On Behalf Of* Tomasz
> Neugebauer
> *Sent:* 24 August 2017 18:35
> *To:* eprints-tech at ecs.soton.ac.uk
> *Subject:* [EP-tech] Fixity Check and EPrints - Digital Preservation
>
> I believe that EPrints stores a checksum value for each uploaded file,
> but as far as I understand, there is no way to monitor whether the
> checksums still match the current files, and thus no way of checking for
> bit rot.
>
> DSpace has the following:
> https://wiki.duraspace.org/display/DSDOC6x/Validating+CheckSums+of+Bitstreams
>
> A periodic fixity check is a part of the lowest level of support for
> digital preservation, i.e., “Bit-level”. See some examples of Digital
> Preservation policy, all of which have some variation on this as a
> requirement: “regularly audit checksums to ensure that no files have
> corrupted or changed in any way. This practice ensures the ability to
> provide an exact copy of original files over time”:
>
> · https://www.sfu.ca/content/dam/sfu/archives/DigitalPreservation/FormatPolicyRegistry.pdf
> “Regularly perform fixity checks on AIPs”
>
> · https://digital.library.yorku.ca/documentation/fixity-procedures
> “York University Library are committed to maintaining the integrity of
> objects in its care. This includes creating checksums for all archival
> format objects -- plus associated datastreams -- ingested into the
> repository, and regular fixity checking of those objects”
>
> · https://researchworks.lib.washington.edu/policy-preservation.html
> “Maintains the authenticity of the bitstream through integrity checking”
>
> I understand that EPrints is primarily an open access platform, but I
> think that we should be able to provide at least the lowest
> “bit-level” digital preservation support with it, and without a Fixity
> check, I don’t think we can ensure that no files are corrupted or
> changed over time.
>
> Preservation Metadata for Institutional Repositories
> <http://preserv.eprints.org/papers/presmeta/pm-paper-draft.html>, a
> report looking at EPrints and digital preservation dating back to 2007
> states the following about fixity checking: “Where is fixity check
> first performed? Not within EPrints currently, but a script that
> crawls the archive comparing files with checksums is possible”. Ten
> years on, I am wondering whether and how institutions running
> EPrints are implementing their fixity checks. Are you using an
> external tool like this: https://www.avpreserve.com/tools/fixity/? Are
> you using your own custom script? Did you develop something that is
> integrated with the EPrints Admin interface?
>
>
> Best wishes,
>
> Tomasz
>
> ________________________________________________
>
> Tomasz Neugebauer
> Digital Projects & Systems Development Librarian / Bibliothécaire des
> Projets Numériques & Développement de Systèmes
> Library / Bibliothèque
> Concordia University / Université Concordia
>
> Tel. / Tél. 514-848-2424 ext. / poste 7738
> Email / courriel: tomasz.neugebauer at concordia.ca
> <mailto:tomasz.neugebauer at concordia.ca>
>
> Mailing address / adresse postale: 1455 De Maisonneuve Blvd.
> W., LB-540-03, Montreal, Quebec H3G 1M8
> Street address / adresse municipale: 1400 De Maisonneuve Blvd.
> W., LB-540-03, Montreal, Quebec H3G 1M8
>
> http://library.concordia.ca <http://library.concordia.ca/>
> http://www.concordia.ca/faculty/tomasz-neugebauer.html
>
>
>
> *** Options: http://mailman.ecs.soton.ac.uk/mailman/listinfo/eprints-tech
> *** Archive: http://www.eprints.org/tech.php/
> *** EPrints community wiki: http://wiki.eprints.org/
> *** EPrints developers Forum: http://forum.eprints.org/
--
Christopher Gutteridge -- http://users.ecs.soton.ac.uk/cjg
University of Southampton Open Data Service: http://data.southampton.ac.uk/
You should read our Web & Data Innovation blog: http://blogs.ecs.soton.ac.uk/webteam/