[EP-tech] Dissecting the Documents folder
Thomas Lauke
th.lauke at arcor.de
Thu Jun 8 14:24:44 BST 2017
Hi Andrew,
> Do I ... put it in the new <eprints_root>/archives/<myarchive>/documents folder?
Since I don't know what else might need to be done, I'll just describe the path that worked for me in the past:
- Unpack your documents to e.g. /tmp/disc0/00/... (neither the thumbnails nor the indexcodes are crucial)
- Replace the leading part of each <url> with the physical directory structure, using a sed call with the following lines (the leading `%` is vi/ex syntax; drop it for sed itself):
%s/http:\/\/eprints.lincoln.ac.uk\/\([0-9]\)\([0-9][0-9]\)\([0-9][0-9]\)\/\([0-9][0-9]\)/\/tmp\/disc0\/00\/0\1\/\2\/\3\/\4/
%s/http:\/\/eprints.lincoln.ac.uk\/\([0-9]\)\([0-9][0-9]\)\([0-9][0-9]\)\/\([0-9]\)/\/tmp\/disc0\/00\/0\1\/\2\/\3\/0\4/
%s/http:\/\/eprints.lincoln.ac.uk\/\([0-9][0-9]\)\([0-9][0-9]\)\/\([0-9][0-9]\)/\/tmp\/disc0\/00\/00\/\1\/\2\/\3/
%s/http:\/\/eprints.lincoln.ac.uk\/\([0-9][0-9]\)\([0-9][0-9]\)\/\([0-9]\)/\/tmp\/disc0\/00\/00\/\1\/\2\/0\3/
%s/http:\/\/eprints.lincoln.ac.uk\/\([0-9]\)\([0-9][0-9]\)\/\([0-9][0-9]\)/\/tmp\/disc0\/00\/00\/0\1\/\2\/\3/
%s/http:\/\/eprints.lincoln.ac.uk\/\([0-9]\)\([0-9][0-9]\)\/\([0-9]\)/\/tmp\/disc0\/00\/00\/0\1\/\2\/0\3/
%s/http:\/\/eprints.lincoln.ac.uk\/\([0-9][0-9]\)\/\([0-9][0-9]\)/\/tmp\/disc0\/00\/00\/00\/\1\/\2/
%s/http:\/\/eprints.lincoln.ac.uk\/\([0-9][0-9]\)\/\([0-9]\)/\/tmp\/disc0\/00\/00\/00\/\1\/0\2/
%s/http:\/\/eprints.lincoln.ac.uk\/\([0-9]\)\/\([0-9][0-9]\)/\/tmp\/disc0\/00\/00\/00\/0\1\/\2/
%s/http:\/\/eprints.lincoln.ac.uk\/\([0-9]\)\/\([0-9]\)/\/tmp\/disc0\/00\/00\/00\/0\1\/0\2/
- Take care of spaces in the file paths: fortunately our file names on our Linux system had no spaces at all, so I have NO experience with that :-)
- Remove all <rev_number> tags with `xmlstarlet ed -d "//_:rev_number" in.xml > /tmp/out.xml` to restart the change history
- Check your import file with `~/Eprints/bin/import yourRepo --parse-only --force archive XML yourInput`
- Start the final run with `~/Eprints/bin/import yourRepo --migration --force archive XML yourInput`
- If anything fails, restart after `~/Eprints/bin/epadmin erase_eprints yourRepo`
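
The ten substitution lines above all follow one rule: zero-pad the eprint id to eight digits, split it into pairs, and append the two-digit document position. As a rough sketch only (not what I actually ran), the same rewrite could be done in Python. The names `BASE_URL`, `DOC_ROOT`, `url_to_path` and `rewrite_urls` are made up for illustration. The sketch assumes ids of at most five digits, as in the lines above, and unlike them it leaves a URL untouched if the document position has more than two digits.

```python
import re

# Values copied from the substitution lines above; adjust to your setup.
BASE_URL = "http://eprints.lincoln.ac.uk/"
DOC_ROOT = "/tmp/disc0"

# <base>/<eprint id, 1-5 digits>/<document position, 1-2 digits>
_URL_RE = re.compile(re.escape(BASE_URL) + r"(\d{1,5})/(\d{1,2})(?!\d)")


def url_to_path(match):
    """Turn one matched URL prefix into the on-disk directory prefix."""
    eprint_id, position = match.group(1), match.group(2)
    padded = eprint_id.zfill(8)  # e.g. "123" -> "00000123"
    pairs = [padded[i:i + 2] for i in range(0, 8, 2)]  # ["00", "00", "01", "23"]
    return "/".join([DOC_ROOT, *pairs, position.zfill(2)])


def rewrite_urls(text):
    """Rewrite every matching URL prefix in the given text."""
    return _URL_RE.sub(url_to_path, text)


if __name__ == "__main__":
    print(rewrite_urls("<url>http://eprints.lincoln.ac.uk/123/4/thesis.pdf</url>"))
    # prints <url>/tmp/disc0/00/00/01/23/04/thesis.pdf</url>
```

Note that this replaces every match in the text, whereas the substitution lines above (without a `g` flag) only touch the first match on each line.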
> Which part of the xml needs rewriting to tell the import
> where to look for the file?
None, because the modified <url> values already tell the import where to find each file.
The new numbering follows the order of the entries in your import file, so any gaps will disappear, but this can cause some confusion when comparing old and new records ...
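
To make such a comparison easier, one could build a table of old versus expected new ids from the import file before running the import. The following Python sketch is only an illustration: it assumes the file has an <eprints> root with <eprint> children that each carry an <eprintid>, and it assumes the new numbering starts at 1 in an emptied repository, which you should verify for your own installation. The names `local_name` and `old_to_new_ids` are invented here.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip a namespace prefix such as '{http://...}eprintid'."""
    return tag.rsplit("}", 1)[-1]


def old_to_new_ids(xml_text, first_new_id=1):
    """Map each old <eprintid> to its position in the import file.

    first_new_id is an assumption: the id the first imported record
    is expected to receive.
    """
    root = ET.fromstring(xml_text)
    mapping = {}
    next_id = first_new_id
    for record in root:
        if local_name(record.tag) != "eprint":
            continue
        for child in record:
            if local_name(child.tag) == "eprintid":
                mapping[int(child.text)] = next_id
                break
        next_id += 1  # every <eprint> consumes one new id, in file order
    return mapping
```

For an import file containing the old ids 3, 7 and 12 in that order, the function returns `{3: 1, 7: 2, 12: 3}`, i.e. the gaps are gone.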
Hth
Thomas