[EP-tech] Dissecting the Documents folder

Thomas Lauke th.lauke at arcor.de
Thu Jun 8 14:24:44 BST 2017


Hi Andrew,

> Do I ... put it in the new <eprints_root>/archives/<myarchive>/documents folder?
I do not know what else would have to be done for that approach, so below I describe the path that worked for me in the past:

- Unpack your documents to e.g. /tmp/disc0/00/... (none of the thumbnails or indexcodes is crucial)

- Replace the leading part of each <url> so that it reflects the physical directory structure, using the following substitutions (written here in ex/vi `%s` form; for a plain sed call, drop the leading `%`):
%s/http:\/\/eprints.lincoln.ac.uk\/\([0-9]\)\([0-9][0-9]\)\([0-9][0-9]\)\/\([0-9][0-9]\)/\/tmp\/disc0\/00\/0\1\/\2\/\3\/\4/
%s/http:\/\/eprints.lincoln.ac.uk\/\([0-9]\)\([0-9][0-9]\)\([0-9][0-9]\)\/\([0-9]\)/\/tmp\/disc0\/00\/0\1\/\2\/\3\/0\4/
%s/http:\/\/eprints.lincoln.ac.uk\/\([0-9][0-9]\)\([0-9][0-9]\)\/\([0-9][0-9]\)/\/tmp\/disc0\/00\/00\/\1\/\2\/\3/
%s/http:\/\/eprints.lincoln.ac.uk\/\([0-9][0-9]\)\([0-9][0-9]\)\/\([0-9]\)/\/tmp\/disc0\/00\/00\/\1\/\2\/0\3/
%s/http:\/\/eprints.lincoln.ac.uk\/\([0-9]\)\([0-9][0-9]\)\/\([0-9][0-9]\)/\/tmp\/disc0\/00\/00\/0\1\/\2\/\3/
%s/http:\/\/eprints.lincoln.ac.uk\/\([0-9]\)\([0-9][0-9]\)\/\([0-9]\)/\/tmp\/disc0\/00\/00\/0\1\/\2\/0\3/
%s/http:\/\/eprints.lincoln.ac.uk\/\([0-9][0-9]\)\/\([0-9][0-9]\)/\/tmp\/disc0\/00\/00\/00\/\1\/\2/
%s/http:\/\/eprints.lincoln.ac.uk\/\([0-9][0-9]\)\/\([0-9]\)/\/tmp\/disc0\/00\/00\/00\/\1\/0\2/
%s/http:\/\/eprints.lincoln.ac.uk\/\([0-9]\)\/\([0-9][0-9]\)/\/tmp\/disc0\/00\/00\/00\/0\1\/\2/
%s/http:\/\/eprints.lincoln.ac.uk\/\([0-9]\)\/\([0-9]\)/\/tmp\/disc0\/00\/00\/00\/0\1\/0\2/

- Watch out for spaces in file paths: fortunately the file names on our Linux system had no spaces, so I have NO experience with that case :-)

- Remove all <rev_number> elements with `xmlstarlet ed -d "//_:rev_number" in.xml > /tmp/out.xml` to restart the change history

- Check your import file with `~/Eprints/bin/import yourRepo --parse-only --force archive XML yourInput`

- Start the final run with `~/Eprints/bin/import yourRepo --migration --force archive XML yourInput`

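- If anything fails, erase the imported eprints and start again; as far as I know erase_eprints is an epadmin command rather than an import option, i.e. `~/Eprints/bin/epadmin erase_eprints yourRepo`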
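
The ten substitutions above all follow one rule: the eprint id is zero-padded to eight digits and split into four two-digit directories, and the document position is zero-padded to two digits. Below is a minimal Python sketch of that rule, in case a single script is more convenient than the list of patterns. The function name `rewrite_urls` and the constant names are my own, chosen for illustration; the host and /tmp/disc0 are simply the values from the example above. Unlike the original patterns, it leaves a position of more than two digits untouched.

```python
import re

# Assumptions taken from the example above; adjust to your own repository.
BASE_URL = "http://eprints.lincoln.ac.uk/"
DOC_ROOT = "/tmp/disc0"

# Eprint id of 1 to 5 digits, then a document position of 1 or 2 digits.
_URL_RE = re.compile(re.escape(BASE_URL) + r"(\d{1,5})/(\d{1,2})(?!\d)")


def rewrite_urls(text):
    """Replace repository URLs in text by paths below DOC_ROOT."""

    def repl(match):
        # Pad the id to 8 digits and cut it into pairs: 12345 -> 00/01/23/45
        padded_id = match.group(1).zfill(8)
        pairs = [padded_id[i:i + 2] for i in range(0, 8, 2)]
        # Pad the document position to 2 digits: 1 -> 01
        position = match.group(2).zfill(2)
        return "/".join([DOC_ROOT, *pairs, position])

    return _URL_RE.sub(repl, text)
```

Anything after the position (the file name) is kept as it is, e.g. `rewrite_urls(open("in.xml").read())` returns the whole file content with only the URL prefixes changed.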

> Which part of the xml needs rewriting to tell the import 
> where to look for the file?
None: the rewritten <url> values already tell the import where to find each file.
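
Since the <url> values are the only pointer to the files, it may be worth checking them before starting the import. The following is only a sketch of such a check, not part of EPrints: the function name `check_local_urls` is made up, it assumes every <url> element holds a plain absolute path without XML entities, and it simply reports paths that contain a space or do not exist.

```python
import os
import re

# Matches the text content of <url> elements without attributes.
_URL_TAG_RE = re.compile(r"<url>([^<]*)</url>")


def check_local_urls(xml_text, exists=os.path.exists):
    """Return a list of (path, problem) tuples for suspicious <url> values."""
    problems = []
    for path in _URL_TAG_RE.findall(xml_text):
        path = path.strip()
        if not path.startswith("/"):
            # Not a local absolute path (e.g. still an http URL); skip it.
            continue
        if " " in path:
            problems.append((path, "contains a space"))
        if not exists(path):
            problems.append((path, "file not found"))
    return problems
```

An empty result from `check_local_urls(open("/tmp/out.xml").read())` would mean that every local path in the import file was found on disk and none contains a space.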

The numbering follows the order of the entries in your import file, so any gaps will be gone, which could cause some confusion when comparing ...

Hth
Thomas

