@stephenpascoe
Last active December 11, 2015 01:29
A simple script to convert a drslib dataset directory to git-annex
#!/bin/sh
#
# This is a naive migration script for converting a drslib dataset directory
# into a git-annex managed format. The result is properly deduplicated
# but each version will be temporarily duplicated during migration. This would
# be problematic for large datasets. A more robust approach could use
# "git-annex reinject" to set the versioned path of each file.
#
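# Abort if the dataset directory is unset or cannot be entered
cd "${DATASET_DIRECTORY:?DATASET_DIRECTORY must be set}" || exit 1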
# Initialise git-annex
git init
git annex init
# Use the fastest backend. WORM keys are built from file name, size and
# mtime rather than a content hash, so there are no cryptographic checksums
git config annex.backends WORM
# Add all the real files (in the "files" subdirectory)
git annex add files
git commit -m "Adding all files to annex"
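# For each version, copy all of its files to a temporary directory
# (dereferencing symlinks), then add the copy to the annex.
# -p preserves mtimes: WORM keys include the mtime, so the copies only
# deduplicate against files/ if it is unchanged.
for version in v*
do
    cp -rLp "${version}" "${version}_import"
    git annex add "${version}_import"
    rm -rf "${version}"
    git mv "${version}_import" "${version}"
    git commit -m "Imported $version"
done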
# Clean up
git rm -r files
git commit -m "Removing reference to files/"
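
Because the loop above copies each version with "cp -rL", a symlink whose target is missing makes the copy fail partway through a version. The sketch below is a possible pre-flight check, not part of the original script: the function name find_dangling_links is hypothetical, and it assumes the version directories match v* directly under the dataset directory, as in the script.

```shell
#!/bin/sh
# Hypothetical pre-flight check (not part of the original script).
# Prints every symlink under the v* directories of the given dataset
# directory whose target does not exist, and returns non-zero if any
# such link is found.
find_dangling_links() {
    dataset_dir=$1
    # "test -e" follows symlinks, so it fails for links with no target.
    # The command substitution runs in a subshell, so the caller's
    # working directory is left unchanged.
    dangling=$(cd "$dataset_dir" && find v* -type l ! -exec test -e {} \; -print)
    if [ -n "$dangling" ]; then
        printf '%s\n' "$dangling"
        return 1
    fi
    return 0
}
```

It could be called as find_dangling_links "$DATASET_DIRECTORY" || exit 1 before "git init", so that nothing is committed when a version directory is incomplete.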