| #!/usr/bin/env python | |
| """strip outputs from an IPython Notebook | |
| Opens a notebook, strips its output, and writes the outputless version to the original file. | |
| Useful mainly as a git filter or pre-commit hook for users who don't want to track output in VCS. | |
| This does mostly the same thing as the `Clear All Output` command in the notebook UI. | |
| LICENSE: Public Domain | |
| """ | |
| import io | |
| import sys | |
| try: | |
|     # Jupyter >= 4 | |
|     from nbformat import read, write, NO_CONVERT | |
| except ImportError: | |
|     # IPython 3 | |
|     try: | |
|         from IPython.nbformat import read, write, NO_CONVERT | |
|     except ImportError: | |
|         # IPython < 3 | |
|         from IPython.nbformat import current | |
|         # Placeholder so the `as_version=NO_CONVERT` call in __main__ does not raise NameError | |
|         NO_CONVERT = None | |
|         def read(f, as_version): | |
|             return current.read(f, 'json') | |
|         def write(nb, f): | |
|             return current.write(nb, f, 'json') | |
| def _cells(nb): | |
|     """Yield all cells in an nbformat-insensitive manner""" | |
|     if nb.nbformat < 4: | |
|         for ws in nb.worksheets: | |
|             for cell in ws.cells: | |
|                 yield cell | |
|     else: | |
|         for cell in nb.cells: | |
|             yield cell | |
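| def strip_output(nb): | |
|     """strip the outputs from a notebook object""" | |
|     nb.metadata.pop('signature', None) | |
|     for cell in _cells(nb): | |
|         if 'outputs' in cell: | |
|             cell['outputs'] = [] | |
|         # nbformat < 4 stores the counter as `prompt_number` | |
|         if 'prompt_number' in cell: | |
|             cell['prompt_number'] = None | |
|         # nbformat 4 renamed it to `execution_count` | |
|         if 'execution_count' in cell: | |
|             cell['execution_count'] = None | |
|     return nb | |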
| if __name__ == '__main__': | |
|     filename = sys.argv[1] | |
|     with io.open(filename, 'r', encoding='utf8') as f: | |
|         nb = read(f, as_version=NO_CONVERT) | |
|     nb = strip_output(nb) | |
|     with io.open(filename, 'w', encoding='utf8') as f: | |
|         write(nb, f) | |
| #!/bin/sh | |
| # | |
| # strip output of IPython Notebooks | |
| # add this as `.git/hooks/pre-commit` | |
| # to run every time you commit a notebook | |
| # | |
| # requires `nbstripout` to be available on your PATH | |
| # | |
| # LICENSE: Public Domain | |
| if git rev-parse --verify HEAD >/dev/null 2>&1; then | |
|     against=HEAD | |
| else | |
|     # Initial commit: diff against an empty tree object | |
|     against=4b825dc642cb6eb9a060e54bf8d69288fbee4904 | |
| fi | |
| # Find notebooks to be committed | |
| ( | |
|     IFS=' | |
| ' | |
|     NBS=`git diff-index -z --cached $against --name-only | grep '.ipynb$' | uniq` | |
|     for NB in $NBS ; do | |
|         echo "Removing outputs from $NB" | |
|         nbstripout "$NB" | |
|         git add "$NB" | |
|     done | |
| ) | |
| exec git diff-index --check --cached $against -- | |
Slightly modified method that works with the new notebook format (v4) used in IPython 3
https://gist.github.com/waylonflinn/010f0a1a66760adf914f
The essential difference is an added check for the presence of the worksheets object on the root.
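A check like the one described could look roughly like the sketch below. It is not the linked gist's code: it works on a notebook loaded as a plain dict, and the name `iter_cells` is made up for illustration.

```python
def iter_cells(nb):
    """Yield every cell of a notebook dict, whichever layout it uses.

    Hypothetical helper: decides by looking for a top-level
    `worksheets` key instead of comparing `nbformat` version numbers.
    """
    if "worksheets" in nb:
        # Older layout (nbformat < 4): cells are grouped under worksheets.
        for ws in nb["worksheets"]:
            for cell in ws.get("cells", []):
                yield cell
    else:
        # nbformat 4 layout: cells sit directly on the root object.
        for cell in nb.get("cells", []):
            yield cell
```

Checking for the key rather than the version number means the helper still works when the version field is missing or unexpected.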
I've created a version that removes the whole cell. I have to admit the way I track the index is not at all optimal, and there might be better ways that make proper use of the API. Feedback welcome:
https://gist.github.com/dietmarw/dc0cf089d8d6211136d5
I have added documentation and an nbstripout install command that installs the filter in the current Git repository, and turned it into a module with a setuptools script entry point: https://github.com/kynan/nbstripout
How do you feel about publishing that on PyPI @minrk?
I've adapted cfriedline's repo to make it easy to install to any repo as a filter https://github.com/jond3k/ipynb_stripout
@jond3k Have a look at my repo linked above: it works with v3 and v4 and has an install command to automate the installation in any git repo.
@kynan feel free to put it on PyPI. No need to wait for me.
@minrk OK, will do, thanks!
Great snippet, thanks a lot for sharing!
Two suggestions:
- Small fix: I guess it should be `grep '\.ipynb$'` with the `.` escaped, otherwise the `.` matches any character.
- Also add `| tr -d '\000' |` before grep: ``NBS=`git diff-index -z --cached $against --name-only | tr -d '\000' | grep '\.ipynb$' | uniq` ``
The second point is because there are cases where grep considers the input binary (https://unix.stackexchange.com/questions/19907/what-makes-grep-consider-a-file-to-be-binary). This happens to me when using zsh: grep prints `Binary file (standard input) matches` instead of the matching parts.
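Both points come down to how the NUL-separated output of `git diff-index -z` is handled. With `-z`, names are terminated by NUL bytes rather than newlines, so a line-oriented tool sees one long "line", and deleting the NULs would run the names together once more than one file is staged. The sketch below shows one way to split that output directly. The names `select_notebooks` and `staged_notebooks` are made up for illustration and are not part of the gist.

```python
import subprocess


def select_notebooks(raw):
    """Pick *.ipynb paths out of NUL-separated `git diff-index -z` output.

    `raw` is the bytes written by git; names are split on NUL so that
    paths containing spaces or newlines stay intact.
    """
    names = raw.decode("utf-8", errors="surrogateescape").split("\0")
    selected = []
    for name in names:
        # A literal suffix test, so "xipynb" does not match.
        if name.endswith(".ipynb") and name not in selected:
            selected.append(name)
    return selected


def staged_notebooks(against="HEAD"):
    """Return staged notebook paths relative to the repository root."""
    raw = subprocess.run(
        ["git", "diff-index", "-z", "--cached", "--name-only", against],
        check=True,
        stdout=subprocess.PIPE,
    ).stdout
    return select_notebooks(raw)
```

Calling `staged_notebooks()` inside a repository with staged changes would return a list to loop over in place of `$NBS`.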
The pre-commit hook approach didn't work for me (the grep somehow found .py files, but only if there was a .ipynb in the commit), but a filter seems cleaner anyway. Here's what I did to get it working:
I modified cfriedline's nbstripout file slightly to give an informative error when you can't import the latest IPython:
https://github.com/petered/plato/blob/fb2f4e252f50c79768920d0e47b870a8d799e92b/notebooks/config/strip_notebook_output
And added it to my repo, let's say in `./relative/path/to/nbstripout`.
Also added a `.gitattributes` file to the root of the repo, containing:
And created a `setup_git_filters.sh` containing:
And ran `source setup_git_filters.sh`. The fancy `$(git rev-parse...)` thing is to find the local path of your repo on any (Unix) machine.
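One thing to keep in mind with the filter approach: a git clean filter receives the file content on stdin and must write the result to stdout, whereas the script at the top of this gist rewrites the file in place. A filter-style variant might look like the sketch below. It is an assumption-laden illustration, not the linked script: it uses only the standard `json` module instead of `nbformat`, the names `strip_notebook_text` and `run_filter` are made up, and the JSON layout it writes (`indent=1`, sorted keys) is a guess at something diff-friendly rather than the exact output of the nbformat writer.

```python
import json
import sys


def strip_notebook_text(text):
    """Return notebook JSON `text` with outputs and counters removed."""
    nb = json.loads(text)
    nb.get("metadata", {}).pop("signature", None)
    # Collect cells from both layouts: root-level (nbformat 4)
    # and per-worksheet (nbformat < 4).
    cells = list(nb.get("cells", []))
    for ws in nb.get("worksheets", []):
        cells.extend(ws.get("cells", []))
    for cell in cells:
        if "outputs" in cell:
            cell["outputs"] = []
        for key in ("prompt_number", "execution_count"):
            if key in cell:
                cell[key] = None
    return json.dumps(nb, indent=1, sort_keys=True, ensure_ascii=False) + "\n"


def run_filter(stdin=sys.stdin, stdout=sys.stdout):
    """Read a notebook from `stdin` and write the stripped copy to `stdout`."""
    stdout.write(strip_notebook_text(stdin.read()))
```

Saved as an executable script that ends by calling `run_filter()`, something like this could serve as the command behind a `filter.<name>.clean` setting, with `.gitattributes` mapping `*.ipynb` to that filter name.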