Last active: July 26, 2017 23:00
| { | |
| "cells": [ | |
| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": [ | |
| "# Imperative Machine Learning with MXNet and Foo\n", | |
| "\n", | |
| "This tutorial introduces how to do imperative tensor computation with MXNet and how to use Foo, a new user-friendly interface for MXNet that doesn't have a name yet :)\n", | |
| "\n", | |
| "You can download this tutorial here: https://gist.github.com/piiswrong/2716581ebeb3d6560c1ad916a793f90d\n", | |
| "\n", | |
| "The API reference for the Foo package can be found here: http://mxnet-doc.s3-website-us-east-1.amazonaws.com/api/python/foo.html\n", | |
| "\n", | |
| "## Setup\n", | |
| "\n", | |
| "You need to clone MXNet and check out the `nn` branch:\n", | |
| "```\n", | |
| "git clone https://github.com/dmlc/mxnet.git --recursive\n", | |
| "cd mxnet\n", | |
| "git checkout nn\n", | |
| "```\n", | |
| "Then follow the \"Build from Source\" section of the [installation guide](http://mxnet.io/get_started/install.html).\n", | |
| "\n", | |
| "Now you can import MXNet:" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": 1, | |
| "metadata": { | |
| "collapsed": true | |
| }, | |
| "outputs": [], | |
| "source": [ | |
| "from __future__ import print_function\n", | |
| "import mxnet as mx" | |
| ] | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": [ | |
| "## Basics\n", | |
| "\n", | |
| "MXNet's core data structure is the `NDArray`, a multi-dimensional array that supports a rich set of operators on both CPU and GPU:" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": 3, | |
| "metadata": { | |
| "collapsed": false | |
| }, | |
| "outputs": [ | |
| { | |
| "name": "stdout", | |
| "output_type": "stream", | |
| "text": [ | |
| "x: <NDArray 2x2 @cpu(0)>\n", | |
| "[[ 1. 2.]\n", | |
| " [ 3. 4.]]\n" | |
| ] | |
| } | |
| ], | |
| "source": [ | |
| "x = mx.nd.array([[1, 2], [3, 4]])\n", | |
| "y = mx.nd.array([[5, 6], [7, 8]])\n", | |
| "# gpu_x = mx.nd.array([1, 2], ctx=mx.gpu(0))\n", | |
| "print('x: ', x)\n", | |
| "print(x.asnumpy())" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": 7, | |
| "metadata": { | |
| "collapsed": false | |
| }, | |
| "outputs": [ | |
| { | |
| "name": "stdout", | |
| "output_type": "stream", | |
| "text": [ | |
| "[[ 19. 22.]\n", | |
| " [ 43. 50.]]\n" | |
| ] | |
| } | |
| ], | |
| "source": [ | |
| "z = mx.nd.dot(x, y)\n", | |
| "print(z.asnumpy())" | |
| ] | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": [ | |
| "See [here](http://mxnet.io/api/python/ndarray.html) for a full list of operators." | |
| ] | |
| }, | |
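| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": [ | |
| "For instance, here are a few common elementwise and shape operators (a small illustration reusing the `x` and `y` defined above):" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": null, | |
| "metadata": { | |
| "collapsed": false | |
| }, | |
| "outputs": [], | |
| "source": [ | |
| "# elementwise addition and multiplication\n", | |
| "print((x + y).asnumpy())\n", | |
| "print((x * y).asnumpy())\n", | |
| "# elementwise exponential\n", | |
| "print(mx.nd.exp(x).asnumpy())\n", | |
| "# matrix transpose\n", | |
| "print(mx.nd.transpose(x).asnumpy())" | |
| ] | |
| }, | |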
| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": [ | |
| "## Automatic differentiation\n", | |
| "\n", | |
| "Attach gradient buffers to NDArrays:" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": 8, | |
| "metadata": { | |
| "collapsed": true | |
| }, | |
| "outputs": [], | |
| "source": [ | |
| "from mxnet.autograd import mark_variables, train_section\n", | |
| "\n", | |
| "dx = mx.nd.zeros_like(x)\n", | |
| "dy = mx.nd.zeros_like(y)\n", | |
| "mark_variables(x, dx)\n", | |
| "mark_variables(y, dy)" | |
| ] | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": [ | |
| "Run the forward pass inside a `train_section` scope and call `backward` to populate the gradient buffers:" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": 9, | |
| "metadata": { | |
| "collapsed": false | |
| }, | |
| "outputs": [ | |
| { | |
| "name": "stdout", | |
| "output_type": "stream", | |
| "text": [ | |
| "dx = [[ 5. 6.]\n", | |
| " [ 7. 8.]]\n", | |
| "dy = [[ 1. 2.]\n", | |
| " [ 3. 4.]]\n" | |
| ] | |
| } | |
| ], | |
| "source": [ | |
| "with train_section():\n", | |
| " z = x * y\n", | |
| " z.backward()\n", | |
| "print('dx = ', dx.asnumpy())\n", | |
| "print('dy = ', dy.asnumpy())" | |
| ] | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": [ | |
| "Supplying a head gradient (the gradient with respect to `z`):" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": 10, | |
| "metadata": { | |
| "collapsed": false | |
| }, | |
| "outputs": [ | |
| { | |
| "name": "stdout", | |
| "output_type": "stream", | |
| "text": [ | |
| "dx = [[ 10. 12.]\n", | |
| " [ 14. 16.]]\n", | |
| "dy = [[ 2. 4.]\n", | |
| " [ 6. 8.]]\n" | |
| ] | |
| } | |
| ], | |
| "source": [ | |
| "dz = mx.nd.ones_like(z)*2\n", | |
| "with train_section():\n", | |
| " z = x * y\n", | |
| " z.backward(dz)\n", | |
| "print('dx = ', dx.asnumpy())\n", | |
| "print('dy = ', dy.asnumpy())" | |
| ] | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": [ | |
| "## Foo\n", | |
| "\n", | |
| "### Layers\n", | |
| "\n", | |
| "Foo provides basic neural network building blocks as `Layer`s. For example, `Dense(4, in_units=2)` is a fully connected layer that takes length-2 inputs and produces length-4 outputs:" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": 11, | |
| "metadata": { | |
| "collapsed": false | |
| }, | |
| "outputs": [], | |
| "source": [ | |
| "from mxnet import foo\n", | |
| "from mxnet.foo import nn\n", | |
| "dense = nn.Dense(4, activation='relu', in_units=2)" | |
| ] | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": [ | |
| "Before we can use it, we must initialize `dense`'s parameters:" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": 12, | |
| "metadata": { | |
| "collapsed": false | |
| }, | |
| "outputs": [ | |
| { | |
| "name": "stdout", | |
| "output_type": "stream", | |
| "text": [ | |
| "weight: <NDArray 4x2 @cpu(0)>\n", | |
| "[[ 0.04881352 0.09284461]\n", | |
| " [ 0.21518934 0.34426576]\n", | |
| " [ 0.10276335 0.35794562]\n", | |
| " [ 0.04488319 0.34725171]]\n", | |
| "bias: <NDArray 4 @cpu(0)>\n", | |
| "[ 0. 0. 0. 0.]\n" | |
| ] | |
| } | |
| ], | |
| "source": [ | |
| "dense.all_params().initialize(mx.init.Uniform(0.5), ctx=mx.cpu(0))\n", | |
| "print('weight: ', dense.weight.data())\n", | |
| "print(dense.weight.data().asnumpy())\n", | |
| "\n", | |
| "print('bias: ', dense.bias.data())\n", | |
| "print(dense.bias.data().asnumpy())" | |
| ] | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": [ | |
| "Now we are ready to do a *forward pass*:" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": 13, | |
| "metadata": { | |
| "collapsed": false | |
| }, | |
| "outputs": [ | |
| { | |
| "name": "stdout", | |
| "output_type": "stream", | |
| "text": [ | |
| "[[ 0.23450273 0.90372086 0.8186546 0.73938662]\n", | |
| " [ 0.51781899 2.02263117 1.74007249 1.52365637]]\n" | |
| ] | |
| } | |
| ], | |
| "source": [ | |
| "output = dense(x)\n", | |
| "print(output.asnumpy())" | |
| ] | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": [ | |
| "Here `x` is interpreted as an input batch with batch_size=2 and feature length 2.\n", | |
| "\n", | |
| "### Composing Layers\n", | |
| "You can compose multiple layers into a neural network by subclassing `nn.Layer`:" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": 15, | |
| "metadata": { | |
| "collapsed": true | |
| }, | |
| "outputs": [], | |
| "source": [ | |
| "class Net(nn.Layer):\n", | |
| " def __init__(self, **kwargs):\n", | |
| " super(Net, self).__init__(**kwargs)\n", | |
| " with self.scope:\n", | |
| " # layers assigned to self in scope will be registered as sub-layers\n", | |
| " self.fc1 = nn.Dense(4, in_units=2)\n", | |
| " self.fc2 = nn.Dense(3, in_units=4)\n", | |
| " \n", | |
| " def generic_forward(self, F, x):\n", | |
| " # when x is an NDArray, F will be set to mx.nd\n", | |
| " x = F.relu(self.fc1(x))\n", | |
| " x = self.fc2(x)\n", | |
| " return x" | |
| ] | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": [ | |
| "Now we can use it the same way we used `dense`:" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": 16, | |
| "metadata": { | |
| "collapsed": false | |
| }, | |
| "outputs": [ | |
| { | |
| "name": "stdout", | |
| "output_type": "stream", | |
| "text": [ | |
| "[[-0.72901833 -2.01382256 1.48214853]\n", | |
| " [-1.71972466 -4.79868174 3.42445707]]\n" | |
| ] | |
| } | |
| ], | |
| "source": [ | |
| "net = Net()\n", | |
| "net.all_params().initialize(ctx=mx.cpu(0))\n", | |
| "output = net(x)\n", | |
| "print(output.asnumpy())" | |
| ] | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": [ | |
| "## Trainer and Loss Functions\n", | |
| "\n", | |
| "To train the network, you need an optimizer. A `Trainer` applies optimizer updates to a set of parameters:" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": 17, | |
| "metadata": { | |
| "collapsed": true | |
| }, | |
| "outputs": [], | |
| "source": [ | |
| "trainer = foo.Trainer(net.all_params(), 'sgd', {'learning_rate': 0.1})" | |
| ] | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": [ | |
| "Then you can run the forward pass inside a `train_section` and compute gradients with `backward`:" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": 25, | |
| "metadata": { | |
| "collapsed": false | |
| }, | |
| "outputs": [ | |
| { | |
| "name": "stdout", | |
| "output_type": "stream", | |
| "text": [ | |
| "loss = [ 0.94735891 1.18011999]\n", | |
| "fc2.bias = [ 0.0547721 0.08332293 -0.13809504]\n", | |
| "d(loss)/d(fc2.bias) = [-0.17545167 -0.35362723 0.52907884]\n" | |
| ] | |
| } | |
| ], | |
| "source": [ | |
| "label = mx.nd.array([0, 1])\n", | |
| "with train_section():\n", | |
| " output = net(x)\n", | |
| " loss = foo.loss.softmax_cross_entropy_loss(output, label)\n", | |
| " loss.backward()\n", | |
| "print('loss = ', loss.asnumpy())\n", | |
| "print('fc2.bias = ', net.fc2.bias.data().asnumpy())\n", | |
| "print('d(loss)/d(fc2.bias) = ', net.fc2.bias.grad().asnumpy())" | |
| ] | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": [ | |
| "Now we can take a gradient step with `Trainer`:" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": 26, | |
| "metadata": { | |
| "collapsed": false | |
| }, | |
| "outputs": [ | |
| { | |
| "name": "stdout", | |
| "output_type": "stream", | |
| "text": [ | |
| "fc2.bias = [ 0.06354468 0.1010043 -0.16454898]\n" | |
| ] | |
| } | |
| ], | |
| "source": [ | |
| "trainer.step(batch_size=2)\n", | |
| "print('fc2.bias = ', net.fc2.bias.data().asnumpy())" | |
| ] | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": [ | |
| "Note that the weights have changed a little. You can repeat the last two cells in a loop to train your network." | |
| ] | |
| } | |
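| , | |
| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": [ | |
| "For example, the last two cells can be combined into a minimal training loop (a sketch reusing the `x`, `label`, `net`, and `trainer` defined above):" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": null, | |
| "metadata": { | |
| "collapsed": false | |
| }, | |
| "outputs": [], | |
| "source": [ | |
| "for epoch in range(10):\n", | |
| "    # record the forward pass so gradients can be computed\n", | |
| "    with train_section():\n", | |
| "        output = net(x)\n", | |
| "        loss = foo.loss.softmax_cross_entropy_loss(output, label)\n", | |
| "        loss.backward()\n", | |
| "    # update parameters; gradients are normalized by batch_size\n", | |
| "    trainer.step(batch_size=2)\n", | |
| "print('loss = ', loss.asnumpy())" | |
| ] | |
| } | |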
| ], | |
| "metadata": { | |
| "anaconda-cloud": {}, | |
| "kernelspec": { | |
| "display_name": "Python [default]", | |
| "language": "python", | |
| "name": "python2" | |
| }, | |
| "language_info": { | |
| "codemirror_mode": { | |
| "name": "ipython", | |
| "version": 2 | |
| }, | |
| "file_extension": ".py", | |
| "mimetype": "text/x-python", | |
| "name": "python", | |
| "nbconvert_exporter": "python", | |
| "pygments_lexer": "ipython2", | |
| "version": "2.7.12" | |
| } | |
| }, | |
| "nbformat": 4, | |
| "nbformat_minor": 2 | |
| } |