Last active: July 26, 2017 23:00
| { | |
| "cells": [ | |
| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": [ | |
| "# Imperative Machine Learning with MXNet and Foo\n", | |
| "\n", | |
| "This tutorial introduces how to do imperative tensor computation with MXNet and how to use Foo, a new user-friendly interface for MXNet that doesn't have a name yet :)\n", | |
| "\n", | |
| "You can download this tutorial here: https://gist.github.com/piiswrong/2716581ebeb3d6560c1ad916a793f90d\n", | |
| "\n", | |
| "The API reference for the Foo package can be found here: http://mxnet-doc.s3-website-us-east-1.amazonaws.com/api/python/foo.html\n", | |
| "\n", | |
| "## Setup\n", | |
| "\n", | |
| "You need to clone MXNet and check out the `nn` branch:\n", | |
| "```\n", | |
| "git clone https://github.com/dmlc/mxnet.git --recursive\n", | |
| "cd mxnet\n", | |
| "git checkout nn\n", | |
| "```\n", | |
| "Then follow the \"Build from Source\" section of the [installation guide](http://mxnet.io/get_started/install.html).\n", | |
| "\n", | |
| "Now you can import MXNet:" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": 1, | |
| "metadata": { | |
| "collapsed": true | |
| }, | |
| "outputs": [], | |
| "source": [ | |
| "from __future__ import print_function\n", | |
| "import mxnet as mx" | |
| ] | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": [ | |
| "## Basics\n", | |
| "\n", | |
| "MXNet's core data structure is the `NDArray`, a multi-dimensional array that supports a rich set of operators on both CPU and GPU:" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": 3, | |
| "metadata": { | |
| "collapsed": false | |
| }, | |
| "outputs": [ | |
| { | |
| "name": "stdout", | |
| "output_type": "stream", | |
| "text": [ | |
| "x: <NDArray 2x2 @cpu(0)>\n", | |
| "[[ 1. 2.]\n", | |
| " [ 3. 4.]]\n" | |
| ] | |
| } | |
| ], | |
| "source": [ | |
| "x = mx.nd.array([[1, 2], [3, 4]])\n", | |
| "y = mx.nd.array([[5, 6], [7, 8]])\n", | |
| "# gpu_x = mx.nd.array([1, 2], ctx=mx.gpu(0))\n", | |
| "print('x: ', x)\n", | |
| "print(x.asnumpy())" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": 7, | |
| "metadata": { | |
| "collapsed": false | |
| }, | |
| "outputs": [ | |
| { | |
| "name": "stdout", | |
| "output_type": "stream", | |
| "text": [ | |
| "[[ 19. 22.]\n", | |
| " [ 43. 50.]]\n" | |
| ] | |
| } | |
| ], | |
| "source": [ | |
| "z = mx.nd.dot(x, y)\n", | |
| "print(z.asnumpy())" | |
| ] | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": [ | |
| "See [here](http://mxnet.io/api/python/ndarray.html) for a full list of operators." | |
| ] | |
| }, | |
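| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": [ | |
| "For instance, here are a few common elementwise and shape operators (a small illustration reusing the `x` and `y` defined above):" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": null, | |
| "metadata": { | |
| "collapsed": false | |
| }, | |
| "outputs": [], | |
| "source": [ | |
| "# elementwise addition and multiplication\n", | |
| "print((x + y).asnumpy())\n", | |
| "print((x * y).asnumpy())\n", | |
| "# elementwise exponential\n", | |
| "print(mx.nd.exp(x).asnumpy())\n", | |
| "# matrix transpose\n", | |
| "print(mx.nd.transpose(x).asnumpy())" | |
| ] | |
| }, | |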
| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": [ | |
| "## Automatic differentiation\n", | |
| "\n", | |
| "Attach gradient buffers to NDArrays:" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": 8, | |
| "metadata": { | |
| "collapsed": true | |
| }, | |
| "outputs": [], | |
| "source": [ | |
| "from mxnet.autograd import mark_variables, train_section\n", | |
| "\n", | |
| "dx = mx.nd.zeros_like(x)\n", | |
| "dy = mx.nd.zeros_like(y)\n", | |
| "mark_variables(x, dx)\n", | |
| "mark_variables(y, dy)" | |
| ] | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": [ | |
| "Run the forward pass inside a `train_section` scope and call `backward` to populate the gradient buffers:" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": 9, | |
| "metadata": { | |
| "collapsed": false | |
| }, | |
| "outputs": [ | |
| { | |
| "name": "stdout", | |
| "output_type": "stream", | |
| "text": [ | |
| "dx = [[ 5. 6.]\n", | |
| " [ 7. 8.]]\n", | |
| "dy = [[ 1. 2.]\n", | |
| " [ 3. 4.]]\n" | |
| ] | |
| } | |
| ], | |
| "source": [ | |
| "with train_section():\n", | |
| " z = x * y\n", | |
| " z.backward()\n", | |
| "print('dx = ', dx.asnumpy())\n", | |
| "print('dy = ', dy.asnumpy())" | |
| ] | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": [ | |
| "Supplying a head gradient (the gradient with respect to `z`):" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": 10, | |
| "metadata": { | |
| "collapsed": false | |
| }, | |
| "outputs": [ | |
| { | |
| "name": "stdout", | |
| "output_type": "stream", | |
| "text": [ | |
| "dx = [[ 10. 12.]\n", | |
| " [ 14. 16.]]\n", | |
| "dy = [[ 2. 4.]\n", | |
| " [ 6. 8.]]\n" | |
| ] | |
| } | |
| ], | |
| "source": [ | |
| "dz = mx.nd.ones_like(z)*2\n", | |
| "with train_section():\n", | |
| " z = x * y\n", | |
| " z.backward(dz)\n", | |
| "print('dx = ', dx.asnumpy())\n", | |
| "print('dy = ', dy.asnumpy())" | |
| ] | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": [ | |
| "## Foo\n", | |
| "\n", | |
| "### Layers\n", | |
| "\n", | |
| "Foo provides basic neural network building blocks as `Layer`s. For example, `Dense(4, in_units=2)` is a fully connected layer that takes length-2 inputs and produces length-4 outputs:" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": 11, | |
| "metadata": { | |
| "collapsed": false | |
| }, | |
| "outputs": [], | |
| "source": [ | |
| "from mxnet import foo\n", | |
| "from mxnet.foo import nn\n", | |
| "dense = nn.Dense(4, activation='relu', in_units=2)" | |
| ] | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": [ | |
| "Before we can use it, we must initialize `dense`'s parameters:" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": 12, | |
| "metadata": { | |
| "collapsed": false | |
| }, | |
| "outputs": [ | |
| { | |
| "name": "stdout", | |
| "output_type": "stream", | |
| "text": [ | |
| "weight: <NDArray 4x2 @cpu(0)>\n", | |
| "[[ 0.04881352 0.09284461]\n", | |
| " [ 0.21518934 0.34426576]\n", | |
| " [ 0.10276335 0.35794562]\n", | |
| " [ 0.04488319 0.34725171]]\n", | |
| "bias: <NDArray 4 @cpu(0)>\n", | |
| "[ 0. 0. 0. 0.]\n" | |
| ] | |
| } | |
| ], | |
| "source": [ | |
| "dense.all_params().initialize(mx.init.Uniform(0.5), ctx=mx.cpu(0))\n", | |
| "print('weight: ', dense.weight.data())\n", | |
| "print(dense.weight.data().asnumpy())\n", | |
| "\n", | |
| "print('bias: ', dense.bias.data())\n", | |
| "print(dense.bias.data().asnumpy())" | |
| ] | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": [ | |
| "Now we are ready to do a *forward pass*:" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": 13, | |
| "metadata": { | |
| "collapsed": false | |
| }, | |
| "outputs": [ | |
| { | |
| "name": "stdout", | |
| "output_type": "stream", | |
| "text": [ | |
| "[[ 0.23450273 0.90372086 0.8186546 0.73938662]\n", | |
| " [ 0.51781899 2.02263117 1.74007249 1.52365637]]\n" | |
| ] | |
| } | |
| ], | |
| "source": [ | |
| "output = dense(x)\n", | |
| "print(output.asnumpy())" | |
| ] | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": [ | |
| "Here `x` is interpreted as an input batch with batch_size=2 and feature length 2.\n", | |
| "\n", | |
| "### Composing Layers\n", | |
| "You can compose multiple layers into a neural network by subclassing `nn.Layer`:" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": 15, | |
| "metadata": { | |
| "collapsed": true | |
| }, | |
| "outputs": [], | |
| "source": [ | |
| "class Net(nn.Layer):\n", | |
| " def __init__(self, **kwargs):\n", | |
| " super(Net, self).__init__(**kwargs)\n", | |
| " with self.scope:\n", | |
| " # layers assigned to self in scope will be registered as sub-layers\n", | |
| " self.fc1 = nn.Dense(4, in_units=2)\n", | |
| " self.fc2 = nn.Dense(3, in_units=4)\n", | |
| " \n", | |
| " def generic_forward(self, F, x):\n", | |
| " # when x is an NDArray, F will be set to mx.nd\n", | |
| " x = F.relu(self.fc1(x))\n", | |
| " x = self.fc2(x)\n", | |
| " return x" | |
| ] | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": [ | |
| "Now we can use it the same way we used `dense`:" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": 16, | |
| "metadata": { | |
| "collapsed": false | |
| }, | |
| "outputs": [ | |
| { | |
| "name": "stdout", | |
| "output_type": "stream", | |
| "text": [ | |
| "[[-0.72901833 -2.01382256 1.48214853]\n", | |
| " [-1.71972466 -4.79868174 3.42445707]]\n" | |
| ] | |
| } | |
| ], | |
| "source": [ | |
| "net = Net()\n", | |
| "net.all_params().initialize(ctx=mx.cpu(0))\n", | |
| "output = net(x)\n", | |
| "print(output.asnumpy())" | |
| ] | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": [ | |
| "## Trainer and Loss Functions\n", | |
| "\n", | |
| "To train the network, you need an optimizer. A `Trainer` applies optimizer updates to a set of parameters:" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": 17, | |
| "metadata": { | |
| "collapsed": true | |
| }, | |
| "outputs": [], | |
| "source": [ | |
| "trainer = foo.Trainer(net.all_params(), 'sgd', {'learning_rate': 0.1})" | |
| ] | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": [ | |
| "Then you can run the forward pass inside a `train_section` and compute gradients with `backward`:" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": 25, | |
| "metadata": { | |
| "collapsed": false | |
| }, | |
| "outputs": [ | |
| { | |
| "name": "stdout", | |
| "output_type": "stream", | |
| "text": [ | |
| "loss = [ 0.94735891 1.18011999]\n", | |
| "fc2.bias = [ 0.0547721 0.08332293 -0.13809504]\n", | |
| "d(loss)/d(fc2.bias) = [-0.17545167 -0.35362723 0.52907884]\n" | |
| ] | |
| } | |
| ], | |
| "source": [ | |
| "label = mx.nd.array([0, 1])\n", | |
| "with train_section():\n", | |
| " output = net(x)\n", | |
| " loss = foo.loss.softmax_cross_entropy_loss(output, label)\n", | |
| " loss.backward()\n", | |
| "print('loss = ', loss.asnumpy())\n", | |
| "print('fc2.bias = ', net.fc2.bias.data().asnumpy())\n", | |
| "print('d(loss)/d(fc2.bias) = ', net.fc2.bias.grad().asnumpy())" | |
| ] | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": [ | |
| "Now we can take a gradient step with `Trainer`:" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": 26, | |
| "metadata": { | |
| "collapsed": false | |
| }, | |
| "outputs": [ | |
| { | |
| "name": "stdout", | |
| "output_type": "stream", | |
| "text": [ | |
| "fc2.bias = [ 0.06354468 0.1010043 -0.16454898]\n" | |
| ] | |
| } | |
| ], | |
| "source": [ | |
| "trainer.step(batch_size=2)\n", | |
| "print('fc2.bias = ', net.fc2.bias.data().asnumpy())" | |
| ] | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": [ | |
| "Note that the weights have changed a little. You can repeat the last two cells in a loop to train your network." | |
| ] | |
| } | |
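| , | |
| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": [ | |
| "For example, the last two cells can be combined into a minimal training loop (a sketch reusing the `x`, `label`, `net`, and `trainer` defined above):" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": null, | |
| "metadata": { | |
| "collapsed": false | |
| }, | |
| "outputs": [], | |
| "source": [ | |
| "for epoch in range(10):\n", | |
| "    # record the forward pass so gradients can be computed\n", | |
| "    with train_section():\n", | |
| "        output = net(x)\n", | |
| "        loss = foo.loss.softmax_cross_entropy_loss(output, label)\n", | |
| "        loss.backward()\n", | |
| "    # update parameters; gradients are normalized by batch_size\n", | |
| "    trainer.step(batch_size=2)\n", | |
| "print('loss = ', loss.asnumpy())" | |
| ] | |
| } | |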
| ], | |
| "metadata": { | |
| "anaconda-cloud": {}, | |
| "kernelspec": { | |
| "display_name": "Python [default]", | |
| "language": "python", | |
| "name": "python2" | |
| }, | |
| "language_info": { | |
| "codemirror_mode": { | |
| "name": "ipython", | |
| "version": 2 | |
| }, | |
| "file_extension": ".py", | |
| "mimetype": "text/x-python", | |
| "name": "python", | |
| "nbconvert_exporter": "python", | |
| "pygments_lexer": "ipython2", | |
| "version": "2.7.12" | |
| } | |
| }, | |
| "nbformat": 4, | |
| "nbformat_minor": 2 | |
| } |