Last active
October 23, 2015 01:15
| { | |
| "cells": [ | |
| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": [ | |
| "# An introduction to dictionaries in Python" | |
| ] | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": [ | |
| "Dictionaries are a built-in Python container that maps each key to a value; the mapping is one-directional (key -> value). Dictionary keys are always unique. Under the hood a dictionary is a hash table, so looking up a value from its key is very fast. In the Python 3.5 used here dictionaries are unordered; since Python 3.7 they preserve insertion order. Let's look at a basic dictionary." | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": 1, | |
| "metadata": { | |
| "collapsed": true | |
| }, | |
| "outputs": [], | |
| "source": [ | |
| "d = {'Alex':'grad student', 'Anna':'P.I.', 'Bhoomi':'intern'} # Alex, Anna, Bhoomi are our keys" | |
| ] | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": [ | |
| "Now let's build a dictionary on the fly:" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": 2, | |
| "metadata": { | |
| "collapsed": false | |
| }, | |
| "outputs": [ | |
| { | |
| "name": "stdout", | |
| "output_type": "stream", | |
| "text": [ | |
| "Adding color blue to person Alex\n", | |
| "Adding color red to person Anna\n", | |
| "Adding color green to person Bhoomi\n", | |
| "Adding color purple to person Matt\n" | |
| ] | |
| } | |
| ], | |
| "source": [ | |
| "# Here we have two lists of things, people and colors.\n", | |
| "# Our goal is to build a dictionary that maps each person to their favorite color.\n", | |
| "people = ['Alex','Anna','Bhoomi','Matt']\n", | |
| "colors = ['blue','red','green','purple']\n", | |
| "d = dict() # establish variable d as an empty dictionary. We could also say \"d = {}\" for the same effect.\n", | |
| "for person, color in zip(people, colors):\n", | |
| "    print('Adding color {} to person {}'.format(color, person))\n", | |
| "    d[person] = color\n", | |
| " " | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": 3, | |
| "metadata": { | |
| "collapsed": false | |
| }, | |
| "outputs": [ | |
| { | |
| "data": { | |
| "text/plain": [ | |
| "'blue'" | |
| ] | |
| }, | |
| "execution_count": 3, | |
| "metadata": {}, | |
| "output_type": "execute_result" | |
| } | |
| ], | |
| "source": [ | |
| "# Now we can query our dictionary:\n", | |
| "d['Alex']" | |
| ] | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": [ | |
| "As you can see, by querying a dictionary with a key we obtain its value. But what if the dictionary doesn't have the key we query?" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": 4, | |
| "metadata": { | |
| "collapsed": false | |
| }, | |
| "outputs": [ | |
| { | |
| "ename": "KeyError", | |
| "evalue": "'Daniel'", | |
| "output_type": "error", | |
| "traceback": [ | |
| "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", | |
| "\u001b[0;31mKeyError\u001b[0m Traceback (most recent call last)", | |
| "\u001b[0;32m<ipython-input-4-b810aeb68963>\u001b[0m in \u001b[0;36m<module>\u001b[0;34m()\u001b[0m\n\u001b[0;32m----> 1\u001b[0;31m \u001b[0md\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0;34m'Daniel'\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m", | |
| "\u001b[0;31mKeyError\u001b[0m: 'Daniel'" | |
| ] | |
| } | |
| ], | |
| "source": [ | |
| "d['Daniel']" | |
| ] | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": [ | |
| "The error is, not surprisingly, a KeyError, letting us know that the key we tried to use is not in our dictionary. \n", | |
| "There is another way to query a dictionary: the .get() method, with our key as the first argument. This is useful when you are not sure whether a key is present in a dictionary." | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": 5, | |
| "metadata": { | |
| "collapsed": false | |
| }, | |
| "outputs": [ | |
| { | |
| "name": "stdout", | |
| "output_type": "stream", | |
| "text": [ | |
| "green\n", | |
| "None\n" | |
| ] | |
| } | |
| ], | |
| "source": [ | |
| "print(d.get('Bhoomi'))\n", | |
| "print(d.get('Daniel'))" | |
| ] | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": [ | |
| "Instead of raising an error, .get() returns None by default if the key is not present in our dictionary. We can change that default by passing a second argument to .get():" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": 6, | |
| "metadata": { | |
| "collapsed": false | |
| }, | |
| "outputs": [ | |
| { | |
| "name": "stdout", | |
| "output_type": "stream", | |
| "text": [ | |
| "red\n", | |
| "not present\n" | |
| ] | |
| } | |
| ], | |
| "source": [ | |
| "print(d.get('Anna','not present'))\n", | |
| "print(d.get('Daniel','not present'))" | |
| ] | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": [ | |
| "There are also a few ways to iterate through a dictionary and to test whether a key is present:" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": 7, | |
| "metadata": { | |
| "collapsed": false | |
| }, | |
| "outputs": [ | |
| { | |
| "name": "stdout", | |
| "output_type": "stream", | |
| "text": [ | |
| "dict_keys(['Bhoomi', 'Matt', 'Alex', 'Anna'])\n", | |
| "\n", | |
| "Bhoomi\n", | |
| "Matt\n", | |
| "Alex\n", | |
| "Anna\n", | |
| "\n", | |
| "Bhoomi green\n", | |
| "Matt purple\n", | |
| "Alex blue\n", | |
| "Anna red\n", | |
| "\n", | |
| "True\n", | |
| "False\n" | |
| ] | |
| } | |
| ], | |
| "source": [ | |
| "print(d.keys()) # gives you all of the keys in our dictionary\n", | |
| "\n", | |
| "print()\n", | |
| "\n", | |
| "for key in d: # iterating over a dictionary directly\n", | |
| "              # iterates over its keys\n", | |
| "    print(key)\n", | |
| "\n", | |
| "print()\n", | |
| "\n", | |
| "for key, value in d.items(): # d.items() returns a view of\n", | |
| "                             # (key, value) tuples\n", | |
| "    print(key, value)\n", | |
| "print()\n", | |
| "\n", | |
| "# query if something is present in the dictionary keys\n", | |
| "print('Alex' in d)\n", | |
| "print('Daniel' in d)\n" | |
| ] | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": [ | |
| "Notice here that we used strings for both keys and values in our dictionary. In fact, any hashable object (such as an integer, a float, or a tuple) can be a key, and a value can be any Python object. Let's look at how to build variable-length lists on the fly as the values for each dictionary key." | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": 8, | |
| "metadata": { | |
| "collapsed": false | |
| }, | |
| "outputs": [ | |
| { | |
| "name": "stdout", | |
| "output_type": "stream", | |
| "text": [ | |
| "Bhoomi likes tofu, eggplant, broccoli.\n", | |
| "Matt likes steak, cupcakes, sausage.\n", | |
| "Alex likes eggs, pasta.\n", | |
| "Anna likes potatoes, salad.\n" | |
| ] | |
| } | |
| ], | |
| "source": [ | |
| "# Here we have our list of people like before, \n", | |
| "# but now also have the foods each person likes.\n", | |
| "# Our goal is to build a dictionary that maps each person \n", | |
| "# to a list of their favorite foods.\n", | |
| "people = ['Alex','Anna','Bhoomi','Matt']\n", | |
| "favfood = [['eggs','pasta'],['potatoes','salad'],['tofu','eggplant','broccoli'],['steak','cupcakes','sausage']]\n", | |
| "d = dict() # establish our empty dictionary as before\n", | |
| "for foods, person in zip(favfood, people):\n", | |
| "    for food in foods: # iterate through each list of food items.\n", | |
| "        try:\n", | |
| "            d[person].append(food)\n", | |
| "        except KeyError:\n", | |
| "            d[person] = []\n", | |
| "            d[person].append(food)\n", | |
| "\n", | |
| "for person in d:\n", | |
| "    print('{} likes {}.'.format(person,', '.join(d[person])))" | |
| ] | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": [ | |
| "Now I know this is trivial since we could just assign each person to the list of foods instead of iterating through each foods list, but just go with it for the sake of example. Let's go through this step by step. \n", | |
| "\n", | |
| "First we try to append food to d[person]. This will only work if person is in our dictionary and the value d[person] is a list that we can append to. \n", | |
| "\n", | |
| "If person is not in our dictionary d we will get a KeyError, which we catch with the except clause. When this happens, person is not in our dictionary yet, so we first add the person to the dictionary with an empty list (the brackets []) as the value. Then we go ahead and append the first food to that person's list.\n", | |
| "\n", | |
| "As always in programming, there are often multiple ways to do things. Here is another way to do the exact same thing:\n", | |
| "\n" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": 9, | |
| "metadata": { | |
| "collapsed": false | |
| }, | |
| "outputs": [ | |
| { | |
| "name": "stdout", | |
| "output_type": "stream", | |
| "text": [ | |
| "Bhoomi likes tofu, eggplant, broccoli.\n", | |
| "Matt likes steak, cupcakes, sausage.\n", | |
| "Alex likes eggs, pasta.\n", | |
| "Anna likes potatoes, salad.\n" | |
| ] | |
| } | |
| ], | |
| "source": [ | |
| "d = dict()\n", | |
| "for foods, person in zip(favfood, people):\n", | |
| "    for food in foods:\n", | |
| "        if person in d: # check if the person has been\n", | |
| "                        # added to the dictionary yet\n", | |
| "            d[person].append(food)\n", | |
| "        else: # if the person has not been added, add them\n", | |
| "            d[person] = []\n", | |
| "            d[person].append(food)\n", | |
| "\n", | |
| "for person in d:\n", | |
| "    print('{} likes {}.'.format(person,', '.join(d[person])))" | |
| ] | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": [ | |
| "This pattern of collecting multiple values under one key is so common that the standard library provides a dictionary subclass that handles it automatically: defaultdict, in the collections module. Let's see how it works." | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": 10, | |
| "metadata": { | |
| "collapsed": false | |
| }, | |
| "outputs": [ | |
| { | |
| "name": "stdout", | |
| "output_type": "stream", | |
| "text": [ | |
| "Bhoomi likes tofu, eggplant, broccoli.\n", | |
| "Matt likes steak, cupcakes, sausage.\n", | |
| "Alex likes eggs, pasta.\n", | |
| "Anna likes potatoes, salad.\n" | |
| ] | |
| } | |
| ], | |
| "source": [ | |
| "from collections import defaultdict\n", | |
| "d = defaultdict(list)\n", | |
| "for foods, person in zip(favfood, people):\n", | |
| "    for food in foods:\n", | |
| "        d[person].append(food)\n", | |
| "\n", | |
| "for person in d:\n", | |
| "    print('{} likes {}.'.format(person,', '.join(d[person])))" | |
| ] | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": [ | |
| "Our defaultdict checked for us whether each person was already in the dictionary and, if not, constructed a new empty list to append to. All we had to do was pass in a callable that builds the default value (in this case list, but other types such as int, str, set, and even dict work too)." | |
| ] | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": [ | |
| "Bonus: defaultdicts of defaultdicts. \n", | |
| "This is a way to make nested dictionaries using the defaultdict() constructor. Things can get pretty difficult to keep track of pretty quickly, but if you're interested read on. What is nice about this is you can build nested key->value relationships on the fly and perform lookups very quickly." | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": 11, | |
| "metadata": { | |
| "collapsed": false | |
| }, | |
| "outputs": [ | |
| { | |
| "name": "stdout", | |
| "output_type": "stream", | |
| "text": [ | |
| "Bhoomi likes broccoli and that is 60 calories!.\n", | |
| "Bhoomi likes tofu and that is 85 calories!.\n", | |
| "Bhoomi likes eggplant and that is 90 calories!.\n", | |
| "\n", | |
| "Matt likes cupcakes and that is 300 calories!.\n", | |
| "Matt likes steak and that is 450 calories!.\n", | |
| "Matt likes sausage and that is 175 calories!.\n", | |
| "\n", | |
| "Alex likes eggs and that is 200 calories!.\n", | |
| "Alex likes pasta and that is 250 calories!.\n", | |
| "\n", | |
| "Anna likes potatoes and that is 240 calories!.\n", | |
| "Anna likes salad and that is 55 calories!.\n", | |
| "\n" | |
| ] | |
| } | |
| ], | |
| "source": [ | |
| "# Here we will make a dictionary of people\n", | |
| "# The values for each person will be a dictionary of foods.\n", | |
| "# The value for each food will be an integer of food calories.\n", | |
| "from random import choice\n", | |
| "people = ['Alex','Anna','Bhoomi','Matt']\n", | |
| "favfood = [['eggs','pasta'],['potatoes','salad'],\n", | |
| "           ['tofu','eggplant','broccoli'],\n", | |
| "           ['steak','cupcakes','sausage']]\n", | |
| "calories = {'eggs':100, 'pasta':125, 'potatoes':80, 'salad':55,\n", | |
| "            'tofu':85, 'eggplant':45, 'broccoli':30,\n", | |
| "            'steak':225, 'cupcakes':300, 'sausage':175}\n", | |
| "d = defaultdict(lambda : defaultdict(int)) # here we are making a dictionary\n", | |
| "                                           # of integer defaultdicts.\n", | |
| "                                           # Every time a new key is added\n", | |
| "                                           # to our dictionary, a new\n", | |
| "                                           # defaultdict(int) is constructed.\n", | |
| "\n", | |
| "def hungry(): return choice([True, False]) # function for randomly\n", | |
| "                                           # returning True or False\n", | |
| "for foods, person in zip(favfood, people):\n", | |
| "    for food in foods:\n", | |
| "        d[person][food] += calories[food] # Add food calories\n", | |
| "        while hungry(): # the person is still hungry if True,\n", | |
| "                        # so add more calories!\n", | |
| "            d[person][food] += calories[food]\n", | |
| "\n", | |
| "for person in d:\n", | |
| "    for food in d[person]:\n", | |
| "        print('{} likes {} and that is {} calories!.'.format(person, food, d[person][food]))\n", | |
| "    print()" | |
| ] | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": [ | |
| "Notice here how we chained our dictionary keys:\n", | |
| "\n", | |
| "    d[person][food] += calories[food]\n", | |
| "\n", | |
| "The first time a person is seen, this adds the person key to the outer dictionary d and constructs a defaultdict(int) for it; the food then becomes a key in that inner dictionary, starting from 0, and the calories are added to its value. Also notice how to look up the calorie count for one food:\n", | |
| "\n", | |
| "    d[person][food]\n", | |
| "\n", | |
| "Here we apply the person key to the outer dictionary and then the food key to the inner dictionary to get the calorie count. Example:" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": 12, | |
| "metadata": { | |
| "collapsed": false | |
| }, | |
| "outputs": [ | |
| { | |
| "data": { | |
| "text/plain": [ | |
| "450" | |
| ] | |
| }, | |
| "execution_count": 12, | |
| "metadata": {}, | |
| "output_type": "execute_result" | |
| } | |
| ], | |
| "source": [ | |
| "d['Matt']['steak']" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": null, | |
| "metadata": { | |
| "collapsed": true | |
| }, | |
| "outputs": [], | |
| "source": [] | |
| } | |
| ], | |
| "metadata": { | |
| "kernelspec": { | |
| "display_name": "Python 3", | |
| "language": "python", | |
| "name": "python3" | |
| }, | |
| "language_info": { | |
| "codemirror_mode": { | |
| "name": "ipython", | |
| "version": 3 | |
| }, | |
| "file_extension": ".py", | |
| "mimetype": "text/x-python", | |
| "name": "python", | |
| "nbconvert_exporter": "python", | |
| "pygments_lexer": "ipython3", | |
| "version": "3.5.0" | |
| } | |
| }, | |
| "nbformat": 4, | |
| "nbformat_minor": 0 | |
| } |
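
The notebook above groups foods per person in three ways (try/except, a membership test, and defaultdict). A fourth option on a plain dictionary is the built-in `dict.setdefault()` method. The sketch below is only an illustration: the names `by_person` and `grouped` are made up for this example and do not appear in the notebook. It also shows one caveat worth knowing: indexing a defaultdict with a missing key inserts that key, whereas `.get()` does not.

```python
from collections import defaultdict

people = ['Alex', 'Anna', 'Bhoomi', 'Matt']
favfood = [['eggs', 'pasta'], ['potatoes', 'salad'],
           ['tofu', 'eggplant', 'broccoli'],
           ['steak', 'cupcakes', 'sausage']]

# Plain dict: setdefault() returns the list already stored under the key,
# or stores the given empty list first if the key is new.
by_person = {}  # hypothetical name, for illustration only
for person, foods in zip(people, favfood):
    for food in foods:
        by_person.setdefault(person, []).append(food)

# defaultdict: the same grouping, with the empty list created on demand.
grouped = defaultdict(list)  # hypothetical name, for illustration only
for person, foods in zip(people, favfood):
    for food in foods:
        grouped[person].append(food)

# A defaultdict compares equal to a plain dict holding the same items.
print(by_person == grouped)

# Caveat: merely indexing a defaultdict with a missing key inserts it.
print('Daniel' in grouped)    # not present yet
grouped['Daniel']             # this lookup creates an empty list
print('Daniel' in grouped)    # now present

# .get() never inserts, on a dict or a defaultdict.
print(grouped.get('Nobody'))
print('Nobody' in grouped)
```

If the silent insertion is unwanted, use `.get()` or an `in` test for read-only lookups, or convert the finished result with `dict(grouped)`.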