Last active
August 29, 2015 14:04
Finding shape consensus among multiple geo polygons. See: http://nbviewer.ipython.org/gist/mgiraldo/a68b53175ce5892531bc
| { | |
| "metadata": { | |
| "language": "ruby", | |
| "name": "", | |
| "signature": "sha256:8d8903b40719dd38d9ab7362c4532e99994a2c72f48e9ffc0559d21728f87a34" | |
| }, | |
| "nbformat": 3, | |
| "nbformat_minor": 0, | |
| "worksheets": [ | |
| { | |
| "cells": [ | |
| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": [ | |
| "# Finding shape consensus among multiple geo polygons\n", | |
| "\n", | |
| "One of the tasks in the [Building Inspector](http://buildinginspector.nypl.org/) is [fixing building footprints](http://buildinginspector.nypl.org/fix). The user is presented with a map and an overlaid shape (red dots). The goal is to draw the correct shape (or shapes, since the red overlay may cover multiple building footprints).\n", | |
| "\n", | |
| "Multiple people receive the same map and overlay. This notebook describes a process to find the resulting consensus (or mean) shape.\n", | |
| "\n", | |
| "Below is an example showing the map, the original polygon shown to each user (red dots) and the resulting polygons drawn (yellow)." | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "collapsed": false, | |
| "input": [ | |
| "IRuby.html '<iframe src=\"http://jsfiddle.net/mgiraldo/pdkCb/3/embedded/result/\" width=500 height=400></iframe>'" | |
| ], | |
| "language": "python", | |
| "metadata": {}, | |
| "outputs": [ | |
| { | |
| "html": [ | |
| "<iframe src=\"http://jsfiddle.net/mgiraldo/pdkCb/3/embedded/result/\" width=500 height=400></iframe>" | |
| ], | |
| "metadata": {}, | |
| "output_type": "pyout", | |
| "prompt_number": 1, | |
| "text": [ | |
| "\"<iframe src=\\\"http://jsfiddle.net/mgiraldo/pdkCb/3/embedded/result/\\\" width=500 height=400></iframe>\"" | |
| ] | |
| } | |
| ], | |
| "prompt_number": 1 | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": [ | |
| "It is hard to see, but there are 11 yellow polygons: one rectangle covering the lower left part, one covering the upper right part (both wrong), and 9 covering the complete L-shaped building.\n", | |
| "\n", | |
| "# Requirements\n", | |
| "\n", | |
| "The process to find the geometry that best summarizes what is drawn by users has to take into account:\n", | |
| "\n", | |
| "1. an overlay may span _multiple_ polygons (red dots covering more than one building)\n", | |
| "1. polygons may have any number of vertices greater than or equal to three\n", | |
| "1. users will not always draw the polygons the same way (e.g. using more or fewer points)\n", | |
| "\n", | |
| "The process described in this notebook uses the [DBSCAN clustering algorithm](https://en.wikipedia.org/wiki/DBSCAN) to find an unknown number of dense regions of points and derive the resulting geometries from them. The _input_ to this process is a GeoJSON FeatureCollection containing all the polygons drawn by contributors for a given red overlay. The expected _output_ is a list of geo point arrays, one per summary shape determined by the algorithm.\n", | |
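| "\n", | |
| "To make the clustering step concrete, here is a minimal pure-Ruby sketch of the DBSCAN idea. It is _not_ the `dbscan` gem loaded later in this notebook: the method name `dbscan_sketch` and the parameters `eps` and `min_points` are assumptions made for illustration only.\n", | |
| "\n", | |
| "```ruby\n", | |
| "# Minimal DBSCAN sketch (illustration only, not the dbscan gem's API).\n", | |
| "# points: array of [x, y]; eps: neighborhood radius; min_points: density threshold.\n", | |
| "# Returns a hash of cluster id => array of point indices; noise gets id -1.\n", | |
| "def dbscan_sketch(points, eps, min_points)\n", | |
| "  labels = Array.new(points.size) # nil means not visited yet\n", | |
| "  neighbors = lambda do |i|\n", | |
| "    (0...points.size).select do |j|\n", | |
| "      Math.hypot(points[i][0] - points[j][0], points[i][1] - points[j][1]) <= eps\n", | |
| "    end\n", | |
| "  end\n", | |
| "  cluster = -1\n", | |
| "  points.each_index do |i|\n", | |
| "    next unless labels[i].nil?\n", | |
| "    seeds = neighbors.call(i)\n", | |
| "    if seeds.size < min_points\n", | |
| "      labels[i] = -1 # provisionally noise\n", | |
| "      next\n", | |
| "    end\n", | |
| "    cluster += 1\n", | |
| "    labels[i] = cluster\n", | |
| "    queue = seeds - [i]\n", | |
| "    until queue.empty?\n", | |
| "      j = queue.shift\n", | |
| "      labels[j] = cluster if labels[j] == -1 # noise becomes a border point\n", | |
| "      next unless labels[j].nil?\n", | |
| "      labels[j] = cluster\n", | |
| "      more = neighbors.call(j)\n", | |
| "      queue.concat(more) if more.size >= min_points # only core points expand the cluster\n", | |
| "    end\n", | |
| "  end\n", | |
| "  labels.each_index.group_by { |i| labels[i] }\n", | |
| "end\n", | |
| "```\n", | |
| "\n", | |
| "With `eps = 1.5` and `min_points = 3`, two tight groups of three points each would come back as clusters `0` and `1`, and an isolated point would be labelled `-1` (noise).\n", | |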
| "\n", | |
| "**All the necessary code is included** and should be executable by any machine that has the required Ruby gems installed. _This code was tested on Ruby 2.1.0._\n", | |
| "\n", | |
| "# Process\n", | |
| "\n", | |
| "First, we need the [RGeo](https://github.com/rgeo/rgeo) package along with its [GeoJSON component](https://github.com/rgeo/rgeo-geojson):" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "collapsed": true, | |
| "input": [ | |
| "require 'rgeo'\n", | |
| "require 'rgeo-geojson'" | |
| ], | |
| "language": "python", | |
| "metadata": {}, | |
| "outputs": [ | |
| { | |
| "metadata": {}, | |
| "output_type": "pyout", | |
| "prompt_number": 2, | |
| "text": [ | |
| "true" | |
| ] | |
| } | |
| ], | |
| "prompt_number": 2 | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": [ | |
| "We will use a [Ruby implementation](https://github.com/matiasinsaurralde/dbscan) of the [DBSCAN clustering algorithm](https://en.wikipedia.org/wiki/DBSCAN)." | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "collapsed": false, | |
| "input": [ | |
| "require 'dbscan'" | |
| ], | |
| "language": "python", | |
| "metadata": {}, | |
| "outputs": [ | |
| { | |
| "metadata": {}, | |
| "output_type": "pyout", | |
| "prompt_number": 3, | |
| "text": [ | |
| "true" | |
| ] | |
| } | |
| ], | |
| "prompt_number": 3 | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": [ | |
| "For visualization convenience in this notebook we will also use the awesome [Nyaplot](https://github.com/domitry/nyaplot), a D3-powered visualization library. I had to manually build it according to [the instructions](https://github.com/domitry/nyaplot#installation) since it is not yet in RubyGems.org." | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "collapsed": false, | |
| "input": [ | |
| "require 'nyaplot'" | |
| ], | |
| "language": "python", | |
| "metadata": {}, | |
| "outputs": [ | |
| { | |
| "metadata": {}, | |
| "output_type": "pyout", | |
| "prompt_number": 4, | |
| "text": [ | |
| "true" | |
| ] | |
| } | |
| ], | |
| "prompt_number": 4 | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": [ | |
| "Initialize Nyaplot to work in this IRuby Notebook:" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "collapsed": false, | |
| "input": [ | |
| "Nyaplot.init_iruby" | |
| ], | |
| "language": "python", | |
| "metadata": {}, | |
| "outputs": [ | |
| { | |
| "html": [ | |
| "<script>\n", | |
| "if(window['d3'] === undefined ||\n", | |
| " window['Nyaplot'] === undefined){\n", | |
| " var path = {\"d3\":\"http://d3js.org/d3.v3.min\"};\n", | |
| "\n", | |
| "\n", | |
| "\n", | |
| " var shim = {\"d3\":{\"exports\":\"d3\"}};\n", | |
| "\n", | |
| " require.config({paths: path, shim:shim});\n", | |
| "\n", | |
| "\n", | |
| "require(['d3'], function(d3){window['d3']=d3;console.log('finished loading d3');\n", | |
| "\n", | |
| "\tvar script = d3.select(\"head\")\n", | |
| "\t .append(\"script\")\n", | |
| "\t .attr(\"src\", \"https://rawgit.com/domitry/Nyaplotjs/master/release/nyaplot.js\")\n", | |
| "\t .attr(\"async\", true);\n", | |
| "\n", | |
| "\tscript[0][0].onload = script[0][0].onreadystatechange = function(){\n", | |
| "\t var event = document.createEvent(\"HTMLEvents\");\n", | |
| "\t event.initEvent(\"load_nyaplot\",false,false);\n", | |
| "\t window.dispatchEvent(event);\n", | |
| "\t console.log('Finished loading Nyaplotjs');\n", | |
| "\t};\n", | |
| "\n", | |
| "\n", | |
| "});\n", | |
| "}\n", | |
| "</script>\n" | |
| ], | |
| "metadata": {}, | |
| "output_type": "pyout", | |
| "prompt_number": 5, | |
| "text": [ | |
| "\"<script>\\nif(window['d3'] === undefined ||\\n window['Nyaplot'] === undefined){\\n var path = {\\\"d3\\\":\\\"http://d3js.org/d3.v3.min\\\"};\\n\\n\\n\\n var shim = {\\\"d3\\\":{\\\"exports\\\":\\\"d3\\\"}};\\n\\n require.config({paths: path, shim:shim});\\n\\n\\nrequire(['d3'], function(d3){window['d3']=d3;console.log('finished loading d3');\\n\\n\\tvar script = d3.select(\\\"head\\\")\\n\\t .append(\\\"script\\\")\\n\\t .attr(\\\"src\\\", \\\"https://rawgit.com/domitry/Nyaplotjs/master/release/nyaplot.js\\\")\\n\\t .attr(\\\"async\\\", true);\\n\\n\\tscript[0][0].onload = script[0][0].onreadystatechange = function(){\\n\\t var event = document.createEvent(\\\"HTMLEvents\\\");\\n\\t event.initEvent(\\\"load_nyaplot\\\",false,false);\\n\\t window.dispatchEvent(event);\\n\\t console.log('Finished loading Nyaplotjs');\\n\\t};\\n\\n\\n});\\n}\\n</script>\\n\"" | |
| ] | |
| } | |
| ], | |
| "prompt_number": 5 | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": [ | |
| "This is the GeoJSON that describes the shapes that have been drawn by the different contributors:\n", | |
| "\n", | |
| "_Note: this GeoJSON will not validate in [GeoJSONLint](http://geojsonlint.com/) because first and last points do not match_" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "collapsed": false, | |
| "input": [ | |
| "geomstr = '{\"type\":\"FeatureCollection\",\"features\":[{\"type\":\"Feature\",\"properties\":{\"user_id\":638},\"geometry\":{\"type\":\"Polygon\",\"coordinates\":[[[-73.98620970547199,40.7356342514617],[-73.98627072572708,40.735547874977094],[-73.98632504045963,40.73557226364293],[-73.98622445762157,40.73570995781772],[-73.9861835539341,40.73569268254945],[-73.98621775209902,40.735640856717666]]]}},{\"type\":\"Feature\",\"properties\":{\"user_id\":666},\"geometry\":{\"type\":\"Polygon\",\"coordinates\":[[[-73.98620769381522,40.73563526765495],[-73.9862660318613,40.735547874977094],[-73.98632504045963,40.735570739351566],[-73.98622579872608,40.73570944972167],[-73.98618154227734,40.73569217445325],[-73.98621775209902,40.73563933242788]]]}},{\"type\":\"Feature\",\"properties\":{\"session_id\":\"79e7ee062a9e0333926e3e1fdc3e92db\"},\"geometry\":{\"type\":\"Polygon\",\"coordinates\":[[[-73.98632369935513,40.735570739351566],[-73.98622512817383,40.73570944972167],[-73.98618154227734,40.73569014206842],[-73.98621909320354,40.735640856717666],[-73.98620970547199,40.73563526765495],[-73.98627005517483,40.73554889117169]]]}},{\"type\":\"Feature\",\"properties\":{\"session_id\":\"3d3003b26bb6b2f3b9577924b9ed5e0e\"},\"geometry\":{\"type\":\"Polygon\",\"coordinates\":[[[-73.98621842265129,40.7356423810074],[-73.98620903491974,40.73563577575159],[-73.98627139627934,40.735547874977094],[-73.98632436990738,40.735571755545806],[-73.98622579872608,40.73570995781772],[-73.98618087172508,40.735689633972214]]]}},{\"type\":\"Feature\",\"properties\":{\"user_id\":596},\"geometry\":{\"type\":\"Polygon\",\"coordinates\":[[[-73.98626938462257,40.73554889117167],[-73.98632369935513,40.735572771740024],[-73.98622445762157,40.73570894162559],[-73.98618154227734,40.73569065016463],[-73.98621775209902,40.735640856717666],[-73.98620836436749,40.735634251461676]]]}},{\"type\":\"Feature\",\"properties\":{\"session_id\":\"0afaf74383ce51aceba02fc49ce5a9e3\"},\"geometry\":{\"type\":\"Polygon\",\"co
ordinates\":[[[-73.98621775209902,40.73563984052446],[-73.98620836436749,40.73563272717173],[-73.98626938462257,40.735550415463514],[-73.98632235825062,40.73557124744871],[-73.98622360456956,40.73570641325812],[-73.98618768252459,40.73568957578454]]]}},{\"type\":\"Feature\",\"properties\":{\"user_id\":538},\"geometry\":{\"type\":\"Polygon\",\"coordinates\":[[[-73.98632571101189,40.735571755545806],[-73.98622378706932,40.73570995781772],[-73.98618288338184,40.73569268254945],[-73.98621775209902,40.73564034862108],[-73.9862110465765,40.7356362838482],[-73.98627005517483,40.735550923560815]]]}},{\"type\":\"Feature\",\"properties\":{\"user_id\":580},\"geometry\":{\"type\":\"Polygon\",\"coordinates\":[[[-73.98632436990738,40.73557124744871],[-73.98626066744328,40.7356581319994],[-73.98625999689102,40.7356581319994],[-73.98620903491974,40.735634759558316],[-73.98626804351805,40.735547874977094]]]}},{\"type\":\"Feature\",\"properties\":{\"user_id\":580},\"geometry\":{\"type\":\"Polygon\",\"coordinates\":[[[-73.98626133799553,40.7356581319994],[-73.98622579872608,40.73570944972167],[-73.98618154227734,40.73569166635704],[-73.98621842265129,40.73563984052446]]]}},{\"type\":\"Feature\",\"properties\":{\"user_id\":548},\"geometry\":{\"type\":\"Polygon\",\"coordinates\":[[[-73.98620970547199,40.73563475955834],[-73.98627005517483,40.73554990736624],[-73.98632369935513,40.735571755545806],[-73.98622360456956,40.73570641325812],[-73.9861848950386,40.735689633972214],[-73.98621842265129,40.735640856717666]]]}},{\"type\":\"Feature\",\"properties\":{\"session_id\":\"53056025663f6d6564a39975971cb87c\"},\"geometry\":{\"type\":\"Polygon\",\"coordinates\":[[[-73.98621909320354,40.735638316234656],[-73.98620836436749,40.7356362838482],[-73.98620769381522,40.73563577575159],[-73.98627005517483,40.73554939926897],[-73.98632302880287,40.73557023125444],[-73.98622360456956,40.73570641325812],[-73.98617953062057,40.735689633972214]]]}}]}'" | |
| ], | |
| "language": "python", | |
| "metadata": {}, | |
| "outputs": [ | |
| { | |
| "metadata": {}, | |
| "output_type": "pyout", | |
| "prompt_number": 6, | |
| "text": [ | |
| "\"{\\\"type\\\":\\\"FeatureCollection\\\",\\\"features\\\":[{\\\"type\\\":\\\"Feature\\\",\\\"properties\\\":{\\\"user_id\\\":638},\\\"geometry\\\":{\\\"type\\\":\\\"Polygon\\\",\\\"coordinates\\\":[[[-73.98620970547199,40.7356342514617],[-73.98627072572708,40.735547874977094],[-73.98632504045963,40.73557226364293],[-73.98622445762157,40.73570995781772],[-73.9861835539341,40.73569268254945],[-73.98621775209902,40.735640856717666]]]}},{\\\"type\\\":\\\"Feature\\\",\\\"properties\\\":{\\\"user_id\\\":666},\\\"geometry\\\":{\\\"type\\\":\\\"Polygon\\\",\\\"coordinates\\\":[[[-73.98620769381522,40.73563526765495],[-73.9862660318613,40.735547874977094],[-73.98632504045963,40.735570739351566],[-73.98622579872608,40.73570944972167],[-73.98618154227734,40.73569217445325],[-73.98621775209902,40.73563933242788]]]}},{\\\"type\\\":\\\"Feature\\\",\\\"properties\\\":{\\\"session_id\\\":\\\"79e7ee062a9e0333926e3e1fdc3e92db\\\"},\\\"geometry\\\":{\\\"type\\\":\\\"Polygon\\\",\\\"coordinates\\\":[[[-73.98632369935513,40.735570739351566],[-73.98622512817383,40.73570944972167],[-73.98618154227734,40.73569014206842],[-73.98621909320354,40.735640856717666],[-73.98620970547199,40.73563526765495],[-73.98627005517483,40.73554889117169]]]}},{\\\"type\\\":\\\"Feature\\\",\\\"properties\\\":{\\\"session_id\\\":\\\"3d3003b26bb6b2f3b9577924b9ed5e0e\\\"},\\\"geometry\\\":{\\\"type\\\":\\\"Polygon\\\",\\\"coordinates\\\":[[[-73.98621842265129,40.7356423810074],[-73.98620903491974,40.73563577575159],[-73.98627139627934,40.735547874977094],[-73.98632436990738,40.735571755545806],[-73.98622579872608,40.73570995781772],[-73.98618087172508,40.735689633972214]]]}},{\\\"type\\\":\\\"Feature\\\",\\\"properties\\\":{\\\"user_id\\\":596},\\\"geometry\\\":{\\\"type\\\":\\\"Polygon\\\",\\\"coordinates\\\":[[[-73.98626938462257,40.73554889117167],[-73.98632369935513,40.735572771740024],[-73.98622445762157,40.73570894162559],[-73.98618154227734,40.73569065016463],[-73.98621775209902,40.735640856717666],[-7
3.98620836436749,40.735634251461676]]]}},{\\\"type\\\":\\\"Feature\\\",\\\"properties\\\":{\\\"session_id\\\":\\\"0afaf74383ce51aceba02fc49ce5a9e3\\\"},\\\"geometry\\\":{\\\"type\\\":\\\"Polygon\\\",\\\"coordinates\\\":[[[-73.98621775209902,40.73563984052446],[-73.98620836436749,40.73563272717173],[-73.98626938462257,40.735550415463514],[-73.98632235825062,40.73557124744871],[-73.98622360456956,40.73570641325812],[-73.98618768252459,40.73568957578454]]]}},{\\\"type\\\":\\\"Feature\\\",\\\"properties\\\":{\\\"user_id\\\":538},\\\"geometry\\\":{\\\"type\\\":\\\"Polygon\\\",\\\"coordinates\\\":[[[-73.98632571101189,40.735571755545806],[-73.98622378706932,40.73570995781772],[-73.98618288338184,40.73569268254945],[-73.98621775209902,40.73564034862108],[-73.9862110465765,40.7356362838482],[-73.98627005517483,40.735550923560815]]]}},{\\\"type\\\":\\\"Feature\\\",\\\"properties\\\":{\\\"user_id\\\":580},\\\"geometry\\\":{\\\"type\\\":\\\"Polygon\\\",\\\"coordinates\\\":[[[-73.98632436990738,40.73557124744871],[-73.98626066744328,40.7356581319994],[-73.98625999689102,40.7356581319994],[-73.98620903491974,40.735634759558316],[-73.98626804351805,40.735547874977094]]]}},{\\\"type\\\":\\\"Feature\\\",\\\"properties\\\":{\\\"user_id\\\":580},\\\"geometry\\\":{\\\"type\\\":\\\"Polygon\\\",\\\"coordinates\\\":[[[-73.98626133799553,40.7356581319994],[-73.98622579872608,40.73570944972167],[-73.98618154227734,40.73569166635704],[-73.98621842265129,40.73563984052446]]]}},{\\\"type\\\":\\\"Feature\\\",\\\"properties\\\":{\\\"user_id\\\":548},\\\"geometry\\\":{\\\"type\\\":\\\"Polygon\\\",\\\"coordinates\\\":[[[-73.98620970547199,40.73563475955834],[-73.98627005517483,40.73554990736624],[-73.98632369935513,40.735571755545806],[-73.98622360456956,40.73570641325812],[-73.9861848950386,40.735689633972214],[-73.98621842265129,40.735640856717666]]]}},{\\\"type\\\":\\\"Feature\\\",\\\"properties\\\":{\\\"session_id\\\":\\\"53056025663f6d6564a39975971cb87c\\\"},\\\"geometry\\\":{\\\"type\\\":\\
\"Polygon\\\",\\\"coordinates\\\":[[[-73.98621909320354,40.735638316234656],[-73.98620836436749,40.7356362838482],[-73.98620769381522,40.73563577575159],[-73.98627005517483,40.73554939926897],[-73.98632302880287,40.73557023125444],[-73.98622360456956,40.73570641325812],[-73.98617953062057,40.735689633972214]]]}}]}\"" | |
| ] | |
| } | |
| ], | |
| "prompt_number": 6 | |
| }, | |
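| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": [ | |
| "The note above refers to the GeoJSON rule that a polygon ring must end on the same position it starts with. If strictly valid GeoJSON is needed, the rings can be closed before decoding. A minimal sketch using only Ruby's standard `json` library; the helper name `close_rings` is hypothetical and is not used elsewhere in this notebook:\n", | |
| "\n", | |
| "```ruby\n", | |
| "require 'json'\n", | |
| "\n", | |
| "# Hypothetical helper: append the first position to any polygon ring left open.\n", | |
| "def close_rings(geojson_string)\n", | |
| "  data = JSON.parse(geojson_string)\n", | |
| "  data['features'].each do |feature|\n", | |
| "    geometry = feature['geometry']\n", | |
| "    next unless geometry && geometry['type'] == 'Polygon'\n", | |
| "    geometry['coordinates'].each do |ring|\n", | |
| "      ring << ring.first.dup unless ring.empty? || ring.first == ring.last\n", | |
| "    end\n", | |
| "  end\n", | |
| "  JSON.generate(data)\n", | |
| "end\n", | |
| "```\n", | |
| "\n", | |
| "This step looks optional for the rest of the notebook: the polygon printed further below already repeats its first point, which suggests the rings get closed when the GeoJSON is decoded." | |
| ] | |
| }, | |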
| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": [ | |
| "We decode the GeoJSON into a `RGeo::GeoJSON` structure (see the [RGeo::GeoJSON docs](http://rdoc.info/github/rgeo/rgeo-geojson/frames)):" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "collapsed": false, | |
| "input": [ | |
| "geocollection = RGeo::GeoJSON.decode(geomstr, :json_parser => :json)" | |
| ], | |
| "language": "python", | |
| "metadata": {}, | |
| "outputs": [ | |
| { | |
| "metadata": {}, | |
| "output_type": "pyout", | |
| "prompt_number": 7, | |
| "text": [ | |
| "#<RGeo::GeoJSON::FeatureCollection:0x80d363fc>" | |
| ] | |
| } | |
| ], | |
| "prompt_number": 7 | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": [ | |
| "We wrap this in a function for convenience:" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "collapsed": false, | |
| "input": [ | |
| "def parse_geojson(json)\n", | |
| "  RGeo::GeoJSON.decode(json, :json_parser => :json)\n", | |
| "end" | |
| ], | |
| "language": "python", | |
| "metadata": {}, | |
| "outputs": [ | |
| { | |
| "metadata": {}, | |
| "output_type": "pyout", | |
| "prompt_number": 8, | |
| "text": [ | |
| ":parse_geojson" | |
| ] | |
| } | |
| ], | |
| "prompt_number": 8 | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": [ | |
| "This structure is now a group of [features](http://rdoc.info/github/rgeo/rgeo-geojson/RGeo/GeoJSON/Feature), each with an [RGeo::Geos::CAPIPolygonImpl](http://rdoc.info/github/rgeo/rgeo/RGeo/Geos/CAPIPolygonImpl) geometry describing each polygon, among other properties (see the [RGeo docs](http://rdoc.info/github/rgeo/rgeo/frames)):" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "collapsed": false, | |
| "input": [ | |
| "geocollection.first.geometry" | |
| ], | |
| "language": "python", | |
| "metadata": {}, | |
| "outputs": [ | |
| { | |
| "metadata": {}, | |
| "output_type": "pyout", | |
| "prompt_number": 9, | |
| "text": [ | |
| "#<RGeo::Geos::CAPIPolygonImpl:0x80d3be74 \"POLYGON ((-73.98620970547199 40.7356342514617, -73.98627072572708 40.735547874977094, -73.98632504045963 40.73557226364293, -73.98622445762157 40.73570995781772, -73.9861835539341 40.73569268254945, -73.98621775209902 40.735640856717666, -73.98620970547199 40.7356342514617))\">" | |
| ] | |
| } | |
| ], | |
| "prompt_number": 9 | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": [ | |
| "## Algorithm\n", | |
| "\n", | |
| "The main logic behind this process is as follows:\n", | |
| "\n", | |
| "1. cluster all the polygons by their centroids (similar-shaped polygons should have similar centroids<sup>[1]</sup>; clustering lets us identify outliers)\n", | |
| "1. only use clusters that have three or more centroids (three or more people drew similar-shaped polygons)\n", | |
| "1. for each cluster:\n", | |
| "    1. cluster the vertices of its polygons\n", | |
| "    1. find the mean vertex describing each cluster\n", | |
| "    1. connect those mean vertices in the most likely order\n", | |
| "    1. verify that the connected polygon makes sense (explained further below)\n", | |
| "\n", | |
| "[1] _different polygons might also have similar centroids but we're skipping this for now :)_\n", | |
| "\n", | |
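| "Steps 2 and 3.ii can be sketched in plain Ruby. This is only an illustration: the helper names `keep_dense_clusters` and `mean_point`, and the assumed input shape (a hash of cluster id to an array of `[x, y]` points), are hypothetical and not part of the notebook's code. Ordering the mean vertices and checking the resulting polygon are not covered here.\n", | |
| "\n", | |
| "```ruby\n", | |
| "# Illustration only: clusters is assumed to be a hash of\n", | |
| "# cluster id => array of [x, y] points.\n", | |
| "def keep_dense_clusters(clusters, min_size = 3)\n", | |
| "  clusters.select { |_id, points| points.size >= min_size }\n", | |
| "end\n", | |
| "\n", | |
| "# Arithmetic mean of a non-empty list of [x, y] points.\n", | |
| "def mean_point(points)\n", | |
| "  n = points.size.to_f\n", | |
| "  [points.map { |p| p[0] }.reduce(:+) / n, points.map { |p| p[1] }.reduce(:+) / n]\n", | |
| "end\n", | |
| "```\n", | |
| "\n", | |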
| "Since DBSCAN works with number arrays, we need to convert the complex RGeo structures. Below is a simple centroid-extraction function:" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "collapsed": false, | |
| "input": [ | |
| "def get_centroid(poly_feature)\n", | |
| "  return if poly_feature.geometry.geometry_type.type_name != \"Polygon\"\n", | |
| "  c = poly_feature.geometry.centroid\n", | |
| "  [c.x, c.y]\n", | |
| "end" | |
| ], | |
| "language": "python", | |
| "metadata": {}, | |
| "outputs": [ | |
| { | |
| "metadata": {}, | |
| "output_type": "pyout", | |
| "prompt_number": 10, | |
| "text": [ | |
| ":get_centroid" | |
| ] | |
| } | |
| ], | |
| "prompt_number": 10 | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": [ | |
| "Let's test it with the first polygon in the collection:" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "collapsed": false, | |
| "input": [ | |
| "centroid = get_centroid(geocollection.first)" | |
| ], | |
| "language": "python", | |
| "metadata": {}, | |
| "outputs": [ | |
| { | |
| "metadata": {}, | |
| "output_type": "pyout", | |
| "prompt_number": 11, | |
| "text": [ | |
| "[-73.98625268168838, 40.73562601945317]" | |
| ] | |
| } | |
| ], | |
| "prompt_number": 11 | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": [ | |
| "Now we need a convenience function to get all the centroids of the collection. It returns a hash keyed by each feature's index in the collection, so that we can later go from a centroid back to the polygon it came from:" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "collapsed": false, | |
| "input": [ | |
| "def get_all_centroids(geom)\n", | |
| "  centroids = {}\n", | |
| "  geom.each_with_index do |poly, index|\n", | |
| "    next if poly.geometry.geometry_type.type_name != \"Polygon\"\n", | |
| "    centroids[index] = get_centroid(poly)\n", | |
| "  end\n", | |
| "  centroids\n", | |
| "end" | |
| ], | |
| "language": "python", | |
| "metadata": {}, | |
| "outputs": [ | |
| { | |
| "metadata": {}, | |
| "output_type": "pyout", | |
| "prompt_number": 12, | |
| "text": [ | |
| ":get_all_centroids" | |
| ] | |
| } | |
| ], | |
| "prompt_number": 12 | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": [ | |
| "Test again:" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "collapsed": false, | |
| "input": [ | |
| "centroids = get_all_centroids(geocollection)" | |
| ], | |
| "language": "python", | |
| "metadata": {}, | |
| "outputs": [ | |
| { | |
| "metadata": {}, | |
| "output_type": "pyout", | |
| "prompt_number": 13, | |
| "text": [ | |
| "{0=>[-73.98625268168838, 40.73562601945317], 1=>[-73.98625173238652, 40.735625569382876], 2=>[-73.9862518966646, 40.73562642272427], 3=>[-73.986252242017, 40.735626656082445], 4=>[-73.98625152460835, 40.735626229414], 5=>[-73.98625207318744, 40.73562418649854], 6=>[-73.98625258509149, 40.7356272053874], 7=>[-73.98626592099406, 40.735602617283476], 8=>[-73.9862216645921, 40.73567482334759], 9=>[-73.98625254867669, 40.735624721075084], 10=>[-73.98625077341322, 40.73562552211442]}" | |
| ] | |
| } | |
| ], | |
| "prompt_number": 13 | |
| }, | |
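| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": [ | |
| "Because the hash keys are positions in the collection, a set of centroid keys can be mapped back to the features they came from. A minimal sketch, using a plain array in place of the RGeo collection; `features_for` is a hypothetical helper name, and it assumes the collection supports lookup by index with `[]`:\n", | |
| "\n", | |
| "```ruby\n", | |
| "# Hypothetical helper: look up the features behind a hash of index => centroid.\n", | |
| "def features_for(centroids, collection)\n", | |
| "  centroids.keys.map { |index| collection[index] }\n", | |
| "end\n", | |
| "\n", | |
| "collection = %w[poly_a poly_b poly_c]         # stand-ins for polygon features\n", | |
| "sample = { 0 => [1.0, 1.0], 2 => [5.0, 5.0] } # index 1 was skipped\n", | |
| "features_for(sample, collection)              # the features at index 0 and 2\n", | |
| "```" | |
| ] | |
| }, | |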
| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": [ | |
| "A simple plot of all the centroids using Nyaplot:" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "collapsed": false, | |
| "input": [ | |
| "plot = Nyaplot::Plot.new\n", | |
| "plot.width(400)\n", | |
| "plot.height(400)\n", | |
| "plot.zoom(true)\n", | |
| "plot.rotate_x_label(-60)\n", | |
| "points_x = centroids.map { |p| p[1][0] }\n", | |
| "points_y = centroids.map { |p| p[1][1] }\n", | |
| "df = Nyaplot::DataFrame.new({x:points_x,y:points_y})\n", | |
| "# add some padding\n", | |
| "xmin = points_x.min - 1e-5\n", | |
| "xmax = points_x.max + 1e-5\n", | |
| "ymin = points_y.min - 1e-5\n", | |
| "ymax = points_y.max + 1e-5\n", | |
| "plot.xrange([xmin,xmax])\n", | |
| "plot.yrange([ymin,ymax])\n", | |
| "# end padding\n", | |
| "sc = plot.add_with_df(df, :scatter, :x, :y)\n", | |
| "plot.show" | |
| ], | |
| "language": "python", | |
| "metadata": {}, | |
| "outputs": [ | |
| { | |
| "html": [ | |
| "<div id='vis-e5936183-5798-4eb7-b7c6-7c43d7186bf2'></div>\n", | |
| "<script>\n", | |
| "(function(){\n", | |
| " var render = function(){\n", | |
| " var model = {\"panes\":[{\"diagrams\":[{\"type\":\"scatter\",\"options\":{\"x\":\"x\",\"y\":\"y\"},\"data\":\"4f1cf4be-6ced-46d8-aaf6-69cc3be1f335\"}],\"options\":{\"width\":400,\"height\":400,\"zoom\":true,\"rotate_x_label\":-60,\"xrange\":[-73.98627592099406,-73.98621166459209],\"yrange\":[40.73559261728347,40.73568482334759]}}],\"data\":{\"4f1cf4be-6ced-46d8-aaf6-69cc3be1f335\":[{\"x\":-73.98625268168838,\"y\":40.73562601945317},{\"x\":-73.98625173238652,\"y\":40.735625569382876},{\"x\":-73.9862518966646,\"y\":40.73562642272427},{\"x\":-73.986252242017,\"y\":40.735626656082445},{\"x\":-73.98625152460835,\"y\":40.735626229414},{\"x\":-73.98625207318744,\"y\":40.73562418649854},{\"x\":-73.98625258509149,\"y\":40.7356272053874},{\"x\":-73.98626592099406,\"y\":40.735602617283476},{\"x\":-73.9862216645921,\"y\":40.73567482334759},{\"x\":-73.98625254867669,\"y\":40.735624721075084},{\"x\":-73.98625077341322,\"y\":40.73562552211442}]},\"extension\":[]}\n", | |
| " Nyaplot.core.parse(model, '#vis-e5936183-5798-4eb7-b7c6-7c43d7186bf2');\n", | |
| " };\n", | |
| " if(window['Nyaplot']==undefined){\n", | |
| " window.addEventListener('load_nyaplot', render, false);\n", | |
| "\treturn;\n", | |
| " }\n", | |
| " render();\n", | |
| "})();\n", | |
| "</script>\n" | |
| ], | |
| "metadata": {}, | |
| "output_type": "pyout", | |
| "prompt_number": 14, | |
| "text": [ | |
| "\"<div id='vis-e5936183-5798-4eb7-b7c6-7c43d7186bf2'></div>\\n<script>\\n(function(){\\n var render = function(){\\n var model = {\\\"panes\\\":[{\\\"diagrams\\\":[{\\\"type\\\":\\\"scatter\\\",\\\"options\\\":{\\\"x\\\":\\\"x\\\",\\\"y\\\":\\\"y\\\"},\\\"data\\\":\\\"4f1cf4be-6ced-46d8-aaf6-69cc3be1f335\\\"}],\\\"options\\\":{\\\"width\\\":400,\\\"height\\\":400,\\\"zoom\\\":true,\\\"rotate_x_label\\\":-60,\\\"xrange\\\":[-73.98627592099406,-73.98621166459209],\\\"yrange\\\":[40.73559261728347,40.73568482334759]}}],\\\"data\\\":{\\\"4f1cf4be-6ced-46d8-aaf6-69cc3be1f335\\\":[{\\\"x\\\":-73.98625268168838,\\\"y\\\":40.73562601945317},{\\\"x\\\":-73.98625173238652,\\\"y\\\":40.735625569382876},{\\\"x\\\":-73.9862518966646,\\\"y\\\":40.73562642272427},{\\\"x\\\":-73.986252242017,\\\"y\\\":40.735626656082445},{\\\"x\\\":-73.98625152460835,\\\"y\\\":40.735626229414},{\\\"x\\\":-73.98625207318744,\\\"y\\\":40.73562418649854},{\\\"x\\\":-73.98625258509149,\\\"y\\\":40.7356272053874},{\\\"x\\\":-73.98626592099406,\\\"y\\\":40.735602617283476},{\\\"x\\\":-73.9862216645921,\\\"y\\\":40.73567482334759},{\\\"x\\\":-73.98625254867669,\\\"y\\\":40.735624721075084},{\\\"x\\\":-73.98625077341322,\\\"y\\\":40.73562552211442}]},\\\"extension\\\":[]}\\n Nyaplot.core.parse(model, '#vis-e5936183-5798-4eb7-b7c6-7c43d7186bf2');\\n };\\n if(window['Nyaplot']==undefined){\\n window.addEventListener('load_nyaplot', render, false);\\n\\treturn;\\n }\\n render();\\n})();\\n</script>\\n\"" | |
| ] | |
| } | |
| ], | |
| "prompt_number": 14 | |
| }, | |
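| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": [ | |
| "The next cell computes the distance between every pair of centroids and sorts the pairs from closest to farthest. For comparison, the same idea can be written more compactly with `Array#combination`. This is a sketch only: `pairwise_distances` is a hypothetical name, and unlike the cell below it also keeps pairs whose coordinates are identical (distance 0).\n", | |
| "\n", | |
| "```ruby\n", | |
| "# Hypothetical helper: distance between every unordered pair of centroids,\n", | |
| "# given a hash of index => [x, y], sorted from closest to farthest.\n", | |
| "def pairwise_distances(centroids)\n", | |
| "  pairs = centroids.to_a.combination(2).map do |(i, a), (j, b)|\n", | |
| "    { dist: Math.hypot(a[0] - b[0], a[1] - b[1]), from: i, to: j }\n", | |
| "  end\n", | |
| "  pairs.sort_by { |pair| pair[:dist] }\n", | |
| "end\n", | |
| "```" | |
| ] | |
| }, | |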
| { | |
| "cell_type": "code", | |
| "collapsed": true, | |
| "input": [ | |
| "# distance between every pair of centroids with different coordinates, each pair listed once\n", | |
| "dists = []\n", | |
| "done = {}\n", | |
| "centroids.each_with_index do |cc1, i|\n", | |
| "  centroids.each_with_index do |cc2, j|\n", | |
| "    c1 = cc1[1]\n", | |
| "    c2 = cc2[1]\n", | |
| "    dists.push({:dist=>Math.hypot(c1[0]-c2[0],c1[1]-c2[1]),:from=>i,:to=>j,:from_centroid=>c1,:to_centroid=>c2}) if (c1 != c2 && !done[[c2,c1]])\n", | |
| "    done[[c1,c2]] = true\n", | |
| "  end\n", | |
| "end\n", | |
| "# sort from the closest pair to the farthest\n", | |
| "dists.sort_by! { |k| k[:dist] }\n", | |
| "dist_df = Nyaplot::DataFrame.new(dists)" | |
| ], | |
| "language": "python", | |
| "metadata": {}, | |
| "outputs": [ | |
| { | |
| "html": [ | |
| "<table><tr><th>dist</th><th>from</th><th>to</th><th>from_centroid</th><th>to_centroid</th></tr><tr><td>4.1680249477628687e-07</td><td>2</td><td>3</td><td>[-73.9862518966646, 40.73562642272427]</td><td>[-73.986252242017, 40.735626656082445]</td></tr><tr><td>4.1927880127312373e-07</td><td>2</td><td>4</td><td>[-73.9862518966646, 40.73562642272427]</td><td>[-73.98625152460835, 40.735626229414]</td></tr><tr><td>6.476388201422145e-07</td><td>3</td><td>6</td><td>[-73.986252242017, 40.735626656082445]</td><td>[-73.98625258509149, 40.7356272053874]</td></tr><tr><td>6.919630457708901e-07</td><td>1</td><td>4</td><td>[-73.98625173238652, 40.735625569382876]</td><td>[-73.98625152460835, 40.735626229414]</td></tr><tr><td>7.154453870992346e-07</td><td>5</td><td>9</td><td>[-73.98625207318744, 40.73562418649854]</td><td>[-73.98625254867669, 40.735624721075084]</td></tr><tr><td>7.736974578659688e-07</td><td>0</td><td>3</td><td>[-73.98625268168838, 40.73562601945317]</td><td>[-73.986252242017, 40.735626656082445]</td></tr><tr><td>8.346982305084655e-07</td><td>3</td><td>4</td><td>[-73.986252242017, 40.735626656082445]</td><td>[-73.98625152460835, 40.735626229414]</td></tr><tr><td>8.690102573992017e-07</td><td>1</td><td>2</td><td>[-73.98625173238652, 40.735625569382876]</td><td>[-73.9862518966646, 40.73562642272427]</td></tr><tr><td>8.825474076951457e-07</td><td>0</td><td>2</td><td>[-73.98625268168838, 40.73562601945317]</td><td>[-73.9862518966646, 40.73562642272427]</td></tr><tr><td>9.601375433120375e-07</td><td>1</td><td>10</td><td>[-73.98625173238652, 40.735625569382876]</td><td>[-73.98625077341322, 40.73562552211442]</td></tr><tr><td>1.0317784775226883e-06</td><td>4</td><td>10</td><td>[-73.98625152460835, 40.735626229414]</td><td>[-73.98625077341322, 40.73562552211442]</td></tr><tr><td>1.0423498270990819e-06</td><td>2</td><td>6</td><td>[-73.9862518966646, 40.73562642272427]</td><td>[-73.98625258509149, 
40.7356272053874]</td></tr><tr><td>1.0505890250970463e-06</td><td>0</td><td>1</td><td>[-73.98625268168838, 40.73562601945317]</td><td>[-73.98625173238652, 40.735625569382876]</td></tr><tr><td>1.1759752372883875e-06</td><td>0</td><td>4</td><td>[-73.98625268168838, 40.73562601945317]</td><td>[-73.98625152460835, 40.735626229414]</td></tr><tr><td>1.1772662180684655e-06</td><td>1</td><td>9</td><td>[-73.98625173238652, 40.735625569382876]</td><td>[-73.98625254867669, 40.735624721075084]</td></tr><tr><td>1.1898617396243328e-06</td><td>0</td><td>6</td><td>[-73.98625268168838, 40.73562601945317]</td><td>[-73.98625258509149, 40.7356272053874]</td></tr><tr><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td></tr><tr><td>8.468969718424401e-05</td><td>7</td><td>8</td><td>[-73.98626592099406, 40.735602617283476]</td><td>[-73.9862216645921, 40.73567482334759]</td></tr></table>" | |
| ], | |
| "metadata": {}, | |
| "output_type": "pyout", | |
| "prompt_number": 15, | |
| "text": [ | |
| "#<Nyaplot::DataFrame:0x000001019de188 @name=\"6a47c46f-26e0-417d-8978-9ffbfef4b0cb\", @rows=[{\"dist\"=>4.1680249477628687e-07, \"from\"=>2, \"to\"=>3, \"from_centroid\"=>[-73.9862518966646, 40.73562642272427], \"to_centroid\"=>[-73.986252242017, 40.735626656082445]}, {\"dist\"=>4.1927880127312373e-07, \"from\"=>2, \"to\"=>4, \"from_centroid\"=>[-73.9862518966646, 40.73562642272427], \"to_centroid\"=>[-73.98625152460835, 40.735626229414]}, {\"dist\"=>6.476388201422145e-07, \"from\"=>3, \"to\"=>6, \"from_centroid\"=>[-73.986252242017, 40.735626656082445], \"to_centroid\"=>[-73.98625258509149, 40.7356272053874]}, {\"dist\"=>6.919630457708901e-07, \"from\"=>1, \"to\"=>4, \"from_centroid\"=>[-73.98625173238652, 40.735625569382876], \"to_centroid\"=>[-73.98625152460835, 40.735626229414]}, {\"dist\"=>7.154453870992346e-07, \"from\"=>5, \"to\"=>9, \"from_centroid\"=>[-73.98625207318744, 40.73562418649854], \"to_centroid\"=>[-73.98625254867669, 40.735624721075084]}, {\"dist\"=>7.736974578659688e-07, \"from\"=>0, \"to\"=>3, \"from_centroid\"=>[-73.98625268168838, 40.73562601945317], \"to_centroid\"=>[-73.986252242017, 40.735626656082445]}, {\"dist\"=>8.346982305084655e-07, \"from\"=>3, \"to\"=>4, \"from_centroid\"=>[-73.986252242017, 40.735626656082445], \"to_centroid\"=>[-73.98625152460835, 40.735626229414]}, {\"dist\"=>8.690102573992017e-07, \"from\"=>1, \"to\"=>2, \"from_centroid\"=>[-73.98625173238652, 40.735625569382876], \"to_centroid\"=>[-73.9862518966646, 40.73562642272427]}, {\"dist\"=>8.825474076951457e-07, \"from\"=>0, \"to\"=>2, \"from_centroid\"=>[-73.98625268168838, 40.73562601945317], \"to_centroid\"=>[-73.9862518966646, 40.73562642272427]}, {\"dist\"=>9.601375433120375e-07, \"from\"=>1, \"to\"=>10, \"from_centroid\"=>[-73.98625173238652, 40.735625569382876], \"to_centroid\"=>[-73.98625077341322, 40.73562552211442]}, {\"dist\"=>1.0317784775226883e-06, \"from\"=>4, \"to\"=>10, \"from_centroid\"=>[-73.98625152460835, 40.735626229414], 
\"to_centroid\"=>[-73.98625077341322, 40.73562552211442]}, {\"dist\"=>1.0423498270990819e-06, \"from\"=>2, \"to\"=>6, \"from_centroid\"=>[-73.9862518966646, 40.73562642272427], \"to_centroid\"=>[-73.98625258509149, 40.7356272053874]}, {\"dist\"=>1.0505890250970463e-06, \"from\"=>0, \"to\"=>1, \"from_centroid\"=>[-73.98625268168838, 40.73562601945317], \"to_centroid\"=>[-73.98625173238652, 40.735625569382876]}, {\"dist\"=>1.1759752372883875e-06, \"from\"=>0, \"to\"=>4, \"from_centroid\"=>[-73.98625268168838, 40.73562601945317], \"to_centroid\"=>[-73.98625152460835, 40.735626229414]}, {\"dist\"=>1.1772662180684655e-06, \"from\"=>1, \"to\"=>9, \"from_centroid\"=>[-73.98625173238652, 40.735625569382876], \"to_centroid\"=>[-73.98625254867669, 40.735624721075084]}, {\"dist\"=>1.1898617396243328e-06, \"from\"=>0, \"to\"=>6, \"from_centroid\"=>[-73.98625268168838, 40.73562601945317], \"to_centroid\"=>[-73.98625258509149, 40.7356272053874]}, {\"dist\"=>1.2002662963773613e-06, \"from\"=>1, \"to\"=>3, \"from_centroid\"=>[-73.98625173238652, 40.735625569382876], \"to_centroid\"=>[-73.986252242017, 40.735626656082445]}, {\"dist\"=>1.3051734635243496e-06, \"from\"=>0, \"to\"=>9, \"from_centroid\"=>[-73.98625268168838, 40.73562601945317], \"to_centroid\"=>[-73.98625254867669, 40.735624721075084]}, {\"dist\"=>1.424259229759705e-06, \"from\"=>1, \"to\"=>5, \"from_centroid\"=>[-73.98625173238652, 40.735625569382876], \"to_centroid\"=>[-73.98625207318744, 40.73562418649854]}, {\"dist\"=>1.4397193384948688e-06, \"from\"=>2, \"to\"=>10, \"from_centroid\"=>[-73.9862518966646, 40.73562642272427], \"to_centroid\"=>[-73.98625077341322, 40.73562552211442]}, {\"dist\"=>1.441231617061079e-06, \"from\"=>4, \"to\"=>6, \"from_centroid\"=>[-73.98625152460835, 40.735626229414], \"to_centroid\"=>[-73.98625258509149, 40.7356272053874]}, {\"dist\"=>1.8222869506551436e-06, \"from\"=>2, \"to\"=>9, \"from_centroid\"=>[-73.9862518966646, 40.73562642272427], \"to_centroid\"=>[-73.98625254867669, 
40.735624721075084]}, {\"dist\"=>1.8231297971933801e-06, \"from\"=>4, \"to\"=>9, \"from_centroid\"=>[-73.98625152460835, 40.735626229414], \"to_centroid\"=>[-73.98625254867669, 40.735624721075084]}, {\"dist\"=>1.844889313547645e-06, \"from\"=>1, \"to\"=>6, \"from_centroid\"=>[-73.98625173238652, 40.735625569382876], \"to_centroid\"=>[-73.98625258509149, 40.7356272053874]}, {\"dist\"=>1.8554461894036756e-06, \"from\"=>3, \"to\"=>10, \"from_centroid\"=>[-73.986252242017, 40.735626656082445], \"to_centroid\"=>[-73.98625077341322, 40.73562552211442]}, {\"dist\"=>1.8636745452851083e-06, \"from\"=>5, \"to\"=>10, \"from_centroid\"=>[-73.98625207318744, 40.73562418649854], \"to_centroid\"=>[-73.98625077341322, 40.73562552211442]}, {\"dist\"=>1.9313197748456895e-06, \"from\"=>0, \"to\"=>5, \"from_centroid\"=>[-73.98625268168838, 40.73562601945317], \"to_centroid\"=>[-73.98625207318744, 40.73562418649854]}, {\"dist\"=>1.947620190077808e-06, \"from\"=>9, \"to\"=>10, \"from_centroid\"=>[-73.98625254867669, 40.735624721075084], \"to_centroid\"=>[-73.98625077341322, 40.73562552211442]}, {\"dist\"=>1.9591563623025234e-06, \"from\"=>3, \"to\"=>9, \"from_centroid\"=>[-73.986252242017, 40.735626656082445], \"to_centroid\"=>[-73.98625254867669, 40.735624721075084]}, {\"dist\"=>1.972019255979438e-06, \"from\"=>0, \"to\"=>10, \"from_centroid\"=>[-73.98625268168838, 40.73562601945317], \"to_centroid\"=>[-73.98625077341322, 40.73562552211442]}, {\"dist\"=>2.115287830795299e-06, \"from\"=>4, \"to\"=>5, \"from_centroid\"=>[-73.98625152460835, 40.735626229414], \"to_centroid\"=>[-73.98625207318744, 40.73562418649854]}, {\"dist\"=>2.2431820796116284e-06, \"from\"=>2, \"to\"=>5, \"from_centroid\"=>[-73.9862518966646, 40.73562642272427], \"to_centroid\"=>[-73.98625207318744, 40.73562418649854]}, {\"dist\"=>2.4729711093657763e-06, \"from\"=>6, \"to\"=>10, \"from_centroid\"=>[-73.98625258509149, 40.7356272053874], \"to_centroid\"=>[-73.98625077341322, 40.73562552211442]}, 
{\"dist\"=>2.475348072708112e-06, \"from\"=>3, \"to\"=>5, \"from_centroid\"=>[-73.986252242017, 40.735626656082445], \"to_centroid\"=>[-73.98625207318744, 40.73562418649854]}, {\"dist\"=>2.4845791864857814e-06, \"from\"=>6, \"to\"=>9, \"from_centroid\"=>[-73.98625258509149, 40.7356272053874], \"to_centroid\"=>[-73.98625254867669, 40.735624721075084]}, {\"dist\"=>3.0619823175285526e-06, \"from\"=>5, \"to\"=>6, \"from_centroid\"=>[-73.98625207318744, 40.73562418649854], \"to_centroid\"=>[-73.98625258509149, 40.7356272053874]}, {\"dist\"=>2.563187052301914e-05, \"from\"=>5, \"to\"=>7, \"from_centroid\"=>[-73.98625207318744, 40.73562418649854], \"to_centroid\"=>[-73.98626592099406, 40.735602617283476]}, {\"dist\"=>2.5834017791549575e-05, \"from\"=>7, \"to\"=>9, \"from_centroid\"=>[-73.98626592099406, 40.735602617283476], \"to_centroid\"=>[-73.98625254867669, 40.735624721075084]}, {\"dist\"=>2.688755773883865e-05, \"from\"=>0, \"to\"=>7, \"from_centroid\"=>[-73.98625268168838, 40.73562601945317], \"to_centroid\"=>[-73.98626592099406, 40.735602617283476]}, {\"dist\"=>2.698361448529006e-05, \"from\"=>1, \"to\"=>7, \"from_centroid\"=>[-73.98625173238652, 40.735625569382876], \"to_centroid\"=>[-73.98626592099406, 40.735602617283476]}, {\"dist\"=>2.7460525955903987e-05, \"from\"=>7, \"to\"=>10, \"from_centroid\"=>[-73.98626592099406, 40.735602617283476], \"to_centroid\"=>[-73.98625077341322, 40.73562552211442]}, {\"dist\"=>2.7629347229741968e-05, \"from\"=>2, \"to\"=>7, \"from_centroid\"=>[-73.9862518966646, 40.73562642272427], \"to_centroid\"=>[-73.98626592099406, 40.735602617283476]}, {\"dist\"=>2.7654812048911204e-05, \"from\"=>4, \"to\"=>7, \"from_centroid\"=>[-73.98625152460835, 40.735626229414], \"to_centroid\"=>[-73.98626592099406, 40.735602617283476]}, {\"dist\"=>2.765824052887831e-05, \"from\"=>3, \"to\"=>7, \"from_centroid\"=>[-73.986252242017, 40.735626656082445], \"to_centroid\"=>[-73.98626592099406, 40.735602617283476]}, {\"dist\"=>2.797179207560266e-05, 
\"from\"=>6, \"to\"=>7, \"from_centroid\"=>[-73.98625258509149, 40.7356272053874], \"to_centroid\"=>[-73.98626592099406, 40.735602617283476]}, {\"dist\"=>5.677629272142381e-05, \"from\"=>6, \"to\"=>8, \"from_centroid\"=>[-73.98625258509149, 40.7356272053874], \"to_centroid\"=>[-73.9862216645921, 40.73567482334759]}, {\"dist\"=>5.7034997606567785e-05, \"from\"=>4, \"to\"=>8, \"from_centroid\"=>[-73.98625152460835, 40.735626229414], \"to_centroid\"=>[-73.9862216645921, 40.73567482334759]}, {\"dist\"=>5.7053171211386354e-05, \"from\"=>3, \"to\"=>8, \"from_centroid\"=>[-73.986252242017, 40.735626656082445], \"to_centroid\"=>[-73.9862216645921, 40.73567482334759]}, {\"dist\"=>5.706661497797782e-05, \"from\"=>2, \"to\"=>8, \"from_centroid\"=>[-73.9862518966646, 40.73562642272427], \"to_centroid\"=>[-73.9862216645921, 40.73567482334759]}, {\"dist\"=>5.725325369972539e-05, \"from\"=>8, \"to\"=>10, \"from_centroid\"=>[-73.9862216645921, 40.73567482334759], \"to_centroid\"=>[-73.98625077341322, 40.73562552211442]}, {\"dist\"=>5.7706371411325726e-05, \"from\"=>1, \"to\"=>8, \"from_centroid\"=>[-73.98625173238652, 40.735625569382876], \"to_centroid\"=>[-73.9862216645921, 40.73567482334759]}, {\"dist\"=>5.782629481832627e-05, \"from\"=>0, \"to\"=>8, \"from_centroid\"=>[-73.98625268168838, 40.73562601945317], \"to_centroid\"=>[-73.9862216645921, 40.73567482334759]}, {\"dist\"=>5.88563029015269e-05, \"from\"=>8, \"to\"=>9, \"from_centroid\"=>[-73.9862216645921, 40.73567482334759], \"to_centroid\"=>[-73.98625254867669, 40.735624721075084]}, {\"dist\"=>5.906583744015411e-05, \"from\"=>5, \"to\"=>8, \"from_centroid\"=>[-73.98625207318744, 40.73562418649854], \"to_centroid\"=>[-73.9862216645921, 40.73567482334759]}, {\"dist\"=>8.468969718424401e-05, \"from\"=>7, \"to\"=>8, \"from_centroid\"=>[-73.98626592099406, 40.735602617283476], \"to_centroid\"=>[-73.9862216645921, 40.73567482334759]}]>" | |
| ] | |
| } | |
| ], | |
| "prompt_number": 15 | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": [ | |
| "## 1. Clustering centroids\n", | |
| "\n", | |
| "We can see here how the centroids reflect the three different basic shapes drawn by contributors above: the lone centroids for the upper-right and lower-left rectangles and the group of nine centroids for the L-shaped polygons in the \"center\".\n", | |
| "\n", | |
| "The problem now is finding a good distance threshold between centroids:\n", | |
| "\n", | |
| "- **big** enough to cover nearby centroids but also\n", | |
| "- **small** enough to _not_ group polygons that don't belong with each other\n", | |
| "\n", | |
| "Let's create a table to see just how close/far these centroids are from each other (standard Euclidean distance: $\\sqrt{(\\Delta x)^2+(\\Delta y)^2}$). Notice that, since geographic coordinates carry a _lot_ of decimal places (digits to the right of the decimal point), for nearby centroids we are dealing with distances on the order of $10^{-6}$ or smaller: " | |
| ] | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": [ | |
| "From the table (which is sorted by closest points first) we can see that the top 10 results are under $10^{-6}$ units (0.000001) away from each other.\n", | |
| "\n", | |
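| "As a quick sanity check on the first row (a worked example using the centroid coordinates printed in the table, rounded to five significant figures), the distance from centroid 2 to centroid 3 can be reproduced by hand:\n", | |
| "\n", | |
| "$$\\Delta x \\approx -3.4535 \\times 10^{-7}, \\qquad \\Delta y \\approx 2.3336 \\times 10^{-7}$$\n", | |
| "\n", | |
| "$$\\sqrt{(\\Delta x)^2+(\\Delta y)^2} \\approx \\sqrt{1.1927 \\times 10^{-13} + 0.5446 \\times 10^{-13}} \\approx 4.168 \\times 10^{-7}$$\n", | |
| "\n", | |
| "This agrees with the `dist` value of about $4.168 \\times 10^{-7}$ reported for that pair.\n", | |
| "\n", | |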
| "## The DBSCAN algorithm\n", | |
| "\n", | |
| "To understand how clusters are formed, it is useful to understand how the [DBSCAN clustering algorithm](https://en.wikipedia.org/wiki/DBSCAN#Algorithm) works:\n", | |
| "\n", | |
| "> DBSCAN requires two parameters: \u03b5 (eps) and the minimum number of points (min_points) required to form a dense region. It starts with an arbitrary starting point that has not been visited. This point's \u03b5-neighborhood is retrieved, and if it contains sufficiently many points, a cluster is started. Otherwise, the point is labeled as noise. Note that this point might later be found in a sufficiently sized \u03b5-environment of a different point and hence be made part of a cluster.\n", | |
| "\n", | |
| "> If a point is found to be a dense part of a cluster, its \u03b5-neighborhood is also part of that cluster. Hence, all points that are found within the \u03b5-neighborhood are added, as is their own \u03b5-neighborhood when they are also dense. This process continues until the density-connected cluster is completely found. Then, a new unvisited point is retrieved and processed, leading to the discovery of a further cluster or noise.\n", | |
| "\n", | |
| "By playing around with different sets of polygons I came to a general \u03b5 of $1.8 \\times 10^{-6}$ and a `min_points` of 2 for **centroid clusters** (polygon vertex clusters have different input values as we will see below).\n", | |
| "\n", | |
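| "To connect the quoted description to the distance table above, here is a sketch using the common textbook formulation (whether the Ruby DBSCAN library used here counts the point itself, or treats the boundary as inclusive, is an assumption). The \u03b5-neighborhood of a centroid $p$ is:\n", | |
| "\n", | |
| "$$N_\\varepsilon(p) = \\lbrace q : d(p, q) \\le \\varepsilon \\rbrace$$\n", | |
| "\n", | |
| "and $p$ is dense when $N_\\varepsilon(p)$ holds at least `min_points` points. With $\\varepsilon = 1.8 \\times 10^{-6}$, the closest neighbor of centroid 7 is about $2.56 \\times 10^{-5}$ away and the closest neighbor of centroid 8 is about $5.68 \\times 10^{-5}$ away, so both fall outside every neighborhood and should be treated as noise. Each of the other nine centroids has a neighbor well inside \u03b5 (the largest nearest-neighbor distance among them is about $9.6 \\times 10^{-7}$, between centroids 1 and 10), which is consistent with a single nine-centroid cluster.\n", | |
| "\n", | |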
| "This is the resulting centroid-clustering function:" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "collapsed": false, | |
| "input": [ | |
| "def cluster_centroids(centroids, epsilon=1.8e-06, min_points=2)\n", | |
| " dbscan = DBSCAN( centroids.map{|c| c[1]}, :epsilon => epsilon, :min_points => min_points, :distance => :euclidean_distance )\n", | |
| " return dbscan.results.select{|k,v| k != -1} # omit the non-cluster\n", | |
| "end" | |
| ], | |
| "language": "python", | |
| "metadata": {}, | |
| "outputs": [ | |
| { | |
| "metadata": {}, | |
| "output_type": "pyout", | |
| "prompt_number": 16, | |
| "text": [ | |
| ":cluster_centroids" | |
| ] | |
| } | |
| ], | |
| "prompt_number": 16 | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": [ | |
| "Let's test it:" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "collapsed": false, | |
| "input": [ | |
| "centroid_clusters = cluster_centroids(centroids)" | |
| ], | |
| "language": "python", | |
| "metadata": {}, | |
| "outputs": [ | |
| { | |
| "metadata": {}, | |
| "output_type": "pyout", | |
| "prompt_number": 17, | |
| "text": [ | |
| "{0=>[[-73.98625268168838, 40.73562601945317], [-73.98625173238652, 40.735625569382876], [-73.9862518966646, 40.73562642272427], [-73.986252242017, 40.735626656082445], [-73.98625152460835, 40.735626229414], [-73.98625258509149, 40.7356272053874], [-73.98625254867669, 40.735624721075084], [-73.98625207318744, 40.73562418649854], [-73.98625077341322, 40.73562552211442]]}" | |
| ] | |
| } | |
| ], | |
| "prompt_number": 17 | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": [ | |
| "DBSCAN's `results` is a hash whose `-1` key (if any) contains all the points that did not belong to a cluster, while keys `0..n` contain the different clusters. `cluster_centroids` drops the `-1` entry, so in this example the returned hash holds only one cluster, `centroid_clusters[0]`; the rejected `-1` non-cluster has been filtered out.\n", | |
| "\n", | |
| "Let's define a cluster plotting function and plot this (notice the \"disappearance\" of the two outliers that are being ignored by the function):" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "collapsed": false, | |
| "input": [ | |
| "def plot_clusters(clusters)\n", | |
| " plot = Nyaplot::Plot.new\n", | |
| " plot.width(300)\n", | |
| " plot.height(400)\n", | |
| " plot.zoom(true)\n", | |
| " plot.rotate_x_label(-60)\n", | |
| " pts = clusters.map{|c| c[1]}.flatten(1)\n", | |
| " # add some padding\n", | |
| " xmin = pts.map {|p| p[0]}.min - 1e-5\n", | |
| " xmax = pts.map {|p| p[0]}.max + 1e-5\n", | |
| " ymin = pts.map {|p| p[1]}.min - 1e-5\n", | |
| " ymax = pts.map {|p| p[1]}.max + 1e-5\n", | |
| " plot.xrange([xmin,xmax])\n", | |
| " plot.yrange([ymin,ymax])\n", | |
| " # now plot\n", | |
| " clusters.each do |cluster|\n", | |
| " if cluster[0] != -1 # ignore cluster -1 because not enough points\n", | |
| " cluster_x = cluster[1].map { |c| c[0] }\n", | |
| " cluster_y = cluster[1].map { |c| c[1] }\n", | |
| " names = cluster[1].map { |c| cluster[0] }\n", | |
| " df = Nyaplot::DataFrame.new({x:cluster_x,y:cluster_y,cluster:names})\n", | |
| " sc = plot.add_with_df(df, :scatter, :x, :y)\n", | |
| " sc.tooltip_contents([:cluster])\n", | |
| " color = \"#\"+ \"%06x\" % (rand * 0xffffff)\n", | |
| " sc.color(color)\n", | |
| " end\n", | |
| " end\n", | |
| " plot.show\n", | |
| " return plot\n", | |
| "end" | |
| ], | |
| "language": "python", | |
| "metadata": {}, | |
| "outputs": [ | |
| { | |
| "metadata": {}, | |
| "output_type": "pyout", | |
| "prompt_number": 18, | |
| "text": [ | |
| ":plot_clusters" | |
| ] | |
| } | |
| ], | |
| "prompt_number": 18 | |
| }, | |
| { | |
| "cell_type": "code", | |
| "collapsed": false, | |
| "input": [ | |
| "plot = plot_clusters(centroid_clusters)\n", | |
| "plot.show" | |
| ], | |
| "language": "python", | |
| "metadata": {}, | |
| "outputs": [ | |
| { | |
| "html": [ | |
| "<div id='vis-28fb465a-9286-4820-9c8c-b61a1ec17541'></div>\n", | |
| "<script>\n", | |
| "(function(){\n", | |
| " var render = function(){\n", | |
| " var model = {\"panes\":[{\"diagrams\":[{\"type\":\"scatter\",\"options\":{\"x\":\"x\",\"y\":\"y\",\"tooltip_contents\":[\"cluster\"],\"color\":\"#130783\"},\"data\":\"9a840fd3-526e-4c5f-840f-84cba5706672\"}],\"options\":{\"width\":300,\"height\":400,\"zoom\":true,\"rotate_x_label\":-60,\"xrange\":[-73.98626268168839,-73.98624077341321],\"yrange\":[40.73561418649854,40.735637205387405]}}],\"data\":{\"9a840fd3-526e-4c5f-840f-84cba5706672\":[{\"x\":-73.98625268168838,\"y\":40.73562601945317,\"cluster\":0},{\"x\":-73.98625173238652,\"y\":40.735625569382876,\"cluster\":0},{\"x\":-73.9862518966646,\"y\":40.73562642272427,\"cluster\":0},{\"x\":-73.986252242017,\"y\":40.735626656082445,\"cluster\":0},{\"x\":-73.98625152460835,\"y\":40.735626229414,\"cluster\":0},{\"x\":-73.98625258509149,\"y\":40.7356272053874,\"cluster\":0},{\"x\":-73.98625254867669,\"y\":40.735624721075084,\"cluster\":0},{\"x\":-73.98625207318744,\"y\":40.73562418649854,\"cluster\":0},{\"x\":-73.98625077341322,\"y\":40.73562552211442,\"cluster\":0}]},\"extension\":[]}\n", | |
| " Nyaplot.core.parse(model, '#vis-28fb465a-9286-4820-9c8c-b61a1ec17541');\n", | |
| " };\n", | |
| " if(window['Nyaplot']==undefined){\n", | |
| " window.addEventListener('load_nyaplot', render, false);\n", | |
| "\treturn;\n", | |
| " }\n", | |
| " render();\n", | |
| "})();\n", | |
| "</script>\n" | |
| ], | |
| "metadata": {}, | |
| "output_type": "pyout", | |
| "prompt_number": 19, | |
| "text": [ | |
| "\"<div id='vis-28fb465a-9286-4820-9c8c-b61a1ec17541'></div>\\n<script>\\n(function(){\\n var render = function(){\\n var model = {\\\"panes\\\":[{\\\"diagrams\\\":[{\\\"type\\\":\\\"scatter\\\",\\\"options\\\":{\\\"x\\\":\\\"x\\\",\\\"y\\\":\\\"y\\\",\\\"tooltip_contents\\\":[\\\"cluster\\\"],\\\"color\\\":\\\"#130783\\\"},\\\"data\\\":\\\"9a840fd3-526e-4c5f-840f-84cba5706672\\\"}],\\\"options\\\":{\\\"width\\\":300,\\\"height\\\":400,\\\"zoom\\\":true,\\\"rotate_x_label\\\":-60,\\\"xrange\\\":[-73.98626268168839,-73.98624077341321],\\\"yrange\\\":[40.73561418649854,40.735637205387405]}}],\\\"data\\\":{\\\"9a840fd3-526e-4c5f-840f-84cba5706672\\\":[{\\\"x\\\":-73.98625268168838,\\\"y\\\":40.73562601945317,\\\"cluster\\\":0},{\\\"x\\\":-73.98625173238652,\\\"y\\\":40.735625569382876,\\\"cluster\\\":0},{\\\"x\\\":-73.9862518966646,\\\"y\\\":40.73562642272427,\\\"cluster\\\":0},{\\\"x\\\":-73.986252242017,\\\"y\\\":40.735626656082445,\\\"cluster\\\":0},{\\\"x\\\":-73.98625152460835,\\\"y\\\":40.735626229414,\\\"cluster\\\":0},{\\\"x\\\":-73.98625258509149,\\\"y\\\":40.7356272053874,\\\"cluster\\\":0},{\\\"x\\\":-73.98625254867669,\\\"y\\\":40.735624721075084,\\\"cluster\\\":0},{\\\"x\\\":-73.98625207318744,\\\"y\\\":40.73562418649854,\\\"cluster\\\":0},{\\\"x\\\":-73.98625077341322,\\\"y\\\":40.73562552211442,\\\"cluster\\\":0}]},\\\"extension\\\":[]}\\n Nyaplot.core.parse(model, '#vis-28fb465a-9286-4820-9c8c-b61a1ec17541');\\n };\\n if(window['Nyaplot']==undefined){\\n window.addEventListener('load_nyaplot', render, false);\\n\\treturn;\\n }\\n render();\\n})();\\n</script>\\n\"" | |
| ] | |
| } | |
| ], | |
| "prompt_number": 19 | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": [ | |
| "## 2. Clustering vertices\n", | |
| "\n", | |
| "Now we need to:\n", | |
| "\n", | |
| "1. work backwards from the centroid clusters that have three or more centroids (only one in this case)\n", | |
| "1. find the polygons they belong to and, finally,\n", | |
| "1. find their vertices and cluster them\n", | |
| "\n", | |
| "Below is a function that retrieves the polygons for a given centroid cluster based on the structures we have built so far:" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "collapsed": false, | |
| "input": [ | |
| "# given a list of centroids (lon,lat), find their poly's index in the centroid list (index => lon,lat)\n", | |
| "def get_polys_for_centroid_cluster(cluster, centroids, original_polys)\n", | |
| " polys = []\n", | |
| " cluster.each do |cl|\n", | |
| " index = centroids.select {|k,v| v == cl}.keys.first\n", | |
| " polys.push(original_polys[index]) unless index.nil? # no matching centroid yields nil, not -1\n", | |
| " end\n", | |
| " return polys\n", | |
| "end" | |
| ], | |
| "language": "python", | |
| "metadata": {}, | |
| "outputs": [ | |
| { | |
| "metadata": {}, | |
| "output_type": "pyout", | |
| "prompt_number": 20, | |
| "text": [ | |
| ":get_polys_for_centroid_cluster" | |
| ] | |
| } | |
| ], | |
| "prompt_number": 20 | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": [ | |
| "Applying this to the only cluster that has useful centroids:" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "collapsed": false, | |
| "input": [ | |
| "cluster_polygons = get_polys_for_centroid_cluster(centroid_clusters[0], centroids, geocollection)" | |
| ], | |
| "language": "python", | |
| "metadata": {}, | |
| "outputs": [ | |
| { | |
| "metadata": {}, | |
| "output_type": "pyout", | |
| "prompt_number": 21, | |
| "text": [ | |
| "[#<RGeo::GeoJSON::Feature:0x80d3bde8 id=nil geom=\"POLYGON ((-73.98620970547199 40.7356342514617, -73.98627072572708 40.735547874977094, -73.98632504045963 40.73557226364293, -73.98622445762157 40.73570995781772, -73.9861835539341 40.73569268254945, -73.98621775209902 40.735640856717666, -73.98620970547199 40.7356342514617))\">, #<RGeo::GeoJSON::Feature:0x80d3b550 id=nil geom=\"POLYGON ((-73.98620769381522 40.73563526765495, -73.9862660318613 40.735547874977094, -73.98632504045963 40.735570739351566, -73.98622579872608 40.73570944972167, -73.98618154227734 40.73569217445325, -73.98621775209902 40.73563933242788, -73.98620769381522 40.73563526765495))\">, #<RGeo::GeoJSON::Feature:0x80d3afc4 id=nil geom=\"POLYGON ((-73.98632369935513 40.735570739351566, -73.98622512817383 40.73570944972167, -73.98618154227734 40.73569014206842, -73.98621909320354 40.735640856717666, -73.98620970547199 40.73563526765495, -73.98627005517483 40.73554889117169, -73.98632369935513 40.735570739351566))\">, #<RGeo::GeoJSON::Feature:0x80d3aa9c id=nil geom=\"POLYGON ((-73.98621842265129 40.7356423810074, -73.98620903491974 40.73563577575159, -73.98627139627934 40.735547874977094, -73.98632436990738 40.735571755545806, -73.98622579872608 40.73570995781772, -73.98618087172508 40.735689633972214, -73.98621842265129 40.7356423810074))\">, #<RGeo::GeoJSON::Feature:0x80d3a59c id=nil geom=\"POLYGON ((-73.98626938462257 40.73554889117167, -73.98632369935513 40.735572771740024, -73.98622445762157 40.73570894162559, -73.98618154227734 40.73569065016463, -73.98621775209902 40.735640856717666, -73.98620836436749 40.735634251461676, -73.98626938462257 40.73554889117167))\">, #<RGeo::GeoJSON::Feature:0x80d37dc4 id=nil geom=\"POLYGON ((-73.98632571101189 40.735571755545806, -73.98622378706932 40.73570995781772, -73.98618288338184 40.73569268254945, -73.98621775209902 40.73564034862108, -73.9862110465765 40.7356362838482, -73.98627005517483 40.735550923560815, -73.98632571101189 40.735571755545806))\">, 
#<RGeo::GeoJSON::Feature:0x80d36e4c id=nil geom=\"POLYGON ((-73.98620970547199 40.73563475955834, -73.98627005517483 40.73554990736624, -73.98632369935513 40.735571755545806, -73.98622360456956 40.73570641325812, -73.9861848950386 40.735689633972214, -73.98621842265129 40.735640856717666, -73.98620970547199 40.73563475955834))\">, #<RGeo::GeoJSON::Feature:0x80d3a0b0 id=nil geom=\"POLYGON ((-73.98621775209902 40.73563984052446, -73.98620836436749 40.73563272717173, -73.98626938462257 40.735550415463514, -73.98632235825062 40.73557124744871, -73.98622360456956 40.73570641325812, -73.98618768252459 40.73568957578454, -73.98621775209902 40.73563984052446))\">, #<RGeo::GeoJSON::Feature:0x80d3644c id=nil geom=\"POLYGON ((-73.98621909320354 40.735638316234656, -73.98620836436749 40.7356362838482, -73.98620769381522 40.73563577575159, -73.98627005517483 40.73554939926897, -73.98632302880287 40.73557023125444, -73.98622360456956 40.73570641325812, -73.98617953062057 40.735689633972214, -73.98621909320354 40.735638316234656))\">]" | |
| ] | |
| } | |
| ], | |
| "prompt_number": 21 | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": [ | |
| "We need a method to extract the vertices from each polygon (in a DBSCAN-compatible format):" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "collapsed": false, | |
| "input": [ | |
| "def get_points(poly_feature)\n", | |
| " geom = poly_feature.geometry\n", | |
| " return false if (geom.geometry_type.type_name != \"Polygon\")\n", | |
| " pts = []\n", | |
| " points = geom.exterior_ring.points\n", | |
| " points.each do |point|\n", | |
| " pts.push([point.x,point.y])\n", | |
| " end\n", | |
| " return pts\n", | |
| "end" | |
| ], | |
| "language": "python", | |
| "metadata": {}, | |
| "outputs": [ | |
| { | |
| "metadata": {}, | |
| "output_type": "pyout", | |
| "prompt_number": 22, | |
| "text": [ | |
| ":get_points" | |
| ] | |
| } | |
| ], | |
| "prompt_number": 22 | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": [ | |
| "Now let's plot what we have so far (vertices from the same polygon are the same color):" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "collapsed": false, | |
| "input": [ | |
| "def plot_polys(polys)\n", | |
| " plot = Nyaplot::Plot.new\n", | |
| " plot.width(500)\n", | |
| " plot.height(500)\n", | |
| " plot.zoom(true)\n", | |
| " plot.rotate_x_label(-60)\n", | |
| " polys.each do |poly|\n", | |
| " plot_poly(poly, plot)\n", | |
| " end\n", | |
| " plot.show\n", | |
| "end\n", | |
| "def plot_poly(poly, plot = nil)\n", | |
| " showplot = false\n", | |
| " if plot == nil\n", | |
| " showplot = true\n", | |
| " plot = Nyaplot::Plot.new\n", | |
| " plot.width(500)\n", | |
| " plot.height(500)\n", | |
| " plot.zoom(true)\n", | |
| " plot.rotate_x_label(-60)\n", | |
| " end\n", | |
| " points = get_points(poly)\n", | |
| " points_x = points.map { |p| p[0] }\n", | |
| " points_y = points.map { |p| p[1] }\n", | |
| " df = Nyaplot::DataFrame.new({x:points_x,y:points_y})\n", | |
| " sc = plot.add_with_df(df, :scatter, :x, :y)\n", | |
| " color = \"#\"+ \"%06x\" % (rand * 0xffffff)\n", | |
| " sc.color(color)\n", | |
| " plot.show if showplot\n", | |
| "end" | |
| ], | |
| "language": "python", | |
| "metadata": {}, | |
| "outputs": [ | |
| { | |
| "metadata": {}, | |
| "output_type": "pyout", | |
| "prompt_number": 23, | |
| "text": [ | |
| ":plot_poly" | |
| ] | |
| } | |
| ], | |
| "prompt_number": 23 | |
| }, | |
| { | |
| "cell_type": "code", | |
| "collapsed": false, | |
| "input": [ | |
| "plot_polys(cluster_polygons)" | |
| ], | |
| "language": "python", | |
| "metadata": {}, | |
| "outputs": [ | |
| { | |
| "html": [ | |
| "<div id='vis-d40f4dbd-5edd-44f5-8abc-d0d49eef4fef'></div>\n", | |
| "<script>\n", | |
| "(function(){\n", | |
| " var render = function(){\n", | |
| " var model = {\"panes\":[{\"diagrams\":[{\"type\":\"scatter\",\"options\":{\"x\":\"x\",\"y\":\"y\",\"color\":\"#5d9c36\"},\"data\":\"d2949b10-9d6e-4bca-b9c4-068e5ca6ed8d\"},{\"type\":\"scatter\",\"options\":{\"x\":\"x\",\"y\":\"y\",\"color\":\"#6e5b61\"},\"data\":\"1ae0953c-32a1-410d-9309-9c17b4ee24b1\"},{\"type\":\"scatter\",\"options\":{\"x\":\"x\",\"y\":\"y\",\"color\":\"#917ab4\"},\"data\":\"2ea818d3-6d40-42bd-8883-51c377a69610\"},{\"type\":\"scatter\",\"options\":{\"x\":\"x\",\"y\":\"y\",\"color\":\"#38ca36\"},\"data\":\"f5f387ce-162a-43f6-ae21-06efbe8b684a\"},{\"type\":\"scatter\",\"options\":{\"x\":\"x\",\"y\":\"y\",\"color\":\"#d16d7c\"},\"data\":\"3064cf5f-6bd9-4acb-b889-5819c1534187\"},{\"type\":\"scatter\",\"options\":{\"x\":\"x\",\"y\":\"y\",\"color\":\"#885b3e\"},\"data\":\"d9ba0ec6-75fe-4cc3-a40e-7d39cf8ecd29\"},{\"type\":\"scatter\",\"options\":{\"x\":\"x\",\"y\":\"y\",\"color\":\"#4e0177\"},\"data\":\"cb7313ae-25e6-4cc4-962f-e31e9b50368e\"},{\"type\":\"scatter\",\"options\":{\"x\":\"x\",\"y\":\"y\",\"color\":\"#e2fd3d\"},\"data\":\"c5be729a-e8d3-483c-87c6-fa8d28814b9b\"},{\"type\":\"scatter\",\"options\":{\"x\":\"x\",\"y\":\"y\",\"color\":\"#c02dc4\"},\"data\":\"c715c674-86da-496d-896b-c4d84ad44ec8\"}],\"options\":{\"width\":500,\"height\":500,\"zoom\":true,\"rotate_x_label\":-60,\"xrange\":[-73.98632571101189,-73.98617953062057],\"yrange\":[40.735547874977094,40.73570995781772]}}],\"data\":{\"d2949b10-9d6e-4bca-b9c4-068e5ca6ed8d\":[{\"x\":-73.98620970547199,\"y\":40.7356342514617},{\"x\":-73.98627072572708,\"y\":40.735547874977094},{\"x\":-73.98632504045963,\"y\":40.73557226364293},{\"x\":-73.98622445762157,\"y\":40.73570995781772},{\"x\":-73.9861835539341,\"y\":40.73569268254945},{\"x\":-73.98621775209902,\"y\":40.735640856717666},{\"x\":-73.98620970547199,\"y\":40.7356342514617}],\"1ae0953c-32a1-410d-9309-9c17b4ee24b1\":[{\"x\":-73.98620769381522,\"y\":40.73563526765495},{\"x\":-73.9862660318613,\"y\":40.735547874977094},{\"x\":-73.986325040459
63,\"y\":40.735570739351566},{\"x\":-73.98622579872608,\"y\":40.73570944972167},{\"x\":-73.98618154227734,\"y\":40.73569217445325},{\"x\":-73.98621775209902,\"y\":40.73563933242788},{\"x\":-73.98620769381522,\"y\":40.73563526765495}],\"2ea818d3-6d40-42bd-8883-51c377a69610\":[{\"x\":-73.98632369935513,\"y\":40.735570739351566},{\"x\":-73.98622512817383,\"y\":40.73570944972167},{\"x\":-73.98618154227734,\"y\":40.73569014206842},{\"x\":-73.98621909320354,\"y\":40.735640856717666},{\"x\":-73.98620970547199,\"y\":40.73563526765495},{\"x\":-73.98627005517483,\"y\":40.73554889117169},{\"x\":-73.98632369935513,\"y\":40.735570739351566}],\"f5f387ce-162a-43f6-ae21-06efbe8b684a\":[{\"x\":-73.98621842265129,\"y\":40.7356423810074},{\"x\":-73.98620903491974,\"y\":40.73563577575159},{\"x\":-73.98627139627934,\"y\":40.735547874977094},{\"x\":-73.98632436990738,\"y\":40.735571755545806},{\"x\":-73.98622579872608,\"y\":40.73570995781772},{\"x\":-73.98618087172508,\"y\":40.735689633972214},{\"x\":-73.98621842265129,\"y\":40.7356423810074}],\"3064cf5f-6bd9-4acb-b889-5819c1534187\":[{\"x\":-73.98626938462257,\"y\":40.73554889117167},{\"x\":-73.98632369935513,\"y\":40.735572771740024},{\"x\":-73.98622445762157,\"y\":40.73570894162559},{\"x\":-73.98618154227734,\"y\":40.73569065016463},{\"x\":-73.98621775209902,\"y\":40.735640856717666},{\"x\":-73.98620836436749,\"y\":40.735634251461676},{\"x\":-73.98626938462257,\"y\":40.73554889117167}],\"d9ba0ec6-75fe-4cc3-a40e-7d39cf8ecd29\":[{\"x\":-73.98632571101189,\"y\":40.735571755545806},{\"x\":-73.98622378706932,\"y\":40.73570995781772},{\"x\":-73.98618288338184,\"y\":40.73569268254945},{\"x\":-73.98621775209902,\"y\":40.73564034862108},{\"x\":-73.9862110465765,\"y\":40.7356362838482},{\"x\":-73.98627005517483,\"y\":40.735550923560815},{\"x\":-73.98632571101189,\"y\":40.735571755545806}],\"cb7313ae-25e6-4cc4-962f-e31e9b50368e\":[{\"x\":-73.98620970547199,\"y\":40.73563475955834},{\"x\":-73.98627005517483,\"y\":40.73554990736624},{\"x\":-73.986
32369935513,\"y\":40.735571755545806},{\"x\":-73.98622360456956,\"y\":40.73570641325812},{\"x\":-73.9861848950386,\"y\":40.735689633972214},{\"x\":-73.98621842265129,\"y\":40.735640856717666},{\"x\":-73.98620970547199,\"y\":40.73563475955834}],\"c5be729a-e8d3-483c-87c6-fa8d28814b9b\":[{\"x\":-73.98621775209902,\"y\":40.73563984052446},{\"x\":-73.98620836436749,\"y\":40.73563272717173},{\"x\":-73.98626938462257,\"y\":40.735550415463514},{\"x\":-73.98632235825062,\"y\":40.73557124744871},{\"x\":-73.98622360456956,\"y\":40.73570641325812},{\"x\":-73.98618768252459,\"y\":40.73568957578454},{\"x\":-73.98621775209902,\"y\":40.73563984052446}],\"c715c674-86da-496d-896b-c4d84ad44ec8\":[{\"x\":-73.98621909320354,\"y\":40.735638316234656},{\"x\":-73.98620836436749,\"y\":40.7356362838482},{\"x\":-73.98620769381522,\"y\":40.73563577575159},{\"x\":-73.98627005517483,\"y\":40.73554939926897},{\"x\":-73.98632302880287,\"y\":40.73557023125444},{\"x\":-73.98622360456956,\"y\":40.73570641325812},{\"x\":-73.98617953062057,\"y\":40.735689633972214},{\"x\":-73.98621909320354,\"y\":40.735638316234656}]},\"extension\":[]}\n", | |
| " Nyaplot.core.parse(model, '#vis-d40f4dbd-5edd-44f5-8abc-d0d49eef4fef');\n", | |
| " };\n", | |
| " if(window['Nyaplot']==undefined){\n", | |
| " window.addEventListener('load_nyaplot', render, false);\n", | |
| "\treturn;\n", | |
| " }\n", | |
| " render();\n", | |
| "})();\n", | |
| "</script>\n" | |
| ], | |
| "metadata": {}, | |
| "output_type": "pyout", | |
| "prompt_number": 24, | |
| "text": [ | |
| "\"<div id='vis-d40f4dbd-5edd-44f5-8abc-d0d49eef4fef'></div>\\n<script>\\n(function(){\\n var render = function(){\\n var model = {\\\"panes\\\":[{\\\"diagrams\\\":[{\\\"type\\\":\\\"scatter\\\",\\\"options\\\":{\\\"x\\\":\\\"x\\\",\\\"y\\\":\\\"y\\\",\\\"color\\\":\\\"#5d9c36\\\"},\\\"data\\\":\\\"d2949b10-9d6e-4bca-b9c4-068e5ca6ed8d\\\"},{\\\"type\\\":\\\"scatter\\\",\\\"options\\\":{\\\"x\\\":\\\"x\\\",\\\"y\\\":\\\"y\\\",\\\"color\\\":\\\"#6e5b61\\\"},\\\"data\\\":\\\"1ae0953c-32a1-410d-9309-9c17b4ee24b1\\\"},{\\\"type\\\":\\\"scatter\\\",\\\"options\\\":{\\\"x\\\":\\\"x\\\",\\\"y\\\":\\\"y\\\",\\\"color\\\":\\\"#917ab4\\\"},\\\"data\\\":\\\"2ea818d3-6d40-42bd-8883-51c377a69610\\\"},{\\\"type\\\":\\\"scatter\\\",\\\"options\\\":{\\\"x\\\":\\\"x\\\",\\\"y\\\":\\\"y\\\",\\\"color\\\":\\\"#38ca36\\\"},\\\"data\\\":\\\"f5f387ce-162a-43f6-ae21-06efbe8b684a\\\"},{\\\"type\\\":\\\"scatter\\\",\\\"options\\\":{\\\"x\\\":\\\"x\\\",\\\"y\\\":\\\"y\\\",\\\"color\\\":\\\"#d16d7c\\\"},\\\"data\\\":\\\"3064cf5f-6bd9-4acb-b889-5819c1534187\\\"},{\\\"type\\\":\\\"scatter\\\",\\\"options\\\":{\\\"x\\\":\\\"x\\\",\\\"y\\\":\\\"y\\\",\\\"color\\\":\\\"#885b3e\\\"},\\\"data\\\":\\\"d9ba0ec6-75fe-4cc3-a40e-7d39cf8ecd29\\\"},{\\\"type\\\":\\\"scatter\\\",\\\"options\\\":{\\\"x\\\":\\\"x\\\",\\\"y\\\":\\\"y\\\",\\\"color\\\":\\\"#4e0177\\\"},\\\"data\\\":\\\"cb7313ae-25e6-4cc4-962f-e31e9b50368e\\\"},{\\\"type\\\":\\\"scatter\\\",\\\"options\\\":{\\\"x\\\":\\\"x\\\",\\\"y\\\":\\\"y\\\",\\\"color\\\":\\\"#e2fd3d\\\"},\\\"data\\\":\\\"c5be729a-e8d3-483c-87c6-fa8d28814b9b\\\"},{\\\"type\\\":\\\"scatter\\\",\\\"options\\\":{\\\"x\\\":\\\"x\\\",\\\"y\\\":\\\"y\\\",\\\"color\\\":\\\"#c02dc4\\\"},\\\"data\\\":\\\"c715c674-86da-496d-896b-c4d84ad44ec8\\\"}],\\\"options\\\":{\\\"width\\\":500,\\\"height\\\":500,\\\"zoom\\\":true,\\\"rotate_x_label\\\":-60,\\\"xrange\\\":[-73.98632571101189,-73.98617953062057],\\\"yrange\\\":[40.735547874977094,40.73570995781772]}}],\\\"data\\\":{\\\"d2949b10-9
d6e-4bca-b9c4-068e5ca6ed8d\\\":[{\\\"x\\\":-73.98620970547199,\\\"y\\\":40.7356342514617},{\\\"x\\\":-73.98627072572708,\\\"y\\\":40.735547874977094},{\\\"x\\\":-73.98632504045963,\\\"y\\\":40.73557226364293},{\\\"x\\\":-73.98622445762157,\\\"y\\\":40.73570995781772},{\\\"x\\\":-73.9861835539341,\\\"y\\\":40.73569268254945},{\\\"x\\\":-73.98621775209902,\\\"y\\\":40.735640856717666},{\\\"x\\\":-73.98620970547199,\\\"y\\\":40.7356342514617}],\\\"1ae0953c-32a1-410d-9309-9c17b4ee24b1\\\":[{\\\"x\\\":-73.98620769381522,\\\"y\\\":40.73563526765495},{\\\"x\\\":-73.9862660318613,\\\"y\\\":40.735547874977094},{\\\"x\\\":-73.98632504045963,\\\"y\\\":40.735570739351566},{\\\"x\\\":-73.98622579872608,\\\"y\\\":40.73570944972167},{\\\"x\\\":-73.98618154227734,\\\"y\\\":40.73569217445325},{\\\"x\\\":-73.98621775209902,\\\"y\\\":40.73563933242788},{\\\"x\\\":-73.98620769381522,\\\"y\\\":40.73563526765495}],\\\"2ea818d3-6d40-42bd-8883-51c377a69610\\\":[{\\\"x\\\":-73.98632369935513,\\\"y\\\":40.735570739351566},{\\\"x\\\":-73.98622512817383,\\\"y\\\":40.73570944972167},{\\\"x\\\":-73.98618154227734,\\\"y\\\":40.73569014206842},{\\\"x\\\":-73.98621909320354,\\\"y\\\":40.735640856717666},{\\\"x\\\":-73.98620970547199,\\\"y\\\":40.73563526765495},{\\\"x\\\":-73.98627005517483,\\\"y\\\":40.73554889117169},{\\\"x\\\":-73.98632369935513,\\\"y\\\":40.735570739351566}],\\\"f5f387ce-162a-43f6-ae21-06efbe8b684a\\\":[{\\\"x\\\":-73.98621842265129,\\\"y\\\":40.7356423810074},{\\\"x\\\":-73.98620903491974,\\\"y\\\":40.73563577575159},{\\\"x\\\":-73.98627139627934,\\\"y\\\":40.735547874977094},{\\\"x\\\":-73.98632436990738,\\\"y\\\":40.735571755545806},{\\\"x\\\":-73.98622579872608,\\\"y\\\":40.73570995781772},{\\\"x\\\":-73.98618087172508,\\\"y\\\":40.735689633972214},{\\\"x\\\":-73.98621842265129,\\\"y\\\":40.7356423810074}],\\\"3064cf5f-6bd9-4acb-b889-5819c1534187\\\":[{\\\"x\\\":-73.98626938462257,\\\"y\\\":40.73554889117167},{\\\"x\\\":-73.98632369935513,\\\"y\\\":40.735572771740024},{\\\"
x\\\":-73.98622445762157,\\\"y\\\":40.73570894162559},{\\\"x\\\":-73.98618154227734,\\\"y\\\":40.73569065016463},{\\\"x\\\":-73.98621775209902,\\\"y\\\":40.735640856717666},{\\\"x\\\":-73.98620836436749,\\\"y\\\":40.735634251461676},{\\\"x\\\":-73.98626938462257,\\\"y\\\":40.73554889117167}],\\\"d9ba0ec6-75fe-4cc3-a40e-7d39cf8ecd29\\\":[{\\\"x\\\":-73.98632571101189,\\\"y\\\":40.735571755545806},{\\\"x\\\":-73.98622378706932,\\\"y\\\":40.73570995781772},{\\\"x\\\":-73.98618288338184,\\\"y\\\":40.73569268254945},{\\\"x\\\":-73.98621775209902,\\\"y\\\":40.73564034862108},{\\\"x\\\":-73.9862110465765,\\\"y\\\":40.7356362838482},{\\\"x\\\":-73.98627005517483,\\\"y\\\":40.735550923560815},{\\\"x\\\":-73.98632571101189,\\\"y\\\":40.735571755545806}],\\\"cb7313ae-25e6-4cc4-962f-e31e9b50368e\\\":[{\\\"x\\\":-73.98620970547199,\\\"y\\\":40.73563475955834},{\\\"x\\\":-73.98627005517483,\\\"y\\\":40.73554990736624},{\\\"x\\\":-73.98632369935513,\\\"y\\\":40.735571755545806},{\\\"x\\\":-73.98622360456956,\\\"y\\\":40.73570641325812},{\\\"x\\\":-73.9861848950386,\\\"y\\\":40.735689633972214},{\\\"x\\\":-73.98621842265129,\\\"y\\\":40.735640856717666},{\\\"x\\\":-73.98620970547199,\\\"y\\\":40.73563475955834}],\\\"c5be729a-e8d3-483c-87c6-fa8d28814b9b\\\":[{\\\"x\\\":-73.98621775209902,\\\"y\\\":40.73563984052446},{\\\"x\\\":-73.98620836436749,\\\"y\\\":40.73563272717173},{\\\"x\\\":-73.98626938462257,\\\"y\\\":40.735550415463514},{\\\"x\\\":-73.98632235825062,\\\"y\\\":40.73557124744871},{\\\"x\\\":-73.98622360456956,\\\"y\\\":40.73570641325812},{\\\"x\\\":-73.98618768252459,\\\"y\\\":40.73568957578454},{\\\"x\\\":-73.98621775209902,\\\"y\\\":40.73563984052446}],\\\"c715c674-86da-496d-896b-c4d84ad44ec8\\\":[{\\\"x\\\":-73.98621909320354,\\\"y\\\":40.735638316234656},{\\\"x\\\":-73.98620836436749,\\\"y\\\":40.7356362838482},{\\\"x\\\":-73.98620769381522,\\\"y\\\":40.73563577575159},{\\\"x\\\":-73.98627005517483,\\\"y\\\":40.73554939926897},{\\\"x\\\":-73.98632302880287,\\\"y\\\":4
0.73557023125444},{\\\"x\\\":-73.98622360456956,\\\"y\\\":40.73570641325812},{\\\"x\\\":-73.98617953062057,\\\"y\\\":40.735689633972214},{\\\"x\\\":-73.98621909320354,\\\"y\\\":40.735638316234656}]},\\\"extension\\\":[]}\\n Nyaplot.core.parse(model, '#vis-d40f4dbd-5edd-44f5-8abc-d0d49eef4fef');\\n };\\n if(window['Nyaplot']==undefined){\\n window.addEventListener('load_nyaplot', render, false);\\n\\treturn;\\n }\\n render();\\n})();\\n</script>\\n\"" | |
| ] | |
| } | |
| ], | |
| "prompt_number": 24 | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": [ | |
| "Let's cluster these points. Below is a function that extracts the points from a list of polygons:" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "collapsed": false, | |
| "input": [ | |
| "# collect the coordinate list of every polygon\n", | |
| "def get_all_poly_points(polys)\n", | |
| "  polys.map { |poly| get_points(poly) }\n", | |
| "end" | |
| ], | |
| "language": "python", | |
| "metadata": {}, | |
| "outputs": [ | |
| { | |
| "metadata": {}, | |
| "output_type": "pyout", | |
| "prompt_number": 25, | |
| "text": [ | |
| ":get_all_poly_points" | |
| ] | |
| } | |
| ], | |
| "prompt_number": 25 | |
| }, | |
| { | |
| "cell_type": "code", | |
| "collapsed": false, | |
| "input": [ | |
| "cluster_poly_points = get_all_poly_points(cluster_polygons)" | |
| ], | |
| "language": "python", | |
| "metadata": {}, | |
| "outputs": [ | |
| { | |
| "metadata": {}, | |
| "output_type": "pyout", | |
| "prompt_number": 26, | |
| "text": [ | |
| "[[[-73.98620970547199, 40.7356342514617], [-73.98627072572708, 40.735547874977094], [-73.98632504045963, 40.73557226364293], [-73.98622445762157, 40.73570995781772], [-73.9861835539341, 40.73569268254945], [-73.98621775209902, 40.735640856717666], [-73.98620970547199, 40.7356342514617]], [[-73.98620769381522, 40.73563526765495], [-73.9862660318613, 40.735547874977094], [-73.98632504045963, 40.735570739351566], [-73.98622579872608, 40.73570944972167], [-73.98618154227734, 40.73569217445325], [-73.98621775209902, 40.73563933242788], [-73.98620769381522, 40.73563526765495]], [[-73.98632369935513, 40.735570739351566], [-73.98622512817383, 40.73570944972167], [-73.98618154227734, 40.73569014206842], [-73.98621909320354, 40.735640856717666], [-73.98620970547199, 40.73563526765495], [-73.98627005517483, 40.73554889117169], [-73.98632369935513, 40.735570739351566]], [[-73.98621842265129, 40.7356423810074], [-73.98620903491974, 40.73563577575159], [-73.98627139627934, 40.735547874977094], [-73.98632436990738, 40.735571755545806], [-73.98622579872608, 40.73570995781772], [-73.98618087172508, 40.735689633972214], [-73.98621842265129, 40.7356423810074]], [[-73.98626938462257, 40.73554889117167], [-73.98632369935513, 40.735572771740024], [-73.98622445762157, 40.73570894162559], [-73.98618154227734, 40.73569065016463], [-73.98621775209902, 40.735640856717666], [-73.98620836436749, 40.735634251461676], [-73.98626938462257, 40.73554889117167]], [[-73.98632571101189, 40.735571755545806], [-73.98622378706932, 40.73570995781772], [-73.98618288338184, 40.73569268254945], [-73.98621775209902, 40.73564034862108], [-73.9862110465765, 40.7356362838482], [-73.98627005517483, 40.735550923560815], [-73.98632571101189, 40.735571755545806]], [[-73.98620970547199, 40.73563475955834], [-73.98627005517483, 40.73554990736624], [-73.98632369935513, 40.735571755545806], [-73.98622360456956, 40.73570641325812], [-73.9861848950386, 40.735689633972214], [-73.98621842265129, 40.735640856717666], 
[-73.98620970547199, 40.73563475955834]], [[-73.98621775209902, 40.73563984052446], [-73.98620836436749, 40.73563272717173], [-73.98626938462257, 40.735550415463514], [-73.98632235825062, 40.73557124744871], [-73.98622360456956, 40.73570641325812], [-73.98618768252459, 40.73568957578454], [-73.98621775209902, 40.73563984052446]], [[-73.98621909320354, 40.735638316234656], [-73.98620836436749, 40.7356362838482], [-73.98620769381522, 40.73563577575159], [-73.98627005517483, 40.73554939926897], [-73.98632302880287, 40.73557023125444], [-73.98622360456956, 40.73570641325812], [-73.98617953062057, 40.735689633972214], [-73.98621909320354, 40.735638316234656]]]" | |
| ] | |
| } | |
| ], | |
| "prompt_number": 26 | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": [ | |
| "Finding a good \u03b5 value for these points is a bit more complicated. If it is too big, the L-shape will be lost: points in that corner will be clustered together. After fiddling around I found a decent value of $6(10^{-6})$.\n", | |
| "\n", | |
| "An important aspect to account for here is that the GeoJSON spec requires the coordinate array of a polygon ring to begin _and end_ with the _same point_. That point would therefore be **counted twice** if we left the array as-is. Below are the resulting clustering function, the corresponding test, and the plot:" | |
| ] | |
| }, | |
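| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": [ | |
| "As a rough check on that choice, the sketch below measures the gap between the two vertices at the inner corner of the L, using coordinates copied from the first polygon above. The names `corner_a`, `corner_b` and `gap` are only for illustration, and distances are in plain degrees, matching the raw coordinates passed to DBSCAN in this notebook.\n", | |
| "\n", | |
| "```ruby\n", | |
| "# two neighbouring vertices of the first polygon (inner corner of the L)\n", | |
| "corner_a = [-73.98621775209902, 40.735640856717666]\n", | |
| "corner_b = [-73.98620970547199, 40.7356342514617]\n", | |
| "\n", | |
| "# euclidean distance in degrees\n", | |
| "gap = Math.hypot(corner_a[0] - corner_b[0], corner_a[1] - corner_b[1])\n", | |
| "\n", | |
| "epsilon = 6e-06\n", | |
| "puts gap           # roughly 1e-05\n", | |
| "puts gap > epsilon # the two corners are farther apart than epsilon\n", | |
| "```\n", | |
| "\n", | |
| "Since the gap is on the order of $10^{-5}$, an \u03b5 close to or above that value could link points from both corners into a single cluster, which is how the L-shape would be lost." | |
| ] | |
| }, | |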
| { | |
| "cell_type": "code", | |
| "collapsed": false, | |
| "input": [ | |
| "def cluster_points(points, epsilon=6e-06, min_points=2)\n", | |
| " dbscan = DBSCAN( points.flatten(1), :epsilon => epsilon, :min_points => min_points, :distance => :euclidean_distance )\n", | |
| " return dbscan.results.select{|k,v| k != -1} # omit the non-cluster\n", | |
| "end" | |
| ], | |
| "language": "python", | |
| "metadata": {}, | |
| "outputs": [ | |
| { | |
| "metadata": {}, | |
| "output_type": "pyout", | |
| "prompt_number": 27, | |
| "text": [ | |
| ":cluster_points" | |
| ] | |
| } | |
| ], | |
| "prompt_number": 27 | |
| }, | |
| { | |
| "cell_type": "code", | |
| "collapsed": false, | |
| "input": [ | |
| "# exclude the first item in each poly since it is the same as the last\n", | |
| "unique_points = cluster_poly_points.map{|poly| poly[1..-1]}\n", | |
| "vertex_clusters = cluster_points(unique_points)" | |
| ], | |
| "language": "python", | |
| "metadata": {}, | |
| "outputs": [ | |
| { | |
| "metadata": {}, | |
| "output_type": "pyout", | |
| "prompt_number": 28, | |
| "text": [ | |
| "{0=>[[-73.98627072572708, 40.735547874977094], [-73.9862660318613, 40.735547874977094], [-73.98627005517483, 40.73554889117169], [-73.98627139627934, 40.735547874977094], [-73.98626938462257, 40.73554889117167], [-73.98627005517483, 40.735550923560815], [-73.98627005517483, 40.73554990736624], [-73.98626938462257, 40.735550415463514], [-73.98627005517483, 40.73554939926897]], 1=>[[-73.98632504045963, 40.73557226364293], [-73.98632504045963, 40.735570739351566], [-73.98632369935513, 40.735570739351566], [-73.98632436990738, 40.735571755545806], [-73.98632369935513, 40.735572771740024], [-73.98632571101189, 40.735571755545806], [-73.98632369935513, 40.735571755545806], [-73.98632235825062, 40.73557124744871], [-73.98632302880287, 40.73557023125444]], 2=>[[-73.98622445762157, 40.73570995781772], [-73.98622579872608, 40.73570944972167], [-73.98622512817383, 40.73570944972167], [-73.98622579872608, 40.73570995781772], [-73.98622445762157, 40.73570894162559], [-73.98622378706932, 40.73570995781772], [-73.98622360456956, 40.73570641325812], [-73.98622360456956, 40.73570641325812], [-73.98622360456956, 40.73570641325812]], 3=>[[-73.9861835539341, 40.73569268254945], [-73.98618154227734, 40.73569217445325], [-73.98618154227734, 40.73569014206842], [-73.98618087172508, 40.735689633972214], [-73.98618154227734, 40.73569065016463], [-73.98618288338184, 40.73569268254945], [-73.9861848950386, 40.735689633972214], [-73.98618768252459, 40.73568957578454], [-73.98617953062057, 40.735689633972214]], 4=>[[-73.98621775209902, 40.735640856717666], [-73.98621775209902, 40.73563933242788], [-73.98621909320354, 40.735640856717666], [-73.98621842265129, 40.7356423810074], [-73.98621775209902, 40.73564034862108], [-73.98621842265129, 40.735640856717666], [-73.98621775209902, 40.73563984052446], [-73.98621909320354, 40.735638316234656], [-73.98621775209902, 40.735640856717666]], 5=>[[-73.98620970547199, 40.7356342514617], [-73.98620769381522, 40.73563526765495], [-73.98620970547199, 
40.73563526765495], [-73.98620903491974, 40.73563577575159], [-73.98620836436749, 40.735634251461676], [-73.9862110465765, 40.7356362838482], [-73.98620970547199, 40.73563475955834], [-73.98620836436749, 40.73563272717173], [-73.98620836436749, 40.7356362838482], [-73.98620769381522, 40.73563577575159]]}" | |
| ] | |
| } | |
| ], | |
| "prompt_number": 28 | |
| }, | |
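| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": [ | |
| "The slicing above assumes every polygon is a closed ring. A slightly more defensive variant, sketched here with a made-up ring and a hypothetical helper named `open_ring`, drops the first point only when it really equals the last one:\n", | |
| "\n", | |
| "```ruby\n", | |
| "# hypothetical helper: remove the duplicated closing point of a ring\n", | |
| "def open_ring(ring)\n", | |
| "  ring.first == ring.last ? ring[1..-1] : ring\n", | |
| "end\n", | |
| "\n", | |
| "# made-up closed triangle, not taken from the data\n", | |
| "ring = [[0.0, 0.0], [1.0, 0.0], [1.0, 1.0], [0.0, 0.0]]\n", | |
| "opened = open_ring(ring)\n", | |
| "puts opened.size # 3 points remain\n", | |
| "```\n", | |
| "\n", | |
| "For the polygons in this notebook both approaches give the same result, since each one starts and ends with the same point." | |
| ] | |
| }, | |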
| { | |
| "cell_type": "code", | |
| "collapsed": false, | |
| "input": [ | |
| "plot = plot_clusters(vertex_clusters)\n", | |
| "plot.show" | |
| ], | |
| "language": "python", | |
| "metadata": {}, | |
| "outputs": [ | |
| { | |
| "html": [ | |
| "<div id='vis-6a777312-113c-4916-9476-4438cd56ec78'></div>\n", | |
| "<script>\n", | |
| "(function(){\n", | |
| " var render = function(){\n", | |
| " var model = {\"panes\":[{\"diagrams\":[{\"type\":\"scatter\",\"options\":{\"x\":\"x\",\"y\":\"y\",\"tooltip_contents\":[\"cluster\"],\"color\":\"#d10d45\"},\"data\":\"06607b62-1b6a-4ccb-a030-0da9f8e3f7ac\"},{\"type\":\"scatter\",\"options\":{\"x\":\"x\",\"y\":\"y\",\"tooltip_contents\":[\"cluster\"],\"color\":\"#12c935\"},\"data\":\"484d65e9-0b8c-4ed4-98d2-965306dd5bda\"},{\"type\":\"scatter\",\"options\":{\"x\":\"x\",\"y\":\"y\",\"tooltip_contents\":[\"cluster\"],\"color\":\"#de6ea2\"},\"data\":\"97316b13-e1ec-417f-824a-4a1c084c10c3\"},{\"type\":\"scatter\",\"options\":{\"x\":\"x\",\"y\":\"y\",\"tooltip_contents\":[\"cluster\"],\"color\":\"#0c0dcd\"},\"data\":\"a89cda52-f106-4b44-95aa-aaa6defc129b\"},{\"type\":\"scatter\",\"options\":{\"x\":\"x\",\"y\":\"y\",\"tooltip_contents\":[\"cluster\"],\"color\":\"#dd318d\"},\"data\":\"615c22b4-81f1-4413-80fa-fc406e6c1efb\"},{\"type\":\"scatter\",\"options\":{\"x\":\"x\",\"y\":\"y\",\"tooltip_contents\":[\"cluster\"],\"color\":\"#242148\"},\"data\":\"d0725452-aa45-4ebb-ac99-361f36b0bb10\"}],\"options\":{\"width\":300,\"height\":400,\"zoom\":true,\"rotate_x_label\":-60,\"xrange\":[-73.98633571101189,-73.98616953062057],\"yrange\":[40.73553787497709,40.73571995781772]}}],\"data\":{\"06607b62-1b6a-4ccb-a030-0da9f8e3f7ac\":[{\"x\":-73.98627072572708,\"y\":40.735547874977094,\"cluster\":0},{\"x\":-73.9862660318613,\"y\":40.735547874977094,\"cluster\":0},{\"x\":-73.98627005517483,\"y\":40.73554889117169,\"cluster\":0},{\"x\":-73.98627139627934,\"y\":40.735547874977094,\"cluster\":0},{\"x\":-73.98626938462257,\"y\":40.73554889117167,\"cluster\":0},{\"x\":-73.98627005517483,\"y\":40.735550923560815,\"cluster\":0},{\"x\":-73.98627005517483,\"y\":40.73554990736624,\"cluster\":0},{\"x\":-73.98626938462257,\"y\":40.735550415463514,\"cluster\":0},{\"x\":-73.98627005517483,\"y\":40.73554939926897,\"cluster\":0}],\"484d65e9-0b8c-4ed4-98d2-965306dd5bda\":[{\"x\":-73.98632504045963,\"y\":40.73557226364293,\"cluster\":1},{\"x\":-73.986325
04045963,\"y\":40.735570739351566,\"cluster\":1},{\"x\":-73.98632369935513,\"y\":40.735570739351566,\"cluster\":1},{\"x\":-73.98632436990738,\"y\":40.735571755545806,\"cluster\":1},{\"x\":-73.98632369935513,\"y\":40.735572771740024,\"cluster\":1},{\"x\":-73.98632571101189,\"y\":40.735571755545806,\"cluster\":1},{\"x\":-73.98632369935513,\"y\":40.735571755545806,\"cluster\":1},{\"x\":-73.98632235825062,\"y\":40.73557124744871,\"cluster\":1},{\"x\":-73.98632302880287,\"y\":40.73557023125444,\"cluster\":1}],\"97316b13-e1ec-417f-824a-4a1c084c10c3\":[{\"x\":-73.98622445762157,\"y\":40.73570995781772,\"cluster\":2},{\"x\":-73.98622579872608,\"y\":40.73570944972167,\"cluster\":2},{\"x\":-73.98622512817383,\"y\":40.73570944972167,\"cluster\":2},{\"x\":-73.98622579872608,\"y\":40.73570995781772,\"cluster\":2},{\"x\":-73.98622445762157,\"y\":40.73570894162559,\"cluster\":2},{\"x\":-73.98622378706932,\"y\":40.73570995781772,\"cluster\":2},{\"x\":-73.98622360456956,\"y\":40.73570641325812,\"cluster\":2},{\"x\":-73.98622360456956,\"y\":40.73570641325812,\"cluster\":2},{\"x\":-73.98622360456956,\"y\":40.73570641325812,\"cluster\":2}],\"a89cda52-f106-4b44-95aa-aaa6defc129b\":[{\"x\":-73.9861835539341,\"y\":40.73569268254945,\"cluster\":3},{\"x\":-73.98618154227734,\"y\":40.73569217445325,\"cluster\":3},{\"x\":-73.98618154227734,\"y\":40.73569014206842,\"cluster\":3},{\"x\":-73.98618087172508,\"y\":40.735689633972214,\"cluster\":3},{\"x\":-73.98618154227734,\"y\":40.73569065016463,\"cluster\":3},{\"x\":-73.98618288338184,\"y\":40.73569268254945,\"cluster\":3},{\"x\":-73.9861848950386,\"y\":40.735689633972214,\"cluster\":3},{\"x\":-73.98618768252459,\"y\":40.73568957578454,\"cluster\":3},{\"x\":-73.98617953062057,\"y\":40.735689633972214,\"cluster\":3}],\"615c22b4-81f1-4413-80fa-fc406e6c1efb\":[{\"x\":-73.98621775209902,\"y\":40.735640856717666,\"cluster\":4},{\"x\":-73.98621775209902,\"y\":40.73563933242788,\"cluster\":4},{\"x\":-73.98621909320354,\"y\":40.735640856717666,\"cluster
\":4},{\"x\":-73.98621842265129,\"y\":40.7356423810074,\"cluster\":4},{\"x\":-73.98621775209902,\"y\":40.73564034862108,\"cluster\":4},{\"x\":-73.98621842265129,\"y\":40.735640856717666,\"cluster\":4},{\"x\":-73.98621775209902,\"y\":40.73563984052446,\"cluster\":4},{\"x\":-73.98621909320354,\"y\":40.735638316234656,\"cluster\":4},{\"x\":-73.98621775209902,\"y\":40.735640856717666,\"cluster\":4}],\"d0725452-aa45-4ebb-ac99-361f36b0bb10\":[{\"x\":-73.98620970547199,\"y\":40.7356342514617,\"cluster\":5},{\"x\":-73.98620769381522,\"y\":40.73563526765495,\"cluster\":5},{\"x\":-73.98620970547199,\"y\":40.73563526765495,\"cluster\":5},{\"x\":-73.98620903491974,\"y\":40.73563577575159,\"cluster\":5},{\"x\":-73.98620836436749,\"y\":40.735634251461676,\"cluster\":5},{\"x\":-73.9862110465765,\"y\":40.7356362838482,\"cluster\":5},{\"x\":-73.98620970547199,\"y\":40.73563475955834,\"cluster\":5},{\"x\":-73.98620836436749,\"y\":40.73563272717173,\"cluster\":5},{\"x\":-73.98620836436749,\"y\":40.7356362838482,\"cluster\":5},{\"x\":-73.98620769381522,\"y\":40.73563577575159,\"cluster\":5}]},\"extension\":[]}\n", | |
| " Nyaplot.core.parse(model, '#vis-6a777312-113c-4916-9476-4438cd56ec78');\n", | |
| " };\n", | |
| " if(window['Nyaplot']==undefined){\n", | |
| " window.addEventListener('load_nyaplot', render, false);\n", | |
| "\treturn;\n", | |
| " }\n", | |
| " render();\n", | |
| "})();\n", | |
| "</script>\n" | |
| ], | |
| "metadata": {}, | |
| "output_type": "pyout", | |
| "prompt_number": 29, | |
| "text": [ | |
| "\"<div id='vis-6a777312-113c-4916-9476-4438cd56ec78'></div>\\n<script>\\n(function(){\\n var render = function(){\\n var model = {\\\"panes\\\":[{\\\"diagrams\\\":[{\\\"type\\\":\\\"scatter\\\",\\\"options\\\":{\\\"x\\\":\\\"x\\\",\\\"y\\\":\\\"y\\\",\\\"tooltip_contents\\\":[\\\"cluster\\\"],\\\"color\\\":\\\"#d10d45\\\"},\\\"data\\\":\\\"06607b62-1b6a-4ccb-a030-0da9f8e3f7ac\\\"},{\\\"type\\\":\\\"scatter\\\",\\\"options\\\":{\\\"x\\\":\\\"x\\\",\\\"y\\\":\\\"y\\\",\\\"tooltip_contents\\\":[\\\"cluster\\\"],\\\"color\\\":\\\"#12c935\\\"},\\\"data\\\":\\\"484d65e9-0b8c-4ed4-98d2-965306dd5bda\\\"},{\\\"type\\\":\\\"scatter\\\",\\\"options\\\":{\\\"x\\\":\\\"x\\\",\\\"y\\\":\\\"y\\\",\\\"tooltip_contents\\\":[\\\"cluster\\\"],\\\"color\\\":\\\"#de6ea2\\\"},\\\"data\\\":\\\"97316b13-e1ec-417f-824a-4a1c084c10c3\\\"},{\\\"type\\\":\\\"scatter\\\",\\\"options\\\":{\\\"x\\\":\\\"x\\\",\\\"y\\\":\\\"y\\\",\\\"tooltip_contents\\\":[\\\"cluster\\\"],\\\"color\\\":\\\"#0c0dcd\\\"},\\\"data\\\":\\\"a89cda52-f106-4b44-95aa-aaa6defc129b\\\"},{\\\"type\\\":\\\"scatter\\\",\\\"options\\\":{\\\"x\\\":\\\"x\\\",\\\"y\\\":\\\"y\\\",\\\"tooltip_contents\\\":[\\\"cluster\\\"],\\\"color\\\":\\\"#dd318d\\\"},\\\"data\\\":\\\"615c22b4-81f1-4413-80fa-fc406e6c1efb\\\"},{\\\"type\\\":\\\"scatter\\\",\\\"options\\\":{\\\"x\\\":\\\"x\\\",\\\"y\\\":\\\"y\\\",\\\"tooltip_contents\\\":[\\\"cluster\\\"],\\\"color\\\":\\\"#242148\\\"},\\\"data\\\":\\\"d0725452-aa45-4ebb-ac99-361f36b0bb10\\\"}],\\\"options\\\":{\\\"width\\\":300,\\\"height\\\":400,\\\"zoom\\\":true,\\\"rotate_x_label\\\":-60,\\\"xrange\\\":[-73.98633571101189,-73.98616953062057],\\\"yrange\\\":[40.73553787497709,40.73571995781772]}}],\\\"data\\\":{\\\"06607b62-1b6a-4ccb-a030-0da9f8e3f7ac\\\":[{\\\"x\\\":-73.98627072572708,\\\"y\\\":40.735547874977094,\\\"cluster\\\":0},{\\\"x\\\":-73.9862660318613,\\\"y\\\":40.735547874977094,\\\"cluster\\\":0},{\\\"x\\\":-73.98627005517483,\\\"y\\\":40.73554889117169,\\\"cluster\\\":0},{\\\"x\\\"
:-73.98627139627934,\\\"y\\\":40.735547874977094,\\\"cluster\\\":0},{\\\"x\\\":-73.98626938462257,\\\"y\\\":40.73554889117167,\\\"cluster\\\":0},{\\\"x\\\":-73.98627005517483,\\\"y\\\":40.735550923560815,\\\"cluster\\\":0},{\\\"x\\\":-73.98627005517483,\\\"y\\\":40.73554990736624,\\\"cluster\\\":0},{\\\"x\\\":-73.98626938462257,\\\"y\\\":40.735550415463514,\\\"cluster\\\":0},{\\\"x\\\":-73.98627005517483,\\\"y\\\":40.73554939926897,\\\"cluster\\\":0}],\\\"484d65e9-0b8c-4ed4-98d2-965306dd5bda\\\":[{\\\"x\\\":-73.98632504045963,\\\"y\\\":40.73557226364293,\\\"cluster\\\":1},{\\\"x\\\":-73.98632504045963,\\\"y\\\":40.735570739351566,\\\"cluster\\\":1},{\\\"x\\\":-73.98632369935513,\\\"y\\\":40.735570739351566,\\\"cluster\\\":1},{\\\"x\\\":-73.98632436990738,\\\"y\\\":40.735571755545806,\\\"cluster\\\":1},{\\\"x\\\":-73.98632369935513,\\\"y\\\":40.735572771740024,\\\"cluster\\\":1},{\\\"x\\\":-73.98632571101189,\\\"y\\\":40.735571755545806,\\\"cluster\\\":1},{\\\"x\\\":-73.98632369935513,\\\"y\\\":40.735571755545806,\\\"cluster\\\":1},{\\\"x\\\":-73.98632235825062,\\\"y\\\":40.73557124744871,\\\"cluster\\\":1},{\\\"x\\\":-73.98632302880287,\\\"y\\\":40.73557023125444,\\\"cluster\\\":1}],\\\"97316b13-e1ec-417f-824a-4a1c084c10c3\\\":[{\\\"x\\\":-73.98622445762157,\\\"y\\\":40.73570995781772,\\\"cluster\\\":2},{\\\"x\\\":-73.98622579872608,\\\"y\\\":40.73570944972167,\\\"cluster\\\":2},{\\\"x\\\":-73.98622512817383,\\\"y\\\":40.73570944972167,\\\"cluster\\\":2},{\\\"x\\\":-73.98622579872608,\\\"y\\\":40.73570995781772,\\\"cluster\\\":2},{\\\"x\\\":-73.98622445762157,\\\"y\\\":40.73570894162559,\\\"cluster\\\":2},{\\\"x\\\":-73.98622378706932,\\\"y\\\":40.73570995781772,\\\"cluster\\\":2},{\\\"x\\\":-73.98622360456956,\\\"y\\\":40.73570641325812,\\\"cluster\\\":2},{\\\"x\\\":-73.98622360456956,\\\"y\\\":40.73570641325812,\\\"cluster\\\":2},{\\\"x\\\":-73.98622360456956,\\\"y\\\":40.73570641325812,\\\"cluster\\\":2}],\\\"a89cda52-f106-4b44-95aa-aaa6defc129b\\\":[{\\\"x\\\":-
73.9861835539341,\\\"y\\\":40.73569268254945,\\\"cluster\\\":3},{\\\"x\\\":-73.98618154227734,\\\"y\\\":40.73569217445325,\\\"cluster\\\":3},{\\\"x\\\":-73.98618154227734,\\\"y\\\":40.73569014206842,\\\"cluster\\\":3},{\\\"x\\\":-73.98618087172508,\\\"y\\\":40.735689633972214,\\\"cluster\\\":3},{\\\"x\\\":-73.98618154227734,\\\"y\\\":40.73569065016463,\\\"cluster\\\":3},{\\\"x\\\":-73.98618288338184,\\\"y\\\":40.73569268254945,\\\"cluster\\\":3},{\\\"x\\\":-73.9861848950386,\\\"y\\\":40.735689633972214,\\\"cluster\\\":3},{\\\"x\\\":-73.98618768252459,\\\"y\\\":40.73568957578454,\\\"cluster\\\":3},{\\\"x\\\":-73.98617953062057,\\\"y\\\":40.735689633972214,\\\"cluster\\\":3}],\\\"615c22b4-81f1-4413-80fa-fc406e6c1efb\\\":[{\\\"x\\\":-73.98621775209902,\\\"y\\\":40.735640856717666,\\\"cluster\\\":4},{\\\"x\\\":-73.98621775209902,\\\"y\\\":40.73563933242788,\\\"cluster\\\":4},{\\\"x\\\":-73.98621909320354,\\\"y\\\":40.735640856717666,\\\"cluster\\\":4},{\\\"x\\\":-73.98621842265129,\\\"y\\\":40.7356423810074,\\\"cluster\\\":4},{\\\"x\\\":-73.98621775209902,\\\"y\\\":40.73564034862108,\\\"cluster\\\":4},{\\\"x\\\":-73.98621842265129,\\\"y\\\":40.735640856717666,\\\"cluster\\\":4},{\\\"x\\\":-73.98621775209902,\\\"y\\\":40.73563984052446,\\\"cluster\\\":4},{\\\"x\\\":-73.98621909320354,\\\"y\\\":40.735638316234656,\\\"cluster\\\":4},{\\\"x\\\":-73.98621775209902,\\\"y\\\":40.735640856717666,\\\"cluster\\\":4}],\\\"d0725452-aa45-4ebb-ac99-361f36b0bb10\\\":[{\\\"x\\\":-73.98620970547199,\\\"y\\\":40.7356342514617,\\\"cluster\\\":5},{\\\"x\\\":-73.98620769381522,\\\"y\\\":40.73563526765495,\\\"cluster\\\":5},{\\\"x\\\":-73.98620970547199,\\\"y\\\":40.73563526765495,\\\"cluster\\\":5},{\\\"x\\\":-73.98620903491974,\\\"y\\\":40.73563577575159,\\\"cluster\\\":5},{\\\"x\\\":-73.98620836436749,\\\"y\\\":40.735634251461676,\\\"cluster\\\":5},{\\\"x\\\":-73.9862110465765,\\\"y\\\":40.7356362838482,\\\"cluster\\\":5},{\\\"x\\\":-73.98620970547199,\\\"y\\\":40.73563475955834,\\\"clust
er\\\":5},{\\\"x\\\":-73.98620836436749,\\\"y\\\":40.73563272717173,\\\"cluster\\\":5},{\\\"x\\\":-73.98620836436749,\\\"y\\\":40.7356362838482,\\\"cluster\\\":5},{\\\"x\\\":-73.98620769381522,\\\"y\\\":40.73563577575159,\\\"cluster\\\":5}]},\\\"extension\\\":[]}\\n Nyaplot.core.parse(model, '#vis-6a777312-113c-4916-9476-4438cd56ec78');\\n };\\n if(window['Nyaplot']==undefined){\\n window.addEventListener('load_nyaplot', render, false);\\n\\treturn;\\n }\\n render();\\n})();\\n</script>\\n\"" | |
| ] | |
| } | |
| ], | |
| "prompt_number": 29 | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": [ | |
| "## 3. Finding the mean polygon\n", | |
| "\n", | |
| "Now we iterate through each vertex cluster and:\n", | |
| "\n", | |
| "1. find the mean vertex\n", | |
| "1. connect the mean vertices into a mean polygon\n", | |
| "\n", | |
| "For this we add two helper methods, `sum` and `mean`, to the `Array` class:" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "collapsed": false, | |
| "input": [ | |
| "class Array\n", | |
| " def sum\n", | |
| " inject(0.0) { |result, el| result + el }\n", | |
| " end\n", | |
| "\n", | |
| " def mean \n", | |
| " sum / size\n", | |
| " end\n", | |
| "end" | |
| ], | |
| "language": "python", | |
| "metadata": {}, | |
| "outputs": [ | |
| { | |
| "metadata": {}, | |
| "output_type": "pyout", | |
| "prompt_number": 30, | |
| "text": [ | |
| ":mean" | |
| ] | |
| } | |
| ], | |
| "prompt_number": 30 | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": [ | |
| "Now we need a function that receives the vertex clusters and returns the average vertex for each cluster:" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "collapsed": false, | |
| "input": [ | |
| "# average the points of each cluster into a single [lon, lat] vertex\n", | |
| "def get_mean_poly(clusters)\n", | |
| "  means = {}\n", | |
| "  clusters.each do |id, points|\n", | |
| "    next if id == -1 # ignore the non-cluster\n", | |
| "    lon = points.map { |c| c[0] }.mean\n", | |
| "    lat = points.map { |c| c[1] }.mean\n", | |
| "    means[id] = [lon, lat]\n", | |
| "  end\n", | |
| "  means\n", | |
| "end" | |
| ], | |
| "language": "python", | |
| "metadata": {}, | |
| "outputs": [ | |
| { | |
| "metadata": {}, | |
| "output_type": "pyout", | |
| "prompt_number": 31, | |
| "text": [ | |
| ":get_mean_poly" | |
| ] | |
| } | |
| ], | |
| "prompt_number": 31 | |
| }, | |
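| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": [ | |
| "To see what the function computes, here is a small self-contained sketch on made-up clusters shaped like the DBSCAN results (`id => [[lon, lat], ...]`). The variable names and values are illustrative only, and the mean is computed with `inject` so the snippet does not depend on the `Array` extensions above:\n", | |
| "\n", | |
| "```ruby\n", | |
| "# made-up clusters; -1 stands for the non-cluster group\n", | |
| "clusters = {\n", | |
| "  -1 => [[9.0, 9.0]],\n", | |
| "  0 => [[0.0, 0.0], [2.0, 4.0]],\n", | |
| "  1 => [[1.0, 1.0], [3.0, 1.0], [2.0, 4.0]]\n", | |
| "}\n", | |
| "\n", | |
| "means = {}\n", | |
| "clusters.each do |id, points|\n", | |
| "  next if id == -1 # skip the non-cluster\n", | |
| "  lon = points.map { |pt| pt[0] }.inject(0.0, :+) / points.size\n", | |
| "  lat = points.map { |pt| pt[1] }.inject(0.0, :+) / points.size\n", | |
| "  means[id] = [lon, lat]\n", | |
| "end\n", | |
| "\n", | |
| "# cluster 0 averages to [1.0, 2.0] and cluster 1 to [2.0, 2.0]\n", | |
| "p means\n", | |
| "```\n", | |
| "\n", | |
| "Each cluster collapses to one vertex, and the non-cluster group is left out of the result." | |
| ] | |
| }, | |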
| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": [ | |
| "We test this function with our vertex clusters and plot both (mean vertices as yellow diamonds):" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "collapsed": false, | |
| "input": [ | |
| "mean_poly = get_mean_poly(vertex_clusters)" | |
| ], | |
| "language": "python", | |
| "metadata": {}, | |
| "outputs": [ | |
| { | |
| "metadata": {}, | |
| "output_type": "pyout", | |
| "prompt_number": 32, | |
| "text": [ | |
| "{0=>[-73.9862696826458, 40.73554911699269], 1=>[-73.98632407188416, 40.73557147326963], 2=>[-73.98622447129412, 40.73570855047738], 3=>[-73.98618267156186, 40.73569075660959], 4=>[-73.98621819913387, 40.73564040507624], 5=>[-73.98620896786451, 40.735635064416286]}" | |
| ] | |
| } | |
| ], | |
| "prompt_number": 32 | |
| }, | |
| { | |
| "cell_type": "code", | |
| "collapsed": false, | |
| "input": [ | |
| "# plot clusters with overlaid (yellow) mean points\n", | |
| "plot = plot_clusters(vertex_clusters)\n", | |
| "# add means\n", | |
| "m_x = mean_poly.map { |m| m[1][0] }\n", | |
| "m_y = mean_poly.map { |m| m[1][1] }\n", | |
| "sc = plot.add(:scatter, m_x, m_y)\n", | |
| "color = \"#ffff00\"\n", | |
| "sc.color(color)\n", | |
| "sc.shape('diamond')\n", | |
| "plot.show" | |
| ], | |
| "language": "python", | |
| "metadata": {}, | |
| "outputs": [ | |
| { | |
| "html": [ | |
| "<div id='vis-2c7602f7-cc7d-4d94-becc-fd61a3798004'></div>\n", | |
| "<script>\n", | |
| "(function(){\n", | |
| " var render = function(){\n", | |
| " var model = {\"panes\":[{\"diagrams\":[{\"type\":\"scatter\",\"options\":{\"x\":\"x\",\"y\":\"y\",\"tooltip_contents\":[\"cluster\"],\"color\":\"#e469de\"},\"data\":\"68ce990c-89db-490b-8361-a6602e98fafa\"},{\"type\":\"scatter\",\"options\":{\"x\":\"x\",\"y\":\"y\",\"tooltip_contents\":[\"cluster\"],\"color\":\"#003a5f\"},\"data\":\"f6612040-9d54-4dbb-a260-889ee7f8bf18\"},{\"type\":\"scatter\",\"options\":{\"x\":\"x\",\"y\":\"y\",\"tooltip_contents\":[\"cluster\"],\"color\":\"#d44634\"},\"data\":\"f8cca8ed-3965-4a6d-bcd0-724c6117ebe1\"},{\"type\":\"scatter\",\"options\":{\"x\":\"x\",\"y\":\"y\",\"tooltip_contents\":[\"cluster\"],\"color\":\"#3bf97f\"},\"data\":\"c9ec7987-1a82-4c01-8eb4-89a433e66050\"},{\"type\":\"scatter\",\"options\":{\"x\":\"x\",\"y\":\"y\",\"tooltip_contents\":[\"cluster\"],\"color\":\"#b45cb3\"},\"data\":\"01d68342-a813-4672-a1bb-aff5a45f8aaa\"},{\"type\":\"scatter\",\"options\":{\"x\":\"x\",\"y\":\"y\",\"tooltip_contents\":[\"cluster\"],\"color\":\"#8413c0\"},\"data\":\"c3189c33-d563-40f4-a651-ea8fa7bc0d7f\"},{\"type\":\"scatter\",\"options\":{\"x\":\"data0\",\"y\":\"data1\",\"color\":\"#ffff00\",\"shape\":\"diamond\"},\"data\":\"d666d9ca-db2d-47b5-832b-3565a182a39e\"}],\"options\":{\"width\":300,\"height\":400,\"zoom\":true,\"rotate_x_label\":-60,\"xrange\":[-73.98633571101189,-73.98616953062057],\"yrange\":[40.73553787497709,40.73571995781772]}}],\"data\":{\"68ce990c-89db-490b-8361-a6602e98fafa\":[{\"x\":-73.98627072572708,\"y\":40.735547874977094,\"cluster\":0},{\"x\":-73.9862660318613,\"y\":40.735547874977094,\"cluster\":0},{\"x\":-73.98627005517483,\"y\":40.73554889117169,\"cluster\":0},{\"x\":-73.98627139627934,\"y\":40.735547874977094,\"cluster\":0},{\"x\":-73.98626938462257,\"y\":40.73554889117167,\"cluster\":0},{\"x\":-73.98627005517483,\"y\":40.735550923560815,\"cluster\":0},{\"x\":-73.98627005517483,\"y\":40.73554990736624,\"cluster\":0},{\"x\":-73.98626938462257,\"y\":40.735550415463514,\"cluster\":0},{\"x\":-73.98627005517483,\
"y\":40.73554939926897,\"cluster\":0}],\"f6612040-9d54-4dbb-a260-889ee7f8bf18\":[{\"x\":-73.98632504045963,\"y\":40.73557226364293,\"cluster\":1},{\"x\":-73.98632504045963,\"y\":40.735570739351566,\"cluster\":1},{\"x\":-73.98632369935513,\"y\":40.735570739351566,\"cluster\":1},{\"x\":-73.98632436990738,\"y\":40.735571755545806,\"cluster\":1},{\"x\":-73.98632369935513,\"y\":40.735572771740024,\"cluster\":1},{\"x\":-73.98632571101189,\"y\":40.735571755545806,\"cluster\":1},{\"x\":-73.98632369935513,\"y\":40.735571755545806,\"cluster\":1},{\"x\":-73.98632235825062,\"y\":40.73557124744871,\"cluster\":1},{\"x\":-73.98632302880287,\"y\":40.73557023125444,\"cluster\":1}],\"f8cca8ed-3965-4a6d-bcd0-724c6117ebe1\":[{\"x\":-73.98622445762157,\"y\":40.73570995781772,\"cluster\":2},{\"x\":-73.98622579872608,\"y\":40.73570944972167,\"cluster\":2},{\"x\":-73.98622512817383,\"y\":40.73570944972167,\"cluster\":2},{\"x\":-73.98622579872608,\"y\":40.73570995781772,\"cluster\":2},{\"x\":-73.98622445762157,\"y\":40.73570894162559,\"cluster\":2},{\"x\":-73.98622378706932,\"y\":40.73570995781772,\"cluster\":2},{\"x\":-73.98622360456956,\"y\":40.73570641325812,\"cluster\":2},{\"x\":-73.98622360456956,\"y\":40.73570641325812,\"cluster\":2},{\"x\":-73.98622360456956,\"y\":40.73570641325812,\"cluster\":2}],\"c9ec7987-1a82-4c01-8eb4-89a433e66050\":[{\"x\":-73.9861835539341,\"y\":40.73569268254945,\"cluster\":3},{\"x\":-73.98618154227734,\"y\":40.73569217445325,\"cluster\":3},{\"x\":-73.98618154227734,\"y\":40.73569014206842,\"cluster\":3},{\"x\":-73.98618087172508,\"y\":40.735689633972214,\"cluster\":3},{\"x\":-73.98618154227734,\"y\":40.73569065016463,\"cluster\":3},{\"x\":-73.98618288338184,\"y\":40.73569268254945,\"cluster\":3},{\"x\":-73.9861848950386,\"y\":40.735689633972214,\"cluster\":3},{\"x\":-73.98618768252459,\"y\":40.73568957578454,\"cluster\":3},{\"x\":-73.98617953062057,\"y\":40.735689633972214,\"cluster\":3}],\"01d68342-a813-4672-a1bb-aff5a45f8aaa\":[{\"x\":-73.98621775209902,\"
y\":40.735640856717666,\"cluster\":4},{\"x\":-73.98621775209902,\"y\":40.73563933242788,\"cluster\":4},{\"x\":-73.98621909320354,\"y\":40.735640856717666,\"cluster\":4},{\"x\":-73.98621842265129,\"y\":40.7356423810074,\"cluster\":4},{\"x\":-73.98621775209902,\"y\":40.73564034862108,\"cluster\":4},{\"x\":-73.98621842265129,\"y\":40.735640856717666,\"cluster\":4},{\"x\":-73.98621775209902,\"y\":40.73563984052446,\"cluster\":4},{\"x\":-73.98621909320354,\"y\":40.735638316234656,\"cluster\":4},{\"x\":-73.98621775209902,\"y\":40.735640856717666,\"cluster\":4}],\"c3189c33-d563-40f4-a651-ea8fa7bc0d7f\":[{\"x\":-73.98620970547199,\"y\":40.7356342514617,\"cluster\":5},{\"x\":-73.98620769381522,\"y\":40.73563526765495,\"cluster\":5},{\"x\":-73.98620970547199,\"y\":40.73563526765495,\"cluster\":5},{\"x\":-73.98620903491974,\"y\":40.73563577575159,\"cluster\":5},{\"x\":-73.98620836436749,\"y\":40.735634251461676,\"cluster\":5},{\"x\":-73.9862110465765,\"y\":40.7356362838482,\"cluster\":5},{\"x\":-73.98620970547199,\"y\":40.73563475955834,\"cluster\":5},{\"x\":-73.98620836436749,\"y\":40.73563272717173,\"cluster\":5},{\"x\":-73.98620836436749,\"y\":40.7356362838482,\"cluster\":5},{\"x\":-73.98620769381522,\"y\":40.73563577575159,\"cluster\":5}],\"d666d9ca-db2d-47b5-832b-3565a182a39e\":[{\"data0\":-73.9862696826458,\"data1\":40.73554911699269},{\"data0\":-73.98632407188416,\"data1\":40.73557147326963},{\"data0\":-73.98622447129412,\"data1\":40.73570855047738},{\"data0\":-73.98618267156186,\"data1\":40.73569075660959},{\"data0\":-73.98621819913387,\"data1\":40.73564040507624},{\"data0\":-73.98620896786451,\"data1\":40.735635064416286}]},\"extension\":[]}\n", | |
| " Nyaplot.core.parse(model, '#vis-2c7602f7-cc7d-4d94-becc-fd61a3798004');\n", | |
| " };\n", | |
| " if(window['Nyaplot']==undefined){\n", | |
| " window.addEventListener('load_nyaplot', render, false);\n", | |
| "\treturn;\n", | |
| " }\n", | |
| " render();\n", | |
| "})();\n", | |
| "</script>\n" | |
| ], | |
| "metadata": {}, | |
| "output_type": "pyout", | |
| "prompt_number": 33, | |
| "text": [ | |
| "\"<div id='vis-2c7602f7-cc7d-4d94-becc-fd61a3798004'></div>\\n<script>\\n(function(){\\n var render = function(){\\n var model = {\\\"panes\\\":[{\\\"diagrams\\\":[{\\\"type\\\":\\\"scatter\\\",\\\"options\\\":{\\\"x\\\":\\\"x\\\",\\\"y\\\":\\\"y\\\",\\\"tooltip_contents\\\":[\\\"cluster\\\"],\\\"color\\\":\\\"#e469de\\\"},\\\"data\\\":\\\"68ce990c-89db-490b-8361-a6602e98fafa\\\"},{\\\"type\\\":\\\"scatter\\\",\\\"options\\\":{\\\"x\\\":\\\"x\\\",\\\"y\\\":\\\"y\\\",\\\"tooltip_contents\\\":[\\\"cluster\\\"],\\\"color\\\":\\\"#003a5f\\\"},\\\"data\\\":\\\"f6612040-9d54-4dbb-a260-889ee7f8bf18\\\"},{\\\"type\\\":\\\"scatter\\\",\\\"options\\\":{\\\"x\\\":\\\"x\\\",\\\"y\\\":\\\"y\\\",\\\"tooltip_contents\\\":[\\\"cluster\\\"],\\\"color\\\":\\\"#d44634\\\"},\\\"data\\\":\\\"f8cca8ed-3965-4a6d-bcd0-724c6117ebe1\\\"},{\\\"type\\\":\\\"scatter\\\",\\\"options\\\":{\\\"x\\\":\\\"x\\\",\\\"y\\\":\\\"y\\\",\\\"tooltip_contents\\\":[\\\"cluster\\\"],\\\"color\\\":\\\"#3bf97f\\\"},\\\"data\\\":\\\"c9ec7987-1a82-4c01-8eb4-89a433e66050\\\"},{\\\"type\\\":\\\"scatter\\\",\\\"options\\\":{\\\"x\\\":\\\"x\\\",\\\"y\\\":\\\"y\\\",\\\"tooltip_contents\\\":[\\\"cluster\\\"],\\\"color\\\":\\\"#b45cb3\\\"},\\\"data\\\":\\\"01d68342-a813-4672-a1bb-aff5a45f8aaa\\\"},{\\\"type\\\":\\\"scatter\\\",\\\"options\\\":{\\\"x\\\":\\\"x\\\",\\\"y\\\":\\\"y\\\",\\\"tooltip_contents\\\":[\\\"cluster\\\"],\\\"color\\\":\\\"#8413c0\\\"},\\\"data\\\":\\\"c3189c33-d563-40f4-a651-ea8fa7bc0d7f\\\"},{\\\"type\\\":\\\"scatter\\\",\\\"options\\\":{\\\"x\\\":\\\"data0\\\",\\\"y\\\":\\\"data1\\\",\\\"color\\\":\\\"#ffff00\\\",\\\"shape\\\":\\\"diamond\\\"},\\\"data\\\":\\\"d666d9ca-db2d-47b5-832b-3565a182a39e\\\"}],\\\"options\\\":{\\\"width\\\":300,\\\"height\\\":400,\\\"zoom\\\":true,\\\"rotate_x_label\\\":-60,\\\"xrange\\\":[-73.98633571101189,-73.98616953062057],\\\"yrange\\\":[40.73553787497709,40.73571995781772]}}],\\\"data\\\":{\\\"68ce990c-89db-490b-8361-a6602e98fafa\\\":[{\\\"x\\\":-73.986270725727
08,\\\"y\\\":40.735547874977094,\\\"cluster\\\":0},{\\\"x\\\":-73.9862660318613,\\\"y\\\":40.735547874977094,\\\"cluster\\\":0},{\\\"x\\\":-73.98627005517483,\\\"y\\\":40.73554889117169,\\\"cluster\\\":0},{\\\"x\\\":-73.98627139627934,\\\"y\\\":40.735547874977094,\\\"cluster\\\":0},{\\\"x\\\":-73.98626938462257,\\\"y\\\":40.73554889117167,\\\"cluster\\\":0},{\\\"x\\\":-73.98627005517483,\\\"y\\\":40.735550923560815,\\\"cluster\\\":0},{\\\"x\\\":-73.98627005517483,\\\"y\\\":40.73554990736624,\\\"cluster\\\":0},{\\\"x\\\":-73.98626938462257,\\\"y\\\":40.735550415463514,\\\"cluster\\\":0},{\\\"x\\\":-73.98627005517483,\\\"y\\\":40.73554939926897,\\\"cluster\\\":0}],\\\"f6612040-9d54-4dbb-a260-889ee7f8bf18\\\":[{\\\"x\\\":-73.98632504045963,\\\"y\\\":40.73557226364293,\\\"cluster\\\":1},{\\\"x\\\":-73.98632504045963,\\\"y\\\":40.735570739351566,\\\"cluster\\\":1},{\\\"x\\\":-73.98632369935513,\\\"y\\\":40.735570739351566,\\\"cluster\\\":1},{\\\"x\\\":-73.98632436990738,\\\"y\\\":40.735571755545806,\\\"cluster\\\":1},{\\\"x\\\":-73.98632369935513,\\\"y\\\":40.735572771740024,\\\"cluster\\\":1},{\\\"x\\\":-73.98632571101189,\\\"y\\\":40.735571755545806,\\\"cluster\\\":1},{\\\"x\\\":-73.98632369935513,\\\"y\\\":40.735571755545806,\\\"cluster\\\":1},{\\\"x\\\":-73.98632235825062,\\\"y\\\":40.73557124744871,\\\"cluster\\\":1},{\\\"x\\\":-73.98632302880287,\\\"y\\\":40.73557023125444,\\\"cluster\\\":1}],\\\"f8cca8ed-3965-4a6d-bcd0-724c6117ebe1\\\":[{\\\"x\\\":-73.98622445762157,\\\"y\\\":40.73570995781772,\\\"cluster\\\":2},{\\\"x\\\":-73.98622579872608,\\\"y\\\":40.73570944972167,\\\"cluster\\\":2},{\\\"x\\\":-73.98622512817383,\\\"y\\\":40.73570944972167,\\\"cluster\\\":2},{\\\"x\\\":-73.98622579872608,\\\"y\\\":40.73570995781772,\\\"cluster\\\":2},{\\\"x\\\":-73.98622445762157,\\\"y\\\":40.73570894162559,\\\"cluster\\\":2},{\\\"x\\\":-73.98622378706932,\\\"y\\\":40.73570995781772,\\\"cluster\\\":2},{\\\"x\\\":-73.98622360456956,\\\"y\\\":40.73570641325812,\\\"cluster\\\":2
},{\\\"x\\\":-73.98622360456956,\\\"y\\\":40.73570641325812,\\\"cluster\\\":2},{\\\"x\\\":-73.98622360456956,\\\"y\\\":40.73570641325812,\\\"cluster\\\":2}],\\\"c9ec7987-1a82-4c01-8eb4-89a433e66050\\\":[{\\\"x\\\":-73.9861835539341,\\\"y\\\":40.73569268254945,\\\"cluster\\\":3},{\\\"x\\\":-73.98618154227734,\\\"y\\\":40.73569217445325,\\\"cluster\\\":3},{\\\"x\\\":-73.98618154227734,\\\"y\\\":40.73569014206842,\\\"cluster\\\":3},{\\\"x\\\":-73.98618087172508,\\\"y\\\":40.735689633972214,\\\"cluster\\\":3},{\\\"x\\\":-73.98618154227734,\\\"y\\\":40.73569065016463,\\\"cluster\\\":3},{\\\"x\\\":-73.98618288338184,\\\"y\\\":40.73569268254945,\\\"cluster\\\":3},{\\\"x\\\":-73.9861848950386,\\\"y\\\":40.735689633972214,\\\"cluster\\\":3},{\\\"x\\\":-73.98618768252459,\\\"y\\\":40.73568957578454,\\\"cluster\\\":3},{\\\"x\\\":-73.98617953062057,\\\"y\\\":40.735689633972214,\\\"cluster\\\":3}],\\\"01d68342-a813-4672-a1bb-aff5a45f8aaa\\\":[{\\\"x\\\":-73.98621775209902,\\\"y\\\":40.735640856717666,\\\"cluster\\\":4},{\\\"x\\\":-73.98621775209902,\\\"y\\\":40.73563933242788,\\\"cluster\\\":4},{\\\"x\\\":-73.98621909320354,\\\"y\\\":40.735640856717666,\\\"cluster\\\":4},{\\\"x\\\":-73.98621842265129,\\\"y\\\":40.7356423810074,\\\"cluster\\\":4},{\\\"x\\\":-73.98621775209902,\\\"y\\\":40.73564034862108,\\\"cluster\\\":4},{\\\"x\\\":-73.98621842265129,\\\"y\\\":40.735640856717666,\\\"cluster\\\":4},{\\\"x\\\":-73.98621775209902,\\\"y\\\":40.73563984052446,\\\"cluster\\\":4},{\\\"x\\\":-73.98621909320354,\\\"y\\\":40.735638316234656,\\\"cluster\\\":4},{\\\"x\\\":-73.98621775209902,\\\"y\\\":40.735640856717666,\\\"cluster\\\":4}],\\\"c3189c33-d563-40f4-a651-ea8fa7bc0d7f\\\":[{\\\"x\\\":-73.98620970547199,\\\"y\\\":40.7356342514617,\\\"cluster\\\":5},{\\\"x\\\":-73.98620769381522,\\\"y\\\":40.73563526765495,\\\"cluster\\\":5},{\\\"x\\\":-73.98620970547199,\\\"y\\\":40.73563526765495,\\\"cluster\\\":5},{\\\"x\\\":-73.98620903491974,\\\"y\\\":40.73563577575159,\\\"cluster\\\":5},{\\\"
x\\\":-73.98620836436749,\\\"y\\\":40.735634251461676,\\\"cluster\\\":5},{\\\"x\\\":-73.9862110465765,\\\"y\\\":40.7356362838482,\\\"cluster\\\":5},{\\\"x\\\":-73.98620970547199,\\\"y\\\":40.73563475955834,\\\"cluster\\\":5},{\\\"x\\\":-73.98620836436749,\\\"y\\\":40.73563272717173,\\\"cluster\\\":5},{\\\"x\\\":-73.98620836436749,\\\"y\\\":40.7356362838482,\\\"cluster\\\":5},{\\\"x\\\":-73.98620769381522,\\\"y\\\":40.73563577575159,\\\"cluster\\\":5}],\\\"d666d9ca-db2d-47b5-832b-3565a182a39e\\\":[{\\\"data0\\\":-73.9862696826458,\\\"data1\\\":40.73554911699269},{\\\"data0\\\":-73.98632407188416,\\\"data1\\\":40.73557147326963},{\\\"data0\\\":-73.98622447129412,\\\"data1\\\":40.73570855047738},{\\\"data0\\\":-73.98618267156186,\\\"data1\\\":40.73569075660959},{\\\"data0\\\":-73.98621819913387,\\\"data1\\\":40.73564040507624},{\\\"data0\\\":-73.98620896786451,\\\"data1\\\":40.735635064416286}]},\\\"extension\\\":[]}\\n Nyaplot.core.parse(model, '#vis-2c7602f7-cc7d-4d94-becc-fd61a3798004');\\n };\\n if(window['Nyaplot']==undefined){\\n window.addEventListener('load_nyaplot', render, false);\\n\\treturn;\\n }\\n render();\\n})();\\n</script>\\n\"" | |
| ] | |
| } | |
| ], | |
| "prompt_number": 33 | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": [ | |
| "## 4. Connecting it all\n", | |
| "\n", | |
| "So far we have a set of points that seem to be the most likely vertices of the mean polygon drawn by our contributors. However, there are **many ways in which these points could be connected to each other**.\n", | |
| "\n", | |
| "**DISCLAIMER**:\n", | |
| "\n", | |
| "What follows is a _very_ primitive process that I used to determine the most likely connection between those points. It is the best I could come up with given my limited math knowledge and time. If you have a better idea of how to do this in Ruby, please tweet me at [@mgiraldo](https://twitter.com/mgiraldo).\n", | |
| "\n", | |
| "**/DISCLAIMER**\n", | |
| "\n", | |
| "Before connecting anything we need to validate that we have a reasonable number of clusters to work with: some vertices may be drawn too far from the rest to cluster properly, in which case no cluster is produced for them. We do this by computing the mean number of vertices per polygon ($\\bar{m}$) and comparing it with the cluster count ($\\sum c$). The check requires $\\bar{m}\\leq\\sum c$, so we should have at least _as many_ clusters as the average number of points per polygon.\n", | |
| "\n", | |
| "Not perfect but works most of the time:" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "collapsed": false, | |
| "input": [ | |
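| "def validate_clusters(clusters, unique_points)\n", | |
| " # mean vertices per polygon; flatten yields two numbers per vertex, hence the * 2\n", | |
| " average = (unique_points.flatten.count.to_f / (unique_points.size * 2).to_f).round\n", | |
| " # cluster -1 is the invalid cluster and does not count\n", | |
| " return clusters.select{|k,v| k!=-1}.size >= average\n", | |
| "end" | |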
| ], | |
| "language": "python", | |
| "metadata": {}, | |
| "outputs": [ | |
| { | |
| "metadata": {}, | |
| "output_type": "pyout", | |
| "prompt_number": 34, | |
| "text": [ | |
| ":validate_clusters" | |
| ] | |
| } | |
| ], | |
| "prompt_number": 34 | |
| }, | |
| { | |
| "cell_type": "code", | |
| "collapsed": false, | |
| "input": [ | |
| "validate_clusters(vertex_clusters, unique_points)" | |
| ], | |
| "language": "python", | |
| "metadata": {}, | |
| "outputs": [ | |
| { | |
| "metadata": {}, | |
| "output_type": "pyout", | |
| "prompt_number": 35, | |
| "text": [ | |
| "true" | |
| ] | |
| } | |
| ], | |
| "prompt_number": 35 | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": [ | |
| "Now that this has been verified we proceed to connect.\n", | |
| "\n", | |
| "The general process to connect mean vertices to each other is:\n", | |
| "\n", | |
| "1. for each mean vertex:\n", | |
| " 1. find the cluster of vertices it represents (from_vertices)\n", | |
| " 1. for each vertex in from_vertices:\n", | |
| " 1. find the vertex it is connected to (to_vertex)\n", | |
| " 1. find the cluster to_vertex belongs to (to_cluster)\n", | |
| " 1. add a \"vote\" for to_cluster\n", | |
| " 1. tally the votes\n", | |
| " 1. the to_cluster with most votes is the connected cluster\n", | |
| "1. connect the clusters\n", | |
| "1. validate that the connection makes sense (e.g., that it is a [directed cycle graph](http://en.wikipedia.org/wiki/Cycle_graph))\n", | |
| "\n", | |
| "Below are all the corresponding functions:" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "collapsed": false, | |
| "input": [ | |
| "def find_connected_point(point, original_points)\n", | |
| " original_points.each do |poly|\n", | |
| " poly.each_with_index do |p,index|\n", | |
| " return poly[index+1] if point === p\n", | |
| " end\n", | |
| " end\n", | |
| " return\n", | |
| "end" | |
| ], | |
| "language": "python", | |
| "metadata": {}, | |
| "outputs": [ | |
| { | |
| "metadata": {}, | |
| "output_type": "pyout", | |
| "prompt_number": 36, | |
| "text": [ | |
| ":find_connected_point" | |
| ] | |
| } | |
| ], | |
| "prompt_number": 36 | |
| }, | |
| { | |
| "cell_type": "code", | |
| "collapsed": false, | |
| "input": [ | |
| "def find_cluster_for_point(point, clusters)\n", | |
| " clusters.each do |cluster|\n", | |
| " cluster[1].each do |p|\n", | |
| " return cluster[0] if point === p && cluster[0] != -1\n", | |
| " end\n", | |
| " end\n", | |
| " return\n", | |
| "end" | |
| ], | |
| "language": "python", | |
| "metadata": {}, | |
| "outputs": [ | |
| { | |
| "metadata": {}, | |
| "output_type": "pyout", | |
| "prompt_number": 37, | |
| "text": [ | |
| ":find_cluster_for_point" | |
| ] | |
| } | |
| ], | |
| "prompt_number": 37 | |
| }, | |
| { | |
| "cell_type": "code", | |
| "collapsed": false, | |
| "input": [ | |
| "def connect_clusters(clusters, original_points)\n", | |
| " connections = {}\n", | |
| " # for each cluster\n", | |
| " clusters.each do |cluster|\n", | |
| " # for each point in cluster\n", | |
| " if cluster[0] != -1 # exclude invalid cluster\n", | |
| " cluster_votes = {} # to weigh connection popularity (diff pts might be connected to diff clusters)\n", | |
| " cluster[1].each do |point|\n", | |
| " # find original point connected to it\n", | |
| " connection = find_connected_point(point, original_points)\n", | |
| " connected_cluster = find_cluster_for_point(connection, clusters)\n", | |
| " # if original point belongs to another cluster\n", | |
| " if connected_cluster != nil && connected_cluster != cluster[0]\n", | |
| " # vote for the cluster\n", | |
| " cluster_votes[connected_cluster] = 0 if cluster_votes[connected_cluster] == nil\n", | |
| " cluster_votes[connected_cluster] += 1\n", | |
| " end\n", | |
| " end\n", | |
| " connections[cluster[0]] = cluster_votes.sort_by{|k, v| v}\n", | |
| " next if connections[cluster[0]].size == 0\n", | |
| " connections[cluster[0]] = connections[cluster[0]].reverse[0][0]\n", | |
| " end\n", | |
| " end\n", | |
| " return connections\n", | |
| "end" | |
| ], | |
| "language": "python", | |
| "metadata": {}, | |
| "outputs": [ | |
| { | |
| "metadata": {}, | |
| "output_type": "pyout", | |
| "prompt_number": 38, | |
| "text": [ | |
| ":connect_clusters" | |
| ] | |
| } | |
| ], | |
| "prompt_number": 38 | |
| }, | |
| { | |
| "cell_type": "code", | |
| "collapsed": false, | |
| "input": [ | |
| "connections = connect_clusters(vertex_clusters, cluster_poly_points)" | |
| ], | |
| "language": "python", | |
| "metadata": {}, | |
| "outputs": [ | |
| { | |
| "metadata": {}, | |
| "output_type": "pyout", | |
| "prompt_number": 39, | |
| "text": [ | |
| "{0=>1, 1=>2, 2=>3, 3=>4, 4=>5, 5=>0}" | |
| ] | |
| } | |
| ], | |
| "prompt_number": 39 | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": [ | |
| "As can be seen above, this is a directed cycle graph, and the end result is a clean path from the first vertex to the last one.\n", | |
| "\n", | |
| "The fact that the points are sorted (0 to 1, 1 to 2, 2 to 3, and so on) is somewhat coincidental. Below is a basic function that checks the graph and returns a sorted list of clusters (the order we need to follow to draw the mean polygon):" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "collapsed": false, | |
| "input": [ | |
| "def sort_connections(connections)\n", | |
| " # does some simple check for non-circularity \n", | |
| " sorted = []\n", | |
| " seen = {}\n", | |
| " as_list = connections.select{|k,v| k}\n", | |
| " done = false\n", | |
| " first = as_list.first[0]\n", | |
| " from = first\n", | |
| " while !done do\n", | |
| " to = connections[from]\n", | |
| " done = true if seen[to] || to == nil || to.size == 0\n", | |
| " seen[to] = true\n", | |
| " from = to\n", | |
| " sorted.push(to)\n", | |
| " done = true if seen.size == connections.size\n", | |
| " end\n", | |
| " return nil if seen.size != connections.size\n", | |
| " return sorted\n", | |
| "end" | |
| ], | |
| "language": "python", | |
| "metadata": {}, | |
| "outputs": [ | |
| { | |
| "metadata": {}, | |
| "output_type": "pyout", | |
| "prompt_number": 40, | |
| "text": [ | |
| ":sort_connections" | |
| ] | |
| } | |
| ], | |
| "prompt_number": 40 | |
| }, | |
| { | |
| "cell_type": "code", | |
| "collapsed": false, | |
| "input": [ | |
| "# testing sort function\n", | |
| "sort_connections(connections)" | |
| ], | |
| "language": "python", | |
| "metadata": {}, | |
| "outputs": [ | |
| { | |
| "metadata": {}, | |
| "output_type": "pyout", | |
| "prompt_number": 41, | |
| "text": [ | |
| "[1, 2, 3, 4, 5, 0]" | |
| ] | |
| } | |
| ], | |
| "prompt_number": 41 | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": [ | |
| "Now we can proceed to build our final mean polygon:" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "collapsed": false, | |
| "input": [ | |
| "def connect_mean_poly(mean_poly, connections)\n", | |
| " connected = []\n", | |
| " sorted = sort_connections(connections)\n", | |
| " return nil if sorted == nil\n", | |
| " sorted.each do |c|\n", | |
| " connected.push([mean_poly[c][0], mean_poly[c][1]])\n", | |
| " end\n", | |
| " # for GeoJSON, last == first\n", | |
| " first = sorted[0]\n", | |
| " connected.push([mean_poly[first][0], mean_poly[first][1]])\n", | |
| " return connected\n", | |
| "end" | |
| ], | |
| "language": "python", | |
| "metadata": {}, | |
| "outputs": [ | |
| { | |
| "metadata": {}, | |
| "output_type": "pyout", | |
| "prompt_number": 42, | |
| "text": [ | |
| ":connect_mean_poly" | |
| ] | |
| } | |
| ], | |
| "prompt_number": 42 | |
| }, | |
| { | |
| "cell_type": "code", | |
| "collapsed": false, | |
| "input": [ | |
| "final_polygon = connect_mean_poly(mean_poly, connections)" | |
| ], | |
| "language": "python", | |
| "metadata": {}, | |
| "outputs": [ | |
| { | |
| "metadata": {}, | |
| "output_type": "pyout", | |
| "prompt_number": 43, | |
| "text": [ | |
| "[[-73.98632407188416, 40.73557147326963], [-73.98622447129412, 40.73570855047738], [-73.98618267156186, 40.73569075660959], [-73.98621819913387, 40.73564040507624], [-73.98620896786451, 40.735635064416286], [-73.9862696826458, 40.73554911699269], [-73.98632407188416, 40.73557147326963]]" | |
| ] | |
| } | |
| ], | |
| "prompt_number": 43 | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": [ | |
| "Let's see what all this looks like:" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "collapsed": false, | |
| "input": [ | |
| "plot = plot_clusters(vertex_clusters)\n", | |
| "m_x = final_polygon.map { |m| m[0] }\n", | |
| "m_y = final_polygon.map { |m| m[1] }\n", | |
| "sc = plot.add(:scatter, m_x, m_y)\n", | |
| "color = \"#ffff00\"\n", | |
| "sc.color(color)\n", | |
| "sc.shape('diamond')\n", | |
| "# add the MEAN POLYGON\n", | |
| "final_polygon.each_with_index do |c, i|\n", | |
| " next if i >= final_polygon.size-1\n", | |
| " # each segment is passed as its x values and its y values in separate arrays\n", | |
| " xs = [ final_polygon[i][0], final_polygon[i+1][0] ]\n", | |
| " ys = [ final_polygon[i][1], final_polygon[i+1][1] ]\n", | |
| " plot.add(:line, xs, ys)\n", | |
| "end\n", | |
| "plot.show" | |
| ], | |
| "language": "python", | |
| "metadata": {}, | |
| "outputs": [ | |
| { | |
| "html": [ | |
| "<div id='vis-63b28041-b4a5-46ca-a180-7b81d6e30777'></div>\n", | |
| "<script>\n", | |
| "(function(){\n", | |
| " var render = function(){\n", | |
| " var model = {\"panes\":[{\"diagrams\":[{\"type\":\"scatter\",\"options\":{\"x\":\"x\",\"y\":\"y\",\"tooltip_contents\":[\"cluster\"],\"color\":\"#7315a3\"},\"data\":\"4ab43641-706f-4da1-8813-55d5a79a9440\"},{\"type\":\"scatter\",\"options\":{\"x\":\"x\",\"y\":\"y\",\"tooltip_contents\":[\"cluster\"],\"color\":\"#c15e82\"},\"data\":\"3d2d1699-9cc2-4c56-b1d4-2f73bd2d7992\"},{\"type\":\"scatter\",\"options\":{\"x\":\"x\",\"y\":\"y\",\"tooltip_contents\":[\"cluster\"],\"color\":\"#635e90\"},\"data\":\"56d9819e-9b16-4edd-9c4d-be56208b9aaf\"},{\"type\":\"scatter\",\"options\":{\"x\":\"x\",\"y\":\"y\",\"tooltip_contents\":[\"cluster\"],\"color\":\"#0dd3f0\"},\"data\":\"ee09c06f-593e-455c-b6b3-71ec7ee40662\"},{\"type\":\"scatter\",\"options\":{\"x\":\"x\",\"y\":\"y\",\"tooltip_contents\":[\"cluster\"],\"color\":\"#c3a664\"},\"data\":\"a4bb4d61-07fc-4384-9425-e7fb1b2226bb\"},{\"type\":\"scatter\",\"options\":{\"x\":\"x\",\"y\":\"y\",\"tooltip_contents\":[\"cluster\"],\"color\":\"#4f0528\"},\"data\":\"c2c2ad5e-ef28-46e1-9c06-ba95b054c073\"},{\"type\":\"scatter\",\"options\":{\"x\":\"data0\",\"y\":\"data1\",\"color\":\"#ffff00\",\"shape\":\"diamond\"},\"data\":\"0c10be2f-3343-4f0b-82bf-017e9935aabb\"},{\"type\":\"line\",\"options\":{\"x\":\"data0\",\"y\":\"data1\"},\"data\":\"8adc9db5-013d-4e38-b6ec-7de039499e2c\"},{\"type\":\"line\",\"options\":{\"x\":\"data0\",\"y\":\"data1\"},\"data\":\"7b9f79b0-75f3-4447-80a1-356335986f5d\"},{\"type\":\"line\",\"options\":{\"x\":\"data0\",\"y\":\"data1\"},\"data\":\"3711473d-1dc0-48ef-b83f-e0abff3dc992\"},{\"type\":\"line\",\"options\":{\"x\":\"data0\",\"y\":\"data1\"},\"data\":\"4e4c2b7f-dead-432c-bd59-c9032a81753a\"},{\"type\":\"line\",\"options\":{\"x\":\"data0\",\"y\":\"data1\"},\"data\":\"08f22f21-80bf-409d-98a5-51507c927907\"},{\"type\":\"line\",\"options\":{\"x\":\"data0\",\"y\":\"data1\"},\"data\":\"c104f1a8-fdd8-486e-a43e-7fea2cbc3b5e\"}],\"options\":{\"width\":300,\"height\":400,\"zoom\":true,\"rotate_x_label\":-60,\"xrange\"
:[-73.98633571101189,-73.98616953062057],\"yrange\":[40.73553787497709,40.73571995781772]}}],\"data\":{\"4ab43641-706f-4da1-8813-55d5a79a9440\":[{\"x\":-73.98627072572708,\"y\":40.735547874977094,\"cluster\":0},{\"x\":-73.9862660318613,\"y\":40.735547874977094,\"cluster\":0},{\"x\":-73.98627005517483,\"y\":40.73554889117169,\"cluster\":0},{\"x\":-73.98627139627934,\"y\":40.735547874977094,\"cluster\":0},{\"x\":-73.98626938462257,\"y\":40.73554889117167,\"cluster\":0},{\"x\":-73.98627005517483,\"y\":40.735550923560815,\"cluster\":0},{\"x\":-73.98627005517483,\"y\":40.73554990736624,\"cluster\":0},{\"x\":-73.98626938462257,\"y\":40.735550415463514,\"cluster\":0},{\"x\":-73.98627005517483,\"y\":40.73554939926897,\"cluster\":0}],\"3d2d1699-9cc2-4c56-b1d4-2f73bd2d7992\":[{\"x\":-73.98632504045963,\"y\":40.73557226364293,\"cluster\":1},{\"x\":-73.98632504045963,\"y\":40.735570739351566,\"cluster\":1},{\"x\":-73.98632369935513,\"y\":40.735570739351566,\"cluster\":1},{\"x\":-73.98632436990738,\"y\":40.735571755545806,\"cluster\":1},{\"x\":-73.98632369935513,\"y\":40.735572771740024,\"cluster\":1},{\"x\":-73.98632571101189,\"y\":40.735571755545806,\"cluster\":1},{\"x\":-73.98632369935513,\"y\":40.735571755545806,\"cluster\":1},{\"x\":-73.98632235825062,\"y\":40.73557124744871,\"cluster\":1},{\"x\":-73.98632302880287,\"y\":40.73557023125444,\"cluster\":1}],\"56d9819e-9b16-4edd-9c4d-be56208b9aaf\":[{\"x\":-73.98622445762157,\"y\":40.73570995781772,\"cluster\":2},{\"x\":-73.98622579872608,\"y\":40.73570944972167,\"cluster\":2},{\"x\":-73.98622512817383,\"y\":40.73570944972167,\"cluster\":2},{\"x\":-73.98622579872608,\"y\":40.73570995781772,\"cluster\":2},{\"x\":-73.98622445762157,\"y\":40.73570894162559,\"cluster\":2},{\"x\":-73.98622378706932,\"y\":40.73570995781772,\"cluster\":2},{\"x\":-73.98622360456956,\"y\":40.73570641325812,\"cluster\":2},{\"x\":-73.98622360456956,\"y\":40.73570641325812,\"cluster\":2},{\"x\":-73.98622360456956,\"y\":40.73570641325812,\"cluster\":2}],\"e
e09c06f-593e-455c-b6b3-71ec7ee40662\":[{\"x\":-73.9861835539341,\"y\":40.73569268254945,\"cluster\":3},{\"x\":-73.98618154227734,\"y\":40.73569217445325,\"cluster\":3},{\"x\":-73.98618154227734,\"y\":40.73569014206842,\"cluster\":3},{\"x\":-73.98618087172508,\"y\":40.735689633972214,\"cluster\":3},{\"x\":-73.98618154227734,\"y\":40.73569065016463,\"cluster\":3},{\"x\":-73.98618288338184,\"y\":40.73569268254945,\"cluster\":3},{\"x\":-73.9861848950386,\"y\":40.735689633972214,\"cluster\":3},{\"x\":-73.98618768252459,\"y\":40.73568957578454,\"cluster\":3},{\"x\":-73.98617953062057,\"y\":40.735689633972214,\"cluster\":3}],\"a4bb4d61-07fc-4384-9425-e7fb1b2226bb\":[{\"x\":-73.98621775209902,\"y\":40.735640856717666,\"cluster\":4},{\"x\":-73.98621775209902,\"y\":40.73563933242788,\"cluster\":4},{\"x\":-73.98621909320354,\"y\":40.735640856717666,\"cluster\":4},{\"x\":-73.98621842265129,\"y\":40.7356423810074,\"cluster\":4},{\"x\":-73.98621775209902,\"y\":40.73564034862108,\"cluster\":4},{\"x\":-73.98621842265129,\"y\":40.735640856717666,\"cluster\":4},{\"x\":-73.98621775209902,\"y\":40.73563984052446,\"cluster\":4},{\"x\":-73.98621909320354,\"y\":40.735638316234656,\"cluster\":4},{\"x\":-73.98621775209902,\"y\":40.735640856717666,\"cluster\":4}],\"c2c2ad5e-ef28-46e1-9c06-ba95b054c073\":[{\"x\":-73.98620970547199,\"y\":40.7356342514617,\"cluster\":5},{\"x\":-73.98620769381522,\"y\":40.73563526765495,\"cluster\":5},{\"x\":-73.98620970547199,\"y\":40.73563526765495,\"cluster\":5},{\"x\":-73.98620903491974,\"y\":40.73563577575159,\"cluster\":5},{\"x\":-73.98620836436749,\"y\":40.735634251461676,\"cluster\":5},{\"x\":-73.9862110465765,\"y\":40.7356362838482,\"cluster\":5},{\"x\":-73.98620970547199,\"y\":40.73563475955834,\"cluster\":5},{\"x\":-73.98620836436749,\"y\":40.73563272717173,\"cluster\":5},{\"x\":-73.98620836436749,\"y\":40.7356362838482,\"cluster\":5},{\"x\":-73.98620769381522,\"y\":40.73563577575159,\"cluster\":5}],\"0c10be2f-3343-4f0b-82bf-017e9935aabb\":[{\"data0\"
:-73.98632407188416,\"data1\":40.73557147326963},{\"data0\":-73.98622447129412,\"data1\":40.73570855047738},{\"data0\":-73.98618267156186,\"data1\":40.73569075660959},{\"data0\":-73.98621819913387,\"data1\":40.73564040507624},{\"data0\":-73.98620896786451,\"data1\":40.735635064416286},{\"data0\":-73.9862696826458,\"data1\":40.73554911699269},{\"data0\":-73.98632407188416,\"data1\":40.73557147326963}],\"8adc9db5-013d-4e38-b6ec-7de039499e2c\":[{\"data0\":-73.98632407188416,\"data1\":40.73557147326963},{\"data0\":-73.98622447129412,\"data1\":40.73570855047738}],\"7b9f79b0-75f3-4447-80a1-356335986f5d\":[{\"data0\":-73.98622447129412,\"data1\":40.73570855047738},{\"data0\":-73.98618267156186,\"data1\":40.73569075660959}],\"3711473d-1dc0-48ef-b83f-e0abff3dc992\":[{\"data0\":-73.98618267156186,\"data1\":40.73569075660959},{\"data0\":-73.98621819913387,\"data1\":40.73564040507624}],\"4e4c2b7f-dead-432c-bd59-c9032a81753a\":[{\"data0\":-73.98621819913387,\"data1\":40.73564040507624},{\"data0\":-73.98620896786451,\"data1\":40.735635064416286}],\"08f22f21-80bf-409d-98a5-51507c927907\":[{\"data0\":-73.98620896786451,\"data1\":40.735635064416286},{\"data0\":-73.9862696826458,\"data1\":40.73554911699269}],\"c104f1a8-fdd8-486e-a43e-7fea2cbc3b5e\":[{\"data0\":-73.9862696826458,\"data1\":40.73554911699269},{\"data0\":-73.98632407188416,\"data1\":40.73557147326963}]},\"extension\":[]}\n", | |
| " Nyaplot.core.parse(model, '#vis-63b28041-b4a5-46ca-a180-7b81d6e30777');\n", | |
| " };\n", | |
| " if(window['Nyaplot']==undefined){\n", | |
| " window.addEventListener('load_nyaplot', render, false);\n", | |
| "\treturn;\n", | |
| " }\n", | |
| " render();\n", | |
| "})();\n", | |
| "</script>\n" | |
| ], | |
| "metadata": {}, | |
| "output_type": "pyout", | |
| "prompt_number": 44, | |
| "text": [ | |
| "\"<div id='vis-63b28041-b4a5-46ca-a180-7b81d6e30777'></div>\\n<script>\\n(function(){\\n var render = function(){\\n var model = {\\\"panes\\\":[{\\\"diagrams\\\":[{\\\"type\\\":\\\"scatter\\\",\\\"options\\\":{\\\"x\\\":\\\"x\\\",\\\"y\\\":\\\"y\\\",\\\"tooltip_contents\\\":[\\\"cluster\\\"],\\\"color\\\":\\\"#7315a3\\\"},\\\"data\\\":\\\"4ab43641-706f-4da1-8813-55d5a79a9440\\\"},{\\\"type\\\":\\\"scatter\\\",\\\"options\\\":{\\\"x\\\":\\\"x\\\",\\\"y\\\":\\\"y\\\",\\\"tooltip_contents\\\":[\\\"cluster\\\"],\\\"color\\\":\\\"#c15e82\\\"},\\\"data\\\":\\\"3d2d1699-9cc2-4c56-b1d4-2f73bd2d7992\\\"},{\\\"type\\\":\\\"scatter\\\",\\\"options\\\":{\\\"x\\\":\\\"x\\\",\\\"y\\\":\\\"y\\\",\\\"tooltip_contents\\\":[\\\"cluster\\\"],\\\"color\\\":\\\"#635e90\\\"},\\\"data\\\":\\\"56d9819e-9b16-4edd-9c4d-be56208b9aaf\\\"},{\\\"type\\\":\\\"scatter\\\",\\\"options\\\":{\\\"x\\\":\\\"x\\\",\\\"y\\\":\\\"y\\\",\\\"tooltip_contents\\\":[\\\"cluster\\\"],\\\"color\\\":\\\"#0dd3f0\\\"},\\\"data\\\":\\\"ee09c06f-593e-455c-b6b3-71ec7ee40662\\\"},{\\\"type\\\":\\\"scatter\\\",\\\"options\\\":{\\\"x\\\":\\\"x\\\",\\\"y\\\":\\\"y\\\",\\\"tooltip_contents\\\":[\\\"cluster\\\"],\\\"color\\\":\\\"#c3a664\\\"},\\\"data\\\":\\\"a4bb4d61-07fc-4384-9425-e7fb1b2226bb\\\"},{\\\"type\\\":\\\"scatter\\\",\\\"options\\\":{\\\"x\\\":\\\"x\\\",\\\"y\\\":\\\"y\\\",\\\"tooltip_contents\\\":[\\\"cluster\\\"],\\\"color\\\":\\\"#4f0528\\\"},\\\"data\\\":\\\"c2c2ad5e-ef28-46e1-9c06-ba95b054c073\\\"},{\\\"type\\\":\\\"scatter\\\",\\\"options\\\":{\\\"x\\\":\\\"data0\\\",\\\"y\\\":\\\"data1\\\",\\\"color\\\":\\\"#ffff00\\\",\\\"shape\\\":\\\"diamond\\\"},\\\"data\\\":\\\"0c10be2f-3343-4f0b-82bf-017e9935aabb\\\"},{\\\"type\\\":\\\"line\\\",\\\"options\\\":{\\\"x\\\":\\\"data0\\\",\\\"y\\\":\\\"data1\\\"},\\\"data\\\":\\\"8adc9db5-013d-4e38-b6ec-7de039499e2c\\\"},{\\\"type\\\":\\\"line\\\",\\\"options\\\":{\\\"x\\\":\\\"data0\\\",\\\"y\\\":\\\"data1\\\"},\\\"data\\\":\\\"7b9f79b0-75f3-4447-80a1-356335986f5d
\\\"},{\\\"type\\\":\\\"line\\\",\\\"options\\\":{\\\"x\\\":\\\"data0\\\",\\\"y\\\":\\\"data1\\\"},\\\"data\\\":\\\"3711473d-1dc0-48ef-b83f-e0abff3dc992\\\"},{\\\"type\\\":\\\"line\\\",\\\"options\\\":{\\\"x\\\":\\\"data0\\\",\\\"y\\\":\\\"data1\\\"},\\\"data\\\":\\\"4e4c2b7f-dead-432c-bd59-c9032a81753a\\\"},{\\\"type\\\":\\\"line\\\",\\\"options\\\":{\\\"x\\\":\\\"data0\\\",\\\"y\\\":\\\"data1\\\"},\\\"data\\\":\\\"08f22f21-80bf-409d-98a5-51507c927907\\\"},{\\\"type\\\":\\\"line\\\",\\\"options\\\":{\\\"x\\\":\\\"data0\\\",\\\"y\\\":\\\"data1\\\"},\\\"data\\\":\\\"c104f1a8-fdd8-486e-a43e-7fea2cbc3b5e\\\"}],\\\"options\\\":{\\\"width\\\":300,\\\"height\\\":400,\\\"zoom\\\":true,\\\"rotate_x_label\\\":-60,\\\"xrange\\\":[-73.98633571101189,-73.98616953062057],\\\"yrange\\\":[40.73553787497709,40.73571995781772]}}],\\\"data\\\":{\\\"4ab43641-706f-4da1-8813-55d5a79a9440\\\":[{\\\"x\\\":-73.98627072572708,\\\"y\\\":40.735547874977094,\\\"cluster\\\":0},{\\\"x\\\":-73.9862660318613,\\\"y\\\":40.735547874977094,\\\"cluster\\\":0},{\\\"x\\\":-73.98627005517483,\\\"y\\\":40.73554889117169,\\\"cluster\\\":0},{\\\"x\\\":-73.98627139627934,\\\"y\\\":40.735547874977094,\\\"cluster\\\":0},{\\\"x\\\":-73.98626938462257,\\\"y\\\":40.73554889117167,\\\"cluster\\\":0},{\\\"x\\\":-73.98627005517483,\\\"y\\\":40.735550923560815,\\\"cluster\\\":0},{\\\"x\\\":-73.98627005517483,\\\"y\\\":40.73554990736624,\\\"cluster\\\":0},{\\\"x\\\":-73.98626938462257,\\\"y\\\":40.735550415463514,\\\"cluster\\\":0},{\\\"x\\\":-73.98627005517483,\\\"y\\\":40.73554939926897,\\\"cluster\\\":0}],\\\"3d2d1699-9cc2-4c56-b1d4-2f73bd2d7992\\\":[{\\\"x\\\":-73.98632504045963,\\\"y\\\":40.73557226364293,\\\"cluster\\\":1},{\\\"x\\\":-73.98632504045963,\\\"y\\\":40.735570739351566,\\\"cluster\\\":1},{\\\"x\\\":-73.98632369935513,\\\"y\\\":40.735570739351566,\\\"cluster\\\":1},{\\\"x\\\":-73.98632436990738,\\\"y\\\":40.735571755545806,\\\"cluster\\\":1},{\\\"x\\\":-73.98632369935513,\\\"y\\\":40.735572771740024,\
\\"cluster\\\":1},{\\\"x\\\":-73.98632571101189,\\\"y\\\":40.735571755545806,\\\"cluster\\\":1},{\\\"x\\\":-73.98632369935513,\\\"y\\\":40.735571755545806,\\\"cluster\\\":1},{\\\"x\\\":-73.98632235825062,\\\"y\\\":40.73557124744871,\\\"cluster\\\":1},{\\\"x\\\":-73.98632302880287,\\\"y\\\":40.73557023125444,\\\"cluster\\\":1}],\\\"56d9819e-9b16-4edd-9c4d-be56208b9aaf\\\":[{\\\"x\\\":-73.98622445762157,\\\"y\\\":40.73570995781772,\\\"cluster\\\":2},{\\\"x\\\":-73.98622579872608,\\\"y\\\":40.73570944972167,\\\"cluster\\\":2},{\\\"x\\\":-73.98622512817383,\\\"y\\\":40.73570944972167,\\\"cluster\\\":2},{\\\"x\\\":-73.98622579872608,\\\"y\\\":40.73570995781772,\\\"cluster\\\":2},{\\\"x\\\":-73.98622445762157,\\\"y\\\":40.73570894162559,\\\"cluster\\\":2},{\\\"x\\\":-73.98622378706932,\\\"y\\\":40.73570995781772,\\\"cluster\\\":2},{\\\"x\\\":-73.98622360456956,\\\"y\\\":40.73570641325812,\\\"cluster\\\":2},{\\\"x\\\":-73.98622360456956,\\\"y\\\":40.73570641325812,\\\"cluster\\\":2},{\\\"x\\\":-73.98622360456956,\\\"y\\\":40.73570641325812,\\\"cluster\\\":2}],\\\"ee09c06f-593e-455c-b6b3-71ec7ee40662\\\":[{\\\"x\\\":-73.9861835539341,\\\"y\\\":40.73569268254945,\\\"cluster\\\":3},{\\\"x\\\":-73.98618154227734,\\\"y\\\":40.73569217445325,\\\"cluster\\\":3},{\\\"x\\\":-73.98618154227734,\\\"y\\\":40.73569014206842,\\\"cluster\\\":3},{\\\"x\\\":-73.98618087172508,\\\"y\\\":40.735689633972214,\\\"cluster\\\":3},{\\\"x\\\":-73.98618154227734,\\\"y\\\":40.73569065016463,\\\"cluster\\\":3},{\\\"x\\\":-73.98618288338184,\\\"y\\\":40.73569268254945,\\\"cluster\\\":3},{\\\"x\\\":-73.9861848950386,\\\"y\\\":40.735689633972214,\\\"cluster\\\":3},{\\\"x\\\":-73.98618768252459,\\\"y\\\":40.73568957578454,\\\"cluster\\\":3},{\\\"x\\\":-73.98617953062057,\\\"y\\\":40.735689633972214,\\\"cluster\\\":3}],\\\"a4bb4d61-07fc-4384-9425-e7fb1b2226bb\\\":[{\\\"x\\\":-73.98621775209902,\\\"y\\\":40.735640856717666,\\\"cluster\\\":4},{\\\"x\\\":-73.98621775209902,\\\"y\\\":40.73563933242788,\\\"clus
ter\\\":4},{\\\"x\\\":-73.98621909320354,\\\"y\\\":40.735640856717666,\\\"cluster\\\":4},{\\\"x\\\":-73.98621842265129,\\\"y\\\":40.7356423810074,\\\"cluster\\\":4},{\\\"x\\\":-73.98621775209902,\\\"y\\\":40.73564034862108,\\\"cluster\\\":4},{\\\"x\\\":-73.98621842265129,\\\"y\\\":40.735640856717666,\\\"cluster\\\":4},{\\\"x\\\":-73.98621775209902,\\\"y\\\":40.73563984052446,\\\"cluster\\\":4},{\\\"x\\\":-73.98621909320354,\\\"y\\\":40.735638316234656,\\\"cluster\\\":4},{\\\"x\\\":-73.98621775209902,\\\"y\\\":40.735640856717666,\\\"cluster\\\":4}],\\\"c2c2ad5e-ef28-46e1-9c06-ba95b054c073\\\":[{\\\"x\\\":-73.98620970547199,\\\"y\\\":40.7356342514617,\\\"cluster\\\":5},{\\\"x\\\":-73.98620769381522,\\\"y\\\":40.73563526765495,\\\"cluster\\\":5},{\\\"x\\\":-73.98620970547199,\\\"y\\\":40.73563526765495,\\\"cluster\\\":5},{\\\"x\\\":-73.98620903491974,\\\"y\\\":40.73563577575159,\\\"cluster\\\":5},{\\\"x\\\":-73.98620836436749,\\\"y\\\":40.735634251461676,\\\"cluster\\\":5},{\\\"x\\\":-73.9862110465765,\\\"y\\\":40.7356362838482,\\\"cluster\\\":5},{\\\"x\\\":-73.98620970547199,\\\"y\\\":40.73563475955834,\\\"cluster\\\":5},{\\\"x\\\":-73.98620836436749,\\\"y\\\":40.73563272717173,\\\"cluster\\\":5},{\\\"x\\\":-73.98620836436749,\\\"y\\\":40.7356362838482,\\\"cluster\\\":5},{\\\"x\\\":-73.98620769381522,\\\"y\\\":40.73563577575159,\\\"cluster\\\":5}],\\\"0c10be2f-3343-4f0b-82bf-017e9935aabb\\\":[{\\\"data0\\\":-73.98632407188416,\\\"data1\\\":40.73557147326963},{\\\"data0\\\":-73.98622447129412,\\\"data1\\\":40.73570855047738},{\\\"data0\\\":-73.98618267156186,\\\"data1\\\":40.73569075660959},{\\\"data0\\\":-73.98621819913387,\\\"data1\\\":40.73564040507624},{\\\"data0\\\":-73.98620896786451,\\\"data1\\\":40.735635064416286},{\\\"data0\\\":-73.9862696826458,\\\"data1\\\":40.73554911699269},{\\\"data0\\\":-73.98632407188416,\\\"data1\\\":40.73557147326963}],\\\"8adc9db5-013d-4e38-b6ec-7de039499e2c\\\":[{\\\"data0\\\":-73.98632407188416,\\\"data1\\\":40.73557147326963},{\\
\"data0\\\":-73.98622447129412,\\\"data1\\\":40.73570855047738}],\\\"7b9f79b0-75f3-4447-80a1-356335986f5d\\\":[{\\\"data0\\\":-73.98622447129412,\\\"data1\\\":40.73570855047738},{\\\"data0\\\":-73.98618267156186,\\\"data1\\\":40.73569075660959}],\\\"3711473d-1dc0-48ef-b83f-e0abff3dc992\\\":[{\\\"data0\\\":-73.98618267156186,\\\"data1\\\":40.73569075660959},{\\\"data0\\\":-73.98621819913387,\\\"data1\\\":40.73564040507624}],\\\"4e4c2b7f-dead-432c-bd59-c9032a81753a\\\":[{\\\"data0\\\":-73.98621819913387,\\\"data1\\\":40.73564040507624},{\\\"data0\\\":-73.98620896786451,\\\"data1\\\":40.735635064416286}],\\\"08f22f21-80bf-409d-98a5-51507c927907\\\":[{\\\"data0\\\":-73.98620896786451,\\\"data1\\\":40.735635064416286},{\\\"data0\\\":-73.9862696826458,\\\"data1\\\":40.73554911699269}],\\\"c104f1a8-fdd8-486e-a43e-7fea2cbc3b5e\\\":[{\\\"data0\\\":-73.9862696826458,\\\"data1\\\":40.73554911699269},{\\\"data0\\\":-73.98632407188416,\\\"data1\\\":40.73557147326963}]},\\\"extension\\\":[]}\\n Nyaplot.core.parse(model, '#vis-63b28041-b4a5-46ca-a180-7b81d6e30777');\\n };\\n if(window['Nyaplot']==undefined){\\n window.addEventListener('load_nyaplot', render, false);\\n\\treturn;\\n }\\n render();\\n})();\\n</script>\\n\"" | |
| ] | |
| } | |
| ], | |
| "prompt_number": 44 | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": [ | |
| "To wrap it all up, we create a single consensus function that receives a GeoJSON string and returns a list of mean polygons:" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "collapsed": false, | |
| "input": [ | |
| "def calculate_polygonfix_consensus(geojson)\n", | |
| " output = []\n", | |
| " geom = parse_geojson(geojson)\n", | |
| " centroids = get_all_centroids(geom)\n", | |
| " centroid_clusters = cluster_centroids(centroids)\n", | |
| " centroid_clusters.each do |ccluster|\n", | |
| " next if ccluster[0] == -1\n", | |
| " cluster = ccluster[1] # only the set of latlons\n", | |
| " sub_geom = get_polys_for_centroid_cluster(cluster, centroids, geom)\n", | |
| " next if sub_geom.size == 0\n", | |
| " original_points = get_all_poly_points(sub_geom)\n", | |
| " next if original_points == nil\n", | |
| "    unique_points = original_points.map{|poly| poly[1..-1]} # drop each ring's first point (it repeats as the last)\n", | |
| " vertex_clusters = cluster_points(unique_points)\n", | |
| " next if !validate_clusters(vertex_clusters, unique_points)\n", | |
| " mean_poly = get_mean_poly(vertex_clusters)\n", | |
| " next if mean_poly == {}\n", | |
| " connections = connect_clusters(vertex_clusters, original_points)\n", | |
| " next if connections == nil || connections == {}\n", | |
| " poly = connect_mean_poly(mean_poly, connections)\n", | |
| " next if poly == nil || poly.count == 0\n", | |
| " output.push(poly)\n", | |
| " end\n", | |
| " return output\n", | |
| "end" | |
| ], | |
| "language": "python", | |
| "metadata": {}, | |
| "outputs": [ | |
| { | |
| "metadata": {}, | |
| "output_type": "pyout", | |
| "prompt_number": 45, | |
| "text": [ | |
| ":calculate_polygonfix_consensus" | |
| ] | |
| } | |
| ], | |
| "prompt_number": 45 | |
| }, | |
| { | |
| "cell_type": "code", | |
| "collapsed": false, | |
| "input": [ | |
| "consensus = calculate_polygonfix_consensus(geomstr)" | |
| ], | |
| "language": "python", | |
| "metadata": {}, | |
| "outputs": [ | |
| { | |
| "metadata": {}, | |
| "output_type": "pyout", | |
| "prompt_number": 46, | |
| "text": [ | |
| "[[[-73.98632407188416, 40.73557147326963], [-73.98622447129412, 40.73570855047738], [-73.98618267156186, 40.73569075660959], [-73.98621819913387, 40.73564040507624], [-73.98620896786451, 40.735635064416286], [-73.9862696826458, 40.73554911699269], [-73.98632407188416, 40.73557147326963]]]" | |
| ] | |
| } | |
| ], | |
| "prompt_number": 46 | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": [ | |
| "Wrapped in a GeoJSON `FeatureCollection`, the consensus looks like this:" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "collapsed": false, | |
| "input": [ | |
| "geo_json = {:type => \"FeatureCollection\", :features => consensus.map { |f| {:type => \"Feature\", :properties => { :id => 1 }, :geometry => { :type => \"Polygon\", :coordinates =>[f] } } } }.to_json\n", | |
| "puts geo_json" | |
| ], | |
| "language": "python", | |
| "metadata": {}, | |
| "outputs": [ | |
| { | |
| "output_type": "stream", | |
| "stream": "stdout", | |
| "text": [ | |
| "{\"type\":\"FeatureCollection\",\"features\":[{\"type\":\"Feature\",\"properties\":{\"id\":1},\"geometry\":{\"type\":\"Polygon\",\"coordinates\":[[[-73.98632407188416,40.73557147326963],[-73.98622447129412,40.73570855047738],[-73.98618267156186,40.73569075660959],[-73.98621819913387,40.73564040507624],[-73.98620896786451,40.735635064416286],[-73.9862696826458,40.73554911699269],[-73.98632407188416,40.73557147326963]]]}}]}\n" | |
| ] | |
| } | |
| ], | |
| "prompt_number": 47 | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": [ | |
| "Now let's plot the resulting GeoJSON on the original map (purple):" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "collapsed": false, | |
| "input": [ | |
| "IRuby.html '<iframe src=\"http://jsfiddle.net/mgiraldo/m4XeU/1/embedded/result/\" width=500 height=400></iframe>'" | |
| ], | |
| "language": "python", | |
| "metadata": {}, | |
| "outputs": [ | |
| { | |
| "html": [ | |
| "<iframe src=\"http://jsfiddle.net/mgiraldo/m4XeU/1/embedded/result/\" width=500 height=400></iframe>" | |
| ], | |
| "metadata": {}, | |
| "output_type": "pyout", | |
| "prompt_number": 48, | |
| "text": [ | |
| "\"<iframe src=\\\"http://jsfiddle.net/mgiraldo/m4XeU/1/embedded/result/\\\" width=500 height=400></iframe>\"" | |
| ] | |
| } | |
| ], | |
| "prompt_number": 48 | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": [ | |
| "Voil\u00e0! The mean polygon looks good!" | |
| ] | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": [ | |
| "# Conclusion\n", | |
| "\n", | |
| "This is a first step towards finding geometric consensus from a list of user contributions to a given starting geometry and a map. It is a work in progress and hopefully other ideas can be added to improve this algorithm.\n", | |
| "\n", | |
| "This code is part of NYPL Labs' [Building Inspector](http://buildinginspector.nypl.org/). Explore and fork the [GitHub repository](https://github.com/NYPL/building-inspector).\n", | |
| "\n", | |
| "This notebook was created by [Mauricio Giraldo Arteaga](https://twitter.com/mgiraldo)." | |
| ] | |
| } | |
| ], | |
| "metadata": {} | |
| } | |
| ] | |
| } |
Author
Hi, current version of nyaplot can rotate labels on x and y-axis. Try it if you don't like over-lapping labels :)
Author
Done :)
see it in nbviewer here: http://nbviewer.ipython.org/gist/a68b53175ce5892531bc