Hclust R vs Python
| { | |
| "cells": [ | |
| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": [ | |
| "# Comparison between hierarchical clustering in R and Python" | |
| ] | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": [ | |
| "For R, the version used is 3.2.0.\n", | |
| "\n", | |
| "For Python, the [scipy library](http://docs.scipy.org/doc/scipy/reference/cluster.hierarchy.html) will be used. scikit-learn is also tested, even though its hierarchical clustering is itself based on scipy.\n", | |
| "\n", | |
| "**An important note:** The input matrix for the scipy functions is a **condensed distance matrix** (a flat array containing the upper triangle of the distance matrix). The documentation is wrong on that point ([stackoverflow post](https://stackoverflow.com/questions/18952587/use-distance-matrix-in-scipy-cluster-hierarchy-linkage), [issue](https://github.com/scipy/scipy/issues/2614) and [PR](https://github.com/scipy/scipy/pull/2615) on scipy's github).\n", | |
| "\n", | |
| "For R, the full square distance matrix can be used.\n", | |
| "\n", | |
| "For scikit-learn, we need to set `affinity=\"precomputed\"` when the input is a distance matrix. By default, scikit-learn expects a matrix of observations as input." | |
| ] | |
| }, | |
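| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": [ | |
| "As a quick illustration (with a hypothetical 3x3 matrix, not the data used below), `scipy.spatial.distance.squareform` converts between the square and condensed forms:" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": null, | |
| "metadata": { | |
| "collapsed": false | |
| }, | |
| "outputs": [], | |
| "source": [ | |
| "import numpy as np\n", | |
| "import scipy.spatial.distance as ssd\n", | |
| "\n", | |
| "# Hypothetical symmetric distance matrix with a zero diagonal\n", | |
| "square = np.array([[0., 1., 2.],\n", | |
| "                   [1., 0., 3.],\n", | |
| "                   [2., 3., 0.]])\n", | |
| "condensed = ssd.squareform(square)   # upper triangle: array([ 1.,  2.,  3.])\n", | |
| "ssd.squareform(condensed)            # back to the 3x3 square form" | |
| ] | |
| }, | |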
| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": [ | |
| "### Prepare the data" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": 89, | |
| "metadata": { | |
| "collapsed": false | |
| }, | |
| "outputs": [], | |
| "source": [ | |
| "import PBlib as PB\n", | |
| "import PBclust\n", | |
| "import numpy as np\n", | |
| "from sklearn.cluster import AgglomerativeClustering\n", | |
| "import scipy.cluster.hierarchy as scipyh\n", | |
| "import scipy.spatial.distance as ssd" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": 47, | |
| "metadata": { | |
| "collapsed": true | |
| }, | |
| "outputs": [], | |
| "source": [ | |
| "def medoids(dist, clusters_sklearn):\n", | |
| "    \"\"\"For each cluster label, return the index of the sequence\n", | |
| "    minimizing the summed distance to the cluster members.\"\"\"\n", | |
| "    medo = []\n", | |
| "    for cluster in np.unique(clusters_sklearn):\n", | |
| "        medo.append(np.argmin(np.sum(dist[clusters_sklearn == cluster], axis=0)))\n", | |
| "    return medo" | |
| ] | |
| }, | |
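| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": [ | |
| "A tiny sanity check of the `medoids` helper on a hypothetical 4x4 distance matrix with two obvious clusters (not part of the original analysis):" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": null, | |
| "metadata": { | |
| "collapsed": false | |
| }, | |
| "outputs": [], | |
| "source": [ | |
| "toy_dist = np.array([[0., 1., 4., 5.],\n", | |
| "                     [1., 0., 4., 5.],\n", | |
| "                     [4., 4., 0., 1.],\n", | |
| "                     [5., 5., 1., 0.]])\n", | |
| "toy_labels = np.array([0, 0, 1, 1])\n", | |
| "medoids(toy_dist, toy_labels)   # expected: [0, 2]" | |
| ] | |
| }, | |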
| { | |
| "cell_type": "code", | |
| "execution_count": 90, | |
| "metadata": { | |
| "collapsed": false | |
| }, | |
| "outputs": [ | |
| { | |
| "name": "stdout", | |
| "output_type": "stream", | |
| "text": [ | |
| "read 270 sequences in demo2_tmp/psi_md_traj_all.PB.fasta\n" | |
| ] | |
| } | |
| ], | |
| "source": [ | |
| "#Get the data\n", | |
| "header_lst, seq_lst = PB.read_several_fasta([\"demo2_tmp/psi_md_traj_all.PB.fasta\"])\n", | |
| "substitution_mat = PB.load_substitution_matrix(PB.SUBSTITUTION_MATRIX_NAME)" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": 91, | |
| "metadata": { | |
| "collapsed": false | |
| }, | |
| "outputs": [ | |
| { | |
| "name": "stdout", | |
| "output_type": "stream", | |
| "text": [ | |
| "Building distance matrix\n", | |
| "100%\n", | |
| "Building distance matrix\n", | |
| "100%\n" | |
| ] | |
| } | |
| ], | |
| "source": [ | |
| "#Distances\n", | |
| "distance_mat = PB.distance_matrix(seq_lst, substitution_mat)\n", | |
| "\n", | |
| "#Condensed one for scipy.\n", | |
| "#The diagonal of the square matrix has to be set to 0 first.\n", | |
| "dist_mat = PB.distance_matrix(seq_lst, substitution_mat)\n", | |
| "np.fill_diagonal(dist_mat, 0)\n", | |
| "dist_mat_cond = ssd.squareform(dist_mat)" | |
| ] | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": [ | |
| "## Ward method" | |
| ] | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": [ | |
| "For R versions > 3.0.3, the correct Ward method is known as `ward.D2` (see [stats.exchange](https://stats.stackexchange.com/questions/109949/what-algorithm-does-ward-d-in-hclust-implement-if-it-is-not-wards-criteria) and the corresponding [paper](http://arxiv.org/pdf/1111.6285.pdf)). For R versions <= 3.0.3, the only available method is `ward`.\n", | |
| "\n", | |
| "To obtain the same results with both methods, the distance matrix has to be squared before using the `ward` method. See my [example notebook](http://nbviewer.ipython.org/gist/HubLot/7f0824c33d0b200bb10b).\n", | |
| "\n" | |
| ] | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": [ | |
| "### R" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": 95, | |
| "metadata": { | |
| "collapsed": false | |
| }, | |
| "outputs": [ | |
| { | |
| "name": "stdout", | |
| "output_type": "stream", | |
| "text": [ | |
| "R clustering\n", | |
| "cluster 1: 90 sequences (33%)\n", | |
| "cluster 3: 71 sequences (26%)\n", | |
| "cluster 5: 58 sequences (21%)\n", | |
| "cluster 4: 32 sequences (12%)\n", | |
| "cluster 2: 19 sequences ( 7%)\n", | |
| "Index of medoids: [65, 94, 164, 180, 267]\n", | |
| "\n" | |
| ] | |
| } | |
| ], | |
| "source": [ | |
| "nclusters = 5\n", | |
| "\n", | |
| "#R way\n", | |
| "clusters_R_D2, medoid_R_D2 = PB.hclust(distance_mat, nclusters=nclusters, method='ward.D2')\n", | |
| "\n", | |
| "print(\"R clustering\")\n", | |
| "PBclust.display_clust_report(clusters_R_D2)\n", | |
| "print(\"Index of medoids: {0}\".format(medoid_R_D2))" | |
| ] | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": [ | |
| "### Scipy" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": 60, | |
| "metadata": { | |
| "collapsed": false | |
| }, | |
| "outputs": [ | |
| { | |
| "ename": "ValueError", | |
| "evalue": "Valid methods when the raw observations are omitted are 'single', 'complete', 'weighted', and 'average'.", | |
| "output_type": "error", | |
| "traceback": [ | |
| "\u001b[1;31m---------------------------------------------------------------------------\u001b[0m", | |
| "\u001b[1;31mValueError\u001b[0m Traceback (most recent call last)", | |
| "\u001b[1;32m<ipython-input-60-7ab9ead058a3>\u001b[0m in \u001b[0;36m<module>\u001b[1;34m()\u001b[0m\n\u001b[0;32m 1\u001b[0m \u001b[1;31m# Scipy way condensed\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[1;32m----> 2\u001b[1;33m \u001b[0mdata_link_cond\u001b[0m \u001b[1;33m=\u001b[0m \u001b[0mscipyh\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0mward\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0mdist_mat_cond\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0m\u001b[0;32m 3\u001b[0m \u001b[0mclusters_scipy_cond\u001b[0m \u001b[1;33m=\u001b[0m \u001b[0mscipyh\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0mfcluster\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0mdata_link_cond\u001b[0m\u001b[1;33m,\u001b[0m \u001b[0mnclusters\u001b[0m\u001b[1;33m,\u001b[0m \u001b[0mcriterion\u001b[0m\u001b[1;33m=\u001b[0m\u001b[1;34m'maxclust'\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0;32m 4\u001b[0m \u001b[0mmedoid_scipy_cond\u001b[0m \u001b[1;33m=\u001b[0m \u001b[0mmedoids\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0mdist_array\u001b[0m\u001b[1;33m,\u001b[0m \u001b[0mclusters_scipy_cond\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0;32m 5\u001b[0m \u001b[1;33m\u001b[0m\u001b[0m\n", | |
| "\u001b[1;32m/home/temple/santuz/.virtualenvs/pbxplore/local/lib/python2.7/site-packages/scipy/cluster/hierarchy.pyc\u001b[0m in \u001b[0;36mward\u001b[1;34m(y)\u001b[0m\n\u001b[0;32m 463\u001b[0m \u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0;32m 464\u001b[0m \"\"\"\n\u001b[1;32m--> 465\u001b[1;33m \u001b[1;32mreturn\u001b[0m \u001b[0mlinkage\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0my\u001b[0m\u001b[1;33m,\u001b[0m \u001b[0mmethod\u001b[0m\u001b[1;33m=\u001b[0m\u001b[1;34m'ward'\u001b[0m\u001b[1;33m,\u001b[0m \u001b[0mmetric\u001b[0m\u001b[1;33m=\u001b[0m\u001b[1;34m'euclidean'\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0m\u001b[0;32m 466\u001b[0m \u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0;32m 467\u001b[0m \u001b[1;33m\u001b[0m\u001b[0m\n", | |
| "\u001b[1;32m/home/temple/santuz/.virtualenvs/pbxplore/local/lib/python2.7/site-packages/scipy/cluster/hierarchy.pyc\u001b[0m in \u001b[0;36mlinkage\u001b[1;34m(y, method, metric)\u001b[0m\n\u001b[0;32m 625\u001b[0m \u001b[0md\u001b[0m \u001b[1;33m=\u001b[0m \u001b[0mdistance\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0mnum_obs_y\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0my\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0;32m 626\u001b[0m \u001b[1;32mif\u001b[0m \u001b[0mmethod\u001b[0m \u001b[1;32mnot\u001b[0m \u001b[1;32min\u001b[0m \u001b[0m_cpy_non_euclid_methods\u001b[0m\u001b[1;33m:\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[1;32m--> 627\u001b[1;33m raise ValueError(\"Valid methods when the raw observations are \"\n\u001b[0m\u001b[0;32m 628\u001b[0m \u001b[1;34m\"omitted are 'single', 'complete', 'weighted', \"\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0;32m 629\u001b[0m \"and 'average'.\")\n", | |
| "\u001b[1;31mValueError\u001b[0m: Valid methods when the raw observations are omitted are 'single', 'complete', 'weighted', and 'average'." | |
| ] | |
| } | |
| ], | |
| "source": [ | |
| "# Scipy way condensed\n", | |
| "data_link_cond = scipyh.ward(dist_mat_cond)\n", | |
| "clusters_scipy_cond = scipyh.fcluster(data_link_cond, nclusters, criterion='maxclust')\n", | |
| "medoid_scipy_cond = medoids(distance_mat, clusters_scipy_cond)\n", | |
| " " | |
| ] | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": [ | |
| "You get an explicit exception when using a condensed matrix with the Ward method...\n", | |
| "\n", | |
| "Indeed, for now, it is not possible to use a condensed distance matrix with the Ward method in scipy ([issue on github](https://github.com/scipy/scipy/issues/1798))." | |
| ] | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": [ | |
| "### scikit-learn" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": 113, | |
| "metadata": { | |
| "collapsed": false | |
| }, | |
| "outputs": [ | |
| { | |
| "ename": "ValueError", | |
| "evalue": "precomputed was provided as affinity. Ward can only work with euclidean distances.", | |
| "output_type": "error", | |
| "traceback": [ | |
| "\u001b[1;31m---------------------------------------------------------------------------\u001b[0m", | |
| "\u001b[1;31mValueError\u001b[0m Traceback (most recent call last)", | |
| "\u001b[1;32m<ipython-input-113-40de67e77546>\u001b[0m in \u001b[0;36m<module>\u001b[1;34m()\u001b[0m\n\u001b[0;32m 1\u001b[0m \u001b[1;31m#The input for scikit-learn is a matrix of observations\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0;32m 2\u001b[0m \u001b[1;31m#Since we used a distance matrix, we need to set affinity=\"precomputed\"\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[1;32m----> 3\u001b[1;33m \u001b[0mward\u001b[0m \u001b[1;33m=\u001b[0m \u001b[0mAgglomerativeClustering\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0mn_clusters\u001b[0m\u001b[1;33m=\u001b[0m\u001b[1;36m5\u001b[0m\u001b[1;33m,\u001b[0m \u001b[0mlinkage\u001b[0m\u001b[1;33m=\u001b[0m\u001b[1;34m'ward'\u001b[0m\u001b[1;33m,\u001b[0m \u001b[0maffinity\u001b[0m\u001b[1;33m=\u001b[0m\u001b[1;34m\"precomputed\"\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0mfit\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0mdistance_mat\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0m", | |
| "\u001b[1;32m/home/temple/santuz/.virtualenvs/pbxplore/local/lib/python2.7/site-packages/sklearn/cluster/hierarchical.pyc\u001b[0m in \u001b[0;36mfit\u001b[1;34m(self, X, y)\u001b[0m\n\u001b[0;32m 717\u001b[0m raise ValueError(\"%s was provided as affinity. Ward can only \"\n\u001b[0;32m 718\u001b[0m \u001b[1;34m\"work with euclidean distances.\"\u001b[0m \u001b[1;33m%\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[1;32m--> 719\u001b[1;33m (self.affinity, ))\n\u001b[0m\u001b[0;32m 720\u001b[0m \u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0;32m 721\u001b[0m \u001b[1;32mif\u001b[0m \u001b[0mself\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0mlinkage\u001b[0m \u001b[1;32mnot\u001b[0m \u001b[1;32min\u001b[0m \u001b[0m_TREE_BUILDERS\u001b[0m\u001b[1;33m:\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n", | |
| "\u001b[1;31mValueError\u001b[0m: precomputed was provided as affinity. Ward can only work with euclidean distances." | |
| ] | |
| } | |
| ], | |
| "source": [ | |
| "#The input for scikit-learn is a matrix of observations\n", | |
| "#Since we used a distance matrix, we need to set affinity=\"precomputed\"\n", | |
| "ward = AgglomerativeClustering(n_clusters=5, linkage='ward', affinity=\"precomputed\").fit(distance_mat)" | |
| ] | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": [ | |
| "It's not possible with scikit-learn either, as the `ward` linkage only works with `affinity` set to \"euclidean\"." | |
| ] | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": [ | |
| "## Average method" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": 121, | |
| "metadata": { | |
| "collapsed": false | |
| }, | |
| "outputs": [], | |
| "source": [ | |
| "nclusters = 5\n", | |
| "\n", | |
| "#R way\n", | |
| "clusters_R_ave, medoid_R_ave = PB.hclust(distance_mat, nclusters=nclusters, method='average')\n", | |
| "\n", | |
| "# Scipy way\n", | |
| "data_link_ave = scipyh.linkage(dist_mat_cond, method='average')\n", | |
| "clusters_scipy_ave = scipyh.fcluster(data_link_ave, nclusters, criterion='maxclust')\n", | |
| "medoid_scipy_ave = medoids(distance_mat, clusters_scipy_ave)\n", | |
| "\n", | |
| "#scikit-learn\n", | |
| "average = AgglomerativeClustering(n_clusters=5, linkage='average', affinity=\"precomputed\").fit(distance_mat)\n", | |
| "clusters_sklearn_ave = list(average.labels_)\n", | |
| "medoid_sklearn_ave = medoids(distance_mat, clusters_sklearn_ave)" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": 122, | |
| "metadata": { | |
| "collapsed": false | |
| }, | |
| "outputs": [ | |
| { | |
| "name": "stdout", | |
| "output_type": "stream", | |
| "text": [ | |
| "R clustering\n", | |
| "cluster 1: 90 sequences (33%)\n", | |
| "cluster 3: 69 sequences (26%)\n", | |
| "cluster 4: 57 sequences (21%)\n", | |
| "cluster 2: 53 sequences (20%)\n", | |
| "cluster 5: 1 sequences ( 0%)\n", | |
| "Index of medoids: [65, 180, 164, 267, 230]\n", | |
| "\n", | |
| "scipy clustering\n", | |
| "cluster 1: 90 sequences (33%)\n", | |
| "cluster 2: 69 sequences (26%)\n", | |
| "cluster 4: 57 sequences (21%)\n", | |
| "cluster 3: 53 sequences (20%)\n", | |
| "cluster 5: 1 sequences ( 0%)\n", | |
| "Index of medoids: [65, 164, 180, 267, 230]\n", | |
| "\n", | |
| "scikit-learn clustering\n", | |
| "cluster 3: 90 sequences (33%)\n", | |
| "cluster 2: 69 sequences (26%)\n", | |
| "cluster 0: 57 sequences (21%)\n", | |
| "cluster 1: 53 sequences (20%)\n", | |
| "cluster 4: 1 sequences ( 0%)\n", | |
| "Index of medoids: [267, 180, 164, 65, 230]\n" | |
| ] | |
| } | |
| ], | |
| "source": [ | |
| "#Sum up the results\n", | |
| "\n", | |
| "print(\"R clustering\")\n", | |
| "PBclust.display_clust_report(clusters_R_ave)\n", | |
| "print(\"Index of medoids: {0}\".format(medoid_R_ave))\n", | |
| "print\n", | |
| "\n", | |
| "print(\"scipy clustering\")\n", | |
| "PBclust.display_clust_report(clusters_scipy_ave)\n", | |
| "print(\"Index of medoids: {0}\".format(medoid_scipy_ave))\n", | |
| "print\n", | |
| "\n", | |
| "print(\"scikit-learn clustering\")\n", | |
| "PBclust.display_clust_report(clusters_sklearn_ave)\n", | |
| "print(\"Index of medoids: {0}\".format(medoid_sklearn_ave))" | |
| ] | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": { | |
| "collapsed": true | |
| }, | |
| "source": [ | |
| "### Great, same results!" | |
| ] | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": [ | |
| "## Complete method" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": 117, | |
| "metadata": { | |
| "collapsed": false | |
| }, | |
| "outputs": [], | |
| "source": [ | |
| "nclusters = 5\n", | |
| "\n", | |
| "#R way\n", | |
| "clusters_R_comp, medoid_R_comp = PB.hclust(distance_mat, nclusters=nclusters, method=\"complete\")\n", | |
| "\n", | |
| "# Scipy way\n", | |
| "data_link_comp = scipyh.linkage(dist_mat_cond, method=\"complete\")\n", | |
| "clusters_scipy_comp = scipyh.fcluster(data_link_comp, nclusters, criterion='maxclust')\n", | |
| "medoid_scipy_comp = medoids(distance_mat, clusters_scipy_comp)\n", | |
| "\n", | |
| "#scikit-learn\n", | |
| "complete = AgglomerativeClustering(n_clusters=5, linkage='complete', affinity=\"precomputed\").fit(distance_mat)\n", | |
| "clusters_sklearn_comp = list(complete.labels_)\n", | |
| "medoid_sklearn_comp = medoids(distance_mat, clusters_sklearn_comp)" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": 118, | |
| "metadata": { | |
| "collapsed": false | |
| }, | |
| "outputs": [ | |
| { | |
| "name": "stdout", | |
| "output_type": "stream", | |
| "text": [ | |
| "R clustering\n", | |
| "cluster 1: 90 sequences (33%)\n", | |
| "cluster 3: 62 sequences (23%)\n", | |
| "cluster 5: 54 sequences (20%)\n", | |
| "cluster 4: 36 sequences (13%)\n", | |
| "cluster 2: 28 sequences (10%)\n", | |
| "Index of medoids: [65, 94, 164, 180, 267]\n", | |
| "\n", | |
| "scipy clustering\n", | |
| "cluster 3: 90 sequences (33%)\n", | |
| "cluster 4: 62 sequences (23%)\n", | |
| "cluster 2: 54 sequences (20%)\n", | |
| "cluster 1: 36 sequences (13%)\n", | |
| "cluster 5: 28 sequences (10%)\n", | |
| "Index of medoids: [180, 267, 65, 164, 94]\n", | |
| "\n", | |
| "scikit-learn clustering\n", | |
| "cluster 2: 90 sequences (33%)\n", | |
| "cluster 4: 62 sequences (23%)\n", | |
| "cluster 0: 54 sequences (20%)\n", | |
| "cluster 1: 36 sequences (13%)\n", | |
| "cluster 3: 28 sequences (10%)\n", | |
| "Index of medoids: [267, 180, 65, 94, 164]\n" | |
| ] | |
| } | |
| ], | |
| "source": [ | |
| "#Sum up the results\n", | |
| "\n", | |
| "print(\"R clustering\")\n", | |
| "PBclust.display_clust_report(clusters_R_comp)\n", | |
| "print(\"Index of medoids: {0}\".format(medoid_R_comp))\n", | |
| "print\n", | |
| "\n", | |
| "print(\"scipy clustering\")\n", | |
| "PBclust.display_clust_report(clusters_scipy_comp)\n", | |
| "print(\"Index of medoids: {0}\".format(medoid_scipy_comp))\n", | |
| "print\n", | |
| "\n", | |
| "print(\"scikit-learn clustering\")\n", | |
| "PBclust.display_clust_report(clusters_sklearn_comp)\n", | |
| "print(\"Index of medoids: {0}\".format(medoid_sklearn_comp))" | |
| ] | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": [ | |
| "### Great, same results!" | |
| ] | |
| } | |
| ], | |
| "metadata": { | |
| "kernelspec": { | |
| "display_name": "Python 2", | |
| "language": "python", | |
| "name": "python2" | |
| }, | |
| "language_info": { | |
| "codemirror_mode": { | |
| "name": "ipython", | |
| "version": 2 | |
| }, | |
| "file_extension": ".py", | |
| "mimetype": "text/x-python", | |
| "name": "python", | |
| "nbconvert_exporter": "python", | |
| "pygments_lexer": "ipython2", | |
| "version": "2.7.6" | |
| } | |
| }, | |
| "nbformat": 4, | |
| "nbformat_minor": 0 | |
| } |