Created
August 7, 2020 05:32
Statistical tests and how to use them
| { | |
| "cells": [ | |
| { | |
| "metadata": { | |
| "toc": true | |
| }, | |
| "cell_type": "markdown", | |
| "source": "<h1>Table of Contents<span class=\"tocSkip\"></span></h1>\n<div class=\"toc\"><ul class=\"toc-item\"><li><span><a href=\"#Statistical-Tests-and-how-to-use-them\" data-toc-modified-id=\"Statistical-Tests-and-how-to-use-them-1\"><span class=\"toc-item-num\">1 </span>Statistical Tests and how to use them</a></span><ul class=\"toc-item\"><li><span><a href=\"#Critical-Value-$\\alpha-=-0.05$\" data-toc-modified-id=\"Critical-Value-$\\alpha-=-0.05$-1.1\"><span class=\"toc-item-num\">1.1 </span>Critical Value $\\alpha = 0.05$</a></span></li><li><span><a href=\"#One-sample-T-test\" data-toc-modified-id=\"One-sample-T-test-1.2\"><span class=\"toc-item-num\">1.2 </span>One sample T-test</a></span></li><li><span><a href=\"#Fireplaces-and-House-Prices\" data-toc-modified-id=\"Fireplaces-and-House-Prices-1.3\"><span class=\"toc-item-num\">1.3 </span>Fireplaces and House Prices</a></span><ul class=\"toc-item\"><li><span><a href=\"#Independent-samples-t-test\" data-toc-modified-id=\"Independent-samples-t-test-1.3.1\"><span class=\"toc-item-num\">1.3.1 </span>Independent samples t-test</a></span></li><li><span><a href=\"#One-way-anova\" data-toc-modified-id=\"One-way-anova-1.3.2\"><span class=\"toc-item-num\">1.3.2 </span>One way anova</a></span><ul class=\"toc-item\"><li><span><a href=\"#Tukey-HSD\" data-toc-modified-id=\"Tukey-HSD-1.3.2.1\"><span class=\"toc-item-num\">1.3.2.1 </span>Tukey HSD</a></span></li></ul></li><li><span><a href=\"#Mann-Whitney-U-test\" data-toc-modified-id=\"Mann-Whitney-U-test-1.3.3\"><span class=\"toc-item-num\">1.3.3 </span>Mann Whitney U test</a></span></li><li><span><a href=\"#Kruskal-Wallis-test\" data-toc-modified-id=\"Kruskal-Wallis-test-1.3.4\"><span class=\"toc-item-num\">1.3.4 </span>Kruskal Wallis test</a></span></li></ul></li><li><span><a href=\"#Exploration-what-variables-are-co-related-with-SalePrice\" data-toc-modified-id=\"Exploration-what-variables-are-co-related-with-SalePrice-1.4\"><span class=\"toc-item-num\">1.4 
</span>Exploration what variables are co-related with SalePrice</a></span></li><li><span><a href=\"#Wrap-up\" data-toc-modified-id=\"Wrap-up-1.5\"><span class=\"toc-item-num\">1.5 </span>Wrap-up</a></span></li></ul></li></ul></div>" | |
| }, | |
| { | |
| "metadata": {}, | |
| "cell_type": "markdown", | |
| "source": "# Statistical Tests and how to use them\n\n" | |
| }, | |
| { | |
| "metadata": {}, | |
| "cell_type": "markdown", | |
| "source": "For this demo, I am using the [House Prices dataset available on Kaggle](https://www.kaggle.com/c/house-prices-advanced-regression-techniques) \n\nThis data is also called the [Ames Housing dataset](http://jse.amstat.org/v19n3/decock.pdf) and it contains individual residential transactions from 2006-2010 in Ames, Iowa, USA. It contains 23 nominal, 23 ordinal, 14 discrete, and 20 continuous variables" | |
| }, | |
| { | |
| "metadata": { | |
| "ExecuteTime": { | |
| "end_time": "2020-08-05T14:40:25.359319Z", | |
| "start_time": "2020-08-05T14:40:22.548732Z" | |
| }, | |
| "trusted": true | |
| }, | |
| "cell_type": "code", | |
| "source": "import pandas as pd\nfrom pandas_profiling import ProfileReport\nimport plotly.express as px\nimport scipy.stats  # import the submodule explicitly so that scipy.stats.* resolves\nimport numpy as np\nimport statsmodels.api as sm\nfrom statsmodels.formula.api import ols\nfrom statsmodels.stats.multicomp import pairwise_tukeyhsd", | |
| "execution_count": 62, | |
| "outputs": [] | |
| }, | |
| { | |
| "metadata": { | |
| "ExecuteTime": { | |
| "end_time": "2020-08-05T14:40:27.695542Z", | |
| "start_time": "2020-08-05T14:40:27.581296Z" | |
| }, | |
| "trusted": true | |
| }, | |
| "cell_type": "code", | |
| "source": "df = pd.read_csv(\"/Users/gautamborgohain/pers/data/kaggle_house_prices/train.csv\",na_values=\"NaN\")\nprint(df.shape)\nprint(df.columns)\ndf.head()", | |
| "execution_count": 2, | |
| "outputs": [ | |
| { | |
| "name": "stdout", | |
| "output_type": "stream", | |
| "text": "(1460, 81)\nIndex(['Id', 'MSSubClass', 'MSZoning', 'LotFrontage', 'LotArea', 'Street',\n 'Alley', 'LotShape', 'LandContour', 'Utilities', 'LotConfig',\n 'LandSlope', 'Neighborhood', 'Condition1', 'Condition2', 'BldgType',\n 'HouseStyle', 'OverallQual', 'OverallCond', 'YearBuilt', 'YearRemodAdd',\n 'RoofStyle', 'RoofMatl', 'Exterior1st', 'Exterior2nd', 'MasVnrType',\n 'MasVnrArea', 'ExterQual', 'ExterCond', 'Foundation', 'BsmtQual',\n 'BsmtCond', 'BsmtExposure', 'BsmtFinType1', 'BsmtFinSF1',\n 'BsmtFinType2', 'BsmtFinSF2', 'BsmtUnfSF', 'TotalBsmtSF', 'Heating',\n 'HeatingQC', 'CentralAir', 'Electrical', '1stFlrSF', '2ndFlrSF',\n 'LowQualFinSF', 'GrLivArea', 'BsmtFullBath', 'BsmtHalfBath', 'FullBath',\n 'HalfBath', 'BedroomAbvGr', 'KitchenAbvGr', 'KitchenQual',\n 'TotRmsAbvGrd', 'Functional', 'Fireplaces', 'FireplaceQu', 'GarageType',\n 'GarageYrBlt', 'GarageFinish', 'GarageCars', 'GarageArea', 'GarageQual',\n 'GarageCond', 'PavedDrive', 'WoodDeckSF', 'OpenPorchSF',\n 'EnclosedPorch', '3SsnPorch', 'ScreenPorch', 'PoolArea', 'PoolQC',\n 'Fence', 'MiscFeature', 'MiscVal', 'MoSold', 'YrSold', 'SaleType',\n 'SaleCondition', 'SalePrice'],\n dtype='object')\n" | |
| }, | |
| { | |
| "data": { | |
| "text/html": "<div>\n<style scoped>\n .dataframe tbody tr th:only-of-type {\n vertical-align: middle;\n }\n\n .dataframe tbody tr th {\n vertical-align: top;\n }\n\n .dataframe thead th {\n text-align: right;\n }\n</style>\n<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th></th>\n <th>Id</th>\n <th>MSSubClass</th>\n <th>MSZoning</th>\n <th>LotFrontage</th>\n <th>LotArea</th>\n <th>Street</th>\n <th>Alley</th>\n <th>LotShape</th>\n <th>LandContour</th>\n <th>Utilities</th>\n <th>...</th>\n <th>PoolArea</th>\n <th>PoolQC</th>\n <th>Fence</th>\n <th>MiscFeature</th>\n <th>MiscVal</th>\n <th>MoSold</th>\n <th>YrSold</th>\n <th>SaleType</th>\n <th>SaleCondition</th>\n <th>SalePrice</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>0</th>\n <td>1</td>\n <td>60</td>\n <td>RL</td>\n <td>65.0</td>\n <td>8450</td>\n <td>Pave</td>\n <td>NaN</td>\n <td>Reg</td>\n <td>Lvl</td>\n <td>AllPub</td>\n <td>...</td>\n <td>0</td>\n <td>NaN</td>\n <td>NaN</td>\n <td>NaN</td>\n <td>0</td>\n <td>2</td>\n <td>2008</td>\n <td>WD</td>\n <td>Normal</td>\n <td>208500</td>\n </tr>\n <tr>\n <th>1</th>\n <td>2</td>\n <td>20</td>\n <td>RL</td>\n <td>80.0</td>\n <td>9600</td>\n <td>Pave</td>\n <td>NaN</td>\n <td>Reg</td>\n <td>Lvl</td>\n <td>AllPub</td>\n <td>...</td>\n <td>0</td>\n <td>NaN</td>\n <td>NaN</td>\n <td>NaN</td>\n <td>0</td>\n <td>5</td>\n <td>2007</td>\n <td>WD</td>\n <td>Normal</td>\n <td>181500</td>\n </tr>\n <tr>\n <th>2</th>\n <td>3</td>\n <td>60</td>\n <td>RL</td>\n <td>68.0</td>\n <td>11250</td>\n <td>Pave</td>\n <td>NaN</td>\n <td>IR1</td>\n <td>Lvl</td>\n <td>AllPub</td>\n <td>...</td>\n <td>0</td>\n <td>NaN</td>\n <td>NaN</td>\n <td>NaN</td>\n <td>0</td>\n <td>9</td>\n <td>2008</td>\n <td>WD</td>\n <td>Normal</td>\n <td>223500</td>\n </tr>\n <tr>\n <th>3</th>\n <td>4</td>\n <td>70</td>\n <td>RL</td>\n <td>60.0</td>\n <td>9550</td>\n <td>Pave</td>\n <td>NaN</td>\n <td>IR1</td>\n <td>Lvl</td>\n <td>AllPub</td>\n <td>...</td>\n 
<td>0</td>\n <td>NaN</td>\n <td>NaN</td>\n <td>NaN</td>\n <td>0</td>\n <td>2</td>\n <td>2006</td>\n <td>WD</td>\n <td>Abnorml</td>\n <td>140000</td>\n </tr>\n <tr>\n <th>4</th>\n <td>5</td>\n <td>60</td>\n <td>RL</td>\n <td>84.0</td>\n <td>14260</td>\n <td>Pave</td>\n <td>NaN</td>\n <td>IR1</td>\n <td>Lvl</td>\n <td>AllPub</td>\n <td>...</td>\n <td>0</td>\n <td>NaN</td>\n <td>NaN</td>\n <td>NaN</td>\n <td>0</td>\n <td>12</td>\n <td>2008</td>\n <td>WD</td>\n <td>Normal</td>\n <td>250000</td>\n </tr>\n </tbody>\n</table>\n<p>5 rows × 81 columns</p>\n</div>", | |
| "text/plain": " Id MSSubClass MSZoning LotFrontage LotArea Street Alley LotShape \\\n0 1 60 RL 65.0 8450 Pave NaN Reg \n1 2 20 RL 80.0 9600 Pave NaN Reg \n2 3 60 RL 68.0 11250 Pave NaN IR1 \n3 4 70 RL 60.0 9550 Pave NaN IR1 \n4 5 60 RL 84.0 14260 Pave NaN IR1 \n\n LandContour Utilities ... PoolArea PoolQC Fence MiscFeature MiscVal MoSold \\\n0 Lvl AllPub ... 0 NaN NaN NaN 0 2 \n1 Lvl AllPub ... 0 NaN NaN NaN 0 5 \n2 Lvl AllPub ... 0 NaN NaN NaN 0 9 \n3 Lvl AllPub ... 0 NaN NaN NaN 0 2 \n4 Lvl AllPub ... 0 NaN NaN NaN 0 12 \n\n YrSold SaleType SaleCondition SalePrice \n0 2008 WD Normal 208500 \n1 2007 WD Normal 181500 \n2 2008 WD Normal 223500 \n3 2006 WD Abnorml 140000 \n4 2008 WD Normal 250000 \n\n[5 rows x 81 columns]" | |
| }, | |
| "execution_count": 2, | |
| "metadata": {}, | |
| "output_type": "execute_result" | |
| } | |
| ] | |
| }, | |
| { | |
| "metadata": {}, | |
| "cell_type": "markdown", | |
| "source": "For this demo, I am going to start by asking some questions about the housing market in Ames, as sampled in this dataset, and use the statistical tests that I am familiar with to answer them." | |
| }, | |
| { | |
| "metadata": {}, | |
| "cell_type": "markdown", | |
| "source": "## Critical Value $\\alpha = 0.05$\n\nThis is the significance level that we will be using for the tests " | |
| }, | |
| { | |
| "metadata": {}, | |
| "cell_type": "markdown", | |
| "source": "## One sample T-test\n\nThis dataset was obtained from Kaggle and was already prepared and cleaned. However, in the real world, when we are trying to find answers to our business/research questions, we often need to collect the data ourselves. Whenever we collect data for analysis, we need to make sure we are using a sufficiently representative sample of the population. If the mean house price in Ames is \\\\$1 Million but the sample data that we are using to form our conclusions has a mean house price of \\\\$100K, we might have a biased sample and our inferences might not be generalisable. A quick check of this can be done with the one-sample T-test." | |
| }, | |
| { | |
| "metadata": {}, | |
| "cell_type": "markdown", | |
| "source": "For the population mean, I am using the data available on [Zillow](https://www.zillow.com/ames-ia/home-values/) about Ames in 2011" | |
| }, | |
| { | |
| "metadata": { | |
| "ExecuteTime": { | |
| "end_time": "2020-08-06T04:12:44.040864Z", | |
| "start_time": "2020-08-06T04:12:44.037470Z" | |
| }, | |
| "trusted": true | |
| }, | |
| "cell_type": "code", | |
| "source": "print(df['SalePrice'].mean()) # sample mean\navg_sale_price = 178000 # population mean, obtained from Zillow ", | |
| "execution_count": 9, | |
| "outputs": [ | |
| { | |
| "name": "stdout", | |
| "output_type": "stream", | |
| "text": "180921.19589041095\n" | |
| } | |
| ] | |
| }, | |
| { | |
| "metadata": { | |
| "trusted": true | |
| }, | |
| "cell_type": "code", | |
| "source": "scipy.stats.ttest_1samp(df['SalePrice'], avg_sale_price)", | |
| "execution_count": 10, | |
| "outputs": [ | |
| { | |
| "data": { | |
| "text/plain": "Ttest_1sampResult(statistic=1.4050254485601865, pvalue=0.16022658524886843)" | |
| }, | |
| "execution_count": 10, | |
| "metadata": {}, | |
| "output_type": "execute_result" | |
| } | |
| ] | |
| }, | |
| { | |
| "metadata": {}, | |
| "cell_type": "markdown", | |
| "source": "In a one-sample t-test, the H0 is that the sample mean is equal to the population mean. Since the p-value is > $\\alpha$, we cannot reject the H0, so we can conclude that the mean house price in the dataset is statistically similar to the mean of house prices in Ames in general." | |
| }, | |
| { | |
| "metadata": {}, | |
| "cell_type": "markdown", | |
| "source": "WRONG!" | |
| }, | |
| { | |
| "metadata": {}, | |
| "cell_type": "markdown", | |
| "source": "\nWhy? In short: failing to reject a null hypothesis does not mean that we can accept it.\n\nTo explain this, a short primer on hypothesis testing:\n\n - H0: the sample mean is equal to the population mean\n - H1: the sample mean is not equal to the population mean\n\nIf the p-value <= $\\alpha$, H0 is rejected in favour of H1. If the p-value > $\\alpha$, H0 is NOT rejected, but it is not accepted either.\n\n - Rejecting H0 and accepting H1 is like saying: \"The statement 'the means are equal' is *false*, so its opposite must be true.\"\n - Failing to reject H0 is like saying: \"The statement 'the means are equal' cannot be shown to be false, but that does not prove it to be true.\"\n\n**So \"cannot be proved to be false\" != \"it is true\"**\n" | |
| }, | |
| { | |
| "metadata": {}, | |
| "cell_type": "markdown", | |
| "source": "So basically this test is telling us that it does not know if the means are significantly different or not.\n\nThis is kind of anticlimactic, isn't it? We can't conclude whether the sample is representative or not. There is another way to get the answer to this question though. It should have crossed your mind by now that even if we were to somehow get proof that the mean price in the sample is the same as the one in the population, we would still have no idea about the representativeness of the dataset. The mean is just one of the various descriptive statistics used to summarize data, i.e. multiple distributions can have the same mean. " | |
| }, | |
| { | |
| "metadata": {}, | |
| "cell_type": "markdown", | |
| "source": "In order to answer the question of representativeness, therefore, statisticians first do a sample size estimation. I will cover how that is done later." | |
| }, | |
| { | |
| "metadata": {}, | |
| "cell_type": "markdown", | |
| "source": "## Fireplaces and House Prices" | |
| }, | |
| { | |
| "metadata": {}, | |
| "cell_type": "markdown", | |
| "source": "Do you think having fireplaces increases or decreases the value of a house?\n\nThis question is something you will usually get from a \"business\" person. It is not a hypothesis that can be directly tested with a statistical test. It is a question for recommendation. In order to answer this, we need to break it down.\n\nWhat is it exactly that we are looking for - \n\n - That houses with fireplaces are more expensive than the houses without?\n This question would be enough for questions on the average trend in the market - eg. for identifying house owners - If houses with fireplaces are more expensive then the team can focus or ignore this demographic based on what they are marketing.\n \n - That adding a fireplace to the house, will increase the value of the house?\n This question is asking about the effect of the fireplace on the house's price and could be used for deciding the value of the house - Whether to increase the asking price of the house based on the presence of a fireplace\n" | |
| }, | |
| { | |
| "metadata": {}, | |
| "cell_type": "markdown", | |
| "source": "There are probably more interpretations of the question, and that's why, as a Data Scientist, communication and business understanding are so important. In this notebook, I will try to answer the first variant.\n\nTo answer the second one - the causal effect of having fireplaces on house value - we would need to use a different set of tools and methods: Causal Inference on observational data. Statistical tests can determine causal effects too, but only if the design of the study supports that. If we were using data from a field or lab experiment that compared house prices with and without fireplaces while controlling for other variables, we could use one of these tests. However, this dataset is observational: it is a sample out of the population, not an experimental setup, so controlling for all the other factors like location, house condition etc. would require us to use regression analysis or similar. Causal Inference techniques, however, shine in highlighting the causal effect. I will discuss Causal Inference in a separate post." | |
| }, | |
| { | |
| "metadata": { | |
| "ExecuteTime": { | |
| "end_time": "2020-08-05T14:43:08.589733Z", | |
| "start_time": "2020-08-05T14:43:08.583568Z" | |
| }, | |
| "trusted": true | |
| }, | |
| "cell_type": "code", | |
| "source": "df['Fireplaces'].value_counts() # Different levels of Fireplaces", | |
| "execution_count": 55, | |
| "outputs": [ | |
| { | |
| "data": { | |
| "text/plain": "0 690\n1 650\n2 115\n3 5\nName: Fireplaces, dtype: int64" | |
| }, | |
| "execution_count": 55, | |
| "metadata": {}, | |
| "output_type": "execute_result" | |
| } | |
| ] | |
| }, | |
| { | |
| "metadata": { | |
| "ExecuteTime": { | |
| "end_time": "2020-08-05T14:47:31.678511Z", | |
| "start_time": "2020-08-05T14:47:31.669582Z" | |
| }, | |
| "trusted": true | |
| }, | |
| "cell_type": "code", | |
| "source": "# Create a binary indicator for presence of one or more fireplaces\ndf['fireplace_exists'] = df['Fireplaces'].map(lambda x: 1 if x>= 1 else 0)\ndf['fireplace_exists'].value_counts(normalize=True)", | |
| "execution_count": 56, | |
| "outputs": [ | |
| { | |
| "data": { | |
| "text/plain": "1 0.527397\n0 0.472603\nName: fireplace_exists, dtype: float64" | |
| }, | |
| "execution_count": 56, | |
| "metadata": {}, | |
| "output_type": "execute_result" | |
| } | |
| ] | |
| }, | |
| { | |
| "metadata": { | |
| "ExecuteTime": { | |
| "end_time": "2020-08-05T14:54:50.650558Z", | |
| "start_time": "2020-08-05T14:54:50.642960Z" | |
| }, | |
| "trusted": true | |
| }, | |
| "cell_type": "code", | |
| "source": "# The mean prices of houses with and without fireplaces\ndf.groupby('fireplace_exists')['SalePrice'].mean()", | |
| "execution_count": 57, | |
| "outputs": [ | |
| { | |
| "data": { | |
| "text/plain": "fireplace_exists\n0 141331.482609\n1 216397.692208\nName: SalePrice, dtype: float64" | |
| }, | |
| "execution_count": 57, | |
| "metadata": {}, | |
| "output_type": "execute_result" | |
| } | |
| ] | |
| }, | |
| { | |
| "metadata": {}, | |
| "cell_type": "markdown", | |
| "source": "So it looks like the average house with a fireplace is roughly 53% more expensive than one without." | |
| }, | |
| { | |
| "metadata": {}, | |
| "cell_type": "markdown", | |
| "source": "Let's also look at the distribution of SalePrice" | |
| }, | |
| { | |
| "metadata": { | |
| "ExecuteTime": { | |
| "end_time": "2020-08-06T08:10:08.328751Z", | |
| "start_time": "2020-08-06T08:10:07.941482Z" | |
| }, | |
| "trusted": true | |
| }, | |
| "cell_type": "code", | |
| "source": "df['SalePrice'].plot.density()", | |
| "execution_count": 58, | |
| "outputs": [ | |
| { | |
| "data": { | |
| "text/plain": "<matplotlib.axes._subplots.AxesSubplot at 0x7fdebde6a250>" | |
| }, | |
| "execution_count": 58, | |
| "metadata": {}, | |
| "output_type": "execute_result" | |
| }, | |
| { | |
| "data": { | |
| "image/png": "\n", | |
| "text/plain": "<Figure size 432x288 with 1 Axes>" | |
| }, | |
| "metadata": { | |
| "needs_background": "light" | |
| }, | |
| "output_type": "display_data" | |
| } | |
| ] | |
| }, | |
| { | |
| "metadata": {}, | |
| "cell_type": "markdown", | |
| "source": "This looks a bit skewed and thus might not be normally distributed, but let's assume that it is normal " | |
| }, | |
| { | |
| "metadata": {}, | |
| "cell_type": "markdown", | |
| "source": "Let's create another column in the df, with a transformation of SalePrice that brings it closer to a normal distribution" | |
| }, | |
| { | |
| "metadata": { | |
| "trusted": true | |
| }, | |
| "cell_type": "code", | |
| "source": "# Standardizing the data: take log(1 + x) of the sale price, then subtract the mean and divide by the std\n\ndf['SalePricel1p'] = np.log1p(df['SalePrice'])\nsp_mean, sp_std = df['SalePricel1p'].mean(), df['SalePricel1p'].std()\ndf['SalePriceNorm'] = (df['SalePricel1p'] - sp_mean) / sp_std  # vectorized z-score\n\n# df['SalePriceNorm'].replace([np.inf, -np.inf, None], 0, inplace=True)\ndf['SalePriceNorm'].plot.density()\n", | |
| "execution_count": 53, | |
| "outputs": [ | |
| { | |
| "data": { | |
| "text/plain": "<matplotlib.axes._subplots.AxesSubplot at 0x7fdebcebb4d0>" | |
| }, | |
| "execution_count": 53, | |
| "metadata": {}, | |
| "output_type": "execute_result" | |
| }, | |
| { | |
| "data": { | |
| "image/png": "\n", | |
| "text/plain": "<Figure size 432x288 with 1 Axes>" | |
| }, | |
| "metadata": { | |
| "needs_background": "light" | |
| }, | |
| "output_type": "display_data" | |
| } | |
| ] | |
| }, | |
| { | |
| "metadata": {}, | |
| "cell_type": "markdown", | |
| "source": "This now looks very symmetrical to me" | |
| }, | |
| { | |
| "metadata": {}, | |
| "cell_type": "markdown", | |
| "source": "### Independent samples t-test" | |
| }, | |
| { | |
| "metadata": {}, | |
| "cell_type": "markdown", | |
| "source": "T-test is a parametric test - these tests are used when the endpoint of interest is normally distributed" | |
| }, | |
| { | |
| "metadata": { | |
| "ExecuteTime": { | |
| "end_time": "2020-08-06T03:42:13.850770Z", | |
| "start_time": "2020-08-06T03:42:13.840951Z" | |
| }, | |
| "trusted": true | |
| }, | |
| "cell_type": "code", | |
| "source": "t_score, p_value = scipy.stats.ttest_ind(df[df.fireplace_exists == 1]['SalePriceNorm'],\n df[df.fireplace_exists == 0]['SalePriceNorm'])\n\nt_score, p_value", | |
| "execution_count": 60, | |
| "outputs": [ | |
| { | |
| "data": { | |
| "text/plain": "(22.6408086001983, 1.6884906347491944e-97)" | |
| }, | |
| "execution_count": 60, | |
| "metadata": {}, | |
| "output_type": "execute_result" | |
| } | |
| ] | |
| }, | |
| { | |
| "metadata": {}, | |
| "cell_type": "markdown", | |
| "source": "So what does this tell us? The p-value is < $\\alpha$, so the H0 is rejected.\n\nH0 in the t-test is that the two groups have equal means. Since we reject H0 and accept H1, we can conclude that houses with and without fireplaces do in fact have statistically significantly different (log-transformed) prices." | |
| }, | |
| { | |
| "metadata": {}, | |
| "cell_type": "markdown", | |
| "source": "### One way anova" | |
| }, | |
| { | |
| "metadata": {}, | |
| "cell_type": "markdown", | |
| "source": "ANOVA is also a parametric test " | |
| }, | |
| { | |
| "metadata": { | |
| "ExecuteTime": { | |
| "end_time": "2020-08-06T03:42:06.843422Z", | |
| "start_time": "2020-08-06T03:42:06.835642Z" | |
| }, | |
| "trusted": true | |
| }, | |
| "cell_type": "code", | |
| "source": "f_score, p_value = scipy.stats.f_oneway(df[df.fireplace_exists == 1]['SalePriceNorm'],\n df[df.fireplace_exists == 0]['SalePriceNorm'])\n\nf_score, p_value", | |
| "execution_count": 59, | |
| "outputs": [ | |
| { | |
| "data": { | |
| "text/plain": "(512.6062140708134, 1.6884906347494813e-97)" | |
| }, | |
| "execution_count": 59, | |
| "metadata": {}, | |
| "output_type": "execute_result" | |
| } | |
| ] | |
| }, | |
| { | |
| "metadata": {}, | |
| "cell_type": "markdown", | |
| "source": "The same result, but presented in a table format using the statsmodels library:" | |
| }, | |
| { | |
| "metadata": { | |
| "ExecuteTime": { | |
| "end_time": "2020-08-06T03:48:34.273041Z", | |
| "start_time": "2020-08-06T03:48:34.249018Z" | |
| }, | |
| "trusted": true | |
| }, | |
| "cell_type": "code", | |
| "source": "model = ols('SalePriceNorm ~ fireplace_exists', data=df).fit()\nanova_table = sm.stats.anova_lm(model, typ=2)\nanova_table", | |
| "execution_count": 61, | |
| "outputs": [ | |
| { | |
| "data": { | |
| "text/html": "<div>\n<style scoped>\n .dataframe tbody tr th:only-of-type {\n vertical-align: middle;\n }\n\n .dataframe tbody tr th {\n vertical-align: top;\n }\n\n .dataframe thead th {\n text-align: right;\n }\n</style>\n<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th></th>\n <th>sum_sq</th>\n <th>df</th>\n <th>F</th>\n <th>PR(>F)</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>fireplace_exists</th>\n <td>379.524058</td>\n <td>1.0</td>\n <td>512.606214</td>\n <td>1.688491e-97</td>\n </tr>\n <tr>\n <th>Residual</th>\n <td>1079.475942</td>\n <td>1458.0</td>\n <td>NaN</td>\n <td>NaN</td>\n </tr>\n </tbody>\n</table>\n</div>", | |
| "text/plain": " sum_sq df F PR(>F)\nfireplace_exists 379.524058 1.0 512.606214 1.688491e-97\nResidual 1079.475942 1458.0 NaN NaN" | |
| }, | |
| "execution_count": 61, | |
| "metadata": {}, | |
| "output_type": "execute_result" | |
| } | |
| ] | |
| }, | |
| { | |
| "metadata": {}, | |
| "cell_type": "markdown", | |
| "source": "The *test statistics* in the t-test and the anova are *different* because the tests use *different distributions*, one is t and the other is F, but the p-values calculated from the respective distributions are the same. With only two groups the two statistics are directly related: $F = t^2$, and indeed $22.64^2 \\approx 512.6$.\n\n**So when should we use a one way anova and when should we use an independent samples t-test?**\n\n**Answer** - the *independent samples t-test* only applies when there are *two* groups to be compared, whereas the *one way anova* can compare *>=2 groups*" | |
| }, | |
| { | |
| "metadata": {}, | |
| "cell_type": "markdown", | |
| "source": "Now we know that there is a difference in the means, but are houses with fireplaces more expensive or less? \n\nFor this we can do a post hoc test like Tukey HSD, which reports the size and sign of the difference between group means" | |
| }, | |
| { | |
| "metadata": {}, | |
| "cell_type": "markdown", | |
| "source": "#### Tukey HSD" | |
| }, | |
| { | |
| "metadata": { | |
| "ExecuteTime": { | |
| "end_time": "2020-08-06T03:42:30.011633Z", | |
| "start_time": "2020-08-06T03:42:30.000503Z" | |
| }, | |
| "trusted": true | |
| }, | |
| "cell_type": "code", | |
| "source": "res = pairwise_tukeyhsd(df['SalePriceNorm'], df['fireplace_exists'], alpha= 0.05)\nprint(res)", | |
| "execution_count": 63, | |
| "outputs": [ | |
| { | |
| "name": "stdout", | |
| "output_type": "stream", | |
| "text": "Multiple Comparison of Means - Tukey HSD, FWER=0.05\n=================================================\ngroup1 group2 meandiff p-adj lower upper reject\n-------------------------------------------------\n 0 1 1.0212 0.001 0.9328 1.1097 True\n-------------------------------------------------\n" | |
| } | |
| ] | |
| }, | |
| { | |
| "metadata": {}, | |
| "cell_type": "markdown", | |
| "source": "The meandiff is positive, i.e. houses with fireplaces cost more. \n\nBecause SalePriceNorm is the standardized log price, the meandiff is expressed in standard deviations of log(1 + SalePrice): houses with fireplaces are about 1.02 standard deviations above those without. It is not a multiplicative factor on the raw price." | |
| }, | |
| { | |
| "metadata": {}, | |
| "cell_type": "markdown", | |
| "source": "Tukey HSD is also useful for figuring out which groups within a variable are different. In the case above we had created a binary indicator because we were just interested in the existence of fireplaces, not in the effect of *multiple* fireplaces. \n\nWe can use the original variable `Fireplaces` and see the effect of multiple fireplaces too, like so - " | |
| }, | |
| { | |
| "metadata": { | |
| "trusted": true | |
| }, | |
| "cell_type": "code", | |
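| "source": "# First fit the model to check whether mean SalePriceNorm differs with the number of fireplaces\n# Note: Fireplaces is numeric, so this formula fits a single linear term (df = 1 in the table);\n# wrap it as C(Fireplaces) in the formula to treat each count as its own group\nmodel = ols('SalePriceNorm ~ Fireplaces', data=df).fit()\nanova_table = sm.stats.anova_lm(model, typ=2)\nanova_table", | |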
| "execution_count": 70, | |
| "outputs": [ | |
| { | |
| "data": { | |
| "text/html": "<div>\n<style scoped>\n .dataframe tbody tr th:only-of-type {\n vertical-align: middle;\n }\n\n .dataframe tbody tr th {\n vertical-align: top;\n }\n\n .dataframe thead th {\n text-align: right;\n }\n</style>\n<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th></th>\n <th>sum_sq</th>\n <th>df</th>\n <th>F</th>\n <th>PR(>F)</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>Fireplaces</th>\n <td>349.519291</td>\n <td>1.0</td>\n <td>459.313192</td>\n <td>8.420419e-89</td>\n </tr>\n <tr>\n <th>Residual</th>\n <td>1109.480709</td>\n <td>1458.0</td>\n <td>NaN</td>\n <td>NaN</td>\n </tr>\n </tbody>\n</table>\n</div>", | |
| "text/plain": " sum_sq df F PR(>F)\nFireplaces 349.519291 1.0 459.313192 8.420419e-89\nResidual 1109.480709 1458.0 NaN NaN" | |
| }, | |
| "execution_count": 70, | |
| "metadata": {}, | |
| "output_type": "execute_result" | |
| } | |
| ] | |
| }, | |
| { | |
| "metadata": { | |
| "ExecuteTime": { | |
| "end_time": "2020-08-06T03:42:30.011633Z", | |
| "start_time": "2020-08-06T03:42:30.000503Z" | |
| }, | |
| "trusted": true | |
| }, | |
| "cell_type": "code", | |
| "source": "# Then do the post hoc test to see the pairwise differences between groups\nres = pairwise_tukeyhsd(df['SalePriceNorm'], df['Fireplaces'], alpha= 0.05)\nprint(res)", | |
| "execution_count": 71, | |
| "outputs": [ | |
| { | |
| "name": "stdout", | |
| "output_type": "stream", | |
| "text": "Multiple Comparison of Means - Tukey HSD, FWER=0.05\n===================================================\ngroup1 group2 meandiff p-adj lower upper reject\n---------------------------------------------------\n 0 1 0.9807 0.001 0.8601 1.1014 True\n 0 2 1.2321 0.001 1.0097 1.4544 True\n 0 3 1.4365 0.0011 0.4458 2.4273 True\n 1 2 0.2513 0.0201 0.028 0.4746 True\n 1 3 0.4558 0.6213 -0.5352 1.4467 False\n 2 3 0.2044 0.9 -0.804 1.2129 False\n---------------------------------------------------\n" | |
| } | |
| ] | |
| }, | |
| { | |
| "metadata": {}, | |
| "cell_type": "markdown", | |
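| "source": "This is interesting: going from 0 to any number of fireplaces is associated with a higher price, and going from 1 to 2 is also a statistically significant increase. However, the differences between 2 and 3, or even 1 and 3, are not statistically significant. Note that only 5 houses have 3 fireplaces, so the confidence intervals for those comparisons are very wide." | |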
| }, | |
| { | |
| "metadata": {}, | |
| "cell_type": "markdown", | |
| "source": "### Mann Whitney U test" | |
| }, | |
| { | |
| "metadata": {}, | |
| "cell_type": "markdown", | |
| "source": "Using the anova and the independent samples t-test we could conclude that the *mean* Sale Prices of houses that have fireplaces and those that do not are statistically different. \n\nBut we had made an assumption here: that the (transformed) SalePrice is normally distributed. I think that is a reasonable assumption to make for house prices, since it makes sense that house prices will tend towards the average while some will be on the extreme ends. However, you might encounter an endpoint where this assumption cannot be made, for example click-through rate, wait times, or event counts. The ANOVA and the t-test will not apply in those cases since normality of the data is an underlying assumption of those tests. \n\nSo we have to use a non-parametric test for those cases. The Mann Whitney U test is an example of a non-parametric test, which does not assume that the endpoint is normally distributed. \n\nThis test is a non-parametric version of the t-test: it tests for a difference in the underlying distributions of the two samples. It can be used to compare 2 groups" | |
| }, | |
| { | |
| "metadata": { | |
| "ExecuteTime": { | |
| "end_time": "2020-08-06T07:27:46.199161Z", | |
| "start_time": "2020-08-06T07:27:46.190609Z" | |
| }, | |
| "trusted": true | |
| }, | |
| "cell_type": "code", | |
| "source": "stat, p_value = scipy.stats.mannwhitneyu(df[df.fireplace_exists == 1]['SalePrice'],\n df[df.fireplace_exists == 0]['SalePrice'])\n\nstat, p_value", | |
| "execution_count": 76, | |
| "outputs": [ | |
| { | |
| "data": { | |
| "text/plain": "(104141.5, 5.369064913897434e-90)" | |
| }, | |
| "execution_count": 76, | |
| "metadata": {}, | |
| "output_type": "execute_result" | |
| } | |
| ] | |
| }, | |
| { | |
| "metadata": {}, | |
| "cell_type": "markdown", | |
| "source": "The p-value is very low, so the $H0$ is rejected, and we conclude that the distribution of SalePrice differs between houses that have fireplaces and those that do not. " | |
| }, | |
| { | |
| "metadata": {}, | |
| "cell_type": "markdown", | |
| "source": "### Kruskal Wallis test" | |
| }, | |
| { | |
| "metadata": {}, | |
| "cell_type": "markdown", | |
| "source": "What if we want to compare more than 2 groups? Enter the Kruskal Wallis test, which is a non-parametric version of ANOVA and can be used to compare more than two groups." | |
| }, | |
| { | |
| "metadata": { | |
| "ExecuteTime": { | |
| "end_time": "2020-08-06T08:25:41.165237Z", | |
| "start_time": "2020-08-06T08:25:41.136282Z" | |
| }, | |
| "trusted": true | |
| }, | |
| "cell_type": "code", | |
| "source": "# Kruskal Wallis test across the four fireplace-count groups used here (0, 1, 2 and 3)\n# scipy.stats.mstats.kruskalwallis is an alias of scipy.stats.mstats.kruskal\nstat, p_value = scipy.stats.mstats.kruskalwallis(df[df.Fireplaces == 0]['SalePrice'].values,\n                                                 df[df.Fireplaces == 1]['SalePrice'].values,\n                                                 df[df.Fireplaces == 2]['SalePrice'].values,\n                                                 df[df.Fireplaces == 3]['SalePrice'].values)\nstat, p_value", | |
| "execution_count": 77, | |
| "outputs": [ | |
| { | |
| "data": { | |
| "text/plain": "(406.8360963973842, 7.317749401601013e-88)" | |
| }, | |
| "execution_count": 77, | |
| "metadata": {}, | |
| "output_type": "execute_result" | |
| } | |
| ] | |
| }, | |
| { | |
| "metadata": {}, | |
| "cell_type": "markdown", | |
| "source": "## Exploration what variables are co-related with SalePrice" | |
| }, | |
| { | |
| "metadata": {}, | |
| "cell_type": "markdown", | |
| "source": "With a new dataset, it is usually good to start with the correlation matrix to see which features are correlated with the target variable.\n\nGetting the correlation matrix in Python is quite easy." | |
| }, | |
| { | |
| "metadata": { | |
| "trusted": true | |
| }, | |
| "cell_type": "code", | |
| "source": "corr_df = df.corr() # returns a DataFrame with the pairwise correlations of the numeric variables in the df\ncorr_df[['SalePrice']].style.background_gradient() # Select only the correlations with SalePrice", | |
| "execution_count": 92, | |
| "outputs": [ | |
| { | |
| "data": { | |
| "text/html": "<style type=\"text/css\" >\n #T_a84dbf1e_d86c_11ea_be2c_acde48001122row0_col0 {\n background-color: #f0eaf4;\n color: #000000;\n } #T_a84dbf1e_d86c_11ea_be2c_acde48001122row1_col0 {\n background-color: #f8f1f8;\n color: #000000;\n } #T_a84dbf1e_d86c_11ea_be2c_acde48001122row2_col0 {\n background-color: #91b5d6;\n color: #000000;\n } #T_a84dbf1e_d86c_11ea_be2c_acde48001122row3_col0 {\n background-color: #adc1dd;\n color: #000000;\n } #T_a84dbf1e_d86c_11ea_be2c_acde48001122row4_col0 {\n background-color: #04649e;\n color: #f1f1f1;\n } #T_a84dbf1e_d86c_11ea_be2c_acde48001122row5_col0 {\n background-color: #f7f0f7;\n color: #000000;\n } #T_a84dbf1e_d86c_11ea_be2c_acde48001122row6_col0 {\n background-color: #4c99c5;\n color: #000000;\n } #T_a84dbf1e_d86c_11ea_be2c_acde48001122row7_col0 {\n background-color: #549cc7;\n color: #000000;\n } #T_a84dbf1e_d86c_11ea_be2c_acde48001122row8_col0 {\n background-color: #60a1ca;\n color: #000000;\n } #T_a84dbf1e_d86c_11ea_be2c_acde48001122row9_col0 {\n background-color: #84b0d3;\n color: #000000;\n } #T_a84dbf1e_d86c_11ea_be2c_acde48001122row10_col0 {\n background-color: #eee9f3;\n color: #000000;\n } #T_a84dbf1e_d86c_11ea_be2c_acde48001122row11_col0 {\n background-color: #bdc8e1;\n color: #000000;\n } #T_a84dbf1e_d86c_11ea_be2c_acde48001122row12_col0 {\n background-color: #2987bc;\n color: #000000;\n } #T_a84dbf1e_d86c_11ea_be2c_acde48001122row13_col0 {\n background-color: #2a88bc;\n color: #000000;\n } #T_a84dbf1e_d86c_11ea_be2c_acde48001122row14_col0 {\n background-color: #9cb9d9;\n color: #000000;\n } #T_a84dbf1e_d86c_11ea_be2c_acde48001122row15_col0 {\n background-color: #f1ebf4;\n color: #000000;\n } #T_a84dbf1e_d86c_11ea_be2c_acde48001122row16_col0 {\n background-color: #0771b1;\n color: #f1f1f1;\n } #T_a84dbf1e_d86c_11ea_be2c_acde48001122row17_col0 {\n background-color: #b9c6e0;\n color: #000000;\n } #T_a84dbf1e_d86c_11ea_be2c_acde48001122row18_col0 {\n background-color: #f0eaf4;\n color: #000000;\n } 
#T_a84dbf1e_d86c_11ea_be2c_acde48001122row19_col0 {\n background-color: #3d93c2;\n color: #000000;\n } #T_a84dbf1e_d86c_11ea_be2c_acde48001122row20_col0 {\n background-color: #a8bedc;\n color: #000000;\n } #T_a84dbf1e_d86c_11ea_be2c_acde48001122row21_col0 {\n background-color: #cacee5;\n color: #000000;\n } #T_a84dbf1e_d86c_11ea_be2c_acde48001122row22_col0 {\n background-color: #fff7fb;\n color: #000000;\n } #T_a84dbf1e_d86c_11ea_be2c_acde48001122row23_col0 {\n background-color: #4897c4;\n color: #000000;\n } #T_a84dbf1e_d86c_11ea_be2c_acde48001122row24_col0 {\n background-color: #65a3cb;\n color: #000000;\n } #T_a84dbf1e_d86c_11ea_be2c_acde48001122row25_col0 {\n background-color: #5c9fc9;\n color: #000000;\n } #T_a84dbf1e_d86c_11ea_be2c_acde48001122row26_col0 {\n background-color: #2081b9;\n color: #000000;\n } #T_a84dbf1e_d86c_11ea_be2c_acde48001122row27_col0 {\n background-color: #2484ba;\n color: #000000;\n } #T_a84dbf1e_d86c_11ea_be2c_acde48001122row28_col0 {\n background-color: #9ab8d8;\n color: #000000;\n } #T_a84dbf1e_d86c_11ea_be2c_acde48001122row29_col0 {\n background-color: #9ebad9;\n color: #000000;\n } #T_a84dbf1e_d86c_11ea_be2c_acde48001122row30_col0 {\n background-color: #fef6fb;\n color: #000000;\n } #T_a84dbf1e_d86c_11ea_be2c_acde48001122row31_col0 {\n background-color: #e5e1ef;\n color: #000000;\n } #T_a84dbf1e_d86c_11ea_be2c_acde48001122row32_col0 {\n background-color: #d8d7e9;\n color: #000000;\n } #T_a84dbf1e_d86c_11ea_be2c_acde48001122row33_col0 {\n background-color: #dbdaeb;\n color: #000000;\n } #T_a84dbf1e_d86c_11ea_be2c_acde48001122row34_col0 {\n background-color: #f0eaf4;\n color: #000000;\n } #T_a84dbf1e_d86c_11ea_be2c_acde48001122row35_col0 {\n background-color: #e4e1ef;\n color: #000000;\n } #T_a84dbf1e_d86c_11ea_be2c_acde48001122row36_col0 {\n background-color: #f1ebf4;\n color: #000000;\n } #T_a84dbf1e_d86c_11ea_be2c_acde48001122row37_col0 {\n background-color: #023858;\n color: #f1f1f1;\n } 
#T_a84dbf1e_d86c_11ea_be2c_acde48001122row38_col0 {\n background-color: #03446a;\n color: #f1f1f1;\n } #T_a84dbf1e_d86c_11ea_be2c_acde48001122row39_col0 {\n background-color: #03446a;\n color: #f1f1f1;\n } #T_a84dbf1e_d86c_11ea_be2c_acde48001122row40_col0 {\n background-color: #63a2cb;\n color: #000000;\n }</style><table id=\"T_a84dbf1e_d86c_11ea_be2c_acde48001122\" ><thead> <tr> <th class=\"blank level0\" ></th> <th class=\"col_heading level0 col0\" >SalePrice</th> </tr></thead><tbody>\n <tr>\n <th id=\"T_a84dbf1e_d86c_11ea_be2c_acde48001122level0_row0\" class=\"row_heading level0 row0\" >Id</th>\n <td id=\"T_a84dbf1e_d86c_11ea_be2c_acde48001122row0_col0\" class=\"data row0 col0\" >-0.0219167</td>\n </tr>\n <tr>\n <th id=\"T_a84dbf1e_d86c_11ea_be2c_acde48001122level0_row1\" class=\"row_heading level0 row1\" >MSSubClass</th>\n <td id=\"T_a84dbf1e_d86c_11ea_be2c_acde48001122row1_col0\" class=\"data row1 col0\" >-0.0842841</td>\n </tr>\n <tr>\n <th id=\"T_a84dbf1e_d86c_11ea_be2c_acde48001122level0_row2\" class=\"row_heading level0 row2\" >LotFrontage</th>\n <td id=\"T_a84dbf1e_d86c_11ea_be2c_acde48001122row2_col0\" class=\"data row2 col0\" >0.351799</td>\n </tr>\n <tr>\n <th id=\"T_a84dbf1e_d86c_11ea_be2c_acde48001122level0_row3\" class=\"row_heading level0 row3\" >LotArea</th>\n <td id=\"T_a84dbf1e_d86c_11ea_be2c_acde48001122row3_col0\" class=\"data row3 col0\" >0.263843</td>\n </tr>\n <tr>\n <th id=\"T_a84dbf1e_d86c_11ea_be2c_acde48001122level0_row4\" class=\"row_heading level0 row4\" >OverallQual</th>\n <td id=\"T_a84dbf1e_d86c_11ea_be2c_acde48001122row4_col0\" class=\"data row4 col0\" >0.790982</td>\n </tr>\n <tr>\n <th id=\"T_a84dbf1e_d86c_11ea_be2c_acde48001122level0_row5\" class=\"row_heading level0 row5\" >OverallCond</th>\n <td id=\"T_a84dbf1e_d86c_11ea_be2c_acde48001122row5_col0\" class=\"data row5 col0\" >-0.0778559</td>\n </tr>\n <tr>\n <th id=\"T_a84dbf1e_d86c_11ea_be2c_acde48001122level0_row6\" class=\"row_heading level0 row6\" >YearBuilt</th>\n <td 
id=\"T_a84dbf1e_d86c_11ea_be2c_acde48001122row6_col0\" class=\"data row6 col0\" >0.522897</td>\n </tr>\n <tr>\n <th id=\"T_a84dbf1e_d86c_11ea_be2c_acde48001122level0_row7\" class=\"row_heading level0 row7\" >YearRemodAdd</th>\n <td id=\"T_a84dbf1e_d86c_11ea_be2c_acde48001122row7_col0\" class=\"data row7 col0\" >0.507101</td>\n </tr>\n <tr>\n <th id=\"T_a84dbf1e_d86c_11ea_be2c_acde48001122level0_row8\" class=\"row_heading level0 row8\" >MasVnrArea</th>\n <td id=\"T_a84dbf1e_d86c_11ea_be2c_acde48001122row8_col0\" class=\"data row8 col0\" >0.477493</td>\n </tr>\n <tr>\n <th id=\"T_a84dbf1e_d86c_11ea_be2c_acde48001122level0_row9\" class=\"row_heading level0 row9\" >BsmtFinSF1</th>\n <td id=\"T_a84dbf1e_d86c_11ea_be2c_acde48001122row9_col0\" class=\"data row9 col0\" >0.38642</td>\n </tr>\n <tr>\n <th id=\"T_a84dbf1e_d86c_11ea_be2c_acde48001122level0_row10\" class=\"row_heading level0 row10\" >BsmtFinSF2</th>\n <td id=\"T_a84dbf1e_d86c_11ea_be2c_acde48001122row10_col0\" class=\"data row10 col0\" >-0.0113781</td>\n </tr>\n <tr>\n <th id=\"T_a84dbf1e_d86c_11ea_be2c_acde48001122level0_row11\" class=\"row_heading level0 row11\" >BsmtUnfSF</th>\n <td id=\"T_a84dbf1e_d86c_11ea_be2c_acde48001122row11_col0\" class=\"data row11 col0\" >0.214479</td>\n </tr>\n <tr>\n <th id=\"T_a84dbf1e_d86c_11ea_be2c_acde48001122level0_row12\" class=\"row_heading level0 row12\" >TotalBsmtSF</th>\n <td id=\"T_a84dbf1e_d86c_11ea_be2c_acde48001122row12_col0\" class=\"data row12 col0\" >0.613581</td>\n </tr>\n <tr>\n <th id=\"T_a84dbf1e_d86c_11ea_be2c_acde48001122level0_row13\" class=\"row_heading level0 row13\" >1stFlrSF</th>\n <td id=\"T_a84dbf1e_d86c_11ea_be2c_acde48001122row13_col0\" class=\"data row13 col0\" >0.605852</td>\n </tr>\n <tr>\n <th id=\"T_a84dbf1e_d86c_11ea_be2c_acde48001122level0_row14\" class=\"row_heading level0 row14\" >2ndFlrSF</th>\n <td id=\"T_a84dbf1e_d86c_11ea_be2c_acde48001122row14_col0\" class=\"data row14 col0\" >0.319334</td>\n </tr>\n <tr>\n <th 
id=\"T_a84dbf1e_d86c_11ea_be2c_acde48001122level0_row15\" class=\"row_heading level0 row15\" >LowQualFinSF</th>\n <td id=\"T_a84dbf1e_d86c_11ea_be2c_acde48001122row15_col0\" class=\"data row15 col0\" >-0.0256061</td>\n </tr>\n <tr>\n <th id=\"T_a84dbf1e_d86c_11ea_be2c_acde48001122level0_row16\" class=\"row_heading level0 row16\" >GrLivArea</th>\n <td id=\"T_a84dbf1e_d86c_11ea_be2c_acde48001122row16_col0\" class=\"data row16 col0\" >0.708624</td>\n </tr>\n <tr>\n <th id=\"T_a84dbf1e_d86c_11ea_be2c_acde48001122level0_row17\" class=\"row_heading level0 row17\" >BsmtFullBath</th>\n <td id=\"T_a84dbf1e_d86c_11ea_be2c_acde48001122row17_col0\" class=\"data row17 col0\" >0.227122</td>\n </tr>\n <tr>\n <th id=\"T_a84dbf1e_d86c_11ea_be2c_acde48001122level0_row18\" class=\"row_heading level0 row18\" >BsmtHalfBath</th>\n <td id=\"T_a84dbf1e_d86c_11ea_be2c_acde48001122row18_col0\" class=\"data row18 col0\" >-0.0168442</td>\n </tr>\n <tr>\n <th id=\"T_a84dbf1e_d86c_11ea_be2c_acde48001122level0_row19\" class=\"row_heading level0 row19\" >FullBath</th>\n <td id=\"T_a84dbf1e_d86c_11ea_be2c_acde48001122row19_col0\" class=\"data row19 col0\" >0.560664</td>\n </tr>\n <tr>\n <th id=\"T_a84dbf1e_d86c_11ea_be2c_acde48001122level0_row20\" class=\"row_heading level0 row20\" >HalfBath</th>\n <td id=\"T_a84dbf1e_d86c_11ea_be2c_acde48001122row20_col0\" class=\"data row20 col0\" >0.284108</td>\n </tr>\n <tr>\n <th id=\"T_a84dbf1e_d86c_11ea_be2c_acde48001122level0_row21\" class=\"row_heading level0 row21\" >BedroomAbvGr</th>\n <td id=\"T_a84dbf1e_d86c_11ea_be2c_acde48001122row21_col0\" class=\"data row21 col0\" >0.168213</td>\n </tr>\n <tr>\n <th id=\"T_a84dbf1e_d86c_11ea_be2c_acde48001122level0_row22\" class=\"row_heading level0 row22\" >KitchenAbvGr</th>\n <td id=\"T_a84dbf1e_d86c_11ea_be2c_acde48001122row22_col0\" class=\"data row22 col0\" >-0.135907</td>\n </tr>\n <tr>\n <th id=\"T_a84dbf1e_d86c_11ea_be2c_acde48001122level0_row23\" class=\"row_heading level0 row23\" >TotRmsAbvGrd</th>\n <td 
id=\"T_a84dbf1e_d86c_11ea_be2c_acde48001122row23_col0\" class=\"data row23 col0\" >0.533723</td>\n </tr>\n <tr>\n <th id=\"T_a84dbf1e_d86c_11ea_be2c_acde48001122level0_row24\" class=\"row_heading level0 row24\" >Fireplaces</th>\n <td id=\"T_a84dbf1e_d86c_11ea_be2c_acde48001122row24_col0\" class=\"data row24 col0\" >0.466929</td>\n </tr>\n <tr>\n <th id=\"T_a84dbf1e_d86c_11ea_be2c_acde48001122level0_row25\" class=\"row_heading level0 row25\" >GarageYrBlt</th>\n <td id=\"T_a84dbf1e_d86c_11ea_be2c_acde48001122row25_col0\" class=\"data row25 col0\" >0.486362</td>\n </tr>\n <tr>\n <th id=\"T_a84dbf1e_d86c_11ea_be2c_acde48001122level0_row26\" class=\"row_heading level0 row26\" >GarageCars</th>\n <td id=\"T_a84dbf1e_d86c_11ea_be2c_acde48001122row26_col0\" class=\"data row26 col0\" >0.640409</td>\n </tr>\n <tr>\n <th id=\"T_a84dbf1e_d86c_11ea_be2c_acde48001122level0_row27\" class=\"row_heading level0 row27\" >GarageArea</th>\n <td id=\"T_a84dbf1e_d86c_11ea_be2c_acde48001122row27_col0\" class=\"data row27 col0\" >0.623431</td>\n </tr>\n <tr>\n <th id=\"T_a84dbf1e_d86c_11ea_be2c_acde48001122level0_row28\" class=\"row_heading level0 row28\" >WoodDeckSF</th>\n <td id=\"T_a84dbf1e_d86c_11ea_be2c_acde48001122row28_col0\" class=\"data row28 col0\" >0.324413</td>\n </tr>\n <tr>\n <th id=\"T_a84dbf1e_d86c_11ea_be2c_acde48001122level0_row29\" class=\"row_heading level0 row29\" >OpenPorchSF</th>\n <td id=\"T_a84dbf1e_d86c_11ea_be2c_acde48001122row29_col0\" class=\"data row29 col0\" >0.315856</td>\n </tr>\n <tr>\n <th id=\"T_a84dbf1e_d86c_11ea_be2c_acde48001122level0_row30\" class=\"row_heading level0 row30\" >EnclosedPorch</th>\n <td id=\"T_a84dbf1e_d86c_11ea_be2c_acde48001122row30_col0\" class=\"data row30 col0\" >-0.128578</td>\n </tr>\n <tr>\n <th id=\"T_a84dbf1e_d86c_11ea_be2c_acde48001122level0_row31\" class=\"row_heading level0 row31\" >3SsnPorch</th>\n <td id=\"T_a84dbf1e_d86c_11ea_be2c_acde48001122row31_col0\" class=\"data row31 col0\" >0.0445837</td>\n </tr>\n <tr>\n <th 
id=\"T_a84dbf1e_d86c_11ea_be2c_acde48001122level0_row32\" class=\"row_heading level0 row32\" >ScreenPorch</th>\n <td id=\"T_a84dbf1e_d86c_11ea_be2c_acde48001122row32_col0\" class=\"data row32 col0\" >0.111447</td>\n </tr>\n <tr>\n <th id=\"T_a84dbf1e_d86c_11ea_be2c_acde48001122level0_row33\" class=\"row_heading level0 row33\" >PoolArea</th>\n <td id=\"T_a84dbf1e_d86c_11ea_be2c_acde48001122row33_col0\" class=\"data row33 col0\" >0.0924035</td>\n </tr>\n <tr>\n <th id=\"T_a84dbf1e_d86c_11ea_be2c_acde48001122level0_row34\" class=\"row_heading level0 row34\" >MiscVal</th>\n <td id=\"T_a84dbf1e_d86c_11ea_be2c_acde48001122row34_col0\" class=\"data row34 col0\" >-0.0211896</td>\n </tr>\n <tr>\n <th id=\"T_a84dbf1e_d86c_11ea_be2c_acde48001122level0_row35\" class=\"row_heading level0 row35\" >MoSold</th>\n <td id=\"T_a84dbf1e_d86c_11ea_be2c_acde48001122row35_col0\" class=\"data row35 col0\" >0.0464322</td>\n </tr>\n <tr>\n <th id=\"T_a84dbf1e_d86c_11ea_be2c_acde48001122level0_row36\" class=\"row_heading level0 row36\" >YrSold</th>\n <td id=\"T_a84dbf1e_d86c_11ea_be2c_acde48001122row36_col0\" class=\"data row36 col0\" >-0.0289226</td>\n </tr>\n <tr>\n <th id=\"T_a84dbf1e_d86c_11ea_be2c_acde48001122level0_row37\" class=\"row_heading level0 row37\" >SalePrice</th>\n <td id=\"T_a84dbf1e_d86c_11ea_be2c_acde48001122row37_col0\" class=\"data row37 col0\" >1</td>\n </tr>\n <tr>\n <th id=\"T_a84dbf1e_d86c_11ea_be2c_acde48001122level0_row38\" class=\"row_heading level0 row38\" >SalePriceNorm</th>\n <td id=\"T_a84dbf1e_d86c_11ea_be2c_acde48001122row38_col0\" class=\"data row38 col0\" >0.948374</td>\n </tr>\n <tr>\n <th id=\"T_a84dbf1e_d86c_11ea_be2c_acde48001122level0_row39\" class=\"row_heading level0 row39\" >SalePricel1p</th>\n <td id=\"T_a84dbf1e_d86c_11ea_be2c_acde48001122row39_col0\" class=\"data row39 col0\" >0.948374</td>\n </tr>\n <tr>\n <th id=\"T_a84dbf1e_d86c_11ea_be2c_acde48001122level0_row40\" class=\"row_heading level0 row40\" >fireplace_exists</th>\n <td 
id=\"T_a84dbf1e_d86c_11ea_be2c_acde48001122row40_col0\" class=\"data row40 col0\" >0.471908</td>\n </tr>\n </tbody></table>", | |
| "text/plain": "<pandas.io.formats.style.Styler at 0x7fdebb681410>" | |
| }, | |
| "execution_count": 92, | |
| "metadata": {}, | |
| "output_type": "execute_result" | |
| } | |
| ] | |
| }, | |
| { | |
| "metadata": {}, | |
| "cell_type": "markdown", | |
| "source": "OK, so quite a few features have a high correlation with SalePrice (>0.6).\n\nBut notice that we do not have a p-value here; unfortunately, we have to compute that ourselves. 🤷♂️" | |
| }, | |
| { | |
| "metadata": { | |
| "trusted": true | |
| }, | |
| "cell_type": "code", | |
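| "source": "def get_pearsonr_w_pvalue(corr_df: pd.DataFrame, target_var: str,\n                          threshold: float) -> \"pd.io.formats.style.Styler\":\n    \"\"\"\n    Helper function to get the p-values along with the Pearson correlation coefficients.\n\n    Keeps the features whose correlation with target_var is >= threshold and returns\n    a styled table (pandas Styler) with one row per feature. Reads the global df.\n    \"\"\"\n    correlated_features = corr_df[corr_df[target_var] >= threshold].index\n    sale_price_corr_df = pd.DataFrame(\n        # Missing values are filled with 0 here, so pearson_r can differ from df.corr(),\n        # which ignores missing values pairwise\n        [(feat, *scipy.stats.pearsonr(df[feat].fillna(0), df[target_var]))\n         for feat in correlated_features],\n        columns=['feat', 'pearson_r', 'p_value'])\n    return sale_price_corr_df.style.background_gradient()", | |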
| "execution_count": 102, | |
| "outputs": [] | |
| }, | |
| { | |
| "metadata": { | |
| "trusted": true | |
| }, | |
| "cell_type": "code", | |
| "source": "get_pearsonr_w_pvalue(corr_df, target_var=\"SalePrice\", threshold=-100.0) # All the features", | |
| "execution_count": 103, | |
| "outputs": [ | |
| { | |
| "data": { | |
| "text/html": "<style type=\"text/css\" >\n #T_cf919a22_d86d_11ea_be2c_acde48001122row0_col1 {\n background-color: #f0eaf4;\n color: #000000;\n } #T_cf919a22_d86d_11ea_be2c_acde48001122row0_col2 {\n background-color: #3f93c2;\n color: #000000;\n } #T_cf919a22_d86d_11ea_be2c_acde48001122row1_col1 {\n background-color: #f8f1f8;\n color: #000000;\n } #T_cf919a22_d86d_11ea_be2c_acde48001122row1_col2 {\n background-color: #fff7fb;\n color: #000000;\n } #T_cf919a22_d86d_11ea_be2c_acde48001122row2_col1 {\n background-color: #bfc9e1;\n color: #000000;\n } #T_cf919a22_d86d_11ea_be2c_acde48001122row2_col2 {\n background-color: #fff7fb;\n color: #000000;\n } #T_cf919a22_d86d_11ea_be2c_acde48001122row3_col1 {\n background-color: #adc1dd;\n color: #000000;\n } #T_cf919a22_d86d_11ea_be2c_acde48001122row3_col2 {\n background-color: #fff7fb;\n color: #000000;\n } #T_cf919a22_d86d_11ea_be2c_acde48001122row4_col1 {\n background-color: #04649e;\n color: #f1f1f1;\n } #T_cf919a22_d86d_11ea_be2c_acde48001122row4_col2 {\n background-color: #fff7fb;\n color: #000000;\n } #T_cf919a22_d86d_11ea_be2c_acde48001122row5_col1 {\n background-color: #f7f0f7;\n color: #000000;\n } #T_cf919a22_d86d_11ea_be2c_acde48001122row5_col2 {\n background-color: #fef6fb;\n color: #000000;\n } #T_cf919a22_d86d_11ea_be2c_acde48001122row6_col1 {\n background-color: #4c99c5;\n color: #000000;\n } #T_cf919a22_d86d_11ea_be2c_acde48001122row6_col2 {\n background-color: #fff7fb;\n color: #000000;\n } #T_cf919a22_d86d_11ea_be2c_acde48001122row7_col1 {\n background-color: #549cc7;\n color: #000000;\n } #T_cf919a22_d86d_11ea_be2c_acde48001122row7_col2 {\n background-color: #fff7fb;\n color: #000000;\n } #T_cf919a22_d86d_11ea_be2c_acde48001122row8_col1 {\n background-color: #62a2cb;\n color: #000000;\n } #T_cf919a22_d86d_11ea_be2c_acde48001122row8_col2 {\n background-color: #fff7fb;\n color: #000000;\n } #T_cf919a22_d86d_11ea_be2c_acde48001122row9_col1 {\n background-color: #84b0d3;\n color: #000000;\n } 
#T_cf919a22_d86d_11ea_be2c_acde48001122row9_col2 {\n background-color: #fff7fb;\n color: #000000;\n } #T_cf919a22_d86d_11ea_be2c_acde48001122row10_col1 {\n background-color: #eee9f3;\n color: #000000;\n } #T_cf919a22_d86d_11ea_be2c_acde48001122row10_col2 {\n background-color: #023858;\n color: #f1f1f1;\n } #T_cf919a22_d86d_11ea_be2c_acde48001122row11_col1 {\n background-color: #bdc8e1;\n color: #000000;\n } #T_cf919a22_d86d_11ea_be2c_acde48001122row11_col2 {\n background-color: #fff7fb;\n color: #000000;\n } #T_cf919a22_d86d_11ea_be2c_acde48001122row12_col1 {\n background-color: #2987bc;\n color: #000000;\n } #T_cf919a22_d86d_11ea_be2c_acde48001122row12_col2 {\n background-color: #fff7fb;\n color: #000000;\n } #T_cf919a22_d86d_11ea_be2c_acde48001122row13_col1 {\n background-color: #2a88bc;\n color: #000000;\n } #T_cf919a22_d86d_11ea_be2c_acde48001122row13_col2 {\n background-color: #fff7fb;\n color: #000000;\n } #T_cf919a22_d86d_11ea_be2c_acde48001122row14_col1 {\n background-color: #9cb9d9;\n color: #000000;\n } #T_cf919a22_d86d_11ea_be2c_acde48001122row14_col2 {\n background-color: #fff7fb;\n color: #000000;\n } #T_cf919a22_d86d_11ea_be2c_acde48001122row15_col1 {\n background-color: #f1ebf4;\n color: #000000;\n } #T_cf919a22_d86d_11ea_be2c_acde48001122row15_col2 {\n background-color: #76aad0;\n color: #000000;\n } #T_cf919a22_d86d_11ea_be2c_acde48001122row16_col1 {\n background-color: #0771b1;\n color: #f1f1f1;\n } #T_cf919a22_d86d_11ea_be2c_acde48001122row16_col2 {\n background-color: #fff7fb;\n color: #000000;\n } #T_cf919a22_d86d_11ea_be2c_acde48001122row17_col1 {\n background-color: #b9c6e0;\n color: #000000;\n } #T_cf919a22_d86d_11ea_be2c_acde48001122row17_col2 {\n background-color: #fff7fb;\n color: #000000;\n } #T_cf919a22_d86d_11ea_be2c_acde48001122row18_col1 {\n background-color: #f0eaf4;\n color: #000000;\n } #T_cf919a22_d86d_11ea_be2c_acde48001122row18_col2 {\n background-color: #056aa6;\n color: #f1f1f1;\n } 
#T_cf919a22_d86d_11ea_be2c_acde48001122row19_col1 {\n background-color: #3d93c2;\n color: #000000;\n } #T_cf919a22_d86d_11ea_be2c_acde48001122row19_col2 {\n background-color: #fff7fb;\n color: #000000;\n } #T_cf919a22_d86d_11ea_be2c_acde48001122row20_col1 {\n background-color: #a8bedc;\n color: #000000;\n } #T_cf919a22_d86d_11ea_be2c_acde48001122row20_col2 {\n background-color: #fff7fb;\n color: #000000;\n } #T_cf919a22_d86d_11ea_be2c_acde48001122row21_col1 {\n background-color: #cacee5;\n color: #000000;\n } #T_cf919a22_d86d_11ea_be2c_acde48001122row21_col2 {\n background-color: #fff7fb;\n color: #000000;\n } #T_cf919a22_d86d_11ea_be2c_acde48001122row22_col1 {\n background-color: #fff7fb;\n color: #000000;\n } #T_cf919a22_d86d_11ea_be2c_acde48001122row22_col2 {\n background-color: #fff7fb;\n color: #000000;\n } #T_cf919a22_d86d_11ea_be2c_acde48001122row23_col1 {\n background-color: #4897c4;\n color: #000000;\n } #T_cf919a22_d86d_11ea_be2c_acde48001122row23_col2 {\n background-color: #fff7fb;\n color: #000000;\n } #T_cf919a22_d86d_11ea_be2c_acde48001122row24_col1 {\n background-color: #65a3cb;\n color: #000000;\n } #T_cf919a22_d86d_11ea_be2c_acde48001122row24_col2 {\n background-color: #fff7fb;\n color: #000000;\n } #T_cf919a22_d86d_11ea_be2c_acde48001122row25_col1 {\n background-color: #afc1dd;\n color: #000000;\n } #T_cf919a22_d86d_11ea_be2c_acde48001122row25_col2 {\n background-color: #fff7fb;\n color: #000000;\n } #T_cf919a22_d86d_11ea_be2c_acde48001122row26_col1 {\n background-color: #2081b9;\n color: #000000;\n } #T_cf919a22_d86d_11ea_be2c_acde48001122row26_col2 {\n background-color: #fff7fb;\n color: #000000;\n } #T_cf919a22_d86d_11ea_be2c_acde48001122row27_col1 {\n background-color: #2484ba;\n color: #000000;\n } #T_cf919a22_d86d_11ea_be2c_acde48001122row27_col2 {\n background-color: #fff7fb;\n color: #000000;\n } #T_cf919a22_d86d_11ea_be2c_acde48001122row28_col1 {\n background-color: #9ab8d8;\n color: #000000;\n } 
#T_cf919a22_d86d_11ea_be2c_acde48001122row28_col2 {\n background-color: #fff7fb;\n color: #000000;\n } #T_cf919a22_d86d_11ea_be2c_acde48001122row29_col1 {\n background-color: #9ebad9;\n color: #000000;\n } #T_cf919a22_d86d_11ea_be2c_acde48001122row29_col2 {\n background-color: #fff7fb;\n color: #000000;\n } #T_cf919a22_d86d_11ea_be2c_acde48001122row30_col1 {\n background-color: #fef6fb;\n color: #000000;\n } #T_cf919a22_d86d_11ea_be2c_acde48001122row30_col2 {\n background-color: #fff7fb;\n color: #000000;\n } #T_cf919a22_d86d_11ea_be2c_acde48001122row31_col1 {\n background-color: #e5e1ef;\n color: #000000;\n } #T_cf919a22_d86d_11ea_be2c_acde48001122row31_col2 {\n background-color: #eae6f1;\n color: #000000;\n } #T_cf919a22_d86d_11ea_be2c_acde48001122row32_col1 {\n background-color: #d8d7e9;\n color: #000000;\n } #T_cf919a22_d86d_11ea_be2c_acde48001122row32_col2 {\n background-color: #fff7fb;\n color: #000000;\n } #T_cf919a22_d86d_11ea_be2c_acde48001122row33_col1 {\n background-color: #dbdaeb;\n color: #000000;\n } #T_cf919a22_d86d_11ea_be2c_acde48001122row33_col2 {\n background-color: #fff7fb;\n color: #000000;\n } #T_cf919a22_d86d_11ea_be2c_acde48001122row34_col1 {\n background-color: #f0eaf4;\n color: #000000;\n } #T_cf919a22_d86d_11ea_be2c_acde48001122row34_col2 {\n background-color: #348ebf;\n color: #000000;\n } #T_cf919a22_d86d_11ea_be2c_acde48001122row35_col1 {\n background-color: #e4e1ef;\n color: #000000;\n } #T_cf919a22_d86d_11ea_be2c_acde48001122row35_col2 {\n background-color: #eee8f3;\n color: #000000;\n } #T_cf919a22_d86d_11ea_be2c_acde48001122row36_col1 {\n background-color: #f1ebf4;\n color: #000000;\n } #T_cf919a22_d86d_11ea_be2c_acde48001122row36_col2 {\n background-color: #9ab8d8;\n color: #000000;\n } #T_cf919a22_d86d_11ea_be2c_acde48001122row37_col1 {\n background-color: #023858;\n color: #f1f1f1;\n } #T_cf919a22_d86d_11ea_be2c_acde48001122row37_col2 {\n background-color: #fff7fb;\n color: #000000;\n } 
#T_cf919a22_d86d_11ea_be2c_acde48001122row38_col1 {\n background-color: #03446a;\n color: #f1f1f1;\n } #T_cf919a22_d86d_11ea_be2c_acde48001122row38_col2 {\n background-color: #fff7fb;\n color: #000000;\n } #T_cf919a22_d86d_11ea_be2c_acde48001122row39_col1 {\n background-color: #03446a;\n color: #f1f1f1;\n } #T_cf919a22_d86d_11ea_be2c_acde48001122row39_col2 {\n background-color: #fff7fb;\n color: #000000;\n } #T_cf919a22_d86d_11ea_be2c_acde48001122row40_col1 {\n background-color: #63a2cb;\n color: #000000;\n } #T_cf919a22_d86d_11ea_be2c_acde48001122row40_col2 {\n background-color: #fff7fb;\n color: #000000;\n }</style><table id=\"T_cf919a22_d86d_11ea_be2c_acde48001122\" ><thead> <tr> <th class=\"blank level0\" ></th> <th class=\"col_heading level0 col0\" >feat</th> <th class=\"col_heading level0 col1\" >pearson_r</th> <th class=\"col_heading level0 col2\" >p_value</th> </tr></thead><tbody>\n <tr>\n <th id=\"T_cf919a22_d86d_11ea_be2c_acde48001122level0_row0\" class=\"row_heading level0 row0\" >0</th>\n <td id=\"T_cf919a22_d86d_11ea_be2c_acde48001122row0_col0\" class=\"data row0 col0\" >Id</td>\n <td id=\"T_cf919a22_d86d_11ea_be2c_acde48001122row0_col1\" class=\"data row0 col1\" >-0.0219167</td>\n <td id=\"T_cf919a22_d86d_11ea_be2c_acde48001122row0_col2\" class=\"data row0 col2\" >0.402694</td>\n </tr>\n <tr>\n <th id=\"T_cf919a22_d86d_11ea_be2c_acde48001122level0_row1\" class=\"row_heading level0 row1\" >1</th>\n <td id=\"T_cf919a22_d86d_11ea_be2c_acde48001122row1_col0\" class=\"data row1 col0\" >MSSubClass</td>\n <td id=\"T_cf919a22_d86d_11ea_be2c_acde48001122row1_col1\" class=\"data row1 col1\" >-0.0842841</td>\n <td id=\"T_cf919a22_d86d_11ea_be2c_acde48001122row1_col2\" class=\"data row1 col2\" >0.00126647</td>\n </tr>\n <tr>\n <th id=\"T_cf919a22_d86d_11ea_be2c_acde48001122level0_row2\" class=\"row_heading level0 row2\" >2</th>\n <td id=\"T_cf919a22_d86d_11ea_be2c_acde48001122row2_col0\" class=\"data row2 col0\" >LotFrontage</td>\n <td 
id=\"T_cf919a22_d86d_11ea_be2c_acde48001122row2_col1\" class=\"data row2 col1\" >0.209624</td>\n <td id=\"T_cf919a22_d86d_11ea_be2c_acde48001122row2_col2\" class=\"data row2 col2\" >5.8243e-16</td>\n </tr>\n <tr>\n <th id=\"T_cf919a22_d86d_11ea_be2c_acde48001122level0_row3\" class=\"row_heading level0 row3\" >3</th>\n <td id=\"T_cf919a22_d86d_11ea_be2c_acde48001122row3_col0\" class=\"data row3 col0\" >LotArea</td>\n <td id=\"T_cf919a22_d86d_11ea_be2c_acde48001122row3_col1\" class=\"data row3 col1\" >0.263843</td>\n <td id=\"T_cf919a22_d86d_11ea_be2c_acde48001122row3_col2\" class=\"data row3 col2\" >1.12314e-24</td>\n </tr>\n <tr>\n <th id=\"T_cf919a22_d86d_11ea_be2c_acde48001122level0_row4\" class=\"row_heading level0 row4\" >4</th>\n <td id=\"T_cf919a22_d86d_11ea_be2c_acde48001122row4_col0\" class=\"data row4 col0\" >OverallQual</td>\n <td id=\"T_cf919a22_d86d_11ea_be2c_acde48001122row4_col1\" class=\"data row4 col1\" >0.790982</td>\n <td id=\"T_cf919a22_d86d_11ea_be2c_acde48001122row4_col2\" class=\"data row4 col2\" >2.18568e-313</td>\n </tr>\n <tr>\n <th id=\"T_cf919a22_d86d_11ea_be2c_acde48001122level0_row5\" class=\"row_heading level0 row5\" >5</th>\n <td id=\"T_cf919a22_d86d_11ea_be2c_acde48001122row5_col0\" class=\"data row5 col0\" >OverallCond</td>\n <td id=\"T_cf919a22_d86d_11ea_be2c_acde48001122row5_col1\" class=\"data row5 col1\" >-0.0778559</td>\n <td id=\"T_cf919a22_d86d_11ea_be2c_acde48001122row5_col2\" class=\"data row5 col2\" >0.00291235</td>\n </tr>\n <tr>\n <th id=\"T_cf919a22_d86d_11ea_be2c_acde48001122level0_row6\" class=\"row_heading level0 row6\" >6</th>\n <td id=\"T_cf919a22_d86d_11ea_be2c_acde48001122row6_col0\" class=\"data row6 col0\" >YearBuilt</td>\n <td id=\"T_cf919a22_d86d_11ea_be2c_acde48001122row6_col1\" class=\"data row6 col1\" >0.522897</td>\n <td id=\"T_cf919a22_d86d_11ea_be2c_acde48001122row6_col2\" class=\"data row6 col2\" >2.99023e-103</td>\n </tr>\n <tr>\n <th id=\"T_cf919a22_d86d_11ea_be2c_acde48001122level0_row7\" 
class=\"row_heading level0 row7\" >7</th>\n <td id=\"T_cf919a22_d86d_11ea_be2c_acde48001122row7_col0\" class=\"data row7 col0\" >YearRemodAdd</td>\n <td id=\"T_cf919a22_d86d_11ea_be2c_acde48001122row7_col1\" class=\"data row7 col1\" >0.507101</td>\n <td id=\"T_cf919a22_d86d_11ea_be2c_acde48001122row7_col2\" class=\"data row7 col2\" >3.16495e-96</td>\n </tr>\n <tr>\n <th id=\"T_cf919a22_d86d_11ea_be2c_acde48001122level0_row8\" class=\"row_heading level0 row8\" >8</th>\n <td id=\"T_cf919a22_d86d_11ea_be2c_acde48001122row8_col0\" class=\"data row8 col0\" >MasVnrArea</td>\n <td id=\"T_cf919a22_d86d_11ea_be2c_acde48001122row8_col1\" class=\"data row8 col1\" >0.472614</td>\n <td id=\"T_cf919a22_d86d_11ea_be2c_acde48001122row8_col2\" class=\"data row8 col2\" >4.10046e-82</td>\n </tr>\n <tr>\n <th id=\"T_cf919a22_d86d_11ea_be2c_acde48001122level0_row9\" class=\"row_heading level0 row9\" >9</th>\n <td id=\"T_cf919a22_d86d_11ea_be2c_acde48001122row9_col0\" class=\"data row9 col0\" >BsmtFinSF1</td>\n <td id=\"T_cf919a22_d86d_11ea_be2c_acde48001122row9_col1\" class=\"data row9 col1\" >0.38642</td>\n <td id=\"T_cf919a22_d86d_11ea_be2c_acde48001122row9_col2\" class=\"data row9 col2\" >3.39411e-53</td>\n </tr>\n <tr>\n <th id=\"T_cf919a22_d86d_11ea_be2c_acde48001122level0_row10\" class=\"row_heading level0 row10\" >10</th>\n <td id=\"T_cf919a22_d86d_11ea_be2c_acde48001122row10_col0\" class=\"data row10 col0\" >BsmtFinSF2</td>\n <td id=\"T_cf919a22_d86d_11ea_be2c_acde48001122row10_col1\" class=\"data row10 col1\" >-0.0113781</td>\n <td id=\"T_cf919a22_d86d_11ea_be2c_acde48001122row10_col2\" class=\"data row10 col2\" >0.663999</td>\n </tr>\n <tr>\n <th id=\"T_cf919a22_d86d_11ea_be2c_acde48001122level0_row11\" class=\"row_heading level0 row11\" >11</th>\n <td id=\"T_cf919a22_d86d_11ea_be2c_acde48001122row11_col0\" class=\"data row11 col0\" >BsmtUnfSF</td>\n <td id=\"T_cf919a22_d86d_11ea_be2c_acde48001122row11_col1\" class=\"data row11 col1\" >0.214479</td>\n <td 
id=\"T_cf919a22_d86d_11ea_be2c_acde48001122row11_col2\" class=\"data row11 col2\" >1.18298e-16</td>\n </tr>\n <tr>\n <th id=\"T_cf919a22_d86d_11ea_be2c_acde48001122level0_row12\" class=\"row_heading level0 row12\" >12</th>\n <td id=\"T_cf919a22_d86d_11ea_be2c_acde48001122row12_col0\" class=\"data row12 col0\" >TotalBsmtSF</td>\n <td id=\"T_cf919a22_d86d_11ea_be2c_acde48001122row12_col1\" class=\"data row12 col1\" >0.613581</td>\n <td id=\"T_cf919a22_d86d_11ea_be2c_acde48001122row12_col2\" class=\"data row12 col2\" >9.48423e-152</td>\n </tr>\n <tr>\n <th id=\"T_cf919a22_d86d_11ea_be2c_acde48001122level0_row13\" class=\"row_heading level0 row13\" >13</th>\n <td id=\"T_cf919a22_d86d_11ea_be2c_acde48001122row13_col0\" class=\"data row13 col0\" >1stFlrSF</td>\n <td id=\"T_cf919a22_d86d_11ea_be2c_acde48001122row13_col1\" class=\"data row13 col1\" >0.605852</td>\n <td id=\"T_cf919a22_d86d_11ea_be2c_acde48001122row13_col2\" class=\"data row13 col2\" >5.39471e-147</td>\n </tr>\n <tr>\n <th id=\"T_cf919a22_d86d_11ea_be2c_acde48001122level0_row14\" class=\"row_heading level0 row14\" >14</th>\n <td id=\"T_cf919a22_d86d_11ea_be2c_acde48001122row14_col0\" class=\"data row14 col0\" >2ndFlrSF</td>\n <td id=\"T_cf919a22_d86d_11ea_be2c_acde48001122row14_col1\" class=\"data row14 col1\" >0.319334</td>\n <td id=\"T_cf919a22_d86d_11ea_be2c_acde48001122row14_col2\" class=\"data row14 col2\" >5.76434e-36</td>\n </tr>\n <tr>\n <th id=\"T_cf919a22_d86d_11ea_be2c_acde48001122level0_row15\" class=\"row_heading level0 row15\" >15</th>\n <td id=\"T_cf919a22_d86d_11ea_be2c_acde48001122row15_col0\" class=\"data row15 col0\" >LowQualFinSF</td>\n <td id=\"T_cf919a22_d86d_11ea_be2c_acde48001122row15_col1\" class=\"data row15 col1\" >-0.0256061</td>\n <td id=\"T_cf919a22_d86d_11ea_be2c_acde48001122row15_col2\" class=\"data row15 col2\" >0.328207</td>\n </tr>\n <tr>\n <th id=\"T_cf919a22_d86d_11ea_be2c_acde48001122level0_row16\" class=\"row_heading level0 row16\" >16</th>\n <td 
id=\"T_cf919a22_d86d_11ea_be2c_acde48001122row16_col0\" class=\"data row16 col0\" >GrLivArea</td>\n <td id=\"T_cf919a22_d86d_11ea_be2c_acde48001122row16_col1\" class=\"data row16 col1\" >0.708624</td>\n <td id=\"T_cf919a22_d86d_11ea_be2c_acde48001122row16_col2\" class=\"data row16 col2\" >4.51803e-223</td>\n </tr>\n <tr>\n <th id=\"T_cf919a22_d86d_11ea_be2c_acde48001122level0_row17\" class=\"row_heading level0 row17\" >17</th>\n <td id=\"T_cf919a22_d86d_11ea_be2c_acde48001122row17_col0\" class=\"data row17 col0\" >BsmtFullBath</td>\n <td id=\"T_cf919a22_d86d_11ea_be2c_acde48001122row17_col1\" class=\"data row17 col1\" >0.227122</td>\n <td id=\"T_cf919a22_d86d_11ea_be2c_acde48001122row17_col2\" class=\"data row17 col2\" >1.55034e-18</td>\n </tr>\n <tr>\n <th id=\"T_cf919a22_d86d_11ea_be2c_acde48001122level0_row18\" class=\"row_heading level0 row18\" >18</th>\n <td id=\"T_cf919a22_d86d_11ea_be2c_acde48001122row18_col0\" class=\"data row18 col0\" >BsmtHalfBath</td>\n <td id=\"T_cf919a22_d86d_11ea_be2c_acde48001122row18_col1\" class=\"data row18 col1\" >-0.0168442</td>\n <td id=\"T_cf919a22_d86d_11ea_be2c_acde48001122row18_col2\" class=\"data row18 col2\" >0.520154</td>\n </tr>\n <tr>\n <th id=\"T_cf919a22_d86d_11ea_be2c_acde48001122level0_row19\" class=\"row_heading level0 row19\" >19</th>\n <td id=\"T_cf919a22_d86d_11ea_be2c_acde48001122row19_col0\" class=\"data row19 col0\" >FullBath</td>\n <td id=\"T_cf919a22_d86d_11ea_be2c_acde48001122row19_col1\" class=\"data row19 col1\" >0.560664</td>\n <td id=\"T_cf919a22_d86d_11ea_be2c_acde48001122row19_col2\" class=\"data row19 col2\" >1.23647e-121</td>\n </tr>\n <tr>\n <th id=\"T_cf919a22_d86d_11ea_be2c_acde48001122level0_row20\" class=\"row_heading level0 row20\" >20</th>\n <td id=\"T_cf919a22_d86d_11ea_be2c_acde48001122row20_col0\" class=\"data row20 col0\" >HalfBath</td>\n <td id=\"T_cf919a22_d86d_11ea_be2c_acde48001122row20_col1\" class=\"data row20 col1\" >0.284108</td>\n <td 
id=\"T_cf919a22_d86d_11ea_be2c_acde48001122row20_col2\" class=\"data row20 col2\" >1.65047e-28</td>\n </tr>\n <tr>\n <th id=\"T_cf919a22_d86d_11ea_be2c_acde48001122level0_row21\" class=\"row_heading level0 row21\" >21</th>\n <td id=\"T_cf919a22_d86d_11ea_be2c_acde48001122row21_col0\" class=\"data row21 col0\" >BedroomAbvGr</td>\n <td id=\"T_cf919a22_d86d_11ea_be2c_acde48001122row21_col1\" class=\"data row21 col1\" >0.168213</td>\n <td id=\"T_cf919a22_d86d_11ea_be2c_acde48001122row21_col2\" class=\"data row21 col2\" >9.9275e-11</td>\n </tr>\n <tr>\n <th id=\"T_cf919a22_d86d_11ea_be2c_acde48001122level0_row22\" class=\"row_heading level0 row22\" >22</th>\n <td id=\"T_cf919a22_d86d_11ea_be2c_acde48001122row22_col0\" class=\"data row22 col0\" >KitchenAbvGr</td>\n <td id=\"T_cf919a22_d86d_11ea_be2c_acde48001122row22_col1\" class=\"data row22 col1\" >-0.135907</td>\n <td id=\"T_cf919a22_d86d_11ea_be2c_acde48001122row22_col2\" class=\"data row22 col2\" >1.86043e-07</td>\n </tr>\n <tr>\n <th id=\"T_cf919a22_d86d_11ea_be2c_acde48001122level0_row23\" class=\"row_heading level0 row23\" >23</th>\n <td id=\"T_cf919a22_d86d_11ea_be2c_acde48001122row23_col0\" class=\"data row23 col0\" >TotRmsAbvGrd</td>\n <td id=\"T_cf919a22_d86d_11ea_be2c_acde48001122row23_col1\" class=\"data row23 col1\" >0.533723</td>\n <td id=\"T_cf919a22_d86d_11ea_be2c_acde48001122row23_col2\" class=\"data row23 col2\" >2.77228e-108</td>\n </tr>\n <tr>\n <th id=\"T_cf919a22_d86d_11ea_be2c_acde48001122level0_row24\" class=\"row_heading level0 row24\" >24</th>\n <td id=\"T_cf919a22_d86d_11ea_be2c_acde48001122row24_col0\" class=\"data row24 col0\" >Fireplaces</td>\n <td id=\"T_cf919a22_d86d_11ea_be2c_acde48001122row24_col1\" class=\"data row24 col1\" >0.466929</td>\n <td id=\"T_cf919a22_d86d_11ea_be2c_acde48001122row24_col2\" class=\"data row24 col2\" >6.14149e-80</td>\n </tr>\n <tr>\n <th id=\"T_cf919a22_d86d_11ea_be2c_acde48001122level0_row25\" class=\"row_heading level0 row25\" >25</th>\n <td 
id=\"T_cf919a22_d86d_11ea_be2c_acde48001122row25_col0\" class=\"data row25 col0\" >GarageYrBlt</td>\n <td id=\"T_cf919a22_d86d_11ea_be2c_acde48001122row25_col1\" class=\"data row25 col1\" >0.261366</td>\n <td id=\"T_cf919a22_d86d_11ea_be2c_acde48001122row25_col2\" class=\"data row25 col2\" >3.13925e-24</td>\n </tr>\n <tr>\n <th id=\"T_cf919a22_d86d_11ea_be2c_acde48001122level0_row26\" class=\"row_heading level0 row26\" >26</th>\n <td id=\"T_cf919a22_d86d_11ea_be2c_acde48001122row26_col0\" class=\"data row26 col0\" >GarageCars</td>\n <td id=\"T_cf919a22_d86d_11ea_be2c_acde48001122row26_col1\" class=\"data row26 col1\" >0.640409</td>\n <td id=\"T_cf919a22_d86d_11ea_be2c_acde48001122row26_col2\" class=\"data row26 col2\" >2.49864e-169</td>\n </tr>\n <tr>\n <th id=\"T_cf919a22_d86d_11ea_be2c_acde48001122level0_row27\" class=\"row_heading level0 row27\" >27</th>\n <td id=\"T_cf919a22_d86d_11ea_be2c_acde48001122row27_col0\" class=\"data row27 col0\" >GarageArea</td>\n <td id=\"T_cf919a22_d86d_11ea_be2c_acde48001122row27_col1\" class=\"data row27 col1\" >0.623431</td>\n <td id=\"T_cf919a22_d86d_11ea_be2c_acde48001122row27_col2\" class=\"data row27 col2\" >5.26504e-158</td>\n </tr>\n <tr>\n <th id=\"T_cf919a22_d86d_11ea_be2c_acde48001122level0_row28\" class=\"row_heading level0 row28\" >28</th>\n <td id=\"T_cf919a22_d86d_11ea_be2c_acde48001122row28_col0\" class=\"data row28 col0\" >WoodDeckSF</td>\n <td id=\"T_cf919a22_d86d_11ea_be2c_acde48001122row28_col1\" class=\"data row28 col1\" >0.324413</td>\n <td id=\"T_cf919a22_d86d_11ea_be2c_acde48001122row28_col2\" class=\"data row28 col2\" >3.97222e-37</td>\n </tr>\n <tr>\n <th id=\"T_cf919a22_d86d_11ea_be2c_acde48001122level0_row29\" class=\"row_heading level0 row29\" >29</th>\n <td id=\"T_cf919a22_d86d_11ea_be2c_acde48001122row29_col0\" class=\"data row29 col0\" >OpenPorchSF</td>\n <td id=\"T_cf919a22_d86d_11ea_be2c_acde48001122row29_col1\" class=\"data row29 col1\" >0.315856</td>\n <td 
id=\"T_cf919a22_d86d_11ea_be2c_acde48001122row29_col2\" class=\"data row29 col2\" >3.49337e-35</td>\n </tr>\n <tr>\n <th id=\"T_cf919a22_d86d_11ea_be2c_acde48001122level0_row30\" class=\"row_heading level0 row30\" >30</th>\n <td id=\"T_cf919a22_d86d_11ea_be2c_acde48001122row30_col0\" class=\"data row30 col0\" >EnclosedPorch</td>\n <td id=\"T_cf919a22_d86d_11ea_be2c_acde48001122row30_col1\" class=\"data row30 col1\" >-0.128578</td>\n <td id=\"T_cf919a22_d86d_11ea_be2c_acde48001122row30_col2\" class=\"data row30 col2\" >8.25577e-07</td>\n </tr>\n <tr>\n <th id=\"T_cf919a22_d86d_11ea_be2c_acde48001122level0_row31\" class=\"row_heading level0 row31\" >31</th>\n <td id=\"T_cf919a22_d86d_11ea_be2c_acde48001122row31_col0\" class=\"data row31 col0\" >3SsnPorch</td>\n <td id=\"T_cf919a22_d86d_11ea_be2c_acde48001122row31_col1\" class=\"data row31 col1\" >0.0445837</td>\n <td id=\"T_cf919a22_d86d_11ea_be2c_acde48001122row31_col2\" class=\"data row31 col2\" >0.0885817</td>\n </tr>\n <tr>\n <th id=\"T_cf919a22_d86d_11ea_be2c_acde48001122level0_row32\" class=\"row_heading level0 row32\" >32</th>\n <td id=\"T_cf919a22_d86d_11ea_be2c_acde48001122row32_col0\" class=\"data row32 col0\" >ScreenPorch</td>\n <td id=\"T_cf919a22_d86d_11ea_be2c_acde48001122row32_col1\" class=\"data row32 col1\" >0.111447</td>\n <td id=\"T_cf919a22_d86d_11ea_be2c_acde48001122row32_col2\" class=\"data row32 col2\" >1.97214e-05</td>\n </tr>\n <tr>\n <th id=\"T_cf919a22_d86d_11ea_be2c_acde48001122level0_row33\" class=\"row_heading level0 row33\" >33</th>\n <td id=\"T_cf919a22_d86d_11ea_be2c_acde48001122row33_col0\" class=\"data row33 col0\" >PoolArea</td>\n <td id=\"T_cf919a22_d86d_11ea_be2c_acde48001122row33_col1\" class=\"data row33 col1\" >0.0924035</td>\n <td id=\"T_cf919a22_d86d_11ea_be2c_acde48001122row33_col2\" class=\"data row33 col2\" >0.000407349</td>\n </tr>\n <tr>\n <th id=\"T_cf919a22_d86d_11ea_be2c_acde48001122level0_row34\" class=\"row_heading level0 row34\" >34</th>\n <td 
id=\"T_cf919a22_d86d_11ea_be2c_acde48001122row34_col0\" class=\"data row34 col0\" >MiscVal</td>\n <td id=\"T_cf919a22_d86d_11ea_be2c_acde48001122row34_col1\" class=\"data row34 col1\" >-0.0211896</td>\n <td id=\"T_cf919a22_d86d_11ea_be2c_acde48001122row34_col2\" class=\"data row34 col2\" >0.418486</td>\n </tr>\n <tr>\n <th id=\"T_cf919a22_d86d_11ea_be2c_acde48001122level0_row35\" class=\"row_heading level0 row35\" >35</th>\n <td id=\"T_cf919a22_d86d_11ea_be2c_acde48001122row35_col0\" class=\"data row35 col0\" >MoSold</td>\n <td id=\"T_cf919a22_d86d_11ea_be2c_acde48001122row35_col1\" class=\"data row35 col1\" >0.0464322</td>\n <td id=\"T_cf919a22_d86d_11ea_be2c_acde48001122row35_col2\" class=\"data row35 col2\" >0.0761276</td>\n </tr>\n <tr>\n <th id=\"T_cf919a22_d86d_11ea_be2c_acde48001122level0_row36\" class=\"row_heading level0 row36\" >36</th>\n <td id=\"T_cf919a22_d86d_11ea_be2c_acde48001122row36_col0\" class=\"data row36 col0\" >YrSold</td>\n <td id=\"T_cf919a22_d86d_11ea_be2c_acde48001122row36_col1\" class=\"data row36 col1\" >-0.0289226</td>\n <td id=\"T_cf919a22_d86d_11ea_be2c_acde48001122row36_col2\" class=\"data row36 col2\" >0.269413</td>\n </tr>\n <tr>\n <th id=\"T_cf919a22_d86d_11ea_be2c_acde48001122level0_row37\" class=\"row_heading level0 row37\" >37</th>\n <td id=\"T_cf919a22_d86d_11ea_be2c_acde48001122row37_col0\" class=\"data row37 col0\" >SalePrice</td>\n <td id=\"T_cf919a22_d86d_11ea_be2c_acde48001122row37_col1\" class=\"data row37 col1\" >1</td>\n <td id=\"T_cf919a22_d86d_11ea_be2c_acde48001122row37_col2\" class=\"data row37 col2\" >0</td>\n </tr>\n <tr>\n <th id=\"T_cf919a22_d86d_11ea_be2c_acde48001122level0_row38\" class=\"row_heading level0 row38\" >38</th>\n <td id=\"T_cf919a22_d86d_11ea_be2c_acde48001122row38_col0\" class=\"data row38 col0\" >SalePriceNorm</td>\n <td id=\"T_cf919a22_d86d_11ea_be2c_acde48001122row38_col1\" class=\"data row38 col1\" >0.948374</td>\n <td id=\"T_cf919a22_d86d_11ea_be2c_acde48001122row38_col2\" class=\"data 
row38 col2\" >0</td>\n </tr>\n <tr>\n <th id=\"T_cf919a22_d86d_11ea_be2c_acde48001122level0_row39\" class=\"row_heading level0 row39\" >39</th>\n <td id=\"T_cf919a22_d86d_11ea_be2c_acde48001122row39_col0\" class=\"data row39 col0\" >SalePricel1p</td>\n <td id=\"T_cf919a22_d86d_11ea_be2c_acde48001122row39_col1\" class=\"data row39 col1\" >0.948374</td>\n <td id=\"T_cf919a22_d86d_11ea_be2c_acde48001122row39_col2\" class=\"data row39 col2\" >0</td>\n </tr>\n <tr>\n <th id=\"T_cf919a22_d86d_11ea_be2c_acde48001122level0_row40\" class=\"row_heading level0 row40\" >40</th>\n <td id=\"T_cf919a22_d86d_11ea_be2c_acde48001122row40_col0\" class=\"data row40 col0\" >fireplace_exists</td>\n <td id=\"T_cf919a22_d86d_11ea_be2c_acde48001122row40_col1\" class=\"data row40 col1\" >0.471908</td>\n <td id=\"T_cf919a22_d86d_11ea_be2c_acde48001122row40_col2\" class=\"data row40 col2\" >7.68009e-82</td>\n </tr>\n </tbody></table>", | |
| "text/plain": "<pandas.io.formats.style.Styler at 0x7fdec0c66210>" | |
| }, | |
| "execution_count": 103, | |
| "metadata": {}, | |
| "output_type": "execute_result" | |
| } | |
| ] | |
| }, | |
| { | |
| "metadata": { | |
| "trusted": true | |
| }, | |
| "cell_type": "code", | |
| "source": "get_pearsonr_w_pvalue(corr_df, target_var=\"SalePrice\", threshold=0.60) # High correlation", | |
| "execution_count": 104, | |
| "outputs": [ | |
| { | |
| "data": { | |
| "text/html": "<style type=\"text/css\" >\n #T_dac9741e_d86d_11ea_be2c_acde48001122row0_col1 {\n background-color: #80aed2;\n color: #000000;\n } #T_dac9741e_d86d_11ea_be2c_acde48001122row0_col2 {\n background-color: #fff7fb;\n color: #000000;\n } #T_dac9741e_d86d_11ea_be2c_acde48001122row1_col1 {\n background-color: #fcf4fa;\n color: #000000;\n } #T_dac9741e_d86d_11ea_be2c_acde48001122row1_col2 {\n background-color: #fff7fb;\n color: #000000;\n } #T_dac9741e_d86d_11ea_be2c_acde48001122row2_col1 {\n background-color: #fff7fb;\n color: #000000;\n } #T_dac9741e_d86d_11ea_be2c_acde48001122row2_col2 {\n background-color: #023858;\n color: #f1f1f1;\n } #T_dac9741e_d86d_11ea_be2c_acde48001122row3_col1 {\n background-color: #cdd0e5;\n color: #000000;\n } #T_dac9741e_d86d_11ea_be2c_acde48001122row3_col2 {\n background-color: #fff7fb;\n color: #000000;\n } #T_dac9741e_d86d_11ea_be2c_acde48001122row4_col1 {\n background-color: #f2ecf5;\n color: #000000;\n } #T_dac9741e_d86d_11ea_be2c_acde48001122row4_col2 {\n background-color: #fff7fb;\n color: #000000;\n } #T_dac9741e_d86d_11ea_be2c_acde48001122row5_col1 {\n background-color: #f8f1f8;\n color: #000000;\n } #T_dac9741e_d86d_11ea_be2c_acde48001122row5_col2 {\n background-color: #fff7fb;\n color: #000000;\n } #T_dac9741e_d86d_11ea_be2c_acde48001122row6_col1 {\n background-color: #023858;\n color: #f1f1f1;\n } #T_dac9741e_d86d_11ea_be2c_acde48001122row6_col2 {\n background-color: #fff7fb;\n color: #000000;\n } #T_dac9741e_d86d_11ea_be2c_acde48001122row7_col1 {\n background-color: #045b8e;\n color: #f1f1f1;\n } #T_dac9741e_d86d_11ea_be2c_acde48001122row7_col2 {\n background-color: #fff7fb;\n color: #000000;\n } #T_dac9741e_d86d_11ea_be2c_acde48001122row8_col1 {\n background-color: #045b8e;\n color: #f1f1f1;\n } #T_dac9741e_d86d_11ea_be2c_acde48001122row8_col2 {\n background-color: #fff7fb;\n color: #000000;\n }</style><table id=\"T_dac9741e_d86d_11ea_be2c_acde48001122\" ><thead> <tr> <th class=\"blank level0\" ></th> <th 
class=\"col_heading level0 col0\" >feat</th> <th class=\"col_heading level0 col1\" >pearson_r</th> <th class=\"col_heading level0 col2\" >p_value</th> </tr></thead><tbody>\n <tr>\n <th id=\"T_dac9741e_d86d_11ea_be2c_acde48001122level0_row0\" class=\"row_heading level0 row0\" >0</th>\n <td id=\"T_dac9741e_d86d_11ea_be2c_acde48001122row0_col0\" class=\"data row0 col0\" >OverallQual</td>\n <td id=\"T_dac9741e_d86d_11ea_be2c_acde48001122row0_col1\" class=\"data row0 col1\" >0.790982</td>\n <td id=\"T_dac9741e_d86d_11ea_be2c_acde48001122row0_col2\" class=\"data row0 col2\" >2.18568e-313</td>\n </tr>\n <tr>\n <th id=\"T_dac9741e_d86d_11ea_be2c_acde48001122level0_row1\" class=\"row_heading level0 row1\" >1</th>\n <td id=\"T_dac9741e_d86d_11ea_be2c_acde48001122row1_col0\" class=\"data row1 col0\" >TotalBsmtSF</td>\n <td id=\"T_dac9741e_d86d_11ea_be2c_acde48001122row1_col1\" class=\"data row1 col1\" >0.613581</td>\n <td id=\"T_dac9741e_d86d_11ea_be2c_acde48001122row1_col2\" class=\"data row1 col2\" >9.48423e-152</td>\n </tr>\n <tr>\n <th id=\"T_dac9741e_d86d_11ea_be2c_acde48001122level0_row2\" class=\"row_heading level0 row2\" >2</th>\n <td id=\"T_dac9741e_d86d_11ea_be2c_acde48001122row2_col0\" class=\"data row2 col0\" >1stFlrSF</td>\n <td id=\"T_dac9741e_d86d_11ea_be2c_acde48001122row2_col1\" class=\"data row2 col1\" >0.605852</td>\n <td id=\"T_dac9741e_d86d_11ea_be2c_acde48001122row2_col2\" class=\"data row2 col2\" >5.39471e-147</td>\n </tr>\n <tr>\n <th id=\"T_dac9741e_d86d_11ea_be2c_acde48001122level0_row3\" class=\"row_heading level0 row3\" >3</th>\n <td id=\"T_dac9741e_d86d_11ea_be2c_acde48001122row3_col0\" class=\"data row3 col0\" >GrLivArea</td>\n <td id=\"T_dac9741e_d86d_11ea_be2c_acde48001122row3_col1\" class=\"data row3 col1\" >0.708624</td>\n <td id=\"T_dac9741e_d86d_11ea_be2c_acde48001122row3_col2\" class=\"data row3 col2\" >4.51803e-223</td>\n </tr>\n <tr>\n <th id=\"T_dac9741e_d86d_11ea_be2c_acde48001122level0_row4\" class=\"row_heading level0 row4\" 
>4</th>\n <td id=\"T_dac9741e_d86d_11ea_be2c_acde48001122row4_col0\" class=\"data row4 col0\" >GarageCars</td>\n <td id=\"T_dac9741e_d86d_11ea_be2c_acde48001122row4_col1\" class=\"data row4 col1\" >0.640409</td>\n <td id=\"T_dac9741e_d86d_11ea_be2c_acde48001122row4_col2\" class=\"data row4 col2\" >2.49864e-169</td>\n </tr>\n <tr>\n <th id=\"T_dac9741e_d86d_11ea_be2c_acde48001122level0_row5\" class=\"row_heading level0 row5\" >5</th>\n <td id=\"T_dac9741e_d86d_11ea_be2c_acde48001122row5_col0\" class=\"data row5 col0\" >GarageArea</td>\n <td id=\"T_dac9741e_d86d_11ea_be2c_acde48001122row5_col1\" class=\"data row5 col1\" >0.623431</td>\n <td id=\"T_dac9741e_d86d_11ea_be2c_acde48001122row5_col2\" class=\"data row5 col2\" >5.26504e-158</td>\n </tr>\n <tr>\n <th id=\"T_dac9741e_d86d_11ea_be2c_acde48001122level0_row6\" class=\"row_heading level0 row6\" >6</th>\n <td id=\"T_dac9741e_d86d_11ea_be2c_acde48001122row6_col0\" class=\"data row6 col0\" >SalePrice</td>\n <td id=\"T_dac9741e_d86d_11ea_be2c_acde48001122row6_col1\" class=\"data row6 col1\" >1</td>\n <td id=\"T_dac9741e_d86d_11ea_be2c_acde48001122row6_col2\" class=\"data row6 col2\" >0</td>\n </tr>\n <tr>\n <th id=\"T_dac9741e_d86d_11ea_be2c_acde48001122level0_row7\" class=\"row_heading level0 row7\" >7</th>\n <td id=\"T_dac9741e_d86d_11ea_be2c_acde48001122row7_col0\" class=\"data row7 col0\" >SalePriceNorm</td>\n <td id=\"T_dac9741e_d86d_11ea_be2c_acde48001122row7_col1\" class=\"data row7 col1\" >0.948374</td>\n <td id=\"T_dac9741e_d86d_11ea_be2c_acde48001122row7_col2\" class=\"data row7 col2\" >0</td>\n </tr>\n <tr>\n <th id=\"T_dac9741e_d86d_11ea_be2c_acde48001122level0_row8\" class=\"row_heading level0 row8\" >8</th>\n <td id=\"T_dac9741e_d86d_11ea_be2c_acde48001122row8_col0\" class=\"data row8 col0\" >SalePricel1p</td>\n <td id=\"T_dac9741e_d86d_11ea_be2c_acde48001122row8_col1\" class=\"data row8 col1\" >0.948374</td>\n <td id=\"T_dac9741e_d86d_11ea_be2c_acde48001122row8_col2\" class=\"data row8 col2\" >0</td>\n 
</tr>\n </tbody></table>", | |
| "text/plain": "<pandas.io.formats.style.Styler at 0x7fdebe98e6d0>" | |
| }, | |
| "execution_count": 104, | |
| "metadata": {}, | |
| "output_type": "execute_result" | |
| } | |
| ] | |
| }, | |
| { | |
| "metadata": {}, | |
| "cell_type": "markdown", | |
| "source": "## Wrap-up" | |
| }, | |
| { | |
| "metadata": {}, | |
| "cell_type": "markdown", | |
| "source": "In this post we saw how statistical tests can answer some business questions from observational data. \nThe methods covered here let us determine whether the means and medians of a target endpoint differ significantly between groups, that is, whether there is an \"association\", and we used both parametric and non-parametric methods to do so. \n\nNext, we will look at questions about the \"effect\" of certain variables on the endpoint.\n" | |
| } | |
| ], | |
| "metadata": { | |
| "kernelspec": { | |
| "name": "conda_env_playground", | |
| "display_name": "conda_env_playground", | |
| "language": "python" | |
| }, | |
| "language_info": { | |
| "name": "python", | |
| "version": "3.7.4", | |
| "mimetype": "text/x-python", | |
| "codemirror_mode": { | |
| "name": "ipython", | |
| "version": 3 | |
| }, | |
| "pygments_lexer": "ipython3", | |
| "nbconvert_exporter": "python", | |
| "file_extension": ".py" | |
| }, | |
| "toc": { | |
| "nav_menu": {}, | |
| "number_sections": true, | |
| "sideBar": true, | |
| "skip_h1_title": false, | |
| "base_numbering": 1, | |
| "title_cell": "Table of Contents", | |
| "title_sidebar": "Contents", | |
| "toc_cell": true, | |
| "toc_position": {}, | |
| "toc_section_display": true, | |
| "toc_window_display": true | |
| }, | |
| "gist": { | |
| "id": "", | |
| "data": { | |
| "description": "Statistical tests and how to use them", | |
| "public": true | |
| } | |
| } | |
| }, | |
| "nbformat": 4, | |
| "nbformat_minor": 4 | |
| } |