Created
November 30, 2015 10:47
-
-
Save statkclee/9d54f17859763713973c to your computer and use it in GitHub Desktop.
Think Stats 2, 11장 연습문제 해답
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| { | |
| "cells": [ | |
| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": [ | |
| "통계적 사고 (2판) 연습문제 ([thinkstats2.com](thinkstats2.com), [think-stat.xwmooc.org](http://think-stat.xwmooc.org))<br>\n", | |
| "Allen Downey / 이광춘(xwMOOC)" | |
| ] | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": [ | |
| "## 연습문제 11.1\n", | |
| "\n", | |
| "동료중 한명이 출산이 예정되어 있고, 출생일을 예측하는 사무실 내기게임에 참여한다고 가정하자. 임신 30주차에 내기를 건다고 가정하자. 가장 최선의 예측을 하는데 어떤 변수를 사용하면 될까? 출생전에 알려진 변수로 한정해야하고, 내기에 참여한 사람들에게 이용가능해야할 것 같다.\n", | |
| "\n", | |
| "## 연습문제 11.2\n", | |
| "\n", | |
| "트리버스-윌라드(Trivers-Willard) 가설은 많은 포유류에 있어, 성비는 “어머니 상태(maternal condition)”에 달려있다고 제시한다; 즉, 산모 연령, 크기, 건강, 사회적 지위 같은 요인. https://en.wikipedia.org/wiki/Trivers-Willard_hypothesis 참조한다.\n", | |
| "일부 연구는 사람사이에 이런 효과를 보여주고 있지만, 결과는 혼재한다. 이번 장에서, 이런 요인과 연관된 일부 변수를 검정하지만, 성비에 통계적으로 유의적인 효과를 갖는 어떤 변수도 발견하지 못했다.\n", | |
| "\n", | |
| "연습으로, 데이터 마이닝 접근법을 사용해서 임신파일과 응답자파일에서 다른 변수를 검정하라. 실질적인 효과를 갖는 다른 요인을 발견할 수 있는가?\n", | |
| "\n", | |
| "## 연습문제 11.3\n", | |
| "예측하고자 하는 수량(quantity)이 갯수(count)라면, 포아송 회귀를 사용할 수 있다. StatModels에 poisson 함수로 구현되어 있다. ols 나 logit 과 동일한 방식으로 작동한다. 연습으로, 이것을 사용해서 한 여성에 대해서 얼마나 많은 아기가 태어나는지 예측한다; NSFG 데이터셋에서, 이 변수는 numbabes로 불린다..\n", | |
| "나이가 35세, 흑인, 연가구소득이 $75,000을 넘는 대학을 졸업한 여성을 가정해보자. 그녀가 얼마나 많은 자녀를 출산할 것으로 예상되는가?\n", | |
| "\n", | |
| "## 연습문제 11.4\n", | |
| "\n", | |
| "만약 예측하고자 하는 수량이 범주형이라면, 다항 로지스틱 회귀를 사용한다. StatModels에서 mnlogit 함수로 구현되어 있다. 연습으로, 이를 사용해서, 한 여성이 혼인상태, 동거상태, 과부상태, 이혼상태, 별거상태, 미혼인지 추측해본다; NSFG 데이터셋에서, 혼인상태는 변수명 rmarital로 부호화되어 있다.\n", | |
| "연령이 25세, 백인, 연가구소득이 약 $45,000 달러인 고졸 여성을 만났다고 가정해보자. 그녀가 혼인, 동거, 등일 확률은 얼마나 될까?" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": 9, | |
| "metadata": { | |
| "collapsed": false | |
| }, | |
| "outputs": [], | |
| "source": [ | |
| "import numpy as np\n", | |
| "import first\n", | |
| "live, firsts, others = first.MakeFrames()\n", | |
| "df = live[live.prglngth>30]" | |
| ] | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": [ | |
| "아래에 저자가 발견한 임신기간에 통계적으로 유의적인 변수만 나와있다." | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": 3, | |
| "metadata": { | |
| "collapsed": false | |
| }, | |
| "outputs": [ | |
| { | |
| "data": { | |
| "text/html": [ | |
| "<table class=\"simpletable\">\n", | |
| "<caption>OLS Regression Results</caption>\n", | |
| "<tr>\n", | |
| " <th>Dep. Variable:</th> <td>prglngth</td> <th> R-squared: </th> <td> 0.011</td> \n", | |
| "</tr>\n", | |
| "<tr>\n", | |
| " <th>Model:</th> <td>OLS</td> <th> Adj. R-squared: </th> <td> 0.011</td> \n", | |
| "</tr>\n", | |
| "<tr>\n", | |
| " <th>Method:</th> <td>Least Squares</td> <th> F-statistic: </th> <td> 34.28</td> \n", | |
| "</tr>\n", | |
| "<tr>\n", | |
| " <th>Date:</th> <td>Mon, 30 Nov 2015</td> <th> Prob (F-statistic):</th> <td>5.09e-22</td> \n", | |
| "</tr>\n", | |
| "<tr>\n", | |
| " <th>Time:</th> <td>10:43:48</td> <th> Log-Likelihood: </th> <td> -18247.</td> \n", | |
| "</tr>\n", | |
| "<tr>\n", | |
| " <th>No. Observations:</th> <td> 8884</td> <th> AIC: </th> <td>3.650e+04</td>\n", | |
| "</tr>\n", | |
| "<tr>\n", | |
| " <th>Df Residuals:</th> <td> 8880</td> <th> BIC: </th> <td>3.653e+04</td>\n", | |
| "</tr>\n", | |
| "<tr>\n", | |
| " <th>Df Model:</th> <td> 3</td> <th> </th> <td> </td> \n", | |
| "</tr>\n", | |
| "<tr>\n", | |
| " <th>Covariance Type:</th> <td>nonrobust</td> <th> </th> <td> </td> \n", | |
| "</tr>\n", | |
| "</table>\n", | |
| "<table class=\"simpletable\">\n", | |
| "<tr>\n", | |
| " <td></td> <th>coef</th> <th>std err</th> <th>t</th> <th>P>|t|</th> <th>[95.0% Conf. Int.]</th> \n", | |
| "</tr>\n", | |
| "<tr>\n", | |
| " <th>Intercept</th> <td> 38.7617</td> <td> 0.039</td> <td> 1006.410</td> <td> 0.000</td> <td> 38.686 38.837</td>\n", | |
| "</tr>\n", | |
| "<tr>\n", | |
| " <th>birthord == 1[T.True]</th> <td> 0.1015</td> <td> 0.040</td> <td> 2.528</td> <td> 0.011</td> <td> 0.023 0.180</td>\n", | |
| "</tr>\n", | |
| "<tr>\n", | |
| " <th>race == 2[T.True]</th> <td> 0.1390</td> <td> 0.042</td> <td> 3.311</td> <td> 0.001</td> <td> 0.057 0.221</td>\n", | |
| "</tr>\n", | |
| "<tr>\n", | |
| " <th>nbrnaliv > 1[T.True]</th> <td> -1.4944</td> <td> 0.164</td> <td> -9.086</td> <td> 0.000</td> <td> -1.817 -1.172</td>\n", | |
| "</tr>\n", | |
| "</table>\n", | |
| "<table class=\"simpletable\">\n", | |
| "<tr>\n", | |
| " <th>Omnibus:</th> <td>1587.470</td> <th> Durbin-Watson: </th> <td> 1.619</td>\n", | |
| "</tr>\n", | |
| "<tr>\n", | |
| " <th>Prob(Omnibus):</th> <td> 0.000</td> <th> Jarque-Bera (JB): </th> <td>6160.751</td>\n", | |
| "</tr>\n", | |
| "<tr>\n", | |
| " <th>Skew:</th> <td>-0.852</td> <th> Prob(JB): </th> <td> 0.00</td>\n", | |
| "</tr>\n", | |
| "<tr>\n", | |
| " <th>Kurtosis:</th> <td> 6.707</td> <th> Cond. No. </th> <td> 10.9</td>\n", | |
| "</tr>\n", | |
| "</table>" | |
| ], | |
| "text/plain": [ | |
| "<class 'statsmodels.iolib.summary.Summary'>\n", | |
| "\"\"\"\n", | |
| " OLS Regression Results \n", | |
| "==============================================================================\n", | |
| "Dep. Variable: prglngth R-squared: 0.011\n", | |
| "Model: OLS Adj. R-squared: 0.011\n", | |
| "Method: Least Squares F-statistic: 34.28\n", | |
| "Date: Mon, 30 Nov 2015 Prob (F-statistic): 5.09e-22\n", | |
| "Time: 10:43:48 Log-Likelihood: -18247.\n", | |
| "No. Observations: 8884 AIC: 3.650e+04\n", | |
| "Df Residuals: 8880 BIC: 3.653e+04\n", | |
| "Df Model: 3 \n", | |
| "Covariance Type: nonrobust \n", | |
| "=========================================================================================\n", | |
| " coef std err t P>|t| [95.0% Conf. Int.]\n", | |
| "-----------------------------------------------------------------------------------------\n", | |
| "Intercept 38.7617 0.039 1006.410 0.000 38.686 38.837\n", | |
| "birthord == 1[T.True] 0.1015 0.040 2.528 0.011 0.023 0.180\n", | |
| "race == 2[T.True] 0.1390 0.042 3.311 0.001 0.057 0.221\n", | |
| "nbrnaliv > 1[T.True] -1.4944 0.164 -9.086 0.000 -1.817 -1.172\n", | |
| "==============================================================================\n", | |
| "Omnibus: 1587.470 Durbin-Watson: 1.619\n", | |
| "Prob(Omnibus): 0.000 Jarque-Bera (JB): 6160.751\n", | |
| "Skew: -0.852 Prob(JB): 0.00\n", | |
| "Kurtosis: 6.707 Cond. No. 10.9\n", | |
| "==============================================================================\n", | |
| "\n", | |
| "Warnings:\n", | |
| "[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.\n", | |
| "\"\"\"" | |
| ] | |
| }, | |
| "execution_count": 3, | |
| "metadata": {}, | |
| "output_type": "execute_result" | |
| } | |
| ], | |
| "source": [ | |
| "import statsmodels.formula.api as smf\n", | |
| "model = smf.ols('prglngth ~ birthord==1 + race==2 + nbrnaliv>1', data=df)\n", | |
| "results = model.fit()\n", | |
| "results.summary()" | |
| ] | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": [ | |
| "성비를 예측하는 변수에 대해서 좀더 깊게 살펴보자." | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": 4, | |
| "metadata": { | |
| "collapsed": false | |
| }, | |
| "outputs": [], | |
| "source": [ | |
| "import regression\n", | |
| "join = regression.JoinFemResp(live)" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": 10, | |
| "metadata": { | |
| "collapsed": false | |
| }, | |
| "outputs": [ | |
| { | |
| "name": "stdout", | |
| "output_type": "stream", | |
| "text": [ | |
| "Optimization terminated successfully.\n", | |
| " Current function value: 0.687464\n", | |
| " Iterations 4\n" | |
| ] | |
| } | |
| ], | |
| "source": [ | |
| "def GoMining(df):\n", | |
| " \"\"\"Searches for variables that predict birth weight.\n", | |
| "\n", | |
| " df: DataFrame of pregnancy records\n", | |
| "\n", | |
| " returns: list of (rsquared, variable name) pairs\n", | |
| " \"\"\"\n", | |
| " df['boy'] = (df.babysex==1).astype(int)\n", | |
| " variables = []\n", | |
| " for name in df.columns:\n", | |
| " try:\n", | |
| " if df[name].var() < 1e-7:\n", | |
| " continue\n", | |
| "\n", | |
| " formula='boy ~ agepreg + ' + name\n", | |
| " model = smf.logit(formula, data=df)\n", | |
| " nobs = len(model.endog)\n", | |
| " if nobs < len(df)/2:\n", | |
| " continue\n", | |
| "\n", | |
| " results = model.fit()\n", | |
| " except:\n", | |
| " continue\n", | |
| "\n", | |
| " variables.append((results.prsquared, name))\n", | |
| "\n", | |
| " return variables\n", | |
| "\n", | |
| "variables = GoMining(join)" | |
| ] | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": [ | |
| "가장 높은 유사-$R^2$(pseudo-$R^2$) 값을 이끌어 내는 변수가 30개 있다." | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": 11, | |
| "metadata": { | |
| "collapsed": false | |
| }, | |
| "outputs": [ | |
| { | |
| "name": "stdout", | |
| "output_type": "stream", | |
| "text": [ | |
| "totalwgt_lb 0.00804663447667\n" | |
| ] | |
| } | |
| ], | |
| "source": [ | |
| "regression.MiningReport(variables)" | |
| ] | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": [ | |
| "임신기간 동안 알려지지 않은 변수와 다양한 사유로 수상한 냄새가 나는 변수를 제거하고 나서, 저자가 찾은 가장 최적의 모형이 다음에 나와있다:" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": 12, | |
| "metadata": { | |
| "collapsed": false | |
| }, | |
| "outputs": [ | |
| { | |
| "name": "stdout", | |
| "output_type": "stream", | |
| "text": [ | |
| "Optimization terminated successfully.\n", | |
| " Current function value: 0.691983\n", | |
| " Iterations 4\n" | |
| ] | |
| }, | |
| { | |
| "data": { | |
| "text/html": [ | |
| "<table class=\"simpletable\">\n", | |
| "<caption>Logit Regression Results</caption>\n", | |
| "<tr>\n", | |
| " <th>Dep. Variable:</th> <td>boy</td> <th> No. Observations: </th> <td> 9148</td> \n", | |
| "</tr>\n", | |
| "<tr>\n", | |
| " <th>Model:</th> <td>Logit</td> <th> Df Residuals: </th> <td> 9144</td> \n", | |
| "</tr>\n", | |
| "<tr>\n", | |
| " <th>Method:</th> <td>MLE</td> <th> Df Model: </th> <td> 3</td> \n", | |
| "</tr>\n", | |
| "<tr>\n", | |
| " <th>Date:</th> <td>Mon, 30 Nov 2015</td> <th> Pseudo R-squ.: </th> <td>0.001525</td> \n", | |
| "</tr>\n", | |
| "<tr>\n", | |
| " <th>Time:</th> <td>10:45:58</td> <th> Log-Likelihood: </th> <td> -6330.3</td> \n", | |
| "</tr>\n", | |
| "<tr>\n", | |
| " <th>converged:</th> <td>True</td> <th> LL-Null: </th> <td> -6339.9</td> \n", | |
| "</tr>\n", | |
| "<tr>\n", | |
| " <th> </th> <td> </td> <th> LLR p-value: </th> <td>0.0002326</td>\n", | |
| "</tr>\n", | |
| "</table>\n", | |
| "<table class=\"simpletable\">\n", | |
| "<tr>\n", | |
| " <td></td> <th>coef</th> <th>std err</th> <th>z</th> <th>P>|z|</th> <th>[95.0% Conf. Int.]</th> \n", | |
| "</tr>\n", | |
| "<tr>\n", | |
| " <th>Intercept</th> <td> -0.1863</td> <td> 0.116</td> <td> -1.606</td> <td> 0.108</td> <td> -0.414 0.041</td>\n", | |
| "</tr>\n", | |
| "<tr>\n", | |
| " <th>fmarout5 == 5[T.True]</th> <td> 0.1493</td> <td> 0.048</td> <td> 3.087</td> <td> 0.002</td> <td> 0.054 0.244</td>\n", | |
| "</tr>\n", | |
| "<tr>\n", | |
| " <th>infever == 1[T.True]</th> <td> 0.2096</td> <td> 0.063</td> <td> 3.304</td> <td> 0.001</td> <td> 0.085 0.334</td>\n", | |
| "</tr>\n", | |
| "<tr>\n", | |
| " <th>agepreg</th> <td> 0.0053</td> <td> 0.004</td> <td> 1.252</td> <td> 0.211</td> <td> -0.003 0.014</td>\n", | |
| "</tr>\n", | |
| "</table>" | |
| ], | |
| "text/plain": [ | |
| "<class 'statsmodels.iolib.summary.Summary'>\n", | |
| "\"\"\"\n", | |
| " Logit Regression Results \n", | |
| "==============================================================================\n", | |
| "Dep. Variable: boy No. Observations: 9148\n", | |
| "Model: Logit Df Residuals: 9144\n", | |
| "Method: MLE Df Model: 3\n", | |
| "Date: Mon, 30 Nov 2015 Pseudo R-squ.: 0.001525\n", | |
| "Time: 10:45:58 Log-Likelihood: -6330.3\n", | |
| "converged: True LL-Null: -6339.9\n", | |
| " LLR p-value: 0.0002326\n", | |
| "=========================================================================================\n", | |
| " coef std err z P>|z| [95.0% Conf. Int.]\n", | |
| "-----------------------------------------------------------------------------------------\n", | |
| "Intercept -0.1863 0.116 -1.606 0.108 -0.414 0.041\n", | |
| "fmarout5 == 5[T.True] 0.1493 0.048 3.087 0.002 0.054 0.244\n", | |
| "infever == 1[T.True] 0.2096 0.063 3.304 0.001 0.085 0.334\n", | |
| "agepreg 0.0053 0.004 1.252 0.211 -0.003 0.014\n", | |
| "=========================================================================================\n", | |
| "\"\"\"" | |
| ] | |
| }, | |
| "execution_count": 12, | |
| "metadata": {}, | |
| "output_type": "execute_result" | |
| } | |
| ], | |
| "source": [ | |
| "formula='boy ~ agepreg + fmarout5==5 + infever==1'\n", | |
| "model = smf.logit(formula, data=join)\n", | |
| "results = model.fit()\n", | |
| "results.summary() " | |
| ] | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": [ | |
| "응답자 자녀숫자를 예측하는 모형을 만들어보자. 연령에 대해 비선형모형을 사용했다. age3 항이 지나칠 수 있지만, 저자 생각에는 엉뚱한 선택은 아니라고 본다. 예측에 그다지 효과는 없다." | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": 13, | |
| "metadata": { | |
| "collapsed": false | |
| }, | |
| "outputs": [], | |
| "source": [ | |
| "join.numbabes.replace([97], np.nan, inplace=True)\n", | |
| "join['age2'] = join.age_r**2\n", | |
| "join['age3'] = join.age_r**3" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": 14, | |
| "metadata": { | |
| "collapsed": false | |
| }, | |
| "outputs": [ | |
| { | |
| "name": "stdout", | |
| "output_type": "stream", | |
| "text": [ | |
| "Optimization terminated successfully.\n", | |
| " Current function value: 1.677637\n", | |
| " Iterations 7\n" | |
| ] | |
| }, | |
| { | |
| "data": { | |
| "text/html": [ | |
| "<table class=\"simpletable\">\n", | |
| "<caption>Poisson Regression Results</caption>\n", | |
| "<tr>\n", | |
| " <th>Dep. Variable:</th> <td>numbabes</td> <th> No. Observations: </th> <td> 9148</td> \n", | |
| "</tr>\n", | |
| "<tr>\n", | |
| " <th>Model:</th> <td>Poisson</td> <th> Df Residuals: </th> <td> 9140</td> \n", | |
| "</tr>\n", | |
| "<tr>\n", | |
| " <th>Method:</th> <td>MLE</td> <th> Df Model: </th> <td> 7</td> \n", | |
| "</tr>\n", | |
| "<tr>\n", | |
| " <th>Date:</th> <td>Mon, 30 Nov 2015</td> <th> Pseudo R-squ.: </th> <td>0.03665</td> \n", | |
| "</tr>\n", | |
| "<tr>\n", | |
| " <th>Time:</th> <td>10:46:04</td> <th> Log-Likelihood: </th> <td> -15347.</td> \n", | |
| "</tr>\n", | |
| "<tr>\n", | |
| " <th>converged:</th> <td>True</td> <th> LL-Null: </th> <td> -15931.</td> \n", | |
| "</tr>\n", | |
| "<tr>\n", | |
| " <th> </th> <td> </td> <th> LLR p-value: </th> <td>6.721e-248</td>\n", | |
| "</tr>\n", | |
| "</table>\n", | |
| "<table class=\"simpletable\">\n", | |
| "<tr>\n", | |
| " <td></td> <th>coef</th> <th>std err</th> <th>z</th> <th>P>|z|</th> <th>[95.0% Conf. Int.]</th> \n", | |
| "</tr>\n", | |
| "<tr>\n", | |
| " <th>Intercept</th> <td> -3.2731</td> <td> 0.719</td> <td> -4.551</td> <td> 0.000</td> <td> -4.683 -1.863</td>\n", | |
| "</tr>\n", | |
| "<tr>\n", | |
| " <th>C(race)[T.2]</th> <td> -0.1403</td> <td> 0.014</td> <td> -9.683</td> <td> 0.000</td> <td> -0.169 -0.112</td>\n", | |
| "</tr>\n", | |
| "<tr>\n", | |
| " <th>C(race)[T.3]</th> <td> -0.0959</td> <td> 0.024</td> <td> -3.957</td> <td> 0.000</td> <td> -0.143 -0.048</td>\n", | |
| "</tr>\n", | |
| "<tr>\n", | |
| " <th>age_r</th> <td> 0.3744</td> <td> 0.069</td> <td> 5.433</td> <td> 0.000</td> <td> 0.239 0.510</td>\n", | |
| "</tr>\n", | |
| "<tr>\n", | |
| " <th>age2</th> <td> -0.0090</td> <td> 0.002</td> <td> -4.158</td> <td> 0.000</td> <td> -0.013 -0.005</td>\n", | |
| "</tr>\n", | |
| "<tr>\n", | |
| " <th>age3</th> <td> 7.12e-05</td> <td> 2.2e-05</td> <td> 3.237</td> <td> 0.001</td> <td> 2.81e-05 0.000</td>\n", | |
| "</tr>\n", | |
| "<tr>\n", | |
| " <th>totincr</th> <td> -0.0185</td> <td> 0.002</td> <td> -9.858</td> <td> 0.000</td> <td> -0.022 -0.015</td>\n", | |
| "</tr>\n", | |
| "<tr>\n", | |
| " <th>educat</th> <td> -0.0463</td> <td> 0.003</td> <td> -15.988</td> <td> 0.000</td> <td> -0.052 -0.041</td>\n", | |
| "</tr>\n", | |
| "</table>" | |
| ], | |
| "text/plain": [ | |
| "<class 'statsmodels.iolib.summary.Summary'>\n", | |
| "\"\"\"\n", | |
| " Poisson Regression Results \n", | |
| "==============================================================================\n", | |
| "Dep. Variable: numbabes No. Observations: 9148\n", | |
| "Model: Poisson Df Residuals: 9140\n", | |
| "Method: MLE Df Model: 7\n", | |
| "Date: Mon, 30 Nov 2015 Pseudo R-squ.: 0.03665\n", | |
| "Time: 10:46:04 Log-Likelihood: -15347.\n", | |
| "converged: True LL-Null: -15931.\n", | |
| " LLR p-value: 6.721e-248\n", | |
| "================================================================================\n", | |
| " coef std err z P>|z| [95.0% Conf. Int.]\n", | |
| "--------------------------------------------------------------------------------\n", | |
| "Intercept -3.2731 0.719 -4.551 0.000 -4.683 -1.863\n", | |
| "C(race)[T.2] -0.1403 0.014 -9.683 0.000 -0.169 -0.112\n", | |
| "C(race)[T.3] -0.0959 0.024 -3.957 0.000 -0.143 -0.048\n", | |
| "age_r 0.3744 0.069 5.433 0.000 0.239 0.510\n", | |
| "age2 -0.0090 0.002 -4.158 0.000 -0.013 -0.005\n", | |
| "age3 7.12e-05 2.2e-05 3.237 0.001 2.81e-05 0.000\n", | |
| "totincr -0.0185 0.002 -9.858 0.000 -0.022 -0.015\n", | |
| "educat -0.0463 0.003 -15.988 0.000 -0.052 -0.041\n", | |
| "================================================================================\n", | |
| "\"\"\"" | |
| ] | |
| }, | |
| "execution_count": 14, | |
| "metadata": {}, | |
| "output_type": "execute_result" | |
| } | |
| ], | |
| "source": [ | |
| "formula='numbabes ~ age_r + age2 + age3 + C(race) + totincr + educat'\n", | |
| "model = smf.poisson(formula, data=join)\n", | |
| "results = model.fit()\n", | |
| "results.summary() " | |
| ] | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": [ | |
| "이제, 35세, 흑인, 대학졸업하고 연간 가구소등이 $75,000을 넘는 여성에 대한 자녀수를 예측할 수 있다. " | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": 15, | |
| "metadata": { | |
| "collapsed": false | |
| }, | |
| "outputs": [ | |
| { | |
| "data": { | |
| "text/plain": [ | |
| "array([ 2.47393019])" | |
| ] | |
| }, | |
| "execution_count": 15, | |
| "metadata": {}, | |
| "output_type": "execute_result" | |
| } | |
| ], | |
| "source": [ | |
| "import pandas\n", | |
| "columns = ['age_r', 'age2', 'age3', 'race', 'totincr', 'educat']\n", | |
| "new = pandas.DataFrame([[35, 35**2, 35**3, 1, 14, 16]], columns=columns)\n", | |
| "results.predict(new)" | |
| ] | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": [ | |
| "혼인상태를 예측하기 위하는데, 저자가 찾은 가장 최적의 모형이 다음에 나와 있다:" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": 17, | |
| "metadata": { | |
| "collapsed": false | |
| }, | |
| "outputs": [ | |
| { | |
| "name": "stdout", | |
| "output_type": "stream", | |
| "text": [ | |
| "Optimization terminated successfully.\n", | |
| " Current function value: 1.092083\n", | |
| " Iterations 8\n" | |
| ] | |
| }, | |
| { | |
| "data": { | |
| "text/html": [ | |
| "<table class=\"simpletable\">\n", | |
| "<caption>MNLogit Regression Results</caption>\n", | |
| "<tr>\n", | |
| " <th>Dep. Variable:</th> <td>rmarital</td> <th> No. Observations: </th> <td> 9148</td> \n", | |
| "</tr>\n", | |
| "<tr>\n", | |
| " <th>Model:</th> <td>MNLogit</td> <th> Df Residuals: </th> <td> 9113</td> \n", | |
| "</tr>\n", | |
| "<tr>\n", | |
| " <th>Method:</th> <td>MLE</td> <th> Df Model: </th> <td> 30</td> \n", | |
| "</tr>\n", | |
| "<tr>\n", | |
| " <th>Date:</th> <td>Mon, 30 Nov 2015</td> <th> Pseudo R-squ.: </th> <td>0.1661</td> \n", | |
| "</tr>\n", | |
| "<tr>\n", | |
| " <th>Time:</th> <td>10:46:25</td> <th> Log-Likelihood: </th> <td> -9990.4</td>\n", | |
| "</tr>\n", | |
| "<tr>\n", | |
| " <th>converged:</th> <td>True</td> <th> LL-Null: </th> <td> -11981.</td>\n", | |
| "</tr>\n", | |
| "<tr>\n", | |
| " <th> </th> <td> </td> <th> LLR p-value: </th> <td> 0.000</td> \n", | |
| "</tr>\n", | |
| "</table>\n", | |
| "<table class=\"simpletable\">\n", | |
| "<tr>\n", | |
| " <th>rmarital=2</th> <th>coef</th> <th>std err</th> <th>z</th> <th>P>|z|</th> <th>[95.0% Conf. Int.]</th> \n", | |
| "</tr>\n", | |
| "<tr>\n", | |
| " <th>Intercept</th> <td> 8.9153</td> <td> 0.792</td> <td> 11.251</td> <td> 0.000</td> <td> 7.362 10.468</td>\n", | |
| "</tr>\n", | |
| "<tr>\n", | |
| " <th>C(race)[T.2]</th> <td> -0.9260</td> <td> 0.087</td> <td> -10.705</td> <td> 0.000</td> <td> -1.096 -0.756</td>\n", | |
| "</tr>\n", | |
| "<tr>\n", | |
| " <th>C(race)[T.3]</th> <td> -0.6335</td> <td> 0.133</td> <td> -4.747</td> <td> 0.000</td> <td> -0.895 -0.372</td>\n", | |
| "</tr>\n", | |
| "<tr>\n", | |
| " <th>age_r</th> <td> -0.3567</td> <td> 0.050</td> <td> -7.132</td> <td> 0.000</td> <td> -0.455 -0.259</td>\n", | |
| "</tr>\n", | |
| "<tr>\n", | |
| " <th>age2</th> <td> 0.0047</td> <td> 0.001</td> <td> 6.054</td> <td> 0.000</td> <td> 0.003 0.006</td>\n", | |
| "</tr>\n", | |
| "<tr>\n", | |
| " <th>totincr</th> <td> -0.1301</td> <td> 0.011</td> <td> -11.475</td> <td> 0.000</td> <td> -0.152 -0.108</td>\n", | |
| "</tr>\n", | |
| "<tr>\n", | |
| " <th>educat</th> <td> -0.1940</td> <td> 0.018</td> <td> -10.534</td> <td> 0.000</td> <td> -0.230 -0.158</td>\n", | |
| "</tr>\n", | |
| "<tr>\n", | |
| " <th>rmarital=3</th> <th>coef</th> <th>std err</th> <th>z</th> <th>P>|z|</th> <th>[95.0% Conf. Int.]</th> \n", | |
| "</tr>\n", | |
| "<tr>\n", | |
| " <th>Intercept</th> <td> 2.9927</td> <td> 2.970</td> <td> 1.007</td> <td> 0.314</td> <td> -2.829 8.815</td>\n", | |
| "</tr>\n", | |
| "<tr>\n", | |
| " <th>C(race)[T.2]</th> <td> -0.3963</td> <td> 0.235</td> <td> -1.685</td> <td> 0.092</td> <td> -0.857 0.065</td>\n", | |
| "</tr>\n", | |
| "<tr>\n", | |
| " <th>C(race)[T.3]</th> <td> 0.0650</td> <td> 0.336</td> <td> 0.194</td> <td> 0.846</td> <td> -0.593 0.723</td>\n", | |
| "</tr>\n", | |
| "<tr>\n", | |
| " <th>age_r</th> <td> -0.3141</td> <td> 0.174</td> <td> -1.806</td> <td> 0.071</td> <td> -0.655 0.027</td>\n", | |
| "</tr>\n", | |
| "<tr>\n", | |
| " <th>age2</th> <td> 0.0064</td> <td> 0.003</td> <td> 2.532</td> <td> 0.011</td> <td> 0.001 0.011</td>\n", | |
| "</tr>\n", | |
| "<tr>\n", | |
| " <th>totincr</th> <td> -0.3217</td> <td> 0.032</td> <td> -10.135</td> <td> 0.000</td> <td> -0.384 -0.259</td>\n", | |
| "</tr>\n", | |
| "<tr>\n", | |
| " <th>educat</th> <td> -0.1093</td> <td> 0.048</td> <td> -2.266</td> <td> 0.023</td> <td> -0.204 -0.015</td>\n", | |
| "</tr>\n", | |
| "<tr>\n", | |
| " <th>rmarital=4</th> <th>coef</th> <th>std err</th> <th>z</th> <th>P>|z|</th> <th>[95.0% Conf. Int.]</th> \n", | |
| "</tr>\n", | |
| "<tr>\n", | |
| " <th>Intercept</th> <td> -3.6475</td> <td> 1.193</td> <td> -3.059</td> <td> 0.002</td> <td> -5.985 -1.310</td>\n", | |
| "</tr>\n", | |
| "<tr>\n", | |
| " <th>C(race)[T.2]</th> <td> -0.3303</td> <td> 0.091</td> <td> -3.641</td> <td> 0.000</td> <td> -0.508 -0.153</td>\n", | |
| "</tr>\n", | |
| "<tr>\n", | |
| " <th>C(race)[T.3]</th> <td> -0.8227</td> <td> 0.170</td> <td> -4.853</td> <td> 0.000</td> <td> -1.155 -0.490</td>\n", | |
| "</tr>\n", | |
| "<tr>\n", | |
| " <th>age_r</th> <td> 0.1238</td> <td> 0.070</td> <td> 1.763</td> <td> 0.078</td> <td> -0.014 0.261</td>\n", | |
| "</tr>\n", | |
| "<tr>\n", | |
| " <th>age2</th> <td> -0.0008</td> <td> 0.001</td> <td> -0.814</td> <td> 0.416</td> <td> -0.003 0.001</td>\n", | |
| "</tr>\n", | |
| "<tr>\n", | |
| " <th>totincr</th> <td> -0.2288</td> <td> 0.011</td> <td> -20.058</td> <td> 0.000</td> <td> -0.251 -0.206</td>\n", | |
| "</tr>\n", | |
| "<tr>\n", | |
| " <th>educat</th> <td> 0.0661</td> <td> 0.016</td> <td> 4.015</td> <td> 0.000</td> <td> 0.034 0.098</td>\n", | |
| "</tr>\n", | |
| "<tr>\n", | |
| " <th>rmarital=5</th> <th>coef</th> <th>std err</th> <th>z</th> <th>P>|z|</th> <th>[95.0% Conf. Int.]</th> \n", | |
| "</tr>\n", | |
| "<tr>\n", | |
| " <th>Intercept</th> <td> -2.3978</td> <td> 1.269</td> <td> -1.889</td> <td> 0.059</td> <td> -4.886 0.090</td>\n", | |
| "</tr>\n", | |
| "<tr>\n", | |
| " <th>C(race)[T.2]</th> <td> -1.0493</td> <td> 0.101</td> <td> -10.366</td> <td> 0.000</td> <td> -1.248 -0.851</td>\n", | |
| "</tr>\n", | |
| "<tr>\n", | |
| " <th>C(race)[T.3]</th> <td> -0.6065</td> <td> 0.154</td> <td> -3.937</td> <td> 0.000</td> <td> -0.908 -0.305</td>\n", | |
| "</tr>\n", | |
| "<tr>\n", | |
| " <th>age_r</th> <td> 0.2084</td> <td> 0.077</td> <td> 2.699</td> <td> 0.007</td> <td> 0.057 0.360</td>\n", | |
| "</tr>\n", | |
| "<tr>\n", | |
| " <th>age2</th> <td> -0.0030</td> <td> 0.001</td> <td> -2.619</td> <td> 0.009</td> <td> -0.005 -0.001</td>\n", | |
| "</tr>\n", | |
| "<tr>\n", | |
| " <th>totincr</th> <td> -0.2900</td> <td> 0.014</td> <td> -20.314</td> <td> 0.000</td> <td> -0.318 -0.262</td>\n", | |
| "</tr>\n", | |
| "<tr>\n", | |
| " <th>educat</th> <td> -0.0176</td> <td> 0.021</td> <td> -0.835</td> <td> 0.404</td> <td> -0.059 0.024</td>\n", | |
| "</tr>\n", | |
| "<tr>\n", | |
| " <th>rmarital=6</th> <th>coef</th> <th>std err</th> <th>z</th> <th>P>|z|</th> <th>[95.0% Conf. Int.]</th> \n", | |
| "</tr>\n", | |
| "<tr>\n", | |
| " <th>Intercept</th> <td> 8.1722</td> <td> 0.797</td> <td> 10.252</td> <td> 0.000</td> <td> 6.610 9.734</td>\n", | |
| "</tr>\n", | |
| "<tr>\n", | |
| " <th>C(race)[T.2]</th> <td> -2.1628</td> <td> 0.079</td> <td> -27.532</td> <td> 0.000</td> <td> -2.317 -2.009</td>\n", | |
| "</tr>\n", | |
| "<tr>\n", | |
| " <th>C(race)[T.3]</th> <td> -1.9701</td> <td> 0.136</td> <td> -14.472</td> <td> 0.000</td> <td> -2.237 -1.703</td>\n", | |
| "</tr>\n", | |
| "<tr>\n", | |
| " <th>age_r</th> <td> -0.2188</td> <td> 0.050</td> <td> -4.337</td> <td> 0.000</td> <td> -0.318 -0.120</td>\n", | |
| "</tr>\n", | |
| "<tr>\n", | |
| " <th>age2</th> <td> 0.0020</td> <td> 0.001</td> <td> 2.505</td> <td> 0.012</td> <td> 0.000 0.004</td>\n", | |
| "</tr>\n", | |
| "<tr>\n", | |
| " <th>totincr</th> <td> -0.2888</td> <td> 0.011</td> <td> -25.418</td> <td> 0.000</td> <td> -0.311 -0.267</td>\n", | |
| "</tr>\n", | |
| "<tr>\n", | |
| " <th>educat</th> <td> -0.0802</td> <td> 0.018</td> <td> -4.582</td> <td> 0.000</td> <td> -0.115 -0.046</td>\n", | |
| "</tr>\n", | |
| "</table>" | |
| ], | |
| "text/plain": [ | |
| "<class 'statsmodels.iolib.summary.Summary'>\n", | |
| "\"\"\"\n", | |
| " MNLogit Regression Results \n", | |
| "==============================================================================\n", | |
| "Dep. Variable: rmarital No. Observations: 9148\n", | |
| "Model: MNLogit Df Residuals: 9113\n", | |
| "Method: MLE Df Model: 30\n", | |
| "Date: Mon, 30 Nov 2015 Pseudo R-squ.: 0.1661\n", | |
| "Time: 10:46:25 Log-Likelihood: -9990.4\n", | |
| "converged: True LL-Null: -11981.\n", | |
| " LLR p-value: 0.000\n", | |
| "================================================================================\n", | |
| " rmarital=2 coef std err z P>|z| [95.0% Conf. Int.]\n", | |
| "--------------------------------------------------------------------------------\n", | |
| "Intercept 8.9153 0.792 11.251 0.000 7.362 10.468\n", | |
| "C(race)[T.2] -0.9260 0.087 -10.705 0.000 -1.096 -0.756\n", | |
| "C(race)[T.3] -0.6335 0.133 -4.747 0.000 -0.895 -0.372\n", | |
| "age_r -0.3567 0.050 -7.132 0.000 -0.455 -0.259\n", | |
| "age2 0.0047 0.001 6.054 0.000 0.003 0.006\n", | |
| "totincr -0.1301 0.011 -11.475 0.000 -0.152 -0.108\n", | |
| "educat -0.1940 0.018 -10.534 0.000 -0.230 -0.158\n", | |
| "--------------------------------------------------------------------------------\n", | |
| " rmarital=3 coef std err z P>|z| [95.0% Conf. Int.]\n", | |
| "--------------------------------------------------------------------------------\n", | |
| "Intercept 2.9927 2.970 1.007 0.314 -2.829 8.815\n", | |
| "C(race)[T.2] -0.3963 0.235 -1.685 0.092 -0.857 0.065\n", | |
| "C(race)[T.3] 0.0650 0.336 0.194 0.846 -0.593 0.723\n", | |
| "age_r -0.3141 0.174 -1.806 0.071 -0.655 0.027\n", | |
| "age2 0.0064 0.003 2.532 0.011 0.001 0.011\n", | |
| "totincr -0.3217 0.032 -10.135 0.000 -0.384 -0.259\n", | |
| "educat -0.1093 0.048 -2.266 0.023 -0.204 -0.015\n", | |
| "--------------------------------------------------------------------------------\n", | |
| " rmarital=4 coef std err z P>|z| [95.0% Conf. Int.]\n", | |
| "--------------------------------------------------------------------------------\n", | |
| "Intercept -3.6475 1.193 -3.059 0.002 -5.985 -1.310\n", | |
| "C(race)[T.2] -0.3303 0.091 -3.641 0.000 -0.508 -0.153\n", | |
| "C(race)[T.3] -0.8227 0.170 -4.853 0.000 -1.155 -0.490\n", | |
| "age_r 0.1238 0.070 1.763 0.078 -0.014 0.261\n", | |
| "age2 -0.0008 0.001 -0.814 0.416 -0.003 0.001\n", | |
| "totincr -0.2288 0.011 -20.058 0.000 -0.251 -0.206\n", | |
| "educat 0.0661 0.016 4.015 0.000 0.034 0.098\n", | |
| "--------------------------------------------------------------------------------\n", | |
| " rmarital=5 coef std err z P>|z| [95.0% Conf. Int.]\n", | |
| "--------------------------------------------------------------------------------\n", | |
| "Intercept -2.3978 1.269 -1.889 0.059 -4.886 0.090\n", | |
| "C(race)[T.2] -1.0493 0.101 -10.366 0.000 -1.248 -0.851\n", | |
| "C(race)[T.3] -0.6065 0.154 -3.937 0.000 -0.908 -0.305\n", | |
| "age_r 0.2084 0.077 2.699 0.007 0.057 0.360\n", | |
| "age2 -0.0030 0.001 -2.619 0.009 -0.005 -0.001\n", | |
| "totincr -0.2900 0.014 -20.314 0.000 -0.318 -0.262\n", | |
| "educat -0.0176 0.021 -0.835 0.404 -0.059 0.024\n", | |
| "--------------------------------------------------------------------------------\n", | |
| " rmarital=6 coef std err z P>|z| [95.0% Conf. Int.]\n", | |
| "--------------------------------------------------------------------------------\n", | |
| "Intercept 8.1722 0.797 10.252 0.000 6.610 9.734\n", | |
| "C(race)[T.2] -2.1628 0.079 -27.532 0.000 -2.317 -2.009\n", | |
| "C(race)[T.3] -1.9701 0.136 -14.472 0.000 -2.237 -1.703\n", | |
| "age_r -0.2188 0.050 -4.337 0.000 -0.318 -0.120\n", | |
| "age2 0.0020 0.001 2.505 0.012 0.000 0.004\n", | |
| "totincr -0.2888 0.011 -25.418 0.000 -0.311 -0.267\n", | |
| "educat -0.0802 0.018 -4.582 0.000 -0.115 -0.046\n", | |
| "================================================================================\n", | |
| "\"\"\"" | |
| ] | |
| }, | |
| "execution_count": 17, | |
| "metadata": {}, | |
| "output_type": "execute_result" | |
| } | |
| ], | |
| "source": [ | |
| "formula='rmarital ~ age_r + age2 + C(race) + totincr + educat'\n", | |
| "model = smf.mnlogit(formula, data=join)\n", | |
| "results = model.fit()\n", | |
| "results.summary() " | |
| ] | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": [ | |
| "25세, 백인, 고등학교 졸업, 연간 가구소득이 $45,000인 여성에 대한 예측이 다음에 나와있다." | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": 18, | |
| "metadata": { | |
| "collapsed": false | |
| }, | |
| "outputs": [ | |
| { | |
| "data": { | |
| "text/plain": [ | |
| "array([[ 0.74568869, 0.12829132, 0.0016241 , 0.03280099, 0.02188668,\n", | |
| " 0.06970823]])" | |
| ] | |
| }, | |
| "execution_count": 18, | |
| "metadata": {}, | |
| "output_type": "execute_result" | |
| } | |
| ], | |
| "source": [ | |
| "columns = ['age_r', 'age2', 'race', 'totincr', 'educat']\n", | |
| "new = pandas.DataFrame([[25, 25**2, 2, 11, 12]], columns=columns)\n", | |
| "results.predict(new)" | |
| ] | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": [ | |
| "그래서, 이 분의 경우 현재 결혼가능성이 75%이고, \"결혼하지 않고 다른 성과 동거\" 가능성이 13% 등이다." | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": null, | |
| "metadata": { | |
| "collapsed": false | |
| }, | |
| "outputs": [], | |
| "source": [] | |
| } | |
| ], | |
| "metadata": { | |
| "kernelspec": { | |
| "display_name": "Python 2", | |
| "language": "python", | |
| "name": "python2" | |
| }, | |
| "language_info": { | |
| "codemirror_mode": { | |
| "name": "ipython", | |
| "version": 2 | |
| }, | |
| "file_extension": ".py", | |
| "mimetype": "text/x-python", | |
| "name": "python", | |
| "nbconvert_exporter": "python", | |
| "pygments_lexer": "ipython2", | |
| "version": "2.7.10" | |
| } | |
| }, | |
| "nbformat": 4, | |
| "nbformat_minor": 0 | |
| } |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment