@ClebsonDantasUchoa
Created October 13, 2018 21:17
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Solution to the problem: https://www.codenation.com.br/journey/data-science/challenge/enem-4.html\n",
"\n",
"## Definition:\n",
"In this challenge, you must identify which students are taking the exam only as practice.\n",
"\n",
"Some students choose to take the ENEM exam early, as a trial run (column IN_TREINEIRO). In this challenge, you must build a binary classification model to infer that column. Each value in your answer must be “0” or “1”.\n",
"\n",
"Save your answer in a file named answer.csv with two columns: NU_INSCRICAO and IN_TREINEIRO.\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Importing the libraries"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"%matplotlib inline\n",
"import pandas as pd\n",
"import numpy as np\n",
"from sklearn import linear_model\n",
"from sklearn import metrics\n",
"import matplotlib.pyplot as plt\n",
"from sklearn import tree\n",
"from sklearn import svm\n",
"from sklearn import neighbors\n",
"from sklearn.ensemble import GradientBoostingRegressor\n",
"from sklearn import model_selection"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Reading the training file"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
"train = pd.read_csv('train.csv')"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [],
"source": [
"columns=[\n",
" 'NU_NOTA_CN','NU_NOTA_CH','NU_NOTA_LC','NU_NOTA_REDACAO','TP_ST_CONCLUSAO','IN_TREINEIRO'\n",
"]"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [],
"source": [
"train = train[columns]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Viewing the data after filtering"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>NU_NOTA_CN</th>\n",
" <th>NU_NOTA_CH</th>\n",
" <th>NU_NOTA_LC</th>\n",
" <th>NU_NOTA_REDACAO</th>\n",
" <th>TP_ST_CONCLUSAO</th>\n",
" <th>IN_TREINEIRO</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>436.3</td>\n",
" <td>495.4</td>\n",
" <td>581.2</td>\n",
" <td>520.0</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>474.5</td>\n",
" <td>544.1</td>\n",
" <td>599.0</td>\n",
" <td>580.0</td>\n",
" <td>2</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>3</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" NU_NOTA_CN NU_NOTA_CH NU_NOTA_LC NU_NOTA_REDACAO TP_ST_CONCLUSAO \\\n",
"0 436.3 495.4 581.2 520.0 1 \n",
"1 474.5 544.1 599.0 580.0 2 \n",
"2 NaN NaN NaN NaN 3 \n",
"3 NaN NaN NaN NaN 1 \n",
"4 NaN NaN NaN NaN 1 \n",
"\n",
" IN_TREINEIRO \n",
"0 0 \n",
"1 0 \n",
"2 0 \n",
"3 0 \n",
"4 0 "
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"train.head()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Handling missing data"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"NU_NOTA_CN 3389\n",
"NU_NOTA_CH 3389\n",
"NU_NOTA_LC 3597\n",
"NU_NOTA_REDACAO 3597\n",
"TP_ST_CONCLUSAO 0\n",
"IN_TREINEIRO 0\n",
"dtype: int64\n"
]
}
],
"source": [
"print(train.isnull().sum())"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [],
"source": [
"# Replace missing scores with 0, matching how the test set is handled later.\n",
"train = train.fillna(0)"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"NU_NOTA_CN 0\n",
"NU_NOTA_CH 0\n",
"NU_NOTA_LC 0\n",
"NU_NOTA_REDACAO 0\n",
"TP_ST_CONCLUSAO 0\n",
"IN_TREINEIRO 0\n",
"dtype: int64\n"
]
}
],
"source": [
"print(train.isnull().sum())"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Splitting the training data into training and test sets"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [],
"source": [
"X = train.values[:, :-1]\n",
"y = train.values[:, -1]"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [],
"source": [
"X_train, X_test, y_train, y_test = model_selection.train_test_split(X, y, \n",
" test_size=0.3, random_state=1)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Training and evaluating the models"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## DecisionTree"
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" precision recall f1-score support\n",
"\n",
" 0.0 0.98 0.96 0.97 3588\n",
" 1.0 0.78 0.86 0.82 531\n",
"\n",
"avg / total 0.95 0.95 0.95 4119\n",
"\n",
"accuracy: \n",
"0.9502306385044914\n"
]
}
],
"source": [
"dt = tree.DecisionTreeClassifier()\n",
"dt.fit(X_train, y_train)\n",
"resposta_dt = dt.predict(X_test)\n",
"print(metrics.classification_report(y_test, resposta_dt))\n",
"accuracy_dt = metrics.accuracy_score(y_test, resposta_dt)\n",
"print('accuracy: ')\n",
"print(accuracy_dt)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## LogisticRegression"
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" precision recall f1-score support\n",
"\n",
" 0.0 0.89 0.95 0.91 3588\n",
" 1.0 0.33 0.18 0.23 531\n",
"\n",
"avg / total 0.81 0.85 0.83 4119\n",
"\n",
"accuracy: \n",
"0.8468074775430929\n"
]
}
],
"source": [
"# Logistic regression with scikit-learn's default settings.\n",
"lr = linear_model.LogisticRegression()\n",
"lr.fit(X_train, y_train)\n",
"resposta_lr = lr.predict(X_test)\n",
"print(metrics.classification_report(y_test, resposta_lr))\n",
"accuracy_lr = metrics.accuracy_score(y_test, resposta_lr)\n",
"print('accuracy: ')\n",
"print(accuracy_lr)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## SVC"
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" precision recall f1-score support\n",
"\n",
" 0.0 0.89 0.99 0.93 3588\n",
" 1.0 0.61 0.14 0.23 531\n",
"\n",
"avg / total 0.85 0.88 0.84 4119\n",
"\n",
"accuracy: \n",
"0.8778829813061423\n"
]
}
],
"source": [
"# Support vector classifier with scikit-learn's default kernel and parameters.\n",
"svc = svm.SVC()\n",
"svc.fit(X_train, y_train)\n",
"resposta_svc = svc.predict(X_test)\n",
"print(metrics.classification_report(y_test, resposta_svc))\n",
"accuracy_svc = metrics.accuracy_score(y_test, resposta_svc)\n",
"print('accuracy: ')\n",
"print(accuracy_svc)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## KNN"
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" precision recall f1-score support\n",
"\n",
" 0.0 0.89 0.97 0.93 3588\n",
" 1.0 0.46 0.17 0.25 531\n",
"\n",
"avg / total 0.83 0.87 0.84 4119\n",
"\n",
"accuracy: \n",
"0.8676863316338917\n"
]
}
],
"source": [
"# k-nearest neighbors with scikit-learn's default number of neighbors.\n",
"knn = neighbors.KNeighborsClassifier()\n",
"knn.fit(X_train, y_train)\n",
"resposta_knn = knn.predict(X_test)\n",
"print(metrics.classification_report(y_test, resposta_knn))\n",
"accuracy_knn = metrics.accuracy_score(y_test, resposta_knn)\n",
"print('accuracy: ')\n",
"print(accuracy_knn)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## GradientBoostingClassifier"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# IN_TREINEIRO is a binary label, so use the classifier variant of gradient boosting.\n",
"from sklearn.ensemble import GradientBoostingClassifier\n",
"\n",
"modeloGBR = GradientBoostingClassifier()\n",
"modeloGBR.fit(X_train, y_train)\n",
"# Predict with the gradient boosting model itself.\n",
"resposta_gbr = modeloGBR.predict(X_test)\n",
"print(metrics.classification_report(y_test, resposta_gbr))\n",
"accuracy_gbr = metrics.accuracy_score(y_test, resposta_gbr)\n",
"print('accuracy: ')\n",
"print(accuracy_gbr)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Comparing the models"
]
},
{
"cell_type": "code",
"execution_count": 16,
"metadata": {},
"outputs": [],
"source": [
"comparacao = pd.DataFrame(data=[[\n",
" accuracy_dt, accuracy_lr, accuracy_svc, accuracy_knn, accuracy_gbr]],\n",
" columns=['DT', 'LR', 'SVC', 'KNN', 'GBR'])"
]
},
{
"cell_type": "code",
"execution_count": 17,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>DT</th>\n",
" <th>LR</th>\n",
" <th>SVC</th>\n",
" <th>KNN</th>\n",
" <th>GBR</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>0.950231</td>\n",
" <td>0.846807</td>\n",
" <td>0.877883</td>\n",
" <td>0.867686</td>\n",
" <td>0.867686</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" DT LR SVC KNN GBR\n",
"0 0.950231 0.846807 0.877883 0.867686 0.867686"
]
},
"execution_count": 17,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"comparacao.head()"
]
},
{
"cell_type": "code",
"execution_count": 18,
"metadata": {},
"outputs": [],
"source": [
"comparacao = comparacao.transpose()"
]
},
{
"cell_type": "code",
"execution_count": 19,
"metadata": {},
"outputs": [
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAXcAAAEICAYAAACktLTqAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDIuMi4yLCBodHRwOi8vbWF0cGxvdGxpYi5vcmcvhp/UCwAAE7ZJREFUeJzt3X+QXfV53/H3xwghghXsSli1WRzBGHeCTRzMAu6Qxqvg1IK2IlNTW5TiEIcoM7XiH9AfysSDU9J23HiI28Qktdq6dtyGDUkmtlrLhjYxSdqGGKgdA2IoG8BmRYux7FBKUITkp3/cq+xlvdLe1b3au/re92tGM3vO+d5zH55ZPnvu9/y4qSokSW15yagLkCQNn+EuSQ0y3CWpQYa7JDXIcJekBhnuktSgRcM9yceTfD3JA0fYniS/mGQmyVeSvHH4ZUqSlqKfI/dPAJuPsv1y4Nzuv23ArwxeliRpEIuGe1X9PvDNowy5EvjV6rgbeFmSVw6rQEnS0g1jzv1M4Ime5dnuuu+QZFuSe7v/tg3hvSVJC1g1hH1kgXULPtOgqnYCOwHWr19fk5OTHxvC+w/kueee47TTTht1GSuCveiwD3PsxZyV0ov77rvvG1V1xmLjhhHus8BZPcsTwJOLvWjjxo3ce++9Q3j7wdx1111MTU2NuowVwV502Ic59mLOSulFkq/2M24Y0zK7gHd2r5p5E/BMVf3vIexXknSMFj1yT3IbMAWsTzILfBA4GaCq/jWwG7gCmAH+DPix41WsJKk/i4Z7VV29yPYC3j20iiRJAxvGnLsknVBeeOEFZmdn2b9/f9+vOf3003nooYeOY1UvtmbNGiYmJjj55JOP6fWGu6SxMzs7y9q1a9m4cSPJQhf8fadnn32WtWvXHufKOqqKffv2MTs7y9lnn31M+/DZMpLGzv79+1m3bl3fwb7ckrBu3bolfbKYz3CXNJZWarAfNmh9hrskNeiEnnPfuOOzA+/jxvMPct0A+3n8Q39j4BokjdYwsqRXP7nw+c9/nve+970cOnSI66+/nh07dgy1Bo/cJWmZHTp0iHe/+9187nOfY8+ePdx2223s2bNnqO9huEvSMvviF7/Ia17zGs455xxWr17N1q1b+cxnPjPU9zDcJWmZ7d27l7POmnsk18TEBHv37h3qexjukrTMOjf2v9iwr94x3CVpmU1MTPDEE3NfgzE7O8urXvWqob6H4S5Jy+yiiy7ikUce4bHHHuPAgQNMT0+zZcuWob7HCX0ppCQNQz+XLg7z8QOrVq3iox/9KG9961s5dOgQ73rXu3jd6143lH3/xXsMdW+SpL5cccUVXHHFFcdt/07LSFKDDHdJapDhLmksLXQ54koyaH2Gu6Sxs2bNGvbt27diA/7w89zXrFlzzPvwhKqksTMxMcHs7CxPP/1036/Zv3//QGG7VIe/ielYGe6Sxs7JJ5+85G84uuuuu7jggguOU0XD57SMJDXII3c1Z9Bncw/6jH/wOf8aPY/cJalBHrlLDfNTzJxx64VH7pLUIMNdkhpkuEtSgwx3SWqQJ1QbMejJIhj8hNFKOXEmySN3SWqS4S5JDTLcJalBhrskNchwl6QGGe6S1KC+wj3J5iQPJ5lJsmOB7a9O8oUkX0rylSTH7yu9JUmLWjTck5wE3ApcDpwHXJ3kvHnDPgDcXlUXAFuBXx52oZKk/vVz5H4xMFNVj1bVAWAauHLemAK+u/vz6cCTwytRkrRUWewLYpNcBWyuquu7y9cCl1TV9p4xrwTuBF4OnAa8paruW2Bf24BtABs2bLhwenp6oOLv3/vMQK8H2HAqPPX8sb/+/DNPH7iGYbAXcwbtxaB9AHvRy17MGUYvNm3adF9VTS42rp/HD2SBdfP/IlwNfKKqbknyV4FPJXl9VX37RS+q2gnsBJicnKypqak+3v7IBn22MnRuub/l/mN/CsPj10wNXMMw2Is5g/
Zi0D6AvehlL+YsZy/6mZaZBc7qWZ7gO6ddfhy4HaCq/hBYA6wfRoGSpKXrJ9zvAc5NcnaS1XROmO6aN+ZrwGUASb6XTrg/PcxCJUn9WzTcq+ogsB24A3iIzlUxDya5OcmW7rAbgZ9I8sfAbcB1tdhkviTpuOlrAqmqdgO75627qefnPcClwy1NknSsvENVkhpkuEtSgwx3SWqQ4S5JDTLcJalBhrskNchwl6QGGe6S1CDDXZIaZLhLUoMMd0lqkOEuSQ0y3CWpQYa7JDXIcJekBhnuktQgw12SGmS4S1KDDHdJapDhLkkNMtwlqUGGuyQ1yHCXpAYZ7pLUIMNdkhpkuEtSgwx3SWqQ4S5JDTLcJalBhrskNchwl6QGGe6S1CDDXZIaZLhLUoP6Cvckm5M8nGQmyY4jjHl7kj1JHkzya8MtU5K0FKsWG5DkJOBW4IeBWeCeJLuqak/PmHOBnwYurapvJXnF8SpYkrS4fo7cLwZmqurRqjoATANXzhvzE8CtVfUtgKr6+nDLlCQtRT/hfibwRM/ybHddr9cCr03y35PcnWTzsAqUJC1dquroA5K/A7y1qq7vLl8LXFxVP9Uz5j8DLwBvByaAPwBeX1V/Om9f24BtABs2bLhwenp6oOLv3/vMQK8H2HAqPPX8sb/+/DNPH7iGYbAXcwbtxaB9AHvRy17MGUYvNm3adF9VTS42btE5dzpH6mf1LE8ATy4w5u6qegF4LMnDwLnAPb2DqmonsBNgcnKypqam+nj7I7tux2cHej3Ajecf5Jb7+2nDwh6/ZmrgGobBXswZtBeD9gHsRS97MWc5e9HPtMw9wLlJzk6yGtgK7Jo35tPAJoAk6+lM0zw6zEIlSf1bNNyr6iCwHbgDeAi4vaoeTHJzki3dYXcA+5LsAb4A/MOq2ne8ipYkHV1fnzGqajewe966m3p+LuCG7j9J0oh5h6okNchwl6QGGe6S1CDDXZIaZLhLUoMMd0lqkOEuSQ0y3CWpQYa7JDXIcJekBhnuktQgw12SGmS4S1KDDHdJapDhLkkNMtwlqUGGuyQ1yHCXpAYZ7pLUIMNdkhpkuEtSgwx3SWqQ4S5JDTLcJalBhrskNchwl6QGGe6S1CDDXZIaZLhLUoMMd0lqkOEuSQ0y3CWpQYa7JDXIcJekBhnuktSgvsI9yeYkDyeZSbLjKOOuSlJJJodXoiRpqRYN9yQnAbcClwPnAVcnOW+BcWuB9wB/NOwiJUlL08+R+8XATFU9WlUHgGngygXG/Rzw88D+IdYnSToGqaqjD0iuAjZX1fXd5WuBS6pqe8+YC4APVNXbktwF/IOquneBfW0DtgFs2LDhwunp6YGKv3/vMwO9HmDDqfDU88f++vPPPH3gGobBXswZtBeD9gHsRS97MWcYvdi0adN9VbXo1PeqPvaVBdb9xV+EJC8BPgJct9iOqmonsBNgcnKypqam+nj7I7tux2cHej3Ajecf5Jb7+2nDwh6/ZmrgGobBXswZtBeD9gHsRS97MWc5e9HPtMwscFbP8gTwZM/yWuD1wF1JHgfeBOzypKokjU4/4X4PcG6Ss5OsBrYCuw5vrKpnqmp9VW2sqo3A3cCWhaZlJEnLY9Fwr6qDwHbgDuAh4PaqejDJzUm2HO8CJUlL19cEUlXtBnbPW3fTEcZODV6WJGkQ3qEqSQ0y3CWpQYa7JDXIcJekBhnuktQgw12SGmS4S1KDDHdJapDhLkkNMtwlqUGGuyQ1yHCXpAYZ7pLUIMNdkhpkuEtSgwx3SWqQ4S5JDTLcJalBhrskNchwl6QGGe6S1CDDXZIaZLhLUoMMd0lqkOEuSQ0y3CWpQYa7JDXIcJekBhnuktQgw12SGmS4S1KDDHdJapDhLkkNMtwlqUGGuyQ1qK9wT7I5ycNJZpLsWGD7DUn2JPlKkt9J8j3DL1WS1K9Fwz3JScCtwOXAecDVSc6bN+xLwGRVfR/wm8DPD7tQSVL/+jlyvxiYqapHq+oAMA1c2T
ugqr5QVX/WXbwbmBhumZKkpUhVHX1AchWwuaqu7y5fC1xSVduPMP6jwP+pqn+6wLZtwDaADRs2XDg9PT1Q8ffvfWag1wNsOBWeev7YX3/+macPXMMw2Is5g/Zi0D6AvehlL+YMoxebNm26r6omFxu3qo99ZYF1C/5FSPL3gEngzQttr6qdwE6AycnJmpqa6uPtj+y6HZ8d6PUAN55/kFvu76cNC3v8mqmBaxgGezFn0F4M2gewF73sxZzl7EU/lc4CZ/UsTwBPzh+U5C3AzwBvrqo/H055kqRj0c+c+z3AuUnOTrIa2Ars6h2Q5ALgY8CWqvr68MuUJC3FouFeVQeB7cAdwEPA7VX1YJKbk2zpDvsw8FLgN5J8OcmuI+xOkrQM+ppAqqrdwO55627q+fktQ65LkjQA71CVpAYZ7pLUIMNdkhpkuEtSgwx3SWqQ4S5JDTLcJalBhrskNchwl6QGGe6S1CDDXZIaZLhLUoMMd0lqkOEuSQ0y3CWpQYa7JDXIcJekBhnuktQgw12SGmS4S1KDDHdJapDhLkkNMtwlqUGGuyQ1yHCXpAYZ7pLUIMNdkhpkuEtSgwx3SWqQ4S5JDTLcJalBhrskNchwl6QGGe6S1KC+wj3J5iQPJ5lJsmOB7ack+fXu9j9KsnHYhUqS+rdouCc5CbgVuBw4D7g6yXnzhv048K2qeg3wEeBfDLtQSVL/+jlyvxiYqapHq+oAMA1cOW/MlcAnuz//JnBZkgyvTEnSUqSqjj4guQrYXFXXd5evBS6pqu09Yx7ojpntLv9Jd8w35u1rG7Ctu/hXgIeH9R8ygPXANxYdNR7sRYd9mGMv5qyUXnxPVZ2x2KBVfexooSPw+X8R+hlDVe0Edvbxnssmyb1VNTnqOlYCe9FhH+bYizknWi/6mZaZBc7qWZ4AnjzSmCSrgNOBbw6jQEnS0vUT7vcA5yY5O8lqYCuwa96YXcCPdn++CvjdWmy+R5J03Cw6LVNVB5NsB+4ATgI+XlUPJrkZuLeqdgH/DvhUkhk6R+xbj2fRQ7aipolGzF502Ic59mLOCdWLRU+oSpJOPN6hKkkNMtwlqUGGuyQ1yHCXpAaNVbgn+cSoa9DKkuSiJJcvsH5LkgtHUZNWtiSnjbqGfvRzh2pLvm/UBawk3YfCvfzwYyK69zFcB7y/qr53lLUtow/T+W+ebw+dS99+aFmrGaEkNx1lc1XVzy1bMStAkjOBVwJfqaoDSV4BvI/O78urRllbP8Yt3L8ryQUs/LgEqup/LnM9I5NkK/Ax4LkkjwA/C3yKzk1r14ywtOW2rqoen7+yqmaSrBtBPaP03ALrvgu4HlgHjE24J3kf8DPADHBKkn8F/ALwq8AJ8YlurK5zT/IsnfBa8Fk4VTVOR2kPAD/SDbE3An8IbK2q3x5xacsqyUz3UdVL2ta6JGuB99J5nPftwC1V9fXRVrV8kuwBfqCqvpnk1XRC/ger6u4Rl9a3cTtynxmnAF/Egaqagc4nliSPjVuwd/3XJP8M+EDvIzOS/BPgd0dX1mgk+UvADXQ+vX0SeGNVfWu0VY3E/qr6JkBVfS3J/zqRgh3GL9w15xVJbuhZfmnvclX9wghqGoUb6Tw+YybJl7vr3gDcS2c6Ymwk+TDwt+mcazi/qv7fiEsapYkkv9iz/Ire5ap6zwhqWpJxm5b561V1Z/fnMwCq6unRVjUaST54lM1VVTcvWzErQJJzgNd1Fx+sqkdHWc8oJPk28OfAQV78yO7Q+Z347pEUNgJJfvRo26vqk0fbvhKMW7gH+CCwnc4v7Evo/CL/0riF2dEkeV9V/ctR17EcunOr/wH49ar6k1HXIw3LWF3nTucypkuBi6pqXVW9HLgEuDTJ+0db2opyw+JDmnE1sBa4s/vl7u9LsuIvc9PxlWR9kg8meU+Slyb5lSQPJPlMkhPiJPu4Hbl/CfjhBb7+7wzgzqq6YDSVrSxJnqiqsxYf2ZYkbwLeAbyNztURt1XVvxltVc
unezVZ8eKryYrOubnVVTU25+iS3EnnvMta4DLg3wP/CfhrwDVVNTW66vozbuH+QFW9fqnbxk2Sr1XVq0ddx6gkmQI+ApxXVaeMuJyR6V4O+feBnwR+u6puHHFJyybJH1fVG7pTuV/t/f8hyZer6vtHWF5fxuYvcdeBY9zWnJ6jtO/YBJy6zOWMXJKL6EzRvA14nM4VI78xyppGJcnL6ExhvhP4NTrTmPtGW9WyOwSds8hJ5n8p9rdHUM+SjVu4vyHJ/11gfYA1y13MKFXV2lHXsBIk+efA24E/BaaBS6tqdrRVjUaS9XQuDX0H8HHggqp6ZrRVjcw5SXbRyYbDP9NdPnt0ZfVvrKZlpPmS7AY+VFW/311+J52j968CP3v4RpZxkOQ54Gk688vPzt8+Rvc+kOTNC6w+HJapqt9bznqOxbgduUvz/WXgAYAkPwh8CPgp4PvpTM1cNbrSlt2HmQuw+Z/sxu0o8GXARFXdCpDki8AZdPrwj0dZWL8Md427l/Qcnb8D2FlVvwX8Vs8dq+Pi3x5pSirJ31ruYkbsHwFbe5ZXA5PAaXQ+2az48zHjdp27NN+qJIcPci7jxc+TGbeDn99JsnH+yiQ/BozFTW09VlfVEz3L/62q9lXV1+gE/Io3br+80ny3Ab/XvSLieeAPALo3qozbycT3A/8lyRVV9QhAkp8G/i6w0Bx0y17eu1BV23sWz1jmWo6JJ1Q19ro3L72Szo1sz3XXvRZ46Tg94x8gyWV0nvP/I3QenHYR8DfH7cmQSf4jcNf8m9iS/CQwVVVXj6ay/hnukl4kyQ8Anwb+B/D2qto/4pKWXfdblz5N50Fqh//AXwicQud7EJ4aVW39MtwlAd/x+IFTgBfo3Mwzdk+FPCzJD/Hip4WeMM/4N9wlqUFeLSNJDTLcJalBhrskNchwl6QG/X8bq+hQRhbKWQAAAABJRU5ErkJggg==\n",
"text/plain": [
"<Figure size 432x288 with 1 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"comparacao.plot(kind='bar', grid=True);"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Building the chosen model"
]
},
{
"cell_type": "code",
"execution_count": 20,
"metadata": {},
"outputs": [],
"source": [
"test = pd.read_csv('test.csv')"
]
},
{
"cell_type": "code",
"execution_count": 21,
"metadata": {},
"outputs": [],
"source": [
"columns=['NU_NOTA_CN','NU_NOTA_CH', 'NU_NOTA_LC', 'NU_NOTA_REDACAO', 'TP_ST_CONCLUSAO']"
]
},
{
"cell_type": "code",
"execution_count": 22,
"metadata": {},
"outputs": [],
"source": [
"test = test[columns]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Handling missing data"
]
},
{
"cell_type": "code",
"execution_count": 23,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"NU_NOTA_CN 1112\n",
"NU_NOTA_CH 1112\n",
"NU_NOTA_LC 1170\n",
"NU_NOTA_REDACAO 1170\n",
"TP_ST_CONCLUSAO 0\n",
"dtype: int64\n"
]
}
],
"source": [
"print(test.isnull().sum())"
]
},
{
"cell_type": "code",
"execution_count": 24,
"metadata": {},
"outputs": [],
"source": [
"test = test.fillna(0)"
]
},
{
"cell_type": "code",
"execution_count": 25,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"NU_NOTA_CN 0\n",
"NU_NOTA_CH 0\n",
"NU_NOTA_LC 0\n",
"NU_NOTA_REDACAO 0\n",
"TP_ST_CONCLUSAO 0\n",
"dtype: int64\n"
]
}
],
"source": [
"print(test.isnull().sum())"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Making predictions with the model"
]
},
{
"cell_type": "code",
"execution_count": 26,
"metadata": {},
"outputs": [],
"source": [
"definitivo = tree.DecisionTreeClassifier()\n",
"definitivo.fit(X, y)\n",
"resposta_definitivo = definitivo.predict(test.values)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Creating the CSV file"
]
},
{
"cell_type": "code",
"execution_count": 27,
"metadata": {},
"outputs": [],
"source": [
"answer = pd.DataFrame()\n",
"answer['NU_INSCRICAO'] = pd.read_csv('test.csv')['NU_INSCRICAO']\n",
"answer['IN_TREINEIRO'] = resposta_definitivo"
]
},
{
"cell_type": "code",
"execution_count": 28,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"(4570, 2)"
]
},
"execution_count": 28,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"answer.shape"
]
},
{
"cell_type": "code",
"execution_count": 29,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>NU_INSCRICAO</th>\n",
" <th>IN_TREINEIRO</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>ba0cc30ba34e7a46764c09dfc38ed83d15828897</td>\n",
" <td>0.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>177f281c68fa032aedbd842a745da68490926cd2</td>\n",
" <td>0.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>6cf0d8b97597d7625cdedc7bdb6c0f052286c334</td>\n",
" <td>0.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>5c356d810fa57671402502cd0933e5601a2ebf1e</td>\n",
" <td>0.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>df47c07bd881c2db3f38c6048bf77c132ad0ceb3</td>\n",
" <td>0.0</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" NU_INSCRICAO IN_TREINEIRO\n",
"0 ba0cc30ba34e7a46764c09dfc38ed83d15828897 0.0\n",
"1 177f281c68fa032aedbd842a745da68490926cd2 0.0\n",
"2 6cf0d8b97597d7625cdedc7bdb6c0f052286c334 0.0\n",
"3 5c356d810fa57671402502cd0933e5601a2ebf1e 0.0\n",
"4 df47c07bd881c2db3f38c6048bf77c132ad0ceb3 0.0"
]
},
"execution_count": 29,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"answer.head()"
]
},
{
"cell_type": "code",
"execution_count": 30,
"metadata": {},
"outputs": [],
"source": [
"answer.to_csv('answer.csv', index=False)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## After submitting the 'answer.csv' file to Codenation, the score obtained was 95%"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.5"
}
},
"nbformat": 4,
"nbformat_minor": 2
}
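
The classification reports in the notebook show that 3588 of the 4119 held-out rows belong to class 0, so accuracy alone can flatter a model that rarely predicts class 1. As a rough sketch, the snippet below compares each reported accuracy against a majority-class baseline. The support counts and accuracies are copied from the notebook outputs; the helper names `majority_baseline` and `compare_to_baseline` and the dictionary layout are illustrative assumptions, not part of the original notebook. GBR is left out because its recorded figure is identical to KNN's.

```python
# Support counts copied from the classification reports in the notebook.
SUPPORT = {0: 3588, 1: 531}

# Test-set accuracies copied from the notebook outputs.
REPORTED_ACCURACY = {
    "DT": 0.9502306385044914,
    "LR": 0.8468074775430929,
    "SVC": 0.8778829813061423,
    "KNN": 0.8676863316338917,
}


def majority_baseline(support):
    """Accuracy of always predicting the most frequent class (hypothetical helper)."""
    return max(support.values()) / sum(support.values())


def compare_to_baseline(accuracies, baseline):
    """Map each model name to its accuracy minus the baseline accuracy."""
    return {name: acc - baseline for name, acc in accuracies.items()}


baseline = majority_baseline(SUPPORT)
gains = compare_to_baseline(REPORTED_ACCURACY, baseline)

print(f"majority-class baseline: {baseline:.4f}")
# List models from largest to smallest gain over the baseline.
for name, gain in sorted(gains.items(), key=lambda item: item[1], reverse=True):
    print(f"{name:>3}: {REPORTED_ACCURACY[name]:.4f} ({gain:+.4f} vs. baseline)")
```

With these numbers the baseline comes out near 0.871, so LR and KNN score below it, which is consistent with their low recall on class 1 in the reports. Per-class recall or F1 is probably a more informative basis for choosing between the models than accuracy alone.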