@jade1508
Last active October 21, 2023 14:13
{
"cells": [
{
"cell_type": "code",
"execution_count": 160,
"id": "e29fb7dd-9cb7-4c6c-898c-8f409f10a519",
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"import numpy as np\n",
"import matplotlib.pyplot as plt\n",
"import seaborn as sns\n",
"from sklearn.preprocessing import StandardScaler\n",
"from sklearn import linear_model\n",
"from sklearn.decomposition import PCA\n",
"from sklearn.cluster import KMeans\n",
"from scipy.cluster.hierarchy import dendrogram, linkage\n",
"from sklearn.cluster import AgglomerativeClustering"
]
},
{
"cell_type": "code",
"execution_count": 161,
"id": "228ce577-eaa5-4ca8-bd29-90c3a3d705ae",
"metadata": {},
"outputs": [],
"source": [
"foodavai = pd.read_csv(r'C:\\Users\\data\\DisponibiliteAlimentaire_2017.csv')"
]
},
{
"cell_type": "code",
"execution_count": 162,
"id": "3d7c9398-b9f4-43d5-a28e-b3c1fe9a5ca4",
"metadata": {},
"outputs": [],
"source": [
"gdp = pd.read_csv(r'C:\\Users\\data\\pib_hab.csv')"
]
},
{
"cell_type": "code",
"execution_count": 163,
"id": "51ac992e-841b-4e27-9ec2-b2c22bb7b9a2",
"metadata": {},
"outputs": [],
"source": [
"population0018 = pd.read_csv(r'C:\\Users\\data\\Population_2000_2018.csv')"
]
},
{
"cell_type": "code",
"execution_count": 164,
"id": "4ac24e68-bad6-4bba-a150-479641e0c7db",
"metadata": {},
"outputs": [],
"source": [
"stabpolitic = pd.read_csv(r'C:\\Users\\data\\stabilité politique.csv')"
]
},
{
"cell_type": "markdown",
"id": "c6cc8853-3af4-48a8-9569-a676c13fb14c",
"metadata": {},
"source": [
"## Merge all data"
]
},
{
"cell_type": "markdown",
"id": "3fbbad1a-d8dc-4f42-adba-2be102e51574",
"metadata": {},
"source": [
"### 1. Merge foodavai and population0018"
]
},
{
"cell_type": "code",
"execution_count": 165,
"id": "e041c3f0-31ef-4fb2-8179-391776a8e4e1",
"metadata": {},
"outputs": [],
"source": [
"food_population = pd.merge(foodavai, population0018, on=['Year','Zone'], how='outer', indicator=True)"
]
},
{
"cell_type": "code",
"execution_count": 166,
"id": "7225a069-1916-4874-9cdd-7e35d8490ed1",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"_merge\n",
"both 176600\n",
"right_only 4237\n",
"left_only 0\n",
"Name: count, dtype: int64"
]
},
"execution_count": 166,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"food_population._merge.value_counts()"
]
},
{
"cell_type": "markdown",
"id": "e042558c-efaf-4b44-a219-7e6e1f7f9ede",
"metadata": {},
"source": [
"### 2. Keep only the poultry meat product to analyze its export potential"
]
},
{
"cell_type": "code",
"execution_count": 167,
"id": "d28821cd-c3c6-4faa-a2fc-55acb6c079c7",
"metadata": {},
"outputs": [],
"source": [
"food_population = food_population[(food_population._merge == 'both') & (food_population.Product_x == 'Viande de Volailles')]"
]
},
{
"cell_type": "code",
"execution_count": 168,
"id": "8c3b81fc-1300-43e2-a300-580e94f14acd",
"metadata": {},
"outputs": [],
"source": [
"food_population = food_population.drop(columns='_merge')"
]
},
{
"cell_type": "markdown",
"id": "f3ee27e4-9095-4f2b-be2b-961f0a2a7b91",
"metadata": {},
"source": [
"### 3. Merge with gdp"
]
},
{
"cell_type": "code",
"execution_count": 169,
"id": "5fa42d4c-d999-46db-811d-efe3d8d2ca44",
"metadata": {},
"outputs": [],
"source": [
"food_pop_gdp = pd.merge(food_population, gdp, on='Zone', how='outer', indicator=True)"
]
},
{
"cell_type": "code",
"execution_count": 170,
"id": "db7d6aba-c42e-4dc5-999e-3f1d5083983a",
"metadata": {},
"outputs": [],
"source": [
"food_pop_gdp = food_pop_gdp[food_pop_gdp._merge == 'both']"
]
},
{
"cell_type": "code",
"execution_count": 171,
"id": "c6f7d510-83c6-4074-bbf6-67d915c4e34a",
"metadata": {},
"outputs": [],
"source": [
"food_pop_gdp = food_pop_gdp.drop(columns='_merge')"
]
},
{
"cell_type": "markdown",
"id": "936b7519-c7c2-4a4c-a7f6-295993ecdc00",
"metadata": {},
"source": [
"### 4. Drop all columns that contain only one unique value"
]
},
{
"cell_type": "code",
"execution_count": 172,
"id": "3c800ed1-534b-4e21-84a4-d01077e275cf",
"metadata": {},
"outputs": [],
"source": [
"uniquecols = food_pop_gdp.nunique() > 1"
]
},
{
"cell_type": "code",
"execution_count": 173,
"id": "819f3ba4-8a70-413c-a7e4-2a608bb625c4",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"Index(['Zone code_x', 'Zone', 'Code Element_x', 'Element_x', 'Unit_x',\n",
" 'Value_x', 'Symbol_x', 'Description of Symbol_x', 'Zone code_y',\n",
" 'Value_y', 'Zone code', 'Value'],\n",
" dtype='object')"
]
},
"execution_count": 173,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"uniquecols[uniquecols].index"
]
},
{
"cell_type": "code",
"execution_count": 174,
"id": "254aee43-0ff8-4375-9804-9d973ada698f",
"metadata": {},
"outputs": [],
"source": [
"food_pop_gdp = food_pop_gdp.loc[:, uniquecols].copy()"
]
},
{
"cell_type": "code",
"execution_count": 175,
"id": "aa9c6f86-5be1-4408-8473-970d6574264e",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Zone code_x</th>\n",
" <th>Zone</th>\n",
" <th>Code Element_x</th>\n",
" <th>Element_x</th>\n",
" <th>Unit_x</th>\n",
" <th>Value_x</th>\n",
" <th>Symbol_x</th>\n",
" <th>Description of Symbol_x</th>\n",
" <th>Zone code_y</th>\n",
" <th>Value_y</th>\n",
" <th>Zone code</th>\n",
" <th>Value</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>2.0</td>\n",
" <td>Afghanistan</td>\n",
" <td>5511.0</td>\n",
" <td>Production</td>\n",
" <td>Milliers de tonnes</td>\n",
" <td>28.00</td>\n",
" <td>S</td>\n",
" <td>Données standardisées</td>\n",
" <td>2.0</td>\n",
" <td>36296.113</td>\n",
" <td>4.0</td>\n",
" <td>2058.4</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>2.0</td>\n",
" <td>Afghanistan</td>\n",
" <td>5611.0</td>\n",
" <td>Importations - Quantité</td>\n",
" <td>Milliers de tonnes</td>\n",
" <td>29.00</td>\n",
" <td>S</td>\n",
" <td>Données standardisées</td>\n",
" <td>2.0</td>\n",
" <td>36296.113</td>\n",
" <td>4.0</td>\n",
" <td>2058.4</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>2.0</td>\n",
" <td>Afghanistan</td>\n",
" <td>5072.0</td>\n",
" <td>Variation de stock</td>\n",
" <td>Milliers de tonnes</td>\n",
" <td>0.00</td>\n",
" <td>S</td>\n",
" <td>Données standardisées</td>\n",
" <td>2.0</td>\n",
" <td>36296.113</td>\n",
" <td>4.0</td>\n",
" <td>2058.4</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>2.0</td>\n",
" <td>Afghanistan</td>\n",
" <td>5301.0</td>\n",
" <td>Disponibilité intérieure</td>\n",
" <td>Milliers de tonnes</td>\n",
" <td>57.00</td>\n",
" <td>S</td>\n",
" <td>Données standardisées</td>\n",
" <td>2.0</td>\n",
" <td>36296.113</td>\n",
" <td>4.0</td>\n",
" <td>2058.4</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>2.0</td>\n",
" <td>Afghanistan</td>\n",
" <td>5123.0</td>\n",
" <td>Pertes</td>\n",
" <td>Milliers de tonnes</td>\n",
" <td>2.00</td>\n",
" <td>S</td>\n",
" <td>Données standardisées</td>\n",
" <td>2.0</td>\n",
" <td>36296.113</td>\n",
" <td>4.0</td>\n",
" <td>2058.4</td>\n",
" </tr>\n",
" <tr>\n",
" <th>...</th>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2056</th>\n",
" <td>181.0</td>\n",
" <td>Zimbabwe</td>\n",
" <td>5142.0</td>\n",
" <td>Nourriture</td>\n",
" <td>Milliers de tonnes</td>\n",
" <td>67.00</td>\n",
" <td>S</td>\n",
" <td>Données standardisées</td>\n",
" <td>181.0</td>\n",
" <td>14236.595</td>\n",
" <td>716.0</td>\n",
" <td>3795.6</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2057</th>\n",
" <td>181.0</td>\n",
" <td>Zimbabwe</td>\n",
" <td>645.0</td>\n",
" <td>Disponibilité alimentaire en quantité (kg/pers...</td>\n",
" <td>kg</td>\n",
" <td>4.68</td>\n",
" <td>Fc</td>\n",
" <td>Donnée calculée</td>\n",
" <td>181.0</td>\n",
" <td>14236.595</td>\n",
" <td>716.0</td>\n",
" <td>3795.6</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2058</th>\n",
" <td>181.0</td>\n",
" <td>Zimbabwe</td>\n",
" <td>664.0</td>\n",
" <td>Disponibilité alimentaire (Kcal/personne/jour)</td>\n",
" <td>Kcal/personne/jour</td>\n",
" <td>16.00</td>\n",
" <td>Fc</td>\n",
" <td>Donnée calculée</td>\n",
" <td>181.0</td>\n",
" <td>14236.595</td>\n",
" <td>716.0</td>\n",
" <td>3795.6</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2059</th>\n",
" <td>181.0</td>\n",
" <td>Zimbabwe</td>\n",
" <td>674.0</td>\n",
" <td>Disponibilité de protéines en quantité (g/pers...</td>\n",
" <td>g/personne/jour</td>\n",
" <td>1.59</td>\n",
" <td>Fc</td>\n",
" <td>Donnée calculée</td>\n",
" <td>181.0</td>\n",
" <td>14236.595</td>\n",
" <td>716.0</td>\n",
" <td>3795.6</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2060</th>\n",
" <td>181.0</td>\n",
" <td>Zimbabwe</td>\n",
" <td>684.0</td>\n",
" <td>Disponibilité de matière grasse en quantité (g...</td>\n",
" <td>g/personne/jour</td>\n",
" <td>0.99</td>\n",
" <td>Fc</td>\n",
" <td>Donnée calculée</td>\n",
" <td>181.0</td>\n",
" <td>14236.595</td>\n",
" <td>716.0</td>\n",
" <td>3795.6</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"<p>1963 rows × 12 columns</p>\n",
"</div>"
],
"text/plain": [
" Zone code_x Zone Code Element_x \\\n",
"0 2.0 Afghanistan 5511.0 \n",
"1 2.0 Afghanistan 5611.0 \n",
"2 2.0 Afghanistan 5072.0 \n",
"3 2.0 Afghanistan 5301.0 \n",
"4 2.0 Afghanistan 5123.0 \n",
"... ... ... ... \n",
"2056 181.0 Zimbabwe 5142.0 \n",
"2057 181.0 Zimbabwe 645.0 \n",
"2058 181.0 Zimbabwe 664.0 \n",
"2059 181.0 Zimbabwe 674.0 \n",
"2060 181.0 Zimbabwe 684.0 \n",
"\n",
" Element_x Unit_x \\\n",
"0 Production Milliers de tonnes \n",
"1 Importations - Quantité Milliers de tonnes \n",
"2 Variation de stock Milliers de tonnes \n",
"3 Disponibilité intérieure Milliers de tonnes \n",
"4 Pertes Milliers de tonnes \n",
"... ... ... \n",
"2056 Nourriture Milliers de tonnes \n",
"2057 Disponibilité alimentaire en quantité (kg/pers... kg \n",
"2058 Disponibilité alimentaire (Kcal/personne/jour) Kcal/personne/jour \n",
"2059 Disponibilité de protéines en quantité (g/pers... g/personne/jour \n",
"2060 Disponibilité de matière grasse en quantité (g... g/personne/jour \n",
"\n",
" Value_x Symbol_x Description of Symbol_x Zone code_y Value_y \\\n",
"0 28.00 S Données standardisées 2.0 36296.113 \n",
"1 29.00 S Données standardisées 2.0 36296.113 \n",
"2 0.00 S Données standardisées 2.0 36296.113 \n",
"3 57.00 S Données standardisées 2.0 36296.113 \n",
"4 2.00 S Données standardisées 2.0 36296.113 \n",
"... ... ... ... ... ... \n",
"2056 67.00 S Données standardisées 181.0 14236.595 \n",
"2057 4.68 Fc Donnée calculée 181.0 14236.595 \n",
"2058 16.00 Fc Donnée calculée 181.0 14236.595 \n",
"2059 1.59 Fc Donnée calculée 181.0 14236.595 \n",
"2060 0.99 Fc Donnée calculée 181.0 14236.595 \n",
"\n",
" Zone code Value \n",
"0 4.0 2058.4 \n",
"1 4.0 2058.4 \n",
"2 4.0 2058.4 \n",
"3 4.0 2058.4 \n",
"4 4.0 2058.4 \n",
"... ... ... \n",
"2056 716.0 3795.6 \n",
"2057 716.0 3795.6 \n",
"2058 716.0 3795.6 \n",
"2059 716.0 3795.6 \n",
"2060 716.0 3795.6 \n",
"\n",
"[1963 rows x 12 columns]"
]
},
"execution_count": 175,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"food_pop_gdp"
]
},
{
"cell_type": "markdown",
"id": "a154d91a-096c-489a-8f13-ec71514ae528",
"metadata": {},
"source": [
"### 5. Rename and remove columns"
]
},
{
"cell_type": "code",
"execution_count": 176,
"id": "765ce546-e1fb-4be2-93c5-d4dd0b32dfac",
"metadata": {},
"outputs": [],
"source": [
"# Reassign instead of using inplace=True, so the rename never acts on a view of a filtered frame\n",
"food_pop_gdp = food_pop_gdp.rename(columns={\n",
"    'Element_x': 'Element',\n",
"    'Value_x': 'Value_food',\n",
"    'Value_y': 'Value_population',\n",
"    'Value': 'Value_gdp',\n",
"    'Symbol_x': 'Symbol',\n",
"    'Description of Symbol_x': 'Description of Symbol',\n",
"})"
]
},
{
"cell_type": "code",
"execution_count": 177,
"id": "0d53328f-032d-4d71-9cde-1948db48a413",
"metadata": {},
"outputs": [],
"source": [
"food_pop_gdp = food_pop_gdp[['Zone','Element','Value_food','Value_population','Value_gdp','Symbol','Description of Symbol']]"
]
},
{
"cell_type": "code",
"execution_count": 178,
"id": "b80b3bf1-4b8c-4472-8c1e-6c2107cc5514",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Zone</th>\n",
" <th>Element</th>\n",
" <th>Value_food</th>\n",
" <th>Value_population</th>\n",
" <th>Value_gdp</th>\n",
" <th>Symbol</th>\n",
" <th>Description of Symbol</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>Afghanistan</td>\n",
" <td>Production</td>\n",
" <td>28.00</td>\n",
" <td>36296.113</td>\n",
" <td>2058.4</td>\n",
" <td>S</td>\n",
" <td>Données standardisées</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>Afghanistan</td>\n",
" <td>Importations - Quantité</td>\n",
" <td>29.00</td>\n",
" <td>36296.113</td>\n",
" <td>2058.4</td>\n",
" <td>S</td>\n",
" <td>Données standardisées</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>Afghanistan</td>\n",
" <td>Variation de stock</td>\n",
" <td>0.00</td>\n",
" <td>36296.113</td>\n",
" <td>2058.4</td>\n",
" <td>S</td>\n",
" <td>Données standardisées</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>Afghanistan</td>\n",
" <td>Disponibilité intérieure</td>\n",
" <td>57.00</td>\n",
" <td>36296.113</td>\n",
" <td>2058.4</td>\n",
" <td>S</td>\n",
" <td>Données standardisées</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>Afghanistan</td>\n",
" <td>Pertes</td>\n",
" <td>2.00</td>\n",
" <td>36296.113</td>\n",
" <td>2058.4</td>\n",
" <td>S</td>\n",
" <td>Données standardisées</td>\n",
" </tr>\n",
" <tr>\n",
" <th>...</th>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2056</th>\n",
" <td>Zimbabwe</td>\n",
" <td>Nourriture</td>\n",
" <td>67.00</td>\n",
" <td>14236.595</td>\n",
" <td>3795.6</td>\n",
" <td>S</td>\n",
" <td>Données standardisées</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2057</th>\n",
" <td>Zimbabwe</td>\n",
" <td>Disponibilité alimentaire en quantité (kg/pers...</td>\n",
" <td>4.68</td>\n",
" <td>14236.595</td>\n",
" <td>3795.6</td>\n",
" <td>Fc</td>\n",
" <td>Donnée calculée</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2058</th>\n",
" <td>Zimbabwe</td>\n",
" <td>Disponibilité alimentaire (Kcal/personne/jour)</td>\n",
" <td>16.00</td>\n",
" <td>14236.595</td>\n",
" <td>3795.6</td>\n",
" <td>Fc</td>\n",
" <td>Donnée calculée</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2059</th>\n",
" <td>Zimbabwe</td>\n",
" <td>Disponibilité de protéines en quantité (g/pers...</td>\n",
" <td>1.59</td>\n",
" <td>14236.595</td>\n",
" <td>3795.6</td>\n",
" <td>Fc</td>\n",
" <td>Donnée calculée</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2060</th>\n",
" <td>Zimbabwe</td>\n",
" <td>Disponibilité de matière grasse en quantité (g...</td>\n",
" <td>0.99</td>\n",
" <td>14236.595</td>\n",
" <td>3795.6</td>\n",
" <td>Fc</td>\n",
" <td>Donnée calculée</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"<p>1963 rows × 7 columns</p>\n",
"</div>"
],
"text/plain": [
" Zone Element \\\n",
"0 Afghanistan Production \n",
"1 Afghanistan Importations - Quantité \n",
"2 Afghanistan Variation de stock \n",
"3 Afghanistan Disponibilité intérieure \n",
"4 Afghanistan Pertes \n",
"... ... ... \n",
"2056 Zimbabwe Nourriture \n",
"2057 Zimbabwe Disponibilité alimentaire en quantité (kg/pers... \n",
"2058 Zimbabwe Disponibilité alimentaire (Kcal/personne/jour) \n",
"2059 Zimbabwe Disponibilité de protéines en quantité (g/pers... \n",
"2060 Zimbabwe Disponibilité de matière grasse en quantité (g... \n",
"\n",
" Value_food Value_population Value_gdp Symbol Description of Symbol \n",
"0 28.00 36296.113 2058.4 S Données standardisées \n",
"1 29.00 36296.113 2058.4 S Données standardisées \n",
"2 0.00 36296.113 2058.4 S Données standardisées \n",
"3 57.00 36296.113 2058.4 S Données standardisées \n",
"4 2.00 36296.113 2058.4 S Données standardisées \n",
"... ... ... ... ... ... \n",
"2056 67.00 14236.595 3795.6 S Données standardisées \n",
"2057 4.68 14236.595 3795.6 Fc Donnée calculée \n",
"2058 16.00 14236.595 3795.6 Fc Donnée calculée \n",
"2059 1.59 14236.595 3795.6 Fc Donnée calculée \n",
"2060 0.99 14236.595 3795.6 Fc Donnée calculée \n",
"\n",
"[1963 rows x 7 columns]"
]
},
"execution_count": 178,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"food_pop_gdp"
]
},
{
"cell_type": "markdown",
"id": "9863c11d-2830-48f1-a1c9-39ca914b7dd3",
"metadata": {},
"source": [
"### 6. Merge with stabpolitic"
]
},
{
"cell_type": "code",
"execution_count": 179,
"id": "1b57480d-90cf-4054-a2e3-0107b60ddd73",
"metadata": {},
"outputs": [],
"source": [
"df = pd.merge(food_pop_gdp, stabpolitic, on='Zone', how='outer', indicator=True)"
]
},
{
"cell_type": "code",
"execution_count": 180,
"id": "d68e6063-e33e-4a57-8467-21cfa36a26f9",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"_merge\n",
"both 1963\n",
"right_only 33\n",
"left_only 0\n",
"Name: count, dtype: int64"
]
},
"execution_count": 180,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df._merge.value_counts()"
]
},
{
"cell_type": "code",
"execution_count": 181,
"id": "04a86f34-63fd-4bab-b078-b116bcaf8288",
"metadata": {},
"outputs": [],
"source": [
"df = df[df._merge == 'both']"
]
},
{
"cell_type": "code",
"execution_count": 182,
"id": "b19ebacd-e7e9-4dc2-83bd-ad22228b9f60",
"metadata": {},
"outputs": [],
"source": [
"df_uniquecols = df.nunique() > 1"
]
},
{
"cell_type": "code",
"execution_count": 183,
"id": "0c58afd2-f059-439c-a747-7b0e16ece323",
"metadata": {},
"outputs": [],
"source": [
"df = df.loc[:, df_uniquecols]"
]
},
{
"cell_type": "code",
"execution_count": 184,
"id": "9f0c82d2-cffc-468d-9838-9ac64d6e1ef1",
"metadata": {},
"outputs": [],
"source": [
"df = df[['Zone','Value_food','Value_population','Value_gdp','Value','Element_x','Symbol_x','Description of Symbol_x']]"
]
},
{
"cell_type": "code",
"execution_count": 185,
"id": "a6f7d5c5-93a3-40a0-8af8-b031945b7f60",
"metadata": {},
"outputs": [],
"source": [
"# Reassign instead of using inplace=True, because df is a column selection of the merged frame\n",
"df = df.rename(columns={\n",
"    'Element_x': 'Element',\n",
"    'Value': 'Value_politicstab',\n",
"    'Symbol_x': 'Symbol',\n",
"    'Description of Symbol_x': 'Description of Symbol',\n",
"})"
]
},
{
"cell_type": "code",
"execution_count": 186,
"id": "69620f32-d97a-4016-a558-b31aa3b4eb5f",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Zone</th>\n",
" <th>Value_food</th>\n",
" <th>Value_population</th>\n",
" <th>Value_gdp</th>\n",
" <th>Value_politicstab</th>\n",
" <th>Element</th>\n",
" <th>Symbol</th>\n",
" <th>Description of Symbol</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>Afghanistan</td>\n",
" <td>28.00</td>\n",
" <td>36296.113</td>\n",
" <td>2058.4</td>\n",
" <td>-2.80</td>\n",
" <td>Production</td>\n",
" <td>S</td>\n",
" <td>Données standardisées</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>Afghanistan</td>\n",
" <td>29.00</td>\n",
" <td>36296.113</td>\n",
" <td>2058.4</td>\n",
" <td>-2.80</td>\n",
" <td>Importations - Quantité</td>\n",
" <td>S</td>\n",
" <td>Données standardisées</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>Afghanistan</td>\n",
" <td>0.00</td>\n",
" <td>36296.113</td>\n",
" <td>2058.4</td>\n",
" <td>-2.80</td>\n",
" <td>Variation de stock</td>\n",
" <td>S</td>\n",
" <td>Données standardisées</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>Afghanistan</td>\n",
" <td>57.00</td>\n",
" <td>36296.113</td>\n",
" <td>2058.4</td>\n",
" <td>-2.80</td>\n",
" <td>Disponibilité intérieure</td>\n",
" <td>S</td>\n",
" <td>Données standardisées</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>Afghanistan</td>\n",
" <td>2.00</td>\n",
" <td>36296.113</td>\n",
" <td>2058.4</td>\n",
" <td>-2.80</td>\n",
" <td>Pertes</td>\n",
" <td>S</td>\n",
" <td>Données standardisées</td>\n",
" </tr>\n",
" <tr>\n",
" <th>...</th>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1958</th>\n",
" <td>Zimbabwe</td>\n",
" <td>67.00</td>\n",
" <td>14236.595</td>\n",
" <td>3795.6</td>\n",
" <td>-0.71</td>\n",
" <td>Nourriture</td>\n",
" <td>S</td>\n",
" <td>Données standardisées</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1959</th>\n",
" <td>Zimbabwe</td>\n",
" <td>4.68</td>\n",
" <td>14236.595</td>\n",
" <td>3795.6</td>\n",
" <td>-0.71</td>\n",
" <td>Disponibilité alimentaire en quantité (kg/pers...</td>\n",
" <td>Fc</td>\n",
" <td>Donnée calculée</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1960</th>\n",
" <td>Zimbabwe</td>\n",
" <td>16.00</td>\n",
" <td>14236.595</td>\n",
" <td>3795.6</td>\n",
" <td>-0.71</td>\n",
" <td>Disponibilité alimentaire (Kcal/personne/jour)</td>\n",
" <td>Fc</td>\n",
" <td>Donnée calculée</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1961</th>\n",
" <td>Zimbabwe</td>\n",
" <td>1.59</td>\n",
" <td>14236.595</td>\n",
" <td>3795.6</td>\n",
" <td>-0.71</td>\n",
" <td>Disponibilité de protéines en quantité (g/pers...</td>\n",
" <td>Fc</td>\n",
" <td>Donnée calculée</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1962</th>\n",
" <td>Zimbabwe</td>\n",
" <td>0.99</td>\n",
" <td>14236.595</td>\n",
" <td>3795.6</td>\n",
" <td>-0.71</td>\n",
" <td>Disponibilité de matière grasse en quantité (g...</td>\n",
" <td>Fc</td>\n",
" <td>Donnée calculée</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"<p>1963 rows × 8 columns</p>\n",
"</div>"
],
"text/plain": [
" Zone Value_food Value_population Value_gdp Value_politicstab \\\n",
"0 Afghanistan 28.00 36296.113 2058.4 -2.80 \n",
"1 Afghanistan 29.00 36296.113 2058.4 -2.80 \n",
"2 Afghanistan 0.00 36296.113 2058.4 -2.80 \n",
"3 Afghanistan 57.00 36296.113 2058.4 -2.80 \n",
"4 Afghanistan 2.00 36296.113 2058.4 -2.80 \n",
"... ... ... ... ... ... \n",
"1958 Zimbabwe 67.00 14236.595 3795.6 -0.71 \n",
"1959 Zimbabwe 4.68 14236.595 3795.6 -0.71 \n",
"1960 Zimbabwe 16.00 14236.595 3795.6 -0.71 \n",
"1961 Zimbabwe 1.59 14236.595 3795.6 -0.71 \n",
"1962 Zimbabwe 0.99 14236.595 3795.6 -0.71 \n",
"\n",
" Element Symbol \\\n",
"0 Production S \n",
"1 Importations - Quantité S \n",
"2 Variation de stock S \n",
"3 Disponibilité intérieure S \n",
"4 Pertes S \n",
"... ... ... \n",
"1958 Nourriture S \n",
"1959 Disponibilité alimentaire en quantité (kg/pers... Fc \n",
"1960 Disponibilité alimentaire (Kcal/personne/jour) Fc \n",
"1961 Disponibilité de protéines en quantité (g/pers... Fc \n",
"1962 Disponibilité de matière grasse en quantité (g... Fc \n",
"\n",
" Description of Symbol \n",
"0 Données standardisées \n",
"1 Données standardisées \n",
"2 Données standardisées \n",
"3 Données standardisées \n",
"4 Données standardisées \n",
"... ... \n",
"1958 Données standardisées \n",
"1959 Donnée calculée \n",
"1960 Donnée calculée \n",
"1961 Donnée calculée \n",
"1962 Donnée calculée \n",
"\n",
"[1963 rows x 8 columns]"
]
},
"execution_count": 186,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df"
]
},
{
"cell_type": "markdown",
"id": "75cadf2b-4423-4dd2-bb79-9aee573a7a5c",
"metadata": {},
"source": [
"### Convert column Element to indicator (one-hot) columns"
]
},
{
"cell_type": "code",
"execution_count": 187,
"id": "149e430a-74d2-4498-a56b-3d676e79fc10",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array(['Production', 'Importations - Quantité', 'Variation de stock',\n",
" 'Disponibilité intérieure', 'Pertes', 'Résidus', 'Nourriture',\n",
" 'Disponibilité alimentaire en quantité (kg/personne/an)',\n",
" 'Disponibilité alimentaire (Kcal/personne/jour)',\n",
" 'Disponibilité de protéines en quantité (g/personne/jour)',\n",
" 'Disponibilité de matière grasse en quantité (g/personne/jour)',\n",
" 'Exportations - Quantité', 'Alimentation pour touristes',\n",
" 'Traitement', 'Autres utilisations (non alimentaire)',\n",
" 'Aliments pour animaux', 'Semences'], dtype=object)"
]
},
"execution_count": 187,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.Element.unique()"
]
},
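{
"cell_type": "code",
"execution_count": 188,
"id": "7e7aaf6d-9f85-4d32-9d9f-240d812ed051",
"metadata": {},
"outputs": [],
"source": [
"# One-hot encode the three categorical columns by name; the numeric columns are left unchanged\n",
"df = pd.get_dummies(df, columns=['Element', 'Symbol', 'Description of Symbol'], prefix='', prefix_sep='', dtype=int)"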
]
},
{
"cell_type": "code",
"execution_count": 189,
"id": "fc56c36d-3465-43c8-b674-d4a04a082e2c",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Zone</th>\n",
" <th>Value_food</th>\n",
" <th>Value_population</th>\n",
" <th>Value_gdp</th>\n",
" <th>Value_politicstab</th>\n",
" <th>Alimentation pour touristes</th>\n",
" <th>Aliments pour animaux</th>\n",
" <th>Autres utilisations (non alimentaire)</th>\n",
" <th>Disponibilité alimentaire (Kcal/personne/jour)</th>\n",
" <th>Disponibilité alimentaire en quantité (kg/personne/an)</th>\n",
" <th>...</th>\n",
" <th>Pertes</th>\n",
" <th>Production</th>\n",
" <th>Résidus</th>\n",
" <th>Semences</th>\n",
" <th>Traitement</th>\n",
" <th>Variation de stock</th>\n",
" <th>Fc</th>\n",
" <th>S</th>\n",
" <th>Donnée calculée</th>\n",
" <th>Données standardisées</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>Afghanistan</td>\n",
" <td>28.00</td>\n",
" <td>36296.113</td>\n",
" <td>2058.4</td>\n",
" <td>-2.80</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>...</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>Afghanistan</td>\n",
" <td>29.00</td>\n",
" <td>36296.113</td>\n",
" <td>2058.4</td>\n",
" <td>-2.80</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>...</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>Afghanistan</td>\n",
" <td>0.00</td>\n",
" <td>36296.113</td>\n",
" <td>2058.4</td>\n",
" <td>-2.80</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>...</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>Afghanistan</td>\n",
" <td>57.00</td>\n",
" <td>36296.113</td>\n",
" <td>2058.4</td>\n",
" <td>-2.80</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>...</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>Afghanistan</td>\n",
" <td>2.00</td>\n",
" <td>36296.113</td>\n",
" <td>2058.4</td>\n",
" <td>-2.80</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>...</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>...</th>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1958</th>\n",
" <td>Zimbabwe</td>\n",
" <td>67.00</td>\n",
" <td>14236.595</td>\n",
" <td>3795.6</td>\n",
" <td>-0.71</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>...</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1959</th>\n",
" <td>Zimbabwe</td>\n",
" <td>4.68</td>\n",
" <td>14236.595</td>\n",
" <td>3795.6</td>\n",
" <td>-0.71</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>...</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1960</th>\n",
" <td>Zimbabwe</td>\n",
" <td>16.00</td>\n",
" <td>14236.595</td>\n",
" <td>3795.6</td>\n",
" <td>-0.71</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>...</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1961</th>\n",
" <td>Zimbabwe</td>\n",
" <td>1.59</td>\n",
" <td>14236.595</td>\n",
" <td>3795.6</td>\n",
" <td>-0.71</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>...</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1962</th>\n",
" <td>Zimbabwe</td>\n",
" <td>0.99</td>\n",
" <td>14236.595</td>\n",
" <td>3795.6</td>\n",
" <td>-0.71</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>...</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"<p>1963 rows × 26 columns</p>\n",
"</div>"
],
"text/plain": [
" Zone Value_food Value_population Value_gdp Value_politicstab \\\n",
"0 Afghanistan 28.00 36296.113 2058.4 -2.80 \n",
"1 Afghanistan 29.00 36296.113 2058.4 -2.80 \n",
"2 Afghanistan 0.00 36296.113 2058.4 -2.80 \n",
"3 Afghanistan 57.00 36296.113 2058.4 -2.80 \n",
"4 Afghanistan 2.00 36296.113 2058.4 -2.80 \n",
"... ... ... ... ... ... \n",
"1958 Zimbabwe 67.00 14236.595 3795.6 -0.71 \n",
"1959 Zimbabwe 4.68 14236.595 3795.6 -0.71 \n",
"1960 Zimbabwe 16.00 14236.595 3795.6 -0.71 \n",
"1961 Zimbabwe 1.59 14236.595 3795.6 -0.71 \n",
"1962 Zimbabwe 0.99 14236.595 3795.6 -0.71 \n",
"\n",
" Alimentation pour touristes Aliments pour animaux \\\n",
"0 0 0 \n",
"1 0 0 \n",
"2 0 0 \n",
"3 0 0 \n",
"4 0 0 \n",
"... ... ... \n",
"1958 0 0 \n",
"1959 0 0 \n",
"1960 0 0 \n",
"1961 0 0 \n",
"1962 0 0 \n",
"\n",
" Autres utilisations (non alimentaire) \\\n",
"0 0 \n",
"1 0 \n",
"2 0 \n",
"3 0 \n",
"4 0 \n",
"... ... \n",
"1958 0 \n",
"1959 0 \n",
"1960 0 \n",
"1961 0 \n",
"1962 0 \n",
"\n",
" Disponibilité alimentaire (Kcal/personne/jour) \\\n",
"0 0 \n",
"1 0 \n",
"2 0 \n",
"3 0 \n",
"4 0 \n",
"... ... \n",
"1958 0 \n",
"1959 0 \n",
"1960 1 \n",
"1961 0 \n",
"1962 0 \n",
"\n",
" Disponibilité alimentaire en quantité (kg/personne/an) ... Pertes \\\n",
"0 0 ... 0 \n",
"1 0 ... 0 \n",
"2 0 ... 0 \n",
"3 0 ... 0 \n",
"4 0 ... 1 \n",
"... ... ... ... \n",
"1958 0 ... 0 \n",
"1959 1 ... 0 \n",
"1960 0 ... 0 \n",
"1961 0 ... 0 \n",
"1962 0 ... 0 \n",
"\n",
" Production Résidus Semences Traitement Variation de stock Fc S \\\n",
"0 1 0 0 0 0 0 1 \n",
"1 0 0 0 0 0 0 1 \n",
"2 0 0 0 0 1 0 1 \n",
"3 0 0 0 0 0 0 1 \n",
"4 0 0 0 0 0 0 1 \n",
"... ... ... ... ... ... .. .. \n",
"1958 0 0 0 0 0 0 1 \n",
"1959 0 0 0 0 0 1 0 \n",
"1960 0 0 0 0 0 1 0 \n",
"1961 0 0 0 0 0 1 0 \n",
"1962 0 0 0 0 0 1 0 \n",
"\n",
" Donnée calculée Données standardisées \n",
"0 0 1 \n",
"1 0 1 \n",
"2 0 1 \n",
"3 0 1 \n",
"4 0 1 \n",
"... ... ... \n",
"1958 0 1 \n",
"1959 1 0 \n",
"1960 1 0 \n",
"1961 1 0 \n",
"1962 1 0 \n",
"\n",
"[1963 rows x 26 columns]"
]
},
"execution_count": 189,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df"
]
},
{
"cell_type": "markdown",
"id": "23f7ad19-fabe-4202-8a80-64f650ab868a",
"metadata": {},
"source": [
"### Add values from Value_x to all rows in column Element\n",
"### Remove column Value_food and Value_population"
]
},
{
"cell_type": "code",
"execution_count": 190,
"id": "f6456647-bdd3-48c5-bb7f-3f9d6f9c59a9",
"metadata": {},
"outputs": [],
"source": [
"for column in df.columns[5:]: \n",
" df[column] = df.Value_food * df[column]"
]
},
{
"cell_type": "code",
"execution_count": 191,
"id": "ee0a74cc-66fa-4901-9a21-48af88a16c1e",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Zone</th>\n",
" <th>Value_food</th>\n",
" <th>Value_population</th>\n",
" <th>Value_gdp</th>\n",
" <th>Value_politicstab</th>\n",
" <th>Alimentation pour touristes</th>\n",
" <th>Aliments pour animaux</th>\n",
" <th>Autres utilisations (non alimentaire)</th>\n",
" <th>Disponibilité alimentaire (Kcal/personne/jour)</th>\n",
" <th>Disponibilité alimentaire en quantité (kg/personne/an)</th>\n",
" <th>...</th>\n",
" <th>Pertes</th>\n",
" <th>Production</th>\n",
" <th>Résidus</th>\n",
" <th>Semences</th>\n",
" <th>Traitement</th>\n",
" <th>Variation de stock</th>\n",
" <th>Fc</th>\n",
" <th>S</th>\n",
" <th>Donnée calculée</th>\n",
" <th>Données standardisées</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>Afghanistan</td>\n",
" <td>28.00</td>\n",
" <td>36296.113</td>\n",
" <td>2058.4</td>\n",
" <td>-2.80</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.00</td>\n",
" <td>...</td>\n",
" <td>0.0</td>\n",
" <td>28.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.00</td>\n",
" <td>28.0</td>\n",
" <td>0.00</td>\n",
" <td>28.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>Afghanistan</td>\n",
" <td>29.00</td>\n",
" <td>36296.113</td>\n",
" <td>2058.4</td>\n",
" <td>-2.80</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.00</td>\n",
" <td>...</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.00</td>\n",
" <td>29.0</td>\n",
" <td>0.00</td>\n",
" <td>29.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>Afghanistan</td>\n",
" <td>0.00</td>\n",
" <td>36296.113</td>\n",
" <td>2058.4</td>\n",
" <td>-2.80</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.00</td>\n",
" <td>...</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.00</td>\n",
" <td>0.0</td>\n",
" <td>0.00</td>\n",
" <td>0.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>Afghanistan</td>\n",
" <td>57.00</td>\n",
" <td>36296.113</td>\n",
" <td>2058.4</td>\n",
" <td>-2.80</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.00</td>\n",
" <td>...</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.00</td>\n",
" <td>57.0</td>\n",
" <td>0.00</td>\n",
" <td>57.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>Afghanistan</td>\n",
" <td>2.00</td>\n",
" <td>36296.113</td>\n",
" <td>2058.4</td>\n",
" <td>-2.80</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.00</td>\n",
" <td>...</td>\n",
" <td>2.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.00</td>\n",
" <td>2.0</td>\n",
" <td>0.00</td>\n",
" <td>2.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>...</th>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1958</th>\n",
" <td>Zimbabwe</td>\n",
" <td>67.00</td>\n",
" <td>14236.595</td>\n",
" <td>3795.6</td>\n",
" <td>-0.71</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.00</td>\n",
" <td>...</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.00</td>\n",
" <td>67.0</td>\n",
" <td>0.00</td>\n",
" <td>67.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1959</th>\n",
" <td>Zimbabwe</td>\n",
" <td>4.68</td>\n",
" <td>14236.595</td>\n",
" <td>3795.6</td>\n",
" <td>-0.71</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>4.68</td>\n",
" <td>...</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>4.68</td>\n",
" <td>0.0</td>\n",
" <td>4.68</td>\n",
" <td>0.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1960</th>\n",
" <td>Zimbabwe</td>\n",
" <td>16.00</td>\n",
" <td>14236.595</td>\n",
" <td>3795.6</td>\n",
" <td>-0.71</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>16.0</td>\n",
" <td>0.00</td>\n",
" <td>...</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>16.00</td>\n",
" <td>0.0</td>\n",
" <td>16.00</td>\n",
" <td>0.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1961</th>\n",
" <td>Zimbabwe</td>\n",
" <td>1.59</td>\n",
" <td>14236.595</td>\n",
" <td>3795.6</td>\n",
" <td>-0.71</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.00</td>\n",
" <td>...</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>1.59</td>\n",
" <td>0.0</td>\n",
" <td>1.59</td>\n",
" <td>0.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1962</th>\n",
" <td>Zimbabwe</td>\n",
" <td>0.99</td>\n",
" <td>14236.595</td>\n",
" <td>3795.6</td>\n",
" <td>-0.71</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.00</td>\n",
" <td>...</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.99</td>\n",
" <td>0.0</td>\n",
" <td>0.99</td>\n",
" <td>0.0</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"<p>1963 rows × 26 columns</p>\n",
"</div>"
],
"text/plain": [
" Zone Value_food Value_population Value_gdp Value_politicstab \\\n",
"0 Afghanistan 28.00 36296.113 2058.4 -2.80 \n",
"1 Afghanistan 29.00 36296.113 2058.4 -2.80 \n",
"2 Afghanistan 0.00 36296.113 2058.4 -2.80 \n",
"3 Afghanistan 57.00 36296.113 2058.4 -2.80 \n",
"4 Afghanistan 2.00 36296.113 2058.4 -2.80 \n",
"... ... ... ... ... ... \n",
"1958 Zimbabwe 67.00 14236.595 3795.6 -0.71 \n",
"1959 Zimbabwe 4.68 14236.595 3795.6 -0.71 \n",
"1960 Zimbabwe 16.00 14236.595 3795.6 -0.71 \n",
"1961 Zimbabwe 1.59 14236.595 3795.6 -0.71 \n",
"1962 Zimbabwe 0.99 14236.595 3795.6 -0.71 \n",
"\n",
" Alimentation pour touristes Aliments pour animaux \\\n",
"0 0.0 0.0 \n",
"1 0.0 0.0 \n",
"2 0.0 0.0 \n",
"3 0.0 0.0 \n",
"4 0.0 0.0 \n",
"... ... ... \n",
"1958 0.0 0.0 \n",
"1959 0.0 0.0 \n",
"1960 0.0 0.0 \n",
"1961 0.0 0.0 \n",
"1962 0.0 0.0 \n",
"\n",
" Autres utilisations (non alimentaire) \\\n",
"0 0.0 \n",
"1 0.0 \n",
"2 0.0 \n",
"3 0.0 \n",
"4 0.0 \n",
"... ... \n",
"1958 0.0 \n",
"1959 0.0 \n",
"1960 0.0 \n",
"1961 0.0 \n",
"1962 0.0 \n",
"\n",
" Disponibilité alimentaire (Kcal/personne/jour) \\\n",
"0 0.0 \n",
"1 0.0 \n",
"2 0.0 \n",
"3 0.0 \n",
"4 0.0 \n",
"... ... \n",
"1958 0.0 \n",
"1959 0.0 \n",
"1960 16.0 \n",
"1961 0.0 \n",
"1962 0.0 \n",
"\n",
" Disponibilité alimentaire en quantité (kg/personne/an) ... Pertes \\\n",
"0 0.00 ... 0.0 \n",
"1 0.00 ... 0.0 \n",
"2 0.00 ... 0.0 \n",
"3 0.00 ... 0.0 \n",
"4 0.00 ... 2.0 \n",
"... ... ... ... \n",
"1958 0.00 ... 0.0 \n",
"1959 4.68 ... 0.0 \n",
"1960 0.00 ... 0.0 \n",
"1961 0.00 ... 0.0 \n",
"1962 0.00 ... 0.0 \n",
"\n",
" Production Résidus Semences Traitement Variation de stock Fc \\\n",
"0 28.0 0.0 0.0 0.0 0.0 0.00 \n",
"1 0.0 0.0 0.0 0.0 0.0 0.00 \n",
"2 0.0 0.0 0.0 0.0 0.0 0.00 \n",
"3 0.0 0.0 0.0 0.0 0.0 0.00 \n",
"4 0.0 0.0 0.0 0.0 0.0 0.00 \n",
"... ... ... ... ... ... ... \n",
"1958 0.0 0.0 0.0 0.0 0.0 0.00 \n",
"1959 0.0 0.0 0.0 0.0 0.0 4.68 \n",
"1960 0.0 0.0 0.0 0.0 0.0 16.00 \n",
"1961 0.0 0.0 0.0 0.0 0.0 1.59 \n",
"1962 0.0 0.0 0.0 0.0 0.0 0.99 \n",
"\n",
" S Donnée calculée Données standardisées \n",
"0 28.0 0.00 28.0 \n",
"1 29.0 0.00 29.0 \n",
"2 0.0 0.00 0.0 \n",
"3 57.0 0.00 57.0 \n",
"4 2.0 0.00 2.0 \n",
"... ... ... ... \n",
"1958 67.0 0.00 67.0 \n",
"1959 0.0 4.68 0.0 \n",
"1960 0.0 16.00 0.0 \n",
"1961 0.0 1.59 0.0 \n",
"1962 0.0 0.99 0.0 \n",
"\n",
"[1963 rows x 26 columns]"
]
},
"execution_count": 191,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df"
]
},
{
"cell_type": "code",
"execution_count": 192,
"id": "5ca88399-22fb-48a9-b504-f2baef121b36",
"metadata": {},
"outputs": [],
"source": [
"df = df.drop(columns=['Value_food'],axis=1)"
]
},
{
"cell_type": "markdown",
"id": "3ae6b0e2-a676-4993-a979-77b0e2410ab8",
"metadata": {},
"source": [
"### Group by Zone to aggregate all columns"
]
},
{
"cell_type": "code",
"execution_count": 193,
"id": "34a49b5e-267c-420f-8b90-a7cdd62d7790",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"{'Value_population': 'mean', 'Value_gdp': 'mean', 'Value_politicstab': 'mean', 'Alimentation pour touristes': 'sum', 'Aliments pour animaux': 'sum', 'Autres utilisations (non alimentaire)': 'sum', 'Disponibilité alimentaire (Kcal/personne/jour)': 'sum', 'Disponibilité alimentaire en quantité (kg/personne/an)': 'sum', 'Disponibilité de matière grasse en quantité (g/personne/jour)': 'sum', 'Disponibilité de protéines en quantité (g/personne/jour)': 'sum', 'Disponibilité intérieure': 'sum', 'Exportations - Quantité': 'sum', 'Importations - Quantité': 'sum', 'Nourriture': 'sum', 'Pertes': 'sum', 'Production': 'sum', 'Résidus': 'sum', 'Semences': 'sum', 'Traitement': 'sum', 'Variation de stock': 'sum', 'Fc': 'sum', 'S': 'sum', 'Donnée calculée': 'sum', 'Données standardisées': 'sum'}\n"
]
}
],
"source": [
"dict = {}\n",
"for column in df.columns[1:]: \n",
" if column.startswith('Value_'+''):\n",
" dict[column] = 'mean'\n",
" else:\n",
" dict[column] = 'sum'\n",
"print(dict)"
]
},
{
"cell_type": "code",
"execution_count": 194,
"id": "98ba1563-1100-4106-9759-32d5055b22c8",
"metadata": {},
"outputs": [],
"source": [
"agg_df = df.groupby('Zone').agg(dict).reset_index()"
]
},
{
"cell_type": "code",
"execution_count": 195,
"id": "e224081b-5929-47f1-910a-e14f0eef5d92",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Zone</th>\n",
" <th>Value_population</th>\n",
" <th>Value_gdp</th>\n",
" <th>Value_politicstab</th>\n",
" <th>Alimentation pour touristes</th>\n",
" <th>Aliments pour animaux</th>\n",
" <th>Autres utilisations (non alimentaire)</th>\n",
" <th>Disponibilité alimentaire (Kcal/personne/jour)</th>\n",
" <th>Disponibilité alimentaire en quantité (kg/personne/an)</th>\n",
" <th>Disponibilité de matière grasse en quantité (g/personne/jour)</th>\n",
" <th>...</th>\n",
" <th>Pertes</th>\n",
" <th>Production</th>\n",
" <th>Résidus</th>\n",
" <th>Semences</th>\n",
" <th>Traitement</th>\n",
" <th>Variation de stock</th>\n",
" <th>Fc</th>\n",
" <th>S</th>\n",
" <th>Donnée calculée</th>\n",
" <th>Données standardisées</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>Afghanistan</td>\n",
" <td>36296.113</td>\n",
" <td>2058.4</td>\n",
" <td>-2.80</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>5.0</td>\n",
" <td>1.53</td>\n",
" <td>0.33</td>\n",
" <td>...</td>\n",
" <td>2.0</td>\n",
" <td>28.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>7.40</td>\n",
" <td>171.0</td>\n",
" <td>7.40</td>\n",
" <td>171.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>Afrique du Sud</td>\n",
" <td>57009.756</td>\n",
" <td>13860.3</td>\n",
" <td>-0.28</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>143.0</td>\n",
" <td>35.69</td>\n",
" <td>9.25</td>\n",
" <td>...</td>\n",
" <td>83.0</td>\n",
" <td>1667.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>202.05</td>\n",
" <td>6480.0</td>\n",
" <td>202.05</td>\n",
" <td>6480.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>Albanie</td>\n",
" <td>2884.169</td>\n",
" <td>12771.0</td>\n",
" <td>0.38</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>85.0</td>\n",
" <td>16.36</td>\n",
" <td>6.45</td>\n",
" <td>...</td>\n",
" <td>0.0</td>\n",
" <td>13.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>4.0</td>\n",
" <td>114.07</td>\n",
" <td>149.0</td>\n",
" <td>114.07</td>\n",
" <td>149.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>Algérie</td>\n",
" <td>41389.189</td>\n",
" <td>11737.4</td>\n",
" <td>-0.92</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>22.0</td>\n",
" <td>6.38</td>\n",
" <td>1.50</td>\n",
" <td>...</td>\n",
" <td>13.0</td>\n",
" <td>275.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>31.85</td>\n",
" <td>831.0</td>\n",
" <td>31.85</td>\n",
" <td>831.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>Allemagne</td>\n",
" <td>82658.409</td>\n",
" <td>53071.5</td>\n",
" <td>0.59</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>71.0</td>\n",
" <td>19.47</td>\n",
" <td>4.16</td>\n",
" <td>...</td>\n",
" <td>0.0</td>\n",
" <td>1514.0</td>\n",
" <td>-38.0</td>\n",
" <td>0.0</td>\n",
" <td>167.0</td>\n",
" <td>-29.0</td>\n",
" <td>102.59</td>\n",
" <td>6450.0</td>\n",
" <td>102.59</td>\n",
" <td>6450.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>...</th>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>159</th>\n",
" <td>Émirats arabes unis</td>\n",
" <td>9487.203</td>\n",
" <td>67183.6</td>\n",
" <td>0.62</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>147.0</td>\n",
" <td>43.47</td>\n",
" <td>9.25</td>\n",
" <td>...</td>\n",
" <td>0.0</td>\n",
" <td>48.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>-26.0</td>\n",
" <td>214.52</td>\n",
" <td>1373.0</td>\n",
" <td>214.52</td>\n",
" <td>1373.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>160</th>\n",
" <td>Équateur</td>\n",
" <td>16785.361</td>\n",
" <td>11617.9</td>\n",
" <td>-0.07</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>83.0</td>\n",
" <td>19.31</td>\n",
" <td>6.35</td>\n",
" <td>...</td>\n",
" <td>17.0</td>\n",
" <td>340.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>-1.0</td>\n",
" <td>114.81</td>\n",
" <td>1021.0</td>\n",
" <td>114.81</td>\n",
" <td>1021.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>161</th>\n",
" <td>États-Unis d'Amérique</td>\n",
" <td>325084.756</td>\n",
" <td>59914.8</td>\n",
" <td>0.29</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>89.0</td>\n",
" <td>219.0</td>\n",
" <td>55.68</td>\n",
" <td>14.83</td>\n",
" <td>...</td>\n",
" <td>0.0</td>\n",
" <td>21914.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>77.0</td>\n",
" <td>80.0</td>\n",
" <td>309.44</td>\n",
" <td>62341.0</td>\n",
" <td>309.44</td>\n",
" <td>62341.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>162</th>\n",
" <td>Éthiopie</td>\n",
" <td>106399.924</td>\n",
" <td>2021.6</td>\n",
" <td>-1.68</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.13</td>\n",
" <td>0.03</td>\n",
" <td>...</td>\n",
" <td>1.0</td>\n",
" <td>14.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.20</td>\n",
" <td>44.0</td>\n",
" <td>0.20</td>\n",
" <td>44.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>163</th>\n",
" <td>Îles Salomon</td>\n",
" <td>636.039</td>\n",
" <td>2663.5</td>\n",
" <td>0.20</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>18.0</td>\n",
" <td>4.45</td>\n",
" <td>1.31</td>\n",
" <td>...</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>3.0</td>\n",
" <td>25.27</td>\n",
" <td>15.0</td>\n",
" <td>25.27</td>\n",
" <td>15.0</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"<p>164 rows × 25 columns</p>\n",
"</div>"
],
"text/plain": [
" Zone Value_population Value_gdp Value_politicstab \\\n",
"0 Afghanistan 36296.113 2058.4 -2.80 \n",
"1 Afrique du Sud 57009.756 13860.3 -0.28 \n",
"2 Albanie 2884.169 12771.0 0.38 \n",
"3 Algérie 41389.189 11737.4 -0.92 \n",
"4 Allemagne 82658.409 53071.5 0.59 \n",
".. ... ... ... ... \n",
"159 Émirats arabes unis 9487.203 67183.6 0.62 \n",
"160 Équateur 16785.361 11617.9 -0.07 \n",
"161 États-Unis d'Amérique 325084.756 59914.8 0.29 \n",
"162 Éthiopie 106399.924 2021.6 -1.68 \n",
"163 Îles Salomon 636.039 2663.5 0.20 \n",
"\n",
" Alimentation pour touristes Aliments pour animaux \\\n",
"0 0.0 0.0 \n",
"1 0.0 0.0 \n",
"2 0.0 0.0 \n",
"3 0.0 0.0 \n",
"4 0.0 0.0 \n",
".. ... ... \n",
"159 0.0 0.0 \n",
"160 0.0 0.0 \n",
"161 0.0 0.0 \n",
"162 0.0 0.0 \n",
"163 0.0 0.0 \n",
"\n",
" Autres utilisations (non alimentaire) \\\n",
"0 0.0 \n",
"1 0.0 \n",
"2 0.0 \n",
"3 0.0 \n",
"4 0.0 \n",
".. ... \n",
"159 0.0 \n",
"160 0.0 \n",
"161 89.0 \n",
"162 0.0 \n",
"163 0.0 \n",
"\n",
" Disponibilité alimentaire (Kcal/personne/jour) \\\n",
"0 5.0 \n",
"1 143.0 \n",
"2 85.0 \n",
"3 22.0 \n",
"4 71.0 \n",
".. ... \n",
"159 147.0 \n",
"160 83.0 \n",
"161 219.0 \n",
"162 0.0 \n",
"163 18.0 \n",
"\n",
" Disponibilité alimentaire en quantité (kg/personne/an) \\\n",
"0 1.53 \n",
"1 35.69 \n",
"2 16.36 \n",
"3 6.38 \n",
"4 19.47 \n",
".. ... \n",
"159 43.47 \n",
"160 19.31 \n",
"161 55.68 \n",
"162 0.13 \n",
"163 4.45 \n",
"\n",
" Disponibilité de matière grasse en quantité (g/personne/jour) ... \\\n",
"0 0.33 ... \n",
"1 9.25 ... \n",
"2 6.45 ... \n",
"3 1.50 ... \n",
"4 4.16 ... \n",
".. ... ... \n",
"159 9.25 ... \n",
"160 6.35 ... \n",
"161 14.83 ... \n",
"162 0.03 ... \n",
"163 1.31 ... \n",
"\n",
" Pertes Production Résidus Semences Traitement Variation de stock \\\n",
"0 2.0 28.0 0.0 0.0 0.0 0.0 \n",
"1 83.0 1667.0 0.0 0.0 0.0 0.0 \n",
"2 0.0 13.0 0.0 0.0 0.0 4.0 \n",
"3 13.0 275.0 0.0 0.0 0.0 0.0 \n",
"4 0.0 1514.0 -38.0 0.0 167.0 -29.0 \n",
".. ... ... ... ... ... ... \n",
"159 0.0 48.0 0.0 0.0 0.0 -26.0 \n",
"160 17.0 340.0 0.0 0.0 0.0 -1.0 \n",
"161 0.0 21914.0 0.0 0.0 77.0 80.0 \n",
"162 1.0 14.0 0.0 0.0 0.0 0.0 \n",
"163 0.0 0.0 0.0 0.0 0.0 3.0 \n",
"\n",
" Fc S Donnée calculée Données standardisées \n",
"0 7.40 171.0 7.40 171.0 \n",
"1 202.05 6480.0 202.05 6480.0 \n",
"2 114.07 149.0 114.07 149.0 \n",
"3 31.85 831.0 31.85 831.0 \n",
"4 102.59 6450.0 102.59 6450.0 \n",
".. ... ... ... ... \n",
"159 214.52 1373.0 214.52 1373.0 \n",
"160 114.81 1021.0 114.81 1021.0 \n",
"161 309.44 62341.0 309.44 62341.0 \n",
"162 0.20 44.0 0.20 44.0 \n",
"163 25.27 15.0 25.27 15.0 \n",
"\n",
"[164 rows x 25 columns]"
]
},
"execution_count": 195,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"agg_df"
]
},
{
"cell_type": "markdown",
"id": "fd0581fc-c849-45a8-bf9f-6d99a97d26e5",
"metadata": {},
"source": [
"### Check which columns include only 1 value and filter out them"
]
},
{
"cell_type": "code",
"execution_count": 196,
"id": "60d7bc05-68e3-4f57-baf5-51f02bed2432",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"Zone 164\n",
"Value_population 164\n",
"Value_gdp 164\n",
"Value_politicstab 136\n",
"Alimentation pour touristes 7\n",
"Aliments pour animaux 1\n",
"Autres utilisations (non alimentaire) 17\n",
"Disponibilité alimentaire (Kcal/personne/jour) 107\n",
"Disponibilité alimentaire en quantité (kg/personne/an) 162\n",
"Disponibilité de matière grasse en quantité (g/personne/jour) 149\n",
"Disponibilité de protéines en quantité (g/personne/jour) 156\n",
"Disponibilité intérieure 130\n",
"Exportations - Quantité 49\n",
"Importations - Quantité 81\n",
"Nourriture 126\n",
"Pertes 29\n",
"Production 119\n",
"Résidus 15\n",
"Semences 1\n",
"Traitement 25\n",
"Variation de stock 54\n",
"Fc 164\n",
"S 147\n",
"Donnée calculée 164\n",
"Données standardisées 147\n",
"dtype: int64"
]
},
"execution_count": 196,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"agg_df.nunique()"
]
},
{
"cell_type": "code",
"execution_count": 197,
"id": "061fa728-32dc-4f85-b81e-767ed255e9c5",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Zone</th>\n",
" <th>Value_population</th>\n",
" <th>Value_gdp</th>\n",
" <th>Value_politicstab</th>\n",
" <th>Alimentation pour touristes</th>\n",
" <th>Aliments pour animaux</th>\n",
" <th>Autres utilisations (non alimentaire)</th>\n",
" <th>Disponibilité alimentaire (Kcal/personne/jour)</th>\n",
" <th>Disponibilité alimentaire en quantité (kg/personne/an)</th>\n",
" <th>Disponibilité de matière grasse en quantité (g/personne/jour)</th>\n",
" <th>...</th>\n",
" <th>Pertes</th>\n",
" <th>Production</th>\n",
" <th>Résidus</th>\n",
" <th>Semences</th>\n",
" <th>Traitement</th>\n",
" <th>Variation de stock</th>\n",
" <th>Fc</th>\n",
" <th>S</th>\n",
" <th>Donnée calculée</th>\n",
" <th>Données standardisées</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>count</th>\n",
" <td>164</td>\n",
" <td>1.640000e+02</td>\n",
" <td>164.000000</td>\n",
" <td>164.000000</td>\n",
" <td>164.000000</td>\n",
" <td>164.0</td>\n",
" <td>164.000000</td>\n",
" <td>164.000000</td>\n",
" <td>164.000000</td>\n",
" <td>164.000000</td>\n",
" <td>...</td>\n",
" <td>164.000000</td>\n",
" <td>164.000000</td>\n",
" <td>164.000000</td>\n",
" <td>164.0</td>\n",
" <td>164.000000</td>\n",
" <td>164.000000</td>\n",
" <td>164.000000</td>\n",
" <td>164.000000</td>\n",
" <td>164.000000</td>\n",
" <td>164.000000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>unique</th>\n",
" <td>164</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>...</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>top</th>\n",
" <td>Afghanistan</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>...</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>freq</th>\n",
" <td>1</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>...</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>mean</th>\n",
" <td>NaN</td>\n",
" <td>3.554589e+04</td>\n",
" <td>20342.903659</td>\n",
" <td>-0.046646</td>\n",
" <td>0.073171</td>\n",
" <td>0.0</td>\n",
" <td>8.719512</td>\n",
" <td>74.054878</td>\n",
" <td>20.066707</td>\n",
" <td>4.861646</td>\n",
" <td>...</td>\n",
" <td>13.646341</td>\n",
" <td>622.573171</td>\n",
" <td>-2.829268</td>\n",
" <td>0.0</td>\n",
" <td>7.365854</td>\n",
" <td>14.286585</td>\n",
" <td>106.067134</td>\n",
" <td>2007.243902</td>\n",
" <td>106.067134</td>\n",
" <td>2007.243902</td>\n",
" </tr>\n",
" <tr>\n",
" <th>std</th>\n",
" <td>NaN</td>\n",
" <td>1.133016e+05</td>\n",
" <td>20803.735703</td>\n",
" <td>0.875888</td>\n",
" <td>1.630089</td>\n",
" <td>0.0</td>\n",
" <td>66.382656</td>\n",
" <td>60.955807</td>\n",
" <td>15.899542</td>\n",
" <td>4.227377</td>\n",
" <td>...</td>\n",
" <td>62.940204</td>\n",
" <td>2125.487715</td>\n",
" <td>13.580756</td>\n",
" <td>0.0</td>\n",
" <td>31.242574</td>\n",
" <td>76.319297</td>\n",
" <td>86.362759</td>\n",
" <td>6061.153566</td>\n",
" <td>86.362759</td>\n",
" <td>6061.153566</td>\n",
" </tr>\n",
" <tr>\n",
" <th>min</th>\n",
" <td>NaN</td>\n",
" <td>5.204500e+01</td>\n",
" <td>912.800000</td>\n",
" <td>-2.800000</td>\n",
" <td>-18.000000</td>\n",
" <td>0.0</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>0.130000</td>\n",
" <td>0.030000</td>\n",
" <td>...</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>-125.000000</td>\n",
" <td>0.0</td>\n",
" <td>0.000000</td>\n",
" <td>-119.000000</td>\n",
" <td>0.200000</td>\n",
" <td>0.000000</td>\n",
" <td>0.200000</td>\n",
" <td>0.000000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>25%</th>\n",
" <td>NaN</td>\n",
" <td>2.874480e+03</td>\n",
" <td>5011.275000</td>\n",
" <td>-0.622500</td>\n",
" <td>0.000000</td>\n",
" <td>0.0</td>\n",
" <td>0.000000</td>\n",
" <td>21.500000</td>\n",
" <td>6.282500</td>\n",
" <td>1.355000</td>\n",
" <td>...</td>\n",
" <td>0.000000</td>\n",
" <td>11.000000</td>\n",
" <td>0.000000</td>\n",
" <td>0.0</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>30.820000</td>\n",
" <td>93.500000</td>\n",
" <td>30.820000</td>\n",
" <td>93.500000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>50%</th>\n",
" <td>NaN</td>\n",
" <td>9.468717e+03</td>\n",
" <td>13265.700000</td>\n",
" <td>0.015000</td>\n",
" <td>0.000000</td>\n",
" <td>0.0</td>\n",
" <td>0.000000</td>\n",
" <td>62.500000</td>\n",
" <td>17.800000</td>\n",
" <td>3.690000</td>\n",
" <td>...</td>\n",
" <td>0.000000</td>\n",
" <td>66.000000</td>\n",
" <td>0.000000</td>\n",
" <td>0.0</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>90.665000</td>\n",
" <td>315.500000</td>\n",
" <td>90.665000</td>\n",
" <td>315.500000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>75%</th>\n",
" <td>NaN</td>\n",
" <td>3.118956e+04</td>\n",
" <td>28880.475000</td>\n",
" <td>0.650000</td>\n",
" <td>0.000000</td>\n",
" <td>0.0</td>\n",
" <td>0.000000</td>\n",
" <td>104.250000</td>\n",
" <td>29.485000</td>\n",
" <td>6.475000</td>\n",
" <td>...</td>\n",
" <td>2.000000</td>\n",
" <td>345.250000</td>\n",
" <td>0.000000</td>\n",
" <td>0.0</td>\n",
" <td>0.000000</td>\n",
" <td>7.250000</td>\n",
" <td>152.065000</td>\n",
" <td>1369.250000</td>\n",
" <td>152.065000</td>\n",
" <td>1369.250000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>max</th>\n",
" <td>NaN</td>\n",
" <td>1.338677e+06</td>\n",
" <td>126144.000000</td>\n",
" <td>1.600000</td>\n",
" <td>5.000000</td>\n",
" <td>0.0</td>\n",
" <td>783.000000</td>\n",
" <td>243.000000</td>\n",
" <td>72.310000</td>\n",
" <td>17.860000</td>\n",
" <td>...</td>\n",
" <td>695.000000</td>\n",
" <td>21914.000000</td>\n",
" <td>0.000000</td>\n",
" <td>0.0</td>\n",
" <td>306.000000</td>\n",
" <td>859.000000</td>\n",
" <td>355.470000</td>\n",
" <td>62341.000000</td>\n",
" <td>355.470000</td>\n",
" <td>62341.000000</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"<p>11 rows × 25 columns</p>\n",
"</div>"
],
"text/plain": [
" Zone Value_population Value_gdp Value_politicstab \\\n",
"count 164 1.640000e+02 164.000000 164.000000 \n",
"unique 164 NaN NaN NaN \n",
"top Afghanistan NaN NaN NaN \n",
"freq 1 NaN NaN NaN \n",
"mean NaN 3.554589e+04 20342.903659 -0.046646 \n",
"std NaN 1.133016e+05 20803.735703 0.875888 \n",
"min NaN 5.204500e+01 912.800000 -2.800000 \n",
"25% NaN 2.874480e+03 5011.275000 -0.622500 \n",
"50% NaN 9.468717e+03 13265.700000 0.015000 \n",
"75% NaN 3.118956e+04 28880.475000 0.650000 \n",
"max NaN 1.338677e+06 126144.000000 1.600000 \n",
"\n",
" Alimentation pour touristes Aliments pour animaux \\\n",
"count 164.000000 164.0 \n",
"unique NaN NaN \n",
"top NaN NaN \n",
"freq NaN NaN \n",
"mean 0.073171 0.0 \n",
"std 1.630089 0.0 \n",
"min -18.000000 0.0 \n",
"25% 0.000000 0.0 \n",
"50% 0.000000 0.0 \n",
"75% 0.000000 0.0 \n",
"max 5.000000 0.0 \n",
"\n",
" Autres utilisations (non alimentaire) \\\n",
"count 164.000000 \n",
"unique NaN \n",
"top NaN \n",
"freq NaN \n",
"mean 8.719512 \n",
"std 66.382656 \n",
"min 0.000000 \n",
"25% 0.000000 \n",
"50% 0.000000 \n",
"75% 0.000000 \n",
"max 783.000000 \n",
"\n",
" Disponibilité alimentaire (Kcal/personne/jour) \\\n",
"count 164.000000 \n",
"unique NaN \n",
"top NaN \n",
"freq NaN \n",
"mean 74.054878 \n",
"std 60.955807 \n",
"min 0.000000 \n",
"25% 21.500000 \n",
"50% 62.500000 \n",
"75% 104.250000 \n",
"max 243.000000 \n",
"\n",
" Disponibilité alimentaire en quantité (kg/personne/an) \\\n",
"count 164.000000 \n",
"unique NaN \n",
"top NaN \n",
"freq NaN \n",
"mean 20.066707 \n",
"std 15.899542 \n",
"min 0.130000 \n",
"25% 6.282500 \n",
"50% 17.800000 \n",
"75% 29.485000 \n",
"max 72.310000 \n",
"\n",
" Disponibilité de matière grasse en quantité (g/personne/jour) ... \\\n",
"count 164.000000 ... \n",
"unique NaN ... \n",
"top NaN ... \n",
"freq NaN ... \n",
"mean 4.861646 ... \n",
"std 4.227377 ... \n",
"min 0.030000 ... \n",
"25% 1.355000 ... \n",
"50% 3.690000 ... \n",
"75% 6.475000 ... \n",
"max 17.860000 ... \n",
"\n",
" Pertes Production Résidus Semences Traitement \\\n",
"count 164.000000 164.000000 164.000000 164.0 164.000000 \n",
"unique NaN NaN NaN NaN NaN \n",
"top NaN NaN NaN NaN NaN \n",
"freq NaN NaN NaN NaN NaN \n",
"mean 13.646341 622.573171 -2.829268 0.0 7.365854 \n",
"std 62.940204 2125.487715 13.580756 0.0 31.242574 \n",
"min 0.000000 0.000000 -125.000000 0.0 0.000000 \n",
"25% 0.000000 11.000000 0.000000 0.0 0.000000 \n",
"50% 0.000000 66.000000 0.000000 0.0 0.000000 \n",
"75% 2.000000 345.250000 0.000000 0.0 0.000000 \n",
"max 695.000000 21914.000000 0.000000 0.0 306.000000 \n",
"\n",
" Variation de stock Fc S Donnée calculée \\\n",
"count 164.000000 164.000000 164.000000 164.000000 \n",
"unique NaN NaN NaN NaN \n",
"top NaN NaN NaN NaN \n",
"freq NaN NaN NaN NaN \n",
"mean 14.286585 106.067134 2007.243902 106.067134 \n",
"std 76.319297 86.362759 6061.153566 86.362759 \n",
"min -119.000000 0.200000 0.000000 0.200000 \n",
"25% 0.000000 30.820000 93.500000 30.820000 \n",
"50% 0.000000 90.665000 315.500000 90.665000 \n",
"75% 7.250000 152.065000 1369.250000 152.065000 \n",
"max 859.000000 355.470000 62341.000000 355.470000 \n",
"\n",
" Données standardisées \n",
"count 164.000000 \n",
"unique NaN \n",
"top NaN \n",
"freq NaN \n",
"mean 2007.243902 \n",
"std 6061.153566 \n",
"min 0.000000 \n",
"25% 93.500000 \n",
"50% 315.500000 \n",
"75% 1369.250000 \n",
"max 62341.000000 \n",
"\n",
"[11 rows x 25 columns]"
]
},
"execution_count": 197,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"agg_df.describe(include='all')"
]
},
{
"cell_type": "markdown",
"id": "fb96400b-1c3c-4b3d-9b29-53c561bb78e4",
"metadata": {},
"source": [
"### Standardize features with StandardScaler"
]
},
{
"cell_type": "code",
"execution_count": 198,
"id": "b5e67544-c75e-4d56-a46b-fd5643638b4d",
"metadata": {},
"outputs": [],
"source": [
"df = agg_df.set_index('Zone')"
]
},
{
"cell_type": "code",
"execution_count": 199,
"id": "0db13b8d-16d7-49f9-b689-082e1afc15b9",
"metadata": {},
"outputs": [],
"source": [
"X = df.values"
]
},
{
"cell_type": "code",
"execution_count": 200,
"id": "21e941ea-a352-4fb4-90a8-d687091fc382",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([[ 3.62961130e+04, 2.05840000e+03, -2.80000000e+00, ...,\n",
" 1.71000000e+02, 7.40000000e+00, 1.71000000e+02],\n",
" [ 5.70097560e+04, 1.38603000e+04, -2.80000000e-01, ...,\n",
" 6.48000000e+03, 2.02050000e+02, 6.48000000e+03],\n",
" [ 2.88416900e+03, 1.27710000e+04, 3.80000000e-01, ...,\n",
" 1.49000000e+02, 1.14070000e+02, 1.49000000e+02],\n",
" ...,\n",
" [ 3.25084756e+05, 5.99148000e+04, 2.90000000e-01, ...,\n",
" 6.23410000e+04, 3.09440000e+02, 6.23410000e+04],\n",
" [ 1.06399924e+05, 2.02160000e+03, -1.68000000e+00, ...,\n",
" 4.40000000e+01, 2.00000000e-01, 4.40000000e+01],\n",
" [ 6.36039000e+02, 2.66350000e+03, 2.00000000e-01, ...,\n",
" 1.50000000e+01, 2.52700000e+01, 1.50000000e+01]])"
]
},
"execution_count": 200,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"X"
]
},
{
"cell_type": "code",
"execution_count": 201,
"id": "4c06bb9a-c715-4084-aae0-ef153f79e012",
"metadata": {},
"outputs": [],
"source": [
"scale = StandardScaler()\n",
"scaled_X = scale.fit_transform(X)"
]
},
{
"cell_type": "code",
"execution_count": 202,
"id": "b47a1c1a-e887-4035-85c0-d88f0d816751",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([ 0.00664176, -0.88159673, -3.15312822, -0.04502505, 0. ,\n",
" -0.13175457, -1.13633765, -1.16943503, -1.07525907, -1.16744927,\n",
" -0.30465506, -0.22595585, -0.30802752, -0.29503457, -0.18560493,\n",
" -0.28059171, 0.20896729, 0. , -0.23648543, -0.18776827,\n",
" -1.14597232, -0.30388076, -1.14597232, -0.30388076])"
]
},
"execution_count": 202,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"scaled_X[0]"
]
},
{
"cell_type": "markdown",
"id": "f34aa196-5cc8-403e-84fe-e2ca7e9565b9",
"metadata": {},
"source": [
"### Apply PCA to obtain a lower-dimensional space"
]
},
{
"cell_type": "code",
"execution_count": 203,
"id": "3b917c13-8ca1-45a6-bc57-36007d15eb58",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"PCA(n_components=24)"
]
},
"execution_count": 203,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"pca = PCA(n_components = 24)\n",
"pca.fit(scaled_X)"
]
},
{
"cell_type": "code",
"execution_count": 204,
"id": "920026ad-ab12-4c60-a149-d569c7c5939c",
"metadata": {},
"outputs": [],
"source": [
"X_proj = pca.transform(scaled_X)"
]
},
{
"cell_type": "code",
"execution_count": 205,
"id": "30690b20-baf1-42c4-8b6d-4b09d26dc024",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"(164, 24)"
]
},
"execution_count": 205,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"X_proj.shape"
]
},
{
"cell_type": "code",
"execution_count": 206,
"id": "40551316-a1b0-4a92-8511-f8adc1e8c2f8",
"metadata": {},
"outputs": [],
"source": [
"scree = (pca.explained_variance_ratio_*100).round(2)"
]
},
{
"cell_type": "code",
"execution_count": 207,
"id": "ea8301a9-a906-4cd5-aa7d-e015fc6cda61",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([3.636e+01, 2.162e+01, 9.570e+00, 7.680e+00, 5.460e+00, 4.840e+00,\n",
" 4.310e+00, 3.630e+00, 2.380e+00, 1.720e+00, 1.420e+00, 4.900e-01,\n",
" 3.100e-01, 1.900e-01, 3.000e-02, 0.000e+00, 0.000e+00, 0.000e+00,\n",
" 0.000e+00, 0.000e+00, 0.000e+00, 0.000e+00, 0.000e+00, 0.000e+00])"
]
},
"execution_count": 207,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"scree"
]
},
{
"cell_type": "code",
"execution_count": 208,
"id": "cbc48f02-0a87-43d1-863b-57cdbad61228",
"metadata": {},
"outputs": [],
"source": [
"scree_cum = scree.cumsum().round()"
]
},
{
"cell_type": "code",
"execution_count": 209,
"id": "0c48da62-2b35-4d8f-a924-0b84f9d5288c",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([ 36., 58., 68., 75., 81., 86., 90., 93., 96., 98., 99.,\n",
" 99., 100., 100., 100., 100., 100., 100., 100., 100., 100., 100.,\n",
" 100., 100.])"
]
},
"execution_count": 209,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"scree_cum"
]
},
{
"cell_type": "code",
"execution_count": 210,
"id": "ec12a9c2-e191-4a9d-924b-70c182fd98ad",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[1,\n",
" 2,\n",
" 3,\n",
" 4,\n",
" 5,\n",
" 6,\n",
" 7,\n",
" 8,\n",
" 9,\n",
" 10,\n",
" 11,\n",
" 12,\n",
" 13,\n",
" 14,\n",
" 15,\n",
" 16,\n",
" 17,\n",
" 18,\n",
" 19,\n",
" 20,\n",
" 21,\n",
" 22,\n",
" 23,\n",
" 24]"
]
},
"execution_count": 210,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"x_list = range(1,25)\n",
"list(x_list)"
]
},
{
"cell_type": "code",
"execution_count": 211,
"id": "1aac7a4f-0082-4efc-93c0-bbe6845d796d",
"metadata": {},
"outputs": [
{
"data": {
"image/png": "",
"text/plain": [
"<Figure size 800x550 with 1 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
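"plt.bar(x_list, scree)  # variance explained by each component (%)\n",
"plt.plot(x_list, scree_cum, c='red', marker='o')  # cumulative variance explained (%)\n",
"plt.xlabel('Principal component')\n",
"plt.ylabel('Explained variance (%)')\n",
"plt.show()"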
]
},
{
"cell_type": "code",
"execution_count": 212,
"id": "7eae730a-4b66-4947-be4e-c8f7f5e94904",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"PCA(n_components=9)"
]
},
"execution_count": 212,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"pca = PCA(n_components=9)  # per scree_cum, the first 9 components explain about 96% of the variance\n",
"pca.fit(scaled_X)"
]
},
{
"cell_type": "code",
"execution_count": 213,
"id": "2639018f-d53d-46ab-a934-65bcd3b67a81",
"metadata": {},
"outputs": [],
"source": [
"X_proj = pca.transform(scaled_X)"
]
},
{
"cell_type": "markdown",
"id": "a30963da-23ae-4652-a5b1-cea2847fd2ea",
"metadata": {},
"source": [
"### Cluster the dataset with KMeans"
]
},
{
"cell_type": "code",
"execution_count": 214,
"id": "09273bbd-b526-4584-bc47-fdf16f663165",
"metadata": {},
"outputs": [],
"source": [
"k_list = range(1, 10)"
]
},
{
"cell_type": "code",
"execution_count": 215,
"id": "84aed05a-4cc8-4255-8ced-e2b9571c7a5d",
"metadata": {},
"outputs": [
],
"source": [
"intertia = []\n",
"for i in k_list:\n",
"    kmean = KMeans(n_clusters=i)  # KMeans model with i clusters\n",
"    kmean.fit(X_proj)  # fit on the PCA-projected data\n",
"    intertia.append(kmean.inertia_)  # store the inertia (within-cluster sum of squared distances) for this k"
]
},
{
"cell_type": "code",
"execution_count": 216,
"id": "9307bad7-1601-4fcd-9f77-f7dcfa05d853",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[3458.325754304632,\n",
" 2608.0749444315375,\n",
" 1909.1665724829008,\n",
" 1637.218276364663,\n",
" 1416.6284947582112,\n",
" 1179.984493100464,\n",
" 976.489129754445,\n",
" 814.188467706743,\n",
" 685.5079142382809]"
]
},
"execution_count": 216,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"intertia"
]
},
{
"cell_type": "code",
"execution_count": 217,
"id": "2836c93f-efa7-4b66-972f-44a8e08dfc09",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"<Axes: >"
]
},
"execution_count": 217,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"image/png": "",
"text/plain": [
"<Figure size 1000x500 with 1 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"fig, ax = plt.subplots(figsize=(10, 5))\n",
"sns.lineplot(x=k_list, y=intertia, ax=ax)"
]
},
{
"cell_type": "code",
"execution_count": 218,
"id": "75c95b51-1794-4ac8-ac05-e797f2ec980d",
"metadata": {},
"outputs": [
{
"data": {
"image/png": "",
"text/plain": [
"<Figure size 800x550 with 2 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/plain": [
"<Axes: title={'center': 'Distortion Score Elbow for KMeans Clustering'}, xlabel='k', ylabel='distortion score'>"
]
},
"execution_count": 218,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"from yellowbrick.cluster import KElbowVisualizer\n",
"visualizer = KElbowVisualizer(kmean, k = (1, 10))\n",
"visualizer.fit(X_proj)\n",
"visualizer.show()"
]
},
{
"cell_type": "code",
"execution_count": 219,
"id": "ed550bb8-7ff4-42d1-8f4a-4bb1b73e1e5f",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"KMeans(n_clusters=3)"
]
},
"execution_count": 219,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"kmean = KMeans(n_clusters=3)  # final KMeans model with 3 clusters\n",
"kmean.fit(X_proj)"
]
},
{
"cell_type": "code",
"execution_count": 220,
"id": "7df0e54c-5bef-41ea-9bc3-dd96c10b0ede",
"metadata": {},
"outputs": [],
"source": [
"klabels = kmean.labels_"
]
},
{
"cell_type": "code",
"execution_count": 221,
"id": "eee59f23-8742-442d-9dc3-63291c411bdc",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([-3.02936038, 3.07659364, -0.46226791, -2.06578191, 2.4575722 ,\n",
" -1.47146495, 3.68980558, 3.0897495 , 3.90156962, -1.24908747,\n",
" 3.91770563, -0.14484444, -1.48054556, 2.56272868, -2.5673846 ,\n",
" 2.75451652, -0.13146576, 0.16701805, 1.78268232, -1.12410773,\n",
" -2.28669468, 13.46080219, -0.11450568, -2.73469626, 0.42190617,\n",
" -1.5128625 , -0.91975893, -2.52462097, -2.62708021, 3.22229006,\n",
" 2.29404206, 5.89222855, 2.31238783, 0.10834973, 1.56636861,\n",
" -0.67787902, 0.05190084, -1.41986302, -2.69829382, 1.18951674,\n",
" -2.68930641, 1.42205586, -0.69051563, 1.99031091, -0.11490319,\n",
" -2.21857036, 0.11517501, -0.25627658, 2.2983697 , 3.95944044,\n",
" 0.23612354, -2.51693289, -2.05916141, 2.59740593, -0.79223437,\n",
" -0.51862329, -2.50665674, -2.7321229 , 1.71589553, -1.25590132,\n",
" -1.95131174, -0.228983 , 0.93310373, 0.82843516, 0.39594374,\n",
" 1.48072645, -0.95232048, 1.03457934, 0.80823989, 4.71329676,\n",
" 0.5841687 , 2.44444706, 2.69383757, 0.20932233, -0.54550685,\n",
" -2.89809497, -2.54717706, -0.92331003, 2.29427083, -2.0907511 ,\n",
" -0.42606615, -1.82312331, -1.79497792, 0.44343518, 0.182322 ,\n",
" -0.76962891, -2.4668857 , 2.95280396, -2.30114596, -1.44958605,\n",
" -2.81875509, 0.62294486, -0.24990568, 1.18665245, -2.52444926,\n",
" 4.19126303, -2.42020627, -1.12406717, -2.51188963, 1.48818937,\n",
" -1.52045008, -0.20837738, -2.93952664, -2.78807641, -0.2053925 ,\n",
" 1.77581023, -2.69461064, -0.22523033, -2.73882135, -2.67493841,\n",
" -1.5377356 , 0.97384312, -2.13770177, 2.05858668, -0.31385708,\n",
" 2.90651655, 1.10352636, 0.49193557, -0.21136464, 3.52841316,\n",
" -2.70526626, -2.91942665, 0.06578984, -1.17728906, 0.99923799,\n",
" -1.6249364 , -2.65970226, 2.95802609, 4.50478199, 3.97988432,\n",
" 3.68321799, -1.88284617, -1.65182539, -2.53549167, -0.9540615 ,\n",
" 0.03470465, -2.8989357 , -1.84751495, -0.42734736, 0.48621991,\n",
" -0.02155223, -2.34630345, -2.41312222, -2.99539805, 0.17203541,\n",
" 0.08419739, -2.43168239, -2.24916622, 2.89364417, -1.08944937,\n",
" -2.33241112, 0.83537001, 0.30627995, -1.60013568, -1.60765685,\n",
" -0.38497733, -2.46107176, -2.4186749 , -0.60887068, 2.59013499,\n",
" -0.29551438, 20.33948129, -3.02387167, -2.34163379])"
]
},
"execution_count": 221,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"X_proj[:,0]"
]
},
{
"cell_type": "code",
"execution_count": 222,
"id": "81c2da59-d265-4037-b38f-450dae681521",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([ 1.98951132e+00, -2.82669685e-01, -7.67745334e-01, 1.22390694e+00,\n",
" 1.42972482e+00, 7.27384692e-01, -4.46173266e+00, -1.59756408e+00,\n",
" -1.16714922e+00, 3.55018702e-02, -2.84719710e+00, -7.31627407e-01,\n",
" 3.86009819e-01, -3.46302194e+00, 1.98659929e+00, -3.70141817e+00,\n",
" 4.30459422e-01, -1.32452118e+00, -1.85363525e+00, -8.00119583e-02,\n",
" 6.16959520e-01, 7.89156733e+00, -7.94056025e-01, 1.38792274e+00,\n",
" -8.38905275e-01, 3.37750337e-01, -5.12583466e-01, 1.03969718e+00,\n",
" 1.39052165e+00, -1.74949153e+00, -1.92181212e+00, -4.14883850e+00,\n",
" -3.25326120e+00, -1.32567054e+00, 1.65242960e-01, -3.53915275e-01,\n",
" -1.15073045e+00, 4.73984865e-02, 1.42695341e+00, -1.83269030e+00,\n",
" 1.20842665e+00, -2.60559767e+00, -3.46467672e-01, -1.33485732e-01,\n",
" -1.11796474e+00, 7.43656521e-01, -1.36770718e+00, -8.70739583e-01,\n",
" 7.79200068e-01, 2.64148899e+00, -1.23839881e+00, 9.89398697e-01,\n",
" 9.51785904e-01, -3.58591081e+00, -8.82152984e-05, -4.41687078e-02,\n",
" 1.15349940e+00, 1.21615559e+00, -2.60547029e+00, 4.98922663e-02,\n",
" 7.51474791e-01, -6.12930886e-01, -1.12516895e+00, 8.60727055e+00,\n",
" 4.39736651e+00, 8.55383650e-01, 9.85935717e-01, -1.75656303e+00,\n",
" -2.06503493e+00, -4.09990320e+00, 5.49478298e-01, -3.12923003e+00,\n",
" 1.59358082e+00, -8.96759997e-01, -1.59072538e-01, 1.63077752e+00,\n",
" 1.10472349e+00, -5.47760397e-01, -2.82516712e+00, 6.41046305e-01,\n",
" -7.76267283e-01, 7.25112293e-01, 4.86030140e-01, -1.42741583e+00,\n",
" -1.49323400e+00, -4.33204034e-01, 1.16610981e+00, -7.11517160e-01,\n",
" 1.03818304e+00, -4.16593838e-02, 1.60162880e+00, -1.89022112e+00,\n",
" 1.94575264e-01, -2.30218160e+00, 1.09942498e+00, 2.14481602e+00,\n",
" 7.71673117e-01, -2.12878562e-01, 1.38092507e+00, 2.07669319e-01,\n",
" 7.20827715e-02, -7.92883606e-01, 1.58474497e+00, 2.30674005e+00,\n",
" -1.02336351e+00, -2.55464869e+00, 1.34936339e+00, -8.69050666e-01,\n",
" 1.42844733e+00, 1.20462417e+00, 2.95466025e+00, -1.72533110e+00,\n",
" 7.05351658e-01, 4.47585706e-01, 1.83536098e+00, 1.82546253e-01,\n",
" -1.63625703e+00, 2.08524863e+00, -1.92155890e-01, 3.06866827e-01,\n",
" 1.15187916e+00, 1.61333565e+00, 1.96699569e-01, -3.94242212e-02,\n",
" -1.58334451e+00, 1.43796917e-01, 1.46062601e+00, -3.83047145e+00,\n",
" -5.17409022e+00, -4.71601962e+00, -4.52908637e+00, 3.78386299e-01,\n",
" 3.56722817e-01, 1.03256412e+00, -2.52495468e-01, -1.20162937e+00,\n",
" 1.77979660e+00, 8.32256037e-01, -7.71953248e-01, -1.57998800e+00,\n",
" -5.31553125e-01, 9.62372749e-01, 1.04547928e+00, 1.59460145e+00,\n",
" -9.54068587e-01, 1.61794985e+00, 8.82397751e-01, 9.45474394e-01,\n",
" -3.58389202e+00, 1.97152540e-01, 8.69789354e-01, 1.61967454e+00,\n",
" 8.36748839e-01, 8.35762317e-02, 6.93314037e-02, 1.18846476e+00,\n",
" 1.01196046e+00, 1.15017449e+00, 1.49679402e+00, -2.65593762e+00,\n",
" -3.82440147e-01, 1.32789253e+01, 1.94626508e+00, 7.75882531e-01])"
]
},
"execution_count": 222,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"X_proj[:,1]"
]
},
{
"cell_type": "code",
"execution_count": 223,
"id": "b383dc9e-f57a-4215-b296-00803e0665e4",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"<matplotlib.legend.Legend at 0x1892d962280>"
]
},
"execution_count": 223,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"image/png": "",
"text/plain": [
"<Figure size 800x550 with 1 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"sns.scatterplot(x=X_proj[:, 0], y=X_proj[:, 1], hue=klabels)\n",
"plt.legend(bbox_to_anchor=(1, 1), loc='upper left')"
]
},
{
"cell_type": "markdown",
"id": "b4871e0c-de98-4a73-9994-f216e3a2195f",
"metadata": {},
"source": [
"### Cluster dataset by AgglomerativeClustering"
]
},
{
"cell_type": "code",
"execution_count": 224,
"id": "2c381c65-7fe6-4a15-ba8e-f21306284bb2",
"metadata": {},
"outputs": [
{
"data": {
"image/png": "",
"text/plain": [
"<Figure size 1200x800 with 1 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"Z = linkage(X_proj, method='ward')\n",
"fig, ax = plt.subplots(1,1,figsize=(12,8))\n",
"_ = dendrogram(Z, truncate_mode='lastp', ax=ax)\n",
"plt.show()"
]
},
{
"cell_type": "code",
"execution_count": 225,
"id": "ddfdcbe9-9e06-4129-98da-4d4787f1b606",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<style>#sk-container-id-9 {color: black;}#sk-container-id-9 pre{padding: 0;}#sk-container-id-9 div.sk-toggleable {background-color: white;}#sk-container-id-9 label.sk-toggleable__label {cursor: pointer;display: block;width: 100%;margin-bottom: 0;padding: 0.3em;box-sizing: border-box;text-align: center;}#sk-container-id-9 label.sk-toggleable__label-arrow:before {content: \"▸\";float: left;margin-right: 0.25em;color: #696969;}#sk-container-id-9 label.sk-toggleable__label-arrow:hover:before {color: black;}#sk-container-id-9 div.sk-estimator:hover label.sk-toggleable__label-arrow:before {color: black;}#sk-container-id-9 div.sk-toggleable__content {max-height: 0;max-width: 0;overflow: hidden;text-align: left;background-color: #f0f8ff;}#sk-container-id-9 div.sk-toggleable__content pre {margin: 0.2em;color: black;border-radius: 0.25em;background-color: #f0f8ff;}#sk-container-id-9 input.sk-toggleable__control:checked~div.sk-toggleable__content {max-height: 200px;max-width: 100%;overflow: auto;}#sk-container-id-9 input.sk-toggleable__control:checked~label.sk-toggleable__label-arrow:before {content: \"▾\";}#sk-container-id-9 div.sk-estimator input.sk-toggleable__control:checked~label.sk-toggleable__label {background-color: #d4ebff;}#sk-container-id-9 div.sk-label input.sk-toggleable__control:checked~label.sk-toggleable__label {background-color: #d4ebff;}#sk-container-id-9 input.sk-hidden--visually {border: 0;clip: rect(1px 1px 1px 1px);clip: rect(1px, 1px, 1px, 1px);height: 1px;margin: -1px;overflow: hidden;padding: 0;position: absolute;width: 1px;}#sk-container-id-9 div.sk-estimator {font-family: monospace;background-color: #f0f8ff;border: 1px dotted black;border-radius: 0.25em;box-sizing: border-box;margin-bottom: 0.5em;}#sk-container-id-9 div.sk-estimator:hover {background-color: #d4ebff;}#sk-container-id-9 div.sk-parallel-item::after {content: \"\";width: 100%;border-bottom: 1px solid gray;flex-grow: 1;}#sk-container-id-9 div.sk-label:hover label.sk-toggleable__label 
{background-color: #d4ebff;}#sk-container-id-9 div.sk-serial::before {content: \"\";position: absolute;border-left: 1px solid gray;box-sizing: border-box;top: 0;bottom: 0;left: 50%;z-index: 0;}#sk-container-id-9 div.sk-serial {display: flex;flex-direction: column;align-items: center;background-color: white;padding-right: 0.2em;padding-left: 0.2em;position: relative;}#sk-container-id-9 div.sk-item {position: relative;z-index: 1;}#sk-container-id-9 div.sk-parallel {display: flex;align-items: stretch;justify-content: center;background-color: white;position: relative;}#sk-container-id-9 div.sk-item::before, #sk-container-id-9 div.sk-parallel-item::before {content: \"\";position: absolute;border-left: 1px solid gray;box-sizing: border-box;top: 0;bottom: 0;left: 50%;z-index: -1;}#sk-container-id-9 div.sk-parallel-item {display: flex;flex-direction: column;z-index: 1;position: relative;background-color: white;}#sk-container-id-9 div.sk-parallel-item:first-child::after {align-self: flex-end;width: 50%;}#sk-container-id-9 div.sk-parallel-item:last-child::after {align-self: flex-start;width: 50%;}#sk-container-id-9 div.sk-parallel-item:only-child::after {width: 0;}#sk-container-id-9 div.sk-dashed-wrapped {border: 1px dashed gray;margin: 0 0.4em 0.5em 0.4em;box-sizing: border-box;padding-bottom: 0.4em;background-color: white;}#sk-container-id-9 div.sk-label label {font-family: monospace;font-weight: bold;display: inline-block;line-height: 1.2em;}#sk-container-id-9 div.sk-label-container {text-align: center;}#sk-container-id-9 div.sk-container {/* jupyter's `normalize.less` sets `[hidden] { display: none; }` but bootstrap.min.css set `[hidden] { display: none !important; }` so we also need the `!important` here to be able to override the default hidden behavior on the sphinx rendered scikit-learn.org. 
See: https://github.com/scikit-learn/scikit-learn/issues/21755 */display: inline-block !important;position: relative;}#sk-container-id-9 div.sk-text-repr-fallback {display: none;}</style><div id=\"sk-container-id-9\" class=\"sk-top-container\"><div class=\"sk-text-repr-fallback\"><pre>AgglomerativeClustering(n_clusters=3)</pre><b>In a Jupyter environment, please rerun this cell to show the HTML representation or trust the notebook. <br />On GitHub, the HTML representation is unable to render, please try loading this page with nbviewer.org.</b></div><div class=\"sk-container\" hidden><div class=\"sk-item\"><div class=\"sk-estimator sk-toggleable\"><input class=\"sk-toggleable__control sk-hidden--visually\" id=\"sk-estimator-id-9\" type=\"checkbox\" checked><label for=\"sk-estimator-id-9\" class=\"sk-toggleable__label sk-toggleable__label-arrow\">AgglomerativeClustering</label><div class=\"sk-toggleable__content\"><pre>AgglomerativeClustering(n_clusters=3)</pre></div></div></div></div></div>"
],
"text/plain": [
"AgglomerativeClustering(n_clusters=3)"
]
},
"execution_count": 225,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"cah = AgglomerativeClustering(n_clusters=3, linkage='ward')\n",
"cah.fit(X_proj)"
]
},
{
"cell_type": "code",
"execution_count": 226,
"id": "b267cd7d-4670-414e-8ecc-7bc584e24304",
"metadata": {},
"outputs": [],
"source": [
"cah_label = cah.labels_"
]
},
{
"cell_type": "code",
"execution_count": 227,
"id": "72158a42-eefa-4ab3-a479-00c9766b8bb6",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"<matplotlib.legend.Legend at 0x1892d95c700>"
]
},
"execution_count": 227,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"image/png": "",
"text/plain": [
"<Figure size 800x550 with 1 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"sns.scatterplot(x=X_proj[:, 0], y=X_proj[:, 1], hue=cah_label)\n",
"plt.legend(bbox_to_anchor=(1, 1), loc='upper left')"
]
},
{
"cell_type": "markdown",
"id": "f2585cb0-fea8-4271-8fb8-fdd7a74058c7",
"metadata": {},
"source": [
"### Add the clusters to dataframe"
]
},
{
"cell_type": "code",
"execution_count": 228,
"id": "22e2e58f-0519-4ff7-b50c-a07f3e4f7a30",
"metadata": {},
"outputs": [],
"source": [
"agg_df['kmean'] = klabels"
]
},
{
"cell_type": "code",
"execution_count": 229,
"id": "480bed93-69ae-4998-b713-b8dc363a104a",
"metadata": {},
"outputs": [],
"source": [
"agg_df['linkage'] = cah_label"
]
},
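{
"cell_type": "markdown",
"id": "sketch-compare-labels-note",
"metadata": {},
"source": [
"A possible check, added as a sketch: K-means and hierarchical clustering number their clusters arbitrarily, so label `0` in the `kmean` column need not correspond to label `0` in the `linkage` column. A contingency table and the adjusted Rand index (1.0 means identical partitions) give a rough idea of how far the two partitions agree."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "sketch-compare-labels-code",
"metadata": {},
"outputs": [],
"source": [
"from sklearn.metrics import adjusted_rand_score\n",
"\n",
"# Rows: K-means labels, columns: agglomerative (Ward) labels\n",
"print(pd.crosstab(agg_df['kmean'], agg_df['linkage']))\n",
"\n",
"# Agreement between the two partitions, independent of how the labels are numbered\n",
"print(adjusted_rand_score(agg_df['kmean'], agg_df['linkage']))"
]
},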
{
"cell_type": "code",
"execution_count": 230,
"id": "7f3abca1-6ebf-4745-9fdb-bf3416ce4de1",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Zone</th>\n",
" <th>Value_population</th>\n",
" <th>Value_gdp</th>\n",
" <th>Value_politicstab</th>\n",
" <th>Alimentation pour touristes</th>\n",
" <th>Aliments pour animaux</th>\n",
" <th>Autres utilisations (non alimentaire)</th>\n",
" <th>Disponibilité alimentaire (Kcal/personne/jour)</th>\n",
" <th>Disponibilité alimentaire en quantité (kg/personne/an)</th>\n",
" <th>Disponibilité de matière grasse en quantité (g/personne/jour)</th>\n",
" <th>...</th>\n",
" <th>Résidus</th>\n",
" <th>Semences</th>\n",
" <th>Traitement</th>\n",
" <th>Variation de stock</th>\n",
" <th>Fc</th>\n",
" <th>S</th>\n",
" <th>Donnée calculée</th>\n",
" <th>Données standardisées</th>\n",
" <th>kmean</th>\n",
" <th>linkage</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>Afghanistan</td>\n",
" <td>36296.113</td>\n",
" <td>2058.4</td>\n",
" <td>-2.80</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>5.0</td>\n",
" <td>1.53</td>\n",
" <td>0.33</td>\n",
" <td>...</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>7.40</td>\n",
" <td>171.0</td>\n",
" <td>7.40</td>\n",
" <td>171.0</td>\n",
" <td>0</td>\n",
" <td>2</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>Afrique du Sud</td>\n",
" <td>57009.756</td>\n",
" <td>13860.3</td>\n",
" <td>-0.28</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>143.0</td>\n",
" <td>35.69</td>\n",
" <td>9.25</td>\n",
" <td>...</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>202.05</td>\n",
" <td>6480.0</td>\n",
" <td>202.05</td>\n",
" <td>6480.0</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>Albanie</td>\n",
" <td>2884.169</td>\n",
" <td>12771.0</td>\n",
" <td>0.38</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>85.0</td>\n",
" <td>16.36</td>\n",
" <td>6.45</td>\n",
" <td>...</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>4.0</td>\n",
" <td>114.07</td>\n",
" <td>149.0</td>\n",
" <td>114.07</td>\n",
" <td>149.0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>Algérie</td>\n",
" <td>41389.189</td>\n",
" <td>11737.4</td>\n",
" <td>-0.92</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>22.0</td>\n",
" <td>6.38</td>\n",
" <td>1.50</td>\n",
" <td>...</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>31.85</td>\n",
" <td>831.0</td>\n",
" <td>31.85</td>\n",
" <td>831.0</td>\n",
" <td>0</td>\n",
" <td>2</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>Allemagne</td>\n",
" <td>82658.409</td>\n",
" <td>53071.5</td>\n",
" <td>0.59</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>71.0</td>\n",
" <td>19.47</td>\n",
" <td>4.16</td>\n",
" <td>...</td>\n",
" <td>-38.0</td>\n",
" <td>0.0</td>\n",
" <td>167.0</td>\n",
" <td>-29.0</td>\n",
" <td>102.59</td>\n",
" <td>6450.0</td>\n",
" <td>102.59</td>\n",
" <td>6450.0</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>...</th>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>159</th>\n",
" <td>Émirats arabes unis</td>\n",
" <td>9487.203</td>\n",
" <td>67183.6</td>\n",
" <td>0.62</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>147.0</td>\n",
" <td>43.47</td>\n",
" <td>9.25</td>\n",
" <td>...</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>-26.0</td>\n",
" <td>214.52</td>\n",
" <td>1373.0</td>\n",
" <td>214.52</td>\n",
" <td>1373.0</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>160</th>\n",
" <td>Équateur</td>\n",
" <td>16785.361</td>\n",
" <td>11617.9</td>\n",
" <td>-0.07</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>83.0</td>\n",
" <td>19.31</td>\n",
" <td>6.35</td>\n",
" <td>...</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>-1.0</td>\n",
" <td>114.81</td>\n",
" <td>1021.0</td>\n",
" <td>114.81</td>\n",
" <td>1021.0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>161</th>\n",
" <td>États-Unis d'Amérique</td>\n",
" <td>325084.756</td>\n",
" <td>59914.8</td>\n",
" <td>0.29</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>89.0</td>\n",
" <td>219.0</td>\n",
" <td>55.68</td>\n",
" <td>14.83</td>\n",
" <td>...</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>77.0</td>\n",
" <td>80.0</td>\n",
" <td>309.44</td>\n",
" <td>62341.0</td>\n",
" <td>309.44</td>\n",
" <td>62341.0</td>\n",
" <td>2</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>162</th>\n",
" <td>Éthiopie</td>\n",
" <td>106399.924</td>\n",
" <td>2021.6</td>\n",
" <td>-1.68</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.13</td>\n",
" <td>0.03</td>\n",
" <td>...</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.20</td>\n",
" <td>44.0</td>\n",
" <td>0.20</td>\n",
" <td>44.0</td>\n",
" <td>0</td>\n",
" <td>2</td>\n",
" </tr>\n",
" <tr>\n",
" <th>163</th>\n",
" <td>Îles Salomon</td>\n",
" <td>636.039</td>\n",
" <td>2663.5</td>\n",
" <td>0.20</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>18.0</td>\n",
" <td>4.45</td>\n",
" <td>1.31</td>\n",
" <td>...</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>3.0</td>\n",
" <td>25.27</td>\n",
" <td>15.0</td>\n",
" <td>25.27</td>\n",
" <td>15.0</td>\n",
" <td>0</td>\n",
" <td>2</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"<p>164 rows × 27 columns</p>\n",
"</div>"
],
"text/plain": [
" Zone Value_population Value_gdp Value_politicstab \\\n",
"0 Afghanistan 36296.113 2058.4 -2.80 \n",
"1 Afrique du Sud 57009.756 13860.3 -0.28 \n",
"2 Albanie 2884.169 12771.0 0.38 \n",
"3 Algérie 41389.189 11737.4 -0.92 \n",
"4 Allemagne 82658.409 53071.5 0.59 \n",
".. ... ... ... ... \n",
"159 Émirats arabes unis 9487.203 67183.6 0.62 \n",
"160 Équateur 16785.361 11617.9 -0.07 \n",
"161 États-Unis d'Amérique 325084.756 59914.8 0.29 \n",
"162 Éthiopie 106399.924 2021.6 -1.68 \n",
"163 Îles Salomon 636.039 2663.5 0.20 \n",
"\n",
" Alimentation pour touristes Aliments pour animaux \\\n",
"0 0.0 0.0 \n",
"1 0.0 0.0 \n",
"2 0.0 0.0 \n",
"3 0.0 0.0 \n",
"4 0.0 0.0 \n",
".. ... ... \n",
"159 0.0 0.0 \n",
"160 0.0 0.0 \n",
"161 0.0 0.0 \n",
"162 0.0 0.0 \n",
"163 0.0 0.0 \n",
"\n",
" Autres utilisations (non alimentaire) \\\n",
"0 0.0 \n",
"1 0.0 \n",
"2 0.0 \n",
"3 0.0 \n",
"4 0.0 \n",
".. ... \n",
"159 0.0 \n",
"160 0.0 \n",
"161 89.0 \n",
"162 0.0 \n",
"163 0.0 \n",
"\n",
" Disponibilité alimentaire (Kcal/personne/jour) \\\n",
"0 5.0 \n",
"1 143.0 \n",
"2 85.0 \n",
"3 22.0 \n",
"4 71.0 \n",
".. ... \n",
"159 147.0 \n",
"160 83.0 \n",
"161 219.0 \n",
"162 0.0 \n",
"163 18.0 \n",
"\n",
" Disponibilité alimentaire en quantité (kg/personne/an) \\\n",
"0 1.53 \n",
"1 35.69 \n",
"2 16.36 \n",
"3 6.38 \n",
"4 19.47 \n",
".. ... \n",
"159 43.47 \n",
"160 19.31 \n",
"161 55.68 \n",
"162 0.13 \n",
"163 4.45 \n",
"\n",
" Disponibilité de matière grasse en quantité (g/personne/jour) ... \\\n",
"0 0.33 ... \n",
"1 9.25 ... \n",
"2 6.45 ... \n",
"3 1.50 ... \n",
"4 4.16 ... \n",
".. ... ... \n",
"159 9.25 ... \n",
"160 6.35 ... \n",
"161 14.83 ... \n",
"162 0.03 ... \n",
"163 1.31 ... \n",
"\n",
" Résidus Semences Traitement Variation de stock Fc S \\\n",
"0 0.0 0.0 0.0 0.0 7.40 171.0 \n",
"1 0.0 0.0 0.0 0.0 202.05 6480.0 \n",
"2 0.0 0.0 0.0 4.0 114.07 149.0 \n",
"3 0.0 0.0 0.0 0.0 31.85 831.0 \n",
"4 -38.0 0.0 167.0 -29.0 102.59 6450.0 \n",
".. ... ... ... ... ... ... \n",
"159 0.0 0.0 0.0 -26.0 214.52 1373.0 \n",
"160 0.0 0.0 0.0 -1.0 114.81 1021.0 \n",
"161 0.0 0.0 77.0 80.0 309.44 62341.0 \n",
"162 0.0 0.0 0.0 0.0 0.20 44.0 \n",
"163 0.0 0.0 0.0 3.0 25.27 15.0 \n",
"\n",
" Donnée calculée Données standardisées kmean linkage \n",
"0 7.40 171.0 0 2 \n",
"1 202.05 6480.0 1 0 \n",
"2 114.07 149.0 0 0 \n",
"3 31.85 831.0 0 2 \n",
"4 102.59 6450.0 1 0 \n",
".. ... ... ... ... \n",
"159 214.52 1373.0 1 0 \n",
"160 114.81 1021.0 0 0 \n",
"161 309.44 62341.0 2 1 \n",
"162 0.20 44.0 0 2 \n",
"163 25.27 15.0 0 2 \n",
"\n",
"[164 rows x 27 columns]"
]
},
"execution_count": 230,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"agg_df"
]
},
{
"cell_type": "code",
"execution_count": 231,
"id": "c38610ad-e573-4be8-a66b-90f3843aef35",
"metadata": {},
"outputs": [],
"source": [
"df_linkage = agg_df.set_index('Zone')"
]
},
{
"cell_type": "code",
"execution_count": 232,
"id": "06fd49ac-10fe-4978-a27a-28303b2182f9",
"metadata": {},
"outputs": [],
"source": [
"df_linkage = df_linkage.groupby('linkage').mean().reset_index()"
]
},
{
"cell_type": "code",
"execution_count": 233,
"id": "e2aeab41-47e5-4c15-9552-1a212757b8b3",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"Index(['linkage', 'Value_population', 'Value_gdp', 'Value_politicstab',\n",
" 'Alimentation pour touristes', 'Aliments pour animaux',\n",
" 'Autres utilisations (non alimentaire)',\n",
" 'Disponibilité alimentaire (Kcal/personne/jour)',\n",
" 'Disponibilité alimentaire en quantité (kg/personne/an)',\n",
" 'Disponibilité de matière grasse en quantité (g/personne/jour)',\n",
" 'Disponibilité de protéines en quantité (g/personne/jour)',\n",
" 'Disponibilité intérieure', 'Exportations - Quantité',\n",
" 'Importations - Quantité', 'Nourriture', 'Pertes', 'Production',\n",
" 'Résidus', 'Semences', 'Traitement', 'Variation de stock', 'Fc', 'S',\n",
" 'Donnée calculée', 'Données standardisées', 'kmean'],\n",
" dtype='object')"
]
},
"execution_count": 233,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df_linkage.columns"
]
},
{
"cell_type": "code",
"execution_count": 234,
"id": "ce342d11-ecc4-40b8-9548-b29ac0a0b37c",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>linkage</th>\n",
" <th>Value_population</th>\n",
" <th>Value_politicstab</th>\n",
" <th>Value_gdp</th>\n",
" <th>Alimentation pour touristes</th>\n",
" <th>Aliments pour animaux</th>\n",
" <th>Autres utilisations (non alimentaire)</th>\n",
" <th>Disponibilité intérieure</th>\n",
" <th>Exportations - Quantité</th>\n",
" <th>Importations - Quantité</th>\n",
" <th>Production</th>\n",
" <th>kmean</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>0</td>\n",
" <td>25720.330372</td>\n",
" <td>0.264787</td>\n",
" <td>29560.870213</td>\n",
" <td>0.074468</td>\n",
" <td>0.0</td>\n",
" <td>5.797872</td>\n",
" <td>615.361702</td>\n",
" <td>98.287234</td>\n",
" <td>134.521277</td>\n",
" <td>603.212766</td>\n",
" <td>0.723404</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>1</td>\n",
" <td>266459.289500</td>\n",
" <td>-0.090000</td>\n",
" <td>37219.700000</td>\n",
" <td>0.000000</td>\n",
" <td>0.0</td>\n",
" <td>44.500000</td>\n",
" <td>14124.000000</td>\n",
" <td>3957.500000</td>\n",
" <td>63.000000</td>\n",
" <td>18057.500000</td>\n",
" <td>2.000000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>2</td>\n",
" <td>42336.708794</td>\n",
" <td>-0.475882</td>\n",
" <td>7104.044118</td>\n",
" <td>0.073529</td>\n",
" <td>0.0</td>\n",
" <td>11.705882</td>\n",
" <td>155.441176</td>\n",
" <td>1.573529</td>\n",
" <td>20.544118</td>\n",
" <td>136.544118</td>\n",
" <td>0.000000</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" linkage Value_population Value_politicstab Value_gdp \\\n",
"0 0 25720.330372 0.264787 29560.870213 \n",
"1 1 266459.289500 -0.090000 37219.700000 \n",
"2 2 42336.708794 -0.475882 7104.044118 \n",
"\n",
" Alimentation pour touristes Aliments pour animaux \\\n",
"0 0.074468 0.0 \n",
"1 0.000000 0.0 \n",
"2 0.073529 0.0 \n",
"\n",
" Autres utilisations (non alimentaire) Disponibilité intérieure \\\n",
"0 5.797872 615.361702 \n",
"1 44.500000 14124.000000 \n",
"2 11.705882 155.441176 \n",
"\n",
" Exportations - Quantité Importations - Quantité Production kmean \n",
"0 98.287234 134.521277 603.212766 0.723404 \n",
"1 3957.500000 63.000000 18057.500000 2.000000 \n",
"2 1.573529 20.544118 136.544118 0.000000 "
]
},
"execution_count": 234,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df_linkage[['linkage', 'Value_population', 'Value_politicstab','Value_gdp','Alimentation pour touristes', 'Aliments pour animaux',\n",
" 'Autres utilisations (non alimentaire)','Disponibilité intérieure','Exportations - Quantité','Importations - Quantité','Production','kmean']]"
]
},
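{
"cell_type": "markdown",
"id": "sketch-shares-note",
"metadata": {},
"source": [
"A small sketch of the arithmetic behind the export and domestic shares: both are taken relative to `Production`, using the cluster means in `df_linkage`. These are ratios of group means, not means of per-country ratios, which is an assumption about how the shares are meant to be read. The names `shares`, `export_share` and `domestic_share` are introduced here for illustration only."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "sketch-shares-code",
"metadata": {},
"outputs": [],
"source": [
"# Share of mean production that is exported vs. available domestically, per cluster\n",
"shares = pd.DataFrame({\n",
"    'linkage': df_linkage['linkage'],\n",
"    'export_share': df_linkage['Exportations - Quantité'] / df_linkage['Production'],\n",
"    'domestic_share': df_linkage['Disponibilité intérieure'] / df_linkage['Production'],\n",
"})\n",
"shares.round(2)"
]
},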
{
"cell_type": "markdown",
"id": "9e03e520-1bc5-4e10-a4d6-bf85f4e17b28",
"metadata": {},
"source": [
"## CONCLUSIONS"
]
},
{
"cell_type": "markdown",
"id": "f7e8e820-773c-4876-93a7-9729a9218597",
"metadata": {},
"source": [
"1. Group 0 does not show potential as a market. Its production is far lower than Group 1's, and its low food-usage indicators suggest limited potential.\n",
"2. Group 2 is not a potential market either. Its production and export quantities are the lowest of the three groups, and its political stability score is negative.\n",
"3. Group 1, on the other hand, shows potential as a market. Its production is high: about 78% of production goes to domestic supply and the remaining 22% is exported, which suggests there is room for additional food imports."
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.8.0"
}
},
"nbformat": 4,
"nbformat_minor": 5
}