Created April 3, 2022 00:50
| { | |
| "cells": [ | |
| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": [ | |
| "# Importing csv data into MySQL database\n", | |
| "I downloaded the 2019 Lahman Baseball database script and used it populate my local database. The only problem is that the script held the data through the 2019 season and I wanted to add the updated data through the 2021 season. \n", | |
| "\n", | |
| "I downloaded the 2021 csv file and attempted to import it. Using the import data tool in MySQL Workbench, I lost most of the data due to NaN and null values. It turnse out that the Workbench import tool ignores any rows with empty spaces in the csv. The process I used in pandas was as follows:\n", | |
| "1. Read the 2021 csv file using `pd.read_csv`.\n", | |
| "2. Filter the resulting dataframe for years > 2019.\n", | |
| "3. Export the updated dataframe to a new csv using `pd.to_csv`.\n", | |
| "\n", | |
| "The problem with this process was that `to_csv` by default exports blank spaces into the csv for any `NaN` values in the dataframe. The `na_rep` argument in the `to_csv` function comes to the rescue. Using `na_rep = 'NULL'` fixes the issue as the Workbench import tool will now import null values and include the expected rows.\n", | |
| "\n", | |
| "The code below offers a simple function and creates new filtered csvs for each file. The only other step is to use the Data Import tool in MySQL Workbench and import each file into an existing table. Now I have all data through the 2021 World Series." | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": 1, | |
| "metadata": {}, | |
| "outputs": [], | |
| "source": [ | |
| "import pandas as pd" | |
| ] | |
| }, | |
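| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": [ | |
| "Before filtering the real files, here is a minimal sketch of what `na_rep` changes (not part of the original workflow; the tiny frame below is made-up data for illustration only): the default export writes empty fields for `NaN`, while `na_rep='NULL'` writes the literal string `NULL` that the Workbench import tool accepts." | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": null, | |
| "metadata": {}, | |
| "outputs": [], | |
| "source": [ | |
| "# Hypothetical two-row frame with one missing value, for illustration only\n", | |
| "demo = pd.DataFrame({'playerID': ['aaaaa01', 'bbbbb02'], 'yearID': [2021, 2021], 'HR': [12, None]})\n", | |
| "\n", | |
| "# Default behaviour: the NaN becomes an empty field in the csv text\n", | |
| "print(demo.to_csv(index=False))\n", | |
| "\n", | |
| "# With na_rep='NULL': the NaN becomes the literal string NULL\n", | |
| "print(demo.to_csv(index=False, na_rep='NULL'))" | |
| ] | |
| }, | |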
| { | |
| "cell_type": "code", | |
| "execution_count": 12, | |
| "metadata": {}, | |
| "outputs": [], | |
| "source": [ | |
| "def filterCSV(filename):\n", | |
| " df = pd.read_csv(filename + '.csv')\n", | |
| " if filename == 'HomeGames':\n", | |
| " df = df[df['year.key'] > 2019]\n", | |
| " else:\n", | |
| " df = df[df['yearID'] > 2019]\n", | |
| " df.to_csv(filename + '2021.csv', na_rep='NULL')" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": 13, | |
| "metadata": {}, | |
| "outputs": [ | |
| { | |
| "name": "stdout", | |
| "output_type": "stream", | |
| "text": [ | |
| "AllstarFull\n", | |
| "Appearances\n", | |
| "Batting\n", | |
| "BattingPost\n", | |
| "Fielding\n", | |
| "FieldingOF\n", | |
| "FieldingOFSplit\n", | |
| "FieldingPost\n", | |
| "HomeGames\n", | |
| "Managers\n", | |
| "ManagersHalf\n", | |
| "Pitching\n", | |
| "PitchingPost\n", | |
| "SeriesPost\n", | |
| "Teams\n", | |
| "TeamsHalf\n" | |
| ] | |
| } | |
| ], | |
| "source": [ | |
| "fileList = ['AllstarFull', 'Appearances','Batting','BattingPost','Fielding','FieldingOF','FieldingOFSplit',\n", | |
| "'FieldingPost','HomeGames','Managers', 'ManagersHalf','Pitching','PitchingPost','SeriesPost',\n", | |
| "'Teams', 'TeamsHalf']\n", | |
| "\n", | |
| "for file in fileList:\n", | |
| " filterCSV(file)\n" | |
| ] | |
| } | |
| ], | |
| "metadata": { | |
| "interpreter": { | |
| "hash": "aee8b7b246df8f9039afb4144a1f6fd8d2ca17a180786b69acc140d282b71a49" | |
| }, | |
| "kernelspec": { | |
| "display_name": "Python 3.9.1 64-bit", | |
| "language": "python", | |
| "name": "python3" | |
| }, | |
| "language_info": { | |
| "codemirror_mode": { | |
| "name": "ipython", | |
| "version": 3 | |
| }, | |
| "file_extension": ".py", | |
| "mimetype": "text/x-python", | |
| "name": "python", | |
| "nbconvert_exporter": "python", | |
| "pygments_lexer": "ipython3", | |
| "version": "3.9.1" | |
| }, | |
| "orig_nbformat": 4 | |
| }, | |
| "nbformat": 4, | |
| "nbformat_minor": 2 | |
| } |