Created
October 5, 2022 01:56
Recommendation System on E-commerce domain 'Amazon Reviews data repository'
| { | |
| "cells": [ | |
| { | |
| "cell_type": "markdown", | |
| "id": "8a82d192", | |
| "metadata": {}, | |
| "source": [ | |
| "# Recommendation System on E-commerce domain 'Amazon Reviews data repository'\n", | |
| "*by Ahmad Abdelaziz*\n", | |
| "\n", | |
| "Background and Context\n", | |
| "\n", | |
| "E-commerce websites such as Amazon and Flipkart use different recommendation models to tailor suggestions to each user. Amazon currently uses item-to-item collaborative filtering, which scales to massive datasets and produces high-quality recommendations in real time.\n", | |
| "\n", | |
| "Objective\n", | |
| "\n", | |
| "Build a recommendation system to recommend products to customers based on their previous ratings for other products. Apply the concepts and techniques you have learned in the previous weeks and summarise your insights at the end.\n", | |
| "\n", | |
| "\n", | |
| "Dataset:\n", | |
| "\n", | |
| "We are using the Electronics dataset from the Amazon Reviews data repository, which has several datasets.\n", | |
| "\n", | |
| "Attribute Information\n", | |
| "\n", | |
| "- `userId`: Unique ID of the user\n", | |
| "- `productId`: Unique ID of the product\n", | |
| "- `rating`: Rating given to the product by the user\n", | |
| "- `timestamp`: Time of the rating (ignored in this exercise)\n", | |
| "\n", | |
| "### Contents:\n", | |
| "\n", | |
| "### A. Data Overview\n", | |
| "\n", | |
| "### B. EDA\n", | |
| "\n", | |
| "### C. Understanding of Data Columns\n", | |
| " - Taking a subset\n", | |
| " - Split the data to train and test\n", | |
| "\n", | |
| "### D. Popularity Recommender model\n", | |
| "\n", | |
| "### E. Collaborative Filtering model\n", | |
| "\n", | |
| "### F. Model Evaluation\n", | |
| "\n", | |
| "### G. Top 5 recommendations based on user habits\n", | |
| "\n", | |
| "### H. Summary and Insights" | |
| ] | |
| }, | |
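| { | |
| "cell_type": "markdown", | |
| "id": "sketch-item-cf-note", | |
| "metadata": {}, | |
| "source": [ | |
| "The item-to-item idea mentioned above can be sketched on a toy example. The ratings below are made up for illustration and are not taken from the Electronics dataset; the names `toy`, `toy_matrix` and `item_sim` are assumptions introduced here. Products whose rating columns point in a similar direction get a cosine similarity close to 1. This is only a minimal sketch of the idea, not Amazon's production method." | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": null, | |
| "id": "sketch-item-cf-code", | |
| "metadata": {}, | |
| "outputs": [], | |
| "source": [ | |
| "#Minimal sketch with hypothetical ratings (not from the dataset)\n", | |
| "import pandas as pd\n", | |
| "from sklearn.metrics.pairwise import cosine_similarity\n", | |
| "\n", | |
| "toy = pd.DataFrame({\n", | |
| "    'userId': ['u1', 'u1', 'u2', 'u2', 'u3', 'u3'],\n", | |
| "    'productId': ['p1', 'p2', 'p1', 'p2', 'p2', 'p3'],\n", | |
| "    'rating': [5.0, 4.0, 4.0, 5.0, 2.0, 5.0]})\n", | |
| "\n", | |
| "#Rows are users, columns are products; unrated pairs are filled with 0\n", | |
| "toy_matrix = toy.pivot_table(index='userId', columns='productId', values='rating').fillna(0)\n", | |
| "\n", | |
| "#Transpose so each row is a product, then compare products pairwise\n", | |
| "item_sim = pd.DataFrame(cosine_similarity(toy_matrix.T), index=toy_matrix.columns, columns=toy_matrix.columns)\n", | |
| "item_sim" | |
| ] | |
| }, | |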
| { | |
| "cell_type": "code", | |
| "execution_count": 1, | |
| "id": "321ecb89", | |
| "metadata": {}, | |
| "outputs": [], | |
| "source": [ | |
| "#Import required libraries\n", | |
| "import numpy as np\n", | |
| "import pandas as pd\n", | |
| "import math\n", | |
| "import json\n", | |
| "import time\n", | |
| "import matplotlib.pyplot as plt\n", | |
| "import seaborn as sns\n", | |
| "from sklearn.metrics.pairwise import cosine_similarity\n", | |
| "from sklearn.model_selection import train_test_split\n", | |
| "from sklearn.neighbors import NearestNeighbors\n", | |
| "## from sklearn.externals import joblib\n", | |
| "import scipy.sparse\n", | |
| "from scipy.sparse import csr_matrix\n", | |
| "import warnings; warnings.simplefilter('ignore')\n", | |
| "%matplotlib inline" | |
| ] | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "id": "932fea5a", | |
| "metadata": {}, | |
| "source": [ | |
| "### A. Data Overview" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": 2, | |
| "id": "0f83f6dc", | |
| "metadata": {}, | |
| "outputs": [], | |
| "source": [ | |
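| "#Import the data set\n", | |
| "#Note: the CSV has no header row, so read_csv uses the first record as column names;\n", | |
| "#passing header=None would keep that record as data\n", | |
| "df = pd.read_csv('ratings_Electronics.csv')" | |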
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": 3, | |
| "id": "bef8c106", | |
| "metadata": {}, | |
| "outputs": [ | |
| { | |
| "data": { | |
| "text/html": [ | |
| "<div>\n", | |
| "<style scoped>\n", | |
| " .dataframe tbody tr th:only-of-type {\n", | |
| " vertical-align: middle;\n", | |
| " }\n", | |
| "\n", | |
| " .dataframe tbody tr th {\n", | |
| " vertical-align: top;\n", | |
| " }\n", | |
| "\n", | |
| " .dataframe thead th {\n", | |
| " text-align: right;\n", | |
| " }\n", | |
| "</style>\n", | |
| "<table border=\"1\" class=\"dataframe\">\n", | |
| " <thead>\n", | |
| " <tr style=\"text-align: right;\">\n", | |
| " <th></th>\n", | |
| " <th>AKM1MP6P0OYPR</th>\n", | |
| " <th>0132793040</th>\n", | |
| " <th>5.0</th>\n", | |
| " <th>1365811200</th>\n", | |
| " </tr>\n", | |
| " </thead>\n", | |
| " <tbody>\n", | |
| " <tr>\n", | |
| " <th>6499999</th>\n", | |
| " <td>A18O8K0G77YZSQ</td>\n", | |
| " <td>B0096TK65A</td>\n", | |
| " <td>5.0</td>\n", | |
| " <td>1356652800</td>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>1969094</th>\n", | |
| " <td>A3L8JGRGG919GV</td>\n", | |
| " <td>B001EM9JXC</td>\n", | |
| " <td>5.0</td>\n", | |
| " <td>1391731200</td>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>2058125</th>\n", | |
| " <td>A2DCTBX9FV3AYJ</td>\n", | |
| " <td>B001GX6MJ8</td>\n", | |
| " <td>5.0</td>\n", | |
| " <td>1384819200</td>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>3007076</th>\n", | |
| " <td>A3CG93783LP0FO</td>\n", | |
| " <td>B0031RGEVI</td>\n", | |
| " <td>4.0</td>\n", | |
| " <td>1273017600</td>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>539064</th>\n", | |
| " <td>A26OWAOKDNVT1Y</td>\n", | |
| " <td>B00064O89Y</td>\n", | |
| " <td>5.0</td>\n", | |
| " <td>1124064000</td>\n", | |
| " </tr>\n", | |
| " </tbody>\n", | |
| "</table>\n", | |
| "</div>" | |
| ], | |
| "text/plain": [ | |
| " AKM1MP6P0OYPR 0132793040 5.0 1365811200\n", | |
| "6499999 A18O8K0G77YZSQ B0096TK65A 5.0 1356652800\n", | |
| "1969094 A3L8JGRGG919GV B001EM9JXC 5.0 1391731200\n", | |
| "2058125 A2DCTBX9FV3AYJ B001GX6MJ8 5.0 1384819200\n", | |
| "3007076 A3CG93783LP0FO B0031RGEVI 4.0 1273017600\n", | |
| "539064 A26OWAOKDNVT1Y B00064O89Y 5.0 1124064000" | |
| ] | |
| }, | |
| "execution_count": 3, | |
| "metadata": {}, | |
| "output_type": "execute_result" | |
| } | |
| ], | |
| "source": [ | |
| "#Displaying a sample of 5 entries from the dataset\n", | |
| "df.sample(5)" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": 4, | |
| "id": "24fd2d48", | |
| "metadata": {}, | |
| "outputs": [], | |
| "source": [ | |
| "#Assign descriptive column names\n", | |
| "df.columns = ['userId', 'productId', 'rating', 'timestamp']" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": 5, | |
| "id": "3340a360", | |
| "metadata": {}, | |
| "outputs": [ | |
| { | |
| "data": { | |
| "text/html": [ | |
| "<div>\n", | |
| "<style scoped>\n", | |
| " .dataframe tbody tr th:only-of-type {\n", | |
| " vertical-align: middle;\n", | |
| " }\n", | |
| "\n", | |
| " .dataframe tbody tr th {\n", | |
| " vertical-align: top;\n", | |
| " }\n", | |
| "\n", | |
| " .dataframe thead th {\n", | |
| " text-align: right;\n", | |
| " }\n", | |
| "</style>\n", | |
| "<table border=\"1\" class=\"dataframe\">\n", | |
| " <thead>\n", | |
| " <tr style=\"text-align: right;\">\n", | |
| " <th></th>\n", | |
| " <th>userId</th>\n", | |
| " <th>productId</th>\n", | |
| " <th>rating</th>\n", | |
| " <th>timestamp</th>\n", | |
| " </tr>\n", | |
| " </thead>\n", | |
| " <tbody>\n", | |
| " <tr>\n", | |
| " <th>0</th>\n", | |
| " <td>A2CX7LUOHB2NDG</td>\n", | |
| " <td>0321732944</td>\n", | |
| " <td>5.0</td>\n", | |
| " <td>1341100800</td>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>1</th>\n", | |
| " <td>A2NWSAGRHCP8N5</td>\n", | |
| " <td>0439886341</td>\n", | |
| " <td>1.0</td>\n", | |
| " <td>1367193600</td>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>2</th>\n", | |
| " <td>A2WNBOD3WNDNKT</td>\n", | |
| " <td>0439886341</td>\n", | |
| " <td>3.0</td>\n", | |
| " <td>1374451200</td>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>3</th>\n", | |
| " <td>A1GI0U4ZRJA8WN</td>\n", | |
| " <td>0439886341</td>\n", | |
| " <td>1.0</td>\n", | |
| " <td>1334707200</td>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>4</th>\n", | |
| " <td>A1QGNMC6O1VW39</td>\n", | |
| " <td>0511189877</td>\n", | |
| " <td>5.0</td>\n", | |
| " <td>1397433600</td>\n", | |
| " </tr>\n", | |
| " </tbody>\n", | |
| "</table>\n", | |
| "</div>" | |
| ], | |
| "text/plain": [ | |
| " userId productId rating timestamp\n", | |
| "0 A2CX7LUOHB2NDG 0321732944 5.0 1341100800\n", | |
| "1 A2NWSAGRHCP8N5 0439886341 1.0 1367193600\n", | |
| "2 A2WNBOD3WNDNKT 0439886341 3.0 1374451200\n", | |
| "3 A1GI0U4ZRJA8WN 0439886341 1.0 1334707200\n", | |
| "4 A1QGNMC6O1VW39 0511189877 5.0 1397433600" | |
| ] | |
| }, | |
| "execution_count": 5, | |
| "metadata": {}, | |
| "output_type": "execute_result" | |
| } | |
| ], | |
| "source": [ | |
| "df.head(5)" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": 6, | |
| "id": "e8dd7386", | |
| "metadata": {}, | |
| "outputs": [], | |
| "source": [ | |
| "#Dropping time stamp column as it will not be used\n", | |
| "df.drop(['timestamp'],axis=1,inplace=True)" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": 7, | |
| "id": "f49b5e1b", | |
| "metadata": {}, | |
| "outputs": [ | |
| { | |
| "data": { | |
| "text/plain": [ | |
| "(7824481, 3)" | |
| ] | |
| }, | |
| "execution_count": 7, | |
| "metadata": {}, | |
| "output_type": "execute_result" | |
| } | |
| ], | |
| "source": [ | |
| "df.shape" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": 8, | |
| "id": "715018cc", | |
| "metadata": {}, | |
| "outputs": [ | |
| { | |
| "name": "stdout", | |
| "output_type": "stream", | |
| "text": [ | |
| "<class 'pandas.core.frame.DataFrame'>\n", | |
| "RangeIndex: 7824481 entries, 0 to 7824480\n", | |
| "Data columns (total 3 columns):\n", | |
| " # Column Dtype \n", | |
| "--- ------ ----- \n", | |
| " 0 userId object \n", | |
| " 1 productId object \n", | |
| " 2 rating float64\n", | |
| "dtypes: float64(1), object(2)\n", | |
| "memory usage: 179.1+ MB\n" | |
| ] | |
| } | |
| ], | |
| "source": [ | |
| "df.info()" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": 9, | |
| "id": "75c44f2f", | |
| "metadata": {}, | |
| "outputs": [ | |
| { | |
| "data": { | |
| "text/plain": [ | |
| "userId 0\n", | |
| "productId 0\n", | |
| "rating 0\n", | |
| "dtype: int64" | |
| ] | |
| }, | |
| "execution_count": 9, | |
| "metadata": {}, | |
| "output_type": "execute_result" | |
| } | |
| ], | |
| "source": [ | |
| "#Counting the number of missing cells in each column\n", | |
| "df.isna().sum()" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": 10, | |
| "id": "818335ff", | |
| "metadata": {}, | |
| "outputs": [ | |
| { | |
| "data": { | |
| "text/html": [ | |
| "<div>\n", | |
| "<style scoped>\n", | |
| " .dataframe tbody tr th:only-of-type {\n", | |
| " vertical-align: middle;\n", | |
| " }\n", | |
| "\n", | |
| " .dataframe tbody tr th {\n", | |
| " vertical-align: top;\n", | |
| " }\n", | |
| "\n", | |
| " .dataframe thead th {\n", | |
| " text-align: right;\n", | |
| " }\n", | |
| "</style>\n", | |
| "<table border=\"1\" class=\"dataframe\">\n", | |
| " <thead>\n", | |
| " <tr style=\"text-align: right;\">\n", | |
| " <th></th>\n", | |
| " <th>count</th>\n", | |
| " <th>unique</th>\n", | |
| " <th>top</th>\n", | |
| " <th>freq</th>\n", | |
| " <th>mean</th>\n", | |
| " <th>std</th>\n", | |
| " <th>min</th>\n", | |
| " <th>25%</th>\n", | |
| " <th>50%</th>\n", | |
| " <th>75%</th>\n", | |
| " <th>max</th>\n", | |
| " </tr>\n", | |
| " </thead>\n", | |
| " <tbody>\n", | |
| " <tr>\n", | |
| " <th>userId</th>\n", | |
| " <td>7824481</td>\n", | |
| " <td>4201696</td>\n", | |
| " <td>A5JLAU2ARJ0BO</td>\n", | |
| " <td>520</td>\n", | |
| " <td>NaN</td>\n", | |
| " <td>NaN</td>\n", | |
| " <td>NaN</td>\n", | |
| " <td>NaN</td>\n", | |
| " <td>NaN</td>\n", | |
| " <td>NaN</td>\n", | |
| " <td>NaN</td>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>productId</th>\n", | |
| " <td>7824481</td>\n", | |
| " <td>476001</td>\n", | |
| " <td>B0074BW614</td>\n", | |
| " <td>18244</td>\n", | |
| " <td>NaN</td>\n", | |
| " <td>NaN</td>\n", | |
| " <td>NaN</td>\n", | |
| " <td>NaN</td>\n", | |
| " <td>NaN</td>\n", | |
| " <td>NaN</td>\n", | |
| " <td>NaN</td>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>rating</th>\n", | |
| " <td>7824481.0</td>\n", | |
| " <td>NaN</td>\n", | |
| " <td>NaN</td>\n", | |
| " <td>NaN</td>\n", | |
| " <td>4.012337</td>\n", | |
| " <td>1.38091</td>\n", | |
| " <td>1.0</td>\n", | |
| " <td>3.0</td>\n", | |
| " <td>5.0</td>\n", | |
| " <td>5.0</td>\n", | |
| " <td>5.0</td>\n", | |
| " </tr>\n", | |
| " </tbody>\n", | |
| "</table>\n", | |
| "</div>" | |
| ], | |
| "text/plain": [ | |
| " count unique top freq mean std min \\\n", | |
| "userId 7824481 4201696 A5JLAU2ARJ0BO 520 NaN NaN NaN \n", | |
| "productId 7824481 476001 B0074BW614 18244 NaN NaN NaN \n", | |
| "rating 7824481.0 NaN NaN NaN 4.012337 1.38091 1.0 \n", | |
| "\n", | |
| " 25% 50% 75% max \n", | |
| "userId NaN NaN NaN NaN \n", | |
| "productId NaN NaN NaN NaN \n", | |
| "rating 3.0 5.0 5.0 5.0 " | |
| ] | |
| }, | |
| "execution_count": 10, | |
| "metadata": {}, | |
| "output_type": "execute_result" | |
| } | |
| ], | |
| "source": [ | |
| "#Statistical description for all features\n", | |
| "df.describe(include='all').T" | |
| ] | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "id": "a834a4f8", | |
| "metadata": {}, | |
| "source": [ | |
| "### B. EDA" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": 11, | |
| "id": "61caa4f9", | |
| "metadata": {}, | |
| "outputs": [ | |
| { | |
| "data": { | |
| "text/plain": [ | |
| "<AxesSubplot:xlabel='rating', ylabel='count'>" | |
| ] | |
| }, | |
| "execution_count": 11, | |
| "metadata": {}, | |
| "output_type": "execute_result" | |
| }, | |
| { | |
| "data": { | |
| "text/plain": [ | |
| "<Figure size 432x288 with 1 Axes>" | |
| ] | |
| }, | |
| "metadata": { | |
| "needs_background": "light" | |
| }, | |
| "output_type": "display_data" | |
| } | |
| ], | |
| "source": [ | |
| "# Check the distribution of ratings \n", | |
| "sns.countplot(data=df,x='rating')" | |
| ] | |
| }, | |
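| { | |
| "cell_type": "markdown", | |
| "id": "sketch-rating-share-note", | |
| "metadata": {}, | |
| "source": [ | |
| "The same distribution can also be read as proportions, which are easier to compare than raw counts. A minimal sketch, assuming `df` still holds the `rating` column as named above." | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": null, | |
| "id": "sketch-rating-share-code", | |
| "metadata": {}, | |
| "outputs": [], | |
| "source": [ | |
| "#Share of each rating value, ordered from 1.0 to 5.0\n", | |
| "df['rating'].value_counts(normalize=True).sort_index()" | |
| ] | |
| }, | |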
| { | |
| "cell_type": "markdown", | |
| "id": "84a94436", | |
| "metadata": {}, | |
| "source": [ | |
| "### C. Understanding of Data Columns" | |
| ] | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "id": "9a8d2ce7", | |
| "metadata": {}, | |
| "source": [ | |
| "Taking a subset of the dataframe for users who contributed 50 or more reviews" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": 12, | |
| "id": "5c0c9889", | |
| "metadata": {}, | |
| "outputs": [ | |
| { | |
| "data": { | |
| "text/plain": [ | |
| "A5JLAU2ARJ0BO 520\n", | |
| "ADLVFFE4VBT8 501\n", | |
| "A3OXHLG6DIBRW8 498\n", | |
| "A6FIAB28IS79 431\n", | |
| "A680RUE1FDO8B 406\n", | |
| " ... \n", | |
| "A2SM8TX57PEN4V 1\n", | |
| "A9J3QVF3B3VVP 1\n", | |
| "A3EWVBSOOC5WVI 1\n", | |
| "A5IFA8Z0UDEE0 1\n", | |
| "A2OA9XEK0Y2NE0 1\n", | |
| "Name: userId, Length: 4201696, dtype: int64" | |
| ] | |
| }, | |
| "execution_count": 12, | |
| "metadata": {}, | |
| "output_type": "execute_result" | |
| } | |
| ], | |
| "source": [ | |
| "#Number of ratings given by each user\n", | |
| "count = df.userId.value_counts()\n", | |
| "count" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": 13, | |
| "id": "03d0852a", | |
| "metadata": { | |
| "scrolled": false | |
| }, | |
| "outputs": [ | |
| { | |
| "data": { | |
| "text/html": [ | |
| "<div>\n", | |
| "<style scoped>\n", | |
| " .dataframe tbody tr th:only-of-type {\n", | |
| " vertical-align: middle;\n", | |
| " }\n", | |
| "\n", | |
| " .dataframe tbody tr th {\n", | |
| " vertical-align: top;\n", | |
| " }\n", | |
| "\n", | |
| " .dataframe thead th {\n", | |
| " text-align: right;\n", | |
| " }\n", | |
| "</style>\n", | |
| "<table border=\"1\" class=\"dataframe\">\n", | |
| " <thead>\n", | |
| " <tr style=\"text-align: right;\">\n", | |
| " <th></th>\n", | |
| " <th>userId</th>\n", | |
| " <th>productId</th>\n", | |
| " <th>rating</th>\n", | |
| " </tr>\n", | |
| " </thead>\n", | |
| " <tbody>\n", | |
| " <tr>\n", | |
| " <th>93</th>\n", | |
| " <td>A3BY5KCNQZXV5U</td>\n", | |
| " <td>0594451647</td>\n", | |
| " <td>5.0</td>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>117</th>\n", | |
| " <td>AT09WGFUM934H</td>\n", | |
| " <td>0594481813</td>\n", | |
| " <td>3.0</td>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>176</th>\n", | |
| " <td>A32HSNCNPRUMTR</td>\n", | |
| " <td>0970407998</td>\n", | |
| " <td>1.0</td>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>177</th>\n", | |
| " <td>A17HMM1M7T9PJ1</td>\n", | |
| " <td>0970407998</td>\n", | |
| " <td>4.0</td>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>491</th>\n", | |
| " <td>A3CLWR1UUZT6TG</td>\n", | |
| " <td>0972683275</td>\n", | |
| " <td>5.0</td>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>...</th>\n", | |
| " <td>...</td>\n", | |
| " <td>...</td>\n", | |
| " <td>...</td>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>7824425</th>\n", | |
| " <td>A1E1LEVQ9VQNK</td>\n", | |
| " <td>B00LGQ6HL8</td>\n", | |
| " <td>5.0</td>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>7824426</th>\n", | |
| " <td>A2NYK9KWFMJV4Y</td>\n", | |
| " <td>B00LGQ6HL8</td>\n", | |
| " <td>5.0</td>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>7824435</th>\n", | |
| " <td>A3AYSYSLHU26U9</td>\n", | |
| " <td>B00LI4ZZO8</td>\n", | |
| " <td>4.0</td>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>7824437</th>\n", | |
| " <td>A2NYK9KWFMJV4Y</td>\n", | |
| " <td>B00LI4ZZO8</td>\n", | |
| " <td>5.0</td>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>7824443</th>\n", | |
| " <td>A2BYV7S1QP2YIG</td>\n", | |
| " <td>B00LKG1MC8</td>\n", | |
| " <td>5.0</td>\n", | |
| " </tr>\n", | |
| " </tbody>\n", | |
| "</table>\n", | |
| "<p>125871 rows × 3 columns</p>\n", | |
| "</div>" | |
| ], | |
| "text/plain": [ | |
| " userId productId rating\n", | |
| "93 A3BY5KCNQZXV5U 0594451647 5.0\n", | |
| "117 AT09WGFUM934H 0594481813 3.0\n", | |
| "176 A32HSNCNPRUMTR 0970407998 1.0\n", | |
| "177 A17HMM1M7T9PJ1 0970407998 4.0\n", | |
| "491 A3CLWR1UUZT6TG 0972683275 5.0\n", | |
| "... ... ... ...\n", | |
| "7824425 A1E1LEVQ9VQNK B00LGQ6HL8 5.0\n", | |
| "7824426 A2NYK9KWFMJV4Y B00LGQ6HL8 5.0\n", | |
| "7824435 A3AYSYSLHU26U9 B00LI4ZZO8 4.0\n", | |
| "7824437 A2NYK9KWFMJV4Y B00LI4ZZO8 5.0\n", | |
| "7824443 A2BYV7S1QP2YIG B00LKG1MC8 5.0\n", | |
| "\n", | |
| "[125871 rows x 3 columns]" | |
| ] | |
| }, | |
| "execution_count": 13, | |
| "metadata": {}, | |
| "output_type": "execute_result" | |
| } | |
| ], | |
| "source": [ | |
| "#Keep only the ratings from users who gave at least 50 ratings\n", | |
| "dfs = df[df.userId.isin(count[count >= 50].index)]\n", | |
| "dfs" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": 14, | |
| "id": "9240860a", | |
| "metadata": {}, | |
| "outputs": [ | |
| { | |
| "data": { | |
| "text/plain": [ | |
| "(125871, 3)" | |
| ] | |
| }, | |
| "execution_count": 14, | |
| "metadata": {}, | |
| "output_type": "execute_result" | |
| } | |
| ], | |
| "source": [ | |
| "dfs.shape" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": 15, | |
| "id": "07a49cf9", | |
| "metadata": {}, | |
| "outputs": [ | |
| { | |
| "name": "stdout", | |
| "output_type": "stream", | |
| "text": [ | |
| "Number of unique USERS in final data = 1540\n" | |
| ] | |
| } | |
| ], | |
| "source": [ | |
| "print('Number of unique USERS in final data = ', dfs['userId'].nunique())\n", | |
| "#Items are identified by productId, not userId\n", | |
| "print('Number of unique ITEMS in final data = ', dfs['productId'].nunique())" | |
| ] | |
| }, | |
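| { | |
| "cell_type": "markdown", | |
| "id": "sketch-density-note", | |
| "metadata": {}, | |
| "source": [ | |
| "Collaborative filtering works on a user-item matrix, so it can help to check how sparse that matrix would be for this subset. A minimal sketch, assuming `dfs` from the cells above and at most one rating per user-product pair; `n_users`, `n_items` and `density` are names introduced here." | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": null, | |
| "id": "sketch-density-code", | |
| "metadata": {}, | |
| "outputs": [], | |
| "source": [ | |
| "n_users = dfs['userId'].nunique()\n", | |
| "n_items = dfs['productId'].nunique()\n", | |
| "\n", | |
| "#Share of all possible user-product pairs that actually have a rating\n", | |
| "density = len(dfs) / (n_users * n_items)\n", | |
| "print('Density of the user-item matrix = {:.4%}'.format(density))" | |
| ] | |
| }, | |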
| { | |
| "cell_type": "markdown", | |
| "id": "2d490d29", | |
| "metadata": {}, | |
| "source": [ | |
| "Splitting the data into train and test" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": 16, | |
| "id": "6f29acdd", | |
| "metadata": {}, | |
| "outputs": [ | |
| { | |
| "data": { | |
| "text/html": [ | |
| "<div>\n", | |
| "<style scoped>\n", | |
| " .dataframe tbody tr th:only-of-type {\n", | |
| " vertical-align: middle;\n", | |
| " }\n", | |
| "\n", | |
| " .dataframe tbody tr th {\n", | |
| " vertical-align: top;\n", | |
| " }\n", | |
| "\n", | |
| " .dataframe thead th {\n", | |
| " text-align: right;\n", | |
| " }\n", | |
| "</style>\n", | |
| "<table border=\"1\" class=\"dataframe\">\n", | |
| " <thead>\n", | |
| " <tr style=\"text-align: right;\">\n", | |
| " <th></th>\n", | |
| " <th>userId</th>\n", | |
| " <th>productId</th>\n", | |
| " <th>rating</th>\n", | |
| " </tr>\n", | |
| " </thead>\n", | |
| " <tbody>\n", | |
| " <tr>\n", | |
| " <th>6595852</th>\n", | |
| " <td>A2BYV7S1QP2YIG</td>\n", | |
| " <td>B009EAHVTA</td>\n", | |
| " <td>5.0</td>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>4738240</th>\n", | |
| " <td>AB094YABX21WQ</td>\n", | |
| " <td>B0056XCEAA</td>\n", | |
| " <td>1.0</td>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>4175595</th>\n", | |
| " <td>A3D0UM4ZD2CMAW</td>\n", | |
| " <td>B004I763AW</td>\n", | |
| " <td>5.0</td>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>3753015</th>\n", | |
| " <td>AATWFX0ZZSE6C</td>\n", | |
| " <td>B0040NPHMO</td>\n", | |
| " <td>3.0</td>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>1734766</th>\n", | |
| " <td>A1NNMOD9H36Q8E</td>\n", | |
| " <td>B0015VW3BM</td>\n", | |
| " <td>4.0</td>\n", | |
| " </tr>\n", | |
| " </tbody>\n", | |
| "</table>\n", | |
| "</div>" | |
| ], | |
| "text/plain": [ | |
| " userId productId rating\n", | |
| "6595852 A2BYV7S1QP2YIG B009EAHVTA 5.0\n", | |
| "4738240 AB094YABX21WQ B0056XCEAA 1.0\n", | |
| "4175595 A3D0UM4ZD2CMAW B004I763AW 5.0\n", | |
| "3753015 AATWFX0ZZSE6C B0040NPHMO 3.0\n", | |
| "1734766 A1NNMOD9H36Q8E B0015VW3BM 4.0" | |
| ] | |
| }, | |
| "execution_count": 16, | |
| "metadata": {}, | |
| "output_type": "execute_result" | |
| } | |
| ], | |
| "source": [ | |
| "#Split the data randomly into train and test datasets\n", | |
| "#using a 70:30 train-to-test ratio\n", | |
| "train_data, test_data = train_test_split(dfs, test_size = 0.3, random_state=0)\n", | |
| "train_data.head()" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": 17, | |
| "id": "a8e72cfd", | |
| "metadata": {}, | |
| "outputs": [ | |
| { | |
| "data": { | |
| "text/plain": [ | |
| "(88109, 3)" | |
| ] | |
| }, | |
| "execution_count": 17, | |
| "metadata": {}, | |
| "output_type": "execute_result" | |
| } | |
| ], | |
| "source": [ | |
| "train_data.shape" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": 18, | |
| "id": "5533a95e", | |
| "metadata": {}, | |
| "outputs": [ | |
| { | |
| "data": { | |
| "text/plain": [ | |
| "(37762, 3)" | |
| ] | |
| }, | |
| "execution_count": 18, | |
| "metadata": {}, | |
| "output_type": "execute_result" | |
| } | |
| ], | |
| "source": [ | |
| "test_data.shape" | |
| ] | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "id": "805e9968", | |
| "metadata": {}, | |
| "source": [ | |
| "### D. Popularity Recommender model" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": 19, | |
| "id": "b3c5eaf3", | |
| "metadata": {}, | |
| "outputs": [ | |
| { | |
| "data": { | |
| "text/html": [ | |
| "<div>\n", | |
| "<style scoped>\n", | |
| " .dataframe tbody tr th:only-of-type {\n", | |
| " vertical-align: middle;\n", | |
| " }\n", | |
| "\n", | |
| " .dataframe tbody tr th {\n", | |
| " vertical-align: top;\n", | |
| " }\n", | |
| "\n", | |
| " .dataframe thead th {\n", | |
| " text-align: right;\n", | |
| " }\n", | |
| "</style>\n", | |
| "<table border=\"1\" class=\"dataframe\">\n", | |
| " <thead>\n", | |
| " <tr style=\"text-align: right;\">\n", | |
| " <th></th>\n", | |
| " <th>productId</th>\n", | |
| " <th>usercount</th>\n", | |
| " </tr>\n", | |
| " </thead>\n", | |
| " <tbody>\n", | |
| " <tr>\n", | |
| " <th>9812</th>\n", | |
| " <td>B000TTMJ2E</td>\n", | |
| " <td>2</td>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>34882</th>\n", | |
| " <td>B00CBO14FI</td>\n", | |
| " <td>1</td>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>25196</th>\n", | |
| " <td>B004ZOD8LU</td>\n", | |
| " <td>1</td>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>33219</th>\n", | |
| " <td>B00A2T6X0K</td>\n", | |
| " <td>2</td>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>29898</th>\n", | |
| " <td>B007PAR6SC</td>\n", | |
| " <td>1</td>\n", | |
| " </tr>\n", | |
| " </tbody>\n", | |
| "</table>\n", | |
| "</div>" | |
| ], | |
| "text/plain": [ | |
| " productId usercount\n", | |
| "9812 B000TTMJ2E 2\n", | |
| "34882 B00CBO14FI 1\n", | |
| "25196 B004ZOD8LU 1\n", | |
| "33219 B00A2T6X0K 2\n", | |
| "29898 B007PAR6SC 1" | |
| ] | |
| }, | |
| "execution_count": 19, | |
| "metadata": {}, | |
| "output_type": "execute_result" | |
| } | |
| ], | |
| "source": [ | |
| "#Count the ratings (one per user) for each product and use the count as the recommendation score\n", | |
| "train_data_grouped = train_data.groupby('productId').agg({'userId': 'count'}).reset_index()\n", | |
| "train_data_grouped.rename(columns = {'userId': 'usercount'},inplace=True)\n", | |
| "train_data_grouped.sample(5)" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": 20, | |
| "id": "2010b8f3", | |
| "metadata": {}, | |
| "outputs": [ | |
| { | |
| "data": { | |
| "text/html": [ | |
| "<div>\n", | |
| "<style scoped>\n", | |
| " .dataframe tbody tr th:only-of-type {\n", | |
| " vertical-align: middle;\n", | |
| " }\n", | |
| "\n", | |
| " .dataframe tbody tr th {\n", | |
| " vertical-align: top;\n", | |
| " }\n", | |
| "\n", | |
| " .dataframe thead th {\n", | |
| " text-align: right;\n", | |
| " }\n", | |
| "</style>\n", | |
| "<table border=\"1\" class=\"dataframe\">\n", | |
| " <thead>\n", | |
| " <tr style=\"text-align: right;\">\n", | |
| " <th></th>\n", | |
| " <th>productId</th>\n", | |
| " <th>usercount</th>\n", | |
| " <th>rank</th>\n", | |
| " </tr>\n", | |
| " </thead>\n", | |
| " <tbody>\n", | |
| " <tr>\n", | |
| " <th>30847</th>\n", | |
| " <td>B0088CJT4U</td>\n", | |
| " <td>133</td>\n", | |
| " <td>1.0</td>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>30287</th>\n", | |
| " <td>B007WTAJTO</td>\n", | |
| " <td>124</td>\n", | |
| " <td>2.0</td>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>19647</th>\n", | |
| " <td>B003ES5ZUU</td>\n", | |
| " <td>122</td>\n", | |
| " <td>3.0</td>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>8752</th>\n", | |
| " <td>B000N99BBC</td>\n", | |
| " <td>114</td>\n", | |
| " <td>4.0</td>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>30555</th>\n", | |
| " <td>B00829THK0</td>\n", | |
| " <td>97</td>\n", | |
| " <td>5.0</td>\n", | |
| " </tr>\n", | |
| " </tbody>\n", | |
| "</table>\n", | |
| "</div>" | |
| ], | |
| "text/plain": [ | |
| " productId usercount rank\n", | |
| "30847 B0088CJT4U 133 1.0\n", | |
| "30287 B007WTAJTO 124 2.0\n", | |
| "19647 B003ES5ZUU 122 3.0\n", | |
| "8752 B000N99BBC 114 4.0\n", | |
| "30555 B00829THK0 97 5.0" | |
| ] | |
| }, | |
| "execution_count": 20, | |
| "metadata": {}, | |
| "output_type": "execute_result" | |
| } | |
| ], | |
| "source": [ | |
| "#Sort the products by recommendation score (ties broken by productId)\n", | |
| "train_data_sort = train_data_grouped.sort_values(['usercount', 'productId'], ascending=[False, True])\n", | |
| "\n", | |
| "#Generate a recommendation rank based on the score\n", | |
| "train_data_sort['rank'] = train_data_sort['usercount'].rank(ascending=False, method='first')\n", | |
| "\n", | |
| "#Get the top 5 recommendations\n", | |
| "popularity_recommendations = train_data_sort.head(5)\n", | |
| "popularity_recommendations" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": 21, | |
| "id": "edf50427", | |
| "metadata": {}, | |
| "outputs": [], | |
| "source": [ | |
| "# Use the popularity-based recommender model to make predictions\n", | |
| "def recommend(user_id):\n", | |
| "    # Work on a copy so the shared top-5 table is not modified in place\n", | |
| "    user_recommendations = popularity_recommendations.copy()\n", | |
| "\n", | |
| "    # Add the userId the recommendations are generated for as the first column\n", | |
| "    user_recommendations.insert(0, 'userId', user_id)\n", | |
| "\n", | |
| "    return user_recommendations" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": 22, | |
| "id": "3c64c342", | |
| "metadata": { | |
| "scrolled": false | |
| }, | |
| "outputs": [ | |
| { | |
| "name": "stdout", | |
| "output_type": "stream", | |
| "text": [ | |
| "Here is the recommendation for the userId: A2BYV7S1QP2YIG\n", | |
| "\n", | |
| " userId productId usercount rank\n", | |
| "30847 A2BYV7S1QP2YIG B0088CJT4U 133 1.0\n", | |
| "30287 A2BYV7S1QP2YIG B007WTAJTO 124 2.0\n", | |
| "19647 A2BYV7S1QP2YIG B003ES5ZUU 122 3.0\n", | |
| "8752 A2BYV7S1QP2YIG B000N99BBC 114 4.0\n", | |
| "30555 A2BYV7S1QP2YIG B00829THK0 97 5.0\n", | |
| "\n", | |
| "\n", | |
| "Here is the recommendation for the userId: A1NNMOD9H36Q8E\n", | |
| "\n", | |
| " userId productId usercount rank\n", | |
| "30847 A1NNMOD9H36Q8E B0088CJT4U 133 1.0\n", | |
| "30287 A1NNMOD9H36Q8E B007WTAJTO 124 2.0\n", | |
| "19647 A1NNMOD9H36Q8E B003ES5ZUU 122 3.0\n", | |
| "8752 A1NNMOD9H36Q8E B000N99BBC 114 4.0\n", | |
| "30555 A1NNMOD9H36Q8E B00829THK0 97 5.0\n", | |
| "\n", | |
| "\n", | |
| "Here is the recommendation for the userId: AATWFX0ZZSE6C\n", | |
| "\n", | |
| " userId productId usercount rank\n", | |
| "30847 AATWFX0ZZSE6C B0088CJT4U 133 1.0\n", | |
| "30287 AATWFX0ZZSE6C B007WTAJTO 124 2.0\n", | |
| "19647 AATWFX0ZZSE6C B003ES5ZUU 122 3.0\n", | |
| "8752 AATWFX0ZZSE6C B000N99BBC 114 4.0\n", | |
| "30555 AATWFX0ZZSE6C B00829THK0 97 5.0\n", | |
| "\n", | |
| "\n" | |
| ] | |
| } | |
| ], | |
| "source": [ | |
| "find_recom = ['A2BYV7S1QP2YIG','A1NNMOD9H36Q8E','AATWFX0ZZSE6C'] # Any userIds of interest can be listed here\n", | |
| "for i in find_recom:\n", | |
| "    print(\"Here is the recommendation for the userId: {}\\n\".format(i))\n", | |
| "    print(recommend(i))\n", | |
| "    print(\"\\n\")" | |
| ] | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "id": "14c69528", | |
| "metadata": {}, | |
| "source": [ | |
| "As the output shows, the popularity-based recommender gives the same recommendations to every user and offers no personalization." | |
| ] | |
| }, | |
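To make the lack of personalization concrete, here is a minimal, library-free sketch of a popularity ranking. The toy ratings and the helper names `popularity_top_n` and `recommend_for` are hypothetical and only illustrate the idea; they are not part of the notebook.

```python
from collections import Counter

# Hypothetical toy ratings as (userId, productId) pairs; not taken from the dataset
ratings = [
    ("u1", "p1"), ("u2", "p1"), ("u3", "p1"),
    ("u1", "p2"), ("u2", "p2"),
    ("u3", "p3"),
]

def popularity_top_n(pairs, n):
    """Rank products by how many ratings they received."""
    counts = Counter(product for _, product in pairs)
    # Sort by count (descending), then by productId (ascending) to break ties
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [product for product, _ in ranked[:n]]

def recommend_for(user_id, pairs, n=2):
    # user_id is deliberately unused: the ranking does not depend on the user
    return popularity_top_n(pairs, n)

print(recommend_for("u1", ratings))
print(recommend_for("u3", ratings))
```

Because `user_id` never enters the ranking, both calls print the same list, which is the behaviour observed above.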
| { | |
| "cell_type": "markdown", | |
| "id": "255edd0d", | |
| "metadata": {}, | |
| "source": [ | |
| "### E. Collaborative Filtering model" | |
| ] | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "id": "50a0c17c", | |
| "metadata": {}, | |
| "source": [ | |
| "We apply the Surprise library, which uses an SVD-based matrix factorization model to predict ratings:\n", | |
| "- Method 1 is a simple illustration of the Surprise library: given a userId and a productId, it uses the existing ratings to predict the rating for that pair.\n", | |
| "- Method 2 uses the Surprise library to get the top 5 recommendations while excluding products the user has already rated, so the list contains only new products and keeps the user engaged." | |
| ] | |
| }, | |
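The idea behind Method 2 (score every product the user has not rated, then keep the best n) can be sketched without any library. The function `top_n_unrated`, the `toy_predict` stand-in, and the toy scores are hypothetical; in the notebook the prediction would presumably come from something like `svd.predict(uid=..., iid=...).est`.

```python
import heapq

def top_n_unrated(user_id, all_items, rated_items, predict, n=5):
    """Return the n unrated items with the highest predicted rating."""
    # Skip everything the user has already rated
    candidates = (item for item in all_items if item not in rated_items)
    # Pair each candidate with its predicted rating
    scored = ((predict(user_id, item), item) for item in candidates)
    # Keep only the n highest-scoring candidates
    best = heapq.nlargest(n, scored)
    return [(item, score) for score, item in best]

# Hypothetical stand-in for a fitted model's prediction function
toy_scores = {"p1": 4.8, "p2": 3.1, "p3": 4.5, "p4": 4.9}

def toy_predict(user_id, item):
    return toy_scores[item]

# "p4" has the highest score but is excluded because the user already rated it
recs = top_n_unrated("u1", toy_scores.keys(), {"p4"}, toy_predict, n=2)
print(recs)
```

Using `heapq.nlargest` avoids sorting the full candidate list, which matters when there are tens of thousands of products and only a handful of recommendations are needed.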
| { | |
| "cell_type": "markdown", | |
| "id": "711d6834", | |
| "metadata": {}, | |
| "source": [ | |
| "#### Method 1" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": 23, | |
| "id": "c1051fb3", | |
| "metadata": {}, | |
| "outputs": [], | |
| "source": [ | |
| "#Using the surprise library for collaborative SVD model\n", | |
| "from surprise import Dataset\n", | |
| "from surprise import Reader\n", | |
| "from surprise import SVD\n", | |
| "from surprise.model_selection import cross_validate" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": 24, | |
| "id": "0344307e", | |
| "metadata": {}, | |
| "outputs": [], | |
| "source": [ | |
| "#Creating surprise objects\n", | |
| "reader = Reader(rating_scale=(1, 5))\n", | |
| "data = Dataset.load_from_df(dfs[['userId', 'productId', 'rating']], reader)" | |
| ] | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "id": "22374fe7", | |
| "metadata": {}, | |
| "source": [ | |
| "### F. Model Evaluation" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": 25, | |
| "id": "84ff466e", | |
| "metadata": {}, | |
| "outputs": [ | |
| { | |
| "name": "stdout", | |
| "output_type": "stream", | |
| "text": [ | |
| "Processing epoch 0\n", | |
| "Processing epoch 1\n", | |
| "Processing epoch 2\n", | |
| "Processing epoch 3\n", | |
| "Processing epoch 4\n", | |
| "Processing epoch 5\n", | |
| "Processing epoch 6\n", | |
| "Processing epoch 7\n", | |
| "Processing epoch 8\n", | |
| "Processing epoch 9\n", | |
| "Processing epoch 0\n", | |
| "Processing epoch 1\n", | |
| "Processing epoch 2\n", | |
| "Processing epoch 3\n", | |
| "Processing epoch 4\n", | |
| "Processing epoch 5\n", | |
| "Processing epoch 6\n", | |
| "Processing epoch 7\n", | |
| "Processing epoch 8\n", | |
| "Processing epoch 9\n", | |
| "Processing epoch 0\n", | |
| "Processing epoch 1\n", | |
| "Processing epoch 2\n", | |
| "Processing epoch 3\n", | |
| "Processing epoch 4\n", | |
| "Processing epoch 5\n", | |
| "Processing epoch 6\n", | |
| "Processing epoch 7\n", | |
| "Processing epoch 8\n", | |
| "Processing epoch 9\n", | |
| "Evaluating RMSE, MAE of algorithm SVD on 3 split(s).\n", | |
| "\n", | |
| " Fold 1 Fold 2 Fold 3 Mean Std \n", | |
| "RMSE (testset) 0.9911 0.9844 0.9896 0.9884 0.0029 \n", | |
| "MAE (testset) 0.7383 0.7341 0.7399 0.7374 0.0025 \n", | |
| "Fit time 2.32 2.46 2.41 2.40 0.06 \n", | |
| "Test time 0.33 0.32 0.31 0.32 0.01 \n" | |
| ] | |
| }, | |
| { | |
| "data": { | |
| "text/plain": [ | |
| "{'test_rmse': array([0.99113029, 0.98441446, 0.98960218]),\n", | |
| " 'test_mae': array([0.73832072, 0.73410076, 0.73991664]),\n", | |
| " 'fit_time': (2.317796230316162, 2.4607317447662354, 2.4088938236236572),\n", | |
| " 'test_time': (0.3263695240020752, 0.3244481086730957, 0.31456804275512695)}" | |
| ] | |
| }, | |
| "execution_count": 25, | |
| "metadata": {}, | |
| "output_type": "execute_result" | |
| } | |
| ], | |
| "source": [ | |
| "#Initializing the SVD model and running 3-fold cross-validation\n", | |
| "svd = SVD(verbose=True, n_epochs=10)\n", | |
| "cross_validate(svd, data, measures=['RMSE', 'MAE'], cv=3, verbose=True)" | |
| ] | |
| }, | |
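The two metrics reported by the cross-validation above are the root mean squared error, $\sqrt{\frac{1}{N}\sum (r - \hat{r})^2}$, and the mean absolute error, $\frac{1}{N}\sum |r - \hat{r}|$, taken over the held-out ratings. As a minimal sketch on made-up numbers (the `actual` and `predicted` lists are hypothetical, not values from the dataset):

```python
import math

def rmse(actual, predicted):
    """Root mean squared error between true and predicted ratings."""
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

def mae(actual, predicted):
    """Mean absolute error between true and predicted ratings."""
    absolute_errors = [abs(a - p) for a, p in zip(actual, predicted)]
    return sum(absolute_errors) / len(absolute_errors)

# Hypothetical true ratings and model estimates for four held-out pairs
actual = [5.0, 3.0, 4.0, 1.0]
predicted = [4.0, 3.0, 5.0, 3.0]

print(round(rmse(actual, predicted), 4))
print(mae(actual, predicted))
```

RMSE penalizes large errors more heavily than MAE, so it is always at least as large as MAE, which is consistent with the fold results above.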
| { | |
| "cell_type": "code", | |
| "execution_count": 26, | |
| "id": "9789cf20", | |
| "metadata": {}, | |
| "outputs": [ | |
| { | |
| "data": { | |
| "text/plain": [ | |
| "Prediction(uid='AATWFX0ZZSE6C', iid='B0088CJT4U', r_ui=None, est=4.2812266820651095, details={'was_impossible': False})" | |
| ] | |
| }, | |
| "execution_count": 26, | |
| "metadata": {}, | |
| "output_type": "execute_result" | |
| } | |
| ], | |
| "source": [ | |
| "svd.predict(uid='AATWFX0ZZSE6C', iid='B0088CJT4U')" | |
| ] | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "id": "3cf32715", | |
| "metadata": {}, | |
| "source": [ | |
| "#### Method 2" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": 27, | |
| "id": "474bc99d", | |
| "metadata": {}, | |
| "outputs": [], | |
| "source": [ | |
| "#Creating surprise objects\n", | |
| "reader = Reader(rating_scale=(1, 5))\n", | |
| "data2 = Dataset.load_from_df(train_data[['userId', 'productId', 'rating']], reader)" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": 28, | |
| "id": "cdc5a1ae", | |
| "metadata": {}, | |
| "outputs": [ | |
| { | |
| "name": "stdout", | |
| "output_type": "stream", | |
| "text": [ | |
| "Processing epoch 0\n", | |
| "Processing epoch 1\n", | |
| "Processing epoch 2\n", | |
| "Processing epoch 3\n", | |
| "Processing epoch 4\n", | |
| "Processing epoch 5\n", | |
| "Processing epoch 6\n", | |
| "Processing epoch 7\n", | |
| "Processing epoch 8\n", | |
| "Processing epoch 9\n" | |
| ] | |
| }, | |
| { | |
| "data": { | |
| "text/plain": [ | |
| "<surprise.prediction_algorithms.matrix_factorization.SVD at 0x24182558bb0>" | |
| ] | |
| }, | |
| "execution_count": 28, | |
| "metadata": {}, | |
| "output_type": "execute_result" | |
| } | |
| ], | |
| "source": [ | |
| "#fitting the SVD model on the train dataset\n", | |
| "trainset = data2.build_full_trainset()\n", | |
| "svd.fit(trainset)" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": 29, | |
| "id": "c8f80bc5", | |
| "metadata": {}, | |
| "outputs": [ | |
| { | |
| "data": { | |
| "text/plain": [ | |
| "Prediction(uid='A1NNMOD9H36Q8E', iid='B0088CJT4U', r_ui=None, est=4.10961378631626, details={'was_impossible': False})" | |
| ] | |
| }, | |
| "execution_count": 29, | |
| "metadata": {}, | |
| "output_type": "execute_result" | |
| } | |
| ], | |
| "source": [ | |
| "#Making a trial prediction to check the output\n", | |
| "svd.predict(uid='A1NNMOD9H36Q8E', iid='B0088CJT4U')" | |
| ] | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "id": "96b09276", | |
| "metadata": {}, | |
| "source": [ | |
| "### F. Model Evaluation" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": 30, | |
| "id": "3e4a2c3b", | |
| "metadata": {}, | |
| "outputs": [ | |
| { | |
| "name": "stdout", | |
| "output_type": "stream", | |
| "text": [ | |
| "Processing epoch 0\n", | |
| "Processing epoch 1\n", | |
| "Processing epoch 2\n", | |
| "Processing epoch 3\n", | |
| "Processing epoch 4\n", | |
| "Processing epoch 5\n", | |
| "Processing epoch 6\n", | |
| "Processing epoch 7\n", | |
| "Processing epoch 8\n", | |
| "Processing epoch 9\n", | |
| "Processing epoch 0\n", | |
| "Processing epoch 1\n", | |
| "Processing epoch 2\n", | |
| "Processing epoch 3\n", | |
| "Processing epoch 4\n", | |
| "Processing epoch 5\n", | |
| "Processing epoch 6\n", | |
| "Processing epoch 7\n", | |
| "Processing epoch 8\n", | |
| "Processing epoch 9\n", | |
| "Processing epoch 0\n", | |
| "Processing epoch 1\n", | |
| "Processing epoch 2\n", | |
| "Processing epoch 3\n", | |
| "Processing epoch 4\n", | |
| "Processing epoch 5\n", | |
| "Processing epoch 6\n", | |
| "Processing epoch 7\n", | |
| "Processing epoch 8\n", | |
| "Processing epoch 9\n", | |
| "Evaluating RMSE, MAE of algorithm SVD on 3 split(s).\n", | |
| "\n", | |
| " Fold 1 Fold 2 Fold 3 Mean Std \n", | |
| "RMSE (testset) 0.9905 0.9921 0.9822 0.9882 0.0043 \n", | |
| "MAE (testset) 0.7372 0.7385 0.7340 0.7366 0.0019 \n", | |
| "Fit time 2.44 2.48 2.50 2.47 0.03 \n", | |
| "Test time 0.24 0.24 0.35 0.28 0.05 \n" | |
| ] | |
| }, | |
| { | |
| "data": { | |
| "text/plain": [ | |
| "{'test_rmse': array([0.9904555 , 0.99210045, 0.9821631 ]),\n", | |
| " 'test_mae': array([0.73722976, 0.73846946, 0.7340454 ]),\n", | |
| " 'fit_time': (2.436018228530884, 2.4807779788970947, 2.496033191680908),\n", | |
| " 'test_time': (0.23685503005981445, 0.2428913116455078, 0.3540534973144531)}" | |
| ] | |
| }, | |
| "execution_count": 30, | |
| "metadata": {}, | |
| "output_type": "execute_result" | |
| } | |
| ], | |
| "source": [ | |
| "#Cross-validating the SVD model on the full ratings dataset (data, built from dfs)\n", | |
| "cross_validate(svd, data, measures=['RMSE', 'MAE'], cv=3, verbose=True)" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": 31, | |
| "id": "960235d9", | |
| "metadata": {}, | |
| "outputs": [ | |
| { | |
| "data": { | |
| "text/html": [ | |
| "<div>\n", | |
| "<style scoped>\n", | |
| " .dataframe tbody tr th:only-of-type {\n", | |
| " vertical-align: middle;\n", | |
| " }\n", | |
| "\n", | |
| " .dataframe tbody tr th {\n", | |
| " vertical-align: top;\n", | |
| " }\n", | |
| "\n", | |
| " .dataframe thead th {\n", | |
| " text-align: right;\n", | |
| " }\n", | |
| "</style>\n", | |
| "<table border=\"1\" class=\"dataframe\">\n", | |
| " <thead>\n", | |
| " <tr style=\"text-align: right;\">\n", | |
| " <th>productId</th>\n", | |
| " <th>0594451647</th>\n", | |
| " <th>0594481813</th>\n", | |
| " <th>0970407998</th>\n", | |
| " <th>0972683275</th>\n", | |
| " <th>1400501466</th>\n", | |
| " <th>1400501520</th>\n", | |
| " <th>1400501776</th>\n", | |
| " <th>1400532620</th>\n", | |
| " <th>1400532655</th>\n", | |
| " <th>140053271X</th>\n", | |
| " <th>...</th>\n", | |
| " <th>B00L5YZCCG</th>\n", | |
| " <th>B00L8I6SFY</th>\n", | |
| " <th>B00L8QCVL6</th>\n", | |
| " <th>B00LA6T0LS</th>\n", | |
| " <th>B00LBZ1Z7K</th>\n", | |
| " <th>B00LED02VY</th>\n", | |
| " <th>B00LGN7Y3G</th>\n", | |
| " <th>B00LGQ6HL8</th>\n", | |
| " <th>B00LI4ZZO8</th>\n", | |
| " <th>B00LKG1MC8</th>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>userId</th>\n", | |
| " <th></th>\n", | |
| " <th></th>\n", | |
| " <th></th>\n", | |
| " <th></th>\n", | |
| " <th></th>\n", | |
| " <th></th>\n", | |
| " <th></th>\n", | |
| " <th></th>\n", | |
| " <th></th>\n", | |
| " <th></th>\n", | |
| " <th></th>\n", | |
| " <th></th>\n", | |
| " <th></th>\n", | |
| " <th></th>\n", | |
| " <th></th>\n", | |
| " <th></th>\n", | |
| " <th></th>\n", | |
| " <th></th>\n", | |
| " <th></th>\n", | |
| " <th></th>\n", | |
| " <th></th>\n", | |
| " </tr>\n", | |
| " </thead>\n", | |
| " <tbody>\n", | |
| " <tr>\n", | |
| " <th>A100UD67AHFODS</th>\n", | |
| " <td>0.0</td>\n", | |
| " <td>0.0</td>\n", | |
| " <td>0.0</td>\n", | |
| " <td>0.0</td>\n", | |
| " <td>0.0</td>\n", | |
| " <td>0.0</td>\n", | |
| " <td>0.0</td>\n", | |
| " <td>0.0</td>\n", | |
| " <td>0.0</td>\n", | |
| " <td>0.0</td>\n", | |
| " <td>...</td>\n", | |
| " <td>0.0</td>\n", | |
| " <td>0.0</td>\n", | |
| " <td>0.0</td>\n", | |
| " <td>0.0</td>\n", | |
| " <td>0.0</td>\n", | |
| " <td>0.0</td>\n", | |
| " <td>0.0</td>\n", | |
| " <td>0.0</td>\n", | |
| " <td>0.0</td>\n", | |
| " <td>0.0</td>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>A100WO06OQR8BQ</th>\n", | |
| " <td>0.0</td>\n", | |
| " <td>0.0</td>\n", | |
| " <td>0.0</td>\n", | |
| " <td>0.0</td>\n", | |
| " <td>0.0</td>\n", | |
| " <td>0.0</td>\n", | |
| " <td>0.0</td>\n", | |
| " <td>0.0</td>\n", | |
| " <td>0.0</td>\n", | |
| " <td>0.0</td>\n", | |
| " <td>...</td>\n", | |
| " <td>0.0</td>\n", | |
| " <td>0.0</td>\n", | |
| " <td>0.0</td>\n", | |
| " <td>0.0</td>\n", | |
| " <td>0.0</td>\n", | |
| " <td>0.0</td>\n", | |
| " <td>0.0</td>\n", | |
| " <td>0.0</td>\n", | |
| " <td>0.0</td>\n", | |
| " <td>0.0</td>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>A105S56ODHGJEK</th>\n", | |
| " <td>0.0</td>\n", | |
| " <td>0.0</td>\n", | |
| " <td>0.0</td>\n", | |
| " <td>0.0</td>\n", | |
| " <td>0.0</td>\n", | |
| " <td>0.0</td>\n", | |
| " <td>0.0</td>\n", | |
| " <td>0.0</td>\n", | |
| " <td>0.0</td>\n", | |
| " <td>0.0</td>\n", | |
| " <td>...</td>\n", | |
| " <td>0.0</td>\n", | |
| " <td>0.0</td>\n", | |
| " <td>0.0</td>\n", | |
| " <td>0.0</td>\n", | |
| " <td>0.0</td>\n", | |
| " <td>0.0</td>\n", | |
| " <td>0.0</td>\n", | |
| " <td>0.0</td>\n", | |
| " <td>0.0</td>\n", | |
| " <td>0.0</td>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>A105TOJ6LTVMBG</th>\n", | |
| " <td>0.0</td>\n", | |
| " <td>0.0</td>\n", | |
| " <td>0.0</td>\n", | |
| " <td>0.0</td>\n", | |
| " <td>0.0</td>\n", | |
| " <td>0.0</td>\n", | |
| " <td>0.0</td>\n", | |
| " <td>0.0</td>\n", | |
| " <td>0.0</td>\n", | |
| " <td>0.0</td>\n", | |
| " <td>...</td>\n", | |
| " <td>0.0</td>\n", | |
| " <td>0.0</td>\n", | |
| " <td>0.0</td>\n", | |
| " <td>0.0</td>\n", | |
| " <td>0.0</td>\n", | |
| " <td>0.0</td>\n", | |
| " <td>0.0</td>\n", | |
| " <td>0.0</td>\n", | |
| " <td>0.0</td>\n", | |
| " <td>0.0</td>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>A10AFVU66A79Y1</th>\n", | |
| " <td>0.0</td>\n", | |
| " <td>0.0</td>\n", | |
| " <td>0.0</td>\n", | |
| " <td>0.0</td>\n", | |
| " <td>0.0</td>\n", | |
| " <td>0.0</td>\n", | |
| " <td>0.0</td>\n", | |
| " <td>0.0</td>\n", | |
| " <td>0.0</td>\n", | |
| " <td>0.0</td>\n", | |
| " <td>...</td>\n", | |
| " <td>0.0</td>\n", | |
| " <td>0.0</td>\n", | |
| " <td>0.0</td>\n", | |
| " <td>0.0</td>\n", | |
| " <td>0.0</td>\n", | |
| " <td>0.0</td>\n", | |
| " <td>0.0</td>\n", | |
| " <td>0.0</td>\n", | |
| " <td>0.0</td>\n", | |
| " <td>0.0</td>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>...</th>\n", | |
| " <td>...</td>\n", | |
| " <td>...</td>\n", | |
| " <td>...</td>\n", | |
| " <td>...</td>\n", | |
| " <td>...</td>\n", | |
| " <td>...</td>\n", | |
| " <td>...</td>\n", | |
| " <td>...</td>\n", | |
| " <td>...</td>\n", | |
| " <td>...</td>\n", | |
| " <td>...</td>\n", | |
| " <td>...</td>\n", | |
| " <td>...</td>\n", | |
| " <td>...</td>\n", | |
| " <td>...</td>\n", | |
| " <td>...</td>\n", | |
| " <td>...</td>\n", | |
| " <td>...</td>\n", | |
| " <td>...</td>\n", | |
| " <td>...</td>\n", | |
| " <td>...</td>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>AZBXKUH4AIW3X</th>\n", | |
| " <td>0.0</td>\n", | |
| " <td>0.0</td>\n", | |
| " <td>0.0</td>\n", | |
| " <td>0.0</td>\n", | |
| " <td>0.0</td>\n", | |
| " <td>0.0</td>\n", | |
| " <td>0.0</td>\n", | |
| " <td>0.0</td>\n", | |
| " <td>0.0</td>\n", | |
| " <td>0.0</td>\n", | |
| " <td>...</td>\n", | |
| " <td>0.0</td>\n", | |
| " <td>0.0</td>\n", | |
| " <td>0.0</td>\n", | |
| " <td>0.0</td>\n", | |
| " <td>0.0</td>\n", | |
| " <td>0.0</td>\n", | |
| " <td>0.0</td>\n", | |
| " <td>0.0</td>\n", | |
| " <td>0.0</td>\n", | |
| " <td>0.0</td>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>AZCE11PSTCH1L</th>\n", | |
| " <td>0.0</td>\n", | |
| " <td>0.0</td>\n", | |
| " <td>0.0</td>\n", | |
| " <td>0.0</td>\n", | |
| " <td>0.0</td>\n", | |
| " <td>0.0</td>\n", | |
| " <td>0.0</td>\n", | |
| " <td>0.0</td>\n", | |
| " <td>0.0</td>\n", | |
| " <td>0.0</td>\n", | |
| " <td>...</td>\n", | |
| " <td>0.0</td>\n", | |
| " <td>0.0</td>\n", | |
| " <td>0.0</td>\n", | |
| " <td>0.0</td>\n", | |
| " <td>0.0</td>\n", | |
| " <td>0.0</td>\n", | |
| " <td>0.0</td>\n", | |
| " <td>0.0</td>\n", | |
| " <td>0.0</td>\n", | |
| " <td>0.0</td>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>AZMY6E8B52L2T</th>\n", | |
| " <td>0.0</td>\n", | |
| " <td>0.0</td>\n", | |
| " <td>0.0</td>\n", | |
| " <td>0.0</td>\n", | |
| " <td>0.0</td>\n", | |
| " <td>0.0</td>\n", | |
| " <td>0.0</td>\n", | |
| " <td>0.0</td>\n", | |
| " <td>0.0</td>\n", | |
| " <td>0.0</td>\n", | |
| " <td>...</td>\n", | |
| " <td>0.0</td>\n", | |
| " <td>0.0</td>\n", | |
| " <td>0.0</td>\n", | |
| " <td>0.0</td>\n", | |
| " <td>0.0</td>\n", | |
| " <td>0.0</td>\n", | |
| " <td>0.0</td>\n", | |
| " <td>0.0</td>\n", | |
| " <td>0.0</td>\n", | |
| " <td>0.0</td>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>AZNUHQSHZHSUE</th>\n", | |
| " <td>0.0</td>\n", | |
| " <td>0.0</td>\n", | |
| " <td>0.0</td>\n", | |
| " <td>0.0</td>\n", | |
| " <td>0.0</td>\n", | |
| " <td>0.0</td>\n", | |
| " <td>0.0</td>\n", | |
| " <td>0.0</td>\n", | |
| " <td>0.0</td>\n", | |
| " <td>0.0</td>\n", | |
| " <td>...</td>\n", | |
| " <td>0.0</td>\n", | |
| " <td>0.0</td>\n", | |
| " <td>0.0</td>\n", | |
| " <td>0.0</td>\n", | |
| " <td>0.0</td>\n", | |
| " <td>0.0</td>\n", | |
| " <td>0.0</td>\n", | |
| " <td>0.0</td>\n", | |
| " <td>0.0</td>\n", | |
| " <td>0.0</td>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>AZOK5STV85FBJ</th>\n", | |
| " <td>0.0</td>\n", | |
| " <td>0.0</td>\n", | |
| " <td>0.0</td>\n", | |
| " <td>0.0</td>\n", | |
| " <td>0.0</td>\n", | |
| " <td>0.0</td>\n", | |
| " <td>0.0</td>\n", | |
| " <td>0.0</td>\n", | |
| " <td>0.0</td>\n", | |
| " <td>0.0</td>\n", | |
| " <td>...</td>\n", | |
| " <td>0.0</td>\n", | |
| " <td>0.0</td>\n", | |
| " <td>0.0</td>\n", | |
| " <td>0.0</td>\n", | |
| " <td>0.0</td>\n", | |
| " <td>0.0</td>\n", | |
| " <td>0.0</td>\n", | |
| " <td>0.0</td>\n", | |
| " <td>0.0</td>\n", | |
| " <td>0.0</td>\n", | |
| " </tr>\n", | |
| " </tbody>\n", | |
| "</table>\n", | |
| "<p>1540 rows × 48190 columns</p>\n", | |
| "</div>" | |
| ], | |
| "text/plain": [ | |
| "productId 0594451647 0594481813 0970407998 0972683275 1400501466 \\\n", | |
| "userId \n", | |
| "A100UD67AHFODS 0.0 0.0 0.0 0.0 0.0 \n", | |
| "A100WO06OQR8BQ 0.0 0.0 0.0 0.0 0.0 \n", | |
| "A105S56ODHGJEK 0.0 0.0 0.0 0.0 0.0 \n", | |
| "A105TOJ6LTVMBG 0.0 0.0 0.0 0.0 0.0 \n", | |
| "A10AFVU66A79Y1 0.0 0.0 0.0 0.0 0.0 \n", | |
| "... ... ... ... ... ... \n", | |
| "AZBXKUH4AIW3X 0.0 0.0 0.0 0.0 0.0 \n", | |
| "AZCE11PSTCH1L 0.0 0.0 0.0 0.0 0.0 \n", | |
| "AZMY6E8B52L2T 0.0 0.0 0.0 0.0 0.0 \n", | |
| "AZNUHQSHZHSUE 0.0 0.0 0.0 0.0 0.0 \n", | |
| "AZOK5STV85FBJ 0.0 0.0 0.0 0.0 0.0 \n", | |
| "\n", | |
| "productId 1400501520 1400501776 1400532620 1400532655 140053271X \\\n", | |
| "userId \n", | |
| "A100UD67AHFODS 0.0 0.0 0.0 0.0 0.0 \n", | |
| "A100WO06OQR8BQ 0.0 0.0 0.0 0.0 0.0 \n", | |
| "A105S56ODHGJEK 0.0 0.0 0.0 0.0 0.0 \n", | |
| "A105TOJ6LTVMBG 0.0 0.0 0.0 0.0 0.0 \n", | |
| "A10AFVU66A79Y1 0.0 0.0 0.0 0.0 0.0 \n", | |
| "... ... ... ... ... ... \n", | |
| "AZBXKUH4AIW3X 0.0 0.0 0.0 0.0 0.0 \n", | |
| "AZCE11PSTCH1L 0.0 0.0 0.0 0.0 0.0 \n", | |
| "AZMY6E8B52L2T 0.0 0.0 0.0 0.0 0.0 \n", | |
| "AZNUHQSHZHSUE 0.0 0.0 0.0 0.0 0.0 \n", | |
| "AZOK5STV85FBJ 0.0 0.0 0.0 0.0 0.0 \n", | |
| "\n", | |
| "productId ... B00L5YZCCG B00L8I6SFY B00L8QCVL6 B00LA6T0LS \\\n", | |
| "userId ... \n", | |
| "A100UD67AHFODS ... 0.0 0.0 0.0 0.0 \n", | |
| "A100WO06OQR8BQ ... 0.0 0.0 0.0 0.0 \n", | |
| "A105S56ODHGJEK ... 0.0 0.0 0.0 0.0 \n", | |
| "A105TOJ6LTVMBG ... 0.0 0.0 0.0 0.0 \n", | |
| "A10AFVU66A79Y1 ... 0.0 0.0 0.0 0.0 \n", | |
| "... ... ... ... ... ... \n", | |
| "AZBXKUH4AIW3X ... 0.0 0.0 0.0 0.0 \n", | |
| "AZCE11PSTCH1L ... 0.0 0.0 0.0 0.0 \n", | |
| "AZMY6E8B52L2T ... 0.0 0.0 0.0 0.0 \n", | |
| "AZNUHQSHZHSUE ... 0.0 0.0 0.0 0.0 \n", | |
| "AZOK5STV85FBJ ... 0.0 0.0 0.0 0.0 \n", | |
| "\n", | |
| "productId B00LBZ1Z7K B00LED02VY B00LGN7Y3G B00LGQ6HL8 B00LI4ZZO8 \\\n", | |
| "userId \n", | |
| "A100UD67AHFODS 0.0 0.0 0.0 0.0 0.0 \n", | |
| "A100WO06OQR8BQ 0.0 0.0 0.0 0.0 0.0 \n", | |
| "A105S56ODHGJEK 0.0 0.0 0.0 0.0 0.0 \n", | |
| "A105TOJ6LTVMBG 0.0 0.0 0.0 0.0 0.0 \n", | |
| "A10AFVU66A79Y1 0.0 0.0 0.0 0.0 0.0 \n", | |
| "... ... ... ... ... ... \n", | |
| "AZBXKUH4AIW3X 0.0 0.0 0.0 0.0 0.0 \n", | |
| "AZCE11PSTCH1L 0.0 0.0 0.0 0.0 0.0 \n", | |
| "AZMY6E8B52L2T 0.0 0.0 0.0 0.0 0.0 \n", | |
| "AZNUHQSHZHSUE 0.0 0.0 0.0 0.0 0.0 \n", | |
| "AZOK5STV85FBJ 0.0 0.0 0.0 0.0 0.0 \n", | |
| "\n", | |
| "productId B00LKG1MC8 \n", | |
| "userId \n", | |
| "A100UD67AHFODS 0.0 \n", | |
| "A100WO06OQR8BQ 0.0 \n", | |
| "A105S56ODHGJEK 0.0 \n", | |
| "A105TOJ6LTVMBG 0.0 \n", | |
| "A10AFVU66A79Y1 0.0 \n", | |
| "... ... \n", | |
| "AZBXKUH4AIW3X 0.0 \n", | |
| "AZCE11PSTCH1L 0.0 \n", | |
| "AZMY6E8B52L2T 0.0 \n", | |
| "AZNUHQSHZHSUE 0.0 \n", | |
| "AZOK5STV85FBJ 0.0 \n", | |
| "\n", | |
| "[1540 rows x 48190 columns]" | |
| ] | |
| }, | |
| "execution_count": 31, | |
| "metadata": {}, | |
| "output_type": "execute_result" | |
| } | |
| ], | |
| "source": [ | |
| "#creating a new dataframe with one row per user and one column per product, holding the ratings (0 where unrated)\n", | |
| "pivot_dfs = dfs.pivot(index = 'userId', columns ='productId', values = 'rating').fillna(0)\n", | |
| "pivot_dfs" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": 32, | |
| "id": "953873a5", | |
| "metadata": {}, | |
| "outputs": [ | |
| { | |
| "data": { | |
| "text/plain": [ | |
| "(1540, 48190)" | |
| ] | |
| }, | |
| "execution_count": 32, | |
| "metadata": {}, | |
| "output_type": "execute_result" | |
| } | |
| ], | |
| "source": [ | |
| "#checking shape of dataframe\n", | |
| "pivot_dfs.shape" | |
| ] | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "id": "4faad9fc", | |
| "metadata": {}, | |
| "source": [ | |
| "### G. Top 5 recommendations based on user habits" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": 33, | |
| "id": "271bf4af", | |
| "metadata": {}, | |
| "outputs": [], | |
| "source": [ | |
| "# Get the top n new recommendations for an individual user; the function takes a userId and the number of recommendations required\n", | |
| "# Previously rated items are excluded from the recommendation list,\n", | |
| "# so that the list is not dominated by items the user has already rated 5 stars\n", | |
| "\n", | |
| "def product_rec(userid, n):\n", | |
| "\n", | |
| "    # Identify the items already rated by the user from the original ratings (dfs),\n", | |
| "    # so repeated calls are not confused by predictions stored in pivot_dfs\n", | |
| "    rated = dfs.loc[dfs['userId'] == userid, 'productId'].unique()\n", | |
| "\n", | |
| "    # To avoid overwriting existing ratings, only unrated products are predicted\n", | |
| "    unrated = pivot_dfs.columns.difference(rated)\n", | |
| "    pivot_dfs.loc[userid, unrated] = [svd.predict(uid=userid, iid=i).est for i in unrated]\n", | |
| "\n", | |
| "    # Sort the products by rating in descending order\n", | |
| "    rec_list = pd.DataFrame(pivot_dfs.loc[userid].sort_values(ascending=False))\n", | |
| "\n", | |
| "    # Drop the identified rated items\n", | |
| "    rec_list = rec_list.drop(index=rated)\n", | |
| "\n", | |
| "    return rec_list.head(n)" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": 34, | |
| "id": "9ece0f5b", | |
| "metadata": {}, | |
| "outputs": [ | |
| { | |
| "data": { | |
| "text/html": [ | |
| "<div>\n", | |
| "<style scoped>\n", | |
| " .dataframe tbody tr th:only-of-type {\n", | |
| " vertical-align: middle;\n", | |
| " }\n", | |
| "\n", | |
| " .dataframe tbody tr th {\n", | |
| " vertical-align: top;\n", | |
| " }\n", | |
| "\n", | |
| " .dataframe thead th {\n", | |
| " text-align: right;\n", | |
| " }\n", | |
| "</style>\n", | |
| "<table border=\"1\" class=\"dataframe\">\n", | |
| " <thead>\n", | |
| " <tr style=\"text-align: right;\">\n", | |
| " <th></th>\n", | |
| " <th>A105TOJ6LTVMBG</th>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>productId</th>\n", | |
| " <th></th>\n", | |
| " </tr>\n", | |
| " </thead>\n", | |
| " <tbody>\n", | |
| " <tr>\n", | |
| " <th>B003ES5ZUU</th>\n", | |
| " <td>4.635203</td>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>B001TUYTZM</th>\n", | |
| " <td>4.572009</td>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>B000N99BBC</th>\n", | |
| " <td>4.552072</td>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>B0052SCU8U</th>\n", | |
| " <td>4.540869</td>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>B00005LEN4</th>\n", | |
| " <td>4.540526</td>\n", | |
| " </tr>\n", | |
| " </tbody>\n", | |
| "</table>\n", | |
| "</div>" | |
| ], | |
| "text/plain": [ | |
| " A105TOJ6LTVMBG\n", | |
| "productId \n", | |
| "B003ES5ZUU 4.635203\n", | |
| "B001TUYTZM 4.572009\n", | |
| "B000N99BBC 4.552072\n", | |
| "B0052SCU8U 4.540869\n", | |
| "B00005LEN4 4.540526" | |
| ] | |
| }, | |
| "execution_count": 34, | |
| "metadata": {}, | |
| "output_type": "execute_result" | |
| } | |
| ], | |
| "source": [ | |
| "product_rec('A105TOJ6LTVMBG',5)" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": 36, | |
| "id": "fbb7f764", | |
| "metadata": {}, | |
| "outputs": [ | |
| { | |
| "data": { | |
| "text/html": [ | |
| "<div>\n", | |
| "<style scoped>\n", | |
| " .dataframe tbody tr th:only-of-type {\n", | |
| " vertical-align: middle;\n", | |
| " }\n", | |
| "\n", | |
| " .dataframe tbody tr th {\n", | |
| " vertical-align: top;\n", | |
| " }\n", | |
| "\n", | |
| " .dataframe thead th {\n", | |
| " text-align: right;\n", | |
| " }\n", | |
| "</style>\n", | |
| "<table border=\"1\" class=\"dataframe\">\n", | |
| " <thead>\n", | |
| " <tr style=\"text-align: right;\">\n", | |
| " <th></th>\n", | |
| " <th>A10AFVU66A79Y1</th>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>productId</th>\n", | |
| " <th></th>\n", | |
| " </tr>\n", | |
| " </thead>\n", | |
| " <tbody>\n", | |
| " <tr>\n", | |
| " <th>B002WE6D44</th>\n", | |
| " <td>4.823729</td>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>B0052SCU8U</th>\n", | |
| " <td>4.812853</td>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>B0000BZL1P</th>\n", | |
| " <td>4.790149</td>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>B00BQ4F9ZA</th>\n", | |
| " <td>4.737074</td>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>B001TH7GUU</th>\n", | |
| " <td>4.732855</td>\n", | |
| " </tr>\n", | |
| " </tbody>\n", | |
| "</table>\n", | |
| "</div>" | |
| ], | |
| "text/plain": [ | |
| " A10AFVU66A79Y1\n", | |
| "productId \n", | |
| "B002WE6D44 4.823729\n", | |
| "B0052SCU8U 4.812853\n", | |
| "B0000BZL1P 4.790149\n", | |
| "B00BQ4F9ZA 4.737074\n", | |
| "B001TH7GUU 4.732855" | |
| ] | |
| }, | |
| "execution_count": 36, | |
| "metadata": {}, | |
| "output_type": "execute_result" | |
| } | |
| ], | |
| "source": [ | |
| "product_rec('A10AFVU66A79Y1',5)" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": 37, | |
| "id": "6cfc55d4", | |
| "metadata": {}, | |
| "outputs": [ | |
| { | |
| "data": { | |
| "text/html": [ | |
| "<div>\n", | |
| "<style scoped>\n", | |
| " .dataframe tbody tr th:only-of-type {\n", | |
| " vertical-align: middle;\n", | |
| " }\n", | |
| "\n", | |
| " .dataframe tbody tr th {\n", | |
| " vertical-align: top;\n", | |
| " }\n", | |
| "\n", | |
| " .dataframe thead th {\n", | |
| " text-align: right;\n", | |
| " }\n", | |
| "</style>\n", | |
| "<table border=\"1\" class=\"dataframe\">\n", | |
| " <thead>\n", | |
| " <tr style=\"text-align: right;\">\n", | |
| " <th></th>\n", | |
| " <th>AZNUHQSHZHSUE</th>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>productId</th>\n", | |
| " <th></th>\n", | |
| " </tr>\n", | |
| " </thead>\n", | |
| " <tbody>\n", | |
| " <tr>\n", | |
| " <th>B003L1ZYZ6</th>\n", | |
| " <td>4.774907</td>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>B00DTZYHX4</th>\n", | |
| " <td>4.773021</td>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>B000053HC5</th>\n", | |
| " <td>4.771061</td>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>B0000BZL1P</th>\n", | |
| " <td>4.767102</td>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>B001TH7GUU</th>\n", | |
| " <td>4.757887</td>\n", | |
| " </tr>\n", | |
| " </tbody>\n", | |
| "</table>\n", | |
| "</div>" | |
| ], | |
| "text/plain": [ | |
| " AZNUHQSHZHSUE\n", | |
| "productId \n", | |
| "B003L1ZYZ6 4.774907\n", | |
| "B00DTZYHX4 4.773021\n", | |
| "B000053HC5 4.771061\n", | |
| "B0000BZL1P 4.767102\n", | |
| "B001TH7GUU 4.757887" | |
| ] | |
| }, | |
| "execution_count": 37, | |
| "metadata": {}, | |
| "output_type": "execute_result" | |
| } | |
| ], | |
| "source": [ | |
| "product_rec('AZNUHQSHZHSUE',5)" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": 38, | |
| "id": "5aa36893", | |
| "metadata": {}, | |
| "outputs": [ | |
| { | |
| "data": { | |
| "text/html": [ | |
| "<div>\n", | |
| "<style scoped>\n", | |
| " .dataframe tbody tr th:only-of-type {\n", | |
| " vertical-align: middle;\n", | |
| " }\n", | |
| "\n", | |
| " .dataframe tbody tr th {\n", | |
| " vertical-align: top;\n", | |
| " }\n", | |
| "\n", | |
| " .dataframe thead th {\n", | |
| " text-align: right;\n", | |
| " }\n", | |
| "</style>\n", | |
| "<table border=\"1\" class=\"dataframe\">\n", | |
| " <thead>\n", | |
| " <tr style=\"text-align: right;\">\n", | |
| " <th>productId</th>\n", | |
| " <th>0594451647</th>\n", | |
| " <th>0594481813</th>\n", | |
| " <th>0970407998</th>\n", | |
| " <th>0972683275</th>\n", | |
| " <th>1400501466</th>\n", | |
| " <th>1400501520</th>\n", | |
| " <th>1400501776</th>\n", | |
| " <th>1400532620</th>\n", | |
| " <th>1400532655</th>\n", | |
| " <th>140053271X</th>\n", | |
| " <th>...</th>\n", | |
| " <th>B00L5YZCCG</th>\n", | |
| " <th>B00L8I6SFY</th>\n", | |
| " <th>B00L8QCVL6</th>\n", | |
| " <th>B00LA6T0LS</th>\n", | |
| " <th>B00LBZ1Z7K</th>\n", | |
| " <th>B00LED02VY</th>\n", | |
| " <th>B00LGN7Y3G</th>\n", | |
| " <th>B00LGQ6HL8</th>\n", | |
| " <th>B00LI4ZZO8</th>\n", | |
| " <th>B00LKG1MC8</th>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>userId</th>\n", | |
| " <th></th>\n", | |
| " <th></th>\n", | |
| " <th></th>\n", | |
| " <th></th>\n", | |
| " <th></th>\n", | |
| " <th></th>\n", | |
| " <th></th>\n", | |
| " <th></th>\n", | |
| " <th></th>\n", | |
| " <th></th>\n", | |
| " <th></th>\n", | |
| " <th></th>\n", | |
| " <th></th>\n", | |
| " <th></th>\n", | |
| " <th></th>\n", | |
| " <th></th>\n", | |
| " <th></th>\n", | |
| " <th></th>\n", | |
| " <th></th>\n", | |
| " <th></th>\n", | |
| " <th></th>\n", | |
| " </tr>\n", | |
| " </thead>\n", | |
| " <tbody>\n", | |
| " <tr>\n", | |
| " <th>A100UD67AHFODS</th>\n", | |
| " <td>0.000000</td>\n", | |
| " <td>0.000000</td>\n", | |
| " <td>0.000000</td>\n", | |
| " <td>0.000000</td>\n", | |
| " <td>0.000000</td>\n", | |
| " <td>0.000000</td>\n", | |
| " <td>0.000000</td>\n", | |
| " <td>0.000000</td>\n", | |
| " <td>0.000000</td>\n", | |
| " <td>0.000000</td>\n", | |
| " <td>...</td>\n", | |
| " <td>0.000000</td>\n", | |
| " <td>0.000000</td>\n", | |
| " <td>0.000000</td>\n", | |
| " <td>0.000000</td>\n", | |
| " <td>0.000000</td>\n", | |
| " <td>0.000000</td>\n", | |
| " <td>0.000000</td>\n", | |
| " <td>0.000000</td>\n", | |
| " <td>0.000000</td>\n", | |
| " <td>0.000000</td>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>A100WO06OQR8BQ</th>\n", | |
| " <td>0.000000</td>\n", | |
| " <td>0.000000</td>\n", | |
| " <td>0.000000</td>\n", | |
| " <td>0.000000</td>\n", | |
| " <td>0.000000</td>\n", | |
| " <td>0.000000</td>\n", | |
| " <td>0.000000</td>\n", | |
| " <td>0.000000</td>\n", | |
| " <td>0.000000</td>\n", | |
| " <td>0.000000</td>\n", | |
| " <td>...</td>\n", | |
| " <td>0.000000</td>\n", | |
| " <td>0.000000</td>\n", | |
| " <td>0.000000</td>\n", | |
| " <td>0.000000</td>\n", | |
| " <td>0.000000</td>\n", | |
| " <td>0.000000</td>\n", | |
| " <td>0.000000</td>\n", | |
| " <td>0.000000</td>\n", | |
| " <td>0.000000</td>\n", | |
| " <td>0.000000</td>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>A105S56ODHGJEK</th>\n", | |
| " <td>0.000000</td>\n", | |
| " <td>0.000000</td>\n", | |
| " <td>0.000000</td>\n", | |
| " <td>0.000000</td>\n", | |
| " <td>0.000000</td>\n", | |
| " <td>0.000000</td>\n", | |
| " <td>0.000000</td>\n", | |
| " <td>0.000000</td>\n", | |
| " <td>0.000000</td>\n", | |
| " <td>0.000000</td>\n", | |
| " <td>...</td>\n", | |
| " <td>0.000000</td>\n", | |
| " <td>0.000000</td>\n", | |
| " <td>0.000000</td>\n", | |
| " <td>0.000000</td>\n", | |
| " <td>0.000000</td>\n", | |
| " <td>0.000000</td>\n", | |
| " <td>0.000000</td>\n", | |
| " <td>0.000000</td>\n", | |
| " <td>0.000000</td>\n", | |
| " <td>0.000000</td>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>A105TOJ6LTVMBG</th>\n", | |
| " <td>3.961084</td>\n", | |
| " <td>3.743478</td>\n", | |
| " <td>3.968784</td>\n", | |
| " <td>3.843805</td>\n", | |
| " <td>3.751049</td>\n", | |
| " <td>3.949130</td>\n", | |
| " <td>4.070154</td>\n", | |
| " <td>3.977135</td>\n", | |
| " <td>4.023832</td>\n", | |
| " <td>3.904179</td>\n", | |
| " <td>...</td>\n", | |
| " <td>3.910912</td>\n", | |
| " <td>3.961084</td>\n", | |
| " <td>3.961084</td>\n", | |
| " <td>3.833987</td>\n", | |
| " <td>3.961084</td>\n", | |
| " <td>3.958425</td>\n", | |
| " <td>3.961084</td>\n", | |
| " <td>4.021761</td>\n", | |
| " <td>4.095685</td>\n", | |
| " <td>4.062524</td>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>A10AFVU66A79Y1</th>\n", | |
| " <td>4.212932</td>\n", | |
| " <td>4.229150</td>\n", | |
| " <td>4.073252</td>\n", | |
| " <td>4.317804</td>\n", | |
| " <td>3.951347</td>\n", | |
| " <td>4.428924</td>\n", | |
| " <td>4.316332</td>\n", | |
| " <td>4.368245</td>\n", | |
| " <td>4.269673</td>\n", | |
| " <td>3.972014</td>\n", | |
| " <td>...</td>\n", | |
| " <td>4.242122</td>\n", | |
| " <td>4.212932</td>\n", | |
| " <td>4.212932</td>\n", | |
| " <td>4.093958</td>\n", | |
| " <td>4.212932</td>\n", | |
| " <td>4.484619</td>\n", | |
| " <td>4.212932</td>\n", | |
| " <td>4.300889</td>\n", | |
| " <td>4.277161</td>\n", | |
| " <td>4.453578</td>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>...</th>\n", | |
| " <td>...</td>\n", | |
| " <td>...</td>\n", | |
| " <td>...</td>\n", | |
| " <td>...</td>\n", | |
| " <td>...</td>\n", | |
| " <td>...</td>\n", | |
| " <td>...</td>\n", | |
| " <td>...</td>\n", | |
| " <td>...</td>\n", | |
| " <td>...</td>\n", | |
| " <td>...</td>\n", | |
| " <td>...</td>\n", | |
| " <td>...</td>\n", | |
| " <td>...</td>\n", | |
| " <td>...</td>\n", | |
| " <td>...</td>\n", | |
| " <td>...</td>\n", | |
| " <td>...</td>\n", | |
| " <td>...</td>\n", | |
| " <td>...</td>\n", | |
| " <td>...</td>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>AZBXKUH4AIW3X</th>\n", | |
| " <td>0.000000</td>\n", | |
| " <td>0.000000</td>\n", | |
| " <td>0.000000</td>\n", | |
| " <td>0.000000</td>\n", | |
| " <td>0.000000</td>\n", | |
| " <td>0.000000</td>\n", | |
| " <td>0.000000</td>\n", | |
| " <td>0.000000</td>\n", | |
| " <td>0.000000</td>\n", | |
| " <td>0.000000</td>\n", | |
| " <td>...</td>\n", | |
| " <td>0.000000</td>\n", | |
| " <td>0.000000</td>\n", | |
| " <td>0.000000</td>\n", | |
| " <td>0.000000</td>\n", | |
| " <td>0.000000</td>\n", | |
| " <td>0.000000</td>\n", | |
| " <td>0.000000</td>\n", | |
| " <td>0.000000</td>\n", | |
| " <td>0.000000</td>\n", | |
| " <td>0.000000</td>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>AZCE11PSTCH1L</th>\n", | |
| " <td>0.000000</td>\n", | |
| " <td>0.000000</td>\n", | |
| " <td>0.000000</td>\n", | |
| " <td>0.000000</td>\n", | |
| " <td>0.000000</td>\n", | |
| " <td>0.000000</td>\n", | |
| " <td>0.000000</td>\n", | |
| " <td>0.000000</td>\n", | |
| " <td>0.000000</td>\n", | |
| " <td>0.000000</td>\n", | |
| " <td>...</td>\n", | |
| " <td>0.000000</td>\n", | |
| " <td>0.000000</td>\n", | |
| " <td>0.000000</td>\n", | |
| " <td>0.000000</td>\n", | |
| " <td>0.000000</td>\n", | |
| " <td>0.000000</td>\n", | |
| " <td>0.000000</td>\n", | |
| " <td>0.000000</td>\n", | |
| " <td>0.000000</td>\n", | |
| " <td>0.000000</td>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>AZMY6E8B52L2T</th>\n", | |
| " <td>0.000000</td>\n", | |
| " <td>0.000000</td>\n", | |
| " <td>0.000000</td>\n", | |
| " <td>0.000000</td>\n", | |
| " <td>0.000000</td>\n", | |
| " <td>0.000000</td>\n", | |
| " <td>0.000000</td>\n", | |
| " <td>0.000000</td>\n", | |
| " <td>0.000000</td>\n", | |
| " <td>0.000000</td>\n", | |
| " <td>...</td>\n", | |
| " <td>0.000000</td>\n", | |
| " <td>0.000000</td>\n", | |
| " <td>0.000000</td>\n", | |
| " <td>0.000000</td>\n", | |
| " <td>0.000000</td>\n", | |
| " <td>0.000000</td>\n", | |
| " <td>0.000000</td>\n", | |
| " <td>0.000000</td>\n", | |
| " <td>0.000000</td>\n", | |
| " <td>0.000000</td>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>AZNUHQSHZHSUE</th>\n", | |
| " <td>4.175817</td>\n", | |
| " <td>4.282747</td>\n", | |
| " <td>3.960530</td>\n", | |
| " <td>4.283048</td>\n", | |
| " <td>3.928413</td>\n", | |
| " <td>4.234730</td>\n", | |
| " <td>4.122535</td>\n", | |
| " <td>4.076139</td>\n", | |
| " <td>3.949735</td>\n", | |
| " <td>4.082643</td>\n", | |
| " <td>...</td>\n", | |
| " <td>3.893137</td>\n", | |
| " <td>4.175817</td>\n", | |
| " <td>4.175817</td>\n", | |
| " <td>4.001993</td>\n", | |
| " <td>4.175817</td>\n", | |
| " <td>4.214200</td>\n", | |
| " <td>4.175817</td>\n", | |
| " <td>4.385163</td>\n", | |
| " <td>4.071431</td>\n", | |
| " <td>4.144396</td>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>AZOK5STV85FBJ</th>\n", | |
| " <td>0.000000</td>\n", | |
| " <td>0.000000</td>\n", | |
| " <td>0.000000</td>\n", | |
| " <td>0.000000</td>\n", | |
| " <td>0.000000</td>\n", | |
| " <td>0.000000</td>\n", | |
| " <td>0.000000</td>\n", | |
| " <td>0.000000</td>\n", | |
| " <td>0.000000</td>\n", | |
| " <td>0.000000</td>\n", | |
| " <td>...</td>\n", | |
| " <td>0.000000</td>\n", | |
| " <td>0.000000</td>\n", | |
| " <td>0.000000</td>\n", | |
| " <td>0.000000</td>\n", | |
| " <td>0.000000</td>\n", | |
| " <td>0.000000</td>\n", | |
| " <td>0.000000</td>\n", | |
| " <td>0.000000</td>\n", | |
| " <td>0.000000</td>\n", | |
| " <td>0.000000</td>\n", | |
| " </tr>\n", | |
| " </tbody>\n", | |
| "</table>\n", | |
| "<p>1540 rows × 48190 columns</p>\n", | |
| "</div>" | |
| ], | |
| "text/plain": [ | |
| "productId 0594451647 0594481813 0970407998 0972683275 1400501466 \\\n", | |
| "userId \n", | |
| "A100UD67AHFODS 0.000000 0.000000 0.000000 0.000000 0.000000 \n", | |
| "A100WO06OQR8BQ 0.000000 0.000000 0.000000 0.000000 0.000000 \n", | |
| "A105S56ODHGJEK 0.000000 0.000000 0.000000 0.000000 0.000000 \n", | |
| "A105TOJ6LTVMBG 3.961084 3.743478 3.968784 3.843805 3.751049 \n", | |
| "A10AFVU66A79Y1 4.212932 4.229150 4.073252 4.317804 3.951347 \n", | |
| "... ... ... ... ... ... \n", | |
| "AZBXKUH4AIW3X 0.000000 0.000000 0.000000 0.000000 0.000000 \n", | |
| "AZCE11PSTCH1L 0.000000 0.000000 0.000000 0.000000 0.000000 \n", | |
| "AZMY6E8B52L2T 0.000000 0.000000 0.000000 0.000000 0.000000 \n", | |
| "AZNUHQSHZHSUE 4.175817 4.282747 3.960530 4.283048 3.928413 \n", | |
| "AZOK5STV85FBJ 0.000000 0.000000 0.000000 0.000000 0.000000 \n", | |
| "\n", | |
| "productId 1400501520 1400501776 1400532620 1400532655 140053271X \\\n", | |
| "userId \n", | |
| "A100UD67AHFODS 0.000000 0.000000 0.000000 0.000000 0.000000 \n", | |
| "A100WO06OQR8BQ 0.000000 0.000000 0.000000 0.000000 0.000000 \n", | |
| "A105S56ODHGJEK 0.000000 0.000000 0.000000 0.000000 0.000000 \n", | |
| "A105TOJ6LTVMBG 3.949130 4.070154 3.977135 4.023832 3.904179 \n", | |
| "A10AFVU66A79Y1 4.428924 4.316332 4.368245 4.269673 3.972014 \n", | |
| "... ... ... ... ... ... \n", | |
| "AZBXKUH4AIW3X 0.000000 0.000000 0.000000 0.000000 0.000000 \n", | |
| "AZCE11PSTCH1L 0.000000 0.000000 0.000000 0.000000 0.000000 \n", | |
| "AZMY6E8B52L2T 0.000000 0.000000 0.000000 0.000000 0.000000 \n", | |
| "AZNUHQSHZHSUE 4.234730 4.122535 4.076139 3.949735 4.082643 \n", | |
| "AZOK5STV85FBJ 0.000000 0.000000 0.000000 0.000000 0.000000 \n", | |
| "\n", | |
| "productId ... B00L5YZCCG B00L8I6SFY B00L8QCVL6 B00LA6T0LS \\\n", | |
| "userId ... \n", | |
| "A100UD67AHFODS ... 0.000000 0.000000 0.000000 0.000000 \n", | |
| "A100WO06OQR8BQ ... 0.000000 0.000000 0.000000 0.000000 \n", | |
| "A105S56ODHGJEK ... 0.000000 0.000000 0.000000 0.000000 \n", | |
| "A105TOJ6LTVMBG ... 3.910912 3.961084 3.961084 3.833987 \n", | |
| "A10AFVU66A79Y1 ... 4.242122 4.212932 4.212932 4.093958 \n", | |
| "... ... ... ... ... ... \n", | |
| "AZBXKUH4AIW3X ... 0.000000 0.000000 0.000000 0.000000 \n", | |
| "AZCE11PSTCH1L ... 0.000000 0.000000 0.000000 0.000000 \n", | |
| "AZMY6E8B52L2T ... 0.000000 0.000000 0.000000 0.000000 \n", | |
| "AZNUHQSHZHSUE ... 3.893137 4.175817 4.175817 4.001993 \n", | |
| "AZOK5STV85FBJ ... 0.000000 0.000000 0.000000 0.000000 \n", | |
| "\n", | |
| "productId B00LBZ1Z7K B00LED02VY B00LGN7Y3G B00LGQ6HL8 B00LI4ZZO8 \\\n", | |
| "userId \n", | |
| "A100UD67AHFODS 0.000000 0.000000 0.000000 0.000000 0.000000 \n", | |
| "A100WO06OQR8BQ 0.000000 0.000000 0.000000 0.000000 0.000000 \n", | |
| "A105S56ODHGJEK 0.000000 0.000000 0.000000 0.000000 0.000000 \n", | |
| "A105TOJ6LTVMBG 3.961084 3.958425 3.961084 4.021761 4.095685 \n", | |
| "A10AFVU66A79Y1 4.212932 4.484619 4.212932 4.300889 4.277161 \n", | |
| "... ... ... ... ... ... \n", | |
| "AZBXKUH4AIW3X 0.000000 0.000000 0.000000 0.000000 0.000000 \n", | |
| "AZCE11PSTCH1L 0.000000 0.000000 0.000000 0.000000 0.000000 \n", | |
| "AZMY6E8B52L2T 0.000000 0.000000 0.000000 0.000000 0.000000 \n", | |
| "AZNUHQSHZHSUE 4.175817 4.214200 4.175817 4.385163 4.071431 \n", | |
| "AZOK5STV85FBJ 0.000000 0.000000 0.000000 0.000000 0.000000 \n", | |
| "\n", | |
| "productId B00LKG1MC8 \n", | |
| "userId \n", | |
| "A100UD67AHFODS 0.000000 \n", | |
| "A100WO06OQR8BQ 0.000000 \n", | |
| "A105S56ODHGJEK 0.000000 \n", | |
| "A105TOJ6LTVMBG 4.062524 \n", | |
| "A10AFVU66A79Y1 4.453578 \n", | |
| "... ... \n", | |
| "AZBXKUH4AIW3X 0.000000 \n", | |
| "AZCE11PSTCH1L 0.000000 \n", | |
| "AZMY6E8B52L2T 0.000000 \n", | |
| "AZNUHQSHZHSUE 4.144396 \n", | |
| "AZOK5STV85FBJ 0.000000 \n", | |
| "\n", | |
| "[1540 rows x 48190 columns]" | |
| ] | |
| }, | |
| "execution_count": 38, | |
| "metadata": {}, | |
| "output_type": "execute_result" | |
| } | |
| ], | |
| "source": [ | |
| "# Display the pivot table of predicted ratings (rows: users, columns: products)\n", | |
| "pivot_dfs" | |
| ] | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "id": "a158484a", | |
| "metadata": {}, | |
| "source": [ | |
| "### H. Summary and Insights" | |
| ] | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "id": "2a1d42bf", | |
| "metadata": {}, | |
| "source": [ | |
| "- Popularity-based recommendation systems do not personalize: they give the same recommendations to every user.\n", | |
| "\n", | |
| "- Collaborative filtering models estimate a rating for each product from the ratings users have already given. Here the model is SVD; the surprise library makes it easy to fit and less computationally expensive to score all products at once.\n", | |
| "- The approach could also be applied within product categories, which would reduce the computational cost further and focus recommendations on the categories a user browses." | |
| ] | |
| } | |
| ], | |
| "metadata": { | |
| "kernelspec": { | |
| "display_name": "Python 3", | |
| "language": "python", | |
| "name": "python3" | |
| }, | |
| "language_info": { | |
| "codemirror_mode": { | |
| "name": "ipython", | |
| "version": 3 | |
| }, | |
| "file_extension": ".py", | |
| "mimetype": "text/x-python", | |
| "name": "python", | |
| "nbconvert_exporter": "python", | |
| "pygments_lexer": "ipython3", | |
| "version": "3.8.8" | |
| } | |
| }, | |
| "nbformat": 4, | |
| "nbformat_minor": 5 | |
| } |
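The contrast drawn in the summary, between a popularity-based list that is identical for every user and a personalized top-n taken from predicted ratings, can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the notebook's `product_rec` implementation: the toy ratings, the `predicted` table (standing in for a model's output such as `pivot_dfs`), the function names, and the `allowed` category filter are all hypothetical.

```python
from collections import defaultdict

# Toy (userId, productId, rating) triples; hypothetical data, not from the notebook.
ratings = [
    ("u1", "p1", 5.0), ("u1", "p2", 3.0),
    ("u2", "p1", 4.0), ("u2", "p3", 2.0),
    ("u3", "p1", 5.0), ("u3", "p2", 4.0), ("u3", "p3", 4.0),
]


def popularity_top_n(ratings, n):
    """Rank products by rating count, then mean rating. The user is ignored."""
    totals = defaultdict(lambda: [0, 0.0])  # productId -> [count, sum of ratings]
    for _, product, rating in ratings:
        totals[product][0] += 1
        totals[product][1] += rating
    ranked = sorted(
        totals,
        key=lambda p: (totals[p][0], totals[p][1] / totals[p][0]),
        reverse=True,
    )
    return ranked[:n]


def personalized_top_n(predicted, user, n, allowed=None):
    """Top-n products for one user from a {user: {product: predicted_rating}} table.

    `allowed` is an optional set of product ids (for example one category);
    restricting to it means fewer candidates have to be ranked.
    """
    row = predicted[user]
    candidates = [p for p in row if allowed is None or p in allowed]
    return sorted(candidates, key=row.get, reverse=True)[:n]


# Hypothetical predicted ratings, standing in for the output of a fitted model.
predicted = {
    "u1": {"p1": 4.8, "p2": 3.1, "p3": 4.2},
    "u2": {"p1": 3.9, "p2": 4.5, "p3": 2.2},
}

print(popularity_top_n(ratings, 2))            # same list whoever asks
print(personalized_top_n(predicted, "u1", 2))  # differs per user
print(personalized_top_n(predicted, "u2", 2))
print(personalized_top_n(predicted, "u1", 2, allowed={"p2", "p3"}))
```

The popularity function never looks at the user id, which is why its output cannot vary between users. The personalized function only sorts one row of predictions, so passing a category through `allowed` shrinks the candidate set before ranking.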