Created
March 9, 2022 13:03
pycaret_ts_preprocesing.ipynb
| { | |
| "nbformat": 4, | |
| "nbformat_minor": 0, | |
| "metadata": { | |
| "colab": { | |
| "name": "pycaret_ts_preprocesing.ipynb", | |
| "provenance": [], | |
| "collapsed_sections": [], | |
| "include_colab_link": true | |
| }, | |
| "kernelspec": { | |
| "name": "python3", | |
| "display_name": "Python 3" | |
| }, | |
| "language_info": { | |
| "name": "python" | |
| } | |
| }, | |
| "cells": [ | |
| { | |
| "cell_type": "markdown", | |
| "metadata": { | |
| "id": "view-in-github", | |
| "colab_type": "text" | |
| }, | |
| "source": [ | |
| "<a href=\"https://colab.research.google.com/gist/ngupta23/891c6d4e17df7fe538008eb34a99c044/sktime_preprocesing.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>" | |
| ] | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "source": [ | |
| "# Installation" | |
| ], | |
| "metadata": { | |
| "id": "AS6S5gWAK5Cd" | |
| } | |
| }, | |
| { | |
| "cell_type": "code", | |
| "metadata": { | |
| "colab": { | |
| "base_uri": "https://localhost:8080/" | |
| }, | |
| "id": "iqlsAF56kHx9", | |
| "outputId": "0ecf564e-2d0b-4a15-ec90-3e032c4187be" | |
| }, | |
| "source": [ | |
| "!pip install sktime\n", | |
| "!pip install pmdarima" | |
| ], | |
| "execution_count": null, | |
| "outputs": [ | |
| { | |
| "output_type": "stream", | |
| "name": "stdout", | |
| "text": [ | |
| "Requirement already satisfied: sktime in /usr/local/lib/python3.7/dist-packages (0.10.1)\n", | |
| "Requirement already satisfied: numba>=0.53 in /usr/local/lib/python3.7/dist-packages (from sktime) (0.55.1)\n", | |
| "Requirement already satisfied: scipy<1.8.0 in /usr/local/lib/python3.7/dist-packages (from sktime) (1.4.1)\n", | |
| "Requirement already satisfied: statsmodels>=0.12.1 in /usr/local/lib/python3.7/dist-packages (from sktime) (0.13.2)\n", | |
| "Requirement already satisfied: pandas<1.5.0,>=1.1.0 in /usr/local/lib/python3.7/dist-packages (from sktime) (1.3.5)\n", | |
| "Requirement already satisfied: deprecated>=1.2.13 in /usr/local/lib/python3.7/dist-packages (from sktime) (1.2.13)\n", | |
| "Requirement already satisfied: scikit-learn>=0.24.0 in /usr/local/lib/python3.7/dist-packages (from sktime) (1.0.2)\n", | |
| "Requirement already satisfied: numpy<1.22,>=1.21.0 in /usr/local/lib/python3.7/dist-packages (from sktime) (1.21.5)\n", | |
| "Requirement already satisfied: wrapt<2,>=1.10 in /usr/local/lib/python3.7/dist-packages (from deprecated>=1.2.13->sktime) (1.13.3)\n", | |
| "Requirement already satisfied: setuptools in /usr/local/lib/python3.7/dist-packages (from numba>=0.53->sktime) (57.4.0)\n", | |
| "Requirement already satisfied: llvmlite<0.39,>=0.38.0rc1 in /usr/local/lib/python3.7/dist-packages (from numba>=0.53->sktime) (0.38.0)\n", | |
| "Requirement already satisfied: pytz>=2017.3 in /usr/local/lib/python3.7/dist-packages (from pandas<1.5.0,>=1.1.0->sktime) (2018.9)\n", | |
| "Requirement already satisfied: python-dateutil>=2.7.3 in /usr/local/lib/python3.7/dist-packages (from pandas<1.5.0,>=1.1.0->sktime) (2.8.2)\n", | |
| "Requirement already satisfied: six>=1.5 in /usr/local/lib/python3.7/dist-packages (from python-dateutil>=2.7.3->pandas<1.5.0,>=1.1.0->sktime) (1.15.0)\n", | |
| "Requirement already satisfied: threadpoolctl>=2.0.0 in /usr/local/lib/python3.7/dist-packages (from scikit-learn>=0.24.0->sktime) (3.1.0)\n", | |
| "Requirement already satisfied: joblib>=0.11 in /usr/local/lib/python3.7/dist-packages (from scikit-learn>=0.24.0->sktime) (1.1.0)\n", | |
| "Requirement already satisfied: packaging>=21.3 in /usr/local/lib/python3.7/dist-packages (from statsmodels>=0.12.1->sktime) (21.3)\n", | |
| "Requirement already satisfied: patsy>=0.5.2 in /usr/local/lib/python3.7/dist-packages (from statsmodels>=0.12.1->sktime) (0.5.2)\n", | |
| "Requirement already satisfied: pyparsing!=3.0.5,>=2.0.2 in /usr/local/lib/python3.7/dist-packages (from packaging>=21.3->statsmodels>=0.12.1->sktime) (3.0.7)\n", | |
| "Requirement already satisfied: pmdarima in /usr/local/lib/python3.7/dist-packages (1.8.5)\n", | |
| "Requirement already satisfied: scipy>=1.3.2 in /usr/local/lib/python3.7/dist-packages (from pmdarima) (1.4.1)\n", | |
| "Requirement already satisfied: pandas>=0.19 in /usr/local/lib/python3.7/dist-packages (from pmdarima) (1.3.5)\n", | |
| "Requirement already satisfied: joblib>=0.11 in /usr/local/lib/python3.7/dist-packages (from pmdarima) (1.1.0)\n", | |
| "Requirement already satisfied: statsmodels!=0.12.0,>=0.11 in /usr/local/lib/python3.7/dist-packages (from pmdarima) (0.13.2)\n", | |
| "Requirement already satisfied: setuptools!=50.0.0,>=38.6.0 in /usr/local/lib/python3.7/dist-packages (from pmdarima) (57.4.0)\n", | |
| "Requirement already satisfied: Cython!=0.29.18,>=0.29 in /usr/local/lib/python3.7/dist-packages (from pmdarima) (0.29.28)\n", | |
| "Requirement already satisfied: urllib3 in /usr/local/lib/python3.7/dist-packages (from pmdarima) (1.24.3)\n", | |
| "Requirement already satisfied: numpy>=1.19.3 in /usr/local/lib/python3.7/dist-packages (from pmdarima) (1.21.5)\n", | |
| "Requirement already satisfied: scikit-learn>=0.22 in /usr/local/lib/python3.7/dist-packages (from pmdarima) (1.0.2)\n", | |
| "Requirement already satisfied: python-dateutil>=2.7.3 in /usr/local/lib/python3.7/dist-packages (from pandas>=0.19->pmdarima) (2.8.2)\n", | |
| "Requirement already satisfied: pytz>=2017.3 in /usr/local/lib/python3.7/dist-packages (from pandas>=0.19->pmdarima) (2018.9)\n", | |
| "Requirement already satisfied: six>=1.5 in /usr/local/lib/python3.7/dist-packages (from python-dateutil>=2.7.3->pandas>=0.19->pmdarima) (1.15.0)\n", | |
| "Requirement already satisfied: threadpoolctl>=2.0.0 in /usr/local/lib/python3.7/dist-packages (from scikit-learn>=0.22->pmdarima) (3.1.0)\n", | |
| "Requirement already satisfied: packaging>=21.3 in /usr/local/lib/python3.7/dist-packages (from statsmodels!=0.12.0,>=0.11->pmdarima) (21.3)\n", | |
| "Requirement already satisfied: patsy>=0.5.2 in /usr/local/lib/python3.7/dist-packages (from statsmodels!=0.12.0,>=0.11->pmdarima) (0.5.2)\n", | |
| "Requirement already satisfied: pyparsing!=3.0.5,>=2.0.2 in /usr/local/lib/python3.7/dist-packages (from packaging>=21.3->statsmodels!=0.12.0,>=0.11->pmdarima) (3.0.7)\n" | |
| ] | |
| } | |
| ] | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "source": [ | |
| "# Basic Example (sktime)" | |
| ], | |
| "metadata": { | |
| "id": "DXPMkmuTGWzG" | |
| } | |
| }, | |
| { | |
| "cell_type": "code", | |
| "metadata": { | |
| "id": "bVWedmaakob2", | |
| "colab": { | |
| "base_uri": "https://localhost:8080/" | |
| }, | |
| "outputId": "717adcbb-4325-4d01-fcc5-03be1170e880" | |
| }, | |
| "source": [ | |
| "#### Step 1: Load data and simulate missing values ----\n", | |
| "import numpy as np\n", | |
| "from sktime.datasets import load_airline\n", | |
| "y = load_airline()\n", | |
| "# Use .iloc for positional assignment; y[2:10] = ... triggers a pandas FutureWarning here\n", | |
| "y.iloc[2:10] = np.nan\n", | |
| "y.iloc[70:80] = np.nan" | |
| ], | |
| "execution_count": null, | |
| "outputs": [] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "metadata": { | |
| "id": "EjsWc5TYkund" | |
| }, | |
| "source": [ | |
| "#### Step 2: Create pipeline with preprocessing ----\n", | |
| "from sktime.forecasting.compose import ForecastingPipeline\n", | |
| "from sktime.transformations.series.impute import Imputer\n", | |
| "from sktime.transformations.series.boxcox import LogTransformer\n", | |
| "from sktime.forecasting.compose import TransformedTargetForecaster\n", | |
| "from sktime.transformations.series.detrend import Deseasonalizer\n", | |
| "from sktime.forecasting.arima import ARIMA\n", | |
| "\n", | |
| "# Preprocessing here works only on the y-values\n", | |
| "forecaster = TransformedTargetForecaster(\n", | |
| " [ \n", | |
| " (\"impute\", Imputer()),\n", | |
| " (\"log\", LogTransformer()),\n", | |
| " (\"deseasonalize\", Deseasonalizer(model=\"multiplicative\", sp=12)),\n", | |
| " (\"model\", ARIMA()),\n", | |
| " ]\n", | |
| ")\n", | |
| "\n", | |
| "# Preprocessing here works only on the X values\n", | |
| "pipe = ForecastingPipeline(\n", | |
| " [\n", | |
| " (\"impute\", Imputer()),\n", | |
| " (\"log\", LogTransformer()),\n", | |
| " (\"forecast\", forecaster)\n", | |
| " ]\n", | |
| ")" | |
| ], | |
| "execution_count": null, | |
| "outputs": [] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "metadata": { | |
| "colab": { | |
| "base_uri": "https://localhost:8080/" | |
| }, | |
| "id": "KVnHO-bJkwBj", | |
| "outputId": "7781db12-a0c0-4d2d-abe2-1eaf571c9393" | |
| }, | |
| "source": [ | |
| "#### Step 3: Train and Predict ----\n", | |
| "pipe.fit(y, X=None, fh=np.arange(1,13))\n", | |
| "pipe.predict(X=None)" | |
| ], | |
| "execution_count": null, | |
| "outputs": [ | |
| { | |
| "output_type": "execute_result", | |
| "data": { | |
| "text/plain": [ | |
| "1961-01 433.542916\n", | |
| "1961-02 420.520173\n", | |
| "1961-03 477.924732\n", | |
| "1961-04 458.315857\n", | |
| "1961-05 457.981903\n", | |
| "1961-06 515.257830\n", | |
| "1961-07 552.394610\n", | |
| "1961-08 551.075341\n", | |
| "1961-09 483.507705\n", | |
| "1961-10 418.166156\n", | |
| "1961-11 369.678009\n", | |
| "1961-12 413.889702\n", | |
| "Freq: M, dtype: float64" | |
| ] | |
| }, | |
| "metadata": {}, | |
| "execution_count": 4 | |
| } | |
| ] | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "source": [ | |
| "# More Complex Example (PyCaret)" | |
| ], | |
| "metadata": { | |
| "id": "CLFlYY42pHxj" | |
| } | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "source": [ | |
| "## Helper Functions (Internal to PyCaret)" | |
| ], | |
| "metadata": { | |
| "id": "2tuHJGzu-uDJ" | |
| } | |
| }, | |
| { | |
| "cell_type": "code", | |
| "metadata": { | |
| "id": "9e6d4eVNlQ9D" | |
| }, | |
| "source": [ | |
| "# https://www.sktime.org/en/stable/api_reference/auto_generated/sktime.transformations.series.compose.OptionalPassthrough.html\n", | |
| "from sktime.transformations.series.compose import OptionalPassthrough\n" | |
| ], | |
| "execution_count": null, | |
| "outputs": [] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "source": [ | |
| "def _get_imputer(impute):\n", | |
| " if impute is None:\n", | |
| " passthrough = True\n", | |
| " method = \"drift\" # Placeholder (skipped because passthrough=True)\n", | |
| " else:\n", | |
| " allowed_values = [\n", | |
| " \"drift\", \"linear\", \"nearest\", \"constant\", \"mean\", \n", | |
| " \"median\", \"backfill\", \"bfill\", \"pad\", \"ffill\", \"random\"\n", | |
| " ]\n", | |
| " if impute not in allowed_values:\n", | |
| " raise ValueError(f\"Impute value '{impute}' not allowed.\")\n", | |
| " passthrough = False\n", | |
| " method = impute\n", | |
| " imputer = OptionalPassthrough(Imputer(method=method), passthrough=passthrough)\n", | |
| " return imputer" | |
| ], | |
| "metadata": { | |
| "id": "2vc2SF8itM1s" | |
| }, | |
| "execution_count": null, | |
| "outputs": [] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "source": [ | |
| "from sktime.transformations.series.boxcox import BoxCoxTransformer\n", | |
| "from sktime.transformations.series.boxcox import LogTransformer\n", | |
| "from sktime.transformations.series.exponent import SqrtTransformer\n", | |
| "from sktime.transformations.series.exponent import ExponentTransformer\n", | |
| "from sktime.transformations.series.cos import CosineTransformer\n", | |
| "\n", | |
| "def _get_transformer(transform):\n", | |
| " if transform is None:\n", | |
| " transformer = None\n", | |
| " else:\n", | |
| " allowed_values = [\"box-cox\", \"log\", \"sqrt\", \"exp\", \"cos\"]\n", | |
| " if transform not in allowed_values:\n", | |
| " raise ValueError(f\"Transform value '{transform}' not allowed.\")\n", | |
| " passthrough = False\n", | |
| "\n", | |
| " if transform == \"box-cox\":\n", | |
| " transformer = OptionalPassthrough(BoxCoxTransformer(), passthrough=passthrough)\n", | |
| " elif transform == \"log\":\n", | |
| " transformer = OptionalPassthrough(LogTransformer(), passthrough=passthrough)\n", | |
| " elif transform == \"sqrt\":\n", | |
| " transformer = OptionalPassthrough(SqrtTransformer(), passthrough=passthrough)\n", | |
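| " elif transform == \"exp\":\n", | |
| " # NOTE: ExponentTransformer raises values to a power (default power=0.5),\n", | |
| " # so with default arguments it behaves like \"sqrt\" rather than exp(x).\n", | |
| " transformer = OptionalPassthrough(ExponentTransformer(), passthrough=passthrough)\n", | |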
| " elif transform == \"cos\":\n", | |
| " transformer = OptionalPassthrough(CosineTransformer(), passthrough=passthrough)\n", | |
| "\n", | |
| " return transformer" | |
| ], | |
| "metadata": { | |
| "id": "JLzn8gjct9Mi" | |
| }, | |
| "execution_count": null, | |
| "outputs": [] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "source": [ | |
| "from sktime.transformations.series.adapt import TabularToSeriesAdaptor\n", | |
| "from sklearn.preprocessing import StandardScaler\n", | |
| "\n", | |
| "def _get_scaler(scale):\n", | |
| " if scale is None:\n", | |
| " scaler = None\n", | |
| " else:\n", | |
| " allowed_values = [\"z-score\"]\n", | |
| " if scale not in allowed_values:\n", | |
| " raise ValueError(f\"Scale value '{scale}' not allowed.\")\n", | |
| " passthrough = False\n", | |
| "\n", | |
| " if scale == \"z-score\":\n", | |
| " scaler = OptionalPassthrough(TabularToSeriesAdaptor(StandardScaler()), passthrough=passthrough)\n", | |
| "\n", | |
| " return scaler" | |
| ], | |
| "metadata": { | |
| "id": "BgnDf6OTt_sh" | |
| }, | |
| "execution_count": null, | |
| "outputs": [] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "source": [ | |
| "def _get_pipe():\n", | |
| " \"\"\"Uses global variables for now. PyCaret will use class attributes instead.\"\"\"\n", | |
| " imputer_target = _get_imputer(impute=imputate_target)\n", | |
| " imputer_exogenous = _get_imputer(impute=imputate_exogenous)\n", | |
| " transformer_target = _get_transformer(transform=transform_target)\n", | |
| " transformer_exogenous = _get_transformer(transform=transform_exogenous)\n", | |
| " scaler_target = _get_scaler(scale=scale_target)\n", | |
| " scaler_exogenous = _get_scaler(scale=scale_exogenous)\n", | |
| "\n", | |
| " # print(imputer_target)\n", | |
| " # print(imputer_exogenous)\n", | |
| " # print(transformer_target)\n", | |
| " # print(transformer_exogenous)\n", | |
| " # print(scaler_target)\n", | |
| " # print(scaler_exogenous)\n", | |
| "\n", | |
| " target_steps = []\n", | |
| " target_steps.append((\"imputer\", imputer_target))\n", | |
| " if transformer_target is not None:\n", | |
| " target_steps.append((\"transformer\", transformer_target))\n", | |
| " if scaler_target is not None:\n", | |
| " target_steps.append((\"scaler\", scaler_target))\n", | |
| " target_steps.append((\"model\", model))\n", | |
| "\n", | |
| " exog_steps = []\n", | |
| "\n", | |
| " exog_steps.append((\"imputer\", imputer_exogenous))\n", | |
| " if transformer_exogenous is not None:\n", | |
| " exog_steps.append((\"transformer\", transformer_exogenous))\n", | |
| " if scaler_exogenous is not None:\n", | |
| " exog_steps.append((\"scaler\", scaler_exogenous))\n", | |
| "\n", | |
| " exog_steps.append((\"forecaster\", TransformedTargetForecaster(target_steps)))\n", | |
| "\n", | |
| " from sktime.forecasting.compose import ForecastingPipeline\n", | |
| " pipe = ForecastingPipeline(exog_steps)\n", | |
| " return pipe" | |
| ], | |
| "metadata": { | |
| "id": "o6K3O-pd0B-N" | |
| }, | |
| "execution_count": null, | |
| "outputs": [] | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "source": [ | |
| "## User Inputs in Setup\n", | |
| "\n", | |
| "### Imputation\n", | |
| "\n", | |
| "**Proposed Arguments: `imputate_target` and `imputate_exogenous`**\n", | |
| "\n", | |
| "**Options**:\n", | |
| "* None : no imputation is done; otherwise, specify one of the sktime `Imputer` methods below\n", | |
| "* \"drift\" : drift/trend values by sktime.PolynomialTrendForecaster()\n", | |
| "* \"linear\" : linear interpolation, by pd.Series.interpolate()\n", | |
| "* \"nearest\" : use nearest value, by pd.Series.interpolate()\n", | |
| "* \"constant\" : same constant value (given in arg value) for all NaN\n", | |
| "* \"mean\" : pd.Series.mean()\n", | |
| "* \"median\" : pd.Series.median()\n", | |
| "* \"backfill\" or \"bfill\" : adapted from pd.Series.fillna()\n", | |
| "* \"pad\" or \"ffill\" : adapted from pd.Series.fillna()\n", | |
| "* \"random\" : random values between pd.Series.min() and .max()\n", | |
| "* \"forecaster\" : use an sktime Forecaster, given in arg forecaster (TODO: Maybe skip for now)\n", | |
| "\n", | |
| "### Transformation \n", | |
| "\n", | |
| "**Proposed Arguments: `transform_target` and `transform_exogenous`**\n", | |
| "\n", | |
| "**Options**:\n", | |
| "* None\n", | |
| "* \"box-cox\"\n", | |
| "* \"log\"\n", | |
| "* \"sqrt\"\n", | |
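| "* \"exp\" : sktime `ExponentTransformer`, which raises values to a power; its default `power=0.5` makes it equivalent to \"sqrt\", not exp(x)\n", | |
| "* \"cos\"\n", | |
| "* NOTE: Yeo-Johnson is not supported by sktime yet\n", | |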
| "\n", | |
| "### Scaling\n", | |
| "\n", | |
| "**Proposed Arguments: `scale_target` and `scale_exogenous`**\n", | |
| "\n", | |
| "**Options**:\n", | |
| "* None\n", | |
| "* \"z-score\"\n", | |
| "\n" | |
| ], | |
| "metadata": { | |
| "id": "Fg62t0ND_fzR" | |
| } | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "source": [ | |
| "## Exponential Smoothing\n", | |
| "\n", | |
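| "NOTE: `ExponentialSmoothing` does not support missing values. Without imputation, fitting does not raise an error, but it emits a `ConvergenceWarning` and every forecast is `NaN`." | |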
| ], | |
| "metadata": { | |
| "id": "ogSjUZ6xFNIl" | |
| } | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "source": [ | |
| "### Without imputation" | |
| ], | |
| "metadata": { | |
| "id": "cYBabe_6IYYU" | |
| } | |
| }, | |
| { | |
| "cell_type": "code", | |
| "source": [ | |
| "imputate_target = None\n", | |
| "imputate_exogenous = None\n", | |
| "\n", | |
| "transform_target = None \n", | |
| "transform_exogenous = None\n", | |
| "\n", | |
| "scale_target = None\n", | |
| "scale_exogenous = None\n", | |
| "\n", | |
| "# e.g. From `create_model(\"exp_smooth\")`\n", | |
| "# Does not handle missing data ----\n", | |
| "from sktime.forecasting.exp_smoothing import ExponentialSmoothing\n", | |
| "model = ExponentialSmoothing()\n", | |
| "\n", | |
| "pipe = _get_pipe()\n", | |
| "pipe" | |
| ], | |
| "metadata": { | |
| "colab": { | |
| "base_uri": "https://localhost:8080/" | |
| }, | |
| "id": "sauV03hD_CTJ", | |
| "outputId": "df26f9ec-4406-4400-bfac-c2717908da73" | |
| }, | |
| "execution_count": null, | |
| "outputs": [ | |
| { | |
| "output_type": "execute_result", | |
| "data": { | |
| "text/plain": [ | |
| "ForecastingPipeline(steps=[('imputer',\n", | |
| " OptionalPassthrough(passthrough=True,\n", | |
| " transformer=Imputer())),\n", | |
| " ('forecaster',\n", | |
| " TransformedTargetForecaster(steps=[('imputer',\n", | |
| " OptionalPassthrough(passthrough=True,\n", | |
| " transformer=Imputer())),\n", | |
| " ('model',\n", | |
| " ExponentialSmoothing())]))])" | |
| ] | |
| }, | |
| "metadata": {}, | |
| "execution_count": 10 | |
| } | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "source": [ | |
| "pipe.fit(y)\n", | |
| "predictions = pipe.predict(fh=np.arange(1, 13))\n", | |
| "print(predictions)\n", | |
| "\n", | |
| "from sktime.utils.plotting import plot_series\n", | |
| "_ = plot_series(y, predictions)" | |
| ], | |
| "metadata": { | |
| "colab": { | |
| "base_uri": "https://localhost:8080/", | |
| "height": 525 | |
| }, | |
| "id": "3bVSgbnM-aEV", | |
| "outputId": "3d290b21-0845-49ae-e02c-b55f2ad193fd" | |
| }, | |
| "execution_count": null, | |
| "outputs": [ | |
| { | |
| "output_type": "stream", | |
| "name": "stderr", | |
| "text": [ | |
| "/usr/local/lib/python3.7/dist-packages/statsmodels/tsa/holtwinters/model.py:917: ConvergenceWarning: Optimization failed to converge. Check mle_retvals.\n", | |
| " ConvergenceWarning,\n" | |
| ] | |
| }, | |
| { | |
| "output_type": "stream", | |
| "name": "stdout", | |
| "text": [ | |
| "1961-01 NaN\n", | |
| "1961-02 NaN\n", | |
| "1961-03 NaN\n", | |
| "1961-04 NaN\n", | |
| "1961-05 NaN\n", | |
| "1961-06 NaN\n", | |
| "1961-07 NaN\n", | |
| "1961-08 NaN\n", | |
| "1961-09 NaN\n", | |
| "1961-10 NaN\n", | |
| "1961-11 NaN\n", | |
| "1961-12 NaN\n", | |
| "Freq: M, dtype: float64\n" | |
| ] | |
| }, | |
| { | |
| "output_type": "display_data", | |
| "data": { | |
| "image/png": "\n", | |
| "text/plain": [ | |
| "<Figure size 1152x288 with 1 Axes>" | |
| ] | |
| }, | |
| "metadata": { | |
| "needs_background": "light" | |
| } | |
| } | |
| ] | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "source": [ | |
| "" | |
| ], | |
| "metadata": { | |
| "id": "vd26USbnLJMn" | |
| } | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "source": [ | |
| "### With imputation" | |
| ], | |
| "metadata": { | |
| "id": "KfTdF3h7FPI9" | |
| } | |
| }, | |
| { | |
| "cell_type": "code", | |
| "source": [ | |
| "imputate_target = \"drift\"\n", | |
| "imputate_exogenous = None\n", | |
| "\n", | |
| "transform_target = None \n", | |
| "transform_exogenous = None\n", | |
| "\n", | |
| "scale_target = None\n", | |
| "scale_exogenous = None\n", | |
| "\n", | |
| "# e.g. From `create_model(\"exp_smooth\")`\n", | |
| "# Does not handle missing data ----\n", | |
| "from sktime.forecasting.exp_smoothing import ExponentialSmoothing\n", | |
| "model = ExponentialSmoothing()\n", | |
| "\n", | |
| "pipe = _get_pipe()\n", | |
| "pipe" | |
| ], | |
| "metadata": { | |
| "colab": { | |
| "base_uri": "https://localhost:8080/" | |
| }, | |
| "id": "AjjxhR_fDIJH", | |
| "outputId": "b92516a8-f190-4932-bfce-05c187bf4739" | |
| }, | |
| "execution_count": null, | |
| "outputs": [ | |
| { | |
| "output_type": "execute_result", | |
| "data": { | |
| "text/plain": [ | |
| "ForecastingPipeline(steps=[('imputer',\n", | |
| " OptionalPassthrough(passthrough=True,\n", | |
| " transformer=Imputer())),\n", | |
| " ('forecaster',\n", | |
| " TransformedTargetForecaster(steps=[('imputer',\n", | |
| " OptionalPassthrough(transformer=Imputer())),\n", | |
| " ('model',\n", | |
| " ExponentialSmoothing())]))])" | |
| ] | |
| }, | |
| "metadata": {}, | |
| "execution_count": 12 | |
| } | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "source": [ | |
| "pipe.fit(y)\n", | |
| "predictions = pipe.predict(fh=np.arange(1, 13))\n", | |
| "print(predictions)\n", | |
| "\n", | |
| "from sktime.utils.plotting import plot_series\n", | |
| "_ = plot_series(y, predictions)" | |
| ], | |
| "metadata": { | |
| "colab": { | |
| "base_uri": "https://localhost:8080/", | |
| "height": 491 | |
| }, | |
| "id": "ucwHKoLhFWfP", | |
| "outputId": "ef432b1b-1c40-42b9-e646-37b4536a19fc" | |
| }, | |
| "execution_count": null, | |
| "outputs": [ | |
| { | |
| "output_type": "stream", | |
| "name": "stdout", | |
| "text": [ | |
| "1961-01 431.791781\n", | |
| "1961-02 431.791781\n", | |
| "1961-03 431.791781\n", | |
| "1961-04 431.791781\n", | |
| "1961-05 431.791781\n", | |
| "1961-06 431.791781\n", | |
| "1961-07 431.791781\n", | |
| "1961-08 431.791781\n", | |
| "1961-09 431.791781\n", | |
| "1961-10 431.791781\n", | |
| "1961-11 431.791781\n", | |
| "1961-12 431.791781\n", | |
| "Freq: M, dtype: float64\n" | |
| ] | |
| }, | |
| { | |
| "output_type": "display_data", | |
| "data": { | |
| "image/png": "\n", | |
| "text/plain": [ | |
| "<Figure size 1152x288 with 1 Axes>" | |
| ] | |
| }, | |
| "metadata": { | |
| "needs_background": "light" | |
| } | |
| } | |
| ] | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "source": [ | |
| "### With Box-Cox Transformation" | |
| ], | |
| "metadata": { | |
| "id": "slNDf6TqIhts" | |
| } | |
| }, | |
| { | |
| "cell_type": "code", | |
| "source": [ | |
| "imputate_target = \"drift\"\n", | |
| "imputate_exogenous = None\n", | |
| "\n", | |
| "transform_target = \"box-cox\" \n", | |
| "transform_exogenous = None\n", | |
| "\n", | |
| "scale_target = None\n", | |
| "scale_exogenous = None\n", | |
| "\n", | |
| "# e.g. From `create_model(\"exp_smooth\")`\n", | |
| "# Does not handle missing data ----\n", | |
| "from sktime.forecasting.exp_smoothing import ExponentialSmoothing\n", | |
| "model = ExponentialSmoothing()\n", | |
| "\n", | |
| "pipe = _get_pipe()\n", | |
| "pipe" | |
| ], | |
| "metadata": { | |
| "colab": { | |
| "base_uri": "https://localhost:8080/" | |
| }, | |
| "id": "FUjuXfGOHnKG", | |
| "outputId": "19d61da7-7cd1-445e-f09e-a246307e8801" | |
| }, | |
| "execution_count": null, | |
| "outputs": [ | |
| { | |
| "output_type": "execute_result", | |
| "data": { | |
| "text/plain": [ | |
| "ForecastingPipeline(steps=[('imputer',\n", | |
| " OptionalPassthrough(passthrough=True,\n", | |
| " transformer=Imputer())),\n", | |
| " ('forecaster',\n", | |
| " TransformedTargetForecaster(steps=[('imputer',\n", | |
| " OptionalPassthrough(transformer=Imputer())),\n", | |
| " ('transformer',\n", | |
| " OptionalPassthrough(transformer=BoxCoxTransformer())),\n", | |
| " ('model',\n", | |
| " ExponentialSmoothing())]))])" | |
| ] | |
| }, | |
| "metadata": {}, | |
| "execution_count": 14 | |
| } | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "source": [ | |
| "pipe.fit(y)\n", | |
| "predictions = pipe.predict(fh=np.arange(1, 13))\n", | |
| "print(predictions)\n", | |
| "\n", | |
| "from sktime.utils.plotting import plot_series\n", | |
| "_ = plot_series(y, predictions)" | |
| ], | |
| "metadata": { | |
| "colab": { | |
| "base_uri": "https://localhost:8080/", | |
| "height": 491 | |
| }, | |
| "id": "FQruszeRIpIN", | |
| "outputId": "86975a13-fd31-4ff6-dafc-0df3b3f79268" | |
| }, | |
| "execution_count": null, | |
| "outputs": [ | |
| { | |
| "output_type": "stream", | |
| "name": "stdout", | |
| "text": [ | |
| "1961-01 431.999999\n", | |
| "1961-02 431.999999\n", | |
| "1961-03 431.999999\n", | |
| "1961-04 431.999999\n", | |
| "1961-05 431.999999\n", | |
| "1961-06 431.999999\n", | |
| "1961-07 431.999999\n", | |
| "1961-08 431.999999\n", | |
| "1961-09 431.999999\n", | |
| "1961-10 431.999999\n", | |
| "1961-11 431.999999\n", | |
| "1961-12 431.999999\n", | |
| "Freq: M, dtype: float64\n" | |
| ] | |
| }, | |
| { | |
| "output_type": "display_data", | |
| "data": { | |
| "image/png": "\n", | |
| "text/plain": [ | |
| "<Figure size 1152x288 with 1 Axes>" | |
| ] | |
| }, | |
| "metadata": { | |
| "needs_background": "light" | |
| } | |
| } | |
| ] | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "source": [ | |
| "### With Scaling" | |
| ], | |
| "metadata": { | |
| "id": "inUjhjIQIucs" | |
| } | |
| }, | |
| { | |
| "cell_type": "code", | |
| "source": [ | |
| "imputate_target = \"drift\"\n", | |
| "imputate_exogenous = None\n", | |
| "\n", | |
| "transform_target = None \n", | |
| "transform_exogenous = None\n", | |
| "\n", | |
| "scale_target = \"z-score\"\n", | |
| "scale_exogenous = None\n", | |
| "\n", | |
| "# e.g. From `create_model(\"exp_smooth\")`\n", | |
| "# Does not handle missing data ----\n", | |
| "from sktime.forecasting.exp_smoothing import ExponentialSmoothing\n", | |
| "model = ExponentialSmoothing()\n", | |
| "\n", | |
| "pipe = _get_pipe()\n", | |
| "pipe" | |
| ], | |
| "metadata": { | |
| "colab": { | |
| "base_uri": "https://localhost:8080/" | |
| }, | |
| "id": "0nw7P4ysIq0M", | |
| "outputId": "205d5a51-ee9f-497e-c9ad-193486c22495" | |
| }, | |
| "execution_count": null, | |
| "outputs": [ | |
| { | |
| "output_type": "execute_result", | |
| "data": { | |
| "text/plain": [ | |
| "ForecastingPipeline(steps=[('imputer',\n", | |
| " OptionalPassthrough(passthrough=True,\n", | |
| " transformer=Imputer())),\n", | |
| " ('forecaster',\n", | |
| " TransformedTargetForecaster(steps=[('imputer',\n", | |
| " OptionalPassthrough(transformer=Imputer())),\n", | |
| " ('scaler',\n", | |
| " OptionalPassthrough(transformer=TabularToSeriesAdaptor(transformer=StandardScaler()))),\n", | |
| " ('model',\n", | |
| " ExponentialSmoothing())]))])" | |
| ] | |
| }, | |
| "metadata": {}, | |
| "execution_count": 16 | |
| } | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "source": [ | |
| "pipe.fit(y)\n", | |
| "predictions = pipe.predict(fh=np.arange(1, 13))\n", | |
| "print(predictions)\n", | |
| "\n", | |
| "from sktime.utils.plotting import plot_series\n", | |
| "_ = plot_series(y, predictions)" | |
| ], | |
| "metadata": { | |
| "colab": { | |
| "base_uri": "https://localhost:8080/", | |
| "height": 491 | |
| }, | |
| "id": "BcAWFxvSI3Y9", | |
| "outputId": "61725534-b2ed-42ac-92b7-1f2021914279" | |
| }, | |
| "execution_count": null, | |
| "outputs": [ | |
| { | |
| "output_type": "stream", | |
| "name": "stdout", | |
| "text": [ | |
| "1961-01 431.999999\n", | |
| "1961-02 431.999999\n", | |
| "1961-03 431.999999\n", | |
| "1961-04 431.999999\n", | |
| "1961-05 431.999999\n", | |
| "1961-06 431.999999\n", | |
| "1961-07 431.999999\n", | |
| "1961-08 431.999999\n", | |
| "1961-09 431.999999\n", | |
| "1961-10 431.999999\n", | |
| "1961-11 431.999999\n", | |
| "1961-12 431.999999\n", | |
| "Freq: M, dtype: float64\n" | |
| ] | |
| }, | |
| { | |
| "output_type": "display_data", | |
| "data": { | |
| "image/png": "\n", | |
| "text/plain": [ | |
| "<Figure size 1152x288 with 1 Axes>" | |
| ] | |
| }, | |
| "metadata": { | |
| "needs_background": "light" | |
| } | |
| } | |
| ] | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "source": [ | |
| "# Open Questions\n", | |
| "\n", | |
| "1. Is everyone OK with the argument names proposed for `setup` (slightly different from regression/classification, but more appropriate naming)?\n", | |
| "2. Should `OptionalPassthrough` with `passthrough=True` be used when the user does not request imputation? Same for `transform` and `scale`?\n", | |
| " - Does this offer any advantages in tuning?\n", | |
| " - Does it offer the user additional flexibility when they use the exported model outside PyCaret?\n", | |
| "3. What about categorical variables?\n", | |
| " - I don't think classical time series models support categorical variables (need to check).\n", | |
| " - But reduced regression may support this (need to check).\n", | |
| " - How will the setup arguments change if this needs to be supported in the future? Do we need to plan for it now?\n", | |
| " - Should [TransformerWrapper](https://github.com/pycaret/pycaret/blob/fcd9ad809699ce5a6e2f76a8f5b0ba0be55d02db/pycaret/internal/preprocess/preprocessor.py#L252) be used for the transformers as done in regression/classification? How does this impact the pipeline? Can users still use the models without installing pycaret?" | |
| ], | |
| "metadata": { | |
| "id": "DzX7jOeOLnTM" | |
| } | |
| }, | |
| { | |
| "cell_type": "code", | |
| "source": [ | |
| "" | |
| ], | |
| "metadata": { | |
| "id": "5WYAY2ZnLn3r" | |
| }, | |
| "execution_count": null, | |
| "outputs": [] | |
| } | |
| ] | |
| } |
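`_get_pipe` in the notebook reads its six settings from global variables, and its docstring notes that PyCaret will use class attributes instead. The following is a minimal, library-free sketch of the same assembly logic with the settings passed explicitly. `PreprocessConfig`, `build_step_names`, and `_side_steps` are hypothetical names used only for illustration. The sketch returns step names rather than sktime objects, so it runs without sktime installed. It keeps the notebook's ordering: the imputer is always present (as a passthrough when unset), the transformer and scaler are added only when requested, and the model or forecaster comes last.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Option sets copied from _get_imputer, _get_transformer and _get_scaler.
ALLOWED_IMPUTE = {
    "drift", "linear", "nearest", "constant", "mean",
    "median", "backfill", "bfill", "pad", "ffill", "random",
}
ALLOWED_TRANSFORM = {"box-cox", "log", "sqrt", "exp", "cos"}
ALLOWED_SCALE = {"z-score"}


def _check(value: Optional[str], allowed: set, kind: str) -> None:
    """Raise the same kind of error as the notebook helpers for unsupported values."""
    if value is not None and value not in allowed:
        raise ValueError(f"{kind} value '{value}' not allowed.")


@dataclass
class PreprocessConfig:
    """Hypothetical container for the proposed setup arguments (replaces the globals)."""
    imputate_target: Optional[str] = None
    imputate_exogenous: Optional[str] = None
    transform_target: Optional[str] = None
    transform_exogenous: Optional[str] = None
    scale_target: Optional[str] = None
    scale_exogenous: Optional[str] = None

    def __post_init__(self) -> None:
        # Validate eagerly, so a bad option fails when the settings are created.
        for side in ("target", "exogenous"):
            _check(getattr(self, f"imputate_{side}"), ALLOWED_IMPUTE, "Impute")
            _check(getattr(self, f"transform_{side}"), ALLOWED_TRANSFORM, "Transform")
            _check(getattr(self, f"scale_{side}"), ALLOWED_SCALE, "Scale")


def _side_steps(impute, transform, scale) -> List[Tuple[str, str]]:
    # The imputer step is always present ("passthrough" when impute is None);
    # transformer and scaler steps are only added when requested.
    steps = [("imputer", impute or "passthrough")]
    if transform is not None:
        steps.append(("transformer", transform))
    if scale is not None:
        steps.append(("scaler", scale))
    return steps


def build_step_names(cfg: PreprocessConfig, model_name: str):
    """Return (exogenous_steps, target_steps) in the order _get_pipe uses."""
    target_steps = _side_steps(cfg.imputate_target, cfg.transform_target, cfg.scale_target)
    target_steps.append(("model", model_name))
    exog_steps = _side_steps(
        cfg.imputate_exogenous, cfg.transform_exogenous, cfg.scale_exogenous
    )
    exog_steps.append(("forecaster", "TransformedTargetForecaster(target_steps)"))
    return exog_steps, target_steps


# Same settings as the "With Scaling" example.
exog_steps, target_steps = build_step_names(
    PreprocessConfig(imputate_target="drift", scale_target="z-score"),
    model_name="ExponentialSmoothing",
)
print(target_steps)
# [('imputer', 'drift'), ('scaler', 'z-score'), ('model', 'ExponentialSmoothing')]
```

Validating in `__post_init__` surfaces a bad option string when the settings are created rather than when the pipeline is built. In a real implementation, each returned name would instead map to the sktime object produced by the corresponding helper.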
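Both exponential smoothing results in the notebook follow from the model's defaults. `ExponentialSmoothing()` is created without trend or seasonal components, so it reduces to simple exponential smoothing. The hand-rolled sketch below shows why a single `NaN` in the history makes every forecast `NaN`, and why the forecasts after imputation are flat. It is a simplification that uses a fixed smoothing parameter `alpha`. The statsmodels model that sktime wraps estimates the parameters by optimization instead, and the `ConvergenceWarning` in the notebook output is raised by that step. `ses_forecast` is a hypothetical helper used only for illustration.

```python
import math
from typing import List, Sequence


def ses_forecast(y: Sequence[float], alpha: float, horizon: int) -> List[float]:
    """Simple exponential smoothing with a fixed alpha (no parameter fitting)."""
    level = y[0]
    for value in y[1:]:
        # Level update: l_t = alpha * y_t + (1 - alpha) * l_{t-1}.
        # A NaN in y makes the level NaN, and it stays NaN from then on.
        level = alpha * value + (1 - alpha) * level
    # Every h-step-ahead forecast equals the last level, so the forecast is flat.
    return [level] * horizon


history = [112.0, 118.0, 132.0, 129.0, 121.0, 135.0]
print(ses_forecast(history, alpha=0.5, horizon=3))
# [129.3125, 129.3125, 129.3125]

with_gap = [112.0, float("nan"), 132.0, 129.0, 121.0, 135.0]
print(all(math.isnan(v) for v in ses_forecast(with_gap, alpha=0.5, horizon=3)))
# True
```

Imputing before the model removes the `NaN`s, but the forecast stays flat. Capturing the trend and yearly seasonality of the airline series requires configuring `ExponentialSmoothing` with trend and seasonal components.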