Last active
April 11, 2024 01:04
IDSUP programs (chapterwise)
| { | |
| "cells": [ | |
| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": [ | |
| "# Chapter 5 - Statistics\n", | |
| "\n", | |
| "There are two types of statistics: descriptive and inferential." | |
| ] | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": [ | |
| "## Descriptive Statistics\n", | |
| "\n", | |
| "- Descriptive statistics is a set of techniques used to summarize and present data.\n", | |
| "- Used to describe the characteristics of the data at hand (a sample or a whole population) without generalizing beyond it.\n", | |
| "- Characteristics can be mean, median, mode, variance, standard deviation, etc.\n", | |
| "- Typical summaries include measures of central tendency, measures of dispersion, percentiles, and graphical representations.\n", | |
| "\n", | |
| "## Inferential Statistics\n", | |
| "\n", | |
| "- Inferential statistics is a set of techniques used to make inferences about the population using sample data.\n", | |
| "- Used to test hypotheses and make predictions about the population using sample data.\n", | |
| "\n", | |
| "## Types of Data\n", | |
| "\n", | |
| "- Categorical\n", | |
| "- Numerical\n", | |
| "\n", | |
| "i. Categorical Data : \n", | |
| " a. **Nominal** - Nominal data is data that has no order. \n", | |
| " b. **Ordinal** - Ordinal data is data that has an order. \n", | |
| " c. **Binary** - Binary data is data that has two possible values.\n", | |
| "\n", | |
| "ii. Numerical Data : Continuous, Discrete\n", | |
| " a. **Continuous** - Continuous data can take any value within a range (e.g. height or temperature). \n", | |
| " b. **Discrete** - Discrete data takes only distinct, countable values (e.g. the number of students in a class).\n" | |
| ] | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": [ | |
| "## Measures of central tendency\n", | |
| "\n", | |
| "Mean - The average value of a dataset.\n", | |
| "\n", | |
| "formula = (x1 + x2 + x3 + ... + xn) / n\n", | |
| "\n", | |
| "- Mean uses the sum of all values divided by the number of values.\n", | |
| "\n", | |
| "- Mean is sensitive to outliers: a single extreme value can shift it substantially." | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": 3, | |
| "metadata": {}, | |
| "outputs": [ | |
| { | |
| "name": "stdout", | |
| "output_type": "stream", | |
| "text": [ | |
| "Example of using mean function\n", | |
| "2.0\n" | |
| ] | |
| } | |
| ], | |
| "source": [ | |
| "# mean in python\n", | |
| "from typing import List\n", | |
| "def mean(v: List[float]) -> float:\n", | |
| "    assert len(v) > 0\n", | |
| "    return sum(v) / len(v)\n", | |
| "\n", | |
| "# example of using the function\n", | |
| "print(\"Example of using mean function\")\n", | |
| "a = [1, 2, 3]\n", | |
| "print(mean(a))" | |
| ] | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": [ | |
| "Median - The middle value of a sorted dataset (for an even number of values, the average of the two middle values)." | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": 18, | |
| "metadata": {}, | |
| "outputs": [ | |
| { | |
| "name": "stdout", | |
| "output_type": "stream", | |
| "text": [ | |
| "List of 7 random numbers: [73, 79, 111, 2, 97, 34, 28]\n", | |
| "Sorted list of 7 random numbers: [2, 28, 34, 73, 79, 97, 111]\n", | |
| "73\n" | |
| ] | |
| } | |
| ], | |
| "source": [ | |
| "# Median in python\n", | |
| "from typing import List\n", | |
| "import random\n", | |
| "\n", | |
| "def median(v: List[float]) -> float:\n", | |
| "    n = len(v)\n", | |
| "    sorted_v = sorted(v)  # ordered dataset\n", | |
| "    midpoint = n // 2\n", | |
| "    if n % 2 == 1:\n", | |
| "        return sorted_v[midpoint]\n", | |
| "    else:\n", | |
| "        after_midpoint = sorted_v[midpoint]\n", | |
| "        before_midpoint = sorted_v[midpoint - 1]\n", | |
| "        return (after_midpoint + before_midpoint) / 2\n", | |
| "\n", | |
| "# example of using the function using list of 7 random numbers\n", | |
| "random_numbers = [random.randint(1, 165) for _ in range(7)]\n", | |
| "print(\"List of 7 random numbers: \", random_numbers)\n", | |
| "print(\"Sorted list of 7 random numbers: \", sorted(random_numbers))\n", | |
| "print(median(random_numbers))" | |
| ] | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": [ | |
| "Mode - The most common value of a dataset (or values, if several are tied)." | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": 22, | |
| "metadata": {}, | |
| "outputs": [ | |
| { | |
| "name": "stdout", | |
| "output_type": "stream", | |
| "text": [ | |
| "Example of using mode function\n", | |
| "[3]\n" | |
| ] | |
| } | |
| ], | |
| "source": [ | |
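| "from collections import Counter\n", | |
| "from typing import List\n", | |
| "\n", | |
| "def mode(v: List[float]) -> List[float]:\n", | |
| "    c = Counter(v)  # value -> number of occurrences\n", | |
| "    max_count = max(c.values())\n", | |
| "    # return every value tied for the highest count\n", | |
| "    return [x for x, count in c.items() if count == max_count]\n", | |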
| "\n", | |
| "# example of using the mode function\n", | |
| "print(\"Example of using mode function\")\n", | |
| "a = [1, 2, 2, 3, 3, 3, 3, 3, 4, 4, 4, 4]\n", | |
| "print(mode(a))" | |
| ] | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": [ | |
| "Quantile - The value below which a given fraction p of the sorted dataset falls (p = 0.5 gives the median, which divides the dataset into two equal parts)." | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": 24, | |
| "metadata": {}, | |
| "outputs": [ | |
| { | |
| "name": "stdout", | |
| "output_type": "stream", | |
| "text": [ | |
| "Example of using quantile function\n", | |
| "6\n", | |
| "11\n", | |
| "16\n", | |
| "20\n" | |
| ] | |
| } | |
| ], | |
| "source": [ | |
| "# Example of Quantile in python\n", | |
| "from typing import List\n", | |
| "\n", | |
| "def find_quantile(xs: List[float], p: float) -> float:\n", | |
| "    \"\"\"\n", | |
| "    xs: a list of numbers\n", | |
| "    p: a fraction between 0 and 1 (e.g. 0.25 for the first quartile)\n", | |
| "    \"\"\"\n", | |
| "    p_index = p * len(xs)\n", | |
| "    if p_index == 0:\n", | |
| "        return min(xs)\n", | |
| "    elif p_index == len(xs):\n", | |
| "        return max(xs)\n", | |
| "    else:\n", | |
| "        return sorted(xs)[int(p_index)]\n", | |
| "\n", | |
| "# example of using the quantile function\n", | |
| "print(\"Example of using quantile function\")\n", | |
| "# list of 20 numbers\n", | |
| "a = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20]\n", | |
| "print(find_quantile(a, 0.25))\n", | |
| "print(find_quantile(a, 0.5))\n", | |
| "print(find_quantile(a, 0.75))\n", | |
| "print(find_quantile(a, 1))" | |
| ] | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": [ | |
| "Deciles - The nine values that divide a sorted dataset into 10 equal parts.\n", | |
| "\n", | |
| "Percentiles - The 99 values that divide a sorted dataset into 100 equal parts." | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": 26, | |
| "metadata": {}, | |
| "outputs": [ | |
| { | |
| "name": "stdout", | |
| "output_type": "stream", | |
| "text": [ | |
| "Example of using qth quantile function\n", | |
| "6\n", | |
| "11\n", | |
| "16\n" | |
| ] | |
| } | |
| ], | |
| "source": [ | |
| "# code for qth quantile in python\n", | |
| "\n", | |
| "def qth_quantile(ds: List[float], q: float) -> float:\n", | |
| "    \"\"\"\n", | |
| "    ds: a list of numbers\n", | |
| "    q: a fraction between 0 and 1 (e.g. 0.25 for the first quartile)\n", | |
| "    \"\"\"\n", | |
| "    n = len(ds)\n", | |
| "    ods = sorted(ds)  # ordered dataset\n", | |
| "    x_index = min(int(q * n), n - 1)  # clamp so q = 1 returns the maximum\n", | |
| "    return ods[x_index]\n", | |
| "\n", | |
| "# Note: for an even-length list, q = 0.5 returns the upper middle value,\n", | |
| "# not the average of the two middle values, so it is not the exact median.\n", | |
| "\n", | |
| "# Example of using the qth quantile function\n", | |
| "print(\"Example of using qth quantile function\")\n", | |
| "# list of 20 numbers\n", | |
| "a = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20]\n", | |
| "print(qth_quantile(a, 0.25))\n", | |
| "print(qth_quantile(a, 0.5))\n", | |
| "print(qth_quantile(a, 0.75))" | |
| ] | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": [ | |
| "## Measure of dispersion (spread)\n", | |
| "\n", | |
| "- A measure of dispersion is a statistic that quantifies how spread out the values in a dataset are.\n", | |
| "- It describes the variability of the dataset around its center.\n", | |
| "\n", | |
| "It includes -\n", | |
| "\n", | |
| "- Range - The difference between the largest and smallest values in a dataset.\n", | |
| "\n", | |
| "- Inter-quartile range - The difference between the 75th and 25th percentiles.\n", | |
| "\n", | |
| "- Variance - The average of the squared differences from the mean." | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": 27, | |
| "metadata": {}, | |
| "outputs": [ | |
| { | |
| "name": "stdout", | |
| "output_type": "stream", | |
| "text": [ | |
| "Example of using range function\n", | |
| "19\n" | |
| ] | |
| } | |
| ], | |
| "source": [ | |
| "# code for range \n", | |
| "\n", | |
| "def range_(ds: List[float]) -> float:\n", | |
| "    \"\"\"\n", | |
| "    ds: a list of numbers\n", | |
| "    \"\"\"\n", | |
| "    x_max = max(ds)\n", | |
| "    x_min = min(ds)\n", | |
| "    return x_max - x_min\n", | |
| "\n", | |
| "# example of using the range function\n", | |
| "print(\"Example of using range function\")\n", | |
| "# list of 20 numbers\n", | |
| "a = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20]\n", | |
| "print(range_(a))" | |
| ] | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": [ | |
| "Plot for y = x^2" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": 28, | |
| "metadata": {}, | |
| "outputs": [ | |
| { | |
| "data": { | |
| "image/png": "", | |
| "text/plain": [ | |
| "<Figure size 640x480 with 1 Axes>" | |
| ] | |
| }, | |
| "metadata": {}, | |
| "output_type": "display_data" | |
| } | |
| ], | |
| "source": [ | |
| "# Plot for y = x^2\n", | |
| "from matplotlib import pyplot as plt\n", | |
| "\n", | |
| "x_list = [a/10 for a in range(-100, 100)]\n", | |
| "\n", | |
| "y_list = [a**2 for a in x_list]\n", | |
| "\n", | |
| "plt.plot(x_list, y_list)\n", | |
| "plt.show()" | |
| ] | |
| } | |
| ], | |
| "metadata": { | |
| "kernelspec": { | |
| "display_name": ".venv", | |
| "language": "python", | |
| "name": "python3" | |
| }, | |
| "language_info": { | |
| "codemirror_mode": { | |
| "name": "ipython", | |
| "version": 3 | |
| }, | |
| "file_extension": ".py", | |
| "mimetype": "text/x-python", | |
| "name": "python", | |
| "nbconvert_exporter": "python", | |
| "pygments_lexer": "ipython3", | |
| "version": "3.12.2" | |
| } | |
| }, | |
| "nbformat": 4, | |
| "nbformat_minor": 2 | |
| } |
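The notebook lists the inter-quartile range and variance under measures of dispersion but only implements the range. The following is a minimal sketch of the other two, plus standard deviation (the square root of the variance). It is not code from the original notebook: the function names `variance`, `standard_deviation`, `quantile`, and `interquartile_range` are assumptions chosen for illustration. `variance` here is the population variance (dividing by n), matching the definition "the average of the squared differences from the mean", and `quantile` assumes the same simple index-based rule as `qth_quantile`.

```python
from math import sqrt
from typing import List


def mean(xs: List[float]) -> float:
    """Sum of all values divided by the number of values."""
    assert len(xs) > 0, "mean needs at least one value"
    return sum(xs) / len(xs)


def variance(xs: List[float]) -> float:
    """Population variance: average squared difference from the mean."""
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)


def standard_deviation(xs: List[float]) -> float:
    """Square root of the variance, in the same units as the data."""
    return sqrt(variance(xs))


def quantile(xs: List[float], q: float) -> float:
    """Index-based quantile (assumed rule, same idea as qth_quantile)."""
    ordered = sorted(xs)
    index = min(int(q * len(ordered)), len(ordered) - 1)
    return ordered[index]


def interquartile_range(xs: List[float]) -> float:
    """Difference between the 75th and 25th percentiles."""
    return quantile(xs, 0.75) - quantile(xs, 0.25)


# list of 20 numbers, as in the notebook's other examples
a = list(range(1, 21))
print(variance(a))
print(standard_deviation(a))
print(interquartile_range(a))
```

A sample variance, as opposed to the population variance sketched here, would divide by n - 1 instead of n.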