deebuls · November 22, 2022 19:29
diff --git a/fomo.ipynb b/fomo.ipynb
 {
  "nbformat": 4,
  "nbformat_minor": 0,
  "metadata": {
    "colab": {
      "provenance": [],
      "authorship_tag": "ABX9TyOjR21sSY9zNjU5RgfDxWBE",
      "include_colab_link": true
    },
    "kernelspec": {
      "name": "python3",
      "display_name": "Python 3"
    },
    "language_info": {
      "name": "python"
    }
  },
  "cells": [
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "view-in-github",
        "colab_type": "text"
      },
      "source": [
        "<a href=\"https://colab.research.google.com/gist/deebuls/3796d4b8f2bd2c1324407aa31b4d03ff/fomo.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
      ]
    },
    {
      "cell_type": "markdown",
      "source": [
        "# Faster Objects More Objects aka FOMO\n",
        "## PyTorch implementation\n",
        "\n",
        "FOMO, introduced by [Edge Impulse](https://peter-ing.medium.com/introducing-faster-objects-more-objects-aka-fomo-3ce1c4ce2e3a), is actually a rebranded architecture called bnn, which was initially developed by Mat Palm and explained in this [blog post](http://matpalm.com/blog/counting_bees/). The TensorFlow code is available on [GitHub](https://github.com/matpalm/bnn).\n",
        "\n",
        "The architecture diagram of FOMO/BNN is shown below:\n",
        "![](http://matpalm.com/blog/imgs/2018/bnn/network.png)\n",
        "\n",
        "Here I try to convert the above diagram into a PyTorch model. I hope it helps anyone looking to deploy the FOMO model in the real world."
      ],
      "metadata": {
        "id": "ELS0uQOvwQUT"
      }
    },
    {
      "cell_type": "code",
      "source": [],
      "metadata": {
        "id": "s5Sfx7MdykSO"
      },
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "source": [
        "## Model Description as per Mat Palm\n",
        "\n",
        "```\n",
        "the model\n",
        "\n",
        "the architecture of the network is a very vanilla u-net.\n",
        "\n",
        "- a fully convolutional network trained on half resolution patches but run\n",
        "  against full resolution images\n",
        "- encoding is a sequence of 4 3x3 convolutions with stride 2\n",
        "- decoding is a sequence of nearest neighbours resizes + 3x3 convolution\n",
        "  (stride 1) + skip connection from the encoders\n",
        "- final layer is a 1x1 convolution (stride 1) with sigmoid activation\n",
        "  (i.e. binary bee / no bee choice per pixel)\n",
        "- after some emperical experiments i chose to only decode back to half the\n",
        "  resolution of the input. it was good enough.\n",
        "\n",
        "i did the decoding using a nearest neighbour resize instead of a deconvolution\n",
        "pretty much out of habit.\n",
        "```"
      ],
      "metadata": {
        "id": "gZyvlY2Kyk0G"
      }
    },
    {
      "cell_type": "code",
      "execution_count": 3,
      "metadata": {
        "id": "j_VnAOzGGm9Q"
      },
      "outputs": [],
      "source": [
        "import torch\n",
        "import torch.nn as nn"
      ]
    },
    {
      "cell_type": "code",
      "source": [
        "# ToDo / open questions:\n",
        "# - Padding: the original uses 'same' padding, but the stride-2 convolutions here\n",
        "#   use explicit zero padding of 1. Are the two equivalent?\n",
        "# - Upsampling: which mode should be used? The original description uses\n",
        "#   nearest-neighbour resizing, while this model uses bilinear.\n",
        "\n",
        "class FOMO(torch.nn.Module):\n",
        "    def __init__(self):\n",
        "        super().__init__()\n",
        "\n",
        "        # Encoder (reduction): 3x3 convs with stride 2, each halving the resolution\n",
        "        # 3 -> 4 channels\n",
        "        self.conv1 = torch.nn.Conv2d(in_channels=3, out_channels=4, kernel_size=3, stride=2, padding=(1,1))\n",
        "        # 4 -> 8 channels\n",
        "        self.conv2 = torch.nn.Conv2d(in_channels=4, out_channels=8, kernel_size=3, stride=2, padding=(1,1))\n",
        "        # 8 -> 16 channels\n",
        "        self.conv3 = torch.nn.Conv2d(in_channels=8, out_channels=16, kernel_size=3, stride=2, padding=(1,1))\n",
        "        # 16 -> 32 channels\n",
        "        self.conv4 = torch.nn.Conv2d(in_channels=16, out_channels=32, kernel_size=3, stride=2, padding=(1,1))\n",
        "\n",
        "        # Decoder (increasing): 3x3 convs with stride 1, applied after each upsample\n",
        "        # 32 -> 16 channels\n",
        "        self.conv5 = torch.nn.Conv2d(in_channels=32, out_channels=16, kernel_size=3, stride=1, padding='same')\n",
        "\n",
        "        # Shared 2x upsampling layer, reused at every decoder step\n",
        "        self.upsample = torch.nn.Upsample(scale_factor=2, mode='bilinear')\n",
        "\n",
        "        # Inputs are the previous decoder output concatenated with an encoder skip\n",
        "        self.conv6 = torch.nn.Conv2d(in_channels=32, out_channels=8, kernel_size=3, stride=1, padding='same')\n",
        "        self.conv7 = torch.nn.Conv2d(in_channels=16, out_channels=4, kernel_size=3, stride=1, padding='same')\n",
        "        # Final 1x1 conv producing a single-channel map\n",
        "        self.conv8 = torch.nn.Conv2d(in_channels=8, out_channels=1, kernel_size=1, stride=1, padding='same')\n",
        "\n",
        "    def forward(self, x):\n",
        "        # Encoder: keep the first three outputs for the skip connections\n",
        "        out1 = self.conv1(x)       # 1/2 resolution, 4 channels\n",
        "        out2 = self.conv2(out1)    # 1/4 resolution, 8 channels\n",
        "        out3 = self.conv3(out2)    # 1/8 resolution, 16 channels\n",
        "        output = self.conv4(out3)  # 1/16 resolution, 32 channels\n",
        "\n",
        "        # Decoder: upsample, convolve, then concatenate the matching encoder output\n",
        "        output = self.upsample(output)\n",
        "        output = self.conv5(output)\n",
        "        output = torch.concat((output, out3), dim=1)  # 1/8 resolution, 32 channels\n",
        "        output = self.upsample(output)\n",
        "        output = self.conv6(output)\n",
        "        output = torch.concat((output, out2), dim=1)  # 1/4 resolution, 16 channels\n",
        "        output = self.upsample(output)\n",
        "        output = self.conv7(output)\n",
        "        output = torch.concat((output, out1), dim=1)  # 1/2 resolution, 8 channels\n",
        "\n",
        "        # 1x1 conv down to one channel; no sigmoid is applied, so these are raw scores\n",
        "        output = self.conv8(output)\n",
        "\n",
        "        return output"
      ],
      "metadata": {
        "id": "RPJHjmLLGpRF"
      },
      "execution_count": 9,
      "outputs": []
    },
    {
      "cell_type": "code",
      "source": [
        "model = FOMO()\n",
        "print (model)"
      ],
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "6tzoJPtuJ4Y4",
        "outputId": "41e106cb-0832-48f8-ddd2-523f9780e464"
      },
      "execution_count": 10,
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "FOMO(\n",
            "  (conv1): Conv2d(3, 4, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1))\n",
            "  (conv2): Conv2d(4, 8, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1))\n",
            "  (conv3): Conv2d(8, 16, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1))\n",
            "  (conv4): Conv2d(16, 32, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1))\n",
            "  (conv5): Conv2d(32, 16, kernel_size=(3, 3), stride=(1, 1), padding=same)\n",
            "  (upsample): Upsample(scale_factor=2.0, mode=bilinear)\n",
            "  (conv6): Conv2d(32, 8, kernel_size=(3, 3), stride=(1, 1), padding=same)\n",
            "  (conv7): Conv2d(16, 4, kernel_size=(3, 3), stride=(1, 1), padding=same)\n",
            "  (conv8): Conv2d(8, 1, kernel_size=(1, 1), stride=(1, 1), padding=same)\n",
            ")\n"
          ]
        }
      ]
    },
    {
      "cell_type": "code",
      "source": [
        "x = torch.randn(1, 3, 512, 384)\n",
        "y = model(x)\n",
        "print (y.shape)"
      ],
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "cMlg8AR0VlR-",
        "outputId": "568cedcb-1a23-4fac-8bfd-372b973df3f0"
      },
      "execution_count": 11,
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "torch.Size([1, 1, 256, 192])\n"
          ]
        }
      ]
    },
    {
      "cell_type": "code",
      "source": [
        "%timeit y=model(x)"
      ],
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "SIVQaqD9hPul",
        "outputId": "05b019e1-0645-433b-ef31-f72f67a80d22"
      },
      "execution_count": 7,
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "19.8 ms ± 2.77 ms per loop (mean ± std. dev. of 7 runs, 100 loops each)\n"
          ]
        }
      ]
    },
    {
      "cell_type": "code",
      "source": [],
      "metadata": {
        "id": "TcWExgJLz0Ue"
      },
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "source": [
        "## ToDo train on a dataset \n",
        "\n"
      ],
      "metadata": {
        "id": "uLux4Q-Y0Q1o"
      }
    }
  ]
 }
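
A note on the padding question from the ToDo comment in the model cell. Below is a minimal sketch in plain Python, assuming the standard convolution output-size arithmetic, that traces the spatial size through the four stride-2 convolutions and the three 2x upsampling steps. The helper names (`conv_out`, `tf_same_out`, `trace_shapes`) are made up for illustration and are not part of PyTorch or of the notebook.

```python
import math


def conv_out(size, kernel=3, stride=2, padding=1):
    """Output length along one dimension for explicit zero padding."""
    return (size + 2 * padding - kernel) // stride + 1


def tf_same_out(size, stride=2):
    """Output length along one dimension for TensorFlow-style 'same' padding."""
    return math.ceil(size / stride)


def trace_shapes(height, width):
    """Follow (height, width) through 4 stride-2 convs, then 3 upsamples by 2."""
    shapes = [(height, width)]
    for _ in range(4):  # conv1 .. conv4
        h, w = shapes[-1]
        shapes.append((conv_out(h), conv_out(w)))
    for _ in range(3):  # the three upsample steps in forward()
        h, w = shapes[-1]
        shapes.append((h * 2, w * 2))
    return shapes


shapes = trace_shapes(512, 384)
print(shapes[-1])  # (256, 192), half the input resolution
```

With a kernel of 3, a stride of 2 and a padding of 1, the output size works out to ceil(n / 2), which is the same size TensorFlow-style 'same' padding gives, although where the padding is placed can differ between the two. The skip concatenations only line up when height and width are divisible by 16: a height of 100, for example, gives 13 at the third encoder level but 14 after the first upsample.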