@smnatale
Created April 17, 2023 22:29
AI Skin Assignment.ipynb
{
"cells": [
{
"cell_type": "markdown",
"metadata": {
"id": "view-in-github",
"colab_type": "text"
},
"source": [
"<a href=\"https://colab.research.google.com/gist/samnatale3/b963aa69a281b8c2905eb986db16f357/ai-skin-assignment.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
]
},
{
"cell_type": "markdown",
"source": [
"# Display information about the NVIDIA GPU being used by this Google Colab notebook"
],
"metadata": {
"id": "cWb5eyAxSQE3"
}
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "b9bUN__RKphq",
"outputId": "4ee2fb52-56de-43f2-a464-ba40ca59631b"
},
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"Mon Apr 17 16:59:33 2023 \n",
"+-----------------------------------------------------------------------------+\n",
"| NVIDIA-SMI 525.85.12 Driver Version: 525.85.12 CUDA Version: 12.0 |\n",
"|-------------------------------+----------------------+----------------------+\n",
"| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |\n",
"| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |\n",
"| | | MIG M. |\n",
"|===============================+======================+======================|\n",
"| 0 Tesla V100-SXM2... Off | 00000000:00:04.0 Off | 0 |\n",
"| N/A 34C P0 24W / 300W | 0MiB / 16384MiB | 0% Default |\n",
"| | | N/A |\n",
"+-------------------------------+----------------------+----------------------+\n",
" \n",
"+-----------------------------------------------------------------------------+\n",
"| Processes: |\n",
"| GPU GI CI PID Type Process name GPU Memory |\n",
"| ID ID Usage |\n",
"|=============================================================================|\n",
"| No running processes found |\n",
"+-----------------------------------------------------------------------------+\n"
]
}
],
"source": [
"!nvidia-smi"
]
},
{
"cell_type": "markdown",
"source": [
"# Import the os module to interact with the filesystem\n",
"\n",
"# Download the ISIC 2020 dataset (JPEG images) and the corresponding ground truth labels\n",
"# These datasets are used for skin lesion classification tasks\n"
],
"metadata": {
"id": "qSS1YpDVSWoJ"
}
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "GCTLEwapnyHE",
"outputId": "8f6d60e5-d762-460e-b120-26265ce29eb2"
},
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"--2023-04-17 16:59:38-- https://isic-challenge-data.s3.amazonaws.com/2020/ISIC_2020_Training_JPEG.zip\n",
"Resolving isic-challenge-data.s3.amazonaws.com (isic-challenge-data.s3.amazonaws.com)... 52.217.49.44, 3.5.25.202, 3.5.29.160, ...\n",
"Connecting to isic-challenge-data.s3.amazonaws.com (isic-challenge-data.s3.amazonaws.com)|52.217.49.44|:443... connected.\n",
"HTTP request sent, awaiting response... 200 OK\n",
"Length: 24707698022 (23G) [application/zip]\n",
"Saving to: ‘ISIC_2020_Training_JPEG.zip’\n",
"\n",
"ISIC_2020_Training_ 100%[===================>] 23.01G 33.7MB/s in 12m 57s \n",
"\n",
"2023-04-17 17:12:36 (30.3 MB/s) - ‘ISIC_2020_Training_JPEG.zip’ saved [24707698022/24707698022]\n",
"\n",
"--2023-04-17 17:12:36-- https://isic-challenge-data.s3.amazonaws.com/2020/ISIC_2020_Training_GroundTruth.csv\n",
"Resolving isic-challenge-data.s3.amazonaws.com (isic-challenge-data.s3.amazonaws.com)... 52.217.91.28, 52.217.170.33, 3.5.27.132, ...\n",
"Connecting to isic-challenge-data.s3.amazonaws.com (isic-challenge-data.s3.amazonaws.com)|52.217.91.28|:443... connected.\n",
"HTTP request sent, awaiting response... 200 OK\n",
"Length: 2056020 (2.0M) [text/csv]\n",
"Saving to: ‘ISIC_2020_Training_GroundTruth.csv’\n",
"\n",
"ISIC_2020_Training_ 100%[===================>] 1.96M 3.79MB/s in 0.5s \n",
"\n",
"2023-04-17 17:12:37 (3.79 MB/s) - ‘ISIC_2020_Training_GroundTruth.csv’ saved [2056020/2056020]\n",
"\n"
]
}
],
"source": [
"import os\n",
"\n",
"# Download and extract the datasets, deleting after unzipped to create more storage space\n",
"!wget https://isic-challenge-data.s3.amazonaws.com/2020/ISIC_2020_Training_JPEG.zip\n",
"!wget https://isic-challenge-data.s3.amazonaws.com/2020/ISIC_2020_Training_GroundTruth.csv\n"
]
},
{
"cell_type": "markdown",
"source": [
"# Unzip the content into a folder: /content/images\n"
],
"metadata": {
"id": "bLj9xZZrScAB"
}
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {
"id": "EZCkOhoF2Qro"
},
"outputs": [],
"source": [
"!unzip -q ISIC_2020_Training_JPEG.zip -d /content\n",
"# Zip will unzip into a folder called 'train' we want this to be 'images'\n",
"!mv /content/train /content/images"
]
},
{
"cell_type": "markdown",
"source": [
"# Import dependancies"
],
"metadata": {
"id": "cDMhy9elSh4k"
}
},
{
"cell_type": "code",
"execution_count": 28,
"metadata": {
"id": "A70oFkFOmA7g"
},
"outputs": [],
"source": [
"import pandas as pd\n",
"import numpy as np\n",
"import os\n",
"import cv2\n",
"from sklearn.model_selection import train_test_split\n",
"import tensorflow as tf\n",
"from tensorflow.keras.preprocessing.image import ImageDataGenerator\n",
"import shutil\n",
"from tensorflow.keras.models import load_model\n",
"from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score\n",
"from tensorflow.keras.applications import DenseNet121, DenseNet201, InceptionV3, NASNetLarge, MobileNetV2\n",
"from google.colab import drive\n",
"import itertools\n",
"import matplotlib.pyplot as plt\n",
"import seaborn as sns\n",
"from sklearn.metrics import roc_curve, auc\n",
"from sklearn.metrics import confusion_matrix, roc_auc_score, matthews_corrcoef"
]
},
{
"cell_type": "markdown",
"source": [
"# Read and process the ground truth CSV file\n",
"# Balance the dataset by selecting an equal number of benign and malignant images\n",
"# Split the balanced data into train and test sets (80% train, 20% test)\n",
"# Organize the train and test images into their respective benign and malignant folders\n"
],
"metadata": {
"id": "UpflFEhBSwwE"
}
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "p1ggcwNPECZP",
"outputId": "ad95bb49-e3a5-4d07-bccb-1b42cc225a7d"
},
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"Starting benign count: 32542\n",
"Starting malignant count: 584\n",
"Minimum count to balance: 584\n",
"\n",
"Train:\n",
" Benign count: 461\n",
" Malignant count: 473\n",
"Test:\n",
" Benign count: 123\n",
" Malignant count: 111\n"
]
}
],
"source": [
"# Read the ground truth CSV\n",
"df = pd.read_csv(\"ISIC_2020_Training_GroundTruth.csv\")\n",
"\n",
"# Count the number of benign and malignant images\n",
"benign_count = sum(df['benign_malignant'] == 'benign')\n",
"malignant_count = sum(df['benign_malignant'] == 'malignant')\n",
"\n",
"print(\"Starting benign count:\", benign_count)\n",
"print(\"Starting malignant count:\", malignant_count)\n",
"\n",
"# Determine the minimum count between the two classes\n",
"min_count = min(benign_count, malignant_count)\n",
"\n",
"print(\"Minimum count to balance:\", min_count);\n",
"\n",
"# Balance the dataset by selecting an equal number of benign and malignant images\n",
"balanced_df = pd.concat([\n",
" df[df['benign_malignant'] == 'benign'].sample(min_count, random_state=42),\n",
" df[df['benign_malignant'] == 'malignant'].sample(min_count, random_state=42)\n",
"])\n",
"\n",
"# Split the balanced data into train and test sets (80% train, 20% test)\n",
"train_df, test_df = train_test_split(balanced_df, test_size=0.2, random_state=42)\n",
"\n",
"# Create folders for train and test images\n",
"for dataset in ['train', 'test']:\n",
" for folder in ['benign', 'malignant']:\n",
" path = f\"/content/{dataset}/{folder}\"\n",
" if not os.path.exists(path):\n",
" os.makedirs(path)\n",
"\n",
"# Move train images to their respective folders\n",
"for index, row in train_df.iterrows():\n",
" image_path = os.path.join(\"/content/images\", row['image_name'] + \".jpg\")\n",
" if os.path.isfile(image_path):\n",
" if row['benign_malignant'] == 'benign':\n",
" shutil.copy(image_path, os.path.join(\"/content/train/benign\", row['image_name'] + \".jpg\"))\n",
" else:\n",
" shutil.copy(image_path, os.path.join(\"/content/train/malignant\", row['image_name'] + \".jpg\"))\n",
"\n",
"# Move test images to their respective folders\n",
"for index, row in test_df.iterrows():\n",
" image_path = os.path.join(\"/content/images\", row['image_name'] + \".jpg\")\n",
" if os.path.isfile(image_path):\n",
" if row['benign_malignant'] == 'benign':\n",
" shutil.copy(image_path, os.path.join(\"/content/test/benign\", row['image_name'] + \".jpg\"))\n",
" else:\n",
" shutil.copy(image_path, os.path.join(\"/content/test/malignant\", row['image_name'] + \".jpg\"))\n",
"\n",
"\n",
"train_benign_count = len(os.listdir(\"/content/train/benign\"))\n",
"train_malignant_count = len(os.listdir(\"/content/train/malignant\"))\n",
"test_benign_count = len(os.listdir(\"/content/test/benign\"))\n",
"test_malignant_count = len(os.listdir(\"/content/test/malignant\"))\n",
"\n",
"print()\n",
"print(\"Train:\")\n",
"print(f\" Benign count: {train_benign_count}\")\n",
"print(f\" Malignant count: {train_malignant_count}\")\n",
"\n",
"print(\"Test:\")\n",
"print(f\" Benign count: {test_benign_count}\")\n",
"print(f\" Malignant count: {test_malignant_count}\")\n"
]
},
{
"cell_type": "markdown",
"source": [
"# Split the train set into train and validation sets (80% train, 20% validation)\n",
"# Create data generators for training, validation, and test sets\n",
"# Configure the generators to resize the images, apply rescaling, and set batch size and class mode\n"
],
"metadata": {
"id": "JHTU8ZabS-IP"
}
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "p6fKaPNaEH-y",
"outputId": "22a2a402-6f34-49a1-b0c8-e355d38ea8c6"
},
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"malignant 385\n",
"benign 362\n",
"Name: benign_malignant, dtype: int64\n",
"Found 748 images belonging to 2 classes.\n",
"Found 186 images belonging to 2 classes.\n",
"Found 234 images belonging to 2 classes.\n"
]
}
],
"source": [
"# Split the train set into train and validation sets (80% train, 20% validation)\n",
"train_df, val_df = train_test_split(train_df, test_size=0.2, random_state=42)\n",
"\n",
"print(train_df['benign_malignant'].value_counts())\n",
"IMG_SIZE = 224\n",
"\n",
"# Create data generators for training, validation and test sets\n",
"train_datagen = ImageDataGenerator(rescale=1./255, validation_split=0.2)\n",
"test_datagen = ImageDataGenerator(rescale=1./255)\n",
"\n",
"train_generator = train_datagen.flow_from_directory(\n",
" directory=\"/content/train\",\n",
" target_size=(IMG_SIZE, IMG_SIZE),\n",
" batch_size=32,\n",
" class_mode='binary',\n",
" subset='training'\n",
")\n",
"\n",
"val_generator = train_datagen.flow_from_directory(\n",
" directory=\"/content/train\",\n",
" target_size=(IMG_SIZE, IMG_SIZE),\n",
" batch_size=32,\n",
" class_mode='binary',\n",
" subset='validation'\n",
")\n",
"\n",
"\n",
"test_generator = test_datagen.flow_from_directory(\n",
" directory=\"/content/test\",\n",
" target_size=(IMG_SIZE, IMG_SIZE),\n",
" batch_size=32,\n",
" class_mode='binary',\n",
" shuffle=False # Keep the order of the predictions for ensemble later\n",
")"
]
},
{
"cell_type": "markdown",
"source": [
"# Mount Google Drive to save trained models so they can be used in multiple instances.\n",
"\n",
"# Google Collab has a max of 12hrs"
],
"metadata": {
"id": "Ej_WEMIYS_1E"
}
},
{
"cell_type": "code",
"execution_count": 16,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "1ttbej5DZsL4",
"outputId": "46644049-82b2-453d-b03e-40b8cab09366"
},
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"Drive already mounted at /content/gdrive; to attempt to forcibly remount, call drive.mount(\"/content/gdrive\", force_remount=True).\n"
]
}
],
"source": [
"drive.mount('/content/gdrive')\n",
"\n",
"model_save_path = \"/content/gdrive/MyDrive/saved_models\"\n",
"if not os.path.exists(model_save_path):\n",
" os.makedirs(model_save_path)"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "LC015stqqM0g"
},
"source": [
"# Define the list of model architectures to train\n",
"# Train each model with their respective pre-trained weights and save the trained models\n",
"\n",
"# I selected a smaller number of epochs for the sake of time and computational resources and acknowledge that training for more epochs could potentially lead to better performance"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {
"background_save": true,
"base_uri": "https://localhost:8080/"
},
"id": "xOF0_q9ja2ro",
"outputId": "55b834b3-83c0-48f1-ef3f-540fa76bec50"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Training MobileNetV2 model...\n",
"Downloading data from https://storage.googleapis.com/tensorflow/keras-applications/mobilenet_v2/mobilenet_v2_weights_tf_dim_ordering_tf_kernels_1.0_224_no_top.h5\n",
"9406464/9406464 [==============================] - 0s 0us/step\n",
"Epoch 1/10\n",
"24/24 [==============================] - 148s 6s/step - loss: 0.6613 - accuracy: 0.6364 - val_loss: 0.5587 - val_accuracy: 0.7097\n",
"Epoch 2/10\n",
"24/24 [==============================] - 147s 6s/step - loss: 0.5093 - accuracy: 0.7620 - val_loss: 0.5211 - val_accuracy: 0.7097\n",
"Epoch 3/10\n",
"24/24 [==============================] - 130s 5s/step - loss: 0.4650 - accuracy: 0.7807 - val_loss: 0.5082 - val_accuracy: 0.7043\n",
"Epoch 4/10\n",
"24/24 [==============================] - 148s 6s/step - loss: 0.4408 - accuracy: 0.7914 - val_loss: 0.5031 - val_accuracy: 0.7312\n",
"Epoch 5/10\n",
"24/24 [==============================] - 136s 6s/step - loss: 0.4300 - accuracy: 0.7914 - val_loss: 0.5021 - val_accuracy: 0.7366\n",
"Epoch 6/10\n",
"24/24 [==============================] - 128s 5s/step - loss: 0.4135 - accuracy: 0.8128 - val_loss: 0.4963 - val_accuracy: 0.7366\n",
"Epoch 7/10\n",
"24/24 [==============================] - 129s 5s/step - loss: 0.4046 - accuracy: 0.8182 - val_loss: 0.4848 - val_accuracy: 0.7634\n",
"Epoch 8/10\n",
"24/24 [==============================] - 131s 5s/step - loss: 0.3948 - accuracy: 0.8262 - val_loss: 0.4832 - val_accuracy: 0.7796\n",
"Epoch 9/10\n",
"24/24 [==============================] - 141s 6s/step - loss: 0.3909 - accuracy: 0.8302 - val_loss: 0.4765 - val_accuracy: 0.7903\n",
"Epoch 10/10\n",
"24/24 [==============================] - 131s 6s/step - loss: 0.3753 - accuracy: 0.8396 - val_loss: 0.4794 - val_accuracy: 0.7634\n",
"MobileNetV2 model saved.\n"
]
}
],
"source": [
"# Define the list of models to train\n",
"model_architectures = [ \n",
" {\"name\": \"DenseNet121\", \"model\": DenseNet121}, \n",
" {\"name\": \"DenseNet201\", \"model\": DenseNet201}, \n",
" {\"name\": \"InceptionV3\", \"model\": InceptionV3}, \n",
" {\"name\": \"NASNetLarge\", \"model\": NASNetLarge}, \n",
" {\"name\": \"MobileNetV2\", \"model\": MobileNetV2}, ]\n",
"\n",
"# Iterate through each model architecture\n",
"for arch in model_architectures:\n",
" print(f\"Training {arch['name']} model...\")\n",
" \n",
" # Initialize the base model with pre-trained weights\n",
" base_model = arch['model'](\n",
" input_shape=(IMG_SIZE, IMG_SIZE, 3),\n",
" include_top=False,\n",
" weights='imagenet'\n",
" )\n",
"\n",
" # Set the base model as non-trainable (use pre-trained weights)\n",
" base_model.trainable = False\n",
" \n",
" # Define the input layer\n",
" inputs = tf.keras.Input(shape=(IMG_SIZE, IMG_SIZE, 3))\n",
" \n",
" # Pass the inputs through the base model\n",
" x = base_model(inputs, training=False)\n",
" \n",
" # Add a global average pooling layer\n",
" x = tf.keras.layers.GlobalAveragePooling2D()(x)\n",
" \n",
" # Add the output layer with sigmoid activation for binary classification\n",
" outputs = tf.keras.layers.Dense(1, activation='sigmoid')(x)\n",
"\n",
" # Build the final model\n",
" model = tf.keras.Model(inputs, outputs)\n",
"\n",
" # Compile the model with Adam optimizer and binary cross-entropy loss\n",
" model.compile(optimizer=tf.keras.optimizers.Adam(),\n",
" loss=tf.keras.losses.BinaryCrossentropy(),\n",
" metrics=['accuracy'])\n",
"\n",
" # Train the model using the training and validation data\n",
" history = model.fit(\n",
" train_generator,\n",
" validation_data=val_generator,\n",
" epochs=10,\n",
" steps_per_epoch=len(train_generator),\n",
" validation_steps=len(val_generator),\n",
" verbose=1\n",
" )\n",
" \n",
" # Save the trained model to Google Drive\n",
" model_save_path = \"/content/gdrive/MyDrive/saved_models\"\n",
" model.save(os.path.join(model_save_path, f\"{arch['name']}.h5\"))\n",
"\n",
" print(f\"{arch['name']} model saved.\")\n"
]
},
{
"cell_type": "markdown",
"source": [
"# Prepare the test images and labels by loading and preprocessing them\n",
"# Read test images from the test directory, resize, and rescale them\n",
"# Store the preprocessed images and their corresponding labels in lists\n"
],
"metadata": {
"id": "Ga19Q9YLTf9Q"
}
},
{
"cell_type": "code",
"source": [
"# Prepare the test images and labels\n",
"test_images = []\n",
"test_labels = []\n",
"test_folder = \"/content/test\"\n",
"\n",
"# Iterate over each class folder in the test directory\n",
"for label, folder in enumerate(['benign', 'malignant']):\n",
" folder_path = os.path.join(test_folder, folder)\n",
" \n",
" # Iterate over each image in the class folder\n",
" for image_name in os.listdir(folder_path):\n",
" image_path = os.path.join(folder_path, image_name)\n",
" img = cv2.imread(image_path)\n",
" if img is not None:\n",
" img = cv2.resize(img, (IMG_SIZE, IMG_SIZE))\n",
" img = img / 255.0\n",
" \n",
" # Append the preprocessed image and its label to the test images and labels list\n",
" test_images.append(img)\n",
" test_labels.append(label)\n",
"\n",
"# Convert the test images list to a numpy array\n",
"test_images = np.array(test_images)"
],
"metadata": {
"id": "kuuPoiYAN1lP"
},
"execution_count": 8,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"# Evaluate the performance of each trained model on the test dataset\n",
"# Load each saved model, generate predictions, and calculate evaluation metrics\n"
],
"metadata": {
"id": "4PPoqrgkTmoI"
}
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {
"id": "voyYmrothQSg",
"colab": {
"base_uri": "https://localhost:8080/"
},
"outputId": "ae4f1097-87d3-4875-e4f3-f6ff11e3585d"
},
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"Evaluating DenseNet121 model...\n",
"8/8 [==============================] - 12s 130ms/step\n",
"DenseNet121 model evaluation:\n",
" Accuracy: 0.6623931623931624\n",
" Precision: 0.625\n",
" Recall: 0.7207207207207207\n",
" F1-score: 0.6694560669456067\n",
"\n",
"Evaluating DenseNet201 model...\n",
"8/8 [==============================] - 5s 136ms/step\n",
"DenseNet201 model evaluation:\n",
" Accuracy: 0.6538461538461539\n",
" Precision: 0.5862068965517241\n",
" Recall: 0.918918918918919\n",
" F1-score: 0.7157894736842105\n",
"\n",
"Evaluating InceptionV3 model...\n",
"8/8 [==============================] - 4s 108ms/step\n",
"InceptionV3 model evaluation:\n",
" Accuracy: 0.688034188034188\n",
" Precision: 0.6319444444444444\n",
" Recall: 0.8198198198198198\n",
" F1-score: 0.7137254901960783\n",
"\n",
"Evaluating NASNetLarge model...\n",
"8/8 [==============================] - 9s 156ms/step\n",
"NASNetLarge model evaluation:\n",
" Accuracy: 0.6709401709401709\n",
" Precision: 0.6075949367088608\n",
" Recall: 0.8648648648648649\n",
" F1-score: 0.7137546468401488\n",
"\n",
"Evaluating MobileNetV2 model...\n",
"8/8 [==============================] - 2s 67ms/step\n",
"MobileNetV2 model evaluation:\n",
" Accuracy: 0.6581196581196581\n",
" Precision: 0.5950920245398773\n",
" Recall: 0.8738738738738738\n",
" F1-score: 0.708029197080292\n",
"\n"
]
}
],
"source": [
"# Load and test each model\n",
"model_predictions = []\n",
"for arch in model_architectures:\n",
" print(f\"Evaluating {arch['name']} model...\")\n",
" \n",
" # Load the saved model from Google Drive\n",
" model_path = os.path.join(model_save_path, f\"{arch['name']}.h5\")\n",
" model = load_model(model_path)\n",
"\n",
" # Generate predictions for the test images using the loaded model\n",
" predictions = model.predict(test_images)\n",
" predictions = [1 if p >= 0.5 else 0 for p in predictions]\n",
" \n",
" # Append the model's predictions to the model_predictions list\n",
" model_predictions.append(predictions)\n",
"\n",
" # Calculate evaluation metrics (accuracy, precision, recall, f1-score) for the model's predictions\n",
" accuracy = accuracy_score(test_labels, predictions)\n",
" precision = precision_score(test_labels, predictions)\n",
" recall = recall_score(test_labels, predictions)\n",
" f1 = f1_score(test_labels, predictions)\n",
"\n",
" # Print the model's evaluation metrics\n",
" print(f\"{arch['name']} model evaluation:\")\n",
" print(f\" Accuracy: {accuracy}\")\n",
" print(f\" Precision: {precision}\")\n",
" print(f\" Recall: {recall}\")\n",
" print(f\" F1-score: {f1}\\n\")"
]
},
{
"cell_type": "markdown",
"source": [
"# Find the best weights for an ensemble of models\n",
"# Generate all possible weight combinations and evaluate their performance on the test dataset\n",
"# Store the best weight combination and its corresponding accuracy\n"
],
"metadata": {
"id": "_RZHSzqHTthW"
}
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {
"id": "S8JWl1NZtpVh",
"colab": {
"base_uri": "https://localhost:8080/"
},
"outputId": "1983f95d-d749-4bf7-9fb8-2043dc1013af"
},
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"Best weights: (0.1, 0.1, 0.1, 0.4, 0.3)\n",
"Best accuracy: 0.6965811965811965\n"
]
}
],
"source": [
"# Define the list of possible weights\n",
"weight_values = [0.1, 0.2, 0.3, 0.4]\n",
"\n",
"# Generate all possible combinations of weights\n",
"weight_combinations = list(itertools.product(weight_values, repeat=len(model_architectures)))\n",
"\n",
"best_weights = None\n",
"best_accuracy = 0\n",
"\n",
"# Loop through all weight combinations\n",
"for weights in weight_combinations:\n",
" # Check if the sum of the weights is 1.0 (if not, skip this combination)\n",
" if round(sum(weights), 2) != 1.0:\n",
" continue\n",
"\n",
" # Calculate the weighted predictions\n",
" combined_predictions = np.sum(\n",
" [np.array(predictions) * weight for predictions, weight in zip(model_predictions, weights)],\n",
" axis=0\n",
" )\n",
"\n",
" # Convert the predictions to binary labels\n",
" threshold = 0.5\n",
" binary_predictions = (combined_predictions > threshold).astype(int)\n",
"\n",
" # Calculate the accuracy\n",
" accuracy = accuracy_score(test_labels, binary_predictions)\n",
"\n",
" # Check if the accuracy is better than the best accuracy found so far\n",
" if accuracy > best_accuracy:\n",
" best_accuracy = accuracy\n",
" best_weights = weights\n",
"\n",
"print(f\"Best weights: {best_weights}\")\n",
"print(f\"Best accuracy: {best_accuracy}\")\n"
]
},
{
"cell_type": "markdown",
"source": [
"# Summary of the Ensemble performance"
],
"metadata": {
"id": "idESEhbhTws-"
}
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {
"id": "pf-RxghBtrTG",
"colab": {
"base_uri": "https://localhost:8080/"
},
"outputId": "3d471f14-78f4-4619-82cc-a841bbbda6dc"
},
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"Ensemble model evaluation:\n",
"Accuracy: 0.6965811965811965\n",
"Precision: 0.6351351351351351\n",
"Recall: 0.8468468468468469\n",
"F1-score: 0.7258687258687259\n"
]
}
],
"source": [
"# Calculate the weighted predictions using the best weights\n",
"combined_predictions = np.sum(\n",
" [np.array(predictions) * weight for predictions, weight in zip(model_predictions, best_weights)],\n",
" axis=0\n",
")\n",
"\n",
"# Convert the predictions to binary labels\n",
"threshold = 0.5\n",
"binary_predictions = (combined_predictions > threshold).astype(int)\n",
"\n",
"# Calculate the accuracy, precision, recall, and F1-score\n",
"accuracy = accuracy_score(test_labels, binary_predictions)\n",
"precision = precision_score(test_labels, binary_predictions)\n",
"recall = recall_score(test_labels, binary_predictions)\n",
"f1 = f1_score(test_labels, binary_predictions)\n",
"\n",
"print(\"Ensemble model evaluation:\")\n",
"print(f\"Accuracy: {accuracy}\")\n",
"print(f\"Precision: {precision}\")\n",
"print(f\"Recall: {recall}\")\n",
"print(f\"F1-score: {f1}\")\n"
]
},
{
"cell_type": "markdown",
"source": [
"# The performance of the models seems to be moderate, with accuracy ranging from 65% to 69%.\n",
"\n",
"These accuracy levels may be acceptable for some use cases but for this project (critical medical diagnoses) a higher performance would be desirable.\n",
"\n",
"# Adding fine tuning and to try to improve the accuracy"
],
"metadata": {
"id": "eKtxdqsPRzgX"
}
},
{
"cell_type": "code",
"source": [
"# Iterate through each model architecture\n",
"for arch in model_architectures:\n",
" print(f\"Training {arch['name']} model...\")\n",
" \n",
" # Initialize the base model with pre-trained weights\n",
" base_model = arch['model'](\n",
" input_shape=(IMG_SIZE, IMG_SIZE, 3),\n",
" include_top=False,\n",
" weights='imagenet'\n",
" )\n",
"\n",
" # Set the base model as non-trainable (use pre-trained weights)\n",
" base_model.trainable = False\n",
" \n",
" # Define the input layer\n",
" inputs = tf.keras.Input(shape=(IMG_SIZE, IMG_SIZE, 3))\n",
" \n",
" # Pass the inputs through the base model\n",
" x = base_model(inputs, training=False)\n",
" \n",
" # Add a global average pooling layer\n",
" x = tf.keras.layers.GlobalAveragePooling2D()(x)\n",
" \n",
" # Add the output layer with sigmoid activation for binary classification\n",
" outputs = tf.keras.layers.Dense(1, activation='sigmoid')(x)\n",
"\n",
" # Build the final model\n",
" model = tf.keras.Model(inputs, outputs)\n",
"\n",
" # Compile the model with Adam optimizer and binary cross-entropy loss\n",
" model.compile(optimizer=tf.keras.optimizers.Adam(),\n",
" loss=tf.keras.losses.BinaryCrossentropy(),\n",
" metrics=['accuracy'])\n",
"\n",
" # Train the model using the training and validation data\n",
" history = model.fit(\n",
" train_generator,\n",
" validation_data=val_generator,\n",
" epochs=10,\n",
" steps_per_epoch=len(train_generator),\n",
" validation_steps=len(val_generator),\n",
" verbose=1\n",
" )\n",
"\n",
" # Unfreeze the last few layers of the base model for fine-tuning\n",
" for layer in base_model.layers[-5:]:\n",
" layer.trainable = True\n",
"\n",
" # Compile the model with a lower learning rate for fine-tuning\n",
" model.compile(optimizer=tf.keras.optimizers.Adam(lr=1e-5),\n",
" loss=tf.keras.losses.BinaryCrossentropy(),\n",
" metrics=['accuracy'])\n",
"\n",
" # Fine-tune the model using the training and validation data\n",
" fine_tuning_history = model.fit(\n",
" train_generator,\n",
" validation_data=val_generator,\n",
" epochs=5,\n",
" steps_per_epoch=len(train_generator),\n",
" validation_steps=len(val_generator),\n",
" verbose=1\n",
" )\n",
" \n",
" # Save the trained model to Google Drive\n",
" model_save_path = \"/content/gdrive/MyDrive/saved_models\"\n",
" model.save(os.path.join(model_save_path, f\"{arch['name']}_finetuned.h5\"))\n",
"\n",
" print(f\"{arch['name']} model saved.\")\n"
],
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "JdX_hsjiRLAu",
"outputId": "4ce0adf7-e897-4d33-d89f-e7723dd49caa"
},
"execution_count": 14,
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"Training DenseNet121 model...\n",
"Epoch 1/10\n",
"24/24 [==============================] - 173s 7s/step - loss: 0.6299 - accuracy: 0.6578 - val_loss: 0.6206 - val_accuracy: 0.7097\n",
"Epoch 2/10\n",
"24/24 [==============================] - 142s 6s/step - loss: 0.5466 - accuracy: 0.7286 - val_loss: 0.6045 - val_accuracy: 0.7527\n",
"Epoch 3/10\n",
"24/24 [==============================] - 140s 6s/step - loss: 0.5198 - accuracy: 0.7460 - val_loss: 0.5787 - val_accuracy: 0.7151\n",
"Epoch 4/10\n",
"24/24 [==============================] - 142s 6s/step - loss: 0.4885 - accuracy: 0.7647 - val_loss: 0.5681 - val_accuracy: 0.7097\n",
"Epoch 5/10\n",
"24/24 [==============================] - 139s 6s/step - loss: 0.4679 - accuracy: 0.7821 - val_loss: 0.5640 - val_accuracy: 0.7258\n",
"Epoch 6/10\n",
"24/24 [==============================] - 137s 6s/step - loss: 0.4606 - accuracy: 0.7861 - val_loss: 0.5599 - val_accuracy: 0.7366\n",
"Epoch 7/10\n",
"24/24 [==============================] - 143s 6s/step - loss: 0.4433 - accuracy: 0.7848 - val_loss: 0.5578 - val_accuracy: 0.7097\n",
"Epoch 8/10\n",
"24/24 [==============================] - 140s 6s/step - loss: 0.4338 - accuracy: 0.8102 - val_loss: 0.5534 - val_accuracy: 0.7366\n",
"Epoch 9/10\n",
"24/24 [==============================] - 144s 6s/step - loss: 0.4310 - accuracy: 0.7941 - val_loss: 0.5469 - val_accuracy: 0.7419\n",
"Epoch 10/10\n",
"24/24 [==============================] - 143s 6s/step - loss: 0.4206 - accuracy: 0.8115 - val_loss: 0.5446 - val_accuracy: 0.7473\n"
]
},
{
"output_type": "stream",
"name": "stderr",
"text": [
"WARNING:absl:`lr` is deprecated in Keras optimizer, please use `learning_rate` or use the legacy optimizer, e.g.,tf.keras.optimizers.legacy.Adam.\n"
]
},
{
"output_type": "stream",
"name": "stdout",
"text": [
"Epoch 1/5\n",
"24/24 [==============================] - 157s 6s/step - loss: 0.4185 - accuracy: 0.7995 - val_loss: 0.5405 - val_accuracy: 0.7473\n",
"Epoch 2/5\n",
"24/24 [==============================] - 141s 6s/step - loss: 0.4115 - accuracy: 0.8048 - val_loss: 0.5386 - val_accuracy: 0.7473\n",
"Epoch 3/5\n",
"24/24 [==============================] - 145s 6s/step - loss: 0.4028 - accuracy: 0.8249 - val_loss: 0.5388 - val_accuracy: 0.7419\n",
"Epoch 4/5\n",
"24/24 [==============================] - 141s 6s/step - loss: 0.4003 - accuracy: 0.8222 - val_loss: 0.5386 - val_accuracy: 0.7366\n",
"Epoch 5/5\n",
"24/24 [==============================] - 144s 6s/step - loss: 0.3915 - accuracy: 0.8302 - val_loss: 0.5279 - val_accuracy: 0.7473\n",
"DenseNet121 model saved.\n",
"Training DenseNet201 model...\n",
"Downloading data from https://storage.googleapis.com/tensorflow/keras-applications/densenet/densenet201_weights_tf_dim_ordering_tf_kernels_notop.h5\n",
"74836368/74836368 [==============================] - 3s 0us/step\n",
"Epoch 1/10\n",
"24/24 [==============================] - 181s 7s/step - loss: 0.6273 - accuracy: 0.6444 - val_loss: 0.5848 - val_accuracy: 0.6989\n",
"Epoch 2/10\n",
"24/24 [==============================] - 142s 6s/step - loss: 0.5185 - accuracy: 0.7540 - val_loss: 0.5361 - val_accuracy: 0.7581\n",
"Epoch 3/10\n",
"24/24 [==============================] - 141s 6s/step - loss: 0.4715 - accuracy: 0.7928 - val_loss: 0.5133 - val_accuracy: 0.7473\n",
"Epoch 4/10\n",
"24/24 [==============================] - 148s 6s/step - loss: 0.4432 - accuracy: 0.8048 - val_loss: 0.4990 - val_accuracy: 0.7634\n",
"Epoch 5/10\n",
"24/24 [==============================] - 141s 6s/step - loss: 0.4290 - accuracy: 0.8035 - val_loss: 0.4974 - val_accuracy: 0.7258\n",
"Epoch 6/10\n",
"24/24 [==============================] - 147s 6s/step - loss: 0.4181 - accuracy: 0.8075 - val_loss: 0.4918 - val_accuracy: 0.7473\n",
"Epoch 7/10\n",
"24/24 [==============================] - 138s 6s/step - loss: 0.4119 - accuracy: 0.8195 - val_loss: 0.4841 - val_accuracy: 0.7473\n",
"Epoch 8/10\n",
"24/24 [==============================] - 158s 7s/step - loss: 0.3938 - accuracy: 0.8342 - val_loss: 0.4936 - val_accuracy: 0.7366\n",
"Epoch 9/10\n",
"24/24 [==============================] - 145s 6s/step - loss: 0.3868 - accuracy: 0.8356 - val_loss: 0.4698 - val_accuracy: 0.8011\n",
"Epoch 10/10\n",
"24/24 [==============================] - 141s 6s/step - loss: 0.3770 - accuracy: 0.8302 - val_loss: 0.4685 - val_accuracy: 0.8118\n"
]
},
{
"output_type": "stream",
"name": "stderr",
"text": [
"WARNING:absl:`lr` is deprecated in Keras optimizer, please use `learning_rate` or use the legacy optimizer, e.g.,tf.keras.optimizers.legacy.Adam.\n"
]
},
{
"output_type": "stream",
"name": "stdout",
"text": [
"Epoch 1/5\n",
"24/24 [==============================] - 175s 7s/step - loss: 0.3751 - accuracy: 0.8289 - val_loss: 0.4645 - val_accuracy: 0.8172\n",
"Epoch 2/5\n",
"24/24 [==============================] - 140s 6s/step - loss: 0.3650 - accuracy: 0.8516 - val_loss: 0.4616 - val_accuracy: 0.7957\n",
"Epoch 3/5\n",
"24/24 [==============================] - 148s 6s/step - loss: 0.3598 - accuracy: 0.8382 - val_loss: 0.4554 - val_accuracy: 0.8118\n",
"Epoch 4/5\n",
"24/24 [==============================] - 143s 6s/step - loss: 0.3533 - accuracy: 0.8356 - val_loss: 0.4577 - val_accuracy: 0.8172\n",
"Epoch 5/5\n",
"24/24 [==============================] - 141s 6s/step - loss: 0.3558 - accuracy: 0.8556 - val_loss: 0.4516 - val_accuracy: 0.7796\n",
"DenseNet201 model saved.\n",
"Training InceptionV3 model...\n",
"Downloading data from https://storage.googleapis.com/tensorflow/keras-applications/inception_v3/inception_v3_weights_tf_dim_ordering_tf_kernels_notop.h5\n",
"87910968/87910968 [==============================] - 3s 0us/step\n",
"Epoch 1/10\n",
"24/24 [==============================] - 153s 6s/step - loss: 0.6589 - accuracy: 0.6083 - val_loss: 0.6021 - val_accuracy: 0.6828\n",
"Epoch 2/10\n",
"24/24 [==============================] - 141s 6s/step - loss: 0.5546 - accuracy: 0.7152 - val_loss: 0.5619 - val_accuracy: 0.7258\n",
"Epoch 3/10\n",
"24/24 [==============================] - 142s 6s/step - loss: 0.5017 - accuracy: 0.7687 - val_loss: 0.5526 - val_accuracy: 0.7151\n",
"Epoch 4/10\n",
"24/24 [==============================] - 143s 6s/step - loss: 0.4683 - accuracy: 0.8008 - val_loss: 0.5523 - val_accuracy: 0.6989\n",
"Epoch 5/10\n",
"24/24 [==============================] - 138s 6s/step - loss: 0.4554 - accuracy: 0.7848 - val_loss: 0.5482 - val_accuracy: 0.7151\n",
"Epoch 6/10\n",
"24/24 [==============================] - 147s 6s/step - loss: 0.4406 - accuracy: 0.8115 - val_loss: 0.5478 - val_accuracy: 0.7151\n",
"Epoch 7/10\n",
"24/24 [==============================] - 141s 6s/step - loss: 0.4290 - accuracy: 0.7981 - val_loss: 0.5372 - val_accuracy: 0.7204\n",
"Epoch 8/10\n",
"24/24 [==============================] - 145s 6s/step - loss: 0.4109 - accuracy: 0.8235 - val_loss: 0.5454 - val_accuracy: 0.6989\n",
"Epoch 9/10\n",
"24/24 [==============================] - 142s 6s/step - loss: 0.3931 - accuracy: 0.8396 - val_loss: 0.5554 - val_accuracy: 0.7097\n",
"Epoch 10/10\n",
"24/24 [==============================] - 142s 6s/step - loss: 0.3808 - accuracy: 0.8409 - val_loss: 0.5507 - val_accuracy: 0.6989\n"
]
},
{
"output_type": "stream",
"name": "stderr",
"text": [
"WARNING:absl:`lr` is deprecated in Keras optimizer, please use `learning_rate` or use the legacy optimizer, e.g.,tf.keras.optimizers.legacy.Adam.\n"
]
},
{
"output_type": "stream",
"name": "stdout",
"text": [
"Epoch 1/5\n",
"24/24 [==============================] - 155s 6s/step - loss: 0.3886 - accuracy: 0.8302 - val_loss: 0.5377 - val_accuracy: 0.7204\n",
"Epoch 2/5\n",
"24/24 [==============================] - 147s 6s/step - loss: 0.3683 - accuracy: 0.8516 - val_loss: 0.5447 - val_accuracy: 0.7312\n",
"Epoch 3/5\n",
"24/24 [==============================] - 143s 6s/step - loss: 0.3535 - accuracy: 0.8409 - val_loss: 0.5462 - val_accuracy: 0.7366\n",
"Epoch 4/5\n",
"24/24 [==============================] - 148s 6s/step - loss: 0.3429 - accuracy: 0.8596 - val_loss: 0.5581 - val_accuracy: 0.7204\n",
"Epoch 5/5\n",
"24/24 [==============================] - 140s 6s/step - loss: 0.3322 - accuracy: 0.8690 - val_loss: 0.5586 - val_accuracy: 0.7204\n",
"InceptionV3 model saved.\n",
"Training NASNetLarge model...\n",
"Downloading data from https://storage.googleapis.com/tensorflow/keras-applications/nasnet/NASNet-large-no-top.h5\n",
"343610240/343610240 [==============================] - 11s 0us/step\n",
"Epoch 1/10\n",
"24/24 [==============================] - 174s 6s/step - loss: 0.6187 - accuracy: 0.6564 - val_loss: 0.6188 - val_accuracy: 0.6720\n",
"Epoch 2/10\n",
"24/24 [==============================] - 139s 6s/step - loss: 0.5097 - accuracy: 0.7727 - val_loss: 0.5962 - val_accuracy: 0.6989\n",
"Epoch 3/10\n",
"24/24 [==============================] - 143s 6s/step - loss: 0.4695 - accuracy: 0.7968 - val_loss: 0.5955 - val_accuracy: 0.6882\n",
"Epoch 4/10\n",
"24/24 [==============================] - 139s 6s/step - loss: 0.4426 - accuracy: 0.8021 - val_loss: 0.5882 - val_accuracy: 0.6882\n",
"Epoch 5/10\n",
"24/24 [==============================] - 143s 6s/step - loss: 0.4189 - accuracy: 0.8168 - val_loss: 0.5891 - val_accuracy: 0.6882\n",
"Epoch 6/10\n",
"24/24 [==============================] - 141s 6s/step - loss: 0.4079 - accuracy: 0.8302 - val_loss: 0.5867 - val_accuracy: 0.6935\n",
"Epoch 7/10\n",
"24/24 [==============================] - 144s 6s/step - loss: 0.3928 - accuracy: 0.8329 - val_loss: 0.5870 - val_accuracy: 0.6989\n",
"Epoch 8/10\n",
"24/24 [==============================] - 143s 6s/step - loss: 0.3797 - accuracy: 0.8476 - val_loss: 0.5830 - val_accuracy: 0.6882\n",
"Epoch 9/10\n",
"24/24 [==============================] - 143s 6s/step - loss: 0.3673 - accuracy: 0.8543 - val_loss: 0.5841 - val_accuracy: 0.7097\n",
"Epoch 10/10\n",
"24/24 [==============================] - 137s 6s/step - loss: 0.3618 - accuracy: 0.8503 - val_loss: 0.5781 - val_accuracy: 0.6882\n"
]
},
{
"output_type": "stream",
"name": "stderr",
"text": [
"WARNING:absl:`lr` is deprecated in Keras optimizer, please use `learning_rate` or use the legacy optimizer, e.g.,tf.keras.optimizers.legacy.Adam.\n"
]
},
{
"output_type": "stream",
"name": "stdout",
"text": [
"Epoch 1/5\n",
"24/24 [==============================] - 173s 6s/step - loss: 0.3594 - accuracy: 0.8476 - val_loss: 0.5973 - val_accuracy: 0.7312\n",
"Epoch 2/5\n",
"24/24 [==============================] - 142s 6s/step - loss: 0.3388 - accuracy: 0.8636 - val_loss: 0.5873 - val_accuracy: 0.6989\n",
"Epoch 3/5\n",
"24/24 [==============================] - 140s 6s/step - loss: 0.3332 - accuracy: 0.8730 - val_loss: 0.5834 - val_accuracy: 0.6720\n",
"Epoch 4/5\n",
"24/24 [==============================] - 144s 6s/step - loss: 0.3216 - accuracy: 0.8730 - val_loss: 0.5868 - val_accuracy: 0.6935\n",
"Epoch 5/5\n",
"24/24 [==============================] - 140s 6s/step - loss: 0.3167 - accuracy: 0.8703 - val_loss: 0.5977 - val_accuracy: 0.7258\n",
"NASNetLarge model saved.\n",
"Training MobileNetV2 model...\n",
"Downloading data from https://storage.googleapis.com/tensorflow/keras-applications/mobilenet_v2/mobilenet_v2_weights_tf_dim_ordering_tf_kernels_1.0_224_no_top.h5\n",
"9406464/9406464 [==============================] - 1s 0us/step\n",
"Epoch 1/10\n",
"24/24 [==============================] - 146s 6s/step - loss: 0.6431 - accuracy: 0.6217 - val_loss: 0.5707 - val_accuracy: 0.6828\n",
"Epoch 2/10\n",
"24/24 [==============================] - 138s 6s/step - loss: 0.5221 - accuracy: 0.7313 - val_loss: 0.5654 - val_accuracy: 0.6935\n",
"Epoch 3/10\n",
"24/24 [==============================] - 153s 6s/step - loss: 0.4873 - accuracy: 0.7607 - val_loss: 0.5328 - val_accuracy: 0.7527\n",
"Epoch 4/10\n",
"24/24 [==============================] - 137s 6s/step - loss: 0.4703 - accuracy: 0.7714 - val_loss: 0.5244 - val_accuracy: 0.7581\n",
"Epoch 5/10\n",
"24/24 [==============================] - 138s 6s/step - loss: 0.4489 - accuracy: 0.7914 - val_loss: 0.5176 - val_accuracy: 0.7634\n",
"Epoch 6/10\n",
"24/24 [==============================] - 138s 6s/step - loss: 0.4357 - accuracy: 0.7955 - val_loss: 0.5131 - val_accuracy: 0.7581\n",
"Epoch 7/10\n",
"24/24 [==============================] - 138s 6s/step - loss: 0.4221 - accuracy: 0.8075 - val_loss: 0.5101 - val_accuracy: 0.7634\n",
"Epoch 8/10\n",
"24/24 [==============================] - 138s 6s/step - loss: 0.4145 - accuracy: 0.8195 - val_loss: 0.5269 - val_accuracy: 0.7312\n",
"Epoch 9/10\n",
"24/24 [==============================] - 138s 6s/step - loss: 0.4086 - accuracy: 0.8182 - val_loss: 0.5017 - val_accuracy: 0.7688\n",
"Epoch 10/10\n",
"24/24 [==============================] - 138s 6s/step - loss: 0.4035 - accuracy: 0.8222 - val_loss: 0.5032 - val_accuracy: 0.7796\n"
]
},
{
"output_type": "stream",
"name": "stderr",
"text": [
"WARNING:absl:`lr` is deprecated in Keras optimizer, please use `learning_rate` or use the legacy optimizer, e.g.,tf.keras.optimizers.legacy.Adam.\n"
]
},
{
"output_type": "stream",
"name": "stdout",
"text": [
"Epoch 1/5\n",
"24/24 [==============================] - 148s 6s/step - loss: 0.3930 - accuracy: 0.8262 - val_loss: 0.5387 - val_accuracy: 0.7204\n",
"Epoch 2/5\n",
"24/24 [==============================] - 152s 6s/step - loss: 0.3806 - accuracy: 0.8329 - val_loss: 0.5167 - val_accuracy: 0.7419\n",
"Epoch 3/5\n",
"24/24 [==============================] - 136s 6s/step - loss: 0.3729 - accuracy: 0.8382 - val_loss: 0.4888 - val_accuracy: 0.7849\n",
"Epoch 4/5\n",
"24/24 [==============================] - 136s 6s/step - loss: 0.3655 - accuracy: 0.8369 - val_loss: 0.5008 - val_accuracy: 0.7581\n",
"Epoch 5/5\n",
"24/24 [==============================] - 148s 6s/step - loss: 0.3557 - accuracy: 0.8463 - val_loss: 0.4915 - val_accuracy: 0.7688\n",
"MobileNetV2 model saved.\n"
]
}
]
},
{
"cell_type": "code",
"source": [
"# Function to plot ROC curve\n",
"def plot_roc_curve(fpr, tpr, roc_auc, model_name):\n",
"    plt.plot(fpr, tpr, label=f'{model_name} (AUC = {roc_auc:.2f})')\n",
"    plt.plot([0, 1], [0, 1], 'k--')\n",
"    plt.xlim([0.0, 1.0])\n",
"    plt.ylim([0.0, 1.05])\n",
"    plt.xlabel('False Positive Rate')\n",
"    plt.ylabel('True Positive Rate')\n",
"    plt.title('Receiver Operating Characteristic (ROC)')\n",
"    plt.legend(loc=\"lower right\")\n",
"\n",
"# Function to plot confusion matrix\n",
"def plot_confusion_matrix(cm, model_name):\n",
"    sns.heatmap(cm, annot=True, fmt=\"d\", cmap=\"Blues\")\n",
"    plt.xlabel('Predicted')\n",
"    plt.ylabel('True')\n",
"    plt.title(f'Confusion Matrix for {model_name}')\n",
"\n",
"# Initialize an empty dictionary to store MCC scores\n",
"mcc_scores = {}\n",
"\n",
"# Load and test each model\n",
"model_predictions = []\n",
"for arch in model_architectures:\n",
"    print(f\"Evaluating {arch['name']} finetuned model...\")\n",
"\n",
"    # Load the saved model from Google Drive\n",
"    model_path = os.path.join(model_save_path, f\"{arch['name']}_finetuned.h5\")\n",
"    model = load_model(model_path)\n",
"\n",
"    # Generate predicted probabilities for the test images, flattened to shape (n_samples,)\n",
"    predictions = model.predict(test_images).ravel()\n",
"    # Threshold the probabilities at 0.5 to obtain 0/1 labels\n",
"    predictions_binary = (predictions >= 0.5).astype(int)\n",
"\n",
"    # Append the model's binary predictions to the model_predictions list\n",
"    model_predictions.append(predictions_binary)\n",
"\n",
"    # Calculate evaluation metrics (accuracy, precision, recall, f1-score, confusion matrix, AUC-ROC, and MCC) for the model's predictions\n",
"    accuracy = accuracy_score(test_labels, predictions_binary)\n",
"    precision = precision_score(test_labels, predictions_binary)\n",
"    recall = recall_score(test_labels, predictions_binary)\n",
"    f1 = f1_score(test_labels, predictions_binary)\n",
"    cm = confusion_matrix(test_labels, predictions_binary)\n",
"    auc_roc = roc_auc_score(test_labels, predictions)\n",
"    mcc = matthews_corrcoef(test_labels, predictions_binary)\n",
"\n",
"    # Print the model's evaluation metrics\n",
"    print(f\"{arch['name']} model evaluation:\")\n",
"    print(f\" Accuracy: {accuracy}\")\n",
"    print(f\" Precision: {precision}\")\n",
"    print(f\" Recall: {recall}\")\n",
"    print(f\" F1-score: {f1}\")\n",
"    print(f\" Confusion matrix: \\n{cm}\")\n",
"    print(f\" AUC-ROC: {auc_roc}\")\n",
"    print(f\" MCC: {mcc}\\n\")\n",
"\n",
"    # Store the MCC score in the dictionary\n",
"    mcc_scores[arch['name']] = mcc\n",
"\n",
"    # Plot the confusion matrix\n",
"    plt.figure()\n",
"    plot_confusion_matrix(cm, arch['name'])\n",
"    plt.show()\n",
"\n",
"    # Plot the ROC curve from the predicted probabilities\n",
"    fpr, tpr, _ = roc_curve(test_labels, predictions)\n",
"    roc_auc = auc(fpr, tpr)\n",
"    plt.figure()\n",
"    plot_roc_curve(fpr, tpr, roc_auc, arch['name'])\n",
"    plt.show()\n"
],
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 1000
},
"id": "AXAHFzIzAYl4",
"outputId": "95e89898-85c9-4a37-907e-d74e0bc81613"
},
"execution_count": 31,
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"Evaluating DenseNet121 finetuned model...\n"
]
},
{
"output_type": "stream",
"name": "stderr",
"text": [
"WARNING:tensorflow:Error in loading the saved optimizer state. As a result, your model is starting with a freshly initialized optimizer.\n"
]
},
{
"output_type": "stream",
"name": "stdout",
"text": [
"8/8 [==============================] - 5s 39ms/step\n",
"DenseNet121 model evaluation:\n",
" Accuracy: 0.6623931623931624\n",
" Precision: 0.6012658227848101\n",
" Recall: 0.8558558558558559\n",
" F1-score: 0.7063197026022304\n",
" Confusion matrix: \n",
"[[60 63]\n",
" [16 95]]\n",
" AUC-ROC: 0.7453306965502088\n",
" MCC: 0.36644484679082756\n",
"\n"
]
},
{
"output_type": "display_data",
"data": {
"text/plain": [
"<Figure size 640x480 with 2 Axes>"
],
"image/png": "\n"
},
"metadata": {}
},
{
"output_type": "display_data",
"data": {
"text/plain": [
"<Figure size 640x480 with 1 Axes>"
],
"image/png": "\n"
},
"metadata": {}
},
{
"output_type": "stream",
"name": "stdout",
"text": [
"Evaluating DenseNet201 finetuned model...\n"
]
},
{
"output_type": "stream",
"name": "stderr",
"text": [
"WARNING:tensorflow:Error in loading the saved optimizer state. As a result, your model is starting with a freshly initialized optimizer.\n"
]
},
{
"output_type": "stream",
"name": "stdout",
"text": [
"8/8 [==============================] - 5s 57ms/step\n",
"DenseNet201 model evaluation:\n",
" Accuracy: 0.6239316239316239\n",
" Precision: 0.56353591160221\n",
" Recall: 0.918918918918919\n",
" F1-score: 0.6986301369863014\n",
" Confusion matrix: \n",
"[[ 44 79]\n",
" [ 9 102]]\n",
" AUC-ROC: 0.7639346663736908\n",
" MCC: 0.33003174636621696\n",
"\n"
]
},
{
"output_type": "display_data",
"data": {
"text/plain": [
"<Figure size 640x480 with 2 Axes>"
],
"image/png": "\n"
},
"metadata": {}
},
{
"output_type": "display_data",
"data": {
"text/plain": [
"<Figure size 640x480 with 1 Axes>"
],
"image/png": "\n"
},
"metadata": {}
},
{
"output_type": "stream",
"name": "stdout",
"text": [
"Evaluating InceptionV3 finetuned model...\n",
"8/8 [==============================] - 2s 31ms/step\n",
"InceptionV3 model evaluation:\n",
" Accuracy: 0.688034188034188\n",
" Precision: 0.6759259259259259\n",
" Recall: 0.6576576576576577\n",
" F1-score: 0.6666666666666667\n",
" Confusion matrix: \n",
"[[88 35]\n",
" [38 73]]\n",
" AUC-ROC: 0.7499450670182377\n",
" MCC: 0.37372120906708856\n",
"\n"
]
},
{
"output_type": "display_data",
"data": {
"text/plain": [
"<Figure size 640x480 with 2 Axes>"
],
"image/png": "\n"
},
"metadata": {}
},
{
"output_type": "display_data",
"data": {
"text/plain": [
"<Figure size 640x480 with 1 Axes>"
],
"image/png": "\n"
},
"metadata": {}
},
{
"output_type": "stream",
"name": "stdout",
"text": [
"Evaluating NASNetLarge finetuned model...\n",
"8/8 [==============================] - 8s 119ms/step\n",
"NASNetLarge model evaluation:\n",
" Accuracy: 0.6239316239316239\n",
" Precision: 0.5688622754491018\n",
" Recall: 0.8558558558558559\n",
" F1-score: 0.683453237410072\n",
" Confusion matrix: \n",
"[[51 72]\n",
" [16 95]]\n",
" AUC-ROC: 0.7489928953343588\n",
" MCC: 0.29879245429862344\n",
"\n"
]
},
{
"output_type": "display_data",
"data": {
"text/plain": [
"<Figure size 640x480 with 2 Axes>"
],
"image/png": "\n"
},
"metadata": {}
},
{
"output_type": "display_data",
"data": {
"text/plain": [
"<Figure size 640x480 with 1 Axes>"
],
"image/png": "\n"
},
"metadata": {}
},
{
"output_type": "stream",
"name": "stdout",
"text": [
"Evaluating MobileNetV2 finetuned model...\n"
]
},
{
"output_type": "stream",
"name": "stderr",
"text": [
"WARNING:tensorflow:Error in loading the saved optimizer state. As a result, your model is starting with a freshly initialized optimizer.\n"
]
},
{
"output_type": "stream",
"name": "stdout",
"text": [
"8/8 [==============================] - 1s 21ms/step\n",
"MobileNetV2 model evaluation:\n",
" Accuracy: 0.6410256410256411\n",
" Precision: 0.5859872611464968\n",
" Recall: 0.8288288288288288\n",
" F1-score: 0.6865671641791046\n",
" Confusion matrix: \n",
"[[58 65]\n",
" [19 92]]\n",
" AUC-ROC: 0.7635684464952758\n",
" MCC: 0.31921297473985993\n",
"\n"
]
},
{
"output_type": "display_data",
"data": {
"text/plain": [
"<Figure size 640x480 with 2 Axes>"
],
"image/png": "\n"
},
"metadata": {}
},
{
"output_type": "display_data",
"data": {
"text/plain": [
"<Figure size 640x480 with 1 Axes>"
],
"image/png": "\n"
},
"metadata": {}
}
]
},
{
"cell_type": "code",
"source": [
"# Define the list of possible weights\n",
"weight_values = [0.1, 0.2, 0.3, 0.4]\n",
"\n",
"# Generate all possible combinations of weights\n",
"weight_combinations = list(itertools.product(weight_values, repeat=len(model_architectures)))\n",
"\n",
"best_weights = None\n",
"best_accuracy = 0\n",
"\n",
"# Loop through all weight combinations.\n",
"# Note: the weights are chosen by accuracy on the test set itself, so the\n",
"# reported ensemble accuracy is an optimistic estimate.\n",
"for weights in weight_combinations:\n",
"    # Check if the sum of the weights is 1.0 (if not, skip this combination)\n",
"    if round(sum(weights), 2) != 1.0:\n",
"        continue\n",
"\n",
"    # Calculate the weighted vote (model_predictions holds each model's 0/1 labels)\n",
"    combined_predictions = np.sum(\n",
"        [np.array(predictions) * weight for predictions, weight in zip(model_predictions, weights)],\n",
"        axis=0\n",
"    )\n",
"\n",
"    # Convert the weighted vote to binary labels\n",
"    threshold = 0.5\n",
"    binary_predictions = (combined_predictions > threshold).astype(int)\n",
"\n",
"    # Calculate the accuracy\n",
"    accuracy = accuracy_score(test_labels, binary_predictions)\n",
"\n",
"    # Check if the accuracy is better than the best accuracy found so far\n",
"    if accuracy > best_accuracy:\n",
"        best_accuracy = accuracy\n",
"        best_weights = weights\n",
"\n",
"print(f\"Best weights: {best_weights}\")\n",
"print(f\"Best accuracy: {best_accuracy}\")\n"
],
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "WEcBBLroA3v2",
"outputId": "ec127e5c-9bcc-408a-94bb-7c8949389784"
},
"execution_count": 21,
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"Best weights: (0.1, 0.1, 0.3, 0.1, 0.4)\n",
"Best accuracy: 0.688034188034188\n"
]
}
]
},
{
"cell_type": "code",
"source": [
"# Calculate the weighted predictions using the best weights\n",
"combined_predictions = np.sum(\n",
" [np.array(predictions) * weight for predictions, weight in zip(model_predictions, best_weights)],\n",
" axis=0\n",
")\n",
"\n",
"# Convert the predictions to binary labels\n",
"threshold = 0.5\n",
"binary_predictions = (combined_predictions > threshold).astype(int)\n",
"\n",
"# Calculate accuracy, precision, recall, F1-score, confusion matrix, AUC-ROC, and MCC\n",
"accuracy = accuracy_score(test_labels, binary_predictions)\n",
"precision = precision_score(test_labels, binary_predictions)\n",
"recall = recall_score(test_labels, binary_predictions)\n",
"f1 = f1_score(test_labels, binary_predictions)\n",
"cm = confusion_matrix(test_labels, binary_predictions)\n",
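"# Note: AUC-ROC is computed on the thresholded 0/1 labels here, so it reduces to\n",
"# balanced accuracy rather than a ranking-based AUC over continuous scores\n",
"auc_roc = roc_auc_score(test_labels, binary_predictions)\n",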
"mcc = matthews_corrcoef(test_labels, binary_predictions)\n",
"\n",
"print(\"Ensemble model evaluation:\")\n",
"print(f\"Accuracy: {accuracy}\")\n",
"print(f\"Precision: {precision}\")\n",
"print(f\"Recall: {recall}\")\n",
"print(f\"F1-score: {f1}\")\n",
"print(f\" Confusion matrix: \\n{cm}\")\n",
"print(f\" AUC-ROC: {auc_roc}\")\n",
"print(f\" MCC: {mcc}\\n\")\n",
"\n",
"mcc_scores['Ensemble'] = mcc\n",
"\n",
"plt.figure()\n",
"plt.barh(list(mcc_scores.keys()), list(mcc_scores.values()), color='blue')\n",
"plt.xlabel('MCC Score')\n",
"plt.ylabel('Model')\n"
],
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 656
},
"id": "KaQQnqaUCeLo",
"outputId": "57be0fa1-360d-4656-ac6a-57831fc3e0d0"
},
"execution_count": 33,
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"Ensemble model evaluation:\n",
"Accuracy: 0.688034188034188\n",
"Precision: 0.6266666666666667\n",
"Recall: 0.8468468468468469\n",
"F1-score: 0.7203065134099618\n",
" Confusion matrix: \n",
"[[67 56]\n",
" [17 94]]\n",
" AUC-ROC: 0.6957811470006592\n",
" MCC: 0.4075957404067199\n",
"\n"
]
},
{
"output_type": "execute_result",
"data": {
"text/plain": [
"Text(0, 0.5, 'Model')"
]
},
"metadata": {},
"execution_count": 33
},
{
"output_type": "display_data",
"data": {
"text/plain": [
"<Figure size 640x480 with 1 Axes>"
],
"image/png": "\n"
},
"metadata": {}
}
]
}
],
"metadata": {
"accelerator": "GPU",
"colab": {
"provenance": [],
"authorship_tag": "ABX9TyO3QIpzJfPtbT7VTKsS/Yyp",
"include_colab_link": true
},
"gpuClass": "premium",
"kernelspec": {
"display_name": "Python 3",
"name": "python3"
},
"language_info": {
"name": "python"
}
},
"nbformat": 4,
"nbformat_minor": 0
}