@aeyage
Last active February 1, 2024 10:07
PoS Tagging of DBS Bank
{
"nbformat": 4,
"nbformat_minor": 0,
"metadata": {
"colab": {
"provenance": [],
"collapsed_sections": [
"csVW2IbEphTu",
"wPHqHd3iO4F6",
"pkSHayO8PlVF",
"HdxkSFXKQKqa",
"Uh_cLIaU0Grv",
"CH-PEAWuR35n"
],
"authorship_tag": "ABX9TyOouSMNGePRpDAqoes0mDae",
"include_colab_link": true
},
"kernelspec": {
"name": "python3",
"display_name": "Python 3"
},
"language_info": {
"name": "python"
}
},
"cells": [
{
"cell_type": "markdown",
"metadata": {
"id": "view-in-github",
"colab_type": "text"
},
"source": [
"<a href=\"https://colab.research.google.com/gist/aeyage/a8c6faf754472fdf71c9f930783af886/cpc353-pos-tagging-of-dbs-bank.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
]
},
{
"cell_type": "markdown",
"source": [
"# Text and Sentiment Analysis of DBS Bank Ltd. Banking Performance Using Part-of-Speech Tagging"
],
"metadata": {
"id": "qcShyLmum8A-"
}
},
{
"cell_type": "markdown",
"source": [
"**Project Overview:**\n",
"\n",
"This project analyses consumer feedback on DBS Bank's performance by labelling the words in each review as nouns, adjectives, or verbs using part-of-speech (PoS) tagging. The dataset comprises 107 consumer reviews, dated July 2019 to November 2023, extracted from:\n",
"\n",
" i. *Trustpilot* <br>\n",
" ii. *BankQuality* <br>\n",
"```\n",
"Author: Aiman Hakimi (153153)\n",
"```\n"
],
"metadata": {
"id": "LbvPeYfsnHoS"
}
},
{
"cell_type": "markdown",
"source": [
"Suggestion: Best viewed on *Google Colab*."
],
"metadata": {
"id": "vdy0WaNJn_C5"
}
},
{
"cell_type": "markdown",
"source": [
"##**Import Necessary Libraries**"
],
"metadata": {
"id": "csVW2IbEphTu"
}
},
{
"cell_type": "code",
"source": [
"%config Completer.use_jedi=False\n",
"\n",
"import pandas as pd\n",
"import spacy\n",
"\n",
"import re\n",
"\n",
"import warnings\n",
"warnings.filterwarnings(\"ignore\")"
],
"metadata": {
"id": "8yLHzWc2pksy"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"##**Load The Sample Dataset**"
],
"metadata": {
"id": "wPHqHd3iO4F6"
}
},
{
"cell_type": "code",
"source": [
"dbs_pos_rev = pd.read_csv(\"dbs_pos_reviews.csv\", na_values=\"?\")\n",
"dbs_neg_rev = pd.read_csv(\"dbs_neg_reviews.csv\", na_values=\"?\")"
],
"metadata": {
"id": "dfwl_eJ4slmX"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"##**Data Pre-processing**"
],
"metadata": {
"id": "pkSHayO8PlVF"
}
},
{
"cell_type": "code",
"source": [
"# Load spaCy model\n",
"nlp = spacy.load(\"en_core_web_sm\")\n",
"\n",
"def clean_reviews(Review):\n",
"    \"\"\"Lowercase a review, strip punctuation, and collapse repeated whitespace.\"\"\"\n",
"    crev = str(Review).lower()\n",
"    crev = re.sub(r'[^\\w\\s]', '', crev)   # drop punctuation\n",
"    crev = re.sub(r'\\s{2,}', ' ', crev)    # collapse runs of whitespace\n",
"    return crev\n",
"\n",
"dbs_pos_rev[\"Cleaned_Review\"] = dbs_pos_rev[\"Review\"].apply(clean_reviews)\n",
"dbs_neg_rev[\"Cleaned_Review\"] = dbs_neg_rev[\"Review\"].apply(clean_reviews)"
],
"metadata": {
"id": "DKCnaSXfsX6w"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"##**Part-of-Speech Tagging**"
],
"metadata": {
"id": "HdxkSFXKQKqa"
}
},
{
"cell_type": "code",
"source": [
"# Helper function to perform PoS Tagging\n",
"def perform_pos_tagging(Review):\n",
" doc = nlp(Review)\n",
" pos_tags = [(token.text, token.pos_) for token in doc]\n",
" return pos_tags\n",
"\n",
"# Perform POS tagging\n",
"dbs_pos_rev[\"POS_Tags\"] = dbs_pos_rev[\"Cleaned_Review\"].apply(perform_pos_tagging)\n",
"dbs_neg_rev[\"POS_Tags\"] = dbs_neg_rev[\"Cleaned_Review\"].apply(perform_pos_tagging)"
],
"metadata": {
"id": "S4S4KRN7s8UU"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [
"# Print sample tagged reviews\n",
"print(\"Positive Reviews:\")\n",
"print(dbs_pos_rev[[\"Review\", \"Cleaned_Review\", \"POS_Tags\"]].head(10))"
],
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "PVNrsSpTtIP2",
"outputId": "8a44c9c4-89da-4e21-fe61-811290bee343"
},
"execution_count": null,
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"Positive Reviews:\n",
" Review \\\n",
"0 Pretty good. Just ATM can be crowded at times. \n",
"1 A great bank for us Singaporeans!! The smart p... \n",
"2 I've been a customer for four years and have n... \n",
"3 Moved to Singapore from the UK a couple of yea... \n",
"4 wow ...bhumi meri hai \n",
"5 Called recently to organise a new loan, expect... \n",
"6 Internet banking is best in the world \n",
"7 Thank you for yr kind assistance .very service... \n",
"8 Customer service adviser Yanty was very helpfu... \n",
"9 Dear Head of DBS Bank, Westgate Branch,I, Hazl... \n",
"\n",
" Cleaned_Review \\\n",
"0 pretty good just atm can be crowded at times \n",
"1 a great bank for us singaporeans the smart pho... \n",
"2 ive been a customer for four years and have no... \n",
"3 moved to singapore from the uk a couple of yea... \n",
"4 wow bhumi meri hai \n",
"5 called recently to organise a new loan expecte... \n",
"6 internet banking is best in the world \n",
"7 thank you for yr kind assistance very service ... \n",
"8 customer service adviser yanty was very helpfu... \n",
"9 dear head of dbs bank westgate branchi hazlind... \n",
"\n",
" POS_Tags \n",
"0 [(pretty, ADV), (good, ADJ), (just, ADV), (atm... \n",
"1 [(a, DET), (great, ADJ), (bank, NOUN), (for, A... \n",
"2 [(i, PRON), (ve, AUX), (been, AUX), (a, DET), ... \n",
"3 [(moved, VERB), (to, ADP), (singapore, PROPN),... \n",
"4 [(wow, INTJ), (bhumi, PROPN), (meri, PROPN), (... \n",
"5 [(called, VERB), (recently, ADV), (to, PART), ... \n",
"6 [(internet, NOUN), (banking, NOUN), (is, AUX),... \n",
"7 [(thank, VERB), (you, PRON), (for, ADP), (yr, ... \n",
"8 [(customer, NOUN), (service, NOUN), (adviser, ... \n",
"9 [(dear, ADJ), (head, NOUN), (of, ADP), (dbs, A... \n"
]
}
]
},
{
"cell_type": "code",
"source": [
"from collections import Counter\n",
"\n",
"def analyze_top_words_by_pos(pos_tagged_reviews):\n",
"    \"\"\"\n",
"    Return the 10 most frequent words for each tracked POS tag\n",
"    in an iterable (e.g. a pandas Series) of POS-tagged reviews.\n",
"    \"\"\"\n",
"    # POS tags to track\n",
"    pos_tags_to_track = [\"VERB\", \"ADJ\", \"NOUN\"]\n",
"\n",
"    # One Counter per tracked tag\n",
"    tag_counts = {tag: Counter() for tag in pos_tags_to_track}\n",
"\n",
"    # Count every token whose tag is tracked, across all reviews\n",
"    for pos_tags in pos_tagged_reviews:\n",
"        for token, tag in pos_tags:\n",
"            if tag in tag_counts:\n",
"                tag_counts[tag][token] += 1\n",
"\n",
"    # Extract the top 10 words for each tag\n",
"    top_words_by_tag = {}\n",
"    for tag, tag_counter in tag_counts.items():\n",
"        top_words_by_tag[tag] = tag_counter.most_common(10)\n",
"\n",
"    return top_words_by_tag"
],
"metadata": {
"id": "wxISwWTpCV_Y"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"###Positive Reviews"
],
"metadata": {
"id": "Uh_cLIaU0Grv"
}
},
{
"cell_type": "code",
"source": [
"# Analyse positive reviews\n",
"pos_top_words_by_tag = analyze_top_words_by_pos(dbs_pos_rev[\"POS_Tags\"])\n",
"\n",
"# Print the top 10 words for each POS tag\n",
"print(\"\\nTop 10 words by POS tag in Positive Reviews:\")\n",
"for tag, top_words in pos_top_words_by_tag.items():\n",
"    print(f\"\\n{tag}:\")\n",
"    for word, count in top_words:\n",
"        print(f\"\\t{word}: {count}\")"
],
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "TxUEok8XLW_0",
"outputId": "5432d6bb-ecfd-4951-b476-32c5f2a7f19c"
},
"execution_count": null,
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"\n",
"Top 10 words by POS tag in Positive Reviews:\n",
"\n",
"VERB:\n",
"\thave: 7\n",
"\tadvised: 6\n",
"\tthank: 5\n",
"\tmade: 4\n",
"\tdid: 3\n",
"\thad: 3\n",
"\tgo: 3\n",
"\tcalled: 3\n",
"\tassisted: 3\n",
"\tdo: 3\n",
"\n",
"ADJ:\n",
"\tnew: 6\n",
"\tgood: 4\n",
"\tbest: 4\n",
"\tonline: 4\n",
"\tgreat: 3\n",
"\table: 3\n",
"\tmultiplier: 2\n",
"\tfast: 2\n",
"\thappy: 2\n",
"\tlong: 2\n",
"\n",
"NOUN:\n",
"\taccount: 12\n",
"\tbranch: 9\n",
"\tservice: 9\n",
"\tbank: 8\n",
"\tcustomer: 6\n",
"\tyears: 5\n",
"\tcard: 5\n",
"\tcall: 5\n",
"\tdbs: 4\n",
"\tdays: 4\n"
]
}
]
},
{
"cell_type": "markdown",
"source": [
"###Negative Reviews"
],
"metadata": {
"id": "CH-PEAWuR35n"
}
},
{
"cell_type": "code",
"source": [
"# Analyse negative reviews\n",
"neg_top_words_by_tag = analyze_top_words_by_pos(dbs_neg_rev[\"POS_Tags\"])\n",
"\n",
"# Print the top 10 words for each POS tag\n",
"print(\"\\nTop 10 words by POS tag in Negative Reviews:\")\n",
"for tag, top_words in neg_top_words_by_tag.items():\n",
"    print(f\"\\n{tag}:\")\n",
"    for word, count in top_words:\n",
"        print(f\"\\t{word}: {count}\")"
],
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "ZsjnkZlHR7Xf",
"outputId": "e7a25de3-c461-4b91-8dc0-2291ce1cb676"
},
"execution_count": null,
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"\n",
"Top 10 words by POS tag in Negative Reviews:\n",
"\n",
"VERB:\n",
"\thave: 31\n",
"\tget: 15\n",
"\tcalled: 12\n",
"\thad: 11\n",
"\topen: 11\n",
"\tgo: 10\n",
"\tdo: 10\n",
"\tdbs: 9\n",
"\tsaid: 9\n",
"\tis: 8\n",
"\n",
"ADJ:\n",
"\tother: 20\n",
"\tworst: 17\n",
"\tmore: 11\n",
"\tmultiple: 11\n",
"\tmany: 10\n",
"\tsame: 9\n",
"\tbad: 9\n",
"\tterrible: 8\n",
"\tpoor: 8\n",
"\tdbs: 8\n",
"\n",
"NOUN:\n",
"\tbank: 95\n",
"\tcustomer: 50\n",
"\tservice: 49\n",
"\taccount: 41\n",
"\tdbs: 32\n",
"\tcard: 29\n",
"\ttime: 21\n",
"\tmonth: 15\n",
"\tamount: 14\n",
"\tmoney: 13\n"
]
}
]
}
]
}
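
The notebook's cleaning and counting steps can be tried without spaCy or the review CSVs. The following is a minimal sketch under stated assumptions: `clean_review` and `top_words_by_pos` are illustrative stand-ins for the notebook's `clean_reviews` and `analyze_top_words_by_pos`, the `tags` and `n` parameters are additions for the example, and the tagged sample is hand-written toy data, not output from `en_core_web_sm` or from the DBS reviews.

```python
import re
from collections import Counter


def clean_review(text):
    """Lowercase, strip punctuation, and collapse repeated whitespace."""
    text = str(text).lower()
    text = re.sub(r"[^\w\s]", "", text)      # drop punctuation
    return re.sub(r"\s{2,}", " ", text)      # collapse runs of whitespace


def top_words_by_pos(tagged_reviews, tags=("VERB", "ADJ", "NOUN"), n=10):
    """Count tokens per tracked POS tag and return the n most common of each."""
    counts = {tag: Counter() for tag in tags}
    for tagged in tagged_reviews:
        for token, tag in tagged:
            if tag in counts:
                counts[tag][token] += 1
    return {tag: counter.most_common(n) for tag, counter in counts.items()}


# Hypothetical hand-tagged reviews (assumption: not real tagger output)
sample = [
    [("great", "ADJ"), ("bank", "NOUN"), ("service", "NOUN")],
    [("bank", "NOUN"), ("called", "VERB"), ("great", "ADJ")],
    [("bad", "ADJ"), ("bank", "NOUN")],
]

result = top_words_by_pos(sample, n=2)
print(clean_review("Pretty good.  Just ATM!"))
print(result)
```

Because punctuation is deleted rather than replaced with a space, words joined only by punctuation merge (for example, `Branch,I` becomes `branchi` in the notebook's stored output), which is worth keeping in mind when reading the frequency tables.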