@djsegal
Created January 3, 2025 21:25
{
"cells": [
{
"cell_type": "markdown",
"id": "a88f75ee-eb6b-4366-a80b-56f41cd92b55",
"metadata": {},
"source": [
"# Step 1"
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "c9fdc7e6-f430-4c58-a6b6-5d0cd0dc8494",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Requirement already satisfied: selenium in /Users/dan/.pyenv/versions/3.11.6/lib/python3.11/site-packages (4.27.1)\n",
"Requirement already satisfied: urllib3<3,>=1.26 in /Users/dan/.pyenv/versions/3.11.6/lib/python3.11/site-packages (from urllib3[socks]<3,>=1.26->selenium) (1.26.20)\n",
"Requirement already satisfied: trio~=0.17 in /Users/dan/.pyenv/versions/3.11.6/lib/python3.11/site-packages (from selenium) (0.28.0)\n",
"Requirement already satisfied: trio-websocket~=0.9 in /Users/dan/.pyenv/versions/3.11.6/lib/python3.11/site-packages (from selenium) (0.11.1)\n",
"Requirement already satisfied: certifi>=2021.10.8 in /Users/dan/.pyenv/versions/3.11.6/lib/python3.11/site-packages (from selenium) (2023.7.22)\n",
"Requirement already satisfied: typing_extensions~=4.9 in /Users/dan/.pyenv/versions/3.11.6/lib/python3.11/site-packages (from selenium) (4.9.0)\n",
"Requirement already satisfied: websocket-client~=1.8 in /Users/dan/.pyenv/versions/3.11.6/lib/python3.11/site-packages (from selenium) (1.8.0)\n",
"Requirement already satisfied: attrs>=23.2.0 in /Users/dan/.pyenv/versions/3.11.6/lib/python3.11/site-packages (from trio~=0.17->selenium) (24.3.0)\n",
"Requirement already satisfied: sortedcontainers in /Users/dan/.pyenv/versions/3.11.6/lib/python3.11/site-packages (from trio~=0.17->selenium) (2.4.0)\n",
"Requirement already satisfied: idna in /Users/dan/.pyenv/versions/3.11.6/lib/python3.11/site-packages (from trio~=0.17->selenium) (3.4)\n",
"Requirement already satisfied: outcome in /Users/dan/.pyenv/versions/3.11.6/lib/python3.11/site-packages (from trio~=0.17->selenium) (1.3.0.post0)\n",
"Requirement already satisfied: sniffio>=1.3.0 in /Users/dan/.pyenv/versions/3.11.6/lib/python3.11/site-packages (from trio~=0.17->selenium) (1.3.0)\n",
"Requirement already satisfied: wsproto>=0.14 in /Users/dan/.pyenv/versions/3.11.6/lib/python3.11/site-packages (from trio-websocket~=0.9->selenium) (1.2.0)\n",
"Requirement already satisfied: PySocks!=1.5.7,<2.0,>=1.5.6 in /Users/dan/.pyenv/versions/3.11.6/lib/python3.11/site-packages (from urllib3[socks]<3,>=1.26->selenium) (1.7.1)\n",
"Requirement already satisfied: h11<1,>=0.9.0 in /Users/dan/.pyenv/versions/3.11.6/lib/python3.11/site-packages (from wsproto>=0.14->trio-websocket~=0.9->selenium) (0.14.0)\n",
"\n",
"\u001b[1m[\u001b[0m\u001b[34;49mnotice\u001b[0m\u001b[1;39;49m]\u001b[0m\u001b[39;49m A new release of pip is available: \u001b[0m\u001b[31;49m23.3.1\u001b[0m\u001b[39;49m -> \u001b[0m\u001b[32;49m24.3.1\u001b[0m\n",
"\u001b[1m[\u001b[0m\u001b[34;49mnotice\u001b[0m\u001b[1;39;49m]\u001b[0m\u001b[39;49m To update, run: \u001b[0m\u001b[32;49mpip install --upgrade pip\u001b[0m\n",
"\u001b[34m==>\u001b[0m \u001b[1mDownloading https://formulae.brew.sh/api/formula.jws.json\u001b[0m\n",
"##O#- # \n",
"\u001b[34m==>\u001b[0m \u001b[1mDownloading https://formulae.brew.sh/api/cask.jws.json\u001b[0m\n",
"######################################################################### 100.0%\n",
"\u001b[33mWarning:\u001b[0m Not upgrading chromedriver, the latest version is already installed\n"
]
}
],
"source": [
"!pip install selenium\n",
"!brew install chromedriver\n"
]
},
{
"cell_type": "markdown",
"id": "c2946bcc-6205-450f-98f5-8ce52e277d01",
"metadata": {},
"source": [
"# Step 2\n",
"\n",
"i) After you get a warning like\n",
"\n",
"```\n",
"\"chromedriver\" Not Opened\n",
"\n",
"Apple could not verify \"chromedriver\" is free of malware that may harm your Mac or compromise your privacy.\n",
"\n",
"(Done) (Move to Trash)\n",
"```\n",
"\n",
"ii) In macOS Spotlight, search for `Security & Privacy`\n",
"\n",
"iii) Scroll to the bottom and click \"Allow Anyway\" in the boxed option next to the message about \"chromedriver\"\n"
]
},
{
"cell_type": "markdown",
"id": "ea9284e1-5274-472e-85d3-6e1ce4be8a34",
"metadata": {},
"source": [
"# Step 3"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "863066e4-8f93-4910-848b-0548aecc580d",
"metadata": {},
"outputs": [],
"source": [
"from selenium import webdriver\n",
"from selenium.webdriver.common.by import By\n",
"from selenium.webdriver.chrome.service import Service\n",
"from selenium.webdriver.common.action_chains import ActionChains\n",
"from selenium.webdriver.common.keys import Keys\n",
"\n",
"import time\n",
"import random"
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "92f4c0ed-57f0-4777-8609-3209f16d32f3",
"metadata": {},
"outputs": [],
"source": [
"driver = webdriver.Chrome()\n",
"driver.get(\"https://www.rottentomatoes.com/m/wicked_2024/reviews\")\n",
"\n",
"reject_button = driver.find_element(By.ID, \"onetrust-reject-all-handler\")\n",
"reject_button.click()\n"
]
},
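{
"cell_type": "markdown",
"id": "e4bae5e8-8210-40b1-bc40-8638ad9f846c",
"metadata": {},
"source": [
"# Step 4\n",
"\n",
"The loop below keeps clicking the button inside `load-more-container` until the click raises `ElementNotInteractableException`. Wait for that exception before moving on to Step 5: it is the signal that there are no more reviews to load."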
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "626ab7b8-adf0-46ca-9c00-8a81a5695595",
"metadata": {
"scrolled": true
},
"outputs": [
{
"ename": "ElementNotInteractableException",
"evalue": "Message: element not interactable\n (Session info: chrome=131.0.6778.205)\nStacktrace:\n0 chromedriver 0x00000001026eb184 cxxbridge1$str$ptr + 3626716\n1 chromedriver 0x00000001026e39d4 cxxbridge1$str$ptr + 3596076\n2 chromedriver 0x00000001021507d8 cxxbridge1$string$len + 88828\n3 chromedriver 0x0000000102195cd4 cxxbridge1$string$len + 372728\n4 chromedriver 0x000000010218b474 cxxbridge1$string$len + 329624\n5 chromedriver 0x000000010218aec8 cxxbridge1$string$len + 328172\n6 chromedriver 0x00000001021ce5b4 cxxbridge1$string$len + 604376\n7 chromedriver 0x0000000102189568 cxxbridge1$string$len + 321676\n8 chromedriver 0x000000010218a1b8 cxxbridge1$string$len + 324828\n9 chromedriver 0x00000001026b69ac cxxbridge1$str$ptr + 3411716\n10 chromedriver 0x00000001026b9ccc cxxbridge1$str$ptr + 3424804\n11 chromedriver 0x000000010269d86c cxxbridge1$str$ptr + 3308996\n12 chromedriver 0x00000001026ba58c cxxbridge1$str$ptr + 3427044\n13 chromedriver 0x000000010268f09c cxxbridge1$str$ptr + 3249652\n14 chromedriver 0x00000001026d44b8 cxxbridge1$str$ptr + 3533328\n15 chromedriver 0x00000001026d4634 cxxbridge1$str$ptr + 3533708\n16 chromedriver 0x00000001026e3648 cxxbridge1$str$ptr + 3595168\n17 libsystem_pthread.dylib 0x000000019de042e4 _pthread_start + 136\n18 libsystem_pthread.dylib 0x000000019ddff0fc thread_start + 8\n",
"output_type": "error",
"traceback": [
"\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
"\u001b[0;31mElementNotInteractableException\u001b[0m Traceback (most recent call last)",
"Cell \u001b[0;32mIn[4], line 23\u001b[0m\n\u001b[1;32m 21\u001b[0m \u001b[38;5;66;03m# Assert the child is of type 'rt-button' and click it\u001b[39;00m\n\u001b[1;32m 22\u001b[0m \u001b[38;5;28;01massert\u001b[39;00m child_tag \u001b[38;5;241m==\u001b[39m \u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mrt-button\u001b[39m\u001b[38;5;124m\"\u001b[39m, \u001b[38;5;124mf\u001b[39m\u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mError: Expected \u001b[39m\u001b[38;5;124m'\u001b[39m\u001b[38;5;124mrt-button\u001b[39m\u001b[38;5;124m'\u001b[39m\u001b[38;5;124m, but found \u001b[39m\u001b[38;5;124m'\u001b[39m\u001b[38;5;132;01m{\u001b[39;00mchild_tag\u001b[38;5;132;01m}\u001b[39;00m\u001b[38;5;124m'\u001b[39m\u001b[38;5;124m.\u001b[39m\u001b[38;5;124m\"\u001b[39m\n\u001b[0;32m---> 23\u001b[0m \u001b[43mchild\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mclick\u001b[49m\u001b[43m(\u001b[49m\u001b[43m)\u001b[49m \u001b[38;5;66;03m# Click the 'rt-button'\u001b[39;00m\n\u001b[1;32m 25\u001b[0m \u001b[38;5;66;03m# Sleep for a random interval between 1.5 and 5 seconds\u001b[39;00m\n\u001b[1;32m 26\u001b[0m time\u001b[38;5;241m.\u001b[39msleep(random\u001b[38;5;241m.\u001b[39muniform(\u001b[38;5;241m1.5\u001b[39m, \u001b[38;5;241m5\u001b[39m))\n",
"File \u001b[0;32m~/.pyenv/versions/3.11.6/lib/python3.11/site-packages/selenium/webdriver/remote/webelement.py:94\u001b[0m, in \u001b[0;36mWebElement.click\u001b[0;34m(self)\u001b[0m\n\u001b[1;32m 92\u001b[0m \u001b[38;5;28;01mdef\u001b[39;00m \u001b[38;5;21mclick\u001b[39m(\u001b[38;5;28mself\u001b[39m) \u001b[38;5;241m-\u001b[39m\u001b[38;5;241m>\u001b[39m \u001b[38;5;28;01mNone\u001b[39;00m:\n\u001b[1;32m 93\u001b[0m \u001b[38;5;250m \u001b[39m\u001b[38;5;124;03m\"\"\"Clicks the element.\"\"\"\u001b[39;00m\n\u001b[0;32m---> 94\u001b[0m \u001b[38;5;28;43mself\u001b[39;49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43m_execute\u001b[49m\u001b[43m(\u001b[49m\u001b[43mCommand\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mCLICK_ELEMENT\u001b[49m\u001b[43m)\u001b[49m\n",
"File \u001b[0;32m~/.pyenv/versions/3.11.6/lib/python3.11/site-packages/selenium/webdriver/remote/webelement.py:395\u001b[0m, in \u001b[0;36mWebElement._execute\u001b[0;34m(self, command, params)\u001b[0m\n\u001b[1;32m 393\u001b[0m params \u001b[38;5;241m=\u001b[39m {}\n\u001b[1;32m 394\u001b[0m params[\u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mid\u001b[39m\u001b[38;5;124m\"\u001b[39m] \u001b[38;5;241m=\u001b[39m \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39m_id\n\u001b[0;32m--> 395\u001b[0m \u001b[38;5;28;01mreturn\u001b[39;00m \u001b[38;5;28;43mself\u001b[39;49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43m_parent\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mexecute\u001b[49m\u001b[43m(\u001b[49m\u001b[43mcommand\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mparams\u001b[49m\u001b[43m)\u001b[49m\n",
"File \u001b[0;32m~/.pyenv/versions/3.11.6/lib/python3.11/site-packages/selenium/webdriver/remote/webdriver.py:384\u001b[0m, in \u001b[0;36mWebDriver.execute\u001b[0;34m(self, driver_command, params)\u001b[0m\n\u001b[1;32m 382\u001b[0m response \u001b[38;5;241m=\u001b[39m \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39mcommand_executor\u001b[38;5;241m.\u001b[39mexecute(driver_command, params)\n\u001b[1;32m 383\u001b[0m \u001b[38;5;28;01mif\u001b[39;00m response:\n\u001b[0;32m--> 384\u001b[0m \u001b[38;5;28;43mself\u001b[39;49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43merror_handler\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mcheck_response\u001b[49m\u001b[43m(\u001b[49m\u001b[43mresponse\u001b[49m\u001b[43m)\u001b[49m\n\u001b[1;32m 385\u001b[0m response[\u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mvalue\u001b[39m\u001b[38;5;124m\"\u001b[39m] \u001b[38;5;241m=\u001b[39m \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39m_unwrap_value(response\u001b[38;5;241m.\u001b[39mget(\u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mvalue\u001b[39m\u001b[38;5;124m\"\u001b[39m, \u001b[38;5;28;01mNone\u001b[39;00m))\n\u001b[1;32m 386\u001b[0m \u001b[38;5;28;01mreturn\u001b[39;00m response\n",
"File \u001b[0;32m~/.pyenv/versions/3.11.6/lib/python3.11/site-packages/selenium/webdriver/remote/errorhandler.py:232\u001b[0m, in \u001b[0;36mErrorHandler.check_response\u001b[0;34m(self, response)\u001b[0m\n\u001b[1;32m 230\u001b[0m alert_text \u001b[38;5;241m=\u001b[39m value[\u001b[38;5;124m\"\u001b[39m\u001b[38;5;124malert\u001b[39m\u001b[38;5;124m\"\u001b[39m]\u001b[38;5;241m.\u001b[39mget(\u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mtext\u001b[39m\u001b[38;5;124m\"\u001b[39m)\n\u001b[1;32m 231\u001b[0m \u001b[38;5;28;01mraise\u001b[39;00m exception_class(message, screen, stacktrace, alert_text) \u001b[38;5;66;03m# type: ignore[call-arg] # mypy is not smart enough here\u001b[39;00m\n\u001b[0;32m--> 232\u001b[0m \u001b[38;5;28;01mraise\u001b[39;00m exception_class(message, screen, stacktrace)\n",
"\u001b[0;31mElementNotInteractableException\u001b[0m: Message: element not interactable\n (Session info: chrome=131.0.6778.205)\nStacktrace:\n0 chromedriver 0x00000001026eb184 cxxbridge1$str$ptr + 3626716\n1 chromedriver 0x00000001026e39d4 cxxbridge1$str$ptr + 3596076\n2 chromedriver 0x00000001021507d8 cxxbridge1$string$len + 88828\n3 chromedriver 0x0000000102195cd4 cxxbridge1$string$len + 372728\n4 chromedriver 0x000000010218b474 cxxbridge1$string$len + 329624\n5 chromedriver 0x000000010218aec8 cxxbridge1$string$len + 328172\n6 chromedriver 0x00000001021ce5b4 cxxbridge1$string$len + 604376\n7 chromedriver 0x0000000102189568 cxxbridge1$string$len + 321676\n8 chromedriver 0x000000010218a1b8 cxxbridge1$string$len + 324828\n9 chromedriver 0x00000001026b69ac cxxbridge1$str$ptr + 3411716\n10 chromedriver 0x00000001026b9ccc cxxbridge1$str$ptr + 3424804\n11 chromedriver 0x000000010269d86c cxxbridge1$str$ptr + 3308996\n12 chromedriver 0x00000001026ba58c cxxbridge1$str$ptr + 3427044\n13 chromedriver 0x000000010268f09c cxxbridge1$str$ptr + 3249652\n14 chromedriver 0x00000001026d44b8 cxxbridge1$str$ptr + 3533328\n15 chromedriver 0x00000001026d4634 cxxbridge1$str$ptr + 3533708\n16 chromedriver 0x00000001026e3648 cxxbridge1$str$ptr + 3595168\n17 libsystem_pthread.dylib 0x000000019de042e4 _pthread_start + 136\n18 libsystem_pthread.dylib 0x000000019ddff0fc thread_start + 8\n"
]
}
],
"source": [
"for i in range(100):  # Load up to 100 more pages of reviews\n",
"    # Scroll to the bottom of the page\n",
"    driver.execute_script(\"window.scrollTo(0, document.body.scrollHeight);\")\n",
"\n",
"    # Locate the 'load-more-container'\n",
"    load_more_container = driver.find_elements(By.CLASS_NAME, \"load-more-container\")\n",
"\n",
"    # Assert there's exactly one 'load-more-container'\n",
"    assert len(load_more_container) == 1, f\"Error: Expected 1 'load-more-container', but found {len(load_more_container)}.\"\n",
"\n",
"    # Get the child elements of the 'load-more-container'\n",
"    children = load_more_container[0].find_elements(By.XPATH, \"./*\")\n",
"\n",
"    # Assert there's exactly one child\n",
"    assert len(children) == 1, f\"Error: Expected 1 child, but found {len(children)}.\"\n",
"\n",
"    # Get the child element and check its tag name\n",
"    child = children[0]\n",
"    child_tag = child.tag_name.lower()  # Convert to lowercase for consistency\n",
"\n",
"    # Assert the child is of type 'rt-button' and click it\n",
"    assert child_tag == \"rt-button\", f\"Error: Expected 'rt-button', but found '{child_tag}'.\"\n",
"    child.click()  # Click the 'rt-button'\n",
"\n",
"    # Sleep for a random interval between 1.5 and 5 seconds\n",
"    time.sleep(random.uniform(1.5, 5))\n",
"\n"
]
},
{
"cell_type": "markdown",
"id": "2c77a399-2faa-4b6c-9665-b1610ae2da38",
"metadata": {},
"source": [
"# Step 5"
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "1f31c3ad-aed2-4af0-adf6-c4f6c2db7429",
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"\n",
"# Extract all review rows\n",
"review_rows = driver.find_elements(By.CLASS_NAME, \"review-row\")\n",
"\n",
"# List to store review data\n",
"reviews_data = []\n",
"\n",
"# Loop through each review-row and extract data\n",
"for review_row in review_rows:\n",
"    try:\n",
"        # Extract critic name\n",
"        critic_name = review_row.find_element(By.CSS_SELECTOR, \".display-name\").text\n",
"\n",
"        # Extract publication name\n",
"        publication_name = review_row.find_element(By.CSS_SELECTOR, \".publication\").text\n",
"\n",
"        # Extract sentiment\n",
"        sentiment = review_row.find_element(By.CSS_SELECTOR, \"score-icon-critics\").get_attribute(\"sentiment\")\n",
"\n",
"        # Extract review text\n",
"        review_text = review_row.find_element(By.CSS_SELECTOR, \".review-text\").text\n",
"\n",
"        # Extract review date\n",
"        review_date = review_row.find_element(By.CSS_SELECTOR, \"[data-qa='review-date']\").text\n",
"\n",
"        # Extract full review link\n",
"        full_review_link = review_row.find_element(By.CSS_SELECTOR, \"a.full-url\").get_attribute(\"href\")\n",
"\n",
"        # Append data to the list\n",
"        reviews_data.append({\n",
"            \"Critic Name\": critic_name,\n",
"            \"Publication\": publication_name,\n",
"            \"Sentiment\": sentiment,\n",
"            \"Review Text\": review_text,\n",
"            \"Review Date\": review_date,\n",
"            \"Full Review Link\": full_review_link\n",
"        })\n",
"    except Exception as e:\n",
"        print(f\"Error extracting data for a review row: {e}\")\n"
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "91751ba8-74df-4cb5-90b5-cfb73a6718ad",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Critic Name</th>\n",
" <th>Publication</th>\n",
" <th>Sentiment</th>\n",
" <th>Review Text</th>\n",
" <th>Review Date</th>\n",
" <th>Full Review Link</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>Michael Ward</td>\n",
" <td>Should I See It</td>\n",
" <td>POSITIVE</td>\n",
" <td>Bottom line: If you are at all a fan of Wicked...</td>\n",
" <td>Jan 3, 2025</td>\n",
" <td>https://www.shouldiseeit.net/reviews/2024/wick...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>John Serba</td>\n",
" <td>Decider</td>\n",
" <td>NEGATIVE</td>\n",
" <td>I often struggled to right the sails of my fla...</td>\n",
" <td>Jan 2, 2025</td>\n",
" <td>https://decider.com/2024/12/31/wicked-streamin...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>Matt Hudson</td>\n",
" <td>Bloody Awesome Movie Podcast</td>\n",
" <td>POSITIVE</td>\n",
" <td>Both Cynthia Erivo and Ariana Grande shine in ...</td>\n",
" <td>Jan 2, 2025</td>\n",
" <td>https://open.spotify.com/episode/119OM2z8oqs0w...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>Matthew Huff</td>\n",
" <td>Parade Magazine</td>\n",
" <td>POSITIVE</td>\n",
" <td>Following in a long line of legendary adaptati...</td>\n",
" <td>Jan 2, 2025</td>\n",
" <td>https://parade.com/movies/wicked-part-one-review</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>Brooke Obie</td>\n",
" <td>Black Girl Watching</td>\n",
" <td>POSITIVE</td>\n",
" <td>[Elphaba] is perhaps the most radical...charac...</td>\n",
" <td>Jan 1, 2025</td>\n",
" <td>https://open.substack.com/pub/blackgirlwatchin...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>...</th>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>368</th>\n",
" <td>Abe Friedtanzer</td>\n",
" <td>Awards Buzz</td>\n",
" <td>POSITIVE</td>\n",
" <td>From its opening moments, this is a movie mean...</td>\n",
" <td>Nov 19, 2024</td>\n",
" <td>https://awardsbuzz.com/review-wicked-is-a-cine...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>369</th>\n",
" <td>Dan Rubins</td>\n",
" <td>Slant Magazine</td>\n",
" <td>POSITIVE</td>\n",
" <td>Wicked’s frequent patches of sluggishness are ...</td>\n",
" <td>Nov 19, 2024</td>\n",
" <td>https://www.slantmagazine.com/film/wicked-revi...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>370</th>\n",
" <td>Richard Roeper</td>\n",
" <td>Chicago Sun-Times</td>\n",
" <td>POSITIVE</td>\n",
" <td>Still, Erivo and Grande have chemistry in abun...</td>\n",
" <td>Nov 19, 2024</td>\n",
" <td>https://chicago.suntimes.com/movies-and-tv/202...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>371</th>\n",
" <td>Joseph Robinson</td>\n",
" <td>Fish Jelly Films (YouTube)</td>\n",
" <td>POSITIVE</td>\n",
" <td>Wicked delivers on all fronts and should satis...</td>\n",
" <td>Nov 19, 2024</td>\n",
" <td>https://youtu.be/4E_gagd_jRk</td>\n",
" </tr>\n",
" <tr>\n",
" <th>372</th>\n",
" <td>Neil Pond</td>\n",
" <td>Neil's Entertainment Picks</td>\n",
" <td>POSITIVE</td>\n",
" <td>A visually stunning, fantabulously festooned s...</td>\n",
" <td>Nov 18, 2024</td>\n",
" <td>http://neilsentertainmentpicks.com/2024/11/19/...</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"<p>373 rows × 6 columns</p>\n",
"</div>"
],
"text/plain": [
" Critic Name Publication Sentiment \\\n",
"0 Michael Ward Should I See It POSITIVE \n",
"1 John Serba Decider NEGATIVE \n",
"2 Matt Hudson Bloody Awesome Movie Podcast POSITIVE \n",
"3 Matthew Huff Parade Magazine POSITIVE \n",
"4 Brooke Obie Black Girl Watching POSITIVE \n",
".. ... ... ... \n",
"368 Abe Friedtanzer Awards Buzz POSITIVE \n",
"369 Dan Rubins Slant Magazine POSITIVE \n",
"370 Richard Roeper Chicago Sun-Times POSITIVE \n",
"371 Joseph Robinson Fish Jelly Films (YouTube) POSITIVE \n",
"372 Neil Pond Neil's Entertainment Picks POSITIVE \n",
"\n",
" Review Text Review Date \\\n",
"0 Bottom line: If you are at all a fan of Wicked... Jan 3, 2025 \n",
"1 I often struggled to right the sails of my fla... Jan 2, 2025 \n",
"2 Both Cynthia Erivo and Ariana Grande shine in ... Jan 2, 2025 \n",
"3 Following in a long line of legendary adaptati... Jan 2, 2025 \n",
"4 [Elphaba] is perhaps the most radical...charac... Jan 1, 2025 \n",
".. ... ... \n",
"368 From its opening moments, this is a movie mean... Nov 19, 2024 \n",
"369 Wicked’s frequent patches of sluggishness are ... Nov 19, 2024 \n",
"370 Still, Erivo and Grande have chemistry in abun... Nov 19, 2024 \n",
"371 Wicked delivers on all fronts and should satis... Nov 19, 2024 \n",
"372 A visually stunning, fantabulously festooned s... Nov 18, 2024 \n",
"\n",
" Full Review Link \n",
"0 https://www.shouldiseeit.net/reviews/2024/wick... \n",
"1 https://decider.com/2024/12/31/wicked-streamin... \n",
"2 https://open.spotify.com/episode/119OM2z8oqs0w... \n",
"3 https://parade.com/movies/wicked-part-one-review \n",
"4 https://open.substack.com/pub/blackgirlwatchin... \n",
".. ... \n",
"368 https://awardsbuzz.com/review-wicked-is-a-cine... \n",
"369 https://www.slantmagazine.com/film/wicked-revi... \n",
"370 https://chicago.suntimes.com/movies-and-tv/202... \n",
"371 https://youtu.be/4E_gagd_jRk \n",
"372 http://neilsentertainmentpicks.com/2024/11/19/... \n",
"\n",
"[373 rows x 6 columns]"
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Convert to a Pandas DataFrame\n",
"reviews_df = pd.DataFrame(reviews_data)\n",
"\n",
"# Show the dataframe\n",
"reviews_df\n"
]
},
{
"cell_type": "code",
"execution_count": 7,
"id": "dbed375b-3c12-4a74-ac46-ac78f3f1b255",
"metadata": {},
"outputs": [],
"source": [
"# # Optionally, save to a CSV file\n",
"# reviews_df.to_csv(\"reviews.csv\", index=False)\n"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.6"
}
},
"nbformat": 4,
"nbformat_minor": 5
}
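
Step 4 of the notebook stops when the click on the `rt-button` raises `ElementNotInteractableException`, which leaves a traceback in the cell output. A possible alternative, sketched here under assumptions, is to catch that exception and treat it as the normal end of the loop. The names `click_until_exhausted`, `find_button`, `stop_on`, `pause`, and `FakeButton` are hypothetical and do not appear in the notebook. `FakeButton` is only a stand-in so the sketch runs without a browser.

```python
import time
from typing import Callable, Tuple, Type


def click_until_exhausted(
    find_button: Callable[[], object],
    stop_on: Tuple[Type[Exception], ...],
    max_clicks: int = 100,
    pause: Callable[[], float] = lambda: 0.0,
) -> int:
    """Click the object returned by find_button() until a stop_on exception occurs.

    Returns the number of clicks that succeeded.
    """
    clicks = 0
    for _ in range(max_clicks):
        try:
            # Re-locate the button on every pass, as the notebook loop does
            find_button().click()
        except stop_on:
            # The button can no longer be clicked: treat this as "done"
            break
        clicks += 1
        time.sleep(pause())
    return clicks


class FakeButton:
    """Hypothetical stand-in for the page's button (not a Selenium class)."""

    def __init__(self, pages_left: int) -> None:
        self.pages_left = pages_left

    def click(self) -> None:
        if self.pages_left == 0:
            raise RuntimeError("element not interactable")
        self.pages_left -= 1


button = FakeButton(pages_left=3)
clicks = click_until_exhausted(lambda: button, stop_on=(RuntimeError,))
print(clicks)
```

With a real driver, one would presumably pass `stop_on=(ElementNotInteractableException,)` (importable from `selenium.common.exceptions`), a `find_button` that performs the same `load-more-container` lookup as Step 4, and `pause=lambda: random.uniform(1.5, 5)` to keep the notebook's randomized delay.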