Last active
August 18, 2025 10:21
Colab Notebook highlighting how to visualize geospatial data from BigQuery in Colab with Matplotlib
| { | |
| "nbformat": 4, | |
| "nbformat_minor": 0, | |
| "metadata": { | |
| "colab": { | |
| "private_outputs": true, | |
| "provenance": [] | |
| }, | |
| "kernelspec": { | |
| "name": "python3", | |
| "display_name": "Python 3" | |
| }, | |
| "language_info": { | |
| "name": "python" | |
| } | |
| }, | |
| "cells": [ | |
| { | |
| "cell_type": "code", | |
| "source": [ | |
| "#@title Copyright 2025 Google LLC\n", | |
| "\n", | |
| "# Copyright 2025 Google LLC\n", | |
| "#\n", | |
| "# Licensed under the Apache License, Version 2.0 (the \"License\");\n", | |
| "# you may not use this file except in compliance with the License.\n", | |
| "# You may obtain a copy of the License at\n", | |
| "#\n", | |
| "# https://www.apache.org/licenses/LICENSE-2.0\n", | |
| "#\n", | |
| "# Unless required by applicable law or agreed to in writing, software\n", | |
| "# distributed under the License is distributed on an \"AS IS\" BASIS,\n", | |
| "# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n", | |
| "# See the License for the specific language governing permissions and\n", | |
| "# limitations under the License." | |
| ], | |
| "metadata": { | |
| "cellView": "form", | |
| "id": "OXA2rpXfoyFo" | |
| }, | |
| "execution_count": null, | |
| "outputs": [] | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "source": [ | |
| "# Visualizing BigQuery Earth Engine Results in a Python Notebook\n", | |
| "\n", | |
| "<a href=\"https://colab.research.google.com/gist/KMarkert/691c28596402340671dbbe518d716585/ee_in_bq_blog_viz.ipynb\">\n", | |
| " <img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open in Colab\">\n", | |
| "</a>\n", | |
| "\n", | |
| "This notebook provides a complete workflow for visualizing the results of the extreme precipitation analysis performed in BigQuery. While the BigQuery console offers powerful built-in map visualization, this approach is ideal when you need to:\n", | |
| "\n", | |
| "* Create highly customized, publication-quality plots.\n", | |
| "* Integrate the results into a larger Python-based data science workflow.\n", | |
| "* Share a self-contained, reproducible analysis.\n", | |
| "\n", | |
| "\n", | |
| "We will walk through three main steps:\n", | |
| "1. **Querying Data:** Execute the same analysis query from the blog post directly from Python and load the results.\n", | |
| "2. **Geospatial Transformation:** Convert the query results into a GeoPandas GeoDataFrame.\n", | |
| "3. **Custom Visualization:** Create a three-panel comparison plot using Matplotlib to visualize the changes between the two climate epochs." | |
| ], | |
| "metadata": { | |
| "id": "x9Wpp6-_ltAV" | |
| } | |
| }, | |
| { | |
| "cell_type": "code", | |
| "metadata": { | |
| "id": "7228e77b" | |
| }, | |
| "source": [ | |
| "from google.colab import auth\n", | |
| "\n", | |
| "auth.authenticate_user()" | |
| ], | |
| "execution_count": null, | |
| "outputs": [] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "source": [ | |
| "from google.cloud import bigquery\n", | |
| "import geopandas as gpd\n", | |
| "import pandas as pd\n", | |
| "from shapely.wkt import loads\n", | |
| "\n", | |
| "import matplotlib.pyplot as plt\n", | |
| "import matplotlib.colorbar as cbar" | |
| ], | |
| "metadata": { | |
| "id": "cuIHSxl1kYFy" | |
| }, | |
| "execution_count": null, | |
| "outputs": [] | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": { | |
| "id": "3f0957b2" | |
| }, | |
| "source": [ | |
| "## Step 1: Querying the Analysis Results from BigQuery\n", | |
| "\n", | |
| "First, we define the SQL query to be executed. This is the same query used in the blog post, which performs the return period calculation." | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "source": [ | |
| "query = \"\"\"\n", | |
| "-- This query calculates the difference in precipitation (in mm) for various return periods\n", | |
| "-- between two distinct 30-year epochs.\n", | |
| "-- Replace YOUR_PROJECT.YOUR_DATASET.de_yearly_max_precip with your table.\n", | |
| "\n", | |
| "\n", | |
| "CREATE TEMP FUNCTION\n", | |
| " CalculateReturnPeriod(rp INT64, xbar FLOAT64, sigma FLOAT64)\n", | |
| " RETURNS FLOAT64 AS ( ROUND(-LOG(-LOG(1 - (1 / rp))) * sigma * .7797 + xbar - (.45 * sigma), 4) );\n", | |
| "\n", | |
| "\n", | |
| "WITH\n", | |
| " yearly_maxes AS (\n", | |
| "FROM YOUR_PROJECT.YOUR_DATASET.de_yearly_max_precip\n", | |
| " ),\n", | |
| " epochs AS (\n", | |
| " FROM UNNEST([\n", | |
| " STRUCT('epoch1' AS epoch, 1980 AS start_year, 2010 AS end_year),\n", | |
| " STRUCT('epoch2' AS epoch, 1994 AS start_year, 2024 AS end_year)])\n", | |
| " ),\n", | |
| " return_periods AS (\n", | |
| " FROM UNNEST([10,25,50,100,500]) AS years\n", | |
| " |> EXTEND null AS dummy\n", | |
| " |> DROP dummy\n", | |
| " ),\n", | |
| " geometries AS (\n", | |
| " FROM bigquery-public-data.overture_maps.division_area\n", | |
| " |> WHERE country = 'DE' AND subtype = 'county'\n", | |
| " |> SELECT\n", | |
| " id AS county_id,\n", | |
| " names.primary AS county,\n", | |
| " ST_SIMPLIFY(geometry,500) AS geometry\n", | |
| " )\n", | |
| "\n", | |
| "\n", | |
| "FROM epochs\n", | |
| "|> JOIN yearly_maxes ON yearly_maxes.year BETWEEN epochs.start_year AND epochs.end_year\n", | |
| "|> AGGREGATE\n", | |
| " AVG(max_1day_precipitation * 1e3) AS avg_yearly_max_daily_precip,\n", | |
| " STDDEV(max_1day_precipitation * 1e3) AS stddev_yearly_max_daily_precip,\n", | |
| " GROUP BY county_id, epoch\n", | |
| "|> CROSS JOIN return_periods rp\n", | |
| "|> EXTEND\n", | |
| " CalculateReturnPeriod(rp.years, avg_yearly_max_daily_precip, stddev_yearly_max_daily_precip) AS precip_value_rp\n", | |
| "|> DROP avg_yearly_max_daily_precip, stddev_yearly_max_daily_precip\n", | |
| "|> PIVOT (ANY_VALUE(precip_value_rp) AS precip_value_rp FOR epoch IN ('epoch1', 'epoch2'))\n", | |
| "|> PIVOT (ANY_VALUE(precip_value_rp_epoch1) AS epoch1, ANY_VALUE(precip_value_rp_epoch2) AS epoch2 FOR years IN (10, 25, 50, 100, 500))\n", | |
| "|> AS place\n", | |
| "|> EXTEND\n", | |
| " (SELECT geometry FROM geometries WHERE geometries.county_id = place.county_id) AS geometry,\n", | |
| " epoch2_10 - epoch1_10 AS difference_rp_10,\n", | |
| " epoch2_25 - epoch1_25 AS difference_rp_25,\n", | |
| " epoch2_50 - epoch1_50 AS difference_rp_50,\n", | |
| " epoch2_100 - epoch1_100 AS difference_rp_100,\n", | |
| " epoch2_500 - epoch1_500 AS difference_rp_500\n", | |
| "|> ORDER BY county_id\n", | |
| "\"\"\"" | |
| ], | |
| "metadata": { | |
| "id": "zSxMw0hUTHgM" | |
| }, | |
| "execution_count": null, | |
| "outputs": [] | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "source": [ | |
| "Now, we use the `google-cloud-bigquery` Python client to execute the query.\n", | |
| "We initialize the client, specifying our Google Cloud project.\n", | |
| "We pass our SQL query to the `client.query()` method, which sends the job to BigQuery.\n", | |
| "\n", | |
| "Finally, we call `.to_dataframe()` on the completed job to download the results into a pandas DataFrame, which is a standard format for data analysis in Python." | |
| ], | |
| "metadata": { | |
| "id": "UZa3xq1VmNHo" | |
| } | |
| }, | |
| { | |
| "cell_type": "code", | |
| "metadata": { | |
| "id": "d86db91e" | |
| }, | |
| "source": [ | |
| "# Construct a BigQuery client object.\n", | |
| "# Replace 'YOUR-PROJECT' with your actual Google Cloud project ID\n", | |
| "client = bigquery.Client(project='YOUR-PROJECT')\n", | |
| "\n", | |
| "query_job = client.query(query) # Make an API request.\n", | |
| "\n", | |
| "# Convert the query results to a pandas DataFrame.\n", | |
| "df = query_job.to_dataframe()\n", | |
| "\n", | |
| "display(df.head())" | |
| ], | |
| "execution_count": null, | |
| "outputs": [] | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": { | |
| "id": "c4dc04d7" | |
| }, | |
| "source": [ | |
| "## Step 2: Converting Data to a GeoDataFrame\n", | |
| "\n", | |
| "The query results are now in a pandas DataFrame, but the geometry column is represented as a WKT (Well-Known Text) string. To perform geospatial operations and plotting, we need to convert this into a GeoDataFrame, the core data structure in GeoPandas.\n", | |
| "\n", | |
| "This involves three key steps:\n", | |
| "1. **Parse Geometries:** We use the `loads` function from the `shapely` library to parse the WKT string in each row into a shapely geometry object.\n", | |
| "2. **Create GeoDataFrame:** We initialize a `gpd.GeoDataFrame`, passing our pandas DataFrame and specifying which column contains the geometry.\n", | |
| "3. **Set CRS/Reproject:** We explicitly set the Coordinate Reference System (CRS) to EPSG:4326 (WGS 84), which BigQuery uses for all geography values. Because this geographic CRS distorts distances and areas, we then reproject the geometries to the local UTM zone for Germany (EPSG:32632)." | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "metadata": { | |
| "id": "8b1a8ea7" | |
| }, | |
| "source": [ | |
| "# Convert the 'geometry' column from string to geometry objects\n", | |
| "df['geometry'] = df['geometry'].apply(loads)\n", | |
| "\n", | |
| "# Convert the pandas DataFrame to a GeoDataFrame\n", | |
| "gdf = gpd.GeoDataFrame(df, geometry='geometry').set_crs(epsg=4326)\n", | |
| "\n", | |
| "# Reproject the geography data to UTM Zone 32N for Germany\n", | |
| "gdf = gdf.to_crs(epsg=32632)\n" | |
| ], | |
| "execution_count": null, | |
| "outputs": [] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "metadata": { | |
| "id": "b0983914" | |
| }, | |
| "source": [ | |
| "# Create a new GeoDataFrame with the specified columns\n", | |
| "gdf_subset = gdf[['geometry', 'epoch1_100', 'epoch2_100', 'difference_rp_100']]\n", | |
| "\n", | |
| "# Display the first few rows of the new gdf_subset GeoDataFrame\n", | |
| "display(gdf_subset.head())" | |
| ], | |
| "execution_count": null, | |
| "outputs": [] | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": { | |
| "id": "eb7e1f84" | |
| }, | |
| "source": [ | |
| "## Step 3: Creating a Multi-Panel Visualization\n", | |
| "With our data properly formatted in a GeoDataFrame, we can now create a custom plot using Matplotlib. Our goal is to create a three-panel figure that clearly shows:\n", | |
| "1. The 100-year precipitation values for the first epoch (1980-2010).\n", | |
| "2. The 100-year precipitation values for the second, more recent epoch (1994-2024).\n", | |
| "3. The absolute difference between the two epochs, highlighting areas of significant change.\n" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "metadata": { | |
| "id": "141ead2a" | |
| }, | |
| "source": [ | |
| "# Create a figure and a set of subplots (1 row, 3 columns)\n", | |
| "fig, axs = plt.subplots(1, 3, figsize=(16, 8))\n", | |
| "\n", | |
| "# Plot each column on its respective subplot\n", | |
| "gdf_subset.plot(column='epoch1_100', ax=axs[0], legend=False, vmin=40, vmax=100)\n", | |
| "gdf_subset.plot(column='epoch2_100', ax=axs[1], legend=False, vmin=40, vmax=100)\n", | |
| "gdf_subset.plot(column='difference_rp_100', ax=axs[2], legend=False, cmap='RdYlBu', vmin=-25, vmax=25)\n", | |
| "\n", | |
| "\n", | |
| "# Set titles for each subplot\n", | |
| "axs[0].set_title('Epoch 1 (1980-2010)')\n", | |
| "axs[1].set_title('Epoch 2 (1994-2024)')\n", | |
| "axs[2].set_title('Difference (Epoch 2 - Epoch 1)')\n", | |
| "\n", | |
| "fig.text(0.5,0.99, '100-year Return Period Precipitation', ha='center', rotation='horizontal',\n", | |
| " fontproperties={'size':24,'weight':'bold'})\n", | |
| "\n", | |
| "# Create a shared colorbar for the first two plots\n", | |
| "cbar_ax_epochs = fig.add_axes([0.15, 0.05, 0.3, 0.02])  # [left, bottom, width, height]\n", | |
| "sm_epochs = plt.cm.ScalarMappable(cmap='viridis', norm=plt.Normalize(vmin=40, vmax=100))\n", | |
| "fig.colorbar(sm_epochs, cax=cbar_ax_epochs, orientation='horizontal',\n", | |
| "             label='Precipitation [mm]')\n", | |
| "\n", | |
| "# Create a separate colorbar for the difference plot\n", | |
| "cbar_ax_diff = fig.add_axes([0.73, 0.05, 0.2, 0.02])  # [left, bottom, width, height]\n", | |
| "sm_diff = plt.cm.ScalarMappable(cmap='RdYlBu', norm=plt.Normalize(vmin=-25, vmax=25))\n", | |
| "fig.colorbar(sm_diff, cax=cbar_ax_diff, orientation='horizontal',\n", | |
| "             label='Precipitation difference [mm]')\n", | |
| "\n", | |
| "# Remove axis\n", | |
| "for ax in axs:\n", | |
| " ax.set_axis_off()\n", | |
| "\n", | |
| "# Adjust layout\n", | |
| "plt.tight_layout()\n", | |
| "\n", | |
| "# Display the plot\n", | |
| "plt.show()" | |
| ], | |
| "execution_count": null, | |
| "outputs": [] | |
| } | |
| ] | |
| } |
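The `CalculateReturnPeriod` temp function in the query above is a Gumbel (extreme value type I) return-level estimate fitted by the method of moments: the constants 0.7797 ≈ √6/π and 0.45 ≈ γ·√6/π are the moment estimators for the Gumbel scale and location. As a quick sanity check outside BigQuery, the same formula can be sketched in plain Python (the example inputs below are made up, not taken from the analysis):

```python
import math

def calculate_return_period(rp: int, xbar: float, sigma: float) -> float:
    """Mirror of the SQL temp function CalculateReturnPeriod: the rp-year
    return level of a Gumbel distribution whose parameters are estimated
    from the mean (xbar) and standard deviation (sigma) of yearly maxima."""
    # Gumbel reduced variate for an annual exceedance probability of 1/rp
    y = -math.log(-math.log(1 - 1 / rp))
    return round(y * sigma * 0.7797 + xbar - 0.45 * sigma, 4)

# Hypothetical county: mean yearly 1-day max of 50 mm, std dev of 10 mm
print(calculate_return_period(100, 50.0, 10.0))  # ≈ 81.37 mm
```

Longer return periods map to larger reduced variates, so the estimate grows monotonically with `rp`, matching the behavior of the `return_periods` pivot in the query.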