Skip to content

Instantly share code, notes, and snippets.

@papaemman
Last active January 4, 2023 14:11
Show Gist options
  • Select an option

  • Save papaemman/0b29922d3d8835e7dd26f4c04d5bbdb4 to your computer and use it in GitHub Desktop.

Select an option

Save papaemman/0b29922d3d8835e7dd26f4c04d5bbdb4 to your computer and use it in GitHub Desktop.
Authenticate to Google Cloud Platform and interact with services using python

Google Cloud Platform authentication and interaction, using Python

0. Install the gcloud CLI

Guides:

Steps

  1. Download the file

  2. Run script (this will update the ~/.zshrc file)

> cd Downloads
> ./google-cloud-sdk/install.sh
  1. Initialise the cloud CLI (select default project and login)
./google-cloud-sdk/bin/gcloud init
  1. Delete the downloaded folder

1. Authenticate using terminal

  • Run gcloud auth login in terminal (or jupyter notebook)
  • Connect to your GCP account
  • Setup the correct GCP project gcloud config set project prd-data-analytics

2. Open python and interact with services

# Install package
python -m pip install google-cloud-bigquery
python -m pip install 'google-cloud-bigquery[pandas]'
# Test connection with GCP
def implicit():
    from google.cloud import storage

    # If you don't specify credentials when constructing the client, the
    # client library will look for credentials in the environment.
    storage_client = storage.Client()

    # Make an authenticated API request
    buckets = list(storage_client.list_buckets())
    print(buckets)

# Run
implicit()

Examples

from google.cloud import bigquery

# Construct a BigQuery client object.
client = bigquery.Client()

query = """
    SELECT name, SUM(number) as total_people
    FROM `bigquery-public-data.usa_names.usa_1910_2013`
    WHERE state = 'TX'
    GROUP BY name, state
    ORDER BY total_people DESC
    LIMIT 20
"""
query_job = client.query(query)  # Make an API request.

print("The query data:")
for row in query_job:
    # Row values can be accessed by field name or index.
    print("name={}, count={}".format(row[0], row["total_people"]))

1. Run an SQL query with parameter from python to BigQuery

from google.cloud import bigquery
project_id = 'prd-mfp-analytics'
client = bigquery.Client(project=project_id)

query = """
        with filtered_faults AS (
        SELECT 
            site_name
            ,robot_id
            ,msg.attribute AS fault_type
            ,msg.value
            ,time
            ,msg.header.stamp.sec+msg.header.stamp.nsec/1000000000 AS start_time
            ,LEAD(msg.header.stamp.sec+msg.header.stamp.nsec/1000000000) OVER (PARTITION BY site_name, robot_id, msg.attribute ORDER BY msg.header.stamp.sec+msg.header.stamp.nsec/1000000000 ASC) AS end_time
            ,LEAD(msg.value) OVER (PARTITION BY site_name, robot_id, msg.attribute ORDER BY msg.header.stamp.sec+msg.header.stamp.nsec/1000000000 ASC) AS next_fault_value
        FROM `prd-mfp-analytics.analytics.info_events`
        WHERE 1=1
            AND time between TIMESTAMP('{start_date}') and TIMESTAMP('{end_date}')
            AND msg.entity = 'FAULT'
            AND site_name IS NOT NULL
            AND NOT REGEXP_CONTAINS(site_name, r".{{6,}}sim|.{{6,}}tst|simulation|techops|qa_stress|simsolutionsmovement|wagedwtrn") --remove simumation sites
        )

        -- filter to remove unclosed faults and return a duration for each remaining fault
        SELECT  
            end_time - start_time AS seconds_faulted
        FROM filtered_faults
        WHERE 1=1
          AND value = 1 
          AND next_fault_value = 0
"""

start_date_1 = '2021-06-05'
end_date_1 = '2021-06-08'

formatted_query_1 = query.format(start_date = start_date_1, end_date = end_date_1)

fault_time_df_1 = client.query(formatted_query_1).to_dataframe()

print(fault_time_df_1)

2. Run an SQL query and save the results to a new pandas DataFrame that's named regions_by_country.

You provide that name by using an argument with the %%bigquery magic command.

# Run this cell to load bigquery magic
%load_ext google.cloud.bigquery
%%bigquery regions_by_country
SELECT
  country_code,
  country_name,
  COUNT(DISTINCT region_code) AS num_regions
FROM
  `bigquery-public-data.google_trends.international_top_terms`
WHERE
  refresh_date = DATE_SUB(CURRENT_DATE, INTERVAL 1 DAY)
GROUP BY
  country_code, country_name
ORDER BY
  num_regions DESC;
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment