


Neal Magee nmagee

nmagee / README.md
Created November 12, 2025 17:37
Get practice with JSON serialization and de-serialization

Lab: JSON Serialization and Deserialization with REST APIs

Estimated Time: 30-45 minutes
Difficulty: Intermediate
Prerequisites: Basic Python, HTTP requests, JSON concepts

Learning Objectives

By the end of this lab, you will be able to:

  • Fetch data from REST APIs and deserialize JSON responses
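With the requests library, `response.json()` performs the `json.loads` step on the response body. The sketch below skips the network call and starts from a JSON string so it runs offline. The `Post` class and the payload are hypothetical stand-ins for whatever a real API returns. It shows the full round trip: deserialize JSON text into a typed Python object, then serialize it back.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class Post:
    # Hypothetical record shape; match it to the fields your API returns
    id: int
    title: str
    tags: list


# JSON text as a REST API might return it (made-up sample payload)
body = '{"id": 1, "title": "Hello", "tags": ["json", "rest"]}'

# Deserialize: JSON text -> dict -> typed Python object
data = json.loads(body)
post = Post(**data)

# Serialize: Python object -> dict -> JSON text
out = json.dumps(asdict(post), indent=2)
print(out)
```

Note that `Post(**data)` raises `TypeError` if the payload contains keys the class does not declare. Stricter or more forgiving mapping is a design choice left to the lab.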
nmagee / _README.md
Last active November 10, 2025 20:40
Learn about ways to aggregate data at rest (with GROUP BY) and streaming data (with WINDOWING)

Data Windowing

With endless streaming sources, how do we generate analytics? Windowing is a way of aggregating an unbounded stream into finite, useful chunks (for example, one-minute or one-hour intervals). It is a consumer operation, not a producer operation. Think of it as a GROUP BY operation for streaming data.

GROUP BY - for data at rest

Note the two attached Python scripts, which use the NYC Taxi data and group results either by day or by hour (within a single day).
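A tumbling window is the simplest case. It uses fixed-size, non-overlapping intervals and is the streaming analogue of grouping by hour. The sketch below is a hypothetical consumer-side helper, not one of the attached scripts. It assumes events arrive as `(seconds, value)` pairs in timestamp order with no late arrivals. Real stream processors also handle out-of-order events, watermarks, and sliding windows.

```python
from typing import Iterable, Iterator, Optional, Tuple


def tumbling_windows(
    events: Iterable[Tuple[int, float]], window_seconds: int
) -> Iterator[Tuple[int, int, float]]:
    """Yield (window_start, count, total) each time a window closes.

    Assumes events are ordered by timestamp; a window closes as soon as
    an event belonging to a later window arrives.
    """
    current_start: Optional[int] = None
    count, total = 0, 0.0
    for ts, value in events:
        # Align the timestamp to the start of the window that contains it
        start = ts - (ts % window_seconds)
        if current_start is not None and start != current_start:
            yield current_start, count, total
            count, total = 0, 0.0
        current_start = start
        count += 1
        total += value
    # Flush the last open window when the (finite) input ends
    if current_start is not None:
        yield current_start, count, total


# Hypothetical fares keyed by seconds since the stream started
events = [(0, 5.0), (12, 7.5), (61, 3.0), (119, 1.0), (125, 4.0)]
for window in tumbling_windows(events, 60):
    print(window)
```

GROUP BY over data at rest waits for the whole dataset. In contrast, this generator emits each result as soon as its window closes, which is what makes windowing usable on an endless stream.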

nmagee / _SETUP.md
Created September 24, 2025 20:55
Prefect - Two simple flows

prefect

Install

In a new virtual environment:

pip install prefect requests sympy

Server

nmagee / docker-compose.yml
Created June 13, 2025 11:55
Run a 3-broker Redpanda Kafka cluster
networks:
  redpanda_network:
    driver: bridge
volumes:
  redpanda-0: null
  redpanda-1: null
  redpanda-2: null
services:
  redpanda-0:
    command:
nmagee / README.md
Created April 21, 2025 20:44
Lab 9 - Kubernetes Cron Jobs

Kubernetes Cron Jobs

To process multiple jobs at scale in an organized and manageable way, organizations might choose from a variety of tools.

One powerful tool for running scheduled jobs is Kubernetes, an open-source platform originally created by Google. Kubernetes (or K8s) orchestrates dozens, hundreds, or even thousands of containerized jobs using a "declared state" model: developers describe the state they want for their application, and the cluster works to make it so.

In this lab you will create and submit a CronJob to run in the UVA Kubernetes cluster.
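In the declared-state model, a CronJob is a manifest that describes what should run and when. The cluster's CronJob controller then creates a Job on each schedule tick. The sketch below builds such a manifest in Python and prints it as JSON, which `kubectl apply -f` accepts just like YAML. It assumes the `batch/v1` CronJob API. The job name, image, schedule, and command are hypothetical placeholders, not the lab's actual values.

```python
import json


def cronjob_manifest(name, schedule, image, command):
    """Build a batch/v1 CronJob manifest as a plain dict (hypothetical helper)."""
    return {
        "apiVersion": "batch/v1",
        "kind": "CronJob",
        "metadata": {"name": name},
        "spec": {
            "schedule": schedule,  # standard five-field cron syntax
            "jobTemplate": {
                "spec": {
                    "template": {
                        "spec": {
                            "containers": [
                                {"name": name, "image": image, "command": command}
                            ],
                            # Pods created by Jobs must use OnFailure or Never
                            "restartPolicy": "OnFailure",
                        }
                    }
                }
            },
        },
    }


# Placeholder values: run a tiny container every 5 minutes
manifest = cronjob_manifest(
    "hello-cron", "*/5 * * * *", "busybox:1.36", ["sh", "-c", "date; echo hello"]
)
print(json.dumps(manifest, indent=2))
```

You could save the output as `cronjob.json` and submit it with `kubectl apply -f cronjob.json`. The lab's own instructions may supply a YAML manifest and namespace instead.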

Instructions

nmagee / README.md
Last active February 8, 2025 19:43
Lab 3

DS2002 Lab 3 - Data Cleaning Scripts

Sample Data

  • Sample data - https://s3.amazonaws.com/ds2002-resources/labs/lab3-bundle.tar.gz - tar-zipped TSV
  • Stock Data - https://s3.amazonaws.com/ds2002-resources/labs/stock_data.tsv - TSV
  • Flight Log - https://s3.amazonaws.com/ds2002-resources/labs/flights.csv - CSV
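The sample files above are plain delimited text. The lab's steps use bash, so this is a Python alternative that assumes a hypothetical cleaning goal: trim stray whitespace, drop incomplete rows, and convert TSV to CSV. The actual lab tasks may differ. `clean_tsv` and `to_csv` are illustrative names, and the sample rows are made up.

```python
import csv
import io


def clean_tsv(text):
    """Parse TSV text, trim whitespace, and drop blank or incomplete rows.

    Returns (header, rows).
    """
    reader = csv.reader(io.StringIO(text), delimiter="\t")
    header = [h.strip() for h in next(reader)]
    rows = []
    for row in reader:
        cells = [c.strip() for c in row]
        # Skip blank lines and rows with a missing or empty field
        if len(cells) != len(header) or any(c == "" for c in cells):
            continue
        rows.append(cells)
    return header, rows


def to_csv(header, rows):
    """Serialize a header and rows back out as CSV text."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buf.getvalue()


# Made-up sample: one row is missing a date, one has padded whitespace
sample = "symbol\tdate\tclose\nAAA\t2024-01-02\t10.5\nBBB\t\t20.0\n CCC \t2024-01-02\t30.25\n\n"
header, rows = clean_tsv(sample)
print(to_csv(header, rows))
```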

1. Write a bash script to fetch and decompress a remote bundle

nmagee / README.md
Last active February 5, 2025 20:17
In-class scripting exercises

bash Scripting Exercises

  1. Download the files below. Then try running the textstats.sh script with each of them.

  2. Play Wordle using the script below. Can you figure out:

    • how to print the secret word before you start guessing, and
    • where bash gets its list of 5-letter words?
  3. Complete the fileinfo.sh script below so that it displays basic information about any file passed to it as an argument. Start by completing line 8, then use echo to display the values to the user. BONUS: add another variable that counts how many lines the file contains, then display all 4 file attributes.

nmagee / simple.py
Created November 26, 2024 13:15
A basic fetch-and-load Airflow DAG
from airflow import Dataset
from airflow.decorators import (
    dag,
    task,
)
from airflow.models import Variable
from pendulum import datetime, duration
import requests
from pymongo import MongoClient, errors
from bson.json_util import dumps
nmagee / s3-material.md
Last active February 25, 2024 14:38
Materials to fetch for DS2002 S3

S3 Sample Data

Fetch dummy data folders and files

curl https://s3.amazonaws.com/ds2002-resources/data/bundle.tar.gz > \
  bundle.tar.gz && \
  tar -xzvf bundle.tar.gz && \
  rm bundle.tar.gz
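The same fetch, extract, and clean-up steps can be done in Python with only the standard library. This is a minimal sketch that assumes a gzipped tarball, and `fetch_and_extract` is a hypothetical helper name. It deletes the archive only after extraction succeeds, mirroring the `&&` chaining in the shell command.

```python
import tarfile
import urllib.request
from pathlib import Path


def fetch_and_extract(url, dest="."):
    """Download a .tar.gz from url, extract it into dest, then delete the archive.

    Returns the list of member names found in the archive.
    """
    dest = Path(dest)
    dest.mkdir(parents=True, exist_ok=True)
    archive = dest / "bundle.tar.gz"

    # curl URL > bundle.tar.gz
    urllib.request.urlretrieve(url, str(archive))

    # tar -xzvf bundle.tar.gz
    with tarfile.open(archive, "r:gz") as tar:
        names = tar.getnames()
        if hasattr(tarfile, "data_filter"):
            # Newer Pythons: reject absolute paths, ".." and other unsafe members
            tar.extractall(dest, filter="data")
        else:
            tar.extractall(dest)

    # rm bundle.tar.gz (only reached if extraction succeeded, like &&)
    archive.unlink()
    return names
```

For example, `fetch_and_extract("https://s3.amazonaws.com/ds2002-resources/data/bundle.tar.gz")` unpacks the sample bundle into the current directory.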
nmagee / get-secret.py
Last active November 28, 2022 20:22
Decrypt an AWS Secrets Manager secret
import boto3
from botocore.exceptions import ClientError

def get_secret(secret_name, region_name):
    # Create a Secrets Manager client
    session = boto3.session.Session()
    client = session.client(
        service_name='secretsmanager',
        region_name=region_name