@FrankRuns
FrankRuns / embeddings_transportation_experiment.py
Created October 19, 2025 13:56
Synthetic data experiment to show how embeddings might improve transportation rate predictions
# %% [markdown]
# # Why AI Thinks Phoenix and Miami Belong Together — Tutorial Notebook
#
# This notebook-style script walks through:
# 1) Clustering US cities by **geography** (lat/lon) vs **meaning** (embeddings)
# 2) Building a **synthetic lane-rate** dataset where semantic city characteristics
# (derived from embeddings) actually drive part of the rate variance
# 3) Training three models to predict rate-per-mile (RPM):
# - Baseline (Distance-only)
# - Name IDs (one-hot origin/destination) — "memorizer"
# - Embedding features (semantic city characteristics)
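A minimal sketch of that three-model comparison, assuming toy 2-D vectors in place of real embeddings and a made-up rate formula (the gist's actual data generator will differ):

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import OneHotEncoder

rng = np.random.default_rng(0)
cities = np.array([f"city_{i}" for i in range(30)])
emb = rng.normal(size=(30, 2))                       # toy "semantic" vectors

lanes = rng.integers(0, 30, size=(3000, 2))
lanes = lanes[lanes[:, 0] != lanes[:, 1]]            # drop same-city lanes
dist = rng.uniform(100, 2500, size=len(lanes))

# RPM driven partly by distance, partly by semantic city traits
sem = emb[lanes[:, 0], 0] + emb[lanes[:, 1], 1]
rpm = 2.5 - 0.0004 * dist + 0.3 * sem + rng.normal(0, 0.1, size=len(lanes))

X_dist = dist.reshape(-1, 1)
X_ids = np.hstack([X_dist, OneHotEncoder().fit_transform(cities[lanes]).toarray()])
X_emb = np.hstack([X_dist, emb[lanes[:, 0]], emb[lanes[:, 1]]])

for name, X in [("distance-only", X_dist), ("one-hot IDs", X_ids), ("embeddings", X_emb)]:
    Xtr, Xte, ytr, yte = train_test_split(X, rpm, test_size=0.3, random_state=0)
    print(f"{name:>13}: test R^2 = {r2_score(yte, LinearRegression().fit(Xtr, ytr).predict(Xte)):.3f}")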
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
"""
Decision-preserving dimensionality reduction for supply-chain network design.
What this script does:
- Strict 1,000-mile lane cap (no k-nearest fallback).
- Supply-aware clustering: only merge demand points sharing the same TOP-2 nearest DC signature.
- Demand-weighted clustering guardrails:
  - Mean distance to centroid <= CLUSTER_MEAN_MILES
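A sketch of that "same TOP-2 nearest DC signature" rule, assuming haversine distances and reading "signature" as the ordered pair (nearest DC, second-nearest DC):

import numpy as np

def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles."""
    p1, p2 = np.radians(lat1), np.radians(lat2)
    a = (np.sin((p2 - p1) / 2) ** 2
         + np.cos(p1) * np.cos(p2) * np.sin(np.radians(lon2 - lon1) / 2) ** 2)
    return 2 * 3958.8 * np.arcsin(np.sqrt(a))

dcs = {"Dallas": (32.7767, -96.797),
       "Washington": (38.9072, -77.0369),
       "Los Angeles": (34.0522, -118.2437)}
rng = np.random.default_rng(1)
points = rng.uniform([25, -120], [45, -75], size=(8, 2))   # (lat, lon) demand points

groups = {}
for i, (lat, lon) in enumerate(points):
    d = {name: haversine_miles(lat, lon, *coords) for name, coords in dcs.items()}
    sig = tuple(sorted(d, key=d.get)[:2])   # top-2 nearest DCs, nearest first
    groups.setdefault(sig, []).append(i)

# Only points inside the same signature group are candidates for merging.
for sig, members in groups.items():
    print(sig, "->", members)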
# Accumulating Risk Analysis: "It's there, you just can't see it"
# Demonstrates how risk accumulates over time and the statistical challenges in detecting it
# Load required libraries
library(ggplot2)
library(dplyr)
library(broom)
# Set seed for reproducibility
set.seed(42)
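Sketched in Python for brevity (the gist itself is in R, and these rates are illustrative assumptions), the detection problem looks like this:

import numpy as np

rng = np.random.default_rng(42)
weeks = np.arange(104)
p = 0.01 + 0.0005 * weeks              # risk quietly accumulates: ~1% -> ~6%
incidents = rng.random(weeks.size) < p

half = weeks.size // 2
print("incidents in first year :", int(incidents[:half].sum()))
print("incidents in second year:", int(incidents[half:].sum()))
# With this few events, a two-sample comparison rarely rejects "constant
# risk" -- the rising hazard is there, you just can't see it in the counts.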
@FrankRuns
FrankRuns / gist:13c76f300f15da7d74b3d60d7b7c0d5d
Created August 17, 2025 11:56
risk_complacency_model.R
###############################################################
# TUTORIAL: The “complacency model” for safety incidents
#
# Audience: Curious operators and analysts. No math background required.
#
# Big idea in plain English:
# - Instead of assuming risk is constant, let’s assume it RISES
# the longer we go without an incident or intervention.
# - Think of it like tension in a spring. The longer it goes untouched,
# the more tightly wound it gets. Eventually something snaps.
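A rough Python sketch of that spring-tension idea (the exponential form and parameter values here are assumptions, not the gist's exact model):

import numpy as np

rng = np.random.default_rng(0)
base_hazard, growth = 0.005, 0.08      # assumed baseline risk and wind-up rate
t_since, incident_weeks = 0, []

for week in range(300):
    hazard = base_hazard * np.exp(growth * t_since)   # tension builds each quiet week
    if rng.random() < min(hazard, 1.0):
        incident_weeks.append(week)                   # something snaps...
        t_since = 0                                   # ...and the spring resets
    else:
        t_since += 1

print("incident weeks:", incident_weeks)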
@FrankRuns
FrankRuns / gist:b00437d7717b8ab097a636af86156bb0
Created August 17, 2025 11:51
constant_risk_updating.R
###############################################################
# TUTORIAL: Estimating “chance of an incident next week”
# when you’ve seen zero incidents so far
#
# Audience: Curious operators and analysts. No Bayesian background needed.
#
# Big idea in plain English:
# - You start with a reasonable guess about the weekly incident rate
# (call this your PRIOR belief, based on history/industry norms).
# - You observe some weeks with no incidents.
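The standard machinery for this is Beta-Binomial updating. A minimal Python sketch (the gist's version is in R, and the prior parameters here are illustrative):

from scipy import stats

a, b = 1, 9                    # PRIOR Beta(1, 9): roughly "10% chance per week"
zero_incident_weeks = 20       # observed: 20 weeks, no incidents

# Each incident-free week counts as one more "failure" observation.
posterior = stats.beta(a, b + zero_incident_weeks)
print(f"Posterior mean weekly incident probability: {posterior.mean():.3f}")
lo, hi = posterior.interval(0.95)
print(f"95% credible interval: ({lo:.3f}, {hi:.3f})")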
@FrankRuns
FrankRuns / ml_jitter_experiment_prompt.txt
Created January 9, 2025 12:28
Prompt to get python script experimenting with jittering input training data for an ML model.
I want a Python script demonstrating a simple approach for "shaking up" historical training data. Specifically, show me how to:
1. Load the Boston Housing dataset (or a similar publicly available dataset).
2. Split the data into training and test sets.
3. Add a small amount of random noise (jitter) to the training set features.
4. Train one linear regression model on the unmodified data and another on the jittered data.
5. Compare the MSE (Mean Squared Error) of each model on the same test set.
For the jitter, just use a normal distribution with a small standard deviation, something like 0.01. Then show me how the MSE differs between the original and jittered data. If the jittered version yields a lower MSE, let me know in the script output. If it’s worse, let me know that, too.
Nothing too fancy, just enough that I can make a point about how “bad data” might become surprisingly helpful when we own the uncertainty and inject it. And please include some print statements that display the MSEs. That’s it.
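One possible answer to this prompt (a sketch, not the script the prompt actually produced). The Boston Housing dataset has been removed from scikit-learn, so the bundled diabetes dataset stands in as "a similar publicly available dataset":

import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

X, y = load_diabetes(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

rng = np.random.default_rng(42)
X_jittered = X_train + rng.normal(0, 0.01, size=X_train.shape)   # small jitter

mse_plain = mean_squared_error(y_test, LinearRegression().fit(X_train, y_train).predict(X_test))
mse_jitter = mean_squared_error(y_test, LinearRegression().fit(X_jittered, y_train).predict(X_test))

print(f"MSE on original data: {mse_plain:.2f}")
print(f"MSE on jittered data: {mse_jitter:.2f}")
print("Jittered version wins." if mse_jitter < mse_plain else "Original version wins.")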
You are a highly capable Python programmer who has access to locations.csv, which contains columns name, longitude, latitude, and type.
Please write a Python script that does the following:
1. Reads locations.csv into a pandas DataFrame.
2. Enumerates every possible Origin–Destination (OD) pair, but skips certain flows based on the following rules (via a helper function is_valid_flow(origin_type, dest_type)):
   - No shipments from Plant -> Customer
   - No shipments from DC -> Plant
   - No shipments from Customer -> DC
   - No shipments from Customer -> Plant
Sample of locations.csv (fixed facilities):

name            longitude    latitude  type
Washington DC   -77.0369     38.9072   DC
Dallas TX       -96.797      32.7767   DC
Los Angeles CA  -118.2437    34.0522   DC
Phoenix AZ      -112.074     33.4484   Plant
Charlotte NC    -80.8431     35.2271   Plant

Sample of generated Customer rows:

   name           longitude           latitude           type
0  Washington DC  -76.16186430611484  38.96475995358956  Customer
1  Washington DC  -77.85084407238416  40.23905626401316  Customer
2  Washington DC  -78.33383248877686  37.28207518409593  Customer
3  Washington DC  -77.18345675251808  38.38733808629542  Customer
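A minimal sketch of the requested enumeration, assuming the locations.csv layout shown above (the helper name is_valid_flow comes from the prompt itself):

from itertools import permutations
import pandas as pd

FORBIDDEN = {("Plant", "Customer"), ("DC", "Plant"),
             ("Customer", "DC"), ("Customer", "Plant")}

def is_valid_flow(origin_type: str, dest_type: str) -> bool:
    """Return True unless the origin -> destination type pair is forbidden."""
    return (origin_type, dest_type) not in FORBIDDEN

df = pd.read_csv("locations.csv")
pairs = [
    (o["name"], d["name"])
    for (_, o), (_, d) in permutations(df.iterrows(), 2)
    if is_valid_flow(o["type"], d["type"])
]
print(f"{len(pairs)} valid OD pairs")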
You have a CSV file called `locations.csv` with columns name, longitude, latitude, and type; the rows include Customers, DCs, and Plants.
I want you to:
1. Filter the data to only include rows where `type == 'Customer'`.
2. Generate synthetic one-period demand for these customers:
- Normal scenario: Draw from a normal distribution (mean=100, std=20), clip negatives at 0.
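A sketch of those two steps for the Normal scenario (the prompt's remaining scenarios are truncated above), assuming the same locations.csv:

import numpy as np
import pandas as pd

df = pd.read_csv("locations.csv")
customers = df[df["type"] == "Customer"].copy()    # step 1: Customer rows only

rng = np.random.default_rng(42)
demand = rng.normal(loc=100, scale=20, size=len(customers))
customers["demand"] = np.clip(demand, 0, None)     # step 2: clip negatives at 0
print(customers[["name", "demand"]].head())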
Generate a Python Script for [Project Objective] Visualization with [Visualization Tools] in a Jupyter Notebook
Body:
Objective:
Clearly describe the purpose of the project, the type of data involved, and the key insights or lessons you aim to convey through visualization. Mention whether you have an existing dataset or need to generate synthetic data.
Example:
Create a Python script to visualize supply chain network scenarios using Folium maps. The visualization should compare an optimal distribution strategy (multiple Distribution Centers) versus a suboptimal one (single Distribution Center) to highlight the impact on costs and delivery times. If no data file is provided, generate synthetic data for Distribution Centers (DCs) and Customers.
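A minimal Folium sketch of the example's map (synthetic coordinates; the full scenario comparison with costs and delivery times is left out):

import folium
import numpy as np

rng = np.random.default_rng(0)
dcs = [("DC Dallas", 32.7767, -96.797), ("DC Washington", 38.9072, -77.0369)]
customers = rng.uniform([28, -105], [42, -75], size=(20, 2))   # (lat, lon)

m = folium.Map(location=[37, -95], zoom_start=4)
for name, lat, lon in dcs:
    folium.Marker([lat, lon], popup=name, icon=folium.Icon(color="red")).add_to(m)
for lat, lon in customers:
    folium.CircleMarker([lat, lon], radius=3, color="blue").add_to(m)
m.save("network_map.html")   # open in a browser to inspect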