Created
March 9, 2026 01:52
| { | |
| "cells": [ | |
| { | |
| "cell_type": "markdown", | |
| "id": "4j7wbwrmd1w", | |
| "metadata": {}, | |
| "source": [ | |
| "# Dark Pool Aggregate Flow Analysis\n", | |
| "\n", | |
| "Interactive exploration and conclusions using the Unusual Whales dark pool transaction data set.\n", | |
| "\n", | |
| "## Reference\n", | |
| "- **How Rigged Are Stock Markets? Evidence from Microsecond Timestamps by Robert P. Bartlett, III and Justin McCrary**\n", | |
| "  - Published in the Journal of Financial Markets (Volume 45, September 2019)\n", | |
| "  - Available for free from UC Berkeley since both authors are associated with the university: [https://www.law.berkeley.edu/wp-content/uploads/2019/10/bartlett_mccrary_latency2017.pdf](https://www.law.berkeley.edu/wp-content/uploads/2019/10/bartlett_mccrary_latency2017.pdf)\n", | |
| "- **Can You Swim in a Dark Pool?**\n", | |
| "  - Published November 15, 2023\n", | |
| "  - Available for free from FINRA: [https://www.finra.org/investors/insights/can-you-swim-dark-pool](https://www.finra.org/investors/insights/can-you-swim-dark-pool)\n", | |
| "- **Short Is Long**\n", | |
| "  - Published March 2018\n", | |
| "  - Available for free from Squeezemetrics: [https://squeezemetrics.com/monitor/download/pdf/short_is_long.pdf](https://squeezemetrics.com/monitor/download/pdf/short_is_long.pdf)\n", | |
| "\n", | |
| "## Definitions\n", | |
| "- **Alternative Trading System (\"ATS\"):** a non-exchange venue for equity trading\n", | |
| "- **Dark Pool:** a type of ATS whose defining characteristic is **no public order book**, which means dark pools display no bid, offer, or market depth, as explained in the FINRA article \"Can You Swim in a Dark Pool?\" above\n", | |
| "- **Trade Reporting Facility (\"TRF\"):** one of two FINRA-managed facilities that receive off-exchange trade reports, one operated by NYSE and one operated by Nasdaq. The Nasdaq TRF receives the vast majority of these reports (87.27% of all non-exchange trade reports in Bartlett & McCrary 2019)\n", | |
| "- **Securities Information Processors (\"SIPs\"):** two entities that provide consolidated pricing to the public. U.S. regulations mandate that all trading centers disclose their quote updates and trades to both SIPs\n", | |
| "- **National Best Bid and Offer (\"NBBO\"):** the highest bid and the lowest offer broadcast by the SIPs based on the consolidated pricing. **Important: there is no requirement that the best bid and best offer be sourced from the same exchange**\n", | |
| "\n", | |
| "## Dark Pool Data Notes\n", | |
| "- Unusual Whales only provides dark pool transaction data from the Nasdaq TRF\n", | |
| "- The historical data set explored here only includes dark pool transactions where the notional size was at least $100K\n", | |
| "- Dark pools do not have a public order book (this is their defining characteristic) so Unusual Whales matches these transactions against the NBBO from Nasdaq's \"QBBO\" product\n", | |
| "\n", | |
| "## Key Takeaway 1\n", | |
| "Bartlett & McCrary (2019) found that 51.43% of unadjusted FINRA TRF transactions matched the SIP NBBO (see Table 5, Panel A in their paper linked above), based on observations from August 6th, 2015 through June 30th, 2016. We have no means to test that finding and do not contest it. The Unusual Whales historical dark pool data set used here spans August 22nd, 2022 through December 31st, 2025, and **across this period the share-weighted placement of trade executions vs. QBBO was:**\n", | |
| "\n", | |
| "| Price \"Bucket\" | Shares | Pct |\n", | |
| "|---|---|---|\n", | |
| "| \"below_midpoint\" | 154864129919 | 20.18 |\n", | |
| "| \"above_midpoint\" | 154791930424 | 20.17 |\n", | |
| "| \"at_bid\" | 142699996308 | 18.6 |\n", | |
| "| \"at_ask\" | 138399629861 | 18.04 |\n", | |
| "| \"at_midpoint\" | 96551515022 | 12.58 |\n", | |
| "| \"below_bid\" | 43376178948 | 5.65 |\n", | |
| "| \"above_ask\" | 36566174133 | 4.77 |\n", | |
| "\n", | |
| "## Key Takeaway 2\n", | |
| "The distribution of trade execution placement vs. QBBO is stable across the full calendar years in the data set (2023, 2024, and 2025), as shown by the pairwise Jensen-Shannon Divergence (\"JSD\") results:\n", | |
| "\n", | |
| "| Period | JSD |\n", | |
| "|---|---|\n", | |
| "| 2023 vs 2024 | 0.0017 |\n", | |
| "| 2023 vs 2025 | 0.0045 |\n", | |
| "| 2024 vs 2025 | 0.0008 |\n", | |
| "\n", | |
| "## Key Takeaway 3\n", | |
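| "In a comparison of 20 NASDAQ-listed securities to 20 NYSE-listed securities, there was no evidence that QBBO matching is more accurate for NASDAQ-listed securities. Their share of volume printed outside the QBBO was roughly double that of the NYSE-listed group:\n", | |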
| "\n", | |
| "| Listing Venue | Total Shares | Outside QBBO Pct | At Midpoint Pct | Directionally Classifiable Pct |\n", | |
| "|---|---|---|---|---|\n", | |
| "| NASDAQ | 90414340704 | 19.9529 | 10.2899 | 69.7572 |\n", | |
| "| NYSE | 15901675066 | 9.7906 | 17.4888 | 72.7206 |\n", | |
| "\n", | |
| "## Conclusion\n", | |
| "**The Unusual Whales historical dark pool data set may have systematic bias with stable error characteristics.** The appropriate analogy is that of a biased scale that reports every item as 20 pounds heavier than its actual weight. It might not accurately tell you how much you weigh, but if your goal is to lose weight, you can still run an effective diet program using it for your weight measurements because it is consistent over time.\n", | |
| "\n", | |
| "## Next Steps for the Reader\n", | |
| "- Consider calculating yearly ticker-level JSD levels as a baseline, then looking for deviations from those levels on shorter timeframes (like a day or a week)\n", | |
| "- Consider building ticker-level \"flow scores\" that aggregate directional trades over various time periods, then assessing forward price changes\n", | |
| "- Consider creating a dark pool \"utilization\" metric that compares dark pool volume vs. lit exchange volume for evidence of institutional activity at certain prices or times\n", | |
| "- Consider constructing sector or thematic baskets rather than individual tickers for the above exercises like Squeezemetrics did for the S&P500 with their \"DIX\" measurement" | |
| ] | |
| }, | |
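| { | |
| "cell_type": "markdown", | |
| "id": "nbbo-sketch-01", | |
| "metadata": {}, | |
| "source": [ | |
| "As a companion to the NBBO definition above, here is a minimal sketch of how a best bid and best offer can be assembled from per-venue quotes. The venue labels and prices are made up for illustration, and the `Quote` and `nbbo` names are hypothetical; this is not how the SIPs or Unusual Whales implement it. The only point is that the best bid and the best offer can come from different venues.\n", | |
| "\n", | |
| "```python\n", | |
| "from typing import NamedTuple\n", | |
| "\n", | |
| "\n", | |
| "class Quote(NamedTuple):\n", | |
| "    venue: str  # hypothetical venue label, for illustration only\n", | |
| "    bid: float\n", | |
| "    ask: float\n", | |
| "\n", | |
| "\n", | |
| "def nbbo(quotes):\n", | |
| "    # Highest bid and lowest ask are chosen independently of each other.\n", | |
| "    best_bid = max(quotes, key=lambda q: q.bid)\n", | |
| "    best_ask = min(quotes, key=lambda q: q.ask)\n", | |
| "    return best_bid.bid, best_bid.venue, best_ask.ask, best_ask.venue\n", | |
| "\n", | |
| "\n", | |
| "# Made-up quotes for one symbol at one instant.\n", | |
| "quotes = [\n", | |
| "    Quote('VENUE_A', 100.01, 100.04),\n", | |
| "    Quote('VENUE_B', 100.02, 100.05),\n", | |
| "    Quote('VENUE_C', 100.00, 100.03),\n", | |
| "]\n", | |
| "\n", | |
| "bid, bid_venue, ask, ask_venue = nbbo(quotes)\n", | |
| "print(bid, bid_venue, ask, ask_venue)  # prints: 100.02 VENUE_B 100.03 VENUE_C\n", | |
| "```\n", | |
| "\n", | |
| "In this toy example the best bid comes from one venue and the best offer from another, which is the situation the definition warns about." | |
| ] | |
| }, | |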
| { | |
| "cell_type": "markdown", | |
| "id": "0ygud7wtfda", | |
| "metadata": {}, | |
| "source": [ | |
| "## Configuration & Imports" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": 1, | |
| "id": "l49l8m9jhyd", | |
| "metadata": {}, | |
| "outputs": [ | |
| { | |
| "data": { | |
| "text/html": [ | |
| "\n", | |
| " <div id=\"pJ7rYr\"></div>\n", | |
| " <script type=\"text/javascript\" data-lets-plot-script=\"library\">\n", | |
| " if(!window.letsPlotCallQueue) {\n", | |
| " window.letsPlotCallQueue = [];\n", | |
| " };\n", | |
| " window.letsPlotCall = function(f) {\n", | |
| " window.letsPlotCallQueue.push(f);\n", | |
| " };\n", | |
| " (function() {\n", | |
| " var script = document.createElement(\"script\");\n", | |
| " script.type = \"text/javascript\";\n", | |
| " script.src = \"https://cdn.jsdelivr.net/gh/JetBrains/lets-plot@v4.8.2/js-package/distr/lets-plot.min.js\";\n", | |
| " script.onload = function() {\n", | |
| " window.letsPlotCall = function(f) {f();};\n", | |
| " window.letsPlotCallQueue.forEach(function(f) {f();});\n", | |
| " window.letsPlotCallQueue = [];\n", | |
| " \n", | |
| " };\n", | |
| " script.onerror = function(event) {\n", | |
| " window.letsPlotCall = function(f) {}; // noop\n", | |
| " window.letsPlotCallQueue = [];\n", | |
| " var div = document.createElement(\"div\");\n", | |
| " div.style.color = 'darkred';\n", | |
| " div.textContent = 'Error loading Lets-Plot JS';\n", | |
| " document.getElementById(\"pJ7rYr\").appendChild(div);\n", | |
| " };\n", | |
| " var e = document.getElementById(\"pJ7rYr\");\n", | |
| " e.appendChild(script);\n", | |
| " })()\n", | |
| " </script>\n", | |
| " " | |
| ] | |
| }, | |
| "metadata": {}, | |
| "output_type": "display_data" | |
| } | |
| ], | |
| "source": [ | |
| "from pathlib import Path\n", | |
| "from itertools import combinations\n", | |
| "\n", | |
| "import polars as pl\n", | |
| "import lets_plot as lp\n", | |
| "import numpy as np\n", | |
| "from scipy.spatial.distance import jensenshannon\n", | |
| "\n", | |
| "lp.LetsPlot.setup_html()" | |
| ] | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "id": "b80c11c7", | |
| "metadata": {}, | |
| "source": [ | |
| "## 1. Distribution of Trade Executions" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": 2, | |
| "id": "0a3497d0", | |
| "metadata": {}, | |
| "outputs": [ | |
| { | |
| "name": "stdout", | |
| "output_type": "stream", | |
| "text": [ | |
| "Loaded 209,272,081 trades across 844 trading days\n", | |
| "Date range: 2022-08-22 to 2025-12-31\n", | |
| "Unique tickers: shape: (14_396,)\n", | |
| "Series: 'ticker' [str]\n", | |
| "[\n", | |
| "\t\"STBA\"\n", | |
| "\t\"IDEX\"\n", | |
| "\t\"OKE\"\n", | |
| "\t\"CWEN.A\"\n", | |
| "\t\"WTRG\"\n", | |
| "\t…\n", | |
| "\t\"ATER\"\n", | |
| "\t\"VINC\"\n", | |
| "\t\"PMIO\"\n", | |
| "\t\"SCMB\"\n", | |
| "\t\"GAPR\"\n", | |
| "]\n" | |
| ] | |
| }, | |
| { | |
| "data": { | |
| "text/html": [ | |
| "<div><style>\n", | |
| ".dataframe > thead > tr,\n", | |
| ".dataframe > tbody > tr {\n", | |
| " text-align: right;\n", | |
| " white-space: pre-wrap;\n", | |
| "}\n", | |
| "</style>\n", | |
| "<small>shape: (5, 6)</small><table border=\"1\" class=\"dataframe\"><thead><tr><th>ticker</th><th>trade_date</th><th>price</th><th>size</th><th>nbbo_bid</th><th>nbbo_ask</th></tr><tr><td>str</td><td>date</td><td>f64</td><td>i64</td><td>f64</td><td>f64</td></tr></thead><tbody><tr><td>"COST"</td><td>2022-08-22</td><td>545.735</td><td>277</td><td>545.7</td><td>545.78</td></tr><tr><td>"GOOG"</td><td>2022-08-22</td><td>115.075</td><td>873</td><td>115.07</td><td>115.08</td></tr><tr><td>"GOOG"</td><td>2022-08-22</td><td>115.075</td><td>872</td><td>115.07</td><td>115.08</td></tr><tr><td>"AAPL"</td><td>2022-08-22</td><td>167.575</td><td>1563</td><td>167.58</td><td>167.59</td></tr><tr><td>"RSP"</td><td>2022-08-22</td><td>146.31</td><td>2400</td><td>146.3</td><td>146.32</td></tr></tbody></table></div>" | |
| ], | |
| "text/plain": [ | |
| "shape: (5, 6)\n", | |
| "┌────────┬────────────┬─────────┬──────┬──────────┬──────────┐\n", | |
| "│ ticker ┆ trade_date ┆ price   ┆ size ┆ nbbo_bid ┆ nbbo_ask │\n", | |
| "│ ---    ┆ ---        ┆ ---     ┆ ---  ┆ ---      ┆ ---      │\n", | |
| "│ str    ┆ date       ┆ f64     ┆ i64  ┆ f64      ┆ f64      │\n", | |
| "╞════════╪════════════╪═════════╪══════╪══════════╪══════════╡\n", | |
| "│ COST   ┆ 2022-08-22 ┆ 545.735 ┆ 277  ┆ 545.7    ┆ 545.78   │\n", | |
| "│ GOOG   ┆ 2022-08-22 ┆ 115.075 ┆ 873  ┆ 115.07   ┆ 115.08   │\n", | |
| "│ GOOG   ┆ 2022-08-22 ┆ 115.075 ┆ 872  ┆ 115.07   ┆ 115.08   │\n", | |
| "│ AAPL   ┆ 2022-08-22 ┆ 167.575 ┆ 1563 ┆ 167.58   ┆ 167.59   │\n", | |
| "│ RSP    ┆ 2022-08-22 ┆ 146.31  ┆ 2400 ┆ 146.3    ┆ 146.32   │\n", | |
| "└────────┴────────────┴─────────┴──────┴──────────┴──────────┘" | |
| ] | |
| }, | |
| "execution_count": 2, | |
| "metadata": {}, | |
| "output_type": "execute_result" | |
| } | |
| ], | |
| "source": [ | |
| "data_lake_path = Path(\"C:/lake/silver/dark_pool\") # Hive-partitioned root\n", | |
| "glob = str(data_lake_path / \"**\" / \"*.parquet\")\n", | |
| "scan: pl.LazyFrame = pl.scan_parquet(glob, hive_partitioning=True)\n", | |
| "assert isinstance(scan, pl.LazyFrame)\n", | |
| "\n", | |
| "# Data filtering\n", | |
| "# - Keep all dates: the analysis covers the entire period from 2022 through 2025\n", | |
| "# - Exclude:\n", | |
| "#   (1) cancelled trades\n", | |
| "#   (2) special trade codes like \"derivative priced\", \"qualified_contingent_trade\", and \"intermarket_sweep\"\n", | |
| "#   (3) extended hours trades (Reg NMS does not apply outside of regular trading hours)\n", | |
| "#   (4) non-regular settlement\n", | |
| "#   (5) non-regular sale condition codes like \"average_price_trade\", \"prior_reference_price\", etc.\n", | |
| "REGULAR_SETTLEMENT_VALUES = {\n", | |
| "    \"regular\",  # 2025+\n", | |
| "    \"regular_settlement\",  # 2022-2024\n", | |
| "}\n", | |
| "base_filter = (\n", | |
| "    pl.col(\"canceled\").eq(False)\n", | |
| "    & pl.col(\"trade_code\").is_null()\n", | |
| "    & pl.col(\"ext_hour_sold_codes\").is_null()\n", | |
| "    & (\n", | |
| "        pl.col(\"trade_settlement\").is_null()\n", | |
| "        | pl.col(\"trade_settlement\").is_in(REGULAR_SETTLEMENT_VALUES)\n", | |
| "    )\n", | |
| "    & pl.col(\"sale_cond_codes\").is_null()\n", | |
| ")\n", | |
| "\n", | |
| "df = (\n", | |
| "    scan\n", | |
| "    .filter(base_filter)\n", | |
| "    .with_columns(\n", | |
| "        trade_date = pl.col(\"date\").cast(pl.Date)\n", | |
| "    )\n", | |
| "    .select([\n", | |
| "        \"ticker\",\n", | |
| "        \"trade_date\",\n", | |
| "        \"price\",\n", | |
| "        \"size\",\n", | |
| "        \"nbbo_bid\",\n", | |
| "        \"nbbo_ask\"\n", | |
| "    ])\n", | |
| "    .collect()\n", | |
| ")\n", | |
| "assert isinstance(df, pl.DataFrame)\n", | |
| "df = df.sort(\"trade_date\", descending=False)\n", | |
| "\n", | |
| "print(f\"Loaded {len(df):,} trades across {df['trade_date'].n_unique()} trading days\")\n", | |
| "print(f\"Date range: {df['trade_date'].min()} to {df['trade_date'].max()}\")\n", | |
| "print(f\"Unique tickers: {df['ticker'].unique()}\")\n", | |
| "df.head(5)" | |
| ] | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "id": "9ff9a5e4", | |
| "metadata": {}, | |
| "source": [ | |
| "Now that we have consolidated this data, we need to classify trades against the matched QBBO:\n", | |
| "\n", | |
| "| Bucket | Condition | Classification |\n", | |
| "|---|---|---|\n", | |
| "| `above_ask` | price > ask | QBBO failure (unreliable) |\n", | |
| "| `at_ask` | price == ask | Clear buyer aggressor |\n", | |
| "| `above_midpoint` | mid < price < ask | Probable buyer aggressor |\n", | |
| "| `at_midpoint` | price == mid | Ambiguous (excluded from flow) |\n", | |
| "| `below_midpoint` | bid < price < mid | Probable seller aggressor |\n", | |
| "| `at_bid` | price == bid | Clear seller aggressor |\n", | |
| "| `below_bid` | price < bid | QBBO failure (unreliable) |" | |
| ] | |
| }, | |
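| { | |
| "cell_type": "markdown", | |
| "id": "bucket-sketch-01", | |
| "metadata": {}, | |
| "source": [ | |
| "The table above can be read as a small decision function. The sketch below classifies a single trade using `decimal.Decimal`, so the equality checks against the bid, ask, and midpoint are exact. This is an illustrative assumption and not how the notebook itself works: the notebook compares `f64` columns, where a computed midpoint is not guaranteed to compare equal to a reported price. The helper name `classify_trade` is hypothetical; the sample rows are the trades shown in the data preview earlier in this notebook.\n", | |
| "\n", | |
| "```python\n", | |
| "from decimal import Decimal\n", | |
| "\n", | |
| "\n", | |
| "def classify_trade(price, bid, ask):\n", | |
| "    # Hypothetical helper mirroring the bucket table; inputs are decimal strings.\n", | |
| "    price, bid, ask = Decimal(price), Decimal(bid), Decimal(ask)\n", | |
| "    mid = (bid + ask) / 2\n", | |
| "    if price > ask:\n", | |
| "        return 'above_ask'\n", | |
| "    if price == ask:\n", | |
| "        return 'at_ask'\n", | |
| "    if price > mid:\n", | |
| "        return 'above_midpoint'\n", | |
| "    if price == mid:\n", | |
| "        return 'at_midpoint'\n", | |
| "    if price > bid:\n", | |
| "        return 'below_midpoint'\n", | |
| "    if price == bid:\n", | |
| "        return 'at_bid'\n", | |
| "    return 'below_bid'\n", | |
| "\n", | |
| "\n", | |
| "# Rows taken from the data preview earlier in this notebook.\n", | |
| "print(classify_trade('545.735', '545.7', '545.78'))   # COST -> below_midpoint\n", | |
| "print(classify_trade('115.075', '115.07', '115.08'))  # GOOG -> at_midpoint\n", | |
| "print(classify_trade('167.575', '167.58', '167.59'))  # AAPL -> below_bid\n", | |
| "print(classify_trade('146.31', '146.3', '146.32'))    # RSP  -> at_midpoint\n", | |
| "```\n", | |
| "\n", | |
| "The order of the checks matters: each comparison assumes that every earlier one has already failed, which is why a single chain of conditions is enough to cover all seven buckets." | |
| ] | |
| }, | |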
| { | |
| "cell_type": "code", | |
| "execution_count": 3, | |
| "id": "dbf6f353", | |
| "metadata": {}, | |
| "outputs": [ | |
| { | |
| "name": "stdout", | |
| "output_type": "stream", | |
| "text": [ | |
| "Price bucket distribution (share-weighted):\n" | |
| ] | |
| }, | |
| { | |
| "data": { | |
| "text/html": [ | |
| "<div><style>\n", | |
| ".dataframe > thead > tr,\n", | |
| ".dataframe > tbody > tr {\n", | |
| " text-align: right;\n", | |
| " white-space: pre-wrap;\n", | |
| "}\n", | |
| "</style>\n", | |
| "<small>shape: (7, 3)</small><table border=\"1\" class=\"dataframe\"><thead><tr><th>price_bucket</th><th>shares</th><th>pct</th></tr><tr><td>str</td><td>i64</td><td>f64</td></tr></thead><tbody><tr><td>"below_midpoint"</td><td>154864129919</td><td>20.18</td></tr><tr><td>"above_midpoint"</td><td>154791930424</td><td>20.17</td></tr><tr><td>"at_bid"</td><td>142699996308</td><td>18.6</td></tr><tr><td>"at_ask"</td><td>138399629861</td><td>18.04</td></tr><tr><td>"at_midpoint"</td><td>96551515022</td><td>12.58</td></tr><tr><td>"below_bid"</td><td>43376178948</td><td>5.65</td></tr><tr><td>"above_ask"</td><td>36566174133</td><td>4.77</td></tr></tbody></table></div>" | |
| ], | |
| "text/plain": [ | |
| "shape: (7, 3)\n", | |
| "┌────────────────┬──────────────┬───────┐\n", | |
| "│ price_bucket   ┆ shares       ┆ pct   │\n", | |
| "│ ---            ┆ ---          ┆ ---   │\n", | |
| "│ str            ┆ i64          ┆ f64   │\n", | |
| "╞════════════════╪══════════════╪═══════╡\n", | |
| "│ below_midpoint ┆ 154864129919 ┆ 20.18 │\n", | |
| "│ above_midpoint ┆ 154791930424 ┆ 20.17 │\n", | |
| "│ at_bid         ┆ 142699996308 ┆ 18.6  │\n", | |
| "│ at_ask         ┆ 138399629861 ┆ 18.04 │\n", | |
| "│ at_midpoint    ┆ 96551515022  ┆ 12.58 │\n", | |
| "│ below_bid      ┆ 43376178948  ┆ 5.65  │\n", | |
| "│ above_ask      ┆ 36566174133  ┆ 4.77  │\n", | |
| "└────────────────┴──────────────┴───────┘" | |
| ] | |
| }, | |
| "execution_count": 3, | |
| "metadata": {}, | |
| "output_type": "execute_result" | |
| } | |
| ], | |
| "source": [ | |
| "# First pass: compute the mid-price for all transactions\n", | |
| "df = df.with_columns(\n", | |
| "    mid = ((pl.col(\"nbbo_bid\") + pl.col(\"nbbo_ask\")) / 2.0)\n", | |
| ")\n", | |
| "\n", | |
| "# Second pass: 'bucket' each transaction into a price group like 'above_ask' etc.\n", | |
| "df = df.with_columns(\n", | |
| "    price_bucket = (\n", | |
| "        pl.when(pl.col(\"price\") > pl.col(\"nbbo_ask\")).then(pl.lit(\"above_ask\"))\n", | |
| "        .when(pl.col(\"price\") == pl.col(\"nbbo_ask\")).then(pl.lit(\"at_ask\"))\n", | |
| "        .when(pl.col(\"price\") > pl.col(\"mid\")).then(pl.lit(\"above_midpoint\"))\n", | |
| "        .when(pl.col(\"price\") == pl.col(\"mid\")).then(pl.lit(\"at_midpoint\"))\n", | |
| "        .when(pl.col(\"price\") > pl.col(\"nbbo_bid\")).then(pl.lit(\"below_midpoint\"))\n", | |
| "        .when(pl.col(\"price\") == pl.col(\"nbbo_bid\")).then(pl.lit(\"at_bid\"))\n", | |
| "        .otherwise(pl.lit(\"below_bid\"))\n", | |
| "    )\n", | |
| ")\n", | |
| "\n", | |
| "# Third pass: create helper boolean columns such as is_buyer_leaning\n", | |
| "df = df.with_columns(\n", | |
| "    is_buyer_leaning = pl.col(\"price_bucket\").is_in([\"at_ask\", \"above_midpoint\"]),\n", | |
| "    is_seller_leaning = pl.col(\"price_bucket\").is_in([\"at_bid\", \"below_midpoint\"]),\n", | |
| "    is_classifiable = pl.col(\"price_bucket\").is_in(\n", | |
| "        [\"at_ask\", \"above_midpoint\", \"at_bid\", \"below_midpoint\"]\n", | |
| "    ),\n", | |
| "    is_outside_qbbo = pl.col(\"price_bucket\").is_in([\"above_ask\", \"below_bid\"]),\n", | |
| ")\n", | |
| "\n", | |
| "# Aggregate into price buckets, then compute the share-weighted distribution\n", | |
| "print(\"Price bucket distribution (share-weighted):\")\n", | |
| "bucket_dist_df = (\n", | |
| "    df\n", | |
| "    .group_by(\"price_bucket\")\n", | |
| "    .agg(pl.col(\"size\").sum().alias(\"shares\"))\n", | |
| "    .with_columns(\n", | |
| "        pct = (pl.col(\"shares\") / (pl.col(\"shares\").sum()) * 100).round(2)\n", | |
| "    )\n", | |
| "    .sort(\"pct\", descending=True)\n", | |
| ")\n", | |
| "bucket_dist_df" | |
| ] | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "id": "b71f9552", | |
| "metadata": {}, | |
| "source": [ | |
| "The trade distribution over the entire period is quite symmetric,\n", | |
| "with each bucket roughly equal to its counterpart. Let's plot this result:" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": 4, | |
| "id": "4895e629", | |
| "metadata": {}, | |
| "outputs": [ | |
| { | |
| "data": { | |
| "text/html": [ | |
| " <div id=\"lmM3zX\" ></div>\n", | |
| " <script type=\"text/javascript\" data-lets-plot-script=\"plot\">\n", | |
| " \n", | |
| " (function() {\n", | |
| " // ----------\n", | |
| " \n", | |
| " const forceImmediateRender = false;\n", | |
| " const responsive = false;\n", | |
| " \n", | |
| " let sizing = {\n", | |
| " width_mode: \"MIN\",\n", | |
| " height_mode: \"SCALED\",\n", | |
| " width: null, \n", | |
| " height: null \n", | |
| " };\n", | |
| " \n", | |
| " const preferredWidth = document.body.dataset.letsPlotPreferredWidth;\n", | |
| " if (preferredWidth !== undefined) {\n", | |
| " sizing = {\n", | |
| " width_mode: 'FIXED',\n", | |
| " height_mode: 'SCALED',\n", | |
| " width: parseFloat(preferredWidth)\n", | |
| " };\n", | |
| " }\n", | |
| " \n", | |
| " const containerDiv = document.getElementById(\"lmM3zX\");\n", | |
| " let fig = null;\n", | |
| " \n", | |
| " function renderPlot() {\n", | |
| " if (fig === null) {\n", | |
| " const plotSpec = {\n", | |
| "\"data\":{\n", | |
| "\"price_bucket\":[\"below_midpoint\",\"above_midpoint\",\"at_bid\",\"at_ask\",\"at_midpoint\",\"below_bid\",\"above_ask\"],\n", | |
| "\"pct\":[20.18,20.17,18.6,18.04,12.58,5.65,4.77]\n", | |
| "},\n", | |
| "\"mapping\":{\n", | |
| "\"x\":\"price_bucket\",\n", | |
| "\"y\":\"pct\",\n", | |
| "\"fill\":\"price_bucket\"\n", | |
| "},\n", | |
| "\"data_meta\":{\n", | |
| "\"series_annotations\":[{\n", | |
| "\"type\":\"str\",\n", | |
| "\"column\":\"price_bucket\"\n", | |
| "},{\n", | |
| "\"type\":\"int\",\n", | |
| "\"column\":\"shares\"\n", | |
| "},{\n", | |
| "\"type\":\"float\",\n", | |
| "\"column\":\"pct\"\n", | |
| "}]\n", | |
| "},\n", | |
| "\"ggtitle\":{\n", | |
| "\"text\":\"Dark Pool Trades by Price Bucket (Share-Weighted %)\"\n", | |
| "},\n", | |
| "\"guides\":{\n", | |
| "\"x\":{\n", | |
| "\"title\":\"Price Bucket\"\n", | |
| "},\n", | |
| "\"y\":{\n", | |
| "\"title\":\"% of Total Shares\"\n", | |
| "}\n", | |
| "},\n", | |
| "\"theme\":{\n", | |
| "\"axis_text_x\":{\n", | |
| "\"angle\":45.0,\n", | |
| "\"blank\":false\n", | |
| "},\n", | |
| "\"legend_position\":\"none\",\n", | |
| "\"plot_title\":{\n", | |
| "\"size\":16.0,\n", | |
| "\"hjust\":0.5,\n", | |
| "\"blank\":false\n", | |
| "}\n", | |
| "},\n", | |
| "\"ggsize\":{\n", | |
| "\"width\":800.0,\n", | |
| "\"height\":400.0\n", | |
| "},\n", | |
| "\"kind\":\"plot\",\n", | |
| "\"scales\":[{\n", | |
| "\"aesthetic\":\"x\",\n", | |
| "\"limits\":[\"below_bid\",\"at_bid\",\"below_midpoint\",\"at_midpoint\",\"above_midpoint\",\"at_ask\",\"above_ask\"],\n", | |
| "\"discrete\":true,\n", | |
| "\"reverse\":false\n", | |
| "},{\n", | |
| "\"aesthetic\":\"fill\",\n", | |
| "\"breaks\":[\"below_bid\",\"at_bid\",\"below_midpoint\",\"at_midpoint\",\"above_midpoint\",\"at_ask\",\"above_ask\"],\n", | |
| "\"values\":[\"#d62728\",\"#ff7f0e\",\"#ffbb78\",\"#999999\",\"#aec7e8\",\"#1f77b4\",\"#d62728\"]\n", | |
| "}],\n", | |
| "\"layers\":[{\n", | |
| "\"geom\":\"bar\",\n", | |
| "\"stat\":\"identity\",\n", | |
| "\"mapping\":{\n", | |
| "},\n", | |
| "\"data_meta\":{\n", | |
| "},\n", | |
| "\"data\":{\n", | |
| "}\n", | |
| "}],\n", | |
| "\"metainfo_list\":[],\n", | |
| "\"spec_id\":\"1\"\n", | |
| "};\n", | |
| " window.letsPlotCall(function() { fig = LetsPlot.buildPlotFromProcessedSpecs(plotSpec, containerDiv, sizing); });\n", | |
| " } else {\n", | |
| " fig.updateView({});\n", | |
| " }\n", | |
| " }\n", | |
| " \n", | |
| " const renderImmediately = \n", | |
| " forceImmediateRender || (\n", | |
| " sizing.width_mode === 'FIXED' && \n", | |
| " (sizing.height_mode === 'FIXED' || sizing.height_mode === 'SCALED')\n", | |
| " );\n", | |
| " \n", | |
| " if (renderImmediately) {\n", | |
| " renderPlot();\n", | |
| " }\n", | |
| " \n", | |
| " if (!renderImmediately || responsive) {\n", | |
| " // Set up observer for initial sizing or continuous monitoring\n", | |
| " var observer = new ResizeObserver(function(entries) {\n", | |
| " for (let entry of entries) {\n", | |
| " if (entry.contentBoxSize && \n", | |
| " entry.contentBoxSize[0].inlineSize > 0) {\n", | |
| " if (!responsive && observer) {\n", | |
| " observer.disconnect();\n", | |
| " observer = null;\n", | |
| " }\n", | |
| " renderPlot();\n", | |
| " if (!responsive) {\n", | |
| " break;\n", | |
| " }\n", | |
| " }\n", | |
| " }\n", | |
| " });\n", | |
| " \n", | |
| " observer.observe(containerDiv);\n", | |
| " }\n", | |
| " \n", | |
| " // ----------\n", | |
| " })();\n", | |
| " \n", | |
| " </script>" | |
| ], | |
| "text/plain": [ | |
| "<lets_plot.plot.core.PlotSpec at 0x256ff48c450>" | |
| ] | |
| }, | |
| "execution_count": 4, | |
| "metadata": {}, | |
| "output_type": "execute_result" | |
| } | |
| ], | |
| "source": [ | |
| "bucket_order = [\n", | |
| "    \"below_bid\", \"at_bid\", \"below_midpoint\",\n", | |
| "    \"at_midpoint\",\n", | |
| "    \"above_midpoint\", \"at_ask\", \"above_ask\",\n", | |
| "]\n", | |
| "\n", | |
| "bucket_colors = {\n", | |
| "    \"below_bid\": \"#d62728\",  # red: QBBO failure\n", | |
| "    \"at_bid\": \"#ff7f0e\",  # orange: seller\n", | |
| "    \"below_midpoint\": \"#ffbb78\",  # light orange: probable seller\n", | |
| "    \"at_midpoint\": \"#999999\",  # grey: ambiguous\n", | |
| "    \"above_midpoint\": \"#aec7e8\",  # light blue: probable buyer\n", | |
| "    \"at_ask\": \"#1f77b4\",  # blue: buyer\n", | |
| "    \"above_ask\": \"#d62728\",  # red: QBBO failure\n", | |
| "}\n", | |
| "\n", | |
| "(\n", | |
| "    lp.ggplot(bucket_dist_df, lp.aes(x=\"price_bucket\", y=\"pct\", fill=\"price_bucket\"))\n", | |
| "    + lp.geom_bar(stat=\"identity\")\n", | |
| "    + lp.scale_x_discrete(limits=bucket_order)\n", | |
| "    + lp.scale_fill_manual(values=bucket_colors)\n", | |
| "    + lp.labs(\n", | |
| "        title=\"Dark Pool Trades by Price Bucket (Share-Weighted %)\",\n", | |
| "        x=\"Price Bucket\",\n", | |
| "        y=\"% of Total Shares\",\n", | |
| "    )\n", | |
| "    + lp.theme(\n", | |
| "        legend_position=\"none\",\n", | |
| "        plot_title=lp.element_text(size=16, hjust=0.5),\n", | |
| "        axis_text_x=lp.element_text(angle=45),\n", | |
| "    )\n", | |
| "    + lp.ggsize(800, 400)\n", | |
| ")" | |
| ] | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "id": "331dd5e2", | |
| "metadata": {}, | |
| "source": [ | |
| "**Is this distribution changing from year to year?**\n", | |
| "- We only have a partial year for 2022 (starting from 2022-08-22)\n", | |
| "- A similar bar plot for each year will make it easy to spot jarring changes" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": 5, | |
| "id": "79bdd7e1", | |
| "metadata": {}, | |
| "outputs": [ | |
| { | |
| "data": { | |
| "text/html": [ | |
| " <div id=\"9j4Wou\" ></div>\n", | |
| " <script type=\"text/javascript\" data-lets-plot-script=\"plot\">\n", | |
| " \n", | |
| " (function() {\n", | |
| " // ----------\n", | |
| " \n", | |
| " const forceImmediateRender = false;\n", | |
| " const responsive = false;\n", | |
| " \n", | |
| " let sizing = {\n", | |
| " width_mode: \"MIN\",\n", | |
| " height_mode: \"SCALED\",\n", | |
| " width: null, \n", | |
| " height: null \n", | |
| " };\n", | |
| " \n", | |
| " const preferredWidth = document.body.dataset.letsPlotPreferredWidth;\n", | |
| " if (preferredWidth !== undefined) {\n", | |
| " sizing = {\n", | |
| " width_mode: 'FIXED',\n", | |
| " height_mode: 'SCALED',\n", | |
| " width: parseFloat(preferredWidth)\n", | |
| " };\n", | |
| " }\n", | |
| " \n", | |
| " const containerDiv = document.getElementById(\"9j4Wou\");\n", | |
| " let fig = null;\n", | |
| " \n", | |
| " function renderPlot() {\n", | |
| " if (fig === null) {\n", | |
| " const plotSpec = {\n", | |
| "\"data\":{\n", | |
| "\"year\":[\"2022\",\"2022\",\"2022\",\"2022\",\"2022\",\"2022\",\"2022\",\"2023\",\"2023\",\"2023\",\"2023\",\"2023\",\"2023\",\"2023\",\"2024\",\"2024\",\"2024\",\"2024\",\"2024\",\"2024\",\"2024\",\"2025\",\"2025\",\"2025\",\"2025\",\"2025\",\"2025\",\"2025\"],\n", | |
| "\"price_bucket\":[\"below_midpoint\",\"above_midpoint\",\"at_bid\",\"at_ask\",\"at_midpoint\",\"below_bid\",\"above_ask\",\"at_bid\",\"below_midpoint\",\"above_midpoint\",\"at_ask\",\"at_midpoint\",\"below_bid\",\"above_ask\",\"above_midpoint\",\"below_midpoint\",\"at_bid\",\"at_ask\",\"at_midpoint\",\"below_bid\",\"above_ask\",\"above_midpoint\",\"below_midpoint\",\"at_bid\",\"at_ask\",\"at_midpoint\",\"below_bid\",\"above_ask\"],\n", | |
| "\"pct\":[20.55,20.04,19.82,18.91,13.5,3.97,3.22,20.38,19.57,19.18,19.08,13.45,4.6,3.74,20.31,20.14,18.24,17.91,12.89,5.75,4.76,20.66,20.47,17.59,17.35,11.69,6.55,5.69]\n", | |
| "},\n", | |
| "\"mapping\":{\n", | |
| "\"x\":\"price_bucket\",\n", | |
| "\"y\":\"pct\",\n", | |
| "\"fill\":\"price_bucket\"\n", | |
| "},\n", | |
| "\"data_meta\":{\n", | |
| "\"series_annotations\":[{\n", | |
| "\"type\":\"str\",\n", | |
| "\"column\":\"year\"\n", | |
| "},{\n", | |
| "\"type\":\"str\",\n", | |
| "\"column\":\"price_bucket\"\n", | |
| "},{\n", | |
| "\"type\":\"int\",\n", | |
| "\"column\":\"shares\"\n", | |
| "},{\n", | |
| "\"type\":\"float\",\n", | |
| "\"column\":\"pct\"\n", | |
| "}]\n", | |
| "},\n", | |
| "\"facet\":{\n", | |
| "\"name\":\"wrap\",\n", | |
| "\"facets\":\"year\",\n", | |
| "\"ncol\":2.0,\n", | |
| "\"order\":1.0,\n", | |
| "\"dir\":\"h\"\n", | |
| "},\n", | |
| "\"ggtitle\":{\n", | |
| "\"text\":\"Dark Pool Price Bucket Distribution by Year (Share-Weighted %)\"\n", | |
| "},\n", | |
| "\"guides\":{\n", | |
| "\"x\":{\n", | |
| "\"title\":\"Price Bucket\"\n", | |
| "},\n", | |
| "\"y\":{\n", | |
| "\"title\":\"% of Total Shares\"\n", | |
| "}\n", | |
| "},\n", | |
| "\"theme\":{\n", | |
| "\"axis_text_x\":{\n", | |
| "\"angle\":45.0,\n", | |
| "\"blank\":false\n", | |
| "},\n", | |
| "\"legend_position\":\"none\",\n", | |
| "\"plot_title\":{\n", | |
| "\"size\":16.0,\n", | |
| "\"hjust\":0.5,\n", | |
| "\"blank\":false\n", | |
| "}\n", | |
| "},\n", | |
| "\"ggsize\":{\n", | |
| "\"width\":1200.0,\n", | |
| "\"height\":800.0\n", | |
| "},\n", | |
| "\"kind\":\"plot\",\n", | |
| "\"scales\":[{\n", | |
| "\"aesthetic\":\"x\",\n", | |
| "\"limits\":[\"below_bid\",\"at_bid\",\"below_midpoint\",\"at_midpoint\",\"above_midpoint\",\"at_ask\",\"above_ask\"],\n", | |
| "\"discrete\":true,\n", | |
| "\"reverse\":false\n", | |
| "},{\n", | |
| "\"aesthetic\":\"fill\",\n", | |
| "\"breaks\":[\"below_bid\",\"at_bid\",\"below_midpoint\",\"at_midpoint\",\"above_midpoint\",\"at_ask\",\"above_ask\"],\n", | |
| "\"values\":[\"#d62728\",\"#ff7f0e\",\"#ffbb78\",\"#999999\",\"#aec7e8\",\"#1f77b4\",\"#d62728\"]\n", | |
| "}],\n", | |
| "\"layers\":[{\n", | |
| "\"geom\":\"bar\",\n", | |
| "\"stat\":\"identity\",\n", | |
| "\"mapping\":{\n", | |
| "},\n", | |
| "\"data_meta\":{\n", | |
| "},\n", | |
| "\"data\":{\n", | |
| "}\n", | |
| "}],\n", | |
| "\"metainfo_list\":[],\n", | |
| "\"spec_id\":\"2\"\n", | |
| "};\n", | |
| " window.letsPlotCall(function() { fig = LetsPlot.buildPlotFromProcessedSpecs(plotSpec, containerDiv, sizing); });\n", | |
| " } else {\n", | |
| " fig.updateView({});\n", | |
| " }\n", | |
| " }\n", | |
| " \n", | |
| " const renderImmediately = \n", | |
| " forceImmediateRender || (\n", | |
| " sizing.width_mode === 'FIXED' && \n", | |
| " (sizing.height_mode === 'FIXED' || sizing.height_mode === 'SCALED')\n", | |
| " );\n", | |
| " \n", | |
| " if (renderImmediately) {\n", | |
| " renderPlot();\n", | |
| " }\n", | |
| " \n", | |
| " if (!renderImmediately || responsive) {\n", | |
| " // Set up observer for initial sizing or continuous monitoring\n", | |
| " var observer = new ResizeObserver(function(entries) {\n", | |
| " for (let entry of entries) {\n", | |
| " if (entry.contentBoxSize && \n", | |
| " entry.contentBoxSize[0].inlineSize > 0) {\n", | |
| " if (!responsive && observer) {\n", | |
| " observer.disconnect();\n", | |
| " observer = null;\n", | |
| " }\n", | |
| " renderPlot();\n", | |
| " if (!responsive) {\n", | |
| " break;\n", | |
| " }\n", | |
| " }\n", | |
| " }\n", | |
| " });\n", | |
| " \n", | |
| " observer.observe(containerDiv);\n", | |
| " }\n", | |
| " \n", | |
| " // ----------\n", | |
| " })();\n", | |
| " \n", | |
| " </script>" | |
| ], | |
| "text/plain": [ | |
| "<lets_plot.plot.core.PlotSpec at 0x256d7d83570>" | |
| ] | |
| }, | |
| "execution_count": 5, | |
| "metadata": {}, | |
| "output_type": "execute_result" | |
| } | |
| ], | |
| "source": [ | |
| "yearly_bucket_dist_df = (\n", | |
| "    df\n", | |
| "    .with_columns(\n", | |
| "        year = pl.col(\"trade_date\").dt.year().cast(pl.String)\n", | |
| "    )\n", | |
| "    .group_by([\"year\", \"price_bucket\"])\n", | |
| "    .agg(pl.col(\"size\").sum().alias(\"shares\"))\n", | |
| "    .with_columns(\n", | |
| "        pct = (\n", | |
| "            pl.col(\"shares\")\n", | |
| "            / (pl.col(\"shares\").sum().over(\"year\"))\n", | |
| "            * 100\n", | |
| "        ).round(2)\n", | |
| "    )\n", | |
| "    .sort([\"year\", \"pct\"], descending=[False, True])\n", | |
| ")\n", | |
| "\n", | |
| "(\n", | |
| "    lp.ggplot(\n", | |
| "        yearly_bucket_dist_df,\n", | |
| "        lp.aes(x=\"price_bucket\", y=\"pct\", fill=\"price_bucket\")\n", | |
| "    )\n", | |
| "    + lp.geom_bar(stat=\"identity\")\n", | |
| "    + lp.scale_x_discrete(limits=bucket_order)\n", | |
| "    + lp.scale_fill_manual(values=bucket_colors)\n", | |
| "    + lp.facet_wrap(facets=\"year\", ncol=2)  # one panel per year\n", | |
| "    + lp.labs(\n", | |
| "        title=\"Dark Pool Price Bucket Distribution by Year (Share-Weighted %)\",\n", | |
| "        x=\"Price Bucket\",\n", | |
| "        y=\"% of Total Shares\",\n", | |
| "    )\n", | |
| "    + lp.theme(\n", | |
| "        legend_position=\"none\",\n", | |
| "        plot_title=lp.element_text(size=16, hjust=0.5),\n", | |
| "        axis_text_x=lp.element_text(angle=45),\n", | |
| "    )\n", | |
| "    + lp.ggsize(1200, 800)\n", | |
| ")" | |
| ] | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "id": "99e2dca8", | |
| "metadata": {}, | |
| "source": [ | |
| "The plots look consistent, but we will calculate the Jensen-Shannon Divergence between each pair of years to quantify how similar the distributions are:\n", | |
| "\n", | |
| "| Jensen-Shannon Divergence | Meaning |\n", | |
| "|---|---|\n", | |
| "| `<0.01` | Essentially identical |\n", | |
| "| `0.01-0.05` | Minor drift but likely stable |\n", | |
| "| `0.05-0.15` | Meaningful shift |\n", | |
| "| `>0.15` | Regime change |" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": 6, | |
| "id": "e991d55e", | |
| "metadata": {}, | |
| "outputs": [ | |
| { | |
| "name": "stdout", | |
| "output_type": "stream", | |
| "text": [ | |
| "Jensen-Shannon Divergence between years:\n", | |
| " 2023 vs 2024: 0.0017\n", | |
| " 2023 vs 2025: 0.0045\n", | |
| " 2024 vs 2025: 0.0008\n" | |
| ] | |
| } | |
| ], | |
| "source": [ | |
| "yearly_pivot_df = (\n", | |
| " yearly_bucket_dist_df\n", | |
| " .pivot(index=\"year\", on=\"price_bucket\", values=\"pct\")\n", | |
| " .fill_null(0)\n", | |
| " .sort(\"year\")\n", | |
| " .filter(pl.col(\"year\").is_in([\"2023\", \"2024\", \"2025\"])) # full years only\n", | |
| ")\n", | |
| "\n", | |
| "# Build numpy matrix with one probability vector per year\n", | |
| "years = yearly_pivot_df[\"year\"].to_list()\n", | |
| "bucket_cols = bucket_order\n", | |
| "matrix = yearly_pivot_df.select(bucket_cols).to_numpy()\n", | |
| "matrix = matrix / matrix.sum(axis=1, keepdims=True) # re-normalize to sum=1\n", | |
| "\n", | |
| "print(\"Jensen-Shannon Divergence between years:\")\n", | |
| "for (i, year_i), (j, year_j) in combinations(enumerate(years), 2):\n", | |
| " dist = jensenshannon(matrix[i], matrix[j], base=2) ** 2 # scipy returns the distance (square root); squaring gives the divergence\n", | |
| " print(f\" {year_i} vs {year_j}: {dist:.4f}\")" | |
| ] | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "id": "8eacf0b7", | |
| "metadata": {}, | |
| "source": [ | |
| "**All three pairwise Jensen-Shannon Divergence values are below 0.01, firmly in the \"essentially identical\" range, giving us confidence that the distribution of Nasdaq TRF dark pool trade executions relative to the QBBO is stable from year to year.**" | |
| ] | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "id": "4b05054d", | |
| "metadata": {}, | |
| "source": [ | |
| "## 2. Do NYSE vs. NASDAQ Discrepancies Exist?\n", | |
| "Let's select 20 liquid NYSE-listed stocks and 20 liquid NASDAQ-listed stocks, then compare their\n", | |
| "trade execution distributions to see if any notable discrepancies appear:" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": 7, | |
| "id": "b9356473", | |
| "metadata": {}, | |
| "outputs": [ | |
| { | |
| "data": { | |
| "text/html": [ | |
| "<div><style>\n", | |
| ".dataframe > thead > tr,\n", | |
| ".dataframe > tbody > tr {\n", | |
| " text-align: right;\n", | |
| " white-space: pre-wrap;\n", | |
| "}\n", | |
| "</style>\n", | |
| "<small>shape: (2, 5)</small><table border=\"1\" class=\"dataframe\"><thead><tr><th>listing_venue</th><th>total_shares</th><th>outside_qbbo_pct</th><th>at_midpoint_pct</th><th>classifiable_pct</th></tr><tr><td>str</td><td>i64</td><td>f64</td><td>f64</td><td>f64</td></tr></thead><tbody><tr><td>"NASDAQ"</td><td>90414340704</td><td>0.199529</td><td>0.102899</td><td>0.697572</td></tr><tr><td>"NYSE"</td><td>15901675066</td><td>0.097906</td><td>0.174888</td><td>0.727206</td></tr></tbody></table></div>" | |
| ], | |
| "text/plain": [ | |
| "shape: (2, 5)\n", | |
| "┌───────────────┬──────────────┬──────────────────┬─────────────────┬──────────────────┐\n", | |
| "│ listing_venue ┆ total_shares ┆ outside_qbbo_pct ┆ at_midpoint_pct ┆ classifiable_pct │\n", | |
| "│ --- ┆ --- ┆ --- ┆ --- ┆ --- │\n", | |
| "│ str ┆ i64 ┆ f64 ┆ f64 ┆ f64 │\n", | |
| "╞═══════════════╪══════════════╪══════════════════╪═════════════════╪══════════════════╡\n", | |
| "│ NASDAQ ┆ 90414340704 ┆ 0.199529 ┆ 0.102899 ┆ 0.697572 │\n", | |
| "│ NYSE ┆ 15901675066 ┆ 0.097906 ┆ 0.174888 ┆ 0.727206 │\n", | |
| "└───────────────┴──────────────┴──────────────────┴─────────────────┴──────────────────┘" | |
| ] | |
| }, | |
| "execution_count": 7, | |
| "metadata": {}, | |
| "output_type": "execute_result" | |
| } | |
| ], | |
| "source": [ | |
| "venue_map = pl.DataFrame({\n", | |
| " \"ticker\": [\n", | |
| " # NYSE stocks\n", | |
| " \"JPM\", \"BAC\", \"GS\", \"MS\", \"WFC\",\n", | |
| " \"C\", \"CAT\", \"JNJ\", \"PG\", \"XOM\",\n", | |
| " \"CVX\", \"KO\", \"MCD\", \"WMT\", \"HD\",\n", | |
| " \"MO\", \"PM\", \"T\", \"VZ\", \"IBM\",\n", | |
| " # NASDAQ stocks\n", | |
| " \"AAPL\", \"MSFT\", \"NVDA\", \"AMZN\", \"TSLA\",\n", | |
| " \"META\", \"GOOG\", \"AVGO\", \"COST\", \"ADBE\",\n", | |
| " \"AMD\", \"INTC\", \"QCOM\", \"TXN\", \"MU\",\n", | |
| " \"AMAT\", \"NFLX\", \"CSCO\", \"HON\", \"PYPL\",\n", | |
| " ],\n", | |
| " \"listing_venue\": [\n", | |
| " \"NYSE\", \"NYSE\", \"NYSE\", \"NYSE\", \"NYSE\",\n", | |
| " \"NYSE\", \"NYSE\", \"NYSE\", \"NYSE\", \"NYSE\",\n", | |
| " \"NYSE\", \"NYSE\", \"NYSE\", \"NYSE\", \"NYSE\",\n", | |
| " \"NYSE\", \"NYSE\", \"NYSE\", \"NYSE\", \"NYSE\",\n", | |
| " \"NASDAQ\", \"NASDAQ\", \"NASDAQ\", \"NASDAQ\", \"NASDAQ\",\n", | |
| " \"NASDAQ\", \"NASDAQ\", \"NASDAQ\", \"NASDAQ\", \"NASDAQ\",\n", | |
| " \"NASDAQ\", \"NASDAQ\", \"NASDAQ\", \"NASDAQ\", \"NASDAQ\",\n", | |
| " \"NASDAQ\", \"NASDAQ\", \"NASDAQ\", \"NASDAQ\", \"NASDAQ\",\n", | |
| " ]\n", | |
| "})\n", | |
| "\n", | |
| "classified_df = (\n", | |
| " df\n", | |
| " .join(venue_map, on=\"ticker\", how=\"inner\")\n", | |
| " .select([\n", | |
| " \"ticker\", \"listing_venue\", \"price\", \"size\",\n", | |
| " \"nbbo_bid\", \"nbbo_ask\", \"mid\", \"price_bucket\",\n", | |
| " ])\n", | |
| ")\n", | |
| "\n", | |
| "# Aggregate by ticker, venue, and price bucket to get total shares and number of trades\n", | |
| "aggregate_df = (\n", | |
| " classified_df\n", | |
| " .group_by([\"ticker\", \"listing_venue\", \"price_bucket\"])\n", | |
| " .agg([\n", | |
| " pl.len().alias(\"n_trades\"),\n", | |
| " pl.col(\"size\").sum().alias(\"total_shares\"),\n", | |
| " ])\n", | |
| ")\n", | |
| "\n", | |
| "# Total shares per ticker for pct_of_volume\n", | |
| "ticker_totals_df = (\n", | |
| " aggregate_df\n", | |
| " .group_by([\"ticker\", \"listing_venue\"])\n", | |
| " .agg([\n", | |
| " pl.col(\"total_shares\").sum().alias(\"ticker_total_shares\")\n", | |
| " ])\n", | |
| ")\n", | |
| "\n", | |
| "# Per-ticker, per-bucket detail with pct of volume\n", | |
| "ticker_detail_df = (\n", | |
| " aggregate_df\n", | |
| " .join(ticker_totals_df, on=[\"ticker\", \"listing_venue\"], how=\"left\")\n", | |
| " .with_columns(\n", | |
| " pct_of_volume = (pl.col(\"total_shares\") / pl.col(\"ticker_total_shares\")),\n", | |
| " is_buyer_leaning = pl.col(\"price_bucket\").is_in([\"at_ask\", \"above_midpoint\"]),\n", | |
| " is_seller_leaning = pl.col(\"price_bucket\").is_in([\"at_bid\", \"below_midpoint\"]),\n", | |
| " is_classifiable = pl.col(\"price_bucket\").is_in(\n", | |
| " [\"at_ask\", \"above_midpoint\", \"at_bid\", \"below_midpoint\"]\n", | |
| " ),\n", | |
| " is_outside_qbbo = pl.col(\"price_bucket\").is_in([\"above_ask\", \"below_bid\"])\n", | |
| " )\n", | |
| " .sort([\"listing_venue\", \"ticker\", \"price_bucket\"])\n", | |
| ")\n", | |
| "\n", | |
| "# Final aggregation by venue to get overall distribution of price buckets\n", | |
| "venue_summary_df = (\n", | |
| " ticker_detail_df\n", | |
| " .group_by(\"listing_venue\")\n", | |
| " .agg([\n", | |
| " pl.col(\"total_shares\").sum().alias(\"total_shares\"),\n", | |
| " (\n", | |
| " pl.col(\"total_shares\").filter(pl.col(\"is_outside_qbbo\")).sum()\n", | |
| " / pl.col(\"total_shares\").sum()\n", | |
| " ).alias(\"outside_qbbo_pct\"),\n", | |
| " (\n", | |
| " pl.col(\"total_shares\").filter(pl.col(\"price_bucket\") == \"at_midpoint\").sum()\n", | |
| " / pl.col(\"total_shares\").sum()\n", | |
| " ).alias(\"at_midpoint_pct\"),\n", | |
| " (\n", | |
| " pl.col(\"total_shares\").filter(pl.col(\"is_classifiable\")).sum()\n", | |
| " / pl.col(\"total_shares\").sum()\n", | |
| " ).alias(\"classifiable_pct\"),\n", | |
| " ])\n", | |
| " .sort(\"listing_venue\")\n", | |
| ")\n", | |
| "venue_summary_df" | |
| ] | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "id": "0b8d30fe", | |
| "metadata": {}, | |
| "source": [ | |
| "Only 9.8% of dark pool share volume in the selected NYSE-listed stocks executes outside the QBBO. In comparison:\n", | |
| "- 20.0% of dark pool share volume in the selected NASDAQ-listed stocks executes outside the QBBO\n", | |
| "- 10.4% of the total dark pool transactions occur outside the QBBO\n", | |
| "\n", | |
| "**The NYSE-listed sample executes outside the QBBO no more often than the data set as a whole, which gives us confidence that there is no bias against NYSE-listed issues.**" | |
| ] | |
| } | |
| ], | |
| "metadata": { | |
| "kernelspec": { | |
| "display_name": "darkpool-backtest (3.13.8)", | |
| "language": "python", | |
| "name": "python3" | |
| }, | |
| "language_info": { | |
| "codemirror_mode": { | |
| "name": "ipython", | |
| "version": 3 | |
| }, | |
| "file_extension": ".py", | |
| "mimetype": "text/x-python", | |
| "name": "python", | |
| "nbconvert_exporter": "python", | |
| "pygments_lexer": "ipython3", | |
| "version": "3.13.8" | |
| } | |
| }, | |
| "nbformat": 4, | |
| "nbformat_minor": 5 | |
| } |
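
The year-over-year comparison in the notebook squares the value returned by scipy's `jensenshannon`, because that function returns the Jensen-Shannon *distance* (the square root of the divergence). The sketch below computes the base-2 divergence directly in pure Python so the relationship is easier to see. The function name `js_divergence` and the example percentages are hypothetical and are not taken from the notebook's data.

```python
import math
from itertools import combinations


def js_divergence(p, q):
    """Base-2 Jensen-Shannon divergence between two discrete distributions.

    Inputs may be raw weights (e.g. percentages); each is normalized to sum to 1.
    The result lies in [0, 1]: 0 for identical distributions, 1 for disjoint ones.
    """
    p_total, q_total = sum(p), sum(q)
    p = [x / p_total for x in p]
    q = [x / q_total for x in q]
    # Mixture distribution M = (P + Q) / 2
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(a, b):
        # Kullback-Leibler divergence; terms with a_i == 0 contribute nothing
        return sum(ai * math.log2(ai / bi) for ai, bi in zip(a, b) if ai > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


# Hypothetical share-weighted bucket percentages, for illustration only
example_dists = {
    "year_a": [10.0, 30.0, 20.0, 30.0, 10.0],
    "year_b": [12.0, 28.0, 20.0, 29.0, 11.0],
}

for (name_i, p), (name_j, q) in combinations(example_dists.items(), 2):
    print(f"{name_i} vs {name_j}: {js_divergence(p, q):.4f}")
```

With base-2 logarithms the divergence is bounded by 1, which is what makes fixed interpretation thresholds such as those in the notebook's table usable across comparisons.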