These are some quick and simple tools for comparing performance of different versions of browsertrix-crawler (using the webrecorder/browsertrix-crawler Docker images). You'll need Docker and uv to run this.
- First, adjust epatest.yaml for whatever settings you’d like.

- Then, run run-many.sh <name-of-scenario>. This will run 3 crawls each with v1.8.1 and v1.9.0, where the resulting collections are named epatest--<version>--<scenario>--<index>. For example:

      # Try it with default settings:
      ./run-many.sh default

      # Adjust epatest.yaml to turn off autoscroll, then:
      ./run-many.sh no-autoscroll

  (This script just runs epatest.sh repeatedly with appropriate version and name arguments. You can do a single test run with ./epatest.sh <version> <name>, e.g. ./epatest.sh 1.8.1 default--1.)

- Run epatest-logs.py with uv to load the logs from those crawls and compare them:

      $ uv run epatest-logs.py
      Median Time | Collections | Individual Times
      -------------- | -------------------------------------------------- | --------------------
      https://hero.epa.gov/hero/index.cfm/search
      6.0s +/- 0.2 | 1-8-1--default-behaviors.......................... | 5.9s, 6.0s, 6.0s
      6.0s +/- 0.2 | 1-8-1--no-autoscroll.............................. | 6.0s, 5.8s, 6.1s
      6.0s +/- 0.1 | 1-9-0--default-behaviors.......................... | 6.0s, 5.9s, 6.0s
      5.9s +/- 0.0 | 1-9-0--no-autoscroll.............................. | 5.9s, 5.8s, 5.9s
      https://espanol.epa.gov/espanol/terminos-e
      3.6s +/- 0.5 | 1-8-1--default-behaviors.......................... | 3.5s, 3.6s, 4.1s
      3.6s +/- 0.6 | 1-8-1--no-autoscroll.............................. | 3.6s, 3.6s, 4.1s
      70.1s +/- 0.1 | 1-9-0--default-behaviors.......................... | 70.1s, 70.0s, 70.1s
      32.9s +/- 0.1 | 1-9-0--no-autoscroll.............................. | 32.9s, 32.8s, 32.9s

  So you can compare how different scenarios and versions perform on the same page(s).

  (NOTE: you can also add a --path <path> to point to a directory of crawl collections, e.g. uv run epatest-logs.py --path other-crawls/collections.)
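For anyone who wants to see the shape of a run-many.sh invocation before kicking off six crawls, here is a rough Python sketch of the run plan described above. It only builds the epatest.sh command lines and does not execute anything. The function name plan_runs, the run order (all runs of one version before the next), and the 1-based index are assumptions for illustration; the actual script may differ.

```python
from typing import List

VERSIONS = ("1.8.1", "1.9.0")  # the two versions named in the text
RUNS_PER_VERSION = 3           # "3 crawls each"


def plan_runs(scenario: str) -> List[List[str]]:
    """Build (but do not run) the epatest.sh commands for one scenario.

    Hypothetical helper: the ordering and 1-based index are assumptions.
    """
    commands = []
    for version in VERSIONS:
        for index in range(1, RUNS_PER_VERSION + 1):
            # The name argument follows the "default--1" pattern from the text.
            commands.append(["./epatest.sh", version, f"{scenario}--{index}"])
    return commands


if __name__ == "__main__":
    for command in plan_runs("default"):
        print(" ".join(command))
```

Printing the plan first is a cheap way to double-check the scenario name before the collections get created.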
I sure got some surprising results here! I started by grabbing the first 20 URLs of a random EDGI crawl of epa.gov hostnames (see epatest.yaml in this gist). It happened to include two hostnames: hero.epa.gov and espanol.epa.gov. Running this gave me the results I expected: hero.epa.gov worked fine in both versions, and every URL at espanol.epa.gov took ~10× longer in v1.9.0. Dropping autoscroll helped v1.9.0 only on some long pages. Excerpted results:
Median Time | Collections | Individual Times
-------------- | -------------------------------------------------- | --------------------
https://hero.epa.gov/hero/index.cfm/search
6.0s +/- 0.2 | 1-8-1--default-behaviors.......................... | 5.9s, 6.0s, 6.0s
6.0s +/- 0.2 | 1-8-1--no-autoscroll.............................. | 6.0s, 5.8s, 6.1s
6.0s +/- 0.2 | 1-8-1--no-autoscroll-autoplay..................... | 6.2s, 6.0s, 6.0s
5.8s +/- 0.1 | 1-8-1--no-autoscroll-autoplay-autofetch........... | 5.7s, 5.8s, 5.9s
6.0s +/- 0.1 | 1-9-0--default-behaviors.......................... | 6.0s, 5.9s, 6.0s
5.9s +/- 0.0 | 1-9-0--no-autoscroll.............................. | 5.9s, 5.8s, 5.9s
5.9s +/- 0.0 | 1-9-0--no-autoscroll-autoplay..................... | 5.9s, 5.9s, 5.9s
5.8s +/- 0.2 | 1-9-0--no-autoscroll-autoplay-autofetch........... | 5.8s, 5.7s, 5.8s
https://hero.epa.gov/
5.2s +/- 0.0 | 1-8-1--default-behaviors.......................... | 5.2s, 5.2s, 5.2s
5.2s +/- 0.1 | 1-8-1--no-autoscroll.............................. | 5.2s, 5.2s, 5.2s
5.2s +/- 0.1 | 1-8-1--no-autoscroll-autoplay..................... | 5.2s, 5.1s, 5.2s
5.4s +/- 0.1 | 1-8-1--no-autoscroll-autoplay-autofetch........... | 5.4s, 5.4s, 5.3s
5.6s +/- 0.1 | 1-9-0--default-behaviors.......................... | 5.6s, 5.5s, 5.6s
5.6s +/- 0.1 | 1-9-0--no-autoscroll.............................. | 5.5s, 5.6s, 5.6s
5.6s +/- 0.0 | 1-9-0--no-autoscroll-autoplay..................... | 5.6s, 5.6s, 5.6s
5.8s +/- 0.2 | 1-9-0--no-autoscroll-autoplay-autofetch........... | 5.7s, 5.9s, 5.8s
https://espanol.epa.gov/cai/manual-informativo-sobre-el-radon
4.7s +/- 0.2 | 1-8-1--default-behaviors.......................... | 4.7s, 4.7s, 4.9s
4.7s +/- 0.0 | 1-8-1--no-autoscroll.............................. | 4.7s, 4.7s, 4.7s
4.7s +/- 0.1 | 1-8-1--no-autoscroll-autoplay..................... | 4.7s, 4.7s, 4.8s
4.6s +/- 0.2 | 1-8-1--no-autoscroll-autoplay-autofetch........... | 4.5s, 4.6s, 4.7s
34.6s +/- 0.0 | 1-9-0--default-behaviors.......................... | 34.6s, 34.6s, 34.5s
33.8s +/- 0.0 | 1-9-0--no-autoscroll.............................. | 33.8s, 33.7s, 33.8s
33.8s +/- 0.1 | 1-9-0--no-autoscroll-autoplay..................... | 33.8s, 33.7s, 33.8s
33.8s +/- 0.0 | 1-9-0--no-autoscroll-autoplay-autofetch........... | 33.8s, 33.8s, 33.8s
https://espanol.epa.gov/watersense/en-sequia
3.6s +/- 0.4 | 1-8-1--default-behaviors.......................... | 3.9s, 3.6s, 3.5s
3.6s +/- 0.1 | 1-8-1--no-autoscroll.............................. | 3.6s, 3.6s, 3.5s
4.3s +/- 0.8 | 1-8-1--no-autoscroll-autoplay..................... | 4.3s, 3.6s, 4.3s
3.7s +/- 0.8 | 1-8-1--no-autoscroll-autoplay-autofetch........... | 4.3s, 3.5s, 3.7s
35.4s +/- 0.0 | 1-9-0--default-behaviors.......................... | 35.4s, 35.4s, 35.4s
32.9s +/- 0.0 | 1-9-0--no-autoscroll.............................. | 32.9s, 32.8s, 32.9s
32.9s +/- 0.2 | 1-9-0--no-autoscroll-autoplay..................... | 32.9s, 32.9s, 33.0s
32.9s +/- 0.0 | 1-9-0--no-autoscroll-autoplay-autofetch........... | 32.9s, 32.9s, 32.9s
https://espanol.epa.gov/espanol/terminos-e
3.6s +/- 0.5 | 1-8-1--default-behaviors.......................... | 3.5s, 3.6s, 4.1s
3.6s +/- 0.6 | 1-8-1--no-autoscroll.............................. | 3.6s, 3.6s, 4.1s
3.5s +/- 0.1 | 1-8-1--no-autoscroll-autoplay..................... | 3.5s, 3.5s, 3.6s
3.5s +/- 0.8 | 1-8-1--no-autoscroll-autoplay-autofetch........... | 3.5s, 3.5s, 4.3s
70.1s +/- 0.1 | 1-9-0--default-behaviors.......................... | 70.1s, 70.0s, 70.1s
32.9s +/- 0.1 | 1-9-0--no-autoscroll.............................. | 32.9s, 32.8s, 32.9s
32.9s +/- 0.2 | 1-9-0--no-autoscroll-autoplay..................... | 32.9s, 32.9s, 33.0s
32.9s +/- 1.3 | 1-9-0--no-autoscroll-autoplay-autofetch........... | 32.9s, 34.2s, 32.9s
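To put a number on the v1.9.0 slowdown in the excerpt above, here is a small sketch that divides the v1.9.0 median by the v1.8.1 median for each URL. The medians are copied from the default-behaviors rows; the MEDIANS dictionary and the slowdown helper are names made up for this illustration.

```python
# Median page times in seconds as (v1.8.1, v1.9.0), copied from the
# "default-behaviors" rows of the excerpt above.
MEDIANS = {
    "https://hero.epa.gov/hero/index.cfm/search": (6.0, 6.0),
    "https://hero.epa.gov/": (5.2, 5.6),
    "https://espanol.epa.gov/cai/manual-informativo-sobre-el-radon": (4.7, 34.6),
    "https://espanol.epa.gov/watersense/en-sequia": (3.6, 35.4),
    "https://espanol.epa.gov/espanol/terminos-e": (3.6, 70.1),
}


def slowdown(old_seconds: float, new_seconds: float) -> float:
    """Return how many times longer the new version took than the old one."""
    return new_seconds / old_seconds


if __name__ == "__main__":
    for url, (old, new) in MEDIANS.items():
        print(f"{slowdown(old, new):5.1f}x  {url}")
```

With default behaviors, the hero.epa.gov pages come out at roughly 1×, while the three espanol.epa.gov pages land between about 7× and 19×.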
So I thought I’d try to simplify the test and just look at a couple of the slow pages. But then v1.8.1 slowed down to be the same as v1.9.0!
Median Time | Collections | Individual Times
-------------- | -------------------------------------------------- | --------------------
https://espanol.epa.gov/cai/manual-informativo-sobre-el-radon
34.9s +/- 0.1 | 1-8-1--default-behaviors.......................... | 35.0s, 34.9s, 34.9s
34.1s +/- 0.0 | 1-8-1--no-autoscroll.............................. | 34.1s, 34.1s, 34.1s
34.9s +/- 0.1 | 1-9-0--default-behaviors.......................... | 34.9s, 34.9s, 35.0s
34.1s +/- 0.1 | 1-9-0--no-autoscroll.............................. | 34.1s, 34.2s, 34.1s
https://espanol.epa.gov/watersense/en-sequia
32.8s +/- 0.1 | 1-8-1--default-behaviors.......................... | 32.9s, 32.8s, 32.8s
32.8s +/- 0.1 | 1-8-1--no-autoscroll.............................. | 32.8s, 32.8s, 32.8s
35.6s +/- 0.0 | 1-9-0--default-behaviors.......................... | 35.6s, 35.6s, 35.6s
33.0s +/- 0.1 | 1-9-0--no-autoscroll.............................. | 33.0s, 33.0s, 32.9s
This holds even when I bumped it up to 5 URLs instead of 2.
BUT if I include 2 URLs each from hero.epa.gov and espanol.epa.gov, we're back to the original speedy behavior in v1.8.1 and slow behavior in v1.9.0:
Median Time | Collections | Individual Times
-------------- | -------------------------------------------------- | --------------------
https://hero.epa.gov/hero/index.cfm/search
5.9s +/- 0.5 | 1-8-1--default-behaviors.......................... | 6.3s, 5.8s, 5.9s
6.1s +/- 0.2 | 1-8-1--no-autoscroll.............................. | 6.1s, 6.0s, 6.2s
5.9s +/- 0.0 | 1-9-0--default-behaviors.......................... | 5.9s, 5.9s, 5.9s
5.9s +/- 0.0 | 1-9-0--no-autoscroll.............................. | 5.8s, 5.9s, 5.9s
https://hero.epa.gov/
5.1s +/- 1.2 | 1-8-1--default-behaviors.......................... | 5.1s, 5.2s, 4.0s
5.2s +/- 0.8 | 1-8-1--no-autoscroll.............................. | 4.4s, 5.2s, 5.2s
5.6s +/- 0.8 | 1-9-0--default-behaviors.......................... | 4.9s, 5.6s, 5.6s
5.6s +/- 0.8 | 1-9-0--no-autoscroll.............................. | 4.8s, 5.6s, 5.6s
https://espanol.epa.gov/cai/manual-informativo-sobre-el-radon
4.6s +/- 0.1 | 1-8-1--default-behaviors.......................... | 4.7s, 4.6s, 4.6s
4.7s +/- 0.0 | 1-8-1--no-autoscroll.............................. | 4.7s, 4.7s, 4.7s
34.6s +/- 0.3 | 1-9-0--default-behaviors.......................... | 34.8s, 34.5s, 34.6s
33.7s +/- 0.3 | 1-9-0--no-autoscroll.............................. | 34.0s, 33.7s, 33.7s
https://espanol.epa.gov/watersense/en-sequia
3.6s +/- 0.7 | 1-8-1--default-behaviors.......................... | 3.6s, 3.6s, 4.2s
3.6s +/- 0.8 | 1-8-1--no-autoscroll.............................. | 4.4s, 3.6s, 3.6s
35.4s +/- 0.7 | 1-9-0--default-behaviors.......................... | 36.1s, 35.4s, 35.4s
32.9s +/- 0.5 | 1-9-0--no-autoscroll.............................. | 33.3s, 32.8s, 32.9s
Doesn't make a whole lot of sense to me. Is this an issue with multiple hostnames? hero.epa.gov poisoning things somehow? Odd.
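One way to dig into the multiple-hostname question is to group the medians by hostname and version, so that single-host and mixed-host runs can be compared side by side. The sketch below parses report text in the format shown above. It is based only on that printed format, not on how epatest-logs.py works internally, and the names ROW and medians_by_host are made up for this example.

```python
import re
from collections import defaultdict
from urllib.parse import urlsplit

# Matches rows like:
#   35.4s +/- 0.7 | 1-9-0--default-behaviors..... | 36.1s, 35.4s, 35.4s
# Assumes collection labels look like "<version-with-dashes>--<scenario>".
ROW = re.compile(
    r"^([\d.]+)s\s+\+/-\s+[\d.]+\s*\|\s*(\d+-\d+-\d+)--([\w-]+?)\.*\s*\|"
)

SAMPLE = """\
https://hero.epa.gov/
5.1s +/- 1.2 | 1-8-1--default-behaviors.......................... | 5.1s, 5.2s, 4.0s
5.6s +/- 0.8 | 1-9-0--default-behaviors.......................... | 4.9s, 5.6s, 5.6s
https://espanol.epa.gov/watersense/en-sequia
3.6s +/- 0.7 | 1-8-1--default-behaviors.......................... | 3.6s, 3.6s, 4.2s
35.4s +/- 0.7 | 1-9-0--default-behaviors.......................... | 36.1s, 35.4s, 35.4s
"""


def medians_by_host(report: str):
    """Return {hostname: {version: [median seconds, ...]}} from report text."""
    result = defaultdict(lambda: defaultdict(list))
    host = None
    for raw_line in report.splitlines():
        line = raw_line.strip()
        if line.startswith("http"):
            # A URL line starts a new group of rows.
            host = urlsplit(line).hostname
            continue
        match = ROW.match(line)
        if match and host:
            median, version, _scenario = match.groups()
            result[host][version.replace("-", ".")].append(float(median))
    return result


if __name__ == "__main__":
    for host, versions in medians_by_host(SAMPLE).items():
        for version, medians in sorted(versions.items()):
            print(host, version, medians)
```

Running this over a single-host report and a mixed-host report would make it easy to see whether the v1.8.1 numbers for espanol.epa.gov only stay low when hero.epa.gov is in the same crawl.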