@nmoinvaz
Last active February 24, 2026 04:49
Zlib-ng benchmark: crc32_armv8_pmull_eor3 — improvements/crc32-arm-copy vs develop

Benchmark: improvements/crc32-arm-copy vs develop

Date: 2026-02-23
Platform: Apple Silicon (ARM64), 8 cores, L1D 64 KiB, L2 4096 KiB
Build: CMake Release, static libs
Repetitions: 5 (median CPU time reported)

crc32/armv8_pmull_eor3 (CRC32 only)

Size (bytes)   develop (ns)   feature (ns)   Change
           1           2.15           2.15     0.0%
           8           5.67           4.31   -24.0%
          12           5.66           5.34    -5.7%
          16           5.91           5.83    -1.4%
          32           6.35           6.28    -1.1%
          64           8.70           8.69    -0.1%
         512           35.5           35.9    +1.1%
        4096            100            102    +2.0%
       32768            399            398    -0.3%
      262144           2664           2717    +2.0%
     4194304          41708          42192    +1.2%

crc32_copy/armv8_pmull_eor3 (CRC32 + memcpy)

Size (bytes)   develop (ns)   feature (ns)   Change
          32           10.3           6.48   -37.1%
         512           39.4           38.4    -2.5%
        8192            251            199   -20.7%
       32768            700            630   -10.0%
       65536           1480           1179   -20.3%
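
For reference, the Change column in both tables appears to be the relative difference between the two median CPU times, with negative values meaning the feature branch is faster. A minimal sketch of that arithmetic, where `percent_change` is a hypothetical helper name and not part of zlib-ng:

```c
/* Relative change of the feature branch against develop, in percent.
 * Negative results mean the feature branch took less time.
 * `percent_change` is an illustrative name, not a zlib-ng function. */
static double percent_change(double develop_ns, double feature_ns) {
    return (feature_ns - develop_ns) / develop_ns * 100.0;
}
```

With the 8-byte crc32 row, `percent_change(5.67, 4.31)` comes out at roughly -24.0, which matches the table.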

Summary

  • crc32 (no copy): No significant regression. Faster at 8 bytes (-24.0%) and 12 bytes (-5.7%); all other sizes differ from develop by 2% or less, which this report treats as noise.
  • crc32_copy (interleaved copy): Faster at every measured size: 20-37% at 32, 8192, and 65536 bytes, 10.0% at 32768 bytes, and 2.5% at 512 bytes. The interleaved CRC32+copy implementation avoids a separate memcpy pass.
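
To illustrate what avoiding a separate memcpy pass means, here is a minimal portable sketch. It is not the code from the commits below: it uses a bit-at-a-time CRC-32 in place of the PMULL+EOR3 folding kernel, the function names are hypothetical, and the two-pass baseline is only an assumption about what the develop branch does. The point is the loop structure: the fused version touches each source byte once, storing it and feeding it to the CRC in the same iteration.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Advance a reflected CRC-32 (polynomial 0xEDB88320) by one byte.
 * Bit-at-a-time for clarity; a real kernel would fold wide blocks
 * with carry-less multiplies instead. */
static uint32_t crc32_step(uint32_t crc, uint8_t byte) {
    crc ^= byte;
    for (int bit = 0; bit < 8; bit++)
        crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    return crc;
}

/* Assumed baseline: copy first, then walk the data a second time. */
static uint32_t crc32_then_copy(uint32_t crc, uint8_t *dst,
                                const uint8_t *src, size_t len) {
    memcpy(dst, src, len);
    crc = ~crc;
    for (size_t i = 0; i < len; i++)
        crc = crc32_step(crc, dst[i]);
    return ~crc;
}

/* Fused variant: each byte is loaded once, stored to dst, and folded
 * into the CRC in the same loop iteration. */
static uint32_t crc32_copy_fused(uint32_t crc, uint8_t *dst,
                                 const uint8_t *src, size_t len) {
    crc = ~crc;
    for (size_t i = 0; i < len; i++) {
        uint8_t byte = src[i];
        dst[i] = byte;
        crc = crc32_step(crc, byte);
    }
    return ~crc;
}
```

Both functions follow the zlib convention of starting from a CRC of 0 and return the same checksum for the same input; only the number of passes over the data differs.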

Commits on improvements/crc32-arm-copy

  • b4043c6f Implement crc32 interleaved copy for ARM PMULL+EOR3
  • babbd9f1 Add ARM CRC32 private header with shared align/tail helpers
@nmoinvaz

Raw benchmark output

develop (54352daf)

-----------------------------------------------------------------------------------
Benchmark                                         Time             CPU   Iterations
-----------------------------------------------------------------------------------
crc32/armv8_pmull_eor3/1_mean                  2.16 ns         2.15 ns            5
crc32/armv8_pmull_eor3/1_median                2.15 ns         2.15 ns            5
crc32/armv8_pmull_eor3/1_stddev               0.007 ns        0.003 ns            5
crc32/armv8_pmull_eor3/1_cv                    0.30 %          0.16 %             5
crc32/armv8_pmull_eor3/8_mean                  5.71 ns         5.64 ns            5
crc32/armv8_pmull_eor3/8_median                5.68 ns         5.67 ns            5
crc32/armv8_pmull_eor3/8_stddev               0.158 ns        0.054 ns            5
crc32/armv8_pmull_eor3/8_cv                    2.76 %          0.96 %             5
crc32/armv8_pmull_eor3/12_mean                 5.67 ns         5.65 ns            5
crc32/armv8_pmull_eor3/12_median               5.67 ns         5.66 ns            5
crc32/armv8_pmull_eor3/12_stddev              0.075 ns        0.075 ns            5
crc32/armv8_pmull_eor3/12_cv                   1.33 %          1.32 %             5
crc32/armv8_pmull_eor3/16_mean                 6.00 ns         5.91 ns            5
crc32/armv8_pmull_eor3/16_median               5.95 ns         5.91 ns            5
crc32/armv8_pmull_eor3/16_stddev              0.193 ns        0.084 ns            5
crc32/armv8_pmull_eor3/16_cv                   3.22 %          1.42 %             5
crc32/armv8_pmull_eor3/32_mean                 6.38 ns         6.32 ns            5
crc32/armv8_pmull_eor3/32_median               6.39 ns         6.35 ns            5
crc32/armv8_pmull_eor3/32_stddev              0.115 ns        0.055 ns            5
crc32/armv8_pmull_eor3/32_cv                   1.81 %          0.87 %             5
crc32/armv8_pmull_eor3/64_mean                 8.73 ns         8.70 ns            5
crc32/armv8_pmull_eor3/64_median               8.72 ns         8.70 ns            5
crc32/armv8_pmull_eor3/64_stddev              0.014 ns        0.012 ns            5
crc32/armv8_pmull_eor3/64_cv                   0.16 %          0.14 %             5
crc32/armv8_pmull_eor3/512_mean                36.0 ns         35.6 ns            5
crc32/armv8_pmull_eor3/512_median              35.6 ns         35.5 ns            5
crc32/armv8_pmull_eor3/512_stddev             0.842 ns        0.148 ns            5
crc32/armv8_pmull_eor3/512_cv                  2.34 %          0.42 %             5
crc32/armv8_pmull_eor3/4096_mean                101 ns          100 ns            5
crc32/armv8_pmull_eor3/4096_median              101 ns          100 ns            5
crc32/armv8_pmull_eor3/4096_stddev            0.092 ns        0.070 ns            5
crc32/armv8_pmull_eor3/4096_cv                 0.09 %          0.07 %             5
crc32/armv8_pmull_eor3/32768_mean               404 ns          400 ns            5
crc32/armv8_pmull_eor3/32768_median             400 ns          399 ns            5
crc32/armv8_pmull_eor3/32768_stddev            9.84 ns         2.10 ns            5
crc32/armv8_pmull_eor3/32768_cv                2.43 %          0.52 %             5
crc32/armv8_pmull_eor3/262144_mean             2672 ns         2664 ns            5
crc32/armv8_pmull_eor3/262144_median           2672 ns         2664 ns            5
crc32/armv8_pmull_eor3/262144_stddev           3.50 ns         2.24 ns            5
crc32/armv8_pmull_eor3/262144_cv               0.13 %          0.08 %             5
crc32/armv8_pmull_eor3/4194304_mean           42188 ns        41920 ns            5
crc32/armv8_pmull_eor3/4194304_median         41851 ns        41708 ns            5
crc32/armv8_pmull_eor3/4194304_stddev           727 ns          447 ns            5
crc32/armv8_pmull_eor3/4194304_cv              1.72 %          1.07 %             5
crc32_copy/armv8_pmull_eor3/32_mean            10.5 ns         10.3 ns            5
crc32_copy/armv8_pmull_eor3/32_median          10.3 ns         10.3 ns            5
crc32_copy/armv8_pmull_eor3/32_stddev         0.256 ns        0.056 ns            5
crc32_copy/armv8_pmull_eor3/32_cv              2.44 %          0.54 %             5
crc32_copy/armv8_pmull_eor3/512_mean           39.5 ns         39.4 ns            5
crc32_copy/armv8_pmull_eor3/512_median         39.5 ns         39.4 ns            5
crc32_copy/armv8_pmull_eor3/512_stddev        0.065 ns        0.061 ns            5
crc32_copy/armv8_pmull_eor3/512_cv             0.17 %          0.15 %             5
crc32_copy/armv8_pmull_eor3/8192_mean           255 ns          251 ns            5
crc32_copy/armv8_pmull_eor3/8192_median         252 ns          251 ns            5
crc32_copy/armv8_pmull_eor3/8192_stddev        7.57 ns         1.98 ns            5
crc32_copy/armv8_pmull_eor3/8192_cv            2.97 %          0.79 %             5
crc32_copy/armv8_pmull_eor3/32768_mean          705 ns          702 ns            5
crc32_copy/armv8_pmull_eor3/32768_median        702 ns          700 ns            5
crc32_copy/armv8_pmull_eor3/32768_stddev       5.81 ns         5.05 ns            5
crc32_copy/armv8_pmull_eor3/32768_cv           0.82 %          0.72 %             5
crc32_copy/armv8_pmull_eor3/65536_mean         1490 ns         1481 ns            5
crc32_copy/armv8_pmull_eor3/65536_median       1485 ns         1480 ns            5
crc32_copy/armv8_pmull_eor3/65536_stddev       20.0 ns         10.5 ns            5
crc32_copy/armv8_pmull_eor3/65536_cv           1.34 %          0.71 %             5

improvements/crc32-arm-copy (b4043c6f)

-----------------------------------------------------------------------------------
Benchmark                                         Time             CPU   Iterations
-----------------------------------------------------------------------------------
crc32/armv8_pmull_eor3/1_mean                  2.15 ns         2.15 ns            5
crc32/armv8_pmull_eor3/1_median                2.15 ns         2.15 ns            5
crc32/armv8_pmull_eor3/1_stddev               0.004 ns        0.003 ns            5
crc32/armv8_pmull_eor3/1_cv                    0.19 %          0.14 %             5
crc32/armv8_pmull_eor3/8_mean                  4.41 ns         4.33 ns            5
crc32/armv8_pmull_eor3/8_median                4.37 ns         4.31 ns            5
crc32/armv8_pmull_eor3/8_stddev               0.111 ns        0.031 ns            5
crc32/armv8_pmull_eor3/8_cv                    2.51 %          0.72 %             5
crc32/armv8_pmull_eor3/12_mean                 5.38 ns         5.37 ns            5
crc32/armv8_pmull_eor3/12_median               5.35 ns         5.34 ns            5
crc32/armv8_pmull_eor3/12_stddev              0.044 ns        0.043 ns            5
crc32/armv8_pmull_eor3/12_cv                   0.81 %          0.81 %             5
crc32/armv8_pmull_eor3/16_mean                 5.82 ns         5.80 ns            5
crc32/armv8_pmull_eor3/16_median               5.84 ns         5.83 ns            5
crc32/armv8_pmull_eor3/16_stddev              0.100 ns        0.100 ns            5
crc32/armv8_pmull_eor3/16_cv                   1.72 %          1.72 %             5
crc32/armv8_pmull_eor3/32_mean                 6.41 ns         6.29 ns            5
crc32/armv8_pmull_eor3/32_median               6.35 ns         6.28 ns            5
crc32/armv8_pmull_eor3/32_stddev              0.195 ns        0.062 ns            5
crc32/armv8_pmull_eor3/32_cv                   3.05 %          0.98 %             5
crc32/armv8_pmull_eor3/64_mean                 8.72 ns         8.69 ns            5
crc32/armv8_pmull_eor3/64_median               8.71 ns         8.69 ns            5
crc32/armv8_pmull_eor3/64_stddev              0.027 ns        0.020 ns            5
crc32/armv8_pmull_eor3/64_cv                   0.31 %          0.23 %             5
crc32/armv8_pmull_eor3/512_mean                36.5 ns         35.9 ns            5
crc32/armv8_pmull_eor3/512_median              36.0 ns         35.9 ns            5
crc32/armv8_pmull_eor3/512_stddev             0.880 ns        0.116 ns            5
crc32/armv8_pmull_eor3/512_cv                  2.41 %          0.32 %             5
crc32/armv8_pmull_eor3/4096_mean                103 ns          102 ns            5
crc32/armv8_pmull_eor3/4096_median              102 ns          102 ns            5
crc32/armv8_pmull_eor3/4096_stddev             2.39 ns        0.227 ns            5
crc32/armv8_pmull_eor3/4096_cv                 2.31 %          0.22 %             5
crc32/armv8_pmull_eor3/32768_mean               400 ns          398 ns            5
crc32/armv8_pmull_eor3/32768_median             399 ns          398 ns            5
crc32/armv8_pmull_eor3/32768_stddev            1.68 ns         1.06 ns            5
crc32/armv8_pmull_eor3/32768_cv                0.42 %          0.27 %             5
crc32/armv8_pmull_eor3/262144_mean             2771 ns         2725 ns            5
crc32/armv8_pmull_eor3/262144_median           2726 ns         2717 ns            5
crc32/armv8_pmull_eor3/262144_stddev            105 ns         19.7 ns            5
crc32/armv8_pmull_eor3/262144_cv               3.77 %          0.72 %             5
crc32/armv8_pmull_eor3/4194304_mean           42313 ns        42187 ns            5
crc32/armv8_pmull_eor3/4194304_median         42313 ns        42192 ns            5
crc32/armv8_pmull_eor3/4194304_stddev          93.6 ns         85.9 ns            5
crc32/armv8_pmull_eor3/4194304_cv              0.22 %          0.20 %             5
crc32_copy/armv8_pmull_eor3/32_mean            6.58 ns         6.49 ns            5
crc32_copy/armv8_pmull_eor3/32_median          6.58 ns         6.48 ns            5
crc32_copy/armv8_pmull_eor3/32_stddev         0.151 ns        0.063 ns            5
crc32_copy/armv8_pmull_eor3/32_cv              2.30 %          0.97 %             5
crc32_copy/armv8_pmull_eor3/512_mean           38.5 ns         38.4 ns            5
crc32_copy/armv8_pmull_eor3/512_median         38.5 ns         38.4 ns            5
crc32_copy/armv8_pmull_eor3/512_stddev        0.042 ns        0.032 ns            5
crc32_copy/armv8_pmull_eor3/512_cv             0.11 %          0.08 %             5
crc32_copy/armv8_pmull_eor3/8192_mean           200 ns          199 ns            5
crc32_copy/armv8_pmull_eor3/8192_median         200 ns          199 ns            5
crc32_copy/armv8_pmull_eor3/8192_stddev       0.306 ns        0.309 ns            5
crc32_copy/armv8_pmull_eor3/8192_cv            0.15 %          0.16 %             5
crc32_copy/armv8_pmull_eor3/32768_mean          638 ns          629 ns            5
crc32_copy/armv8_pmull_eor3/32768_median        638 ns          630 ns            5
crc32_copy/armv8_pmull_eor3/32768_stddev       13.7 ns         4.58 ns            5
crc32_copy/armv8_pmull_eor3/32768_cv           2.15 %          0.73 %             5
crc32_copy/armv8_pmull_eor3/65536_mean         1195 ns         1182 ns            5
crc32_copy/armv8_pmull_eor3/65536_median       1183 ns         1179 ns            5
crc32_copy/armv8_pmull_eor3/65536_stddev       22.4 ns         5.85 ns            5
crc32_copy/armv8_pmull_eor3/65536_cv           1.88 %          0.49 %             5
