@icedmoca
Created September 7, 2025 03:47
sha256_90r_old_table_ref

SHA-256 vs SHA256-90R – Multi-Backend Performance (AWS Intel Xeon Platinum 8375C @ 2.90 GHz)

| Algorithm / Backend | Rounds | Block / Output Size | Cycles/Byte (cpb) | Bytes/Cycle | Latency (ns) | Throughput/Core (Gbps) | Slowdown vs Standard | System Specs |
|---|---|---|---|---|---|---|---|---|
| SHA-256 (Scalar C) | 64 | 512b / 256b | ~60.0 | 0.017 | ~240 ns | ~0.46 Gbps | – (baseline scalar) | Portable C, no SIMD |
| SHA256-90R (Scalar C) | 90 | 512b / 256b | ~45.0 | 0.022 | ~300 ns | ~0.60 Gbps | ~30% slower vs 64r scalar | Same C impl, 40% more rounds |
| SHA-256 (AVX2) | 64 | 512b / 256b | ~7.1 | 0.14 | ~20 ns | ~4.2 Gbps | ~9× faster vs scalar | AVX2 vectorized, 4-way blocks |
| SHA256-90R (AVX2) | 90 | 512b / 256b | ~11.0 | 0.091 | ~24 ns | ~2.7 Gbps | ~36% slower vs 64r AVX2 | AVX2 unrolled 90 rounds |
| SHA-256 (SHA-NI) | 64 | 512b / 256b | 0.18 | 5.56 | ~18 ns | 154.8 Gbps | – (baseline fast) | Intel SHA-NI instructions |
| SHA256-90R (SHA-NI) | 90 | 512b / 256b | 0.26 | 3.85 | ~24 ns | 106.7 Gbps | ~31% slower | Same system, same SHA-NI |
| SHA-256 (FPGA Sim) | 64 | 512b / 256b | ~5.0 (est) | 0.20 | ~50 ns | ~12.3 Gbps | Baseline FPGA | 200 MHz pipelined sim |
| SHA256-90R (FPGA Sim) | 90 | 512b / 256b | ~6.5 (est) | 0.15 | ~67 ns | ~12.8 Gbps | ~4% faster (pipeline depth) | 90-stage FPGA pipeline |
| SHA-256 (GPU est.) | 64 | 512b / 256b | ~0.02 | 50.0 | <0.1 ns | 45–50 Gbps | ~100× scalar | CUDA/OpenCL batch |
| SHA256-90R (GPU est.) | 90 | 512b / 256b | ~0.03 | 33.0 | <0.1 ns | 50+ Gbps | Comparable (batch parallel) | Needs warp-level opt |
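
The derived columns follow from cycles/byte by simple arithmetic, so a small script can re-derive them. The sketch below is only an illustration, not the script that produced the table. The function names are hypothetical, and the clock is passed as a parameter because the source gives only the nominal 2.90 GHz, not the sustained clock during measurement. It uses the SHA-NI rows as input.

```python
# Hypothetical helpers for re-deriving table columns from cycles/byte (cpb).
# Assumption: throughput is computed for a single core at a fixed clock.

def bytes_per_cycle(cpb: float) -> float:
    """Bytes/Cycle is the reciprocal of Cycles/Byte."""
    return 1.0 / cpb


def throughput_gbps(cpb: float, clock_ghz: float) -> float:
    """Gcycles/s divided by cycles/byte gives GB/s; times 8 gives Gbit/s."""
    return clock_ghz * 8.0 / cpb


def throughput_change_pct(baseline_cpb: float, variant_cpb: float) -> float:
    """Relative throughput change of the variant; negative means slower."""
    return (baseline_cpb / variant_cpb - 1.0) * 100.0


# 90 rounds versus 64 rounds: the extra work per block if cost scaled
# linearly with round count (an idealization).
ROUND_RATIO = 90 / 64  # 1.40625, i.e. about 40% more rounds

if __name__ == "__main__":
    sha_ni_64, sha_ni_90 = 0.18, 0.26  # cpb values from the SHA-NI rows
    nominal_ghz = 2.9                  # nominal clock from the title line

    print(f"bytes/cycle 64r: {bytes_per_cycle(sha_ni_64):.2f}")
    print(f"bytes/cycle 90R: {bytes_per_cycle(sha_ni_90):.2f}")
    print(f"throughput change: {throughput_change_pct(sha_ni_64, sha_ni_90):.1f}%")
    print(f"Gbps at nominal clock: {throughput_gbps(sha_ni_64, nominal_ghz):.1f}")
    print(f"round ratio: {ROUND_RATIO:.3f}")
```

For the SHA-NI rows this reproduces the listed 5.56 and 3.85 bytes/cycle and the roughly 31% throughput drop. The absolute Gbps figure depends on the clock passed in, so it matches the table only for whatever effective clock the measurement actually ran at. Running the same check on other rows is a quick way to see whether a row's cpb, throughput, and "slower/faster" label agree with each other.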