| Algorithm / Backend | Rounds | Block / Output Size (bits) | Cycles/Byte (cpb) | Bytes/Cycle | Latency (ns) | Throughput/Core (Gbps) | Relative Speed vs Baseline | System Specs |
|---|---|---|---|---|---|---|---|---|
| SHA-256 (Scalar C) | 64 | 512b / 256b | ~60.0 | 0.017 | ~240 ns | ~0.46 Gbps | – (baseline scalar) | Portable C, no SIMD |
| SHA256-90R (Scalar C) | 90 | 512b / 256b | ~45.0 | 0.022 | ~300 ns | ~0.60 Gbps | ~30% slower vs 64r scalar | Same C impl, 40% more rounds |
| SHA-256 (AVX2) | 64 | 512b / 256b | ~7.1 | 0.14 | ~20 ns | ~4.2 Gbps | ~9× faster vs scalar | AVX2 vectorized, 4-way blocks |
| SHA256-90R (AVX2) | 90 | 512b / 256b | ~11.0 | 0.091 | ~24 ns | ~2.7 Gbps | ~36% slower vs 64r AVX2 | AVX2 unrolled 90 rounds |
| SHA-256 (SHA-NI) | 64 | 512b / 256b | 0.18 | 5.56 | ~18 ns | 154.8 Gbps | – (baseline fast) | Intel SHA-NI instructions |
| SHA256-90R (SHA-NI) | 90 | 512b / 256b | 0.26 | 3.85 | ~24 ns | 106.7 Gbps | ~31% slower | Same system, same SHA-NI |
| SHA-256 (FPGA Sim) | 64 | 512b / 256b | ~5.0 (est) | 0.20 | ~50 ns | ~12.3 Gbps | Baseline FPGA | 200 MHz pipelined sim |
| SHA256-90R (FPGA Sim) | 90 | 512b / 256b | ~6.5 (est) | 0.15 | ~67 ns | ~12.8 Gbps | ~4% faster (pipeline depth) | 90-stage FPGA pipeline |
| SHA-256 (GPU est.) | 64 | 512b / 256b | ~0.02 | 50.0 | <0.1 ns | 45–50 Gbps | ~100× scalar | CUDA/OpenCL batch |
| SHA256-90R (GPU est.) | 90 | 512b / 256b | ~0.03 | 33.0 | <0.1 ns | 50+ Gbps | Comparable (batch parallel) | Needs warp-level opt |
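
The derived columns follow from simple arithmetic, sketched below in Python. This is only an illustration of how the columns relate, not the script that produced the table. The helper names are hypothetical, and `clock_hz` is an assumption: the table does not state the clock frequency of the test system.

```python
def bytes_per_cycle(cpb: float) -> float:
    """Bytes/Cycle is the reciprocal of Cycles/Byte."""
    return 1.0 / cpb


def throughput_gbps(cpb: float, clock_hz: float) -> float:
    """Per-core throughput in Gbps.

    clock_hz is an assumed core clock (not given in the table):
    cycles/s divided by cycles/byte gives bytes/s, times 8 gives bits/s.
    """
    return clock_hz / cpb * 8 / 1e9


def relative_change(variant_gbps: float, baseline_gbps: float) -> float:
    """Fractional change vs the baseline; negative means slower."""
    return variant_gbps / baseline_gbps - 1.0


if __name__ == "__main__":
    # SHA-NI rows: 0.18 cpb (64 rounds) and 0.26 cpb (90 rounds)
    print(round(bytes_per_cycle(0.18), 2))
    print(round(bytes_per_cycle(0.26), 2))
    # Relative speed of the 90-round variant from the throughput column
    print(f"{relative_change(106.7, 154.8):+.0%}")  # SHA-NI
    print(f"{relative_change(2.7, 4.2):+.0%}")      # AVX2
    print(f"{relative_change(12.8, 12.3):+.0%}")    # FPGA sim
```

Because every row shares the same 512-bit block size, comparing the throughput columns directly gives the same ratio as comparing cpb, provided both rows were measured at the same clock.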
Created September 7, 2025 03:47
sha256_90r_old_table_ref