This document provides an overview and detailed explanation of the Verilog implementation of a 90-stage SHA256-90R FPGA pipeline, converted from a provided C code simulation of a hardware pipeline. The design implements a fully pipelined SHA-256 hash function with 90 rounds, optimized for FPGA hardware with constant-time operation to mitigate timing attacks.
The SHA256-90R pipeline is a hardware implementation of the SHA-256 cryptographic hash function, extended from 64 to 90 rounds for enhanced security or application-specific requirements. The design processes a 512-bit input block and produces a 256-bit hash output. It targets a throughput of one hash per clock cycle once the pipeline has filled. The implementation aims to be constant-time: every stage is updated on every clock, and conditional updates use all-ones/all-zeros bit masks rather than data-dependent branches, so execution does not vary with the input data.
- 90-Stage Pipeline: Stage 0 loads the initial state, and stages 1–89 each perform one round of the SHA-256 compression function.
- Constant-Time Operation: Uses masking to prevent timing-based side-channel attacks.
- Input/Output: Accepts a 512-bit input block and produces a 256-bit hash.
- Message Schedule: Pre-computes 90 message words from the input block.
- Synchronous Design: Operates in a single clock domain with an active-low asynchronous reset.
The Verilog code consists of two main modules:
- sha256_round: A combinational module that implements a single SHA-256 round with masking.
- sha256_90r_pipeline: The top-level module that manages the 90-stage pipeline, message schedule computation, and output generation.
This module performs a single SHA-256 round computation, equivalent to the fpga_round_masked function in the original C code.
Inputs:
- a, b, c, d, e, f, g, h (32-bit each): Current state variables.
- w (32-bit): Message word for the round.
- k (32-bit): Round constant.
- valid (1-bit): Selects between the round result (valid=1) and the unchanged input state (valid=0).

Outputs:
- a_out, b_out, c_out, d_out, e_out, f_out, g_out, h_out (32-bit each): Updated state variables.
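- Computes the SHA-256 round function. In the equations below:
  - All additions are modulo $2^{32}$.
  - $\mathrm{ROTR}^{n}$ is a 32-bit right rotation by n bits.
  - $\oplus$ is XOR, $\land$ is AND, and $\lnot$ is bitwise NOT.

$$t_1 = h + \Sigma_1(e) + \mathrm{Ch}(e,f,g) + k + w$$

$$\Sigma_1(e) = \mathrm{ROTR}^{6}(e) \oplus \mathrm{ROTR}^{11}(e) \oplus \mathrm{ROTR}^{25}(e), \qquad \mathrm{Ch}(e,f,g) = (e \land f) \oplus (\lnot e \land g)$$

$$t_2 = \Sigma_0(a) + \mathrm{Maj}(a,b,c)$$

$$\Sigma_0(a) = \mathrm{ROTR}^{2}(a) \oplus \mathrm{ROTR}^{13}(a) \oplus \mathrm{ROTR}^{22}(a), \qquad \mathrm{Maj}(a,b,c) = (a \land b) \oplus (a \land c) \oplus (b \land c)$$

- Updates the state variables:

$$a_{\text{out}} = t_1 + t_2,\quad b_{\text{out}} = a,\quad c_{\text{out}} = b,\quad d_{\text{out}} = c,\quad e_{\text{out}} = d + t_1,\quad f_{\text{out}} = e,\quad g_{\text{out}} = f,\quad h_{\text{out}} = g$$

- Applies a valid_mask (0xFFFFFFFF if valid=1, else 0). Each output is (new & valid_mask) | (old & ~valid_mask). As a result, the round logic is evaluated every cycle, and when valid=0 the current state passes through unchanged.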
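sha256_round writes each rotation as (x >> n) | (x << (32 - n)). A common alternative is the bit-slice concatenation {x[n-1:0], x[31:n]}, which is pure wiring. The sketch below packages the four round primitives in that style. It is an illustrative refactoring, not part of the original design; the module name sha256_round_primitives and its port names are hypothetical.

```verilog
// Hypothetical helper (not part of the original design): the SHA-256 round
// primitives, with each right rotation ROTR^n(x) written as the bit-slice
// concatenation {x[n-1:0], x[31:n]}, which needs only wiring, no logic.
module sha256_round_primitives (
    input  wire [31:0] a, b, c,     // inputs to Sigma0 and Maj
    input  wire [31:0] e, f, g,     // inputs to Sigma1 and Ch
    output wire [31:0] big_sigma0,  // Sigma0(a) = ROTR2 ^ ROTR13 ^ ROTR22
    output wire [31:0] big_sigma1,  // Sigma1(e) = ROTR6 ^ ROTR11 ^ ROTR25
    output wire [31:0] ch,          // Ch(e,f,g): bits of e select f, bits of ~e select g
    output wire [31:0] maj          // Maj(a,b,c): bitwise majority vote
);
    assign big_sigma0 = {a[1:0],  a[31:2]}  ^ {a[12:0], a[31:13]} ^ {a[21:0], a[31:22]};
    assign big_sigma1 = {e[5:0],  e[31:6]}  ^ {e[10:0], e[31:11]} ^ {e[24:0], e[31:25]};
    assign ch         = (e & f) ^ (~e & g);
    assign maj        = (a & b) ^ (a & c) ^ (b & c);
endmodule
```

The SHA-256 initial state H0–H7 is the input to round 0. Fed with it, the module should give Σ0(a) = 0xce20b47e, Σ1(e) = 0x3587272b, Ch = 0x1f85c98c and Maj = 0x3a6fe667. Those values are a convenient known-answer check for the corresponding expressions in sha256_round.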
This is the top-level module that implements the 90-stage pipeline and message schedule computation.
Inputs:
- clk: Clock signal for synchronous operation.
- rst_n: Active-low asynchronous reset.
- input_valid: Indicates that data_in holds a valid 512-bit input block.
- data_in (512-bit): Input block to be hashed. It must already be padded; the module performs no padding.

Outputs:
- output_valid: Indicates a valid hash output.
- hash_out (256-bit): The computed hash.
- Constants (k): A 96-entry array of 32-bit round constants (k[0:95]). Entries 0–89 hold the SHA-256 constants extended to 90 rounds, and entries 90–95 are zero padding.
- Pipeline Registers:
  - stage_a, stage_b, ..., stage_h (90 x 32-bit each): State variables for each pipeline stage.
  - stage_w (90 x 32-bit): Message words.
  - stage_k (90 x 32-bit): Round constants.
  - stage_valid (90 x 1-bit): Valid flags.
  - current_stage (7-bit): Counts input cycles (0–90) and selects the round constant loaded into stage 0.
  - pipeline_filled: Set after 90 input cycles; it stops current_stage from advancing.
- Message Schedule Registers:
  - m (90 x 32-bit): Stores the pre-computed message schedule.
  - msg_cycle (7-bit): Tracks the message schedule computation progress.
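- Message Schedule Computation:
  - On each clock cycle with input_valid=1, computes one message word and stores it in m[msg_cycle], until all 90 words (m[0:89]) are stored:
    - The first 16 words are extracted from data_in in big-endian order (m[0] = data_in[511:480]).
    - Words 16–89 are computed with the SHA-256 expansion formula. Additions are modulo $2^{32}$, $\mathrm{ROTR}^{n}$ is a right rotation by n bits, and $\mathrm{SHR}^{n}$ is a logical right shift by n bits:

$$m[i] = m[i-16] + \sigma_0(m[i-15]) + m[i-7] + \sigma_1(m[i-2])$$

$$\sigma_0(x) = \mathrm{ROTR}^{7}(x) \oplus \mathrm{ROTR}^{18}(x) \oplus \mathrm{SHR}^{3}(x), \qquad \sigma_1(x) = \mathrm{ROTR}^{17}(x) \oplus \mathrm{ROTR}^{19}(x) \oplus \mathrm{SHR}^{10}(x)$$

  - Takes 90 clock cycles with input_valid held high. msg_cycle is cleared only by rst_n, so the schedule is generated once per reset.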
- Pipeline Operation:
  - On each clock cycle:
    - Each stage j (1–89) registers the output of a sha256_round instance applied to the state of stage j-1. Meanwhile, stage_w, stage_k and stage_valid shift unchanged from stage j-1 to stage j.
    - When input_valid=1, stage 0 loads three values: the SHA-256 initial state (H0–H7), the message word m[current_stage], and the constant k[current_stage]. When input_valid=0, an input mask keeps its existing values.
    - current_stage and pipeline_filled are updated when input_valid=1.
- Output: output_valid is registered from stage_valid[89]. When stage_valid[89] is set, hash_out is loaded with {stage_a[89], ..., stage_h[89]}.
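The expansion formula used by the message schedule can be isolated as a combinational block, for example to unit-test it against known SHA-256 schedule values. This is a sketch rather than the module's actual structure. The module name sha256_schedule_word and its ports are hypothetical.

```verilog
// Hypothetical helper (not part of the original design): one step of the
// SHA-256 message expansion,
//   m[i] = m[i-16] + sigma0(m[i-15]) + m[i-7] + sigma1(m[i-2])   (mod 2^32)
module sha256_schedule_word (
    input  wire [31:0] w_im16,  // m[i-16]
    input  wire [31:0] w_im15,  // m[i-15]
    input  wire [31:0] w_im7,   // m[i-7]
    input  wire [31:0] w_im2,   // m[i-2]
    output wire [31:0] w_i      // m[i]
);
    // sigma0(x) = ROTR7(x) ^ ROTR18(x) ^ SHR3(x)
    wire [31:0] s0 = {w_im15[6:0], w_im15[31:7]} ^ {w_im15[17:0], w_im15[31:18]} ^ (w_im15 >> 3);
    // sigma1(x) = ROTR17(x) ^ ROTR19(x) ^ SHR10(x)
    wire [31:0] s1 = {w_im2[16:0], w_im2[31:17]} ^ {w_im2[18:0], w_im2[31:19]} ^ (w_im2 >> 10);

    // 32-bit addition wraps modulo 2^32
    assign w_i = w_im16 + s0 + w_im7 + s1;
endmodule
```

As a known-answer check, take the standard padded block for the message "abc". There, m[15] = 0x00000018 and m[16] = 0x61626380, and all other words needed here are zero. The expansion then gives m[17] = 0x000f0000 and m[18] = 0x7da86405.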
- Reset: On rst_n=0, all registers except the constant table k are cleared.
- Input Phase: While input_valid=1, the module reads the 512-bit data_in and computes the message schedule (m[0:89]) over 90 clock cycles.
- Pipeline Processing: Each cycle, the pipeline shifts data. Stage 0 loads the initial state and a message word, and stages 1–89 perform round computations.
- Output Phase: The entry loaded on the 90th input cycle needs 89 more cycles to reach stage 89. Its hash is therefore available in hash_out after about 179 cycles (90 input + 89 drain). output_valid goes high earlier, roughly 90 cycles after the first input cycle, when the first entry reaches stage 89.
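The drain time follows from each pipeline stage being one register. A valid bit sampled into stage 0 reaches the last of 90 stages 90 clock edges later. The sketch below models only the valid bits, in the way stage_valid[0:89] moves through the pipeline, as a shift register. The names valid_pipe and DEPTH are hypothetical.

```verilog
// Hypothetical model (not part of the original design): the valid flags of a
// DEPTH-stage pipeline as a shift register. v[0] is stage 0, v[DEPTH-1] is the
// last stage; DEPTH must be at least 2.
module valid_pipe #(
    parameter DEPTH = 90
) (
    input  wire clk,
    input  wire rst_n,      // active-low asynchronous reset
    input  wire in_valid,   // sampled into stage 0 on each rising edge
    output wire out_valid   // valid flag of the last stage
);
    reg [DEPTH-1:0] v;

    always @(posedge clk or negedge rst_n) begin
        if (!rst_n)
            v <= {DEPTH{1'b0}};
        else
            v <= {v[DEPTH-2:0], in_valid};   // every flag moves one stage per clock
    end

    assign out_valid = v[DEPTH-1];
endmodule
```

In sha256_90r_pipeline, output_valid is one more register after stage 89, so it rises one edge after the model's out_valid would.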
- The pipeline uses a valid_mask in the sha256_round module so that the round logic is evaluated every cycle regardless of input validity. This avoids data-dependent timing.
- All pipeline stages are updated every clock cycle, even when input_valid=0, which keeps the timing constant.
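The masked update used for stage 0 can be examined in isolation. The sketch below is a register that loads d when load=1 and otherwise keeps its value, with no if/else on the data path. It is illustrative only: masked_reg, its WIDTH parameter and its ports are hypothetical names, and load plays the role of input_valid.

```verilog
// Hypothetical module (not part of the original design): a register updated
// with the all-ones/all-zeros mask idiom instead of an if/else on the data.
module masked_reg #(
    parameter WIDTH = 32
) (
    input  wire             clk,
    input  wire             rst_n,  // active-low asynchronous reset
    input  wire             load,   // plays the role of input_valid
    input  wire [WIDTH-1:0] d,
    output reg  [WIDTH-1:0] q
);
    // Replicate load into a WIDTH-bit mask: all ones when load=1, all zeros otherwise
    wire [WIDTH-1:0] mask = {WIDTH{load}};

    always @(posedge clk or negedge rst_n) begin
        if (!rst_n)
            q <= {WIDTH{1'b0}};
        else
            q <= (d & mask) | (q & ~mask);  // new value where mask is set, old value elsewhere
    end
endmodule
```

Synthesis tools may map this expression and an ordinary enable (if (load) q <= d;) to the same multiplexer or clock-enable logic. The mask form documents the intent but does not by itself guarantee a particular netlist.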
Based on the original C code’s estimation:
- LUTs: ~500 per stage x 90 stages = ~45,000 LUTs.
- Flip-Flops: ~256 per stage x 90 stages = ~23,040 FFs.
- BRAM: ~4 blocks for constants and message storage.
- DSP Slices: 0 (pure logic implementation).
- Max Frequency: Estimated at 300 MHz, though actual performance depends on the FPGA and synthesis tool.
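As a rough cross-check against the listing itself rather than the C-code estimate, the bits declared as reg in sha256_90r_pipeline can be counted directly. The count below assumes each declared bit becomes one flip-flop and treats the constant table k as ROM. Synthesis may trim, merge or duplicate registers, so the total is only indicative:

$$
\begin{aligned}
\text{stage\_a} \ldots \text{stage\_h}: &\quad 90 \times 8 \times 32 = 23{,}040 \\
\text{stage\_w},\ \text{stage\_k},\ \text{stage\_valid}: &\quad 90 \times (32 + 32 + 1) = 5{,}850 \\
m: &\quad 90 \times 32 = 2{,}880 \\
\text{current\_stage},\ \text{msg\_cycle},\ \text{pipeline\_filled},\ \text{output\_valid},\ \text{hash\_out}: &\quad 7 + 7 + 1 + 1 + 256 = 272 \\
\text{total}: &\quad 23{,}040 + 5{,}850 + 2{,}880 + 272 = 32{,}042
\end{aligned}
$$

The first line matches the ~23,040 FF figure, which counts only the working state a–h of each stage.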
- Batch Processing: The original C code included batch processing for multiple pipelines. It is omitted here for simplicity but can be added as multiple instances of sha256_90r_pipeline.
- Simulation: The design requires 179 cycles to produce a hash, which may be slow on online simulators such as JDoodle because of the large register arrays.
A testbench (sha256_90r_pipeline_tb.v) is provided to simulate the design:
- Generates a 100 MHz clock.
- Applies a reset, then inputs a padded zero block (0x8000...0040).
- Displays the hash after 179 cycles.
- Can be combined with the main module for simulation on JDoodle or other tools.
To use the module:
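- Instantiate sha256_90r_pipeline in your design.
- Provide a 512-bit input block and hold input_valid high for 90 cycles while the message schedule is generated.
- Read hash_out about 179 cycles after the first input cycle (90 input + 89 drain). output_valid is already high by then, so it does not by itself mark the final result.
- msg_cycle is cleared only by rst_n, so as written the module accepts one block per reset. Reaching the steady-state throughput of one hash per clock would require a message schedule computed per block, rather than once in the shared m array.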
- Optimization: Further pipeline the round function to reduce critical path and increase clock frequency.
- Batch Processing: Implement multiple pipelines for parallel processing, as in the original C code's fpga_batch_pipeline_t.
- Testbench Enhancements: Add more test cases with known SHA-256 outputs for verification.
- Synthesis Testing: Validate on a real FPGA (e.g., Xilinx Vivado) to confirm resource usage and timing.
This implementation provides a constant-time SHA256-90R pipeline intended for FPGA deployment, with clear mappings from the original C code to hardware constructs.
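```verilog
// One SHA-256 round (combinational), equivalent to fpga_round_masked in the C code.
// When valid = 0 the outputs equal the inputs, but the round logic is still evaluated.
module sha256_round (
    input  wire [31:0] a, b, c, d, e, f, g, h,  // current working state
    input  wire [31:0] w, k,                    // message word and round constant
    input  wire        valid,                   // 1: apply the round, 0: pass the state through
    output wire [31:0] a_out, b_out, c_out, d_out, e_out, f_out, g_out, h_out
);
    // All ones when valid = 1, all zeros otherwise
    wire [31:0] valid_mask = valid ? 32'hFFFFFFFF : 32'h0;

    // t1 = h + Sigma1(e) + Ch(e,f,g) + k + w (mod 2^32); (x >> n) | (x << (32-n)) rotates right by n
    wire [31:0] t1 = h + (((e >> 6) | (e << 26)) ^ ((e >> 11) | (e << 21)) ^ ((e >> 25) | (e << 7))) +
                     ((e & f) ^ (~e & g)) + k + w;
    // t2 = Sigma0(a) + Maj(a,b,c) (mod 2^32)
    wire [31:0] t2 = (((a >> 2) | (a << 30)) ^ ((a >> 13) | (a << 19)) ^ ((a >> 22) | (a << 10))) +
                     ((a & b) ^ (a & c) ^ (b & c));

    // Masked select: new value where valid_mask is set, current value otherwise
    assign a_out = ((t1 + t2) & valid_mask) | (a & ~valid_mask);
    assign b_out = (a & valid_mask) | (b & ~valid_mask);
    assign c_out = (b & valid_mask) | (c & ~valid_mask);
    assign d_out = (c & valid_mask) | (d & ~valid_mask);
    assign e_out = ((d + t1) & valid_mask) | (e & ~valid_mask);
    assign f_out = (e & valid_mask) | (f & ~valid_mask);
    assign g_out = (f & valid_mask) | (g & ~valid_mask);
    assign h_out = (g & valid_mask) | (h & ~valid_mask);
endmodule
```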
```verilog
module sha256_90r_pipeline (
    input  wire         clk,          // Clock input
    input  wire         rst_n,        // Active-low asynchronous reset
    input  wire         input_valid,  // Input valid signal
    input  wire [511:0] data_in,      // 512-bit input block (already padded)
    output reg          output_valid, // Output valid signal
    output reg  [255:0] hash_out      // 256-bit hash output
);
    // Round constants (k_90r_fpga). Entries 0-63 are the standard SHA-256 constants;
    // k[64] repeats the value of k[63]. Entries 90-95 are zero padding
    // (k[90] is read once current_stage saturates at 90).
    reg [31:0] k [0:95];
    initial begin
        k[0] = 32'h428a2f98; k[1] = 32'h71374491; k[2] = 32'hb5c0fbcf; k[3] = 32'he9b5dba5;
        k[4] = 32'h3956c25b; k[5] = 32'h59f111f1; k[6] = 32'h923f82a4; k[7] = 32'hab1c5ed5;
        k[8] = 32'hd807aa98; k[9] = 32'h12835b01; k[10] = 32'h243185be; k[11] = 32'h550c7dc3;
        k[12] = 32'h72be5d74; k[13] = 32'h80deb1fe; k[14] = 32'h9bdc06a7; k[15] = 32'hc19bf174;
        k[16] = 32'he49b69c1; k[17] = 32'hefbe4786; k[18] = 32'h0fc19dc6; k[19] = 32'h240ca1cc;
        k[20] = 32'h2de92c6f; k[21] = 32'h4a7484aa; k[22] = 32'h5cb0a9dc; k[23] = 32'h76f988da;
        k[24] = 32'h983e5152; k[25] = 32'ha831c66d; k[26] = 32'hb00327c8; k[27] = 32'hbf597fc7;
        k[28] = 32'hc6e00bf3; k[29] = 32'hd5a79147; k[30] = 32'h06ca6351; k[31] = 32'h14292967;
        k[32] = 32'h27b70a85; k[33] = 32'h2e1b2138; k[34] = 32'h4d2c6dfc; k[35] = 32'h53380d13;
        k[36] = 32'h650a7354; k[37] = 32'h766a0abb; k[38] = 32'h81c2c92e; k[39] = 32'h92722c85;
        k[40] = 32'ha2bfe8a1; k[41] = 32'ha81a664b; k[42] = 32'hc24b8b70; k[43] = 32'hc76c51a3;
        k[44] = 32'hd192e819; k[45] = 32'hd6990624; k[46] = 32'hf40e3585; k[47] = 32'h106aa070;
        k[48] = 32'h19a4c116; k[49] = 32'h1e376c08; k[50] = 32'h2748774c; k[51] = 32'h34b0bcb5;
        k[52] = 32'h391c0cb3; k[53] = 32'h4ed8aa4a; k[54] = 32'h5b9cca4f; k[55] = 32'h682e6ff3;
        k[56] = 32'h748f82ee; k[57] = 32'h78a5636f; k[58] = 32'h84c87814; k[59] = 32'h8cc70208;
        k[60] = 32'h90befffa; k[61] = 32'ha4506ceb; k[62] = 32'hbef9a3f7; k[63] = 32'hc67178f2;
        k[64] = 32'hc67178f2; k[65] = 32'hca273ece; k[66] = 32'hd186b8c7; k[67] = 32'heada7dd6;
        k[68] = 32'hf57d4f7f; k[69] = 32'h06f067aa; k[70] = 32'h0a637dc5; k[71] = 32'h113f9804;
        k[72] = 32'h1b710b35; k[73] = 32'h28db77f5; k[74] = 32'h32caab7b; k[75] = 32'h3c9ebe0a;
        k[76] = 32'h431d67c4; k[77] = 32'h4cc5d4be; k[78] = 32'h597f299c; k[79] = 32'h5fcb6fab;
        k[80] = 32'h6c44198c; k[81] = 32'h7ba0ea2d; k[82] = 32'h7eabf2d0; k[83] = 32'h8dbe8d03;
        k[84] = 32'h90bb1721; k[85] = 32'h99a2ad45; k[86] = 32'h9f86e289; k[87] = 32'ha84c4472;
        k[88] = 32'hb3df34fc; k[89] = 32'hb99bb8d7; k[90] = 32'h0; k[91] = 32'h0;
        k[92] = 32'h0; k[93] = 32'h0; k[94] = 32'h0; k[95] = 32'h0;
    end

    // Pipeline stage registers
    reg [31:0] stage_a [0:89];
    reg [31:0] stage_b [0:89];
    reg [31:0] stage_c [0:89];
    reg [31:0] stage_d [0:89];
    reg [31:0] stage_e [0:89];
    reg [31:0] stage_f [0:89];
    reg [31:0] stage_g [0:89];
    reg [31:0] stage_h [0:89];
    reg [31:0] stage_w [0:89];       // Message word carried with each entry
    reg [31:0] stage_k [0:89];       // Round constant carried with each entry
    reg        stage_valid [0:89];
    reg [6:0]  current_stage;        // Counts input cycles (0-90)
    reg        pipeline_filled;      // Set after 90 input cycles

    // Message schedule registers
    reg [31:0] m [0:89];
    reg [6:0]  msg_cycle;            // Index of the next schedule word (0-90)

    // SHA-256 initial state
    localparam [31:0] H0 = 32'h6a09e667;
    localparam [31:0] H1 = 32'hbb67ae85;
    localparam [31:0] H2 = 32'h3c6ef372;
    localparam [31:0] H3 = 32'ha54ff53a;
    localparam [31:0] H4 = 32'h510e527f;
    localparam [31:0] H5 = 32'h9b05688c;
    localparam [31:0] H6 = 32'h1f83d9ab;
    localparam [31:0] H7 = 32'h5be0cd19;

    integer mi;  // Loop index for the message schedule block
    integer s;   // Loop index for the pipeline block

    // Schedule word m[msg_cycle], computed combinationally so that stage 0 can load
    // it in the same cycle it is written to m. msg_cycle and current_stage advance
    // together, so reading m[current_stage] here would return the value from
    // before this clock edge instead of the new word.
    wire [31:0] m_im15 = m[msg_cycle - 15];
    wire [31:0] m_im2  = m[msg_cycle - 2];
    wire [31:0] sched_word =
        (msg_cycle < 16) ? data_in[511 - msg_cycle*32 -: 32]    // words 0-15, big-endian
                         : m[msg_cycle - 16]
                           + (((m_im15 >> 7) | (m_im15 << 25)) ^ ((m_im15 >> 18) | (m_im15 << 14)) ^ (m_im15 >> 3))  // sigma0
                           + m[msg_cycle - 7]
                           + (((m_im2 >> 17) | (m_im2 << 15)) ^ ((m_im2 >> 19) | (m_im2 << 13)) ^ (m_im2 >> 10));    // sigma1

    // Message schedule computation: one word per input cycle
    always @(posedge clk or negedge rst_n) begin
        if (!rst_n) begin
            msg_cycle <= 7'd0;
            for (mi = 0; mi < 90; mi = mi + 1)
                m[mi] <= 32'h0;
        end else if (input_valid && msg_cycle < 90) begin
            m[msg_cycle] <= sched_word;
            msg_cycle    <= msg_cycle + 7'd1;
        end
    end

    // Combinational next-state values for every stage
    wire [31:0] a_next [0:89];
    wire [31:0] b_next [0:89];
    wire [31:0] c_next [0:89];
    wire [31:0] d_next [0:89];
    wire [31:0] e_next [0:89];
    wire [31:0] f_next [0:89];
    wire [31:0] g_next [0:89];
    wire [31:0] h_next [0:89];

    // Stages 1-89: one round instance per stage, fed by the previous stage
    genvar j;
    generate
        for (j = 1; j < 90; j = j + 1) begin : round_gen
            sha256_round round (
                .a(stage_a[j-1]), .b(stage_b[j-1]), .c(stage_c[j-1]), .d(stage_d[j-1]),
                .e(stage_e[j-1]), .f(stage_f[j-1]), .g(stage_g[j-1]), .h(stage_h[j-1]),
                .w(stage_w[j-1]), .k(stage_k[j-1]), .valid(stage_valid[j-1]),
                .a_out(a_next[j]), .b_out(b_next[j]), .c_out(c_next[j]), .d_out(d_next[j]),
                .e_out(e_next[j]), .f_out(f_next[j]), .g_out(g_next[j]), .h_out(h_next[j])
            );
        end
    endgenerate

    // Stage 0: load H0-H7 when input_valid = 1, otherwise keep the current value
    wire [31:0] input_mask = input_valid ? 32'hFFFFFFFF : 32'h0;
    assign a_next[0] = (H0 & input_mask) | (stage_a[0] & ~input_mask);
    assign b_next[0] = (H1 & input_mask) | (stage_b[0] & ~input_mask);
    assign c_next[0] = (H2 & input_mask) | (stage_c[0] & ~input_mask);
    assign d_next[0] = (H3 & input_mask) | (stage_d[0] & ~input_mask);
    assign e_next[0] = (H4 & input_mask) | (stage_e[0] & ~input_mask);
    assign f_next[0] = (H5 & input_mask) | (stage_f[0] & ~input_mask);
    assign g_next[0] = (H6 & input_mask) | (stage_g[0] & ~input_mask);
    assign h_next[0] = (H7 & input_mask) | (stage_h[0] & ~input_mask);

    always @(posedge clk or negedge rst_n) begin
        if (!rst_n) begin
            for (s = 0; s < 90; s = s + 1) begin
                stage_a[s]     <= 32'h0;
                stage_b[s]     <= 32'h0;
                stage_c[s]     <= 32'h0;
                stage_d[s]     <= 32'h0;
                stage_e[s]     <= 32'h0;
                stage_f[s]     <= 32'h0;
                stage_g[s]     <= 32'h0;
                stage_h[s]     <= 32'h0;
                stage_w[s]     <= 32'h0;
                stage_k[s]     <= 32'h0;
                stage_valid[s] <= 1'b0;
            end
            current_stage   <= 7'd0;
            pipeline_filled <= 1'b0;
            output_valid    <= 1'b0;
            hash_out        <= 256'h0;
        end else begin
            // Stage 0: initial state plus this cycle's message word and round constant
            stage_a[0]     <= a_next[0];
            stage_b[0]     <= b_next[0];
            stage_c[0]     <= c_next[0];
            stage_d[0]     <= d_next[0];
            stage_e[0]     <= e_next[0];
            stage_f[0]     <= f_next[0];
            stage_g[0]     <= g_next[0];
            stage_h[0]     <= h_next[0];
            stage_w[0]     <= (sched_word & input_mask)       | (stage_w[0] & ~input_mask);
            stage_k[0]     <= (k[current_stage] & input_mask) | (stage_k[0] & ~input_mask);
            stage_valid[0] <= input_valid ? 1'b1 : stage_valid[0];

            // Stages 1-89: round results; w, k and valid shift along unchanged
            for (s = 1; s < 90; s = s + 1) begin
                stage_a[s]     <= a_next[s];
                stage_b[s]     <= b_next[s];
                stage_c[s]     <= c_next[s];
                stage_d[s]     <= d_next[s];
                stage_e[s]     <= e_next[s];
                stage_f[s]     <= f_next[s];
                stage_g[s]     <= g_next[s];
                stage_h[s]     <= h_next[s];
                stage_w[s]     <= stage_w[s-1];
                stage_k[s]     <= stage_k[s-1];
                stage_valid[s] <= stage_valid[s-1];
            end

            // Input-cycle counter
            if (input_valid && !pipeline_filled) begin
                current_stage <= current_stage + 7'd1;
                if (current_stage >= 89)
                    pipeline_filled <= 1'b1;
            end

            // Output register
            output_valid <= stage_valid[89];
            if (stage_valid[89])
                hash_out <= {stage_a[89], stage_b[89], stage_c[89], stage_d[89],
                             stage_e[89], stage_f[89], stage_g[89], stage_h[89]};
        end
    end
endmodule
```