can you write compute_least_squares ?
I'll write the function for you. Based on the context, this appears to be for refining extrema positions in a computer vision/image processing algorithm (likely SIFT or similar), where you need to solve the linear system Hx = -g to find the sub-pixel offset.
fn compute_least_squares(hessian: mat3x3f, gradient: vec3f) -> vec3f {
// Solve Hx = -g for x, where H is the Hessian and g is the gradient
// This gives us the sub-pixel offset for extrema refinement
// Calculate determinant of the Hessian matrix
let det = determinant(hessian);
// Check if matrix is invertible (determinant close to zero means singular matrix)
if (abs(det) < 1e-10) {
return vec3f(0.0, 0.0, 0.0); // Return zero offset if matrix is singular
}
// Calculate the inverse of the 3x3 Hessian matrix manually
let h00 = hessian[0][0]; let h01 = hessian[0][1]; let h02 = hessian[0][2];
let h10 = hessian[1][0]; let h11 = hessian[1][1]; let h12 = hessian[1][2];
let h20 = hessian[2][0]; let h21 = hessian[2][1]; let h22 = hessian[2][2];
// Cofactor matrix elements
let c00 = h11 * h22 - h12 * h21;
let c01 = h02 * h21 - h01 * h22;
let c02 = h01 * h12 - h02 * h11;
let c10 = h12 * h20 - h10 * h22;
let c11 = h00 * h22 - h02 * h20;
let c12 = h02 * h10 - h00 * h12;
let c20 = h10 * h21 - h11 * h20;
let c21 = h01 * h20 - h00 * h21;
let c22 = h00 * h11 - h01 * h10;
// Inverse matrix = (1/det) * cofactor_transpose
let inv_det = 1.0 / det;
let inv_hessian = mat3x3f(
vec3f(c00, c10, c20) * inv_det,
vec3f(c01, c11, c21) * inv_det,
vec3f(c02, c12, c22) * inv_det
);
// Solve for x = -H^(-1) * g
return -(inv_hessian * gradient);
}
This function:
- Calculates the determinant of the Hessian matrix to check if it's invertible
- Handles singular matrices by returning a zero offset when the determinant is too small
- Manually computes the matrix inverse using the cofactor method (since WGSL doesn't have a built-in matrix inverse)
- Solves the linear system Hx = -g by computing x = -H^(-1) * g
The result is a 3D vector representing the sub-pixel offset that should be applied to refine the extrema position. This is a standard technique in keypoint detection algorithms like SIFT for achieving sub-pixel accuracy in extrema localization.
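For context, here's a minimal sketch of how the function might be plugged into the refinement step. Everything named here (sample_dog, dog_tex, the binding layout) is an assumption for illustration rather than something from your pipeline; the finite-difference stencils are the usual central differences over x, y, and scale:
// Hypothetical DoG pyramid storage; swap in however your shader actually reads it
@group(0) @binding(0) var dog_tex: texture_3d<f32>;
fn sample_dog(x: i32, y: i32, s: i32) -> f32 {
    return textureLoad(dog_tex, vec3i(x, y, s), 0).r;
}
fn refine_extremum(x: i32, y: i32, s: i32) -> vec3f {
    // Gradient via central differences over x, y and scale
    let dx = 0.5 * (sample_dog(x + 1, y, s) - sample_dog(x - 1, y, s));
    let dy = 0.5 * (sample_dog(x, y + 1, s) - sample_dog(x, y - 1, s));
    let ds = 0.5 * (sample_dog(x, y, s + 1) - sample_dog(x, y, s - 1));
    let gradient = vec3f(dx, dy, ds);
    // Symmetric Hessian via second-order central differences
    let v = sample_dog(x, y, s);
    let dxx = sample_dog(x + 1, y, s) - 2.0 * v + sample_dog(x - 1, y, s);
    let dyy = sample_dog(x, y + 1, s) - 2.0 * v + sample_dog(x, y - 1, s);
    let dss = sample_dog(x, y, s + 1) - 2.0 * v + sample_dog(x, y, s - 1);
    let dxy = 0.25 * (sample_dog(x + 1, y + 1, s) - sample_dog(x - 1, y + 1, s)
                    - sample_dog(x + 1, y - 1, s) + sample_dog(x - 1, y - 1, s));
    let dxs = 0.25 * (sample_dog(x + 1, y, s + 1) - sample_dog(x - 1, y, s + 1)
                    - sample_dog(x + 1, y, s - 1) + sample_dog(x - 1, y, s - 1));
    let dys = 0.25 * (sample_dog(x, y + 1, s + 1) - sample_dog(x, y - 1, s + 1)
                    - sample_dog(x, y + 1, s - 1) + sample_dog(x, y - 1, s - 1));
    let hessian = mat3x3f(
        vec3f(dxx, dxy, dxs),
        vec3f(dxy, dyy, dys),
        vec3f(dxs, dys, dss)
    );
    // Sub-pixel offset relative to the integer sample position
    return compute_least_squares(hessian, gradient);
}
The returned components are in units of pixels (x, y) and scale levels, relative to the integer sample position.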
do you think using Cholesky decomposition would be relevant?
Yes, Cholesky decomposition could be relevant here, but there are some important considerations:
Advantages:
- Numerically stable and more efficient than general matrix inversion
- Better conditioned for solving linear systems
- Guaranteed to work when the matrix is positive definite
Requirements: The Hessian matrix must be positive definite (all eigenvalues > 0). In your context:
- At a maximum, the Hessian should be negative definite
- At a minimum, the Hessian should be positive definite
- At a saddle point, the Hessian will have mixed eigenvalues
Since you're detecting extrema (both maxima and minima), you'll encounter both positive and negative definite matrices. For maxima, you'd need to factor -H instead of H.
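As a concrete illustration (example values mine): a saddle-point Hessian such as H = diag(2, -1, 3) passes the first pivot (h00 = 2 > 0) but fails the second, since l11^2 = h11 - l10^2 = -1 - 0 < 0, so a Cholesky routine has to bail out at exactly the point where the code below returns a zero offset.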
fn compute_least_squares_cholesky(hessian: mat3x3f, gradient: vec3f) -> vec3f {
// For maxima, we need to negate the Hessian to make it positive definite
// We can detect this by checking the trace (sum of diagonal elements)
let trace = hessian[0][0] + hessian[1][1] + hessian[2][2];
// select() only works on scalars and vectors in WGSL, so flip the sign via a scalar factor
let s = select(1.0, -1.0, trace < 0.0);
let H = hessian * s;
let g = gradient * s;
// Cholesky decomposition: H = L * L^T
let h00 = H[0][0]; let h01 = H[0][1]; let h02 = H[0][2];
let h11 = H[1][1]; let h12 = H[1][2]; let h22 = H[2][2];
// Check if positive definite
if (h00 <= 0.0) { return vec3f(0.0); }
let l00 = sqrt(h00);
let l10 = h01 / l00;
let l11_sq = h11 - l10 * l10;
if (l11_sq <= 0.0) { return vec3f(0.0); }
let l11 = sqrt(l11_sq);
let l20 = h02 / l00;
let l21 = (h12 - l10 * l20) / l11;
let l22_sq = h22 - l20 * l20 - l21 * l21;
if (l22_sq <= 0.0) { return vec3f(0.0); }
let l22 = sqrt(l22_sq);
// Forward substitution: L * y = -g
let y0 = -g.x / l00;
let y1 = (-g.y - l10 * y0) / l11;
let y2 = (-g.z - l20 * y0 - l21 * y1) / l22;
// Back substitution: L^T * x = y
let x2 = y2 / l22;
let x1 = (y1 - l21 * x2) / l11;
let x0 = (y0 - l10 * x1 - l20 * x2) / l00;
return vec3f(x0, x1, x2);
}
For your use case, I'd suggest sticking with the general matrix inversion approach I provided earlier because:
- Simpler logic - handles all cases uniformly
- Robust handling of saddle points and near-singular matrices
- Less branching in GPU code
- The 3x3 case is small enough that the performance difference is minimal
Cholesky would be more beneficial for larger matrices or if you knew a priori that all your extrema were of the same type (all minima or all maxima).
ok, let's forget cholesky, can you use the cross product when computing the co-factors?
Yes! Using cross products is a much more elegant way to compute the cofactors for a 3x3 matrix inverse. Here's the updated function:
fn compute_least_squares(hessian: mat3x3f, gradient: vec3f) -> vec3f {
// Solve Hx = -g for x, where H is the Hessian and g is the gradient
// Get the column vectors of the Hessian matrix
let col0 = hessian[0]; // First column
let col1 = hessian[1]; // Second column
let col2 = hessian[2]; // Third column
// Calculate determinant using scalar triple product: col0 · (col1 × col2)
let det = dot(col0, cross(col1, col2));
// Check if matrix is invertible
if (abs(det) < 1e-10) {
return vec3f(0.0, 0.0, 0.0);
}
// Calculate cofactor matrix using cross products
// Each row of the cofactor matrix is the cross product of two columns
let cof_row0 = cross(col1, col2); // Cross product of columns 1 and 2
let cof_row1 = cross(col2, col0); // Cross product of columns 2 and 0
let cof_row2 = cross(col0, col1); // Cross product of columns 0 and 1
// Inverse matrix = (1/det) * adjugate, whose rows are exactly these cross products
// Note: the mat3x3f constructor takes its arguments as columns, so this actually builds
// the transpose of that matrix; the Hessian is symmetric, so the result is still H^(-1)
let inv_det = 1.0 / det;
let inv_hessian = mat3x3f(cof_row0, cof_row1, cof_row2) * inv_det;
// Solve for x = -H^(-1) * g
return -(inv_hessian * gradient);
}
This approach is much cleaner because:
- Uses the geometric meaning: The cofactor of each element is naturally expressed as a cross product of the other two column vectors
- More compact: Reduces the manual cofactor calculations from 9 individual expressions to 3 cross products
- Leverages built-in functions: Uses WGSL's cross() and dot() functions
- Easier to understand: The relationship between cross products and matrix determinants/cofactors is more intuitive
The key insight is that for a 3x3 matrix, each row of the cofactor matrix is simply the cross product of the corresponding pair of columns from the original matrix.
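To spell out the identity behind this (standard 3x3 algebra, just restated in the notation of the code above): for columns col0, col1, col2,

cross(col1, col2) · col0 = det(H),  cross(col1, col2) · col1 = 0,  cross(col1, col2) · col2 = 0

and similarly for the other two cross products. That is exactly the defining property of the rows of det(H) * H^(-1), which is why dividing the three cross products by the determinant reconstructs the inverse.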
would using L-U decomposition make sense instead of computing the determinant?
Yes, LU decomposition would actually be a great approach here! It avoids computing the determinant explicitly and can be more numerically stable. Here's why it makes sense:
- No explicit determinant calculation needed
- Better numerical stability than direct matrix inversion
- Natural singularity detection during the decomposition process
- Efficient solving via forward/back substitution
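A related point worth noting (a standard LU fact, not something specific to your code): with a unit-diagonal L, det(H) equals the product of the three pivots U[0][0], U[1][1], U[2][2] after elimination, so the per-pivot checks below are effectively the determinant test split into stages, and they also tell you where the factorization breaks down.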
Here's the implementation:
fn compute_least_squares(hessian: mat3x3f, gradient: vec3f) -> vec3f {
// Solve Hx = -g using LU decomposition
// L starts at zero and U as a copy of the Hessian; the decomposition fills L and
// overwrites U in place. (WGSL indexes matrices as m[column][row]; treating m[i][j]
// as "row i, col j" below effectively factors the transpose, which is harmless
// here because the Hessian is symmetric.)
var L = mat3x3f(
vec3f(0.0, 0.0, 0.0),
vec3f(0.0, 0.0, 0.0),
vec3f(0.0, 0.0, 0.0)
);
var U = hessian; // U will be modified in-place
// LU decomposition without pivoting
// Step 1: First column of L and first row of U
if (abs(U[0][0]) < 1e-10) {
return vec3f(0.0, 0.0, 0.0); // Matrix is singular
}
L[0][0] = 1.0;
L[1][0] = U[1][0] / U[0][0];
L[2][0] = U[2][0] / U[0][0];
// Step 2: Eliminate first column below diagonal
U[1][0] = 0.0;
U[1][1] = U[1][1] - L[1][0] * U[0][1];
U[1][2] = U[1][2] - L[1][0] * U[0][2];
U[2][0] = 0.0;
U[2][1] = U[2][1] - L[2][0] * U[0][1];
U[2][2] = U[2][2] - L[2][0] * U[0][2];
// Step 3: Second column of L and second row of U
if (abs(U[1][1]) < 1e-10) {
return vec3f(0.0, 0.0, 0.0); // Matrix is singular
}
L[1][1] = 1.0;
L[2][1] = U[2][1] / U[1][1];
// Step 4: Eliminate second column below diagonal
U[2][1] = 0.0;
U[2][2] = U[2][2] - L[2][1] * U[1][2];
// Step 5: Third diagonal element
if (abs(U[2][2]) < 1e-10) {
return vec3f(0.0, 0.0, 0.0); // Matrix is singular
}
L[2][2] = 1.0;
// Now solve LUx = -g in two steps:
// 1) Forward substitution: Ly = -g
let neg_g = -gradient;
let y0 = neg_g.x / L[0][0];
let y1 = (neg_g.y - L[1][0] * y0) / L[1][1];
let y2 = (neg_g.z - L[2][0] * y0 - L[2][1] * y1) / L[2][2];
// 2) Back substitution: Ux = y
let x2 = y2 / U[2][2];
let x1 = (y1 - U[1][2] * x2) / U[1][1];
let x0 = (y0 - U[0][1] * x1 - U[0][2] * x2) / U[0][0];
return vec3f(x0, x1, x2);
}
Advantages of this approach:
- No determinant calculation - we detect singularity naturally during decomposition
- More numerically stable - especially for ill-conditioned matrices
- Direct solving - we go straight from LU to solution without computing the full inverse
- Early termination - we can bail out as soon as we detect a zero pivot
Trade-offs:
- More code but potentially more robust
- Sequential nature might be less GPU-friendly than the parallel cross product approach
- No partial pivoting in this simple version (could add if needed)
For your SIFT-like application, either approach should work well, but LU decomposition might give you better numerical behavior with borderline cases near saddle points or when the Hessian is close to singular.
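One last practical note, whichever solver you choose: the offset itself tells you whether the refinement can be trusted. Here's a minimal sketch of the usual SIFT-style acceptance test; the 0.5 threshold and the helper name are my assumptions, not from your code:
// If any component of the offset exceeds half a sample, the true extremum lies
// closer to a neighbouring sample: re-run the refinement there, or discard the
// keypoint if it still hasn't converged after a few iterations
fn offset_is_converged(offset: vec3f) -> bool {
    return all(abs(offset) < vec3f(0.5));
}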