@nraynaud
Last active August 17, 2025 10:12

Me

can you write compute_least_squares ?

AI Assistant

I'll write the function for you. Based on the context, this appears to be for refining extrema positions in a computer vision/image processing algorithm (likely SIFT or similar), where you need to solve the linear system Hx = -g to find the sub-pixel offset.

fn compute_least_squares(hessian: mat3x3f, gradient: vec3f) -> vec3f {
    // Solve Hx = -g for x, where H is the Hessian and g is the gradient
    // This gives us the sub-pixel offset for extrema refinement
    
    // Calculate determinant of the Hessian matrix
    let det = determinant(hessian);
    
    // Check if matrix is invertible (determinant close to zero means singular matrix)
    if (abs(det) < 1e-10) {
        return vec3f(0.0, 0.0, 0.0);  // Return zero offset if matrix is singular
    }
    
    // Calculate the inverse of the 3x3 Hessian matrix manually.
    // Note: WGSL matrices are column-major, so hessian[i][j] is column i, row j;
    // the Hessian is symmetric, so the row/column labelling below is interchangeable.
    let h00 = hessian[0][0]; let h01 = hessian[0][1]; let h02 = hessian[0][2];
    let h10 = hessian[1][0]; let h11 = hessian[1][1]; let h12 = hessian[1][2];
    let h20 = hessian[2][0]; let h21 = hessian[2][1]; let h22 = hessian[2][2];
    
    // Adjugate (transposed cofactor) matrix elements
    let c00 = h11 * h22 - h12 * h21;
    let c01 = h02 * h21 - h01 * h22;
    let c02 = h01 * h12 - h02 * h11;
    let c10 = h12 * h20 - h10 * h22;
    let c11 = h00 * h22 - h02 * h20;
    let c12 = h02 * h10 - h00 * h12;
    let c20 = h10 * h21 - h11 * h20;
    let c21 = h01 * h20 - h00 * h21;
    let c22 = h00 * h11 - h01 * h10;
    
    // Inverse matrix = (1/det) * adjugate
    let inv_det = 1.0 / det;
    let inv_hessian = mat3x3f(
        vec3f(c00, c10, c20) * inv_det,
        vec3f(c01, c11, c21) * inv_det,
        vec3f(c02, c12, c22) * inv_det
    );
    
    // Solve for x = -H^(-1) * g
    return -(inv_hessian * gradient);
}

This function:

  1. Calculates the determinant of the Hessian matrix to check if it's invertible
  2. Handles singular matrices by returning a zero offset when the determinant is too small
  3. Manually computes the matrix inverse using the cofactor method (since WGSL doesn't have a built-in matrix inverse)
  4. Solves the linear system Hx = -g by computing x = -H^(-1) * g

The result is a 3D vector representing the sub-pixel offset that should be applied to refine the extrema position. This is a standard technique in keypoint detection algorithms like SIFT for achieving sub-pixel accuracy in extrema localization.
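
To show where this fits, here's a hypothetical call site (just a sketch, not taken from your code; the helper name and the 0.5 acceptance threshold follow the usual SIFT-style refinement convention):

fn refine_candidate(candidate: vec3f, hessian: mat3x3f, gradient: vec3f) -> vec4f {
    // `candidate` is the (x, y, scale) of the detected extremum; `hessian` and
    // `gradient` are assumed to be built from finite differences of the DoG
    // values around it.
    let offset = compute_least_squares(hessian, gradient);

    // Common acceptance test: an offset larger than 0.5 in any dimension means the
    // true extremum lies closer to a neighboring sample, so the candidate is
    // usually re-evaluated there or discarded.
    let accepted = all(abs(offset) <= vec3f(0.5));

    // Pack the refined position plus an acceptance flag in .w
    return vec4f(candidate + offset, select(0.0, 1.0, accepted));
}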

Me

do you think using Cholesky decomposition would be relevant?

AI Assistant

Yes, Cholesky decomposition could be relevant here, but there are some important considerations:

When Cholesky is applicable:

Advantages:

  • Numerically stable and more efficient than general matrix inversion
  • Better conditioned for solving linear systems
  • Guaranteed to work when the matrix is positive definite

Requirements: The Hessian matrix must be positive definite (all eigenvalues > 0). In your context:

  • At a maximum, the Hessian should be negative definite
  • At a minimum, the Hessian should be positive definite
  • At a saddle point, the Hessian will have mixed eigenvalues

The challenge:

Since you're detecting extrema (both maxima and minima), you'll encounter both positive and negative definite matrices. For maxima, you'd need to factor -H instead of H.

Practical implementation:

fn compute_least_squares_cholesky(hessian: mat3x3f, gradient: vec3f) -> vec3f {
    // For maxima, we need to negate the Hessian to make it positive definite.
    // A cheap heuristic: check the sign of the trace (sum of diagonal elements).
    // (WGSL's select() only accepts scalars and vectors, not matrices, so branch instead.)
    let trace = hessian[0][0] + hessian[1][1] + hessian[2][2];
    var H = hessian;
    var g = gradient;
    if (trace < 0.0) {
        H = hessian * -1.0;  // negate the Hessian, and the gradient to match
        g = -gradient;
    }
    
    // Cholesky decomposition: H = L * L^T
    let h00 = H[0][0]; let h01 = H[0][1]; let h02 = H[0][2];
    let h11 = H[1][1]; let h12 = H[1][2]; let h22 = H[2][2];
    
    // Check if positive definite
    if (h00 <= 0.0) { return vec3f(0.0); }
    
    let l00 = sqrt(h00);
    let l10 = h01 / l00;
    let l11_sq = h11 - l10 * l10;
    if (l11_sq <= 0.0) { return vec3f(0.0); }
    
    let l11 = sqrt(l11_sq);
    let l20 = h02 / l00;
    let l21 = (h12 - l10 * l20) / l11;
    let l22_sq = h22 - l20 * l20 - l21 * l21;
    if (l22_sq <= 0.0) { return vec3f(0.0); }
    
    let l22 = sqrt(l22_sq);
    
    // Forward substitution: L * y = -g
    let y0 = -g.x / l00;
    let y1 = (-g.y - l10 * y0) / l11;
    let y2 = (-g.z - l20 * y0 - l21 * y1) / l22;
    
    // Back substitution: L^T * x = y
    let x2 = y2 / l22;
    let x1 = (y1 - l21 * x2) / l11;
    let x0 = (y0 - l10 * x1 - l20 * x2) / l00;
    
    return vec3f(x0, x1, x2);
}

My recommendation:

For your use case, I'd suggest sticking with the general matrix inversion approach I provided earlier because:

  1. Simpler logic - handles all cases uniformly
  2. Robust handling of saddle points and near-singular matrices
  3. Less branching in GPU code
  4. The 3x3 case is small enough that the performance difference is minimal

Cholesky would be more beneficial for larger matrices or if you knew a priori that all your extrema were of the same type (all minima or all maxima).

Me

ok, let's forget cholesky, can you use the cross product when computing the co-factors?

AI Assistant

Yes! Using cross products is a much more elegant way to compute the cofactors for a 3x3 matrix inverse. Here's the updated function:

fn compute_least_squares(hessian: mat3x3f, gradient: vec3f) -> vec3f {
    // Solve Hx = -g for x, where H is the Hessian and g is the gradient
    
    // Get the column vectors of the Hessian matrix
    let col0 = hessian[0];  // First column
    let col1 = hessian[1];  // Second column
    let col2 = hessian[2];  // Third column
    
    // Calculate determinant using scalar triple product: col0 · (col1 × col2)
    let det = dot(col0, cross(col1, col2));
    
    // Check if matrix is invertible
    if (abs(det) < 1e-10) {
        return vec3f(0.0, 0.0, 0.0);
    }
    
    // Calculate cofactor matrix using cross products
    // Each row of the cofactor matrix is the cross product of two columns
    let cof_row0 = cross(col1, col2);  // Cross product of columns 1 and 2
    let cof_row1 = cross(col2, col0);  // Cross product of columns 2 and 0
    let cof_row2 = cross(col0, col1);  // Cross product of columns 0 and 1
    
    // Inverse matrix = (1/det) * adjugate; the cross products above are the rows of
    // the adjugate. Packing them as columns gives the transpose of the inverse, which
    // equals the inverse here because the Hessian is symmetric.
    let inv_det = 1.0 / det;
    let inv_hessian = mat3x3f(cof_row0, cof_row1, cof_row2) * inv_det;
    
    // Solve for x = -H^(-1) * g
    return -(inv_hessian * gradient);
}

This approach is much cleaner because:

  1. Uses the geometric meaning: The cofactors fall out naturally as the components of cross products of pairs of column vectors
  2. More compact: Reduces the manual cofactor calculations from 9 individual expressions to 3 cross products
  3. Leverages built-in functions: Uses WGSL's cross() and dot() functions
  4. Easier to understand: The relationship between cross products and matrix determinants/cofactors is more intuitive

The key insight is that for a 3x3 matrix, each row of the adjugate (the transposed cofactor matrix) is simply the cross product of a pair of columns of the original matrix; because the Hessian is symmetric, this coincides with the cofactor matrix.
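
To spell that identity out: if col0, col1, col2 are the columns of H, then

    (col1 × col2) · col0 = det(H)
    (col1 × col2) · col1 = 0
    (col1 × col2) · col2 = 0

and similarly for the other two cross products. So the matrix whose rows are (col1 × col2), (col2 × col0), (col0 × col1), multiplied by H, gives det(H) * I, and dividing those rows by det(H) yields H^(-1) (strictly its transpose when the rows are packed as columns, but the two coincide for a symmetric Hessian).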

Me

would using L-U decomposition make sense instead of computing the determinant?

AI Assistant

Yes, LU decomposition would actually be a great approach here! It avoids computing the determinant explicitly and can be more numerically stable. Here's why it makes sense:

Advantages of LU decomposition:

  • No explicit determinant calculation needed
  • Better numerical stability than direct matrix inversion
  • Natural singularity detection during the decomposition process
  • Efficient solving via forward/back substitution

Here's the implementation:

fn compute_least_squares(hessian: mat3x3f, gradient: vec3f) -> vec3f {
    // Solve Hx = -g using LU decomposition
    
    // Copy hessian into working matrices (the factorization is built up in place).
    // Indices below are read as [row][col]; since WGSL matrices are column-major this
    // actually factors the transpose, which is the same matrix for a symmetric Hessian.
    var L = mat3x3f(
        vec3f(0.0, 0.0, 0.0),
        vec3f(0.0, 0.0, 0.0), 
        vec3f(0.0, 0.0, 0.0)
    );
    var U = hessian;  // U will be modified in-place
    
    // LU decomposition without pivoting
    // Step 1: First column of L and first row of U
    if (abs(U[0][0]) < 1e-10) {
        return vec3f(0.0, 0.0, 0.0);  // Matrix is singular
    }
    
    L[0][0] = 1.0;
    L[1][0] = U[1][0] / U[0][0];
    L[2][0] = U[2][0] / U[0][0];
    
    // Step 2: Eliminate first column below diagonal
    U[1][0] = 0.0;
    U[1][1] = U[1][1] - L[1][0] * U[0][1];
    U[1][2] = U[1][2] - L[1][0] * U[0][2];
    
    U[2][0] = 0.0;
    U[2][1] = U[2][1] - L[2][0] * U[0][1];
    U[2][2] = U[2][2] - L[2][0] * U[0][2];
    
    // Step 3: Second column of L and second row of U
    if (abs(U[1][1]) < 1e-10) {
        return vec3f(0.0, 0.0, 0.0);  // Matrix is singular
    }
    
    L[1][1] = 1.0;
    L[2][1] = U[2][1] / U[1][1];
    
    // Step 4: Eliminate second column below diagonal
    U[2][1] = 0.0;
    U[2][2] = U[2][2] - L[2][1] * U[1][2];
    
    // Step 5: Third diagonal element
    if (abs(U[2][2]) < 1e-10) {
        return vec3f(0.0, 0.0, 0.0);  // Matrix is singular
    }
    
    L[2][2] = 1.0;
    
    // Now solve LUx = -g in two steps:
    // 1) Forward substitution: Ly = -g
    let neg_g = -gradient;
    let y0 = neg_g.x / L[0][0];
    let y1 = (neg_g.y - L[1][0] * y0) / L[1][1];
    let y2 = (neg_g.z - L[2][0] * y0 - L[2][1] * y1) / L[2][2];
    
    // 2) Back substitution: Ux = y
    let x2 = y2 / U[2][2];
    let x1 = (y1 - U[1][2] * x2) / U[1][1];
    let x0 = (y0 - U[0][1] * x1 - U[0][2] * x2) / U[0][0];
    
    return vec3f(x0, x1, x2);
}

Benefits over the cross product approach:

  1. No determinant calculation - we detect singularity naturally during decomposition
  2. More numerically stable - especially for ill-conditioned matrices
  3. Direct solving - we go straight from LU to solution without computing the full inverse
  4. Early termination - we can bail out as soon as we detect a zero pivot

Trade-offs:

  • More code but potentially more robust
  • Sequential nature might be less GPU-friendly than the parallel cross product approach
  • No partial pivoting in this simple version (could be added if needed — see the sketch below)

For your SIFT-like application, either approach should work well, but LU decomposition might give you better numerical behavior with borderline cases near saddle points or when the Hessian is close to singular.
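
If you do decide you want pivoting later, a minimal sketch of the same 3x3 solve with partial pivoting could look like the following (illustrative only, written against the same symmetric Hessian/gradient inputs, and not tested in your pipeline):

fn compute_least_squares_pivoted(hessian: mat3x3f, gradient: vec3f) -> vec3f {
    // Work on explicit rows of the (symmetric) Hessian.
    // WGSL matrices are column-major, so hessian[c][r] is the element at row r, column c.
    var r0 = vec3f(hessian[0][0], hessian[1][0], hessian[2][0]);
    var r1 = vec3f(hessian[0][1], hessian[1][1], hessian[2][1]);
    var r2 = vec3f(hessian[0][2], hessian[1][2], hessian[2][2]);
    var b = -gradient;  // right-hand side of Hx = -g, permuted and updated with the rows

    // Pivot for column 0: bring the largest |entry| to the top row.
    if (abs(r1.x) > abs(r0.x) && abs(r1.x) >= abs(r2.x)) {
        let tr = r0; r0 = r1; r1 = tr;
        let tb = b.x; b.x = b.y; b.y = tb;
    } else if (abs(r2.x) > abs(r0.x)) {
        let tr = r0; r0 = r2; r2 = tr;
        let tb = b.x; b.x = b.z; b.z = tb;
    }
    if (abs(r0.x) < 1e-10) { return vec3f(0.0); }  // Matrix is singular

    // Eliminate column 0 from rows 1 and 2.
    let m1 = r1.x / r0.x;
    let m2 = r2.x / r0.x;
    r1 = r1 - m1 * r0;  b.y = b.y - m1 * b.x;
    r2 = r2 - m2 * r0;  b.z = b.z - m2 * b.x;

    // Pivot for column 1.
    if (abs(r2.y) > abs(r1.y)) {
        let tr = r1; r1 = r2; r2 = tr;
        let tb = b.y; b.y = b.z; b.z = tb;
    }
    if (abs(r1.y) < 1e-10) { return vec3f(0.0); }  // Matrix is singular

    // Eliminate column 1 from row 2.
    let m3 = r2.y / r1.y;
    r2 = r2 - m3 * r1;  b.z = b.z - m3 * b.y;

    if (abs(r2.z) < 1e-10) { return vec3f(0.0); }  // Matrix is singular

    // Back substitution on the upper-triangular system.
    let x2 = b.z / r2.z;
    let x1 = (b.y - r1.z * x2) / r1.y;
    let x0 = (b.x - r0.y * x1 - r0.z * x2) / r0.x;
    return vec3f(x0, x1, x2);
}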
