
Solving a specific sparse linear system without dense materialization

Computational Science Asked by ckfinite on July 21, 2021

I need to (computationally) solve a system of equations, for the purposes of an interior point method, of the form

$$
\left[\begin{array}{cc} B & A^{T} \\ A & 0 \end{array}\right] \left[\begin{array}{c} x \\ y \end{array}\right] = \left[\begin{array}{c} c_{x} \\ c_{y} \end{array}\right]
$$

where $A$ and $B$ are large and sparse, and $B$ is Hermitian positive definite. By assumption, $A$ has full row rank. My current approach goes:

  1. Compute the Cholesky factor $B = L L^T$.
  2. Solve $L W = A^{T}$ for $W$, producing $W=L^{-1}A^T$ so that $W^T W = A L^{-T} L^{-1} A^T$
  3. Compute the Cholesky factor of $W^T W$
  4. Use this to solve $A L^{-T} L^{-1} A^T y = A L^{-T} L^{-1} c_x - c_y$
  5. Recover $x$ from $L L^T x = c_x – A^T y$
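
For concreteness, here is a minimal dense sketch of these five steps in SciPy (dense Cholesky routines stand in for the sparse factorizations, and the random $A$ and $B$ are illustrative stand-ins for the real data):

```python
import numpy as np
from scipy.linalg import cholesky, solve_triangular, cho_solve

rng = np.random.default_rng(0)
n, m = 8, 3                        # n = size of B, m = rows of A (m < n)
A = rng.standard_normal((m, n))    # full row rank with probability 1
M = rng.standard_normal((n, n))
B = M @ M.T + n * np.eye(n)        # symmetric positive definite
c_x, c_y = rng.standard_normal(n), rng.standard_normal(m)

L = cholesky(B, lower=True)                     # step 1: B = L L^T
W = solve_triangular(L, A.T, lower=True)        # step 2: W = L^{-1} A^T
G = cholesky(W.T @ W, lower=True)               # step 3: W^T W = G G^T
rhs = W.T @ solve_triangular(L, c_x, lower=True) - c_y
y = cho_solve((G, True), rhs)                   # step 4: (A B^{-1} A^T) y = rhs
x = cho_solve((L, True), c_x - A.T @ y)         # step 5: B x = c_x - A^T y

# Check against the original block system.
print(np.allclose(B @ x + A.T @ y, c_x), np.allclose(A @ x, c_y))
```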

The problem I’m running into is that $W$, in step 2, may be dense, and its sparsity structure cannot be predicted from the sparsity patterns of $A$ and $B$ alone: it depends on the actual numerical values of $A$ and $L$, which is hard to optimize for.

Is there a way to avoid materializing $W$? I’d like to avoid having to invert $A$, as while $A$ itself usually has a sparse structure, its inverse is not guaranteed to be sparse.

2 Answers

There are two common ways to deal with systems of equations like this that arise in interior-point methods.

We'll assume that $A$ is of full row rank but not full column rank. That is, $A$ has more columns than rows. Thus we can't solve systems of equations involving $A$ uniquely or find $A^{-1}$.

  1. If $B$ is symmetric and positive definite, we can reduce this system of equations to a smaller system of equations in $y$ by solving for $x$ in terms of $y$,

$x=B^{-1}(c_{x}-A^{T}y)$

and then substituting this into the equation $Ax=c_{y}$ to get

$AB^{-1}A^{T}y=AB^{-1}c_{x}-c_{y}$.

The matrix in this equation for $y$ is symmetric and positive definite, so we can solve the system by sparse Cholesky factorization.

In interior point methods for linear programming, $B$ is diagonal, so $B^{-1}$ is easy to compute and the matrix $AB^{-1}A^{T}$ has the same sparsity pattern as $AA^{T}$. The speed of the Cholesky factorization can be improved by reordering the rows of $A$ (and symmetrically the columns of $A^{T}$) to minimize fill-in. The reordering algorithm only has to be run once, since the sparsity pattern of $AB^{-1}A^{T}$ is fixed. Furthermore, there's no need to pivot for numerical stability during the factorization.
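
A hedged sketch of that diagonal case (the data are made up, and SciPy's sparse LU stands in for a sparse Cholesky, which SciPy does not ship natively):

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import splu

rng = np.random.default_rng(1)
m, n = 50, 200
# The identity block guarantees full row rank for this toy A.
A = sp.hstack([sp.eye(m), sp.random(m, n - m, density=0.05, random_state=rng)]).tocsr()
d = rng.uniform(1.0, 10.0, n)      # diagonal of B; changes at every IPM iteration
c_x, c_y = rng.standard_normal(n), rng.standard_normal(m)

# A B^{-1} A^T has the sparsity pattern of A A^T, so a fill-reducing
# ordering computed once could be reused at every iteration.
S = (A @ sp.diags(1.0 / d) @ A.T).tocsc()
y = splu(S).solve(A @ (c_x / d) - c_y)   # A B^{-1} A^T y = A B^{-1} c_x - c_y
x = (c_x - A.T @ y) / d                  # x = B^{-1} (c_x - A^T y)
print(np.allclose(A @ x, c_y))
```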

  2. In interior point methods for nonlinear programming this approach may not be feasible, either because $B$ is indefinite or because $AB^{-1}A^{T}$ becomes too dense. In these situations, we typically make use of a sparse symmetric indefinite factorization

$$\left[ \begin{array}{cc} B & A^{T} \\ A & 0 \end{array} \right] = PLDL^{T}P^{T}$$

to solve the larger system of equations. Here, it is generally necessary to pivot for stability.
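
SciPy exposes no sparse symmetric indefinite $LDL^{T}$ factorization, so in the sketch below a general sparse LU (which does pivot for stability) stands in for a dedicated solver such as MA57 or PARDISO; the test matrices are again assumptions:

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import splu

rng = np.random.default_rng(2)
m, n = 40, 120
A = sp.hstack([sp.eye(m), sp.random(m, n - m, density=0.05, random_state=rng)]).tocsr()
B = sp.diags(rng.uniform(1.0, 10.0, n))    # any sparse symmetric B would do here
c_x, c_y = rng.standard_normal(n), rng.standard_normal(m)

K = sp.bmat([[B, A.T], [A, None]], format="csc")    # [[B, A^T], [A, 0]]
sol = splu(K).solve(np.concatenate([c_x, c_y]))
x, y = sol[:n], sol[n:]
print(np.allclose(B @ x + A.T @ y, c_x), np.allclose(A @ x, c_y))
```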

In your case, $B$ is sparse, symmetric and positive definite. You could compute sparse Cholesky factors of $B$ and use them to compute $B^{-1}A^{T}$ column by column (with a pair of sparse triangular solves for each column) rather than computing $B^{-1}$ and multiplying it by $A^{T}$. Whether this would be faster than a sparse symmetric indefinite factorization is something that you'd want to experiment with.
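
A sketch of that column-by-column suggestion, assuming a sparse Cholesky is available through scikit-sparse's CHOLMOD bindings (the test matrices are illustrative):

```python
import numpy as np
import scipy.sparse as sp
from sksparse.cholmod import cholesky   # assumes scikit-sparse is installed

rng = np.random.default_rng(3)
m, n = 30, 100
A = sp.hstack([sp.eye(m), sp.random(m, n - m, density=0.05, random_state=rng)]).tocsc()
M = sp.random(n, n, density=0.02, random_state=rng)
B = (M @ M.T + n * sp.eye(n)).tocsc()   # sparse SPD stand-in for the real B
c_x, c_y = rng.standard_normal(n), rng.standard_normal(m)

factor = cholesky(B)                    # fill-reducing ordering + sparse factor
AT = sp.csc_matrix(A.T)                 # CSC so columns slice cheaply
# B^{-1} A^T one column at a time; each factor(...) call applies a pair of
# sparse triangular solves (plus permutations), and B^{-1} is never formed.
W = np.column_stack([factor(AT[:, j].toarray().ravel()) for j in range(m)])
S = A @ W                               # = A B^{-1} A^T, a small m-by-m matrix
y = np.linalg.solve(S, A @ factor(c_x) - c_y)
x = factor(c_x - A.T @ y)
print(np.allclose(A @ x, c_y))
```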

Correct answer by Brian Borchers on July 21, 2021

Though digging through all the commentary has muddied (for me, at least) what your exact scenario is, there are sparse-direct algorithms for computing Schur complements of the form $\mathbf{S} = \mathbf{A} \mathbf{B}^{-1} \mathbf{A}^{T}$, where both the $\mathbf{B}$ and $\mathbf{A}$ operands are sparse. I can point you to my own implementation here. The PARDISO package (contained within MKL) has similar functionality.

Their efficiency depends upon the size/shape of $\mathbf{A}$ relative to $\mathbf{B}$: the approach only makes sense if the size of $\mathbf{S}$ is (asymptotically) smaller than that of $\mathbf{B}$. The use case that drove me to these methods is substructuring / static condensation, wherein $\mathbf{B}$ represents a volume-sized discretization of a PDE, while $\mathbf{A}$ is a surface-sized restriction/sampling over a boundary. (In that case the time/memory required to compute $\mathbf{S}$ is asymptotically the same as for the Cholesky factorization of $\mathbf{B}$.) Unfortunately I can't quite tell if your example falls into this category or not... basically you need $\mathrm{size}(\mathbf{x}) \gg \mathrm{size}(\mathbf{y})$ for this approach to work well.

Answered by rchilton1980 on July 21, 2021
