
Solving a specific sparse linear system without dense materialization

Computational Science Asked by ckfinite on July 21, 2021

I need to (computationally) solve a system of equations, for the purposes of an interior point method, of the form

$$
\left[\begin{array}{cc} B & A^{T} \\ A & 0 \end{array}\right] \left[\begin{array}{c} x \\ y \end{array}\right] = \left[\begin{array}{c} c_{x} \\ c_{y} \end{array}\right]
$$

where $A$ and $B$ are large and sparse, and $B$ is Hermitian positive definite. By assumption, $A$ has full row rank. My current approach goes:

  1. Compute the Cholesky factor $B = L L^T$.
  2. Solve $L W = A^{T}$ for $W$, producing $W=L^{-1}A^T$ so that $W^T W = A L^{-T} L^{-1} A^T$
  3. Compute the Cholesky factor of $W^T W$
  4. Use this to solve $A L^{-T} L^{-1} A^T y = A L^{-T} L^{-1} c_x - c_y$
  5. Recover $x$ from $L L^T x = c_x – A^T y$
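
For concreteness, here is a minimal dense sketch of these five steps in SciPy (dense Cholesky routines stand in for the sparse factorizations, and the random $A$ and $B$ are illustrative stand-ins for the real data):

```python
import numpy as np
from scipy.linalg import cholesky, solve_triangular, cho_solve

rng = np.random.default_rng(0)
n, m = 8, 3                        # n = size of B, m = rows of A (m < n)
A = rng.standard_normal((m, n))    # full row rank with probability 1
M = rng.standard_normal((n, n))
B = M @ M.T + n * np.eye(n)        # symmetric positive definite
c_x, c_y = rng.standard_normal(n), rng.standard_normal(m)

L = cholesky(B, lower=True)                     # step 1: B = L L^T
W = solve_triangular(L, A.T, lower=True)        # step 2: W = L^{-1} A^T
G = cholesky(W.T @ W, lower=True)               # step 3: W^T W = G G^T
rhs = W.T @ solve_triangular(L, c_x, lower=True) - c_y
y = cho_solve((G, True), rhs)                   # step 4: (A B^{-1} A^T) y = rhs
x = cho_solve((L, True), c_x - A.T @ y)         # step 5: B x = c_x - A^T y

# Check against the original block system.
print(np.allclose(B @ x + A.T @ y, c_x), np.allclose(A @ x, c_y))
```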

The problem I’m running into is that $W$, in step 2, may be dense, and its sparsity structure cannot be predicted from the sparsity patterns of $A$ and $B$ alone: it depends on the actual numerical values of $A$ and $L$, which is hard to optimize for.

Is there a way to avoid materializing $W$? I’d like to avoid having to invert $A$, as while $A$ itself usually has a sparse structure, its inverse is not guaranteed to be sparse.

2 Answers

There are two common ways to deal with systems of equations like this that arise in interior-point methods.

We'll assume that $A$ is of full row rank but not full column rank. That is, $A$ has more columns than rows. Thus we can't solve systems of equations involving $A$ uniquely or find $A^{-1}$.

  1. If $B$ is symmetric and positive definite, we can reduce this system of equations to a smaller system of equations in $y$ by solving for $x$ in terms of $y$,

$x=B^{-1}(c_{x}-A^{T}y)$

and then substituting this into the equation $Ax=c_{y}$ to get

$AB^{-1}A^{T}y=AB^{-1}c_{x}-c_{y}$.

The matrix in this equation for $y$ is symmetric and positive definite, so we can solve the system by sparse Cholesky factorization.

In interior point methods for linear programming, $B$ is diagonal, so $B^{-1}$ is easy to compute and the matrix $AB^{-1}A^{T}$ has the same sparsity pattern as $AA^{T}$. The speed of the Cholesky factorization can be improved by reordering the rows of $A$ (and symmetrically the columns of $A^{T}$) to minimize fill-in. The reordering algorithm only has to be run once, since the sparsity pattern of $AB^{-1}A^{T}$ is fixed. Furthermore, there's no need to pivot for numerical stability during the factorization.
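
A hedged sketch of that diagonal case (the data are made up, and SciPy's sparse LU stands in for a sparse Cholesky, which SciPy does not ship natively):

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import splu

rng = np.random.default_rng(1)
m, n = 50, 200
# The identity block guarantees full row rank for this toy A.
A = sp.hstack([sp.eye(m), sp.random(m, n - m, density=0.05, random_state=rng)]).tocsr()
d = rng.uniform(1.0, 10.0, n)      # diagonal of B; changes at every IPM iteration
c_x, c_y = rng.standard_normal(n), rng.standard_normal(m)

# A B^{-1} A^T has the sparsity pattern of A A^T, so a fill-reducing
# ordering computed once could be reused at every iteration.
S = (A @ sp.diags(1.0 / d) @ A.T).tocsc()
y = splu(S).solve(A @ (c_x / d) - c_y)   # A B^{-1} A^T y = A B^{-1} c_x - c_y
x = (c_x - A.T @ y) / d                  # x = B^{-1} (c_x - A^T y)
print(np.allclose(A @ x, c_y))
```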

  2. In interior point methods for nonlinear programming this approach may not be feasible, either because $B$ is indefinite or because $AB^{-1}A^{T}$ becomes too dense. In these situations, we typically make use of a sparse symmetric indefinite factorization

$$\left[ \begin{array}{cc} B & A^{T} \\ A & 0 \end{array} \right] = PLDL^{T}P^{T}$$

to solve the larger system of equations. Here, it is generally necessary to pivot for stability.
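
SciPy exposes no sparse symmetric indefinite $LDL^{T}$ factorization, so in the sketch below a general sparse LU (which does pivot for stability) stands in for a dedicated solver such as MA57 or PARDISO; the test matrices are again assumptions:

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import splu

rng = np.random.default_rng(2)
m, n = 40, 120
A = sp.hstack([sp.eye(m), sp.random(m, n - m, density=0.05, random_state=rng)]).tocsr()
B = sp.diags(rng.uniform(1.0, 10.0, n))    # any sparse symmetric B would do here
c_x, c_y = rng.standard_normal(n), rng.standard_normal(m)

K = sp.bmat([[B, A.T], [A, None]], format="csc")    # [[B, A^T], [A, 0]]
sol = splu(K).solve(np.concatenate([c_x, c_y]))
x, y = sol[:n], sol[n:]
print(np.allclose(B @ x + A.T @ y, c_x), np.allclose(A @ x, c_y))
```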

In your case, $B$ is sparse, symmetric and positive definite. You could compute sparse Cholesky factors of $B$ and use them to compute $B^{-1}A^{T}$ column by column (with a pair of sparse triangular solves for each column) rather than computing $B^{-1}$ and multiplying it by $A^{T}$. Whether this would be faster than a sparse symmetric indefinite factorization is something that you'd want to experiment with.
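
A sketch of that column-by-column suggestion, assuming a sparse Cholesky is available through scikit-sparse's CHOLMOD bindings (the test matrices are illustrative):

```python
import numpy as np
import scipy.sparse as sp
from sksparse.cholmod import cholesky   # assumes scikit-sparse is installed

rng = np.random.default_rng(3)
m, n = 30, 100
A = sp.hstack([sp.eye(m), sp.random(m, n - m, density=0.05, random_state=rng)]).tocsc()
M = sp.random(n, n, density=0.02, random_state=rng)
B = (M @ M.T + n * sp.eye(n)).tocsc()   # sparse SPD stand-in for the real B
c_x, c_y = rng.standard_normal(n), rng.standard_normal(m)

factor = cholesky(B)                    # fill-reducing ordering + sparse factor
AT = sp.csc_matrix(A.T)                 # CSC so columns slice cheaply
# B^{-1} A^T one column at a time; each factor(...) call applies a pair of
# sparse triangular solves (plus permutations), and B^{-1} is never formed.
W = np.column_stack([factor(AT[:, j].toarray().ravel()) for j in range(m)])
S = A @ W                               # = A B^{-1} A^T, a small m-by-m matrix
y = np.linalg.solve(S, A @ factor(c_x) - c_y)
x = factor(c_x - A.T @ y)
print(np.allclose(A @ x, c_y))
```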

Correct answer by Brian Borchers on July 21, 2021

Though digging through all the commentary has muddied (for me, at least) what your exact scenario is, there are sparse-direct algorithms for computing Schur complements of the form $\mathbf{S} = \mathbf{A} \mathbf{B}^{-1} \mathbf{A}^{T}$, where both the $\mathbf{B}$ and $\mathbf{A}$ operands are sparse. I can point you to my own implementation here. The PARDISO package (contained within MKL) has similar functionality.

Their efficiency depends upon the size/shape of $\mathbf{A}$ relative to $\mathbf{B}$: the approach only makes sense if the size of $\mathbf{S}$ is (asymptotically) smaller than that of $\mathbf{B}$. The use case that drove me to these methods is substructuring / static condensation, wherein $\mathbf{B}$ represents a volume-sized discretization of a PDE, while $\mathbf{A}$ is a surface-sized restriction/sampling over a boundary. (In that case the time/memory required to compute $\mathbf{S}$ is asymptotically the same as for the Cholesky factorization of $\mathbf{B}$.) Unfortunately I can't quite tell if your example falls into this category or not... basically you need $\mathrm{size}(\mathbf{x}) \gg \mathrm{size}(\mathbf{y})$ for this approach to work well.

Answered by rchilton1980 on July 21, 2021
