HP MLIB User's Guide Vol. 2 7th Ed.

Chapter 11 Introduction to Distributed SuperLU 737


The difference between pdgssvx (pzgssvx) and pdgssvx_ABglobal (pzgssvx_ABglobal) is that, for pdgssvx_ABglobal (pzgssvx_ABglobal), the input matrices A and B are globally available (replicated) on all processes, whereas for pdgssvx (pzgssvx), the input matrices A and B are distributed among all processes.

If each process has sufficient memory to hold full copies of A and B, use pdgssvx_ABglobal (pzgssvx_ABglobal) to solve sparse linear systems, since it is faster than pdgssvx (pzgssvx) due to algorithmic differences.
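
Whether there is "sufficient memory" for the replicated variant can be estimated with simple arithmetic. The following Python sketch is only an illustration, not part of MLIB: the function names are hypothetical, and it assumes a compressed sparse format with 8-byte values and 4-byte indices, an even split of rows and nonzeros across processes in the distributed case, and it counts only the input matrices A and B, not the L and U factors.

```python
def compressed_bytes(n_major, nnz, value_bytes=8, index_bytes=4):
    """Bytes for a compressed sparse matrix: values, indices, and pointers.

    Assumption: 8-byte real values and 4-byte integer indices.
    """
    return nnz * (value_bytes + index_bytes) + (n_major + 1) * index_bytes


def dense_rhs_bytes(n_rows, nrhs, value_bytes=8):
    """Bytes for a dense right-hand-side block with n_rows rows."""
    return n_rows * nrhs * value_bytes


def input_bytes_per_process(n, nnz, nrhs, nprocs, replicated):
    """Estimate the input storage (A and B only) held by one process.

    replicated=True  models the *_ABglobal variants: every process holds all of A and B.
    replicated=False models the distributed variants, assuming an even split
    of rows and nonzeros (a simplification).
    """
    if replicated:
        return compressed_bytes(n, nnz) + dense_rhs_bytes(n, nrhs)
    local_rows = -(-n // nprocs)    # ceiling division
    local_nnz = -(-nnz // nprocs)
    return compressed_bytes(local_rows, local_nnz) + dense_rhs_bytes(local_rows, nrhs)


if __name__ == "__main__":
    # Hypothetical problem size, chosen only for illustration.
    n, nnz, nrhs, nprocs = 1_000_000, 10_000_000, 1, 16
    for replicated in (True, False):
        size = input_bytes_per_process(n, nnz, nrhs, nprocs, replicated)
        print("replicated" if replicated else "distributed", size, "bytes per process")
```

If the replicated estimate fits comfortably in each process's memory alongside the factors, the _ABglobal routines are the natural choice; otherwise the distributed routines avoid holding full copies of A and B on every process.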

pdgssvx, pdgssvx_ABglobal, pzgssvx and pzgssvx_ABglobal perform the

following functions:

• Equilibrate the system (scale A’s rows and columns to have unit norm) if A

is poorly scaled.

• Find a row permutation that makes the diagonal of A large relative to the

off-diagonal entries.

• Find a column permutation that preserves the sparsity of the L and U

factors.

• Solve the system AX=B for X by factoring A followed by forward and back

substitutions.

• Refine the solution X.
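
The sequence of steps listed above can be sketched on a small dense matrix. This Python sketch is an illustration under stated assumptions and does not reproduce the library's implementation: all function names are hypothetical, equilibration uses the largest absolute entry of each row and column, a brute-force search over row orderings stands in for the permutation algorithm used in practice, the sparsity-preserving column permutation is omitted because the example is dense, and refinement is plain iterative refinement with a fixed number of steps.

```python
from itertools import permutations


def equilibrate(A):
    """Return row scales r and column scales c such that every row and column
    of diag(r) * A * diag(c) has largest absolute entry 1."""
    n = len(A)
    r = [1.0 / max(abs(v) for v in row) for row in A]
    c = [1.0 / max(abs(r[i] * A[i][j]) for i in range(n)) for j in range(n)]
    return r, c


def large_diagonal_row_order(A):
    """Brute force (small n only): order[i] is the original row moved to
    position i, chosen to maximize the product of |diagonal| entries."""
    n = len(A)

    def diag_product(order):
        p = 1.0
        for i, src in enumerate(order):
            p *= abs(A[src][i])
        return p

    return list(max(permutations(range(n)), key=diag_product))


def lu_factor(A):
    """LU factorization without pivoting; unit-lower L and U share one matrix."""
    n = len(A)
    LU = [row[:] for row in A]
    for k in range(n):
        for i in range(k + 1, n):
            LU[i][k] /= LU[k][k]
            for j in range(k + 1, n):
                LU[i][j] -= LU[i][k] * LU[k][j]
    return LU


def lu_solve(LU, b):
    """Forward substitution with L, then back substitution with U."""
    n = len(LU)
    y = b[:]
    for i in range(n):
        for j in range(i):
            y[i] -= LU[i][j] * y[j]
    x = y[:]
    for i in reversed(range(n)):
        for j in range(i + 1, n):
            x[i] -= LU[i][j] * x[j]
        x[i] /= LU[i][i]
    return x


def driver_sketch(A, b, refine_steps=2):
    """Equilibrate, permute rows, factor, solve, and refine (dense sketch)."""
    n = len(A)
    r, c = equilibrate(A)
    S = [[r[i] * A[i][j] * c[j] for j in range(n)] for i in range(n)]
    order = large_diagonal_row_order(S)
    LU = lu_factor([S[src] for src in order])

    def solve_scaled(rhs):
        # Scale and reorder the right-hand side like the rows of A,
        # solve, then undo the column scaling.
        y = lu_solve(LU, [r[src] * rhs[src] for src in order])
        return [c[j] * y[j] for j in range(n)]

    x = solve_scaled(b)
    for _ in range(refine_steps):
        # Residual is computed with the original, unscaled A.
        residual = [b[i] - sum(A[i][j] * x[j] for j in range(n)) for i in range(n)]
        d = solve_scaled(residual)
        x = [x[j] + d[j] for j in range(n)]
    return x
```

Factoring without pivoting is only safe here because the scaling and row ordering have already placed large entries on the diagonal, which is the purpose of the first two steps in the list.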

Distributed SuperLU computational routines

The following computational routines can be invoked to directly control the

behavior of SuperLU.

• pdgstrf, pzgstrf: Factorize in parallel.

These routines factorize the input matrix A (or the scaled and permuted A).

They assume that the distributed data structures for the L and U factors are

already set up, and the initial values of A are loaded into the data

structures. They can factor non-square matrices.

Currently, A must be globally available on all processes.

• pdgstrs, pdgstrs_Bglobal, pzgstrs, pzgstrs_Bglobal: Triangular solve in

parallel.