HP MLIB User's Guide Vol. 1 7th Ed.
DGEMMS/ZGEMMS Strassen matrix-matrix multiply
beta The scalar β.
c Array containing the m-by-n matrix C. Not used as
input if beta = 0.
ldc The leading dimension of array c as declared in the
calling program unit, with ldc ≥ max(m,1).
Output c The updated C matrix replaces the input.
Notes Except for the extra character in the subprogram name, these subprograms
conform to specifications of the Level 3 BLAS subprograms DGEMM and
ZGEMM.
Because of their use of Strassen’s method, DGEMMS and ZGEMMS are
asymptotically faster than standard matrix multiply methods such as those
employed in the standard routines DGEMM and ZGEMM. In practice, these
particular implementations are faster than their standard counterparts if
min(m,n,k) > 700 for ZGEMMS, or min(m,n,k) > 1500 for DGEMMS. The
speedup in the complex case is much more pronounced. That is due in large
part to the complex bilinear reduction technique (implemented underneath
Strassen’s method) that allows two complex matrices to be multiplied using
only 3/4 of the multiplications required by the traditional method. Also, the
relative cost of data motion is lower in the complex case. The gains in the real
case are marginal until the matrix dimensions become very large.
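The 3/4 figure quoted above can be illustrated with a well-known scheme that forms a complex matrix product from three real matrix products instead of four. The sketch below is illustrative only: it assumes this is the kind of reduction meant, and the helper names (matmul, add, sub, complex_matmul_3m) are hypothetical and are not MLIB routines.

```python
# Illustrative sketch (not MLIB code): complex matrix product built from
# three real matrix products. Matrices are plain lists of rows.

def matmul(a, b):
    """Conventional real matrix product."""
    return [[sum(a[i][p] * b[p][j] for p in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def add(a, b):
    """Elementwise sum of two matrices of equal shape."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def sub(a, b):
    """Elementwise difference of two matrices of equal shape."""
    return [[x - y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def complex_matmul_3m(ar, ai, br, bi):
    """Return (Cr, Ci) such that Cr + i*Ci = (Ar + i*Ai)(Br + i*Bi)."""
    t1 = matmul(ar, br)                    # Ar*Br
    t2 = matmul(ai, bi)                    # Ai*Bi
    t3 = matmul(add(ar, ai), add(br, bi))  # (Ar + Ai)(Br + Bi)
    cr = sub(t1, t2)                       # real part: Ar*Br - Ai*Bi
    ci = sub(sub(t3, t1), t2)              # imaginary part: Ar*Bi + Ai*Br
    return cr, ci

# Small example with separate real and imaginary parts.
ar = [[1.0, 2.0], [3.0, 4.0]]
ai = [[0.0, 1.0], [-1.0, 2.0]]
br = [[2.0, 0.0], [1.0, -1.0]]
bi = [[1.0, 1.0], [0.0, 3.0]]
cr, ci = complex_matmul_3m(ar, ai, br, bi)
```

The saving comes at the price of extra matrix additions, which is why it pays off only when the products dominate the cost.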
Measured in the operator norm, Strassen’s method is slightly less stable than
traditional matrix multiplication. Measured element by element it is unstable:
individual elements of C may be computed with large relative error.
The emerging consensus seems to be that Strassen’s method is sufficiently
stable for most applications. Partly for stability reasons, however, only 64-bit
Strassen subprograms are available at this time.
For a good overview and bibliography of this subject, see Higham.
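To show where the stability behavior comes from, one level of the classic Strassen recursion can be sketched as follows. It replaces the eight block products of conventional 2-by-2 block multiplication with seven, and recovers the blocks of C by adding and subtracting those products; that mixing of blocks is what weakens the element-by-element error behavior. This is a minimal sketch for square matrices of even order, with hypothetical helper names, and is not the MLIB implementation.

```python
# Illustrative sketch (not MLIB code): one level of Strassen's method for
# square matrices of even order, stored as lists of rows.

def matmul(a, b):
    """Conventional matrix product, used here for the block products."""
    return [[sum(a[i][p] * b[p][j] for p in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def add(a, b):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def sub(a, b):
    return [[x - y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def split(a):
    """Split a matrix of even order into its four quadrants."""
    h = len(a) // 2
    return ([row[:h] for row in a[:h]], [row[h:] for row in a[:h]],
            [row[:h] for row in a[h:]], [row[h:] for row in a[h:]])

def strassen_one_level(a, b):
    a11, a12, a21, a22 = split(a)
    b11, b12, b21, b22 = split(b)
    # Seven block products instead of eight.
    m1 = matmul(add(a11, a22), add(b11, b22))
    m2 = matmul(add(a21, a22), b11)
    m3 = matmul(a11, sub(b12, b22))
    m4 = matmul(a22, sub(b21, b11))
    m5 = matmul(add(a11, a12), b22)
    m6 = matmul(sub(a21, a11), add(b11, b12))
    m7 = matmul(sub(a12, a22), add(b21, b22))
    # Blocks of C are sums and differences of the products.
    c11 = add(sub(add(m1, m4), m5), m7)
    c12 = add(m3, m5)
    c21 = add(m2, m4)
    c22 = add(add(sub(m1, m2), m3), m6)
    top = [r1 + r2 for r1, r2 in zip(c11, c12)]
    bottom = [r1 + r2 for r1, r2 in zip(c21, c22)]
    return top + bottom

# Small integer-valued example, so both methods give exact results.
a = [[float(4 * i + j + 1) for j in range(4)] for i in range(4)]
b = [[float((i - j) % 3) for j in range(4)] for i in range(4)]
c_strassen = strassen_one_level(a, b)
c_standard = matmul(a, b)
```

A production routine would apply the recursion repeatedly and switch to conventional multiplication below some block size; that cutoff is not specified here.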
If an error in the arguments is detected, the subprograms call error handler
XERBLA, which writes an error message onto the standard error file and
terminates execution. The standard version of XERBLA (refer to the end of this
chapter) can be replaced with a user-supplied version to change the error
procedure. Error conditions are:
transa ≠ 'N' or 'n' or 'T' or 't' or 'C' or 'c'
transb ≠ 'N' or 'n' or 'T' or 't' or 'C' or 'c'
m < 0
n < 0
k < 0
lda too small
ldb too small
ldc < max(m,1)
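The checks listed above can be sketched as a small validation function. This is an illustration only: the function name gemms_arg_error is hypothetical, the lda and ldb thresholds are assumed to follow the reference DGEMM convention (which these subprograms are stated to conform to), and the returned value is the position of the offending argument in the DGEMM argument list. The real subprograms call XERBLA rather than returning a code.

```python
# Illustrative sketch (not MLIB code): argument checks in the order listed.
# Assumed argument order, as in DGEMM:
#   transa, transb, m, n, k, alpha, a, lda, b, ldb, beta, c, ldc

VALID_TRANS = ("N", "n", "T", "t", "C", "c")

def gemms_arg_error(transa, transb, m, n, k, lda, ldb, ldc):
    """Return the 1-based position of the first invalid argument, or 0."""
    if transa not in VALID_TRANS:
        return 1
    if transb not in VALID_TRANS:
        return 2
    if m < 0:
        return 3
    if n < 0:
        return 4
    if k < 0:
        return 5
    # Assumption: A is stored m-by-k when not transposed, else k-by-m.
    nrowa = m if transa in ("N", "n") else k
    # Assumption: B is stored k-by-n when not transposed, else n-by-k.
    nrowb = k if transb in ("N", "n") else n
    if lda < max(1, nrowa):
        return 8
    if ldb < max(1, nrowb):
        return 10
    if ldc < max(1, m):
        return 13
    return 0

# Example: with transa = 'T', array a holds a k-by-m matrix, so lda = 3 is
# too small when k = 5.
status = gemms_arg_error("T", "N", 3, 4, 5, 3, 5, 3)
```

A caller could run such a check before invoking the subprogram in order to report a problem without terminating execution.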