Given an
matrix
, we seek the factorization
,
where
is an
orthogonal matrix, and
is an
upper triangular matrix.
At the
-th step of the computation, we partition this factorization
to the
submatrix of
as

where the block
is
,
is
,
is
, and
is
.
is an
matrix, containing the first
columns of
the matrix
, and
is an
matrix,
containing the last
columns of
(that is,
and
).
is a
upper triangular matrix.
A QR factorization is performed on the first
panel of
(i.e.,
).
In practice,
is computed by applying a series of Householder
transformations to
of the form,
where
. The vector
is of length
with
0's for the first
entries and 1 for the
-th entry, and
. During the QR factorization, the vector
overwrites the entries of
below the diagonal, and
is stored
in a vector.
Furthermore, it can be shown that
,
where
is
upper triangular and the
-th column of
equals
.
This is indeed a block version of the QR factorization [22][4],
and is rich in matrix-matrix operations.
Figure 4: A snapshot of block QR factorization.
The block equation can be rearranged as

A snapshot of the block QR factorization is shown in Figure 4.
During the computation, the sequence of the Householder vectors
is
computed, and the row panel
and
, and the trailing
submatrix
are updated.
The factorization can be done by recursively applying the steps
outlined above to the
matrix
.
The computation of the above steps of the LAPACK routine, DGEQRF, involves the following operations:
panel of
(i.e.,
)
times (
) ]
and
of the block reflector
to the rest of the matrix from the left
The corresponding steps of the ScaLAPACK routine, PDGEQRF, are as follows:
panel of
(i.e.,
)
times (
) ]
and
, computes
only in the current process
row.
to the rest of the matrix from the left
is broadcast rowwise across all columns of
processes. The transpose of
is locally multiplied by
,
then the products are added to the current process row
(
).
is broadcast rowwise in the current process
row to all columns of processes and multiplied with the sum
(
).
is broadcast columnwise down all rows of
processes. Now, processes have their own portions of
and
, then they update
the local portions of the matrix
(
).