Efficient Implementation of the Improved Unsymmetric Lanczos Process on Massively Distributed Memory Computers
by Tianruo Yang
Abstract:
For the eigenvalues of a large and sparse unsymmetric coefficient matrix, we have proposed an improved version of the unsymmetric Lanczos process combining elements of numerical stability and parallel algorithm design. The algorithm is derived such that all inner products and matrix-vector multiplications of a single iteration step are independent and communication time required for inner product can be overlapped efficiently with computation time. Therefore, the cost of global communication on parallel distributed memory computers can be significantly reduced. The resulting algorithm malntains the favorable properties of the Lanczos process while not increasing computational costs. In this paper, we describe an efficient implementation of this method which is particularly well suited to problems with irregular sparsity pattern. The corresponding communication cost is independent of the sparsity pattern with several performance improvement techniques such as overlapping computation and communication, balancing the computational load. The performance is demonstrated by numerical experimental results carried out on massively parallel distributed memory computer Parsytec GC I PowerPlus.
Keywords: theory
Source:
T. Yang, Efficient Implementation of the Improved Unsymmetric Lanczos Process on Massively Distributed Memory Computers. In V. Malyshkin (ed.),
Parallel Computing Technologies: Proceedings of the 4th International Conference,
Lect. Notes in Comp. Sci., Vol. 1277, Springer, 1997, pp. 145-155