\documentclass[11pt]{report}

\usepackage{indentfirst}
\usepackage[body={6in,8.5in}]{geometry}
\usepackage{hyperref}
\usepackage{graphicx}
\DeclareGraphicsRule{.ps}{eps}{}{}

\renewcommand{\thesection}{\arabic{section}}
\setcounter{tocdepth}{3}
\setcounter{secnumdepth}{3}

\begin{document}
\begin{center}
{\Large LAPACK Working Note 81\\
Quick Installation Guide for LAPACK on Unix Systems\footnote{This work was
supported by NSF Grant No. ASC-8715728 and NSF Grant No. 0444486}}
\end{center}
\begin{center}
% Edward Anderson\footnote{Current address: Cray Research Inc.,
% 655F Lone Oak Drive, Eagan, MN 55121},
The LAPACK Authors\\
Department of Computer Science \\
University of Tennessee \\
Knoxville, Tennessee 37996-1301 \\
\end{center}
\begin{center}
REVISED: VERSION 3.1.1, February 2007 \\
REVISED: VERSION 3.2.0, November 2008
\end{center}

\begin{center}
Abstract
\end{center}
This working note describes how to install and test version 3.2.0
of LAPACK, a linear algebra package for high-performance
computers, on a Unix System. The timing routines are not actually included in
release 3.2.0, and that part of the LAWN refers to release 3.0. Also,
version 3.2.0 contains many prototype routines needing user feedback.
Non-Unix installation instructions and
further details of the testing and timing suites are contained only in
LAPACK Working Note 41, and not in this abbreviated version.
%Separate instructions are provided for the Unix and non-Unix
%versions of the test package.
%Further details are also given on the design of the test and timing
%programs.
\newpage

\tableofcontents

\newpage
% Introduction to Implementation Guide
\section{Introduction}

LAPACK is a linear algebra library for high-performance
computers.
The library includes Fortran subroutines for
the analysis and solution of systems of simultaneous linear algebraic
equations, linear least-squares problems, and matrix eigenvalue
problems.
Our approach to achieving high efficiency is based on the use of
a standard set of Basic Linear Algebra Subprograms (the BLAS),
which can be optimized for each computing environment.
By confining most of the computational work to the BLAS,
the subroutines should be
transportable and efficient across a wide range of computers.

This working note describes how to install, test, and time this
release of LAPACK on a Unix System.

The instructions for installing, testing, and
timing\footnote{The timing programs are provided only in LAPACK 3.0 and earlier.}
are designed for a person whose
responsibility is the maintenance of a mathematical software library.
We assume the installer has experience in compiling and running
Fortran programs and in creating object libraries.
The installation process involves untarring the file, creating a set of
libraries, and compiling and running the test and timing
programs\footnotemark[\value{footnote}].

%This guide combines the instructions for the Unix and non-Unix
%versions of the LAPACK test package (the non-Unix version is in Appendix
%~\ref{appendixe}).
%At this time, the non-Unix version of LAPACK can only be obtained
%after first untarring the Unix tar tape and then following the instructions in
%Appendix ~\ref{appendixe}.

Section~\ref{fileformat} describes how the files are organized in the
distribution file, and
Section~\ref{overview} gives a general overview of the parts of the test package.
Step-by-step instructions appear in Section~\ref{installation}.
%for the Unix version and in the appendix for the non-Unix version.

Users who want additional information should refer to LAPACK
Working Note 41.
% Sections~\ref{moretesting}
%and ~\ref{moretiming} give
%details of the test and timing programs and their input files.
%Appendices ~\ref{appendixa} and ~\ref{appendixb} briefly describe
%the LAPACK routines and auxiliary routines provided
%in this release.
%Appendix ~\ref{appendixc} lists the operation counts we have computed
%for the BLAS and for some of the LAPACK routines.
Appendix~\ref{appendixd}, entitled ``Caveats'', is a compendium of the known
problems from our own experiences, with suggestions on how to
overcome them.

\textbf{It is strongly advised that the user read
Appendix~\ref{appendixd} before proceeding with the installation process.}
%Appendix E contains the execution times of the different test
%and timing runs on two sample machines.
%Appendix ~\ref{appendixe} contains the instructions to install LAPACK on a non-Unix
%system.
\section{Revisions Since the First Public Release}

Since its first public release in February 1992, LAPACK has had
several updates, which have encompassed the introduction of new routines
as well as extending the functionality of existing routines. The first
update,
June 30, 1992, was version 1.0a; the second update, October 31, 1992,
was version 1.0b; the third update, March 31, 1993, was version 1.1;
version 2.0 on September 30, 1994, coincided with the release of the
Second Edition of the LAPACK Users' Guide;
version 3.0 on June 30, 1999, coincided with the release of the Third Edition of
the LAPACK Users' Guide;
version 3.1 was released in November 2006;
version 3.1.1 was released in February 2007;
and version 3.2.0 was released in November 2008.

All LAPACK routines reflect the current version number with the date
on the routine indicating when it was last modified.
For more information on revisions in the latest release, please refer
to the \texttt{revisions.info} file in the lapack directory on netlib.
\begin{quote}
\url{http://www.netlib.org/lapack/revisions.info}
\end{quote}

%The distribution \texttt{tar} file \texttt{lapack.tar.z} that is
%available on netlib is always the most up-to-date.
%
%On-line manpages (troff files) for LAPACK driver and computational
%routines, as well as most of the BLAS routines, are available via
%the \texttt{lapack} index on netlib.
\section{File Format}\label{fileformat}

The software for LAPACK is distributed in the form of a
gzipped tar file (via anonymous ftp or the World Wide Web),
which contains the Fortran source for LAPACK,
the Basic Linear Algebra Subprograms
(the Level 1, 2, and 3 BLAS) needed by LAPACK, the testing programs,
and the timing programs\footnotemark[\value{footnote}].
Users who wish to have a non-Unix installation should refer to LAPACK
Working Note 41,
although the overview in Section~\ref{overview} applies to both the Unix and non-Unix
versions.
%Users who wish to have a non-Unix installation should go to Appendix ~\ref{appendixe},
%although the overview in section ~\ref{overview} applies to both the Unix and non-Unix
%versions.

The package may be accessed via the World Wide Web through
the URL:
\begin{quote}
\url{http://www.netlib.org/lapack/lapack.tgz}
\end{quote}

Alternatively, you can retrieve the file via anonymous ftp from netlib:

\begin{verbatim}
ftp ftp.netlib.org
login: anonymous
password: <your email address>
cd lapack
binary
get lapack.tgz
quit
\end{verbatim}

The software in the \texttt{tar} file
is organized in a number of essential directories as shown
in Figure 1. Please note that this figure does not reflect everything
that is contained in the \texttt{LAPACK} directory. Input and instructional
files are also located at various levels.
\begin{figure}
\vspace{11pt}
\centerline{\includegraphics[width=6.5in,height=3in]{org2.ps}}
\caption{Unix organization of LAPACK 3.0}
\vspace{11pt}
\end{figure}
Libraries are created in the LAPACK directory and
executable files are created in one of the directories BLAS, TESTING,
or TIMING\footnotemark[\value{footnote}]. Input files for the test and
timing\footnotemark[\value{footnote}] programs are also
found in these three directories so that testing may be carried out
in the directories LAPACK/BLAS, LAPACK/TESTING, and LAPACK/TIMING\footnotemark[\value{footnote}].
A top-level makefile in the LAPACK directory is provided to perform the
entire installation procedure.
\section{Overview of Tape Contents}\label{overview}

Most routines in LAPACK occur in four versions: REAL,
DOUBLE PRECISION, COMPLEX, and COMPLEX*16.
The first three versions (REAL, DOUBLE PRECISION, and COMPLEX)
are written in standard Fortran and are completely portable;
the COMPLEX*16 version is provided for
those compilers which allow this data type.
Some routines use features of Fortran 90.
For convenience, we often refer to routines by their single precision
names; the leading `S' can be replaced by a `D' for double precision,
a `C' for complex, or a `Z' for complex*16.
For LAPACK use and testing you must decide which version(s)
of the package you intend to install at your site (for example,
REAL and COMPLEX on a Cray computer or DOUBLE PRECISION and
COMPLEX*16 on an IBM computer).
\subsection{LAPACK Routines}

There are three classes of LAPACK routines:
\begin{itemize}

\item \textbf{driver} routines solve a complete problem, such as solving
a system of linear equations or computing the eigenvalues of a real
symmetric matrix. Users are encouraged to use a driver routine if there
is one that meets their requirements. The driver routines are listed
in LAPACK Working Note 41~\cite{WN41} and the LAPACK Users' Guide~\cite{LUG}.
%in Appendix ~\ref{appendixa}.

\item \textbf{computational} routines, also called simply LAPACK routines,
perform a distinct computational task, such as computing
the $LU$ decomposition of an $m$-by-$n$ matrix or finding the
eigenvalues and eigenvectors of a symmetric tridiagonal matrix using
the $QR$ algorithm.
The LAPACK routines are listed in LAPACK Working Note 41~\cite{WN41}
and the LAPACK Users' Guide~\cite{LUG}.
%The LAPACK routines are listed in Appendix ~\ref{appendixa}; see also LAPACK
%Working Note \#5 \cite{WN5}.

\item \textbf{auxiliary} routines are all the other subroutines called
by the driver routines and computational routines.
%Among them are subroutines to perform subtasks of block algorithms,
%in particular, the unblocked versions of the block algorithms;
%extensions to the BLAS, such as matrix-vector operations involving
%complex symmetric matrices;
%the special routines LSAME and XERBLA which first appeared with the
%BLAS;
%and a number of routines to perform common low-level computations,
%such as computing a matrix norm, generating an elementary Householder
%transformation, and applying a sequence of plane rotations.
%Many of the auxiliary routines may be of use to numerical analysts
%or software developers, so we have documented the Fortran source for
%these routines with the same level of detail used for the LAPACK
%routines and driver routines.
The auxiliary routines are listed in LAPACK Working Note 41~\cite{WN41}
and the LAPACK Users' Guide~\cite{LUG}.
%The auxiliary routines are listed in Appendix ~\ref{appendixb}.
\end{itemize}
\subsection{Level 1, 2, and 3 BLAS}

The BLAS are a set of Basic Linear Algebra Subprograms that perform
vector-vector, matrix-vector, and matrix-matrix operations.
LAPACK is designed around the Level 1, 2, and 3 BLAS, and nearly all
of the parallelism in the LAPACK routines is contained in the BLAS.
Therefore,
the key to getting good performance from LAPACK lies in having an
efficient version of the BLAS optimized for your particular machine.
Optimized BLAS libraries are available on a variety of architectures;
refer to the BLAS FAQ on netlib for further information.
\begin{quote}
\url{http://www.netlib.org/blas/faq.html}
\end{quote}
There are also freely available BLAS generators that automatically
tune a subset of the BLAS for a given architecture, for example ATLAS:
\begin{quote}
\url{http://www.netlib.org/atlas/}
\end{quote}
And, if all else fails, there is the Fortran~77 reference implementation
of the Level 1, 2, and 3 BLAS available on netlib (also included in
the LAPACK distribution tar file).
\begin{quote}
\url{http://www.netlib.org/blas/blas.tgz}
\end{quote}
No matter which BLAS library is used, the BLAS test programs should
always be run.

Users should not expect too much from the Fortran~77 reference implementation
BLAS; these versions were written to define the basic operations and do not
employ the standard tricks for optimizing Fortran code.

The formal definitions of the Level 1, 2, and 3 BLAS
are in \cite{BLAS1}, \cite{BLAS2}, and \cite{BLAS3}.
The BLAS Quick Reference card is available on netlib.
\subsection{Mixed- and Extended-Precision BLAS: XBLAS}

The XBLAS extend the BLAS to work with mixed input and output
precisions as well as using extra precision internally. The XBLAS are
used in the prototype extra-precise iterative refinement codes.

The current release of the XBLAS is available through
Netlib\footnote{Development versions may be available through
\url{http://www.cs.berkeley.edu/~yozo/} or
\url{http://www.nersc.gov/~xiaoye/XBLAS/}.} at
\begin{quote}
\url{http://www.netlib.org/xblas}
\end{quote}
Their formal definition is in \cite{XBLAS}.
\subsection{LAPACK Test Routines}

This release contains two distinct test programs for LAPACK routines
in each data type. One test program tests the routines for solving
linear equations and linear least squares problems,
and the other tests routines for the matrix eigenvalue problem.
The routines for generating test matrices are compiled into a
library that is used by both test programs.
\subsection{LAPACK Timing Routines (for LAPACK 3.0 and before)}

This release also contains two distinct timing programs for the
LAPACK routines in each data type.
The linear equation timing program gathers performance data in
megaflops on the factor, solve, and inverse routines for solving
linear systems, the routines to generate or apply an orthogonal matrix
given as a sequence of elementary transformations, and the reductions
to bidiagonal, tridiagonal, or Hessenberg form for eigenvalue
computations.
The operation counts used in computing the megaflop rates are computed
from a formula;
see LAPACK Working Note 41~\cite{WN41}.
% see Appendix ~\ref{appendixc}.
The eigenvalue timing program is used with the eigensystem routines
and returns the execution time, number of floating point operations, and
megaflop rate for each of the requested subroutines.
In this program, the number of operations is computed while the
code is executing using special instrumented versions of the LAPACK
subroutines.
\section{Installing LAPACK on a Unix System}\label{installation}

Installing, testing, and timing\footnotemark[\value{footnote}] the Unix version of LAPACK
involves the following steps:
\begin{enumerate}
\item Gunzip and untar the file.

\item Copy the file \texttt{LAPACK/make.inc.example} to \texttt{LAPACK/make.inc} and edit it.

\item Edit the file \texttt{LAPACK/Makefile} and type \texttt{make}.

%\item Test and Install the Machine-Dependent Routines \\
%\emph{(WARNING: You may need to supply a correct version of second.f and
%dsecnd.f for your machine)}
%{\tt
%\begin{list}{}{}
%\item cd LAPACK
%\item make install
%\end{list} }
%
%\item Create the BLAS Library, \emph{if necessary} \\
%\emph{(NOTE: For best performance, it is recommended you use the manufacturers' BLAS)}
%{\tt
%\begin{list}{}{}
%\item \texttt{cd LAPACK}
%\item \texttt{make blaslib}
%\end{list} }
%
%\item Run the Level 1, 2, and 3 BLAS Test Programs
%\begin{list}{}{}
%\item \texttt{cd LAPACK}
%\item \texttt{make blas\_testing}
%\end{list}
%
%\item Create the LAPACK Library
%\begin{list}{}{}
%\item \texttt{cd LAPACK}
%\item \texttt{make lapacklib}
%\end{list}
%
%\item Create the Library of Test Matrix Generators
%\begin{list}{}{}
%\item \texttt{cd LAPACK}
%\item \texttt{make tmglib}
%\end{list}
%
%\item Run the LAPACK Test Programs
%\begin{list}{}{}
%\item \texttt{cd LAPACK}
%\item \texttt{make testing}
%\end{list}
%
%\item Run the LAPACK Timing Programs
%\begin{list}{}{}
%\item \texttt{cd LAPACK}
%\item \texttt{make timing}
%\end{list}
%
%\item Run the BLAS Timing Programs
%\begin{list}{}{}
%\item \texttt{cd LAPACK}
%\item \texttt{make blas\_timing}
%\end{list}
\end{enumerate}
\subsection{Untar the File}

If you received a tar file of LAPACK via the World Wide
Web or anonymous ftp, enter the following command:

\begin{list}{}{}
\item \texttt{gunzip -c lapack.tgz | tar xvf -}
\end{list}

\noindent
This will create a top-level directory called \texttt{LAPACK}, which
requires approximately 34 Mbytes of disk space.
The total space requirement, including the object files and executables,
is approximately 100 Mbytes for all four data types.
\subsection{Copy \texttt{LAPACK/make.inc.example} to \texttt{LAPACK/make.inc} and edit it}

Before the libraries can be built, or the testing and timing\footnotemark[\value{footnote}] programs
run, you must define all machine-specific parameters for the
architecture on which you are installing LAPACK. All machine-specific
parameters are contained in the file \texttt{LAPACK/make.inc}.
An example of \texttt{LAPACK/make.inc} for a LINUX machine with GNU compilers is given
in \texttt{LAPACK/make.inc.example}. Copy that file to \texttt{LAPACK/make.inc} by entering the following command:

\begin{list}{}{}
\item \texttt{cp LAPACK/make.inc.example LAPACK/make.inc}
\end{list}

\noindent
Now modify your \texttt{LAPACK/make.inc} by applying the following recommendations.
The first line of this \texttt{make.inc} file is:
\begin{quote}
SHELL = /bin/sh
\end{quote}
and it will need to be modified to \texttt{SHELL = /sbin/sh} if you are
installing LAPACK on an SGI architecture.
Next, you will need to modify \texttt{FC}, \texttt{FFLAGS},
\texttt{FFLAGS\_DRV}, \texttt{FFLAGS\_NOOPT}, and \texttt{LDFLAGS} to specify
the compiler, compiler options, compiler options for the testing and
timing\footnotemark[\value{footnote}] main programs, and linker options.
Next, you will have to choose which function the
\texttt{SECOND} and \texttt{DSECND} routines will use for timing.
\begin{verbatim}
# Default: SECOND and DSECND will use a call to the
# EXTERNAL FUNCTION ETIME
#TIMER = EXT_ETIME
# For RS6K: SECOND and DSECND will use a call to the
# EXTERNAL FUNCTION ETIME_
#TIMER = EXT_ETIME_
# For gfortran compiler: SECOND and DSECND will use a call to the
# INTERNAL FUNCTION ETIME
TIMER = INT_ETIME
# If your Fortran compiler does not provide etime (like Nag Fortran
# Compiler, etc...) SECOND and DSECND will use a call to the
# INTERNAL FUNCTION CPU_TIME
#TIMER = INT_CPU_TIME
# If none of these work, you can use the NONE value.
# In that case, SECOND and DSECND will always return 0.
#TIMER = NONE
\end{verbatim}
Refer to Section~\ref{second} for more information.

Next, you will need to modify \texttt{AR}, \texttt{ARFLAGS}, and \texttt{RANLIB} to specify the archiver,
archiver options, and ranlib for your machine. If your architecture
does not require \texttt{ranlib} to be run after each archive command (as
is the case with CRAY computers running UNICOS, Hewlett Packard
computers running HP-UX, or SUN SPARCstations running Solaris), set
\texttt{RANLIB = echo}. And finally, you must
modify the \texttt{BLASLIB} definition to specify the BLAS library to which
you will be linking. If an optimized version of the BLAS is available
on your machine, we strongly recommend that you link to that library.
Otherwise, by default, \texttt{BLASLIB} is set to the Fortran~77 version.

If you want to enable the XBLAS, define the variable \texttt{USEXBLAS}
to some value, for example \texttt{USEXBLAS = Yes}. Then set the
variable \texttt{XBLASLIB} to point at the XBLAS library. Note that
the prototype iterative refinement routines and their testers will not
be built unless \texttt{USEXBLAS} is defined.

\textbf{NOTE:} Example \texttt{make.inc} include files are contained in the
\texttt{LAPACK/INSTALL} directory. Please refer to
Appendix~\ref{appendixd} for machine-specific installation hints, and/or
the \texttt{release\_notes} file on \texttt{netlib}.
\begin{quote}
\url{http://www.netlib.org/lapack/release_notes}
\end{quote}
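
The settings described in this section can be summarized in a short
\texttt{make.inc} fragment. The following is only a sketch for a Linux
machine with the GNU compilers: the variable names are those discussed
above, but the option values and the library paths are illustrative
assumptions, not the contents of \texttt{make.inc.example}, and should be
adapted to your system.
\begin{verbatim}
SHELL        = /bin/sh
# Compiler, compiler options, and linker options (values are assumptions)
FC           = gfortran
FFLAGS       = -O2
FFLAGS_DRV   = $(FFLAGS)
FFLAGS_NOOPT = -O0
LDFLAGS      =
# Timer used by SECOND and DSECND
TIMER        = INT_ETIME
# Archiver, archiver options, and ranlib
AR           = ar
ARFLAGS      = cr
RANLIB       = ranlib
# Hypothetical path to an optimized BLAS library
BLASLIB      = /usr/local/lib/libmyblas.a
# Optional: enable the XBLAS (hypothetical path)
#USEXBLAS    = Yes
#XBLASLIB    = /usr/local/lib/libxblas.a
\end{verbatim}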
\subsection{Edit the file \texttt{LAPACK/Makefile}}\label{toplevelmakefile}

This \texttt{Makefile} can be modified to perform as much of the
installation process as the user desires. Ideally, this is the ONLY
makefile the user must modify. However, modification of lower-level
makefiles may be necessary if a specific routine needs to be compiled
with a different level of optimization.

First, edit the definitions of \texttt{blaslib}, \texttt{lapacklib},
\texttt{tmglib}, \texttt{lapack\_testing}, and \texttt{timing}\footnotemark[\value{footnote}] in the file \texttt{LAPACK/Makefile}
to specify the data types desired. For example,
if you only wish to compile the single precision real version of the
LAPACK library, you would modify the \texttt{lapacklib} definition to be:

\begin{verbatim}
lapacklib:
	$(MAKE) -C SRC single
\end{verbatim}

Likewise, you could specify \texttt{double}, \texttt{complex}, or \texttt{complex16} to
build the double precision real, single precision complex, or double
precision complex libraries, respectively. By default, typing
\texttt{make} with no arguments builds all four data types.
The make command can be run more than once to add another
data type to the library if necessary.

%If you are installing LAPACK on a Silicon Graphics machine, you must
%modify the respective definitions of \texttt{testing} and \texttt{timing} to be
%\begin{verbatim}
%testing:
% ( cd TESTING; $(MAKE) -f Makefile.sgi )
%\end{verbatim}
%and
%\begin{verbatim}
%timing:
% ( cd TIMING; $(MAKE) -f Makefile.sgi )
%\end{verbatim}

Next, if you will be using a locally available BLAS library, you will need
to remove \texttt{blaslib} from the \texttt{lib} definition. And finally,
if you do not wish to build all of the libraries individually and
likewise run all of the testing and timing separately, you can
modify the \texttt{all} definition to specify the amount of the
installation process that you want performed. By default,
the \texttt{all} definition is set to
\begin{verbatim}
all: lapack_install lib lapack_testing blas_testing
\end{verbatim}
which will perform all phases of the installation
process: testing of machine-dependent routines, building the libraries,
LAPACK testing, and BLAS testing.

The entire installation process will then be performed by typing
\texttt{make}.

Questions and/or comments can be directed to the
authors as described in Section~\ref{sendresults}. If test failures
occur, please refer to the appropriate subsection in
Section~\ref{furtherdetails}.

If disk space is limited, we suggest building each data type separately
and/or deleting all object files after building the libraries. Likewise, all
testing and timing executables can be deleted after the testing and timing
process is completed. The removal of all object files and executables
can be accomplished by the following:

\begin{list}{}{}
\item \texttt{cd LAPACK}
\item \texttt{make cleanobj}
\end{list}
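
To illustrate the edit to the \texttt{lib} definition discussed earlier in
this section, a site that links to a locally available BLAS library might
end up with definitions like the following. This is a sketch only: it
assumes that the distributed \texttt{lib} definition lists
\texttt{blaslib}, \texttt{lapacklib}, and \texttt{tmglib}, which should be
checked against your copy of \texttt{LAPACK/Makefile}.
\begin{verbatim}
# blaslib removed: BLASLIB in make.inc points to the local BLAS
lib: lapacklib tmglib

all: lapack_install lib lapack_testing blas_testing
\end{verbatim}
The \texttt{all} definition is left unchanged here, so the BLAS test
programs are still run against the local library.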
\section{Further Details of the Installation Process}\label{furtherdetails}

Alternatively, you can choose to run each of the phases of the
installation process separately. The following sections give details
on how this may be achieved.
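
In outline, the separate phases might be run as follows. This is a sketch
that assumes the target names appearing in the top-level
\texttt{Makefile} definitions quoted in Section~\ref{toplevelmakefile};
check them against your copy of \texttt{LAPACK/Makefile} before relying
on them.
\begin{verbatim}
cd LAPACK
make lapack_install    # test the machine-dependent routines
make blaslib           # only if no optimized BLAS is available
make lapacklib
make tmglib
make blas_testing
make lapack_testing
\end{verbatim}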
\subsection{Test and Install the Machine-Dependent Routines}

There are six machine-dependent functions in the test and timing
package, at least three of which must be installed. They are

\begin{tabbing}
MONOMO \= DOUBLE PRECISION \= \kill
LSAME \> LOGICAL \> Test if two characters are the same regardless of case \\
SLAMCH \> REAL \> Determine machine-dependent parameters \\
DLAMCH \> DOUBLE PRECISION \> Determine machine-dependent parameters \\
SECOND \> REAL \> Return time in seconds from a fixed starting time \\
DSECND \> DOUBLE PRECISION \> Return time in seconds from a fixed starting time\\
ILAENV \> INTEGER \> Checks that NaN and infinity arithmetic are IEEE-754 compliant
\end{tabbing}

\noindent
If you are working only in single precision, you do not need to install
DLAMCH and DSECND, and if you are working only in double precision,
you do not need to install SLAMCH and SECOND.

These six subroutines are provided in \texttt{LAPACK/INSTALL},
along with six test programs.
To compile the six test programs and run the tests, go to \texttt{LAPACK} and
type \texttt{make lapack\_install}. The test programs are called
\texttt{testlsame}, \texttt{testslamch}, \texttt{testdlamch}, \texttt{testsecond}, \texttt{testdsecnd}, and
\texttt{testieee}.
If you do not wish to run all tests, you will need to modify the
\texttt{lapack\_install} definition in the \texttt{LAPACK/Makefile} to include only the
tests you wish to run. Otherwise, all tests will be performed.
The expected results of each test program are described below.
\subsubsection{Installing LSAME}

LSAME is a logical function with two character parameters, A and B.
It returns .TRUE. if A and B are the same regardless of case, or .FALSE.
if they are different.
For example, the expression

\begin{list}{}{}
\item \texttt{LSAME( UPLO, 'U' )}
\end{list}
\noindent
is equivalent to
\begin{list}{}{}
\item \texttt{( UPLO.EQ.'U' ).OR.( UPLO.EQ.'u' )}
\end{list}

The test program in \texttt{lsametst.f} tests all combinations of
the same character in upper and lower case for A and B, and two
cases where A and B are different characters.

Run the test program by typing \texttt{testlsame}.
If LSAME works correctly, the only message you should see after the
execution of \texttt{testlsame} is
\begin{verbatim}
ASCII character set
Tests completed
\end{verbatim}
The file \texttt{lsame.f} is automatically copied to
\texttt{LAPACK/BLAS/SRC/} and \texttt{LAPACK/SRC/}.
The function LSAME is needed by both the BLAS and LAPACK, so it is safer
to have it in both libraries as long as this does not cause trouble
in the link phase when both libraries are used.
\subsubsection{Installing SLAMCH and DLAMCH}

SLAMCH and DLAMCH are real functions with a single character parameter
that indicates the machine parameter to be returned. The test
program in \texttt{slamchtst.f}
simply prints out the different values computed by SLAMCH,
so you need to know something about what the values should be.
For example, the output of the test program executable \texttt{testslamch}
for SLAMCH on a Sun SPARCstation is
\begin{verbatim}
Epsilon = 5.96046E-08
Safe minimum = 1.17549E-38
Base = 2.00000
Precision = 1.19209E-07
Number of digits in mantissa = 24.0000
Rounding mode = 1.00000
Minimum exponent = -125.000
Underflow threshold = 1.17549E-38
Largest exponent = 128.000
Overflow threshold = 3.40282E+38
Reciprocal of safe minimum = 8.50706E+37
\end{verbatim}
On a Cray machine, the safe minimum underflows its output
representation and the overflow threshold overflows its output
representation, so the safe minimum is printed as 0.00000 and overflow
is printed as R. This is normal.
If you would prefer to print a representable number, you can modify
the test program to print SFMIN*100. and RMAX/100. for the safe
minimum and overflow thresholds.

Likewise, the test executable \texttt{testdlamch} is run for DLAMCH.

If both tests were successful, go to Section~\ref{second}.

If SLAMCH (or DLAMCH) returns an invalid value, you will have to create
your own version of this function. The following options are used in
LAPACK and must be set:

\begin{list}{}{}
\item {`B': } Base of the machine
\item {`E': } Epsilon (relative machine precision)
\item {`O': } Overflow threshold
\item {`P': } Precision = Epsilon*Base
\item {`S': } Safe minimum (often same as underflow threshold)
\item {`U': } Underflow threshold
\end{list}

Some people may be familiar with R1MACH (D1MACH), a primitive
routine for setting machine parameters in which the user must
uncomment the appropriate assignment statements for the target
machine. If a version of R1MACH is on hand, the assignments in
SLAMCH can be made to refer to R1MACH using the correspondence

\begin{list}{}{}
\item {SLAMCH( `U' )} $=$ R1MACH( 1 )
\item {SLAMCH( `O' )} $=$ R1MACH( 2 )
\item {SLAMCH( `E' )} $=$ R1MACH( 3 )
\item {SLAMCH( `B' )} $=$ R1MACH( 5 )
\end{list}

\noindent
The safe minimum returned by SLAMCH( `S' ) is initially set to the
underflow value, but if $1/(\mathrm{overflow}) \geq (\mathrm{underflow})$
it is recomputed as $(1/(\mathrm{overflow})) * ( 1 + \varepsilon )$,
where $\varepsilon$ is the machine precision.

BE AWARE that the initial call to SLAMCH or DLAMCH is expensive.
We suggest that installers run it once, save the results, and hard-code
the constants in the version they put in their library.
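
When hard-coding these constants, it can help to check them against one
another by hand. The following worked example is a sketch based on the
sample \texttt{testslamch} output shown above; it assumes IEEE single
precision arithmetic, and the rounded decimal values are for illustration
only.
\begin{eqnarray*}
\varepsilon & = & 2^{-24} \approx 5.96046 \times 10^{-8}, \\
\mathrm{precision} & = & \varepsilon \times \mathrm{base} = 2^{-23}
  \approx 1.19209 \times 10^{-7}, \\
1/(\mathrm{overflow}) & \approx & 1/(3.40282 \times 10^{38})
  \approx 2.93874 \times 10^{-39} \\
 & < & (\mathrm{underflow}) \approx 1.17549 \times 10^{-38}.
\end{eqnarray*}
Because $1/(\mathrm{overflow})$ is smaller than the underflow threshold
in this case, the safe minimum is not recomputed and stays equal to the
underflow threshold, which agrees with the two values of
$1.17549 \times 10^{-38}$ printed in the sample output.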
\subsubsection{Installing SECOND and DSECND}\label{second}
|
|
|
|
Both the timing routines\footnotemark[\value{footnote}] and the test routines call SECOND
|
|
(DSECND), a real function with no arguments that returns the time
|
|
in seconds from some fixed starting time.
|
|
Our version of this routine
|
|
returns only ``user time'', and not ``user time $+$ system time''.
|
|
The following version of SECOND in \texttt{second\_EXT\_ETIME.f, second\_INT\_ETIME.f} calls
|
|
ETIME, a Fortran library routine available on some computer systems.
|
|
If ETIME is not available or a better local timing function exists,
|
|
you will have to provide the correct interface to SECOND and DSECND
|
|
on your machine.
|
|
|
|
Since LAPACK 3.1.1 we provide five different versions of the SECOND and DSECND routines.
The version that is used depends on the value of the TIMER variable in \texttt{make.inc}:
|
|
|
|
\begin{itemize}
|
|
\item If ETIME is available as an external function, set the value of the TIMER variable in your
|
|
make.inc to \texttt{EXT\_ETIME}: \texttt{second\_EXT\_ETIME.f} and \texttt{dsecnd\_EXT\_ETIME.f} will be used.
|
|
Usually on HPPA architectures,
|
|
the compiler and linker flag \texttt{+U77} should be included to access
|
|
the function \texttt{ETIME}.
|
|
|
|
\item If ETIME\_ is available as an external function, set the value of the TIMER variable in your make.inc
|
|
to \texttt{EXT\_ETIME\_}: \texttt{second\_EXT\_ETIME\_.f} and \texttt{dsecnd\_EXT\_ETIME\_.f} will be used.
|
|
This is the case on some IBM architectures, such as the IBM RS/6000.
|
|
|
|
\item If ETIME is available as an internal function, set the value of the TIMER variable in your make.inc
|
|
to \texttt{INT\_ETIME}: \texttt{second\_INT\_ETIME.f} and \texttt{dsecnd\_INT\_ETIME.f} will be used.
|
|
This is the case with gfortran.
|
|
|
|
\item If CPU\_TIME is available as an internal function, set the value of the TIMER variable in your make.inc
|
|
to \texttt{INT\_CPU\_TIME}: \texttt{second\_INT\_CPU\_TIME.f} and \texttt{dsecnd\_INT\_CPU\_TIME.f} will be used.
|
|
|
|
\item If none of these functions is available, set the value of the TIMER variable in your make.inc
|
|
to \texttt{NONE}: \texttt{second\_NONE.f} and \texttt{dsecnd\_NONE.f} will be used.
|
|
These routines will always return zero.
|
|
\end{itemize}
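
The file naming convention in the list above can be summarized in a few lines of shell.
This is only an illustrative sketch: the function name \texttt{timer\_sources} is
hypothetical and is not part of the LAPACK distribution; it simply maps a TIMER value
to the pair of source files named above.
\begin{verbatim}
# Hypothetical helper (not part of LAPACK): print the SECOND/DSECND
# source files that correspond to a given TIMER value.
timer_sources() {
    case "$1" in
        EXT_ETIME|EXT_ETIME_|INT_ETIME|INT_CPU_TIME|NONE)
            echo "second_$1.f dsecnd_$1.f" ;;
        *)
            echo "unknown TIMER value: $1" >&2
            return 1 ;;
    esac
}

timer_sources INT_ETIME
\end{verbatim}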
|
|
|
|
The test program in \texttt{secondtst.f}
|
|
performs a million operations using 5000 iterations of
|
|
the SAXPY operation $y := y + \alpha x$ on a vector of length 100.
|
|
The total time and megaflop rate for this test are reported, then
|
|
the operation is repeated including a call to SECOND on each of
|
|
the 5000 iterations to determine the overhead due to calling SECOND.
|
|
The test program executable is called \texttt{testsecond} (or \texttt{testdsecnd}).
|
|
There is no single right answer, but the times
|
|
in seconds should be positive and the megaflop rates should be
|
|
appropriate for your machine.
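
As a quick check on the arithmetic, assuming the usual count of two floating-point
operations (one multiply and one add) per vector element in SAXPY, the operation count
and the resulting rate for a measured time of $t$ seconds ($t$ is a symbol introduced
here for illustration) are
\[
5000 \times 100 \times 2 = 10^{6} \mbox{ flops}, \qquad
\mbox{rate} = \frac{10^{6}}{10^{6} \, t} = \frac{1}{t} \mbox{ megaflops}.
\]
For example, a total time of 0.002 seconds would correspond to 500 megaflops.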
|
|
|
|
\subsubsection{Testing IEEE arithmetic and ILAENV}\label{testieee}
|
|
|
|
%\textbf{If you are installing LAPACK on a non-IEEE machine, you MUST
|
|
%modify ILAENV! Otherwise, ILAENV will crash . By default, ILAENV
|
|
%assumes an IEEE machine, and does a test for IEEE-754 compliance.}
|
|
|
|
As some new routines in LAPACK rely on IEEE-754 compliance,
|
|
two settings (\texttt{ISPEC=10} and \texttt{ISPEC=11}) have been added to ILAENV
|
|
(\texttt{LAPACK/SRC/ilaenv.f}) to denote IEEE-754 compliance for NaN and
|
|
infinity arithmetic, respectively. By default, ILAENV assumes an IEEE
|
|
machine, and does a test for IEEE-754 compliance. \textbf{NOTE: If you
|
|
are installing LAPACK on a non-IEEE machine, you MUST modify ILAENV,
|
|
as this test inside ILAENV will crash!}
|
|
|
|
If \texttt{ILAENV( 10, $\ldots$ )} or \texttt{ILAENV( 11, $\ldots$ )} is
|
|
issued, then \texttt{ILAENV=1} is returned to signal IEEE-754 compliance,
|
|
and \texttt{ILAENV=0} if the architecture is non-IEEE-754 compliant.
|
|
|
|
Thus, for non-IEEE machines, the user must hard-code the setting of
|
|
(\texttt{ILAENV=0}) for (\texttt{ISPEC=10} and \texttt{ISPEC=11}) in the version
|
|
of \texttt{LAPACK/SRC/ilaenv.f} to be put in
|
|
the library. There are also specialized testing and timing\footnotemark[\value{footnote}] versions of
|
|
ILAENV that will also need to be modified.
|
|
\begin{itemize}
|
|
\item Testing/timing version of \texttt{LAPACK/TESTING/LIN/ilaenv.f}
|
|
\item Testing/timing version of \texttt{LAPACK/TESTING/EIG/ilaenv.f}
|
|
\item Testing/timing version of \texttt{LAPACK/TIMING/LIN/ilaenv.f}
|
|
\item Testing/timing version of \texttt{LAPACK/TIMING/EIG/ilaenv.f}
|
|
\end{itemize}
|
|
|
|
%Some new routines in LAPACK rely on IEEE-754 compliance, and if non-compliance
|
|
%is detected (via a call to the function ILAENV), alternative (slower)
|
|
%algorithms will be chosen.
|
|
%For further details, refer to the leading comments of routines such
|
|
%as \texttt{LAPACK/SRC/sstevr.f}.
|
|
|
|
The test program in \texttt{LAPACK/INSTALL/tstiee.f} checks an installation
|
|
architecture
|
|
to see if infinity arithmetic and NaN arithmetic are IEEE-754 compliant.
|
|
A warning message to the user is printed if non-compliance is detected.
|
|
This same test is performed inside the function ILAENV. If
|
|
\texttt{ILAENV( 10, $\ldots$ )} or \texttt{ILAENV( 11, $\ldots$ )} is
|
|
issued, then \texttt{ILAENV=1} is returned to signal IEEE-754 compliance,
|
|
and \texttt{ILAENV=0} if the architecture is non-IEEE-754 compliant.
|
|
|
|
To avoid this IEEE test being run every time you call
|
|
\texttt{ILAENV( 10, $\ldots$)} or \texttt{ILAENV( 11, $\ldots$ )}, we suggest
|
|
that the user hard-code the setting of
|
|
\texttt{ILAENV=1} or \texttt{ILAENV=0} in the version of \texttt{LAPACK/SRC/ilaenv.f} to be put in
|
|
the library. As mentioned above, there are also specialized testing and
|
|
timing\footnotemark[\value{footnote}] versions of ILAENV that will also need to be modified.
|
|
|
|
\subsection{Create the BLAS Library}
|
|
|
|
Ideally, a highly optimized version of the BLAS library already
|
|
exists on your machine.
|
|
In this case you can go directly to Section~\ref{testblas} to
|
|
make the BLAS test programs.
|
|
|
|
\begin{itemize}
|
|
\item[a)]
|
|
Go to \texttt{LAPACK} and edit the definition of \texttt{blaslib} in the
|
|
file \texttt{Makefile} to specify the data types desired, as in the example
|
|
in Section~\ref{toplevelmakefile}.
|
|
|
|
If you already have some of the BLAS, you will need to edit the file
|
|
\texttt{LAPACK/BLAS/SRC/Makefile} to comment out the lines
|
|
defining the BLAS you have.
|
|
|
|
\item[b)]
|
|
Type \texttt{make blaslib}.
|
|
The make command can be run more than once to add another
|
|
data type to the library if necessary.
|
|
\end{itemize}
|
|
|
|
\noindent
|
|
The BLAS library is created in \texttt{LAPACK/librefblas.a},
|
|
or in the user-defined location specified by \texttt{BLASLIB} in the file
|
|
\texttt{LAPACK/make.inc}.
|
|
|
|
\subsection{Run the BLAS Test Programs}\label{testblas}
|
|
|
|
Test programs for the Level 1, 2, and 3 BLAS are in the directory
|
|
\texttt{LAPACK/BLAS/TESTING}.
|
|
|
|
To compile and run the Level 1, 2, and 3 BLAS test programs,
|
|
go to \texttt{LAPACK} and type \texttt{make blas\_testing}. The executable
|
|
files are called \texttt{xblat\_s}, \texttt{xblat\_d}, \texttt{xblat\_c}, and
|
|
\texttt{xblat\_z}, where the \_ (underscore) is replaced by 1, 2, or 3,
|
|
depending upon the level of BLAS that it is testing. All executable and
|
|
output files are created in \texttt{LAPACK/BLAS/}.
|
|
For the Level 1 BLAS tests, the output file names are \texttt{sblat1.out},
|
|
\texttt{dblat1.out}, \texttt{cblat1.out}, and \texttt{zblat1.out}. For the Level
|
|
2 and 3 BLAS, the name of the output file is indicated on the first line of the
|
|
input file and is currently defined to be \texttt{sblat2.out} for
|
|
the Level 2 REAL version, and \texttt{sblat3.out} for the Level 3 REAL
|
|
version, with similar names for the other data types.
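
After the tests have run, it can be convenient to confirm that all twelve output files
are present. The following shell sketch assumes the default output file names described
above and that it is run from the directory in which the output files were created; the
function name \texttt{blas\_test\_outputs} is hypothetical.
\begin{verbatim}
# List the BLAS test output files named above: one per data type
# (s, d, c, z) and per BLAS level (1, 2, 3).
blas_test_outputs() {
    for p in s d c z; do
        for level in 1 2 3; do
            echo "${p}blat${level}.out"
        done
    done
}

# Report which of them are present in the current directory.
for f in $(blas_test_outputs); do
    if [ -f "$f" ]; then
        echo "found   $f"
    else
        echo "missing $f"
    fi
done
\end{verbatim}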
|
|
|
|
If the tests using the supplied data files were completed successfully,
|
|
consider whether the tests were sufficiently thorough.
|
|
For example, on a machine with vector registers, at least one value
|
|
of $N$ greater than the length of the vector registers should be used;
|
|
otherwise, important parts of the compiled code may not be
|
|
exercised by the tests.
|
|
If the tests were not successful, either because the program did not
|
|
finish or the test ratios did not pass the threshold, you will
|
|
probably have to find and correct the problem before continuing.
|
|
If you have been testing a system-specific
|
|
BLAS library, try using the Fortran BLAS for the routines that
|
|
did not pass the tests.
|
|
For more details on the BLAS test programs,
|
|
see \cite{BLAS2-test} and \cite{BLAS3-test}.
|
|
|
|
\subsection{Create the LAPACK Library}
|
|
|
|
\begin{itemize}
|
|
\item[a)]
|
|
Go to the directory \texttt{LAPACK} and edit the definition of
|
|
\texttt{lapacklib} in the file \texttt{Makefile} to specify the data types desired,
|
|
as in the example in Section~\ref{toplevelmakefile}.
|
|
|
|
\item[b)]
|
|
Type \texttt{make lapacklib}.
|
|
The make command can be run more than once to add another
|
|
data type to the library if necessary.
|
|
|
|
\end{itemize}
|
|
|
|
\noindent
|
|
The LAPACK library is created in \texttt{LAPACK/liblapack.a},
|
|
or in the user-defined location specified by \texttt{LAPACKLIB} in the file
|
|
\texttt{LAPACK/make.inc}.
|
|
|
|
\subsection{Create the Test Matrix Generator Library}
|
|
|
|
\begin{itemize}
|
|
\item[a)]
|
|
Go to the directory \texttt{LAPACK} and edit the definition of \texttt{tmglib}
|
|
in the file \texttt{Makefile} to specify the data types desired, as in the
|
|
example in Section~\ref{toplevelmakefile}.
|
|
|
|
\item[b)]
|
|
Type \texttt{make tmglib}.
|
|
The make command can be run more than once to add another
|
|
data type to the library if necessary.
|
|
|
|
\end{itemize}
|
|
|
|
\noindent
|
|
The test matrix generator library is created in \texttt{LAPACK/libtmglib.a},
|
|
or in the user-defined location specified by \texttt{TMGLIB} in the file
|
|
\texttt{LAPACK/make.inc}.
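
Before moving on to the test programs, one may wish to confirm that the three libraries
were actually created. The sketch below assumes the default library names given in this
guide and a POSIX shell; the function name \texttt{check\_libs} is hypothetical, and the
file names must be adjusted if \texttt{make.inc} specifies other locations.
\begin{verbatim}
# Report whether the three libraries exist in the given directory
# (default names assumed; adjust if make.inc overrides them).
check_libs() {
    dir=${1:-.}
    status=0
    for lib in librefblas.a liblapack.a libtmglib.a; do
        if [ -f "$dir/$lib" ]; then
            echo "ok      $dir/$lib"
        else
            echo "missing $dir/$lib"
            status=1
        fi
    done
    return $status
}

check_libs . || echo "some libraries are missing; rerun the make steps"
\end{verbatim}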
|
|
|
|
\subsection{Run the LAPACK Test Programs}
|
|
|
|
There are two distinct test programs for LAPACK routines
|
|
in each data type, one for the linear equation routines and
|
|
one for the eigensystem routines.
|
|
In each data type, there is one input file for testing the linear
|
|
equation routines and eighteen input files for testing the eigenvalue
|
|
routines.
|
|
The input files reside in \texttt{LAPACK/TESTING}.
|
|
For more information on the test programs and how to modify the
|
|
input files, please refer to LAPACK Working Note 41~\cite{WN41}.
|
|
% see Section~\ref{moretesting}.
|
|
|
|
If you do not wish to run each of the tests individually, you can
|
|
go to \texttt{LAPACK}, edit the definition \texttt{lapack\_testing} in the file
|
|
\texttt{Makefile} to specify the data types desired, and type \texttt{make
|
|
lapack\_testing}. This will
|
|
compile and run the tests as described in Sections~\ref{testlin}
and~\ref{testeig}.
|
|
|
|
%If you are installing LAPACK on a Silicon Graphics machine, you must
|
|
%modify the definition of \texttt{testing} to be
|
|
%\begin{verbatim}
|
|
%testing:
|
|
% ( cd TESTING; $(MAKE) -f Makefile.sgi )
|
|
%\end{verbatim}
|
|
|
|
\subsubsection{Testing the Linear Equations Routines}\label{testlin}
|
|
|
|
\begin{itemize}
|
|
|
|
\item[a)]
|
|
Go to \texttt{LAPACK/TESTING/LIN} and type \texttt{make} followed by the data types
|
|
desired. The executable files are called \texttt{xlintsts, xlintstc,
|
|
xlintstd}, or \texttt{xlintstz} and are created in \texttt{LAPACK/TESTING}.
|
|
|
|
\item[b)]
|
|
Go to \texttt{LAPACK/TESTING} and run the tests for each data type.
|
|
For the REAL version, the command is
|
|
\begin{list}{}{}
|
|
\item{} \texttt{xlintsts < stest.in > stest.out}
|
|
\end{list}
|
|
|
|
\noindent
|
|
The tests using \texttt{xlintstd}, \texttt{xlintstc}, and \texttt{xlintstz} are similar
|
|
with the leading `s' in the input and output file names replaced
|
|
by `d', `c', or `z'.
|
|
|
|
\end{itemize}
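
The naming pattern in step b) can be written as a short shell loop. This is a sketch
only: the function name \texttt{lin\_test\_commands} is hypothetical, and the leading
\texttt{./} assumes the executables are in the current directory
(\texttt{LAPACK/TESTING}). The function prints the commands rather than running them,
so that they can be inspected first.
\begin{verbatim}
# Print the linear equation test command for each requested data type.
lin_test_commands() {
    for p in "$@"; do
        echo "./xlintst${p} < ${p}test.in > ${p}test.out"
    done
}

lin_test_commands s d c z
\end{verbatim}
Piping the output to the shell, as in \texttt{lin\_test\_commands s d | sh}, would then
run the tests for the selected data types.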
|
|
|
|
If you encountered failures in this phase of the testing process, please
|
|
refer to Section~\ref{sendresults}.
|
|
|
|
\subsubsection{Testing the Eigensystem Routines}\label{testeig}
|
|
|
|
\begin{itemize}
|
|
|
|
\item[a)]
|
|
Go to \texttt{LAPACK/TESTING/EIG} and type \texttt{make} followed by the data types
|
|
desired. The executable files are called \texttt{xeigtsts,
|
|
xeigtstc, xeigtstd}, and \texttt{xeigtstz} and are created
|
|
in \texttt{LAPACK/TESTING}.
|
|
|
|
\item[b)]
|
|
Go to \texttt{LAPACK/TESTING} and run the tests for each data type.
|
|
The tests for the eigensystem routines use eighteen separate input files
|
|
for testing the nonsymmetric eigenvalue problem,
|
|
the symmetric eigenvalue problem, the banded symmetric eigenvalue
|
|
problem, the generalized symmetric eigenvalue
|
|
problem, the generalized nonsymmetric eigenvalue problem, the
|
|
singular value decomposition, the banded singular value decomposition,
|
|
the generalized singular value
|
|
decomposition, the generalized QR and RQ factorizations, the generalized
|
|
linear regression model, and the constrained linear least squares
|
|
problem.
|
|
The tests for the REAL version are as follows:
|
|
\begin{list}{}{}
|
|
\item \texttt{xeigtsts < nep.in > snep.out}
|
|
\item \texttt{xeigtsts < sep.in > ssep.out}
|
|
\item \texttt{xeigtsts < svd.in > ssvd.out}
|
|
\item \texttt{xeigtsts < sec.in > sec.out}
|
|
\item \texttt{xeigtsts < sed.in > sed.out}
|
|
\item \texttt{xeigtsts < sgg.in > sgg.out}
|
|
\item \texttt{xeigtsts < sgd.in > sgd.out}
|
|
\item \texttt{xeigtsts < ssg.in > ssg.out}
|
|
\item \texttt{xeigtsts < ssb.in > ssb.out}
|
|
\item \texttt{xeigtsts < sbb.in > sbb.out}
|
|
\item \texttt{xeigtsts < sbal.in > sbal.out}
|
|
\item \texttt{xeigtsts < sbak.in > sbak.out}
|
|
\item \texttt{xeigtsts < sgbal.in > sgbal.out}
|
|
\item \texttt{xeigtsts < sgbak.in > sgbak.out}
|
|
\item \texttt{xeigtsts < glm.in > sglm.out}
|
|
\item \texttt{xeigtsts < gqr.in > sgqr.out}
|
|
\item \texttt{xeigtsts < gsv.in > sgsv.out}
|
|
\item \texttt{xeigtsts < lse.in > slse.out}
|
|
\end{list}
|
|
The tests using \texttt{xeigtstc}, \texttt{xeigtstd}, and \texttt{xeigtstz} also
|
|
use the input files \texttt{nep.in}, \texttt{sep.in}, \texttt{svd.in},
|
|
\texttt{glm.in}, \texttt{gqr.in}, \texttt{gsv.in}, and \texttt{lse.in},
|
|
but the leading `s' in the other input file names must be changed
|
|
to `c', `d', or `z'.
|
|
\end{itemize}
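
Because seven of the input files are shared by all data types while the other eleven
carry the data type prefix, the eighteen commands are easy to get wrong by hand. The
following shell sketch generates them for one data type according to the rule stated
above; the function name \texttt{eig\_test\_commands} is hypothetical, and the leading
\texttt{./} assumes the executables are in the current directory
(\texttt{LAPACK/TESTING}). The commands are printed, not executed.
\begin{verbatim}
# Print the eighteen eigensystem test commands for one data type
# (s, d, c, or z).
eig_test_commands() {
    p=$1
    # Input files shared by all data types; the output file name
    # gets the data type prefix.
    for t in nep sep svd glm gqr gsv lse; do
        echo "./xeigtst${p} < ${t}.in > ${p}${t}.out"
    done
    # Input files whose names carry the data type prefix.
    for t in ec ed gg gd sg sb bb bal bak gbal gbak; do
        echo "./xeigtst${p} < ${p}${t}.in > ${p}${t}.out"
    done
}

eig_test_commands d
\end{verbatim}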
|
|
|
|
If you encountered failures in this phase of the testing process, please
|
|
refer to Section~\ref{sendresults}.
|
|
|
|
\subsection{Run the LAPACK Timing Programs (For LAPACK 3.0 and before)}
|
|
|
|
There are two distinct timing programs for LAPACK routines
|
|
in each data type, one for the linear equation routines and
|
|
one for the eigensystem routines. The timing program for the
|
|
linear equation routines is also used to time the BLAS.
|
|
We encourage you to conduct these timing experiments
|
|
in REAL and COMPLEX or in DOUBLE PRECISION and COMPLEX*16; it is
|
|
not necessary to send timing results in all four data types.
|
|
|
|
Two sets of input files are provided, a small set and a large set.
|
|
The small data sets are appropriate for a standard workstation or
|
|
other non-vector machine.
|
|
The large data sets are appropriate for supercomputers, vector
|
|
computers, and high-performance workstations.
|
|
We are mainly interested in results from the large data sets, and
|
|
it is not necessary to run both the large and small sets.
|
|
The values of N in the large data sets are about five times larger
|
|
than those in the small data set,
|
|
and the large data sets use additional values for parameters such as the
|
|
block size NB and the leading array dimension LDA.
|
|
The small data sets have \texttt{\_small} in their names, such as
\texttt{stime\_small.in}, and the large data sets have \texttt{\_large} in their names,
such as \texttt{stime\_large.in}.
|
|
Except as noted, the leading `s' in the input file name must be
|
|
replaced by `d', `c', or `z' for the other data types.
|
|
|
|
We encourage you to obtain timing results with the large data sets,
|
|
as this allows us to compare different machines.
|
|
If this would take too much time, suggestions for paring back the large
|
|
data sets are given in the instructions below.
|
|
We also encourage you to experiment with these timing
|
|
programs and send us any interesting results, such as results for
|
|
larger problems or for a wider range of block sizes.
|
|
The main programs are dimensioned for the large data sets,
|
|
so the parameters in the main program may have to be reduced in order
|
|
to run the small data sets on a small machine, or increased to run
|
|
experiments with larger problems.
|
|
|
|
The minimum time each subroutine will be timed is set to 0.0 in
|
|
the large data files and to 0.05 in the small data files, and on
|
|
many machines this value should be increased.
|
|
If the timing interval is not long
|
|
enough, the time for the subroutine after subtracting the overhead
|
|
may be very small or zero, resulting in megaflop rates that are
|
|
very large or zero. (To avoid division by zero, the megaflop rate is
|
|
set to zero if the time is less than or equal to zero.)
|
|
The minimum time that should be used depends on the machine and the
|
|
resolution of the clock.
|
|
|
|
For more information on the timing programs and how to modify the
|
|
input files, please refer to LAPACK Working Note 41~\cite{WN41}.
|
|
% see Section~\ref{moretiming}.
|
|
|
|
If you do not wish to run each of the timings individually, you can
|
|
go to \texttt{LAPACK}, edit the definition \texttt{lapack\_timing} in the file
|
|
\texttt{Makefile} to specify the data types desired, and type \texttt{make
|
|
lapack\_timing}. This will compile
|
|
and run the timings for the linear equation routines and the eigensystem
|
|
routines (see Sections~\ref{timelin} and~\ref{timeeig}).
|
|
|
|
%If you are installing LAPACK on a Silicon Graphics machine, you must
|
|
%modify the definition of \texttt{timing} to be
|
|
%\begin{verbatim}
|
|
%timing:
|
|
% ( cd TIMING; $(MAKE) -f Makefile.sgi )
|
|
%\end{verbatim}
|
|
|
|
If you encounter failures in any phase of the timing process, please
|
|
feel free to contact the authors as directed in Section~\ref{sendresults}.
|
|
Tell us the
|
|
type of machine on which the tests were run, the version of the operating
|
|
system, the compiler and compiler options that were used,
|
|
and details of the BLAS library or libraries that you used. You should
|
|
also include a copy of the output file in which the failure occurs.
|
|
|
|
Please note that the BLAS
|
|
timing runs will still need to be run as instructed in Section~\ref{timeblas}.
|
|
|
|
\subsubsection{Timing the Linear Equations Routines}\label{timelin}
|
|
|
|
The linear equation timing program is found in \texttt{LAPACK/TIMING/LIN}
|
|
and the input files are in \texttt{LAPACK/TIMING}.
|
|
Three input files are provided in each data type for timing the
|
|
linear equation routines, one for square matrices, one for band
|
|
matrices, and one for rectangular matrices. The small data sets for the REAL version
|
|
are \texttt{stime\_small.in}, \texttt{sband\_small.in}, and \texttt{stime2\_small.in}, respectively,
|
|
and the large data sets are
|
|
\texttt{stime\_large.in}, \texttt{sband\_large.in}, and \texttt{stime2\_large.in}.
|
|
|
|
The timing program for the least squares routines uses special instrumented
|
|
versions of the LAPACK routines to time individual sections of the code.
|
|
The first step in compiling the timing program is therefore to make a library
|
|
of the instrumented routines.
|
|
|
|
\begin{itemize}
|
|
\item[a)]
|
|
\begin{sloppypar}
|
|
To make a library of the instrumented LAPACK routines, first
|
|
go to \texttt{LAPACK/TIMING/LIN/LINSRC} and type \texttt{make} followed
|
|
by the data types desired, as in the examples of Section~\ref{toplevelmakefile}.
|
|
The library of instrumented code is created in
|
|
\texttt{LAPACK/TIMING/LIN/linsrc.a}.
|
|
\end{sloppypar}
|
|
|
|
\item[b)]
|
|
To make the linear equation timing programs,
|
|
go to \texttt{LAPACK/TIMING/LIN} and type \texttt{make} followed by the data
|
|
types desired, as in the examples in Section~\ref{toplevelmakefile}.
|
|
The executable files are called \texttt{xlintims},
|
|
\texttt{xlintimc}, \texttt{xlintimd}, and \texttt{xlintimz} and are created
|
|
in \texttt{LAPACK/TIMING}.
|
|
|
|
\item[c)]
|
|
Go to \texttt{LAPACK/TIMING} and
|
|
make any necessary modifications to the input files.
|
|
You may need to set the minimum time a subroutine will
|
|
be timed to a positive value, or to restrict the size of the tests
|
|
if you are using a computer with performance in between that of a
|
|
workstation and that of a supercomputer.
|
|
The computational requirements can be cut in half by using only one
|
|
value of LDA.
|
|
If it is necessary to also reduce the matrix sizes or the values of
|
|
the blocksize, corresponding changes should be made to the
|
|
BLAS input files (see Section~\ref{timeblas}).
|
|
|
|
\item[d)]
|
|
Run the programs for each data type you are using.
|
|
For the REAL version, the commands for the small data sets are
|
|
|
|
\begin{list}{}{}
|
|
\item{} \texttt{xlintims < stime\_small.in > stime\_small.out }
|
|
\item{} \texttt{xlintims < sband\_small.in > sband\_small.out }
|
|
\item{} \texttt{xlintims < stime2\_small.in > stime2\_small.out }
|
|
\end{list}
|
|
or the commands for the large data sets are
|
|
\begin{list}{}{}
|
|
\item{} \texttt{xlintims < stime\_large.in > stime\_large.out }
|
|
\item{} \texttt{xlintims < sband\_large.in > sband\_large.out }
|
|
\item{} \texttt{xlintims < stime2\_large.in > stime2\_large.out }
|
|
\end{list}
|
|
|
|
\noindent
|
|
Similar commands should be used for the other data types.
|
|
\end{itemize}
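
The commands in step d) follow a regular pattern in the data type and the data set size,
which the following shell sketch makes explicit. The function name
\texttt{lin\_timing\_commands} is hypothetical, and the leading \texttt{./} assumes the
executables are in the current directory (\texttt{LAPACK/TIMING}). The commands are
printed, not executed.
\begin{verbatim}
# Print the three linear equation timing commands for one data type
# and one data set size.
lin_timing_commands() {
    p=$1      # data type: s, d, c, or z
    size=$2   # data set: small or large
    for t in time band time2; do
        echo "./xlintim${p} < ${p}${t}_${size}.in > ${p}${t}_${size}.out"
    done
}

lin_timing_commands s large
\end{verbatim}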
|
|
|
|
\subsubsection{Timing the BLAS}\label{timeblas}
|
|
|
|
The linear equation timing program is also used to time the BLAS.
|
|
Three input files are provided in each data type for timing the Level
|
|
2 and 3 BLAS.
|
|
These input files time the BLAS using the matrix shapes encountered
|
|
in the LAPACK routines, and we will use the results to analyze the
|
|
performance of the LAPACK routines.
|
|
For the REAL version, the small data files are
|
|
\texttt{sblasa\_small.in}, \texttt{sblasb\_small.in}, and \texttt{sblasc\_small.in}
|
|
and the large data files are
|
|
\texttt{sblasa\_large.in}, \texttt{sblasb\_large.in}, and \texttt{sblasc\_large.in}.
|
|
There are three sets of inputs because there are three
|
|
parameters in the Level 3 BLAS, M, N, and K, and
|
|
in most applications one of these parameters is small (on the order
|
|
of the blocksize) while the other two are large (on the order of the
|
|
matrix size).
|
|
In \texttt{sblasa\_small.in}, M and N are large but K is
|
|
small, while in \texttt{sblasb\_small.in} the small parameter is M, and
|
|
in \texttt{sblasc\_small.in} the small parameter is N.
|
|
The Level 2 BLAS are timed only in the first data set, where K
|
|
is also used as the bandwidth for the banded routines.
|
|
|
|
\begin{itemize}
|
|
|
|
\item[a)]
|
|
Go to \texttt{LAPACK/TIMING} and
|
|
make any necessary modifications to the input files.
|
|
You may need to set the minimum time a subroutine will
|
|
be timed to a positive value.
|
|
If you modified the values of N or NB
|
|
in Section~\ref{timelin}, set M, N, and K accordingly.
|
|
The large parameters among M, N, and K
|
|
should be the same as the matrix sizes used in timing the linear
|
|
equation routines,
|
|
and the small parameter should be the same as the
|
|
blocksizes used in timing the linear equation routines.
|
|
If necessary, the large data set can be simplified by using only one
|
|
value of LDA.
|
|
|
|
\item[b)]
|
|
Run the programs for each data type you are using.
|
|
For the REAL version, the commands for the small data sets are
|
|
|
|
\begin{list}{}{}
|
|
\item{} \texttt{xlintims < sblasa\_small.in > sblasa\_small.out }
|
|
\item{} \texttt{xlintims < sblasb\_small.in > sblasb\_small.out }
|
|
\item{} \texttt{xlintims < sblasc\_small.in > sblasc\_small.out }
|
|
\end{list}
|
|
or the commands for the large data sets are
|
|
\begin{list}{}{}
|
|
\item{} \texttt{xlintims < sblasa\_large.in > sblasa\_large.out }
|
|
\item{} \texttt{xlintims < sblasb\_large.in > sblasb\_large.out }
|
|
\item{} \texttt{xlintims < sblasc\_large.in > sblasc\_large.out }
|
|
\end{list}
|
|
|
|
\noindent
|
|
Similar commands should be used for the other data types.
|
|
\end{itemize}
|
|
|
|
\subsubsection{Timing the Eigensystem Routines}\label{timeeig}
|
|
|
|
The eigensystem timing program is found in \texttt{LAPACK/TIMING/EIG}
|
|
and the input files are in \texttt{LAPACK/TIMING}.
|
|
Four input files are provided in each data type for timing the
|
|
eigensystem routines,
|
|
one for the generalized nonsymmetric eigenvalue problem,
|
|
one for the nonsymmetric eigenvalue problem,
|
|
one for the symmetric and generalized symmetric eigenvalue problem,
|
|
and one for the singular value decomposition.
|
|
For the REAL version, the small data sets are called \texttt{sgeptim\_small.in},
|
|
\texttt{sneptim\_small.in}, \texttt{sseptim\_small.in}, and \texttt{ssvdtim\_small.in}, respectively,
|
|
and the large data sets are called \texttt{sgeptim\_large.in}, \texttt{sneptim\_large.in},
|
|
\texttt{sseptim\_large.in}, and \texttt{ssvdtim\_large.in}.
|
|
Each of the four input files reads a different set of parameters,
|
|
and the format of the input is indicated by a 3-character code
|
|
on the first line.
|
|
|
|
The timing program for eigenvalue/singular value routines accumulates
|
|
the operation count as the routines are executing using special
|
|
instrumented versions of the LAPACK routines. The first step in
|
|
compiling the timing program is therefore to make a library of the
|
|
instrumented routines.
|
|
|
|
\begin{itemize}
|
|
\item[a)]
|
|
\begin{sloppypar}
|
|
To make a library of the instrumented LAPACK routines, first
|
|
go to \texttt{LAPACK/TIMING/EIG/EIGSRC} and type \texttt{make} followed
|
|
by the data types desired, as in the examples of Section~\ref{toplevelmakefile}.
|
|
The library of instrumented code is created in
|
|
\texttt{LAPACK/TIMING/EIG/eigsrc.a}.
|
|
\end{sloppypar}
|
|
|
|
\item[b)]
|
|
To make the eigensystem timing programs,
|
|
go to \texttt{LAPACK/TIMING/EIG} and
|
|
type \texttt{make} followed by the data types desired, as in the examples
|
|
of Section~\ref{toplevelmakefile}. The executable files are called
|
|
\texttt{xeigtims}, \texttt{xeigtimc}, \texttt{xeigtimd}, and \texttt{xeigtimz}
|
|
and are created in \texttt{LAPACK/TIMING}.
|
|
|
|
\item[c)]
|
|
Go to \texttt{LAPACK/TIMING} and
|
|
make any necessary modifications to the input files.
|
|
You may need to set the minimum time a subroutine will
|
|
be timed to a positive value, or to restrict the number of tests
|
|
if you are using a computer with performance in between that of a
|
|
workstation and that of a supercomputer.
|
|
Instead of decreasing the matrix dimensions to reduce the time,
|
|
it would be better to reduce the number of matrix types to be timed,
|
|
since the performance varies more with the matrix size than with the
|
|
type. For example, for the nonsymmetric eigenvalue routines,
|
|
you could use only one matrix of type 4 instead of four matrices of
|
|
types 1, 3, 4, and 6.
|
|
Refer to LAPACK Working Note 41~\cite{WN41} for further details.
|
|
% See Section~\ref{moretiming} for further details.
|
|
|
|
\item[d)]
|
|
Run the programs for each data type you are using.
|
|
For the REAL version, the commands for the small data sets are
|
|
|
|
\begin{list}{}{}
|
|
\item{} \texttt{xeigtims < sgeptim\_small.in > sgeptim\_small.out }
|
|
\item{} \texttt{xeigtims < sneptim\_small.in > sneptim\_small.out }
|
|
\item{} \texttt{xeigtims < sseptim\_small.in > sseptim\_small.out }
|
|
\item{} \texttt{xeigtims < ssvdtim\_small.in > ssvdtim\_small.out }
|
|
\end{list}
|
|
or the commands for the large data sets are
|
|
\begin{list}{}{}
|
|
\item{} \texttt{xeigtims < sgeptim\_large.in > sgeptim\_large.out }
|
|
\item{} \texttt{xeigtims < sneptim\_large.in > sneptim\_large.out }
|
|
\item{} \texttt{xeigtims < sseptim\_large.in > sseptim\_large.out }
|
|
\item{} \texttt{xeigtims < ssvdtim\_large.in > ssvdtim\_large.out }
|
|
\end{list}
|
|
|
|
\noindent
|
|
Similar commands should be used for the other data types.
|
|
\end{itemize}
|
|
|
|
\subsection{Send the Results to Tennessee}\label{sendresults}
|
|
|
|
Congratulations! You have now finished installing, testing, and
|
|
timing LAPACK. If you encountered failures in any phase of the
|
|
testing or timing process, please
|
|
consult our \texttt{release\_notes} file on netlib.
|
|
\begin{quote}
|
|
\url{http://www.netlib.org/lapack/release\_notes}
|
|
\end{quote}
|
|
This file contains machine-dependent installation clues which hopefully will
|
|
alleviate your difficulties or at least let you know that other users
|
|
have had similar difficulties on that machine. If there is not an entry
|
|
for your machine or the suggestions do not fix your problem, please feel
|
|
free to contact the authors at
|
|
\begin{list}{}{}
|
|
\item \href{mailto:lapack@cs.utk.edu}{\texttt{lapack@cs.utk.edu}}.
|
|
\end{list}
|
|
Tell us the
|
|
type of machine on which the tests were run, the version of the operating
|
|
system, the compiler and compiler options that were used,
|
|
and details of the BLAS library or libraries that you used. You should
|
|
also include a copy of the output file in which the failure occurs.
|
|
|
|
We would like to keep our \texttt{release\_notes} file as up-to-date as possible.
|
|
Therefore, if you do not see an entry for your machine, please contact us
|
|
with your testing results.
|
|
|
|
Comments and suggestions are also welcome.
|
|
|
|
We encourage you to make the LAPACK library available to your
|
|
users and provide us with feedback from their experiences.
|
|
%This release of LAPACK is not guaranteed to be compatible
|
|
%with any previous test release.
|
|
|
|
\subsection{Get support}\label{getsupport}
|
|
First, consult the complete installation manual in LAPACK Working Note 41~\cite{WN41}.
If you still cannot solve your problem, you have two options:
|
|
\begin{itemize}
|
|
\item
|
|
either post a message on the LAPACK forum
|
|
\begin{quote}
|
|
\url{http://icl.cs.utk.edu/lapack-forum}
|
|
\end{quote}
|
|
\item
|
|
or send an email to the LAPACK mailing list:
|
|
\begin{list}{}{}
|
|
\item \href{mailto:lapack@cs.utk.edu}{\texttt{lapack@cs.utk.edu}}.
|
|
\end{list}
|
|
\end{itemize}
|
|
\section*{Acknowledgments}
|
|
|
|
Ed Anderson and Susan Blackford contributed to previous versions of this report.
|
|
|
|
\appendix
|
|
|
|
\chapter{Caveats}\label{appendixd}
|
|
|
|
In this appendix we list a few of the machine-specific difficulties we
|
|
have
|
|
encountered in our own experience with LAPACK. A more detailed list
|
|
of machine-dependent problems, bugs, and compiler errors encountered
|
|
in the LAPACK installation process is maintained
|
|
on \emph{netlib}.
|
|
\begin{quote}
|
|
\url{http://www.netlib.org/lapack/release\_notes}
|
|
\end{quote}
|
|
|
|
We assume the user has installed the machine-specific routines
|
|
correctly and that the Level 1, 2 and 3 BLAS test programs have run
|
|
successfully, so we do not list any warnings associated with those
|
|
routines.
|
|
|
|
\section{\texttt{LAPACK/make.inc}}
|
|
|
|
All machine-specific
|
|
parameters are specified in the file \texttt{LAPACK/make.inc}.
|
|
|
|
The first line of this \texttt{make.inc} file is:
|
|
\begin{quote}
|
|
SHELL = /bin/sh
|
|
\end{quote}
|
|
and will need to be modified to \texttt{SHELL = /sbin/sh} if you are
|
|
installing LAPACK on an SGI architecture.
|
|
|
|
\section{ETIME}
|
|
|
|
On HPPA architectures,
|
|
the compiler and linker flag \texttt{+U77} should be included to access
|
|
the function \texttt{ETIME}.

\section{ILAENV and IEEE-754 compliance}

%By default, ILAENV (\texttt{LAPACK/SRC/ilaenv.f}) assumes an IEEE and IEEE-754
%compliant architecture, and thus sets (\texttt{ILAENV=1}) for (\texttt{ISPEC=10})
%and (\texttt{ISPEC=11}) settings in ILAENV.
%
%If you are installing LAPACK on a non-IEEE machine, you MUST modify ILAENV,
%as this test inside ILAENV will crash!

Because some new routines in LAPACK rely on IEEE-754 compliance,
two settings (\texttt{ISPEC=10} and \texttt{ISPEC=11}) have been added to ILAENV
(\texttt{LAPACK/SRC/ilaenv.f}) to denote IEEE-754 compliance for NaN and
infinity arithmetic, respectively. By default, ILAENV assumes an IEEE
machine and runs a test for IEEE-754 compliance. \textbf{NOTE: If you
are installing LAPACK on a non-IEEE machine, you MUST modify ILAENV,
as this test inside ILAENV will crash!}

Thus, for non-IEEE machines, the user must hard-code the setting
\texttt{ILAENV=0} for both \texttt{ISPEC=10} and \texttt{ISPEC=11} in the version
of \texttt{LAPACK/SRC/ilaenv.f} that is built into
the library. For further details, refer to section~\ref{testieee}.

Be aware
that some compilers for IEEE machines do not enforce IEEE-754 compliance by default, and
the user must explicitly set a compiler flag to enable it.

On SGIs, for example, you must set the \texttt{-OPT:IEEE\_NaN\_inf=ON} compiler
flag to enable IEEE-754 compliance.

Lastly, the test inside ILAENV that detects IEEE-754 compliance will
raise IEEE exceptions for ``Divide by Zero'' and ``Invalid Operation''.
If you are installing on a machine that issues IEEE exception
warning messages (like a Sun SPARCstation), you can disregard these
messages. To avoid these messages, you can hard-code the values
inside ILAENV as explained in section~\ref{testieee}.

\section{Lack of \texttt{/tmp} space}

If \texttt{/tmp} space is small (i.e., less than approximately 16~MB) on your
system, you may run out of space
when compiling. There are a few possible solutions to this problem.
\begin{enumerate}
\item You can ask your system administrator to increase the size of the
\texttt{/tmp} partition.
\item You can change the environment variable \texttt{TMPDIR} to point to
your home directory for temporary space. For example,
\begin{quote}
\texttt{setenv TMPDIR /home/userid/}
\end{quote}
where \texttt{/home/userid/} is the user's home directory.
\item If your archive command has an \texttt{l} option, you can change the
archive command to \texttt{ar crl} so that the
archive command will place temporary files only in the current working
directory rather than in the default temporary directory \texttt{/tmp}.
\end{enumerate}
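
The \texttt{setenv} command in the second item is \texttt{csh} syntax. In a Bourne-style shell such as \texttt{sh}, \texttt{ksh}, or \texttt{bash}, the same setting can be sketched as follows, where \texttt{/home/userid} is again only a placeholder for your own home directory:
\begin{verbatim}
TMPDIR=/home/userid
export TMPDIR
\end{verbatim}
Whether a particular compiler honors \texttt{TMPDIR} depends on the compiler, so treat this as something to try rather than a guaranteed fix.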

\section{BLAS}

If you suspect a BLAS-related problem and you are linking
with an optimized version of the BLAS, we strongly suggest,
as a first step, that you link to the Fortran~77 version of
the suspected BLAS routine and see whether the error disappears.

We have included test programs for the Level 1 BLAS.
Users should therefore beware of a common problem in machine-specific
implementations of xNRM2,
the function to compute the 2-norm of a vector.
The Fortran version of xNRM2 avoids underflow or overflow
by scaling intermediate results, but some library versions of xNRM2
are not so careful about scaling.
If xNRM2 is implemented without scaling intermediate results, some of
the LAPACK test ratios may be unusually high, or
a floating point exception may occur in the problems scaled near
underflow or overflow.
The solution to these problems is to link the Fortran version of
xNRM2 with the test program. \emph{On some CRAY architectures, the Fortran~77
version of xNRM2 should be used.}
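
The effect of scaling can be illustrated with a short worked equation. This is one standard way to scale, given as a sketch only; it is not claimed to be the exact algorithm used in the reference xNRM2, and the symbols $x$, $n$, and $s$ are introduced here purely for illustration. For a nonzero vector $x$ of length $n$, let $s = \max_{1 \le i \le n} |x_i|$. Then
\[
  \|x\|_2 \;=\; \sqrt{\sum_{i=1}^{n} x_i^2}
          \;=\; s \, \sqrt{\sum_{i=1}^{n} \left( \frac{x_i}{s} \right)^{2}} .
\]
Every ratio $x_i/s$ has magnitude at most 1, so the scaled sum lies between 1 and $n$ and cannot overflow. For example, take $x = (3 \times 10^{200},\, 4 \times 10^{200})$ in IEEE double precision. The unscaled sum needs $x_1^2 = 9 \times 10^{400}$, which exceeds the overflow threshold of about $1.8 \times 10^{308}$, whereas the scaled form uses $s = 4 \times 10^{200}$ and gives
\[
  \|x\|_2 \;=\; 4 \times 10^{200} \sqrt{0.75^2 + 1^2} \;=\; 4 \times 10^{200} \times 1.25 \;=\; 5 \times 10^{200} .
\]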

\section{Optimization}

If a large number of test failures occur for a specific matrix type
or operation, there may be an optimization problem with
your compiler. Try reducing the level of
optimization, or eliminating optimization entirely, for those routines,
and then rerun the tests to see whether the failures disappear.
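
As a minimal sketch, the bluntest way to try this is to lower the optimization level for the whole library in \texttt{LAPACK/make.inc} and rebuild. The variable name \texttt{FFLAGS} and the flag spellings \texttt{-O2} and \texttt{-O0} are assumptions made for illustration; check your own \texttt{make.inc} and compiler documentation for the names that actually apply.
\begin{verbatim}
# before (assumed): FFLAGS = -O2
FFLAGS = -O0
\end{verbatim}
If the failures disappear, optimization can be restored for the library as a whole and reduced only for the routines involved in the failing tests.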

%LAPACK is written in Fortran 77. Prospective users with only a
%Fortran 66 compiler will not be able to use this package.

\section{Compiling testing/timing drivers}

The testing and timing main programs (xCHKAA, xCHKEE, xTIMAA, and
xTIMEE)
declare a large number of local variables. It is therefore vitally
important to know whether your compiler by default allocates local
variables statically or on the stack. Compilers that place local
variables on the stack commonly cause a stack
overflow at run time in the testing or timing process. In that case
there are two options: increase the stack size, or force all local variables
to be allocated statically.

On HPPA architectures, the
compiler and linker flag \texttt{-K} should be used when compiling these testing
and timing main programs to avoid such a stack overflow; that is, set
\texttt{FFLAGS\_DRV = -K} in the \texttt{LAPACK/make.inc} file.

For similar reasons,
on SGI architectures, the compiler and linker flag \texttt{-static} should be
used; that is, set \texttt{FFLAGS\_DRV = -static} in the \texttt{LAPACK/make.inc} file.
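
The other option, increasing the stack size, is usually exercised in the shell before the test programs are run. As a sketch, assuming your system permits the limit to be raised, the usual built-in command in \texttt{csh} or \texttt{tcsh} is
\begin{verbatim}
limit stacksize unlimited
\end{verbatim}
and in \texttt{bash} or \texttt{ksh} it is
\begin{verbatim}
ulimit -s unlimited
\end{verbatim}
A system-imposed hard limit may prevent this, in which case static allocation through \texttt{FFLAGS\_DRV} remains the alternative.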

\section{IEEE arithmetic}

Some of our test matrices are scaled near overflow or underflow,
but on the Crays, problems with the arithmetic near overflow and
underflow forced us to scale by only the square root of overflow
and underflow.
The LAPACK auxiliary routine SLABAD (or DLABAD) is called to
take the square root of underflow and overflow in cases where the
full range could cause difficulties.
We assume we are on a Cray if $\log_{10} (\mathrm{overflow})$
is greater than 2000
and take the square root of underflow and overflow in this case.
The test in SLABAD is as follows:
\begin{verbatim}
      IF( LOG10( LARGE ).GT.2000. ) THEN
         SMALL = SQRT( SMALL )
         LARGE = SQRT( LARGE )
      END IF
\end{verbatim}
Users of other machines with similar restrictions on the effective
range of usable numbers may have to modify this test so that the
square roots are done on their machine as well. \emph{Usually on
HPPA architectures, a similar restriction in SLABAD should be enforced
for all testing involving complex arithmetic.}
SLABAD is located in \texttt{LAPACK/SRC}.
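
To see why a threshold of 2000 separates the two cases, it helps to compare exponent ranges. The figures below are approximate and are quoted as background; they are not taken from the SLABAD source. For IEEE single and double precision the overflow thresholds are about $3.4 \times 10^{38}$ and $1.8 \times 10^{308}$, so
\[
  \log_{10} (3.4 \times 10^{38}) \approx 38.5 ,
  \qquad
  \log_{10} (1.8 \times 10^{308}) \approx 308.3 ,
\]
both far below 2000, and the test leaves SMALL and LARGE unchanged. On a machine whose overflow threshold is near $10^{2465}$ (roughly the range of the older Cray floating-point format, stated here as an assumption), the test succeeds and the usable overflow threshold is reduced to about
\[
  \sqrt{10^{2465}} \approx 10^{1232} ,
\]
with the underflow threshold treated in the same way.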

For machines which have a narrow exponent range or lack gradual
underflow (DEC VAXes for example), it is not uncommon to experience
failures in \texttt{sec.out} and/or \texttt{dec.out} with SLAQTR/DLAQTR or DTRSYL.
The failures in SLAQTR/DLAQTR and DTRSYL
occur with test problems which are very badly scaled when the norm of
the solution is very close to the underflow
threshold (or even underflows to zero). We believe that these failures
could probably be avoided by an even greater degree of care in scaling,
but we did not want to delay the release of LAPACK any further. These
tests pass successfully on most other machines. An example failure in
\texttt{dec.out} on a MicroVAX II looks like the following:

\begin{verbatim}
Tests of the Nonsymmetric eigenproblem condition estimation routines
DLALN2, DLASY2, DLANV2, DLAEXC, DTRSYL, DTREXC, DTRSNA, DTRSEN, DLAQTR

Relative machine precision (EPS) = 0.277556D-16
Safe minimum (SFMIN) = 0.587747D-38

Routines pass computational tests if test ratio is less than 20.00

DEC routines passed the tests of the error exits ( 35 tests done)
Error in DTRSYL: RMAX = 0.155D+07
LMAX = 5323 NINFO= 1600 KNT= 27648
Error in DLAQTR: RMAX = 0.344D+04
LMAX = 15792 NINFO= 26720 KNT= 45000
\end{verbatim}

\section{Timing programs}

In the eigensystem timing program, calls are made to the LINPACK
and EISPACK equivalents of the LAPACK routines to allow a direct
comparison of performance measures.
In some cases we have increased the maximum number of
iterations in the LINPACK and EISPACK routines to allow
them to converge for our test problems, but
even this may not be enough.
One goal of the LAPACK project is to improve the convergence
properties of these routines, so error messages in the output
file indicating that a LINPACK or EISPACK routine did not
converge should not be regarded with alarm.

In the eigensystem timing program, we have equivalenced some work
arrays and then passed them to a subroutine, where both arrays are
modified. This is a violation of the Fortran~77 standard, which
says ``if a subprogram reference causes a dummy argument in the
referenced subprogram to become associated with another dummy
argument in the referenced subprogram, neither dummy argument may
become defined during execution of the subprogram.''\footnote{ANSI X3.9-1978, sec.~15.9.3.6.}
If this causes any difficulties, the equivalence
can be commented out as explained in the comments for the main
eigensystem timing programs.

%\section*{MACHINE-SPECIFIC DIFFICULTIES}
%Some IBM compilers do not recognize DBLE as a generic function as used
%in LAPACK. The software tools we use to convert from single precision
%to double precision convert REAL(C) and AIMAG(C), where C is COMPLEX,
%to DBLE(Z) and DIMAG(Z), where Z is COMPLEX*16, but
%IBM compilers use DREAL(Z) and DIMAG(Z) to take the real and
%imaginary parts of a double complex number.
%IBM users can fix this problem by changing DBLE to DREAL when the
%argument of DBLE is COMPLEX*16.
%
%IBM compilers do not permit the data type COMPLEX*16 in a FUNCTION
%subprogram definition. The data type on the first line of the
%function subprogram must be changed from COMPLEX*16 to DOUBLE COMPLEX
%for the following functions:
%
%\begin{tabbing}
%\dent ZLATMOO \= from the test matrix generator library \kill
%\dent ZBEG \> from the Level 2 BLAS test program \\
%\dent ZBEG \> from the Level 3 BLAS test program \\
%\dent ZLADIV \> from the LAPACK library \\
%\dent ZLARND \> from the test matrix generator library \\
%\dent ZLATM2 \> from the test matrix generator library \\
%\dent ZLATM3 \> from the test matrix generator library
%\end{tabbing}
%The functions ZDOTC and ZDOTU from the Level 1 BLAS are already
%declared DOUBLE COMPLEX. If that doesn't work, try the declaration
%COMPLEX FUNCTION*16.


\newpage
\addcontentsline{toc}{section}{Bibliography}

\begin{thebibliography}{99}

\bibitem{LUG}
E. Anderson, Z. Bai, C. Bischof, J. Demmel, J. Dongarra,
J. Du Croz, A. Greenbaum, S. Hammarling, A. McKenney,
S. Ostrouchov, and D. Sorensen,
\textit{LAPACK Users' Guide}, Second Edition,
{SIAM}, Philadelphia, PA, 1995.

\bibitem{WN16}
E. Anderson and J. Dongarra,
\textit{LAPACK Working Note 16:
Results from the Initial Release of LAPACK},
University of Tennessee, CS-89-89, November 1989.

\bibitem{WN41}
E. Anderson, J. Dongarra, and S. Ostrouchov,
\textit{LAPACK Working Note 41:
Installation Guide for LAPACK},
University of Tennessee, CS-92-151, February 1992 (revised June 1999).

\bibitem{WN5}
C. Bischof, J. Demmel, J. Dongarra, J. Du Croz, A. Greenbaum,
S. Hammarling, and D. Sorensen,
\textit{LAPACK Working Note 5: Provisional Contents},
Argonne National Laboratory, ANL-88-38, September 1988.

\bibitem{WN13}
Z. Bai, J. Demmel, and A. McKenney,
\textit{LAPACK Working Note 13: On the Conditioning of the Nonsymmetric
Eigenvalue Problem: Theory and Software},
University of Tennessee, CS-89-86, October 1989.

\bibitem{XBLAS}
X. S. Li, J. W. Demmel, D. H. Bailey, G. Henry, Y. Hida, J. Iskandar,
W. Kahan, S. Y. Kang, A. Kapur, M. C. Martin, B. J. Thompson, T. Tung,
and D. J. Yoo, ``Design, implementation and testing of extended
and mixed precision BLAS,''
\textit{ACM Trans. Math. Soft.}, 28, 2:152--205, June 2002.

\bibitem{BLAS3}
J. Dongarra, J. Du Croz, I. Duff, and S. Hammarling,
``A Set of Level 3 Basic Linear Algebra Subprograms,''
\textit{ACM Trans. Math. Soft.}, 16, 1:1--17, March 1990.
%Argonne National Laboratory, ANL-MCS-P88-1, August 1988.

\bibitem{BLAS3-test}
J. Dongarra, J. Du Croz, I. Duff, and S. Hammarling,
``A Set of Level 3 Basic Linear Algebra Subprograms:
Model Implementation and Test Programs,''
\textit{ACM Trans. Math. Soft.}, 16, 1:18--28, March 1990.
%Argonne National Laboratory, ANL-MCS-TM-119, June 1988.

\bibitem{BLAS2}
J. Dongarra, J. Du Croz, S. Hammarling, and R. Hanson,
``An Extended Set of Fortran Basic Linear Algebra Subprograms,''
\textit{ACM Trans. Math. Soft.}, 14, 1:1--17, March 1988.

\bibitem{BLAS2-test}
J. Dongarra, J. Du Croz, S. Hammarling, and R. Hanson,
``An Extended Set of Fortran Basic Linear Algebra Subprograms:
Model Implementation and Test Programs,''
\textit{ACM Trans. Math. Soft.}, 14, 1:18--32, March 1988.

\bibitem{BLAS1}
C. L. Lawson, R. J. Hanson, D. R. Kincaid, and F. T. Krogh,
``Basic Linear Algebra Subprograms for Fortran Usage,''
\textit{ACM Trans. Math. Soft.}, 5, 3:308--323, September 1979.

\end{thebibliography}

\end{document}