|
- \documentclass[11pt]{report}
-
- \usepackage{indentfirst}
- \usepackage[body={6in,8.5in}]{geometry}
- \usepackage{hyperref}
- \usepackage{graphicx}
- \DeclareGraphicsRule{.ps}{eps}{}{}
-
- \renewcommand{\thesection}{\arabic{section}}
- \setcounter{tocdepth}{3}
- \setcounter{secnumdepth}{3}
-
- \begin{document}
- \begin{center}
- {\Large LAPACK Working Note 81\\
- Quick Installation Guide for LAPACK on Unix Systems\footnote{This work was
- supported by NSF Grant No. ASC-8715728 and NSF Grant No. 0444486}}
- \end{center}
- \begin{center}
- % Edward Anderson\footnote{Current address: Cray Research Inc.,
- % 655F Lone Oak Drive, Eagan, MN 55121},
- The LAPACK Authors\\
- Department of Computer Science \\
- University of Tennessee \\
- Knoxville, Tennessee 37996-1301 \\
- \end{center}
- \begin{center}
- REVISED: VERSION 3.1.1, February 2007 \\
- REVISED: VERSION 3.2.0, November 2008
- \end{center}
-
- \begin{center}
- Abstract
- \end{center}
- This working note describes how to install and test version 3.2.0
- of LAPACK, a linear algebra package for high-performance
- computers, on a Unix system. The timing routines are not included in
- release 3.2.0, so the parts of this note that discuss timing apply to release 3.0. Also,
- version 3.2.0 contains many prototype routines that need user feedback.
- Non-Unix installation instructions and
- further details of the testing and timing suites are contained only in
- LAPACK Working Note 41, not in this abbreviated version.
- %Separate instructions are provided for the Unix and non-Unix
- %versions of the test package.
- %Further details are also given on the design of the test and timing
- %programs.
- \newpage
-
- \tableofcontents
-
- \newpage
- % Introduction to Implementation Guide
-
- \section{Introduction}
-
- LAPACK is a linear algebra library for high-performance
- computers.
- The library includes Fortran subroutines for
- the analysis and solution of systems of simultaneous linear algebraic
- equations, linear least-squares problems, and matrix eigenvalue
- problems.
- Our approach to achieving high efficiency is based on the use of
- a standard set of Basic Linear Algebra Subprograms (the BLAS),
- which can be optimized for each computing environment.
- By confining most of the computational work to the BLAS,
- the subroutines should be
- transportable and efficient across a wide range of computers.
-
- This working note describes how to install, test, and time this
- release of LAPACK on a Unix System.
-
- The instructions for installing, testing, and timing
- \footnote{Timing routines are provided only in LAPACK 3.0 and earlier.}
- are designed for a person whose
- responsibility is the maintenance of a mathematical software library.
- We assume the installer has experience in compiling and running
- Fortran programs and in creating object libraries.
- The installation process involves untarring the file, creating a set of
- libraries, and compiling and running the test and timing programs
- \footnotemark[\value{footnote}].
-
- %This guide combines the instructions for the Unix and non-Unix
- %versions of the LAPACK test package (the non-Unix version is in Appendix
- %~\ref{appendixe}).
- %At this time, the non-Unix version of LAPACK can only be obtained
- %after first untarring the Unix tar tape and then following the instructions in
- %Appendix ~\ref{appendixe}.
-
- Section~\ref{fileformat} describes how the files are organized in the
- distribution file, and
- Section~\ref{overview} gives a general overview of the parts of the test package.
- Step-by-step instructions appear in Section~\ref{installation}.
- %for the Unix version and in the appendix for the non-Unix version.
-
- Users who want additional information should refer to LAPACK
- Working Note 41.
- % Sections~\ref{moretesting}
- %and ~\ref{moretiming} give
- %details of the test and timing programs and their input files.
- %Appendices ~\ref{appendixa} and ~\ref{appendixb} briefly describe
- %the LAPACK routines and auxiliary routines provided
- %in this release.
- %Appendix ~\ref{appendixc} lists the operation counts we have computed
- %for the BLAS and for some of the LAPACK routines.
- Appendix~\ref{appendixd}, entitled ``Caveats'', is a compendium of the known
- problems from our own experiences, with suggestions on how to
- overcome them.
-
- \textbf{It is strongly advised that the user read Appendix
- A before proceeding with the installation process.}
- %Appendix E contains the execution times of the different test
- %and timing runs on two sample machines.
- %Appendix ~\ref{appendixe} contains the instructions to install LAPACK on a non-Unix
- %system.
-
- \section{Revisions Since the First Public Release}
-
- Since its first public release in February 1992, LAPACK has had
- several updates, which have introduced new routines
- and extended the functionality of existing routines. The first
- update,
- June 30, 1992, was version 1.0a; the second update, October 31, 1992,
- was version 1.0b; the third update, March 31, 1993, was version 1.1;
- version 2.0 on September 30, 1994, coincided with the release of the
- Second Edition of the LAPACK Users' Guide;
- version 3.0 on June 30, 1999, coincided with the release of the Third Edition of
- the LAPACK Users' Guide;
- version 3.1 was released in November 2006;
- version 3.1.1 was released in February 2007;
- and version 3.2.0 was released in November 2008.
-
- All LAPACK routines reflect the current version number with the date
- on the routine indicating when it was last modified.
- For more information on revisions in the latest release, please refer
- to the \texttt{revisions.info} file in the lapack directory on netlib.
- \begin{quote}
- \url{http://www.netlib.org/lapack/revisions.info}
- \end{quote}
-
- %The distribution \texttt{tar} file \texttt{lapack.tar.z} that is
- %available on netlib is always the most up-to-date.
- %
- %On-line manpages (troff files) for LAPACK driver and computational
- %routines, as well as most of the BLAS routines, are available via
- %the \texttt{lapack} index on netlib.
-
- \section{File Format}\label{fileformat}
-
- The software for LAPACK is distributed in the form of a
- gzipped tar file (via anonymous ftp or the World Wide Web),
- which contains the Fortran source for LAPACK,
- the Basic Linear Algebra Subprograms
- (the Level 1, 2, and 3 BLAS) needed by LAPACK, the testing programs,
- and the timing programs\footnotemark[\value{footnote}].
- Users who wish to have a non-Unix installation should refer to LAPACK
- Working Note 41,
- although the overview in Section~\ref{overview} applies to both the Unix and non-Unix
- versions.
- %Users who wish to have a non-Unix installation should go to Appendix ~\ref{appendixe},
- %although the overview in section ~\ref{overview} applies to both the Unix and non-Unix
- %versions.
-
- The package may be downloaded from the World Wide Web at
- the URL:
- \begin{quote}
- \url{http://www.netlib.org/lapack/lapack.tgz}
- \end{quote}
-
- Or, you can retrieve the file via anonymous ftp at netlib:
-
- \begin{verbatim}
- ftp ftp.netlib.org
- login: anonymous
- password: <your email address>
- cd lapack
- binary
- get lapack.tgz
- quit
- \end{verbatim}
-
- The software in the \texttt{tar} file
- is organized in a number of essential directories as shown
- in Figure 1. Please note that this figure does not reflect everything
- that is contained in the \texttt{LAPACK} directory. Input and instructional
- files are also located at various levels.
- \begin{figure}
- \vspace{11pt}
- \centerline{\includegraphics[width=6.5in,height=3in]{org2.ps}}
- \caption{Unix organization of LAPACK 3.0}
- \vspace{11pt}
- \end{figure}
- Libraries are created in the LAPACK directory and
- executable files are created in one of the directories BLAS, TESTING,
- or TIMING\footnotemark[\value{footnote}]. Input files for the test and
- timing\footnotemark[\value{footnote}] programs are also
- found in these three directories so that testing may be carried out
- in the directories LAPACK/BLAS, LAPACK/TESTING, and LAPACK/TIMING\footnotemark[\value{footnote}].
- A top-level makefile in the LAPACK directory is provided to perform the
- entire installation procedure.
-
- \section{Overview of Tape Contents}\label{overview}
-
- Most routines in LAPACK occur in four versions: REAL,
- DOUBLE PRECISION, COMPLEX, and COMPLEX*16.
- The first three versions (REAL, DOUBLE PRECISION, and COMPLEX)
- are written in standard Fortran and are completely portable;
- the COMPLEX*16 version is provided for
- those compilers which allow this data type.
- Some routines use features of Fortran 90.
- For convenience, we often refer to routines by their single precision
- names; the leading `S' can be replaced by a `D' for double precision,
- a `C' for complex, or a `Z' for complex*16.
- For LAPACK use and testing you must decide which version(s)
- of the package you intend to install at your site (for example,
- REAL and COMPLEX on a Cray computer or DOUBLE PRECISION and
- COMPLEX*16 on an IBM computer).
-
- \subsection{LAPACK Routines}
-
- There are three classes of LAPACK routines:
- \begin{itemize}
-
- \item \textbf{driver} routines solve a complete problem, such as solving
- a system of linear equations or computing the eigenvalues of a real
- symmetric matrix. Users are encouraged to use a driver routine if there
- is one that meets their requirements. The driver routines are listed
- in LAPACK Working Note 41~\cite{WN41} and the LAPACK Users' Guide~\cite{LUG}.
- %in Appendix ~\ref{appendixa}.
-
- \item \textbf{computational} routines, also called simply LAPACK routines,
- perform a distinct computational task, such as computing
- the $LU$ decomposition of an $m$-by-$n$ matrix or finding the
- eigenvalues and eigenvectors of a symmetric tridiagonal matrix using
- the $QR$ algorithm.
- The LAPACK routines are listed in LAPACK Working Note 41~\cite{WN41}
- and the LAPACK Users' Guide~\cite{LUG}.
- %The LAPACK routines are listed in Appendix ~\ref{appendixa}; see also LAPACK
- %Working Note \#5 \cite{WN5}.
-
- \item \textbf{auxiliary} routines are all the other subroutines called
- by the driver routines and computational routines.
- %Among them are subroutines to perform subtasks of block algorithms,
- %in particular, the unblocked versions of the block algorithms;
- %extensions to the BLAS, such as matrix-vector operations involving
- %complex symmetric matrices;
- %the special routines LSAME and XERBLA which first appeared with the
- %BLAS;
- %and a number of routines to perform common low-level computations,
- %such as computing a matrix norm, generating an elementary Householder
- %transformation, and applying a sequence of plane rotations.
- %Many of the auxiliary routines may be of use to numerical analysts
- %or software developers, so we have documented the Fortran source for
- %these routines with the same level of detail used for the LAPACK
- %routines and driver routines.
- The auxiliary routines are listed in LAPACK Working Note 41~\cite{WN41}
- and the LAPACK Users' Guide~\cite{LUG}.
- %The auxiliary routines are listed in Appendix ~\ref{appendixb}.
- \end{itemize}
-
- \subsection{Level 1, 2, and 3 BLAS}
-
- The BLAS are a set of Basic Linear Algebra Subprograms that perform
- vector-vector, matrix-vector, and matrix-matrix operations.
- LAPACK is designed around the Level 1, 2, and 3 BLAS, and nearly all
- of the parallelism in the LAPACK routines is contained in the BLAS.
- Therefore,
- the key to getting good performance from LAPACK lies in having an
- efficient version of the BLAS optimized for your particular machine.
- Optimized BLAS libraries are available on a variety of architectures;
- refer to the BLAS FAQ on netlib for further information.
- \begin{quote}
- \url{http://www.netlib.org/blas/faq.html}
- \end{quote}
- There are also freely available BLAS generators that automatically
- tune a subset of the BLAS for a given architecture, for example:
- \begin{quote}
- \url{http://www.netlib.org/atlas/}
- \end{quote}
- And, if all else fails, there is the Fortran~77 reference implementation
- of the Level 1, 2, and 3 BLAS available on netlib (also included in
- the LAPACK distribution tar file).
- \begin{quote}
- \url{http://www.netlib.org/blas/blas.tgz}
- \end{quote}
- No matter which BLAS library is used, the BLAS test programs should
- always be run.
-
- Users should not expect too much from the Fortran~77 reference implementation
- BLAS; these versions were written to define the basic operations and do not
- employ the standard tricks for optimizing Fortran code.
-
- The formal definitions of the Level 1, 2, and 3 BLAS
- are in \cite{BLAS1}, \cite{BLAS2}, and \cite{BLAS3}.
- The BLAS Quick Reference card is available on netlib.
-
- \subsection{Mixed- and Extended-Precision BLAS: XBLAS}
-
- The XBLAS extend the BLAS to work with mixed input and output
- precisions as well as using extra precision internally. The XBLAS are
- used in the prototype extra-precise iterative refinement codes.
-
- The current release of the XBLAS is available through
- Netlib\footnote{Development versions may be available through
- \url{http://www.cs.berkeley.edu/~yozo/} or
- \url{http://www.nersc.gov/~xiaoye/XBLAS/}.} at
- \begin{quote}
- \url{http://www.netlib.org/xblas}
- \end{quote}
- Their formal definition is in \cite{XBLAS}.
-
- \subsection{LAPACK Test Routines}
-
- This release contains two distinct test programs for LAPACK routines
- in each data type. One test program tests the routines for solving
- linear equations and linear least squares problems,
- and the other tests routines for the matrix eigenvalue problem.
- The routines for generating test matrices are compiled into a
- separate library that both test programs use.
-
- \subsection{LAPACK Timing Routines (for LAPACK 3.0 and before) }
-
- This release also contains two distinct timing programs for the
- LAPACK routines in each data type.
- The linear equation timing program gathers performance data in
- megaflops on the factor, solve, and inverse routines for solving
- linear systems, the routines to generate or apply an orthogonal matrix
- given as a sequence of elementary transformations, and the reductions
- to bidiagonal, tridiagonal, or Hessenberg form for eigenvalue
- computations.
- The operation counts used in computing the megaflop rates are computed
- from a formula;
- see LAPACK Working Note 41~\cite{WN41}.
- % see Appendix ~\ref{appendixc}.
- The eigenvalue timing program is used with the eigensystem routines
- and returns the execution time, number of floating point operations, and
- megaflop rate for each of the requested subroutines.
- In this program, the number of operations is computed while the
- code is executing using special instrumented versions of the LAPACK
- subroutines.
-
- \section{Installing LAPACK on a Unix System}\label{installation}
-
- Installing, testing, and timing\footnotemark[2] the Unix version of LAPACK
- involves the following steps:
- \begin{enumerate}
- \item Gunzip and untar the file.
-
- \item Copy \texttt{LAPACK/make.inc.example} to \texttt{LAPACK/make.inc} and edit it.
-
- \item Edit the file \texttt{LAPACK/Makefile} and type \texttt{make}.
-
- %\item Test and Install the Machine-Dependent Routines \\
- %\emph{(WARNING: You may need to supply a correct version of second.f and
- %dsecnd.f for your machine)}
- %{\tt
- %\begin{list}{}{}
- %\item cd LAPACK
- %\item make install
- %\end{list} }
- %
- %\item Create the BLAS Library, \emph{if necessary} \\
- %\emph{(NOTE: For best performance, it is recommended you use the manufacturers' BLAS)}
- %{\tt
- %\begin{list}{}{}
- %\item \texttt{cd LAPACK}
- %\item \texttt{make blaslib}
- %\end{list} }
- %
- %\item Run the Level 1, 2, and 3 BLAS Test Programs
- %\begin{list}{}{}
- %\item \texttt{cd LAPACK}
- %\item \texttt{make blas\_testing}
- %\end{list}
- %
- %\item Create the LAPACK Library
- %\begin{list}{}{}
- %\item \texttt{cd LAPACK}
- %\item \texttt{make lapacklib}
- %\end{list}
- %
- %\item Create the Library of Test Matrix Generators
- %\begin{list}{}{}
- %\item \texttt{cd LAPACK}
- %\item \texttt{make tmglib}
- %\end{list}
- %
- %\item Run the LAPACK Test Programs
- %\begin{list}{}{}
- %\item \texttt{cd LAPACK}
- %\item \texttt{make testing}
- %\end{list}
- %
- %\item Run the LAPACK Timing Programs
- %\begin{list}{}{}
- %\item \texttt{cd LAPACK}
- %\item \texttt{make timing}
- %\end{list}
- %
- %\item Run the BLAS Timing Programs
- %\begin{list}{}{}
- %\item \texttt{cd LAPACK}
- %\item \texttt{make blas\_timing}
- %\end{list}
- \end{enumerate}
-
- \subsection{Untar the File}
-
- If you received a tar file of LAPACK via the World Wide
- Web or anonymous ftp, enter the following command:
-
- \begin{list}{}{}
- \item \texttt{gunzip -c lapack.tgz | tar xvf -}
- \end{list}
-
- \noindent
- This will create a top-level directory called \texttt{LAPACK}, which
- requires approximately 34 Mbytes of disk space.
- The total space requirement, including the object files and executables,
- is approximately 100 Mbytes for all four data types.
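-
- As a quick check, the archive can be listed before it is unpacked and the
- size of the unpacked tree inspected afterwards. The following is a minimal
- sketch that assumes \texttt{lapack.tgz} is in the current directory and that
- the standard Unix utilities \texttt{head} and \texttt{du} are available;
- neither command is part of the LAPACK distribution.
-
- \begin{verbatim}
- # List the first few entries of the archive without extracting it
- gunzip -c lapack.tgz | tar tf - | head
- # After unpacking, report the disk space used by the tree in kilobytes
- du -sk LAPACK
- \end{verbatim}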
-
- \subsection{Copy \texttt{LAPACK/make.inc.example} to \texttt{LAPACK/make.inc} and edit it}
-
- Before the libraries can be built, or the testing and timing\footnotemark[2] programs
- run, you must define all machine-specific parameters for the
- architecture on which you are installing LAPACK. All machine-specific
- parameters are contained in the file \texttt{LAPACK/make.inc}.
- An example of \texttt{LAPACK/make.inc} for a Linux machine with GNU compilers is given
- in \texttt{LAPACK/make.inc.example}. Copy that file to \texttt{LAPACK/make.inc} by entering the following command:
-
- \begin{list}{}{}
- \item \texttt{cp LAPACK/make.inc.example LAPACK/make.inc}
- \end{list}
-
- \noindent
- Now modify your \texttt{LAPACK/make.inc} by applying the following recommendations.
- The first line of this \texttt{make.inc} file is:
- \begin{quote}
- SHELL = /bin/sh
- \end{quote}
- and it will need to be modified to \texttt{SHELL = /sbin/sh} if you are
- installing LAPACK on an SGI architecture.
- Second, you will
- need to modify the \texttt{PLAT} definition, which is appended to all
- library names, to specify the architecture to which you are installing
- LAPACK. This feature avoids confusion in library names when you are
- installing LAPACK on more than one architecture. Next, you will need
- to modify \texttt{FORTRAN}, \texttt{OPTS}, \texttt{DRVOPTS}, \texttt{NOOPT}, \texttt{LOADER},
- and \texttt{LOADOPTS} to specify
- the compiler, compiler options, compiler options for the testing and
- timing\footnotemark[2] main programs, compiler options for routines that
- must be compiled without optimization, loader, and loader options.
- Next, choose the timing function that the \texttt{SECOND} and \texttt{DSECND} routines will call.
- \begin{verbatim}
- #The Default : SECOND and DSECND will use a call to the EXTERNAL FUNCTION ETIME
- TIMER = EXT_ETIME
- # For RS6K : SECOND and DSECND will use a call to the EXTERNAL FUNCTION ETIME_
- # TIMER = EXT_ETIME_
- # For gfortran compiler: SECOND and DSECND will use the INTERNAL FUNCTION ETIME
- # TIMER = INT_ETIME
- # If your Fortran compiler does not provide etime (like Nag Fortran Compiler, etc...)
- # SECOND and DSECND will use a call to the INTERNAL FUNCTION CPU_TIME
- # TIMER = INT_CPU_TIME
- # If neither of this works...you can use the NONE value...
- # In that case, SECOND and DSECND will always return 0
- # TIMER = NONE
- \end{verbatim}
- Refer to Section~\ref{second} for more information.
-
-
- Next, you will need to modify \texttt{ARCH}, \texttt{ARCHFLAGS}, and \texttt{RANLIB} to specify the archiver,
- archiver options, and ranlib for your machine. If your architecture
- does not require \texttt{ranlib} to be run after each archive command (as
- is the case with CRAY computers running UNICOS, Hewlett Packard
- computers running HP-UX, or SUN SPARCstations running Solaris), set
- \texttt{RANLIB = echo}. Finally, you must
- modify the \texttt{BLASLIB} definition to specify the BLAS library to which
- you will be linking. If an optimized version of the BLAS is available
- on your machine, we strongly recommend linking to that library.
- Otherwise, by default, \texttt{BLASLIB} is set to the Fortran~77 reference implementation.
-
- If you want to enable the XBLAS, define the variable \texttt{USEXBLAS}
- to some value, for example \texttt{USEXBLAS = Yes}. Then set the
- variable \texttt{XBLASLIB} to point at the XBLAS library. Note that
- the prototype iterative refinement routines and their testers will not
- be built unless \texttt{USEXBLAS} is defined.
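-
- The settings discussed above might be collected as in the following sketch
- of a \texttt{make.inc} for a Linux machine with the GNU compilers. The
- variable names are those described in this section; every value on the
- right-hand side (the \texttt{\_LINUX} suffix, \texttt{gfortran}, the
- optimization flags, and the library paths) is an assumption for
- illustration and should be checked against \texttt{LAPACK/make.inc.example}
- and your local system.
-
- \begin{verbatim}
- SHELL     = /bin/sh
- # Suffix appended to all library names (assumed value)
- PLAT      = _LINUX
- # Compiler, options, and loader (assumed values for gfortran)
- FORTRAN   = gfortran
- OPTS      = -O2
- DRVOPTS   = $(OPTS)
- NOOPT     = -O0
- LOADER    = gfortran
- LOADOPTS  =
- # gfortran provides ETIME as an internal function
- TIMER     = INT_ETIME
- # Archiver and ranlib
- ARCH      = ar
- ARCHFLAGS = cr
- RANLIB    = ranlib
- # BLAS library to link against (hypothetical path to an optimized BLAS)
- BLASLIB   = /usr/local/lib/libblas.a
- # Uncomment both lines to enable the XBLAS (hypothetical path)
- # USEXBLAS = Yes
- # XBLASLIB = /usr/local/lib/libxblas.a
- \end{verbatim}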
-
- \textbf{NOTE:} Example \texttt{make.inc} include files are contained in the
- \texttt{LAPACK/INSTALL} directory. Please refer to
- Appendix~\ref{appendixd} for machine-specific installation hints, and/or
- the \texttt{release\_notes} file on \texttt{netlib}.
- \begin{quote}
- \url{http://www.netlib.org/lapack/release_notes}
- \end{quote}
-
- \subsection{Edit the file \texttt{LAPACK/Makefile}}\label{toplevelmakefile}
-
- This \texttt{Makefile} can be modified to perform as much of the
- installation process as the user desires. Ideally, this is the ONLY
- makefile the user must modify. However, modification of lower-level
- makefiles may be necessary if a specific routine needs to be compiled
- with a different level of optimization.
-
- First, edit the definitions of \texttt{blaslib}, \texttt{lapacklib},
- \texttt{tmglib}, \texttt{lapack\_testing}, and \texttt{timing}\footnotemark[2] in the file \texttt{LAPACK/Makefile}
- to specify the data types desired. For example,
- if you only wish to compile the single precision real version of the
- LAPACK library, you would modify the \texttt{lapacklib} definition to be:
-
- \begin{verbatim}
- lapacklib:
- ( cd SRC; $(MAKE) single )
- \end{verbatim}
-
- Likewise, you could specify \texttt{double}, \texttt{complex}, or \texttt{complex16} to
- build the double precision real, single precision complex, or double
- precision complex libraries, respectively. By default, a
- \texttt{make} command with no arguments builds
- all four data types.
- The make command can be run more than once to add another
- data type to the library if necessary.
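-
- Several data types can also be requested in one definition. The following
- sketch assumes that the \texttt{SRC} makefile accepts more than one of the
- targets named above in a single \texttt{make} invocation, as \texttt{make}
- normally does for multiple targets:
-
- \begin{verbatim}
- # In the actual Makefile the recipe line must begin with a tab character
- lapacklib:
-         ( cd SRC; $(MAKE) double complex16 )
- \end{verbatim}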
-
- %If you are installing LAPACK on a Silicon Graphics machine, you must
- %modify the respective definitions of \texttt{testing} and \texttt{timing} to be
- %\begin{verbatim}
- %testing:
- % ( cd TESTING; $(MAKE) -f Makefile.sgi )
- %\end{verbatim}
- %and
- %\begin{verbatim}
- %timing:
- % ( cd TIMING; $(MAKE) -f Makefile.sgi )
- %\end{verbatim}
-
- Next, if you will be using a locally available BLAS library, you will need
- to remove \texttt{blaslib} from the \texttt{lib} definition. Finally,
- if you do not wish to build each library and
- run each set of tests separately, you can
- modify the \texttt{all} definition to specify how much of the
- installation process you want performed. By default,
- the \texttt{all} definition is set to
- \begin{verbatim}
- all: lapack_install lib lapack_testing blas_testing
- \end{verbatim}
- which will perform all phases of the installation
- process -- testing of machine-dependent routines, building the libraries,
- BLAS testing and LAPACK testing.
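-
- For example, to test the machine-dependent routines and build the libraries
- now while deferring the BLAS and LAPACK tests to a later run, the definition
- might be reduced as follows. This is a sketch that uses only target names
- appearing in the default definition above.
-
- \begin{verbatim}
- all: lapack_install lib
- \end{verbatim}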
-
- The entire installation process will then be performed by typing
- \texttt{make}.
-
- Questions and/or comments can be directed to the
- authors as described in Section~\ref{sendresults}. If test failures
- occur, please refer to the appropriate subsection in
- Section~\ref{furtherdetails}.
-
- If disk space is limited, we suggest building each data type separately
- and/or deleting all object files after building the libraries. Likewise, all
- testing and timing executables can be deleted after the testing and timing
- process is completed. The removal of all object files and executables
- can be accomplished by the following:
-
- \begin{list}{}{}
- \item \texttt{cd LAPACK}
- \item \texttt{make clean}
- \end{list}
-
- \section{Further Details of the Installation Process}\label{furtherdetails}
-
- Alternatively, you can choose to run each of the phases of the
- installation process separately. The following sections give details
- on how this may be achieved.
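-
- In outline, the phases correspond to the targets named in the top-level
- makefile (Section~\ref{toplevelmakefile}). The following sketch assumes the
- default target names \texttt{lapack\_install}, \texttt{blaslib},
- \texttt{lapacklib}, \texttt{tmglib}, \texttt{blas\_testing}, and
- \texttt{lapack\_testing}; omit \texttt{make blaslib} if you link to an
- optimized BLAS library.
-
- \begin{verbatim}
- cd LAPACK
- make lapack_install   # test the machine-dependent routines
- make blaslib          # build the reference BLAS, if needed
- make blas_testing     # run the BLAS test programs
- make lapacklib        # build the LAPACK library
- make tmglib           # build the test matrix generator library
- make lapack_testing   # run the LAPACK test programs
- \end{verbatim}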
-
- \subsection{Test and Install the Machine-Dependent Routines}
-
- There are six machine-dependent functions in the test and timing
- package, at least three of which must be installed. They are
-
- \begin{tabbing}
- MONOMO \= DOUBLE PRECYSION \= \kill
- LSAME \> LOGICAL \> Test if two characters are the same regardless of case \\
- SLAMCH \> REAL \> Determine machine-dependent parameters \\
- DLAMCH \> DOUBLE PRECISION \> Determine machine-dependent parameters \\
- SECOND \> REAL \> Return time in seconds from a fixed starting time \\
- DSECND \> DOUBLE PRECISION \> Return time in seconds from a fixed starting time\\
- ILAENV \> INTEGER \> Checks that NaN and infinity arithmetic are IEEE-754 compliant
- \end{tabbing}
-
- \noindent
- If you are working only in single precision, you do not need to install
- DLAMCH and DSECND, and if you are working only in double precision,
- you do not need to install SLAMCH and SECOND.
-
- These six subroutines are provided in \texttt{LAPACK/INSTALL},
- along with six test programs.
- To compile the six test programs and run the tests, go to \texttt{LAPACK} and
- type \texttt{make lapack\_install}. The test programs are called
- \texttt{testlsame, testslamch, testdlamch, testsecond, testdsecnd} and
- \texttt{testieee}.
- If you do not wish to run all tests, you will need to modify the
- \texttt{lapack\_install} definition in the \texttt{LAPACK/Makefile} to only include the
- tests you wish to run. Otherwise, all tests will be performed.
- The expected results of each test program are described below.
-
- \subsubsection{Installing LSAME}
-
- LSAME is a logical function with two character parameters, A and B.
- It returns .TRUE. if A and B are the same regardless of case, or .FALSE.
- if they are different.
- For example, the expression
-
- \begin{list}{}{}
- \item \texttt{LSAME( UPLO, 'U' )}
- \end{list}
- \noindent
- is equivalent to
- \begin{list}{}{}
- \item \texttt{( UPLO.EQ.'U' ).OR.( UPLO.EQ.'u' )}
- \end{list}
-
- The test program in \texttt{lsametst.f} tests all combinations of
- the same character in upper and lower case for A and B, and two
- cases where A and B are different characters.
-
- Run the test program by typing \texttt{testlsame}.
- If LSAME works correctly, the only message you should see after the
- execution of \texttt{testlsame} is
- \begin{verbatim}
- ASCII character set
- Tests completed
- \end{verbatim}
- The file \texttt{lsame.f} is automatically copied to
- \texttt{LAPACK/BLAS/SRC/} and \texttt{LAPACK/SRC/}.
- The function LSAME is needed by both the BLAS and LAPACK, so it is safer
- to have it in both libraries as long as this does not cause trouble
- in the link phase when both libraries are used.
-
- \subsubsection{Installing SLAMCH and DLAMCH}
-
- SLAMCH and DLAMCH are real functions with a single character parameter
- that indicates the machine parameter to be returned. The test
- program in \texttt{slamchtst.f}
- simply prints out the different values computed by SLAMCH,
- so you need to know something about what the values should be.
- For example, the output of the test program executable \texttt{testslamch}
- for SLAMCH on a Sun SPARCstation is
- \begin{verbatim}
- Epsilon = 5.96046E-08
- Safe minimum = 1.17549E-38
- Base = 2.00000
- Precision = 1.19209E-07
- Number of digits in mantissa = 24.0000
- Rounding mode = 1.00000
- Minimum exponent = -125.000
- Underflow threshold = 1.17549E-38
- Largest exponent = 128.000
- Overflow threshold = 3.40282E+38
- Reciprocal of safe minimum = 8.50706E+37
- \end{verbatim}
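-
- \noindent
- These values can be checked against IEEE-754 single precision arithmetic.
- The following is a worked check rather than
- output of the test program; it assumes base $\beta = 2$, $t = 24$ mantissa
- digits, rounding arithmetic, and the exponent limits $e_{\min} = -125$ and
- $e_{\max} = 128$ printed above:
- \begin{eqnarray*}
- \mbox{Epsilon} & = & \beta^{1-t} / 2 = 2^{-24} \approx 5.96046 \times 10^{-8}, \\
- \mbox{Precision} & = & \mbox{Epsilon} \times \beta = 2^{-23} \approx 1.19209 \times 10^{-7}, \\
- \mbox{Underflow threshold} & = & \beta^{e_{\min}-1} = 2^{-126} \approx 1.17549 \times 10^{-38}, \\
- \mbox{Overflow threshold} & = & \beta^{e_{\max}} ( 1 - \beta^{-t} ) = 2^{128} ( 1 - 2^{-24} ) \approx 3.40282 \times 10^{38}, \\
- \mbox{Reciprocal of safe minimum} & = & 2^{126} \approx 8.50706 \times 10^{37}.
- \end{eqnarray*}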
- On a Cray machine, the safe minimum underflows its output
- representation and the overflow threshold overflows its output
- representation, so the safe minimum is printed as 0.00000 and overflow
- is printed as R. This is normal.
- If you would prefer to print a representable number, you can modify
- the test program to print SFMIN*100. and RMAX/100. for the safe
- minimum and overflow thresholds.
-
- Likewise, the test executable \texttt{testdlamch} is run for DLAMCH.
-
- If both tests were successful, go to Section~\ref{second}.
-
- If SLAMCH (or DLAMCH) returns an invalid value, you will have to create
- your own version of this function. The following options are used in
- LAPACK and must be set:
-
- \begin{list}{}{}
- \item {`B': } Base of the machine
- \item {`E': } Epsilon (relative machine precision)
- \item {`O': } Overflow threshold
- \item {`P': } Precision = Epsilon*Base
- \item {`S': } Safe minimum (often same as underflow threshold)
- \item {`U': } Underflow threshold
- \end{list}
-
- Some people may be familiar with R1MACH (D1MACH), a primitive
- routine for setting machine parameters in which the user must
- comment out the appropriate assignment statements for the target
- machine. If a version of R1MACH is on hand, the assignments in
- SLAMCH can be made to refer to R1MACH using the correspondence
-
- \begin{list}{}{}
- \item {SLAMCH( `U' )} $=$ R1MACH( 1 )
- \item {SLAMCH( `O' )} $=$ R1MACH( 2 )
- \item {SLAMCH( `E' )} $=$ R1MACH( 3 )
- \item {SLAMCH( `B' )} $=$ R1MACH( 5 )
- \end{list}
-
- \noindent
- The safe minimum returned by SLAMCH( `S' ) is initially set to the
- underflow value, but if $1/(\mathrm{overflow}) \geq (\mathrm{underflow})$
- it is recomputed as $(1/(\mathrm{overflow})) * ( 1 + \varepsilon )$,
- where $\varepsilon$ is the machine precision.
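-
- Written out, with $u$ the underflow threshold, $o$ the overflow threshold,
- and $s$ the safe minimum (symbols introduced here for brevity), the rule is
- \[
- s = \left\{ \begin{array}{ll}
- u & \mbox{if } 1/o < u, \\
- (1/o) ( 1 + \varepsilon ) & \mbox{if } 1/o \geq u.
- \end{array} \right.
- \]
- For the Sun SPARCstation values shown earlier,
- $1/o \approx 2.93874 \times 10^{-39} < u \approx 1.17549 \times 10^{-38}$,
- so the safe minimum equals the underflow threshold, as the test output shows.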
-
- BE AWARE that the initial call to SLAMCH or DLAMCH is expensive.
- We suggest that installers run it once, save the results, and hard-code
- the constants in the version they put in their library.
-
- \subsubsection{Installing SECOND and DSECND}\label{second}
-
- Both the timing routines\footnotemark[2] and the test routines call SECOND
- (DSECND), a real function with no arguments that returns the time
- in seconds from some fixed starting time.
- Our version of this routine
- returns only ``user time'', and not ``user time $+$ system time''.
- The versions of SECOND in \texttt{second\_EXT\_ETIME.f} and \texttt{second\_INT\_ETIME.f} call
- ETIME, a Fortran library routine available on some computer systems.
- If ETIME is not available or a better local timing function exists,
- you will have to provide the correct interface to SECOND and DSECND
- on your machine.
-
- Since LAPACK 3.1.1 we provide five different flavours of the SECOND and DSECND routines.
- The version that will be used depends on the value of the TIMER variable in \texttt{make.inc}.
-
- \begin{itemize}
- \item If ETIME is available as an external function, set the value of the TIMER variable in your
- \texttt{make.inc} to \texttt{EXT\_ETIME}: \texttt{second\_EXT\_ETIME.f} and \texttt{dsecnd\_EXT\_ETIME.f} will be used.
- On HPPA architectures,
- the compiler and loader flag \texttt{+U77} usually must be included to access
- the function \texttt{ETIME}.
-
- \item If ETIME\_ is available as an external function, set the value of the TIMER variable in your \texttt{make.inc}
- to \texttt{EXT\_ETIME\_}: \texttt{second\_EXT\_ETIME\_.f} and \texttt{dsecnd\_EXT\_ETIME\_.f} will be used.
- This is the case on some IBM architectures such as the IBM RS/6000.
-
- \item If ETIME is available as an internal function, set the value of the TIMER variable in your \texttt{make.inc}
- to \texttt{INT\_ETIME}: \texttt{second\_INT\_ETIME.f} and \texttt{dsecnd\_INT\_ETIME.f} will be used.
- This is the case with gfortran.
-
- \item If CPU\_TIME is available as an internal function, set the value of the TIMER variable in your \texttt{make.inc}
- to \texttt{INT\_CPU\_TIME}: \texttt{second\_INT\_CPU\_TIME.f} and \texttt{dsecnd\_INT\_CPU\_TIME.f} will be used.
-
- \item If none of these functions is available, set the value of the TIMER variable in your \texttt{make.inc}
- to \texttt{NONE}: \texttt{second\_NONE.f} and \texttt{dsecnd\_NONE.f} will be used.
- These routines will always return zero.
- \end{itemize}
-
- The test program in \texttt{secondtst.f}
- performs a million operations using 5000 iterations of
- the SAXPY operation $y := y + \alpha x$ on a vector of length 100.
- The total time and megaflop rate for this test are reported, then
- the operation is repeated including a call to SECOND on each of
- the 5000 iterations to determine the overhead due to calling SECOND.
- The test program executable is called \texttt{testsecond} (or \texttt{testdsecnd}).
- There is no single right answer, but the times
- in seconds should be positive and the megaflop rates should be
- appropriate for your machine.
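-
- As a rough illustration of what this test measures, the following Python
- sketch (not part of LAPACK; the names \texttt{second}, \texttt{saxpy}, and
- \texttt{time\_saxpy} are hypothetical) repeats the vector update and derives a
- megaflop rate. It assumes that \texttt{os.times().user} is an acceptable
- stand-in for a ``user time'' clock, and it counts two floating point
- operations per vector element, so that 5000 iterations on a vector of
- length 100 give one million operations.
-
```python
import os


def second():
    """Return user CPU time in seconds (stand-in for SECOND; user time only)."""
    return os.times().user


def saxpy(alpha, x, y):
    """Update y := y + alpha*x in place (2*len(x) floating point operations)."""
    for i in range(len(x)):
        y[i] += alpha * x[i]


def time_saxpy(n=100, iterations=5000, call_timer=False):
    """Time `iterations` SAXPY updates; optionally call the timer in each one."""
    x = [1.0 / (i + 1) for i in range(n)]
    y = [float(i) for i in range(n)]
    alpha = 0.315
    start = second()
    for _ in range(iterations):
        saxpy(alpha, x, y)
        alpha = -alpha  # alternate the sign so that y stays bounded
        if call_timer:
            second()
    elapsed = second() - start
    flops = 2 * n * iterations
    # Report a rate of zero when the clock cannot resolve the interval.
    mflops = flops / elapsed / 1.0e6 if elapsed > 0.0 else 0.0
    return flops, elapsed, mflops


flops, plain_time, rate = time_saxpy()
_, timed_time, _ = time_saxpy(call_timer=True)
overhead_per_call = (timed_time - plain_time) / 5000
print(f"{flops} operations, {plain_time:.3f} s, {rate:.2f} megaflops")
print(f"estimated overhead per timer call: {overhead_per_call:.2e} s")
```
-
- In interpreted Python the rate will be far below what compiled Fortran
- achieves; only the structure of the measurement carries over.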
-
- \subsubsection{Testing IEEE arithmetic and ILAENV}\label{testieee}
-
- %\textbf{If you are installing LAPACK on a non-IEEE machine, you MUST
- %modify ILAENV! Otherwise, ILAENV will crash . By default, ILAENV
- %assumes an IEEE machine, and does a test for IEEE-754 compliance.}
-
- As some new routines in LAPACK rely on IEEE-754 compliance,
- two settings (\texttt{ISPEC=10} and \texttt{ISPEC=11}) have been added to ILAENV
- (\texttt{LAPACK/SRC/ilaenv.f}) to denote IEEE-754 compliance for NaN and
- infinity arithmetic, respectively. By default, ILAENV assumes an IEEE
- machine, and does a test for IEEE-754 compliance. \textbf{NOTE: If you
- are installing LAPACK on a non-IEEE machine, you MUST modify ILAENV,
- as this test inside ILAENV will crash!}
-
- If \texttt{ILAENV( 10, $\ldots$ )} or \texttt{ILAENV( 11, $\ldots$ )} is
- issued, then \texttt{ILAENV=1} is returned to signal IEEE-754 compliance,
- and \texttt{ILAENV=0} if the architecture is non-IEEE-754 compliant.
-
- Thus, for non-IEEE machines, the user must hard-code the setting
- \texttt{ILAENV=0} for \texttt{ISPEC=10} and \texttt{ISPEC=11} in the version
- of \texttt{LAPACK/SRC/ilaenv.f} to be put in
- the library. There are also specialized testing and timing\footnotemark[\value{footnote}] versions of
- ILAENV that need to be modified in the same way:
- \begin{itemize}
- \item Testing version in \texttt{LAPACK/TESTING/LIN/ilaenv.f}
- \item Testing version in \texttt{LAPACK/TESTING/EIG/ilaenv.f}
- \item Timing version in \texttt{LAPACK/TIMING/LIN/ilaenv.f}
- \item Timing version in \texttt{LAPACK/TIMING/EIG/ilaenv.f}
- \end{itemize}
-
- %Some new routines in LAPACK rely on IEEE-754 compliance, and if non-compliance
- %is detected (via a call to the function ILAENV), alternative (slower)
- %algorithms will be chosen.
- %For further details, refer to the leading comments of routines such
- %as \texttt{LAPACK/SRC/sstevr.f}.
-
- The test program in \texttt{LAPACK/INSTALL/tstiee.f} checks an installation
- architecture
- to see if infinity arithmetic and NaN arithmetic are IEEE-754 compliant.
- A warning message to the user is printed if non-compliance is detected.
- This same test is performed inside the function ILAENV when
- \texttt{ILAENV( 10, $\ldots$ )} or \texttt{ILAENV( 11, $\ldots$ )} is
- called, with the return values \texttt{ILAENV=1} and \texttt{ILAENV=0}
- described above.
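-
- The kind of check involved can be sketched in Python as follows. This is
- only an illustration and not a translation of \texttt{tstiee.f}; the function
- names are hypothetical, and the sketch starts from \texttt{math.inf} rather
- than dividing by zero, because Python raises an exception for floating point
- division by zero instead of returning infinity. Each function returns 1 or 0
- in the manner of ILAENV.
-
```python
import math
import sys


def ieee_infinity_ok():
    """Return 1 if infinity arithmetic behaves as IEEE-754 prescribes, else 0."""
    big = sys.float_info.max
    posinf, neginf = math.inf, -math.inf
    checks = [
        posinf > big,                  # +infinity exceeds every finite number
        neginf < -big,                 # -infinity is below every finite number
        big * 2.0 == posinf,           # overflow produces +infinity
        1.0 / posinf == 0.0,           # 1/infinity is zero
        math.copysign(1.0, 1.0 / neginf) == -1.0,  # 1/(-infinity) is -0.0
        posinf + 1.0 == posinf,        # infinity absorbs finite additions
    ]
    return 1 if all(checks) else 0


def ieee_nan_ok():
    """Return 1 if NaN arithmetic behaves as IEEE-754 prescribes, else 0."""
    nan1 = math.inf - math.inf         # an invalid operation yields NaN
    nan2 = math.inf * 0.0              # so does infinity times zero
    checks = [
        nan1 != nan1,                  # NaN compares unequal to itself
        nan2 != nan2,
        not (nan1 < 0.0) and not (nan1 >= 0.0),  # ordered comparisons are false
        math.isnan(nan1 + 1.0),        # NaN propagates through arithmetic
    ]
    return 1 if all(checks) else 0


print("infinity arithmetic:", ieee_infinity_ok())
print("NaN arithmetic:     ", ieee_nan_ok())
```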
-
- To avoid running this IEEE test every time you call
- \texttt{ILAENV( 10, $\ldots$ )} or \texttt{ILAENV( 11, $\ldots$ )}, we suggest
- that you hard-code the setting
- \texttt{ILAENV=1} or \texttt{ILAENV=0} in the version of \texttt{LAPACK/SRC/ilaenv.f} to be put in
- your library. As noted above, there are also specialized testing and
- timing\footnotemark[\value{footnote}] versions of ILAENV that will also need to be modified.
-
- \subsection{Create the BLAS Library}
-
- Ideally, a highly optimized version of the BLAS library already
- exists on your machine.
- In this case you can go directly to Section~\ref{testblas} to
- make the BLAS test programs.
- Otherwise, create the Fortran BLAS library as follows.
-
- \begin{itemize}
- \item[a)]
- Go to \texttt{LAPACK} and edit the definition of \texttt{blaslib} in the
- file \texttt{Makefile} to specify the data types desired, as in the example
- in Section~\ref{toplevelmakefile}.
-
- If you already have some of the BLAS, you will need to edit the file
- \texttt{LAPACK/BLAS/SRC/Makefile} to comment out the lines
- defining the BLAS you have.
-
- \item[b)]
- Type \texttt{make blaslib}.
- The make command can be run more than once to add another
- data type to the library if necessary.
- \end{itemize}
-
- \noindent
- The BLAS library is created in \texttt{LAPACK/blas\_PLAT.a}, where
- \texttt{PLAT} is the user-defined architecture suffix specified in the file
- \texttt{LAPACK/make.inc}.
-
- \subsection{Run the BLAS Test Programs}\label{testblas}
-
- Test programs for the Level 1, 2, and 3 BLAS are in the directory
- \texttt{LAPACK/BLAS/TESTING}.
-
- To compile and run the Level 1, 2, and 3 BLAS test programs,
- go to \texttt{LAPACK} and type \texttt{make blas\_testing}. The executable
- files are called \texttt{xblat\_s}, \texttt{xblat\_d}, \texttt{xblat\_c}, and
- \texttt{xblat\_z}, where the \_ (underscore) is replaced by 1, 2, or 3,
- depending upon the level of BLAS that it is testing. All executable and
- output files are created in \texttt{LAPACK/BLAS/}.
- For the Level 1 BLAS tests, the output file names are \texttt{sblat1.out},
- \texttt{dblat1.out}, \texttt{cblat1.out}, and \texttt{zblat1.out}. For the Level
- 2 and 3 BLAS, the name of the output file is indicated on the first line of the
- input file and is currently defined to be \texttt{sblat2.out} for
- the Level 2 REAL version, and \texttt{sblat3.out} for the Level 3 REAL
- version, with similar names for the other data types.
-
- If the tests using the supplied data files were completed successfully,
- consider whether the tests were sufficiently thorough.
- For example, on a machine with vector registers, at least one value
- of $N$ greater than the length of the vector registers should be used;
- otherwise, important parts of the compiled code may not be
- exercised by the tests.
- If the tests were not successful, either because the program did not
- finish or the test ratios did not pass the threshold, you will
- probably have to find and correct the problem before continuing.
- If you have been testing a system-specific
- BLAS library, try using the Fortran BLAS for the routines that
- did not pass the tests.
- For more details on the BLAS test programs,
- see \cite{BLAS2-test} and \cite{BLAS3-test}.
-
- \subsection{Create the LAPACK Library}
-
- \begin{itemize}
- \item[a)]
- Go to the directory \texttt{LAPACK} and edit the definition of
- \texttt{lapacklib} in the file \texttt{Makefile} to specify the data types desired,
- as in the example in Section~\ref{toplevelmakefile}.
-
- \item[b)]
- Type \texttt{make lapacklib}.
- The make command can be run more than once to add another
- data type to the library if necessary.
-
- \end{itemize}
-
- \noindent
- The LAPACK library is created in \texttt{LAPACK/lapack\_PLAT.a}, where
- \texttt{PLAT} is the user-defined architecture suffix specified in the file
- \texttt{LAPACK/make.inc}.
-
- \subsection{Create the Test Matrix Generator Library}
-
- \begin{itemize}
- \item[a)]
- Go to the directory \texttt{LAPACK} and edit the definition of \texttt{tmglib}
- in the file \texttt{Makefile} to specify the data types desired, as in the
- example in Section~\ref{toplevelmakefile}.
-
- \item[b)]
- Type \texttt{make tmglib}.
- The make command can be run more than once to add another
- data type to the library if necessary.
-
- \end{itemize}
-
- \noindent
- The test matrix generator library is created in \texttt{LAPACK/tmglib\_PLAT.a},
- where \texttt{PLAT} is the user-defined architecture suffix specified in the
- file \texttt{LAPACK/make.inc}.
-
- \subsection{Run the LAPACK Test Programs}
-
- There are two distinct test programs for LAPACK routines
- in each data type, one for the linear equation routines and
- one for the eigensystem routines.
- In each data type, there is one input file for testing the linear
- equation routines and eighteen input files for testing the eigenvalue
- routines.
- The input files reside in \texttt{LAPACK/TESTING}.
- For more information on the test programs and how to modify the
- input files, please refer to LAPACK Working Note 41~\cite{WN41}.
- % see Section~\ref{moretesting}.
-
- If you do not wish to run each of the tests individually, you can
- go to \texttt{LAPACK}, edit the definition of \texttt{lapack\_testing} in the file
- \texttt{Makefile} to specify the data types desired, and type \texttt{make
- lapack\_testing}. This will
- compile and run the tests as described in Sections~\ref{testlin}
- and~\ref{testeig}.
-
- %If you are installing LAPACK on a Silicon Graphics machine, you must
- %modify the definition of \texttt{testing} to be
- %\begin{verbatim}
- %testing:
- % ( cd TESTING; $(MAKE) -f Makefile.sgi )
- %\end{verbatim}
-
- \subsubsection{Testing the Linear Equations Routines}\label{testlin}
-
- \begin{itemize}
-
- \item[a)]
- Go to \texttt{LAPACK/TESTING/LIN} and type \texttt{make} followed by the data types
- desired. The executable files are called \texttt{xlintsts, xlintstc,
- xlintstd}, or \texttt{xlintstz} and are created in \texttt{LAPACK/TESTING}.
-
- \item[b)]
- Go to \texttt{LAPACK/TESTING} and run the tests for each data type.
- For the REAL version, the command is
- \begin{list}{}{}
- \item{} \texttt{xlintsts < stest.in > stest.out}
- \end{list}
-
- \noindent
- The tests using \texttt{xlintstd}, \texttt{xlintstc}, and \texttt{xlintstz} are similar
- with the leading `s' in the input and output file names replaced
- by `d', `c', or `z'.
-
- \end{itemize}
-
- If you encountered failures in this phase of the testing process, please
- refer to Section~\ref{sendresults}.
-
- \subsubsection{Testing the Eigensystem Routines}\label{testeig}
-
- \begin{itemize}
-
- \item[a)]
- Go to \texttt{LAPACK/TESTING/EIG} and type \texttt{make} followed by the data types
- desired. The executable files are called \texttt{xeigtsts,
- xeigtstc, xeigtstd}, and \texttt{xeigtstz} and are created
- in \texttt{LAPACK/TESTING}.
-
- \item[b)]
- Go to \texttt{LAPACK/TESTING} and run the tests for each data type.
- The tests for the eigensystem routines use eighteen separate input files
- for testing the nonsymmetric eigenvalue problem,
- the symmetric eigenvalue problem, the banded symmetric eigenvalue
- problem, the generalized symmetric eigenvalue
- problem, the generalized nonsymmetric eigenvalue problem, the
- singular value decomposition, the banded singular value decomposition,
- the generalized singular value
- decomposition, the generalized QR and RQ factorizations, the generalized
- linear regression model, and the constrained linear least squares
- problem.
- The tests for the REAL version are as follows:
- \begin{list}{}{}
- \item \texttt{xeigtsts < nep.in > snep.out}
- \item \texttt{xeigtsts < sep.in > ssep.out}
- \item \texttt{xeigtsts < svd.in > ssvd.out}
- \item \texttt{xeigtsts < sec.in > sec.out}
- \item \texttt{xeigtsts < sed.in > sed.out}
- \item \texttt{xeigtsts < sgg.in > sgg.out}
- \item \texttt{xeigtsts < sgd.in > sgd.out}
- \item \texttt{xeigtsts < ssg.in > ssg.out}
- \item \texttt{xeigtsts < ssb.in > ssb.out}
- \item \texttt{xeigtsts < sbb.in > sbb.out}
- \item \texttt{xeigtsts < sbal.in > sbal.out}
- \item \texttt{xeigtsts < sbak.in > sbak.out}
- \item \texttt{xeigtsts < sgbal.in > sgbal.out}
- \item \texttt{xeigtsts < sgbak.in > sgbak.out}
- \item \texttt{xeigtsts < glm.in > sglm.out}
- \item \texttt{xeigtsts < gqr.in > sgqr.out}
- \item \texttt{xeigtsts < gsv.in > sgsv.out}
- \item \texttt{xeigtsts < lse.in > slse.out}
- \end{list}
- The tests using \texttt{xeigtstc}, \texttt{xeigtstd}, and \texttt{xeigtstz} also
- use the input files \texttt{nep.in}, \texttt{sep.in}, \texttt{svd.in},
- \texttt{glm.in}, \texttt{gqr.in}, \texttt{gsv.in}, and \texttt{lse.in},
- but the leading `s' in the other input file names must be changed
- to `c', `d', or `z'.
- \end{itemize}
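-
- The naming rule can be made explicit with a short Python sketch (an
- illustration only; \texttt{eig\_test\_commands} is a hypothetical helper, and
- the file names are derived from the rule stated above rather than read from
- the distribution). Seven input files are shared by all data types, and the
- remaining eleven carry the data type letter as a prefix.
-
```python
# Input files shared by all four data types; only the output name gets a prefix.
SHARED_INPUTS = ["nep", "sep", "svd", "glm", "gqr", "gsv", "lse"]
# Input files that exist once per data type, listed without the leading letter.
TYPED_INPUTS = ["ec", "ed", "gg", "gd", "sg", "sb", "bb",
                "bal", "bak", "gbal", "gbak"]


def eig_test_commands(dtype):
    """Return the eighteen eigensystem test commands for data type s, d, c, or z."""
    if dtype not in ("s", "d", "c", "z"):
        raise ValueError("dtype must be one of 's', 'd', 'c', 'z'")
    exe = f"xeigtst{dtype}"
    commands = [f"{exe} < {name}.in > {dtype}{name}.out"
                for name in SHARED_INPUTS]
    commands += [f"{exe} < {dtype}{name}.in > {dtype}{name}.out"
                 for name in TYPED_INPUTS]
    return commands


for command in eig_test_commands("d"):
    print(command)
```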
-
- If you encountered failures in this phase of the testing process, please
- refer to Section~\ref{sendresults}.
-
- \subsection{Run the LAPACK Timing Programs (For LAPACK 3.0 and before)}
-
- There are two distinct timing programs for LAPACK routines
- in each data type, one for the linear equation routines and
- one for the eigensystem routines. The timing program for the
- linear equation routines is also used to time the BLAS.
- We encourage you to conduct these timing experiments
- in REAL and COMPLEX or in DOUBLE PRECISION and COMPLEX*16; it is
- not necessary to send timing results in all four data types.
-
- Two sets of input files are provided, a small set and a large set.
- The small data sets are appropriate for a standard workstation or
- other non-vector machine.
- The large data sets are appropriate for supercomputers, vector
- computers, and high-performance workstations.
- We are mainly interested in results from the large data sets, and
- it is not necessary to run both the large and small sets.
- The values of N in the large data sets are about five times larger
- than those in the small data set,
- and the large data sets use additional values for parameters such as the
- block size NB and the leading array dimension LDA.
- The small data sets have \_small in their names, such as
- \texttt{stime\_small.in}, and the large data sets have \_large in their names,
- such as \texttt{stime\_large.in}.
- Except as noted, the leading `s' in the input file name must be
- replaced by `d', `c', or `z' for the other data types.
-
- We encourage you to obtain timing results with the large data sets,
- as this allows us to compare different machines.
- If this would take too much time, suggestions for paring back the large
- data sets are given in the instructions below.
- We also encourage you to experiment with these timing
- programs and send us any interesting results, such as results for
- larger problems or for a wider range of block sizes.
- The main programs are dimensioned for the large data sets,
- so the parameters in the main program may have to be reduced in order
- to run the small data sets on a small machine, or increased to run
- experiments with larger problems.
-
- The minimum time each subroutine will be timed is set to 0.0 in
- the large data files and to 0.05 in the small data files, and on
- many machines this value should be increased.
- If the timing interval is not long
- enough, the time for the subroutine after subtracting the overhead
- may be very small or zero, resulting in megaflop rates that are
- very large or zero. (To avoid division by zero, the megaflop rate is
- set to zero if the time is less than or equal to zero.)
- The minimum time that should be used depends on the machine and the
- resolution of the clock.
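-
- The rule in parentheses and the role of the minimum time can be sketched in
- Python (hypothetical helper names; this is not the code of the timing
- programs, only an illustration of the logic described above, and it assumes
- that a routine is simply repeated until the minimum time has accumulated).
-
```python
import time


def megaflop_rate(operations, seconds):
    """Return the megaflop rate, or zero if the measured time is not positive."""
    if seconds <= 0.0:
        return 0.0
    return operations / seconds / 1.0e6


def time_at_least(routine, min_time):
    """Call `routine` repeatedly until at least `min_time` seconds of CPU time
    have accumulated; return the average time per call."""
    calls = 0
    start = time.process_time()
    while True:
        routine()
        calls += 1
        elapsed = time.process_time() - start
        if elapsed >= min_time:
            return elapsed / calls


def work():
    """A small stand-in workload."""
    sum(i * 0.5 for i in range(1000))


per_call = time_at_least(work, 0.05)
print(f"average time per call: {per_call:.2e} s")
print(megaflop_rate(2.0e6, 1.0))   # 2.0
print(megaflop_rate(1.0e6, 0.0))   # 0.0
```
-
- With a minimum time of zero the routine is called only once, which is when a
- coarse clock is most likely to report a time of zero.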
-
- For more information on the timing programs and how to modify the
- input files, please refer to LAPACK Working Note 41~\cite{WN41}.
- % see Section~\ref{moretiming}.
-
- If you do not wish to run each of the timings individually, you can
- go to \texttt{LAPACK}, edit the definition of \texttt{lapack\_timing} in the file
- \texttt{Makefile} to specify the data types desired, and type \texttt{make
- lapack\_timing}. This will compile
- and run the timings for the linear equation routines and the eigensystem
- routines (see Sections~\ref{timelin} and~\ref{timeeig}).
-
- %If you are installing LAPACK on a Silicon Graphics machine, you must
- %modify the definition of \texttt{timing} to be
- %\begin{verbatim}
- %timing:
- % ( cd TIMING; $(MAKE) -f Makefile.sgi )
- %\end{verbatim}
-
- If you encounter failures in any phase of the timing process, please
- feel free to contact the authors as directed in Section~\ref{sendresults}.
- Tell us the
- type of machine on which the tests were run, the version of the operating
- system, the compiler and compiler options that were used,
- and details of the BLAS library or libraries that you used. You should
- also include a copy of the output file in which the failure occurs.
-
- Please note that the BLAS
- timing runs must still be performed as instructed in Section~\ref{timeblas}.
-
- \subsubsection{Timing the Linear Equations Routines}\label{timelin}
-
- The linear equation timing program is found in \texttt{LAPACK/TIMING/LIN}
- and the input files are in \texttt{LAPACK/TIMING}.
- Three input files are provided in each data type for timing the
- linear equation routines, one for square matrices, one for band
- matrices, and one for rectangular matrices. The small data sets for the REAL version
- are \texttt{stime\_small.in}, \texttt{sband\_small.in}, and \texttt{stime2\_small.in}, respectively,
- and the large data sets are
- \texttt{stime\_large.in}, \texttt{sband\_large.in}, and \texttt{stime2\_large.in}.
-
- The timing program for the least squares routines uses special instrumented
- versions of the LAPACK routines to time individual sections of the code.
- The first step in compiling the timing program is therefore to make a library
- of the instrumented routines.
-
- \begin{itemize}
- \item[a)]
- \begin{sloppypar}
- To make a library of the instrumented LAPACK routines, first
- go to \texttt{LAPACK/TIMING/LIN/LINSRC} and type \texttt{make} followed
- by the data types desired, as in the examples of Section~\ref{toplevelmakefile}.
- The library of instrumented code is created in
- \texttt{LAPACK/TIMING/LIN/linsrc\_PLAT.a},
- where \texttt{PLAT} is the user-defined architecture suffix specified in the
- file \texttt{LAPACK/make.inc}.
- \end{sloppypar}
-
- \item[b)]
- To make the linear equation timing programs,
- go to \texttt{LAPACK/TIMING/LIN} and type \texttt{make} followed by the data
- types desired, as in the examples in Section~\ref{toplevelmakefile}.
- The executable files are called \texttt{xlintims},
- \texttt{xlintimc}, \texttt{xlintimd}, and \texttt{xlintimz} and are created
- in \texttt{LAPACK/TIMING}.
-
- \item[c)]
- Go to \texttt{LAPACK/TIMING} and
- make any necessary modifications to the input files.
- You may need to set the minimum time a subroutine will
- be timed to a positive value, or to restrict the size of the tests
- if you are using a computer with performance in between that of a
- workstation and that of a supercomputer.
- The computational requirements can be cut in half by using only one
- value of LDA.
- If it is necessary to also reduce the matrix sizes or the values of
- the blocksize, corresponding changes should be made to the
- BLAS input files (see Section~\ref{timeblas}).
-
- \item[d)]
- Run the programs for each data type you are using.
- For the REAL version, the commands for the small data sets are
-
- \begin{list}{}{}
- \item{} \texttt{xlintims < stime\_small.in > stime\_small.out }
- \item{} \texttt{xlintims < sband\_small.in > sband\_small.out }
- \item{} \texttt{xlintims < stime2\_small.in > stime2\_small.out }
- \end{list}
- or the commands for the large data sets are
- \begin{list}{}{}
- \item{} \texttt{xlintims < stime\_large.in > stime\_large.out }
- \item{} \texttt{xlintims < sband\_large.in > sband\_large.out }
- \item{} \texttt{xlintims < stime2\_large.in > stime2\_large.out }
- \end{list}
-
- \noindent
- Similar commands should be used for the other data types.
- \end{itemize}
-
- \subsubsection{Timing the BLAS}\label{timeblas}
-
- The linear equation timing program is also used to time the BLAS.
- Three input files are provided in each data type for timing the Level
- 2 and 3 BLAS.
- These input files time the BLAS using the matrix shapes encountered
- in the LAPACK routines, and we will use the results to analyze the
- performance of the LAPACK routines.
- For the REAL version, the small data files are
- \texttt{sblasa\_small.in}, \texttt{sblasb\_small.in}, and \texttt{sblasc\_small.in}
- and the large data files are
- \texttt{sblasa\_large.in}, \texttt{sblasb\_large.in}, and \texttt{sblasc\_large.in}.
- There are three sets of inputs because there are three
- parameters in the Level 3 BLAS, M, N, and K, and
- in most applications one of these parameters is small (on the order
- of the blocksize) while the other two are large (on the order of the
- matrix size).
- In \texttt{sblasa\_small.in}, M and N are large but K is
- small, while in \texttt{sblasb\_small.in} the small parameter is M, and
- in \texttt{sblasc\_small.in} the small parameter is N.
- The Level 2 BLAS are timed only in the first data set, where K
- is also used as the bandwidth for the banded routines.
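-
- To make the three shapes concrete, the following Python sketch (illustrative
- only; the sizes 500 and 32 are made-up values, not the contents of the input
- files, and the function names are hypothetical) lists M, N, and K for each
- input set, together with the operation count $2MNK$ of a real matrix-matrix
- multiply with those dimensions.
-
```python
def blas3_shapes(matrix_size, block_size):
    """Return (M, N, K) for each of the three BLAS timing input sets."""
    return {
        "blasa": (matrix_size, matrix_size, block_size),  # K is small
        "blasb": (block_size, matrix_size, matrix_size),  # M is small
        "blasc": (matrix_size, block_size, matrix_size),  # N is small
    }


def gemm_operations(m, n, k):
    """Floating point operations in a real M-by-K times K-by-N multiply."""
    return 2 * m * n * k


for name, (m, n, k) in blas3_shapes(500, 32).items():
    print(f"{name}: M={m} N={n} K={k} operations={gemm_operations(m, n, k)}")
```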
-
- \begin{itemize}
-
- \item[a)]
- Go to \texttt{LAPACK/TIMING} and
- make any necessary modifications to the input files.
- You may need to set the minimum time a subroutine will
- be timed to a positive value.
- If you modified the values of N or NB
- in Section~\ref{timelin}, set M, N, and K accordingly.
- The large parameters among M, N, and K
- should be the same as the matrix sizes used in timing the linear
- equation routines,
- and the small parameter should be the same as the
- blocksizes used in timing the linear equation routines.
- If necessary, the large data set can be simplified by using only one
- value of LDA.
-
- \item[b)]
- Run the programs for each data type you are using.
- For the REAL version, the commands for the small data sets are
-
- \begin{list}{}{}
- \item{} \texttt{xlintims < sblasa\_small.in > sblasa\_small.out }
- \item{} \texttt{xlintims < sblasb\_small.in > sblasb\_small.out }
- \item{} \texttt{xlintims < sblasc\_small.in > sblasc\_small.out }
- \end{list}
- or the commands for the large data sets are
- \begin{list}{}{}
- \item{} \texttt{xlintims < sblasa\_large.in > sblasa\_large.out }
- \item{} \texttt{xlintims < sblasb\_large.in > sblasb\_large.out }
- \item{} \texttt{xlintims < sblasc\_large.in > sblasc\_large.out }
- \end{list}
-
- \noindent
- Similar commands should be used for the other data types.
- \end{itemize}
-
- \subsubsection{Timing the Eigensystem Routines}\label{timeeig}
-
- The eigensystem timing program is found in \texttt{LAPACK/TIMING/EIG}
- and the input files are in \texttt{LAPACK/TIMING}.
- Four input files are provided in each data type for timing the
- eigensystem routines,
- one for the generalized nonsymmetric eigenvalue problem,
- one for the nonsymmetric eigenvalue problem,
- one for the symmetric and generalized symmetric eigenvalue problem,
- and one for the singular value decomposition.
- For the REAL version, the small data sets are called \texttt{sgeptim\_small.in},
- \texttt{sneptim\_small.in}, \texttt{sseptim\_small.in}, and \texttt{ssvdtim\_small.in}, respectively,
- and the large data sets are called \texttt{sgeptim\_large.in}, \texttt{sneptim\_large.in},
- \texttt{sseptim\_large.in}, and \texttt{ssvdtim\_large.in}.
- Each of the four input files reads a different set of parameters,
- and the format of the input is indicated by a 3-character code
- on the first line.
-
- The timing program for eigenvalue/singular value routines accumulates
- the operation count as the routines are executing using special
- instrumented versions of the LAPACK routines. The first step in
- compiling the timing program is therefore to make a library of the
- instrumented routines.
-
- \begin{itemize}
- \item[a)]
- \begin{sloppypar}
- To make a library of the instrumented LAPACK routines, first
- go to \texttt{LAPACK/TIMING/EIG/EIGSRC} and type \texttt{make} followed
- by the data types desired, as in the examples of Section~\ref{toplevelmakefile}.
- The library of instrumented code is created in
- \texttt{LAPACK/TIMING/EIG/eigsrc\_PLAT.a},
- where \texttt{PLAT} is the user-defined architecture suffix specified in the
- file \texttt{LAPACK/make.inc}.
- \end{sloppypar}
-
- \item[b)]
- To make the eigensystem timing programs,
- go to \texttt{LAPACK/TIMING/EIG} and
- type \texttt{make} followed by the data types desired, as in the examples
- of Section~\ref{toplevelmakefile}. The executable files are called
- \texttt{xeigtims}, \texttt{xeigtimc}, \texttt{xeigtimd}, and \texttt{xeigtimz}
- and are created in \texttt{LAPACK/TIMING}.
-
- \item[c)]
- Go to \texttt{LAPACK/TIMING} and
- make any necessary modifications to the input files.
- You may need to set the minimum time a subroutine will
- be timed to a positive value, or to restrict the number of tests
- if you are using a computer with performance in between that of a
- workstation and that of a supercomputer.
- Instead of decreasing the matrix dimensions to reduce the time,
- it would be better to reduce the number of matrix types to be timed,
- since the performance varies more with the matrix size than with the
- type. For example, for the nonsymmetric eigenvalue routines,
- you could use only one matrix of type 4 instead of four matrices of
- types 1, 3, 4, and 6.
- Refer to LAPACK Working Note 41~\cite{WN41} for further details.
- % See Section~\ref{moretiming} for further details.
-
- \item[d)]
- Run the programs for each data type you are using.
- For the REAL version, the commands for the small data sets are
-
- \begin{list}{}{}
- \item{} \texttt{xeigtims < sgeptim\_small.in > sgeptim\_small.out }
- \item{} \texttt{xeigtims < sneptim\_small.in > sneptim\_small.out }
- \item{} \texttt{xeigtims < sseptim\_small.in > sseptim\_small.out }
- \item{} \texttt{xeigtims < ssvdtim\_small.in > ssvdtim\_small.out }
- \end{list}
- or the commands for the large data sets are
- \begin{list}{}{}
- \item{} \texttt{xeigtims < sgeptim\_large.in > sgeptim\_large.out }
- \item{} \texttt{xeigtims < sneptim\_large.in > sneptim\_large.out }
- \item{} \texttt{xeigtims < sseptim\_large.in > sseptim\_large.out }
- \item{} \texttt{xeigtims < ssvdtim\_large.in > ssvdtim\_large.out }
- \end{list}
-
- \noindent
- Similar commands should be used for the other data types.
- \end{itemize}
-
- \subsection{Send the Results to Tennessee}\label{sendresults}
-
- Congratulations! You have now finished installing, testing, and
- timing LAPACK. If you encountered failures in any phase of the
- testing or timing process, please
- consult our \texttt{release\_notes} file on netlib.
- \begin{quote}
- \url{http://www.netlib.org/lapack/release\_notes}
- \end{quote}
- This file contains machine-dependent installation clues which hopefully will
- alleviate your difficulties or at least let you know that other users
- have had similar difficulties on that machine. If there is not an entry
- for your machine or the suggestions do not fix your problem, please feel
- free to contact the authors at
- \begin{list}{}{}
- \item \href{mailto:lapack@cs.utk.edu}{\texttt{lapack@cs.utk.edu}}.
- \end{list}
- Tell us the
- type of machine on which the tests were run, the version of the operating
- system, the compiler and compiler options that were used,
- and details of the BLAS library or libraries that you used. You should
- also include a copy of the output file in which the failure occurs.
-
- We would like to keep our \texttt{release\_notes} file as up-to-date as possible.
- Therefore, if you do not see an entry for your machine, please contact us
- with your testing results.
-
- Comments and suggestions are also welcome.
-
- We encourage you to make the LAPACK library available to your
- users and provide us with feedback from their experiences.
- %This release of LAPACK is not guaranteed to be compatible
- %with any previous test release.
-
- \subsection{Get support}\label{getsupport}
- First, take a look at the complete installation manual in LAPACK Working Note 41~\cite{WN41}.
- If you still cannot solve your problem, you have two options:
- \begin{itemize}
- \item
- either post a message on the LAPACK forum
- \begin{quote}
- \url{http://icl.cs.utk.edu/lapack-forum}
- \end{quote}
- \item
- or send an email to the LAPACK mailing list:
- \begin{list}{}{}
- \item \href{mailto:lapack@cs.utk.edu}{\texttt{lapack@cs.utk.edu}}.
- \end{list}
- \end{itemize}
- \section*{Acknowledgments}
-
- Ed Anderson and Susan Blackford contributed to previous versions of this report.
-
- \appendix
-
- \chapter{Caveats}\label{appendixd}
-
- In this appendix we list a few of the machine-specific difficulties we
- have
- encountered in our own experience with LAPACK. A more detailed list
- of machine-dependent problems, bugs, and compiler errors encountered
- in the LAPACK installation process is maintained
- on \emph{netlib}.
- \begin{quote}
- \url{http://www.netlib.org/lapack/release\_notes}
- \end{quote}
-
- We assume the user has installed the machine-specific routines
- correctly and that the Level 1, 2 and 3 BLAS test programs have run
- successfully, so we do not list any warnings associated with those
- routines.
-
- \section{\texttt{LAPACK/make.inc}}
-
- All machine-specific
- parameters are specified in the file \texttt{LAPACK/make.inc}.
-
- The first line of this \texttt{make.inc} file is:
- \begin{quote}
- SHELL = /bin/sh
- \end{quote}
- and will need to be modified to \texttt{SHELL = /sbin/sh} if you are
- installing LAPACK on an SGI architecture.
-
- \section{ETIME}
-
- On HPPA architectures,
- the compiler and loader flag \texttt{+U77} should be included to access
- the function \texttt{ETIME}.
-
- \section{ILAENV and IEEE-754 compliance}
-
- %By default, ILAENV (\texttt{LAPACK/SRC/ilaenv.f}) assumes an IEEE and IEEE-754
- %compliant architecture, and thus sets (\texttt{ILAENV=1}) for (\texttt{ISPEC=10})
- %and (\texttt{ISPEC=11}) settings in ILAENV.
- %
- %If you are installing LAPACK on a non-IEEE machine, you MUST modify ILAENV,
- %as this test inside ILAENV will crash!
-
- As some new routines in LAPACK rely on IEEE-754 compliance,
- two settings (\texttt{ISPEC=10} and \texttt{ISPEC=11}) have been added to ILAENV
- (\texttt{LAPACK/SRC/ilaenv.f}) to denote IEEE-754 compliance for NaN and
- infinity arithmetic, respectively. By default, ILAENV assumes an IEEE
- machine, and does a test for IEEE-754 compliance. \textbf{NOTE: If you
- are installing LAPACK on a non-IEEE machine, you MUST modify ILAENV,
- as this test inside ILAENV will crash!}
-
- Thus, for non-IEEE machines, the user must hard-code the setting
- \texttt{ILAENV=0} for \texttt{ISPEC=10} and \texttt{ISPEC=11} in the version
- of \texttt{LAPACK/SRC/ilaenv.f} to be put in
- the library. For further details, refer to Section~\ref{testieee}.
-
- Be aware
- that some IEEE compilers by default do not enforce IEEE-754 compliance, and
- a compiler flag must be explicitly set by the user.
-
- On SGIs for example, you must set the \texttt{-OPT:IEEE\_NaN\_inf=ON} compiler
- flag to enable IEEE-754 compliance.
-
- Lastly, the test inside ILAENV to detect IEEE-754 compliance will
- result in IEEE exceptions for ``Divide by Zero'' and ``Invalid Operation''.
- Thus, if you are installing on a machine that issues IEEE exception
- warning messages (like a Sun SPARCstation), you can disregard these
- messages. To avoid these messages, you can hard-code the values
- inside ILAENV as explained in Section~\ref{testieee}.
-
- \section{Lack of \texttt{/tmp} space}
-
- If \texttt{/tmp} space is small (i.e., less than approximately 16 MB) on your
- architecture, you may run out of space
- when compiling. There are a few possible solutions to this problem.
- \begin{enumerate}
- \item You can ask your system administrator to increase the size of the
- \texttt{/tmp} partition.
- \item You can change the environment variable \texttt{TMPDIR} to point to
- your home directory for temporary space. E.g.,
- \begin{quote}
- \texttt{setenv TMPDIR /home/userid/}
- \end{quote}
- where \texttt{/home/userid/} is the user's home directory.
- \item If your archive command has an \texttt{l} option, you can change the
- archive command to \texttt{ar crl} so that the
- archive command will only place temporary files in the current working
- directory rather than in the default temporary directory \texttt{/tmp}.
- \end{enumerate}
-
- \section{BLAS}
-
- If you suspect a BLAS-related problem and you are linking
- with an optimized version of the BLAS, we strongly suggest,
- as a first step, that you link to the Fortran~77 version of
- the suspected BLAS routine and see whether the error disappears.
-
- We have included test programs for the Level 1 BLAS.
- Users should nevertheless beware of a common problem in machine-specific
- implementations of xNRM2,
- the function to compute the 2-norm of a vector.
- The Fortran version of xNRM2 avoids underflow or overflow
- by scaling intermediate results, but some library versions of xNRM2
- are not so careful about scaling.
- If xNRM2 is implemented without scaling intermediate results, some of
- the LAPACK test ratios may be unusually high, or
- a floating point exception may occur in the problems scaled near
- underflow or overflow.
- The solution to these problems is to link the Fortran version of
- xNRM2 with the test program. \emph{On some CRAY architectures, the Fortran~77
- version of xNRM2 should be used.}
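-
- To illustrate why the scaling matters, here is a sketch of the idea in
- notation introduced only for this example (the Fortran routine may organize
- the computation differently). With $s = \max_i |x_i| > 0$, the 2-norm can be
- written as
- \[
- \| x \|_2 = \sqrt{\sum_{i=1}^{n} x_i^2}
- = s \sqrt{\sum_{i=1}^{n} \left( \frac{x_i}{s} \right)^2} ,
- \]
- where every term of the second sum lies between 0 and 1. For example, in
- IEEE single precision, where the overflow threshold is about
- $3.4 \times 10^{38}$, the vector $x = (10^{30}, 10^{30})$ has
- $\| x \|_2 = \sqrt{2} \times 10^{30}$, which is representable, but the
- unscaled sum requires $x_1^2 = 10^{60}$, which overflows. The scaled form
- only needs $10^{30} \sqrt{1 + 1}$ and never forms the large intermediate
- value.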
-
- \section{Optimization}
-
- If a large number of test failures occur for a specific matrix type
- or operation, there may be an optimization problem with
- your compiler. In that case, try reducing the level of
- optimization, or eliminating optimization entirely, for those routines
- and see whether the failures disappear when you rerun the tests.
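-
- As a coarse first check, and only as a sketch: assuming your
- \texttt{LAPACK/make.inc} sets the compiler optimization flags in a variable
- named \texttt{OPTS}, and that your compiler accepts \texttt{-O0} to disable
- optimization (both are assumptions; consult your own \texttt{make.inc} and
- compiler documentation), the whole library could be rebuilt without
- optimization by setting
- \begin{verbatim}
- OPTS = -O0
- \end{verbatim}
- and recompiling before rerunning the tests. If the failures disappear,
- optimization can then be restored for all but the affected routines.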
-
- %LAPACK is written in Fortran 77. Prospective users with only a
- %Fortran 66 compiler will not be able to use this package.
-
- \section{Compiling testing/timing drivers}
-
- The testing and timing main programs (xCHKAA, xCHKEE, xTIMAA, and
- xTIMEE)
- allocate a large amount of storage for local variables. It is therefore
- important for the user to know whether the compiler by default allocates local
- variables statically or on the stack. With compilers that place local
- variables on the stack, a stack
- overflow at run time in the testing or timing process is not uncommon. The
- user then has two options: increase the stack size, or force all local variables
- to be allocated statically.
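-
- As a sketch of the first option, many Unix shells provide a built-in command
- that raises the stack size limit for the current session. Under \texttt{csh}
- or \texttt{tcsh} one could try
- \begin{verbatim}
- limit stacksize unlimited
- \end{verbatim}
- and under a Bourne-type shell (\texttt{sh}, \texttt{ksh}, \texttt{bash})
- \begin{verbatim}
- ulimit -s unlimited
- \end{verbatim}
- Whether \texttt{unlimited} is accepted, and the largest value allowed,
- depend on how your system is configured.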
-
- On HPPA architectures, the
- compiler and loader flag \texttt{-K} should be used when compiling these testing
- and timing main programs to avoid such a stack overflow. That is, set
- \texttt{DRVOPTS = -K} in the \texttt{LAPACK/make.inc} file.
-
- For similar reasons,
- on SGI architectures, the compiler and loader flag \texttt{-static} should be
- used. That is, set \texttt{DRVOPTS = -static} in the \texttt{LAPACK/make.inc} file.
-
- \section{IEEE arithmetic}
-
- Some of our test matrices are scaled near overflow or underflow,
- but on the Crays, problems with the arithmetic near overflow and
- underflow forced us to scale by only the square root of overflow
- and underflow.
- The LAPACK auxiliary routine SLABAD (or DLABAD) is called to
- take the square root of underflow and overflow in cases where it
- could cause difficulties.
- We assume we are on a Cray if $ \log_{10} (\mathrm{overflow})$
- is greater than 2000
- and take the square root of underflow and overflow in this case.
- The test in SLABAD is as follows:
- \begin{verbatim}
-       IF( LOG10( LARGE ).GT.2000. ) THEN
-          SMALL = SQRT( SMALL )
-          LARGE = SQRT( LARGE )
-       END IF
- \end{verbatim}
- Users of other machines with similar restrictions on the effective
- range of usable numbers may have to modify this test so that the
- square roots are done on their machine as well. \emph{Usually on
- HPPA architectures, a similar restriction in SLABAD should be enforced
- for all testing involving complex arithmetic.}
- SLABAD is located in \texttt{LAPACK/SRC}.
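-
- As a rough check of when this test is triggered, consider the approximate
- IEEE-754 overflow thresholds for single and double precision (standard
- figures, quoted here only for illustration):
- \[
- \log_{10} (3.4 \times 10^{38}) \approx 38.5 , \qquad
- \log_{10} (1.8 \times 10^{308}) \approx 308.3 .
- \]
- Both values are far below 2000, so on IEEE machines the condition is false
- and SMALL and LARGE are left unchanged.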
-
- For machines which have a narrow exponent range or lack gradual
- underflow (DEC VAXes for example), it is not uncommon to experience
- failures in sec.out and/or dec.out with SLAQTR/DLAQTR or DTRSYL.
- The failures in SLAQTR/DLAQTR and DTRSYL
- occur with test problems which are very badly scaled when the norm of
- the solution is very close to the underflow
- threshold (or even underflows to zero). We believe that these failures
- could probably be avoided by an even greater degree of care in scaling,
- but we did not want to delay the release of LAPACK any further. These
- tests pass successfully on most other machines. An example failure in
- dec.out on a MicroVAX II looks like the following:
-
- \begin{verbatim}
- Tests of the Nonsymmetric eigenproblem condition estimation routines
- DLALN2, DLASY2, DLANV2, DLAEXC, DTRSYL, DTREXC, DTRSNA, DTRSEN, DLAQTR
-
- Relative machine precision (EPS) = 0.277556D-16
- Safe minimum (SFMIN) = 0.587747D-38
-
- Routines pass computational tests if test ratio is less than 20.00
-
- DEC routines passed the tests of the error exits ( 35 tests done)
- Error in DTRSYL: RMAX = 0.155D+07
- LMAX = 5323 NINFO= 1600 KNT= 27648
- Error in DLAQTR: RMAX = 0.344D+04
- LMAX = 15792 NINFO= 26720 KNT= 45000
- \end{verbatim}
-
- \section{Timing programs}
-
- In the eigensystem timing program, calls are made to the LINPACK
- and EISPACK equivalents of the LAPACK routines to allow a direct
- comparison of performance measures.
- In some cases we have increased the minimum number of
- iterations in the LINPACK and EISPACK routines to allow
- them to converge for our test problems, but
- even this may not be enough.
- One goal of the LAPACK project is to improve the convergence
- properties of these routines, so error messages in the output
- file indicating that a LINPACK or EISPACK routine did not
- converge should not be regarded with alarm.
-
- In the eigensystem timing program, we have equivalenced some work
- arrays and then passed them to a subroutine, where both arrays are
- modified. This is a violation of the Fortran~77 standard, which
- says ``if a subprogram reference causes a dummy argument in the
- referenced subprogram to become associated with another dummy
- argument in the referenced subprogram, neither dummy argument may
- become defined during execution of the subprogram.''\footnote{ANSI X3.9-1978, sec.~15.9.3.6}
- If this causes any difficulties, the equivalence
- can be commented out as explained in the comments for the main
- eigensystem timing programs.
-
- %\section*{MACHINE-SPECIFIC DIFFICULTIES}
- %Some IBM compilers do not recognize DBLE as a generic function as used
- %in LAPACK. The software tools we use to convert from single precision
- %to double precision convert REAL(C) and AIMAG(C), where C is COMPLEX,
- %to DBLE(Z) and DIMAG(Z), where Z is COMPLEX*16, but
- %IBM compilers use DREAL(Z) and DIMAG(Z) to take the real and
- %imaginary parts of a double complex number.
- %IBM users can fix this problem by changing DBLE to DREAL when the
- %argument of DBLE is COMPLEX*16.
- %
- %IBM compilers do not permit the data type COMPLEX*16 in a FUNCTION
- %subprogram definition. The data type on the first line of the
- %function subprogram must be changed from COMPLEX*16 to DOUBLE COMPLEX
- %for the following functions:
- %
- %\begin{tabbing}
- %\dent ZLATMOO \= from the test matrix generator library \kill
- %\dent ZBEG \> from the Level 2 BLAS test program \\
- %\dent ZBEG \> from the Level 3 BLAS test program \\
- %\dent ZLADIV \> from the LAPACK library \\
- %\dent ZLARND \> from the test matrix generator library \\
- %\dent ZLATM2 \> from the test matrix generator library \\
- %\dent ZLATM3 \> from the test matrix generator library
- %\end{tabbing}
- %The functions ZDOTC and ZDOTU from the Level 1 BLAS are already
- %declared DOUBLE COMPLEX. If that doesn't work, try the declaration
- %COMPLEX FUNCTION*16.
-
-
- \newpage
- \addcontentsline{toc}{section}{Bibliography}
-
- \begin{thebibliography}{99}
-
- \bibitem{LUG}
- E. Anderson, Z. Bai, C. Bischof, J. Demmel, J. Dongarra,
- J. Du Croz, A. Greenbaum, S. Hammarling, A. McKenney,
- S. Ostrouchov, and D. Sorensen,
- \textit{LAPACK Users' Guide}, Second Edition,
- {SIAM}, Philadelphia, PA, 1995.
-
- \bibitem{WN16}
- E. Anderson and J. Dongarra,
- \textit{LAPACK Working Note 16:
- Results from the Initial Release of LAPACK},
- University of Tennessee, CS-89-89, November 1989.
-
- \bibitem{WN41}
- E. Anderson, J. Dongarra, and S. Ostrouchov,
- \textit{LAPACK Working Note 41:
- Installation Guide for LAPACK},
- University of Tennessee, CS-92-151, February 1992 (revised June 1999).
-
- \bibitem{WN5}
- C. Bischof, J. Demmel, J. Dongarra, J. Du Croz, A. Greenbaum,
- S. Hammarling, and D. Sorensen,
- \textit{LAPACK Working Note \#5: Provisional Contents},
- Argonne National Laboratory, ANL-88-38, September 1988.
-
- \bibitem{WN13}
- Z. Bai, J. Demmel, and A. McKenney,
- \textit{LAPACK Working Note \#13: On the Conditioning of the Nonsymmetric
- Eigenvalue Problem: Theory and Software},
- University of Tennessee, CS-89-86, October 1989.
-
- \bibitem{XBLAS}
- X. S. Li, J. W. Demmel, D. H. Bailey, G. Henry, Y. Hida, J. Iskandar,
- W. Kahan, S. Y. Kang, A. Kapur, M. C. Martin, B. J. Thompson, T. Tung,
- and D. J. Yoo,
- ``Design, Implementation and Testing of Extended
- and Mixed Precision BLAS,''
- \textit{ACM Trans. Math. Soft.}, 28, 2:152--205, June 2002.
-
- \bibitem{BLAS3}
- J. Dongarra, J. Du Croz, I. Duff, and S. Hammarling,
- ``A Set of Level 3 Basic Linear Algebra Subprograms,''
- \textit{ACM Trans. Math. Soft.}, 16, 1:1--17, March 1990.
- %Argonne National Laboratory, ANL-MCS-P88-1, August 1988.
-
- \bibitem{BLAS3-test}
- J. Dongarra, J. Du Croz, I. Duff, and S. Hammarling,
- ``A Set of Level 3 Basic Linear Algebra Subprograms:
- Model Implementation and Test Programs,''
- \textit{ACM Trans. Math. Soft.}, 16, 1:18--28, March 1990.
- %Argonne National Laboratory, ANL-MCS-TM-119, June 1988.
-
- \bibitem{BLAS2}
- J. Dongarra, J. Du Croz, S. Hammarling, and R. Hanson,
- ``An Extended Set of Fortran Basic Linear Algebra Subprograms,''
- \textit{ACM Trans. Math. Soft.}, 14, 1:1--17, March 1988.
-
- \bibitem{BLAS2-test}
- J. Dongarra, J. Du Croz, S. Hammarling, and R. Hanson,
- ``An Extended Set of Fortran Basic Linear Algebra Subprograms:
- Model Implementation and Test Programs,''
- \textit{ACM Trans. Math. Soft.}, 14, 1:18--32, March 1988.
-
- \bibitem{BLAS1}
- C. L. Lawson, R. J. Hanson, D. R. Kincaid, and F. T. Krogh,
- ``Basic Linear Algebra Subprograms for Fortran Usage,''
- \textit{ACM Trans. Math. Soft.}, 5, 3:308--323, September 1979.
-
- \end{thebibliography}
-
- \end{document}
|