  1. \documentclass[11pt]{report}
  2. \usepackage{indentfirst}
  3. \usepackage[body={6in,8.5in}]{geometry}
  4. \usepackage{hyperref}
  5. \usepackage{graphicx}
  6. \DeclareGraphicsRule{.ps}{eps}{}{}
  7. \renewcommand{\thesection}{\arabic{section}}
  8. \setcounter{tocdepth}{3}
  9. \setcounter{secnumdepth}{3}
  10. \begin{document}
  11. \begin{center}
  12. {\Large LAPACK Working Note 81\\
  13. Quick Installation Guide for LAPACK on Unix Systems\footnote{This work was
  14. supported by NSF Grant No. ASC-8715728 and NSF Grant No. 0444486}}
  15. \end{center}
  16. \begin{center}
  17. % Edward Anderson\footnote{Current address: Cray Research Inc.,
  18. % 655F Lone Oak Drive, Eagan, MN 55121},
  19. The LAPACK Authors\\
  20. Department of Computer Science \\
  21. University of Tennessee \\
  22. Knoxville, Tennessee 37996-1301 \\
  23. \end{center}
  24. \begin{center}
  25. REVISED: VERSION 3.1.1, February 2007 \\
  26. REVISED: VERSION 3.2.0, November 2008
  27. \end{center}
  28. \begin{center}
  29. Abstract
  30. \end{center}
  31. This working note describes how to install and test version 3.2.0
  32. of LAPACK, a linear algebra package for high-performance
  33. computers, on a Unix System. The timing routines are not actually included in
  34. release 3.2.0, and that part of the LAWN refers to release 3.0. Also,
  35. version 3.2.0 contains many prototype routines needing user feedback.
  36. Non-Unix installation instructions and
  37. further details of the testing and timing suites are only contained in
  38. LAPACK Working Note 41, and not in this abbreviated version.
  39. %Separate instructions are provided for the Unix and non-Unix
  40. %versions of the test package.
  41. %Further details are also given on the design of the test and timing
  42. %programs.
  43. \newpage
  44. \tableofcontents
  45. \newpage
  46. % Introduction to Implementation Guide
  47. \section{Introduction}
  48. LAPACK is a linear algebra library for high-performance
  49. computers.
  50. The library includes Fortran subroutines for
  51. the analysis and solution of systems of simultaneous linear algebraic
  52. equations, linear least-squares problems, and matrix eigenvalue
  53. problems.
  54. Our approach to achieving high efficiency is based on the use of
  55. a standard set of Basic Linear Algebra Subprograms (the BLAS),
  56. which can be optimized for each computing environment.
  57. By confining most of the computational work to the BLAS,
  58. the subroutines should be
  59. transportable and efficient across a wide range of computers.
  60. This working note describes how to install, test, and time this
  61. release of LAPACK on a Unix System.
  62. The instructions for installing, testing, and timing
  63. \footnote{Timing routines are provided only in LAPACK 3.0 and earlier.}
  64. are designed for a person whose
  65. responsibility is the maintenance of a mathematical software library.
  66. We assume the installer has experience in compiling and running
  67. Fortran programs and in creating object libraries.
  68. The installation process involves untarring the file, creating a set of
  69. libraries, and compiling and running the test and timing programs
  70. \footnotemark[\value{footnote}].
  71. %This guide combines the instructions for the Unix and non-Unix
  72. %versions of the LAPACK test package (the non-Unix version is in Appendix
  73. %~\ref{appendixe}).
  74. %At this time, the non-Unix version of LAPACK can only be obtained
  75. %after first untarring the Unix tar tape and then following the instructions in
  76. %Appendix ~\ref{appendixe}.
  77. Section~\ref{fileformat} describes how the files are organized in the
  78. distribution tar file, and
  79. Section~\ref{overview} gives a general overview of the parts of the test package.
  80. Step-by-step instructions appear in Section~\ref{installation}.
  81. %for the Unix version and in the appendix for the non-Unix version.
  82. For users desiring additional information, please refer to LAPACK
  83. Working Note 41.
  84. % Sections~\ref{moretesting}
  85. %and ~\ref{moretiming} give
  86. %details of the test and timing programs and their input files.
  87. %Appendices ~\ref{appendixa} and ~\ref{appendixb} briefly describe
  88. %the LAPACK routines and auxiliary routines provided
  89. %in this release.
  90. %Appendix ~\ref{appendixc} lists the operation counts we have computed
  91. %for the BLAS and for some of the LAPACK routines.
  92. Appendix~\ref{appendixd}, entitled ``Caveats'', is a compendium of the known
  93. problems from our own experiences, with suggestions on how to
  94. overcome them.
  95. \textbf{It is strongly advised that the user read Appendix
  96. A before proceeding with the installation process.}
  97. %Appendix E contains the execution times of the different test
  98. %and timing runs on two sample machines.
  99. %Appendix ~\ref{appendixe} contains the instructions to install LAPACK on a non-Unix
  100. %system.
  101. \section{Revisions Since the First Public Release}
  102. Since its first public release in February, 1992, LAPACK has had
  103. several updates, which have encompassed the introduction of new routines
  104. as well as extending the functionality of existing routines. The first
  105. update,
  106. June 30, 1992, was version 1.0a; the second update, October 31, 1992,
  107. was version 1.0b; the third update, March 31, 1993, was version 1.1;
  108. version 2.0 on September 30, 1994, coincided with the release of the
  109. Second Edition of the LAPACK Users' Guide;
  110. version 3.0 on June 30, 1999 coincided with the release of the Third Edition of
  111. the LAPACK Users' Guide;
  112. version 3.1 was released in November 2006;
  113. version 3.1.1 was released in February 2007;
  114. and version 3.2.0 was released in November 2008.
  115. All LAPACK routines reflect the current version number with the date
  116. on the routine indicating when it was last modified.
  117. For more information on revisions in the latest release, please refer
  118. to the \texttt{revisions.info} file in the lapack directory on netlib.
  119. \begin{quote}
  120. \url{http://www.netlib.org/lapack/revisions.info}
  121. \end{quote}
  122. %The distribution \texttt{tar} file \texttt{lapack.tar.z} that is
  123. %available on netlib is always the most up-to-date.
  124. %
  125. %On-line manpages (troff files) for LAPACK driver and computational
  126. %routines, as well as most of the BLAS routines, are available via
  127. %the \texttt{lapack} index on netlib.
  128. \section{File Format}\label{fileformat}
  129. The software for LAPACK is distributed in the form of a
  130. gzipped tar file (via anonymous ftp or the World Wide Web),
  131. which contains the Fortran source for LAPACK,
  132. the Basic Linear Algebra Subprograms
  133. (the Level 1, 2, and 3 BLAS) needed by LAPACK, the testing programs,
  134. and the timing programs\footnotemark[\value{footnote}].
  135. Users who wish to have a non-Unix installation should refer to LAPACK
  136. Working Note 41,
  137. although the overview in Section~\ref{overview} applies to both the Unix and non-Unix
  138. versions.
  139. %Users who wish to have a non-Unix installation should go to Appendix ~\ref{appendixe},
  140. %although the overview in section ~\ref{overview} applies to both the Unix and non-Unix
  141. %versions.
  142. The package may be accessed via the World Wide Web through
  143. the URL address:
  144. \begin{quote}
  145. \url{http://www.netlib.org/lapack/lapack.tgz}
  146. \end{quote}
  147. Or, you can retrieve the file via anonymous ftp at netlib:
  148. \begin{verbatim}
  149. ftp ftp.netlib.org
  150. login: anonymous
  151. password: <your email address>
  152. cd lapack
  153. binary
  154. get lapack.tgz
  155. quit
  156. \end{verbatim}
  157. The software in the \texttt{tar} file
  158. is organized in a number of essential directories as shown
  159. in Figure 1. Please note that this figure does not reflect everything
  160. that is contained in the \texttt{LAPACK} directory. Input and instructional
  161. files are also located at various levels.
  162. \begin{figure}
  163. \vspace{11pt}
  164. \centerline{\includegraphics[width=6.5in,height=3in]{org2.ps}}
  165. \caption{Unix organization of LAPACK 3.0}
  166. \vspace{11pt}
  167. \end{figure}
  168. Libraries are created in the LAPACK directory and
  169. executable files are created in one of the directories BLAS, TESTING,
  170. or TIMING\footnotemark[\value{footnote}]. Input files for the test and
  171. timing\footnotemark[\value{footnote}] programs are also
  172. found in these three directories so that testing may be carried out
  173. in the directories LAPACK/BLAS, LAPACK/TESTING, and LAPACK/TIMING \footnotemark[\value{footnote}].
  174. A top-level makefile in the LAPACK directory is provided to perform the
  175. entire installation procedure.
  176. \section{Overview of Tape Contents}\label{overview}
  177. Most routines in LAPACK occur in four versions: REAL,
  178. DOUBLE PRECISION, COMPLEX, and COMPLEX*16.
  179. The first three versions (REAL, DOUBLE PRECISION, and COMPLEX)
  180. are written in standard Fortran and are completely portable;
  181. the COMPLEX*16 version is provided for
  182. those compilers which allow this data type.
  183. Some routines use features of Fortran 90.
  184. For convenience, we often refer to routines by their single precision
  185. names; the leading `S' can be replaced by a `D' for double precision,
  186. a `C' for complex, or a `Z' for complex*16.
  187. For LAPACK use and testing you must decide which version(s)
  188. of the package you intend to install at your site (for example,
  189. REAL and COMPLEX on a Cray computer or DOUBLE PRECISION and
  190. COMPLEX*16 on an IBM computer).
  191. \subsection{LAPACK Routines}
  192. There are three classes of LAPACK routines:
  193. \begin{itemize}
  194. \item \textbf{driver} routines solve a complete problem, such as solving
  195. a system of linear equations or computing the eigenvalues of a real
  196. symmetric matrix. Users are encouraged to use a driver routine if there
  197. is one that meets their requirements. The driver routines are listed
  198. in LAPACK Working Note 41~\cite{WN41} and the LAPACK Users' Guide~\cite{LUG}.
  199. %in Appendix ~\ref{appendixa}.
  200. \item \textbf{computational} routines, also called simply LAPACK routines,
  201. perform a distinct computational task, such as computing
  202. the $LU$ decomposition of an $m$-by-$n$ matrix or finding the
  203. eigenvalues and eigenvectors of a symmetric tridiagonal matrix using
  204. the $QR$ algorithm.
  205. The LAPACK routines are listed in LAPACK Working Note 41~\cite{WN41}
  206. and the LAPACK Users' Guide~\cite{LUG}.
  207. %The LAPACK routines are listed in Appendix ~\ref{appendixa}; see also LAPACK
  208. %Working Note \#5 \cite{WN5}.
  209. \item \textbf{auxiliary} routines are all the other subroutines called
  210. by the driver routines and computational routines.
  211. %Among them are subroutines to perform subtasks of block algorithms,
  212. %in particular, the unblocked versions of the block algorithms;
  213. %extensions to the BLAS, such as matrix-vector operations involving
  214. %complex symmetric matrices;
  215. %the special routines LSAME and XERBLA which first appeared with the
  216. %BLAS;
  217. %and a number of routines to perform common low-level computations,
  218. %such as computing a matrix norm, generating an elementary Householder
  219. %transformation, and applying a sequence of plane rotations.
  220. %Many of the auxiliary routines may be of use to numerical analysts
  221. %or software developers, so we have documented the Fortran source for
  222. %these routines with the same level of detail used for the LAPACK
  223. %routines and driver routines.
  224. The auxiliary routines are listed in LAPACK Working Note 41~\cite{WN41}
  225. and the LAPACK Users' Guide~\cite{LUG}.
  226. %The auxiliary routines are listed in Appendix ~\ref{appendixb}.
  227. \end{itemize}
  228. \subsection{Level 1, 2, and 3 BLAS}
  229. The BLAS are a set of Basic Linear Algebra Subprograms that perform
  230. vector-vector, matrix-vector, and matrix-matrix operations.
  231. LAPACK is designed around the Level 1, 2, and 3 BLAS, and nearly all
  232. of the parallelism in the LAPACK routines is contained in the BLAS.
  233. Therefore,
  234. the key to getting good performance from LAPACK lies in having an
  235. efficient version of the BLAS optimized for your particular machine.
  236. Optimized BLAS libraries are available for a variety of architectures;
  237. refer to the BLAS FAQ on netlib for further information.
  238. \begin{quote}
  239. \url{http://www.netlib.org/blas/faq.html}
  240. \end{quote}
  241. There are also freely available BLAS generators that automatically
  242. tune a subset of the BLAS for a given architecture, for example ATLAS:
  243. \begin{quote}
  244. \url{http://www.netlib.org/atlas/}
  245. \end{quote}
  246. And, if all else fails, there is the Fortran~77 reference implementation
  247. of the Level 1, 2, and 3 BLAS available on netlib (also included in
  248. the LAPACK distribution tar file).
  249. \begin{quote}
  250. \url{http://www.netlib.org/blas/blas.tgz}
  251. \end{quote}
  252. No matter which BLAS library is used, the BLAS test programs should
  253. always be run.
  254. Users should not expect too much from the Fortran~77 reference implementation
  255. BLAS; these versions were written to define the basic operations and do not
  256. employ the standard tricks for optimizing Fortran code.
  257. The formal definitions of the Level 1, 2, and 3 BLAS
  258. are in \cite{BLAS1}, \cite{BLAS2}, and \cite{BLAS3}.
  259. The BLAS Quick Reference card is available on netlib.
  260. \subsection{Mixed- and Extended-Precision BLAS: XBLAS}
  261. The XBLAS extend the BLAS to work with mixed input and output
  262. precisions as well as using extra precision internally. The XBLAS are
  263. used in the prototype extra-precise iterative refinement codes.
  264. The current release of the XBLAS is available through
  265. Netlib\footnote{Development versions may be available through
  266. \url{http://www.cs.berkeley.edu/~yozo/} or
  267. \url{http://www.nersc.gov/~xiaoye/XBLAS/}.} at
  268. \begin{quote}
  269. \url{http://www.netlib.org/xblas}
  270. \end{quote}
  271. Their formal definition is in \cite{XBLAS}.
  272. \subsection{LAPACK Test Routines}
  273. This release contains two distinct test programs for LAPACK routines
  274. in each data type. One test program tests the routines for solving
  275. linear equations and linear least squares problems,
  276. and the other tests routines for the matrix eigenvalue problem.
  277. The routines for generating test matrices are used by both test
  278. programs and are compiled into a library for use by both test programs.
  279. \subsection{LAPACK Timing Routines (for LAPACK 3.0 and before) }
  280. This release also contains two distinct timing programs for the
  281. LAPACK routines in each data type.
  282. The linear equation timing program gathers performance data in
  283. megaflops on the factor, solve, and inverse routines for solving
  284. linear systems, the routines to generate or apply an orthogonal matrix
  285. given as a sequence of elementary transformations, and the reductions
  286. to bidiagonal, tridiagonal, or Hessenberg form for eigenvalue
  287. computations.
  288. The operation counts used in computing the megaflop rates are computed
  289. from a formula;
  290. see LAPACK Working Note 41~\cite{WN41}.
  291. % see Appendix ~\ref{appendixc}.
  292. The eigenvalue timing program is used with the eigensystem routines
  293. and returns the execution time, number of floating point operations, and
  294. megaflop rate for each of the requested subroutines.
  295. In this program, the number of operations is computed while the
  296. code is executing using special instrumented versions of the LAPACK
  297. subroutines.
  298. \section{Installing LAPACK on a Unix System}\label{installation}
  299. Installing, testing, and timing\footnotemark[\value{footnote}] the Unix version of LAPACK
  300. involves the following steps:
  301. \begin{enumerate}
  302. \item Gunzip and untar the file.
  303. \item Copy the file \texttt{LAPACK/make.inc.example} to \texttt{LAPACK/make.inc} and edit it.
  304. \item Edit the file \texttt{LAPACK/Makefile} and type \texttt{make}.
  305. %\item Test and Install the Machine-Dependent Routines \\
  306. %\emph{(WARNING: You may need to supply a correct version of second.f and
  307. %dsecnd.f for your machine)}
  308. %{\tt
  309. %\begin{list}{}{}
  310. %\item cd LAPACK
  311. %\item make install
  312. %\end{list} }
  313. %
  314. %\item Create the BLAS Library, \emph{if necessary} \\
  315. %\emph{(NOTE: For best performance, it is recommended you use the manufacturers' BLAS)}
  316. %{\tt
  317. %\begin{list}{}{}
  318. %\item \texttt{cd LAPACK}
  319. %\item \texttt{make blaslib}
  320. %\end{list} }
  321. %
  322. %\item Run the Level 1, 2, and 3 BLAS Test Programs
  323. %\begin{list}{}{}
  324. %\item \texttt{cd LAPACK}
  325. %\item \texttt{make blas\_testing}
  326. %\end{list}
  327. %
  328. %\item Create the LAPACK Library
  329. %\begin{list}{}{}
  330. %\item \texttt{cd LAPACK}
  331. %\item \texttt{make lapacklib}
  332. %\end{list}
  333. %
  334. %\item Create the Library of Test Matrix Generators
  335. %\begin{list}{}{}
  336. %\item \texttt{cd LAPACK}
  337. %\item \texttt{make tmglib}
  338. %\end{list}
  339. %
  340. %\item Run the LAPACK Test Programs
  341. %\begin{list}{}{}
  342. %\item \texttt{cd LAPACK}
  343. %\item \texttt{make testing}
  344. %\end{list}
  345. %
  346. %\item Run the LAPACK Timing Programs
  347. %\begin{list}{}{}
  348. %\item \texttt{cd LAPACK}
  349. %\item \texttt{make timing}
  350. %\end{list}
  351. %
  352. %\item Run the BLAS Timing Programs
  353. %\begin{list}{}{}
  354. %\item \texttt{cd LAPACK}
  355. %\item \texttt{make blas\_timing}
  356. %\end{list}
  357. \end{enumerate}
  358. \subsection{Untar the File}
  359. If you received a tar file of LAPACK via the World Wide
  360. Web or anonymous ftp, enter the following command:
  361. \begin{list}{}{}
  362. \item{\texttt{gunzip -c lapack.tgz | tar xvf -}}
  363. \end{list}
  364. \noindent
  365. This will create a top-level directory called \texttt{LAPACK}, which
  366. requires approximately 34 Mbytes of disk space.
  367. The total space requirement, including the object files and executables,
  368. is approximately 100 Mbytes for all four data types.
  369. \subsection{Copy \texttt{LAPACK/make.inc.example} to \texttt{LAPACK/make.inc} and edit it}
  370. Before the libraries can be built, or the testing and timing\footnotemark[\value{footnote}] programs
  371. run, you must define all machine-specific parameters for the
  372. architecture to which you are installing LAPACK. All machine-specific
  373. parameters are contained in the file \texttt{LAPACK/make.inc}.
  374. An example of \texttt{LAPACK/make.inc} for a Linux machine with GNU compilers is given
  375. in \texttt{LAPACK/make.inc.example}. Copy that file to \texttt{LAPACK/make.inc} by entering the following command:
  376. \begin{list}{}{}
  377. \item{\texttt{cp LAPACK/make.inc.example LAPACK/make.inc}}
  378. \end{list}
  379. \noindent
  380. Now modify your \texttt{LAPACK/make.inc} by applying the following recommendations.
  381. The first line of this \texttt{make.inc} file is:
  382. \begin{quote}
  383. SHELL = /bin/sh
  384. \end{quote}
  385. and it will need to be modified to \texttt{SHELL = /sbin/sh} if you are
  386. installing LAPACK on an SGI architecture.
  387. Next, you will need to modify \texttt{FC}, \texttt{FFLAGS},
  388. \texttt{FFLAGS\_DRV}, \texttt{FFLAGS\_NOOPT}, and \texttt{LDFLAGS} to specify
  389. the compiler, compiler options, compiler options for the testing and
  390. timing\footnotemark[\value{footnote}] main programs, and linker options.
  391. Next you will have to choose which function you will use to time in the
  392. \texttt{SECOND} and \texttt{DSECND} routines.
  393. \begin{verbatim}
  394. # Default: SECOND and DSECND will use a call to the
  395. # EXTERNAL FUNCTION ETIME
  396. #TIMER = EXT_ETIME
  397. # For RS6K: SECOND and DSECND will use a call to the
  398. # EXTERNAL FUNCTION ETIME_
  399. #TIMER = EXT_ETIME_
  400. # For gfortran compiler: SECOND and DSECND will use a call to the
  401. # INTERNAL FUNCTION ETIME
  402. TIMER = INT_ETIME
  403. # If your Fortran compiler does not provide etime (like Nag Fortran
  404. # Compiler, etc...) SECOND and DSECND will use a call to the
  405. # INTERNAL FUNCTION CPU_TIME
  406. #TIMER = INT_CPU_TIME
  407. # If none of these work, you can use the NONE value.
  408. # In that case, SECOND and DSECND will always return 0.
  409. #TIMER = NONE
  410. \end{verbatim}
  411. Refer to Section~\ref{second} for more information.
  412. Next, you will need to modify \texttt{AR}, \texttt{ARFLAGS}, and \texttt{RANLIB} to specify archiver,
  413. archiver options, and ranlib for your machine. If your architecture
  414. does not require \texttt{ranlib} to be run after each archive command (as
  415. is the case with CRAY computers running UNICOS, Hewlett Packard
  416. computers running HP-UX, or SUN SPARCstations running Solaris), set
  417. \texttt{RANLIB = echo}. And finally, you must
  418. modify the \texttt{BLASLIB} definition to specify the BLAS library to which
  419. you will be linking. If an optimized version of the BLAS is available
  420. on your machine, we strongly recommend that you link to that library.
  421. Otherwise, by default, \texttt{BLASLIB} is set to the Fortran~77 version.
  422. If you want to enable the XBLAS, define the variable \texttt{USEXBLAS}
  423. to some value, for example \texttt{USEXBLAS = Yes}. Then set the
  424. variable \texttt{XBLASLIB} to point at the XBLAS library. Note that
  425. the prototype iterative refinement routines and their testers will not
  426. be built unless \texttt{USEXBLAS} is defined.
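To make these settings concrete, the following is a minimal sketch of the
relevant part of a \texttt{make.inc} for a Linux machine with GNU compilers.
The variable names are the ones described above; every value shown
(compiler, flags, and library paths) is an assumption chosen for
illustration and should be checked against \texttt{make.inc.example} and
your own system.
\begin{verbatim}
SHELL = /bin/sh

# Compiler, compiler options, and linker options (assumed values)
FC           = gfortran
FFLAGS       = -O2
FFLAGS_DRV   = $(FFLAGS)
FFLAGS_NOOPT = -O0
LDFLAGS      =

# Timer used by SECOND and DSECND (one of the choices listed above)
TIMER = INT_ETIME

# Archiver and ranlib (assumed values)
AR      = ar
ARFLAGS = cr
RANLIB  = ranlib

# Hypothetical path to an optimized BLAS library
BLASLIB = /usr/local/lib/libblas.a

# Uncomment to build the prototype iterative refinement routines;
# the XBLAS path is hypothetical
#USEXBLAS = Yes
#XBLASLIB = /usr/local/lib/libxblas.a
\end{verbatim}
Comments are kept on their own lines because \texttt{make} would otherwise
include the blanks before a trailing \texttt{\#} in the variable's value.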
  427. \textbf{NOTE:} Example \texttt{make.inc} include files are contained in the
  428. \texttt{LAPACK/INSTALL} directory. Please refer to
  429. Appendix~\ref{appendixd} for machine-specific installation hints, and/or
  430. the \texttt{release\_notes} file on \texttt{netlib}.
  431. \begin{quote}
  432. \url{http://www.netlib.org/lapack/release\_notes}
  433. \end{quote}
  434. \subsection{Edit the file \texttt{LAPACK/Makefile}}\label{toplevelmakefile}
  435. This \texttt{Makefile} can be modified to perform as much of the
  436. installation process as the user desires. Ideally, this is the ONLY
  437. makefile the user must modify. However, modification of lower-level
  438. makefiles may be necessary if a specific routine needs to be compiled
  439. with a different level of optimization.
  440. First, edit the definitions of \texttt{blaslib}, \texttt{lapacklib},
  441. \texttt{tmglib}, \texttt{lapack\_testing}, and \texttt{timing}\footnotemark[\value{footnote}] in the file \texttt{LAPACK/Makefile}
  442. to specify the data types desired. For example,
  443. if you only wish to compile the single precision real version of the
  444. LAPACK library, you would modify the \texttt{lapacklib} definition to be:
  445. \begin{verbatim}
  446. lapacklib:
  447. $(MAKE) -C SRC single
  448. \end{verbatim}
  449. Likewise, you could specify \texttt{double}, \texttt{complex}, or \texttt{complex16} to
  450. build the double precision real, single precision complex, or double
  451. precision complex libraries, respectively. By default, running
  452. \texttt{make} with no arguments builds
  453. all four data types.
  454. The make command can be run more than once to add another
  455. data type to the library if necessary.
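As a hedged sketch of the same idea for two data types, the definition
below builds only the double precision real and double precision complex
libraries by invoking the \texttt{SRC} makefile once per data type. It
follows the form of the \texttt{single} example above and relies on the fact
that \texttt{make} can be run more than once to add a data type to the
library; the choice of data types is only an illustration. Note that each
command line in a makefile rule must begin with a tab character.
\begin{verbatim}
lapacklib:
	$(MAKE) -C SRC double
	$(MAKE) -C SRC complex16
\end{verbatim}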
  456. %If you are installing LAPACK on a Silicon Graphics machine, you must
  457. %modify the respective definitions of \texttt{testing} and \texttt{timing} to be
  458. %\begin{verbatim}
  459. %testing:
  460. % ( cd TESTING; $(MAKE) -f Makefile.sgi )
  461. %\end{verbatim}
  462. %and
  463. %\begin{verbatim}
  464. %timing:
  465. % ( cd TIMING; $(MAKE) -f Makefile.sgi )
  466. %\end{verbatim}
  467. Next, if you will be using a locally available BLAS library, you will need
  468. to remove \texttt{blaslib} from the \texttt{lib} definition. And finally,
  469. if you do not wish to build all of the libraries individually and
  470. likewise run all of the testing and timing separately, you can
  471. modify the \texttt{all} definition to specify the amount of the
  472. installation process that you want performed. By default,
  473. the \texttt{all} definition is set to
  474. \begin{verbatim}
  475. all: lapack_install lib lapack_testing blas_testing
  476. \end{verbatim}
  477. which will perform all phases of the installation
  478. process -- testing of machine-dependent routines, building the libraries,
  479. BLAS testing and LAPACK testing.
  480. The entire installation process will then be performed by typing
  481. \texttt{make}.
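For instance, an installation that links to a locally available BLAS
library might use definitions along the following lines. This is only a
sketch: the original contents of the \texttt{lib} definition are an
assumption here, so compare with your own copy of \texttt{LAPACK/Makefile}
before editing it.
\begin{verbatim}
# Assumed original definition (check your own LAPACK/Makefile):
#lib: blaslib lapacklib tmglib
# With a locally available BLAS library, drop blaslib:
lib: lapacklib tmglib

# Still run every phase, including the BLAS tests
all: lapack_install lib lapack_testing blas_testing
\end{verbatim}
The \texttt{blas\_testing} target is kept because the BLAS test programs
should be run no matter which BLAS library is used.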
  482. Questions and/or comments can be directed to the
  483. authors as described in Section~\ref{sendresults}. If test failures
  484. occur, please refer to the appropriate subsection in
  485. Section~\ref{furtherdetails}.
  486. If disk space is limited, we suggest building each data type separately
  487. and/or deleting all object files after building the libraries. Likewise, all
  488. testing and timing executables can be deleted after the testing and timing
  489. process is completed. The removal of all object files and executables
  490. can be accomplished by the following:
  491. \begin{list}{}{}
  492. \item \texttt{cd LAPACK}
  493. \item \texttt{make cleanobj}
  494. \end{list}
  495. \section{Further Details of the Installation Process}\label{furtherdetails}
  496. Alternatively, you can choose to run each of the phases of the
  497. installation process separately. The following sections give details
  498. on how this may be achieved.
  499. \subsection{Test and Install the Machine-Dependent Routines}
  500. There are six machine-dependent functions in the test and timing
  501. package, at least three of which must be installed. They are
  502. \begin{tabbing}
  503. MONOMO \= DOUBLE PRECISION \= \kill
  504. LSAME \> LOGICAL \> Test if two characters are the same regardless of case \\
  505. SLAMCH \> REAL \> Determine machine-dependent parameters \\
  506. DLAMCH \> DOUBLE PRECISION \> Determine machine-dependent parameters \\
  507. SECOND \> REAL \> Return time in seconds from a fixed starting time \\
  508. DSECND \> DOUBLE PRECISION \> Return time in seconds from a fixed starting time\\
  509. ILAENV \> INTEGER \> Checks that NaN and infinity arithmetic are IEEE-754 compliant
  510. \end{tabbing}
  511. \noindent
  512. If you are working only in single precision, you do not need to install
  513. DLAMCH and DSECND, and if you are working only in double precision,
  514. you do not need to install SLAMCH and SECOND.
  515. These six subroutines are provided in \texttt{LAPACK/INSTALL},
  516. along with six test programs.
  517. To compile the six test programs and run the tests, go to \texttt{LAPACK} and
  518. type \texttt{make lapack\_install}. The test programs are called
  519. \texttt{testlsame, testslamch, testdlamch, testsecond, testdsecnd} and
  520. \texttt{testieee}.
  521. If you do not wish to run all tests, you will need to modify the
  522. \texttt{lapack\_install} definition in the \texttt{LAPACK/Makefile} to only include the
  523. tests you wish to run. Otherwise, all tests will be performed.
  524. The expected results of each test program are described below.
  525. \subsubsection{Installing LSAME}
  526. LSAME is a logical function with two character parameters, A and B.
  527. It returns .TRUE. if A and B are the same regardless of case, or .FALSE.
  528. if they are different.
  529. For example, the expression
  530. \begin{list}{}{}
  531. \item \texttt{LSAME( UPLO, 'U' )}
  532. \end{list}
  533. \noindent
  534. is equivalent to
  535. \begin{list}{}{}
  536. \item \texttt{( UPLO.EQ.'U' ).OR.( UPLO.EQ.'u' )}
  537. \end{list}
  538. The test program in \texttt{lsametst.f} tests all combinations of
  539. the same character in upper and lower case for A and B, and two
  540. cases where A and B are different characters.
  541. Run the test program by typing \texttt{testlsame}.
  542. If LSAME works correctly, the only message you should see after the
  543. execution of \texttt{testlsame} is
  544. \begin{verbatim}
  545. ASCII character set
  546. Tests completed
  547. \end{verbatim}
  548. The file \texttt{lsame.f} is automatically copied to
  549. \texttt{LAPACK/BLAS/SRC/} and \texttt{LAPACK/SRC/}.
  550. The function LSAME is needed by both the BLAS and LAPACK, so it is safer
  551. to have it in both libraries as long as this does not cause trouble
  552. in the link phase when both libraries are used.
  553. \subsubsection{Installing SLAMCH and DLAMCH}
  554. SLAMCH and DLAMCH are real functions with a single character parameter
  555. that indicates the machine parameter to be returned. The test
  556. program in \texttt{slamchtst.f}
  557. simply prints out the different values computed by SLAMCH,
  558. so you need to know something about what the values should be.
  559. For example, the output of the test program executable \texttt{testslamch}
  560. for SLAMCH on a Sun SPARCstation is
  561. \begin{verbatim}
  562. Epsilon = 5.96046E-08
  563. Safe minimum = 1.17549E-38
  564. Base = 2.00000
  565. Precision = 1.19209E-07
  566. Number of digits in mantissa = 24.0000
  567. Rounding mode = 1.00000
  568. Minimum exponent = -125.000
  569. Underflow threshold = 1.17549E-38
  570. Largest exponent = 128.000
  571. Overflow threshold = 3.40282E+38
  572. Reciprocal of safe minimum = 8.50706E+37
  573. \end{verbatim}
  574. On a Cray machine, the safe minimum underflows its output
  575. representation and the overflow threshold overflows its output
  576. representation, so the safe minimum is printed as 0.00000 and overflow
  577. is printed as R. This is normal.
  578. If you would prefer to print a representable number, you can modify
  579. the test program to print SFMIN*100. and RMAX/100. for the safe
  580. minimum and overflow thresholds.
  581. Likewise, the test executable \texttt{testdlamch} is run for DLAMCH.
  582. If both tests were successful, go to Section~\ref{second}.
  583. If SLAMCH (or DLAMCH) returns an invalid value, you will have to create
  584. your own version of this function. The following options are used in
  585. LAPACK and must be set:
  586. \begin{list}{}{}
  587. \item {`B': } Base of the machine
  588. \item {`E': } Epsilon (relative machine precision)
  589. \item {`O': } Overflow threshold
  590. \item {`P': } Precision = Epsilon*Base
  591. \item {`S': } Safe minimum (often same as underflow threshold)
  592. \item {`U': } Underflow threshold
  593. \end{list}
  594. Some people may be familiar with R1MACH (D1MACH), a primitive
  595. routine for setting machine parameters in which the user must
  596. comment out the appropriate assignment statements for the target
  597. machine. If a version of R1MACH is on hand, the assignments in
  598. SLAMCH can be made to refer to R1MACH using the correspondence
  599. \begin{list}{}{}
  600. \item {SLAMCH( `U' )} $=$ R1MACH( 1 )
  601. \item {SLAMCH( `O' )} $=$ R1MACH( 2 )
  602. \item {SLAMCH( `E' )} $=$ R1MACH( 3 )
  603. \item {SLAMCH( `B' )} $=$ R1MACH( 5 )
  604. \end{list}
  605. \noindent
The safe minimum returned by SLAMCH( `S' ) is initially set to the
  607. underflow value, but if $1/(\mathrm{overflow}) \geq (\mathrm{underflow})$
  608. it is recomputed as $(1/(\mathrm{overflow})) * ( 1 + \varepsilon )$,
  609. where $\varepsilon$ is the machine precision.
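As a worked check, assuming the IEEE single precision values printed in the
SLAMCH test output above, the recomputation is not triggered:
\[
\frac{1}{\mathrm{overflow}} = \frac{1}{3.40282 \times 10^{38}}
\approx 2.93874 \times 10^{-39}
< 1.17549 \times 10^{-38} = \mathrm{underflow},
\]
so the safe minimum stays equal to the underflow threshold. Its reciprocal is
$1/(1.17549 \times 10^{-38}) \approx 8.507 \times 10^{37}$, consistent with the
printed value 8.50706E+37.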
  610. BE AWARE that the initial call to SLAMCH or DLAMCH is expensive.
  611. We suggest that installers run it once, save the results, and hard-code
  612. the constants in the version they put in their library.
  613. \subsubsection{Installing SECOND and DSECND}\label{second}
  614. Both the timing routines\footnotemark[\value{footnote}] and the test routines call SECOND
  615. (DSECND), a real function with no arguments that returns the time
  616. in seconds from some fixed starting time.
  617. Our version of this routine
  618. returns only ``user time'', and not ``user time $+$ system time''.
The versions of SECOND in \texttt{second\_EXT\_ETIME.f} and \texttt{second\_INT\_ETIME.f} call
  620. ETIME, a Fortran library routine available on some computer systems.
  621. If ETIME is not available or a better local timing function exists,
  622. you will have to provide the correct interface to SECOND and DSECND
  623. on your machine.
Since LAPACK 3.1.1, we provide five different flavours of the SECOND and DSECND routines.
The version that is used depends on the value of the TIMER variable in \texttt{make.inc}:
  626. \begin{itemize}
  627. \item If ETIME is available as an external function, set the value of the TIMER variable in your
  628. make.inc to \texttt{EXT\_ETIME}: \texttt{second\_EXT\_ETIME.f} and \texttt{dsecnd\_EXT\_ETIME.f} will be used.
  629. Usually on HPPA architectures,
  630. the compiler and linker flag \texttt{+U77} should be included to access
  631. the function \texttt{ETIME}.
  632. \item If ETIME\_ is available as an external function, set the value of the TIMER variable in your make.inc
  633. to \texttt{EXT\_ETIME\_}: \texttt{second\_EXT\_ETIME\_.f} and \texttt{dsecnd\_EXT\_ETIME\_.f} will be used.
This is the case on some IBM architectures, such as the IBM RS/6000.
  635. \item If ETIME is available as an internal function, set the value of the TIMER variable in your make.inc
  636. to \texttt{INT\_ETIME}: \texttt{second\_INT\_ETIME.f} and \texttt{dsecnd\_INT\_ETIME.f} will be used.
This is the case with gfortran.
  638. \item If CPU\_TIME is available as an internal function, set the value of the TIMER variable in your make.inc
  639. to \texttt{INT\_CPU\_TIME}: \texttt{second\_INT\_CPU\_TIME.f} and \texttt{dsecnd\_INT\_CPU\_TIME.f} will be used.
\item If none of these functions is available, set the value of the TIMER variable in your make.inc
  641. to \texttt{NONE}: \texttt{second\_NONE.f} and \texttt{dsecnd\_NONE.f} will be used.
  642. These routines will always return zero.
  643. \end{itemize}
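The naming convention in the list above can be summarized by a small shell
sketch. The function name \texttt{timer\_sources} is hypothetical and is not
part of the LAPACK distribution; it only prints the source file names implied
by a given value of TIMER.
\begin{verbatim}
# Hypothetical helper (not part of LAPACK): print the SECOND and DSECND
# source files that correspond to a given value of TIMER in make.inc.
timer_sources() {
    case "$1" in
        EXT_ETIME|EXT_ETIME_|INT_ETIME|INT_CPU_TIME|NONE)
            echo "second_$1.f dsecnd_$1.f" ;;
        *)
            echo "unknown TIMER value: $1" >&2
            return 1 ;;
    esac
}

timer_sources INT_ETIME
\end{verbatim}
Any value other than the five listed above is rejected with a nonzero return status.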
  644. The test program in \texttt{secondtst.f}
  645. performs a million operations using 5000 iterations of
  646. the SAXPY operation $y := y + \alpha x$ on a vector of length 100.
The total time and megaflop rate for this test are reported, then
  648. the operation is repeated including a call to SECOND on each of
  649. the 5000 iterations to determine the overhead due to calling SECOND.
  650. The test program executable is called \texttt{testsecond} (or \texttt{testdsecnd}).
  651. There is no single right answer, but the times
in seconds should be positive and the megaflop rates should be
  653. appropriate for your machine.
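As a rough check of the reported figures (a sketch, assuming each element of
the SAXPY update costs one multiplication and one addition), the operation
count is
\[
5000 \times 100 \times 2 = 10^{6} \mbox{ floating-point operations},
\]
so a total time of $t$ seconds corresponds to $10^{6}/(10^{6}\,t) = 1/t$
megaflops. For example, a hypothetical time of $t = 0.01$ seconds would
correspond to 100 megaflops.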
  654. \subsubsection{Testing IEEE arithmetic and ILAENV}\label{testieee}
  655. %\textbf{If you are installing LAPACK on a non-IEEE machine, you MUST
  656. %modify ILAENV! Otherwise, ILAENV will crash . By default, ILAENV
  657. %assumes an IEEE machine, and does a test for IEEE-754 compliance.}
  658. As some new routines in LAPACK rely on IEEE-754 compliance,
  659. two settings (\texttt{ISPEC=10} and \texttt{ISPEC=11}) have been added to ILAENV
  660. (\texttt{LAPACK/SRC/ilaenv.f}) to denote IEEE-754 compliance for NaN and
  661. infinity arithmetic, respectively. By default, ILAENV assumes an IEEE
  662. machine, and does a test for IEEE-754 compliance. \textbf{NOTE: If you
  663. are installing LAPACK on a non-IEEE machine, you MUST modify ILAENV,
  664. as this test inside ILAENV will crash!}
  665. If \texttt{ILAENV( 10, $\ldots$ )} or \texttt{ILAENV( 11, $\ldots$ )} is
  666. issued, then \texttt{ILAENV=1} is returned to signal IEEE-754 compliance,
  667. and \texttt{ILAENV=0} if the architecture is non-IEEE-754 compliant.
Thus, for non-IEEE machines, the user must hard-code the setting
\texttt{ILAENV=0} for \texttt{ISPEC=10} and \texttt{ISPEC=11} in the version
of \texttt{LAPACK/SRC/ilaenv.f} to be put in
the library. There are also specialized testing and timing\footnotemark[\value{footnote}] versions of
ILAENV that will need to be modified:
  673. \begin{itemize}
  674. \item Testing/timing version of \texttt{LAPACK/TESTING/LIN/ilaenv.f}
  675. \item Testing/timing version of \texttt{LAPACK/TESTING/EIG/ilaenv.f}
  676. \item Testing/timing version of \texttt{LAPACK/TIMING/LIN/ilaenv.f}
  677. \item Testing/timing version of \texttt{LAPACK/TIMING/EIG/ilaenv.f}
  678. \end{itemize}
  679. %Some new routines in LAPACK rely on IEEE-754 compliance, and if non-compliance
  680. %is detected (via a call to the function ILAENV), alternative (slower)
  681. %algorithms will be chosen.
  682. %For further details, refer to the leading comments of routines such
  683. %as \texttt{LAPACK/SRC/sstevr.f}.
  684. The test program in \texttt{LAPACK/INSTALL/tstiee.f} checks an installation
  685. architecture
  686. to see if infinity arithmetic and NaN arithmetic are IEEE-754 compliant.
  687. A warning message to the user is printed if non-compliance is detected.
  688. This same test is performed inside the function ILAENV. If
  689. \texttt{ILAENV( 10, $\ldots$ )} or \texttt{ILAENV( 11, $\ldots$ )} is
  690. issued, then \texttt{ILAENV=1} is returned to signal IEEE-754 compliance,
  691. and \texttt{ILAENV=0} if the architecture is non-IEEE-754 compliant.
  692. To avoid this IEEE test being run every time you call
  693. \texttt{ILAENV( 10, $\ldots$)} or \texttt{ILAENV( 11, $\ldots$ )}, we suggest
that the user hard-code the setting of
\texttt{ILAENV=1} or \texttt{ILAENV=0} in the version of \texttt{LAPACK/SRC/ilaenv.f} to be put in
the library. As mentioned above, there are also specialized testing and
timing\footnotemark[\value{footnote}] versions of ILAENV that will also need to be modified.
  698. \subsection{Create the BLAS Library}
  699. Ideally, a highly optimized version of the BLAS library already
  700. exists on your machine.
  701. In this case you can go directly to Section~\ref{testblas} to
  702. make the BLAS test programs.
  703. \begin{itemize}
  704. \item[a)]
  705. Go to \texttt{LAPACK} and edit the definition of \texttt{blaslib} in the
  706. file \texttt{Makefile} to specify the data types desired, as in the example
  707. in Section~\ref{toplevelmakefile}.
  708. If you already have some of the BLAS, you will need to edit the file
  709. \texttt{LAPACK/BLAS/SRC/Makefile} to comment out the lines
  710. defining the BLAS you have.
  711. \item[b)]
  712. Type \texttt{make blaslib}.
  713. The make command can be run more than once to add another
  714. data type to the library if necessary.
  715. \end{itemize}
  716. \noindent
  717. The BLAS library is created in \texttt{LAPACK/librefblas.a},
  718. or in the user-defined location specified by \texttt{BLASLIB} in the file
  719. \texttt{LAPACK/make.inc}.
  720. \subsection{Run the BLAS Test Programs}\label{testblas}
  721. Test programs for the Level 1, 2, and 3 BLAS are in the directory
  722. \texttt{LAPACK/BLAS/TESTING}.
  723. To compile and run the Level 1, 2, and 3 BLAS test programs,
  724. go to \texttt{LAPACK} and type \texttt{make blas\_testing}. The executable
  725. files are called \texttt{xblat\_s}, \texttt{xblat\_d}, \texttt{xblat\_c}, and
  726. \texttt{xblat\_z}, where the \_ (underscore) is replaced by 1, 2, or 3,
  727. depending upon the level of BLAS that it is testing. All executable and
  728. output files are created in \texttt{LAPACK/BLAS/}.
  729. For the Level 1 BLAS tests, the output file names are \texttt{sblat1.out},
  730. \texttt{dblat1.out}, \texttt{cblat1.out}, and \texttt{zblat1.out}. For the Level
  731. 2 and 3 BLAS, the name of the output file is indicated on the first line of the
  732. input file and is currently defined to be \texttt{sblat2.out} for
  733. the Level 2 REAL version, and \texttt{sblat3.out} for the Level 3 REAL
  734. version, with similar names for the other data types.
  735. If the tests using the supplied data files were completed successfully,
  736. consider whether the tests were sufficiently thorough.
  737. For example, on a machine with vector registers, at least one value
  738. of $N$ greater than the length of the vector registers should be used;
  739. otherwise, important parts of the compiled code may not be
  740. exercised by the tests.
  741. If the tests were not successful, either because the program did not
  742. finish or the test ratios did not pass the threshold, you will
  743. probably have to find and correct the problem before continuing.
  744. If you have been testing a system-specific
  745. BLAS library, try using the Fortran BLAS for the routines that
  746. did not pass the tests.
  747. For more details on the BLAS test programs,
  748. see \cite{BLAS2-test} and \cite{BLAS3-test}.
  749. \subsection{Create the LAPACK Library}
  750. \begin{itemize}
  751. \item[a)]
  752. Go to the directory \texttt{LAPACK} and edit the definition of
  753. \texttt{lapacklib} in the file \texttt{Makefile} to specify the data types desired,
  754. as in the example in Section~\ref{toplevelmakefile}.
  755. \item[b)]
  756. Type \texttt{make lapacklib}.
  757. The make command can be run more than once to add another
  758. data type to the library if necessary.
  759. \end{itemize}
  760. \noindent
  761. The LAPACK library is created in \texttt{LAPACK/liblapack.a},
  762. or in the user-defined location specified by \texttt{LAPACKLIB} in the file
  763. \texttt{LAPACK/make.inc}.
  764. \subsection{Create the Test Matrix Generator Library}
  765. \begin{itemize}
  766. \item[a)]
  767. Go to the directory \texttt{LAPACK} and edit the definition of \texttt{tmglib}
  768. in the file \texttt{Makefile} to specify the data types desired, as in the
  769. example in Section~\ref{toplevelmakefile}.
  770. \item[b)]
  771. Type \texttt{make tmglib}.
  772. The make command can be run more than once to add another
  773. data type to the library if necessary.
  774. \end{itemize}
  775. \noindent
  776. The test matrix generator library is created in \texttt{LAPACK/libtmglib.a},
  777. or in the user-defined location specified by \texttt{TMGLIB} in the file
  778. \texttt{LAPACK/make.inc}.
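The library-building steps of the preceding sections can be sketched as a
single shell function. This is only an illustration: the function name
\texttt{build\_libraries} and the use of a \texttt{MAKE} environment variable
are our assumptions, the function is meant to be run from the \texttt{LAPACK}
directory after the top-level \texttt{Makefile} has been edited for the desired
data types, and the \texttt{blaslib} target would be dropped if an optimized
BLAS library is already installed.
\begin{verbatim}
# Sketch (hypothetical helper): run the make targets described above
# in order, stopping at the first failure.
# MAKE may be set to choose the make command; it defaults to "make".
build_libraries() {
    for target in blaslib blas_testing lapacklib tmglib; do
        "${MAKE:-make}" "$target" || return 1
    done
}

# Usage, from the LAPACK directory:
#   build_libraries
\end{verbatim}
Setting \texttt{MAKE=echo} prints the target names without building anything,
which is a convenient way to check the order.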
  779. \subsection{Run the LAPACK Test Programs}
  780. There are two distinct test programs for LAPACK routines
  781. in each data type, one for the linear equation routines and
  782. one for the eigensystem routines.
  783. In each data type, there is one input file for testing the linear
  784. equation routines and eighteen input files for testing the eigenvalue
  785. routines.
  786. The input files reside in \texttt{LAPACK/TESTING}.
  787. For more information on the test programs and how to modify the
  788. input files, please refer to LAPACK Working Note 41~\cite{WN41}.
  789. % see Section~\ref{moretesting}.
  790. If you do not wish to run each of the tests individually, you can
  791. go to \texttt{LAPACK}, edit the definition \texttt{lapack\_testing} in the file
  792. \texttt{Makefile} to specify the data types desired, and type \texttt{make
  793. lapack\_testing}. This will
compile and run the tests as described in Sections~\ref{testlin}
and~\ref{testeig}.
  796. %If you are installing LAPACK on a Silicon Graphics machine, you must
  797. %modify the definition of \texttt{testing} to be
  798. %\begin{verbatim}
  799. %testing:
  800. % ( cd TESTING; $(MAKE) -f Makefile.sgi )
  801. %\end{verbatim}
  802. \subsubsection{Testing the Linear Equations Routines}\label{testlin}
  803. \begin{itemize}
  804. \item[a)]
  805. Go to \texttt{LAPACK/TESTING/LIN} and type \texttt{make} followed by the data types
  806. desired. The executable files are called \texttt{xlintsts, xlintstc,
  807. xlintstd}, or \texttt{xlintstz} and are created in \texttt{LAPACK/TESTING}.
  808. \item[b)]
  809. Go to \texttt{LAPACK/TESTING} and run the tests for each data type.
  810. For the REAL version, the command is
  811. \begin{list}{}{}
  812. \item{} \texttt{xlintsts < stest.in > stest.out}
  813. \end{list}
  814. \noindent
  815. The tests using \texttt{xlintstd}, \texttt{xlintstc}, and \texttt{xlintstz} are similar
  816. with the leading `s' in the input and output file names replaced
  817. by `d', `c', or `z'.
  818. \end{itemize}
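The four test runs in step b) can be generated with a short shell sketch. The
function name \texttt{lin\_test\_commands} is hypothetical; it only prints the
commands, following the substitution rule for the leading `s' stated above.
\begin{verbatim}
# Hypothetical helper: print the linear equation test command
# for each data type (s, d, c, z).
lin_test_commands() {
    for t in s d c z; do
        echo "xlintst$t < ${t}test.in > ${t}test.out"
    done
}

lin_test_commands
\end{verbatim}
To run the tests rather than list them, the output could be piped to
\texttt{sh} from within \texttt{LAPACK/TESTING}, provided the executables can
be found on the command search path.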
  819. If you encountered failures in this phase of the testing process, please
  820. refer to Section~\ref{sendresults}.
  821. \subsubsection{Testing the Eigensystem Routines}\label{testeig}
  822. \begin{itemize}
  823. \item[a)]
  824. Go to \texttt{LAPACK/TESTING/EIG} and type \texttt{make} followed by the data types
  825. desired. The executable files are called \texttt{xeigtsts,
  826. xeigtstc, xeigtstd}, and \texttt{xeigtstz} and are created
  827. in \texttt{LAPACK/TESTING}.
  828. \item[b)]
  829. Go to \texttt{LAPACK/TESTING} and run the tests for each data type.
  830. The tests for the eigensystem routines use eighteen separate input files
  831. for testing the nonsymmetric eigenvalue problem,
  832. the symmetric eigenvalue problem, the banded symmetric eigenvalue
  833. problem, the generalized symmetric eigenvalue
  834. problem, the generalized nonsymmetric eigenvalue problem, the
  835. singular value decomposition, the banded singular value decomposition,
  836. the generalized singular value
  837. decomposition, the generalized QR and RQ factorizations, the generalized
  838. linear regression model, and the constrained linear least squares
  839. problem.
  840. The tests for the REAL version are as follows:
  841. \begin{list}{}{}
  842. \item \texttt{xeigtsts < nep.in > snep.out}
  843. \item \texttt{xeigtsts < sep.in > ssep.out}
  844. \item \texttt{xeigtsts < svd.in > ssvd.out}
  845. \item \texttt{xeigtsts < sec.in > sec.out}
  846. \item \texttt{xeigtsts < sed.in > sed.out}
  847. \item \texttt{xeigtsts < sgg.in > sgg.out}
  848. \item \texttt{xeigtsts < sgd.in > sgd.out}
  849. \item \texttt{xeigtsts < ssg.in > ssg.out}
  850. \item \texttt{xeigtsts < ssb.in > ssb.out}
  851. \item \texttt{xeigtsts < sbb.in > sbb.out}
  852. \item \texttt{xeigtsts < sbal.in > sbal.out}
  853. \item \texttt{xeigtsts < sbak.in > sbak.out}
  854. \item \texttt{xeigtsts < sgbal.in > sgbal.out}
  855. \item \texttt{xeigtsts < sgbak.in > sgbak.out}
  856. \item \texttt{xeigtsts < glm.in > sglm.out}
  857. \item \texttt{xeigtsts < gqr.in > sgqr.out}
  858. \item \texttt{xeigtsts < gsv.in > sgsv.out}
  859. \item \texttt{xeigtsts < lse.in > slse.out}
  860. \end{list}
  861. The tests using \texttt{xeigtstc}, \texttt{xeigtstd}, and \texttt{xeigtstz} also
  862. use the input files \texttt{nep.in}, \texttt{sep.in}, \texttt{svd.in},
  863. \texttt{glm.in}, \texttt{gqr.in}, \texttt{gsv.in}, and \texttt{lse.in},
  864. but the leading `s' in the other input file names must be changed
  865. to `c', `d', or `z'.
  866. \end{itemize}
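The eighteen runs for one data type can be listed with a shell sketch such as
the following. The function name \texttt{eig\_test\_commands} is hypothetical,
and the output file names for the data types other than REAL are our
assumption, formed by analogy with the REAL names above.
\begin{verbatim}
# Hypothetical helper: print the eighteen eigensystem test commands
# for one data type, given as the first argument (s, d, c, or z).
eig_test_commands() {
    t=$1
    # Input files shared by all data types.
    for name in nep sep svd glm gqr gsv lse; do
        echo "xeigtst$t < $name.in > $t$name.out"
    done
    # Input files whose leading letter depends on the data type.
    for name in ec ed gg gd sg sb bb bal bak gbal gbak; do
        echo "xeigtst$t < $t$name.in > $t$name.out"
    done
}

eig_test_commands d
\end{verbatim}
The commands are printed in a different order from the list above, which does
not matter because the runs are independent of one another.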
  867. If you encountered failures in this phase of the testing process, please
  868. refer to Section~\ref{sendresults}.
  869. \subsection{Run the LAPACK Timing Programs (For LAPACK 3.0 and before)}
  870. There are two distinct timing programs for LAPACK routines
  871. in each data type, one for the linear equation routines and
  872. one for the eigensystem routines. The timing program for the
  873. linear equation routines is also used to time the BLAS.
  874. We encourage you to conduct these timing experiments
  875. in REAL and COMPLEX or in DOUBLE PRECISION and COMPLEX*16; it is
  876. not necessary to send timing results in all four data types.
  877. Two sets of input files are provided, a small set and a large set.
  878. The small data sets are appropriate for a standard workstation or
  879. other non-vector machine.
  880. The large data sets are appropriate for supercomputers, vector
  881. computers, and high-performance workstations.
  882. We are mainly interested in results from the large data sets, and
  883. it is not necessary to run both the large and small sets.
  884. The values of N in the large data sets are about five times larger
  885. than those in the small data set,
  886. and the large data sets use additional values for parameters such as the
  887. block size NB and the leading array dimension LDA.
The names of the small data sets end in \texttt{\_small}, as in
\texttt{stime\_small.in}, and the names of the large data sets end in \texttt{\_large},
as in \texttt{stime\_large.in}.
  891. Except as noted, the leading `s' in the input file name must be
  892. replaced by `d', `c', or `z' for the other data types.
  893. We encourage you to obtain timing results with the large data sets,
  894. as this allows us to compare different machines.
  895. If this would take too much time, suggestions for paring back the large
  896. data sets are given in the instructions below.
  897. We also encourage you to experiment with these timing
  898. programs and send us any interesting results, such as results for
  899. larger problems or for a wider range of block sizes.
  900. The main programs are dimensioned for the large data sets,
  901. so the parameters in the main program may have to be reduced in order
  902. to run the small data sets on a small machine, or increased to run
  903. experiments with larger problems.
  904. The minimum time each subroutine will be timed is set to 0.0 in
  905. the large data files and to 0.05 in the small data files, and on
  906. many machines this value should be increased.
  907. If the timing interval is not long
  908. enough, the time for the subroutine after subtracting the overhead
  909. may be very small or zero, resulting in megaflop rates that are
  910. very large or zero. (To avoid division by zero, the megaflop rate is
  911. set to zero if the time is less than or equal to zero.)
  912. The minimum time that should be used depends on the machine and the
  913. resolution of the clock.
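In symbols (a sketch, where $\mathrm{ops}$ and $t$ are illustrative names for
the operation count and for the measured time in seconds after subtracting
the overhead), the reported megaflop rate is
\[
\mathrm{rate} = \left\{ \begin{array}{ll}
\mathrm{ops}/(10^{6}\,t) & \mbox{if } t > 0, \\
0 & \mbox{if } t \leq 0.
\end{array} \right.
\]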
  914. For more information on the timing programs and how to modify the
  915. input files, please refer to LAPACK Working Note 41~\cite{WN41}.
  916. % see Section~\ref{moretiming}.
  917. If you do not wish to run each of the timings individually, you can
  918. go to \texttt{LAPACK}, edit the definition \texttt{lapack\_timing} in the file
  919. \texttt{Makefile} to specify the data types desired, and type \texttt{make
  920. lapack\_timing}. This will compile
and run the timings for the linear equation routines and the eigensystem
routines (see Sections~\ref{timelin} and~\ref{timeeig}).
  923. %If you are installing LAPACK on a Silicon Graphics machine, you must
  924. %modify the definition of \texttt{timing} to be
  925. %\begin{verbatim}
  926. %timing:
  927. % ( cd TIMING; $(MAKE) -f Makefile.sgi )
  928. %\end{verbatim}
  929. If you encounter failures in any phase of the timing process, please
  930. feel free to contact the authors as directed in Section~\ref{sendresults}.
  931. Tell us the
  932. type of machine on which the tests were run, the version of the operating
  933. system, the compiler and compiler options that were used,
  934. and details of the BLAS library or libraries that you used. You should
  935. also include a copy of the output file in which the failure occurs.
Please note that the BLAS
timing runs will still need to be run as instructed in Section~\ref{timeblas}.
  938. \subsubsection{Timing the Linear Equations Routines}\label{timelin}
  939. The linear equation timing program is found in \texttt{LAPACK/TIMING/LIN}
  940. and the input files are in \texttt{LAPACK/TIMING}.
  941. Three input files are provided in each data type for timing the
  942. linear equation routines, one for square matrices, one for band
  943. matrices, and one for rectangular matrices. The small data sets for the REAL version
  944. are \texttt{stime\_small.in}, \texttt{sband\_small.in}, and \texttt{stime2\_small.in}, respectively,
  945. and the large data sets are
  946. \texttt{stime\_large.in}, \texttt{sband\_large.in}, and \texttt{stime2\_large.in}.
  947. The timing program for the least squares routines uses special instrumented
  948. versions of the LAPACK routines to time individual sections of the code.
  949. The first step in compiling the timing program is therefore to make a library
  950. of the instrumented routines.
  951. \begin{itemize}
  952. \item[a)]
  953. \begin{sloppypar}
  954. To make a library of the instrumented LAPACK routines, first
  955. go to \texttt{LAPACK/TIMING/LIN/LINSRC} and type \texttt{make} followed
  956. by the data types desired, as in the examples of Section~\ref{toplevelmakefile}.
  957. The library of instrumented code is created in
  958. \texttt{LAPACK/TIMING/LIN/linsrc.a}.
  959. \end{sloppypar}
  960. \item[b)]
  961. To make the linear equation timing programs,
  962. go to \texttt{LAPACK/TIMING/LIN} and type \texttt{make} followed by the data
  963. types desired, as in the examples in Section~\ref{toplevelmakefile}.
  964. The executable files are called \texttt{xlintims},
  965. \texttt{xlintimc}, \texttt{xlintimd}, and \texttt{xlintimz} and are created
  966. in \texttt{LAPACK/TIMING}.
  967. \item[c)]
  968. Go to \texttt{LAPACK/TIMING} and
  969. make any necessary modifications to the input files.
  970. You may need to set the minimum time a subroutine will
  971. be timed to a positive value, or to restrict the size of the tests
  972. if you are using a computer with performance in between that of a
  973. workstation and that of a supercomputer.
  974. The computational requirements can be cut in half by using only one
  975. value of LDA.
  976. If it is necessary to also reduce the matrix sizes or the values of
  977. the blocksize, corresponding changes should be made to the
  978. BLAS input files (see Section~\ref{timeblas}).
  979. \item[d)]
  980. Run the programs for each data type you are using.
  981. For the REAL version, the commands for the small data sets are
  982. \begin{list}{}{}
  983. \item{} \texttt{xlintims < stime\_small.in > stime\_small.out }
  984. \item{} \texttt{xlintims < sband\_small.in > sband\_small.out }
  985. \item{} \texttt{xlintims < stime2\_small.in > stime2\_small.out }
  986. \end{list}
  987. or the commands for the large data sets are
  988. \begin{list}{}{}
  989. \item{} \texttt{xlintims < stime\_large.in > stime\_large.out }
  990. \item{} \texttt{xlintims < sband\_large.in > sband\_large.out }
  991. \item{} \texttt{xlintims < stime2\_large.in > stime2\_large.out }
  992. \end{list}
  993. \noindent
  994. Similar commands should be used for the other data types.
  995. \end{itemize}
  996. \subsubsection{Timing the BLAS}\label{timeblas}
  997. The linear equation timing program is also used to time the BLAS.
  998. Three input files are provided in each data type for timing the Level
  999. 2 and 3 BLAS.
  1000. These input files time the BLAS using the matrix shapes encountered
  1001. in the LAPACK routines, and we will use the results to analyze the
  1002. performance of the LAPACK routines.
  1003. For the REAL version, the small data files are
  1004. \texttt{sblasa\_small.in}, \texttt{sblasb\_small.in}, and \texttt{sblasc\_small.in}
  1005. and the large data files are
  1006. \texttt{sblasa\_large.in}, \texttt{sblasb\_large.in}, and \texttt{sblasc\_large.in}.
  1007. There are three sets of inputs because there are three
  1008. parameters in the Level 3 BLAS, M, N, and K, and
  1009. in most applications one of these parameters is small (on the order
  1010. of the blocksize) while the other two are large (on the order of the
  1011. matrix size).
  1012. In \texttt{sblasa\_small.in}, M and N are large but K is
  1013. small, while in \texttt{sblasb\_small.in} the small parameter is M, and
  1014. in \texttt{sblasc\_small.in} the small parameter is N.
  1015. The Level 2 BLAS are timed only in the first data set, where K
  1016. is also used as the bandwidth for the banded routines.
  1017. \begin{itemize}
  1018. \item[a)]
  1019. Go to \texttt{LAPACK/TIMING} and
  1020. make any necessary modifications to the input files.
  1021. You may need to set the minimum time a subroutine will
  1022. be timed to a positive value.
  1023. If you modified the values of N or NB
  1024. in Section~\ref{timelin}, set M, N, and K accordingly.
  1025. The large parameters among M, N, and K
  1026. should be the same as the matrix sizes used in timing the linear
  1027. equation routines,
  1028. and the small parameter should be the same as the
  1029. blocksizes used in timing the linear equation routines.
  1030. If necessary, the large data set can be simplified by using only one
  1031. value of LDA.
  1032. \item[b)]
  1033. Run the programs for each data type you are using.
  1034. For the REAL version, the commands for the small data sets are
  1035. \begin{list}{}{}
  1036. \item{} \texttt{xlintims < sblasa\_small.in > sblasa\_small.out }
  1037. \item{} \texttt{xlintims < sblasb\_small.in > sblasb\_small.out }
  1038. \item{} \texttt{xlintims < sblasc\_small.in > sblasc\_small.out }
  1039. \end{list}
  1040. or the commands for the large data sets are
  1041. \begin{list}{}{}
  1042. \item{} \texttt{xlintims < sblasa\_large.in > sblasa\_large.out }
  1043. \item{} \texttt{xlintims < sblasb\_large.in > sblasb\_large.out }
  1044. \item{} \texttt{xlintims < sblasc\_large.in > sblasc\_large.out }
  1045. \end{list}
  1046. \noindent
  1047. Similar commands should be used for the other data types.
  1048. \end{itemize}
  1049. \subsubsection{Timing the Eigensystem Routines}\label{timeeig}
  1050. The eigensystem timing program is found in \texttt{LAPACK/TIMING/EIG}
  1051. and the input files are in \texttt{LAPACK/TIMING}.
  1052. Four input files are provided in each data type for timing the
  1053. eigensystem routines,
  1054. one for the generalized nonsymmetric eigenvalue problem,
  1055. one for the nonsymmetric eigenvalue problem,
  1056. one for the symmetric and generalized symmetric eigenvalue problem,
  1057. and one for the singular value decomposition.
  1058. For the REAL version, the small data sets are called \texttt{sgeptim\_small.in},
\texttt{sneptim\_small.in}, \texttt{sseptim\_small.in}, and \texttt{ssvdtim\_small.in}, respectively,
  1060. and the large data sets are called \texttt{sgeptim\_large.in}, \texttt{sneptim\_large.in},
  1061. \texttt{sseptim\_large.in}, and \texttt{ssvdtim\_large.in}.
  1062. Each of the four input files reads a different set of parameters,
  1063. and the format of the input is indicated by a 3-character code
  1064. on the first line.
  1065. The timing program for eigenvalue/singular value routines accumulates
  1066. the operation count as the routines are executing using special
  1067. instrumented versions of the LAPACK routines. The first step in
  1068. compiling the timing program is therefore to make a library of the
  1069. instrumented routines.
  1070. \begin{itemize}
  1071. \item[a)]
  1072. \begin{sloppypar}
  1073. To make a library of the instrumented LAPACK routines, first
  1074. go to \texttt{LAPACK/TIMING/EIG/EIGSRC} and type \texttt{make} followed
  1075. by the data types desired, as in the examples of Section~\ref{toplevelmakefile}.
  1076. The library of instrumented code is created in
  1077. \texttt{LAPACK/TIMING/EIG/eigsrc.a}.
  1078. \end{sloppypar}
  1079. \item[b)]
  1080. To make the eigensystem timing programs,
  1081. go to \texttt{LAPACK/TIMING/EIG} and
  1082. type \texttt{make} followed by the data types desired, as in the examples
  1083. of Section~\ref{toplevelmakefile}. The executable files are called
  1084. \texttt{xeigtims}, \texttt{xeigtimc}, \texttt{xeigtimd}, and \texttt{xeigtimz}
  1085. and are created in \texttt{LAPACK/TIMING}.
  1086. \item[c)]
  1087. Go to \texttt{LAPACK/TIMING} and
  1088. make any necessary modifications to the input files.
  1089. You may need to set the minimum time a subroutine will
  1090. be timed to a positive value, or to restrict the number of tests
  1091. if you are using a computer with performance in between that of a
  1092. workstation and that of a supercomputer.
  1093. Instead of decreasing the matrix dimensions to reduce the time,
  1094. it would be better to reduce the number of matrix types to be timed,
  1095. since the performance varies more with the matrix size than with the
  1096. type. For example, for the nonsymmetric eigenvalue routines,
  1097. you could use only one matrix of type 4 instead of four matrices of
  1098. types 1, 3, 4, and 6.
  1099. Refer to LAPACK Working Note 41~\cite{WN41} for further details.
  1100. % See Section~\ref{moretiming} for further details.
  1101. \item[d)]
  1102. Run the programs for each data type you are using.
  1103. For the REAL version, the commands for the small data sets are
  1104. \begin{list}{}{}
  1105. \item{} \texttt{xeigtims < sgeptim\_small.in > sgeptim\_small.out }
  1106. \item{} \texttt{xeigtims < sneptim\_small.in > sneptim\_small.out }
  1107. \item{} \texttt{xeigtims < sseptim\_small.in > sseptim\_small.out }
  1108. \item{} \texttt{xeigtims < ssvdtim\_small.in > ssvdtim\_small.out }
  1109. \end{list}
  1110. or the commands for the large data sets are
  1111. \begin{list}{}{}
  1112. \item{} \texttt{xeigtims < sgeptim\_large.in > sgeptim\_large.out }
  1113. \item{} \texttt{xeigtims < sneptim\_large.in > sneptim\_large.out }
  1114. \item{} \texttt{xeigtims < sseptim\_large.in > sseptim\_large.out }
  1115. \item{} \texttt{xeigtims < ssvdtim\_large.in > ssvdtim\_large.out }
  1116. \end{list}
  1117. \noindent
  1118. Similar commands should be used for the other data types.
  1119. \end{itemize}
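The timing runs of Sections~\ref{timelin}, \ref{timeblas}, and~\ref{timeeig}
follow one naming pattern, which the following shell sketch makes explicit.
The function name \texttt{timing\_commands} is hypothetical, and we assume
that the file names for the other data types are obtained by replacing the
leading `s', as stated earlier.
\begin{verbatim}
# Hypothetical helper: print the ten timing commands for one data type
# (first argument: s, d, c, or z) and one data set size
# (second argument: small or large).
timing_commands() {
    t=$1
    size=$2
    # Linear equation and BLAS timings use the same executable.
    for name in time band time2 blasa blasb blasc; do
        echo "xlintim$t < $t${name}_$size.in > $t${name}_$size.out"
    done
    # Eigensystem timings.
    for name in geptim neptim septim svdtim; do
        echo "xeigtim$t < $t${name}_$size.in > $t${name}_$size.out"
    done
}

timing_commands s small
\end{verbatim}
As with the test programs, the printed commands are meant to be run from
\texttt{LAPACK/TIMING} after the input files have been adjusted.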
  1120. \subsection{Send the Results to Tennessee}\label{sendresults}
  1121. Congratulations! You have now finished installing, testing, and
  1122. timing LAPACK. If you encountered failures in any phase of the
  1123. testing or timing process, please
  1124. consult our \texttt{release\_notes} file on netlib.
  1125. \begin{quote}
  1126. \url{http://www.netlib.org/lapack/release\_notes}
  1127. \end{quote}
  1128. This file contains machine-dependent installation clues which hopefully will
  1129. alleviate your difficulties or at least let you know that other users
  1130. have had similar difficulties on that machine. If there is not an entry
  1131. for your machine or the suggestions do not fix your problem, please feel
  1132. free to contact the authors at
  1133. \begin{list}{}{}
  1134. \item \href{mailto:lapack@cs.utk.edu}{\texttt{lapack@cs.utk.edu}}.
  1135. \end{list}
  1136. Tell us the
  1137. type of machine on which the tests were run, the version of the operating
  1138. system, the compiler and compiler options that were used,
  1139. and details of the BLAS library or libraries that you used. You should
  1140. also include a copy of the output file in which the failure occurs.
  1141. We would like to keep our \texttt{release\_notes} file as up-to-date as possible.
  1142. Therefore, if you do not see an entry for your machine, please contact us
  1143. with your testing results.
  1144. Comments and suggestions are also welcome.
  1145. We encourage you to make the LAPACK library available to your
  1146. users and provide us with feedback from their experiences.
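The machine details requested above can be gathered in one step. The
following Bourne shell sketch is only an illustration: the function name
\texttt{lapack\_report} and the default output file name are hypothetical, and
it assumes that the compiler, compiler options, and BLAS settings of interest
are those recorded in \texttt{LAPACK/make.inc}.
\begin{verbatim}
# Write system details and the make.inc settings to a report file.
# Usage: lapack_report [output-file] [path-to-make.inc]
lapack_report() {
  out=${1:-lapack_report.txt}
  inc=${2:-LAPACK/make.inc}
  {
    echo "== System (uname -a) =="
    uname -a
    echo "== Settings from $inc =="
    if [ -f "$inc" ]; then
      cat "$inc"
    else
      echo "(file not found)"
    fi
  } > "$out"
}
\end{verbatim}
Calling \texttt{lapack\_report} from the directory that contains \texttt{LAPACK}
produces \texttt{lapack\_report.txt}, which can be attached to the message
together with the output file in which the failure occurs.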
  1147. %This release of LAPACK is not guaranteed to be compatible
  1148. %with any previous test release.
\subsection{Get Support}\label{getsupport}
First, consult the complete installation manual, LAPACK Working Note 41~\cite{WN41}.
If you still cannot solve your problem, you have two options:
\begin{itemize}
\item
either post a message to the LAPACK forum
  1155. \begin{quote}
  1156. \url{http://icl.cs.utk.edu/lapack-forum}
  1157. \end{quote}
  1158. \item
  1159. or send an email to the LAPACK mailing list:
  1160. \begin{list}{}{}
  1161. \item \href{mailto:lapack@cs.utk.edu}{\texttt{lapack@cs.utk.edu}}.
  1162. \end{list}
  1163. \end{itemize}
  1164. \section*{Acknowledgments}
  1165. Ed Anderson and Susan Blackford contributed to previous versions of this report.
  1166. \appendix
  1167. \chapter{Caveats}\label{appendixd}
  1168. In this appendix we list a few of the machine-specific difficulties we
  1169. have
  1170. encountered in our own experience with LAPACK. A more detailed list
  1171. of machine-dependent problems, bugs, and compiler errors encountered
  1172. in the LAPACK installation process is maintained
  1173. on \emph{netlib}.
  1174. \begin{quote}
  1175. \url{http://www.netlib.org/lapack/release\_notes}
  1176. \end{quote}
  1177. We assume the user has installed the machine-specific routines
  1178. correctly and that the Level 1, 2 and 3 BLAS test programs have run
  1179. successfully, so we do not list any warnings associated with those
  1180. routines.
  1181. \section{\texttt{LAPACK/make.inc}}
  1182. All machine-specific
  1183. parameters are specified in the file \texttt{LAPACK/make.inc}.
  1184. The first line of this \texttt{make.inc} file is:
  1185. \begin{quote}
  1186. SHELL = /bin/sh
  1187. \end{quote}
  1188. and will need to be modified to \texttt{SHELL = /sbin/sh} if you are
  1189. installing LAPACK on an SGI architecture.
  1190. \section{ETIME}
  1191. On HPPA architectures,
  1192. the compiler and linker flag \texttt{+U77} should be included to access
  1193. the function \texttt{ETIME}.
  1194. \section{ILAENV and IEEE-754 compliance}
  1195. %By default, ILAENV (\texttt{LAPACK/SRC/ilaenv.f}) assumes an IEEE and IEEE-754
  1196. %compliant architecture, and thus sets (\texttt{ILAENV=1}) for (\texttt{ISPEC=10})
  1197. %and (\texttt{ISPEC=11}) settings in ILAENV.
  1198. %
  1199. %If you are installing LAPACK on a non-IEEE machine, you MUST modify ILAENV,
  1200. %as this test inside ILAENV will crash!
Because some newer LAPACK routines rely on IEEE-754 compliance,
two settings (\texttt{ISPEC=10} and \texttt{ISPEC=11}) have been added to ILAENV
(\texttt{LAPACK/SRC/ilaenv.f}) to denote IEEE-754 compliance for NaN and
infinity arithmetic, respectively. By default, ILAENV assumes an IEEE
machine and tests for IEEE-754 compliance. \textbf{NOTE: If you
are installing LAPACK on a non-IEEE machine, you MUST modify ILAENV,
as this test inside ILAENV will crash!}
Thus, for non-IEEE machines, the user must hard-code
\texttt{ILAENV=0} for \texttt{ISPEC=10} and \texttt{ISPEC=11} in the version
of \texttt{LAPACK/SRC/ilaenv.f} that is placed in
the library. For further details, refer to section~\ref{testieee}.
Be aware
that some compilers for IEEE machines do not enforce IEEE-754 compliance by default, and
a compiler flag must be set explicitly by the user.
On SGIs, for example, you must set the \texttt{-OPT:IEEE\_NaN\_inf=ON} compiler
flag to enable IEEE-754 compliance.
Lastly, the test inside ILAENV that detects IEEE-754 compliance will
raise IEEE exceptions for ``Divide by Zero'' and ``Invalid Operation''.
Thus, on a machine that issues IEEE exception
warning messages (such as a Sun SPARCstation), these messages can be
disregarded. To avoid them, the user can hard-code the values
inside ILAENV as explained in section~\ref{testieee}.
  1223. \section{Lack of \texttt{/tmp} space}
  1224. If \texttt{/tmp} space is small (i.e., less than approximately 16 MB) on your
  1225. architecture, you may run out of space
  1226. when compiling. There are a few possible solutions to this problem.
  1227. \begin{enumerate}
  1228. \item You can ask your system administrator to increase the size of the
  1229. \texttt{/tmp} partition.
  1230. \item You can change the environment variable \texttt{TMPDIR} to point to
  1231. your home directory for temporary space. E.g.,
  1232. \begin{quote}
  1233. \texttt{setenv TMPDIR /home/userid/}
  1234. \end{quote}
  1235. where \texttt{/home/userid/} is the user's home directory.
\item If your archive command has an \texttt{l} option, you can change the
archive command to \texttt{ar crl} so that
it places temporary files in the current working
directory rather than in the default temporary directory \texttt{/tmp}.
  1240. \end{enumerate}
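The \texttt{setenv} command above is for the C shell. As a sketch for
Bourne-type shells such as the \texttt{/bin/sh} named in \texttt{make.inc},
the second option might look as follows; the directory name
\texttt{lapack\_tmp} is hypothetical.
\begin{verbatim}
# Show the free space (in kilobytes) on the file system holding /tmp.
df -k /tmp
# Create a scratch directory under the home directory and point TMPDIR at it.
mkdir -p "$HOME/lapack_tmp"
TMPDIR="$HOME/lapack_tmp"
export TMPDIR
\end{verbatim}
The setting applies to the current shell and to commands started from it, so
the library should be built from the same session.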
  1241. \section{BLAS}
  1242. If you suspect a BLAS-related problem and you are linking
  1243. with an optimized version of the BLAS, we would strongly suggest
  1244. as a first step that you link to the Fortran~77 version of
  1245. the suspected BLAS routine and see if the error has disappeared.
We have included test programs for the Level 1 BLAS, but
users should still beware of a common problem in machine-specific
implementations of xNRM2,
the function to compute the 2-norm of a vector.
  1250. The Fortran version of xNRM2 avoids underflow or overflow
  1251. by scaling intermediate results, but some library versions of xNRM2
  1252. are not so careful about scaling.
  1253. If xNRM2 is implemented without scaling intermediate results, some of
  1254. the LAPACK test ratios may be unusually high, or
  1255. a floating point exception may occur in the problems scaled near
  1256. underflow or overflow.
The solution to these problems is to link the Fortran version of
xNRM2 with the test program. \emph{On some CRAY architectures, the Fortran~77
version of xNRM2 should be used.}
  1260. \section{Optimization}
If a large number of test failures occur for a specific matrix type
or operation, the cause may be an optimization problem with
your compiler. Try reducing the level of
optimization, or disabling optimization entirely, for those routines
and rerun the tests to see whether the failures disappear.
  1266. %LAPACK is written in Fortran 77. Prospective users with only a
  1267. %Fortran 66 compiler will not be able to use this package.
  1268. \section{Compiling testing/timing drivers}
The testing and timing main programs (xCHKAA, xCHKEE, xTIMAA, and
xTIMEE)
allocate a large amount of storage for local variables. It is therefore
important to know whether your compiler by default allocates local
variables statically or on the stack. Compilers that place local
variables on the stack not uncommonly cause a stack
overflow at run time in the testing or timing process. The user then
has two options: increase the stack size, or force all local variables
to be allocated statically.
  1278. On HPPA architectures, the
  1279. compiler and linker flag \texttt{-K} should be used when compiling these testing
  1280. and timing main programs to avoid such a stack overflow. I.e., set
  1281. \texttt{FFLAGS\_DRV = -K} in the \texttt{LAPACK/make.inc} file.
  1282. For similar reasons,
  1283. on SGI architectures, the compiler and linker flag \texttt{-static} should be
  1284. used. I.e., set \texttt{FFLAGS\_DRV = -static} in the \texttt{LAPACK/make.inc} file.
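For the first option, the stack limit can often be raised from the shell
before the test programs are run. The sketch below assumes a Bourne-type
shell whose \texttt{ulimit} built-in accepts the \texttt{-s} option (for
example \texttt{bash} or \texttt{ksh}); C shell users would use the
\texttt{limit stacksize} command instead.
\begin{verbatim}
# Show the current (soft) stack size limit, in kilobytes or "unlimited".
ulimit -s
# Raise the soft limit as far as the hard limit allows.
ulimit -S -s "$(ulimit -H -s)"
\end{verbatim}
The new limit applies only to that shell and to the programs started from it,
so the testing and timing programs should be run from the same session.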
  1285. \section{IEEE arithmetic}
  1286. Some of our test matrices are scaled near overflow or underflow,
  1287. but on the Crays, problems with the arithmetic near overflow and
  1288. underflow forced us to scale by only the square root of overflow
  1289. and underflow.
  1290. The LAPACK auxiliary routine SLABAD (or DLABAD) is called to
  1291. take the square root of underflow and overflow in cases where it
  1292. could cause difficulties.
  1293. We assume we are on a Cray if $ \log_{10} (\mathrm{overflow})$
  1294. is greater than 2000
  1295. and take the square root of underflow and overflow in this case.
  1296. The test in SLABAD is as follows:
  1297. \begin{verbatim}
  1298. IF( LOG10( LARGE ).GT.2000. ) THEN
  1299. SMALL = SQRT( SMALL )
  1300. LARGE = SQRT( LARGE )
  1301. END IF
  1302. \end{verbatim}
  1303. Users of other machines with similar restrictions on the effective
  1304. range of usable numbers may have to modify this test so that the
  1305. square roots are done on their machine as well. \emph{Usually on
  1306. HPPA architectures, a similar restriction in SLABAD should be enforced
  1307. for all testing involving complex arithmetic.}
  1308. SLABAD is located in \texttt{LAPACK/SRC}.
For machines which have a narrow exponent range or lack gradual
underflow (DEC VAXes, for example), it is not uncommon to experience
failures in \texttt{sec.out} and/or \texttt{dec.out} with SLAQTR/DLAQTR or DTRSYL.
The failures in SLAQTR/DLAQTR and DTRSYL
occur with test problems which are very badly scaled when the norm of
the solution is very close to the underflow
threshold (or even underflows to zero). We believe that these failures
could probably be avoided by an even greater degree of care in scaling,
but we did not want to delay the release of LAPACK any further. These
tests pass successfully on most other machines. An example failure in
\texttt{dec.out} on a MicroVAX II looks like the following:
  1320. \begin{verbatim}
  1321. Tests of the Nonsymmetric eigenproblem condition estimation routines
  1322. DLALN2, DLASY2, DLANV2, DLAEXC, DTRSYL, DTREXC, DTRSNA, DTRSEN, DLAQTR
  1323. Relative machine precision (EPS) = 0.277556D-16
  1324. Safe minimum (SFMIN) = 0.587747D-38
  1325. Routines pass computational tests if test ratio is less than 20.00
  1326. DEC routines passed the tests of the error exits ( 35 tests done)
  1327. Error in DTRSYL: RMAX = 0.155D+07
  1328. LMAX = 5323 NINFO= 1600 KNT= 27648
  1329. Error in DLAQTR: RMAX = 0.344D+04
  1330. LMAX = 15792 NINFO= 26720 KNT= 45000
  1331. \end{verbatim}
  1332. \section{Timing programs}
  1333. In the eigensystem timing program, calls are made to the LINPACK
  1334. and EISPACK equivalents of the LAPACK routines to allow a direct
  1335. comparison of performance measures.
In some cases we have increased the maximum number of
  1337. iterations in the LINPACK and EISPACK routines to allow
  1338. them to converge for our test problems, but
  1339. even this may not be enough.
  1340. One goal of the LAPACK project is to improve the convergence
  1341. properties of these routines, so error messages in the output
  1342. file indicating that a LINPACK or EISPACK routine did not
  1343. converge should not be regarded with alarm.
  1344. In the eigensystem timing program, we have equivalenced some work
  1345. arrays and then passed them to a subroutine, where both arrays are
  1346. modified. This is a violation of the Fortran~77 standard, which
  1347. says ``if a subprogram reference causes a dummy argument in the
  1348. referenced subprogram to become associated with another dummy
  1349. argument in the referenced subprogram, neither dummy argument may
become defined during execution of the subprogram.''\footnote{ANSI X3.9-1978, sec.~15.9.3.6}
  1352. If this causes any difficulties, the equivalence
  1353. can be commented out as explained in the comments for the main
  1354. eigensystem timing programs.
  1355. %\section*{MACHINE-SPECIFIC DIFFICULTIES}
  1356. %Some IBM compilers do not recognize DBLE as a generic function as used
  1357. %in LAPACK. The software tools we use to convert from single precision
  1358. %to double precision convert REAL(C) and AIMAG(C), where C is COMPLEX,
  1359. %to DBLE(Z) and DIMAG(Z), where Z is COMPLEX*16, but
  1360. %IBM compilers use DREAL(Z) and DIMAG(Z) to take the real and
  1361. %imaginary parts of a double complex number.
  1362. %IBM users can fix this problem by changing DBLE to DREAL when the
  1363. %argument of DBLE is COMPLEX*16.
  1364. %
  1365. %IBM compilers do not permit the data type COMPLEX*16 in a FUNCTION
  1366. %subprogram definition. The data type on the first line of the
  1367. %function subprogram must be changed from COMPLEX*16 to DOUBLE COMPLEX
  1368. %for the following functions:
  1369. %
  1370. %\begin{tabbing}
  1371. %\dent ZLATMOO \= from the test matrix generator library \kill
  1372. %\dent ZBEG \> from the Level 2 BLAS test program \\
  1373. %\dent ZBEG \> from the Level 3 BLAS test program \\
  1374. %\dent ZLADIV \> from the LAPACK library \\
  1375. %\dent ZLARND \> from the test matrix generator library \\
  1376. %\dent ZLATM2 \> from the test matrix generator library \\
  1377. %\dent ZLATM3 \> from the test matrix generator library
  1378. %\end{tabbing}
  1379. %The functions ZDOTC and ZDOTU from the Level 1 BLAS are already
  1380. %declared DOUBLE COMPLEX. If that doesn't work, try the declaration
  1381. %COMPLEX FUNCTION*16.
  1382. \newpage
  1383. \addcontentsline{toc}{section}{Bibliography}
  1384. \begin{thebibliography}{9}
  1385. \bibitem{LUG}
  1386. E. Anderson, Z. Bai, C. Bischof, J. Demmel, J. Dongarra,
  1387. J. Du Croz, A. Greenbaum, S. Hammarling, A. McKenney,
  1388. S. Ostrouchov, and D. Sorensen,
  1389. \textit{LAPACK Users' Guide}, Second Edition,
  1390. {SIAM}, Philadelphia, PA, 1995.
  1391. \bibitem{WN16}
  1392. E. Anderson and J. Dongarra,
  1393. \textit{LAPACK Working Note 16:
  1394. Results from the Initial Release of LAPACK},
  1395. University of Tennessee, CS-89-89, November 1989.
  1396. \bibitem{WN41}
  1397. E. Anderson, J. Dongarra, and S. Ostrouchov,
  1398. \textit{LAPACK Working Note 41:
  1399. Installation Guide for LAPACK},
  1400. University of Tennessee, CS-92-151, February 1992 (revised June 1999).
  1401. \bibitem{WN5}
  1402. C. Bischof, J. Demmel, J. Dongarra, J. Du Croz, A. Greenbaum,
  1403. S. Hammarling, and D. Sorensen,
  1404. \textit{LAPACK Working Note \#5: Provisional Contents},
  1405. Argonne National Laboratory, ANL-88-38, September 1988.
  1406. \bibitem{WN13}
  1407. Z. Bai, J. Demmel, and A. McKenney,
  1408. \textit{LAPACK Working Note \#13: On the Conditioning of the Nonsymmetric
  1409. Eigenvalue Problem: Theory and Software},
  1410. University of Tennessee, CS-89-86, October 1989.
  1411. \bibitem{XBLAS}
  1412. X. S. Li, J. W. Demmel, D. H. Bailey, G. Henry, Y. Hida, J. Iskandar,
  1413. W. Kahan, S. Y. Kang, A. Kapur, M. C. Martin, B. J. Thompson, T. Tung,
  1414. and D. J. Yoo, \textit{Design, implementation and testing of extended
  1415. and mixed precision BLAS},
  1416. \textit{ACM Trans. Math. Soft.}, 28, 2:152--205, June 2002.
  1417. \bibitem{BLAS3}
  1418. J. Dongarra, J. Du Croz, I. Duff, and S. Hammarling,
  1419. ``A Set of Level 3 Basic Linear Algebra Subprograms,''
\textit{ACM Trans. Math. Soft.}, 16, 1:1--17, March 1990.
  1421. %Argonne National Laboratory, ANL-MCS-P88-1, August 1988.
  1422. \bibitem{BLAS3-test}
  1423. J. Dongarra, J. Du Croz, I. Duff, and S. Hammarling,
  1424. ``A Set of Level 3 Basic Linear Algebra Subprograms:
  1425. Model Implementation and Test Programs,''
\textit{ACM Trans. Math. Soft.}, 16, 1:18--28, March 1990.
  1427. %Argonne National Laboratory, ANL-MCS-TM-119, June 1988.
  1428. \bibitem{BLAS2}
  1429. J. Dongarra, J. Du Croz, S. Hammarling, and R. Hanson,
  1430. ``An Extended Set of Fortran Basic Linear Algebra Subprograms,''
\textit{ACM Trans. Math. Soft.}, 14, 1:1--17, March 1988.
  1432. \bibitem{BLAS2-test}
  1433. J. Dongarra, J. Du Croz, S. Hammarling, and R. Hanson,
  1434. ``An Extended Set of Fortran Basic Linear Algebra Subprograms:
  1435. Model Implementation and Test Programs,''
\textit{ACM Trans. Math. Soft.}, 14, 1:18--32, March 1988.
  1437. \bibitem{BLAS1}
  1438. C. L. Lawson, R. J. Hanson, D. R. Kincaid, and F. T. Krogh,
  1439. ``Basic Linear Algebra Subprograms for Fortran Usage,''
\textit{ACM Trans. Math. Soft.}, 5, 3:308--323, September 1979.
  1441. \end{thebibliography}
  1442. \end{document}