OpenBLAS ChangeLog
====================================================================
Version 0.3.11
17-Oct-2020

common:
 * API change:
   the newly added BFLOAT16 functions were renamed to use the
   letter "B" instead of "H" to avoid potential confusion with
   the IEEE "half precision float" type, i.e. the 0.3.10
   SHGEMM is now SBGEMM and the corresponding build option
   was changed from "BUILD_HALF" to "BUILD_BFLOAT16".
 * Reduced the default BLAS3_MEM_ALLOC_THRESHOLD (used as an upper
   limit for placing temporary arrays on the stack) to be compatible
   with a stack size of 1 MB (as imposed by the Java runtime library)
 * Added mixed-precision dot function SBDOT and utility functions
   shstobf16, shdtobf16, sbf16tos and dbf16tod to convert between
   single or double precision float arrays and bfloat16 arrays
 * Fixed prototypes of LAPACK_?ggsvp and LAPACK_?ggsvd functions
   in lapack.h
 * Fixed underflow and rounding errors in LAPACK SLANV2 and DLANV2
   (causing miscalculations in e.g. SHSEQR/DHSEQR, LAPACK issue #263)
 * Fixed workspace calculation in LAPACK ?GELQ (LAPACK issue #415)
 * Fixed several bugs in the LAPACK testsuite
 * Improved performance of TRMM and TRSM for certain problem sizes
 * Fixed infinite recursions and workspace miscalculations in ReLAPACK
 * CMAKE builds no longer require pkg-config for creating the .pc file
 * Makefile builds no longer misread NO_CBLAS=0 or NO_LAPACK=0 as
   enabling these options
 * Fixed detection of gfortran when invoked through an MPI wrapper
 * Improved thread reinitialization performance with OpenMP after a fork
 * Added support for building only the subset of the library required
   for a particular precision by specifying BUILD_SINGLE, BUILD_DOUBLE
 * Optional function name prefixes and suffixes are now correctly
   reflected in the generated cblas.h
 * Added CMAKE build support for the LAPACK and multithreading tests
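
The BFLOAT16 items above rest on one property of the format: a bfloat16 value is the upper 16 bits of an IEEE-754 single-precision float (1 sign bit, 8 exponent bits, 7 mantissa bits). The C sketch below illustrates what a float-to-bfloat16 array conversion and a float-accumulating dot product involve, assuming round-to-nearest-even for the narrowing step. It is not OpenBLAS code: the type `bf16_t` and the functions `float_to_bf16`, `bf16_to_float`, `float_array_to_bf16` and `bf16_dot` are hypothetical stand-ins for the library's bfloat16 type, its conversion utilities and SBDOT, and only positive strides are handled.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical bfloat16 storage type: the top 16 bits of a binary32 float. */
typedef uint16_t bf16_t;

/* float -> bfloat16 with round-to-nearest-even (an assumed rounding mode). */
static bf16_t float_to_bf16(float f)
{
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);             /* reinterpret without UB */
    if ((bits & 0x7fffffffu) > 0x7f800000u)     /* NaN: truncate, force quiet bit */
        return (bf16_t)((bits >> 16) | 0x0040u);
    uint32_t lsb = (bits >> 16) & 1u;           /* last mantissa bit that is kept */
    bits += 0x7fffu + lsb;                      /* ties round to even */
    return (bf16_t)(bits >> 16);
}

/* bfloat16 -> float is exact: shift the 16 bits back into the top half. */
static float bf16_to_float(bf16_t h)
{
    uint32_t bits = (uint32_t)h << 16;
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}

/* Strided array conversion (hypothetical helper, positive increments only). */
static void float_array_to_bf16(int n, const float *in, int incin,
                                bf16_t *out, int incout)
{
    for (int i = 0; i < n; i++)
        out[i * incout] = float_to_bf16(in[i * incin]);
}

/* Mixed-precision dot product: bfloat16 inputs, float accumulator. */
static float bf16_dot(int n, const bf16_t *x, int incx,
                      const bf16_t *y, int incy)
{
    float acc = 0.0f;
    for (int i = 0; i < n; i++)
        acc += bf16_to_float(x[i * incx]) * bf16_to_float(y[i * incy]);
    return acc;
}
```

Widening bfloat16 back to float is exact, so only the float-to-bfloat16 direction loses precision; accumulating the products in float is what makes a dot product of this kind "mixed-precision".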

POWER:
 * Added optimized support for POWER10
 * Added support for compiling for POWER8 in 32-bit mode
 * Added support for compilation with LLVM/clang
 * Added support for compilation with NVIDIA/PGI compilers
 * Fixed building on big-endian POWER8
 * Fixed miscompilation of ZDOTC by gcc 10
 * Fixed alignment errors in the POWER8 SAXPY kernel
 * Improved CPU detection on AIX
 * Added support for building with older compilers on POWER9

x86_64:
 * Added support for Intel Cooper Lake
 * Added autodetection of AMD Renoir/Matisse/Zen3 CPUs
 * Added autodetection of Intel Comet Lake CPUs
 * Reimplemented ?sum, ?dot and daxpy using universal intrinsics
 * Reset the FPU state before using the FPU on Windows as a workaround
   for a problem introduced in Windows 10 build 19041 (a.k.a. SDK 2004)
 * Fixed potentially undefined behaviour in the dot and gemv_t kernels
 * Fixed a potential segmentation fault in DYNAMIC_ARCH builds
 * Fixed building for ZEN with PGI/NVIDIA and AMD AOCC compilers
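
For reference, the operations behind the reimplemented dot and daxpy kernels are short loops whose results any vectorized kernel has to reproduce. The scalar C sketch below shows those reference semantics, not the universal-intrinsics code itself; the names `ref_daxpy` and `ref_ddot` are hypothetical, and only positive strides are handled, whereas BLAS also accepts negative increments.

```c
/* Reference semantics of DAXPY: y := alpha*x + y (hypothetical name,
   positive strides only). */
static void ref_daxpy(int n, double alpha, const double *x, int incx,
                      double *y, int incy)
{
    if (n <= 0 || alpha == 0.0)
        return;                      /* nothing to do, as in reference BLAS */
    for (int i = 0; i < n; i++)
        y[i * incy] += alpha * x[i * incx];
}

/* Reference semantics of DDOT: sum over i of x[i]*y[i], accumulated
   sequentially (hypothetical name, positive strides only). */
static double ref_ddot(int n, const double *x, int incx,
                       const double *y, int incy)
{
    double acc = 0.0;
    for (int i = 0; i < n; i++)
        acc += x[i * incx] * y[i * incy];
    return acc;
}
```

A SIMD kernel typically keeps several partial sums in separate vector lanes and combines them at the end, so its dot product can differ from this sequential loop in the last bits while still being correct.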

ARMV7:
 * Fixed CPU detection on BSD-like systems

ARMV8:
 * Added preliminary support for Apple Vortex CPUs
 * Added support for the Cavium ThunderX3T110 CPU
 * Fixed CPU detection on BSD-like systems
 * Fixed compilation in -std=c18 mode

IBM Z:
 * Added support for compiling with the clang compiler
 * Improved GEMM performance on Z14

====================================================================
Version 0.3.10
14-Jun-2020