|
|
@@ -1,4 +1,86 @@ |
|
|
|
OpenBLAS ChangeLog |
|
|
|
==================================================================== |
|
|
|
Version 0.3.21 |
|
|
|
07-Aug-2022 |
|
|
|
|
|
|
|
general: |
|
|
|
- Updated the included LAPACK to Reference-LAPACK release 3.10.1 |
|
|
|
- when no Fortran compiler is available, OpenBLAS builds will now automatically |
|
|
|
build LAPACK from an f2c-converted copy of LAPACK 3.9.0 unless the NO_LAPACK option |
|
|
|
is specified |
|
|
|
- similarly added C versions of the BLAS and CBLAS tests |
|
|
|
- enabled building of the ReLAPACK GEMMT kernels when ReLAPACK is built |
|
|
|
- function LAPACKE_lsame is now annotated with the GCC attribute "const" to aid static analyzers |
|
|
|
- added USE_TLS to the list of options reported by the openblas_get_config() function |
|
|
|
- CMAKE builds now support the BUILD_TESTING keyword of Reference-LAPACK (to disable the LAPACK testsuite) |
|
|
|
- fixed CMAKE builds of the laswp_ncopy and neg_tcopy kernels |
|
|
|
- removed the build system requirements for PERL (while keeping the original perl scripts as backup) |
|
|
|
- handle building and running OpenBLAS on systems that report zero available cpu cores |
|
|
|
- added SYMBOLPREFIX/SYMBOLSUFFIX handling for LAPACK 3.10.0 functions added in 0.3.20 |
|
|
|
- fixed linking of the utests on QNX |
|
|
|
- Added support for compilation with the Intel ifx compiler |
|
|
|
- Added support for compilation with the Fujitsu FCC compiler for Fugaku |
|
|
|
- Added support for compilation with the Cray C and Fortran compilers |
|
|
|
- reverted OpenMP threadpool behaviour in the exec_blas call to its state before 0.3.11, that is |
|
|
|
the threadpool will no longer grow or shrink on demand as the overhead for this is too big at least with |
|
|
|
GNU OpenMP. The adaptive behaviour introduced in 0.3.11 can still be requested at runtime by setting |
|
|
|
the environment variable OMP_ADAPTIVE |
|
|
|
- worked around spurious STFSM/CTFSM errors reported by the LAPACK testsuite |
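
A minimal sketch for the OMP_ADAPTIVE entry above, showing one way to set the variable for a single child process rather than the whole session. The value "1" and the helper names adaptive_env and run_with_adaptive_pool are assumptions made for illustration; the entry names only the variable. The child command here is a stand-in that echoes the variable back, where a real use would launch a program linked against an OpenMP build of OpenBLAS.

```python
import os
import subprocess
import sys


def adaptive_env(base=None):
    """Return a copy of the environment with OMP_ADAPTIVE set.

    Assumption: the value "1" is enough to request the adaptive
    threadpool behaviour; the changelog entry names only the variable.
    """
    env = dict(os.environ if base is None else base)
    env["OMP_ADAPTIVE"] = "1"
    return env


def run_with_adaptive_pool(cmd):
    """Run cmd (a list of arguments) with OMP_ADAPTIVE set for the child only."""
    return subprocess.run(cmd, env=adaptive_env(), check=True,
                          capture_output=True, text=True)


if __name__ == "__main__":
    # Stand-in child process: it only echoes the variable back.
    child = [sys.executable, "-c",
             "import os; print(os.environ['OMP_ADAPTIVE'])"]
    print(run_with_adaptive_pool(child).stdout.strip())
```

Setting the variable per process keeps the new default (a fixed-size threadpool) for everything else started from the same shell.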
|
|
|
|
|
|
|
x86_64: |
|
|
|
- fixed determination of compiler support for AVX512 and removed the 0.3.19 |
|
|
|
workaround for building SKYLAKEX kernels on Sandybridge hardware |
|
|
|
- fixed compilation for the SKYLAKEX target with GCC 6 |
|
|
|
- fixed compilation of the CooperLake SBGEMM kernel with LLVM |
|
|
|
- fixed compilation of the SkyLakeX small matrix GEMM kernels with LLVM or ICC |
|
|
|
- fixed compilation of some BFLOAT16 kernels with CMAKE |
|
|
|
- added support for the Zhaoxin/Centaur KH40000 cpu |
|
|
|
- fixed a potential crash in the ZSYMV kernel used for all targets except generic |
|
|
|
- fixed gmake compilation for DYNAMIC_ARCH with a DYNAMIC_LIST including ATOM |
|
|
|
- fixed compilation of LAPACKE with the INTEGER64 option on Windows |
|
|
|
- added support for cross-compiling to individual Intel or AMD targets using CMAKE |
|
|
|
(previously only CORE2 was supported; added targets are ATOM, PRESCOTT, NEHALEM, SANDYBRIDGE, |
|
|
|
HASWELL, SKYLAKEX, COOPERLAKE, SAPPHIRERAPIDS, OPTERON, BARCELONA, BULLDOZER, PILEDRIVER, |
|
|
|
STEAMROLLER, EXCAVATOR, ZEN) |
|
|
|
|
|
|
|
SPARC: |
|
|
|
- worked around an overflow error in the DNRM2 kernel |
|
|
|
|
|
|
|
POWER: |
|
|
|
- worked around an overflow error in the POWER6 DNRM2 kernel |
|
|
|
- fixed compilation on PPC440 |
|
|
|
- fixed a performance regression in the level1 BLAS on POWER10 |
|
|
|
- fixed the POWER10 ZGEMM kernel |
|
|
|
- fixed single-threaded builds for POWER10 |
|
|
|
- fixed compilation of the POWER10 DGEMV kernel with older GCC versions |
|
|
|
- enabled compilation of the BFLOAT16 kernels by default |
|
|
|
- enabled the small matrix kernels by default for DYNAMIC_ARCH builds |
|
|
|
- added a workaround for a miscompilation of the CDOT and ZDOT kernels by GCC 12 |
|
|
|
|
|
|
|
RISCV: |
|
|
|
- fixed cpu autodetection logic |
|
|
|
|
|
|
|
ARMV8: |
|
|
|
- added an SBGEMM kernel for Neoverse N2 |
|
|
|
- worked around an overflow error in the DNRM2 kernel used on M1, NeoverseN1, ThunderX2T99 |
|
|
|
- added support for ARM64 systems running MS Windows |
|
|
|
- added support for cross-compiling to the GENERIC ARMV8 target under CMAKE (Windows/MSVC) |
|
|
|
- fixed a performance regression in the generic ARMV8 DGEMM kernel introduced in 0.3.19 |
|
|
|
- added initial support for the Apple M1 cpu under Linux |
|
|
|
- added initial support for the Phytium FT2000 cpu |
|
|
|
- added initial support for the Cortex A510, A710, X1 and X2 cpus |
|
|
|
- fixed an accidental mixup of cpu identifiers in the autodetection code introduced in 0.3.20 |
|
|
|
- fixed linking of Apple M1 builds on macOS 12 and later with recent XCode |
|
|
|
- made Neoverse N2 available in DYNAMIC_ARCH builds |
|
|
|
|
|
|
|
MIPS, MIPS64: |
|
|
|
- worked around an overflow error in the DNRM2 kernel |
|
|
|
|
|
|
|
LOONGARCH64: |
|
|
|
- worked around an overflow error in the DNRM2 kernel |
|
|
|
- added preliminary support for the LOONGSON2K1000 cpu |
|
|
|
- added DYNAMIC_ARCH support |
|
|
|
|
|
|
|
==================================================================== |
|
|
|
Version 0.3.20 |
|
|
|
20-Feb-2022 |
|
|
|