|
@@ -1,4 +1,82 @@ |
|
|
OpenBLAS ChangeLog |
|
|
|
|
|
|
|
==================================================================== |
|
|
|
|
|
Version 0.3.6 |
|
|
|
|
|
29-Apr-2019 |
|
|
|
|
|
|
|
|
|
|
|
common: |
|
|
|
|
|
* the build tools now check that a given cpu TARGET is actually valid |
|
|
|
|
|
* the build-time check of system features (c_check) has been made |
|
|
|
|
|
less dependent on particular perl features (this should mainly |
|
|
|
|
|
benefit building on Windows) |
|
|
|
|
|
* several problems with the ReLAPACK integration were fixed, |
|
|
|
|
|
including INTERFACE64 support and building a shared library |
|
|
|
|
|
* building with CMAKE on BSD systems was improved |
|
|
|
|
|
* a non-absolute SUM function was added based on the |
|
|
|
|
|
existing optimized code for ASUM |
|
|
|
|
|
* CBLAS interfaces to the IxMIN and IxMAX functions were added |
|
|
|
|
|
* a name clash between LAPACKE and BOOST headers was resolved |
|
|
|
|
|
* fixed CMAKE builds with OpenMP failing to include the appropriate getrf_parallel |
|
|
|
|
|
kernels |
|
|
|
|
|
* a crash on thread (key) deletion with the USE_TLS=1 memory management |
|
|
|
|
|
option was fixed |
|
|
|
|
|
* restored several earlier fixes, in particular for OpenMP performance, |
|
|
|
|
|
building on BSD, and calling fork on CYGWIN, which had inadvertently |
|
|
|
|
|
been dropped in the 0.3.3 rewrite of the memory management code. |
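
The difference between the new SUM and the existing ASUM noted above can be sketched in plain C. This is only an illustration of the semantics for real single-precision input with a positive stride; the names ref_ssum and ref_sasum are hypothetical and are not OpenBLAS entry points.

```c
#include <stddef.h>

/* Hypothetical reference loops, not the optimized OpenBLAS kernels. */

/* ASUM semantics: sum of the absolute values of n strided elements. */
static float ref_sasum(size_t n, const float *x, size_t incx) {
    float acc = 0.0f;
    for (size_t i = 0; i < n; i++) {
        float v = x[i * incx];
        acc += (v < 0.0f) ? -v : v;
    }
    return acc;
}

/* SUM semantics: plain signed sum of the same n strided elements. */
static float ref_ssum(size_t n, const float *x, size_t incx) {
    float acc = 0.0f;
    for (size_t i = 0; i < n; i++) {
        acc += x[i * incx];
    }
    return acc;
}
```

The two results agree only when no element is negative; for mixed-sign input SUM can be much smaller in magnitude than ASUM.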
|
|
|
|
|
|
|
|
|
|
|
x86_64: |
|
|
|
|
|
* the AVX512 DGEMM kernel has been disabled again due to unsolved problems |
|
|
|
|
|
* building with old versions of MSVC was fixed |
|
|
|
|
|
* it is now possible to build a static library on Windows with CMAKE |
|
|
|
|
|
* accessing environment variables on CYGWIN at run time was fixed |
|
|
|
|
|
* the CMAKE build system now recognizes 32bit userspace on 64bit hardware |
|
|
|
|
|
* Intel "Denverton" Atom and Hygon "Dhyana" Zen CPUs are now autodetected |
|
|
|
|
|
* building for DYNAMIC_ARCH with a DYNAMIC_LIST of targets is now supported |
|
|
|
|
|
with CMAKE as well |
|
|
|
|
|
* building for DYNAMIC_ARCH with GENERIC as the default target is now supported |
|
|
|
|
|
* a buffer overflow in the SSE GEMM kernel for VIA Nano targets was fixed |
|
|
|
|
|
* assembly bugs involving undeclared modification of input operands were fixed |
|
|
|
|
|
in the AXPY, DOT, GEMV, GER, SCAL, SYMV and TRSM microkernels for Nehalem, |
|
|
|
|
|
Sandybridge, Haswell, Bulldozer and Piledriver. These would typically cause |
|
|
|
|
|
test failures or segfaults when compiled with gcc 8 or later. |
|
|
|
|
|
* a similar bug was fixed in the blas_quickdivide code used to split workloads |
|
|
|
|
|
in most functions |
|
|
|
|
|
* fixed a bug in the IxMIN implementation for the GENERIC target that made it return the result of IxMAX |
|
|
|
|
|
* fixed building on SkylakeX systems when either the compiler or the (emulated) operating |
|
|
|
|
|
environment does not support AVX512 |
|
|
|
|
|
* improved GEMM performance on ZEN targets |
|
|
|
|
|
|
|
|
|
|
|
x86: |
|
|
|
|
|
* build failures caused by the recently added checks for AVX512 were fixed |
|
|
|
|
|
* an inline assembly bug involving undeclared modification of an input argument was |
|
|
|
|
|
fixed in the blas_quickdivide code used to split workloads in most functions |
|
|
|
|
|
* fixed a bug in the IMIN implementation for the GENERIC target that made it return the result of IMAX |
|
|
|
|
|
|
|
|
|
|
|
MIPS32: |
|
|
|
|
|
* fixed a bug in the IMIN implementation that made it return the result of IMAX |
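
The IMIN/IMAX mix-up listed for several targets is easy to see with a reference-style sketch. The functions below are hypothetical plain C loops for real single-precision input, assuming a positive stride, 0-based result indices (as in CBLAS) and first-occurrence tie-breaking; they are not the OpenBLAS kernels.

```c
#include <stddef.h>

/* Hypothetical helper: absolute value without pulling in libm. */
static float ref_absf(float v) { return (v < 0.0f) ? -v : v; }

/* Index of the first element with the smallest absolute value. */
static size_t ref_isamin(size_t n, const float *x, size_t incx) {
    if (n == 0) return 0;
    size_t best = 0;
    float best_val = ref_absf(x[0]);
    for (size_t i = 1; i < n; i++) {
        float v = ref_absf(x[i * incx]);
        if (v < best_val) { best_val = v; best = i; }
    }
    return best;
}

/* Index of the first element with the largest absolute value. */
static size_t ref_isamax(size_t n, const float *x, size_t incx) {
    if (n == 0) return 0;
    size_t best = 0;
    float best_val = ref_absf(x[0]);
    for (size_t i = 1; i < n; i++) {
        float v = ref_absf(x[i * incx]);
        if (v > best_val) { best_val = v; best = i; }
    }
    return best;
}
```

A regression check for this class of bug only needs an input whose minimum and maximum sit at different positions, so that an IMIN that silently returns the IMAX result is caught.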
|
|
|
|
|
|
|
|
|
|
|
POWER: |
|
|
|
|
|
* single precision BLAS1/2 functions have received optimized POWER8 kernels |
|
|
|
|
|
* POWER9 is now a separate target, with an optimized DGEMM/DTRMM kernel |
|
|
|
|
|
* building on PPC970 systems under OSX Leopard or Tiger is now supported |
|
|
|
|
|
* out-of-bounds memory accesses in the gemm_beta microkernels were fixed |
|
|
|
|
|
* building a shared library on AIX is now supported for POWER6 |
|
|
|
|
|
* DYNAMIC_ARCH support has been added for POWER6 and newer |
|
|
|
|
|
|
|
|
|
|
|
ARMv7: |
|
|
|
|
|
* corrected xDOT behaviour with zero INC_X or INC_Y |
|
|
|
|
|
* fixed a bug in the IMIN implementation that made it return the result of IMAX |
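
For the xDOT item above, a minimal sketch of what a zero increment means, assuming the intended behaviour follows the usual reference loop: the index for that vector never advances, so its first element is reused n times. The name ref_ddot is hypothetical and the code is not the ARMv7 kernel.

```c
/* Hypothetical reference-style dot product with signed increments. */
static double ref_ddot(int n, const double *x, int incx,
                       const double *y, int incy) {
    double acc = 0.0;
    if (n <= 0) return acc;
    /* Negative increments start from the far end, as in reference BLAS. */
    long ix = (incx < 0) ? (long)(1 - n) * incx : 0;
    long iy = (incy < 0) ? (long)(1 - n) * incy : 0;
    for (int i = 0; i < n; i++) {
        acc += x[ix] * y[iy];
        ix += incx;  /* a zero increment leaves the index unchanged */
        iy += incy;
    }
    return acc;
}
```

With incx = 0 the result is x[0] multiplied by the sum of the selected y elements, which makes a convenient test case for optimized kernels that assume non-zero strides.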
|
|
|
|
|
|
|
|
|
|
|
ARMv8: |
|
|
|
|
|
* added support for HiSilicon TSV110 cpus |
|
|
|
|
|
* the CMAKE build system now recognizes 32bit userspace on 64bit hardware |
|
|
|
|
|
* cross-compilation with CMAKE now works again |
|
|
|
|
|
* fixed a bug in the IMIN implementation that made it return the result of IMAX |
|
|
|
|
|
* ARMV8 builds with the BINARY=32 option are now automatically handled as ARMV7 |
|
|
|
|
|
|
|
|
|
|
|
IBM Z: |
|
|
|
|
|
* optimized microkernels for single precision BLAS1/2 functions have been added |
|
|
|
|
|
for both Z13 and Z14 |
|
|
|
|
|
|
|
|
==================================================================== |
|
|
|
|
Version 0.3.5 |
|
|
|
|
31-Dec-2018 |
|
|
|
|