Martin Kroeker
25427926bc
Improve handling of NO_STATIC and NO_SHARED
to avoid surprises from defining either as zero. Fixes #2035 by addressing some concerns from #1422
6 years ago
Martin Kroeker
edb8143141
Merge pull request #2037 from martin-frbg/issue2033-2
Make sure that AVX512 is disabled in 32bit builds
6 years ago
Martin Kroeker
c4868d11c0
Make sure that AVX512 is disabled in 32bit builds
for #2033
6 years ago
Martin Kroeker
4c321ae571
Merge pull request #2034 from martin-frbg/issue2033
Make x86_32 imply NO_AVX2, NO_AVX512 in addition to NO_AVX
6 years ago
Martin Kroeker
2ffb727187
Keep xcode8.3 for osx BINARY=32 build
as xcode10 deprecated i386
6 years ago
Martin Kroeker
d66214c946
Make x86_32 imply NO_AVX2, NO_AVX512 in addition to NO_AVX
fixes #2033
6 years ago
Martin Kroeker
fd34820b99
Fix AVX512 test always returning false due to missing compiler option
6 years ago
Martin Kroeker
918a0cc4d1
Fix missing -c option in AVX512 test
6 years ago
Martin Kroeker
0db9c03e7e
Merge pull request #2028 from brada4/mv
Move one of clobber fixes to right place
6 years ago
Andrew
6eee1beac5
move fix to right place
6 years ago
Andrew
e5df5958cc
init
6 years ago
Martin Kroeker
343b301d14
Reduce list of kernels in the dynamic arch build
to make compilation complete reliably within the 1h limit again
6 years ago
Martin Kroeker
45333d5793
Fix error introduced during cleanup
6 years ago
Martin Kroeker
e29b0cfcc4
Allow multithreading TRMV again
revert workaround introduced for issue #1332 as the actual cause appears to be my incorrect fix from #1262 (see #1388)
6 years ago
Martin Kroeker
78d9910236
Correct range_n limiting
same bug as seen in #1388, somehow missed in corresponding PR #1389
6 years ago
Martin Kroeker
e12cdf58ef
Merge pull request #2024 from martin-frbg/gcc9fixes4
Fix inline assembly constraints in Bulldozer TRSM kernels
6 years ago
Martin Kroeker
1860c9456d
Merge pull request #2023 from martin-frbg/gcc9fixes3
Fix inline assembly constraints in various x86_64 GEMVN kernels
6 years ago
Martin Kroeker
aec905498f
Merge pull request #1988 from TiborGY/patch-1
Reword/expand comments in Makefile.rule
6 years ago
TiborGY
56089991e2
fix the the
6 years ago
Martin Kroeker
f9bb76d29a
Fix inline assembly constraints in Bulldozer TRSM kernels
rework indices to allow marking i, as and bs as both input and output (marked operand n1 as well for simplicity). For #2009
6 years ago
Martin Kroeker
8242b1fe3f
Fix inline assembly constraints
6 years ago
Martin Kroeker
efb9038f72
Fix inline assembly constraints
6 years ago
Martin Kroeker
e976557d29
Fix inline assembly constraints
rework indices to allow marking argument lda as input and output.
6 years ago
Martin Kroeker
9d8be15789
Fix inline assembly constraints
rework indices to allow marking argument lda4 as input and output. For #2009
6 years ago
Martin Kroeker
d752799a0f
Merge pull request #2021 from martin-frbg/gcc9fixes2
Fix wrong constraints in inline assembly of Haswell DTRSM kernel
6 years ago
TiborGY
f209fc7fa9
Update Makefile.rule
add note about NUM_THREADS for package maintainers, add examples of programs that cause affinity troubles
6 years ago
Martin Kroeker
c26c0b77a7
Fix wrong constraints in inline assembly
for #2009
6 years ago
Martin Kroeker
1c6da2d03c
Merge pull request #2019 from martin-frbg/gcc9fixes
Fix unannounced modification of input operand 8 (lda4) in Haswell GEMVN microkernel
6 years ago
Martin Kroeker
4255a58cd2
Rename operands to put lda on the input/output constraint list
6 years ago
Martin Kroeker
d3e4725548
Merge pull request #2020 from martin-frbg/issue1956
With the Intel compiler on Linux, prefer ifort for the final link step
6 years ago
Martin Kroeker
adb419ed67
With the Intel compiler on Linux, prefer ifort for the final link step
icc has known problems with mixed-language builds that ifort can handle just fine. Fixes #1956
6 years ago
Martin Kroeker
46e415b140
Save and restore input argument 8 (lda4)
Fixes miscompilation with gcc9 -ftree-vectorize (related to issue #2009)
6 years ago
Martin Kroeker
cd5a59b9cf
Merge pull request #2018 from bartoldeman/fix-dgemv-znver1-tree-vectorize
dgemv_kernel_4x4(Haswell): add missing clobbers for xmm0,xmm1,xmm2,xmm3
6 years ago
Bart Oldeman
69a97ca7b9
dgemv_kernel_4x4(Haswell): add missing clobbers for xmm0,xmm1,xmm2,xmm3
This fixes a crash in dblat2 when OpenBLAS is compiled using
-march=znver1 -ftree-vectorize -O2
See also:
https://github.com/easybuilders/easybuild-easyconfigs/issues/7180
6 years ago
Martin Kroeker
b55c586fac
Fix missing clobber in x86/x86_64 blas_quickdivide inline assembly function (#2017)
* Fix missing clobber in blas_quickdivide assembly
6 years ago
Martin Kroeker
056917d616
Merge pull request #2013 from martin-frbg/issue2011
Fix invalid memory access in PPC gemm_beta
6 years ago
Martin Kroeker
718efcec6f
Fix out-of-bounds memory access in gemm_beta
Fixes #2011 (as suggested by davemq), assuming typo by K.Goto
6 years ago
Martin Kroeker
f9d67bb5e8
Fix out-of-bounds memory access in gemm_beta
Fixes #2011 (as suggested by davemq) presuming typo by K.Goto
6 years ago
Martin Kroeker
76bb74fcd4
Merge pull request #2012 from maamountki/z14
[ZARCH] Many improvements
6 years ago
maamountki
0a54c98b9d
[ZARCH] Modify constraints
6 years ago
maamountki
bec54ae366
[ZARCH] Fix caxpy
6 years ago
Martin Kroeker
63d7bad8a5
Merge pull request #2010 from martin-frbg/issue2009
Fix declaration of input arguments in x86_64 GEMV, SYMV and DSCAL
6 years ago
Martin Kroeker
ab1630f9fa
Fix declaration of arguments in inline assembly
Argument 0 is modified so should be input and output
6 years ago
Martin Kroeker
b824fa70eb
Fix declaration of assembly arguments in SSYMV and DSYMV microkernels
Arguments 0 and 1 are both input and output
6 years ago
Martin Kroeker
91481a3e4e
Fix declaration of input arguments in inline assembly
Argument 0 is modified as it doubles as a counter
6 years ago
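The constraint fixes in this series share one idea: an operand that the assembly modifies (a counter, an advancing pointer) must be declared as read-write, otherwise the compiler may assume the register still holds its original value. The following is a minimal sketch of that idea, not code from any OpenBLAS kernel; `sum_longs` and its operand names are hypothetical, and the assembly path assumes GCC or Clang on x86_64 (a plain C loop is used elsewhere).

```c
/* Hypothetical illustration, not an OpenBLAS kernel. */
static long sum_longs(const long *x, long n) {
    long acc = 0;
#if defined(__x86_64__) && defined(__GNUC__)
    if (n <= 0) return 0;
    __asm__ __volatile__(
        "1:\n\t"
        "addq (%[p]), %[acc]\n\t"   /* acc += *p              */
        "addq $8, %[p]\n\t"         /* advance the pointer    */
        "decq %[n]\n\t"             /* n doubles as a counter */
        "jnz 1b\n\t"
        /* All three operands are changed by the loop, so each is
           declared "+r" (input and output) rather than input-only. */
        : [acc] "+r"(acc), [p] "+r"(x), [n] "+r"(n)
        :
        : "cc", "memory");
#else
    /* Portable fallback for other compilers and architectures. */
    for (long i = 0; i < n; i++) acc += x[i];
#endif
    return acc;
}
```

Had `p` or `n` been listed as input-only operands, the compiler would be entitled to reuse their registers after the statement as if they were unchanged, which is the kind of miscompilation reported with gcc9 and -ftree-vectorize.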
Martin Kroeker
dc6ac9eab0
Fix declaration of input arguments in the x86_64 s/dGEMV_T and s/dGEMV_N kernels
Arguments 0 and 1 need to be tagged as both input and output
6 years ago
maamountki
f583674109
[ZARCH] Fix cgemv_t_4
6 years ago
maamountki
77fe70019f
[ZARCH] Fix constraints and source code formatting
6 years ago
Martin Kroeker
03a2bf2602
Fix potential memory leak in cpu enumeration on Linux (#2008)
* Fix potential memory leak in cpu enumeration with glibc
An early return after a failed call to sched_getaffinity would leak the previously allocated cpu_set_t. Wrong calculation of the size argument in that call increased the likelihood of that failure. Fixes #2003
6 years ago
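The pattern described in this commit can be sketched as follows. This is an illustration under stated assumptions, not the OpenBLAS code: `count_affinity_cpus` and its `max_cpus` parameter are hypothetical names, and the sketch assumes Linux with glibc's dynamically sized CPU set macros.

```c
#define _GNU_SOURCE
#include <sched.h>
#include <stddef.h>

/* Hypothetical helper: number of CPUs in the calling thread's
   affinity mask, or -1 on failure. */
static int count_affinity_cpus(int max_cpus) {
    cpu_set_t *set = CPU_ALLOC(max_cpus);
    if (set == NULL) return -1;

    /* The size argument is a byte count for the allocated set,
       obtained from CPU_ALLOC_SIZE, not the number of CPUs. */
    size_t size = CPU_ALLOC_SIZE(max_cpus);

    if (sched_getaffinity(0, size, set) != 0) {
        CPU_FREE(set);   /* release the set on the error path as well */
        return -1;
    }

    int n = CPU_COUNT_S(size, set);
    CPU_FREE(set);
    return n;
}
```

Every exit after the allocation frees the set, and the byte size is computed once and reused for both the system call and the count.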
Martin Kroeker
69edc5bbe7
Restore dropped patches in the non-TLS branch of memory.c (#2004)
* Restore dropped patches in the non-TLS branch of memory.c
As discovered in #2002, the reintroduction of the "original" non-TLS version of memory.c as an alternate branch had inadvertently used ba1f91f rather than a8002e2, thereby dropping the commits for #1450, #1468, #1501, #1504 and #1520.
6 years ago