Martin Kroeker
|
7ed8431527
|
Disable the SkyLakeX DGEMMITCOPY kernel as well
as a stopgap measure for https://github.com/numpy/numpy/issues/13401 as mentioned in #1955
|
6 years ago |
Martin Kroeker
|
3f427c0cf9
|
Merge pull request #2107 from quickwritereader/develop
sgemm/strmm kernel for power9
|
6 years ago |
AbdelRauf
|
47f892198c
|
conflict resolve
|
6 years ago |
AbdelRauf
|
628b335e83
|
Merge branch 'develop' of https://github.com/quickwritereader/OpenBLAS into develop
|
6 years ago |
AbdelRauf
|
0f105dd8a5
|
sgemm/strmm
|
6 years ago |
Martin Kroeker
|
ccfb7ead15
|
Merge pull request #2072 from martin-frbg/sum
Add (C)BLAS extension ?sum
|
6 years ago |
Rashmica Gupta
|
bcdf1d4917
|
Add in runtime CPU detection for POWER.
|
6 years ago |
Martin Kroeker
|
c04a729081
|
Add ?sum definitions for generic kernel
|
6 years ago |
Martin Kroeker
|
100d94f94e
|
Add ?sum
|
6 years ago |
Martin Kroeker
|
246ca29679
|
Add ZARCH implementation of ?sum
as trivial copies of the respective ?asum kernels with the ABS and vflpsb calls removed
|
6 years ago |
Martin Kroeker
|
9d717cb5ee
|
Add x86_64 implementation of ?sum
as trivial copy of ?asum with the fabs calls removed
|
6 years ago |
Martin Kroeker
|
e3bc83f2a8
|
Add x86 implementation of ?sum
as trivial copy of ?asum with the fabs calls removed
|
6 years ago |
Martin Kroeker
|
70f2a4e0d7
|
Add SPARC implementation of ?sum
as trivial copy of ?asum with the fabs replaced by fmov to preserve code structure
|
6 years ago |
Martin Kroeker
|
706dfe263b
|
Add POWER implementation of ?sum
as trivial copy of ?asum with the fabs replaced by fmr to preserve code structure
|
6 years ago |
Martin Kroeker
|
688fa9201c
|
Add MIPS64 implementation of ?sum
as trivial copy of ?asum with the fabs replaced by mov to preserve code structure
|
6 years ago |
Martin Kroeker
|
cdbe0f0235
|
Add MIPS implementation of ?sum
as trivial copy of ?asum with the fabs calls removed
|
6 years ago |
Martin Kroeker
|
f8b82bc6dc
|
Add ia64 implementation of ?sum
as trivial copy of ?asum with the fabs calls removed
|
6 years ago |
Martin Kroeker
|
3e3ccb9011
|
Add ARM64 implementations of ?sum
as trivial copies of the respective ?asum kernels with the fabs calls removed
|
6 years ago |
Martin Kroeker
|
94ab4e6fb2
|
Add ARM implementations of ?sum
(trivial copies of the respective ?asum with the fabs calls removed)
|
6 years ago |
Martin Kroeker
|
c3cfc6986b
|
Add implementations of ssum/dsum and csum/zsum
as trivial copies of asum/zsasum with the fabs calls replaced by fmov to preserve code structure
|
6 years ago |
Martin Kroeker
|
b9f4943a14
|
Add ?sum
|
6 years ago |
Martin Kroeker
|
32c7063cb0
|
Merge pull request #2061 from martin-frbg/martin-frbg-patch-1
Disable the AVX512 DGEMM kernel (again)
|
6 years ago |
Martin Kroeker
|
7c51cc8527
|
Merge branch 'develop' into develop
|
6 years ago |
AbdelRauf
|
853a18bc17
|
POWER9 makefile. DGEMM based on the POWER8 kernel with the following changes: 32x unrolled 16x4 kernel and 8x4 kernel using lxv, stxv and butterfly rank-1 update. Improvement from 17 to 22-23 GFLOPS. DTRMM cases were added into DGEMM itself.
|
6 years ago |
Martin Kroeker
|
e608d4f7fe
|
Disable the AVX512 DGEMM kernel (again)
Due to as yet unresolved errors seen in #1955 and #2029
|
6 years ago |
Martin Kroeker
|
03d7110900
|
Merge pull request #2042 from maomao194313/develop
add TARGET support for HiSilicon tsv110 CPUs
|
6 years ago |
Martin Kroeker
|
f18ab6c17b
|
Merge pull request #2051 from martin-frbg/issue2048
Make TARGET=GENERIC compatible with DYNAMIC_ARCH=1
|
6 years ago |
Martin Kroeker
|
5b95534afc
|
Make TARGET=GENERIC compatible with DYNAMIC_ARCH=1
for issue #2048
|
6 years ago |
Celelibi
|
b7f59da42d
|
Fix crash in sgemm SSE/nano kernel on x86_64
Fix bug #2047.
Signed-off-by: Celelibi <celelibi@gmail.com>
|
6 years ago |
maomao194313
|
783ba8058f
|
Add HiSilicon tsv110 CPUs optimization branch
|
6 years ago |
Andrew
|
6eee1beac5
|
move fix to right place
|
6 years ago |
Martin Kroeker
|
e12cdf58ef
|
Merge pull request #2024 from martin-frbg/gcc9fixes4
Fix inline assembly constraints in Bulldozer TRSM kernels
|
6 years ago |
Martin Kroeker
|
1860c9456d
|
Merge pull request #2023 from martin-frbg/gcc9fixes3
Fix inline assembly constraints in various x86_64 GEMVN kernels
|
6 years ago |
Martin Kroeker
|
f9bb76d29a
|
Fix inline assembly constraints in Bulldozer TRSM kernels
Rework indices to allow marking i, as and bs as both input and output (marked operand n1 as well for simplicity). For #2009
|
6 years ago |
Martin Kroeker
|
efb9038f72
|
Fix inline assembly constraints
|
6 years ago |
Martin Kroeker
|
e976557d29
|
Fix inline assembly constraints
rework indices to allow marking argument lda as input and output.
|
6 years ago |
Martin Kroeker
|
9d8be15789
|
Fix inline assembly constraints
rework indices to allow marking argument lda4 as input and output. For #2009
|
6 years ago |
Martin Kroeker
|
d752799a0f
|
Merge pull request #2021 from martin-frbg/gcc9fixes2
Fix wrong constraints in inline assembly of Haswell DTRSM kernel
|
6 years ago |
Martin Kroeker
|
c26c0b77a7
|
Fix wrong constraints in inline assembly
for #2009
|
6 years ago |
Martin Kroeker
|
1c6da2d03c
|
Merge pull request #2019 from martin-frbg/gcc9fixes
Fix unannounced modification of input operand 8 (lda4) in Haswell GEMVN microkernel
|
6 years ago |
Martin Kroeker
|
4255a58cd2
|
Rename operands to put lda on the input/output constraint list
|
6 years ago |
Martin Kroeker
|
46e415b140
|
Save and restore input argument 8 (lda4)
Fixes miscompilation with gcc9 -ftree-vectorize (related to issue #2009)
|
6 years ago |
Bart Oldeman
|
69a97ca7b9
|
dgemv_kernel_4x4(Haswell): add missing clobbers for xmm0,xmm1,xmm2,xmm3
This fixes a crash in dblat2 when OpenBLAS is compiled using
-march=znver1 -ftree-vectorize -O2
See also:
https://github.com/easybuilders/easybuild-easyconfigs/issues/7180
|
6 years ago |
Martin Kroeker
|
056917d616
|
Merge pull request #2013 from martin-frbg/issue2011
Fix invalid memory access in PPC gemm_beta
|
6 years ago |
Martin Kroeker
|
718efcec6f
|
Fix out-of-bounds memory access in gemm_beta
Fixes #2011 (as suggested by davemq), assuming typo by K.Goto
|
6 years ago |
Martin Kroeker
|
f9d67bb5e8
|
Fix out-of-bounds memory access in gemm_beta
Fixes #2011 (as suggested by davemq) presuming typo by K.Goto
|
6 years ago |
Martin Kroeker
|
76bb74fcd4
|
Merge pull request #2012 from maamountki/z14
[ZARCH] Many improvements
|
6 years ago |
maamountki
|
0a54c98b9d
|
[ZARCH] Modify constraints
|
6 years ago |
maamountki
|
bec54ae366
|
[ZARCH] Fix caxpy
|
6 years ago |
Martin Kroeker
|
ab1630f9fa
|
Fix declaration of arguments in inline assembly
Argument 0 is modified so should be input and output
|
6 years ago |
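
Several entries above describe the new ?sum kernels as trivial copies of the corresponding ?asum kernels with the fabs calls removed. A minimal C sketch of that relationship for the real double-precision case follows. The function names `sketch_dasum` and `sketch_dsum` are hypothetical and chosen for illustration only. They are not the kernels from these commits, which are largely written in per-architecture assembly, and the sketch assumes a non-negative stride.

```c
#include <math.h>
#include <stddef.h>

/* Hypothetical reference routine: sum of absolute values of a strided
 * vector, i.e. what an ?asum-style kernel computes for real data. */
double sketch_dasum(size_t n, const double *x, size_t incx)
{
    double acc = 0.0;
    for (size_t i = 0; i < n; i++)
        acc += fabs(x[i * incx]);
    return acc;
}

/* Hypothetical reference routine: the same loop with the fabs call
 * removed, giving the plain sum of the elements (an ?sum-style kernel). */
double sketch_dsum(size_t n, const double *x, size_t incx)
{
    double acc = 0.0;
    for (size_t i = 0; i < n; i++)
        acc += x[i * incx];
    return acc;
}
```

The two loops differ only in the fabs call. That is consistent with the commit messages noting that some ports replace fabs with a register move (fmov, fmr, mov) rather than deleting the instruction, so the code structure of the original kernel is preserved.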