Martin Kroeker
e7c4d6705a
Revert #2051 and replace with a better fix (#2261)
* Revert #2051 and add a better fix for TARGET=generic with DYNAMIC_ARCH
fixes #2257 without breaking #2048 again
6 years ago
Martin Kroeker
f3c314550c
Merge pull request #2243 from quickwritereader/develop
possible cgemv,caxpy,cdot fix
6 years ago
AbdelRauf
847c20c9b7
fix uninitialized variables i
6 years ago
AbdelRauf
4c22828812
caxpy and cdot are using vec_vsx_ld
6 years ago
AbdelRauf
e79712d969
cgemv using vec_vsx_ld instead of letting gcc decide
6 years ago
AbdelRauf
be09551cdf
aligned
6 years ago
Martin Kroeker
11c59acfb1
Keep both PGI/SUN and default code paths to avoid breaking Clang/Windows
6 years ago
Martin Kroeker
3a55dca2dc
Make x86_64 zdot compile with PGI and Sun C again
broken by #2222, as CREAL and CIMAG do not expand to a valid lvalue with these compilers
6 years ago
Kavana Bhat
3dc6b26eff
AIX changes for Power8
6 years ago
Martin Kroeker
9ef96b32a6
Add multithreading support to the x86_64 zdot kernel (#2222)
* Add multithreading support
copied from the ThunderX2T99 kernel. For #2221
6 years ago
Martin Kroeker
103b32fdb7
Merge pull request #2216 from martin-frbg/issue2214
Remove case-sensitivity in x86 LSAME on (AMD) cpus without CMOV
6 years ago
Martin Kroeker
aef9804089
Fix unwanted case-sensitivity in x86 LSAME for (AMD) processors without CMOV
Problem was already noticed some years ago in #238, but back then the problem was only corrected in one of the #ifdef branches.
Fixes #2214
6 years ago
Martin Kroeker
dccff2e785
Merge pull request #2206 from martin-frbg/zen-dtrmm
Replace vpermpd with vpermilpd in the Haswell DTRMM kernel
6 years ago
Martin Kroeker
5c3458a6e7
Merge pull request #2199 from martin-frbg/zen-dtrsm
Replace most vpermpd calls in the Haswell DTRSM_RN kernel
6 years ago
Martin Kroeker
acf6002ab2
Replace most vpermpd calls in the Haswell DTRSM_RN kernel
6 years ago
Martin Kroeker
2dfb804cb9
Replace vpermpd with vpermilpd in the Haswell DTRMM kernel
to improve performance on AMD Zen (#2180), applying wjc404's improvement of the DGEMM kernel from #2186
6 years ago
Martin Kroeker
4c153ec9da
Merge pull request #2196 from wjc404/develop
Add vbroadcastsd kernel to dgemm_kernel_4x8_haswell.S
6 years ago
wjc404
7eecd8e39c
Add files via upload
6 years ago
Martin Kroeker
7b0b7c11d2
Merge pull request #2190 from martin-frbg/zdot-zen
Replace vpermpd with vpermilpd in the Haswell/Zen zdot microkernel
6 years ago
Martin Kroeker
28e96458e5
Replace vpermpd with vpermilpd
to improve performance on Zen/Zen2 (as demonstrated by wjc404 in #2180 )
6 years ago
wjc404
95fb98f556
Update dgemm_kernel_4x8_haswell.S
6 years ago
wjc404
4801c6d36b
Update dgemm_kernel_4x8_haswell.S
6 years ago
wjc404
9440fa607d
Add files via upload
6 years ago
wjc404
94db259e5b
Add files via upload
6 years ago
wjc404
f49f8047ac
Add files via upload
6 years ago
wjc404
825777faab
Update dgemm_kernel_4x8_haswell.S
6 years ago
wjc404
9c89757562
Add files via upload
6 years ago
wjc404
9b04baeaee
Update dgemm_kernel_4x8_haswell.S
6 years ago
wjc404
8a074b3965
Update dgemm_kernel_4x8_haswell.S
6 years ago
wjc404
211ab03b14
Update dgemm_kernel_4x8_haswell.S
6 years ago
wjc404
1733f927e6
Update dgemm_kernel_4x8_haswell.S
6 years ago
wjc404
182b06d6ad
Update dgemm_kernel_4x8_haswell.S
6 years ago
wjc404
7a9050d681
Update dgemm_kernel_4x8_haswell.S
6 years ago
wjc404
0ba29fd262
Update dgemm_kernel_4x8_haswell.S for zen2
replaced a bunch of vpermpd instructions with vpermilpd and vperm2f128
6 years ago
Martin Kroeker
6b6c9b1441
Merge pull request #2172 from quickwritereader/develop
power9 cgemm/ctrmm. new sgemm 8x16
6 years ago
AbdelRauf
a97b301aaa
cgemm/ctrmm power9
6 years ago
Piotr Kubaj
eebfeba768
Fix build on FreeBSD/powerpc64.
Signed-off-by: Piotr Kubaj <pkubaj@anongoth.pl>
6 years ago
kavanabhat
a575f1e4c7
Update dtrmm_kernel_16x4_power8.S
6 years ago
AbdelRauf
cdbfb891da
new sgemm 8x16
6 years ago
Martin Kroeker
a17cf36225
Merge pull request #2153 from quickwritereader/develop
improved power9 zgemm,sgemm
6 years ago
AbdelRauf
148c4cc5fd
conflict resolve
6 years ago
AbdelRauf
d0c3543c3f
power9 zgemm ztrmm optimized
6 years ago
AbdelRauf
a469b32cf4
sgemm pipeline improved, zgemm rewritten without inner packs, ABI lxvx v20 fixed with vs52
6 years ago
AbdelRauf
8fe794f059
improved zgemm power9 based on power8
6 years ago
Martin Kroeker
74c10b57c6
Use generic kernels for complex (I)AMAX to support softfp
6 years ago
Martin Kroeker
c5495d2056
Ensure correct output for DAMAX with softfp
6 years ago
Martin Kroeker
c70496b108
Separate implementations of AMAX and IAMAX on arm
As noted in #1912 and in a comment on #1942, the combined implementation happens to "do the right thing" on hardfp, but cannot return both value and index on softfp, where they would have to share the return register
6 years ago
Martin Kroeker
9ea30f3788
Replace ISMIN and ISAMIN kernels on all x86_64 platforms (#2125)
* Mark iamax_sse.S as unsuitable for MIN due to issue #2116
* Use iamax.S rather than iamax_sse.S for ISMIN/ISAMIN on all x86_64 as workaround for #2116
6 years ago
Martin Kroeker
6a8b4269b5
Merge pull request #2111 from martin-frbg/issue1955
Disable the SkyLakeX DGEMMIxCOPY kernels as well
6 years ago
Martin Kroeker
b1561ecc68
Disable DGEMMINCOPY as well for now
#1955
6 years ago