tingbo.liao
3c8df6358f
Further rearranged the rotm kernel for the different architectures.
Signed-off-by: tingbo.liao <tingbo.liao@starfivetech.com>
8 months ago
Martin Kroeker
24acdd6bbb
correct offset
1 year ago
Martin Kroeker
dd6c33d34d
make NAN handling depend on dummy2 parameter
1 year ago
Martin Kroeker
9e24121e7e
temporarily(?) disable da=0 shortcut to handle x=Inf or NAN
1 year ago
Martin Kroeker
6ffaf99817
disable da=0 shortcut to handle NAN and INF correctly
1 year ago
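A minimal sketch of why a da=0 shortcut conflicts with NaN/Inf handling, assuming the commits above concern a scaling routine of the form x := da*x (the log does not name the file). The function names `scal_shortcut` and `scal_ieee` are hypothetical and are not the project's kernels. Under IEEE 754 arithmetic, 0 * NaN and 0 * Inf are both NaN, so writing zeros directly hides invalid input.

```c
#include <assert.h>
#include <math.h>
#include <stddef.h>

/* Hypothetical: scale with the da == 0 shortcut, writing zeros directly. */
static void scal_shortcut(size_t n, double da, double *x) {
    assert(x != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        x[i] = (da == 0.0) ? 0.0 : da * x[i];
}

/* Hypothetical: always multiply, so NaN and Inf inputs propagate as NaN. */
static void scal_ieee(size_t n, double da, double *x) {
    assert(x != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        x[i] = da * x[i];
}
```

With da = 0 and x = {1.0, NAN, INFINITY}, the shortcut variant yields all zeros, while the multiplying variant yields 0 followed by two NaNs. This assumes the code is compiled without value-unsafe options such as -ffast-math.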
Martin Kroeker
1c31f56e5a
Handle NAN
1 year ago
Martin Kroeker
7ee1ee38e2
Handle NaN in input
1 year ago
Martin Kroeker
07e32c4cb8
Allow negative INCX (API change from version 3.10 of the reference implementation)
2 years ago
Wangyang Guo
3dc6052c7e
initial support for Sapphire Rapids platform
4 years ago
Chen, Guobing
e740c4873d
Enable COOPERLAKE build target
Enable new build target platform -- COOPERLAKE. This target platform
supports all the SKYLAKEX supported ISAs + avx512bf16. So all the
SKYLAKEX specific kernels/drivers and related code are now extended
to be also active on COOPERLAKE. Besides, new BF16 related kernels
are active under this target.
5 years ago
Martin Kroeker
aef9804089
Fix unwanted case-sensitivity in x86 LSAME for (AMD) processors without CMOV
The problem was already noticed some years ago in #238, but back then it was only corrected in one of the #ifdef branches.
Fixes #2214
6 years ago
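For illustration, a case-insensitive character comparison of the kind LSAME is expected to perform can be sketched in C as below. The x86 kernel itself is not shown in this log, so this is only an assumption about the intended behavior, and `lsame_sketch` is a hypothetical name.

```c
#include <assert.h>
#include <ctype.h>

/* Hypothetical sketch: return 1 if ca and cb are the same letter,
 * ignoring case, and 0 otherwise. */
static int lsame_sketch(char ca, char cb) {
    /* Cast to unsigned char first: toupper() is undefined for negative
     * values other than EOF. */
    return toupper((unsigned char)ca) == toupper((unsigned char)cb);
}
```

A caller passing 'n' where 'N' is expected should get a match from such a routine; a variant that compares the raw bytes would not.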
Martin Kroeker
100d94f94e
Add ?sum
6 years ago
Martin Kroeker
e3bc83f2a8
Add x86 implementation of ?sum
as trivial copy of ?asum with the fabs calls removed
6 years ago
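The relationship described in that commit message can be sketched in plain C: the two routines differ only in whether the absolute value is taken before accumulating. The names `asum_sketch` and `sum_sketch` are hypothetical, and the real x86 kernels are not reproduced here; the sketch also assumes a positive stride.

```c
#include <assert.h>
#include <math.h>
#include <stddef.h>

/* Hypothetical sketch of ?asum: sum of absolute values, stride incx > 0. */
static double asum_sketch(size_t n, const double *x, size_t incx) {
    assert(incx > 0);
    double acc = 0.0;
    for (size_t i = 0; i < n; i++)
        acc += fabs(x[i * incx]);
    return acc;
}

/* Hypothetical sketch of ?sum: the same loop with the fabs call removed. */
static double sum_sketch(size_t n, const double *x, size_t incx) {
    assert(incx > 0);
    double acc = 0.0;
    for (size_t i = 0; i < n; i++)
        acc += x[i * incx];
    return acc;
}
```

For x = {1, -2, 3}, the first returns 6 and the second returns 2.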
Martin Kroeker
0023515733
Typo fix (misplaced parenthesis)
7 years ago
Arjan van de Ven
99c7bba8e4
Initial support for SkylakeX / AVX512
This patch adds the basic infrastructure for adding the SkylakeX (Intel Skylake server)
target. The SkylakeX target will use the AVX512 (AVX512VL level) instruction set,
which brings 2 basic things:
1) 512 bit wide SIMD (2x width of AVX2)
2) 32 SIMD registers (2x the number on AVX2)
This initial patch only contains a trivial transformation of the Haswell SGEMM kernel
to AVX512VL; more will follow later but this patch aims to get the infrastructure
in place for this "later".
Full performance tuning has not been done yet; with more registers and wider SIMD
it's in theory possible to retune the kernels but even without that there's an
interesting enough performance increase (30-40% range) with just this change.
7 years ago
Martin Kroeker
7df8c4f76f
typo fix
7 years ago
Martin Kroeker
2fc748bf72
Restore optimized swap kernel now that we have a proper fix
7 years ago
Martin Kroeker
d1b7be14aa
Handle INCX=0,INCY=0 case
Fixes #1575 (sswap/dswap failing the swap utest on x86) as suggested by atsampson.
7 years ago
Martin Kroeker
28ac9ea5a6
Use generic/dot.c instead of the inferior arm/dot.c for x86 DSDOT
to resolve dsdot utest failure seen in #1492
7 years ago
Martin Kroeker
e7366a4161
Restore the remaining utests ( #1462 )
* Restore the remaining utests
* Try fork test on Cygwin and Linux only, it hangs on at least ARMv8/Android as well
* Use generic sswap/dswap kernels for NEHALEM 32bit to fix fault found by the restored swap utest
* Disable zdotu test for MS cl to work around runtime error -1073741819 on AppVeyor for now
(probably coding error in the initialization of the complex numbers or wrong choice of zdotu API)
7 years ago
Denis Steckelmacher
c9ff735da6
Add ZEN support (tested for auto-detected static backend)
8 years ago
Zhang Xianyi
53b6023a6c
Fix cmake bug on MSVC 32-bit.
10 years ago
Zhang Xianyi
7df0820160
Use C kernels for s/dgemv on x86.
10 years ago
Zhang Xianyi
1cf2b10224
Use pure C generic target on x86 and x86_64.
make TARGET=GENERIC
?gemm3m is unimplemented on generic target.
10 years ago
wernsaar
0884b73c69
Lapack-test Windows 32bit now error free
11 years ago
wernsaar
9bd9472ae9
Lapack-test: cleanup of x86 32bit KERNEL file
11 years ago
Timothy Gu
6c2ead30f0
Remove all trailing whitespace except lapack-netlib
Signed-off-by: Timothy Gu <timothygu99@gmail.com>
11 years ago
wernsaar
793509a3b5
replaced files for sdot, sgemv_n and sgemv_t for bug #348
11 years ago
wernsaar
9423f980f6
modified trsm kernel
12 years ago
wernsaar
c6156b2ef2
added trsm kernels from origin
12 years ago
wernsaar
6216ab8a7e
removed obsolete gemm_kernels from haswell branch
12 years ago
Zhang Xianyi
f51a849d91
Merge pull request #278 from wernsaar/haswell
Merge wernsaar's Haswell gemm kernels.
12 years ago
wernsaar
4070d9a123
added dgemm_kernel_16x2_haswell.S
12 years ago
wernsaar
0b90c0ec64
added sgemm_kernel_16x4_haswell.S
12 years ago
Zhang Xianyi
2638370844
Init code base for Intel Haswell.
12 years ago
Zhang Xianyi
886cbaf4e4
Support AMD Piledriver by bulldozer kernels.
12 years ago
Zhang Xianyi
fa916a0fac
Fixed #238 bug in lsame on x86.
12 years ago
wangqian
6a72840945
Fixed overflow internal buffer bug of (s/d/c/z)gemv on x86.
12 years ago
Zhang Xianyi
5c8bf6ae0e
Merge branch 'bulldozer' into develop
12 years ago
Zhang Xianyi
0b08f7479e
Refs #154 . Fixed gemv_t bug about overflow 16MB buffer on x86.
12 years ago
Zhang Xianyi
69200884e1
Refs #173 . Fixed overflow internal buffer bug of gemv_n on x86
12 years ago
Zhang Xianyi
0d1518add9
Refs #173 . Fixed overflow internal buffer bug of sgemv_t on x86
12 years ago
Zhang Xianyi
91ed4e4450
Refs #171 . Prevent loading the dirty number from the buffer in sgemv_t x86 kernel.
12 years ago
Zhang Xianyi
fd3046b32a
Refs #173 . Fixed overflow internal buffer bug of gemv_t on x86.
12 years ago
Zhang Xianyi
bfaaa975e6
Added BULLDOZER target. So far it uses barcelona kernels.
13 years ago
Zhang Xianyi
b7c0fa6bd2
Init AMD Bulldozer codebase.
13 years ago
Zhang Xianyi
2573311308
refs #140 . Fixed zdot incompatibility ABI issue with GCC 4.7 on Win 32.
GCC 4.7 uses MSVC ABI on Win 32. This means the caller pops the hidden pointer for returning
aggregate structures larger than 8 bytes.
13 years ago
Zhang Xianyi
d3b67d0bd8
Refs #113 . Fixed the typo BOBCATE -> BOBCAT
13 years ago
Zhang Xianyi
d6cab3f37e
Refs #113 . Support AMD Bobcate using Barcelona kernel codes. Replace 3DNow! with MMX.
13 years ago
Xianyi Zhang
a53c6e2440
Merge branch 'develop' into sandybridge
13 years ago