Martin Kroeker
d3555d2e50
Add workaround for LAPACK test failures with the NVIDIA HPC compiler
4 years ago
Martin Kroeker
251a09ec90
Typo fix
5 years ago
Martin Kroeker
95d37e1575
Regroup the 32 and 64bit sections and restore 64bit CAXPY
5 years ago
Martin Kroeker
f308e741b2
remove debug output and revert changes to cdot and crot
5 years ago
Martin Kroeker
f8c2697701
Use POWER6 GEMM, TRMM and DTRSM on 32bit POWER8
5 years ago
Rajalakshmi Srinivasaraghavan
bd9ff820bc
Fix cmake compilation issue - POWER9
This patch removes an extra space in the sgemmotcopy filename,
thereby allowing an entry for it to be created in the kernel/Makefile
generated by cmake.
5 years ago
Martin Kroeker
06208c8d01
Limit this fix to ELFv2 builds
5 years ago
Martin Kroeker
f5c4c28b98
Work around POWER8BE bugs on FreeBSD (ELFv2)
for #2299
5 years ago
Martin Kroeker
0b39cf95b0
Fix endianness conditionals
5 years ago
Martin Kroeker
9f39f0a2c3
Specify ismin/ismax assembly kernels for POWER8 directly
to fix utest failure in new ismin test - Makefile.L1 defaults look wrong
5 years ago
Martin Kroeker
d483e9270a
Update KERNEL.POWER8
5 years ago
Martin Kroeker
01834aee33
Merge pull request #29 from xianyi/develop
rebase
5 years ago
Martin Kroeker
d92bd5be24
Update KERNEL.POWER8
5 years ago
Martin Kroeker
46e4b12946
Update KERNEL.POWER8
5 years ago
Martin Kroeker
dc345d84df
Fix syntax of endianness conditional and add gcc version check for workaround
5 years ago
Martin Kroeker
cad0d150db
Define alternate kernels for big-endian POWER8
5 years ago
Martin Kroeker
673e5a0495
Replace several POWER8/9 C kernels with their gcc7-generated assembly versions (#2263)
* Add gcc7-generated assembly files for POWER8/9 isa/ica-min/max and POWER9 caxpy
To work around internal compiler errors encountered when compiling the original C source with gcc 4 and 5, and wrong code generated by gcc 8.3.0
* Use gcc-generated assembly instead of original C sources
to work around internal compiler errors encountered with gcc 4.8/5.4 and wrong code generation by gcc 8.3
* Use gcc-generated assembly instead of the original C source
to work around internal compiler errors encountered with gcc 4.8 and 5.4, and wrong code generation by gcc 8.3
* Add gcc7-generated assembler version of caxpy for power8
to work around wrong code generated by gcc 8.3
* Handle CONJ define for caxpyc
* Add gcc7-generated assembly cdot for POWER9
* Use prebuilt assembly for POWER9 cdot
created with gcc 7.3.1 to work around ICE in older gcc versions
* Exclude POWER9 from DYNAMIC_ARCH when the gcc version is lower than 6
* Update Makefile.system
* Use PROLOGUE macro to ensure correct function name for DYNAMIC_ARCH
* Disable POWER9 with old gcc versions
6 years ago
Rashmica Gupta
bcdf1d4917
Add in runtime CPU detection for POWER.
6 years ago
Ubuntu
4abc375a91
sgemv cgemv pairs
6 years ago
Ubuntu
8c3386be87
Added missing BLAS1 single fp {saxpy, caxpy, cdot, crot (refactored version of srot), isamax, isamin, icamax, icamin},
Fixed idamin, icamin choosing the first occurrence index of equal minima
6 years ago
QWR QWR
28ca97015d
power8: Added initial zgemv_(t|n), i(d|z)amax, i(d|z)amin, dgemv_t (transposed), zrot
z13: improved zgemv_(t|n)_4, zscal, zaxpy
7 years ago
martin
7a4b3cfbf8
Add trivially optimized DSDOT for POWER8
7 years ago
Zhang Xianyi
515bc56ea9
Refs #946. Use nrm2 reference implementation for Power8.
9 years ago
Zhang Xianyi
ae70b916f4
Refs #929. Deal with zero and NaNs for scale.
9 years ago
Werner Saar
8fb5a1aaff
added optimized dtrsm_LT kernel for POWER8
9 years ago
Werner Saar
56948dbf0f
optimized dgemm for POWER8
9 years ago
Werner Saar
0d0c6f7d7d
optimized dgemm for POWER8
9 years ago
Werner Saar
a3da10662f
added sgemm_tcopy_8_power8.S
9 years ago
Werner Saar
d46f07bb4e
added cgemm_tcopy_8_power8.S
9 years ago
Werner Saar
879a51165f
Optimized zgemm and tested zgemm again
9 years ago
Werner Saar
9276c9012f
Optimized sgemm and dgemm and tested again.
9 years ago
Werner Saar
3c6294ca3d
added optimized sgemm_tcopy for power8
9 years ago
Werner Saar
68a69c5b50
added optimized dgemv_n kernel for POWER8
9 years ago
Werner Saar
c2464a7c4a
added optimized casum kernel for POWER8
9 years ago
Werner Saar
294f933869
added optimized zasum kernel for POWER8
9 years ago
Werner Saar
f59c9bd6ef
added optimized sasum kernel for POWER8
9 years ago
Werner Saar
c53be46d78
added optimized dasum kernel for POWER8
9 years ago
Werner Saar
659ed16591
added optimized cswap and zswap kernels for POWER8
9 years ago
Werner Saar
35c98a3556
added optimized zscal kernel for POWER8
9 years ago
Werner Saar
f1a5dd06c5
added optimized sscal kernel for POWER8
9 years ago
Werner Saar
35f1f21a7f
added drot and srot kernels optimized for POWER8
9 years ago
Werner Saar
3d9a50e841
added optimized sswap kernel for POWER8
9 years ago
Werner Saar
828c849b44
added optimized ccopy kernel for POWER8
9 years ago
Werner Saar
ecc0bc9813
added optimized scopy kernel for POWER8
9 years ago
Werner Saar
12f209b7b0
added optimized zswap kernel for POWER8
9 years ago
Werner Saar
7316a87930
added optimized dswap kernel for POWER8
9 years ago
Werner Saar
0bff057a87
added optimized dcopy kernel for POWER8
9 years ago
Werner Saar
1e6cf9808c
added optimized dscal kernel for POWER8
9 years ago
Werner Saar
55eda3813b
added optimized zaxpy kernel for POWER8
9 years ago
Werner Saar
0664ba4c97
added optimized daxpy kernel for POWER8
9 years ago