Martin Kroeker
8e6d93359d
Merge pull request #4196 from TiborGY/obsolete_inlines
Modernize obsolete inline order
2 years ago
Ian McInerney
79c15db348
Fix power10 gcc intrinsic check
__builtin_vsx_assemble_pair was only in GCC 10-11.2 and was replaced by
__builtin_vsx_build_pair thereafter.
2 years ago
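
The version split described in this commit can be sketched as a compile-time check. This is a minimal illustration, not the actual OpenBLAS code: the helper `power10_pair_builtin_name` and the macro `PAIR_BUILTIN_NAME` are hypothetical names, and the version bounds only encode the "GCC 10-11.2" range stated in the commit message.

```c
/* Hypothetical helper: returns the name of the vector-pair builtin that a
 * given GCC version is expected to provide, following the commit message
 * (assemble_pair in GCC 10 through 11.2, build_pair afterwards).
 * Versions older than GCC 10 are not covered by the commit message; the
 * sketch lets them fall through to the newer name as an assumption. */
static const char *power10_pair_builtin_name(int major, int minor)
{
    if (major == 10 || (major == 11 && minor <= 2))
        return "__builtin_vsx_assemble_pair";
    return "__builtin_vsx_build_pair";
}

/* The same decision expressed with GCC's predefined version macros. */
#if defined(__GNUC__) && \
    (__GNUC__ == 10 || (__GNUC__ == 11 && __GNUC_MINOR__ <= 2))
#define PAIR_BUILTIN_NAME "__builtin_vsx_assemble_pair"
#else
#define PAIR_BUILTIN_NAME "__builtin_vsx_build_pair"
#endif
```

The function form exists only to make the version boundaries easy to check; a real kernel would select the builtin with the preprocessor branch.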
TGY
b5ba95a6c0
Modernize obsolete inline order
2 years ago
Martin Kroeker
54d3246fc6
Allow negative INCX (API change from version 3.10 of the reference implementation)
2 years ago
Manjul Mohan
58b88aa5f0
POWER10: Fix compiler warnings
This patch removes the warning messages related to unused variables in
sbgemm_kernel_power10.c.
Signed-off-by: Manjul Mohan <manjul@linux.vnet.ibm.com>
2 years ago
Martin Kroeker
1688c7da43
change line endings from CRLF to LF
2 years ago
Martin Kroeker
6c118b7977
Fix DNRM2 returning INF instead of zero due to intermediate overflow
3 years ago
Martin Kroeker
c43ec53bdd
Merge pull request #3690 from RajalakshmiSR/cdotp10
POWER: Fix complex dot function failures
3 years ago
Rajalakshmi Srinivasaraghavan
a612e78a97
POWER: Fix complex dot function failures
There are some test failures in complex dot functions when compiling with gcc12.
The machine constraints used now do not update all the four elements in the
expected result array. Fixing this with a reduced level of optimization.
This is not changing any performance numbers but will be converted to C code in future.
3 years ago
Rajalakshmi Srinivasaraghavan
432fd99445
POWER10: dgemv builtin rename
Add check to use correct builtin name for older versions
of gcc10 compilers.
3 years ago
VFerrari
cac634fce3
POWER10: Fix multithreading check when USE_THREAD=0
This patch fixes an issue when OpenBLAS is compiled for TARGET=POWER10
and the flag USE_THREAD is set to 0.
The function `num_cpu_avail` is only available when USE_THREAD=1,
that is, when SMP is defined.
3 years ago
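
The kind of guard this fix implies can be sketched as follows. It is a hedged illustration only: `kernel_thread_count` is a hypothetical wrapper, and the `num_cpu_avail` defined here is a stand-in stub with an assumed signature, not the library's function.

```c
/* Stand-in for the threading helper. In this sketch it exists only when
 * SMP is defined, mirroring the situation the commit describes. The
 * signature and the return value are assumptions. */
#ifdef SMP
static int num_cpu_avail(int level) { (void)level; return 4; }
#endif

/* Hypothetical wrapper: number of threads a kernel may use. */
static int kernel_thread_count(void)
{
#ifdef SMP
    return num_cpu_avail(1);
#else
    /* Single-threaded build: never reference num_cpu_avail here,
     * because it is not declared without SMP. */
    return 1;
#endif
}
```

Compiled without `-DSMP`, the wrapper returns 1 and the build no longer depends on the threading helper.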
Martin Kroeker
9283c7c0b5
Merge pull request #3655 from RajalakshmiSR/zgemmasmp10
POWER10: Fix ZGEMM testcase failures
3 years ago
Rajalakshmi Srinivasaraghavan
f191bc652b
POWER10: Fix ZGEMM testcase failures
This patch fixes storing and restoring non volatile registers
in zgemm POWER10 kernel.
3 years ago
Rajalakshmi Srinivasaraghavan
8419d538ff
POWER10: convert dgemv inline assembly
This patch makes use of compiler builtins and matches with assembly
performance. Tested with clang14 and gcc12.
3 years ago
Rajalakshmi Srinivasaraghavan
b62173c5a0
POWER10: Changing store instructions for Level1 functions
This patch changes 32 bytes stores to two 16 bytes stores
to fix a recent degradation due to 32 bytes stores.
3 years ago
Martin Kroeker
05dcfa176e
fix undefined prefetchsizes
3 years ago
Martin Kroeker
2bbb9f05c7
fix undefined prefetchsize
3 years ago
Rafael Cardoso Fernandes Sousa
c78fdcc80d
[POWER] Add support for SMALL_MATRIX_OPT
3 years ago
kavanabhat
9cc95e5657
AIX changes for P10 with GNU Compiler
4 years ago
kavanabhat
fe3c778c51
AIX changes for P10 with GNU Compiler
4 years ago
Rafael Cardoso Fernandes Sousa
b751edf624
Fix unused variable warnings on Power
4 years ago
Rajalakshmi Srinivasaraghavan
b06880c2cd
POWER10: Improving dasum performance
Unrolling a loop in dasum micro code to help in improving
POWER10 performance.
4 years ago
Martin Kroeker
c4b464cac6
Merge pull request #3273 from austinpagan/sbgemm_gcc10_fix
Power10: Fix for SBGEMM
4 years ago
Gordon Fossum
e6dd44d989
Power10: Fix for SBGEMM
While testing bfloat16 sbgemm kernel, there are some failures for odd value inputs due to updating result for
additional bytes.
4 years ago
Martin Kroeker
2e8ff4a781
Merge pull request #3266 from martin-frbg/powerparam
Remove spurious casts from PPC parameters and fix compilation for older targets
4 years ago
Martin Kroeker
efdbdd8f82
Add prefetch values for power3
4 years ago
Martin Kroeker
3906ef3b0f
Add prefetch values for power3
4 years ago
Martin Kroeker
8adf0971d8
Add prefetch values for power3
4 years ago
Martin Kroeker
08e2e60762
Add prefetch values for power3
4 years ago
Martin Kroeker
fb9e678235
Fix caxpy/zaxpy for big-endian
4 years ago
Martin Kroeker
dc4fcb48df
Fix inverted conditional for caxpy/zaxpy
4 years ago
Martin Kroeker
7a48247761
fix c/zrot and sgemv for POWER5
4 years ago
Rajalakshmi Srinivasaraghavan
cbb70438df
POWER10: Fixes for sbgemm kernel
While testing bfloat16 sbgemm kernel, there are some failures
for odd value inputs due to array access beyond the boundary.
4 years ago
Rajalakshmi Srinivasaraghavan
2379abaa5e
POWER10: Improve dgemm performance
This patch uses vector pair pointer for input load operation
which helps to generate power10 lxvp instructions.
4 years ago
Rajalakshmi Srinivasaraghavan
55bb9f639a
POWER10: Optimized zgemv
This patch makes use of Matrix-Multiply Assist (MMA)
feature introduced in POWER ISA v3.1 for zgemv_n and zgemv_t.
4 years ago
Rajalakshmi Srinivasaraghavan
2dbcddd83d
POWER10: Adding check for little endian
This patch makes sure that recent POWER10 patches are used
only for little endian.
4 years ago
Martin Kroeker
86c5a0013f
Add workaround for LAPACK testsuite failures with the NVIDIA HPC compiler
4 years ago
Martin Kroeker
ef85c22474
Add workaround for LAPACK test failures with the NVIDIA HPC compiler
4 years ago
Martin Kroeker
d3555d2e50
Add workaround for LAPACK test failures with the NVIDIA HPC compiler
4 years ago
Rajalakshmi Srinivasaraghavan
09d47af2c0
Optimize zscal function for POWER10
This patch makes use of new POWER10 vector pair instructions for
loads and stores.
4 years ago
Rajalakshmi Srinivasaraghavan
41646ed006
Optimize s/dasum function for POWER10
This patch makes use of new POWER10 vector pair instructions for
loads and stores.
4 years ago
Rajalakshmi Srinivasaraghavan
0571c3187b
POWER10: Rename mma builtins
The LLVM and GCC teams agreed to rename the __builtin_mma_assemble_pair and
__builtin_mma_disassemble_pair built-ins to __builtin_vsx_assemble_pair and
__builtin_vsx_disassemble_pair respectively. This patch is to make
corresponding changes in dgemm kernel. Also made changes in
inputs to those builtins to avoid some potential typecasting issues.
Reference gcc commit id: 77ef995c1fbcab76a2a69b9f4700bcfd005d8e62
4 years ago
Rajalakshmi Srinivasaraghavan
2056ffc227
Optimize cscal function for POWER10
This patch makes use of new POWER10 vector pair instructions for
loads and stores.
4 years ago
Rajalakshmi Srinivasaraghavan
3ede843d50
Optimize s/dscal function for POWER10
This patch makes use of new POWER10 vector pair instructions for
loads and stores.
4 years ago
Rajalakshmi Srinivasaraghavan
439b93f6d2
Optimize s/drot function for POWER10
This patch makes use of new POWER10 vector pair instructions for
loads and stores.
4 years ago
Rajalakshmi Srinivasaraghavan
eff7c9166e
Optimize cdot function for POWER10
This patch makes use of new POWER10 vector pair instructions for
loads and stores.
4 years ago
Rajalakshmi Srinivasaraghavan
601b711c78
Optimize swap function for POWER10
This patch makes use of new POWER10 vector pair instructions for
loads and stores.
4 years ago
Rajalakshmi Srinivasaraghavan
2fb11f873b
POWER10: Improve copy performance
This patch aligns the stores to 32 byte boundary for scopy and dcopy
before entering into vector pair loop. For ccopy, changed the store
instructions to stxv to improve performance of unaligned cases.
4 years ago
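
The alignment step described for scopy and dcopy can be sketched in portable C. This is not the POWER10 kernel: `dcopy_aligned_sketch` is a hypothetical name, it handles unit stride only, and the four-element loop body merely stands in for one 32-byte vector-pair store.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sketch of a dcopy-style routine (unit stride only). */
static void dcopy_aligned_sketch(size_t n, const double *x, double *y)
{
    size_t i = 0;

    /* Peel: copy single elements until the store address y + i is
     * 32-byte aligned, or the input is exhausted. */
    while (i < n && ((uintptr_t)(y + i) % 32) != 0) {
        y[i] = x[i];
        i++;
    }

    /* Main loop: 32 bytes (four doubles) per iteration, standing in for
     * one vector-pair store to an aligned address. */
    for (; i + 4 <= n; i += 4) {
        y[i]     = x[i];
        y[i + 1] = x[i + 1];
        y[i + 2] = x[i + 2];
        y[i + 3] = x[i + 3];
    }

    /* Tail: remaining elements that do not fill a 32-byte block. */
    for (; i < n; i++)
        y[i] = x[i];
}
```

Only the destination is aligned here, matching the commit's focus on stores; loads from `x` may remain unaligned.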
Martin Kroeker
043128cbe5
Merge pull request #3029 from RajalakshmiSR/axpyp10
POWER10: Improve axpy performance
4 years ago
Rajalakshmi Srinivasaraghavan
346e30a46a
POWER10: Improve axpy performance
This patch aligns the stores to 32 byte boundary for saxpy and daxpy
before entering into vector pair loop. For caxpy, changed the store
instructions to stxv to improve performance of unaligned cases.
4 years ago