Martin Kroeker
a3b9c933c5
mark xbuffer as volatile to work around gcc15.1 optimizer bug
2 months ago
Martin Kroeker
cf06250d36
add handling of dummy2 flag
4 months ago
Martin Kroeker
4ec62d7f73
remove non-vectorized code path for power8, restoring PR4880
5 months ago
Ubuntu
0cc2485594
Explicit unaligned vector load/stores in PPC64LE GEMV kernels
5 months ago
Martin Kroeker
77fba0f400
Fix "dummy2" flag handling
7 months ago
Martin Kroeker
81eed868b6
Restore the non-vectorized code from before PR4880 for POWER8
7 months ago
Martin Kroeker
98b5ef929c
Restore the non-vectorized code from before PR4880 for POWER8
7 months ago
Martin Kroeker
d7036cfd74
Remove trailing blanks that break the cmake parser
8 months ago
tingbo.liao
3c8df6358f
Further rearranged the rotm kernel for the different architectures.
Signed-off-by: tingbo.liao <tingbo.liao@starfivetech.com>
8 months ago
Sergey Fedorov
229efa42ff
scal.S: use r11 on 32-bit Darwin on powerpc
9 months ago
Sergey Fedorov
81e1be8d90
Revert "temporarily disable the default S/DSCAL kernel"
This reverts commit 9b9c0aa5c9.
9 months ago
Martin Kroeker
9b9c0aa5c9
temporarily disable the default S/DSCAL kernel
9 months ago
Ayappan Perumal
020cce1068
Fix build issues with gcc compiler as well
11 months ago
Ayappan Perumal
b6ec73e77c
Fix AIX build
11 months ago
Chip Kerchner
ab71a1edf2
Better VSX.
11 months ago
Chip Kerchner
36bd3eeddf
Vectorize BF16 GEMV (VSX & MMA). Use GEMM_GEMV_FORWARD_BF16 (for Power).
11 months ago
Martin Kroeker
e52d9b4cf1
Merge pull request #4928 from austinpagan/czgemm_in_c
CGEMM & ZGEMM using C code, Power only, P10 only.
11 months ago
Gordon Fossum
0b7fb5c791
CGEMM & ZGEMM using C code.
11 months ago
Martin Kroeker
c9e92348a6
Handle inf/nan if dummy2 flag is set
1 year ago
Martin Kroeker
d714013ab9
change sgemm kernel to 4x4 as the 16x4 altivec goes out of bounds
1 year ago
Chip Kerchner
1a7b8c650d
Merge branch 'develop' into betterPowerGEMVTail
1 year ago
Martin Kroeker
f5d04318e3
Merge branch 'OpenMathLib:develop' into scalfixes
1 year ago
Martin Kroeker
73f8866ffb
make NAN handling depend on DUMMY2 parameter
1 year ago
Hong Bo Peng
db98f8753f
Try to fix LAPACK testing failures on P7.
1. Remove the FADD insn from the GEMV Transpose code.
2. Remove the FADD insn from GEMM and ZGEMM code.
3. Reorder the computation of the imaginary part in ZGEMM code.
1 year ago
Martin Kroeker
b9bfc8ce09
make NAN handling depend on dummy2 parameter
1 year ago
Chip Kerchner
ba47c7f4f3
Vectorize reduction stage of sgemv_t.
1 year ago
Chip Kerchner
cb154832f8
Vectorize SBGEMM incopy - 4x faster.
1 year ago
Martin Kroeker
2a5fe97e3b
temporarily(?) disable the alpha=0 branch as it does not handle INF,NAN
1 year ago
Martin Kroeker
7f8f037a36
handle INF and NAN in input
1 year ago
Martin Kroeker
f1248b849d
handle INF and NAN in input
1 year ago
Rajalakshmi Srinivasaraghavan
e112191b54
POWER: Fix issues in zscal to address lapack failures
This patch fixes following lapack failures with clang compiler on POWER.
zed.out: ZVX: 18 out of 5190 tests failed to pass the threshold
zgd.out: ZGV drivers: 25 out of 1092 tests failed to pass the threshold
zgd.out: ZGV drivers: 6 out of 1092 tests failed to pass the threshold
1 year ago
Martin Kroeker
aa259b141d
Merge pull request #4704 from amritahs-ibm/saxpy_perf_fix
Fix SAXPY regression when compiled with the OpenXL compiler.
1 year ago
Chip Kerchner
3a1417671a
POWER: Fixing endianness issue in cswap/zswap kernel for AIX
1 year ago
Amrita H S
87b3d9054f
Fix SAXPY regression when compiled with the OpenXL compiler.
SAXPY built with OpenXL regresses when compared to SAXPY
built with gcc. The OpenXL compiler doesn't know that the
SAXPY inner kernel assembly is a 64-element loop, so
it treats the remainder loop as the main loop. It vectorizes
and interleaves the remainder into a loop of 48 elements per
iteration. With a maximum of 63 iterations, the 48-element
loop will mostly not get executed, so the 1-element
scalar loop that is the remainder after that is probably
mostly what gets executed.
This can be fixed by adding the pragma loop interleave_count(2),
which will result in an 8-element loop.
Signed-off-by: Amrita H S <amritahs@linux.vnet.ibm.com>
1 year ago
Chip-Kerchner
99384933ff
Revert "Merge pull request #4532 from austinpagan/cgemm_zgemm_c_code"
This reverts commit accea1555159d0928a6aa2db740c042c7e8f0dd3, reversing
changes made to b925353006.
1 year ago
Martin Kroeker
accea15551
Merge pull request #4532 from austinpagan/cgemm_zgemm_c_code
Cgemm zgemm c code
1 year ago
austinpagan
87ba528d8b
Changed C files to straighten out indentation. Removed commented lines from other file.
1 year ago
austinpagan
ddac75e0ef
Adding .C versions of CGEMM and ZGEMM
1 year ago
Chip Kerchner
2bb7ea64a1
Only vectorize 64-bit version for Power8.
1 year ago
Chip Kerchner
09bb48d1b9
Vectorize in-copy packing/copying for SGEMM - 4X faster.
1 year ago
Chip-Kerchner
058dd2a4cb
Replace two vector loads with one vector pair load and fix endianness of stores - DGEMM versions.
1 year ago
barracuda156
d9653af018
KERNEL.PPC970, KERNEL.PPCG4: unbreak CMake parsing
Fixes: https://github.com/OpenMathLib/OpenBLAS/issues/4366
1 year ago
Chip-Kerchner
4e738e561a
Replace two vector loads with one vector pair load and fix endianness of stores.
1 year ago
Rajalakshmi Srinivasaraghavan
980f702f72
POWER: AIX: Make use of power10 optimization
POWER10 optimizations are disabled when using the default AIX assembler.
As we have fixed many issues recently, enable the optimization path
for the default assembler.
1 year ago
Rajalakshmi Srinivasaraghavan
82fc29a57a
POWER10: Fallback to POWER8 functions
As the cgemm and zgemm kernels are not optimized for big endian, fall
back to the POWER8 versions. Tested on AIX using gcc and Open XL C.
2 years ago
Martin Kroeker
8e6d93359d
Merge pull request #4196 from TiborGY/obsolete_inlines
Modernize obsolete inline order
2 years ago
Ian McInerney
79c15db348
Fix power10 gcc intrinsic check
__builtin_vsx_assemble_pair was only in GCC 10-11.2 and was replaced by
__builtin_vsx_build_pair thereafter.
2 years ago
TGY
b5ba95a6c0
Modernize obsolete inline order
2 years ago
Martin Kroeker
54d3246fc6
Allow negative INCX (API change from version 3.10 of the reference implementation)
2 years ago
Manjul Mohan
58b88aa5f0
POWER10: Fix compiler warnings
This patch removes the warning messages related to unused variables in
sbgemm_kernel_power10.c.
Signed-off-by: Manjul Mohan <manjul@linux.vnet.ibm.com>
2 years ago