Martin Kroeker
4664b57e6e
use shortcut only when both incx and incy are zero
2 years ago
Martin Kroeker
09131f79a6
Merge pull request #4164 from martin-frbg/issue4162
Enable use of AVX512 microkernels with NVIDIA HPC from version 22.3
2 years ago
Martin Kroeker
6a428b5629
Update casum_microk_skylakex-2.c
2 years ago
Martin Kroeker
ebb447e32e
Update zasum_microk_skylakex-2.c
2 years ago
Martin Kroeker
9f6847583a
nvc currently miscompiles this, hopefully fixed in release 23.09
2 years ago
Martin Kroeker
fe54ee3d15
nvc currently miscompiles this, hopefully fixed in release 23.09
2 years ago
Martin Kroeker
5720fa02c5
Merge pull request #4168 from Mousius/sve-zgemm-cgemm
Use SVE zgemm/cgemm on Arm(R) Neoverse(TM) V1 core
2 years ago
Chris Sidebottom
84a268b6ca
Use SVE zgemm/cgemm on Arm(R) Neoverse(TM) V1 core
This patch removes the prefetches from cgemm/zgemm, which improves performance much as the same change did for sgemm/dgemm in #3868, so I'm happy to enable this on any applicable cores.
I also replicated the unrolling of the copies from sgemm and dgemm.
2 years ago
Chris Sidebottom
730ca04b48
Fix ZHEMM copy for SVE
Whilst disambiguating whilelt, I inadvertently used the wrong datatype
for offsets, which can be negative. This rectifies that.
2 years ago
Martin Kroeker
2a62d2df96
Enable use of AVX512 microkernels with NVIDIA HPC from version 22.3
2 years ago
Martin Kroeker
849c8806b8
Merge pull request #4161 from Mousius/non-sve-kernels
Use latest non-SVE kernels in ARMV8SVE
2 years ago
Chris Sidebottom
24586bc4ff
Disambiguate whilelt
2 years ago
Chris Sidebottom
aea2a4622b
Use latest non-SVE kernels in ARMV8SVE
These are generally better and, in some cases, include threading, which helps on the cores we're targeting here.
2 years ago
martin-frbg
7976deff80
Fix file permissions (issue 4095)
2 years ago
Martin Kroeker
76ef1672f8
Override DSDOT with generic code to get rid of qemu precision error
2 years ago
Martin Kroeker
49077e7bde
Merge pull request #4145 from martin-frbg/issue4144
Restore zero-initialization of variables in generic ztrsm_utcopy
2 years ago
Martin Kroeker
3d31191b0f
Work around Clang failing to disambiguate SVE intrinsics and add AppleClang crossbuild to MacOS/arm64 DYNAMIC_ARCH in AzureCI (#4140)
* Add AppleClang crossbuild to MacOS/arm64 DYNAMIC_ARCH
* add casts to disambiguate svwhilelt for clang
2 years ago
Martin Kroeker
cfa0a80664
Restore initialization of data variables
2 years ago
Martin Kroeker
9567305e4c
Restore initialization of data01,data02
2 years ago
Xianyi Zhang
e14a025bb1
Temporarily work around zaxpy vector kernel bug.
2 years ago
Martin Kroeker
772b0cc715
Fix early bailout
2 years ago
Martin Kroeker
d6be5036d7
Fix IDAMAX
2 years ago
Martin Kroeker
1fe96f8da7
Fix failures to handle increments of zero
2 years ago
Martin Kroeker
73b30b1dec
Fix VLEV_FLOAT/VSEV_FLOAT macros to compile with t-head 2.6.1
2 years ago
Martin Kroeker
c3a2d407a0
Merge pull request #4048 from imzhuhl/spr_sbgemm_fix
Sapphire Rapids sbgemm fix
2 years ago
Manjul Mohan
58b88aa5f0
POWER10: Fix compiler warnings
This patch removes the warning messages related to unused variables in
sbgemm_kernel_power10.c.
Signed-off-by: Manjul Mohan <manjul@linux.vnet.ibm.com>
2 years ago
Honglin Zhu
9e80a194d6
Fix dynamic_list build and gcc version check error
2 years ago
Honglin Zhu
a76afdc047
Compatible with older version of GNU make
2 years ago
Honglin Zhu
90f041e348
Invoke the syscall to allow the use of amx tiles
2 years ago
Honglin Zhu
0b83088887
spr dynamic arch support
2 years ago
Honglin Zhu
f249ccb741
Fix spr sbgemm error
2 years ago
Martin Kroeker
e9a8d5b45f
Merge pull request #4015 from martin-frbg/issue4013-2
[WIP] Disable gcc's tree-vectorizer for x86_64 CGEMV
2 years ago
Martin Kroeker
72caceb324
Merge pull request #4009 from Mousius/sve-gemm
Use SVE kernel for SGEMM/DGEMM on Arm(R) Neoverse(TM) V1
2 years ago
Martin Kroeker
84bcf6639f
Disable gcc's tree-vectorizer pass on all operating systems
2 years ago
Martin Kroeker
c9174ae8d7
Disable gcc's tree-vectorizer pass on all operating systems
2 years ago
Martin Kroeker
c2fe9cb91f
Disable gcc's tree-vectorizer pass on all operating systems
2 years ago
Martin Kroeker
66b39b835c
Disable gcc's tree-vectorizer pass on all operating systems
2 years ago
Martin Kroeker
bb6d6735bf
Disable gcc's tree-vectorizer pass on all operating systems
2 years ago
Martin Kroeker
d18efaed20
Disable gcc's tree-vectorizer pass on all operating systems
2 years ago
Martin Kroeker
99f6d31ed5
Disable gcc's tree-vectorizer pass on all operating systems
2 years ago
Martin Kroeker
7de9335c56
Disable gcc's tree-vectorizer pass on all operating systems
2 years ago
Martin Kroeker
437c0bf2b4
Merge pull request #3843 from Mousius/switch-ratio
Propagate SWITCH_RATIO to DYNAMIC_ARCH builds
2 years ago
Chris Sidebottom
ec334e69dc
Use SVE kernel for SGEMM/DGEMM on Arm(R) Neoverse(TM) V1
This re-spins #3869 with some additional copy unrolling which helps maintain SYRK performance.
After #3868, the SVE kernels represent a pretty good boost.
This re-uses ARMV8SVE as a base and I'm going to incrementally move everything to use ARMV8SVE in additional patches (as well as fix up anything that's not already in ARMV8SVE).
2 years ago
Chris Sidebottom
32f2fafde7
Propagate SWITCH_RATIO to DYNAMIC_ARCH builds
Previously dynamic builds were either using the default SWITCH_RATIO
or one from the higher level architecture; this patch ensures the
dynamic builds can use this parameter as well.
2 years ago
Martin Kroeker
44164e3a3d
revert "move alpha out of register 18" (out of PR scope, no SVE on Apple hw)
2 years ago
Martin Kroeker
8be68fa7f4
move declaration of sca to really keep the compiler from throwing it out (for now)
2 years ago
Martin Kroeker
3727672a74
Improve workaround and keep compilers from optimizing it out
2 years ago
Martin Kroeker
108a21e47a
Move ALPHA out of register 18 (reserved on OSX)
2 years ago
Martin Kroeker
0b1acb0ba3
Move ALPHA_I out of register 18 (reserved on OSX)
2 years ago
Martin Kroeker
c7bbad09ad
Move ALPHA_I out of register 18 (reserved on OSX)
2 years ago