gxw
553cc1372f
LoongArch64: Add sgemm_kernel
2 years ago
gxw
d46772e037
LoongArch64: Add compiler feature checks
2 years ago
Chris Sidebottom
84a268b6ca
Use SVE zgemm/cgemm on Arm(R) Neoverse(TM) V1 core
This patch removes the prefetches from cgemm/zgemm, which improves performance much as the same change did for sgemm/dgemm in #3868; this means I'm happy to enable this on any applicable cores.
I also replicated the unrolling of the copies from sgemm and dgemm.
2 years ago
Chris Sidebottom
f971ef55f2
Add ARMV8SVE to AArch64 Dynamic Dispatch
In order to enable support for future cores which have similar tunings
(in this case I'm doing this for the Arm(R) Neoverse(TM) V2 core), this generically detects SVE support and enables it. This should better manage the size and complexity of dynamic dispatch rather than just copy-pasting the same parameters.
To make `ARMV8SVE` more representative of the common 128-bit SVE case,
I've split it and similar parameters from A64FX, which has the wider
512-bit SVE.
2 years ago
Martin Kroeker
72caceb324
Merge pull request #4009 from Mousius/sve-gemm
Use SVE kernel for SGEMM/DGEMM on Arm(R) Neoverse(TM) V1
2 years ago
Martin Kroeker
437c0bf2b4
Merge pull request #3843 from Mousius/switch-ratio
Propagate SWITCH_RATIO to DYNAMIC_ARCH builds
2 years ago
Chris Sidebottom
ec334e69dc
Use SVE kernel for SGEMM/DGEMM on Arm(R) Neoverse(TM) V1
This re-spins #3869 with some additional copy unrolling which helps maintain SYRK performance.
After #3868, the SVE kernels represent a pretty good boost.
This re-uses ARMV8SVE as a base and I'm going to incrementally move everything to use ARMV8SVE in additional patches (as well as fix up anything that's not already in ARMV8SVE).
2 years ago
Chris Sidebottom
5b165420b5
SWITCH_RATIO for Arm(R) Neoverse(TM) architecture
This seems like a good balance of values for reasonably sized matrices. With `SWITCH_RATIO=16` DGEMM scales better to bigger sizes, but the better solution would be some kind of
thread throttling, so I've gone with `SWITCH_RATIO=8`.
2 years ago
Chris Sidebottom
32f2fafde7
Propagate SWITCH_RATIO to DYNAMIC_ARCH builds
Previously dynamic builds were either using the default SWITCH_RATIO
or one from the higher level architecture; this patch ensures the
dynamic builds can use this parameter as well.
2 years ago
Martin Kroeker
31fd13d048
MIPS: make HAVE_MSA reflect cpu capability and NO_MSA software/env
2 years ago
Chris Sidebottom
2fb096315e
Set SWITCH_RATIO for Arm(R) Neoverse(TM) V1 CPUs
From testing, this yields better results than the default of `2`.
2 years ago
Honglin Zhu
4989e039a5
Define SBGEMM_ALIGN_K for DYNAMIC_ARCH build
2 years ago
Jiaxun Yang
a50b29c540
Provide a fallback MIPS64_GENERIC target
It is really dangerous to fall back to a Loongson core on other
MIPS64 processors.
Signed-off-by: Jiaxun Yang <jiaxun.yang@flygoat.com>
3 years ago
gxw
fbfe1daf6e
LoongArch64: Add DYNAMIC_ARCH support
3 years ago
gxw
3573306a69
LoongArch64: Add core LOONGSON2K1000 and LOONGSONGENERIC
3 years ago
Honglin Zhu
123e0dfb62
Neoverse N2 sbgemm:
1. Modify the algorithm to resolve multithreading failures
2. No memory allocation in sbgemm kernel
3. Optimize when alpha == 1.0f
3 years ago
Honglin Zhu
55d686d41e
neoverse n2 sbgemm:
implement ncopy tcopy kernel_8x4
3 years ago
Martin Kroeker
dac14a5f7d
revert "switch DGEMM parameters for SkylakeX if DYNAMIC_ARCH"
3 years ago
Martin Kroeker
a55a06c269
Update param.h
3 years ago
Martin Kroeker
d93cf7f23c
fix defines for CORTEX-X
3 years ago
Martin Kroeker
09b8545fc5
Add initial support for M1 on Linux, Phytium FT2xxx series, ARM Cortex 510/710/X1/X2
3 years ago
Martin Kroeker
8d0f7f0176
Revert accidental change of generic ARMV8 DGEMM parameters from #3425
3 years ago
Martin Kroeker
c1c0d5ce1d
Merge pull request #3492 from binebrank/arm_sve_zgemm
SVE zgemm&cgemm (and other BLAS 3 complex)
3 years ago
Bine Brank
b6a445cfd8
adapt Makefile for SVE trsm
3 years ago
Martin Kroeker
499ae5e8f7
Merge pull request #3510 from martin-frbg/issue3505
Fix recent SkylakeX/DYNAMIC_ARCH DGEMM breakage
3 years ago
Martin Kroeker
b6b024232d
Merge pull request #3508 from snadampal/v1_n2
OpenBLAS: aarch64: Add neoverse-v1/n2 architecture specifics
3 years ago
Martin Kroeker
15d4b37913
SkylakeX: match parameters to dgemm kernels for dyn/non-dyn
3 years ago
Sunita Nadampalli
19c8f615dc
OpenBLAS: aarch64: Add neoverse-v1/n2 architecture specifics
3 years ago
Bine Brank
39ab219704
sve copy functions for cgemm chemm zsymm
3 years ago
gxw
8d9b9c6b2a
loongarch64: Optimize dgemm_kernel
3 years ago
Martin Kroeker
697e2752d7
Merge pull request #3464 from binebrank/arm_sve_sgemm
Add sgemm part for Arm SVE
3 years ago
Bine Brank
a8f62a347b
fix UNROLL_MN and add to targets for SVE
3 years ago
Martin Kroeker
f7f7fea0dc
Merge pull request #3472 from kavanabhat/p10_aixas_p8
Fallback for Power kernels
3 years ago
kavanabhat
eee3381cbe
Fallback for Power kernels
3 years ago
Martin Kroeker
dd1f645371
switch DGEMM unroll parameters for SkylakeX if DYNAMIC_ARCH
3 years ago
Bine Brank
86ae89bf33
add sgemm kernel and copy functions for sgemm and ssymm
3 years ago
Martin Kroeker
454edd741c
Merge pull request #3425 from binebrank/arm_sve_dgemm
Add dgemm kernel for arm64 SVE
3 years ago
Bine Brank
f4da23dcb6
reduced dgemm_unroll_m to work with 128-bit sve
3 years ago
Bine Brank
9388f05a3c
configure SVE Makefile
3 years ago
Martin Kroeker
52a3f004a0
Fix unintended reversion of recent CortexA53 changes
3 years ago
Martin Kroeker
19ccef5fb1
Add generic MIPS32 target
3 years ago
Jia-Chen
302f22693a
MOD: optimize normal DGEMM on ARMV8 cortex-A53 & cortex-A55
3 years ago
Martin Kroeker
46947efb83
Ignore compiler support for MIPS MSA if the cpu lacks this capability
3 years ago
Bine Brank
ab7917910d
add v2x8 kernel + fix sve dtrmm
3 years ago
Bine Brank
7093372e32
add ARMV8SVE target
3 years ago
Wangyang Guo
7b2f5cb3b7
sbgemm: spr: enlarge P to 256 for performance
4 years ago
Wangyang Guo
0abbcd19c1
sbgemm: spr: tuning for blocking params
4 years ago
Wangyang Guo
3dc6052c7e
initial support for Sapphire Rapids platform
4 years ago
Martin Kroeker
24233b7c49
Use "big arm server" GEMM defaults for Vortex
4 years ago
kavanabhat
fe3c778c51
AIX changes for P10 with GNU Compiler
4 years ago