Martin Kroeker
72caceb324
Merge pull request #4009 from Mousius/sve-gemm
Use SVE kernel for SGEMM/DGEMM on Arm(R) Neoverse(TM) V1
2 years ago
Martin Kroeker
437c0bf2b4
Merge pull request #3843 from Mousius/switch-ratio
Propagate SWITCH_RATIO to DYNAMIC_ARCH builds
2 years ago
Chris Sidebottom
ec334e69dc
Use SVE kernel for SGEMM/DGEMM on Arm(R) Neoverse(TM) V1
This re-spins #3869 with some additional copy unrolling which helps maintain SYRK performance.
After #3868 , the SVE kernels represent a pretty good boost.
This re-uses ARMV8SVE as a base and I'm going to incrementally move everything to use ARMV8SVE in additional patches (as well as fix up anything that's not already in ARMV8SVE).
2 years ago
Chris Sidebottom
5b165420b5
SWITCH_RATIO for Arm(R) Neoverse(TM) architecture
This seems like a good balance of values for reasonably sized matrices. With `SWITCH_RATIO=16` the DGEMM scales better to bigger sizes but the better solution would be some kind of
thread throttling so I've gone with `SWITCH_RATIO=8`.
2 years ago
Chris Sidebottom
32f2fafde7
Propagate SWITCH_RATIO to DYNAMIC_ARCH builds
Previously dynamic builds were either using the default SWITCH_RATIO
or one from the higher level architecture; this patch ensures the
dynamic builds can use this parameter as well.
2 years ago
Martin Kroeker
31fd13d048
MIPS: make HAVE_MSA reflect cpu capability and NO_MSA software/env
2 years ago
Chris Sidebottom
2fb096315e
Set SWITCH_RATIO for Arm(R) Neoverse(TM) V1 CPUs
From testing this yields better results than the default of `2`.
2 years ago
Honglin Zhu
4989e039a5
Define SBGEMM_ALIGN_K for DYNAMIC_ARCH build
2 years ago
Jiaxun Yang
a50b29c540
Provide a fallback MIPS64_GENERIC target
It is really dangerous to fall back to a Loongson core on other
MIPS64 processors.
Signed-off-by: Jiaxun Yang <jiaxun.yang@flygoat.com>
3 years ago
gxw
fbfe1daf6e
LoongArch64: Add DYNAMIC_ARCH support
3 years ago
gxw
3573306a69
LoongArch64: Add core LOONGSON2K1000 and LOONGSONGENERIC
3 years ago
Honglin Zhu
123e0dfb62
Neoverse N2 sbgemm:
1. Modify the algorithm to resolve multithreading failures
2. No memory allocation in sbgemm kernel
3. Optimize when alpha == 1.0f
3 years ago
Honglin Zhu
55d686d41e
neoverse n2 sbgemm:
implement ncopy tcopy kernel_8x4
3 years ago
Martin Kroeker
dac14a5f7d
revert "switch DGEMM parameters for SkylakeX if DYNAMIC_ARCH"
3 years ago
Martin Kroeker
a55a06c269
Update param.h
3 years ago
Martin Kroeker
d93cf7f23c
fix defines for CORTEX-X
3 years ago
Martin Kroeker
09b8545fc5
Add initial support for M1 on Linux, Phytium FT2xxx series, ARM Cortex 510/710/X1/X2
3 years ago
Martin Kroeker
8d0f7f0176
Revert accidental change of generic ARMV8 DGEMM parameters from #3425
3 years ago
Martin Kroeker
c1c0d5ce1d
Merge pull request #3492 from binebrank/arm_sve_zgemm
SVE zgemm&cgemm (and other BLAS 3 complex)
3 years ago
Bine Brank
b6a445cfd8
adapt Makefile for SVE trsm
3 years ago
Martin Kroeker
499ae5e8f7
Merge pull request #3510 from martin-frbg/issue3505
Fix recent SkylakeX/DYNAMIC_ARCH DGEMM breakage
3 years ago
Martin Kroeker
b6b024232d
Merge pull request #3508 from snadampal/v1_n2
OpenBLAS: aarch64: Add neoverse-v1/n2 architecture specifics
3 years ago
Martin Kroeker
15d4b37913
SkylakeX: match parameters to dgemm kernels for dyn/non-dyn
3 years ago
Sunita Nadampalli
19c8f615dc
OpenBLAS: aarch64: Add neoverse-v1/n2 architecture specifics
3 years ago
Bine Brank
39ab219704
sve copy functions for cgemm chemm zsymm
3 years ago
gxw
8d9b9c6b2a
loongarch64: Optimize dgemm_kernel
3 years ago
Martin Kroeker
697e2752d7
Merge pull request #3464 from binebrank/arm_sve_sgemm
Add sgemm part for Arm SVE
3 years ago
Bine Brank
a8f62a347b
fix UNROLL_MN and add to targets for SVE
3 years ago
Martin Kroeker
f7f7fea0dc
Merge pull request #3472 from kavanabhat/p10_aixas_p8
Fallback for Power kernels
3 years ago
kavanabhat
eee3381cbe
Fallback for Power kernels
3 years ago
Martin Kroeker
dd1f645371
switch DGEMM unroll parameters for SkylakeX if DYNAMIC_ARCH
3 years ago
Bine Brank
86ae89bf33
add sgemm kernel and copy functions for sgemm and ssymm
3 years ago
Martin Kroeker
454edd741c
Merge pull request #3425 from binebrank/arm_sve_dgemm
Add dgemm kernel for arm64 SVE
3 years ago
Bine Brank
f4da23dcb6
reduced dgemm_unroll_m to work with 128-bit sve
3 years ago
Bine Brank
9388f05a3c
configure SVE Makefile
3 years ago
Martin Kroeker
52a3f004a0
Fix unintended reversion of recent CortexA53 changes
3 years ago
Martin Kroeker
19ccef5fb1
Add generic MIPS32 target
3 years ago
Jia-Chen
302f22693a
MOD: optimize normal DGEMM on ARMV8 cortex-A53 & cortex-A55
3 years ago
Martin Kroeker
46947efb83
Ignore compiler support for MIPS MSA if the cpu lacks this capability
3 years ago
Bine Brank
ab7917910d
add v2x8 kernel + fix sve dtrmm
3 years ago
Bine Brank
7093372e32
add ARMV8SVE target
3 years ago
Wangyang Guo
7b2f5cb3b7
sbgemm: spr: enlarge P to 256 for performance
4 years ago
Wangyang Guo
0abbcd19c1
sbgemm: spr: tuning for blocking params
4 years ago
Wangyang Guo
3dc6052c7e
initial support for Sapphire Rapids platform
4 years ago
Martin Kroeker
24233b7c49
Use "big arm server" GEMM defaults for Vortex
4 years ago
kavanabhat
fe3c778c51
AIX changes for P10 with GNU Compiler
4 years ago
Wangyang Guo
8356a604f0
sbgemm: cooperlake: tuning for block params
4 years ago
Niyas Sait
7cddbf99b1
Make explicit conversion condition on _WIN64 flag
4 years ago
Niyas Sait
d1ed72fa87
[win/arm64]: Explicit casting for GEMM_DEFAULT_ALIGN to create 64-bit value
Win64 uses the LLP64 data model, where unsigned long is only 32-bit. For a 64-bit
architecture we need a 64-bit mask to correctly generate addresses.
4 years ago
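The LLP64 problem described in the commit above can be illustrated with a small sketch. It assumes an alignment mask of the form `0x03fff` (2^14 - 1). The names `ALIGN_MASK`, `align_up64` and `align_up_32bit_mask` are hypothetical and are not OpenBLAS identifiers. The sketch shows that complementing a 32-bit mask yields `0xFFFFC000`. When that value is zero-extended to 64 bits, it wipes out the upper half of the address. A 64-bit mask keeps those bits intact.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical alignment mask (2^14 - 1), standing in for an align constant. */
#define ALIGN_MASK 0x03fffULL

/* Round a 64-bit address up to the next multiple of ALIGN_MASK + 1.
   The complement is taken on a 64-bit value, so the high 32 bits survive. */
static uint64_t align_up64(uint64_t addr) {
    uint64_t result = (addr + ALIGN_MASK) & ~ALIGN_MASK;
    assert((result & ALIGN_MASK) == 0); /* postcondition: result is aligned */
    return result;
}

/* The same computation with a 32-bit mask, as happens when the constant is an
   unsigned long on LLP64: ~mask is 0xFFFFC000, which zero-extends to
   0x00000000FFFFC000 and clears the upper half of the address. */
static uint64_t align_up_32bit_mask(uint64_t addr) {
    uint32_t mask = (uint32_t)ALIGN_MASK;
    return (addr + mask) & (uint32_t)~mask;
}
```

For an address such as `0x123456789ABC`, `align_up64` gives `0x12345678C000`. The 32-bit-mask version gives `0x5678C000`, which is a truncated and therefore invalid address. This is the kind of breakage that an explicit 64-bit cast avoids.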
gxw
af0a69f355
Add support for LOONGARCH64
4 years ago