Chris Sidebottom
114316f361
Optimize SBGEMM / BGEMM for NEOVERSEV1 further
This changes the kernels to pack full SVE vectors and reduces the
overall complexity of the inner GEMM loop.
1 month ago
Masato Nakagawa
7e29f11396
Multi-thread GEMM Performance Improvement on NeoverseV1 (DIVIDE_RATE=1)
2 months ago
Martin Kroeker
c504aedca1
Merge pull request #5400 from Mousius/neoversev2-target
Add NEOVERSEV2 target support
2 months ago
Chris Sidebottom
87247daadc
Add NEOVERSEV2 target support
Did a quick run around to make `TARGET=NEOVERSEV2` build successfully.
Fixes #5385
2 months ago
Chris Sidebottom
ea2faf0c9a
Add optimized BGEMM for NEOVERSEN2 target
This re-uses the existing NEOVERSEN2 8x4 `sbgemm` kernel to implement `bgemm`.
2 months ago
Chris Sidebottom
740efd71c4
Add optimized BGEMM kernel for NEOVERSEV1 target
This also improves the testing and generic kernel by re-using the BF16
conversion functions.
Built on top of https://github.com/OpenMathLib/OpenBLAS/pull/5357 and derived from https://github.com/OpenMathLib/OpenBLAS/pull/5287
Co-authored-by: Ye Tao <ye.tao@arm.com>
2 months ago
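For readers unfamiliar with the BF16 conversion functions mentioned in the commit above, the following is a minimal sketch of what such helpers typically look like. It is not the code from the commit; the names `bf16_t`, `bf16_to_float` and `float_to_bf16` are hypothetical, and round-to-nearest-even is an assumption about the rounding mode.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical storage type: a BF16 value is the upper 16 bits of an
 * IEEE-754 binary32 (1 sign bit, 8 exponent bits, 7 mantissa bits). */
typedef uint16_t bf16_t;

/* Widen BF16 to float by placing the 16 bits in the upper half. */
static float bf16_to_float(bf16_t h) {
    uint32_t bits = (uint32_t)h << 16;
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}

/* Narrow float to BF16 with round-to-nearest-even (assumed rounding mode). */
static bf16_t float_to_bf16(float f) {
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);
    if ((bits & 0x7fffffffu) > 0x7f800000u) {
        /* NaN: keep the sign and set a mantissa bit so it stays a NaN. */
        return (bf16_t)((bits >> 16) | 0x0040u);
    }
    /* Add 0x7fff plus the lowest kept bit, so exact ties round to even. */
    bits += 0x7fffu + ((bits >> 16) & 1u);
    return (bf16_t)(bits >> 16);
}
```

Sharing one pair of helpers like this between a generic kernel and its tests means both sides round in the same way, which is presumably the motivation for the re-use described in the commit.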
Chris Sidebottom
f95e7b0e32
Add infrastructure for BGEMM
Sets up all the infrastructure for BGEMM support in OpenBLAS; hopefully I found all the right places.
Derived mostly from the previous work done in https://github.com/OpenMathLib/OpenBLAS/pull/5287
Co-authored-by: Ye Tao <ye.tao@arm.com>
3 months ago
Masato Nakagawa
5253c8f165
Multi-thread Performance Improvement of GEMM with DIVIDE_RATE=1 for A64FX.
3 months ago
h-motoki
bba75d5e45
GEMM_PREFERED_SIZE parameter has been changed for A64FX.
3 months ago
Martin Kroeker
d96daa220d
Merge pull request #5290 from Srangrang/develop
Add support for FP16 to openBLAS and shgemm on RISCV
3 months ago
davidz-ampere
aa90ab4142
Add support for Ampere AmpereOne processors
3 months ago
davidz-ampere
be68ef03b4
Add support for Ampere processors
3 months ago
gkdddd
670ec6f757
Added shgemm_kernel_8x8 for RISCV64_ZVL128B and shgemm_kernel_16x8 for RISCV64_ZVL256B
Added HFLOAT16 support for RISCV64
Added shgemm_kernel_8x8 for RISCV64_ZVL128B and shgemm_kernel_16x8 for RISCV64_ZVL256B based on HFLOAT16
The instruction sets used are ZVFH and ZFH, which need to be supported by RVV1.0
Related to issue #5279
Co-authored-by: Linjin Li <linjin_li@163.com>
4 months ago
Srangrang
0a967797a1
Add FP16 support for RISCV
4 months ago
Martin Kroeker
a34b487f22
Remove spurious cast from Alpha and Cell's DEFAULT_ALIGN
5 months ago
Vaisakh K V
f66ca05b31
Merge branch 'develop' into topic/sgemm_direct_sme1
7 months ago
Vaisakh K V
d23eb3b93e
Support for SME1 based sgemm_direct kernel for cblas_sgemm level 3 API
* Added ARMV9SME target
* Added SGEMM_DIRECT kernel based on SME1
10 months ago
Ye Tao
c748e6a338
optimized sbgemm kernel for neoverse-v1 (sve-256)
Signed-off-by: Ye Tao <ye.tao@arm.com>
10 months ago
Aditya Tewari
4379a6fbe3
* checkpoint sbgemm for SVE-256
11 months ago
Martin Kroeker
926e56e389
Align GEMM3M parameters for GENERIC with ZGEMM and add P/Q/R
10 months ago
Martin Kroeker
a47b3c8867
Fix unroll parameter selection for MIPS64_GENERIC
11 months ago
Martin Kroeker
7c4f3638fd
switch PPCG4 SGEMM kernel to 4x4
1 year ago
gxw
48698b2b1d
LoongArch64: Rename core
Use microarchitecture names instead of meaningless strings to name the cores.
The legacy core names are still retained.
1. Rename LOONGSONGENERIC to LA64_GENERIC
2. Rename LOONGSON3R5 to LA464
3. Rename LOONGSON2K1000 to LA264
1 year ago
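The rename in the commit above amounts to a three-entry alias table in which legacy names remain accepted. A minimal sketch of such a lookup follows; the struct, the table and the function `la64_canonical_core` are hypothetical illustrations, not how OpenBLAS implements the aliasing, and only the six core names come from the commit message.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical alias entry: legacy core name and its replacement. */
struct core_alias {
    const char *legacy;
    const char *current;
};

/* The three renames listed in the commit message. */
static const struct core_alias la64_aliases[] = {
    {"LOONGSONGENERIC", "LA64_GENERIC"},
    {"LOONGSON3R5",     "LA464"},
    {"LOONGSON2K1000",  "LA264"},
};

/* Return the new name for a legacy core name, or the input unchanged
 * when it is not a legacy name. */
static const char *la64_canonical_core(const char *name) {
    size_t n = sizeof la64_aliases / sizeof la64_aliases[0];
    for (size_t i = 0; i < n; i++) {
        if (strcmp(name, la64_aliases[i].legacy) == 0) {
            return la64_aliases[i].current;
        }
    }
    return name;
}
```

With this sketch, `la64_canonical_core("LOONGSON3R5")` yields `"LA464"`, while a name that is already current, such as `"LA264"`, is returned unchanged.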
Chip Kerchner
b1737698db
Fix DEFAULTS in SBGEMM for POWER10. Also, comparisons in the SBGEMM unit test cannot be exact due to epsilon differences.
1 year ago
Piotr Kubaj
4c12090776
Fix build on FreeBSD/powerpc64*
1 year ago
gxw
6017ad7146
loongarch64: Update dgemm_kernel_16x4 to dgemm_kernel_16x6
1 year ago
Usui, Tetsuzo
ca673ca774
Add GEMM_PREFERED_SIZE parameter for Neoverse V1
1 year ago
Martin Kroeker
93d975d8fd
Merge pull request #4593 from XiWeiGu/loongarch_add_buffer_offset
loongarch: Optimizing the performance of the GEMM on servers
1 year ago
gxw
d8c4ea8793
loongarch: Optimizing the performance of the GEMM on servers
1 year ago
Martin Kroeker
ba6d485102
Adjust SWITCH_RATIO for ZEN and apply GEMM_PREFERRED_SIZE
1 year ago
Martin Kroeker
584e87661d
set SWITCH_RATIO for Cortex-A76
1 year ago
Martin Kroeker
b925f61fb0
Add support for Cortex-A76
1 year ago
Rajalakshmi Srinivasaraghavan
f5b2a877e2
POWER9: Use default param values from POWER8 on AIX
AIX uses the KERNEL.POWER8 optimization on POWER9, so change
the default GEMM parameters in param.h to use POWER8 values
on POWER9.
1 year ago
pengxu
4787a55c64
Optimized cgemm kernel 16x4 LASX for LoongArch
1 year ago
pengxu
fe3da43b7d
Optimized zgemm kernel 8*4 LASX, 4*4 LSX and cgemm kernel 8*4 LSX for LoongArch
1 year ago
Martin Kroeker
e5d2725e5a
Merge pull request #4185 from XiWeiGu/mips_enable_msa
MIPS: Enable MSA
1 year ago
Sergei Lewis
1093def0d1
Merge branch 'risc-v' into develop
1 year ago
Martin Kroeker
889c5d026a
Merge pull request #4456 from kseniyazaytseva/riscv-rvv10
Fix BLAS and LAPACK tests for RVV 1.0 target, update to 0.12.0 intrinsics
1 year ago
kseniyazaytseva
b193ea3d7b
Fix BLAS and LAPACK tests for RVV 1.0 target, update to 0.12.0 intrinsics
* Update intrinsics API to 0.12.0 version (Stride Segment Loads/Stores)
* Fixed nrm2, axpby, ncopy, zgemv and scal kernels
* Added zero size checks
1 year ago
Dirreke
ec89466e14
Add CSKY support
1 year ago
Martin Kroeker
504f9b0c5e
Increase S/D GEMM PQ to match typical L2 size as for NeoverseV1
1 year ago
Martin Kroeker
2802478449
revert change to Loongson2k1000 zgemm
1 year ago
Martin Kroeker
44b5b9e39f
Update C/ZGEMM MN for Loongson2k1000
1 year ago
Martin Kroeker
519b40fad9
Merge pull request #4398 from yinshiyou/la-dev
Add Optimizations for LoongArch.
1 year ago
pengxu
a5d0d21378
loongarch64: Add zgemm and cgemm optimization
1 year ago
Hao Chen
179ed51d3b
Add dgemm_kernel_8x4.S file.
1 year ago
Darshan Patel
dab0da8243
Update GEMM param for NEOVERSEV1
1 year ago
Octavian Maghiar
e4586e81b8
[RISC-V] Add RISC-V Vector 128-bit target
Current RVV x280 target depends on vlen=512-bits for Level 3 operations.
Commit adds generic target that supports vlen=128-bits.
New target uses the same scalable kernels as x280 for Level 1&2 operations, and autogenerated kernels for Level 3 operations.
Functional correctness of Level 3 operations tested on vlen=128-bits using QEMU v8.1.1 for ctests and BLAS-Tester.
1 year ago
Rajalakshmi Srinivasaraghavan
980f702f72
POWER: AIX: Make use of power10 optimization
POWER10 optimizations are disabled when using the default AIX assembler.
As we have fixed many issues recently, enable the optimization path
for the default assembler.
1 year ago
gxw
553cc1372f
LoongArch64: Add sgemm_kernel
2 years ago