Iha, Taisei
08b5c18d70
fixed a potential out-of-bounds access in gemv.
7 months ago
Annop Wongwathanarat
e11744a411
Use SVE kernel for S/DGEMVN for SVE machines
7 months ago
Martin Kroeker
db0abfa907
Merge pull request #5238 from martin-frbg/revert5125
remove non-vectorized SGEMV transpose reduce path for POWER8, restoring optimizations from PR4880
7 months ago
Martin Kroeker
7389b6c483
Merge pull request #5237 from martin-frbg/revert5219
Fix and reinstate the Cooper Lake/Sapphire Rapids microkernel for non-transpose SBGEMV
7 months ago
Martin Kroeker
4ec62d7f73
remove non-vectorized code path for power8, restoring PR4880
7 months ago
Martin Kroeker
1df8738f27
Merge pull request #5235 from quickwritereader/issue_unaligned_ppc64le
Explicit unaligned vector load/stores in PPC64LE GEMV kernels
7 months ago
Martin Kroeker
99d9f1ff38
Fix conditional
7 months ago
Martin Kroeker
96d80801bc
Reinstate the CooperLake microkernel
7 months ago
Martin Kroeker
2e4309315c
Merge pull request #5219 from martin-frbg/sbgemvn_cooper
Temporarily disable the Cooper Lake/Sapphire Rapids microkernel for non-transpose SBGEMV
7 months ago
Ubuntu
0cc2485594
Explicit unaligned vector load/stores in PPC64LE GEMV kernels
7 months ago
Martin Kroeker
dd38b4e811
Merge pull request #5225 from annop-w/gemv_n
Improve performance for SGEMVN on NEOVERSEN1
7 months ago
Martin Kroeker
0241d516f6
Merge pull request #5220 from iha-taisei/sdgemv_n_unroll
Further performance improvements to non-transposed [SD]GEMV kernels for A64FX and Neoverse V1.
7 months ago
Annop Wongwathanarat
d535728803
Improve performance for SGEMVN on NEOVERSEN1
7 months ago
Usui, Tetsuzo
d711906e3e
Add symv kernels for arm64
7 months ago
Iha, Taisei
f1e628b889
Further performance improvements to [SD]GEMV.
7 months ago
Martin Kroeker
211dfd0754
disable the CooperLake microkernel as it produces wrong results
7 months ago
Martin Kroeker
b30dc9701f
Merge pull request #5215 from annop-w/gemv_t
Use SVE kernel for S/DGEMVT for SVE machines
7 months ago
Martin Kroeker
2893d0add4
Merge pull request #5211 from guoyuanplct/develop
Optimizing the Implementation of GEMV on the RISC-V V Extension
7 months ago
Annop Wongwathanarat
ec146157d3
Use SVE kernel for S/DGEMVT for SVE machines
7 months ago
Martin Kroeker
70865a894e
Merge pull request #5180 from ywwry66/openmp_use_cmake
CMake: Pass `OpenMP` compiler and linker flags through CMake targets
7 months ago
lglglglgy
1ff303f36e
Optimizing the Implementation of GEMV on the RISC-V V Extension
Specialized some scenarios, performed loop unrolling, and reduced the
number of multiplications.
7 months ago
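The commit message does not show its code. The following is a hedged, portable C illustration of the two generic techniques it names, not the RISC-V V kernel itself. It is a column-major non-transposed SGEMV that unrolls over four columns and folds `alpha` into each `x[j]` once, so the inner loop performs one multiply per matrix element. The function name `sgemv_n_unrolled` is hypothetical, and the sketch assumes the caller has already applied `beta` to `y`.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sketch: y += alpha * A * x for a column-major m x n matrix A
 * with leading dimension lda (beta scaling of y assumed done by the caller). */
static void sgemv_n_unrolled(size_t m, size_t n, float alpha,
                             const float *a, size_t lda,
                             const float *x, float *y)
{
    assert(lda >= m);
    size_t j = 0;
    for (; j + 4 <= n; j += 4) {
        /* Fold alpha into x once per column instead of once per element. */
        const float t0 = alpha * x[j];
        const float t1 = alpha * x[j + 1];
        const float t2 = alpha * x[j + 2];
        const float t3 = alpha * x[j + 3];
        const float *a0 = a + j * lda;
        const float *a1 = a0 + lda;
        const float *a2 = a1 + lda;
        const float *a3 = a2 + lda;
        for (size_t i = 0; i < m; i++) {
            /* y[i] is loaded and stored once per group of four columns. */
            y[i] += a0[i] * t0 + a1[i] * t1 + a2[i] * t2 + a3[i] * t3;
        }
    }
    /* Remaining columns when n is not a multiple of 4. */
    for (; j < n; j++) {
        const float t = alpha * x[j];
        const float *aj = a + j * lda;
        for (size_t i = 0; i < m; i++)
            y[i] += aj[i] * t;
    }
}
```

Unrolling by four means each `y[i]` is read and written once per four columns rather than once per column, which is typically where a memory-bound GEMV saves time. A vector kernel would apply the same structure across vector lanes.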
ColumbusAI
7bf848454d
Update zsum.c: fix a spelling error that broke compilation
zsum_kernel was used where zasum_kernel was intended; the file does not compile without this fix.
7 months ago
Vaisakh K V
04915be829
Add vector registers to the clobber list to prevent unsafe compiler optimization.
The SME-based SGEMM_DIRECT kernel uses the vector (z) registers; listing them in
the clobber list tells the compiler that they are overwritten, so it does not keep
live values in them across the inline assembly.
7 months ago
Egbert Eich
ea6515c4b3
On zarch don't produce objects from assembler with a writable stack section
On z-series, the current version of the GNU toolchain produces warnings
such as:
```
/usr/lib64/gcc/[...]/s390x-suse-linux/bin/ld: warning: ztrmm_kernel_RC_Z14.o: missing .note.GNU-stack section implies
executable stack
/usr/lib64/[...]/s390x-suse-linux/bin/ld: NOTE: This behaviour is deprecated and will be removed in a future version of the linker
```
To prevent this message and make sure we are future-proof, add
```
.section .note.GNU-stack,"",@progbits
```
Also add the `.size` directive to give the asm-defined functions a proper size
in the symbol table.
Signed-off-by: Egbert Eich <eich@suse.com>
7 months ago
Ruiyang Wu
02fd1df10b
CMake: Pass `OpenMP` compiler and linker flags through CMake targets
Using `OpenMP::OpenMP_LANG` targets for CMake is less error-prone than
passing the compiler and linker flags manually. Furthermore, it allows
the user to customize those flags by setting `OpenMP_LANG_FLAGS`,
`OpenMP_LANG_LIB_NAMES`, and `OpenMP_omp_LIBRARY`.
8 months ago
Ye Tao
f27ba5efd1
fix bugs in aarch64 sbgemv_n kernel
8 months ago
Annop Wongwathanarat
edef2e4441
Fix bug in ARM64 sbgemv_t
8 months ago
Martin Kroeker
b55ca71d5b
Merge pull request #5182 from annop-w/sgemm_ncopy
Optimize aarch64 sgemm_ncopy
8 months ago
Martin Kroeker
2f778554b8
Merge pull request #5181 from taoye9/change_sbgemn_cast_bf16
replace custom bf16_to_fp32 with Arm NEON vcvtah_f32_bf16
8 months ago
Annop Wongwathanarat
9807f56580
Optimize aarch64 sgemm_ncopy
8 months ago
Martin Kroeker
a3e7b16072
Merge pull request #5157 from manaalmj/feature
Optimize gemv_n_sve kernel
8 months ago
Ye Tao
4c00099ed6
replace custom bf16_to_fp32 with Arm NEON vcvtah_f32_bf16
8 months ago
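For context on what this conversion does: a bfloat16 value is the upper 16 bits of an IEEE-754 binary32, so widening it to fp32 is a 16-bit left shift of the bit pattern. The portable C sketch below shows that conversion. The helper name `bf16_to_fp32_portable` is hypothetical and is not the removed OpenBLAS function. On Arm targets with the BF16 extension, the `vcvtah_f32_bf16` intrinsic named in the commit performs the same scalar widening.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical portable helper: widen a bfloat16 bit pattern to float.
 * bf16 keeps the sign, the 8-bit exponent and the top 7 mantissa bits of a
 * binary32, so the widened value is exact (no rounding is involved). */
static float bf16_to_fp32_portable(uint16_t h)
{
    uint32_t bits = (uint32_t)h << 16;
    float f;
    memcpy(&f, &bits, sizeof f); /* type-pun without breaking aliasing rules */
    return f;
}
```

For example, `0x3F80` widens to `1.0f` and `0x4049` widens to `3.140625f`.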
Annop Wongwathanarat
a085b6c9ec
Fix aarch64 sbgemv_t compilation error for GCC < 13
8 months ago
manjam01
5c4e38ab17
Optimize gemv_n_sve kernel
8 months ago
Martin Kroeker
1d5ed5c46b
Merge pull request #5168 from taoye9/add_sbgemvn_on_neonversen2
Add dispatch of SBGEMVNKERNEL for NEOVERSEN2 and NEOVERSEV2
8 months ago
Ye Tao
6b8b35cdf2
fix minor issue: redeclaration of float x0, x1 in sbgemv_n_neon.c
8 months ago
Ye Tao
38ee7c9301
Add dispatch of SBGEMVNKERNEL for NEOVERSEN2 and NEOVERSEV2
8 months ago
Martin Kroeker
2b941c44b5
Merge branch 'develop' into sbgemv_n_neon
8 months ago
Ye Tao
35bdbca153
Add sbgemv_n_neon kernel for arm64.
8 months ago
Annop Wongwathanarat
edaf51dd99
Add sbgemv_t_bfdot kernel for ARM64
This improves performance for sbgemv_t by up to 100x on NEOVERSEV1.
The geometric mean speedup is ~61x for M = N in [2, 512].
8 months ago
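Assuming the usual definition, the geometric-mean speedup over the $K$ measured problem sizes (here $M = N$ ranging over $[2, 512]$) is the $K$-th root of the product of the per-size speedups. In this formula, $t_k^{\text{old}}$ and $t_k^{\text{new}}$ denote the sbgemv_t run times for size $k$ before and after the change:

$$
\bar{s}_{\text{geo}} = \left( \prod_{k=1}^{K} \frac{t_k^{\text{old}}}{t_k^{\text{new}}} \right)^{1/K}
$$

Unlike the arithmetic mean, it is not dominated by a few very large ratios.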
Martin Kroeker
77fba0f400
Fix "dummy2" flag handling
9 months ago
Martin Kroeker
eb84aac7ad
Merge pull request #5084 from quic/topic/sgemm_direct_sme1
Support for SGEMM_DIRECT Kernel based on SME1
9 months ago
Martin Kroeker
b9ae246f20
define USE_TRMM for RISCV64 targets as well
9 months ago
Vaisakh K V
f66ca05b31
Merge branch 'develop' into topic/sgemm_direct_sme1
9 months ago
Vaisakh K V
d23eb3b93e
Support for SME1 based sgemm_direct kernel for cblas_sgemm level 3 API
* Added ARMV9SME target
* Added SGEMM_DIRECT kernel based on SME1
11 months ago
Martin Kroeker
8d487ef6eb
Merge pull request #5124 from XiWeiGu/LoongArch64-LA264-lapack-fixed
LoongArch64: Fixed lapack test for LA264
9 months ago
Martin Kroeker
81eed868b6
Restore the non-vectorized code from before PR4880 for POWER8
9 months ago
Martin Kroeker
98b5ef929c
Restore the non-vectorized code from before PR4880 for POWER8
9 months ago
gxw
2c4a5cc6e6
LoongArch64: Fixed snrm2_lsx.S and cnrm2_lsx.S
When the data type is single-precision real or single-precision complex,
converting it to double precision does not prevent overflow (as exposed in LAPACK tests).
The only solution is to follow the approach of the C implementation: find the maximum
absolute value in the array and divide each element by that maximum before squaring,
which avoids this issue.
9 months ago
gxw
9e75d6b3d1
LoongArch64: Fixed swap_lsx.S
Fixed the error when the stride is zero
9 months ago
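As a point of reference for what correct zero-stride behavior looks like, here is a hedged C sketch of the reference BLAS strided swap loop. The name `sswap_strided` is hypothetical, and we assume the fixed LSX kernel is meant to match these semantics. Written this way, a zero stride needs no special case: the loop simply revisits the same element on every iteration.

```c
#include <stddef.h>

/* Hypothetical sketch of a strided swap with reference BLAS semantics.
 * Negative strides start from the far end of the vector; a zero stride
 * keeps addressing the same element on every iteration. */
static void sswap_strided(ptrdiff_t n, float *x, ptrdiff_t incx,
                          float *y, ptrdiff_t incy)
{
    if (n <= 0)
        return;
    ptrdiff_t ix = incx < 0 ? (1 - n) * incx : 0;
    ptrdiff_t iy = incy < 0 ? (1 - n) * incy : 0;
    for (ptrdiff_t i = 0; i < n; i++) {
        float tmp = x[ix];
        x[ix] = y[iy];
        y[iy] = tmp;
        ix += incx;
        iy += incy;
    }
}
```

For instance, swapping a one-element `x` (stride 0, starting with `x[0] = 10`) against `y = {1, 2, 3}` (stride 1) leaves `x[0] = 3` and `y = {10, 1, 2}`, because the elements are exchanged one after another.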