Martin Kroeker
d96daa220d
Merge pull request #5290 from Srangrang/develop
Add support for FP16 to OpenBLAS and shgemm on RISCV
3 months ago
Srangrang
ec14e1648c
fix: resolve build failure on non-RISCV hosts
- adjust interface to disable "small matrix" pathway
- separate HFLOAT16 from BFLOAT16
- remove SHGEMM_UNROLL_M and SHGEMM_UNROLL_N equal conditions
Related to PR#5290
Co-authored-by Martin
3 months ago
Martin Kroeker
5e393f207c
fix source file used for sbgemmt/sbgemmtr
3 months ago
Martin Kroeker
11ff18bb0f
Merge pull request #5081 from XiWeiGu/kernel_generic_fixed_cscal_zscal
kernel/generic: Fixed cscal and zscal
3 months ago
gkdddd
670ec6f757
Added shgemm_kernel_8x8 for RISCV64_ZVL128B and shgemm_kernel_16x8 for RISCV64_ZVL256B
Added HFLOAT16 support for RISCV64
Added shgemm_kernel_8x8 for RISCV64_ZVL128B and shgemm_kernel_16x8 for RISCV64_ZVL256B based on HFLOAT16
The kernels use the ZVFH and ZFH instruction set extensions, which require a target with RVV 1.0 support
Related to issue #5279
Co-authored-by Linjin Li <linjin_li@163.com>
4 months ago
Martin Kroeker
42b7d1f897
Fix addressing of alpha in CBLAS
4 months ago
Martin Kroeker
6680e0592f
Fix conditional inclusion of SGEMM_KERNEL_DIRECT
4 months ago
Martin Kroeker
70865a894e
Merge pull request #5180 from ywwry66/openmp_use_cmake
CMake: Pass `OpenMP` compiler and linker flags through CMake targets
5 months ago
Ruiyang Wu
02fd1df10b
CMake: Pass `OpenMP` compiler and linker flags through CMake targets
Using `OpenMP::OpenMP_LANG` targets for CMake is less error-prone than
passing the compiler and linker flags manually. Furthermore, it allows
the user to customize those flags by setting `OpenMP_LANG_FLAGS`,
`OpenMP_LANG_LIB_NAMES`, and `OpenMP_omp_LIBRARY`.
6 months ago
Martin Kroeker
51c1fb1f93
Fix ?spmv build and misinterpretation of NO_LAPACK=0
6 months ago
shubham.chaudhari
8e289ecddc
Simplified thread throttling function in gemv
6 months ago
shubham.chaudhari
189dbbc04f
Add thread throttling for dynamic arch neoversev1
7 months ago
shubham.chaudhari
b6cb5ece58
Add thread throttling profile for DGEMV on NEOVERSEV1
7 months ago
Martin Kroeker
7338a473a7
Merge pull request #5150 from Harishmcw/WoA-Experiments
Redefined threading logic for GESV and GEMV on WoA
7 months ago
Martin Kroeker
09ba099461
make throttling code conditional on SMP
7 months ago
Harishmcw
030ae1fd97
Redefined threading logic for WoA
7 months ago
Martin Kroeker
c03a81b927
Merge pull request #5141 from michalowski-arm/fork-throttle
Add throttling profile for SGEMM and SGEMV on `NEOVERSEV2`
7 months ago
Martin Kroeker
75b958a018
Transform the B array back if necessary before returning
7 months ago
Marek Michalowski
650a062e19
Add thread throttling profile for SGEMV on `NEOVERSEV2`
7 months ago
Marek Michalowski
b723c1b7b7
Add thread throttling profile for SGEMM on `NEOVERSEV2`
7 months ago
Vaisakh K V
f66ca05b31
Merge branch 'develop' into topic/sgemm_direct_sme1
7 months ago
Vaisakh K V
d23eb3b93e
Add support for an SME1-based sgemm_direct kernel for the cblas_sgemm level 3 API
* Added ARMV9SME target
* Added SGEMM_DIRECT kernel based on SME1
10 months ago
Harish-Gits
daf16b8229
Adjusted GESV threading logic for optimal performance on WoA
7 months ago
Martin Kroeker
60d0be0e97
Update nrm2.c
7 months ago
Martin Kroeker
0fd5448b2c
Handle INCX=0
7 months ago
Martin Kroeker
db7e5f1fa7
Update gemmt.c
7 months ago
Martin Kroeker
ff30ac9666
Update Makefile
7 months ago
Martin Kroeker
7c3e169b67
Update gemmt.c
7 months ago
Martin Kroeker
09414a4187
Ensure that GEMMTR name appears in XERBLA if gemmt was called as such
7 months ago
Marek Michalowski
838bb57e27
Merge branch 'develop' into develop
8 months ago
Martin Kroeker
a54f9a9c69
Merge pull request #5071 from annop-w/sgemm_throttling
Add thread throttling profile for SGEMM on NEOVERSEV1
8 months ago
Marek Michalowski
4d5b13f765
Add thread throttling profile for SGEMV on `NEOVERSEV1`
8 months ago
tingbo.liao
3c8df6358f
Further rearranged the rotm kernel for the different architectures.
Signed-off-by: tingbo.liao <tingbo.liao@starfivetech.com>
8 months ago
gxw
e114880dc4
kernel/generic: Fixed cscal and zscal
8 months ago
Annop Wongwathanarat
c8cd8da496
Add thread throttling profile for SGEMM on NEOVERSEV1
8 months ago
Martin Kroeker
a1075477c3
Merge pull request #4994 from martin-frbg/issue4886
Disable multithreading in ?TRTRI for small workloads
9 months ago
Martin Kroeker
0c440f8a27
disable multithreading for small workloads
10 months ago
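The idea behind disabling multithreading for small workloads (and behind the thread throttling profiles listed above) can be illustrated with a minimal sketch. It is not the code from these commits: the function name `pick_threads` and the constant `MIN_WORK_PER_THREAD` are hypothetical, and the real thresholds are tuned per routine and per target.

```c
#include <assert.h>

/* Hypothetical tuning constant: minimum number of matrix elements
   that justifies starting one more thread. Not an OpenBLAS value. */
#define MIN_WORK_PER_THREAD 10000L

/* Pick a thread count for an m x n problem.
   Small problems run single-threaded; larger ones get one thread per
   MIN_WORK_PER_THREAD elements, capped at max_threads. */
static int pick_threads(long m, long n, int max_threads)
{
    assert(m >= 0 && n >= 0 && max_threads >= 1);

    long work = m * n;
    if (work < MIN_WORK_PER_THREAD)
        return 1;                       /* threading overhead would dominate */

    long t = work / MIN_WORK_PER_THREAD;
    return t < max_threads ? (int)t : max_threads;
}
```

With these assumed values, a 50 x 50 problem stays on one thread, while a 200 x 200 problem on an 8-thread machine would use 4.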
Martin Kroeker
2a290dfc2c
forward GEMM3M calls for GENERIC targets to the regular C/ZGEMM for now
10 months ago
Martin Kroeker
0cf656fd3e
Add copies of GEMMT under its new name GEMMTR
11 months ago
Chris Daley
cb48505251
optimize gemv forwarding on ARM64 systems
11 months ago
Chip Kerchner
36bd3eeddf
Vectorize BF16 GEMV (VSX & MMA). Use GEMM_GEMV_FORWARD_BF16 (for Power).
11 months ago
Chip Kerchner
1d51ca5798
Change multi-threading logic for SBGEMV to be the same as SGEMV.
11 months ago
Martin Kroeker
9762464718
Fix CBLAS interface filling in the wrong triangle for Row-Major
11 months ago
gxw
48698b2b1d
LoongArch64: Rename core
Use microarchitecture names instead of meaningless strings to name the cores;
the legacy core names are still retained.
1. Rename LOONGSONGENERIC to LA64_GENERIC
2. Rename LOONGSON3R5 to LA464
3. Rename LOONGSON2K1000 to LA264
1 year ago
Martin Kroeker
7878976236
disable forwarding from SBGEMM to SBGEMV for now
1 year ago
Chris Sidebottom
b26424c6a2
Allow opt into GEMM -> GEMV forwarding
1 year ago
Chris Sidebottom
90eb863d4b
Re-add accidental removal
1 year ago
Chris Sidebottom
28b5334f22
Complete implementation of GEMV forwarding
1 year ago
Martin Kroeker
3db5dbc88e
forward to GEMV when one argument is actually a vector
1 year ago
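The GEMM to GEMV forwarding described in the commits above can be sketched as follows. This is a simplified reference illustration under stated assumptions, not the OpenBLAS interface code: `ref_gemv` and `ref_gemm` are hypothetical names, matrices are column-major and untransposed, and only the case where B and C have a single column is forwarded.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical reference GEMV: y = alpha * A * x + beta * y,
   with A an m x k column-major matrix of leading dimension lda. */
static void ref_gemv(int m, int k, float alpha, const float *a, int lda,
                     const float *x, float beta, float *y)
{
    for (int i = 0; i < m; i++) {
        float acc = 0.0f;
        for (int p = 0; p < k; p++)
            acc += a[i + (size_t)p * lda] * x[p];
        /* beta == 0 must not read y, which may be uninitialized */
        y[i] = alpha * acc + (beta == 0.0f ? 0.0f : beta * y[i]);
    }
}

/* Hypothetical reference GEMM: C = alpha * A * B + beta * C (no transposes).
   Returns 1 if the call was forwarded to GEMV, 0 otherwise. */
static int ref_gemm(int m, int n, int k, float alpha,
                    const float *a, int lda, const float *b, int ldb,
                    float beta, float *c, int ldc)
{
    assert(m >= 0 && n >= 0 && k >= 0);

    if (n == 1) {
        /* B is k x 1 and C is m x 1, so both are plain vectors */
        ref_gemv(m, k, alpha, a, lda, b, beta, c);
        return 1;
    }

    for (int j = 0; j < n; j++) {
        for (int i = 0; i < m; i++) {
            float acc = 0.0f;
            for (int p = 0; p < k; p++)
                acc += a[i + (size_t)p * lda] * b[p + (size_t)j * ldb];
            float *cij = &c[i + (size_t)j * ldc];
            *cij = alpha * acc + (beta == 0.0f ? 0.0f : beta * *cij);
        }
    }
    return 0;
}
```

A real implementation would also have to cover the transposed cases and the case where A is the vector argument, which this sketch leaves out.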
gxw
f3cebb3ca3
x86: Fixed numpy CI failure when the target is ZEN.
1 year ago