Martin Kroeker | b34235ca66 | 6 months ago
Fix inclusion of deprecated interfaces and cgesvdq/strsyl3

Martin Kroeker | 37b854769b | 6 months ago
Merge pull request #5173 from nakagawa-fj/gemm_load_imbalance
Improving Load Imbalance in Thread-Parallel GEMM

Martin Kroeker | a3e7b16072 | 6 months ago
Merge pull request #5157 from manaalmj/feature
Optimize gemv_n_sve kernel

Martin Kroeker | 8865850496 | 6 months ago
Merge pull request #5176 from annop-w/fix_sbgemv_t
Fix aarch64 sbgemv_t compilation error for GCC < 13

Annop Wongwathanarat | a085b6c9ec | 6 months ago
Fix aarch64 sbgemv_t compilation error for GCC < 13

Masato Nakagawa | 80d3c2ad95 | 6 months ago
Add Improving Load Imbalance in Thread-Parallel GEMM

manjam01 | 5c4e38ab17 | 7 months ago
Optimize gemv_n_sve kernel

Martin Kroeker | 39eb43d441 | 6 months ago
Improve thread safety of pthreads builds that rely on C11 atomic operations for locking (#5170)
* Tighten memory orders for C11 atomic operations

Martin Kroeker | 1d5ed5c46b | 7 months ago
Merge pull request #5168 from taoye9/add_sbgemvn_on_neonversen2
Add dispatch of SBGEMVNKERNEL for NEOVERSEN2 and NEOVERSEV2

Martin Kroeker | 7338a473a7 | 7 months ago
Merge pull request #5150 from Harishmcw/WoA-Experiments
Redefined threading logic for GESV and GEMV on WoA

Martin Kroeker | 5f200dca54 | 7 months ago
Merge pull request #5166 from martin-frbg/issue5158
Expose the option to build without LAPACKE to ccmake

Martin Kroeker | 8b98db13e3 | 7 months ago
Merge pull request #5167 from taoye9/fix_sbgemv_n_kernel_typo
fix minor issues of redeclaration of float x0,x1 in sbgemv_n_neon.c

Ye Tao | 6b8b35cdf2 | 7 months ago
fix minor issues of redeclaration of float x0,x1 in sbgemv_n_neon.c

Ye Tao | 38ee7c9301 | 7 months ago
Add dispatch of SBGEMVNKERNEL for NEOVERSEN2 and NEOVERSEV2

Martin Kroeker | 217324d880 | 7 months ago
Merge pull request #5162 from taoye9/add_sbgemv_tests
add beta and alpha testcase for sbgemv

Martin Kroeker | e4630ed15a | 7 months ago
Merge pull request #5160 from taoye9/sbgemv_n_neon
Add SBGEMVN Kernel for ARM64

Martin Kroeker | 35914aa9a2 | 7 months ago
Expose the option to build without LAPACKE to ccmake

Martin Kroeker | 2b941c44b5 | 7 months ago
Merge branch 'develop' into sbgemv_n_neon

Martin Kroeker | c797e27a1c | 7 months ago
Merge pull request #5159 from annop-w/sbgemv_t_bfdot
Add sbgemv_t_bfdot kernel for ARM64

Ye Tao | 4346b91559 | 7 months ago
add beta and alpha testcase for sbgemv

Ye Tao | 35bdbca153 | 7 months ago
Add sbgemv_n_neon kernel for arm64.

Annop Wongwathanarat | edaf51dd99 | 7 months ago
Add sbgemv_t_bfdot kernel for ARM64
This improves performance for sbgemv_t by up to 100x on NEOVERSEV1.
The geometric mean speedup is ~61x for M=N=[2,512].

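The sbgemv_t_bfdot entry reports both a peak speedup and a geometric-mean speedup. For reference, the standard geometric mean of per-size speedups is defined below. The log does not say which sizes M = N in [2, 512] were sampled, so $n$ and the timings $t_i$ are generic symbols.

```latex
S_i = \frac{t_i^{\text{before}}}{t_i^{\text{after}}}, \qquad
\bar{S}_{\text{geo}} = \left( \prod_{i=1}^{n} S_i \right)^{1/n}
                     = \exp\!\left( \frac{1}{n} \sum_{i=1}^{n} \ln S_i \right)
% Worked example: S = \{2, 8\} gives (2 \cdot 8)^{1/2} = 4,
% whereas the arithmetic mean would be 5.
```

Ratios are usually averaged this way because the result does not depend on which version is taken as the baseline.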
Martin Kroeker | ef9e3f7159 | 7 months ago
Merge pull request #5149 from martin-frbg/fixup5077-5088
Make the Neoverse GEMM/GEMV throttling code conditional on SMP

Martin Kroeker | 09ba099461 | 7 months ago
make throttling code conditional on SMP

Harishmcw | 030ae1fd97 | 7 months ago
Redefined threading logic for WoA

Martin Kroeker | 1533fe49be | 7 months ago
Merge pull request #5144 from taoye9/dispatch_neoversve2_to_neoversven2
dispatch NEOVERSEV2 to NEOVERSEN2 under dynamic setting

Martin Kroeker | c03a81b927 | 7 months ago
Merge pull request #5141 from michalowski-arm/fork-throttle
Add throttling profile for SGEMM and SGEMV on `NEOVERSEV2`

Martin Kroeker | 643966d9c7 | 7 months ago
Merge pull request #5146 from martin-frbg/issue5123
Fix "dummy2" flag reading in PPC970 S/DSCAL

Martin Kroeker | 77fba0f400 | 7 months ago
Fix "dummy2" flag handling

Ye Tao | f0bea79a6e | 7 months ago
dispatch NEOVERSEV2 to NEOVERSEN2 under dynamic setting

Martin Kroeker | 20d1118865 | 7 months ago
Merge pull request #5143 from martin-frbg/issue5111
Fix GEMMT transforming the input array B in some complex cases

Martin Kroeker | 75b958a018 | 7 months ago
Transform the B array back if necessary before returning

Marek Michalowski | 650a062e19 | 7 months ago
Add thread throttling profile for SGEMV on `NEOVERSEV2`

Marek Michalowski | b723c1b7b7 | 7 months ago
Add thread throttling profile for SGEMM on `NEOVERSEV2`

Martin Kroeker | ceb8f1e34b | 7 months ago
Merge pull request #5140 from martin-frbg/issue5139
Add ARM64 options for NVIDIA HPC

Martin Kroeker | f1fa370579 | 7 months ago
fix missing endif

Martin Kroeker | 6d1444be3a | 7 months ago
Add ARM64 options for NVIDIA HPC

Martin Kroeker | eb84aac7ad | 7 months ago
Merge pull request #5084 from quic/topic/sgemm_direct_sme1
Support for SGEMM_DIRECT Kernel based on SME1

Martin Kroeker | abbd78aa59 | 7 months ago
Merge pull request #5138 from martin-frbg/issue5131
Ensure that gmake builds with flang-new link the flang runtime into the shared library

Martin Kroeker | ebcab90976 | 7 months ago
Handle flang-new runtime library linking on Linux like classic-flang

Martin Kroeker | ed1584666c | 7 months ago
Merge pull request #5137 from martin-frbg/issue5136
Fix the CMake build to define USE_TRMM for RISCV64 targets as well

Martin Kroeker | b9ae246f20 | 7 months ago
define USE_TRMM for RISCV64 targets as well

Martin Kroeker | 86cf9d8a2e | 7 months ago
Merge pull request #5133 from OpenMathLib/revert-4920-issue4917
Revert "Fix potential inaccuracy in multithreaded level3 related to SWITCH_RATIO"

Martin Kroeker | 0b3c56968d | 7 months ago
Merge pull request #5135 from martin-frbg/ghwf-n2
CI: remove the express NeoverseN2 target from the Cobalt100 job in the gh workflow

Martin Kroeker | c1bb90a823 | 7 months ago
remove the express NeoverseN2 target from the Cobalt100 job

Martin Kroeker | 77c638db67 | 7 months ago
Revert "Fix potential inaccuracy in multithreaded level3 related to SWITCH_RATIO"

Vaisakh K V | f66ca05b31 | 7 months ago
Merge branch 'develop' into topic/sgemm_direct_sme1

Vaisakh K V | d23eb3b93e | 10 months ago
Support for SME1 based sgemm_direct kernel for cblas_sgemm level 3 API
* Added ARMV9SME target
* Added SGEMM_DIRECT kernel based on SME1

Martin Kroeker | a64b75a2e0 | 7 months ago
Merge pull request #5127 from Harishmcw/gesv-threshold
Refined GESV Parallelization Logic for Windows on ARM64

Martin Kroeker | 453efbd103 | 7 months ago
Merge pull request #5128 from martin-frbg/issue5120
Add -O2 to flang flags when building on WoA in Release mode