lglglglgy
1ff303f36e
Optimizing the Implementation of GEMV on the RISC-V V Extension
Specialized the kernel for several common cases, unrolled loops, and
reduced the number of multiplications.
5 months ago
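The commit body above names loop unrolling and fewer multiplications without showing them. A minimal scalar sketch of those two ideas follows, assuming a non-transposed, column-major GEMV update `y += alpha * A * x`. The function name `gemv_n_sketch` and the unroll factor of four are hypothetical, and this is plain C rather than the RISC-V V intrinsics the actual kernel uses.

```c
#include <stddef.h>

/* Hypothetical scalar sketch: y += alpha * A * x, with A stored column-major
 * (m rows, n columns, leading dimension lda). Not the RVV kernel itself. */
void gemv_n_sketch(size_t m, size_t n, float alpha,
                   const float *a, size_t lda,
                   const float *x, float *y)
{
    size_t j = 0;

    /* Unrolled path: handle four columns per pass over y. */
    for (; j + 4 <= n; j += 4) {
        /* Scale x once per column so alpha is not multiplied in the inner loop. */
        const float t0 = alpha * x[j];
        const float t1 = alpha * x[j + 1];
        const float t2 = alpha * x[j + 2];
        const float t3 = alpha * x[j + 3];
        const float *c0 = a + j * lda;
        const float *c1 = c0 + lda;
        const float *c2 = c1 + lda;
        const float *c3 = c2 + lda;

        for (size_t i = 0; i < m; i++) {
            y[i] += t0 * c0[i] + t1 * c1[i] + t2 * c2[i] + t3 * c3[i];
        }
    }

    /* Remainder columns when n is not a multiple of four. */
    for (; j < n; j++) {
        const float t = alpha * x[j];
        const float *c = a + j * lda;
        for (size_t i = 0; i < m; i++) {
            y[i] += t * c[i];
        }
    }
}
```

Hoisting `alpha * x[j]` out of the inner loop costs one multiplication per column instead of one per element, and the four-column unroll reads and writes each `y[i]` once per four columns rather than once per column.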
Martin Kroeker
67c5bdd639
Azure CI: Update flang call in OSX_LLVM_flangnew job ( #5208 )
* Update flang call in OSX_LLVM_flangnew job
5 months ago
Martin Kroeker
f0008f50cc
Merge pull request #5206 from ColumbusAI/develop
Update zsum.c -- fixed spelling error to successfully compile
6 months ago
ColumbusAI
7bf848454d
Update zsum.c -- fixed spelling error to successfully compile
Fixes a spelling error: zsum_kernel was used where zasum_kernel was intended. The file does not compile without this fix.
6 months ago
Martin Kroeker
f90eff306d
Merge pull request #5197 from e4t/z-arch-exec-stack
On zarch don't produce objects from assembler with a writable stack s…
6 months ago
Egbert Eich
61b9339d3a
getarch/cpuid.S: Fix warning about executable stack
When using the GNU toolchain, a warning is printed about an executable
stack:
/usr/lib64/gcc/.../x86_64-suse-linux/bin/ld: warning: /tmp/ccyG3xBB.o: missing .note.GNU-stack section implies executable stack
/usr/lib64/gcc/.../x86_64-suse-linux/bin/ld: NOTE: This behaviour is deprecated and will be removed in a future version of the linker
To prevent this warning, add:
```
.section .note.GNU-stack,"",@progbits
```
Signed-off-by: Egbert Eich <eich@suse.com>
6 months ago
Egbert Eich
ea6515c4b3
On zarch don't produce objects from assembler with a writable stack section
On z-series, the current version of the GNU toolchain produces warnings
such as:
```
/usr/lib64/gcc/[...]/s390x-suse-linux/bin/ld: warning: ztrmm_kernel_RC_Z14.o: missing .note.GNU-stack section implies executable stack
/usr/lib64/[...]/s390x-suse-linux/bin/ld: NOTE: This behaviour is deprecated and will be removed in a future version of the linker
```
To prevent this message and make sure we are future proof, add
```
.section .note.GNU-stack,"",@progbits
```
Also add the `.size` bit to give the asm defined functions a proper size
in the symbol table.
Signed-off-by: Egbert Eich <eich@suse.com>
6 months ago
Martin Kroeker
f33943d73e
Merge pull request #5196 from martin-frbg/issue5193
Fix misinterpretation of NO_LAPACK=0 and SPMV settings in CMake builds
6 months ago
Martin Kroeker
8b35534201
Merge pull request #5195 from martin-frbg/update-gensymbolpl
Re-synchronize gensymbol.pl with the posix shell version
6 months ago
Martin Kroeker
51c1fb1f93
Fix ?spmv build and misinterpretation of NO_LAPACK=0
6 months ago
Martin Kroeker
3ca1ba1be3
resynchronize with the posix shell version
6 months ago
Martin Kroeker
72f0abeed5
Merge pull request #5191 from Harishmcw/CMake_Symbol_Fix
Fix DLL symbol name pre/postfixing in CMake builds on Windows
6 months ago
Harishmcw
1724b3f104
DLL symbol pre/postfixing in CMake builds
6 months ago
Harishmcw
c2e7ab5351
DLL symbol pre/postfixing in CMake builds
6 months ago
Martin Kroeker
200771078f
Merge pull request #5190 from Harishmcw/develop
Fix missing commas in gensymbol.pl and DLL symbol pre/postfixing in CMake builds
6 months ago
Martin Kroeker
4e3afa7beb
Merge pull request #5175 from shubhamsvc/dgemv_thread_throttling
Add thread throttling profile for DGEMV on NEOVERSEV1
6 months ago
Harishmcw
c0a5c9655e
Fix missing commas in gensymbol.pl
6 months ago
shubham.chaudhari
8e289ecddc
Simplified thread throttling function in gemv
6 months ago
shubham.chaudhari
189dbbc04f
Add thread throttling for dynamic arch neoversev1
7 months ago
shubham.chaudhari
b6cb5ece58
Add thread throttling profile for DGEMV on NEOVERSEV1
7 months ago
Martin Kroeker
51c244a098
Merge pull request #5184 from taoye9/fix_sbgemv_n_bug
fix bugs in aarch64 sbgemv_n kernel
6 months ago
Ye Tao
f27ba5efd1
fix bugs in aarch64 sbgemv_n kernel
6 months ago
Martin Kroeker
e9fbe0a838
Merge pull request #5183 from annop-w/fix_sbgemv_t
Fix bug in ARM64 sbgemv_t
6 months ago
Annop Wongwathanarat
edef2e4441
Fix bug in ARM64 sbgemv_t
6 months ago
Martin Kroeker
b55ca71d5b
Merge pull request #5182 from annop-w/sgemm_ncopy
Optimize aarch64 sgemm_ncopy
6 months ago
Martin Kroeker
2f778554b8
Merge pull request #5181 from taoye9/change_sbgemn_cast_bf16
Replace custom bf16_to_fp32 with Arm NEON vcvtah_f32_bf16
6 months ago
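For readers unfamiliar with the conversion this change touches: a bfloat16 value is the upper 16 bits of an IEEE 754 single-precision float, so widening it amounts to a 16-bit left shift. The portable sketch below illustrates that idea only. The name `bf16_to_fp32_sketch` is hypothetical and is not the helper the commit removed; on Arm targets with BF16 support, the `vcvtah_f32_bf16` intrinsic named in the commit title performs this conversion.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical portable sketch: widen raw bfloat16 bits to a float.
 * bfloat16 keeps the sign, the 8 exponent bits and the top 7 mantissa bits
 * of a float, so the low 16 bits of the result are simply zero. */
static inline float bf16_to_fp32_sketch(uint16_t bits)
{
    uint32_t wide = (uint32_t)bits << 16;
    float out;
    memcpy(&out, &wide, sizeof out);  /* bit copy that avoids aliasing issues */
    return out;
}
```

For example, the bit pattern `0x3F80` widens to `0x3F800000`, which is `1.0f`.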
Martin Kroeker
66e0f1e621
Merge pull request #5178 from martin-frbg/lapack_cplx_dummy
Add dummy implementations of make_complex_(float/double) to simplify Windows DLL linking
6 months ago
Annop Wongwathanarat
9807f56580
Optimize aarch64 sgemm_ncopy
6 months ago
Martin Kroeker
1ba02656e6
Merge pull request #5177 from martin-frbg/cmakelapacke
Fix omission of LAPACKE interfaces for cgesvdq, strsyl3 and deprecated functions in CMake builds
6 months ago
Martin Kroeker
8a418b1aab
Add dummy implementations for the LAPACK_COMPLEX_CUSTOM case
6 months ago
Martin Kroeker
b34235ca66
Fix inclusion of deprecated interfaces and cgesvdq/strsyl3
6 months ago
Martin Kroeker
37b854769b
Merge pull request #5173 from nakagawa-fj/gemm_load_imbalance
Improving Load Imbalance in Thread-Parallel GEMM
6 months ago
Martin Kroeker
a3e7b16072
Merge pull request #5157 from manaalmj/feature
Optimize gemv_n_sve kernel
6 months ago
Martin Kroeker
8865850496
Merge pull request #5176 from annop-w/fix_sbgemv_t
Fix aarch64 sbgemv_t compilation error for GCC < 13
6 months ago
Ye Tao
4c00099ed6
Replace custom bf16_to_fp32 with Arm NEON vcvtah_f32_bf16
6 months ago
Annop Wongwathanarat
a085b6c9ec
Fix aarch64 sbgemv_t compilation error for GCC < 13
6 months ago
Masato Nakagawa
80d3c2ad95
Add Improving Load Imbalance in Thread-Parallel GEMM
6 months ago
manjam01
5c4e38ab17
Optimize gemv_n_sve kernel
7 months ago
Martin Kroeker
39eb43d441
Improve thread safety of pthreads builds that rely on C11 atomic operations for locking ( #5170 )
* Tighten memory orders for C11 atomic operations
6 months ago
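The commit above does not say which memory orders were changed. As general background, a lock built on C11 atomics usually pairs an acquire on the locking side with a release on the unlocking side. The sketch below shows that pairing with the standard `atomic_flag` API; the type and function names are hypothetical and are not taken from the OpenBLAS sources.

```c
#include <stdatomic.h>

/* Hypothetical spinlock type, for illustration only. */
typedef struct {
    atomic_flag locked;
} sketch_lock_t;

#define SKETCH_LOCK_INIT { ATOMIC_FLAG_INIT }

/* Try once to take the lock; returns 1 on success, 0 if it is already held.
 * Acquire ordering keeps later reads and writes from moving above the lock. */
static inline int sketch_trylock(sketch_lock_t *l)
{
    return !atomic_flag_test_and_set_explicit(&l->locked, memory_order_acquire);
}

/* Spin until the lock is obtained. */
static inline void sketch_lock(sketch_lock_t *l)
{
    while (!sketch_trylock(l)) {
        /* busy-wait; a production lock would typically yield or back off */
    }
}

/* Release ordering makes writes done under the lock visible to the next
 * thread that acquires it. */
static inline void sketch_unlock(sketch_lock_t *l)
{
    atomic_flag_clear_explicit(&l->locked, memory_order_release);
}
```

Relaxed ordering on either side would still make the flag update itself atomic, but it would not order the protected data accesses relative to the lock, which is the kind of issue a thread-safety fix for memory orders typically addresses.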
Martin Kroeker
1d5ed5c46b
Merge pull request #5168 from taoye9/add_sbgemvn_on_neonversen2
Add dispatch of SBGEMVNKERNEL for NEOVERSEN2 and NEOVERSEV2
7 months ago
Martin Kroeker
7338a473a7
Merge pull request #5150 from Harishmcw/WoA-Experiments
Redefined threading logic for GESV and GEMV on WoA
7 months ago
Martin Kroeker
5f200dca54
Merge pull request #5166 from martin-frbg/issue5158
Expose the option to build without LAPACKE to ccmake
7 months ago
Martin Kroeker
8b98db13e3
Merge pull request #5167 from taoye9/fix_sbgemv_n_kernel_typo
fix minor issues of redeclaration of float x0,x1 in sbgemv_n_neon.c
7 months ago
Ye Tao
6b8b35cdf2
fix minor issues of redeclaration of float x0,x1 in sbgemv_n_neon.c
7 months ago
Ye Tao
38ee7c9301
Add dispatch of SBGEMVNKERNEL for NEOVERSEN2 and NEOVERSEV2
7 months ago
Martin Kroeker
217324d880
Merge pull request #5162 from taoye9/add_sbgemv_tests
add beta and alpha testcase for sbgemv
7 months ago
Martin Kroeker
e4630ed15a
Merge pull request #5160 from taoye9/sbgemv_n_neon
Add SBGEMVN Kernel for ARM64
7 months ago
Martin Kroeker
35914aa9a2
Expose the option to build without LAPACKE to ccmake
7 months ago
Martin Kroeker
2b941c44b5
Merge branch 'develop' into sbgemv_n_neon
7 months ago
Martin Kroeker
c797e27a1c
Merge pull request #5159 from annop-w/sbgemv_t_bfdot
Add sbgemv_t_bfdot kernel for ARM64
7 months ago