Chris Sidebottom
114316f361
Optimize SBGEMM / BGEMM for NEOVERSEV1 further
This changes the kernels to pack full SVE vectors and reduces the
overall complexity of the inner GEMM loop.
1 month ago
Martin Kroeker
75c6ab4036
CI: Update WoA job to use LLVM 20.1.8 and avoid stray preinstalled LLVM19 (#5411)
* Update to 20.1.8
* fix PATH to avoid the obsolete LLVM19 that appeared in the preinstalled msvc folder hierarchy
1 month ago
Martin Kroeker
5c5f852ee3
Merge pull request #5415 from martin-frbg/Fixum-5399
Fix compilation of the NeoverseN2 SBGEMM kernel
1 month ago
Martin Kroeker
f1ee61ea30
Include NEON header for the bfloat conversion functions
1 month ago
Martin Kroeker
b3ffd5524a
Include NEON header for the bfloat conversion functions
1 month ago
Martin Kroeker
d23680b81d
Merge pull request #5407 from nakagawa-fj/feature/gemm_divide_rate_for_neoversev1
Multi-thread Performance Improvement of GEMM on NeoverseV1 with DIVIDE_RATE=1
2 months ago
Martin Kroeker
b4cc4be2ce
Merge pull request #5410 from martin-frbg/issue5404
Adjust multithreading threshold in S/DGER and add an intermediate step
2 months ago
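The PR title only says that the S/DGER multithreading threshold was adjusted and an intermediate step was added. Below is a minimal Python sketch of how such a tiered rule can work. The function name `ger_thread_count` and both cut-off values are hypothetical illustrations and are not taken from the OpenBLAS sources.

```python
def ger_thread_count(m: int, n: int, max_threads: int,
                     single_limit: int = 10_000,
                     intermediate_limit: int = 100_000) -> int:
    """Pick a thread count for an m-by-n rank-1 update (A += alpha * x * y^T).

    single_limit and intermediate_limit are HYPOTHETICAL cut-offs on the
    number of matrix elements touched; they are not OpenBLAS's real values.
    """
    work = m * n
    if work < single_limit:
        # Too little work to amortise the cost of waking up worker threads.
        return 1
    if work < intermediate_limit:
        # Intermediate step: use a small team instead of jumping straight
        # from one thread to all of them.
        return min(2, max_threads)
    # Large problems use every available thread.
    return max_threads


if __name__ == "__main__":
    print(ger_thread_count(50, 50, 16))      # 2,500 elements
    print(ger_thread_count(200, 200, 16))    # 40,000 elements
    print(ger_thread_count(1000, 1000, 16))  # 1,000,000 elements
```

The middle tier avoids a sharp performance cliff at the point where a single threshold would switch from one thread to the full team.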
Martin Kroeker
0968dddf1a
Merge pull request #5409 from martin-frbg/issue5372
Work around gcc15.1 on POWER misoptimizing DGEMV at -O3
2 months ago
Martin Kroeker
eddfe1e6b3
Merge pull request #5408 from ChipKerchner/fixRISCV64GEMVInitializationAndWarnings
Fix bad vector zero initializer and other compiler warnings for RISC-V.
2 months ago
Martin Kroeker
30d11bc92c
Adjust multithreading threshold and add an intermediate step
2 months ago
Martin Kroeker
a3b9c933c5
mark xbuffer as volatile to work around a gcc15.1 optimizer bug
2 months ago
Chip Kerchner
72f082f31d
Fix bad vector zero initializer and other compiler warnings for RISC-V.
2 months ago
Masato Nakagawa
7e29f11396
Multi-thread GEMM Performance Improvement on NeoverseV1 (DIVIDE_RATE=1)
2 months ago
Martin Kroeker
9a64b32b44
Merge pull request #5406 from martin-frbg/fixbgemmtest
Fix building of bgemm tests on GEMM3M-capable (x86) targets
2 months ago
Martin Kroeker
b66a01f909
Fix building of bgemm tests on GEMM3M-capable (x86) targets
2 months ago
Martin Kroeker
a5e7c0e3e0
Merge pull request #5396 from abhishek-iitmadras/abhishekk_bfloat16
ARM64: Enable bfloat16 kernels by default
2 months ago
abhishek-fujitsu
6356190d06
fix gfortran link path in dynamic_arch.yml
2 months ago
abhishek-fujitsu
4c8dcb3a8f
Darwin/arm64: disable SVE/SME and fix gfortran link path
2 months ago
Martin Kroeker
33b50548eb
Merge pull request #5403 from martin-frbg/issue5402
Introduce a (crude) threshold to multithreading in STRMV/DTRMV
2 months ago
Martin Kroeker
c504aedca1
Merge pull request #5400 from Mousius/neoversev2-target
Add NEOVERSEV2 target support
2 months ago
Martin Kroeker
b9e107932a
add NeoverseV2
2 months ago
Martin Kroeker
2f89a5970e
fix NeoverseV2 typo
2 months ago
Martin Kroeker
a9e8fa06bf
Introduce a (crude) threshold to multithreading
2 months ago
Martin Kroeker
b4c2b34a45
Merge pull request #5401 from martin-frbg/followup-5397
Include float-bfloat conversion functions in ONLY_CBLAS builds as well
2 months ago
Martin Kroeker
c9204f7b6f
Merge pull request #5399 from Mousius/bgemm-8x4
Add optimized BGEMM for NEOVERSEN2 target
2 months ago
Martin Kroeker
a55e65dba9
Merge pull request #5391 from martin-frbg/issue5387
Use OpenBLAS_ROOT_DIR in OpenBLASConfig.cmake generation only if set
2 months ago
abhishek-fujitsu
0bc79da587
add neon header
2 months ago
abhishek-fujitsu
720a4743b9
update contribution list
2 months ago
abhishek-fujitsu
05fc88180c
ARM64: Enable bfloat16 kernels by default
4 months ago
Martin Kroeker
965463f177
Include float-bfloat conversion functions in ONLY_CBLAS builds as well
2 months ago
Martin Kroeker
4272cf8c7f
Merge pull request #5398 from martin-frbg/fixup-5394
Update ?GEMM-to-?GEMV forwarding settings for CMake
2 months ago
Chris Sidebottom
87247daadc
Add NEOVERSEV2 target support
Made a quick pass to get `TARGET=NEOVERSEV2` to build successfully.
Fixes #5385
2 months ago
Chris Sidebottom
ea2faf0c9a
Add optimized BGEMM for NEOVERSEN2 target
This re-uses the existing NEOVERSEN2 8x4 `sbgemm` kernel to implement `bgemm`.
2 months ago
Martin Kroeker
a5b55f6fe3
remove CBLAS restriction on GEMM_GEMV forwarding
2 months ago
Martin Kroeker
a4f4662459
Merge pull request #5397 from omegacoleman/fix-cblas-bgemm
Fix cmake building with cblas_bgemm
2 months ago
Martin Kroeker
82954ba4ca
Update ?GEMM-to-?GEMV forwarding settings
2 months ago
Martin Kroeker
392d38168e
Merge pull request #5394 from Mousius/optimize-bgemv
Optimized BGEMV for NEOVERSEV1 target
2 months ago
youcai
41f9701ebc
Fix cmake building with cblas_bgemm
2 months ago
Martin Kroeker
f4caa61e47
Merge pull request #5395 from martin-frbg/fixloongsonCI
Fix libffi6 download in the Loongarch64_clang CI job (for now)
2 months ago
Martin Kroeker
444d03db9c
switch to another site that still has libffi6 (for now)
2 months ago
Chris Sidebottom
2c3cdaf74e
Optimized BGEMV for NEOVERSEV1 target
- Adds a bgemv T kernel based on the sbgemv T kernel
- Adds a bgemv N kernel, slightly altered so that it does not use Y as
an accumulator: the output is bf16, so accumulating in Y would lose
precision
- Enables BGEMM_GEMV_FORWARD to proxy BGEMM to BGEMV with the new kernels
2 months ago
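The precision problem behind the bgemv N change can be illustrated with a minimal Python sketch. It assumes float32-to-bf16 conversion by round-to-nearest-even on the upper 16 bits (NaN handling omitted), and it models one element of Y as a dot product over the columns. All helper names here are hypothetical and are not OpenBLAS functions.

```python
import struct


def f32(x: float) -> float:
    """Round a Python float (double) to the nearest float32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def to_bf16(x: float) -> float:
    """Round to float32, then to bf16 with round-to-nearest-even (no NaN handling)."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    rounding_bias = 0x7FFF + ((bits >> 16) & 1)
    bf16_bits = ((bits + rounding_bias) >> 16) & 0xFFFF
    return struct.unpack("<f", struct.pack("<I", bf16_bits << 16))[0]


def y_elem_bf16_accumulator(a_row, x):
    """Hypothetical variant: Y itself (bf16) holds every partial sum."""
    y = 0.0
    for a, xv in zip(a_row, x):
        y = to_bf16(y + a * xv)  # rounded to bf16 after every column
    return y


def y_elem_f32_accumulator(a_row, x):
    """Hypothetical variant: accumulate in float32, round to bf16 once on store."""
    acc = 0.0
    for a, xv in zip(a_row, x):
        acc = f32(acc + a * xv)
    return to_bf16(acc)


if __name__ == "__main__":
    n = 1000
    ones = [1.0] * n
    print(y_elem_bf16_accumulator(ones, ones))
    print(y_elem_f32_accumulator(ones, ones))
```

bf16 keeps only 8 significant bits, so once the running sum reaches 256 the spacing between representable values is 2, and 256 + 1 rounds back to 256 (ties go to the even mantissa). The bf16-accumulator variant therefore stalls at 256.0, while the float32 variant returns 1000.0, which is exactly representable in bf16.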
Martin Kroeker
7d908564fe
Use OpenBLAS_ROOT_DIR in CMake config file generation only if set
2 months ago
Martin Kroeker
2f81d6e60c
Merge pull request #5390 from martin-frbg/issue5388-2
Declare the "small" complex DOT and AXPY kernels for RISCV-ZVL256B static in addition to inline
2 months ago
Martin Kroeker
e2d941e9af
Declare the "small" kernel static in addition to inline
2 months ago
Martin Kroeker
8214700930
Declare the "small" kernel static in addition to inline
2 months ago
Martin Kroeker
4ae8707b54
Merge pull request #5389 from martin-frbg/issue5388
Add cross-compilation parameters for RISCV64 targets in CMake
2 months ago
Martin Kroeker
b24212f5df
fix numbers
2 months ago
Martin Kroeker
6ff06f5483
Add cross-compilation data for RISCV64 targets
2 months ago
Martin Kroeker
d92f151634
Merge pull request #5386 from martin-frbg/issue5384
Fixes for some gcc warnings
2 months ago
Martin Kroeker
30dbca5051
fix misleading indentation to silence a gcc warning
2 months ago