abhishek-fujitsu
0c239c9d48
update contribution list
5 months ago
abhishek-fujitsu
9c02cdb073
optimise dot using thread throttling for NEOVERSE V1
6 months ago
Martin Kroeker
d0e8fd6d40
Merge pull request #5239 from annop-w/gemv_n_sve
Use SVE kernel for S/DGEMVN for SVE machines
5 months ago
Martin Kroeker
ddfefd9bf8
Merge pull request #5240 from iha-taisei/fixedIssue5231
Fix: Potential out-of-bounds read in non-transposed [SD]GEMV kernels for A64FX and Neoverse V1.
5 months ago
Iha, Taisei
08b5c18d70
Fixed a potential out-of-bounds read in gemv.
5 months ago
Annop Wongwathanarat
e11744a411
Use SVE kernel for S/DGEMVN for SVE machines
5 months ago
Martin Kroeker
db0abfa907
Merge pull request #5238 from martin-frbg/revert5125
remove non-vectorized SGEMV transpose reduce path for POWER8, restoring optimizations from PR4880
5 months ago
Martin Kroeker
7389b6c483
Merge pull request #5237 from martin-frbg/revert5219
Fix and reinstate the Cooper Lake/Sapphire Rapids microkernel for non-transpose SBGEMV
5 months ago
Martin Kroeker
4ec62d7f73
remove non-vectorized code path for power8, restoring PR4880
5 months ago
Martin Kroeker
1df8738f27
Merge pull request #5235 from quickwritereader/issue_unaligned_ppc64le
Explicit unaligned vector load/stores in PPC64LE GEMV kernels
5 months ago
Martin Kroeker
99d9f1ff38
Fix conditional
5 months ago
Martin Kroeker
96d80801bc
Reinstate the CooperLake microkernel
5 months ago
Martin Kroeker
f5bc97c37e
Merge pull request #5227 from zanpeeters/develop
Wrong output from getarch on Apple M4
5 months ago
Martin Kroeker
050c3b26ae
Merge pull request #5236 from ywwry66/apple_workaround
Follow-up to #5233, fixing "Argument list too long"
5 months ago
Ruiyang Wu
9aa7a0b2a7
Follow-up to d659f3c
5 months ago
Martin Kroeker
94fceaeac5
Merge pull request #5233 from ywwry66/apple_workaround
Fix "Argument list too long" compilation error for Intel macOS
5 months ago
Ruiyang Wu
d659f3c3f6
Fix "Argument list too long" compilation error for Intel macOS
5 months ago
Martin Kroeker
2e4309315c
Merge pull request #5219 from martin-frbg/sbgemvn_cooper
Temporarily disable the Cooper Lake/Sapphire Rapids microkernel for non-transpose SBGEMV
5 months ago
Martin Kroeker
afc1dc69cd
Merge pull request #5234 from RevySR/bump-xuantie-qemu
Bump xuantie qemu for c910v
5 months ago
Ubuntu
0cc2485594
Explicit unaligned vector load/stores in PPC64LE GEMV kernels
5 months ago
Han Gao
1f687b2f60
Bump xuantie qemu for c910v
Signed-off-by: Han Gao <rabenda.cn@gmail.com>
5 months ago
Martin Kroeker
dd38b4e811
Merge pull request #5225 from annop-w/gemv_n
Improve performance for SGEMVN on NEOVERSEN1
5 months ago
Martin Kroeker
3a088de2d1
Merge pull request #5228 from martin-frbg/cmakecrossarm
Update and amend parameters for Neoverse cpus in CMake crossbuilds
5 months ago
Martin Kroeker
0241d516f6
Merge pull request #5220 from iha-taisei/sdgemv_n_unroll
Further performance improvements to non-transposed [SD]GEMV kernels for A64FX and Neoverse V1.
5 months ago
Martin Kroeker
afb664527f
Merge pull request #5221 from tetsuzo-usui/tune_symv_for_arm64
Add AArch64-optimized SYMV kernels
5 months ago
Annop Wongwathanarat
d535728803
Improve performance for SGEMVN on NEOVERSEN1
5 months ago
Martin Kroeker
d9369bda1e
Update and amend parameters for Neoverse cpus
5 months ago
zanpeeters
acef78c778
Reset buffer length before every call to sysctlbyname.
5 months ago
zanpeeters
d1c2528aed
Add L1_DATA_LINESIZE for ifdef __APPLE__
5 months ago
zanpeeters
7b66330dea
hw.perflevel[01].cpusperl changed to hw.perflevel[01].cpusperl2
5 months ago
Usui, Tetsuzo
d711906e3e
Add symv kernels for arm64
5 months ago
Iha, Taisei
f1e628b889
Further performance improvements to [SD]GEMV.
5 months ago
Martin Kroeker
39718cd28e
Merge pull request #5218 from martin-frbg/lapacke_mangling
lapacke_mangling.h is no longer generated, so don't delete on make clean
5 months ago
Martin Kroeker
211dfd0754
disable the CooperLake microkernel as it produces wrong results
5 months ago
Martin Kroeker
fd3afef122
lapacke_mangling.h is no longer generated, so don't delete on make clean
5 months ago
Martin Kroeker
b30dc9701f
Merge pull request #5215 from annop-w/gemv_t
Use SVE kernel for S/DGEMVT for SVE machines
5 months ago
Martin Kroeker
2893d0add4
Merge pull request #5211 from guoyuanplct/develop
Optimizing the Implementation of GEMV on the RISC-V V Extension
5 months ago
Martin Kroeker
ed1e470663
Merge pull request #5217 from haampie/hs/fix/darwin-gcc
test_potrs.c: do not use GCC pragma on darwin-aarch64
5 months ago
Harmen Stoppels
3d6d026fe1
no-gcse when loongarch64
5 months ago
Harmen Stoppels
51ba70f47b
test_potrs.c: remove pragma darwin-aarch64 support
Using GCC 14.2.0 on Darwin, the pragma ultimately causes a linker error
"ld: invalid r_symbolnum=". The current workaround is to use the old
linker, but (a) it's deprecated and (b) it can produce libraries that
are subsequently not linkable with the newer linker in dependents: the
new ld64 does not link to libraries with duplicate rpaths created by the
classic linker.
5 months ago
Annop Wongwathanarat
ec146157d3
Use SVE kernel for S/DGEMVT for SVE machines
6 months ago
Martin Kroeker
de2380e5a6
Merge pull request #5214 from martin-frbg/issue5200
Remove spurious cast from Alpha and Cell's DEFAULT_ALIGN
5 months ago
Martin Kroeker
a34b487f22
Remove spurious cast from Alpha and Cell's DEFAULT_ALIGN
5 months ago
Martin Kroeker
1b3e7cc491
Merge pull request #5212 from martin-frbg/lapack1119
Fix incomplete error message in EIG test (Reference-LAPACK PR 1119)
5 months ago
Martin Kroeker
4270d5bc43
Merge pull request #5204 from martin-frbg/issue4692
Repeat the libs target's "ln" in the all target to ensure completeness of copy on Windows
5 months ago
Martin Kroeker
880e43ee54
Merge pull request #5198 from martin-frbg/woadlldebug
Fix pdb file creation in debug dll builds with CMake on Windows/WoA
5 months ago
Martin Kroeker
70865a894e
Merge pull request #5180 from ywwry66/openmp_use_cmake
CMake: Pass `OpenMP` compiler and linker flags through CMake targets
5 months ago
Martin Kroeker
f0f274725d
Merge pull request #5207 from martin-frbg/issue5202
Fix MacOS compilation with xcode16.3/clang17/gcc14
5 months ago
Martin Kroeker
94fb7033a4
Fix incomplete error message (Reference-LAPACK PR 1119)
5 months ago
lglglglgy
1ff303f36e
Optimizing the Implementation of GEMV on the RISC-V V Extension
Specialized some scenarios, performed loop unrolling, and reduced the
number of multiplications.
5 months ago