Martin Kroeker
15d6e58510
Merge pull request #5364 from martin-frbg/blashalf
change BLAS_HALF to BLAS_BFLOAT16 in parallelized POTRF (another missed rename)
2 months ago
Martin Kroeker
04bb5acd79
change BLAS_HALF to BLAS_BFLOAT16 (another missed rename)
2 months ago
Martin Kroeker
3d31887073
Merge pull request #5362 from Mousius/fix-bf16
Fix SBGEMM BFLOAT16 build
2 months ago
Martin Kroeker
0ddf8ebd42
Merge pull request #5354 from pratiklp00/p11
Add Support for POWER11
2 months ago
Martin Kroeker
d2ea9bbb6d
Merge pull request #5363 from guoyuanplct/develop
Update CONTRIBUTORS.md
2 months ago
guoyuanplct
4ff549a450
Update CONTRIBUTORS.md
2 months ago
guoyuanplct
309c48e327
Update CONTRIBUTORS.md
2 months ago
Chris Sidebottom
552e1c7a7a
Correct compiler flags for NEOVERSEV1 target
2 months ago
Chris Sidebottom
46b9b7a080
Also enable BFLOAT16 for make cirun
2 months ago
Chris Sidebottom
eaaa628af2
Enable BUILD_BFLOAT16 in cirun
2 months ago
Chris Sidebottom
7a97c4ca97
Rename HALF -> BFLOAT16 in some more places
2 months ago
Martin Kroeker
ee6560c89f
Merge pull request #5360 from sertonix/cpuid-arm
Fix cpuid.S on arm
2 months ago
Sertonix
8d11e4630c
Fix cpuid.S on arm
The ARM assembly syntax differs a bit
Fixes 61b9339d3a ("getarch/cpuid.S: Fix warning about executable stack")
Signed-off-by: Sertonix <sertonix@posteo.net>
2 months ago
Martin Kroeker
03a4afcf14
Merge pull request #5359 from martin-frbg/gitign_isnan
update gitignore configuration
2 months ago
Martin Kroeker
901de8f33a
remove lapacke_mangling.h and add la_xisnan.mod
2 months ago
Martin Kroeker
ce6991780a
Merge pull request #5356 from ilina-linaro/ilina-woa
Update README.md to include Windows on Arm64
2 months ago
Martin Kroeker
df013c5e28
Merge pull request #5358 from iha-taisei/dot_unroll
Performance improvements of [SD]DOT with loop-unrolling on A64FX
2 months ago
Iha, Taisei
f7ad906b49
Performance improvements of [SD]DOT with loop-unrolling on A64FX
2 months ago
Lina Iyer
7f360001f9
Update README.md to include Windows on Arm64
Update README.md to indicate that binaries are available for Windows on ARM64
3 months ago
Martin Kroeker
36c2589d3a
Merge pull request #5355 from tetsuzo-usui/add_parallel_laed3
Improve [SD]SYEVD performance by parallelizing [SD]LAED3
3 months ago
Usui, Tetsuzo
14107e37d9
Add parallel laed3
3 months ago
Martin Kroeker
a06bcf836b
Merge pull request #5353 from nakagawa-fj/feature/gemm_divide_rate_for_A64FX
Multi-thread Performance Improvement of GEMM with DIVIDE_RATE=1 for A64FX
3 months ago
Masato Nakagawa
5253c8f165
Multi-thread Performance Improvement of GEMM with DIVIDE_RATE=1 for A64FX.
3 months ago
Martin Kroeker
8f0a1a3f82
Merge pull request #5303 from martin-frbg/issue5289
Exit if memory allocation keeps failing, instead of retrying forever
3 months ago
Martin Kroeker
2c0dd2468e
Merge pull request #5350 from martin-frbg/issue5341
Declare the server_lock mutex volatile in addition to static
3 months ago
Martin Kroeker
7ae24d0b85
Merge pull request #5351 from martin-frbg/lapack1140
Fix documentation error and ordering bug in ?LAED/?LASD (Reference-LAPACK PR 1140)
3 months ago
Martin Kroeker
5aeca597fe
Fix documentation error and ordering bug (Reference-LAPACK PR 1140)
3 months ago
Martin Kroeker
dcb289539b
Merge pull request #5344 from MaartenBaert/fix-dlasd7
LAPACK: Fix documentation error and ordering bug in DLASD7
3 months ago
Martin Kroeker
9bcffbd655
Declare the server_lock mutex volatile in addition to static
3 months ago
Martin Kroeker
334cd242d4
Merge pull request #5348 from hideaki-motoki/issue5343_prefered_size_for_a64fx
Setting `GEMM_PREFERED_SIZE` parameter for `A64FX`
3 months ago
h-motoki
bba75d5e45
GEMM_PREFERED_SIZE parameter has been changed for A64FX.
3 months ago
Martin Kroeker
4062c10370
Merge pull request #5345 from OpenMathLib/revert-5251-issue5250
Revert "Fix out-of-bounds accesses in ?/SCAL/?GEEV triggered by preceding errrors/invalid inputs"
3 months ago
Martin Kroeker
b78d1dc0ae
Merge pull request #5342 from martin-frbg/cmake_ampere
Add CMake build settings for the Ampere One cpu
3 months ago
Martin Kroeker
83a01d29ca
Revert "Fix out-of-bounds accesses in ?/SCAL/?GEEV triggered by preceding errrors/invalid inputs"
3 months ago
Martin Kroeker
560fa88c96
Add cross-build parameters for Ampere One
3 months ago
Martin Kroeker
55bb5ef867
Add compiler options for Ampere One
3 months ago
Maarten Baert
b37889e52d
Merge branch 'OpenMathLib:develop' into fix-dlasd7
3 months ago
pratiklp00
1dde4a13c0
p11 changes
3 months ago
Martin Kroeker
11ce79a4f0
Merge pull request #5329 from foxtran/fix/docs
Update FAQ
3 months ago
Maarten Baert
0904a42fa4
Fix documentation error and ordering bug in DLASD7
3 months ago
Martin Kroeker
d24195e9a1
Merge pull request #5295 from Pengzhou0810/develop
Fix some hyperthreading errors.
3 months ago
zhoupeng
134b21ae60
Fix some hyperthreading errors.
When there are multiple NUMA nodes and hyper-threading causes adjacent logical cores to share a physical core (e.g., common->avail[i] = 0x5555555555555555UL), the numa_mapping function should not use a bitmask for filtering, because the subsequent local_cpu_map function already applies that mask and the filtering would be redundant.
4 months ago
Martin Kroeker
d96daa220d
Merge pull request #5290 from Srangrang/develop
Add support for FP16 to openBLAS and shgemm on RISCV
3 months ago
Martin Kroeker
fdc1c32340
Merge pull request #5336 from martin-frbg/issue5332
Use response files on old PPC/Intel Macs in single-target builds too
3 months ago
Martin Kroeker
5aa483e16c
Use response files on old PPC/Intel Macs in single-target builds too
3 months ago
Martin Kroeker
12591caa91
Merge pull request #5334 from azuresky01/develop
Fix INTERFACE64 builds on Loongarch64 with LLVM
3 months ago
Martin Kroeker
ee26caffb3
Merge pull request #5309 from davidz-ampere/dev-ampereone
Add support for Ampere AmpereOne processors
3 months ago
Martin Kroeker
8b08df5c5a
Merge pull request #5335 from martin-frbg/issue5330
Remove non-portable option from objcopy calls in the CMake build
3 months ago
Martin Kroeker
3bba35b8f7
Remove non-portable option from objcopy calls
3 months ago
azuresky01
8953ba9c2f
Fix INTERFACE64 builds on Loongarch64 with LLVM
fix https://github.com/OpenMathLib/OpenBLAS/issues/5331
3 months ago