Martin Kroeker
36fcb52094
Fix logic - we want real OR imaginary part of X to be nonzero here
2 years ago
H. Vetinari
f2659516ef
remove unqualified ifdef's for NO_LAPACK(E)
2 years ago
Martin Kroeker
579bc86671
remove call to omp_set_num_threads
2 years ago
Martin Kroeker
e298d613fa
initialize status variable for openblas_set_num_threads
2 years ago
Martin Kroeker
05aa88268f
add status variable for openblas_set_num_threads
2 years ago
Martin Kroeker
e38ab079a0
Fix OpenMP thread counting returning places rather than cores
2 years ago
Martin Kroeker
d4868babbc
Fix typos
2 years ago
Martin Kroeker
18c99d3e63
Update dynamic_arm64.c
2 years ago
Martin Kroeker
186a310f92
Update dynamic_arm64.c
2 years ago
Martin Kroeker
da6e426b13
fix Cooperlake not selectable via environment variable
2 years ago
Honglin Zhu
4989e039a5
Define SBGEMM_ALIGN_K for DYNAMIC_ARCH build
2 years ago
Honglin Zhu
b00d5b9746
New sbgemm implementation for Neoverse N2
1. Use UZP instructions instead of gather-load and scatter-store instructions to get lower latency.
2. Pad k to a power of 4.
2 years ago
Martin Kroeker
ab6009b0b6
Merge pull request #3773 from staticfloat/sf/openblas_default_num_threads
Add `OPENBLAS_DEFAULT_NUM_THREADS`
3 years ago
Martin Kroeker
db50ab4a72
Add BUILD_vartype defines
3 years ago
Elliot Saba
d2ce93179f
Add `OPENBLAS_DEFAULT_NUM_THREADS`
This allows Julia to set a default number of threads (usually `1`) to be
used when no other thread counts are specified [0], to short-circuit the
default OpenBLAS thread initialization routine that spins up a different
number of threads than Julia would otherwise choose.
The reason to add a new environment variable is that we want to be able
to configure OpenBLAS to avoid performing its initial memory
allocation/thread startup, as that can consume significant amounts of
memory, but we still want to be sensitive to legacy codebases that set
things like `OMP_NUM_THREADS` or `GOTOBLAS_NUM_THREADS`. Creating a new
environment variable that is openblas-specific and is not already
publicly used to control the overall number of threads of programs like
Julia seems to be the best way forward.
[0] https://github.com/JuliaLang/julia/pull/46844
3 years ago
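The fallback behaviour described in the `OPENBLAS_DEFAULT_NUM_THREADS` commit message above can be sketched as follows. This is a minimal illustration, not the OpenBLAS implementation: the function names `parse_thread_count`, `resolve_thread_count`, and `thread_count_from_env` are hypothetical, and the order in which the legacy variables are consulted is an assumption. Only the variable names and the "default applies when nothing else is set" rule come from the commit message.

```c
#include <limits.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper: parse a thread count from an environment string.
 * Returns 0 when the string is unset, empty, malformed, or not positive. */
int parse_thread_count(const char *s) {
    if (s == NULL || *s == '\0') return 0;
    char *end = NULL;
    long v = strtol(s, &end, 10);
    if (end == s || *end != '\0' || v <= 0 || v > INT_MAX) return 0;
    return (int)v;
}

/* Hypothetical helper: explicitly set counts win, then the default,
 * then the hardware count (never less than 1). */
int resolve_thread_count(const char *const *explicit_vals, size_t n,
                         const char *default_val, int hw_threads) {
    for (size_t i = 0; i < n; i++) {
        int v = parse_thread_count(explicit_vals[i]);
        if (v > 0) return v;
    }
    int d = parse_thread_count(default_val);
    if (d > 0) return d;
    return hw_threads > 0 ? hw_threads : 1;
}

/* Assumed lookup order for illustration only; the real library may differ. */
int thread_count_from_env(int hw_threads) {
    const char *explicit_vals[] = {
        getenv("OMP_NUM_THREADS"),
        getenv("GOTOBLAS_NUM_THREADS"),
    };
    return resolve_thread_count(explicit_vals, 2,
                                getenv("OPENBLAS_DEFAULT_NUM_THREADS"),
                                hw_threads);
}
```

With neither legacy variable set and `OPENBLAS_DEFAULT_NUM_THREADS=1`, `thread_count_from_env(8)` resolves to 1 in this sketch; setting `OMP_NUM_THREADS=4` would take precedence over the default.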
Kai T. Ohlhus
84453b924f
Support CONSISTENT_FPCSR on AARCH64
3 years ago
Martin Kroeker
9402df5604
Fix missing external declaration
3 years ago
Martin Kroeker
bd30120ba7
Merge pull request #3720 from FlyGoat/mips64
Make it work on general MIPS64 processors
3 years ago
Jiaxun Yang
fae9368f14
Implement DYNAMIC_LIST for MIPS64
Signed-off-by: Jiaxun Yang <jiaxun.yang@flygoat.com>
3 years ago
Jiaxun Yang
a50b29c540
Provide a fallback MIPS64_GENERIC target
It is really dangerous to fall back to a Loongson core on other
MIPS64 processors.
Signed-off-by: Jiaxun Yang <jiaxun.yang@flygoat.com>
3 years ago
Jiaxun Yang
b633eb79f2
Use $at as temporary register for mips/loongson CPUCFG read
Some compilers (namely LLVM) are not happy with clobbering
registers in inline assembly.
Use $at as temporary register and explicitly use noat
hint.
Signed-off-by: Jiaxun Yang <jiaxun.yang@flygoat.com>
3 years ago
Martin Kroeker
19fefd100e
Merge pull request #3703 from martin-frbg/omp_adaptive
Add env variable OMP_ADAPTIVE to control OMP threadpool behaviour
3 years ago
Jiaxun Yang
19d4f90c44
Use auxv to detect CPUCFG on mips/loongson
It's safer and easier than SIGILL.
Signed-off-by: Jiaxun Yang <jiaxun.yang@flygoat.com>
3 years ago
Martin Kroeker
d0ba257de0
Merge pull request #3704 from XiWeiGu/loongarch64_dynamic_arch
LoongArch64: Add DYNAMIC_ARCH support
3 years ago
gxw
fbfe1daf6e
LoongArch64: Add DYNAMIC_ARCH support
3 years ago
Martin Kroeker
80cdfed7b2
Use OMP_ADAPTIVE setting to choose between static and dynamic OMP threadpool size
3 years ago
Martin Kroeker
08e3754b39
Add environment variable OMP_ADAPTIVE
3 years ago
Martin Kroeker
30473b6a9d
add openblas_getaffinity()
3 years ago
Martin Kroeker
daca01622b
fix detection of Neoverse V1 and user-enforced selection of N2 in ARM64 DYNAMIC_ARCH (#3700)
* fix detection of Neoverse V1 and user-enforced selection of N2
3 years ago
Honglin Zhu
d5ca477f42
Neoverse N2: DYNAMIC_ARCH
3 years ago
Martin Kroeker
69148ae795
Guard against sysconf returning zero processors
3 years ago
Martin Kroeker
e9260f5451
Guard against system call returning zero processors
3 years ago
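The two guard commits above concern a processor count of zero coming back from the operating system. A minimal sketch of such a guard, assuming a POSIX-like system where `sysconf(_SC_NPROCESSORS_ONLN)` is available: the function names `clamp_cpu_count` and `detect_cpu_count` and the fallback value of 1 are illustrative choices, not taken from the OpenBLAS source.

```c
#include <unistd.h>

/* Hypothetical helper: treat zero or negative (error) results as one processor. */
int clamp_cpu_count(long reported) {
    return reported > 0 ? (int)reported : 1;
}

/* Query the number of online processors and never return less than 1. */
int detect_cpu_count(void) {
    return clamp_cpu_count(sysconf(_SC_NPROCESSORS_ONLN));
}
```

Clamping at the point of detection means later code that divides work by the processor count or sizes a thread pool from it cannot see a zero.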
Martin Kroeker
2c62096fce
Expand cpu mapping for future Zen cpus and use feature-based fallback for unknown AMD family codes
3 years ago
Adam Niederer
69f2ac4ea2
Fix broken elif in dynamic.c
This fixes compilation in the following case:
$(MAKE) USE_OPENMP=1 USE_THREAD=1 NO_LAPACK=0 DYNAMIC_ARCH=1 \
DYNAMIC_LIST="HASWELL SKYLAKEX ATOM COOPERLAKE SAPPHIRERAPIDS ZEN"
3 years ago
Martin Kroeker
8d5a9c2f98
Merge pull request #3565 from jonaszhou1/develop
Support Zhaoxin/Centaur kh40000 as ZEN
3 years ago
Martin Kroeker
bf4642eb7e
Report USE_TLS if set
3 years ago
JonasZhou
2d0ad89b0d
Support Zhaoxin/Centaur kh40000 as ZEN
Signed-off-by: JonasZhou <JonasZhou@zhaoxin.com>
3 years ago
Martin Kroeker
fa3e9f25e6
Support AVX512-enabled Alder Lake
3 years ago
Martin Kroeker
7656aba00e
Merge pull request #3493 from martin-frbg/casts+cleanup
WIP casts and cleanups
3 years ago
Martin Kroeker
7f0b11fbc1
Exclude some complex drivers when NO_LAPACK is set
3 years ago
Martin Kroeker
b6b024232d
Merge pull request #3508 from snadampal/v1_n2
OpenBLAS: aarch64: Add neoverse-v1/n2 architecture specifics
3 years ago
Sunita Nadampalli
19c8f615dc
OpenBLAS: aarch64: Add neoverse-v1/n2 architecture specifics
3 years ago
Martin Kroeker
b329e45288
Guard against omp_get_num_places returning zero
3 years ago
Martin Kroeker
07fe5b19a4
typecast function pointers
3 years ago
Martin Kroeker
6ed52576f8
Add feature-based fallback for unknown x86_64 cpus
3 years ago
Martin Kroeker
7a7fbb11c3
define "unlikely" on non-cygwin too
3 years ago
Martin Kroeker
b31349c22a
Open up delayed (re)init to non-Cygwin OS as well
3 years ago
Martin Kroeker
c8d05aa7a5
Move the threads overflow flag under the protection of the local blas lock (#3476)
* Move accesses to the overflow flag into the scope of the blas lock
3 years ago
Rafael Cardoso Fernandes Sousa
214fbcee15
Fix cmake for power
3 years ago
Martin Kroeker
4f057bffd6
Fix NULL pointer checks in blas_memory_alloc
3 years ago