Martin Kroeker
8f22ac552b
Add vendor string Shanghai as successor to Centaur
4 years ago
Martin Kroeker
eb2fdd3af0
Recognize newer Zhaoxin/Centaur processors as Nehalem
4 years ago
User User-User
750719528a
bugz
4 years ago
User User-User
6423b282a1
dynamic_arch
4 years ago
Martin Kroeker
cbfd3c87e1
Recognize Intel Ice Lake SP as Cooper Lake
4 years ago
Martin Kroeker
623d580b4c
Restore __volatile__ keyword
4 years ago
Martin Kroeker
186368ddc3
Fix compilation with CLANG
4 years ago
Martin Kroeker
1a3ad4b670
Fix signatures of the TLS-mode dll_callback and p_process_term functions for Win64
4 years ago
Peter Hawkins
dbbf92c1d1
Fix race in blas_thread_shutdown.
blas_server_avail was read without holding server_lock. If multiple threads call blas_thread_shutdown simultaneously, for example, by calling fork(), then they can attempt to shut down multiple times. This can lead to a segmentation fault.
4 years ago
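The fix described in this commit can be illustrated with a minimal sketch, assuming a single availability flag guarded by one mutex. The names `sketch_server_lock`, `sketch_server_avail`, `sketch_teardown_count` and `sketch_thread_shutdown` are hypothetical stand-ins, not the library's actual code. The point is only that the flag is read and cleared while the lock is held, so concurrent callers cannot both decide to tear down.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical stand-ins for the state named in the commit message. */
static pthread_mutex_t sketch_server_lock = PTHREAD_MUTEX_INITIALIZER;
static int sketch_server_avail = 1;    /* 1 while the worker pool is up */
static int sketch_teardown_count = 0;  /* how often teardown really ran */

/* Returns 1 if this call performed the teardown, 0 if it was already done. */
static int sketch_thread_shutdown(void) {
    int did_teardown = 0;

    pthread_mutex_lock(&sketch_server_lock);
    /* The flag is only read while the lock is held ... */
    if (sketch_server_avail) {
        /* ... and cleared before the lock is released, so a second
           caller sees 0 and skips the teardown. */
        sketch_server_avail = 0;
        sketch_teardown_count++;
        did_teardown = 1;
    }
    pthread_mutex_unlock(&sketch_server_lock);

    return did_teardown;
}
```

Calling `sketch_thread_shutdown()` twice in a row performs the teardown once; the second call returns 0.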
Martin Kroeker
cb429d6b12
Merge pull request #3110 from martin-frbg/issue3108
Fix get_num_procs() in the USE_TLS branch for non-glibc systems
4 years ago
Martin Kroeker
b0bded3f2f
Fix get_num_procs() in the USE_TLS branch for non-glibc systems
4 years ago
Martin Kroeker
e4e5042e38
Recognize Intel Tiger Lake as SkylakeX
4 years ago
Martin Kroeker
0cc36770f1
Merge pull request #3073 from xoviat/embedded
add embedded option
4 years ago
Martin Kroeker
eea0c0f2ed
Merge pull request #3085 from alexhenrie/memory_alloc
Fix null pointer check in blas_memory_alloc
4 years ago
Martin Kroeker
0cb9e9fc8d
Remove the VORTEX support bits again for now
4 years ago
Alex Henrie
113840da12
Fix null pointer check in blas_memory_alloc
4 years ago
Martin Kroeker
deb2e66bcc
Add DYNAMIC_LIST support for ARM64
4 years ago
xoviat
2e8d6e8690
add functions for embedded
4 years ago
Martin Kroeker
b94dab5250
patch to support power10 in builtin_cpu_is was backported to gcc 10.2, so allow that as well
4 years ago
Martin Kroeker
63fa3c3f8f
Require gcc 11 for builtin_cpu_is(power10)
fixes #3074
4 years ago
xoviat
b60de4447a
add cortex-m platform
4 years ago
Martin Kroeker
2c445be8ba
Merge pull request #3051 from martin-frbg/rocketlake
Add CPUID information for Intel Rocket Lake
4 years ago
Martin Kroeker
6fe0f1fab9
Label get_cpu_ftr as volatile to keep gcc from rearranging the code
4 years ago
Martin Kroeker
17c16f2a71
Implement builtin_cpu_is and limit cpu choices to P8 and P9 for NVIDIA compilers
4 years ago
Martin Kroeker
865676682d
Add Intel Rocket Lake
4 years ago
Martin Kroeker
6232237dba
Make fallback from P10 to P9 conditional on suitable compiler
4 years ago
Martin Kroeker
18d8a67485
Merge pull request #2994 from antonblanchard/power10-fixes
Power10 fixes
4 years ago
gxw
4b548857d6
Add msa support for loongson
1. Using core loongson3r3 and loongson3r4 for loongson
2. Add DYNAMIC_ARCH for loongson
Change-Id: I1c6b54dbeca3a0cc31d1222af36a7e9bd6ab54c1
4 years ago
Martin Kroeker
bc5b1ddf0d
Merge pull request #3004 from martin-frbg/bsd_getauxval
ARM64 DYNAMIC_ARCH build fix for BSD/OSX
4 years ago
Martin Kroeker
e7bf8ced6c
Build fix for systems that do not support getauxval
4 years ago
Martin Kroeker
5fa305172a
Use ifeq instead of ifdef for user-definable options
4 years ago
Alexander Grund
60005eb47b
Don't overwrite blas_thread_buffer if already set
After a fork it is possible that blas_thread_buffer has already
allocated memory buffers: goto_set_num_threads does allocate those
already and it may be called by num_cpu_avail in case the OpenBLAS
NUM_THREADS differs from the OMP num threads.
This leads to a memory leak which can cause subsequent execution of BLAS
kernels to fail.
Fixes #2993
4 years ago
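The idea behind this commit can be sketched as "allocate only into empty slots". This is an illustration under assumptions, not the actual patch: `sketch_thread_buffer`, `SKETCH_MAX_THREADS` and `sketch_init_buffers` are hypothetical names, and plain `malloc` stands in for whatever allocator the library uses.

```c
#include <assert.h>
#include <stdlib.h>

#define SKETCH_MAX_THREADS 4  /* hypothetical pool size */

/* Hypothetical per-thread buffer table; NULL marks an empty slot. */
static void *sketch_thread_buffer[SKETCH_MAX_THREADS];

/* Fill only the empty slots and return how many new buffers were created.
   A slot that already holds a buffer is left alone, so the old pointer
   is never overwritten and leaked. */
static int sketch_init_buffers(size_t bytes) {
    int allocated = 0;

    for (int i = 0; i < SKETCH_MAX_THREADS; i++) {
        if (sketch_thread_buffer[i] == NULL) {
            sketch_thread_buffer[i] = malloc(bytes);
            if (sketch_thread_buffer[i] != NULL) {
                allocated++;
            }
        }
    }
    return allocated;
}
```

A second call to `sketch_init_buffers` with all slots already filled allocates nothing and keeps the existing pointers.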
Anton Blanchard
043f3d6faa
POWER10: Use POWER9 as a fallback
If the toolchain is too old, or the mma feature isn't set on a POWER10,
fall back to the POWER9 loops.
4 years ago
Martin Kroeker
ff16329cb7
Merge pull request #2972 from xiegengxin/rot-intrinsic
Improve the performance of rot by using AVX512 and AVX2 intrinsic
4 years ago
Gengxin Xie
d9ba49165a
Improve the performance of rot by using AVX512 and AVX2 intrinsic
5 years ago
Martin Kroeker
aa21cb5217
Merge pull request #2960 from thrasibule/avx2_detection
fix avx2 detection
5 years ago
Guillaume Horel
1f564d729b
fix avx2 detection
reword commits to make it clearer
5 years ago
Chen, Guobing
a7b1f9b1bb
Implementation of BF16 based gemv
1. Add a new API -- sbgemv to support bfloat16 based gemv
2. Implement a generic kernel for sbgemv
3. Implement an avx512-bf16 based kernel for sbgemv
Signed-off-by: Chen, Guobing <guobing.chen@intel.com>
5 years ago
Martin Kroeker
2207a16235
Merge pull request #2952 from martin-frbg/issue2931
Try to read cpu ID from /sys/devices/.../cpu0 if HWCAP_CPUID fails
5 years ago
Martin Kroeker
b937d78a6d
Try to read cpu information from /sys/devices/system/cpu/cpu0 if HWCAP_CPUID fails
5 years ago
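A fallback of this kind can be sketched as a small helper that reads one hexadecimal value from a text file and reports failure instead of guessing. This is only an illustration: `sketch_read_hex_file` is a hypothetical name, and the exact file below /sys/devices/system/cpu/cpu0 is not given in the commit message, so the caller passes the path in.

```c
#include <assert.h>
#include <stdio.h>

/* Read one hexadecimal value (with or without a leading "0x") from a
   text file. Returns 0 on success, -1 if the file cannot be opened or
   does not start with a hexadecimal number. *out is only written on
   success. */
static int sketch_read_hex_file(const char *path, unsigned long *out) {
    unsigned long value = 0;
    int fields;
    FILE *f = fopen(path, "r");

    if (f == NULL) {
        return -1;
    }
    fields = fscanf(f, "%lx", &value);
    fclose(f);

    if (fields != 1) {
        return -1;
    }
    *out = value;
    return 0;
}
```

A caller would try the primary detection method first and use this helper only when that method fails, treating a return value of -1 as "CPU unknown".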
Martin Kroeker
fd7da56965
Move definitions that are neither needed nor supported on SUNOS
5 years ago
Martin Kroeker
ff65952e46
Move HAVE_P10_SUPPORT to the build system
to be able to include a binutils version check
5 years ago
Martin Kroeker
85154c2e18
Change "HALF" and "sh" to "BFLOAT16" and "sb"
5 years ago
Martin Kroeker
ac653c94f3
Merge branch 'develop' into issue2588-cmake
5 years ago
Martin Kroeker
f032d8966e
Merge pull request #2874 from Flamefire/memory_fixes
Avoid out of bounds access on invalid memory free
5 years ago
Martin Kroeker
f6e4cf2f9d
Merge pull request #2876 from Flamefire/omp_fork_fix
Lazily reinit threads after a fork in OMP mode
5 years ago
User User-User
d2333e7842
aarch64 fix std=c18 compilation
5 years ago
Alexander Grund
3094fc6c83
Lazily reinit threads after a fork in OMP mode
This initializes the per-thread memory buffers which get
cleared/released on a fork via pthread_atfork. Not doing so leads to
each thread calling blas_memory_alloc on almost every execution which
slows down the code significantly as the threads race for the memory
allocation using locks to serialize that.
5 years ago
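The lazy pattern described in this commit can be sketched as follows, assuming a fork handler that only marks the state as stale and a cheap check at the start of each operation that rebuilds it once. All `sketch_` names are hypothetical, and the counter stands in for the real buffer setup. `pthread_atfork` is the POSIX call for registering fork handlers.

```c
#include <assert.h>
#include <pthread.h>

static int sketch_buffers_ready = 0;  /* 0 means buffers must be (re)built */
static int sketch_init_calls = 0;     /* counts the expensive rebuilds */

/* Child-side fork handler: do no real work, only mark the state stale. */
static void sketch_mark_stale(void) {
    sketch_buffers_ready = 0;
}

/* Register the handler to run in the child after fork(); returns 0 on
   success, as pthread_atfork does. */
static int sketch_install_fork_handler(void) {
    return pthread_atfork(NULL, NULL, sketch_mark_stale);
}

/* Called at the start of each operation. The rebuild runs once per
   stale mark rather than on every call. */
static void sketch_ensure_ready(void) {
    if (!sketch_buffers_ready) {
        sketch_init_calls++;          /* stand-in for buffer setup */
        sketch_buffers_ready = 1;
    }
}
```

Deferring the rebuild to first use keeps the fork handler trivial and means the setup cost is paid once after a fork rather than on every call.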
Alexander Grund
3c05f54df8
Avoid out of bounds access on invalid memory free
5 years ago
Alexander Grund
dee7c49938
Fix TABs and trailing space
5 years ago