Rajalakshmi Srinivasaraghavan
41fe6e864e
POWER10: Update param.h
Increasing the values of DGEMM_DEFAULT_P and DGEMM_DEFAULT_Q
improves DGEMM performance by ~10%.
4 years ago
Martin Kroeker
74b5850581
Add libomp to the LAPACK(-test) dependencies in clang/gfortran builds
4 years ago
Martin Kroeker
da0c94c76f
Avoid linking both GNU libgomp and LLVM libomp in clang/gfortran builds
4 years ago
Martin Kroeker
a6692dc129
use gfortran-10 with xcode 12
4 years ago
Martin Kroeker
72a553f5bc
Update .travis.yml
4 years ago
Martin Kroeker
dcbb3b5ef1
fix misplaced lines
4 years ago
Martin Kroeker
57456c248b
fix gfortran requirement in osx interface64 test
4 years ago
Martin Kroeker
c361313564
Disable deprecated 32bit xcode
4 years ago
Gengxin Xie
0cb7a403b2
fix erroneous declaration of function blas_level1_thread_with_return_value
4 years ago
Martin Kroeker
77a538d4ba
Update an overlooked instance of xcode 10.0 as well
4 years ago
Martin Kroeker
9621062eba
Update OSX xcode version to 11.5
4 years ago
Gengxin Xie
b766c1e9bb
Improve the performance of zasum and casum with AVX512 intrinsic
4 years ago
Martin Kroeker
22574b474e
Suppress -mfma as well for gcc 4.6
4 years ago
Martin Kroeker
f662022994
Move the version check to avoid overwriting unprocessed compiler data
4 years ago
Martin Kroeker
5e81e81478
Merge pull request #3014 from RajalakshmiSR/dgemvnp10
POWER10: Optimize dgemv_n
4 years ago
Rajalakshmi Srinivasaraghavan
7d46e31de1
POWER10: Optimize dgemv_n
Handling as 4x8 with vector pairs gives better performance than
the existing code on POWER10.
4 years ago
Martin Kroeker
62a2eb884f
Add SSE flags for x86
4 years ago
Martin Kroeker
2e99e2699b
Add workaround for gcc 4.6 miscompiling assembly kernels with -mavx
4 years ago
Martin Kroeker
006b13299f
Merge pull request #3012 from martin-frbg/restore-getarch
Restore RISCV entries accidentally trashed by my PR 3005
4 years ago
Martin Kroeker
ca17d3dc3d
Restore RISCV entries accidentally trashed by my PR 3005
4 years ago
Martin Kroeker
52ed2741c5
Merge pull request #3010 from ggouaillardet/topic/fj_compilers
add Fujitsu compilers
4 years ago
cyy
3b4c016110
link math lib on FreeBSD
4 years ago
Gilles Gouaillardet
358100ec15
add Fujitsu compilers
Co-authored-by: Tomoki Karatsu <karatsu.spack@gmail.com>
4 years ago
Martin Kroeker
3788b6d156
Merge pull request #3005 from martin-frbg/ssefix
Add -msse for x86 and silence build warning in getarch
4 years ago
Martin Kroeker
bc5b1ddf0d
Merge pull request #3004 from martin-frbg/bsd_getauxval
ARM64 DYNAMIC_ARCH build fix for BSD/OSX
4 years ago
Martin Kroeker
2f42d23104
Merge pull request #3002 from martin-frbg/issue3000
Ensure that all targets in a DYNAMIC_ARCH build on POWER use the same buffer size
4 years ago
Martin Kroeker
b72dd007dc
Merge pull request #3001 from martin-frbg/issue2996
Fix ambiguous ifdefs in tests for user-defined options in Makefiles
4 years ago
Martin Kroeker
11ebe5fa25
Avoid redefinition warning
4 years ago
Martin Kroeker
01f01dae98
Add -msse if supported
4 years ago
Martin Kroeker
e7bf8ced6c
Build fix for systems that do not support getauxval
4 years ago
Martin Kroeker
0256294921
Fix syntax mixup
4 years ago
Martin Kroeker
2b114c3f30
Restore proper Makefile
4 years ago
Martin Kroeker
60e1fddca7
Ensure that the same (large) BUFFERSIZE is used for all cpus in DYNAMIC_ARCH builds
4 years ago
Martin Kroeker
ebb8788696
Use ifneq instead of ifdef for CROSS option
4 years ago
Martin Kroeker
857afcc41d
Use ifeq instead of ifdef for user-definable build options
4 years ago
Martin Kroeker
5fa305172a
Use ifeq instead of ifdef for user-definable options
4 years ago
Martin Kroeker
d3ff1f889f
Convert ifndefs to ifneq
4 years ago
Martin Kroeker
65eb7afaf4
Change ifndef CROSS to ifneq
4 years ago
Martin Kroeker
8a6b17f97d
Change ifndefs to ifneq
4 years ago
Martin Kroeker
0f863f96e4
Merge pull request #112 from xianyi/develop
rebase
4 years ago
Martin Kroeker
437702e0e1
Merge pull request #2965 from epsilon-0/develop
allow setting soname without suffix or prefix
4 years ago
Martin Kroeker
f1bf040b25
Merge pull request #2988 from xiegengxin/smp-asum
Improve the performance of dasum and sasum when SMP is defined
4 years ago
Martin Kroeker
613e3b2baf
Merge pull request #2997 from Flamefire/reproduce_crash
Add reproducer test for crash after fork
4 years ago
Xianyi Zhang
05a0ea2340
Merge branch 'risc-v' into develop
4 years ago
Xianyi Zhang
7037849498
Merge branch 'develop' into risc-v
4 years ago
Xianyi Zhang
c6c9c24d1b
Update doc for C910.
4 years ago
Martin Kroeker
6dd71af0c3
Merge pull request #2995 from Flamefire/fix_thread_buffer_init
Don't overwrite blas_thread_buffer if already set
4 years ago
Alexander Grund
a05dc6e62b
Add reproducer test for crash after fork
See #2993 for an analysis
4 years ago
Alexander Grund
60005eb47b
Don't overwrite blas_thread_buffer if already set
After a fork it is possible that blas_thread_buffer already holds
allocated memory buffers: goto_set_num_threads allocates them, and it
may be called by num_cpu_avail when the OpenBLAS NUM_THREADS differs
from the OMP num threads.
Overwriting those buffers leaks memory, which can cause subsequent
execution of BLAS kernels to fail.
Fixes #2993
4 years ago
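The guard described in commit 60005eb47b above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the OpenBLAS source: `thread_buffer`, `init_thread_buffers`, `free_thread_buffers`, `MAX_THREADS`, `BUFFER_BYTES`, and `alloc_count` are hypothetical names chosen for the example, and plain `malloc` stands in for whatever allocator the library actually uses.

```c
#include <assert.h>
#include <stdlib.h>

#define MAX_THREADS 4       /* hypothetical pool size */
#define BUFFER_BYTES 1024   /* hypothetical per-thread buffer size */

/* Hypothetical stand-in for a per-thread buffer table.
 * Static storage, so every slot starts out as NULL. */
static void *thread_buffer[MAX_THREADS];

/* Counts successful allocations so the guard's effect can be observed. */
static int alloc_count = 0;

/* Allocate buffers for the first n threads.
 * Slots that already hold a buffer are left alone, so calling this a
 * second time (for example after another code path has already set up
 * some buffers) does not overwrite, and thereby leak, earlier allocations. */
void init_thread_buffers(int n) {
    for (int i = 0; i < n && i < MAX_THREADS; i++) {
        if (thread_buffer[i] == NULL) {
            thread_buffer[i] = malloc(BUFFER_BYTES);
            if (thread_buffer[i] != NULL) {
                alloc_count++;
            }
        }
    }
}

/* Release every buffer and reset the slots to NULL. */
void free_thread_buffers(void) {
    for (int i = 0; i < MAX_THREADS; i++) {
        free(thread_buffer[i]);
        thread_buffer[i] = NULL;
    }
}
```

Calling `init_thread_buffers(2)` and then `init_thread_buffers(4)` leaves the first two pointers unchanged and performs only two further allocations. Without the `NULL` check, the second call would replace the first two pointers and the original buffers would become unreachable.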
Anton Blanchard
043f3d6faa
POWER10: Use POWER9 as a fallback
If the toolchain is too old, or the MMA feature isn't set on POWER10,
fall back to the POWER9 loops.
4 years ago