Martin Kroeker
20c6c38e51
Merge branch 'develop' into atomic
7 years ago
Martin Kroeker
8ec28ff461
Remove unguarded use of _Atomic and fix tabbing
7 years ago
Martin Kroeker
bb9876db33
Fix thread races and infinite looping on systems with many cpus
On systems with more than 64 CPUs, blas_quickdivide will sometimes return zero, which creates bogus workloads when the result is used for the stride calculation. Threads then spin indefinitely, waiting for a status change that never happens, as seen in #1497.
This patch also fixes several data races that were found by helgrind and/or tsan while debugging the issue.
7 years ago
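The failure mode described in this commit (a zero chunk width feeding the stride calculation) can be illustrated with a minimal sketch. This is not the code from the commit: `split_range` is a hypothetical helper, it uses plain ceiling division rather than blas_quickdivide, and the clamp to a minimum width of 1 is only one possible way to guarantee that every worker gets a non-empty range.

```c
#include <stddef.h>

/* Hypothetical helper (not from the patch): split [0, total) into at most
 * `nthreads` contiguous chunks. Boundaries are written to range[0..count]
 * and the number of chunks is returned. `cap` is the capacity of `range`. */
static size_t split_range(long total, long nthreads, long *range, size_t cap)
{
    size_t count = 0;
    long pos = 0;

    range[0] = 0;
    while (pos < total && nthreads > 0 && count + 1 < cap) {
        /* Ceiling division of the remaining work by the remaining threads. */
        long width = (total - pos + nthreads - 1) / nthreads;

        /* Defensive clamp: a zero width would create an empty workload that
         * no thread ever completes, so waiters would spin forever. */
        if (width < 1)
            width = 1;
        if (width > total - pos)
            width = total - pos;

        pos += width;
        range[++count] = pos;
        nthreads--;
    }
    return count;
}
```

With more threads than work items, the sketch simply produces fewer chunks than threads instead of handing out empty ones.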
Martin Kroeker
40160ff3c1
Use _Atomic instead of volatile for thread safety where C11 is supported
7 years ago
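A guarded use of `_Atomic` with a `volatile` fallback might look roughly like the sketch below. The names `sketch_flag_t`, `sketch_flag_store`, `sketch_flag_load`, and `SKETCH_HAVE_C11_ATOMICS` are hypothetical and do not come from the commit; the feature test relies on the standard `__STDC_VERSION__` and `__STDC_NO_ATOMICS__` macros.

```c
/* Use C11 atomics only when the compiler advertises C11 and does not
 * opt out of atomics; otherwise fall back to volatile. */
#if defined(__STDC_VERSION__) && __STDC_VERSION__ >= 201112L && \
    !defined(__STDC_NO_ATOMICS__)
#include <stdatomic.h>
#define SKETCH_HAVE_C11_ATOMICS 1
typedef _Atomic long sketch_flag_t;
#else
#define SKETCH_HAVE_C11_ATOMICS 0
typedef volatile long sketch_flag_t;
#endif

/* With _Atomic, plain assignment and plain reads are sequentially
 * consistent atomic operations. With volatile, they are ordinary accesses
 * that the compiler may not elide, which is weaker. */
static void sketch_flag_store(sketch_flag_t *flag, long value)
{
    *flag = value;
}

static long sketch_flag_load(sketch_flag_t *flag)
{
    return *flag;
}
```

Keeping the type behind a typedef means the calling code does not change between the two variants, which is the point of guarding the keyword rather than using it unconditionally.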
Andrew
d602b99386
LAPACK helpers in C that need care too
7 years ago
Ashwin Sekhar T K
3918d17025
LAPACK: Fix lapack-test errors in ARM64 threaded version
8 years ago
Werner Saar
c81dc6322f
prepared lapack/potrf functions for UNROLL values that are not a power of two
8 years ago
Werner Saar
3e1bbd6b5f
prepared lapack/getrf functions for UNROLL values that are not a power of two
8 years ago
Werner Saar
956be69e1d
optimized getrf_single.c for POWER8
9 years ago
Werner Saar
6a2bde7a2d
optimized dgemm and dgetrf for POWER8
9 years ago
Hank Anderson
e74462a3f5
Moved declarations to start of functions to satisfy MSVC C89 implementation.
10 years ago
Hank Anderson
056ba26755
Changed a number of inline calls to use __inline.
MSVC doesn't implement C99, so it can't use the inline keyword. __inline
appears to work in both MSVC and GCC.
10 years ago
Timothy Gu
6c2ead30f0
Remove all trailing whitespace except lapack-netlib
Signed-off-by: Timothy Gu <timothygu99@gmail.com>
11 years ago
Zhang Xianyi
5048a80032
Refs #283. Fixed the incorrect use of the long data type on 64-bit Windows.
12 years ago
Zhang Xianyi
32d2ca3035
Refs #214, #221, #246. Fixed the getrf overflow bug on Windows.
I used a smaller threshold since the stack size is 1MB on Windows.
12 years ago
Zhang Xianyi
5d3312142a
Refs #221, #246. Fixed the stack overflow bug in multithreaded BLAS3.
When NUM_THREADS (MAX_CPU_NUMBER) is very large, e.g. 256:
typedef struct {
volatile BLASLONG working[MAX_CPU_NUMBER][CACHE_LINE_SIZE * DIVIDE_RATE];
} job_t;
job_t job[MAX_CPU_NUMBER];
the job array takes 8MB.
Thus, we use malloc instead of stack allocation.
12 years ago
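The 8MB figure in this commit message can be reproduced with a short sketch. The constant values here are assumptions chosen to match that figure (CACHE_LINE_SIZE of 8, DIVIDE_RATE of 2, an 8-byte BLASLONG), and `alloc_jobs` and `job_array_bytes` are hypothetical helpers, not functions from the commit.

```c
#include <stdlib.h>

/* Assumed values; the real ones depend on the build configuration. */
#define MAX_CPU_NUMBER 256
#define CACHE_LINE_SIZE 8
#define DIVIDE_RATE 2
typedef long long BLASLONG; /* assumed to be 8 bytes */

typedef struct {
    volatile BLASLONG working[MAX_CPU_NUMBER][CACHE_LINE_SIZE * DIVIDE_RATE];
} job_t;

/* 256 * 16 * 8 bytes per job_t = 32 KiB; times 256 jobs = 8 MiB. */
static size_t job_array_bytes(void)
{
    return (size_t)MAX_CPU_NUMBER * sizeof(job_t);
}

/* Allocate the job array on the heap instead of the stack.
 * The caller checks for NULL and releases the array with free(). */
static job_t *alloc_jobs(void)
{
    return (job_t *)malloc(job_array_bytes());
}
```

An 8 MiB automatic array exceeds the 1MB Windows stack size mentioned in the neighbouring commit, which is why a heap allocation is the safer choice at this size.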
Zhang Xianyi
1b056c5328
Refs #130 Prevent reading ipiv array beyond the bound in ?laswp. Use laswp instead of laswp_oncopy in getrf.
13 years ago
Xianyi Zhang
342bbc3871
Import GotoBLAS2 1.13 BSD version codes.
14 years ago