Jerome Robert
b17ccb4c5c
Fix a segfault in gemv when MAX_STACK_ALLOC is set
* stack_alloc_size is needed after the implementation call
but it may be overwritten if it's optimized to a register,
because some gemv implementations (e.g. dgemv_n.S) do not
restore all registers (e.g. r10).
* do the same in ger.c for the same reasons even if the bug
has not been observed.
10 years ago
Hank Anderson
5eefe18ae4
Added CMakeLists.txt for the first of the BLAS folders.
It only does the double precision compile currently.
I realized I didn't finish converting Makefile.system yet, so I made
a note of that.
10 years ago
Jerome Robert
e9d9a8eae3
Allow to do gemv and ger buffer allocation on the stack
ger and gemv call blas_memory_alloc/free, which in turn
call blas_lock. blas_lock creates thread contention when matrices
are small and the number of threads is high enough. We avoid
calling blas_memory_alloc by replacing it with stack allocation.
This can be enabled with:
make -DMAX_STACK_ALLOC=2048
The given size (in bytes) must be high enough to avoid thread contention
and small enough to avoid stack overflow.
Fix #478
10 years ago
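The stack-versus-heap decision described in this commit can be sketched roughly as follows. This is a minimal illustration, not the OpenBLAS code: scaled_sum is a hypothetical routine, malloc/free stand in for blas_memory_alloc/blas_memory_free, and the sketch assumes a compiler with variable-length array support (GCC or Clang in C99 mode or later). Declaring the element count volatile is one way to keep it in memory rather than in a register, so the decision is still readable after the compute step; that choice is an assumption of this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#ifndef MAX_STACK_ALLOC
#define MAX_STACK_ALLOC 2048 /* bytes, the value used in the commit message */
#endif

/*
 * Hypothetical routine, not part of OpenBLAS: computes sum(alpha * x[i])
 * through a scratch buffer. Returns 1 if the scratch buffer lived on the
 * stack, 0 if it came from the heap, and -1 if the heap allocation failed.
 */
static int scaled_sum(const double *x, size_t n, double alpha, double *result)
{
    assert(x != NULL && result != NULL);

    *result = 0.0;
    if (n == 0) return 1; /* nothing to allocate */

    /* volatile keeps the count in memory, so it survives the compute step */
    volatile size_t stack_elems = n;
    if (stack_elems > MAX_STACK_ALLOC / sizeof(double)) stack_elems = 0;

    /* Variable-length array; the length is never zero, which would be undefined */
    double stack_buffer[stack_elems ? stack_elems : 1];

    /* Small request: use the stack. Large request: fall back to the heap. */
    double *buffer = stack_elems ? stack_buffer : malloc(n * sizeof(double));
    if (buffer == NULL) return -1;

    for (size_t i = 0; i < n; i++) buffer[i] = alpha * x[i];
    for (size_t i = 0; i < n; i++) *result += buffer[i];

    /* The count is read again here to decide whether to free the buffer */
    int used_stack = (stack_elems != 0);
    if (!used_stack) free(buffer);
    return used_stack;
}
```

With the default limit of 2048 bytes, a call with 4 doubles takes the stack path and a call with 1024 doubles falls back to the heap. When compiling this sketch on its own, the limit can be changed with the compiler's -D flag, for example -DMAX_STACK_ALLOC=4096.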
wernsaar
9e829ce98f
enabled cblas gemm3m functions
11 years ago
wernsaar
d49fd33885
disabled SYMM3M and HEMM3M functions because of segmentation violations
11 years ago
wernsaar
7aae4a62e7
enabled use of GEMM3M functions
11 years ago
wernsaar
3300f5ebff
optimized multithreading lower limits
11 years ago
wernsaar
fd2478c9e2
optimized interface/zgemv.c for multithreading
11 years ago
Zhang Xianyi
1cba8e7b11
Merge pull request #446 from grisuthedragon/cblas_matcopy
Add a CBLAS interface for the BLAS extension s/d/c/z*matcopy routines.
11 years ago
Martin Koehler
a057e5434d
add CBLAS interface for s/d/c/zimatcopy
11 years ago
Martin Köhler
7794766d3c
Add cblas_(s/d/c/z)omatcopy in order to have cblas interface for them.
11 years ago
wernsaar
f511807fc0
modified multithreading threshold
11 years ago
wernsaar
d1800397f5
optimized interface/gemv.c for multithreading
11 years ago
wernsaar
f4ff889491
updated interface/gemv.c for multithreading
11 years ago
wernsaar
51413925bd
adjust number of threads for small size in cgemv and zgemv
11 years ago
wernsaar
b985cea65d
adjust number of threads for sgemv and dgemv
11 years ago
wernsaar
d286daa2ba
adjusted number of threads for small size
11 years ago
wernsaar
cedc1f4b14
Ref #410 : disabled optimized potri functions (single threading bug)
11 years ago
wernsaar
02a504c0b8
fixed my bug in ger.c
11 years ago
wernsaar
be94db096c
disabled *3M functions for x86_64 platforms
11 years ago
wernsaar
aee61456a4
disabled SMP for sbmv and zsbmv again
11 years ago
wernsaar
01a119abfc
enabled SMP for sbmv and zsbmv, but only for 64bit binaries
11 years ago
wernsaar
1fad2b759f
enabled smp for ger.c and zger.c, but only for 64bit binaries
11 years ago
Timothy Gu
6c2ead30f0
Remove all trailing whitespace except lapack-netlib
Signed-off-by: Timothy Gu <timothygu99@gmail.com>
11 years ago
wernsaar
15d5dfa92c
fixed compiler warnings
11 years ago
wernsaar
86d8c8978b
Ref #391 : disabled SMP in ger.c and zger.c
11 years ago
wernsaar
a19d209005
Ref #103 : enhancement for small matrix dimensions
11 years ago
wernsaar
faeab93df0
Ref #51 : added blas extensions simatcopy, dimatcopy, cimatcopy, zimatcopy
11 years ago
wernsaar
cee257f384
Ref #51 : added blas extensions zomatcopy and comatcopy
11 years ago
wernsaar
7bfb3011e8
Ref #51 : added blas extension somatcopy
11 years ago
wernsaar
8c8f596238
Ref #51 : added blas extension domatcopy as non-optimized reference
11 years ago
wernsaar
bff575d0b1
Ref #375 : added workaround for small sizes to scal.c and zscal.c
11 years ago
wernsaar
faf3ac0aad
Ref #285 : added axpby kernels
11 years ago
Zhang Xianyi
b31ec99372
Fixed #374.
Merge branch 'TimothyGu-develop' into develop
11 years ago
wernsaar
25e899b60b
fixed function profile in zpotri.c
11 years ago
wernsaar
89da450800
enabled and tested optimized potri lapack functions
11 years ago
wernsaar
c26bbee489
enabled and tested optimized trtri lapack functions
11 years ago
Timothy Gu
ced13574a0
Random "walk (a)round" --> "work-around" typo fixes
Signed-off-by: Timothy Gu <timothygu99@gmail.com>
11 years ago
wernsaar
a748d3a75d
enabled optimized trti2 lapack functions again
11 years ago
wernsaar
a5ab231ad4
enabled optimized complex lauum lapack functions again
11 years ago
wernsaar
dbaeea7b59
enabled lauu2 and lauum lapack functions again
11 years ago
wernsaar
0d75f3b6a2
enabled and tested optimized gesv lapack functions
11 years ago
wernsaar
abad6f66d6
marked trti2.c and ztrti2.c as bad
11 years ago
wernsaar
2ff66e661d
enabled and tested optimized laswp lapack function
11 years ago
wernsaar
5e55034922
marked zlauu2.c and zlauum.c as bad
11 years ago
wernsaar
9a9e810239
marked trtri.c and ztrtri.c as bad
11 years ago
wernsaar
45be9ac111
moved trtri.c and ztrtri.c to the directory lapack
11 years ago
wernsaar
9f201558c9
marked lauu2.c and lauum.c as bad
11 years ago
wernsaar
d4237cb7f3
marked larf.c as obsolete
11 years ago
wernsaar
aaa9d7fbf8
marked potri functions as bad because of a lot of errors
11 years ago