Jerome Robert
1fe3aab047
Use GEMM_MULTITHREAD_THRESHOLD as a number of ops
...not a matrix size. For GEMM_MULTITHREAD_THRESHOLD=4
(the default value) this changes nothing, but
for other values it makes the GEMM and GEMV thresholds
change in the same way.
Close #742
9 years ago
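The commit above compares GEMM_MULTITHREAD_THRESHOLD against an operation count rather than a matrix dimension. Below is a minimal sketch of that idea. The scaling constants GEMM_OPS_PER_UNIT and GEMV_OPS_PER_UNIT and the helpers gemm_threads/gemv_threads are hypothetical, chosen so that a threshold of 4 gives cutoffs of 64*64*64 ops for GEMM and 64*64 ops for GEMV. Because both cutoffs are proportional to the same threshold, changing it moves them together.

```c
#include <assert.h>

/* Illustrative scaling constants (assumed for this sketch): they turn the
 * user-facing threshold into an operation count for each routine. */
#define GEMM_OPS_PER_UNIT 65536.0
#define GEMV_OPS_PER_UNIT 1024.0

/* GEMM performs about m*n*k multiply-adds: use one thread at or below the cutoff. */
static int gemm_threads(long m, long n, long k, int threshold, int max_threads)
{
    assert(threshold > 0 && max_threads > 0);
    double ops = (double)m * (double)n * (double)k; /* double avoids integer overflow */
    return ops <= GEMM_OPS_PER_UNIT * threshold ? 1 : max_threads;
}

/* GEMV performs about m*n multiply-adds: same rule, same threshold. */
static int gemv_threads(long m, long n, int threshold, int max_threads)
{
    assert(threshold > 0 && max_threads > 0);
    double ops = (double)m * (double)n;
    return ops <= GEMV_OPS_PER_UNIT * threshold ? 1 : max_threads;
}
```

For example, gemm_threads(64, 64, 64, 4, 8) stays single-threaded, while gemm_threads(65, 64, 64, 4, 8) uses all 8 threads.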
Jerome Robert
1a1935507b
[z]ger: increase multithread threshold
The ones given in 3ae30cd were far too low because I
mixed up m and m*n in my measurements. Note that the new ones
are close to the [z]gemv ones, which is reassuring:
it suggests both are right.
10 years ago
Jerome Robert
66eafb16cf
swap: disable multi-threading for small matrices
Close #731
9 years ago
Jerome Robert
3ae30cd6b9
Disable multi-threading for small matrices in [z]ger
Ref #731
10 years ago
Jerome Robert
87a2ccc37c
Factorize MAX_STACK_ALLOC code to common_stackalloc.h
Ref #727
10 years ago
Jerome Robert
f9890a6452
Fix compilation when MAX_STACK_ALLOC is not set
Close #722
10 years ago
Zhang Xianyi
285d042b10
Fixed rotg bug on ARM.
10 years ago
Zhang Xianyi
640cccc2b1
Refs #697. Fixed gemv bug on Windows.
Thanks to matzeri's patch.
10 years ago
Ralph Campbell
55a0b27c01
Minor C code fixes in interface/
10 years ago
Zhang Xianyi
2feef49fa8
Merge branch 'develop' into cmake
Conflicts:
driver/others/memory.c
10 years ago
Zhang Xianyi
5a291606ad
Refs #671. The return value of i?max cannot be larger than N.
10 years ago
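The invariant named in the commit above, that i?max never returns an index larger than N, can be illustrated with a reference-style idamax plus a clamping guard. This is a sketch: ref_idamax follows the reference BLAS convention (1-based index of the first largest |x[i]|, 0 when n < 1 or incx <= 0), and clamp_iamax is a hypothetical interface-level guard, not necessarily the exact change made in this commit.

```c
#include <assert.h>

/* Reference-style idamax: 1-based index of the first element with the
 * largest absolute value; 0 when n < 1 or incx <= 0. */
static long ref_idamax(long n, const double *x, long incx)
{
    if (n < 1 || incx <= 0) return 0;
    long best = 1;
    double maxval = x[0] < 0 ? -x[0] : x[0];
    for (long i = 1; i < n; i++) {
        double v = x[i * incx];
        if (v < 0) v = -v;
        if (v > maxval) { maxval = v; best = i + 1; } /* strict >: ties keep the first */
    }
    return best;
}

/* Hypothetical guard: whatever an optimized kernel reports, the value
 * returned to the caller stays within [0, n]. */
static long clamp_iamax(long kernel_result, long n)
{
    if (kernel_result > n) return n;
    if (kernel_result < 0) return 0;
    return kernel_result;
}
```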
Zhang Xianyi
8fade093aa
Fixed cmake bug on Visual Studio.
10 years ago
Zhang Xianyi
94b125255f
Merge branch 'develop' into cmake
Conflicts:
driver/others/memory.c
10 years ago
Zhang Xianyi
baec8f5cac
Refs #638. Fixed compilation bug with clang on Mac OS X.
10 years ago
Martin Koehler
711ca33bc6
Improved Ximatcopy when lda==ldb.
The Ximatcopy functions create a copy of the input matrix
although they appear to work in place. The new routines
XIMATCOPY_K_YY perform the operations in place if the leading
dimension does not change.
10 years ago
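When the leading dimension does not change, the commit above lets imatcopy work on the matrix directly instead of through a temporary copy. A minimal sketch of the two simplest in-place cases follows, assuming column-major storage. The function names are hypothetical (they are not the XIMATCOPY_K_YY kernels), and the transposed case is restricted to square matrices, where an in-place transpose is a simple swap across the diagonal.

```c
#include <assert.h>
#include <stddef.h>

/* A := alpha * A in place for a column-major rows x cols matrix (lda == ldb),
 * so no temporary copy is needed. Padding rows beyond `rows` are untouched. */
static void imatcopy_n_inplace(size_t rows, size_t cols, double alpha,
                               double *a, size_t lda)
{
    assert(lda >= rows);
    for (size_t j = 0; j < cols; j++)
        for (size_t i = 0; i < rows; i++)
            a[i + j * lda] *= alpha;
}

/* A := alpha * A^T in place, square n x n case only: swap each pair across
 * the diagonal, scaling both entries; diagonal entries are scaled once. */
static void imatcopy_t_square_inplace(size_t n, double alpha, double *a, size_t lda)
{
    assert(lda >= n);
    for (size_t j = 0; j < n; j++) {
        a[j + j * lda] *= alpha;
        for (size_t i = j + 1; i < n; i++) {
            double tmp = a[i + j * lda];
            a[i + j * lda] = alpha * a[j + i * lda];
            a[j + i * lda] = alpha * tmp;
        }
    }
}
```

Non-square in-place transposes need a cycle-following algorithm, which this sketch deliberately leaves out.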
Zhang Xianyi
f874465bb8
Use cmake to build OpenBLAS GENERIC Target on MSVC x86 64-bit.
Disable CBLAS and LAPACK.
10 years ago
Zhang Xianyi
dcd5ba4443
Merge branch 'cmake' of https://github.com/hpanderson/OpenBLAS into hpanderson_cmake
10 years ago
Werner Saar
f8f2e261fe
use only 1 thread if m or n < 2*GEMM_MULTITHREAD_THRESHOLD
10 years ago
Jerome Robert
ab567d8443
gemv: Ensure stack buffer is large enough to handle memory alignment
Ref #478
10 years ago
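The alignment issue in the commit above arises because a kernel may round the buffer's start address up to an aligned boundary, which consumes up to align - 1 bytes. A buffer sized exactly to the data can then be too small. Below is a generic sketch of the usual remedy, reserving need + align - 1 bytes. The helpers align_up and padded_size are hypothetical names; the commit's actual sizing formula is not reproduced here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Round p up to the next multiple of align (align must be a power of two). */
static void *align_up(void *p, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    uintptr_t u = (uintptr_t)p;
    return (void *)((u + align - 1) & ~(uintptr_t)(align - 1));
}

/* Bytes to reserve so that `need` usable bytes remain after aligning an
 * arbitrary start address: the worst case skips align - 1 bytes. */
static size_t padded_size(size_t need, size_t align)
{
    return need + align - 1;
}
```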
Zhang Xianyi
847e19c04e
Refs #478, #482. Enable stack alloc for s/dgemv_t (revert 9798491).
10 years ago
Zhang Xianyi
fd9fd42936
Refs #478, #482. Fixed bug in previous commit.
10 years ago
Zhang Xianyi
9798481979
Refs #478, #482. Fix segfault bug for gemv_t with MAX_STACK_ALLOC flag.
For gemv_t, directly use malloc to create the buffer.
10 years ago
Zhang Xianyi
cdefdb21cd
Refs #492. Fixed c/zsyr bug with negative incx.
10 years ago
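The fix above concerns the BLAS rule for negative increments: with incx < 0, the vector is traversed backwards, so logical element 0 sits at memory offset (n-1)*|incx|. Below is a sketch of that rule applied to a symmetric rank-1 update. It is real-valued (dsyr-style, lower triangle, column-major) rather than the complex c/zsyr touched by the commit, and blas_offset/dsyr_lower are hypothetical names.

```c
#include <assert.h>

/* Memory offset of logical element i (0-based) of a strided vector of
 * length n. For inc < 0 the walk starts at the far end of the array. */
static long blas_offset(long i, long n, long inc)
{
    return inc >= 0 ? i * inc : (i - (n - 1)) * inc;
}

/* A := alpha * x * x^T + A, lower triangle only, column-major,
 * honouring negative incx through blas_offset. */
static void dsyr_lower(long n, double alpha, const double *x, long incx,
                       double *a, long lda)
{
    assert(incx != 0 && lda >= n);
    for (long j = 0; j < n; j++) {
        double t = alpha * x[blas_offset(j, n, incx)];
        for (long i = j; i < n; i++)
            a[i + j * lda] += x[blas_offset(i, n, incx)] * t;
    }
}
```

With x = {1, 2, 3} and incx = -1, the update behaves as if x were (3, 2, 1).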
Hank Anderson
0d8e227ea7
Changed strategy for setting preprocessor definitions.
Instead of generating separate object files for each permutation of
defines for a source file, GenerateNamedObjects now writes an entirely
new source file and inserts the defines as C #define statements.
This solves a problem I ran into with ar.exe where it was refusing to
link objects that had the same filename despite having different paths.
10 years ago
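The strategy described above can be pictured as a generated file that begins with the #define lines formerly passed as -D flags, followed by the shared generic source. In this sketch, the file layout, the function name dscal_sketch, and the FLOAT typedef are assumptions for illustration, not the files GenerateNamedObjects actually writes. A real generated file would pull the generic source in with #include; here it is inlined so the sketch compiles on its own.

```c
#include <assert.h>

/* Defines that would otherwise be passed on the command line, e.g. -DNAME=... */
#define DOUBLE
#define NAME dscal_sketch

/* --- generic source (normally #included from the shared .c file) --- */
#ifdef DOUBLE
typedef double FLOAT;
#else
typedef float FLOAT;
#endif

/* x := alpha * x, compiled under whatever NAME and precision the defines chose. */
void NAME(long n, FLOAT alpha, FLOAT *x, long incx)
{
    assert(incx > 0);
    for (long i = 0; i < n; i++)
        x[i * incx] *= alpha;
}
```

Because each variant lives in its own uniquely named source file, each object file also gets a unique name.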
Hank Anderson
b2284647a3
More complex objects.
10 years ago
Hank Anderson
a6116e5859
Added some more complex-only objects.
10 years ago
Hank Anderson
67e39bd8fb
Added mangled complex filenames to interface and lapack CMakeLists.txt.
10 years ago
Hank Anderson
9eb1499095
Added another param to GenerateNamedObjects to mangle complex source names.
There are a lot of sources for complex float types that have the same
names as the real sources, except with z prepended.
10 years ago
Martin Koehler
39cc6b21d3
Add ATLAS-style ?geadd function
10 years ago
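An ATLAS-style geadd performs the matrix update C := alpha*A + beta*C. Below is a minimal column-major sketch of the double-precision case. The name dgeadd_sketch is hypothetical, and skipping the read of C when beta == 0 follows the usual BLAS convention for beta; treating that convention as applying to geadd is an assumption.

```c
#include <assert.h>

/* C := alpha*A + beta*C for column-major rows x cols matrices. */
static void dgeadd_sketch(long rows, long cols, double alpha,
                          const double *a, long lda,
                          double beta, double *c, long ldc)
{
    assert(lda >= rows && ldc >= rows);
    for (long j = 0; j < cols; j++)
        for (long i = 0; i < rows; i++) {
            double av = alpha * a[i + j * lda];
            /* beta == 0: overwrite C without reading it */
            c[i + j * ldc] = (beta == 0.0) ? av : av + beta * c[i + j * ldc];
        }
}
```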
Hank Anderson
4662a0b13a
Changed generate functions to iterate through a list of float types.
This will generate obj files for SINGLE/DOUBLE/COMPLEX/DOUBLE COMPLEX.
10 years ago
Hank Anderson
e74462a3f5
Moved declarations to start of functions to satisfy MSVC C89 implementation.
10 years ago
Hank Anderson
e8c39138c6
Removed return value from GenerateNamedObjects.
It sets DBLAS_OBJS directly to save a bunch of list appending in the
CMakeLists.txt files.
10 years ago
Hank Anderson
58cff2fed8
Added CBLAS define/naming convention to GenerateNamedObjects.
10 years ago
Hank Anderson
5690cf3f0e
Added override for function names in GenerateNamedObjects.
The BLAS interface folder should now generate the correct objects
for the DOUBLE case.
10 years ago
Hank Anderson
a0aeda6187
Added function to set defines for the object names (e.g. -DNAME=dgemm).
10 years ago
Hank Anderson
20e593a44a
Added cblas_ objects to interface CMakeLists.
Naming isn't right, though: no cblas_xxxx exports show up in the
resulting library.
10 years ago
Hank Anderson
9e154aba58
Added LAPACK object files to interface CMakeLists.
10 years ago
Hank Anderson
5057a4b4df
Added openblas add_library call that uses DBLAS_OBJS objects.
10 years ago
Hank Anderson
a6cf8aafc0
Updated level3/CMakeLists with correct defines using all combos.
10 years ago
Jerome Robert
b17ccb4c5c
Fix a segfault in gemv when MAX_STACK_ALLOC is set
* stack_alloc_size is needed after the implementation call,
but it may be overwritten if it is kept in a register,
because some gemv implementations (e.g. dgemv_n.S) do not
restore all registers (e.g. r10).
* Do the same in ger.c for the same reason, even though the bug
has not been observed there.
10 years ago
Hank Anderson
5eefe18ae4
Added CMakeLists.txt for the first of the BLAS folders.
It only does the double precision compile currently.
I realized I didn't finish converting Makefile.system yet, so I made
a note of that.
10 years ago
Jerome Robert
e9d9a8eae3
Allow to do gemv and ger buffer allocation on the stack
ger and gemv call blas_memory_alloc/free, which in turn
call blas_lock. blas_lock creates thread contention when matrices
are small and the number of threads is high enough. We avoid
calling blas_memory_alloc by replacing it with stack allocation.
This can be enabled with:
make -DMAX_STACK_ALLOC=2048
The given size (in bytes) must be large enough to avoid thread contention
and small enough to avoid stack overflow.
Fix #478
11 years ago
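The stack-allocation scheme in the commit above is sketched below as a toy y = A*x routine. Scratch space that fits within MAX_STACK_ALLOC bytes comes from the stack, so no allocator lock is taken. Larger requests fall back to the heap, with malloc standing in for OpenBLAS's internal blas_memory_alloc. The function gemv_n_sketch and its return codes are hypothetical. For simplicity this sketch always reserves the full MAX_STACK_ALLOC-sized array; a production version could size it to the request instead.

```c
#include <assert.h>
#include <stdlib.h>

/* Assumed default limit in bytes for this sketch (the commit's example uses 2048). */
#ifndef MAX_STACK_ALLOC
#define MAX_STACK_ALLOC 2048
#endif

/* y = A * x for a column-major m x n matrix A, accumulating in a scratch
 * buffer. Returns 1 if the buffer was on the stack, 0 if it came from the
 * heap, and -1 if heap allocation failed. */
static int gemv_n_sketch(int m, int n, const double *a, int lda,
                         const double *x, double *y)
{
    assert(m >= 0 && n >= 0 && lda >= m);
    double stack_buf[MAX_STACK_ALLOC / sizeof(double)];
    size_t need = (size_t)m * sizeof(double);
    int on_stack = need <= sizeof stack_buf;
    double *buf = on_stack ? stack_buf : malloc(need); /* heap path may contend */
    if (buf == NULL) return -1;

    for (int i = 0; i < m; i++) buf[i] = 0.0;
    for (int j = 0; j < n; j++)
        for (int i = 0; i < m; i++)
            buf[i] += a[i + (size_t)j * lda] * x[j];
    for (int i = 0; i < m; i++) y[i] = buf[i];

    if (!on_stack) free(buf);
    return on_stack;
}
```

The trade-off matches the commit's advice: a larger MAX_STACK_ALLOC keeps more calls off the locked allocator but increases the stack footprint of each call.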
wernsaar
9e829ce98f
enabled cblas gemm3m functions
11 years ago
wernsaar
d49fd33885
disabled SYMM3M and HEMM3M functions because of segmentation violations
11 years ago
wernsaar
7aae4a62e7
enabled use of GEMM3M functions
11 years ago
wernsaar
3300f5ebff
optimized multithreading lower limits
11 years ago
wernsaar
fd2478c9e2
optimized interface/zgemv.c for multithreading
11 years ago
Zhang Xianyi
1cba8e7b11
Merge pull request #446 from grisuthedragon/cblas_matcopy
Add a CBLAS interface for the BLAS extension s/d/c/z*matcopy routines.
11 years ago
Martin Koehler
a057e5434d
add CBLAS interface for s/d/c/zimatcopy
11 years ago
Martin Köhler
7794766d3c
Add cblas_(s/d/c/z)omatcopy to provide a CBLAS interface for them.
11 years ago