Werner Saar
3119def9a7
updated cdot and zdot
10 years ago
Werner Saar
33b332372a
add optimized cdot- and zdot-kernel for sandybridge
10 years ago
Werner Saar
fd838c75bc
add optimized cdot- and zdot-kernel for haswell
10 years ago
Werner Saar
b57a60dac8
updated cdot and zdot for piledriver
10 years ago
Werner Saar
5c51163972
added optimized cdot- and zdot-kernel for steamroller
10 years ago
Werner Saar
9299d8cfd6
added optimized cdot- and zdot-kernels for bulldozer
10 years ago
Zhang Xianyi
0a3d3b945d
Refs #535. Fix the wrong vector instruction in sgemm sandy bridge kernel.
10 years ago
Werner Saar
60c6dec6e6
updated some lines for bulldozer
10 years ago
Werner Saar
47898cca35
added optimized saxpy- and daxpy-kernel for sandybridge
10 years ago
Werner Saar
53bb924287
added optimized saxpy- and daxpy-kernel for haswell
10 years ago
Werner Saar
a901b065d3
added optimized ddot-kernel for sandybridge
10 years ago
Werner Saar
3937e2a0a0
add optimized sdot-kernel for sandybridge
10 years ago
Werner Saar
9707d608d5
removed double definition line
10 years ago
Werner Saar
701b9d7556
added optimized sdot- and ddot-kernel for HASWELL
10 years ago
Zhang Xianyi
e5b96e55a7
Fix build bug for ARM64.
10 years ago
Hank Anderson
84d90d6ed8
Fixed some compiler errors/warnings for clang.
10 years ago
Hank Anderson
518e2424a8
Fixed bad filename for cpuid.S compile.
10 years ago
Zhang Xianyi
ea7f9dacf4
Refs #509. Fixed geadd build bug with DYNAMIC_ARCH=1.
10 years ago
Hank Anderson
0d8e227ea7
Changed strategy for setting preprocessor definitions.
Instead of generating separate object files for each permutation of
defines for a source file, GenerateNamedObjects now writes an entirely
new source file and inserts the defines as C #define statements.
This solves a problem I ran into with ar.exe where it was refusing to
link objects that had the same filename despite having different paths.
10 years ago
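The strategy described in the commit above (one generated source file per permutation of defines, instead of one object file per permutation) can be sketched roughly as follows. This is a minimal illustration in C, not the project's CMake code: `render_wrapper` is a hypothetical helper, the macro names `DOUBLE` and `TRANSA` and the file name `gemv.c` are placeholders, and pulling in the original source via `#include` is an assumption (the commit only says a new source file is written with the defines inserted).

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: render the text of a generated wrapper source into
 * `out`. Each entry of `defines` becomes one "#define NAME" line, followed by
 * an #include of the original source file (an assumption for illustration).
 * Returns the number of characters written, or -1 if `out` is too small.
 */
static int render_wrapper(char *out, size_t cap, const char *source_path,
                          const char *const *defines, size_t ndefines) {
    assert(out != NULL && source_path != NULL);
    size_t used = 0;

    for (size_t i = 0; i < ndefines; i++) {
        int n = snprintf(out + used, cap - used, "#define %s\n", defines[i]);
        if (n < 0 || (size_t)n >= cap - used) {
            return -1; /* output would be truncated */
        }
        used += (size_t)n;
    }

    int n = snprintf(out + used, cap - used, "#include \"%s\"\n", source_path);
    if (n < 0 || (size_t)n >= cap - used) {
        return -1; /* output would be truncated */
    }
    used += (size_t)n;

    return (int)used;
}
```

Because every permutation gets its own uniquely named source file, the resulting object files also get unique names, which sidesteps the archiver problem the commit mentions with same-named objects in different directories.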
Hank Anderson
12d1fb2e40
Fixed incorrect object name in kernel CMakeLists.txt
10 years ago
Hank Anderson
1b7f427401
Added conj gemv objects for complex build.
10 years ago
Hank Anderson
b2284647a3
More complex objects.
10 years ago
Hank Anderson
a6116e5859
Added some more complex-only objects.
10 years ago
Hank Anderson
714638c187
Added some TRMM objects for complex types.
10 years ago
Hank Anderson
e27c372e53
Fixed reuse of float_char from parent loop.
Fixed in/it/on/otcopy names.
10 years ago
Hank Anderson
f3f2b3d768
Added complex and single netlib-lapack fortran sources to lapack.cmake.
10 years ago
Hank Anderson
9492298048
Added other float types to Makefile.L3.
10 years ago
Hank Anderson
14fd3d35de
Added checks for missing defines in kernel.
10 years ago
Hank Anderson
cebc07cebd
ParseMakefileVars now recursively parses included makefiles.
10 years ago
Hank Anderson
33c5e8db7f
Added a helper function for setting the L1 kernel defaults.
Added loop to build objects with different KERNEL defines.
10 years ago
Martin Koehler
39cc6b21d3
Add ATLAS-style ?geadd function
10 years ago
Hank Anderson
4662a0b13a
Changed generate functions to iterate through a list of float types.
This will generate obj files for SINGLE/DOUBLE/COMPLEX/DOUBLE COMPLEX.
10 years ago
Hank Anderson
162791e30e
Added common objects from kernel Makefile.
10 years ago
Hank Anderson
c0624a26be
Fixed some dgemm_copy function names.
10 years ago
Hank Anderson
4bfaf1ce66
Removed some list appends I missed.
10 years ago
Hank Anderson
e8c39138c6
Removed return value from GenerateNamedObjects.
It sets DBLAS_OBJS directly to save a bunch of list appending in the
CMakeLists.txt files.
10 years ago
Hank Anderson
f992799226
Added the rest of Makefile.L3.
10 years ago
Hank Anderson
4c65afcce1
Changed kernel filenames to vars. These will need to be read from KERNEL.
Added some kernel/L3 objects.
10 years ago
Hank Anderson
7fa5c4e2fd
Fixed some case issues with ARCH.
Added some kernel and driver/others objects.
10 years ago
Hank Anderson
fa0e6a6c93
Added the rest of the L1 kernel makefile.
10 years ago
Hank Anderson
38681fb1c6
Added more kernel files.
10 years ago
Hank Anderson
189fadfde0
Started implementing kernel/Makefile in cmake.
10 years ago
Zhang Xianyi
229ce2ccd1
Add cortex-a9 and cortex-a15 targets.
11 years ago
Zhang Xianyi
41aad0407f
Merge pull request #482 from jeromerobert/develop
Allow gemv and ger buffer allocation on the stack
11 years ago
Werner Saar
ddf983d643
added optimizations for steamroller
11 years ago
Werner Saar
4319769b79
added target processor STEAMROLLER
11 years ago
Jerome Robert
e9d9a8eae3
Allow gemv and ger buffer allocation on the stack
ger and gemv call blas_memory_alloc/free, which in turn
call blas_lock. blas_lock creates thread contention when matrices
are small and the number of threads is high enough. We avoid
calling blas_memory_alloc by replacing it with stack allocation.
This can be enabled with:
make -DMAX_STACK_ALLOC=2048
The given size (in bytes) must be high enough to avoid thread contention
and small enough to avoid stack overflow.
Fix #478
11 years ago
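The idea in the commit above (use a stack buffer when the scratch space is small, and fall back to a heap allocation otherwise) can be sketched as below. This is an illustrative sketch under stated assumptions, not the library's code: `toy_gemv` and `fits_on_stack` are hypothetical names, `malloc`/`free` stand in for `blas_memory_alloc`/`blas_memory_free`, and the `MAX_STACK_ALLOC` macro here only mirrors the name used in the commit message.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Threshold in bytes; the name mirrors the commit message, the value is an example. */
#ifndef MAX_STACK_ALLOC
#define MAX_STACK_ALLOC 2048
#endif

/* Returns 1 if a scratch buffer of `bytes` fits under the stack threshold. */
static int fits_on_stack(size_t bytes) {
    return bytes > 0 && bytes <= MAX_STACK_ALLOC;
}

/*
 * Toy y = A * x for a row-major m x n matrix (hypothetical routine).
 * The result is accumulated in a scratch buffer that lives on the stack when
 * it is small enough, and on the heap otherwise.
 * Returns 1 if the stack buffer was used, 0 if the heap was used,
 * and -1 if the heap allocation failed.
 */
static int toy_gemv(int m, int n, const double *a, const double *x, double *y) {
    assert(m > 0 && n > 0);

    double stack_buf[MAX_STACK_ALLOC / sizeof(double)];
    size_t bytes = (size_t)m * sizeof(double);
    int on_stack = fits_on_stack(bytes);

    /* malloc stands in for the lock-protected allocator described above. */
    double *buf = on_stack ? stack_buf : malloc(bytes);
    if (buf == NULL) {
        return -1;
    }

    for (int i = 0; i < m; i++) {
        double acc = 0.0;
        for (int j = 0; j < n; j++) {
            acc += a[(size_t)i * (size_t)n + (size_t)j] * x[j];
        }
        buf[i] = acc;
    }
    memcpy(y, buf, bytes);

    if (!on_stack) {
        free(buf);
    }
    return on_stack;
}
```

With the example threshold of 2048 bytes, a call with `m = 2` uses the stack buffer, while `m = 1000` (8000 bytes of doubles) takes the heap path. The trade-off is the one the commit states: the threshold has to be large enough to cover the small-matrix cases that suffer from contention, yet small enough not to overflow the stack.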
Werner Saar
587e16fba3
Ref #458: Backport, sandybridge uses nehalem zgemm kernel
11 years ago
Werner Saar
6261342de3
small optimization on dgemm_kernel for N=1
11 years ago
Werner Saar
bc5fff7085
changed inline assembler labels to short form
11 years ago