Martin Kroeker
36fcb52094
Fix logic - we want real OR imaginary part of X to be nonzero here
2 years ago
H. Vetinari
f2659516ef
remove unqualified ifdef's for NO_LAPACK(E)
2 years ago
Martin Kroeker
7f0b11fbc1
Exclude some complex drivers when NO_LAPACK is set
3 years ago
Martin Kroeker
5f6a609253
Add sbgemv
4 years ago
Chen, Guobing
a7b1f9b1bb
Implementation of BF16 based gemv
1. Add a new API -- sbgemv to support bfloat16 based gemv
2. Implement a generic kernel for sbgemv
3. Implement an avx512-bf16 based kernel for sbgemv
Signed-off-by: Chen, Guobing <guobing.chen@intel.com>
5 years ago
Martin Kroeker
887e00fd7f
Adapt for supporting only a subset of variable types
5 years ago
Martin Kroeker
3287848c8f
Support building only selected types
5 years ago
Martin Kroeker
806f89166e
Make ARMV7 compile with xcode and add a CI job for it ( #2537 )
* Add an ARMV7 iOS build on Travis
* thread_local appears to be unavailable on ARMV7 iOS
* Add no-thumb option for ARMV7 IOS build to get it to accept DMB ISH
* Make local labels in macros of nrm2_vfpv3.S compatible with the xcode assembler
5 years ago
Martin Kroeker
8617d75548
Revert "Avoid taking root of negative number in symv_thread.c"
6 years ago
Sebastian Berg
6355c25dde
Avoid taking root of negative number in symv_thread.c
This is similar to fixes in gh-1929, but there was one remaining
occurrence of this type of pattern in the driver/level2/*_thread.c
files.
6 years ago
Martin Kroeker
45333d5793
Fix error introduced during cleanup
6 years ago
Martin Kroeker
78d9910236
Correct range_n limiting
same bug as seen in #1388, somehow missed in the corresponding PR #1389
6 years ago
Martin Kroeker
5a720cf9ca
Re-enable loop unrolling in trmv and remove the scary warning
fixes #1748 as that half of the fix for #1332 appears to have been an overreaction on my part.
6 years ago
Martin Kroeker
368d14f8c8
Fix harmless typo
fixes #1872
7 years ago
Martin Kroeker
0427277cef
Allow optimization for small m, large n only if it can be made threadsafe
otherwise the introduction of a static array in 8e5a108 to improve #532 breaks concurrent calls from multiple threads as seen in #1844
7 years ago
Martin Kroeker
cc9500db41
Merge pull request #1403 from brada4/develop
Address a few more warnings
8 years ago
Andrew
bfc2a88594
remove unused buffer
8 years ago
Martin Kroeker
177b78c8b4
Issue1388 ( #1389 )
* Calculation of chunk range limits was ignoring num_cpu
bug introduced by me in #1262 - should fix #1388
* Calculation of range limits was ignoring num_cpu
bug introduced by me in #1262
* Calculation of chunk range limits was ignoring num_cpu
bug introduced by me in #1262
8 years ago
Andrew
281a2b952f
warning cleanup ( #1380 )
* dead increments in driver/level2
* dead increments in kernel/generic
* part dead increments in kernel/x86_64
8 years ago
Martin Kroeker
b414283f48
Disable gemv unrolling
as a (hopefully temporary) workaround for #1332
8 years ago
Andrew
e14d50d86e
eliminate Wunused-const gcc7 warning
8 years ago
Sacha Refshauge
37858d1146
Fix threading usage in CMake: s/SMP/USE_THREAD/
8 years ago
Martin Kroeker
719fcc56b0
Merge pull request #1262 from martin-frbg/xmv_thread-splitting
Make sure that range limit of last thread never exceeds data size
8 years ago
Martin Kroeker
0ba64cee60
Update trmv_thread.c
8 years ago
Martin Kroeker
c4e5ba1bfe
Make sure that range_n of last thread never exceeds the actual data size when splitting the workload
8 years ago
Martin Kroeker
a6f533b248
Revert "Fix calculated range limit exceeding actual data size for last thread"
8 years ago
Isuru Fernando
d245caa49a
Support out-of-source build
8 years ago
Martin Kroeker
585c0010a5
Fix range limit exceeding actual data size in last step
8 years ago
Martin Kroeker
857f61bc5d
Fix range limit exceeding data size in last step
8 years ago
Martin Kroeker
9332042d5f
Fix range exceeding actual data size in quick_divide
8 years ago
Andrew
529bfc36ec
Fix write past fixed size buffer
8 years ago
John Biddiscombe
053044ae4d
Replace CMAKE_SOURCE_DIR/CMAKE_BINARY_DIR with PROJECT_SOURCE_DIR/PROJECT_BINARY_DIR
If OpenBLAS is built using add_subdirectory(OpenBlas) as part of another project
then the paths set by CMAKE_XXX_DIR are relative to the parent project
and not the OpenBLAS project.
9 years ago
Jerome Robert
53ba1a77c8
ztrmv_L.c: no longer need a 4kB buffer
Fix #786
9 years ago
Jerome Robert
78dcf5c3d5
Improve performance of ztrmv on small matrices
* Use stack allocation
* Disable multi-threading
* Ref #727
9 years ago
Ralph Campbell
fbc21266e6
Minor C code fixes in driver/
10 years ago
Zhang Xianyi
d8392c1245
Fix cmake config bugs.
10 years ago
Zhang Xianyi
f8eba3d548
Fixed cmake build bugs on Linux.
10 years ago
Zhang Xianyi
f874465bb8
Use cmake to build OpenBLAS GENERIC Target on MSVC x86 64-bit.
Disable CBLAS and LAPACK.
10 years ago
Zhang Xianyi
dcd5ba4443
Merge branch 'cmake' of https://github.com/hpanderson/OpenBLAS into hpanderson_cmake
10 years ago
Zhang Xianyi
8e5a1083bb
Refs #532. Improve parallel gemv for the small m, large n case.
Split the matrix and the reduction.
10 years ago
Hank Anderson
ab7043373f
Fixed bug generating trmv complex source names.
10 years ago
Hank Anderson
0553476fba
Added TRANS defines for complex sources in lapack.
10 years ago
Hank Anderson
2416d9dbac
Fixed TRANSA defines for complex sources in driver/level2.
10 years ago
Hank Anderson
0d8e227ea7
Changed strategy for setting preprocessor definitions.
Instead of generating separate object files for each permutation of
defines for a source file, GenerateNamedObjects now writes an entirely
new source file and inserts the defines as C #define statements.
This solves a problem I ran into with ar.exe where it was refusing to
link objects that had the same filename despite having different paths.
10 years ago
Hank Anderson
1b7f427401
Added conj gemv objects for complex build.
10 years ago
Hank Anderson
fb5d5bb971
Added defines for complex trmv.
10 years ago
Hank Anderson
33c5e8db7f
Added a helper function for setting the L1 kernel defaults.
Added loop to build objects with different KERNEL defines.
10 years ago
Hank Anderson
4662a0b13a
Changed generate functions to iterate through a list of float types.
This will generate obj files for SINGLE/DOUBLE/COMPLEX/DOUBLE COMPLEX.
10 years ago
Hank Anderson
e8c39138c6
Removed return value from GenerateNamedObjects.
It sets DBLAS_OBJS directly to save a bunch of list appending in the
CMakeLists.txt files.
10 years ago
Hank Anderson
2f59135eb6
Added gemv to level2 CMakeLists.txt.
10 years ago