Martin Kroeker
667f0cc1cb
Merge pull request #1793 from fenrus75/ncopy
Add optimized *copy versions for skylakex
7 years ago
fengrl
d4c8853a02
Update common_mips64.h
7 years ago
Martin Kroeker
d3d58f8ee5
Catch conflicting usage of ARCH in at least some BSD environments
fixes #1796
7 years ago
Martin Kroeker
697dc1baf8
Use override for ARCH in make.inc
in case a conflicting setting of ARCH (for architecture) gets pulled in from the environment
(originally suggested by dloghin in #1753 )
7 years ago
Martin Kroeker
a9b51b8448
Merge pull request #1798 from martin-frbg/cmake-avx512
Add -march=skylake-avx512 when required
7 years ago
Martin Kroeker
eba394c711
Add -march=skylake-avx512 when required
fixes #1797
7 years ago
Arjan van de Ven
582c589727
dgemm/skylakex: replace discrete mul/add with fma
very minor gains since it's not super hot code, but general principles
7 years ago
Arjan van de Ven
adbf6afa25
Add vector optimizations for ncopy as well for dgemm/skylakex
7 years ago
Arjan van de Ven
32bec8afbb
add a skylakex optimized dgemm beta function
7 years ago
Martin Kroeker
6e2c494556
Merge pull request #1791 from dev-zero/develop
fix parallel build issues with APFS/HFS+/ext2/3 in netlib-lapack
7 years ago
Arjan van de Ven
20c5d668fe
dgemm/avx512 simplify and speed up the 4x4 kernel
7 years ago
Arjan van de Ven
6d43c51ccf
undo slow dgemm/skylake microoptimization
the compare is more costly than the work
7 years ago
Arjan van de Ven
d74dc39b0f
Add optimized *copy versions for skylakex
Add optimized n/t copy versions for skylakex; in the patch the
tcopy is also rewritten using intrinsics; the ncopy file
will be worked on in a future commit
7 years ago
Martin Kroeker
41951da6d4
Merge pull request #6 from xianyi/develop
merge develop
7 years ago
Martin Kroeker
474f7e9583
Add SYMBOLPREFIX and -SUFFIX options and improve help output
7 years ago
Tiziano Müller
79ea839b63
fix parallel build issues with APFS/HFS+/ext2/3 in netlib-lapack
The problem is that OpenBLAS sets the LAPACKE_LIB and the TMGLIB to the
same object and uses the `ar` feature to update the archive file. If the
underlying filesystem does not have sub-second timestamp resolution and
the system is fast enough (or `ccache` is used), the timestamp of the
builds which should be added to the previously generated archive is the
same as the archive file itself and therefore `make` does not update the
archive.
Since OpenBLAS takes care to not run the different targets updating the
archive in parallel, the easiest solution is to declare the respective
targets `.PHONY`, forcing `make` to always update them.
fixes #1682
7 years ago
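The timestamp problem described in 79ea839b63 can be modelled in a few lines of C. This is a minimal sketch, not code from OpenBLAS or from make: the `file_mtime` type and both helper functions are hypothetical, and the "strictly newer" comparison is a simplified model of how make decides whether a target is out of date.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical representation of a file modification time. */
typedef struct {
    int64_t sec;   /* whole seconds */
    int64_t nsec;  /* sub-second part in nanoseconds */
} file_mtime;

/* Drop the sub-second part, as a filesystem with one-second
 * timestamp resolution would store it. */
static file_mtime truncate_to_seconds(file_mtime t)
{
    t.nsec = 0;
    return t;
}

/* Simplified model of make's rule: the target is remade only when
 * the prerequisite is strictly newer than the target. */
static bool prereq_is_newer(file_mtime target, file_mtime prereq)
{
    if (prereq.sec != target.sec)
        return prereq.sec > target.sec;
    return prereq.nsec > target.nsec;
}
```

With an archive written at 100.2 s and an object file written at 100.7 s, `prereq_is_newer` reports true at full resolution, but false once both times are truncated to whole seconds. That is the case in which the archive is not updated, and the one that declaring the targets `.PHONY` sidesteps.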
Martin Kroeker
f7f97c6148
Merge pull request #1789 from brada4/develop
update travis alpine chroot with avx512 intrinsics headers
7 years ago
Martin Kroeker
6f22e1cfb8
Merge pull request #1788 from fenrus75/avx512-8x16
skylake dgemm: Add a 16x8 kernel
7 years ago
Arjan van de Ven
66b43affbc
Add a 24x8 kernel to the skylakex dgemm implementation
Minor gains for small matrices, but at 512x512 and above the gain
gets more significant.
7 years ago
Arjan van de Ven
1938819c25
skylake dgemm: Add a 16x8 kernel
The next step for the avx512 dgemm code is adding a 16x8 kernel.
In the 8x8 kernel, each FMA has a matching load (the broadcast);
in the 16x8 kernel we can reuse this load for 2 FMAs, which
in turn reduces pressure on the load ports of the CPU and gives
a nice performance boost (in the 25% range).
7 years ago
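The load reuse described in 1938819c25 can be illustrated with a portable scalar model. This is only a sketch under stated assumptions: it uses plain loops instead of AVX512 intrinsics, `VLEN`, `step_8` and `step_16` are hypothetical names, and it assumes the 16 dimension is covered by two 8-wide vectors that share one broadcast value. It is not the actual kernel.

```c
#define VLEN 8  /* doubles per 512-bit vector */

/* One k-step of the 8-wide case: one broadcast load of b
 * feeds exactly one vector multiply-add. */
static void step_8(double acc[VLEN], const double a[VLEN], const double *b)
{
    double bb = *b;  /* models the broadcast load */
    for (int i = 0; i < VLEN; i++)
        acc[i] += a[i] * bb;
}

/* One k-step of the 16-wide case: the same broadcast value
 * feeds two vector multiply-adds, so there is one such load
 * per two FMAs instead of one per FMA. */
static void step_16(double acc0[VLEN], double acc1[VLEN],
                    const double a[2 * VLEN], const double *b)
{
    double bb = *b;  /* single broadcast load, used twice below */
    for (int i = 0; i < VLEN; i++) {
        acc0[i] += a[i] * bb;         /* first FMA  */
        acc1[i] += a[VLEN + i] * bb;  /* second FMA */
    }
}
```

Both functions compute the same products; the difference is only in how many broadcast loads are issued per multiply-add, which is the load-port pressure the commit message refers to.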
Andrew
bda3dbe2eb
update travis alpine chroot with avx512 intrinsics headers
7 years ago
Andrew
c3e0f0eb38
update travis alpine chroot with avx512 intrinsics headers
7 years ago
Martin Kroeker
a980953bd7
Merge pull request #1785 from brada4/develop
address #1782 2nd loop
7 years ago
Martin Kroeker
78c99d5231
Merge pull request #1784 from fenrus75/dgemm-avx512
Create an AVX512 enabled version of DGEMM
7 years ago
Martin Kroeker
b7496c3638
Function name needs to be CNAME, set from outside to allow suffixing for dynamic_arch
7 years ago
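The idea behind b7496c3638 can be sketched as follows. This is an illustrative assumption about the pattern, not the actual OpenBLAS source: the function name `demo_scale_SKYLAKEX` and its body are hypothetical, and in a real build the macro would be supplied from outside (for example by a compiler `-D` flag) rather than defined in the file.

```c
/* Normally set from outside by the build system; defined here only
 * so that the sketch is self-contained. The name is hypothetical. */
#ifndef CNAME
#define CNAME demo_scale_SKYLAKEX
#endif

/* The source refers only to CNAME, so the same file can be compiled
 * several times under different, suffixed symbol names. */
double CNAME(double alpha, double x)
{
    return alpha * x;
}
```

Because the symbol name is chosen at compile time, per-architecture copies of the same routine can coexist in one library, which is what suffixing for dynamic_arch requires.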
Martin Kroeker
95f4e87579
Merge pull request #1787 from jeromerobert/develop
Fix unknown type name __WAIT_STATUS on RHEL5
7 years ago
Jerome Robert
b095f2fad6
Fix unknown type name __WAIT_STATUS on RHEL5
With glibc 2.5, _XOPEN_SOURCE must be defined as >= 500 to use wait.
But judging from the glibc code, this is actually needed only if stdlib.h
was included before sys/wait.h. This was the case here through
openblas_utest.h. So changing the include order fixes compilation on RHEL5
and should not hurt with more recent distros.
* Problem found when using with gcc 5.5 and 4.7.2 on RHEL5/CENTOS5
* Fix #1519
7 years ago
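The include ordering described in b095f2fad6 looks roughly like this. It is a minimal sketch assuming a POSIX system; `run_child_and_wait` is a hypothetical helper and not part of the OpenBLAS test suite. On current glibc versions either order compiles, so this shows the pattern without reproducing the RHEL5 failure.

```c
#include <sys/types.h>
#include <sys/wait.h>  /* included before stdlib.h, as in the fix */
#include <stdlib.h>
#include <unistd.h>

/* Fork a child that exits with `code`, wait for it, and return the
 * exit status seen by the parent (or -1 on any failure). */
static int run_child_and_wait(int code)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0)
        _exit(code);  /* child: terminate immediately */

    int status = 0;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```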
Martin Kroeker
02ef20a1e4
Merge pull request #1786 from martin-frbg/immintrin
Check for immintrin.h presence in the AVX512 compatibility test as well
7 years ago
Martin Kroeker
4c3643ed7f
Check availability of immintrin.h in the AVX512 compatibility test
7 years ago
Martin Kroeker
591cca7cb0
Check availability of immintrin.h in the AVX512 compatibility test
7 years ago
Andrew
3439158dea
address #1782 2nd loop
7 years ago
Arjan van de Ven
45fe8cb0c5
Create an AVX512 enabled version of DGEMM
This patch adds dgemm_kernel_4x8_skylakex.c which is
* dgemm_kernel_4x8_haswell.s converted to C + intrinsics
* 8x8 support added
* 8x8 kernel implemented using AVX512
Performance is a work in progress, but already shows a 10% - 20%
increase for a wide range of matrix sizes.
7 years ago
Martin Kroeker
544b069e85
Merge pull request #1780 from martin-frbg/issue1774-2
Convert fldmia/fstmia instructions to UAL syntax for clang7
7 years ago
Martin Kroeker
9b2a7ad40d
Convert fldmia/fstmia instructions to UAL syntax for clang7
second part of fix for #1774 , containing files missed in #1775
7 years ago
Martin Kroeker
10ce70701a
Merge pull request #1778 from fengrl/develop
test_axpy work error on LOONGSON3A platform #1777
7 years ago
fengruilin
6fc85a6359
test_axpy work error on LOONGSON3A platform #1777
7 years ago
Martin Kroeker
831c661386
Merge pull request #1775 from martin-frbg/issue1774
Convert fldmia/fstmia instructions to UAL syntax for clang7
7 years ago
Martin Kroeker
7e5df34e6a
Convert fldmia/fstmia instructions to UAL syntax for clang7
fixes #1774
7 years ago
Martin Kroeker
4f45040b89
Merge pull request #1773 from martin-frbg/issue1767
Include thread numbers in failure message from blas_thread_init
7 years ago
Martin Kroeker
28aa94bf4b
Include thread numbers in failure message from blas_thread_init
to aid in debugging cases like #1767
7 years ago
Martin Kroeker
56e7c68810
Merge pull request #1771 from staticfloat/sf/ldflags
Add `$(LDFLAGS)` to `$(CC)` and `$(FC)` invocations within `exports/Makefile`
7 years ago
Martin Kroeker
cf6df9464c
Document the stub status of the QUAD_PRECISION code ( #1772 )
* Document the stub status of the QUAD_PRECISION code inherited from GotoBLAS2
in response to #1769
7 years ago
Elliot Saba
6f77af2eef
Add `$(LDFLAGS)` to `$(CC)` and `$(FC)` invocations within `exports/Makefile`
7 years ago
Martin Kroeker
4d183e5567
Merge pull request #1765 from martin-frbg/issue1761
Do not use the new TLS-enabled memory allocator for non-threaded builds, and disable TLS by default in gmake as well
7 years ago
Martin Kroeker
34d55fd165
Merge pull request #1764 from yurivict/64-suffix
Allow installing the 'interface64' version concurrently with the regular version
7 years ago
Martin Kroeker
b991570210
Merge pull request #1762 from martin-frbg/issue1710-2
Add explicit casts to silence compiler warnings
7 years ago
Martin Kroeker
288aeea8a2
Fix default settings - USE_TLS and USE_SIMPLE_THREADED_LEVEL3 should both be off
7 years ago
Martin Kroeker
1ad1e79062
Catch inadvertent USE_TLS=0 declaration
for #1766
7 years ago
Martin Kroeker
b402626509
Do not use the new TLS code for non-threaded builds even if USE_TLS is set
Workaround for #1761 as that exposed a problem in the new code (which was intended to speed up multithreaded code only anyway).
7 years ago
Martin Kroeker
ec0cac1669
Merge pull request #4 from xianyi/develop
Update branch
7 years ago