OpenBLAS

Commit Graph

Author	SHA1	Message	Date
Martin Kroeker	26a3402773	Reflect ARMV8 target definition changes from PR1876 and create config target directory for cross-compiles.	6 years ago
Martin Kroeker	a5a1118527	Merge pull request #1 from xianyi/develop rebase	6 years ago
Martin Kroeker	e23366e860	Merge pull request #1921 from fenrus75/haswelldgemm Replicate some of the SKYLAKEX dgemm improvements also to HASWELL	6 years ago
Arjan van de Ven	b28f75cd7e	set GEMM_PREFERED_SIZE for HASWELL Haswell likes a GEMM_PREFERED_SIZE of 16 to improve the split that the threading code does to make it a nice multiple of the SIMD kernel size	6 years ago
Arjan van de Ven	d321448a63	dgemm: use dgemm_ncopy_8_skylakex.c also for Haswell The dgemm_ncopy_8_skylakex.c code is not avx512 specific and gives a nice performance boost for medium sized matrices	6 years ago
Arjan van de Ven	c43331ad0a	dgemm: Use the skylakex beta function also for haswell it's more efficient for certain tall/skinny matrices	6 years ago
Martin Kroeker	e8ca5a59a9	Merge pull request #1919 from fenrus75/haswelltuning (sgemm) Apply some of the SKYLAKEX optimizations also to HASWELL	6 years ago
Martin Kroeker	c4e23dd016	Update Makefile	6 years ago
Martin Kroeker	cfc4acc221	typo	6 years ago
Martin Kroeker	545c2b1bbb	Add -mavx2 on Haswell only if the compiler supports it	6 years ago
Arjan van de Ven	69d206440a	Make the skylakex/haswell sgemm code compile and run even with compilers without avx2 support	6 years ago
Martin Kroeker	3843e3e017	use -maxv2 on haswell	6 years ago
Martin Kroeker	fbcb14a74b	should be core-avx2	6 years ago
Martin Kroeker	2a3190dc76	fix elseifeq and use older option core2-avx for compatibility	6 years ago
Martin Kroeker	1ebe5c0f49	Add -march=haswell to HASWELL part of DYNAMIC_ARCH build	6 years ago
Arjan van de Ven	0586899a10	Use sgemm_ncopy_4_skylakex.c also for Haswell sgemm_ncopy_4_skylakex.c uses SSE transpose operations where the real perf win happens; this also works great for Haswell. This gives double digit percentage gains on small and skinny matrices	6 years ago
Arjan van de Ven	00dc09ad19	Use the skylake sgemm beta code also for haswell with a few small changes it's possible to use the skylake sgemm code also for haswell, this gives a modest gain (10% range) for smallish matrixes but does wonders for very skinny matrixes	6 years ago
Martin Kroeker	78d877b54b	Merge pull request #1914 from fenrus75/smallmatrix Add a "sgemm direct" mode for small matrixes	6 years ago
Arjan van de Ven	cdc668d82b	Add a "sgemm direct" mode for small matrixes OpenBLAS has a fancy algorithm for copying the input data while laying it out in a more CPU friendly memory layout. This is great for large matrixes; the cost of the copy is easily amortized by the gains from the better memory layout. But for small matrixes (on CPUs that can do efficient unaligned loads) this copy can be a net loss. This patch adds (for SKYLAKEX initially) a "sgemm direct" mode, that bypasses the whole copy machinery for ALPHA=1/BETA=0/... standard arguments, for small matrixes only. What is small? For the non-threaded case this has been measured to be in the MNK = 28 * 512 * 512 range, while in the threaded case it's less, around MNK = 1 * 512 * 512	6 years ago
Martin Kroeker	87718807f0	Merge pull request #1910 from martin-frbg/issue1909 Fix for DYNAMIC_ARCH builds made on a AVX512-capable host	6 years ago
Martin Kroeker	51aec8e96b	make sure the added march=skylake-avx512 does not cause problems on Windows	6 years ago
Martin Kroeker	06f7d78d70	Add -march=skylake-avx512 to SkylakeX part of DYNAMIC_ARCH builds	6 years ago
Martin Kroeker	38cc638591	Avoid adding blanket march=skylake-avx512 to dynamic_arch builds	6 years ago
Martin Kroeker	0bf6d74e5f	Fix typo in previous commit for arm dynamic arch	6 years ago
Martin Kroeker	133c278ee5	Add DYNAMIC_CORE list for ARM64 cf #1908	6 years ago
Martin Kroeker	2b355592e3	Make sure to use the arm version of dynamic.c in ARM64 DYNAMIC_ARCH cf. #1908	6 years ago
Martin Kroeker	ff3eb1d474	Merge pull request #1904 from martin-frbg/issue1870 Fix cmake parsing of GEMM kernels for ARMV8	6 years ago
Martin Kroeker	0b09516678	Fix missing parameter in popen call	6 years ago
Martin Kroeker	7639f2e1f0	Rewrite the conditional for OSX to fix cmake parsing on others The Makefile variable parser in utils.cmake currently does not handle conditionals. Having the definitions for non-OSX last will at least make cmake builds work again on non-OSX platforms.	6 years ago
Martin Kroeker	2fc712469d	Avoid creating spurious non-suffixed c/zgemm_kernels Plain cgemm_kernel and zgemm_kernel are not used anywhere, only cgemm_kernel_b etc. Needlessly building them (without any define like NN, CN, etc.) just happened to work on most platforms, but not on arm64. See #1870	6 years ago
Martin Kroeker	6ba30e270d	Fix typo that broke CNRM2 on ARMV8 since 0.3.0 must have happened in my #1449	6 years ago
Martin Kroeker	bf23518e36	Merge pull request #1903 from rengolin/armv8 Fix two mistakes on Arm64 builds	6 years ago
Renato Golin	31a490ea88	Fix two mistakes on Arm64 builds * Falkor is an ARMv8.0 with ARMv8.1 features, and choosing armv8.1-a for march generates instructions it cannot cope with. Reverting it back to armv8-a. * ThunderX2's build was left with a #define VULCAN, which made it miss the right compiler flags in Makefile.arm64, although it did create the right library in the end.	6 years ago
Martin Kroeker	701ea88347	Use p2align instead of align for OSX compatibility fixes #1902	6 years ago
Martin Kroeker	721c56c224	Merge pull request #1899 from brada4/fbsd12 Add mutually supported architecture mappings for FreeBSD12 ports	6 years ago
Martin Kroeker	c5f8aeff2d	Merge branch 'develop' into fbsd12	6 years ago
Martin Kroeker	8278cbe7f8	Merge pull request #1894 from pkubaj/patch-2 Use correct ARCH name on BSD powerpc64	6 years ago
Martin Kroeker	ea6d1b96bd	Update Makefile.system	6 years ago
Martin Kroeker	360374be62	Update with the changes from 0.3.4	6 years ago
Martin Kroeker	f5acaad8f0	Increment version to 0.3.5.dev	6 years ago
Martin Kroeker	93fa6b7b76	Increment version to 0.3.5.dev	6 years ago
Martin Kroeker	b028960aba	Merge branch 'release-0.3.0' into develop	6 years ago
Martin Kroeker	3c9e3faedb	fixup BSD naming of powerpc arch	6 years ago
Andrew	44c81fd135	oops	6 years ago
Andrew	26b3710485	Add architecture mappings for FreeBSD12	6 years ago
Andrew	84e614d0fd	init	6 years ago
Martin Kroeker	dceff5542c	Handle Android environments that identify as Linux (#1898 ) * Handle Android environments that identify as Linux termux terminal emulator does this, causing build failures through missed defines in common.h	6 years ago
Martin Kroeker	6c7b691083	Really revert xDOT changes from 1832 neglected to rebase #1892 on merging	6 years ago
Martin Kroeker	5f4c550c27	Merge pull request #1892 from martin-frbg/mipsdot revert MIPS64 xDOT kernel changes from #1832	6 years ago
pkubaj	731b2722ba	Fix build on POWER, remove DragonFly, add NetBSD. __asm is complete on its own. DBSD developers state they will only support amd64, but NetBSD supports POWER.	6 years ago
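
The size rule described in the message of commit cdc668d82b ("sgemm direct") can be sketched as a small predicate. This is only an illustration of the thresholds quoted in that message, not the code the commit adds: the function name `use_sgemm_direct`, the constant names, and the `<=` comparison are assumptions, and only the ALPHA=1/BETA=0 conditions are modelled because the message elides the remaining "standard arguments".

```python
# Thresholds on the product M*N*K, as quoted in the commit message of cdc668d82b.
DIRECT_LIMIT_SINGLE_THREAD = 28 * 512 * 512
DIRECT_LIMIT_THREADED = 1 * 512 * 512


def use_sgemm_direct(m, n, k, alpha=1.0, beta=0.0, threaded=False):
    """Hypothetical check: is the problem small enough for a copy-free path?

    Assumption: the limits are treated as inclusive upper bounds. The commit
    message only says the break-even point is "in the range" of these values.
    """
    # The direct path is described for ALPHA=1 / BETA=0 only; other
    # conditions mentioned as "..." in the message are not modelled here.
    if alpha != 1.0 or beta != 0.0:
        return False
    limit = DIRECT_LIMIT_THREADED if threaded else DIRECT_LIMIT_SINGLE_THREAD
    return m * n * k <= limit
```

Under these assumptions a 128 x 128 x 128 product would take the direct path in a single-threaded build but fall back to the copying path in a threaded one, which mirrors the lower threaded threshold given in the message.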

3356 Commits (26a3402773050c8fb3c0e633e967fc1a6456fe0b)
