OpenBLAS

Author	SHA1	Message	Date
gkdddd	670ec6f757	Added shgemm_kernel_8x8 for RISCV64_ZVL128B and shgemm_kernel_16x8 for RISCV64_ZVL256B. Added HFLOAT16 support for RISCV64. Added shgemm_kernel_8x8 for RISCV64_ZVL128B and shgemm_kernel_16x8 for RISCV64_ZVL256B based on HFLOAT16. The instruction sets used are ZVFH and ZFH, which need to be supported by RVV1.0. Related to issue #5279. Co-authored-by Linjin Li <linjin_li@163.com>	4 months ago
Martin Kroeker	d0794f88dc	add gemm_batch driver	1 year ago
Martin Kroeker	307c4c0786	Fix typo	4 years ago
Martin Kroeker	e83df93975	Work around another recent macro name collision with winnt.h	4 years ago
Martin Kroeker	d3ff1f889f	Convert ifndefs to ifneq	4 years ago
Rajalakshmi Srinivasaraghavan	b5d30b390d	Fix build issues with bfloat16. This patch fixes compilation errors due to recent renaming from SH to SB with BUILD_BFLOAT16.	5 years ago
Martin Kroeker	006c7f6671	Change "HALF" and "sh" to "BFLOAT16" and "sb"	5 years ago
Martin Kroeker	886a8e3190	Adapt for supporting only a subset of variable types	5 years ago
Martin Kroeker	5dd14e3d48	Make building the bfloat16 functions conditional on option BUILD_HALF (#2590): * make building the bfloat16 BLAS functions conditional on BUILD_HALF * pass the BUILD_HALF option to gensymbol * pass BUILD_HALF as a compiler define for dynamic_arch builds	5 years ago
Rajalakshmi Srinivasaraghavan	7eb55504b1	RFC: Add half precision gemm for bfloat16 in OpenBLAS. This patch adds support for bfloat16 data type matrix multiplication kernel. For architectures that don't support bfloat16, it is defined as unsigned short (2 bytes). Default unroll sizes can be changed as per architecture as done for SGEMM and for now 8 and 4 are used for M and N. Size of ncopy/tcopy can be changed as per architecture requirement and for now, size 2 is used. Added shgemm in kernel/power/KERNEL.POWER9 and tested in powerpc64le and powerpc64. For reference, added a small test compare_sgemm_shgemm.c to compare sgemm and shgemm output. This patch does not cover OpenBLAS test, benchmark and lapack tests for shgemm. Complex type implementation can be discussed and added once this is approved.	5 years ago
Martin Kroeker	a91f1587b9	Work around name clash with Windows10's winnt.h fixes #1503	7 years ago
wernsaar	7aae4a62e7	enabled use of GEMM3M functions	11 years ago
wernsaar	be94db096c	disabled *3M functions for x86_64 platforms	11 years ago
Timothy Gu	6c2ead30f0	Remove all trailing whitespace except lapack-netlib. Signed-off-by: Timothy Gu <timothygu99@gmail.com>	11 years ago
Xianyi Zhang	342bbc3871	Import GotoBLAS2 1.13 BSD version codes.	14 years ago

15 Commits (670ec6f7576ecc74fff96be7c00ec8fffed8647b)