OpenBLAS

Commit Graph

Author	SHA1	Message	Date
Chris Sidebottom	f95e7b0e32	Add infrastructure for BGEMM. Setting up all the infrastructure for BGEMM support in OpenBLAS, hopefully I found all the right places. Derived mostly from the previous work done in https://github.com/OpenMathLib/OpenBLAS/pull/5287. Co-authored-by: Ye Tao <ye.tao@arm.com>	3 months ago
gkdddd	670ec6f757	Added shgemm_kernel_8x8 for RISCV64_ZVL128B and shgemm_kernel_16x8 for RISCV64_ZVL256B. Added HFLOAT16 support for RISCV64. Added shgemm_kernel_8x8 for RISCV64_ZVL128B and shgemm_kernel_16x8 for RISCV64_ZVL256B based on HFLOAT16. The instruction sets used are ZVFH and ZFH, which need to be supported by RVV1.0. Related to issue #5279. Co-authored-by: Linjin Li <linjin_li@163.com>	4 months ago
Martin Kroeker	3fd6ccdf76	Include just the definition of BLASLONG rather than all of common.h	4 years ago
Martin Kroeker	34753eaebb	Include common.h (and indirectly param.h) rather than just param.h to have BLASLONG available w/o circular dependencies	4 years ago
Martin Kroeker	ca31c32693	Rename "HALF" and "sh" to "BFLOAT16" and "sb"	5 years ago
Rajalakshmi Srinivasaraghavan	7eb55504b1	RFC: Add half precision gemm for bfloat16 in OpenBLAS. This patch adds support for bfloat16 data type matrix multiplication kernel. For architectures that don't support bfloat16, it is defined as unsigned short (2 bytes). Default unroll sizes can be changed as per architecture as done for SGEMM and for now 8 and 4 are used for M and N. Size of ncopy/tcopy can be changed as per architecture requirement and for now, size 2 is used. Added shgemm in kernel/power/KERNEL.POWER9 and tested in powerpc64le and powerpc64. For reference, added a small test compare_sgemm_shgemm.c to compare sgemm and shgemm output. This patch does not cover OpenBLAS test, benchmark and lapack tests for shgemm. Complex type implementation can be discussed and added once this is approved.	5 years ago
Zhang Xianyi	faa73690e4	Delete LOCAL_BUFFER_SIZE for other architectures.	9 years ago
Timothy Gu	6c2ead30f0	Remove all trailing whitespace except lapack-netlib Signed-off-by: Timothy Gu <timothygu99@gmail.com>	11 years ago
Zhang Xianyi	bd2da90e13	Fixed typo in getarch_2nd.c.	12 years ago
wernsaar	d854b30ae6	Added UNROLL values for 3M to getarch_2nd.c, Makefile.system and Makefile.L3	12 years ago
Zhang Xianyi	36e0982966	Refs #187. Use perl to generate cblas_noconst.h instead of sed. Thanks to Dan Povey for the patch. https://github.com/xianyi/OpenBLAS/issues/187	12 years ago
Xianyi Zhang	31c836ac25	Ref #79. Added GEMM_MULTITHREAD_THRESHOLD flag to use a single thread in the gemm function with small matrices.	13 years ago
Xianyi Zhang	552f31dbbd	Fixed #13. Fixed blasint undefined bug in <cblas.h> file.	14 years ago
Xianyi Zhang	342bbc3871	Import GotoBLAS2 1.13 BSD version codes.	14 years ago

14 Commits (f95e7b0e3279b0ca443b8ca4850b612df19343bb)