OpenBLAS

78 MB

Tree: e105411460

Author	SHA1	Message	Date
Chris Sidebottom	e105411460	Add infrastructure for bgemv/bscal - Sets up all the various entrypoints for `bgemv` - Adds `bscal` for use in the `bgemv` interface - Adds test cases for comparing `sgemv` and `bgemv` - Adds generic kernels for `bgemv_n` and `bgemv_t` which are accurate enough to pass above tests	2 months ago
Chris Sidebottom	740efd71c4	Add optimized BGEMM kernel for NEOVERSEV1 target This also improves the testing and generic kernel by re-using the BF16 conversion functions. Built on top of https://github.com/OpenMathLib/OpenBLAS/pull/5357 and derived from https://github.com/OpenMathLib/OpenBLAS/pull/5287 Co-authored-by: Ye Tao <ye.tao@arm.com>	2 months ago
Chris Sidebottom	f95e7b0e32	Add infrastructure for BGEMM Setting up all the infrastructure for BGEMM support in OpenBLAS, hopefully I found all the right places. Derived mostly from the previous work done in https://github.com/OpenMathLib/OpenBLAS/pull/5287 Co-authored-by: Ye Tao <ye.tao@arm.com>	3 months ago
Martin Kroeker	756062afa5	Rename "HALF" and "sh" to "BFLOAT16" and "sb"	5 years ago
Rajalakshmi Srinivasaraghavan	a87793e03c	Fix DYNAMIC_ARCH compilation errors	5 years ago
Rajalakshmi Srinivasaraghavan	7eb55504b1	RFC : Add half precision gemm for bfloat16 in OpenBLAS This patch adds support for bfloat16 data type matrix multiplication kernel. For architectures that don't support bfloat16, it is defined as unsigned short (2 bytes). Default unroll sizes can be changed as per architecture as done for SGEMM and for now 8 and 4 are used for M and N. Size of ncopy/tcopy can be changed as per architecture requirement and for now, size 2 is used. Added shgemm in kernel/power/KERNEL.POWER9 and tested in powerpc64le and powerpc64. For reference, added a small test compare_sgemm_shgemm.c to compare sgemm and shgemm output. This patch does not cover OpenBLAS test, benchmark and lapack tests for shgemm. Complex type implementation can be discussed and added once this is approved.	5 years ago
Timothy Gu	6c2ead30f0	Remove all trailing whitespace except lapack-netlib Signed-off-by: Timothy Gu <timothygu99@gmail.com>	11 years ago
Wang Qian	8e53b57bb2	Appending gemmkernel and trmmkernel C code in kernel/generic, this code can be used to execute on a new platform which does not have an optimized assembly kernel.	13 years ago

8 Commits (e1054114601c972634ef692ed0ba150d7d5ede5d)
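
Commit 7eb55504b1 notes that bfloat16 is stored as an unsigned short (2 bytes) on architectures without native support, and that a small test compares sgemm output against the bfloat16 result. The following is a minimal sketch of that storage idea, not the code from the repository: the names `bf16_t`, `float_to_bf16`, `bf16_to_float`, and `bf16_dot` are hypothetical, and round-to-nearest-even is an assumed rounding mode that the commit message does not specify.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical 2-byte storage type for bfloat16 (not the OpenBLAS typedef). */
typedef uint16_t bf16_t;

/* Convert float to bfloat16 by keeping the upper 16 bits of the IEEE-754
 * single-precision encoding. Assumption: round to nearest, ties to even. */
static bf16_t float_to_bf16(float f)
{
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);

    /* NaN: keep it a NaN by forcing a mantissa bit that survives truncation. */
    if ((bits & 0x7fffffffu) > 0x7f800000u)
        return (bf16_t)((bits >> 16) | 0x0040u);

    /* Add 0x7fff plus the lowest kept bit so that exact ties round to even. */
    uint32_t rounding = 0x7fffu + ((bits >> 16) & 1u);
    return (bf16_t)((bits + rounding) >> 16);
}

/* Convert bfloat16 back to float: the 16 stored bits become the upper half
 * of the float encoding and the lower half is zero, so this step is exact. */
static float bf16_to_float(bf16_t h)
{
    uint32_t bits = (uint32_t)h << 16;
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}

/* Dot product of two bfloat16 vectors, accumulated in single precision.
 * This is only an illustration of mixing 2-byte inputs with a float result. */
static float bf16_dot(size_t n, const bf16_t *x, const bf16_t *y)
{
    assert(x != NULL && y != NULL);
    float acc = 0.0f;
    for (size_t i = 0; i < n; i++)
        acc += bf16_to_float(x[i]) * bf16_to_float(y[i]);
    return acc;
}
```

Because bfloat16 keeps only 8 bits of significand precision, a comparison against a single-precision reference would normally use a tolerance rather than exact equality. For example, `3.14159f` converts to the bit pattern `0x4049`, which reads back as `3.140625`.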