OpenBLAS

Commit Graph

Author	SHA1	Message	Date
Chris Sidebottom	2c3cdaf74e	Optimized BGEMV for NEOVERSEV1 target - Adds bgemv T based off of sbgemv T kernel - Adds bgemv N which is slightly altered to not use Y as an accumulator due to the output being bf16 which results in loss of precision - Enables BGEMM_GEMV_FORWARD to proxy BGEMM to BGEMV with new kernels	2 months ago
Chris Sidebottom	740efd71c4	Add optimized BGEMM kernel for NEOVERSEV1 target This also improves the testing and generic kernel by re-using the BF16 conversion functions. Built on top of https://github.com/OpenMathLib/OpenBLAS/pull/5357 and derived from https://github.com/OpenMathLib/OpenBLAS/pull/5287 Co-authored-by: Ye Tao <ye.tao@arm.com>	2 months ago
Iha, Taisei	f1e628b889	Further performance improvements to [SD]GEMV.	5 months ago
Martin Kroeker	2b941c44b5	Merge branch 'develop' into sbgemv_n_neon	7 months ago
Ye Tao	35bdbca153	Add sbgemv_n_neon kernel for arm64.	7 months ago
Annop Wongwathanarat	edaf51dd99	Add sbgemv_t_bfdot kernel for ARM64 This improves performance for sbgemv_t by up to 100x on NEOVERSEV1. The geometric mean speedup is ~61x for M=N=[2,512].	7 months ago
Ye Tao	c748e6a338	optimized sbgemm kernel for neoverse-v1 (sve-256) Signed-off-by: Ye Tao <ye.tao@arm.com>	10 months ago
Aditya Tewari	4379a6fbe3	* checkpoint sbgemm for SVE-256	11 months ago
Iha, Taisei	4918beecbe	Loop-unrolled transposed [SD]GEMV kernels for A64FX and Neoverse V1	10 months ago
Chris Sidebottom	ba2e989c67	Add accumulators to AArch64 GEMV Kernels This helps to reduce values going missing as we accumulate.	1 year ago
Chris Sidebottom	84a268b6ca	Use SVE zgemm/cgemm on Arm(R) Neoverse(TM) V1 core This patch removes the prefetches from cgemm/zgemm, which improves performance much as it did for sgemm/dgemm in #3868; this means I'm happy to enable this on any applicable cores. I also replicated the copy unrolling from sgemm and dgemm.	2 years ago
Chris Sidebottom	aea2a4622b	Use latest non-SVE kernels in ARMV8SVE These are generally better and, in some cases, include threading which helps in the cores we're targeting here.	2 years ago
Chris Sidebottom	ec334e69dc	Use SVE kernel for SGEMM/DGEMM on Arm(R) Neoverse(TM) V1 This re-spins #3869 with some additional copy unrolling which helps maintain SYRK performance. After #3868, the SVE kernels represent a pretty good boost. This re-uses ARMV8SVE as a base and I'm going to incrementally move everything to use ARMV8SVE in additional patches (as well as fix up anything that's not already in ARMV8SVE).	2 years ago
Chris Sidebottom	fd4f52c797	Add SVE implementation for sdot/ddot This adds an SVE implementation to sdot/ddot when available, falling back to the previous Advanced SIMD kernel where there's no SVE implementation for the kernel. All the targets were essentially treating `dot_thunderx2t99.c` as the Advanced SIMD implementation so I've renamed it to better fit with the feature detection.	2 years ago
Sunita Nadampalli	19c8f615dc	OpenBLAS: aarch64: Add neoverse-v1/n2 architecture specifics	3 years ago

15 Commits (2c3cdaf74ed3397ad75a15d8c7f64324ecf7ecf0)