OpenBLAS

Commit Graph

Author	SHA1	Message	Date
Chris Sidebottom	114316f361	Optimize SBGEMM / BGEMM for NEOVERSEV1 further This changes the kernels to pack full SVE vectors and reduces the overall complexity of the inner GEMM loop.	1 month ago
Masato Nakagawa	7e29f11396	Multi-thread GEMM Performance Improvement on NeoverseV1 (DIVIDE_RATE=1)	2 months ago
Martin Kroeker	c504aedca1	Merge pull request #5400 from Mousius/neoversev2-target Add NEOVERSEV2 target support	2 months ago
Chris Sidebottom	87247daadc	Add NEOVERSEV2 target support Did a quick run around to make `TARGET=NEOVERSEV2` build successfully. Fixes #5385	2 months ago
Chris Sidebottom	ea2faf0c9a	Add optimized BGEMM for NEOVERSEN2 target This re-uses the existing NEOVERSEN2 8x4 `sbgemm` kernel to implement `bgemm`.	2 months ago
Chris Sidebottom	740efd71c4	Add optimized BGEMM kernel for NEOVERSEV1 target This also improves the testing and generic kernel by re-using the BF16 conversion functions. Built on top of https://github.com/OpenMathLib/OpenBLAS/pull/5357 and derived from https://github.com/OpenMathLib/OpenBLAS/pull/5287 Co-authored-by: Ye Tao <ye.tao@arm.com>	2 months ago
Chris Sidebottom	f95e7b0e32	Add infrastructure for BGEMM Setting up all the infrastructure for BGEMM support in OpenBLAS, hopefully I found all the right places. Derived mostly from the previous work done in https://github.com/OpenMathLib/OpenBLAS/pull/5287 Co-authored-by: Ye Tao <ye.tao@arm.com>	3 months ago
Masato Nakagawa	5253c8f165	Multi-thread Performance Improvement of GEMM with DIVIDE_RATE=1 for A64FX.	3 months ago
h-motoki	bba75d5e45	GEMM_PREFERED_SIZE parameter has been changed for A64FX.	3 months ago
Martin Kroeker	d96daa220d	Merge pull request #5290 from Srangrang/develop Add support for FP16 to openBLAS and shgemm on RISCV	3 months ago
davidz-ampere	aa90ab4142	Add support for Ampere AmpereOne processors	3 months ago
davidz-ampere	be68ef03b4	Add support for Ampere processors	3 months ago
gkdddd	670ec6f757	Added shgemm_kernel_8x8 for RISCV64_ZVL128B and shgemm_kernel_16x8 for RISCV64_ZVL256B Added HFLOAT16 support for RISCV64 Added shgemm_kernel_8x8 for RISCV64_ZVL128B and shgemm_kernel_16x8 for RISCV64_ZVL256B based on HFLOAT16 The instruction sets used are ZVFH and ZFH, which need to be supported by RVV1.0 Related to issue #5279 Co-authored-by Linjin Li <linjin_li@163.com>	4 months ago
Srangrang	0a967797a1	Add FP16 support for RISCV	4 months ago
Martin Kroeker	a34b487f22	Remove spurious cast from Alpha and Cell's DEFAULT_ALIGN	5 months ago
Vaisakh K V	f66ca05b31	Merge branch 'develop' into topic/sgemm_direct_sme1	7 months ago
Vaisakh K V	d23eb3b93e	Support for SME1 based sgemm_direct kernel for cblas_sgemm level 3 API * Added ARMV9SME target * Added SGEMM_DIRECT kernel based on SME1	10 months ago
Ye Tao	c748e6a338	optimized sbgemm kernel for neoverse-v1 (sve-256) Signed-off-by: Ye Tao <ye.tao@arm.com>	10 months ago
Aditya Tewari	4379a6fbe3	* checkpoint sbgemm for SVE-256	11 months ago
Martin Kroeker	926e56e389	Align GEMM3M parameters for GENERIC with ZGEMM and add P/Q/R	10 months ago
Martin Kroeker	a47b3c8867	Fix unroll parameter selection for MIPS64_GENERIC	11 months ago
Martin Kroeker	7c4f3638fd	switch PPCG4 SGEMM kernel to 4x4	1 year ago
gxw	48698b2b1d	LoongArch64: Rename core Use microarchitecture name instead of meaningless strings to name the core, the legacy core is still retained. 1. Rename LOONGSONGENERIC to LA64_GENERIC 2. Rename LOONGSON3R5 to LA464 3. Rename LOONGSON2K1000 to LA264	1 year ago
Chip Kerchner	b1737698db	Fix DEFAULTS in SBGEMM for POWER10. Also comparisons for SBGEMM unit test can be exactly due to epilison differences.	1 year ago
Piotr Kubaj	4c12090776	Fix build on FreeBSD/powerpc64*	1 year ago
gxw	6017ad7146	loongarch64: Update dgemm_kernel_16x4 to dgemm_kernel_16x6	1 year ago
Usui, Tetsuzo	ca673ca774	Add GEMM_PREFERED_SIZE parameter for Neoverse V1	1 year ago
Martin Kroeker	93d975d8fd	Merge pull request #4593 from XiWeiGu/loongarch_add_buffer_offset loongarch: Optimizing the performance of the GEMM on servers	1 year ago
gxw	d8c4ea8793	loongarch: Optimizing the performance of the GEMM on servers	1 year ago
Martin Kroeker	ba6d485102	Adjust SWITCH_RATIO for ZEN and apply GEMM_PREFERRED_SIZE	1 year ago
Martin Kroeker	584e87661d	set SWITCH_RATIO for Cortex-A76	1 year ago
Martin Kroeker	b925f61fb0	Add support for Cortex-A76	1 year ago
Rajalakshmi Srinivasaraghavan	f5b2a877e2	POWER9: Use default param values from POWER8 on AIX AIX uses KERNEL.POWER8 optimization on POWER9 and changing the default GEMM parameters in param.h to use POWER8 values on POWER9.	1 year ago
pengxu	4787a55c64	Optimized cgemm kernel 16x4 LASX for LoongArch	1 year ago
pengxu	fe3da43b7d	Optimized zgemm kernel 8*4 LASX, 4*4 LSX and cgemm kernel 8*4 LSX for LoongArch	1 year ago
Martin Kroeker	e5d2725e5a	Merge pull request #4185 from XiWeiGu/mips_enable_msa MIPS: Enable MSA	1 year ago
Sergei Lewis	1093def0d1	Merge branch 'risc-v' into develop	1 year ago
Martin Kroeker	889c5d026a	Merge pull request #4456 from kseniyazaytseva/riscv-rvv10 Fix BLAS and LAPACK tests for RVV 1.0 target, update to 0.12.0 intrincics	1 year ago
kseniyazaytseva	b193ea3d7b	Fix BLAS and LAPACK tests for RVV 1.0 target, update to 0.12.0 intrincics * Update intrincics API to 0.12.0 version (Stride Segment Loads/Stores) * Fixed nrm2, axpby, ncopy, zgemv and scal kernels * Added zero size checks	1 year ago
Dirreke	ec89466e14	Add CSKY support	1 year ago
Martin Kroeker	504f9b0c5e	Increase S/D GEMM PQ to match typical L2 size as for NeoverseV1	1 year ago
Martin Kroeker	2802478449	revert change to Loongson2k1000 zgemm	1 year ago
Martin Kroeker	44b5b9e39f	Update C/ZGEMM MN for Loongson2k1000	1 year ago
Martin Kroeker	519b40fad9	Merge pull request #4398 from yinshiyou/la-dev Add Optimizations for LoongArch.	1 year ago
pengxu	a5d0d21378	loongarch64: Add zgemm and cgemm optimization	1 year ago
Hao Chen	179ed51d3b	Add dgemm_kernel_8x4.S file.	1 year ago
Darshan Patel	dab0da8243	Update GEMM param for NEOVERSEV1	1 year ago
Octavian Maghiar	e4586e81b8	[RISC-V] Add RISC-V Vector 128-bit target Current RVV x280 target depends on vlen=512-bits for Level 3 operations. Commit adds generic target that supports vlen=128-bits. New target uses the same scalable kernels as x280 for Level 1&2 operations, and autogenerated kernels for Level 3 operations. Functional correctness of Level 3 operations tested on vlen=128-bits using QEMU v8.1.1 for ctests and BLAS-Tester.	1 year ago
Rajalakshmi Srinivasaraghavan	980f702f72	POWER: AIX: Make use of power10 optimization POWER10 optimizations are disabled when using default AIX assembler. As we have fixed many issues recently, enabling optimization path for default assembler.	1 year ago
gxw	553cc1372f	LoongArch64: Add sgemm_kernel	2 years ago


318 Commits (06c09deee94e4d03ab814d576da95fb047acbdda)