OpenBLAS

Commit Graph

Author	SHA1	Message	Date
kseniyazaytseva	e1afb23811	Fix BLAS and LAPACK tests for C910V and RISCV64_ZVL256B targets * Fixed bugs in dgemm, [a]min\max, asum kernels * Added zero checks for BLAS kernels * Added dsdot implementation for RVV 0.7.1 * Fixed bugs in _vector files for C910V and RISCV64_ZVL256B targets * Added additional definitions for RISCV64_ZVL256B target	2 years ago
Martin Kroeker	88e994116c	Merge pull request #4354 from imaginationtech/img-rvv-kernel-generator [RISC-V] Improve RVV kernel generator LMUL usage	1 year ago
Sergei Lewis	9edb805e64	fix builds with t-head toolchains that use old versions of the intrinsics spec	1 year ago
Octavian Maghiar	4a12cf53ec	[RISC-V] Improve RVV kernel generator LMUL usage The RVV kernel generation script uses the provided LMUL to increase the number of accumulator registers. Since the effect of the LMUL is to group together the vector registers into larger ones, it actually should be used as a multiplier in the calculation of vlenmax. At the moment, no matter what LMUL is provided, the generated kernels would only set the maximum number of vector elements equal to VLEN/SEW. Commit changes the use of LMUL to properly adjust vlenmax. Note that an increase in LMUL results in a decrease in the number of effective vector registers.	1 year ago
Octavian Maghiar	826a9d5fa4	Adds tail undisturbed for RVV Level 2 operations During the last iteration of some RVV operations, accumulators can get overwritten when VL < VLMAX and tail policy is agnostic. Commit changes intrinsics tail policy to undisturbed.	2 years ago
Octavian Maghiar	8df0289db6	Adds tail undisturbed for RVV Level 1 operations During the last iteration of some RVV operations, accumulators can get overwritten when VL < VLMAX and tail policy is agnostic. Commit changes intrinsics tail policy to undisturbed.	2 years ago
Octavian Maghiar	1e4a3a2b5e	Fixes RVV masked intrinsics for izamax/izamin kernels	2 years ago
Octavian Maghiar	e1958eb705	Fixes RVV masked intrinsics for iamax/iamin/imax/imin kernels Changes masked intrinsics from _m to _mu and reintroduces maskedoff argument.	2 years ago
ZhengSh	2a8bc38cdc	Merge branch 'xianyi:risc-v' into risc-v	2 years ago
Heller Zheng	0954746380	remove argument unused during compilation. fix wrong vr = VFMVVF_FLOAT(0, vl);	2 years ago
sh-zheng	d3bf5a5401	Combine two reduction operations of zhe/symv into one, with tail undisturbed set.	2 years ago
sh-zheng	18d7afe69d	Add rvv support for zsymv and active rvv support for zhemv	2 years ago
Heller Zheng	1374a2d08b	This PR adapts latest spec changes Add prefix (_riscv) for all riscv intrinsics Update some intrinsics' parameter, like vfredxxxx, vmerge	2 years ago
Zhang Xianyi	19f17c8bc6	Merge pull request #3893 from HellerZheng/develop add riscv level3 C,Z kernel functions.	2 years ago
Sergei Lewis	9b61be4545	factoring riscv64/dot.c fix into separate PR as requested	2 years ago
Sergei Lewis	2406958629	* update intrinsics to match latest spec at https://github.com/riscv-non-isa/rvv-intrinsic-doc (in particular, __riscv_ prefixes for rvv intrinsics) * fix multiple numerical stability and corner case issues * add a script to generate arbitrary gemm kernel shapes * add a generic zvl256b target to demonstrate large gemm kernel unrolls	2 years ago
Heller Zheng	63cf4d0166	add riscv level3 C,Z kernel functions.	2 years ago
Xianyi Zhang	c19dff0a31	Fix T-Head RVV intrinsic API changes.	2 years ago
Xianyi Zhang	e5313f53d5	Merge branch 'develop' of https://github.com/HellerZheng/OpenBLAS_riscv_x280 into HellerZheng-develop	2 years ago
Chris Sidebottom	eea006a688	Wrap SVE header with __has_include check	2 years ago
Chris Sidebottom	fd4f52c797	Add SVE implementation for sdot/ddot This adds an SVE implementation to sdot/ddot when available, falling back to the previous Advanced SIMD kernel where there's no SVE implementation for the kernel. All the targets were essentially treating `dot_thunderx2t99.c` as the Advanced SIMD implementation so I've renamed it to better fit with the feature detection.	2 years ago
Chris Sidebottom	4f7b77e08a	Remove unnecessary instructions from Advanced SIMD dot The existing kernel was issuing extra instructions to organise the arguments into the same registers they would usually be in and similarly to put the result into the appropriate register. This has an impact on smaller sized dots and seemed like a quick fix	2 years ago
Heller Zheng	3918d8504e	nrm2 simple optimization	2 years ago
HellerZheng	943372bdf5	Merge branch 'develop' into develop	2 years ago
Martin Kroeker	f73cfb7e2c	change line endings from CRLF to LF	2 years ago
Martin Kroeker	1688c7da43	change line endings from CRLF to LF	2 years ago
Heller Zheng	5d0d1c5551	Remove redundant files	2 years ago
Heller Zheng	bef47917bd	Initial version for riscv sifive x280	2 years ago
Bart Oldeman	6c1043eb41	Add [cz]scal microkernels for SKYLAKEX These are as similar to dscal_microk_skylakex-2.c as possible for consistency. Note that before this change SKYLAKEX+ uses generic C functions for cscal/zscal via commit `2271c350` from #2610 (which is masked by commit `086d87a30`). However now #3799 disables FMAs (in turn enabled by `-march=skylake-avx512`) in the plain C code which fixes excessive LAPACK test failures more nicely.	2 years ago
Martin Kroeker	c9d78dc3b2	Remove excess initializer (leftover from rework of PR 3793)	2 years ago
Martin Kroeker	65338a9493	Merge pull request #3799 from bartoldeman/cscal-zscal-no-fma x86_64: prevent GCC and Clang from generating FMAs in cscal/zscal.	2 years ago
Honglin Zhu	79066b6bf3	Change file name to match the norm and delete useless code.	2 years ago
Bart Oldeman	e7e3aa2948	x86_64: prevent GCC and Clang from generating FMAs in cscal/zscal. If e.g. -march=haswell is set in CFLAGS, GCC generates FMAs by default, which is inconsistent with the microkernels, none of which use FMAs. These inconsistencies cause a few failures in the LAPACK testcases, where eigenvalue results with/without eigenvectors are compared. Moreover using FMAs for multiplication of complex numbers can give surprising results, see `22aa81f` for more information. This uses the same syntax as used in `22aa81f` for zarch (s390x).	2 years ago
Honglin Zhu	4989e039a5	Define SBGEMM_ALIGN_K for DYNAMIC_ARCH build	2 years ago
Honglin Zhu	843e9fd0b9	Fix typo error	2 years ago
Honglin Zhu	b00d5b9746	New sbgemm implementation for Neoverse N2 1. Use UZP instructions but not gather load and scatter store instructions to get lower latency. 2. Padding k to a power of 4.	2 years ago
Martin Kroeker	f6f35a4288	fix copyobj declarations to work with DYNAMIC_ARCH	3 years ago
Martin Kroeker	b1d69fb3ac	Add MIPS64_GENERIC as a copy of GENERIC	3 years ago
gxw	edea1bcfaf	MIPS64: Fixed failed utest dsdot:dsdot_n_1 when TARGET=I6500	3 years ago
Martin Kroeker	101a2c77c3	Fix warnings	3 years ago
Martin Kroeker	23d59baaf1	Add -mfma to -mavx2 for Apple clang, and set AVX2 options for Zen as well	3 years ago
gxw	365936ae1b	MIPS64: Using the macro MTC rather than MTC1	3 years ago
Martin Kroeker	739c3c44a7	Work around windows/osx gcc12 x86_64 tree-optimizer problem and add an osx/gcc12 build to Azure CI (#3745) Add pragma to disable the gcc tree-optimizer for some x86_64 S and Z kernels with gcc12 on OSX or Windows	3 years ago
Martin Kroeker	bd30120ba7	Merge pull request #3720 from FlyGoat/mips64 Make it work on general MIPS64 processors	3 years ago
Jiaxun Yang	a50b29c540	Provide a fallback MIPS64_GENERIC target It is really dangerous to fallback to Loongson core on other MIPS64 processors. Signed-off-by: Jiaxun Yang <jiaxun.yang@flygoat.com>	3 years ago
Jiaxun Yang	50c4eeb97d	alpha: Remove include of version.h It will be defined by preprocessor argument. Signed-off-by: Jiaxun Yang <jiaxun.yang@flygoat.com>	3 years ago
Ivan Pribec	802e71bf05	Add const attribute to lsame	3 years ago
gxw	fbfe1daf6e	LoongArch64: Add DYNAMIC_ARCH support	3 years ago
Martin Kroeker	cd8e57040c	Merge pull request #3691 from martin-frbg/issue3679-sparc SPARC: fix DNRM2 returning INF instead of zero due to intermediate overflow	3 years ago
Martin Kroeker	6c118b7977	Fix DNRM2 returning INF instead of zero due to intermediate overflow	3 years ago

1944 Commits (e1afb23811256b231c259ca57d7a5f6e81ac6da5)