2524 Commits (2c0dd2468e253ec7ecdabafcb15d5016a7218a12)

Author SHA1 Message Date
  yancheng d4c96a35a8 loongarch64: Add optimizations for axpy and axpby. 1 year ago
  yancheng 360acc0a41 loongarch64: Add optimizations for swap. 1 year ago
  yancheng 174c25766b loongarch64: Add optimizations for copy. 1 year ago
  yancheng 49829b2b7d loongarch64: Add optimizations for iamin. 1 year ago
  yancheng be83f5e4e0 loongarch64: Add optimizations for iamax. 1 year ago
  yancheng e3fb2b5afa loongarch64: Add optimizations for imin. 1 year ago
  yancheng e46b48e372 loongarch64: Add optimizations for imax. 1 year ago
  yancheng 702fc1d56d loongarch64: Add optimization for min. 1 year ago
  yancheng 346b384d1c loongarch64: Add optimization for max. 1 year ago
  yancheng ff2ecc6cda loongarch64: Add optimization for amin. 1 year ago
  yancheng 265b5f2e80 loongarch64: Add optimizations for amax. 1 year ago
  yancheng 993ede7c70 loongarch64: Add optimizations for scal. 1 year ago
  Octavian Maghiar 4a12cf53ec [RISC-V] Improve RVV kernel generator LMUL usage 1 year ago
  Octavian Maghiar e4586e81b8 [RISC-V] Add RISC-V Vector 128-bit target 1 year ago
  Martin Kroeker 39bf8ece20 Merge pull request #4340 from yinshiyou/la-dev 1 year ago
  Shiyou Yin 9fe07d82fd loongarch: Add LSX optimization for dot. 1 year ago
  Shiyou Yin 13b8c44b44 loongarch: Add optimization for dsdot kernel. 1 year ago
  Shiyou Yin 3def6a8143 loongarch: Add LASX optimization for dot. 1 year ago
  Bart Oldeman c34e2cf380 Use _mm_set1_epi{32,64x} to init mask in x86-64 [cz]asum 1 year ago
  Martin Kroeker 22aa401656 Temporarily disable the AVX512 CASUM/ZASUM microkernels for any version of NVIDIA HPC (#4327) 1 year ago
  Bart Oldeman f8ad5344c2 Fix casum fallback kernel. 1 year ago
  Martin Kroeker 04bc801999 (Re)apply fixes for supporting only a subset of precision types from PR 3915 1 year ago
  Martin Kroeker 9019bc4945 Use SkylakeX ?ASUM microkernel for Cooperlake/Sapphirerapids as well 1 year ago
  Martin Kroeker 3bfa4d4dcc Fix outdated SVE kernel definitions for Cortex cpus by aliasing to ARMV8SVE 1 year ago
  Rajalakshmi Srinivasaraghavan 980f702f72 POWER: AIX: Make use of power10 optimization 1 year ago
  Rajalakshmi Srinivasaraghavan 9f42570e33 POWER: Increase macro size limit for AIX 2 years ago
  Martin Kroeker 9f49aef91b Merge pull request #4255 from RajalakshmiSR/AIX-P10 2 years ago
  Martin Kroeker e7d05402e0 Fix up S/D GEMM copy function definitions after #4009 2 years ago
  Rajalakshmi Srinivasaraghavan 71d733e5f7 POWER: Avoid m4 conversions for C files 2 years ago
  Rajalakshmi Srinivasaraghavan 82fc29a57a POWER10: Fallback to POWER8 functions 2 years ago
  Rajalakshmi Srinivasaraghavan db0805906b powerpc: Fix build errors with Open XL C 2 years ago
  Martin Kroeker 675cd551da fix improper function prototypes (empty parentheses) 2 years ago
  gxw d15e0a055c LoongArch64: Fixed compilation issues when enable DYNAMIC_ARCH 2 years ago
  gxw 4670eb1462 LoongArch64: Add dtrsm kernel 2 years ago
  gxw f2cf929374 LoongArch64: Add sgemv kernel 2 years ago
  Martin Kroeker 8e6d93359d Merge pull request #4196 from TiborGY/obsolete_inlines 2 years ago
  gxw 394a1fd1bf LoongArch64: Compatible with early internal toolchain 2 years ago
  Martin Kroeker 9c4ae4d4fb Merge pull request #4206 from martin-frbg/issue4201-2 2 years ago
  Martin Kroeker 88435104c8 Merge pull request #4204 from martin-frbg/llvm17-2 2 years ago
  Martin Kroeker fc8894dd98 Workaround miscompilation by NVIDIA nvc 2 years ago
  Martin Kroeker 7a6203ffa1 restore default Neoverse SVE build instructions for non-NVIDIA compilers 2 years ago
  Martin Kroeker 2c3034ff7f Disable the C/ZASUM AVX512 microkernels when compiling with LLVM17 as well 2 years ago
  Martin Kroeker 8794544b43 Add support for compiling the Neoverse SVE kernels with the NVIDIA HPC compiler 2 years ago
  gxw 553cc1372f LoongArch64: Add sgemm_kernel 2 years ago
  Martin Kroeker 12ede72ab7 Merge pull request #4192 from imciner2/im/clangfix 2 years ago
  Ian McInerney 79c15db348 Fix power10 gcc intrinsic check 2 years ago
  TGY b5ba95a6c0 Modernize obsolete inline order 2 years ago
  Ian McInerney 8a8a8479be Fix cooperlake and sapphire rapids march flags on clang 2 years ago
  Martin Kroeker 34da1a067d Allow negative INCX (API change from version 3.10 of the reference implementation) 2 years ago
  Martin Kroeker 07e32c4cb8 Allow negative INCX (API change from version 3.10 of the reference implementation) 2 years ago