2023 Commits (2d0b2334259d41c2003b51a07580dbd25cfe267c)

Author SHA1 Message Date
  Rajalakshmi Srinivasaraghavan 0571c3187b POWER10: Rename mma builtins 4 years ago
  Martin Kroeker 292d1af1a0 Update omatcopy_rt.c 4 years ago
  Martin Kroeker 325b398e3c Update omatcopy_rt.c 4 years ago
  Martin Kroeker 6f5667b4d4 Enable optimized S/D OMATCOPY_RT 4 years ago
  Martin Kroeker cceeee7806 Add optimized omatcopy_rt 4 years ago
  Martin Kroeker 0a4546b742 Typo fix 4 years ago
  Martin Kroeker b1eed27a54 Replace naive omatcopy_rt with 4x4 blocked implementation 4 years ago
  Martin Kroeker 47691c031f Use Haswell optimizations for Zen as well 4 years ago
  Martin Kroeker ce7ddd8921 Use Haswell optimizations for Zen as well 4 years ago
  Martin Kroeker 950c047b49 Use Haswell optimizations for Zen as well 4 years ago
  Martin Kroeker 46509953a9 Use Haswell optimizations for Zen as well 4 years ago
  Martin Kroeker db348dcff2 Enable optimized srot/drot kernels from Haswell 4 years ago
  Rajalakshmi Srinivasaraghavan 2056ffc227 Optimize cscal function for POWER10 4 years ago
  Rajalakshmi Srinivasaraghavan 3ede843d50 Optimize s/dscal function for POWER10 4 years ago
  Martin Kroeker 69a5558203 Merge pull request #3059 from Guobing-Chen/BF16_gemm 4 years ago
  Martin Kroeker d6905403e3 Merge pull request #3068 from alexhenrie/scan-build 4 years ago
  Rajalakshmi Srinivasaraghavan 439b93f6d2 Optimize s/drot function for POWER10 4 years ago
  Rajalakshmi Srinivasaraghavan eff7c9166e Optimize cdot function for POWER10 4 years ago
  Alex Henrie 202fc9e8ed Fix uninitialized argument value in dasum_k 4 years ago
  Martin Kroeker e378b24487 Merge pull request #3067 from albertziegenhagel/fix-generic-cmake 4 years ago
  Albert Ziegenhagel e3f4063683 Fix building "generic" TRMM kernel with CMake 4 years ago
  Martin Kroeker b716c0ef01 Add workaround for NVIDIA HPC 4 years ago
  Martin Kroeker 2efa3b70dc Add workaround for NVIDIA HPC 4 years ago
  Martin Kroeker 49959d4f1c Add workaround for NVIDIA HPC 4 years ago
  Martin Kroeker 0f27a03607 Add workaround for NVIDIA HPC mishandling of the asm DOT kernels 4 years ago
  Martin Kroeker c2a8ebfe69 Add workaround for NVIDIA HPC mishandling of the asm DOT kernels 4 years ago
  Martin Kroeker 43aac5bacc Support NVIDIA HPC compiler 4 years ago
  Chen, Guobing b0beb0b1ca Initial code for Cooperlake BF16 GEMM kernel 4 years ago
  Rajalakshmi Srinivasaraghavan 601b711c78 Optimize swap function for POWER10 4 years ago
  Ashwin Sekhar T K 1b2508362b arm64: Fix nrm2 for input vectors with Inf 4 years ago
  Martin Kroeker 3559c5d7a2 Merge pull request #3048 from martin-frbg/issue2998 4 years ago
  Martin Kroeker 8631e2976a Temporarily revert to the old nrm2 kernels 4 years ago
  Martin Kroeker 2768bc1764 Temporarily revert to the old nrm2 kernels 4 years ago
  Martin Kroeker 6f4698ee1f Temporarily revert to the old nrm2 kernel 4 years ago
  Martin Kroeker 114eb159a4 Disable FMA intrinsics in the srot kernel when the compiler is PGI/NVIDIA 4 years ago
  Martin Kroeker 005cce5507 Amend SkylakeX options to support the NVIDIA compiler 4 years ago
  Xianyi Zhang a3cac9cca0 Update sgemm kernel 1x4 for C910. 4 years ago
  Martin Kroeker c73d8ee40d Conditionally add -mfma to compiler options where needed 4 years ago
  Rajalakshmi Srinivasaraghavan 2fb11f873b POWER10: Improve copy performance 4 years ago
  Martin Kroeker 043128cbe5 Merge pull request #3029 from RajalakshmiSR/axpyp10 4 years ago
  Martin Kroeker 3331ca492d Merge pull request #3021 from austinpagan/trsm_p10 4 years ago
  Rajalakshmi Srinivasaraghavan 346e30a46a POWER10: Improve axpy performance 4 years ago
  gxw 4b548857d6 Add msa support for loongson 4 years ago
  Martin Kroeker 7f11e33e8d Merge pull request #3025 from TiredNotTear/develop 4 years ago
  Martin Kroeker 53e0837809 Merge pull request #3022 from jinboson/develop 4 years ago
  Hao Chen ad38bd0e89 Fix failed cgemv and zgemv test case after using msa optimization 4 years ago
  Hao Chen 47b639cc9b Fix failed sswap and dswap case by using msa optimization 4 years ago
  Martin Kroeker b660008c7e Work around DOT and SWAP test failures 4 years ago
  Martin Kroeker f8346603cf Fix compilation with SolarisStudio 4 years ago
  Jin Bo 65de6f5957 Fix test errors reported by cblas_cgemm & cblas_ctrmm 4 years ago