1641 Commits (98ebc8ac5987af4ef44618d95e34ae122ec24c20)

Author SHA1 Message Date
  Martin Kroeker 3d511f0e66 replace spurious avx512 requirement with fma check 4 years ago
  Rajalakshmi Srinivasaraghavan 2379abaa5e POWER10: Improve dgemm performance 4 years ago
  Rajalakshmi Srinivasaraghavan 55bb9f639a POWER10: Optimized zgemv 4 years ago
  Martin Kroeker 2dfb24730d Use "old" compute(24) function with clang due to register limitations 4 years ago
  Martin Kroeker 147e0a75fd Merge pull request #3170 from CodesWithWolves/sgemm_tcopy_16-invalid-read 4 years ago
  Rajalakshmi Srinivasaraghavan 2dbcddd83d POWER10: Adding check for little endian 4 years ago
  CodesWithWolves d2bda3b56a Remove Unnecessary/Erroneous Reads In sgemm_tcopy_16.S COPY1x8 Macro 4 years ago
  Martin Kroeker bdd6e3a153 Merge pull request #3157 from martin-frbg/issue3020-final 4 years ago
  Martin Kroeker 7b8f580941 Merge pull request #3156 from martin-frbg/omatcopy_d 4 years ago
  Martin Kroeker 86c5a0013f Add workaround for LAPACK testsuite failures with the NVIDIA HPC compiler 4 years ago
  Martin Kroeker ef85c22474 Add workaround for LAPACK test failures with the NVIDIA HPC compiler 4 years ago
  Martin Kroeker d3555d2e50 Add workaround for LAPACK test failures with the NVIDIA HPC compiler 4 years ago
  Martin Kroeker 0f5e86a0d9 Remove premature entry for DOMATCOPY_RT 4 years ago
  Martin Kroeker 7b294a99fd Move common.h back to the top of the file so that SKYLAKEX (from config.h) is defined in time 4 years ago
  Martin Kroeker 0934568d9c Move includes under the ifdef for compilers w/o intrinsics support 4 years ago
  Rajalakshmi Srinivasaraghavan 09d47af2c0 Optimize zscal function for POWER10 4 years ago
  Martin Kroeker ef0238ba2b Merge pull request #3130 from martin-frbg/issue3128 4 years ago
  Martin Kroeker a9f6f7ad39 Remove spurious AVX512 requirement and add AVX2/FMA3 guard 4 years ago
  Rajalakshmi Srinivasaraghavan 41646ed006 Optimize s/dasum function for POWER10 4 years ago
  Rajalakshmi Srinivasaraghavan 0571c3187b POWER10: Rename mma builtins 4 years ago
  Martin Kroeker 292d1af1a0 Update omatcopy_rt.c 4 years ago
  Martin Kroeker 325b398e3c Update omatcopy_rt.c 4 years ago
  Martin Kroeker 6f5667b4d4 Enable optimized S/D OMATCOPY_RT 4 years ago
  Martin Kroeker cceeee7806 Add optimized omatcopy_rt 4 years ago
  Martin Kroeker 0a4546b742 Typo fix 4 years ago
  Martin Kroeker b1eed27a54 Replace naive omatcopy_rt with 4x4 blocked implementation 4 years ago
  Martin Kroeker 47691c031f Use Haswell optimizations for Zen as well 4 years ago
  Martin Kroeker ce7ddd8921 Use Haswell optimizations for Zen as well 4 years ago
  Martin Kroeker 950c047b49 Use Haswell optimizations for Zen as well 4 years ago
  Martin Kroeker 46509953a9 Use Haswell optimizations for Zen as well 4 years ago
  Martin Kroeker db348dcff2 Enable optimized srot/drot kernels from Haswell 4 years ago
  Rajalakshmi Srinivasaraghavan 2056ffc227 Optimize cscal function for POWER10 4 years ago
  Rajalakshmi Srinivasaraghavan 3ede843d50 Optimize s/dscal function for POWER10 4 years ago
  Martin Kroeker 69a5558203 Merge pull request #3059 from Guobing-Chen/BF16_gemm 4 years ago
  Martin Kroeker d6905403e3 Merge pull request #3068 from alexhenrie/scan-build 4 years ago
  Rajalakshmi Srinivasaraghavan 439b93f6d2 Optimize s/drot function for POWER10 4 years ago
  Rajalakshmi Srinivasaraghavan eff7c9166e Optimize cdot function for POWER10 4 years ago
  Alex Henrie 202fc9e8ed Fix uninitialized argument value in dasum_k 4 years ago
  Martin Kroeker e378b24487 Merge pull request #3067 from albertziegenhagel/fix-generic-cmake 4 years ago
  Albert Ziegenhagel e3f4063683 Fix building "generic" TRMM kernel with CMake 4 years ago
  Martin Kroeker b716c0ef01 Add workaround for NVIDIA HPC 4 years ago
  Martin Kroeker 2efa3b70dc Add workaround for NVIDIA HPC 4 years ago
  Martin Kroeker 49959d4f1c Add workaround for NVIDIA HPC 4 years ago
  Martin Kroeker 0f27a03607 Add workaround for NVIDIA HPC mishandling of the asm DOT kernels 4 years ago
  Martin Kroeker c2a8ebfe69 Add workaround for NVIDIA HPC mishandling of the asm DOT kernels 4 years ago
  Martin Kroeker 43aac5bacc Support NVIDIA HPC compiler 4 years ago
  Chen, Guobing b0beb0b1ca Initial code for Cooperlake BF16 GEMM kernel 4 years ago
  Rajalakshmi Srinivasaraghavan 601b711c78 Optimize swap function for POWER10 4 years ago
  Ashwin Sekhar T K 1b2508362b arm64: Fix nrm2 for input vectors with Inf 4 years ago
  Martin Kroeker 3559c5d7a2 Merge pull request #3048 from martin-frbg/issue2998 4 years ago