20 Commits (develop)

Author SHA1 Message Date
  Chip Kerchner 36bd3eeddf Vectorize BF16 GEMV (VSX & MMA). Use GEMM_GEMV_FORWARD_BF16 (for Power). 11 months ago
  Chip Kerchner 09bb48d1b9 Vectorize in-copy packing/copying for SGEMM - 4X faster. 1 year ago
  Martin Kroeker ef85c22474
Add workaround for LAPACK test failures with the NVIDIA HPC compiler 4 years ago
  Gordon Fossum 213c0e7abb Added special unrolled vectorized versions of "Solve" for specific sizes, 4 years ago
  Rajalakshmi Srinivasaraghavan bd9ff820bc Fix cmake compilation issue - POWER9 5 years ago
  Martin Kroeker 4f371b0fbf
Use POWER8 kernels on big-endian POWER9 for now 5 years ago
  Martin Kroeker a9b62c03f8
Substitute precompiled gcc7 codes only when gcc is older than 9.x 5 years ago
  Martin Kroeker 673e5a0495
Replace several POWER8/9 C kernels with their gcc7-generated assembly versions (#2263) 6 years ago
  Martin Kroeker 6b6c9b1441
Merge pull request #2172 from quickwritereader/develop 6 years ago
  AbdelRauf a97b301aaa cgemm/ctrmm power9 6 years ago
  Martin Kroeker a17cf36225
Merge pull request #2153 from quickwritereader/develop 6 years ago
  AbdelRauf 148c4cc5fd conflict resolve 6 years ago
  AbdelRauf d0c3543c3f power9 zgemm ztrmm optimized 6 years ago
  AbdelRauf a469b32cf4 sgemm pipeline improved, zgemm rewritten without inner packs, ABI lxvx v20 fixed with vs52 6 years ago
  AbdelRauf 8fe794f059 improved zgemm power9 based on power8 6 years ago
  Martin Kroeker 3f427c0cf9
Merge pull request #2107 from quickwritereader/develop 6 years ago
  AbdelRauf 47f892198c conflict resolve 6 years ago
  AbdelRauf 0f105dd8a5 sgemm/strmm 6 years ago
  Rashmica Gupta bcdf1d4917 Add in runtime CPU detection for POWER. 6 years ago
  AbdelRauf 853a18bc17 power9 makefile. dgemm based on power8 kernel with following changes : 32x unrolled 16x4 kernel and 8x4 kernel using (lxv stxv butterfly rank1 update). improvement from 17 to 22-23gflops. dtrmm cases were added into dgemm itself 6 years ago