289 Commits (develop)

Author SHA1 Message Date
  Martin Kroeker a3b9c933c5 mark xbuffer as volatile to work around gcc15.1 optimizer bug 2 months ago
  Martin Kroeker cf06250d36 add handling of dummy2 flag 4 months ago
  Martin Kroeker 4ec62d7f73 remove non-vectorized code path for power8, restoring PR4880 5 months ago
  Ubuntu 0cc2485594 Explicit unaligned vector load/stores in PPC64LE GEMV kernels 5 months ago
  Martin Kroeker 77fba0f400 Fix "dummy2" flag handling 7 months ago
  Martin Kroeker 81eed868b6 Restore the non-vectorized code from before PR4880 for POWER8 7 months ago
  Martin Kroeker 98b5ef929c Restore the non-vectorized code from before PR4880 for POWER8 7 months ago
  Martin Kroeker d7036cfd74 Remove trailing blanks that break the cmake parser 8 months ago
  tingbo.liao 3c8df6358f Further rearranged the rotm kernel for the different architectures. 8 months ago
  Sergey Fedorov 229efa42ff scal.S: use r11 on 32-bit Darwin on powerpc 9 months ago
  Sergey Fedorov 81e1be8d90 Revert "temporarily disable the default S/DSCAL kernel" 9 months ago
  Martin Kroeker 9b9c0aa5c9 temporarily disable the default S/DSCAL kernel 9 months ago
  Ayappan Perumal 020cce1068 Fix build issues with gcc compiler as well 11 months ago
  Ayappan Perumal b6ec73e77c Fix AIX build 11 months ago
  Chip Kerchner ab71a1edf2 Better VSX. 11 months ago
  Chip Kerchner 36bd3eeddf Vectorize BF16 GEMV (VSX & MMA). Use GEMM_GEMV_FORWARD_BF16 (for Power). 11 months ago
  Martin Kroeker e52d9b4cf1 Merge pull request #4928 from austinpagan/czgemm_in_c 11 months ago
  Gordon Fossum 0b7fb5c791 CGEMM & ZGEMM using C code. 11 months ago
  Martin Kroeker c9e92348a6 Handle inf/nan if dummy2 flag is set 1 year ago
  Martin Kroeker d714013ab9 change sgemm kernel to 4x4 as the 16x4 altivec goes out of bounds 1 year ago
  Chip Kerchner 1a7b8c650d Merge branch 'develop' into betterPowerGEMVTail 1 year ago
  Martin Kroeker f5d04318e3 Merge branch 'OpenMathLib:develop' into scalfixes 1 year ago
  Martin Kroeker 73f8866ffb make NAN handling depend on DUMMY2 parameter 1 year ago
  Hong Bo Peng db98f8753f Try to fix LAPACK testing failures on P7. 1 year ago
  Martin Kroeker b9bfc8ce09 make NAN handling depend on dummy2 parameter 1 year ago
  Chip Kerchner ba47c7f4f3 Vectorize reduction stage of sgemv_t. 1 year ago
  Chip Kerchner cb154832f8 Vectorize SBGEMM incopy - 4x faster. 1 year ago
  Martin Kroeker 2a5fe97e3b temporarily(?) disable the alpha=0 branch as it does not handle INF,NAN 1 year ago
  Martin Kroeker 7f8f037a36 handle INF and NAN in input 1 year ago
  Martin Kroeker f1248b849d handle INF and NAN in input 1 year ago
  Rajalakshmi Srinivasaraghavan e112191b54 POWER: Fix issues in zscal to address lapack failures 1 year ago
  Martin Kroeker aa259b141d Merge pull request #4704 from amritahs-ibm/saxpy_perf_fix 1 year ago
  Chip Kerchner 3a1417671a POWER: Fixing endianness issue in cswap/zswap kernel for AIX 1 year ago
  Amrita H S 87b3d9054f Fix regression SAXPY when compiler with OpenXL compiler. 1 year ago
  Chip-Kerchner 99384933ff Revert "Merge pull request #4532 from austinpagan/cgemm_zgemm_c_code" 1 year ago
  Martin Kroeker accea15551 Merge pull request #4532 from austinpagan/cgemm_zgemm_c_code 1 year ago
  austinpagan 87ba528d8b Changed C files to straighten out indentation. Removed commented lines from other file. 1 year ago
  austinpagan ddac75e0ef Adding .C versions of CGEMM and ZGEMM 1 year ago
  Chip Kerchner 2bb7ea64a1 Only vectorize 64-bit version for Power8. 1 year ago
  Chip Kerchner 09bb48d1b9 Vectorize in-copy packing/copying for SGEMM - 4X faster. 1 year ago
  Chip-Kerchner 058dd2a4cb Replace two vector loads with one vector pair load and fix endianess of stores - DGEMM versions. 1 year ago
  barracuda156 d9653af018 KERNEL.PPC970, KERNEL.PPCG4: unbreak CMake parsing 1 year ago
  Chip-Kerchner 4e738e561a Replace two vector loads with one vector pair load and fix endianess of stores. 1 year ago
  Rajalakshmi Srinivasaraghavan 980f702f72 POWER: AIX: Make use of power10 optimization 1 year ago
  Rajalakshmi Srinivasaraghavan 82fc29a57a POWER10: Fallback to POWER8 functions 2 years ago
  Martin Kroeker 8e6d93359d Merge pull request #4196 from TiborGY/obsolete_inlines 2 years ago
  Ian McInerney 79c15db348 Fix power10 gcc intrinsic check 2 years ago
  TGY b5ba95a6c0 Modernize obsolete inline order 2 years ago
  Martin Kroeker 54d3246fc6 Allow negative INCX (API change from version 3.10 of the reference implementation) 2 years ago
  Manjul Mohan 58b88aa5f0 POWER10: Fix compiler warnings 2 years ago