1681 Commits (57ed58cefec3ca6669afc156cc90ffb49dba6593)

Author SHA1 Message Date
  Martin Kroeker dfbc62ef7e Support building only a subset of types 5 years ago
  Qiyu8 14f7dad3b7 performance improved 5 years ago
  Qiyu8 325b539c26 Optimize the performance of daxpy by using universal intrinsics 5 years ago
  Marius Hillenbrand 22aa81f3e5 s390x: fix cscal and zscal implementations 5 years ago
  Marius Hillenbrand f91057cbad s390x: move common vector definitions and utils into header 5 years ago
  Rajalakshmi Srinivasaraghavan be43d2cb96 Optimize daxpy/zaxpy for POWER10 5 years ago
  Martin Kroeker 91c84e1c01 Merge pull request #2796 from Guobing-Chen/BF16_dot_coversion_apis 5 years ago
  Martin Kroeker e72430fe46 Merge pull request #2803 from xiegengxin/AVX2-asum 5 years ago
  Chen, Guobing deaeb6c5b8 Add bfloat16 based dot and conversion with single/double 5 years ago
  Martin Kroeker 775a87242d Rename KERNEL.SILICON to KERNEL.VORTEX 5 years ago
  Gengxin Xie 1b0f17eeed align to 64, using SSE when input size is small 5 years ago
  Martin Kroeker 80794fe8fd Create KERNEL.SILICON 5 years ago
  Marius Hillenbrand 2ee5b899ce s390x: enable S/DGEMM block with explicit loop unrolling + interleaving with clang 5 years ago
  Marius Hillenbrand 87e5bbd887 s390x: avoid variable-length arrays in struct for asm operands 5 years ago
  Marius Hillenbrand b9b3265ec8 s390x: avoid inline assembly for vector loads for clang 5 years ago
  Marius Hillenbrand a1616a0b86 s390x: replace nop with "nop 0" in inline assembly 5 years ago
  Marius Hillenbrand 60ef193258 s390x: use "lghi" for immediate values to fix build with clang 5 years ago
  Gengxin Xie 448152cdd8 define __AVX2__ to ensure the haswell code compiled with avx2 5 years ago
  Gengxin Xie cb3c190a3a Implementaion of dasum, sasum with AVX2 & AVX512 intrinsic 5 years ago
  Rajalakshmi Srinivasaraghavan 317ff27cda POWER10: Avoid setting accumulators to zero in gemm kernels 5 years ago
  Martin Kroeker b2053239fc Fix mssing dummy parameter (imag part of alpha) of zdot_thread_function 5 years ago
  Martin Kroeker 9ee21a0a39 Merge pull request #2780 from Guobing-Chen/CPL_build_support 5 years ago
  Martin Kroeker 6f4dc7445d Fix typo 5 years ago
  Martin Kroeker 81fbe8d088 -march=cooperlake only available in gcc >= 10 5 years ago
  Martin Kroeker 75eeb265d7 [WIP] Refactor the driver code for direct SGEMM (#2782) 5 years ago
  Chen, Guobing e740c4873d Enable COOPERLAKE build target 5 years ago
  Martin Kroeker cbbe38bb88 Merge pull request #2772 from mhillenibm/s390x_gemm_tuning 5 years ago
  Marius Hillenbrand 07c334e7be s390x: Factor out small block sizes for SGEMM/DGEMM on z14 5 years ago
  Marius Hillenbrand e2828e30aa s390x: Optimize SGEMM/DGEMM blocks for z14 with explicit loop unrolling/interleaving 5 years ago
  Rajalakshmi Srinivasaraghavan 475b5c95b9 Remove extra symbol in Makefile 5 years ago
  Martin Kroeker 81dcfdcf39 Multiply by 2 instead of left-shifting a potentially negative number 5 years ago
  Martin Kroeker 0ef4b3f1f2 Multiply instead of doing a left shift of a potentially negative number 5 years ago
  Martin Kroeker aa53a8a5cb Multiply by two instead of left-shifting one place 5 years ago
  Martin Kroeker aa3a1e7d8c Multiply by two rather than left shift by one place 5 years ago
  Rajalakshmi Srinivasaraghavan f77b6a83f4 dgemv optimization for POWER10 5 years ago
  Rajalakshmi Srinivasaraghavan d557584b71 Fix compilation issues with clang on POWER 5 years ago
  Ashwin Sekhar T K 4e1be0e481 ARM64: Add THUNDERX3T110 Target 5 years ago
  Rajalakshmi Srinivasaraghavan 9be2688c78 Fix to store results in correct order for POWER10 GEMM kernels 5 years ago
  Martin Kroeker 6a2a60038c Merge pull request #2720 from martin-frbg/issue2694 5 years ago
  Martin Kroeker 251a09ec90 Typo fix 5 years ago
  Martin Kroeker 95d37e1575 Regroup the 32 and 64bit sections and restore 64bit CAXPY 5 years ago
  Martin Kroeker 3523bb778e Merge pull request #2721 from martin-frbg/p8align 5 years ago
  Martin Kroeker bf1f0734ff Use OPENBLAS_MAKE_COMPLEX_FLOAT on PPC only 5 years ago
  Martin Kroeker ca3561cab9 Add ifdefs around call to altivec microkernel 5 years ago
  Martin Kroeker 21072e502a Typo fix 5 years ago
  Martin Kroeker 7c6e56b5df Rewrite assignment to complex for better portability 5 years ago
  Martin Kroeker 661c6bfa5a Exclude altivec code paths if the compiler does not support them 5 years ago
  Martin Kroeker 0033f8be0d Use vec_vsx_ld/st to fix misaligned accesses flagged by asan 5 years ago
  Martin Kroeker f308e741b2 remove debug output and revert changes to cdot and crot 5 years ago
  Martin Kroeker da17abec87 fix trailing whitespace 5 years ago