1468 Commits (de139337b8bcb1c76cd157afd4d5fd035a76efdf)

Author SHA1 Message Date
  Martin Kroeker e72430fe46 Merge pull request #2803 from xiegengxin/AVX2-asum 5 years ago
  Martin Kroeker 775a87242d Rename KERNEL.SILICON to KERNEL.VORTEX 5 years ago
  Gengxin Xie 1b0f17eeed align to 64, using SSE when input size is small 5 years ago
  Martin Kroeker 80794fe8fd Create KERNEL.SILICON 5 years ago
  Marius Hillenbrand 2ee5b899ce s390x: enable S/DGEMM block with explicit loop unrolling + interleaving with clang 5 years ago
  Marius Hillenbrand 87e5bbd887 s390x: avoid variable-length arrays in struct for asm operands 5 years ago
  Marius Hillenbrand b9b3265ec8 s390x: avoid inline assembly for vector loads for clang 5 years ago
  Marius Hillenbrand a1616a0b86 s390x: replace nop with "nop 0" in inline assembly 5 years ago
  Marius Hillenbrand 60ef193258 s390x: use "lghi" for immediate values to fix build with clang 5 years ago
  Gengxin Xie 448152cdd8 define __AVX2__ to ensure the haswell code compiled with avx2 5 years ago
  Gengxin Xie cb3c190a3a Implementation of dasum, sasum with AVX2 & AVX512 intrinsic 5 years ago
  Rajalakshmi Srinivasaraghavan 317ff27cda POWER10: Avoid setting accumulators to zero in gemm kernels 5 years ago
  Martin Kroeker b2053239fc Fix missing dummy parameter (imag part of alpha) of zdot_thread_function 5 years ago
  Martin Kroeker 9ee21a0a39 Merge pull request #2780 from Guobing-Chen/CPL_build_support 5 years ago
  Martin Kroeker 6f4dc7445d Fix typo 5 years ago
  Martin Kroeker 81fbe8d088 -march=cooperlake only available in gcc >= 10 5 years ago
  Martin Kroeker 75eeb265d7 [WIP] Refactor the driver code for direct SGEMM (#2782) 5 years ago
  Chen, Guobing e740c4873d Enable COOPERLAKE build target 5 years ago
  Martin Kroeker cbbe38bb88 Merge pull request #2772 from mhillenibm/s390x_gemm_tuning 5 years ago
  Marius Hillenbrand 07c334e7be s390x: Factor out small block sizes for SGEMM/DGEMM on z14 5 years ago
  Marius Hillenbrand e2828e30aa s390x: Optimize SGEMM/DGEMM blocks for z14 with explicit loop unrolling/interleaving 5 years ago
  Rajalakshmi Srinivasaraghavan 475b5c95b9 Remove extra symbol in Makefile 5 years ago
  Martin Kroeker 81dcfdcf39 Multiply by 2 instead of left-shifting a potentially negative number 5 years ago
  Martin Kroeker 0ef4b3f1f2 Multiply instead of doing a left shift of a potentially negative number 5 years ago
  Martin Kroeker aa53a8a5cb Multiply by two instead of left-shifting one place 5 years ago
  Martin Kroeker aa3a1e7d8c Multiply by two rather than left shift by one place 5 years ago
  Rajalakshmi Srinivasaraghavan f77b6a83f4 dgemv optimization for POWER10 5 years ago
  Rajalakshmi Srinivasaraghavan d557584b71 Fix compilation issues with clang on POWER 5 years ago
  Ashwin Sekhar T K 4e1be0e481 ARM64: Add THUNDERX3T110 Target 5 years ago
  Rajalakshmi Srinivasaraghavan 9be2688c78 Fix to store results in correct order for POWER10 GEMM kernels 5 years ago
  Martin Kroeker 6a2a60038c Merge pull request #2720 from martin-frbg/issue2694 5 years ago
  Martin Kroeker 251a09ec90 Typo fix 5 years ago
  Martin Kroeker 95d37e1575 Regroup the 32 and 64bit sections and restore 64bit CAXPY 5 years ago
  Martin Kroeker 3523bb778e Merge pull request #2721 from martin-frbg/p8align 5 years ago
  Martin Kroeker bf1f0734ff Use OPENBLAS_MAKE_COMPLEX_FLOAT on PPC only 5 years ago
  Martin Kroeker ca3561cab9 Add ifdefs around call to altivec microkernel 5 years ago
  Martin Kroeker 21072e502a Typo fix 5 years ago
  Martin Kroeker 7c6e56b5df Rewrite assignment to complex for better portability 5 years ago
  Martin Kroeker 661c6bfa5a Exclude altivec code paths if the compiler does not support them 5 years ago
  Martin Kroeker 0033f8be0d Use vec_vsx_ld/st to fix misaligned accesses flagged by asan 5 years ago
  Martin Kroeker f308e741b2 remove debug output and revert changes to cdot and crot 5 years ago
  Martin Kroeker da17abec87 fix trailing whitespace 5 years ago
  Martin Kroeker f8c2697701 Use POWER6 GEMM, TRMM and DTRSM on 32bit POWER8 5 years ago
  Martin Kroeker b144423f0f Do not define USE_TRMM for 32bit POWER8 5 years ago
  Martin Kroeker ed7e155c35 Merge branch 'develop' into aix 5 years ago
  EGuesnet 634e1305f9 Update cgemm_kernel_8x4_power8.S 5 years ago
  Martin Kroeker 28d69e0097 Merge pull request #2687 from martin-frbg/utfbom 5 years ago
  Martin Kroeker c2467c9619 Merge pull request #2686 from RajalakshmiSR/p10_shgemm 5 years ago
  Martin Kroeker d199c2787d Merge pull request #2680 from kavanabhat/aix_makefile_fix 5 years ago
  Martin Kroeker e30ad0e521 Strip UTF8 byte order marker from source 5 years ago