5933 Commits (6b58bca18b427a0c149d25542a5eb7c5ada6a19f)
 

Author SHA1 Message Date
  Wangyang Guo 6b58bca18b Small Matrix: disable low performance default kernel 4 years ago
  Wangyang Guo fa777f5517 Small Matrix: skylakex: add DGEMM_SMALL_M_PERMIT and tune for TN kernel 4 years ago
  Wangyang Guo 8592c21af4 Small Matrix: skylakex: dgemm nn: fix typo in idx load 4 years ago
  Wangyang Guo 3e79f6d89a Small Matrix: skylakex: add dgemm tn kernel 4 years ago
  Wangyang Guo 323d7da4f7 Small Matrix: skylakex: add dgemm tt kernel 4 years ago
  Wangyang Guo f57fc932ac Small Matrix: skylakex: add dgemm nt kernel 4 years ago
  Wangyang Guo 91ec21202b Small Matrix: skylakex: add dgemm nn kernel 4 years ago
  Wangyang Guo 72e070539c Small Matrix: skylakex: add sgemm tt kernel 4 years ago
  Wangyang Guo 02c6e764f2 Small Matrix: skylakex: add SGEMM_SMALL_M_PERMIT and tune for TN kernel 4 years ago
  Wangyang Guo 5dc7c3c8e5 Small Matrix: add GEMM_SMALL_MATRIX_PERMIT to tune small matrics case 4 years ago
  Wangyang Guo 642c393879 Small Matrix: skylakex: add sgemm tn kernel 4 years ago
  Wangyang Guo ae3f5c737c Small Matrix: skylakex: sgemm nt: optimize for M < 12 4 years ago
  Wangyang Guo 0d72d75bf9 Small Matrix: skylakex: add sgemm nt kernel 4 years ago
  Wangyang Guo ca7682e3a3 Small Matrix: skylakex: sgemm nn: fix n6 conflicts with n4 4 years ago
  Wangyang Guo 9967e61abb Small Matrix: skylakex: sgemm nn: fix error when beta not zero 4 years ago
  Wangyang Guo a87736346f Small Matrix: skylakex: sgemm nn: add n6 to improve performance 4 years ago
  Wangyang Guo 4c9d9940fd Small Matrix: skylakex: sgemm nn: reduce store 4 N at a time 4 years ago
  Wangyang Guo 13b32f69b7 Small Matrix: skylakex: sgemm nn: reduce store 4 M at a time 4 years ago
  Wangyang Guo 3d8c6d9607 Small Matrix: skylakex: sgemm nn: clean up unused code 4 years ago
  Wangyang Guo 49b61a3f30 Small Matrix: skylakex: sgemm_nn: optimize for M <= 8 4 years ago
  Wangyang Guo f88470323b Optimize M < 16 using AVX512 mask 4 years ago
  Wangyang Guo 9186456a12 small matrix: SkylakeX: add SGEMM NN kernel 4 years ago
  Xianyi Zhang 6022e5629c Refs #2587 fix small matrix c/zgemm bug. 5 years ago
  Xianyi Zhang 57ed58cefe Refs #2587 Add small matrix optimization reference kernel for c/zgemm. 5 years ago
  Xianyi Zhang 17d32a4a82 Change a1b0 gemm to b0 gemm. 5 years ago
  Xianyi Zhang 59cb5de46b Refs #2587 Fix typos. 5 years ago
  Xianyi Zhang 4271cfcc6f Fix gemm interface bug for small matrix. 5 years ago
  Xianyi Zhang be3349405d Add alpha=1.0 beta=0.0 for small gemm. 5 years ago
  Xianyi Zhang 0a2077901c Add small marix optimization kernel interface. 5 years ago
  Martin Kroeker e6d6d3ee43 Merge pull request #3331 from gxw-loongson/develop 4 years ago
  gxw 0b8f7c8c10 Add cmake support for LOONGARCH64 4 years ago
  Martin Kroeker e0e88f9edc Merge pull request #3329 from martin-frbg/issue3272 4 years ago
  Martin Kroeker 5dc6aa74f0 Disable gfortran tree vectorizer to avoid gcc11+ miscompilation at O3 4 years ago
  Martin Kroeker e78fbe4654 Disable gfortran tree vectorizer to avoid gcc11+ miscompilation at O3 4 years ago
  Martin Kroeker b4f4ed378b Disable gfortran tree vectorizer to avoid gcc11+ miscompilation at O3 4 years ago
  Martin Kroeker cbc41973fd Disable gfortran tree vectorizer to avoid gcc11+ miscompilation at O3 4 years ago
  gxw 34207bdf5b Fixed typos about LOONGARCH64 4 years ago
  Martin Kroeker 1b6db3dbba Merge pull request #3327 from h-vetinari/lapack597_redux 4 years ago
  Martin Kroeker f681553c6a Merge pull request #3326 from wattoc/develop 4 years ago
  Martin Kroeker afadeeba2a Merge pull request #3325 from gxw-loongson/develop 4 years ago
  Isuru Fernando 02d4a49761 Also make sure the `1` is INTEGER*4 for OMP_SET_NUM_THREADS 4 years ago
  Craig Watson 4d7dfe4845 Include Haiku in processor count checks 4 years ago
  gxw af0a69f355 Add support for LOONGARCH64 4 years ago
  Martin Kroeker 5a2fe5bfb9 Merge pull request #3323 from martin-frbg/issue3322 4 years ago
  Martin Kroeker 342d3e8b5c Merge pull request #3314 from martin-frbg/lapack597 4 years ago
  Martin Kroeker efbd7c7840 GCC did not support -mtune for ARM64 before 5.1 4 years ago
  Martin Kroeker 3a7955cd93 Merge pull request #3320 from martin-frbg/issue3318 4 years ago
  Martin Kroeker 47ba85f314 Fix regex to match kernels suffixed with cpuname too 4 years ago
  Martin Kroeker 30f23be0f9 Rework setting of -mfma to only apply it where necessary 4 years ago
  Martin Kroeker 49bbf330ca Empirical workaround for numpy SVD NaN problem from issue 3318 4 years ago