51 Commits (eae0abfdb6153a4f8619927e46c797859e55d48c)

Author SHA1 Message Date
  Rajendra Prasad Matcha eae0abfdb6 SME1 based direct kernel with alpha and beta for cblas_sgemm level 3 API. 2 months ago
  Srangrang ec14e1648c fix: resolve non-RISCV host build failed issue 3 months ago
  Martin Kroeker 6680e0592f Fix conditional inclusion of SGEMM_KERNEL_DIRECT 4 months ago
  Martin Kroeker 09ba099461 make throttling code conditional on SMP 7 months ago
  Marek Michalowski b723c1b7b7 Add thread throttling profile for SGEMM on `NEOVERSEV2` 7 months ago
  Vaisakh K V d23eb3b93e Support for SME1 based sgemm_direct kernel for cblas_sgemm level 3 API 10 months ago
  Chris Daley cb48505251 optimize gemv forwarding on ARM64 systems 11 months ago
  Chip Kerchner 36bd3eeddf Vectorize BF16 GEMV (VSX & MMA). Use GEMM_GEMV_FORWARD_BF16 (for Power). 11 months ago
  gxw 48698b2b1d LoongArch64: Rename core 1 year ago
  Martin Kroeker 7878976236 disable forwarding from SBGEMM to SBGEMV for now 1 year ago
  Chris Sidebottom b26424c6a2 Allow opt into GEMM -> GEMV forwarding 1 year ago
  Chris Sidebottom 90eb863d4b Re-add accidental removal 1 year ago
  Chris Sidebottom 28b5334f22 Complete implementation of GEMV forwarding 1 year ago
  Martin Kroeker 3db5dbc88e forward to GEMV when one argument is actually a vector 1 year ago
  gxw 637c650f4f loongarch64: Add buffer offset for target LOONGSON3R5 1 year ago
  Martin Kroeker 93d975d8fd Merge pull request #4593 from XiWeiGu/loongarch_add_buffer_offset 1 year ago
  gxw d8c4ea8793 loongarch: Optimizing the performance of the GEMM on servers 1 year ago
  Martin Kroeker a3354a7630 Cap the number of parallel threads 1 year ago
  Honglin Zhu 71e4125795 Fix syscall error on non-x86 platform 2 years ago
  Honglin Zhu 90f041e348 Invoke the syscall to allow the use of amx tiles 2 years ago
  Wangyang Guo 4289cf048d sbgemm: avoid falling into SGEMM_KERNEL_DIRECT 4 years ago
  Wangyang Guo 2e44ca0136 sbgemm: add missing cblas_sbgemm definition 4 years ago
  Wangyang Guo 1d83ca4bca Small Matrix: support BFLOAT16 data type 4 years ago
  Wangyang Guo c17d6dacb2 Small Matrix: skip compile in unimplemented data type 4 years ago
  Wangyang Guo aa50185647 Small Matrix: better handle with GEMM3M marco 4 years ago
  Wangyang Guo 478d1086c1 Small Matrix: support DYNAMIC_ARCH build 4 years ago
  Wangyang Guo 5dc7c3c8e5 Small Matrix: add GEMM_SMALL_MATRIX_PERMIT to tune small matrics case 4 years ago
  Xianyi Zhang 6022e5629c Refs #2587 fix small matrix c/zgemm bug. 5 years ago
  Xianyi Zhang 57ed58cefe Refs #2587 Add small matrix optimization reference kernel for c/zgemm. 5 years ago
  Xianyi Zhang 17d32a4a82 Change a1b0 gemm to b0 gemm. 5 years ago
  Xianyi Zhang 4271cfcc6f Fix gemm interface bug for small matrix. 5 years ago
  Xianyi Zhang be3349405d Add alpha=1.0 beta=0.0 for small gemm. 5 years ago
  Xianyi Zhang 0a2077901c Add small marix optimization kernel interface. 5 years ago
  Martin Kroeker 7bb59fceb7 Clean up some warnings 4 years ago
  Gordon Fossum 8b599836db Add error message token for SBGEMM in gemm.c 4 years ago
  Alex Henrie 6f32991eae Don't define the mode variable when not needed in gemm functions 4 years ago
  Martin Kroeker 75eeb265d7 [WIP] Refactor the driver code for direct SGEMM (#2782) 5 years ago
  Rajalakshmi Srinivasaraghavan 7eb55504b1 RFC : Add half precision gemm for bfloat16 in OpenBLAS 5 years ago
  Martin Kroeker 8229c163b7 Use runtime check for AVX512 (sgemm_direct) capability when using DYNAMIC_ARCH 5 years ago
  Martin Kroeker 6a14b34c20 Avoid calling DIRECT codepath in DYNAMIC_ARCH on non-SKX 5 years ago
  Arjan van de Ven cdc668d82b Add a "sgemm direct" mode for small matrixes 6 years ago
  Craig Donner 66316b9f4c Improve performance of GEMM for small matrices when SMP is defined. 7 years ago
  Martin Kroeker 2c222f1faa Modify complex CBLAS functions to take void pointers 8 years ago
  Hank Anderson e74462a3f5 Moved declarations to start of functions to satisfy MSVC C89 implementation. 10 years ago
  wernsaar 3300f5ebff optimized multithreading lower limits 11 years ago
  wernsaar d286daa2ba adjusted number of threads for small size 11 years ago
  Timothy Gu 6c2ead30f0 Remove all trailing whitespace except lapack-netlib 11 years ago
  wernsaar a19d209005 Ref #103: enhancement for small matrix dimensions 11 years ago
  Lars Buitinck 3f7b0cd994 Merge pull request #290 from larsmans/missing-threshold 12 years ago
  Xianyi Zhang 31c836ac25 Ref #79 Added GEMM_MULTITHREAD_THRESHOLD flag to use single thread in gemm function with small matrices. 13 years ago