424 Commits (4270d5bc436e0064ed654668086efa54bd60fda8)

Author SHA1 Message Date
  Martin Kroeker 70865a894e Merge pull request #5180 from ywwry66/openmp_use_cmake 5 months ago
  Ruiyang Wu 02fd1df10b CMake: Pass `OpenMP` compiler and linker flags through CMake targets 6 months ago
  Martin Kroeker 51c1fb1f93 Fix ?spmv build and misinterpretation of NO_LAPACK=0 6 months ago
  shubham.chaudhari 8e289ecddc Simplified thread throttling function in gemv 6 months ago
  shubham.chaudhari 189dbbc04f Add thread throttling for dynamic arch neoversev1 7 months ago
  shubham.chaudhari b6cb5ece58 Add thread throttling profile for DGEMV on NEOVERSEV1 7 months ago
  Martin Kroeker 7338a473a7 Merge pull request #5150 from Harishmcw/WoA-Experiments 7 months ago
  Martin Kroeker 09ba099461 make throttling code conditional on SMP 7 months ago
  Harishmcw 030ae1fd97 Redefined threading logic for WoA 7 months ago
  Martin Kroeker c03a81b927 Merge pull request #5141 from michalowski-arm/fork-throttle 7 months ago
  Martin Kroeker 75b958a018 Transform the B array back if necessary before returning 7 months ago
  Marek Michalowski 650a062e19 Add thread throttling profile for SGEMV on `NEOVERSEV2` 7 months ago
  Marek Michalowski b723c1b7b7 Add thread throttling profile for SGEMM on `NEOVERSEV2` 7 months ago
  Vaisakh K V f66ca05b31 Merge branch 'develop' into topic/sgemm_direct_sme1 7 months ago
  Vaisakh K V d23eb3b93e Support for SME1 based sgemm_direct kernel for cblas_sgemm level 3 API 10 months ago
  Harish-Gits daf16b8229 Adjusted GESV threading logic for optimal performance on WoA 7 months ago
  Martin Kroeker 60d0be0e97 Update nrm2.c 7 months ago
  Martin Kroeker 0fd5448b2c Handle INCX=0 7 months ago
  Martin Kroeker db7e5f1fa7 Update gemmt.c 7 months ago
  Martin Kroeker ff30ac9666 Update Makefile 7 months ago
  Martin Kroeker 7c3e169b67 Update gemmt.c 7 months ago
  Martin Kroeker 09414a4187 Ensure that GEMMTR name appears in XERBLA if gemmt was called as such 7 months ago
  Marek Michalowski 838bb57e27 Merge branch 'develop' into develop 8 months ago
  Martin Kroeker a54f9a9c69 Merge pull request #5071 from annop-w/sgemm_throttling 8 months ago
  Marek Michalowski 4d5b13f765 Add thread throttling profile for SGEMV on `NEOVERSEV1` 8 months ago
  tingbo.liao 3c8df6358f Further rearranged the rotm kernel for the different architectures. 8 months ago
  Annop Wongwathanarat c8cd8da496 Add thread throttling profile for SGEMM on NEOVERSEV1 8 months ago
  Martin Kroeker a1075477c3 Merge pull request #4994 from martin-frbg/issue4886 9 months ago
  Martin Kroeker 0c440f8a27 disable multithreading for small workloads 10 months ago
  Martin Kroeker 2a290dfc2c forward GEMM3M calls for GENERIC targets to the regular C/ZGEMM for now 10 months ago
  Martin Kroeker 0cf656fd3e Add copies of GEMMT under its new name GEMMTR 11 months ago
  Chris Daley cb48505251 optimize gemv forwarding on ARM64 systems 11 months ago
  Chip Kerchner 36bd3eeddf Vectorize BF16 GEMV (VSX & MMA). Use GEMM_GEMV_FORWARD_BF16 (for Power). 11 months ago
  Chip Kerchner 1d51ca5798 Change multi-threading logic for SBGEMV to be the same as SGEMV. 11 months ago
  Martin Kroeker 9762464718 Fix CBLAS interface filling in the wrong triangle for Row-Major 11 months ago
  gxw 48698b2b1d LoongArch64: Rename core 1 year ago
  Martin Kroeker 7878976236 disable forwarding from SBGEMM to SBGEMV for now 1 year ago
  Chris Sidebottom b26424c6a2 Allow opt into GEMM -> GEMV forwarding 1 year ago
  Chris Sidebottom 90eb863d4b Re-add accidental removal 1 year ago
  Chris Sidebottom 28b5334f22 Complete implementation of GEMV forwarding 1 year ago
  Martin Kroeker 3db5dbc88e forward to GEMV when one argument is actually a vector 1 year ago
  gxw f3cebb3ca3 x86: Fixed numpy CI failure when the target is ZEN. 1 year ago
  Martin Kroeker 2f12a47405 fix build options for CAXPYC/ZAXPYC 1 year ago
  Martin Kroeker db9f7bc552 fix float array types to include bfloat16 1 year ago
  Martin Kroeker 076766df4e Update CMakeLists.txt 1 year ago
  Martin Kroeker ff6670cb83 don't generate non-cblas files for gemm_batch 1 year ago
  Martin Kroeker 362a063396 remove return value 1 year ago
  Martin Kroeker 89c7bbcba6 add cblas_?gemm_batch 1 year ago
  Martin Kroeker 2957281275 Introduce a lower limit for multithreading 1 year ago
  Martin Kroeker 5fd871d7ea Introduce a lower limit for multithreading 1 year ago