306 Commits (cfabc48190bb3ac1b5c6ace9ee560477394054c8)

Author SHA1 Message Date
  Martin Kroeker 8c99d5d1b6 Merge pull request #3796 from martin-frbg/gemmt 2 years ago
  Martin Kroeker e6204d254f Update CMakeLists.txt 2 years ago
  Martin Kroeker 1b77764182 Conditionally leave out bits of LAPACK to be overridden by ReLAPACK 2 years ago
  Martin Kroeker c970717157 fix missing t in xgemmt rule 2 years ago
  Martin Kroeker e7fd8d21a6 Add GEMMT based on looped GEMV 2 years ago
  Martin Kroeker a3e02742f2 Add USE_PERL fallback option for create script used with FUNCTION_PROFILE 3 years ago
  Martin Kroeker f1c570a5f1 Add back original PERL-based script under new name 3 years ago
  Owen Rafferty 42c7a27e6b rewrite perl scripts in universal shell 3 years ago
  Martin Kroeker 7656aba00e Merge pull request #3493 from martin-frbg/casts+cleanup 3 years ago
  Martin Kroeker d2b5fbf80f Exclude some complex (LAPACK) functions when NO_LAPACK is set 3 years ago
  Martin Kroeker 64365c919e fix function typecasts 3 years ago
  gxw 25f99fa9f8 Add cblas_{c/z}srot cblas_{c/z}rotg support 3 years ago
  Martin Kroeker 4b3769823a Revert #3252 3 years ago
  Martin Kroeker 2845f54eb8 Remove dangerous optimization from previous #3252 - buffer is never unused here 4 years ago
  Martin Kroeker c35739db5e Add separate entries for BFLOAT16 functions and fix missing cblas_xerbla 4 years ago
  Martin Kroeker 1085775bc6 really remove the unused variable 4 years ago
  Martin Kroeker 20581bf303 Remove unused variable 4 years ago
  Wangyang Guo 4289cf048d sbgemm: avoid falling into SGEMM_KERNEL_DIRECT 4 years ago
  Wangyang Guo 2e44ca0136 sbgemm: add missing cblas_sbgemm definition 4 years ago
  Wangyang Guo 1d83ca4bca Small Matrix: support BFLOAT16 data type 4 years ago
  Wangyang Guo c17d6dacb2 Small Matrix: skip compile in unimplemented data type 4 years ago
  Wangyang Guo aa50185647 Small Matrix: better handle with GEMM3M marco 4 years ago
  Wangyang Guo 478d1086c1 Small Matrix: support DYNAMIC_ARCH build 4 years ago
  Wangyang Guo 5dc7c3c8e5 Small Matrix: add GEMM_SMALL_MATRIX_PERMIT to tune small matrics case 4 years ago
  Xianyi Zhang 6022e5629c Refs #2587 fix small matrix c/zgemm bug. 5 years ago
  Xianyi Zhang 57ed58cefe Refs #2587 Add small matrix optimization reference kernel for c/zgemm. 5 years ago
  Xianyi Zhang 17d32a4a82 Change a1b0 gemm to b0 gemm. 5 years ago
  Xianyi Zhang 4271cfcc6f Fix gemm interface bug for small matrix. 5 years ago
  Xianyi Zhang be3349405d Add alpha=1.0 beta=0.0 for small gemm. 5 years ago
  Xianyi Zhang 0a2077901c Add small marix optimization kernel interface. 5 years ago
  Martin Kroeker 1dea57ab25 Revert PR #3250 (shortcut without buffer allocation) as it is unsafe on some x86_64 4 years ago
  Martin Kroeker 7bb59fceb7 Clean up some warnings 4 years ago
  Martin Kroeker 4ed99c2ce3 Merge pull request #3292 from martin-frbg/syrk_limit 4 years ago
  Martin Kroeker 8186963d8c Add lower limit for multithreading 4 years ago
  Martin Kroeker 726c44242b Add lower threshold for multithreading 4 years ago
  Martin Kroeker 1b5620b66e Add lower threshold for multithreading in ?potrf and ?potri 4 years ago
  Martin Kroeker baf03a0937 Merge pull request #3252 from martin-frbg/more_shortcuts 4 years ago
  Martin Kroeker 7aab5e826c Merge pull request #3250 from martin-frbg/gemv-shortcut 4 years ago
  Martin Kroeker f84197c1a7 Add shortcuts for (small) cases that do not need expensive buffer allocation 4 years ago
  Martin Kroeker 734bd265a8 revert symv changes for now 4 years ago
  Martin Kroeker 1217eb910d Fix copy-paste errors in variables used 4 years ago
  Martin Kroeker d6d7a6685d Add shortcuts for (small) cases that do not need expensive buffer allocation 4 years ago
  Martin Kroeker f0e7345fb8 Add shortcut for small-size gemv_n with increments of one 4 years ago
  Martin Kroeker 03297ff9f0 Add fast path for small xSYR with INCX==1 4 years ago
  Gordon Fossum 8b599836db Add error message token for SBGEMM in gemm.c 4 years ago
  Martin Kroeker 904b221f03 Add cast to prevent overflow of intermediate result 4 years ago
  Martin Kroeker c5fb91f1bc Fix division by zero in the non-x86 codepath 4 years ago
  Harmen Stoppels ec6b354c32 use /usr/bin/env perl 4 years ago
  Martin Kroeker bd906e3410 fix copy-paste error in build rules for cblas_crotg and cblas_zrotg 4 years ago
  Alex Henrie f1bf2603e6 Remove dead assignment to dflag in rotmg functions 4 years ago