338 Commits (2d0b2334259d41c2003b51a07580dbd25cfe267c)

Author SHA1 Message Date
  Wangyang Guo 2e44ca0136 sbgemm: add missing cblas_sbgemm definition 4 years ago
  Wangyang Guo 1d83ca4bca Small Matrix: support BFLOAT16 data type 4 years ago
  Wangyang Guo c17d6dacb2 Small Matrix: skip compile in unimplemented data type 4 years ago
  Wangyang Guo aa50185647 Small Matrix: better handle with GEMM3M marco 4 years ago
  Wangyang Guo 478d1086c1 Small Matrix: support DYNAMIC_ARCH build 4 years ago
  Wangyang Guo 5dc7c3c8e5 Small Matrix: add GEMM_SMALL_MATRIX_PERMIT to tune small matrics case 4 years ago
  Xianyi Zhang 6022e5629c Refs #2587 fix small matrix c/zgemm bug. 5 years ago
  Xianyi Zhang 57ed58cefe Refs #2587 Add small matrix optimization reference kernel for c/zgemm. 5 years ago
  Xianyi Zhang 17d32a4a82 Change a1b0 gemm to b0 gemm. 5 years ago
  Xianyi Zhang 4271cfcc6f Fix gemm interface bug for small matrix. 5 years ago
  Xianyi Zhang be3349405d Add alpha=1.0 beta=0.0 for small gemm. 5 years ago
  Xianyi Zhang 0a2077901c Add small marix optimization kernel interface. 5 years ago
  Martin Kroeker 1dea57ab25 Revert PR #3250 (shortcut without buffer allocation) as it is unsafe on some x86_64 4 years ago
  Martin Kroeker 7bb59fceb7 Clean up some warnings 4 years ago
  Martin Kroeker 4ed99c2ce3 Merge pull request #3292 from martin-frbg/syrk_limit 4 years ago
  Martin Kroeker 8186963d8c Add lower limit for multithreading 4 years ago
  Martin Kroeker 726c44242b Add lower threshold for multithreading 4 years ago
  Martin Kroeker 1b5620b66e Add lower threshold for multithreading in ?potrf and ?potri 4 years ago
  Martin Kroeker baf03a0937 Merge pull request #3252 from martin-frbg/more_shortcuts 4 years ago
  Martin Kroeker 7aab5e826c Merge pull request #3250 from martin-frbg/gemv-shortcut 4 years ago
  Martin Kroeker f84197c1a7 Add shortcuts for (small) cases that do not need expensive buffer allocation 4 years ago
  Martin Kroeker 734bd265a8 revert symv changes for now 4 years ago
  Martin Kroeker 1217eb910d Fix copy-paste errors in variables used 4 years ago
  Martin Kroeker d6d7a6685d Add shortcuts for (small) cases that do not need expensive buffer allocation 4 years ago
  Martin Kroeker f0e7345fb8 Add shortcut for small-size gemv_n with increments of one 4 years ago
  Martin Kroeker 03297ff9f0 Add fast path for small xSYR with INCX==1 4 years ago
  Gordon Fossum 8b599836db Add error message token for SBGEMM in gemm.c 4 years ago
  Martin Kroeker 904b221f03 Add cast to prevent overflow of intermediate result 4 years ago
  Martin Kroeker c5fb91f1bc Fix division by zero in the non-x86 codepath 4 years ago
  Harmen Stoppels ec6b354c32 use /usr/bin/env perl 4 years ago
  Martin Kroeker bd906e3410 fix copy-paste error in build rules for cblas_crotg and cblas_zrotg 4 years ago
  Alex Henrie f1bf2603e6 Remove dead assignment to dflag in rotmg functions 4 years ago
  Alex Henrie 6f32991eae Don't define the mode variable when not needed in gemm functions 4 years ago
  Martin Kroeker a8f249458d Build CBLAS interfaces for CROTG and ZROTG as well 4 years ago
  Martin Kroeker ac3e2a3fdd Add CBLAS interfaces for csrot and zdrot 4 years ago
  Martin Kroeker 857afcc41d Use ifeq instead of ifdef for user-definable build options 4 years ago
  Chen, Guobing a7b1f9b1bb Implementation of BF16 based gemv 5 years ago
  Martin Kroeker 6a1f3e40af Remove debug printout of object list 5 years ago
  Rajalakshmi Srinivasaraghavan b5d30b390d Fix build issues with bfloat16 5 years ago
  Martin Kroeker 1e7eb7b7a9 Fix typos in currently unused sections 5 years ago
  Martin Kroeker 052f31bc3c Change "HALF" and "sh" to "BFLOAT16" and "sb" 5 years ago
  Martin Kroeker 0f7d73ff6d Allow supporting only a subset of variable types 5 years ago
  Martin Kroeker b475b4bd0d Support building only a subset of types 5 years ago
  Chen, Guobing deaeb6c5b8 Add bfloat16 based dot and conversion with single/double 5 years ago
  Martin Kroeker 75eeb265d7 [WIP] Refactor the driver code for direct SGEMM (#2782) 5 years ago
  Martin Kroeker fee361ae64 fix another source of NO_CBLAS=0 surprise 5 years ago
  Ashwin Sekhar T K 4e1be0e481 ARM64: Add THUNDERX3T110 Target 5 years ago
  Martin Kroeker 5dd14e3d48 Make building the bfloat16 functions conditional on option BUILD_HALF (#2590) 5 years ago
  Martin Kroeker 2db5178e2d enable cblas interfaces to GEMM3M in CMAKE builds 5 years ago
  Rajalakshmi Srinivasaraghavan 7eb55504b1 RFC : Add half precision gemm for bfloat16 in OpenBLAS 5 years ago