351 Commits (114316f36102e3500ec84c055acb37802afdb313)

Author SHA1 Message Date
  Chris Sidebottom 114316f361 Optimize SBGEMM / BGEMM for NEOVERSEV1 further 1 month ago
  Martin Kroeker f1ee61ea30 Include NEON header for the bfloat conversion functions 1 month ago
  Martin Kroeker b3ffd5524a Include NEON header for the bfloat conversion functions 1 month ago
  Martin Kroeker a5e7c0e3e0 Merge pull request #5396 from abhishek-iitmadras/abhishekk_bfloat16 2 months ago
  abhishek-fujitsu 0bc79da587 add neon header 2 months ago
  Chris Sidebottom ea2faf0c9a Add optimized BGEMM for NEOVERSEN2 target 2 months ago
  Chris Sidebottom 2c3cdaf74e Optimized BGEMV for NEOVERSEV1 target 2 months ago
  Martin Kroeker 39c90f9859 Merge pull request #5380 from quic/topic/sgemm_direct_sme1_alpha_beta 2 months ago
  Rajendra Prasad Matcha eae0abfdb6 SME1 based direct kernel with alpha and beta for cblas_sgemm level 3 API. 2 months ago
  Chris Sidebottom 740efd71c4 Add optimized BGEMM kernel for NEOVERSEV1 target 2 months ago
  Martin Kroeker fd37406817 Merge branch 'develop' into optimized_gemv_n_1x3 2 months ago
  Iha, Taisei f7ad906b49 Performance improvements of [SD]DOT with loop-unrolling on A64FX 3 months ago
  Martin Kroeker ee26caffb3 Merge pull request #5309 from davidz-ampere/dev-ampereone 3 months ago
  davidz-ampere aa90ab4142 Add support for Ampere AmpereOne processors 3 months ago
  Ian McInerney badef1d32e Update sbgemm_tcopy_4_neoversev1 kernel to use standard C types 3 months ago
  davidz-ampere 84730068af reduce duplicate kernel code 3 months ago
  davidz-ampere be68ef03b4 Add support for Ampere processors 3 months ago
  Martin Kroeker 58eeb9041c fix handling of dummy2 3 months ago
  Martin Kroeker 1589d0b21e Merge pull request #5281 from martin-frbg/zscal_arm64 3 months ago
  Sharif Inamdar 8279e68805 Optimize gemv_n_sve_v1x3 kernel 3 months ago
  Arne Juul 5442aff218 Accumulate results in output register explicitly 3 months ago
  Martin Kroeker 28f8fdaf0f support flag for NaN/Inf handling and fix scaling of NaN/Inf values 4 months ago
  Martin Kroeker 5141a90993 Fix ARMV9SME target in DYNAMIC_ARCH and add SME query code for MacOS (#5222) 4 months ago
  Martin Kroeker 151b74284e Merge pull request #5203 from quic/fix-sgemmdirect-sme1 4 months ago
  abhishek-fujitsu 9c02cdb073 optimise dot using thread throttling for NEOVERSE V1 6 months ago
  Martin Kroeker d0e8fd6d40 Merge pull request #5239 from annop-w/gemv_n_sve 5 months ago
  Iha, Taisei 08b5c18d70 fixed a potential out-of-bounds on gemv. 5 months ago
  Annop Wongwathanarat e11744a411 Use SVE kernel for S/DGEMVN for SVE machines 5 months ago
  Martin Kroeker dd38b4e811 Merge pull request #5225 from annop-w/gemv_n 5 months ago
  Martin Kroeker 0241d516f6 Merge pull request #5220 from iha-taisei/sdgemv_n_unroll 5 months ago
  Annop Wongwathanarat d535728803 Improve performance for SGEMVN on NEONVERSEN1 5 months ago
  Usui, Tetsuzo d711906e3e Add symv kernels for arm64 5 months ago
  Iha, Taisei f1e628b889 Further performance improvements to [SD]GEMV. 5 months ago
  Annop Wongwathanarat ec146157d3 Use SVE kernel for S/DGEMVT for SVE machines 6 months ago
  Vaisakh K V 04915be829 Add vector registers to clobber list to prevent compiler optimization. 6 months ago
  Ye Tao f27ba5efd1 fix bugs in aarch64 sbgemv_n kernel 6 months ago
  Annop Wongwathanarat edef2e4441 Fix bug in ARM64 sbgemv_t 6 months ago
  Martin Kroeker b55ca71d5b Merge pull request #5182 from annop-w/sgemm_ncopy 6 months ago
  Martin Kroeker 2f778554b8 Merge pull request #5181 from taoye9/change_sbgemn_cast_bf16 6 months ago
  Annop Wongwathanarat 9807f56580 Optimize aarch64 sgemm_ncopy 6 months ago
  Martin Kroeker a3e7b16072 Merge pull request #5157 from manaalmj/feature 6 months ago
  Ye Tao 4c00099ed6 replace customize bf16_to_fp32 with arm neon vcvtah_f32_bf16 6 months ago
  Annop Wongwathanarat a085b6c9ec Fix aarch64 sbgemv_t compilation error for GCC < 13 6 months ago
  manjam01 5c4e38ab17 Optimize gemv_n_sve kernel 7 months ago
  Martin Kroeker 1d5ed5c46b Merge pull request #5168 from taoye9/add_sbgemvn_on_neonversen2 7 months ago
  Ye Tao 6b8b35cdf2 fix minior issues of redeclaration of float x0,x1 in sbgemv_n_neon.c 7 months ago
  Ye Tao 38ee7c9301 Add dispatch of SBGEMVNKERNEL for NEOVERSEN2 and NEOVERSEV2 7 months ago
  Martin Kroeker 2b941c44b5 Merge branch 'develop' into sbgemv_n_neon 7 months ago
  Ye Tao 35bdbca153 Add sbgemv_n_neon kernel for arm64. 7 months ago
  Annop Wongwathanarat edaf51dd99 Add sbgemv_t_bfdot kernel for ARM64 7 months ago