wjc404
|
096da2f51a
|
Update dgemm_kernel_16x2_skylakex.c
|
5 years ago |
wjc404
|
081b188529
|
Update KERNEL.SKYLAKEX
|
5 years ago |
wjc404
|
8019e70211
|
AVX512 16x2 DGEMM kernel
|
5 years ago |
Qiyu8
|
ff42e68652
|
Optimize general Gemm Beta
|
5 years ago |
Martin Kroeker
|
70f45749b9
|
Merge pull request #2367 from wjc404/develop
Improve paralleled SGEMM performance on SKYLAKEX CPUs
|
5 years ago |
wjc404
|
e5dcdeb550
|
Update sgemm_direct_skylakex.c
|
5 years ago |
wjc404
|
952cc2ba38
|
Update sgemm_kernel_16x4_skylakex_2.c
|
5 years ago |
wjc404
|
feaafbedd3
|
make skylakex sgemm code more readable
Some kernels were also adjusted to improve performance
|
5 years ago |
Martin Kroeker
|
b36018be6d
|
Merge pull request #2365 from wjc404/develop
Fix SKYLAKEX STRMM issues
|
5 years ago |
wjc404
|
3a100b2797
|
Update KERNEL.SKYLAKEX
|
5 years ago |
Martin Kroeker
|
38742d5547
|
Merge pull request #2361 from wjc404/develop
Optimize AVX2 SGEMM & STRMM
|
5 years ago |
wjc404
|
bd4c032f52
|
Update sgemm_kernel_8x4_haswell.c
|
5 years ago |
wjc404
|
9dc9b7b95e
|
Update sgemm_kernel_8x4_haswell.c
|
5 years ago |
wjc404
|
92b10212de
|
optimize AVX2 SGEMM
|
5 years ago |
wjc404
|
b73bf01378
|
optimize AVX2 SGEMM
|
5 years ago |
wjc404
|
eb3c9f1db9
|
optimize AVX2 SGEMM
|
5 years ago |
Martin Kroeker
|
456ee2e1f0
|
Merge pull request #2357 from chenxuqiang/dgemm_beta_zero
kernel/arm64/dgemm_beta.S: add beta == zero branch
|
5 years ago |
shengyang
|
80db5f11e1
|
update
|
5 years ago |
chenxuqiang
|
52de4cc8fd
|
kernel/arm64/dgemm_beta.S: add beta == zero branch
Added a beta == zero branch, which does not need to load the C matrix.
Signed by: Xuqiang Chen <chenxuqiang3@hisilicon.com>
|
5 years ago |
Martin Kroeker
|
44028581cc
|
Merge pull request #2355 from Zeyiii/dev-zeyi2
Use arm neon instructions to optimize sgemm_beta operation
|
5 years ago |
Martin Kroeker
|
86ab939936
|
Merge pull request #2354 from ZuoQ3/develop
[WIP] Use arm neon instructions to optimize tcopy operation
|
5 years ago |
Martin Kroeker
|
6c85cb1869
|
Merge pull request #2352 from wjc404/develop
AVX2 ZGEMM3M kernel
|
5 years ago |
Martin Kroeker
|
995768bbc5
|
Merge pull request #2351 from Zeyiii/develop
prefetching for dgemm_beta
|
5 years ago |
int_13h
|
96ad579428
|
add in runtime cpu detection for zarch (#2349)
|
5 years ago |
shengyang
|
8d84403205
|
Use arm neon instructions to optimize ncopy operation
modified: KERNEL.ARMV8
modified: KERNEL.TSV110
new file: sgemm_ncopy_4.S
|
5 years ago |
w00421467
|
0833a4846a
|
Use arm neon instructions to optimize sgemm_beta operation
|
5 years ago |
zq
|
50f7fc1401
|
[WIP] Use arm neon instructions to optimize tcopy operation
|
5 years ago |
w00421467
|
d1b53806be
|
Merge remote-tracking branch 'pub/develop' into develop
|
5 years ago |
wjc404
|
a0f0a802fc
|
Update zgemm3m_kernel_4x4_haswell.c
|
5 years ago |
wjc404
|
700fe5b5ee
|
Add files via upload
|
5 years ago |
wjc404
|
f60840c420
|
Update KERNEL.ZEN
|
5 years ago |
wjc404
|
109e18cd96
|
Update KERNEL.HASWELL
|
5 years ago |
wjc404
|
ae1579be13
|
Create zgemm3m_kernel_4x4_haswell.c
|
5 years ago |
w00421467
|
3ccf8885ac
|
prefetching for dgemm_beta
|
5 years ago |
wjc404
|
cd765f094b
|
Update cgemm3m_kernel_8x4_haswell.c
|
5 years ago |
wjc404
|
3a66c8cac1
|
Update KERNEL.ZEN
|
5 years ago |
wjc404
|
ed9af2f7da
|
Update KERNEL.HASWELL
|
5 years ago |
wjc404
|
5fd1edead9
|
Create cgemm3m_kernel_8x4_haswell.c
|
5 years ago |
wjc404
|
eeecd623d8
|
Update cgemm_kernel_8x2_haswell.c
|
6 years ago |
wjc404
|
2cd9306bb5
|
Update KERNEL.ZEN
|
6 years ago |
wjc404
|
c418c81224
|
Update KERNEL.HASWELL
|
6 years ago |
wjc404
|
025741f16a
|
Fast Haswell CGEMM kernel
|
6 years ago |
wjc404
|
f41d52665d
|
Fast Haswell ZGEMM kernel
|
6 years ago |
wjc404
|
d573d24de7
|
Fast Haswell ZGEMM kernel
|
6 years ago |
w00421467
|
b7cc69ee62
|
declare DGEMM_BETA in KERNEL.ARMV8 rather than the generic KERNEL
|
6 years ago |
w00421467
|
aeef942c4f
|
use arm neon instructions to optimize gemm beta operation
|
6 years ago |
Martin Kroeker
|
1a6ea8ee6d
|
Merge pull request #2338 from kavanabhat/aix_mod
Changes to build on AIX in POWER8 mode
|
6 years ago |
Kavana Bhat
|
6baa9b07d7
|
AIX changes for Power8
|
6 years ago |
Kavana Bhat
|
3938e59569
|
AIX changes for Power8
|
6 years ago |
Isuru Fernando
|
b863b32ac5
|
Workaround an ICE in clang 9.0.0
This bug is not there in 8.x nor in the 9.0 daily snapshot.
|
6 years ago |