Martin Kroeker
db070a9223
add gemm_batch drivers
1 year ago
Martin Kroeker
d0794f88dc
add gemm_batch driver
1 year ago
Martin Kroeker
0073affe63
Merge pull request #4693 from goplanid/locks-improvement
Lock Management Improvements for Memory Allocation Efficiency
1 year ago
Martin Kroeker
6ca9ffa7f5
Merge pull request #4655 from yamazakimitsufumi/update_2d_thread_distribution
Expanding the scope of 2D thread distribution to improve multi-threaded DGEMM performance
1 year ago
Deeksha Goplani
0dc80a5c8d
locks improvement
1 year ago
Martin Kroeker
8da6f7e5f2
Merge pull request #4686 from XiWeiGu/loongarch64_dgemm_kernel_16x6
Loongarch64: Improving the Performance and Stability of dgemm
1 year ago
gxw
637c650f4f
loongarch64: Add buffer offset for target LOONGSON3R5
1 year ago
Martin Kroeker
5500b4ab26
Merge pull request #4680 from theAeon/develop
Expose whether locking is enabled in get_config
1 year ago
Martin Kroeker
f0f1ff7820
fix HUGETLB allocation for TLS mode as well
1 year ago
Andrew Robbins
edfe1aa471
Expose whether locking is enabled in get_config
1 year ago
Martin Kroeker
dc99b61380
sort unwanted interdependencies of alloc_shm and alloc_hugetlb
1 year ago
Martin Kroeker
ddcd7d6fa8
Merge branch 'develop' into Threading_Callback
1 year ago
yamazaki-mitsufumi
51ab1903e7
Expanding the scope of 2D thread distribution
1 year ago
gxw
d8c4ea8793
loongarch: Optimizing the performance of the GEMM on servers
1 year ago
shivammonaka
7102367fde
Introduced callback to Pthread, Win32 and OpenMP backend
1 year ago
Mark Seminatore
b0ad8a78ff
code to fix lost work in case of re-entrant calls to exec_blas_async()
1 year ago
Martin Kroeker
88b5330ae7
Restore outer loop of blas_buffer_inuse setup
1 year ago
shivammonaka
d49ebc54e1
Merge branch 'shivam-develop' into shivam-Locks
1 year ago
shivammonaka
bc191015e3
Using OpenMP locks with NUM_PARALLEL
1 year ago
Mark Seminatore
b29fd48998
Merge branch 'develop' into win_tidy
1 year ago
Mark Seminatore
98c56a7314
more cleanup
1 year ago
Chip Kerchner
d408ecedba
Add environment variable to display coretype for dynamic arch.
1 year ago
Chip Kerchner
ac6b4b7aa4
Make sure CPU ID works for all POWER_10 conditions
1 year ago
Chip Kerchner
08ce6b1c1c
Add missing CPU ID definitions for old versions of AIX.
1 year ago
Martin Kroeker
a4fde2c5ac
Merge pull request #4451 from martin-frbg/overflow_reset
Reset "buffer management structure overflowed" state and free auxiliary struct on blas_shutdown
1 year ago
Martin Kroeker
e61d96303d
Fix missing NO_AVX2 fallback for SapphireRapids
1 year ago
Mark Seminatore
42cb567f0f
more cleanup
1 year ago
Mark Seminatore
0d7fe5ea61
clean up whitespace
1 year ago
Martin Kroeker
d938aed7fe
reset "mem structure overflowed" state on shutdown
1 year ago
Chris Sidebottom
aaf65210cc
Add dynamic support for Arm(R) Neoverse(TM) V2 processor
Whilst I figure out how best to map the L2 parameters without
duplicating all of `ARMV8SVE`, let's just map this to `NEOVERSEV1`.
1 year ago
Martin Kroeker
152a6c43b6
Add blas_omp_threads_local
1 year ago
Martin Kroeker
8a9d492af7
Add default for blas_omp_threads_local
1 year ago
Martin Kroeker
87d31af2ae
Add openblas_set_num_threads_local()
1 year ago
Martin Kroeker
e7a895e714
Add Apple M as NeoverseN1
1 year ago
Chris Sidebottom
dc20a78188
Use functionally equivalent dynamic targets
Similar to `drivers/other/dynamic.c`, I've looked for functionally
equivalent targets and mapped them in the default DYNAMIC_ARCH build.
Users can still build specific cores using DYNAMIC_LIST.
1 year ago
Mark Seminatore
6bd7c54af5
introduce MT_TRACE to clean up SMP_DEBUG code
1 year ago
Mark Seminatore
edac80d7e8
some cleanup, dynamically scale threads, add missing WIN_CASE defn
1 year ago
Mark Seminatore
4ebf814b42
fix bug failing to mark task as finished.
1 year ago
Mark Seminatore
5f51811728
try at new threading model
1 year ago
Shiyou Yin
1310a0931b
loongarch: Refine build control for loongarch64.
1. Use getauxval instead of cpucfg to test hardware capability.
2. Remove unnecessary code and option for compiler check in c_check.
1 year ago
Chip-Kerchner
d99aad8ee3
Fix older version of gcc - missing __has_builtin, cpuid and no support of P10.
1 year ago
Martin Kroeker
9b5f8eb33a
Fix empty function prototypes
1 year ago
Martin Kroeker
9324520d0e
typo fix
1 year ago
Martin Kroeker
ff6437f2d7
Add workaround for omp_get_max_threads hanging on FreeBSD with libomp from LLVM14
1 year ago
Chip-Kerchner
4eecccd49b
Fix __builtin_cpu_is for AIX.
1 year ago
Chip-Kerchner
5e31c57083
Only define __builtin_cpu_is and __builtin_cpu_supports if not present.
1 year ago
Chip-Kerchner
7dcb2d67f2
Have POWER7 return arch=POWER6.
1 year ago
Chip-Kerchner
c8882bd9d8
Remove POWER7 from cpu list.
1 year ago
Chip Kerchner
badfb2e60f
Merge branch 'develop' into XLC-AIX
1 year ago
Martin Kroeker
e12aaed13d
Fix unwanted fallthrough from Intel Family 6 to 15 in case of identification failure
1 year ago