Vaisakh K V
d23eb3b93e
Support for SME1 based sgemm_direct kernel for cblas_sgemm level 3 API
* Added ARMV9SME target
* Added SGEMM_DIRECT kernel based on SME1
10 months ago
Caroline Newcombe
760bf7aa37
Update Fortran return for complex data types (Cray and Nvidia compilers)
10 months ago
TGY
815cb24944
remove unused INLINE macro definitions
2 years ago
NickelWenzel
bee123e8e3
fix: add missing NO_AFFINITY checks
11 months ago
Martin Kroeker
c57f9326d6
Add implementation of WhereAmI() to support NO_AFFINITY=0 on ARM64 (#4648)
* Add preliminary implementation of WhereAmI()
1 year ago
Martin Kroeker
728788f667
typo fix
1 year ago
Martin Kroeker
d003ad630b
Increase the default GEMM buffer size on modern ARM server cpus
1 year ago
TGY
b5ba95a6c0
Modernize obsolete inline order
2 years ago
Nursultan Zarlyk
1dfc4e6150
Replace with ARM64 intrinsics
3 years ago
Nursultan Zarlyk
1bb7993a97
Fix MSVC ARM64 build. Add generic kernel for ARM64
3 years ago
Niyas Sait
cdb5d2737e
add support for building on windows/arm64 target
4 years ago
Martin Kroeker
2d45a262d9
Support compilation with nvfortran
4 years ago
Martin Kroeker
7f26be4802
Reunify BUFFERSIZE across arm64 platforms to avoid segfaults in DYNAMIC_ARCH
4 years ago
Martin Kroeker
d237dc1360
Add read barrier definition
5 years ago
Martin Kroeker
a33d177430
Increase default BUFFER_SIZE on ARM, ZARCH and newer x86_64, add GEMM_R for POWER8/9
As shown in #2538, default buffer sizes on some platforms were smaller than required in memory.c
and the requirement could never be fulfilled for a calculated GEMM_R on PPC given the formula used
5 years ago
Martin Kroeker
e94590e400
Merge pull request #2468 from AGSaidi/wfe
Use wait-for-event to not spin in the blas_lock
5 years ago
Ali Saidi
0af9991cc9
Use wait-for-event to not spin in the blas_lock
5 years ago
Ali Saidi
19f3a4091c
Make rpcc() on arm64 get closer to what x86 returns
The Arm implementation of rpcc() uses the architected timer
which is defined by the SBSA to be between 10 and 400 MHz. These numbers
are much smaller than the cycle counter frequency used by x86. Make
the numbers closer by shifting the cycle counter up by the number of
leading zeros in the cntfrq_el0 register which gets us closer to a
normal cpu clock cycle range.
5 years ago
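The scaling described in this commit message can be sketched in portable C. This is a minimal illustration, not the code in common_arm64.h: the names clz32 and scale_timer_ticks are hypothetical, and the counter value and timer frequency are passed in as plain arguments rather than read from the timer registers, so the sketch runs on any platform. It assumes the leading-zero count is taken over the 32-bit frequency value.

```c
#include <stdint.h>

/* Count leading zero bits of a 32-bit value (hypothetical helper,
   standing in for the CLZ instruction). Returns 32 for an input of 0. */
unsigned clz32(uint32_t x)
{
    unsigned n = 0;
    if (x == 0)
        return 32;
    while ((x & 0x80000000u) == 0) {
        x <<= 1;
        n++;
    }
    return n;
}

/* Scale a raw timer tick count up by the number of leading zeros in the
   timer frequency (hypothetical function; both inputs are assumptions
   supplied by the caller instead of being read from hardware). */
uint64_t scale_timer_ticks(uint64_t ticks, uint32_t freq_hz)
{
    return ticks << clz32(freq_hz);
}
```

With a 100 MHz timer the frequency value has 5 leading zeros, so each tick is multiplied by 32 and the result advances at an effective 3.2 GHz. For 10 MHz the shift is 8 (2.56 GHz) and for 400 MHz it is 3 (3.2 GHz), so the whole 10 to 400 MHz range lands near a typical CPU clock rate.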
Martin Kroeker
48f5a89f92
Merge pull request #2282 from martin-frbg/issue2281
Optimize RPCC function on ARM64
6 years ago
Martin Kroeker
b687fba5bc
Disable direct clock register access on IOS and Android
as I find conflicting information on accessibility from non-privileged processes
6 years ago
Martin Kroeker
5f6206fa2d
Simplify OSX/IOS cross-compilation and add a CI test for it (#2279)
* Add automatic fixups for OSX/IOS cross-compilation
* Add OSX/IOS cross-compilation test to Travis CI
* Handle platforms that lack hwcap.h by falling back to ARMV8
* Fix PROLOGUE for OSX/IOS
6 years ago
Martin Kroeker
f2cde2ccfb
Update common_arm64.h
6 years ago
Martin Kroeker
bb5413863f
Rewrite ARM64 PROLOGUE to make it compatible with xcode/ios
6 years ago
Paul Osmialowski
42bbe74791
build: LLVM: Add Flang compiler support and enable OpenMP for Clang
Signed-off-by: Paul Osmialowski <pawel.osmialowski@arm.com>
8 years ago
Ashwin Sekhar T K
1d121852c1
Fix blas_lock for arm64
10 years ago
Ashwin Sekhar T K
39937d15cd
Change BUFFER_SIZE for Cortex A57 to 20 MB
Change the GEMM_P, GEMM_Q, GEMM_R values for Cortex A57
10 years ago
Zhang Xianyi
233ec2a1cc
Use 40 MB buffer for ARM Cortex A57.
10 years ago
Ashwin Sekhar T K
f2f8a0fe8b
Adding arm64 target CORTEXA57
Co-Authored-By: Ralph Campbell <ralph.campbell@broadcom.com>
10 years ago
Grazvydas Ignotas
abade3f896
really fix ARM64 locking
10 years ago
Grazvydas Ignotas
6b92204a7c
add fallback blas_lock implementation
to be used on armv5 and new platforms
10 years ago
Grazvydas Ignotas
e12cf1123e
add fallback rpcc implementation
- use on arm, arm64 and any new platform
- use faster integer math instead of double
- use similar scale as rdtsc so that timeouts work
10 years ago
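The points in this commit message suggest a timer built on integer arithmetic with a scale comparable to rdtsc. A minimal sketch of that idea follows, assuming a POSIX system that provides clock_gettime and CLOCK_MONOTONIC. The function name fallback_rpcc is hypothetical, and this is not presented as the code from the commit. Counting nanoseconds gives a nominal 1 GHz tick rate, the same order of magnitude as a typical x86 time-stamp counter.

```c
#define _POSIX_C_SOURCE 199309L
#include <stdint.h>
#include <time.h>

/* Hypothetical fallback cycle counter: returns a monotonic timestamp in
   nanoseconds, computed with integer math only (no floating point). */
uint64_t fallback_rpcc(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}
```

Because the value is a plain 64-bit integer that never decreases, code that compares two readings to implement a timeout can treat it like a cycle counter.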
Zhang Xianyi
3f1b57668e
Fix blas lock bug on AArch64.
10 years ago
Werner Saar
19b8fd2aed
smp lock bugfix
10 years ago
Zhang Xianyi
2fb02626da
Update organization info.
11 years ago
Benedikt Huber
58c90d5937
Optimizations for APM's xgene-1 (aarch64).
1) general system updates to support armv8 better. A plain "make all" did not work; one needed to supply TARGET=ARMV8.
2) sgemm 4x4 kernel in assembler using SIMD, and configuration changes to use it.
3) strmm 4x4 kernel in C. Since the sgemm kernel does 4x4, the trmm kernel must also do 4xN.
Added Dave Nuechterlein to the contributors list.
11 years ago
Timothy Gu
6c2ead30f0
Remove all trailing whitespace except lapack-netlib
Signed-off-by: Timothy Gu <timothygu99@gmail.com>
11 years ago
wernsaar
fe5f46c330
added experimental support for ARMV8
12 years ago