 Simplifying ARMv8 build parameters
ARMv8 builds were a bit mixed up, with ThunderX2 code in ARMv8 mode
(which is not right because TX2 is ARMv8.1) as well as requiring a few
redundancies in the defines, making it harder to maintain and understand
what core has what. A few other minor issues were also fixed.
Tests were made on the following cores: A53, A57, A72, Falkor, ThunderX,
ThunderX2, and XGene.
Tests were: OpenBLAS/test, OpenBLAS/benchmark, BLAS-Tester.
A summary:
* Removed TX2 code from ARMv8 build, to make sure it is compatible with
all ARMv8 cores, not just v8.1. Also, the TX2 code has actually
harmed performance on big cores.
* Commoned up ARMv8 architectures' defines in params.h, to make sure
that all will benefit from ARMv8 settings, in addition to their own.
* Adding a few more cores, using ARMv8's include strategy, to benefit
from compiler optimisations using mtune. Also updated cache
information from the manuals, making sure we set good conservative
values by default. Removed Vulcan, as it's an alias to TX2.
* Auto-detecting most of those cores, but also updating the forced
compilation in getarch.c, to make sure the parameters are the same
whether compiled natively or with a forced arch.
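As a rough illustration of the per-core tuning described in the summary, the fragment below sketches how a makefile could keep a plain ARMv8.0 baseline while adding -mtune for specific cores. It is a minimal sketch, not the actual patch: the CORE variable and the exact set of branches are assumptions made for illustration, while CCOMMON_OPT and FCOMMON_OPT are the flag variables used by the makefile itself.

```make
# Hypothetical sketch: CORE is assumed to hold the detected or forced core name.

# Generic build: plain ARMv8.0 only, so the result runs on every ARMv8 core.
ifeq ($(CORE), ARMV8)
CCOMMON_OPT += -march=armv8-a
FCOMMON_OPT += -march=armv8-a
endif

# A57 and A72 can share the same includes and differ only in compiler tuning.
ifeq ($(CORE), CORTEXA57)
CCOMMON_OPT += -march=armv8-a -mtune=cortex-a57
FCOMMON_OPT += -march=armv8-a -mtune=cortex-a57
endif

ifeq ($(CORE), CORTEXA72)
CCOMMON_OPT += -march=armv8-a -mtune=cortex-a72
FCOMMON_OPT += -march=armv8-a -mtune=cortex-a72
endif
```

Keeping -march at armv8-a in every branch is what preserves ARMv8.0 compatibility; only -mtune varies, and it changes instruction scheduling rather than the instruction set.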
Benefits:
* ARMv8 build is now guaranteed to work on all ARMv8 cores
* Improved performance for ARMv8 builds on some cores (A72, Falkor,
ThunderX1 and 2: up to 11%) over current develop
* Improved performance for *all* cores compared to the develop branch
before TX2's patch (9% ~ 36%)
* ThunderX1 builds are 14% faster than ARMv8 on TX1, 9% faster than
current develop's branch and 8% faster than develop before TX2 patches
Issues:
* Regression from current develop branch for A53 (-12%) and A57 (-3%)
with ARMv8 builds, but still faster than before TX2's commit (+15%
and +24% respectively). This can be improved with a simplification of
TX2's code, to be done in future patches. At least the code is
guaranteed to be ARMv8.0 now.
Comments:
* CortexA57 builds are unchanged on A57 hardware from develop's branch,
which makes sense, as it's untouched.
* CortexA72 builds improve over A57 on A72 hardware, even though they're
using the same includes, due to new compiler tuning in the makefile.
6 years ago
- ###############################################################################
- # Copyright (c) 2025, The OpenBLAS Project
- # All rights reserved.
- # Redistribution and use in source and binary forms, with or without
- # modification, are permitted provided that the following conditions are
- # met:
- # 1. Redistributions of source code must retain the above copyright
- # notice, this list of conditions and the following disclaimer.
- # 2. Redistributions in binary form must reproduce the above copyright
- # notice, this list of conditions and the following disclaimer in
- # the documentation and/or other materials provided with the
- # distribution.
- # 3. Neither the name of the OpenBLAS project nor the names of
- # its contributors may be used to endorse or promote products
- # derived from this software without specific prior written permission.
- # THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
- # AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
- # IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
- # ARE DISCLAIMED. IN NO EVENT SHALL THE OPENBLAS PROJECT OR CONTRIBUTORS BE
- # LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
- # CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
- # SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
- # INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
- # CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
- # ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
- # POSSIBILITY OF SUCH DAMAGE.
- ###############################################################################
-
- ifneq ($(C_COMPILER), PGI)
-
- ifeq ($(C_COMPILER), CLANG)
- ISCLANG=1
- endif
- ifeq ($(C_COMPILER), FUJITSU)
- ISCLANG=1
- endif
- ifneq (1, $(filter 1,$(GCCVERSIONGT4) $(ISCLANG)))
- CCOMMON_OPT += -march=armv8-a
- ifneq ($(F_COMPILER), NAG)
- FCOMMON_OPT += -march=armv8-a
- endif
-
-
- else
-
-
- ifeq ($(CORE), ARMV8)
- CCOMMON_OPT += -march=armv8-a
- ifneq ($(F_COMPILER), NAG)
- FCOMMON_OPT += -march=armv8-a
- endif
- endif
-
- ifeq ($(CORE), ARMV8SVE)
- CCOMMON_OPT += -march=armv8-a+sve
- ifneq ($(F_COMPILER), NAG)
- FCOMMON_OPT += -march=armv8-a+sve
- endif
- endif
-
- ifeq ($(CORE), ARMV9SME)
- CCOMMON_OPT += -march=armv9-a+sve2+sme
- FCOMMON_OPT += -march=armv9-a+sve2
- endif
-
- ifeq ($(CORE), CORTEXA53)
- CCOMMON_OPT += -march=armv8-a -mtune=cortex-a53
- ifneq ($(F_COMPILER), NAG)
- FCOMMON_OPT += -march=armv8-a -mtune=cortex-a53
- endif
- endif
-
- ifeq ($(CORE), CORTEXA57)
- CCOMMON_OPT += -march=armv8-a -mtune=cortex-a57
- ifneq ($(F_COMPILER), NAG)
- FCOMMON_OPT += -march=armv8-a -mtune=cortex-a57
- endif
- endif
-
- ifeq ($(CORE), CORTEXA72)
- CCOMMON_OPT += -march=armv8-a -mtune=cortex-a72
- ifneq ($(F_COMPILER), NAG)
- FCOMMON_OPT += -march=armv8-a -mtune=cortex-a72
- endif
- endif
-
- ifeq ($(CORE), CORTEXA73)
- CCOMMON_OPT += -march=armv8-a -mtune=cortex-a73
- ifneq ($(F_COMPILER), NAG)
- FCOMMON_OPT += -march=armv8-a -mtune=cortex-a73
- endif
- endif
-
- ifeq ($(CORE), CORTEXA76)
- CCOMMON_OPT += -march=armv8.2-a -mtune=cortex-a76
- ifneq ($(F_COMPILER), NAG)
- FCOMMON_OPT += -march=armv8.2-a -mtune=cortex-a76
- endif
- endif
-
- ifeq ($(CORE), FT2000)
- CCOMMON_OPT += -march=armv8-a -mtune=cortex-a72
- ifneq ($(F_COMPILER), NAG)
- FCOMMON_OPT += -march=armv8-a -mtune=cortex-a72
- endif
- endif
-
- # Use a72 tunings because Neoverse-N1 is only available
- # in GCC>=9
- ifeq ($(CORE), NEOVERSEN1)
- ifeq (1, $(filter 1,$(GCCVERSIONGTEQ7) $(ISCLANG)))
- ifeq (1, $(filter 1,$(GCCVERSIONGTEQ9) $(ISCLANG)))
- CCOMMON_OPT += -march=armv8.2-a -mtune=neoverse-n1
- ifneq ($(F_COMPILER), NAG)
- FCOMMON_OPT += -march=armv8.2-a -mtune=neoverse-n1
- endif
- else
- CCOMMON_OPT += -march=armv8.2-a -mtune=cortex-a72
- ifneq ($(F_COMPILER), NAG)
- FCOMMON_OPT += -march=armv8.2-a -mtune=cortex-a72
- endif
- endif
- else
- CCOMMON_OPT += -march=armv8-a -mtune=cortex-a72
- ifneq ($(F_COMPILER), NAG)
- FCOMMON_OPT += -march=armv8-a -mtune=cortex-a72
- endif
- endif
- endif
-
- # Use a72 tunings because Neoverse-V1 is only available
- # in GCC>=10.4
- ifeq ($(CORE), NEOVERSEV1)
- ifeq (1, $(filter 1,$(GCCVERSIONGTEQ7) $(ISCLANG)))
- ifeq (1, $(filter 1,$(GCCVERSIONGTEQ10) $(ISCLANG)))
- ifeq (1, $(filter 1,$(GCCMINORVERSIONGTEQ4) $(GCCVERSIONGTEQ11) $(ISCLANG)))
- CCOMMON_OPT += -march=armv8.4-a+sve+bf16
- ifeq (1, $(ISCLANG))
- CCOMMON_OPT += -mtune=cortex-x1
- else
- CCOMMON_OPT += -mtune=neoverse-v1
- endif
- ifneq ($(F_COMPILER), NAG)
- FCOMMON_OPT += -march=armv8.4-a -mtune=neoverse-v1
- endif
- else
- CCOMMON_OPT += -march=armv8.4-a+sve+bf16
- ifneq ($(CROSS), 1)
- CCOMMON_OPT += -mtune=native
- endif
- ifneq ($(F_COMPILER), NAG)
- FCOMMON_OPT += -march=armv8.4-a
- ifneq ($(CROSS), 1)
- FCOMMON_OPT += -mtune=native
- endif
- endif
- endif
- else
- CCOMMON_OPT += -march=armv8.2-a+sve -mtune=cortex-a72
- ifneq ($(F_COMPILER), NAG)
- FCOMMON_OPT += -march=armv8.2-a -mtune=cortex-a72
- endif
- endif
- else
- CCOMMON_OPT += -march=armv8-a+sve -mtune=cortex-a72
- ifneq ($(F_COMPILER), NAG)
- FCOMMON_OPT += -march=armv8-a -mtune=cortex-a72
- endif
- endif
- endif
-
- # Use a72 tunings because Neoverse-N2 is only available
- # in GCC>=10.4
- ifeq ($(CORE), NEOVERSEN2)
- ifeq (1, $(filter 1,$(GCCVERSIONGTEQ7) $(ISCLANG)))
- ifeq (1, $(filter 1,$(GCCVERSIONGTEQ10) $(ISCLANG)))
- ifeq (1, $(filter 1,$(GCCMINORVERSIONGTEQ4) $(GCCVERSIONGTEQ11) $(ISCLANG)))
- ifneq ($(OSNAME), Darwin)
- CCOMMON_OPT += -march=armv8.5-a+sve+sve2+bf16 -mtune=neoverse-n2
- else
- CCOMMON_OPT += -march=armv8.2-a+sve+bf16 -mtune=cortex-a72
- endif
- ifneq ($(F_COMPILER), NAG)
- FCOMMON_OPT += -march=armv8.5-a+sve+sve2+bf16 -mtune=neoverse-n2
- endif
- else
- CCOMMON_OPT += -march=armv8.5-a+sve+bf16
- ifneq ($(CROSS), 1)
- CCOMMON_OPT += -mtune=native
- endif
- ifneq ($(F_COMPILER), NAG)
- FCOMMON_OPT += -march=armv8.5-a
- ifneq ($(CROSS), 1)
- FCOMMON_OPT += -mtune=native
- endif
- endif
- endif
- else
- CCOMMON_OPT += -march=armv8.2-a+sve+bf16 -mtune=cortex-a72
- ifneq ($(F_COMPILER), NAG)
- FCOMMON_OPT += -march=armv8.2-a -mtune=cortex-a72
- endif
- endif
- else
- CCOMMON_OPT += -march=armv8-a+sve+bf16 -mtune=cortex-a72
- ifneq ($(F_COMPILER), NAG)
- FCOMMON_OPT += -march=armv8-a -mtune=cortex-a72
- endif
- endif
- endif
-
- # Detect ARM Neoverse V2.
- ifeq ($(CORE), NEOVERSEV2)
- ifeq (1, $(filter 1,$(GCCVERSIONGTEQ13) $(ISCLANG)))
- CCOMMON_OPT += -mcpu=neoverse-v2
- ifneq ($(F_COMPILER), NAG)
- FCOMMON_OPT += -mcpu=neoverse-v2
- endif
- else
- CCOMMON_OPT += -march=armv8.2-a+sve+bf16 -mtune=neoverse-n1
- ifneq ($(F_COMPILER), NAG)
- FCOMMON_OPT += -march=armv8.2-a -mtune=neoverse-n1
- endif
- endif
- endif
-
- # Detect Ampere AmpereOne(ampere1,ampere1a) processors.
- ifeq ($(CORE), AMPERE1)
- ifeq (1, $(filter 1,$(GCCVERSIONGTEQ12) $(ISCLANG)))
- CCOMMON_OPT += -march=armv8.6-a+crypto+crc+fp16+sha3+rng
- ifneq ($(F_COMPILER), NAG)
- FCOMMON_OPT += -march=armv8.6-a+crypto+crc+fp16+sha3+rng
- endif
- endif
- endif
-
- # Use a53 tunings because a55 is only available in GCC>=8.1
- ifeq ($(CORE), CORTEXA55)
- ifeq (1, $(filter 1,$(GCCVERSIONGTEQ7) $(ISCLANG)))
- ifeq (1, $(filter 1,$(GCCVERSIONGTEQ8) $(ISCLANG)))
- CCOMMON_OPT += -march=armv8.2-a -mtune=cortex-a55
- ifneq ($(F_COMPILER), NAG)
- FCOMMON_OPT += -march=armv8.2-a -mtune=cortex-a55
- endif
- else
- CCOMMON_OPT += -march=armv8.2-a -mtune=cortex-a53
- ifneq ($(F_COMPILER), NAG)
- FCOMMON_OPT += -march=armv8.2-a -mtune=cortex-a53
- endif
- endif
- else
- CCOMMON_OPT += -march=armv8-a -mtune=cortex-a53
- ifneq ($(F_COMPILER), NAG)
- FCOMMON_OPT += -march=armv8-a -mtune=cortex-a53
- endif
- endif
- endif
-
- ifeq ($(CORE), THUNDERX)
- CCOMMON_OPT += -march=armv8-a -mtune=thunderx
- ifneq ($(F_COMPILER), NAG)
- FCOMMON_OPT += -march=armv8-a -mtune=thunderx
- endif
- endif
-
- ifeq ($(CORE), FALKOR)
- CCOMMON_OPT += -march=armv8-a -mtune=falkor
- ifneq ($(F_COMPILER), NAG)
- FCOMMON_OPT += -march=armv8-a -mtune=falkor
- endif
- endif
-
- ifeq ($(CORE), THUNDERX2T99)
- CCOMMON_OPT += -march=armv8.1-a -mtune=thunderx2t99
- ifneq ($(F_COMPILER), NAG)
- FCOMMON_OPT += -march=armv8.1-a -mtune=thunderx2t99
- endif
- endif
-
- ifeq ($(CORE), THUNDERX3T110)
- ifeq (1, $(filter 1,$(GCCVERSIONGTEQ10) $(ISCLANG)))
- CCOMMON_OPT += -march=armv8.3-a
- ifneq (1, $(ISCLANG))
- CCOMMON_OPT += -mtune=thunderx3t110
- else
- CCOMMON_OPT += -mtune=thunderx2t99
- endif
- ifneq ($(F_COMPILER), NAG)
- FCOMMON_OPT += -march=armv8.3-a -mtune=thunderx3t110
- endif
- else
- CCOMMON_OPT += -march=armv8.1-a -mtune=thunderx2t99
- ifneq ($(F_COMPILER), NAG)
- FCOMMON_OPT += -march=armv8.1-a -mtune=thunderx2t99
- endif
- endif
- endif
-
- ifeq ($(CORE), VORTEX)
- CCOMMON_OPT += -march=armv8.3-a
- ifneq ($(F_COMPILER), NAG)
- FCOMMON_OPT += -march=armv8.3-a
- endif
- endif
-
- ifeq (1, $(filter 1,$(GCCVERSIONGTEQ9) $(ISCLANG)))
- ifeq ($(CORE), TSV110)
- CCOMMON_OPT += -march=armv8.2-a -mtune=tsv110
- ifneq ($(F_COMPILER), NAG)
- FCOMMON_OPT += -march=armv8.2-a -mtune=tsv110
- endif
- endif
- endif
-
- ifeq (1, $(filter 1,$(GCCVERSIONGTEQ9) $(ISCLANG)))
- ifeq ($(CORE), EMAG8180)
- CCOMMON_OPT += -march=armv8-a
- ifneq ($(ISCLANG), 1)
- CCOMMON_OPT += -mtune=emag
- endif
- ifneq ($(F_COMPILER), NAG)
- FCOMMON_OPT += -march=armv8-a -mtune=emag
- endif
- endif
- endif
-
- ifeq ($(CORE), A64FX)
- ifeq (1, $(filter 1,$(GCCVERSIONGTEQ10) $(ISCLANG)))
- ifeq (1, $(filter 1,$(GCCMINORVERSIONGTEQ3) $(GCCVERSIONGTEQ11) $(ISCLANG)))
- CCOMMON_OPT += -march=armv8.2-a+sve -mtune=a64fx
- ifneq ($(F_COMPILER), NAG)
- FCOMMON_OPT += -march=armv8.2-a+sve -mtune=a64fx
- endif
- else
- CCOMMON_OPT += -march=armv8.4-a+sve -mtune=neoverse-n1
- ifneq ($(F_COMPILER), NAG)
- FCOMMON_OPT += -march=armv8.4-a -mtune=neoverse-n1
- endif
- endif
- endif
- endif
-
- ifeq (1, $(filter 1,$(GCCVERSIONGTEQ11) $(ISCLANG)))
- ifeq ($(CORE), CORTEXX1)
- CCOMMON_OPT += -march=armv8.2-a
- ifeq (1, $(filter 1,$(GCCMINORVERSIONGTEQ4) $(GCCVERSIONGTEQ12) $(ISCLANG)))
- CCOMMON_OPT += -mtune=cortex-x1
- ifneq ($(F_COMPILER), NAG)
- FCOMMON_OPT += -march=armv8.2-a -mtune=cortex-x1
- endif
- else
- CCOMMON_OPT += -mtune=cortex-a72
- ifneq ($(F_COMPILER), NAG)
- FCOMMON_OPT += -march=armv8.2-a -mtune=cortex-a72
- endif
- endif
- endif
- endif
-
- ifeq (1, $(filter 1,$(GCCVERSIONGTEQ11) $(ISCLANG)))
- ifeq ($(CORE), CORTEXX2)
- CCOMMON_OPT += -march=armv8.4-a+sve
- ifneq ($(F_COMPILER), NAG)
- FCOMMON_OPT += -march=armv8.4-a+sve
- endif
- ifeq (1, $(filter 1,$(GCCVERSIONGTEQ12) $(ISCLANG)))
- CCOMMON_OPT += -mtune=cortex-x2
- ifneq ($(F_COMPILER), NAG)
- FCOMMON_OPT += -mtune=cortex-x2
- endif
- endif
- endif
- endif
-
- #ifeq (1, $(filter 1,$(ISCLANG)))
- ifeq (1, $(filter 1,$(GCCVERSIONGTEQ11) $(ISCLANG)))
- ifeq ($(CORE), CORTEXA510)
- CCOMMON_OPT += -march=armv8.4-a+sve
- ifneq ($(F_COMPILER), NAG)
- FCOMMON_OPT += -march=armv8.4-a+sve
- endif
- endif
- endif
-
- ifeq (1, $(filter 1,$(GCCVERSIONGTEQ11) $(ISCLANG)))
- ifeq ($(CORE), CORTEXA710)
- CCOMMON_OPT += -march=armv8.4-a+sve
- ifneq ($(F_COMPILER), NAG)
- FCOMMON_OPT += -march=armv8.4-a+sve
- endif
- ifeq (1, $(filter 1,$(GCCVERSIONGTEQ12) $(ISCLANG)))
- CCOMMON_OPT += -mtune=cortex-a710
- ifneq ($(F_COMPILER), NAG)
- FCOMMON_OPT += -mtune=cortex-a710
- endif
- endif
- endif
- endif
-
- endif
-
- else
- # NVIDIA HPC options necessary to enable SVE in the compiler
- ifeq ($(CORE), THUNDERX2T99)
- CCOMMON_OPT += -tp=thunderx2t99
- FCOMMON_OPT += -tp=thunderx2t99
- endif
- ifeq ($(CORE), NEOVERSEN1)
- CCOMMON_OPT += -tp=neoverse-n1
- FCOMMON_OPT += -tp=neoverse-n1
- endif
- ifeq ($(CORE), NEOVERSEV1)
- CCOMMON_OPT += -tp=neoverse-v1
- FCOMMON_OPT += -tp=neoverse-v1
- endif
- ifeq ($(CORE), NEOVERSEV2)
- CCOMMON_OPT += -tp=neoverse-v2
- FCOMMON_OPT += -tp=neoverse-v2
- endif
- ifeq ($(CORE), ARMV8SVE)
- CCOMMON_OPT += -tp=neoverse-v2
- FCOMMON_OPT += -tp=neoverse-v2
- endif
- ifeq ($(CORE), ARMV9SVE)
- CCOMMON_OPT += -tp=neoverse-v2
- FCOMMON_OPT += -tp=neoverse-v2
- endif
-
- endif
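
The version gates in the listing above share one GNU make idiom: `$(filter 1,...)` keeps only the words equal to `1`, and the surrounding `ifeq (1, ...)` is true when the result is exactly `1`. The sketch below isolates that idiom so it can be tried on its own. It is an illustration under assumptions, not part of the build: the default values given to `GCCVERSIONGTEQ9` and `ISCLANG` are stand-ins (the real build system sets them elsewhere), and `TUNE` and `TUNE_ANY` are hypothetical variable names.

```makefile
# Hypothetical stand-ins; the real build system sets these flags elsewhere.
GCCVERSIONGTEQ9 ?= 0
ISCLANG ?=

# Same test as in the listing: true when the filter result is exactly "1".
ifeq (1, $(filter 1,$(GCCVERSIONGTEQ9) $(ISCLANG)))
TUNE := neoverse-n1
else
TUNE := cortex-a72
endif

# Variant that is true whenever the filter result is non-empty,
# so it also holds when several flags are 1 at once ("1 1").
ifneq (,$(filter 1,$(GCCVERSIONGTEQ9) $(ISCLANG)))
TUNE_ANY := neoverse-n1
else
TUNE_ANY := cortex-a72
endif

# Print both selections while the makefile is being read.
$(info TUNE=$(TUNE) TUNE_ANY=$(TUNE_ANY))

# Empty default target so that a plain "make -f <file>" has something to build.
all: ; @:
```

Saved under any file name and run with `make -f <file> GCCVERSIONGTEQ9=1`, both variables should select the newer tuning. Passing `ISCLANG=1` as well makes the filter return `1 1`, which no longer compares equal to `1`, so the two tests diverge. Whether that combination can occur in the real build depends on how the version flags are computed for each compiler, which this listing does not show.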