Makefile.arm64 11 kB

Simplifying ARMv8 build parameters

ARMv8 builds were a bit mixed up, with ThunderX2 code in ARMv8 mode (which is not right because TX2 is ARMv8.1) as well as requiring a few redundancies in the defines, making it harder to maintain and understand what core has what. A few other minor issues were also fixed.

Tests were made on the following cores: A53, A57, A72, Falkor, ThunderX, ThunderX2, and XGene.
Tests were: OpenBLAS/test, OpenBLAS/benchmark, BLAS-Tester.

A summary:
* Removed TX2 code from the ARMv8 build, to make sure it is compatible with all ARMv8 cores, not just v8.1. Also, the TX2 code has actually harmed performance on big cores.
* Commoned up ARMv8 architectures' defines in params.h, to make sure that all cores benefit from the ARMv8 settings, in addition to their own.
* Added a few more cores, using ARMv8's include strategy, to benefit from compiler optimisations using mtune. Also updated cache information from the manuals, making sure we set good conservative values by default. Removed Vulcan, as it's an alias to TX2.
* Auto-detecting most of those cores, but also updating the forced compilation in getarch.c, to make sure the parameters are the same whether compiled natively or with a forced arch.

Benefits:
* ARMv8 build is now guaranteed to work on all ARMv8 cores.
* Improved performance for ARMv8 builds on some cores (A72, Falkor, ThunderX1 and 2: up to 11%) over current develop.
* Improved performance for *all* cores compared to the develop branch before TX2's patch (9% ~ 36%).
* ThunderX1 builds are 14% faster than ARMv8 on TX1, 9% faster than current develop's branch and 8% faster than develop before the TX2 patches.

Issues:
* Regression from current develop branch for A53 (-12%) and A57 (-3%) with ARMv8 builds, but still faster than before TX2's commit (+15% and +24% respectively). This can be improved with a simplification of TX2's code, to be done in future patches. At least the code is guaranteed to be ARMv8.0 now.

Comments:
* CortexA57 builds are unchanged on A57 hardware from develop's branch, which makes sense, as it's untouched.
* CortexA72 builds improve over A57 on A72 hardware, even though they're using the same includes, due to new compiler tuning in the makefile.
6 years ago
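The per-core tuning mentioned in the commit message (a common ARMv8 baseline plus `-mtune` for specific cores) could be sketched roughly as follows. This is a hypothetical illustration, not the actual contents of Makefile.arm64: the `CORE` values, the `CCOMMON_OPT` variable and the exact flag choices are assumptions made for the example.

```make
# Hypothetical sketch only; names and flags below are assumptions.
# CORE is assumed to be set earlier, either by auto-detection or by a forced target.

# Generic build: baseline ARMv8.0 only, so it runs on every ARMv8 core.
ifeq ($(CORE), ARMV8)
CCOMMON_OPT += -march=armv8-a
endif

# Same baseline architecture, but scheduled for one specific core.
ifeq ($(CORE), CORTEXA72)
CCOMMON_OPT += -march=armv8-a -mtune=cortex-a72
endif

# ARMv8.1 core: kept out of the generic ARMv8 path.
ifeq ($(CORE), THUNDERX2T99)
CCOMMON_OPT += -march=armv8.1-a -mtune=thunderx2t99
endif
```

Keeping `-march` at the baseline and varying only `-mtune` is one way to let two cores share the same includes while still getting core-specific scheduling, which fits the A57/A72 comment above.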
###############################################################################
# Copyright (c) 2025, The OpenBLAS Project
# All rights reserved.
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions are
# met:
# 1. Redistributions of source code must retain the above copyright
# notice, this list of conditions and the following disclaimer.
# 2. Redistributions in binary form must reproduce the above copyright
# notice, this list of conditions and the following disclaimer in
# the documentation and/or other materials provided with the
# distribution.
# 3. Neither the name of the OpenBLAS project nor the names of
# its contributors may be used to endorse or promote products
# derived from this software without specific prior written permission.
# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
# AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
# IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
# ARE DISCLAIMED. IN NO EVENT SHALL THE OPENBLAS PROJECT OR CONTRIBUTORS BE
# LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
# CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
# SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
# INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
# CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
# ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
# POSSIBILITY OF SUCH DAMAGE.
###############################################################################
ifneq ($(C_COMPILER), PGI)
ifeq ($(C_COMPILER), CLANG)
ISCLANG=1
endif
ifeq ($(C_COMPILER), FUJITSU)
ISCLANG=1
endif
ifneq (1, $(filter 1,$(GCCVERSIONGT4) $(ISCLANG)))
CCOMMON_OPT += -march=armv8-a
ifneq ($(F_COMPILER), NAG)
FCOMMON_OPT += -march=armv8-a
endif
else
ifeq ($(CORE), ARMV8)
CCOMMON_OPT += -march=armv8-a
ifneq ($(F_COMPILER), NAG)
FCOMMON_OPT += -march=armv8-a
endif
endif
ifeq ($(CORE), ARMV8SVE)
CCOMMON_OPT += -march=armv8-a+sve
ifneq ($(F_COMPILER), NAG)
FCOMMON_OPT += -march=armv8-a+sve
endif
endif
ifeq ($(CORE), ARMV9SME)
CCOMMON_OPT += -march=armv9-a+sve2+sme
FCOMMON_OPT += -march=armv9-a+sve2
endif
ifeq ($(CORE), CORTEXA53)
CCOMMON_OPT += -march=armv8-a -mtune=cortex-a53
ifneq ($(F_COMPILER), NAG)
FCOMMON_OPT += -march=armv8-a -mtune=cortex-a53
endif
endif
ifeq ($(CORE), CORTEXA57)
CCOMMON_OPT += -march=armv8-a -mtune=cortex-a57
ifneq ($(F_COMPILER), NAG)
FCOMMON_OPT += -march=armv8-a -mtune=cortex-a57
endif
endif
ifeq ($(CORE), CORTEXA72)
CCOMMON_OPT += -march=armv8-a -mtune=cortex-a72
ifneq ($(F_COMPILER), NAG)
FCOMMON_OPT += -march=armv8-a -mtune=cortex-a72
endif
endif
ifeq ($(CORE), CORTEXA73)
CCOMMON_OPT += -march=armv8-a -mtune=cortex-a73
ifneq ($(F_COMPILER), NAG)
FCOMMON_OPT += -march=armv8-a -mtune=cortex-a73
endif
endif
ifeq ($(CORE), CORTEXA76)
CCOMMON_OPT += -march=armv8.2-a -mtune=cortex-a76
ifneq ($(F_COMPILER), NAG)
FCOMMON_OPT += -march=armv8.2-a -mtune=cortex-a76
endif
endif
ifeq ($(CORE), FT2000)
CCOMMON_OPT += -march=armv8-a -mtune=cortex-a72
ifneq ($(F_COMPILER), NAG)
FCOMMON_OPT += -march=armv8-a -mtune=cortex-a72
endif
endif
# Use a72 tunings because Neoverse-N1 is only available
# in GCC>=9
ifeq ($(CORE), NEOVERSEN1)
ifeq (1, $(filter 1,$(GCCVERSIONGTEQ7) $(ISCLANG)))
ifeq (1, $(filter 1,$(GCCVERSIONGTEQ9) $(ISCLANG)))
CCOMMON_OPT += -march=armv8.2-a -mtune=neoverse-n1
ifneq ($(F_COMPILER), NAG)
FCOMMON_OPT += -march=armv8.2-a -mtune=neoverse-n1
endif
else
CCOMMON_OPT += -march=armv8.2-a -mtune=cortex-a72
ifneq ($(F_COMPILER), NAG)
FCOMMON_OPT += -march=armv8.2-a -mtune=cortex-a72
endif
endif
else
CCOMMON_OPT += -march=armv8-a -mtune=cortex-a72
ifneq ($(F_COMPILER), NAG)
FCOMMON_OPT += -march=armv8-a -mtune=cortex-a72
endif
endif
endif
# Use a72 tunings because Neoverse-V1 is only available
# in GCC>=10.4
ifeq ($(CORE), NEOVERSEV1)
ifeq (1, $(filter 1,$(GCCVERSIONGTEQ7) $(ISCLANG)))
ifeq (1, $(filter 1,$(GCCVERSIONGTEQ10) $(ISCLANG)))
# Test for a non-empty result: $(filter) yields "1 1" when both the minor
# and the major version checks pass, which never compares equal to "1".
ifneq (,$(filter 1,$(GCCMINORVERSIONGTEQ4) $(GCCVERSIONGTEQ11) $(ISCLANG)))
CCOMMON_OPT += -march=armv8.4-a+sve+bf16
ifeq (1, $(ISCLANG))
CCOMMON_OPT += -mtune=cortex-x1
else
CCOMMON_OPT += -mtune=neoverse-v1
endif
ifneq ($(F_COMPILER), NAG)
FCOMMON_OPT += -march=armv8.4-a -mtune=neoverse-v1
endif
else
CCOMMON_OPT += -march=armv8.4-a+sve+bf16
ifneq ($(CROSS), 1)
CCOMMON_OPT += -mtune=native
endif
ifneq ($(F_COMPILER), NAG)
FCOMMON_OPT += -march=armv8.4-a
ifneq ($(CROSS), 1)
FCOMMON_OPT += -mtune=native
endif
endif
endif
else
CCOMMON_OPT += -march=armv8.2-a+sve -mtune=cortex-a72
ifneq ($(F_COMPILER), NAG)
FCOMMON_OPT += -march=armv8.2-a -mtune=cortex-a72
endif
endif
else
CCOMMON_OPT += -march=armv8-a+sve -mtune=cortex-a72
ifneq ($(F_COMPILER), NAG)
FCOMMON_OPT += -march=armv8-a -mtune=cortex-a72
endif
endif
endif
# Use a72 tunings because Neoverse-N2 is only available
# in GCC>=10.4
ifeq ($(CORE), NEOVERSEN2)
ifeq (1, $(filter 1,$(GCCVERSIONGTEQ7) $(ISCLANG)))
ifeq (1, $(filter 1,$(GCCVERSIONGTEQ10) $(ISCLANG)))
# Non-empty test for the same reason as in the Neoverse-V1 block above.
ifneq (,$(filter 1,$(GCCMINORVERSIONGTEQ4) $(GCCVERSIONGTEQ11) $(ISCLANG)))
ifneq ($(OSNAME), Darwin)
CCOMMON_OPT += -march=armv8.5-a+sve+sve2+bf16 -mtune=neoverse-n2
else
CCOMMON_OPT += -march=armv8.2-a+sve+bf16 -mtune=cortex-a72
endif
ifneq ($(F_COMPILER), NAG)
FCOMMON_OPT += -march=armv8.5-a+sve+sve2+bf16 -mtune=neoverse-n2
endif
else
CCOMMON_OPT += -march=armv8.5-a+sve+bf16
ifneq ($(CROSS), 1)
CCOMMON_OPT += -mtune=native
endif
ifneq ($(F_COMPILER), NAG)
FCOMMON_OPT += -march=armv8.5-a
ifneq ($(CROSS), 1)
FCOMMON_OPT += -mtune=native
endif
endif
endif
else
CCOMMON_OPT += -march=armv8.2-a+sve+bf16 -mtune=cortex-a72
ifneq ($(F_COMPILER), NAG)
FCOMMON_OPT += -march=armv8.2-a -mtune=cortex-a72
endif
endif
else
CCOMMON_OPT += -march=armv8-a+sve+bf16 -mtune=cortex-a72
ifneq ($(F_COMPILER), NAG)
FCOMMON_OPT += -march=armv8-a -mtune=cortex-a72
endif
endif
endif
# Detect ARM Neoverse V2.
ifeq ($(CORE), NEOVERSEV2)
ifeq (1, $(filter 1,$(GCCVERSIONGTEQ13) $(ISCLANG)))
CCOMMON_OPT += -mcpu=neoverse-v2
ifneq ($(F_COMPILER), NAG)
FCOMMON_OPT += -mcpu=neoverse-v2
endif
else
CCOMMON_OPT += -march=armv8.2-a+sve+bf16 -mtune=neoverse-n1
ifneq ($(F_COMPILER), NAG)
FCOMMON_OPT += -march=armv8.2-a -mtune=neoverse-n1
endif
endif
endif
# Detect Ampere AmpereOne(ampere1,ampere1a) processors.
ifeq ($(CORE), AMPERE1)
ifeq (1, $(filter 1,$(GCCVERSIONGTEQ12) $(ISCLANG)))
CCOMMON_OPT += -march=armv8.6-a+crypto+crc+fp16+sha3+rng
ifneq ($(F_COMPILER), NAG)
FCOMMON_OPT += -march=armv8.6-a+crypto+crc+fp16+sha3+rng
endif
endif
endif
# Use a53 tunings because a55 is only available in GCC>=8.1
ifeq ($(CORE), CORTEXA55)
ifeq (1, $(filter 1,$(GCCVERSIONGTEQ7) $(ISCLANG)))
ifeq (1, $(filter 1,$(GCCVERSIONGTEQ8) $(ISCLANG)))
CCOMMON_OPT += -march=armv8.2-a -mtune=cortex-a55
ifneq ($(F_COMPILER), NAG)
FCOMMON_OPT += -march=armv8.2-a -mtune=cortex-a55
endif
else
CCOMMON_OPT += -march=armv8.2-a -mtune=cortex-a53
ifneq ($(F_COMPILER), NAG)
FCOMMON_OPT += -march=armv8.2-a -mtune=cortex-a53
endif
endif
else
CCOMMON_OPT += -march=armv8-a -mtune=cortex-a53
ifneq ($(F_COMPILER), NAG)
FCOMMON_OPT += -march=armv8-a -mtune=cortex-a53
endif
endif
endif
ifeq ($(CORE), THUNDERX)
CCOMMON_OPT += -march=armv8-a -mtune=thunderx
ifneq ($(F_COMPILER), NAG)
FCOMMON_OPT += -march=armv8-a -mtune=thunderx
endif
endif
ifeq ($(CORE), FALKOR)
CCOMMON_OPT += -march=armv8-a -mtune=falkor
ifneq ($(F_COMPILER), NAG)
FCOMMON_OPT += -march=armv8-a -mtune=falkor
endif
endif
ifeq ($(CORE), THUNDERX2T99)
CCOMMON_OPT += -march=armv8.1-a -mtune=thunderx2t99
ifneq ($(F_COMPILER), NAG)
FCOMMON_OPT += -march=armv8.1-a -mtune=thunderx2t99
endif
endif
ifeq ($(CORE), THUNDERX3T110)
ifeq (1, $(filter 1,$(GCCVERSIONGTEQ10) $(ISCLANG)))
CCOMMON_OPT += -march=armv8.3-a
# ISCLANG is either 1 or unset in this file, so test for "not 1"
# rather than for 0.
ifneq (1, $(ISCLANG))
CCOMMON_OPT += -mtune=thunderx3t110
else
CCOMMON_OPT += -mtune=thunderx2t99
endif
ifneq ($(F_COMPILER), NAG)
FCOMMON_OPT += -march=armv8.3-a -mtune=thunderx3t110
endif
else
CCOMMON_OPT += -march=armv8.1-a -mtune=thunderx2t99
ifneq ($(F_COMPILER), NAG)
FCOMMON_OPT += -march=armv8.1-a -mtune=thunderx2t99
endif
endif
endif
ifeq ($(CORE), VORTEX)
CCOMMON_OPT += -march=armv8.3-a
ifneq ($(F_COMPILER), NAG)
FCOMMON_OPT += -march=armv8.3-a
endif
endif
ifeq (1, $(filter 1,$(GCCVERSIONGTEQ9) $(ISCLANG)))
ifeq ($(CORE), TSV110)
CCOMMON_OPT += -march=armv8.2-a -mtune=tsv110
ifneq ($(F_COMPILER), NAG)
FCOMMON_OPT += -march=armv8.2-a -mtune=tsv110
endif
endif
endif
ifeq (1, $(filter 1,$(GCCVERSIONGTEQ9) $(ISCLANG)))
ifeq ($(CORE), EMAG8180)
CCOMMON_OPT += -march=armv8-a
# Same "not 1" test as for THUNDERX3T110: ISCLANG is never set to 0 here.
ifneq (1, $(ISCLANG))
CCOMMON_OPT += -mtune=emag
endif
ifneq ($(F_COMPILER), NAG)
FCOMMON_OPT += -march=armv8-a -mtune=emag
endif
endif
endif
ifeq ($(CORE), A64FX)
ifeq (1, $(filter 1,$(GCCVERSIONGTEQ10) $(ISCLANG)))
# Test for a non-empty result: $(filter) yields "1 1" when both the minor
# and the major version checks pass, which never compares equal to "1".
ifneq (,$(filter 1,$(GCCMINORVERSIONGTEQ3) $(GCCVERSIONGTEQ11) $(ISCLANG)))
CCOMMON_OPT += -march=armv8.2-a+sve -mtune=a64fx
ifneq ($(F_COMPILER), NAG)
FCOMMON_OPT += -march=armv8.2-a+sve -mtune=a64fx
endif
else
CCOMMON_OPT += -march=armv8.4-a+sve -mtune=neoverse-n1
ifneq ($(F_COMPILER), NAG)
FCOMMON_OPT += -march=armv8.4-a -mtune=neoverse-n1
endif
endif
endif
endif
ifeq (1, $(filter 1,$(GCCVERSIONGTEQ11) $(ISCLANG)))
ifeq ($(CORE), CORTEXX1)
CCOMMON_OPT += -march=armv8.2-a
# Non-empty test for the same reason as in the A64FX block above.
ifneq (,$(filter 1,$(GCCMINORVERSIONGTEQ4) $(GCCVERSIONGTEQ12) $(ISCLANG)))
CCOMMON_OPT += -mtune=cortex-x1
ifneq ($(F_COMPILER), NAG)
FCOMMON_OPT += -march=armv8.2-a -mtune=cortex-x1
endif
else
CCOMMON_OPT += -mtune=cortex-a72
ifneq ($(F_COMPILER), NAG)
FCOMMON_OPT += -march=armv8.2-a -mtune=cortex-a72
endif
endif
endif
endif
ifeq (1, $(filter 1,$(GCCVERSIONGTEQ11) $(ISCLANG)))
ifeq ($(CORE), CORTEXX2)
CCOMMON_OPT += -march=armv8.4-a+sve
ifneq ($(F_COMPILER), NAG)
FCOMMON_OPT += -march=armv8.4-a+sve
endif
ifeq (1, $(filter 1,$(GCCVERSIONGTEQ12) $(ISCLANG)))
CCOMMON_OPT += -mtune=cortex-x2
ifneq ($(F_COMPILER), NAG)
FCOMMON_OPT += -mtune=cortex-x2
endif
endif
endif
endif
#ifeq (1, $(filter 1,$(ISCLANG)))
ifeq (1, $(filter 1,$(GCCVERSIONGTEQ11) $(ISCLANG)))
ifeq ($(CORE), CORTEXA510)
CCOMMON_OPT += -march=armv8.4-a+sve
ifneq ($(F_COMPILER), NAG)
FCOMMON_OPT += -march=armv8.4-a+sve
endif
endif
endif
ifeq (1, $(filter 1,$(GCCVERSIONGTEQ11) $(ISCLANG)))
ifeq ($(CORE), CORTEXA710)
CCOMMON_OPT += -march=armv8.4-a+sve
ifneq ($(F_COMPILER), NAG)
FCOMMON_OPT += -march=armv8.4-a+sve
endif
ifeq (1, $(filter 1,$(GCCVERSIONGTEQ12) $(ISCLANG)))
CCOMMON_OPT += -mtune=cortex-a710
ifneq ($(F_COMPILER), NAG)
FCOMMON_OPT += -mtune=cortex-a710
endif
endif
endif
endif
endif
else
# NVIDIA HPC options necessary to enable SVE in the compiler
ifeq ($(CORE), THUNDERX2T99)
CCOMMON_OPT += -tp=thunderx2t99
FCOMMON_OPT += -tp=thunderx2t99
endif
ifeq ($(CORE), NEOVERSEN1)
CCOMMON_OPT += -tp=neoverse-n1
FCOMMON_OPT += -tp=neoverse-n1
endif
ifeq ($(CORE), NEOVERSEV1)
CCOMMON_OPT += -tp=neoverse-v1
FCOMMON_OPT += -tp=neoverse-v1
endif
ifeq ($(CORE), NEOVERSEV2)
CCOMMON_OPT += -tp=neoverse-v2
FCOMMON_OPT += -tp=neoverse-v2
endif
ifeq ($(CORE), ARMV8SVE)
CCOMMON_OPT += -tp=neoverse-v2
FCOMMON_OPT += -tp=neoverse-v2
endif
ifeq ($(CORE), ARMV9SVE)
CCOMMON_OPT += -tp=neoverse-v2
FCOMMON_OPT += -tp=neoverse-v2
endif
endif
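
Most blocks in the file above gate a core-specific -mtune on a toolchain check built from $(filter 1,...), and fall back to an older tuning otherwise. The standalone sketch below isolates that pattern so it can be tried without the OpenBLAS build system. It is an illustration only: the file name demo.mk, the variable DEMO_OPT and the show target are hypothetical, and GCCVERSIONGTEQ9 and ISCLANG are passed by hand here instead of being computed by the build. The flags are a simplified version of the NEOVERSEN1 block. The sketch tests for a non-empty filter result, a form that also holds when more than one switch is 1.

```makefile
# demo.mk: standalone illustration, not part of OpenBLAS.
# DEMO_OPT is a hypothetical stand-in for CCOMMON_OPT.
DEMO_OPT :=

# $(filter 1,...) keeps every word equal to 1, so the result is non-empty
# as soon as at least one switch is on: a logical OR over the switches.
ifneq (,$(filter 1,$(GCCVERSIONGTEQ9) $(ISCLANG)))
# The toolchain knows the core: use its own tuning.
DEMO_OPT += -march=armv8.2-a -mtune=neoverse-n1
else
# Older toolchain: baseline architecture with Cortex-A72 tuning.
DEMO_OPT += -march=armv8-a -mtune=cortex-a72
endif

# The recipe line below must start with a tab character.
.PHONY: show
show:
	@echo "$(DEMO_OPT)"
```

Running `make -f demo.mk show GCCVERSIONGTEQ9=1` or `make -f demo.mk show ISCLANG=1` should select the neoverse-n1 tuning, while a plain `make -f demo.mk show` takes the cortex-a72 fallback.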