
getarch.c 66 kB

Simplifying ARMv8 build parameters

ARMv8 builds were a bit mixed up. ThunderX2 code was used in ARMv8 mode, which is not right because TX2 is ARMv8.1. The defines also contained redundancies that made it harder to maintain them and to understand which core has what. A few other minor issues were also fixed.

Tests were run on the following cores: A53, A57, A72, Falkor, ThunderX, ThunderX2, and XGene. The test suites were OpenBLAS/test, OpenBLAS/benchmark, and BLAS-Tester.

A summary:
* Removed TX2 code from the ARMv8 build, to make sure it is compatible with all ARMv8 cores, not just v8.1. The TX2 code had also harmed performance on big cores.
* Commoned up the ARMv8 architectures' defines in params.h, so that all of them benefit from the ARMv8 settings in addition to their own.
* Added a few more cores, using ARMv8's include strategy, to benefit from compiler optimisations using mtune. Also updated cache information from the manuals, making sure we set good conservative values by default. Removed Vulcan, as it is an alias of TX2.
* Added auto-detection for most of those cores, and updated the forced compilation in getarch.c, so that the parameters are the same whether compiled natively or with a forced arch.

Benefits:
* The ARMv8 build is now guaranteed to work on all ARMv8 cores.
* Improved performance for ARMv8 builds on some cores (A72, Falkor, ThunderX1 and 2: up to 11%) over current develop.
* Improved performance for *all* cores compared to the develop branch before TX2's patch (9% to 36%).
* ThunderX1 builds are 14% faster than ARMv8 builds on TX1, 9% faster than the current develop branch, and 8% faster than develop before the TX2 patches.

Issues:
* Regression from the current develop branch for A53 (-12%) and A57 (-3%) with ARMv8 builds, but still faster than before TX2's commit (+15% and +24% respectively). This can be improved by simplifying TX2's code in future patches. At least the code is now guaranteed to be ARMv8.0.

Comments:
* CortexA57 builds on A57 hardware are unchanged from the develop branch, which makes sense, as that code is untouched.
* CortexA72 builds improve over CortexA57 builds on A72 hardware, even though they use the same includes, thanks to the new compiler tuning in the Makefile.
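The two summary items about commoning up the ARMv8 defines and keeping native and forced builds consistent can be illustrated with a small table-driven sketch. This is not OpenBLAS code: the `core_params` struct, `resolve_core`, `params_for`, and all cache numbers are hypothetical placeholders (not values from params.h or the vendor manuals). The target names follow the commit's core list, and the `-mtune` strings are assumed GCC AArch64 names that should be checked against the compiler in use. Each core starts from the shared ARMv8 defaults and overrides only what differs, and both the detected and the forced path consult the same table.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Effective build parameters for one target (hypothetical layout). */
typedef struct {
    const char *name;   /* target name, as detected or forced */
    const char *mtune;  /* value passed to the compiler's -mtune= (assumed GCC names) */
    long l2_size;       /* L2 size in bytes; 0 in the table means "inherit" */
} core_params;

/* Shared ARMv8 settings every core starts from (placeholder numbers). */
static const core_params armv8_defaults = { "ARMV8", "generic", 262144 };

/* Per-core entries list only what differs from the ARMv8 defaults. */
static const core_params cores[] = {
    { "ARMV8",        "generic",      0 },
    { "CORTEXA53",    "cortex-a53",   0 },
    { "CORTEXA57",    "cortex-a57",   0 },
    { "CORTEXA72",    "cortex-a72",   0 },
    { "FALKOR",       "falkor",       0 },
    { "THUNDERX",     "thunderx",     0 },
    { "THUNDERX2T99", "thunderx2t99", 1048576 }, /* placeholder override */
    { "XGENE1",       "xgene1",       0 },
};

/* Looks up `name` and fills `out` with the ARMv8 defaults plus any
   per-core overrides. Returns 1 on success, 0 for an unknown target. */
static int resolve_core(const char *name, core_params *out) {
    assert(name != NULL && out != NULL);
    for (size_t i = 0; i < sizeof cores / sizeof cores[0]; i++) {
        if (strcmp(cores[i].name, name) != 0)
            continue;
        *out = armv8_defaults;              /* inherit the shared settings */
        out->name = cores[i].name;
        out->mtune = cores[i].mtune;
        if (cores[i].l2_size != 0)          /* apply only real overrides */
            out->l2_size = cores[i].l2_size;
        return 1;
    }
    return 0;
}

/* A forced target takes precedence over detection, but both go through
   the same table, so native and forced builds of one core agree. */
static int params_for(const char *forced, const char *detected,
                      core_params *out) {
    return resolve_core(forced != NULL ? forced : detected, out);
}
```

With a single table, adding a core means adding one entry, and a forced build cannot pick up different parameters from a native build of the same core, which is the consistency property the commit describes for getarch.c. An alias such as Vulcan simply has no entry, so looking it up fails instead of duplicating TX2's settings.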
/*****************************************************************************
Copyright (c) 2011-2014, The OpenBLAS Project
All rights reserved.
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are
met:
1. Redistributions of source code must retain the above copyright
notice, this list of conditions and the following disclaimer.
2. Redistributions in binary form must reproduce the above copyright
notice, this list of conditions and the following disclaimer in
the documentation and/or other materials provided with the
distribution.
3. Neither the name of the OpenBLAS project nor the names of
its contributors may be used to endorse or promote products
derived from this software without specific prior written
permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE
LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE
USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
**********************************************************************************/
/*********************************************************************/
/* Copyright 2009, 2010 The University of Texas at Austin. */
/* All rights reserved. */
/* */
/* Redistribution and use in source and binary forms, with or */
/* without modification, are permitted provided that the following */
/* conditions are met: */
/* */
/* 1. Redistributions of source code must retain the above */
/* copyright notice, this list of conditions and the following */
/* disclaimer. */
/* */
/* 2. Redistributions in binary form must reproduce the above */
/* copyright notice, this list of conditions and the following */
/* disclaimer in the documentation and/or other materials */
/* provided with the distribution. */
/* */
/* THIS SOFTWARE IS PROVIDED BY THE UNIVERSITY OF TEXAS AT */
/* AUSTIN ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, */
/* INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF */
/* MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE */
/* DISCLAIMED. IN NO EVENT SHALL THE UNIVERSITY OF TEXAS AT */
/* AUSTIN OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, */
/* INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES */
/* (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE */
/* GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR */
/* BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF */
/* LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT */
/* (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT */
/* OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE */
/* POSSIBILITY OF SUCH DAMAGE. */
/* */
/* The views and conclusions contained in the software and */
/* documentation are those of the authors and should not be */
/* interpreted as representing official policies, either expressed */
/* or implied, of The University of Texas at Austin. */
/*********************************************************************/
#if defined(__WIN32__) || defined(__WIN64__) || defined(__CYGWIN32__) || defined(__CYGWIN64__) || defined(_WIN32) || defined(_WIN64)
#define OS_WINDOWS
#endif
#if defined(__i386__) || defined(__x86_64__) || defined(_M_IX86) || defined(_M_X64)
#define INTEL_AMD
#endif
#include <stdio.h>
#include <string.h>
#ifdef OS_WINDOWS
#include <windows.h>
#endif
#if defined(__FreeBSD__) || defined(__OpenBSD__) || defined(__NetBSD__) || defined(__DragonFly__) || defined(__APPLE__)
#include <sys/types.h>
#include <sys/sysctl.h>
#endif
#if defined(linux) || defined(__sun__)
#include <sys/sysinfo.h>
#include <unistd.h>
#endif
#if defined(AIX)
#include <sys/sysinfo.h>
#endif
/* #define FORCE_P2 */
/* #define FORCE_KATMAI */
/* #define FORCE_COPPERMINE */
/* #define FORCE_NORTHWOOD */
/* #define FORCE_PRESCOTT */
/* #define FORCE_BANIAS */
/* #define FORCE_YONAH */
/* #define FORCE_CORE2 */
/* #define FORCE_PENRYN */
/* #define FORCE_DUNNINGTON */
/* #define FORCE_NEHALEM */
/* #define FORCE_SANDYBRIDGE */
/* #define FORCE_ATOM */
/* #define FORCE_ATHLON */
/* #define FORCE_OPTERON */
/* #define FORCE_OPTERON_SSE3 */
/* #define FORCE_BARCELONA */
/* #define FORCE_SHANGHAI */
/* #define FORCE_ISTANBUL */
/* #define FORCE_BOBCAT */
/* #define FORCE_BULLDOZER */
/* #define FORCE_PILEDRIVER */
/* #define FORCE_SSE_GENERIC */
/* #define FORCE_VIAC3 */
/* #define FORCE_NANO */
/* #define FORCE_POWER3 */
/* #define FORCE_POWER4 */
/* #define FORCE_POWER5 */
/* #define FORCE_POWER6 */
/* #define FORCE_POWER7 */
/* #define FORCE_POWER8 */
/* #define FORCE_PPCG4 */
/* #define FORCE_PPC970 */
/* #define FORCE_PPC970MP */
/* #define FORCE_PPC440 */
/* #define FORCE_PPC440FP2 */
/* #define FORCE_CELL */
/* #define FORCE_MIPS64_GENERIC */
/* #define FORCE_SICORTEX */
/* #define FORCE_LOONGSON3R3 */
/* #define FORCE_LOONGSON3R4 */
/* #define FORCE_LOONGSON3R5 */
/* #define FORCE_LOONGSON2K1000 */
/* #define FORCE_LOONGSONGENERIC */
/* #define FORCE_I6400 */
/* #define FORCE_P6600 */
/* #define FORCE_P5600 */
/* #define FORCE_I6500 */
/* #define FORCE_ITANIUM2 */
/* #define FORCE_SPARC */
/* #define FORCE_SPARCV7 */
/* #define FORCE_ZARCH_GENERIC */
/* #define FORCE_Z13 */
/* #define FORCE_EV4 */
/* #define FORCE_EV5 */
/* #define FORCE_EV6 */
/* #define FORCE_GENERIC */
#ifdef FORCE_P2
#define FORCE
#define FORCE_INTEL
#define ARCHITECTURE "X86"
#define SUBARCHITECTURE "PENTIUM2"
#define ARCHCONFIG "-DPENTIUM2 " \
"-DL1_DATA_SIZE=16384 -DL1_DATA_LINESIZE=32 " \
"-DL2_SIZE=524288 -DL2_LINESIZE=32 " \
"-DDTB_DEFAULT_ENTRIES=64 -DDTB_SIZE=4096 " \
"-DHAVE_CMOV -DHAVE_MMX"
#define LIBNAME "p2"
#define CORENAME "P5"
#endif
#ifdef FORCE_KATMAI
#define FORCE
#define FORCE_INTEL
#define ARCHITECTURE "X86"
#define SUBARCHITECTURE "PENTIUM3"
#define ARCHCONFIG "-DPENTIUM3 " \
"-DL1_DATA_SIZE=16384 -DL1_DATA_LINESIZE=32 " \
"-DL2_SIZE=524288 -DL2_LINESIZE=32 " \
"-DDTB_DEFAULT_ENTRIES=64 -DDTB_SIZE=4096 " \
"-DHAVE_CMOV -DHAVE_MMX -DHAVE_SSE "
#define LIBNAME "katmai"
#define CORENAME "KATMAI"
#endif
#ifdef FORCE_COPPERMINE
#define FORCE
#define FORCE_INTEL
#define ARCHITECTURE "X86"
#define SUBARCHITECTURE "PENTIUM3"
#define ARCHCONFIG "-DPENTIUM3 " \
"-DL1_DATA_SIZE=16384 -DL1_DATA_LINESIZE=32 " \
"-DL2_SIZE=262144 -DL2_LINESIZE=32 " \
"-DDTB_DEFAULT_ENTRIES=64 -DDTB_SIZE=4096 " \
"-DHAVE_CMOV -DHAVE_MMX -DHAVE_SSE "
#define LIBNAME "coppermine"
#define CORENAME "COPPERMINE"
#endif
#ifdef FORCE_NORTHWOOD
#define FORCE
#define FORCE_INTEL
#define ARCHITECTURE "X86"
#define SUBARCHITECTURE "PENTIUM4"
#define ARCHCONFIG "-DPENTIUM4 " \
"-DL1_DATA_SIZE=8192 -DL1_DATA_LINESIZE=64 " \
"-DL2_SIZE=524288 -DL2_LINESIZE=64 " \
"-DDTB_DEFAULT_ENTRIES=64 -DDTB_SIZE=4096 -DL2_ASSOCIATIVE=8 " \
"-DHAVE_CMOV -DHAVE_MMX -DHAVE_SSE -DHAVE_SSE2 "
#define LIBNAME "northwood"
#define CORENAME "NORTHWOOD"
#endif
#ifdef FORCE_PRESCOTT
#define FORCE
#define FORCE_INTEL
#define ARCHITECTURE "X86"
#define SUBARCHITECTURE "PENTIUM4"
#define ARCHCONFIG "-DPENTIUM4 " \
"-DL1_DATA_SIZE=16384 -DL1_DATA_LINESIZE=64 " \
"-DL2_SIZE=1048576 -DL2_LINESIZE=64 " \
"-DDTB_DEFAULT_ENTRIES=64 -DDTB_SIZE=4096 -DL2_ASSOCIATIVE=8 " \
"-DHAVE_CMOV -DHAVE_MMX -DHAVE_SSE -DHAVE_SSE2 -DHAVE_SSE3"
#define LIBNAME "prescott"
#define CORENAME "PRESCOTT"
#endif
#ifdef FORCE_BANIAS
#define FORCE
#define FORCE_INTEL
#define ARCHITECTURE "X86"
#define SUBARCHITECTURE "BANIAS"
#define ARCHCONFIG "-DPENTIUMM " \
"-DL1_DATA_SIZE=32768 -DL1_DATA_LINESIZE=64 " \
"-DL2_SIZE=1048576 -DL2_LINESIZE=64 " \
"-DDTB_DEFAULT_ENTRIES=64 -DDTB_SIZE=4096 " \
"-DHAVE_CMOV -DHAVE_MMX -DHAVE_SSE -DHAVE_SSE2 "
#define LIBNAME "banias"
#define CORENAME "BANIAS"
#endif
#ifdef FORCE_YONAH
#define FORCE
#define FORCE_INTEL
#define ARCHITECTURE "X86"
#define SUBARCHITECTURE "YONAH"
#define ARCHCONFIG "-DPENTIUMM " \
"-DL1_DATA_SIZE=32768 -DL1_DATA_LINESIZE=64 " \
"-DL2_SIZE=1048576 -DL2_LINESIZE=64 " \
"-DDTB_DEFAULT_ENTRIES=64 -DDTB_SIZE=4096 " \
"-DHAVE_CMOV -DHAVE_MMX -DHAVE_SSE -DHAVE_SSE2 "
#define LIBNAME "yonah"
#define CORENAME "YONAH"
#endif
#ifdef FORCE_CORE2
#define FORCE
#define FORCE_INTEL
#define ARCHITECTURE "X86"
#define SUBARCHITECTURE "CONRORE"
#define ARCHCONFIG "-DCORE2 " \
"-DL1_DATA_SIZE=32768 -DL1_DATA_LINESIZE=64 " \
"-DL2_SIZE=1048576 -DL2_LINESIZE=64 " \
"-DDTB_DEFAULT_ENTRIES=256 -DDTB_SIZE=4096 " \
"-DHAVE_CMOV -DHAVE_MMX -DHAVE_SSE -DHAVE_SSE2 -DHAVE_SSE3 -DHAVE_SSSE3"
#define LIBNAME "core2"
#define CORENAME "CORE2"
#endif
#ifdef FORCE_PENRYN
#define FORCE
#define FORCE_INTEL
#define ARCHITECTURE "X86"
#define SUBARCHITECTURE "PENRYN"
#define ARCHCONFIG "-DPENRYN " \
"-DL1_DATA_SIZE=32768 -DL1_DATA_LINESIZE=64 " \
"-DL2_SIZE=1048576 -DL2_LINESIZE=64 " \
"-DDTB_DEFAULT_ENTRIES=256 -DDTB_SIZE=4096 " \
"-DHAVE_CMOV -DHAVE_MMX -DHAVE_SSE -DHAVE_SSE2 -DHAVE_SSE3 -DHAVE_SSSE3 -DHAVE_SSE4_1"
#define LIBNAME "penryn"
#define CORENAME "PENRYN"
#endif
#ifdef FORCE_DUNNINGTON
#define FORCE
#define FORCE_INTEL
#define ARCHITECTURE "X86"
#define SUBARCHITECTURE "DUNNINGTON"
#define ARCHCONFIG "-DDUNNINGTON " \
"-DL1_DATA_SIZE=32768 -DL1_DATA_LINESIZE=64 " \
"-DL2_SIZE=1048576 -DL2_LINESIZE=64 " \
"-DL3_SIZE=16777216 -DL3_LINESIZE=64 " \
"-DDTB_DEFAULT_ENTRIES=256 -DDTB_SIZE=4096 " \
"-DHAVE_CMOV -DHAVE_MMX -DHAVE_SSE -DHAVE_SSE2 -DHAVE_SSE3 -DHAVE_SSSE3 -DHAVE_SSE4_1"
#define LIBNAME "dunnington"
#define CORENAME "DUNNINGTON"
#endif
#ifdef FORCE_NEHALEM
#define FORCE
#define FORCE_INTEL
#define ARCHITECTURE "X86"
#define SUBARCHITECTURE "NEHALEM"
#define ARCHCONFIG "-DNEHALEM " \
"-DL1_DATA_SIZE=32768 -DL1_DATA_LINESIZE=64 " \
"-DL2_SIZE=262144 -DL2_LINESIZE=64 " \
"-DDTB_DEFAULT_ENTRIES=64 -DDTB_SIZE=4096 " \
"-DHAVE_CMOV -DHAVE_MMX -DHAVE_SSE -DHAVE_SSE2 -DHAVE_SSE3 -DHAVE_SSSE3 -DHAVE_SSE4_1 -DHAVE_SSE4_2"
#define LIBNAME "nehalem"
#define CORENAME "NEHALEM"
#endif
#ifdef FORCE_SANDYBRIDGE
#define FORCE
#define FORCE_INTEL
#define ARCHITECTURE "X86"
#ifdef NO_AVX
#define SUBARCHITECTURE "NEHALEM"
#define ARCHCONFIG "-DNEHALEM " \
"-DL1_DATA_SIZE=32768 -DL1_DATA_LINESIZE=64 " \
"-DL2_SIZE=262144 -DL2_LINESIZE=64 " \
"-DDTB_DEFAULT_ENTRIES=64 -DDTB_SIZE=4096 " \
"-DHAVE_CMOV -DHAVE_MMX -DHAVE_SSE -DHAVE_SSE2 -DHAVE_SSE3 -DHAVE_SSSE3 -DHAVE_SSE4_1 -DHAVE_SSE4_2"
#define LIBNAME "nehalem"
#define CORENAME "NEHALEM"
#else
#define SUBARCHITECTURE "SANDYBRIDGE"
#define ARCHCONFIG "-DSANDYBRIDGE " \
"-DL1_DATA_SIZE=32768 -DL1_DATA_LINESIZE=64 " \
"-DL2_SIZE=262144 -DL2_LINESIZE=64 " \
"-DDTB_DEFAULT_ENTRIES=64 -DDTB_SIZE=4096 " \
"-DHAVE_CMOV -DHAVE_MMX -DHAVE_SSE -DHAVE_SSE2 -DHAVE_SSE3 -DHAVE_SSSE3 -DHAVE_SSE4_1 -DHAVE_SSE4_2 -DHAVE_AVX"
#define LIBNAME "sandybridge"
#define CORENAME "SANDYBRIDGE"
#endif
#endif
#ifdef FORCE_HASWELL
#define FORCE
#define FORCE_INTEL
#define ARCHITECTURE "X86"
#ifdef NO_AVX2
#ifdef NO_AVX
#define SUBARCHITECTURE "NEHALEM"
#define ARCHCONFIG "-DNEHALEM " \
"-DL1_DATA_SIZE=32768 -DL1_DATA_LINESIZE=64 " \
"-DL2_SIZE=262144 -DL2_LINESIZE=64 " \
"-DDTB_DEFAULT_ENTRIES=64 -DDTB_SIZE=4096 " \
"-DHAVE_CMOV -DHAVE_MMX -DHAVE_SSE -DHAVE_SSE2 -DHAVE_SSE3 -DHAVE_SSSE3 -DHAVE_SSE4_1 -DHAVE_SSE4_2"
#define LIBNAME "nehalem"
#define CORENAME "NEHALEM"
#else
#define SUBARCHITECTURE "SANDYBRIDGE"
#define ARCHCONFIG "-DSANDYBRIDGE " \
"-DL1_DATA_SIZE=32768 -DL1_DATA_LINESIZE=64 " \
"-DL2_SIZE=262144 -DL2_LINESIZE=64 " \
"-DDTB_DEFAULT_ENTRIES=64 -DDTB_SIZE=4096 " \
"-DHAVE_CMOV -DHAVE_MMX -DHAVE_SSE -DHAVE_SSE2 -DHAVE_SSE3 -DHAVE_SSSE3 -DHAVE_SSE4_1 -DHAVE_SSE4_2 -DHAVE_AVX"
#define LIBNAME "sandybridge"
#define CORENAME "SANDYBRIDGE"
#endif
#else
#define SUBARCHITECTURE "HASWELL"
#define ARCHCONFIG "-DHASWELL " \
"-DL1_DATA_SIZE=32768 -DL1_DATA_LINESIZE=64 " \
"-DL2_SIZE=262144 -DL2_LINESIZE=64 " \
"-DDTB_DEFAULT_ENTRIES=64 -DDTB_SIZE=4096 " \
"-DHAVE_CMOV -DHAVE_MMX -DHAVE_SSE -DHAVE_SSE2 -DHAVE_SSE3 -DHAVE_SSSE3 -DHAVE_SSE4_1 -DHAVE_SSE4_2 -DHAVE_AVX " \
"-DHAVE_AVX2 -DHAVE_FMA3 -DFMA3"
#define LIBNAME "haswell"
#define CORENAME "HASWELL"
#endif
#endif
#ifdef FORCE_SKYLAKEX
#define FORCE
#define FORCE_INTEL
#define ARCHITECTURE "X86"
#ifdef NO_AVX512
#ifdef NO_AVX2
#ifdef NO_AVX
#define SUBARCHITECTURE "NEHALEM"
#define ARCHCONFIG "-DNEHALEM " \
"-DL1_DATA_SIZE=32768 -DL1_DATA_LINESIZE=64 " \
"-DL2_SIZE=262144 -DL2_LINESIZE=64 " \
"-DDTB_DEFAULT_ENTRIES=64 -DDTB_SIZE=4096 " \
"-DHAVE_CMOV -DHAVE_MMX -DHAVE_SSE -DHAVE_SSE2 -DHAVE_SSE3 -DHAVE_SSSE3 -DHAVE_SSE4_1 -DHAVE_SSE4_2"
#define LIBNAME "nehalem"
#define CORENAME "NEHALEM"
#else
#define SUBARCHITECTURE "SANDYBRIDGE"
#define ARCHCONFIG "-DSANDYBRIDGE " \
"-DL1_DATA_SIZE=32768 -DL1_DATA_LINESIZE=64 " \
"-DL2_SIZE=262144 -DL2_LINESIZE=64 " \
"-DDTB_DEFAULT_ENTRIES=64 -DDTB_SIZE=4096 " \
"-DHAVE_CMOV -DHAVE_MMX -DHAVE_SSE -DHAVE_SSE2 -DHAVE_SSE3 -DHAVE_SSSE3 -DHAVE_SSE4_1 -DHAVE_SSE4_2 -DHAVE_AVX"
#define LIBNAME "sandybridge"
#define CORENAME "SANDYBRIDGE"
#endif
#else
#define SUBARCHITECTURE "HASWELL"
#define ARCHCONFIG "-DHASWELL " \
"-DL1_DATA_SIZE=32768 -DL1_DATA_LINESIZE=64 " \
"-DL2_SIZE=262144 -DL2_LINESIZE=64 " \
"-DDTB_DEFAULT_ENTRIES=64 -DDTB_SIZE=4096 " \
"-DHAVE_CMOV -DHAVE_MMX -DHAVE_SSE -DHAVE_SSE2 -DHAVE_SSE3 -DHAVE_SSSE3 -DHAVE_SSE4_1 -DHAVE_SSE4_2 -DHAVE_AVX " \
"-DHAVE_AVX2 -DHAVE_FMA3 -DFMA3"
#define LIBNAME "haswell"
#define CORENAME "HASWELL"
#endif
#else
#define SUBARCHITECTURE "SKYLAKEX"
#define ARCHCONFIG "-DSKYLAKEX " \
"-DL1_DATA_SIZE=32768 -DL1_DATA_LINESIZE=64 " \
"-DL2_SIZE=262144 -DL2_LINESIZE=64 " \
"-DDTB_DEFAULT_ENTRIES=64 -DDTB_SIZE=4096 " \
"-DHAVE_CMOV -DHAVE_MMX -DHAVE_SSE -DHAVE_SSE2 -DHAVE_SSE3 -DHAVE_SSSE3 -DHAVE_SSE4_1 -DHAVE_SSE4_2 -DHAVE_AVX " \
"-DHAVE_AVX2 -DHAVE_FMA3 -DFMA3 -DHAVE_AVX512VL -march=skylake-avx512"
#define LIBNAME "skylakex"
#define CORENAME "SKYLAKEX"
#endif
#endif
#ifdef FORCE_COOPERLAKE
#define FORCE
#define FORCE_INTEL
#define ARCHITECTURE "X86"
#ifdef NO_AVX512
#ifdef NO_AVX2
#ifdef NO_AVX
#define SUBARCHITECTURE "NEHALEM"
#define ARCHCONFIG "-DNEHALEM " \
"-DL1_DATA_SIZE=32768 -DL1_DATA_LINESIZE=64 " \
"-DL2_SIZE=262144 -DL2_LINESIZE=64 " \
"-DDTB_DEFAULT_ENTRIES=64 -DDTB_SIZE=4096 " \
"-DHAVE_CMOV -DHAVE_MMX -DHAVE_SSE -DHAVE_SSE2 -DHAVE_SSE3 -DHAVE_SSSE3 -DHAVE_SSE4_1 -DHAVE_SSE4_2"
#define LIBNAME "nehalem"
#define CORENAME "NEHALEM"
#else
#define SUBARCHITECTURE "SANDYBRIDGE"
#define ARCHCONFIG "-DSANDYBRIDGE " \
"-DL1_DATA_SIZE=32768 -DL1_DATA_LINESIZE=64 " \
"-DL2_SIZE=262144 -DL2_LINESIZE=64 " \
"-DDTB_DEFAULT_ENTRIES=64 -DDTB_SIZE=4096 " \
"-DHAVE_CMOV -DHAVE_MMX -DHAVE_SSE -DHAVE_SSE2 -DHAVE_SSE3 -DHAVE_SSSE3 -DHAVE_SSE4_1 -DHAVE_SSE4_2 -DHAVE_AVX"
#define LIBNAME "sandybridge"
#define CORENAME "SANDYBRIDGE"
#endif
#else
#define SUBARCHITECTURE "HASWELL"
#define ARCHCONFIG "-DHASWELL " \
"-DL1_DATA_SIZE=32768 -DL1_DATA_LINESIZE=64 " \
"-DL2_SIZE=262144 -DL2_LINESIZE=64 " \
"-DDTB_DEFAULT_ENTRIES=64 -DDTB_SIZE=4096 " \
"-DHAVE_CMOV -DHAVE_MMX -DHAVE_SSE -DHAVE_SSE2 -DHAVE_SSE3 -DHAVE_SSSE3 -DHAVE_SSE4_1 -DHAVE_SSE4_2 -DHAVE_AVX " \
"-DHAVE_AVX2 -DHAVE_FMA3 -DFMA3"
#define LIBNAME "haswell"
#define CORENAME "HASWELL"
#endif
#else
#define SUBARCHITECTURE "COOPERLAKE"
#define ARCHCONFIG "-DCOOPERLAKE " \
"-DL1_DATA_SIZE=32768 -DL1_DATA_LINESIZE=64 " \
"-DL2_SIZE=262144 -DL2_LINESIZE=64 " \
"-DDTB_DEFAULT_ENTRIES=64 -DDTB_SIZE=4096 " \
"-DHAVE_CMOV -DHAVE_MMX -DHAVE_SSE -DHAVE_SSE2 -DHAVE_SSE3 -DHAVE_SSSE3 -DHAVE_SSE4_1 -DHAVE_SSE4_2 -DHAVE_AVX " \
"-DHAVE_AVX2 -DHAVE_FMA3 -DFMA3 -DHAVE_AVX512VL -DHAVE_AVX512BF16 -march=cooperlake"
#define LIBNAME "cooperlake"
#define CORENAME "COOPERLAKE"
#endif
#endif
#ifdef FORCE_SAPPHIRERAPIDS
#define FORCE
#define FORCE_INTEL
#define ARCHITECTURE "X86"
#ifdef NO_AVX512
#ifdef NO_AVX2
#ifdef NO_AVX
#define SUBARCHITECTURE "NEHALEM"
#define ARCHCONFIG "-DNEHALEM " \
"-DL1_DATA_SIZE=32768 -DL1_DATA_LINESIZE=64 " \
"-DL2_SIZE=262144 -DL2_LINESIZE=64 " \
"-DDTB_DEFAULT_ENTRIES=64 -DDTB_SIZE=4096 " \
"-DHAVE_CMOV -DHAVE_MMX -DHAVE_SSE -DHAVE_SSE2 -DHAVE_SSE3 -DHAVE_SSSE3 -DHAVE_SSE4_1 -DHAVE_SSE4_2"
#define LIBNAME "nehalem"
#define CORENAME "NEHALEM"
#else
#define SUBARCHITECTURE "SANDYBRIDGE"
#define ARCHCONFIG "-DSANDYBRIDGE " \
"-DL1_DATA_SIZE=32768 -DL1_DATA_LINESIZE=64 " \
"-DL2_SIZE=262144 -DL2_LINESIZE=64 " \
"-DDTB_DEFAULT_ENTRIES=64 -DDTB_SIZE=4096 " \
"-DHAVE_CMOV -DHAVE_MMX -DHAVE_SSE -DHAVE_SSE2 -DHAVE_SSE3 -DHAVE_SSSE3 -DHAVE_SSE4_1 -DHAVE_SSE4_2 -DHAVE_AVX"
#define LIBNAME "sandybridge"
#define CORENAME "SANDYBRIDGE"
#endif
#else
#define SUBARCHITECTURE "HASWELL"
#define ARCHCONFIG "-DHASWELL " \
"-DL1_DATA_SIZE=32768 -DL1_DATA_LINESIZE=64 " \
"-DL2_SIZE=262144 -DL2_LINESIZE=64 " \
"-DDTB_DEFAULT_ENTRIES=64 -DDTB_SIZE=4096 " \
"-DHAVE_CMOV -DHAVE_MMX -DHAVE_SSE -DHAVE_SSE2 -DHAVE_SSE3 -DHAVE_SSSE3 -DHAVE_SSE4_1 -DHAVE_SSE4_2 -DHAVE_AVX " \
"-DHAVE_AVX2 -DHAVE_FMA3 -DFMA3"
#define LIBNAME "haswell"
#define CORENAME "HASWELL"
#endif
#else
#define SUBARCHITECTURE "SAPPHIRERAPIDS"
#define ARCHCONFIG "-DSAPPHIRERAPIDS " \
"-DL1_DATA_SIZE=32768 -DL1_DATA_LINESIZE=64 " \
"-DL2_SIZE=262144 -DL2_LINESIZE=64 " \
"-DDTB_DEFAULT_ENTRIES=64 -DDTB_SIZE=4096 " \
"-DHAVE_CMOV -DHAVE_MMX -DHAVE_SSE -DHAVE_SSE2 -DHAVE_SSE3 -DHAVE_SSSE3 -DHAVE_SSE4_1 -DHAVE_SSE4_2 -DHAVE_AVX " \
"-DHAVE_AVX2 -DHAVE_FMA3 -DFMA3 -DHAVE_AVX512VL -DHAVE_AVX512BF16 -march=sapphirerapids"
#define LIBNAME "sapphirerapids"
#define CORENAME "SAPPHIRERAPIDS"
#endif
#endif
#ifdef FORCE_ATOM
#define FORCE
#define FORCE_INTEL
#define ARCHITECTURE "X86"
#define SUBARCHITECTURE "ATOM"
#define ARCHCONFIG "-DATOM " \
"-DL1_DATA_SIZE=24576 -DL1_DATA_LINESIZE=64 " \
"-DL2_SIZE=524288 -DL2_LINESIZE=64 " \
"-DDTB_DEFAULT_ENTRIES=64 -DDTB_SIZE=4096 -DL2_ASSOCIATIVE=4 " \
"-DHAVE_CMOV -DHAVE_MMX -DHAVE_SSE -DHAVE_SSE2 -DHAVE_SSE3 -DHAVE_SSSE3"
#define LIBNAME "atom"
#define CORENAME "ATOM"
#endif
#ifdef FORCE_ATHLON
#define FORCE
#define FORCE_INTEL
#define ARCHITECTURE "X86"
#define SUBARCHITECTURE "ATHLON"
#define ARCHCONFIG "-DATHLON " \
"-DL1_DATA_SIZE=65536 -DL1_DATA_LINESIZE=64 " \
"-DL2_SIZE=1048576 -DL2_LINESIZE=64 " \
"-DDTB_DEFAULT_ENTRIES=32 -DDTB_SIZE=4096 -DHAVE_3DNOW " \
"-DHAVE_3DNOWEX -DHAVE_MMX -DHAVE_SSE "
#define LIBNAME "athlon"
#define CORENAME "ATHLON"
#endif
#ifdef FORCE_OPTERON
#define FORCE
#define FORCE_INTEL
#define ARCHITECTURE "X86"
#define SUBARCHITECTURE "OPTERON"
#define ARCHCONFIG "-DOPTERON " \
"-DL1_DATA_SIZE=65536 -DL1_DATA_LINESIZE=64 " \
"-DL2_SIZE=1048576 -DL2_LINESIZE=64 " \
"-DDTB_DEFAULT_ENTRIES=32 -DDTB_SIZE=4096 -DHAVE_3DNOW " \
"-DHAVE_3DNOWEX -DHAVE_MMX -DHAVE_SSE -DHAVE_SSE2 "
#define LIBNAME "opteron"
#define CORENAME "OPTERON"
#endif
#ifdef FORCE_OPTERON_SSE3
#define FORCE
#define FORCE_INTEL
#define ARCHITECTURE "X86"
#define SUBARCHITECTURE "OPTERON"
#define ARCHCONFIG "-DOPTERON " \
"-DL1_DATA_SIZE=65536 -DL1_DATA_LINESIZE=64 " \
"-DL2_SIZE=1048576 -DL2_LINESIZE=64 " \
"-DDTB_DEFAULT_ENTRIES=32 -DDTB_SIZE=4096 -DHAVE_3DNOW " \
"-DHAVE_3DNOWEX -DHAVE_MMX -DHAVE_SSE -DHAVE_SSE2 -DHAVE_SSE3"
#define LIBNAME "opteron"
#define CORENAME "OPTERON"
#endif
#if defined(FORCE_BARCELONA) || defined(FORCE_SHANGHAI) || defined(FORCE_ISTANBUL)
#define FORCE
#define FORCE_INTEL
#define ARCHITECTURE "X86"
#define SUBARCHITECTURE "BARCELONA"
#define ARCHCONFIG "-DBARCELONA " \
"-DL1_DATA_SIZE=65536 -DL1_DATA_LINESIZE=64 " \
"-DL2_SIZE=524288 -DL2_LINESIZE=64 -DL3_SIZE=2097152 " \
"-DDTB_DEFAULT_ENTRIES=48 -DDTB_SIZE=4096 " \
"-DHAVE_MMX -DHAVE_SSE -DHAVE_SSE2 -DHAVE_SSE3 " \
"-DHAVE_SSE4A -DHAVE_MISALIGNSSE -DHAVE_128BITFPU -DHAVE_FASTMOVU"
#define LIBNAME "barcelona"
#define CORENAME "BARCELONA"
#endif
#if defined(FORCE_BOBCAT)
#define FORCE
#define FORCE_INTEL
#define ARCHITECTURE "X86"
#define SUBARCHITECTURE "BOBCAT"
#define ARCHCONFIG "-DBOBCAT " \
"-DL1_DATA_SIZE=32768 -DL1_DATA_LINESIZE=64 " \
"-DL2_SIZE=524288 -DL2_LINESIZE=64 " \
"-DDTB_DEFAULT_ENTRIES=40 -DDTB_SIZE=4096 " \
"-DHAVE_MMX -DHAVE_SSE -DHAVE_SSE2 -DHAVE_SSE3 -DHAVE_SSSE3 " \
"-DHAVE_SSE4A -DHAVE_MISALIGNSSE -DHAVE_CFLUSH -DHAVE_CMOV"
#define LIBNAME "bobcat"
#define CORENAME "BOBCAT"
#endif
#if defined (FORCE_BULLDOZER)
#define FORCE
#define FORCE_INTEL
#define ARCHITECTURE "X86"
#define SUBARCHITECTURE "BULLDOZER"
#define ARCHCONFIG "-DBULLDOZER " \
"-DL1_DATA_SIZE=49152 -DL1_DATA_LINESIZE=64 " \
"-DL2_SIZE=1024000 -DL2_LINESIZE=64 -DL3_SIZE=16777216 " \
"-DDTB_DEFAULT_ENTRIES=32 -DDTB_SIZE=4096 " \
"-DHAVE_MMX -DHAVE_SSE -DHAVE_SSE2 -DHAVE_SSE3 " \
"-DHAVE_SSE4A -DHAVE_MISALIGNSSE -DHAVE_128BITFPU -DHAVE_FASTMOVU " \
"-DHAVE_AVX"
#define LIBNAME "bulldozer"
#define CORENAME "BULLDOZER"
#endif
#if defined (FORCE_PILEDRIVER)
#define FORCE
#define FORCE_INTEL
#define ARCHITECTURE "X86"
#define SUBARCHITECTURE "PILEDRIVER"
#define ARCHCONFIG "-DPILEDRIVER " \
"-DL1_DATA_SIZE=16384 -DL1_DATA_LINESIZE=64 " \
"-DL2_SIZE=2097152 -DL2_LINESIZE=64 -DL3_SIZE=12582912 " \
"-DDTB_DEFAULT_ENTRIES=64 -DDTB_SIZE=4096 " \
"-DHAVE_MMX -DHAVE_SSE -DHAVE_SSE2 -DHAVE_SSE3 -DHAVE_SSE4_1 -DHAVE_SSE4_2 " \
"-DHAVE_SSE4A -DHAVE_MISALIGNSSE -DHAVE_128BITFPU -DHAVE_FASTMOVU -DHAVE_CFLUSH " \
"-DHAVE_AVX -DHAVE_FMA3"
#define LIBNAME "piledriver"
#define CORENAME "PILEDRIVER"
#endif
#if defined (FORCE_STEAMROLLER)
#define FORCE
#define FORCE_INTEL
#define ARCHITECTURE "X86"
#define SUBARCHITECTURE "STEAMROLLER"
#define ARCHCONFIG "-DSTEAMROLLER " \
"-DL1_DATA_SIZE=16384 -DL1_DATA_LINESIZE=64 " \
"-DL2_SIZE=2097152 -DL2_LINESIZE=64 -DL3_SIZE=12582912 " \
"-DDTB_DEFAULT_ENTRIES=64 -DDTB_SIZE=4096 " \
"-DHAVE_MMX -DHAVE_SSE -DHAVE_SSE2 -DHAVE_SSE3 -DHAVE_SSE4_1 -DHAVE_SSE4_2 " \
"-DHAVE_SSE4A -DHAVE_MISALIGNSSE -DHAVE_128BITFPU -DHAVE_FASTMOVU -DHAVE_CFLUSH " \
"-DHAVE_AVX -DHAVE_FMA3"
#define LIBNAME "steamroller"
#define CORENAME "STEAMROLLER"
#endif
#if defined (FORCE_EXCAVATOR)
#define FORCE
#define FORCE_INTEL
#define ARCHITECTURE "X86"
#define SUBARCHITECTURE "EXCAVATOR"
#define ARCHCONFIG "-DEXCAVATOR " \
"-DL1_DATA_SIZE=16384 -DL1_DATA_LINESIZE=64 " \
"-DL2_SIZE=2097152 -DL2_LINESIZE=64 -DL3_SIZE=12582912 " \
"-DDTB_DEFAULT_ENTRIES=64 -DDTB_SIZE=4096 " \
"-DHAVE_MMX -DHAVE_SSE -DHAVE_SSE2 -DHAVE_SSE3 -DHAVE_SSE4_1 -DHAVE_SSE4_2 " \
"-DHAVE_SSE4A -DHAVE_MISALIGNSSE -DHAVE_128BITFPU -DHAVE_FASTMOVU -DHAVE_CFLUSH " \
"-DHAVE_AVX -DHAVE_FMA3"
#define LIBNAME "excavator"
#define CORENAME "EXCAVATOR"
#endif
#if defined (FORCE_ZEN)
#define FORCE
#define FORCE_INTEL
#define ARCHITECTURE "X86"
#ifdef NO_AVX2
#ifdef NO_AVX
#define SUBARCHITECTURE "NEHALEM"
#define ARCHCONFIG "-DNEHALEM " \
"-DL1_DATA_SIZE=32768 -DL1_DATA_LINESIZE=64 " \
"-DL2_SIZE=262144 -DL2_LINESIZE=64 " \
"-DDTB_DEFAULT_ENTRIES=64 -DDTB_SIZE=4096 " \
"-DHAVE_CMOV -DHAVE_MMX -DHAVE_SSE -DHAVE_SSE2 -DHAVE_SSE3 -DHAVE_SSSE3 -DHAVE_SSE4_1 -DHAVE_SSE4_2"
#define LIBNAME "nehalem"
#define CORENAME "NEHALEM"
#else
#define SUBARCHITECTURE "SANDYBRIDGE"
#define ARCHCONFIG "-DSANDYBRIDGE " \
"-DL1_DATA_SIZE=32768 -DL1_DATA_LINESIZE=64 " \
"-DL2_SIZE=262144 -DL2_LINESIZE=64 " \
"-DDTB_DEFAULT_ENTRIES=64 -DDTB_SIZE=4096 " \
"-DHAVE_CMOV -DHAVE_MMX -DHAVE_SSE -DHAVE_SSE2 -DHAVE_SSE3 -DHAVE_SSSE3 -DHAVE_SSE4_1 -DHAVE_SSE4_2 -DHAVE_AVX"
#define LIBNAME "sandybridge"
#define CORENAME "SANDYBRIDGE"
#endif
#else
#define SUBARCHITECTURE "ZEN"
#define ARCHCONFIG "-DZEN " \
"-DL1_CODE_SIZE=32768 -DL1_CODE_LINESIZE=64 -DL1_CODE_ASSOCIATIVE=8 " \
"-DL1_DATA_SIZE=32768 -DL1_DATA_LINESIZE=64 -DL2_CODE_ASSOCIATIVE=8 " \
"-DL2_SIZE=524288 -DL2_LINESIZE=64 -DL2_ASSOCIATIVE=8 " \
"-DL3_SIZE=16777216 -DL3_LINESIZE=64 -DL3_ASSOCIATIVE=8 " \
"-DITB_DEFAULT_ENTRIES=64 -DITB_SIZE=4096 " \
"-DDTB_DEFAULT_ENTRIES=64 -DDTB_SIZE=4096 " \
"-DHAVE_MMX -DHAVE_SSE -DHAVE_SSE2 -DHAVE_SSE3 -DHAVE_SSE4_1 -DHAVE_SSE4_2 " \
"-DHAVE_SSE4A -DHAVE_MISALIGNSSE -DHAVE_128BITFPU -DHAVE_FASTMOVU -DHAVE_CFLUSH " \
"-DHAVE_AVX -DHAVE_AVX2 -DHAVE_FMA3 -DFMA3"
#define LIBNAME "zen"
#define CORENAME "ZEN"
#endif
#endif
#ifdef FORCE_SSE_GENERIC
#define FORCE
#define FORCE_INTEL
#define ARCHITECTURE "X86"
#define SUBARCHITECTURE "GENERIC"
#define ARCHCONFIG "-DGENERIC " \
"-DL1_DATA_SIZE=16384 -DL1_DATA_LINESIZE=64 " \
"-DL2_SIZE=524288 -DL2_LINESIZE=64 " \
"-DDTB_DEFAULT_ENTRIES=64 -DDTB_SIZE=4096 -DL2_ASSOCIATIVE=8 " \
"-DHAVE_CMOV -DHAVE_MMX -DHAVE_SSE -DHAVE_SSE2"
#define LIBNAME "generic"
#define CORENAME "GENERIC"
#endif
#ifdef FORCE_VIAC3
#define FORCE
#define FORCE_INTEL
#define ARCHITECTURE "X86"
#define SUBARCHITECTURE "VIAC3"
#define ARCHCONFIG "-DVIAC3 " \
"-DL1_DATA_SIZE=65536 -DL1_DATA_LINESIZE=32 " \
"-DL2_SIZE=65536 -DL2_LINESIZE=32 " \
"-DDTB_DEFAULT_ENTRIES=128 -DDTB_SIZE=4096 " \
"-DHAVE_MMX -DHAVE_SSE "
#define LIBNAME "viac3"
#define CORENAME "VIAC3"
#endif
#ifdef FORCE_NANO
#define FORCE
#define FORCE_INTEL
#define ARCHITECTURE "X86"
#define SUBARCHITECTURE "NANO"
#define ARCHCONFIG "-DNANO " \
"-DL1_DATA_SIZE=65536 -DL1_DATA_LINESIZE=64 " \
"-DL2_SIZE=1048576 -DL2_LINESIZE=64 " \
"-DDTB_DEFAULT_ENTRIES=128 -DDTB_SIZE=4096 -DL2_ASSOCIATIVE=8 " \
"-DHAVE_CMOV -DHAVE_MMX -DHAVE_SSE -DHAVE_SSE2 -DHAVE_SSE3 -DHAVE_SSSE3"
#define LIBNAME "nano"
#define CORENAME "NANO"
#endif
#ifdef FORCE_POWER3
#define FORCE
#define ARCHITECTURE "POWER"
#define SUBARCHITECTURE "POWER3"
#define SUBDIRNAME "power"
#define ARCHCONFIG "-DPOWER3 " \
"-DL1_DATA_SIZE=65536 -DL1_DATA_LINESIZE=128 " \
"-DL2_SIZE=2097152 -DL2_LINESIZE=128 " \
"-DDTB_DEFAULT_ENTRIES=256 -DDTB_SIZE=4096 -DL2_ASSOCIATIVE=8 "
#define LIBNAME "power3"
#define CORENAME "POWER3"
#endif
#ifdef FORCE_POWER4
#define FORCE
#define ARCHITECTURE "POWER"
#define SUBARCHITECTURE "POWER4"
#define SUBDIRNAME "power"
#define ARCHCONFIG "-DPOWER4 " \
"-DL1_DATA_SIZE=32768 -DL1_DATA_LINESIZE=128 " \
"-DL2_SIZE=1509949 -DL2_LINESIZE=128 " \
"-DDTB_DEFAULT_ENTRIES=128 -DDTB_SIZE=4096 -DL2_ASSOCIATIVE=6 "
#define LIBNAME "power4"
#define CORENAME "POWER4"
#endif
#ifdef FORCE_POWER5
#define FORCE
#define ARCHITECTURE "POWER"
#define SUBARCHITECTURE "POWER5"
#define SUBDIRNAME "power"
#define ARCHCONFIG "-DPOWER5 " \
"-DL1_DATA_SIZE=32768 -DL1_DATA_LINESIZE=128 " \
"-DL2_SIZE=1509949 -DL2_LINESIZE=128 " \
"-DDTB_DEFAULT_ENTRIES=128 -DDTB_SIZE=4096 -DL2_ASSOCIATIVE=6 "
#define LIBNAME "power5"
#define CORENAME "POWER5"
#endif
#if defined(FORCE_POWER6) || defined(FORCE_POWER7)
#define FORCE
#define ARCHITECTURE "POWER"
#define SUBARCHITECTURE "POWER6"
#define SUBDIRNAME "power"
#define ARCHCONFIG "-DPOWER6 " \
"-DL1_DATA_SIZE=65536 -DL1_DATA_LINESIZE=128 " \
"-DL2_SIZE=4194304 -DL2_LINESIZE=128 " \
"-DDTB_DEFAULT_ENTRIES=128 -DDTB_SIZE=4096 -DL2_ASSOCIATIVE=8 "
#define LIBNAME "power6"
#define CORENAME "POWER6"
#endif
#if defined(FORCE_POWER8)
#define FORCE
#define ARCHITECTURE "POWER"
#define SUBARCHITECTURE "POWER8"
#define SUBDIRNAME "power"
#define ARCHCONFIG "-DPOWER8 " \
"-DL1_DATA_SIZE=65536 -DL1_DATA_LINESIZE=128 " \
"-DL2_SIZE=4194304 -DL2_LINESIZE=128 " \
"-DDTB_DEFAULT_ENTRIES=128 -DDTB_SIZE=4096 -DL2_ASSOCIATIVE=8 "
#define LIBNAME "power8"
#define CORENAME "POWER8"
#endif
#if defined(FORCE_POWER9)
#define FORCE
#define ARCHITECTURE "POWER"
#define SUBARCHITECTURE "POWER9"
#define SUBDIRNAME "power"
#define ARCHCONFIG "-DPOWER9 " \
"-DL1_DATA_SIZE=32768 -DL1_DATA_LINESIZE=128 " \
"-DL2_SIZE=4194304 -DL2_LINESIZE=128 " \
"-DDTB_DEFAULT_ENTRIES=128 -DDTB_SIZE=4096 -DL2_ASSOCIATIVE=8 "
#define LIBNAME "power9"
#define CORENAME "POWER9"
#endif
#if defined(FORCE_POWER10)
#define FORCE
#define ARCHITECTURE "POWER"
#define SUBARCHITECTURE "POWER10"
#define SUBDIRNAME "power"
#define ARCHCONFIG "-DPOWER10 " \
"-DL1_DATA_SIZE=32768 -DL1_DATA_LINESIZE=128 " \
"-DL2_SIZE=4194304 -DL2_LINESIZE=128 " \
"-DDTB_DEFAULT_ENTRIES=128 -DDTB_SIZE=4096 -DL2_ASSOCIATIVE=8 "
#define LIBNAME "power10"
#define CORENAME "POWER10"
#endif
#ifdef FORCE_PPCG4
#define FORCE
#define ARCHITECTURE "POWER"
#define SUBARCHITECTURE "PPCG4"
#define SUBDIRNAME "power"
#define ARCHCONFIG "-DPPCG4 " \
"-DL1_DATA_SIZE=32768 -DL1_DATA_LINESIZE=32 " \
"-DL2_SIZE=262144 -DL2_LINESIZE=32 " \
"-DDTB_DEFAULT_ENTRIES=128 -DDTB_SIZE=4096 -DL2_ASSOCIATIVE=8 "
#define LIBNAME "ppcg4"
#define CORENAME "PPCG4"
#endif
#ifdef FORCE_PPC970
#define FORCE
#define ARCHITECTURE "POWER"
#define SUBARCHITECTURE "PPC970"
#define SUBDIRNAME "power"
#define ARCHCONFIG "-DPPC970 " \
"-DL1_DATA_SIZE=32768 -DL1_DATA_LINESIZE=128 " \
"-DL2_SIZE=512488 -DL2_LINESIZE=128 " \
"-DDTB_DEFAULT_ENTRIES=128 -DDTB_SIZE=4096 -DL2_ASSOCIATIVE=8 "
#define LIBNAME "ppc970"
#define CORENAME "PPC970"
#endif
#ifdef FORCE_PPC970MP
#define FORCE
#define ARCHITECTURE "POWER"
#define SUBARCHITECTURE "PPC970"
#define SUBDIRNAME "power"
#define ARCHCONFIG "-DPPC970 " \
"-DL1_DATA_SIZE=32768 -DL1_DATA_LINESIZE=128 " \
"-DL2_SIZE=1024976 -DL2_LINESIZE=128 " \
"-DDTB_DEFAULT_ENTRIES=128 -DDTB_SIZE=4096 -DL2_ASSOCIATIVE=8 "
#define LIBNAME "ppc970mp"
#define CORENAME "PPC970"
#endif
#ifdef FORCE_PPC440
#define FORCE
#define ARCHITECTURE "POWER"
#define SUBARCHITECTURE "PPC440"
#define SUBDIRNAME "power"
#define ARCHCONFIG "-DPPC440 " \
"-DL1_DATA_SIZE=32768 -DL1_DATA_LINESIZE=32 " \
"-DL2_SIZE=16384 -DL2_LINESIZE=128 " \
"-DDTB_DEFAULT_ENTRIES=64 -DDTB_SIZE=4096 -DL2_ASSOCIATIVE=16 "
#define LIBNAME "ppc440"
#define CORENAME "PPC440"
#endif
#ifdef FORCE_PPC440FP2
#define FORCE
#define ARCHITECTURE "POWER"
#define SUBARCHITECTURE "PPC440FP2"
#define SUBDIRNAME "power"
#define ARCHCONFIG "-DPPC440FP2 " \
"-DL1_DATA_SIZE=32768 -DL1_DATA_LINESIZE=32 " \
"-DL2_SIZE=16384 -DL2_LINESIZE=128 " \
"-DDTB_DEFAULT_ENTRIES=64 -DDTB_SIZE=4096 -DL2_ASSOCIATIVE=16 "
#define LIBNAME "ppc440FP2"
#define CORENAME "PPC440FP2"
#endif
#ifdef FORCE_CELL
#define FORCE
#define ARCHITECTURE "POWER"
#define SUBARCHITECTURE "CELL"
#define SUBDIRNAME "power"
#define ARCHCONFIG "-DCELL " \
"-DL1_DATA_SIZE=262144 -DL1_DATA_LINESIZE=128 " \
"-DL2_SIZE=512488 -DL2_LINESIZE=128 " \
"-DDTB_DEFAULT_ENTRIES=128 -DDTB_SIZE=4096 -DL2_ASSOCIATIVE=8 "
#define LIBNAME "cell"
#define CORENAME "CELL"
#endif
#ifdef FORCE_MIPS64_GENERIC
#define FORCE
#define ARCHITECTURE "MIPS"
#define SUBARCHITECTURE "MIPS64_GENERIC"
#define SUBDIRNAME "mips64"
#define ARCHCONFIG "-DMIPS64_GENERIC " \
"-DL1_DATA_SIZE=65536 -DL1_DATA_LINESIZE=32 " \
"-DL2_SIZE=1048576 -DL2_LINESIZE=32 " \
"-DDTB_DEFAULT_ENTRIES=64 -DDTB_SIZE=4096 -DL2_ASSOCIATIVE=8 "
#define LIBNAME "mips64_generic"
#define CORENAME "MIPS64_GENERIC"
#else
#endif
#ifdef FORCE_SICORTEX
#define FORCE
#define ARCHITECTURE "MIPS"
#define SUBARCHITECTURE "SICORTEX"
#define SUBDIRNAME "mips"
#define ARCHCONFIG "-DSICORTEX " \
"-DL1_DATA_SIZE=32768 -DL1_DATA_LINESIZE=32 " \
"-DL2_SIZE=512488 -DL2_LINESIZE=32 " \
"-DDTB_DEFAULT_ENTRIES=32 -DDTB_SIZE=4096 -DL2_ASSOCIATIVE=8 "
#define LIBNAME "mips"
#define CORENAME "sicortex"
#endif
#if defined FORCE_LOONGSON3R3 || defined FORCE_LOONGSON3A || defined FORCE_LOONGSON3B
#define FORCE
#define ARCHITECTURE "MIPS"
#define SUBARCHITECTURE "LOONGSON3R3"
#define SUBDIRNAME "mips64"
#define ARCHCONFIG "-DLOONGSON3R3 " \
"-DL1_DATA_SIZE=65536 -DL1_DATA_LINESIZE=32 " \
"-DL2_SIZE=512488 -DL2_LINESIZE=32 " \
"-DDTB_DEFAULT_ENTRIES=64 -DDTB_SIZE=4096 -DL2_ASSOCIATIVE=4 "
#define LIBNAME "loongson3r3"
#define CORENAME "LOONGSON3R3"
#else
#endif
#ifdef FORCE_LOONGSON3R4
#define FORCE
#define ARCHITECTURE "MIPS"
#define SUBARCHITECTURE "LOONGSON3R4"
#define SUBDIRNAME "mips64"
#define ARCHCONFIG "-DLOONGSON3R4 " \
"-DL1_DATA_SIZE=65536 -DL1_DATA_LINESIZE=32 " \
"-DL2_SIZE=512488 -DL2_LINESIZE=32 " \
"-DDTB_DEFAULT_ENTRIES=64 -DDTB_SIZE=4096 -DL2_ASSOCIATIVE=4 -DHAVE_MSA"
#define LIBNAME "loongson3r4"
#define CORENAME "LOONGSON3R4"
#else
#endif
#ifdef FORCE_LOONGSON3R5
#define FORCE
#define ARCHITECTURE "LOONGARCH"
#define SUBARCHITECTURE "LOONGSON3R5"
#define SUBDIRNAME "loongarch64"
#define ARCHCONFIG "-DLOONGSON3R5 " \
"-DL1_DATA_SIZE=65536 -DL1_DATA_LINESIZE=64 " \
"-DL2_SIZE=1048576 -DL2_LINESIZE=64 " \
"-DDTB_DEFAULT_ENTRIES=64 -DDTB_SIZE=4096 -DL2_ASSOCIATIVE=16 -DHAVE_MSA"
#define LIBNAME "loongson3r5"
#define CORENAME "LOONGSON3R5"
#else
#endif
#ifdef FORCE_LOONGSON2K1000
#define FORCE
#define ARCHITECTURE "LOONGARCH"
#define SUBARCHITECTURE "LOONGSON2K1000"
#define SUBDIRNAME "loongarch64"
#define ARCHCONFIG "-DLOONGSON2K1000 " \
"-DL1_DATA_SIZE=65536 -DL1_DATA_LINESIZE=64 " \
"-DL2_SIZE=262144 -DL2_LINESIZE=64 " \
"-DDTB_DEFAULT_ENTRIES=64 -DDTB_SIZE=4096 -DL2_ASSOCIATIVE=16 -DHAVE_MSA"
#define LIBNAME "loongson2k1000"
#define CORENAME "LOONGSON2K1000"
#else
#endif
#ifdef FORCE_LOONGSONGENERIC
#define FORCE
#define ARCHITECTURE "LOONGARCH"
#define SUBARCHITECTURE "LOONGSONGENERIC"
#define SUBDIRNAME "loongarch64"
#define ARCHCONFIG "-DLOONGSONGENERIC " \
"-DL1_DATA_SIZE=65536 -DL1_DATA_LINESIZE=64 " \
"-DL2_SIZE=262144 -DL2_LINESIZE=64 " \
"-DDTB_DEFAULT_ENTRIES=64 -DDTB_SIZE=4096 -DL2_ASSOCIATIVE=16 -DHAVE_MSA"
#define LIBNAME "loongsongeneric"
#define CORENAME "LOONGSONGENERIC"
#else
#endif
#ifdef FORCE_I6400
#define FORCE
#define ARCHITECTURE "MIPS"
#define SUBARCHITECTURE "I6400"
#define SUBDIRNAME "mips64"
#define ARCHCONFIG "-DI6400 " \
"-DL1_DATA_SIZE=65536 -DL1_DATA_LINESIZE=32 " \
"-DL2_SIZE=1048576 -DL2_LINESIZE=32 " \
"-DDTB_DEFAULT_ENTRIES=64 -DDTB_SIZE=4096 -DL2_ASSOCIATIVE=8 -DHAVE_MSA "
#define LIBNAME "i6400"
#define CORENAME "I6400"
#else
#endif
#ifdef FORCE_P6600
#define FORCE
#define ARCHITECTURE "MIPS"
#define SUBARCHITECTURE "P6600"
#define SUBDIRNAME "mips64"
#define ARCHCONFIG "-DP6600 " \
"-DL1_DATA_SIZE=65536 -DL1_DATA_LINESIZE=32 " \
"-DL2_SIZE=1048576 -DL2_LINESIZE=32 " \
"-DDTB_DEFAULT_ENTRIES=64 -DDTB_SIZE=4096 -DL2_ASSOCIATIVE=8 "
#define LIBNAME "p6600"
#define CORENAME "P6600"
#else
#endif
#ifdef FORCE_P5600
#define FORCE
#define ARCHITECTURE "MIPS"
#define SUBARCHITECTURE "P5600"
#define SUBDIRNAME "mips"
#define ARCHCONFIG "-DP5600 " \
"-DL1_DATA_SIZE=65536 -DL1_DATA_LINESIZE=32 " \
"-DL2_SIZE=1048576 -DL2_LINESIZE=32 " \
"-DDTB_DEFAULT_ENTRIES=64 -DDTB_SIZE=4096 -DL2_ASSOCIATIVE=8"
#define LIBNAME "p5600"
#define CORENAME "P5600"
#else
#endif
#ifdef FORCE_MIPS1004K
#define FORCE
#define ARCHITECTURE "MIPS"
#define SUBARCHITECTURE "MIPS1004K"
#define SUBDIRNAME "mips"
#define ARCHCONFIG "-DMIPS1004K " \
"-DL1_DATA_SIZE=32768 -DL1_DATA_LINESIZE=32 " \
"-DL2_SIZE=262144 -DL2_LINESIZE=32 " \
"-DDTB_DEFAULT_ENTRIES=64 -DDTB_SIZE=4096 -DL2_ASSOCIATIVE=8"
#define LIBNAME "mips1004K"
#define CORENAME "MIPS1004K"
#else
#endif
#ifdef FORCE_MIPS24K
#define FORCE
#define ARCHITECTURE "MIPS"
#define SUBARCHITECTURE "MIPS24K"
#define SUBDIRNAME "mips"
#define ARCHCONFIG "-DMIPS24K " \
"-DL1_DATA_SIZE=32768 -DL1_DATA_LINESIZE=32 " \
"-DL2_SIZE=32768 -DL2_LINESIZE=32 " \
"-DDTB_DEFAULT_ENTRIES=64 -DDTB_SIZE=4096 -DL2_ASSOCIATIVE=8"
#define LIBNAME "mips24K"
#define CORENAME "MIPS24K"
#else
#endif
#ifdef FORCE_I6500
#define FORCE
#define ARCHITECTURE "MIPS"
#define SUBARCHITECTURE "I6500"
#define SUBDIRNAME "mips64"
#define ARCHCONFIG "-DI6500 " \
"-DL1_DATA_SIZE=65536 -DL1_DATA_LINESIZE=32 " \
"-DL2_SIZE=1048576 -DL2_LINESIZE=32 " \
"-DDTB_DEFAULT_ENTRIES=64 -DDTB_SIZE=4096 -DL2_ASSOCIATIVE=8 -DHAVE_MSA"
#define LIBNAME "i6500"
#define CORENAME "I6500"
#else
#endif
#ifdef FORCE_ITANIUM2
#define FORCE
#define ARCHITECTURE "IA64"
#define SUBARCHITECTURE "ITANIUM2"
#define SUBDIRNAME "ia64"
#define ARCHCONFIG "-DITANIUM2 " \
"-DL1_DATA_SIZE=262144 -DL1_DATA_LINESIZE=128 " \
"-DL2_SIZE=1572864 -DL2_LINESIZE=128 -DDTB_SIZE=16384 -DDTB_DEFAULT_ENTRIES=128 "
#define LIBNAME "itanium2"
#define CORENAME "itanium2"
#endif
#ifdef FORCE_SPARC
#define FORCE
#define ARCHITECTURE "SPARC"
#define SUBARCHITECTURE "SPARC"
#define SUBDIRNAME "sparc"
#define ARCHCONFIG "-DSPARC -DV9 " \
"-DL1_DATA_SIZE=65536 -DL1_DATA_LINESIZE=64 " \
"-DL2_SIZE=1572864 -DL2_LINESIZE=64 -DDTB_SIZE=8192 -DDTB_DEFAULT_ENTRIES=64 "
#define LIBNAME "sparc"
#define CORENAME "sparc"
#endif
#ifdef FORCE_SPARCV7
#define FORCE
#define ARCHITECTURE "SPARC"
#define SUBARCHITECTURE "SPARC"
#define SUBDIRNAME "sparc"
#define ARCHCONFIG "-DSPARC -DV7 " \
"-DL1_DATA_SIZE=65536 -DL1_DATA_LINESIZE=64 " \
"-DL2_SIZE=1572864 -DL2_LINESIZE=64 -DDTB_SIZE=8192 -DDTB_DEFAULT_ENTRIES=64 "
#define LIBNAME "sparcv7"
#define CORENAME "sparcv7"
#endif
#ifdef FORCE_GENERIC
#define FORCE
#define ARCHITECTURE "GENERIC"
#define SUBARCHITECTURE "GENERIC"
#define SUBDIRNAME "generic"
#define ARCHCONFIG "-DGENERIC " \
"-DL1_DATA_SIZE=32768 -DL1_DATA_LINESIZE=128 " \
"-DL2_SIZE=512488 -DL2_LINESIZE=128 " \
"-DDTB_DEFAULT_ENTRIES=128 -DDTB_SIZE=4096 -DL2_ASSOCIATIVE=8 "
#define LIBNAME "generic"
#define CORENAME "generic"
#endif
#ifdef FORCE_ARMV7
#define FORCE
#define ARCHITECTURE "ARM"
#define SUBARCHITECTURE "ARMV7"
#define SUBDIRNAME "arm"
#define ARCHCONFIG "-DARMV7 " \
"-DL1_DATA_SIZE=65536 -DL1_DATA_LINESIZE=32 " \
"-DL2_SIZE=512488 -DL2_LINESIZE=32 " \
"-DDTB_DEFAULT_ENTRIES=64 -DDTB_SIZE=4096 -DL2_ASSOCIATIVE=4 " \
"-DHAVE_VFPV3 -DHAVE_VFP"
#define LIBNAME "armv7"
#define CORENAME "ARMV7"
#else
#endif
#ifdef FORCE_CORTEXA9
#define FORCE
#define ARCHITECTURE "ARM"
#define SUBARCHITECTURE "CORTEXA9"
#define SUBDIRNAME "arm"
#define ARCHCONFIG "-DCORTEXA9 -DARMV7 " \
"-DL1_DATA_SIZE=32768 -DL1_DATA_LINESIZE=32 " \
"-DL2_SIZE=1048576 -DL2_LINESIZE=32 " \
"-DDTB_DEFAULT_ENTRIES=128 -DDTB_SIZE=4096 -DL2_ASSOCIATIVE=4 " \
"-DHAVE_VFPV3 -DHAVE_VFP -DHAVE_NEON"
#define LIBNAME "cortexa9"
#define CORENAME "CORTEXA9"
#else
#endif
#ifdef FORCE_RISCV64_GENERIC
#define FORCE
#define ARCHITECTURE "RISCV64"
#define SUBARCHITECTURE "RISCV64_GENERIC"
#define SUBDIRNAME "riscv64"
#define ARCHCONFIG "-DRISCV64_GENERIC " \
"-DL1_DATA_SIZE=32768 -DL1_DATA_LINESIZE=32 " \
"-DL2_SIZE=1048576 -DL2_LINESIZE=32 " \
"-DDTB_DEFAULT_ENTRIES=128 -DDTB_SIZE=4096 -DL2_ASSOCIATIVE=4 "
#define LIBNAME "riscv64_generic"
#define CORENAME "RISCV64_GENERIC"
#else
#endif
#ifdef FORCE_CORTEXA15
#define FORCE
#define ARCHITECTURE "ARM"
#define SUBARCHITECTURE "CORTEXA15"
#define SUBDIRNAME "arm"
#define ARCHCONFIG "-DCORTEXA15 -DARMV7 " \
"-DL1_DATA_SIZE=32768 -DL1_DATA_LINESIZE=32 " \
"-DL2_SIZE=1048576 -DL2_LINESIZE=32 " \
"-DDTB_DEFAULT_ENTRIES=128 -DDTB_SIZE=4096 -DL2_ASSOCIATIVE=4 " \
"-DHAVE_VFPV3 -DHAVE_VFP -DHAVE_NEON"
#define LIBNAME "cortexa15"
#define CORENAME "CORTEXA15"
#else
#endif
#ifdef FORCE_ARMV6
#define FORCE
#define ARCHITECTURE "ARM"
#define SUBARCHITECTURE "ARMV6"
#define SUBDIRNAME "arm"
#define ARCHCONFIG "-DARMV6 " \
"-DL1_DATA_SIZE=65536 -DL1_DATA_LINESIZE=32 " \
"-DL2_SIZE=512488 -DL2_LINESIZE=32 " \
"-DDTB_DEFAULT_ENTRIES=64 -DDTB_SIZE=4096 -DL2_ASSOCIATIVE=4 " \
"-DHAVE_VFP"
#define LIBNAME "armv6"
#define CORENAME "ARMV6"
#else
#endif
#ifdef FORCE_ARMV5
#define FORCE
#define ARCHITECTURE "ARM"
#define SUBARCHITECTURE "ARMV5"
#define SUBDIRNAME "arm"
#define ARCHCONFIG "-DARMV5 " \
"-DL1_DATA_SIZE=65536 -DL1_DATA_LINESIZE=32 " \
"-DL2_SIZE=512488 -DL2_LINESIZE=32 " \
"-DDTB_DEFAULT_ENTRIES=64 -DDTB_SIZE=4096 -DL2_ASSOCIATIVE=4 "
#define LIBNAME "armv5"
#define CORENAME "ARMV5"
#else
#endif
#ifdef FORCE_ARMV8SVE
#define FORCE
#define ARCHITECTURE "ARM64"
#define SUBARCHITECTURE "ARMV8SVE"
#define SUBDIRNAME "arm64"
#define ARCHCONFIG "-DARMV8SVE " \
"-DL1_DATA_SIZE=32768 -DL1_DATA_LINESIZE=64 " \
"-DL2_SIZE=262144 -DL2_LINESIZE=64 " \
"-DDTB_DEFAULT_ENTRIES=64 -DDTB_SIZE=4096 -DL2_ASSOCIATIVE=32 " \
"-DHAVE_VFPV4 -DHAVE_VFPV3 -DHAVE_VFP -DHAVE_NEON -DHAVE_SVE -DARMV8"
#define LIBNAME "armv8sve"
#define CORENAME "ARMV8SVE"
#endif
#ifdef FORCE_ARMV8
#define FORCE
#define ARCHITECTURE "ARM64"
#define SUBARCHITECTURE "ARMV8"
#define SUBDIRNAME "arm64"
#define ARCHCONFIG "-DARMV8 " \
"-DL1_DATA_SIZE=32768 -DL1_DATA_LINESIZE=64 " \
"-DL2_SIZE=262144 -DL2_LINESIZE=64 " \
"-DDTB_DEFAULT_ENTRIES=64 -DDTB_SIZE=4096 -DL2_ASSOCIATIVE=32 " \
"-DHAVE_VFPV4 -DHAVE_VFPV3 -DHAVE_VFP -DHAVE_NEON -DARMV8"
#define LIBNAME "armv8"
#define CORENAME "ARMV8"
#endif
#ifdef FORCE_CORTEXA53
#define FORCE
#define ARCHITECTURE "ARM64"
#define SUBARCHITECTURE "CORTEXA53"
#define SUBDIRNAME "arm64"
#define ARCHCONFIG "-DCORTEXA53 " \
"-DL1_CODE_SIZE=32768 -DL1_CODE_LINESIZE=64 -DL1_CODE_ASSOCIATIVE=3 " \
"-DL1_DATA_SIZE=32768 -DL1_DATA_LINESIZE=64 -DL1_DATA_ASSOCIATIVE=2 " \
"-DL2_SIZE=262144 -DL2_LINESIZE=64 -DL2_ASSOCIATIVE=16 " \
"-DDTB_DEFAULT_ENTRIES=64 -DDTB_SIZE=4096 " \
"-DHAVE_VFPV4 -DHAVE_VFPV3 -DHAVE_VFP -DHAVE_NEON -DARMV8"
#define LIBNAME "cortexa53"
#define CORENAME "CORTEXA53"
#endif
#ifdef FORCE_CORTEXA57
#define FORCE
#define ARCHITECTURE "ARM64"
#define SUBARCHITECTURE "CORTEXA57"
#define SUBDIRNAME "arm64"
#define ARCHCONFIG "-DCORTEXA57 " \
"-DL1_CODE_SIZE=49152 -DL1_CODE_LINESIZE=64 -DL1_CODE_ASSOCIATIVE=3 " \
"-DL1_DATA_SIZE=32768 -DL1_DATA_LINESIZE=64 -DL1_DATA_ASSOCIATIVE=2 " \
"-DL2_SIZE=2097152 -DL2_LINESIZE=64 -DL2_ASSOCIATIVE=16 " \
"-DDTB_DEFAULT_ENTRIES=64 -DDTB_SIZE=4096 " \
"-DHAVE_VFPV4 -DHAVE_VFPV3 -DHAVE_VFP -DHAVE_NEON -DARMV8"
#define LIBNAME "cortexa57"
#define CORENAME "CORTEXA57"
#endif
#ifdef FORCE_CORTEXA72
#define FORCE
#define ARCHITECTURE "ARM64"
#define SUBARCHITECTURE "CORTEXA72"
#define SUBDIRNAME "arm64"
#define ARCHCONFIG "-DCORTEXA72 " \
"-DL1_CODE_SIZE=49152 -DL1_CODE_LINESIZE=64 -DL1_CODE_ASSOCIATIVE=3 " \
"-DL1_DATA_SIZE=32768 -DL1_DATA_LINESIZE=64 -DL1_DATA_ASSOCIATIVE=2 " \
"-DL2_SIZE=2097152 -DL2_LINESIZE=64 -DL2_ASSOCIATIVE=16 " \
"-DDTB_DEFAULT_ENTRIES=64 -DDTB_SIZE=4096 " \
"-DHAVE_VFPV4 -DHAVE_VFPV3 -DHAVE_VFP -DHAVE_NEON -DARMV8"
#define LIBNAME "cortexa72"
#define CORENAME "CORTEXA72"
#endif
#ifdef FORCE_CORTEXA73
#define FORCE
#define ARCHITECTURE "ARM64"
#define SUBARCHITECTURE "CORTEXA73"
#define SUBDIRNAME "arm64"
#define ARCHCONFIG "-DCORTEXA73 " \
"-DL1_CODE_SIZE=49152 -DL1_CODE_LINESIZE=64 -DL1_CODE_ASSOCIATIVE=3 " \
"-DL1_DATA_SIZE=32768 -DL1_DATA_LINESIZE=64 -DL1_DATA_ASSOCIATIVE=2 " \
"-DL2_SIZE=2097152 -DL2_LINESIZE=64 -DL2_ASSOCIATIVE=16 " \
"-DDTB_DEFAULT_ENTRIES=64 -DDTB_SIZE=4096 " \
"-DHAVE_VFPV4 -DHAVE_VFPV3 -DHAVE_VFP -DHAVE_NEON -DARMV8"
#define LIBNAME "cortexa73"
#define CORENAME "CORTEXA73"
#endif
#ifdef FORCE_CORTEXX1
#define FORCE
#define ARCHITECTURE "ARM64"
#define SUBARCHITECTURE "CORTEXX1"
#define SUBDIRNAME "arm64"
#define ARCHCONFIG "-DCORTEXX1 " \
"-DL1_DATA_SIZE=32768 -DL1_DATA_LINESIZE=64 " \
"-DL2_SIZE=262144 -DL2_LINESIZE=64 " \
"-DDTB_DEFAULT_ENTRIES=64 -DDTB_SIZE=4096 -DL2_ASSOCIATIVE=32 " \
"-DHAVE_VFPV4 -DHAVE_VFPV3 -DHAVE_VFP -DHAVE_NEON -DARMV8"
#define LIBNAME "cortexx1"
#define CORENAME "CORTEXX1"
#endif
#ifdef FORCE_CORTEXX2
#define FORCE
#define ARCHITECTURE "ARM64"
#define SUBARCHITECTURE "CORTEXX2"
#define SUBDIRNAME "arm64"
#define ARCHCONFIG "-DCORTEXX2 " \
"-DL1_DATA_SIZE=32768 -DL1_DATA_LINESIZE=64 " \
"-DL2_SIZE=262144 -DL2_LINESIZE=64 " \
"-DDTB_DEFAULT_ENTRIES=64 -DDTB_SIZE=4096 -DL2_ASSOCIATIVE=32 " \
"-DHAVE_VFPV4 -DHAVE_VFPV3 -DHAVE_VFP -DHAVE_NEON -DHAVE_SVE -DARMV8 -DARMV9"
#define LIBNAME "cortexx2"
#define CORENAME "CORTEXX2"
#endif
#ifdef FORCE_CORTEXA510
#define FORCE
#define ARCHITECTURE "ARM64"
#define SUBARCHITECTURE "CORTEXA510"
#define SUBDIRNAME "arm64"
#define ARCHCONFIG "-DCORTEXA510 " \
"-DL1_DATA_SIZE=32768 -DL1_DATA_LINESIZE=64 " \
"-DL2_SIZE=262144 -DL2_LINESIZE=64 " \
"-DDTB_DEFAULT_ENTRIES=64 -DDTB_SIZE=4096 -DL2_ASSOCIATIVE=32 " \
"-DHAVE_VFPV4 -DHAVE_VFPV3 -DHAVE_VFP -DHAVE_NEON -DHAVE_SVE -DARMV8 -DARMV9"
#define LIBNAME "cortexa510"
#define CORENAME "CORTEXA510"
#endif
#ifdef FORCE_CORTEXA710
#define FORCE
#define ARCHITECTURE "ARM64"
#define SUBARCHITECTURE "CORTEXA710"
#define SUBDIRNAME "arm64"
#define ARCHCONFIG "-DCORTEXA710 " \
"-DL1_DATA_SIZE=32768 -DL1_DATA_LINESIZE=64 " \
"-DL2_SIZE=262144 -DL2_LINESIZE=64 " \
"-DDTB_DEFAULT_ENTRIES=64 -DDTB_SIZE=4096 -DL2_ASSOCIATIVE=32 " \
"-DHAVE_VFPV4 -DHAVE_VFPV3 -DHAVE_VFP -DHAVE_NEON -DHAVE_SVE -DARMV8 -DARMV9"
#define LIBNAME "cortexa710"
#define CORENAME "CORTEXA710"
#endif
#ifdef FORCE_NEOVERSEN1
#define FORCE
#define ARCHITECTURE "ARM64"
#define SUBARCHITECTURE "NEOVERSEN1"
#define SUBDIRNAME "arm64"
#define ARCHCONFIG "-DNEOVERSEN1 " \
"-DL1_CODE_SIZE=65536 -DL1_CODE_LINESIZE=64 -DL1_CODE_ASSOCIATIVE=4 " \
"-DL1_DATA_SIZE=65536 -DL1_DATA_LINESIZE=64 -DL1_DATA_ASSOCIATIVE=4 " \
"-DL2_SIZE=1048576 -DL2_LINESIZE=64 -DL2_ASSOCIATIVE=16 " \
"-DDTB_DEFAULT_ENTRIES=64 -DDTB_SIZE=4096 " \
"-DHAVE_VFPV4 -DHAVE_VFPV3 -DHAVE_VFP -DHAVE_NEON -DARMV8 " \
"-march=armv8.2-a -mtune=neoverse-n1"
#define LIBNAME "neoversen1"
#define CORENAME "NEOVERSEN1"
#endif
#ifdef FORCE_NEOVERSEV1
#define FORCE
#define ARCHITECTURE "ARM64"
#define SUBARCHITECTURE "NEOVERSEV1"
#define SUBDIRNAME "arm64"
#define ARCHCONFIG "-DNEOVERSEV1 " \
"-DL1_CODE_SIZE=65536 -DL1_CODE_LINESIZE=64 -DL1_CODE_ASSOCIATIVE=4 " \
"-DL1_DATA_SIZE=65536 -DL1_DATA_LINESIZE=64 -DL1_DATA_ASSOCIATIVE=4 " \
"-DL2_SIZE=1048576 -DL2_LINESIZE=64 -DL2_ASSOCIATIVE=16 " \
"-DDTB_DEFAULT_ENTRIES=64 -DDTB_SIZE=4096 " \
"-DHAVE_VFPV4 -DHAVE_VFPV3 -DHAVE_VFP -DHAVE_NEON -DHAVE_SVE -DARMV8 " \
"-march=armv8.4-a+sve -mtune=neoverse-v1"
#define LIBNAME "neoversev1"
#define CORENAME "NEOVERSEV1"
#endif
#ifdef FORCE_NEOVERSEN2
#define FORCE
#define ARCHITECTURE "ARM64"
#define SUBARCHITECTURE "NEOVERSEN2"
#define SUBDIRNAME "arm64"
#define ARCHCONFIG "-DNEOVERSEN2 " \
"-DL1_CODE_SIZE=65536 -DL1_CODE_LINESIZE=64 -DL1_CODE_ASSOCIATIVE=4 " \
"-DL1_DATA_SIZE=65536 -DL1_DATA_LINESIZE=64 -DL1_DATA_ASSOCIATIVE=4 " \
"-DL2_SIZE=1048576 -DL2_LINESIZE=64 -DL2_ASSOCIATIVE=16 " \
"-DDTB_DEFAULT_ENTRIES=64 -DDTB_SIZE=4096 " \
"-DHAVE_VFPV4 -DHAVE_VFPV3 -DHAVE_VFP -DHAVE_NEON -DHAVE_SVE -DARMV8 " \
"-march=armv8.5-a -mtune=neoverse-n2"
#define LIBNAME "neoversen2"
#define CORENAME "NEOVERSEN2"
#endif
#ifdef FORCE_CORTEXA55
#define FORCE
#define ARCHITECTURE "ARM64"
#define SUBARCHITECTURE "CORTEXA55"
#define SUBDIRNAME "arm64"
#define ARCHCONFIG "-DCORTEXA55 " \
"-DL1_CODE_SIZE=16384 -DL1_CODE_LINESIZE=64 -DL1_CODE_ASSOCIATIVE=3 " \
"-DL1_DATA_SIZE=16384 -DL1_DATA_LINESIZE=64 -DL1_DATA_ASSOCIATIVE=2 " \
"-DL2_SIZE=65536 -DL2_LINESIZE=64 -DL2_ASSOCIATIVE=16 " \
"-DDTB_DEFAULT_ENTRIES=64 -DDTB_SIZE=4096 " \
"-DHAVE_VFPV4 -DHAVE_VFPV3 -DHAVE_VFP -DHAVE_NEON -DARMV8"
#define LIBNAME "cortexa55"
#define CORENAME "CORTEXA55"
#endif
#ifdef FORCE_FALKOR
#define FORCE
#define ARCHITECTURE "ARM64"
#define SUBARCHITECTURE "FALKOR"
#define SUBDIRNAME "arm64"
#define ARCHCONFIG "-DFALKOR " \
"-DL1_CODE_SIZE=49152 -DL1_CODE_LINESIZE=64 -DL1_CODE_ASSOCIATIVE=3 " \
"-DL1_DATA_SIZE=32768 -DL1_DATA_LINESIZE=64 -DL1_DATA_ASSOCIATIVE=2 " \
"-DL2_SIZE=2097152 -DL2_LINESIZE=64 -DL2_ASSOCIATIVE=16 " \
"-DDTB_DEFAULT_ENTRIES=64 -DDTB_SIZE=4096 " \
"-DHAVE_VFPV4 -DHAVE_VFPV3 -DHAVE_VFP -DHAVE_NEON -DARMV8"
#define LIBNAME "falkor"
#define CORENAME "FALKOR"
#endif
#ifdef FORCE_THUNDERX
#define FORCE
#define ARCHITECTURE "ARM64"
#define SUBARCHITECTURE "THUNDERX"
#define SUBDIRNAME "arm64"
#define ARCHCONFIG "-DTHUNDERX " \
"-DL1_DATA_SIZE=32768 -DL1_DATA_LINESIZE=128 " \
"-DL2_SIZE=16777216 -DL2_LINESIZE=128 -DL2_ASSOCIATIVE=16 " \
"-DDTB_DEFAULT_ENTRIES=64 -DDTB_SIZE=4096 " \
"-DHAVE_VFPV4 -DHAVE_VFPV3 -DHAVE_VFP -DHAVE_NEON -DARMV8"
#define LIBNAME "thunderx"
#define CORENAME "THUNDERX"
#endif
#ifdef FORCE_THUNDERX2T99
#define ARMV8
#define FORCE
#define ARCHITECTURE "ARM64"
#define SUBARCHITECTURE "THUNDERX2T99"
#define SUBDIRNAME "arm64"
#define ARCHCONFIG "-DTHUNDERX2T99 " \
"-DL1_CODE_SIZE=32768 -DL1_CODE_LINESIZE=64 -DL1_CODE_ASSOCIATIVE=8 " \
"-DL1_DATA_SIZE=32768 -DL1_DATA_LINESIZE=64 -DL1_DATA_ASSOCIATIVE=8 " \
"-DL2_SIZE=262144 -DL2_LINESIZE=64 -DL2_ASSOCIATIVE=8 " \
"-DL3_SIZE=33554432 -DL3_LINESIZE=64 -DL3_ASSOCIATIVE=32 " \
"-DDTB_DEFAULT_ENTRIES=64 -DDTB_SIZE=4096 " \
"-DHAVE_VFPV4 -DHAVE_VFPV3 -DHAVE_VFP -DHAVE_NEON -DARMV8"
#define LIBNAME "thunderx2t99"
#define CORENAME "THUNDERX2T99"
#endif
#ifdef FORCE_TSV110
#define FORCE
#define ARCHITECTURE "ARM64"
#define SUBARCHITECTURE "TSV110"
#define SUBDIRNAME "arm64"
#define ARCHCONFIG "-DTSV110 " \
"-DL1_CODE_SIZE=65536 -DL1_CODE_LINESIZE=64 -DL1_CODE_ASSOCIATIVE=4 " \
"-DL1_DATA_SIZE=65536 -DL1_DATA_LINESIZE=64 -DL1_DATA_ASSOCIATIVE=4 " \
"-DL2_SIZE=524288 -DL2_LINESIZE=64 -DL2_ASSOCIATIVE=8 " \
"-DDTB_DEFAULT_ENTRIES=64 -DDTB_SIZE=4096 " \
"-DHAVE_VFPV4 -DHAVE_VFPV3 -DHAVE_VFP -DHAVE_NEON -DARMV8"
#define LIBNAME "tsv110"
#define CORENAME "TSV110"
#endif
#ifdef FORCE_EMAG8180
#define ARMV8
#define FORCE
#define ARCHITECTURE "ARM64"
#define SUBARCHITECTURE "EMAG8180"
#define SUBDIRNAME "arm64"
#define ARCHCONFIG "-DEMAG8180 " \
"-DL1_CODE_SIZE=32768 -DL1_CODE_LINESIZE=64 -DL1_CODE_ASSOCIATIVE=8 " \
"-DL1_DATA_SIZE=32768 -DL1_DATA_LINESIZE=64 -DL1_DATA_ASSOCIATIVE=8 " \
"-DL2_SIZE=262144 -DL2_LINESIZE=64 -DL2_ASSOCIATIVE=8 " \
"-DL3_SIZE=33554432 -DL3_LINESIZE=64 -DL3_ASSOCIATIVE=32 " \
"-DDTB_DEFAULT_ENTRIES=64 -DDTB_SIZE=4096 " \
"-DHAVE_VFPV4 -DHAVE_VFPV3 -DHAVE_VFP -DHAVE_NEON -DARMV8"
#define LIBNAME "emag8180"
#define CORENAME "EMAG8180"
#endif
#ifdef FORCE_THUNDERX3T110
#define ARMV8
#define FORCE
#define ARCHITECTURE "ARM64"
#define SUBARCHITECTURE "THUNDERX3T110"
#define SUBDIRNAME "arm64"
#define ARCHCONFIG "-DTHUNDERX3T110 " \
"-DL1_CODE_SIZE=65536 -DL1_CODE_LINESIZE=64 -DL1_CODE_ASSOCIATIVE=8 " \
"-DL1_DATA_SIZE=32768 -DL1_DATA_LINESIZE=64 -DL1_DATA_ASSOCIATIVE=8 " \
"-DL2_SIZE=524288 -DL2_LINESIZE=64 -DL2_ASSOCIATIVE=8 " \
"-DL3_SIZE=94371840 -DL3_LINESIZE=64 -DL3_ASSOCIATIVE=32 " \
"-DDTB_DEFAULT_ENTRIES=64 -DDTB_SIZE=4096 " \
"-DHAVE_VFPV4 -DHAVE_VFPV3 -DHAVE_VFP -DHAVE_NEON -DARMV8"
#define LIBNAME "thunderx3t110"
#define CORENAME "THUNDERX3T110"
#endif
#ifdef FORCE_VORTEX
#define FORCE
#define ARCHITECTURE "ARM64"
#define SUBARCHITECTURE "VORTEX"
#define SUBDIRNAME "arm64"
#define ARCHCONFIG "-DVORTEX " \
"-DL1_DATA_SIZE=32768 -DL1_DATA_LINESIZE=64 " \
"-DL2_SIZE=262144 -DL2_LINESIZE=64 " \
"-DDTB_DEFAULT_ENTRIES=64 -DDTB_SIZE=4096 -DL2_ASSOCIATIVE=32 " \
"-DHAVE_VFPV4 -DHAVE_VFPV3 -DHAVE_VFP -DHAVE_NEON -DARMV8"
#define LIBNAME "vortex"
#define CORENAME "VORTEX"
#endif
#ifdef FORCE_A64FX
#define ARMV8
#define FORCE
#define ARCHITECTURE "ARM64"
#define SUBARCHITECTURE "A64FX"
#define SUBDIRNAME "arm64"
#define ARCHCONFIG "-DA64FX " \
"-DL1_CODE_SIZE=65536 -DL1_CODE_LINESIZE=256 -DL1_CODE_ASSOCIATIVE=8 " \
"-DL1_DATA_SIZE=32768 -DL1_DATA_LINESIZE=256 -DL1_DATA_ASSOCIATIVE=8 " \
"-DL2_SIZE=8388608 -DL2_LINESIZE=256 -DL2_ASSOCIATIVE=8 " \
"-DL3_SIZE=0 -DL3_LINESIZE=0 -DL3_ASSOCIATIVE=0 " \
"-DDTB_DEFAULT_ENTRIES=64 -DDTB_SIZE=4096 " \
"-DHAVE_VFPV4 -DHAVE_VFPV3 -DHAVE_VFP -DHAVE_NEON -DHAVE_SVE -DARMV8"
#define LIBNAME "a64fx"
#define CORENAME "A64FX"
#endif
#ifdef FORCE_FT2000
#define ARMV8
#define FORCE
#define ARCHITECTURE "ARM64"
#define SUBARCHITECTURE "FT2000"
#define SUBDIRNAME "arm64"
#define ARCHCONFIG "-DFT2000 " \
"-DL1_CODE_SIZE=32768 -DL1_CODE_LINESIZE=64 -DL1_CODE_ASSOCIATIVE=8 " \
"-DL1_DATA_SIZE=32768 -DL1_DATA_LINESIZE=64 -DL1_DATA_ASSOCIATIVE=8 " \
"-DL2_SIZE=33554426 -DL2_LINESIZE=64 -DL2_ASSOCIATIVE=8 " \
"-DDTB_DEFAULT_ENTRIES=64 -DDTB_SIZE=4096 " \
"-DHAVE_VFPV4 -DHAVE_VFPV3 -DHAVE_VFP -DHAVE_NEON -DARMV8"
#define LIBNAME "ft2000"
#define CORENAME "FT2000"
#endif
#ifdef FORCE_ZARCH_GENERIC
#define FORCE
#define ARCHITECTURE "ZARCH"
#define SUBARCHITECTURE "ZARCH_GENERIC"
#define ARCHCONFIG "-DZARCH_GENERIC " \
"-DDTB_DEFAULT_ENTRIES=64"
#define LIBNAME "zarch_generic"
#define CORENAME "ZARCH_GENERIC"
#endif
#ifdef FORCE_Z13
#define FORCE
#define ARCHITECTURE "ZARCH"
#define SUBARCHITECTURE "Z13"
#define ARCHCONFIG "-DZ13 " \
"-DDTB_DEFAULT_ENTRIES=64"
#define LIBNAME "z13"
#define CORENAME "Z13"
#endif
#ifdef FORCE_Z14
#define FORCE
#define ARCHITECTURE "ZARCH"
#define SUBARCHITECTURE "Z14"
#define ARCHCONFIG "-DZ14 " \
"-DDTB_DEFAULT_ENTRIES=64"
#define LIBNAME "z14"
#define CORENAME "Z14"
#endif
#ifdef FORCE_EV4
#define FORCE
#define ARCHITECTURE "ALPHA"
#define SUBARCHITECTURE "ev4"
#define ARCHCONFIG "-DEV4 " \
"-DL1_DATA_SIZE=16384 -DL1_DATA_LINESIZE=32 " \
"-DL2_SIZE=2097152 -DL2_LINESIZE=32 " \
"-DDTB_DEFAULT_ENTRIES=32 -DDTB_SIZE=8192 "
#define LIBNAME "ev4"
#define CORENAME "EV4"
#endif
#ifdef FORCE_EV5
#define FORCE
#define ARCHITECTURE "ALPHA"
#define SUBARCHITECTURE "ev5"
#define ARCHCONFIG "-DEV5 " \
"-DL1_DATA_SIZE=16384 -DL1_DATA_LINESIZE=32 " \
"-DL2_SIZE=2097152 -DL2_LINESIZE=64 " \
"-DDTB_DEFAULT_ENTRIES=64 -DDTB_SIZE=8192 "
#define LIBNAME "ev5"
#define CORENAME "EV5"
#endif
#ifdef FORCE_EV6
#define FORCE
#define ARCHITECTURE "ALPHA"
#define SUBARCHITECTURE "ev6"
#define ARCHCONFIG "-DEV6 " \
"-DL1_DATA_SIZE=32768 -DL1_DATA_LINESIZE=64 " \
"-DL2_SIZE=4194304 -DL2_LINESIZE=64 " \
"-DDTB_DEFAULT_ENTRIES=64 -DDTB_SIZE=8192 "
#define LIBNAME "ev6"
#define CORENAME "EV6"
#endif
#ifdef FORCE_C910V
#define FORCE
#define ARCHITECTURE "RISCV64"
#ifdef NO_RV64GV
#define SUBARCHITECTURE "RISCV64_GENERIC"
#define SUBDIRNAME "riscv64"
#define ARCHCONFIG "-DRISCV64_GENERIC " \
  "-DL1_DATA_SIZE=32768 -DL1_DATA_LINESIZE=32 " \
  "-DL2_SIZE=1048576 -DL2_LINESIZE=32 " \
  "-DDTB_DEFAULT_ENTRIES=128 -DDTB_SIZE=4096 -DL2_ASSOCIATIVE=4 "
#define LIBNAME "riscv64_generic"
#define CORENAME "RISCV64_GENERIC"
#else
#define SUBARCHITECTURE "C910V"
#define SUBDIRNAME "riscv64"
#define ARCHCONFIG "-DC910V " \
  "-DL1_DATA_SIZE=32768 -DL1_DATA_LINESIZE=32 " \
  "-DL2_SIZE=1048576 -DL2_LINESIZE=32 " \
  "-DDTB_DEFAULT_ENTRIES=128 -DDTB_SIZE=4096 -DL2_ASSOCIATIVE=4 "
#define LIBNAME "c910v"
#define CORENAME "C910V"
#endif
#else
#endif
#if defined(FORCE_E2K) || defined(__e2k__)
#define FORCE
#define ARCHITECTURE "E2K"
#define ARCHCONFIG "-DGENERIC " \
  "-DL1_DATA_SIZE=16384 -DL1_DATA_LINESIZE=64 " \
  "-DL2_SIZE=524288 -DL2_LINESIZE=64 " \
  "-DDTB_DEFAULT_ENTRIES=64 -DDTB_SIZE=4096 -DL2_ASSOCIATIVE=8 "
#define LIBNAME "generic"
#define CORENAME "generic"
#endif
#ifndef FORCE
#ifdef USER_TARGET
#error "The TARGET specified on the command line or in Makefile.rule is not supported. Please choose a target from TargetList.txt"
#endif
#if defined(__powerpc__) || defined(__powerpc) || defined(powerpc) || \
    defined(__PPC__) || defined(PPC) || defined(_POWER) || defined(__POWERPC__)
#ifndef POWER
#define POWER
#endif
#define OPENBLAS_SUPPORTED
#endif
#if defined(__zarch__) || defined(__s390x__)
#define ZARCH
#include "cpuid_zarch.c"
#define OPENBLAS_SUPPORTED
#endif
#ifdef INTEL_AMD
#include "cpuid_x86.c"
#define OPENBLAS_SUPPORTED
#endif
#ifdef __ia64__
#include "cpuid_ia64.c"
#define OPENBLAS_SUPPORTED
#endif
#ifdef __alpha
#include "cpuid_alpha.c"
#define OPENBLAS_SUPPORTED
#endif
#ifdef POWER
#include "cpuid_power.c"
#define OPENBLAS_SUPPORTED
#endif
#ifdef sparc
#include "cpuid_sparc.c"
#define OPENBLAS_SUPPORTED
#endif
#ifdef __mips__
#ifdef __mips64
#include "cpuid_mips64.c"
#else
#include "cpuid_mips.c"
#endif
#define OPENBLAS_SUPPORTED
#endif
#ifdef __loongarch64
#include "cpuid_loongarch64.c"
#define OPENBLAS_SUPPORTED
#endif
#ifdef __riscv
#include "cpuid_riscv64.c"
#define OPENBLAS_SUPPORTED
#endif
#ifdef __arm__
#include "cpuid_arm.c"
#define OPENBLAS_SUPPORTED
#endif
#ifdef __aarch64__
#include "cpuid_arm64.c"
#define OPENBLAS_SUPPORTED
#endif
#ifndef OPENBLAS_SUPPORTED
#error "This arch/CPU is not supported by OpenBLAS."
#endif
#else
#endif
static int get_num_cores(void) {
  int count;
#ifdef OS_WINDOWS
  SYSTEM_INFO sysinfo;
#elif defined(__FreeBSD__) || defined(__OpenBSD__) || defined(__NetBSD__) || defined(__DragonFly__) || defined(__APPLE__)
  int m[2];
  size_t len;
#endif
#if defined(linux) || defined(__sun__)
  // returns the number of processors configured in the system
  // (not only those currently online); fall back to 2 on failure
  count = sysconf(_SC_NPROCESSORS_CONF);
  if (count <= 0) count = 2;
  return count;
#elif defined(OS_WINDOWS)
  GetSystemInfo(&sysinfo);
  return sysinfo.dwNumberOfProcessors;
#elif defined(__FreeBSD__) || defined(__OpenBSD__) || defined(__NetBSD__) || defined(__DragonFly__) || defined(__APPLE__)
  m[0] = CTL_HW;
  m[1] = HW_NCPU;
  len = sizeof(int);
  // sysctl() returns 0 on success; do not read count if the call failed
  if (sysctl(m, 2, &count, &len, NULL, 0) != 0 || count <= 0) count = 2;
  return count;
#elif defined(AIX)
  // returns the number of processors which are currently online
  count = sysconf(_SC_NPROCESSORS_ONLN);
  if (count <= 0) count = 2;
  return count;
#else
  return 2;
#endif
}
int main(int argc, char *argv[]){
#ifdef FORCE
  char buffer[8192], *p, *q;
  int length;
#endif
  if (argc == 1) return 0;
  switch (argv[1][0]) {
  case '0' : /* for Makefile */
#ifdef FORCE
    printf("CORE=%s\n", CORENAME);
#else
#if defined(INTEL_AMD) || defined(POWER) || defined(__mips__) || defined(__arm__) || defined(__aarch64__) || defined(ZARCH) || defined(sparc) || defined(__loongarch__) || defined(__riscv) || defined(__alpha__)
    printf("CORE=%s\n", get_corename());
#endif
#endif
#ifdef FORCE
    printf("LIBCORE=%s\n", LIBNAME);
#else
    printf("LIBCORE=");
    get_libname();
    printf("\n");
#endif
    printf("NUM_CORES=%d\n", get_num_cores());
#if defined(__arm__)
#if !defined(FORCE)
    fprintf(stderr,"get features!\n");
    get_features();
#else
    fprintf(stderr,"split archconfig!\n");
    sprintf(buffer, "%s", ARCHCONFIG);
    p = &buffer[0];
    while (*p) {
      if ((*p == '-') && (*(p + 1) == 'D')) {
        p += 2;
        if (*p != 'H') {
          while( (*p != ' ') && (*p != '-') && (*p != '\0') && (*p != '\n')) {p++; }
          if (*p == '-') continue;
        }
        while ((*p != ' ') && (*p != '\0')) {
          if (*p == '=') {
            printf("=");
            p ++;
            while ((*p != ' ') && (*p != '\0')) {
              printf("%c", *p);
              p ++;
            }
          } else {
            printf("%c", *p);
            p ++;
            if ((*p == ' ') || (*p =='\0')) printf("=1\n");
          }
        }
      } else p ++;
    }
#endif
#endif
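    /*
     * Illustrative example (a hypothetical ARCHCONFIG value, not one of the
     * target definitions above): on a forced 32-bit ARM build with
     *   "-DCORTEXA9 -DL1_DATA_SIZE=32768 -DHAVE_VFP -DHAVE_NEON"
     * the split loop skips every -D option whose name does not start with
     * 'H' and writes only the feature flags to the Makefile fragment:
     *   HAVE_VFP=1
     *   HAVE_NEON=1
     * The test looks at the first letter only, so any other macro name
     * beginning with 'H' would be emitted as well.
     */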
#ifdef INTEL_AMD
#ifndef FORCE
    get_sse();
#else
    sprintf(buffer, "%s", ARCHCONFIG);
    p = &buffer[0];
    while (*p) {
      if ((*p == '-') && (*(p + 1) == 'D')) {
        p += 2;
        while ((*p != ' ') && (*p != '\0')) {
          if (*p == '=') {
            printf("=");
            p ++;
            while ((*p != ' ') && (*p != '\0')) {
              printf("%c", *p);
              p ++;
            }
          } else {
            printf("%c", *p);
            p ++;
            if ((*p == ' ') || (*p =='\0')) printf("=1");
          }
        }
        printf("\n");
      } else p ++;
    }
#endif
#endif
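    /*
     * Illustrative example (a hypothetical ARCHCONFIG value): on a forced
     * x86 build with
     *   "-DHASWELL -DL2_SIZE=262144 -DHAVE_AVX2 "
     * the loop above prints one Makefile assignment per -D option, adding
     * "=1" to options that carry no value:
     *   HASWELL=1
     *   L2_SIZE=262144
     *   HAVE_AVX2=1
     */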
#if defined(__BYTE_ORDER__) && __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
    printf("__BYTE_ORDER__=__ORDER_BIG_ENDIAN__\n");
#elif defined(__BIG_ENDIAN__) && __BIG_ENDIAN__ > 0
    printf("__BYTE_ORDER__=__ORDER_BIG_ENDIAN__\n");
#endif
#if defined(_CALL_ELF) && (_CALL_ELF == 2)
    printf("ELF_VERSION=2\n");
#endif
#ifdef MAKE_NB_JOBS
#if MAKE_NB_JOBS > 0
    printf("MAKEFLAGS += -j %d\n", MAKE_NB_JOBS);
#else
    // Let make use the parent's -j argument, or -j1 if there
    // is no parent make
#endif
#elif NO_PARALLEL_MAKE==1
    printf("MAKEFLAGS += -j 1\n");
#else
    printf("MAKEFLAGS += -j %d\n", get_num_cores());
#endif
    break;
  case '1' : /* For config.h */
#ifdef FORCE
    sprintf(buffer, "%s -DCORE_%s\n", ARCHCONFIG, CORENAME);
    p = &buffer[0];
    while (*p) {
      if ((*p == '-') && (*(p + 1) == 'D')) {
        p += 2;
        printf("#define ");
        while ((*p != ' ') && (*p != '\0')) {
          if (*p == '=') {
            printf(" ");
            p ++;
            while ((*p != ' ') && (*p != '\0')) {
              printf("%c", *p);
              p ++;
            }
          } else {
            if (*p != '\n')
              printf("%c", *p);
            p ++;
          }
        }
        printf("\n");
      } else p ++;
    }
#else
    get_cpuconfig();
#endif
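    /*
     * Worked example, derived from the FORCE_EV6 block in this file: the
     * buffer holds ARCHCONFIG followed by " -DCORE_EV6\n", and the FORCE
     * branch above turns each -DNAME or -DNAME=VALUE into a config.h line:
     *   #define EV6
     *   #define L1_DATA_SIZE 32768
     *   #define L1_DATA_LINESIZE 64
     *   #define L2_SIZE 4194304
     *   #define L2_LINESIZE 64
     *   #define DTB_DEFAULT_ENTRIES 64
     *   #define DTB_SIZE 8192
     *   #define CORE_EV6
     * The trailing '\n' of the buffer is skipped, so CORE_EV6 is not
     * followed by an extra blank line.
     */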
#ifdef FORCE
    printf("#define CHAR_CORENAME \"%s\"\n", CORENAME);
#else
#if defined(INTEL_AMD) || defined(POWER) || defined(__mips__) || defined(__arm__) || defined(__aarch64__) || defined(ZARCH) || defined(sparc) || defined(__loongarch__) || defined(__riscv) || defined(__alpha__)
    printf("#define CHAR_CORENAME \"%s\"\n", get_corename());
#endif
#endif
    break;
  case '2' : /* SMP */
    if (get_num_cores() > 1) printf("SMP=1\n");
    break;
  }
  fflush(stdout);
  return 0;
}