You can not select more than 25 topics Topics must start with a chinese character,a letter or number, can include dashes ('-') and can be up to 35 characters long.

cpuid_arm64.c 24 kB

Simplifying ARMv8 build parameters ARMv8 builds were a bit mixed up, with ThunderX2 code in ARMv8 mode (which is not right because TX2 is ARMv8.1) as well as requiring a few redundancies in the defines, making it harder to maintain and understand what core has what. A few other minor issues were also fixed. Tests were made on the following cores: A53, A57, A72, Falkor, ThunderX, ThunderX2, and XGene. Tests were: OpenBLAS/test, OpenBLAS/benchmark, BLAS-Tester. A summary: * Removed TX2 code from ARMv8 build, to make sure it is compatible with all ARMv8 cores, not just v8.1. Also, the TX2 code has actually harmed performance on big cores. * Commoned up ARMv8 architectures' defines in params.h, to make sure that all will benefit from ARMv8 settings, in addition to their own. * Adding a few more cores, using ARMv8's include strategy, to benefit from compiler optimisations using mtune. Also updated cache information from the manuals, making sure we set good conservative values by default. Removed Vulcan, as it's an alias to TX2. * Auto-detecting most of those cores, but also updating the forced compilation in getarch.c, to make sure the parameters are the same whether compiled natively or forced arch. Benefits: * ARMv8 build is now guaranteed to work on all ARMv8 cores * Improved performance for ARMv8 builds on some cores (A72, Falkor, ThunderX1 and 2: up to 11%) over current develop * Improved performance for *all* cores comparing to develop branch before TX2's patch (9% ~ 36%) * ThunderX1 builds are 14% faster than ARMv8 on TX1, 9% faster than current develop's branch and 8% faster than deveop before tx2 patches Issues: * Regression from current develop branch for A53 (-12%) and A57 (-3%) with ARMv8 builds, but still faster than before TX2's commit (+15% and +24% respectively). This can be improved with a simplification of TX2's code, to be done in future patches. At least the code is guaranteed to be ARMv8.0 now. Comments: * CortexA57 builds are unchanged on A57 hardware from develop's branch, which makes sense, as it's untouched. * CortexA72 builds improve over A57 on A72 hardware, even if they're using the same includes due to new compiler tunning in the makefile.
6 years ago
Simplifying ARMv8 build parameters ARMv8 builds were a bit mixed up, with ThunderX2 code in ARMv8 mode (which is not right because TX2 is ARMv8.1) as well as requiring a few redundancies in the defines, making it harder to maintain and understand what core has what. A few other minor issues were also fixed. Tests were made on the following cores: A53, A57, A72, Falkor, ThunderX, ThunderX2, and XGene. Tests were: OpenBLAS/test, OpenBLAS/benchmark, BLAS-Tester. A summary: * Removed TX2 code from ARMv8 build, to make sure it is compatible with all ARMv8 cores, not just v8.1. Also, the TX2 code has actually harmed performance on big cores. * Commoned up ARMv8 architectures' defines in params.h, to make sure that all will benefit from ARMv8 settings, in addition to their own. * Adding a few more cores, using ARMv8's include strategy, to benefit from compiler optimisations using mtune. Also updated cache information from the manuals, making sure we set good conservative values by default. Removed Vulcan, as it's an alias to TX2. * Auto-detecting most of those cores, but also updating the forced compilation in getarch.c, to make sure the parameters are the same whether compiled natively or forced arch. Benefits: * ARMv8 build is now guaranteed to work on all ARMv8 cores * Improved performance for ARMv8 builds on some cores (A72, Falkor, ThunderX1 and 2: up to 11%) over current develop * Improved performance for *all* cores comparing to develop branch before TX2's patch (9% ~ 36%) * ThunderX1 builds are 14% faster than ARMv8 on TX1, 9% faster than current develop's branch and 8% faster than deveop before tx2 patches Issues: * Regression from current develop branch for A53 (-12%) and A57 (-3%) with ARMv8 builds, but still faster than before TX2's commit (+15% and +24% respectively). This can be improved with a simplification of TX2's code, to be done in future patches. At least the code is guaranteed to be ARMv8.0 now. Comments: * CortexA57 builds are unchanged on A57 hardware from develop's branch, which makes sense, as it's untouched. * CortexA72 builds improve over A57 on A72 hardware, even if they're using the same includes due to new compiler tunning in the makefile.
6 years ago
Simplifying ARMv8 build parameters ARMv8 builds were a bit mixed up, with ThunderX2 code in ARMv8 mode (which is not right because TX2 is ARMv8.1) as well as requiring a few redundancies in the defines, making it harder to maintain and understand what core has what. A few other minor issues were also fixed. Tests were made on the following cores: A53, A57, A72, Falkor, ThunderX, ThunderX2, and XGene. Tests were: OpenBLAS/test, OpenBLAS/benchmark, BLAS-Tester. A summary: * Removed TX2 code from ARMv8 build, to make sure it is compatible with all ARMv8 cores, not just v8.1. Also, the TX2 code has actually harmed performance on big cores. * Commoned up ARMv8 architectures' defines in params.h, to make sure that all will benefit from ARMv8 settings, in addition to their own. * Adding a few more cores, using ARMv8's include strategy, to benefit from compiler optimisations using mtune. Also updated cache information from the manuals, making sure we set good conservative values by default. Removed Vulcan, as it's an alias to TX2. * Auto-detecting most of those cores, but also updating the forced compilation in getarch.c, to make sure the parameters are the same whether compiled natively or forced arch. Benefits: * ARMv8 build is now guaranteed to work on all ARMv8 cores * Improved performance for ARMv8 builds on some cores (A72, Falkor, ThunderX1 and 2: up to 11%) over current develop * Improved performance for *all* cores comparing to develop branch before TX2's patch (9% ~ 36%) * ThunderX1 builds are 14% faster than ARMv8 on TX1, 9% faster than current develop's branch and 8% faster than deveop before tx2 patches Issues: * Regression from current develop branch for A53 (-12%) and A57 (-3%) with ARMv8 builds, but still faster than before TX2's commit (+15% and +24% respectively). This can be improved with a simplification of TX2's code, to be done in future patches. At least the code is guaranteed to be ARMv8.0 now. Comments: * CortexA57 builds are unchanged on A57 hardware from develop's branch, which makes sense, as it's untouched. * CortexA72 builds improve over A57 on A72 hardware, even if they're using the same includes due to new compiler tunning in the makefile.
6 years ago
Simplifying ARMv8 build parameters ARMv8 builds were a bit mixed up, with ThunderX2 code in ARMv8 mode (which is not right because TX2 is ARMv8.1) as well as requiring a few redundancies in the defines, making it harder to maintain and understand what core has what. A few other minor issues were also fixed. Tests were made on the following cores: A53, A57, A72, Falkor, ThunderX, ThunderX2, and XGene. Tests were: OpenBLAS/test, OpenBLAS/benchmark, BLAS-Tester. A summary: * Removed TX2 code from ARMv8 build, to make sure it is compatible with all ARMv8 cores, not just v8.1. Also, the TX2 code has actually harmed performance on big cores. * Commoned up ARMv8 architectures' defines in params.h, to make sure that all will benefit from ARMv8 settings, in addition to their own. * Adding a few more cores, using ARMv8's include strategy, to benefit from compiler optimisations using mtune. Also updated cache information from the manuals, making sure we set good conservative values by default. Removed Vulcan, as it's an alias to TX2. * Auto-detecting most of those cores, but also updating the forced compilation in getarch.c, to make sure the parameters are the same whether compiled natively or forced arch. Benefits: * ARMv8 build is now guaranteed to work on all ARMv8 cores * Improved performance for ARMv8 builds on some cores (A72, Falkor, ThunderX1 and 2: up to 11%) over current develop * Improved performance for *all* cores comparing to develop branch before TX2's patch (9% ~ 36%) * ThunderX1 builds are 14% faster than ARMv8 on TX1, 9% faster than current develop's branch and 8% faster than deveop before tx2 patches Issues: * Regression from current develop branch for A53 (-12%) and A57 (-3%) with ARMv8 builds, but still faster than before TX2's commit (+15% and +24% respectively). This can be improved with a simplification of TX2's code, to be done in future patches. At least the code is guaranteed to be ARMv8.0 now. Comments: * CortexA57 builds are unchanged on A57 hardware from develop's branch, which makes sense, as it's untouched. * CortexA72 builds improve over A57 on A72 hardware, even if they're using the same includes due to new compiler tunning in the makefile.
6 years ago
Simplifying ARMv8 build parameters ARMv8 builds were a bit mixed up, with ThunderX2 code in ARMv8 mode (which is not right because TX2 is ARMv8.1) as well as requiring a few redundancies in the defines, making it harder to maintain and understand what core has what. A few other minor issues were also fixed. Tests were made on the following cores: A53, A57, A72, Falkor, ThunderX, ThunderX2, and XGene. Tests were: OpenBLAS/test, OpenBLAS/benchmark, BLAS-Tester. A summary: * Removed TX2 code from ARMv8 build, to make sure it is compatible with all ARMv8 cores, not just v8.1. Also, the TX2 code has actually harmed performance on big cores. * Commoned up ARMv8 architectures' defines in params.h, to make sure that all will benefit from ARMv8 settings, in addition to their own. * Adding a few more cores, using ARMv8's include strategy, to benefit from compiler optimisations using mtune. Also updated cache information from the manuals, making sure we set good conservative values by default. Removed Vulcan, as it's an alias to TX2. * Auto-detecting most of those cores, but also updating the forced compilation in getarch.c, to make sure the parameters are the same whether compiled natively or forced arch. Benefits: * ARMv8 build is now guaranteed to work on all ARMv8 cores * Improved performance for ARMv8 builds on some cores (A72, Falkor, ThunderX1 and 2: up to 11%) over current develop * Improved performance for *all* cores comparing to develop branch before TX2's patch (9% ~ 36%) * ThunderX1 builds are 14% faster than ARMv8 on TX1, 9% faster than current develop's branch and 8% faster than deveop before tx2 patches Issues: * Regression from current develop branch for A53 (-12%) and A57 (-3%) with ARMv8 builds, but still faster than before TX2's commit (+15% and +24% respectively). This can be improved with a simplification of TX2's code, to be done in future patches. At least the code is guaranteed to be ARMv8.0 now. Comments: * CortexA57 builds are unchanged on A57 hardware from develop's branch, which makes sense, as it's untouched. * CortexA72 builds improve over A57 on A72 hardware, even if they're using the same includes due to new compiler tunning in the makefile.
6 years ago
Simplifying ARMv8 build parameters ARMv8 builds were a bit mixed up, with ThunderX2 code in ARMv8 mode (which is not right because TX2 is ARMv8.1) as well as requiring a few redundancies in the defines, making it harder to maintain and understand what core has what. A few other minor issues were also fixed. Tests were made on the following cores: A53, A57, A72, Falkor, ThunderX, ThunderX2, and XGene. Tests were: OpenBLAS/test, OpenBLAS/benchmark, BLAS-Tester. A summary: * Removed TX2 code from ARMv8 build, to make sure it is compatible with all ARMv8 cores, not just v8.1. Also, the TX2 code has actually harmed performance on big cores. * Commoned up ARMv8 architectures' defines in params.h, to make sure that all will benefit from ARMv8 settings, in addition to their own. * Adding a few more cores, using ARMv8's include strategy, to benefit from compiler optimisations using mtune. Also updated cache information from the manuals, making sure we set good conservative values by default. Removed Vulcan, as it's an alias to TX2. * Auto-detecting most of those cores, but also updating the forced compilation in getarch.c, to make sure the parameters are the same whether compiled natively or forced arch. Benefits: * ARMv8 build is now guaranteed to work on all ARMv8 cores * Improved performance for ARMv8 builds on some cores (A72, Falkor, ThunderX1 and 2: up to 11%) over current develop * Improved performance for *all* cores comparing to develop branch before TX2's patch (9% ~ 36%) * ThunderX1 builds are 14% faster than ARMv8 on TX1, 9% faster than current develop's branch and 8% faster than deveop before tx2 patches Issues: * Regression from current develop branch for A53 (-12%) and A57 (-3%) with ARMv8 builds, but still faster than before TX2's commit (+15% and +24% respectively). This can be improved with a simplification of TX2's code, to be done in future patches. At least the code is guaranteed to be ARMv8.0 now. Comments: * CortexA57 builds are unchanged on A57 hardware from develop's branch, which makes sense, as it's untouched. * CortexA72 builds improve over A57 on A72 hardware, even if they're using the same includes due to new compiler tunning in the makefile.
6 years ago
Simplifying ARMv8 build parameters ARMv8 builds were a bit mixed up, with ThunderX2 code in ARMv8 mode (which is not right because TX2 is ARMv8.1) as well as requiring a few redundancies in the defines, making it harder to maintain and understand what core has what. A few other minor issues were also fixed. Tests were made on the following cores: A53, A57, A72, Falkor, ThunderX, ThunderX2, and XGene. Tests were: OpenBLAS/test, OpenBLAS/benchmark, BLAS-Tester. A summary: * Removed TX2 code from ARMv8 build, to make sure it is compatible with all ARMv8 cores, not just v8.1. Also, the TX2 code has actually harmed performance on big cores. * Commoned up ARMv8 architectures' defines in params.h, to make sure that all will benefit from ARMv8 settings, in addition to their own. * Adding a few more cores, using ARMv8's include strategy, to benefit from compiler optimisations using mtune. Also updated cache information from the manuals, making sure we set good conservative values by default. Removed Vulcan, as it's an alias to TX2. * Auto-detecting most of those cores, but also updating the forced compilation in getarch.c, to make sure the parameters are the same whether compiled natively or forced arch. Benefits: * ARMv8 build is now guaranteed to work on all ARMv8 cores * Improved performance for ARMv8 builds on some cores (A72, Falkor, ThunderX1 and 2: up to 11%) over current develop * Improved performance for *all* cores comparing to develop branch before TX2's patch (9% ~ 36%) * ThunderX1 builds are 14% faster than ARMv8 on TX1, 9% faster than current develop's branch and 8% faster than deveop before tx2 patches Issues: * Regression from current develop branch for A53 (-12%) and A57 (-3%) with ARMv8 builds, but still faster than before TX2's commit (+15% and +24% respectively). This can be improved with a simplification of TX2's code, to be done in future patches. At least the code is guaranteed to be ARMv8.0 now. Comments: * CortexA57 builds are unchanged on A57 hardware from develop's branch, which makes sense, as it's untouched. * CortexA72 builds improve over A57 on A72 hardware, even if they're using the same includes due to new compiler tunning in the makefile.
6 years ago
Simplifying ARMv8 build parameters ARMv8 builds were a bit mixed up, with ThunderX2 code in ARMv8 mode (which is not right because TX2 is ARMv8.1) as well as requiring a few redundancies in the defines, making it harder to maintain and understand what core has what. A few other minor issues were also fixed. Tests were made on the following cores: A53, A57, A72, Falkor, ThunderX, ThunderX2, and XGene. Tests were: OpenBLAS/test, OpenBLAS/benchmark, BLAS-Tester. A summary: * Removed TX2 code from ARMv8 build, to make sure it is compatible with all ARMv8 cores, not just v8.1. Also, the TX2 code has actually harmed performance on big cores. * Commoned up ARMv8 architectures' defines in params.h, to make sure that all will benefit from ARMv8 settings, in addition to their own. * Adding a few more cores, using ARMv8's include strategy, to benefit from compiler optimisations using mtune. Also updated cache information from the manuals, making sure we set good conservative values by default. Removed Vulcan, as it's an alias to TX2. * Auto-detecting most of those cores, but also updating the forced compilation in getarch.c, to make sure the parameters are the same whether compiled natively or forced arch. Benefits: * ARMv8 build is now guaranteed to work on all ARMv8 cores * Improved performance for ARMv8 builds on some cores (A72, Falkor, ThunderX1 and 2: up to 11%) over current develop * Improved performance for *all* cores comparing to develop branch before TX2's patch (9% ~ 36%) * ThunderX1 builds are 14% faster than ARMv8 on TX1, 9% faster than current develop's branch and 8% faster than deveop before tx2 patches Issues: * Regression from current develop branch for A53 (-12%) and A57 (-3%) with ARMv8 builds, but still faster than before TX2's commit (+15% and +24% respectively). This can be improved with a simplification of TX2's code, to be done in future patches. At least the code is guaranteed to be ARMv8.0 now. Comments: * CortexA57 builds are unchanged on A57 hardware from develop's branch, which makes sense, as it's untouched. * CortexA72 builds improve over A57 on A72 hardware, even if they're using the same includes due to new compiler tunning in the makefile.
6 years ago
Simplifying ARMv8 build parameters ARMv8 builds were a bit mixed up, with ThunderX2 code in ARMv8 mode (which is not right because TX2 is ARMv8.1) as well as requiring a few redundancies in the defines, making it harder to maintain and understand what core has what. A few other minor issues were also fixed. Tests were made on the following cores: A53, A57, A72, Falkor, ThunderX, ThunderX2, and XGene. Tests were: OpenBLAS/test, OpenBLAS/benchmark, BLAS-Tester. A summary: * Removed TX2 code from ARMv8 build, to make sure it is compatible with all ARMv8 cores, not just v8.1. Also, the TX2 code has actually harmed performance on big cores. * Commoned up ARMv8 architectures' defines in params.h, to make sure that all will benefit from ARMv8 settings, in addition to their own. * Adding a few more cores, using ARMv8's include strategy, to benefit from compiler optimisations using mtune. Also updated cache information from the manuals, making sure we set good conservative values by default. Removed Vulcan, as it's an alias to TX2. * Auto-detecting most of those cores, but also updating the forced compilation in getarch.c, to make sure the parameters are the same whether compiled natively or forced arch. Benefits: * ARMv8 build is now guaranteed to work on all ARMv8 cores * Improved performance for ARMv8 builds on some cores (A72, Falkor, ThunderX1 and 2: up to 11%) over current develop * Improved performance for *all* cores comparing to develop branch before TX2's patch (9% ~ 36%) * ThunderX1 builds are 14% faster than ARMv8 on TX1, 9% faster than current develop's branch and 8% faster than deveop before tx2 patches Issues: * Regression from current develop branch for A53 (-12%) and A57 (-3%) with ARMv8 builds, but still faster than before TX2's commit (+15% and +24% respectively). This can be improved with a simplification of TX2's code, to be done in future patches. At least the code is guaranteed to be ARMv8.0 now. Comments: * CortexA57 builds are unchanged on A57 hardware from develop's branch, which makes sense, as it's untouched. * CortexA72 builds improve over A57 on A72 hardware, even if they're using the same includes due to new compiler tunning in the makefile.
6 years ago
Simplifying ARMv8 build parameters ARMv8 builds were a bit mixed up, with ThunderX2 code in ARMv8 mode (which is not right because TX2 is ARMv8.1) as well as requiring a few redundancies in the defines, making it harder to maintain and understand what core has what. A few other minor issues were also fixed. Tests were made on the following cores: A53, A57, A72, Falkor, ThunderX, ThunderX2, and XGene. Tests were: OpenBLAS/test, OpenBLAS/benchmark, BLAS-Tester. A summary: * Removed TX2 code from ARMv8 build, to make sure it is compatible with all ARMv8 cores, not just v8.1. Also, the TX2 code has actually harmed performance on big cores. * Commoned up ARMv8 architectures' defines in params.h, to make sure that all will benefit from ARMv8 settings, in addition to their own. * Adding a few more cores, using ARMv8's include strategy, to benefit from compiler optimisations using mtune. Also updated cache information from the manuals, making sure we set good conservative values by default. Removed Vulcan, as it's an alias to TX2. * Auto-detecting most of those cores, but also updating the forced compilation in getarch.c, to make sure the parameters are the same whether compiled natively or forced arch. Benefits: * ARMv8 build is now guaranteed to work on all ARMv8 cores * Improved performance for ARMv8 builds on some cores (A72, Falkor, ThunderX1 and 2: up to 11%) over current develop * Improved performance for *all* cores comparing to develop branch before TX2's patch (9% ~ 36%) * ThunderX1 builds are 14% faster than ARMv8 on TX1, 9% faster than current develop's branch and 8% faster than deveop before tx2 patches Issues: * Regression from current develop branch for A53 (-12%) and A57 (-3%) with ARMv8 builds, but still faster than before TX2's commit (+15% and +24% respectively). This can be improved with a simplification of TX2's code, to be done in future patches. At least the code is guaranteed to be ARMv8.0 now. Comments: * CortexA57 builds are unchanged on A57 hardware from develop's branch, which makes sense, as it's untouched. * CortexA72 builds improve over A57 on A72 hardware, even if they're using the same includes due to new compiler tunning in the makefile.
6 years ago
Simplifying ARMv8 build parameters ARMv8 builds were a bit mixed up, with ThunderX2 code in ARMv8 mode (which is not right because TX2 is ARMv8.1) as well as requiring a few redundancies in the defines, making it harder to maintain and understand what core has what. A few other minor issues were also fixed. Tests were made on the following cores: A53, A57, A72, Falkor, ThunderX, ThunderX2, and XGene. Tests were: OpenBLAS/test, OpenBLAS/benchmark, BLAS-Tester. A summary: * Removed TX2 code from ARMv8 build, to make sure it is compatible with all ARMv8 cores, not just v8.1. Also, the TX2 code has actually harmed performance on big cores. * Commoned up ARMv8 architectures' defines in params.h, to make sure that all will benefit from ARMv8 settings, in addition to their own. * Adding a few more cores, using ARMv8's include strategy, to benefit from compiler optimisations using mtune. Also updated cache information from the manuals, making sure we set good conservative values by default. Removed Vulcan, as it's an alias to TX2. * Auto-detecting most of those cores, but also updating the forced compilation in getarch.c, to make sure the parameters are the same whether compiled natively or forced arch. Benefits: * ARMv8 build is now guaranteed to work on all ARMv8 cores * Improved performance for ARMv8 builds on some cores (A72, Falkor, ThunderX1 and 2: up to 11%) over current develop * Improved performance for *all* cores comparing to develop branch before TX2's patch (9% ~ 36%) * ThunderX1 builds are 14% faster than ARMv8 on TX1, 9% faster than current develop's branch and 8% faster than deveop before tx2 patches Issues: * Regression from current develop branch for A53 (-12%) and A57 (-3%) with ARMv8 builds, but still faster than before TX2's commit (+15% and +24% respectively). This can be improved with a simplification of TX2's code, to be done in future patches. At least the code is guaranteed to be ARMv8.0 now. Comments: * CortexA57 builds are unchanged on A57 hardware from develop's branch, which makes sense, as it's untouched. * CortexA72 builds improve over A57 on A72 hardware, even if they're using the same includes due to new compiler tunning in the makefile.
6 years ago
Simplifying ARMv8 build parameters ARMv8 builds were a bit mixed up, with ThunderX2 code in ARMv8 mode (which is not right because TX2 is ARMv8.1) as well as requiring a few redundancies in the defines, making it harder to maintain and understand what core has what. A few other minor issues were also fixed. Tests were made on the following cores: A53, A57, A72, Falkor, ThunderX, ThunderX2, and XGene. Tests were: OpenBLAS/test, OpenBLAS/benchmark, BLAS-Tester. A summary: * Removed TX2 code from ARMv8 build, to make sure it is compatible with all ARMv8 cores, not just v8.1. Also, the TX2 code has actually harmed performance on big cores. * Commoned up ARMv8 architectures' defines in params.h, to make sure that all will benefit from ARMv8 settings, in addition to their own. * Adding a few more cores, using ARMv8's include strategy, to benefit from compiler optimisations using mtune. Also updated cache information from the manuals, making sure we set good conservative values by default. Removed Vulcan, as it's an alias to TX2. * Auto-detecting most of those cores, but also updating the forced compilation in getarch.c, to make sure the parameters are the same whether compiled natively or forced arch. Benefits: * ARMv8 build is now guaranteed to work on all ARMv8 cores * Improved performance for ARMv8 builds on some cores (A72, Falkor, ThunderX1 and 2: up to 11%) over current develop * Improved performance for *all* cores comparing to develop branch before TX2's patch (9% ~ 36%) * ThunderX1 builds are 14% faster than ARMv8 on TX1, 9% faster than current develop's branch and 8% faster than deveop before tx2 patches Issues: * Regression from current develop branch for A53 (-12%) and A57 (-3%) with ARMv8 builds, but still faster than before TX2's commit (+15% and +24% respectively). This can be improved with a simplification of TX2's code, to be done in future patches. At least the code is guaranteed to be ARMv8.0 now. Comments: * CortexA57 builds are unchanged on A57 hardware from develop's branch, which makes sense, as it's untouched. * CortexA72 builds improve over A57 on A72 hardware, even if they're using the same includes due to new compiler tunning in the makefile.
6 years ago
123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899100101102103104105106107108109110111112113114115116117118119120121122123124125126127128129130131132133134135136137138139140141142143144145146147148149150151152153154155156157158159160161162163164165166167168169170171172173174175176177178179180181182183184185186187188189190191192193194195196197198199200201202203204205206207208209210211212213214215216217218219220221222223224225226227228229230231232233234235236237238239240241242243244245246247248249250251252253254255256257258259260261262263264265266267268269270271272273274275276277278279280281282283284285286287288289290291292293294295296297298299300301302303304305306307308309310311312313314315316317318319320321322323324325326327328329330331332333334335336337338339340341342343344345346347348349350351352353354355356357358359360361362363364365366367368369370371372373374375376377378379380381382383384385386387388389390391392393394395396397398399400401402403404405406407408409410411412413414415416417418419420421422423424425426427428429430431432433434435436437438439440441442443444445446447448449450451452453454455456457458459460461462463464465466467468469470471472473474475476477478479480481482483484485486487488489490491492493494495496497498499500501502503504505506507508509510511512513514515516517518519520521522523524525526527528529530531532533534535536537538539540541542543544545546547548549550551552553554555556557558559560561562563564565566567568569570571572573574575576577578579580581582583584585586587588589590591592593594595596597598599600601602603604605606607608609610611612613614615616617618619620621622623624625626627628629630631632633634635636637638639640641642643644645646647648649650651652653654655656657658659660661662663664665666667668669670671672673674675676677678679680681682683684685686687688689690691692693694695696697698699700701702703704705706707708709710711712713714715716717718719720721722723724725726727728729730731732733734735736737738739740741742743744745746747748749750751752753754755756757758759760761762763764765766767768769770771772773774775776777
  1. /**************************************************************************
  2. Copyright (c) 2013, The OpenBLAS Project
  3. All rights reserved.
  4. Redistribution and use in source and binary forms, with or without
  5. modification, are permitted provided that the following conditions are
  6. met:
  7. 1. Redistributions of source code must retain the above copyright
  8. notice, this list of conditions and the following disclaimer.
  9. 2. Redistributions in binary form must reproduce the above copyright
  10. notice, this list of conditions and the following disclaimer in
  11. the documentation and/or other materials provided with the
  12. distribution.
  13. 3. Neither the name of the OpenBLAS project nor the names of
  14. its contributors may be used to endorse or promote products
  15. derived from this software without specific prior written permission.
  16. THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
  17. AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
  18. IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
  19. ARE DISCLAIMED. IN NO EVENT SHALL THE OPENBLAS PROJECT OR CONTRIBUTORS BE
  20. LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
  21. DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
  22. SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
  23. CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
  24. OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE
  25. USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
  26. *****************************************************************************/
  27. #include <stdlib.h>
  28. #include <string.h>
  29. #ifdef __APPLE__
  30. #include <sys/sysctl.h>
  31. int32_t value;
  32. size_t length=sizeof(value);
  33. int64_t value64;
  34. size_t length64=sizeof(value64);
  35. #endif
  36. #if (defined OS_LINUX || defined OS_ANDROID)
  37. #include <asm/hwcap.h>
  38. #include <sys/auxv.h>
  39. #ifndef HWCAP_CPUID
  40. #define HWCAP_CPUID (1 << 11)
  41. #endif
  42. #ifndef HWCAP_SVE
  43. #define HWCAP_SVE (1 << 22)
  44. #endif
  45. #if (defined OS_WINDOWS)
  46. #include <winreg.h>
  47. #endif
  48. #define get_cpu_ftr(id, var) ({ \
  49. __asm__ __volatile__ ("mrs %0, "#id : "=r" (var)); \
  50. })
  51. #endif
  52. #define CPU_UNKNOWN 0
  53. #define CPU_ARMV8 1
  54. // Arm
  55. #define CPU_CORTEXA53 2
  56. #define CPU_CORTEXA55 14
  57. #define CPU_CORTEXA57 3
  58. #define CPU_CORTEXA72 4
  59. #define CPU_CORTEXA73 5
  60. #define CPU_CORTEXA76 23
  61. #define CPU_NEOVERSEN1 11
  62. #define CPU_NEOVERSEV1 16
  63. #define CPU_NEOVERSEN2 17
  64. #define CPU_NEOVERSEV2 24
  65. #define CPU_CORTEXX1 18
  66. #define CPU_CORTEXX2 19
  67. #define CPU_CORTEXA510 20
  68. #define CPU_CORTEXA710 21
  69. // Qualcomm
  70. #define CPU_FALKOR 6
  71. // Cavium
  72. #define CPU_THUNDERX 7
  73. #define CPU_THUNDERX2T99 8
  74. #define CPU_THUNDERX3T110 12
  75. //Hisilicon
  76. #define CPU_TSV110 9
  77. // Ampere
  78. #define CPU_EMAG8180 10
  79. // Apple
  80. #define CPU_VORTEX 13
  81. // Fujitsu
  82. #define CPU_A64FX 15
  83. // Phytium
  84. #define CPU_FT2000 22
  85. static char *cpuname[] = {
  86. "UNKNOWN",
  87. "ARMV8" ,
  88. "CORTEXA53",
  89. "CORTEXA57",
  90. "CORTEXA72",
  91. "CORTEXA73",
  92. "FALKOR",
  93. "THUNDERX",
  94. "THUNDERX2T99",
  95. "TSV110",
  96. "EMAG8180",
  97. "NEOVERSEN1",
  98. "THUNDERX3T110",
  99. "VORTEX",
  100. "CORTEXA55",
  101. "A64FX",
  102. "NEOVERSEV1",
  103. "NEOVERSEN2",
  104. "CORTEXX1",
  105. "CORTEXX2",
  106. "CORTEXA510",
  107. "CORTEXA710",
  108. "FT2000",
  109. "CORTEXA76",
  110. "NEOVERSEV2"
  111. };
  112. static char *cpuname_lower[] = {
  113. "unknown",
  114. "armv8",
  115. "cortexa53",
  116. "cortexa57",
  117. "cortexa72",
  118. "cortexa73",
  119. "falkor",
  120. "thunderx",
  121. "thunderx2t99",
  122. "tsv110",
  123. "emag8180",
  124. "neoversen1",
  125. "thunderx3t110",
  126. "vortex",
  127. "cortexa55",
  128. "a64fx",
  129. "neoversev1",
  130. "neoversen2",
  131. "cortexx1",
  132. "cortexx2",
  133. "cortexa510",
  134. "cortexa710",
  135. "ft2000",
  136. "cortexa76",
  137. "neoversev2"
  138. };
  139. static int cpulowperf=0;
  140. static int cpumidperf=0;
  141. static int cpuhiperf=0;
  142. int get_feature(char *search)
  143. {
  144. #if defined( __linux ) || defined( __NetBSD__ )
  145. FILE *infile;
  146. char buffer[2048], *p,*t;
  147. p = (char *) NULL ;
  148. infile = fopen("/proc/cpuinfo", "r");
  149. while (fgets(buffer, sizeof(buffer), infile))
  150. {
  151. if (!strncmp("Features", buffer, 8))
  152. {
  153. p = strchr(buffer, ':') + 2;
  154. break;
  155. }
  156. }
  157. fclose(infile);
  158. if( p == NULL ) return 0;
  159. t = strtok(p," ");
  160. while( (t = strtok(NULL," ")))
  161. {
  162. if (!strcmp(t, search)) { return(1); }
  163. }
  164. #endif
  165. return(0);
  166. }
  167. static int cpusort(const void *model1, const void *model2)
  168. {
  169. return (*(int*)model2-*(int*)model1);
  170. }
  171. int detect(void)
  172. {
  173. #if defined( __linux ) || defined( __NetBSD__ )
  174. int n,i,ii;
  175. int midr_el1;
  176. int implementer;
  177. int cpucap[1024];
  178. int cpucores[1024];
  179. FILE *infile;
  180. char cpupart[6],cpuimpl[6];
  181. char *cpu_impl=NULL,*cpu_pt=NULL;
  182. char buffer[2048], *p, *cpu_part = NULL, *cpu_implementer = NULL;
  183. p = (char *) NULL ;
  184. cpulowperf=cpumidperf=cpuhiperf=0;
  185. for (i=0;i<1024;i++)cpucores[i]=0;
  186. n=0;
  187. infile = fopen("/sys/devices/system/cpu/possible", "r");
  188. if (!infile) {
  189. infile = fopen("/proc/cpuinfo", "r");
  190. while (fgets(buffer, sizeof(buffer), infile)) {
  191. if (!strncmp("processor", buffer, 9))
  192. n++;
  193. }
  194. } else {
  195. fgets(buffer, sizeof(buffer), infile);
  196. sscanf(buffer,"0-%d",&n);
  197. n++;
  198. }
  199. fclose(infile);
  200. cpu_implementer=NULL;
  201. for (i=0;i<n;i++){
  202. sprintf(buffer,"/sys/devices/system/cpu/cpu%d/regs/identification/midr_el1",i);
  203. infile= fopen(buffer,"r");
  204. if (!infile) {
  205. infile = fopen("/proc/cpuinfo", "r");
  206. for (ii=0;ii<n;ii++){
  207. cpu_part=NULL;cpu_implementer=NULL;
  208. while (fgets(buffer, sizeof(buffer), infile)) {
  209. if ((cpu_part != NULL) && (cpu_implementer != NULL)) {
  210. break;
  211. }
  212. if ((cpu_part == NULL) && !strncmp("CPU part", buffer, 8)) {
  213. cpu_pt = strchr(buffer, ':') + 2;
  214. cpu_part = strdup(cpu_pt);
  215. cpucores[i]=strtol(cpu_part,NULL,0);
  216. } else if ((cpu_implementer == NULL) && !strncmp("CPU implementer", buffer, 15)) {
  217. cpu_impl = strchr(buffer, ':') + 2;
  218. cpu_implementer = strdup(cpu_impl);
  219. }
  220. }
  221. if (strstr(cpu_implementer, "0x41")) {
  222. if (cpucores[ii] >= 0xd4b) cpuhiperf++;
  223. else
  224. if (cpucores[ii] >= 0xd07) cpumidperf++;
  225. else cpulowperf++;
  226. }
  227. else cpulowperf++;
  228. }
  229. fclose(infile);
  230. break;
  231. } else {
  232. (void)fgets(buffer, sizeof(buffer), infile);
  233. midr_el1=strtoul(buffer,NULL,16);
  234. fclose(infile);
  235. implementer = (midr_el1 >> 24) & 0xFF;
  236. cpucores[i] = (midr_el1 >> 4) & 0xFFF;
  237. sprintf(buffer,"/sys/devices/system/cpu/cpu%d/cpu_capacity",i);
  238. infile= fopen(buffer,"r");
  239. if (!infile) {
  240. if (implementer== 65) {
  241. if (cpucores[i] >= 0xd4b) cpuhiperf++;
  242. else
  243. if (cpucores[i] >= 0xd07) cpumidperf++;
  244. else cpulowperf++;
  245. }
  246. else cpulowperf++;
  247. } else {
  248. (void)fgets(buffer, sizeof(buffer), infile);
  249. sscanf(buffer,"%d",&cpucap[i]);
  250. if (cpucap[i] >= 1000) cpuhiperf++;
  251. else
  252. if (cpucap[i] >= 500) cpumidperf++;
  253. else cpulowperf++;
  254. fclose(infile);
  255. }
  256. }
  257. sprintf(cpuimpl,"0x%2x",implementer);
  258. cpu_implementer=strdup(cpuimpl);
  259. }
  260. qsort(cpucores,1024,sizeof(int),cpusort);
  261. sprintf(cpupart,"0x%3x",cpucores[0]);
  262. cpu_part=strdup(cpupart);
  263. if(cpu_part != NULL && cpu_implementer != NULL) {
  264. // Arm
  265. if (strstr(cpu_implementer, "0x41")) {
  266. if (strstr(cpu_part, "0xd03"))
  267. return CPU_CORTEXA53;
  268. else if (strstr(cpu_part, "0xd07"))
  269. return CPU_CORTEXA57;
  270. else if (strstr(cpu_part, "0xd08"))
  271. return CPU_CORTEXA72;
  272. else if (strstr(cpu_part, "0xd09"))
  273. return CPU_CORTEXA73;
  274. else if (strstr(cpu_part, "0xd0c"))
  275. return CPU_NEOVERSEN1;
  276. else if (strstr(cpu_part, "0xd40"))
  277. return CPU_NEOVERSEV1;
  278. else if (strstr(cpu_part, "0xd49"))
  279. return CPU_NEOVERSEN2;
  280. else if (strstr(cpu_part, "0xd05"))
  281. return CPU_CORTEXA55;
  282. else if (strstr(cpu_part, "0xd46"))
  283. return CPU_CORTEXA510;
  284. else if (strstr(cpu_part, "0xd47"))
  285. return CPU_CORTEXA710;
  286. else if (strstr(cpu_part, "0xd4d")) //A715
  287. return CPU_CORTEXA710;
  288. else if (strstr(cpu_part, "0xd44"))
  289. return CPU_CORTEXX1;
  290. else if (strstr(cpu_part, "0xd4c"))
  291. return CPU_CORTEXX2;
  292. else if (strstr(cpu_part, "0xd4e")) //X3
  293. return CPU_CORTEXX2;
  294. else if (strstr(cpu_part, "0xd4f")) //NVIDIA Grace et al.
  295. return CPU_NEOVERSEV2;
  296. else if (strstr(cpu_part, "0xd0b"))
  297. return CPU_CORTEXA76;
  298. }
  299. // Qualcomm
  300. else if (strstr(cpu_implementer, "0x51") && strstr(cpu_part, "0xc00"))
  301. return CPU_FALKOR;
  302. // Cavium
  303. else if (strstr(cpu_implementer, "0x43") && strstr(cpu_part, "0x0a1"))
  304. return CPU_THUNDERX;
  305. else if (strstr(cpu_implementer, "0x43") && strstr(cpu_part, "0x0af"))
  306. return CPU_THUNDERX2T99;
  307. else if (strstr(cpu_implementer, "0x43") && strstr(cpu_part, "0x0b8"))
  308. return CPU_THUNDERX3T110;
  309. // HiSilicon
  310. else if (strstr(cpu_implementer, "0x48") && strstr(cpu_part, "0xd01"))
  311. return CPU_TSV110;
  312. // Ampere
  313. else if (strstr(cpu_implementer, "0x50") && strstr(cpu_part, "0x000"))
  314. return CPU_EMAG8180;
  315. // Fujitsu
  316. else if (strstr(cpu_implementer, "0x46") && strstr(cpu_part, "0x001"))
  317. return CPU_A64FX;
  318. // Apple
  319. else if (strstr(cpu_implementer, "0x61") && strstr(cpu_part, "0x022"))
  320. return CPU_VORTEX;
  321. // Phytium
  322. else if (strstr(cpu_implementer, "0x70") && (strstr(cpu_part, "0x660") || strstr(cpu_part, "0x661")
  323. || strstr(cpu_part, "0x662") || strstr(cpu_part, "0x663")))
  324. return CPU_FT2000;
  325. }
  326. p = (char *) NULL ;
  327. infile = fopen("/proc/cpuinfo", "r");
  328. while (fgets(buffer, sizeof(buffer), infile))
  329. {
  330. if ((!strncmp("model name", buffer, 10)) || (!strncmp("Processor", buffer, 9)) ||
  331. (!strncmp("CPU architecture", buffer, 16)))
  332. {
  333. p = strchr(buffer, ':') + 2;
  334. break;
  335. }
  336. }
  337. fclose(infile);
  338. if(p != NULL)
  339. {
  340. if ((strstr(p, "AArch64")) || (strstr(p, "8")))
  341. {
  342. return CPU_ARMV8;
  343. }
  344. }
  345. #else
  346. #ifdef __APPLE__
  347. sysctlbyname("hw.ncpu",&value64,&length64,NULL,0);
  348. cpulowperf=value64;
  349. sysctlbyname("hw.nperflevels",&value64,&length64,NULL,0);
  350. if (value64 > 1) {
  351. sysctlbyname("hw.perflevel0.cpusperl",&value64,&length64,NULL,0);
  352. cpuhiperf=value64;
  353. sysctlbyname("hw.perflevel1.cpusperl",&value64,&length64,NULL,0);
  354. cpulowperf=value64;
  355. }
  356. sysctlbyname("hw.cpufamily",&value64,&length64,NULL,0);
  357. if (value64 ==131287967|| value64 == 458787763 ) return CPU_VORTEX; //A12/M1
  358. if (value64 == 3660830781) return CPU_VORTEX; //A15/M2
  359. if (value64 == 2271604202) return CPU_VORTEX; //A16/M3
  360. if (value64 == 1867590060) return CPU_VORTEX; //M4
  361. #else
  362. #ifdef OS_WINDOWS
  363. HKEY reghandle;
  364. HKEY hklm = HKEY_LOCAL_MACHINE;
  365. WCHAR valstring[512];
  366. PVOID pvalstring=valstring;
  367. DWORD size=sizeof (valstring);
  368. DWORD type=RRF_RT_ANY;
  369. DWORD flags=0;
  370. LPCWSTR subkey= L"HARDWARE\\DESCRIPTION\\System\\CentralProcessor\\0";
  371. LPCWSTR field=L"ProcessorNameString";
  372. LONG errcode=RegOpenKeyEx(HKEY_LOCAL_MACHINE,TEXT("Hardware\\Description\\System\\CentralProcessor\\0"), 0, KEY_READ, &reghandle);
  373. if (errcode != NO_ERROR) wprintf(L"Could not open registry key for proc0: %x\n",errcode);
  374. errcode=RegQueryValueEx(reghandle, "ProcessorNameString", NULL,NULL ,pvalstring,&size);
  375. if (errcode != ERROR_SUCCESS) wprintf(L"Error reading cpuname from registry:%x\n",errcode);
  376. //wprintf(stderr,L"%s\n",(PWSTR)valstring);
  377. RegCloseKey(reghandle);
  378. if (strstr(valstring, "Snapdragon(R) X Elite")) return CPU_NEOVERSEN1;
  379. if (strstr(valstring, "Ampere(R) Altra")) return CPU_NEOVERSEN1;
  380. if (strstr(valstring, "Snapdragon (TM) 8cx Gen 3")) return CPU_CORTEXX1;
  381. if (strstr(valstring, "Snapdragon Compute Platform")) return CPU_CORTEXX1;
  382. #endif
  383. #endif
  384. return CPU_ARMV8;
  385. #endif
  386. return CPU_UNKNOWN;
  387. }
  388. char *get_corename(void)
  389. {
  390. return cpuname[detect()];
  391. }
  392. void get_architecture(void)
  393. {
  394. printf("ARM64");
  395. }
  396. void get_subarchitecture(void)
  397. {
  398. int d = detect();
  399. printf("%s", cpuname[d]);
  400. }
  401. void get_subdirname(void)
  402. {
  403. printf("arm64");
  404. }
  405. void get_cpucount(void)
  406. {
  407. int n=0;
  408. #if defined( __linux ) || defined( __NetBSD__ )
  409. FILE *infile;
  410. char buffer[2048], *p,*t;
  411. p = (char *) NULL ;
  412. infile = fopen("/proc/cpuinfo", "r");
  413. while (fgets(buffer, sizeof(buffer), infile))
  414. {
  415. if (!strncmp("processor", buffer, 9))
  416. n++;
  417. }
  418. fclose(infile);
  419. printf("#define NUM_CORES %d\n",n);
  420. if (cpulowperf >0)
  421. printf("#define NUM_CORES_LP %d\n",cpulowperf);
  422. if (cpumidperf >0)
  423. printf("#define NUM_CORES_MP %d\n",cpumidperf);
  424. if (cpuhiperf >0)
  425. printf("#define NUM_CORES_HP %d\n",cpuhiperf);
  426. #endif
  427. #ifdef __APPLE__
  428. sysctlbyname("hw.physicalcpu_max",&value,&length,NULL,0);
  429. printf("#define NUM_CORES %d\n",value);
  430. if (cpulowperf >0)
  431. printf("#define NUM_CORES_LP %d\n",cpulowperf);
  432. if (cpumidperf >0)
  433. printf("#define NUM_CORES_MP %d\n",cpumidperf);
  434. if (cpuhiperf >0)
  435. printf("#define NUM_CORES_HP %d\n",cpuhiperf);
  436. #endif
  437. }
  438. void get_cpuconfig(void)
  439. {
  440. // All arches should define ARMv8
  441. printf("#define ARMV8\n");
  442. printf("#define HAVE_NEON\n"); // This shouldn't be necessary
  443. printf("#define HAVE_VFPV4\n"); // This shouldn't be necessary
  444. int d = detect();
  445. switch (d)
  446. {
  447. case CPU_CORTEXA53:
  448. case CPU_CORTEXA55:
  449. printf("#define %s\n", cpuname[d]);
  450. // Fall-through
  451. case CPU_ARMV8:
  452. // Minimum parameters for ARMv8 (based on A53)
  453. printf("#define L1_DATA_SIZE 32768\n");
  454. printf("#define L1_DATA_LINESIZE 64\n");
  455. printf("#define L2_SIZE 262144\n");
  456. printf("#define L2_LINESIZE 64\n");
  457. printf("#define DTB_DEFAULT_ENTRIES 64\n");
  458. printf("#define DTB_SIZE 4096\n");
  459. printf("#define L2_ASSOCIATIVE 4\n");
  460. break;
  461. case CPU_CORTEXA57:
  462. case CPU_CORTEXA72:
  463. case CPU_CORTEXA73:
  464. // Common minimum settings for these Arm cores
  465. // Can change a lot, but we need to be conservative
  466. // TODO: detect info from /sys if possible
  467. printf("#define %s\n", cpuname[d]);
  468. printf("#define L1_CODE_SIZE 49152\n");
  469. printf("#define L1_CODE_LINESIZE 64\n");
  470. printf("#define L1_CODE_ASSOCIATIVE 3\n");
  471. printf("#define L1_DATA_SIZE 32768\n");
  472. printf("#define L1_DATA_LINESIZE 64\n");
  473. printf("#define L1_DATA_ASSOCIATIVE 2\n");
  474. printf("#define L2_SIZE 524288\n");
  475. printf("#define L2_LINESIZE 64\n");
  476. printf("#define L2_ASSOCIATIVE 16\n");
  477. printf("#define DTB_DEFAULT_ENTRIES 64\n");
  478. printf("#define DTB_SIZE 4096\n");
  479. break;
  480. case CPU_NEOVERSEN1:
  481. printf("#define %s\n", cpuname[d]);
  482. printf("#define L1_CODE_SIZE 65536\n");
  483. printf("#define L1_CODE_LINESIZE 64\n");
  484. printf("#define L1_CODE_ASSOCIATIVE 4\n");
  485. printf("#define L1_DATA_SIZE 65536\n");
  486. printf("#define L1_DATA_LINESIZE 64\n");
  487. printf("#define L1_DATA_ASSOCIATIVE 4\n");
  488. printf("#define L2_SIZE 1048576\n");
  489. printf("#define L2_LINESIZE 64\n");
  490. printf("#define L2_ASSOCIATIVE 8\n");
  491. printf("#define DTB_DEFAULT_ENTRIES 48\n");
  492. printf("#define DTB_SIZE 4096\n");
  493. break;
  494. case CPU_NEOVERSEV1:
  495. printf("#define HAVE_SVE 1\n");
  496. case CPU_CORTEXA76:
  497. printf("#define %s\n", cpuname[d]);
  498. printf("#define L1_CODE_SIZE 65536\n");
  499. printf("#define L1_CODE_LINESIZE 64\n");
  500. printf("#define L1_CODE_ASSOCIATIVE 4\n");
  501. printf("#define L1_DATA_SIZE 65536\n");
  502. printf("#define L1_DATA_LINESIZE 64\n");
  503. printf("#define L1_DATA_ASSOCIATIVE 4\n");
  504. printf("#define L2_SIZE 1048576\n");
  505. printf("#define L2_LINESIZE 64\n");
  506. printf("#define L2_ASSOCIATIVE 8\n");
  507. printf("#define DTB_DEFAULT_ENTRIES 48\n");
  508. printf("#define DTB_SIZE 4096\n");
  509. break;
  510. case CPU_NEOVERSEN2:
  511. printf("#define %s\n", cpuname[d]);
  512. printf("#define L1_CODE_SIZE 65536\n");
  513. printf("#define L1_CODE_LINESIZE 64\n");
  514. printf("#define L1_CODE_ASSOCIATIVE 4\n");
  515. printf("#define L1_DATA_SIZE 65536\n");
  516. printf("#define L1_DATA_LINESIZE 64\n");
  517. printf("#define L1_DATA_ASSOCIATIVE 4\n");
  518. printf("#define L2_SIZE 1048576\n");
  519. printf("#define L2_LINESIZE 64\n");
  520. printf("#define L2_ASSOCIATIVE 8\n");
  521. printf("#define DTB_DEFAULT_ENTRIES 48\n");
  522. printf("#define DTB_SIZE 4096\n");
  523. printf("#define HAVE_SVE 1\n");
  524. break;
  525. case CPU_NEOVERSEV2:
  526. printf("#define ARMV9\n");
  527. printf("#define HAVE_SVE 1\n");
  528. printf("#define %s\n", cpuname[d]);
  529. printf("#define L1_CODE_SIZE 65536\n");
  530. printf("#define L1_CODE_LINESIZE 64\n");
  531. printf("#define L1_CODE_ASSOCIATIVE 4\n");
  532. printf("#define L1_DATA_SIZE 65536\n");
  533. printf("#define L1_DATA_LINESIZE 64\n");
  534. printf("#define L1_DATA_ASSOCIATIVE 4\n");
  535. printf("#define L2_SIZE 1048576\n");
  536. printf("#define L2_LINESIZE 64\n");
  537. printf("#define L2_ASSOCIATIVE 8\n");
  538. // L1 Data TLB = 48 entries
  539. // L2 Data TLB = 2048 entries
  540. printf("#define DTB_DEFAULT_ENTRIES 48\n");
  541. printf("#define DTB_SIZE 4096\n"); // Set to 4096 for symmetry with other configs.
  542. break;
  543. case CPU_CORTEXA510:
  544. case CPU_CORTEXA710:
  545. case CPU_CORTEXX1:
  546. case CPU_CORTEXX2:
  547. printf("#define ARMV9\n");
  548. printf("#define HAVE_SVE 1\n");
  549. printf("#define %s\n", cpuname[d]);
  550. printf("#define L1_CODE_SIZE 65536\n");
  551. printf("#define L1_CODE_LINESIZE 64\n");
  552. printf("#define L1_CODE_ASSOCIATIVE 4\n");
  553. printf("#define L1_DATA_SIZE 65536\n");
  554. printf("#define L1_DATA_LINESIZE 64\n");
  555. printf("#define L1_DATA_ASSOCIATIVE 4\n");
  556. printf("#define L2_SIZE 1048576\n");
  557. printf("#define L2_LINESIZE 64\n");
  558. printf("#define L2_ASSOCIATIVE 8\n");
  559. printf("#define DTB_DEFAULT_ENTRIES 64\n");
  560. printf("#define DTB_SIZE 4096\n");
  561. break;
  562. case CPU_FALKOR:
  563. printf("#define FALKOR\n");
  564. printf("#define L1_CODE_SIZE 65536\n");
  565. printf("#define L1_CODE_LINESIZE 64\n");
  566. printf("#define L1_DATA_SIZE 32768\n");
  567. printf("#define L1_DATA_LINESIZE 128\n");
  568. printf("#define L2_SIZE 524288\n");
  569. printf("#define L2_LINESIZE 64\n");
  570. printf("#define DTB_DEFAULT_ENTRIES 64\n");
  571. printf("#define DTB_SIZE 4096\n");
  572. printf("#define L2_ASSOCIATIVE 16\n");
  573. break;
  574. case CPU_THUNDERX:
  575. printf("#define THUNDERX\n");
  576. printf("#define L1_DATA_SIZE 32768\n");
  577. printf("#define L1_DATA_LINESIZE 128\n");
  578. printf("#define L2_SIZE 16777216\n");
  579. printf("#define L2_LINESIZE 128\n");
  580. printf("#define DTB_DEFAULT_ENTRIES 64\n");
  581. printf("#define DTB_SIZE 4096\n");
  582. printf("#define L2_ASSOCIATIVE 16\n");
  583. break;
  584. case CPU_THUNDERX2T99:
  585. printf("#define THUNDERX2T99 \n");
  586. printf("#define L1_CODE_SIZE 32768 \n");
  587. printf("#define L1_CODE_LINESIZE 64 \n");
  588. printf("#define L1_CODE_ASSOCIATIVE 8 \n");
  589. printf("#define L1_DATA_SIZE 32768 \n");
  590. printf("#define L1_DATA_LINESIZE 64 \n");
  591. printf("#define L1_DATA_ASSOCIATIVE 8 \n");
  592. printf("#define L2_SIZE 262144 \n");
  593. printf("#define L2_LINESIZE 64 \n");
  594. printf("#define L2_ASSOCIATIVE 8 \n");
  595. printf("#define L3_SIZE 33554432 \n");
  596. printf("#define L3_LINESIZE 64 \n");
  597. printf("#define L3_ASSOCIATIVE 32 \n");
  598. printf("#define DTB_DEFAULT_ENTRIES 64 \n");
  599. printf("#define DTB_SIZE 4096 \n");
  600. break;
  601. case CPU_TSV110:
  602. printf("#define TSV110 \n");
  603. printf("#define L1_CODE_SIZE 65536 \n");
  604. printf("#define L1_CODE_LINESIZE 64 \n");
  605. printf("#define L1_CODE_ASSOCIATIVE 4 \n");
  606. printf("#define L1_DATA_SIZE 65536 \n");
  607. printf("#define L1_DATA_LINESIZE 64 \n");
  608. printf("#define L1_DATA_ASSOCIATIVE 4 \n");
  609. printf("#define L2_SIZE 524228 \n");
  610. printf("#define L2_LINESIZE 64 \n");
  611. printf("#define L2_ASSOCIATIVE 8 \n");
  612. printf("#define DTB_DEFAULT_ENTRIES 64 \n");
  613. printf("#define DTB_SIZE 4096 \n");
  614. break;
  615. case CPU_EMAG8180:
  616. // Minimum parameters for ARMv8 (based on A53)
  617. printf("#define EMAG8180\n");
  618. printf("#define L1_CODE_SIZE 32768\n");
  619. printf("#define L1_DATA_SIZE 32768\n");
  620. printf("#define L1_DATA_LINESIZE 64\n");
  621. printf("#define L2_SIZE 262144\n");
  622. printf("#define L2_LINESIZE 64\n");
  623. printf("#define DTB_DEFAULT_ENTRIES 64\n");
  624. printf("#define DTB_SIZE 4096\n");
  625. break;
  626. case CPU_THUNDERX3T110:
  627. printf("#define THUNDERX3T110 \n");
  628. printf("#define L1_CODE_SIZE 65536 \n");
  629. printf("#define L1_CODE_LINESIZE 64 \n");
  630. printf("#define L1_CODE_ASSOCIATIVE 8 \n");
  631. printf("#define L1_DATA_SIZE 32768 \n");
  632. printf("#define L1_DATA_LINESIZE 64 \n");
  633. printf("#define L1_DATA_ASSOCIATIVE 8 \n");
  634. printf("#define L2_SIZE 524288 \n");
  635. printf("#define L2_LINESIZE 64 \n");
  636. printf("#define L2_ASSOCIATIVE 8 \n");
  637. printf("#define L3_SIZE 94371840 \n");
  638. printf("#define L3_LINESIZE 64 \n");
  639. printf("#define L3_ASSOCIATIVE 32 \n");
  640. printf("#define DTB_DEFAULT_ENTRIES 64 \n");
  641. printf("#define DTB_SIZE 4096 \n");
  642. break;
  643. case CPU_VORTEX:
  644. printf("#define VORTEX \n");
  645. #ifdef __APPLE__
  646. sysctlbyname("hw.l1icachesize",&value64,&length64,NULL,0);
  647. printf("#define L1_CODE_SIZE %lld \n",value64);
  648. sysctlbyname("hw.cachelinesize",&value64,&length64,NULL,0);
  649. printf("#define L1_CODE_LINESIZE %lld \n",value64);
  650. sysctlbyname("hw.l1dcachesize",&value64,&length64,NULL,0);
  651. printf("#define L1_DATA_SIZE %lld \n",value64);
  652. sysctlbyname("hw.l2cachesize",&value64,&length64,NULL,0);
  653. printf("#define L2_SIZE %lld \n",value64);
  654. #endif
  655. printf("#define DTB_DEFAULT_ENTRIES 64 \n");
  656. printf("#define DTB_SIZE 4096 \n");
  657. break;
  658. case CPU_A64FX:
  659. printf("#define A64FX\n");
  660. printf("#define HAVE_SVE 1\n");
  661. printf("#define L1_CODE_SIZE 65535\n");
  662. printf("#define L1_DATA_SIZE 65535\n");
  663. printf("#define L1_DATA_LINESIZE 256\n");
  664. printf("#define L2_SIZE 8388608\n");
  665. printf("#define L2_LINESIZE 256\n");
  666. printf("#define DTB_DEFAULT_ENTRIES 64\n");
  667. printf("#define DTB_SIZE 4096\n");
  668. break;
  669. case CPU_FT2000:
  670. printf("#define FT2000\n");
  671. printf("#define L1_CODE_SIZE 32768\n");
  672. printf("#define L1_DATA_SIZE 32768\n");
  673. printf("#define L1_DATA_LINESIZE 64\n");
  674. printf("#define L2_SIZE 33554432\n");
  675. printf("#define L2_LINESIZE 64\n");
  676. printf("#define DTB_DEFAULT_ENTRIES 64\n");
  677. printf("#define DTB_SIZE 4096\n");
  678. break;
  679. }
  680. get_cpucount();
  681. }
  682. void get_libname(void)
  683. {
  684. int d = detect();
  685. printf("%s", cpuname_lower[d]);
  686. }
  687. void get_features(void)
  688. {
  689. #if defined( __linux ) || defined( __NetBSD__ )
  690. FILE *infile;
  691. char buffer[2048], *p,*t;
  692. p = (char *) NULL ;
  693. infile = fopen("/proc/cpuinfo", "r");
  694. while (fgets(buffer, sizeof(buffer), infile))
  695. {
  696. if (!strncmp("Features", buffer, 8))
  697. {
  698. p = strchr(buffer, ':') + 2;
  699. break;
  700. }
  701. }
  702. fclose(infile);
  703. if( p == NULL ) return;
  704. t = strtok(p," ");
  705. while( (t = strtok(NULL," ")))
  706. {
  707. }
  708. #endif
  709. return;
  710. }