param.h 100 kB

Simplifying ARMv8 build parameters

ARMv8 builds were a bit mixed up, with ThunderX2 code in ARMv8 mode (which is not right, because TX2 is ARMv8.1), and they required a few redundancies in the defines, which made it harder to maintain the file and to understand which core has what. A few other minor issues were also fixed.

Tests were made on the following cores: A53, A57, A72, Falkor, ThunderX, ThunderX2, and XGene.

Tests were: OpenBLAS/test, OpenBLAS/benchmark, BLAS-Tester.

A summary:

 * Removed TX2 code from the ARMv8 build, to make sure it is compatible with all ARMv8 cores, not just v8.1. Also, the TX2 code has actually harmed performance on big cores.
 * Commoned up the ARMv8 architectures' defines in param.h, to make sure that all cores benefit from the ARMv8 settings in addition to their own.
 * Added a few more cores, using ARMv8's include strategy, to benefit from compiler optimisations using -mtune. Also updated cache information from the manuals, making sure we set good conservative values by default. Removed Vulcan, as it's an alias for TX2.
 * Auto-detecting most of those cores, but also updating the forced compilation in getarch.c, to make sure the parameters are the same whether compiled natively or with a forced arch.

Benefits:

 * The ARMv8 build is now guaranteed to work on all ARMv8 cores.
 * Improved performance for ARMv8 builds on some cores (A72, Falkor, ThunderX1 and 2: up to 11%) over current develop.
 * Improved performance for *all* cores compared to the develop branch before TX2's patch (9% ~ 36%).
 * ThunderX1 builds are 14% faster than ARMv8 builds on TX1, 9% faster than the current develop branch and 8% faster than develop before the TX2 patches.

Issues:

 * Regression from the current develop branch for A53 (-12%) and A57 (-3%) with ARMv8 builds, but still faster than before TX2's commit (+15% and +24% respectively). This can be improved with a simplification of TX2's code, to be done in future patches. At least the code is guaranteed to be ARMv8.0 now.

Comments:

 * CortexA57 builds are unchanged on A57 hardware from the develop branch, which makes sense, as that code is untouched.
 * CortexA72 builds improve over A57 builds on A72 hardware, even though they use the same includes, thanks to new compiler tuning in the makefile.
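
A minimal sketch of the "common defines plus per-core overrides" layout that the summary describes. Everything here is an assumption for illustration: the DEMO_* macro names, the helper functions and all numeric values are hypothetical and are not taken from param.h.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical core selection. A real build would define exactly one of
   these on the compiler command line; the default below only keeps the
   sketch self-contained. */
#if !defined(DEMO_ARMV8) && !defined(DEMO_CORTEXA57) && !defined(DEMO_THUNDERX)
#define DEMO_THUNDERX
#endif

/* Shared block: settings that every ARMv8-family core inherits. */
#if defined(DEMO_ARMV8) || defined(DEMO_CORTEXA57) || defined(DEMO_THUNDERX)

#define DEMO_GEMM_OFFSET_A 0   /* illustrative value */
#define DEMO_SYMV_P        16  /* illustrative value */

/* Per-core block: only the settings that differ between cores. */
#if defined(DEMO_CORTEXA57)
#define DEMO_CORE_NAME      "cortexa57"
#define DEMO_SGEMM_UNROLL_M 16
#define DEMO_SGEMM_UNROLL_N 4
#elif defined(DEMO_THUNDERX)
#define DEMO_CORE_NAME      "thunderx"
#define DEMO_SGEMM_UNROLL_M 4
#define DEMO_SGEMM_UNROLL_N 4
#else /* generic ARMv8: conservative defaults */
#define DEMO_CORE_NAME      "armv8"
#define DEMO_SGEMM_UNROLL_M 4
#define DEMO_SGEMM_UNROLL_N 4
#endif

#endif /* ARMv8 family */

/* Name of the core the translation unit was configured for. */
const char *demo_core_name(void) { return DEMO_CORE_NAME; }

/* Elements in one register tile, derived from the per-core unroll factors. */
int demo_sgemm_tile_elems(void) {
    assert(DEMO_SGEMM_UNROLL_M > 0 && DEMO_SGEMM_UNROLL_N > 0);
    return DEMO_SGEMM_UNROLL_M * DEMO_SGEMM_UNROLL_N;
}

/* A setting that comes from the shared block, whichever core is selected. */
int demo_symv_p(void) { return DEMO_SYMV_P; }
```

With this layout a new core only needs one extra `defined(...)` term in the outer condition and one `#elif` branch for the values that differ. The generic branch stays the conservative fallback.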
6 years ago
12 years ago
3 years ago
Simplifying ARMv8 build parameters ARMv8 builds were a bit mixed up, with ThunderX2 code in ARMv8 mode (which is not right because TX2 is ARMv8.1) as well as requiring a few redundancies in the defines, making it harder to maintain and understand what core has what. A few other minor issues were also fixed. Tests were made on the following cores: A53, A57, A72, Falkor, ThunderX, ThunderX2, and XGene. Tests were: OpenBLAS/test, OpenBLAS/benchmark, BLAS-Tester. A summary: * Removed TX2 code from ARMv8 build, to make sure it is compatible with all ARMv8 cores, not just v8.1. Also, the TX2 code has actually harmed performance on big cores. * Commoned up ARMv8 architectures' defines in params.h, to make sure that all will benefit from ARMv8 settings, in addition to their own. * Adding a few more cores, using ARMv8's include strategy, to benefit from compiler optimisations using mtune. Also updated cache information from the manuals, making sure we set good conservative values by default. Removed Vulcan, as it's an alias to TX2. * Auto-detecting most of those cores, but also updating the forced compilation in getarch.c, to make sure the parameters are the same whether compiled natively or forced arch. Benefits: * ARMv8 build is now guaranteed to work on all ARMv8 cores * Improved performance for ARMv8 builds on some cores (A72, Falkor, ThunderX1 and 2: up to 11%) over current develop * Improved performance for *all* cores comparing to develop branch before TX2's patch (9% ~ 36%) * ThunderX1 builds are 14% faster than ARMv8 on TX1, 9% faster than current develop's branch and 8% faster than deveop before tx2 patches Issues: * Regression from current develop branch for A53 (-12%) and A57 (-3%) with ARMv8 builds, but still faster than before TX2's commit (+15% and +24% respectively). This can be improved with a simplification of TX2's code, to be done in future patches. At least the code is guaranteed to be ARMv8.0 now. 
Comments: * CortexA57 builds are unchanged on A57 hardware from develop's branch, which makes sense, as it's untouched. * CortexA72 builds improve over A57 on A72 hardware, even if they're using the same includes due to new compiler tunning in the makefile.
6 years ago
Simplifying ARMv8 build parameters ARMv8 builds were a bit mixed up, with ThunderX2 code in ARMv8 mode (which is not right because TX2 is ARMv8.1) as well as requiring a few redundancies in the defines, making it harder to maintain and understand what core has what. A few other minor issues were also fixed. Tests were made on the following cores: A53, A57, A72, Falkor, ThunderX, ThunderX2, and XGene. Tests were: OpenBLAS/test, OpenBLAS/benchmark, BLAS-Tester. A summary: * Removed TX2 code from ARMv8 build, to make sure it is compatible with all ARMv8 cores, not just v8.1. Also, the TX2 code has actually harmed performance on big cores. * Commoned up ARMv8 architectures' defines in params.h, to make sure that all will benefit from ARMv8 settings, in addition to their own. * Adding a few more cores, using ARMv8's include strategy, to benefit from compiler optimisations using mtune. Also updated cache information from the manuals, making sure we set good conservative values by default. Removed Vulcan, as it's an alias to TX2. * Auto-detecting most of those cores, but also updating the forced compilation in getarch.c, to make sure the parameters are the same whether compiled natively or forced arch. Benefits: * ARMv8 build is now guaranteed to work on all ARMv8 cores * Improved performance for ARMv8 builds on some cores (A72, Falkor, ThunderX1 and 2: up to 11%) over current develop * Improved performance for *all* cores comparing to develop branch before TX2's patch (9% ~ 36%) * ThunderX1 builds are 14% faster than ARMv8 on TX1, 9% faster than current develop's branch and 8% faster than deveop before tx2 patches Issues: * Regression from current develop branch for A53 (-12%) and A57 (-3%) with ARMv8 builds, but still faster than before TX2's commit (+15% and +24% respectively). This can be improved with a simplification of TX2's code, to be done in future patches. At least the code is guaranteed to be ARMv8.0 now. 
Comments: * CortexA57 builds are unchanged on A57 hardware from develop's branch, which makes sense, as it's untouched. * CortexA72 builds improve over A57 on A72 hardware, even if they're using the same includes due to new compiler tunning in the makefile.
6 years ago
12 years ago
12 years ago
Simplifying ARMv8 build parameters ARMv8 builds were a bit mixed up, with ThunderX2 code in ARMv8 mode (which is not right because TX2 is ARMv8.1) as well as requiring a few redundancies in the defines, making it harder to maintain and understand what core has what. A few other minor issues were also fixed. Tests were made on the following cores: A53, A57, A72, Falkor, ThunderX, ThunderX2, and XGene. Tests were: OpenBLAS/test, OpenBLAS/benchmark, BLAS-Tester. A summary: * Removed TX2 code from ARMv8 build, to make sure it is compatible with all ARMv8 cores, not just v8.1. Also, the TX2 code has actually harmed performance on big cores. * Commoned up ARMv8 architectures' defines in params.h, to make sure that all will benefit from ARMv8 settings, in addition to their own. * Adding a few more cores, using ARMv8's include strategy, to benefit from compiler optimisations using mtune. Also updated cache information from the manuals, making sure we set good conservative values by default. Removed Vulcan, as it's an alias to TX2. * Auto-detecting most of those cores, but also updating the forced compilation in getarch.c, to make sure the parameters are the same whether compiled natively or forced arch. Benefits: * ARMv8 build is now guaranteed to work on all ARMv8 cores * Improved performance for ARMv8 builds on some cores (A72, Falkor, ThunderX1 and 2: up to 11%) over current develop * Improved performance for *all* cores comparing to develop branch before TX2's patch (9% ~ 36%) * ThunderX1 builds are 14% faster than ARMv8 on TX1, 9% faster than current develop's branch and 8% faster than deveop before tx2 patches Issues: * Regression from current develop branch for A53 (-12%) and A57 (-3%) with ARMv8 builds, but still faster than before TX2's commit (+15% and +24% respectively). This can be improved with a simplification of TX2's code, to be done in future patches. At least the code is guaranteed to be ARMv8.0 now. 
Comments: * CortexA57 builds are unchanged on A57 hardware from develop's branch, which makes sense, as it's untouched. * CortexA72 builds improve over A57 on A72 hardware, even if they're using the same includes due to new compiler tunning in the makefile.
6 years ago
12 years ago
Simplifying ARMv8 build parameters ARMv8 builds were a bit mixed up, with ThunderX2 code in ARMv8 mode (which is not right because TX2 is ARMv8.1) as well as requiring a few redundancies in the defines, making it harder to maintain and understand what core has what. A few other minor issues were also fixed. Tests were made on the following cores: A53, A57, A72, Falkor, ThunderX, ThunderX2, and XGene. Tests were: OpenBLAS/test, OpenBLAS/benchmark, BLAS-Tester. A summary: * Removed TX2 code from ARMv8 build, to make sure it is compatible with all ARMv8 cores, not just v8.1. Also, the TX2 code has actually harmed performance on big cores. * Commoned up ARMv8 architectures' defines in params.h, to make sure that all will benefit from ARMv8 settings, in addition to their own. * Adding a few more cores, using ARMv8's include strategy, to benefit from compiler optimisations using mtune. Also updated cache information from the manuals, making sure we set good conservative values by default. Removed Vulcan, as it's an alias to TX2. * Auto-detecting most of those cores, but also updating the forced compilation in getarch.c, to make sure the parameters are the same whether compiled natively or forced arch. Benefits: * ARMv8 build is now guaranteed to work on all ARMv8 cores * Improved performance for ARMv8 builds on some cores (A72, Falkor, ThunderX1 and 2: up to 11%) over current develop * Improved performance for *all* cores comparing to develop branch before TX2's patch (9% ~ 36%) * ThunderX1 builds are 14% faster than ARMv8 on TX1, 9% faster than current develop's branch and 8% faster than deveop before tx2 patches Issues: * Regression from current develop branch for A53 (-12%) and A57 (-3%) with ARMv8 builds, but still faster than before TX2's commit (+15% and +24% respectively). This can be improved with a simplification of TX2's code, to be done in future patches. At least the code is guaranteed to be ARMv8.0 now. 
Comments: * CortexA57 builds are unchanged on A57 hardware from develop's branch, which makes sense, as it's untouched. * CortexA72 builds improve over A57 on A72 hardware, even if they're using the same includes due to new compiler tunning in the makefile.
6 years ago
Simplifying ARMv8 build parameters ARMv8 builds were a bit mixed up, with ThunderX2 code in ARMv8 mode (which is not right because TX2 is ARMv8.1) as well as requiring a few redundancies in the defines, making it harder to maintain and understand what core has what. A few other minor issues were also fixed. Tests were made on the following cores: A53, A57, A72, Falkor, ThunderX, ThunderX2, and XGene. Tests were: OpenBLAS/test, OpenBLAS/benchmark, BLAS-Tester. A summary: * Removed TX2 code from ARMv8 build, to make sure it is compatible with all ARMv8 cores, not just v8.1. Also, the TX2 code has actually harmed performance on big cores. * Commoned up ARMv8 architectures' defines in params.h, to make sure that all will benefit from ARMv8 settings, in addition to their own. * Adding a few more cores, using ARMv8's include strategy, to benefit from compiler optimisations using mtune. Also updated cache information from the manuals, making sure we set good conservative values by default. Removed Vulcan, as it's an alias to TX2. * Auto-detecting most of those cores, but also updating the forced compilation in getarch.c, to make sure the parameters are the same whether compiled natively or forced arch. Benefits: * ARMv8 build is now guaranteed to work on all ARMv8 cores * Improved performance for ARMv8 builds on some cores (A72, Falkor, ThunderX1 and 2: up to 11%) over current develop * Improved performance for *all* cores comparing to develop branch before TX2's patch (9% ~ 36%) * ThunderX1 builds are 14% faster than ARMv8 on TX1, 9% faster than current develop's branch and 8% faster than deveop before tx2 patches Issues: * Regression from current develop branch for A53 (-12%) and A57 (-3%) with ARMv8 builds, but still faster than before TX2's commit (+15% and +24% respectively). This can be improved with a simplification of TX2's code, to be done in future patches. At least the code is guaranteed to be ARMv8.0 now. 
Comments: * CortexA57 builds are unchanged on A57 hardware from develop's branch, which makes sense, as it's untouched. * CortexA72 builds improve over A57 on A72 hardware, even if they're using the same includes due to new compiler tunning in the makefile.
6 years ago
Simplifying ARMv8 build parameters ARMv8 builds were a bit mixed up, with ThunderX2 code in ARMv8 mode (which is not right because TX2 is ARMv8.1) as well as requiring a few redundancies in the defines, making it harder to maintain and understand what core has what. A few other minor issues were also fixed. Tests were made on the following cores: A53, A57, A72, Falkor, ThunderX, ThunderX2, and XGene. Tests were: OpenBLAS/test, OpenBLAS/benchmark, BLAS-Tester. A summary: * Removed TX2 code from ARMv8 build, to make sure it is compatible with all ARMv8 cores, not just v8.1. Also, the TX2 code has actually harmed performance on big cores. * Commoned up ARMv8 architectures' defines in params.h, to make sure that all will benefit from ARMv8 settings, in addition to their own. * Adding a few more cores, using ARMv8's include strategy, to benefit from compiler optimisations using mtune. Also updated cache information from the manuals, making sure we set good conservative values by default. Removed Vulcan, as it's an alias to TX2. * Auto-detecting most of those cores, but also updating the forced compilation in getarch.c, to make sure the parameters are the same whether compiled natively or forced arch. Benefits: * ARMv8 build is now guaranteed to work on all ARMv8 cores * Improved performance for ARMv8 builds on some cores (A72, Falkor, ThunderX1 and 2: up to 11%) over current develop * Improved performance for *all* cores comparing to develop branch before TX2's patch (9% ~ 36%) * ThunderX1 builds are 14% faster than ARMv8 on TX1, 9% faster than current develop's branch and 8% faster than deveop before tx2 patches Issues: * Regression from current develop branch for A53 (-12%) and A57 (-3%) with ARMv8 builds, but still faster than before TX2's commit (+15% and +24% respectively). This can be improved with a simplification of TX2's code, to be done in future patches. At least the code is guaranteed to be ARMv8.0 now. 
Comments: * CortexA57 builds are unchanged on A57 hardware from develop's branch, which makes sense, as it's untouched. * CortexA72 builds improve over A57 on A72 hardware, even if they're using the same includes due to new compiler tunning in the makefile.
6 years ago
Simplifying ARMv8 build parameters ARMv8 builds were a bit mixed up, with ThunderX2 code in ARMv8 mode (which is not right because TX2 is ARMv8.1) as well as requiring a few redundancies in the defines, making it harder to maintain and understand what core has what. A few other minor issues were also fixed. Tests were made on the following cores: A53, A57, A72, Falkor, ThunderX, ThunderX2, and XGene. Tests were: OpenBLAS/test, OpenBLAS/benchmark, BLAS-Tester. A summary: * Removed TX2 code from ARMv8 build, to make sure it is compatible with all ARMv8 cores, not just v8.1. Also, the TX2 code has actually harmed performance on big cores. * Commoned up ARMv8 architectures' defines in params.h, to make sure that all will benefit from ARMv8 settings, in addition to their own. * Adding a few more cores, using ARMv8's include strategy, to benefit from compiler optimisations using mtune. Also updated cache information from the manuals, making sure we set good conservative values by default. Removed Vulcan, as it's an alias to TX2. * Auto-detecting most of those cores, but also updating the forced compilation in getarch.c, to make sure the parameters are the same whether compiled natively or forced arch. Benefits: * ARMv8 build is now guaranteed to work on all ARMv8 cores * Improved performance for ARMv8 builds on some cores (A72, Falkor, ThunderX1 and 2: up to 11%) over current develop * Improved performance for *all* cores comparing to develop branch before TX2's patch (9% ~ 36%) * ThunderX1 builds are 14% faster than ARMv8 on TX1, 9% faster than current develop's branch and 8% faster than deveop before tx2 patches Issues: * Regression from current develop branch for A53 (-12%) and A57 (-3%) with ARMv8 builds, but still faster than before TX2's commit (+15% and +24% respectively). This can be improved with a simplification of TX2's code, to be done in future patches. At least the code is guaranteed to be ARMv8.0 now. 
Comments: * CortexA57 builds are unchanged on A57 hardware from develop's branch, which makes sense, as it's untouched. * CortexA72 builds improve over A57 on A72 hardware, even if they're using the same includes due to new compiler tunning in the makefile.
6 years ago
Simplifying ARMv8 build parameters ARMv8 builds were a bit mixed up, with ThunderX2 code in ARMv8 mode (which is not right because TX2 is ARMv8.1) as well as requiring a few redundancies in the defines, making it harder to maintain and understand what core has what. A few other minor issues were also fixed. Tests were made on the following cores: A53, A57, A72, Falkor, ThunderX, ThunderX2, and XGene. Tests were: OpenBLAS/test, OpenBLAS/benchmark, BLAS-Tester. A summary: * Removed TX2 code from ARMv8 build, to make sure it is compatible with all ARMv8 cores, not just v8.1. Also, the TX2 code has actually harmed performance on big cores. * Commoned up ARMv8 architectures' defines in params.h, to make sure that all will benefit from ARMv8 settings, in addition to their own. * Adding a few more cores, using ARMv8's include strategy, to benefit from compiler optimisations using mtune. Also updated cache information from the manuals, making sure we set good conservative values by default. Removed Vulcan, as it's an alias to TX2. * Auto-detecting most of those cores, but also updating the forced compilation in getarch.c, to make sure the parameters are the same whether compiled natively or forced arch. Benefits: * ARMv8 build is now guaranteed to work on all ARMv8 cores * Improved performance for ARMv8 builds on some cores (A72, Falkor, ThunderX1 and 2: up to 11%) over current develop * Improved performance for *all* cores comparing to develop branch before TX2's patch (9% ~ 36%) * ThunderX1 builds are 14% faster than ARMv8 on TX1, 9% faster than current develop's branch and 8% faster than deveop before tx2 patches Issues: * Regression from current develop branch for A53 (-12%) and A57 (-3%) with ARMv8 builds, but still faster than before TX2's commit (+15% and +24% respectively). This can be improved with a simplification of TX2's code, to be done in future patches. At least the code is guaranteed to be ARMv8.0 now. 
Comments: * CortexA57 builds are unchanged on A57 hardware from develop's branch, which makes sense, as it's untouched. * CortexA72 builds improve over A57 on A72 hardware, even if they're using the same includes due to new compiler tunning in the makefile.
6 years ago
Simplifying ARMv8 build parameters ARMv8 builds were a bit mixed up, with ThunderX2 code in ARMv8 mode (which is not right because TX2 is ARMv8.1) as well as requiring a few redundancies in the defines, making it harder to maintain and understand what core has what. A few other minor issues were also fixed. Tests were made on the following cores: A53, A57, A72, Falkor, ThunderX, ThunderX2, and XGene. Tests were: OpenBLAS/test, OpenBLAS/benchmark, BLAS-Tester. A summary: * Removed TX2 code from ARMv8 build, to make sure it is compatible with all ARMv8 cores, not just v8.1. Also, the TX2 code has actually harmed performance on big cores. * Commoned up ARMv8 architectures' defines in params.h, to make sure that all will benefit from ARMv8 settings, in addition to their own. * Adding a few more cores, using ARMv8's include strategy, to benefit from compiler optimisations using mtune. Also updated cache information from the manuals, making sure we set good conservative values by default. Removed Vulcan, as it's an alias to TX2. * Auto-detecting most of those cores, but also updating the forced compilation in getarch.c, to make sure the parameters are the same whether compiled natively or forced arch. Benefits: * ARMv8 build is now guaranteed to work on all ARMv8 cores * Improved performance for ARMv8 builds on some cores (A72, Falkor, ThunderX1 and 2: up to 11%) over current develop * Improved performance for *all* cores comparing to develop branch before TX2's patch (9% ~ 36%) * ThunderX1 builds are 14% faster than ARMv8 on TX1, 9% faster than current develop's branch and 8% faster than deveop before tx2 patches Issues: * Regression from current develop branch for A53 (-12%) and A57 (-3%) with ARMv8 builds, but still faster than before TX2's commit (+15% and +24% respectively). This can be improved with a simplification of TX2's code, to be done in future patches. At least the code is guaranteed to be ARMv8.0 now. 
Comments: * CortexA57 builds are unchanged on A57 hardware from develop's branch, which makes sense, as it's untouched. * CortexA72 builds improve over A57 on A72 hardware, even if they're using the same includes due to new compiler tunning in the makefile.
6 years ago
Simplifying ARMv8 build parameters ARMv8 builds were a bit mixed up, with ThunderX2 code in ARMv8 mode (which is not right because TX2 is ARMv8.1) as well as requiring a few redundancies in the defines, making it harder to maintain and understand what core has what. A few other minor issues were also fixed. Tests were made on the following cores: A53, A57, A72, Falkor, ThunderX, ThunderX2, and XGene. Tests were: OpenBLAS/test, OpenBLAS/benchmark, BLAS-Tester. A summary: * Removed TX2 code from ARMv8 build, to make sure it is compatible with all ARMv8 cores, not just v8.1. Also, the TX2 code has actually harmed performance on big cores. * Commoned up ARMv8 architectures' defines in params.h, to make sure that all will benefit from ARMv8 settings, in addition to their own. * Adding a few more cores, using ARMv8's include strategy, to benefit from compiler optimisations using mtune. Also updated cache information from the manuals, making sure we set good conservative values by default. Removed Vulcan, as it's an alias to TX2. * Auto-detecting most of those cores, but also updating the forced compilation in getarch.c, to make sure the parameters are the same whether compiled natively or forced arch. Benefits: * ARMv8 build is now guaranteed to work on all ARMv8 cores * Improved performance for ARMv8 builds on some cores (A72, Falkor, ThunderX1 and 2: up to 11%) over current develop * Improved performance for *all* cores comparing to develop branch before TX2's patch (9% ~ 36%) * ThunderX1 builds are 14% faster than ARMv8 on TX1, 9% faster than current develop's branch and 8% faster than deveop before tx2 patches Issues: * Regression from current develop branch for A53 (-12%) and A57 (-3%) with ARMv8 builds, but still faster than before TX2's commit (+15% and +24% respectively). This can be improved with a simplification of TX2's code, to be done in future patches. At least the code is guaranteed to be ARMv8.0 now. 
Comments: * CortexA57 builds are unchanged on A57 hardware from develop's branch, which makes sense, as it's untouched. * CortexA72 builds improve over A57 on A72 hardware, even if they're using the same includes due to new compiler tunning in the makefile.
6 years ago
Simplifying ARMv8 build parameters ARMv8 builds were a bit mixed up, with ThunderX2 code in ARMv8 mode (which is not right because TX2 is ARMv8.1) as well as requiring a few redundancies in the defines, making it harder to maintain and understand what core has what. A few other minor issues were also fixed. Tests were made on the following cores: A53, A57, A72, Falkor, ThunderX, ThunderX2, and XGene. Tests were: OpenBLAS/test, OpenBLAS/benchmark, BLAS-Tester. A summary: * Removed TX2 code from ARMv8 build, to make sure it is compatible with all ARMv8 cores, not just v8.1. Also, the TX2 code has actually harmed performance on big cores. * Commoned up ARMv8 architectures' defines in params.h, to make sure that all will benefit from ARMv8 settings, in addition to their own. * Adding a few more cores, using ARMv8's include strategy, to benefit from compiler optimisations using mtune. Also updated cache information from the manuals, making sure we set good conservative values by default. Removed Vulcan, as it's an alias to TX2. * Auto-detecting most of those cores, but also updating the forced compilation in getarch.c, to make sure the parameters are the same whether compiled natively or forced arch. Benefits: * ARMv8 build is now guaranteed to work on all ARMv8 cores * Improved performance for ARMv8 builds on some cores (A72, Falkor, ThunderX1 and 2: up to 11%) over current develop * Improved performance for *all* cores comparing to develop branch before TX2's patch (9% ~ 36%) * ThunderX1 builds are 14% faster than ARMv8 on TX1, 9% faster than current develop's branch and 8% faster than deveop before tx2 patches Issues: * Regression from current develop branch for A53 (-12%) and A57 (-3%) with ARMv8 builds, but still faster than before TX2's commit (+15% and +24% respectively). This can be improved with a simplification of TX2's code, to be done in future patches. At least the code is guaranteed to be ARMv8.0 now. 
Comments: * CortexA57 builds are unchanged on A57 hardware from develop's branch, which makes sense, as it's untouched. * CortexA72 builds improve over A57 on A72 hardware, even if they're using the same includes due to new compiler tunning in the makefile.
6 years ago
Simplifying ARMv8 build parameters ARMv8 builds were a bit mixed up, with ThunderX2 code in ARMv8 mode (which is not right because TX2 is ARMv8.1) as well as requiring a few redundancies in the defines, making it harder to maintain and understand what core has what. A few other minor issues were also fixed. Tests were made on the following cores: A53, A57, A72, Falkor, ThunderX, ThunderX2, and XGene. Tests were: OpenBLAS/test, OpenBLAS/benchmark, BLAS-Tester. A summary: * Removed TX2 code from ARMv8 build, to make sure it is compatible with all ARMv8 cores, not just v8.1. Also, the TX2 code has actually harmed performance on big cores. * Commoned up ARMv8 architectures' defines in params.h, to make sure that all will benefit from ARMv8 settings, in addition to their own. * Adding a few more cores, using ARMv8's include strategy, to benefit from compiler optimisations using mtune. Also updated cache information from the manuals, making sure we set good conservative values by default. Removed Vulcan, as it's an alias to TX2. * Auto-detecting most of those cores, but also updating the forced compilation in getarch.c, to make sure the parameters are the same whether compiled natively or forced arch. Benefits: * ARMv8 build is now guaranteed to work on all ARMv8 cores * Improved performance for ARMv8 builds on some cores (A72, Falkor, ThunderX1 and 2: up to 11%) over current develop * Improved performance for *all* cores comparing to develop branch before TX2's patch (9% ~ 36%) * ThunderX1 builds are 14% faster than ARMv8 on TX1, 9% faster than current develop's branch and 8% faster than deveop before tx2 patches Issues: * Regression from current develop branch for A53 (-12%) and A57 (-3%) with ARMv8 builds, but still faster than before TX2's commit (+15% and +24% respectively). This can be improved with a simplification of TX2's code, to be done in future patches. At least the code is guaranteed to be ARMv8.0 now. 
Comments: * CortexA57 builds are unchanged on A57 hardware from develop's branch, which makes sense, as it's untouched. * CortexA72 builds improve over A57 on A72 hardware, even if they're using the same includes due to new compiler tunning in the makefile.
6 years ago
Simplifying ARMv8 build parameters ARMv8 builds were a bit mixed up, with ThunderX2 code in ARMv8 mode (which is not right because TX2 is ARMv8.1) as well as requiring a few redundancies in the defines, making it harder to maintain and understand what core has what. A few other minor issues were also fixed. Tests were made on the following cores: A53, A57, A72, Falkor, ThunderX, ThunderX2, and XGene. Tests were: OpenBLAS/test, OpenBLAS/benchmark, BLAS-Tester. A summary: * Removed TX2 code from ARMv8 build, to make sure it is compatible with all ARMv8 cores, not just v8.1. Also, the TX2 code has actually harmed performance on big cores. * Commoned up ARMv8 architectures' defines in params.h, to make sure that all will benefit from ARMv8 settings, in addition to their own. * Adding a few more cores, using ARMv8's include strategy, to benefit from compiler optimisations using mtune. Also updated cache information from the manuals, making sure we set good conservative values by default. Removed Vulcan, as it's an alias to TX2. * Auto-detecting most of those cores, but also updating the forced compilation in getarch.c, to make sure the parameters are the same whether compiled natively or forced arch. Benefits: * ARMv8 build is now guaranteed to work on all ARMv8 cores * Improved performance for ARMv8 builds on some cores (A72, Falkor, ThunderX1 and 2: up to 11%) over current develop * Improved performance for *all* cores comparing to develop branch before TX2's patch (9% ~ 36%) * ThunderX1 builds are 14% faster than ARMv8 on TX1, 9% faster than current develop's branch and 8% faster than deveop before tx2 patches Issues: * Regression from current develop branch for A53 (-12%) and A57 (-3%) with ARMv8 builds, but still faster than before TX2's commit (+15% and +24% respectively). This can be improved with a simplification of TX2's code, to be done in future patches. At least the code is guaranteed to be ARMv8.0 now. 
Comments: * CortexA57 builds are unchanged on A57 hardware from develop's branch, which makes sense, as it's untouched. * CortexA72 builds improve over A57 on A72 hardware, even if they're using the same includes due to new compiler tunning in the makefile.
6 years ago
/*****************************************************************************
Copyright (c) 2011-2023, The OpenBLAS Project
All rights reserved.
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are
met:
1. Redistributions of source code must retain the above copyright
notice, this list of conditions and the following disclaimer.
2. Redistributions in binary form must reproduce the above copyright
notice, this list of conditions and the following disclaimer in
the documentation and/or other materials provided with the
distribution.
3. Neither the name of the OpenBLAS project nor the names of
its contributors may be used to endorse or promote products
derived from this software without specific prior written
permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE
LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE
USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
**********************************************************************************/
/*********************************************************************/
/* Copyright 2009, 2010 The University of Texas at Austin. */
/* All rights reserved. */
/* */
/* Redistribution and use in source and binary forms, with or */
/* without modification, are permitted provided that the following */
/* conditions are met: */
/* */
/* 1. Redistributions of source code must retain the above */
/* copyright notice, this list of conditions and the following */
/* disclaimer. */
/* */
/* 2. Redistributions in binary form must reproduce the above */
/* copyright notice, this list of conditions and the following */
/* disclaimer in the documentation and/or other materials */
/* provided with the distribution. */
/* */
/* THIS SOFTWARE IS PROVIDED BY THE UNIVERSITY OF TEXAS AT */
/* AUSTIN ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, */
/* INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF */
/* MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE */
/* DISCLAIMED. IN NO EVENT SHALL THE UNIVERSITY OF TEXAS AT */
/* AUSTIN OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, */
/* INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES */
/* (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE */
/* GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR */
/* BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF */
/* LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT */
/* (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT */
/* OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE */
/* POSSIBILITY OF SUCH DAMAGE. */
/* */
/* The views and conclusions contained in the software and */
/* documentation are those of the authors and should not be */
/* interpreted as representing official policies, either expressed */
/* or implied, of The University of Texas at Austin. */
/*********************************************************************/
#ifndef PARAM_H
#define PARAM_H
#define SHGEMM_DEFAULT_UNROLL_N 8
#define SHGEMM_DEFAULT_UNROLL_M 8
#define SHGEMM_DEFAULT_P 128
#define SHGEMM_DEFAULT_R 240
#define SHGEMM_DEFAULT_Q 12288
#define SBGEMM_DEFAULT_UNROLL_N 4
#define SBGEMM_DEFAULT_UNROLL_M 8
#define SBGEMM_DEFAULT_UNROLL_MN 32
#define SBGEMM_DEFAULT_P 256
#define SBGEMM_DEFAULT_R 256
#define SBGEMM_DEFAULT_Q 256
#define SBGEMM_ALIGN_K 1 // must be 2^x
#ifdef OPTERON
#define SNUMOPT 4
#define DNUMOPT 2
#define GEMM_DEFAULT_OFFSET_A 64
#define GEMM_DEFAULT_OFFSET_B 256
#define GEMM_DEFAULT_ALIGN (BLASLONG)0x01ffffUL
#define SGEMM_DEFAULT_UNROLL_N 4
#define DGEMM_DEFAULT_UNROLL_N 4
#define QGEMM_DEFAULT_UNROLL_N 2
#define CGEMM_DEFAULT_UNROLL_N 2
#define ZGEMM_DEFAULT_UNROLL_N 2
#define XGEMM_DEFAULT_UNROLL_N 1
#ifdef ARCH_X86
#define SGEMM_DEFAULT_UNROLL_M 4
#define DGEMM_DEFAULT_UNROLL_M 2
#define QGEMM_DEFAULT_UNROLL_M 2
#define CGEMM_DEFAULT_UNROLL_M 2
#define ZGEMM_DEFAULT_UNROLL_M 1
#define XGEMM_DEFAULT_UNROLL_M 1
#else
#define SGEMM_DEFAULT_UNROLL_M 8
#define DGEMM_DEFAULT_UNROLL_M 4
#define QGEMM_DEFAULT_UNROLL_M 2
#define CGEMM_DEFAULT_UNROLL_M 4
#define ZGEMM_DEFAULT_UNROLL_M 2
#define XGEMM_DEFAULT_UNROLL_M 1
#endif
#define SGEMM_DEFAULT_P sgemm_p
#define DGEMM_DEFAULT_P dgemm_p
#define QGEMM_DEFAULT_P qgemm_p
#define CGEMM_DEFAULT_P cgemm_p
#define ZGEMM_DEFAULT_P zgemm_p
#define XGEMM_DEFAULT_P xgemm_p
#define SGEMM_DEFAULT_R sgemm_r
#define DGEMM_DEFAULT_R dgemm_r
#define QGEMM_DEFAULT_R qgemm_r
#define CGEMM_DEFAULT_R cgemm_r
#define ZGEMM_DEFAULT_R zgemm_r
#define XGEMM_DEFAULT_R xgemm_r
#ifdef ALLOC_HUGETLB
#define SGEMM_DEFAULT_Q 248
#define DGEMM_DEFAULT_Q 248
#define QGEMM_DEFAULT_Q 248
#define CGEMM_DEFAULT_Q 248
#define ZGEMM_DEFAULT_Q 248
#define XGEMM_DEFAULT_Q 248
#else
#define SGEMM_DEFAULT_Q 240
#define DGEMM_DEFAULT_Q 240
#define QGEMM_DEFAULT_Q 240
#define CGEMM_DEFAULT_Q 240
#define ZGEMM_DEFAULT_Q 240
#define XGEMM_DEFAULT_Q 240
#endif
#define SYMV_P 16
#define HAVE_EXCLUSIVE_CACHE
#endif
#if defined(BARCELONA) || defined(SHANGHAI) || defined(BOBCAT)
#define SNUMOPT 8
#define DNUMOPT 4
#define GEMM_DEFAULT_OFFSET_A 64
#define GEMM_DEFAULT_OFFSET_B 832
#define GEMM_DEFAULT_ALIGN (BLASLONG)0x0fffUL
#define SGEMM_DEFAULT_UNROLL_N 4
#define DGEMM_DEFAULT_UNROLL_N 4
#define QGEMM_DEFAULT_UNROLL_N 2
#define CGEMM_DEFAULT_UNROLL_N 2
#define ZGEMM_DEFAULT_UNROLL_N 2
#define XGEMM_DEFAULT_UNROLL_N 1
#ifdef ARCH_X86
#define SGEMM_DEFAULT_UNROLL_M 4
#define DGEMM_DEFAULT_UNROLL_M 2
#define QGEMM_DEFAULT_UNROLL_M 2
#define CGEMM_DEFAULT_UNROLL_M 2
#define ZGEMM_DEFAULT_UNROLL_M 1
#define XGEMM_DEFAULT_UNROLL_M 1
#else
#define SGEMM_DEFAULT_UNROLL_M 8
#define DGEMM_DEFAULT_UNROLL_M 4
#define QGEMM_DEFAULT_UNROLL_M 2
#define CGEMM_DEFAULT_UNROLL_M 4
#define ZGEMM_DEFAULT_UNROLL_M 2
#define XGEMM_DEFAULT_UNROLL_M 1
#endif
#if 0
#define SGEMM_DEFAULT_P 496
#define DGEMM_DEFAULT_P 248
#define QGEMM_DEFAULT_P 124
#define CGEMM_DEFAULT_P 248
#define ZGEMM_DEFAULT_P 124
#define XGEMM_DEFAULT_P 62
#define SGEMM_DEFAULT_Q 248
#define DGEMM_DEFAULT_Q 248
#define QGEMM_DEFAULT_Q 248
#define CGEMM_DEFAULT_Q 248
#define ZGEMM_DEFAULT_Q 248
#define XGEMM_DEFAULT_Q 248
#else
#define SGEMM_DEFAULT_P 448
#define DGEMM_DEFAULT_P 224
#define QGEMM_DEFAULT_P 112
#define CGEMM_DEFAULT_P 224
#define ZGEMM_DEFAULT_P 112
#define XGEMM_DEFAULT_P 56
#define SGEMM_DEFAULT_Q 224
#define DGEMM_DEFAULT_Q 224
#define QGEMM_DEFAULT_Q 224
#define CGEMM_DEFAULT_Q 224
#define ZGEMM_DEFAULT_Q 224
#define XGEMM_DEFAULT_Q 224
#endif
#define SGEMM_DEFAULT_R sgemm_r
#define QGEMM_DEFAULT_R qgemm_r
#define DGEMM_DEFAULT_R dgemm_r
#define CGEMM_DEFAULT_R cgemm_r
#define ZGEMM_DEFAULT_R zgemm_r
#define XGEMM_DEFAULT_R xgemm_r
#define SYMV_P 16
#define HAVE_EXCLUSIVE_CACHE
#define GEMM_THREAD gemm_thread_mn
#endif
#ifdef BULLDOZER
#define SNUMOPT 8
#define DNUMOPT 4
#define GEMM_DEFAULT_OFFSET_A 64
#define GEMM_DEFAULT_OFFSET_B 832
#define GEMM_DEFAULT_ALIGN (BLASLONG)0x0fffUL
#define QGEMM_DEFAULT_UNROLL_N 2
#define CGEMM_DEFAULT_UNROLL_N 2
#define ZGEMM_DEFAULT_UNROLL_N 2
#define XGEMM_DEFAULT_UNROLL_N 1
#ifdef ARCH_X86
#define SGEMM_DEFAULT_UNROLL_N 4
#define DGEMM_DEFAULT_UNROLL_N 4
#define SGEMM_DEFAULT_UNROLL_M 4
#define DGEMM_DEFAULT_UNROLL_M 2
#define QGEMM_DEFAULT_UNROLL_M 2
#define CGEMM_DEFAULT_UNROLL_M 2
#define ZGEMM_DEFAULT_UNROLL_M 1
#define XGEMM_DEFAULT_UNROLL_M 1
#else
#define SGEMM_DEFAULT_UNROLL_N 2
#define DGEMM_DEFAULT_UNROLL_N 2
#define SGEMM_DEFAULT_UNROLL_M 16
#define DGEMM_DEFAULT_UNROLL_M 8
#define QGEMM_DEFAULT_UNROLL_M 2
#define CGEMM_DEFAULT_UNROLL_M 4
#define ZGEMM_DEFAULT_UNROLL_M 2
#define XGEMM_DEFAULT_UNROLL_M 1
#define CGEMM3M_DEFAULT_UNROLL_N 4
#define CGEMM3M_DEFAULT_UNROLL_M 8
#define ZGEMM3M_DEFAULT_UNROLL_N 4
#define ZGEMM3M_DEFAULT_UNROLL_M 4
#define DGEMM_DEFAULT_UNROLL_MN 16
#define GEMV_UNROLL 8
#endif
#if defined(ARCH_X86_64)
#define SGEMM_DEFAULT_P 768
#define DGEMM_DEFAULT_P 384
#else
#define SGEMM_DEFAULT_P 448
#define DGEMM_DEFAULT_P 224
#endif
#define QGEMM_DEFAULT_P 112
#define CGEMM_DEFAULT_P 224
#define ZGEMM_DEFAULT_P 112
#define XGEMM_DEFAULT_P 56
#if defined(ARCH_X86_64)
#define SGEMM_DEFAULT_Q 168
#define DGEMM_DEFAULT_Q 168
#else
#define SGEMM_DEFAULT_Q 224
#define DGEMM_DEFAULT_Q 224
#endif
#define QGEMM_DEFAULT_Q 224
#define CGEMM_DEFAULT_Q 224
#define ZGEMM_DEFAULT_Q 224
#define XGEMM_DEFAULT_Q 224
#define CGEMM3M_DEFAULT_P 448
#define ZGEMM3M_DEFAULT_P 224
#define XGEMM3M_DEFAULT_P 112
#define CGEMM3M_DEFAULT_Q 224
#define ZGEMM3M_DEFAULT_Q 224
#define XGEMM3M_DEFAULT_Q 224
#define CGEMM3M_DEFAULT_R 12288
#define ZGEMM3M_DEFAULT_R 12288
#define XGEMM3M_DEFAULT_R 12288
#define SGEMM_DEFAULT_R sgemm_r
#define QGEMM_DEFAULT_R qgemm_r
#define DGEMM_DEFAULT_R dgemm_r
#define CGEMM_DEFAULT_R cgemm_r
#define ZGEMM_DEFAULT_R zgemm_r
#define XGEMM_DEFAULT_R xgemm_r
#define SYMV_P 16
#define HAVE_EXCLUSIVE_CACHE
#define GEMM_THREAD gemm_thread_mn
#endif
#ifdef PILEDRIVER
#define SNUMOPT 8
#define DNUMOPT 4
#define GEMM_DEFAULT_OFFSET_A 64
#define GEMM_DEFAULT_OFFSET_B 832
#define GEMM_DEFAULT_ALIGN (BLASLONG)0x0fffUL
#define QGEMM_DEFAULT_UNROLL_N 2
#define CGEMM_DEFAULT_UNROLL_N 2
#define ZGEMM_DEFAULT_UNROLL_N 2
#define XGEMM_DEFAULT_UNROLL_N 1
#ifdef ARCH_X86
#define SGEMM_DEFAULT_UNROLL_N 4
#define DGEMM_DEFAULT_UNROLL_N 4
#define SGEMM_DEFAULT_UNROLL_M 4
#define DGEMM_DEFAULT_UNROLL_M 2
#define QGEMM_DEFAULT_UNROLL_M 2
#define CGEMM_DEFAULT_UNROLL_M 2
#define ZGEMM_DEFAULT_UNROLL_M 1
#define XGEMM_DEFAULT_UNROLL_M 1
#else
#define SGEMM_DEFAULT_UNROLL_N 2
#define DGEMM_DEFAULT_UNROLL_N 2
#define SGEMM_DEFAULT_UNROLL_M 16
#define DGEMM_DEFAULT_UNROLL_M 8
#define QGEMM_DEFAULT_UNROLL_M 2
#define CGEMM_DEFAULT_UNROLL_M 4
#define ZGEMM_DEFAULT_UNROLL_M 2
#define XGEMM_DEFAULT_UNROLL_M 1
#define CGEMM3M_DEFAULT_UNROLL_N 4
#define CGEMM3M_DEFAULT_UNROLL_M 8
#define ZGEMM3M_DEFAULT_UNROLL_N 4
#define ZGEMM3M_DEFAULT_UNROLL_M 4
#define GEMV_UNROLL 8
#endif
#if defined(ARCH_X86_64)
#define SGEMM_DEFAULT_P 768
#define DGEMM_DEFAULT_P 768
#define ZGEMM_DEFAULT_P 384
#define CGEMM_DEFAULT_P 768
#else
#define SGEMM_DEFAULT_P 448
#define DGEMM_DEFAULT_P 480
#define ZGEMM_DEFAULT_P 112
#define CGEMM_DEFAULT_P 224
#endif
#define QGEMM_DEFAULT_P 112
#define XGEMM_DEFAULT_P 56
#if defined(ARCH_X86_64)
#define SGEMM_DEFAULT_Q 192
#define DGEMM_DEFAULT_Q 168
#define ZGEMM_DEFAULT_Q 168
#define CGEMM_DEFAULT_Q 168
#else
#define SGEMM_DEFAULT_Q 224
#define DGEMM_DEFAULT_Q 224
#define ZGEMM_DEFAULT_Q 224
#define CGEMM_DEFAULT_Q 224
#endif
#define QGEMM_DEFAULT_Q 224
#define XGEMM_DEFAULT_Q 224
#define CGEMM3M_DEFAULT_P 448
#define ZGEMM3M_DEFAULT_P 224
#define XGEMM3M_DEFAULT_P 112
#define CGEMM3M_DEFAULT_Q 224
#define ZGEMM3M_DEFAULT_Q 224
#define XGEMM3M_DEFAULT_Q 224
#define CGEMM3M_DEFAULT_R 12288
#define ZGEMM3M_DEFAULT_R 12288
#define XGEMM3M_DEFAULT_R 12288
#define SGEMM_DEFAULT_R 12288
#define QGEMM_DEFAULT_R qgemm_r
#define DGEMM_DEFAULT_R 12288
#define CGEMM_DEFAULT_R cgemm_r
#define ZGEMM_DEFAULT_R zgemm_r
#define XGEMM_DEFAULT_R xgemm_r
#define SYMV_P 16
#define HAVE_EXCLUSIVE_CACHE
#define GEMM_THREAD gemm_thread_mn
#endif
#ifdef STEAMROLLER
#define SNUMOPT 8
#define DNUMOPT 4
#define GEMM_DEFAULT_OFFSET_A 64
#define GEMM_DEFAULT_OFFSET_B 832
#define GEMM_DEFAULT_ALIGN (BLASLONG)0x0fffUL
#define QGEMM_DEFAULT_UNROLL_N 2
#define CGEMM_DEFAULT_UNROLL_N 2
#define ZGEMM_DEFAULT_UNROLL_N 2
#define XGEMM_DEFAULT_UNROLL_N 1
#ifdef ARCH_X86
#define SGEMM_DEFAULT_UNROLL_N 4
#define DGEMM_DEFAULT_UNROLL_N 4
#define SGEMM_DEFAULT_UNROLL_M 4
#define DGEMM_DEFAULT_UNROLL_M 2
#define QGEMM_DEFAULT_UNROLL_M 2
#define CGEMM_DEFAULT_UNROLL_M 2
#define ZGEMM_DEFAULT_UNROLL_M 1
#define XGEMM_DEFAULT_UNROLL_M 1
#else
#define SGEMM_DEFAULT_UNROLL_N 2
#define DGEMM_DEFAULT_UNROLL_N 2
#define SGEMM_DEFAULT_UNROLL_M 16
#define DGEMM_DEFAULT_UNROLL_M 8
#define QGEMM_DEFAULT_UNROLL_M 2
#define CGEMM_DEFAULT_UNROLL_M 4
#define ZGEMM_DEFAULT_UNROLL_M 2
#define XGEMM_DEFAULT_UNROLL_M 1
#define CGEMM3M_DEFAULT_UNROLL_N 4
#define CGEMM3M_DEFAULT_UNROLL_M 8
#define ZGEMM3M_DEFAULT_UNROLL_N 4
#define ZGEMM3M_DEFAULT_UNROLL_M 4
#define GEMV_UNROLL 8
#endif
#if defined(ARCH_X86_64)
#define SGEMM_DEFAULT_P 768
#define DGEMM_DEFAULT_P 576
#define ZGEMM_DEFAULT_P 288
#define CGEMM_DEFAULT_P 576
#else
#define SGEMM_DEFAULT_P 448
#define DGEMM_DEFAULT_P 480
#define ZGEMM_DEFAULT_P 112
#define CGEMM_DEFAULT_P 224
#endif
#define QGEMM_DEFAULT_P 112
#define XGEMM_DEFAULT_P 56
#if defined(ARCH_X86_64)
#define SGEMM_DEFAULT_Q 192
#define DGEMM_DEFAULT_Q 160
#define ZGEMM_DEFAULT_Q 160
#define CGEMM_DEFAULT_Q 160
#else
#define SGEMM_DEFAULT_Q 224
#define DGEMM_DEFAULT_Q 224
#define ZGEMM_DEFAULT_Q 224
#define CGEMM_DEFAULT_Q 224
#endif
#define QGEMM_DEFAULT_Q 224
#define XGEMM_DEFAULT_Q 224
#define CGEMM3M_DEFAULT_P 448
#define ZGEMM3M_DEFAULT_P 224
#define XGEMM3M_DEFAULT_P 112
#define CGEMM3M_DEFAULT_Q 224
#define ZGEMM3M_DEFAULT_Q 224
#define XGEMM3M_DEFAULT_Q 224
#define CGEMM3M_DEFAULT_R 12288
#define ZGEMM3M_DEFAULT_R 12288
#define XGEMM3M_DEFAULT_R 12288
#define SGEMM_DEFAULT_R 12288
#define QGEMM_DEFAULT_R qgemm_r
#define DGEMM_DEFAULT_R 12288
#define CGEMM_DEFAULT_R cgemm_r
#define ZGEMM_DEFAULT_R zgemm_r
#define XGEMM_DEFAULT_R xgemm_r
#define SYMV_P 16
#define HAVE_EXCLUSIVE_CACHE
#define GEMM_THREAD gemm_thread_mn
#endif
#ifdef EXCAVATOR
#define SNUMOPT 8
#define DNUMOPT 4
#define GEMM_DEFAULT_OFFSET_A 64
#define GEMM_DEFAULT_OFFSET_B 832
#define GEMM_DEFAULT_ALIGN (BLASLONG)0x0fffUL
#define QGEMM_DEFAULT_UNROLL_N 2
#define CGEMM_DEFAULT_UNROLL_N 2
#define ZGEMM_DEFAULT_UNROLL_N 2
#define XGEMM_DEFAULT_UNROLL_N 1
#ifdef ARCH_X86
#define SGEMM_DEFAULT_UNROLL_N 4
#define DGEMM_DEFAULT_UNROLL_N 4
#define SGEMM_DEFAULT_UNROLL_M 4
#define DGEMM_DEFAULT_UNROLL_M 2
#define QGEMM_DEFAULT_UNROLL_M 2
#define CGEMM_DEFAULT_UNROLL_M 2
#define ZGEMM_DEFAULT_UNROLL_M 1
#define XGEMM_DEFAULT_UNROLL_M 1
#else
#define SGEMM_DEFAULT_UNROLL_N 2
#define DGEMM_DEFAULT_UNROLL_N 2
#define SGEMM_DEFAULT_UNROLL_M 16
#define DGEMM_DEFAULT_UNROLL_M 8
#define QGEMM_DEFAULT_UNROLL_M 2
#define CGEMM_DEFAULT_UNROLL_M 4
#define ZGEMM_DEFAULT_UNROLL_M 2
#define XGEMM_DEFAULT_UNROLL_M 1
#define CGEMM3M_DEFAULT_UNROLL_N 4
#define CGEMM3M_DEFAULT_UNROLL_M 8
#define ZGEMM3M_DEFAULT_UNROLL_N 4
#define ZGEMM3M_DEFAULT_UNROLL_M 4
#define GEMV_UNROLL 8
#endif
#if defined(ARCH_X86_64)
#define SGEMM_DEFAULT_P 768
#define DGEMM_DEFAULT_P 576
#define ZGEMM_DEFAULT_P 288
#define CGEMM_DEFAULT_P 576
#else
#define SGEMM_DEFAULT_P 448
#define DGEMM_DEFAULT_P 480
#define ZGEMM_DEFAULT_P 112
#define CGEMM_DEFAULT_P 224
#endif
#define QGEMM_DEFAULT_P 112
#define XGEMM_DEFAULT_P 56
#if defined(ARCH_X86_64)
#define SGEMM_DEFAULT_Q 192
#define DGEMM_DEFAULT_Q 160
#define ZGEMM_DEFAULT_Q 160
#define CGEMM_DEFAULT_Q 160
#else
#define SGEMM_DEFAULT_Q 224
#define DGEMM_DEFAULT_Q 224
#define ZGEMM_DEFAULT_Q 224
#define CGEMM_DEFAULT_Q 224
#endif
#define QGEMM_DEFAULT_Q 224
#define XGEMM_DEFAULT_Q 224
#define CGEMM3M_DEFAULT_P 448
#define ZGEMM3M_DEFAULT_P 224
#define XGEMM3M_DEFAULT_P 112
#define CGEMM3M_DEFAULT_Q 224
#define ZGEMM3M_DEFAULT_Q 224
#define XGEMM3M_DEFAULT_Q 224
#define CGEMM3M_DEFAULT_R 12288
#define ZGEMM3M_DEFAULT_R 12288
#define XGEMM3M_DEFAULT_R 12288
#define SGEMM_DEFAULT_R 12288
#define QGEMM_DEFAULT_R qgemm_r
#define DGEMM_DEFAULT_R 12288
#define CGEMM_DEFAULT_R cgemm_r
#define ZGEMM_DEFAULT_R zgemm_r
#define XGEMM_DEFAULT_R xgemm_r
#define SYMV_P 16
#define HAVE_EXCLUSIVE_CACHE
#define GEMM_THREAD gemm_thread_mn
#endif
#ifdef ZEN
#define SNUMOPT 16
#define DNUMOPT 8
#define GEMM_DEFAULT_OFFSET_A 0
#define GEMM_DEFAULT_OFFSET_B 0
#define GEMM_DEFAULT_ALIGN (BLASLONG)0x03fffUL
#define SYMV_P 8
#if defined(XDOUBLE) || defined(DOUBLE)
#define SWITCH_RATIO 4
#define GEMM_PREFERED_SIZE 4
#else
#define SWITCH_RATIO 8
#define GEMM_PREFERED_SIZE 8
#endif
#ifdef ARCH_X86
#define SGEMM_DEFAULT_UNROLL_M 4
#define DGEMM_DEFAULT_UNROLL_M 2
#define QGEMM_DEFAULT_UNROLL_M 2
#define CGEMM_DEFAULT_UNROLL_M 2
#define ZGEMM_DEFAULT_UNROLL_M 1
#define XGEMM_DEFAULT_UNROLL_M 1
#define SGEMM_DEFAULT_UNROLL_N 4
#define DGEMM_DEFAULT_UNROLL_N 4
#define QGEMM_DEFAULT_UNROLL_N 2
#define CGEMM_DEFAULT_UNROLL_N 2
#define ZGEMM_DEFAULT_UNROLL_N 2
#define XGEMM_DEFAULT_UNROLL_N 1
#else
#define SGEMM_DEFAULT_UNROLL_M 8
#define DGEMM_DEFAULT_UNROLL_M 4
#define QGEMM_DEFAULT_UNROLL_M 2
#define CGEMM_DEFAULT_UNROLL_M 8
#define ZGEMM_DEFAULT_UNROLL_M 4
#define XGEMM_DEFAULT_UNROLL_M 1
#define SGEMM_DEFAULT_UNROLL_N 4
#define DGEMM_DEFAULT_UNROLL_N 8
#define QGEMM_DEFAULT_UNROLL_N 2
#define CGEMM_DEFAULT_UNROLL_N 2
#define ZGEMM_DEFAULT_UNROLL_N 2
#define XGEMM_DEFAULT_UNROLL_N 1
/*
#define SGEMM_DEFAULT_UNROLL_MN 32
#define DGEMM_DEFAULT_UNROLL_MN 32
*/
#endif
#ifdef ARCH_X86
#define SGEMM_DEFAULT_P 512
#define SGEMM_DEFAULT_R sgemm_r
#define DGEMM_DEFAULT_P 512
#define DGEMM_DEFAULT_R dgemm_r
#define QGEMM_DEFAULT_P 504
#define QGEMM_DEFAULT_R qgemm_r
#define CGEMM_DEFAULT_P 128
#define CGEMM_DEFAULT_R 1024
#define ZGEMM_DEFAULT_P 512
#define ZGEMM_DEFAULT_R zgemm_r
#define XGEMM_DEFAULT_P 252
#define XGEMM_DEFAULT_R xgemm_r
#define SGEMM_DEFAULT_Q 256
#define DGEMM_DEFAULT_Q 256
#define QGEMM_DEFAULT_Q 128
#define CGEMM_DEFAULT_Q 256
#define ZGEMM_DEFAULT_Q 192
#define XGEMM_DEFAULT_Q 128
#else
#define SGEMM_DEFAULT_P 320
#define DGEMM_DEFAULT_P 512
#define CGEMM_DEFAULT_P 256
#define ZGEMM_DEFAULT_P 192
#ifdef WINDOWS_ABI
#define SGEMM_DEFAULT_Q 320
#define DGEMM_DEFAULT_Q 128
#else
#define SGEMM_DEFAULT_Q 320
#define DGEMM_DEFAULT_Q 256
#endif
#define CGEMM_DEFAULT_Q 256
#define ZGEMM_DEFAULT_Q 192
#define SGEMM_DEFAULT_R sgemm_r
#define DGEMM_DEFAULT_R 13824
#define CGEMM_DEFAULT_R cgemm_r
#define ZGEMM_DEFAULT_R zgemm_r
#define QGEMM_DEFAULT_Q 128
#define QGEMM_DEFAULT_P 504
#define QGEMM_DEFAULT_R qgemm_r
#define XGEMM_DEFAULT_P 252
#define XGEMM_DEFAULT_R xgemm_r
#define XGEMM_DEFAULT_Q 128
#define CGEMM3M_DEFAULT_UNROLL_N 4
#define CGEMM3M_DEFAULT_UNROLL_M 8
#define ZGEMM3M_DEFAULT_UNROLL_N 4
#define ZGEMM3M_DEFAULT_UNROLL_M 4
#define CGEMM3M_DEFAULT_P 320
#define ZGEMM3M_DEFAULT_P 256
#define XGEMM3M_DEFAULT_P 112
#define CGEMM3M_DEFAULT_Q 320
#define ZGEMM3M_DEFAULT_Q 256
#define XGEMM3M_DEFAULT_Q 224
#define CGEMM3M_DEFAULT_R 12288
#define ZGEMM3M_DEFAULT_R 12288
#define XGEMM3M_DEFAULT_R 12288
#endif
#endif
#ifdef ATHLON
#define SNUMOPT 4
#define DNUMOPT 2
#define GEMM_DEFAULT_OFFSET_A 0
#define GEMM_DEFAULT_OFFSET_B 384
#define GEMM_DEFAULT_ALIGN (BLASLONG)0x0ffffUL
#define SGEMM_DEFAULT_UNROLL_N 4
#define DGEMM_DEFAULT_UNROLL_N 4
#define QGEMM_DEFAULT_UNROLL_N 2
#define CGEMM_DEFAULT_UNROLL_N 2
#define ZGEMM_DEFAULT_UNROLL_N 2
#define XGEMM_DEFAULT_UNROLL_N 1
#define SGEMM_DEFAULT_UNROLL_M 2
#define DGEMM_DEFAULT_UNROLL_M 1
#define QGEMM_DEFAULT_UNROLL_M 2
#define CGEMM_DEFAULT_UNROLL_M 1
#define ZGEMM_DEFAULT_UNROLL_M 1
#define XGEMM_DEFAULT_UNROLL_M 1
#define SGEMM_DEFAULT_R sgemm_r
#define DGEMM_DEFAULT_R dgemm_r
#define QGEMM_DEFAULT_R qgemm_r
#define CGEMM_DEFAULT_R cgemm_r
#define ZGEMM_DEFAULT_R zgemm_r
#define XGEMM_DEFAULT_R xgemm_r
#define SGEMM_DEFAULT_P 208
#define DGEMM_DEFAULT_P 104
#define QGEMM_DEFAULT_P 56
#define CGEMM_DEFAULT_P 104
#define ZGEMM_DEFAULT_P 56
#define XGEMM_DEFAULT_P 28
#define SGEMM_DEFAULT_Q 208
#define DGEMM_DEFAULT_Q 208
#define QGEMM_DEFAULT_Q 208
#define CGEMM_DEFAULT_Q 208
#define ZGEMM_DEFAULT_Q 208
#define XGEMM_DEFAULT_Q 208
#define SYMV_P 16
#define HAVE_EXCLUSIVE_CACHE
#endif
#ifdef VIAC3
#define SNUMOPT 2
#define DNUMOPT 1
#define GEMM_DEFAULT_OFFSET_A 0
#define GEMM_DEFAULT_OFFSET_B 256
#define GEMM_DEFAULT_ALIGN (BLASLONG)0x0ffffUL
#define SGEMM_DEFAULT_UNROLL_N 4
#define DGEMM_DEFAULT_UNROLL_N 4
#define QGEMM_DEFAULT_UNROLL_N 2
#define CGEMM_DEFAULT_UNROLL_N 2
#define ZGEMM_DEFAULT_UNROLL_N 2
#define XGEMM_DEFAULT_UNROLL_N 1
#define SGEMM_DEFAULT_UNROLL_M 2
#define DGEMM_DEFAULT_UNROLL_M 1
#define QGEMM_DEFAULT_UNROLL_M 2
#define CGEMM_DEFAULT_UNROLL_M 1
#define ZGEMM_DEFAULT_UNROLL_M 1
#define XGEMM_DEFAULT_UNROLL_M 1
#define SGEMM_DEFAULT_R sgemm_r
#define DGEMM_DEFAULT_R dgemm_r
#define QGEMM_DEFAULT_R qgemm_r
#define CGEMM_DEFAULT_R cgemm_r
#define ZGEMM_DEFAULT_R zgemm_r
#define XGEMM_DEFAULT_R xgemm_r
#define SGEMM_DEFAULT_P 128
#define DGEMM_DEFAULT_P 128
#define QGEMM_DEFAULT_P 128
#define CGEMM_DEFAULT_P 128
#define ZGEMM_DEFAULT_P 128
#define XGEMM_DEFAULT_P 128
#define SGEMM_DEFAULT_Q 512
#define DGEMM_DEFAULT_Q 256
#define QGEMM_DEFAULT_Q 256
#define CGEMM_DEFAULT_Q 256
#define ZGEMM_DEFAULT_Q 128
#define XGEMM_DEFAULT_Q 128
#define SYMV_P 16
#endif
#ifdef NANO
#define SNUMOPT 4
#define DNUMOPT 2
#define GEMM_DEFAULT_OFFSET_A 64
#define GEMM_DEFAULT_OFFSET_B 256
#define GEMM_DEFAULT_ALIGN (BLASLONG)0x01ffffUL
#ifdef ARCH_X86
#define SGEMM_DEFAULT_UNROLL_N 4
#define DGEMM_DEFAULT_UNROLL_N 4
#define QGEMM_DEFAULT_UNROLL_N 2
#define CGEMM_DEFAULT_UNROLL_N 2
#define ZGEMM_DEFAULT_UNROLL_N 2
#define XGEMM_DEFAULT_UNROLL_N 1
#define SGEMM_DEFAULT_UNROLL_M 4
#define DGEMM_DEFAULT_UNROLL_M 2
#define QGEMM_DEFAULT_UNROLL_M 2
#define CGEMM_DEFAULT_UNROLL_M 2
#define ZGEMM_DEFAULT_UNROLL_M 1
#define XGEMM_DEFAULT_UNROLL_M 1
#else
#define SGEMM_DEFAULT_UNROLL_N 8
#define DGEMM_DEFAULT_UNROLL_N 4
#define QGEMM_DEFAULT_UNROLL_N 2
#define CGEMM_DEFAULT_UNROLL_N 4
#define ZGEMM_DEFAULT_UNROLL_N 2
#define XGEMM_DEFAULT_UNROLL_N 1
#define SGEMM_DEFAULT_UNROLL_M 4
#define DGEMM_DEFAULT_UNROLL_M 4
#define QGEMM_DEFAULT_UNROLL_M 2
#define CGEMM_DEFAULT_UNROLL_M 2
#define ZGEMM_DEFAULT_UNROLL_M 2
#define XGEMM_DEFAULT_UNROLL_M 1
#endif
#define SGEMM_DEFAULT_P 288
#define DGEMM_DEFAULT_P 288
#define QGEMM_DEFAULT_P 288
#define CGEMM_DEFAULT_P 288
#define ZGEMM_DEFAULT_P 288
#define XGEMM_DEFAULT_P 288
#define SGEMM_DEFAULT_R sgemm_r
#define DGEMM_DEFAULT_R dgemm_r
#define QGEMM_DEFAULT_R qgemm_r
#define CGEMM_DEFAULT_R cgemm_r
#define ZGEMM_DEFAULT_R zgemm_r
#define XGEMM_DEFAULT_R xgemm_r
#define SGEMM_DEFAULT_Q 256
#define DGEMM_DEFAULT_Q 128
#define QGEMM_DEFAULT_Q 64
#define CGEMM_DEFAULT_Q 128
#define ZGEMM_DEFAULT_Q 64
#define XGEMM_DEFAULT_Q 32
#define SYMV_P 16
#define HAVE_EXCLUSIVE_CACHE
#endif
#if defined(PENTIUM) || defined(PENTIUM2) || defined(PENTIUM3)
#ifdef HAVE_SSE
#define SNUMOPT 2
#else
#define SNUMOPT 1
#endif
#define DNUMOPT 1
#define GEMM_DEFAULT_OFFSET_A 0
#define GEMM_DEFAULT_OFFSET_B 0
#define GEMM_DEFAULT_ALIGN (BLASLONG)0x0ffffUL
#ifdef HAVE_SSE
#define SGEMM_DEFAULT_UNROLL_M 8
#define CGEMM_DEFAULT_UNROLL_M 4
#else
#define SGEMM_DEFAULT_UNROLL_M 4
#define CGEMM_DEFAULT_UNROLL_M 2
#endif
#define DGEMM_DEFAULT_UNROLL_M 2
#define SGEMM_DEFAULT_UNROLL_N 2
#define DGEMM_DEFAULT_UNROLL_N 2
#define QGEMM_DEFAULT_UNROLL_M 2
#define QGEMM_DEFAULT_UNROLL_N 2
#define CGEMM_DEFAULT_UNROLL_N 1
#define ZGEMM_DEFAULT_UNROLL_M 1
#define ZGEMM_DEFAULT_UNROLL_N 1
#define XGEMM_DEFAULT_UNROLL_M 1
#define XGEMM_DEFAULT_UNROLL_N 1
#define SGEMM_DEFAULT_P sgemm_p
#define SGEMM_DEFAULT_Q 256
#define SGEMM_DEFAULT_R sgemm_r
#define DGEMM_DEFAULT_P dgemm_p
#define DGEMM_DEFAULT_Q 256
#define DGEMM_DEFAULT_R dgemm_r
#define QGEMM_DEFAULT_P qgemm_p
#define QGEMM_DEFAULT_Q 256
#define QGEMM_DEFAULT_R qgemm_r
#define CGEMM_DEFAULT_P cgemm_p
#define CGEMM_DEFAULT_Q 256
#define CGEMM_DEFAULT_R cgemm_r
#define ZGEMM_DEFAULT_P zgemm_p
#define ZGEMM_DEFAULT_Q 256
#define ZGEMM_DEFAULT_R zgemm_r
#define XGEMM_DEFAULT_P xgemm_p
#define XGEMM_DEFAULT_Q 256
#define XGEMM_DEFAULT_R xgemm_r
#define SYMV_P 4
#endif
#ifdef PENTIUMM
#define SNUMOPT 2
#define DNUMOPT 1
#define GEMM_DEFAULT_OFFSET_A 0
#define GEMM_DEFAULT_OFFSET_B 0
#define GEMM_DEFAULT_ALIGN (BLASLONG)0x0ffffUL
#ifdef CORE_YONAH
#define SGEMM_DEFAULT_UNROLL_M 4
#define SGEMM_DEFAULT_UNROLL_N 4
#define DGEMM_DEFAULT_UNROLL_M 2
#define DGEMM_DEFAULT_UNROLL_N 4
#define QGEMM_DEFAULT_UNROLL_M 2
#define QGEMM_DEFAULT_UNROLL_N 2
#define CGEMM_DEFAULT_UNROLL_M 2
#define CGEMM_DEFAULT_UNROLL_N 2
#define ZGEMM_DEFAULT_UNROLL_M 1
#define ZGEMM_DEFAULT_UNROLL_N 2
#define XGEMM_DEFAULT_UNROLL_M 1
#define XGEMM_DEFAULT_UNROLL_N 1
#else
#define SGEMM_DEFAULT_UNROLL_M 8
#define SGEMM_DEFAULT_UNROLL_N 2
#define DGEMM_DEFAULT_UNROLL_M 2
#define DGEMM_DEFAULT_UNROLL_N 2
#define QGEMM_DEFAULT_UNROLL_M 2
#define QGEMM_DEFAULT_UNROLL_N 2
#define CGEMM_DEFAULT_UNROLL_M 4
#define CGEMM_DEFAULT_UNROLL_N 1
#define ZGEMM_DEFAULT_UNROLL_M 1
#define ZGEMM_DEFAULT_UNROLL_N 1
#define XGEMM_DEFAULT_UNROLL_M 1
#define XGEMM_DEFAULT_UNROLL_N 1
#endif
#define SGEMM_DEFAULT_P sgemm_p
#define SGEMM_DEFAULT_Q 256
#define SGEMM_DEFAULT_R sgemm_r
#define DGEMM_DEFAULT_P dgemm_p
#define DGEMM_DEFAULT_Q 256
#define DGEMM_DEFAULT_R dgemm_r
#define QGEMM_DEFAULT_P qgemm_p
#define QGEMM_DEFAULT_Q 256
#define QGEMM_DEFAULT_R qgemm_r
#define CGEMM_DEFAULT_P cgemm_p
#define CGEMM_DEFAULT_Q 256
#define CGEMM_DEFAULT_R cgemm_r
#define ZGEMM_DEFAULT_P zgemm_p
#define ZGEMM_DEFAULT_Q 256
#define ZGEMM_DEFAULT_R zgemm_r
#define XGEMM_DEFAULT_P xgemm_p
#define XGEMM_DEFAULT_Q 256
#define XGEMM_DEFAULT_R xgemm_r
#define SYMV_P 4
#endif
#ifdef CORE_NORTHWOOD
#define SNUMOPT 4
#define DNUMOPT 2
#define GEMM_DEFAULT_OFFSET_A 0
#define GEMM_DEFAULT_OFFSET_B 32
#define GEMM_DEFAULT_ALIGN (BLASLONG)0x0ffffUL
#define SYMV_P 8
#define SGEMM_DEFAULT_UNROLL_M 8
#define DGEMM_DEFAULT_UNROLL_M 4
#define QGEMM_DEFAULT_UNROLL_M 2
#define CGEMM_DEFAULT_UNROLL_M 4
#define ZGEMM_DEFAULT_UNROLL_M 2
#define XGEMM_DEFAULT_UNROLL_M 1
#define SGEMM_DEFAULT_UNROLL_N 2
#define DGEMM_DEFAULT_UNROLL_N 2
#define QGEMM_DEFAULT_UNROLL_N 2
#define CGEMM_DEFAULT_UNROLL_N 1
#define ZGEMM_DEFAULT_UNROLL_N 1
#define XGEMM_DEFAULT_UNROLL_N 1
#define SGEMM_DEFAULT_P sgemm_p
#define SGEMM_DEFAULT_R sgemm_r
#define DGEMM_DEFAULT_P dgemm_p
#define DGEMM_DEFAULT_R dgemm_r
#define QGEMM_DEFAULT_P qgemm_p
#define QGEMM_DEFAULT_R qgemm_r
#define CGEMM_DEFAULT_P cgemm_p
#define CGEMM_DEFAULT_R cgemm_r
#define ZGEMM_DEFAULT_P zgemm_p
#define ZGEMM_DEFAULT_R zgemm_r
#define XGEMM_DEFAULT_P xgemm_p
#define XGEMM_DEFAULT_R xgemm_r
#define SGEMM_DEFAULT_Q 128
#define DGEMM_DEFAULT_Q 128
#define QGEMM_DEFAULT_Q 128
#define CGEMM_DEFAULT_Q 128
#define ZGEMM_DEFAULT_Q 128
#define XGEMM_DEFAULT_Q 128
#endif
#ifdef CORE_PRESCOTT
#define SNUMOPT 4
#define DNUMOPT 2
#ifndef __64BIT__
#define GEMM_DEFAULT_OFFSET_A 128
#define GEMM_DEFAULT_OFFSET_B 192
#else
#define GEMM_DEFAULT_OFFSET_A 0
#define GEMM_DEFAULT_OFFSET_B 256
#endif
#define GEMM_DEFAULT_ALIGN (BLASLONG)0x0ffffUL
#define SYMV_P 8
#ifdef ARCH_X86
#define SGEMM_DEFAULT_UNROLL_M 4
#define DGEMM_DEFAULT_UNROLL_M 2
#define QGEMM_DEFAULT_UNROLL_M 2
#define CGEMM_DEFAULT_UNROLL_M 2
#define ZGEMM_DEFAULT_UNROLL_M 1
#define XGEMM_DEFAULT_UNROLL_M 1
#else
#define SGEMM_DEFAULT_UNROLL_M 8
#define DGEMM_DEFAULT_UNROLL_M 4
#define QGEMM_DEFAULT_UNROLL_M 2
#define CGEMM_DEFAULT_UNROLL_M 4
#define ZGEMM_DEFAULT_UNROLL_M 2
#define XGEMM_DEFAULT_UNROLL_M 1
#endif
#define SGEMM_DEFAULT_UNROLL_N 4
#define DGEMM_DEFAULT_UNROLL_N 4
#define QGEMM_DEFAULT_UNROLL_N 2
#define CGEMM_DEFAULT_UNROLL_N 2
#define ZGEMM_DEFAULT_UNROLL_N 2
#define XGEMM_DEFAULT_UNROLL_N 1
#define SGEMM_DEFAULT_P sgemm_p
#define SGEMM_DEFAULT_R sgemm_r
#define DGEMM_DEFAULT_P dgemm_p
#define DGEMM_DEFAULT_R dgemm_r
#define QGEMM_DEFAULT_P qgemm_p
#define QGEMM_DEFAULT_R qgemm_r
#define CGEMM_DEFAULT_P cgemm_p
#define CGEMM_DEFAULT_R cgemm_r
#define ZGEMM_DEFAULT_P zgemm_p
#define ZGEMM_DEFAULT_R zgemm_r
#define XGEMM_DEFAULT_P xgemm_p
#define XGEMM_DEFAULT_R xgemm_r
#define SGEMM_DEFAULT_Q 128
#define DGEMM_DEFAULT_Q 128
#define QGEMM_DEFAULT_Q 128
#define CGEMM_DEFAULT_Q 128
#define ZGEMM_DEFAULT_Q 128
#define XGEMM_DEFAULT_Q 128
#endif
#ifdef CORE2
#define SNUMOPT 8
#define DNUMOPT 4
#define GEMM_DEFAULT_OFFSET_A 448
#define GEMM_DEFAULT_OFFSET_B 128
#define GEMM_DEFAULT_ALIGN (BLASLONG)0x03fffUL
#define SYMV_P 8
#define SWITCH_RATIO 4
#ifdef ARCH_X86
#define SGEMM_DEFAULT_UNROLL_M 8
#define DGEMM_DEFAULT_UNROLL_M 4
#define QGEMM_DEFAULT_UNROLL_M 2
#define CGEMM_DEFAULT_UNROLL_M 4
#define ZGEMM_DEFAULT_UNROLL_M 2
#define XGEMM_DEFAULT_UNROLL_M 1
#define SGEMM_DEFAULT_UNROLL_N 2
#define DGEMM_DEFAULT_UNROLL_N 2
#define QGEMM_DEFAULT_UNROLL_N 2
#define CGEMM_DEFAULT_UNROLL_N 1
#define ZGEMM_DEFAULT_UNROLL_N 1
#define XGEMM_DEFAULT_UNROLL_N 1
#define MASK(a, b) ((((a) + (b) - 1) / (b)) * (b))
#else
#define SGEMM_DEFAULT_UNROLL_M 8
#define DGEMM_DEFAULT_UNROLL_M 4
#define QGEMM_DEFAULT_UNROLL_M 2
#define CGEMM_DEFAULT_UNROLL_M 4
#define ZGEMM_DEFAULT_UNROLL_M 2
#define XGEMM_DEFAULT_UNROLL_M 1
#define SGEMM_DEFAULT_UNROLL_N 4
#define DGEMM_DEFAULT_UNROLL_N 4
#define QGEMM_DEFAULT_UNROLL_N 2
#define CGEMM_DEFAULT_UNROLL_N 2
#define ZGEMM_DEFAULT_UNROLL_N 2
#define XGEMM_DEFAULT_UNROLL_N 1
#endif
#define SGEMM_DEFAULT_P sgemm_p
#define SGEMM_DEFAULT_R sgemm_r
#define DGEMM_DEFAULT_P dgemm_p
#define DGEMM_DEFAULT_R dgemm_r
#define QGEMM_DEFAULT_P qgemm_p
#define QGEMM_DEFAULT_R qgemm_r
#define CGEMM_DEFAULT_P cgemm_p
#define CGEMM_DEFAULT_R cgemm_r
#define ZGEMM_DEFAULT_P zgemm_p
#define ZGEMM_DEFAULT_R zgemm_r
#define XGEMM_DEFAULT_P xgemm_p
#define XGEMM_DEFAULT_R xgemm_r
#define SGEMM_DEFAULT_Q 256
#define DGEMM_DEFAULT_Q 256
#define QGEMM_DEFAULT_Q 256
#define CGEMM_DEFAULT_Q 256
#define ZGEMM_DEFAULT_Q 256
#define XGEMM_DEFAULT_Q 256
#endif
#ifdef PENRYN
#define SNUMOPT 8
#define DNUMOPT 4
#define GEMM_DEFAULT_OFFSET_A 128
#define GEMM_DEFAULT_OFFSET_B 0
#define GEMM_DEFAULT_ALIGN (BLASLONG)0x03fffUL
#define SYMV_P 8
#define SWITCH_RATIO 4
#ifdef ARCH_X86
#define SGEMM_DEFAULT_UNROLL_M 4
#define DGEMM_DEFAULT_UNROLL_M 2
#define QGEMM_DEFAULT_UNROLL_M 2
#define CGEMM_DEFAULT_UNROLL_M 2
#define ZGEMM_DEFAULT_UNROLL_M 1
#define XGEMM_DEFAULT_UNROLL_M 1
#define SGEMM_DEFAULT_UNROLL_N 4
#define DGEMM_DEFAULT_UNROLL_N 4
#define QGEMM_DEFAULT_UNROLL_N 2
#define CGEMM_DEFAULT_UNROLL_N 2
#define ZGEMM_DEFAULT_UNROLL_N 2
#define XGEMM_DEFAULT_UNROLL_N 1
#else
#define SGEMM_DEFAULT_UNROLL_M 8
#define DGEMM_DEFAULT_UNROLL_M 4
#define QGEMM_DEFAULT_UNROLL_M 2
#define CGEMM_DEFAULT_UNROLL_M 4
#define ZGEMM_DEFAULT_UNROLL_M 2
#define XGEMM_DEFAULT_UNROLL_M 1
#define SGEMM_DEFAULT_UNROLL_N 4
#define DGEMM_DEFAULT_UNROLL_N 4
#define QGEMM_DEFAULT_UNROLL_N 2
#define CGEMM_DEFAULT_UNROLL_N 2
#define ZGEMM_DEFAULT_UNROLL_N 2
#define XGEMM_DEFAULT_UNROLL_N 1
#endif
#define SGEMM_DEFAULT_P sgemm_p
#define SGEMM_DEFAULT_R sgemm_r
#define DGEMM_DEFAULT_P dgemm_p
#define DGEMM_DEFAULT_R dgemm_r
#define QGEMM_DEFAULT_P qgemm_p
#define QGEMM_DEFAULT_R qgemm_r
#define CGEMM_DEFAULT_P cgemm_p
#define CGEMM_DEFAULT_R cgemm_r
#define ZGEMM_DEFAULT_P zgemm_p
#define ZGEMM_DEFAULT_R zgemm_r
#define XGEMM_DEFAULT_P xgemm_p
#define XGEMM_DEFAULT_R xgemm_r
#define SGEMM_DEFAULT_Q 512
#define DGEMM_DEFAULT_Q 256
#define QGEMM_DEFAULT_Q 128
#define CGEMM_DEFAULT_Q 512
#define ZGEMM_DEFAULT_Q 256
#define XGEMM_DEFAULT_Q 128
#define GETRF_FACTOR 0.75
#endif
#ifdef DUNNINGTON
#define SNUMOPT 8
#define DNUMOPT 4
#define GEMM_DEFAULT_OFFSET_A 128
#define GEMM_DEFAULT_OFFSET_B 0
#define GEMM_DEFAULT_ALIGN (BLASLONG)0x03fffUL
#define SYMV_P 8
#define SWITCH_RATIO 4
#ifdef ARCH_X86
#define SGEMM_DEFAULT_UNROLL_M 4
#define DGEMM_DEFAULT_UNROLL_M 2
#define QGEMM_DEFAULT_UNROLL_M 2
#define CGEMM_DEFAULT_UNROLL_M 2
#define ZGEMM_DEFAULT_UNROLL_M 1
#define XGEMM_DEFAULT_UNROLL_M 1
#define SGEMM_DEFAULT_UNROLL_N 4
#define DGEMM_DEFAULT_UNROLL_N 4
#define QGEMM_DEFAULT_UNROLL_N 2
#define CGEMM_DEFAULT_UNROLL_N 2
#define ZGEMM_DEFAULT_UNROLL_N 2
#define XGEMM_DEFAULT_UNROLL_N 1
#else
#define SGEMM_DEFAULT_UNROLL_M 8
#define DGEMM_DEFAULT_UNROLL_M 4
#define QGEMM_DEFAULT_UNROLL_M 2
#define CGEMM_DEFAULT_UNROLL_M 4
#define ZGEMM_DEFAULT_UNROLL_M 2
#define XGEMM_DEFAULT_UNROLL_M 1
#define SGEMM_DEFAULT_UNROLL_N 4
#define DGEMM_DEFAULT_UNROLL_N 4
#define QGEMM_DEFAULT_UNROLL_N 2
#define CGEMM_DEFAULT_UNROLL_N 2
#define ZGEMM_DEFAULT_UNROLL_N 2
#define XGEMM_DEFAULT_UNROLL_N 1
#endif
#define SGEMM_DEFAULT_P sgemm_p
#define SGEMM_DEFAULT_R sgemm_r
#define DGEMM_DEFAULT_P dgemm_p
#define DGEMM_DEFAULT_R dgemm_r
#define QGEMM_DEFAULT_P qgemm_p
#define QGEMM_DEFAULT_R qgemm_r
#define CGEMM_DEFAULT_P cgemm_p
#define CGEMM_DEFAULT_R cgemm_r
#define ZGEMM_DEFAULT_P zgemm_p
#define ZGEMM_DEFAULT_R zgemm_r
#define XGEMM_DEFAULT_P xgemm_p
#define XGEMM_DEFAULT_R xgemm_r
#define SGEMM_DEFAULT_Q 768
#define DGEMM_DEFAULT_Q 384
#define QGEMM_DEFAULT_Q 192
#define CGEMM_DEFAULT_Q 768
#define ZGEMM_DEFAULT_Q 384
#define XGEMM_DEFAULT_Q 192
#define GETRF_FACTOR 0.75
#define GEMM_THREAD gemm_thread_mn
#endif
#ifdef NEHALEM
#define SNUMOPT 8
#define DNUMOPT 4
#define GEMM_DEFAULT_OFFSET_A 32
#define GEMM_DEFAULT_OFFSET_B 0
#define GEMM_DEFAULT_ALIGN (BLASLONG)0x03fffUL
#define SYMV_P 8
#define SWITCH_RATIO 4
#ifdef ARCH_X86
#define SGEMM_DEFAULT_UNROLL_M 4
#define DGEMM_DEFAULT_UNROLL_M 2
#define QGEMM_DEFAULT_UNROLL_M 2
#define CGEMM_DEFAULT_UNROLL_M 2
#define ZGEMM_DEFAULT_UNROLL_M 1
#define XGEMM_DEFAULT_UNROLL_M 1
#define SGEMM_DEFAULT_UNROLL_N 4
#define DGEMM_DEFAULT_UNROLL_N 4
#define QGEMM_DEFAULT_UNROLL_N 2
#define CGEMM_DEFAULT_UNROLL_N 2
#define ZGEMM_DEFAULT_UNROLL_N 2
#define XGEMM_DEFAULT_UNROLL_N 1
#else
#define SGEMM_DEFAULT_UNROLL_M 4
#define DGEMM_DEFAULT_UNROLL_M 2
#define QGEMM_DEFAULT_UNROLL_M 2
#define CGEMM_DEFAULT_UNROLL_M 2
#define ZGEMM_DEFAULT_UNROLL_M 1
#define XGEMM_DEFAULT_UNROLL_M 1
#define SGEMM_DEFAULT_UNROLL_N 8
#define DGEMM_DEFAULT_UNROLL_N 8
#define QGEMM_DEFAULT_UNROLL_N 2
#define CGEMM_DEFAULT_UNROLL_N 4
#define ZGEMM_DEFAULT_UNROLL_N 4
#define XGEMM_DEFAULT_UNROLL_N 1
#endif
#define SGEMM_DEFAULT_P 504
#define SGEMM_DEFAULT_R sgemm_r
#define DGEMM_DEFAULT_P 504
#define DGEMM_DEFAULT_R dgemm_r
#define QGEMM_DEFAULT_P 504
#define QGEMM_DEFAULT_R qgemm_r
#define CGEMM_DEFAULT_P 252
#define CGEMM_DEFAULT_R cgemm_r
#define ZGEMM_DEFAULT_P 252
#define ZGEMM_DEFAULT_R zgemm_r
#define XGEMM_DEFAULT_P 252
#define XGEMM_DEFAULT_R xgemm_r
#define SGEMM_DEFAULT_Q 512
#define DGEMM_DEFAULT_Q 256
#define QGEMM_DEFAULT_Q 128
#define CGEMM_DEFAULT_Q 512
#define ZGEMM_DEFAULT_Q 256
#define XGEMM_DEFAULT_Q 128
#define GETRF_FACTOR 0.72
#endif
#ifdef SANDYBRIDGE
#define SNUMOPT 8
#define DNUMOPT 4
#define GEMM_DEFAULT_OFFSET_A 0
#define GEMM_DEFAULT_OFFSET_B 0
#define GEMM_DEFAULT_ALIGN (BLASLONG)0x03fffUL
#define SYMV_P 8
#define SWITCH_RATIO 4
#ifdef ARCH_X86
#define SGEMM_DEFAULT_UNROLL_M 4
#define DGEMM_DEFAULT_UNROLL_M 2
#define QGEMM_DEFAULT_UNROLL_M 2
#define CGEMM_DEFAULT_UNROLL_M 2
#define ZGEMM_DEFAULT_UNROLL_M 1
#define XGEMM_DEFAULT_UNROLL_M 1
#define SGEMM_DEFAULT_UNROLL_N 4
#define DGEMM_DEFAULT_UNROLL_N 4
#define QGEMM_DEFAULT_UNROLL_N 2
#define CGEMM_DEFAULT_UNROLL_N 2
#define ZGEMM_DEFAULT_UNROLL_N 2
#define XGEMM_DEFAULT_UNROLL_N 1
#else
#define SGEMM_DEFAULT_UNROLL_M 16
#define DGEMM_DEFAULT_UNROLL_M 8
#define QGEMM_DEFAULT_UNROLL_M 2
#define CGEMM_DEFAULT_UNROLL_M 8
#define ZGEMM_DEFAULT_UNROLL_M 1
#define XGEMM_DEFAULT_UNROLL_M 1
#define SGEMM_DEFAULT_UNROLL_N 4
#define DGEMM_DEFAULT_UNROLL_N 4
#define QGEMM_DEFAULT_UNROLL_N 2
#define CGEMM_DEFAULT_UNROLL_N 2
#define ZGEMM_DEFAULT_UNROLL_N 4
#define XGEMM_DEFAULT_UNROLL_N 1
#endif
#define SGEMM_DEFAULT_P 768
#define SGEMM_DEFAULT_R sgemm_r
/*#define SGEMM_DEFAULT_R 1024*/
#define DGEMM_DEFAULT_P 512
#define DGEMM_DEFAULT_R dgemm_r
/*#define DGEMM_DEFAULT_R 1024*/
#define QGEMM_DEFAULT_P 504
#define QGEMM_DEFAULT_R qgemm_r
#define CGEMM_DEFAULT_P 768
#define CGEMM_DEFAULT_R cgemm_r
/*#define CGEMM_DEFAULT_R 1024*/
#define ZGEMM_DEFAULT_P 512
#define ZGEMM_DEFAULT_R zgemm_r
/*#define ZGEMM_DEFAULT_R 1024*/
#define XGEMM_DEFAULT_P 252
#define XGEMM_DEFAULT_R xgemm_r
#define SGEMM_DEFAULT_Q 384
#define DGEMM_DEFAULT_Q 256
#define QGEMM_DEFAULT_Q 128
#define CGEMM_DEFAULT_Q 512
#define ZGEMM_DEFAULT_Q 192
#define XGEMM_DEFAULT_Q 128
#define CGEMM3M_DEFAULT_UNROLL_N 8
#define CGEMM3M_DEFAULT_UNROLL_M 4
#define ZGEMM3M_DEFAULT_UNROLL_N 8
#define ZGEMM3M_DEFAULT_UNROLL_M 2
#define CGEMM3M_DEFAULT_P 448
#define ZGEMM3M_DEFAULT_P 224
#define XGEMM3M_DEFAULT_P 112
#define CGEMM3M_DEFAULT_Q 224
#define ZGEMM3M_DEFAULT_Q 224
#define XGEMM3M_DEFAULT_Q 224
#define CGEMM3M_DEFAULT_R 12288
#define ZGEMM3M_DEFAULT_R 12288
#define XGEMM3M_DEFAULT_R 12288
#define GETRF_FACTOR 0.72
#endif
#ifdef HASWELL
#define SNUMOPT 16
#define DNUMOPT 8
#define GEMM_DEFAULT_OFFSET_A 0
#define GEMM_DEFAULT_OFFSET_B 0
#define GEMM_DEFAULT_ALIGN (BLASLONG)0x03fffUL
#define SYMV_P 8
#if defined(XDOUBLE) || defined(DOUBLE)
#define SWITCH_RATIO 4
#define GEMM_PREFERED_SIZE 4
#else
#define SWITCH_RATIO 8
#define GEMM_PREFERED_SIZE 8
#endif
#ifdef ARCH_X86
#define SGEMM_DEFAULT_UNROLL_M 4
#define DGEMM_DEFAULT_UNROLL_M 2
#define QGEMM_DEFAULT_UNROLL_M 2
#define CGEMM_DEFAULT_UNROLL_M 2
#define ZGEMM_DEFAULT_UNROLL_M 1
#define XGEMM_DEFAULT_UNROLL_M 1
#define SGEMM_DEFAULT_UNROLL_N 4
#define DGEMM_DEFAULT_UNROLL_N 4
#define QGEMM_DEFAULT_UNROLL_N 2
#define CGEMM_DEFAULT_UNROLL_N 2
#define ZGEMM_DEFAULT_UNROLL_N 2
#define XGEMM_DEFAULT_UNROLL_N 1
#else
#define SGEMM_DEFAULT_UNROLL_M 8
#define DGEMM_DEFAULT_UNROLL_M 4
#define QGEMM_DEFAULT_UNROLL_M 2
#define CGEMM_DEFAULT_UNROLL_M 8
#define ZGEMM_DEFAULT_UNROLL_M 4
#define XGEMM_DEFAULT_UNROLL_M 1
#define SGEMM_DEFAULT_UNROLL_N 4
#define DGEMM_DEFAULT_UNROLL_N 8
#define QGEMM_DEFAULT_UNROLL_N 2
#define CGEMM_DEFAULT_UNROLL_N 2
#define ZGEMM_DEFAULT_UNROLL_N 2
#define XGEMM_DEFAULT_UNROLL_N 1
/*
#define SGEMM_DEFAULT_UNROLL_MN 32
#define DGEMM_DEFAULT_UNROLL_MN 32
*/
#endif
#ifdef ARCH_X86
#define SGEMM_DEFAULT_P 512
#define SGEMM_DEFAULT_R sgemm_r
#define DGEMM_DEFAULT_P 512
#define DGEMM_DEFAULT_R dgemm_r
#define QGEMM_DEFAULT_P 504
#define QGEMM_DEFAULT_R qgemm_r
#define CGEMM_DEFAULT_P 128
#define CGEMM_DEFAULT_R 1024
#define ZGEMM_DEFAULT_P 512
#define ZGEMM_DEFAULT_R zgemm_r
#define XGEMM_DEFAULT_P 252
#define XGEMM_DEFAULT_R xgemm_r
#define SGEMM_DEFAULT_Q 256
#define DGEMM_DEFAULT_Q 256
#define QGEMM_DEFAULT_Q 128
#define CGEMM_DEFAULT_Q 256
#define ZGEMM_DEFAULT_Q 192
#define XGEMM_DEFAULT_Q 128
#else
#define SGEMM_DEFAULT_P 320
#define DGEMM_DEFAULT_P 512
#define CGEMM_DEFAULT_P 256
#define ZGEMM_DEFAULT_P 192
#ifdef WINDOWS_ABI
#define SGEMM_DEFAULT_Q 320
#define DGEMM_DEFAULT_Q 128
#else
#define SGEMM_DEFAULT_Q 320
#define DGEMM_DEFAULT_Q 256
#endif
#define CGEMM_DEFAULT_Q 256
#define ZGEMM_DEFAULT_Q 192
#define SGEMM_DEFAULT_R sgemm_r
#define DGEMM_DEFAULT_R 13824
#define CGEMM_DEFAULT_R cgemm_r
#define ZGEMM_DEFAULT_R zgemm_r
#define QGEMM_DEFAULT_Q 128
#define QGEMM_DEFAULT_P 504
#define QGEMM_DEFAULT_R qgemm_r
#define XGEMM_DEFAULT_P 252
#define XGEMM_DEFAULT_R xgemm_r
#define XGEMM_DEFAULT_Q 128
#define CGEMM3M_DEFAULT_UNROLL_N 4
#define CGEMM3M_DEFAULT_UNROLL_M 8
#define ZGEMM3M_DEFAULT_UNROLL_N 4
#define ZGEMM3M_DEFAULT_UNROLL_M 4
#define CGEMM3M_DEFAULT_P 320
#define ZGEMM3M_DEFAULT_P 256
#define XGEMM3M_DEFAULT_P 112
#define CGEMM3M_DEFAULT_Q 320
#define ZGEMM3M_DEFAULT_Q 256
#define XGEMM3M_DEFAULT_Q 224
#define CGEMM3M_DEFAULT_R 12288
#define ZGEMM3M_DEFAULT_R 12288
#define XGEMM3M_DEFAULT_R 12288
#endif
#endif
#ifdef SKYLAKEX
#define SNUMOPT 16
#define DNUMOPT 8
#define GEMM_DEFAULT_OFFSET_A 0
#define GEMM_DEFAULT_OFFSET_B 0
#define GEMM_DEFAULT_ALIGN (BLASLONG)0x03fffUL
#define SYMV_P 8
#if defined(XDOUBLE) || defined(DOUBLE)
#define SWITCH_RATIO 8
#define GEMM_PREFERED_SIZE 8
#else
#define SWITCH_RATIO 16
#define GEMM_PREFERED_SIZE 16
#endif
#define USE_SGEMM_KERNEL_DIRECT 1
#ifdef ARCH_X86
#define SGEMM_DEFAULT_UNROLL_M 4
#define DGEMM_DEFAULT_UNROLL_M 2
#define QGEMM_DEFAULT_UNROLL_M 2
#define CGEMM_DEFAULT_UNROLL_M 2
#define ZGEMM_DEFAULT_UNROLL_M 1
#define XGEMM_DEFAULT_UNROLL_M 1
#define SGEMM_DEFAULT_UNROLL_N 4
#define DGEMM_DEFAULT_UNROLL_N 4
#define QGEMM_DEFAULT_UNROLL_N 2
#define CGEMM_DEFAULT_UNROLL_N 2
#define ZGEMM_DEFAULT_UNROLL_N 2
#define XGEMM_DEFAULT_UNROLL_N 1
#else
#define SGEMM_DEFAULT_UNROLL_M 16
#define DGEMM_DEFAULT_UNROLL_M 16
#define QGEMM_DEFAULT_UNROLL_M 2
#define CGEMM_DEFAULT_UNROLL_M 8
#define ZGEMM_DEFAULT_UNROLL_M 4
#define XGEMM_DEFAULT_UNROLL_M 1
#define SGEMM_DEFAULT_UNROLL_N 4
#define DGEMM_DEFAULT_UNROLL_N 2
#define QGEMM_DEFAULT_UNROLL_N 2
#define CGEMM_DEFAULT_UNROLL_N 2
#define ZGEMM_DEFAULT_UNROLL_N 2
#define XGEMM_DEFAULT_UNROLL_N 1
#define SGEMM_DEFAULT_UNROLL_MN 32
#define DGEMM_DEFAULT_UNROLL_MN 32
#endif
#ifdef ARCH_X86
#define SGEMM_DEFAULT_P 512
#define SGEMM_DEFAULT_R sgemm_r
#define DGEMM_DEFAULT_P 512
#define DGEMM_DEFAULT_R dgemm_r
#define QGEMM_DEFAULT_P 504
#define QGEMM_DEFAULT_R qgemm_r
#define CGEMM_DEFAULT_P 128
#define CGEMM_DEFAULT_R 1024
#define ZGEMM_DEFAULT_P 512
#define ZGEMM_DEFAULT_R zgemm_r
#define XGEMM_DEFAULT_P 252
#define XGEMM_DEFAULT_R xgemm_r
#define SGEMM_DEFAULT_Q 256
#define DGEMM_DEFAULT_Q 256
#define QGEMM_DEFAULT_Q 128
#define CGEMM_DEFAULT_Q 256
#define ZGEMM_DEFAULT_Q 192
#define XGEMM_DEFAULT_Q 128
#else
#define SGEMM_DEFAULT_P 448
#define DGEMM_DEFAULT_P 192
#define CGEMM_DEFAULT_P 384
#define ZGEMM_DEFAULT_P 256
#define SGEMM_DEFAULT_Q 448
#define DGEMM_DEFAULT_Q 384
#define CGEMM_DEFAULT_Q 192
#define ZGEMM_DEFAULT_Q 128
#define SGEMM_DEFAULT_R sgemm_r
#define DGEMM_DEFAULT_R 8640
#define CGEMM_DEFAULT_R cgemm_r
#define ZGEMM_DEFAULT_R zgemm_r
#define QGEMM_DEFAULT_Q 128
#define QGEMM_DEFAULT_P 504
#define QGEMM_DEFAULT_R qgemm_r
#define XGEMM_DEFAULT_P 252
#define XGEMM_DEFAULT_R xgemm_r
#define XGEMM_DEFAULT_Q 128
#define CGEMM3M_DEFAULT_UNROLL_N 4
#define CGEMM3M_DEFAULT_UNROLL_M 8
#define ZGEMM3M_DEFAULT_UNROLL_N 4
#define ZGEMM3M_DEFAULT_UNROLL_M 4
#define CGEMM3M_DEFAULT_P 320
#define ZGEMM3M_DEFAULT_P 256
#define XGEMM3M_DEFAULT_P 112
#define CGEMM3M_DEFAULT_Q 320
#define ZGEMM3M_DEFAULT_Q 256
#define XGEMM3M_DEFAULT_Q 224
#define CGEMM3M_DEFAULT_R 12288
#define ZGEMM3M_DEFAULT_R 12288
#define XGEMM3M_DEFAULT_R 12288
#endif
#endif
#ifdef SAPPHIRERAPIDS
#define SNUMOPT 16
#define DNUMOPT 8
#define GEMM_DEFAULT_OFFSET_A 0
#define GEMM_DEFAULT_OFFSET_B 0
#define GEMM_DEFAULT_ALIGN 0x03fffUL
#define SYMV_P 8
#if defined(XDOUBLE) || defined(DOUBLE)
#define SWITCH_RATIO 8
#define GEMM_PREFERED_SIZE 8
#else
#define SWITCH_RATIO 16
#define GEMM_PREFERED_SIZE 16
#endif
#define USE_SGEMM_KERNEL_DIRECT 1
#undef SBGEMM_DEFAULT_UNROLL_N
#undef SBGEMM_DEFAULT_UNROLL_M
#undef SBGEMM_DEFAULT_P
#undef SBGEMM_DEFAULT_R
#undef SBGEMM_DEFAULT_Q
// FIXME: actually UNROLL_M = UNROLL_N = 16.
// If M and N are equal, OpenBLAS will reuse OCOPY as ICOPY.
// For AMX the two are not the same, so UNROLL_M is set to 32 as a workaround.
#define SBGEMM_DEFAULT_UNROLL_N 16
#define SBGEMM_DEFAULT_UNROLL_M 32
#define SBGEMM_DEFAULT_P 256
#define SBGEMM_DEFAULT_Q 1024
#define SBGEMM_DEFAULT_R sbgemm_r
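/* Illustrative guard, not part of the original header: a minimal sketch that
 * turns the FIXME above into a compile-time check. It assumes, as that comment
 * states, that equal unroll factors make OpenBLAS reuse OCOPY as ICOPY, which
 * the AMX SBGEMM path cannot tolerate. */
#if SBGEMM_DEFAULT_UNROLL_M == SBGEMM_DEFAULT_UNROLL_N
#error "SAPPHIRERAPIDS SBGEMM: UNROLL_M must differ from UNROLL_N so that ICOPY and OCOPY stay distinct"
#endif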
#ifdef ARCH_X86
#define SGEMM_DEFAULT_UNROLL_M 4
#define DGEMM_DEFAULT_UNROLL_M 2
#define QGEMM_DEFAULT_UNROLL_M 2
#define CGEMM_DEFAULT_UNROLL_M 2
#define ZGEMM_DEFAULT_UNROLL_M 1
#define XGEMM_DEFAULT_UNROLL_M 1
#define SGEMM_DEFAULT_UNROLL_N 4
#define DGEMM_DEFAULT_UNROLL_N 4
#define QGEMM_DEFAULT_UNROLL_N 2
#define CGEMM_DEFAULT_UNROLL_N 2
#define ZGEMM_DEFAULT_UNROLL_N 2
#define XGEMM_DEFAULT_UNROLL_N 1
#else
#define SGEMM_DEFAULT_UNROLL_M 16
#define DGEMM_DEFAULT_UNROLL_M 16
#define QGEMM_DEFAULT_UNROLL_M 2
#define CGEMM_DEFAULT_UNROLL_M 8
#define ZGEMM_DEFAULT_UNROLL_M 4
#define XGEMM_DEFAULT_UNROLL_M 1
#define SGEMM_DEFAULT_UNROLL_N 4
#define DGEMM_DEFAULT_UNROLL_N 2
#define QGEMM_DEFAULT_UNROLL_N 2
#define CGEMM_DEFAULT_UNROLL_N 2
#define ZGEMM_DEFAULT_UNROLL_N 2
#define XGEMM_DEFAULT_UNROLL_N 1
#define SGEMM_DEFAULT_UNROLL_MN 32
#define DGEMM_DEFAULT_UNROLL_MN 32
#endif
#ifdef ARCH_X86
#define SGEMM_DEFAULT_P 512
#define SGEMM_DEFAULT_R sgemm_r
#define DGEMM_DEFAULT_P 512
#define DGEMM_DEFAULT_R dgemm_r
#define QGEMM_DEFAULT_P 504
#define QGEMM_DEFAULT_R qgemm_r
#define CGEMM_DEFAULT_P 128
#define CGEMM_DEFAULT_R 1024
#define ZGEMM_DEFAULT_P 512
#define ZGEMM_DEFAULT_R zgemm_r
#define XGEMM_DEFAULT_P 252
#define XGEMM_DEFAULT_R xgemm_r
#define SGEMM_DEFAULT_Q 256
#define DGEMM_DEFAULT_Q 256
#define QGEMM_DEFAULT_Q 128
#define CGEMM_DEFAULT_Q 256
#define ZGEMM_DEFAULT_Q 192
#define XGEMM_DEFAULT_Q 128
#else
#define SGEMM_DEFAULT_P 640
#define DGEMM_DEFAULT_P 192
#define CGEMM_DEFAULT_P 384
#define ZGEMM_DEFAULT_P 256
#define SGEMM_DEFAULT_Q 320
#define DGEMM_DEFAULT_Q 384
#define CGEMM_DEFAULT_Q 192
#define ZGEMM_DEFAULT_Q 128
#define SGEMM_DEFAULT_R sgemm_r
#define DGEMM_DEFAULT_R 8640
#define CGEMM_DEFAULT_R cgemm_r
#define ZGEMM_DEFAULT_R zgemm_r
#define QGEMM_DEFAULT_Q 128
#define QGEMM_DEFAULT_P 504
#define QGEMM_DEFAULT_R qgemm_r
#define XGEMM_DEFAULT_P 252
#define XGEMM_DEFAULT_R xgemm_r
#define XGEMM_DEFAULT_Q 128
#define CGEMM3M_DEFAULT_UNROLL_N 4
#define CGEMM3M_DEFAULT_UNROLL_M 8
#define ZGEMM3M_DEFAULT_UNROLL_N 4
#define ZGEMM3M_DEFAULT_UNROLL_M 4
#define CGEMM3M_DEFAULT_P 320
#define ZGEMM3M_DEFAULT_P 256
#define XGEMM3M_DEFAULT_P 112
#define CGEMM3M_DEFAULT_Q 320
#define ZGEMM3M_DEFAULT_Q 256
#define XGEMM3M_DEFAULT_Q 224
#define CGEMM3M_DEFAULT_R 12288
#define ZGEMM3M_DEFAULT_R 12288
#define XGEMM3M_DEFAULT_R 12288
#endif
#endif
#ifdef COOPERLAKE
#define SNUMOPT 16
#define DNUMOPT 8
#define GEMM_DEFAULT_OFFSET_A 0
#define GEMM_DEFAULT_OFFSET_B 0
#define GEMM_DEFAULT_ALIGN 0x03fffUL
#define SYMV_P 8
#if defined(XDOUBLE) || defined(DOUBLE)
#define SWITCH_RATIO 8
#define GEMM_PREFERED_SIZE 8
#else
#define SWITCH_RATIO 16
#define GEMM_PREFERED_SIZE 16
#endif
#define USE_SGEMM_KERNEL_DIRECT 1
#undef SBGEMM_DEFAULT_UNROLL_N
#undef SBGEMM_DEFAULT_UNROLL_M
#undef SBGEMM_DEFAULT_P
#undef SBGEMM_DEFAULT_R
#undef SBGEMM_DEFAULT_Q
#define SBGEMM_DEFAULT_UNROLL_N 4
#define SBGEMM_DEFAULT_UNROLL_M 16
#define SBGEMM_DEFAULT_P 384
#define SBGEMM_DEFAULT_Q 768
#define SBGEMM_DEFAULT_R sbgemm_r
#ifdef ARCH_X86
#define SGEMM_DEFAULT_UNROLL_M 4
#define DGEMM_DEFAULT_UNROLL_M 2
#define QGEMM_DEFAULT_UNROLL_M 2
#define CGEMM_DEFAULT_UNROLL_M 2
#define ZGEMM_DEFAULT_UNROLL_M 1
#define XGEMM_DEFAULT_UNROLL_M 1
#define SGEMM_DEFAULT_UNROLL_N 4
#define DGEMM_DEFAULT_UNROLL_N 4
#define QGEMM_DEFAULT_UNROLL_N 2
#define CGEMM_DEFAULT_UNROLL_N 2
#define ZGEMM_DEFAULT_UNROLL_N 2
#define XGEMM_DEFAULT_UNROLL_N 1
#else
#define SGEMM_DEFAULT_UNROLL_M 16
#define DGEMM_DEFAULT_UNROLL_M 16
#define QGEMM_DEFAULT_UNROLL_M 2
#define CGEMM_DEFAULT_UNROLL_M 8
#define ZGEMM_DEFAULT_UNROLL_M 4
#define XGEMM_DEFAULT_UNROLL_M 1
#define SGEMM_DEFAULT_UNROLL_N 4
#define DGEMM_DEFAULT_UNROLL_N 2
#define QGEMM_DEFAULT_UNROLL_N 2
#define CGEMM_DEFAULT_UNROLL_N 2
#define ZGEMM_DEFAULT_UNROLL_N 2
#define XGEMM_DEFAULT_UNROLL_N 1
#define SGEMM_DEFAULT_UNROLL_MN 32
#define DGEMM_DEFAULT_UNROLL_MN 32
#endif
#ifdef ARCH_X86
#define SGEMM_DEFAULT_P 512
#define SGEMM_DEFAULT_R sgemm_r
#define DGEMM_DEFAULT_P 512
#define DGEMM_DEFAULT_R dgemm_r
#define QGEMM_DEFAULT_P 504
#define QGEMM_DEFAULT_R qgemm_r
#define CGEMM_DEFAULT_P 128
#define CGEMM_DEFAULT_R 1024
#define ZGEMM_DEFAULT_P 512
#define ZGEMM_DEFAULT_R zgemm_r
#define XGEMM_DEFAULT_P 252
#define XGEMM_DEFAULT_R xgemm_r
#define SGEMM_DEFAULT_Q 256
#define DGEMM_DEFAULT_Q 256
#define QGEMM_DEFAULT_Q 128
#define CGEMM_DEFAULT_Q 256
#define ZGEMM_DEFAULT_Q 192
#define XGEMM_DEFAULT_Q 128
#else
#define SGEMM_DEFAULT_P 640
#define DGEMM_DEFAULT_P 192
#define CGEMM_DEFAULT_P 384
#define ZGEMM_DEFAULT_P 256
#define SGEMM_DEFAULT_Q 320
#define DGEMM_DEFAULT_Q 384
#define CGEMM_DEFAULT_Q 192
#define ZGEMM_DEFAULT_Q 128
#define SGEMM_DEFAULT_R sgemm_r
#define DGEMM_DEFAULT_R 8640
#define CGEMM_DEFAULT_R cgemm_r
#define ZGEMM_DEFAULT_R zgemm_r
#define QGEMM_DEFAULT_Q 128
#define QGEMM_DEFAULT_P 504
#define QGEMM_DEFAULT_R qgemm_r
#define XGEMM_DEFAULT_P 252
#define XGEMM_DEFAULT_R xgemm_r
#define XGEMM_DEFAULT_Q 128
#define CGEMM3M_DEFAULT_UNROLL_N 4
#define CGEMM3M_DEFAULT_UNROLL_M 8
#define ZGEMM3M_DEFAULT_UNROLL_N 4
#define ZGEMM3M_DEFAULT_UNROLL_M 4
#define CGEMM3M_DEFAULT_P 320
#define ZGEMM3M_DEFAULT_P 256
#define XGEMM3M_DEFAULT_P 112
#define CGEMM3M_DEFAULT_Q 320
#define ZGEMM3M_DEFAULT_Q 256
#define XGEMM3M_DEFAULT_Q 224
#define CGEMM3M_DEFAULT_R 12288
#define ZGEMM3M_DEFAULT_R 12288
#define XGEMM3M_DEFAULT_R 12288
#endif
#endif
#ifdef ATOM
#define SNUMOPT 2
#define DNUMOPT 1
#define GEMM_DEFAULT_OFFSET_A 64
#define GEMM_DEFAULT_OFFSET_B 0
#define GEMM_DEFAULT_ALIGN (BLASLONG)0x0ffffUL
#define SYMV_P 8
#ifdef ARCH_X86
#define SGEMM_DEFAULT_UNROLL_M 4
#define DGEMM_DEFAULT_UNROLL_M 2
#define QGEMM_DEFAULT_UNROLL_M 2
#define CGEMM_DEFAULT_UNROLL_M 2
#define ZGEMM_DEFAULT_UNROLL_M 1
#define XGEMM_DEFAULT_UNROLL_M 1
#else
#define SGEMM_DEFAULT_UNROLL_M 8
#define DGEMM_DEFAULT_UNROLL_M 4
#define QGEMM_DEFAULT_UNROLL_M 2
#define CGEMM_DEFAULT_UNROLL_M 4
#define ZGEMM_DEFAULT_UNROLL_M 2
#define XGEMM_DEFAULT_UNROLL_M 1
#endif
#define SGEMM_DEFAULT_UNROLL_N 4
#define DGEMM_DEFAULT_UNROLL_N 2
#define QGEMM_DEFAULT_UNROLL_N 2
#define CGEMM_DEFAULT_UNROLL_N 2
#define ZGEMM_DEFAULT_UNROLL_N 1
#define XGEMM_DEFAULT_UNROLL_N 1
#define SGEMM_DEFAULT_P sgemm_p
#define SGEMM_DEFAULT_R sgemm_r
#define DGEMM_DEFAULT_P dgemm_p
#define DGEMM_DEFAULT_R dgemm_r
#define QGEMM_DEFAULT_P qgemm_p
#define QGEMM_DEFAULT_R qgemm_r
#define CGEMM_DEFAULT_P cgemm_p
#define CGEMM_DEFAULT_R cgemm_r
#define ZGEMM_DEFAULT_P zgemm_p
#define ZGEMM_DEFAULT_R zgemm_r
#define XGEMM_DEFAULT_P xgemm_p
#define XGEMM_DEFAULT_R xgemm_r
#define SGEMM_DEFAULT_Q 256
#define DGEMM_DEFAULT_Q 256
#define QGEMM_DEFAULT_Q 256
#define CGEMM_DEFAULT_Q 256
#define ZGEMM_DEFAULT_Q 256
#define XGEMM_DEFAULT_Q 256
#endif
#ifdef ITANIUM2
#define SNUMOPT 4
#define DNUMOPT 4
#define GEMM_DEFAULT_OFFSET_A 0
#define GEMM_DEFAULT_OFFSET_B 128
#define GEMM_DEFAULT_ALIGN (BLASLONG)0x03fffUL
#define SGEMM_DEFAULT_UNROLL_M 8
#define SGEMM_DEFAULT_UNROLL_N 8
#define DGEMM_DEFAULT_UNROLL_M 8
#define DGEMM_DEFAULT_UNROLL_N 8
#define QGEMM_DEFAULT_UNROLL_M 8
#define QGEMM_DEFAULT_UNROLL_N 8
#define CGEMM_DEFAULT_UNROLL_M 4
#define CGEMM_DEFAULT_UNROLL_N 4
#define ZGEMM_DEFAULT_UNROLL_M 4
#define ZGEMM_DEFAULT_UNROLL_N 4
#define XGEMM_DEFAULT_UNROLL_M 4
#define XGEMM_DEFAULT_UNROLL_N 4
#define SGEMM_DEFAULT_P sgemm_p
#define DGEMM_DEFAULT_P dgemm_p
#define QGEMM_DEFAULT_P qgemm_p
#define CGEMM_DEFAULT_P cgemm_p
#define ZGEMM_DEFAULT_P zgemm_p
#define XGEMM_DEFAULT_P xgemm_p
#define SGEMM_DEFAULT_Q 1024
#define DGEMM_DEFAULT_Q 1024
#define QGEMM_DEFAULT_Q 1024
#define CGEMM_DEFAULT_Q 1024
#define ZGEMM_DEFAULT_Q 1024
#define XGEMM_DEFAULT_Q 1024
#define SGEMM_DEFAULT_R sgemm_r
#define DGEMM_DEFAULT_R dgemm_r
#define QGEMM_DEFAULT_R qgemm_r
#define CGEMM_DEFAULT_R cgemm_r
#define ZGEMM_DEFAULT_R zgemm_r
#define XGEMM_DEFAULT_R xgemm_r
#define SYMV_P 16
#define GETRF_FACTOR 0.65
#endif
#if defined(EV4) || defined(EV5) || defined(EV6)
#ifdef EV4
#define SNUMOPT 1
#define DNUMOPT 1
#else
#define SNUMOPT 2
#define DNUMOPT 2
#endif
#define GEMM_DEFAULT_OFFSET_A 512
#define GEMM_DEFAULT_OFFSET_B 512
#define GEMM_DEFAULT_ALIGN 0x0ffffUL
#define SGEMM_DEFAULT_UNROLL_M 4
#define SGEMM_DEFAULT_UNROLL_N 4
#define DGEMM_DEFAULT_UNROLL_M 4
#define DGEMM_DEFAULT_UNROLL_N 4
#define CGEMM_DEFAULT_UNROLL_M 2
#define CGEMM_DEFAULT_UNROLL_N 2
#define ZGEMM_DEFAULT_UNROLL_M 2
#define ZGEMM_DEFAULT_UNROLL_N 2
#define SYMV_P 8
#ifdef EV4
#define SGEMM_DEFAULT_P 32
#define SGEMM_DEFAULT_Q 112
#define SGEMM_DEFAULT_R 256
#define DGEMM_DEFAULT_P 32
#define DGEMM_DEFAULT_Q 56
#define DGEMM_DEFAULT_R 256
#define CGEMM_DEFAULT_P 32
#define CGEMM_DEFAULT_Q 64
#define CGEMM_DEFAULT_R 240
#define ZGEMM_DEFAULT_P 32
#define ZGEMM_DEFAULT_Q 32
#define ZGEMM_DEFAULT_R 240
#endif
#ifdef EV5
#define SGEMM_DEFAULT_P 64
#define SGEMM_DEFAULT_Q 256
#define DGEMM_DEFAULT_P 64
#define DGEMM_DEFAULT_Q 128
#define CGEMM_DEFAULT_P 64
#define CGEMM_DEFAULT_Q 128
#define ZGEMM_DEFAULT_P 64
#define ZGEMM_DEFAULT_Q 64
#endif
#ifdef EV6
#define SGEMM_DEFAULT_P 256
#define SGEMM_DEFAULT_Q 512
#define DGEMM_DEFAULT_P 256
#define DGEMM_DEFAULT_Q 256
#define CGEMM_DEFAULT_P 256
#define CGEMM_DEFAULT_Q 256
#define ZGEMM_DEFAULT_P 128
#define ZGEMM_DEFAULT_Q 256
#endif
#endif
#ifdef CELL
#define SNUMOPT 2
#define DNUMOPT 2
#define GEMM_DEFAULT_OFFSET_A 0
#define GEMM_DEFAULT_OFFSET_B 8192
#define GEMM_DEFAULT_ALIGN 0x0ffffUL
#define SGEMM_DEFAULT_UNROLL_M 16
#define SGEMM_DEFAULT_UNROLL_N 4
#define DGEMM_DEFAULT_UNROLL_M 4
#define DGEMM_DEFAULT_UNROLL_N 4
#define CGEMM_DEFAULT_UNROLL_M 8
#define CGEMM_DEFAULT_UNROLL_N 2
#define ZGEMM_DEFAULT_UNROLL_M 2
#define ZGEMM_DEFAULT_UNROLL_N 2
#define SGEMM_DEFAULT_P 128
#define DGEMM_DEFAULT_P 128
#define CGEMM_DEFAULT_P 128
#define ZGEMM_DEFAULT_P 128
#define SGEMM_DEFAULT_Q 512
#define DGEMM_DEFAULT_Q 256
#define CGEMM_DEFAULT_Q 256
#define ZGEMM_DEFAULT_Q 128
#define SYMV_P 4
#endif
#ifdef PPCG4
#define GEMM_DEFAULT_OFFSET_A 0
#define GEMM_DEFAULT_OFFSET_B 1024
#define GEMM_DEFAULT_ALIGN 0x0ffffUL
#define SGEMM_DEFAULT_UNROLL_M 4
#define SGEMM_DEFAULT_UNROLL_N 4
#define DGEMM_DEFAULT_UNROLL_M 4
#define DGEMM_DEFAULT_UNROLL_N 4
#define CGEMM_DEFAULT_UNROLL_M 2
#define CGEMM_DEFAULT_UNROLL_N 2
#define ZGEMM_DEFAULT_UNROLL_M 2
#define ZGEMM_DEFAULT_UNROLL_N 2
#define SGEMM_DEFAULT_P 256
#define DGEMM_DEFAULT_P 128
#define CGEMM_DEFAULT_P 128
#define ZGEMM_DEFAULT_P 64
#define SGEMM_DEFAULT_Q 256
#define DGEMM_DEFAULT_Q 256
#define CGEMM_DEFAULT_Q 256
#define ZGEMM_DEFAULT_Q 256
#define SYMV_P 4
#endif
#ifdef PPC970
#define SNUMOPT 4
#define DNUMOPT 4
#define GEMM_DEFAULT_OFFSET_A 2688
#define GEMM_DEFAULT_OFFSET_B 3072
#define GEMM_DEFAULT_ALIGN 0x03fffUL
#if defined(__BYTE_ORDER__)&&(__BYTE_ORDER__ == __ORDER_BIG_ENDIAN__)
#define SGEMM_DEFAULT_UNROLL_M 4
#else
#define SGEMM_DEFAULT_UNROLL_M 16
#endif
#define SGEMM_DEFAULT_UNROLL_N 4
#define DGEMM_DEFAULT_UNROLL_M 4
#define DGEMM_DEFAULT_UNROLL_N 4
#if defined(__BYTE_ORDER__)&&(__BYTE_ORDER__ == __ORDER_BIG_ENDIAN__)
#define CGEMM_DEFAULT_UNROLL_M 2
#else
#define CGEMM_DEFAULT_UNROLL_M 8
#endif
#define CGEMM_DEFAULT_UNROLL_N 2
#define ZGEMM_DEFAULT_UNROLL_M 2
#define ZGEMM_DEFAULT_UNROLL_N 2
#if defined(OS_LINUX) || defined(OS_DARWIN) || defined(OS_FREEBSD)
/* NOTE: 1 MiB is 1048576 bytes; 1024976 may be a typo, in which case this branch is never taken. */
#if L2_SIZE == 1024976
#define SGEMM_DEFAULT_P 320
#define DGEMM_DEFAULT_P 256
#define CGEMM_DEFAULT_P 256
#define ZGEMM_DEFAULT_P 256
#else
#define SGEMM_DEFAULT_P 176
#define DGEMM_DEFAULT_P 176
#define CGEMM_DEFAULT_P 176
#define ZGEMM_DEFAULT_P 176
#endif
#endif
#define SGEMM_DEFAULT_Q 512
#define DGEMM_DEFAULT_Q 256
#define CGEMM_DEFAULT_Q 256
#define ZGEMM_DEFAULT_Q 128
#define SYMV_P 4
#endif
#ifdef PPC440
#define SNUMOPT 2
#define DNUMOPT 2
#define GEMM_DEFAULT_OFFSET_A (32 * 0)
#define GEMM_DEFAULT_OFFSET_B (32 * 0)
#define GEMM_DEFAULT_ALIGN 0x0ffffUL
#define SGEMM_DEFAULT_UNROLL_M 4
#define SGEMM_DEFAULT_UNROLL_N 4
#define DGEMM_DEFAULT_UNROLL_M 4
#define DGEMM_DEFAULT_UNROLL_N 4
#define CGEMM_DEFAULT_UNROLL_M 2
#define CGEMM_DEFAULT_UNROLL_N 2
#define ZGEMM_DEFAULT_UNROLL_M 2
#define ZGEMM_DEFAULT_UNROLL_N 2
#define SGEMM_DEFAULT_P 512
#define DGEMM_DEFAULT_P 512
#define CGEMM_DEFAULT_P 512
#define ZGEMM_DEFAULT_P 512
#define SGEMM_DEFAULT_Q 1024
#define DGEMM_DEFAULT_Q 512
#define CGEMM_DEFAULT_Q 512
#define ZGEMM_DEFAULT_Q 256
#define SGEMM_DEFAULT_R SGEMM_DEFAULT_P
#define DGEMM_DEFAULT_R DGEMM_DEFAULT_P
#define CGEMM_DEFAULT_R CGEMM_DEFAULT_P
#define ZGEMM_DEFAULT_R ZGEMM_DEFAULT_P
#define SYMV_P 4
#endif
#ifdef PPC440FP2
#define SNUMOPT 4
#define DNUMOPT 4
#define GEMM_DEFAULT_OFFSET_A (32 * 0)
#define GEMM_DEFAULT_OFFSET_B (32 * 0)
#define GEMM_DEFAULT_ALIGN 0x0ffffUL
#define SGEMM_DEFAULT_UNROLL_M 8
#define SGEMM_DEFAULT_UNROLL_N 4
#define DGEMM_DEFAULT_UNROLL_M 8
#define DGEMM_DEFAULT_UNROLL_N 4
#define CGEMM_DEFAULT_UNROLL_M 4
#define CGEMM_DEFAULT_UNROLL_N 2
#define ZGEMM_DEFAULT_UNROLL_M 4
#define ZGEMM_DEFAULT_UNROLL_N 2
#define SGEMM_DEFAULT_P 128
#define DGEMM_DEFAULT_P 128
#define CGEMM_DEFAULT_P 128
#define ZGEMM_DEFAULT_P 128
#if 1
#define SGEMM_DEFAULT_Q 4096
#define DGEMM_DEFAULT_Q 3072
#define CGEMM_DEFAULT_Q 2048
#define ZGEMM_DEFAULT_Q 1024
#else
#define SGEMM_DEFAULT_Q 512
#define DGEMM_DEFAULT_Q 256
#define CGEMM_DEFAULT_Q 256
#define ZGEMM_DEFAULT_Q 128
#endif
#define SYMV_P 4
#endif
#if defined(POWER3) || defined(POWER4) || defined(POWER5)
#define GEMM_DEFAULT_OFFSET_A 0
#define GEMM_DEFAULT_OFFSET_B 2048
#define GEMM_DEFAULT_ALIGN 0x0ffffUL
#define SGEMM_DEFAULT_UNROLL_M 4
#define SGEMM_DEFAULT_UNROLL_N 4
#define DGEMM_DEFAULT_UNROLL_M 4
#define DGEMM_DEFAULT_UNROLL_N 4
#define CGEMM_DEFAULT_UNROLL_M 2
#define CGEMM_DEFAULT_UNROLL_N 2
#define ZGEMM_DEFAULT_UNROLL_M 2
#define ZGEMM_DEFAULT_UNROLL_N 2
#ifdef POWER3
#define SNUMOPT 4
#define DNUMOPT 4
#define SGEMM_DEFAULT_P 256
#define SGEMM_DEFAULT_Q 432
#define SGEMM_DEFAULT_R 1012
#define DGEMM_DEFAULT_P 256
#define DGEMM_DEFAULT_Q 216
#define DGEMM_DEFAULT_R 1012
#define CGEMM_DEFAULT_P 256
#define CGEMM_DEFAULT_Q 104
#define CGEMM_DEFAULT_R 1012
#define ZGEMM_DEFAULT_P 256
#define ZGEMM_DEFAULT_Q 104
#define ZGEMM_DEFAULT_R 1012
#endif
#if defined(POWER4)
#ifdef ALLOC_HUGETLB
#define SGEMM_DEFAULT_P 184
#define DGEMM_DEFAULT_P 184
#define CGEMM_DEFAULT_P 184
#define ZGEMM_DEFAULT_P 184
#else
#define SGEMM_DEFAULT_P 144
#define DGEMM_DEFAULT_P 144
#define CGEMM_DEFAULT_P 144
#define ZGEMM_DEFAULT_P 144
#endif
#define SGEMM_DEFAULT_Q 256
#define CGEMM_DEFAULT_Q 256
#define DGEMM_DEFAULT_Q 256
#define ZGEMM_DEFAULT_Q 256
#endif
#if defined(POWER5)
#ifdef ALLOC_HUGETLB
#define SGEMM_DEFAULT_P 512
#define DGEMM_DEFAULT_P 256
#define CGEMM_DEFAULT_P 256
#define ZGEMM_DEFAULT_P 128
#else
#define SGEMM_DEFAULT_P 320
#define DGEMM_DEFAULT_P 160
#define CGEMM_DEFAULT_P 160
#define ZGEMM_DEFAULT_P 80
#endif
#define SGEMM_DEFAULT_Q 256
#define CGEMM_DEFAULT_Q 256
#define DGEMM_DEFAULT_Q 256
#define ZGEMM_DEFAULT_Q 256
#endif
#define SYMV_P 8
#endif
#if defined(POWER6)
#define SNUMOPT 4
#define DNUMOPT 4
#define GEMM_DEFAULT_OFFSET_A 384
#define GEMM_DEFAULT_OFFSET_B 1024
#define GEMM_DEFAULT_ALIGN 0x03fffUL
#define SGEMM_DEFAULT_UNROLL_M 4
#define SGEMM_DEFAULT_UNROLL_N 4
#define DGEMM_DEFAULT_UNROLL_M 4
#define DGEMM_DEFAULT_UNROLL_N 4
#define CGEMM_DEFAULT_UNROLL_M 2
#define CGEMM_DEFAULT_UNROLL_N 4
#define ZGEMM_DEFAULT_UNROLL_M 2
#define ZGEMM_DEFAULT_UNROLL_N 4
#define SGEMM_DEFAULT_P 992
#define DGEMM_DEFAULT_P 480
#define CGEMM_DEFAULT_P 488
#define ZGEMM_DEFAULT_P 248
#define SGEMM_DEFAULT_Q 504
#define DGEMM_DEFAULT_Q 504
#define CGEMM_DEFAULT_Q 400
#define ZGEMM_DEFAULT_Q 400
#define SYMV_P 8
#endif
#if defined(POWER8) || (defined(POWER9) && defined(OS_AIX))
#define SNUMOPT 16
#define DNUMOPT 8
#define GEMM_DEFAULT_OFFSET_A 0
#define GEMM_DEFAULT_OFFSET_B 65536
#define GEMM_DEFAULT_ALIGN 0x0ffffUL
#if defined(__32BIT__)
#warning using BINARY32==POWER6
#define SGEMM_DEFAULT_UNROLL_M 4
#define SGEMM_DEFAULT_UNROLL_N 4
#define DGEMM_DEFAULT_UNROLL_M 4
#define DGEMM_DEFAULT_UNROLL_N 4
#define CGEMM_DEFAULT_UNROLL_M 2
#define CGEMM_DEFAULT_UNROLL_N 4
#define ZGEMM_DEFAULT_UNROLL_M 2
#define ZGEMM_DEFAULT_UNROLL_N 4
#else
#define SGEMM_DEFAULT_UNROLL_M 16
#define SGEMM_DEFAULT_UNROLL_N 8
#define DGEMM_DEFAULT_UNROLL_M 16
#define DGEMM_DEFAULT_UNROLL_N 4
#define CGEMM_DEFAULT_UNROLL_M 8
#define CGEMM_DEFAULT_UNROLL_N 4
#define ZGEMM_DEFAULT_UNROLL_M 8
#define ZGEMM_DEFAULT_UNROLL_N 2
#endif
#define SGEMM_DEFAULT_P 1280UL
#define DGEMM_DEFAULT_P 640UL
#define CGEMM_DEFAULT_P 640UL
#define ZGEMM_DEFAULT_P 320UL
#define SGEMM_DEFAULT_Q 640UL
#define DGEMM_DEFAULT_Q 720UL
#define CGEMM_DEFAULT_Q 640UL
#define ZGEMM_DEFAULT_Q 640UL
#if 0
#define SGEMM_DEFAULT_R SGEMM_DEFAULT_P
#define DGEMM_DEFAULT_R DGEMM_DEFAULT_P
#define CGEMM_DEFAULT_R CGEMM_DEFAULT_P
#define ZGEMM_DEFAULT_R ZGEMM_DEFAULT_P
#endif
#define SGEMM_DEFAULT_R 4096
#define DGEMM_DEFAULT_R 4096
#define CGEMM_DEFAULT_R 4096
#define ZGEMM_DEFAULT_R 4096
#define SYMV_P 8
#endif
#if defined(POWER9) && (defined(OS_LINUX) || defined(OS_FREEBSD))
#define SNUMOPT 16
#define DNUMOPT 8
#define GEMM_DEFAULT_OFFSET_A 0
#define GEMM_DEFAULT_OFFSET_B 65536
#define GEMM_DEFAULT_ALIGN 0x0ffffUL
#define SWITCH_RATIO 16
#define GEMM_PREFERED_SIZE 16
#define SGEMM_DEFAULT_UNROLL_M 16
#define SGEMM_DEFAULT_UNROLL_N 8
#define DGEMM_DEFAULT_UNROLL_M 16
#define DGEMM_DEFAULT_UNROLL_N 4
#define CGEMM_DEFAULT_UNROLL_M 8
#define CGEMM_DEFAULT_UNROLL_N 4
#define ZGEMM_DEFAULT_UNROLL_M 8
#define ZGEMM_DEFAULT_UNROLL_N 2
#define SGEMM_DEFAULT_P 832
#define DGEMM_DEFAULT_P 128
#define CGEMM_DEFAULT_P 512
#define ZGEMM_DEFAULT_P 256
#define SGEMM_DEFAULT_Q 1026
#define DGEMM_DEFAULT_Q 384
#define CGEMM_DEFAULT_Q 1026
#define ZGEMM_DEFAULT_Q 1026
#define SGEMM_DEFAULT_R 4096
#define DGEMM_DEFAULT_R 4096
#define CGEMM_DEFAULT_R 4096
#define ZGEMM_DEFAULT_R 4096
#define SYMV_P 8
#endif
#if defined(POWER10)
#define SNUMOPT 16
#define DNUMOPT 8
#define GEMM_DEFAULT_OFFSET_A 0
#define GEMM_DEFAULT_OFFSET_B 65536
#define GEMM_DEFAULT_ALIGN 0x0ffffUL
#define SWITCH_RATIO 16
#define GEMM_PREFERED_SIZE 16
#define SGEMM_DEFAULT_UNROLL_M 16
#define SGEMM_DEFAULT_UNROLL_N 8
#define DGEMM_DEFAULT_UNROLL_M 8
#define DGEMM_DEFAULT_UNROLL_N 8
#define CGEMM_DEFAULT_UNROLL_M 8
#define CGEMM_DEFAULT_UNROLL_N 4
#define ZGEMM_DEFAULT_UNROLL_M 8
#define ZGEMM_DEFAULT_UNROLL_N 2
#define SGEMM_DEFAULT_P 512
#define DGEMM_DEFAULT_P 384
#define CGEMM_DEFAULT_P 512
#define ZGEMM_DEFAULT_P 256
#define SGEMM_DEFAULT_Q 512
#define DGEMM_DEFAULT_Q 512
#define CGEMM_DEFAULT_Q 384
#define ZGEMM_DEFAULT_Q 384
#define SGEMM_DEFAULT_R 4096
#define DGEMM_DEFAULT_R 4096
#define CGEMM_DEFAULT_R 4096
#define ZGEMM_DEFAULT_R 4096
#define SYMV_P 8
#undef SBGEMM_DEFAULT_UNROLL_N
#undef SBGEMM_DEFAULT_UNROLL_M
#undef SBGEMM_DEFAULT_P
#undef SBGEMM_DEFAULT_R
#undef SBGEMM_DEFAULT_Q
#define SBGEMM_DEFAULT_UNROLL_M 16
#define SBGEMM_DEFAULT_UNROLL_N 8
#define SBGEMM_DEFAULT_P 512
#define SBGEMM_DEFAULT_Q 1024
#define SBGEMM_DEFAULT_R 4096
#endif
#if defined(SPARC) && defined(V7)
#define SNUMOPT 4
#define DNUMOPT 4
#define GEMM_DEFAULT_OFFSET_A 0
#define GEMM_DEFAULT_OFFSET_B 2048
#define GEMM_DEFAULT_ALIGN 0x03fffUL
#define SGEMM_DEFAULT_UNROLL_M 2
#define SGEMM_DEFAULT_UNROLL_N 8
#define DGEMM_DEFAULT_UNROLL_M 2
#define DGEMM_DEFAULT_UNROLL_N 8
#define CGEMM_DEFAULT_UNROLL_M 1
#define CGEMM_DEFAULT_UNROLL_N 4
#define ZGEMM_DEFAULT_UNROLL_M 1
#define ZGEMM_DEFAULT_UNROLL_N 4
#define SGEMM_DEFAULT_P 256
#define DGEMM_DEFAULT_P 256
#define CGEMM_DEFAULT_P 256
#define ZGEMM_DEFAULT_P 256
#define SGEMM_DEFAULT_Q 512
#define DGEMM_DEFAULT_Q 256
#define CGEMM_DEFAULT_Q 256
#define ZGEMM_DEFAULT_Q 128
#define SYMV_P 8
#define GEMM_THREAD gemm_thread_mn
#endif
#if (defined(SPARC) && defined(V9)) || defined(__sparc_v9__)
#define SNUMOPT 2
#define DNUMOPT 2
#define GEMM_DEFAULT_OFFSET_A 0
#define GEMM_DEFAULT_OFFSET_B 2048
#define GEMM_DEFAULT_ALIGN 0x03fffUL
#define SGEMM_DEFAULT_UNROLL_M 4
#define SGEMM_DEFAULT_UNROLL_N 4
#define DGEMM_DEFAULT_UNROLL_M 4
#define DGEMM_DEFAULT_UNROLL_N 4
#define CGEMM_DEFAULT_UNROLL_M 2
#define CGEMM_DEFAULT_UNROLL_N 2
#define ZGEMM_DEFAULT_UNROLL_M 2
#define ZGEMM_DEFAULT_UNROLL_N 2
#define SGEMM_DEFAULT_P 512
#define DGEMM_DEFAULT_P 512
#define CGEMM_DEFAULT_P 512
#define ZGEMM_DEFAULT_P 512
#define SGEMM_DEFAULT_Q 1024
#define DGEMM_DEFAULT_Q 512
#define CGEMM_DEFAULT_Q 512
#define ZGEMM_DEFAULT_Q 256
#define SYMV_P 8
#endif
#ifdef SICORTEX
#define SNUMOPT 2
#define DNUMOPT 2
#define GEMM_DEFAULT_OFFSET_A 0
#define GEMM_DEFAULT_OFFSET_B 0
#define GEMM_DEFAULT_ALIGN (BLASLONG)0x03fffUL
#define SGEMM_DEFAULT_UNROLL_M 2
#define SGEMM_DEFAULT_UNROLL_N 8
#define DGEMM_DEFAULT_UNROLL_M 2
#define DGEMM_DEFAULT_UNROLL_N 8
#define CGEMM_DEFAULT_UNROLL_M 1
#define CGEMM_DEFAULT_UNROLL_N 4
#define ZGEMM_DEFAULT_UNROLL_M 1
#define ZGEMM_DEFAULT_UNROLL_N 4
#define SGEMM_DEFAULT_P 108
#define DGEMM_DEFAULT_P 112
#define CGEMM_DEFAULT_P 108
#define ZGEMM_DEFAULT_P 112
#define SGEMM_DEFAULT_Q 288
#define DGEMM_DEFAULT_Q 144
#define CGEMM_DEFAULT_Q 144
#define ZGEMM_DEFAULT_Q 72
#define SGEMM_DEFAULT_R 2000
#define DGEMM_DEFAULT_R 2000
#define CGEMM_DEFAULT_R 2000
#define ZGEMM_DEFAULT_R 2000
#define SYMV_P 16
#endif
#if defined(LOONGSON3R4)
#define SNUMOPT 2
#define DNUMOPT 2
#define GEMM_DEFAULT_OFFSET_A 0
#define GEMM_DEFAULT_OFFSET_B 0
#define GEMM_DEFAULT_ALIGN (BLASLONG)0x03fffUL
#if defined(NO_MSA)
#define SGEMM_DEFAULT_UNROLL_M 8
#define SGEMM_DEFAULT_UNROLL_N 4
#define DGEMM_DEFAULT_UNROLL_M 4
#define DGEMM_DEFAULT_UNROLL_N 4
#define CGEMM_DEFAULT_UNROLL_M 4
#define CGEMM_DEFAULT_UNROLL_N 2
#define ZGEMM_DEFAULT_UNROLL_M 2
#define ZGEMM_DEFAULT_UNROLL_N 2
#else
#define SGEMM_DEFAULT_UNROLL_M 8
#define SGEMM_DEFAULT_UNROLL_N 8
#define DGEMM_DEFAULT_UNROLL_M 8
#define DGEMM_DEFAULT_UNROLL_N 4
#define CGEMM_DEFAULT_UNROLL_M 8
#define CGEMM_DEFAULT_UNROLL_N 4
#define ZGEMM_DEFAULT_UNROLL_M 4
#define ZGEMM_DEFAULT_UNROLL_N 4
#endif
#define SGEMM_DEFAULT_P 64
#define DGEMM_DEFAULT_P 44
#define CGEMM_DEFAULT_P 64
#define ZGEMM_DEFAULT_P 32
#define SGEMM_DEFAULT_Q 192
#define DGEMM_DEFAULT_Q 92
#define CGEMM_DEFAULT_Q 128
#define ZGEMM_DEFAULT_Q 80
#define SGEMM_DEFAULT_R 640
#define DGEMM_DEFAULT_R dgemm_r
#define CGEMM_DEFAULT_R 640
#define ZGEMM_DEFAULT_R 640
#define GEMM_OFFSET_A1 0x10000
#define GEMM_OFFSET_B1 0x100000
#define SYMV_P 16
#endif
#if defined(LOONGSON3R3)
/* Copied from SICORTEX */
#define SNUMOPT 2
#define DNUMOPT 2
#define GEMM_DEFAULT_OFFSET_A 0
#define GEMM_DEFAULT_OFFSET_B 0
#define GEMM_DEFAULT_ALIGN (BLASLONG)0x03fffUL
#define SGEMM_DEFAULT_UNROLL_M 8
#define SGEMM_DEFAULT_UNROLL_N 4
#define DGEMM_DEFAULT_UNROLL_M 4
#define DGEMM_DEFAULT_UNROLL_N 4
#define CGEMM_DEFAULT_UNROLL_M 4
#define CGEMM_DEFAULT_UNROLL_N 2
#define ZGEMM_DEFAULT_UNROLL_M 2
#define ZGEMM_DEFAULT_UNROLL_N 2
#define SGEMM_DEFAULT_P 64
#define DGEMM_DEFAULT_P 44
#define CGEMM_DEFAULT_P 64
#define ZGEMM_DEFAULT_P 32
#define SGEMM_DEFAULT_Q 192
#define DGEMM_DEFAULT_Q 92
#define CGEMM_DEFAULT_Q 128
#define ZGEMM_DEFAULT_Q 80
#define SGEMM_DEFAULT_R 640
#define DGEMM_DEFAULT_R dgemm_r
#define CGEMM_DEFAULT_R 640
#define ZGEMM_DEFAULT_R 640
#define GEMM_OFFSET_A1 0x10000
#define GEMM_OFFSET_B1 0x100000
#define SYMV_P 16
#endif
#if defined (LA464)
#define SNUMOPT 2
#define DNUMOPT 2
#define GEMM_DEFAULT_OFFSET_A 0x20000
#define GEMM_DEFAULT_OFFSET_B 0
#define GEMM_DEFAULT_ALIGN 0x0ffffUL
#if defined(NO_LASX)
#define DGEMM_DEFAULT_UNROLL_N 8
#define DGEMM_DEFAULT_UNROLL_M 2
#define SGEMM_DEFAULT_UNROLL_N 8
#define SGEMM_DEFAULT_UNROLL_M 2
#define CGEMM_DEFAULT_UNROLL_N 4
#define CGEMM_DEFAULT_UNROLL_M 1
#define ZGEMM_DEFAULT_UNROLL_N 4
#define ZGEMM_DEFAULT_UNROLL_M 1
#else
#define DGEMM_DEFAULT_UNROLL_N 6
#define DGEMM_DEFAULT_UNROLL_M 16
#define SGEMM_DEFAULT_UNROLL_N 8
#define SGEMM_DEFAULT_UNROLL_M 16
#define CGEMM_DEFAULT_UNROLL_N 4
#define CGEMM_DEFAULT_UNROLL_M 16
#define ZGEMM_DEFAULT_UNROLL_N 4
#define ZGEMM_DEFAULT_UNROLL_M 8
#define DGEMM_DEFAULT_UNROLL_MN 96
#endif
#define QGEMM_DEFAULT_UNROLL_N 2
#define XGEMM_DEFAULT_UNROLL_N 1
#define QGEMM_DEFAULT_UNROLL_M 2
#define XGEMM_DEFAULT_UNROLL_M 1
#define SGEMM_DEFAULT_P sgemm_p
#define DGEMM_DEFAULT_P dgemm_p
#define CGEMM_DEFAULT_P 128
#define ZGEMM_DEFAULT_P zgemm_p
#define SGEMM_DEFAULT_R sgemm_r
#define DGEMM_DEFAULT_R dgemm_r
#define CGEMM_DEFAULT_R 4096
#define ZGEMM_DEFAULT_R zgemm_r
#define SGEMM_DEFAULT_Q sgemm_q
#define DGEMM_DEFAULT_Q dgemm_q
#define CGEMM_DEFAULT_Q 128
#define ZGEMM_DEFAULT_Q zgemm_q
#define SYMV_P 16
#endif
#ifdef LA264
#define GEMM_DEFAULT_OFFSET_A 0
#define GEMM_DEFAULT_OFFSET_B 0
#define GEMM_DEFAULT_ALIGN (BLASLONG)0x03fffUL
#define SGEMM_DEFAULT_UNROLL_M 2
#define SGEMM_DEFAULT_UNROLL_N 8
#define DGEMM_DEFAULT_UNROLL_M 8
#define DGEMM_DEFAULT_UNROLL_N 4
#define CGEMM_DEFAULT_UNROLL_M 8
#define CGEMM_DEFAULT_UNROLL_N 4
#define ZGEMM_DEFAULT_UNROLL_M 4
#define ZGEMM_DEFAULT_UNROLL_N 4
#define SGEMM_DEFAULT_P 128
#define DGEMM_DEFAULT_P 128
#define CGEMM_DEFAULT_P 96
#define ZGEMM_DEFAULT_P 64
#define SGEMM_DEFAULT_Q 240
#define DGEMM_DEFAULT_Q 120
#define CGEMM_DEFAULT_Q 120
#define ZGEMM_DEFAULT_Q 120
#define SGEMM_DEFAULT_R 12288
#define DGEMM_DEFAULT_R 8192
#define CGEMM_DEFAULT_R 4096
#define ZGEMM_DEFAULT_R 4096
#define SYMV_P 16
#endif
#ifdef LA64_GENERIC
#define GEMM_DEFAULT_OFFSET_A 0
#define GEMM_DEFAULT_OFFSET_B 0
#define GEMM_DEFAULT_ALIGN (BLASLONG)0x03fffUL
#define SGEMM_DEFAULT_UNROLL_M 2
#define SGEMM_DEFAULT_UNROLL_N 8
#define DGEMM_DEFAULT_UNROLL_M 2
#define DGEMM_DEFAULT_UNROLL_N 8
#define CGEMM_DEFAULT_UNROLL_M 1
#define CGEMM_DEFAULT_UNROLL_N 4
#define ZGEMM_DEFAULT_UNROLL_M 1
#define ZGEMM_DEFAULT_UNROLL_N 4
#define SGEMM_DEFAULT_P 128
#define DGEMM_DEFAULT_P 128
#define CGEMM_DEFAULT_P 96
#define ZGEMM_DEFAULT_P 64
#define SGEMM_DEFAULT_Q 240
#define DGEMM_DEFAULT_Q 120
#define CGEMM_DEFAULT_Q 120
#define ZGEMM_DEFAULT_Q 120
#define SGEMM_DEFAULT_R 12288
#define DGEMM_DEFAULT_R 8192
#define CGEMM_DEFAULT_R 4096
#define ZGEMM_DEFAULT_R 4096
#define SYMV_P 16
#endif
#if defined(MIPS64_GENERIC) || defined(P5600) || defined(MIPS1004K) || defined(MIPS24K) || defined(I6400) || defined(P6600) || defined(I6500)
#define SNUMOPT 2
#define DNUMOPT 2
#define GEMM_DEFAULT_OFFSET_A 0
#define GEMM_DEFAULT_OFFSET_B 0
#define GEMM_DEFAULT_ALIGN (BLASLONG) 0x03fffUL
#if defined(NO_MSA) || defined(MIPS64_GENERIC)
#define SGEMM_DEFAULT_UNROLL_M 2
#define SGEMM_DEFAULT_UNROLL_N 2
#define DGEMM_DEFAULT_UNROLL_M 2
#define DGEMM_DEFAULT_UNROLL_N 2
#define CGEMM_DEFAULT_UNROLL_M 2
#define CGEMM_DEFAULT_UNROLL_N 2
#define ZGEMM_DEFAULT_UNROLL_M 2
#define ZGEMM_DEFAULT_UNROLL_N 2
#else
#define SGEMM_DEFAULT_UNROLL_M 8
#define SGEMM_DEFAULT_UNROLL_N 8
#define DGEMM_DEFAULT_UNROLL_M 8
#define DGEMM_DEFAULT_UNROLL_N 4
#define CGEMM_DEFAULT_UNROLL_M 8
#define CGEMM_DEFAULT_UNROLL_N 4
#define ZGEMM_DEFAULT_UNROLL_M 4
#define ZGEMM_DEFAULT_UNROLL_N 4
#endif
#define SGEMM_DEFAULT_P 128
#define DGEMM_DEFAULT_P 128
#define CGEMM_DEFAULT_P 96
#define ZGEMM_DEFAULT_P 64
#define SGEMM_DEFAULT_Q 240
#define DGEMM_DEFAULT_Q 120
#define CGEMM_DEFAULT_Q 120
#define ZGEMM_DEFAULT_Q 120
#define SGEMM_DEFAULT_R 12288
#define DGEMM_DEFAULT_R 8192
#define CGEMM_DEFAULT_R 4096
#define ZGEMM_DEFAULT_R 4096
#define SYMV_P 16
#endif
#ifdef RISCV64_GENERIC
#define GEMM_DEFAULT_OFFSET_A 0
#define GEMM_DEFAULT_OFFSET_B 0
#define GEMM_DEFAULT_ALIGN (BLASLONG)0x03fffUL
#define SGEMM_DEFAULT_UNROLL_M 2
#define SGEMM_DEFAULT_UNROLL_N 2
#define DGEMM_DEFAULT_UNROLL_M 2
#define DGEMM_DEFAULT_UNROLL_N 2
#define CGEMM_DEFAULT_UNROLL_M 2
#define CGEMM_DEFAULT_UNROLL_N 2
#define ZGEMM_DEFAULT_UNROLL_M 2
#define ZGEMM_DEFAULT_UNROLL_N 2
#define SGEMM_DEFAULT_P 128
#define DGEMM_DEFAULT_P 128
#define CGEMM_DEFAULT_P 96
#define ZGEMM_DEFAULT_P 64
#define SGEMM_DEFAULT_Q 240
#define DGEMM_DEFAULT_Q 120
#define CGEMM_DEFAULT_Q 120
#define ZGEMM_DEFAULT_Q 120
#define SGEMM_DEFAULT_R 12288
#define DGEMM_DEFAULT_R 8192
#define CGEMM_DEFAULT_R 4096
#define ZGEMM_DEFAULT_R 4096
#define SYMV_P 16
#define GEMM_DEFAULT_OFFSET_A 0
#define GEMM_DEFAULT_OFFSET_B 0
#endif
#if defined(x280)
#define GEMM_DEFAULT_OFFSET_A 0
#define GEMM_DEFAULT_OFFSET_B 0
#define GEMM_DEFAULT_ALIGN 0x03fffUL
#define SGEMM_DEFAULT_UNROLL_M 16 // 4 // 16 // 2
#define SGEMM_DEFAULT_UNROLL_N 8 // 4 // 4 // 2
/* SGEMM_UNROLL_MN is calculated as max(SGEMM_UNROLL_M, SGEMM_UNROLL_N)
 * Since we don't define SGEMM_UNROLL_M correctly we have to manually set this macro.
 * If VLMAX size is ever more than 1024, this should be increased also. */
#define SGEMM_DEFAULT_UNROLL_MN 32
#define DGEMM_DEFAULT_UNROLL_M 16 // 2 // 8
#define DGEMM_DEFAULT_UNROLL_N 8 // 2 // 4
#define DGEMM_DEFAULT_UNROLL_MN 32
#define CGEMM_DEFAULT_UNROLL_M 8
#define CGEMM_DEFAULT_UNROLL_N 4
#define CGEMM_DEFAULT_UNROLL_MN 32
#define ZGEMM_DEFAULT_UNROLL_M 8
#define ZGEMM_DEFAULT_UNROLL_N 4
#define ZGEMM_DEFAULT_UNROLL_MN 16
#define SGEMM_DEFAULT_P 160
#define DGEMM_DEFAULT_P 160
#define CGEMM_DEFAULT_P 96
#define ZGEMM_DEFAULT_P 64
#define SGEMM_DEFAULT_Q 240
#define DGEMM_DEFAULT_Q 128
#define CGEMM_DEFAULT_Q 120
#define ZGEMM_DEFAULT_Q 120
#define SGEMM_DEFAULT_R 12288
#define DGEMM_DEFAULT_R 8192
#define CGEMM_DEFAULT_R 4096
#define ZGEMM_DEFAULT_R 4096
#define SYMV_P 16
#define GEMM_DEFAULT_OFFSET_A 0
#define GEMM_DEFAULT_OFFSET_B 0
#endif
#ifdef C910V
#define GEMM_DEFAULT_OFFSET_A 0
#define GEMM_DEFAULT_OFFSET_B 0
#define GEMM_DEFAULT_ALIGN 0x03fffUL
#define SGEMM_DEFAULT_UNROLL_M 16
#define SGEMM_DEFAULT_UNROLL_N 4
#define DGEMM_DEFAULT_UNROLL_M 8
#define DGEMM_DEFAULT_UNROLL_N 4
#define CGEMM_DEFAULT_UNROLL_M 2
#define CGEMM_DEFAULT_UNROLL_N 2
#define ZGEMM_DEFAULT_UNROLL_M 2
#define ZGEMM_DEFAULT_UNROLL_N 2
#define SGEMM_DEFAULT_P 160
#define DGEMM_DEFAULT_P 160
#define CGEMM_DEFAULT_P 96
#define ZGEMM_DEFAULT_P 64
#define SGEMM_DEFAULT_Q 240
#define DGEMM_DEFAULT_Q 128
#define CGEMM_DEFAULT_Q 120
#define ZGEMM_DEFAULT_Q 120
#define SGEMM_DEFAULT_R 12288
#define DGEMM_DEFAULT_R 8192
#define CGEMM_DEFAULT_R 4096
#define ZGEMM_DEFAULT_R 4096
#define SYMV_P 16
#define GEMM_DEFAULT_OFFSET_A 0
#define GEMM_DEFAULT_OFFSET_B 0
#endif
#ifdef RISCV64_ZVL128B
#define GEMM_DEFAULT_OFFSET_A 0
#define GEMM_DEFAULT_OFFSET_B 0
#define GEMM_DEFAULT_ALIGN (BLASLONG)0x03fffUL
#undef SHGEMM_DEFAULT_UNROLL_M
#undef SHGEMM_DEFAULT_UNROLL_N
#define SHGEMM_DEFAULT_UNROLL_M 8
#define SHGEMM_DEFAULT_UNROLL_N 8
#define SGEMM_DEFAULT_UNROLL_M 8
#define SGEMM_DEFAULT_UNROLL_N 8
#define DGEMM_DEFAULT_UNROLL_M 8
#define DGEMM_DEFAULT_UNROLL_N 4
#define CGEMM_DEFAULT_UNROLL_M 8
#define CGEMM_DEFAULT_UNROLL_N 4
#define ZGEMM_DEFAULT_UNROLL_M 4
#define ZGEMM_DEFAULT_UNROLL_N 4
#undef SHGEMM_DEFAULT_P
#define SHGEMM_DEFAULT_P 128
#define SGEMM_DEFAULT_P 128
#define DGEMM_DEFAULT_P 128
#define CGEMM_DEFAULT_P 96
#define ZGEMM_DEFAULT_P 64
#undef SHGEMM_DEFAULT_Q
#define SHGEMM_DEFAULT_Q 240
#define SGEMM_DEFAULT_Q 240
#define DGEMM_DEFAULT_Q 120
#define CGEMM_DEFAULT_Q 120
#define ZGEMM_DEFAULT_Q 120
#undef SHGEMM_DEFAULT_R
#define SHGEMM_DEFAULT_R 12288
#define SGEMM_DEFAULT_R 12288
#define DGEMM_DEFAULT_R 8192
#define CGEMM_DEFAULT_R 4096
#define ZGEMM_DEFAULT_R 4096
#define SYMV_P 16
#define GEMM_DEFAULT_OFFSET_A 0
#define GEMM_DEFAULT_OFFSET_B 0
#endif
#ifdef RISCV64_ZVL256B
#define GEMM_DEFAULT_OFFSET_A 0
#define GEMM_DEFAULT_OFFSET_B 0
#define GEMM_DEFAULT_ALIGN 0x03fffUL
#undef SHGEMM_DEFAULT_UNROLL_M
#undef SHGEMM_DEFAULT_UNROLL_N
#define SHGEMM_DEFAULT_UNROLL_M 16
#define SHGEMM_DEFAULT_UNROLL_N 8
#define SGEMM_DEFAULT_UNROLL_M 16
#define SGEMM_DEFAULT_UNROLL_N 8
#define DGEMM_DEFAULT_UNROLL_M 8
#define DGEMM_DEFAULT_UNROLL_N 8
#define CGEMM_DEFAULT_UNROLL_M 8
#define CGEMM_DEFAULT_UNROLL_N 8
#define ZGEMM_DEFAULT_UNROLL_M 8
#define ZGEMM_DEFAULT_UNROLL_N 4
#undef SHGEMM_DEFAULT_P
#define SHGEMM_DEFAULT_P 128
#define SGEMM_DEFAULT_P 128
#define DGEMM_DEFAULT_P 64
#define CGEMM_DEFAULT_P 64
#define ZGEMM_DEFAULT_P 64
#undef SHGEMM_DEFAULT_Q
#define SHGEMM_DEFAULT_Q 128
#define SGEMM_DEFAULT_Q 128
#define DGEMM_DEFAULT_Q 128
#define CGEMM_DEFAULT_Q 128
#define ZGEMM_DEFAULT_Q 64
#undef SHGEMM_DEFAULT_R
#define SHGEMM_DEFAULT_R 16384
#define SGEMM_DEFAULT_R 16384
#define DGEMM_DEFAULT_R 8192
#define CGEMM_DEFAULT_R 8192
#define ZGEMM_DEFAULT_R 4096
#define SYMV_P 16
#define GEMM_DEFAULT_OFFSET_A 0
#define GEMM_DEFAULT_OFFSET_B 0
#endif
#ifdef ARMV7
#define SNUMOPT 2
#define DNUMOPT 2
#define GEMM_DEFAULT_OFFSET_A 0
#define GEMM_DEFAULT_OFFSET_B 0
#define GEMM_DEFAULT_ALIGN (BLASLONG)0x03fffUL
#define SGEMM_DEFAULT_UNROLL_M 4
#define SGEMM_DEFAULT_UNROLL_N 4
#define DGEMM_DEFAULT_UNROLL_M 4
#define DGEMM_DEFAULT_UNROLL_N 4
#define CGEMM_DEFAULT_UNROLL_M 2
#define CGEMM_DEFAULT_UNROLL_N 2
#define ZGEMM_DEFAULT_UNROLL_M 2
#define ZGEMM_DEFAULT_UNROLL_N 2
#define SGEMM_DEFAULT_P 128
#define DGEMM_DEFAULT_P 128
#define CGEMM_DEFAULT_P 96
#define ZGEMM_DEFAULT_P 64
#define SGEMM_DEFAULT_Q 240
#define DGEMM_DEFAULT_Q 120
#define CGEMM_DEFAULT_Q 120
#define ZGEMM_DEFAULT_Q 120
#define SGEMM_DEFAULT_R 12288
#define DGEMM_DEFAULT_R 8192
#define CGEMM_DEFAULT_R 4096
#define ZGEMM_DEFAULT_R 4096
#define SYMV_P 16
#endif
#if defined(ARMV6)
#define SNUMOPT 2
#define DNUMOPT 2
#define GEMM_DEFAULT_OFFSET_A 0
#define GEMM_DEFAULT_OFFSET_B 0
#define GEMM_DEFAULT_ALIGN (BLASLONG)0x03fffUL
#define SGEMM_DEFAULT_UNROLL_M 4
#define SGEMM_DEFAULT_UNROLL_N 2
#define DGEMM_DEFAULT_UNROLL_M 4
#define DGEMM_DEFAULT_UNROLL_N 2
#define CGEMM_DEFAULT_UNROLL_M 2
#define CGEMM_DEFAULT_UNROLL_N 2
#define ZGEMM_DEFAULT_UNROLL_M 2
#define ZGEMM_DEFAULT_UNROLL_N 2
#define SGEMM_DEFAULT_P 128
#define DGEMM_DEFAULT_P 128
#define CGEMM_DEFAULT_P 96
#define ZGEMM_DEFAULT_P 64
#define SGEMM_DEFAULT_Q 240
#define DGEMM_DEFAULT_Q 120
#define CGEMM_DEFAULT_Q 120
#define ZGEMM_DEFAULT_Q 120
#define SGEMM_DEFAULT_R 12288
#define DGEMM_DEFAULT_R 8192
#define CGEMM_DEFAULT_R 4096
#define ZGEMM_DEFAULT_R 4096
#define SYMV_P 16
#endif
/* Common ARMv8 parameters */
#if defined(ARMV8)
#define SNUMOPT 2
#define DNUMOPT 2
#define GEMM_DEFAULT_OFFSET_A 0
#define GEMM_DEFAULT_OFFSET_B 0
#ifdef _WIN64
/* Use an explicit cast on Win64, which uses the LLP64 data model */
#define GEMM_DEFAULT_ALIGN (BLASULONG)0x03fffUL
#else
#define GEMM_DEFAULT_ALIGN 0x03fffUL
#endif
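/* Worked example for the Win64 cast above. This is an illustrative note, not
 * upstream text: the address below is made up, and it assumes that BLASULONG
 * is a 64-bit unsigned type on Win64 and that the mask is applied to an
 * address as (addr & ~GEMM_ALIGN).
 *
 * Under LLP64, unsigned long is 32 bits wide, so ~0x03fffUL evaluates to
 * 0xffffc000 and is zero-extended when combined with a 64-bit address:
 *
 *   0x0000000123456789 & ~0x03fffUL             gives 0x0000000023454000
 *   0x0000000123456789 & ~(BLASULONG)0x03fffUL  gives 0x0000000123454000
 *
 * Without the cast, the upper 32 bits of the address are cleared. */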
#define SYMV_P 16
#if defined(CORTEXA57) || defined(CORTEXX1) || \
    defined(CORTEXA72) || defined(CORTEXA73) || \
    defined(FALKOR) || defined(TSV110) || defined(EMAG8180) || defined(VORTEX) || defined(FT2000)
#define SGEMM_DEFAULT_UNROLL_M 16
#define SGEMM_DEFAULT_UNROLL_N 4
#define DGEMM_DEFAULT_UNROLL_M 8
#define DGEMM_DEFAULT_UNROLL_N 4
#define CGEMM_DEFAULT_UNROLL_M 8
#define CGEMM_DEFAULT_UNROLL_N 4
#define ZGEMM_DEFAULT_UNROLL_M 4
#define ZGEMM_DEFAULT_UNROLL_N 4
/* FIXME: this should be using the cache size, but there is currently no easy way to
   query that on ARM. So if getarch counted more than 8 cores we simply assume the host
   is a big desktop or server with abundant cache rather than a phone or embedded device */
#if NUM_CORES > 8 || defined(TSV110) || defined(EMAG8180) || defined(VORTEX) || defined(CORTEXX1)
#define SGEMM_DEFAULT_P 512
#define DGEMM_DEFAULT_P 256
#define CGEMM_DEFAULT_P 256
#define ZGEMM_DEFAULT_P 128
#define SGEMM_DEFAULT_Q 1024
#define DGEMM_DEFAULT_Q 512
#define CGEMM_DEFAULT_Q 512
#define ZGEMM_DEFAULT_Q 512
#else
#define SGEMM_DEFAULT_P 128
#define DGEMM_DEFAULT_P 160
#define CGEMM_DEFAULT_P 128
#define ZGEMM_DEFAULT_P 128
#define SGEMM_DEFAULT_Q 352
#define DGEMM_DEFAULT_Q 128
#define CGEMM_DEFAULT_Q 224
#define ZGEMM_DEFAULT_Q 112
#endif
#define SGEMM_DEFAULT_R 4096
#define DGEMM_DEFAULT_R 4096
#define CGEMM_DEFAULT_R 4096
#define ZGEMM_DEFAULT_R 2048
#elif defined(CORTEXA76)
#define SGEMM_DEFAULT_UNROLL_M 16
#define SGEMM_DEFAULT_UNROLL_N 4
#define DGEMM_DEFAULT_UNROLL_M 8
#define DGEMM_DEFAULT_UNROLL_N 4
#define CGEMM_DEFAULT_UNROLL_M 8
#define CGEMM_DEFAULT_UNROLL_N 4
#define ZGEMM_DEFAULT_UNROLL_M 4
#define ZGEMM_DEFAULT_UNROLL_N 4
#if defined(XDOUBLE) || defined(DOUBLE)
#define SWITCH_RATIO 8
#else
#define SWITCH_RATIO 16
#endif
#define SGEMM_DEFAULT_P 256
#define DGEMM_DEFAULT_P 128
#define CGEMM_DEFAULT_P 128
#define ZGEMM_DEFAULT_P 64
#define SGEMM_DEFAULT_Q 512
#define DGEMM_DEFAULT_Q 256
#define CGEMM_DEFAULT_Q 256
#define ZGEMM_DEFAULT_Q 256
#define SGEMM_DEFAULT_R 4096
#define DGEMM_DEFAULT_R 4096
#define CGEMM_DEFAULT_R 4096
#define ZGEMM_DEFAULT_R 4096
#elif defined(CORTEXA53) || defined(CORTEXA55)
#define SGEMM_DEFAULT_UNROLL_M 8
#define SGEMM_DEFAULT_UNROLL_N 8
#define DGEMM_DEFAULT_UNROLL_M 4
#define DGEMM_DEFAULT_UNROLL_N 4
#define CGEMM_DEFAULT_UNROLL_M 8
#define CGEMM_DEFAULT_UNROLL_N 4
#define ZGEMM_DEFAULT_UNROLL_M 4
#define ZGEMM_DEFAULT_UNROLL_N 4
#define SGEMM_DEFAULT_P 256
#define DGEMM_DEFAULT_P 160
#define CGEMM_DEFAULT_P 128
#define ZGEMM_DEFAULT_P 128
#define SGEMM_DEFAULT_Q 256
#define DGEMM_DEFAULT_Q 128
#define CGEMM_DEFAULT_Q 224
#define ZGEMM_DEFAULT_Q 112
#define SGEMM_DEFAULT_R 4096
#define DGEMM_DEFAULT_R 4096
#define CGEMM_DEFAULT_R 4096
#define ZGEMM_DEFAULT_R 2048
#elif defined(THUNDERX)
#define SGEMM_DEFAULT_UNROLL_M 4
#define SGEMM_DEFAULT_UNROLL_N 4
#define DGEMM_DEFAULT_UNROLL_M 2
#define DGEMM_DEFAULT_UNROLL_N 2
#define CGEMM_DEFAULT_UNROLL_M 2
#define CGEMM_DEFAULT_UNROLL_N 2
#define ZGEMM_DEFAULT_UNROLL_M 2
#define ZGEMM_DEFAULT_UNROLL_N 2
#define SGEMM_DEFAULT_P 128
#define DGEMM_DEFAULT_P 128
#define CGEMM_DEFAULT_P 96
#define ZGEMM_DEFAULT_P 64
#define SGEMM_DEFAULT_Q 240
#define DGEMM_DEFAULT_Q 120
#define CGEMM_DEFAULT_Q 120
#define ZGEMM_DEFAULT_Q 120
#define SGEMM_DEFAULT_R 12288
#define DGEMM_DEFAULT_R 8192
#define CGEMM_DEFAULT_R 4096
#define ZGEMM_DEFAULT_R 4096
#elif defined(THUNDERX2T99)
#define SGEMM_DEFAULT_UNROLL_M 16
#define SGEMM_DEFAULT_UNROLL_N 4
#define DGEMM_DEFAULT_UNROLL_M 8
#define DGEMM_DEFAULT_UNROLL_N 4
#define CGEMM_DEFAULT_UNROLL_M 8
#define CGEMM_DEFAULT_UNROLL_N 4
#define ZGEMM_DEFAULT_UNROLL_M 4
#define ZGEMM_DEFAULT_UNROLL_N 4
#define SGEMM_DEFAULT_P 128
#define DGEMM_DEFAULT_P 160
#define CGEMM_DEFAULT_P 128
#define ZGEMM_DEFAULT_P 128
#define SGEMM_DEFAULT_Q 352
#define DGEMM_DEFAULT_Q 128
#define CGEMM_DEFAULT_Q 224
#define ZGEMM_DEFAULT_Q 112
#define SGEMM_DEFAULT_R 4096
#define DGEMM_DEFAULT_R 4096
#define CGEMM_DEFAULT_R 4096
#define ZGEMM_DEFAULT_R 4096
#elif defined(THUNDERX3T110)
#define SGEMM_DEFAULT_UNROLL_M 16
#define SGEMM_DEFAULT_UNROLL_N 4
#define DGEMM_DEFAULT_UNROLL_M 8
#define DGEMM_DEFAULT_UNROLL_N 4
#define CGEMM_DEFAULT_UNROLL_M 8
#define CGEMM_DEFAULT_UNROLL_N 4
#define ZGEMM_DEFAULT_UNROLL_M 4
#define ZGEMM_DEFAULT_UNROLL_N 4
#define SGEMM_DEFAULT_P 128
#define DGEMM_DEFAULT_P 320
#define CGEMM_DEFAULT_P 128
#define ZGEMM_DEFAULT_P 128
#define SGEMM_DEFAULT_Q 352
#define DGEMM_DEFAULT_Q 128
#define CGEMM_DEFAULT_Q 224
#define ZGEMM_DEFAULT_Q 112
#define SGEMM_DEFAULT_R 4096
#define DGEMM_DEFAULT_R 4096
#define CGEMM_DEFAULT_R 4096
#define ZGEMM_DEFAULT_R 4096
#elif defined(NEOVERSEN1)
#if defined(XDOUBLE) || defined(DOUBLE)
#define SWITCH_RATIO 8
#else
#define SWITCH_RATIO 16
#endif
#define SGEMM_DEFAULT_UNROLL_M 16
#define SGEMM_DEFAULT_UNROLL_N 4
#define DGEMM_DEFAULT_UNROLL_M 8
#define DGEMM_DEFAULT_UNROLL_N 4
#define CGEMM_DEFAULT_UNROLL_M 8
#define CGEMM_DEFAULT_UNROLL_N 4
#define ZGEMM_DEFAULT_UNROLL_M 4
#define ZGEMM_DEFAULT_UNROLL_N 4
#define SGEMM_DEFAULT_P 240
#define DGEMM_DEFAULT_P 240
#define CGEMM_DEFAULT_P 128
#define ZGEMM_DEFAULT_P 128
#define SGEMM_DEFAULT_Q 640
#define DGEMM_DEFAULT_Q 320
#define CGEMM_DEFAULT_Q 224
#define ZGEMM_DEFAULT_Q 112
#define SGEMM_DEFAULT_R 4096
#define DGEMM_DEFAULT_R 4096
#define CGEMM_DEFAULT_R 4096
#define ZGEMM_DEFAULT_R 4096
#elif defined(NEOVERSEV1) // 256-bit SVE
#if defined(XDOUBLE) || defined(DOUBLE)
#define SWITCH_RATIO 8
#define GEMM_PREFERED_SIZE 4
#else
#define SWITCH_RATIO 16
#define GEMM_PREFERED_SIZE 8
#endif
#undef SBGEMM_ALIGN_K
#undef SBGEMM_DEFAULT_UNROLL_M
#undef SBGEMM_DEFAULT_UNROLL_N
#define SBGEMM_ALIGN_K 8
#define SBGEMM_DEFAULT_UNROLL_M 4
#define SBGEMM_DEFAULT_UNROLL_N 4
#define SGEMM_DEFAULT_UNROLL_M 16
#define SGEMM_DEFAULT_UNROLL_N 8
#define DGEMM_DEFAULT_UNROLL_M 4 // Actually 2VL (8) but kept separate to keep copies separate
#define DGEMM_DEFAULT_UNROLL_N 8
#define CGEMM_DEFAULT_UNROLL_M 2
#define CGEMM_DEFAULT_UNROLL_N 4
#define CGEMM_DEFAULT_UNROLL_MN 16
#define ZGEMM_DEFAULT_UNROLL_M 2
#define ZGEMM_DEFAULT_UNROLL_N 4
#define ZGEMM_DEFAULT_UNROLL_MN 16
#define SGEMM_DEFAULT_P 240
#define DGEMM_DEFAULT_P 240
#define CGEMM_DEFAULT_P 128
#define ZGEMM_DEFAULT_P 128
#define SGEMM_DEFAULT_Q 640
#define DGEMM_DEFAULT_Q 320
#define CGEMM_DEFAULT_Q 224
#define ZGEMM_DEFAULT_Q 112
#define SGEMM_DEFAULT_R 4096
#define DGEMM_DEFAULT_R 4096
#define CGEMM_DEFAULT_R 4096
#define ZGEMM_DEFAULT_R 4096
#elif defined(NEOVERSEN2)
#if defined(XDOUBLE) || defined(DOUBLE)
#define SWITCH_RATIO 8
#else
#define SWITCH_RATIO 16
#endif
#undef SBGEMM_ALIGN_K
#define SBGEMM_ALIGN_K 4
#undef SBGEMM_DEFAULT_UNROLL_M
#undef SBGEMM_DEFAULT_UNROLL_N
#define SBGEMM_DEFAULT_UNROLL_M 8
#define SBGEMM_DEFAULT_UNROLL_N 4
#define SGEMM_DEFAULT_UNROLL_M 16
#define SGEMM_DEFAULT_UNROLL_N 4
#define DGEMM_DEFAULT_UNROLL_M 8
#define DGEMM_DEFAULT_UNROLL_N 4
#define CGEMM_DEFAULT_UNROLL_M 8
#define CGEMM_DEFAULT_UNROLL_N 4
#define ZGEMM_DEFAULT_UNROLL_M 4
#define ZGEMM_DEFAULT_UNROLL_N 4
#define SGEMM_DEFAULT_P 128
#define DGEMM_DEFAULT_P 160
#define CGEMM_DEFAULT_P 128
#define ZGEMM_DEFAULT_P 128
#define SGEMM_DEFAULT_Q 352
#define DGEMM_DEFAULT_Q 128
#define CGEMM_DEFAULT_Q 224
#define ZGEMM_DEFAULT_Q 112
#define SGEMM_DEFAULT_R 4096
#define DGEMM_DEFAULT_R 4096
#define CGEMM_DEFAULT_R 4096
#define ZGEMM_DEFAULT_R 4096
#elif defined(A64FX) // 512-bit SVE
/* When all BLAS3 routines are implemented with SVE, SGEMM_DEFAULT_UNROLL_M should be "sve_vl".
   Until then, just keep it different from SGEMM_DEFAULT_UNROLL_N to keep copy routines in both directions separated. */
#define SGEMM_DEFAULT_UNROLL_M 4
#define SGEMM_DEFAULT_UNROLL_N 8
/* SGEMM_UNROLL_MN is calculated as max(SGEMM_UNROLL_M, SGEMM_UNROLL_N)
 * Since we don't define SGEMM_UNROLL_M correctly we have to manually set this macro.
 * If SVE size is ever more than 1024, this should be increased also. */
#define SGEMM_DEFAULT_UNROLL_MN 32
/* When all BLAS3 routines are implemented with SVE, DGEMM_DEFAULT_UNROLL_M should be "sve_vl".
   Until then, just keep it different from DGEMM_DEFAULT_UNROLL_N to keep copy routines in both directions separated. */
#define DGEMM_DEFAULT_UNROLL_M 2
#define DGEMM_DEFAULT_UNROLL_N 8
#define DGEMM_DEFAULT_UNROLL_MN 32
#define CGEMM_DEFAULT_UNROLL_M 2
#define CGEMM_DEFAULT_UNROLL_N 4
#define CGEMM_DEFAULT_UNROLL_MN 16
#define ZGEMM_DEFAULT_UNROLL_M 2
#define ZGEMM_DEFAULT_UNROLL_N 4
#define ZGEMM_DEFAULT_UNROLL_MN 16
#define SGEMM_DEFAULT_P 128
#define DGEMM_DEFAULT_P 160
#define CGEMM_DEFAULT_P 128
#define ZGEMM_DEFAULT_P 128
#define SGEMM_DEFAULT_Q 352
#define DGEMM_DEFAULT_Q 128
#define CGEMM_DEFAULT_Q 224
#define ZGEMM_DEFAULT_Q 112
#define SGEMM_DEFAULT_R 4096
#define DGEMM_DEFAULT_R 4096
#define CGEMM_DEFAULT_R 4096
#define ZGEMM_DEFAULT_R 4096
#elif defined(ARMV8SVE) || defined(ARMV9SME) || defined(ARMV9) || defined(CORTEXA510) || defined(CORTEXA710) || defined(CORTEXX2) // 128-bit SVE
#if defined(XDOUBLE) || defined(DOUBLE)
#define SWITCH_RATIO 8
#else
#define SWITCH_RATIO 16
#endif
#define SGEMM_DEFAULT_UNROLL_M 4 // Actually 1VL (8) but kept separate to keep copies separate
#define SGEMM_DEFAULT_UNROLL_N 8
#define DGEMM_DEFAULT_UNROLL_M 4
#define DGEMM_DEFAULT_UNROLL_N 8
#define CGEMM_DEFAULT_UNROLL_M 2
#define CGEMM_DEFAULT_UNROLL_N 4
#define CGEMM_DEFAULT_UNROLL_MN 16
#define ZGEMM_DEFAULT_UNROLL_M 2
#define ZGEMM_DEFAULT_UNROLL_N 4
#define ZGEMM_DEFAULT_UNROLL_MN 16
#define SGEMM_DEFAULT_P 128
#define DGEMM_DEFAULT_P 160
#define CGEMM_DEFAULT_P 128
#define ZGEMM_DEFAULT_P 128
#define SGEMM_DEFAULT_Q 352
#define DGEMM_DEFAULT_Q 128
#define CGEMM_DEFAULT_Q 224
#define ZGEMM_DEFAULT_Q 112
#define SGEMM_DEFAULT_R 4096
#define DGEMM_DEFAULT_R 4096
#define CGEMM_DEFAULT_R 4096
#define ZGEMM_DEFAULT_R 4096
#else /* Other/undetected ARMv8 cores */
#define SGEMM_DEFAULT_UNROLL_M 16
#define SGEMM_DEFAULT_UNROLL_N 4
#define DGEMM_DEFAULT_UNROLL_M 8
#define DGEMM_DEFAULT_UNROLL_N 4
#define CGEMM_DEFAULT_UNROLL_M 8
#define CGEMM_DEFAULT_UNROLL_N 4
#define ZGEMM_DEFAULT_UNROLL_M 4
#define ZGEMM_DEFAULT_UNROLL_N 4
#define SGEMM_DEFAULT_P 128
#define DGEMM_DEFAULT_P 160
#define CGEMM_DEFAULT_P 128
#define ZGEMM_DEFAULT_P 128
#define SGEMM_DEFAULT_Q 352
#define DGEMM_DEFAULT_Q 128
#define CGEMM_DEFAULT_Q 224
#define ZGEMM_DEFAULT_Q 112
#define SGEMM_DEFAULT_R 4096
#define DGEMM_DEFAULT_R 4096
#define CGEMM_DEFAULT_R 4096
#define ZGEMM_DEFAULT_R 4096
#endif /* Cores */
#endif /* ARMv8 */
#if defined(ARMV9SME) /* ARMv9 SME */
#define USE_SGEMM_KERNEL_DIRECT 1
#endif /* ARMv9 SME */
#if defined(ARMV5)
#define SNUMOPT 2
#define DNUMOPT 2
#define GEMM_DEFAULT_OFFSET_A 0
#define GEMM_DEFAULT_OFFSET_B 0
#define GEMM_DEFAULT_ALIGN (BLASLONG)0x03fffUL
#define SGEMM_DEFAULT_UNROLL_M 2
#define SGEMM_DEFAULT_UNROLL_N 2
#define DGEMM_DEFAULT_UNROLL_M 2
#define DGEMM_DEFAULT_UNROLL_N 2
#define CGEMM_DEFAULT_UNROLL_M 2
#define CGEMM_DEFAULT_UNROLL_N 2
#define ZGEMM_DEFAULT_UNROLL_M 2
#define ZGEMM_DEFAULT_UNROLL_N 2
#define SGEMM_DEFAULT_P 128
#define DGEMM_DEFAULT_P 128
#define CGEMM_DEFAULT_P 96
#define ZGEMM_DEFAULT_P 64
#define SGEMM_DEFAULT_Q 240
#define DGEMM_DEFAULT_Q 120
#define CGEMM_DEFAULT_Q 120
#define ZGEMM_DEFAULT_Q 120
#define SGEMM_DEFAULT_R 12288
#define DGEMM_DEFAULT_R 8192
#define CGEMM_DEFAULT_R 4096
#define ZGEMM_DEFAULT_R 4096
#define SYMV_P 16
#endif
#ifdef CORTEXA9
#define SNUMOPT 2
#define DNUMOPT 2
#define GEMM_DEFAULT_OFFSET_A 0
#define GEMM_DEFAULT_OFFSET_B 0
#define GEMM_DEFAULT_ALIGN (BLASLONG)0x03fffUL
#define SGEMM_DEFAULT_UNROLL_M 4
#define SGEMM_DEFAULT_UNROLL_N 4
#define DGEMM_DEFAULT_UNROLL_M 4
#define DGEMM_DEFAULT_UNROLL_N 4
#define CGEMM_DEFAULT_UNROLL_M 2
#define CGEMM_DEFAULT_UNROLL_N 2
#define ZGEMM_DEFAULT_UNROLL_M 2
#define ZGEMM_DEFAULT_UNROLL_N 2
#define SGEMM_DEFAULT_P 128
#define DGEMM_DEFAULT_P 128
#define CGEMM_DEFAULT_P 96
#define ZGEMM_DEFAULT_P 64
#define SGEMM_DEFAULT_Q 240
#define DGEMM_DEFAULT_Q 120
#define CGEMM_DEFAULT_Q 120
#define ZGEMM_DEFAULT_Q 120
#define SGEMM_DEFAULT_R 12288
#define DGEMM_DEFAULT_R 8192
#define CGEMM_DEFAULT_R 4096
#define ZGEMM_DEFAULT_R 4096
#define SYMV_P 16
#endif
#ifdef CORTEXA15
#define SNUMOPT 2
#define DNUMOPT 2
#define GEMM_DEFAULT_OFFSET_A 0
#define GEMM_DEFAULT_OFFSET_B 0
#define GEMM_DEFAULT_ALIGN (BLASLONG)0x03fffUL
#define SGEMM_DEFAULT_UNROLL_M 4
#define SGEMM_DEFAULT_UNROLL_N 4
#define DGEMM_DEFAULT_UNROLL_M 4
#define DGEMM_DEFAULT_UNROLL_N 4
#define CGEMM_DEFAULT_UNROLL_M 2
#define CGEMM_DEFAULT_UNROLL_N 2
#define ZGEMM_DEFAULT_UNROLL_M 2
#define ZGEMM_DEFAULT_UNROLL_N 2
#define SGEMM_DEFAULT_P 128
#define DGEMM_DEFAULT_P 128
#define CGEMM_DEFAULT_P 96
#define ZGEMM_DEFAULT_P 64
#define SGEMM_DEFAULT_Q 240
#define DGEMM_DEFAULT_Q 120
#define CGEMM_DEFAULT_Q 120
#define ZGEMM_DEFAULT_Q 120
#define SGEMM_DEFAULT_R 12288
#define DGEMM_DEFAULT_R 8192
#define CGEMM_DEFAULT_R 4096
#define ZGEMM_DEFAULT_R 4096
#define SYMV_P 16
#endif
#if defined(ZARCH_GENERIC)
#define SNUMOPT 2
#define DNUMOPT 2
#define GEMM_DEFAULT_OFFSET_A 0
#define GEMM_DEFAULT_OFFSET_B 0
#define GEMM_DEFAULT_ALIGN (BLASLONG)0x03fffUL
#define SGEMM_DEFAULT_UNROLL_M 2
#define SGEMM_DEFAULT_UNROLL_N 2
#define DGEMM_DEFAULT_UNROLL_M 2
#define DGEMM_DEFAULT_UNROLL_N 2
#define CGEMM_DEFAULT_UNROLL_M 2
#define CGEMM_DEFAULT_UNROLL_N 2
#define ZGEMM_DEFAULT_UNROLL_M 2
#define ZGEMM_DEFAULT_UNROLL_N 2
#define SGEMM_DEFAULT_P 128
#define DGEMM_DEFAULT_P 128
#define CGEMM_DEFAULT_P 96
#define ZGEMM_DEFAULT_P 64
#define SGEMM_DEFAULT_Q 240
#define DGEMM_DEFAULT_Q 120
#define CGEMM_DEFAULT_Q 120
#define ZGEMM_DEFAULT_Q 120
#define SGEMM_DEFAULT_R 12288
#define DGEMM_DEFAULT_R 8192
#define CGEMM_DEFAULT_R 4096
#define ZGEMM_DEFAULT_R 4096
#define SYMV_P 16
#endif
#if defined(Z13)
#define SNUMOPT 2
#define DNUMOPT 2
#define GEMM_DEFAULT_OFFSET_A 0
#define GEMM_DEFAULT_OFFSET_B 0
#define GEMM_DEFAULT_ALIGN (BLASLONG)0x03fffUL
#define SGEMM_DEFAULT_UNROLL_M 8
#define SGEMM_DEFAULT_UNROLL_N 4
#define DGEMM_DEFAULT_UNROLL_M 8
#define DGEMM_DEFAULT_UNROLL_N 4
#define CGEMM_DEFAULT_UNROLL_M 4
#define CGEMM_DEFAULT_UNROLL_N 4
#define ZGEMM_DEFAULT_UNROLL_M 4
#define ZGEMM_DEFAULT_UNROLL_N 4
#define SGEMM_DEFAULT_P 456
#define DGEMM_DEFAULT_P 320
#define CGEMM_DEFAULT_P 480
#define ZGEMM_DEFAULT_P 224
#define SGEMM_DEFAULT_Q 488
#define DGEMM_DEFAULT_Q 384
#define CGEMM_DEFAULT_Q 128
#define ZGEMM_DEFAULT_Q 352
#define SGEMM_DEFAULT_R 8192
#define DGEMM_DEFAULT_R 4096
#define CGEMM_DEFAULT_R 4096
#define ZGEMM_DEFAULT_R 2048
#define SYMV_P 16
#endif
#if defined(Z14)
#define SNUMOPT 2
#define DNUMOPT 2
#define GEMM_DEFAULT_OFFSET_A 0
#define GEMM_DEFAULT_OFFSET_B 0
#define GEMM_DEFAULT_ALIGN 0x03fffUL
#define SGEMM_DEFAULT_UNROLL_M 16
#define SGEMM_DEFAULT_UNROLL_N 4
#define DGEMM_DEFAULT_UNROLL_M 8
#define DGEMM_DEFAULT_UNROLL_N 4
#define CGEMM_DEFAULT_UNROLL_M 4
#define CGEMM_DEFAULT_UNROLL_N 4
#define ZGEMM_DEFAULT_UNROLL_M 4
#define ZGEMM_DEFAULT_UNROLL_N 4
#define SGEMM_DEFAULT_P 480
#define DGEMM_DEFAULT_P 320
#define CGEMM_DEFAULT_P 480
#define ZGEMM_DEFAULT_P 224
#define SGEMM_DEFAULT_Q 512
#define DGEMM_DEFAULT_Q 384
#define CGEMM_DEFAULT_Q 128
#define ZGEMM_DEFAULT_Q 352
#define SGEMM_DEFAULT_R 8192
#define DGEMM_DEFAULT_R 4096
#define CGEMM_DEFAULT_R 4096
#define ZGEMM_DEFAULT_R 2048
#define SYMV_P 16
#endif
#if defined(CSKY) || defined(CK860FV)
#define GEMM_DEFAULT_OFFSET_A 0
#define GEMM_DEFAULT_OFFSET_B 0
#define GEMM_DEFAULT_ALIGN (BLASLONG)0x03fffUL
#define SGEMM_DEFAULT_UNROLL_M 2
#define SGEMM_DEFAULT_UNROLL_N 2
#define DGEMM_DEFAULT_UNROLL_M 2
#define DGEMM_DEFAULT_UNROLL_N 2
#define CGEMM_DEFAULT_UNROLL_M 2
#define CGEMM_DEFAULT_UNROLL_N 2
#define ZGEMM_DEFAULT_UNROLL_M 2
#define ZGEMM_DEFAULT_UNROLL_N 2
#define SGEMM_DEFAULT_P 128
#define DGEMM_DEFAULT_P 128
#define CGEMM_DEFAULT_P 96
#define ZGEMM_DEFAULT_P 64
#define SGEMM_DEFAULT_Q 240
#define DGEMM_DEFAULT_Q 120
#define CGEMM_DEFAULT_Q 120
#define ZGEMM_DEFAULT_Q 120
#define SGEMM_DEFAULT_R 12288
#define DGEMM_DEFAULT_R 8192
#define CGEMM_DEFAULT_R 4096
#define ZGEMM_DEFAULT_R 4096
#define SYMV_P 16
#endif
#ifdef GENERIC
#define SNUMOPT 2
#define DNUMOPT 2
#define GEMM_DEFAULT_OFFSET_A 0
#define GEMM_DEFAULT_OFFSET_B 0
#define GEMM_DEFAULT_ALIGN (BLASLONG)0x0ffffUL
#define SGEMM_DEFAULT_UNROLL_N 2
#define DGEMM_DEFAULT_UNROLL_N 2
#define QGEMM_DEFAULT_UNROLL_N 2
#define CGEMM_DEFAULT_UNROLL_N 2
#define ZGEMM_DEFAULT_UNROLL_N 2
#define XGEMM_DEFAULT_UNROLL_N 1
#define CGEMM3M_DEFAULT_UNROLL_N 2
#define ZGEMM3M_DEFAULT_UNROLL_N 2
#ifdef ARCH_X86
#define SGEMM_DEFAULT_UNROLL_M 2
#define DGEMM_DEFAULT_UNROLL_M 2
#define QGEMM_DEFAULT_UNROLL_M 2
#define CGEMM_DEFAULT_UNROLL_M 2
#define ZGEMM_DEFAULT_UNROLL_M 2
#define XGEMM_DEFAULT_UNROLL_M 1
#else
#define SGEMM_DEFAULT_UNROLL_M 2
#define DGEMM_DEFAULT_UNROLL_M 2
#define QGEMM_DEFAULT_UNROLL_M 2
#define CGEMM_DEFAULT_UNROLL_M 2
#define ZGEMM_DEFAULT_UNROLL_M 2
#define XGEMM_DEFAULT_UNROLL_M 1
#define CGEMM3M_DEFAULT_UNROLL_M 2
#define ZGEMM3M_DEFAULT_UNROLL_M 2
#define CGEMM3M_DEFAULT_P 448
#define ZGEMM3M_DEFAULT_P 224
#define XGEMM3M_DEFAULT_P 112
#define CGEMM3M_DEFAULT_Q 224
#define ZGEMM3M_DEFAULT_Q 224
#define XGEMM3M_DEFAULT_Q 224
#define CGEMM3M_DEFAULT_R 12288
#define ZGEMM3M_DEFAULT_R 12288
#define XGEMM3M_DEFAULT_R 12288
#endif
#ifdef ARCH_MIPS
#define SGEMM_DEFAULT_P 128
#define DGEMM_DEFAULT_P 128
#define CGEMM_DEFAULT_P 96
#define ZGEMM_DEFAULT_P 64
#define SGEMM_DEFAULT_Q 240
#define DGEMM_DEFAULT_Q 120
#define CGEMM_DEFAULT_Q 120
#define ZGEMM_DEFAULT_Q 120
#define SGEMM_DEFAULT_R 12288
#define DGEMM_DEFAULT_R 8192
#define CGEMM_DEFAULT_R 4096
#define ZGEMM_DEFAULT_R 4096
#elif defined(ARCH_LOONGARCH64)
#define SGEMM_DEFAULT_P 128
#define DGEMM_DEFAULT_P 128
#define CGEMM_DEFAULT_P 96
#define ZGEMM_DEFAULT_P 64
#define SGEMM_DEFAULT_Q 240
#define DGEMM_DEFAULT_Q 120
#define CGEMM_DEFAULT_Q 120
#define ZGEMM_DEFAULT_Q 120
#define SGEMM_DEFAULT_R 12288
#define DGEMM_DEFAULT_R 8192
#define CGEMM_DEFAULT_R 4096
#define ZGEMM_DEFAULT_R 4096
#else
#define SGEMM_DEFAULT_P sgemm_p
#define DGEMM_DEFAULT_P dgemm_p
#define QGEMM_DEFAULT_P qgemm_p
#define CGEMM_DEFAULT_P cgemm_p
#define ZGEMM_DEFAULT_P zgemm_p
#define XGEMM_DEFAULT_P xgemm_p
#define SGEMM_DEFAULT_R sgemm_r
#define DGEMM_DEFAULT_R dgemm_r
#define QGEMM_DEFAULT_R qgemm_r
#define CGEMM_DEFAULT_R cgemm_r
#define ZGEMM_DEFAULT_R zgemm_r
#define XGEMM_DEFAULT_R xgemm_r
#define SGEMM_DEFAULT_Q 128
#define DGEMM_DEFAULT_Q 128
#define QGEMM_DEFAULT_Q 128
#define CGEMM_DEFAULT_Q 128
#define ZGEMM_DEFAULT_Q 128
#define XGEMM_DEFAULT_Q 128
#endif
#define SYMV_P 16
#endif
#ifndef SWITCH_RATIO
#define SWITCH_RATIO 2
#endif
#ifndef QGEMM_DEFAULT_UNROLL_M
#define QGEMM_DEFAULT_UNROLL_M 2
#endif
#ifndef QGEMM_DEFAULT_UNROLL_N
#define QGEMM_DEFAULT_UNROLL_N 2
#endif
#ifndef XGEMM_DEFAULT_UNROLL_M
#define XGEMM_DEFAULT_UNROLL_M 2
#endif
#ifndef XGEMM_DEFAULT_UNROLL_N
#define XGEMM_DEFAULT_UNROLL_N 2
#endif
#ifndef HAVE_SSE2
#define SHUFPD_0 shufps $0x44,
#define SHUFPD_1 shufps $0x4e,
#define SHUFPD_2 shufps $0xe4,
#define SHUFPD_3 shufps $0xee,
#endif
#ifndef SHUFPD_0
#define SHUFPD_0 shufpd $0,
#endif
#ifndef SHUFPD_1
#define SHUFPD_1 shufpd $1,
#endif
#ifndef SHUFPD_2
#define SHUFPD_2 shufpd $2,
#endif
#ifndef SHUFPD_3
#define SHUFPD_3 shufpd $3,
#endif
#ifndef SHUFPS_39
#define SHUFPS_39 shufps $0x39,
#endif
#endif
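The `*GEMM_DEFAULT_P` and `*GEMM_DEFAULT_Q` values above are easier to compare across cores when converted to a memory footprint. The sketch below is a standalone illustration and not part of `param.h`. It assumes that P is the block size along M and Q the block size along K, so that one packed panel of A holds P x Q elements; this header does not state that, so treat it as an assumption about the GEMM driver. The helper name `packed_panel_bytes` is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not part of param.h: bytes occupied by one packed
 * P x Q panel whose elements are elem_size bytes each.
 * Assumption: P blocks the M dimension and Q blocks the K dimension. */
static size_t packed_panel_bytes(size_t p, size_t q, size_t elem_size)
{
    /* Refuse inputs whose product would overflow size_t. */
    assert(p == 0 || q <= SIZE_MAX / p);
    assert(elem_size == 0 || p * q <= SIZE_MAX / elem_size);
    return p * q * elem_size;
}
```

With the values from the "Other/undetected ARMv8 cores" block, `packed_panel_bytes(128, 352, sizeof(float))` gives 180224 bytes (176 KiB) for SGEMM and `packed_panel_bytes(160, 128, sizeof(double))` gives 163840 bytes (160 KiB) for DGEMM. CGEMM (128 x 224, 8-byte elements) and ZGEMM (128 x 112, 16-byte elements) both come to 229376 bytes (224 KiB). Under the stated assumption, these are the figures to compare against a core's cache sizes when choosing P and Q for a new target.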