param.h 100 kB

Simplifying ARMv8 build parameters

ARMv8 builds were a bit mixed up, with ThunderX2 code in ARMv8 mode (which is not right, because TX2 is ARMv8.1), and required a few redundancies in the defines, making it harder to maintain and to understand which core has what. A few other minor issues were also fixed.

Tests were made on the following cores: A53, A57, A72, Falkor, ThunderX, ThunderX2, and XGene.

Tests were: OpenBLAS/test, OpenBLAS/benchmark, BLAS-Tester.

A summary:

* Removed TX2 code from the ARMv8 build, to make sure it is compatible with all ARMv8 cores, not just v8.1. Also, the TX2 code actually harmed performance on big cores.
* Commoned up the ARMv8 architectures' defines in param.h, to make sure that all cores benefit from the ARMv8 settings in addition to their own.
* Added a few more cores, using ARMv8's include strategy, to benefit from compiler optimisations using mtune. Also updated cache information from the manuals, making sure we set good conservative values by default. Removed Vulcan, as it's an alias for TX2.
* Auto-detecting most of those cores, and also updating the forced compilation in getarch.c, to make sure the parameters are the same whether compiled natively or with a forced arch.

Benefits:

* The ARMv8 build is now guaranteed to work on all ARMv8 cores.
* Improved performance for ARMv8 builds on some cores (A72, Falkor, ThunderX1 and 2: up to 11%) over current develop.
* Improved performance for *all* cores compared to the develop branch before TX2's patch (9% ~ 36%).
* ThunderX1 builds are 14% faster than ARMv8 builds on TX1, 9% faster than the current develop branch and 8% faster than develop before the TX2 patches.

Issues:

* Regression from the current develop branch for A53 (-12%) and A57 (-3%) with ARMv8 builds, but still faster than before TX2's commit (+15% and +24% respectively). This can be improved with a simplification of TX2's code, to be done in future patches. At least the code is guaranteed to be ARMv8.0 now.

Comments:

* CortexA57 builds are unchanged on A57 hardware from the develop branch, which makes sense, as that code is untouched.
* CortexA72 builds improve over A57 builds on A72 hardware, even though they use the same includes, due to new compiler tuning in the makefile.
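
To make the "commoned up defines" point concrete, here is a minimal sketch of how a shared ARMv8 block can be layered under per-core overrides with the C preprocessor. It is not taken from param.h: every `DEMO_*` macro, every function name, and every numeric value is a hypothetical placeholder chosen for illustration.

```c
/* Illustrative sketch only: every DEMO_* name and every value below is
 * hypothetical and is not copied from param.h. */

/* A real build would select the core with a -D flag; default to one here. */
#if !defined(DEMO_CORE_ARMV8) && !defined(DEMO_CORE_THUNDERX2) && \
    !defined(DEMO_CORE_CORTEXA72)
#define DEMO_CORE_CORTEXA72
#endif

/* Shared block: settings that every ARMv8-family core inherits. */
#if defined(DEMO_CORE_ARMV8) || defined(DEMO_CORE_CORTEXA72) || \
    defined(DEMO_CORE_THUNDERX2)
#define DEMO_UNROLL_M 4
#define DEMO_UNROLL_N 4
#endif

/* Per-core block: only the values that differ between cores. */
#if defined(DEMO_CORE_THUNDERX2)
#define DEMO_CORE_NAME "THUNDERX2"
#define DEMO_BLOCK_P 160
#elif defined(DEMO_CORE_CORTEXA72)
#define DEMO_CORE_NAME "CORTEXA72"
#define DEMO_BLOCK_P 128
#else
#define DEMO_CORE_NAME "ARMV8"
#define DEMO_BLOCK_P 64 /* conservative fallback for the generic target */
#endif

/* Small accessors so the selected configuration can be inspected. */
int demo_unroll_m(void) { return DEMO_UNROLL_M; }
int demo_block_p(void) { return DEMO_BLOCK_P; }
const char *demo_core_name(void) { return DEMO_CORE_NAME; }
```

With this layout, adding a core means extending the shared condition and adding one per-core branch, so the generic target never picks up core-specific settings by accident.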
6 years ago
12 years ago
3 years ago
Simplifying ARMv8 build parameters ARMv8 builds were a bit mixed up, with ThunderX2 code in ARMv8 mode (which is not right because TX2 is ARMv8.1) as well as requiring a few redundancies in the defines, making it harder to maintain and understand what core has what. A few other minor issues were also fixed. Tests were made on the following cores: A53, A57, A72, Falkor, ThunderX, ThunderX2, and XGene. Tests were: OpenBLAS/test, OpenBLAS/benchmark, BLAS-Tester. A summary: * Removed TX2 code from ARMv8 build, to make sure it is compatible with all ARMv8 cores, not just v8.1. Also, the TX2 code has actually harmed performance on big cores. * Commoned up ARMv8 architectures' defines in params.h, to make sure that all will benefit from ARMv8 settings, in addition to their own. * Adding a few more cores, using ARMv8's include strategy, to benefit from compiler optimisations using mtune. Also updated cache information from the manuals, making sure we set good conservative values by default. Removed Vulcan, as it's an alias to TX2. * Auto-detecting most of those cores, but also updating the forced compilation in getarch.c, to make sure the parameters are the same whether compiled natively or forced arch. Benefits: * ARMv8 build is now guaranteed to work on all ARMv8 cores * Improved performance for ARMv8 builds on some cores (A72, Falkor, ThunderX1 and 2: up to 11%) over current develop * Improved performance for *all* cores comparing to develop branch before TX2's patch (9% ~ 36%) * ThunderX1 builds are 14% faster than ARMv8 on TX1, 9% faster than current develop's branch and 8% faster than deveop before tx2 patches Issues: * Regression from current develop branch for A53 (-12%) and A57 (-3%) with ARMv8 builds, but still faster than before TX2's commit (+15% and +24% respectively). This can be improved with a simplification of TX2's code, to be done in future patches. At least the code is guaranteed to be ARMv8.0 now. 
Comments: * CortexA57 builds are unchanged on A57 hardware from develop's branch, which makes sense, as it's untouched. * CortexA72 builds improve over A57 on A72 hardware, even if they're using the same includes due to new compiler tunning in the makefile.
6 years ago
Simplifying ARMv8 build parameters ARMv8 builds were a bit mixed up, with ThunderX2 code in ARMv8 mode (which is not right because TX2 is ARMv8.1) as well as requiring a few redundancies in the defines, making it harder to maintain and understand what core has what. A few other minor issues were also fixed. Tests were made on the following cores: A53, A57, A72, Falkor, ThunderX, ThunderX2, and XGene. Tests were: OpenBLAS/test, OpenBLAS/benchmark, BLAS-Tester. A summary: * Removed TX2 code from ARMv8 build, to make sure it is compatible with all ARMv8 cores, not just v8.1. Also, the TX2 code has actually harmed performance on big cores. * Commoned up ARMv8 architectures' defines in params.h, to make sure that all will benefit from ARMv8 settings, in addition to their own. * Adding a few more cores, using ARMv8's include strategy, to benefit from compiler optimisations using mtune. Also updated cache information from the manuals, making sure we set good conservative values by default. Removed Vulcan, as it's an alias to TX2. * Auto-detecting most of those cores, but also updating the forced compilation in getarch.c, to make sure the parameters are the same whether compiled natively or forced arch. Benefits: * ARMv8 build is now guaranteed to work on all ARMv8 cores * Improved performance for ARMv8 builds on some cores (A72, Falkor, ThunderX1 and 2: up to 11%) over current develop * Improved performance for *all* cores comparing to develop branch before TX2's patch (9% ~ 36%) * ThunderX1 builds are 14% faster than ARMv8 on TX1, 9% faster than current develop's branch and 8% faster than deveop before tx2 patches Issues: * Regression from current develop branch for A53 (-12%) and A57 (-3%) with ARMv8 builds, but still faster than before TX2's commit (+15% and +24% respectively). This can be improved with a simplification of TX2's code, to be done in future patches. At least the code is guaranteed to be ARMv8.0 now. 
Comments: * CortexA57 builds are unchanged on A57 hardware from develop's branch, which makes sense, as it's untouched. * CortexA72 builds improve over A57 on A72 hardware, even if they're using the same includes due to new compiler tunning in the makefile.
6 years ago
12 years ago
12 years ago
Simplifying ARMv8 build parameters ARMv8 builds were a bit mixed up, with ThunderX2 code in ARMv8 mode (which is not right because TX2 is ARMv8.1) as well as requiring a few redundancies in the defines, making it harder to maintain and understand what core has what. A few other minor issues were also fixed. Tests were made on the following cores: A53, A57, A72, Falkor, ThunderX, ThunderX2, and XGene. Tests were: OpenBLAS/test, OpenBLAS/benchmark, BLAS-Tester. A summary: * Removed TX2 code from ARMv8 build, to make sure it is compatible with all ARMv8 cores, not just v8.1. Also, the TX2 code has actually harmed performance on big cores. * Commoned up ARMv8 architectures' defines in params.h, to make sure that all will benefit from ARMv8 settings, in addition to their own. * Adding a few more cores, using ARMv8's include strategy, to benefit from compiler optimisations using mtune. Also updated cache information from the manuals, making sure we set good conservative values by default. Removed Vulcan, as it's an alias to TX2. * Auto-detecting most of those cores, but also updating the forced compilation in getarch.c, to make sure the parameters are the same whether compiled natively or forced arch. Benefits: * ARMv8 build is now guaranteed to work on all ARMv8 cores * Improved performance for ARMv8 builds on some cores (A72, Falkor, ThunderX1 and 2: up to 11%) over current develop * Improved performance for *all* cores comparing to develop branch before TX2's patch (9% ~ 36%) * ThunderX1 builds are 14% faster than ARMv8 on TX1, 9% faster than current develop's branch and 8% faster than deveop before tx2 patches Issues: * Regression from current develop branch for A53 (-12%) and A57 (-3%) with ARMv8 builds, but still faster than before TX2's commit (+15% and +24% respectively). This can be improved with a simplification of TX2's code, to be done in future patches. At least the code is guaranteed to be ARMv8.0 now. 
Comments: * CortexA57 builds are unchanged on A57 hardware from develop's branch, which makes sense, as it's untouched. * CortexA72 builds improve over A57 on A72 hardware, even if they're using the same includes due to new compiler tunning in the makefile.
6 years ago
12 years ago
Simplifying ARMv8 build parameters ARMv8 builds were a bit mixed up, with ThunderX2 code in ARMv8 mode (which is not right because TX2 is ARMv8.1) as well as requiring a few redundancies in the defines, making it harder to maintain and understand what core has what. A few other minor issues were also fixed. Tests were made on the following cores: A53, A57, A72, Falkor, ThunderX, ThunderX2, and XGene. Tests were: OpenBLAS/test, OpenBLAS/benchmark, BLAS-Tester. A summary: * Removed TX2 code from ARMv8 build, to make sure it is compatible with all ARMv8 cores, not just v8.1. Also, the TX2 code has actually harmed performance on big cores. * Commoned up ARMv8 architectures' defines in params.h, to make sure that all will benefit from ARMv8 settings, in addition to their own. * Adding a few more cores, using ARMv8's include strategy, to benefit from compiler optimisations using mtune. Also updated cache information from the manuals, making sure we set good conservative values by default. Removed Vulcan, as it's an alias to TX2. * Auto-detecting most of those cores, but also updating the forced compilation in getarch.c, to make sure the parameters are the same whether compiled natively or forced arch. Benefits: * ARMv8 build is now guaranteed to work on all ARMv8 cores * Improved performance for ARMv8 builds on some cores (A72, Falkor, ThunderX1 and 2: up to 11%) over current develop * Improved performance for *all* cores comparing to develop branch before TX2's patch (9% ~ 36%) * ThunderX1 builds are 14% faster than ARMv8 on TX1, 9% faster than current develop's branch and 8% faster than deveop before tx2 patches Issues: * Regression from current develop branch for A53 (-12%) and A57 (-3%) with ARMv8 builds, but still faster than before TX2's commit (+15% and +24% respectively). This can be improved with a simplification of TX2's code, to be done in future patches. At least the code is guaranteed to be ARMv8.0 now. 
Comments: * CortexA57 builds are unchanged on A57 hardware from develop's branch, which makes sense, as it's untouched. * CortexA72 builds improve over A57 on A72 hardware, even if they're using the same includes due to new compiler tunning in the makefile.
6 years ago
Simplifying ARMv8 build parameters ARMv8 builds were a bit mixed up, with ThunderX2 code in ARMv8 mode (which is not right because TX2 is ARMv8.1) as well as requiring a few redundancies in the defines, making it harder to maintain and understand what core has what. A few other minor issues were also fixed. Tests were made on the following cores: A53, A57, A72, Falkor, ThunderX, ThunderX2, and XGene. Tests were: OpenBLAS/test, OpenBLAS/benchmark, BLAS-Tester. A summary: * Removed TX2 code from ARMv8 build, to make sure it is compatible with all ARMv8 cores, not just v8.1. Also, the TX2 code has actually harmed performance on big cores. * Commoned up ARMv8 architectures' defines in params.h, to make sure that all will benefit from ARMv8 settings, in addition to their own. * Adding a few more cores, using ARMv8's include strategy, to benefit from compiler optimisations using mtune. Also updated cache information from the manuals, making sure we set good conservative values by default. Removed Vulcan, as it's an alias to TX2. * Auto-detecting most of those cores, but also updating the forced compilation in getarch.c, to make sure the parameters are the same whether compiled natively or forced arch. Benefits: * ARMv8 build is now guaranteed to work on all ARMv8 cores * Improved performance for ARMv8 builds on some cores (A72, Falkor, ThunderX1 and 2: up to 11%) over current develop * Improved performance for *all* cores comparing to develop branch before TX2's patch (9% ~ 36%) * ThunderX1 builds are 14% faster than ARMv8 on TX1, 9% faster than current develop's branch and 8% faster than deveop before tx2 patches Issues: * Regression from current develop branch for A53 (-12%) and A57 (-3%) with ARMv8 builds, but still faster than before TX2's commit (+15% and +24% respectively). This can be improved with a simplification of TX2's code, to be done in future patches. At least the code is guaranteed to be ARMv8.0 now. 
Comments: * CortexA57 builds are unchanged on A57 hardware from develop's branch, which makes sense, as it's untouched. * CortexA72 builds improve over A57 on A72 hardware, even if they're using the same includes due to new compiler tunning in the makefile.
6 years ago
Simplifying ARMv8 build parameters ARMv8 builds were a bit mixed up, with ThunderX2 code in ARMv8 mode (which is not right because TX2 is ARMv8.1) as well as requiring a few redundancies in the defines, making it harder to maintain and understand what core has what. A few other minor issues were also fixed. Tests were made on the following cores: A53, A57, A72, Falkor, ThunderX, ThunderX2, and XGene. Tests were: OpenBLAS/test, OpenBLAS/benchmark, BLAS-Tester. A summary: * Removed TX2 code from ARMv8 build, to make sure it is compatible with all ARMv8 cores, not just v8.1. Also, the TX2 code has actually harmed performance on big cores. * Commoned up ARMv8 architectures' defines in params.h, to make sure that all will benefit from ARMv8 settings, in addition to their own. * Adding a few more cores, using ARMv8's include strategy, to benefit from compiler optimisations using mtune. Also updated cache information from the manuals, making sure we set good conservative values by default. Removed Vulcan, as it's an alias to TX2. * Auto-detecting most of those cores, but also updating the forced compilation in getarch.c, to make sure the parameters are the same whether compiled natively or forced arch. Benefits: * ARMv8 build is now guaranteed to work on all ARMv8 cores * Improved performance for ARMv8 builds on some cores (A72, Falkor, ThunderX1 and 2: up to 11%) over current develop * Improved performance for *all* cores comparing to develop branch before TX2's patch (9% ~ 36%) * ThunderX1 builds are 14% faster than ARMv8 on TX1, 9% faster than current develop's branch and 8% faster than deveop before tx2 patches Issues: * Regression from current develop branch for A53 (-12%) and A57 (-3%) with ARMv8 builds, but still faster than before TX2's commit (+15% and +24% respectively). This can be improved with a simplification of TX2's code, to be done in future patches. At least the code is guaranteed to be ARMv8.0 now. 
Comments: * CortexA57 builds are unchanged on A57 hardware from develop's branch, which makes sense, as it's untouched. * CortexA72 builds improve over A57 on A72 hardware, even if they're using the same includes due to new compiler tunning in the makefile.
6 years ago
Simplifying ARMv8 build parameters ARMv8 builds were a bit mixed up, with ThunderX2 code in ARMv8 mode (which is not right because TX2 is ARMv8.1) as well as requiring a few redundancies in the defines, making it harder to maintain and understand what core has what. A few other minor issues were also fixed. Tests were made on the following cores: A53, A57, A72, Falkor, ThunderX, ThunderX2, and XGene. Tests were: OpenBLAS/test, OpenBLAS/benchmark, BLAS-Tester. A summary: * Removed TX2 code from ARMv8 build, to make sure it is compatible with all ARMv8 cores, not just v8.1. Also, the TX2 code has actually harmed performance on big cores. * Commoned up ARMv8 architectures' defines in params.h, to make sure that all will benefit from ARMv8 settings, in addition to their own. * Adding a few more cores, using ARMv8's include strategy, to benefit from compiler optimisations using mtune. Also updated cache information from the manuals, making sure we set good conservative values by default. Removed Vulcan, as it's an alias to TX2. * Auto-detecting most of those cores, but also updating the forced compilation in getarch.c, to make sure the parameters are the same whether compiled natively or forced arch. Benefits: * ARMv8 build is now guaranteed to work on all ARMv8 cores * Improved performance for ARMv8 builds on some cores (A72, Falkor, ThunderX1 and 2: up to 11%) over current develop * Improved performance for *all* cores comparing to develop branch before TX2's patch (9% ~ 36%) * ThunderX1 builds are 14% faster than ARMv8 on TX1, 9% faster than current develop's branch and 8% faster than deveop before tx2 patches Issues: * Regression from current develop branch for A53 (-12%) and A57 (-3%) with ARMv8 builds, but still faster than before TX2's commit (+15% and +24% respectively). This can be improved with a simplification of TX2's code, to be done in future patches. At least the code is guaranteed to be ARMv8.0 now. 
Comments: * CortexA57 builds are unchanged on A57 hardware from develop's branch, which makes sense, as it's untouched. * CortexA72 builds improve over A57 on A72 hardware, even if they're using the same includes due to new compiler tunning in the makefile.
6 years ago
Simplifying ARMv8 build parameters ARMv8 builds were a bit mixed up, with ThunderX2 code in ARMv8 mode (which is not right because TX2 is ARMv8.1) as well as requiring a few redundancies in the defines, making it harder to maintain and understand what core has what. A few other minor issues were also fixed. Tests were made on the following cores: A53, A57, A72, Falkor, ThunderX, ThunderX2, and XGene. Tests were: OpenBLAS/test, OpenBLAS/benchmark, BLAS-Tester. A summary: * Removed TX2 code from ARMv8 build, to make sure it is compatible with all ARMv8 cores, not just v8.1. Also, the TX2 code has actually harmed performance on big cores. * Commoned up ARMv8 architectures' defines in params.h, to make sure that all will benefit from ARMv8 settings, in addition to their own. * Adding a few more cores, using ARMv8's include strategy, to benefit from compiler optimisations using mtune. Also updated cache information from the manuals, making sure we set good conservative values by default. Removed Vulcan, as it's an alias to TX2. * Auto-detecting most of those cores, but also updating the forced compilation in getarch.c, to make sure the parameters are the same whether compiled natively or forced arch. Benefits: * ARMv8 build is now guaranteed to work on all ARMv8 cores * Improved performance for ARMv8 builds on some cores (A72, Falkor, ThunderX1 and 2: up to 11%) over current develop * Improved performance for *all* cores comparing to develop branch before TX2's patch (9% ~ 36%) * ThunderX1 builds are 14% faster than ARMv8 on TX1, 9% faster than current develop's branch and 8% faster than deveop before tx2 patches Issues: * Regression from current develop branch for A53 (-12%) and A57 (-3%) with ARMv8 builds, but still faster than before TX2's commit (+15% and +24% respectively). This can be improved with a simplification of TX2's code, to be done in future patches. At least the code is guaranteed to be ARMv8.0 now. 
Comments: * CortexA57 builds are unchanged on A57 hardware from develop's branch, which makes sense, as it's untouched. * CortexA72 builds improve over A57 on A72 hardware, even if they're using the same includes due to new compiler tunning in the makefile.
6 years ago
Simplifying ARMv8 build parameters ARMv8 builds were a bit mixed up, with ThunderX2 code in ARMv8 mode (which is not right because TX2 is ARMv8.1) as well as requiring a few redundancies in the defines, making it harder to maintain and understand what core has what. A few other minor issues were also fixed. Tests were made on the following cores: A53, A57, A72, Falkor, ThunderX, ThunderX2, and XGene. Tests were: OpenBLAS/test, OpenBLAS/benchmark, BLAS-Tester. A summary: * Removed TX2 code from ARMv8 build, to make sure it is compatible with all ARMv8 cores, not just v8.1. Also, the TX2 code has actually harmed performance on big cores. * Commoned up ARMv8 architectures' defines in params.h, to make sure that all will benefit from ARMv8 settings, in addition to their own. * Adding a few more cores, using ARMv8's include strategy, to benefit from compiler optimisations using mtune. Also updated cache information from the manuals, making sure we set good conservative values by default. Removed Vulcan, as it's an alias to TX2. * Auto-detecting most of those cores, but also updating the forced compilation in getarch.c, to make sure the parameters are the same whether compiled natively or forced arch. Benefits: * ARMv8 build is now guaranteed to work on all ARMv8 cores * Improved performance for ARMv8 builds on some cores (A72, Falkor, ThunderX1 and 2: up to 11%) over current develop * Improved performance for *all* cores comparing to develop branch before TX2's patch (9% ~ 36%) * ThunderX1 builds are 14% faster than ARMv8 on TX1, 9% faster than current develop's branch and 8% faster than deveop before tx2 patches Issues: * Regression from current develop branch for A53 (-12%) and A57 (-3%) with ARMv8 builds, but still faster than before TX2's commit (+15% and +24% respectively). This can be improved with a simplification of TX2's code, to be done in future patches. At least the code is guaranteed to be ARMv8.0 now. 
Comments: * CortexA57 builds are unchanged on A57 hardware from develop's branch, which makes sense, as it's untouched. * CortexA72 builds improve over A57 on A72 hardware, even if they're using the same includes due to new compiler tunning in the makefile.
6 years ago
Simplifying ARMv8 build parameters ARMv8 builds were a bit mixed up, with ThunderX2 code in ARMv8 mode (which is not right because TX2 is ARMv8.1) as well as requiring a few redundancies in the defines, making it harder to maintain and understand what core has what. A few other minor issues were also fixed. Tests were made on the following cores: A53, A57, A72, Falkor, ThunderX, ThunderX2, and XGene. Tests were: OpenBLAS/test, OpenBLAS/benchmark, BLAS-Tester. A summary: * Removed TX2 code from ARMv8 build, to make sure it is compatible with all ARMv8 cores, not just v8.1. Also, the TX2 code has actually harmed performance on big cores. * Commoned up ARMv8 architectures' defines in params.h, to make sure that all will benefit from ARMv8 settings, in addition to their own. * Adding a few more cores, using ARMv8's include strategy, to benefit from compiler optimisations using mtune. Also updated cache information from the manuals, making sure we set good conservative values by default. Removed Vulcan, as it's an alias to TX2. * Auto-detecting most of those cores, but also updating the forced compilation in getarch.c, to make sure the parameters are the same whether compiled natively or forced arch. Benefits: * ARMv8 build is now guaranteed to work on all ARMv8 cores * Improved performance for ARMv8 builds on some cores (A72, Falkor, ThunderX1 and 2: up to 11%) over current develop * Improved performance for *all* cores comparing to develop branch before TX2's patch (9% ~ 36%) * ThunderX1 builds are 14% faster than ARMv8 on TX1, 9% faster than current develop's branch and 8% faster than deveop before tx2 patches Issues: * Regression from current develop branch for A53 (-12%) and A57 (-3%) with ARMv8 builds, but still faster than before TX2's commit (+15% and +24% respectively). This can be improved with a simplification of TX2's code, to be done in future patches. At least the code is guaranteed to be ARMv8.0 now. 
Comments: * CortexA57 builds are unchanged on A57 hardware from develop's branch, which makes sense, as it's untouched. * CortexA72 builds improve over A57 on A72 hardware, even if they're using the same includes due to new compiler tunning in the makefile.
6 years ago
Simplifying ARMv8 build parameters ARMv8 builds were a bit mixed up, with ThunderX2 code in ARMv8 mode (which is not right because TX2 is ARMv8.1) as well as requiring a few redundancies in the defines, making it harder to maintain and understand what core has what. A few other minor issues were also fixed. Tests were made on the following cores: A53, A57, A72, Falkor, ThunderX, ThunderX2, and XGene. Tests were: OpenBLAS/test, OpenBLAS/benchmark, BLAS-Tester. A summary: * Removed TX2 code from ARMv8 build, to make sure it is compatible with all ARMv8 cores, not just v8.1. Also, the TX2 code has actually harmed performance on big cores. * Commoned up ARMv8 architectures' defines in params.h, to make sure that all will benefit from ARMv8 settings, in addition to their own. * Adding a few more cores, using ARMv8's include strategy, to benefit from compiler optimisations using mtune. Also updated cache information from the manuals, making sure we set good conservative values by default. Removed Vulcan, as it's an alias to TX2. * Auto-detecting most of those cores, but also updating the forced compilation in getarch.c, to make sure the parameters are the same whether compiled natively or forced arch. Benefits: * ARMv8 build is now guaranteed to work on all ARMv8 cores * Improved performance for ARMv8 builds on some cores (A72, Falkor, ThunderX1 and 2: up to 11%) over current develop * Improved performance for *all* cores comparing to develop branch before TX2's patch (9% ~ 36%) * ThunderX1 builds are 14% faster than ARMv8 on TX1, 9% faster than current develop's branch and 8% faster than deveop before tx2 patches Issues: * Regression from current develop branch for A53 (-12%) and A57 (-3%) with ARMv8 builds, but still faster than before TX2's commit (+15% and +24% respectively). This can be improved with a simplification of TX2's code, to be done in future patches. At least the code is guaranteed to be ARMv8.0 now. 
Comments: * CortexA57 builds are unchanged on A57 hardware from develop's branch, which makes sense, as it's untouched. * CortexA72 builds improve over A57 on A72 hardware, even if they're using the same includes due to new compiler tunning in the makefile.
6 years ago
Simplifying ARMv8 build parameters ARMv8 builds were a bit mixed up, with ThunderX2 code in ARMv8 mode (which is not right because TX2 is ARMv8.1) as well as requiring a few redundancies in the defines, making it harder to maintain and understand what core has what. A few other minor issues were also fixed. Tests were made on the following cores: A53, A57, A72, Falkor, ThunderX, ThunderX2, and XGene. Tests were: OpenBLAS/test, OpenBLAS/benchmark, BLAS-Tester. A summary: * Removed TX2 code from ARMv8 build, to make sure it is compatible with all ARMv8 cores, not just v8.1. Also, the TX2 code has actually harmed performance on big cores. * Commoned up ARMv8 architectures' defines in params.h, to make sure that all will benefit from ARMv8 settings, in addition to their own. * Adding a few more cores, using ARMv8's include strategy, to benefit from compiler optimisations using mtune. Also updated cache information from the manuals, making sure we set good conservative values by default. Removed Vulcan, as it's an alias to TX2. * Auto-detecting most of those cores, but also updating the forced compilation in getarch.c, to make sure the parameters are the same whether compiled natively or forced arch. Benefits: * ARMv8 build is now guaranteed to work on all ARMv8 cores * Improved performance for ARMv8 builds on some cores (A72, Falkor, ThunderX1 and 2: up to 11%) over current develop * Improved performance for *all* cores comparing to develop branch before TX2's patch (9% ~ 36%) * ThunderX1 builds are 14% faster than ARMv8 on TX1, 9% faster than current develop's branch and 8% faster than deveop before tx2 patches Issues: * Regression from current develop branch for A53 (-12%) and A57 (-3%) with ARMv8 builds, but still faster than before TX2's commit (+15% and +24% respectively). This can be improved with a simplification of TX2's code, to be done in future patches. At least the code is guaranteed to be ARMv8.0 now. 
Comments: * CortexA57 builds are unchanged on A57 hardware from develop's branch, which makes sense, as it's untouched. * CortexA72 builds improve over A57 on A72 hardware, even if they're using the same includes due to new compiler tunning in the makefile.
6 years ago
Simplifying ARMv8 build parameters ARMv8 builds were a bit mixed up, with ThunderX2 code in ARMv8 mode (which is not right because TX2 is ARMv8.1) as well as requiring a few redundancies in the defines, making it harder to maintain and understand what core has what. A few other minor issues were also fixed. Tests were made on the following cores: A53, A57, A72, Falkor, ThunderX, ThunderX2, and XGene. Tests were: OpenBLAS/test, OpenBLAS/benchmark, BLAS-Tester. A summary: * Removed TX2 code from ARMv8 build, to make sure it is compatible with all ARMv8 cores, not just v8.1. Also, the TX2 code has actually harmed performance on big cores. * Commoned up ARMv8 architectures' defines in params.h, to make sure that all will benefit from ARMv8 settings, in addition to their own. * Adding a few more cores, using ARMv8's include strategy, to benefit from compiler optimisations using mtune. Also updated cache information from the manuals, making sure we set good conservative values by default. Removed Vulcan, as it's an alias to TX2. * Auto-detecting most of those cores, but also updating the forced compilation in getarch.c, to make sure the parameters are the same whether compiled natively or forced arch. Benefits: * ARMv8 build is now guaranteed to work on all ARMv8 cores * Improved performance for ARMv8 builds on some cores (A72, Falkor, ThunderX1 and 2: up to 11%) over current develop * Improved performance for *all* cores comparing to develop branch before TX2's patch (9% ~ 36%) * ThunderX1 builds are 14% faster than ARMv8 on TX1, 9% faster than current develop's branch and 8% faster than deveop before tx2 patches Issues: * Regression from current develop branch for A53 (-12%) and A57 (-3%) with ARMv8 builds, but still faster than before TX2's commit (+15% and +24% respectively). This can be improved with a simplification of TX2's code, to be done in future patches. At least the code is guaranteed to be ARMv8.0 now. 
Comments: * CortexA57 builds are unchanged on A57 hardware from develop's branch, which makes sense, as it's untouched. * CortexA72 builds improve over A57 on A72 hardware, even if they're using the same includes due to new compiler tunning in the makefile.
/*****************************************************************************
Copyright (c) 2011-2023, The OpenBLAS Project
All rights reserved.
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are
met:
1. Redistributions of source code must retain the above copyright
notice, this list of conditions and the following disclaimer.
2. Redistributions in binary form must reproduce the above copyright
notice, this list of conditions and the following disclaimer in
the documentation and/or other materials provided with the
distribution.
3. Neither the name of the OpenBLAS project nor the names of
its contributors may be used to endorse or promote products
derived from this software without specific prior written
permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE
LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE
USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
**********************************************************************************/
/*********************************************************************/
/* Copyright 2009, 2010 The University of Texas at Austin. */
/* All rights reserved. */
/* */
/* Redistribution and use in source and binary forms, with or */
/* without modification, are permitted provided that the following */
/* conditions are met: */
/* */
/* 1. Redistributions of source code must retain the above */
/* copyright notice, this list of conditions and the following */
/* disclaimer. */
/* */
/* 2. Redistributions in binary form must reproduce the above */
/* copyright notice, this list of conditions and the following */
/* disclaimer in the documentation and/or other materials */
/* provided with the distribution. */
/* */
/* THIS SOFTWARE IS PROVIDED BY THE UNIVERSITY OF TEXAS AT */
/* AUSTIN ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, */
/* INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF */
/* MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE */
/* DISCLAIMED. IN NO EVENT SHALL THE UNIVERSITY OF TEXAS AT */
/* AUSTIN OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, */
/* INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES */
/* (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE */
/* GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR */
/* BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF */
/* LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT */
/* (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT */
/* OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE */
/* POSSIBILITY OF SUCH DAMAGE. */
/* */
/* The views and conclusions contained in the software and */
/* documentation are those of the authors and should not be */
/* interpreted as representing official policies, either expressed */
/* or implied, of The University of Texas at Austin. */
/*********************************************************************/
#ifndef PARAM_H
#define PARAM_H
#define SHGEMM_DEFAULT_UNROLL_N 8
#define SHGEMM_DEFAULT_UNROLL_M 8
#define SHGEMM_DEFAULT_UNROLL_MN 32
#define SHGEMM_DEFAULT_P 128
#define SHGEMM_DEFAULT_R 240
#define SHGEMM_DEFAULT_Q 12288
#define SBGEMM_DEFAULT_UNROLL_N 4
#define SBGEMM_DEFAULT_UNROLL_M 8
#define SBGEMM_DEFAULT_UNROLL_MN 32
#define SBGEMM_DEFAULT_P 256
#define SBGEMM_DEFAULT_R 256
#define SBGEMM_DEFAULT_Q 256
#define SBGEMM_ALIGN_K 1 // must be 2^x
#ifdef OPTERON
#define SNUMOPT 4
#define DNUMOPT 2
#define GEMM_DEFAULT_OFFSET_A 64
#define GEMM_DEFAULT_OFFSET_B 256
#define GEMM_DEFAULT_ALIGN (BLASLONG)0x01ffffUL
#define SGEMM_DEFAULT_UNROLL_N 4
#define DGEMM_DEFAULT_UNROLL_N 4
#define QGEMM_DEFAULT_UNROLL_N 2
#define CGEMM_DEFAULT_UNROLL_N 2
#define ZGEMM_DEFAULT_UNROLL_N 2
#define XGEMM_DEFAULT_UNROLL_N 1
#ifdef ARCH_X86
#define SGEMM_DEFAULT_UNROLL_M 4
#define DGEMM_DEFAULT_UNROLL_M 2
#define QGEMM_DEFAULT_UNROLL_M 2
#define CGEMM_DEFAULT_UNROLL_M 2
#define ZGEMM_DEFAULT_UNROLL_M 1
#define XGEMM_DEFAULT_UNROLL_M 1
#else
#define SGEMM_DEFAULT_UNROLL_M 8
#define DGEMM_DEFAULT_UNROLL_M 4
#define QGEMM_DEFAULT_UNROLL_M 2
#define CGEMM_DEFAULT_UNROLL_M 4
#define ZGEMM_DEFAULT_UNROLL_M 2
#define XGEMM_DEFAULT_UNROLL_M 1
#endif
#define SGEMM_DEFAULT_P sgemm_p
#define DGEMM_DEFAULT_P dgemm_p
#define QGEMM_DEFAULT_P qgemm_p
#define CGEMM_DEFAULT_P cgemm_p
#define ZGEMM_DEFAULT_P zgemm_p
#define XGEMM_DEFAULT_P xgemm_p
#define SGEMM_DEFAULT_R sgemm_r
#define DGEMM_DEFAULT_R dgemm_r
#define QGEMM_DEFAULT_R qgemm_r
#define CGEMM_DEFAULT_R cgemm_r
#define ZGEMM_DEFAULT_R zgemm_r
#define XGEMM_DEFAULT_R xgemm_r
#ifdef ALLOC_HUGETLB
#define SGEMM_DEFAULT_Q 248
#define DGEMM_DEFAULT_Q 248
#define QGEMM_DEFAULT_Q 248
#define CGEMM_DEFAULT_Q 248
#define ZGEMM_DEFAULT_Q 248
#define XGEMM_DEFAULT_Q 248
#else
#define SGEMM_DEFAULT_Q 240
#define DGEMM_DEFAULT_Q 240
#define QGEMM_DEFAULT_Q 240
#define CGEMM_DEFAULT_Q 240
#define ZGEMM_DEFAULT_Q 240
#define XGEMM_DEFAULT_Q 240
#endif
#define SYMV_P 16
#define HAVE_EXCLUSIVE_CACHE
#endif
#if defined(BARCELONA) || defined(SHANGHAI) || defined(BOBCAT)
#define SNUMOPT 8
#define DNUMOPT 4
#define GEMM_DEFAULT_OFFSET_A 64
#define GEMM_DEFAULT_OFFSET_B 832
#define GEMM_DEFAULT_ALIGN (BLASLONG)0x0fffUL
#define SGEMM_DEFAULT_UNROLL_N 4
#define DGEMM_DEFAULT_UNROLL_N 4
#define QGEMM_DEFAULT_UNROLL_N 2
#define CGEMM_DEFAULT_UNROLL_N 2
#define ZGEMM_DEFAULT_UNROLL_N 2
#define XGEMM_DEFAULT_UNROLL_N 1
#ifdef ARCH_X86
#define SGEMM_DEFAULT_UNROLL_M 4
#define DGEMM_DEFAULT_UNROLL_M 2
#define QGEMM_DEFAULT_UNROLL_M 2
#define CGEMM_DEFAULT_UNROLL_M 2
#define ZGEMM_DEFAULT_UNROLL_M 1
#define XGEMM_DEFAULT_UNROLL_M 1
#else
#define SGEMM_DEFAULT_UNROLL_M 8
#define DGEMM_DEFAULT_UNROLL_M 4
#define QGEMM_DEFAULT_UNROLL_M 2
#define CGEMM_DEFAULT_UNROLL_M 4
#define ZGEMM_DEFAULT_UNROLL_M 2
#define XGEMM_DEFAULT_UNROLL_M 1
#endif
#if 0
#define SGEMM_DEFAULT_P 496
#define DGEMM_DEFAULT_P 248
#define QGEMM_DEFAULT_P 124
#define CGEMM_DEFAULT_P 248
#define ZGEMM_DEFAULT_P 124
#define XGEMM_DEFAULT_P 62
#define SGEMM_DEFAULT_Q 248
#define DGEMM_DEFAULT_Q 248
#define QGEMM_DEFAULT_Q 248
#define CGEMM_DEFAULT_Q 248
#define ZGEMM_DEFAULT_Q 248
#define XGEMM_DEFAULT_Q 248
#else
#define SGEMM_DEFAULT_P 448
#define DGEMM_DEFAULT_P 224
#define QGEMM_DEFAULT_P 112
#define CGEMM_DEFAULT_P 224
#define ZGEMM_DEFAULT_P 112
#define XGEMM_DEFAULT_P 56
#define SGEMM_DEFAULT_Q 224
#define DGEMM_DEFAULT_Q 224
#define QGEMM_DEFAULT_Q 224
#define CGEMM_DEFAULT_Q 224
#define ZGEMM_DEFAULT_Q 224
#define XGEMM_DEFAULT_Q 224
#endif
#define SGEMM_DEFAULT_R sgemm_r
#define QGEMM_DEFAULT_R qgemm_r
#define DGEMM_DEFAULT_R dgemm_r
#define CGEMM_DEFAULT_R cgemm_r
#define ZGEMM_DEFAULT_R zgemm_r
#define XGEMM_DEFAULT_R xgemm_r
#define SYMV_P 16
#define HAVE_EXCLUSIVE_CACHE
#define GEMM_THREAD gemm_thread_mn
#endif
#ifdef BULLDOZER
#define SNUMOPT 8
#define DNUMOPT 4
#define GEMM_DEFAULT_OFFSET_A 64
#define GEMM_DEFAULT_OFFSET_B 832
#define GEMM_DEFAULT_ALIGN (BLASLONG)0x0fffUL
#define QGEMM_DEFAULT_UNROLL_N 2
#define CGEMM_DEFAULT_UNROLL_N 2
#define ZGEMM_DEFAULT_UNROLL_N 2
#define XGEMM_DEFAULT_UNROLL_N 1
#ifdef ARCH_X86
#define SGEMM_DEFAULT_UNROLL_N 4
#define DGEMM_DEFAULT_UNROLL_N 4
#define SGEMM_DEFAULT_UNROLL_M 4
#define DGEMM_DEFAULT_UNROLL_M 2
#define QGEMM_DEFAULT_UNROLL_M 2
#define CGEMM_DEFAULT_UNROLL_M 2
#define ZGEMM_DEFAULT_UNROLL_M 1
#define XGEMM_DEFAULT_UNROLL_M 1
#else
#define SGEMM_DEFAULT_UNROLL_N 2
#define DGEMM_DEFAULT_UNROLL_N 2
#define SGEMM_DEFAULT_UNROLL_M 16
#define DGEMM_DEFAULT_UNROLL_M 8
#define QGEMM_DEFAULT_UNROLL_M 2
#define CGEMM_DEFAULT_UNROLL_M 4
#define ZGEMM_DEFAULT_UNROLL_M 2
#define XGEMM_DEFAULT_UNROLL_M 1
#define CGEMM3M_DEFAULT_UNROLL_N 4
#define CGEMM3M_DEFAULT_UNROLL_M 8
#define ZGEMM3M_DEFAULT_UNROLL_N 4
#define ZGEMM3M_DEFAULT_UNROLL_M 4
#define DGEMM_DEFAULT_UNROLL_MN 16
#define GEMV_UNROLL 8
#endif
#if defined(ARCH_X86_64)
#define SGEMM_DEFAULT_P 768
#define DGEMM_DEFAULT_P 384
#else
#define SGEMM_DEFAULT_P 448
#define DGEMM_DEFAULT_P 224
#endif
#define QGEMM_DEFAULT_P 112
#define CGEMM_DEFAULT_P 224
#define ZGEMM_DEFAULT_P 112
#define XGEMM_DEFAULT_P 56
#if defined(ARCH_X86_64)
#define SGEMM_DEFAULT_Q 168
#define DGEMM_DEFAULT_Q 168
#else
#define SGEMM_DEFAULT_Q 224
#define DGEMM_DEFAULT_Q 224
#endif
#define QGEMM_DEFAULT_Q 224
#define CGEMM_DEFAULT_Q 224
#define ZGEMM_DEFAULT_Q 224
#define XGEMM_DEFAULT_Q 224
#define CGEMM3M_DEFAULT_P 448
#define ZGEMM3M_DEFAULT_P 224
#define XGEMM3M_DEFAULT_P 112
#define CGEMM3M_DEFAULT_Q 224
#define ZGEMM3M_DEFAULT_Q 224
#define XGEMM3M_DEFAULT_Q 224
#define CGEMM3M_DEFAULT_R 12288
#define ZGEMM3M_DEFAULT_R 12288
#define XGEMM3M_DEFAULT_R 12288
#define SGEMM_DEFAULT_R sgemm_r
#define QGEMM_DEFAULT_R qgemm_r
#define DGEMM_DEFAULT_R dgemm_r
#define CGEMM_DEFAULT_R cgemm_r
#define ZGEMM_DEFAULT_R zgemm_r
#define XGEMM_DEFAULT_R xgemm_r
#define SYMV_P 16
#define HAVE_EXCLUSIVE_CACHE
#define GEMM_THREAD gemm_thread_mn
#endif
#ifdef PILEDRIVER
#define SNUMOPT 8
#define DNUMOPT 4
#define GEMM_DEFAULT_OFFSET_A 64
#define GEMM_DEFAULT_OFFSET_B 832
#define GEMM_DEFAULT_ALIGN (BLASLONG)0x0fffUL
#define QGEMM_DEFAULT_UNROLL_N 2
#define CGEMM_DEFAULT_UNROLL_N 2
#define ZGEMM_DEFAULT_UNROLL_N 2
#define XGEMM_DEFAULT_UNROLL_N 1
#ifdef ARCH_X86
#define SGEMM_DEFAULT_UNROLL_N 4
#define DGEMM_DEFAULT_UNROLL_N 4
#define SGEMM_DEFAULT_UNROLL_M 4
#define DGEMM_DEFAULT_UNROLL_M 2
#define QGEMM_DEFAULT_UNROLL_M 2
#define CGEMM_DEFAULT_UNROLL_M 2
#define ZGEMM_DEFAULT_UNROLL_M 1
#define XGEMM_DEFAULT_UNROLL_M 1
#else
#define SGEMM_DEFAULT_UNROLL_N 2
#define DGEMM_DEFAULT_UNROLL_N 2
#define SGEMM_DEFAULT_UNROLL_M 16
#define DGEMM_DEFAULT_UNROLL_M 8
#define QGEMM_DEFAULT_UNROLL_M 2
#define CGEMM_DEFAULT_UNROLL_M 4
#define ZGEMM_DEFAULT_UNROLL_M 2
#define XGEMM_DEFAULT_UNROLL_M 1
#define CGEMM3M_DEFAULT_UNROLL_N 4
#define CGEMM3M_DEFAULT_UNROLL_M 8
#define ZGEMM3M_DEFAULT_UNROLL_N 4
#define ZGEMM3M_DEFAULT_UNROLL_M 4
#define GEMV_UNROLL 8
#endif
#if defined(ARCH_X86_64)
#define SGEMM_DEFAULT_P 768
#define DGEMM_DEFAULT_P 768
#define ZGEMM_DEFAULT_P 384
#define CGEMM_DEFAULT_P 768
#else
#define SGEMM_DEFAULT_P 448
#define DGEMM_DEFAULT_P 480
#define ZGEMM_DEFAULT_P 112
#define CGEMM_DEFAULT_P 224
#endif
#define QGEMM_DEFAULT_P 112
#define XGEMM_DEFAULT_P 56
#if defined(ARCH_X86_64)
#define SGEMM_DEFAULT_Q 192
#define DGEMM_DEFAULT_Q 168
#define ZGEMM_DEFAULT_Q 168
#define CGEMM_DEFAULT_Q 168
#else
#define SGEMM_DEFAULT_Q 224
#define DGEMM_DEFAULT_Q 224
#define ZGEMM_DEFAULT_Q 224
#define CGEMM_DEFAULT_Q 224
#endif
#define QGEMM_DEFAULT_Q 224
#define XGEMM_DEFAULT_Q 224
#define CGEMM3M_DEFAULT_P 448
#define ZGEMM3M_DEFAULT_P 224
#define XGEMM3M_DEFAULT_P 112
#define CGEMM3M_DEFAULT_Q 224
#define ZGEMM3M_DEFAULT_Q 224
#define XGEMM3M_DEFAULT_Q 224
#define CGEMM3M_DEFAULT_R 12288
#define ZGEMM3M_DEFAULT_R 12288
#define XGEMM3M_DEFAULT_R 12288
#define SGEMM_DEFAULT_R 12288
#define QGEMM_DEFAULT_R qgemm_r
#define DGEMM_DEFAULT_R 12288
#define CGEMM_DEFAULT_R cgemm_r
#define ZGEMM_DEFAULT_R zgemm_r
#define XGEMM_DEFAULT_R xgemm_r
#define SYMV_P 16
#define HAVE_EXCLUSIVE_CACHE
#define GEMM_THREAD gemm_thread_mn
#endif
#ifdef STEAMROLLER
#define SNUMOPT 8
#define DNUMOPT 4
#define GEMM_DEFAULT_OFFSET_A 64
#define GEMM_DEFAULT_OFFSET_B 832
#define GEMM_DEFAULT_ALIGN (BLASLONG)0x0fffUL
#define QGEMM_DEFAULT_UNROLL_N 2
#define CGEMM_DEFAULT_UNROLL_N 2
#define ZGEMM_DEFAULT_UNROLL_N 2
#define XGEMM_DEFAULT_UNROLL_N 1
#ifdef ARCH_X86
#define SGEMM_DEFAULT_UNROLL_N 4
#define DGEMM_DEFAULT_UNROLL_N 4
#define SGEMM_DEFAULT_UNROLL_M 4
#define DGEMM_DEFAULT_UNROLL_M 2
#define QGEMM_DEFAULT_UNROLL_M 2
#define CGEMM_DEFAULT_UNROLL_M 2
#define ZGEMM_DEFAULT_UNROLL_M 1
#define XGEMM_DEFAULT_UNROLL_M 1
#else
#define SGEMM_DEFAULT_UNROLL_N 2
#define DGEMM_DEFAULT_UNROLL_N 2
#define SGEMM_DEFAULT_UNROLL_M 16
#define DGEMM_DEFAULT_UNROLL_M 8
#define QGEMM_DEFAULT_UNROLL_M 2
#define CGEMM_DEFAULT_UNROLL_M 4
#define ZGEMM_DEFAULT_UNROLL_M 2
#define XGEMM_DEFAULT_UNROLL_M 1
#define CGEMM3M_DEFAULT_UNROLL_N 4
#define CGEMM3M_DEFAULT_UNROLL_M 8
#define ZGEMM3M_DEFAULT_UNROLL_N 4
#define ZGEMM3M_DEFAULT_UNROLL_M 4
#define GEMV_UNROLL 8
#endif
#if defined(ARCH_X86_64)
#define SGEMM_DEFAULT_P 768
#define DGEMM_DEFAULT_P 576
#define ZGEMM_DEFAULT_P 288
#define CGEMM_DEFAULT_P 576
#else
#define SGEMM_DEFAULT_P 448
#define DGEMM_DEFAULT_P 480
#define ZGEMM_DEFAULT_P 112
#define CGEMM_DEFAULT_P 224
#endif
#define QGEMM_DEFAULT_P 112
#define XGEMM_DEFAULT_P 56
#if defined(ARCH_X86_64)
#define SGEMM_DEFAULT_Q 192
#define DGEMM_DEFAULT_Q 160
#define ZGEMM_DEFAULT_Q 160
#define CGEMM_DEFAULT_Q 160
#else
#define SGEMM_DEFAULT_Q 224
#define DGEMM_DEFAULT_Q 224
#define ZGEMM_DEFAULT_Q 224
#define CGEMM_DEFAULT_Q 224
#endif
#define QGEMM_DEFAULT_Q 224
#define XGEMM_DEFAULT_Q 224
#define CGEMM3M_DEFAULT_P 448
#define ZGEMM3M_DEFAULT_P 224
#define XGEMM3M_DEFAULT_P 112
#define CGEMM3M_DEFAULT_Q 224
#define ZGEMM3M_DEFAULT_Q 224
#define XGEMM3M_DEFAULT_Q 224
#define CGEMM3M_DEFAULT_R 12288
#define ZGEMM3M_DEFAULT_R 12288
#define XGEMM3M_DEFAULT_R 12288
#define SGEMM_DEFAULT_R 12288
#define QGEMM_DEFAULT_R qgemm_r
#define DGEMM_DEFAULT_R 12288
#define CGEMM_DEFAULT_R cgemm_r
#define ZGEMM_DEFAULT_R zgemm_r
#define XGEMM_DEFAULT_R xgemm_r
#define SYMV_P 16
#define HAVE_EXCLUSIVE_CACHE
#define GEMM_THREAD gemm_thread_mn
#endif
#ifdef EXCAVATOR
#define SNUMOPT 8
#define DNUMOPT 4
#define GEMM_DEFAULT_OFFSET_A 64
#define GEMM_DEFAULT_OFFSET_B 832
#define GEMM_DEFAULT_ALIGN (BLASLONG)0x0fffUL
#define QGEMM_DEFAULT_UNROLL_N 2
#define CGEMM_DEFAULT_UNROLL_N 2
#define ZGEMM_DEFAULT_UNROLL_N 2
#define XGEMM_DEFAULT_UNROLL_N 1
#ifdef ARCH_X86
#define SGEMM_DEFAULT_UNROLL_N 4
#define DGEMM_DEFAULT_UNROLL_N 4
#define SGEMM_DEFAULT_UNROLL_M 4
#define DGEMM_DEFAULT_UNROLL_M 2
#define QGEMM_DEFAULT_UNROLL_M 2
#define CGEMM_DEFAULT_UNROLL_M 2
#define ZGEMM_DEFAULT_UNROLL_M 1
#define XGEMM_DEFAULT_UNROLL_M 1
#else
#define SGEMM_DEFAULT_UNROLL_N 2
#define DGEMM_DEFAULT_UNROLL_N 2
#define SGEMM_DEFAULT_UNROLL_M 16
#define DGEMM_DEFAULT_UNROLL_M 8
#define QGEMM_DEFAULT_UNROLL_M 2
#define CGEMM_DEFAULT_UNROLL_M 4
#define ZGEMM_DEFAULT_UNROLL_M 2
#define XGEMM_DEFAULT_UNROLL_M 1
#define CGEMM3M_DEFAULT_UNROLL_N 4
#define CGEMM3M_DEFAULT_UNROLL_M 8
#define ZGEMM3M_DEFAULT_UNROLL_N 4
#define ZGEMM3M_DEFAULT_UNROLL_M 4
#define GEMV_UNROLL 8
#endif
#if defined(ARCH_X86_64)
#define SGEMM_DEFAULT_P 768
#define DGEMM_DEFAULT_P 576
#define ZGEMM_DEFAULT_P 288
#define CGEMM_DEFAULT_P 576
#else
#define SGEMM_DEFAULT_P 448
#define DGEMM_DEFAULT_P 480
#define ZGEMM_DEFAULT_P 112
#define CGEMM_DEFAULT_P 224
#endif
#define QGEMM_DEFAULT_P 112
#define XGEMM_DEFAULT_P 56
#if defined(ARCH_X86_64)
#define SGEMM_DEFAULT_Q 192
#define DGEMM_DEFAULT_Q 160
#define ZGEMM_DEFAULT_Q 160
#define CGEMM_DEFAULT_Q 160
#else
#define SGEMM_DEFAULT_Q 224
#define DGEMM_DEFAULT_Q 224
#define ZGEMM_DEFAULT_Q 224
#define CGEMM_DEFAULT_Q 224
#endif
#define QGEMM_DEFAULT_Q 224
#define XGEMM_DEFAULT_Q 224
#define CGEMM3M_DEFAULT_P 448
#define ZGEMM3M_DEFAULT_P 224
#define XGEMM3M_DEFAULT_P 112
#define CGEMM3M_DEFAULT_Q 224
#define ZGEMM3M_DEFAULT_Q 224
#define XGEMM3M_DEFAULT_Q 224
#define CGEMM3M_DEFAULT_R 12288
#define ZGEMM3M_DEFAULT_R 12288
#define XGEMM3M_DEFAULT_R 12288
#define SGEMM_DEFAULT_R 12288
#define QGEMM_DEFAULT_R qgemm_r
#define DGEMM_DEFAULT_R 12288
#define CGEMM_DEFAULT_R cgemm_r
#define ZGEMM_DEFAULT_R zgemm_r
#define XGEMM_DEFAULT_R xgemm_r
#define SYMV_P 16
#define HAVE_EXCLUSIVE_CACHE
#define GEMM_THREAD gemm_thread_mn
#endif
#ifdef ZEN
#define SNUMOPT 16
#define DNUMOPT 8
#define GEMM_DEFAULT_OFFSET_A 0
#define GEMM_DEFAULT_OFFSET_B 0
#define GEMM_DEFAULT_ALIGN (BLASLONG)0x03fffUL
#define SYMV_P 8
#if defined(XDOUBLE) || defined(DOUBLE)
#define SWITCH_RATIO 4
#define GEMM_PREFERED_SIZE 4
#else
#define SWITCH_RATIO 8
#define GEMM_PREFERED_SIZE 8
#endif
#ifdef ARCH_X86
#define SGEMM_DEFAULT_UNROLL_M 4
#define DGEMM_DEFAULT_UNROLL_M 2
#define QGEMM_DEFAULT_UNROLL_M 2
#define CGEMM_DEFAULT_UNROLL_M 2
#define ZGEMM_DEFAULT_UNROLL_M 1
#define XGEMM_DEFAULT_UNROLL_M 1
#define SGEMM_DEFAULT_UNROLL_N 4
#define DGEMM_DEFAULT_UNROLL_N 4
#define QGEMM_DEFAULT_UNROLL_N 2
#define CGEMM_DEFAULT_UNROLL_N 2
#define ZGEMM_DEFAULT_UNROLL_N 2
#define XGEMM_DEFAULT_UNROLL_N 1
#else
#define SGEMM_DEFAULT_UNROLL_M 8
#define DGEMM_DEFAULT_UNROLL_M 4
#define QGEMM_DEFAULT_UNROLL_M 2
#define CGEMM_DEFAULT_UNROLL_M 8
#define ZGEMM_DEFAULT_UNROLL_M 4
#define XGEMM_DEFAULT_UNROLL_M 1
#define SGEMM_DEFAULT_UNROLL_N 4
#define DGEMM_DEFAULT_UNROLL_N 8
#define QGEMM_DEFAULT_UNROLL_N 2
#define CGEMM_DEFAULT_UNROLL_N 2
#define ZGEMM_DEFAULT_UNROLL_N 2
#define XGEMM_DEFAULT_UNROLL_N 1
/*
#define SGEMM_DEFAULT_UNROLL_MN 32
#define DGEMM_DEFAULT_UNROLL_MN 32
*/
#endif
#ifdef ARCH_X86
#define SGEMM_DEFAULT_P 512
#define SGEMM_DEFAULT_R sgemm_r
#define DGEMM_DEFAULT_P 512
#define DGEMM_DEFAULT_R dgemm_r
#define QGEMM_DEFAULT_P 504
#define QGEMM_DEFAULT_R qgemm_r
#define CGEMM_DEFAULT_P 128
#define CGEMM_DEFAULT_R 1024
#define ZGEMM_DEFAULT_P 512
#define ZGEMM_DEFAULT_R zgemm_r
#define XGEMM_DEFAULT_P 252
#define XGEMM_DEFAULT_R xgemm_r
#define SGEMM_DEFAULT_Q 256
#define DGEMM_DEFAULT_Q 256
#define QGEMM_DEFAULT_Q 128
#define CGEMM_DEFAULT_Q 256
#define ZGEMM_DEFAULT_Q 192
#define XGEMM_DEFAULT_Q 128
#else
#define SGEMM_DEFAULT_P 320
#define DGEMM_DEFAULT_P 512
#define CGEMM_DEFAULT_P 256
#define ZGEMM_DEFAULT_P 192
#ifdef WINDOWS_ABI
#define SGEMM_DEFAULT_Q 320
#define DGEMM_DEFAULT_Q 128
#else
#define SGEMM_DEFAULT_Q 320
#define DGEMM_DEFAULT_Q 256
#endif
#define CGEMM_DEFAULT_Q 256
#define ZGEMM_DEFAULT_Q 192
#define SGEMM_DEFAULT_R sgemm_r
#define DGEMM_DEFAULT_R 13824
#define CGEMM_DEFAULT_R cgemm_r
#define ZGEMM_DEFAULT_R zgemm_r
#define QGEMM_DEFAULT_Q 128
#define QGEMM_DEFAULT_P 504
#define QGEMM_DEFAULT_R qgemm_r
#define XGEMM_DEFAULT_P 252
#define XGEMM_DEFAULT_R xgemm_r
#define XGEMM_DEFAULT_Q 128
#define CGEMM3M_DEFAULT_UNROLL_N 4
#define CGEMM3M_DEFAULT_UNROLL_M 8
#define ZGEMM3M_DEFAULT_UNROLL_N 4
#define ZGEMM3M_DEFAULT_UNROLL_M 4
#define CGEMM3M_DEFAULT_P 320
#define ZGEMM3M_DEFAULT_P 256
#define XGEMM3M_DEFAULT_P 112
#define CGEMM3M_DEFAULT_Q 320
#define ZGEMM3M_DEFAULT_Q 256
#define XGEMM3M_DEFAULT_Q 224
#define CGEMM3M_DEFAULT_R 12288
#define ZGEMM3M_DEFAULT_R 12288
#define XGEMM3M_DEFAULT_R 12288
#endif
#endif
#ifdef ATHLON
#define SNUMOPT 4
#define DNUMOPT 2
#define GEMM_DEFAULT_OFFSET_A 0
#define GEMM_DEFAULT_OFFSET_B 384
#define GEMM_DEFAULT_ALIGN (BLASLONG)0x0ffffUL
#define SGEMM_DEFAULT_UNROLL_N 4
#define DGEMM_DEFAULT_UNROLL_N 4
#define QGEMM_DEFAULT_UNROLL_N 2
#define CGEMM_DEFAULT_UNROLL_N 2
#define ZGEMM_DEFAULT_UNROLL_N 2
#define XGEMM_DEFAULT_UNROLL_N 1
#define SGEMM_DEFAULT_UNROLL_M 2
#define DGEMM_DEFAULT_UNROLL_M 1
#define QGEMM_DEFAULT_UNROLL_M 2
#define CGEMM_DEFAULT_UNROLL_M 1
#define ZGEMM_DEFAULT_UNROLL_M 1
#define XGEMM_DEFAULT_UNROLL_M 1
#define SGEMM_DEFAULT_R sgemm_r
#define DGEMM_DEFAULT_R dgemm_r
#define QGEMM_DEFAULT_R qgemm_r
#define CGEMM_DEFAULT_R cgemm_r
#define ZGEMM_DEFAULT_R zgemm_r
#define XGEMM_DEFAULT_R xgemm_r
#define SGEMM_DEFAULT_P 208
#define DGEMM_DEFAULT_P 104
#define QGEMM_DEFAULT_P 56
#define CGEMM_DEFAULT_P 104
#define ZGEMM_DEFAULT_P 56
#define XGEMM_DEFAULT_P 28
#define SGEMM_DEFAULT_Q 208
#define DGEMM_DEFAULT_Q 208
#define QGEMM_DEFAULT_Q 208
#define CGEMM_DEFAULT_Q 208
#define ZGEMM_DEFAULT_Q 208
#define XGEMM_DEFAULT_Q 208
#define SYMV_P 16
#define HAVE_EXCLUSIVE_CACHE
#endif
#ifdef VIAC3
#define SNUMOPT 2
#define DNUMOPT 1
#define GEMM_DEFAULT_OFFSET_A 0
#define GEMM_DEFAULT_OFFSET_B 256
#define GEMM_DEFAULT_ALIGN (BLASLONG)0x0ffffUL
#define SGEMM_DEFAULT_UNROLL_N 4
#define DGEMM_DEFAULT_UNROLL_N 4
#define QGEMM_DEFAULT_UNROLL_N 2
#define CGEMM_DEFAULT_UNROLL_N 2
#define ZGEMM_DEFAULT_UNROLL_N 2
#define XGEMM_DEFAULT_UNROLL_N 1
#define SGEMM_DEFAULT_UNROLL_M 2
#define DGEMM_DEFAULT_UNROLL_M 1
#define QGEMM_DEFAULT_UNROLL_M 2
#define CGEMM_DEFAULT_UNROLL_M 1
#define ZGEMM_DEFAULT_UNROLL_M 1
#define XGEMM_DEFAULT_UNROLL_M 1
#define SGEMM_DEFAULT_R sgemm_r
#define DGEMM_DEFAULT_R dgemm_r
#define QGEMM_DEFAULT_R qgemm_r
#define CGEMM_DEFAULT_R cgemm_r
#define ZGEMM_DEFAULT_R zgemm_r
#define XGEMM_DEFAULT_R xgemm_r
#define SGEMM_DEFAULT_P 128
#define DGEMM_DEFAULT_P 128
#define QGEMM_DEFAULT_P 128
#define CGEMM_DEFAULT_P 128
#define ZGEMM_DEFAULT_P 128
#define XGEMM_DEFAULT_P 128
#define SGEMM_DEFAULT_Q 512
#define DGEMM_DEFAULT_Q 256
#define QGEMM_DEFAULT_Q 256
#define CGEMM_DEFAULT_Q 256
#define ZGEMM_DEFAULT_Q 128
#define XGEMM_DEFAULT_Q 128
#define SYMV_P 16
#endif
#ifdef NANO
#define SNUMOPT 4
#define DNUMOPT 2
#define GEMM_DEFAULT_OFFSET_A 64
#define GEMM_DEFAULT_OFFSET_B 256
#define GEMM_DEFAULT_ALIGN (BLASLONG)0x01ffffUL
#ifdef ARCH_X86
#define SGEMM_DEFAULT_UNROLL_N 4
#define DGEMM_DEFAULT_UNROLL_N 4
#define QGEMM_DEFAULT_UNROLL_N 2
#define CGEMM_DEFAULT_UNROLL_N 2
#define ZGEMM_DEFAULT_UNROLL_N 2
#define XGEMM_DEFAULT_UNROLL_N 1
#define SGEMM_DEFAULT_UNROLL_M 4
#define DGEMM_DEFAULT_UNROLL_M 2
#define QGEMM_DEFAULT_UNROLL_M 2
#define CGEMM_DEFAULT_UNROLL_M 2
#define ZGEMM_DEFAULT_UNROLL_M 1
#define XGEMM_DEFAULT_UNROLL_M 1
#else
#define SGEMM_DEFAULT_UNROLL_N 8
#define DGEMM_DEFAULT_UNROLL_N 4
#define QGEMM_DEFAULT_UNROLL_N 2
#define CGEMM_DEFAULT_UNROLL_N 4
#define ZGEMM_DEFAULT_UNROLL_N 2
#define XGEMM_DEFAULT_UNROLL_N 1
#define SGEMM_DEFAULT_UNROLL_M 4
#define DGEMM_DEFAULT_UNROLL_M 4
#define QGEMM_DEFAULT_UNROLL_M 2
#define CGEMM_DEFAULT_UNROLL_M 2
#define ZGEMM_DEFAULT_UNROLL_M 2
#define XGEMM_DEFAULT_UNROLL_M 1
#endif
#define SGEMM_DEFAULT_P 288
#define DGEMM_DEFAULT_P 288
#define QGEMM_DEFAULT_P 288
#define CGEMM_DEFAULT_P 288
#define ZGEMM_DEFAULT_P 288
#define XGEMM_DEFAULT_P 288
#define SGEMM_DEFAULT_R sgemm_r
#define DGEMM_DEFAULT_R dgemm_r
#define QGEMM_DEFAULT_R qgemm_r
#define CGEMM_DEFAULT_R cgemm_r
#define ZGEMM_DEFAULT_R zgemm_r
#define XGEMM_DEFAULT_R xgemm_r
#define SGEMM_DEFAULT_Q 256
#define DGEMM_DEFAULT_Q 128
#define QGEMM_DEFAULT_Q 64
#define CGEMM_DEFAULT_Q 128
#define ZGEMM_DEFAULT_Q 64
#define XGEMM_DEFAULT_Q 32
#define SYMV_P 16
#define HAVE_EXCLUSIVE_CACHE
#endif
#if defined(PENTIUM) || defined(PENTIUM2) || defined(PENTIUM3)
#ifdef HAVE_SSE
#define SNUMOPT 2
#else
#define SNUMOPT 1
#endif
#define DNUMOPT 1
#define GEMM_DEFAULT_OFFSET_A 0
#define GEMM_DEFAULT_OFFSET_B 0
#define GEMM_DEFAULT_ALIGN (BLASLONG)0x0ffffUL
#ifdef HAVE_SSE
#define SGEMM_DEFAULT_UNROLL_M 8
#define CGEMM_DEFAULT_UNROLL_M 4
#else
#define SGEMM_DEFAULT_UNROLL_M 4
#define CGEMM_DEFAULT_UNROLL_M 2
#endif
#define DGEMM_DEFAULT_UNROLL_M 2
#define SGEMM_DEFAULT_UNROLL_N 2
#define DGEMM_DEFAULT_UNROLL_N 2
#define QGEMM_DEFAULT_UNROLL_M 2
#define QGEMM_DEFAULT_UNROLL_N 2
#define CGEMM_DEFAULT_UNROLL_N 1
#define ZGEMM_DEFAULT_UNROLL_M 1
#define ZGEMM_DEFAULT_UNROLL_N 1
#define XGEMM_DEFAULT_UNROLL_M 1
#define XGEMM_DEFAULT_UNROLL_N 1
#define SGEMM_DEFAULT_P sgemm_p
#define SGEMM_DEFAULT_Q 256
#define SGEMM_DEFAULT_R sgemm_r
#define DGEMM_DEFAULT_P dgemm_p
#define DGEMM_DEFAULT_Q 256
#define DGEMM_DEFAULT_R dgemm_r
#define QGEMM_DEFAULT_P qgemm_p
#define QGEMM_DEFAULT_Q 256
#define QGEMM_DEFAULT_R qgemm_r
#define CGEMM_DEFAULT_P cgemm_p
#define CGEMM_DEFAULT_Q 256
#define CGEMM_DEFAULT_R cgemm_r
#define ZGEMM_DEFAULT_P zgemm_p
#define ZGEMM_DEFAULT_Q 256
#define ZGEMM_DEFAULT_R zgemm_r
#define XGEMM_DEFAULT_P xgemm_p
#define XGEMM_DEFAULT_Q 256
#define XGEMM_DEFAULT_R xgemm_r
#define SYMV_P 4
#endif
#ifdef PENTIUMM
#define SNUMOPT 2
#define DNUMOPT 1
#define GEMM_DEFAULT_OFFSET_A 0
#define GEMM_DEFAULT_OFFSET_B 0
#define GEMM_DEFAULT_ALIGN (BLASLONG)0x0ffffUL
#ifdef CORE_YONAH
#define SGEMM_DEFAULT_UNROLL_M 4
#define SGEMM_DEFAULT_UNROLL_N 4
#define DGEMM_DEFAULT_UNROLL_M 2
#define DGEMM_DEFAULT_UNROLL_N 4
#define QGEMM_DEFAULT_UNROLL_M 2
#define QGEMM_DEFAULT_UNROLL_N 2
#define CGEMM_DEFAULT_UNROLL_M 2
#define CGEMM_DEFAULT_UNROLL_N 2
#define ZGEMM_DEFAULT_UNROLL_M 1
#define ZGEMM_DEFAULT_UNROLL_N 2
#define XGEMM_DEFAULT_UNROLL_M 1
#define XGEMM_DEFAULT_UNROLL_N 1
#else
#define SGEMM_DEFAULT_UNROLL_M 8
#define SGEMM_DEFAULT_UNROLL_N 2
#define DGEMM_DEFAULT_UNROLL_M 2
#define DGEMM_DEFAULT_UNROLL_N 2
#define QGEMM_DEFAULT_UNROLL_M 2
#define QGEMM_DEFAULT_UNROLL_N 2
#define CGEMM_DEFAULT_UNROLL_M 4
#define CGEMM_DEFAULT_UNROLL_N 1
#define ZGEMM_DEFAULT_UNROLL_M 1
#define ZGEMM_DEFAULT_UNROLL_N 1
#define XGEMM_DEFAULT_UNROLL_M 1
#define XGEMM_DEFAULT_UNROLL_N 1
#endif
#define SGEMM_DEFAULT_P sgemm_p
#define SGEMM_DEFAULT_Q 256
#define SGEMM_DEFAULT_R sgemm_r
#define DGEMM_DEFAULT_P dgemm_p
#define DGEMM_DEFAULT_Q 256
#define DGEMM_DEFAULT_R dgemm_r
#define QGEMM_DEFAULT_P qgemm_p
#define QGEMM_DEFAULT_Q 256
#define QGEMM_DEFAULT_R qgemm_r
#define CGEMM_DEFAULT_P cgemm_p
#define CGEMM_DEFAULT_Q 256
#define CGEMM_DEFAULT_R cgemm_r
#define ZGEMM_DEFAULT_P zgemm_p
#define ZGEMM_DEFAULT_Q 256
#define ZGEMM_DEFAULT_R zgemm_r
#define XGEMM_DEFAULT_P xgemm_p
#define XGEMM_DEFAULT_Q 256
#define XGEMM_DEFAULT_R xgemm_r
#define SYMV_P 4
#endif
#ifdef CORE_NORTHWOOD
#define SNUMOPT 4
#define DNUMOPT 2
#define GEMM_DEFAULT_OFFSET_A 0
#define GEMM_DEFAULT_OFFSET_B 32
#define GEMM_DEFAULT_ALIGN (BLASLONG)0x0ffffUL
#define SYMV_P 8
#define SGEMM_DEFAULT_UNROLL_M 8
#define DGEMM_DEFAULT_UNROLL_M 4
#define QGEMM_DEFAULT_UNROLL_M 2
#define CGEMM_DEFAULT_UNROLL_M 4
#define ZGEMM_DEFAULT_UNROLL_M 2
#define XGEMM_DEFAULT_UNROLL_M 1
#define SGEMM_DEFAULT_UNROLL_N 2
#define DGEMM_DEFAULT_UNROLL_N 2
#define QGEMM_DEFAULT_UNROLL_N 2
#define CGEMM_DEFAULT_UNROLL_N 1
#define ZGEMM_DEFAULT_UNROLL_N 1
#define XGEMM_DEFAULT_UNROLL_N 1
#define SGEMM_DEFAULT_P sgemm_p
#define SGEMM_DEFAULT_R sgemm_r
#define DGEMM_DEFAULT_P dgemm_p
#define DGEMM_DEFAULT_R dgemm_r
#define QGEMM_DEFAULT_P qgemm_p
#define QGEMM_DEFAULT_R qgemm_r
#define CGEMM_DEFAULT_P cgemm_p
#define CGEMM_DEFAULT_R cgemm_r
#define ZGEMM_DEFAULT_P zgemm_p
#define ZGEMM_DEFAULT_R zgemm_r
#define XGEMM_DEFAULT_P xgemm_p
#define XGEMM_DEFAULT_R xgemm_r
#define SGEMM_DEFAULT_Q 128
#define DGEMM_DEFAULT_Q 128
#define QGEMM_DEFAULT_Q 128
#define CGEMM_DEFAULT_Q 128
#define ZGEMM_DEFAULT_Q 128
#define XGEMM_DEFAULT_Q 128
#endif
#ifdef CORE_PRESCOTT
#define SNUMOPT 4
#define DNUMOPT 2
#ifndef __64BIT__
#define GEMM_DEFAULT_OFFSET_A 128
#define GEMM_DEFAULT_OFFSET_B 192
#else
#define GEMM_DEFAULT_OFFSET_A 0
#define GEMM_DEFAULT_OFFSET_B 256
#endif
#define GEMM_DEFAULT_ALIGN (BLASLONG)0x0ffffUL
#define SYMV_P 8
#ifdef ARCH_X86
#define SGEMM_DEFAULT_UNROLL_M 4
#define DGEMM_DEFAULT_UNROLL_M 2
#define QGEMM_DEFAULT_UNROLL_M 2
#define CGEMM_DEFAULT_UNROLL_M 2
#define ZGEMM_DEFAULT_UNROLL_M 1
#define XGEMM_DEFAULT_UNROLL_M 1
#else
#define SGEMM_DEFAULT_UNROLL_M 8
#define DGEMM_DEFAULT_UNROLL_M 4
#define QGEMM_DEFAULT_UNROLL_M 2
#define CGEMM_DEFAULT_UNROLL_M 4
#define ZGEMM_DEFAULT_UNROLL_M 2
#define XGEMM_DEFAULT_UNROLL_M 1
#endif
#define SGEMM_DEFAULT_UNROLL_N 4
#define DGEMM_DEFAULT_UNROLL_N 4
#define QGEMM_DEFAULT_UNROLL_N 2
#define CGEMM_DEFAULT_UNROLL_N 2
#define ZGEMM_DEFAULT_UNROLL_N 2
#define XGEMM_DEFAULT_UNROLL_N 1
#define SGEMM_DEFAULT_P sgemm_p
#define SGEMM_DEFAULT_R sgemm_r
#define DGEMM_DEFAULT_P dgemm_p
#define DGEMM_DEFAULT_R dgemm_r
#define QGEMM_DEFAULT_P qgemm_p
#define QGEMM_DEFAULT_R qgemm_r
#define CGEMM_DEFAULT_P cgemm_p
#define CGEMM_DEFAULT_R cgemm_r
#define ZGEMM_DEFAULT_P zgemm_p
#define ZGEMM_DEFAULT_R zgemm_r
#define XGEMM_DEFAULT_P xgemm_p
#define XGEMM_DEFAULT_R xgemm_r
#define SGEMM_DEFAULT_Q 128
#define DGEMM_DEFAULT_Q 128
#define QGEMM_DEFAULT_Q 128
#define CGEMM_DEFAULT_Q 128
#define ZGEMM_DEFAULT_Q 128
#define XGEMM_DEFAULT_Q 128
#endif
#ifdef CORE2
#define SNUMOPT 8
#define DNUMOPT 4
#define GEMM_DEFAULT_OFFSET_A 448
#define GEMM_DEFAULT_OFFSET_B 128
#define GEMM_DEFAULT_ALIGN (BLASLONG)0x03fffUL
#define SYMV_P 8
#define SWITCH_RATIO 4
#ifdef ARCH_X86
#define SGEMM_DEFAULT_UNROLL_M 8
#define DGEMM_DEFAULT_UNROLL_M 4
#define QGEMM_DEFAULT_UNROLL_M 2
#define CGEMM_DEFAULT_UNROLL_M 4
#define ZGEMM_DEFAULT_UNROLL_M 2
#define XGEMM_DEFAULT_UNROLL_M 1
#define SGEMM_DEFAULT_UNROLL_N 2
#define DGEMM_DEFAULT_UNROLL_N 2
#define QGEMM_DEFAULT_UNROLL_N 2
#define CGEMM_DEFAULT_UNROLL_N 1
#define ZGEMM_DEFAULT_UNROLL_N 1
#define XGEMM_DEFAULT_UNROLL_N 1
#define MASK(a, b) ((((a) + (b) - 1) / (b)) * (b))
#else
#define SGEMM_DEFAULT_UNROLL_M 8
#define DGEMM_DEFAULT_UNROLL_M 4
#define QGEMM_DEFAULT_UNROLL_M 2
#define CGEMM_DEFAULT_UNROLL_M 4
#define ZGEMM_DEFAULT_UNROLL_M 2
#define XGEMM_DEFAULT_UNROLL_M 1
#define SGEMM_DEFAULT_UNROLL_N 4
#define DGEMM_DEFAULT_UNROLL_N 4
#define QGEMM_DEFAULT_UNROLL_N 2
#define CGEMM_DEFAULT_UNROLL_N 2
#define ZGEMM_DEFAULT_UNROLL_N 2
#define XGEMM_DEFAULT_UNROLL_N 1
#endif
#define SGEMM_DEFAULT_P sgemm_p
#define SGEMM_DEFAULT_R sgemm_r
#define DGEMM_DEFAULT_P dgemm_p
#define DGEMM_DEFAULT_R dgemm_r
#define QGEMM_DEFAULT_P qgemm_p
#define QGEMM_DEFAULT_R qgemm_r
#define CGEMM_DEFAULT_P cgemm_p
#define CGEMM_DEFAULT_R cgemm_r
#define ZGEMM_DEFAULT_P zgemm_p
#define ZGEMM_DEFAULT_R zgemm_r
#define XGEMM_DEFAULT_P xgemm_p
#define XGEMM_DEFAULT_R xgemm_r
#define SGEMM_DEFAULT_Q 256
#define DGEMM_DEFAULT_Q 256
#define QGEMM_DEFAULT_Q 256
#define CGEMM_DEFAULT_Q 256
#define ZGEMM_DEFAULT_Q 256
#define XGEMM_DEFAULT_Q 256
#endif
#ifdef PENRYN
#define SNUMOPT 8
#define DNUMOPT 4
#define GEMM_DEFAULT_OFFSET_A 128
#define GEMM_DEFAULT_OFFSET_B 0
#define GEMM_DEFAULT_ALIGN (BLASLONG)0x03fffUL
#define SYMV_P 8
#define SWITCH_RATIO 4
#ifdef ARCH_X86
#define SGEMM_DEFAULT_UNROLL_M 4
#define DGEMM_DEFAULT_UNROLL_M 2
#define QGEMM_DEFAULT_UNROLL_M 2
#define CGEMM_DEFAULT_UNROLL_M 2
#define ZGEMM_DEFAULT_UNROLL_M 1
#define XGEMM_DEFAULT_UNROLL_M 1
#define SGEMM_DEFAULT_UNROLL_N 4
#define DGEMM_DEFAULT_UNROLL_N 4
#define QGEMM_DEFAULT_UNROLL_N 2
#define CGEMM_DEFAULT_UNROLL_N 2
#define ZGEMM_DEFAULT_UNROLL_N 2
#define XGEMM_DEFAULT_UNROLL_N 1
#else
#define SGEMM_DEFAULT_UNROLL_M 8
#define DGEMM_DEFAULT_UNROLL_M 4
#define QGEMM_DEFAULT_UNROLL_M 2
#define CGEMM_DEFAULT_UNROLL_M 4
#define ZGEMM_DEFAULT_UNROLL_M 2
#define XGEMM_DEFAULT_UNROLL_M 1
#define SGEMM_DEFAULT_UNROLL_N 4
#define DGEMM_DEFAULT_UNROLL_N 4
#define QGEMM_DEFAULT_UNROLL_N 2
#define CGEMM_DEFAULT_UNROLL_N 2
#define ZGEMM_DEFAULT_UNROLL_N 2
#define XGEMM_DEFAULT_UNROLL_N 1
#endif
#define SGEMM_DEFAULT_P sgemm_p
#define SGEMM_DEFAULT_R sgemm_r
#define DGEMM_DEFAULT_P dgemm_p
#define DGEMM_DEFAULT_R dgemm_r
#define QGEMM_DEFAULT_P qgemm_p
#define QGEMM_DEFAULT_R qgemm_r
#define CGEMM_DEFAULT_P cgemm_p
#define CGEMM_DEFAULT_R cgemm_r
#define ZGEMM_DEFAULT_P zgemm_p
#define ZGEMM_DEFAULT_R zgemm_r
#define XGEMM_DEFAULT_P xgemm_p
#define XGEMM_DEFAULT_R xgemm_r
#define SGEMM_DEFAULT_Q 512
#define DGEMM_DEFAULT_Q 256
#define QGEMM_DEFAULT_Q 128
#define CGEMM_DEFAULT_Q 512
#define ZGEMM_DEFAULT_Q 256
#define XGEMM_DEFAULT_Q 128
#define GETRF_FACTOR 0.75
#endif
#ifdef DUNNINGTON
#define SNUMOPT 8
#define DNUMOPT 4
#define GEMM_DEFAULT_OFFSET_A 128
#define GEMM_DEFAULT_OFFSET_B 0
#define GEMM_DEFAULT_ALIGN (BLASLONG)0x03fffUL
#define SYMV_P 8
#define SWITCH_RATIO 4
#ifdef ARCH_X86
#define SGEMM_DEFAULT_UNROLL_M 4
#define DGEMM_DEFAULT_UNROLL_M 2
#define QGEMM_DEFAULT_UNROLL_M 2
#define CGEMM_DEFAULT_UNROLL_M 2
#define ZGEMM_DEFAULT_UNROLL_M 1
#define XGEMM_DEFAULT_UNROLL_M 1
#define SGEMM_DEFAULT_UNROLL_N 4
#define DGEMM_DEFAULT_UNROLL_N 4
#define QGEMM_DEFAULT_UNROLL_N 2
#define CGEMM_DEFAULT_UNROLL_N 2
#define ZGEMM_DEFAULT_UNROLL_N 2
#define XGEMM_DEFAULT_UNROLL_N 1
#else
#define SGEMM_DEFAULT_UNROLL_M 8
#define DGEMM_DEFAULT_UNROLL_M 4
#define QGEMM_DEFAULT_UNROLL_M 2
#define CGEMM_DEFAULT_UNROLL_M 4
#define ZGEMM_DEFAULT_UNROLL_M 2
#define XGEMM_DEFAULT_UNROLL_M 1
#define SGEMM_DEFAULT_UNROLL_N 4
#define DGEMM_DEFAULT_UNROLL_N 4
#define QGEMM_DEFAULT_UNROLL_N 2
#define CGEMM_DEFAULT_UNROLL_N 2
#define ZGEMM_DEFAULT_UNROLL_N 2
#define XGEMM_DEFAULT_UNROLL_N 1
#endif
#define SGEMM_DEFAULT_P sgemm_p
#define SGEMM_DEFAULT_R sgemm_r
#define DGEMM_DEFAULT_P dgemm_p
#define DGEMM_DEFAULT_R dgemm_r
#define QGEMM_DEFAULT_P qgemm_p
#define QGEMM_DEFAULT_R qgemm_r
#define CGEMM_DEFAULT_P cgemm_p
#define CGEMM_DEFAULT_R cgemm_r
#define ZGEMM_DEFAULT_P zgemm_p
#define ZGEMM_DEFAULT_R zgemm_r
#define XGEMM_DEFAULT_P xgemm_p
#define XGEMM_DEFAULT_R xgemm_r
#define SGEMM_DEFAULT_Q 768
#define DGEMM_DEFAULT_Q 384
#define QGEMM_DEFAULT_Q 192
#define CGEMM_DEFAULT_Q 768
#define ZGEMM_DEFAULT_Q 384
#define XGEMM_DEFAULT_Q 192
#define GETRF_FACTOR 0.75
#define GEMM_THREAD gemm_thread_mn
#endif
#ifdef NEHALEM
#define SNUMOPT 8
#define DNUMOPT 4
#define GEMM_DEFAULT_OFFSET_A 32
#define GEMM_DEFAULT_OFFSET_B 0
#define GEMM_DEFAULT_ALIGN (BLASLONG)0x03fffUL
#define SYMV_P 8
#define SWITCH_RATIO 4
#ifdef ARCH_X86
#define SGEMM_DEFAULT_UNROLL_M 4
#define DGEMM_DEFAULT_UNROLL_M 2
#define QGEMM_DEFAULT_UNROLL_M 2
#define CGEMM_DEFAULT_UNROLL_M 2
#define ZGEMM_DEFAULT_UNROLL_M 1
#define XGEMM_DEFAULT_UNROLL_M 1
#define SGEMM_DEFAULT_UNROLL_N 4
#define DGEMM_DEFAULT_UNROLL_N 4
#define QGEMM_DEFAULT_UNROLL_N 2
#define CGEMM_DEFAULT_UNROLL_N 2
#define ZGEMM_DEFAULT_UNROLL_N 2
#define XGEMM_DEFAULT_UNROLL_N 1
#else
#define SGEMM_DEFAULT_UNROLL_M 4
#define DGEMM_DEFAULT_UNROLL_M 2
#define QGEMM_DEFAULT_UNROLL_M 2
#define CGEMM_DEFAULT_UNROLL_M 2
#define ZGEMM_DEFAULT_UNROLL_M 1
#define XGEMM_DEFAULT_UNROLL_M 1
#define SGEMM_DEFAULT_UNROLL_N 8
#define DGEMM_DEFAULT_UNROLL_N 8
#define QGEMM_DEFAULT_UNROLL_N 2
#define CGEMM_DEFAULT_UNROLL_N 4
#define ZGEMM_DEFAULT_UNROLL_N 4
#define XGEMM_DEFAULT_UNROLL_N 1
#endif
#define SGEMM_DEFAULT_P 504
#define SGEMM_DEFAULT_R sgemm_r
#define DGEMM_DEFAULT_P 504
#define DGEMM_DEFAULT_R dgemm_r
#define QGEMM_DEFAULT_P 504
#define QGEMM_DEFAULT_R qgemm_r
#define CGEMM_DEFAULT_P 252
#define CGEMM_DEFAULT_R cgemm_r
#define ZGEMM_DEFAULT_P 252
#define ZGEMM_DEFAULT_R zgemm_r
#define XGEMM_DEFAULT_P 252
#define XGEMM_DEFAULT_R xgemm_r
#define SGEMM_DEFAULT_Q 512
#define DGEMM_DEFAULT_Q 256
#define QGEMM_DEFAULT_Q 128
#define CGEMM_DEFAULT_Q 512
#define ZGEMM_DEFAULT_Q 256
#define XGEMM_DEFAULT_Q 128
#define GETRF_FACTOR 0.72
#endif
#ifdef SANDYBRIDGE
#define SNUMOPT 8
#define DNUMOPT 4
#define GEMM_DEFAULT_OFFSET_A 0
#define GEMM_DEFAULT_OFFSET_B 0
#define GEMM_DEFAULT_ALIGN (BLASLONG)0x03fffUL
#define SYMV_P 8
#define SWITCH_RATIO 4
#ifdef ARCH_X86
#define SGEMM_DEFAULT_UNROLL_M 4
#define DGEMM_DEFAULT_UNROLL_M 2
#define QGEMM_DEFAULT_UNROLL_M 2
#define CGEMM_DEFAULT_UNROLL_M 2
#define ZGEMM_DEFAULT_UNROLL_M 1
#define XGEMM_DEFAULT_UNROLL_M 1
#define SGEMM_DEFAULT_UNROLL_N 4
#define DGEMM_DEFAULT_UNROLL_N 4
#define QGEMM_DEFAULT_UNROLL_N 2
#define CGEMM_DEFAULT_UNROLL_N 2
#define ZGEMM_DEFAULT_UNROLL_N 2
#define XGEMM_DEFAULT_UNROLL_N 1
#else
#define SGEMM_DEFAULT_UNROLL_M 16
#define DGEMM_DEFAULT_UNROLL_M 8
#define QGEMM_DEFAULT_UNROLL_M 2
#define CGEMM_DEFAULT_UNROLL_M 8
#define ZGEMM_DEFAULT_UNROLL_M 1
#define XGEMM_DEFAULT_UNROLL_M 1
#define SGEMM_DEFAULT_UNROLL_N 4
#define DGEMM_DEFAULT_UNROLL_N 4
#define QGEMM_DEFAULT_UNROLL_N 2
#define CGEMM_DEFAULT_UNROLL_N 2
#define ZGEMM_DEFAULT_UNROLL_N 4
#define XGEMM_DEFAULT_UNROLL_N 1
#endif
#define SGEMM_DEFAULT_P 768
#define SGEMM_DEFAULT_R sgemm_r
/*#define SGEMM_DEFAULT_R 1024*/
#define DGEMM_DEFAULT_P 512
#define DGEMM_DEFAULT_R dgemm_r
/*#define DGEMM_DEFAULT_R 1024*/
#define QGEMM_DEFAULT_P 504
#define QGEMM_DEFAULT_R qgemm_r
#define CGEMM_DEFAULT_P 768
#define CGEMM_DEFAULT_R cgemm_r
/*#define CGEMM_DEFAULT_R 1024*/
#define ZGEMM_DEFAULT_P 512
#define ZGEMM_DEFAULT_R zgemm_r
/*#define ZGEMM_DEFAULT_R 1024*/
#define XGEMM_DEFAULT_P 252
#define XGEMM_DEFAULT_R xgemm_r
#define SGEMM_DEFAULT_Q 384
#define DGEMM_DEFAULT_Q 256
#define QGEMM_DEFAULT_Q 128
#define CGEMM_DEFAULT_Q 512
#define ZGEMM_DEFAULT_Q 192
#define XGEMM_DEFAULT_Q 128
#define CGEMM3M_DEFAULT_UNROLL_N 8
#define CGEMM3M_DEFAULT_UNROLL_M 4
#define ZGEMM3M_DEFAULT_UNROLL_N 8
#define ZGEMM3M_DEFAULT_UNROLL_M 2
#define CGEMM3M_DEFAULT_P 448
#define ZGEMM3M_DEFAULT_P 224
#define XGEMM3M_DEFAULT_P 112
#define CGEMM3M_DEFAULT_Q 224
#define ZGEMM3M_DEFAULT_Q 224
#define XGEMM3M_DEFAULT_Q 224
#define CGEMM3M_DEFAULT_R 12288
#define ZGEMM3M_DEFAULT_R 12288
#define XGEMM3M_DEFAULT_R 12288
#define GETRF_FACTOR 0.72
#endif
#ifdef HASWELL
#define SNUMOPT 16
#define DNUMOPT 8
#define GEMM_DEFAULT_OFFSET_A 0
#define GEMM_DEFAULT_OFFSET_B 0
#define GEMM_DEFAULT_ALIGN (BLASLONG)0x03fffUL
#define SYMV_P 8
#if defined(XDOUBLE) || defined(DOUBLE)
#define SWITCH_RATIO 4
#define GEMM_PREFERED_SIZE 4
#else
#define SWITCH_RATIO 8
#define GEMM_PREFERED_SIZE 8
#endif
#ifdef ARCH_X86
#define SGEMM_DEFAULT_UNROLL_M 4
#define DGEMM_DEFAULT_UNROLL_M 2
#define QGEMM_DEFAULT_UNROLL_M 2
#define CGEMM_DEFAULT_UNROLL_M 2
#define ZGEMM_DEFAULT_UNROLL_M 1
#define XGEMM_DEFAULT_UNROLL_M 1
#define SGEMM_DEFAULT_UNROLL_N 4
#define DGEMM_DEFAULT_UNROLL_N 4
#define QGEMM_DEFAULT_UNROLL_N 2
#define CGEMM_DEFAULT_UNROLL_N 2
#define ZGEMM_DEFAULT_UNROLL_N 2
#define XGEMM_DEFAULT_UNROLL_N 1
#else
#define SGEMM_DEFAULT_UNROLL_M 8
#define DGEMM_DEFAULT_UNROLL_M 4
#define QGEMM_DEFAULT_UNROLL_M 2
#define CGEMM_DEFAULT_UNROLL_M 8
#define ZGEMM_DEFAULT_UNROLL_M 4
#define XGEMM_DEFAULT_UNROLL_M 1
#define SGEMM_DEFAULT_UNROLL_N 4
#define DGEMM_DEFAULT_UNROLL_N 8
#define QGEMM_DEFAULT_UNROLL_N 2
#define CGEMM_DEFAULT_UNROLL_N 2
#define ZGEMM_DEFAULT_UNROLL_N 2
#define XGEMM_DEFAULT_UNROLL_N 1
/*
#define SGEMM_DEFAULT_UNROLL_MN 32
#define DGEMM_DEFAULT_UNROLL_MN 32
*/
#endif
#ifdef ARCH_X86
#define SGEMM_DEFAULT_P 512
#define SGEMM_DEFAULT_R sgemm_r
#define DGEMM_DEFAULT_P 512
#define DGEMM_DEFAULT_R dgemm_r
#define QGEMM_DEFAULT_P 504
#define QGEMM_DEFAULT_R qgemm_r
#define CGEMM_DEFAULT_P 128
#define CGEMM_DEFAULT_R 1024
#define ZGEMM_DEFAULT_P 512
#define ZGEMM_DEFAULT_R zgemm_r
#define XGEMM_DEFAULT_P 252
#define XGEMM_DEFAULT_R xgemm_r
#define SGEMM_DEFAULT_Q 256
#define DGEMM_DEFAULT_Q 256
#define QGEMM_DEFAULT_Q 128
#define CGEMM_DEFAULT_Q 256
#define ZGEMM_DEFAULT_Q 192
#define XGEMM_DEFAULT_Q 128
#else
#define SGEMM_DEFAULT_P 320
#define DGEMM_DEFAULT_P 512
#define CGEMM_DEFAULT_P 256
#define ZGEMM_DEFAULT_P 192
#ifdef WINDOWS_ABI
#define SGEMM_DEFAULT_Q 320
#define DGEMM_DEFAULT_Q 128
#else
#define SGEMM_DEFAULT_Q 320
#define DGEMM_DEFAULT_Q 256
#endif
#define CGEMM_DEFAULT_Q 256
#define ZGEMM_DEFAULT_Q 192
#define SGEMM_DEFAULT_R sgemm_r
#define DGEMM_DEFAULT_R 13824
#define CGEMM_DEFAULT_R cgemm_r
#define ZGEMM_DEFAULT_R zgemm_r
#define QGEMM_DEFAULT_Q 128
#define QGEMM_DEFAULT_P 504
#define QGEMM_DEFAULT_R qgemm_r
#define XGEMM_DEFAULT_P 252
#define XGEMM_DEFAULT_R xgemm_r
#define XGEMM_DEFAULT_Q 128
#define CGEMM3M_DEFAULT_UNROLL_N 4
#define CGEMM3M_DEFAULT_UNROLL_M 8
#define ZGEMM3M_DEFAULT_UNROLL_N 4
#define ZGEMM3M_DEFAULT_UNROLL_M 4
#define CGEMM3M_DEFAULT_P 320
#define ZGEMM3M_DEFAULT_P 256
#define XGEMM3M_DEFAULT_P 112
#define CGEMM3M_DEFAULT_Q 320
#define ZGEMM3M_DEFAULT_Q 256
#define XGEMM3M_DEFAULT_Q 224
#define CGEMM3M_DEFAULT_R 12288
#define ZGEMM3M_DEFAULT_R 12288
#define XGEMM3M_DEFAULT_R 12288
#endif
#endif
#ifdef SKYLAKEX
#define SNUMOPT 16
#define DNUMOPT 8
#define GEMM_DEFAULT_OFFSET_A 0
#define GEMM_DEFAULT_OFFSET_B 0
#define GEMM_DEFAULT_ALIGN (BLASLONG)0x03fffUL
#define SYMV_P 8
#if defined(XDOUBLE) || defined(DOUBLE)
#define SWITCH_RATIO 8
#define GEMM_PREFERED_SIZE 8
#else
#define SWITCH_RATIO 16
#define GEMM_PREFERED_SIZE 16
#endif
#define USE_SGEMM_KERNEL_DIRECT 1
#ifdef ARCH_X86
#define SGEMM_DEFAULT_UNROLL_M 4
#define DGEMM_DEFAULT_UNROLL_M 2
#define QGEMM_DEFAULT_UNROLL_M 2
#define CGEMM_DEFAULT_UNROLL_M 2
#define ZGEMM_DEFAULT_UNROLL_M 1
#define XGEMM_DEFAULT_UNROLL_M 1
#define SGEMM_DEFAULT_UNROLL_N 4
#define DGEMM_DEFAULT_UNROLL_N 4
#define QGEMM_DEFAULT_UNROLL_N 2
#define CGEMM_DEFAULT_UNROLL_N 2
#define ZGEMM_DEFAULT_UNROLL_N 2
#define XGEMM_DEFAULT_UNROLL_N 1
#else
#define SGEMM_DEFAULT_UNROLL_M 16
#define DGEMM_DEFAULT_UNROLL_M 16
#define QGEMM_DEFAULT_UNROLL_M 2
#define CGEMM_DEFAULT_UNROLL_M 8
#define ZGEMM_DEFAULT_UNROLL_M 4
#define XGEMM_DEFAULT_UNROLL_M 1
#define SGEMM_DEFAULT_UNROLL_N 4
#define DGEMM_DEFAULT_UNROLL_N 2
#define QGEMM_DEFAULT_UNROLL_N 2
#define CGEMM_DEFAULT_UNROLL_N 2
#define ZGEMM_DEFAULT_UNROLL_N 2
#define XGEMM_DEFAULT_UNROLL_N 1
#define SGEMM_DEFAULT_UNROLL_MN 32
#define DGEMM_DEFAULT_UNROLL_MN 32
#endif
#ifdef ARCH_X86
#define SGEMM_DEFAULT_P 512
#define SGEMM_DEFAULT_R sgemm_r
#define DGEMM_DEFAULT_P 512
#define DGEMM_DEFAULT_R dgemm_r
#define QGEMM_DEFAULT_P 504
#define QGEMM_DEFAULT_R qgemm_r
#define CGEMM_DEFAULT_P 128
#define CGEMM_DEFAULT_R 1024
#define ZGEMM_DEFAULT_P 512
#define ZGEMM_DEFAULT_R zgemm_r
#define XGEMM_DEFAULT_P 252
#define XGEMM_DEFAULT_R xgemm_r
#define SGEMM_DEFAULT_Q 256
#define DGEMM_DEFAULT_Q 256
#define QGEMM_DEFAULT_Q 128
#define CGEMM_DEFAULT_Q 256
#define ZGEMM_DEFAULT_Q 192
#define XGEMM_DEFAULT_Q 128
#else
#define SGEMM_DEFAULT_P 448
#define DGEMM_DEFAULT_P 192
#define CGEMM_DEFAULT_P 384
#define ZGEMM_DEFAULT_P 256
#define SGEMM_DEFAULT_Q 448
#define DGEMM_DEFAULT_Q 384
#define CGEMM_DEFAULT_Q 192
#define ZGEMM_DEFAULT_Q 128
#define SGEMM_DEFAULT_R sgemm_r
#define DGEMM_DEFAULT_R 8640
#define CGEMM_DEFAULT_R cgemm_r
#define ZGEMM_DEFAULT_R zgemm_r
#define QGEMM_DEFAULT_Q 128
#define QGEMM_DEFAULT_P 504
#define QGEMM_DEFAULT_R qgemm_r
#define XGEMM_DEFAULT_P 252
#define XGEMM_DEFAULT_R xgemm_r
#define XGEMM_DEFAULT_Q 128
#define CGEMM3M_DEFAULT_UNROLL_N 4
#define CGEMM3M_DEFAULT_UNROLL_M 8
#define ZGEMM3M_DEFAULT_UNROLL_N 4
#define ZGEMM3M_DEFAULT_UNROLL_M 4
#define CGEMM3M_DEFAULT_P 320
#define ZGEMM3M_DEFAULT_P 256
#define XGEMM3M_DEFAULT_P 112
#define CGEMM3M_DEFAULT_Q 320
#define ZGEMM3M_DEFAULT_Q 256
#define XGEMM3M_DEFAULT_Q 224
#define CGEMM3M_DEFAULT_R 12288
#define ZGEMM3M_DEFAULT_R 12288
#define XGEMM3M_DEFAULT_R 12288
#endif
#endif
#ifdef SAPPHIRERAPIDS
#define SNUMOPT 16
#define DNUMOPT 8
#define GEMM_DEFAULT_OFFSET_A 0
#define GEMM_DEFAULT_OFFSET_B 0
#define GEMM_DEFAULT_ALIGN 0x03fffUL
#define SYMV_P 8
#if defined(XDOUBLE) || defined(DOUBLE)
#define SWITCH_RATIO 8
#define GEMM_PREFERED_SIZE 8
#else
#define SWITCH_RATIO 16
#define GEMM_PREFERED_SIZE 16
#endif
#define USE_SGEMM_KERNEL_DIRECT 1
#undef SBGEMM_DEFAULT_UNROLL_N
#undef SBGEMM_DEFAULT_UNROLL_M
#undef SBGEMM_DEFAULT_P
#undef SBGEMM_DEFAULT_R
#undef SBGEMM_DEFAULT_Q
// FIXME: the true values are UNROLL_M = UNROLL_N = 16.
// If UNROLL_M and UNROLL_N are equal, OpenBLAS reuses OCOPY as ICOPY.
// For AMX the two copy routines differ, so UNROLL_M is set to 32 as a workaround.
#define SBGEMM_DEFAULT_UNROLL_N 16
#define SBGEMM_DEFAULT_UNROLL_M 32
#define SBGEMM_DEFAULT_P 256
#define SBGEMM_DEFAULT_Q 1024
#define SBGEMM_DEFAULT_R sbgemm_r
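/* Illustrative sketch, not part of the upstream header: a compile-time guard
 * for the workaround described in the FIXME comment above. It assumes, as that
 * comment states, that equal M and N unroll factors make OpenBLAS reuse OCOPY
 * as ICOPY, which would not suit the AMX kernels. The check only fires if
 * someone later sets both unroll factors to the same value. */
#if SBGEMM_DEFAULT_UNROLL_M == SBGEMM_DEFAULT_UNROLL_N
#error "SAPPHIRERAPIDS: SBGEMM_DEFAULT_UNROLL_M must differ from SBGEMM_DEFAULT_UNROLL_N"
#endif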
#ifdef ARCH_X86
#define SGEMM_DEFAULT_UNROLL_M 4
#define DGEMM_DEFAULT_UNROLL_M 2
#define QGEMM_DEFAULT_UNROLL_M 2
#define CGEMM_DEFAULT_UNROLL_M 2
#define ZGEMM_DEFAULT_UNROLL_M 1
#define XGEMM_DEFAULT_UNROLL_M 1
#define SGEMM_DEFAULT_UNROLL_N 4
#define DGEMM_DEFAULT_UNROLL_N 4
#define QGEMM_DEFAULT_UNROLL_N 2
#define CGEMM_DEFAULT_UNROLL_N 2
#define ZGEMM_DEFAULT_UNROLL_N 2
#define XGEMM_DEFAULT_UNROLL_N 1
#else
#define SGEMM_DEFAULT_UNROLL_M 16
#define DGEMM_DEFAULT_UNROLL_M 16
#define QGEMM_DEFAULT_UNROLL_M 2
#define CGEMM_DEFAULT_UNROLL_M 8
#define ZGEMM_DEFAULT_UNROLL_M 4
#define XGEMM_DEFAULT_UNROLL_M 1
#define SGEMM_DEFAULT_UNROLL_N 4
#define DGEMM_DEFAULT_UNROLL_N 2
#define QGEMM_DEFAULT_UNROLL_N 2
#define CGEMM_DEFAULT_UNROLL_N 2
#define ZGEMM_DEFAULT_UNROLL_N 2
#define XGEMM_DEFAULT_UNROLL_N 1
#define SGEMM_DEFAULT_UNROLL_MN 32
#define DGEMM_DEFAULT_UNROLL_MN 32
#endif
#ifdef ARCH_X86
#define SGEMM_DEFAULT_P 512
#define SGEMM_DEFAULT_R sgemm_r
#define DGEMM_DEFAULT_P 512
#define DGEMM_DEFAULT_R dgemm_r
#define QGEMM_DEFAULT_P 504
#define QGEMM_DEFAULT_R qgemm_r
#define CGEMM_DEFAULT_P 128
#define CGEMM_DEFAULT_R 1024
#define ZGEMM_DEFAULT_P 512
#define ZGEMM_DEFAULT_R zgemm_r
#define XGEMM_DEFAULT_P 252
#define XGEMM_DEFAULT_R xgemm_r
#define SGEMM_DEFAULT_Q 256
#define DGEMM_DEFAULT_Q 256
#define QGEMM_DEFAULT_Q 128
#define CGEMM_DEFAULT_Q 256
#define ZGEMM_DEFAULT_Q 192
#define XGEMM_DEFAULT_Q 128
#else
#define SGEMM_DEFAULT_P 640
#define DGEMM_DEFAULT_P 192
#define CGEMM_DEFAULT_P 384
#define ZGEMM_DEFAULT_P 256
#define SGEMM_DEFAULT_Q 320
#define DGEMM_DEFAULT_Q 384
#define CGEMM_DEFAULT_Q 192
#define ZGEMM_DEFAULT_Q 128
#define SGEMM_DEFAULT_R sgemm_r
#define DGEMM_DEFAULT_R 8640
#define CGEMM_DEFAULT_R cgemm_r
#define ZGEMM_DEFAULT_R zgemm_r
#define QGEMM_DEFAULT_Q 128
#define QGEMM_DEFAULT_P 504
#define QGEMM_DEFAULT_R qgemm_r
#define XGEMM_DEFAULT_P 252
#define XGEMM_DEFAULT_R xgemm_r
#define XGEMM_DEFAULT_Q 128
#define CGEMM3M_DEFAULT_UNROLL_N 4
#define CGEMM3M_DEFAULT_UNROLL_M 8
#define ZGEMM3M_DEFAULT_UNROLL_N 4
#define ZGEMM3M_DEFAULT_UNROLL_M 4
#define CGEMM3M_DEFAULT_P 320
#define ZGEMM3M_DEFAULT_P 256
#define XGEMM3M_DEFAULT_P 112
#define CGEMM3M_DEFAULT_Q 320
#define ZGEMM3M_DEFAULT_Q 256
#define XGEMM3M_DEFAULT_Q 224
#define CGEMM3M_DEFAULT_R 12288
#define ZGEMM3M_DEFAULT_R 12288
#define XGEMM3M_DEFAULT_R 12288
#endif
#endif
#ifdef COOPERLAKE
#define SNUMOPT 16
#define DNUMOPT 8
#define GEMM_DEFAULT_OFFSET_A 0
#define GEMM_DEFAULT_OFFSET_B 0
#define GEMM_DEFAULT_ALIGN 0x03fffUL
#define SYMV_P 8
#if defined(XDOUBLE) || defined(DOUBLE)
#define SWITCH_RATIO 8
#define GEMM_PREFERED_SIZE 8
#else
#define SWITCH_RATIO 16
#define GEMM_PREFERED_SIZE 16
#endif
#define USE_SGEMM_KERNEL_DIRECT 1
#undef SBGEMM_DEFAULT_UNROLL_N
#undef SBGEMM_DEFAULT_UNROLL_M
#undef SBGEMM_DEFAULT_P
#undef SBGEMM_DEFAULT_R
#undef SBGEMM_DEFAULT_Q
#define SBGEMM_DEFAULT_UNROLL_N 4
#define SBGEMM_DEFAULT_UNROLL_M 16
#define SBGEMM_DEFAULT_P 384
#define SBGEMM_DEFAULT_Q 768
#define SBGEMM_DEFAULT_R sbgemm_r
#ifdef ARCH_X86
#define SGEMM_DEFAULT_UNROLL_M 4
#define DGEMM_DEFAULT_UNROLL_M 2
#define QGEMM_DEFAULT_UNROLL_M 2
#define CGEMM_DEFAULT_UNROLL_M 2
#define ZGEMM_DEFAULT_UNROLL_M 1
#define XGEMM_DEFAULT_UNROLL_M 1
#define SGEMM_DEFAULT_UNROLL_N 4
#define DGEMM_DEFAULT_UNROLL_N 4
#define QGEMM_DEFAULT_UNROLL_N 2
#define CGEMM_DEFAULT_UNROLL_N 2
#define ZGEMM_DEFAULT_UNROLL_N 2
#define XGEMM_DEFAULT_UNROLL_N 1
#else
#define SGEMM_DEFAULT_UNROLL_M 16
#define DGEMM_DEFAULT_UNROLL_M 16
#define QGEMM_DEFAULT_UNROLL_M 2
#define CGEMM_DEFAULT_UNROLL_M 8
#define ZGEMM_DEFAULT_UNROLL_M 4
#define XGEMM_DEFAULT_UNROLL_M 1
#define SGEMM_DEFAULT_UNROLL_N 4
#define DGEMM_DEFAULT_UNROLL_N 2
#define QGEMM_DEFAULT_UNROLL_N 2
#define CGEMM_DEFAULT_UNROLL_N 2
#define ZGEMM_DEFAULT_UNROLL_N 2
#define XGEMM_DEFAULT_UNROLL_N 1
#define SGEMM_DEFAULT_UNROLL_MN 32
#define DGEMM_DEFAULT_UNROLL_MN 32
#endif
#ifdef ARCH_X86
#define SGEMM_DEFAULT_P 512
#define SGEMM_DEFAULT_R sgemm_r
#define DGEMM_DEFAULT_P 512
#define DGEMM_DEFAULT_R dgemm_r
#define QGEMM_DEFAULT_P 504
#define QGEMM_DEFAULT_R qgemm_r
#define CGEMM_DEFAULT_P 128
#define CGEMM_DEFAULT_R 1024
#define ZGEMM_DEFAULT_P 512
#define ZGEMM_DEFAULT_R zgemm_r
#define XGEMM_DEFAULT_P 252
#define XGEMM_DEFAULT_R xgemm_r
#define SGEMM_DEFAULT_Q 256
#define DGEMM_DEFAULT_Q 256
#define QGEMM_DEFAULT_Q 128
#define CGEMM_DEFAULT_Q 256
#define ZGEMM_DEFAULT_Q 192
#define XGEMM_DEFAULT_Q 128
#else
#define SGEMM_DEFAULT_P 640
#define DGEMM_DEFAULT_P 192
#define CGEMM_DEFAULT_P 384
#define ZGEMM_DEFAULT_P 256
#define SGEMM_DEFAULT_Q 320
#define DGEMM_DEFAULT_Q 384
#define CGEMM_DEFAULT_Q 192
#define ZGEMM_DEFAULT_Q 128
#define SGEMM_DEFAULT_R sgemm_r
#define DGEMM_DEFAULT_R 8640
#define CGEMM_DEFAULT_R cgemm_r
#define ZGEMM_DEFAULT_R zgemm_r
#define QGEMM_DEFAULT_Q 128
#define QGEMM_DEFAULT_P 504
#define QGEMM_DEFAULT_R qgemm_r
#define XGEMM_DEFAULT_P 252
#define XGEMM_DEFAULT_R xgemm_r
#define XGEMM_DEFAULT_Q 128
#define CGEMM3M_DEFAULT_UNROLL_N 4
#define CGEMM3M_DEFAULT_UNROLL_M 8
#define ZGEMM3M_DEFAULT_UNROLL_N 4
#define ZGEMM3M_DEFAULT_UNROLL_M 4
#define CGEMM3M_DEFAULT_P 320
#define ZGEMM3M_DEFAULT_P 256
#define XGEMM3M_DEFAULT_P 112
#define CGEMM3M_DEFAULT_Q 320
#define ZGEMM3M_DEFAULT_Q 256
#define XGEMM3M_DEFAULT_Q 224
#define CGEMM3M_DEFAULT_R 12288
#define ZGEMM3M_DEFAULT_R 12288
#define XGEMM3M_DEFAULT_R 12288
#endif
#endif
#ifdef ATOM
#define SNUMOPT 2
#define DNUMOPT 1
#define GEMM_DEFAULT_OFFSET_A 64
#define GEMM_DEFAULT_OFFSET_B 0
#define GEMM_DEFAULT_ALIGN (BLASLONG)0x0ffffUL
#define SYMV_P 8
#ifdef ARCH_X86
#define SGEMM_DEFAULT_UNROLL_M 4
#define DGEMM_DEFAULT_UNROLL_M 2
#define QGEMM_DEFAULT_UNROLL_M 2
#define CGEMM_DEFAULT_UNROLL_M 2
#define ZGEMM_DEFAULT_UNROLL_M 1
#define XGEMM_DEFAULT_UNROLL_M 1
#else
#define SGEMM_DEFAULT_UNROLL_M 8
#define DGEMM_DEFAULT_UNROLL_M 4
#define QGEMM_DEFAULT_UNROLL_M 2
#define CGEMM_DEFAULT_UNROLL_M 4
#define ZGEMM_DEFAULT_UNROLL_M 2
#define XGEMM_DEFAULT_UNROLL_M 1
#endif
#define SGEMM_DEFAULT_UNROLL_N 4
#define DGEMM_DEFAULT_UNROLL_N 2
#define QGEMM_DEFAULT_UNROLL_N 2
#define CGEMM_DEFAULT_UNROLL_N 2
#define ZGEMM_DEFAULT_UNROLL_N 1
#define XGEMM_DEFAULT_UNROLL_N 1
#define SGEMM_DEFAULT_P sgemm_p
#define SGEMM_DEFAULT_R sgemm_r
#define DGEMM_DEFAULT_P dgemm_p
#define DGEMM_DEFAULT_R dgemm_r
#define QGEMM_DEFAULT_P qgemm_p
#define QGEMM_DEFAULT_R qgemm_r
#define CGEMM_DEFAULT_P cgemm_p
#define CGEMM_DEFAULT_R cgemm_r
#define ZGEMM_DEFAULT_P zgemm_p
#define ZGEMM_DEFAULT_R zgemm_r
#define XGEMM_DEFAULT_P xgemm_p
#define XGEMM_DEFAULT_R xgemm_r
#define SGEMM_DEFAULT_Q 256
#define DGEMM_DEFAULT_Q 256
#define QGEMM_DEFAULT_Q 256
#define CGEMM_DEFAULT_Q 256
#define ZGEMM_DEFAULT_Q 256
#define XGEMM_DEFAULT_Q 256
#endif
#ifdef ITANIUM2
#define SNUMOPT 4
#define DNUMOPT 4
#define GEMM_DEFAULT_OFFSET_A 0
#define GEMM_DEFAULT_OFFSET_B 128
#define GEMM_DEFAULT_ALIGN (BLASLONG)0x03fffUL
#define SGEMM_DEFAULT_UNROLL_M 8
#define SGEMM_DEFAULT_UNROLL_N 8
#define DGEMM_DEFAULT_UNROLL_M 8
#define DGEMM_DEFAULT_UNROLL_N 8
#define QGEMM_DEFAULT_UNROLL_M 8
#define QGEMM_DEFAULT_UNROLL_N 8
#define CGEMM_DEFAULT_UNROLL_M 4
#define CGEMM_DEFAULT_UNROLL_N 4
#define ZGEMM_DEFAULT_UNROLL_M 4
#define ZGEMM_DEFAULT_UNROLL_N 4
#define XGEMM_DEFAULT_UNROLL_M 4
#define XGEMM_DEFAULT_UNROLL_N 4
#define SGEMM_DEFAULT_P sgemm_p
#define DGEMM_DEFAULT_P dgemm_p
#define QGEMM_DEFAULT_P qgemm_p
#define CGEMM_DEFAULT_P cgemm_p
#define ZGEMM_DEFAULT_P zgemm_p
#define XGEMM_DEFAULT_P xgemm_p
#define SGEMM_DEFAULT_Q 1024
#define DGEMM_DEFAULT_Q 1024
#define QGEMM_DEFAULT_Q 1024
#define CGEMM_DEFAULT_Q 1024
#define ZGEMM_DEFAULT_Q 1024
#define XGEMM_DEFAULT_Q 1024
#define SGEMM_DEFAULT_R sgemm_r
#define DGEMM_DEFAULT_R dgemm_r
#define QGEMM_DEFAULT_R qgemm_r
#define CGEMM_DEFAULT_R cgemm_r
#define ZGEMM_DEFAULT_R zgemm_r
#define XGEMM_DEFAULT_R xgemm_r
#define SYMV_P 16
#define GETRF_FACTOR 0.65
#endif
#if defined(EV4) || defined(EV5) || defined(EV6)
#ifdef EV4
#define SNUMOPT 1
#define DNUMOPT 1
#else
#define SNUMOPT 2
#define DNUMOPT 2
#endif
#define GEMM_DEFAULT_OFFSET_A 512
#define GEMM_DEFAULT_OFFSET_B 512
#define GEMM_DEFAULT_ALIGN 0x0ffffUL
#define SGEMM_DEFAULT_UNROLL_M 4
#define SGEMM_DEFAULT_UNROLL_N 4
#define DGEMM_DEFAULT_UNROLL_M 4
#define DGEMM_DEFAULT_UNROLL_N 4
#define CGEMM_DEFAULT_UNROLL_M 2
#define CGEMM_DEFAULT_UNROLL_N 2
#define ZGEMM_DEFAULT_UNROLL_M 2
#define ZGEMM_DEFAULT_UNROLL_N 2
#define SYMV_P 8
#ifdef EV4
#define SGEMM_DEFAULT_P 32
#define SGEMM_DEFAULT_Q 112
#define SGEMM_DEFAULT_R 256
#define DGEMM_DEFAULT_P 32
#define DGEMM_DEFAULT_Q 56
#define DGEMM_DEFAULT_R 256
#define CGEMM_DEFAULT_P 32
#define CGEMM_DEFAULT_Q 64
#define CGEMM_DEFAULT_R 240
#define ZGEMM_DEFAULT_P 32
#define ZGEMM_DEFAULT_Q 32
#define ZGEMM_DEFAULT_R 240
#endif
#ifdef EV5
#define SGEMM_DEFAULT_P 64
#define SGEMM_DEFAULT_Q 256
#define DGEMM_DEFAULT_P 64
#define DGEMM_DEFAULT_Q 128
#define CGEMM_DEFAULT_P 64
#define CGEMM_DEFAULT_Q 128
#define ZGEMM_DEFAULT_P 64
#define ZGEMM_DEFAULT_Q 64
#endif
#ifdef EV6
#define SGEMM_DEFAULT_P 256
#define SGEMM_DEFAULT_Q 512
#define DGEMM_DEFAULT_P 256
#define DGEMM_DEFAULT_Q 256
#define CGEMM_DEFAULT_P 256
#define CGEMM_DEFAULT_Q 256
#define ZGEMM_DEFAULT_P 128
#define ZGEMM_DEFAULT_Q 256
#endif
#endif
#ifdef CELL
#define SNUMOPT 2
#define DNUMOPT 2
#define GEMM_DEFAULT_OFFSET_A 0
#define GEMM_DEFAULT_OFFSET_B 8192
#define GEMM_DEFAULT_ALIGN 0x0ffffUL
#define SGEMM_DEFAULT_UNROLL_M 16
#define SGEMM_DEFAULT_UNROLL_N 4
#define DGEMM_DEFAULT_UNROLL_M 4
#define DGEMM_DEFAULT_UNROLL_N 4
#define CGEMM_DEFAULT_UNROLL_M 8
#define CGEMM_DEFAULT_UNROLL_N 2
#define ZGEMM_DEFAULT_UNROLL_M 2
#define ZGEMM_DEFAULT_UNROLL_N 2
#define SGEMM_DEFAULT_P 128
#define DGEMM_DEFAULT_P 128
#define CGEMM_DEFAULT_P 128
#define ZGEMM_DEFAULT_P 128
#define SGEMM_DEFAULT_Q 512
#define DGEMM_DEFAULT_Q 256
#define CGEMM_DEFAULT_Q 256
#define ZGEMM_DEFAULT_Q 128
#define SYMV_P 4
#endif
#ifdef PPCG4
#define GEMM_DEFAULT_OFFSET_A 0
#define GEMM_DEFAULT_OFFSET_B 1024
#define GEMM_DEFAULT_ALIGN 0x0ffffUL
#define SGEMM_DEFAULT_UNROLL_M 4
#define SGEMM_DEFAULT_UNROLL_N 4
#define DGEMM_DEFAULT_UNROLL_M 4
#define DGEMM_DEFAULT_UNROLL_N 4
#define CGEMM_DEFAULT_UNROLL_M 2
#define CGEMM_DEFAULT_UNROLL_N 2
#define ZGEMM_DEFAULT_UNROLL_M 2
#define ZGEMM_DEFAULT_UNROLL_N 2
#define SGEMM_DEFAULT_P 256
#define DGEMM_DEFAULT_P 128
#define CGEMM_DEFAULT_P 128
#define ZGEMM_DEFAULT_P 64
#define SGEMM_DEFAULT_Q 256
#define DGEMM_DEFAULT_Q 256
#define CGEMM_DEFAULT_Q 256
#define ZGEMM_DEFAULT_Q 256
#define SYMV_P 4
#endif
#ifdef PPC970
#define SNUMOPT 4
#define DNUMOPT 4
#define GEMM_DEFAULT_OFFSET_A 2688
#define GEMM_DEFAULT_OFFSET_B 3072
#define GEMM_DEFAULT_ALIGN 0x03fffUL
#if defined(__BYTE_ORDER__)&&(__BYTE_ORDER__ == __ORDER_BIG_ENDIAN__)
#define SGEMM_DEFAULT_UNROLL_M 4
#else
#define SGEMM_DEFAULT_UNROLL_M 16
#endif
#define SGEMM_DEFAULT_UNROLL_N 4
#define DGEMM_DEFAULT_UNROLL_M 4
#define DGEMM_DEFAULT_UNROLL_N 4
#if defined(__BYTE_ORDER__)&&(__BYTE_ORDER__ == __ORDER_BIG_ENDIAN__)
#define CGEMM_DEFAULT_UNROLL_M 2
#else
#define CGEMM_DEFAULT_UNROLL_M 8
#endif
#define CGEMM_DEFAULT_UNROLL_N 2
#define ZGEMM_DEFAULT_UNROLL_M 2
#define ZGEMM_DEFAULT_UNROLL_N 2
#if defined(OS_LINUX) || defined(OS_DARWIN) || defined(OS_FREEBSD)
#if L2_SIZE == 1024976
#define SGEMM_DEFAULT_P 320
#define DGEMM_DEFAULT_P 256
#define CGEMM_DEFAULT_P 256
#define ZGEMM_DEFAULT_P 256
#else
#define SGEMM_DEFAULT_P 176
#define DGEMM_DEFAULT_P 176
#define CGEMM_DEFAULT_P 176
#define ZGEMM_DEFAULT_P 176
#endif
#endif
#define SGEMM_DEFAULT_Q 512
#define DGEMM_DEFAULT_Q 256
#define CGEMM_DEFAULT_Q 256
#define ZGEMM_DEFAULT_Q 128
#define SYMV_P 4
#endif
#ifdef PPC440
#define SNUMOPT 2
#define DNUMOPT 2
#define GEMM_DEFAULT_OFFSET_A (32 * 0)
#define GEMM_DEFAULT_OFFSET_B (32 * 0)
#define GEMM_DEFAULT_ALIGN 0x0ffffUL
#define SGEMM_DEFAULT_UNROLL_M 4
#define SGEMM_DEFAULT_UNROLL_N 4
#define DGEMM_DEFAULT_UNROLL_M 4
#define DGEMM_DEFAULT_UNROLL_N 4
#define CGEMM_DEFAULT_UNROLL_M 2
#define CGEMM_DEFAULT_UNROLL_N 2
#define ZGEMM_DEFAULT_UNROLL_M 2
#define ZGEMM_DEFAULT_UNROLL_N 2
#define SGEMM_DEFAULT_P 512
#define DGEMM_DEFAULT_P 512
#define CGEMM_DEFAULT_P 512
#define ZGEMM_DEFAULT_P 512
#define SGEMM_DEFAULT_Q 1024
#define DGEMM_DEFAULT_Q 512
#define CGEMM_DEFAULT_Q 512
#define ZGEMM_DEFAULT_Q 256
#define SGEMM_DEFAULT_R SGEMM_DEFAULT_P
#define DGEMM_DEFAULT_R DGEMM_DEFAULT_P
#define CGEMM_DEFAULT_R CGEMM_DEFAULT_P
#define ZGEMM_DEFAULT_R ZGEMM_DEFAULT_P
#define SYMV_P 4
#endif
#ifdef PPC440FP2
#define SNUMOPT 4
#define DNUMOPT 4
#define GEMM_DEFAULT_OFFSET_A (32 * 0)
#define GEMM_DEFAULT_OFFSET_B (32 * 0)
#define GEMM_DEFAULT_ALIGN 0x0ffffUL
#define SGEMM_DEFAULT_UNROLL_M 8
#define SGEMM_DEFAULT_UNROLL_N 4
#define DGEMM_DEFAULT_UNROLL_M 8
#define DGEMM_DEFAULT_UNROLL_N 4
#define CGEMM_DEFAULT_UNROLL_M 4
#define CGEMM_DEFAULT_UNROLL_N 2
#define ZGEMM_DEFAULT_UNROLL_M 4
#define ZGEMM_DEFAULT_UNROLL_N 2
#define SGEMM_DEFAULT_P 128
#define DGEMM_DEFAULT_P 128
#define CGEMM_DEFAULT_P 128
#define ZGEMM_DEFAULT_P 128
#if 1
#define SGEMM_DEFAULT_Q 4096
#define DGEMM_DEFAULT_Q 3072
#define CGEMM_DEFAULT_Q 2048
#define ZGEMM_DEFAULT_Q 1024
#else
#define SGEMM_DEFAULT_Q 512
#define DGEMM_DEFAULT_Q 256
#define CGEMM_DEFAULT_Q 256
#define ZGEMM_DEFAULT_Q 128
#endif
#define SYMV_P 4
#endif
#if defined(POWER3) || defined(POWER4) || defined(POWER5)
#define GEMM_DEFAULT_OFFSET_A 0
#define GEMM_DEFAULT_OFFSET_B 2048
#define GEMM_DEFAULT_ALIGN 0x0ffffUL
#define SGEMM_DEFAULT_UNROLL_M 4
#define SGEMM_DEFAULT_UNROLL_N 4
#define DGEMM_DEFAULT_UNROLL_M 4
#define DGEMM_DEFAULT_UNROLL_N 4
#define CGEMM_DEFAULT_UNROLL_M 2
#define CGEMM_DEFAULT_UNROLL_N 2
#define ZGEMM_DEFAULT_UNROLL_M 2
#define ZGEMM_DEFAULT_UNROLL_N 2
#ifdef POWER3
#define SNUMOPT 4
#define DNUMOPT 4
#define SGEMM_DEFAULT_P 256
#define SGEMM_DEFAULT_Q 432
#define SGEMM_DEFAULT_R 1012
#define DGEMM_DEFAULT_P 256
#define DGEMM_DEFAULT_Q 216
#define DGEMM_DEFAULT_R 1012
#define CGEMM_DEFAULT_P 256
#define CGEMM_DEFAULT_Q 104
#define CGEMM_DEFAULT_R 1012
#define ZGEMM_DEFAULT_P 256
#define ZGEMM_DEFAULT_Q 104
#define ZGEMM_DEFAULT_R 1012
#endif
#if defined(POWER4)
#ifdef ALLOC_HUGETLB
#define SGEMM_DEFAULT_P 184
#define DGEMM_DEFAULT_P 184
#define CGEMM_DEFAULT_P 184
#define ZGEMM_DEFAULT_P 184
#else
#define SGEMM_DEFAULT_P 144
#define DGEMM_DEFAULT_P 144
#define CGEMM_DEFAULT_P 144
#define ZGEMM_DEFAULT_P 144
#endif
#define SGEMM_DEFAULT_Q 256
#define CGEMM_DEFAULT_Q 256
#define DGEMM_DEFAULT_Q 256
#define ZGEMM_DEFAULT_Q 256
#endif
#if defined(POWER5)
#ifdef ALLOC_HUGETLB
#define SGEMM_DEFAULT_P 512
#define DGEMM_DEFAULT_P 256
#define CGEMM_DEFAULT_P 256
#define ZGEMM_DEFAULT_P 128
#else
#define SGEMM_DEFAULT_P 320
#define DGEMM_DEFAULT_P 160
#define CGEMM_DEFAULT_P 160
#define ZGEMM_DEFAULT_P 80
#endif
#define SGEMM_DEFAULT_Q 256
#define CGEMM_DEFAULT_Q 256
#define DGEMM_DEFAULT_Q 256
#define ZGEMM_DEFAULT_Q 256
#endif
#define SYMV_P 8
#endif
#if defined(POWER6)
#define SNUMOPT 4
#define DNUMOPT 4
#define GEMM_DEFAULT_OFFSET_A 384
#define GEMM_DEFAULT_OFFSET_B 1024
#define GEMM_DEFAULT_ALIGN 0x03fffUL
#define SGEMM_DEFAULT_UNROLL_M 4
#define SGEMM_DEFAULT_UNROLL_N 4
#define DGEMM_DEFAULT_UNROLL_M 4
#define DGEMM_DEFAULT_UNROLL_N 4
#define CGEMM_DEFAULT_UNROLL_M 2
#define CGEMM_DEFAULT_UNROLL_N 4
#define ZGEMM_DEFAULT_UNROLL_M 2
#define ZGEMM_DEFAULT_UNROLL_N 4
#define SGEMM_DEFAULT_P 992
#define DGEMM_DEFAULT_P 480
#define CGEMM_DEFAULT_P 488
#define ZGEMM_DEFAULT_P 248
#define SGEMM_DEFAULT_Q 504
#define DGEMM_DEFAULT_Q 504
#define CGEMM_DEFAULT_Q 400
#define ZGEMM_DEFAULT_Q 400
#define SYMV_P 8
#endif
#if defined(POWER8) || (defined(POWER9) && defined(OS_AIX))
#define SNUMOPT 16
#define DNUMOPT 8
#define GEMM_DEFAULT_OFFSET_A 0
#define GEMM_DEFAULT_OFFSET_B 65536
#define GEMM_DEFAULT_ALIGN 0x0ffffUL
#if defined(__32BIT__)
#warning using BINARY32==POWER6
#define SGEMM_DEFAULT_UNROLL_M 4
#define SGEMM_DEFAULT_UNROLL_N 4
#define DGEMM_DEFAULT_UNROLL_M 4
#define DGEMM_DEFAULT_UNROLL_N 4
#define CGEMM_DEFAULT_UNROLL_M 2
#define CGEMM_DEFAULT_UNROLL_N 4
#define ZGEMM_DEFAULT_UNROLL_M 2
#define ZGEMM_DEFAULT_UNROLL_N 4
#else
#define SGEMM_DEFAULT_UNROLL_M 16
#define SGEMM_DEFAULT_UNROLL_N 8
#define DGEMM_DEFAULT_UNROLL_M 16
#define DGEMM_DEFAULT_UNROLL_N 4
#define CGEMM_DEFAULT_UNROLL_M 8
#define CGEMM_DEFAULT_UNROLL_N 4
#define ZGEMM_DEFAULT_UNROLL_M 8
#define ZGEMM_DEFAULT_UNROLL_N 2
#endif
#define SGEMM_DEFAULT_P 1280UL
#define DGEMM_DEFAULT_P 640UL
#define CGEMM_DEFAULT_P 640UL
#define ZGEMM_DEFAULT_P 320UL
#define SGEMM_DEFAULT_Q 640UL
#define DGEMM_DEFAULT_Q 720UL
#define CGEMM_DEFAULT_Q 640UL
#define ZGEMM_DEFAULT_Q 640UL
#if 0
#define SGEMM_DEFAULT_R SGEMM_DEFAULT_P
#define DGEMM_DEFAULT_R DGEMM_DEFAULT_P
#define CGEMM_DEFAULT_R CGEMM_DEFAULT_P
#define ZGEMM_DEFAULT_R ZGEMM_DEFAULT_P
#endif
#define SGEMM_DEFAULT_R 4096
#define DGEMM_DEFAULT_R 4096
#define CGEMM_DEFAULT_R 4096
#define ZGEMM_DEFAULT_R 4096
#define SYMV_P 8
#endif
#if defined(POWER9) && (defined(OS_LINUX) || defined(OS_FREEBSD))
#define SNUMOPT 16
#define DNUMOPT 8
#define GEMM_DEFAULT_OFFSET_A 0
#define GEMM_DEFAULT_OFFSET_B 65536
#define GEMM_DEFAULT_ALIGN 0x0ffffUL
#define SWITCH_RATIO 16
#define GEMM_PREFERED_SIZE 16
#define SGEMM_DEFAULT_UNROLL_M 16
#define SGEMM_DEFAULT_UNROLL_N 8
#define DGEMM_DEFAULT_UNROLL_M 16
#define DGEMM_DEFAULT_UNROLL_N 4
#define CGEMM_DEFAULT_UNROLL_M 8
#define CGEMM_DEFAULT_UNROLL_N 4
#define ZGEMM_DEFAULT_UNROLL_M 8
#define ZGEMM_DEFAULT_UNROLL_N 2
#define SGEMM_DEFAULT_P 832
#define DGEMM_DEFAULT_P 128
#define CGEMM_DEFAULT_P 512
#define ZGEMM_DEFAULT_P 256
#define SGEMM_DEFAULT_Q 1026
#define DGEMM_DEFAULT_Q 384
#define CGEMM_DEFAULT_Q 1026
#define ZGEMM_DEFAULT_Q 1026
#define SGEMM_DEFAULT_R 4096
#define DGEMM_DEFAULT_R 4096
#define CGEMM_DEFAULT_R 4096
#define ZGEMM_DEFAULT_R 4096
#define SYMV_P 8
#endif
#if defined(POWER10)
#define SNUMOPT 16
#define DNUMOPT 8
#define GEMM_DEFAULT_OFFSET_A 0
#define GEMM_DEFAULT_OFFSET_B 65536
#define GEMM_DEFAULT_ALIGN 0x0ffffUL
#define SWITCH_RATIO 16
#define GEMM_PREFERED_SIZE 16
#define SGEMM_DEFAULT_UNROLL_M 16
#define SGEMM_DEFAULT_UNROLL_N 8
#define DGEMM_DEFAULT_UNROLL_M 8
#define DGEMM_DEFAULT_UNROLL_N 8
#define CGEMM_DEFAULT_UNROLL_M 8
#define CGEMM_DEFAULT_UNROLL_N 4
#define ZGEMM_DEFAULT_UNROLL_M 8
#define ZGEMM_DEFAULT_UNROLL_N 2
#define SGEMM_DEFAULT_P 512
#define DGEMM_DEFAULT_P 384
#define CGEMM_DEFAULT_P 512
#define ZGEMM_DEFAULT_P 256
#define SGEMM_DEFAULT_Q 512
#define DGEMM_DEFAULT_Q 512
#define CGEMM_DEFAULT_Q 384
#define ZGEMM_DEFAULT_Q 384
#define SGEMM_DEFAULT_R 4096
#define DGEMM_DEFAULT_R 4096
#define CGEMM_DEFAULT_R 4096
#define ZGEMM_DEFAULT_R 4096
#define SYMV_P 8
#undef SBGEMM_DEFAULT_UNROLL_N
#undef SBGEMM_DEFAULT_UNROLL_M
#undef SBGEMM_DEFAULT_P
#undef SBGEMM_DEFAULT_R
#undef SBGEMM_DEFAULT_Q
#define SBGEMM_DEFAULT_UNROLL_M 16
#define SBGEMM_DEFAULT_UNROLL_N 8
#define SBGEMM_DEFAULT_P 512
#define SBGEMM_DEFAULT_Q 1024
#define SBGEMM_DEFAULT_R 4096
#endif
  2139. #if defined(SPARC) && defined(V7)
  2140. #define SNUMOPT 4
  2141. #define DNUMOPT 4
  2142. #define GEMM_DEFAULT_OFFSET_A 0
  2143. #define GEMM_DEFAULT_OFFSET_B 2048
  2144. #define GEMM_DEFAULT_ALIGN 0x03fffUL
  2145. #define SGEMM_DEFAULT_UNROLL_M 2
  2146. #define SGEMM_DEFAULT_UNROLL_N 8
  2147. #define DGEMM_DEFAULT_UNROLL_M 2
  2148. #define DGEMM_DEFAULT_UNROLL_N 8
  2149. #define CGEMM_DEFAULT_UNROLL_M 1
  2150. #define CGEMM_DEFAULT_UNROLL_N 4
  2151. #define ZGEMM_DEFAULT_UNROLL_M 1
  2152. #define ZGEMM_DEFAULT_UNROLL_N 4
  2153. #define SGEMM_DEFAULT_P 256
  2154. #define DGEMM_DEFAULT_P 256
  2155. #define CGEMM_DEFAULT_P 256
  2156. #define ZGEMM_DEFAULT_P 256
  2157. #define SGEMM_DEFAULT_Q 512
  2158. #define DGEMM_DEFAULT_Q 256
  2159. #define CGEMM_DEFAULT_Q 256
  2160. #define ZGEMM_DEFAULT_Q 128
  2161. #define SYMV_P 8
  2162. #define GEMM_THREAD gemm_thread_mn
  2163. #endif
  2164. #if (defined(SPARC) && defined(V9)) || defined(__sparc_v9__)
  2165. #define SNUMOPT 2
  2166. #define DNUMOPT 2
  2167. #define GEMM_DEFAULT_OFFSET_A 0
  2168. #define GEMM_DEFAULT_OFFSET_B 2048
  2169. #define GEMM_DEFAULT_ALIGN 0x03fffUL
  2170. #define SGEMM_DEFAULT_UNROLL_M 4
  2171. #define SGEMM_DEFAULT_UNROLL_N 4
  2172. #define DGEMM_DEFAULT_UNROLL_M 4
  2173. #define DGEMM_DEFAULT_UNROLL_N 4
  2174. #define CGEMM_DEFAULT_UNROLL_M 2
  2175. #define CGEMM_DEFAULT_UNROLL_N 2
  2176. #define ZGEMM_DEFAULT_UNROLL_M 2
  2177. #define ZGEMM_DEFAULT_UNROLL_N 2
  2178. #define SGEMM_DEFAULT_P 512
  2179. #define DGEMM_DEFAULT_P 512
  2180. #define CGEMM_DEFAULT_P 512
  2181. #define ZGEMM_DEFAULT_P 512
  2182. #define SGEMM_DEFAULT_Q 1024
  2183. #define DGEMM_DEFAULT_Q 512
  2184. #define CGEMM_DEFAULT_Q 512
  2185. #define ZGEMM_DEFAULT_Q 256
  2186. #define SYMV_P 8
  2187. #endif
  2188. #ifdef SICORTEX
  2189. #define SNUMOPT 2
  2190. #define DNUMOPT 2
  2191. #define GEMM_DEFAULT_OFFSET_A 0
  2192. #define GEMM_DEFAULT_OFFSET_B 0
  2193. #define GEMM_DEFAULT_ALIGN (BLASLONG)0x03fffUL
  2194. #define SGEMM_DEFAULT_UNROLL_M 2
  2195. #define SGEMM_DEFAULT_UNROLL_N 8
  2196. #define DGEMM_DEFAULT_UNROLL_M 2
  2197. #define DGEMM_DEFAULT_UNROLL_N 8
  2198. #define CGEMM_DEFAULT_UNROLL_M 1
  2199. #define CGEMM_DEFAULT_UNROLL_N 4
  2200. #define ZGEMM_DEFAULT_UNROLL_M 1
  2201. #define ZGEMM_DEFAULT_UNROLL_N 4
  2202. #define SGEMM_DEFAULT_P 108
  2203. #define DGEMM_DEFAULT_P 112
  2204. #define CGEMM_DEFAULT_P 108
  2205. #define ZGEMM_DEFAULT_P 112
  2206. #define SGEMM_DEFAULT_Q 288
  2207. #define DGEMM_DEFAULT_Q 144
  2208. #define CGEMM_DEFAULT_Q 144
  2209. #define ZGEMM_DEFAULT_Q 72
  2210. #define SGEMM_DEFAULT_R 2000
  2211. #define DGEMM_DEFAULT_R 2000
  2212. #define CGEMM_DEFAULT_R 2000
  2213. #define ZGEMM_DEFAULT_R 2000
  2214. #define SYMV_P 16
  2215. #endif
  2216. #if defined(LOONGSON3R4)
  2217. #define SNUMOPT 2
  2218. #define DNUMOPT 2
  2219. #define GEMM_DEFAULT_OFFSET_A 0
  2220. #define GEMM_DEFAULT_OFFSET_B 0
  2221. #define GEMM_DEFAULT_ALIGN (BLASLONG)0x03fffUL
  2222. #if defined(NO_MSA)
  2223. #define SGEMM_DEFAULT_UNROLL_M 8
  2224. #define SGEMM_DEFAULT_UNROLL_N 4
  2225. #define DGEMM_DEFAULT_UNROLL_M 4
  2226. #define DGEMM_DEFAULT_UNROLL_N 4
  2227. #define CGEMM_DEFAULT_UNROLL_M 4
  2228. #define CGEMM_DEFAULT_UNROLL_N 2
  2229. #define ZGEMM_DEFAULT_UNROLL_M 2
  2230. #define ZGEMM_DEFAULT_UNROLL_N 2
  2231. #else
  2232. #define SGEMM_DEFAULT_UNROLL_M 8
  2233. #define SGEMM_DEFAULT_UNROLL_N 8
  2234. #define DGEMM_DEFAULT_UNROLL_M 8
  2235. #define DGEMM_DEFAULT_UNROLL_N 4
  2236. #define CGEMM_DEFAULT_UNROLL_M 8
  2237. #define CGEMM_DEFAULT_UNROLL_N 4
  2238. #define ZGEMM_DEFAULT_UNROLL_M 4
  2239. #define ZGEMM_DEFAULT_UNROLL_N 4
  2240. #endif
  2241. #define SGEMM_DEFAULT_P 64
  2242. #define DGEMM_DEFAULT_P 44
  2243. #define CGEMM_DEFAULT_P 64
  2244. #define ZGEMM_DEFAULT_P 32
  2245. #define SGEMM_DEFAULT_Q 192
  2246. #define DGEMM_DEFAULT_Q 92
  2247. #define CGEMM_DEFAULT_Q 128
  2248. #define ZGEMM_DEFAULT_Q 80
  2249. #define SGEMM_DEFAULT_R 640
  2250. #define DGEMM_DEFAULT_R dgemm_r
  2251. #define CGEMM_DEFAULT_R 640
  2252. #define ZGEMM_DEFAULT_R 640
  2253. #define GEMM_OFFSET_A1 0x10000
  2254. #define GEMM_OFFSET_B1 0x100000
  2255. #define SYMV_P 16
  2256. #endif
  2257. #if defined(LOONGSON3R3)
  2258. // Copied from SICORTEX
  2259. #define SNUMOPT 2
  2260. #define DNUMOPT 2
  2261. #define GEMM_DEFAULT_OFFSET_A 0
  2262. #define GEMM_DEFAULT_OFFSET_B 0
  2263. #define GEMM_DEFAULT_ALIGN (BLASLONG)0x03fffUL
  2264. #define SGEMM_DEFAULT_UNROLL_M 8
  2265. #define SGEMM_DEFAULT_UNROLL_N 4
  2266. #define DGEMM_DEFAULT_UNROLL_M 4
  2267. #define DGEMM_DEFAULT_UNROLL_N 4
  2268. #define CGEMM_DEFAULT_UNROLL_M 4
  2269. #define CGEMM_DEFAULT_UNROLL_N 2
  2270. #define ZGEMM_DEFAULT_UNROLL_M 2
  2271. #define ZGEMM_DEFAULT_UNROLL_N 2
  2272. #define SGEMM_DEFAULT_P 64
  2273. #define DGEMM_DEFAULT_P 44
  2274. #define CGEMM_DEFAULT_P 64
  2275. #define ZGEMM_DEFAULT_P 32
  2276. #define SGEMM_DEFAULT_Q 192
  2277. #define DGEMM_DEFAULT_Q 92
  2278. #define CGEMM_DEFAULT_Q 128
  2279. #define ZGEMM_DEFAULT_Q 80
  2280. #define SGEMM_DEFAULT_R 640
  2281. #define DGEMM_DEFAULT_R dgemm_r
  2282. #define CGEMM_DEFAULT_R 640
  2283. #define ZGEMM_DEFAULT_R 640
  2284. #define GEMM_OFFSET_A1 0x10000
  2285. #define GEMM_OFFSET_B1 0x100000
  2286. #define SYMV_P 16
  2287. #endif
  2288. #if defined (LA464)
  2289. #define SNUMOPT 2
  2290. #define DNUMOPT 2
  2291. #define GEMM_DEFAULT_OFFSET_A 0x20000
  2292. #define GEMM_DEFAULT_OFFSET_B 0
  2293. #define GEMM_DEFAULT_ALIGN 0x0ffffUL
  2294. #if defined(NO_LASX)
  2295. #define DGEMM_DEFAULT_UNROLL_N 8
  2296. #define DGEMM_DEFAULT_UNROLL_M 2
  2297. #define SGEMM_DEFAULT_UNROLL_N 8
  2298. #define SGEMM_DEFAULT_UNROLL_M 2
  2299. #define CGEMM_DEFAULT_UNROLL_N 4
  2300. #define CGEMM_DEFAULT_UNROLL_M 1
  2301. #define ZGEMM_DEFAULT_UNROLL_N 4
  2302. #define ZGEMM_DEFAULT_UNROLL_M 1
  2303. #else
  2304. #define DGEMM_DEFAULT_UNROLL_N 6
  2305. #define DGEMM_DEFAULT_UNROLL_M 16
  2306. #define SGEMM_DEFAULT_UNROLL_N 8
  2307. #define SGEMM_DEFAULT_UNROLL_M 16
  2308. #define CGEMM_DEFAULT_UNROLL_N 4
  2309. #define CGEMM_DEFAULT_UNROLL_M 16
  2310. #define ZGEMM_DEFAULT_UNROLL_N 4
  2311. #define ZGEMM_DEFAULT_UNROLL_M 8
  2312. #define DGEMM_DEFAULT_UNROLL_MN 96
  2313. #endif
  2314. #define QGEMM_DEFAULT_UNROLL_N 2
  2315. #define XGEMM_DEFAULT_UNROLL_N 1
  2316. #define QGEMM_DEFAULT_UNROLL_M 2
  2317. #define XGEMM_DEFAULT_UNROLL_M 1
  2318. #define SGEMM_DEFAULT_P sgemm_p
  2319. #define DGEMM_DEFAULT_P dgemm_p
  2320. #define CGEMM_DEFAULT_P 128
  2321. #define ZGEMM_DEFAULT_P zgemm_p
  2322. #define SGEMM_DEFAULT_R sgemm_r
  2323. #define DGEMM_DEFAULT_R dgemm_r
  2324. #define CGEMM_DEFAULT_R 4096
  2325. #define ZGEMM_DEFAULT_R zgemm_r
  2326. #define SGEMM_DEFAULT_Q sgemm_q
  2327. #define DGEMM_DEFAULT_Q dgemm_q
  2328. #define CGEMM_DEFAULT_Q 128
  2329. #define ZGEMM_DEFAULT_Q zgemm_q
  2330. #define SYMV_P 16
  2331. #endif
  2332. #ifdef LA264
  2333. #define GEMM_DEFAULT_OFFSET_A 0
  2334. #define GEMM_DEFAULT_OFFSET_B 0
  2335. #define GEMM_DEFAULT_ALIGN (BLASLONG)0x03fffUL
  2336. #define SGEMM_DEFAULT_UNROLL_M 2
  2337. #define SGEMM_DEFAULT_UNROLL_N 8
  2338. #define DGEMM_DEFAULT_UNROLL_M 8
  2339. #define DGEMM_DEFAULT_UNROLL_N 4
  2340. #define CGEMM_DEFAULT_UNROLL_M 8
  2341. #define CGEMM_DEFAULT_UNROLL_N 4
  2342. #define ZGEMM_DEFAULT_UNROLL_M 4
  2343. #define ZGEMM_DEFAULT_UNROLL_N 4
  2344. #define SGEMM_DEFAULT_P 128
  2345. #define DGEMM_DEFAULT_P 128
  2346. #define CGEMM_DEFAULT_P 96
  2347. #define ZGEMM_DEFAULT_P 64
  2348. #define SGEMM_DEFAULT_Q 240
  2349. #define DGEMM_DEFAULT_Q 120
  2350. #define CGEMM_DEFAULT_Q 120
  2351. #define ZGEMM_DEFAULT_Q 120
  2352. #define SGEMM_DEFAULT_R 12288
  2353. #define DGEMM_DEFAULT_R 8192
  2354. #define CGEMM_DEFAULT_R 4096
  2355. #define ZGEMM_DEFAULT_R 4096
  2356. #define SYMV_P 16
  2357. #endif
  2358. #ifdef LA64_GENERIC
  2359. #define GEMM_DEFAULT_OFFSET_A 0
  2360. #define GEMM_DEFAULT_OFFSET_B 0
  2361. #define GEMM_DEFAULT_ALIGN (BLASLONG)0x03fffUL
  2362. #define SGEMM_DEFAULT_UNROLL_M 2
  2363. #define SGEMM_DEFAULT_UNROLL_N 8
  2364. #define DGEMM_DEFAULT_UNROLL_M 2
  2365. #define DGEMM_DEFAULT_UNROLL_N 8
  2366. #define CGEMM_DEFAULT_UNROLL_M 1
  2367. #define CGEMM_DEFAULT_UNROLL_N 4
  2368. #define ZGEMM_DEFAULT_UNROLL_M 1
  2369. #define ZGEMM_DEFAULT_UNROLL_N 4
  2370. #define SGEMM_DEFAULT_P 128
  2371. #define DGEMM_DEFAULT_P 128
  2372. #define CGEMM_DEFAULT_P 96
  2373. #define ZGEMM_DEFAULT_P 64
  2374. #define SGEMM_DEFAULT_Q 240
  2375. #define DGEMM_DEFAULT_Q 120
  2376. #define CGEMM_DEFAULT_Q 120
  2377. #define ZGEMM_DEFAULT_Q 120
  2378. #define SGEMM_DEFAULT_R 12288
  2379. #define DGEMM_DEFAULT_R 8192
  2380. #define CGEMM_DEFAULT_R 4096
  2381. #define ZGEMM_DEFAULT_R 4096
  2382. #define SYMV_P 16
  2383. #endif
  2384. #if defined(MIPS64_GENERIC) || defined(P5600) || defined(MIPS1004K) || defined(MIPS24K) || defined(I6400) || defined(P6600) || defined(I6500)
  2385. #define SNUMOPT 2
  2386. #define DNUMOPT 2
  2387. #define GEMM_DEFAULT_OFFSET_A 0
  2388. #define GEMM_DEFAULT_OFFSET_B 0
  2389. #define GEMM_DEFAULT_ALIGN (BLASLONG) 0x03fffUL
  2390. #if defined(NO_MSA) || defined(MIPS64_GENERIC)
  2391. #define SGEMM_DEFAULT_UNROLL_M 2
  2392. #define SGEMM_DEFAULT_UNROLL_N 2
  2393. #define DGEMM_DEFAULT_UNROLL_M 2
  2394. #define DGEMM_DEFAULT_UNROLL_N 2
  2395. #define CGEMM_DEFAULT_UNROLL_M 2
  2396. #define CGEMM_DEFAULT_UNROLL_N 2
  2397. #define ZGEMM_DEFAULT_UNROLL_M 2
  2398. #define ZGEMM_DEFAULT_UNROLL_N 2
  2399. #else
  2400. #define SGEMM_DEFAULT_UNROLL_M 8
  2401. #define SGEMM_DEFAULT_UNROLL_N 8
  2402. #define DGEMM_DEFAULT_UNROLL_M 8
  2403. #define DGEMM_DEFAULT_UNROLL_N 4
  2404. #define CGEMM_DEFAULT_UNROLL_M 8
  2405. #define CGEMM_DEFAULT_UNROLL_N 4
  2406. #define ZGEMM_DEFAULT_UNROLL_M 4
  2407. #define ZGEMM_DEFAULT_UNROLL_N 4
  2408. #endif
  2409. #define SGEMM_DEFAULT_P 128
  2410. #define DGEMM_DEFAULT_P 128
  2411. #define CGEMM_DEFAULT_P 96
  2412. #define ZGEMM_DEFAULT_P 64
  2413. #define SGEMM_DEFAULT_Q 240
  2414. #define DGEMM_DEFAULT_Q 120
  2415. #define CGEMM_DEFAULT_Q 120
  2416. #define ZGEMM_DEFAULT_Q 120
  2417. #define SGEMM_DEFAULT_R 12288
  2418. #define DGEMM_DEFAULT_R 8192
  2419. #define CGEMM_DEFAULT_R 4096
  2420. #define ZGEMM_DEFAULT_R 4096
  2421. #define SYMV_P 16
  2422. #endif
  2423. #ifdef RISCV64_GENERIC
  2424. #define GEMM_DEFAULT_OFFSET_A 0
  2425. #define GEMM_DEFAULT_OFFSET_B 0
  2426. #define GEMM_DEFAULT_ALIGN (BLASLONG)0x03fffUL
  2427. #define SGEMM_DEFAULT_UNROLL_M 2
  2428. #define SGEMM_DEFAULT_UNROLL_N 2
  2429. #define DGEMM_DEFAULT_UNROLL_M 2
  2430. #define DGEMM_DEFAULT_UNROLL_N 2
  2431. #define CGEMM_DEFAULT_UNROLL_M 2
  2432. #define CGEMM_DEFAULT_UNROLL_N 2
  2433. #define ZGEMM_DEFAULT_UNROLL_M 2
  2434. #define ZGEMM_DEFAULT_UNROLL_N 2
  2435. #define SGEMM_DEFAULT_P 128
  2436. #define DGEMM_DEFAULT_P 128
  2437. #define CGEMM_DEFAULT_P 96
  2438. #define ZGEMM_DEFAULT_P 64
  2439. #define SGEMM_DEFAULT_Q 240
  2440. #define DGEMM_DEFAULT_Q 120
  2441. #define CGEMM_DEFAULT_Q 120
  2442. #define ZGEMM_DEFAULT_Q 120
  2443. #define SGEMM_DEFAULT_R 12288
  2444. #define DGEMM_DEFAULT_R 8192
  2445. #define CGEMM_DEFAULT_R 4096
  2446. #define ZGEMM_DEFAULT_R 4096
  2447. #define SYMV_P 16
  2448. #define GEMM_DEFAULT_OFFSET_A 0
  2449. #define GEMM_DEFAULT_OFFSET_B 0
  2450. #endif
  2451. #if defined(x280)
  2452. #define GEMM_DEFAULT_OFFSET_A 0
  2453. #define GEMM_DEFAULT_OFFSET_B 0
  2454. #define GEMM_DEFAULT_ALIGN 0x03fffUL
  2455. #define SGEMM_DEFAULT_UNROLL_M 16 // 4 // 16 // 2
  2456. #define SGEMM_DEFAULT_UNROLL_N 8// 4 // 4 // 2
  2457. /* SGEMM_UNROLL_MN is calculated as max(SGEMM_UNROLL_M, SGEMM_UNROLL_N)
  2458. * Since we don't define SGEMM_UNROLL_M correctly we have to manually set this macro.
  2459. * If VLMAX size is ever more than 1024, this should be increased also. */
  2460. #define SGEMM_DEFAULT_UNROLL_MN 32
  2461. #define DGEMM_DEFAULT_UNROLL_M 16 //2 // 8
  2462. #define DGEMM_DEFAULT_UNROLL_N 8 //2 // 4
  2463. #define DGEMM_DEFAULT_UNROLL_MN 32
  2464. #define CGEMM_DEFAULT_UNROLL_M 8
  2465. #define CGEMM_DEFAULT_UNROLL_N 4
  2466. #define CGEMM_DEFAULT_UNROLL_MN 32
  2467. #define ZGEMM_DEFAULT_UNROLL_M 8
  2468. #define ZGEMM_DEFAULT_UNROLL_N 4
  2469. #define ZGEMM_DEFAULT_UNROLL_MN 16
  2470. #define SGEMM_DEFAULT_P 160
  2471. #define DGEMM_DEFAULT_P 160
  2472. #define CGEMM_DEFAULT_P 96
  2473. #define ZGEMM_DEFAULT_P 64
  2474. #define SGEMM_DEFAULT_Q 240
  2475. #define DGEMM_DEFAULT_Q 128
  2476. #define CGEMM_DEFAULT_Q 120
  2477. #define ZGEMM_DEFAULT_Q 120
  2478. #define SGEMM_DEFAULT_R 12288
  2479. #define DGEMM_DEFAULT_R 8192
  2480. #define CGEMM_DEFAULT_R 4096
  2481. #define ZGEMM_DEFAULT_R 4096
  2482. #define SYMV_P 16
  2483. #define GEMM_DEFAULT_OFFSET_A 0
  2484. #define GEMM_DEFAULT_OFFSET_B 0
  2485. #endif
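/* Illustration only, not part of param.h. The comment in the x280 block above
 * says SGEMM_UNROLL_MN is normally max(SGEMM_UNROLL_M, SGEMM_UNROLL_N) and has
 * to be set by hand there. The sketch below shows that rule, assuming an
 * explicit *_DEFAULT_UNROLL_MN value takes precedence when one is given.
 * max_unroll and pick_unroll_mn are hypothetical helpers, not OpenBLAS symbols. */

```c
#include <assert.h>

/* Values copied from the x280 block above. */
enum {
    X280_SGEMM_UNROLL_M  = 16,
    X280_SGEMM_UNROLL_N  = 8,
    X280_SGEMM_UNROLL_MN = 32
};

/* Default rule described in the comment: the larger of the two unroll factors. */
static int max_unroll(int m, int n) {
    return m > n ? m : n;
}

/* Assumed precedence: a positive override wins, otherwise fall back to max(m, n).
 * Pass override_mn <= 0 to mean "no explicit MN value was defined". */
static int pick_unroll_mn(int m, int n, int override_mn) {
    assert(m > 0 && n > 0);
    return override_mn > 0 ? override_mn : max_unroll(m, n);
}
```

/* With the x280 values, the max() rule alone would give 16, whereas the block
 * sets 32 by hand, which is why the explicit define is needed. */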
  2486. #ifdef C910V
  2487. #define GEMM_DEFAULT_OFFSET_A 0
  2488. #define GEMM_DEFAULT_OFFSET_B 0
  2489. #define GEMM_DEFAULT_ALIGN 0x03fffUL
  2490. #define SGEMM_DEFAULT_UNROLL_M 16
  2491. #define SGEMM_DEFAULT_UNROLL_N 4
  2492. #define DGEMM_DEFAULT_UNROLL_M 8
  2493. #define DGEMM_DEFAULT_UNROLL_N 4
  2494. #define CGEMM_DEFAULT_UNROLL_M 2
  2495. #define CGEMM_DEFAULT_UNROLL_N 2
  2496. #define ZGEMM_DEFAULT_UNROLL_M 2
  2497. #define ZGEMM_DEFAULT_UNROLL_N 2
  2498. #define SGEMM_DEFAULT_P 160
  2499. #define DGEMM_DEFAULT_P 160
  2500. #define CGEMM_DEFAULT_P 96
  2501. #define ZGEMM_DEFAULT_P 64
  2502. #define SGEMM_DEFAULT_Q 240
  2503. #define DGEMM_DEFAULT_Q 128
  2504. #define CGEMM_DEFAULT_Q 120
  2505. #define ZGEMM_DEFAULT_Q 120
  2506. #define SGEMM_DEFAULT_R 12288
  2507. #define DGEMM_DEFAULT_R 8192
  2508. #define CGEMM_DEFAULT_R 4096
  2509. #define ZGEMM_DEFAULT_R 4096
  2510. #define SYMV_P 16
  2511. #define GEMM_DEFAULT_OFFSET_A 0
  2512. #define GEMM_DEFAULT_OFFSET_B 0
  2513. #endif
  2514. #ifdef RISCV64_ZVL128B
  2515. #define GEMM_DEFAULT_OFFSET_A 0
  2516. #define GEMM_DEFAULT_OFFSET_B 0
  2517. #define GEMM_DEFAULT_ALIGN (BLASLONG)0x03fffUL
  2518. #undef SHGEMM_DEFAULT_UNROLL_M
  2519. #undef SHGEMM_DEFAULT_UNROLL_N
  2520. #define SHGEMM_DEFAULT_UNROLL_M 8
  2521. #define SHGEMM_DEFAULT_UNROLL_N 8
  2522. #define SGEMM_DEFAULT_UNROLL_M 8
  2523. #define SGEMM_DEFAULT_UNROLL_N 8
  2524. #define DGEMM_DEFAULT_UNROLL_M 8
  2525. #define DGEMM_DEFAULT_UNROLL_N 4
  2526. #define CGEMM_DEFAULT_UNROLL_M 8
  2527. #define CGEMM_DEFAULT_UNROLL_N 4
  2528. #define ZGEMM_DEFAULT_UNROLL_M 4
  2529. #define ZGEMM_DEFAULT_UNROLL_N 4
  2530. #undef SHGEMM_DEFAULT_P
  2531. #define SHGEMM_DEFAULT_P 128
  2532. #define SGEMM_DEFAULT_P 128
  2533. #define DGEMM_DEFAULT_P 128
  2534. #define CGEMM_DEFAULT_P 96
  2535. #define ZGEMM_DEFAULT_P 64
  2536. #undef SHGEMM_DEFAULT_Q
  2537. #define SHGEMM_DEFAULT_Q 240
  2538. #define SGEMM_DEFAULT_Q 240
  2539. #define DGEMM_DEFAULT_Q 120
  2540. #define CGEMM_DEFAULT_Q 120
  2541. #define ZGEMM_DEFAULT_Q 120
  2542. #undef SHGEMM_DEFAULT_R
  2543. #define SHGEMM_DEFAULT_R 12288
  2544. #define SGEMM_DEFAULT_R 12288
  2545. #define DGEMM_DEFAULT_R 8192
  2546. #define CGEMM_DEFAULT_R 4096
  2547. #define ZGEMM_DEFAULT_R 4096
  2548. #define SYMV_P 16
  2549. #define GEMM_DEFAULT_OFFSET_A 0
  2550. #define GEMM_DEFAULT_OFFSET_B 0
  2551. #endif
  2552. #ifdef RISCV64_ZVL256B
  2553. #define GEMM_DEFAULT_OFFSET_A 0
  2554. #define GEMM_DEFAULT_OFFSET_B 0
  2555. #define GEMM_DEFAULT_ALIGN 0x03fffUL
  2556. #undef SHGEMM_DEFAULT_UNROLL_M
  2557. #undef SHGEMM_DEFAULT_UNROLL_N
  2558. #define SHGEMM_DEFAULT_UNROLL_M 16
  2559. #define SHGEMM_DEFAULT_UNROLL_N 8
  2560. #define SGEMM_DEFAULT_UNROLL_M 16
  2561. #define SGEMM_DEFAULT_UNROLL_N 8
  2562. #define DGEMM_DEFAULT_UNROLL_M 8
  2563. #define DGEMM_DEFAULT_UNROLL_N 8
  2564. #define CGEMM_DEFAULT_UNROLL_M 8
  2565. #define CGEMM_DEFAULT_UNROLL_N 8
  2566. #define ZGEMM_DEFAULT_UNROLL_M 8
  2567. #define ZGEMM_DEFAULT_UNROLL_N 4
  2568. #undef SHGEMM_DEFAULT_P
  2569. #define SHGEMM_DEFAULT_P 128
  2570. #define SGEMM_DEFAULT_P 128
  2571. #define DGEMM_DEFAULT_P 64
  2572. #define CGEMM_DEFAULT_P 64
  2573. #define ZGEMM_DEFAULT_P 64
  2574. #undef SHGEMM_DEFAULT_Q
  2575. #define SHGEMM_DEFAULT_Q 128
  2576. #define SGEMM_DEFAULT_Q 128
  2577. #define DGEMM_DEFAULT_Q 128
  2578. #define CGEMM_DEFAULT_Q 128
  2579. #define ZGEMM_DEFAULT_Q 64
  2580. #undef SHGEMM_DEFAULT_R
  2581. #define SHGEMM_DEFAULT_R 16384
  2582. #define SGEMM_DEFAULT_R 16384
  2583. #define DGEMM_DEFAULT_R 8192
  2584. #define CGEMM_DEFAULT_R 8192
  2585. #define ZGEMM_DEFAULT_R 4096
  2586. #define SYMV_P 16
  2587. #define GEMM_DEFAULT_OFFSET_A 0
  2588. #define GEMM_DEFAULT_OFFSET_B 0
  2589. #endif
  2590. #ifdef ARMV7
  2591. #define SNUMOPT 2
  2592. #define DNUMOPT 2
  2593. #define GEMM_DEFAULT_OFFSET_A 0
  2594. #define GEMM_DEFAULT_OFFSET_B 0
  2595. #define GEMM_DEFAULT_ALIGN (BLASLONG)0x03fffUL
  2596. #define SGEMM_DEFAULT_UNROLL_M 4
  2597. #define SGEMM_DEFAULT_UNROLL_N 4
  2598. #define DGEMM_DEFAULT_UNROLL_M 4
  2599. #define DGEMM_DEFAULT_UNROLL_N 4
  2600. #define CGEMM_DEFAULT_UNROLL_M 2
  2601. #define CGEMM_DEFAULT_UNROLL_N 2
  2602. #define ZGEMM_DEFAULT_UNROLL_M 2
  2603. #define ZGEMM_DEFAULT_UNROLL_N 2
  2604. #define SGEMM_DEFAULT_P 128
  2605. #define DGEMM_DEFAULT_P 128
  2606. #define CGEMM_DEFAULT_P 96
  2607. #define ZGEMM_DEFAULT_P 64
  2608. #define SGEMM_DEFAULT_Q 240
  2609. #define DGEMM_DEFAULT_Q 120
  2610. #define CGEMM_DEFAULT_Q 120
  2611. #define ZGEMM_DEFAULT_Q 120
  2612. #define SGEMM_DEFAULT_R 12288
  2613. #define DGEMM_DEFAULT_R 8192
  2614. #define CGEMM_DEFAULT_R 4096
  2615. #define ZGEMM_DEFAULT_R 4096
  2616. #define SYMV_P 16
  2617. #endif
  2618. #if defined(ARMV6)
  2619. #define SNUMOPT 2
  2620. #define DNUMOPT 2
  2621. #define GEMM_DEFAULT_OFFSET_A 0
  2622. #define GEMM_DEFAULT_OFFSET_B 0
  2623. #define GEMM_DEFAULT_ALIGN (BLASLONG)0x03fffUL
  2624. #define SGEMM_DEFAULT_UNROLL_M 4
  2625. #define SGEMM_DEFAULT_UNROLL_N 2
  2626. #define DGEMM_DEFAULT_UNROLL_M 4
  2627. #define DGEMM_DEFAULT_UNROLL_N 2
  2628. #define CGEMM_DEFAULT_UNROLL_M 2
  2629. #define CGEMM_DEFAULT_UNROLL_N 2
  2630. #define ZGEMM_DEFAULT_UNROLL_M 2
  2631. #define ZGEMM_DEFAULT_UNROLL_N 2
  2632. #define SGEMM_DEFAULT_P 128
  2633. #define DGEMM_DEFAULT_P 128
  2634. #define CGEMM_DEFAULT_P 96
  2635. #define ZGEMM_DEFAULT_P 64
  2636. #define SGEMM_DEFAULT_Q 240
  2637. #define DGEMM_DEFAULT_Q 120
  2638. #define CGEMM_DEFAULT_Q 120
  2639. #define ZGEMM_DEFAULT_Q 120
  2640. #define SGEMM_DEFAULT_R 12288
  2641. #define DGEMM_DEFAULT_R 8192
  2642. #define CGEMM_DEFAULT_R 4096
  2643. #define ZGEMM_DEFAULT_R 4096
  2644. #define SYMV_P 16
  2645. #endif
  2646. /* Common ARMv8 parameters */
  2647. #if defined(ARMV8)
  2648. #define SNUMOPT 2
  2649. #define DNUMOPT 2
  2650. #define GEMM_DEFAULT_OFFSET_A 0
  2651. #define GEMM_DEFAULT_OFFSET_B 0
  2652. #ifdef _WIN64
  2653. /* Use explicit casting for win64 as LLP64 datamodel is used */
  2654. #define GEMM_DEFAULT_ALIGN (BLASULONG)0x03fffUL
  2655. #else
  2656. #define GEMM_DEFAULT_ALIGN 0x03fffUL
  2657. #endif
  2658. #define SYMV_P 16
  2659. #if defined(CORTEXA57) || defined(CORTEXX1) || \
  2660. defined(CORTEXA72) || defined(CORTEXA73) || \
  2661. defined(FALKOR) || defined(TSV110) || defined(EMAG8180) || defined(VORTEX) || defined(FT2000)
  2662. #define SGEMM_DEFAULT_UNROLL_M 16
  2663. #define SGEMM_DEFAULT_UNROLL_N 4
  2664. #define DGEMM_DEFAULT_UNROLL_M 8
  2665. #define DGEMM_DEFAULT_UNROLL_N 4
  2666. #define CGEMM_DEFAULT_UNROLL_M 8
  2667. #define CGEMM_DEFAULT_UNROLL_N 4
  2668. #define ZGEMM_DEFAULT_UNROLL_M 4
  2669. #define ZGEMM_DEFAULT_UNROLL_N 4
  2670. /* FIXME: This should use the cache size, but there is currently no easy way to
  2671. query that on ARM. So if getarch counted more than 8 cores, we simply assume the host
  2672. is a big desktop or server with abundant cache rather than a phone or embedded device. */
  2673. #if NUM_CORES > 8 || defined(TSV110) || defined(EMAG8180) || defined(VORTEX)|| defined(CORTEXX1)
  2674. #define SGEMM_DEFAULT_P 512
  2675. #define DGEMM_DEFAULT_P 256
  2676. #define CGEMM_DEFAULT_P 256
  2677. #define ZGEMM_DEFAULT_P 128
  2678. #define SGEMM_DEFAULT_Q 1024
  2679. #define DGEMM_DEFAULT_Q 512
  2680. #define CGEMM_DEFAULT_Q 512
  2681. #define ZGEMM_DEFAULT_Q 512
  2682. #else
  2683. #define SGEMM_DEFAULT_P 128
  2684. #define DGEMM_DEFAULT_P 160
  2685. #define CGEMM_DEFAULT_P 128
  2686. #define ZGEMM_DEFAULT_P 128
  2687. #define SGEMM_DEFAULT_Q 352
  2688. #define DGEMM_DEFAULT_Q 128
  2689. #define CGEMM_DEFAULT_Q 224
  2690. #define ZGEMM_DEFAULT_Q 112
  2691. #endif
  2692. #define SGEMM_DEFAULT_R 4096
  2693. #define DGEMM_DEFAULT_R 4096
  2694. #define CGEMM_DEFAULT_R 4096
  2695. #define ZGEMM_DEFAULT_R 2048
  2696. #elif defined(CORTEXA76)
  2697. #define SGEMM_DEFAULT_UNROLL_M 16
  2698. #define SGEMM_DEFAULT_UNROLL_N 4
  2699. #define DGEMM_DEFAULT_UNROLL_M 8
  2700. #define DGEMM_DEFAULT_UNROLL_N 4
  2701. #define CGEMM_DEFAULT_UNROLL_M 8
  2702. #define CGEMM_DEFAULT_UNROLL_N 4
  2703. #define ZGEMM_DEFAULT_UNROLL_M 4
  2704. #define ZGEMM_DEFAULT_UNROLL_N 4
  2705. #if defined(XDOUBLE) || defined(DOUBLE)
  2706. #define SWITCH_RATIO 8
  2707. #else
  2708. #define SWITCH_RATIO 16
  2709. #endif
  2710. #define SGEMM_DEFAULT_P 256
  2711. #define DGEMM_DEFAULT_P 128
  2712. #define CGEMM_DEFAULT_P 128
  2713. #define ZGEMM_DEFAULT_P 64
  2714. #define SGEMM_DEFAULT_Q 512
  2715. #define DGEMM_DEFAULT_Q 256
  2716. #define CGEMM_DEFAULT_Q 256
  2717. #define ZGEMM_DEFAULT_Q 256
  2718. #define SGEMM_DEFAULT_R 4096
  2719. #define DGEMM_DEFAULT_R 4096
  2720. #define CGEMM_DEFAULT_R 4096
  2721. #define ZGEMM_DEFAULT_R 4096
  2722. #elif defined(CORTEXA53) || defined(CORTEXA55)
  2723. #define SGEMM_DEFAULT_UNROLL_M 8
  2724. #define SGEMM_DEFAULT_UNROLL_N 8
  2725. #define DGEMM_DEFAULT_UNROLL_M 4
  2726. #define DGEMM_DEFAULT_UNROLL_N 4
  2727. #define CGEMM_DEFAULT_UNROLL_M 8
  2728. #define CGEMM_DEFAULT_UNROLL_N 4
  2729. #define ZGEMM_DEFAULT_UNROLL_M 4
  2730. #define ZGEMM_DEFAULT_UNROLL_N 4
  2731. #define SGEMM_DEFAULT_P 256
  2732. #define DGEMM_DEFAULT_P 160
  2733. #define CGEMM_DEFAULT_P 128
  2734. #define ZGEMM_DEFAULT_P 128
  2735. #define SGEMM_DEFAULT_Q 256
  2736. #define DGEMM_DEFAULT_Q 128
  2737. #define CGEMM_DEFAULT_Q 224
  2738. #define ZGEMM_DEFAULT_Q 112
  2739. #define SGEMM_DEFAULT_R 4096
  2740. #define DGEMM_DEFAULT_R 4096
  2741. #define CGEMM_DEFAULT_R 4096
  2742. #define ZGEMM_DEFAULT_R 2048
  2743. #elif defined(THUNDERX)
  2744. #define SGEMM_DEFAULT_UNROLL_M 4
  2745. #define SGEMM_DEFAULT_UNROLL_N 4
  2746. #define DGEMM_DEFAULT_UNROLL_M 2
  2747. #define DGEMM_DEFAULT_UNROLL_N 2
  2748. #define CGEMM_DEFAULT_UNROLL_M 2
  2749. #define CGEMM_DEFAULT_UNROLL_N 2
  2750. #define ZGEMM_DEFAULT_UNROLL_M 2
  2751. #define ZGEMM_DEFAULT_UNROLL_N 2
  2752. #define SGEMM_DEFAULT_P 128
  2753. #define DGEMM_DEFAULT_P 128
  2754. #define CGEMM_DEFAULT_P 96
  2755. #define ZGEMM_DEFAULT_P 64
  2756. #define SGEMM_DEFAULT_Q 240
  2757. #define DGEMM_DEFAULT_Q 120
  2758. #define CGEMM_DEFAULT_Q 120
  2759. #define ZGEMM_DEFAULT_Q 120
  2760. #define SGEMM_DEFAULT_R 12288
  2761. #define DGEMM_DEFAULT_R 8192
  2762. #define CGEMM_DEFAULT_R 4096
  2763. #define ZGEMM_DEFAULT_R 4096
  2764. #elif defined(THUNDERX2T99)
  2765. #define SGEMM_DEFAULT_UNROLL_M 16
  2766. #define SGEMM_DEFAULT_UNROLL_N 4
  2767. #define DGEMM_DEFAULT_UNROLL_M 8
  2768. #define DGEMM_DEFAULT_UNROLL_N 4
  2769. #define CGEMM_DEFAULT_UNROLL_M 8
  2770. #define CGEMM_DEFAULT_UNROLL_N 4
  2771. #define ZGEMM_DEFAULT_UNROLL_M 4
  2772. #define ZGEMM_DEFAULT_UNROLL_N 4
  2773. #define SGEMM_DEFAULT_P 128
  2774. #define DGEMM_DEFAULT_P 160
  2775. #define CGEMM_DEFAULT_P 128
  2776. #define ZGEMM_DEFAULT_P 128
  2777. #define SGEMM_DEFAULT_Q 352
  2778. #define DGEMM_DEFAULT_Q 128
  2779. #define CGEMM_DEFAULT_Q 224
  2780. #define ZGEMM_DEFAULT_Q 112
  2781. #define SGEMM_DEFAULT_R 4096
  2782. #define DGEMM_DEFAULT_R 4096
  2783. #define CGEMM_DEFAULT_R 4096
  2784. #define ZGEMM_DEFAULT_R 4096
  2785. #elif defined(THUNDERX3T110)
  2786. #define SGEMM_DEFAULT_UNROLL_M 16
  2787. #define SGEMM_DEFAULT_UNROLL_N 4
  2788. #define DGEMM_DEFAULT_UNROLL_M 8
  2789. #define DGEMM_DEFAULT_UNROLL_N 4
  2790. #define CGEMM_DEFAULT_UNROLL_M 8
  2791. #define CGEMM_DEFAULT_UNROLL_N 4
  2792. #define ZGEMM_DEFAULT_UNROLL_M 4
  2793. #define ZGEMM_DEFAULT_UNROLL_N 4
  2794. #define SGEMM_DEFAULT_P 128
  2795. #define DGEMM_DEFAULT_P 320
  2796. #define CGEMM_DEFAULT_P 128
  2797. #define ZGEMM_DEFAULT_P 128
  2798. #define SGEMM_DEFAULT_Q 352
  2799. #define DGEMM_DEFAULT_Q 128
  2800. #define CGEMM_DEFAULT_Q 224
  2801. #define ZGEMM_DEFAULT_Q 112
  2802. #define SGEMM_DEFAULT_R 4096
  2803. #define DGEMM_DEFAULT_R 4096
  2804. #define CGEMM_DEFAULT_R 4096
  2805. #define ZGEMM_DEFAULT_R 4096
  2806. #elif defined(NEOVERSEN1)
  2807. #if defined(XDOUBLE) || defined(DOUBLE)
  2808. #define SWITCH_RATIO 8
  2809. #else
  2810. #define SWITCH_RATIO 16
  2811. #endif
  2812. #define SGEMM_DEFAULT_UNROLL_M 16
  2813. #define SGEMM_DEFAULT_UNROLL_N 4
  2814. #define DGEMM_DEFAULT_UNROLL_M 8
  2815. #define DGEMM_DEFAULT_UNROLL_N 4
  2816. #define CGEMM_DEFAULT_UNROLL_M 8
  2817. #define CGEMM_DEFAULT_UNROLL_N 4
  2818. #define ZGEMM_DEFAULT_UNROLL_M 4
  2819. #define ZGEMM_DEFAULT_UNROLL_N 4
  2820. #define SGEMM_DEFAULT_P 240
  2821. #define DGEMM_DEFAULT_P 240
  2822. #define CGEMM_DEFAULT_P 128
  2823. #define ZGEMM_DEFAULT_P 128
  2824. #define SGEMM_DEFAULT_Q 640
  2825. #define DGEMM_DEFAULT_Q 320
  2826. #define CGEMM_DEFAULT_Q 224
  2827. #define ZGEMM_DEFAULT_Q 112
  2828. #define SGEMM_DEFAULT_R 4096
  2829. #define DGEMM_DEFAULT_R 4096
  2830. #define CGEMM_DEFAULT_R 4096
  2831. #define ZGEMM_DEFAULT_R 4096
  2832. #elif defined(NEOVERSEV1) // 256-bit SVE
  2833. #if defined(XDOUBLE) || defined(DOUBLE)
  2834. #define SWITCH_RATIO 8
  2835. #define GEMM_PREFERED_SIZE 4
  2836. #else
  2837. #define SWITCH_RATIO 16
  2838. #define GEMM_PREFERED_SIZE 8
  2839. #endif
  2840. #undef SBGEMM_ALIGN_K
  2841. #undef SBGEMM_DEFAULT_UNROLL_M
  2842. #undef SBGEMM_DEFAULT_UNROLL_N
  2843. #define SBGEMM_ALIGN_K 8
  2844. #define SBGEMM_DEFAULT_UNROLL_M 4
  2845. #define SBGEMM_DEFAULT_UNROLL_N 4
  2846. #define SGEMM_DEFAULT_UNROLL_M 16
  2847. #define SGEMM_DEFAULT_UNROLL_N 8
  2848. #define DGEMM_DEFAULT_UNROLL_M 4 // Actually 2VL (8) but kept separate to keep copies separate
  2849. #define DGEMM_DEFAULT_UNROLL_N 8
  2850. #define CGEMM_DEFAULT_UNROLL_M 2
  2851. #define CGEMM_DEFAULT_UNROLL_N 4
  2852. #define CGEMM_DEFAULT_UNROLL_MN 16
  2853. #define ZGEMM_DEFAULT_UNROLL_M 2
  2854. #define ZGEMM_DEFAULT_UNROLL_N 4
  2855. #define ZGEMM_DEFAULT_UNROLL_MN 16
  2856. #define SGEMM_DEFAULT_P 240
  2857. #define DGEMM_DEFAULT_P 240
  2858. #define CGEMM_DEFAULT_P 128
  2859. #define ZGEMM_DEFAULT_P 128
  2860. #define SGEMM_DEFAULT_Q 640
  2861. #define DGEMM_DEFAULT_Q 320
  2862. #define CGEMM_DEFAULT_Q 224
  2863. #define ZGEMM_DEFAULT_Q 112
  2864. #define SGEMM_DEFAULT_R 4096
  2865. #define DGEMM_DEFAULT_R 4096
  2866. #define CGEMM_DEFAULT_R 4096
  2867. #define ZGEMM_DEFAULT_R 4096
  2868. #elif defined(NEOVERSEN2)
  2869. #if defined(XDOUBLE) || defined(DOUBLE)
  2870. #define SWITCH_RATIO 8
  2871. #else
  2872. #define SWITCH_RATIO 16
  2873. #endif
  2874. #undef SBGEMM_ALIGN_K
  2875. #define SBGEMM_ALIGN_K 4
  2876. #undef SBGEMM_DEFAULT_UNROLL_M
  2877. #undef SBGEMM_DEFAULT_UNROLL_N
  2878. #define SBGEMM_DEFAULT_UNROLL_M 8
  2879. #define SBGEMM_DEFAULT_UNROLL_N 4
  2880. #define SGEMM_DEFAULT_UNROLL_M 16
  2881. #define SGEMM_DEFAULT_UNROLL_N 4
  2882. #define DGEMM_DEFAULT_UNROLL_M 8
  2883. #define DGEMM_DEFAULT_UNROLL_N 4
  2884. #define CGEMM_DEFAULT_UNROLL_M 8
  2885. #define CGEMM_DEFAULT_UNROLL_N 4
  2886. #define ZGEMM_DEFAULT_UNROLL_M 4
  2887. #define ZGEMM_DEFAULT_UNROLL_N 4
  2888. #define SGEMM_DEFAULT_P 128
  2889. #define DGEMM_DEFAULT_P 160
  2890. #define CGEMM_DEFAULT_P 128
  2891. #define ZGEMM_DEFAULT_P 128
  2892. #define SGEMM_DEFAULT_Q 352
  2893. #define DGEMM_DEFAULT_Q 128
  2894. #define CGEMM_DEFAULT_Q 224
  2895. #define ZGEMM_DEFAULT_Q 112
  2896. #define SGEMM_DEFAULT_R 4096
  2897. #define DGEMM_DEFAULT_R 4096
  2898. #define CGEMM_DEFAULT_R 4096
  2899. #define ZGEMM_DEFAULT_R 4096
  2900. #elif defined(A64FX) // 512-bit SVE
  2901. /* When all BLAS3 routines are implemented with SVE, SGEMM_DEFAULT_UNROLL_M should be "sve_vl".
  2902. Until then, just keep it different from DGEMM_DEFAULT_UNROLL_N to keep copy routines in both directions separated. */
  2903. #define SGEMM_DEFAULT_UNROLL_M 4
  2904. #define SGEMM_DEFAULT_UNROLL_N 8
  2905. /* SGEMM_UNROLL_MN is calculated as max(SGEMM_UNROLL_M, SGEMM_UNROLL_N)
  2906. * Since we don't define SGEMM_UNROLL_M correctly we have to manually set this macro.
  2907. * If SVE size is ever more than 1024, this should be increased also. */
  2908. #define SGEMM_DEFAULT_UNROLL_MN 32
  2909. /* When all BLAS3 routines are implemented with SVE, DGEMM_DEFAULT_UNROLL_M should be "sve_vl".
  2910. Until then, just keep it different from DGEMM_DEFAULT_UNROLL_N to keep copy routines in both directions separated. */
  2911. #define DGEMM_DEFAULT_UNROLL_M 2
  2912. #define DGEMM_DEFAULT_UNROLL_N 8
  2913. #define DGEMM_DEFAULT_UNROLL_MN 32
  2914. #define CGEMM_DEFAULT_UNROLL_M 2
  2915. #define CGEMM_DEFAULT_UNROLL_N 4
  2916. #define CGEMM_DEFAULT_UNROLL_MN 16
  2917. #define ZGEMM_DEFAULT_UNROLL_M 2
  2918. #define ZGEMM_DEFAULT_UNROLL_N 4
  2919. #define ZGEMM_DEFAULT_UNROLL_MN 16
  2920. #define SGEMM_DEFAULT_P 128
  2921. #define DGEMM_DEFAULT_P 160
  2922. #define CGEMM_DEFAULT_P 128
  2923. #define ZGEMM_DEFAULT_P 128
  2924. #define SGEMM_DEFAULT_Q 352
  2925. #define DGEMM_DEFAULT_Q 128
  2926. #define CGEMM_DEFAULT_Q 224
  2927. #define ZGEMM_DEFAULT_Q 112
  2928. #define SGEMM_DEFAULT_R 4096
  2929. #define DGEMM_DEFAULT_R 4096
  2930. #define CGEMM_DEFAULT_R 4096
  2931. #define ZGEMM_DEFAULT_R 4096
  2932. #elif defined(ARMV8SVE) || defined(ARMV9SME) || defined(ARMV9) || defined(CORTEXA510)|| defined(CORTEXA710) || defined(CORTEXX2) // 128-bit SVE
  2933. #if defined(XDOUBLE) || defined(DOUBLE)
  2934. #define SWITCH_RATIO 8
  2935. #else
  2936. #define SWITCH_RATIO 16
  2937. #endif
  2938. #define SGEMM_DEFAULT_UNROLL_M 4 // Actually 1VL (8) but kept separate to keep copies separate
  2939. #define SGEMM_DEFAULT_UNROLL_N 8
  2940. #define DGEMM_DEFAULT_UNROLL_M 4
  2941. #define DGEMM_DEFAULT_UNROLL_N 8
  2942. #define CGEMM_DEFAULT_UNROLL_M 2
  2943. #define CGEMM_DEFAULT_UNROLL_N 4
  2944. #define CGEMM_DEFAULT_UNROLL_MN 16
  2945. #define ZGEMM_DEFAULT_UNROLL_M 2
  2946. #define ZGEMM_DEFAULT_UNROLL_N 4
  2947. #define ZGEMM_DEFAULT_UNROLL_MN 16
  2948. #define SGEMM_DEFAULT_P 128
  2949. #define DGEMM_DEFAULT_P 160
  2950. #define CGEMM_DEFAULT_P 128
  2951. #define ZGEMM_DEFAULT_P 128
  2952. #define SGEMM_DEFAULT_Q 352
  2953. #define DGEMM_DEFAULT_Q 128
  2954. #define CGEMM_DEFAULT_Q 224
  2955. #define ZGEMM_DEFAULT_Q 112
  2956. #define SGEMM_DEFAULT_R 4096
  2957. #define DGEMM_DEFAULT_R 4096
  2958. #define CGEMM_DEFAULT_R 4096
  2959. #define ZGEMM_DEFAULT_R 4096
#else /* Other/undetected ARMv8 cores */
#define SGEMM_DEFAULT_UNROLL_M 16
#define SGEMM_DEFAULT_UNROLL_N 4
#define DGEMM_DEFAULT_UNROLL_M 8
#define DGEMM_DEFAULT_UNROLL_N 4
#define CGEMM_DEFAULT_UNROLL_M 8
#define CGEMM_DEFAULT_UNROLL_N 4
#define ZGEMM_DEFAULT_UNROLL_M 4
#define ZGEMM_DEFAULT_UNROLL_N 4
#define SGEMM_DEFAULT_P 128
#define DGEMM_DEFAULT_P 160
#define CGEMM_DEFAULT_P 128
#define ZGEMM_DEFAULT_P 128
#define SGEMM_DEFAULT_Q 352
#define DGEMM_DEFAULT_Q 128
#define CGEMM_DEFAULT_Q 224
#define ZGEMM_DEFAULT_Q 112
#define SGEMM_DEFAULT_R 4096
#define DGEMM_DEFAULT_R 4096
#define CGEMM_DEFAULT_R 4096
#define ZGEMM_DEFAULT_R 4096
#endif /* Cores */
#endif /* ARMv8 */
#if defined(ARMV9SME) /* ARMv9 SME */
#define USE_SGEMM_KERNEL_DIRECT 1
#endif /* ARMv9 SME */
#if defined(ARMV5)
#define SNUMOPT 2
#define DNUMOPT 2
#define GEMM_DEFAULT_OFFSET_A 0
#define GEMM_DEFAULT_OFFSET_B 0
#define GEMM_DEFAULT_ALIGN (BLASLONG)0x03fffUL
#define SGEMM_DEFAULT_UNROLL_M 2
#define SGEMM_DEFAULT_UNROLL_N 2
#define DGEMM_DEFAULT_UNROLL_M 2
#define DGEMM_DEFAULT_UNROLL_N 2
#define CGEMM_DEFAULT_UNROLL_M 2
#define CGEMM_DEFAULT_UNROLL_N 2
#define ZGEMM_DEFAULT_UNROLL_M 2
#define ZGEMM_DEFAULT_UNROLL_N 2
#define SGEMM_DEFAULT_P 128
#define DGEMM_DEFAULT_P 128
#define CGEMM_DEFAULT_P 96
#define ZGEMM_DEFAULT_P 64
#define SGEMM_DEFAULT_Q 240
#define DGEMM_DEFAULT_Q 120
#define CGEMM_DEFAULT_Q 120
#define ZGEMM_DEFAULT_Q 120
#define SGEMM_DEFAULT_R 12288
#define DGEMM_DEFAULT_R 8192
#define CGEMM_DEFAULT_R 4096
#define ZGEMM_DEFAULT_R 4096
#define SYMV_P 16
#endif
#ifdef CORTEXA9
#define SNUMOPT 2
#define DNUMOPT 2
#define GEMM_DEFAULT_OFFSET_A 0
#define GEMM_DEFAULT_OFFSET_B 0
#define GEMM_DEFAULT_ALIGN (BLASLONG)0x03fffUL
#define SGEMM_DEFAULT_UNROLL_M 4
#define SGEMM_DEFAULT_UNROLL_N 4
#define DGEMM_DEFAULT_UNROLL_M 4
#define DGEMM_DEFAULT_UNROLL_N 4
#define CGEMM_DEFAULT_UNROLL_M 2
#define CGEMM_DEFAULT_UNROLL_N 2
#define ZGEMM_DEFAULT_UNROLL_M 2
#define ZGEMM_DEFAULT_UNROLL_N 2
#define SGEMM_DEFAULT_P 128
#define DGEMM_DEFAULT_P 128
#define CGEMM_DEFAULT_P 96
#define ZGEMM_DEFAULT_P 64
#define SGEMM_DEFAULT_Q 240
#define DGEMM_DEFAULT_Q 120
#define CGEMM_DEFAULT_Q 120
#define ZGEMM_DEFAULT_Q 120
#define SGEMM_DEFAULT_R 12288
#define DGEMM_DEFAULT_R 8192
#define CGEMM_DEFAULT_R 4096
#define ZGEMM_DEFAULT_R 4096
#define SYMV_P 16
#endif
#ifdef CORTEXA15
#define SNUMOPT 2
#define DNUMOPT 2
#define GEMM_DEFAULT_OFFSET_A 0
#define GEMM_DEFAULT_OFFSET_B 0
#define GEMM_DEFAULT_ALIGN (BLASLONG)0x03fffUL
#define SGEMM_DEFAULT_UNROLL_M 4
#define SGEMM_DEFAULT_UNROLL_N 4
#define DGEMM_DEFAULT_UNROLL_M 4
#define DGEMM_DEFAULT_UNROLL_N 4
#define CGEMM_DEFAULT_UNROLL_M 2
#define CGEMM_DEFAULT_UNROLL_N 2
#define ZGEMM_DEFAULT_UNROLL_M 2
#define ZGEMM_DEFAULT_UNROLL_N 2
#define SGEMM_DEFAULT_P 128
#define DGEMM_DEFAULT_P 128
#define CGEMM_DEFAULT_P 96
#define ZGEMM_DEFAULT_P 64
#define SGEMM_DEFAULT_Q 240
#define DGEMM_DEFAULT_Q 120
#define CGEMM_DEFAULT_Q 120
#define ZGEMM_DEFAULT_Q 120
#define SGEMM_DEFAULT_R 12288
#define DGEMM_DEFAULT_R 8192
#define CGEMM_DEFAULT_R 4096
#define ZGEMM_DEFAULT_R 4096
#define SYMV_P 16
#endif
#if defined(ZARCH_GENERIC)
#define SNUMOPT 2
#define DNUMOPT 2
#define GEMM_DEFAULT_OFFSET_A 0
#define GEMM_DEFAULT_OFFSET_B 0
#define GEMM_DEFAULT_ALIGN (BLASLONG)0x03fffUL
#define SGEMM_DEFAULT_UNROLL_M 2
#define SGEMM_DEFAULT_UNROLL_N 2
#define DGEMM_DEFAULT_UNROLL_M 2
#define DGEMM_DEFAULT_UNROLL_N 2
#define CGEMM_DEFAULT_UNROLL_M 2
#define CGEMM_DEFAULT_UNROLL_N 2
#define ZGEMM_DEFAULT_UNROLL_M 2
#define ZGEMM_DEFAULT_UNROLL_N 2
#define SGEMM_DEFAULT_P 128
#define DGEMM_DEFAULT_P 128
#define CGEMM_DEFAULT_P 96
#define ZGEMM_DEFAULT_P 64
#define SGEMM_DEFAULT_Q 240
#define DGEMM_DEFAULT_Q 120
#define CGEMM_DEFAULT_Q 120
#define ZGEMM_DEFAULT_Q 120
#define SGEMM_DEFAULT_R 12288
#define DGEMM_DEFAULT_R 8192
#define CGEMM_DEFAULT_R 4096
#define ZGEMM_DEFAULT_R 4096
#define SYMV_P 16
#endif
#if defined(Z13)
#define SNUMOPT 2
#define DNUMOPT 2
#define GEMM_DEFAULT_OFFSET_A 0
#define GEMM_DEFAULT_OFFSET_B 0
#define GEMM_DEFAULT_ALIGN (BLASLONG)0x03fffUL
#define SGEMM_DEFAULT_UNROLL_M 8
#define SGEMM_DEFAULT_UNROLL_N 4
#define DGEMM_DEFAULT_UNROLL_M 8
#define DGEMM_DEFAULT_UNROLL_N 4
#define CGEMM_DEFAULT_UNROLL_M 4
#define CGEMM_DEFAULT_UNROLL_N 4
#define ZGEMM_DEFAULT_UNROLL_M 4
#define ZGEMM_DEFAULT_UNROLL_N 4
#define SGEMM_DEFAULT_P 456
#define DGEMM_DEFAULT_P 320
#define CGEMM_DEFAULT_P 480
#define ZGEMM_DEFAULT_P 224
#define SGEMM_DEFAULT_Q 488
#define DGEMM_DEFAULT_Q 384
#define CGEMM_DEFAULT_Q 128
#define ZGEMM_DEFAULT_Q 352
#define SGEMM_DEFAULT_R 8192
#define DGEMM_DEFAULT_R 4096
#define CGEMM_DEFAULT_R 4096
#define ZGEMM_DEFAULT_R 2048
#define SYMV_P 16
#endif
#if defined(Z14)
#define SNUMOPT 2
#define DNUMOPT 2
#define GEMM_DEFAULT_OFFSET_A 0
#define GEMM_DEFAULT_OFFSET_B 0
#define GEMM_DEFAULT_ALIGN 0x03fffUL
#define SGEMM_DEFAULT_UNROLL_M 16
#define SGEMM_DEFAULT_UNROLL_N 4
#define DGEMM_DEFAULT_UNROLL_M 8
#define DGEMM_DEFAULT_UNROLL_N 4
#define CGEMM_DEFAULT_UNROLL_M 4
#define CGEMM_DEFAULT_UNROLL_N 4
#define ZGEMM_DEFAULT_UNROLL_M 4
#define ZGEMM_DEFAULT_UNROLL_N 4
#define SGEMM_DEFAULT_P 480
#define DGEMM_DEFAULT_P 320
#define CGEMM_DEFAULT_P 480
#define ZGEMM_DEFAULT_P 224
#define SGEMM_DEFAULT_Q 512
#define DGEMM_DEFAULT_Q 384
#define CGEMM_DEFAULT_Q 128
#define ZGEMM_DEFAULT_Q 352
#define SGEMM_DEFAULT_R 8192
#define DGEMM_DEFAULT_R 4096
#define CGEMM_DEFAULT_R 4096
#define ZGEMM_DEFAULT_R 2048
#define SYMV_P 16
#endif
#if defined(CSKY) || defined(CK860FV)
#define GEMM_DEFAULT_OFFSET_A 0
#define GEMM_DEFAULT_OFFSET_B 0
#define GEMM_DEFAULT_ALIGN (BLASLONG)0x03fffUL
#define SGEMM_DEFAULT_UNROLL_M 2
#define SGEMM_DEFAULT_UNROLL_N 2
#define DGEMM_DEFAULT_UNROLL_M 2
#define DGEMM_DEFAULT_UNROLL_N 2
#define CGEMM_DEFAULT_UNROLL_M 2
#define CGEMM_DEFAULT_UNROLL_N 2
#define ZGEMM_DEFAULT_UNROLL_M 2
#define ZGEMM_DEFAULT_UNROLL_N 2
#define SGEMM_DEFAULT_P 128
#define DGEMM_DEFAULT_P 128
#define CGEMM_DEFAULT_P 96
#define ZGEMM_DEFAULT_P 64
#define SGEMM_DEFAULT_Q 240
#define DGEMM_DEFAULT_Q 120
#define CGEMM_DEFAULT_Q 120
#define ZGEMM_DEFAULT_Q 120
#define SGEMM_DEFAULT_R 12288
#define DGEMM_DEFAULT_R 8192
#define CGEMM_DEFAULT_R 4096
#define ZGEMM_DEFAULT_R 4096
#define SYMV_P 16
#endif
#ifdef GENERIC
#define SNUMOPT 2
#define DNUMOPT 2
#define GEMM_DEFAULT_OFFSET_A 0
#define GEMM_DEFAULT_OFFSET_B 0
#define GEMM_DEFAULT_ALIGN (BLASLONG)0x0ffffUL
#define SGEMM_DEFAULT_UNROLL_N 2
#define DGEMM_DEFAULT_UNROLL_N 2
#define QGEMM_DEFAULT_UNROLL_N 2
#define CGEMM_DEFAULT_UNROLL_N 2
#define ZGEMM_DEFAULT_UNROLL_N 2
#define XGEMM_DEFAULT_UNROLL_N 1
#define CGEMM3M_DEFAULT_UNROLL_N 2
#define ZGEMM3M_DEFAULT_UNROLL_N 2
#ifdef ARCH_X86
#define SGEMM_DEFAULT_UNROLL_M 2
#define DGEMM_DEFAULT_UNROLL_M 2
#define QGEMM_DEFAULT_UNROLL_M 2
#define CGEMM_DEFAULT_UNROLL_M 2
#define ZGEMM_DEFAULT_UNROLL_M 2
#define XGEMM_DEFAULT_UNROLL_M 1
#else
#define SGEMM_DEFAULT_UNROLL_M 2
#define DGEMM_DEFAULT_UNROLL_M 2
#define QGEMM_DEFAULT_UNROLL_M 2
#define CGEMM_DEFAULT_UNROLL_M 2
#define ZGEMM_DEFAULT_UNROLL_M 2
#define XGEMM_DEFAULT_UNROLL_M 1
#define CGEMM3M_DEFAULT_UNROLL_M 2
#define ZGEMM3M_DEFAULT_UNROLL_M 2
#define CGEMM3M_DEFAULT_P 448
#define ZGEMM3M_DEFAULT_P 224
#define XGEMM3M_DEFAULT_P 112
#define CGEMM3M_DEFAULT_Q 224
#define ZGEMM3M_DEFAULT_Q 224
#define XGEMM3M_DEFAULT_Q 224
#define CGEMM3M_DEFAULT_R 12288
#define ZGEMM3M_DEFAULT_R 12288
#define XGEMM3M_DEFAULT_R 12288
#endif
#ifdef ARCH_MIPS
#define SGEMM_DEFAULT_P 128
#define DGEMM_DEFAULT_P 128
#define CGEMM_DEFAULT_P 96
#define ZGEMM_DEFAULT_P 64
#define SGEMM_DEFAULT_Q 240
#define DGEMM_DEFAULT_Q 120
#define CGEMM_DEFAULT_Q 120
#define ZGEMM_DEFAULT_Q 120
#define SGEMM_DEFAULT_R 12288
#define DGEMM_DEFAULT_R 8192
#define CGEMM_DEFAULT_R 4096
#define ZGEMM_DEFAULT_R 4096
#elif defined(ARCH_LOONGARCH64)
#define SGEMM_DEFAULT_P 128
#define DGEMM_DEFAULT_P 128
#define CGEMM_DEFAULT_P 96
#define ZGEMM_DEFAULT_P 64
#define SGEMM_DEFAULT_Q 240
#define DGEMM_DEFAULT_Q 120
#define CGEMM_DEFAULT_Q 120
#define ZGEMM_DEFAULT_Q 120
#define SGEMM_DEFAULT_R 12288
#define DGEMM_DEFAULT_R 8192
#define CGEMM_DEFAULT_R 4096
#define ZGEMM_DEFAULT_R 4096
#else
#define SGEMM_DEFAULT_P sgemm_p
#define DGEMM_DEFAULT_P dgemm_p
#define QGEMM_DEFAULT_P qgemm_p
#define CGEMM_DEFAULT_P cgemm_p
#define ZGEMM_DEFAULT_P zgemm_p
#define XGEMM_DEFAULT_P xgemm_p
#define SGEMM_DEFAULT_R sgemm_r
#define DGEMM_DEFAULT_R dgemm_r
#define QGEMM_DEFAULT_R qgemm_r
#define CGEMM_DEFAULT_R cgemm_r
#define ZGEMM_DEFAULT_R zgemm_r
#define XGEMM_DEFAULT_R xgemm_r
#define SGEMM_DEFAULT_Q 128
#define DGEMM_DEFAULT_Q 128
#define QGEMM_DEFAULT_Q 128
#define CGEMM_DEFAULT_Q 128
#define ZGEMM_DEFAULT_Q 128
#define XGEMM_DEFAULT_Q 128
#endif
#define SYMV_P 16
#endif
#ifndef SWITCH_RATIO
#define SWITCH_RATIO 2
#endif
#ifndef QGEMM_DEFAULT_UNROLL_M
#define QGEMM_DEFAULT_UNROLL_M 2
#endif
#ifndef QGEMM_DEFAULT_UNROLL_N
#define QGEMM_DEFAULT_UNROLL_N 2
#endif
#ifndef XGEMM_DEFAULT_UNROLL_M
#define XGEMM_DEFAULT_UNROLL_M 2
#endif
#ifndef XGEMM_DEFAULT_UNROLL_N
#define XGEMM_DEFAULT_UNROLL_N 2
#endif
#ifndef HAVE_SSE2
#define SHUFPD_0 shufps $0x44,
#define SHUFPD_1 shufps $0x4e,
#define SHUFPD_2 shufps $0xe4,
#define SHUFPD_3 shufps $0xee,
#endif
#ifndef SHUFPD_0
#define SHUFPD_0 shufpd $0,
#endif
#ifndef SHUFPD_1
#define SHUFPD_1 shufpd $1,
#endif
#ifndef SHUFPD_2
#define SHUFPD_2 shufpd $2,
#endif
#ifndef SHUFPD_3
#define SHUFPD_3 shufpd $3,
#endif
#ifndef SHUFPS_39
#define SHUFPS_39 shufps $0x39,
#endif
#endif