param.h 90 kB

Simplifying ARMv8 build parameters

ARMv8 builds were a bit mixed up: ThunderX2 code was used in ARMv8 mode (which is not right, because TX2 is ARMv8.1), and the defines carried a few redundancies, making it harder to maintain and to understand which core has what. A few other minor issues were also fixed.

Tests were made on the following cores: A53, A57, A72, Falkor, ThunderX, ThunderX2, and XGene.

Tests were: OpenBLAS/test, OpenBLAS/benchmark, BLAS-Tester.

A summary:
 * Removed TX2 code from the ARMv8 build, to make sure it is compatible with all ARMv8 cores, not just v8.1. Also, the TX2 code had actually harmed performance on big cores.
 * Commoned up the ARMv8 architectures' defines in param.h, to make sure that all cores benefit from the ARMv8 settings in addition to their own.
 * Added a few more cores, using ARMv8's include strategy, to benefit from compiler optimisations using mtune. Also updated cache information from the manuals, making sure we set good conservative values by default. Removed Vulcan, as it's an alias to TX2.
 * Auto-detecting most of those cores, and also updating the forced compilation in getarch.c, to make sure the parameters are the same whether compiled natively or with a forced arch.

Benefits:
 * The ARMv8 build is now guaranteed to work on all ARMv8 cores.
 * Improved performance for ARMv8 builds on some cores (A72, Falkor, ThunderX1 and 2: up to 11%) over current develop.
 * Improved performance for *all* cores compared to the develop branch before TX2's patch (9% to 36%).
 * ThunderX1 builds are 14% faster than ARMv8 builds on TX1, 9% faster than the current develop branch, and 8% faster than develop before the TX2 patches.

Issues:
 * Regression from the current develop branch for A53 (-12%) and A57 (-3%) with ARMv8 builds, but still faster than before TX2's commit (+15% and +24% respectively). This can be improved with a simplification of TX2's code, to be done in future patches. At least the code is guaranteed to be ARMv8.0 now.

Comments:
 * CortexA57 builds are unchanged on A57 hardware from the develop branch, which makes sense, as that code is untouched.
 * CortexA72 builds improve over A57 builds on A72 hardware, even though they use the same includes, due to new compiler tuning in the makefile.
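The "commoned up" defines described in the summary can be pictured with a small C sketch. It only illustrates the general pattern of per-core overrides falling back to shared ARMv8 defaults. Every macro name, the struct, the helper function and all numeric values below are hypothetical placeholders, not the contents of param.h.

```c
#include <assert.h>

/* Hypothetical core selector; a real build would normally pass this
 * on the compiler command line rather than define it in the header. */
#define SKETCH_CORE_CORTEXA72 1

/* Per-core section: each core sets only what differs from the shared
 * ARMv8 defaults. Names and values are illustrative placeholders. */
#if defined(SKETCH_CORE_THUNDERX)
#define SKETCH_CORE_NAME "thunderx"
#define SKETCH_L1_LINE_BYTES 128
#elif defined(SKETCH_CORE_CORTEXA72)
#define SKETCH_CORE_NAME "cortex-a72"
#endif

/* Shared ARMv8 section: conservative defaults, applied only where the
 * per-core section above did not already define a value. */
#ifndef SKETCH_CORE_NAME
#define SKETCH_CORE_NAME "armv8"
#endif
#ifndef SKETCH_L1_LINE_BYTES
#define SKETCH_L1_LINE_BYTES 64
#endif
#ifndef SKETCH_UNROLL_M
#define SKETCH_UNROLL_M 4
#endif

/* Hypothetical view of the resolved parameters. */
typedef struct {
    const char *core;
    int l1_line_bytes;
    int unroll_m;
} sketch_params;

/* Returns the parameters selected at compile time. */
static sketch_params sketch_get_params(void)
{
    sketch_params p = { SKETCH_CORE_NAME, SKETCH_L1_LINE_BYTES, SKETCH_UNROLL_M };
    /* Sanity check: every resolved value must be positive. */
    assert(p.l1_line_bytes > 0 && p.unroll_m > 0);
    return p;
}
```

Calling `sketch_get_params()` from a `main` shows which values were resolved. With the selector above, the core name comes from the per-core section while the line size and unroll factor come from the shared defaults, which is the maintenance benefit the message describes: a new core only lists its differences.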
12345678910111213141516171819202122232425262728293031323334353637383940414243444546474849505152535455565758596061626364656667686970717273747576777879808182838485868788899091929394959697989910010110210310410510610710810911011111211311411511611711811912012112212312412512612712812913013113213313413513613713813914014114214314414514614714814915015115215315415515615715815916016116216316416516616716816917017117217317417517617717817918018118218318418518618718818919019119219319419519619719819920020120220320420520620720820921021121221321421521621721821922022122222322422522622722822923023123223323423523623723823924024124224324424524624724824925025125225325425525625725825926026126226326426526626726826927027127227327427527627727827928028128228328428528628728828929029129229329429529629729829930030130230330430530630730830931031131231331431531631731831932032132232332432532632732832933033133233333433533633733833934034134234334434534634734834935035135235335435535635735835936036136236336436536636736836937037137237337437537637737837938038138238338438538638738838939039139239339439539639739839940040140240340440540640740840941041141241341441541641741841942042142242342442542642742842943043143243343443543643743843944044144244344444544644744844945045145245345445545645745845946046146246346446546646746846947047147247347447547647747847948048148248348448548648748848949049149249349449549649749849950050150250350450550650750850951051151251351451551651751851952052152252352452552652752852953053153253353453553653753853954054154254354454554654754854955055155255355455555655755855956056156256356456556656756856957057157257357457557657757857958058158258358458558658758858959059159259359459559659759859960060160260360460560660760860961061161261361461561661761861962062162262362462562662762862963063163263363463563663763863964064164264364464564664764864965065165265365465565665765865966066166266366466566666766866967067167267367467567667767867968068168268368468568668768868969069169269369469569669769869970070170270
37047057067077087097107117127137147157167177187197207217227237247257267277287297307317327337347357367377387397407417427437447457467477487497507517527537547557567577587597607617627637647657667677687697707717727737747757767777787797807817827837847857867877887897907917927937947957967977987998008018028038048058068078088098108118128138148158168178188198208218228238248258268278288298308318328338348358368378388398408418428438448458468478488498508518528538548558568578588598608618628638648658668678688698708718728738748758768778788798808818828838848858868878888898908918928938948958968978988999009019029039049059069079089099109119129139149159169179189199209219229239249259269279289299309319329339349359369379389399409419429439449459469479489499509519529539549559569579589599609619629639649659669679689699709719729739749759769779789799809819829839849859869879889899909919929939949959969979989991000100110021003100410051006100710081009101010111012101310141015101610171018101910201021102210231024102510261027102810291030103110321033103410351036103710381039104010411042104310441045104610471048104910501051105210531054105510561057105810591060106110621063106410651066106710681069107010711072107310741075107610771078107910801081108210831084108510861087108810891090109110921093109410951096109710981099110011011102110311041105110611071108110911101111111211131114111511161117111811191120112111221123112411251126112711281129113011311132113311341135113611371138113911401141114211431144114511461147114811491150115111521153115411551156115711581159116011611162116311641165116611671168116911701171117211731174117511761177117811791180118111821183118411851186118711881189119011911192119311941195119611971198119912001201120212031204120512061207120812091210121112121213121412151216121712181219122012211222122312241225122612271228122912301231123212331234123512361237123812391240124112421243124412451246124712481249125012511252125312541255125612571258125912601261126212631264126512661267126812691270127112721273127412751276127
71278127912801281128212831284128512861287128812891290129112921293129412951296129712981299130013011302130313041305130613071308130913101311131213131314131513161317131813191320132113221323132413251326132713281329133013311332133313341335133613371338133913401341134213431344134513461347134813491350135113521353135413551356135713581359136013611362136313641365136613671368136913701371137213731374137513761377137813791380138113821383138413851386138713881389139013911392139313941395139613971398139914001401140214031404140514061407140814091410141114121413141414151416141714181419142014211422142314241425142614271428142914301431143214331434143514361437143814391440144114421443144414451446144714481449145014511452145314541455145614571458145914601461146214631464146514661467146814691470147114721473147414751476147714781479148014811482148314841485148614871488148914901491149214931494149514961497149814991500150115021503150415051506150715081509151015111512151315141515151615171518151915201521152215231524152515261527152815291530153115321533153415351536153715381539154015411542154315441545154615471548154915501551155215531554155515561557155815591560156115621563156415651566156715681569157015711572157315741575157615771578157915801581158215831584158515861587158815891590159115921593159415951596159715981599160016011602160316041605160616071608160916101611161216131614161516161617161816191620162116221623162416251626162716281629163016311632163316341635163616371638163916401641164216431644164516461647164816491650165116521653165416551656165716581659166016611662166316641665166616671668166916701671167216731674167516761677167816791680168116821683168416851686168716881689169016911692169316941695169616971698169917001701170217031704170517061707170817091710171117121713171417151716171717181719172017211722172317241725172617271728172917301731173217331734173517361737173817391740174117421743174417451746174717481749175017511752175317541755175617571758175917601761176217631764176517661767176817691770177117721773177417751776177
71778177917801781178217831784178517861787178817891790179117921793179417951796179717981799180018011802180318041805180618071808180918101811181218131814181518161817181818191820182118221823182418251826182718281829183018311832183318341835183618371838183918401841184218431844184518461847184818491850185118521853185418551856185718581859186018611862186318641865186618671868186918701871187218731874187518761877187818791880188118821883188418851886188718881889189018911892189318941895189618971898189919001901190219031904190519061907190819091910191119121913191419151916191719181919192019211922192319241925192619271928192919301931193219331934193519361937193819391940194119421943194419451946194719481949195019511952195319541955195619571958195919601961196219631964196519661967196819691970197119721973197419751976197719781979198019811982198319841985198619871988198919901991199219931994199519961997199819992000200120022003200420052006200720082009201020112012201320142015201620172018201920202021202220232024202520262027202820292030203120322033203420352036203720382039204020412042204320442045204620472048204920502051205220532054205520562057205820592060206120622063206420652066206720682069207020712072207320742075207620772078207920802081208220832084208520862087208820892090209120922093209420952096209720982099210021012102210321042105210621072108210921102111211221132114211521162117211821192120212121222123212421252126212721282129213021312132213321342135213621372138213921402141214221432144214521462147214821492150215121522153215421552156215721582159216021612162216321642165216621672168216921702171217221732174217521762177217821792180218121822183218421852186218721882189219021912192219321942195219621972198219922002201220222032204220522062207220822092210221122122213221422152216221722182219222022212222222322242225222622272228222922302231223222332234223522362237223822392240224122422243224422452246224722482249225022512252225322542255225622572258225922602261226222632264226522662267226822692270227122722273227422752276227
72278227922802281228222832284228522862287228822892290229122922293229422952296229722982299230023012302230323042305230623072308230923102311231223132314231523162317231823192320232123222323232423252326232723282329233023312332233323342335233623372338233923402341234223432344234523462347234823492350235123522353235423552356235723582359236023612362236323642365236623672368236923702371237223732374237523762377237823792380238123822383238423852386238723882389239023912392239323942395239623972398239924002401240224032404240524062407240824092410241124122413241424152416241724182419242024212422242324242425242624272428242924302431243224332434243524362437243824392440244124422443244424452446244724482449245024512452245324542455245624572458245924602461246224632464246524662467246824692470247124722473247424752476247724782479248024812482248324842485248624872488248924902491249224932494249524962497249824992500250125022503250425052506250725082509251025112512251325142515251625172518251925202521252225232524252525262527252825292530253125322533253425352536253725382539254025412542254325442545254625472548254925502551255225532554255525562557255825592560256125622563256425652566256725682569257025712572257325742575257625772578257925802581258225832584258525862587258825892590259125922593259425952596259725982599260026012602260326042605260626072608260926102611261226132614261526162617261826192620262126222623262426252626262726282629263026312632263326342635263626372638263926402641264226432644264526462647264826492650265126522653265426552656265726582659266026612662266326642665266626672668266926702671267226732674267526762677267826792680268126822683268426852686268726882689269026912692269326942695269626972698269927002701270227032704270527062707270827092710271127122713271427152716271727182719272027212722272327242725272627272728272927302731273227332734273527362737273827392740274127422743274427452746274727482749275027512752275327542755275627572758275927602761276227632764276527662767276827692770277127722773277427752776277
72778277927802781278227832784278527862787278827892790279127922793279427952796279727982799280028012802280328042805280628072808280928102811281228132814281528162817281828192820282128222823282428252826282728282829283028312832283328342835283628372838283928402841284228432844284528462847284828492850285128522853285428552856285728582859286028612862286328642865286628672868286928702871287228732874287528762877287828792880288128822883288428852886288728882889289028912892289328942895289628972898289929002901290229032904290529062907290829092910291129122913291429152916291729182919292029212922292329242925292629272928292929302931293229332934293529362937293829392940294129422943294429452946294729482949295029512952295329542955295629572958295929602961296229632964296529662967296829692970297129722973297429752976297729782979298029812982298329842985298629872988298929902991299229932994299529962997299829993000300130023003300430053006300730083009301030113012301330143015301630173018301930203021302230233024302530263027302830293030303130323033303430353036303730383039304030413042304330443045304630473048304930503051305230533054305530563057305830593060306130623063306430653066306730683069307030713072307330743075307630773078307930803081308230833084308530863087308830893090309130923093309430953096309730983099310031013102310331043105310631073108310931103111311231133114311531163117311831193120312131223123312431253126312731283129313031313132313331343135313631373138313931403141314231433144314531463147314831493150315131523153315431553156315731583159316031613162316331643165316631673168316931703171317231733174317531763177317831793180318131823183318431853186318731883189319031913192319331943195319631973198319932003201320232033204320532063207320832093210321132123213321432153216321732183219322032213222322332243225322632273228322932303231323232333234323532363237323832393240324132423243324432453246324732483249325032513252325332543255325632573258325932603261326232633264326532663267326832693270327132723273327432753276327
73278327932803281328232833284328532863287328832893290329132923293329432953296329732983299330033013302330333043305330633073308330933103311331233133314331533163317331833193320332133223323332433253326332733283329333033313332333333343335333633373338333933403341334233433344334533463347334833493350335133523353335433553356335733583359336033613362336333643365336633673368336933703371337233733374337533763377337833793380338133823383338433853386338733883389339033913392339333943395339633973398339934003401340234033404340534063407340834093410341134123413341434153416341734183419342034213422342334243425342634273428342934303431343234333434343534363437343834393440344134423443344434453446344734483449345034513452345334543455345634573458345934603461346234633464346534663467346834693470347134723473347434753476347734783479348034813482348334843485348634873488348934903491349234933494349534963497349834993500350135023503350435053506350735083509351035113512351335143515351635173518351935203521352235233524352535263527352835293530353135323533353435353536353735383539354035413542354335443545354635473548354935503551355235533554355535563557355835593560356135623563356435653566356735683569357035713572357335743575357635773578357935803581358235833584358535863587358835893590359135923593359435953596359735983599360036013602360336043605360636073608360936103611361236133614361536163617361836193620362136223623362436253626362736283629363036313632363336343635363636373638363936403641364236433644364536463647364836493650365136523653365436553656365736583659366036613662366336643665366636673668366936703671367236733674367536763677367836793680368136823683368436853686368736883689369036913692369336943695369636973698369937003701370237033704370537063707370837093710371137123713371437153716371737183719372037213722372337243725372637273728372937303731373237333734373537363737373837393740374137423743374437453746374737483749375037513752375337543755375637573758375937603761376237633764376537663767376837693770377137723773377437753776377
73778377937803781378237833784378537863787378837893790379137923793379437953796379737983799380038013802380338043805380638073808380938103811381238133814
/*****************************************************************************
Copyright (c) 2011-2014, The OpenBLAS Project
All rights reserved.
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are
met:
1. Redistributions of source code must retain the above copyright
notice, this list of conditions and the following disclaimer.
2. Redistributions in binary form must reproduce the above copyright
notice, this list of conditions and the following disclaimer in
the documentation and/or other materials provided with the
distribution.
3. Neither the name of the OpenBLAS project nor the names of
its contributors may be used to endorse or promote products
derived from this software without specific prior written
permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE
LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE
USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
**********************************************************************************/

/*********************************************************************/
/* Copyright 2009, 2010 The University of Texas at Austin. */
/* All rights reserved. */
/* */
/* Redistribution and use in source and binary forms, with or */
/* without modification, are permitted provided that the following */
/* conditions are met: */
/* */
/* 1. Redistributions of source code must retain the above */
/* copyright notice, this list of conditions and the following */
/* disclaimer. */
/* */
/* 2. Redistributions in binary form must reproduce the above */
/* copyright notice, this list of conditions and the following */
/* disclaimer in the documentation and/or other materials */
/* provided with the distribution. */
/* */
/* THIS SOFTWARE IS PROVIDED BY THE UNIVERSITY OF TEXAS AT */
/* AUSTIN ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, */
/* INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF */
/* MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE */
/* DISCLAIMED. IN NO EVENT SHALL THE UNIVERSITY OF TEXAS AT */
/* AUSTIN OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, */
/* INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES */
/* (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE */
/* GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR */
/* BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF */
/* LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT */
/* (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT */
/* OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE */
/* POSSIBILITY OF SUCH DAMAGE. */
/* */
/* The views and conclusions contained in the software and */
/* documentation are those of the authors and should not be */
/* interpreted as representing official policies, either expressed */
/* or implied, of The University of Texas at Austin. */
/*********************************************************************/

#ifndef PARAM_H
#define PARAM_H
#define SBGEMM_DEFAULT_UNROLL_N 4
#define SBGEMM_DEFAULT_UNROLL_M 8
#define SBGEMM_DEFAULT_UNROLL_MN 32
#define SBGEMM_DEFAULT_P 256
#define SBGEMM_DEFAULT_R 256
#define SBGEMM_DEFAULT_Q 256
#ifdef OPTERON
#define SNUMOPT 4
#define DNUMOPT 2
#define GEMM_DEFAULT_OFFSET_A 64
#define GEMM_DEFAULT_OFFSET_B 256
#define GEMM_DEFAULT_ALIGN (BLASLONG)0x01ffffUL
#define SGEMM_DEFAULT_UNROLL_N 4
#define DGEMM_DEFAULT_UNROLL_N 4
#define QGEMM_DEFAULT_UNROLL_N 2
#define CGEMM_DEFAULT_UNROLL_N 2
#define ZGEMM_DEFAULT_UNROLL_N 2
#define XGEMM_DEFAULT_UNROLL_N 1
#ifdef ARCH_X86
#define SGEMM_DEFAULT_UNROLL_M 4
#define DGEMM_DEFAULT_UNROLL_M 2
#define QGEMM_DEFAULT_UNROLL_M 2
#define CGEMM_DEFAULT_UNROLL_M 2
#define ZGEMM_DEFAULT_UNROLL_M 1
#define XGEMM_DEFAULT_UNROLL_M 1
#else
#define SGEMM_DEFAULT_UNROLL_M 8
#define DGEMM_DEFAULT_UNROLL_M 4
#define QGEMM_DEFAULT_UNROLL_M 2
#define CGEMM_DEFAULT_UNROLL_M 4
#define ZGEMM_DEFAULT_UNROLL_M 2
#define XGEMM_DEFAULT_UNROLL_M 1
#endif
#define SGEMM_DEFAULT_P sgemm_p
#define DGEMM_DEFAULT_P dgemm_p
#define QGEMM_DEFAULT_P qgemm_p
#define CGEMM_DEFAULT_P cgemm_p
#define ZGEMM_DEFAULT_P zgemm_p
#define XGEMM_DEFAULT_P xgemm_p
#define SGEMM_DEFAULT_R sgemm_r
#define DGEMM_DEFAULT_R dgemm_r
#define QGEMM_DEFAULT_R qgemm_r
#define CGEMM_DEFAULT_R cgemm_r
#define ZGEMM_DEFAULT_R zgemm_r
#define XGEMM_DEFAULT_R xgemm_r
#ifdef ALLOC_HUGETLB
#define SGEMM_DEFAULT_Q 248
#define DGEMM_DEFAULT_Q 248
#define QGEMM_DEFAULT_Q 248
#define CGEMM_DEFAULT_Q 248
#define ZGEMM_DEFAULT_Q 248
#define XGEMM_DEFAULT_Q 248
#else
#define SGEMM_DEFAULT_Q 240
#define DGEMM_DEFAULT_Q 240
#define QGEMM_DEFAULT_Q 240
#define CGEMM_DEFAULT_Q 240
#define ZGEMM_DEFAULT_Q 240
#define XGEMM_DEFAULT_Q 240
#endif
#define SYMV_P 16
#define HAVE_EXCLUSIVE_CACHE
#endif
#if defined(BARCELONA) || defined(SHANGHAI) || defined(BOBCAT)
#define SNUMOPT 8
#define DNUMOPT 4
#define GEMM_DEFAULT_OFFSET_A 64
#define GEMM_DEFAULT_OFFSET_B 832
#define GEMM_DEFAULT_ALIGN (BLASLONG)0x0fffUL
#define SGEMM_DEFAULT_UNROLL_N 4
#define DGEMM_DEFAULT_UNROLL_N 4
#define QGEMM_DEFAULT_UNROLL_N 2
#define CGEMM_DEFAULT_UNROLL_N 2
#define ZGEMM_DEFAULT_UNROLL_N 2
#define XGEMM_DEFAULT_UNROLL_N 1
#ifdef ARCH_X86
#define SGEMM_DEFAULT_UNROLL_M 4
#define DGEMM_DEFAULT_UNROLL_M 2
#define QGEMM_DEFAULT_UNROLL_M 2
#define CGEMM_DEFAULT_UNROLL_M 2
#define ZGEMM_DEFAULT_UNROLL_M 1
#define XGEMM_DEFAULT_UNROLL_M 1
#else
#define SGEMM_DEFAULT_UNROLL_M 8
#define DGEMM_DEFAULT_UNROLL_M 4
#define QGEMM_DEFAULT_UNROLL_M 2
#define CGEMM_DEFAULT_UNROLL_M 4
#define ZGEMM_DEFAULT_UNROLL_M 2
#define XGEMM_DEFAULT_UNROLL_M 1
#endif
#if 0
#define SGEMM_DEFAULT_P 496
#define DGEMM_DEFAULT_P 248
#define QGEMM_DEFAULT_P 124
#define CGEMM_DEFAULT_P 248
#define ZGEMM_DEFAULT_P 124
#define XGEMM_DEFAULT_P 62
#define SGEMM_DEFAULT_Q 248
#define DGEMM_DEFAULT_Q 248
#define QGEMM_DEFAULT_Q 248
#define CGEMM_DEFAULT_Q 248
#define ZGEMM_DEFAULT_Q 248
#define XGEMM_DEFAULT_Q 248
#else
#define SGEMM_DEFAULT_P 448
#define DGEMM_DEFAULT_P 224
#define QGEMM_DEFAULT_P 112
#define CGEMM_DEFAULT_P 224
#define ZGEMM_DEFAULT_P 112
#define XGEMM_DEFAULT_P 56
#define SGEMM_DEFAULT_Q 224
#define DGEMM_DEFAULT_Q 224
#define QGEMM_DEFAULT_Q 224
#define CGEMM_DEFAULT_Q 224
#define ZGEMM_DEFAULT_Q 224
#define XGEMM_DEFAULT_Q 224
#endif
#define SGEMM_DEFAULT_R sgemm_r
#define QGEMM_DEFAULT_R qgemm_r
#define DGEMM_DEFAULT_R dgemm_r
#define CGEMM_DEFAULT_R cgemm_r
#define ZGEMM_DEFAULT_R zgemm_r
#define XGEMM_DEFAULT_R xgemm_r
#define SYMV_P 16
#define HAVE_EXCLUSIVE_CACHE
#define GEMM_THREAD gemm_thread_mn
#endif
#ifdef BULLDOZER
#define SNUMOPT 8
#define DNUMOPT 4
#define GEMM_DEFAULT_OFFSET_A 64
#define GEMM_DEFAULT_OFFSET_B 832
#define GEMM_DEFAULT_ALIGN (BLASLONG)0x0fffUL
#define QGEMM_DEFAULT_UNROLL_N 2
#define CGEMM_DEFAULT_UNROLL_N 2
#define ZGEMM_DEFAULT_UNROLL_N 2
#define XGEMM_DEFAULT_UNROLL_N 1
#ifdef ARCH_X86
#define SGEMM_DEFAULT_UNROLL_N 4
#define DGEMM_DEFAULT_UNROLL_N 4
#define SGEMM_DEFAULT_UNROLL_M 4
#define DGEMM_DEFAULT_UNROLL_M 2
#define QGEMM_DEFAULT_UNROLL_M 2
#define CGEMM_DEFAULT_UNROLL_M 2
#define ZGEMM_DEFAULT_UNROLL_M 1
#define XGEMM_DEFAULT_UNROLL_M 1
#else
#define SGEMM_DEFAULT_UNROLL_N 2
#define DGEMM_DEFAULT_UNROLL_N 2
#define SGEMM_DEFAULT_UNROLL_M 16
#define DGEMM_DEFAULT_UNROLL_M 8
#define QGEMM_DEFAULT_UNROLL_M 2
#define CGEMM_DEFAULT_UNROLL_M 4
#define ZGEMM_DEFAULT_UNROLL_M 2
#define XGEMM_DEFAULT_UNROLL_M 1
#define CGEMM3M_DEFAULT_UNROLL_N 4
#define CGEMM3M_DEFAULT_UNROLL_M 8
#define ZGEMM3M_DEFAULT_UNROLL_N 4
#define ZGEMM3M_DEFAULT_UNROLL_M 4
#define DGEMM_DEFAULT_UNROLL_MN 16
#define GEMV_UNROLL 8
#endif
#if defined(ARCH_X86_64)
#define SGEMM_DEFAULT_P 768
#define DGEMM_DEFAULT_P 384
#else
#define SGEMM_DEFAULT_P 448
#define DGEMM_DEFAULT_P 224
#endif
#define QGEMM_DEFAULT_P 112
#define CGEMM_DEFAULT_P 224
#define ZGEMM_DEFAULT_P 112
#define XGEMM_DEFAULT_P 56
#if defined(ARCH_X86_64)
#define SGEMM_DEFAULT_Q 168
#define DGEMM_DEFAULT_Q 168
#else
#define SGEMM_DEFAULT_Q 224
#define DGEMM_DEFAULT_Q 224
#endif
#define QGEMM_DEFAULT_Q 224
#define CGEMM_DEFAULT_Q 224
#define ZGEMM_DEFAULT_Q 224
#define XGEMM_DEFAULT_Q 224
#define CGEMM3M_DEFAULT_P 448
#define ZGEMM3M_DEFAULT_P 224
#define XGEMM3M_DEFAULT_P 112
#define CGEMM3M_DEFAULT_Q 224
#define ZGEMM3M_DEFAULT_Q 224
#define XGEMM3M_DEFAULT_Q 224
#define CGEMM3M_DEFAULT_R 12288
#define ZGEMM3M_DEFAULT_R 12288
#define XGEMM3M_DEFAULT_R 12288
#define SGEMM_DEFAULT_R sgemm_r
#define QGEMM_DEFAULT_R qgemm_r
#define DGEMM_DEFAULT_R dgemm_r
#define CGEMM_DEFAULT_R cgemm_r
#define ZGEMM_DEFAULT_R zgemm_r
#define XGEMM_DEFAULT_R xgemm_r
#define SYMV_P 16
#define HAVE_EXCLUSIVE_CACHE
#define GEMM_THREAD gemm_thread_mn
#endif
#ifdef PILEDRIVER
#define SNUMOPT 8
#define DNUMOPT 4
#define GEMM_DEFAULT_OFFSET_A 64
#define GEMM_DEFAULT_OFFSET_B 832
#define GEMM_DEFAULT_ALIGN (BLASLONG)0x0fffUL
#define QGEMM_DEFAULT_UNROLL_N 2
#define CGEMM_DEFAULT_UNROLL_N 2
#define ZGEMM_DEFAULT_UNROLL_N 2
#define XGEMM_DEFAULT_UNROLL_N 1
#ifdef ARCH_X86
#define SGEMM_DEFAULT_UNROLL_N 4
#define DGEMM_DEFAULT_UNROLL_N 4
#define SGEMM_DEFAULT_UNROLL_M 4
#define DGEMM_DEFAULT_UNROLL_M 2
#define QGEMM_DEFAULT_UNROLL_M 2
#define CGEMM_DEFAULT_UNROLL_M 2
#define ZGEMM_DEFAULT_UNROLL_M 1
#define XGEMM_DEFAULT_UNROLL_M 1
#else
#define SGEMM_DEFAULT_UNROLL_N 2
#define DGEMM_DEFAULT_UNROLL_N 2
#define SGEMM_DEFAULT_UNROLL_M 16
#define DGEMM_DEFAULT_UNROLL_M 8
#define QGEMM_DEFAULT_UNROLL_M 2
#define CGEMM_DEFAULT_UNROLL_M 4
#define ZGEMM_DEFAULT_UNROLL_M 2
#define XGEMM_DEFAULT_UNROLL_M 1
#define CGEMM3M_DEFAULT_UNROLL_N 4
#define CGEMM3M_DEFAULT_UNROLL_M 8
#define ZGEMM3M_DEFAULT_UNROLL_N 4
#define ZGEMM3M_DEFAULT_UNROLL_M 4
#define GEMV_UNROLL 8
#endif
#if defined(ARCH_X86_64)
#define SGEMM_DEFAULT_P 768
#define DGEMM_DEFAULT_P 768
#define ZGEMM_DEFAULT_P 384
#define CGEMM_DEFAULT_P 768
#else
#define SGEMM_DEFAULT_P 448
#define DGEMM_DEFAULT_P 480
#define ZGEMM_DEFAULT_P 112
#define CGEMM_DEFAULT_P 224
#endif
#define QGEMM_DEFAULT_P 112
#define XGEMM_DEFAULT_P 56
#if defined(ARCH_X86_64)
#define SGEMM_DEFAULT_Q 192
#define DGEMM_DEFAULT_Q 168
#define ZGEMM_DEFAULT_Q 168
#define CGEMM_DEFAULT_Q 168
#else
#define SGEMM_DEFAULT_Q 224
#define DGEMM_DEFAULT_Q 224
#define ZGEMM_DEFAULT_Q 224
#define CGEMM_DEFAULT_Q 224
#endif
#define QGEMM_DEFAULT_Q 224
#define XGEMM_DEFAULT_Q 224
#define CGEMM3M_DEFAULT_P 448
#define ZGEMM3M_DEFAULT_P 224
#define XGEMM3M_DEFAULT_P 112
#define CGEMM3M_DEFAULT_Q 224
#define ZGEMM3M_DEFAULT_Q 224
#define XGEMM3M_DEFAULT_Q 224
#define CGEMM3M_DEFAULT_R 12288
#define ZGEMM3M_DEFAULT_R 12288
#define XGEMM3M_DEFAULT_R 12288
#define SGEMM_DEFAULT_R 12288
#define QGEMM_DEFAULT_R qgemm_r
#define DGEMM_DEFAULT_R 12288
#define CGEMM_DEFAULT_R cgemm_r
#define ZGEMM_DEFAULT_R zgemm_r
#define XGEMM_DEFAULT_R xgemm_r
#define SYMV_P 16
#define HAVE_EXCLUSIVE_CACHE
#define GEMM_THREAD gemm_thread_mn
#endif
#ifdef STEAMROLLER
#define SNUMOPT 8
#define DNUMOPT 4
#define GEMM_DEFAULT_OFFSET_A 64
#define GEMM_DEFAULT_OFFSET_B 832
#define GEMM_DEFAULT_ALIGN (BLASLONG)0x0fffUL
#define QGEMM_DEFAULT_UNROLL_N 2
#define CGEMM_DEFAULT_UNROLL_N 2
#define ZGEMM_DEFAULT_UNROLL_N 2
#define XGEMM_DEFAULT_UNROLL_N 1
#ifdef ARCH_X86
#define SGEMM_DEFAULT_UNROLL_N 4
#define DGEMM_DEFAULT_UNROLL_N 4
#define SGEMM_DEFAULT_UNROLL_M 4
#define DGEMM_DEFAULT_UNROLL_M 2
#define QGEMM_DEFAULT_UNROLL_M 2
#define CGEMM_DEFAULT_UNROLL_M 2
#define ZGEMM_DEFAULT_UNROLL_M 1
#define XGEMM_DEFAULT_UNROLL_M 1
#else
#define SGEMM_DEFAULT_UNROLL_N 2
#define DGEMM_DEFAULT_UNROLL_N 2
#define SGEMM_DEFAULT_UNROLL_M 16
#define DGEMM_DEFAULT_UNROLL_M 8
#define QGEMM_DEFAULT_UNROLL_M 2
#define CGEMM_DEFAULT_UNROLL_M 4
#define ZGEMM_DEFAULT_UNROLL_M 2
#define XGEMM_DEFAULT_UNROLL_M 1
#define CGEMM3M_DEFAULT_UNROLL_N 4
#define CGEMM3M_DEFAULT_UNROLL_M 8
#define ZGEMM3M_DEFAULT_UNROLL_N 4
#define ZGEMM3M_DEFAULT_UNROLL_M 4
#define GEMV_UNROLL 8
#endif
#if defined(ARCH_X86_64)
#define SGEMM_DEFAULT_P 768
#define DGEMM_DEFAULT_P 576
#define ZGEMM_DEFAULT_P 288
#define CGEMM_DEFAULT_P 576
#else
#define SGEMM_DEFAULT_P 448
#define DGEMM_DEFAULT_P 480
#define ZGEMM_DEFAULT_P 112
#define CGEMM_DEFAULT_P 224
#endif
#define QGEMM_DEFAULT_P 112
#define XGEMM_DEFAULT_P 56
#if defined(ARCH_X86_64)
#define SGEMM_DEFAULT_Q 192
#define DGEMM_DEFAULT_Q 160
#define ZGEMM_DEFAULT_Q 160
#define CGEMM_DEFAULT_Q 160
#else
#define SGEMM_DEFAULT_Q 224
#define DGEMM_DEFAULT_Q 224
#define ZGEMM_DEFAULT_Q 224
#define CGEMM_DEFAULT_Q 224
#endif
#define QGEMM_DEFAULT_Q 224
#define XGEMM_DEFAULT_Q 224
#define CGEMM3M_DEFAULT_P 448
#define ZGEMM3M_DEFAULT_P 224
#define XGEMM3M_DEFAULT_P 112
#define CGEMM3M_DEFAULT_Q 224
#define ZGEMM3M_DEFAULT_Q 224
#define XGEMM3M_DEFAULT_Q 224
#define CGEMM3M_DEFAULT_R 12288
#define ZGEMM3M_DEFAULT_R 12288
#define XGEMM3M_DEFAULT_R 12288
#define SGEMM_DEFAULT_R 12288
#define QGEMM_DEFAULT_R qgemm_r
#define DGEMM_DEFAULT_R 12288
#define CGEMM_DEFAULT_R cgemm_r
#define ZGEMM_DEFAULT_R zgemm_r
#define XGEMM_DEFAULT_R xgemm_r
#define SYMV_P 16
#define HAVE_EXCLUSIVE_CACHE
#define GEMM_THREAD gemm_thread_mn
#endif
#ifdef EXCAVATOR
#define SNUMOPT 8
#define DNUMOPT 4
#define GEMM_DEFAULT_OFFSET_A 64
#define GEMM_DEFAULT_OFFSET_B 832
#define GEMM_DEFAULT_ALIGN (BLASLONG)0x0fffUL
#define QGEMM_DEFAULT_UNROLL_N 2
#define CGEMM_DEFAULT_UNROLL_N 2
#define ZGEMM_DEFAULT_UNROLL_N 2
#define XGEMM_DEFAULT_UNROLL_N 1
#ifdef ARCH_X86
#define SGEMM_DEFAULT_UNROLL_N 4
#define DGEMM_DEFAULT_UNROLL_N 4
#define SGEMM_DEFAULT_UNROLL_M 4
#define DGEMM_DEFAULT_UNROLL_M 2
#define QGEMM_DEFAULT_UNROLL_M 2
#define CGEMM_DEFAULT_UNROLL_M 2
#define ZGEMM_DEFAULT_UNROLL_M 1
#define XGEMM_DEFAULT_UNROLL_M 1
#else
#define SGEMM_DEFAULT_UNROLL_N 2
#define DGEMM_DEFAULT_UNROLL_N 2
#define SGEMM_DEFAULT_UNROLL_M 16
#define DGEMM_DEFAULT_UNROLL_M 8
#define QGEMM_DEFAULT_UNROLL_M 2
#define CGEMM_DEFAULT_UNROLL_M 4
#define ZGEMM_DEFAULT_UNROLL_M 2
#define XGEMM_DEFAULT_UNROLL_M 1
#define CGEMM3M_DEFAULT_UNROLL_N 4
#define CGEMM3M_DEFAULT_UNROLL_M 8
#define ZGEMM3M_DEFAULT_UNROLL_N 4
#define ZGEMM3M_DEFAULT_UNROLL_M 4
#define GEMV_UNROLL 8
#endif
#if defined(ARCH_X86_64)
#define SGEMM_DEFAULT_P 768
#define DGEMM_DEFAULT_P 576
#define ZGEMM_DEFAULT_P 288
#define CGEMM_DEFAULT_P 576
#else
#define SGEMM_DEFAULT_P 448
#define DGEMM_DEFAULT_P 480
#define ZGEMM_DEFAULT_P 112
#define CGEMM_DEFAULT_P 224
#endif
#define QGEMM_DEFAULT_P 112
#define XGEMM_DEFAULT_P 56
#if defined(ARCH_X86_64)
#define SGEMM_DEFAULT_Q 192
#define DGEMM_DEFAULT_Q 160
#define ZGEMM_DEFAULT_Q 160
#define CGEMM_DEFAULT_Q 160
#else
#define SGEMM_DEFAULT_Q 224
#define DGEMM_DEFAULT_Q 224
#define ZGEMM_DEFAULT_Q 224
#define CGEMM_DEFAULT_Q 224
#endif
#define QGEMM_DEFAULT_Q 224
#define XGEMM_DEFAULT_Q 224
#define CGEMM3M_DEFAULT_P 448
#define ZGEMM3M_DEFAULT_P 224
#define XGEMM3M_DEFAULT_P 112
#define CGEMM3M_DEFAULT_Q 224
#define ZGEMM3M_DEFAULT_Q 224
#define XGEMM3M_DEFAULT_Q 224
#define CGEMM3M_DEFAULT_R 12288
#define ZGEMM3M_DEFAULT_R 12288
#define XGEMM3M_DEFAULT_R 12288
#define SGEMM_DEFAULT_R 12288
#define QGEMM_DEFAULT_R qgemm_r
#define DGEMM_DEFAULT_R 12288
#define CGEMM_DEFAULT_R cgemm_r
#define ZGEMM_DEFAULT_R zgemm_r
#define XGEMM_DEFAULT_R xgemm_r
#define SYMV_P 16
#define HAVE_EXCLUSIVE_CACHE
#define GEMM_THREAD gemm_thread_mn
#endif
#ifdef ZEN
#define SNUMOPT 16
#define DNUMOPT 8
#define GEMM_DEFAULT_OFFSET_A 0
#define GEMM_DEFAULT_OFFSET_B 0
#define GEMM_DEFAULT_ALIGN (BLASLONG)0x03fffUL
#define SYMV_P 8
#define SWITCH_RATIO 16
#ifdef ARCH_X86
#define SGEMM_DEFAULT_UNROLL_M 4
#define DGEMM_DEFAULT_UNROLL_M 2
#define QGEMM_DEFAULT_UNROLL_M 2
#define CGEMM_DEFAULT_UNROLL_M 2
#define ZGEMM_DEFAULT_UNROLL_M 1
#define XGEMM_DEFAULT_UNROLL_M 1
#define SGEMM_DEFAULT_UNROLL_N 4
#define DGEMM_DEFAULT_UNROLL_N 4
#define QGEMM_DEFAULT_UNROLL_N 2
#define CGEMM_DEFAULT_UNROLL_N 2
#define ZGEMM_DEFAULT_UNROLL_N 2
#define XGEMM_DEFAULT_UNROLL_N 1
#else
#define SGEMM_DEFAULT_UNROLL_M 8
#define DGEMM_DEFAULT_UNROLL_M 4
#define QGEMM_DEFAULT_UNROLL_M 2
#define CGEMM_DEFAULT_UNROLL_M 8
#define ZGEMM_DEFAULT_UNROLL_M 4
#define XGEMM_DEFAULT_UNROLL_M 1
#define SGEMM_DEFAULT_UNROLL_N 4
#define DGEMM_DEFAULT_UNROLL_N 8
#define QGEMM_DEFAULT_UNROLL_N 2
#define CGEMM_DEFAULT_UNROLL_N 2
#define ZGEMM_DEFAULT_UNROLL_N 2
#define XGEMM_DEFAULT_UNROLL_N 1
/*
#define SGEMM_DEFAULT_UNROLL_MN 32
#define DGEMM_DEFAULT_UNROLL_MN 32
*/
#endif
#ifdef ARCH_X86
#define SGEMM_DEFAULT_P 512
#define SGEMM_DEFAULT_R sgemm_r
#define DGEMM_DEFAULT_P 512
#define DGEMM_DEFAULT_R dgemm_r
#define QGEMM_DEFAULT_P 504
#define QGEMM_DEFAULT_R qgemm_r
#define CGEMM_DEFAULT_P 128
#define CGEMM_DEFAULT_R 1024
#define ZGEMM_DEFAULT_P 512
#define ZGEMM_DEFAULT_R zgemm_r
#define XGEMM_DEFAULT_P 252
#define XGEMM_DEFAULT_R xgemm_r
#define SGEMM_DEFAULT_Q 256
#define DGEMM_DEFAULT_Q 256
#define QGEMM_DEFAULT_Q 128
#define CGEMM_DEFAULT_Q 256
#define ZGEMM_DEFAULT_Q 192
#define XGEMM_DEFAULT_Q 128
#else
#define SGEMM_DEFAULT_P 320
#define DGEMM_DEFAULT_P 512
#define CGEMM_DEFAULT_P 256
#define ZGEMM_DEFAULT_P 192
#ifdef WINDOWS_ABI
#define SGEMM_DEFAULT_Q 320
#define DGEMM_DEFAULT_Q 128
#else
#define SGEMM_DEFAULT_Q 320
#define DGEMM_DEFAULT_Q 256
#endif
#define CGEMM_DEFAULT_Q 256
#define ZGEMM_DEFAULT_Q 192
#define SGEMM_DEFAULT_R sgemm_r
#define DGEMM_DEFAULT_R 13824
#define CGEMM_DEFAULT_R cgemm_r
#define ZGEMM_DEFAULT_R zgemm_r
#define QGEMM_DEFAULT_Q 128
#define QGEMM_DEFAULT_P 504
#define QGEMM_DEFAULT_R qgemm_r
#define XGEMM_DEFAULT_P 252
#define XGEMM_DEFAULT_R xgemm_r
#define XGEMM_DEFAULT_Q 128
#define CGEMM3M_DEFAULT_UNROLL_N 4
#define CGEMM3M_DEFAULT_UNROLL_M 8
#define ZGEMM3M_DEFAULT_UNROLL_N 4
#define ZGEMM3M_DEFAULT_UNROLL_M 4
#define CGEMM3M_DEFAULT_P 320
#define ZGEMM3M_DEFAULT_P 256
#define XGEMM3M_DEFAULT_P 112
#define CGEMM3M_DEFAULT_Q 320
#define ZGEMM3M_DEFAULT_Q 256
#define XGEMM3M_DEFAULT_Q 224
#define CGEMM3M_DEFAULT_R 12288
#define ZGEMM3M_DEFAULT_R 12288
#define XGEMM3M_DEFAULT_R 12288
#endif
#endif
#ifdef ATHLON
#define SNUMOPT 4
#define DNUMOPT 2
#define GEMM_DEFAULT_OFFSET_A 0
#define GEMM_DEFAULT_OFFSET_B 384
#define GEMM_DEFAULT_ALIGN (BLASLONG)0x0ffffUL
#define SGEMM_DEFAULT_UNROLL_N 4
#define DGEMM_DEFAULT_UNROLL_N 4
#define QGEMM_DEFAULT_UNROLL_N 2
#define CGEMM_DEFAULT_UNROLL_N 2
#define ZGEMM_DEFAULT_UNROLL_N 2
#define XGEMM_DEFAULT_UNROLL_N 1
#define SGEMM_DEFAULT_UNROLL_M 2
#define DGEMM_DEFAULT_UNROLL_M 1
#define QGEMM_DEFAULT_UNROLL_M 2
#define CGEMM_DEFAULT_UNROLL_M 1
#define ZGEMM_DEFAULT_UNROLL_M 1
#define XGEMM_DEFAULT_UNROLL_M 1
#define SGEMM_DEFAULT_R sgemm_r
#define DGEMM_DEFAULT_R dgemm_r
#define QGEMM_DEFAULT_R qgemm_r
#define CGEMM_DEFAULT_R cgemm_r
#define ZGEMM_DEFAULT_R zgemm_r
#define XGEMM_DEFAULT_R xgemm_r
#define SGEMM_DEFAULT_P 208
#define DGEMM_DEFAULT_P 104
#define QGEMM_DEFAULT_P 56
#define CGEMM_DEFAULT_P 104
#define ZGEMM_DEFAULT_P 56
#define XGEMM_DEFAULT_P 28
#define SGEMM_DEFAULT_Q 208
#define DGEMM_DEFAULT_Q 208
#define QGEMM_DEFAULT_Q 208
#define CGEMM_DEFAULT_Q 208
#define ZGEMM_DEFAULT_Q 208
#define XGEMM_DEFAULT_Q 208
#define SYMV_P 16
#define HAVE_EXCLUSIVE_CACHE
#endif
#ifdef VIAC3
#define SNUMOPT 2
#define DNUMOPT 1
#define GEMM_DEFAULT_OFFSET_A 0
#define GEMM_DEFAULT_OFFSET_B 256
#define GEMM_DEFAULT_ALIGN (BLASLONG)0x0ffffUL
#define SGEMM_DEFAULT_UNROLL_N 4
#define DGEMM_DEFAULT_UNROLL_N 4
#define QGEMM_DEFAULT_UNROLL_N 2
#define CGEMM_DEFAULT_UNROLL_N 2
#define ZGEMM_DEFAULT_UNROLL_N 2
#define XGEMM_DEFAULT_UNROLL_N 1
#define SGEMM_DEFAULT_UNROLL_M 2
#define DGEMM_DEFAULT_UNROLL_M 1
#define QGEMM_DEFAULT_UNROLL_M 2
#define CGEMM_DEFAULT_UNROLL_M 1
#define ZGEMM_DEFAULT_UNROLL_M 1
#define XGEMM_DEFAULT_UNROLL_M 1
#define SGEMM_DEFAULT_R sgemm_r
#define DGEMM_DEFAULT_R dgemm_r
#define QGEMM_DEFAULT_R qgemm_r
#define CGEMM_DEFAULT_R cgemm_r
#define ZGEMM_DEFAULT_R zgemm_r
#define XGEMM_DEFAULT_R xgemm_r
#define SGEMM_DEFAULT_P 128
#define DGEMM_DEFAULT_P 128
#define QGEMM_DEFAULT_P 128
#define CGEMM_DEFAULT_P 128
#define ZGEMM_DEFAULT_P 128
#define XGEMM_DEFAULT_P 128
#define SGEMM_DEFAULT_Q 512
#define DGEMM_DEFAULT_Q 256
#define QGEMM_DEFAULT_Q 256
#define CGEMM_DEFAULT_Q 256
#define ZGEMM_DEFAULT_Q 128
#define XGEMM_DEFAULT_Q 128
#define SYMV_P 16
#endif
#ifdef NANO
#define SNUMOPT 4
#define DNUMOPT 2
#define GEMM_DEFAULT_OFFSET_A 64
#define GEMM_DEFAULT_OFFSET_B 256
#define GEMM_DEFAULT_ALIGN (BLASLONG)0x01ffffUL
#ifdef ARCH_X86
#define SGEMM_DEFAULT_UNROLL_N 4
#define DGEMM_DEFAULT_UNROLL_N 4
#define QGEMM_DEFAULT_UNROLL_N 2
#define CGEMM_DEFAULT_UNROLL_N 2
#define ZGEMM_DEFAULT_UNROLL_N 2
#define XGEMM_DEFAULT_UNROLL_N 1
#define SGEMM_DEFAULT_UNROLL_M 4
#define DGEMM_DEFAULT_UNROLL_M 2
#define QGEMM_DEFAULT_UNROLL_M 2
#define CGEMM_DEFAULT_UNROLL_M 2
#define ZGEMM_DEFAULT_UNROLL_M 1
#define XGEMM_DEFAULT_UNROLL_M 1
#else
#define SGEMM_DEFAULT_UNROLL_N 8
#define DGEMM_DEFAULT_UNROLL_N 4
#define QGEMM_DEFAULT_UNROLL_N 2
#define CGEMM_DEFAULT_UNROLL_N 4
#define ZGEMM_DEFAULT_UNROLL_N 2
#define XGEMM_DEFAULT_UNROLL_N 1
#define SGEMM_DEFAULT_UNROLL_M 4
#define DGEMM_DEFAULT_UNROLL_M 4
#define QGEMM_DEFAULT_UNROLL_M 2
#define CGEMM_DEFAULT_UNROLL_M 2
#define ZGEMM_DEFAULT_UNROLL_M 2
#define XGEMM_DEFAULT_UNROLL_M 1
#endif
#define SGEMM_DEFAULT_P 288
#define DGEMM_DEFAULT_P 288
#define QGEMM_DEFAULT_P 288
#define CGEMM_DEFAULT_P 288
#define ZGEMM_DEFAULT_P 288
#define XGEMM_DEFAULT_P 288
#define SGEMM_DEFAULT_R sgemm_r
#define DGEMM_DEFAULT_R dgemm_r
#define QGEMM_DEFAULT_R qgemm_r
#define CGEMM_DEFAULT_R cgemm_r
#define ZGEMM_DEFAULT_R zgemm_r
#define XGEMM_DEFAULT_R xgemm_r
#define SGEMM_DEFAULT_Q 256
#define DGEMM_DEFAULT_Q 128
#define QGEMM_DEFAULT_Q 64
#define CGEMM_DEFAULT_Q 128
#define ZGEMM_DEFAULT_Q 64
#define XGEMM_DEFAULT_Q 32
#define SYMV_P 16
#define HAVE_EXCLUSIVE_CACHE
#endif
#if defined(PENTIUM) || defined(PENTIUM2) || defined(PENTIUM3)
#ifdef HAVE_SSE
#define SNUMOPT 2
#else
#define SNUMOPT 1
#endif
#define DNUMOPT 1
#define GEMM_DEFAULT_OFFSET_A 0
#define GEMM_DEFAULT_OFFSET_B 0
#define GEMM_DEFAULT_ALIGN (BLASLONG)0x0ffffUL
#ifdef HAVE_SSE
#define SGEMM_DEFAULT_UNROLL_M 8
#define CGEMM_DEFAULT_UNROLL_M 4
#else
#define SGEMM_DEFAULT_UNROLL_M 4
#define CGEMM_DEFAULT_UNROLL_M 2
#endif
#define DGEMM_DEFAULT_UNROLL_M 2
#define SGEMM_DEFAULT_UNROLL_N 2
#define DGEMM_DEFAULT_UNROLL_N 2
#define QGEMM_DEFAULT_UNROLL_M 2
#define QGEMM_DEFAULT_UNROLL_N 2
#define CGEMM_DEFAULT_UNROLL_N 1
#define ZGEMM_DEFAULT_UNROLL_M 1
#define ZGEMM_DEFAULT_UNROLL_N 1
#define XGEMM_DEFAULT_UNROLL_M 1
#define XGEMM_DEFAULT_UNROLL_N 1
#define SGEMM_DEFAULT_P sgemm_p
#define SGEMM_DEFAULT_Q 256
#define SGEMM_DEFAULT_R sgemm_r
#define DGEMM_DEFAULT_P dgemm_p
#define DGEMM_DEFAULT_Q 256
#define DGEMM_DEFAULT_R dgemm_r
#define QGEMM_DEFAULT_P qgemm_p
#define QGEMM_DEFAULT_Q 256
#define QGEMM_DEFAULT_R qgemm_r
#define CGEMM_DEFAULT_P cgemm_p
#define CGEMM_DEFAULT_Q 256
#define CGEMM_DEFAULT_R cgemm_r
#define ZGEMM_DEFAULT_P zgemm_p
#define ZGEMM_DEFAULT_Q 256
#define ZGEMM_DEFAULT_R zgemm_r
#define XGEMM_DEFAULT_P xgemm_p
#define XGEMM_DEFAULT_Q 256
#define XGEMM_DEFAULT_R xgemm_r
#define SYMV_P 4
#endif
#ifdef PENTIUMM
#define SNUMOPT 2
#define DNUMOPT 1
#define GEMM_DEFAULT_OFFSET_A 0
#define GEMM_DEFAULT_OFFSET_B 0
#define GEMM_DEFAULT_ALIGN (BLASLONG)0x0ffffUL
#ifdef CORE_YONAH
#define SGEMM_DEFAULT_UNROLL_M 4
#define SGEMM_DEFAULT_UNROLL_N 4
#define DGEMM_DEFAULT_UNROLL_M 2
#define DGEMM_DEFAULT_UNROLL_N 4
#define QGEMM_DEFAULT_UNROLL_M 2
#define QGEMM_DEFAULT_UNROLL_N 2
#define CGEMM_DEFAULT_UNROLL_M 2
#define CGEMM_DEFAULT_UNROLL_N 2
#define ZGEMM_DEFAULT_UNROLL_M 1
#define ZGEMM_DEFAULT_UNROLL_N 2
#define XGEMM_DEFAULT_UNROLL_M 1
#define XGEMM_DEFAULT_UNROLL_N 1
#else
#define SGEMM_DEFAULT_UNROLL_M 8
#define SGEMM_DEFAULT_UNROLL_N 2
#define DGEMM_DEFAULT_UNROLL_M 2
#define DGEMM_DEFAULT_UNROLL_N 2
#define QGEMM_DEFAULT_UNROLL_M 2
#define QGEMM_DEFAULT_UNROLL_N 2
#define CGEMM_DEFAULT_UNROLL_M 4
#define CGEMM_DEFAULT_UNROLL_N 1
#define ZGEMM_DEFAULT_UNROLL_M 1
#define ZGEMM_DEFAULT_UNROLL_N 1
#define XGEMM_DEFAULT_UNROLL_M 1
#define XGEMM_DEFAULT_UNROLL_N 1
#endif
#define SGEMM_DEFAULT_P sgemm_p
#define SGEMM_DEFAULT_Q 256
#define SGEMM_DEFAULT_R sgemm_r
#define DGEMM_DEFAULT_P dgemm_p
#define DGEMM_DEFAULT_Q 256
#define DGEMM_DEFAULT_R dgemm_r
#define QGEMM_DEFAULT_P qgemm_p
#define QGEMM_DEFAULT_Q 256
#define QGEMM_DEFAULT_R qgemm_r
#define CGEMM_DEFAULT_P cgemm_p
#define CGEMM_DEFAULT_Q 256
#define CGEMM_DEFAULT_R cgemm_r
#define ZGEMM_DEFAULT_P zgemm_p
#define ZGEMM_DEFAULT_Q 256
#define ZGEMM_DEFAULT_R zgemm_r
#define XGEMM_DEFAULT_P xgemm_p
#define XGEMM_DEFAULT_Q 256
#define XGEMM_DEFAULT_R xgemm_r
#define SYMV_P 4
#endif
#ifdef CORE_NORTHWOOD
#define SNUMOPT 4
#define DNUMOPT 2
#define GEMM_DEFAULT_OFFSET_A 0
#define GEMM_DEFAULT_OFFSET_B 32
#define GEMM_DEFAULT_ALIGN (BLASLONG)0x0ffffUL
#define SYMV_P 8
#define SGEMM_DEFAULT_UNROLL_M 8
#define DGEMM_DEFAULT_UNROLL_M 4
#define QGEMM_DEFAULT_UNROLL_M 2
#define CGEMM_DEFAULT_UNROLL_M 4
#define ZGEMM_DEFAULT_UNROLL_M 2
#define XGEMM_DEFAULT_UNROLL_M 1
#define SGEMM_DEFAULT_UNROLL_N 2
#define DGEMM_DEFAULT_UNROLL_N 2
#define QGEMM_DEFAULT_UNROLL_N 2
#define CGEMM_DEFAULT_UNROLL_N 1
#define ZGEMM_DEFAULT_UNROLL_N 1
#define XGEMM_DEFAULT_UNROLL_N 1
#define SGEMM_DEFAULT_P sgemm_p
#define SGEMM_DEFAULT_R sgemm_r
#define DGEMM_DEFAULT_P dgemm_p
#define DGEMM_DEFAULT_R dgemm_r
#define QGEMM_DEFAULT_P qgemm_p
#define QGEMM_DEFAULT_R qgemm_r
#define CGEMM_DEFAULT_P cgemm_p
#define CGEMM_DEFAULT_R cgemm_r
#define ZGEMM_DEFAULT_P zgemm_p
#define ZGEMM_DEFAULT_R zgemm_r
#define XGEMM_DEFAULT_P xgemm_p
#define XGEMM_DEFAULT_R xgemm_r
#define SGEMM_DEFAULT_Q 128
#define DGEMM_DEFAULT_Q 128
#define QGEMM_DEFAULT_Q 128
#define CGEMM_DEFAULT_Q 128
#define ZGEMM_DEFAULT_Q 128
#define XGEMM_DEFAULT_Q 128
#endif
#ifdef CORE_PRESCOTT
#define SNUMOPT 4
#define DNUMOPT 2
#ifndef __64BIT__
#define GEMM_DEFAULT_OFFSET_A 128
#define GEMM_DEFAULT_OFFSET_B 192
#else
#define GEMM_DEFAULT_OFFSET_A 0
#define GEMM_DEFAULT_OFFSET_B 256
#endif
#define GEMM_DEFAULT_ALIGN (BLASLONG)0x0ffffUL
#define SYMV_P 8
#ifdef ARCH_X86
#define SGEMM_DEFAULT_UNROLL_M 4
#define DGEMM_DEFAULT_UNROLL_M 2
#define QGEMM_DEFAULT_UNROLL_M 2
#define CGEMM_DEFAULT_UNROLL_M 2
#define ZGEMM_DEFAULT_UNROLL_M 1
#define XGEMM_DEFAULT_UNROLL_M 1
#else
#define SGEMM_DEFAULT_UNROLL_M 8
#define DGEMM_DEFAULT_UNROLL_M 4
#define QGEMM_DEFAULT_UNROLL_M 2
#define CGEMM_DEFAULT_UNROLL_M 4
#define ZGEMM_DEFAULT_UNROLL_M 2
#define XGEMM_DEFAULT_UNROLL_M 1
#endif
#define SGEMM_DEFAULT_UNROLL_N 4
#define DGEMM_DEFAULT_UNROLL_N 4
#define QGEMM_DEFAULT_UNROLL_N 2
#define CGEMM_DEFAULT_UNROLL_N 2
#define ZGEMM_DEFAULT_UNROLL_N 2
#define XGEMM_DEFAULT_UNROLL_N 1
#define SGEMM_DEFAULT_P sgemm_p
#define SGEMM_DEFAULT_R sgemm_r
#define DGEMM_DEFAULT_P dgemm_p
#define DGEMM_DEFAULT_R dgemm_r
#define QGEMM_DEFAULT_P qgemm_p
#define QGEMM_DEFAULT_R qgemm_r
#define CGEMM_DEFAULT_P cgemm_p
#define CGEMM_DEFAULT_R cgemm_r
#define ZGEMM_DEFAULT_P zgemm_p
#define ZGEMM_DEFAULT_R zgemm_r
#define XGEMM_DEFAULT_P xgemm_p
#define XGEMM_DEFAULT_R xgemm_r
#define SGEMM_DEFAULT_Q 128
#define DGEMM_DEFAULT_Q 128
#define QGEMM_DEFAULT_Q 128
#define CGEMM_DEFAULT_Q 128
#define ZGEMM_DEFAULT_Q 128
#define XGEMM_DEFAULT_Q 128
#endif
#ifdef CORE2
#define SNUMOPT 8
#define DNUMOPT 4
#define GEMM_DEFAULT_OFFSET_A 448
#define GEMM_DEFAULT_OFFSET_B 128
#define GEMM_DEFAULT_ALIGN (BLASLONG)0x03fffUL
#define SYMV_P 8
#define SWITCH_RATIO 4
#ifdef ARCH_X86
#define SGEMM_DEFAULT_UNROLL_M 8
#define DGEMM_DEFAULT_UNROLL_M 4
#define QGEMM_DEFAULT_UNROLL_M 2
#define CGEMM_DEFAULT_UNROLL_M 4
#define ZGEMM_DEFAULT_UNROLL_M 2
#define XGEMM_DEFAULT_UNROLL_M 1
#define SGEMM_DEFAULT_UNROLL_N 2
#define DGEMM_DEFAULT_UNROLL_N 2
#define QGEMM_DEFAULT_UNROLL_N 2
#define CGEMM_DEFAULT_UNROLL_N 1
#define ZGEMM_DEFAULT_UNROLL_N 1
#define XGEMM_DEFAULT_UNROLL_N 1
/* Round a up to the next multiple of b (non-negative a, positive b). */
#define MASK(a, b) ((((a) + (b) - 1) / (b)) * (b))
#else
#define SGEMM_DEFAULT_UNROLL_M 8
#define DGEMM_DEFAULT_UNROLL_M 4
#define QGEMM_DEFAULT_UNROLL_M 2
#define CGEMM_DEFAULT_UNROLL_M 4
#define ZGEMM_DEFAULT_UNROLL_M 2
#define XGEMM_DEFAULT_UNROLL_M 1
#define SGEMM_DEFAULT_UNROLL_N 4
#define DGEMM_DEFAULT_UNROLL_N 4
#define QGEMM_DEFAULT_UNROLL_N 2
#define CGEMM_DEFAULT_UNROLL_N 2
#define ZGEMM_DEFAULT_UNROLL_N 2
#define XGEMM_DEFAULT_UNROLL_N 1
#endif
#define SGEMM_DEFAULT_P sgemm_p
#define SGEMM_DEFAULT_R sgemm_r
#define DGEMM_DEFAULT_P dgemm_p
#define DGEMM_DEFAULT_R dgemm_r
#define QGEMM_DEFAULT_P qgemm_p
#define QGEMM_DEFAULT_R qgemm_r
#define CGEMM_DEFAULT_P cgemm_p
#define CGEMM_DEFAULT_R cgemm_r
#define ZGEMM_DEFAULT_P zgemm_p
#define ZGEMM_DEFAULT_R zgemm_r
#define XGEMM_DEFAULT_P xgemm_p
#define XGEMM_DEFAULT_R xgemm_r
#define SGEMM_DEFAULT_Q 256
#define DGEMM_DEFAULT_Q 256
#define QGEMM_DEFAULT_Q 256
#define CGEMM_DEFAULT_Q 256
#define ZGEMM_DEFAULT_Q 256
#define XGEMM_DEFAULT_Q 256
#endif
#ifdef PENRYN
#define SNUMOPT 8
#define DNUMOPT 4
#define GEMM_DEFAULT_OFFSET_A 128
#define GEMM_DEFAULT_OFFSET_B 0
#define GEMM_DEFAULT_ALIGN (BLASLONG)0x03fffUL
#define SYMV_P 8
#define SWITCH_RATIO 4
#ifdef ARCH_X86
#define SGEMM_DEFAULT_UNROLL_M 4
#define DGEMM_DEFAULT_UNROLL_M 2
#define QGEMM_DEFAULT_UNROLL_M 2
#define CGEMM_DEFAULT_UNROLL_M 2
#define ZGEMM_DEFAULT_UNROLL_M 1
#define XGEMM_DEFAULT_UNROLL_M 1
#define SGEMM_DEFAULT_UNROLL_N 4
#define DGEMM_DEFAULT_UNROLL_N 4
#define QGEMM_DEFAULT_UNROLL_N 2
#define CGEMM_DEFAULT_UNROLL_N 2
#define ZGEMM_DEFAULT_UNROLL_N 2
#define XGEMM_DEFAULT_UNROLL_N 1
#else
#define SGEMM_DEFAULT_UNROLL_M 8
#define DGEMM_DEFAULT_UNROLL_M 4
#define QGEMM_DEFAULT_UNROLL_M 2
#define CGEMM_DEFAULT_UNROLL_M 4
#define ZGEMM_DEFAULT_UNROLL_M 2
#define XGEMM_DEFAULT_UNROLL_M 1
#define SGEMM_DEFAULT_UNROLL_N 4
#define DGEMM_DEFAULT_UNROLL_N 4
#define QGEMM_DEFAULT_UNROLL_N 2
#define CGEMM_DEFAULT_UNROLL_N 2
#define ZGEMM_DEFAULT_UNROLL_N 2
#define XGEMM_DEFAULT_UNROLL_N 1
#endif
#define SGEMM_DEFAULT_P sgemm_p
#define SGEMM_DEFAULT_R sgemm_r
#define DGEMM_DEFAULT_P dgemm_p
#define DGEMM_DEFAULT_R dgemm_r
#define QGEMM_DEFAULT_P qgemm_p
#define QGEMM_DEFAULT_R qgemm_r
#define CGEMM_DEFAULT_P cgemm_p
#define CGEMM_DEFAULT_R cgemm_r
#define ZGEMM_DEFAULT_P zgemm_p
#define ZGEMM_DEFAULT_R zgemm_r
#define XGEMM_DEFAULT_P xgemm_p
#define XGEMM_DEFAULT_R xgemm_r
#define SGEMM_DEFAULT_Q 512
#define DGEMM_DEFAULT_Q 256
#define QGEMM_DEFAULT_Q 128
#define CGEMM_DEFAULT_Q 512
#define ZGEMM_DEFAULT_Q 256
#define XGEMM_DEFAULT_Q 128
#define GETRF_FACTOR 0.75
#endif
#ifdef DUNNINGTON
#define SNUMOPT 8
#define DNUMOPT 4
#define GEMM_DEFAULT_OFFSET_A 128
#define GEMM_DEFAULT_OFFSET_B 0
#define GEMM_DEFAULT_ALIGN (BLASLONG)0x03fffUL
#define SYMV_P 8
#define SWITCH_RATIO 4
#ifdef ARCH_X86
#define SGEMM_DEFAULT_UNROLL_M 4
#define DGEMM_DEFAULT_UNROLL_M 2
#define QGEMM_DEFAULT_UNROLL_M 2
#define CGEMM_DEFAULT_UNROLL_M 2
#define ZGEMM_DEFAULT_UNROLL_M 1
#define XGEMM_DEFAULT_UNROLL_M 1
#define SGEMM_DEFAULT_UNROLL_N 4
#define DGEMM_DEFAULT_UNROLL_N 4
#define QGEMM_DEFAULT_UNROLL_N 2
#define CGEMM_DEFAULT_UNROLL_N 2
#define ZGEMM_DEFAULT_UNROLL_N 2
#define XGEMM_DEFAULT_UNROLL_N 1
#else
#define SGEMM_DEFAULT_UNROLL_M 8
#define DGEMM_DEFAULT_UNROLL_M 4
#define QGEMM_DEFAULT_UNROLL_M 2
#define CGEMM_DEFAULT_UNROLL_M 4
#define ZGEMM_DEFAULT_UNROLL_M 2
#define XGEMM_DEFAULT_UNROLL_M 1
#define SGEMM_DEFAULT_UNROLL_N 4
#define DGEMM_DEFAULT_UNROLL_N 4
#define QGEMM_DEFAULT_UNROLL_N 2
#define CGEMM_DEFAULT_UNROLL_N 2
#define ZGEMM_DEFAULT_UNROLL_N 2
#define XGEMM_DEFAULT_UNROLL_N 1
#endif
#define SGEMM_DEFAULT_P sgemm_p
#define SGEMM_DEFAULT_R sgemm_r
#define DGEMM_DEFAULT_P dgemm_p
#define DGEMM_DEFAULT_R dgemm_r
#define QGEMM_DEFAULT_P qgemm_p
#define QGEMM_DEFAULT_R qgemm_r
#define CGEMM_DEFAULT_P cgemm_p
#define CGEMM_DEFAULT_R cgemm_r
#define ZGEMM_DEFAULT_P zgemm_p
#define ZGEMM_DEFAULT_R zgemm_r
#define XGEMM_DEFAULT_P xgemm_p
#define XGEMM_DEFAULT_R xgemm_r
#define SGEMM_DEFAULT_Q 768
#define DGEMM_DEFAULT_Q 384
#define QGEMM_DEFAULT_Q 192
#define CGEMM_DEFAULT_Q 768
#define ZGEMM_DEFAULT_Q 384
#define XGEMM_DEFAULT_Q 192
#define GETRF_FACTOR 0.75
#define GEMM_THREAD gemm_thread_mn
#endif
#ifdef NEHALEM
#define SNUMOPT 8
#define DNUMOPT 4
#define GEMM_DEFAULT_OFFSET_A 32
#define GEMM_DEFAULT_OFFSET_B 0
#define GEMM_DEFAULT_ALIGN (BLASLONG)0x03fffUL
#define SYMV_P 8
#define SWITCH_RATIO 4
#ifdef ARCH_X86
#define SGEMM_DEFAULT_UNROLL_M 4
#define DGEMM_DEFAULT_UNROLL_M 2
#define QGEMM_DEFAULT_UNROLL_M 2
#define CGEMM_DEFAULT_UNROLL_M 2
#define ZGEMM_DEFAULT_UNROLL_M 1
#define XGEMM_DEFAULT_UNROLL_M 1
#define SGEMM_DEFAULT_UNROLL_N 4
#define DGEMM_DEFAULT_UNROLL_N 4
#define QGEMM_DEFAULT_UNROLL_N 2
#define CGEMM_DEFAULT_UNROLL_N 2
#define ZGEMM_DEFAULT_UNROLL_N 2
#define XGEMM_DEFAULT_UNROLL_N 1
#else
#define SGEMM_DEFAULT_UNROLL_M 4
#define DGEMM_DEFAULT_UNROLL_M 2
#define QGEMM_DEFAULT_UNROLL_M 2
#define CGEMM_DEFAULT_UNROLL_M 2
#define ZGEMM_DEFAULT_UNROLL_M 1
#define XGEMM_DEFAULT_UNROLL_M 1
#define SGEMM_DEFAULT_UNROLL_N 8
#define DGEMM_DEFAULT_UNROLL_N 8
#define QGEMM_DEFAULT_UNROLL_N 2
#define CGEMM_DEFAULT_UNROLL_N 4
#define ZGEMM_DEFAULT_UNROLL_N 4
#define XGEMM_DEFAULT_UNROLL_N 1
#endif
#define SGEMM_DEFAULT_P 504
#define SGEMM_DEFAULT_R sgemm_r
#define DGEMM_DEFAULT_P 504
#define DGEMM_DEFAULT_R dgemm_r
#define QGEMM_DEFAULT_P 504
#define QGEMM_DEFAULT_R qgemm_r
#define CGEMM_DEFAULT_P 252
#define CGEMM_DEFAULT_R cgemm_r
#define ZGEMM_DEFAULT_P 252
#define ZGEMM_DEFAULT_R zgemm_r
#define XGEMM_DEFAULT_P 252
#define XGEMM_DEFAULT_R xgemm_r
#define SGEMM_DEFAULT_Q 512
#define DGEMM_DEFAULT_Q 256
#define QGEMM_DEFAULT_Q 128
#define CGEMM_DEFAULT_Q 512
#define ZGEMM_DEFAULT_Q 256
#define XGEMM_DEFAULT_Q 128
#define GETRF_FACTOR 0.72
#endif
#ifdef SANDYBRIDGE
#define SNUMOPT 8
#define DNUMOPT 4
#define GEMM_DEFAULT_OFFSET_A 0
#define GEMM_DEFAULT_OFFSET_B 0
#define GEMM_DEFAULT_ALIGN (BLASLONG)0x03fffUL
#define SYMV_P 8
#define SWITCH_RATIO 4
#ifdef ARCH_X86
#define SGEMM_DEFAULT_UNROLL_M 4
#define DGEMM_DEFAULT_UNROLL_M 2
#define QGEMM_DEFAULT_UNROLL_M 2
#define CGEMM_DEFAULT_UNROLL_M 2
#define ZGEMM_DEFAULT_UNROLL_M 1
#define XGEMM_DEFAULT_UNROLL_M 1
#define SGEMM_DEFAULT_UNROLL_N 4
#define DGEMM_DEFAULT_UNROLL_N 4
#define QGEMM_DEFAULT_UNROLL_N 2
#define CGEMM_DEFAULT_UNROLL_N 2
#define ZGEMM_DEFAULT_UNROLL_N 2
#define XGEMM_DEFAULT_UNROLL_N 1
#else
#define SGEMM_DEFAULT_UNROLL_M 16
#define DGEMM_DEFAULT_UNROLL_M 8
#define QGEMM_DEFAULT_UNROLL_M 2
#define CGEMM_DEFAULT_UNROLL_M 8
#define ZGEMM_DEFAULT_UNROLL_M 1
#define XGEMM_DEFAULT_UNROLL_M 1
#define SGEMM_DEFAULT_UNROLL_N 4
#define DGEMM_DEFAULT_UNROLL_N 4
#define QGEMM_DEFAULT_UNROLL_N 2
#define CGEMM_DEFAULT_UNROLL_N 2
#define ZGEMM_DEFAULT_UNROLL_N 4
#define XGEMM_DEFAULT_UNROLL_N 1
#endif
#define SGEMM_DEFAULT_P 768
#define SGEMM_DEFAULT_R sgemm_r
/*#define SGEMM_DEFAULT_R 1024*/
#define DGEMM_DEFAULT_P 512
#define DGEMM_DEFAULT_R dgemm_r
/*#define DGEMM_DEFAULT_R 1024*/
#define QGEMM_DEFAULT_P 504
#define QGEMM_DEFAULT_R qgemm_r
#define CGEMM_DEFAULT_P 768
#define CGEMM_DEFAULT_R cgemm_r
/*#define CGEMM_DEFAULT_R 1024*/
#define ZGEMM_DEFAULT_P 512
#define ZGEMM_DEFAULT_R zgemm_r
/*#define ZGEMM_DEFAULT_R 1024*/
#define XGEMM_DEFAULT_P 252
#define XGEMM_DEFAULT_R xgemm_r
#define SGEMM_DEFAULT_Q 384
#define DGEMM_DEFAULT_Q 256
#define QGEMM_DEFAULT_Q 128
#define CGEMM_DEFAULT_Q 512
#define ZGEMM_DEFAULT_Q 192
#define XGEMM_DEFAULT_Q 128
#define CGEMM3M_DEFAULT_UNROLL_N 8
#define CGEMM3M_DEFAULT_UNROLL_M 4
#define ZGEMM3M_DEFAULT_UNROLL_N 8
#define ZGEMM3M_DEFAULT_UNROLL_M 2
#define CGEMM3M_DEFAULT_P 448
#define ZGEMM3M_DEFAULT_P 224
#define XGEMM3M_DEFAULT_P 112
#define CGEMM3M_DEFAULT_Q 224
#define ZGEMM3M_DEFAULT_Q 224
#define XGEMM3M_DEFAULT_Q 224
#define CGEMM3M_DEFAULT_R 12288
#define ZGEMM3M_DEFAULT_R 12288
#define XGEMM3M_DEFAULT_R 12288
#define GETRF_FACTOR 0.72
#endif
#ifdef HASWELL
#define SNUMOPT 16
#define DNUMOPT 8
#define GEMM_DEFAULT_OFFSET_A 0
#define GEMM_DEFAULT_OFFSET_B 0
#define GEMM_DEFAULT_ALIGN (BLASLONG)0x03fffUL
#define SYMV_P 8
#if defined(XDOUBLE) || defined(DOUBLE)
#define SWITCH_RATIO 4
#define GEMM_PREFERED_SIZE 4
#else
#define SWITCH_RATIO 8
#define GEMM_PREFERED_SIZE 8
#endif
#ifdef ARCH_X86
#define SGEMM_DEFAULT_UNROLL_M 4
#define DGEMM_DEFAULT_UNROLL_M 2
#define QGEMM_DEFAULT_UNROLL_M 2
#define CGEMM_DEFAULT_UNROLL_M 2
#define ZGEMM_DEFAULT_UNROLL_M 1
#define XGEMM_DEFAULT_UNROLL_M 1
#define SGEMM_DEFAULT_UNROLL_N 4
#define DGEMM_DEFAULT_UNROLL_N 4
#define QGEMM_DEFAULT_UNROLL_N 2
#define CGEMM_DEFAULT_UNROLL_N 2
#define ZGEMM_DEFAULT_UNROLL_N 2
#define XGEMM_DEFAULT_UNROLL_N 1
#else
#define SGEMM_DEFAULT_UNROLL_M 8
#define DGEMM_DEFAULT_UNROLL_M 4
#define QGEMM_DEFAULT_UNROLL_M 2
#define CGEMM_DEFAULT_UNROLL_M 8
#define ZGEMM_DEFAULT_UNROLL_M 4
#define XGEMM_DEFAULT_UNROLL_M 1
#define SGEMM_DEFAULT_UNROLL_N 4
#define DGEMM_DEFAULT_UNROLL_N 8
#define QGEMM_DEFAULT_UNROLL_N 2
#define CGEMM_DEFAULT_UNROLL_N 2
#define ZGEMM_DEFAULT_UNROLL_N 2
#define XGEMM_DEFAULT_UNROLL_N 1
/*
#define SGEMM_DEFAULT_UNROLL_MN 32
#define DGEMM_DEFAULT_UNROLL_MN 32
*/
#endif
#ifdef ARCH_X86
#define SGEMM_DEFAULT_P 512
#define SGEMM_DEFAULT_R sgemm_r
#define DGEMM_DEFAULT_P 512
#define DGEMM_DEFAULT_R dgemm_r
#define QGEMM_DEFAULT_P 504
#define QGEMM_DEFAULT_R qgemm_r
#define CGEMM_DEFAULT_P 128
#define CGEMM_DEFAULT_R 1024
#define ZGEMM_DEFAULT_P 512
#define ZGEMM_DEFAULT_R zgemm_r
#define XGEMM_DEFAULT_P 252
#define XGEMM_DEFAULT_R xgemm_r
#define SGEMM_DEFAULT_Q 256
#define DGEMM_DEFAULT_Q 256
#define QGEMM_DEFAULT_Q 128
#define CGEMM_DEFAULT_Q 256
#define ZGEMM_DEFAULT_Q 192
#define XGEMM_DEFAULT_Q 128
#else
#define SGEMM_DEFAULT_P 320
#define DGEMM_DEFAULT_P 512
#define CGEMM_DEFAULT_P 256
#define ZGEMM_DEFAULT_P 192
#ifdef WINDOWS_ABI
#define SGEMM_DEFAULT_Q 320
#define DGEMM_DEFAULT_Q 128
#else
#define SGEMM_DEFAULT_Q 320
#define DGEMM_DEFAULT_Q 256
#endif
#define CGEMM_DEFAULT_Q 256
#define ZGEMM_DEFAULT_Q 192
#define SGEMM_DEFAULT_R sgemm_r
#define DGEMM_DEFAULT_R 13824
#define CGEMM_DEFAULT_R cgemm_r
#define ZGEMM_DEFAULT_R zgemm_r
#define QGEMM_DEFAULT_Q 128
#define QGEMM_DEFAULT_P 504
#define QGEMM_DEFAULT_R qgemm_r
#define XGEMM_DEFAULT_P 252
#define XGEMM_DEFAULT_R xgemm_r
#define XGEMM_DEFAULT_Q 128
#define CGEMM3M_DEFAULT_UNROLL_N 4
#define CGEMM3M_DEFAULT_UNROLL_M 8
#define ZGEMM3M_DEFAULT_UNROLL_N 4
#define ZGEMM3M_DEFAULT_UNROLL_M 4
#define CGEMM3M_DEFAULT_P 320
#define ZGEMM3M_DEFAULT_P 256
#define XGEMM3M_DEFAULT_P 112
#define CGEMM3M_DEFAULT_Q 320
#define ZGEMM3M_DEFAULT_Q 256
#define XGEMM3M_DEFAULT_Q 224
#define CGEMM3M_DEFAULT_R 12288
#define ZGEMM3M_DEFAULT_R 12288
#define XGEMM3M_DEFAULT_R 12288
#endif
#endif
#ifdef SKYLAKEX
#define SNUMOPT 16
#define DNUMOPT 8
#define GEMM_DEFAULT_OFFSET_A 0
#define GEMM_DEFAULT_OFFSET_B 0
#define GEMM_DEFAULT_ALIGN (BLASLONG)0x03fffUL
#define SYMV_P 8
#if defined(XDOUBLE) || defined(DOUBLE)
#define SWITCH_RATIO 8
#define GEMM_PREFERED_SIZE 8
#else
#define SWITCH_RATIO 16
#define GEMM_PREFERED_SIZE 16
#endif
#define USE_SGEMM_KERNEL_DIRECT 1
#ifdef ARCH_X86
#define SGEMM_DEFAULT_UNROLL_M 4
#define DGEMM_DEFAULT_UNROLL_M 2
#define QGEMM_DEFAULT_UNROLL_M 2
#define CGEMM_DEFAULT_UNROLL_M 2
#define ZGEMM_DEFAULT_UNROLL_M 1
#define XGEMM_DEFAULT_UNROLL_M 1
#define SGEMM_DEFAULT_UNROLL_N 4
#define DGEMM_DEFAULT_UNROLL_N 4
#define QGEMM_DEFAULT_UNROLL_N 2
#define CGEMM_DEFAULT_UNROLL_N 2
#define ZGEMM_DEFAULT_UNROLL_N 2
#define XGEMM_DEFAULT_UNROLL_N 1
#else
#define SGEMM_DEFAULT_UNROLL_M 16
#ifdef DYNAMIC_ARCH
#define DGEMM_DEFAULT_UNROLL_M 4
#else
#define DGEMM_DEFAULT_UNROLL_M 16
#endif
#define QGEMM_DEFAULT_UNROLL_M 2
#define CGEMM_DEFAULT_UNROLL_M 8
#define ZGEMM_DEFAULT_UNROLL_M 4
#define XGEMM_DEFAULT_UNROLL_M 1
#define SGEMM_DEFAULT_UNROLL_N 4
#ifdef DYNAMIC_ARCH
#define DGEMM_DEFAULT_UNROLL_N 8
#else
#define DGEMM_DEFAULT_UNROLL_N 2
#endif
#define QGEMM_DEFAULT_UNROLL_N 2
#define CGEMM_DEFAULT_UNROLL_N 2
#define ZGEMM_DEFAULT_UNROLL_N 2
#define XGEMM_DEFAULT_UNROLL_N 1
#define SGEMM_DEFAULT_UNROLL_MN 32
#define DGEMM_DEFAULT_UNROLL_MN 32
#endif
#ifdef ARCH_X86
#define SGEMM_DEFAULT_P 512
#define SGEMM_DEFAULT_R sgemm_r
#define DGEMM_DEFAULT_P 512
#define DGEMM_DEFAULT_R dgemm_r
#define QGEMM_DEFAULT_P 504
#define QGEMM_DEFAULT_R qgemm_r
#define CGEMM_DEFAULT_P 128
#define CGEMM_DEFAULT_R 1024
#define ZGEMM_DEFAULT_P 512
#define ZGEMM_DEFAULT_R zgemm_r
#define XGEMM_DEFAULT_P 252
#define XGEMM_DEFAULT_R xgemm_r
#define SGEMM_DEFAULT_Q 256
#define DGEMM_DEFAULT_Q 256
#define QGEMM_DEFAULT_Q 128
#define CGEMM_DEFAULT_Q 256
#define ZGEMM_DEFAULT_Q 192
#define XGEMM_DEFAULT_Q 128
#else
#define SGEMM_DEFAULT_P 448
#ifndef DYNAMIC_ARCH
#define DGEMM_DEFAULT_P 192
#else
#define DGEMM_DEFAULT_P 384
#endif
#define CGEMM_DEFAULT_P 384
#define ZGEMM_DEFAULT_P 256
#define SGEMM_DEFAULT_Q 448
#ifndef DYNAMIC_ARCH
#define DGEMM_DEFAULT_Q 384
#else
#define DGEMM_DEFAULT_Q 168
#endif
#define CGEMM_DEFAULT_Q 192
  1408. #define ZGEMM_DEFAULT_Q 128
  1409. #define SGEMM_DEFAULT_R sgemm_r
  1410. #ifndef DYNAMIC_ARCH
  1411. #define DGEMM_DEFAULT_R 8640
  1412. #else
  1413. #define DGEMM_DEFAULT_R 13824
  1414. #endif
  1415. #define CGEMM_DEFAULT_R cgemm_r
  1416. #define ZGEMM_DEFAULT_R zgemm_r
  1417. #define QGEMM_DEFAULT_Q 128
  1418. #define QGEMM_DEFAULT_P 504
  1419. #define QGEMM_DEFAULT_R qgemm_r
  1420. #define XGEMM_DEFAULT_P 252
  1421. #define XGEMM_DEFAULT_R xgemm_r
  1422. #define XGEMM_DEFAULT_Q 128
  1423. #define CGEMM3M_DEFAULT_UNROLL_N 4
  1424. #define CGEMM3M_DEFAULT_UNROLL_M 8
  1425. #define ZGEMM3M_DEFAULT_UNROLL_N 4
  1426. #define ZGEMM3M_DEFAULT_UNROLL_M 4
  1427. #define CGEMM3M_DEFAULT_P 320
  1428. #define ZGEMM3M_DEFAULT_P 256
  1429. #define XGEMM3M_DEFAULT_P 112
  1430. #define CGEMM3M_DEFAULT_Q 320
  1431. #define ZGEMM3M_DEFAULT_Q 256
  1432. #define XGEMM3M_DEFAULT_Q 224
  1433. #define CGEMM3M_DEFAULT_R 12288
  1434. #define ZGEMM3M_DEFAULT_R 12288
  1435. #define XGEMM3M_DEFAULT_R 12288
  1436. #endif /* ARCH_X86 */
  1437. #endif /* SKYLAKEX */
  1438. #ifdef SAPPHIRERAPIDS
  1439. #define SNUMOPT 16
  1440. #define DNUMOPT 8
  1441. #define GEMM_DEFAULT_OFFSET_A 0
  1442. #define GEMM_DEFAULT_OFFSET_B 0
  1443. #define GEMM_DEFAULT_ALIGN 0x03fffUL
  1444. #define SYMV_P 8
  1445. #if defined(XDOUBLE) || defined(DOUBLE)
  1446. #define SWITCH_RATIO 8
  1447. #define GEMM_PREFERED_SIZE 8
  1448. #else
  1449. #define SWITCH_RATIO 16
  1450. #define GEMM_PREFERED_SIZE 16
  1451. #endif
  1452. #define USE_SGEMM_KERNEL_DIRECT 1
  1453. #undef SBGEMM_DEFAULT_UNROLL_N
  1454. #undef SBGEMM_DEFAULT_UNROLL_M
  1455. #undef SBGEMM_DEFAULT_P
  1456. #undef SBGEMM_DEFAULT_R
  1457. #undef SBGEMM_DEFAULT_Q
  1458. // FIXME: the kernel actually uses UNROLL_M = UNROLL_N = 16.
  1459. // If UNROLL_M and UNROLL_N are equal, OpenBLAS will reuse OCOPY as ICOPY.
  1460. // For AMX the two copy routines are not the same, so UNROLL_M is set to 32 as a workaround.
  1461. #define SBGEMM_DEFAULT_UNROLL_N 16
  1462. #define SBGEMM_DEFAULT_UNROLL_M 32
  1463. #define SBGEMM_DEFAULT_P 256
  1464. #define SBGEMM_DEFAULT_Q 1024
  1465. #define SBGEMM_DEFAULT_R sbgemm_r
  1466. #ifdef ARCH_X86
  1467. #define SGEMM_DEFAULT_UNROLL_M 4
  1468. #define DGEMM_DEFAULT_UNROLL_M 2
  1469. #define QGEMM_DEFAULT_UNROLL_M 2
  1470. #define CGEMM_DEFAULT_UNROLL_M 2
  1471. #define ZGEMM_DEFAULT_UNROLL_M 1
  1472. #define XGEMM_DEFAULT_UNROLL_M 1
  1473. #define SGEMM_DEFAULT_UNROLL_N 4
  1474. #define DGEMM_DEFAULT_UNROLL_N 4
  1475. #define QGEMM_DEFAULT_UNROLL_N 2
  1476. #define CGEMM_DEFAULT_UNROLL_N 2
  1477. #define ZGEMM_DEFAULT_UNROLL_N 2
  1478. #define XGEMM_DEFAULT_UNROLL_N 1
  1479. #else
  1480. #define SGEMM_DEFAULT_UNROLL_M 16
  1481. #define DGEMM_DEFAULT_UNROLL_M 16
  1482. #define QGEMM_DEFAULT_UNROLL_M 2
  1483. #define CGEMM_DEFAULT_UNROLL_M 8
  1484. #define ZGEMM_DEFAULT_UNROLL_M 4
  1485. #define XGEMM_DEFAULT_UNROLL_M 1
  1486. #define SGEMM_DEFAULT_UNROLL_N 4
  1487. #define DGEMM_DEFAULT_UNROLL_N 2
  1488. #define QGEMM_DEFAULT_UNROLL_N 2
  1489. #define CGEMM_DEFAULT_UNROLL_N 2
  1490. #define ZGEMM_DEFAULT_UNROLL_N 2
  1491. #define XGEMM_DEFAULT_UNROLL_N 1
  1492. #define SGEMM_DEFAULT_UNROLL_MN 32
  1493. #define DGEMM_DEFAULT_UNROLL_MN 32
  1494. #endif
  1495. #ifdef ARCH_X86
  1496. #define SGEMM_DEFAULT_P 512
  1497. #define SGEMM_DEFAULT_R sgemm_r
  1498. #define DGEMM_DEFAULT_P 512
  1499. #define DGEMM_DEFAULT_R dgemm_r
  1500. #define QGEMM_DEFAULT_P 504
  1501. #define QGEMM_DEFAULT_R qgemm_r
  1502. #define CGEMM_DEFAULT_P 128
  1503. #define CGEMM_DEFAULT_R 1024
  1504. #define ZGEMM_DEFAULT_P 512
  1505. #define ZGEMM_DEFAULT_R zgemm_r
  1506. #define XGEMM_DEFAULT_P 252
  1507. #define XGEMM_DEFAULT_R xgemm_r
  1508. #define SGEMM_DEFAULT_Q 256
  1509. #define DGEMM_DEFAULT_Q 256
  1510. #define QGEMM_DEFAULT_Q 128
  1511. #define CGEMM_DEFAULT_Q 256
  1512. #define ZGEMM_DEFAULT_Q 192
  1513. #define XGEMM_DEFAULT_Q 128
  1514. #else
  1515. #define SGEMM_DEFAULT_P 640
  1516. #define DGEMM_DEFAULT_P 192
  1517. #define CGEMM_DEFAULT_P 384
  1518. #define ZGEMM_DEFAULT_P 256
  1519. #define SGEMM_DEFAULT_Q 320
  1520. #define DGEMM_DEFAULT_Q 384
  1521. #define CGEMM_DEFAULT_Q 192
  1522. #define ZGEMM_DEFAULT_Q 128
  1523. #define SGEMM_DEFAULT_R sgemm_r
  1524. #define DGEMM_DEFAULT_R 8640
  1525. #define CGEMM_DEFAULT_R cgemm_r
  1526. #define ZGEMM_DEFAULT_R zgemm_r
  1527. #define QGEMM_DEFAULT_Q 128
  1528. #define QGEMM_DEFAULT_P 504
  1529. #define QGEMM_DEFAULT_R qgemm_r
  1530. #define XGEMM_DEFAULT_P 252
  1531. #define XGEMM_DEFAULT_R xgemm_r
  1532. #define XGEMM_DEFAULT_Q 128
  1533. #define CGEMM3M_DEFAULT_UNROLL_N 4
  1534. #define CGEMM3M_DEFAULT_UNROLL_M 8
  1535. #define ZGEMM3M_DEFAULT_UNROLL_N 4
  1536. #define ZGEMM3M_DEFAULT_UNROLL_M 4
  1537. #define CGEMM3M_DEFAULT_P 320
  1538. #define ZGEMM3M_DEFAULT_P 256
  1539. #define XGEMM3M_DEFAULT_P 112
  1540. #define CGEMM3M_DEFAULT_Q 320
  1541. #define ZGEMM3M_DEFAULT_Q 256
  1542. #define XGEMM3M_DEFAULT_Q 224
  1543. #define CGEMM3M_DEFAULT_R 12288
  1544. #define ZGEMM3M_DEFAULT_R 12288
  1545. #define XGEMM3M_DEFAULT_R 12288
  1546. #endif /* ARCH_X86 */
  1547. #endif /* SAPPHIRERAPIDS */
  1548. #ifdef COOPERLAKE
  1549. #define SNUMOPT 16
  1550. #define DNUMOPT 8
  1551. #define GEMM_DEFAULT_OFFSET_A 0
  1552. #define GEMM_DEFAULT_OFFSET_B 0
  1553. #define GEMM_DEFAULT_ALIGN 0x03fffUL
  1554. #define SYMV_P 8
  1555. #if defined(XDOUBLE) || defined(DOUBLE)
  1556. #define SWITCH_RATIO 8
  1557. #define GEMM_PREFERED_SIZE 8
  1558. #else
  1559. #define SWITCH_RATIO 16
  1560. #define GEMM_PREFERED_SIZE 16
  1561. #endif
  1562. #define USE_SGEMM_KERNEL_DIRECT 1
  1563. #undef SBGEMM_DEFAULT_UNROLL_N
  1564. #undef SBGEMM_DEFAULT_UNROLL_M
  1565. #undef SBGEMM_DEFAULT_P
  1566. #undef SBGEMM_DEFAULT_R
  1567. #undef SBGEMM_DEFAULT_Q
  1568. #define SBGEMM_DEFAULT_UNROLL_N 4
  1569. #define SBGEMM_DEFAULT_UNROLL_M 16
  1570. #define SBGEMM_DEFAULT_P 384
  1571. #define SBGEMM_DEFAULT_Q 768
  1572. #define SBGEMM_DEFAULT_R sbgemm_r
  1573. #ifdef ARCH_X86
  1574. #define SGEMM_DEFAULT_UNROLL_M 4
  1575. #define DGEMM_DEFAULT_UNROLL_M 2
  1576. #define QGEMM_DEFAULT_UNROLL_M 2
  1577. #define CGEMM_DEFAULT_UNROLL_M 2
  1578. #define ZGEMM_DEFAULT_UNROLL_M 1
  1579. #define XGEMM_DEFAULT_UNROLL_M 1
  1580. #define SGEMM_DEFAULT_UNROLL_N 4
  1581. #define DGEMM_DEFAULT_UNROLL_N 4
  1582. #define QGEMM_DEFAULT_UNROLL_N 2
  1583. #define CGEMM_DEFAULT_UNROLL_N 2
  1584. #define ZGEMM_DEFAULT_UNROLL_N 2
  1585. #define XGEMM_DEFAULT_UNROLL_N 1
  1586. #else
  1587. #define SGEMM_DEFAULT_UNROLL_M 16
  1588. #define DGEMM_DEFAULT_UNROLL_M 16
  1589. #define QGEMM_DEFAULT_UNROLL_M 2
  1590. #define CGEMM_DEFAULT_UNROLL_M 8
  1591. #define ZGEMM_DEFAULT_UNROLL_M 4
  1592. #define XGEMM_DEFAULT_UNROLL_M 1
  1593. #define SGEMM_DEFAULT_UNROLL_N 4
  1594. #define DGEMM_DEFAULT_UNROLL_N 2
  1595. #define QGEMM_DEFAULT_UNROLL_N 2
  1596. #define CGEMM_DEFAULT_UNROLL_N 2
  1597. #define ZGEMM_DEFAULT_UNROLL_N 2
  1598. #define XGEMM_DEFAULT_UNROLL_N 1
  1599. #define SGEMM_DEFAULT_UNROLL_MN 32
  1600. #define DGEMM_DEFAULT_UNROLL_MN 32
  1601. #endif
  1602. #ifdef ARCH_X86
  1603. #define SGEMM_DEFAULT_P 512
  1604. #define SGEMM_DEFAULT_R sgemm_r
  1605. #define DGEMM_DEFAULT_P 512
  1606. #define DGEMM_DEFAULT_R dgemm_r
  1607. #define QGEMM_DEFAULT_P 504
  1608. #define QGEMM_DEFAULT_R qgemm_r
  1609. #define CGEMM_DEFAULT_P 128
  1610. #define CGEMM_DEFAULT_R 1024
  1611. #define ZGEMM_DEFAULT_P 512
  1612. #define ZGEMM_DEFAULT_R zgemm_r
  1613. #define XGEMM_DEFAULT_P 252
  1614. #define XGEMM_DEFAULT_R xgemm_r
  1615. #define SGEMM_DEFAULT_Q 256
  1616. #define DGEMM_DEFAULT_Q 256
  1617. #define QGEMM_DEFAULT_Q 128
  1618. #define CGEMM_DEFAULT_Q 256
  1619. #define ZGEMM_DEFAULT_Q 192
  1620. #define XGEMM_DEFAULT_Q 128
  1621. #else
  1622. #define SGEMM_DEFAULT_P 640
  1623. #define DGEMM_DEFAULT_P 192
  1624. #define CGEMM_DEFAULT_P 384
  1625. #define ZGEMM_DEFAULT_P 256
  1626. #define SGEMM_DEFAULT_Q 320
  1627. #define DGEMM_DEFAULT_Q 384
  1628. #define CGEMM_DEFAULT_Q 192
  1629. #define ZGEMM_DEFAULT_Q 128
  1630. #define SGEMM_DEFAULT_R sgemm_r
  1631. #define DGEMM_DEFAULT_R 8640
  1632. #define CGEMM_DEFAULT_R cgemm_r
  1633. #define ZGEMM_DEFAULT_R zgemm_r
  1634. #define QGEMM_DEFAULT_Q 128
  1635. #define QGEMM_DEFAULT_P 504
  1636. #define QGEMM_DEFAULT_R qgemm_r
  1637. #define XGEMM_DEFAULT_P 252
  1638. #define XGEMM_DEFAULT_R xgemm_r
  1639. #define XGEMM_DEFAULT_Q 128
  1640. #define CGEMM3M_DEFAULT_UNROLL_N 4
  1641. #define CGEMM3M_DEFAULT_UNROLL_M 8
  1642. #define ZGEMM3M_DEFAULT_UNROLL_N 4
  1643. #define ZGEMM3M_DEFAULT_UNROLL_M 4
  1644. #define CGEMM3M_DEFAULT_P 320
  1645. #define ZGEMM3M_DEFAULT_P 256
  1646. #define XGEMM3M_DEFAULT_P 112
  1647. #define CGEMM3M_DEFAULT_Q 320
  1648. #define ZGEMM3M_DEFAULT_Q 256
  1649. #define XGEMM3M_DEFAULT_Q 224
  1650. #define CGEMM3M_DEFAULT_R 12288
  1651. #define ZGEMM3M_DEFAULT_R 12288
  1652. #define XGEMM3M_DEFAULT_R 12288
  1653. #endif /* ARCH_X86 */
  1654. #endif /* COOPERLAKE */
  1655. #ifdef ATOM
  1656. #define SNUMOPT 2
  1657. #define DNUMOPT 1
  1658. #define GEMM_DEFAULT_OFFSET_A 64
  1659. #define GEMM_DEFAULT_OFFSET_B 0
  1660. #define GEMM_DEFAULT_ALIGN (BLASLONG)0x0ffffUL
  1661. #define SYMV_P 8
  1662. #ifdef ARCH_X86
  1663. #define SGEMM_DEFAULT_UNROLL_M 4
  1664. #define DGEMM_DEFAULT_UNROLL_M 2
  1665. #define QGEMM_DEFAULT_UNROLL_M 2
  1666. #define CGEMM_DEFAULT_UNROLL_M 2
  1667. #define ZGEMM_DEFAULT_UNROLL_M 1
  1668. #define XGEMM_DEFAULT_UNROLL_M 1
  1669. #else
  1670. #define SGEMM_DEFAULT_UNROLL_M 8
  1671. #define DGEMM_DEFAULT_UNROLL_M 4
  1672. #define QGEMM_DEFAULT_UNROLL_M 2
  1673. #define CGEMM_DEFAULT_UNROLL_M 4
  1674. #define ZGEMM_DEFAULT_UNROLL_M 2
  1675. #define XGEMM_DEFAULT_UNROLL_M 1
  1676. #endif
  1677. #define SGEMM_DEFAULT_UNROLL_N 4
  1678. #define DGEMM_DEFAULT_UNROLL_N 2
  1679. #define QGEMM_DEFAULT_UNROLL_N 2
  1680. #define CGEMM_DEFAULT_UNROLL_N 2
  1681. #define ZGEMM_DEFAULT_UNROLL_N 1
  1682. #define XGEMM_DEFAULT_UNROLL_N 1
  1683. #define SGEMM_DEFAULT_P sgemm_p
  1684. #define SGEMM_DEFAULT_R sgemm_r
  1685. #define DGEMM_DEFAULT_P dgemm_p
  1686. #define DGEMM_DEFAULT_R dgemm_r
  1687. #define QGEMM_DEFAULT_P qgemm_p
  1688. #define QGEMM_DEFAULT_R qgemm_r
  1689. #define CGEMM_DEFAULT_P cgemm_p
  1690. #define CGEMM_DEFAULT_R cgemm_r
  1691. #define ZGEMM_DEFAULT_P zgemm_p
  1692. #define ZGEMM_DEFAULT_R zgemm_r
  1693. #define XGEMM_DEFAULT_P xgemm_p
  1694. #define XGEMM_DEFAULT_R xgemm_r
  1695. #define SGEMM_DEFAULT_Q 256
  1696. #define DGEMM_DEFAULT_Q 256
  1697. #define QGEMM_DEFAULT_Q 256
  1698. #define CGEMM_DEFAULT_Q 256
  1699. #define ZGEMM_DEFAULT_Q 256
  1700. #define XGEMM_DEFAULT_Q 256
  1701. #endif
  1702. #ifdef ITANIUM2
  1703. #define SNUMOPT 4
  1704. #define DNUMOPT 4
  1705. #define GEMM_DEFAULT_OFFSET_A 0
  1706. #define GEMM_DEFAULT_OFFSET_B 128
  1707. #define GEMM_DEFAULT_ALIGN (BLASLONG)0x03fffUL
  1708. #define SGEMM_DEFAULT_UNROLL_M 8
  1709. #define SGEMM_DEFAULT_UNROLL_N 8
  1710. #define DGEMM_DEFAULT_UNROLL_M 8
  1711. #define DGEMM_DEFAULT_UNROLL_N 8
  1712. #define QGEMM_DEFAULT_UNROLL_M 8
  1713. #define QGEMM_DEFAULT_UNROLL_N 8
  1714. #define CGEMM_DEFAULT_UNROLL_M 4
  1715. #define CGEMM_DEFAULT_UNROLL_N 4
  1716. #define ZGEMM_DEFAULT_UNROLL_M 4
  1717. #define ZGEMM_DEFAULT_UNROLL_N 4
  1718. #define XGEMM_DEFAULT_UNROLL_M 4
  1719. #define XGEMM_DEFAULT_UNROLL_N 4
  1720. #define SGEMM_DEFAULT_P sgemm_p
  1721. #define DGEMM_DEFAULT_P dgemm_p
  1722. #define QGEMM_DEFAULT_P qgemm_p
  1723. #define CGEMM_DEFAULT_P cgemm_p
  1724. #define ZGEMM_DEFAULT_P zgemm_p
  1725. #define XGEMM_DEFAULT_P xgemm_p
  1726. #define SGEMM_DEFAULT_Q 1024
  1727. #define DGEMM_DEFAULT_Q 1024
  1728. #define QGEMM_DEFAULT_Q 1024
  1729. #define CGEMM_DEFAULT_Q 1024
  1730. #define ZGEMM_DEFAULT_Q 1024
  1731. #define XGEMM_DEFAULT_Q 1024
  1732. #define SGEMM_DEFAULT_R sgemm_r
  1733. #define DGEMM_DEFAULT_R dgemm_r
  1734. #define QGEMM_DEFAULT_R qgemm_r
  1735. #define CGEMM_DEFAULT_R cgemm_r
  1736. #define ZGEMM_DEFAULT_R zgemm_r
  1737. #define XGEMM_DEFAULT_R xgemm_r
  1738. #define SYMV_P 16
  1739. #define GETRF_FACTOR 0.65
  1740. #endif
  1741. #if defined(EV4) || defined(EV5) || defined(EV6)
  1742. #ifdef EV4
  1743. #define SNUMOPT 1
  1744. #define DNUMOPT 1
  1745. #else
  1746. #define SNUMOPT 2
  1747. #define DNUMOPT 2
  1748. #endif
  1749. #define GEMM_DEFAULT_OFFSET_A 512
  1750. #define GEMM_DEFAULT_OFFSET_B 512
  1751. #define GEMM_DEFAULT_ALIGN (BLASLONG)0x0ffffUL
  1752. #define SGEMM_DEFAULT_UNROLL_M 4
  1753. #define SGEMM_DEFAULT_UNROLL_N 4
  1754. #define DGEMM_DEFAULT_UNROLL_M 4
  1755. #define DGEMM_DEFAULT_UNROLL_N 4
  1756. #define CGEMM_DEFAULT_UNROLL_M 2
  1757. #define CGEMM_DEFAULT_UNROLL_N 2
  1758. #define ZGEMM_DEFAULT_UNROLL_M 2
  1759. #define ZGEMM_DEFAULT_UNROLL_N 2
  1760. #define SYMV_P 8
  1761. #ifdef EV4
  1762. #define SGEMM_DEFAULT_P 32
  1763. #define SGEMM_DEFAULT_Q 112
  1764. #define SGEMM_DEFAULT_R 256
  1765. #define DGEMM_DEFAULT_P 32
  1766. #define DGEMM_DEFAULT_Q 56
  1767. #define DGEMM_DEFAULT_R 256
  1768. #define CGEMM_DEFAULT_P 32
  1769. #define CGEMM_DEFAULT_Q 64
  1770. #define CGEMM_DEFAULT_R 240
  1771. #define ZGEMM_DEFAULT_P 32
  1772. #define ZGEMM_DEFAULT_Q 32
  1773. #define ZGEMM_DEFAULT_R 240
  1774. #endif
  1775. #ifdef EV5
  1776. #define SGEMM_DEFAULT_P 64
  1777. #define SGEMM_DEFAULT_Q 256
  1778. #define DGEMM_DEFAULT_P 64
  1779. #define DGEMM_DEFAULT_Q 128
  1780. #define CGEMM_DEFAULT_P 64
  1781. #define CGEMM_DEFAULT_Q 128
  1782. #define ZGEMM_DEFAULT_P 64
  1783. #define ZGEMM_DEFAULT_Q 64
  1784. #endif
  1785. #ifdef EV6
  1786. #define SGEMM_DEFAULT_P 256
  1787. #define SGEMM_DEFAULT_Q 512
  1788. #define DGEMM_DEFAULT_P 256
  1789. #define DGEMM_DEFAULT_Q 256
  1790. #define CGEMM_DEFAULT_P 256
  1791. #define CGEMM_DEFAULT_Q 256
  1792. #define ZGEMM_DEFAULT_P 128
  1793. #define ZGEMM_DEFAULT_Q 256
  1794. #endif
  1795. #endif
  1796. #ifdef CELL
  1797. #define SNUMOPT 2
  1798. #define DNUMOPT 2
  1799. #define GEMM_DEFAULT_OFFSET_A 0
  1800. #define GEMM_DEFAULT_OFFSET_B 8192
  1801. #define GEMM_DEFAULT_ALIGN (BLASLONG)0x0ffffUL
  1802. #define SGEMM_DEFAULT_UNROLL_M 16
  1803. #define SGEMM_DEFAULT_UNROLL_N 4
  1804. #define DGEMM_DEFAULT_UNROLL_M 4
  1805. #define DGEMM_DEFAULT_UNROLL_N 4
  1806. #define CGEMM_DEFAULT_UNROLL_M 8
  1807. #define CGEMM_DEFAULT_UNROLL_N 2
  1808. #define ZGEMM_DEFAULT_UNROLL_M 2
  1809. #define ZGEMM_DEFAULT_UNROLL_N 2
  1810. #define SGEMM_DEFAULT_P 128
  1811. #define DGEMM_DEFAULT_P 128
  1812. #define CGEMM_DEFAULT_P 128
  1813. #define ZGEMM_DEFAULT_P 128
  1814. #define SGEMM_DEFAULT_Q 512
  1815. #define DGEMM_DEFAULT_Q 256
  1816. #define CGEMM_DEFAULT_Q 256
  1817. #define ZGEMM_DEFAULT_Q 128
  1818. #define SYMV_P 4
  1819. #endif
  1820. #ifdef PPCG4
  1821. #define GEMM_DEFAULT_OFFSET_A 0
  1822. #define GEMM_DEFAULT_OFFSET_B 1024
  1823. #define GEMM_DEFAULT_ALIGN 0x0ffffUL
  1824. #define SGEMM_DEFAULT_UNROLL_M 16
  1825. #define SGEMM_DEFAULT_UNROLL_N 4
  1826. #define DGEMM_DEFAULT_UNROLL_M 4
  1827. #define DGEMM_DEFAULT_UNROLL_N 4
  1828. #define CGEMM_DEFAULT_UNROLL_M 2
  1829. #define CGEMM_DEFAULT_UNROLL_N 2
  1830. #define ZGEMM_DEFAULT_UNROLL_M 2
  1831. #define ZGEMM_DEFAULT_UNROLL_N 2
  1832. #define SGEMM_DEFAULT_P 256
  1833. #define DGEMM_DEFAULT_P 128
  1834. #define CGEMM_DEFAULT_P 128
  1835. #define ZGEMM_DEFAULT_P 64
  1836. #define SGEMM_DEFAULT_Q 256
  1837. #define DGEMM_DEFAULT_Q 256
  1838. #define CGEMM_DEFAULT_Q 256
  1839. #define ZGEMM_DEFAULT_Q 256
  1840. #define SYMV_P 4
  1841. #endif
  1842. #ifdef PPC970
  1843. #define SNUMOPT 4
  1844. #define DNUMOPT 4
  1845. #define GEMM_DEFAULT_OFFSET_A 2688
  1846. #define GEMM_DEFAULT_OFFSET_B 3072
  1847. #define GEMM_DEFAULT_ALIGN 0x03fffUL
  1848. #if defined(__BYTE_ORDER__)&&(__BYTE_ORDER__ == __ORDER_BIG_ENDIAN__)
  1849. #define SGEMM_DEFAULT_UNROLL_M 4
  1850. #else
  1851. #define SGEMM_DEFAULT_UNROLL_M 16
  1852. #endif
  1853. #define SGEMM_DEFAULT_UNROLL_N 4
  1854. #define DGEMM_DEFAULT_UNROLL_M 4
  1855. #define DGEMM_DEFAULT_UNROLL_N 4
  1856. #if defined(__BYTE_ORDER__)&&(__BYTE_ORDER__ == __ORDER_BIG_ENDIAN__)
  1857. #define CGEMM_DEFAULT_UNROLL_M 2
  1858. #else
  1859. #define CGEMM_DEFAULT_UNROLL_M 8
  1860. #endif
  1861. #define CGEMM_DEFAULT_UNROLL_N 2
  1862. #define ZGEMM_DEFAULT_UNROLL_M 2
  1863. #define ZGEMM_DEFAULT_UNROLL_N 2
  1864. #if defined(OS_LINUX) || defined(OS_DARWIN) || defined(OS_FREEBSD)
  1865. #if L2_SIZE == 1024976
  1866. #define SGEMM_DEFAULT_P 320
  1867. #define DGEMM_DEFAULT_P 256
  1868. #define CGEMM_DEFAULT_P 256
  1869. #define ZGEMM_DEFAULT_P 256
  1870. #else
  1871. #define SGEMM_DEFAULT_P 176
  1872. #define DGEMM_DEFAULT_P 176
  1873. #define CGEMM_DEFAULT_P 176
  1874. #define ZGEMM_DEFAULT_P 176
  1875. #endif
  1876. #endif
  1877. #define SGEMM_DEFAULT_Q 512
  1878. #define DGEMM_DEFAULT_Q 256
  1879. #define CGEMM_DEFAULT_Q 256
  1880. #define ZGEMM_DEFAULT_Q 128
  1881. #define SYMV_P 4
  1882. #endif
  1883. #ifdef PPC440
  1884. #define SNUMOPT 2
  1885. #define DNUMOPT 2
  1886. #define GEMM_DEFAULT_OFFSET_A (32 * 0)
  1887. #define GEMM_DEFAULT_OFFSET_B (32 * 0)
  1888. #define GEMM_DEFAULT_ALIGN 0x0ffffUL
  1889. #define SGEMM_DEFAULT_UNROLL_M 4
  1890. #define SGEMM_DEFAULT_UNROLL_N 4
  1891. #define DGEMM_DEFAULT_UNROLL_M 4
  1892. #define DGEMM_DEFAULT_UNROLL_N 4
  1893. #define CGEMM_DEFAULT_UNROLL_M 2
  1894. #define CGEMM_DEFAULT_UNROLL_N 2
  1895. #define ZGEMM_DEFAULT_UNROLL_M 2
  1896. #define ZGEMM_DEFAULT_UNROLL_N 2
  1897. #define SGEMM_DEFAULT_P 512
  1898. #define DGEMM_DEFAULT_P 512
  1899. #define CGEMM_DEFAULT_P 512
  1900. #define ZGEMM_DEFAULT_P 512
  1901. #define SGEMM_DEFAULT_Q 1024
  1902. #define DGEMM_DEFAULT_Q 512
  1903. #define CGEMM_DEFAULT_Q 512
  1904. #define ZGEMM_DEFAULT_Q 256
  1905. #define SGEMM_DEFAULT_R SGEMM_DEFAULT_P
  1906. #define DGEMM_DEFAULT_R DGEMM_DEFAULT_P
  1907. #define CGEMM_DEFAULT_R CGEMM_DEFAULT_P
  1908. #define ZGEMM_DEFAULT_R ZGEMM_DEFAULT_P
  1909. #define SYMV_P 4
  1910. #endif
  1911. #ifdef PPC440FP2
  1912. #define SNUMOPT 4
  1913. #define DNUMOPT 4
  1914. #define GEMM_DEFAULT_OFFSET_A (32 * 0)
  1915. #define GEMM_DEFAULT_OFFSET_B (32 * 0)
  1916. #define GEMM_DEFAULT_ALIGN 0x0ffffUL
  1917. #define SGEMM_DEFAULT_UNROLL_M 8
  1918. #define SGEMM_DEFAULT_UNROLL_N 4
  1919. #define DGEMM_DEFAULT_UNROLL_M 8
  1920. #define DGEMM_DEFAULT_UNROLL_N 4
  1921. #define CGEMM_DEFAULT_UNROLL_M 4
  1922. #define CGEMM_DEFAULT_UNROLL_N 2
  1923. #define ZGEMM_DEFAULT_UNROLL_M 4
  1924. #define ZGEMM_DEFAULT_UNROLL_N 2
  1925. #define SGEMM_DEFAULT_P 128
  1926. #define DGEMM_DEFAULT_P 128
  1927. #define CGEMM_DEFAULT_P 128
  1928. #define ZGEMM_DEFAULT_P 128
  1929. #if 1
  1930. #define SGEMM_DEFAULT_Q 4096
  1931. #define DGEMM_DEFAULT_Q 3072
  1932. #define CGEMM_DEFAULT_Q 2048
  1933. #define ZGEMM_DEFAULT_Q 1024
  1934. #else
  1935. #define SGEMM_DEFAULT_Q 512
  1936. #define DGEMM_DEFAULT_Q 256
  1937. #define CGEMM_DEFAULT_Q 256
  1938. #define ZGEMM_DEFAULT_Q 128
  1939. #endif
  1940. #define SYMV_P 4
  1941. #endif
  1942. #if defined(POWER3) || defined(POWER4) || defined(POWER5)
  1943. #define GEMM_DEFAULT_OFFSET_A 0
  1944. #define GEMM_DEFAULT_OFFSET_B 2048
  1945. #define GEMM_DEFAULT_ALIGN 0x0ffffUL
  1946. #define SGEMM_DEFAULT_UNROLL_M 4
  1947. #define SGEMM_DEFAULT_UNROLL_N 4
  1948. #define DGEMM_DEFAULT_UNROLL_M 4
  1949. #define DGEMM_DEFAULT_UNROLL_N 4
  1950. #define CGEMM_DEFAULT_UNROLL_M 2
  1951. #define CGEMM_DEFAULT_UNROLL_N 2
  1952. #define ZGEMM_DEFAULT_UNROLL_M 2
  1953. #define ZGEMM_DEFAULT_UNROLL_N 2
  1954. #ifdef POWER3
  1955. #define SNUMOPT 4
  1956. #define DNUMOPT 4
  1957. #define SGEMM_DEFAULT_P 256
  1958. #define SGEMM_DEFAULT_Q 432
  1959. #define SGEMM_DEFAULT_R 1012
  1960. #define DGEMM_DEFAULT_P 256
  1961. #define DGEMM_DEFAULT_Q 216
  1962. #define DGEMM_DEFAULT_R 1012
  1963. #define CGEMM_DEFAULT_P 256
  1964. #define CGEMM_DEFAULT_Q 104
  1965. #define CGEMM_DEFAULT_R 1012
  1966. #define ZGEMM_DEFAULT_P 256
  1967. #define ZGEMM_DEFAULT_Q 104
  1968. #define ZGEMM_DEFAULT_R 1012
  1969. #endif
  1970. #if defined(POWER4)
  1971. #ifdef ALLOC_HUGETLB
  1972. #define SGEMM_DEFAULT_P 184
  1973. #define DGEMM_DEFAULT_P 184
  1974. #define CGEMM_DEFAULT_P 184
  1975. #define ZGEMM_DEFAULT_P 184
  1976. #else
  1977. #define SGEMM_DEFAULT_P 144
  1978. #define DGEMM_DEFAULT_P 144
  1979. #define CGEMM_DEFAULT_P 144
  1980. #define ZGEMM_DEFAULT_P 144
  1981. #endif
  1982. #define SGEMM_DEFAULT_Q 256
  1983. #define CGEMM_DEFAULT_Q 256
  1984. #define DGEMM_DEFAULT_Q 256
  1985. #define ZGEMM_DEFAULT_Q 256
  1986. #endif
  1987. #if defined(POWER5)
  1988. #ifdef ALLOC_HUGETLB
  1989. #define SGEMM_DEFAULT_P 512
  1990. #define DGEMM_DEFAULT_P 256
  1991. #define CGEMM_DEFAULT_P 256
  1992. #define ZGEMM_DEFAULT_P 128
  1993. #else
  1994. #define SGEMM_DEFAULT_P 320
  1995. #define DGEMM_DEFAULT_P 160
  1996. #define CGEMM_DEFAULT_P 160
  1997. #define ZGEMM_DEFAULT_P 80
  1998. #endif
  1999. #define SGEMM_DEFAULT_Q 256
  2000. #define CGEMM_DEFAULT_Q 256
  2001. #define DGEMM_DEFAULT_Q 256
  2002. #define ZGEMM_DEFAULT_Q 256
  2003. #endif
  2004. #define SYMV_P 8
  2005. #endif
  2006. #if defined(POWER6)
  2007. #define SNUMOPT 4
  2008. #define DNUMOPT 4
  2009. #define GEMM_DEFAULT_OFFSET_A 384
  2010. #define GEMM_DEFAULT_OFFSET_B 1024
  2011. #define GEMM_DEFAULT_ALIGN 0x03fffUL
  2012. #define SGEMM_DEFAULT_UNROLL_M 4
  2013. #define SGEMM_DEFAULT_UNROLL_N 4
  2014. #define DGEMM_DEFAULT_UNROLL_M 4
  2015. #define DGEMM_DEFAULT_UNROLL_N 4
  2016. #define CGEMM_DEFAULT_UNROLL_M 2
  2017. #define CGEMM_DEFAULT_UNROLL_N 4
  2018. #define ZGEMM_DEFAULT_UNROLL_M 2
  2019. #define ZGEMM_DEFAULT_UNROLL_N 4
  2020. #define SGEMM_DEFAULT_P 992
  2021. #define DGEMM_DEFAULT_P 480
  2022. #define CGEMM_DEFAULT_P 488
  2023. #define ZGEMM_DEFAULT_P 248
  2024. #define SGEMM_DEFAULT_Q 504
  2025. #define DGEMM_DEFAULT_Q 504
  2026. #define CGEMM_DEFAULT_Q 400
  2027. #define ZGEMM_DEFAULT_Q 400
  2028. #define SYMV_P 8
  2029. #endif
  2030. #if defined(POWER8)
  2031. #define SNUMOPT 16
  2032. #define DNUMOPT 8
  2033. #define GEMM_DEFAULT_OFFSET_A 0
  2034. #define GEMM_DEFAULT_OFFSET_B 65536
  2035. #define GEMM_DEFAULT_ALIGN 0x0ffffUL
  2036. #if defined(__32BIT__)
  2037. #warning using BINARY32==POWER6
  2038. #define SGEMM_DEFAULT_UNROLL_M 4
  2039. #define SGEMM_DEFAULT_UNROLL_N 4
  2040. #define DGEMM_DEFAULT_UNROLL_M 4
  2041. #define DGEMM_DEFAULT_UNROLL_N 4
  2042. #define CGEMM_DEFAULT_UNROLL_M 2
  2043. #define CGEMM_DEFAULT_UNROLL_N 4
  2044. #define ZGEMM_DEFAULT_UNROLL_M 2
  2045. #define ZGEMM_DEFAULT_UNROLL_N 4
  2046. #else
  2047. #define SGEMM_DEFAULT_UNROLL_M 16
  2048. #define SGEMM_DEFAULT_UNROLL_N 8
  2049. #define DGEMM_DEFAULT_UNROLL_M 16
  2050. #define DGEMM_DEFAULT_UNROLL_N 4
  2051. #define CGEMM_DEFAULT_UNROLL_M 8
  2052. #define CGEMM_DEFAULT_UNROLL_N 4
  2053. #define ZGEMM_DEFAULT_UNROLL_M 8
  2054. #define ZGEMM_DEFAULT_UNROLL_N 2
  2055. #endif
  2056. #define SGEMM_DEFAULT_P 1280UL
  2057. #define DGEMM_DEFAULT_P 640UL
  2058. #define CGEMM_DEFAULT_P 640UL
  2059. #define ZGEMM_DEFAULT_P 320UL
  2060. #define SGEMM_DEFAULT_Q 640UL
  2061. #define DGEMM_DEFAULT_Q 720UL
  2062. #define CGEMM_DEFAULT_Q 640UL
  2063. #define ZGEMM_DEFAULT_Q 640UL
  2064. #if 0 /* disabled: R = P variant, superseded by the fixed R values below */
  2065. #define SGEMM_DEFAULT_R SGEMM_DEFAULT_P
  2066. #define DGEMM_DEFAULT_R DGEMM_DEFAULT_P
  2067. #define CGEMM_DEFAULT_R CGEMM_DEFAULT_P
  2068. #define ZGEMM_DEFAULT_R ZGEMM_DEFAULT_P
  2069. #endif
  2070. #define SGEMM_DEFAULT_R 4096
  2071. #define DGEMM_DEFAULT_R 4096
  2072. #define CGEMM_DEFAULT_R 4096
  2073. #define ZGEMM_DEFAULT_R 4096
  2074. #define SYMV_P 8
  2075. #endif
  2076. #if defined(POWER9)
  2077. #define SNUMOPT 16
  2078. #define DNUMOPT 8
  2079. #define GEMM_DEFAULT_OFFSET_A 0
  2080. #define GEMM_DEFAULT_OFFSET_B 65536
  2081. #define GEMM_DEFAULT_ALIGN 0x0ffffUL
  2082. #define SWITCH_RATIO 16
  2083. #define GEMM_PREFERED_SIZE 16
  2084. #define SGEMM_DEFAULT_UNROLL_M 16
  2085. #define SGEMM_DEFAULT_UNROLL_N 8
  2086. #define DGEMM_DEFAULT_UNROLL_M 16
  2087. #define DGEMM_DEFAULT_UNROLL_N 4
  2088. #define CGEMM_DEFAULT_UNROLL_M 8
  2089. #define CGEMM_DEFAULT_UNROLL_N 4
  2090. #define ZGEMM_DEFAULT_UNROLL_M 8
  2091. #define ZGEMM_DEFAULT_UNROLL_N 2
  2092. #define SGEMM_DEFAULT_P 832
  2093. #define DGEMM_DEFAULT_P 128
  2094. #define CGEMM_DEFAULT_P 512
  2095. #define ZGEMM_DEFAULT_P 256
  2096. #define SGEMM_DEFAULT_Q 1026
  2097. #define DGEMM_DEFAULT_Q 384
  2098. #define CGEMM_DEFAULT_Q 1026
  2099. #define ZGEMM_DEFAULT_Q 1026
  2100. #define SGEMM_DEFAULT_R 4096
  2101. #define DGEMM_DEFAULT_R 4096
  2102. #define CGEMM_DEFAULT_R 4096
  2103. #define ZGEMM_DEFAULT_R 4096
  2104. #define SYMV_P 8
  2105. #endif
  2106. #if defined(POWER10)
  2107. #define SNUMOPT 16
  2108. #define DNUMOPT 8
  2109. #define GEMM_DEFAULT_OFFSET_A 0
  2110. #define GEMM_DEFAULT_OFFSET_B 65536
  2111. #define GEMM_DEFAULT_ALIGN 0x0ffffUL
  2112. #define SWITCH_RATIO 16
  2113. #define GEMM_PREFERED_SIZE 16
  2114. #define SGEMM_DEFAULT_UNROLL_M 16
  2115. #define SGEMM_DEFAULT_UNROLL_N 8
  2116. #if defined(HAVE_GAS) && (HAVE_GAS == 1)
  2117. #define DGEMM_DEFAULT_UNROLL_M 16
  2118. #define DGEMM_DEFAULT_UNROLL_N 4
  2119. #else
  2120. #define DGEMM_DEFAULT_UNROLL_M 8
  2121. #define DGEMM_DEFAULT_UNROLL_N 8
  2122. #endif
  2123. #define CGEMM_DEFAULT_UNROLL_M 8
  2124. #define CGEMM_DEFAULT_UNROLL_N 4
  2125. #define ZGEMM_DEFAULT_UNROLL_M 8
  2126. #define ZGEMM_DEFAULT_UNROLL_N 2
  2127. #define SGEMM_DEFAULT_P 512
  2128. #define DGEMM_DEFAULT_P 384
  2129. #define CGEMM_DEFAULT_P 512
  2130. #define ZGEMM_DEFAULT_P 256
  2131. #define SGEMM_DEFAULT_Q 512
  2132. #define DGEMM_DEFAULT_Q 512
  2133. #define CGEMM_DEFAULT_Q 384
  2134. #define ZGEMM_DEFAULT_Q 384
  2135. #define SGEMM_DEFAULT_R 4096
  2136. #define DGEMM_DEFAULT_R 4096
  2137. #define CGEMM_DEFAULT_R 4096
  2138. #define ZGEMM_DEFAULT_R 4096
  2139. #define SYMV_P 8
  2140. #undef SBGEMM_DEFAULT_UNROLL_N
  2141. #undef SBGEMM_DEFAULT_UNROLL_M
  2142. #undef SBGEMM_DEFAULT_P
  2143. #undef SBGEMM_DEFAULT_R
  2144. #undef SBGEMM_DEFAULT_Q
  2145. #define SBGEMM_DEFAULT_UNROLL_M 16
  2146. #define SBGEMM_DEFAULT_UNROLL_N 8
  2147. #define SBGEMM_DEFAULT_P 832
  2148. #define SBGEMM_DEFAULT_Q 1026
  2149. #define SBGEMM_DEFAULT_R 4096
  2150. #endif
  2151. #if defined(SPARC) && defined(V7)
  2152. #define SNUMOPT 4
  2153. #define DNUMOPT 4
  2154. #define GEMM_DEFAULT_OFFSET_A 0
  2155. #define GEMM_DEFAULT_OFFSET_B 2048
  2156. #define GEMM_DEFAULT_ALIGN 0x03fffUL
  2157. #define SGEMM_DEFAULT_UNROLL_M 2
  2158. #define SGEMM_DEFAULT_UNROLL_N 8
  2159. #define DGEMM_DEFAULT_UNROLL_M 2
  2160. #define DGEMM_DEFAULT_UNROLL_N 8
  2161. #define CGEMM_DEFAULT_UNROLL_M 1
  2162. #define CGEMM_DEFAULT_UNROLL_N 4
  2163. #define ZGEMM_DEFAULT_UNROLL_M 1
  2164. #define ZGEMM_DEFAULT_UNROLL_N 4
  2165. #define SGEMM_DEFAULT_P 256
  2166. #define DGEMM_DEFAULT_P 256
  2167. #define CGEMM_DEFAULT_P 256
  2168. #define ZGEMM_DEFAULT_P 256
  2169. #define SGEMM_DEFAULT_Q 512
  2170. #define DGEMM_DEFAULT_Q 256
  2171. #define CGEMM_DEFAULT_Q 256
  2172. #define ZGEMM_DEFAULT_Q 128
  2173. #define SYMV_P 8
  2174. #define GEMM_THREAD gemm_thread_mn
  2175. #endif
  2176. #if (defined(SPARC) && defined(V9)) || defined(__sparc_v9__)
  2177. #define SNUMOPT 2
  2178. #define DNUMOPT 2
  2179. #define GEMM_DEFAULT_OFFSET_A 0
  2180. #define GEMM_DEFAULT_OFFSET_B 2048
  2181. #define GEMM_DEFAULT_ALIGN 0x03fffUL
  2182. #define SGEMM_DEFAULT_UNROLL_M 4
  2183. #define SGEMM_DEFAULT_UNROLL_N 4
  2184. #define DGEMM_DEFAULT_UNROLL_M 4
  2185. #define DGEMM_DEFAULT_UNROLL_N 4
  2186. #define CGEMM_DEFAULT_UNROLL_M 2
  2187. #define CGEMM_DEFAULT_UNROLL_N 2
  2188. #define ZGEMM_DEFAULT_UNROLL_M 2
  2189. #define ZGEMM_DEFAULT_UNROLL_N 2
  2190. #define SGEMM_DEFAULT_P 512
  2191. #define DGEMM_DEFAULT_P 512
  2192. #define CGEMM_DEFAULT_P 512
  2193. #define ZGEMM_DEFAULT_P 512
  2194. #define SGEMM_DEFAULT_Q 1024
  2195. #define DGEMM_DEFAULT_Q 512
  2196. #define CGEMM_DEFAULT_Q 512
  2197. #define ZGEMM_DEFAULT_Q 256
  2198. #define SYMV_P 8
  2199. #endif
  2200. #ifdef SICORTEX
  2201. #define SNUMOPT 2
  2202. #define DNUMOPT 2
  2203. #define GEMM_DEFAULT_OFFSET_A 0
  2204. #define GEMM_DEFAULT_OFFSET_B 0
  2205. #define GEMM_DEFAULT_ALIGN (BLASLONG)0x03fffUL
  2206. #define SGEMM_DEFAULT_UNROLL_M 2
  2207. #define SGEMM_DEFAULT_UNROLL_N 8
  2208. #define DGEMM_DEFAULT_UNROLL_M 2
  2209. #define DGEMM_DEFAULT_UNROLL_N 8
  2210. #define CGEMM_DEFAULT_UNROLL_M 1
  2211. #define CGEMM_DEFAULT_UNROLL_N 4
  2212. #define ZGEMM_DEFAULT_UNROLL_M 1
  2213. #define ZGEMM_DEFAULT_UNROLL_N 4
  2214. #define SGEMM_DEFAULT_P 108
  2215. #define DGEMM_DEFAULT_P 112
  2216. #define CGEMM_DEFAULT_P 108
  2217. #define ZGEMM_DEFAULT_P 112
  2218. #define SGEMM_DEFAULT_Q 288
  2219. #define DGEMM_DEFAULT_Q 144
  2220. #define CGEMM_DEFAULT_Q 144
  2221. #define ZGEMM_DEFAULT_Q 72
  2222. #define SGEMM_DEFAULT_R 2000
  2223. #define DGEMM_DEFAULT_R 2000
  2224. #define CGEMM_DEFAULT_R 2000
  2225. #define ZGEMM_DEFAULT_R 2000
  2226. #define SYMV_P 16
  2227. #endif
  2228. #if defined(LOONGSON3R4)
  2229. #define SNUMOPT 2
  2230. #define DNUMOPT 2
  2231. #define GEMM_DEFAULT_OFFSET_A 0
  2232. #define GEMM_DEFAULT_OFFSET_B 0
  2233. #define GEMM_DEFAULT_ALIGN (BLASLONG)0x03fffUL
  2234. #ifdef HAVE_MSA
  2235. #define SGEMM_DEFAULT_UNROLL_M 8
  2236. #define SGEMM_DEFAULT_UNROLL_N 8
  2237. #define DGEMM_DEFAULT_UNROLL_M 8
  2238. #define DGEMM_DEFAULT_UNROLL_N 4
  2239. #define CGEMM_DEFAULT_UNROLL_M 8
  2240. #define CGEMM_DEFAULT_UNROLL_N 4
  2241. #define ZGEMM_DEFAULT_UNROLL_M 4
  2242. #define ZGEMM_DEFAULT_UNROLL_N 4
  2243. #else
  2244. #define SGEMM_DEFAULT_UNROLL_M 8
  2245. #define SGEMM_DEFAULT_UNROLL_N 4
  2246. #define DGEMM_DEFAULT_UNROLL_M 4
  2247. #define DGEMM_DEFAULT_UNROLL_N 4
  2248. #define CGEMM_DEFAULT_UNROLL_M 4
  2249. #define CGEMM_DEFAULT_UNROLL_N 2
  2250. #define ZGEMM_DEFAULT_UNROLL_M 2
  2251. #define ZGEMM_DEFAULT_UNROLL_N 2
  2252. #endif
  2253. #define SGEMM_DEFAULT_P 64
  2254. #define DGEMM_DEFAULT_P 44
  2255. #define CGEMM_DEFAULT_P 64
  2256. #define ZGEMM_DEFAULT_P 32
  2257. #define SGEMM_DEFAULT_Q 192
  2258. #define DGEMM_DEFAULT_Q 92
  2259. #define CGEMM_DEFAULT_Q 128
  2260. #define ZGEMM_DEFAULT_Q 80
  2261. #define SGEMM_DEFAULT_R 640
  2262. #define DGEMM_DEFAULT_R dgemm_r
  2263. #define CGEMM_DEFAULT_R 640
  2264. #define ZGEMM_DEFAULT_R 640
  2265. #define GEMM_OFFSET_A1 0x10000
  2266. #define GEMM_OFFSET_B1 0x100000
  2267. #define SYMV_P 16
  2268. #endif
  2269. #if defined(LOONGSON3R3)
  2270. /* Copied from SICORTEX */
  2271. #define SNUMOPT 2
  2272. #define DNUMOPT 2
  2273. #define GEMM_DEFAULT_OFFSET_A 0
  2274. #define GEMM_DEFAULT_OFFSET_B 0
  2275. #define GEMM_DEFAULT_ALIGN (BLASLONG)0x03fffUL
  2276. #define SGEMM_DEFAULT_UNROLL_M 8
  2277. #define SGEMM_DEFAULT_UNROLL_N 4
  2278. #define DGEMM_DEFAULT_UNROLL_M 4
  2279. #define DGEMM_DEFAULT_UNROLL_N 4
  2280. #define CGEMM_DEFAULT_UNROLL_M 4
  2281. #define CGEMM_DEFAULT_UNROLL_N 2
  2282. #define ZGEMM_DEFAULT_UNROLL_M 2
  2283. #define ZGEMM_DEFAULT_UNROLL_N 2
  2284. #define SGEMM_DEFAULT_P 64
  2285. #define DGEMM_DEFAULT_P 44
  2286. #define CGEMM_DEFAULT_P 64
  2287. #define ZGEMM_DEFAULT_P 32
  2288. #define SGEMM_DEFAULT_Q 192
  2289. #define DGEMM_DEFAULT_Q 92
  2290. #define CGEMM_DEFAULT_Q 128
  2291. #define ZGEMM_DEFAULT_Q 80
  2292. #define SGEMM_DEFAULT_R 640
  2293. #define DGEMM_DEFAULT_R dgemm_r
  2294. #define CGEMM_DEFAULT_R 640
  2295. #define ZGEMM_DEFAULT_R 640
  2296. #define GEMM_OFFSET_A1 0x10000
  2297. #define GEMM_OFFSET_B1 0x100000
  2298. #define SYMV_P 16
  2299. #endif
  2300. #if defined (LOONGSON3R5)
  2301. #define SNUMOPT 2
  2302. #define DNUMOPT 2
  2303. #define GEMM_DEFAULT_OFFSET_A 0
  2304. #define GEMM_DEFAULT_OFFSET_B 0
  2305. #define GEMM_DEFAULT_ALIGN 0x0ffffUL
  2306. #define SGEMM_DEFAULT_UNROLL_N 8
  2307. #define DGEMM_DEFAULT_UNROLL_N 4
  2308. #define QGEMM_DEFAULT_UNROLL_N 2
  2309. #define CGEMM_DEFAULT_UNROLL_N 4
  2310. #define ZGEMM_DEFAULT_UNROLL_N 4
  2311. #define XGEMM_DEFAULT_UNROLL_N 1
  2312. #define SGEMM_DEFAULT_UNROLL_M 2
  2313. #define DGEMM_DEFAULT_UNROLL_M 16
  2314. #define QGEMM_DEFAULT_UNROLL_M 2
  2315. #define CGEMM_DEFAULT_UNROLL_M 1
  2316. #define ZGEMM_DEFAULT_UNROLL_M 1
  2317. #define XGEMM_DEFAULT_UNROLL_M 1
  2318. #define SGEMM_DEFAULT_P sgemm_p
  2319. #define DGEMM_DEFAULT_P 32
  2320. #define QGEMM_DEFAULT_P qgemm_p
  2321. #define CGEMM_DEFAULT_P cgemm_p
  2322. #define ZGEMM_DEFAULT_P zgemm_p
  2323. #define XGEMM_DEFAULT_P xgemm_p
  2324. #define SGEMM_DEFAULT_R sgemm_r
  2325. #define DGEMM_DEFAULT_R 858
  2326. #define QGEMM_DEFAULT_R qgemm_r
  2327. #define CGEMM_DEFAULT_R cgemm_r
  2328. #define ZGEMM_DEFAULT_R zgemm_r
  2329. #define XGEMM_DEFAULT_R xgemm_r
  2330. #define SGEMM_DEFAULT_Q 128
  2331. #define DGEMM_DEFAULT_Q 152
  2332. #define QGEMM_DEFAULT_Q 128
  2333. #define CGEMM_DEFAULT_Q 128
  2334. #define ZGEMM_DEFAULT_Q 128
  2335. #define XGEMM_DEFAULT_Q 128
  2336. #define SYMV_P 16
  2337. #endif
  2338. #if defined(P5600) || defined(MIPS1004K) || defined(MIPS24K) || defined(I6400) || defined(P6600) || defined(I6500)
  2339. #define SNUMOPT 2
  2340. #define DNUMOPT 2
  2341. #define GEMM_DEFAULT_OFFSET_A 0
  2342. #define GEMM_DEFAULT_OFFSET_B 0
  2343. #define GEMM_DEFAULT_ALIGN (BLASLONG) 0x03fffUL
  2344. #if defined(HAVE_MSA) && !defined(NO_MSA)
  2345. #define SGEMM_DEFAULT_UNROLL_M 8
  2346. #define SGEMM_DEFAULT_UNROLL_N 8
  2347. #define DGEMM_DEFAULT_UNROLL_M 8
  2348. #define DGEMM_DEFAULT_UNROLL_N 4
  2349. #define CGEMM_DEFAULT_UNROLL_M 8
  2350. #define CGEMM_DEFAULT_UNROLL_N 4
  2351. #define ZGEMM_DEFAULT_UNROLL_M 4
  2352. #define ZGEMM_DEFAULT_UNROLL_N 4
  2353. #else
  2354. #define SGEMM_DEFAULT_UNROLL_M 2
  2355. #define SGEMM_DEFAULT_UNROLL_N 2
  2356. #define DGEMM_DEFAULT_UNROLL_M 2
  2357. #define DGEMM_DEFAULT_UNROLL_N 2
  2358. #define CGEMM_DEFAULT_UNROLL_M 2
  2359. #define CGEMM_DEFAULT_UNROLL_N 2
  2360. #define ZGEMM_DEFAULT_UNROLL_M 2
  2361. #define ZGEMM_DEFAULT_UNROLL_N 2
  2362. #endif
  2363. #define SGEMM_DEFAULT_P 128
  2364. #define DGEMM_DEFAULT_P 128
  2365. #define CGEMM_DEFAULT_P 96
  2366. #define ZGEMM_DEFAULT_P 64
  2367. #define SGEMM_DEFAULT_Q 240
  2368. #define DGEMM_DEFAULT_Q 120
  2369. #define CGEMM_DEFAULT_Q 120
  2370. #define ZGEMM_DEFAULT_Q 120
  2371. #define SGEMM_DEFAULT_R 12288
  2372. #define DGEMM_DEFAULT_R 8192
  2373. #define CGEMM_DEFAULT_R 4096
  2374. #define ZGEMM_DEFAULT_R 4096
  2375. #define SYMV_P 16
  2376. #endif
  2377. #ifdef RISCV64_GENERIC
  2378. #define GEMM_DEFAULT_OFFSET_A 0
  2379. #define GEMM_DEFAULT_OFFSET_B 0
  2380. #define GEMM_DEFAULT_ALIGN (BLASLONG)0x03fffUL
  2381. #define SGEMM_DEFAULT_UNROLL_M 2
  2382. #define SGEMM_DEFAULT_UNROLL_N 2
  2383. #define DGEMM_DEFAULT_UNROLL_M 2
  2384. #define DGEMM_DEFAULT_UNROLL_N 2
  2385. #define CGEMM_DEFAULT_UNROLL_M 2
  2386. #define CGEMM_DEFAULT_UNROLL_N 2
  2387. #define ZGEMM_DEFAULT_UNROLL_M 2
  2388. #define ZGEMM_DEFAULT_UNROLL_N 2
  2389. #define SGEMM_DEFAULT_P 128
  2390. #define DGEMM_DEFAULT_P 128
  2391. #define CGEMM_DEFAULT_P 96
  2392. #define ZGEMM_DEFAULT_P 64
  2393. #define SGEMM_DEFAULT_Q 240
  2394. #define DGEMM_DEFAULT_Q 120
  2395. #define CGEMM_DEFAULT_Q 120
  2396. #define ZGEMM_DEFAULT_Q 120
  2397. #define SGEMM_DEFAULT_R 12288
  2398. #define DGEMM_DEFAULT_R 8192
  2399. #define CGEMM_DEFAULT_R 4096
  2400. #define ZGEMM_DEFAULT_R 4096
  2401. #define SYMV_P 16
  2402. #define GEMM_DEFAULT_OFFSET_A 0
  2403. #define GEMM_DEFAULT_OFFSET_B 0
  2404. #endif
  2405. #ifdef C910V
  2406. #define GEMM_DEFAULT_OFFSET_A 0
  2407. #define GEMM_DEFAULT_OFFSET_B 0
  2408. #define GEMM_DEFAULT_ALIGN 0x03fffUL
  2409. #define SGEMM_DEFAULT_UNROLL_M 16
  2410. #define SGEMM_DEFAULT_UNROLL_N 4
  2411. #define DGEMM_DEFAULT_UNROLL_M 8
  2412. #define DGEMM_DEFAULT_UNROLL_N 4
  2413. #define CGEMM_DEFAULT_UNROLL_M 2
  2414. #define CGEMM_DEFAULT_UNROLL_N 2
  2415. #define ZGEMM_DEFAULT_UNROLL_M 2
  2416. #define ZGEMM_DEFAULT_UNROLL_N 2
  2417. #define SGEMM_DEFAULT_P 160
  2418. #define DGEMM_DEFAULT_P 160
  2419. #define CGEMM_DEFAULT_P 96
  2420. #define ZGEMM_DEFAULT_P 64
  2421. #define SGEMM_DEFAULT_Q 240
  2422. #define DGEMM_DEFAULT_Q 128
  2423. #define CGEMM_DEFAULT_Q 120
  2424. #define ZGEMM_DEFAULT_Q 120
  2425. #define SGEMM_DEFAULT_R 12288
  2426. #define DGEMM_DEFAULT_R 8192
  2427. #define CGEMM_DEFAULT_R 4096
  2428. #define ZGEMM_DEFAULT_R 4096
  2429. #define SYMV_P 16
  2430. #define GEMM_DEFAULT_OFFSET_A 0
  2431. #define GEMM_DEFAULT_OFFSET_B 0
  2432. #endif
  2433. #ifdef ARMV7
  2434. #define SNUMOPT 2
  2435. #define DNUMOPT 2
  2436. #define GEMM_DEFAULT_OFFSET_A 0
  2437. #define GEMM_DEFAULT_OFFSET_B 0
  2438. #define GEMM_DEFAULT_ALIGN (BLASLONG)0x03fffUL
  2439. #define SGEMM_DEFAULT_UNROLL_M 4
  2440. #define SGEMM_DEFAULT_UNROLL_N 4
  2441. #define DGEMM_DEFAULT_UNROLL_M 4
  2442. #define DGEMM_DEFAULT_UNROLL_N 4
  2443. #define CGEMM_DEFAULT_UNROLL_M 2
  2444. #define CGEMM_DEFAULT_UNROLL_N 2
  2445. #define ZGEMM_DEFAULT_UNROLL_M 2
  2446. #define ZGEMM_DEFAULT_UNROLL_N 2
  2447. #define SGEMM_DEFAULT_P 128
  2448. #define DGEMM_DEFAULT_P 128
  2449. #define CGEMM_DEFAULT_P 96
  2450. #define ZGEMM_DEFAULT_P 64
  2451. #define SGEMM_DEFAULT_Q 240
  2452. #define DGEMM_DEFAULT_Q 120
  2453. #define CGEMM_DEFAULT_Q 120
  2454. #define ZGEMM_DEFAULT_Q 120
  2455. #define SGEMM_DEFAULT_R 12288
  2456. #define DGEMM_DEFAULT_R 8192
  2457. #define CGEMM_DEFAULT_R 4096
  2458. #define ZGEMM_DEFAULT_R 4096
  2459. #define SYMV_P 16
  2460. #endif
  2461. #if defined(ARMV6)
  2462. #define SNUMOPT 2
  2463. #define DNUMOPT 2
  2464. #define GEMM_DEFAULT_OFFSET_A 0
  2465. #define GEMM_DEFAULT_OFFSET_B 0
  2466. #define GEMM_DEFAULT_ALIGN (BLASLONG)0x03fffUL
  2467. #define SGEMM_DEFAULT_UNROLL_M 4
  2468. #define SGEMM_DEFAULT_UNROLL_N 2
  2469. #define DGEMM_DEFAULT_UNROLL_M 4
  2470. #define DGEMM_DEFAULT_UNROLL_N 2
  2471. #define CGEMM_DEFAULT_UNROLL_M 2
  2472. #define CGEMM_DEFAULT_UNROLL_N 2
  2473. #define ZGEMM_DEFAULT_UNROLL_M 2
  2474. #define ZGEMM_DEFAULT_UNROLL_N 2
  2475. #define SGEMM_DEFAULT_P 128
  2476. #define DGEMM_DEFAULT_P 128
  2477. #define CGEMM_DEFAULT_P 96
  2478. #define ZGEMM_DEFAULT_P 64
  2479. #define SGEMM_DEFAULT_Q 240
  2480. #define DGEMM_DEFAULT_Q 120
  2481. #define CGEMM_DEFAULT_Q 120
  2482. #define ZGEMM_DEFAULT_Q 120
  2483. #define SGEMM_DEFAULT_R 12288
  2484. #define DGEMM_DEFAULT_R 8192
  2485. #define CGEMM_DEFAULT_R 4096
  2486. #define ZGEMM_DEFAULT_R 4096
  2487. #define SYMV_P 16
  2488. #endif
  2489. /* Common ARMv8 parameters */
  2490. #if defined(ARMV8)
  2491. #define SNUMOPT 2
  2492. #define DNUMOPT 2
  2493. #define GEMM_DEFAULT_OFFSET_A 0
  2494. #define GEMM_DEFAULT_OFFSET_B 0
  2495. #ifdef _WIN64
  2496. /* Use an explicit cast on Win64, which uses the LLP64 data model (unsigned long is 32 bits there) */
  2497. #define GEMM_DEFAULT_ALIGN (BLASULONG)0x03fffUL
  2498. #else
  2499. #define GEMM_DEFAULT_ALIGN 0x03fffUL
  2500. #endif
  2501. #define SYMV_P 16
  2502. #if defined(CORTEXA57) || defined(CORTEXX1) || \
  2503. defined(CORTEXA72) || defined(CORTEXA73) || \
  2504. defined(FALKOR) || defined(TSV110) || defined(EMAG8180) || defined(VORTEX) || defined(FT2000)
  2505. #define SGEMM_DEFAULT_UNROLL_M 16
  2506. #define SGEMM_DEFAULT_UNROLL_N 4
  2507. #define DGEMM_DEFAULT_UNROLL_M 8
  2508. #define DGEMM_DEFAULT_UNROLL_N 4
  2509. #define CGEMM_DEFAULT_UNROLL_M 8
  2510. #define CGEMM_DEFAULT_UNROLL_N 4
  2511. #define ZGEMM_DEFAULT_UNROLL_M 4
  2512. #define ZGEMM_DEFAULT_UNROLL_N 4
  2513. /* FIXME: this should use the cache size, but there is currently no easy way to
  2514. query that on ARM. So if getarch counted more than 8 cores, we simply assume the host
  2515. is a big desktop or server with abundant cache rather than a phone or embedded device. */
  2516. #if NUM_CORES > 8 || defined(TSV110) || defined(EMAG8180) || defined(VORTEX) || defined(CORTEXX1)
  2517. #define SGEMM_DEFAULT_P 512
  2518. #define DGEMM_DEFAULT_P 256
  2519. #define CGEMM_DEFAULT_P 256
  2520. #define ZGEMM_DEFAULT_P 128
  2521. #define SGEMM_DEFAULT_Q 1024
  2522. #define DGEMM_DEFAULT_Q 512
  2523. #define CGEMM_DEFAULT_Q 512
  2524. #define ZGEMM_DEFAULT_Q 512
  2525. #else
  2526. #define SGEMM_DEFAULT_P 128
  2527. #define DGEMM_DEFAULT_P 160
  2528. #define CGEMM_DEFAULT_P 128
  2529. #define ZGEMM_DEFAULT_P 128
  2530. #define SGEMM_DEFAULT_Q 352
  2531. #define DGEMM_DEFAULT_Q 128
  2532. #define CGEMM_DEFAULT_Q 224
  2533. #define ZGEMM_DEFAULT_Q 112
  2534. #endif
  2535. #define SGEMM_DEFAULT_R 4096
  2536. #define DGEMM_DEFAULT_R 4096
  2537. #define CGEMM_DEFAULT_R 4096
  2538. #define ZGEMM_DEFAULT_R 2048
  2539. #elif defined(CORTEXA53) || defined(CORTEXA55)
  2540. #define SGEMM_DEFAULT_UNROLL_M 8
  2541. #define SGEMM_DEFAULT_UNROLL_N 8
  2542. #define DGEMM_DEFAULT_UNROLL_M 4
  2543. #define DGEMM_DEFAULT_UNROLL_N 4
  2544. #define CGEMM_DEFAULT_UNROLL_M 8
  2545. #define CGEMM_DEFAULT_UNROLL_N 4
  2546. #define ZGEMM_DEFAULT_UNROLL_M 4
  2547. #define ZGEMM_DEFAULT_UNROLL_N 4
  2548. #define SGEMM_DEFAULT_P 256
  2549. #define DGEMM_DEFAULT_P 160
  2550. #define CGEMM_DEFAULT_P 128
  2551. #define ZGEMM_DEFAULT_P 128
  2552. #define SGEMM_DEFAULT_Q 256
  2553. #define DGEMM_DEFAULT_Q 128
  2554. #define CGEMM_DEFAULT_Q 224
  2555. #define ZGEMM_DEFAULT_Q 112
  2556. #define SGEMM_DEFAULT_R 4096
  2557. #define DGEMM_DEFAULT_R 4096
  2558. #define CGEMM_DEFAULT_R 4096
  2559. #define ZGEMM_DEFAULT_R 2048
  2560. #elif defined(THUNDERX)
  2561. #define SGEMM_DEFAULT_UNROLL_M 4
  2562. #define SGEMM_DEFAULT_UNROLL_N 4
  2563. #define DGEMM_DEFAULT_UNROLL_M 2
  2564. #define DGEMM_DEFAULT_UNROLL_N 2
  2565. #define CGEMM_DEFAULT_UNROLL_M 2
  2566. #define CGEMM_DEFAULT_UNROLL_N 2
  2567. #define ZGEMM_DEFAULT_UNROLL_M 2
  2568. #define ZGEMM_DEFAULT_UNROLL_N 2
  2569. #define SGEMM_DEFAULT_P 128
  2570. #define DGEMM_DEFAULT_P 128
  2571. #define CGEMM_DEFAULT_P 96
  2572. #define ZGEMM_DEFAULT_P 64
  2573. #define SGEMM_DEFAULT_Q 240
  2574. #define DGEMM_DEFAULT_Q 120
  2575. #define CGEMM_DEFAULT_Q 120
  2576. #define ZGEMM_DEFAULT_Q 120
  2577. #define SGEMM_DEFAULT_R 12288
  2578. #define DGEMM_DEFAULT_R 8192
  2579. #define CGEMM_DEFAULT_R 4096
  2580. #define ZGEMM_DEFAULT_R 4096
  2581. #elif defined(THUNDERX2T99)
  2582. #define SGEMM_DEFAULT_UNROLL_M 16
  2583. #define SGEMM_DEFAULT_UNROLL_N 4
  2584. #define DGEMM_DEFAULT_UNROLL_M 8
  2585. #define DGEMM_DEFAULT_UNROLL_N 4
  2586. #define CGEMM_DEFAULT_UNROLL_M 8
  2587. #define CGEMM_DEFAULT_UNROLL_N 4
  2588. #define ZGEMM_DEFAULT_UNROLL_M 4
  2589. #define ZGEMM_DEFAULT_UNROLL_N 4
  2590. #define SGEMM_DEFAULT_P 128
  2591. #define DGEMM_DEFAULT_P 160
  2592. #define CGEMM_DEFAULT_P 128
  2593. #define ZGEMM_DEFAULT_P 128
  2594. #define SGEMM_DEFAULT_Q 352
  2595. #define DGEMM_DEFAULT_Q 128
  2596. #define CGEMM_DEFAULT_Q 224
  2597. #define ZGEMM_DEFAULT_Q 112
  2598. #define SGEMM_DEFAULT_R 4096
  2599. #define DGEMM_DEFAULT_R 4096
  2600. #define CGEMM_DEFAULT_R 4096
  2601. #define ZGEMM_DEFAULT_R 4096
  2602. #elif defined(THUNDERX3T110)
  2603. #define SGEMM_DEFAULT_UNROLL_M 16
  2604. #define SGEMM_DEFAULT_UNROLL_N 4
  2605. #define DGEMM_DEFAULT_UNROLL_M 8
  2606. #define DGEMM_DEFAULT_UNROLL_N 4
  2607. #define CGEMM_DEFAULT_UNROLL_M 8
  2608. #define CGEMM_DEFAULT_UNROLL_N 4
  2609. #define ZGEMM_DEFAULT_UNROLL_M 4
  2610. #define ZGEMM_DEFAULT_UNROLL_N 4
  2611. #define SGEMM_DEFAULT_P 128
  2612. #define DGEMM_DEFAULT_P 320
  2613. #define CGEMM_DEFAULT_P 128
  2614. #define ZGEMM_DEFAULT_P 128
  2615. #define SGEMM_DEFAULT_Q 352
  2616. #define DGEMM_DEFAULT_Q 128
  2617. #define CGEMM_DEFAULT_Q 224
  2618. #define ZGEMM_DEFAULT_Q 112
  2619. #define SGEMM_DEFAULT_R 4096
  2620. #define DGEMM_DEFAULT_R 4096
  2621. #define CGEMM_DEFAULT_R 4096
  2622. #define ZGEMM_DEFAULT_R 4096
  2623. #elif defined(NEOVERSEN1)
  2624. #define SGEMM_DEFAULT_UNROLL_M 16
  2625. #define SGEMM_DEFAULT_UNROLL_N 4
  2626. #define DGEMM_DEFAULT_UNROLL_M 8
  2627. #define DGEMM_DEFAULT_UNROLL_N 4
  2628. #define CGEMM_DEFAULT_UNROLL_M 8
  2629. #define CGEMM_DEFAULT_UNROLL_N 4
  2630. #define ZGEMM_DEFAULT_UNROLL_M 4
  2631. #define ZGEMM_DEFAULT_UNROLL_N 4
  2632. #define SGEMM_DEFAULT_P 128
  2633. #define DGEMM_DEFAULT_P 160
  2634. #define CGEMM_DEFAULT_P 128
  2635. #define ZGEMM_DEFAULT_P 128
  2636. #define SGEMM_DEFAULT_Q 352
  2637. #define DGEMM_DEFAULT_Q 128
  2638. #define CGEMM_DEFAULT_Q 224
  2639. #define ZGEMM_DEFAULT_Q 112
  2640. #define SGEMM_DEFAULT_R 4096
  2641. #define DGEMM_DEFAULT_R 4096
  2642. #define CGEMM_DEFAULT_R 4096
  2643. #define ZGEMM_DEFAULT_R 4096
  2644. #elif defined(NEOVERSEV1)
  2645. #define SGEMM_DEFAULT_UNROLL_M 16
  2646. #define SGEMM_DEFAULT_UNROLL_N 4
  2647. #define DGEMM_DEFAULT_UNROLL_M 8
  2648. #define DGEMM_DEFAULT_UNROLL_N 4
  2649. #define CGEMM_DEFAULT_UNROLL_M 8
  2650. #define CGEMM_DEFAULT_UNROLL_N 4
  2651. #define ZGEMM_DEFAULT_UNROLL_M 4
  2652. #define ZGEMM_DEFAULT_UNROLL_N 4
  2653. #define SGEMM_DEFAULT_P 128
  2654. #define DGEMM_DEFAULT_P 160
  2655. #define CGEMM_DEFAULT_P 128
  2656. #define ZGEMM_DEFAULT_P 128
  2657. #define SGEMM_DEFAULT_Q 352
  2658. #define DGEMM_DEFAULT_Q 128
  2659. #define CGEMM_DEFAULT_Q 224
  2660. #define ZGEMM_DEFAULT_Q 112
  2661. #define SGEMM_DEFAULT_R 4096
  2662. #define DGEMM_DEFAULT_R 4096
  2663. #define CGEMM_DEFAULT_R 4096
  2664. #define ZGEMM_DEFAULT_R 4096
  2665. #elif defined(NEOVERSEN2)
  2666. #define SGEMM_DEFAULT_UNROLL_M 16
  2667. #define SGEMM_DEFAULT_UNROLL_N 4
  2668. #define DGEMM_DEFAULT_UNROLL_M 8
  2669. #define DGEMM_DEFAULT_UNROLL_N 4
  2670. #define CGEMM_DEFAULT_UNROLL_M 8
  2671. #define CGEMM_DEFAULT_UNROLL_N 4
  2672. #define ZGEMM_DEFAULT_UNROLL_M 4
  2673. #define ZGEMM_DEFAULT_UNROLL_N 4
  2674. #define SGEMM_DEFAULT_P 128
  2675. #define DGEMM_DEFAULT_P 160
  2676. #define CGEMM_DEFAULT_P 128
  2677. #define ZGEMM_DEFAULT_P 128
  2678. #define SGEMM_DEFAULT_Q 352
  2679. #define DGEMM_DEFAULT_Q 128
  2680. #define CGEMM_DEFAULT_Q 224
  2681. #define ZGEMM_DEFAULT_Q 112
  2682. #define SGEMM_DEFAULT_R 4096
  2683. #define DGEMM_DEFAULT_R 4096
  2684. #define CGEMM_DEFAULT_R 4096
  2685. #define ZGEMM_DEFAULT_R 4096
  2686. #elif defined(ARMV8SVE) || defined(A64FX) || defined(ARMV9) || defined(CORTEXA510) || defined(CORTEXA710) || defined(CORTEXX2)
  2687. /* When all BLAS3 routines are implemented with SVE, SGEMM_DEFAULT_UNROLL_M should be "sve_vl".
  2688. Until then, just keep it different from DGEMM_DEFAULT_UNROLL_N to keep copy routines in both directions separated. */
  2689. #define SGEMM_DEFAULT_UNROLL_M 4
  2690. #define SGEMM_DEFAULT_UNROLL_N 8
  2691. /* SGEMM_UNROLL_MN is calculated as max(SGEMM_UNROLL_M, SGEMM_UNROLL_N).
  2692. * Since we don't define SGEMM_UNROLL_M correctly, we have to set this macro manually.
  2693. * If the SVE vector size ever exceeds 1024 bits, this should be increased as well. */
  2694. #define SGEMM_DEFAULT_UNROLL_MN 32
  2695. /* When all BLAS3 routines are implemented with SVE, DGEMM_DEFAULT_UNROLL_M should be "sve_vl".
  2696. Until then, just keep it different from DGEMM_DEFAULT_UNROLL_N to keep copy routines in both directions separated. */
  2697. #define DGEMM_DEFAULT_UNROLL_M 2
  2698. #define DGEMM_DEFAULT_UNROLL_N 8
  2699. #define DGEMM_DEFAULT_UNROLL_MN 32
  2700. #define CGEMM_DEFAULT_UNROLL_M 2
  2701. #define CGEMM_DEFAULT_UNROLL_N 4
  2702. #define CGEMM_DEFAULT_UNROLL_MN 16
  2703. #define ZGEMM_DEFAULT_UNROLL_M 2
  2704. #define ZGEMM_DEFAULT_UNROLL_N 4
  2705. #define ZGEMM_DEFAULT_UNROLL_MN 16
  2706. #define SGEMM_DEFAULT_P 128
  2707. #define DGEMM_DEFAULT_P 160
  2708. #define CGEMM_DEFAULT_P 128
  2709. #define ZGEMM_DEFAULT_P 128
  2710. #define SGEMM_DEFAULT_Q 352
  2711. #define DGEMM_DEFAULT_Q 128
  2712. #define CGEMM_DEFAULT_Q 224
  2713. #define ZGEMM_DEFAULT_Q 112
  2714. #define SGEMM_DEFAULT_R 4096
  2715. #define DGEMM_DEFAULT_R 4096
  2716. #define CGEMM_DEFAULT_R 4096
  2717. #define ZGEMM_DEFAULT_R 4096
  2718. #else /* Other/undetected ARMv8 cores */
  2719. #define SGEMM_DEFAULT_UNROLL_M 16
  2720. #define SGEMM_DEFAULT_UNROLL_N 4
  2721. #define DGEMM_DEFAULT_UNROLL_M 8
  2722. #define DGEMM_DEFAULT_UNROLL_N 4
  2723. #define CGEMM_DEFAULT_UNROLL_M 8
  2724. #define CGEMM_DEFAULT_UNROLL_N 4
  2725. #define ZGEMM_DEFAULT_UNROLL_M 4
  2726. #define ZGEMM_DEFAULT_UNROLL_N 4
  2727. #define SGEMM_DEFAULT_P 128
  2728. #define DGEMM_DEFAULT_P 160
  2729. #define CGEMM_DEFAULT_P 128
  2730. #define ZGEMM_DEFAULT_P 128
  2731. #define SGEMM_DEFAULT_Q 352
  2732. #define DGEMM_DEFAULT_Q 128
  2733. #define CGEMM_DEFAULT_Q 224
  2734. #define ZGEMM_DEFAULT_Q 112
  2735. #define SGEMM_DEFAULT_R 4096
  2736. #define DGEMM_DEFAULT_R 4096
  2737. #define CGEMM_DEFAULT_R 4096
  2738. #define ZGEMM_DEFAULT_R 4096
  2739. #endif /* Cores */
  2740. #endif /* ARMv8 */
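The FIXME in the ARMv8 block above selects the larger P/Q blocking whenever getarch counted more than 8 cores, or when the target is one of a few explicitly listed cores. The sketch below restates that compile-time decision as an ordinary function, using the DGEMM values from the CORTEXA57/CORTEXA72/... branch of the listing. The names `gemm_block`, `pick_dgemm_block` and `known_big_core` are hypothetical and do not appear in param.h; the real selection happens in the preprocessor, not at run time.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical container for one pair of blocking factors. */
struct gemm_block {
    int p;
    int q;
};

/*
 * Mirrors the preprocessor test
 *   #if NUM_CORES > 8 || defined(TSV110) || defined(EMAG8180) || ...
 * as a plain function. known_big_core stands in for the explicitly
 * listed targets (TSV110, EMAG8180, VORTEX, CORTEXX1).
 */
struct gemm_block pick_dgemm_block(int num_cores, bool known_big_core)
{
    assert(num_cores > 0);

    struct gemm_block big   = { 256, 512 };  /* DGEMM_DEFAULT_P/Q, "abundant cache" case */
    struct gemm_block small = { 160, 128 };  /* DGEMM_DEFAULT_P/Q, fallback case */

    return (num_cores > 8 || known_big_core) ? big : small;
}
```

Under these assumptions, `pick_dgemm_block(4, false)` yields the 160/128 pair, while `pick_dgemm_block(16, false)` or any call with `known_big_core` set yields 256/512. The threshold is strict, so exactly 8 cores still takes the smaller blocking.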
  2741. #if defined(ARMV5)
  2742. #define SNUMOPT 2
  2743. #define DNUMOPT 2
  2744. #define GEMM_DEFAULT_OFFSET_A 0
  2745. #define GEMM_DEFAULT_OFFSET_B 0
  2746. #define GEMM_DEFAULT_ALIGN (BLASLONG)0x03fffUL
  2747. #define SGEMM_DEFAULT_UNROLL_M 2
  2748. #define SGEMM_DEFAULT_UNROLL_N 2
  2749. #define DGEMM_DEFAULT_UNROLL_M 2
  2750. #define DGEMM_DEFAULT_UNROLL_N 2
  2751. #define CGEMM_DEFAULT_UNROLL_M 2
  2752. #define CGEMM_DEFAULT_UNROLL_N 2
  2753. #define ZGEMM_DEFAULT_UNROLL_M 2
  2754. #define ZGEMM_DEFAULT_UNROLL_N 2
  2755. #define SGEMM_DEFAULT_P 128
  2756. #define DGEMM_DEFAULT_P 128
  2757. #define CGEMM_DEFAULT_P 96
  2758. #define ZGEMM_DEFAULT_P 64
  2759. #define SGEMM_DEFAULT_Q 240
  2760. #define DGEMM_DEFAULT_Q 120
  2761. #define CGEMM_DEFAULT_Q 120
  2762. #define ZGEMM_DEFAULT_Q 120
  2763. #define SGEMM_DEFAULT_R 12288
  2764. #define DGEMM_DEFAULT_R 8192
  2765. #define CGEMM_DEFAULT_R 4096
  2766. #define ZGEMM_DEFAULT_R 4096
  2767. #define SYMV_P 16
  2768. #endif
  2769. #ifdef CORTEXA9
  2770. #define SNUMOPT 2
  2771. #define DNUMOPT 2
  2772. #define GEMM_DEFAULT_OFFSET_A 0
  2773. #define GEMM_DEFAULT_OFFSET_B 0
  2774. #define GEMM_DEFAULT_ALIGN (BLASLONG)0x03fffUL
  2775. #define SGEMM_DEFAULT_UNROLL_M 4
  2776. #define SGEMM_DEFAULT_UNROLL_N 4
  2777. #define DGEMM_DEFAULT_UNROLL_M 4
  2778. #define DGEMM_DEFAULT_UNROLL_N 4
  2779. #define CGEMM_DEFAULT_UNROLL_M 2
  2780. #define CGEMM_DEFAULT_UNROLL_N 2
  2781. #define ZGEMM_DEFAULT_UNROLL_M 2
  2782. #define ZGEMM_DEFAULT_UNROLL_N 2
  2783. #define SGEMM_DEFAULT_P 128
  2784. #define DGEMM_DEFAULT_P 128
  2785. #define CGEMM_DEFAULT_P 96
  2786. #define ZGEMM_DEFAULT_P 64
  2787. #define SGEMM_DEFAULT_Q 240
  2788. #define DGEMM_DEFAULT_Q 120
  2789. #define CGEMM_DEFAULT_Q 120
  2790. #define ZGEMM_DEFAULT_Q 120
  2791. #define SGEMM_DEFAULT_R 12288
  2792. #define DGEMM_DEFAULT_R 8192
  2793. #define CGEMM_DEFAULT_R 4096
  2794. #define ZGEMM_DEFAULT_R 4096
  2795. #define SYMV_P 16
  2796. #endif
  2797. #ifdef CORTEXA15
  2798. #define SNUMOPT 2
  2799. #define DNUMOPT 2
  2800. #define GEMM_DEFAULT_OFFSET_A 0
  2801. #define GEMM_DEFAULT_OFFSET_B 0
  2802. #define GEMM_DEFAULT_ALIGN (BLASLONG)0x03fffUL
  2803. #define SGEMM_DEFAULT_UNROLL_M 4
  2804. #define SGEMM_DEFAULT_UNROLL_N 4
  2805. #define DGEMM_DEFAULT_UNROLL_M 4
  2806. #define DGEMM_DEFAULT_UNROLL_N 4
  2807. #define CGEMM_DEFAULT_UNROLL_M 2
  2808. #define CGEMM_DEFAULT_UNROLL_N 2
  2809. #define ZGEMM_DEFAULT_UNROLL_M 2
  2810. #define ZGEMM_DEFAULT_UNROLL_N 2
  2811. #define SGEMM_DEFAULT_P 128
  2812. #define DGEMM_DEFAULT_P 128
  2813. #define CGEMM_DEFAULT_P 96
  2814. #define ZGEMM_DEFAULT_P 64
  2815. #define SGEMM_DEFAULT_Q 240
  2816. #define DGEMM_DEFAULT_Q 120
  2817. #define CGEMM_DEFAULT_Q 120
  2818. #define ZGEMM_DEFAULT_Q 120
  2819. #define SGEMM_DEFAULT_R 12288
  2820. #define DGEMM_DEFAULT_R 8192
  2821. #define CGEMM_DEFAULT_R 4096
  2822. #define ZGEMM_DEFAULT_R 4096
  2823. #define SYMV_P 16
  2824. #endif
  2825. #if defined(ZARCH_GENERIC)
  2826. #define SNUMOPT 2
  2827. #define DNUMOPT 2
  2828. #define GEMM_DEFAULT_OFFSET_A 0
  2829. #define GEMM_DEFAULT_OFFSET_B 0
  2830. #define GEMM_DEFAULT_ALIGN (BLASLONG)0x03fffUL
  2831. #define SGEMM_DEFAULT_UNROLL_M 2
  2832. #define SGEMM_DEFAULT_UNROLL_N 2
  2833. #define DGEMM_DEFAULT_UNROLL_M 2
  2834. #define DGEMM_DEFAULT_UNROLL_N 2
  2835. #define CGEMM_DEFAULT_UNROLL_M 2
  2836. #define CGEMM_DEFAULT_UNROLL_N 2
  2837. #define ZGEMM_DEFAULT_UNROLL_M 2
  2838. #define ZGEMM_DEFAULT_UNROLL_N 2
  2839. #define SGEMM_DEFAULT_P 128
  2840. #define DGEMM_DEFAULT_P 128
  2841. #define CGEMM_DEFAULT_P 96
  2842. #define ZGEMM_DEFAULT_P 64
  2843. #define SGEMM_DEFAULT_Q 240
  2844. #define DGEMM_DEFAULT_Q 120
  2845. #define CGEMM_DEFAULT_Q 120
  2846. #define ZGEMM_DEFAULT_Q 120
  2847. #define SGEMM_DEFAULT_R 12288
  2848. #define DGEMM_DEFAULT_R 8192
  2849. #define CGEMM_DEFAULT_R 4096
  2850. #define ZGEMM_DEFAULT_R 4096
  2851. #define SYMV_P 16
  2852. #endif
  2853. #if defined(Z13)
  2854. #define SNUMOPT 2
  2855. #define DNUMOPT 2
  2856. #define GEMM_DEFAULT_OFFSET_A 0
  2857. #define GEMM_DEFAULT_OFFSET_B 0
  2858. #define GEMM_DEFAULT_ALIGN (BLASLONG)0x03fffUL
  2859. #define SGEMM_DEFAULT_UNROLL_M 8
  2860. #define SGEMM_DEFAULT_UNROLL_N 4
  2861. #define DGEMM_DEFAULT_UNROLL_M 8
  2862. #define DGEMM_DEFAULT_UNROLL_N 4
  2863. #define CGEMM_DEFAULT_UNROLL_M 4
  2864. #define CGEMM_DEFAULT_UNROLL_N 4
  2865. #define ZGEMM_DEFAULT_UNROLL_M 4
  2866. #define ZGEMM_DEFAULT_UNROLL_N 4
  2867. #define SGEMM_DEFAULT_P 456
  2868. #define DGEMM_DEFAULT_P 320
  2869. #define CGEMM_DEFAULT_P 480
  2870. #define ZGEMM_DEFAULT_P 224
  2871. #define SGEMM_DEFAULT_Q 488
  2872. #define DGEMM_DEFAULT_Q 384
  2873. #define CGEMM_DEFAULT_Q 128
  2874. #define ZGEMM_DEFAULT_Q 352
  2875. #define SGEMM_DEFAULT_R 8192
  2876. #define DGEMM_DEFAULT_R 4096
  2877. #define CGEMM_DEFAULT_R 4096
  2878. #define ZGEMM_DEFAULT_R 2048
  2879. #define SYMV_P 16
  2880. #endif
  2881. #if defined(Z14)
  2882. #define SNUMOPT 2
  2883. #define DNUMOPT 2
  2884. #define GEMM_DEFAULT_OFFSET_A 0
  2885. #define GEMM_DEFAULT_OFFSET_B 0
  2886. #define GEMM_DEFAULT_ALIGN 0x03fffUL
  2887. #define SGEMM_DEFAULT_UNROLL_M 16
  2888. #define SGEMM_DEFAULT_UNROLL_N 4
  2889. #define DGEMM_DEFAULT_UNROLL_M 8
  2890. #define DGEMM_DEFAULT_UNROLL_N 4
  2891. #define CGEMM_DEFAULT_UNROLL_M 4
  2892. #define CGEMM_DEFAULT_UNROLL_N 4
  2893. #define ZGEMM_DEFAULT_UNROLL_M 4
  2894. #define ZGEMM_DEFAULT_UNROLL_N 4
  2895. #define SGEMM_DEFAULT_P 480
  2896. #define DGEMM_DEFAULT_P 320
  2897. #define CGEMM_DEFAULT_P 480
  2898. #define ZGEMM_DEFAULT_P 224
  2899. #define SGEMM_DEFAULT_Q 512
  2900. #define DGEMM_DEFAULT_Q 384
  2901. #define CGEMM_DEFAULT_Q 128
  2902. #define ZGEMM_DEFAULT_Q 352
  2903. #define SGEMM_DEFAULT_R 8192
  2904. #define DGEMM_DEFAULT_R 4096
  2905. #define CGEMM_DEFAULT_R 4096
  2906. #define ZGEMM_DEFAULT_R 2048
  2907. #define SYMV_P 16
  2908. #endif
  2909. #ifdef GENERIC
  2910. #define SNUMOPT 2
  2911. #define DNUMOPT 2
  2912. #define GEMM_DEFAULT_OFFSET_A 0
  2913. #define GEMM_DEFAULT_OFFSET_B 0
  2914. #define GEMM_DEFAULT_ALIGN (BLASLONG)0x0ffffUL
  2915. #define SGEMM_DEFAULT_UNROLL_N 2
  2916. #define DGEMM_DEFAULT_UNROLL_N 2
  2917. #define QGEMM_DEFAULT_UNROLL_N 2
  2918. #define CGEMM_DEFAULT_UNROLL_N 2
  2919. #define ZGEMM_DEFAULT_UNROLL_N 2
  2920. #define XGEMM_DEFAULT_UNROLL_N 1
  2921. #ifdef ARCH_X86
  2922. #define SGEMM_DEFAULT_UNROLL_M 2
  2923. #define DGEMM_DEFAULT_UNROLL_M 2
  2924. #define QGEMM_DEFAULT_UNROLL_M 2
  2925. #define CGEMM_DEFAULT_UNROLL_M 2
  2926. #define ZGEMM_DEFAULT_UNROLL_M 2
  2927. #define XGEMM_DEFAULT_UNROLL_M 1
  2928. #else
  2929. #define SGEMM_DEFAULT_UNROLL_M 2
  2930. #define DGEMM_DEFAULT_UNROLL_M 2
  2931. #define QGEMM_DEFAULT_UNROLL_M 2
  2932. #define CGEMM_DEFAULT_UNROLL_M 2
  2933. #define ZGEMM_DEFAULT_UNROLL_M 2
  2934. #define XGEMM_DEFAULT_UNROLL_M 1
  2935. #endif
  2936. #ifdef ARCH_MIPS
  2937. #define SGEMM_DEFAULT_P 128
  2938. #define DGEMM_DEFAULT_P 128
  2939. #define CGEMM_DEFAULT_P 96
  2940. #define ZGEMM_DEFAULT_P 64
  2941. #define SGEMM_DEFAULT_Q 240
  2942. #define DGEMM_DEFAULT_Q 120
  2943. #define CGEMM_DEFAULT_Q 120
  2944. #define ZGEMM_DEFAULT_Q 120
  2945. #define SGEMM_DEFAULT_R 12288
  2946. #define DGEMM_DEFAULT_R 8192
  2947. #define CGEMM_DEFAULT_R 4096
  2948. #define ZGEMM_DEFAULT_R 4096
  2949. #else
  2950. #define SGEMM_DEFAULT_P sgemm_p
  2951. #define DGEMM_DEFAULT_P dgemm_p
  2952. #define QGEMM_DEFAULT_P qgemm_p
  2953. #define CGEMM_DEFAULT_P cgemm_p
  2954. #define ZGEMM_DEFAULT_P zgemm_p
  2955. #define XGEMM_DEFAULT_P xgemm_p
  2956. #define SGEMM_DEFAULT_R sgemm_r
  2957. #define DGEMM_DEFAULT_R dgemm_r
  2958. #define QGEMM_DEFAULT_R qgemm_r
  2959. #define CGEMM_DEFAULT_R cgemm_r
  2960. #define ZGEMM_DEFAULT_R zgemm_r
  2961. #define XGEMM_DEFAULT_R xgemm_r
  2962. #define SGEMM_DEFAULT_Q 128
  2963. #define DGEMM_DEFAULT_Q 128
  2964. #define QGEMM_DEFAULT_Q 128
  2965. #define CGEMM_DEFAULT_Q 128
  2966. #define ZGEMM_DEFAULT_Q 128
  2967. #define XGEMM_DEFAULT_Q 128
  2968. #endif
  2969. #define SYMV_P 16
  2970. #endif
  2971. #ifndef QGEMM_DEFAULT_UNROLL_M
  2972. #define QGEMM_DEFAULT_UNROLL_M 2
  2973. #endif
  2974. #ifndef QGEMM_DEFAULT_UNROLL_N
  2975. #define QGEMM_DEFAULT_UNROLL_N 2
  2976. #endif
  2977. #ifndef XGEMM_DEFAULT_UNROLL_M
  2978. #define XGEMM_DEFAULT_UNROLL_M 2
  2979. #endif
  2980. #ifndef XGEMM_DEFAULT_UNROLL_N
  2981. #define XGEMM_DEFAULT_UNROLL_N 2
  2982. #endif
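The four `#ifndef` guards above supply a value of 2 only when no earlier target block has defined the macro. A small sketch of that pattern, assuming a stand-in target that sets `XGEMM_DEFAULT_UNROLL_M` to 1 (as the GENERIC block does) and leaves `QGEMM_DEFAULT_UNROLL_M` undefined. The two accessor functions are hypothetical and exist only so the resolved values can be inspected.

```c
#include <assert.h>

/* Stand-in for a target block: defines the X value, omits the Q value. */
#define XGEMM_DEFAULT_UNROLL_M 1

/* Fallbacks, same shape as in the listing. */
#ifndef QGEMM_DEFAULT_UNROLL_M
#define QGEMM_DEFAULT_UNROLL_M 2   /* taken: nothing defined it earlier */
#endif
#ifndef XGEMM_DEFAULT_UNROLL_M
#define XGEMM_DEFAULT_UNROLL_M 2   /* skipped: the target value wins */
#endif

/* Hypothetical accessors exposing the values the preprocessor settled on. */
int qgemm_unroll_m(void) { return QGEMM_DEFAULT_UNROLL_M; }
int xgemm_unroll_m(void) { return XGEMM_DEFAULT_UNROLL_M; }
```

Because the guards come after every target block, a target only needs to define the Q and X parameters when it wants something other than 2.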
  2983. #ifndef HAVE_SSE2
  2984. #define SHUFPD_0 shufps $0x44,
  2985. #define SHUFPD_1 shufps $0x4e,
  2986. #define SHUFPD_2 shufps $0xe4,
  2987. #define SHUFPD_3 shufps $0xee,
  2988. #endif
  2989. #ifndef SHUFPD_0
  2990. #define SHUFPD_0 shufpd $0,
  2991. #endif
  2992. #ifndef SHUFPD_1
  2993. #define SHUFPD_1 shufpd $1,
  2994. #endif
  2995. #ifndef SHUFPD_2
  2996. #define SHUFPD_2 shufpd $2,
  2997. #endif
  2998. #ifndef SHUFPD_3
  2999. #define SHUFPD_3 shufpd $3,
  3000. #endif
  3001. #ifndef SHUFPS_39
  3002. #define SHUFPS_39 shufps $0x39,
  3003. #endif
  3004. #endif
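When `HAVE_SSE2` is not defined, the listing maps each `SHUFPD_n` macro to a `shufps` whose immediate moves pairs of 32-bit lanes together, so each pair behaves like one 64-bit element. The sketch below models both instructions on plain arrays to show why the immediates 0x44, 0x4e, 0xe4 and 0xee line up with `shufpd` selectors 0 to 3. It is a software model written for illustration, with hypothetical function names, and is not code from OpenBLAS.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Model of SHUFPS: the two low result lanes are picked from dst,
 * the two high result lanes from src, two immediate bits per lane. */
void model_shufps(uint32_t dst[4], const uint32_t src[4], unsigned imm)
{
    uint32_t out[4];
    out[0] = dst[imm & 3];
    out[1] = dst[(imm >> 2) & 3];
    out[2] = src[(imm >> 4) & 3];
    out[3] = src[(imm >> 6) & 3];
    memcpy(dst, out, sizeof out);
}

/* Model of SHUFPD: bit 0 picks which 64-bit half of dst becomes the low
 * result, bit 1 picks which 64-bit half of src becomes the high result. */
void model_shufpd(uint64_t dst[2], const uint64_t src[2], unsigned imm)
{
    uint64_t out[2];
    out[0] = dst[imm & 1];
    out[1] = src[(imm >> 1) & 1];
    memcpy(dst, out, sizeof out);
}

/* Returns 1 if shufps with imm_ps moves the same bits as shufpd with imm_pd. */
int shuffles_agree(unsigned imm_ps, unsigned imm_pd)
{
    uint64_t d64[2] = { 0x1111111122222222ULL, 0x3333333344444444ULL };
    uint64_t s64[2] = { 0x5555555566666666ULL, 0x7777777788888888ULL };
    uint32_t d32[4], s32[4];

    /* Reinterpret the same 128 bits as four 32-bit lanes. */
    memcpy(d32, d64, sizeof d32);
    memcpy(s32, s64, sizeof s32);

    model_shufps(d32, s32, imm_ps);
    model_shufpd(d64, s64, imm_pd);
    return memcmp(d32, d64, sizeof d32) == 0;
}
```

In this model, `shuffles_agree(0x44, 0)`, `shuffles_agree(0x4e, 1)`, `shuffles_agree(0xe4, 2)` and `shuffles_agree(0xee, 3)` all return 1, matching the four fallback definitions, while a mismatched pair such as `shuffles_agree(0x44, 1)` returns 0.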