You can not select more than 25 topics Topics must start with a chinese character,a letter or number, can include dashes ('-') and can be up to 35 characters long.

param.h 99 kB

12 years ago
12 years ago
12 years ago
12 years ago
12 years ago
12 years ago
12 years ago
12 years ago
12 years ago
12 years ago
5 years ago
5 years ago
5 years ago
5 years ago
5 years ago
5 years ago
5 years ago
5 years ago
12 years ago
12 years ago
12 years ago
12 years ago
12 years ago
12 years ago
12 years ago
12 years ago
12 years ago
12 years ago
5 years ago
12 years ago
12 years ago
5 years ago
5 years ago
5 years ago
5 years ago
5 years ago
5 years ago
5 years ago
12 years ago
12 years ago
12 years ago
5 years ago
5 years ago
12 years ago
5 years ago
5 years ago
12 years ago
5 years ago
12 years ago
5 years ago
5 years ago
5 years ago
12 years ago
6 years ago
12 years ago
12 years ago
12 years ago
12 years ago
Simplifying ARMv8 build parameters ARMv8 builds were a bit mixed up, with ThunderX2 code in ARMv8 mode (which is not right because TX2 is ARMv8.1) as well as requiring a few redundancies in the defines, making it harder to maintain and understand what core has what. A few other minor issues were also fixed. Tests were made on the following cores: A53, A57, A72, Falkor, ThunderX, ThunderX2, and XGene. Tests were: OpenBLAS/test, OpenBLAS/benchmark, BLAS-Tester. A summary: * Removed TX2 code from ARMv8 build, to make sure it is compatible with all ARMv8 cores, not just v8.1. Also, the TX2 code has actually harmed performance on big cores. * Commoned up ARMv8 architectures' defines in params.h, to make sure that all will benefit from ARMv8 settings, in addition to their own. * Adding a few more cores, using ARMv8's include strategy, to benefit from compiler optimisations using mtune. Also updated cache information from the manuals, making sure we set good conservative values by default. Removed Vulcan, as it's an alias to TX2. * Auto-detecting most of those cores, but also updating the forced compilation in getarch.c, to make sure the parameters are the same whether compiled natively or forced arch. Benefits: * ARMv8 build is now guaranteed to work on all ARMv8 cores * Improved performance for ARMv8 builds on some cores (A72, Falkor, ThunderX1 and 2: up to 11%) over current develop * Improved performance for *all* cores comparing to develop branch before TX2's patch (9% ~ 36%) * ThunderX1 builds are 14% faster than ARMv8 on TX1, 9% faster than current develop's branch and 8% faster than deveop before tx2 patches Issues: * Regression from current develop branch for A53 (-12%) and A57 (-3%) with ARMv8 builds, but still faster than before TX2's commit (+15% and +24% respectively). This can be improved with a simplification of TX2's code, to be done in future patches. At least the code is guaranteed to be ARMv8.0 now. Comments: * CortexA57 builds are unchanged on A57 hardware from develop's branch, which makes sense, as it's untouched. * CortexA72 builds improve over A57 on A72 hardware, even if they're using the same includes due to new compiler tunning in the makefile.
6 years ago
12 years ago
3 years ago
Simplifying ARMv8 build parameters ARMv8 builds were a bit mixed up, with ThunderX2 code in ARMv8 mode (which is not right because TX2 is ARMv8.1) as well as requiring a few redundancies in the defines, making it harder to maintain and understand what core has what. A few other minor issues were also fixed. Tests were made on the following cores: A53, A57, A72, Falkor, ThunderX, ThunderX2, and XGene. Tests were: OpenBLAS/test, OpenBLAS/benchmark, BLAS-Tester. A summary: * Removed TX2 code from ARMv8 build, to make sure it is compatible with all ARMv8 cores, not just v8.1. Also, the TX2 code has actually harmed performance on big cores. * Commoned up ARMv8 architectures' defines in params.h, to make sure that all will benefit from ARMv8 settings, in addition to their own. * Adding a few more cores, using ARMv8's include strategy, to benefit from compiler optimisations using mtune. Also updated cache information from the manuals, making sure we set good conservative values by default. Removed Vulcan, as it's an alias to TX2. * Auto-detecting most of those cores, but also updating the forced compilation in getarch.c, to make sure the parameters are the same whether compiled natively or forced arch. Benefits: * ARMv8 build is now guaranteed to work on all ARMv8 cores * Improved performance for ARMv8 builds on some cores (A72, Falkor, ThunderX1 and 2: up to 11%) over current develop * Improved performance for *all* cores comparing to develop branch before TX2's patch (9% ~ 36%) * ThunderX1 builds are 14% faster than ARMv8 on TX1, 9% faster than current develop's branch and 8% faster than deveop before tx2 patches Issues: * Regression from current develop branch for A53 (-12%) and A57 (-3%) with ARMv8 builds, but still faster than before TX2's commit (+15% and +24% respectively). This can be improved with a simplification of TX2's code, to be done in future patches. At least the code is guaranteed to be ARMv8.0 now. Comments: * CortexA57 builds are unchanged on A57 hardware from develop's branch, which makes sense, as it's untouched. * CortexA72 builds improve over A57 on A72 hardware, even if they're using the same includes due to new compiler tunning in the makefile.
6 years ago
Simplifying ARMv8 build parameters ARMv8 builds were a bit mixed up, with ThunderX2 code in ARMv8 mode (which is not right because TX2 is ARMv8.1) as well as requiring a few redundancies in the defines, making it harder to maintain and understand what core has what. A few other minor issues were also fixed. Tests were made on the following cores: A53, A57, A72, Falkor, ThunderX, ThunderX2, and XGene. Tests were: OpenBLAS/test, OpenBLAS/benchmark, BLAS-Tester. A summary: * Removed TX2 code from ARMv8 build, to make sure it is compatible with all ARMv8 cores, not just v8.1. Also, the TX2 code has actually harmed performance on big cores. * Commoned up ARMv8 architectures' defines in params.h, to make sure that all will benefit from ARMv8 settings, in addition to their own. * Adding a few more cores, using ARMv8's include strategy, to benefit from compiler optimisations using mtune. Also updated cache information from the manuals, making sure we set good conservative values by default. Removed Vulcan, as it's an alias to TX2. * Auto-detecting most of those cores, but also updating the forced compilation in getarch.c, to make sure the parameters are the same whether compiled natively or forced arch. Benefits: * ARMv8 build is now guaranteed to work on all ARMv8 cores * Improved performance for ARMv8 builds on some cores (A72, Falkor, ThunderX1 and 2: up to 11%) over current develop * Improved performance for *all* cores comparing to develop branch before TX2's patch (9% ~ 36%) * ThunderX1 builds are 14% faster than ARMv8 on TX1, 9% faster than current develop's branch and 8% faster than deveop before tx2 patches Issues: * Regression from current develop branch for A53 (-12%) and A57 (-3%) with ARMv8 builds, but still faster than before TX2's commit (+15% and +24% respectively). This can be improved with a simplification of TX2's code, to be done in future patches. At least the code is guaranteed to be ARMv8.0 now. Comments: * CortexA57 builds are unchanged on A57 hardware from develop's branch, which makes sense, as it's untouched. * CortexA72 builds improve over A57 on A72 hardware, even if they're using the same includes due to new compiler tunning in the makefile.
6 years ago
12 years ago
12 years ago
Simplifying ARMv8 build parameters ARMv8 builds were a bit mixed up, with ThunderX2 code in ARMv8 mode (which is not right because TX2 is ARMv8.1) as well as requiring a few redundancies in the defines, making it harder to maintain and understand what core has what. A few other minor issues were also fixed. Tests were made on the following cores: A53, A57, A72, Falkor, ThunderX, ThunderX2, and XGene. Tests were: OpenBLAS/test, OpenBLAS/benchmark, BLAS-Tester. A summary: * Removed TX2 code from ARMv8 build, to make sure it is compatible with all ARMv8 cores, not just v8.1. Also, the TX2 code has actually harmed performance on big cores. * Commoned up ARMv8 architectures' defines in params.h, to make sure that all will benefit from ARMv8 settings, in addition to their own. * Adding a few more cores, using ARMv8's include strategy, to benefit from compiler optimisations using mtune. Also updated cache information from the manuals, making sure we set good conservative values by default. Removed Vulcan, as it's an alias to TX2. * Auto-detecting most of those cores, but also updating the forced compilation in getarch.c, to make sure the parameters are the same whether compiled natively or forced arch. Benefits: * ARMv8 build is now guaranteed to work on all ARMv8 cores * Improved performance for ARMv8 builds on some cores (A72, Falkor, ThunderX1 and 2: up to 11%) over current develop * Improved performance for *all* cores comparing to develop branch before TX2's patch (9% ~ 36%) * ThunderX1 builds are 14% faster than ARMv8 on TX1, 9% faster than current develop's branch and 8% faster than deveop before tx2 patches Issues: * Regression from current develop branch for A53 (-12%) and A57 (-3%) with ARMv8 builds, but still faster than before TX2's commit (+15% and +24% respectively). This can be improved with a simplification of TX2's code, to be done in future patches. At least the code is guaranteed to be ARMv8.0 now. Comments: * CortexA57 builds are unchanged on A57 hardware from develop's branch, which makes sense, as it's untouched. * CortexA72 builds improve over A57 on A72 hardware, even if they're using the same includes due to new compiler tunning in the makefile.
6 years ago
12 years ago
Simplifying ARMv8 build parameters ARMv8 builds were a bit mixed up, with ThunderX2 code in ARMv8 mode (which is not right because TX2 is ARMv8.1) as well as requiring a few redundancies in the defines, making it harder to maintain and understand what core has what. A few other minor issues were also fixed. Tests were made on the following cores: A53, A57, A72, Falkor, ThunderX, ThunderX2, and XGene. Tests were: OpenBLAS/test, OpenBLAS/benchmark, BLAS-Tester. A summary: * Removed TX2 code from ARMv8 build, to make sure it is compatible with all ARMv8 cores, not just v8.1. Also, the TX2 code has actually harmed performance on big cores. * Commoned up ARMv8 architectures' defines in params.h, to make sure that all will benefit from ARMv8 settings, in addition to their own. * Adding a few more cores, using ARMv8's include strategy, to benefit from compiler optimisations using mtune. Also updated cache information from the manuals, making sure we set good conservative values by default. Removed Vulcan, as it's an alias to TX2. * Auto-detecting most of those cores, but also updating the forced compilation in getarch.c, to make sure the parameters are the same whether compiled natively or forced arch. Benefits: * ARMv8 build is now guaranteed to work on all ARMv8 cores * Improved performance for ARMv8 builds on some cores (A72, Falkor, ThunderX1 and 2: up to 11%) over current develop * Improved performance for *all* cores comparing to develop branch before TX2's patch (9% ~ 36%) * ThunderX1 builds are 14% faster than ARMv8 on TX1, 9% faster than current develop's branch and 8% faster than deveop before tx2 patches Issues: * Regression from current develop branch for A53 (-12%) and A57 (-3%) with ARMv8 builds, but still faster than before TX2's commit (+15% and +24% respectively). This can be improved with a simplification of TX2's code, to be done in future patches. At least the code is guaranteed to be ARMv8.0 now. Comments: * CortexA57 builds are unchanged on A57 hardware from develop's branch, which makes sense, as it's untouched. * CortexA72 builds improve over A57 on A72 hardware, even if they're using the same includes due to new compiler tunning in the makefile.
6 years ago
Simplifying ARMv8 build parameters ARMv8 builds were a bit mixed up, with ThunderX2 code in ARMv8 mode (which is not right because TX2 is ARMv8.1) as well as requiring a few redundancies in the defines, making it harder to maintain and understand what core has what. A few other minor issues were also fixed. Tests were made on the following cores: A53, A57, A72, Falkor, ThunderX, ThunderX2, and XGene. Tests were: OpenBLAS/test, OpenBLAS/benchmark, BLAS-Tester. A summary: * Removed TX2 code from ARMv8 build, to make sure it is compatible with all ARMv8 cores, not just v8.1. Also, the TX2 code has actually harmed performance on big cores. * Commoned up ARMv8 architectures' defines in params.h, to make sure that all will benefit from ARMv8 settings, in addition to their own. * Adding a few more cores, using ARMv8's include strategy, to benefit from compiler optimisations using mtune. Also updated cache information from the manuals, making sure we set good conservative values by default. Removed Vulcan, as it's an alias to TX2. * Auto-detecting most of those cores, but also updating the forced compilation in getarch.c, to make sure the parameters are the same whether compiled natively or forced arch. Benefits: * ARMv8 build is now guaranteed to work on all ARMv8 cores * Improved performance for ARMv8 builds on some cores (A72, Falkor, ThunderX1 and 2: up to 11%) over current develop * Improved performance for *all* cores comparing to develop branch before TX2's patch (9% ~ 36%) * ThunderX1 builds are 14% faster than ARMv8 on TX1, 9% faster than current develop's branch and 8% faster than deveop before tx2 patches Issues: * Regression from current develop branch for A53 (-12%) and A57 (-3%) with ARMv8 builds, but still faster than before TX2's commit (+15% and +24% respectively). This can be improved with a simplification of TX2's code, to be done in future patches. At least the code is guaranteed to be ARMv8.0 now. Comments: * CortexA57 builds are unchanged on A57 hardware from develop's branch, which makes sense, as it's untouched. * CortexA72 builds improve over A57 on A72 hardware, even if they're using the same includes due to new compiler tunning in the makefile.
6 years ago
Simplifying ARMv8 build parameters ARMv8 builds were a bit mixed up, with ThunderX2 code in ARMv8 mode (which is not right because TX2 is ARMv8.1) as well as requiring a few redundancies in the defines, making it harder to maintain and understand what core has what. A few other minor issues were also fixed. Tests were made on the following cores: A53, A57, A72, Falkor, ThunderX, ThunderX2, and XGene. Tests were: OpenBLAS/test, OpenBLAS/benchmark, BLAS-Tester. A summary: * Removed TX2 code from ARMv8 build, to make sure it is compatible with all ARMv8 cores, not just v8.1. Also, the TX2 code has actually harmed performance on big cores. * Commoned up ARMv8 architectures' defines in params.h, to make sure that all will benefit from ARMv8 settings, in addition to their own. * Adding a few more cores, using ARMv8's include strategy, to benefit from compiler optimisations using mtune. Also updated cache information from the manuals, making sure we set good conservative values by default. Removed Vulcan, as it's an alias to TX2. * Auto-detecting most of those cores, but also updating the forced compilation in getarch.c, to make sure the parameters are the same whether compiled natively or forced arch. Benefits: * ARMv8 build is now guaranteed to work on all ARMv8 cores * Improved performance for ARMv8 builds on some cores (A72, Falkor, ThunderX1 and 2: up to 11%) over current develop * Improved performance for *all* cores comparing to develop branch before TX2's patch (9% ~ 36%) * ThunderX1 builds are 14% faster than ARMv8 on TX1, 9% faster than current develop's branch and 8% faster than deveop before tx2 patches Issues: * Regression from current develop branch for A53 (-12%) and A57 (-3%) with ARMv8 builds, but still faster than before TX2's commit (+15% and +24% respectively). This can be improved with a simplification of TX2's code, to be done in future patches. At least the code is guaranteed to be ARMv8.0 now. Comments: * CortexA57 builds are unchanged on A57 hardware from develop's branch, which makes sense, as it's untouched. * CortexA72 builds improve over A57 on A72 hardware, even if they're using the same includes due to new compiler tunning in the makefile.
6 years ago
Simplifying ARMv8 build parameters ARMv8 builds were a bit mixed up, with ThunderX2 code in ARMv8 mode (which is not right because TX2 is ARMv8.1) as well as requiring a few redundancies in the defines, making it harder to maintain and understand what core has what. A few other minor issues were also fixed. Tests were made on the following cores: A53, A57, A72, Falkor, ThunderX, ThunderX2, and XGene. Tests were: OpenBLAS/test, OpenBLAS/benchmark, BLAS-Tester. A summary: * Removed TX2 code from ARMv8 build, to make sure it is compatible with all ARMv8 cores, not just v8.1. Also, the TX2 code has actually harmed performance on big cores. * Commoned up ARMv8 architectures' defines in params.h, to make sure that all will benefit from ARMv8 settings, in addition to their own. * Adding a few more cores, using ARMv8's include strategy, to benefit from compiler optimisations using mtune. Also updated cache information from the manuals, making sure we set good conservative values by default. Removed Vulcan, as it's an alias to TX2. * Auto-detecting most of those cores, but also updating the forced compilation in getarch.c, to make sure the parameters are the same whether compiled natively or forced arch. Benefits: * ARMv8 build is now guaranteed to work on all ARMv8 cores * Improved performance for ARMv8 builds on some cores (A72, Falkor, ThunderX1 and 2: up to 11%) over current develop * Improved performance for *all* cores comparing to develop branch before TX2's patch (9% ~ 36%) * ThunderX1 builds are 14% faster than ARMv8 on TX1, 9% faster than current develop's branch and 8% faster than deveop before tx2 patches Issues: * Regression from current develop branch for A53 (-12%) and A57 (-3%) with ARMv8 builds, but still faster than before TX2's commit (+15% and +24% respectively). This can be improved with a simplification of TX2's code, to be done in future patches. At least the code is guaranteed to be ARMv8.0 now. Comments: * CortexA57 builds are unchanged on A57 hardware from develop's branch, which makes sense, as it's untouched. * CortexA72 builds improve over A57 on A72 hardware, even if they're using the same includes due to new compiler tunning in the makefile.
6 years ago
Simplifying ARMv8 build parameters ARMv8 builds were a bit mixed up, with ThunderX2 code in ARMv8 mode (which is not right because TX2 is ARMv8.1) as well as requiring a few redundancies in the defines, making it harder to maintain and understand what core has what. A few other minor issues were also fixed. Tests were made on the following cores: A53, A57, A72, Falkor, ThunderX, ThunderX2, and XGene. Tests were: OpenBLAS/test, OpenBLAS/benchmark, BLAS-Tester. A summary: * Removed TX2 code from ARMv8 build, to make sure it is compatible with all ARMv8 cores, not just v8.1. Also, the TX2 code has actually harmed performance on big cores. * Commoned up ARMv8 architectures' defines in params.h, to make sure that all will benefit from ARMv8 settings, in addition to their own. * Adding a few more cores, using ARMv8's include strategy, to benefit from compiler optimisations using mtune. Also updated cache information from the manuals, making sure we set good conservative values by default. Removed Vulcan, as it's an alias to TX2. * Auto-detecting most of those cores, but also updating the forced compilation in getarch.c, to make sure the parameters are the same whether compiled natively or forced arch. Benefits: * ARMv8 build is now guaranteed to work on all ARMv8 cores * Improved performance for ARMv8 builds on some cores (A72, Falkor, ThunderX1 and 2: up to 11%) over current develop * Improved performance for *all* cores comparing to develop branch before TX2's patch (9% ~ 36%) * ThunderX1 builds are 14% faster than ARMv8 on TX1, 9% faster than current develop's branch and 8% faster than deveop before tx2 patches Issues: * Regression from current develop branch for A53 (-12%) and A57 (-3%) with ARMv8 builds, but still faster than before TX2's commit (+15% and +24% respectively). This can be improved with a simplification of TX2's code, to be done in future patches. At least the code is guaranteed to be ARMv8.0 now. Comments: * CortexA57 builds are unchanged on A57 hardware from develop's branch, which makes sense, as it's untouched. * CortexA72 builds improve over A57 on A72 hardware, even if they're using the same includes due to new compiler tunning in the makefile.
6 years ago
Simplifying ARMv8 build parameters ARMv8 builds were a bit mixed up, with ThunderX2 code in ARMv8 mode (which is not right because TX2 is ARMv8.1) as well as requiring a few redundancies in the defines, making it harder to maintain and understand what core has what. A few other minor issues were also fixed. Tests were made on the following cores: A53, A57, A72, Falkor, ThunderX, ThunderX2, and XGene. Tests were: OpenBLAS/test, OpenBLAS/benchmark, BLAS-Tester. A summary: * Removed TX2 code from ARMv8 build, to make sure it is compatible with all ARMv8 cores, not just v8.1. Also, the TX2 code has actually harmed performance on big cores. * Commoned up ARMv8 architectures' defines in params.h, to make sure that all will benefit from ARMv8 settings, in addition to their own. * Adding a few more cores, using ARMv8's include strategy, to benefit from compiler optimisations using mtune. Also updated cache information from the manuals, making sure we set good conservative values by default. Removed Vulcan, as it's an alias to TX2. * Auto-detecting most of those cores, but also updating the forced compilation in getarch.c, to make sure the parameters are the same whether compiled natively or forced arch. Benefits: * ARMv8 build is now guaranteed to work on all ARMv8 cores * Improved performance for ARMv8 builds on some cores (A72, Falkor, ThunderX1 and 2: up to 11%) over current develop * Improved performance for *all* cores comparing to develop branch before TX2's patch (9% ~ 36%) * ThunderX1 builds are 14% faster than ARMv8 on TX1, 9% faster than current develop's branch and 8% faster than deveop before tx2 patches Issues: * Regression from current develop branch for A53 (-12%) and A57 (-3%) with ARMv8 builds, but still faster than before TX2's commit (+15% and +24% respectively). This can be improved with a simplification of TX2's code, to be done in future patches. At least the code is guaranteed to be ARMv8.0 now. Comments: * CortexA57 builds are unchanged on A57 hardware from develop's branch, which makes sense, as it's untouched. * CortexA72 builds improve over A57 on A72 hardware, even if they're using the same includes due to new compiler tunning in the makefile.
6 years ago
Simplifying ARMv8 build parameters ARMv8 builds were a bit mixed up, with ThunderX2 code in ARMv8 mode (which is not right because TX2 is ARMv8.1) as well as requiring a few redundancies in the defines, making it harder to maintain and understand what core has what. A few other minor issues were also fixed. Tests were made on the following cores: A53, A57, A72, Falkor, ThunderX, ThunderX2, and XGene. Tests were: OpenBLAS/test, OpenBLAS/benchmark, BLAS-Tester. A summary: * Removed TX2 code from ARMv8 build, to make sure it is compatible with all ARMv8 cores, not just v8.1. Also, the TX2 code has actually harmed performance on big cores. * Commoned up ARMv8 architectures' defines in params.h, to make sure that all will benefit from ARMv8 settings, in addition to their own. * Adding a few more cores, using ARMv8's include strategy, to benefit from compiler optimisations using mtune. Also updated cache information from the manuals, making sure we set good conservative values by default. Removed Vulcan, as it's an alias to TX2. * Auto-detecting most of those cores, but also updating the forced compilation in getarch.c, to make sure the parameters are the same whether compiled natively or forced arch. Benefits: * ARMv8 build is now guaranteed to work on all ARMv8 cores * Improved performance for ARMv8 builds on some cores (A72, Falkor, ThunderX1 and 2: up to 11%) over current develop * Improved performance for *all* cores comparing to develop branch before TX2's patch (9% ~ 36%) * ThunderX1 builds are 14% faster than ARMv8 on TX1, 9% faster than current develop's branch and 8% faster than deveop before tx2 patches Issues: * Regression from current develop branch for A53 (-12%) and A57 (-3%) with ARMv8 builds, but still faster than before TX2's commit (+15% and +24% respectively). This can be improved with a simplification of TX2's code, to be done in future patches. At least the code is guaranteed to be ARMv8.0 now. Comments: * CortexA57 builds are unchanged on A57 hardware from develop's branch, which makes sense, as it's untouched. * CortexA72 builds improve over A57 on A72 hardware, even if they're using the same includes due to new compiler tunning in the makefile.
6 years ago
Simplifying ARMv8 build parameters ARMv8 builds were a bit mixed up, with ThunderX2 code in ARMv8 mode (which is not right because TX2 is ARMv8.1) as well as requiring a few redundancies in the defines, making it harder to maintain and understand what core has what. A few other minor issues were also fixed. Tests were made on the following cores: A53, A57, A72, Falkor, ThunderX, ThunderX2, and XGene. Tests were: OpenBLAS/test, OpenBLAS/benchmark, BLAS-Tester. A summary: * Removed TX2 code from ARMv8 build, to make sure it is compatible with all ARMv8 cores, not just v8.1. Also, the TX2 code has actually harmed performance on big cores. * Commoned up ARMv8 architectures' defines in params.h, to make sure that all will benefit from ARMv8 settings, in addition to their own. * Adding a few more cores, using ARMv8's include strategy, to benefit from compiler optimisations using mtune. Also updated cache information from the manuals, making sure we set good conservative values by default. Removed Vulcan, as it's an alias to TX2. * Auto-detecting most of those cores, but also updating the forced compilation in getarch.c, to make sure the parameters are the same whether compiled natively or forced arch. Benefits: * ARMv8 build is now guaranteed to work on all ARMv8 cores * Improved performance for ARMv8 builds on some cores (A72, Falkor, ThunderX1 and 2: up to 11%) over current develop * Improved performance for *all* cores comparing to develop branch before TX2's patch (9% ~ 36%) * ThunderX1 builds are 14% faster than ARMv8 on TX1, 9% faster than current develop's branch and 8% faster than deveop before tx2 patches Issues: * Regression from current develop branch for A53 (-12%) and A57 (-3%) with ARMv8 builds, but still faster than before TX2's commit (+15% and +24% respectively). This can be improved with a simplification of TX2's code, to be done in future patches. At least the code is guaranteed to be ARMv8.0 now. Comments: * CortexA57 builds are unchanged on A57 hardware from develop's branch, which makes sense, as it's untouched. * CortexA72 builds improve over A57 on A72 hardware, even if they're using the same includes due to new compiler tunning in the makefile.
6 years ago
Simplifying ARMv8 build parameters ARMv8 builds were a bit mixed up, with ThunderX2 code in ARMv8 mode (which is not right because TX2 is ARMv8.1) as well as requiring a few redundancies in the defines, making it harder to maintain and understand what core has what. A few other minor issues were also fixed. Tests were made on the following cores: A53, A57, A72, Falkor, ThunderX, ThunderX2, and XGene. Tests were: OpenBLAS/test, OpenBLAS/benchmark, BLAS-Tester. A summary: * Removed TX2 code from ARMv8 build, to make sure it is compatible with all ARMv8 cores, not just v8.1. Also, the TX2 code has actually harmed performance on big cores. * Commoned up ARMv8 architectures' defines in params.h, to make sure that all will benefit from ARMv8 settings, in addition to their own. * Adding a few more cores, using ARMv8's include strategy, to benefit from compiler optimisations using mtune. Also updated cache information from the manuals, making sure we set good conservative values by default. Removed Vulcan, as it's an alias to TX2. * Auto-detecting most of those cores, but also updating the forced compilation in getarch.c, to make sure the parameters are the same whether compiled natively or forced arch. Benefits: * ARMv8 build is now guaranteed to work on all ARMv8 cores * Improved performance for ARMv8 builds on some cores (A72, Falkor, ThunderX1 and 2: up to 11%) over current develop * Improved performance for *all* cores comparing to develop branch before TX2's patch (9% ~ 36%) * ThunderX1 builds are 14% faster than ARMv8 on TX1, 9% faster than current develop's branch and 8% faster than deveop before tx2 patches Issues: * Regression from current develop branch for A53 (-12%) and A57 (-3%) with ARMv8 builds, but still faster than before TX2's commit (+15% and +24% respectively). This can be improved with a simplification of TX2's code, to be done in future patches. At least the code is guaranteed to be ARMv8.0 now. Comments: * CortexA57 builds are unchanged on A57 hardware from develop's branch, which makes sense, as it's untouched. * CortexA72 builds improve over A57 on A72 hardware, even if they're using the same includes due to new compiler tunning in the makefile.
6 years ago
Simplifying ARMv8 build parameters ARMv8 builds were a bit mixed up, with ThunderX2 code in ARMv8 mode (which is not right because TX2 is ARMv8.1) as well as requiring a few redundancies in the defines, making it harder to maintain and understand what core has what. A few other minor issues were also fixed. Tests were made on the following cores: A53, A57, A72, Falkor, ThunderX, ThunderX2, and XGene. Tests were: OpenBLAS/test, OpenBLAS/benchmark, BLAS-Tester. A summary: * Removed TX2 code from ARMv8 build, to make sure it is compatible with all ARMv8 cores, not just v8.1. Also, the TX2 code has actually harmed performance on big cores. * Commoned up ARMv8 architectures' defines in params.h, to make sure that all will benefit from ARMv8 settings, in addition to their own. * Adding a few more cores, using ARMv8's include strategy, to benefit from compiler optimisations using mtune. Also updated cache information from the manuals, making sure we set good conservative values by default. Removed Vulcan, as it's an alias to TX2. * Auto-detecting most of those cores, but also updating the forced compilation in getarch.c, to make sure the parameters are the same whether compiled natively or forced arch. Benefits: * ARMv8 build is now guaranteed to work on all ARMv8 cores * Improved performance for ARMv8 builds on some cores (A72, Falkor, ThunderX1 and 2: up to 11%) over current develop * Improved performance for *all* cores comparing to develop branch before TX2's patch (9% ~ 36%) * ThunderX1 builds are 14% faster than ARMv8 on TX1, 9% faster than current develop's branch and 8% faster than deveop before tx2 patches Issues: * Regression from current develop branch for A53 (-12%) and A57 (-3%) with ARMv8 builds, but still faster than before TX2's commit (+15% and +24% respectively). This can be improved with a simplification of TX2's code, to be done in future patches. At least the code is guaranteed to be ARMv8.0 now. Comments: * CortexA57 builds are unchanged on A57 hardware from develop's branch, which makes sense, as it's untouched. * CortexA72 builds improve over A57 on A72 hardware, even if they're using the same includes due to new compiler tunning in the makefile.
6 years ago
Simplifying ARMv8 build parameters ARMv8 builds were a bit mixed up, with ThunderX2 code in ARMv8 mode (which is not right because TX2 is ARMv8.1) as well as requiring a few redundancies in the defines, making it harder to maintain and understand what core has what. A few other minor issues were also fixed. Tests were made on the following cores: A53, A57, A72, Falkor, ThunderX, ThunderX2, and XGene. Tests were: OpenBLAS/test, OpenBLAS/benchmark, BLAS-Tester. A summary: * Removed TX2 code from ARMv8 build, to make sure it is compatible with all ARMv8 cores, not just v8.1. Also, the TX2 code has actually harmed performance on big cores. * Commoned up ARMv8 architectures' defines in params.h, to make sure that all will benefit from ARMv8 settings, in addition to their own. * Adding a few more cores, using ARMv8's include strategy, to benefit from compiler optimisations using mtune. Also updated cache information from the manuals, making sure we set good conservative values by default. Removed Vulcan, as it's an alias to TX2. * Auto-detecting most of those cores, but also updating the forced compilation in getarch.c, to make sure the parameters are the same whether compiled natively or forced arch. Benefits: * ARMv8 build is now guaranteed to work on all ARMv8 cores * Improved performance for ARMv8 builds on some cores (A72, Falkor, ThunderX1 and 2: up to 11%) over current develop * Improved performance for *all* cores comparing to develop branch before TX2's patch (9% ~ 36%) * ThunderX1 builds are 14% faster than ARMv8 on TX1, 9% faster than current develop's branch and 8% faster than deveop before tx2 patches Issues: * Regression from current develop branch for A53 (-12%) and A57 (-3%) with ARMv8 builds, but still faster than before TX2's commit (+15% and +24% respectively). This can be improved with a simplification of TX2's code, to be done in future patches. At least the code is guaranteed to be ARMv8.0 now. Comments: * CortexA57 builds are unchanged on A57 hardware from develop's branch, which makes sense, as it's untouched. * CortexA72 builds improve over A57 on A72 hardware, even if they're using the same includes due to new compiler tunning in the makefile.
6 years ago
Simplifying ARMv8 build parameters ARMv8 builds were a bit mixed up, with ThunderX2 code in ARMv8 mode (which is not right because TX2 is ARMv8.1) as well as requiring a few redundancies in the defines, making it harder to maintain and understand what core has what. A few other minor issues were also fixed. Tests were made on the following cores: A53, A57, A72, Falkor, ThunderX, ThunderX2, and XGene. Tests were: OpenBLAS/test, OpenBLAS/benchmark, BLAS-Tester. A summary: * Removed TX2 code from ARMv8 build, to make sure it is compatible with all ARMv8 cores, not just v8.1. Also, the TX2 code has actually harmed performance on big cores. * Commoned up ARMv8 architectures' defines in params.h, to make sure that all will benefit from ARMv8 settings, in addition to their own. * Adding a few more cores, using ARMv8's include strategy, to benefit from compiler optimisations using mtune. Also updated cache information from the manuals, making sure we set good conservative values by default. Removed Vulcan, as it's an alias to TX2. * Auto-detecting most of those cores, but also updating the forced compilation in getarch.c, to make sure the parameters are the same whether compiled natively or forced arch. Benefits: * ARMv8 build is now guaranteed to work on all ARMv8 cores * Improved performance for ARMv8 builds on some cores (A72, Falkor, ThunderX1 and 2: up to 11%) over current develop * Improved performance for *all* cores comparing to develop branch before TX2's patch (9% ~ 36%) * ThunderX1 builds are 14% faster than ARMv8 on TX1, 9% faster than current develop's branch and 8% faster than deveop before tx2 patches Issues: * Regression from current develop branch for A53 (-12%) and A57 (-3%) with ARMv8 builds, but still faster than before TX2's commit (+15% and +24% respectively). This can be improved with a simplification of TX2's code, to be done in future patches. At least the code is guaranteed to be ARMv8.0 now. Comments: * CortexA57 builds are unchanged on A57 hardware from develop's branch, which makes sense, as it's untouched. * CortexA72 builds improve over A57 on A72 hardware, even if they're using the same includes due to new compiler tunning in the makefile.
6 years ago
12 years ago
12 years ago
8 years ago
8 years ago
8 years ago
8 years ago
8 years ago
8 years ago
8 years ago
8 years ago
8 years ago
12 years ago
1234567891011121314151617181920212223242526272829303132333435363738394041424344454647484950515253545556575859606162636465666768697071727374757677787980818283848586878889909192939495969798991001011021031041051061071081091101111121131141151161171181191201211221231241251261271281291301311321331341351361371381391401411421431441451461471481491501511521531541551561571581591601611621631641651661671681691701711721731741751761771781791801811821831841851861871881891901911921931941951961971981992002012022032042052062072082092102112122132142152162172182192202212222232242252262272282292302312322332342352362372382392402412422432442452462472482492502512522532542552562572582592602612622632642652662672682692702712722732742752762772782792802812822832842852862872882892902912922932942952962972982993003013023033043053063073083093103113123133143153163173183193203213223233243253263273283293303313323333343353363373383393403413423433443453463473483493503513523533543553563573583593603613623633643653663673683693703713723733743753763773783793803813823833843853863873883893903913923933943953963973983994004014024034044054064074084094104114124134144154164174184194204214224234244254264274284294304314324334344354364374384394404414424434444454464474484494504514524534544554564574584594604614624634644654664674684694704714724734744754764774784794804814824834844854864874884894904914924934944954964974984995005015025035045055065075085095105115125135145155165175185195205215225235245255265275285295305315325335345355365375385395405415425435445455465475485495505515525535545555565575585595605615625635645655665675685695705715725735745755765775785795805815825835845855865875885895905915925935945955965975985996006016026036046056066076086096106116126136146156166176186196206216226236246256266276286296306316326336346356366376386396406416426436446456466476486496506516526536546556566576586596606616626636646656666676686696706716726736746756766776786796806816826836846856866876886896906916926936946956966976986997007017027037047057067077087097107117127137147157167177187197207217227237247257267277287297307317327337347357367377387397407417427437447457467477487497507517527537547557567577587597607617627637647657667677687697707717727737747757767777787797807817827837847857867877887897907917927937947957967977987998008018028038048058068078088098108118128138148158168178188198208218228238248258268278288298308318328338348358368378388398408418428438448458468478488498508518528538548558568578588598608618628638648658668678688698708718728738748758768778788798808818828838848858868878888898908918928938948958968978988999009019029039049059069079089099109119129139149159169179189199209219229239249259269279289299309319329339349359369379389399409419429439449459469479489499509519529539549559569579589599609619629639649659669679689699709719729739749759769779789799809819829839849859869879889899909919929939949959969979989991000100110021003100410051006100710081009101010111012101310141015101610171018101910201021102210231024102510261027102810291030103110321033103410351036103710381039104010411042104310441045104610471048104910501051105210531054105510561057105810591060106110621063106410651066106710681069107010711072107310741075107610771078107910801081108210831084108510861087108810891090109110921093109410951096109710981099110011011102110311041105110611071108110911101111111211131114111511161117111811191120112111221123112411251126112711281129113011311132113311341135113611371138113911401141114211431144114511461147114811491150115111521153115411551156115711581159116011611162116311641165116611671168116911701171117211731174117511761177117811791180118111821183118411851186118711881189119011911192119311941195119611971198119912001201120212031204120512061207120812091210121112121213121412151216121712181219122012211222122312241225122612271228122912301231123212331234123512361237123812391240124112421243124412451246124712481249125012511252125312541255125612571258125912601261126212631264126512661267126812691270127112721273127412751276127712781279128012811282128312841285128612871288128912901291129212931294129512961297129812991300130113021303130413051306130713081309131013111312131313141315131613171318131913201321132213231324132513261327132813291330133113321333133413351336133713381339134013411342134313441345134613471348134913501351135213531354135513561357135813591360136113621363136413651366136713681369137013711372137313741375137613771378137913801381138213831384138513861387138813891390139113921393139413951396139713981399140014011402140314041405140614071408140914101411141214131414141514161417141814191420142114221423142414251426142714281429143014311432143314341435143614371438143914401441144214431444144514461447144814491450145114521453145414551456145714581459146014611462146314641465146614671468146914701471147214731474147514761477147814791480148114821483148414851486148714881489149014911492149314941495149614971498149915001501150215031504150515061507150815091510151115121513151415151516151715181519152015211522152315241525152615271528152915301531153215331534153515361537153815391540154115421543154415451546154715481549155015511552155315541555155615571558155915601561156215631564156515661567156815691570157115721573157415751576157715781579158015811582158315841585158615871588158915901591159215931594159515961597159815991600160116021603160416051606160716081609161016111612161316141615161616171618161916201621162216231624162516261627162816291630163116321633163416351636163716381639164016411642164316441645164616471648164916501651165216531654165516561657165816591660166116621663166416651666166716681669167016711672167316741675167616771678167916801681168216831684168516861687168816891690169116921693169416951696169716981699170017011702170317041705170617071708170917101711171217131714171517161717171817191720172117221723172417251726172717281729173017311732173317341735173617371738173917401741174217431744174517461747174817491750175117521753175417551756175717581759176017611762176317641765176617671768176917701771177217731774177517761777177817791780178117821783178417851786178717881789179017911792179317941795179617971798179918001801180218031804180518061807180818091810181118121813181418151816181718181819182018211822182318241825182618271828182918301831183218331834183518361837183818391840184118421843184418451846184718481849185018511852185318541855185618571858185918601861186218631864186518661867186818691870187118721873187418751876187718781879188018811882188318841885188618871888188918901891189218931894189518961897189818991900190119021903190419051906190719081909191019111912191319141915191619171918191919201921192219231924192519261927192819291930193119321933193419351936193719381939194019411942194319441945194619471948194919501951195219531954195519561957195819591960196119621963196419651966196719681969197019711972197319741975197619771978197919801981198219831984198519861987198819891990199119921993199419951996199719981999200020012002200320042005200620072008200920102011201220132014201520162017201820192020202120222023202420252026202720282029203020312032203320342035203620372038203920402041204220432044204520462047204820492050205120522053205420552056205720582059206020612062206320642065206620672068206920702071207220732074207520762077207820792080208120822083208420852086208720882089209020912092209320942095209620972098209921002101210221032104210521062107210821092110211121122113211421152116211721182119212021212122212321242125212621272128212921302131213221332134213521362137213821392140214121422143214421452146214721482149215021512152215321542155215621572158215921602161216221632164216521662167216821692170217121722173217421752176217721782179218021812182218321842185218621872188218921902191219221932194219521962197219821992200220122022203220422052206220722082209221022112212221322142215221622172218221922202221222222232224222522262227222822292230223122322233223422352236223722382239224022412242224322442245224622472248224922502251225222532254225522562257225822592260226122622263226422652266226722682269227022712272227322742275227622772278227922802281228222832284228522862287228822892290229122922293229422952296229722982299230023012302230323042305230623072308230923102311231223132314231523162317231823192320232123222323232423252326232723282329233023312332233323342335233623372338233923402341234223432344234523462347234823492350235123522353235423552356235723582359236023612362236323642365236623672368236923702371237223732374237523762377237823792380238123822383238423852386238723882389239023912392239323942395239623972398239924002401240224032404240524062407240824092410241124122413241424152416241724182419242024212422242324242425242624272428242924302431243224332434243524362437243824392440244124422443244424452446244724482449245024512452245324542455245624572458245924602461246224632464246524662467246824692470247124722473247424752476247724782479248024812482248324842485248624872488248924902491249224932494249524962497249824992500250125022503250425052506250725082509251025112512251325142515251625172518251925202521252225232524252525262527252825292530253125322533253425352536253725382539254025412542254325442545254625472548254925502551255225532554255525562557255825592560256125622563256425652566256725682569257025712572257325742575257625772578257925802581258225832584258525862587258825892590259125922593259425952596259725982599260026012602260326042605260626072608260926102611261226132614261526162617261826192620262126222623262426252626262726282629263026312632263326342635263626372638263926402641264226432644264526462647264826492650265126522653265426552656265726582659266026612662266326642665266626672668266926702671267226732674267526762677267826792680268126822683268426852686268726882689269026912692269326942695269626972698269927002701270227032704270527062707270827092710271127122713271427152716271727182719272027212722272327242725272627272728272927302731273227332734273527362737273827392740274127422743274427452746274727482749275027512752275327542755275627572758275927602761276227632764276527662767276827692770277127722773277427752776277727782779278027812782278327842785278627872788278927902791279227932794279527962797279827992800280128022803280428052806280728082809281028112812281328142815281628172818281928202821282228232824282528262827282828292830283128322833283428352836283728382839284028412842284328442845284628472848284928502851285228532854285528562857285828592860286128622863286428652866286728682869287028712872287328742875287628772878287928802881288228832884288528862887288828892890289128922893289428952896289728982899290029012902290329042905290629072908290929102911291229132914291529162917291829192920292129222923292429252926292729282929293029312932293329342935293629372938293929402941294229432944294529462947294829492950295129522953295429552956295729582959296029612962296329642965296629672968296929702971297229732974297529762977297829792980298129822983298429852986298729882989299029912992299329942995299629972998299930003001300230033004300530063007300830093010301130123013301430153016301730183019302030213022302330243025302630273028302930303031303230333034303530363037303830393040304130423043304430453046304730483049305030513052305330543055305630573058305930603061306230633064306530663067306830693070307130723073307430753076307730783079308030813082308330843085308630873088308930903091309230933094309530963097309830993100310131023103310431053106310731083109311031113112311331143115311631173118311931203121312231233124312531263127312831293130313131323133313431353136313731383139314031413142314331443145314631473148314931503151315231533154315531563157315831593160316131623163316431653166316731683169317031713172317331743175317631773178317931803181318231833184318531863187318831893190319131923193319431953196319731983199320032013202320332043205320632073208320932103211321232133214321532163217321832193220322132223223322432253226322732283229323032313232323332343235323632373238323932403241324232433244324532463247324832493250325132523253325432553256325732583259326032613262326332643265326632673268326932703271327232733274327532763277327832793280328132823283328432853286328732883289329032913292329332943295329632973298329933003301330233033304330533063307330833093310331133123313331433153316331733183319332033213322332333243325332633273328332933303331333233333334333533363337333833393340334133423343334433453346334733483349335033513352335333543355335633573358335933603361336233633364336533663367336833693370337133723373337433753376337733783379338033813382338333843385338633873388338933903391339233933394339533963397339833993400340134023403340434053406340734083409341034113412341334143415341634173418341934203421342234233424342534263427342834293430343134323433343434353436343734383439344034413442344334443445344634473448344934503451345234533454345534563457345834593460346134623463346434653466346734683469347034713472347334743475347634773478347934803481348234833484348534863487348834893490349134923493349434953496349734983499350035013502350335043505350635073508350935103511351235133514351535163517351835193520352135223523352435253526352735283529353035313532353335343535353635373538353935403541354235433544354535463547354835493550355135523553355435553556355735583559356035613562356335643565356635673568356935703571357235733574357535763577357835793580358135823583358435853586358735883589359035913592359335943595359635973598359936003601360236033604360536063607360836093610361136123613361436153616361736183619362036213622362336243625362636273628362936303631363236333634363536363637363836393640364136423643364436453646364736483649365036513652365336543655365636573658365936603661366236633664366536663667366836693670367136723673367436753676367736783679368036813682368336843685368636873688368936903691369236933694369536963697369836993700370137023703370437053706370737083709371037113712371337143715371637173718371937203721372237233724372537263727372837293730373137323733373437353736373737383739374037413742374337443745374637473748374937503751375237533754375537563757375837593760376137623763376437653766376737683769377037713772377337743775377637773778377937803781378237833784378537863787378837893790379137923793379437953796379737983799380038013802380338043805380638073808380938103811381238133814381538163817381838193820382138223823382438253826382738283829383038313832383338343835383638373838383938403841384238433844384538463847384838493850385138523853385438553856385738583859386038613862386338643865386638673868386938703871387238733874387538763877387838793880388138823883388438853886388738883889389038913892389338943895389638973898389939003901390239033904390539063907390839093910391139123913391439153916391739183919392039213922392339243925392639273928392939303931393239333934393539363937393839393940394139423943394439453946394739483949395039513952395339543955395639573958395939603961396239633964396539663967396839693970397139723973397439753976397739783979398039813982398339843985398639873988398939903991399239933994399539963997399839994000400140024003400440054006400740084009401040114012401340144015401640174018401940204021402240234024402540264027402840294030403140324033403440354036403740384039404040414042404340444045404640474048404940504051405240534054405540564057405840594060406140624063406440654066406740684069407040714072407340744075407640774078407940804081408240834084408540864087408840894090409140924093409440954096409740984099410041014102410341044105410641074108410941104111411241134114411541164117411841194120412141224123412441254126412741284129413041314132413341344135413641374138413941404141414241434144414541464147414841494150415141524153415441554156
  1. /*****************************************************************************
  2. Copyright (c) 2011-2023, The OpenBLAS Project
  3. All rights reserved.
  4. Redistribution and use in source and binary forms, with or without
  5. modification, are permitted provided that the following conditions are
  6. met:
  7. 1. Redistributions of source code must retain the above copyright
  8. notice, this list of conditions and the following disclaimer.
  9. 2. Redistributions in binary form must reproduce the above copyright
  10. notice, this list of conditions and the following disclaimer in
  11. the documentation and/or other materials provided with the
  12. distribution.
  13. 3. Neither the name of the OpenBLAS project nor the names of
  14. its contributors may be used to endorse or promote products
  15. derived from this software without specific prior written
  16. permission.
  17. THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
  18. AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
  19. IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
  20. ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE
  21. LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
  22. DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
  23. SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
  24. CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
  25. OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE
  26. USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
  27. **********************************************************************************/
  28. /*********************************************************************/
  29. /* Copyright 2009, 2010 The University of Texas at Austin. */
  30. /* All rights reserved. */
  31. /* */
  32. /* Redistribution and use in source and binary forms, with or */
  33. /* without modification, are permitted provided that the following */
  34. /* conditions are met: */
  35. /* */
  36. /* 1. Redistributions of source code must retain the above */
  37. /* copyright notice, this list of conditions and the following */
  38. /* disclaimer. */
  39. /* */
  40. /* 2. Redistributions in binary form must reproduce the above */
  41. /* copyright notice, this list of conditions and the following */
  42. /* disclaimer in the documentation and/or other materials */
  43. /* provided with the distribution. */
  44. /* */
  45. /* THIS SOFTWARE IS PROVIDED BY THE UNIVERSITY OF TEXAS AT */
  46. /* AUSTIN ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, */
  47. /* INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF */
  48. /* MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE */
  49. /* DISCLAIMED. IN NO EVENT SHALL THE UNIVERSITY OF TEXAS AT */
  50. /* AUSTIN OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, */
  51. /* INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES */
  52. /* (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE */
  53. /* GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR */
  54. /* BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF */
  55. /* LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT */
  56. /* (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT */
  57. /* OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE */
  58. /* POSSIBILITY OF SUCH DAMAGE. */
  59. /* */
  60. /* The views and conclusions contained in the software and */
  61. /* documentation are those of the authors and should not be */
  62. /* interpreted as representing official policies, either expressed */
  63. /* or implied, of The University of Texas at Austin. */
  64. /*********************************************************************/
  65. #ifndef PARAM_H
  66. #define PARAM_H
  67. #define SBGEMM_DEFAULT_UNROLL_N 4
  68. #define SBGEMM_DEFAULT_UNROLL_M 8
  69. #define SBGEMM_DEFAULT_UNROLL_MN 32
  70. #define SBGEMM_DEFAULT_P 256
  71. #define SBGEMM_DEFAULT_R 256
  72. #define SBGEMM_DEFAULT_Q 256
  73. #define SBGEMM_ALIGN_K 1 // must be 2^x
  74. #ifdef OPTERON
  75. #define SNUMOPT 4
  76. #define DNUMOPT 2
  77. #define GEMM_DEFAULT_OFFSET_A 64
  78. #define GEMM_DEFAULT_OFFSET_B 256
  79. #define GEMM_DEFAULT_ALIGN (BLASLONG)0x01ffffUL
  80. #define SGEMM_DEFAULT_UNROLL_N 4
  81. #define DGEMM_DEFAULT_UNROLL_N 4
  82. #define QGEMM_DEFAULT_UNROLL_N 2
  83. #define CGEMM_DEFAULT_UNROLL_N 2
  84. #define ZGEMM_DEFAULT_UNROLL_N 2
  85. #define XGEMM_DEFAULT_UNROLL_N 1
  86. #ifdef ARCH_X86
  87. #define SGEMM_DEFAULT_UNROLL_M 4
  88. #define DGEMM_DEFAULT_UNROLL_M 2
  89. #define QGEMM_DEFAULT_UNROLL_M 2
  90. #define CGEMM_DEFAULT_UNROLL_M 2
  91. #define ZGEMM_DEFAULT_UNROLL_M 1
  92. #define XGEMM_DEFAULT_UNROLL_M 1
  93. #else
  94. #define SGEMM_DEFAULT_UNROLL_M 8
  95. #define DGEMM_DEFAULT_UNROLL_M 4
  96. #define QGEMM_DEFAULT_UNROLL_M 2
  97. #define CGEMM_DEFAULT_UNROLL_M 4
  98. #define ZGEMM_DEFAULT_UNROLL_M 2
  99. #define XGEMM_DEFAULT_UNROLL_M 1
  100. #endif
  101. #define SGEMM_DEFAULT_P sgemm_p
  102. #define DGEMM_DEFAULT_P dgemm_p
  103. #define QGEMM_DEFAULT_P qgemm_p
  104. #define CGEMM_DEFAULT_P cgemm_p
  105. #define ZGEMM_DEFAULT_P zgemm_p
  106. #define XGEMM_DEFAULT_P xgemm_p
  107. #define SGEMM_DEFAULT_R sgemm_r
  108. #define DGEMM_DEFAULT_R dgemm_r
  109. #define QGEMM_DEFAULT_R qgemm_r
  110. #define CGEMM_DEFAULT_R cgemm_r
  111. #define ZGEMM_DEFAULT_R zgemm_r
  112. #define XGEMM_DEFAULT_R xgemm_r
  113. #ifdef ALLOC_HUGETLB
  114. #define SGEMM_DEFAULT_Q 248
  115. #define DGEMM_DEFAULT_Q 248
  116. #define QGEMM_DEFAULT_Q 248
  117. #define CGEMM_DEFAULT_Q 248
  118. #define ZGEMM_DEFAULT_Q 248
  119. #define XGEMM_DEFAULT_Q 248
  120. #else
  121. #define SGEMM_DEFAULT_Q 240
  122. #define DGEMM_DEFAULT_Q 240
  123. #define QGEMM_DEFAULT_Q 240
  124. #define CGEMM_DEFAULT_Q 240
  125. #define ZGEMM_DEFAULT_Q 240
  126. #define XGEMM_DEFAULT_Q 240
  127. #endif
  128. #define SYMV_P 16
  129. #define HAVE_EXCLUSIVE_CACHE
  130. #endif
  131. #if defined(BARCELONA) || defined(SHANGHAI) || defined(BOBCAT)
  132. #define SNUMOPT 8
  133. #define DNUMOPT 4
  134. #define GEMM_DEFAULT_OFFSET_A 64
  135. #define GEMM_DEFAULT_OFFSET_B 832
  136. #define GEMM_DEFAULT_ALIGN (BLASLONG)0x0fffUL
  137. #define SGEMM_DEFAULT_UNROLL_N 4
  138. #define DGEMM_DEFAULT_UNROLL_N 4
  139. #define QGEMM_DEFAULT_UNROLL_N 2
  140. #define CGEMM_DEFAULT_UNROLL_N 2
  141. #define ZGEMM_DEFAULT_UNROLL_N 2
  142. #define XGEMM_DEFAULT_UNROLL_N 1
  143. #ifdef ARCH_X86
  144. #define SGEMM_DEFAULT_UNROLL_M 4
  145. #define DGEMM_DEFAULT_UNROLL_M 2
  146. #define QGEMM_DEFAULT_UNROLL_M 2
  147. #define CGEMM_DEFAULT_UNROLL_M 2
  148. #define ZGEMM_DEFAULT_UNROLL_M 1
  149. #define XGEMM_DEFAULT_UNROLL_M 1
  150. #else
  151. #define SGEMM_DEFAULT_UNROLL_M 8
  152. #define DGEMM_DEFAULT_UNROLL_M 4
  153. #define QGEMM_DEFAULT_UNROLL_M 2
  154. #define CGEMM_DEFAULT_UNROLL_M 4
  155. #define ZGEMM_DEFAULT_UNROLL_M 2
  156. #define XGEMM_DEFAULT_UNROLL_M 1
  157. #endif
  158. #if 0
  159. #define SGEMM_DEFAULT_P 496
  160. #define DGEMM_DEFAULT_P 248
  161. #define QGEMM_DEFAULT_P 124
  162. #define CGEMM_DEFAULT_P 248
  163. #define ZGEMM_DEFAULT_P 124
  164. #define XGEMM_DEFAULT_P 62
  165. #define SGEMM_DEFAULT_Q 248
  166. #define DGEMM_DEFAULT_Q 248
  167. #define QGEMM_DEFAULT_Q 248
  168. #define CGEMM_DEFAULT_Q 248
  169. #define ZGEMM_DEFAULT_Q 248
  170. #define XGEMM_DEFAULT_Q 248
  171. #else
  172. #define SGEMM_DEFAULT_P 448
  173. #define DGEMM_DEFAULT_P 224
  174. #define QGEMM_DEFAULT_P 112
  175. #define CGEMM_DEFAULT_P 224
  176. #define ZGEMM_DEFAULT_P 112
  177. #define XGEMM_DEFAULT_P 56
  178. #define SGEMM_DEFAULT_Q 224
  179. #define DGEMM_DEFAULT_Q 224
  180. #define QGEMM_DEFAULT_Q 224
  181. #define CGEMM_DEFAULT_Q 224
  182. #define ZGEMM_DEFAULT_Q 224
  183. #define XGEMM_DEFAULT_Q 224
  184. #endif
  185. #define SGEMM_DEFAULT_R sgemm_r
  186. #define QGEMM_DEFAULT_R qgemm_r
  187. #define DGEMM_DEFAULT_R dgemm_r
  188. #define CGEMM_DEFAULT_R cgemm_r
  189. #define ZGEMM_DEFAULT_R zgemm_r
  190. #define XGEMM_DEFAULT_R xgemm_r
  191. #define SYMV_P 16
  192. #define HAVE_EXCLUSIVE_CACHE
  193. #define GEMM_THREAD gemm_thread_mn
  194. #endif
  195. #ifdef BULLDOZER
  196. #define SNUMOPT 8
  197. #define DNUMOPT 4
  198. #define GEMM_DEFAULT_OFFSET_A 64
  199. #define GEMM_DEFAULT_OFFSET_B 832
  200. #define GEMM_DEFAULT_ALIGN (BLASLONG)0x0fffUL
  201. #define QGEMM_DEFAULT_UNROLL_N 2
  202. #define CGEMM_DEFAULT_UNROLL_N 2
  203. #define ZGEMM_DEFAULT_UNROLL_N 2
  204. #define XGEMM_DEFAULT_UNROLL_N 1
  205. #ifdef ARCH_X86
  206. #define SGEMM_DEFAULT_UNROLL_N 4
  207. #define DGEMM_DEFAULT_UNROLL_N 4
  208. #define SGEMM_DEFAULT_UNROLL_M 4
  209. #define DGEMM_DEFAULT_UNROLL_M 2
  210. #define QGEMM_DEFAULT_UNROLL_M 2
  211. #define CGEMM_DEFAULT_UNROLL_M 2
  212. #define ZGEMM_DEFAULT_UNROLL_M 1
  213. #define XGEMM_DEFAULT_UNROLL_M 1
  214. #else
  215. #define SGEMM_DEFAULT_UNROLL_N 2
  216. #define DGEMM_DEFAULT_UNROLL_N 2
  217. #define SGEMM_DEFAULT_UNROLL_M 16
  218. #define DGEMM_DEFAULT_UNROLL_M 8
  219. #define QGEMM_DEFAULT_UNROLL_M 2
  220. #define CGEMM_DEFAULT_UNROLL_M 4
  221. #define ZGEMM_DEFAULT_UNROLL_M 2
  222. #define XGEMM_DEFAULT_UNROLL_M 1
  223. #define CGEMM3M_DEFAULT_UNROLL_N 4
  224. #define CGEMM3M_DEFAULT_UNROLL_M 8
  225. #define ZGEMM3M_DEFAULT_UNROLL_N 4
  226. #define ZGEMM3M_DEFAULT_UNROLL_M 4
  227. #define DGEMM_DEFAULT_UNROLL_MN 16
  228. #define GEMV_UNROLL 8
  229. #endif
  230. #if defined(ARCH_X86_64)
  231. #define SGEMM_DEFAULT_P 768
  232. #define DGEMM_DEFAULT_P 384
  233. #else
  234. #define SGEMM_DEFAULT_P 448
  235. #define DGEMM_DEFAULT_P 224
  236. #endif
  237. #define QGEMM_DEFAULT_P 112
  238. #define CGEMM_DEFAULT_P 224
  239. #define ZGEMM_DEFAULT_P 112
  240. #define XGEMM_DEFAULT_P 56
  241. #if defined(ARCH_X86_64)
  242. #define SGEMM_DEFAULT_Q 168
  243. #define DGEMM_DEFAULT_Q 168
  244. #else
  245. #define SGEMM_DEFAULT_Q 224
  246. #define DGEMM_DEFAULT_Q 224
  247. #endif
  248. #define QGEMM_DEFAULT_Q 224
  249. #define CGEMM_DEFAULT_Q 224
  250. #define ZGEMM_DEFAULT_Q 224
  251. #define XGEMM_DEFAULT_Q 224
  252. #define CGEMM3M_DEFAULT_P 448
  253. #define ZGEMM3M_DEFAULT_P 224
  254. #define XGEMM3M_DEFAULT_P 112
  255. #define CGEMM3M_DEFAULT_Q 224
  256. #define ZGEMM3M_DEFAULT_Q 224
  257. #define XGEMM3M_DEFAULT_Q 224
  258. #define CGEMM3M_DEFAULT_R 12288
  259. #define ZGEMM3M_DEFAULT_R 12288
  260. #define XGEMM3M_DEFAULT_R 12288
  261. #define SGEMM_DEFAULT_R sgemm_r
  262. #define QGEMM_DEFAULT_R qgemm_r
  263. #define DGEMM_DEFAULT_R dgemm_r
  264. #define CGEMM_DEFAULT_R cgemm_r
  265. #define ZGEMM_DEFAULT_R zgemm_r
  266. #define XGEMM_DEFAULT_R xgemm_r
  267. #define SYMV_P 16
  268. #define HAVE_EXCLUSIVE_CACHE
  269. #define GEMM_THREAD gemm_thread_mn
  270. #endif
  271. #ifdef PILEDRIVER
  272. #define SNUMOPT 8
  273. #define DNUMOPT 4
  274. #define GEMM_DEFAULT_OFFSET_A 64
  275. #define GEMM_DEFAULT_OFFSET_B 832
  276. #define GEMM_DEFAULT_ALIGN (BLASLONG)0x0fffUL
  277. #define QGEMM_DEFAULT_UNROLL_N 2
  278. #define CGEMM_DEFAULT_UNROLL_N 2
  279. #define ZGEMM_DEFAULT_UNROLL_N 2
  280. #define XGEMM_DEFAULT_UNROLL_N 1
  281. #ifdef ARCH_X86
  282. #define SGEMM_DEFAULT_UNROLL_N 4
  283. #define DGEMM_DEFAULT_UNROLL_N 4
  284. #define SGEMM_DEFAULT_UNROLL_M 4
  285. #define DGEMM_DEFAULT_UNROLL_M 2
  286. #define QGEMM_DEFAULT_UNROLL_M 2
  287. #define CGEMM_DEFAULT_UNROLL_M 2
  288. #define ZGEMM_DEFAULT_UNROLL_M 1
  289. #define XGEMM_DEFAULT_UNROLL_M 1
  290. #else
  291. #define SGEMM_DEFAULT_UNROLL_N 2
  292. #define DGEMM_DEFAULT_UNROLL_N 2
  293. #define SGEMM_DEFAULT_UNROLL_M 16
  294. #define DGEMM_DEFAULT_UNROLL_M 8
  295. #define QGEMM_DEFAULT_UNROLL_M 2
  296. #define CGEMM_DEFAULT_UNROLL_M 4
  297. #define ZGEMM_DEFAULT_UNROLL_M 2
  298. #define XGEMM_DEFAULT_UNROLL_M 1
  299. #define CGEMM3M_DEFAULT_UNROLL_N 4
  300. #define CGEMM3M_DEFAULT_UNROLL_M 8
  301. #define ZGEMM3M_DEFAULT_UNROLL_N 4
  302. #define ZGEMM3M_DEFAULT_UNROLL_M 4
  303. #define GEMV_UNROLL 8
  304. #endif
  305. #if defined(ARCH_X86_64)
  306. #define SGEMM_DEFAULT_P 768
  307. #define DGEMM_DEFAULT_P 768
  308. #define ZGEMM_DEFAULT_P 384
  309. #define CGEMM_DEFAULT_P 768
  310. #else
  311. #define SGEMM_DEFAULT_P 448
  312. #define DGEMM_DEFAULT_P 480
  313. #define ZGEMM_DEFAULT_P 112
  314. #define CGEMM_DEFAULT_P 224
  315. #endif
  316. #define QGEMM_DEFAULT_P 112
  317. #define XGEMM_DEFAULT_P 56
  318. #if defined(ARCH_X86_64)
  319. #define SGEMM_DEFAULT_Q 192
  320. #define DGEMM_DEFAULT_Q 168
  321. #define ZGEMM_DEFAULT_Q 168
  322. #define CGEMM_DEFAULT_Q 168
  323. #else
  324. #define SGEMM_DEFAULT_Q 224
  325. #define DGEMM_DEFAULT_Q 224
  326. #define ZGEMM_DEFAULT_Q 224
  327. #define CGEMM_DEFAULT_Q 224
  328. #endif
  329. #define QGEMM_DEFAULT_Q 224
  330. #define XGEMM_DEFAULT_Q 224
  331. #define CGEMM3M_DEFAULT_P 448
  332. #define ZGEMM3M_DEFAULT_P 224
  333. #define XGEMM3M_DEFAULT_P 112
  334. #define CGEMM3M_DEFAULT_Q 224
  335. #define ZGEMM3M_DEFAULT_Q 224
  336. #define XGEMM3M_DEFAULT_Q 224
  337. #define CGEMM3M_DEFAULT_R 12288
  338. #define ZGEMM3M_DEFAULT_R 12288
  339. #define XGEMM3M_DEFAULT_R 12288
  340. #define SGEMM_DEFAULT_R 12288
  341. #define QGEMM_DEFAULT_R qgemm_r
  342. #define DGEMM_DEFAULT_R 12288
  343. #define CGEMM_DEFAULT_R cgemm_r
  344. #define ZGEMM_DEFAULT_R zgemm_r
  345. #define XGEMM_DEFAULT_R xgemm_r
  346. #define SYMV_P 16
  347. #define HAVE_EXCLUSIVE_CACHE
  348. #define GEMM_THREAD gemm_thread_mn
  349. #endif
  350. #ifdef STEAMROLLER
  351. #define SNUMOPT 8
  352. #define DNUMOPT 4
  353. #define GEMM_DEFAULT_OFFSET_A 64
  354. #define GEMM_DEFAULT_OFFSET_B 832
  355. #define GEMM_DEFAULT_ALIGN (BLASLONG)0x0fffUL
  356. #define QGEMM_DEFAULT_UNROLL_N 2
  357. #define CGEMM_DEFAULT_UNROLL_N 2
  358. #define ZGEMM_DEFAULT_UNROLL_N 2
  359. #define XGEMM_DEFAULT_UNROLL_N 1
  360. #ifdef ARCH_X86
  361. #define SGEMM_DEFAULT_UNROLL_N 4
  362. #define DGEMM_DEFAULT_UNROLL_N 4
  363. #define SGEMM_DEFAULT_UNROLL_M 4
  364. #define DGEMM_DEFAULT_UNROLL_M 2
  365. #define QGEMM_DEFAULT_UNROLL_M 2
  366. #define CGEMM_DEFAULT_UNROLL_M 2
  367. #define ZGEMM_DEFAULT_UNROLL_M 1
  368. #define XGEMM_DEFAULT_UNROLL_M 1
  369. #else
  370. #define SGEMM_DEFAULT_UNROLL_N 2
  371. #define DGEMM_DEFAULT_UNROLL_N 2
  372. #define SGEMM_DEFAULT_UNROLL_M 16
  373. #define DGEMM_DEFAULT_UNROLL_M 8
  374. #define QGEMM_DEFAULT_UNROLL_M 2
  375. #define CGEMM_DEFAULT_UNROLL_M 4
  376. #define ZGEMM_DEFAULT_UNROLL_M 2
  377. #define XGEMM_DEFAULT_UNROLL_M 1
  378. #define CGEMM3M_DEFAULT_UNROLL_N 4
  379. #define CGEMM3M_DEFAULT_UNROLL_M 8
  380. #define ZGEMM3M_DEFAULT_UNROLL_N 4
  381. #define ZGEMM3M_DEFAULT_UNROLL_M 4
  382. #define GEMV_UNROLL 8
  383. #endif
  384. #if defined(ARCH_X86_64)
  385. #define SGEMM_DEFAULT_P 768
  386. #define DGEMM_DEFAULT_P 576
  387. #define ZGEMM_DEFAULT_P 288
  388. #define CGEMM_DEFAULT_P 576
  389. #else
  390. #define SGEMM_DEFAULT_P 448
  391. #define DGEMM_DEFAULT_P 480
  392. #define ZGEMM_DEFAULT_P 112
  393. #define CGEMM_DEFAULT_P 224
  394. #endif
  395. #define QGEMM_DEFAULT_P 112
  396. #define XGEMM_DEFAULT_P 56
  397. #if defined(ARCH_X86_64)
  398. #define SGEMM_DEFAULT_Q 192
  399. #define DGEMM_DEFAULT_Q 160
  400. #define ZGEMM_DEFAULT_Q 160
  401. #define CGEMM_DEFAULT_Q 160
  402. #else
  403. #define SGEMM_DEFAULT_Q 224
  404. #define DGEMM_DEFAULT_Q 224
  405. #define ZGEMM_DEFAULT_Q 224
  406. #define CGEMM_DEFAULT_Q 224
  407. #endif
  408. #define QGEMM_DEFAULT_Q 224
  409. #define XGEMM_DEFAULT_Q 224
  410. #define CGEMM3M_DEFAULT_P 448
  411. #define ZGEMM3M_DEFAULT_P 224
  412. #define XGEMM3M_DEFAULT_P 112
  413. #define CGEMM3M_DEFAULT_Q 224
  414. #define ZGEMM3M_DEFAULT_Q 224
  415. #define XGEMM3M_DEFAULT_Q 224
  416. #define CGEMM3M_DEFAULT_R 12288
  417. #define ZGEMM3M_DEFAULT_R 12288
  418. #define XGEMM3M_DEFAULT_R 12288
  419. #define SGEMM_DEFAULT_R 12288
  420. #define QGEMM_DEFAULT_R qgemm_r
  421. #define DGEMM_DEFAULT_R 12288
  422. #define CGEMM_DEFAULT_R cgemm_r
  423. #define ZGEMM_DEFAULT_R zgemm_r
  424. #define XGEMM_DEFAULT_R xgemm_r
  425. #define SYMV_P 16
  426. #define HAVE_EXCLUSIVE_CACHE
  427. #define GEMM_THREAD gemm_thread_mn
  428. #endif
  429. #ifdef EXCAVATOR
  430. #define SNUMOPT 8
  431. #define DNUMOPT 4
  432. #define GEMM_DEFAULT_OFFSET_A 64
  433. #define GEMM_DEFAULT_OFFSET_B 832
  434. #define GEMM_DEFAULT_ALIGN (BLASLONG)0x0fffUL
  435. #define QGEMM_DEFAULT_UNROLL_N 2
  436. #define CGEMM_DEFAULT_UNROLL_N 2
  437. #define ZGEMM_DEFAULT_UNROLL_N 2
  438. #define XGEMM_DEFAULT_UNROLL_N 1
  439. #ifdef ARCH_X86
  440. #define SGEMM_DEFAULT_UNROLL_N 4
  441. #define DGEMM_DEFAULT_UNROLL_N 4
  442. #define SGEMM_DEFAULT_UNROLL_M 4
  443. #define DGEMM_DEFAULT_UNROLL_M 2
  444. #define QGEMM_DEFAULT_UNROLL_M 2
  445. #define CGEMM_DEFAULT_UNROLL_M 2
  446. #define ZGEMM_DEFAULT_UNROLL_M 1
  447. #define XGEMM_DEFAULT_UNROLL_M 1
  448. #else
  449. #define SGEMM_DEFAULT_UNROLL_N 2
  450. #define DGEMM_DEFAULT_UNROLL_N 2
  451. #define SGEMM_DEFAULT_UNROLL_M 16
  452. #define DGEMM_DEFAULT_UNROLL_M 8
  453. #define QGEMM_DEFAULT_UNROLL_M 2
  454. #define CGEMM_DEFAULT_UNROLL_M 4
  455. #define ZGEMM_DEFAULT_UNROLL_M 2
  456. #define XGEMM_DEFAULT_UNROLL_M 1
  457. #define CGEMM3M_DEFAULT_UNROLL_N 4
  458. #define CGEMM3M_DEFAULT_UNROLL_M 8
  459. #define ZGEMM3M_DEFAULT_UNROLL_N 4
  460. #define ZGEMM3M_DEFAULT_UNROLL_M 4
  461. #define GEMV_UNROLL 8
  462. #endif
  463. #if defined(ARCH_X86_64)
  464. #define SGEMM_DEFAULT_P 768
  465. #define DGEMM_DEFAULT_P 576
  466. #define ZGEMM_DEFAULT_P 288
  467. #define CGEMM_DEFAULT_P 576
  468. #else
  469. #define SGEMM_DEFAULT_P 448
  470. #define DGEMM_DEFAULT_P 480
  471. #define ZGEMM_DEFAULT_P 112
  472. #define CGEMM_DEFAULT_P 224
  473. #endif
  474. #define QGEMM_DEFAULT_P 112
  475. #define XGEMM_DEFAULT_P 56
  476. #if defined(ARCH_X86_64)
  477. #define SGEMM_DEFAULT_Q 192
  478. #define DGEMM_DEFAULT_Q 160
  479. #define ZGEMM_DEFAULT_Q 160
  480. #define CGEMM_DEFAULT_Q 160
  481. #else
  482. #define SGEMM_DEFAULT_Q 224
  483. #define DGEMM_DEFAULT_Q 224
  484. #define ZGEMM_DEFAULT_Q 224
  485. #define CGEMM_DEFAULT_Q 224
  486. #endif
  487. #define QGEMM_DEFAULT_Q 224
  488. #define XGEMM_DEFAULT_Q 224
  489. #define CGEMM3M_DEFAULT_P 448
  490. #define ZGEMM3M_DEFAULT_P 224
  491. #define XGEMM3M_DEFAULT_P 112
  492. #define CGEMM3M_DEFAULT_Q 224
  493. #define ZGEMM3M_DEFAULT_Q 224
  494. #define XGEMM3M_DEFAULT_Q 224
  495. #define CGEMM3M_DEFAULT_R 12288
  496. #define ZGEMM3M_DEFAULT_R 12288
  497. #define XGEMM3M_DEFAULT_R 12288
  498. #define SGEMM_DEFAULT_R 12288
  499. #define QGEMM_DEFAULT_R qgemm_r
  500. #define DGEMM_DEFAULT_R 12288
  501. #define CGEMM_DEFAULT_R cgemm_r
  502. #define ZGEMM_DEFAULT_R zgemm_r
  503. #define XGEMM_DEFAULT_R xgemm_r
  504. #define SYMV_P 16
  505. #define HAVE_EXCLUSIVE_CACHE
  506. #define GEMM_THREAD gemm_thread_mn
  507. #endif
  508. #ifdef ZEN
  509. #define SNUMOPT 16
  510. #define DNUMOPT 8
  511. #define GEMM_DEFAULT_OFFSET_A 0
  512. #define GEMM_DEFAULT_OFFSET_B 0
  513. #define GEMM_DEFAULT_ALIGN (BLASLONG)0x03fffUL
  514. #define SYMV_P 8
  515. #if defined(XDOUBLE) || defined(DOUBLE)
  516. #define SWITCH_RATIO 4
  517. #define GEMM_PREFERED_SIZE 4
  518. #else
  519. #define SWITCH_RATIO 8
  520. #define GEMM_PREFERED_SIZE 8
  521. #endif
  522. #ifdef ARCH_X86
  523. #define SGEMM_DEFAULT_UNROLL_M 4
  524. #define DGEMM_DEFAULT_UNROLL_M 2
  525. #define QGEMM_DEFAULT_UNROLL_M 2
  526. #define CGEMM_DEFAULT_UNROLL_M 2
  527. #define ZGEMM_DEFAULT_UNROLL_M 1
  528. #define XGEMM_DEFAULT_UNROLL_M 1
  529. #define SGEMM_DEFAULT_UNROLL_N 4
  530. #define DGEMM_DEFAULT_UNROLL_N 4
  531. #define QGEMM_DEFAULT_UNROLL_N 2
  532. #define CGEMM_DEFAULT_UNROLL_N 2
  533. #define ZGEMM_DEFAULT_UNROLL_N 2
  534. #define XGEMM_DEFAULT_UNROLL_N 1
  535. #else
  536. #define SGEMM_DEFAULT_UNROLL_M 8
  537. #define DGEMM_DEFAULT_UNROLL_M 4
  538. #define QGEMM_DEFAULT_UNROLL_M 2
  539. #define CGEMM_DEFAULT_UNROLL_M 8
  540. #define ZGEMM_DEFAULT_UNROLL_M 4
  541. #define XGEMM_DEFAULT_UNROLL_M 1
  542. #define SGEMM_DEFAULT_UNROLL_N 4
  543. #define DGEMM_DEFAULT_UNROLL_N 8
  544. #define QGEMM_DEFAULT_UNROLL_N 2
  545. #define CGEMM_DEFAULT_UNROLL_N 2
  546. #define ZGEMM_DEFAULT_UNROLL_N 2
  547. #define XGEMM_DEFAULT_UNROLL_N 1
  548. /*
  549. #define SGEMM_DEFAULT_UNROLL_MN 32
  550. #define DGEMM_DEFAULT_UNROLL_MN 32
  551. */
  552. #endif
  553. #ifdef ARCH_X86
  554. #define SGEMM_DEFAULT_P 512
  555. #define SGEMM_DEFAULT_R sgemm_r
  556. #define DGEMM_DEFAULT_P 512
  557. #define DGEMM_DEFAULT_R dgemm_r
  558. #define QGEMM_DEFAULT_P 504
  559. #define QGEMM_DEFAULT_R qgemm_r
  560. #define CGEMM_DEFAULT_P 128
  561. #define CGEMM_DEFAULT_R 1024
  562. #define ZGEMM_DEFAULT_P 512
  563. #define ZGEMM_DEFAULT_R zgemm_r
  564. #define XGEMM_DEFAULT_P 252
  565. #define XGEMM_DEFAULT_R xgemm_r
  566. #define SGEMM_DEFAULT_Q 256
  567. #define DGEMM_DEFAULT_Q 256
  568. #define QGEMM_DEFAULT_Q 128
  569. #define CGEMM_DEFAULT_Q 256
  570. #define ZGEMM_DEFAULT_Q 192
  571. #define XGEMM_DEFAULT_Q 128
  572. #else
  573. #define SGEMM_DEFAULT_P 320
  574. #define DGEMM_DEFAULT_P 512
  575. #define CGEMM_DEFAULT_P 256
  576. #define ZGEMM_DEFAULT_P 192
  577. #ifdef WINDOWS_ABI
  578. #define SGEMM_DEFAULT_Q 320
  579. #define DGEMM_DEFAULT_Q 128
  580. #else
  581. #define SGEMM_DEFAULT_Q 320
  582. #define DGEMM_DEFAULT_Q 256
  583. #endif
  584. #define CGEMM_DEFAULT_Q 256
  585. #define ZGEMM_DEFAULT_Q 192
  586. #define SGEMM_DEFAULT_R sgemm_r
  587. #define DGEMM_DEFAULT_R 13824
  588. #define CGEMM_DEFAULT_R cgemm_r
  589. #define ZGEMM_DEFAULT_R zgemm_r
  590. #define QGEMM_DEFAULT_Q 128
  591. #define QGEMM_DEFAULT_P 504
  592. #define QGEMM_DEFAULT_R qgemm_r
  593. #define XGEMM_DEFAULT_P 252
  594. #define XGEMM_DEFAULT_R xgemm_r
  595. #define XGEMM_DEFAULT_Q 128
  596. #define CGEMM3M_DEFAULT_UNROLL_N 4
  597. #define CGEMM3M_DEFAULT_UNROLL_M 8
  598. #define ZGEMM3M_DEFAULT_UNROLL_N 4
  599. #define ZGEMM3M_DEFAULT_UNROLL_M 4
  600. #define CGEMM3M_DEFAULT_P 320
  601. #define ZGEMM3M_DEFAULT_P 256
  602. #define XGEMM3M_DEFAULT_P 112
  603. #define CGEMM3M_DEFAULT_Q 320
  604. #define ZGEMM3M_DEFAULT_Q 256
  605. #define XGEMM3M_DEFAULT_Q 224
  606. #define CGEMM3M_DEFAULT_R 12288
  607. #define ZGEMM3M_DEFAULT_R 12288
  608. #define XGEMM3M_DEFAULT_R 12288
  609. #endif
  610. #endif
  611. #ifdef ATHLON
  612. #define SNUMOPT 4
  613. #define DNUMOPT 2
  614. #define GEMM_DEFAULT_OFFSET_A 0
  615. #define GEMM_DEFAULT_OFFSET_B 384
  616. #define GEMM_DEFAULT_ALIGN (BLASLONG)0x0ffffUL
  617. #define SGEMM_DEFAULT_UNROLL_N 4
  618. #define DGEMM_DEFAULT_UNROLL_N 4
  619. #define QGEMM_DEFAULT_UNROLL_N 2
  620. #define CGEMM_DEFAULT_UNROLL_N 2
  621. #define ZGEMM_DEFAULT_UNROLL_N 2
  622. #define XGEMM_DEFAULT_UNROLL_N 1
  623. #define SGEMM_DEFAULT_UNROLL_M 2
  624. #define DGEMM_DEFAULT_UNROLL_M 1
  625. #define QGEMM_DEFAULT_UNROLL_M 2
  626. #define CGEMM_DEFAULT_UNROLL_M 1
  627. #define ZGEMM_DEFAULT_UNROLL_M 1
  628. #define XGEMM_DEFAULT_UNROLL_M 1
  629. #define SGEMM_DEFAULT_R sgemm_r
  630. #define DGEMM_DEFAULT_R dgemm_r
  631. #define QGEMM_DEFAULT_R qgemm_r
  632. #define CGEMM_DEFAULT_R cgemm_r
  633. #define ZGEMM_DEFAULT_R zgemm_r
  634. #define XGEMM_DEFAULT_R xgemm_r
  635. #define SGEMM_DEFAULT_P 208
  636. #define DGEMM_DEFAULT_P 104
  637. #define QGEMM_DEFAULT_P 56
  638. #define CGEMM_DEFAULT_P 104
  639. #define ZGEMM_DEFAULT_P 56
  640. #define XGEMM_DEFAULT_P 28
  641. #define SGEMM_DEFAULT_Q 208
  642. #define DGEMM_DEFAULT_Q 208
  643. #define QGEMM_DEFAULT_Q 208
  644. #define CGEMM_DEFAULT_Q 208
  645. #define ZGEMM_DEFAULT_Q 208
  646. #define XGEMM_DEFAULT_Q 208
  647. #define SYMV_P 16
  648. #define HAVE_EXCLUSIVE_CACHE
  649. #endif
  650. #ifdef VIAC3
  651. #define SNUMOPT 2
  652. #define DNUMOPT 1
  653. #define GEMM_DEFAULT_OFFSET_A 0
  654. #define GEMM_DEFAULT_OFFSET_B 256
  655. #define GEMM_DEFAULT_ALIGN (BLASLONG)0x0ffffUL
  656. #define SGEMM_DEFAULT_UNROLL_N 4
  657. #define DGEMM_DEFAULT_UNROLL_N 4
  658. #define QGEMM_DEFAULT_UNROLL_N 2
  659. #define CGEMM_DEFAULT_UNROLL_N 2
  660. #define ZGEMM_DEFAULT_UNROLL_N 2
  661. #define XGEMM_DEFAULT_UNROLL_N 1
  662. #define SGEMM_DEFAULT_UNROLL_M 2
  663. #define DGEMM_DEFAULT_UNROLL_M 1
  664. #define QGEMM_DEFAULT_UNROLL_M 2
  665. #define CGEMM_DEFAULT_UNROLL_M 1
  666. #define ZGEMM_DEFAULT_UNROLL_M 1
  667. #define XGEMM_DEFAULT_UNROLL_M 1
  668. #define SGEMM_DEFAULT_R sgemm_r
  669. #define DGEMM_DEFAULT_R dgemm_r
  670. #define QGEMM_DEFAULT_R qgemm_r
  671. #define CGEMM_DEFAULT_R cgemm_r
  672. #define ZGEMM_DEFAULT_R zgemm_r
  673. #define XGEMM_DEFAULT_R xgemm_r
  674. #define SGEMM_DEFAULT_P 128
  675. #define DGEMM_DEFAULT_P 128
  676. #define QGEMM_DEFAULT_P 128
  677. #define CGEMM_DEFAULT_P 128
  678. #define ZGEMM_DEFAULT_P 128
  679. #define XGEMM_DEFAULT_P 128
  680. #define SGEMM_DEFAULT_Q 512
  681. #define DGEMM_DEFAULT_Q 256
  682. #define QGEMM_DEFAULT_Q 256
  683. #define CGEMM_DEFAULT_Q 256
  684. #define ZGEMM_DEFAULT_Q 128
  685. #define XGEMM_DEFAULT_Q 128
  686. #define SYMV_P 16
  687. #endif
  688. #ifdef NANO
  689. #define SNUMOPT 4
  690. #define DNUMOPT 2
  691. #define GEMM_DEFAULT_OFFSET_A 64
  692. #define GEMM_DEFAULT_OFFSET_B 256
  693. #define GEMM_DEFAULT_ALIGN (BLASLONG)0x01ffffUL
  694. #ifdef ARCH_X86
  695. #define SGEMM_DEFAULT_UNROLL_N 4
  696. #define DGEMM_DEFAULT_UNROLL_N 4
  697. #define QGEMM_DEFAULT_UNROLL_N 2
  698. #define CGEMM_DEFAULT_UNROLL_N 2
  699. #define ZGEMM_DEFAULT_UNROLL_N 2
  700. #define XGEMM_DEFAULT_UNROLL_N 1
  701. #define SGEMM_DEFAULT_UNROLL_M 4
  702. #define DGEMM_DEFAULT_UNROLL_M 2
  703. #define QGEMM_DEFAULT_UNROLL_M 2
  704. #define CGEMM_DEFAULT_UNROLL_M 2
  705. #define ZGEMM_DEFAULT_UNROLL_M 1
  706. #define XGEMM_DEFAULT_UNROLL_M 1
  707. #else
  708. #define SGEMM_DEFAULT_UNROLL_N 8
  709. #define DGEMM_DEFAULT_UNROLL_N 4
  710. #define QGEMM_DEFAULT_UNROLL_N 2
  711. #define CGEMM_DEFAULT_UNROLL_N 4
  712. #define ZGEMM_DEFAULT_UNROLL_N 2
  713. #define XGEMM_DEFAULT_UNROLL_N 1
  714. #define SGEMM_DEFAULT_UNROLL_M 4
  715. #define DGEMM_DEFAULT_UNROLL_M 4
  716. #define QGEMM_DEFAULT_UNROLL_M 2
  717. #define CGEMM_DEFAULT_UNROLL_M 2
  718. #define ZGEMM_DEFAULT_UNROLL_M 2
  719. #define XGEMM_DEFAULT_UNROLL_M 1
  720. #endif
  721. #define SGEMM_DEFAULT_P 288
  722. #define DGEMM_DEFAULT_P 288
  723. #define QGEMM_DEFAULT_P 288
  724. #define CGEMM_DEFAULT_P 288
  725. #define ZGEMM_DEFAULT_P 288
  726. #define XGEMM_DEFAULT_P 288
  727. #define SGEMM_DEFAULT_R sgemm_r
  728. #define DGEMM_DEFAULT_R dgemm_r
  729. #define QGEMM_DEFAULT_R qgemm_r
  730. #define CGEMM_DEFAULT_R cgemm_r
  731. #define ZGEMM_DEFAULT_R zgemm_r
  732. #define XGEMM_DEFAULT_R xgemm_r
  733. #define SGEMM_DEFAULT_Q 256
  734. #define DGEMM_DEFAULT_Q 128
  735. #define QGEMM_DEFAULT_Q 64
  736. #define CGEMM_DEFAULT_Q 128
  737. #define ZGEMM_DEFAULT_Q 64
  738. #define XGEMM_DEFAULT_Q 32
  739. #define SYMV_P 16
  740. #define HAVE_EXCLUSIVE_CACHE
  741. #endif
  742. #if defined(PENTIUM) || defined(PENTIUM2) || defined(PENTIUM3)
  743. #ifdef HAVE_SSE
  744. #define SNUMOPT 2
  745. #else
  746. #define SNUMOPT 1
  747. #endif
  748. #define DNUMOPT 1
  749. #define GEMM_DEFAULT_OFFSET_A 0
  750. #define GEMM_DEFAULT_OFFSET_B 0
  751. #define GEMM_DEFAULT_ALIGN (BLASLONG)0x0ffffUL
  752. #ifdef HAVE_SSE
  753. #define SGEMM_DEFAULT_UNROLL_M 8
  754. #define CGEMM_DEFAULT_UNROLL_M 4
  755. #else
  756. #define SGEMM_DEFAULT_UNROLL_M 4
  757. #define CGEMM_DEFAULT_UNROLL_M 2
  758. #endif
  759. #define DGEMM_DEFAULT_UNROLL_M 2
  760. #define SGEMM_DEFAULT_UNROLL_N 2
  761. #define DGEMM_DEFAULT_UNROLL_N 2
  762. #define QGEMM_DEFAULT_UNROLL_M 2
  763. #define QGEMM_DEFAULT_UNROLL_N 2
  764. #define CGEMM_DEFAULT_UNROLL_N 1
  765. #define ZGEMM_DEFAULT_UNROLL_M 1
  766. #define ZGEMM_DEFAULT_UNROLL_N 1
  767. #define XGEMM_DEFAULT_UNROLL_M 1
  768. #define XGEMM_DEFAULT_UNROLL_N 1
  769. #define SGEMM_DEFAULT_P sgemm_p
  770. #define SGEMM_DEFAULT_Q 256
  771. #define SGEMM_DEFAULT_R sgemm_r
  772. #define DGEMM_DEFAULT_P dgemm_p
  773. #define DGEMM_DEFAULT_Q 256
  774. #define DGEMM_DEFAULT_R dgemm_r
  775. #define QGEMM_DEFAULT_P qgemm_p
  776. #define QGEMM_DEFAULT_Q 256
  777. #define QGEMM_DEFAULT_R qgemm_r
  778. #define CGEMM_DEFAULT_P cgemm_p
  779. #define CGEMM_DEFAULT_Q 256
  780. #define CGEMM_DEFAULT_R cgemm_r
  781. #define ZGEMM_DEFAULT_P zgemm_p
  782. #define ZGEMM_DEFAULT_Q 256
  783. #define ZGEMM_DEFAULT_R zgemm_r
  784. #define XGEMM_DEFAULT_P xgemm_p
  785. #define XGEMM_DEFAULT_Q 256
  786. #define XGEMM_DEFAULT_R xgemm_r
  787. #define SYMV_P 4
  788. #endif
  789. #ifdef PENTIUMM
  790. #define SNUMOPT 2
  791. #define DNUMOPT 1
  792. #define GEMM_DEFAULT_OFFSET_A 0
  793. #define GEMM_DEFAULT_OFFSET_B 0
  794. #define GEMM_DEFAULT_ALIGN (BLASLONG)0x0ffffUL
  795. #ifdef CORE_YONAH
  796. #define SGEMM_DEFAULT_UNROLL_M 4
  797. #define SGEMM_DEFAULT_UNROLL_N 4
  798. #define DGEMM_DEFAULT_UNROLL_M 2
  799. #define DGEMM_DEFAULT_UNROLL_N 4
  800. #define QGEMM_DEFAULT_UNROLL_M 2
  801. #define QGEMM_DEFAULT_UNROLL_N 2
  802. #define CGEMM_DEFAULT_UNROLL_M 2
  803. #define CGEMM_DEFAULT_UNROLL_N 2
  804. #define ZGEMM_DEFAULT_UNROLL_M 1
  805. #define ZGEMM_DEFAULT_UNROLL_N 2
  806. #define XGEMM_DEFAULT_UNROLL_M 1
  807. #define XGEMM_DEFAULT_UNROLL_N 1
  808. #else
  809. #define SGEMM_DEFAULT_UNROLL_M 8
  810. #define SGEMM_DEFAULT_UNROLL_N 2
  811. #define DGEMM_DEFAULT_UNROLL_M 2
  812. #define DGEMM_DEFAULT_UNROLL_N 2
  813. #define QGEMM_DEFAULT_UNROLL_M 2
  814. #define QGEMM_DEFAULT_UNROLL_N 2
  815. #define CGEMM_DEFAULT_UNROLL_M 4
  816. #define CGEMM_DEFAULT_UNROLL_N 1
  817. #define ZGEMM_DEFAULT_UNROLL_M 1
  818. #define ZGEMM_DEFAULT_UNROLL_N 1
  819. #define XGEMM_DEFAULT_UNROLL_M 1
  820. #define XGEMM_DEFAULT_UNROLL_N 1
  821. #endif
  822. #define SGEMM_DEFAULT_P sgemm_p
  823. #define SGEMM_DEFAULT_Q 256
  824. #define SGEMM_DEFAULT_R sgemm_r
  825. #define DGEMM_DEFAULT_P dgemm_p
  826. #define DGEMM_DEFAULT_Q 256
  827. #define DGEMM_DEFAULT_R dgemm_r
  828. #define QGEMM_DEFAULT_P qgemm_p
  829. #define QGEMM_DEFAULT_Q 256
  830. #define QGEMM_DEFAULT_R qgemm_r
  831. #define CGEMM_DEFAULT_P cgemm_p
  832. #define CGEMM_DEFAULT_Q 256
  833. #define CGEMM_DEFAULT_R cgemm_r
  834. #define ZGEMM_DEFAULT_P zgemm_p
  835. #define ZGEMM_DEFAULT_Q 256
  836. #define ZGEMM_DEFAULT_R zgemm_r
  837. #define XGEMM_DEFAULT_P xgemm_p
  838. #define XGEMM_DEFAULT_Q 256
  839. #define XGEMM_DEFAULT_R xgemm_r
  840. #define SYMV_P 4
  841. #endif
  842. #ifdef CORE_NORTHWOOD
  843. #define SNUMOPT 4
  844. #define DNUMOPT 2
  845. #define GEMM_DEFAULT_OFFSET_A 0
  846. #define GEMM_DEFAULT_OFFSET_B 32
  847. #define GEMM_DEFAULT_ALIGN (BLASLONG)0x0ffffUL
  848. #define SYMV_P 8
  849. #define SGEMM_DEFAULT_UNROLL_M 8
  850. #define DGEMM_DEFAULT_UNROLL_M 4
  851. #define QGEMM_DEFAULT_UNROLL_M 2
  852. #define CGEMM_DEFAULT_UNROLL_M 4
  853. #define ZGEMM_DEFAULT_UNROLL_M 2
  854. #define XGEMM_DEFAULT_UNROLL_M 1
  855. #define SGEMM_DEFAULT_UNROLL_N 2
  856. #define DGEMM_DEFAULT_UNROLL_N 2
  857. #define QGEMM_DEFAULT_UNROLL_N 2
  858. #define CGEMM_DEFAULT_UNROLL_N 1
  859. #define ZGEMM_DEFAULT_UNROLL_N 1
  860. #define XGEMM_DEFAULT_UNROLL_N 1
  861. #define SGEMM_DEFAULT_P sgemm_p
  862. #define SGEMM_DEFAULT_R sgemm_r
  863. #define DGEMM_DEFAULT_P dgemm_p
  864. #define DGEMM_DEFAULT_R dgemm_r
  865. #define QGEMM_DEFAULT_P qgemm_p
  866. #define QGEMM_DEFAULT_R qgemm_r
  867. #define CGEMM_DEFAULT_P cgemm_p
  868. #define CGEMM_DEFAULT_R cgemm_r
  869. #define ZGEMM_DEFAULT_P zgemm_p
  870. #define ZGEMM_DEFAULT_R zgemm_r
  871. #define XGEMM_DEFAULT_P xgemm_p
  872. #define XGEMM_DEFAULT_R xgemm_r
  873. #define SGEMM_DEFAULT_Q 128
  874. #define DGEMM_DEFAULT_Q 128
  875. #define QGEMM_DEFAULT_Q 128
  876. #define CGEMM_DEFAULT_Q 128
  877. #define ZGEMM_DEFAULT_Q 128
  878. #define XGEMM_DEFAULT_Q 128
  879. #endif
  880. #ifdef CORE_PRESCOTT
  881. #define SNUMOPT 4
  882. #define DNUMOPT 2
  883. #ifndef __64BIT__
  884. #define GEMM_DEFAULT_OFFSET_A 128
  885. #define GEMM_DEFAULT_OFFSET_B 192
  886. #else
  887. #define GEMM_DEFAULT_OFFSET_A 0
  888. #define GEMM_DEFAULT_OFFSET_B 256
  889. #endif
  890. #define GEMM_DEFAULT_ALIGN (BLASLONG)0x0ffffUL
  891. #define SYMV_P 8
  892. #ifdef ARCH_X86
  893. #define SGEMM_DEFAULT_UNROLL_M 4
  894. #define DGEMM_DEFAULT_UNROLL_M 2
  895. #define QGEMM_DEFAULT_UNROLL_M 2
  896. #define CGEMM_DEFAULT_UNROLL_M 2
  897. #define ZGEMM_DEFAULT_UNROLL_M 1
  898. #define XGEMM_DEFAULT_UNROLL_M 1
  899. #else
  900. #define SGEMM_DEFAULT_UNROLL_M 8
  901. #define DGEMM_DEFAULT_UNROLL_M 4
  902. #define QGEMM_DEFAULT_UNROLL_M 2
  903. #define CGEMM_DEFAULT_UNROLL_M 4
  904. #define ZGEMM_DEFAULT_UNROLL_M 2
  905. #define XGEMM_DEFAULT_UNROLL_M 1
  906. #endif
  907. #define SGEMM_DEFAULT_UNROLL_N 4
  908. #define DGEMM_DEFAULT_UNROLL_N 4
  909. #define QGEMM_DEFAULT_UNROLL_N 2
  910. #define CGEMM_DEFAULT_UNROLL_N 2
  911. #define ZGEMM_DEFAULT_UNROLL_N 2
  912. #define XGEMM_DEFAULT_UNROLL_N 1
  913. #define SGEMM_DEFAULT_P sgemm_p
  914. #define SGEMM_DEFAULT_R sgemm_r
  915. #define DGEMM_DEFAULT_P dgemm_p
  916. #define DGEMM_DEFAULT_R dgemm_r
  917. #define QGEMM_DEFAULT_P qgemm_p
  918. #define QGEMM_DEFAULT_R qgemm_r
  919. #define CGEMM_DEFAULT_P cgemm_p
  920. #define CGEMM_DEFAULT_R cgemm_r
  921. #define ZGEMM_DEFAULT_P zgemm_p
  922. #define ZGEMM_DEFAULT_R zgemm_r
  923. #define XGEMM_DEFAULT_P xgemm_p
  924. #define XGEMM_DEFAULT_R xgemm_r
  925. #define SGEMM_DEFAULT_Q 128
  926. #define DGEMM_DEFAULT_Q 128
  927. #define QGEMM_DEFAULT_Q 128
  928. #define CGEMM_DEFAULT_Q 128
  929. #define ZGEMM_DEFAULT_Q 128
  930. #define XGEMM_DEFAULT_Q 128
  931. #endif
  932. #ifdef CORE2
  933. #define SNUMOPT 8
  934. #define DNUMOPT 4
  935. #define GEMM_DEFAULT_OFFSET_A 448
  936. #define GEMM_DEFAULT_OFFSET_B 128
  937. #define GEMM_DEFAULT_ALIGN (BLASLONG)0x03fffUL
  938. #define SYMV_P 8
  939. #define SWITCH_RATIO 4
  940. #ifdef ARCH_X86
  941. #define SGEMM_DEFAULT_UNROLL_M 8
  942. #define DGEMM_DEFAULT_UNROLL_M 4
  943. #define QGEMM_DEFAULT_UNROLL_M 2
  944. #define CGEMM_DEFAULT_UNROLL_M 4
  945. #define ZGEMM_DEFAULT_UNROLL_M 2
  946. #define XGEMM_DEFAULT_UNROLL_M 1
  947. #define SGEMM_DEFAULT_UNROLL_N 2
  948. #define DGEMM_DEFAULT_UNROLL_N 2
  949. #define QGEMM_DEFAULT_UNROLL_N 2
  950. #define CGEMM_DEFAULT_UNROLL_N 1
  951. #define ZGEMM_DEFAULT_UNROLL_N 1
  952. #define XGEMM_DEFAULT_UNROLL_N 1
  953. #define MASK(a, b) ((((a) + (b) - 1) / (b)) * (b))
  954. #else
  955. #define SGEMM_DEFAULT_UNROLL_M 8
  956. #define DGEMM_DEFAULT_UNROLL_M 4
  957. #define QGEMM_DEFAULT_UNROLL_M 2
  958. #define CGEMM_DEFAULT_UNROLL_M 4
  959. #define ZGEMM_DEFAULT_UNROLL_M 2
  960. #define XGEMM_DEFAULT_UNROLL_M 1
  961. #define SGEMM_DEFAULT_UNROLL_N 4
  962. #define DGEMM_DEFAULT_UNROLL_N 4
  963. #define QGEMM_DEFAULT_UNROLL_N 2
  964. #define CGEMM_DEFAULT_UNROLL_N 2
  965. #define ZGEMM_DEFAULT_UNROLL_N 2
  966. #define XGEMM_DEFAULT_UNROLL_N 1
  967. #endif
  968. #define SGEMM_DEFAULT_P sgemm_p
  969. #define SGEMM_DEFAULT_R sgemm_r
  970. #define DGEMM_DEFAULT_P dgemm_p
  971. #define DGEMM_DEFAULT_R dgemm_r
  972. #define QGEMM_DEFAULT_P qgemm_p
  973. #define QGEMM_DEFAULT_R qgemm_r
  974. #define CGEMM_DEFAULT_P cgemm_p
  975. #define CGEMM_DEFAULT_R cgemm_r
  976. #define ZGEMM_DEFAULT_P zgemm_p
  977. #define ZGEMM_DEFAULT_R zgemm_r
  978. #define XGEMM_DEFAULT_P xgemm_p
  979. #define XGEMM_DEFAULT_R xgemm_r
  980. #define SGEMM_DEFAULT_Q 256
  981. #define DGEMM_DEFAULT_Q 256
  982. #define QGEMM_DEFAULT_Q 256
  983. #define CGEMM_DEFAULT_Q 256
  984. #define ZGEMM_DEFAULT_Q 256
  985. #define XGEMM_DEFAULT_Q 256
  986. #endif
  987. #ifdef PENRYN
  988. #define SNUMOPT 8
  989. #define DNUMOPT 4
  990. #define GEMM_DEFAULT_OFFSET_A 128
  991. #define GEMM_DEFAULT_OFFSET_B 0
  992. #define GEMM_DEFAULT_ALIGN (BLASLONG)0x03fffUL
  993. #define SYMV_P 8
  994. #define SWITCH_RATIO 4
  995. #ifdef ARCH_X86
  996. #define SGEMM_DEFAULT_UNROLL_M 4
  997. #define DGEMM_DEFAULT_UNROLL_M 2
  998. #define QGEMM_DEFAULT_UNROLL_M 2
  999. #define CGEMM_DEFAULT_UNROLL_M 2
  1000. #define ZGEMM_DEFAULT_UNROLL_M 1
  1001. #define XGEMM_DEFAULT_UNROLL_M 1
  1002. #define SGEMM_DEFAULT_UNROLL_N 4
  1003. #define DGEMM_DEFAULT_UNROLL_N 4
  1004. #define QGEMM_DEFAULT_UNROLL_N 2
  1005. #define CGEMM_DEFAULT_UNROLL_N 2
  1006. #define ZGEMM_DEFAULT_UNROLL_N 2
  1007. #define XGEMM_DEFAULT_UNROLL_N 1
  1008. #else
  1009. #define SGEMM_DEFAULT_UNROLL_M 8
  1010. #define DGEMM_DEFAULT_UNROLL_M 4
  1011. #define QGEMM_DEFAULT_UNROLL_M 2
  1012. #define CGEMM_DEFAULT_UNROLL_M 4
  1013. #define ZGEMM_DEFAULT_UNROLL_M 2
  1014. #define XGEMM_DEFAULT_UNROLL_M 1
  1015. #define SGEMM_DEFAULT_UNROLL_N 4
  1016. #define DGEMM_DEFAULT_UNROLL_N 4
  1017. #define QGEMM_DEFAULT_UNROLL_N 2
  1018. #define CGEMM_DEFAULT_UNROLL_N 2
  1019. #define ZGEMM_DEFAULT_UNROLL_N 2
  1020. #define XGEMM_DEFAULT_UNROLL_N 1
  1021. #endif
  1022. #define SGEMM_DEFAULT_P sgemm_p
  1023. #define SGEMM_DEFAULT_R sgemm_r
  1024. #define DGEMM_DEFAULT_P dgemm_p
  1025. #define DGEMM_DEFAULT_R dgemm_r
  1026. #define QGEMM_DEFAULT_P qgemm_p
  1027. #define QGEMM_DEFAULT_R qgemm_r
  1028. #define CGEMM_DEFAULT_P cgemm_p
  1029. #define CGEMM_DEFAULT_R cgemm_r
  1030. #define ZGEMM_DEFAULT_P zgemm_p
  1031. #define ZGEMM_DEFAULT_R zgemm_r
  1032. #define XGEMM_DEFAULT_P xgemm_p
  1033. #define XGEMM_DEFAULT_R xgemm_r
  1034. #define SGEMM_DEFAULT_Q 512
  1035. #define DGEMM_DEFAULT_Q 256
  1036. #define QGEMM_DEFAULT_Q 128
  1037. #define CGEMM_DEFAULT_Q 512
  1038. #define ZGEMM_DEFAULT_Q 256
  1039. #define XGEMM_DEFAULT_Q 128
  1040. #define GETRF_FACTOR 0.75
  1041. #endif
  1042. #ifdef DUNNINGTON
  1043. #define SNUMOPT 8
  1044. #define DNUMOPT 4
  1045. #define GEMM_DEFAULT_OFFSET_A 128
  1046. #define GEMM_DEFAULT_OFFSET_B 0
  1047. #define GEMM_DEFAULT_ALIGN (BLASLONG)0x03fffUL
  1048. #define SYMV_P 8
  1049. #define SWITCH_RATIO 4
  1050. #ifdef ARCH_X86
  1051. #define SGEMM_DEFAULT_UNROLL_M 4
  1052. #define DGEMM_DEFAULT_UNROLL_M 2
  1053. #define QGEMM_DEFAULT_UNROLL_M 2
  1054. #define CGEMM_DEFAULT_UNROLL_M 2
  1055. #define ZGEMM_DEFAULT_UNROLL_M 1
  1056. #define XGEMM_DEFAULT_UNROLL_M 1
  1057. #define SGEMM_DEFAULT_UNROLL_N 4
  1058. #define DGEMM_DEFAULT_UNROLL_N 4
  1059. #define QGEMM_DEFAULT_UNROLL_N 2
  1060. #define CGEMM_DEFAULT_UNROLL_N 2
  1061. #define ZGEMM_DEFAULT_UNROLL_N 2
  1062. #define XGEMM_DEFAULT_UNROLL_N 1
  1063. #else
  1064. #define SGEMM_DEFAULT_UNROLL_M 8
  1065. #define DGEMM_DEFAULT_UNROLL_M 4
  1066. #define QGEMM_DEFAULT_UNROLL_M 2
  1067. #define CGEMM_DEFAULT_UNROLL_M 4
  1068. #define ZGEMM_DEFAULT_UNROLL_M 2
  1069. #define XGEMM_DEFAULT_UNROLL_M 1
  1070. #define SGEMM_DEFAULT_UNROLL_N 4
  1071. #define DGEMM_DEFAULT_UNROLL_N 4
  1072. #define QGEMM_DEFAULT_UNROLL_N 2
  1073. #define CGEMM_DEFAULT_UNROLL_N 2
  1074. #define ZGEMM_DEFAULT_UNROLL_N 2
  1075. #define XGEMM_DEFAULT_UNROLL_N 1
  1076. #endif
  1077. #define SGEMM_DEFAULT_P sgemm_p
  1078. #define SGEMM_DEFAULT_R sgemm_r
  1079. #define DGEMM_DEFAULT_P dgemm_p
  1080. #define DGEMM_DEFAULT_R dgemm_r
  1081. #define QGEMM_DEFAULT_P qgemm_p
  1082. #define QGEMM_DEFAULT_R qgemm_r
  1083. #define CGEMM_DEFAULT_P cgemm_p
  1084. #define CGEMM_DEFAULT_R cgemm_r
  1085. #define ZGEMM_DEFAULT_P zgemm_p
  1086. #define ZGEMM_DEFAULT_R zgemm_r
  1087. #define XGEMM_DEFAULT_P xgemm_p
  1088. #define XGEMM_DEFAULT_R xgemm_r
  1089. #define SGEMM_DEFAULT_Q 768
  1090. #define DGEMM_DEFAULT_Q 384
  1091. #define QGEMM_DEFAULT_Q 192
  1092. #define CGEMM_DEFAULT_Q 768
  1093. #define ZGEMM_DEFAULT_Q 384
  1094. #define XGEMM_DEFAULT_Q 192
  1095. #define GETRF_FACTOR 0.75
  1096. #define GEMM_THREAD gemm_thread_mn
  1097. #endif
  1098. #ifdef NEHALEM
  1099. #define SNUMOPT 8
  1100. #define DNUMOPT 4
  1101. #define GEMM_DEFAULT_OFFSET_A 32
  1102. #define GEMM_DEFAULT_OFFSET_B 0
  1103. #define GEMM_DEFAULT_ALIGN (BLASLONG)0x03fffUL
  1104. #define SYMV_P 8
  1105. #define SWITCH_RATIO 4
  1106. #ifdef ARCH_X86
  1107. #define SGEMM_DEFAULT_UNROLL_M 4
  1108. #define DGEMM_DEFAULT_UNROLL_M 2
  1109. #define QGEMM_DEFAULT_UNROLL_M 2
  1110. #define CGEMM_DEFAULT_UNROLL_M 2
  1111. #define ZGEMM_DEFAULT_UNROLL_M 1
  1112. #define XGEMM_DEFAULT_UNROLL_M 1
  1113. #define SGEMM_DEFAULT_UNROLL_N 4
  1114. #define DGEMM_DEFAULT_UNROLL_N 4
  1115. #define QGEMM_DEFAULT_UNROLL_N 2
  1116. #define CGEMM_DEFAULT_UNROLL_N 2
  1117. #define ZGEMM_DEFAULT_UNROLL_N 2
  1118. #define XGEMM_DEFAULT_UNROLL_N 1
  1119. #else
  1120. #define SGEMM_DEFAULT_UNROLL_M 4
  1121. #define DGEMM_DEFAULT_UNROLL_M 2
  1122. #define QGEMM_DEFAULT_UNROLL_M 2
  1123. #define CGEMM_DEFAULT_UNROLL_M 2
  1124. #define ZGEMM_DEFAULT_UNROLL_M 1
  1125. #define XGEMM_DEFAULT_UNROLL_M 1
  1126. #define SGEMM_DEFAULT_UNROLL_N 8
  1127. #define DGEMM_DEFAULT_UNROLL_N 8
  1128. #define QGEMM_DEFAULT_UNROLL_N 2
  1129. #define CGEMM_DEFAULT_UNROLL_N 4
  1130. #define ZGEMM_DEFAULT_UNROLL_N 4
  1131. #define XGEMM_DEFAULT_UNROLL_N 1
  1132. #endif
  1133. #define SGEMM_DEFAULT_P 504
  1134. #define SGEMM_DEFAULT_R sgemm_r
  1135. #define DGEMM_DEFAULT_P 504
  1136. #define DGEMM_DEFAULT_R dgemm_r
  1137. #define QGEMM_DEFAULT_P 504
  1138. #define QGEMM_DEFAULT_R qgemm_r
  1139. #define CGEMM_DEFAULT_P 252
  1140. #define CGEMM_DEFAULT_R cgemm_r
  1141. #define ZGEMM_DEFAULT_P 252
  1142. #define ZGEMM_DEFAULT_R zgemm_r
  1143. #define XGEMM_DEFAULT_P 252
  1144. #define XGEMM_DEFAULT_R xgemm_r
  1145. #define SGEMM_DEFAULT_Q 512
  1146. #define DGEMM_DEFAULT_Q 256
  1147. #define QGEMM_DEFAULT_Q 128
  1148. #define CGEMM_DEFAULT_Q 512
  1149. #define ZGEMM_DEFAULT_Q 256
  1150. #define XGEMM_DEFAULT_Q 128
  1151. #define GETRF_FACTOR 0.72
  1152. #endif
  1153. #ifdef SANDYBRIDGE
  1154. #define SNUMOPT 8
  1155. #define DNUMOPT 4
  1156. #define GEMM_DEFAULT_OFFSET_A 0
  1157. #define GEMM_DEFAULT_OFFSET_B 0
  1158. #define GEMM_DEFAULT_ALIGN (BLASLONG)0x03fffUL
  1159. #define SYMV_P 8
  1160. #define SWITCH_RATIO 4
  1161. #ifdef ARCH_X86
  1162. #define SGEMM_DEFAULT_UNROLL_M 4
  1163. #define DGEMM_DEFAULT_UNROLL_M 2
  1164. #define QGEMM_DEFAULT_UNROLL_M 2
  1165. #define CGEMM_DEFAULT_UNROLL_M 2
  1166. #define ZGEMM_DEFAULT_UNROLL_M 1
  1167. #define XGEMM_DEFAULT_UNROLL_M 1
  1168. #define SGEMM_DEFAULT_UNROLL_N 4
  1169. #define DGEMM_DEFAULT_UNROLL_N 4
  1170. #define QGEMM_DEFAULT_UNROLL_N 2
  1171. #define CGEMM_DEFAULT_UNROLL_N 2
  1172. #define ZGEMM_DEFAULT_UNROLL_N 2
  1173. #define XGEMM_DEFAULT_UNROLL_N 1
  1174. #else
  1175. #define SGEMM_DEFAULT_UNROLL_M 16
  1176. #define DGEMM_DEFAULT_UNROLL_M 8
  1177. #define QGEMM_DEFAULT_UNROLL_M 2
  1178. #define CGEMM_DEFAULT_UNROLL_M 8
  1179. #define ZGEMM_DEFAULT_UNROLL_M 1
  1180. #define XGEMM_DEFAULT_UNROLL_M 1
  1181. #define SGEMM_DEFAULT_UNROLL_N 4
  1182. #define DGEMM_DEFAULT_UNROLL_N 4
  1183. #define QGEMM_DEFAULT_UNROLL_N 2
  1184. #define CGEMM_DEFAULT_UNROLL_N 2
  1185. #define ZGEMM_DEFAULT_UNROLL_N 4
  1186. #define XGEMM_DEFAULT_UNROLL_N 1
  1187. #endif
  1188. #define SGEMM_DEFAULT_P 768
  1189. #define SGEMM_DEFAULT_R sgemm_r
  1190. /*#define SGEMM_DEFAULT_R 1024*/
  1191. #define DGEMM_DEFAULT_P 512
  1192. #define DGEMM_DEFAULT_R dgemm_r
  1193. /*#define DGEMM_DEFAULT_R 1024*/
  1194. #define QGEMM_DEFAULT_P 504
  1195. #define QGEMM_DEFAULT_R qgemm_r
  1196. #define CGEMM_DEFAULT_P 768
  1197. #define CGEMM_DEFAULT_R cgemm_r
  1198. /*#define CGEMM_DEFAULT_R 1024*/
  1199. #define ZGEMM_DEFAULT_P 512
  1200. #define ZGEMM_DEFAULT_R zgemm_r
  1201. /*#define ZGEMM_DEFAULT_R 1024*/
  1202. #define XGEMM_DEFAULT_P 252
  1203. #define XGEMM_DEFAULT_R xgemm_r
  1204. #define SGEMM_DEFAULT_Q 384
  1205. #define DGEMM_DEFAULT_Q 256
  1206. #define QGEMM_DEFAULT_Q 128
  1207. #define CGEMM_DEFAULT_Q 512
  1208. #define ZGEMM_DEFAULT_Q 192
  1209. #define XGEMM_DEFAULT_Q 128
  1210. #define CGEMM3M_DEFAULT_UNROLL_N 8
  1211. #define CGEMM3M_DEFAULT_UNROLL_M 4
  1212. #define ZGEMM3M_DEFAULT_UNROLL_N 8
  1213. #define ZGEMM3M_DEFAULT_UNROLL_M 2
  1214. #define CGEMM3M_DEFAULT_P 448
  1215. #define ZGEMM3M_DEFAULT_P 224
  1216. #define XGEMM3M_DEFAULT_P 112
  1217. #define CGEMM3M_DEFAULT_Q 224
  1218. #define ZGEMM3M_DEFAULT_Q 224
  1219. #define XGEMM3M_DEFAULT_Q 224
  1220. #define CGEMM3M_DEFAULT_R 12288
  1221. #define ZGEMM3M_DEFAULT_R 12288
  1222. #define XGEMM3M_DEFAULT_R 12288
  1223. #define GETRF_FACTOR 0.72
  1224. #endif
  1225. #ifdef HASWELL
  1226. #define SNUMOPT 16
  1227. #define DNUMOPT 8
  1228. #define GEMM_DEFAULT_OFFSET_A 0
  1229. #define GEMM_DEFAULT_OFFSET_B 0
  1230. #define GEMM_DEFAULT_ALIGN (BLASLONG)0x03fffUL
  1231. #define SYMV_P 8
  1232. #if defined(XDOUBLE) || defined(DOUBLE)
  1233. #define SWITCH_RATIO 4
  1234. #define GEMM_PREFERED_SIZE 4
  1235. #else
  1236. #define SWITCH_RATIO 8
  1237. #define GEMM_PREFERED_SIZE 8
  1238. #endif
  1239. #ifdef ARCH_X86
  1240. #define SGEMM_DEFAULT_UNROLL_M 4
  1241. #define DGEMM_DEFAULT_UNROLL_M 2
  1242. #define QGEMM_DEFAULT_UNROLL_M 2
  1243. #define CGEMM_DEFAULT_UNROLL_M 2
  1244. #define ZGEMM_DEFAULT_UNROLL_M 1
  1245. #define XGEMM_DEFAULT_UNROLL_M 1
  1246. #define SGEMM_DEFAULT_UNROLL_N 4
  1247. #define DGEMM_DEFAULT_UNROLL_N 4
  1248. #define QGEMM_DEFAULT_UNROLL_N 2
  1249. #define CGEMM_DEFAULT_UNROLL_N 2
  1250. #define ZGEMM_DEFAULT_UNROLL_N 2
  1251. #define XGEMM_DEFAULT_UNROLL_N 1
  1252. #else
  1253. #define SGEMM_DEFAULT_UNROLL_M 8
  1254. #define DGEMM_DEFAULT_UNROLL_M 4
  1255. #define QGEMM_DEFAULT_UNROLL_M 2
  1256. #define CGEMM_DEFAULT_UNROLL_M 8
  1257. #define ZGEMM_DEFAULT_UNROLL_M 4
  1258. #define XGEMM_DEFAULT_UNROLL_M 1
  1259. #define SGEMM_DEFAULT_UNROLL_N 4
  1260. #define DGEMM_DEFAULT_UNROLL_N 8
  1261. #define QGEMM_DEFAULT_UNROLL_N 2
  1262. #define CGEMM_DEFAULT_UNROLL_N 2
  1263. #define ZGEMM_DEFAULT_UNROLL_N 2
  1264. #define XGEMM_DEFAULT_UNROLL_N 1
  1265. /*
  1266. #define SGEMM_DEFAULT_UNROLL_MN 32
  1267. #define DGEMM_DEFAULT_UNROLL_MN 32
  1268. */
  1269. #endif
  1270. #ifdef ARCH_X86
  1271. #define SGEMM_DEFAULT_P 512
  1272. #define SGEMM_DEFAULT_R sgemm_r
  1273. #define DGEMM_DEFAULT_P 512
  1274. #define DGEMM_DEFAULT_R dgemm_r
  1275. #define QGEMM_DEFAULT_P 504
  1276. #define QGEMM_DEFAULT_R qgemm_r
  1277. #define CGEMM_DEFAULT_P 128
  1278. #define CGEMM_DEFAULT_R 1024
  1279. #define ZGEMM_DEFAULT_P 512
  1280. #define ZGEMM_DEFAULT_R zgemm_r
  1281. #define XGEMM_DEFAULT_P 252
  1282. #define XGEMM_DEFAULT_R xgemm_r
  1283. #define SGEMM_DEFAULT_Q 256
  1284. #define DGEMM_DEFAULT_Q 256
  1285. #define QGEMM_DEFAULT_Q 128
  1286. #define CGEMM_DEFAULT_Q 256
  1287. #define ZGEMM_DEFAULT_Q 192
  1288. #define XGEMM_DEFAULT_Q 128
  1289. #else
  1290. #define SGEMM_DEFAULT_P 320
  1291. #define DGEMM_DEFAULT_P 512
  1292. #define CGEMM_DEFAULT_P 256
  1293. #define ZGEMM_DEFAULT_P 192
  1294. #ifdef WINDOWS_ABI
  1295. #define SGEMM_DEFAULT_Q 320
  1296. #define DGEMM_DEFAULT_Q 128
  1297. #else
  1298. #define SGEMM_DEFAULT_Q 320
  1299. #define DGEMM_DEFAULT_Q 256
  1300. #endif
  1301. #define CGEMM_DEFAULT_Q 256
  1302. #define ZGEMM_DEFAULT_Q 192
  1303. #define SGEMM_DEFAULT_R sgemm_r
  1304. #define DGEMM_DEFAULT_R 13824
  1305. #define CGEMM_DEFAULT_R cgemm_r
  1306. #define ZGEMM_DEFAULT_R zgemm_r
  1307. #define QGEMM_DEFAULT_Q 128
  1308. #define QGEMM_DEFAULT_P 504
  1309. #define QGEMM_DEFAULT_R qgemm_r
  1310. #define XGEMM_DEFAULT_P 252
  1311. #define XGEMM_DEFAULT_R xgemm_r
  1312. #define XGEMM_DEFAULT_Q 128
  1313. #define CGEMM3M_DEFAULT_UNROLL_N 4
  1314. #define CGEMM3M_DEFAULT_UNROLL_M 8
  1315. #define ZGEMM3M_DEFAULT_UNROLL_N 4
  1316. #define ZGEMM3M_DEFAULT_UNROLL_M 4
  1317. #define CGEMM3M_DEFAULT_P 320
  1318. #define ZGEMM3M_DEFAULT_P 256
  1319. #define XGEMM3M_DEFAULT_P 112
  1320. #define CGEMM3M_DEFAULT_Q 320
  1321. #define ZGEMM3M_DEFAULT_Q 256
  1322. #define XGEMM3M_DEFAULT_Q 224
  1323. #define CGEMM3M_DEFAULT_R 12288
  1324. #define ZGEMM3M_DEFAULT_R 12288
  1325. #define XGEMM3M_DEFAULT_R 12288
  1326. #endif
  1327. #endif
  1328. #ifdef SKYLAKEX
  1329. #define SNUMOPT 16
  1330. #define DNUMOPT 8
  1331. #define GEMM_DEFAULT_OFFSET_A 0
  1332. #define GEMM_DEFAULT_OFFSET_B 0
  1333. #define GEMM_DEFAULT_ALIGN (BLASLONG)0x03fffUL
  1334. #define SYMV_P 8
  1335. #if defined(XDOUBLE) || defined(DOUBLE)
  1336. #define SWITCH_RATIO 8
  1337. #define GEMM_PREFERED_SIZE 8
  1338. #else
  1339. #define SWITCH_RATIO 16
  1340. #define GEMM_PREFERED_SIZE 16
  1341. #endif
  1342. #define USE_SGEMM_KERNEL_DIRECT 1
  1343. #ifdef ARCH_X86
  1344. #define SGEMM_DEFAULT_UNROLL_M 4
  1345. #define DGEMM_DEFAULT_UNROLL_M 2
  1346. #define QGEMM_DEFAULT_UNROLL_M 2
  1347. #define CGEMM_DEFAULT_UNROLL_M 2
  1348. #define ZGEMM_DEFAULT_UNROLL_M 1
  1349. #define XGEMM_DEFAULT_UNROLL_M 1
  1350. #define SGEMM_DEFAULT_UNROLL_N 4
  1351. #define DGEMM_DEFAULT_UNROLL_N 4
  1352. #define QGEMM_DEFAULT_UNROLL_N 2
  1353. #define CGEMM_DEFAULT_UNROLL_N 2
  1354. #define ZGEMM_DEFAULT_UNROLL_N 2
  1355. #define XGEMM_DEFAULT_UNROLL_N 1
  1356. #else
  1357. #define SGEMM_DEFAULT_UNROLL_M 16
  1358. #define DGEMM_DEFAULT_UNROLL_M 16
  1359. #define QGEMM_DEFAULT_UNROLL_M 2
  1360. #define CGEMM_DEFAULT_UNROLL_M 8
  1361. #define ZGEMM_DEFAULT_UNROLL_M 4
  1362. #define XGEMM_DEFAULT_UNROLL_M 1
  1363. #define SGEMM_DEFAULT_UNROLL_N 4
  1364. #define DGEMM_DEFAULT_UNROLL_N 2
  1365. #define QGEMM_DEFAULT_UNROLL_N 2
  1366. #define CGEMM_DEFAULT_UNROLL_N 2
  1367. #define ZGEMM_DEFAULT_UNROLL_N 2
  1368. #define XGEMM_DEFAULT_UNROLL_N 1
  1369. #define SGEMM_DEFAULT_UNROLL_MN 32
  1370. #define DGEMM_DEFAULT_UNROLL_MN 32
  1371. #endif
  1372. #ifdef ARCH_X86
  1373. #define SGEMM_DEFAULT_P 512
  1374. #define SGEMM_DEFAULT_R sgemm_r
  1375. #define DGEMM_DEFAULT_P 512
  1376. #define DGEMM_DEFAULT_R dgemm_r
  1377. #define QGEMM_DEFAULT_P 504
  1378. #define QGEMM_DEFAULT_R qgemm_r
  1379. #define CGEMM_DEFAULT_P 128
  1380. #define CGEMM_DEFAULT_R 1024
  1381. #define ZGEMM_DEFAULT_P 512
  1382. #define ZGEMM_DEFAULT_R zgemm_r
  1383. #define XGEMM_DEFAULT_P 252
  1384. #define XGEMM_DEFAULT_R xgemm_r
  1385. #define SGEMM_DEFAULT_Q 256
  1386. #define DGEMM_DEFAULT_Q 256
  1387. #define QGEMM_DEFAULT_Q 128
  1388. #define CGEMM_DEFAULT_Q 256
  1389. #define ZGEMM_DEFAULT_Q 192
  1390. #define XGEMM_DEFAULT_Q 128
  1391. #else
  1392. #define SGEMM_DEFAULT_P 448
  1393. #define DGEMM_DEFAULT_P 192
  1394. #define CGEMM_DEFAULT_P 384
  1395. #define ZGEMM_DEFAULT_P 256
  1396. #define SGEMM_DEFAULT_Q 448
  1397. #define DGEMM_DEFAULT_Q 384
  1398. #define CGEMM_DEFAULT_Q 192
  1399. #define ZGEMM_DEFAULT_Q 128
  1400. #define SGEMM_DEFAULT_R sgemm_r
  1401. #define DGEMM_DEFAULT_R 8640
  1402. #define CGEMM_DEFAULT_R cgemm_r
  1403. #define ZGEMM_DEFAULT_R zgemm_r
  1404. #define QGEMM_DEFAULT_Q 128
  1405. #define QGEMM_DEFAULT_P 504
  1406. #define QGEMM_DEFAULT_R qgemm_r
  1407. #define XGEMM_DEFAULT_P 252
  1408. #define XGEMM_DEFAULT_R xgemm_r
  1409. #define XGEMM_DEFAULT_Q 128
  1410. #define CGEMM3M_DEFAULT_UNROLL_N 4
  1411. #define CGEMM3M_DEFAULT_UNROLL_M 8
  1412. #define ZGEMM3M_DEFAULT_UNROLL_N 4
  1413. #define ZGEMM3M_DEFAULT_UNROLL_M 4
  1414. #define CGEMM3M_DEFAULT_P 320
  1415. #define ZGEMM3M_DEFAULT_P 256
  1416. #define XGEMM3M_DEFAULT_P 112
  1417. #define CGEMM3M_DEFAULT_Q 320
  1418. #define ZGEMM3M_DEFAULT_Q 256
  1419. #define XGEMM3M_DEFAULT_Q 224
  1420. #define CGEMM3M_DEFAULT_R 12288
  1421. #define ZGEMM3M_DEFAULT_R 12288
  1422. #define XGEMM3M_DEFAULT_R 12288
  1423. #endif
  1424. #endif
  1425. #ifdef SAPPHIRERAPIDS
  1426. #define SNUMOPT 16
  1427. #define DNUMOPT 8
  1428. #define GEMM_DEFAULT_OFFSET_A 0
  1429. #define GEMM_DEFAULT_OFFSET_B 0
  1430. #define GEMM_DEFAULT_ALIGN 0x03fffUL
  1431. #define SYMV_P 8
  1432. #if defined(XDOUBLE) || defined(DOUBLE)
  1433. #define SWITCH_RATIO 8
  1434. #define GEMM_PREFERED_SIZE 8
  1435. #else
  1436. #define SWITCH_RATIO 16
  1437. #define GEMM_PREFERED_SIZE 16
  1438. #endif
  1439. #define USE_SGEMM_KERNEL_DIRECT 1
  1440. #undef SBGEMM_DEFAULT_UNROLL_N
  1441. #undef SBGEMM_DEFAULT_UNROLL_M
  1442. #undef SBGEMM_DEFAULT_P
  1443. #undef SBGEMM_DEFAULT_R
  1444. #undef SBGEMM_DEFAULT_Q
  1445. // FIXME: actually UNROLL_M = UNROLL_N = 16
  1446. // If M and N is equal, OpenBLAS will reuse OCOPY as ICOPY.
  1447. // But for AMX, they are not the same, set UNROLL_M = 32 to workaround
  1448. #define SBGEMM_DEFAULT_UNROLL_N 16
  1449. #define SBGEMM_DEFAULT_UNROLL_M 32
  1450. #define SBGEMM_DEFAULT_P 256
  1451. #define SBGEMM_DEFAULT_Q 1024
  1452. #define SBGEMM_DEFAULT_R sbgemm_r
  1453. #ifdef ARCH_X86
  1454. #define SGEMM_DEFAULT_UNROLL_M 4
  1455. #define DGEMM_DEFAULT_UNROLL_M 2
  1456. #define QGEMM_DEFAULT_UNROLL_M 2
  1457. #define CGEMM_DEFAULT_UNROLL_M 2
  1458. #define ZGEMM_DEFAULT_UNROLL_M 1
  1459. #define XGEMM_DEFAULT_UNROLL_M 1
  1460. #define SGEMM_DEFAULT_UNROLL_N 4
  1461. #define DGEMM_DEFAULT_UNROLL_N 4
  1462. #define QGEMM_DEFAULT_UNROLL_N 2
  1463. #define CGEMM_DEFAULT_UNROLL_N 2
  1464. #define ZGEMM_DEFAULT_UNROLL_N 2
  1465. #define XGEMM_DEFAULT_UNROLL_N 1
  1466. #else
  1467. #define SGEMM_DEFAULT_UNROLL_M 16
  1468. #define DGEMM_DEFAULT_UNROLL_M 16
  1469. #define QGEMM_DEFAULT_UNROLL_M 2
  1470. #define CGEMM_DEFAULT_UNROLL_M 8
  1471. #define ZGEMM_DEFAULT_UNROLL_M 4
  1472. #define XGEMM_DEFAULT_UNROLL_M 1
  1473. #define SGEMM_DEFAULT_UNROLL_N 4
  1474. #define DGEMM_DEFAULT_UNROLL_N 2
  1475. #define QGEMM_DEFAULT_UNROLL_N 2
  1476. #define CGEMM_DEFAULT_UNROLL_N 2
  1477. #define ZGEMM_DEFAULT_UNROLL_N 2
  1478. #define XGEMM_DEFAULT_UNROLL_N 1
  1479. #define SGEMM_DEFAULT_UNROLL_MN 32
  1480. #define DGEMM_DEFAULT_UNROLL_MN 32
  1481. #endif
  1482. #ifdef ARCH_X86
  1483. #define SGEMM_DEFAULT_P 512
  1484. #define SGEMM_DEFAULT_R sgemm_r
  1485. #define DGEMM_DEFAULT_P 512
  1486. #define DGEMM_DEFAULT_R dgemm_r
  1487. #define QGEMM_DEFAULT_P 504
  1488. #define QGEMM_DEFAULT_R qgemm_r
  1489. #define CGEMM_DEFAULT_P 128
  1490. #define CGEMM_DEFAULT_R 1024
  1491. #define ZGEMM_DEFAULT_P 512
  1492. #define ZGEMM_DEFAULT_R zgemm_r
  1493. #define XGEMM_DEFAULT_P 252
  1494. #define XGEMM_DEFAULT_R xgemm_r
  1495. #define SGEMM_DEFAULT_Q 256
  1496. #define DGEMM_DEFAULT_Q 256
  1497. #define QGEMM_DEFAULT_Q 128
  1498. #define CGEMM_DEFAULT_Q 256
  1499. #define ZGEMM_DEFAULT_Q 192
  1500. #define XGEMM_DEFAULT_Q 128
  1501. #else
  1502. #define SGEMM_DEFAULT_P 640
  1503. #define DGEMM_DEFAULT_P 192
  1504. #define CGEMM_DEFAULT_P 384
  1505. #define ZGEMM_DEFAULT_P 256
  1506. #define SGEMM_DEFAULT_Q 320
  1507. #define DGEMM_DEFAULT_Q 384
  1508. #define CGEMM_DEFAULT_Q 192
  1509. #define ZGEMM_DEFAULT_Q 128
  1510. #define SGEMM_DEFAULT_R sgemm_r
  1511. #define DGEMM_DEFAULT_R 8640
  1512. #define CGEMM_DEFAULT_R cgemm_r
  1513. #define ZGEMM_DEFAULT_R zgemm_r
  1514. #define QGEMM_DEFAULT_Q 128
  1515. #define QGEMM_DEFAULT_P 504
  1516. #define QGEMM_DEFAULT_R qgemm_r
  1517. #define XGEMM_DEFAULT_P 252
  1518. #define XGEMM_DEFAULT_R xgemm_r
  1519. #define XGEMM_DEFAULT_Q 128
  1520. #define CGEMM3M_DEFAULT_UNROLL_N 4
  1521. #define CGEMM3M_DEFAULT_UNROLL_M 8
  1522. #define ZGEMM3M_DEFAULT_UNROLL_N 4
  1523. #define ZGEMM3M_DEFAULT_UNROLL_M 4
  1524. #define CGEMM3M_DEFAULT_P 320
  1525. #define ZGEMM3M_DEFAULT_P 256
  1526. #define XGEMM3M_DEFAULT_P 112
  1527. #define CGEMM3M_DEFAULT_Q 320
  1528. #define ZGEMM3M_DEFAULT_Q 256
  1529. #define XGEMM3M_DEFAULT_Q 224
  1530. #define CGEMM3M_DEFAULT_R 12288
  1531. #define ZGEMM3M_DEFAULT_R 12288
  1532. #define XGEMM3M_DEFAULT_R 12288
  1533. #endif
  1534. #endif
  1535. #ifdef COOPERLAKE
  1536. #define SNUMOPT 16
  1537. #define DNUMOPT 8
  1538. #define GEMM_DEFAULT_OFFSET_A 0
  1539. #define GEMM_DEFAULT_OFFSET_B 0
  1540. #define GEMM_DEFAULT_ALIGN 0x03fffUL
  1541. #define SYMV_P 8
  1542. #if defined(XDOUBLE) || defined(DOUBLE)
  1543. #define SWITCH_RATIO 8
  1544. #define GEMM_PREFERED_SIZE 8
  1545. #else
  1546. #define SWITCH_RATIO 16
  1547. #define GEMM_PREFERED_SIZE 16
  1548. #endif
  1549. #define USE_SGEMM_KERNEL_DIRECT 1
  1550. #undef SBGEMM_DEFAULT_UNROLL_N
  1551. #undef SBGEMM_DEFAULT_UNROLL_M
  1552. #undef SBGEMM_DEFAULT_P
  1553. #undef SBGEMM_DEFAULT_R
  1554. #undef SBGEMM_DEFAULT_Q
  1555. #define SBGEMM_DEFAULT_UNROLL_N 4
  1556. #define SBGEMM_DEFAULT_UNROLL_M 16
  1557. #define SBGEMM_DEFAULT_P 384
  1558. #define SBGEMM_DEFAULT_Q 768
  1559. #define SBGEMM_DEFAULT_R sbgemm_r
  1560. #ifdef ARCH_X86
  1561. #define SGEMM_DEFAULT_UNROLL_M 4
  1562. #define DGEMM_DEFAULT_UNROLL_M 2
  1563. #define QGEMM_DEFAULT_UNROLL_M 2
  1564. #define CGEMM_DEFAULT_UNROLL_M 2
  1565. #define ZGEMM_DEFAULT_UNROLL_M 1
  1566. #define XGEMM_DEFAULT_UNROLL_M 1
  1567. #define SGEMM_DEFAULT_UNROLL_N 4
  1568. #define DGEMM_DEFAULT_UNROLL_N 4
  1569. #define QGEMM_DEFAULT_UNROLL_N 2
  1570. #define CGEMM_DEFAULT_UNROLL_N 2
  1571. #define ZGEMM_DEFAULT_UNROLL_N 2
  1572. #define XGEMM_DEFAULT_UNROLL_N 1
  1573. #else
  1574. #define SGEMM_DEFAULT_UNROLL_M 16
  1575. #define DGEMM_DEFAULT_UNROLL_M 16
  1576. #define QGEMM_DEFAULT_UNROLL_M 2
  1577. #define CGEMM_DEFAULT_UNROLL_M 8
  1578. #define ZGEMM_DEFAULT_UNROLL_M 4
  1579. #define XGEMM_DEFAULT_UNROLL_M 1
  1580. #define SGEMM_DEFAULT_UNROLL_N 4
  1581. #define DGEMM_DEFAULT_UNROLL_N 2
  1582. #define QGEMM_DEFAULT_UNROLL_N 2
  1583. #define CGEMM_DEFAULT_UNROLL_N 2
  1584. #define ZGEMM_DEFAULT_UNROLL_N 2
  1585. #define XGEMM_DEFAULT_UNROLL_N 1
  1586. #define SGEMM_DEFAULT_UNROLL_MN 32
  1587. #define DGEMM_DEFAULT_UNROLL_MN 32
  1588. #endif
  1589. #ifdef ARCH_X86
  1590. #define SGEMM_DEFAULT_P 512
  1591. #define SGEMM_DEFAULT_R sgemm_r
  1592. #define DGEMM_DEFAULT_P 512
  1593. #define DGEMM_DEFAULT_R dgemm_r
  1594. #define QGEMM_DEFAULT_P 504
  1595. #define QGEMM_DEFAULT_R qgemm_r
  1596. #define CGEMM_DEFAULT_P 128
  1597. #define CGEMM_DEFAULT_R 1024
  1598. #define ZGEMM_DEFAULT_P 512
  1599. #define ZGEMM_DEFAULT_R zgemm_r
  1600. #define XGEMM_DEFAULT_P 252
  1601. #define XGEMM_DEFAULT_R xgemm_r
  1602. #define SGEMM_DEFAULT_Q 256
  1603. #define DGEMM_DEFAULT_Q 256
  1604. #define QGEMM_DEFAULT_Q 128
  1605. #define CGEMM_DEFAULT_Q 256
  1606. #define ZGEMM_DEFAULT_Q 192
  1607. #define XGEMM_DEFAULT_Q 128
  1608. #else
  1609. #define SGEMM_DEFAULT_P 640
  1610. #define DGEMM_DEFAULT_P 192
  1611. #define CGEMM_DEFAULT_P 384
  1612. #define ZGEMM_DEFAULT_P 256
  1613. #define SGEMM_DEFAULT_Q 320
  1614. #define DGEMM_DEFAULT_Q 384
  1615. #define CGEMM_DEFAULT_Q 192
  1616. #define ZGEMM_DEFAULT_Q 128
  1617. #define SGEMM_DEFAULT_R sgemm_r
  1618. #define DGEMM_DEFAULT_R 8640
  1619. #define CGEMM_DEFAULT_R cgemm_r
  1620. #define ZGEMM_DEFAULT_R zgemm_r
  1621. #define QGEMM_DEFAULT_Q 128
  1622. #define QGEMM_DEFAULT_P 504
  1623. #define QGEMM_DEFAULT_R qgemm_r
  1624. #define XGEMM_DEFAULT_P 252
  1625. #define XGEMM_DEFAULT_R xgemm_r
  1626. #define XGEMM_DEFAULT_Q 128
  1627. #define CGEMM3M_DEFAULT_UNROLL_N 4
  1628. #define CGEMM3M_DEFAULT_UNROLL_M 8
  1629. #define ZGEMM3M_DEFAULT_UNROLL_N 4
  1630. #define ZGEMM3M_DEFAULT_UNROLL_M 4
  1631. #define CGEMM3M_DEFAULT_P 320
  1632. #define ZGEMM3M_DEFAULT_P 256
  1633. #define XGEMM3M_DEFAULT_P 112
  1634. #define CGEMM3M_DEFAULT_Q 320
  1635. #define ZGEMM3M_DEFAULT_Q 256
  1636. #define XGEMM3M_DEFAULT_Q 224
  1637. #define CGEMM3M_DEFAULT_R 12288
  1638. #define ZGEMM3M_DEFAULT_R 12288
  1639. #define XGEMM3M_DEFAULT_R 12288
  1640. #endif
  1641. #endif
  1642. #ifdef ATOM
  1643. #define SNUMOPT 2
  1644. #define DNUMOPT 1
  1645. #define GEMM_DEFAULT_OFFSET_A 64
  1646. #define GEMM_DEFAULT_OFFSET_B 0
  1647. #define GEMM_DEFAULT_ALIGN (BLASLONG)0x0ffffUL
  1648. #define SYMV_P 8
  1649. #ifdef ARCH_X86
  1650. #define SGEMM_DEFAULT_UNROLL_M 4
  1651. #define DGEMM_DEFAULT_UNROLL_M 2
  1652. #define QGEMM_DEFAULT_UNROLL_M 2
  1653. #define CGEMM_DEFAULT_UNROLL_M 2
  1654. #define ZGEMM_DEFAULT_UNROLL_M 1
  1655. #define XGEMM_DEFAULT_UNROLL_M 1
  1656. #else
  1657. #define SGEMM_DEFAULT_UNROLL_M 8
  1658. #define DGEMM_DEFAULT_UNROLL_M 4
  1659. #define QGEMM_DEFAULT_UNROLL_M 2
  1660. #define CGEMM_DEFAULT_UNROLL_M 4
  1661. #define ZGEMM_DEFAULT_UNROLL_M 2
  1662. #define XGEMM_DEFAULT_UNROLL_M 1
  1663. #endif
  1664. #define SGEMM_DEFAULT_UNROLL_N 4
  1665. #define DGEMM_DEFAULT_UNROLL_N 2
  1666. #define QGEMM_DEFAULT_UNROLL_N 2
  1667. #define CGEMM_DEFAULT_UNROLL_N 2
  1668. #define ZGEMM_DEFAULT_UNROLL_N 1
  1669. #define XGEMM_DEFAULT_UNROLL_N 1
  1670. #define SGEMM_DEFAULT_P sgemm_p
  1671. #define SGEMM_DEFAULT_R sgemm_r
  1672. #define DGEMM_DEFAULT_P dgemm_p
  1673. #define DGEMM_DEFAULT_R dgemm_r
  1674. #define QGEMM_DEFAULT_P qgemm_p
  1675. #define QGEMM_DEFAULT_R qgemm_r
  1676. #define CGEMM_DEFAULT_P cgemm_p
  1677. #define CGEMM_DEFAULT_R cgemm_r
  1678. #define ZGEMM_DEFAULT_P zgemm_p
  1679. #define ZGEMM_DEFAULT_R zgemm_r
  1680. #define XGEMM_DEFAULT_P xgemm_p
  1681. #define XGEMM_DEFAULT_R xgemm_r
  1682. #define SGEMM_DEFAULT_Q 256
  1683. #define DGEMM_DEFAULT_Q 256
  1684. #define QGEMM_DEFAULT_Q 256
  1685. #define CGEMM_DEFAULT_Q 256
  1686. #define ZGEMM_DEFAULT_Q 256
  1687. #define XGEMM_DEFAULT_Q 256
  1688. #endif
  1689. #ifdef ITANIUM2
  1690. #define SNUMOPT 4
  1691. #define DNUMOPT 4
  1692. #define GEMM_DEFAULT_OFFSET_A 0
  1693. #define GEMM_DEFAULT_OFFSET_B 128
  1694. #define GEMM_DEFAULT_ALIGN (BLASLONG)0x03fffUL
  1695. #define SGEMM_DEFAULT_UNROLL_M 8
  1696. #define SGEMM_DEFAULT_UNROLL_N 8
  1697. #define DGEMM_DEFAULT_UNROLL_M 8
  1698. #define DGEMM_DEFAULT_UNROLL_N 8
  1699. #define QGEMM_DEFAULT_UNROLL_M 8
  1700. #define QGEMM_DEFAULT_UNROLL_N 8
  1701. #define CGEMM_DEFAULT_UNROLL_M 4
  1702. #define CGEMM_DEFAULT_UNROLL_N 4
  1703. #define ZGEMM_DEFAULT_UNROLL_M 4
  1704. #define ZGEMM_DEFAULT_UNROLL_N 4
  1705. #define XGEMM_DEFAULT_UNROLL_M 4
  1706. #define XGEMM_DEFAULT_UNROLL_N 4
  1707. #define SGEMM_DEFAULT_P sgemm_p
  1708. #define DGEMM_DEFAULT_P dgemm_p
  1709. #define QGEMM_DEFAULT_P qgemm_p
  1710. #define CGEMM_DEFAULT_P cgemm_p
  1711. #define ZGEMM_DEFAULT_P zgemm_p
  1712. #define XGEMM_DEFAULT_P xgemm_p
  1713. #define SGEMM_DEFAULT_Q 1024
  1714. #define DGEMM_DEFAULT_Q 1024
  1715. #define QGEMM_DEFAULT_Q 1024
  1716. #define CGEMM_DEFAULT_Q 1024
  1717. #define ZGEMM_DEFAULT_Q 1024
  1718. #define XGEMM_DEFAULT_Q 1024
  1719. #define SGEMM_DEFAULT_R sgemm_r
  1720. #define DGEMM_DEFAULT_R dgemm_r
  1721. #define QGEMM_DEFAULT_R qgemm_r
  1722. #define CGEMM_DEFAULT_R cgemm_r
  1723. #define ZGEMM_DEFAULT_R zgemm_r
  1724. #define XGEMM_DEFAULT_R xgemm_r
  1725. #define SYMV_P 16
  1726. #define GETRF_FACTOR 0.65
  1727. #endif
  1728. #if defined(EV4) || defined(EV5) || defined(EV6)
  1729. #ifdef EV4
  1730. #define SNUMOPT 1
  1731. #define DNUMOPT 1
  1732. #else
  1733. #define SNUMOPT 2
  1734. #define DNUMOPT 2
  1735. #endif
  1736. #define GEMM_DEFAULT_OFFSET_A 512
  1737. #define GEMM_DEFAULT_OFFSET_B 512
  1738. #define GEMM_DEFAULT_ALIGN (BLASLONG)0x0ffffUL
  1739. #define SGEMM_DEFAULT_UNROLL_M 4
  1740. #define SGEMM_DEFAULT_UNROLL_N 4
  1741. #define DGEMM_DEFAULT_UNROLL_M 4
  1742. #define DGEMM_DEFAULT_UNROLL_N 4
  1743. #define CGEMM_DEFAULT_UNROLL_M 2
  1744. #define CGEMM_DEFAULT_UNROLL_N 2
  1745. #define ZGEMM_DEFAULT_UNROLL_M 2
  1746. #define ZGEMM_DEFAULT_UNROLL_N 2
  1747. #define SYMV_P 8
  1748. #ifdef EV4
  1749. #define SGEMM_DEFAULT_P 32
  1750. #define SGEMM_DEFAULT_Q 112
  1751. #define SGEMM_DEFAULT_R 256
  1752. #define DGEMM_DEFAULT_P 32
  1753. #define DGEMM_DEFAULT_Q 56
  1754. #define DGEMM_DEFAULT_R 256
  1755. #define CGEMM_DEFAULT_P 32
  1756. #define CGEMM_DEFAULT_Q 64
  1757. #define CGEMM_DEFAULT_R 240
  1758. #define ZGEMM_DEFAULT_P 32
  1759. #define ZGEMM_DEFAULT_Q 32
  1760. #define ZGEMM_DEFAULT_R 240
  1761. #endif
  1762. #ifdef EV5
  1763. #define SGEMM_DEFAULT_P 64
  1764. #define SGEMM_DEFAULT_Q 256
  1765. #define DGEMM_DEFAULT_P 64
  1766. #define DGEMM_DEFAULT_Q 128
  1767. #define CGEMM_DEFAULT_P 64
  1768. #define CGEMM_DEFAULT_Q 128
  1769. #define ZGEMM_DEFAULT_P 64
  1770. #define ZGEMM_DEFAULT_Q 64
  1771. #endif
  1772. #ifdef EV6
  1773. #define SGEMM_DEFAULT_P 256
  1774. #define SGEMM_DEFAULT_Q 512
  1775. #define DGEMM_DEFAULT_P 256
  1776. #define DGEMM_DEFAULT_Q 256
  1777. #define CGEMM_DEFAULT_P 256
  1778. #define CGEMM_DEFAULT_Q 256
  1779. #define ZGEMM_DEFAULT_P 128
  1780. #define ZGEMM_DEFAULT_Q 256
  1781. #endif
  1782. #endif
  1783. #ifdef CELL
  1784. #define SNUMOPT 2
  1785. #define DNUMOPT 2
  1786. #define GEMM_DEFAULT_OFFSET_A 0
  1787. #define GEMM_DEFAULT_OFFSET_B 8192
  1788. #define GEMM_DEFAULT_ALIGN (BLASLONG)0x0ffffUL
  1789. #define SGEMM_DEFAULT_UNROLL_M 16
  1790. #define SGEMM_DEFAULT_UNROLL_N 4
  1791. #define DGEMM_DEFAULT_UNROLL_M 4
  1792. #define DGEMM_DEFAULT_UNROLL_N 4
  1793. #define CGEMM_DEFAULT_UNROLL_M 8
  1794. #define CGEMM_DEFAULT_UNROLL_N 2
  1795. #define ZGEMM_DEFAULT_UNROLL_M 2
  1796. #define ZGEMM_DEFAULT_UNROLL_N 2
  1797. #define SGEMM_DEFAULT_P 128
  1798. #define DGEMM_DEFAULT_P 128
  1799. #define CGEMM_DEFAULT_P 128
  1800. #define ZGEMM_DEFAULT_P 128
  1801. #define SGEMM_DEFAULT_Q 512
  1802. #define DGEMM_DEFAULT_Q 256
  1803. #define CGEMM_DEFAULT_Q 256
  1804. #define ZGEMM_DEFAULT_Q 128
  1805. #define SYMV_P 4
  1806. #endif
  1807. #ifdef PPCG4
  1808. #define GEMM_DEFAULT_OFFSET_A 0
  1809. #define GEMM_DEFAULT_OFFSET_B 1024
  1810. #define GEMM_DEFAULT_ALIGN 0x0ffffUL
  1811. #define SGEMM_DEFAULT_UNROLL_M 4
  1812. #define SGEMM_DEFAULT_UNROLL_N 4
  1813. #define DGEMM_DEFAULT_UNROLL_M 4
  1814. #define DGEMM_DEFAULT_UNROLL_N 4
  1815. #define CGEMM_DEFAULT_UNROLL_M 2
  1816. #define CGEMM_DEFAULT_UNROLL_N 2
  1817. #define ZGEMM_DEFAULT_UNROLL_M 2
  1818. #define ZGEMM_DEFAULT_UNROLL_N 2
  1819. #define SGEMM_DEFAULT_P 256
  1820. #define DGEMM_DEFAULT_P 128
  1821. #define CGEMM_DEFAULT_P 128
  1822. #define ZGEMM_DEFAULT_P 64
  1823. #define SGEMM_DEFAULT_Q 256
  1824. #define DGEMM_DEFAULT_Q 256
  1825. #define CGEMM_DEFAULT_Q 256
  1826. #define ZGEMM_DEFAULT_Q 256
  1827. #define SYMV_P 4
  1828. #endif
  1829. #ifdef PPC970
  1830. #define SNUMOPT 4
  1831. #define DNUMOPT 4
  1832. #define GEMM_DEFAULT_OFFSET_A 2688
  1833. #define GEMM_DEFAULT_OFFSET_B 3072
  1834. #define GEMM_DEFAULT_ALIGN 0x03fffUL
  1835. #if defined(__BYTE_ORDER__)&&(__BYTE_ORDER__ == __ORDER_BIG_ENDIAN__)
  1836. #define SGEMM_DEFAULT_UNROLL_M 4
  1837. #else
  1838. #define SGEMM_DEFAULT_UNROLL_M 16
  1839. #endif
  1840. #define SGEMM_DEFAULT_UNROLL_N 4
  1841. #define DGEMM_DEFAULT_UNROLL_M 4
  1842. #define DGEMM_DEFAULT_UNROLL_N 4
  1843. #if defined(__BYTE_ORDER__)&&(__BYTE_ORDER__ == __ORDER_BIG_ENDIAN__)
  1844. #define CGEMM_DEFAULT_UNROLL_M 2
  1845. #else
  1846. #define CGEMM_DEFAULT_UNROLL_M 8
  1847. #endif
  1848. #define CGEMM_DEFAULT_UNROLL_N 2
  1849. #define ZGEMM_DEFAULT_UNROLL_M 2
  1850. #define ZGEMM_DEFAULT_UNROLL_N 2
  1851. #if defined(OS_LINUX) || defined(OS_DARWIN) || defined(OS_FREEBSD)
  1852. #if L2_SIZE == 1024976
  1853. #define SGEMM_DEFAULT_P 320
  1854. #define DGEMM_DEFAULT_P 256
  1855. #define CGEMM_DEFAULT_P 256
  1856. #define ZGEMM_DEFAULT_P 256
  1857. #else
  1858. #define SGEMM_DEFAULT_P 176
  1859. #define DGEMM_DEFAULT_P 176
  1860. #define CGEMM_DEFAULT_P 176
  1861. #define ZGEMM_DEFAULT_P 176
  1862. #endif
  1863. #endif
  1864. #define SGEMM_DEFAULT_Q 512
  1865. #define DGEMM_DEFAULT_Q 256
  1866. #define CGEMM_DEFAULT_Q 256
  1867. #define ZGEMM_DEFAULT_Q 128
  1868. #define SYMV_P 4
  1869. #endif
  1870. #ifdef PPC440
  1871. #define SNUMOPT 2
  1872. #define DNUMOPT 2
  1873. #define GEMM_DEFAULT_OFFSET_A (32 * 0)
  1874. #define GEMM_DEFAULT_OFFSET_B (32 * 0)
  1875. #define GEMM_DEFAULT_ALIGN 0x0ffffUL
  1876. #define SGEMM_DEFAULT_UNROLL_M 4
  1877. #define SGEMM_DEFAULT_UNROLL_N 4
  1878. #define DGEMM_DEFAULT_UNROLL_M 4
  1879. #define DGEMM_DEFAULT_UNROLL_N 4
  1880. #define CGEMM_DEFAULT_UNROLL_M 2
  1881. #define CGEMM_DEFAULT_UNROLL_N 2
  1882. #define ZGEMM_DEFAULT_UNROLL_M 2
  1883. #define ZGEMM_DEFAULT_UNROLL_N 2
  1884. #define SGEMM_DEFAULT_P 512
  1885. #define DGEMM_DEFAULT_P 512
  1886. #define CGEMM_DEFAULT_P 512
  1887. #define ZGEMM_DEFAULT_P 512
  1888. #define SGEMM_DEFAULT_Q 1024
  1889. #define DGEMM_DEFAULT_Q 512
  1890. #define CGEMM_DEFAULT_Q 512
  1891. #define ZGEMM_DEFAULT_Q 256
  1892. #define SGEMM_DEFAULT_R SGEMM_DEFAULT_P
  1893. #define DGEMM_DEFAULT_R DGEMM_DEFAULT_P
  1894. #define CGEMM_DEFAULT_R CGEMM_DEFAULT_P
  1895. #define ZGEMM_DEFAULT_R ZGEMM_DEFAULT_P
  1896. #define SYMV_P 4
  1897. #endif
  1898. #ifdef PPC440FP2
  1899. #define SNUMOPT 4
  1900. #define DNUMOPT 4
  1901. #define GEMM_DEFAULT_OFFSET_A (32 * 0)
  1902. #define GEMM_DEFAULT_OFFSET_B (32 * 0)
  1903. #define GEMM_DEFAULT_ALIGN 0x0ffffUL
  1904. #define SGEMM_DEFAULT_UNROLL_M 8
  1905. #define SGEMM_DEFAULT_UNROLL_N 4
  1906. #define DGEMM_DEFAULT_UNROLL_M 8
  1907. #define DGEMM_DEFAULT_UNROLL_N 4
  1908. #define CGEMM_DEFAULT_UNROLL_M 4
  1909. #define CGEMM_DEFAULT_UNROLL_N 2
  1910. #define ZGEMM_DEFAULT_UNROLL_M 4
  1911. #define ZGEMM_DEFAULT_UNROLL_N 2
  1912. #define SGEMM_DEFAULT_P 128
  1913. #define DGEMM_DEFAULT_P 128
  1914. #define CGEMM_DEFAULT_P 128
  1915. #define ZGEMM_DEFAULT_P 128
  1916. #if 1
  1917. #define SGEMM_DEFAULT_Q 4096
  1918. #define DGEMM_DEFAULT_Q 3072
  1919. #define CGEMM_DEFAULT_Q 2048
  1920. #define ZGEMM_DEFAULT_Q 1024
  1921. #else
  1922. #define SGEMM_DEFAULT_Q 512
  1923. #define DGEMM_DEFAULT_Q 256
  1924. #define CGEMM_DEFAULT_Q 256
  1925. #define ZGEMM_DEFAULT_Q 128
  1926. #endif
  1927. #define SYMV_P 4
  1928. #endif
  1929. #if defined(POWER3) || defined(POWER4) || defined(POWER5)
  1930. #define GEMM_DEFAULT_OFFSET_A 0
  1931. #define GEMM_DEFAULT_OFFSET_B 2048
  1932. #define GEMM_DEFAULT_ALIGN 0x0ffffUL
  1933. #define SGEMM_DEFAULT_UNROLL_M 4
  1934. #define SGEMM_DEFAULT_UNROLL_N 4
  1935. #define DGEMM_DEFAULT_UNROLL_M 4
  1936. #define DGEMM_DEFAULT_UNROLL_N 4
  1937. #define CGEMM_DEFAULT_UNROLL_M 2
  1938. #define CGEMM_DEFAULT_UNROLL_N 2
  1939. #define ZGEMM_DEFAULT_UNROLL_M 2
  1940. #define ZGEMM_DEFAULT_UNROLL_N 2
  1941. #ifdef POWER3
  1942. #define SNUMOPT 4
  1943. #define DNUMOPT 4
  1944. #define SGEMM_DEFAULT_P 256
  1945. #define SGEMM_DEFAULT_Q 432
  1946. #define SGEMM_DEFAULT_R 1012
  1947. #define DGEMM_DEFAULT_P 256
  1948. #define DGEMM_DEFAULT_Q 216
  1949. #define DGEMM_DEFAULT_R 1012
  1950. #define CGEMM_DEFAULT_P 256
  1951. #define CGEMM_DEFAULT_Q 104
  1952. #define CGEMM_DEFAULT_R 1012
  1953. #define ZGEMM_DEFAULT_P 256
  1954. #define ZGEMM_DEFAULT_Q 104
  1955. #define ZGEMM_DEFAULT_R 1012
  1956. #endif
  1957. #if defined(POWER4)
  1958. #ifdef ALLOC_HUGETLB
  1959. #define SGEMM_DEFAULT_P 184
  1960. #define DGEMM_DEFAULT_P 184
  1961. #define CGEMM_DEFAULT_P 184
  1962. #define ZGEMM_DEFAULT_P 184
  1963. #else
  1964. #define SGEMM_DEFAULT_P 144
  1965. #define DGEMM_DEFAULT_P 144
  1966. #define CGEMM_DEFAULT_P 144
  1967. #define ZGEMM_DEFAULT_P 144
  1968. #endif
  1969. #define SGEMM_DEFAULT_Q 256
  1970. #define CGEMM_DEFAULT_Q 256
  1971. #define DGEMM_DEFAULT_Q 256
  1972. #define ZGEMM_DEFAULT_Q 256
  1973. #endif
  1974. #if defined(POWER5)
  1975. #ifdef ALLOC_HUGETLB
  1976. #define SGEMM_DEFAULT_P 512
  1977. #define DGEMM_DEFAULT_P 256
  1978. #define CGEMM_DEFAULT_P 256
  1979. #define ZGEMM_DEFAULT_P 128
  1980. #else
  1981. #define SGEMM_DEFAULT_P 320
  1982. #define DGEMM_DEFAULT_P 160
  1983. #define CGEMM_DEFAULT_P 160
  1984. #define ZGEMM_DEFAULT_P 80
  1985. #endif
  1986. #define SGEMM_DEFAULT_Q 256
  1987. #define CGEMM_DEFAULT_Q 256
  1988. #define DGEMM_DEFAULT_Q 256
  1989. #define ZGEMM_DEFAULT_Q 256
  1990. #endif
  1991. #define SYMV_P 8
  1992. #endif
  1993. #if defined(POWER6)
  1994. #define SNUMOPT 4
  1995. #define DNUMOPT 4
  1996. #define GEMM_DEFAULT_OFFSET_A 384
  1997. #define GEMM_DEFAULT_OFFSET_B 1024
  1998. #define GEMM_DEFAULT_ALIGN 0x03fffUL
  1999. #define SGEMM_DEFAULT_UNROLL_M 4
  2000. #define SGEMM_DEFAULT_UNROLL_N 4
  2001. #define DGEMM_DEFAULT_UNROLL_M 4
  2002. #define DGEMM_DEFAULT_UNROLL_N 4
  2003. #define CGEMM_DEFAULT_UNROLL_M 2
  2004. #define CGEMM_DEFAULT_UNROLL_N 4
  2005. #define ZGEMM_DEFAULT_UNROLL_M 2
  2006. #define ZGEMM_DEFAULT_UNROLL_N 4
  2007. #define SGEMM_DEFAULT_P 992
  2008. #define DGEMM_DEFAULT_P 480
  2009. #define CGEMM_DEFAULT_P 488
  2010. #define ZGEMM_DEFAULT_P 248
  2011. #define SGEMM_DEFAULT_Q 504
  2012. #define DGEMM_DEFAULT_Q 504
  2013. #define CGEMM_DEFAULT_Q 400
  2014. #define ZGEMM_DEFAULT_Q 400
  2015. #define SYMV_P 8
  2016. #endif
  2017. #if defined(POWER8) || (defined(POWER9) && defined(OS_AIX))
  2018. #define SNUMOPT 16
  2019. #define DNUMOPT 8
  2020. #define GEMM_DEFAULT_OFFSET_A 0
  2021. #define GEMM_DEFAULT_OFFSET_B 65536
  2022. #define GEMM_DEFAULT_ALIGN 0x0ffffUL
  2023. #if defined(__32BIT__)
  2024. #warning using BINARY32==POWER6
  2025. #define SGEMM_DEFAULT_UNROLL_M 4
  2026. #define SGEMM_DEFAULT_UNROLL_N 4
  2027. #define DGEMM_DEFAULT_UNROLL_M 4
  2028. #define DGEMM_DEFAULT_UNROLL_N 4
  2029. #define CGEMM_DEFAULT_UNROLL_M 2
  2030. #define CGEMM_DEFAULT_UNROLL_N 4
  2031. #define ZGEMM_DEFAULT_UNROLL_M 2
  2032. #define ZGEMM_DEFAULT_UNROLL_N 4
  2033. #else
  2034. #define SGEMM_DEFAULT_UNROLL_M 16
  2035. #define SGEMM_DEFAULT_UNROLL_N 8
  2036. #define DGEMM_DEFAULT_UNROLL_M 16
  2037. #define DGEMM_DEFAULT_UNROLL_N 4
  2038. #define CGEMM_DEFAULT_UNROLL_M 8
  2039. #define CGEMM_DEFAULT_UNROLL_N 4
  2040. #define ZGEMM_DEFAULT_UNROLL_M 8
  2041. #define ZGEMM_DEFAULT_UNROLL_N 2
  2042. #endif
  2043. #define SGEMM_DEFAULT_P 1280UL
  2044. #define DGEMM_DEFAULT_P 640UL
  2045. #define CGEMM_DEFAULT_P 640UL
  2046. #define ZGEMM_DEFAULT_P 320UL
  2047. #define SGEMM_DEFAULT_Q 640UL
  2048. #define DGEMM_DEFAULT_Q 720UL
  2049. #define CGEMM_DEFAULT_Q 640UL
  2050. #define ZGEMM_DEFAULT_Q 640UL
  2051. #if 0
  2052. #define SGEMM_DEFAULT_R SGEMM_DEFAULT_P
  2053. #define DGEMM_DEFAULT_R DGEMM_DEFAULT_P
  2054. #define CGEMM_DEFAULT_R CGEMM_DEFAULT_P
  2055. #define ZGEMM_DEFAULT_R ZGEMM_DEFAULT_P
  2056. #endif
  2057. #define SGEMM_DEFAULT_R 4096
  2058. #define DGEMM_DEFAULT_R 4096
  2059. #define CGEMM_DEFAULT_R 4096
  2060. #define ZGEMM_DEFAULT_R 4096
  2061. #define SYMV_P 8
  2062. #endif
  2063. #if defined(POWER9) && (defined(OS_LINUX) || defined(OS_FREEBSD))
  2064. #define SNUMOPT 16
  2065. #define DNUMOPT 8
  2066. #define GEMM_DEFAULT_OFFSET_A 0
  2067. #define GEMM_DEFAULT_OFFSET_B 65536
  2068. #define GEMM_DEFAULT_ALIGN 0x0ffffUL
  2069. #define SWITCH_RATIO 16
  2070. #define GEMM_PREFERED_SIZE 16
  2071. #define SGEMM_DEFAULT_UNROLL_M 16
  2072. #define SGEMM_DEFAULT_UNROLL_N 8
  2073. #define DGEMM_DEFAULT_UNROLL_M 16
  2074. #define DGEMM_DEFAULT_UNROLL_N 4
  2075. #define CGEMM_DEFAULT_UNROLL_M 8
  2076. #define CGEMM_DEFAULT_UNROLL_N 4
  2077. #define ZGEMM_DEFAULT_UNROLL_M 8
  2078. #define ZGEMM_DEFAULT_UNROLL_N 2
  2079. #define SGEMM_DEFAULT_P 832
  2080. #define DGEMM_DEFAULT_P 128
  2081. #define CGEMM_DEFAULT_P 512
  2082. #define ZGEMM_DEFAULT_P 256
  2083. #define SGEMM_DEFAULT_Q 1026
  2084. #define DGEMM_DEFAULT_Q 384
  2085. #define CGEMM_DEFAULT_Q 1026
  2086. #define ZGEMM_DEFAULT_Q 1026
  2087. #define SGEMM_DEFAULT_R 4096
  2088. #define DGEMM_DEFAULT_R 4096
  2089. #define CGEMM_DEFAULT_R 4096
  2090. #define ZGEMM_DEFAULT_R 4096
  2091. #define SYMV_P 8
  2092. #endif
  2093. #if defined(POWER10)
  2094. #define SNUMOPT 16
  2095. #define DNUMOPT 8
  2096. #define GEMM_DEFAULT_OFFSET_A 0
  2097. #define GEMM_DEFAULT_OFFSET_B 65536
  2098. #define GEMM_DEFAULT_ALIGN 0x0ffffUL
  2099. #define SWITCH_RATIO 16
  2100. #define GEMM_PREFERED_SIZE 16
  2101. #define SGEMM_DEFAULT_UNROLL_M 16
  2102. #define SGEMM_DEFAULT_UNROLL_N 8
  2103. #define DGEMM_DEFAULT_UNROLL_M 8
  2104. #define DGEMM_DEFAULT_UNROLL_N 8
  2105. #define CGEMM_DEFAULT_UNROLL_M 8
  2106. #define CGEMM_DEFAULT_UNROLL_N 4
  2107. #define ZGEMM_DEFAULT_UNROLL_M 8
  2108. #define ZGEMM_DEFAULT_UNROLL_N 2
  2109. #define SGEMM_DEFAULT_P 512
  2110. #define DGEMM_DEFAULT_P 384
  2111. #define CGEMM_DEFAULT_P 512
  2112. #define ZGEMM_DEFAULT_P 256
  2113. #define SGEMM_DEFAULT_Q 512
  2114. #define DGEMM_DEFAULT_Q 512
  2115. #define CGEMM_DEFAULT_Q 384
  2116. #define ZGEMM_DEFAULT_Q 384
  2117. #define SGEMM_DEFAULT_R 4096
  2118. #define DGEMM_DEFAULT_R 4096
  2119. #define CGEMM_DEFAULT_R 4096
  2120. #define ZGEMM_DEFAULT_R 4096
  2121. #define SYMV_P 8
  2122. #undef SBGEMM_DEFAULT_UNROLL_N
  2123. #undef SBGEMM_DEFAULT_UNROLL_M
  2124. #undef SBGEMM_DEFAULT_P
  2125. #undef SBGEMM_DEFAULT_R
  2126. #undef SBGEMM_DEFAULT_Q
  2127. #define SBGEMM_DEFAULT_UNROLL_M 16
  2128. #define SBGEMM_DEFAULT_UNROLL_N 8
  2129. #define SBGEMM_DEFAULT_P 512
  2130. #define SBGEMM_DEFAULT_Q 1024
  2131. #define SBGEMM_DEFAULT_R 4096
  2132. #endif
  2133. #if defined(SPARC) && defined(V7)
  2134. #define SNUMOPT 4
  2135. #define DNUMOPT 4
  2136. #define GEMM_DEFAULT_OFFSET_A 0
  2137. #define GEMM_DEFAULT_OFFSET_B 2048
  2138. #define GEMM_DEFAULT_ALIGN 0x03fffUL
  2139. #define SGEMM_DEFAULT_UNROLL_M 2
  2140. #define SGEMM_DEFAULT_UNROLL_N 8
  2141. #define DGEMM_DEFAULT_UNROLL_M 2
  2142. #define DGEMM_DEFAULT_UNROLL_N 8
  2143. #define CGEMM_DEFAULT_UNROLL_M 1
  2144. #define CGEMM_DEFAULT_UNROLL_N 4
  2145. #define ZGEMM_DEFAULT_UNROLL_M 1
  2146. #define ZGEMM_DEFAULT_UNROLL_N 4
  2147. #define SGEMM_DEFAULT_P 256
  2148. #define DGEMM_DEFAULT_P 256
  2149. #define CGEMM_DEFAULT_P 256
  2150. #define ZGEMM_DEFAULT_P 256
  2151. #define SGEMM_DEFAULT_Q 512
  2152. #define DGEMM_DEFAULT_Q 256
  2153. #define CGEMM_DEFAULT_Q 256
  2154. #define ZGEMM_DEFAULT_Q 128
  2155. #define SYMV_P 8
  2156. #define GEMM_THREAD gemm_thread_mn
  2157. #endif
  2158. #if (defined(SPARC) && defined(V9)) || defined(__sparc_v9__)
  2159. #define SNUMOPT 2
  2160. #define DNUMOPT 2
  2161. #define GEMM_DEFAULT_OFFSET_A 0
  2162. #define GEMM_DEFAULT_OFFSET_B 2048
  2163. #define GEMM_DEFAULT_ALIGN 0x03fffUL
  2164. #define SGEMM_DEFAULT_UNROLL_M 4
  2165. #define SGEMM_DEFAULT_UNROLL_N 4
  2166. #define DGEMM_DEFAULT_UNROLL_M 4
  2167. #define DGEMM_DEFAULT_UNROLL_N 4
  2168. #define CGEMM_DEFAULT_UNROLL_M 2
  2169. #define CGEMM_DEFAULT_UNROLL_N 2
  2170. #define ZGEMM_DEFAULT_UNROLL_M 2
  2171. #define ZGEMM_DEFAULT_UNROLL_N 2
  2172. #define SGEMM_DEFAULT_P 512
  2173. #define DGEMM_DEFAULT_P 512
  2174. #define CGEMM_DEFAULT_P 512
  2175. #define ZGEMM_DEFAULT_P 512
  2176. #define SGEMM_DEFAULT_Q 1024
  2177. #define DGEMM_DEFAULT_Q 512
  2178. #define CGEMM_DEFAULT_Q 512
  2179. #define ZGEMM_DEFAULT_Q 256
  2180. #define SYMV_P 8
  2181. #endif
  2182. #ifdef SICORTEX
  2183. #define SNUMOPT 2
  2184. #define DNUMOPT 2
  2185. #define GEMM_DEFAULT_OFFSET_A 0
  2186. #define GEMM_DEFAULT_OFFSET_B 0
  2187. #define GEMM_DEFAULT_ALIGN (BLASLONG)0x03fffUL
  2188. #define SGEMM_DEFAULT_UNROLL_M 2
  2189. #define SGEMM_DEFAULT_UNROLL_N 8
  2190. #define DGEMM_DEFAULT_UNROLL_M 2
  2191. #define DGEMM_DEFAULT_UNROLL_N 8
  2192. #define CGEMM_DEFAULT_UNROLL_M 1
  2193. #define CGEMM_DEFAULT_UNROLL_N 4
  2194. #define ZGEMM_DEFAULT_UNROLL_M 1
  2195. #define ZGEMM_DEFAULT_UNROLL_N 4
  2196. #define SGEMM_DEFAULT_P 108
  2197. #define DGEMM_DEFAULT_P 112
  2198. #define CGEMM_DEFAULT_P 108
  2199. #define ZGEMM_DEFAULT_P 112
  2200. #define SGEMM_DEFAULT_Q 288
  2201. #define DGEMM_DEFAULT_Q 144
  2202. #define CGEMM_DEFAULT_Q 144
  2203. #define ZGEMM_DEFAULT_Q 72
  2204. #define SGEMM_DEFAULT_R 2000
  2205. #define DGEMM_DEFAULT_R 2000
  2206. #define CGEMM_DEFAULT_R 2000
  2207. #define ZGEMM_DEFAULT_R 2000
  2208. #define SYMV_P 16
  2209. #endif
  2210. #if defined(LOONGSON3R4)
  2211. #define SNUMOPT 2
  2212. #define DNUMOPT 2
  2213. #define GEMM_DEFAULT_OFFSET_A 0
  2214. #define GEMM_DEFAULT_OFFSET_B 0
  2215. #define GEMM_DEFAULT_ALIGN (BLASLONG)0x03fffUL
  2216. #if defined(NO_MSA)
  2217. #define SGEMM_DEFAULT_UNROLL_M 8
  2218. #define SGEMM_DEFAULT_UNROLL_N 4
  2219. #define DGEMM_DEFAULT_UNROLL_M 4
  2220. #define DGEMM_DEFAULT_UNROLL_N 4
  2221. #define CGEMM_DEFAULT_UNROLL_M 4
  2222. #define CGEMM_DEFAULT_UNROLL_N 2
  2223. #define ZGEMM_DEFAULT_UNROLL_M 2
  2224. #define ZGEMM_DEFAULT_UNROLL_N 2
  2225. #else
  2226. #define SGEMM_DEFAULT_UNROLL_M 8
  2227. #define SGEMM_DEFAULT_UNROLL_N 8
  2228. #define DGEMM_DEFAULT_UNROLL_M 8
  2229. #define DGEMM_DEFAULT_UNROLL_N 4
  2230. #define CGEMM_DEFAULT_UNROLL_M 8
  2231. #define CGEMM_DEFAULT_UNROLL_N 4
  2232. #define ZGEMM_DEFAULT_UNROLL_M 4
  2233. #define ZGEMM_DEFAULT_UNROLL_N 4
  2234. #endif
  2235. #define SGEMM_DEFAULT_P 64
  2236. #define DGEMM_DEFAULT_P 44
  2237. #define CGEMM_DEFAULT_P 64
  2238. #define ZGEMM_DEFAULT_P 32
  2239. #define SGEMM_DEFAULT_Q 192
  2240. #define DGEMM_DEFAULT_Q 92
  2241. #define CGEMM_DEFAULT_Q 128
  2242. #define ZGEMM_DEFAULT_Q 80
  2243. #define SGEMM_DEFAULT_R 640
  2244. #define DGEMM_DEFAULT_R dgemm_r
  2245. #define CGEMM_DEFAULT_R 640
  2246. #define ZGEMM_DEFAULT_R 640
  2247. #define GEMM_OFFSET_A1 0x10000
  2248. #define GEMM_OFFSET_B1 0x100000
  2249. #define SYMV_P 16
  2250. #endif
  2251. #if defined(LOONGSON3R3)
  2252. ////Copy from SICORTEX
  2253. #define SNUMOPT 2
  2254. #define DNUMOPT 2
  2255. #define GEMM_DEFAULT_OFFSET_A 0
  2256. #define GEMM_DEFAULT_OFFSET_B 0
  2257. #define GEMM_DEFAULT_ALIGN (BLASLONG)0x03fffUL
  2258. #define SGEMM_DEFAULT_UNROLL_M 8
  2259. #define SGEMM_DEFAULT_UNROLL_N 4
  2260. #define DGEMM_DEFAULT_UNROLL_M 4
  2261. #define DGEMM_DEFAULT_UNROLL_N 4
  2262. #define CGEMM_DEFAULT_UNROLL_M 4
  2263. #define CGEMM_DEFAULT_UNROLL_N 2
  2264. #define ZGEMM_DEFAULT_UNROLL_M 2
  2265. #define ZGEMM_DEFAULT_UNROLL_N 2
  2266. #define SGEMM_DEFAULT_P 64
  2267. #define DGEMM_DEFAULT_P 44
  2268. #define CGEMM_DEFAULT_P 64
  2269. #define ZGEMM_DEFAULT_P 32
  2270. #define SGEMM_DEFAULT_Q 192
  2271. #define DGEMM_DEFAULT_Q 92
  2272. #define CGEMM_DEFAULT_Q 128
  2273. #define ZGEMM_DEFAULT_Q 80
  2274. #define SGEMM_DEFAULT_R 640
  2275. #define DGEMM_DEFAULT_R dgemm_r
  2276. #define CGEMM_DEFAULT_R 640
  2277. #define ZGEMM_DEFAULT_R 640
  2278. #define GEMM_OFFSET_A1 0x10000
  2279. #define GEMM_OFFSET_B1 0x100000
  2280. #define SYMV_P 16
  2281. #endif
  2282. #if defined (LA464)
  2283. #define SNUMOPT 2
  2284. #define DNUMOPT 2
  2285. #define GEMM_DEFAULT_OFFSET_A 0x20000
  2286. #define GEMM_DEFAULT_OFFSET_B 0
  2287. #define GEMM_DEFAULT_ALIGN 0x0ffffUL
  2288. #if defined(NO_LASX)
  2289. #define DGEMM_DEFAULT_UNROLL_N 8
  2290. #define DGEMM_DEFAULT_UNROLL_M 2
  2291. #define SGEMM_DEFAULT_UNROLL_N 8
  2292. #define SGEMM_DEFAULT_UNROLL_M 2
  2293. #define CGEMM_DEFAULT_UNROLL_N 4
  2294. #define CGEMM_DEFAULT_UNROLL_M 1
  2295. #define ZGEMM_DEFAULT_UNROLL_N 4
  2296. #define ZGEMM_DEFAULT_UNROLL_M 1
  2297. #else
  2298. #define DGEMM_DEFAULT_UNROLL_N 6
  2299. #define DGEMM_DEFAULT_UNROLL_M 16
  2300. #define SGEMM_DEFAULT_UNROLL_N 8
  2301. #define SGEMM_DEFAULT_UNROLL_M 16
  2302. #define CGEMM_DEFAULT_UNROLL_N 4
  2303. #define CGEMM_DEFAULT_UNROLL_M 16
  2304. #define ZGEMM_DEFAULT_UNROLL_N 4
  2305. #define ZGEMM_DEFAULT_UNROLL_M 8
  2306. #define DGEMM_DEFAULT_UNROLL_MN 96
  2307. #endif
  2308. #define QGEMM_DEFAULT_UNROLL_N 2
  2309. #define XGEMM_DEFAULT_UNROLL_N 1
  2310. #define QGEMM_DEFAULT_UNROLL_M 2
  2311. #define XGEMM_DEFAULT_UNROLL_M 1
  2312. #define SGEMM_DEFAULT_P sgemm_p
  2313. #define DGEMM_DEFAULT_P dgemm_p
  2314. #define CGEMM_DEFAULT_P 128
  2315. #define ZGEMM_DEFAULT_P zgemm_p
  2316. #define SGEMM_DEFAULT_R sgemm_r
  2317. #define DGEMM_DEFAULT_R dgemm_r
  2318. #define CGEMM_DEFAULT_R 4096
  2319. #define ZGEMM_DEFAULT_R zgemm_r
  2320. #define SGEMM_DEFAULT_Q sgemm_q
  2321. #define DGEMM_DEFAULT_Q dgemm_q
  2322. #define CGEMM_DEFAULT_Q 128
  2323. #define ZGEMM_DEFAULT_Q zgemm_q
  2324. #define SYMV_P 16
  2325. #endif
  2326. #ifdef LA264
  2327. #define GEMM_DEFAULT_OFFSET_A 0
  2328. #define GEMM_DEFAULT_OFFSET_B 0
  2329. #define GEMM_DEFAULT_ALIGN (BLASLONG)0x03fffUL
  2330. #define SGEMM_DEFAULT_UNROLL_M 2
  2331. #define SGEMM_DEFAULT_UNROLL_N 8
  2332. #define DGEMM_DEFAULT_UNROLL_M 8
  2333. #define DGEMM_DEFAULT_UNROLL_N 4
  2334. #define CGEMM_DEFAULT_UNROLL_M 8
  2335. #define CGEMM_DEFAULT_UNROLL_N 4
  2336. #define ZGEMM_DEFAULT_UNROLL_M 4
  2337. #define ZGEMM_DEFAULT_UNROLL_N 4
  2338. #define SGEMM_DEFAULT_P 128
  2339. #define DGEMM_DEFAULT_P 128
  2340. #define CGEMM_DEFAULT_P 96
  2341. #define ZGEMM_DEFAULT_P 64
  2342. #define SGEMM_DEFAULT_Q 240
  2343. #define DGEMM_DEFAULT_Q 120
  2344. #define CGEMM_DEFAULT_Q 120
  2345. #define ZGEMM_DEFAULT_Q 120
  2346. #define SGEMM_DEFAULT_R 12288
  2347. #define DGEMM_DEFAULT_R 8192
  2348. #define CGEMM_DEFAULT_R 4096
  2349. #define ZGEMM_DEFAULT_R 4096
  2350. #define SYMV_P 16
  2351. #endif
  2352. #ifdef LA64_GENERIC
  2353. #define GEMM_DEFAULT_OFFSET_A 0
  2354. #define GEMM_DEFAULT_OFFSET_B 0
  2355. #define GEMM_DEFAULT_ALIGN (BLASLONG)0x03fffUL
  2356. #define SGEMM_DEFAULT_UNROLL_M 2
  2357. #define SGEMM_DEFAULT_UNROLL_N 8
  2358. #define DGEMM_DEFAULT_UNROLL_M 2
  2359. #define DGEMM_DEFAULT_UNROLL_N 8
  2360. #define CGEMM_DEFAULT_UNROLL_M 1
  2361. #define CGEMM_DEFAULT_UNROLL_N 4
  2362. #define ZGEMM_DEFAULT_UNROLL_M 1
  2363. #define ZGEMM_DEFAULT_UNROLL_N 4
  2364. #define SGEMM_DEFAULT_P 128
  2365. #define DGEMM_DEFAULT_P 128
  2366. #define CGEMM_DEFAULT_P 96
  2367. #define ZGEMM_DEFAULT_P 64
  2368. #define SGEMM_DEFAULT_Q 240
  2369. #define DGEMM_DEFAULT_Q 120
  2370. #define CGEMM_DEFAULT_Q 120
  2371. #define ZGEMM_DEFAULT_Q 120
  2372. #define SGEMM_DEFAULT_R 12288
  2373. #define DGEMM_DEFAULT_R 8192
  2374. #define CGEMM_DEFAULT_R 4096
  2375. #define ZGEMM_DEFAULT_R 4096
  2376. #define SYMV_P 16
  2377. #endif
  2378. #if defined(MIPS64_GENERIC) || defined(P5600) || defined(MIPS1004K) || defined(MIPS24K) || defined(I6400) || defined(P6600) || defined(I6500)
  2379. #define SNUMOPT 2
  2380. #define DNUMOPT 2
  2381. #define GEMM_DEFAULT_OFFSET_A 0
  2382. #define GEMM_DEFAULT_OFFSET_B 0
  2383. #define GEMM_DEFAULT_ALIGN (BLASLONG) 0x03fffUL
  2384. #if defined(NO_MSA) || defined(MIPS64_GENERIC)
  2385. #define SGEMM_DEFAULT_UNROLL_M 2
  2386. #define SGEMM_DEFAULT_UNROLL_N 2
  2387. #define DGEMM_DEFAULT_UNROLL_M 2
  2388. #define DGEMM_DEFAULT_UNROLL_N 2
  2389. #define CGEMM_DEFAULT_UNROLL_M 2
  2390. #define CGEMM_DEFAULT_UNROLL_N 2
  2391. #define ZGEMM_DEFAULT_UNROLL_M 2
  2392. #define ZGEMM_DEFAULT_UNROLL_N 2
  2393. #else
  2394. #define SGEMM_DEFAULT_UNROLL_M 8
  2395. #define SGEMM_DEFAULT_UNROLL_N 8
  2396. #define DGEMM_DEFAULT_UNROLL_M 8
  2397. #define DGEMM_DEFAULT_UNROLL_N 4
  2398. #define CGEMM_DEFAULT_UNROLL_M 8
  2399. #define CGEMM_DEFAULT_UNROLL_N 4
  2400. #define ZGEMM_DEFAULT_UNROLL_M 4
  2401. #define ZGEMM_DEFAULT_UNROLL_N 4
  2402. #endif
  2403. #define SGEMM_DEFAULT_P 128
  2404. #define DGEMM_DEFAULT_P 128
  2405. #define CGEMM_DEFAULT_P 96
  2406. #define ZGEMM_DEFAULT_P 64
  2407. #define SGEMM_DEFAULT_Q 240
  2408. #define DGEMM_DEFAULT_Q 120
  2409. #define CGEMM_DEFAULT_Q 120
  2410. #define ZGEMM_DEFAULT_Q 120
  2411. #define SGEMM_DEFAULT_R 12288
  2412. #define DGEMM_DEFAULT_R 8192
  2413. #define CGEMM_DEFAULT_R 4096
  2414. #define ZGEMM_DEFAULT_R 4096
  2415. #define SYMV_P 16
  2416. #endif
  2417. #ifdef RISCV64_GENERIC
  2418. #define GEMM_DEFAULT_OFFSET_A 0
  2419. #define GEMM_DEFAULT_OFFSET_B 0
  2420. #define GEMM_DEFAULT_ALIGN (BLASLONG)0x03fffUL
  2421. #define SGEMM_DEFAULT_UNROLL_M 2
  2422. #define SGEMM_DEFAULT_UNROLL_N 2
  2423. #define DGEMM_DEFAULT_UNROLL_M 2
  2424. #define DGEMM_DEFAULT_UNROLL_N 2
  2425. #define CGEMM_DEFAULT_UNROLL_M 2
  2426. #define CGEMM_DEFAULT_UNROLL_N 2
  2427. #define ZGEMM_DEFAULT_UNROLL_M 2
  2428. #define ZGEMM_DEFAULT_UNROLL_N 2
  2429. #define SGEMM_DEFAULT_P 128
  2430. #define DGEMM_DEFAULT_P 128
  2431. #define CGEMM_DEFAULT_P 96
  2432. #define ZGEMM_DEFAULT_P 64
  2433. #define SGEMM_DEFAULT_Q 240
  2434. #define DGEMM_DEFAULT_Q 120
  2435. #define CGEMM_DEFAULT_Q 120
  2436. #define ZGEMM_DEFAULT_Q 120
  2437. #define SGEMM_DEFAULT_R 12288
  2438. #define DGEMM_DEFAULT_R 8192
  2439. #define CGEMM_DEFAULT_R 4096
  2440. #define ZGEMM_DEFAULT_R 4096
  2441. #define SYMV_P 16
  2442. #define GEMM_DEFAULT_OFFSET_A 0
  2443. #define GEMM_DEFAULT_OFFSET_B 0
  2444. #endif
  2445. #if defined(x280)
  2446. #define GEMM_DEFAULT_OFFSET_A 0
  2447. #define GEMM_DEFAULT_OFFSET_B 0
  2448. #define GEMM_DEFAULT_ALIGN 0x03fffUL
  2449. #define SGEMM_DEFAULT_UNROLL_M 16 // 4 // 16 // 2
  2450. #define SGEMM_DEFAULT_UNROLL_N 8// 4 // 4 // 2
  2451. /* SGEMM_UNROLL_MN is calculated as max(SGEMM_UNROLL_M, SGEMM_UNROLL_N)
  2452. * Since we don't define SGEMM_UNROLL_M correctly we have to manually set this macro.
  2453. * If VLMAX size is ever more than 1024, this should be increased also. */
  2454. #define SGEMM_DEFAULT_UNROLL_MN 32
  2455. #define DGEMM_DEFAULT_UNROLL_M 16 //2 // 8
  2456. #define DGEMM_DEFAULT_UNROLL_N 8 //2 // 4
  2457. #define DGEMM_DEFAULT_UNROLL_MN 32
  2458. #define CGEMM_DEFAULT_UNROLL_M 8
  2459. #define CGEMM_DEFAULT_UNROLL_N 4
  2460. #define CGEMM_DEFAULT_UNROLL_MN 32
  2461. #define ZGEMM_DEFAULT_UNROLL_M 8
  2462. #define ZGEMM_DEFAULT_UNROLL_N 4
  2463. #define ZGEMM_DEFAULT_UNROLL_MN 16
  2464. #define SGEMM_DEFAULT_P 160
  2465. #define DGEMM_DEFAULT_P 160
  2466. #define CGEMM_DEFAULT_P 96
  2467. #define ZGEMM_DEFAULT_P 64
  2468. #define SGEMM_DEFAULT_Q 240
  2469. #define DGEMM_DEFAULT_Q 128
  2470. #define CGEMM_DEFAULT_Q 120
  2471. #define ZGEMM_DEFAULT_Q 120
  2472. #define SGEMM_DEFAULT_R 12288
  2473. #define DGEMM_DEFAULT_R 8192
  2474. #define CGEMM_DEFAULT_R 4096
  2475. #define ZGEMM_DEFAULT_R 4096
  2476. #define SYMV_P 16
  2477. #define GEMM_DEFAULT_OFFSET_A 0
  2478. #define GEMM_DEFAULT_OFFSET_B 0
  2479. #endif
  2480. #ifdef C910V
  2481. #define GEMM_DEFAULT_OFFSET_A 0
  2482. #define GEMM_DEFAULT_OFFSET_B 0
  2483. #define GEMM_DEFAULT_ALIGN 0x03fffUL
  2484. #define SGEMM_DEFAULT_UNROLL_M 16
  2485. #define SGEMM_DEFAULT_UNROLL_N 4
  2486. #define DGEMM_DEFAULT_UNROLL_M 8
  2487. #define DGEMM_DEFAULT_UNROLL_N 4
  2488. #define CGEMM_DEFAULT_UNROLL_M 2
  2489. #define CGEMM_DEFAULT_UNROLL_N 2
  2490. #define ZGEMM_DEFAULT_UNROLL_M 2
  2491. #define ZGEMM_DEFAULT_UNROLL_N 2
  2492. #define SGEMM_DEFAULT_P 160
  2493. #define DGEMM_DEFAULT_P 160
  2494. #define CGEMM_DEFAULT_P 96
  2495. #define ZGEMM_DEFAULT_P 64
  2496. #define SGEMM_DEFAULT_Q 240
  2497. #define DGEMM_DEFAULT_Q 128
  2498. #define CGEMM_DEFAULT_Q 120
  2499. #define ZGEMM_DEFAULT_Q 120
  2500. #define SGEMM_DEFAULT_R 12288
  2501. #define DGEMM_DEFAULT_R 8192
  2502. #define CGEMM_DEFAULT_R 4096
  2503. #define ZGEMM_DEFAULT_R 4096
  2504. #define SYMV_P 16
  2505. #define GEMM_DEFAULT_OFFSET_A 0
  2506. #define GEMM_DEFAULT_OFFSET_B 0
  2507. #endif
  2508. #ifdef RISCV64_ZVL128B
  2509. #define GEMM_DEFAULT_OFFSET_A 0
  2510. #define GEMM_DEFAULT_OFFSET_B 0
  2511. #define GEMM_DEFAULT_ALIGN (BLASLONG)0x03fffUL
  2512. #define SGEMM_DEFAULT_UNROLL_M 8
  2513. #define SGEMM_DEFAULT_UNROLL_N 8
  2514. #define DGEMM_DEFAULT_UNROLL_M 8
  2515. #define DGEMM_DEFAULT_UNROLL_N 4
  2516. #define CGEMM_DEFAULT_UNROLL_M 8
  2517. #define CGEMM_DEFAULT_UNROLL_N 4
  2518. #define ZGEMM_DEFAULT_UNROLL_M 4
  2519. #define ZGEMM_DEFAULT_UNROLL_N 4
  2520. #define SGEMM_DEFAULT_P 128
  2521. #define DGEMM_DEFAULT_P 128
  2522. #define CGEMM_DEFAULT_P 96
  2523. #define ZGEMM_DEFAULT_P 64
  2524. #define SGEMM_DEFAULT_Q 240
  2525. #define DGEMM_DEFAULT_Q 120
  2526. #define CGEMM_DEFAULT_Q 120
  2527. #define ZGEMM_DEFAULT_Q 120
  2528. #define SGEMM_DEFAULT_R 12288
  2529. #define DGEMM_DEFAULT_R 8192
  2530. #define CGEMM_DEFAULT_R 4096
  2531. #define ZGEMM_DEFAULT_R 4096
  2532. #define SYMV_P 16
  2533. #define GEMM_DEFAULT_OFFSET_A 0
  2534. #define GEMM_DEFAULT_OFFSET_B 0
  2535. #endif
  2536. #ifdef RISCV64_ZVL256B
  2537. #define GEMM_DEFAULT_OFFSET_A 0
  2538. #define GEMM_DEFAULT_OFFSET_B 0
  2539. #define GEMM_DEFAULT_ALIGN 0x03fffUL
  2540. #define SGEMM_DEFAULT_UNROLL_M 16
  2541. #define SGEMM_DEFAULT_UNROLL_N 8
  2542. #define DGEMM_DEFAULT_UNROLL_M 8
  2543. #define DGEMM_DEFAULT_UNROLL_N 8
  2544. #define CGEMM_DEFAULT_UNROLL_M 8
  2545. #define CGEMM_DEFAULT_UNROLL_N 8
  2546. #define ZGEMM_DEFAULT_UNROLL_M 8
  2547. #define ZGEMM_DEFAULT_UNROLL_N 4
  2548. #define SGEMM_DEFAULT_P 128
  2549. #define DGEMM_DEFAULT_P 64
  2550. #define CGEMM_DEFAULT_P 64
  2551. #define ZGEMM_DEFAULT_P 64
  2552. #define SGEMM_DEFAULT_Q 128
  2553. #define DGEMM_DEFAULT_Q 128
  2554. #define CGEMM_DEFAULT_Q 128
  2555. #define ZGEMM_DEFAULT_Q 64
  2556. #define SGEMM_DEFAULT_R 16384
  2557. #define DGEMM_DEFAULT_R 8192
  2558. #define CGEMM_DEFAULT_R 8192
  2559. #define ZGEMM_DEFAULT_R 4096
  2560. #define SYMV_P 16
  2561. #define GEMM_DEFAULT_OFFSET_A 0
  2562. #define GEMM_DEFAULT_OFFSET_B 0
  2563. #endif
  2564. #ifdef ARMV7
  2565. #define SNUMOPT 2
  2566. #define DNUMOPT 2
  2567. #define GEMM_DEFAULT_OFFSET_A 0
  2568. #define GEMM_DEFAULT_OFFSET_B 0
  2569. #define GEMM_DEFAULT_ALIGN (BLASLONG)0x03fffUL
  2570. #define SGEMM_DEFAULT_UNROLL_M 4
  2571. #define SGEMM_DEFAULT_UNROLL_N 4
  2572. #define DGEMM_DEFAULT_UNROLL_M 4
  2573. #define DGEMM_DEFAULT_UNROLL_N 4
  2574. #define CGEMM_DEFAULT_UNROLL_M 2
  2575. #define CGEMM_DEFAULT_UNROLL_N 2
  2576. #define ZGEMM_DEFAULT_UNROLL_M 2
  2577. #define ZGEMM_DEFAULT_UNROLL_N 2
  2578. #define SGEMM_DEFAULT_P 128
  2579. #define DGEMM_DEFAULT_P 128
  2580. #define CGEMM_DEFAULT_P 96
  2581. #define ZGEMM_DEFAULT_P 64
  2582. #define SGEMM_DEFAULT_Q 240
  2583. #define DGEMM_DEFAULT_Q 120
  2584. #define CGEMM_DEFAULT_Q 120
  2585. #define ZGEMM_DEFAULT_Q 120
  2586. #define SGEMM_DEFAULT_R 12288
  2587. #define DGEMM_DEFAULT_R 8192
  2588. #define CGEMM_DEFAULT_R 4096
  2589. #define ZGEMM_DEFAULT_R 4096
  2590. #define SYMV_P 16
  2591. #endif
  2592. #if defined(ARMV6)
  2593. #define SNUMOPT 2
  2594. #define DNUMOPT 2
  2595. #define GEMM_DEFAULT_OFFSET_A 0
  2596. #define GEMM_DEFAULT_OFFSET_B 0
  2597. #define GEMM_DEFAULT_ALIGN (BLASLONG)0x03fffUL
  2598. #define SGEMM_DEFAULT_UNROLL_M 4
  2599. #define SGEMM_DEFAULT_UNROLL_N 2
  2600. #define DGEMM_DEFAULT_UNROLL_M 4
  2601. #define DGEMM_DEFAULT_UNROLL_N 2
  2602. #define CGEMM_DEFAULT_UNROLL_M 2
  2603. #define CGEMM_DEFAULT_UNROLL_N 2
  2604. #define ZGEMM_DEFAULT_UNROLL_M 2
  2605. #define ZGEMM_DEFAULT_UNROLL_N 2
  2606. #define SGEMM_DEFAULT_P 128
  2607. #define DGEMM_DEFAULT_P 128
  2608. #define CGEMM_DEFAULT_P 96
  2609. #define ZGEMM_DEFAULT_P 64
  2610. #define SGEMM_DEFAULT_Q 240
  2611. #define DGEMM_DEFAULT_Q 120
  2612. #define CGEMM_DEFAULT_Q 120
  2613. #define ZGEMM_DEFAULT_Q 120
  2614. #define SGEMM_DEFAULT_R 12288
  2615. #define DGEMM_DEFAULT_R 8192
  2616. #define CGEMM_DEFAULT_R 4096
  2617. #define ZGEMM_DEFAULT_R 4096
  2618. #define SYMV_P 16
  2619. #endif
  2620. /* Common ARMv8 parameters */
  2621. #if defined(ARMV8)
  2622. #define SNUMOPT 2
  2623. #define DNUMOPT 2
  2624. #define GEMM_DEFAULT_OFFSET_A 0
  2625. #define GEMM_DEFAULT_OFFSET_B 0
  2626. #ifdef _WIN64
  2627. /* Use explicit casting for win64 as LLP64 datamodel is used */
  2628. #define GEMM_DEFAULT_ALIGN (BLASULONG)0x03fffUL
  2629. #else
  2630. #define GEMM_DEFAULT_ALIGN 0x03fffUL
  2631. #endif
  2632. #define SYMV_P 16
  2633. #if defined(CORTEXA57) || defined(CORTEXX1) || \
  2634. defined(CORTEXA72) || defined(CORTEXA73) || \
  2635. defined(FALKOR) || defined(TSV110) || defined(EMAG8180) || defined(VORTEX) || defined(FT2000)
  2636. #define SGEMM_DEFAULT_UNROLL_M 16
  2637. #define SGEMM_DEFAULT_UNROLL_N 4
  2638. #define DGEMM_DEFAULT_UNROLL_M 8
  2639. #define DGEMM_DEFAULT_UNROLL_N 4
  2640. #define CGEMM_DEFAULT_UNROLL_M 8
  2641. #define CGEMM_DEFAULT_UNROLL_N 4
  2642. #define ZGEMM_DEFAULT_UNROLL_M 4
  2643. #define ZGEMM_DEFAULT_UNROLL_N 4
  2644. /*FIXME: this should be using the cache size, but there is currently no easy way to
  2645. query that on ARM. So if getarch counted more than 8 cores we simply assume the host
  2646. is a big desktop or server with abundant cache rather than a phone or embedded device */
  2647. #if NUM_CORES > 8 || defined(TSV110) || defined(EMAG8180) || defined(VORTEX)|| defined(CORTEXX1)
  2648. #define SGEMM_DEFAULT_P 512
  2649. #define DGEMM_DEFAULT_P 256
  2650. #define CGEMM_DEFAULT_P 256
  2651. #define ZGEMM_DEFAULT_P 128
  2652. #define SGEMM_DEFAULT_Q 1024
  2653. #define DGEMM_DEFAULT_Q 512
  2654. #define CGEMM_DEFAULT_Q 512
  2655. #define ZGEMM_DEFAULT_Q 512
  2656. #else
  2657. #define SGEMM_DEFAULT_P 128
  2658. #define DGEMM_DEFAULT_P 160
  2659. #define CGEMM_DEFAULT_P 128
  2660. #define ZGEMM_DEFAULT_P 128
  2661. #define SGEMM_DEFAULT_Q 352
  2662. #define DGEMM_DEFAULT_Q 128
  2663. #define CGEMM_DEFAULT_Q 224
  2664. #define ZGEMM_DEFAULT_Q 112
  2665. #endif
  2666. #define SGEMM_DEFAULT_R 4096
  2667. #define DGEMM_DEFAULT_R 4096
  2668. #define CGEMM_DEFAULT_R 4096
  2669. #define ZGEMM_DEFAULT_R 2048
  2670. #elif defined(CORTEXA76)
  2671. #define SGEMM_DEFAULT_UNROLL_M 16
  2672. #define SGEMM_DEFAULT_UNROLL_N 4
  2673. #define DGEMM_DEFAULT_UNROLL_M 8
  2674. #define DGEMM_DEFAULT_UNROLL_N 4
  2675. #define CGEMM_DEFAULT_UNROLL_M 8
  2676. #define CGEMM_DEFAULT_UNROLL_N 4
  2677. #define ZGEMM_DEFAULT_UNROLL_M 4
  2678. #define ZGEMM_DEFAULT_UNROLL_N 4
  2679. #if defined(XDOUBLE) || defined(DOUBLE)
  2680. #define SWITCH_RATIO 8
  2681. #else
  2682. #define SWITCH_RATIO 16
  2683. #endif
  2684. #define SGEMM_DEFAULT_P 256
  2685. #define DGEMM_DEFAULT_P 128
  2686. #define CGEMM_DEFAULT_P 128
  2687. #define ZGEMM_DEFAULT_P 64
  2688. #define SGEMM_DEFAULT_Q 512
  2689. #define DGEMM_DEFAULT_Q 256
  2690. #define CGEMM_DEFAULT_Q 256
  2691. #define ZGEMM_DEFAULT_Q 256
  2692. #define SGEMM_DEFAULT_R 4096
  2693. #define DGEMM_DEFAULT_R 4096
  2694. #define CGEMM_DEFAULT_R 4096
  2695. #define ZGEMM_DEFAULT_R 4096
  2696. #elif defined(CORTEXA53) || defined(CORTEXA55)
  2697. #define SGEMM_DEFAULT_UNROLL_M 8
  2698. #define SGEMM_DEFAULT_UNROLL_N 8
  2699. #define DGEMM_DEFAULT_UNROLL_M 4
  2700. #define DGEMM_DEFAULT_UNROLL_N 4
  2701. #define CGEMM_DEFAULT_UNROLL_M 8
  2702. #define CGEMM_DEFAULT_UNROLL_N 4
  2703. #define ZGEMM_DEFAULT_UNROLL_M 4
  2704. #define ZGEMM_DEFAULT_UNROLL_N 4
  2705. #define SGEMM_DEFAULT_P 256
  2706. #define DGEMM_DEFAULT_P 160
  2707. #define CGEMM_DEFAULT_P 128
  2708. #define ZGEMM_DEFAULT_P 128
  2709. #define SGEMM_DEFAULT_Q 256
  2710. #define DGEMM_DEFAULT_Q 128
  2711. #define CGEMM_DEFAULT_Q 224
  2712. #define ZGEMM_DEFAULT_Q 112
  2713. #define SGEMM_DEFAULT_R 4096
  2714. #define DGEMM_DEFAULT_R 4096
  2715. #define CGEMM_DEFAULT_R 4096
  2716. #define ZGEMM_DEFAULT_R 2048
  2717. #elif defined(THUNDERX)
  2718. #define SGEMM_DEFAULT_UNROLL_M 4
  2719. #define SGEMM_DEFAULT_UNROLL_N 4
  2720. #define DGEMM_DEFAULT_UNROLL_M 2
  2721. #define DGEMM_DEFAULT_UNROLL_N 2
  2722. #define CGEMM_DEFAULT_UNROLL_M 2
  2723. #define CGEMM_DEFAULT_UNROLL_N 2
  2724. #define ZGEMM_DEFAULT_UNROLL_M 2
  2725. #define ZGEMM_DEFAULT_UNROLL_N 2
  2726. #define SGEMM_DEFAULT_P 128
  2727. #define DGEMM_DEFAULT_P 128
  2728. #define CGEMM_DEFAULT_P 96
  2729. #define ZGEMM_DEFAULT_P 64
  2730. #define SGEMM_DEFAULT_Q 240
  2731. #define DGEMM_DEFAULT_Q 120
  2732. #define CGEMM_DEFAULT_Q 120
  2733. #define ZGEMM_DEFAULT_Q 120
  2734. #define SGEMM_DEFAULT_R 12288
  2735. #define DGEMM_DEFAULT_R 8192
  2736. #define CGEMM_DEFAULT_R 4096
  2737. #define ZGEMM_DEFAULT_R 4096
  2738. #elif defined(THUNDERX2T99)
  2739. #define SGEMM_DEFAULT_UNROLL_M 16
  2740. #define SGEMM_DEFAULT_UNROLL_N 4
  2741. #define DGEMM_DEFAULT_UNROLL_M 8
  2742. #define DGEMM_DEFAULT_UNROLL_N 4
  2743. #define CGEMM_DEFAULT_UNROLL_M 8
  2744. #define CGEMM_DEFAULT_UNROLL_N 4
  2745. #define ZGEMM_DEFAULT_UNROLL_M 4
  2746. #define ZGEMM_DEFAULT_UNROLL_N 4
  2747. #define SGEMM_DEFAULT_P 128
  2748. #define DGEMM_DEFAULT_P 160
  2749. #define CGEMM_DEFAULT_P 128
  2750. #define ZGEMM_DEFAULT_P 128
  2751. #define SGEMM_DEFAULT_Q 352
  2752. #define DGEMM_DEFAULT_Q 128
  2753. #define CGEMM_DEFAULT_Q 224
  2754. #define ZGEMM_DEFAULT_Q 112
  2755. #define SGEMM_DEFAULT_R 4096
  2756. #define DGEMM_DEFAULT_R 4096
  2757. #define CGEMM_DEFAULT_R 4096
  2758. #define ZGEMM_DEFAULT_R 4096
  2759. #elif defined(THUNDERX3T110)
  2760. #define SGEMM_DEFAULT_UNROLL_M 16
  2761. #define SGEMM_DEFAULT_UNROLL_N 4
  2762. #define DGEMM_DEFAULT_UNROLL_M 8
  2763. #define DGEMM_DEFAULT_UNROLL_N 4
  2764. #define CGEMM_DEFAULT_UNROLL_M 8
  2765. #define CGEMM_DEFAULT_UNROLL_N 4
  2766. #define ZGEMM_DEFAULT_UNROLL_M 4
  2767. #define ZGEMM_DEFAULT_UNROLL_N 4
  2768. #define SGEMM_DEFAULT_P 128
  2769. #define DGEMM_DEFAULT_P 320
  2770. #define CGEMM_DEFAULT_P 128
  2771. #define ZGEMM_DEFAULT_P 128
  2772. #define SGEMM_DEFAULT_Q 352
  2773. #define DGEMM_DEFAULT_Q 128
  2774. #define CGEMM_DEFAULT_Q 224
  2775. #define ZGEMM_DEFAULT_Q 112
  2776. #define SGEMM_DEFAULT_R 4096
  2777. #define DGEMM_DEFAULT_R 4096
  2778. #define CGEMM_DEFAULT_R 4096
  2779. #define ZGEMM_DEFAULT_R 4096
  2780. #elif defined(NEOVERSEN1)
  2781. #if defined(XDOUBLE) || defined(DOUBLE)
  2782. #define SWITCH_RATIO 8
  2783. #else
  2784. #define SWITCH_RATIO 16
  2785. #endif
  2786. #define SGEMM_DEFAULT_UNROLL_M 16
  2787. #define SGEMM_DEFAULT_UNROLL_N 4
  2788. #define DGEMM_DEFAULT_UNROLL_M 8
  2789. #define DGEMM_DEFAULT_UNROLL_N 4
  2790. #define CGEMM_DEFAULT_UNROLL_M 8
  2791. #define CGEMM_DEFAULT_UNROLL_N 4
  2792. #define ZGEMM_DEFAULT_UNROLL_M 4
  2793. #define ZGEMM_DEFAULT_UNROLL_N 4
  2794. #define SGEMM_DEFAULT_P 240
  2795. #define DGEMM_DEFAULT_P 240
  2796. #define CGEMM_DEFAULT_P 128
  2797. #define ZGEMM_DEFAULT_P 128
  2798. #define SGEMM_DEFAULT_Q 640
  2799. #define DGEMM_DEFAULT_Q 320
  2800. #define CGEMM_DEFAULT_Q 224
  2801. #define ZGEMM_DEFAULT_Q 112
  2802. #define SGEMM_DEFAULT_R 4096
  2803. #define DGEMM_DEFAULT_R 4096
  2804. #define CGEMM_DEFAULT_R 4096
  2805. #define ZGEMM_DEFAULT_R 4096
  2806. #elif defined(NEOVERSEV1) // 256-bit SVE
  2807. #if defined(XDOUBLE) || defined(DOUBLE)
  2808. #define SWITCH_RATIO 8
  2809. #define GEMM_PREFERED_SIZE 4
  2810. #else
  2811. #define SWITCH_RATIO 16
  2812. #define GEMM_PREFERED_SIZE 8
  2813. #endif
  2814. #define SGEMM_DEFAULT_UNROLL_M 16
  2815. #define SGEMM_DEFAULT_UNROLL_N 8
  2816. #define DGEMM_DEFAULT_UNROLL_M 4 // Actually 2VL (8) but kept separate to keep copies separate
  2817. #define DGEMM_DEFAULT_UNROLL_N 8
  2818. #define CGEMM_DEFAULT_UNROLL_M 2
  2819. #define CGEMM_DEFAULT_UNROLL_N 4
  2820. #define CGEMM_DEFAULT_UNROLL_MN 16
  2821. #define ZGEMM_DEFAULT_UNROLL_M 2
  2822. #define ZGEMM_DEFAULT_UNROLL_N 4
  2823. #define ZGEMM_DEFAULT_UNROLL_MN 16
  2824. #define SGEMM_DEFAULT_P 240
  2825. #define DGEMM_DEFAULT_P 240
  2826. #define CGEMM_DEFAULT_P 128
  2827. #define ZGEMM_DEFAULT_P 128
  2828. #define SGEMM_DEFAULT_Q 640
  2829. #define DGEMM_DEFAULT_Q 320
  2830. #define CGEMM_DEFAULT_Q 224
  2831. #define ZGEMM_DEFAULT_Q 112
  2832. #define SGEMM_DEFAULT_R 4096
  2833. #define DGEMM_DEFAULT_R 4096
  2834. #define CGEMM_DEFAULT_R 4096
  2835. #define ZGEMM_DEFAULT_R 4096
  2836. #elif defined(NEOVERSEN2)
  2837. #if defined(XDOUBLE) || defined(DOUBLE)
  2838. #define SWITCH_RATIO 8
  2839. #else
  2840. #define SWITCH_RATIO 16
  2841. #endif
  2842. #undef SBGEMM_ALIGN_K
  2843. #define SBGEMM_ALIGN_K 4
  2844. #undef SBGEMM_DEFAULT_UNROLL_M
  2845. #undef SBGEMM_DEFAULT_UNROLL_N
  2846. #define SBGEMM_DEFAULT_UNROLL_M 8
  2847. #define SBGEMM_DEFAULT_UNROLL_N 4
  2848. #define SGEMM_DEFAULT_UNROLL_M 16
  2849. #define SGEMM_DEFAULT_UNROLL_N 4
  2850. #define DGEMM_DEFAULT_UNROLL_M 8
  2851. #define DGEMM_DEFAULT_UNROLL_N 4
  2852. #define CGEMM_DEFAULT_UNROLL_M 8
  2853. #define CGEMM_DEFAULT_UNROLL_N 4
  2854. #define ZGEMM_DEFAULT_UNROLL_M 4
  2855. #define ZGEMM_DEFAULT_UNROLL_N 4
  2856. #define SGEMM_DEFAULT_P 128
  2857. #define DGEMM_DEFAULT_P 160
  2858. #define CGEMM_DEFAULT_P 128
  2859. #define ZGEMM_DEFAULT_P 128
  2860. #define SGEMM_DEFAULT_Q 352
  2861. #define DGEMM_DEFAULT_Q 128
  2862. #define CGEMM_DEFAULT_Q 224
  2863. #define ZGEMM_DEFAULT_Q 112
  2864. #define SGEMM_DEFAULT_R 4096
  2865. #define DGEMM_DEFAULT_R 4096
  2866. #define CGEMM_DEFAULT_R 4096
  2867. #define ZGEMM_DEFAULT_R 4096
  2868. #elif defined(A64FX) // 512-bit SVE
  2869. /* When all BLAS3 routines are implemeted with SVE, SGEMM_DEFAULT_UNROLL_M should be "sve_vl".
  2870. Until then, just keep it different than DGEMM_DEFAULT_UNROLL_N to keep copy routines in both directions seperated. */
  2871. #define SGEMM_DEFAULT_UNROLL_M 4
  2872. #define SGEMM_DEFAULT_UNROLL_N 8
  2873. /* SGEMM_UNROLL_MN is calculated as max(SGEMM_UNROLL_M, SGEMM_UNROLL_N)
  2874. * Since we don't define SGEMM_UNROLL_M correctly we have to manually set this macro.
  2875. * If SVE size is ever more than 1024, this should be increased also. */
  2876. #define SGEMM_DEFAULT_UNROLL_MN 32
  2877. /* When all BLAS3 routines are implemeted with SVE, DGEMM_DEFAULT_UNROLL_M should be "sve_vl".
  2878. Until then, just keep it different than DGEMM_DEFAULT_UNROLL_N to keep copy routines in both directions seperated. */
  2879. #define DGEMM_DEFAULT_UNROLL_M 2
  2880. #define DGEMM_DEFAULT_UNROLL_N 8
  2881. #define DGEMM_DEFAULT_UNROLL_MN 32
  2882. #define CGEMM_DEFAULT_UNROLL_M 2
  2883. #define CGEMM_DEFAULT_UNROLL_N 4
  2884. #define CGEMM_DEFAULT_UNROLL_MN 16
  2885. #define ZGEMM_DEFAULT_UNROLL_M 2
  2886. #define ZGEMM_DEFAULT_UNROLL_N 4
  2887. #define ZGEMM_DEFAULT_UNROLL_MN 16
  2888. #define SGEMM_DEFAULT_P 128
  2889. #define DGEMM_DEFAULT_P 160
  2890. #define CGEMM_DEFAULT_P 128
  2891. #define ZGEMM_DEFAULT_P 128
  2892. #define SGEMM_DEFAULT_Q 352
  2893. #define DGEMM_DEFAULT_Q 128
  2894. #define CGEMM_DEFAULT_Q 224
  2895. #define ZGEMM_DEFAULT_Q 112
  2896. #define SGEMM_DEFAULT_R 4096
  2897. #define DGEMM_DEFAULT_R 4096
  2898. #define CGEMM_DEFAULT_R 4096
  2899. #define ZGEMM_DEFAULT_R 4096
  2900. #elif defined(ARMV8SVE) || defined(ARMV9) || defined(CORTEXA510)|| defined(CORTEXA710) || defined(CORTEXX2) // 128-bit SVE
  2901. #if defined(XDOUBLE) || defined(DOUBLE)
  2902. #define SWITCH_RATIO 8
  2903. #else
  2904. #define SWITCH_RATIO 16
  2905. #endif
  2906. #define SGEMM_DEFAULT_UNROLL_M 4 // Actually 1VL (8) but kept seperate to keep copies seperate
  2907. #define SGEMM_DEFAULT_UNROLL_N 8
  2908. #define DGEMM_DEFAULT_UNROLL_M 4
  2909. #define DGEMM_DEFAULT_UNROLL_N 8
  2910. #define CGEMM_DEFAULT_UNROLL_M 2
  2911. #define CGEMM_DEFAULT_UNROLL_N 4
  2912. #define CGEMM_DEFAULT_UNROLL_MN 16
  2913. #define ZGEMM_DEFAULT_UNROLL_M 2
  2914. #define ZGEMM_DEFAULT_UNROLL_N 4
  2915. #define ZGEMM_DEFAULT_UNROLL_MN 16
  2916. #define SGEMM_DEFAULT_P 128
  2917. #define DGEMM_DEFAULT_P 160
  2918. #define CGEMM_DEFAULT_P 128
  2919. #define ZGEMM_DEFAULT_P 128
  2920. #define SGEMM_DEFAULT_Q 352
  2921. #define DGEMM_DEFAULT_Q 128
  2922. #define CGEMM_DEFAULT_Q 224
  2923. #define ZGEMM_DEFAULT_Q 112
  2924. #define SGEMM_DEFAULT_R 4096
  2925. #define DGEMM_DEFAULT_R 4096
  2926. #define CGEMM_DEFAULT_R 4096
  2927. #define ZGEMM_DEFAULT_R 4096
  2928. #else /* Other/undetected ARMv8 cores */
  2929. #define SGEMM_DEFAULT_UNROLL_M 16
  2930. #define SGEMM_DEFAULT_UNROLL_N 4
  2931. #define DGEMM_DEFAULT_UNROLL_M 8
  2932. #define DGEMM_DEFAULT_UNROLL_N 4
  2933. #define CGEMM_DEFAULT_UNROLL_M 8
  2934. #define CGEMM_DEFAULT_UNROLL_N 4
  2935. #define ZGEMM_DEFAULT_UNROLL_M 4
  2936. #define ZGEMM_DEFAULT_UNROLL_N 4
  2937. #define SGEMM_DEFAULT_P 128
  2938. #define DGEMM_DEFAULT_P 160
  2939. #define CGEMM_DEFAULT_P 128
  2940. #define ZGEMM_DEFAULT_P 128
  2941. #define SGEMM_DEFAULT_Q 352
  2942. #define DGEMM_DEFAULT_Q 128
  2943. #define CGEMM_DEFAULT_Q 224
  2944. #define ZGEMM_DEFAULT_Q 112
  2945. #define SGEMM_DEFAULT_R 4096
  2946. #define DGEMM_DEFAULT_R 4096
  2947. #define CGEMM_DEFAULT_R 4096
  2948. #define ZGEMM_DEFAULT_R 4096
  2949. #endif /* Cores */
  2950. #endif /* ARMv8 */
  2951. #if defined(ARMV5)
  2952. #define SNUMOPT 2
  2953. #define DNUMOPT 2
  2954. #define GEMM_DEFAULT_OFFSET_A 0
  2955. #define GEMM_DEFAULT_OFFSET_B 0
  2956. #define GEMM_DEFAULT_ALIGN (BLASLONG)0x03fffUL
  2957. #define SGEMM_DEFAULT_UNROLL_M 2
  2958. #define SGEMM_DEFAULT_UNROLL_N 2
  2959. #define DGEMM_DEFAULT_UNROLL_M 2
  2960. #define DGEMM_DEFAULT_UNROLL_N 2
  2961. #define CGEMM_DEFAULT_UNROLL_M 2
  2962. #define CGEMM_DEFAULT_UNROLL_N 2
  2963. #define ZGEMM_DEFAULT_UNROLL_M 2
  2964. #define ZGEMM_DEFAULT_UNROLL_N 2
  2965. #define SGEMM_DEFAULT_P 128
  2966. #define DGEMM_DEFAULT_P 128
  2967. #define CGEMM_DEFAULT_P 96
  2968. #define ZGEMM_DEFAULT_P 64
  2969. #define SGEMM_DEFAULT_Q 240
  2970. #define DGEMM_DEFAULT_Q 120
  2971. #define CGEMM_DEFAULT_Q 120
  2972. #define ZGEMM_DEFAULT_Q 120
  2973. #define SGEMM_DEFAULT_R 12288
  2974. #define DGEMM_DEFAULT_R 8192
  2975. #define CGEMM_DEFAULT_R 4096
  2976. #define ZGEMM_DEFAULT_R 4096
  2977. #define SYMV_P 16
  2978. #endif
  2979. #ifdef CORTEXA9
  2980. #define SNUMOPT 2
  2981. #define DNUMOPT 2
  2982. #define GEMM_DEFAULT_OFFSET_A 0
  2983. #define GEMM_DEFAULT_OFFSET_B 0
  2984. #define GEMM_DEFAULT_ALIGN (BLASLONG)0x03fffUL
  2985. #define SGEMM_DEFAULT_UNROLL_M 4
  2986. #define SGEMM_DEFAULT_UNROLL_N 4
  2987. #define DGEMM_DEFAULT_UNROLL_M 4
  2988. #define DGEMM_DEFAULT_UNROLL_N 4
  2989. #define CGEMM_DEFAULT_UNROLL_M 2
  2990. #define CGEMM_DEFAULT_UNROLL_N 2
  2991. #define ZGEMM_DEFAULT_UNROLL_M 2
  2992. #define ZGEMM_DEFAULT_UNROLL_N 2
  2993. #define SGEMM_DEFAULT_P 128
  2994. #define DGEMM_DEFAULT_P 128
  2995. #define CGEMM_DEFAULT_P 96
  2996. #define ZGEMM_DEFAULT_P 64
  2997. #define SGEMM_DEFAULT_Q 240
  2998. #define DGEMM_DEFAULT_Q 120
  2999. #define CGEMM_DEFAULT_Q 120
  3000. #define ZGEMM_DEFAULT_Q 120
  3001. #define SGEMM_DEFAULT_R 12288
  3002. #define DGEMM_DEFAULT_R 8192
  3003. #define CGEMM_DEFAULT_R 4096
  3004. #define ZGEMM_DEFAULT_R 4096
  3005. #define SYMV_P 16
  3006. #endif
  3007. #ifdef CORTEXA15
  3008. #define SNUMOPT 2
  3009. #define DNUMOPT 2
  3010. #define GEMM_DEFAULT_OFFSET_A 0
  3011. #define GEMM_DEFAULT_OFFSET_B 0
  3012. #define GEMM_DEFAULT_ALIGN (BLASLONG)0x03fffUL
  3013. #define SGEMM_DEFAULT_UNROLL_M 4
  3014. #define SGEMM_DEFAULT_UNROLL_N 4
  3015. #define DGEMM_DEFAULT_UNROLL_M 4
  3016. #define DGEMM_DEFAULT_UNROLL_N 4
  3017. #define CGEMM_DEFAULT_UNROLL_M 2
  3018. #define CGEMM_DEFAULT_UNROLL_N 2
  3019. #define ZGEMM_DEFAULT_UNROLL_M 2
  3020. #define ZGEMM_DEFAULT_UNROLL_N 2
  3021. #define SGEMM_DEFAULT_P 128
  3022. #define DGEMM_DEFAULT_P 128
  3023. #define CGEMM_DEFAULT_P 96
  3024. #define ZGEMM_DEFAULT_P 64
  3025. #define SGEMM_DEFAULT_Q 240
  3026. #define DGEMM_DEFAULT_Q 120
  3027. #define CGEMM_DEFAULT_Q 120
  3028. #define ZGEMM_DEFAULT_Q 120
  3029. #define SGEMM_DEFAULT_R 12288
  3030. #define DGEMM_DEFAULT_R 8192
  3031. #define CGEMM_DEFAULT_R 4096
  3032. #define ZGEMM_DEFAULT_R 4096
  3033. #define SYMV_P 16
  3034. #endif
  3035. #if defined(ZARCH_GENERIC)
  3036. #define SNUMOPT 2
  3037. #define DNUMOPT 2
  3038. #define GEMM_DEFAULT_OFFSET_A 0
  3039. #define GEMM_DEFAULT_OFFSET_B 0
  3040. #define GEMM_DEFAULT_ALIGN (BLASLONG)0x03fffUL
  3041. #define SGEMM_DEFAULT_UNROLL_M 2
  3042. #define SGEMM_DEFAULT_UNROLL_N 2
  3043. #define DGEMM_DEFAULT_UNROLL_M 2
  3044. #define DGEMM_DEFAULT_UNROLL_N 2
  3045. #define CGEMM_DEFAULT_UNROLL_M 2
  3046. #define CGEMM_DEFAULT_UNROLL_N 2
  3047. #define ZGEMM_DEFAULT_UNROLL_M 2
  3048. #define ZGEMM_DEFAULT_UNROLL_N 2
  3049. #define SGEMM_DEFAULT_P 128
  3050. #define DGEMM_DEFAULT_P 128
  3051. #define CGEMM_DEFAULT_P 96
  3052. #define ZGEMM_DEFAULT_P 64
  3053. #define SGEMM_DEFAULT_Q 240
  3054. #define DGEMM_DEFAULT_Q 120
  3055. #define CGEMM_DEFAULT_Q 120
  3056. #define ZGEMM_DEFAULT_Q 120
  3057. #define SGEMM_DEFAULT_R 12288
  3058. #define DGEMM_DEFAULT_R 8192
  3059. #define CGEMM_DEFAULT_R 4096
  3060. #define ZGEMM_DEFAULT_R 4096
  3061. #define SYMV_P 16
  3062. #endif
  3063. #if defined(Z13)
  3064. #define SNUMOPT 2
  3065. #define DNUMOPT 2
  3066. #define GEMM_DEFAULT_OFFSET_A 0
  3067. #define GEMM_DEFAULT_OFFSET_B 0
  3068. #define GEMM_DEFAULT_ALIGN (BLASLONG)0x03fffUL
  3069. #define SGEMM_DEFAULT_UNROLL_M 8
  3070. #define SGEMM_DEFAULT_UNROLL_N 4
  3071. #define DGEMM_DEFAULT_UNROLL_M 8
  3072. #define DGEMM_DEFAULT_UNROLL_N 4
  3073. #define CGEMM_DEFAULT_UNROLL_M 4
  3074. #define CGEMM_DEFAULT_UNROLL_N 4
  3075. #define ZGEMM_DEFAULT_UNROLL_M 4
  3076. #define ZGEMM_DEFAULT_UNROLL_N 4
  3077. #define SGEMM_DEFAULT_P 456
  3078. #define DGEMM_DEFAULT_P 320
  3079. #define CGEMM_DEFAULT_P 480
  3080. #define ZGEMM_DEFAULT_P 224
  3081. #define SGEMM_DEFAULT_Q 488
  3082. #define DGEMM_DEFAULT_Q 384
  3083. #define CGEMM_DEFAULT_Q 128
  3084. #define ZGEMM_DEFAULT_Q 352
  3085. #define SGEMM_DEFAULT_R 8192
  3086. #define DGEMM_DEFAULT_R 4096
  3087. #define CGEMM_DEFAULT_R 4096
  3088. #define ZGEMM_DEFAULT_R 2048
  3089. #define SYMV_P 16
  3090. #endif
  3091. #if defined(Z14)
  3092. #define SNUMOPT 2
  3093. #define DNUMOPT 2
  3094. #define GEMM_DEFAULT_OFFSET_A 0
  3095. #define GEMM_DEFAULT_OFFSET_B 0
  3096. #define GEMM_DEFAULT_ALIGN 0x03fffUL
  3097. #define SGEMM_DEFAULT_UNROLL_M 16
  3098. #define SGEMM_DEFAULT_UNROLL_N 4
  3099. #define DGEMM_DEFAULT_UNROLL_M 8
  3100. #define DGEMM_DEFAULT_UNROLL_N 4
  3101. #define CGEMM_DEFAULT_UNROLL_M 4
  3102. #define CGEMM_DEFAULT_UNROLL_N 4
  3103. #define ZGEMM_DEFAULT_UNROLL_M 4
  3104. #define ZGEMM_DEFAULT_UNROLL_N 4
  3105. #define SGEMM_DEFAULT_P 480
  3106. #define DGEMM_DEFAULT_P 320
  3107. #define CGEMM_DEFAULT_P 480
  3108. #define ZGEMM_DEFAULT_P 224
  3109. #define SGEMM_DEFAULT_Q 512
  3110. #define DGEMM_DEFAULT_Q 384
  3111. #define CGEMM_DEFAULT_Q 128
  3112. #define ZGEMM_DEFAULT_Q 352
  3113. #define SGEMM_DEFAULT_R 8192
  3114. #define DGEMM_DEFAULT_R 4096
  3115. #define CGEMM_DEFAULT_R 4096
  3116. #define ZGEMM_DEFAULT_R 2048
  3117. #define SYMV_P 16
  3118. #endif
  3119. #if defined(CSKY) || defined(CK860FV)
  3120. #define GEMM_DEFAULT_OFFSET_A 0
  3121. #define GEMM_DEFAULT_OFFSET_B 0
  3122. #define GEMM_DEFAULT_ALIGN (BLASLONG)0x03fffUL
  3123. #define SGEMM_DEFAULT_UNROLL_M 2
  3124. #define SGEMM_DEFAULT_UNROLL_N 2
  3125. #define DGEMM_DEFAULT_UNROLL_M 2
  3126. #define DGEMM_DEFAULT_UNROLL_N 2
  3127. #define CGEMM_DEFAULT_UNROLL_M 2
  3128. #define CGEMM_DEFAULT_UNROLL_N 2
  3129. #define ZGEMM_DEFAULT_UNROLL_M 2
  3130. #define ZGEMM_DEFAULT_UNROLL_N 2
  3131. #define SGEMM_DEFAULT_P 128
  3132. #define DGEMM_DEFAULT_P 128
  3133. #define CGEMM_DEFAULT_P 96
  3134. #define ZGEMM_DEFAULT_P 64
  3135. #define SGEMM_DEFAULT_Q 240
  3136. #define DGEMM_DEFAULT_Q 120
  3137. #define CGEMM_DEFAULT_Q 120
  3138. #define ZGEMM_DEFAULT_Q 120
  3139. #define SGEMM_DEFAULT_R 12288
  3140. #define DGEMM_DEFAULT_R 8192
  3141. #define CGEMM_DEFAULT_R 4096
  3142. #define ZGEMM_DEFAULT_R 4096
  3143. #define SYMV_P 16
  3144. #define GEMM_DEFAULT_OFFSET_A 0
  3145. #define GEMM_DEFAULT_OFFSET_B 0
  3146. #endif
  3147. #ifdef GENERIC
  3148. #define SNUMOPT 2
  3149. #define DNUMOPT 2
  3150. #define GEMM_DEFAULT_OFFSET_A 0
  3151. #define GEMM_DEFAULT_OFFSET_B 0
  3152. #define GEMM_DEFAULT_ALIGN (BLASLONG)0x0ffffUL
  3153. #define SGEMM_DEFAULT_UNROLL_N 2
  3154. #define DGEMM_DEFAULT_UNROLL_N 2
  3155. #define QGEMM_DEFAULT_UNROLL_N 2
  3156. #define CGEMM_DEFAULT_UNROLL_N 2
  3157. #define ZGEMM_DEFAULT_UNROLL_N 2
  3158. #define XGEMM_DEFAULT_UNROLL_N 1
  3159. #ifdef ARCH_X86
  3160. #define SGEMM_DEFAULT_UNROLL_M 2
  3161. #define DGEMM_DEFAULT_UNROLL_M 2
  3162. #define QGEMM_DEFAULT_UNROLL_M 2
  3163. #define CGEMM_DEFAULT_UNROLL_M 2
  3164. #define ZGEMM_DEFAULT_UNROLL_M 2
  3165. #define XGEMM_DEFAULT_UNROLL_M 1
  3166. #else
  3167. #define SGEMM_DEFAULT_UNROLL_M 2
  3168. #define DGEMM_DEFAULT_UNROLL_M 2
  3169. #define QGEMM_DEFAULT_UNROLL_M 2
  3170. #define CGEMM_DEFAULT_UNROLL_M 2
  3171. #define ZGEMM_DEFAULT_UNROLL_M 2
  3172. #define XGEMM_DEFAULT_UNROLL_M 1
  3173. #endif
  3174. #ifdef ARCH_MIPS
  3175. #define SGEMM_DEFAULT_P 128
  3176. #define DGEMM_DEFAULT_P 128
  3177. #define CGEMM_DEFAULT_P 96
  3178. #define ZGEMM_DEFAULT_P 64
  3179. #define SGEMM_DEFAULT_Q 240
  3180. #define DGEMM_DEFAULT_Q 120
  3181. #define CGEMM_DEFAULT_Q 120
  3182. #define ZGEMM_DEFAULT_Q 120
  3183. #define SGEMM_DEFAULT_R 12288
  3184. #define DGEMM_DEFAULT_R 8192
  3185. #define CGEMM_DEFAULT_R 4096
  3186. #define ZGEMM_DEFAULT_R 4096
  3187. #elif defined(ARCH_LOONGARCH64)
  3188. #define SGEMM_DEFAULT_P 128
  3189. #define DGEMM_DEFAULT_P 128
  3190. #define CGEMM_DEFAULT_P 96
  3191. #define ZGEMM_DEFAULT_P 64
  3192. #define SGEMM_DEFAULT_Q 240
  3193. #define DGEMM_DEFAULT_Q 120
  3194. #define CGEMM_DEFAULT_Q 120
  3195. #define ZGEMM_DEFAULT_Q 120
  3196. #define SGEMM_DEFAULT_R 12288
  3197. #define DGEMM_DEFAULT_R 8192
  3198. #define CGEMM_DEFAULT_R 4096
  3199. #define ZGEMM_DEFAULT_R 4096
  3200. #else
  3201. #define SGEMM_DEFAULT_P sgemm_p
  3202. #define DGEMM_DEFAULT_P dgemm_p
  3203. #define QGEMM_DEFAULT_P qgemm_p
  3204. #define CGEMM_DEFAULT_P cgemm_p
  3205. #define ZGEMM_DEFAULT_P zgemm_p
  3206. #define XGEMM_DEFAULT_P xgemm_p
  3207. #define SGEMM_DEFAULT_R sgemm_r
  3208. #define DGEMM_DEFAULT_R dgemm_r
  3209. #define QGEMM_DEFAULT_R qgemm_r
  3210. #define CGEMM_DEFAULT_R cgemm_r
  3211. #define ZGEMM_DEFAULT_R zgemm_r
  3212. #define XGEMM_DEFAULT_R xgemm_r
  3213. #define SGEMM_DEFAULT_Q 128
  3214. #define DGEMM_DEFAULT_Q 128
  3215. #define QGEMM_DEFAULT_Q 128
  3216. #define CGEMM_DEFAULT_Q 128
  3217. #define ZGEMM_DEFAULT_Q 128
  3218. #define XGEMM_DEFAULT_Q 128
  3219. #endif
  3220. #define SYMV_P 16
  3221. #endif
  3222. #ifndef SWITCH_RATIO
  3223. #define SWITCH_RATIO 2
  3224. #endif
  3225. #ifndef QGEMM_DEFAULT_UNROLL_M
  3226. #define QGEMM_DEFAULT_UNROLL_M 2
  3227. #endif
  3228. #ifndef QGEMM_DEFAULT_UNROLL_N
  3229. #define QGEMM_DEFAULT_UNROLL_N 2
  3230. #endif
  3231. #ifndef XGEMM_DEFAULT_UNROLL_M
  3232. #define XGEMM_DEFAULT_UNROLL_M 2
  3233. #endif
  3234. #ifndef XGEMM_DEFAULT_UNROLL_N
  3235. #define XGEMM_DEFAULT_UNROLL_N 2
  3236. #endif
  3237. #ifndef HAVE_SSE2
  3238. #define SHUFPD_0 shufps $0x44,
  3239. #define SHUFPD_1 shufps $0x4e,
  3240. #define SHUFPD_2 shufps $0xe4,
  3241. #define SHUFPD_3 shufps $0xee,
  3242. #endif
  3243. #ifndef SHUFPD_0
  3244. #define SHUFPD_0 shufpd $0,
  3245. #endif
  3246. #ifndef SHUFPD_1
  3247. #define SHUFPD_1 shufpd $1,
  3248. #endif
  3249. #ifndef SHUFPD_2
  3250. #define SHUFPD_2 shufpd $2,
  3251. #endif
  3252. #ifndef SHUFPD_3
  3253. #define SHUFPD_3 shufpd $3,
  3254. #endif
  3255. #ifndef SHUFPS_39
  3256. #define SHUFPS_39 shufps $0x39,
  3257. #endif
  3258. #endif