param.h 82 kB

Simplifying ARMv8 build parameters

ARMv8 builds were a bit mixed up, with ThunderX2 code in ARMv8 mode (which is not right because TX2 is ARMv8.1) as well as requiring a few redundancies in the defines, making it harder to maintain and to understand which core has what. A few other minor issues were also fixed.

Tests were made on the following cores: A53, A57, A72, Falkor, ThunderX, ThunderX2, and XGene.

Tests were: OpenBLAS/test, OpenBLAS/benchmark, BLAS-Tester.

A summary:

* Removed TX2 code from the ARMv8 build, to make sure it is compatible with all ARMv8 cores, not just v8.1. Also, the TX2 code has actually harmed performance on big cores.
* Consolidated the ARMv8 architectures' defines in param.h, to make sure that all cores benefit from the ARMv8 settings, in addition to their own.
* Added a few more cores, using ARMv8's include strategy, to benefit from compiler optimisations using mtune. Also updated cache information from the manuals, making sure we set good conservative values by default. Removed Vulcan, as it's an alias to TX2.
* Auto-detecting most of those cores, but also updating the forced compilation in getarch.c, to make sure the parameters are the same whether compiled natively or with a forced arch.

Benefits:

* The ARMv8 build is now guaranteed to work on all ARMv8 cores.
* Improved performance for ARMv8 builds on some cores (A72, Falkor, ThunderX1 and 2: up to 11%) over current develop.
* Improved performance for *all* cores compared to the develop branch before TX2's patch (9% ~ 36%).
* ThunderX1 builds are 14% faster than ARMv8 on TX1, 9% faster than the current develop branch and 8% faster than develop before the TX2 patches.

Issues:

* Regression from the current develop branch for A53 (-12%) and A57 (-3%) with ARMv8 builds, but still faster than before TX2's commit (+15% and +24% respectively). This can be improved with a simplification of TX2's code, to be done in future patches. At least the code is guaranteed to be ARMv8.0 now.

Comments:

* CortexA57 builds are unchanged on A57 hardware from the develop branch, which makes sense, as it's untouched.
* CortexA72 builds improve over A57 on A72 hardware, even though they use the same includes, due to new compiler tuning in the makefile.
6 years ago
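
The commit message describes sharing one set of ARMv8 defines across cores, with each core adding only its own settings on top. The file contents are not shown here, so the following is only a minimal sketch of that general preprocessor pattern. Every `DEMO_*` macro, every `demo_*` function and every numeric value is hypothetical and not taken from param.h.

```c
#include <assert.h>

/* Hypothetical core selection. In a real build this would normally come
   from the build system (for example a -D flag), not from the header. */
#define DEMO_CORE_THUNDERX 1

/* Per-core section: each core states only what differs from generic ARMv8.
   The values below are made up for illustration. */
#if defined(DEMO_CORE_THUNDERX)
#define DEMO_CORE_NAME "thunderx"
#define DEMO_GEMM_P 128
#elif defined(DEMO_CORE_CORTEXA72)
#define DEMO_CORE_NAME "cortexa72"
#endif

/* Common ARMv8 section: conservative defaults, applied only where the
   per-core section above did not already define a value. */
#ifndef DEMO_CORE_NAME
#define DEMO_CORE_NAME "armv8"
#endif
#ifndef DEMO_GEMM_P
#define DEMO_GEMM_P 64
#endif
#ifndef DEMO_GEMM_Q
#define DEMO_GEMM_Q 256
#endif

/* Compile-time sanity check, so a bad override fails the build early. */
static_assert(DEMO_GEMM_P > 0 && DEMO_GEMM_Q > 0,
              "blocking parameters must be positive");

/* Small accessors so the selected values can be inspected at run time. */
const char *demo_core_name(void) { return DEMO_CORE_NAME; }
int demo_gemm_p(void) { return DEMO_GEMM_P; }
int demo_gemm_q(void) { return DEMO_GEMM_Q; }
```

With `DEMO_CORE_THUNDERX` defined, `demo_gemm_p()` yields the per-core override while `demo_gemm_q()` falls back to the shared default. Under this arrangement, a core with no section of its own still gets a complete set of parameters.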
12 years ago
Simplifying ARMv8 build parameters ARMv8 builds were a bit mixed up, with ThunderX2 code in ARMv8 mode (which is not right because TX2 is ARMv8.1) as well as requiring a few redundancies in the defines, making it harder to maintain and understand what core has what. A few other minor issues were also fixed. Tests were made on the following cores: A53, A57, A72, Falkor, ThunderX, ThunderX2, and XGene. Tests were: OpenBLAS/test, OpenBLAS/benchmark, BLAS-Tester. A summary: * Removed TX2 code from ARMv8 build, to make sure it is compatible with all ARMv8 cores, not just v8.1. Also, the TX2 code has actually harmed performance on big cores. * Commoned up ARMv8 architectures' defines in params.h, to make sure that all will benefit from ARMv8 settings, in addition to their own. * Adding a few more cores, using ARMv8's include strategy, to benefit from compiler optimisations using mtune. Also updated cache information from the manuals, making sure we set good conservative values by default. Removed Vulcan, as it's an alias to TX2. * Auto-detecting most of those cores, but also updating the forced compilation in getarch.c, to make sure the parameters are the same whether compiled natively or forced arch. Benefits: * ARMv8 build is now guaranteed to work on all ARMv8 cores * Improved performance for ARMv8 builds on some cores (A72, Falkor, ThunderX1 and 2: up to 11%) over current develop * Improved performance for *all* cores comparing to develop branch before TX2's patch (9% ~ 36%) * ThunderX1 builds are 14% faster than ARMv8 on TX1, 9% faster than current develop's branch and 8% faster than deveop before tx2 patches Issues: * Regression from current develop branch for A53 (-12%) and A57 (-3%) with ARMv8 builds, but still faster than before TX2's commit (+15% and +24% respectively). This can be improved with a simplification of TX2's code, to be done in future patches. At least the code is guaranteed to be ARMv8.0 now. 
Comments: * CortexA57 builds are unchanged on A57 hardware from develop's branch, which makes sense, as it's untouched. * CortexA72 builds improve over A57 on A72 hardware, even if they're using the same includes due to new compiler tunning in the makefile.
6 years ago
Simplifying ARMv8 build parameters ARMv8 builds were a bit mixed up, with ThunderX2 code in ARMv8 mode (which is not right because TX2 is ARMv8.1) as well as requiring a few redundancies in the defines, making it harder to maintain and understand what core has what. A few other minor issues were also fixed. Tests were made on the following cores: A53, A57, A72, Falkor, ThunderX, ThunderX2, and XGene. Tests were: OpenBLAS/test, OpenBLAS/benchmark, BLAS-Tester. A summary: * Removed TX2 code from ARMv8 build, to make sure it is compatible with all ARMv8 cores, not just v8.1. Also, the TX2 code has actually harmed performance on big cores. * Commoned up ARMv8 architectures' defines in params.h, to make sure that all will benefit from ARMv8 settings, in addition to their own. * Adding a few more cores, using ARMv8's include strategy, to benefit from compiler optimisations using mtune. Also updated cache information from the manuals, making sure we set good conservative values by default. Removed Vulcan, as it's an alias to TX2. * Auto-detecting most of those cores, but also updating the forced compilation in getarch.c, to make sure the parameters are the same whether compiled natively or forced arch. Benefits: * ARMv8 build is now guaranteed to work on all ARMv8 cores * Improved performance for ARMv8 builds on some cores (A72, Falkor, ThunderX1 and 2: up to 11%) over current develop * Improved performance for *all* cores comparing to develop branch before TX2's patch (9% ~ 36%) * ThunderX1 builds are 14% faster than ARMv8 on TX1, 9% faster than current develop's branch and 8% faster than deveop before tx2 patches Issues: * Regression from current develop branch for A53 (-12%) and A57 (-3%) with ARMv8 builds, but still faster than before TX2's commit (+15% and +24% respectively). This can be improved with a simplification of TX2's code, to be done in future patches. At least the code is guaranteed to be ARMv8.0 now. 
Comments: * CortexA57 builds are unchanged on A57 hardware from develop's branch, which makes sense, as it's untouched. * CortexA72 builds improve over A57 on A72 hardware, even if they're using the same includes due to new compiler tunning in the makefile.
6 years ago
12 years ago
12 years ago
Simplifying ARMv8 build parameters ARMv8 builds were a bit mixed up, with ThunderX2 code in ARMv8 mode (which is not right because TX2 is ARMv8.1) as well as requiring a few redundancies in the defines, making it harder to maintain and understand what core has what. A few other minor issues were also fixed. Tests were made on the following cores: A53, A57, A72, Falkor, ThunderX, ThunderX2, and XGene. Tests were: OpenBLAS/test, OpenBLAS/benchmark, BLAS-Tester. A summary: * Removed TX2 code from ARMv8 build, to make sure it is compatible with all ARMv8 cores, not just v8.1. Also, the TX2 code has actually harmed performance on big cores. * Commoned up ARMv8 architectures' defines in params.h, to make sure that all will benefit from ARMv8 settings, in addition to their own. * Adding a few more cores, using ARMv8's include strategy, to benefit from compiler optimisations using mtune. Also updated cache information from the manuals, making sure we set good conservative values by default. Removed Vulcan, as it's an alias to TX2. * Auto-detecting most of those cores, but also updating the forced compilation in getarch.c, to make sure the parameters are the same whether compiled natively or forced arch. Benefits: * ARMv8 build is now guaranteed to work on all ARMv8 cores * Improved performance for ARMv8 builds on some cores (A72, Falkor, ThunderX1 and 2: up to 11%) over current develop * Improved performance for *all* cores comparing to develop branch before TX2's patch (9% ~ 36%) * ThunderX1 builds are 14% faster than ARMv8 on TX1, 9% faster than current develop's branch and 8% faster than deveop before tx2 patches Issues: * Regression from current develop branch for A53 (-12%) and A57 (-3%) with ARMv8 builds, but still faster than before TX2's commit (+15% and +24% respectively). This can be improved with a simplification of TX2's code, to be done in future patches. At least the code is guaranteed to be ARMv8.0 now. 
Comments: * CortexA57 builds are unchanged on A57 hardware from develop's branch, which makes sense, as it's untouched. * CortexA72 builds improve over A57 on A72 hardware, even if they're using the same includes due to new compiler tunning in the makefile.
6 years ago
12 years ago
Simplifying ARMv8 build parameters ARMv8 builds were a bit mixed up, with ThunderX2 code in ARMv8 mode (which is not right because TX2 is ARMv8.1) as well as requiring a few redundancies in the defines, making it harder to maintain and understand what core has what. A few other minor issues were also fixed. Tests were made on the following cores: A53, A57, A72, Falkor, ThunderX, ThunderX2, and XGene. Tests were: OpenBLAS/test, OpenBLAS/benchmark, BLAS-Tester. A summary: * Removed TX2 code from ARMv8 build, to make sure it is compatible with all ARMv8 cores, not just v8.1. Also, the TX2 code has actually harmed performance on big cores. * Commoned up ARMv8 architectures' defines in params.h, to make sure that all will benefit from ARMv8 settings, in addition to their own. * Adding a few more cores, using ARMv8's include strategy, to benefit from compiler optimisations using mtune. Also updated cache information from the manuals, making sure we set good conservative values by default. Removed Vulcan, as it's an alias to TX2. * Auto-detecting most of those cores, but also updating the forced compilation in getarch.c, to make sure the parameters are the same whether compiled natively or forced arch. Benefits: * ARMv8 build is now guaranteed to work on all ARMv8 cores * Improved performance for ARMv8 builds on some cores (A72, Falkor, ThunderX1 and 2: up to 11%) over current develop * Improved performance for *all* cores comparing to develop branch before TX2's patch (9% ~ 36%) * ThunderX1 builds are 14% faster than ARMv8 on TX1, 9% faster than current develop's branch and 8% faster than deveop before tx2 patches Issues: * Regression from current develop branch for A53 (-12%) and A57 (-3%) with ARMv8 builds, but still faster than before TX2's commit (+15% and +24% respectively). This can be improved with a simplification of TX2's code, to be done in future patches. At least the code is guaranteed to be ARMv8.0 now. 
Comments: * CortexA57 builds are unchanged on A57 hardware from develop's branch, which makes sense, as it's untouched. * CortexA72 builds improve over A57 on A72 hardware, even if they're using the same includes due to new compiler tunning in the makefile.
6 years ago
Simplifying ARMv8 build parameters ARMv8 builds were a bit mixed up, with ThunderX2 code in ARMv8 mode (which is not right because TX2 is ARMv8.1) as well as requiring a few redundancies in the defines, making it harder to maintain and understand what core has what. A few other minor issues were also fixed. Tests were made on the following cores: A53, A57, A72, Falkor, ThunderX, ThunderX2, and XGene. Tests were: OpenBLAS/test, OpenBLAS/benchmark, BLAS-Tester. A summary: * Removed TX2 code from ARMv8 build, to make sure it is compatible with all ARMv8 cores, not just v8.1. Also, the TX2 code has actually harmed performance on big cores. * Commoned up ARMv8 architectures' defines in params.h, to make sure that all will benefit from ARMv8 settings, in addition to their own. * Adding a few more cores, using ARMv8's include strategy, to benefit from compiler optimisations using mtune. Also updated cache information from the manuals, making sure we set good conservative values by default. Removed Vulcan, as it's an alias to TX2. * Auto-detecting most of those cores, but also updating the forced compilation in getarch.c, to make sure the parameters are the same whether compiled natively or forced arch. Benefits: * ARMv8 build is now guaranteed to work on all ARMv8 cores * Improved performance for ARMv8 builds on some cores (A72, Falkor, ThunderX1 and 2: up to 11%) over current develop * Improved performance for *all* cores comparing to develop branch before TX2's patch (9% ~ 36%) * ThunderX1 builds are 14% faster than ARMv8 on TX1, 9% faster than current develop's branch and 8% faster than deveop before tx2 patches Issues: * Regression from current develop branch for A53 (-12%) and A57 (-3%) with ARMv8 builds, but still faster than before TX2's commit (+15% and +24% respectively). This can be improved with a simplification of TX2's code, to be done in future patches. At least the code is guaranteed to be ARMv8.0 now. 
Comments: * CortexA57 builds are unchanged on A57 hardware from develop's branch, which makes sense, as it's untouched. * CortexA72 builds improve over A57 on A72 hardware, even if they're using the same includes due to new compiler tunning in the makefile.
6 years ago
Simplifying ARMv8 build parameters ARMv8 builds were a bit mixed up, with ThunderX2 code in ARMv8 mode (which is not right because TX2 is ARMv8.1) as well as requiring a few redundancies in the defines, making it harder to maintain and understand what core has what. A few other minor issues were also fixed. Tests were made on the following cores: A53, A57, A72, Falkor, ThunderX, ThunderX2, and XGene. Tests were: OpenBLAS/test, OpenBLAS/benchmark, BLAS-Tester. A summary: * Removed TX2 code from ARMv8 build, to make sure it is compatible with all ARMv8 cores, not just v8.1. Also, the TX2 code has actually harmed performance on big cores. * Commoned up ARMv8 architectures' defines in params.h, to make sure that all will benefit from ARMv8 settings, in addition to their own. * Adding a few more cores, using ARMv8's include strategy, to benefit from compiler optimisations using mtune. Also updated cache information from the manuals, making sure we set good conservative values by default. Removed Vulcan, as it's an alias to TX2. * Auto-detecting most of those cores, but also updating the forced compilation in getarch.c, to make sure the parameters are the same whether compiled natively or forced arch. Benefits: * ARMv8 build is now guaranteed to work on all ARMv8 cores * Improved performance for ARMv8 builds on some cores (A72, Falkor, ThunderX1 and 2: up to 11%) over current develop * Improved performance for *all* cores comparing to develop branch before TX2's patch (9% ~ 36%) * ThunderX1 builds are 14% faster than ARMv8 on TX1, 9% faster than current develop's branch and 8% faster than deveop before tx2 patches Issues: * Regression from current develop branch for A53 (-12%) and A57 (-3%) with ARMv8 builds, but still faster than before TX2's commit (+15% and +24% respectively). This can be improved with a simplification of TX2's code, to be done in future patches. At least the code is guaranteed to be ARMv8.0 now. 
Comments: * CortexA57 builds are unchanged on A57 hardware from develop's branch, which makes sense, as it's untouched. * CortexA72 builds improve over A57 on A72 hardware, even if they're using the same includes due to new compiler tunning in the makefile.
6 years ago
Simplifying ARMv8 build parameters ARMv8 builds were a bit mixed up, with ThunderX2 code in ARMv8 mode (which is not right because TX2 is ARMv8.1) as well as requiring a few redundancies in the defines, making it harder to maintain and understand what core has what. A few other minor issues were also fixed. Tests were made on the following cores: A53, A57, A72, Falkor, ThunderX, ThunderX2, and XGene. Tests were: OpenBLAS/test, OpenBLAS/benchmark, BLAS-Tester. A summary: * Removed TX2 code from ARMv8 build, to make sure it is compatible with all ARMv8 cores, not just v8.1. Also, the TX2 code has actually harmed performance on big cores. * Commoned up ARMv8 architectures' defines in params.h, to make sure that all will benefit from ARMv8 settings, in addition to their own. * Adding a few more cores, using ARMv8's include strategy, to benefit from compiler optimisations using mtune. Also updated cache information from the manuals, making sure we set good conservative values by default. Removed Vulcan, as it's an alias to TX2. * Auto-detecting most of those cores, but also updating the forced compilation in getarch.c, to make sure the parameters are the same whether compiled natively or forced arch. Benefits: * ARMv8 build is now guaranteed to work on all ARMv8 cores * Improved performance for ARMv8 builds on some cores (A72, Falkor, ThunderX1 and 2: up to 11%) over current develop * Improved performance for *all* cores comparing to develop branch before TX2's patch (9% ~ 36%) * ThunderX1 builds are 14% faster than ARMv8 on TX1, 9% faster than current develop's branch and 8% faster than deveop before tx2 patches Issues: * Regression from current develop branch for A53 (-12%) and A57 (-3%) with ARMv8 builds, but still faster than before TX2's commit (+15% and +24% respectively). This can be improved with a simplification of TX2's code, to be done in future patches. At least the code is guaranteed to be ARMv8.0 now. 
Comments: * CortexA57 builds are unchanged on A57 hardware from develop's branch, which makes sense, as it's untouched. * CortexA72 builds improve over A57 on A72 hardware, even if they're using the same includes due to new compiler tunning in the makefile.
6 years ago
Simplifying ARMv8 build parameters ARMv8 builds were a bit mixed up, with ThunderX2 code in ARMv8 mode (which is not right because TX2 is ARMv8.1) as well as requiring a few redundancies in the defines, making it harder to maintain and understand what core has what. A few other minor issues were also fixed. Tests were made on the following cores: A53, A57, A72, Falkor, ThunderX, ThunderX2, and XGene. Tests were: OpenBLAS/test, OpenBLAS/benchmark, BLAS-Tester. A summary: * Removed TX2 code from ARMv8 build, to make sure it is compatible with all ARMv8 cores, not just v8.1. Also, the TX2 code has actually harmed performance on big cores. * Commoned up ARMv8 architectures' defines in params.h, to make sure that all will benefit from ARMv8 settings, in addition to their own. * Adding a few more cores, using ARMv8's include strategy, to benefit from compiler optimisations using mtune. Also updated cache information from the manuals, making sure we set good conservative values by default. Removed Vulcan, as it's an alias to TX2. * Auto-detecting most of those cores, but also updating the forced compilation in getarch.c, to make sure the parameters are the same whether compiled natively or forced arch. Benefits: * ARMv8 build is now guaranteed to work on all ARMv8 cores * Improved performance for ARMv8 builds on some cores (A72, Falkor, ThunderX1 and 2: up to 11%) over current develop * Improved performance for *all* cores comparing to develop branch before TX2's patch (9% ~ 36%) * ThunderX1 builds are 14% faster than ARMv8 on TX1, 9% faster than current develop's branch and 8% faster than deveop before tx2 patches Issues: * Regression from current develop branch for A53 (-12%) and A57 (-3%) with ARMv8 builds, but still faster than before TX2's commit (+15% and +24% respectively). This can be improved with a simplification of TX2's code, to be done in future patches. At least the code is guaranteed to be ARMv8.0 now. 
Comments: * CortexA57 builds are unchanged on A57 hardware from develop's branch, which makes sense, as it's untouched. * CortexA72 builds improve over A57 on A72 hardware, even if they're using the same includes due to new compiler tunning in the makefile.
6 years ago
Simplifying ARMv8 build parameters ARMv8 builds were a bit mixed up, with ThunderX2 code in ARMv8 mode (which is not right because TX2 is ARMv8.1) as well as requiring a few redundancies in the defines, making it harder to maintain and understand what core has what. A few other minor issues were also fixed. Tests were made on the following cores: A53, A57, A72, Falkor, ThunderX, ThunderX2, and XGene. Tests were: OpenBLAS/test, OpenBLAS/benchmark, BLAS-Tester. A summary: * Removed TX2 code from ARMv8 build, to make sure it is compatible with all ARMv8 cores, not just v8.1. Also, the TX2 code has actually harmed performance on big cores. * Commoned up ARMv8 architectures' defines in params.h, to make sure that all will benefit from ARMv8 settings, in addition to their own. * Adding a few more cores, using ARMv8's include strategy, to benefit from compiler optimisations using mtune. Also updated cache information from the manuals, making sure we set good conservative values by default. Removed Vulcan, as it's an alias to TX2. * Auto-detecting most of those cores, but also updating the forced compilation in getarch.c, to make sure the parameters are the same whether compiled natively or forced arch. Benefits: * ARMv8 build is now guaranteed to work on all ARMv8 cores * Improved performance for ARMv8 builds on some cores (A72, Falkor, ThunderX1 and 2: up to 11%) over current develop * Improved performance for *all* cores comparing to develop branch before TX2's patch (9% ~ 36%) * ThunderX1 builds are 14% faster than ARMv8 on TX1, 9% faster than current develop's branch and 8% faster than deveop before tx2 patches Issues: * Regression from current develop branch for A53 (-12%) and A57 (-3%) with ARMv8 builds, but still faster than before TX2's commit (+15% and +24% respectively). This can be improved with a simplification of TX2's code, to be done in future patches. At least the code is guaranteed to be ARMv8.0 now. 
Comments: * CortexA57 builds are unchanged on A57 hardware from develop's branch, which makes sense, as it's untouched. * CortexA72 builds improve over A57 on A72 hardware, even if they're using the same includes due to new compiler tunning in the makefile.
6 years ago
Simplifying ARMv8 build parameters ARMv8 builds were a bit mixed up, with ThunderX2 code in ARMv8 mode (which is not right because TX2 is ARMv8.1) as well as requiring a few redundancies in the defines, making it harder to maintain and understand what core has what. A few other minor issues were also fixed. Tests were made on the following cores: A53, A57, A72, Falkor, ThunderX, ThunderX2, and XGene. Tests were: OpenBLAS/test, OpenBLAS/benchmark, BLAS-Tester. A summary: * Removed TX2 code from ARMv8 build, to make sure it is compatible with all ARMv8 cores, not just v8.1. Also, the TX2 code has actually harmed performance on big cores. * Commoned up ARMv8 architectures' defines in params.h, to make sure that all will benefit from ARMv8 settings, in addition to their own. * Adding a few more cores, using ARMv8's include strategy, to benefit from compiler optimisations using mtune. Also updated cache information from the manuals, making sure we set good conservative values by default. Removed Vulcan, as it's an alias to TX2. * Auto-detecting most of those cores, but also updating the forced compilation in getarch.c, to make sure the parameters are the same whether compiled natively or forced arch. Benefits: * ARMv8 build is now guaranteed to work on all ARMv8 cores * Improved performance for ARMv8 builds on some cores (A72, Falkor, ThunderX1 and 2: up to 11%) over current develop * Improved performance for *all* cores comparing to develop branch before TX2's patch (9% ~ 36%) * ThunderX1 builds are 14% faster than ARMv8 on TX1, 9% faster than current develop's branch and 8% faster than deveop before tx2 patches Issues: * Regression from current develop branch for A53 (-12%) and A57 (-3%) with ARMv8 builds, but still faster than before TX2's commit (+15% and +24% respectively). This can be improved with a simplification of TX2's code, to be done in future patches. At least the code is guaranteed to be ARMv8.0 now. 
Comments: * CortexA57 builds are unchanged on A57 hardware from develop's branch, which makes sense, as it's untouched. * CortexA72 builds improve over A57 on A72 hardware, even if they're using the same includes due to new compiler tunning in the makefile.
6 years ago
Simplifying ARMv8 build parameters ARMv8 builds were a bit mixed up, with ThunderX2 code in ARMv8 mode (which is not right because TX2 is ARMv8.1) as well as requiring a few redundancies in the defines, making it harder to maintain and understand what core has what. A few other minor issues were also fixed. Tests were made on the following cores: A53, A57, A72, Falkor, ThunderX, ThunderX2, and XGene. Tests were: OpenBLAS/test, OpenBLAS/benchmark, BLAS-Tester. A summary: * Removed TX2 code from ARMv8 build, to make sure it is compatible with all ARMv8 cores, not just v8.1. Also, the TX2 code has actually harmed performance on big cores. * Commoned up ARMv8 architectures' defines in params.h, to make sure that all will benefit from ARMv8 settings, in addition to their own. * Adding a few more cores, using ARMv8's include strategy, to benefit from compiler optimisations using mtune. Also updated cache information from the manuals, making sure we set good conservative values by default. Removed Vulcan, as it's an alias to TX2. * Auto-detecting most of those cores, but also updating the forced compilation in getarch.c, to make sure the parameters are the same whether compiled natively or forced arch. Benefits: * ARMv8 build is now guaranteed to work on all ARMv8 cores * Improved performance for ARMv8 builds on some cores (A72, Falkor, ThunderX1 and 2: up to 11%) over current develop * Improved performance for *all* cores comparing to develop branch before TX2's patch (9% ~ 36%) * ThunderX1 builds are 14% faster than ARMv8 on TX1, 9% faster than current develop's branch and 8% faster than deveop before tx2 patches Issues: * Regression from current develop branch for A53 (-12%) and A57 (-3%) with ARMv8 builds, but still faster than before TX2's commit (+15% and +24% respectively). This can be improved with a simplification of TX2's code, to be done in future patches. At least the code is guaranteed to be ARMv8.0 now. 
Comments: * CortexA57 builds are unchanged on A57 hardware from develop's branch, which makes sense, as it's untouched. * CortexA72 builds improve over A57 on A72 hardware, even if they're using the same includes due to new compiler tunning in the makefile.
6 years ago
Simplifying ARMv8 build parameters ARMv8 builds were a bit mixed up, with ThunderX2 code in ARMv8 mode (which is not right because TX2 is ARMv8.1) as well as requiring a few redundancies in the defines, making it harder to maintain and understand what core has what. A few other minor issues were also fixed. Tests were made on the following cores: A53, A57, A72, Falkor, ThunderX, ThunderX2, and XGene. Tests were: OpenBLAS/test, OpenBLAS/benchmark, BLAS-Tester. A summary: * Removed TX2 code from ARMv8 build, to make sure it is compatible with all ARMv8 cores, not just v8.1. Also, the TX2 code has actually harmed performance on big cores. * Commoned up ARMv8 architectures' defines in params.h, to make sure that all will benefit from ARMv8 settings, in addition to their own. * Adding a few more cores, using ARMv8's include strategy, to benefit from compiler optimisations using mtune. Also updated cache information from the manuals, making sure we set good conservative values by default. Removed Vulcan, as it's an alias to TX2. * Auto-detecting most of those cores, but also updating the forced compilation in getarch.c, to make sure the parameters are the same whether compiled natively or forced arch. Benefits: * ARMv8 build is now guaranteed to work on all ARMv8 cores * Improved performance for ARMv8 builds on some cores (A72, Falkor, ThunderX1 and 2: up to 11%) over current develop * Improved performance for *all* cores comparing to develop branch before TX2's patch (9% ~ 36%) * ThunderX1 builds are 14% faster than ARMv8 on TX1, 9% faster than current develop's branch and 8% faster than deveop before tx2 patches Issues: * Regression from current develop branch for A53 (-12%) and A57 (-3%) with ARMv8 builds, but still faster than before TX2's commit (+15% and +24% respectively). This can be improved with a simplification of TX2's code, to be done in future patches. At least the code is guaranteed to be ARMv8.0 now. 
Comments: * CortexA57 builds are unchanged on A57 hardware from develop's branch, which makes sense, as it's untouched. * CortexA72 builds improve over A57 on A72 hardware, even if they're using the same includes due to new compiler tunning in the makefile.
6 years ago
Simplifying ARMv8 build parameters ARMv8 builds were a bit mixed up, with ThunderX2 code in ARMv8 mode (which is not right because TX2 is ARMv8.1) as well as requiring a few redundancies in the defines, making it harder to maintain and understand what core has what. A few other minor issues were also fixed. Tests were made on the following cores: A53, A57, A72, Falkor, ThunderX, ThunderX2, and XGene. Tests were: OpenBLAS/test, OpenBLAS/benchmark, BLAS-Tester. A summary: * Removed TX2 code from ARMv8 build, to make sure it is compatible with all ARMv8 cores, not just v8.1. Also, the TX2 code has actually harmed performance on big cores. * Commoned up ARMv8 architectures' defines in params.h, to make sure that all will benefit from ARMv8 settings, in addition to their own. * Adding a few more cores, using ARMv8's include strategy, to benefit from compiler optimisations using mtune. Also updated cache information from the manuals, making sure we set good conservative values by default. Removed Vulcan, as it's an alias to TX2. * Auto-detecting most of those cores, but also updating the forced compilation in getarch.c, to make sure the parameters are the same whether compiled natively or forced arch. Benefits: * ARMv8 build is now guaranteed to work on all ARMv8 cores * Improved performance for ARMv8 builds on some cores (A72, Falkor, ThunderX1 and 2: up to 11%) over current develop * Improved performance for *all* cores comparing to develop branch before TX2's patch (9% ~ 36%) * ThunderX1 builds are 14% faster than ARMv8 on TX1, 9% faster than current develop's branch and 8% faster than deveop before tx2 patches Issues: * Regression from current develop branch for A53 (-12%) and A57 (-3%) with ARMv8 builds, but still faster than before TX2's commit (+15% and +24% respectively). This can be improved with a simplification of TX2's code, to be done in future patches. At least the code is guaranteed to be ARMv8.0 now. 
Comments: * CortexA57 builds are unchanged on A57 hardware from develop's branch, which makes sense, as it's untouched. * CortexA72 builds improve over A57 on A72 hardware, even if they're using the same includes due to new compiler tunning in the makefile.
/*****************************************************************************
Copyright (c) 2011-2014, The OpenBLAS Project
All rights reserved.
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are
met:
1. Redistributions of source code must retain the above copyright
notice, this list of conditions and the following disclaimer.
2. Redistributions in binary form must reproduce the above copyright
notice, this list of conditions and the following disclaimer in
the documentation and/or other materials provided with the
distribution.
3. Neither the name of the OpenBLAS project nor the names of
its contributors may be used to endorse or promote products
derived from this software without specific prior written
permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE
LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE
USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
**********************************************************************************/
/*********************************************************************/
/* Copyright 2009, 2010 The University of Texas at Austin. */
/* All rights reserved. */
/* */
/* Redistribution and use in source and binary forms, with or */
/* without modification, are permitted provided that the following */
/* conditions are met: */
/* */
/* 1. Redistributions of source code must retain the above */
/* copyright notice, this list of conditions and the following */
/* disclaimer. */
/* */
/* 2. Redistributions in binary form must reproduce the above */
/* copyright notice, this list of conditions and the following */
/* disclaimer in the documentation and/or other materials */
/* provided with the distribution. */
/* */
/* THIS SOFTWARE IS PROVIDED BY THE UNIVERSITY OF TEXAS AT */
/* AUSTIN ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, */
/* INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF */
/* MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE */
/* DISCLAIMED. IN NO EVENT SHALL THE UNIVERSITY OF TEXAS AT */
/* AUSTIN OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, */
/* INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES */
/* (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE */
/* GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR */
/* BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF */
/* LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT */
/* (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT */
/* OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE */
/* POSSIBILITY OF SUCH DAMAGE. */
/* */
/* The views and conclusions contained in the software and */
/* documentation are those of the authors and should not be */
/* interpreted as representing official policies, either expressed */
/* or implied, of The University of Texas at Austin. */
/*********************************************************************/
#ifndef PARAM_H
#define PARAM_H
#define LONGCAST (BLASLONG)
#if defined(__BYTE_ORDER__)
#if __GNUC__ < 9
#undef LONGCAST
#define LONGCAST
#endif
#endif
#define SBGEMM_DEFAULT_UNROLL_N 4
#define SBGEMM_DEFAULT_UNROLL_M 8
#define SBGEMM_DEFAULT_UNROLL_MN 32
#define SBGEMM_DEFAULT_P 256
#define SBGEMM_DEFAULT_R 256
#define SBGEMM_DEFAULT_Q 256
#ifdef OPTERON
#define SNUMOPT 4
#define DNUMOPT 2
#define GEMM_DEFAULT_OFFSET_A 64
#define GEMM_DEFAULT_OFFSET_B 256
#define GEMM_DEFAULT_ALIGN (BLASLONG)0x01ffffUL
#define SGEMM_DEFAULT_UNROLL_N 4
#define DGEMM_DEFAULT_UNROLL_N 4
#define QGEMM_DEFAULT_UNROLL_N 2
#define CGEMM_DEFAULT_UNROLL_N 2
#define ZGEMM_DEFAULT_UNROLL_N 2
#define XGEMM_DEFAULT_UNROLL_N 1
#ifdef ARCH_X86
#define SGEMM_DEFAULT_UNROLL_M 4
#define DGEMM_DEFAULT_UNROLL_M 2
#define QGEMM_DEFAULT_UNROLL_M 2
#define CGEMM_DEFAULT_UNROLL_M 2
#define ZGEMM_DEFAULT_UNROLL_M 1
#define XGEMM_DEFAULT_UNROLL_M 1
#else
#define SGEMM_DEFAULT_UNROLL_M 8
#define DGEMM_DEFAULT_UNROLL_M 4
#define QGEMM_DEFAULT_UNROLL_M 2
#define CGEMM_DEFAULT_UNROLL_M 4
#define ZGEMM_DEFAULT_UNROLL_M 2
#define XGEMM_DEFAULT_UNROLL_M 1
#endif
#define SGEMM_DEFAULT_P sgemm_p
#define DGEMM_DEFAULT_P dgemm_p
#define QGEMM_DEFAULT_P qgemm_p
#define CGEMM_DEFAULT_P cgemm_p
#define ZGEMM_DEFAULT_P zgemm_p
#define XGEMM_DEFAULT_P xgemm_p
#define SGEMM_DEFAULT_R sgemm_r
#define DGEMM_DEFAULT_R dgemm_r
#define QGEMM_DEFAULT_R qgemm_r
#define CGEMM_DEFAULT_R cgemm_r
#define ZGEMM_DEFAULT_R zgemm_r
#define XGEMM_DEFAULT_R xgemm_r
#ifdef ALLOC_HUGETLB
#define SGEMM_DEFAULT_Q 248
#define DGEMM_DEFAULT_Q 248
#define QGEMM_DEFAULT_Q 248
#define CGEMM_DEFAULT_Q 248
#define ZGEMM_DEFAULT_Q 248
#define XGEMM_DEFAULT_Q 248
#else
#define SGEMM_DEFAULT_Q 240
#define DGEMM_DEFAULT_Q 240
#define QGEMM_DEFAULT_Q 240
#define CGEMM_DEFAULT_Q 240
#define ZGEMM_DEFAULT_Q 240
#define XGEMM_DEFAULT_Q 240
#endif
#define SYMV_P 16
#define HAVE_EXCLUSIVE_CACHE
#endif
#if defined(BARCELONA) || defined(SHANGHAI) || defined(BOBCAT)
#define SNUMOPT 8
#define DNUMOPT 4
#define GEMM_DEFAULT_OFFSET_A 64
#define GEMM_DEFAULT_OFFSET_B 832
#define GEMM_DEFAULT_ALIGN (BLASLONG)0x0fffUL
#define SGEMM_DEFAULT_UNROLL_N 4
#define DGEMM_DEFAULT_UNROLL_N 4
#define QGEMM_DEFAULT_UNROLL_N 2
#define CGEMM_DEFAULT_UNROLL_N 2
#define ZGEMM_DEFAULT_UNROLL_N 2
#define XGEMM_DEFAULT_UNROLL_N 1
#ifdef ARCH_X86
#define SGEMM_DEFAULT_UNROLL_M 4
#define DGEMM_DEFAULT_UNROLL_M 2
#define QGEMM_DEFAULT_UNROLL_M 2
#define CGEMM_DEFAULT_UNROLL_M 2
#define ZGEMM_DEFAULT_UNROLL_M 1
#define XGEMM_DEFAULT_UNROLL_M 1
#else
#define SGEMM_DEFAULT_UNROLL_M 8
#define DGEMM_DEFAULT_UNROLL_M 4
#define QGEMM_DEFAULT_UNROLL_M 2
#define CGEMM_DEFAULT_UNROLL_M 4
#define ZGEMM_DEFAULT_UNROLL_M 2
#define XGEMM_DEFAULT_UNROLL_M 1
#endif
#if 0
#define SGEMM_DEFAULT_P 496
#define DGEMM_DEFAULT_P 248
#define QGEMM_DEFAULT_P 124
#define CGEMM_DEFAULT_P 248
#define ZGEMM_DEFAULT_P 124
#define XGEMM_DEFAULT_P 62
#define SGEMM_DEFAULT_Q 248
#define DGEMM_DEFAULT_Q 248
#define QGEMM_DEFAULT_Q 248
#define CGEMM_DEFAULT_Q 248
#define ZGEMM_DEFAULT_Q 248
#define XGEMM_DEFAULT_Q 248
#else
#define SGEMM_DEFAULT_P 448
#define DGEMM_DEFAULT_P 224
#define QGEMM_DEFAULT_P 112
#define CGEMM_DEFAULT_P 224
#define ZGEMM_DEFAULT_P 112
#define XGEMM_DEFAULT_P 56
#define SGEMM_DEFAULT_Q 224
#define DGEMM_DEFAULT_Q 224
#define QGEMM_DEFAULT_Q 224
#define CGEMM_DEFAULT_Q 224
#define ZGEMM_DEFAULT_Q 224
#define XGEMM_DEFAULT_Q 224
#endif
#define SGEMM_DEFAULT_R sgemm_r
#define QGEMM_DEFAULT_R qgemm_r
#define DGEMM_DEFAULT_R dgemm_r
#define CGEMM_DEFAULT_R cgemm_r
#define ZGEMM_DEFAULT_R zgemm_r
#define XGEMM_DEFAULT_R xgemm_r
#define SYMV_P 16
#define HAVE_EXCLUSIVE_CACHE
#define GEMM_THREAD gemm_thread_mn
#endif
#ifdef BULLDOZER
#define SNUMOPT 8
#define DNUMOPT 4
#define GEMM_DEFAULT_OFFSET_A 64
#define GEMM_DEFAULT_OFFSET_B 832
#define GEMM_DEFAULT_ALIGN (BLASLONG)0x0fffUL
#define QGEMM_DEFAULT_UNROLL_N 2
#define CGEMM_DEFAULT_UNROLL_N 2
#define ZGEMM_DEFAULT_UNROLL_N 2
#define XGEMM_DEFAULT_UNROLL_N 1
#ifdef ARCH_X86
#define SGEMM_DEFAULT_UNROLL_N 4
#define DGEMM_DEFAULT_UNROLL_N 4
#define SGEMM_DEFAULT_UNROLL_M 4
#define DGEMM_DEFAULT_UNROLL_M 2
#define QGEMM_DEFAULT_UNROLL_M 2
#define CGEMM_DEFAULT_UNROLL_M 2
#define ZGEMM_DEFAULT_UNROLL_M 1
#define XGEMM_DEFAULT_UNROLL_M 1
#else
#define SGEMM_DEFAULT_UNROLL_N 2
#define DGEMM_DEFAULT_UNROLL_N 2
#define SGEMM_DEFAULT_UNROLL_M 16
#define DGEMM_DEFAULT_UNROLL_M 8
#define QGEMM_DEFAULT_UNROLL_M 2
#define CGEMM_DEFAULT_UNROLL_M 4
#define ZGEMM_DEFAULT_UNROLL_M 2
#define XGEMM_DEFAULT_UNROLL_M 1
#define CGEMM3M_DEFAULT_UNROLL_N 4
#define CGEMM3M_DEFAULT_UNROLL_M 8
#define ZGEMM3M_DEFAULT_UNROLL_N 4
#define ZGEMM3M_DEFAULT_UNROLL_M 4
#define DGEMM_DEFAULT_UNROLL_MN 16
#define GEMV_UNROLL 8
#endif
#if defined(ARCH_X86_64)
#define SGEMM_DEFAULT_P 768
#define DGEMM_DEFAULT_P 384
#else
#define SGEMM_DEFAULT_P 448
#define DGEMM_DEFAULT_P 224
#endif
#define QGEMM_DEFAULT_P 112
#define CGEMM_DEFAULT_P 224
#define ZGEMM_DEFAULT_P 112
#define XGEMM_DEFAULT_P 56
#if defined(ARCH_X86_64)
#define SGEMM_DEFAULT_Q 168
#define DGEMM_DEFAULT_Q 168
#else
#define SGEMM_DEFAULT_Q 224
#define DGEMM_DEFAULT_Q 224
#endif
#define QGEMM_DEFAULT_Q 224
#define CGEMM_DEFAULT_Q 224
#define ZGEMM_DEFAULT_Q 224
#define XGEMM_DEFAULT_Q 224
#define CGEMM3M_DEFAULT_P 448
#define ZGEMM3M_DEFAULT_P 224
#define XGEMM3M_DEFAULT_P 112
#define CGEMM3M_DEFAULT_Q 224
#define ZGEMM3M_DEFAULT_Q 224
#define XGEMM3M_DEFAULT_Q 224
#define CGEMM3M_DEFAULT_R 12288
#define ZGEMM3M_DEFAULT_R 12288
#define XGEMM3M_DEFAULT_R 12288
#define SGEMM_DEFAULT_R sgemm_r
#define QGEMM_DEFAULT_R qgemm_r
#define DGEMM_DEFAULT_R dgemm_r
#define CGEMM_DEFAULT_R cgemm_r
#define ZGEMM_DEFAULT_R zgemm_r
#define XGEMM_DEFAULT_R xgemm_r
#define SYMV_P 16
#define HAVE_EXCLUSIVE_CACHE
#define GEMM_THREAD gemm_thread_mn
#endif
#ifdef PILEDRIVER
#define SNUMOPT 8
#define DNUMOPT 4
#define GEMM_DEFAULT_OFFSET_A 64
#define GEMM_DEFAULT_OFFSET_B 832
#define GEMM_DEFAULT_ALIGN (BLASLONG)0x0fffUL
#define QGEMM_DEFAULT_UNROLL_N 2
#define CGEMM_DEFAULT_UNROLL_N 2
#define ZGEMM_DEFAULT_UNROLL_N 2
#define XGEMM_DEFAULT_UNROLL_N 1
#ifdef ARCH_X86
#define SGEMM_DEFAULT_UNROLL_N 4
#define DGEMM_DEFAULT_UNROLL_N 4
#define SGEMM_DEFAULT_UNROLL_M 4
#define DGEMM_DEFAULT_UNROLL_M 2
#define QGEMM_DEFAULT_UNROLL_M 2
#define CGEMM_DEFAULT_UNROLL_M 2
#define ZGEMM_DEFAULT_UNROLL_M 1
#define XGEMM_DEFAULT_UNROLL_M 1
#else
#define SGEMM_DEFAULT_UNROLL_N 2
#define DGEMM_DEFAULT_UNROLL_N 2
#define SGEMM_DEFAULT_UNROLL_M 16
#define DGEMM_DEFAULT_UNROLL_M 8
#define QGEMM_DEFAULT_UNROLL_M 2
#define CGEMM_DEFAULT_UNROLL_M 4
#define ZGEMM_DEFAULT_UNROLL_M 2
#define XGEMM_DEFAULT_UNROLL_M 1
#define CGEMM3M_DEFAULT_UNROLL_N 4
#define CGEMM3M_DEFAULT_UNROLL_M 8
#define ZGEMM3M_DEFAULT_UNROLL_N 4
#define ZGEMM3M_DEFAULT_UNROLL_M 4
#define GEMV_UNROLL 8
#endif
#if defined(ARCH_X86_64)
#define SGEMM_DEFAULT_P 768
#define DGEMM_DEFAULT_P 768
#define ZGEMM_DEFAULT_P 384
#define CGEMM_DEFAULT_P 768
#else
#define SGEMM_DEFAULT_P 448
#define DGEMM_DEFAULT_P 480
#define ZGEMM_DEFAULT_P 112
#define CGEMM_DEFAULT_P 224
#endif
#define QGEMM_DEFAULT_P 112
#define XGEMM_DEFAULT_P 56
#if defined(ARCH_X86_64)
#define SGEMM_DEFAULT_Q 192
#define DGEMM_DEFAULT_Q 168
#define ZGEMM_DEFAULT_Q 168
#define CGEMM_DEFAULT_Q 168
#else
#define SGEMM_DEFAULT_Q 224
#define DGEMM_DEFAULT_Q 224
#define ZGEMM_DEFAULT_Q 224
#define CGEMM_DEFAULT_Q 224
#endif
#define QGEMM_DEFAULT_Q 224
#define XGEMM_DEFAULT_Q 224
#define CGEMM3M_DEFAULT_P 448
#define ZGEMM3M_DEFAULT_P 224
#define XGEMM3M_DEFAULT_P 112
#define CGEMM3M_DEFAULT_Q 224
#define ZGEMM3M_DEFAULT_Q 224
#define XGEMM3M_DEFAULT_Q 224
#define CGEMM3M_DEFAULT_R 12288
#define ZGEMM3M_DEFAULT_R 12288
#define XGEMM3M_DEFAULT_R 12288
#define SGEMM_DEFAULT_R 12288
#define QGEMM_DEFAULT_R qgemm_r
#define DGEMM_DEFAULT_R 12288
#define CGEMM_DEFAULT_R cgemm_r
#define ZGEMM_DEFAULT_R zgemm_r
#define XGEMM_DEFAULT_R xgemm_r
#define SYMV_P 16
#define HAVE_EXCLUSIVE_CACHE
#define GEMM_THREAD gemm_thread_mn
#endif
#ifdef STEAMROLLER
#define SNUMOPT 8
#define DNUMOPT 4
#define GEMM_DEFAULT_OFFSET_A 64
#define GEMM_DEFAULT_OFFSET_B 832
#define GEMM_DEFAULT_ALIGN (BLASLONG)0x0fffUL
#define QGEMM_DEFAULT_UNROLL_N 2
#define CGEMM_DEFAULT_UNROLL_N 2
#define ZGEMM_DEFAULT_UNROLL_N 2
#define XGEMM_DEFAULT_UNROLL_N 1
#ifdef ARCH_X86
#define SGEMM_DEFAULT_UNROLL_N 4
#define DGEMM_DEFAULT_UNROLL_N 4
#define SGEMM_DEFAULT_UNROLL_M 4
#define DGEMM_DEFAULT_UNROLL_M 2
#define QGEMM_DEFAULT_UNROLL_M 2
#define CGEMM_DEFAULT_UNROLL_M 2
#define ZGEMM_DEFAULT_UNROLL_M 1
#define XGEMM_DEFAULT_UNROLL_M 1
#else
#define SGEMM_DEFAULT_UNROLL_N 2
#define DGEMM_DEFAULT_UNROLL_N 2
#define SGEMM_DEFAULT_UNROLL_M 16
#define DGEMM_DEFAULT_UNROLL_M 8
#define QGEMM_DEFAULT_UNROLL_M 2
#define CGEMM_DEFAULT_UNROLL_M 4
#define ZGEMM_DEFAULT_UNROLL_M 2
#define XGEMM_DEFAULT_UNROLL_M 1
#define CGEMM3M_DEFAULT_UNROLL_N 4
#define CGEMM3M_DEFAULT_UNROLL_M 8
#define ZGEMM3M_DEFAULT_UNROLL_N 4
#define ZGEMM3M_DEFAULT_UNROLL_M 4
#define GEMV_UNROLL 8
#endif
#if defined(ARCH_X86_64)
#define SGEMM_DEFAULT_P 768
#define DGEMM_DEFAULT_P 576
#define ZGEMM_DEFAULT_P 288
#define CGEMM_DEFAULT_P 576
#else
#define SGEMM_DEFAULT_P 448
#define DGEMM_DEFAULT_P 480
#define ZGEMM_DEFAULT_P 112
#define CGEMM_DEFAULT_P 224
#endif
#define QGEMM_DEFAULT_P 112
#define XGEMM_DEFAULT_P 56
#if defined(ARCH_X86_64)
#define SGEMM_DEFAULT_Q 192
#define DGEMM_DEFAULT_Q 160
#define ZGEMM_DEFAULT_Q 160
#define CGEMM_DEFAULT_Q 160
#else
#define SGEMM_DEFAULT_Q 224
#define DGEMM_DEFAULT_Q 224
#define ZGEMM_DEFAULT_Q 224
#define CGEMM_DEFAULT_Q 224
#endif
#define QGEMM_DEFAULT_Q 224
#define XGEMM_DEFAULT_Q 224
#define CGEMM3M_DEFAULT_P 448
#define ZGEMM3M_DEFAULT_P 224
#define XGEMM3M_DEFAULT_P 112
#define CGEMM3M_DEFAULT_Q 224
#define ZGEMM3M_DEFAULT_Q 224
#define XGEMM3M_DEFAULT_Q 224
#define CGEMM3M_DEFAULT_R 12288
#define ZGEMM3M_DEFAULT_R 12288
#define XGEMM3M_DEFAULT_R 12288
#define SGEMM_DEFAULT_R 12288
#define QGEMM_DEFAULT_R qgemm_r
#define DGEMM_DEFAULT_R 12288
#define CGEMM_DEFAULT_R cgemm_r
#define ZGEMM_DEFAULT_R zgemm_r
#define XGEMM_DEFAULT_R xgemm_r
#define SYMV_P 16
#define HAVE_EXCLUSIVE_CACHE
#define GEMM_THREAD gemm_thread_mn
#endif
#ifdef EXCAVATOR
#define SNUMOPT 8
#define DNUMOPT 4
#define GEMM_DEFAULT_OFFSET_A 64
#define GEMM_DEFAULT_OFFSET_B 832
#define GEMM_DEFAULT_ALIGN (BLASLONG)0x0fffUL
#define QGEMM_DEFAULT_UNROLL_N 2
#define CGEMM_DEFAULT_UNROLL_N 2
#define ZGEMM_DEFAULT_UNROLL_N 2
#define XGEMM_DEFAULT_UNROLL_N 1
#ifdef ARCH_X86
#define SGEMM_DEFAULT_UNROLL_N 4
#define DGEMM_DEFAULT_UNROLL_N 4
#define SGEMM_DEFAULT_UNROLL_M 4
#define DGEMM_DEFAULT_UNROLL_M 2
#define QGEMM_DEFAULT_UNROLL_M 2
#define CGEMM_DEFAULT_UNROLL_M 2
#define ZGEMM_DEFAULT_UNROLL_M 1
#define XGEMM_DEFAULT_UNROLL_M 1
#else
#define SGEMM_DEFAULT_UNROLL_N 2
#define DGEMM_DEFAULT_UNROLL_N 2
#define SGEMM_DEFAULT_UNROLL_M 16
#define DGEMM_DEFAULT_UNROLL_M 8
#define QGEMM_DEFAULT_UNROLL_M 2
#define CGEMM_DEFAULT_UNROLL_M 4
#define ZGEMM_DEFAULT_UNROLL_M 2
#define XGEMM_DEFAULT_UNROLL_M 1
#define CGEMM3M_DEFAULT_UNROLL_N 4
#define CGEMM3M_DEFAULT_UNROLL_M 8
#define ZGEMM3M_DEFAULT_UNROLL_N 4
#define ZGEMM3M_DEFAULT_UNROLL_M 4
#define GEMV_UNROLL 8
#endif
#if defined(ARCH_X86_64)
#define SGEMM_DEFAULT_P 768
#define DGEMM_DEFAULT_P 576
#define ZGEMM_DEFAULT_P 288
#define CGEMM_DEFAULT_P 576
#else
#define SGEMM_DEFAULT_P 448
#define DGEMM_DEFAULT_P 480
#define ZGEMM_DEFAULT_P 112
#define CGEMM_DEFAULT_P 224
#endif
#define QGEMM_DEFAULT_P 112
#define XGEMM_DEFAULT_P 56
#if defined(ARCH_X86_64)
#define SGEMM_DEFAULT_Q 192
#define DGEMM_DEFAULT_Q 160
#define ZGEMM_DEFAULT_Q 160
#define CGEMM_DEFAULT_Q 160
#else
#define SGEMM_DEFAULT_Q 224
#define DGEMM_DEFAULT_Q 224
#define ZGEMM_DEFAULT_Q 224
#define CGEMM_DEFAULT_Q 224
#endif
#define QGEMM_DEFAULT_Q 224
#define XGEMM_DEFAULT_Q 224
#define CGEMM3M_DEFAULT_P 448
#define ZGEMM3M_DEFAULT_P 224
#define XGEMM3M_DEFAULT_P 112
#define CGEMM3M_DEFAULT_Q 224
#define ZGEMM3M_DEFAULT_Q 224
#define XGEMM3M_DEFAULT_Q 224
#define CGEMM3M_DEFAULT_R 12288
#define ZGEMM3M_DEFAULT_R 12288
#define XGEMM3M_DEFAULT_R 12288
#define SGEMM_DEFAULT_R 12288
#define QGEMM_DEFAULT_R qgemm_r
#define DGEMM_DEFAULT_R 12288
#define CGEMM_DEFAULT_R cgemm_r
#define ZGEMM_DEFAULT_R zgemm_r
#define XGEMM_DEFAULT_R xgemm_r
#define SYMV_P 16
#define HAVE_EXCLUSIVE_CACHE
#define GEMM_THREAD gemm_thread_mn
#endif
#ifdef ZEN
#define SNUMOPT 16
#define DNUMOPT 8
#define GEMM_DEFAULT_OFFSET_A 0
#define GEMM_DEFAULT_OFFSET_B 0
#define GEMM_DEFAULT_ALIGN (BLASLONG)0x03fffUL
#define SYMV_P 8
#define SWITCH_RATIO 16
#ifdef ARCH_X86
#define SGEMM_DEFAULT_UNROLL_M 4
#define DGEMM_DEFAULT_UNROLL_M 2
#define QGEMM_DEFAULT_UNROLL_M 2
#define CGEMM_DEFAULT_UNROLL_M 2
#define ZGEMM_DEFAULT_UNROLL_M 1
#define XGEMM_DEFAULT_UNROLL_M 1
#define SGEMM_DEFAULT_UNROLL_N 4
#define DGEMM_DEFAULT_UNROLL_N 4
#define QGEMM_DEFAULT_UNROLL_N 2
#define CGEMM_DEFAULT_UNROLL_N 2
#define ZGEMM_DEFAULT_UNROLL_N 2
#define XGEMM_DEFAULT_UNROLL_N 1
#else
#define SGEMM_DEFAULT_UNROLL_M 8
#define DGEMM_DEFAULT_UNROLL_M 4
#define QGEMM_DEFAULT_UNROLL_M 2
#define CGEMM_DEFAULT_UNROLL_M 8
#define ZGEMM_DEFAULT_UNROLL_M 4
#define XGEMM_DEFAULT_UNROLL_M 1
#define SGEMM_DEFAULT_UNROLL_N 4
#define DGEMM_DEFAULT_UNROLL_N 8
#define QGEMM_DEFAULT_UNROLL_N 2
#define CGEMM_DEFAULT_UNROLL_N 2
#define ZGEMM_DEFAULT_UNROLL_N 2
#define XGEMM_DEFAULT_UNROLL_N 1
/*
#define SGEMM_DEFAULT_UNROLL_MN 32
#define DGEMM_DEFAULT_UNROLL_MN 32
*/
#endif
#ifdef ARCH_X86
#define SGEMM_DEFAULT_P 512
#define SGEMM_DEFAULT_R sgemm_r
#define DGEMM_DEFAULT_P 512
#define DGEMM_DEFAULT_R dgemm_r
#define QGEMM_DEFAULT_P 504
#define QGEMM_DEFAULT_R qgemm_r
#define CGEMM_DEFAULT_P 128
#define CGEMM_DEFAULT_R 1024
#define ZGEMM_DEFAULT_P 512
#define ZGEMM_DEFAULT_R zgemm_r
#define XGEMM_DEFAULT_P 252
#define XGEMM_DEFAULT_R xgemm_r
#define SGEMM_DEFAULT_Q 256
#define DGEMM_DEFAULT_Q 256
#define QGEMM_DEFAULT_Q 128
#define CGEMM_DEFAULT_Q 256
#define ZGEMM_DEFAULT_Q 192
#define XGEMM_DEFAULT_Q 128
#else
#define SGEMM_DEFAULT_P 320
#define DGEMM_DEFAULT_P 512
#define CGEMM_DEFAULT_P 256
#define ZGEMM_DEFAULT_P 192
#ifdef WINDOWS_ABI
#define SGEMM_DEFAULT_Q 320
#define DGEMM_DEFAULT_Q 128
#else
#define SGEMM_DEFAULT_Q 320
#define DGEMM_DEFAULT_Q 256
#endif
#define CGEMM_DEFAULT_Q 256
#define ZGEMM_DEFAULT_Q 192
#define SGEMM_DEFAULT_R sgemm_r
#define DGEMM_DEFAULT_R 13824
#define CGEMM_DEFAULT_R cgemm_r
#define ZGEMM_DEFAULT_R zgemm_r
#define QGEMM_DEFAULT_Q 128
#define QGEMM_DEFAULT_P 504
#define QGEMM_DEFAULT_R qgemm_r
#define XGEMM_DEFAULT_P 252
#define XGEMM_DEFAULT_R xgemm_r
#define XGEMM_DEFAULT_Q 128
#define CGEMM3M_DEFAULT_UNROLL_N 4
#define CGEMM3M_DEFAULT_UNROLL_M 8
#define ZGEMM3M_DEFAULT_UNROLL_N 4
#define ZGEMM3M_DEFAULT_UNROLL_M 4
#define CGEMM3M_DEFAULT_P 320
#define ZGEMM3M_DEFAULT_P 256
#define XGEMM3M_DEFAULT_P 112
#define CGEMM3M_DEFAULT_Q 320
#define ZGEMM3M_DEFAULT_Q 256
#define XGEMM3M_DEFAULT_Q 224
#define CGEMM3M_DEFAULT_R 12288
#define ZGEMM3M_DEFAULT_R 12288
#define XGEMM3M_DEFAULT_R 12288
#endif
#endif
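The `GEMM_DEFAULT_ALIGN` values in this header all have the form 2^n - 1 (for example `0x03fff` in the ZEN block), which suggests they are meant to be used as bit masks. The sketch below assumes that usage and shows how such a mask rounds a byte count up to the next multiple of mask + 1. `EXAMPLE_ALIGN_MASK` and `align_up` are hypothetical names for illustration and are not defined anywhere in `param.h`.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the ZEN value of GEMM_DEFAULT_ALIGN (0x03fff). */
#define EXAMPLE_ALIGN_MASK ((size_t)0x03fffUL)

/* Round size up to the next multiple of (mask + 1).
 * Assumes mask has the form 2^n - 1, as all align values in this file do. */
static size_t align_up(size_t size, size_t mask) {
    return (size + mask) & ~mask;
}
```

With the ZEN mask, sizes are rounded up to multiples of 0x4000 (16 KiB): `align_up(1, EXAMPLE_ALIGN_MASK)` gives `0x4000`, and a size that is already a multiple is returned unchanged.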
#ifdef ATHLON
#define SNUMOPT 4
#define DNUMOPT 2
#define GEMM_DEFAULT_OFFSET_A 0
#define GEMM_DEFAULT_OFFSET_B 384
#define GEMM_DEFAULT_ALIGN (BLASLONG)0x0ffffUL
#define SGEMM_DEFAULT_UNROLL_N 4
#define DGEMM_DEFAULT_UNROLL_N 4
#define QGEMM_DEFAULT_UNROLL_N 2
#define CGEMM_DEFAULT_UNROLL_N 2
#define ZGEMM_DEFAULT_UNROLL_N 2
#define XGEMM_DEFAULT_UNROLL_N 1
#define SGEMM_DEFAULT_UNROLL_M 2
#define DGEMM_DEFAULT_UNROLL_M 1
#define QGEMM_DEFAULT_UNROLL_M 2
#define CGEMM_DEFAULT_UNROLL_M 1
#define ZGEMM_DEFAULT_UNROLL_M 1
#define XGEMM_DEFAULT_UNROLL_M 1
#define SGEMM_DEFAULT_R sgemm_r
#define DGEMM_DEFAULT_R dgemm_r
#define QGEMM_DEFAULT_R qgemm_r
#define CGEMM_DEFAULT_R cgemm_r
#define ZGEMM_DEFAULT_R zgemm_r
#define XGEMM_DEFAULT_R xgemm_r
#define SGEMM_DEFAULT_P 208
#define DGEMM_DEFAULT_P 104
#define QGEMM_DEFAULT_P 56
#define CGEMM_DEFAULT_P 104
#define ZGEMM_DEFAULT_P 56
#define XGEMM_DEFAULT_P 28
#define SGEMM_DEFAULT_Q 208
#define DGEMM_DEFAULT_Q 208
#define QGEMM_DEFAULT_Q 208
#define CGEMM_DEFAULT_Q 208
#define ZGEMM_DEFAULT_Q 208
#define XGEMM_DEFAULT_Q 208
#define SYMV_P 16
#define HAVE_EXCLUSIVE_CACHE
#endif
#ifdef VIAC3
#define SNUMOPT 2
#define DNUMOPT 1
#define GEMM_DEFAULT_OFFSET_A 0
#define GEMM_DEFAULT_OFFSET_B 256
#define GEMM_DEFAULT_ALIGN (BLASLONG)0x0ffffUL
#define SGEMM_DEFAULT_UNROLL_N 4
#define DGEMM_DEFAULT_UNROLL_N 4
#define QGEMM_DEFAULT_UNROLL_N 2
#define CGEMM_DEFAULT_UNROLL_N 2
#define ZGEMM_DEFAULT_UNROLL_N 2
#define XGEMM_DEFAULT_UNROLL_N 1
#define SGEMM_DEFAULT_UNROLL_M 2
#define DGEMM_DEFAULT_UNROLL_M 1
#define QGEMM_DEFAULT_UNROLL_M 2
#define CGEMM_DEFAULT_UNROLL_M 1
#define ZGEMM_DEFAULT_UNROLL_M 1
#define XGEMM_DEFAULT_UNROLL_M 1
#define SGEMM_DEFAULT_R sgemm_r
#define DGEMM_DEFAULT_R dgemm_r
#define QGEMM_DEFAULT_R qgemm_r
#define CGEMM_DEFAULT_R cgemm_r
#define ZGEMM_DEFAULT_R zgemm_r
#define XGEMM_DEFAULT_R xgemm_r
#define SGEMM_DEFAULT_P 128
#define DGEMM_DEFAULT_P 128
#define QGEMM_DEFAULT_P 128
#define CGEMM_DEFAULT_P 128
#define ZGEMM_DEFAULT_P 128
#define XGEMM_DEFAULT_P 128
#define SGEMM_DEFAULT_Q 512
#define DGEMM_DEFAULT_Q 256
#define QGEMM_DEFAULT_Q 256
#define CGEMM_DEFAULT_Q 256
#define ZGEMM_DEFAULT_Q 128
#define XGEMM_DEFAULT_Q 128
#define SYMV_P 16
#endif
#ifdef NANO
#define SNUMOPT 4
#define DNUMOPT 2
#define GEMM_DEFAULT_OFFSET_A 64
#define GEMM_DEFAULT_OFFSET_B 256
#define GEMM_DEFAULT_ALIGN (BLASLONG)0x01ffffUL
#ifdef ARCH_X86
#define SGEMM_DEFAULT_UNROLL_N 4
#define DGEMM_DEFAULT_UNROLL_N 4
#define QGEMM_DEFAULT_UNROLL_N 2
#define CGEMM_DEFAULT_UNROLL_N 2
#define ZGEMM_DEFAULT_UNROLL_N 2
#define XGEMM_DEFAULT_UNROLL_N 1
#define SGEMM_DEFAULT_UNROLL_M 4
#define DGEMM_DEFAULT_UNROLL_M 2
#define QGEMM_DEFAULT_UNROLL_M 2
#define CGEMM_DEFAULT_UNROLL_M 2
#define ZGEMM_DEFAULT_UNROLL_M 1
#define XGEMM_DEFAULT_UNROLL_M 1
#else
#define SGEMM_DEFAULT_UNROLL_N 8
#define DGEMM_DEFAULT_UNROLL_N 4
#define QGEMM_DEFAULT_UNROLL_N 2
#define CGEMM_DEFAULT_UNROLL_N 4
#define ZGEMM_DEFAULT_UNROLL_N 2
#define XGEMM_DEFAULT_UNROLL_N 1
#define SGEMM_DEFAULT_UNROLL_M 4
#define DGEMM_DEFAULT_UNROLL_M 4
#define QGEMM_DEFAULT_UNROLL_M 2
#define CGEMM_DEFAULT_UNROLL_M 2
#define ZGEMM_DEFAULT_UNROLL_M 2
#define XGEMM_DEFAULT_UNROLL_M 1
#endif
#define SGEMM_DEFAULT_P 288
#define DGEMM_DEFAULT_P 288
#define QGEMM_DEFAULT_P 288
#define CGEMM_DEFAULT_P 288
#define ZGEMM_DEFAULT_P 288
#define XGEMM_DEFAULT_P 288
#define SGEMM_DEFAULT_R sgemm_r
#define DGEMM_DEFAULT_R dgemm_r
#define QGEMM_DEFAULT_R qgemm_r
#define CGEMM_DEFAULT_R cgemm_r
#define ZGEMM_DEFAULT_R zgemm_r
#define XGEMM_DEFAULT_R xgemm_r
#define SGEMM_DEFAULT_Q 256
#define DGEMM_DEFAULT_Q 128
#define QGEMM_DEFAULT_Q 64
#define CGEMM_DEFAULT_Q 128
#define ZGEMM_DEFAULT_Q 64
#define XGEMM_DEFAULT_Q 32
#define SYMV_P 16
#define HAVE_EXCLUSIVE_CACHE
#endif
#if defined(PENTIUM) || defined(PENTIUM2) || defined(PENTIUM3)
#ifdef HAVE_SSE
#define SNUMOPT 2
#else
#define SNUMOPT 1
#endif
#define DNUMOPT 1
#define GEMM_DEFAULT_OFFSET_A 0
#define GEMM_DEFAULT_OFFSET_B 0
#define GEMM_DEFAULT_ALIGN (BLASLONG)0x0ffffUL
#ifdef HAVE_SSE
#define SGEMM_DEFAULT_UNROLL_M 8
#define CGEMM_DEFAULT_UNROLL_M 4
#else
#define SGEMM_DEFAULT_UNROLL_M 4
#define CGEMM_DEFAULT_UNROLL_M 2
#endif
#define DGEMM_DEFAULT_UNROLL_M 2
#define SGEMM_DEFAULT_UNROLL_N 2
#define DGEMM_DEFAULT_UNROLL_N 2
#define QGEMM_DEFAULT_UNROLL_M 2
#define QGEMM_DEFAULT_UNROLL_N 2
#define CGEMM_DEFAULT_UNROLL_N 1
#define ZGEMM_DEFAULT_UNROLL_M 1
#define ZGEMM_DEFAULT_UNROLL_N 1
#define XGEMM_DEFAULT_UNROLL_M 1
#define XGEMM_DEFAULT_UNROLL_N 1
#define SGEMM_DEFAULT_P sgemm_p
#define SGEMM_DEFAULT_Q 256
#define SGEMM_DEFAULT_R sgemm_r
#define DGEMM_DEFAULT_P dgemm_p
#define DGEMM_DEFAULT_Q 256
#define DGEMM_DEFAULT_R dgemm_r
#define QGEMM_DEFAULT_P qgemm_p
#define QGEMM_DEFAULT_Q 256
#define QGEMM_DEFAULT_R qgemm_r
#define CGEMM_DEFAULT_P cgemm_p
#define CGEMM_DEFAULT_Q 256
#define CGEMM_DEFAULT_R cgemm_r
#define ZGEMM_DEFAULT_P zgemm_p
#define ZGEMM_DEFAULT_Q 256
#define ZGEMM_DEFAULT_R zgemm_r
#define XGEMM_DEFAULT_P xgemm_p
#define XGEMM_DEFAULT_Q 256
#define XGEMM_DEFAULT_R xgemm_r
#define SYMV_P 4
#endif
#ifdef PENTIUMM
#define SNUMOPT 2
#define DNUMOPT 1
#define GEMM_DEFAULT_OFFSET_A 0
#define GEMM_DEFAULT_OFFSET_B 0
#define GEMM_DEFAULT_ALIGN (BLASLONG)0x0ffffUL
#ifdef CORE_YONAH
#define SGEMM_DEFAULT_UNROLL_M 4
#define SGEMM_DEFAULT_UNROLL_N 4
#define DGEMM_DEFAULT_UNROLL_M 2
#define DGEMM_DEFAULT_UNROLL_N 4
#define QGEMM_DEFAULT_UNROLL_M 2
#define QGEMM_DEFAULT_UNROLL_N 2
#define CGEMM_DEFAULT_UNROLL_M 2
#define CGEMM_DEFAULT_UNROLL_N 2
#define ZGEMM_DEFAULT_UNROLL_M 1
#define ZGEMM_DEFAULT_UNROLL_N 2
#define XGEMM_DEFAULT_UNROLL_M 1
#define XGEMM_DEFAULT_UNROLL_N 1
#else
#define SGEMM_DEFAULT_UNROLL_M 8
#define SGEMM_DEFAULT_UNROLL_N 2
#define DGEMM_DEFAULT_UNROLL_M 2
#define DGEMM_DEFAULT_UNROLL_N 2
#define QGEMM_DEFAULT_UNROLL_M 2
#define QGEMM_DEFAULT_UNROLL_N 2
#define CGEMM_DEFAULT_UNROLL_M 4
#define CGEMM_DEFAULT_UNROLL_N 1
#define ZGEMM_DEFAULT_UNROLL_M 1
#define ZGEMM_DEFAULT_UNROLL_N 1
#define XGEMM_DEFAULT_UNROLL_M 1
#define XGEMM_DEFAULT_UNROLL_N 1
#endif
#define SGEMM_DEFAULT_P sgemm_p
#define SGEMM_DEFAULT_Q 256
#define SGEMM_DEFAULT_R sgemm_r
#define DGEMM_DEFAULT_P dgemm_p
#define DGEMM_DEFAULT_Q 256
#define DGEMM_DEFAULT_R dgemm_r
#define QGEMM_DEFAULT_P qgemm_p
#define QGEMM_DEFAULT_Q 256
#define QGEMM_DEFAULT_R qgemm_r
#define CGEMM_DEFAULT_P cgemm_p
#define CGEMM_DEFAULT_Q 256
#define CGEMM_DEFAULT_R cgemm_r
#define ZGEMM_DEFAULT_P zgemm_p
#define ZGEMM_DEFAULT_Q 256
#define ZGEMM_DEFAULT_R zgemm_r
#define XGEMM_DEFAULT_P xgemm_p
#define XGEMM_DEFAULT_Q 256
#define XGEMM_DEFAULT_R xgemm_r
#define SYMV_P 4
#endif
#ifdef CORE_NORTHWOOD
#define SNUMOPT 4
#define DNUMOPT 2
#define GEMM_DEFAULT_OFFSET_A 0
#define GEMM_DEFAULT_OFFSET_B 32
#define GEMM_DEFAULT_ALIGN (BLASLONG)0x0ffffUL
#define SYMV_P 8
#define SGEMM_DEFAULT_UNROLL_M 8
#define DGEMM_DEFAULT_UNROLL_M 4
#define QGEMM_DEFAULT_UNROLL_M 2
#define CGEMM_DEFAULT_UNROLL_M 4
#define ZGEMM_DEFAULT_UNROLL_M 2
#define XGEMM_DEFAULT_UNROLL_M 1
#define SGEMM_DEFAULT_UNROLL_N 2
#define DGEMM_DEFAULT_UNROLL_N 2
#define QGEMM_DEFAULT_UNROLL_N 2
#define CGEMM_DEFAULT_UNROLL_N 1
#define ZGEMM_DEFAULT_UNROLL_N 1
#define XGEMM_DEFAULT_UNROLL_N 1
#define SGEMM_DEFAULT_P sgemm_p
#define SGEMM_DEFAULT_R sgemm_r
#define DGEMM_DEFAULT_P dgemm_p
#define DGEMM_DEFAULT_R dgemm_r
#define QGEMM_DEFAULT_P qgemm_p
#define QGEMM_DEFAULT_R qgemm_r
#define CGEMM_DEFAULT_P cgemm_p
#define CGEMM_DEFAULT_R cgemm_r
#define ZGEMM_DEFAULT_P zgemm_p
#define ZGEMM_DEFAULT_R zgemm_r
#define XGEMM_DEFAULT_P xgemm_p
#define XGEMM_DEFAULT_R xgemm_r
#define SGEMM_DEFAULT_Q 128
#define DGEMM_DEFAULT_Q 128
#define QGEMM_DEFAULT_Q 128
#define CGEMM_DEFAULT_Q 128
#define ZGEMM_DEFAULT_Q 128
#define XGEMM_DEFAULT_Q 128
#endif
#ifdef CORE_PRESCOTT
#define SNUMOPT 4
#define DNUMOPT 2
#ifndef __64BIT__
#define GEMM_DEFAULT_OFFSET_A 128
#define GEMM_DEFAULT_OFFSET_B 192
#else
#define GEMM_DEFAULT_OFFSET_A 0
#define GEMM_DEFAULT_OFFSET_B 256
#endif
#define GEMM_DEFAULT_ALIGN (BLASLONG)0x0ffffUL
#define SYMV_P 8
#ifdef ARCH_X86
#define SGEMM_DEFAULT_UNROLL_M 4
#define DGEMM_DEFAULT_UNROLL_M 2
#define QGEMM_DEFAULT_UNROLL_M 2
#define CGEMM_DEFAULT_UNROLL_M 2
#define ZGEMM_DEFAULT_UNROLL_M 1
#define XGEMM_DEFAULT_UNROLL_M 1
#else
#define SGEMM_DEFAULT_UNROLL_M 8
#define DGEMM_DEFAULT_UNROLL_M 4
#define QGEMM_DEFAULT_UNROLL_M 2
#define CGEMM_DEFAULT_UNROLL_M 4
#define ZGEMM_DEFAULT_UNROLL_M 2
#define XGEMM_DEFAULT_UNROLL_M 1
#endif
#define SGEMM_DEFAULT_UNROLL_N 4
#define DGEMM_DEFAULT_UNROLL_N 4
#define QGEMM_DEFAULT_UNROLL_N 2
#define CGEMM_DEFAULT_UNROLL_N 2
#define ZGEMM_DEFAULT_UNROLL_N 2
#define XGEMM_DEFAULT_UNROLL_N 1
#define SGEMM_DEFAULT_P sgemm_p
#define SGEMM_DEFAULT_R sgemm_r
#define DGEMM_DEFAULT_P dgemm_p
#define DGEMM_DEFAULT_R dgemm_r
#define QGEMM_DEFAULT_P qgemm_p
#define QGEMM_DEFAULT_R qgemm_r
#define CGEMM_DEFAULT_P cgemm_p
#define CGEMM_DEFAULT_R cgemm_r
#define ZGEMM_DEFAULT_P zgemm_p
#define ZGEMM_DEFAULT_R zgemm_r
#define XGEMM_DEFAULT_P xgemm_p
#define XGEMM_DEFAULT_R xgemm_r
#define SGEMM_DEFAULT_Q 128
#define DGEMM_DEFAULT_Q 128
#define QGEMM_DEFAULT_Q 128
#define CGEMM_DEFAULT_Q 128
#define ZGEMM_DEFAULT_Q 128
#define XGEMM_DEFAULT_Q 128
#endif
#ifdef CORE2
#define SNUMOPT 8
#define DNUMOPT 4
#define GEMM_DEFAULT_OFFSET_A 448
#define GEMM_DEFAULT_OFFSET_B 128
#define GEMM_DEFAULT_ALIGN (BLASLONG)0x03fffUL
#define SYMV_P 8
#define SWITCH_RATIO 4
#ifdef ARCH_X86
#define SGEMM_DEFAULT_UNROLL_M 8
#define DGEMM_DEFAULT_UNROLL_M 4
#define QGEMM_DEFAULT_UNROLL_M 2
#define CGEMM_DEFAULT_UNROLL_M 4
#define ZGEMM_DEFAULT_UNROLL_M 2
#define XGEMM_DEFAULT_UNROLL_M 1
#define SGEMM_DEFAULT_UNROLL_N 2
#define DGEMM_DEFAULT_UNROLL_N 2
#define QGEMM_DEFAULT_UNROLL_N 2
#define CGEMM_DEFAULT_UNROLL_N 1
#define ZGEMM_DEFAULT_UNROLL_N 1
#define XGEMM_DEFAULT_UNROLL_N 1
#define MASK(a, b) ((((a) + (b) - 1) / (b)) * (b))
#else
#define SGEMM_DEFAULT_UNROLL_M 8
#define DGEMM_DEFAULT_UNROLL_M 4
#define QGEMM_DEFAULT_UNROLL_M 2
#define CGEMM_DEFAULT_UNROLL_M 4
#define ZGEMM_DEFAULT_UNROLL_M 2
#define XGEMM_DEFAULT_UNROLL_M 1
#define SGEMM_DEFAULT_UNROLL_N 4
#define DGEMM_DEFAULT_UNROLL_N 4
#define QGEMM_DEFAULT_UNROLL_N 2
#define CGEMM_DEFAULT_UNROLL_N 2
#define ZGEMM_DEFAULT_UNROLL_N 2
#define XGEMM_DEFAULT_UNROLL_N 1
#endif
#define SGEMM_DEFAULT_P sgemm_p
#define SGEMM_DEFAULT_R sgemm_r
#define DGEMM_DEFAULT_P dgemm_p
#define DGEMM_DEFAULT_R dgemm_r
#define QGEMM_DEFAULT_P qgemm_p
#define QGEMM_DEFAULT_R qgemm_r
#define CGEMM_DEFAULT_P cgemm_p
#define CGEMM_DEFAULT_R cgemm_r
#define ZGEMM_DEFAULT_P zgemm_p
#define ZGEMM_DEFAULT_R zgemm_r
#define XGEMM_DEFAULT_P xgemm_p
#define XGEMM_DEFAULT_R xgemm_r
#define SGEMM_DEFAULT_Q 256
#define DGEMM_DEFAULT_Q 256
#define QGEMM_DEFAULT_Q 256
#define CGEMM_DEFAULT_Q 256
#define ZGEMM_DEFAULT_Q 256
#define XGEMM_DEFAULT_Q 256
#endif
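The `MASK(a, b)` macro defined in the 32-bit branch of the CORE2 block rounds `a` up to the nearest multiple of `b` using integer division. The sketch below copies that definition and wraps it in a small helper so the behaviour can be checked in isolation. `round_up_to_multiple` is a hypothetical name for illustration and is not part of `param.h`; the sketch also assumes non-negative `a` and positive `b`.

```c
#include <assert.h>

/* Same definition as the MASK macro in the CORE2 block. */
#define MASK(a, b) ((((a) + (b) - 1) / (b)) * (b))

/* Hypothetical wrapper: round a up to the nearest multiple of b.
 * Assumes a >= 0 and b > 0 so the integer division truncates as intended. */
static long round_up_to_multiple(long a, long b) {
    return MASK(a, b);
}
```

For example, `round_up_to_multiple(10, 4)` yields 12, while a value that is already a multiple, such as `round_up_to_multiple(12, 4)`, is returned unchanged.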
#ifdef PENRYN
#define SNUMOPT 8
#define DNUMOPT 4
#define GEMM_DEFAULT_OFFSET_A 128
#define GEMM_DEFAULT_OFFSET_B 0
#define GEMM_DEFAULT_ALIGN (BLASLONG)0x03fffUL
#define SYMV_P 8
#define SWITCH_RATIO 4
#ifdef ARCH_X86
#define SGEMM_DEFAULT_UNROLL_M 4
#define DGEMM_DEFAULT_UNROLL_M 2
#define QGEMM_DEFAULT_UNROLL_M 2
#define CGEMM_DEFAULT_UNROLL_M 2
#define ZGEMM_DEFAULT_UNROLL_M 1
#define XGEMM_DEFAULT_UNROLL_M 1
#define SGEMM_DEFAULT_UNROLL_N 4
#define DGEMM_DEFAULT_UNROLL_N 4
#define QGEMM_DEFAULT_UNROLL_N 2
#define CGEMM_DEFAULT_UNROLL_N 2
#define ZGEMM_DEFAULT_UNROLL_N 2
#define XGEMM_DEFAULT_UNROLL_N 1
#else
#define SGEMM_DEFAULT_UNROLL_M 8
#define DGEMM_DEFAULT_UNROLL_M 4
#define QGEMM_DEFAULT_UNROLL_M 2
#define CGEMM_DEFAULT_UNROLL_M 4
#define ZGEMM_DEFAULT_UNROLL_M 2
#define XGEMM_DEFAULT_UNROLL_M 1
#define SGEMM_DEFAULT_UNROLL_N 4
#define DGEMM_DEFAULT_UNROLL_N 4
#define QGEMM_DEFAULT_UNROLL_N 2
#define CGEMM_DEFAULT_UNROLL_N 2
#define ZGEMM_DEFAULT_UNROLL_N 2
#define XGEMM_DEFAULT_UNROLL_N 1
#endif
#define SGEMM_DEFAULT_P sgemm_p
#define SGEMM_DEFAULT_R sgemm_r
#define DGEMM_DEFAULT_P dgemm_p
#define DGEMM_DEFAULT_R dgemm_r
#define QGEMM_DEFAULT_P qgemm_p
#define QGEMM_DEFAULT_R qgemm_r
#define CGEMM_DEFAULT_P cgemm_p
#define CGEMM_DEFAULT_R cgemm_r
#define ZGEMM_DEFAULT_P zgemm_p
#define ZGEMM_DEFAULT_R zgemm_r
#define XGEMM_DEFAULT_P xgemm_p
#define XGEMM_DEFAULT_R xgemm_r
#define SGEMM_DEFAULT_Q 512
#define DGEMM_DEFAULT_Q 256
#define QGEMM_DEFAULT_Q 128
#define CGEMM_DEFAULT_Q 512
#define ZGEMM_DEFAULT_Q 256
#define XGEMM_DEFAULT_Q 128
#define GETRF_FACTOR 0.75
#endif
#ifdef DUNNINGTON
#define SNUMOPT 8
#define DNUMOPT 4
#define GEMM_DEFAULT_OFFSET_A 128
#define GEMM_DEFAULT_OFFSET_B 0
#define GEMM_DEFAULT_ALIGN (BLASLONG)0x03fffUL
#define SYMV_P 8
#define SWITCH_RATIO 4
#ifdef ARCH_X86
#define SGEMM_DEFAULT_UNROLL_M 4
#define DGEMM_DEFAULT_UNROLL_M 2
#define QGEMM_DEFAULT_UNROLL_M 2
#define CGEMM_DEFAULT_UNROLL_M 2
#define ZGEMM_DEFAULT_UNROLL_M 1
#define XGEMM_DEFAULT_UNROLL_M 1
#define SGEMM_DEFAULT_UNROLL_N 4
#define DGEMM_DEFAULT_UNROLL_N 4
#define QGEMM_DEFAULT_UNROLL_N 2
#define CGEMM_DEFAULT_UNROLL_N 2
#define ZGEMM_DEFAULT_UNROLL_N 2
#define XGEMM_DEFAULT_UNROLL_N 1
#else
#define SGEMM_DEFAULT_UNROLL_M 8
#define DGEMM_DEFAULT_UNROLL_M 4
#define QGEMM_DEFAULT_UNROLL_M 2
#define CGEMM_DEFAULT_UNROLL_M 4
#define ZGEMM_DEFAULT_UNROLL_M 2
#define XGEMM_DEFAULT_UNROLL_M 1
#define SGEMM_DEFAULT_UNROLL_N 4
#define DGEMM_DEFAULT_UNROLL_N 4
#define QGEMM_DEFAULT_UNROLL_N 2
#define CGEMM_DEFAULT_UNROLL_N 2
#define ZGEMM_DEFAULT_UNROLL_N 2
#define XGEMM_DEFAULT_UNROLL_N 1
#endif
#define SGEMM_DEFAULT_P sgemm_p
#define SGEMM_DEFAULT_R sgemm_r
#define DGEMM_DEFAULT_P dgemm_p
#define DGEMM_DEFAULT_R dgemm_r
#define QGEMM_DEFAULT_P qgemm_p
#define QGEMM_DEFAULT_R qgemm_r
#define CGEMM_DEFAULT_P cgemm_p
#define CGEMM_DEFAULT_R cgemm_r
#define ZGEMM_DEFAULT_P zgemm_p
#define ZGEMM_DEFAULT_R zgemm_r
#define XGEMM_DEFAULT_P xgemm_p
#define XGEMM_DEFAULT_R xgemm_r
#define SGEMM_DEFAULT_Q 768
#define DGEMM_DEFAULT_Q 384
#define QGEMM_DEFAULT_Q 192
#define CGEMM_DEFAULT_Q 768
#define ZGEMM_DEFAULT_Q 384
#define XGEMM_DEFAULT_Q 192
#define GETRF_FACTOR 0.75
#define GEMM_THREAD gemm_thread_mn
#endif
#ifdef NEHALEM
#define SNUMOPT 8
#define DNUMOPT 4
#define GEMM_DEFAULT_OFFSET_A 32
#define GEMM_DEFAULT_OFFSET_B 0
#define GEMM_DEFAULT_ALIGN (BLASLONG)0x03fffUL
#define SYMV_P 8
#define SWITCH_RATIO 4
#ifdef ARCH_X86
#define SGEMM_DEFAULT_UNROLL_M 4
#define DGEMM_DEFAULT_UNROLL_M 2
#define QGEMM_DEFAULT_UNROLL_M 2
#define CGEMM_DEFAULT_UNROLL_M 2
#define ZGEMM_DEFAULT_UNROLL_M 1
#define XGEMM_DEFAULT_UNROLL_M 1
#define SGEMM_DEFAULT_UNROLL_N 4
#define DGEMM_DEFAULT_UNROLL_N 4
#define QGEMM_DEFAULT_UNROLL_N 2
#define CGEMM_DEFAULT_UNROLL_N 2
#define ZGEMM_DEFAULT_UNROLL_N 2
#define XGEMM_DEFAULT_UNROLL_N 1
#else
#define SGEMM_DEFAULT_UNROLL_M 4
#define DGEMM_DEFAULT_UNROLL_M 2
#define QGEMM_DEFAULT_UNROLL_M 2
#define CGEMM_DEFAULT_UNROLL_M 2
#define ZGEMM_DEFAULT_UNROLL_M 1
#define XGEMM_DEFAULT_UNROLL_M 1
#define SGEMM_DEFAULT_UNROLL_N 8
#define DGEMM_DEFAULT_UNROLL_N 8
#define QGEMM_DEFAULT_UNROLL_N 2
#define CGEMM_DEFAULT_UNROLL_N 4
#define ZGEMM_DEFAULT_UNROLL_N 4
#define XGEMM_DEFAULT_UNROLL_N 1
#endif
#define SGEMM_DEFAULT_P 504
#define SGEMM_DEFAULT_R sgemm_r
#define DGEMM_DEFAULT_P 504
#define DGEMM_DEFAULT_R dgemm_r
#define QGEMM_DEFAULT_P 504
#define QGEMM_DEFAULT_R qgemm_r
#define CGEMM_DEFAULT_P 252
#define CGEMM_DEFAULT_R cgemm_r
#define ZGEMM_DEFAULT_P 252
#define ZGEMM_DEFAULT_R zgemm_r
#define XGEMM_DEFAULT_P 252
#define XGEMM_DEFAULT_R xgemm_r
#define SGEMM_DEFAULT_Q 512
#define DGEMM_DEFAULT_Q 256
#define QGEMM_DEFAULT_Q 128
#define CGEMM_DEFAULT_Q 512
#define ZGEMM_DEFAULT_Q 256
#define XGEMM_DEFAULT_Q 128
#define GETRF_FACTOR 0.72
#endif
#ifdef SANDYBRIDGE
#define SNUMOPT 8
#define DNUMOPT 4
#define GEMM_DEFAULT_OFFSET_A 0
#define GEMM_DEFAULT_OFFSET_B 0
#define GEMM_DEFAULT_ALIGN (BLASLONG)0x03fffUL
#define SYMV_P 8
#define SWITCH_RATIO 4
#ifdef ARCH_X86
#define SGEMM_DEFAULT_UNROLL_M 4
#define DGEMM_DEFAULT_UNROLL_M 2
#define QGEMM_DEFAULT_UNROLL_M 2
#define CGEMM_DEFAULT_UNROLL_M 2
#define ZGEMM_DEFAULT_UNROLL_M 1
#define XGEMM_DEFAULT_UNROLL_M 1
#define SGEMM_DEFAULT_UNROLL_N 4
#define DGEMM_DEFAULT_UNROLL_N 4
#define QGEMM_DEFAULT_UNROLL_N 2
#define CGEMM_DEFAULT_UNROLL_N 2
#define ZGEMM_DEFAULT_UNROLL_N 2
#define XGEMM_DEFAULT_UNROLL_N 1
#else
#define SGEMM_DEFAULT_UNROLL_M 16
#define DGEMM_DEFAULT_UNROLL_M 8
#define QGEMM_DEFAULT_UNROLL_M 2
#define CGEMM_DEFAULT_UNROLL_M 8
#define ZGEMM_DEFAULT_UNROLL_M 1
#define XGEMM_DEFAULT_UNROLL_M 1
#define SGEMM_DEFAULT_UNROLL_N 4
#define DGEMM_DEFAULT_UNROLL_N 4
#define QGEMM_DEFAULT_UNROLL_N 2
#define CGEMM_DEFAULT_UNROLL_N 2
#define ZGEMM_DEFAULT_UNROLL_N 4
#define XGEMM_DEFAULT_UNROLL_N 1
#endif
#define SGEMM_DEFAULT_P 768
#define SGEMM_DEFAULT_R sgemm_r
/*#define SGEMM_DEFAULT_R 1024*/
#define DGEMM_DEFAULT_P 512
#define DGEMM_DEFAULT_R dgemm_r
/*#define DGEMM_DEFAULT_R 1024*/
#define QGEMM_DEFAULT_P 504
#define QGEMM_DEFAULT_R qgemm_r
#define CGEMM_DEFAULT_P 768
#define CGEMM_DEFAULT_R cgemm_r
/*#define CGEMM_DEFAULT_R 1024*/
#define ZGEMM_DEFAULT_P 512
#define ZGEMM_DEFAULT_R zgemm_r
/*#define ZGEMM_DEFAULT_R 1024*/
#define XGEMM_DEFAULT_P 252
#define XGEMM_DEFAULT_R xgemm_r
#define SGEMM_DEFAULT_Q 384
#define DGEMM_DEFAULT_Q 256
#define QGEMM_DEFAULT_Q 128
#define CGEMM_DEFAULT_Q 512
#define ZGEMM_DEFAULT_Q 192
#define XGEMM_DEFAULT_Q 128
#define CGEMM3M_DEFAULT_UNROLL_N 8
#define CGEMM3M_DEFAULT_UNROLL_M 4
#define ZGEMM3M_DEFAULT_UNROLL_N 8
#define ZGEMM3M_DEFAULT_UNROLL_M 2
#define CGEMM3M_DEFAULT_P 448
#define ZGEMM3M_DEFAULT_P 224
#define XGEMM3M_DEFAULT_P 112
#define CGEMM3M_DEFAULT_Q 224
#define ZGEMM3M_DEFAULT_Q 224
#define XGEMM3M_DEFAULT_Q 224
#define CGEMM3M_DEFAULT_R 12288
#define ZGEMM3M_DEFAULT_R 12288
#define XGEMM3M_DEFAULT_R 12288
#define GETRF_FACTOR 0.72
#endif
#ifdef HASWELL
#define SNUMOPT 16
#define DNUMOPT 8
#define GEMM_DEFAULT_OFFSET_A 0
#define GEMM_DEFAULT_OFFSET_B 0
#define GEMM_DEFAULT_ALIGN (BLASLONG)0x03fffUL
#define SYMV_P 8
#if defined(XDOUBLE) || defined(DOUBLE)
#define SWITCH_RATIO 4
#define GEMM_PREFERED_SIZE 4
#else
#define SWITCH_RATIO 8
#define GEMM_PREFERED_SIZE 8
#endif
#ifdef ARCH_X86
#define SGEMM_DEFAULT_UNROLL_M 4
#define DGEMM_DEFAULT_UNROLL_M 2
#define QGEMM_DEFAULT_UNROLL_M 2
#define CGEMM_DEFAULT_UNROLL_M 2
#define ZGEMM_DEFAULT_UNROLL_M 1
#define XGEMM_DEFAULT_UNROLL_M 1
#define SGEMM_DEFAULT_UNROLL_N 4
#define DGEMM_DEFAULT_UNROLL_N 4
#define QGEMM_DEFAULT_UNROLL_N 2
#define CGEMM_DEFAULT_UNROLL_N 2
#define ZGEMM_DEFAULT_UNROLL_N 2
#define XGEMM_DEFAULT_UNROLL_N 1
#else
#define SGEMM_DEFAULT_UNROLL_M 8
#define DGEMM_DEFAULT_UNROLL_M 4
#define QGEMM_DEFAULT_UNROLL_M 2
#define CGEMM_DEFAULT_UNROLL_M 8
#define ZGEMM_DEFAULT_UNROLL_M 4
#define XGEMM_DEFAULT_UNROLL_M 1
#define SGEMM_DEFAULT_UNROLL_N 4
#define DGEMM_DEFAULT_UNROLL_N 8
#define QGEMM_DEFAULT_UNROLL_N 2
#define CGEMM_DEFAULT_UNROLL_N 2
#define ZGEMM_DEFAULT_UNROLL_N 2
#define XGEMM_DEFAULT_UNROLL_N 1
/*
#define SGEMM_DEFAULT_UNROLL_MN 32
#define DGEMM_DEFAULT_UNROLL_MN 32
*/
#endif
#ifdef ARCH_X86
#define SGEMM_DEFAULT_P 512
#define SGEMM_DEFAULT_R sgemm_r
#define DGEMM_DEFAULT_P 512
#define DGEMM_DEFAULT_R dgemm_r
#define QGEMM_DEFAULT_P 504
#define QGEMM_DEFAULT_R qgemm_r
#define CGEMM_DEFAULT_P 128
#define CGEMM_DEFAULT_R 1024
#define ZGEMM_DEFAULT_P 512
#define ZGEMM_DEFAULT_R zgemm_r
#define XGEMM_DEFAULT_P 252
#define XGEMM_DEFAULT_R xgemm_r
#define SGEMM_DEFAULT_Q 256
#define DGEMM_DEFAULT_Q 256
#define QGEMM_DEFAULT_Q 128
#define CGEMM_DEFAULT_Q 256
#define ZGEMM_DEFAULT_Q 192
#define XGEMM_DEFAULT_Q 128
#else
#define SGEMM_DEFAULT_P 320
#define DGEMM_DEFAULT_P 512
#define CGEMM_DEFAULT_P 256
#define ZGEMM_DEFAULT_P 192
#ifdef WINDOWS_ABI
#define SGEMM_DEFAULT_Q 320
#define DGEMM_DEFAULT_Q 128
#else
#define SGEMM_DEFAULT_Q 320
#define DGEMM_DEFAULT_Q 256
#endif
#define CGEMM_DEFAULT_Q 256
#define ZGEMM_DEFAULT_Q 192
#define SGEMM_DEFAULT_R sgemm_r
#define DGEMM_DEFAULT_R 13824
#define CGEMM_DEFAULT_R cgemm_r
#define ZGEMM_DEFAULT_R zgemm_r
#define QGEMM_DEFAULT_Q 128
#define QGEMM_DEFAULT_P 504
#define QGEMM_DEFAULT_R qgemm_r
#define XGEMM_DEFAULT_P 252
#define XGEMM_DEFAULT_R xgemm_r
#define XGEMM_DEFAULT_Q 128
#define CGEMM3M_DEFAULT_UNROLL_N 4
#define CGEMM3M_DEFAULT_UNROLL_M 8
#define ZGEMM3M_DEFAULT_UNROLL_N 4
#define ZGEMM3M_DEFAULT_UNROLL_M 4
#define CGEMM3M_DEFAULT_P 320
#define ZGEMM3M_DEFAULT_P 256
#define XGEMM3M_DEFAULT_P 112
#define CGEMM3M_DEFAULT_Q 320
#define ZGEMM3M_DEFAULT_Q 256
#define XGEMM3M_DEFAULT_Q 224
#define CGEMM3M_DEFAULT_R 12288
#define ZGEMM3M_DEFAULT_R 12288
#define XGEMM3M_DEFAULT_R 12288
#endif
#endif
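The `*_DEFAULT_P` and `*_DEFAULT_Q` values read as block dimensions for the GEMM kernels. Assuming that a P x Q block of elements is what gets packed into a contiguous buffer (an interpretation, not something this header states), the sketch below computes the size of such a block in bytes. `block_bytes` is a hypothetical helper for illustration and is not part of `param.h`.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: bytes occupied by a p x q block of elements,
 * each elem_size bytes wide. Assumes the product does not overflow size_t. */
static size_t block_bytes(size_t p, size_t q, size_t elem_size) {
    return p * q * elem_size;
}
```

With the 64-bit, non-Windows HASWELL values, `block_bytes(512, 256, 8)` for double precision (`DGEMM_DEFAULT_P` x `DGEMM_DEFAULT_Q`) is 1,048,576 bytes, and `block_bytes(320, 320, 4)` for single precision is 409,600 bytes.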
#ifdef SKYLAKEX
#define SNUMOPT 16
#define DNUMOPT 8
#define GEMM_DEFAULT_OFFSET_A 0
#define GEMM_DEFAULT_OFFSET_B 0
#define GEMM_DEFAULT_ALIGN (BLASLONG)0x03fffUL
#define SYMV_P 8
#if defined(XDOUBLE) || defined(DOUBLE)
#define SWITCH_RATIO 8
#define GEMM_PREFERED_SIZE 8
#else
#define SWITCH_RATIO 16
#define GEMM_PREFERED_SIZE 16
#endif
#define USE_SGEMM_KERNEL_DIRECT 1
#ifdef ARCH_X86
#define SGEMM_DEFAULT_UNROLL_M 4
#define DGEMM_DEFAULT_UNROLL_M 2
#define QGEMM_DEFAULT_UNROLL_M 2
#define CGEMM_DEFAULT_UNROLL_M 2
#define ZGEMM_DEFAULT_UNROLL_M 1
#define XGEMM_DEFAULT_UNROLL_M 1
#define SGEMM_DEFAULT_UNROLL_N 4
#define DGEMM_DEFAULT_UNROLL_N 4
#define QGEMM_DEFAULT_UNROLL_N 2
#define CGEMM_DEFAULT_UNROLL_N 2
#define ZGEMM_DEFAULT_UNROLL_N 2
#define XGEMM_DEFAULT_UNROLL_N 1
#else
#define SGEMM_DEFAULT_UNROLL_M 16
#define DGEMM_DEFAULT_UNROLL_M 16
#define QGEMM_DEFAULT_UNROLL_M 2
#define CGEMM_DEFAULT_UNROLL_M 8
#define ZGEMM_DEFAULT_UNROLL_M 4
#define XGEMM_DEFAULT_UNROLL_M 1
#define SGEMM_DEFAULT_UNROLL_N 4
#define DGEMM_DEFAULT_UNROLL_N 2
#define QGEMM_DEFAULT_UNROLL_N 2
#define CGEMM_DEFAULT_UNROLL_N 2
#define ZGEMM_DEFAULT_UNROLL_N 2
#define XGEMM_DEFAULT_UNROLL_N 1
#define SGEMM_DEFAULT_UNROLL_MN 32
#define DGEMM_DEFAULT_UNROLL_MN 32
#endif
#ifdef ARCH_X86
#define SGEMM_DEFAULT_P 512
#define SGEMM_DEFAULT_R sgemm_r
#define DGEMM_DEFAULT_P 512
#define DGEMM_DEFAULT_R dgemm_r
#define QGEMM_DEFAULT_P 504
#define QGEMM_DEFAULT_R qgemm_r
#define CGEMM_DEFAULT_P 128
#define CGEMM_DEFAULT_R 1024
#define ZGEMM_DEFAULT_P 512
#define ZGEMM_DEFAULT_R zgemm_r
#define XGEMM_DEFAULT_P 252
#define XGEMM_DEFAULT_R xgemm_r
#define SGEMM_DEFAULT_Q 256
#define DGEMM_DEFAULT_Q 256
#define QGEMM_DEFAULT_Q 128
#define CGEMM_DEFAULT_Q 256
#define ZGEMM_DEFAULT_Q 192
#define XGEMM_DEFAULT_Q 128
#else
#define SGEMM_DEFAULT_P 448
#define DGEMM_DEFAULT_P 192
#define CGEMM_DEFAULT_P 384
#define ZGEMM_DEFAULT_P 256
#define SGEMM_DEFAULT_Q 448
#define DGEMM_DEFAULT_Q 384
#define CGEMM_DEFAULT_Q 192
#define ZGEMM_DEFAULT_Q 128
#define SGEMM_DEFAULT_R sgemm_r
#define DGEMM_DEFAULT_R 8640
#define CGEMM_DEFAULT_R cgemm_r
#define ZGEMM_DEFAULT_R zgemm_r
#define QGEMM_DEFAULT_Q 128
#define QGEMM_DEFAULT_P 504
#define QGEMM_DEFAULT_R qgemm_r
#define XGEMM_DEFAULT_P 252
#define XGEMM_DEFAULT_R xgemm_r
#define XGEMM_DEFAULT_Q 128
#define CGEMM3M_DEFAULT_UNROLL_N 4
#define CGEMM3M_DEFAULT_UNROLL_M 8
#define ZGEMM3M_DEFAULT_UNROLL_N 4
#define ZGEMM3M_DEFAULT_UNROLL_M 4
#define CGEMM3M_DEFAULT_P 320
#define ZGEMM3M_DEFAULT_P 256
#define XGEMM3M_DEFAULT_P 112
#define CGEMM3M_DEFAULT_Q 320
#define ZGEMM3M_DEFAULT_Q 256
#define XGEMM3M_DEFAULT_Q 224
#define CGEMM3M_DEFAULT_R 12288
#define ZGEMM3M_DEFAULT_R 12288
#define XGEMM3M_DEFAULT_R 12288
#endif
#endif
/* Intel Cooper Lake parameters */
#ifdef COOPERLAKE
#define SNUMOPT 16
#define DNUMOPT 8
#define GEMM_DEFAULT_OFFSET_A 0
#define GEMM_DEFAULT_OFFSET_B 0
#define GEMM_DEFAULT_ALIGN 0x03fffUL
#define SYMV_P 8
#if defined(XDOUBLE) || defined(DOUBLE)
#define SWITCH_RATIO 8
#define GEMM_PREFERED_SIZE 8
#else
#define SWITCH_RATIO 16
#define GEMM_PREFERED_SIZE 16
#endif
#define USE_SGEMM_KERNEL_DIRECT 1
#ifdef ARCH_X86
#define SGEMM_DEFAULT_UNROLL_M 4
#define DGEMM_DEFAULT_UNROLL_M 2
#define QGEMM_DEFAULT_UNROLL_M 2
#define CGEMM_DEFAULT_UNROLL_M 2
#define ZGEMM_DEFAULT_UNROLL_M 1
#define XGEMM_DEFAULT_UNROLL_M 1
#define SGEMM_DEFAULT_UNROLL_N 4
#define DGEMM_DEFAULT_UNROLL_N 4
#define QGEMM_DEFAULT_UNROLL_N 2
#define CGEMM_DEFAULT_UNROLL_N 2
#define ZGEMM_DEFAULT_UNROLL_N 2
#define XGEMM_DEFAULT_UNROLL_N 1
#else
#define SGEMM_DEFAULT_UNROLL_M 16
#define DGEMM_DEFAULT_UNROLL_M 16
#define QGEMM_DEFAULT_UNROLL_M 2
#define CGEMM_DEFAULT_UNROLL_M 8
#define ZGEMM_DEFAULT_UNROLL_M 4
#define XGEMM_DEFAULT_UNROLL_M 1
#define SGEMM_DEFAULT_UNROLL_N 4
#define DGEMM_DEFAULT_UNROLL_N 2
#define QGEMM_DEFAULT_UNROLL_N 2
#define CGEMM_DEFAULT_UNROLL_N 2
#define ZGEMM_DEFAULT_UNROLL_N 2
#define XGEMM_DEFAULT_UNROLL_N 1
#define SGEMM_DEFAULT_UNROLL_MN 32
#define DGEMM_DEFAULT_UNROLL_MN 32
#endif
#ifdef ARCH_X86
#define SGEMM_DEFAULT_P 512
#define SGEMM_DEFAULT_R sgemm_r
#define DGEMM_DEFAULT_P 512
#define DGEMM_DEFAULT_R dgemm_r
#define QGEMM_DEFAULT_P 504
#define QGEMM_DEFAULT_R qgemm_r
#define CGEMM_DEFAULT_P 128
#define CGEMM_DEFAULT_R 1024
#define ZGEMM_DEFAULT_P 512
#define ZGEMM_DEFAULT_R zgemm_r
#define XGEMM_DEFAULT_P 252
#define XGEMM_DEFAULT_R xgemm_r
#define SGEMM_DEFAULT_Q 256
#define DGEMM_DEFAULT_Q 256
#define QGEMM_DEFAULT_Q 128
#define CGEMM_DEFAULT_Q 256
#define ZGEMM_DEFAULT_Q 192
#define XGEMM_DEFAULT_Q 128
#else
#define SGEMM_DEFAULT_P 640
#define DGEMM_DEFAULT_P 192
#define CGEMM_DEFAULT_P 384
#define ZGEMM_DEFAULT_P 256
#define SGEMM_DEFAULT_Q 320
#define DGEMM_DEFAULT_Q 384
#define CGEMM_DEFAULT_Q 192
#define ZGEMM_DEFAULT_Q 128
#define SGEMM_DEFAULT_R sgemm_r
#define DGEMM_DEFAULT_R 8640
#define CGEMM_DEFAULT_R cgemm_r
#define ZGEMM_DEFAULT_R zgemm_r
#define QGEMM_DEFAULT_Q 128
#define QGEMM_DEFAULT_P 504
#define QGEMM_DEFAULT_R qgemm_r
#define XGEMM_DEFAULT_P 252
#define XGEMM_DEFAULT_R xgemm_r
#define XGEMM_DEFAULT_Q 128
#define CGEMM3M_DEFAULT_UNROLL_N 4
#define CGEMM3M_DEFAULT_UNROLL_M 8
#define ZGEMM3M_DEFAULT_UNROLL_N 4
#define ZGEMM3M_DEFAULT_UNROLL_M 4
#define CGEMM3M_DEFAULT_P 320
#define ZGEMM3M_DEFAULT_P 256
#define XGEMM3M_DEFAULT_P 112
#define CGEMM3M_DEFAULT_Q 320
#define ZGEMM3M_DEFAULT_Q 256
#define XGEMM3M_DEFAULT_Q 224
#define CGEMM3M_DEFAULT_R 12288
#define ZGEMM3M_DEFAULT_R 12288
#define XGEMM3M_DEFAULT_R 12288
#endif
#endif
/* Intel Atom parameters */
#ifdef ATOM
#define SNUMOPT 2
#define DNUMOPT 1
#define GEMM_DEFAULT_OFFSET_A 64
#define GEMM_DEFAULT_OFFSET_B 0
#define GEMM_DEFAULT_ALIGN (BLASLONG)0x0ffffUL
#define SYMV_P 8
#ifdef ARCH_X86
#define SGEMM_DEFAULT_UNROLL_M 4
#define DGEMM_DEFAULT_UNROLL_M 2
#define QGEMM_DEFAULT_UNROLL_M 2
#define CGEMM_DEFAULT_UNROLL_M 2
#define ZGEMM_DEFAULT_UNROLL_M 1
#define XGEMM_DEFAULT_UNROLL_M 1
#else
#define SGEMM_DEFAULT_UNROLL_M 8
#define DGEMM_DEFAULT_UNROLL_M 4
#define QGEMM_DEFAULT_UNROLL_M 2
#define CGEMM_DEFAULT_UNROLL_M 4
#define ZGEMM_DEFAULT_UNROLL_M 2
#define XGEMM_DEFAULT_UNROLL_M 1
#endif
#define SGEMM_DEFAULT_UNROLL_N 4
#define DGEMM_DEFAULT_UNROLL_N 2
#define QGEMM_DEFAULT_UNROLL_N 2
#define CGEMM_DEFAULT_UNROLL_N 2
#define ZGEMM_DEFAULT_UNROLL_N 1
#define XGEMM_DEFAULT_UNROLL_N 1
#define SGEMM_DEFAULT_P sgemm_p
#define SGEMM_DEFAULT_R sgemm_r
#define DGEMM_DEFAULT_P dgemm_p
#define DGEMM_DEFAULT_R dgemm_r
#define QGEMM_DEFAULT_P qgemm_p
#define QGEMM_DEFAULT_R qgemm_r
#define CGEMM_DEFAULT_P cgemm_p
#define CGEMM_DEFAULT_R cgemm_r
#define ZGEMM_DEFAULT_P zgemm_p
#define ZGEMM_DEFAULT_R zgemm_r
#define XGEMM_DEFAULT_P xgemm_p
#define XGEMM_DEFAULT_R xgemm_r
#define SGEMM_DEFAULT_Q 256
#define DGEMM_DEFAULT_Q 256
#define QGEMM_DEFAULT_Q 256
#define CGEMM_DEFAULT_Q 256
#define ZGEMM_DEFAULT_Q 256
#define XGEMM_DEFAULT_Q 256
#endif
/* Intel Itanium 2 parameters */
#ifdef ITANIUM2
#define SNUMOPT 4
#define DNUMOPT 4
#define GEMM_DEFAULT_OFFSET_A 0
#define GEMM_DEFAULT_OFFSET_B 128
#define GEMM_DEFAULT_ALIGN (BLASLONG)0x03fffUL
#define SGEMM_DEFAULT_UNROLL_M 8
#define SGEMM_DEFAULT_UNROLL_N 8
#define DGEMM_DEFAULT_UNROLL_M 8
#define DGEMM_DEFAULT_UNROLL_N 8
#define QGEMM_DEFAULT_UNROLL_M 8
#define QGEMM_DEFAULT_UNROLL_N 8
#define CGEMM_DEFAULT_UNROLL_M 4
#define CGEMM_DEFAULT_UNROLL_N 4
#define ZGEMM_DEFAULT_UNROLL_M 4
#define ZGEMM_DEFAULT_UNROLL_N 4
#define XGEMM_DEFAULT_UNROLL_M 4
#define XGEMM_DEFAULT_UNROLL_N 4
#define SGEMM_DEFAULT_P sgemm_p
#define DGEMM_DEFAULT_P dgemm_p
#define QGEMM_DEFAULT_P qgemm_p
#define CGEMM_DEFAULT_P cgemm_p
#define ZGEMM_DEFAULT_P zgemm_p
#define XGEMM_DEFAULT_P xgemm_p
#define SGEMM_DEFAULT_Q 1024
#define DGEMM_DEFAULT_Q 1024
#define QGEMM_DEFAULT_Q 1024
#define CGEMM_DEFAULT_Q 1024
#define ZGEMM_DEFAULT_Q 1024
#define XGEMM_DEFAULT_Q 1024
#define SGEMM_DEFAULT_R sgemm_r
#define DGEMM_DEFAULT_R dgemm_r
#define QGEMM_DEFAULT_R qgemm_r
#define CGEMM_DEFAULT_R cgemm_r
#define ZGEMM_DEFAULT_R zgemm_r
#define XGEMM_DEFAULT_R xgemm_r
#define SYMV_P 16
#define GETRF_FACTOR 0.65
#endif
/* Alpha EV4/EV5/EV6 parameters */
#if defined(EV4) || defined(EV5) || defined(EV6)
#ifdef EV4
#define SNUMOPT 1
#define DNUMOPT 1
#else
#define SNUMOPT 2
#define DNUMOPT 2
#endif
#define GEMM_DEFAULT_OFFSET_A 512
#define GEMM_DEFAULT_OFFSET_B 512
#define GEMM_DEFAULT_ALIGN (BLASLONG)0x0ffffUL
#define SGEMM_DEFAULT_UNROLL_M 4
#define SGEMM_DEFAULT_UNROLL_N 4
#define DGEMM_DEFAULT_UNROLL_M 4
#define DGEMM_DEFAULT_UNROLL_N 4
#define CGEMM_DEFAULT_UNROLL_M 2
#define CGEMM_DEFAULT_UNROLL_N 2
#define ZGEMM_DEFAULT_UNROLL_M 2
#define ZGEMM_DEFAULT_UNROLL_N 2
#define SYMV_P 8
#ifdef EV4
#define SGEMM_DEFAULT_P 32
#define SGEMM_DEFAULT_Q 112
#define SGEMM_DEFAULT_R 256
#define DGEMM_DEFAULT_P 32
#define DGEMM_DEFAULT_Q 56
#define DGEMM_DEFAULT_R 256
#define CGEMM_DEFAULT_P 32
#define CGEMM_DEFAULT_Q 64
#define CGEMM_DEFAULT_R 240
#define ZGEMM_DEFAULT_P 32
#define ZGEMM_DEFAULT_Q 32
#define ZGEMM_DEFAULT_R 240
#endif
#ifdef EV5
#define SGEMM_DEFAULT_P 64
#define SGEMM_DEFAULT_Q 256
#define DGEMM_DEFAULT_P 64
#define DGEMM_DEFAULT_Q 128
#define CGEMM_DEFAULT_P 64
#define CGEMM_DEFAULT_Q 128
#define ZGEMM_DEFAULT_P 64
#define ZGEMM_DEFAULT_Q 64
#endif
#ifdef EV6
#define SGEMM_DEFAULT_P 256
#define SGEMM_DEFAULT_Q 512
#define DGEMM_DEFAULT_P 256
#define DGEMM_DEFAULT_Q 256
#define CGEMM_DEFAULT_P 256
#define CGEMM_DEFAULT_Q 256
#define ZGEMM_DEFAULT_P 128
#define ZGEMM_DEFAULT_Q 256
#endif
#endif
#ifdef CELL
#define SNUMOPT 2
#define DNUMOPT 2
#define GEMM_DEFAULT_OFFSET_A 0
#define GEMM_DEFAULT_OFFSET_B 8192
#define GEMM_DEFAULT_ALIGN (BLASLONG)0x0ffffUL
#define SGEMM_DEFAULT_UNROLL_M 16
#define SGEMM_DEFAULT_UNROLL_N 4
#define DGEMM_DEFAULT_UNROLL_M 4
#define DGEMM_DEFAULT_UNROLL_N 4
#define CGEMM_DEFAULT_UNROLL_M 8
#define CGEMM_DEFAULT_UNROLL_N 2
#define ZGEMM_DEFAULT_UNROLL_M 2
#define ZGEMM_DEFAULT_UNROLL_N 2
#define SGEMM_DEFAULT_P 128
#define DGEMM_DEFAULT_P 128
#define CGEMM_DEFAULT_P 128
#define ZGEMM_DEFAULT_P 128
#define SGEMM_DEFAULT_Q 512
#define DGEMM_DEFAULT_Q 256
#define CGEMM_DEFAULT_Q 256
#define ZGEMM_DEFAULT_Q 128
#define SYMV_P 4
#endif
#ifdef PPCG4
#define GEMM_DEFAULT_OFFSET_A 0
#define GEMM_DEFAULT_OFFSET_B 1024
#define GEMM_DEFAULT_ALIGN LONGCAST 0x0ffffUL
#define SGEMM_DEFAULT_UNROLL_M 16
#define SGEMM_DEFAULT_UNROLL_N 4
#define DGEMM_DEFAULT_UNROLL_M 4
#define DGEMM_DEFAULT_UNROLL_N 4
#define CGEMM_DEFAULT_UNROLL_M 2
#define CGEMM_DEFAULT_UNROLL_N 2
#define ZGEMM_DEFAULT_UNROLL_M 2
#define ZGEMM_DEFAULT_UNROLL_N 2
#define SGEMM_DEFAULT_P 256
#define DGEMM_DEFAULT_P 128
#define CGEMM_DEFAULT_P 128
#define ZGEMM_DEFAULT_P 64
#define SGEMM_DEFAULT_Q 256
#define DGEMM_DEFAULT_Q 256
#define CGEMM_DEFAULT_Q 256
#define ZGEMM_DEFAULT_Q 256
#define SYMV_P 4
#endif
/* PowerPC 970 parameters */
#ifdef PPC970
#define SNUMOPT 4
#define DNUMOPT 4
#define GEMM_DEFAULT_OFFSET_A 2688
#define GEMM_DEFAULT_OFFSET_B 3072
#define GEMM_DEFAULT_ALIGN LONGCAST 0x03fffUL
#if defined(__BYTE_ORDER__) && (__BYTE_ORDER__ == __ORDER_BIG_ENDIAN__)
#define SGEMM_DEFAULT_UNROLL_M 4
#else
#define SGEMM_DEFAULT_UNROLL_M 16
#endif
#define SGEMM_DEFAULT_UNROLL_N 4
#define DGEMM_DEFAULT_UNROLL_M 4
#define DGEMM_DEFAULT_UNROLL_N 4
#if defined(__BYTE_ORDER__) && (__BYTE_ORDER__ == __ORDER_BIG_ENDIAN__)
#define CGEMM_DEFAULT_UNROLL_M 2
#else
#define CGEMM_DEFAULT_UNROLL_M 8
#endif
#define CGEMM_DEFAULT_UNROLL_N 2
#define ZGEMM_DEFAULT_UNROLL_M 2
#define ZGEMM_DEFAULT_UNROLL_N 2
#if defined(OS_LINUX) || defined(OS_DARWIN) || defined(OS_FREEBSD)
/* L2_SIZE is compared against 1024976, which is not 1 MiB (1048576). */
#if L2_SIZE == 1024976
#define SGEMM_DEFAULT_P 320
#define DGEMM_DEFAULT_P 256
#define CGEMM_DEFAULT_P 256
#define ZGEMM_DEFAULT_P 256
#else
#define SGEMM_DEFAULT_P 176
#define DGEMM_DEFAULT_P 176
#define CGEMM_DEFAULT_P 176
#define ZGEMM_DEFAULT_P 176
#endif
#endif
#define SGEMM_DEFAULT_Q 512
#define DGEMM_DEFAULT_Q 256
#define CGEMM_DEFAULT_Q 256
#define ZGEMM_DEFAULT_Q 128
#define SYMV_P 4
#endif
#ifdef PPC440
#define SNUMOPT 2
#define DNUMOPT 2
#define GEMM_DEFAULT_OFFSET_A (32 * 0)
#define GEMM_DEFAULT_OFFSET_B (32 * 0)
#define GEMM_DEFAULT_ALIGN LONGCAST 0x0ffffUL
#define SGEMM_DEFAULT_UNROLL_M 4
#define SGEMM_DEFAULT_UNROLL_N 4
#define DGEMM_DEFAULT_UNROLL_M 4
#define DGEMM_DEFAULT_UNROLL_N 4
#define CGEMM_DEFAULT_UNROLL_M 2
#define CGEMM_DEFAULT_UNROLL_N 2
#define ZGEMM_DEFAULT_UNROLL_M 2
#define ZGEMM_DEFAULT_UNROLL_N 2
#define SGEMM_DEFAULT_P 512
#define DGEMM_DEFAULT_P 512
#define CGEMM_DEFAULT_P 512
#define ZGEMM_DEFAULT_P 512
#define SGEMM_DEFAULT_Q 1024
#define DGEMM_DEFAULT_Q 512
#define CGEMM_DEFAULT_Q 512
#define ZGEMM_DEFAULT_Q 256
#define SGEMM_DEFAULT_R SGEMM_DEFAULT_P
#define DGEMM_DEFAULT_R DGEMM_DEFAULT_P
#define CGEMM_DEFAULT_R CGEMM_DEFAULT_P
#define ZGEMM_DEFAULT_R ZGEMM_DEFAULT_P
#define SYMV_P 4
#endif
#ifdef PPC440FP2
#define SNUMOPT 4
#define DNUMOPT 4
#define GEMM_DEFAULT_OFFSET_A (32 * 0)
#define GEMM_DEFAULT_OFFSET_B (32 * 0)
#define GEMM_DEFAULT_ALIGN LONGCAST 0x0ffffUL
#define SGEMM_DEFAULT_UNROLL_M 8
#define SGEMM_DEFAULT_UNROLL_N 4
#define DGEMM_DEFAULT_UNROLL_M 8
#define DGEMM_DEFAULT_UNROLL_N 4
#define CGEMM_DEFAULT_UNROLL_M 4
#define CGEMM_DEFAULT_UNROLL_N 2
#define ZGEMM_DEFAULT_UNROLL_M 4
#define ZGEMM_DEFAULT_UNROLL_N 2
#define SGEMM_DEFAULT_P 128
#define DGEMM_DEFAULT_P 128
#define CGEMM_DEFAULT_P 128
#define ZGEMM_DEFAULT_P 128
#if 1
#define SGEMM_DEFAULT_Q 4096
#define DGEMM_DEFAULT_Q 3072
#define CGEMM_DEFAULT_Q 2048
#define ZGEMM_DEFAULT_Q 1024
#else
#define SGEMM_DEFAULT_Q 512
#define DGEMM_DEFAULT_Q 256
#define CGEMM_DEFAULT_Q 256
#define ZGEMM_DEFAULT_Q 128
#endif
#define SYMV_P 4
#endif
/* IBM POWER3/POWER4/POWER5 parameters */
#if defined(POWER3) || defined(POWER4) || defined(POWER5)
#define GEMM_DEFAULT_OFFSET_A 0
#define GEMM_DEFAULT_OFFSET_B 2048
#define GEMM_DEFAULT_ALIGN LONGCAST 0x0ffffUL
#define SGEMM_DEFAULT_UNROLL_M 4
#define SGEMM_DEFAULT_UNROLL_N 4
#define DGEMM_DEFAULT_UNROLL_M 4
#define DGEMM_DEFAULT_UNROLL_N 4
#define CGEMM_DEFAULT_UNROLL_M 2
#define CGEMM_DEFAULT_UNROLL_N 2
#define ZGEMM_DEFAULT_UNROLL_M 2
#define ZGEMM_DEFAULT_UNROLL_N 2
#ifdef POWER3
#define SNUMOPT 4
#define DNUMOPT 4
#define SGEMM_DEFAULT_P 256
#define SGEMM_DEFAULT_Q 432
#define SGEMM_DEFAULT_R 1012
#define DGEMM_DEFAULT_P 256
#define DGEMM_DEFAULT_Q 216
#define DGEMM_DEFAULT_R 1012
#define ZGEMM_DEFAULT_P 256
#define ZGEMM_DEFAULT_Q 104
#define ZGEMM_DEFAULT_R 1012
#endif
#if defined(POWER4)
#ifdef ALLOC_HUGETLB
#define SGEMM_DEFAULT_P 184
#define DGEMM_DEFAULT_P 184
#define CGEMM_DEFAULT_P 184
#define ZGEMM_DEFAULT_P 184
#else
#define SGEMM_DEFAULT_P 144
#define DGEMM_DEFAULT_P 144
#define CGEMM_DEFAULT_P 144
#define ZGEMM_DEFAULT_P 144
#endif
#endif
#if defined(POWER5)
#ifdef ALLOC_HUGETLB
#define SGEMM_DEFAULT_P 512
#define DGEMM_DEFAULT_P 256
#define CGEMM_DEFAULT_P 256
#define ZGEMM_DEFAULT_P 128
#else
#define SGEMM_DEFAULT_P 320
#define DGEMM_DEFAULT_P 160
#define CGEMM_DEFAULT_P 160
#define ZGEMM_DEFAULT_P 80
#endif
#define SGEMM_DEFAULT_Q 256
#define CGEMM_DEFAULT_Q 256
#define DGEMM_DEFAULT_Q 256
#define ZGEMM_DEFAULT_Q 256
#endif
#define SYMV_P 8
#endif
#if defined(POWER6)
#define SNUMOPT 4
#define DNUMOPT 4
#define GEMM_DEFAULT_OFFSET_A 384
#define GEMM_DEFAULT_OFFSET_B 1024
#define GEMM_DEFAULT_ALIGN LONGCAST 0x03fffUL
#define SGEMM_DEFAULT_UNROLL_M 4
#define SGEMM_DEFAULT_UNROLL_N 4
#define DGEMM_DEFAULT_UNROLL_M 4
#define DGEMM_DEFAULT_UNROLL_N 4
#define CGEMM_DEFAULT_UNROLL_M 2
#define CGEMM_DEFAULT_UNROLL_N 4
#define ZGEMM_DEFAULT_UNROLL_M 2
#define ZGEMM_DEFAULT_UNROLL_N 4
#define SGEMM_DEFAULT_P 992
#define DGEMM_DEFAULT_P 480
#define CGEMM_DEFAULT_P 488
#define ZGEMM_DEFAULT_P 248
#define SGEMM_DEFAULT_Q 504
#define DGEMM_DEFAULT_Q 504
#define CGEMM_DEFAULT_Q 400
#define ZGEMM_DEFAULT_Q 400
#define SYMV_P 8
#endif
/* IBM POWER8 parameters; 32-bit builds fall back to the POWER6 unroll factors */
#if defined(POWER8)
#define SNUMOPT 16
#define DNUMOPT 8
#define GEMM_DEFAULT_OFFSET_A 0
#define GEMM_DEFAULT_OFFSET_B 65536
#define GEMM_DEFAULT_ALIGN LONGCAST 0x0ffffUL
#if defined(__32BIT__)
#warning using BINARY32==POWER6
#define SGEMM_DEFAULT_UNROLL_M 4
#define SGEMM_DEFAULT_UNROLL_N 4
#define DGEMM_DEFAULT_UNROLL_M 4
#define DGEMM_DEFAULT_UNROLL_N 4
#define CGEMM_DEFAULT_UNROLL_M 2
#define CGEMM_DEFAULT_UNROLL_N 4
#define ZGEMM_DEFAULT_UNROLL_M 2
#define ZGEMM_DEFAULT_UNROLL_N 4
#else
#define SGEMM_DEFAULT_UNROLL_M 16
#define SGEMM_DEFAULT_UNROLL_N 8
#define DGEMM_DEFAULT_UNROLL_M 16
#define DGEMM_DEFAULT_UNROLL_N 4
#define CGEMM_DEFAULT_UNROLL_M 8
#define CGEMM_DEFAULT_UNROLL_N 4
#define ZGEMM_DEFAULT_UNROLL_M 8
#define ZGEMM_DEFAULT_UNROLL_N 2
#endif
#define SGEMM_DEFAULT_P 1280UL
#define DGEMM_DEFAULT_P 640UL
#define CGEMM_DEFAULT_P 640UL
#define ZGEMM_DEFAULT_P 320UL
#define SGEMM_DEFAULT_Q 640UL
#define DGEMM_DEFAULT_Q 720UL
#define CGEMM_DEFAULT_Q 640UL
#define ZGEMM_DEFAULT_Q 640UL
#if 0
#define SGEMM_DEFAULT_R SGEMM_DEFAULT_P
#define DGEMM_DEFAULT_R DGEMM_DEFAULT_P
#define CGEMM_DEFAULT_R CGEMM_DEFAULT_P
#define ZGEMM_DEFAULT_R ZGEMM_DEFAULT_P
#endif
#define SGEMM_DEFAULT_R 4096
#define DGEMM_DEFAULT_R 4096
#define CGEMM_DEFAULT_R 4096
#define ZGEMM_DEFAULT_R 4096
#define SYMV_P 8
#endif
#if defined(POWER9)
#define SNUMOPT 16
#define DNUMOPT 8
#define GEMM_DEFAULT_OFFSET_A 0
#define GEMM_DEFAULT_OFFSET_B 65536
#define GEMM_DEFAULT_ALIGN LONGCAST 0x0ffffUL
#define SWITCH_RATIO 16
#define GEMM_PREFERED_SIZE 16
#define SGEMM_DEFAULT_UNROLL_M 16
#define SGEMM_DEFAULT_UNROLL_N 8
#define DGEMM_DEFAULT_UNROLL_M 16
#define DGEMM_DEFAULT_UNROLL_N 4
#define CGEMM_DEFAULT_UNROLL_M 8
#define CGEMM_DEFAULT_UNROLL_N 4
#define ZGEMM_DEFAULT_UNROLL_M 8
#define ZGEMM_DEFAULT_UNROLL_N 2
#define SGEMM_DEFAULT_P 832
#define DGEMM_DEFAULT_P 128
#define CGEMM_DEFAULT_P 512
#define ZGEMM_DEFAULT_P 256
#define SGEMM_DEFAULT_Q 1026
#define DGEMM_DEFAULT_Q 384
#define CGEMM_DEFAULT_Q 1026
#define ZGEMM_DEFAULT_Q 1026
#define SGEMM_DEFAULT_R 4096
#define DGEMM_DEFAULT_R 4096
#define CGEMM_DEFAULT_R 4096
#define ZGEMM_DEFAULT_R 4096
#define SYMV_P 8
#endif
/* IBM POWER10 parameters; the SBGEMM defaults are overridden at the end of the block */
#if defined(POWER10)
#define SNUMOPT 16
#define DNUMOPT 8
#define GEMM_DEFAULT_OFFSET_A 0
#define GEMM_DEFAULT_OFFSET_B 65536
#define GEMM_DEFAULT_ALIGN LONGCAST 0x0ffffUL
#define SWITCH_RATIO 16
#define GEMM_PREFERED_SIZE 16
#define SGEMM_DEFAULT_UNROLL_M 16
#define SGEMM_DEFAULT_UNROLL_N 8
#if defined(__BYTE_ORDER__) && (__BYTE_ORDER__ == __ORDER_BIG_ENDIAN__)
#define DGEMM_DEFAULT_UNROLL_M 16
#define DGEMM_DEFAULT_UNROLL_N 4
#else
#define DGEMM_DEFAULT_UNROLL_M 8
#define DGEMM_DEFAULT_UNROLL_N 8
#endif
#define CGEMM_DEFAULT_UNROLL_M 8
#define CGEMM_DEFAULT_UNROLL_N 4
#define ZGEMM_DEFAULT_UNROLL_M 8
#define ZGEMM_DEFAULT_UNROLL_N 2
#define SGEMM_DEFAULT_P 512
#define DGEMM_DEFAULT_P 384
#define CGEMM_DEFAULT_P 512
#define ZGEMM_DEFAULT_P 256
#define SGEMM_DEFAULT_Q 512
#define DGEMM_DEFAULT_Q 512
#define CGEMM_DEFAULT_Q 384
#define ZGEMM_DEFAULT_Q 384
#define SGEMM_DEFAULT_R 4096
#define DGEMM_DEFAULT_R 4096
#define CGEMM_DEFAULT_R 4096
#define ZGEMM_DEFAULT_R 4096
#define SYMV_P 8
#undef SBGEMM_DEFAULT_UNROLL_N
#undef SBGEMM_DEFAULT_UNROLL_M
#undef SBGEMM_DEFAULT_P
#undef SBGEMM_DEFAULT_R
#undef SBGEMM_DEFAULT_Q
#define SBGEMM_DEFAULT_UNROLL_M 16
#define SBGEMM_DEFAULT_UNROLL_N 8
#define SBGEMM_DEFAULT_P 832
#define SBGEMM_DEFAULT_Q 1026
#define SBGEMM_DEFAULT_R 4096
#endif
#if defined(SPARC) && defined(V7)
#define SNUMOPT 4
#define DNUMOPT 4
#define GEMM_DEFAULT_OFFSET_A 0
#define GEMM_DEFAULT_OFFSET_B 2048
#define GEMM_DEFAULT_ALIGN (BLASLONG)0x03fffUL
#define SGEMM_DEFAULT_UNROLL_M 2
#define SGEMM_DEFAULT_UNROLL_N 8
#define DGEMM_DEFAULT_UNROLL_M 2
#define DGEMM_DEFAULT_UNROLL_N 8
#define CGEMM_DEFAULT_UNROLL_M 1
#define CGEMM_DEFAULT_UNROLL_N 4
#define ZGEMM_DEFAULT_UNROLL_M 1
#define ZGEMM_DEFAULT_UNROLL_N 4
#define SGEMM_DEFAULT_P 256
#define DGEMM_DEFAULT_P 256
#define CGEMM_DEFAULT_P 256
#define ZGEMM_DEFAULT_P 256
#define SGEMM_DEFAULT_Q 512
#define DGEMM_DEFAULT_Q 256
#define CGEMM_DEFAULT_Q 256
#define ZGEMM_DEFAULT_Q 128
#define SYMV_P 8
#define GEMM_THREAD gemm_thread_mn
#endif
#if (defined(SPARC) && defined(V9)) || defined(__sparc_v9__)
#define SNUMOPT 2
#define DNUMOPT 2
#define GEMM_DEFAULT_OFFSET_A 0
#define GEMM_DEFAULT_OFFSET_B 2048
#define GEMM_DEFAULT_ALIGN (BLASLONG)0x03fffUL
#define SGEMM_DEFAULT_UNROLL_M 4
#define SGEMM_DEFAULT_UNROLL_N 4
#define DGEMM_DEFAULT_UNROLL_M 4
#define DGEMM_DEFAULT_UNROLL_N 4
#define CGEMM_DEFAULT_UNROLL_M 2
#define CGEMM_DEFAULT_UNROLL_N 2
#define ZGEMM_DEFAULT_UNROLL_M 2
#define ZGEMM_DEFAULT_UNROLL_N 2
#define SGEMM_DEFAULT_P 512
#define DGEMM_DEFAULT_P 512
#define CGEMM_DEFAULT_P 512
#define ZGEMM_DEFAULT_P 512
#define SGEMM_DEFAULT_Q 1024
#define DGEMM_DEFAULT_Q 512
#define CGEMM_DEFAULT_Q 512
#define ZGEMM_DEFAULT_Q 256
#define SYMV_P 8
#endif
#ifdef SICORTEX
#define SNUMOPT 2
#define DNUMOPT 2
#define GEMM_DEFAULT_OFFSET_A 0
#define GEMM_DEFAULT_OFFSET_B 0
#define GEMM_DEFAULT_ALIGN (BLASLONG)0x03fffUL
#define SGEMM_DEFAULT_UNROLL_M 2
#define SGEMM_DEFAULT_UNROLL_N 8
#define DGEMM_DEFAULT_UNROLL_M 2
#define DGEMM_DEFAULT_UNROLL_N 8
#define CGEMM_DEFAULT_UNROLL_M 1
#define CGEMM_DEFAULT_UNROLL_N 4
#define ZGEMM_DEFAULT_UNROLL_M 1
#define ZGEMM_DEFAULT_UNROLL_N 4
#define SGEMM_DEFAULT_P 108
#define DGEMM_DEFAULT_P 112
#define CGEMM_DEFAULT_P 108
#define ZGEMM_DEFAULT_P 112
#define SGEMM_DEFAULT_Q 288
#define DGEMM_DEFAULT_Q 144
#define CGEMM_DEFAULT_Q 144
#define ZGEMM_DEFAULT_Q 72
#define SGEMM_DEFAULT_R 2000
#define DGEMM_DEFAULT_R 2000
#define CGEMM_DEFAULT_R 2000
#define ZGEMM_DEFAULT_R 2000
#define SYMV_P 16
#endif
/* Loongson-3 R4 parameters; larger unroll factors when MSA is available */
#if defined(LOONGSON3R4)
#define SNUMOPT 2
#define DNUMOPT 2
#define GEMM_DEFAULT_OFFSET_A 0
#define GEMM_DEFAULT_OFFSET_B 0
#define GEMM_DEFAULT_ALIGN (BLASLONG)0x03fffUL
#ifdef HAVE_MSA
#define SGEMM_DEFAULT_UNROLL_M 8
#define SGEMM_DEFAULT_UNROLL_N 8
#define DGEMM_DEFAULT_UNROLL_M 8
#define DGEMM_DEFAULT_UNROLL_N 4
#define CGEMM_DEFAULT_UNROLL_M 8
#define CGEMM_DEFAULT_UNROLL_N 4
#define ZGEMM_DEFAULT_UNROLL_M 4
#define ZGEMM_DEFAULT_UNROLL_N 4
#else
#define SGEMM_DEFAULT_UNROLL_M 8
#define SGEMM_DEFAULT_UNROLL_N 4
#define DGEMM_DEFAULT_UNROLL_M 4
#define DGEMM_DEFAULT_UNROLL_N 4
#define CGEMM_DEFAULT_UNROLL_M 4
#define CGEMM_DEFAULT_UNROLL_N 2
#define ZGEMM_DEFAULT_UNROLL_M 2
#define ZGEMM_DEFAULT_UNROLL_N 2
#endif
#define SGEMM_DEFAULT_P 64
#define DGEMM_DEFAULT_P 44
#define CGEMM_DEFAULT_P 64
#define ZGEMM_DEFAULT_P 32
#define SGEMM_DEFAULT_Q 192
#define DGEMM_DEFAULT_Q 92
#define CGEMM_DEFAULT_Q 128
#define ZGEMM_DEFAULT_Q 80
#define SGEMM_DEFAULT_R 640
#define DGEMM_DEFAULT_R dgemm_r
#define CGEMM_DEFAULT_R 640
#define ZGEMM_DEFAULT_R 640
#define GEMM_OFFSET_A1 0x10000
#define GEMM_OFFSET_B1 0x100000
#define SYMV_P 16
#endif
#if defined(LOONGSON3R3)
/* Copied from SICORTEX */
#define SNUMOPT 2
#define DNUMOPT 2
#define GEMM_DEFAULT_OFFSET_A 0
#define GEMM_DEFAULT_OFFSET_B 0
#define GEMM_DEFAULT_ALIGN (BLASLONG)0x03fffUL
#define SGEMM_DEFAULT_UNROLL_M 8
#define SGEMM_DEFAULT_UNROLL_N 4
#define DGEMM_DEFAULT_UNROLL_M 4
#define DGEMM_DEFAULT_UNROLL_N 4
#define CGEMM_DEFAULT_UNROLL_M 4
#define CGEMM_DEFAULT_UNROLL_N 2
#define ZGEMM_DEFAULT_UNROLL_M 2
#define ZGEMM_DEFAULT_UNROLL_N 2
#define SGEMM_DEFAULT_P 64
#define DGEMM_DEFAULT_P 44
#define CGEMM_DEFAULT_P 64
#define ZGEMM_DEFAULT_P 32
#define SGEMM_DEFAULT_Q 192
#define DGEMM_DEFAULT_Q 92
#define CGEMM_DEFAULT_Q 128
#define ZGEMM_DEFAULT_Q 80
#define SGEMM_DEFAULT_R 640
#define DGEMM_DEFAULT_R dgemm_r
#define CGEMM_DEFAULT_R 640
#define ZGEMM_DEFAULT_R 640
#define GEMM_OFFSET_A1 0x10000
#define GEMM_OFFSET_B1 0x100000
#define SYMV_P 16
#endif
/* MIPS cores; larger unroll factors when MSA is available */
#if defined(P5600) || defined(MIPS1004K) || defined(MIPS24K) || defined(I6400) || defined(P6600) || defined(I6500)
#define SNUMOPT 2
#define DNUMOPT 2
#define GEMM_DEFAULT_OFFSET_A 0
#define GEMM_DEFAULT_OFFSET_B 0
#define GEMM_DEFAULT_ALIGN (BLASLONG)0x03fffUL
#ifdef HAVE_MSA
#define SGEMM_DEFAULT_UNROLL_M 8
#define SGEMM_DEFAULT_UNROLL_N 8
#define DGEMM_DEFAULT_UNROLL_M 8
#define DGEMM_DEFAULT_UNROLL_N 4
#define CGEMM_DEFAULT_UNROLL_M 8
#define CGEMM_DEFAULT_UNROLL_N 4
#define ZGEMM_DEFAULT_UNROLL_M 4
#define ZGEMM_DEFAULT_UNROLL_N 4
#else
#define SGEMM_DEFAULT_UNROLL_M 2
#define SGEMM_DEFAULT_UNROLL_N 2
#define DGEMM_DEFAULT_UNROLL_M 2
#define DGEMM_DEFAULT_UNROLL_N 2
#define CGEMM_DEFAULT_UNROLL_M 2
#define CGEMM_DEFAULT_UNROLL_N 2
#define ZGEMM_DEFAULT_UNROLL_M 2
#define ZGEMM_DEFAULT_UNROLL_N 2
#endif
#define SGEMM_DEFAULT_P 128
#define DGEMM_DEFAULT_P 128
#define CGEMM_DEFAULT_P 96
#define ZGEMM_DEFAULT_P 64
#define SGEMM_DEFAULT_Q 240
#define DGEMM_DEFAULT_Q 120
#define CGEMM_DEFAULT_Q 120
#define ZGEMM_DEFAULT_Q 120
#define SGEMM_DEFAULT_R 12288
#define DGEMM_DEFAULT_R 8192
#define CGEMM_DEFAULT_R 4096
#define ZGEMM_DEFAULT_R 4096
#define SYMV_P 16
#endif
#ifdef RISCV64_GENERIC
#define GEMM_DEFAULT_OFFSET_A 0
#define GEMM_DEFAULT_OFFSET_B 0
#define GEMM_DEFAULT_ALIGN (BLASLONG)0x03fffUL
#define SGEMM_DEFAULT_UNROLL_M 2
#define SGEMM_DEFAULT_UNROLL_N 2
#define DGEMM_DEFAULT_UNROLL_M 2
#define DGEMM_DEFAULT_UNROLL_N 2
#define CGEMM_DEFAULT_UNROLL_M 2
#define CGEMM_DEFAULT_UNROLL_N 2
#define ZGEMM_DEFAULT_UNROLL_M 2
#define ZGEMM_DEFAULT_UNROLL_N 2
#define SGEMM_DEFAULT_P 128
#define DGEMM_DEFAULT_P 128
#define CGEMM_DEFAULT_P 96
#define ZGEMM_DEFAULT_P 64
#define SGEMM_DEFAULT_Q 240
#define DGEMM_DEFAULT_Q 120
#define CGEMM_DEFAULT_Q 120
#define ZGEMM_DEFAULT_Q 120
#define SGEMM_DEFAULT_R 12288
#define DGEMM_DEFAULT_R 8192
#define CGEMM_DEFAULT_R 4096
#define ZGEMM_DEFAULT_R 4096
#define SYMV_P 16
#endif
#ifdef C910V
#define GEMM_DEFAULT_OFFSET_A 0
#define GEMM_DEFAULT_OFFSET_B 0
#define GEMM_DEFAULT_ALIGN 0x03fffUL
#define SGEMM_DEFAULT_UNROLL_M 16
#define SGEMM_DEFAULT_UNROLL_N 4
#define DGEMM_DEFAULT_UNROLL_M 8
#define DGEMM_DEFAULT_UNROLL_N 4
#define CGEMM_DEFAULT_UNROLL_M 2
#define CGEMM_DEFAULT_UNROLL_N 2
#define ZGEMM_DEFAULT_UNROLL_M 2
#define ZGEMM_DEFAULT_UNROLL_N 2
#define SGEMM_DEFAULT_P 160
#define DGEMM_DEFAULT_P 160
#define CGEMM_DEFAULT_P 96
#define ZGEMM_DEFAULT_P 64
#define SGEMM_DEFAULT_Q 240
#define DGEMM_DEFAULT_Q 128
#define CGEMM_DEFAULT_Q 120
#define ZGEMM_DEFAULT_Q 120
#define SGEMM_DEFAULT_R 12288
#define DGEMM_DEFAULT_R 8192
#define CGEMM_DEFAULT_R 4096
#define ZGEMM_DEFAULT_R 4096
#define SYMV_P 16
#endif
#ifdef ARMV7
#define SNUMOPT 2
#define DNUMOPT 2
#define GEMM_DEFAULT_OFFSET_A 0
#define GEMM_DEFAULT_OFFSET_B 0
#define GEMM_DEFAULT_ALIGN (BLASLONG)0x03fffUL
#define SGEMM_DEFAULT_UNROLL_M 4
#define SGEMM_DEFAULT_UNROLL_N 4
#define DGEMM_DEFAULT_UNROLL_M 4
#define DGEMM_DEFAULT_UNROLL_N 4
#define CGEMM_DEFAULT_UNROLL_M 2
#define CGEMM_DEFAULT_UNROLL_N 2
#define ZGEMM_DEFAULT_UNROLL_M 2
#define ZGEMM_DEFAULT_UNROLL_N 2
#define SGEMM_DEFAULT_P 128
#define DGEMM_DEFAULT_P 128
#define CGEMM_DEFAULT_P 96
#define ZGEMM_DEFAULT_P 64
#define SGEMM_DEFAULT_Q 240
#define DGEMM_DEFAULT_Q 120
#define CGEMM_DEFAULT_Q 120
#define ZGEMM_DEFAULT_Q 120
#define SGEMM_DEFAULT_R 12288
#define DGEMM_DEFAULT_R 8192
#define CGEMM_DEFAULT_R 4096
#define ZGEMM_DEFAULT_R 4096
#define SYMV_P 16
#endif
#if defined(ARMV6)
#define SNUMOPT 2
#define DNUMOPT 2
#define GEMM_DEFAULT_OFFSET_A 0
#define GEMM_DEFAULT_OFFSET_B 0
#define GEMM_DEFAULT_ALIGN (BLASLONG)0x03fffUL
#define SGEMM_DEFAULT_UNROLL_M 4
#define SGEMM_DEFAULT_UNROLL_N 2
#define DGEMM_DEFAULT_UNROLL_M 4
#define DGEMM_DEFAULT_UNROLL_N 2
#define CGEMM_DEFAULT_UNROLL_M 2
#define CGEMM_DEFAULT_UNROLL_N 2
#define ZGEMM_DEFAULT_UNROLL_M 2
#define ZGEMM_DEFAULT_UNROLL_N 2
#define SGEMM_DEFAULT_P 128
#define DGEMM_DEFAULT_P 128
#define CGEMM_DEFAULT_P 96
#define ZGEMM_DEFAULT_P 64
#define SGEMM_DEFAULT_Q 240
#define DGEMM_DEFAULT_Q 120
#define CGEMM_DEFAULT_Q 120
#define ZGEMM_DEFAULT_Q 120
#define SGEMM_DEFAULT_R 12288
#define DGEMM_DEFAULT_R 8192
#define CGEMM_DEFAULT_R 4096
#define ZGEMM_DEFAULT_R 4096
#define SYMV_P 16
#endif
/* Common ARMv8 parameters */
#if defined(ARMV8)
#define SNUMOPT 2
#define DNUMOPT 2
#define GEMM_DEFAULT_OFFSET_A 0
#define GEMM_DEFAULT_OFFSET_B 0
#define GEMM_DEFAULT_ALIGN 0x03fffUL
#define SYMV_P 16
#if defined(CORTEXA57) || \
    defined(CORTEXA72) || defined(CORTEXA73) || \
    defined(FALKOR) || defined(TSV110) || defined(EMAG8180)
#define SGEMM_DEFAULT_UNROLL_M 16
#define SGEMM_DEFAULT_UNROLL_N 4
#define DGEMM_DEFAULT_UNROLL_M 8
#define DGEMM_DEFAULT_UNROLL_N 4
#define CGEMM_DEFAULT_UNROLL_M 8
#define CGEMM_DEFAULT_UNROLL_N 4
#define ZGEMM_DEFAULT_UNROLL_M 4
#define ZGEMM_DEFAULT_UNROLL_N 4
/* FIXME: the P and Q values below should be derived from the cache size, but
   there is currently no easy way to query that on ARM. As a heuristic, if
   getarch counted more than 8 cores we assume the host is a large desktop or
   server with abundant cache rather than a phone or embedded device.
   TSV110 and EMAG8180 are always treated as such hosts. */
#if NUM_CORES > 8 || defined(TSV110) || defined(EMAG8180)
#define SGEMM_DEFAULT_P 512
#define DGEMM_DEFAULT_P 256
#define CGEMM_DEFAULT_P 256
#define ZGEMM_DEFAULT_P 128
#define SGEMM_DEFAULT_Q 1024
#define DGEMM_DEFAULT_Q 512
#define CGEMM_DEFAULT_Q 512
#define ZGEMM_DEFAULT_Q 512
#else
#define SGEMM_DEFAULT_P 128
#define DGEMM_DEFAULT_P 160
#define CGEMM_DEFAULT_P 128
#define ZGEMM_DEFAULT_P 128
#define SGEMM_DEFAULT_Q 352
#define DGEMM_DEFAULT_Q 128
#define CGEMM_DEFAULT_Q 224
#define ZGEMM_DEFAULT_Q 112
#endif
#define SGEMM_DEFAULT_R 4096
#define DGEMM_DEFAULT_R 4096
#define CGEMM_DEFAULT_R 4096
#define ZGEMM_DEFAULT_R 2048
#elif defined(CORTEXA53)
#define SGEMM_DEFAULT_UNROLL_M 8
#define SGEMM_DEFAULT_UNROLL_N 8
#define DGEMM_DEFAULT_UNROLL_M 8
#define DGEMM_DEFAULT_UNROLL_N 4
#define CGEMM_DEFAULT_UNROLL_M 8
#define CGEMM_DEFAULT_UNROLL_N 4
#define ZGEMM_DEFAULT_UNROLL_M 4
#define ZGEMM_DEFAULT_UNROLL_N 4
#define SGEMM_DEFAULT_P 256
#define DGEMM_DEFAULT_P 160
#define CGEMM_DEFAULT_P 128
#define ZGEMM_DEFAULT_P 128
#define SGEMM_DEFAULT_Q 256
#define DGEMM_DEFAULT_Q 128
#define CGEMM_DEFAULT_Q 224
#define ZGEMM_DEFAULT_Q 112
#define SGEMM_DEFAULT_R 4096
#define DGEMM_DEFAULT_R 4096
#define CGEMM_DEFAULT_R 4096
#define ZGEMM_DEFAULT_R 2048
#elif defined(THUNDERX)
#define SGEMM_DEFAULT_UNROLL_M 4
#define SGEMM_DEFAULT_UNROLL_N 4
#define DGEMM_DEFAULT_UNROLL_M 2
#define DGEMM_DEFAULT_UNROLL_N 2
#define CGEMM_DEFAULT_UNROLL_M 2
#define CGEMM_DEFAULT_UNROLL_N 2
#define ZGEMM_DEFAULT_UNROLL_M 2
#define ZGEMM_DEFAULT_UNROLL_N 2
#define SGEMM_DEFAULT_P 128
#define DGEMM_DEFAULT_P 128
#define CGEMM_DEFAULT_P 96
#define ZGEMM_DEFAULT_P 64
#define SGEMM_DEFAULT_Q 240
#define DGEMM_DEFAULT_Q 120
#define CGEMM_DEFAULT_Q 120
#define ZGEMM_DEFAULT_Q 120
#define SGEMM_DEFAULT_R 12288
#define DGEMM_DEFAULT_R 8192
#define CGEMM_DEFAULT_R 4096
#define ZGEMM_DEFAULT_R 4096
#elif defined(THUNDERX2T99)
#define SGEMM_DEFAULT_UNROLL_M 16
#define SGEMM_DEFAULT_UNROLL_N 4
#define DGEMM_DEFAULT_UNROLL_M 8
#define DGEMM_DEFAULT_UNROLL_N 4
#define CGEMM_DEFAULT_UNROLL_M 8
#define CGEMM_DEFAULT_UNROLL_N 4
#define ZGEMM_DEFAULT_UNROLL_M 4
#define ZGEMM_DEFAULT_UNROLL_N 4
#define SGEMM_DEFAULT_P 128
#define DGEMM_DEFAULT_P 160
#define CGEMM_DEFAULT_P 128
#define ZGEMM_DEFAULT_P 128
#define SGEMM_DEFAULT_Q 352
#define DGEMM_DEFAULT_Q 128
#define CGEMM_DEFAULT_Q 224
#define ZGEMM_DEFAULT_Q 112
#define SGEMM_DEFAULT_R 4096
#define DGEMM_DEFAULT_R 4096
#define CGEMM_DEFAULT_R 4096
#define ZGEMM_DEFAULT_R 4096
#elif defined(THUNDERX3T110)
#define SGEMM_DEFAULT_UNROLL_M 16
#define SGEMM_DEFAULT_UNROLL_N 4
#define DGEMM_DEFAULT_UNROLL_M 8
#define DGEMM_DEFAULT_UNROLL_N 4
#define CGEMM_DEFAULT_UNROLL_M 8
#define CGEMM_DEFAULT_UNROLL_N 4
#define ZGEMM_DEFAULT_UNROLL_M 4
#define ZGEMM_DEFAULT_UNROLL_N 4
#define SGEMM_DEFAULT_P 128
#define DGEMM_DEFAULT_P 320
#define CGEMM_DEFAULT_P 128
#define ZGEMM_DEFAULT_P 128
#define SGEMM_DEFAULT_Q 352
#define DGEMM_DEFAULT_Q 128
#define CGEMM_DEFAULT_Q 224
#define ZGEMM_DEFAULT_Q 112
#define SGEMM_DEFAULT_R 4096
#define DGEMM_DEFAULT_R 4096
#define CGEMM_DEFAULT_R 4096
#define ZGEMM_DEFAULT_R 4096
#elif defined(NEOVERSEN1)
#define SGEMM_DEFAULT_UNROLL_M 16
#define SGEMM_DEFAULT_UNROLL_N 4
#define DGEMM_DEFAULT_UNROLL_M 8
#define DGEMM_DEFAULT_UNROLL_N 4
#define CGEMM_DEFAULT_UNROLL_M 8
#define CGEMM_DEFAULT_UNROLL_N 4
#define ZGEMM_DEFAULT_UNROLL_M 4
#define ZGEMM_DEFAULT_UNROLL_N 4
#define SGEMM_DEFAULT_P 128
#define DGEMM_DEFAULT_P 160
#define CGEMM_DEFAULT_P 128
#define ZGEMM_DEFAULT_P 128
#define SGEMM_DEFAULT_Q 352
#define DGEMM_DEFAULT_Q 128
#define CGEMM_DEFAULT_Q 224
#define ZGEMM_DEFAULT_Q 112
#define SGEMM_DEFAULT_R 4096
#define DGEMM_DEFAULT_R 4096
#define CGEMM_DEFAULT_R 4096
#define ZGEMM_DEFAULT_R 4096
#else /* Other/undetected ARMv8 cores */
#define SGEMM_DEFAULT_UNROLL_M 16
#define SGEMM_DEFAULT_UNROLL_N 4
#define DGEMM_DEFAULT_UNROLL_M 8
#define DGEMM_DEFAULT_UNROLL_N 4
#define CGEMM_DEFAULT_UNROLL_M 8
#define CGEMM_DEFAULT_UNROLL_N 4
#define ZGEMM_DEFAULT_UNROLL_M 4
#define ZGEMM_DEFAULT_UNROLL_N 4
#define SGEMM_DEFAULT_P 128
#define DGEMM_DEFAULT_P 160
#define CGEMM_DEFAULT_P 128
#define ZGEMM_DEFAULT_P 128
#define SGEMM_DEFAULT_Q 352
#define DGEMM_DEFAULT_Q 128
#define CGEMM_DEFAULT_Q 224
#define ZGEMM_DEFAULT_Q 112
#define SGEMM_DEFAULT_R 4096
#define DGEMM_DEFAULT_R 4096
#define CGEMM_DEFAULT_R 4096
#define ZGEMM_DEFAULT_R 4096
#endif /* Cores */
#endif /* ARMv8 */
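/* Worked example for the NUM_CORES heuristic in the ARMv8 section above.
 * This is a rough sketch, not a tuning rule. It ASSUMES the usual GEMM
 * blocking convention in which P and Q are the row and depth dimensions,
 * in elements, of one packed block of A; that interpretation is not stated
 * in this file. Under that assumption the block footprint is
 * P * Q * sizeof(element), which for the DGEMM values (8-byte doubles) gives:
 *
 *   NUM_CORES > 8 branch:  256 * 512 * 8 = 1048576 bytes (1 MiB)
 *   fallback branch:       160 * 128 * 8 =  163840 bytes (160 KiB)
 *
 * The difference is consistent with selecting the larger values only when
 * the host is assumed to have abundant cache.
 */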
#if defined(ARMV5)
#define SNUMOPT 2
#define DNUMOPT 2
#define GEMM_DEFAULT_OFFSET_A 0
#define GEMM_DEFAULT_OFFSET_B 0
#define GEMM_DEFAULT_ALIGN (BLASLONG)0x03fffUL
#define SGEMM_DEFAULT_UNROLL_M 2
#define SGEMM_DEFAULT_UNROLL_N 2
#define DGEMM_DEFAULT_UNROLL_M 2
#define DGEMM_DEFAULT_UNROLL_N 2
#define CGEMM_DEFAULT_UNROLL_M 2
#define CGEMM_DEFAULT_UNROLL_N 2
#define ZGEMM_DEFAULT_UNROLL_M 2
#define ZGEMM_DEFAULT_UNROLL_N 2
#define SGEMM_DEFAULT_P 128
#define DGEMM_DEFAULT_P 128
#define CGEMM_DEFAULT_P 96
#define ZGEMM_DEFAULT_P 64
#define SGEMM_DEFAULT_Q 240
#define DGEMM_DEFAULT_Q 120
#define CGEMM_DEFAULT_Q 120
#define ZGEMM_DEFAULT_Q 120
#define SGEMM_DEFAULT_R 12288
#define DGEMM_DEFAULT_R 8192
#define CGEMM_DEFAULT_R 4096
#define ZGEMM_DEFAULT_R 4096
#define SYMV_P 16
#endif
#ifdef CORTEXA9
#define SNUMOPT 2
#define DNUMOPT 2
#define GEMM_DEFAULT_OFFSET_A 0
#define GEMM_DEFAULT_OFFSET_B 0
#define GEMM_DEFAULT_ALIGN (BLASLONG)0x03fffUL
#define SGEMM_DEFAULT_UNROLL_M 4
#define SGEMM_DEFAULT_UNROLL_N 4
#define DGEMM_DEFAULT_UNROLL_M 4
#define DGEMM_DEFAULT_UNROLL_N 4
#define CGEMM_DEFAULT_UNROLL_M 2
#define CGEMM_DEFAULT_UNROLL_N 2
#define ZGEMM_DEFAULT_UNROLL_M 2
#define ZGEMM_DEFAULT_UNROLL_N 2
#define SGEMM_DEFAULT_P 128
#define DGEMM_DEFAULT_P 128
#define CGEMM_DEFAULT_P 96
#define ZGEMM_DEFAULT_P 64
#define SGEMM_DEFAULT_Q 240
#define DGEMM_DEFAULT_Q 120
#define CGEMM_DEFAULT_Q 120
#define ZGEMM_DEFAULT_Q 120
#define SGEMM_DEFAULT_R 12288
#define DGEMM_DEFAULT_R 8192
#define CGEMM_DEFAULT_R 4096
#define ZGEMM_DEFAULT_R 4096
#define SYMV_P 16
#endif
#ifdef CORTEXA15
#define SNUMOPT 2
#define DNUMOPT 2
#define GEMM_DEFAULT_OFFSET_A 0
#define GEMM_DEFAULT_OFFSET_B 0
#define GEMM_DEFAULT_ALIGN (BLASLONG)0x03fffUL
#define SGEMM_DEFAULT_UNROLL_M 4
#define SGEMM_DEFAULT_UNROLL_N 4
#define DGEMM_DEFAULT_UNROLL_M 4
#define DGEMM_DEFAULT_UNROLL_N 4
#define CGEMM_DEFAULT_UNROLL_M 2
#define CGEMM_DEFAULT_UNROLL_N 2
#define ZGEMM_DEFAULT_UNROLL_M 2
#define ZGEMM_DEFAULT_UNROLL_N 2
#define SGEMM_DEFAULT_P 128
#define DGEMM_DEFAULT_P 128
#define CGEMM_DEFAULT_P 96
#define ZGEMM_DEFAULT_P 64
#define SGEMM_DEFAULT_Q 240
#define DGEMM_DEFAULT_Q 120
#define CGEMM_DEFAULT_Q 120
#define ZGEMM_DEFAULT_Q 120
#define SGEMM_DEFAULT_R 12288
#define DGEMM_DEFAULT_R 8192
#define CGEMM_DEFAULT_R 4096
#define ZGEMM_DEFAULT_R 4096
#define SYMV_P 16
#endif
#if defined(ZARCH_GENERIC)
#define SNUMOPT 2
#define DNUMOPT 2
#define GEMM_DEFAULT_OFFSET_A 0
#define GEMM_DEFAULT_OFFSET_B 0
#define GEMM_DEFAULT_ALIGN (BLASLONG)0x03fffUL
#define SGEMM_DEFAULT_UNROLL_M 2
#define SGEMM_DEFAULT_UNROLL_N 2
#define DGEMM_DEFAULT_UNROLL_M 2
#define DGEMM_DEFAULT_UNROLL_N 2
#define CGEMM_DEFAULT_UNROLL_M 2
#define CGEMM_DEFAULT_UNROLL_N 2
#define ZGEMM_DEFAULT_UNROLL_M 2
#define ZGEMM_DEFAULT_UNROLL_N 2
#define SGEMM_DEFAULT_P 128
#define DGEMM_DEFAULT_P 128
#define CGEMM_DEFAULT_P 96
#define ZGEMM_DEFAULT_P 64
#define SGEMM_DEFAULT_Q 240
#define DGEMM_DEFAULT_Q 120
#define CGEMM_DEFAULT_Q 120
#define ZGEMM_DEFAULT_Q 120
#define SGEMM_DEFAULT_R 12288
#define DGEMM_DEFAULT_R 8192
#define CGEMM_DEFAULT_R 4096
#define ZGEMM_DEFAULT_R 4096
#define SYMV_P 16
#endif
#if defined(Z13)
#define SNUMOPT 2
#define DNUMOPT 2
#define GEMM_DEFAULT_OFFSET_A 0
#define GEMM_DEFAULT_OFFSET_B 0
#define GEMM_DEFAULT_ALIGN (BLASLONG)0x03fffUL
#define SGEMM_DEFAULT_UNROLL_M 8
#define SGEMM_DEFAULT_UNROLL_N 4
#define DGEMM_DEFAULT_UNROLL_M 8
#define DGEMM_DEFAULT_UNROLL_N 4
#define CGEMM_DEFAULT_UNROLL_M 4
#define CGEMM_DEFAULT_UNROLL_N 4
#define ZGEMM_DEFAULT_UNROLL_M 4
#define ZGEMM_DEFAULT_UNROLL_N 4
#define SGEMM_DEFAULT_P 456
#define DGEMM_DEFAULT_P 320
#define CGEMM_DEFAULT_P 480
#define ZGEMM_DEFAULT_P 224
#define SGEMM_DEFAULT_Q 488
#define DGEMM_DEFAULT_Q 384
#define CGEMM_DEFAULT_Q 128
#define ZGEMM_DEFAULT_Q 352
#define SGEMM_DEFAULT_R 8192
#define DGEMM_DEFAULT_R 4096
#define CGEMM_DEFAULT_R 4096
#define ZGEMM_DEFAULT_R 2048
#define SYMV_P 16
#endif
#if defined(Z14)
#define SNUMOPT 2
#define DNUMOPT 2
#define GEMM_DEFAULT_OFFSET_A 0
#define GEMM_DEFAULT_OFFSET_B 0
#define GEMM_DEFAULT_ALIGN 0x03fffUL
#define SGEMM_DEFAULT_UNROLL_M 16
#define SGEMM_DEFAULT_UNROLL_N 4
#define DGEMM_DEFAULT_UNROLL_M 8
#define DGEMM_DEFAULT_UNROLL_N 4
#define CGEMM_DEFAULT_UNROLL_M 4
#define CGEMM_DEFAULT_UNROLL_N 4
#define ZGEMM_DEFAULT_UNROLL_M 4
#define ZGEMM_DEFAULT_UNROLL_N 4
#define SGEMM_DEFAULT_P 480
#define DGEMM_DEFAULT_P 320
#define CGEMM_DEFAULT_P 480
#define ZGEMM_DEFAULT_P 224
#define SGEMM_DEFAULT_Q 512
#define DGEMM_DEFAULT_Q 384
#define CGEMM_DEFAULT_Q 128
#define ZGEMM_DEFAULT_Q 352
#define SGEMM_DEFAULT_R 8192
#define DGEMM_DEFAULT_R 4096
#define CGEMM_DEFAULT_R 4096
#define ZGEMM_DEFAULT_R 2048
#define SYMV_P 16
#endif
#ifdef GENERIC
#define SNUMOPT 2
#define DNUMOPT 2
#define GEMM_DEFAULT_OFFSET_A 0
#define GEMM_DEFAULT_OFFSET_B 0
#define GEMM_DEFAULT_ALIGN (BLASLONG)0x0ffffUL
#define SGEMM_DEFAULT_UNROLL_N 2
#define DGEMM_DEFAULT_UNROLL_N 2
#define QGEMM_DEFAULT_UNROLL_N 2
#define CGEMM_DEFAULT_UNROLL_N 2
#define ZGEMM_DEFAULT_UNROLL_N 2
#define XGEMM_DEFAULT_UNROLL_N 1
#ifdef ARCH_X86
#define SGEMM_DEFAULT_UNROLL_M 2
#define DGEMM_DEFAULT_UNROLL_M 2
#define QGEMM_DEFAULT_UNROLL_M 2
#define CGEMM_DEFAULT_UNROLL_M 2
#define ZGEMM_DEFAULT_UNROLL_M 2
#define XGEMM_DEFAULT_UNROLL_M 1
#else
#define SGEMM_DEFAULT_UNROLL_M 2
#define DGEMM_DEFAULT_UNROLL_M 2
#define QGEMM_DEFAULT_UNROLL_M 2
#define CGEMM_DEFAULT_UNROLL_M 2
#define ZGEMM_DEFAULT_UNROLL_M 2
#define XGEMM_DEFAULT_UNROLL_M 1
#endif
#define SGEMM_DEFAULT_P sgemm_p
#define DGEMM_DEFAULT_P dgemm_p
#define QGEMM_DEFAULT_P qgemm_p
#define CGEMM_DEFAULT_P cgemm_p
#define ZGEMM_DEFAULT_P zgemm_p
#define XGEMM_DEFAULT_P xgemm_p
#define SGEMM_DEFAULT_R sgemm_r
#define DGEMM_DEFAULT_R dgemm_r
#define QGEMM_DEFAULT_R qgemm_r
#define CGEMM_DEFAULT_R cgemm_r
#define ZGEMM_DEFAULT_R zgemm_r
#define XGEMM_DEFAULT_R xgemm_r
#define SGEMM_DEFAULT_Q 128
#define DGEMM_DEFAULT_Q 128
#define QGEMM_DEFAULT_Q 128
#define CGEMM_DEFAULT_Q 128
#define ZGEMM_DEFAULT_Q 128
#define XGEMM_DEFAULT_Q 128
#define SYMV_P 16
#endif
#ifndef QGEMM_DEFAULT_UNROLL_M
#define QGEMM_DEFAULT_UNROLL_M 2
#endif
#ifndef QGEMM_DEFAULT_UNROLL_N
#define QGEMM_DEFAULT_UNROLL_N 2
#endif
#ifndef XGEMM_DEFAULT_UNROLL_M
#define XGEMM_DEFAULT_UNROLL_M 2
#endif
#ifndef XGEMM_DEFAULT_UNROLL_N
#define XGEMM_DEFAULT_UNROLL_N 2
#endif
#ifndef HAVE_SSE2
#define SHUFPD_0 shufps $0x44,
#define SHUFPD_1 shufps $0x4e,
#define SHUFPD_2 shufps $0xe4,
#define SHUFPD_3 shufps $0xee,
#endif
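/* Note on the non-SSE2 fallback above. This is a sketch of the reasoning,
 * decoded from the standard SHUFPS/SHUFPD immediate encoding; the dst/src
 * names are illustrative and follow AT&T operand order
 * ("shufps $imm, src, dst"). Each 64-bit lane is treated as a pair of
 * 32-bit lanes. SHUFPS takes result dwords 0 and 1 from dst (immediate
 * bits 1:0 and 3:2) and result dwords 2 and 3 from src (bits 5:4 and 7:6):
 *
 *   0x44 = 01 00 01 00b -> dst[0], dst[1], src[0], src[1]
 *                          low qword of dst, low qword of src   (shufpd $0)
 *   0x4e = 01 00 11 10b -> dst[2], dst[3], src[0], src[1]
 *                          high qword of dst, low qword of src  (shufpd $1)
 *   0xe4 = 11 10 01 00b -> dst[0], dst[1], src[2], src[3]
 *                          low qword of dst, high qword of src  (shufpd $2)
 *   0xee = 11 10 11 10b -> dst[2], dst[3], src[2], src[3]
 *                          high qword of dst, high qword of src (shufpd $3)
 */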
#ifndef SHUFPD_0
#define SHUFPD_0 shufpd $0,
#endif
#ifndef SHUFPD_1
#define SHUFPD_1 shufpd $1,
#endif
#ifndef SHUFPD_2
#define SHUFPD_2 shufpd $2,
#endif
#ifndef SHUFPD_3
#define SHUFPD_3 shufpd $3,
#endif
#ifndef SHUFPS_39
#define SHUFPS_39 shufps $0x39,
#endif
#endif