param.h 94 kB

Simplifying ARMv8 build parameters

ARMv8 builds were a bit mixed up. ThunderX2 code was used in ARMv8 mode, which is not right because TX2 is ARMv8.1. The defines also required a few redundancies, which made them harder to maintain and made it harder to see which core has what. A few other minor issues were also fixed.

Tests were run on the following cores: A53, A57, A72, Falkor, ThunderX, ThunderX2 and XGene. The test suites were OpenBLAS/test, OpenBLAS/benchmark and BLAS-Tester.

Summary:
* Removed TX2 code from the ARMv8 build, so that the build is compatible with all ARMv8 cores, not just v8.1. The TX2 code had also hurt performance on big cores.
* Commoned up the ARMv8 architectures' defines in param.h, so that every ARMv8 core benefits from the ARMv8 settings in addition to its own.
* Added a few more cores, using ARMv8's include strategy, so they benefit from compiler optimisations via -mtune. Also updated cache information from the manuals, with conservative values set by default. Removed Vulcan, as it is an alias for TX2.
* Auto-detected most of those cores, and also updated the forced compilation in getarch.c, so that the parameters are the same whether the library is compiled natively or with a forced architecture.

Benefits:
* The ARMv8 build is now guaranteed to work on all ARMv8 cores.
* Improved performance for ARMv8 builds on some cores (A72, Falkor, ThunderX1 and 2: up to 11%) over current develop.
* Improved performance for *all* cores compared to the develop branch before TX2's patch (9% to 36%).
* ThunderX1 builds are 14% faster than ARMv8 builds on TX1, 9% faster than the current develop branch and 8% faster than develop before the TX2 patches.

Issues:
* Regression from the current develop branch for A53 (-12%) and A57 (-3%) with ARMv8 builds, though still faster than before TX2's commit (+15% and +24% respectively). This can be improved by simplifying TX2's code in future patches. At least the code is now guaranteed to be ARMv8.0.

Comments:
* CortexA57 builds on A57 hardware are unchanged from the develop branch, which makes sense, as that code is untouched.
* CortexA72 builds improve over A57 builds on A72 hardware, even though they use the same includes, because of the new compiler tuning in the makefile.
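The "commoned up" defines in the summary can be pictured as a two-layer preprocessor scheme. A shared ARMv8 block sets conservative defaults for every ARMv8-family target, and each core then overrides only the values that differ. The sketch below is a hypothetical illustration of that layering, not the actual contents of param.h. Every macro name (EX_TARGET_*, EX_IS_ARMV8, EX_L2_CACHE_KB, EX_UNROLL_M), every accessor function and every numeric value is invented for the example.

```c
#include <stdio.h>

/* Hypothetical target selection. In a real build the target macro would be
   passed by the build system (e.g. -DEX_TARGET_THUNDERX); default to the
   Cortex-A72 variant here so the file compiles on its own. */
#if !defined(EX_TARGET_ARMV8) && !defined(EX_TARGET_CORTEXA72) && \
    !defined(EX_TARGET_THUNDERX)
#define EX_TARGET_CORTEXA72
#endif

/* Layer 1: common ARMv8 block. Every ARMv8-family target picks up these
   conservative defaults first, so a generic ARMv8 build only ever sees
   values that are safe on any ARMv8.0 core. */
#if defined(EX_TARGET_ARMV8) || defined(EX_TARGET_CORTEXA72) || \
    defined(EX_TARGET_THUNDERX)
#define EX_IS_ARMV8 1
#define EX_L2_CACHE_KB 256 /* illustrative conservative default */
#define EX_UNROLL_M 4      /* illustrative kernel blocking factor */
#endif

/* Layer 2: per-core overrides on top of the common block. Only values
   that differ are redefined; everything else is inherited unchanged. */
#if defined(EX_TARGET_CORTEXA72)
#undef EX_L2_CACHE_KB
#define EX_L2_CACHE_KB 1024 /* illustrative value */
#undef EX_UNROLL_M
#define EX_UNROLL_M 8       /* illustrative value */
#elif defined(EX_TARGET_THUNDERX)
#undef EX_L2_CACHE_KB
#define EX_L2_CACHE_KB 16384 /* illustrative value; EX_UNROLL_M inherited */
#endif

/* Accessors so the values selected at compile time can be inspected. */
int ex_is_armv8(void) { return EX_IS_ARMV8; }
int ex_l2_cache_kb(void) { return EX_L2_CACHE_KB; }
int ex_unroll_m(void) { return EX_UNROLL_M; }

/* Prints the configuration chosen by the preprocessor. */
void ex_print_config(void) {
    printf("ARMv8=%d L2=%dKB UNROLL_M=%d\n",
           ex_is_armv8(), ex_l2_cache_kb(), ex_unroll_m());
}
```

Compiled with no target flag, the sketch falls back to the Cortex-A72 variant and overrides both values. With -DEX_TARGET_THUNDERX it keeps the common EX_UNROLL_M of 4 and picks up only the larger cache value. With -DEX_TARGET_ARMV8 it sees nothing but the shared defaults. Because each core restates only its differences, adding a core means writing one small override block rather than copying the whole ARMv8 section.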
6 years ago
12 years ago
3 years ago
Simplifying ARMv8 build parameters ARMv8 builds were a bit mixed up, with ThunderX2 code in ARMv8 mode (which is not right because TX2 is ARMv8.1) as well as requiring a few redundancies in the defines, making it harder to maintain and understand what core has what. A few other minor issues were also fixed. Tests were made on the following cores: A53, A57, A72, Falkor, ThunderX, ThunderX2, and XGene. Tests were: OpenBLAS/test, OpenBLAS/benchmark, BLAS-Tester. A summary: * Removed TX2 code from ARMv8 build, to make sure it is compatible with all ARMv8 cores, not just v8.1. Also, the TX2 code has actually harmed performance on big cores. * Commoned up ARMv8 architectures' defines in params.h, to make sure that all will benefit from ARMv8 settings, in addition to their own. * Adding a few more cores, using ARMv8's include strategy, to benefit from compiler optimisations using mtune. Also updated cache information from the manuals, making sure we set good conservative values by default. Removed Vulcan, as it's an alias to TX2. * Auto-detecting most of those cores, but also updating the forced compilation in getarch.c, to make sure the parameters are the same whether compiled natively or forced arch. Benefits: * ARMv8 build is now guaranteed to work on all ARMv8 cores * Improved performance for ARMv8 builds on some cores (A72, Falkor, ThunderX1 and 2: up to 11%) over current develop * Improved performance for *all* cores comparing to develop branch before TX2's patch (9% ~ 36%) * ThunderX1 builds are 14% faster than ARMv8 on TX1, 9% faster than current develop's branch and 8% faster than deveop before tx2 patches Issues: * Regression from current develop branch for A53 (-12%) and A57 (-3%) with ARMv8 builds, but still faster than before TX2's commit (+15% and +24% respectively). This can be improved with a simplification of TX2's code, to be done in future patches. At least the code is guaranteed to be ARMv8.0 now. 
Comments: * CortexA57 builds are unchanged on A57 hardware from develop's branch, which makes sense, as it's untouched. * CortexA72 builds improve over A57 on A72 hardware, even if they're using the same includes due to new compiler tunning in the makefile.
6 years ago
Simplifying ARMv8 build parameters ARMv8 builds were a bit mixed up, with ThunderX2 code in ARMv8 mode (which is not right because TX2 is ARMv8.1) as well as requiring a few redundancies in the defines, making it harder to maintain and understand what core has what. A few other minor issues were also fixed. Tests were made on the following cores: A53, A57, A72, Falkor, ThunderX, ThunderX2, and XGene. Tests were: OpenBLAS/test, OpenBLAS/benchmark, BLAS-Tester. A summary: * Removed TX2 code from ARMv8 build, to make sure it is compatible with all ARMv8 cores, not just v8.1. Also, the TX2 code has actually harmed performance on big cores. * Commoned up ARMv8 architectures' defines in params.h, to make sure that all will benefit from ARMv8 settings, in addition to their own. * Adding a few more cores, using ARMv8's include strategy, to benefit from compiler optimisations using mtune. Also updated cache information from the manuals, making sure we set good conservative values by default. Removed Vulcan, as it's an alias to TX2. * Auto-detecting most of those cores, but also updating the forced compilation in getarch.c, to make sure the parameters are the same whether compiled natively or forced arch. Benefits: * ARMv8 build is now guaranteed to work on all ARMv8 cores * Improved performance for ARMv8 builds on some cores (A72, Falkor, ThunderX1 and 2: up to 11%) over current develop * Improved performance for *all* cores comparing to develop branch before TX2's patch (9% ~ 36%) * ThunderX1 builds are 14% faster than ARMv8 on TX1, 9% faster than current develop's branch and 8% faster than deveop before tx2 patches Issues: * Regression from current develop branch for A53 (-12%) and A57 (-3%) with ARMv8 builds, but still faster than before TX2's commit (+15% and +24% respectively). This can be improved with a simplification of TX2's code, to be done in future patches. At least the code is guaranteed to be ARMv8.0 now. 
Comments: * CortexA57 builds are unchanged on A57 hardware from develop's branch, which makes sense, as it's untouched. * CortexA72 builds improve over A57 on A72 hardware, even if they're using the same includes due to new compiler tunning in the makefile.
6 years ago
12 years ago
12 years ago
Simplifying ARMv8 build parameters ARMv8 builds were a bit mixed up, with ThunderX2 code in ARMv8 mode (which is not right because TX2 is ARMv8.1) as well as requiring a few redundancies in the defines, making it harder to maintain and understand what core has what. A few other minor issues were also fixed. Tests were made on the following cores: A53, A57, A72, Falkor, ThunderX, ThunderX2, and XGene. Tests were: OpenBLAS/test, OpenBLAS/benchmark, BLAS-Tester. A summary: * Removed TX2 code from ARMv8 build, to make sure it is compatible with all ARMv8 cores, not just v8.1. Also, the TX2 code has actually harmed performance on big cores. * Commoned up ARMv8 architectures' defines in params.h, to make sure that all will benefit from ARMv8 settings, in addition to their own. * Adding a few more cores, using ARMv8's include strategy, to benefit from compiler optimisations using mtune. Also updated cache information from the manuals, making sure we set good conservative values by default. Removed Vulcan, as it's an alias to TX2. * Auto-detecting most of those cores, but also updating the forced compilation in getarch.c, to make sure the parameters are the same whether compiled natively or forced arch. Benefits: * ARMv8 build is now guaranteed to work on all ARMv8 cores * Improved performance for ARMv8 builds on some cores (A72, Falkor, ThunderX1 and 2: up to 11%) over current develop * Improved performance for *all* cores comparing to develop branch before TX2's patch (9% ~ 36%) * ThunderX1 builds are 14% faster than ARMv8 on TX1, 9% faster than current develop's branch and 8% faster than deveop before tx2 patches Issues: * Regression from current develop branch for A53 (-12%) and A57 (-3%) with ARMv8 builds, but still faster than before TX2's commit (+15% and +24% respectively). This can be improved with a simplification of TX2's code, to be done in future patches. At least the code is guaranteed to be ARMv8.0 now. 
Comments: * CortexA57 builds are unchanged on A57 hardware from develop's branch, which makes sense, as it's untouched. * CortexA72 builds improve over A57 on A72 hardware, even if they're using the same includes due to new compiler tunning in the makefile.
6 years ago
12 years ago
Simplifying ARMv8 build parameters ARMv8 builds were a bit mixed up, with ThunderX2 code in ARMv8 mode (which is not right because TX2 is ARMv8.1) as well as requiring a few redundancies in the defines, making it harder to maintain and understand what core has what. A few other minor issues were also fixed. Tests were made on the following cores: A53, A57, A72, Falkor, ThunderX, ThunderX2, and XGene. Tests were: OpenBLAS/test, OpenBLAS/benchmark, BLAS-Tester. A summary: * Removed TX2 code from ARMv8 build, to make sure it is compatible with all ARMv8 cores, not just v8.1. Also, the TX2 code has actually harmed performance on big cores. * Commoned up ARMv8 architectures' defines in params.h, to make sure that all will benefit from ARMv8 settings, in addition to their own. * Adding a few more cores, using ARMv8's include strategy, to benefit from compiler optimisations using mtune. Also updated cache information from the manuals, making sure we set good conservative values by default. Removed Vulcan, as it's an alias to TX2. * Auto-detecting most of those cores, but also updating the forced compilation in getarch.c, to make sure the parameters are the same whether compiled natively or forced arch. Benefits: * ARMv8 build is now guaranteed to work on all ARMv8 cores * Improved performance for ARMv8 builds on some cores (A72, Falkor, ThunderX1 and 2: up to 11%) over current develop * Improved performance for *all* cores comparing to develop branch before TX2's patch (9% ~ 36%) * ThunderX1 builds are 14% faster than ARMv8 on TX1, 9% faster than current develop's branch and 8% faster than deveop before tx2 patches Issues: * Regression from current develop branch for A53 (-12%) and A57 (-3%) with ARMv8 builds, but still faster than before TX2's commit (+15% and +24% respectively). This can be improved with a simplification of TX2's code, to be done in future patches. At least the code is guaranteed to be ARMv8.0 now. 
Comments: * CortexA57 builds are unchanged on A57 hardware from develop's branch, which makes sense, as it's untouched. * CortexA72 builds improve over A57 on A72 hardware, even if they're using the same includes due to new compiler tunning in the makefile.
6 years ago
Simplifying ARMv8 build parameters ARMv8 builds were a bit mixed up, with ThunderX2 code in ARMv8 mode (which is not right because TX2 is ARMv8.1) as well as requiring a few redundancies in the defines, making it harder to maintain and understand what core has what. A few other minor issues were also fixed. Tests were made on the following cores: A53, A57, A72, Falkor, ThunderX, ThunderX2, and XGene. Tests were: OpenBLAS/test, OpenBLAS/benchmark, BLAS-Tester. A summary: * Removed TX2 code from ARMv8 build, to make sure it is compatible with all ARMv8 cores, not just v8.1. Also, the TX2 code has actually harmed performance on big cores. * Commoned up ARMv8 architectures' defines in params.h, to make sure that all will benefit from ARMv8 settings, in addition to their own. * Adding a few more cores, using ARMv8's include strategy, to benefit from compiler optimisations using mtune. Also updated cache information from the manuals, making sure we set good conservative values by default. Removed Vulcan, as it's an alias to TX2. * Auto-detecting most of those cores, but also updating the forced compilation in getarch.c, to make sure the parameters are the same whether compiled natively or forced arch. Benefits: * ARMv8 build is now guaranteed to work on all ARMv8 cores * Improved performance for ARMv8 builds on some cores (A72, Falkor, ThunderX1 and 2: up to 11%) over current develop * Improved performance for *all* cores comparing to develop branch before TX2's patch (9% ~ 36%) * ThunderX1 builds are 14% faster than ARMv8 on TX1, 9% faster than current develop's branch and 8% faster than deveop before tx2 patches Issues: * Regression from current develop branch for A53 (-12%) and A57 (-3%) with ARMv8 builds, but still faster than before TX2's commit (+15% and +24% respectively). This can be improved with a simplification of TX2's code, to be done in future patches. At least the code is guaranteed to be ARMv8.0 now. 
Comments: * CortexA57 builds are unchanged on A57 hardware from develop's branch, which makes sense, as it's untouched. * CortexA72 builds improve over A57 on A72 hardware, even if they're using the same includes due to new compiler tunning in the makefile.
6 years ago
Simplifying ARMv8 build parameters ARMv8 builds were a bit mixed up, with ThunderX2 code in ARMv8 mode (which is not right because TX2 is ARMv8.1) as well as requiring a few redundancies in the defines, making it harder to maintain and understand what core has what. A few other minor issues were also fixed. Tests were made on the following cores: A53, A57, A72, Falkor, ThunderX, ThunderX2, and XGene. Tests were: OpenBLAS/test, OpenBLAS/benchmark, BLAS-Tester. A summary: * Removed TX2 code from ARMv8 build, to make sure it is compatible with all ARMv8 cores, not just v8.1. Also, the TX2 code has actually harmed performance on big cores. * Commoned up ARMv8 architectures' defines in params.h, to make sure that all will benefit from ARMv8 settings, in addition to their own. * Adding a few more cores, using ARMv8's include strategy, to benefit from compiler optimisations using mtune. Also updated cache information from the manuals, making sure we set good conservative values by default. Removed Vulcan, as it's an alias to TX2. * Auto-detecting most of those cores, but also updating the forced compilation in getarch.c, to make sure the parameters are the same whether compiled natively or forced arch. Benefits: * ARMv8 build is now guaranteed to work on all ARMv8 cores * Improved performance for ARMv8 builds on some cores (A72, Falkor, ThunderX1 and 2: up to 11%) over current develop * Improved performance for *all* cores comparing to develop branch before TX2's patch (9% ~ 36%) * ThunderX1 builds are 14% faster than ARMv8 on TX1, 9% faster than current develop's branch and 8% faster than deveop before tx2 patches Issues: * Regression from current develop branch for A53 (-12%) and A57 (-3%) with ARMv8 builds, but still faster than before TX2's commit (+15% and +24% respectively). This can be improved with a simplification of TX2's code, to be done in future patches. At least the code is guaranteed to be ARMv8.0 now. 
Comments: * CortexA57 builds are unchanged on A57 hardware from develop's branch, which makes sense, as it's untouched. * CortexA72 builds improve over A57 on A72 hardware, even if they're using the same includes due to new compiler tunning in the makefile.
6 years ago
Simplifying ARMv8 build parameters ARMv8 builds were a bit mixed up, with ThunderX2 code in ARMv8 mode (which is not right because TX2 is ARMv8.1) as well as requiring a few redundancies in the defines, making it harder to maintain and understand what core has what. A few other minor issues were also fixed. Tests were made on the following cores: A53, A57, A72, Falkor, ThunderX, ThunderX2, and XGene. Tests were: OpenBLAS/test, OpenBLAS/benchmark, BLAS-Tester. A summary: * Removed TX2 code from ARMv8 build, to make sure it is compatible with all ARMv8 cores, not just v8.1. Also, the TX2 code has actually harmed performance on big cores. * Commoned up ARMv8 architectures' defines in params.h, to make sure that all will benefit from ARMv8 settings, in addition to their own. * Adding a few more cores, using ARMv8's include strategy, to benefit from compiler optimisations using mtune. Also updated cache information from the manuals, making sure we set good conservative values by default. Removed Vulcan, as it's an alias to TX2. * Auto-detecting most of those cores, but also updating the forced compilation in getarch.c, to make sure the parameters are the same whether compiled natively or forced arch. Benefits: * ARMv8 build is now guaranteed to work on all ARMv8 cores * Improved performance for ARMv8 builds on some cores (A72, Falkor, ThunderX1 and 2: up to 11%) over current develop * Improved performance for *all* cores comparing to develop branch before TX2's patch (9% ~ 36%) * ThunderX1 builds are 14% faster than ARMv8 on TX1, 9% faster than current develop's branch and 8% faster than deveop before tx2 patches Issues: * Regression from current develop branch for A53 (-12%) and A57 (-3%) with ARMv8 builds, but still faster than before TX2's commit (+15% and +24% respectively). This can be improved with a simplification of TX2's code, to be done in future patches. At least the code is guaranteed to be ARMv8.0 now. 
Comments: * CortexA57 builds are unchanged on A57 hardware from develop's branch, which makes sense, as it's untouched. * CortexA72 builds improve over A57 on A72 hardware, even if they're using the same includes due to new compiler tunning in the makefile.
6 years ago
Simplifying ARMv8 build parameters ARMv8 builds were a bit mixed up, with ThunderX2 code in ARMv8 mode (which is not right because TX2 is ARMv8.1) as well as requiring a few redundancies in the defines, making it harder to maintain and understand what core has what. A few other minor issues were also fixed. Tests were made on the following cores: A53, A57, A72, Falkor, ThunderX, ThunderX2, and XGene. Tests were: OpenBLAS/test, OpenBLAS/benchmark, BLAS-Tester. A summary: * Removed TX2 code from ARMv8 build, to make sure it is compatible with all ARMv8 cores, not just v8.1. Also, the TX2 code has actually harmed performance on big cores. * Commoned up ARMv8 architectures' defines in params.h, to make sure that all will benefit from ARMv8 settings, in addition to their own. * Adding a few more cores, using ARMv8's include strategy, to benefit from compiler optimisations using mtune. Also updated cache information from the manuals, making sure we set good conservative values by default. Removed Vulcan, as it's an alias to TX2. * Auto-detecting most of those cores, but also updating the forced compilation in getarch.c, to make sure the parameters are the same whether compiled natively or forced arch. Benefits: * ARMv8 build is now guaranteed to work on all ARMv8 cores * Improved performance for ARMv8 builds on some cores (A72, Falkor, ThunderX1 and 2: up to 11%) over current develop * Improved performance for *all* cores comparing to develop branch before TX2's patch (9% ~ 36%) * ThunderX1 builds are 14% faster than ARMv8 on TX1, 9% faster than current develop's branch and 8% faster than deveop before tx2 patches Issues: * Regression from current develop branch for A53 (-12%) and A57 (-3%) with ARMv8 builds, but still faster than before TX2's commit (+15% and +24% respectively). This can be improved with a simplification of TX2's code, to be done in future patches. At least the code is guaranteed to be ARMv8.0 now. 
Comments: * CortexA57 builds are unchanged on A57 hardware from develop's branch, which makes sense, as it's untouched. * CortexA72 builds improve over A57 on A72 hardware, even if they're using the same includes due to new compiler tunning in the makefile.
6 years ago
Simplifying ARMv8 build parameters ARMv8 builds were a bit mixed up, with ThunderX2 code in ARMv8 mode (which is not right because TX2 is ARMv8.1) as well as requiring a few redundancies in the defines, making it harder to maintain and understand what core has what. A few other minor issues were also fixed. Tests were made on the following cores: A53, A57, A72, Falkor, ThunderX, ThunderX2, and XGene. Tests were: OpenBLAS/test, OpenBLAS/benchmark, BLAS-Tester. A summary: * Removed TX2 code from ARMv8 build, to make sure it is compatible with all ARMv8 cores, not just v8.1. Also, the TX2 code has actually harmed performance on big cores. * Commoned up ARMv8 architectures' defines in params.h, to make sure that all will benefit from ARMv8 settings, in addition to their own. * Adding a few more cores, using ARMv8's include strategy, to benefit from compiler optimisations using mtune. Also updated cache information from the manuals, making sure we set good conservative values by default. Removed Vulcan, as it's an alias to TX2. * Auto-detecting most of those cores, but also updating the forced compilation in getarch.c, to make sure the parameters are the same whether compiled natively or forced arch. Benefits: * ARMv8 build is now guaranteed to work on all ARMv8 cores * Improved performance for ARMv8 builds on some cores (A72, Falkor, ThunderX1 and 2: up to 11%) over current develop * Improved performance for *all* cores comparing to develop branch before TX2's patch (9% ~ 36%) * ThunderX1 builds are 14% faster than ARMv8 on TX1, 9% faster than current develop's branch and 8% faster than deveop before tx2 patches Issues: * Regression from current develop branch for A53 (-12%) and A57 (-3%) with ARMv8 builds, but still faster than before TX2's commit (+15% and +24% respectively). This can be improved with a simplification of TX2's code, to be done in future patches. At least the code is guaranteed to be ARMv8.0 now. 
Comments: * CortexA57 builds are unchanged on A57 hardware from develop's branch, which makes sense, as it's untouched. * CortexA72 builds improve over A57 on A72 hardware, even if they're using the same includes due to new compiler tunning in the makefile.
6 years ago
Simplifying ARMv8 build parameters ARMv8 builds were a bit mixed up, with ThunderX2 code in ARMv8 mode (which is not right because TX2 is ARMv8.1) as well as requiring a few redundancies in the defines, making it harder to maintain and understand what core has what. A few other minor issues were also fixed. Tests were made on the following cores: A53, A57, A72, Falkor, ThunderX, ThunderX2, and XGene. Tests were: OpenBLAS/test, OpenBLAS/benchmark, BLAS-Tester. A summary: * Removed TX2 code from ARMv8 build, to make sure it is compatible with all ARMv8 cores, not just v8.1. Also, the TX2 code has actually harmed performance on big cores. * Commoned up ARMv8 architectures' defines in params.h, to make sure that all will benefit from ARMv8 settings, in addition to their own. * Adding a few more cores, using ARMv8's include strategy, to benefit from compiler optimisations using mtune. Also updated cache information from the manuals, making sure we set good conservative values by default. Removed Vulcan, as it's an alias to TX2. * Auto-detecting most of those cores, but also updating the forced compilation in getarch.c, to make sure the parameters are the same whether compiled natively or forced arch. Benefits: * ARMv8 build is now guaranteed to work on all ARMv8 cores * Improved performance for ARMv8 builds on some cores (A72, Falkor, ThunderX1 and 2: up to 11%) over current develop * Improved performance for *all* cores comparing to develop branch before TX2's patch (9% ~ 36%) * ThunderX1 builds are 14% faster than ARMv8 on TX1, 9% faster than current develop's branch and 8% faster than deveop before tx2 patches Issues: * Regression from current develop branch for A53 (-12%) and A57 (-3%) with ARMv8 builds, but still faster than before TX2's commit (+15% and +24% respectively). This can be improved with a simplification of TX2's code, to be done in future patches. At least the code is guaranteed to be ARMv8.0 now. 
Comments: * CortexA57 builds are unchanged on A57 hardware from develop's branch, which makes sense, as it's untouched. * CortexA72 builds improve over A57 on A72 hardware, even if they're using the same includes due to new compiler tunning in the makefile.
6 years ago
Simplifying ARMv8 build parameters ARMv8 builds were a bit mixed up, with ThunderX2 code in ARMv8 mode (which is not right because TX2 is ARMv8.1) as well as requiring a few redundancies in the defines, making it harder to maintain and understand what core has what. A few other minor issues were also fixed. Tests were made on the following cores: A53, A57, A72, Falkor, ThunderX, ThunderX2, and XGene. Tests were: OpenBLAS/test, OpenBLAS/benchmark, BLAS-Tester. A summary: * Removed TX2 code from ARMv8 build, to make sure it is compatible with all ARMv8 cores, not just v8.1. Also, the TX2 code has actually harmed performance on big cores. * Commoned up ARMv8 architectures' defines in params.h, to make sure that all will benefit from ARMv8 settings, in addition to their own. * Adding a few more cores, using ARMv8's include strategy, to benefit from compiler optimisations using mtune. Also updated cache information from the manuals, making sure we set good conservative values by default. Removed Vulcan, as it's an alias to TX2. * Auto-detecting most of those cores, but also updating the forced compilation in getarch.c, to make sure the parameters are the same whether compiled natively or forced arch. Benefits: * ARMv8 build is now guaranteed to work on all ARMv8 cores * Improved performance for ARMv8 builds on some cores (A72, Falkor, ThunderX1 and 2: up to 11%) over current develop * Improved performance for *all* cores comparing to develop branch before TX2's patch (9% ~ 36%) * ThunderX1 builds are 14% faster than ARMv8 on TX1, 9% faster than current develop's branch and 8% faster than deveop before tx2 patches Issues: * Regression from current develop branch for A53 (-12%) and A57 (-3%) with ARMv8 builds, but still faster than before TX2's commit (+15% and +24% respectively). This can be improved with a simplification of TX2's code, to be done in future patches. At least the code is guaranteed to be ARMv8.0 now. 
Comments: * CortexA57 builds are unchanged on A57 hardware from develop's branch, which makes sense, as it's untouched. * CortexA72 builds improve over A57 on A72 hardware, even if they're using the same includes due to new compiler tunning in the makefile.
6 years ago
Simplifying ARMv8 build parameters ARMv8 builds were a bit mixed up, with ThunderX2 code in ARMv8 mode (which is not right because TX2 is ARMv8.1) as well as requiring a few redundancies in the defines, making it harder to maintain and understand what core has what. A few other minor issues were also fixed. Tests were made on the following cores: A53, A57, A72, Falkor, ThunderX, ThunderX2, and XGene. Tests were: OpenBLAS/test, OpenBLAS/benchmark, BLAS-Tester. A summary: * Removed TX2 code from ARMv8 build, to make sure it is compatible with all ARMv8 cores, not just v8.1. Also, the TX2 code has actually harmed performance on big cores. * Commoned up ARMv8 architectures' defines in params.h, to make sure that all will benefit from ARMv8 settings, in addition to their own. * Adding a few more cores, using ARMv8's include strategy, to benefit from compiler optimisations using mtune. Also updated cache information from the manuals, making sure we set good conservative values by default. Removed Vulcan, as it's an alias to TX2. * Auto-detecting most of those cores, but also updating the forced compilation in getarch.c, to make sure the parameters are the same whether compiled natively or forced arch. Benefits: * ARMv8 build is now guaranteed to work on all ARMv8 cores * Improved performance for ARMv8 builds on some cores (A72, Falkor, ThunderX1 and 2: up to 11%) over current develop * Improved performance for *all* cores comparing to develop branch before TX2's patch (9% ~ 36%) * ThunderX1 builds are 14% faster than ARMv8 on TX1, 9% faster than current develop's branch and 8% faster than deveop before tx2 patches Issues: * Regression from current develop branch for A53 (-12%) and A57 (-3%) with ARMv8 builds, but still faster than before TX2's commit (+15% and +24% respectively). This can be improved with a simplification of TX2's code, to be done in future patches. At least the code is guaranteed to be ARMv8.0 now. 
Comments: * CortexA57 builds are unchanged on A57 hardware from develop's branch, which makes sense, as it's untouched. * CortexA72 builds improve over A57 on A72 hardware, even if they're using the same includes due to new compiler tunning in the makefile.
6 years ago
Simplifying ARMv8 build parameters ARMv8 builds were a bit mixed up, with ThunderX2 code in ARMv8 mode (which is not right because TX2 is ARMv8.1) as well as requiring a few redundancies in the defines, making it harder to maintain and understand what core has what. A few other minor issues were also fixed. Tests were made on the following cores: A53, A57, A72, Falkor, ThunderX, ThunderX2, and XGene. Tests were: OpenBLAS/test, OpenBLAS/benchmark, BLAS-Tester. A summary: * Removed TX2 code from ARMv8 build, to make sure it is compatible with all ARMv8 cores, not just v8.1. Also, the TX2 code has actually harmed performance on big cores. * Commoned up ARMv8 architectures' defines in params.h, to make sure that all will benefit from ARMv8 settings, in addition to their own. * Adding a few more cores, using ARMv8's include strategy, to benefit from compiler optimisations using mtune. Also updated cache information from the manuals, making sure we set good conservative values by default. Removed Vulcan, as it's an alias to TX2. * Auto-detecting most of those cores, but also updating the forced compilation in getarch.c, to make sure the parameters are the same whether compiled natively or forced arch. Benefits: * ARMv8 build is now guaranteed to work on all ARMv8 cores * Improved performance for ARMv8 builds on some cores (A72, Falkor, ThunderX1 and 2: up to 11%) over current develop * Improved performance for *all* cores comparing to develop branch before TX2's patch (9% ~ 36%) * ThunderX1 builds are 14% faster than ARMv8 on TX1, 9% faster than current develop's branch and 8% faster than deveop before tx2 patches Issues: * Regression from current develop branch for A53 (-12%) and A57 (-3%) with ARMv8 builds, but still faster than before TX2's commit (+15% and +24% respectively). This can be improved with a simplification of TX2's code, to be done in future patches. At least the code is guaranteed to be ARMv8.0 now. 
Comments: * CortexA57 builds are unchanged on A57 hardware from develop's branch, which makes sense, as it's untouched. * CortexA72 builds improve over A57 on A72 hardware, even if they're using the same includes due to new compiler tunning in the makefile.
/*****************************************************************************
Copyright (c) 2011-2023, The OpenBLAS Project
All rights reserved.

Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are
met:

1. Redistributions of source code must retain the above copyright
notice, this list of conditions and the following disclaimer.

2. Redistributions in binary form must reproduce the above copyright
notice, this list of conditions and the following disclaimer in
the documentation and/or other materials provided with the
distribution.

3. Neither the name of the OpenBLAS project nor the names of
its contributors may be used to endorse or promote products
derived from this software without specific prior written
permission.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE
LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE
USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
**********************************************************************************/
/*********************************************************************/
/* Copyright 2009, 2010 The University of Texas at Austin. */
/* All rights reserved. */
/* */
/* Redistribution and use in source and binary forms, with or */
/* without modification, are permitted provided that the following */
/* conditions are met: */
/* */
/* 1. Redistributions of source code must retain the above */
/* copyright notice, this list of conditions and the following */
/* disclaimer. */
/* */
/* 2. Redistributions in binary form must reproduce the above */
/* copyright notice, this list of conditions and the following */
/* disclaimer in the documentation and/or other materials */
/* provided with the distribution. */
/* */
/* THIS SOFTWARE IS PROVIDED BY THE UNIVERSITY OF TEXAS AT */
/* AUSTIN ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, */
/* INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF */
/* MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE */
/* DISCLAIMED. IN NO EVENT SHALL THE UNIVERSITY OF TEXAS AT */
/* AUSTIN OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, */
/* INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES */
/* (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE */
/* GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR */
/* BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF */
/* LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT */
/* (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT */
/* OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE */
/* POSSIBILITY OF SUCH DAMAGE. */
/* */
/* The views and conclusions contained in the software and */
/* documentation are those of the authors and should not be */
/* interpreted as representing official policies, either expressed */
/* or implied, of The University of Texas at Austin. */
/*********************************************************************/
#ifndef PARAM_H
#define PARAM_H
#define SBGEMM_DEFAULT_UNROLL_N 4
#define SBGEMM_DEFAULT_UNROLL_M 8
#define SBGEMM_DEFAULT_UNROLL_MN 32
#define SBGEMM_DEFAULT_P 256
#define SBGEMM_DEFAULT_R 256
#define SBGEMM_DEFAULT_Q 256
#define SBGEMM_ALIGN_K 1 // must be 2^x
#ifdef OPTERON
#define SNUMOPT 4
#define DNUMOPT 2
#define GEMM_DEFAULT_OFFSET_A 64
#define GEMM_DEFAULT_OFFSET_B 256
#define GEMM_DEFAULT_ALIGN (BLASLONG)0x01ffffUL
#define SGEMM_DEFAULT_UNROLL_N 4
#define DGEMM_DEFAULT_UNROLL_N 4
#define QGEMM_DEFAULT_UNROLL_N 2
#define CGEMM_DEFAULT_UNROLL_N 2
#define ZGEMM_DEFAULT_UNROLL_N 2
#define XGEMM_DEFAULT_UNROLL_N 1
#ifdef ARCH_X86
#define SGEMM_DEFAULT_UNROLL_M 4
#define DGEMM_DEFAULT_UNROLL_M 2
#define QGEMM_DEFAULT_UNROLL_M 2
#define CGEMM_DEFAULT_UNROLL_M 2
#define ZGEMM_DEFAULT_UNROLL_M 1
#define XGEMM_DEFAULT_UNROLL_M 1
#else
#define SGEMM_DEFAULT_UNROLL_M 8
#define DGEMM_DEFAULT_UNROLL_M 4
#define QGEMM_DEFAULT_UNROLL_M 2
#define CGEMM_DEFAULT_UNROLL_M 4
#define ZGEMM_DEFAULT_UNROLL_M 2
#define XGEMM_DEFAULT_UNROLL_M 1
#endif
#define SGEMM_DEFAULT_P sgemm_p
#define DGEMM_DEFAULT_P dgemm_p
#define QGEMM_DEFAULT_P qgemm_p
#define CGEMM_DEFAULT_P cgemm_p
#define ZGEMM_DEFAULT_P zgemm_p
#define XGEMM_DEFAULT_P xgemm_p
#define SGEMM_DEFAULT_R sgemm_r
#define DGEMM_DEFAULT_R dgemm_r
#define QGEMM_DEFAULT_R qgemm_r
#define CGEMM_DEFAULT_R cgemm_r
#define ZGEMM_DEFAULT_R zgemm_r
#define XGEMM_DEFAULT_R xgemm_r
#ifdef ALLOC_HUGETLB
#define SGEMM_DEFAULT_Q 248
#define DGEMM_DEFAULT_Q 248
#define QGEMM_DEFAULT_Q 248
#define CGEMM_DEFAULT_Q 248
#define ZGEMM_DEFAULT_Q 248
#define XGEMM_DEFAULT_Q 248
#else
#define SGEMM_DEFAULT_Q 240
#define DGEMM_DEFAULT_Q 240
#define QGEMM_DEFAULT_Q 240
#define CGEMM_DEFAULT_Q 240
#define ZGEMM_DEFAULT_Q 240
#define XGEMM_DEFAULT_Q 240
#endif
#define SYMV_P 16
#define HAVE_EXCLUSIVE_CACHE
#endif
#if defined(BARCELONA) || defined(SHANGHAI) || defined(BOBCAT)
#define SNUMOPT 8
#define DNUMOPT 4
#define GEMM_DEFAULT_OFFSET_A 64
#define GEMM_DEFAULT_OFFSET_B 832
#define GEMM_DEFAULT_ALIGN (BLASLONG)0x0fffUL
#define SGEMM_DEFAULT_UNROLL_N 4
#define DGEMM_DEFAULT_UNROLL_N 4
#define QGEMM_DEFAULT_UNROLL_N 2
#define CGEMM_DEFAULT_UNROLL_N 2
#define ZGEMM_DEFAULT_UNROLL_N 2
#define XGEMM_DEFAULT_UNROLL_N 1
#ifdef ARCH_X86
#define SGEMM_DEFAULT_UNROLL_M 4
#define DGEMM_DEFAULT_UNROLL_M 2
#define QGEMM_DEFAULT_UNROLL_M 2
#define CGEMM_DEFAULT_UNROLL_M 2
#define ZGEMM_DEFAULT_UNROLL_M 1
#define XGEMM_DEFAULT_UNROLL_M 1
#else
#define SGEMM_DEFAULT_UNROLL_M 8
#define DGEMM_DEFAULT_UNROLL_M 4
#define QGEMM_DEFAULT_UNROLL_M 2
#define CGEMM_DEFAULT_UNROLL_M 4
#define ZGEMM_DEFAULT_UNROLL_M 2
#define XGEMM_DEFAULT_UNROLL_M 1
#endif
#if 0
#define SGEMM_DEFAULT_P 496
#define DGEMM_DEFAULT_P 248
#define QGEMM_DEFAULT_P 124
#define CGEMM_DEFAULT_P 248
#define ZGEMM_DEFAULT_P 124
#define XGEMM_DEFAULT_P 62
#define SGEMM_DEFAULT_Q 248
#define DGEMM_DEFAULT_Q 248
#define QGEMM_DEFAULT_Q 248
#define CGEMM_DEFAULT_Q 248
#define ZGEMM_DEFAULT_Q 248
#define XGEMM_DEFAULT_Q 248
#else
#define SGEMM_DEFAULT_P 448
#define DGEMM_DEFAULT_P 224
#define QGEMM_DEFAULT_P 112
#define CGEMM_DEFAULT_P 224
#define ZGEMM_DEFAULT_P 112
#define XGEMM_DEFAULT_P 56
#define SGEMM_DEFAULT_Q 224
#define DGEMM_DEFAULT_Q 224
#define QGEMM_DEFAULT_Q 224
#define CGEMM_DEFAULT_Q 224
#define ZGEMM_DEFAULT_Q 224
#define XGEMM_DEFAULT_Q 224
#endif
#define SGEMM_DEFAULT_R sgemm_r
#define QGEMM_DEFAULT_R qgemm_r
#define DGEMM_DEFAULT_R dgemm_r
#define CGEMM_DEFAULT_R cgemm_r
#define ZGEMM_DEFAULT_R zgemm_r
#define XGEMM_DEFAULT_R xgemm_r
#define SYMV_P 16
#define HAVE_EXCLUSIVE_CACHE
#define GEMM_THREAD gemm_thread_mn
#endif
#ifdef BULLDOZER
#define SNUMOPT 8
#define DNUMOPT 4
#define GEMM_DEFAULT_OFFSET_A 64
#define GEMM_DEFAULT_OFFSET_B 832
#define GEMM_DEFAULT_ALIGN (BLASLONG)0x0fffUL
#define QGEMM_DEFAULT_UNROLL_N 2
#define CGEMM_DEFAULT_UNROLL_N 2
#define ZGEMM_DEFAULT_UNROLL_N 2
#define XGEMM_DEFAULT_UNROLL_N 1
#ifdef ARCH_X86
#define SGEMM_DEFAULT_UNROLL_N 4
#define DGEMM_DEFAULT_UNROLL_N 4
#define SGEMM_DEFAULT_UNROLL_M 4
#define DGEMM_DEFAULT_UNROLL_M 2
#define QGEMM_DEFAULT_UNROLL_M 2
#define CGEMM_DEFAULT_UNROLL_M 2
#define ZGEMM_DEFAULT_UNROLL_M 1
#define XGEMM_DEFAULT_UNROLL_M 1
#else
#define SGEMM_DEFAULT_UNROLL_N 2
#define DGEMM_DEFAULT_UNROLL_N 2
#define SGEMM_DEFAULT_UNROLL_M 16
#define DGEMM_DEFAULT_UNROLL_M 8
#define QGEMM_DEFAULT_UNROLL_M 2
#define CGEMM_DEFAULT_UNROLL_M 4
#define ZGEMM_DEFAULT_UNROLL_M 2
#define XGEMM_DEFAULT_UNROLL_M 1
#define CGEMM3M_DEFAULT_UNROLL_N 4
#define CGEMM3M_DEFAULT_UNROLL_M 8
#define ZGEMM3M_DEFAULT_UNROLL_N 4
#define ZGEMM3M_DEFAULT_UNROLL_M 4
#define DGEMM_DEFAULT_UNROLL_MN 16
#define GEMV_UNROLL 8
#endif
#if defined(ARCH_X86_64)
#define SGEMM_DEFAULT_P 768
#define DGEMM_DEFAULT_P 384
#else
#define SGEMM_DEFAULT_P 448
#define DGEMM_DEFAULT_P 224
#endif
#define QGEMM_DEFAULT_P 112
#define CGEMM_DEFAULT_P 224
#define ZGEMM_DEFAULT_P 112
#define XGEMM_DEFAULT_P 56
#if defined(ARCH_X86_64)
#define SGEMM_DEFAULT_Q 168
#define DGEMM_DEFAULT_Q 168
#else
#define SGEMM_DEFAULT_Q 224
#define DGEMM_DEFAULT_Q 224
#endif
#define QGEMM_DEFAULT_Q 224
#define CGEMM_DEFAULT_Q 224
#define ZGEMM_DEFAULT_Q 224
#define XGEMM_DEFAULT_Q 224
#define CGEMM3M_DEFAULT_P 448
#define ZGEMM3M_DEFAULT_P 224
#define XGEMM3M_DEFAULT_P 112
#define CGEMM3M_DEFAULT_Q 224
#define ZGEMM3M_DEFAULT_Q 224
#define XGEMM3M_DEFAULT_Q 224
#define CGEMM3M_DEFAULT_R 12288
#define ZGEMM3M_DEFAULT_R 12288
#define XGEMM3M_DEFAULT_R 12288
#define SGEMM_DEFAULT_R sgemm_r
#define QGEMM_DEFAULT_R qgemm_r
#define DGEMM_DEFAULT_R dgemm_r
#define CGEMM_DEFAULT_R cgemm_r
#define ZGEMM_DEFAULT_R zgemm_r
#define XGEMM_DEFAULT_R xgemm_r
#define SYMV_P 16
#define HAVE_EXCLUSIVE_CACHE
#define GEMM_THREAD gemm_thread_mn
#endif
#ifdef PILEDRIVER
#define SNUMOPT 8
#define DNUMOPT 4
#define GEMM_DEFAULT_OFFSET_A 64
#define GEMM_DEFAULT_OFFSET_B 832
#define GEMM_DEFAULT_ALIGN (BLASLONG)0x0fffUL
#define QGEMM_DEFAULT_UNROLL_N 2
#define CGEMM_DEFAULT_UNROLL_N 2
#define ZGEMM_DEFAULT_UNROLL_N 2
#define XGEMM_DEFAULT_UNROLL_N 1
#ifdef ARCH_X86
#define SGEMM_DEFAULT_UNROLL_N 4
#define DGEMM_DEFAULT_UNROLL_N 4
#define SGEMM_DEFAULT_UNROLL_M 4
#define DGEMM_DEFAULT_UNROLL_M 2
#define QGEMM_DEFAULT_UNROLL_M 2
#define CGEMM_DEFAULT_UNROLL_M 2
#define ZGEMM_DEFAULT_UNROLL_M 1
#define XGEMM_DEFAULT_UNROLL_M 1
#else
#define SGEMM_DEFAULT_UNROLL_N 2
#define DGEMM_DEFAULT_UNROLL_N 2
#define SGEMM_DEFAULT_UNROLL_M 16
#define DGEMM_DEFAULT_UNROLL_M 8
#define QGEMM_DEFAULT_UNROLL_M 2
#define CGEMM_DEFAULT_UNROLL_M 4
#define ZGEMM_DEFAULT_UNROLL_M 2
#define XGEMM_DEFAULT_UNROLL_M 1
#define CGEMM3M_DEFAULT_UNROLL_N 4
#define CGEMM3M_DEFAULT_UNROLL_M 8
#define ZGEMM3M_DEFAULT_UNROLL_N 4
#define ZGEMM3M_DEFAULT_UNROLL_M 4
#define GEMV_UNROLL 8
#endif
#if defined(ARCH_X86_64)
#define SGEMM_DEFAULT_P 768
#define DGEMM_DEFAULT_P 768
#define ZGEMM_DEFAULT_P 384
#define CGEMM_DEFAULT_P 768
#else
#define SGEMM_DEFAULT_P 448
#define DGEMM_DEFAULT_P 480
#define ZGEMM_DEFAULT_P 112
#define CGEMM_DEFAULT_P 224
#endif
#define QGEMM_DEFAULT_P 112
#define XGEMM_DEFAULT_P 56
#if defined(ARCH_X86_64)
#define SGEMM_DEFAULT_Q 192
#define DGEMM_DEFAULT_Q 168
#define ZGEMM_DEFAULT_Q 168
#define CGEMM_DEFAULT_Q 168
#else
#define SGEMM_DEFAULT_Q 224
#define DGEMM_DEFAULT_Q 224
#define ZGEMM_DEFAULT_Q 224
#define CGEMM_DEFAULT_Q 224
#endif
#define QGEMM_DEFAULT_Q 224
#define XGEMM_DEFAULT_Q 224
#define CGEMM3M_DEFAULT_P 448
#define ZGEMM3M_DEFAULT_P 224
#define XGEMM3M_DEFAULT_P 112
#define CGEMM3M_DEFAULT_Q 224
#define ZGEMM3M_DEFAULT_Q 224
#define XGEMM3M_DEFAULT_Q 224
#define CGEMM3M_DEFAULT_R 12288
#define ZGEMM3M_DEFAULT_R 12288
#define XGEMM3M_DEFAULT_R 12288
#define SGEMM_DEFAULT_R 12288
#define QGEMM_DEFAULT_R qgemm_r
#define DGEMM_DEFAULT_R 12288
#define CGEMM_DEFAULT_R cgemm_r
#define ZGEMM_DEFAULT_R zgemm_r
#define XGEMM_DEFAULT_R xgemm_r
#define SYMV_P 16
#define HAVE_EXCLUSIVE_CACHE
#define GEMM_THREAD gemm_thread_mn
#endif
#ifdef STEAMROLLER
#define SNUMOPT 8
#define DNUMOPT 4
#define GEMM_DEFAULT_OFFSET_A 64
#define GEMM_DEFAULT_OFFSET_B 832
#define GEMM_DEFAULT_ALIGN (BLASLONG)0x0fffUL
#define QGEMM_DEFAULT_UNROLL_N 2
#define CGEMM_DEFAULT_UNROLL_N 2
#define ZGEMM_DEFAULT_UNROLL_N 2
#define XGEMM_DEFAULT_UNROLL_N 1
#ifdef ARCH_X86
#define SGEMM_DEFAULT_UNROLL_N 4
#define DGEMM_DEFAULT_UNROLL_N 4
#define SGEMM_DEFAULT_UNROLL_M 4
#define DGEMM_DEFAULT_UNROLL_M 2
#define QGEMM_DEFAULT_UNROLL_M 2
#define CGEMM_DEFAULT_UNROLL_M 2
#define ZGEMM_DEFAULT_UNROLL_M 1
#define XGEMM_DEFAULT_UNROLL_M 1
#else
#define SGEMM_DEFAULT_UNROLL_N 2
#define DGEMM_DEFAULT_UNROLL_N 2
#define SGEMM_DEFAULT_UNROLL_M 16
#define DGEMM_DEFAULT_UNROLL_M 8
#define QGEMM_DEFAULT_UNROLL_M 2
#define CGEMM_DEFAULT_UNROLL_M 4
#define ZGEMM_DEFAULT_UNROLL_M 2
#define XGEMM_DEFAULT_UNROLL_M 1
#define CGEMM3M_DEFAULT_UNROLL_N 4
#define CGEMM3M_DEFAULT_UNROLL_M 8
#define ZGEMM3M_DEFAULT_UNROLL_N 4
#define ZGEMM3M_DEFAULT_UNROLL_M 4
#define GEMV_UNROLL 8
#endif
#if defined(ARCH_X86_64)
#define SGEMM_DEFAULT_P 768
#define DGEMM_DEFAULT_P 576
#define ZGEMM_DEFAULT_P 288
#define CGEMM_DEFAULT_P 576
#else
#define SGEMM_DEFAULT_P 448
#define DGEMM_DEFAULT_P 480
#define ZGEMM_DEFAULT_P 112
#define CGEMM_DEFAULT_P 224
#endif
#define QGEMM_DEFAULT_P 112
#define XGEMM_DEFAULT_P 56
#if defined(ARCH_X86_64)
#define SGEMM_DEFAULT_Q 192
#define DGEMM_DEFAULT_Q 160
#define ZGEMM_DEFAULT_Q 160
#define CGEMM_DEFAULT_Q 160
#else
#define SGEMM_DEFAULT_Q 224
#define DGEMM_DEFAULT_Q 224
#define ZGEMM_DEFAULT_Q 224
#define CGEMM_DEFAULT_Q 224
#endif
#define QGEMM_DEFAULT_Q 224
#define XGEMM_DEFAULT_Q 224
#define CGEMM3M_DEFAULT_P 448
#define ZGEMM3M_DEFAULT_P 224
#define XGEMM3M_DEFAULT_P 112
#define CGEMM3M_DEFAULT_Q 224
#define ZGEMM3M_DEFAULT_Q 224
#define XGEMM3M_DEFAULT_Q 224
#define CGEMM3M_DEFAULT_R 12288
#define ZGEMM3M_DEFAULT_R 12288
#define XGEMM3M_DEFAULT_R 12288
#define SGEMM_DEFAULT_R 12288
#define QGEMM_DEFAULT_R qgemm_r
#define DGEMM_DEFAULT_R 12288
#define CGEMM_DEFAULT_R cgemm_r
#define ZGEMM_DEFAULT_R zgemm_r
#define XGEMM_DEFAULT_R xgemm_r
#define SYMV_P 16
#define HAVE_EXCLUSIVE_CACHE
#define GEMM_THREAD gemm_thread_mn
#endif
#ifdef EXCAVATOR
#define SNUMOPT 8
#define DNUMOPT 4
#define GEMM_DEFAULT_OFFSET_A 64
#define GEMM_DEFAULT_OFFSET_B 832
#define GEMM_DEFAULT_ALIGN (BLASLONG)0x0fffUL
#define QGEMM_DEFAULT_UNROLL_N 2
#define CGEMM_DEFAULT_UNROLL_N 2
#define ZGEMM_DEFAULT_UNROLL_N 2
#define XGEMM_DEFAULT_UNROLL_N 1
#ifdef ARCH_X86
#define SGEMM_DEFAULT_UNROLL_N 4
#define DGEMM_DEFAULT_UNROLL_N 4
#define SGEMM_DEFAULT_UNROLL_M 4
#define DGEMM_DEFAULT_UNROLL_M 2
#define QGEMM_DEFAULT_UNROLL_M 2
#define CGEMM_DEFAULT_UNROLL_M 2
#define ZGEMM_DEFAULT_UNROLL_M 1
#define XGEMM_DEFAULT_UNROLL_M 1
#else
#define SGEMM_DEFAULT_UNROLL_N 2
#define DGEMM_DEFAULT_UNROLL_N 2
#define SGEMM_DEFAULT_UNROLL_M 16
#define DGEMM_DEFAULT_UNROLL_M 8
#define QGEMM_DEFAULT_UNROLL_M 2
#define CGEMM_DEFAULT_UNROLL_M 4
#define ZGEMM_DEFAULT_UNROLL_M 2
#define XGEMM_DEFAULT_UNROLL_M 1
#define CGEMM3M_DEFAULT_UNROLL_N 4
#define CGEMM3M_DEFAULT_UNROLL_M 8
#define ZGEMM3M_DEFAULT_UNROLL_N 4
#define ZGEMM3M_DEFAULT_UNROLL_M 4
#define GEMV_UNROLL 8
#endif
#if defined(ARCH_X86_64)
#define SGEMM_DEFAULT_P 768
#define DGEMM_DEFAULT_P 576
#define ZGEMM_DEFAULT_P 288
#define CGEMM_DEFAULT_P 576
#else
#define SGEMM_DEFAULT_P 448
#define DGEMM_DEFAULT_P 480
#define ZGEMM_DEFAULT_P 112
#define CGEMM_DEFAULT_P 224
#endif
#define QGEMM_DEFAULT_P 112
#define XGEMM_DEFAULT_P 56
#if defined(ARCH_X86_64)
#define SGEMM_DEFAULT_Q 192
#define DGEMM_DEFAULT_Q 160
#define ZGEMM_DEFAULT_Q 160
#define CGEMM_DEFAULT_Q 160
#else
#define SGEMM_DEFAULT_Q 224
#define DGEMM_DEFAULT_Q 224
#define ZGEMM_DEFAULT_Q 224
#define CGEMM_DEFAULT_Q 224
#endif
#define QGEMM_DEFAULT_Q 224
#define XGEMM_DEFAULT_Q 224
#define CGEMM3M_DEFAULT_P 448
#define ZGEMM3M_DEFAULT_P 224
#define XGEMM3M_DEFAULT_P 112
#define CGEMM3M_DEFAULT_Q 224
#define ZGEMM3M_DEFAULT_Q 224
#define XGEMM3M_DEFAULT_Q 224
#define CGEMM3M_DEFAULT_R 12288
#define ZGEMM3M_DEFAULT_R 12288
#define XGEMM3M_DEFAULT_R 12288
#define SGEMM_DEFAULT_R 12288
#define QGEMM_DEFAULT_R qgemm_r
#define DGEMM_DEFAULT_R 12288
#define CGEMM_DEFAULT_R cgemm_r
#define ZGEMM_DEFAULT_R zgemm_r
#define XGEMM_DEFAULT_R xgemm_r
#define SYMV_P 16
#define HAVE_EXCLUSIVE_CACHE
#define GEMM_THREAD gemm_thread_mn
#endif
#ifdef ZEN
#define SNUMOPT 16
#define DNUMOPT 8
#define GEMM_DEFAULT_OFFSET_A 0
#define GEMM_DEFAULT_OFFSET_B 0
#define GEMM_DEFAULT_ALIGN (BLASLONG)0x03fffUL
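/*
 * Illustrative note (an assumption, not stated elsewhere in this file):
 * GEMM_DEFAULT_ALIGN is written as a low-bit mask rather than a byte count.
 * 0x03fff keeps the low 14 bits, so used as an alignment mask it corresponds
 * to a 16 KiB (16384-byte) boundary. Assuming a buffer size is rounded up
 * with the usual mask idiom
 *     aligned = (size + GEMM_DEFAULT_ALIGN) & ~GEMM_DEFAULT_ALIGN;
 * a size of 20000 bytes becomes (20000 + 16383) & ~16383 = 32768.
 * This is a comment only: no code is added here because this header may
 * also be preprocessed for assembly sources.
 */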
#define SYMV_P 8
#define SWITCH_RATIO 16
#ifdef ARCH_X86
#define SGEMM_DEFAULT_UNROLL_M 4
#define DGEMM_DEFAULT_UNROLL_M 2
#define QGEMM_DEFAULT_UNROLL_M 2
#define CGEMM_DEFAULT_UNROLL_M 2
#define ZGEMM_DEFAULT_UNROLL_M 1
#define XGEMM_DEFAULT_UNROLL_M 1
#define SGEMM_DEFAULT_UNROLL_N 4
#define DGEMM_DEFAULT_UNROLL_N 4
#define QGEMM_DEFAULT_UNROLL_N 2
#define CGEMM_DEFAULT_UNROLL_N 2
#define ZGEMM_DEFAULT_UNROLL_N 2
#define XGEMM_DEFAULT_UNROLL_N 1
#else
#define SGEMM_DEFAULT_UNROLL_M 8
#define DGEMM_DEFAULT_UNROLL_M 4
#define QGEMM_DEFAULT_UNROLL_M 2
#define CGEMM_DEFAULT_UNROLL_M 8
#define ZGEMM_DEFAULT_UNROLL_M 4
#define XGEMM_DEFAULT_UNROLL_M 1
#define SGEMM_DEFAULT_UNROLL_N 4
#define DGEMM_DEFAULT_UNROLL_N 8
#define QGEMM_DEFAULT_UNROLL_N 2
#define CGEMM_DEFAULT_UNROLL_N 2
#define ZGEMM_DEFAULT_UNROLL_N 2
#define XGEMM_DEFAULT_UNROLL_N 1
/*
#define SGEMM_DEFAULT_UNROLL_MN 32
#define DGEMM_DEFAULT_UNROLL_MN 32
*/
#endif
#ifdef ARCH_X86
#define SGEMM_DEFAULT_P 512
#define SGEMM_DEFAULT_R sgemm_r
#define DGEMM_DEFAULT_P 512
#define DGEMM_DEFAULT_R dgemm_r
#define QGEMM_DEFAULT_P 504
#define QGEMM_DEFAULT_R qgemm_r
#define CGEMM_DEFAULT_P 128
#define CGEMM_DEFAULT_R 1024
#define ZGEMM_DEFAULT_P 512
#define ZGEMM_DEFAULT_R zgemm_r
#define XGEMM_DEFAULT_P 252
#define XGEMM_DEFAULT_R xgemm_r
#define SGEMM_DEFAULT_Q 256
#define DGEMM_DEFAULT_Q 256
#define QGEMM_DEFAULT_Q 128
#define CGEMM_DEFAULT_Q 256
#define ZGEMM_DEFAULT_Q 192
#define XGEMM_DEFAULT_Q 128
#else
#define SGEMM_DEFAULT_P 320
#define DGEMM_DEFAULT_P 512
#define CGEMM_DEFAULT_P 256
#define ZGEMM_DEFAULT_P 192
#ifdef WINDOWS_ABI
#define SGEMM_DEFAULT_Q 320
#define DGEMM_DEFAULT_Q 128
#else
#define SGEMM_DEFAULT_Q 320
#define DGEMM_DEFAULT_Q 256
#endif
#define CGEMM_DEFAULT_Q 256
#define ZGEMM_DEFAULT_Q 192
#define SGEMM_DEFAULT_R sgemm_r
#define DGEMM_DEFAULT_R 13824
#define CGEMM_DEFAULT_R cgemm_r
#define ZGEMM_DEFAULT_R zgemm_r
#define QGEMM_DEFAULT_Q 128
#define QGEMM_DEFAULT_P 504
#define QGEMM_DEFAULT_R qgemm_r
#define XGEMM_DEFAULT_P 252
#define XGEMM_DEFAULT_R xgemm_r
#define XGEMM_DEFAULT_Q 128
#define CGEMM3M_DEFAULT_UNROLL_N 4
#define CGEMM3M_DEFAULT_UNROLL_M 8
#define ZGEMM3M_DEFAULT_UNROLL_N 4
#define ZGEMM3M_DEFAULT_UNROLL_M 4
#define CGEMM3M_DEFAULT_P 320
#define ZGEMM3M_DEFAULT_P 256
#define XGEMM3M_DEFAULT_P 112
#define CGEMM3M_DEFAULT_Q 320
#define ZGEMM3M_DEFAULT_Q 256
#define XGEMM3M_DEFAULT_Q 224
#define CGEMM3M_DEFAULT_R 12288
#define ZGEMM3M_DEFAULT_R 12288
#define XGEMM3M_DEFAULT_R 12288
#endif
#endif
#ifdef ATHLON
#define SNUMOPT 4
#define DNUMOPT 2
#define GEMM_DEFAULT_OFFSET_A 0
#define GEMM_DEFAULT_OFFSET_B 384
#define GEMM_DEFAULT_ALIGN (BLASLONG)0x0ffffUL
#define SGEMM_DEFAULT_UNROLL_N 4
#define DGEMM_DEFAULT_UNROLL_N 4
#define QGEMM_DEFAULT_UNROLL_N 2
#define CGEMM_DEFAULT_UNROLL_N 2
#define ZGEMM_DEFAULT_UNROLL_N 2
#define XGEMM_DEFAULT_UNROLL_N 1
#define SGEMM_DEFAULT_UNROLL_M 2
#define DGEMM_DEFAULT_UNROLL_M 1
#define QGEMM_DEFAULT_UNROLL_M 2
#define CGEMM_DEFAULT_UNROLL_M 1
#define ZGEMM_DEFAULT_UNROLL_M 1
#define XGEMM_DEFAULT_UNROLL_M 1
#define SGEMM_DEFAULT_R sgemm_r
#define DGEMM_DEFAULT_R dgemm_r
#define QGEMM_DEFAULT_R qgemm_r
#define CGEMM_DEFAULT_R cgemm_r
#define ZGEMM_DEFAULT_R zgemm_r
#define XGEMM_DEFAULT_R xgemm_r
#define SGEMM_DEFAULT_P 208
#define DGEMM_DEFAULT_P 104
#define QGEMM_DEFAULT_P 56
#define CGEMM_DEFAULT_P 104
#define ZGEMM_DEFAULT_P 56
#define XGEMM_DEFAULT_P 28
#define SGEMM_DEFAULT_Q 208
#define DGEMM_DEFAULT_Q 208
#define QGEMM_DEFAULT_Q 208
#define CGEMM_DEFAULT_Q 208
#define ZGEMM_DEFAULT_Q 208
#define XGEMM_DEFAULT_Q 208
#define SYMV_P 16
#define HAVE_EXCLUSIVE_CACHE
#endif
#ifdef VIAC3
#define SNUMOPT 2
#define DNUMOPT 1
#define GEMM_DEFAULT_OFFSET_A 0
#define GEMM_DEFAULT_OFFSET_B 256
#define GEMM_DEFAULT_ALIGN (BLASLONG)0x0ffffUL
#define SGEMM_DEFAULT_UNROLL_N 4
#define DGEMM_DEFAULT_UNROLL_N 4
#define QGEMM_DEFAULT_UNROLL_N 2
#define CGEMM_DEFAULT_UNROLL_N 2
#define ZGEMM_DEFAULT_UNROLL_N 2
#define XGEMM_DEFAULT_UNROLL_N 1
#define SGEMM_DEFAULT_UNROLL_M 2
#define DGEMM_DEFAULT_UNROLL_M 1
#define QGEMM_DEFAULT_UNROLL_M 2
#define CGEMM_DEFAULT_UNROLL_M 1
#define ZGEMM_DEFAULT_UNROLL_M 1
#define XGEMM_DEFAULT_UNROLL_M 1
#define SGEMM_DEFAULT_R sgemm_r
#define DGEMM_DEFAULT_R dgemm_r
#define QGEMM_DEFAULT_R qgemm_r
#define CGEMM_DEFAULT_R cgemm_r
#define ZGEMM_DEFAULT_R zgemm_r
#define XGEMM_DEFAULT_R xgemm_r
#define SGEMM_DEFAULT_P 128
#define DGEMM_DEFAULT_P 128
#define QGEMM_DEFAULT_P 128
#define CGEMM_DEFAULT_P 128
#define ZGEMM_DEFAULT_P 128
#define XGEMM_DEFAULT_P 128
#define SGEMM_DEFAULT_Q 512
#define DGEMM_DEFAULT_Q 256
#define QGEMM_DEFAULT_Q 256
#define CGEMM_DEFAULT_Q 256
#define ZGEMM_DEFAULT_Q 128
#define XGEMM_DEFAULT_Q 128
#define SYMV_P 16
#endif
#ifdef NANO
#define SNUMOPT 4
#define DNUMOPT 2
#define GEMM_DEFAULT_OFFSET_A 64
#define GEMM_DEFAULT_OFFSET_B 256
#define GEMM_DEFAULT_ALIGN (BLASLONG)0x01ffffUL
#ifdef ARCH_X86
#define SGEMM_DEFAULT_UNROLL_N 4
#define DGEMM_DEFAULT_UNROLL_N 4
#define QGEMM_DEFAULT_UNROLL_N 2
#define CGEMM_DEFAULT_UNROLL_N 2
#define ZGEMM_DEFAULT_UNROLL_N 2
#define XGEMM_DEFAULT_UNROLL_N 1
#define SGEMM_DEFAULT_UNROLL_M 4
#define DGEMM_DEFAULT_UNROLL_M 2
#define QGEMM_DEFAULT_UNROLL_M 2
#define CGEMM_DEFAULT_UNROLL_M 2
#define ZGEMM_DEFAULT_UNROLL_M 1
#define XGEMM_DEFAULT_UNROLL_M 1
#else
#define SGEMM_DEFAULT_UNROLL_N 8
#define DGEMM_DEFAULT_UNROLL_N 4
#define QGEMM_DEFAULT_UNROLL_N 2
#define CGEMM_DEFAULT_UNROLL_N 4
#define ZGEMM_DEFAULT_UNROLL_N 2
#define XGEMM_DEFAULT_UNROLL_N 1
#define SGEMM_DEFAULT_UNROLL_M 4
#define DGEMM_DEFAULT_UNROLL_M 4
#define QGEMM_DEFAULT_UNROLL_M 2
#define CGEMM_DEFAULT_UNROLL_M 2
#define ZGEMM_DEFAULT_UNROLL_M 2
#define XGEMM_DEFAULT_UNROLL_M 1
#endif
#define SGEMM_DEFAULT_P 288
#define DGEMM_DEFAULT_P 288
#define QGEMM_DEFAULT_P 288
#define CGEMM_DEFAULT_P 288
#define ZGEMM_DEFAULT_P 288
#define XGEMM_DEFAULT_P 288
#define SGEMM_DEFAULT_R sgemm_r
#define DGEMM_DEFAULT_R dgemm_r
#define QGEMM_DEFAULT_R qgemm_r
#define CGEMM_DEFAULT_R cgemm_r
#define ZGEMM_DEFAULT_R zgemm_r
#define XGEMM_DEFAULT_R xgemm_r
#define SGEMM_DEFAULT_Q 256
#define DGEMM_DEFAULT_Q 128
#define QGEMM_DEFAULT_Q 64
#define CGEMM_DEFAULT_Q 128
#define ZGEMM_DEFAULT_Q 64
#define XGEMM_DEFAULT_Q 32
#define SYMV_P 16
#define HAVE_EXCLUSIVE_CACHE
#endif
#if defined(PENTIUM) || defined(PENTIUM2) || defined(PENTIUM3)
#ifdef HAVE_SSE
#define SNUMOPT 2
#else
#define SNUMOPT 1
#endif
#define DNUMOPT 1
#define GEMM_DEFAULT_OFFSET_A 0
#define GEMM_DEFAULT_OFFSET_B 0
#define GEMM_DEFAULT_ALIGN (BLASLONG)0x0ffffUL
#ifdef HAVE_SSE
#define SGEMM_DEFAULT_UNROLL_M 8
#define CGEMM_DEFAULT_UNROLL_M 4
#else
#define SGEMM_DEFAULT_UNROLL_M 4
#define CGEMM_DEFAULT_UNROLL_M 2
#endif
#define DGEMM_DEFAULT_UNROLL_M 2
#define SGEMM_DEFAULT_UNROLL_N 2
#define DGEMM_DEFAULT_UNROLL_N 2
#define QGEMM_DEFAULT_UNROLL_M 2
#define QGEMM_DEFAULT_UNROLL_N 2
#define CGEMM_DEFAULT_UNROLL_N 1
#define ZGEMM_DEFAULT_UNROLL_M 1
#define ZGEMM_DEFAULT_UNROLL_N 1
#define XGEMM_DEFAULT_UNROLL_M 1
#define XGEMM_DEFAULT_UNROLL_N 1
#define SGEMM_DEFAULT_P sgemm_p
#define SGEMM_DEFAULT_Q 256
#define SGEMM_DEFAULT_R sgemm_r
#define DGEMM_DEFAULT_P dgemm_p
#define DGEMM_DEFAULT_Q 256
#define DGEMM_DEFAULT_R dgemm_r
#define QGEMM_DEFAULT_P qgemm_p
#define QGEMM_DEFAULT_Q 256
#define QGEMM_DEFAULT_R qgemm_r
#define CGEMM_DEFAULT_P cgemm_p
#define CGEMM_DEFAULT_Q 256
#define CGEMM_DEFAULT_R cgemm_r
#define ZGEMM_DEFAULT_P zgemm_p
#define ZGEMM_DEFAULT_Q 256
#define ZGEMM_DEFAULT_R zgemm_r
#define XGEMM_DEFAULT_P xgemm_p
#define XGEMM_DEFAULT_Q 256
#define XGEMM_DEFAULT_R xgemm_r
#define SYMV_P 4
#endif
#ifdef PENTIUMM
#define SNUMOPT 2
#define DNUMOPT 1
#define GEMM_DEFAULT_OFFSET_A 0
#define GEMM_DEFAULT_OFFSET_B 0
#define GEMM_DEFAULT_ALIGN (BLASLONG)0x0ffffUL
#ifdef CORE_YONAH
#define SGEMM_DEFAULT_UNROLL_M 4
#define SGEMM_DEFAULT_UNROLL_N 4
#define DGEMM_DEFAULT_UNROLL_M 2
#define DGEMM_DEFAULT_UNROLL_N 4
#define QGEMM_DEFAULT_UNROLL_M 2
#define QGEMM_DEFAULT_UNROLL_N 2
#define CGEMM_DEFAULT_UNROLL_M 2
#define CGEMM_DEFAULT_UNROLL_N 2
#define ZGEMM_DEFAULT_UNROLL_M 1
#define ZGEMM_DEFAULT_UNROLL_N 2
#define XGEMM_DEFAULT_UNROLL_M 1
#define XGEMM_DEFAULT_UNROLL_N 1
#else
#define SGEMM_DEFAULT_UNROLL_M 8
#define SGEMM_DEFAULT_UNROLL_N 2
#define DGEMM_DEFAULT_UNROLL_M 2
#define DGEMM_DEFAULT_UNROLL_N 2
#define QGEMM_DEFAULT_UNROLL_M 2
#define QGEMM_DEFAULT_UNROLL_N 2
#define CGEMM_DEFAULT_UNROLL_M 4
#define CGEMM_DEFAULT_UNROLL_N 1
#define ZGEMM_DEFAULT_UNROLL_M 1
#define ZGEMM_DEFAULT_UNROLL_N 1
#define XGEMM_DEFAULT_UNROLL_M 1
#define XGEMM_DEFAULT_UNROLL_N 1
#endif
#define SGEMM_DEFAULT_P sgemm_p
#define SGEMM_DEFAULT_Q 256
#define SGEMM_DEFAULT_R sgemm_r
#define DGEMM_DEFAULT_P dgemm_p
#define DGEMM_DEFAULT_Q 256
#define DGEMM_DEFAULT_R dgemm_r
#define QGEMM_DEFAULT_P qgemm_p
#define QGEMM_DEFAULT_Q 256
#define QGEMM_DEFAULT_R qgemm_r
#define CGEMM_DEFAULT_P cgemm_p
#define CGEMM_DEFAULT_Q 256
#define CGEMM_DEFAULT_R cgemm_r
#define ZGEMM_DEFAULT_P zgemm_p
#define ZGEMM_DEFAULT_Q 256
#define ZGEMM_DEFAULT_R zgemm_r
#define XGEMM_DEFAULT_P xgemm_p
#define XGEMM_DEFAULT_Q 256
#define XGEMM_DEFAULT_R xgemm_r
#define SYMV_P 4
#endif
#ifdef CORE_NORTHWOOD
#define SNUMOPT 4
#define DNUMOPT 2
#define GEMM_DEFAULT_OFFSET_A 0
#define GEMM_DEFAULT_OFFSET_B 32
#define GEMM_DEFAULT_ALIGN (BLASLONG)0x0ffffUL
#define SYMV_P 8
#define SGEMM_DEFAULT_UNROLL_M 8
#define DGEMM_DEFAULT_UNROLL_M 4
#define QGEMM_DEFAULT_UNROLL_M 2
#define CGEMM_DEFAULT_UNROLL_M 4
#define ZGEMM_DEFAULT_UNROLL_M 2
#define XGEMM_DEFAULT_UNROLL_M 1
#define SGEMM_DEFAULT_UNROLL_N 2
#define DGEMM_DEFAULT_UNROLL_N 2
#define QGEMM_DEFAULT_UNROLL_N 2
#define CGEMM_DEFAULT_UNROLL_N 1
#define ZGEMM_DEFAULT_UNROLL_N 1
#define XGEMM_DEFAULT_UNROLL_N 1
#define SGEMM_DEFAULT_P sgemm_p
#define SGEMM_DEFAULT_R sgemm_r
#define DGEMM_DEFAULT_P dgemm_p
#define DGEMM_DEFAULT_R dgemm_r
#define QGEMM_DEFAULT_P qgemm_p
#define QGEMM_DEFAULT_R qgemm_r
#define CGEMM_DEFAULT_P cgemm_p
#define CGEMM_DEFAULT_R cgemm_r
#define ZGEMM_DEFAULT_P zgemm_p
#define ZGEMM_DEFAULT_R zgemm_r
#define XGEMM_DEFAULT_P xgemm_p
#define XGEMM_DEFAULT_R xgemm_r
#define SGEMM_DEFAULT_Q 128
#define DGEMM_DEFAULT_Q 128
#define QGEMM_DEFAULT_Q 128
#define CGEMM_DEFAULT_Q 128
#define ZGEMM_DEFAULT_Q 128
#define XGEMM_DEFAULT_Q 128
#endif
#ifdef CORE_PRESCOTT
#define SNUMOPT 4
#define DNUMOPT 2
#ifndef __64BIT__
#define GEMM_DEFAULT_OFFSET_A 128
#define GEMM_DEFAULT_OFFSET_B 192
#else
#define GEMM_DEFAULT_OFFSET_A 0
#define GEMM_DEFAULT_OFFSET_B 256
#endif
#define GEMM_DEFAULT_ALIGN (BLASLONG)0x0ffffUL
#define SYMV_P 8
#ifdef ARCH_X86
#define SGEMM_DEFAULT_UNROLL_M 4
#define DGEMM_DEFAULT_UNROLL_M 2
#define QGEMM_DEFAULT_UNROLL_M 2
#define CGEMM_DEFAULT_UNROLL_M 2
#define ZGEMM_DEFAULT_UNROLL_M 1
#define XGEMM_DEFAULT_UNROLL_M 1
#else
#define SGEMM_DEFAULT_UNROLL_M 8
#define DGEMM_DEFAULT_UNROLL_M 4
#define QGEMM_DEFAULT_UNROLL_M 2
#define CGEMM_DEFAULT_UNROLL_M 4
#define ZGEMM_DEFAULT_UNROLL_M 2
#define XGEMM_DEFAULT_UNROLL_M 1
#endif
#define SGEMM_DEFAULT_UNROLL_N 4
#define DGEMM_DEFAULT_UNROLL_N 4
#define QGEMM_DEFAULT_UNROLL_N 2
#define CGEMM_DEFAULT_UNROLL_N 2
#define ZGEMM_DEFAULT_UNROLL_N 2
#define XGEMM_DEFAULT_UNROLL_N 1
#define SGEMM_DEFAULT_P sgemm_p
#define SGEMM_DEFAULT_R sgemm_r
#define DGEMM_DEFAULT_P dgemm_p
#define DGEMM_DEFAULT_R dgemm_r
#define QGEMM_DEFAULT_P qgemm_p
#define QGEMM_DEFAULT_R qgemm_r
#define CGEMM_DEFAULT_P cgemm_p
#define CGEMM_DEFAULT_R cgemm_r
#define ZGEMM_DEFAULT_P zgemm_p
#define ZGEMM_DEFAULT_R zgemm_r
#define XGEMM_DEFAULT_P xgemm_p
#define XGEMM_DEFAULT_R xgemm_r
#define SGEMM_DEFAULT_Q 128
#define DGEMM_DEFAULT_Q 128
#define QGEMM_DEFAULT_Q 128
#define CGEMM_DEFAULT_Q 128
#define ZGEMM_DEFAULT_Q 128
#define XGEMM_DEFAULT_Q 128
#endif
#ifdef CORE2
#define SNUMOPT 8
#define DNUMOPT 4
#define GEMM_DEFAULT_OFFSET_A 448
#define GEMM_DEFAULT_OFFSET_B 128
#define GEMM_DEFAULT_ALIGN (BLASLONG)0x03fffUL
#define SYMV_P 8
#define SWITCH_RATIO 4
#ifdef ARCH_X86
#define SGEMM_DEFAULT_UNROLL_M 8
#define DGEMM_DEFAULT_UNROLL_M 4
#define QGEMM_DEFAULT_UNROLL_M 2
#define CGEMM_DEFAULT_UNROLL_M 4
#define ZGEMM_DEFAULT_UNROLL_M 2
#define XGEMM_DEFAULT_UNROLL_M 1
#define SGEMM_DEFAULT_UNROLL_N 2
#define DGEMM_DEFAULT_UNROLL_N 2
#define QGEMM_DEFAULT_UNROLL_N 2
#define CGEMM_DEFAULT_UNROLL_N 1
#define ZGEMM_DEFAULT_UNROLL_N 1
#define XGEMM_DEFAULT_UNROLL_N 1
#define MASK(a, b) ((((a) + (b) - 1) / (b)) * (b))
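/*
 * Worked example of MASK(a, b): with integer division it rounds a up to the
 * next multiple of b, e.g. MASK(13, 4) = ((13 + 3) / 4) * 4 = 4 * 4 = 16,
 * while an exact multiple is left unchanged: MASK(16, 4) = (19 / 4) * 4 = 16.
 * The arguments are assumed to be positive integers; this is a comment only,
 * so nothing is added that an assembler pass over this header could trip on.
 */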
#else
#define SGEMM_DEFAULT_UNROLL_M 8
#define DGEMM_DEFAULT_UNROLL_M 4
#define QGEMM_DEFAULT_UNROLL_M 2
#define CGEMM_DEFAULT_UNROLL_M 4
#define ZGEMM_DEFAULT_UNROLL_M 2
#define XGEMM_DEFAULT_UNROLL_M 1
#define SGEMM_DEFAULT_UNROLL_N 4
#define DGEMM_DEFAULT_UNROLL_N 4
#define QGEMM_DEFAULT_UNROLL_N 2
#define CGEMM_DEFAULT_UNROLL_N 2
#define ZGEMM_DEFAULT_UNROLL_N 2
#define XGEMM_DEFAULT_UNROLL_N 1
#endif
#define SGEMM_DEFAULT_P sgemm_p
#define SGEMM_DEFAULT_R sgemm_r
#define DGEMM_DEFAULT_P dgemm_p
#define DGEMM_DEFAULT_R dgemm_r
#define QGEMM_DEFAULT_P qgemm_p
#define QGEMM_DEFAULT_R qgemm_r
#define CGEMM_DEFAULT_P cgemm_p
#define CGEMM_DEFAULT_R cgemm_r
#define ZGEMM_DEFAULT_P zgemm_p
#define ZGEMM_DEFAULT_R zgemm_r
#define XGEMM_DEFAULT_P xgemm_p
#define XGEMM_DEFAULT_R xgemm_r
#define SGEMM_DEFAULT_Q 256
#define DGEMM_DEFAULT_Q 256
#define QGEMM_DEFAULT_Q 256
#define CGEMM_DEFAULT_Q 256
#define ZGEMM_DEFAULT_Q 256
#define XGEMM_DEFAULT_Q 256
#endif
#ifdef PENRYN
#define SNUMOPT 8
#define DNUMOPT 4
#define GEMM_DEFAULT_OFFSET_A 128
#define GEMM_DEFAULT_OFFSET_B 0
#define GEMM_DEFAULT_ALIGN (BLASLONG)0x03fffUL
#define SYMV_P 8
#define SWITCH_RATIO 4
#ifdef ARCH_X86
#define SGEMM_DEFAULT_UNROLL_M 4
#define DGEMM_DEFAULT_UNROLL_M 2
#define QGEMM_DEFAULT_UNROLL_M 2
#define CGEMM_DEFAULT_UNROLL_M 2
#define ZGEMM_DEFAULT_UNROLL_M 1
#define XGEMM_DEFAULT_UNROLL_M 1
#define SGEMM_DEFAULT_UNROLL_N 4
#define DGEMM_DEFAULT_UNROLL_N 4
#define QGEMM_DEFAULT_UNROLL_N 2
#define CGEMM_DEFAULT_UNROLL_N 2
#define ZGEMM_DEFAULT_UNROLL_N 2
#define XGEMM_DEFAULT_UNROLL_N 1
#else
#define SGEMM_DEFAULT_UNROLL_M 8
#define DGEMM_DEFAULT_UNROLL_M 4
#define QGEMM_DEFAULT_UNROLL_M 2
#define CGEMM_DEFAULT_UNROLL_M 4
#define ZGEMM_DEFAULT_UNROLL_M 2
#define XGEMM_DEFAULT_UNROLL_M 1
#define SGEMM_DEFAULT_UNROLL_N 4
#define DGEMM_DEFAULT_UNROLL_N 4
#define QGEMM_DEFAULT_UNROLL_N 2
#define CGEMM_DEFAULT_UNROLL_N 2
#define ZGEMM_DEFAULT_UNROLL_N 2
#define XGEMM_DEFAULT_UNROLL_N 1
#endif
#define SGEMM_DEFAULT_P sgemm_p
#define SGEMM_DEFAULT_R sgemm_r
#define DGEMM_DEFAULT_P dgemm_p
#define DGEMM_DEFAULT_R dgemm_r
#define QGEMM_DEFAULT_P qgemm_p
#define QGEMM_DEFAULT_R qgemm_r
#define CGEMM_DEFAULT_P cgemm_p
#define CGEMM_DEFAULT_R cgemm_r
#define ZGEMM_DEFAULT_P zgemm_p
#define ZGEMM_DEFAULT_R zgemm_r
#define XGEMM_DEFAULT_P xgemm_p
#define XGEMM_DEFAULT_R xgemm_r
#define SGEMM_DEFAULT_Q 512
#define DGEMM_DEFAULT_Q 256
#define QGEMM_DEFAULT_Q 128
#define CGEMM_DEFAULT_Q 512
#define ZGEMM_DEFAULT_Q 256
#define XGEMM_DEFAULT_Q 128
#define GETRF_FACTOR 0.75
#endif
#ifdef DUNNINGTON
#define SNUMOPT 8
#define DNUMOPT 4
#define GEMM_DEFAULT_OFFSET_A 128
#define GEMM_DEFAULT_OFFSET_B 0
#define GEMM_DEFAULT_ALIGN (BLASLONG)0x03fffUL
#define SYMV_P 8
#define SWITCH_RATIO 4
#ifdef ARCH_X86
#define SGEMM_DEFAULT_UNROLL_M 4
#define DGEMM_DEFAULT_UNROLL_M 2
#define QGEMM_DEFAULT_UNROLL_M 2
#define CGEMM_DEFAULT_UNROLL_M 2
#define ZGEMM_DEFAULT_UNROLL_M 1
#define XGEMM_DEFAULT_UNROLL_M 1
#define SGEMM_DEFAULT_UNROLL_N 4
#define DGEMM_DEFAULT_UNROLL_N 4
#define QGEMM_DEFAULT_UNROLL_N 2
#define CGEMM_DEFAULT_UNROLL_N 2
#define ZGEMM_DEFAULT_UNROLL_N 2
#define XGEMM_DEFAULT_UNROLL_N 1
#else
#define SGEMM_DEFAULT_UNROLL_M 8
#define DGEMM_DEFAULT_UNROLL_M 4
#define QGEMM_DEFAULT_UNROLL_M 2
#define CGEMM_DEFAULT_UNROLL_M 4
#define ZGEMM_DEFAULT_UNROLL_M 2
#define XGEMM_DEFAULT_UNROLL_M 1
#define SGEMM_DEFAULT_UNROLL_N 4
#define DGEMM_DEFAULT_UNROLL_N 4
#define QGEMM_DEFAULT_UNROLL_N 2
#define CGEMM_DEFAULT_UNROLL_N 2
#define ZGEMM_DEFAULT_UNROLL_N 2
#define XGEMM_DEFAULT_UNROLL_N 1
#endif
#define SGEMM_DEFAULT_P sgemm_p
#define SGEMM_DEFAULT_R sgemm_r
#define DGEMM_DEFAULT_P dgemm_p
#define DGEMM_DEFAULT_R dgemm_r
#define QGEMM_DEFAULT_P qgemm_p
#define QGEMM_DEFAULT_R qgemm_r
#define CGEMM_DEFAULT_P cgemm_p
#define CGEMM_DEFAULT_R cgemm_r
#define ZGEMM_DEFAULT_P zgemm_p
#define ZGEMM_DEFAULT_R zgemm_r
#define XGEMM_DEFAULT_P xgemm_p
#define XGEMM_DEFAULT_R xgemm_r
#define SGEMM_DEFAULT_Q 768
#define DGEMM_DEFAULT_Q 384
#define QGEMM_DEFAULT_Q 192
#define CGEMM_DEFAULT_Q 768
#define ZGEMM_DEFAULT_Q 384
#define XGEMM_DEFAULT_Q 192
#define GETRF_FACTOR 0.75
#define GEMM_THREAD gemm_thread_mn
#endif
#ifdef NEHALEM
#define SNUMOPT 8
#define DNUMOPT 4
#define GEMM_DEFAULT_OFFSET_A 32
#define GEMM_DEFAULT_OFFSET_B 0
#define GEMM_DEFAULT_ALIGN (BLASLONG)0x03fffUL
#define SYMV_P 8
#define SWITCH_RATIO 4
#ifdef ARCH_X86
#define SGEMM_DEFAULT_UNROLL_M 4
#define DGEMM_DEFAULT_UNROLL_M 2
#define QGEMM_DEFAULT_UNROLL_M 2
#define CGEMM_DEFAULT_UNROLL_M 2
#define ZGEMM_DEFAULT_UNROLL_M 1
#define XGEMM_DEFAULT_UNROLL_M 1
#define SGEMM_DEFAULT_UNROLL_N 4
#define DGEMM_DEFAULT_UNROLL_N 4
#define QGEMM_DEFAULT_UNROLL_N 2
#define CGEMM_DEFAULT_UNROLL_N 2
#define ZGEMM_DEFAULT_UNROLL_N 2
#define XGEMM_DEFAULT_UNROLL_N 1
#else
#define SGEMM_DEFAULT_UNROLL_M 4
#define DGEMM_DEFAULT_UNROLL_M 2
#define QGEMM_DEFAULT_UNROLL_M 2
#define CGEMM_DEFAULT_UNROLL_M 2
#define ZGEMM_DEFAULT_UNROLL_M 1
#define XGEMM_DEFAULT_UNROLL_M 1
#define SGEMM_DEFAULT_UNROLL_N 8
#define DGEMM_DEFAULT_UNROLL_N 8
#define QGEMM_DEFAULT_UNROLL_N 2
#define CGEMM_DEFAULT_UNROLL_N 4
#define ZGEMM_DEFAULT_UNROLL_N 4
#define XGEMM_DEFAULT_UNROLL_N 1
#endif
#define SGEMM_DEFAULT_P 504
#define SGEMM_DEFAULT_R sgemm_r
#define DGEMM_DEFAULT_P 504
#define DGEMM_DEFAULT_R dgemm_r
#define QGEMM_DEFAULT_P 504
#define QGEMM_DEFAULT_R qgemm_r
#define CGEMM_DEFAULT_P 252
#define CGEMM_DEFAULT_R cgemm_r
#define ZGEMM_DEFAULT_P 252
#define ZGEMM_DEFAULT_R zgemm_r
#define XGEMM_DEFAULT_P 252
#define XGEMM_DEFAULT_R xgemm_r
#define SGEMM_DEFAULT_Q 512
#define DGEMM_DEFAULT_Q 256
#define QGEMM_DEFAULT_Q 128
#define CGEMM_DEFAULT_Q 512
#define ZGEMM_DEFAULT_Q 256
#define XGEMM_DEFAULT_Q 128
#define GETRF_FACTOR 0.72
#endif
#ifdef SANDYBRIDGE
#define SNUMOPT 8
#define DNUMOPT 4
#define GEMM_DEFAULT_OFFSET_A 0
#define GEMM_DEFAULT_OFFSET_B 0
#define GEMM_DEFAULT_ALIGN (BLASLONG)0x03fffUL
#define SYMV_P 8
#define SWITCH_RATIO 4
#ifdef ARCH_X86
#define SGEMM_DEFAULT_UNROLL_M 4
#define DGEMM_DEFAULT_UNROLL_M 2
#define QGEMM_DEFAULT_UNROLL_M 2
#define CGEMM_DEFAULT_UNROLL_M 2
#define ZGEMM_DEFAULT_UNROLL_M 1
#define XGEMM_DEFAULT_UNROLL_M 1
#define SGEMM_DEFAULT_UNROLL_N 4
#define DGEMM_DEFAULT_UNROLL_N 4
#define QGEMM_DEFAULT_UNROLL_N 2
#define CGEMM_DEFAULT_UNROLL_N 2
#define ZGEMM_DEFAULT_UNROLL_N 2
#define XGEMM_DEFAULT_UNROLL_N 1
#else
#define SGEMM_DEFAULT_UNROLL_M 16
#define DGEMM_DEFAULT_UNROLL_M 8
#define QGEMM_DEFAULT_UNROLL_M 2
#define CGEMM_DEFAULT_UNROLL_M 8
#define ZGEMM_DEFAULT_UNROLL_M 1
#define XGEMM_DEFAULT_UNROLL_M 1
#define SGEMM_DEFAULT_UNROLL_N 4
#define DGEMM_DEFAULT_UNROLL_N 4
#define QGEMM_DEFAULT_UNROLL_N 2
#define CGEMM_DEFAULT_UNROLL_N 2
#define ZGEMM_DEFAULT_UNROLL_N 4
#define XGEMM_DEFAULT_UNROLL_N 1
#endif
#define SGEMM_DEFAULT_P 768
#define SGEMM_DEFAULT_R sgemm_r
/*#define SGEMM_DEFAULT_R 1024*/
#define DGEMM_DEFAULT_P 512
#define DGEMM_DEFAULT_R dgemm_r
/*#define DGEMM_DEFAULT_R 1024*/
#define QGEMM_DEFAULT_P 504
#define QGEMM_DEFAULT_R qgemm_r
#define CGEMM_DEFAULT_P 768
#define CGEMM_DEFAULT_R cgemm_r
/*#define CGEMM_DEFAULT_R 1024*/
#define ZGEMM_DEFAULT_P 512
#define ZGEMM_DEFAULT_R zgemm_r
/*#define ZGEMM_DEFAULT_R 1024*/
#define XGEMM_DEFAULT_P 252
#define XGEMM_DEFAULT_R xgemm_r
#define SGEMM_DEFAULT_Q 384
#define DGEMM_DEFAULT_Q 256
#define QGEMM_DEFAULT_Q 128
#define CGEMM_DEFAULT_Q 512
#define ZGEMM_DEFAULT_Q 192
#define XGEMM_DEFAULT_Q 128
#define CGEMM3M_DEFAULT_UNROLL_N 8
#define CGEMM3M_DEFAULT_UNROLL_M 4
#define ZGEMM3M_DEFAULT_UNROLL_N 8
#define ZGEMM3M_DEFAULT_UNROLL_M 2
#define CGEMM3M_DEFAULT_P 448
#define ZGEMM3M_DEFAULT_P 224
#define XGEMM3M_DEFAULT_P 112
#define CGEMM3M_DEFAULT_Q 224
#define ZGEMM3M_DEFAULT_Q 224
#define XGEMM3M_DEFAULT_Q 224
#define CGEMM3M_DEFAULT_R 12288
#define ZGEMM3M_DEFAULT_R 12288
#define XGEMM3M_DEFAULT_R 12288
#define GETRF_FACTOR 0.72
#endif
#ifdef HASWELL
#define SNUMOPT 16
#define DNUMOPT 8
#define GEMM_DEFAULT_OFFSET_A 0
#define GEMM_DEFAULT_OFFSET_B 0
#define GEMM_DEFAULT_ALIGN (BLASLONG)0x03fffUL
#define SYMV_P 8
#if defined(XDOUBLE) || defined(DOUBLE)
#define SWITCH_RATIO 4
#define GEMM_PREFERED_SIZE 4
#else
#define SWITCH_RATIO 8
#define GEMM_PREFERED_SIZE 8
#endif
#ifdef ARCH_X86
#define SGEMM_DEFAULT_UNROLL_M 4
#define DGEMM_DEFAULT_UNROLL_M 2
#define QGEMM_DEFAULT_UNROLL_M 2
#define CGEMM_DEFAULT_UNROLL_M 2
#define ZGEMM_DEFAULT_UNROLL_M 1
#define XGEMM_DEFAULT_UNROLL_M 1
#define SGEMM_DEFAULT_UNROLL_N 4
#define DGEMM_DEFAULT_UNROLL_N 4
#define QGEMM_DEFAULT_UNROLL_N 2
#define CGEMM_DEFAULT_UNROLL_N 2
#define ZGEMM_DEFAULT_UNROLL_N 2
#define XGEMM_DEFAULT_UNROLL_N 1
#else
#define SGEMM_DEFAULT_UNROLL_M 8
#define DGEMM_DEFAULT_UNROLL_M 4
#define QGEMM_DEFAULT_UNROLL_M 2
#define CGEMM_DEFAULT_UNROLL_M 8
#define ZGEMM_DEFAULT_UNROLL_M 4
#define XGEMM_DEFAULT_UNROLL_M 1
#define SGEMM_DEFAULT_UNROLL_N 4
#define DGEMM_DEFAULT_UNROLL_N 8
#define QGEMM_DEFAULT_UNROLL_N 2
#define CGEMM_DEFAULT_UNROLL_N 2
#define ZGEMM_DEFAULT_UNROLL_N 2
#define XGEMM_DEFAULT_UNROLL_N 1
/*
#define SGEMM_DEFAULT_UNROLL_MN 32
#define DGEMM_DEFAULT_UNROLL_MN 32
*/
#endif
#ifdef ARCH_X86
#define SGEMM_DEFAULT_P 512
#define SGEMM_DEFAULT_R sgemm_r
#define DGEMM_DEFAULT_P 512
#define DGEMM_DEFAULT_R dgemm_r
#define QGEMM_DEFAULT_P 504
#define QGEMM_DEFAULT_R qgemm_r
#define CGEMM_DEFAULT_P 128
#define CGEMM_DEFAULT_R 1024
#define ZGEMM_DEFAULT_P 512
#define ZGEMM_DEFAULT_R zgemm_r
#define XGEMM_DEFAULT_P 252
#define XGEMM_DEFAULT_R xgemm_r
#define SGEMM_DEFAULT_Q 256
#define DGEMM_DEFAULT_Q 256
#define QGEMM_DEFAULT_Q 128
#define CGEMM_DEFAULT_Q 256
#define ZGEMM_DEFAULT_Q 192
#define XGEMM_DEFAULT_Q 128
#else
#define SGEMM_DEFAULT_P 320
#define DGEMM_DEFAULT_P 512
#define CGEMM_DEFAULT_P 256
#define ZGEMM_DEFAULT_P 192
#ifdef WINDOWS_ABI
#define SGEMM_DEFAULT_Q 320
#define DGEMM_DEFAULT_Q 128
#else
#define SGEMM_DEFAULT_Q 320
#define DGEMM_DEFAULT_Q 256
#endif
#define CGEMM_DEFAULT_Q 256
#define ZGEMM_DEFAULT_Q 192
#define SGEMM_DEFAULT_R sgemm_r
#define DGEMM_DEFAULT_R 13824
#define CGEMM_DEFAULT_R cgemm_r
#define ZGEMM_DEFAULT_R zgemm_r
#define QGEMM_DEFAULT_Q 128
#define QGEMM_DEFAULT_P 504
#define QGEMM_DEFAULT_R qgemm_r
#define XGEMM_DEFAULT_P 252
#define XGEMM_DEFAULT_R xgemm_r
#define XGEMM_DEFAULT_Q 128
#define CGEMM3M_DEFAULT_UNROLL_N 4
#define CGEMM3M_DEFAULT_UNROLL_M 8
#define ZGEMM3M_DEFAULT_UNROLL_N 4
#define ZGEMM3M_DEFAULT_UNROLL_M 4
#define CGEMM3M_DEFAULT_P 320
#define ZGEMM3M_DEFAULT_P 256
#define XGEMM3M_DEFAULT_P 112
#define CGEMM3M_DEFAULT_Q 320
#define ZGEMM3M_DEFAULT_Q 256
#define XGEMM3M_DEFAULT_Q 224
#define CGEMM3M_DEFAULT_R 12288
#define ZGEMM3M_DEFAULT_R 12288
#define XGEMM3M_DEFAULT_R 12288
#endif
#endif
#ifdef SKYLAKEX
#define SNUMOPT 16
#define DNUMOPT 8
#define GEMM_DEFAULT_OFFSET_A 0
#define GEMM_DEFAULT_OFFSET_B 0
#define GEMM_DEFAULT_ALIGN (BLASLONG)0x03fffUL
#define SYMV_P 8
#if defined(XDOUBLE) || defined(DOUBLE)
#define SWITCH_RATIO 8
#define GEMM_PREFERED_SIZE 8
#else
#define SWITCH_RATIO 16
#define GEMM_PREFERED_SIZE 16
#endif
#define USE_SGEMM_KERNEL_DIRECT 1
#ifdef ARCH_X86
#define SGEMM_DEFAULT_UNROLL_M 4
#define DGEMM_DEFAULT_UNROLL_M 2
#define QGEMM_DEFAULT_UNROLL_M 2
#define CGEMM_DEFAULT_UNROLL_M 2
#define ZGEMM_DEFAULT_UNROLL_M 1
#define XGEMM_DEFAULT_UNROLL_M 1
#define SGEMM_DEFAULT_UNROLL_N 4
#define DGEMM_DEFAULT_UNROLL_N 4
#define QGEMM_DEFAULT_UNROLL_N 2
#define CGEMM_DEFAULT_UNROLL_N 2
#define ZGEMM_DEFAULT_UNROLL_N 2
#define XGEMM_DEFAULT_UNROLL_N 1
#else
#define SGEMM_DEFAULT_UNROLL_M 16
#define DGEMM_DEFAULT_UNROLL_M 16
#define QGEMM_DEFAULT_UNROLL_M 2
#define CGEMM_DEFAULT_UNROLL_M 8
#define ZGEMM_DEFAULT_UNROLL_M 4
#define XGEMM_DEFAULT_UNROLL_M 1
#define SGEMM_DEFAULT_UNROLL_N 4
#define DGEMM_DEFAULT_UNROLL_N 2
#define QGEMM_DEFAULT_UNROLL_N 2
#define CGEMM_DEFAULT_UNROLL_N 2
#define ZGEMM_DEFAULT_UNROLL_N 2
#define XGEMM_DEFAULT_UNROLL_N 1
#define SGEMM_DEFAULT_UNROLL_MN 32
#define DGEMM_DEFAULT_UNROLL_MN 32
#endif
#ifdef ARCH_X86
#define SGEMM_DEFAULT_P 512
#define SGEMM_DEFAULT_R sgemm_r
#define DGEMM_DEFAULT_P 512
#define DGEMM_DEFAULT_R dgemm_r
#define QGEMM_DEFAULT_P 504
#define QGEMM_DEFAULT_R qgemm_r
#define CGEMM_DEFAULT_P 128
#define CGEMM_DEFAULT_R 1024
#define ZGEMM_DEFAULT_P 512
#define ZGEMM_DEFAULT_R zgemm_r
#define XGEMM_DEFAULT_P 252
#define XGEMM_DEFAULT_R xgemm_r
#define SGEMM_DEFAULT_Q 256
#define DGEMM_DEFAULT_Q 256
#define QGEMM_DEFAULT_Q 128
#define CGEMM_DEFAULT_Q 256
#define ZGEMM_DEFAULT_Q 192
#define XGEMM_DEFAULT_Q 128
#else
#define SGEMM_DEFAULT_P 448
#define DGEMM_DEFAULT_P 192
#define CGEMM_DEFAULT_P 384
#define ZGEMM_DEFAULT_P 256
#define SGEMM_DEFAULT_Q 448
#define DGEMM_DEFAULT_Q 384
#define CGEMM_DEFAULT_Q 192
#define ZGEMM_DEFAULT_Q 128
#define SGEMM_DEFAULT_R sgemm_r
#define DGEMM_DEFAULT_R 8640
#define CGEMM_DEFAULT_R cgemm_r
#define ZGEMM_DEFAULT_R zgemm_r
#define QGEMM_DEFAULT_Q 128
#define QGEMM_DEFAULT_P 504
#define QGEMM_DEFAULT_R qgemm_r
#define XGEMM_DEFAULT_P 252
#define XGEMM_DEFAULT_R xgemm_r
#define XGEMM_DEFAULT_Q 128
#define CGEMM3M_DEFAULT_UNROLL_N 4
#define CGEMM3M_DEFAULT_UNROLL_M 8
#define ZGEMM3M_DEFAULT_UNROLL_N 4
#define ZGEMM3M_DEFAULT_UNROLL_M 4
#define CGEMM3M_DEFAULT_P 320
#define ZGEMM3M_DEFAULT_P 256
#define XGEMM3M_DEFAULT_P 112
#define CGEMM3M_DEFAULT_Q 320
#define ZGEMM3M_DEFAULT_Q 256
#define XGEMM3M_DEFAULT_Q 224
#define CGEMM3M_DEFAULT_R 12288
#define ZGEMM3M_DEFAULT_R 12288
#define XGEMM3M_DEFAULT_R 12288
#endif
#endif
#ifdef SAPPHIRERAPIDS
#define SNUMOPT 16
#define DNUMOPT 8
#define GEMM_DEFAULT_OFFSET_A 0
#define GEMM_DEFAULT_OFFSET_B 0
#define GEMM_DEFAULT_ALIGN 0x03fffUL
#define SYMV_P 8
#if defined(XDOUBLE) || defined(DOUBLE)
#define SWITCH_RATIO 8
#define GEMM_PREFERED_SIZE 8
#else
#define SWITCH_RATIO 16
#define GEMM_PREFERED_SIZE 16
#endif
#define USE_SGEMM_KERNEL_DIRECT 1
#undef SBGEMM_DEFAULT_UNROLL_N
#undef SBGEMM_DEFAULT_UNROLL_M
#undef SBGEMM_DEFAULT_P
#undef SBGEMM_DEFAULT_R
#undef SBGEMM_DEFAULT_Q
// FIXME: actually UNROLL_M = UNROLL_N = 16
// If M and N are equal, OpenBLAS will reuse OCOPY as ICOPY.
// But for AMX they are not the same, so set UNROLL_M = 32 as a workaround.
#define SBGEMM_DEFAULT_UNROLL_N 16
#define SBGEMM_DEFAULT_UNROLL_M 32
#define SBGEMM_DEFAULT_P 256
#define SBGEMM_DEFAULT_Q 1024
#define SBGEMM_DEFAULT_R sbgemm_r
#ifdef ARCH_X86
#define SGEMM_DEFAULT_UNROLL_M 4
#define DGEMM_DEFAULT_UNROLL_M 2
#define QGEMM_DEFAULT_UNROLL_M 2
#define CGEMM_DEFAULT_UNROLL_M 2
#define ZGEMM_DEFAULT_UNROLL_M 1
#define XGEMM_DEFAULT_UNROLL_M 1
#define SGEMM_DEFAULT_UNROLL_N 4
#define DGEMM_DEFAULT_UNROLL_N 4
#define QGEMM_DEFAULT_UNROLL_N 2
#define CGEMM_DEFAULT_UNROLL_N 2
#define ZGEMM_DEFAULT_UNROLL_N 2
#define XGEMM_DEFAULT_UNROLL_N 1
#else
#define SGEMM_DEFAULT_UNROLL_M 16
#define DGEMM_DEFAULT_UNROLL_M 16
#define QGEMM_DEFAULT_UNROLL_M 2
#define CGEMM_DEFAULT_UNROLL_M 8
#define ZGEMM_DEFAULT_UNROLL_M 4
#define XGEMM_DEFAULT_UNROLL_M 1
#define SGEMM_DEFAULT_UNROLL_N 4
#define DGEMM_DEFAULT_UNROLL_N 2
#define QGEMM_DEFAULT_UNROLL_N 2
#define CGEMM_DEFAULT_UNROLL_N 2
#define ZGEMM_DEFAULT_UNROLL_N 2
#define XGEMM_DEFAULT_UNROLL_N 1
#define SGEMM_DEFAULT_UNROLL_MN 32
#define DGEMM_DEFAULT_UNROLL_MN 32
#endif
#ifdef ARCH_X86
#define SGEMM_DEFAULT_P 512
#define SGEMM_DEFAULT_R sgemm_r
#define DGEMM_DEFAULT_P 512
#define DGEMM_DEFAULT_R dgemm_r
#define QGEMM_DEFAULT_P 504
#define QGEMM_DEFAULT_R qgemm_r
#define CGEMM_DEFAULT_P 128
#define CGEMM_DEFAULT_R 1024
#define ZGEMM_DEFAULT_P 512
#define ZGEMM_DEFAULT_R zgemm_r
#define XGEMM_DEFAULT_P 252
#define XGEMM_DEFAULT_R xgemm_r
#define SGEMM_DEFAULT_Q 256
#define DGEMM_DEFAULT_Q 256
#define QGEMM_DEFAULT_Q 128
#define CGEMM_DEFAULT_Q 256
#define ZGEMM_DEFAULT_Q 192
#define XGEMM_DEFAULT_Q 128
#else
#define SGEMM_DEFAULT_P 640
#define DGEMM_DEFAULT_P 192
#define CGEMM_DEFAULT_P 384
#define ZGEMM_DEFAULT_P 256
#define SGEMM_DEFAULT_Q 320
#define DGEMM_DEFAULT_Q 384
#define CGEMM_DEFAULT_Q 192
#define ZGEMM_DEFAULT_Q 128
#define SGEMM_DEFAULT_R sgemm_r
#define DGEMM_DEFAULT_R 8640
#define CGEMM_DEFAULT_R cgemm_r
#define ZGEMM_DEFAULT_R zgemm_r
#define QGEMM_DEFAULT_Q 128
#define QGEMM_DEFAULT_P 504
#define QGEMM_DEFAULT_R qgemm_r
#define XGEMM_DEFAULT_P 252
#define XGEMM_DEFAULT_R xgemm_r
#define XGEMM_DEFAULT_Q 128
#define CGEMM3M_DEFAULT_UNROLL_N 4
#define CGEMM3M_DEFAULT_UNROLL_M 8
#define ZGEMM3M_DEFAULT_UNROLL_N 4
#define ZGEMM3M_DEFAULT_UNROLL_M 4
#define CGEMM3M_DEFAULT_P 320
#define ZGEMM3M_DEFAULT_P 256
#define XGEMM3M_DEFAULT_P 112
#define CGEMM3M_DEFAULT_Q 320
#define ZGEMM3M_DEFAULT_Q 256
#define XGEMM3M_DEFAULT_Q 224
#define CGEMM3M_DEFAULT_R 12288
#define ZGEMM3M_DEFAULT_R 12288
#define XGEMM3M_DEFAULT_R 12288
#endif
#endif
#ifdef COOPERLAKE
#define SNUMOPT 16
#define DNUMOPT 8
#define GEMM_DEFAULT_OFFSET_A 0
#define GEMM_DEFAULT_OFFSET_B 0
#define GEMM_DEFAULT_ALIGN 0x03fffUL
#define SYMV_P 8
#if defined(XDOUBLE) || defined(DOUBLE)
#define SWITCH_RATIO 8
#define GEMM_PREFERED_SIZE 8
#else
#define SWITCH_RATIO 16
#define GEMM_PREFERED_SIZE 16
#endif
#define USE_SGEMM_KERNEL_DIRECT 1
#undef SBGEMM_DEFAULT_UNROLL_N
#undef SBGEMM_DEFAULT_UNROLL_M
#undef SBGEMM_DEFAULT_P
#undef SBGEMM_DEFAULT_R
#undef SBGEMM_DEFAULT_Q
#define SBGEMM_DEFAULT_UNROLL_N 4
#define SBGEMM_DEFAULT_UNROLL_M 16
#define SBGEMM_DEFAULT_P 384
#define SBGEMM_DEFAULT_Q 768
#define SBGEMM_DEFAULT_R sbgemm_r
#ifdef ARCH_X86
#define SGEMM_DEFAULT_UNROLL_M 4
#define DGEMM_DEFAULT_UNROLL_M 2
#define QGEMM_DEFAULT_UNROLL_M 2
#define CGEMM_DEFAULT_UNROLL_M 2
#define ZGEMM_DEFAULT_UNROLL_M 1
#define XGEMM_DEFAULT_UNROLL_M 1
#define SGEMM_DEFAULT_UNROLL_N 4
#define DGEMM_DEFAULT_UNROLL_N 4
#define QGEMM_DEFAULT_UNROLL_N 2
#define CGEMM_DEFAULT_UNROLL_N 2
#define ZGEMM_DEFAULT_UNROLL_N 2
#define XGEMM_DEFAULT_UNROLL_N 1
#else
#define SGEMM_DEFAULT_UNROLL_M 16
#define DGEMM_DEFAULT_UNROLL_M 16
#define QGEMM_DEFAULT_UNROLL_M 2
#define CGEMM_DEFAULT_UNROLL_M 8
#define ZGEMM_DEFAULT_UNROLL_M 4
#define XGEMM_DEFAULT_UNROLL_M 1
#define SGEMM_DEFAULT_UNROLL_N 4
#define DGEMM_DEFAULT_UNROLL_N 2
#define QGEMM_DEFAULT_UNROLL_N 2
#define CGEMM_DEFAULT_UNROLL_N 2
#define ZGEMM_DEFAULT_UNROLL_N 2
#define XGEMM_DEFAULT_UNROLL_N 1
#define SGEMM_DEFAULT_UNROLL_MN 32
#define DGEMM_DEFAULT_UNROLL_MN 32
#endif
#ifdef ARCH_X86
#define SGEMM_DEFAULT_P 512
#define SGEMM_DEFAULT_R sgemm_r
#define DGEMM_DEFAULT_P 512
#define DGEMM_DEFAULT_R dgemm_r
#define QGEMM_DEFAULT_P 504
#define QGEMM_DEFAULT_R qgemm_r
#define CGEMM_DEFAULT_P 128
#define CGEMM_DEFAULT_R 1024
#define ZGEMM_DEFAULT_P 512
#define ZGEMM_DEFAULT_R zgemm_r
#define XGEMM_DEFAULT_P 252
#define XGEMM_DEFAULT_R xgemm_r
#define SGEMM_DEFAULT_Q 256
#define DGEMM_DEFAULT_Q 256
#define QGEMM_DEFAULT_Q 128
#define CGEMM_DEFAULT_Q 256
#define ZGEMM_DEFAULT_Q 192
#define XGEMM_DEFAULT_Q 128
#else
#define SGEMM_DEFAULT_P 640
#define DGEMM_DEFAULT_P 192
#define CGEMM_DEFAULT_P 384
#define ZGEMM_DEFAULT_P 256
#define SGEMM_DEFAULT_Q 320
#define DGEMM_DEFAULT_Q 384
#define CGEMM_DEFAULT_Q 192
#define ZGEMM_DEFAULT_Q 128
#define SGEMM_DEFAULT_R sgemm_r
#define DGEMM_DEFAULT_R 8640
#define CGEMM_DEFAULT_R cgemm_r
#define ZGEMM_DEFAULT_R zgemm_r
#define QGEMM_DEFAULT_Q 128
#define QGEMM_DEFAULT_P 504
#define QGEMM_DEFAULT_R qgemm_r
#define XGEMM_DEFAULT_P 252
#define XGEMM_DEFAULT_R xgemm_r
#define XGEMM_DEFAULT_Q 128
#define CGEMM3M_DEFAULT_UNROLL_N 4
#define CGEMM3M_DEFAULT_UNROLL_M 8
#define ZGEMM3M_DEFAULT_UNROLL_N 4
#define ZGEMM3M_DEFAULT_UNROLL_M 4
#define CGEMM3M_DEFAULT_P 320
#define ZGEMM3M_DEFAULT_P 256
#define XGEMM3M_DEFAULT_P 112
#define CGEMM3M_DEFAULT_Q 320
#define ZGEMM3M_DEFAULT_Q 256
#define XGEMM3M_DEFAULT_Q 224
#define CGEMM3M_DEFAULT_R 12288
#define ZGEMM3M_DEFAULT_R 12288
#define XGEMM3M_DEFAULT_R 12288
#endif
#endif
#ifdef ATOM
#define SNUMOPT 2
#define DNUMOPT 1
#define GEMM_DEFAULT_OFFSET_A 64
#define GEMM_DEFAULT_OFFSET_B 0
#define GEMM_DEFAULT_ALIGN (BLASLONG)0x0ffffUL
#define SYMV_P 8
#ifdef ARCH_X86
#define SGEMM_DEFAULT_UNROLL_M 4
#define DGEMM_DEFAULT_UNROLL_M 2
#define QGEMM_DEFAULT_UNROLL_M 2
#define CGEMM_DEFAULT_UNROLL_M 2
#define ZGEMM_DEFAULT_UNROLL_M 1
#define XGEMM_DEFAULT_UNROLL_M 1
#else
#define SGEMM_DEFAULT_UNROLL_M 8
#define DGEMM_DEFAULT_UNROLL_M 4
#define QGEMM_DEFAULT_UNROLL_M 2
#define CGEMM_DEFAULT_UNROLL_M 4
#define ZGEMM_DEFAULT_UNROLL_M 2
#define XGEMM_DEFAULT_UNROLL_M 1
#endif
#define SGEMM_DEFAULT_UNROLL_N 4
#define DGEMM_DEFAULT_UNROLL_N 2
#define QGEMM_DEFAULT_UNROLL_N 2
#define CGEMM_DEFAULT_UNROLL_N 2
#define ZGEMM_DEFAULT_UNROLL_N 1
#define XGEMM_DEFAULT_UNROLL_N 1
#define SGEMM_DEFAULT_P sgemm_p
#define SGEMM_DEFAULT_R sgemm_r
#define DGEMM_DEFAULT_P dgemm_p
#define DGEMM_DEFAULT_R dgemm_r
#define QGEMM_DEFAULT_P qgemm_p
#define QGEMM_DEFAULT_R qgemm_r
#define CGEMM_DEFAULT_P cgemm_p
#define CGEMM_DEFAULT_R cgemm_r
#define ZGEMM_DEFAULT_P zgemm_p
#define ZGEMM_DEFAULT_R zgemm_r
#define XGEMM_DEFAULT_P xgemm_p
#define XGEMM_DEFAULT_R xgemm_r
#define SGEMM_DEFAULT_Q 256
#define DGEMM_DEFAULT_Q 256
#define QGEMM_DEFAULT_Q 256
#define CGEMM_DEFAULT_Q 256
#define ZGEMM_DEFAULT_Q 256
#define XGEMM_DEFAULT_Q 256
#endif
#ifdef ITANIUM2
#define SNUMOPT 4
#define DNUMOPT 4
#define GEMM_DEFAULT_OFFSET_A 0
#define GEMM_DEFAULT_OFFSET_B 128
#define GEMM_DEFAULT_ALIGN (BLASLONG)0x03fffUL
#define SGEMM_DEFAULT_UNROLL_M 8
#define SGEMM_DEFAULT_UNROLL_N 8
#define DGEMM_DEFAULT_UNROLL_M 8
#define DGEMM_DEFAULT_UNROLL_N 8
#define QGEMM_DEFAULT_UNROLL_M 8
#define QGEMM_DEFAULT_UNROLL_N 8
#define CGEMM_DEFAULT_UNROLL_M 4
#define CGEMM_DEFAULT_UNROLL_N 4
#define ZGEMM_DEFAULT_UNROLL_M 4
#define ZGEMM_DEFAULT_UNROLL_N 4
#define XGEMM_DEFAULT_UNROLL_M 4
#define XGEMM_DEFAULT_UNROLL_N 4
#define SGEMM_DEFAULT_P sgemm_p
#define DGEMM_DEFAULT_P dgemm_p
#define QGEMM_DEFAULT_P qgemm_p
#define CGEMM_DEFAULT_P cgemm_p
#define ZGEMM_DEFAULT_P zgemm_p
#define XGEMM_DEFAULT_P xgemm_p
#define SGEMM_DEFAULT_Q 1024
#define DGEMM_DEFAULT_Q 1024
#define QGEMM_DEFAULT_Q 1024
#define CGEMM_DEFAULT_Q 1024
#define ZGEMM_DEFAULT_Q 1024
#define XGEMM_DEFAULT_Q 1024
#define SGEMM_DEFAULT_R sgemm_r
#define DGEMM_DEFAULT_R dgemm_r
#define QGEMM_DEFAULT_R qgemm_r
#define CGEMM_DEFAULT_R cgemm_r
#define ZGEMM_DEFAULT_R zgemm_r
#define XGEMM_DEFAULT_R xgemm_r
#define SYMV_P 16
#define GETRF_FACTOR 0.65
#endif
#if defined(EV4) || defined(EV5) || defined(EV6)
#ifdef EV4
#define SNUMOPT 1
#define DNUMOPT 1
#else
#define SNUMOPT 2
#define DNUMOPT 2
#endif
#define GEMM_DEFAULT_OFFSET_A 512
#define GEMM_DEFAULT_OFFSET_B 512
#define GEMM_DEFAULT_ALIGN (BLASLONG)0x0ffffUL
#define SGEMM_DEFAULT_UNROLL_M 4
#define SGEMM_DEFAULT_UNROLL_N 4
#define DGEMM_DEFAULT_UNROLL_M 4
#define DGEMM_DEFAULT_UNROLL_N 4
#define CGEMM_DEFAULT_UNROLL_M 2
#define CGEMM_DEFAULT_UNROLL_N 2
#define ZGEMM_DEFAULT_UNROLL_M 2
#define ZGEMM_DEFAULT_UNROLL_N 2
#define SYMV_P 8
#ifdef EV4
#define SGEMM_DEFAULT_P 32
#define SGEMM_DEFAULT_Q 112
#define SGEMM_DEFAULT_R 256
#define DGEMM_DEFAULT_P 32
#define DGEMM_DEFAULT_Q 56
#define DGEMM_DEFAULT_R 256
#define CGEMM_DEFAULT_P 32
#define CGEMM_DEFAULT_Q 64
#define CGEMM_DEFAULT_R 240
#define ZGEMM_DEFAULT_P 32
#define ZGEMM_DEFAULT_Q 32
#define ZGEMM_DEFAULT_R 240
#endif
#ifdef EV5
#define SGEMM_DEFAULT_P 64
#define SGEMM_DEFAULT_Q 256
#define DGEMM_DEFAULT_P 64
#define DGEMM_DEFAULT_Q 128
#define CGEMM_DEFAULT_P 64
#define CGEMM_DEFAULT_Q 128
#define ZGEMM_DEFAULT_P 64
#define ZGEMM_DEFAULT_Q 64
#endif
#ifdef EV6
#define SGEMM_DEFAULT_P 256
#define SGEMM_DEFAULT_Q 512
#define DGEMM_DEFAULT_P 256
#define DGEMM_DEFAULT_Q 256
#define CGEMM_DEFAULT_P 256
#define CGEMM_DEFAULT_Q 256
#define ZGEMM_DEFAULT_P 128
#define ZGEMM_DEFAULT_Q 256
#endif
#endif
#ifdef CELL
#define SNUMOPT 2
#define DNUMOPT 2
#define GEMM_DEFAULT_OFFSET_A 0
#define GEMM_DEFAULT_OFFSET_B 8192
#define GEMM_DEFAULT_ALIGN (BLASLONG)0x0ffffUL
#define SGEMM_DEFAULT_UNROLL_M 16
#define SGEMM_DEFAULT_UNROLL_N 4
#define DGEMM_DEFAULT_UNROLL_M 4
#define DGEMM_DEFAULT_UNROLL_N 4
#define CGEMM_DEFAULT_UNROLL_M 8
#define CGEMM_DEFAULT_UNROLL_N 2
#define ZGEMM_DEFAULT_UNROLL_M 2
#define ZGEMM_DEFAULT_UNROLL_N 2
#define SGEMM_DEFAULT_P 128
#define DGEMM_DEFAULT_P 128
#define CGEMM_DEFAULT_P 128
#define ZGEMM_DEFAULT_P 128
#define SGEMM_DEFAULT_Q 512
#define DGEMM_DEFAULT_Q 256
#define CGEMM_DEFAULT_Q 256
#define ZGEMM_DEFAULT_Q 128
#define SYMV_P 4
#endif
#ifdef PPCG4
#define GEMM_DEFAULT_OFFSET_A 0
#define GEMM_DEFAULT_OFFSET_B 1024
#define GEMM_DEFAULT_ALIGN 0x0ffffUL
#define SGEMM_DEFAULT_UNROLL_M 16
#define SGEMM_DEFAULT_UNROLL_N 4
#define DGEMM_DEFAULT_UNROLL_M 4
#define DGEMM_DEFAULT_UNROLL_N 4
#define CGEMM_DEFAULT_UNROLL_M 2
#define CGEMM_DEFAULT_UNROLL_N 2
#define ZGEMM_DEFAULT_UNROLL_M 2
#define ZGEMM_DEFAULT_UNROLL_N 2
#define SGEMM_DEFAULT_P 256
#define DGEMM_DEFAULT_P 128
#define CGEMM_DEFAULT_P 128
#define ZGEMM_DEFAULT_P 64
#define SGEMM_DEFAULT_Q 256
#define DGEMM_DEFAULT_Q 256
#define CGEMM_DEFAULT_Q 256
#define ZGEMM_DEFAULT_Q 256
#define SYMV_P 4
#endif
#ifdef PPC970
#define SNUMOPT 4
#define DNUMOPT 4
#define GEMM_DEFAULT_OFFSET_A 2688
#define GEMM_DEFAULT_OFFSET_B 3072
#define GEMM_DEFAULT_ALIGN 0x03fffUL
#if defined(__BYTE_ORDER__)&&(__BYTE_ORDER__ == __ORDER_BIG_ENDIAN__)
#define SGEMM_DEFAULT_UNROLL_M 4
#else
#define SGEMM_DEFAULT_UNROLL_M 16
#endif
#define SGEMM_DEFAULT_UNROLL_N 4
#define DGEMM_DEFAULT_UNROLL_M 4
#define DGEMM_DEFAULT_UNROLL_N 4
#if defined(__BYTE_ORDER__)&&(__BYTE_ORDER__ == __ORDER_BIG_ENDIAN__)
#define CGEMM_DEFAULT_UNROLL_M 2
#else
#define CGEMM_DEFAULT_UNROLL_M 8
#endif
#define CGEMM_DEFAULT_UNROLL_N 2
#define ZGEMM_DEFAULT_UNROLL_M 2
#define ZGEMM_DEFAULT_UNROLL_N 2
#if defined(OS_LINUX) || defined(OS_DARWIN) || defined(OS_FREEBSD)
#if L2_SIZE == 1024976
#define SGEMM_DEFAULT_P 320
#define DGEMM_DEFAULT_P 256
#define CGEMM_DEFAULT_P 256
#define ZGEMM_DEFAULT_P 256
#else
#define SGEMM_DEFAULT_P 176
#define DGEMM_DEFAULT_P 176
#define CGEMM_DEFAULT_P 176
#define ZGEMM_DEFAULT_P 176
#endif
#endif
#define SGEMM_DEFAULT_Q 512
#define DGEMM_DEFAULT_Q 256
#define CGEMM_DEFAULT_Q 256
#define ZGEMM_DEFAULT_Q 128
#define SYMV_P 4
#endif
#ifdef PPC440
#define SNUMOPT 2
#define DNUMOPT 2
#define GEMM_DEFAULT_OFFSET_A (32 * 0)
#define GEMM_DEFAULT_OFFSET_B (32 * 0)
#define GEMM_DEFAULT_ALIGN 0x0ffffUL
#define SGEMM_DEFAULT_UNROLL_M 4
#define SGEMM_DEFAULT_UNROLL_N 4
#define DGEMM_DEFAULT_UNROLL_M 4
#define DGEMM_DEFAULT_UNROLL_N 4
#define CGEMM_DEFAULT_UNROLL_M 2
#define CGEMM_DEFAULT_UNROLL_N 2
#define ZGEMM_DEFAULT_UNROLL_M 2
#define ZGEMM_DEFAULT_UNROLL_N 2
#define SGEMM_DEFAULT_P 512
#define DGEMM_DEFAULT_P 512
#define CGEMM_DEFAULT_P 512
#define ZGEMM_DEFAULT_P 512
#define SGEMM_DEFAULT_Q 1024
#define DGEMM_DEFAULT_Q 512
#define CGEMM_DEFAULT_Q 512
#define ZGEMM_DEFAULT_Q 256
#define SGEMM_DEFAULT_R SGEMM_DEFAULT_P
#define DGEMM_DEFAULT_R DGEMM_DEFAULT_P
#define CGEMM_DEFAULT_R CGEMM_DEFAULT_P
#define ZGEMM_DEFAULT_R ZGEMM_DEFAULT_P
#define SYMV_P 4
#endif
#ifdef PPC440FP2
#define SNUMOPT 4
#define DNUMOPT 4
#define GEMM_DEFAULT_OFFSET_A (32 * 0)
#define GEMM_DEFAULT_OFFSET_B (32 * 0)
#define GEMM_DEFAULT_ALIGN 0x0ffffUL
#define SGEMM_DEFAULT_UNROLL_M 8
#define SGEMM_DEFAULT_UNROLL_N 4
#define DGEMM_DEFAULT_UNROLL_M 8
#define DGEMM_DEFAULT_UNROLL_N 4
#define CGEMM_DEFAULT_UNROLL_M 4
#define CGEMM_DEFAULT_UNROLL_N 2
#define ZGEMM_DEFAULT_UNROLL_M 4
#define ZGEMM_DEFAULT_UNROLL_N 2
#define SGEMM_DEFAULT_P 128
#define DGEMM_DEFAULT_P 128
#define CGEMM_DEFAULT_P 128
#define ZGEMM_DEFAULT_P 128
#if 1
#define SGEMM_DEFAULT_Q 4096
#define DGEMM_DEFAULT_Q 3072
#define CGEMM_DEFAULT_Q 2048
#define ZGEMM_DEFAULT_Q 1024
#else
#define SGEMM_DEFAULT_Q 512
#define DGEMM_DEFAULT_Q 256
#define CGEMM_DEFAULT_Q 256
#define ZGEMM_DEFAULT_Q 128
#endif
#define SYMV_P 4
#endif
#if defined(POWER3) || defined(POWER4) || defined(POWER5)
#define GEMM_DEFAULT_OFFSET_A 0
#define GEMM_DEFAULT_OFFSET_B 2048
#define GEMM_DEFAULT_ALIGN 0x0ffffUL
#define SGEMM_DEFAULT_UNROLL_M 4
#define SGEMM_DEFAULT_UNROLL_N 4
#define DGEMM_DEFAULT_UNROLL_M 4
#define DGEMM_DEFAULT_UNROLL_N 4
#define CGEMM_DEFAULT_UNROLL_M 2
#define CGEMM_DEFAULT_UNROLL_N 2
#define ZGEMM_DEFAULT_UNROLL_M 2
#define ZGEMM_DEFAULT_UNROLL_N 2
#ifdef POWER3
#define SNUMOPT 4
#define DNUMOPT 4
#define SGEMM_DEFAULT_P 256
#define SGEMM_DEFAULT_Q 432
#define SGEMM_DEFAULT_R 1012
#define DGEMM_DEFAULT_P 256
#define DGEMM_DEFAULT_Q 216
#define DGEMM_DEFAULT_R 1012
#define CGEMM_DEFAULT_P 256
#define CGEMM_DEFAULT_Q 104
#define CGEMM_DEFAULT_R 1012
#define ZGEMM_DEFAULT_P 256
#define ZGEMM_DEFAULT_Q 104
#define ZGEMM_DEFAULT_R 1012
#endif
#if defined(POWER4)
#ifdef ALLOC_HUGETLB
#define SGEMM_DEFAULT_P 184
#define DGEMM_DEFAULT_P 184
#define CGEMM_DEFAULT_P 184
#define ZGEMM_DEFAULT_P 184
#else
#define SGEMM_DEFAULT_P 144
#define DGEMM_DEFAULT_P 144
#define CGEMM_DEFAULT_P 144
#define ZGEMM_DEFAULT_P 144
#endif
#define SGEMM_DEFAULT_Q 256
#define CGEMM_DEFAULT_Q 256
#define DGEMM_DEFAULT_Q 256
#define ZGEMM_DEFAULT_Q 256
#endif
#if defined(POWER5)
#ifdef ALLOC_HUGETLB
#define SGEMM_DEFAULT_P 512
#define DGEMM_DEFAULT_P 256
#define CGEMM_DEFAULT_P 256
#define ZGEMM_DEFAULT_P 128
#else
#define SGEMM_DEFAULT_P 320
#define DGEMM_DEFAULT_P 160
#define CGEMM_DEFAULT_P 160
#define ZGEMM_DEFAULT_P 80
#endif
#define SGEMM_DEFAULT_Q 256
#define CGEMM_DEFAULT_Q 256
#define DGEMM_DEFAULT_Q 256
#define ZGEMM_DEFAULT_Q 256
#endif
#define SYMV_P 8
#endif
#if defined(POWER6)
#define SNUMOPT 4
#define DNUMOPT 4
#define GEMM_DEFAULT_OFFSET_A 384
#define GEMM_DEFAULT_OFFSET_B 1024
#define GEMM_DEFAULT_ALIGN 0x03fffUL
#define SGEMM_DEFAULT_UNROLL_M 4
#define SGEMM_DEFAULT_UNROLL_N 4
#define DGEMM_DEFAULT_UNROLL_M 4
#define DGEMM_DEFAULT_UNROLL_N 4
#define CGEMM_DEFAULT_UNROLL_M 2
#define CGEMM_DEFAULT_UNROLL_N 4
#define ZGEMM_DEFAULT_UNROLL_M 2
#define ZGEMM_DEFAULT_UNROLL_N 4
#define SGEMM_DEFAULT_P 992
#define DGEMM_DEFAULT_P 480
#define CGEMM_DEFAULT_P 488
#define ZGEMM_DEFAULT_P 248
#define SGEMM_DEFAULT_Q 504
#define DGEMM_DEFAULT_Q 504
#define CGEMM_DEFAULT_Q 400
#define ZGEMM_DEFAULT_Q 400
#define SYMV_P 8
#endif
#if defined(POWER8)
#define SNUMOPT 16
#define DNUMOPT 8
#define GEMM_DEFAULT_OFFSET_A 0
#define GEMM_DEFAULT_OFFSET_B 65536
#define GEMM_DEFAULT_ALIGN 0x0ffffUL
#if defined(__32BIT__)
#warning using BINARY32==POWER6
#define SGEMM_DEFAULT_UNROLL_M 4
#define SGEMM_DEFAULT_UNROLL_N 4
#define DGEMM_DEFAULT_UNROLL_M 4
#define DGEMM_DEFAULT_UNROLL_N 4
#define CGEMM_DEFAULT_UNROLL_M 2
#define CGEMM_DEFAULT_UNROLL_N 4
#define ZGEMM_DEFAULT_UNROLL_M 2
#define ZGEMM_DEFAULT_UNROLL_N 4
#else
#define SGEMM_DEFAULT_UNROLL_M 16
#define SGEMM_DEFAULT_UNROLL_N 8
#define DGEMM_DEFAULT_UNROLL_M 16
#define DGEMM_DEFAULT_UNROLL_N 4
#define CGEMM_DEFAULT_UNROLL_M 8
#define CGEMM_DEFAULT_UNROLL_N 4
#define ZGEMM_DEFAULT_UNROLL_M 8
#define ZGEMM_DEFAULT_UNROLL_N 2
#endif
#define SGEMM_DEFAULT_P 1280UL
#define DGEMM_DEFAULT_P 640UL
#define CGEMM_DEFAULT_P 640UL
#define ZGEMM_DEFAULT_P 320UL
#define SGEMM_DEFAULT_Q 640UL
#define DGEMM_DEFAULT_Q 720UL
#define CGEMM_DEFAULT_Q 640UL
#define ZGEMM_DEFAULT_Q 640UL
#if 0
#define SGEMM_DEFAULT_R SGEMM_DEFAULT_P
#define DGEMM_DEFAULT_R DGEMM_DEFAULT_P
#define CGEMM_DEFAULT_R CGEMM_DEFAULT_P
#define ZGEMM_DEFAULT_R ZGEMM_DEFAULT_P
#endif
#define SGEMM_DEFAULT_R 4096
#define DGEMM_DEFAULT_R 4096
#define CGEMM_DEFAULT_R 4096
#define ZGEMM_DEFAULT_R 4096
#define SYMV_P 8
#endif
#if defined(POWER9)
#define SNUMOPT 16
#define DNUMOPT 8
#define GEMM_DEFAULT_OFFSET_A 0
#define GEMM_DEFAULT_OFFSET_B 65536
#define GEMM_DEFAULT_ALIGN 0x0ffffUL
#define SWITCH_RATIO 16
#define GEMM_PREFERED_SIZE 16
#define SGEMM_DEFAULT_UNROLL_M 16
#define SGEMM_DEFAULT_UNROLL_N 8
#define DGEMM_DEFAULT_UNROLL_M 16
#define DGEMM_DEFAULT_UNROLL_N 4
#define CGEMM_DEFAULT_UNROLL_M 8
#define CGEMM_DEFAULT_UNROLL_N 4
#define ZGEMM_DEFAULT_UNROLL_M 8
#define ZGEMM_DEFAULT_UNROLL_N 2
#define SGEMM_DEFAULT_P 832
#define DGEMM_DEFAULT_P 128
#define CGEMM_DEFAULT_P 512
#define ZGEMM_DEFAULT_P 256
#define SGEMM_DEFAULT_Q 1026
#define DGEMM_DEFAULT_Q 384
#define CGEMM_DEFAULT_Q 1026
#define ZGEMM_DEFAULT_Q 1026
#define SGEMM_DEFAULT_R 4096
#define DGEMM_DEFAULT_R 4096
#define CGEMM_DEFAULT_R 4096
#define ZGEMM_DEFAULT_R 4096
#define SYMV_P 8
#endif
#if defined(POWER10)
#define SNUMOPT 16
#define DNUMOPT 8
#define GEMM_DEFAULT_OFFSET_A 0
#define GEMM_DEFAULT_OFFSET_B 65536
#define GEMM_DEFAULT_ALIGN 0x0ffffUL
#define SWITCH_RATIO 16
#define GEMM_PREFERED_SIZE 16
#define SGEMM_DEFAULT_UNROLL_M 16
#define SGEMM_DEFAULT_UNROLL_N 8
#define DGEMM_DEFAULT_UNROLL_M 8
#define DGEMM_DEFAULT_UNROLL_N 8
#define CGEMM_DEFAULT_UNROLL_M 8
#define CGEMM_DEFAULT_UNROLL_N 4
#define ZGEMM_DEFAULT_UNROLL_M 8
#define ZGEMM_DEFAULT_UNROLL_N 2
#define SGEMM_DEFAULT_P 512
#define DGEMM_DEFAULT_P 384
#define CGEMM_DEFAULT_P 512
#define ZGEMM_DEFAULT_P 256
#define SGEMM_DEFAULT_Q 512
#define DGEMM_DEFAULT_Q 512
#define CGEMM_DEFAULT_Q 384
#define ZGEMM_DEFAULT_Q 384
#define SGEMM_DEFAULT_R 4096
#define DGEMM_DEFAULT_R 4096
#define CGEMM_DEFAULT_R 4096
#define ZGEMM_DEFAULT_R 4096
#define SYMV_P 8
#undef SBGEMM_DEFAULT_UNROLL_N
#undef SBGEMM_DEFAULT_UNROLL_M
#undef SBGEMM_DEFAULT_P
#undef SBGEMM_DEFAULT_R
#undef SBGEMM_DEFAULT_Q
#define SBGEMM_DEFAULT_UNROLL_M 16
#define SBGEMM_DEFAULT_UNROLL_N 8
#define SBGEMM_DEFAULT_P 832
#define SBGEMM_DEFAULT_Q 1026
#define SBGEMM_DEFAULT_R 4096
#endif
#if defined(SPARC) && defined(V7)
#define SNUMOPT 4
#define DNUMOPT 4
#define GEMM_DEFAULT_OFFSET_A 0
#define GEMM_DEFAULT_OFFSET_B 2048
#define GEMM_DEFAULT_ALIGN 0x03fffUL
#define SGEMM_DEFAULT_UNROLL_M 2
#define SGEMM_DEFAULT_UNROLL_N 8
#define DGEMM_DEFAULT_UNROLL_M 2
#define DGEMM_DEFAULT_UNROLL_N 8
#define CGEMM_DEFAULT_UNROLL_M 1
#define CGEMM_DEFAULT_UNROLL_N 4
#define ZGEMM_DEFAULT_UNROLL_M 1
#define ZGEMM_DEFAULT_UNROLL_N 4
#define SGEMM_DEFAULT_P 256
#define DGEMM_DEFAULT_P 256
#define CGEMM_DEFAULT_P 256
#define ZGEMM_DEFAULT_P 256
#define SGEMM_DEFAULT_Q 512
#define DGEMM_DEFAULT_Q 256
#define CGEMM_DEFAULT_Q 256
#define ZGEMM_DEFAULT_Q 128
#define SYMV_P 8
#define GEMM_THREAD gemm_thread_mn
#endif
#if (defined(SPARC) && defined(V9)) || defined(__sparc_v9__)
#define SNUMOPT 2
#define DNUMOPT 2
#define GEMM_DEFAULT_OFFSET_A 0
#define GEMM_DEFAULT_OFFSET_B 2048
#define GEMM_DEFAULT_ALIGN 0x03fffUL
#define SGEMM_DEFAULT_UNROLL_M 4
#define SGEMM_DEFAULT_UNROLL_N 4
#define DGEMM_DEFAULT_UNROLL_M 4
#define DGEMM_DEFAULT_UNROLL_N 4
#define CGEMM_DEFAULT_UNROLL_M 2
#define CGEMM_DEFAULT_UNROLL_N 2
#define ZGEMM_DEFAULT_UNROLL_M 2
#define ZGEMM_DEFAULT_UNROLL_N 2
#define SGEMM_DEFAULT_P 512
#define DGEMM_DEFAULT_P 512
#define CGEMM_DEFAULT_P 512
#define ZGEMM_DEFAULT_P 512
#define SGEMM_DEFAULT_Q 1024
#define DGEMM_DEFAULT_Q 512
#define CGEMM_DEFAULT_Q 512
#define ZGEMM_DEFAULT_Q 256
#define SYMV_P 8
#endif
#ifdef SICORTEX
#define SNUMOPT 2
#define DNUMOPT 2
#define GEMM_DEFAULT_OFFSET_A 0
#define GEMM_DEFAULT_OFFSET_B 0
#define GEMM_DEFAULT_ALIGN (BLASLONG)0x03fffUL
#define SGEMM_DEFAULT_UNROLL_M 2
#define SGEMM_DEFAULT_UNROLL_N 8
#define DGEMM_DEFAULT_UNROLL_M 2
#define DGEMM_DEFAULT_UNROLL_N 8
#define CGEMM_DEFAULT_UNROLL_M 1
#define CGEMM_DEFAULT_UNROLL_N 4
#define ZGEMM_DEFAULT_UNROLL_M 1
#define ZGEMM_DEFAULT_UNROLL_N 4
#define SGEMM_DEFAULT_P 108
#define DGEMM_DEFAULT_P 112
#define CGEMM_DEFAULT_P 108
#define ZGEMM_DEFAULT_P 112
#define SGEMM_DEFAULT_Q 288
#define DGEMM_DEFAULT_Q 144
#define CGEMM_DEFAULT_Q 144
#define ZGEMM_DEFAULT_Q 72
#define SGEMM_DEFAULT_R 2000
#define DGEMM_DEFAULT_R 2000
#define CGEMM_DEFAULT_R 2000
#define ZGEMM_DEFAULT_R 2000
#define SYMV_P 16
#endif
#if defined(LOONGSON3R4)
#define SNUMOPT 2
#define DNUMOPT 2
#define GEMM_DEFAULT_OFFSET_A 0
#define GEMM_DEFAULT_OFFSET_B 0
#define GEMM_DEFAULT_ALIGN (BLASLONG)0x03fffUL
#ifdef HAVE_MSA
#define SGEMM_DEFAULT_UNROLL_M 8
#define SGEMM_DEFAULT_UNROLL_N 8
#define DGEMM_DEFAULT_UNROLL_M 8
#define DGEMM_DEFAULT_UNROLL_N 4
#define CGEMM_DEFAULT_UNROLL_M 8
#define CGEMM_DEFAULT_UNROLL_N 4
#define ZGEMM_DEFAULT_UNROLL_M 4
#define ZGEMM_DEFAULT_UNROLL_N 4
#else
#define SGEMM_DEFAULT_UNROLL_M 8
#define SGEMM_DEFAULT_UNROLL_N 4
#define DGEMM_DEFAULT_UNROLL_M 4
#define DGEMM_DEFAULT_UNROLL_N 4
#define CGEMM_DEFAULT_UNROLL_M 4
#define CGEMM_DEFAULT_UNROLL_N 2
#define ZGEMM_DEFAULT_UNROLL_M 2
#define ZGEMM_DEFAULT_UNROLL_N 2
#endif
#define SGEMM_DEFAULT_P 64
#define DGEMM_DEFAULT_P 44
#define CGEMM_DEFAULT_P 64
#define ZGEMM_DEFAULT_P 32
#define SGEMM_DEFAULT_Q 192
#define DGEMM_DEFAULT_Q 92
#define CGEMM_DEFAULT_Q 128
#define ZGEMM_DEFAULT_Q 80
#define SGEMM_DEFAULT_R 640
#define DGEMM_DEFAULT_R dgemm_r
#define CGEMM_DEFAULT_R 640
#define ZGEMM_DEFAULT_R 640
#define GEMM_OFFSET_A1 0x10000
#define GEMM_OFFSET_B1 0x100000
#define SYMV_P 16
#endif
#if defined(LOONGSON3R3)
////Copy from SICORTEX
#define SNUMOPT 2
#define DNUMOPT 2
#define GEMM_DEFAULT_OFFSET_A 0
#define GEMM_DEFAULT_OFFSET_B 0
#define GEMM_DEFAULT_ALIGN (BLASLONG)0x03fffUL
#define SGEMM_DEFAULT_UNROLL_M 8
#define SGEMM_DEFAULT_UNROLL_N 4
#define DGEMM_DEFAULT_UNROLL_M 4
#define DGEMM_DEFAULT_UNROLL_N 4
#define CGEMM_DEFAULT_UNROLL_M 4
#define CGEMM_DEFAULT_UNROLL_N 2
#define ZGEMM_DEFAULT_UNROLL_M 2
#define ZGEMM_DEFAULT_UNROLL_N 2
#define SGEMM_DEFAULT_P 64
#define DGEMM_DEFAULT_P 44
#define CGEMM_DEFAULT_P 64
#define ZGEMM_DEFAULT_P 32
#define SGEMM_DEFAULT_Q 192
#define DGEMM_DEFAULT_Q 92
#define CGEMM_DEFAULT_Q 128
#define ZGEMM_DEFAULT_Q 80
#define SGEMM_DEFAULT_R 640
#define DGEMM_DEFAULT_R dgemm_r
#define CGEMM_DEFAULT_R 640
#define ZGEMM_DEFAULT_R 640
#define GEMM_OFFSET_A1 0x10000
#define GEMM_OFFSET_B1 0x100000
#define SYMV_P 16
#endif
#if defined (LOONGSON3R5)
#define SNUMOPT 2
#define DNUMOPT 2
#define GEMM_DEFAULT_OFFSET_A 0
#define GEMM_DEFAULT_OFFSET_B 0
#define GEMM_DEFAULT_ALIGN 0x0ffffUL
#if defined(NO_LASX)
#define DGEMM_DEFAULT_UNROLL_N 8
#define DGEMM_DEFAULT_UNROLL_M 2
#define SGEMM_DEFAULT_UNROLL_N 8
#define SGEMM_DEFAULT_UNROLL_M 2
#else
#define DGEMM_DEFAULT_UNROLL_N 4
#define DGEMM_DEFAULT_UNROLL_M 16
#define SGEMM_DEFAULT_UNROLL_N 8
#define SGEMM_DEFAULT_UNROLL_M 16
#endif
#define QGEMM_DEFAULT_UNROLL_N 2
#define CGEMM_DEFAULT_UNROLL_N 4
#define ZGEMM_DEFAULT_UNROLL_N 4
#define XGEMM_DEFAULT_UNROLL_N 1
#define QGEMM_DEFAULT_UNROLL_M 2
#define CGEMM_DEFAULT_UNROLL_M 1
#define ZGEMM_DEFAULT_UNROLL_M 1
#define XGEMM_DEFAULT_UNROLL_M 1
#define SGEMM_DEFAULT_P 256
#define DGEMM_DEFAULT_P 32
#define CGEMM_DEFAULT_P 128
#define ZGEMM_DEFAULT_P 128
#define SGEMM_DEFAULT_R 1024
#define DGEMM_DEFAULT_R 858
#define CGEMM_DEFAULT_R 4096
#define ZGEMM_DEFAULT_R 4096
#define SGEMM_DEFAULT_Q 256
#define DGEMM_DEFAULT_Q 152
#define CGEMM_DEFAULT_Q 128
#define ZGEMM_DEFAULT_Q 128
#define SYMV_P 16
#endif
#ifdef LOONGSON2K1000
#define GEMM_DEFAULT_OFFSET_A 0
#define GEMM_DEFAULT_OFFSET_B 0
#define GEMM_DEFAULT_ALIGN (BLASLONG)0x03fffUL
#define SGEMM_DEFAULT_UNROLL_M 2
#define SGEMM_DEFAULT_UNROLL_N 8
#define DGEMM_DEFAULT_UNROLL_M 2
#define DGEMM_DEFAULT_UNROLL_N 8
#define CGEMM_DEFAULT_UNROLL_M 1
#define CGEMM_DEFAULT_UNROLL_N 4
#define ZGEMM_DEFAULT_UNROLL_M 1
#define ZGEMM_DEFAULT_UNROLL_N 4
#define SGEMM_DEFAULT_P 128
#define DGEMM_DEFAULT_P 128
#define CGEMM_DEFAULT_P 96
#define ZGEMM_DEFAULT_P 64
#define SGEMM_DEFAULT_Q 240
#define DGEMM_DEFAULT_Q 120
#define CGEMM_DEFAULT_Q 120
#define ZGEMM_DEFAULT_Q 120
#define SGEMM_DEFAULT_R 12288
#define DGEMM_DEFAULT_R 8192
#define CGEMM_DEFAULT_R 4096
#define ZGEMM_DEFAULT_R 4096
#define SYMV_P 16
#endif
#ifdef LOONGSONGENERIC
#define GEMM_DEFAULT_OFFSET_A 0
#define GEMM_DEFAULT_OFFSET_B 0
#define GEMM_DEFAULT_ALIGN (BLASLONG)0x03fffUL
#define SGEMM_DEFAULT_UNROLL_M 2
#define SGEMM_DEFAULT_UNROLL_N 8
#define DGEMM_DEFAULT_UNROLL_M 2
#define DGEMM_DEFAULT_UNROLL_N 8
#define CGEMM_DEFAULT_UNROLL_M 1
#define CGEMM_DEFAULT_UNROLL_N 4
#define ZGEMM_DEFAULT_UNROLL_M 1
#define ZGEMM_DEFAULT_UNROLL_N 4
#define SGEMM_DEFAULT_P 128
#define DGEMM_DEFAULT_P 128
#define CGEMM_DEFAULT_P 96
#define ZGEMM_DEFAULT_P 64
#define SGEMM_DEFAULT_Q 240
#define DGEMM_DEFAULT_Q 120
#define CGEMM_DEFAULT_Q 120
#define ZGEMM_DEFAULT_Q 120
#define SGEMM_DEFAULT_R 12288
#define DGEMM_DEFAULT_R 8192
#define CGEMM_DEFAULT_R 4096
#define ZGEMM_DEFAULT_R 4096
#define SYMV_P 16
#endif
#if defined(MIPS64_GENERIC) || defined(P5600) || defined(MIPS1004K) || defined(MIPS24K) || defined(I6400) || defined(P6600) || defined(I6500)
#define SNUMOPT 2
#define DNUMOPT 2
#define GEMM_DEFAULT_OFFSET_A 0
#define GEMM_DEFAULT_OFFSET_B 0
#define GEMM_DEFAULT_ALIGN (BLASLONG) 0x03fffUL
#if defined(HAVE_MSA)
#define SGEMM_DEFAULT_UNROLL_M 8
#define SGEMM_DEFAULT_UNROLL_N 8
#define DGEMM_DEFAULT_UNROLL_M 8
#define DGEMM_DEFAULT_UNROLL_N 4
#define CGEMM_DEFAULT_UNROLL_M 8
#define CGEMM_DEFAULT_UNROLL_N 4
#define ZGEMM_DEFAULT_UNROLL_M 4
#define ZGEMM_DEFAULT_UNROLL_N 4
#else
#define SGEMM_DEFAULT_UNROLL_M 2
#define SGEMM_DEFAULT_UNROLL_N 2
#define DGEMM_DEFAULT_UNROLL_M 2
#define DGEMM_DEFAULT_UNROLL_N 2
#define CGEMM_DEFAULT_UNROLL_M 2
#define CGEMM_DEFAULT_UNROLL_N 2
#define ZGEMM_DEFAULT_UNROLL_M 2
#define ZGEMM_DEFAULT_UNROLL_N 2
#endif
#define SGEMM_DEFAULT_P 128
#define DGEMM_DEFAULT_P 128
#define CGEMM_DEFAULT_P 96
#define ZGEMM_DEFAULT_P 64
#define SGEMM_DEFAULT_Q 240
#define DGEMM_DEFAULT_Q 120
#define CGEMM_DEFAULT_Q 120
#define ZGEMM_DEFAULT_Q 120
#define SGEMM_DEFAULT_R 12288
#define DGEMM_DEFAULT_R 8192
#define CGEMM_DEFAULT_R 4096
#define ZGEMM_DEFAULT_R 4096
#define SYMV_P 16
#endif
#ifdef RISCV64_GENERIC
#define GEMM_DEFAULT_OFFSET_A 0
#define GEMM_DEFAULT_OFFSET_B 0
#define GEMM_DEFAULT_ALIGN (BLASLONG)0x03fffUL
#define SGEMM_DEFAULT_UNROLL_M 2
#define SGEMM_DEFAULT_UNROLL_N 2
#define DGEMM_DEFAULT_UNROLL_M 2
#define DGEMM_DEFAULT_UNROLL_N 2
#define CGEMM_DEFAULT_UNROLL_M 2
#define CGEMM_DEFAULT_UNROLL_N 2
#define ZGEMM_DEFAULT_UNROLL_M 2
#define ZGEMM_DEFAULT_UNROLL_N 2
#define SGEMM_DEFAULT_P 128
#define DGEMM_DEFAULT_P 128
#define CGEMM_DEFAULT_P 96
#define ZGEMM_DEFAULT_P 64
#define SGEMM_DEFAULT_Q 240
#define DGEMM_DEFAULT_Q 120
#define CGEMM_DEFAULT_Q 120
#define ZGEMM_DEFAULT_Q 120
#define SGEMM_DEFAULT_R 12288
#define DGEMM_DEFAULT_R 8192
#define CGEMM_DEFAULT_R 4096
#define ZGEMM_DEFAULT_R 4096
#define SYMV_P 16
#define GEMM_DEFAULT_OFFSET_A 0
#define GEMM_DEFAULT_OFFSET_B 0
#endif
#ifdef C910V
#define GEMM_DEFAULT_OFFSET_A 0
#define GEMM_DEFAULT_OFFSET_B 0
#define GEMM_DEFAULT_ALIGN 0x03fffUL
#define SGEMM_DEFAULT_UNROLL_M 16
#define SGEMM_DEFAULT_UNROLL_N 4
#define DGEMM_DEFAULT_UNROLL_M 8
#define DGEMM_DEFAULT_UNROLL_N 4
#define CGEMM_DEFAULT_UNROLL_M 2
#define CGEMM_DEFAULT_UNROLL_N 2
#define ZGEMM_DEFAULT_UNROLL_M 2
#define ZGEMM_DEFAULT_UNROLL_N 2
#define SGEMM_DEFAULT_P 160
#define DGEMM_DEFAULT_P 160
#define CGEMM_DEFAULT_P 96
#define ZGEMM_DEFAULT_P 64
#define SGEMM_DEFAULT_Q 240
#define DGEMM_DEFAULT_Q 128
#define CGEMM_DEFAULT_Q 120
#define ZGEMM_DEFAULT_Q 120
#define SGEMM_DEFAULT_R 12288
#define DGEMM_DEFAULT_R 8192
#define CGEMM_DEFAULT_R 4096
#define ZGEMM_DEFAULT_R 4096
#define SYMV_P 16
#define GEMM_DEFAULT_OFFSET_A 0
#define GEMM_DEFAULT_OFFSET_B 0
#endif
#ifdef ARMV7
#define SNUMOPT 2
#define DNUMOPT 2
#define GEMM_DEFAULT_OFFSET_A 0
#define GEMM_DEFAULT_OFFSET_B 0
#define GEMM_DEFAULT_ALIGN (BLASLONG)0x03fffUL
#define SGEMM_DEFAULT_UNROLL_M 4
#define SGEMM_DEFAULT_UNROLL_N 4
#define DGEMM_DEFAULT_UNROLL_M 4
#define DGEMM_DEFAULT_UNROLL_N 4
#define CGEMM_DEFAULT_UNROLL_M 2
#define CGEMM_DEFAULT_UNROLL_N 2
#define ZGEMM_DEFAULT_UNROLL_M 2
#define ZGEMM_DEFAULT_UNROLL_N 2
#define SGEMM_DEFAULT_P 128
#define DGEMM_DEFAULT_P 128
#define CGEMM_DEFAULT_P 96
#define ZGEMM_DEFAULT_P 64
#define SGEMM_DEFAULT_Q 240
#define DGEMM_DEFAULT_Q 120
#define CGEMM_DEFAULT_Q 120
#define ZGEMM_DEFAULT_Q 120
#define SGEMM_DEFAULT_R 12288
#define DGEMM_DEFAULT_R 8192
#define CGEMM_DEFAULT_R 4096
#define ZGEMM_DEFAULT_R 4096
#define SYMV_P 16
#endif
#if defined(ARMV6)
#define SNUMOPT 2
#define DNUMOPT 2
#define GEMM_DEFAULT_OFFSET_A 0
#define GEMM_DEFAULT_OFFSET_B 0
#define GEMM_DEFAULT_ALIGN (BLASLONG)0x03fffUL
#define SGEMM_DEFAULT_UNROLL_M 4
#define SGEMM_DEFAULT_UNROLL_N 2
#define DGEMM_DEFAULT_UNROLL_M 4
#define DGEMM_DEFAULT_UNROLL_N 2
#define CGEMM_DEFAULT_UNROLL_M 2
#define CGEMM_DEFAULT_UNROLL_N 2
#define ZGEMM_DEFAULT_UNROLL_M 2
#define ZGEMM_DEFAULT_UNROLL_N 2
#define SGEMM_DEFAULT_P 128
#define DGEMM_DEFAULT_P 128
#define CGEMM_DEFAULT_P 96
#define ZGEMM_DEFAULT_P 64
#define SGEMM_DEFAULT_Q 240
#define DGEMM_DEFAULT_Q 120
#define CGEMM_DEFAULT_Q 120
#define ZGEMM_DEFAULT_Q 120
#define SGEMM_DEFAULT_R 12288
#define DGEMM_DEFAULT_R 8192
#define CGEMM_DEFAULT_R 4096
#define ZGEMM_DEFAULT_R 4096
#define SYMV_P 16
#endif
/* Common ARMv8 parameters */
#if defined(ARMV8)
#define SNUMOPT 2
#define DNUMOPT 2
#define GEMM_DEFAULT_OFFSET_A 0
#define GEMM_DEFAULT_OFFSET_B 0
#ifdef _WIN64
/* Use an explicit cast on Win64, which uses the LLP64 data model (long is only 32 bits) */
#define GEMM_DEFAULT_ALIGN (BLASULONG)0x03fffUL
#else
#define GEMM_DEFAULT_ALIGN 0x03fffUL
#endif
#define SYMV_P 16
#if defined(CORTEXA57) || defined(CORTEXX1) || \
defined(CORTEXA72) || defined(CORTEXA73) || \
defined(FALKOR) || defined(TSV110) || defined(EMAG8180) || defined(VORTEX) || defined(FT2000)
#define SGEMM_DEFAULT_UNROLL_M 16
#define SGEMM_DEFAULT_UNROLL_N 4
#define DGEMM_DEFAULT_UNROLL_M 8
#define DGEMM_DEFAULT_UNROLL_N 4
#define CGEMM_DEFAULT_UNROLL_M 8
#define CGEMM_DEFAULT_UNROLL_N 4
#define ZGEMM_DEFAULT_UNROLL_M 4
#define ZGEMM_DEFAULT_UNROLL_N 4
/* FIXME: this should be using the cache size, but there is currently no easy way to
query that on ARM. So if getarch counted more than 8 cores we simply assume the host
is a big desktop or server with abundant cache rather than a phone or embedded device */
#if NUM_CORES > 8 || defined(TSV110) || defined(EMAG8180) || defined(VORTEX)|| defined(CORTEXX1)
#define SGEMM_DEFAULT_P 512
#define DGEMM_DEFAULT_P 256
#define CGEMM_DEFAULT_P 256
#define ZGEMM_DEFAULT_P 128
#define SGEMM_DEFAULT_Q 1024
#define DGEMM_DEFAULT_Q 512
#define CGEMM_DEFAULT_Q 512
#define ZGEMM_DEFAULT_Q 512
#else
#define SGEMM_DEFAULT_P 128
#define DGEMM_DEFAULT_P 160
#define CGEMM_DEFAULT_P 128
#define ZGEMM_DEFAULT_P 128
#define SGEMM_DEFAULT_Q 352
#define DGEMM_DEFAULT_Q 128
#define CGEMM_DEFAULT_Q 224
#define ZGEMM_DEFAULT_Q 112
#endif
#define SGEMM_DEFAULT_R 4096
#define DGEMM_DEFAULT_R 4096
#define CGEMM_DEFAULT_R 4096
#define ZGEMM_DEFAULT_R 2048
#elif defined(CORTEXA53) || defined(CORTEXA55)
#define SGEMM_DEFAULT_UNROLL_M 8
#define SGEMM_DEFAULT_UNROLL_N 8
#define DGEMM_DEFAULT_UNROLL_M 4
#define DGEMM_DEFAULT_UNROLL_N 4
#define CGEMM_DEFAULT_UNROLL_M 8
#define CGEMM_DEFAULT_UNROLL_N 4
#define ZGEMM_DEFAULT_UNROLL_M 4
#define ZGEMM_DEFAULT_UNROLL_N 4
#define SGEMM_DEFAULT_P 256
#define DGEMM_DEFAULT_P 160
#define CGEMM_DEFAULT_P 128
#define ZGEMM_DEFAULT_P 128
#define SGEMM_DEFAULT_Q 256
#define DGEMM_DEFAULT_Q 128
#define CGEMM_DEFAULT_Q 224
#define ZGEMM_DEFAULT_Q 112
#define SGEMM_DEFAULT_R 4096
#define DGEMM_DEFAULT_R 4096
#define CGEMM_DEFAULT_R 4096
#define ZGEMM_DEFAULT_R 2048
#elif defined(THUNDERX)
#define SGEMM_DEFAULT_UNROLL_M 4
#define SGEMM_DEFAULT_UNROLL_N 4
#define DGEMM_DEFAULT_UNROLL_M 2
#define DGEMM_DEFAULT_UNROLL_N 2
#define CGEMM_DEFAULT_UNROLL_M 2
#define CGEMM_DEFAULT_UNROLL_N 2
#define ZGEMM_DEFAULT_UNROLL_M 2
#define ZGEMM_DEFAULT_UNROLL_N 2
#define SGEMM_DEFAULT_P 128
#define DGEMM_DEFAULT_P 128
#define CGEMM_DEFAULT_P 96
#define ZGEMM_DEFAULT_P 64
#define SGEMM_DEFAULT_Q 240
#define DGEMM_DEFAULT_Q 120
#define CGEMM_DEFAULT_Q 120
#define ZGEMM_DEFAULT_Q 120
#define SGEMM_DEFAULT_R 12288
#define DGEMM_DEFAULT_R 8192
#define CGEMM_DEFAULT_R 4096
#define ZGEMM_DEFAULT_R 4096
#elif defined(THUNDERX2T99)
#define SGEMM_DEFAULT_UNROLL_M 16
#define SGEMM_DEFAULT_UNROLL_N 4
#define DGEMM_DEFAULT_UNROLL_M 8
#define DGEMM_DEFAULT_UNROLL_N 4
#define CGEMM_DEFAULT_UNROLL_M 8
#define CGEMM_DEFAULT_UNROLL_N 4
#define ZGEMM_DEFAULT_UNROLL_M 4
#define ZGEMM_DEFAULT_UNROLL_N 4
#define SGEMM_DEFAULT_P 128
#define DGEMM_DEFAULT_P 160
#define CGEMM_DEFAULT_P 128
#define ZGEMM_DEFAULT_P 128
#define SGEMM_DEFAULT_Q 352
#define DGEMM_DEFAULT_Q 128
#define CGEMM_DEFAULT_Q 224
#define ZGEMM_DEFAULT_Q 112
#define SGEMM_DEFAULT_R 4096
#define DGEMM_DEFAULT_R 4096
#define CGEMM_DEFAULT_R 4096
#define ZGEMM_DEFAULT_R 4096
#elif defined(THUNDERX3T110)
#define SGEMM_DEFAULT_UNROLL_M 16
#define SGEMM_DEFAULT_UNROLL_N 4
#define DGEMM_DEFAULT_UNROLL_M 8
#define DGEMM_DEFAULT_UNROLL_N 4
#define CGEMM_DEFAULT_UNROLL_M 8
#define CGEMM_DEFAULT_UNROLL_N 4
#define ZGEMM_DEFAULT_UNROLL_M 4
#define ZGEMM_DEFAULT_UNROLL_N 4
#define SGEMM_DEFAULT_P 128
#define DGEMM_DEFAULT_P 320
#define CGEMM_DEFAULT_P 128
#define ZGEMM_DEFAULT_P 128
#define SGEMM_DEFAULT_Q 352
#define DGEMM_DEFAULT_Q 128
#define CGEMM_DEFAULT_Q 224
#define ZGEMM_DEFAULT_Q 112
#define SGEMM_DEFAULT_R 4096
#define DGEMM_DEFAULT_R 4096
#define CGEMM_DEFAULT_R 4096
#define ZGEMM_DEFAULT_R 4096
#elif defined(NEOVERSEN1)
#if defined(XDOUBLE) || defined(DOUBLE)
#define SWITCH_RATIO 8
#else
#define SWITCH_RATIO 16
#endif
#define SGEMM_DEFAULT_UNROLL_M 16
#define SGEMM_DEFAULT_UNROLL_N 4
#define DGEMM_DEFAULT_UNROLL_M 8
#define DGEMM_DEFAULT_UNROLL_N 4
#define CGEMM_DEFAULT_UNROLL_M 8
#define CGEMM_DEFAULT_UNROLL_N 4
#define ZGEMM_DEFAULT_UNROLL_M 4
#define ZGEMM_DEFAULT_UNROLL_N 4
#define SGEMM_DEFAULT_P 128
#define DGEMM_DEFAULT_P 160
#define CGEMM_DEFAULT_P 128
#define ZGEMM_DEFAULT_P 128
#define SGEMM_DEFAULT_Q 352
#define DGEMM_DEFAULT_Q 128
#define CGEMM_DEFAULT_Q 224
#define ZGEMM_DEFAULT_Q 112
#define SGEMM_DEFAULT_R 4096
#define DGEMM_DEFAULT_R 4096
#define CGEMM_DEFAULT_R 4096
#define ZGEMM_DEFAULT_R 4096
#elif defined(NEOVERSEV1) // 256-bit SVE
#if defined(XDOUBLE) || defined(DOUBLE)
#define SWITCH_RATIO 8
#else
#define SWITCH_RATIO 16
#endif
#define SGEMM_DEFAULT_UNROLL_M 16
#define SGEMM_DEFAULT_UNROLL_N 8
#define DGEMM_DEFAULT_UNROLL_M 4 // Actually 2VL (8) but kept separate to keep copies separate
#define DGEMM_DEFAULT_UNROLL_N 8
#define CGEMM_DEFAULT_UNROLL_M 2
#define CGEMM_DEFAULT_UNROLL_N 4
#define CGEMM_DEFAULT_UNROLL_MN 16
#define ZGEMM_DEFAULT_UNROLL_M 2
#define ZGEMM_DEFAULT_UNROLL_N 4
#define ZGEMM_DEFAULT_UNROLL_MN 16
#define SGEMM_DEFAULT_P 240
#define DGEMM_DEFAULT_P 240
#define CGEMM_DEFAULT_P 128
#define ZGEMM_DEFAULT_P 128
#define SGEMM_DEFAULT_Q 640
#define DGEMM_DEFAULT_Q 320
#define CGEMM_DEFAULT_Q 224
#define ZGEMM_DEFAULT_Q 112
#define SGEMM_DEFAULT_R 4096
#define DGEMM_DEFAULT_R 4096
#define CGEMM_DEFAULT_R 4096
#define ZGEMM_DEFAULT_R 4096
#elif defined(NEOVERSEN2)
#if defined(XDOUBLE) || defined(DOUBLE)
#define SWITCH_RATIO 8
#else
#define SWITCH_RATIO 16
#endif
#undef SBGEMM_ALIGN_K
#define SBGEMM_ALIGN_K 4
#undef SBGEMM_DEFAULT_UNROLL_M
#undef SBGEMM_DEFAULT_UNROLL_N
#define SBGEMM_DEFAULT_UNROLL_M 8
#define SBGEMM_DEFAULT_UNROLL_N 4
#define SGEMM_DEFAULT_UNROLL_M 16
#define SGEMM_DEFAULT_UNROLL_N 4
#define DGEMM_DEFAULT_UNROLL_M 8
#define DGEMM_DEFAULT_UNROLL_N 4
#define CGEMM_DEFAULT_UNROLL_M 8
#define CGEMM_DEFAULT_UNROLL_N 4
#define ZGEMM_DEFAULT_UNROLL_M 4
#define ZGEMM_DEFAULT_UNROLL_N 4
#define SGEMM_DEFAULT_P 128
#define DGEMM_DEFAULT_P 160
#define CGEMM_DEFAULT_P 128
#define ZGEMM_DEFAULT_P 128
#define SGEMM_DEFAULT_Q 352
#define DGEMM_DEFAULT_Q 128
#define CGEMM_DEFAULT_Q 224
#define ZGEMM_DEFAULT_Q 112
#define SGEMM_DEFAULT_R 4096
#define DGEMM_DEFAULT_R 4096
#define CGEMM_DEFAULT_R 4096
#define ZGEMM_DEFAULT_R 4096
#elif defined(A64FX) // 512-bit SVE
/* When all BLAS3 routines are implemented with SVE, SGEMM_DEFAULT_UNROLL_M should be "sve_vl".
Until then, just keep it different from SGEMM_DEFAULT_UNROLL_N to keep the copy routines in both directions separate. */
#define SGEMM_DEFAULT_UNROLL_M 4
#define SGEMM_DEFAULT_UNROLL_N 8
/* SGEMM_UNROLL_MN is calculated as max(SGEMM_UNROLL_M, SGEMM_UNROLL_N).
* Since we don't define SGEMM_UNROLL_M correctly, we have to set this macro manually.
* If the SVE vector length ever exceeds 1024 bits, this should be increased as well. */
#define SGEMM_DEFAULT_UNROLL_MN 32
/* When all BLAS3 routines are implemented with SVE, DGEMM_DEFAULT_UNROLL_M should be "sve_vl".
Until then, just keep it different from DGEMM_DEFAULT_UNROLL_N to keep the copy routines in both directions separate. */
#define DGEMM_DEFAULT_UNROLL_M 2
#define DGEMM_DEFAULT_UNROLL_N 8
#define DGEMM_DEFAULT_UNROLL_MN 32
#define CGEMM_DEFAULT_UNROLL_M 2
#define CGEMM_DEFAULT_UNROLL_N 4
#define CGEMM_DEFAULT_UNROLL_MN 16
#define ZGEMM_DEFAULT_UNROLL_M 2
#define ZGEMM_DEFAULT_UNROLL_N 4
#define ZGEMM_DEFAULT_UNROLL_MN 16
#define SGEMM_DEFAULT_P 128
#define DGEMM_DEFAULT_P 160
#define CGEMM_DEFAULT_P 128
#define ZGEMM_DEFAULT_P 128
#define SGEMM_DEFAULT_Q 352
#define DGEMM_DEFAULT_Q 128
#define CGEMM_DEFAULT_Q 224
#define ZGEMM_DEFAULT_Q 112
#define SGEMM_DEFAULT_R 4096
#define DGEMM_DEFAULT_R 4096
#define CGEMM_DEFAULT_R 4096
#define ZGEMM_DEFAULT_R 4096
#elif defined(ARMV8SVE) || defined(ARMV9) || defined(CORTEXA510)|| defined(CORTEXA710) || defined(CORTEXX2) // 128-bit SVE
#if defined(XDOUBLE) || defined(DOUBLE)
#define SWITCH_RATIO 8
#else
#define SWITCH_RATIO 16
#endif
#define SGEMM_DEFAULT_UNROLL_M 4 // Actually 1VL (8) but kept separate to keep copies separate
#define SGEMM_DEFAULT_UNROLL_N 8
#define DGEMM_DEFAULT_UNROLL_M 4
#define DGEMM_DEFAULT_UNROLL_N 8
#define CGEMM_DEFAULT_UNROLL_M 2
#define CGEMM_DEFAULT_UNROLL_N 4
#define CGEMM_DEFAULT_UNROLL_MN 16
#define ZGEMM_DEFAULT_UNROLL_M 2
#define ZGEMM_DEFAULT_UNROLL_N 4
#define ZGEMM_DEFAULT_UNROLL_MN 16
#define SGEMM_DEFAULT_P 128
#define DGEMM_DEFAULT_P 160
#define CGEMM_DEFAULT_P 128
#define ZGEMM_DEFAULT_P 128
#define SGEMM_DEFAULT_Q 352
#define DGEMM_DEFAULT_Q 128
#define CGEMM_DEFAULT_Q 224
#define ZGEMM_DEFAULT_Q 112
#define SGEMM_DEFAULT_R 4096
#define DGEMM_DEFAULT_R 4096
#define CGEMM_DEFAULT_R 4096
#define ZGEMM_DEFAULT_R 4096
#else /* Other/undetected ARMv8 cores */
#define SGEMM_DEFAULT_UNROLL_M 16
#define SGEMM_DEFAULT_UNROLL_N 4
#define DGEMM_DEFAULT_UNROLL_M 8
#define DGEMM_DEFAULT_UNROLL_N 4
#define CGEMM_DEFAULT_UNROLL_M 8
#define CGEMM_DEFAULT_UNROLL_N 4
#define ZGEMM_DEFAULT_UNROLL_M 4
#define ZGEMM_DEFAULT_UNROLL_N 4
#define SGEMM_DEFAULT_P 128
#define DGEMM_DEFAULT_P 160
#define CGEMM_DEFAULT_P 128
#define ZGEMM_DEFAULT_P 128
#define SGEMM_DEFAULT_Q 352
#define DGEMM_DEFAULT_Q 128
#define CGEMM_DEFAULT_Q 224
#define ZGEMM_DEFAULT_Q 112
#define SGEMM_DEFAULT_R 4096
#define DGEMM_DEFAULT_R 4096
#define CGEMM_DEFAULT_R 4096
#define ZGEMM_DEFAULT_R 4096
#endif /* Cores */
#endif /* ARMv8 */
#if defined(ARMV5)
#define SNUMOPT 2
#define DNUMOPT 2
#define GEMM_DEFAULT_OFFSET_A 0
#define GEMM_DEFAULT_OFFSET_B 0
#define GEMM_DEFAULT_ALIGN (BLASLONG)0x03fffUL
#define SGEMM_DEFAULT_UNROLL_M 2
#define SGEMM_DEFAULT_UNROLL_N 2
#define DGEMM_DEFAULT_UNROLL_M 2
#define DGEMM_DEFAULT_UNROLL_N 2
#define CGEMM_DEFAULT_UNROLL_M 2
#define CGEMM_DEFAULT_UNROLL_N 2
#define ZGEMM_DEFAULT_UNROLL_M 2
#define ZGEMM_DEFAULT_UNROLL_N 2
#define SGEMM_DEFAULT_P 128
#define DGEMM_DEFAULT_P 128
#define CGEMM_DEFAULT_P 96
#define ZGEMM_DEFAULT_P 64
#define SGEMM_DEFAULT_Q 240
#define DGEMM_DEFAULT_Q 120
#define CGEMM_DEFAULT_Q 120
#define ZGEMM_DEFAULT_Q 120
#define SGEMM_DEFAULT_R 12288
#define DGEMM_DEFAULT_R 8192
#define CGEMM_DEFAULT_R 4096
#define ZGEMM_DEFAULT_R 4096
#define SYMV_P 16
#endif
#ifdef CORTEXA9
#define SNUMOPT 2
#define DNUMOPT 2
#define GEMM_DEFAULT_OFFSET_A 0
#define GEMM_DEFAULT_OFFSET_B 0
#define GEMM_DEFAULT_ALIGN (BLASLONG)0x03fffUL
#define SGEMM_DEFAULT_UNROLL_M 4
#define SGEMM_DEFAULT_UNROLL_N 4
#define DGEMM_DEFAULT_UNROLL_M 4
#define DGEMM_DEFAULT_UNROLL_N 4
#define CGEMM_DEFAULT_UNROLL_M 2
#define CGEMM_DEFAULT_UNROLL_N 2
#define ZGEMM_DEFAULT_UNROLL_M 2
#define ZGEMM_DEFAULT_UNROLL_N 2
#define SGEMM_DEFAULT_P 128
#define DGEMM_DEFAULT_P 128
#define CGEMM_DEFAULT_P 96
#define ZGEMM_DEFAULT_P 64
#define SGEMM_DEFAULT_Q 240
#define DGEMM_DEFAULT_Q 120
#define CGEMM_DEFAULT_Q 120
#define ZGEMM_DEFAULT_Q 120
#define SGEMM_DEFAULT_R 12288
#define DGEMM_DEFAULT_R 8192
#define CGEMM_DEFAULT_R 4096
#define ZGEMM_DEFAULT_R 4096
#define SYMV_P 16
#endif
#ifdef CORTEXA15
#define SNUMOPT 2
#define DNUMOPT 2
#define GEMM_DEFAULT_OFFSET_A 0
#define GEMM_DEFAULT_OFFSET_B 0
#define GEMM_DEFAULT_ALIGN (BLASLONG)0x03fffUL
#define SGEMM_DEFAULT_UNROLL_M 4
#define SGEMM_DEFAULT_UNROLL_N 4
#define DGEMM_DEFAULT_UNROLL_M 4
#define DGEMM_DEFAULT_UNROLL_N 4
#define CGEMM_DEFAULT_UNROLL_M 2
#define CGEMM_DEFAULT_UNROLL_N 2
#define ZGEMM_DEFAULT_UNROLL_M 2
#define ZGEMM_DEFAULT_UNROLL_N 2
#define SGEMM_DEFAULT_P 128
#define DGEMM_DEFAULT_P 128
#define CGEMM_DEFAULT_P 96
#define ZGEMM_DEFAULT_P 64
#define SGEMM_DEFAULT_Q 240
#define DGEMM_DEFAULT_Q 120
#define CGEMM_DEFAULT_Q 120
#define ZGEMM_DEFAULT_Q 120
#define SGEMM_DEFAULT_R 12288
#define DGEMM_DEFAULT_R 8192
#define CGEMM_DEFAULT_R 4096
#define ZGEMM_DEFAULT_R 4096
#define SYMV_P 16
#endif
#if defined(ZARCH_GENERIC)
#define SNUMOPT 2
#define DNUMOPT 2
#define GEMM_DEFAULT_OFFSET_A 0
#define GEMM_DEFAULT_OFFSET_B 0
#define GEMM_DEFAULT_ALIGN (BLASLONG)0x03fffUL
#define SGEMM_DEFAULT_UNROLL_M 2
#define SGEMM_DEFAULT_UNROLL_N 2
#define DGEMM_DEFAULT_UNROLL_M 2
#define DGEMM_DEFAULT_UNROLL_N 2
#define CGEMM_DEFAULT_UNROLL_M 2
#define CGEMM_DEFAULT_UNROLL_N 2
#define ZGEMM_DEFAULT_UNROLL_M 2
#define ZGEMM_DEFAULT_UNROLL_N 2
#define SGEMM_DEFAULT_P 128
#define DGEMM_DEFAULT_P 128
#define CGEMM_DEFAULT_P 96
#define ZGEMM_DEFAULT_P 64
#define SGEMM_DEFAULT_Q 240
#define DGEMM_DEFAULT_Q 120
#define CGEMM_DEFAULT_Q 120
#define ZGEMM_DEFAULT_Q 120
#define SGEMM_DEFAULT_R 12288
#define DGEMM_DEFAULT_R 8192
#define CGEMM_DEFAULT_R 4096
#define ZGEMM_DEFAULT_R 4096
#define SYMV_P 16
#endif
#if defined(Z13)
#define SNUMOPT 2
#define DNUMOPT 2
#define GEMM_DEFAULT_OFFSET_A 0
#define GEMM_DEFAULT_OFFSET_B 0
#define GEMM_DEFAULT_ALIGN (BLASLONG)0x03fffUL
#define SGEMM_DEFAULT_UNROLL_M 8
#define SGEMM_DEFAULT_UNROLL_N 4
#define DGEMM_DEFAULT_UNROLL_M 8
#define DGEMM_DEFAULT_UNROLL_N 4
#define CGEMM_DEFAULT_UNROLL_M 4
#define CGEMM_DEFAULT_UNROLL_N 4
#define ZGEMM_DEFAULT_UNROLL_M 4
#define ZGEMM_DEFAULT_UNROLL_N 4
#define SGEMM_DEFAULT_P 456
#define DGEMM_DEFAULT_P 320
#define CGEMM_DEFAULT_P 480
#define ZGEMM_DEFAULT_P 224
#define SGEMM_DEFAULT_Q 488
#define DGEMM_DEFAULT_Q 384
#define CGEMM_DEFAULT_Q 128
#define ZGEMM_DEFAULT_Q 352
#define SGEMM_DEFAULT_R 8192
#define DGEMM_DEFAULT_R 4096
#define CGEMM_DEFAULT_R 4096
#define ZGEMM_DEFAULT_R 2048
#define SYMV_P 16
#endif
#if defined(Z14)
#define SNUMOPT 2
#define DNUMOPT 2
#define GEMM_DEFAULT_OFFSET_A 0
#define GEMM_DEFAULT_OFFSET_B 0
#define GEMM_DEFAULT_ALIGN 0x03fffUL
#define SGEMM_DEFAULT_UNROLL_M 16
#define SGEMM_DEFAULT_UNROLL_N 4
#define DGEMM_DEFAULT_UNROLL_M 8
#define DGEMM_DEFAULT_UNROLL_N 4
#define CGEMM_DEFAULT_UNROLL_M 4
#define CGEMM_DEFAULT_UNROLL_N 4
#define ZGEMM_DEFAULT_UNROLL_M 4
#define ZGEMM_DEFAULT_UNROLL_N 4
#define SGEMM_DEFAULT_P 480
#define DGEMM_DEFAULT_P 320
#define CGEMM_DEFAULT_P 480
#define ZGEMM_DEFAULT_P 224
#define SGEMM_DEFAULT_Q 512
#define DGEMM_DEFAULT_Q 384
#define CGEMM_DEFAULT_Q 128
#define ZGEMM_DEFAULT_Q 352
#define SGEMM_DEFAULT_R 8192
#define DGEMM_DEFAULT_R 4096
#define CGEMM_DEFAULT_R 4096
#define ZGEMM_DEFAULT_R 2048
#define SYMV_P 16
#endif
#ifdef GENERIC
#define SNUMOPT 2
#define DNUMOPT 2
#define GEMM_DEFAULT_OFFSET_A 0
#define GEMM_DEFAULT_OFFSET_B 0
#define GEMM_DEFAULT_ALIGN (BLASLONG)0x0ffffUL
#define SGEMM_DEFAULT_UNROLL_N 2
#define DGEMM_DEFAULT_UNROLL_N 2
#define QGEMM_DEFAULT_UNROLL_N 2
#define CGEMM_DEFAULT_UNROLL_N 2
#define ZGEMM_DEFAULT_UNROLL_N 2
#define XGEMM_DEFAULT_UNROLL_N 1
#ifdef ARCH_X86
#define SGEMM_DEFAULT_UNROLL_M 2
#define DGEMM_DEFAULT_UNROLL_M 2
#define QGEMM_DEFAULT_UNROLL_M 2
#define CGEMM_DEFAULT_UNROLL_M 2
#define ZGEMM_DEFAULT_UNROLL_M 2
#define XGEMM_DEFAULT_UNROLL_M 1
#else
#define SGEMM_DEFAULT_UNROLL_M 2
#define DGEMM_DEFAULT_UNROLL_M 2
#define QGEMM_DEFAULT_UNROLL_M 2
#define CGEMM_DEFAULT_UNROLL_M 2
#define ZGEMM_DEFAULT_UNROLL_M 2
#define XGEMM_DEFAULT_UNROLL_M 1
#endif
#ifdef ARCH_MIPS
#define SGEMM_DEFAULT_P 128
#define DGEMM_DEFAULT_P 128
#define CGEMM_DEFAULT_P 96
#define ZGEMM_DEFAULT_P 64
#define SGEMM_DEFAULT_Q 240
#define DGEMM_DEFAULT_Q 120
#define CGEMM_DEFAULT_Q 120
#define ZGEMM_DEFAULT_Q 120
#define SGEMM_DEFAULT_R 12288
#define DGEMM_DEFAULT_R 8192
#define CGEMM_DEFAULT_R 4096
#define ZGEMM_DEFAULT_R 4096
#elif defined(ARCH_LOONGARCH64)
#define SGEMM_DEFAULT_P 128
#define DGEMM_DEFAULT_P 128
#define CGEMM_DEFAULT_P 96
#define ZGEMM_DEFAULT_P 64
#define SGEMM_DEFAULT_Q 240
#define DGEMM_DEFAULT_Q 120
#define CGEMM_DEFAULT_Q 120
#define ZGEMM_DEFAULT_Q 120
#define SGEMM_DEFAULT_R 12288
#define DGEMM_DEFAULT_R 8192
#define CGEMM_DEFAULT_R 4096
#define ZGEMM_DEFAULT_R 4096
#else
#define SGEMM_DEFAULT_P sgemm_p
#define DGEMM_DEFAULT_P dgemm_p
#define QGEMM_DEFAULT_P qgemm_p
#define CGEMM_DEFAULT_P cgemm_p
#define ZGEMM_DEFAULT_P zgemm_p
#define XGEMM_DEFAULT_P xgemm_p
#define SGEMM_DEFAULT_R sgemm_r
#define DGEMM_DEFAULT_R dgemm_r
#define QGEMM_DEFAULT_R qgemm_r
#define CGEMM_DEFAULT_R cgemm_r
#define ZGEMM_DEFAULT_R zgemm_r
#define XGEMM_DEFAULT_R xgemm_r
#define SGEMM_DEFAULT_Q 128
#define DGEMM_DEFAULT_Q 128
#define QGEMM_DEFAULT_Q 128
#define CGEMM_DEFAULT_Q 128
#define ZGEMM_DEFAULT_Q 128
#define XGEMM_DEFAULT_Q 128
#endif
#define SYMV_P 16
#endif
#ifndef SWITCH_RATIO
#define SWITCH_RATIO 2
#endif
#ifndef QGEMM_DEFAULT_UNROLL_M
#define QGEMM_DEFAULT_UNROLL_M 2
#endif
#ifndef QGEMM_DEFAULT_UNROLL_N
#define QGEMM_DEFAULT_UNROLL_N 2
#endif
#ifndef XGEMM_DEFAULT_UNROLL_M
#define XGEMM_DEFAULT_UNROLL_M 2
#endif
#ifndef XGEMM_DEFAULT_UNROLL_N
#define XGEMM_DEFAULT_UNROLL_N 2
#endif
#ifndef HAVE_SSE2
#define SHUFPD_0 shufps $0x44,
#define SHUFPD_1 shufps $0x4e,
#define SHUFPD_2 shufps $0xe4,
#define SHUFPD_3 shufps $0xee,
#endif
#ifndef SHUFPD_0
#define SHUFPD_0 shufpd $0,
#endif
#ifndef SHUFPD_1
#define SHUFPD_1 shufpd $1,
#endif
#ifndef SHUFPD_2
#define SHUFPD_2 shufpd $2,
#endif
#ifndef SHUFPD_3
#define SHUFPD_3 shufpd $3,
#endif
#ifndef SHUFPS_39
#define SHUFPS_39 shufps $0x39,
#endif
#endif