|
- You write custom CUDA kernels to replace the PyTorch operators in the given architecture to get speedups.
-
- You have complete freedom to choose the set of operators you want to replace. You may make the decision to replace some operators with custom CUDA kernels and leave others unchanged. You may replace multiple operators with custom implementations, consider operator fusion opportunities (combining multiple operators into a single kernel, for example, combining matmul+relu), or algorithmic changes (such as online softmax). You are only limited by your imagination.
-
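The online softmax mentioned above can be illustrated without any GPU code. The following is a minimal pure-Python sketch, not a kernel: it keeps a running maximum and a rescaled running sum, so the normalizer comes out of a single pass over the input. The function names `online_softmax` and `two_pass_softmax` are illustrative assumptions and do not come from the original prompt.

```python
import math


def two_pass_softmax(xs):
    """Reference softmax: one pass for the max, another for the sum."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def online_softmax(xs):
    """Softmax whose max and normalizer come from a single pass over xs.

    Assumes finite float inputs (illustrative sketch only).
    """
    running_max = float("-inf")
    running_sum = 0.0
    for x in xs:
        new_max = max(running_max, x)
        # Rescale the old sum to the new max before adding the new term.
        running_sum = running_sum * math.exp(running_max - new_max) + math.exp(x - new_max)
        running_max = new_max
    return [math.exp(x - running_max) / running_sum for x in xs]
```

In a fused kernel the same recurrence could let each thread or block reduce its slice once, saving a pass over the input compared with computing the max and the sum separately.
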
- Here's an example to show you the syntax for embedding custom CUDA operators inline in PyTorch. The given example architecture is:
-
- ```python
- import torch
- import torch.nn as nn
- import torch.nn.functional as F
-
-
- class Model(nn.Module):
-     def __init__(self) -> None:
-         super().__init__()
-
-     def forward(self, a, b):
-         return a + b
-
-
- def get_inputs():
-     # randomly generate input tensors based on the model architecture
-     a = torch.randn(1, 128).cuda()
-     b = torch.randn(1, 128).cuda()
-     return [a, b]
-
-
- def get_init_inputs():
-     # randomly generate tensors required for initialization based on the model architecture
-     return []
- ```
-
- The example new architecture with custom CUDA kernels looks like this:
- ```python
- import torch
- import torch.nn as nn
- import torch.nn.functional as F
-
-
- class Model(nn.Module):
-     def __init__(self) -> None:
-         super().__init__()
-
-     def forward(self, a, b):
-         return a + b
-
-
- def get_inputs():
-     # randomly generate input tensors based on the model architecture
-     a = torch.randn(1, 128).cuda()
-     b = torch.randn(1, 128).cuda()
-     return [a, b]
-
-
- def get_init_inputs():
-     # randomly generate tensors required for initialization based on the model architecture
-     return []
- ```
-
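As a hedged sketch of what inline embedding of a custom CUDA operator could look like for the element-wise add example, the snippet below assumes `torch.utils.cpp_extension.load_inline` is used to compile the kernel. The names `elementwise_add_kernel`, `elementwise_add_cuda`, `build_model_new`, and `ModelNew` are illustrative and not taken from the original. Compilation is deferred into `build_model_new` so the definitions can be loaded without a GPU; that is a choice of this sketch, not a requirement.

```python
# Hedged sketch: all names below are illustrative assumptions.

CUDA_SOURCE = r"""
#include <torch/extension.h>
#include <cuda_runtime.h>

__global__ void elementwise_add_kernel(const float* a, const float* b,
                                       float* out, int size) {
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx < size) {
        out[idx] = a[idx] + b[idx];
    }
}

torch::Tensor elementwise_add_cuda(torch::Tensor a, torch::Tensor b) {
    auto out = torch::empty_like(a);
    const int size = a.numel();
    const int block_size = 256;
    const int num_blocks = (size + block_size - 1) / block_size;
    elementwise_add_kernel<<<num_blocks, block_size>>>(
        a.data_ptr<float>(), b.data_ptr<float>(), out.data_ptr<float>(), size);
    return out;
}
"""

# Declaration only; load_inline generates the Python binding from it.
CPP_SOURCE = "torch::Tensor elementwise_add_cuda(torch::Tensor a, torch::Tensor b);"


def build_model_new():
    """Compile the kernel and return a model using it (needs a CUDA toolchain)."""
    import torch.nn as nn
    from torch.utils.cpp_extension import load_inline

    ext = load_inline(
        name="elementwise_add",
        cpp_sources=CPP_SOURCE,
        cuda_sources=CUDA_SOURCE,
        functions=["elementwise_add_cuda"],
        verbose=False,
    )

    class ModelNew(nn.Module):
        def __init__(self) -> None:
            super().__init__()

        def forward(self, a, b):
            # Assumes contiguous float32 CUDA tensors of equal shape.
            return ext.elementwise_add_cuda(a, b)

    return ModelNew()
```

On a machine with a CUDA-capable GPU and compiler, `build_model_new()(a, b)` would be expected to match `a + b` for the inputs produced by `get_inputs()`.
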
- You are given the following architecture:
-
- ```python
- import torch
- import torch.nn as nn
-
- class Model(nn.Module):
-     """
-     Simple model that performs a ReLU activation.
-     """
-     def __init__(self):
-         super(Model, self).__init__()
-
-     def forward(self, x: torch.Tensor) -> torch.Tensor:
-         """
-         Applies ReLU activation to the input tensor.
-
-         Args:
-             x (torch.Tensor): Input tensor of any shape.
-
-         Returns:
-             torch.Tensor: Output tensor with ReLU applied, same shape as input.
-         """
-         return torch.relu(x)
-
- batch_size = 16
- dim = 16384
-
- def get_inputs():
-     x = torch.randn(batch_size, dim)
-     return [x]
-
- def get_init_inputs():
-     return []  # No special initialization inputs needed
- ```
|