You write custom CUDA kernels to replace the PyTorch operators in the given architecture to get speedups.   
  
You have complete freedom to choose which operators to replace. You may replace some operators with custom CUDA kernels and leave others unchanged. You may also replace multiple operators with custom implementations, consider operator fusion (combining multiple operators into a single kernel, for example matmul+relu), or make algorithmic changes (such as online softmax). You are only limited by your imagination.  
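
As a rough illustration of the kind of algorithmic change meant by online softmax, the sketch below computes softmax in plain Python while building the running maximum and the normalizer in a single pass over the input. The function name `online_softmax` is hypothetical, and the code assumes finite inputs; a CUDA kernel would apply the same recurrence per row.

```python
import math


def online_softmax(xs):
    """Softmax with the max and the normalizer accumulated in one pass."""
    running_max = float("-inf")
    running_sum = 0.0
    for x in xs:
        new_max = max(running_max, x)
        # Rescale the sum accumulated so far to the new maximum,
        # then add the contribution of the current element.
        running_sum = running_sum * math.exp(running_max - new_max) + math.exp(x - new_max)
        running_max = new_max
    # Second pass only normalizes; no separate max-finding pass is needed.
    return [math.exp(x - running_max) / running_sum for x in xs]
```

The usual numerically stable softmax reads the input once to find the maximum and again to sum the exponentials. The recurrence above merges those two reads, which matters on a GPU where memory traffic tends to dominate.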
  
Here is an example showing the syntax for embedding inline custom CUDA operators in PyTorch. The example architecture is:   
  
```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class Model(nn.Module):
    def __init__(self) -> None:
        super().__init__()

    def forward(self, a, b):
        return a + b


def get_inputs():
    # randomly generate input tensors based on the model architecture
    a = torch.randn(1, 128).cuda()
    b = torch.randn(1, 128).cuda()
    return [a, b]


def get_init_inputs():
    # randomly generate tensors required for initialization based on the model architecture
    return []
```
  
The example new architecture with custom CUDA kernels looks like this:   
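```python
import torch
import torch.nn as nn
from torch.utils.cpp_extension import load_inline

# CUDA source for element-wise addition: one thread per output element
elementwise_add_source = """
#include <torch/extension.h>
#include <cuda_runtime.h>

__global__ void elementwise_add_kernel(const float* a, const float* b, float* out, int size) {
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx < size) out[idx] = a[idx] + b[idx];
}

torch::Tensor elementwise_add_cuda(torch::Tensor a, torch::Tensor b) {
    auto size = a.numel();
    auto out = torch::empty_like(a);
    const int block_size = 256;
    const int num_blocks = (size + block_size - 1) / block_size;
    elementwise_add_kernel<<<num_blocks, block_size>>>(
        a.data_ptr<float>(), b.data_ptr<float>(), out.data_ptr<float>(), size);
    return out;
}
"""

elementwise_add_cpp_source = "torch::Tensor elementwise_add_cuda(torch::Tensor a, torch::Tensor b);"

# Compile the inline CUDA source into a loadable extension
elementwise_add = load_inline(
    name="elementwise_add",
    cpp_sources=elementwise_add_cpp_source,
    cuda_sources=elementwise_add_source,
    functions=["elementwise_add_cuda"],
)


class Model(nn.Module):
    def __init__(self) -> None:
        super().__init__()
        self.elementwise_add = elementwise_add

    def forward(self, a, b):
        # get_inputs() and get_init_inputs() are unchanged from the original architecture
        return self.elementwise_add.elementwise_add_cuda(a, b)
```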
  
You are given the following architecture:   
  
```python
import torch
import torch.nn as nn

class Model(nn.Module):
    """
    Simple model that performs a ReLU activation.
    """
    def __init__(self):
        super(Model, self).__init__()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        """
        Applies ReLU activation to the input tensor.

        Args:
            x (torch.Tensor): Input tensor of any shape.

        Returns:
            torch.Tensor: Output tensor with ReLU applied, same shape as input.
        """
        return torch.relu(x)

batch_size = 16
dim = 16384

def get_inputs():
    x = torch.randn(batch_size, dim)
    return [x]

def get_init_inputs():
    return []  # No special initialization inputs needed
```
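
For an element-wise operator such as ReLU, the launch size follows directly from the input shape. The sketch below works out the arithmetic for the `batch_size` and `dim` given above, assuming one thread per element and a block size of 256. Both choices are assumptions made for illustration, and `launch_config` is a hypothetical helper name.

```python
batch_size = 16
dim = 16384


def launch_config(num_elements, block_size=256):
    """Return (num_blocks, block_size) covering num_elements, one thread each."""
    # Ceiling division so a partial final block still covers the tail elements.
    num_blocks = (num_elements + block_size - 1) // block_size
    return num_blocks, block_size


num_elements = batch_size * dim  # 16 * 16384 = 262144 elements
print(launch_config(num_elements))  # (1024, 256)
```

Because the last block may extend past the end of the tensor when the element count is not a multiple of the block size, a kernel launched this way would still need an index bounds check.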