prompt.txt

You write custom CUDA kernels to replace the PyTorch operators in the given architecture to get speedups.

You have complete freedom to choose the set of operators you want to replace. You may replace some operators with custom CUDA kernels and leave others unchanged. You may replace multiple operators with custom implementations, consider operator fusion opportunities (combining multiple operators into a single kernel, for example matmul + ReLU), or make algorithmic changes (such as online softmax). You are only limited by your imagination.
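Online softmax is one example of such an algorithmic change. The sketch below is in plain Python rather than CUDA and is meant only to show the recurrence. A standard softmax makes one pass to find the maximum and another to sum the exponentials. Online softmax computes the normalizer in a single streaming pass instead: it carries a running maximum and rescales the running sum whenever that maximum grows. The function name `online_softmax` is illustrative.

```python
import math
from typing import List


def online_softmax(xs: List[float]) -> List[float]:
    """Softmax whose normalizer is accumulated in one streaming pass.

    m is the running maximum and s the running sum of exp(x - m). When a new
    maximum appears, the old sum is rescaled by exp(m_old - m_new).
    """
    m = float("-inf")  # running maximum seen so far
    s = 0.0            # running sum of exp(x - m)
    for x in xs:
        m_new = max(m, x)
        # rescale the previous partial sum to the new maximum, then add this term
        s = s * math.exp(m - m_new) + math.exp(x - m_new)
        m = m_new
    # final pass only writes the normalized outputs
    return [math.exp(x - m) / s for x in xs]


print(online_softmax([1000.0, 1000.0]))  # [0.5, 0.5], no overflow
```

Partial (max, sum) pairs from different threads or blocks can be merged the same way: s = s1 * exp(m1 - m) + s2 * exp(m2 - m) with m = max(m1, m2). This merge rule is what makes the trick useful inside a fused kernel.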
Here's an example showing the syntax for embedding custom CUDA operators inline in PyTorch. The example given architecture is:
```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class Model(nn.Module):
    def __init__(self) -> None:
        super().__init__()

    def forward(self, a, b):
        return a + b


def get_inputs():
    # randomly generate input tensors based on the model architecture
    a = torch.randn(1, 128).cuda()
    b = torch.randn(1, 128).cuda()
    return [a, b]


def get_init_inputs():
    # randomly generate tensors required for initialization based on the model architecture
    return []
```
The example new arch with custom CUDA kernels looks like this:
```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class Model(nn.Module):
    def __init__(self) -> None:
        super().__init__()

    def forward(self, a, b):
        return a + b


def get_inputs():
    # randomly generate input tensors based on the model architecture
    a = torch.randn(1, 128).cuda()
    b = torch.randn(1, 128).cuda()
    return [a, b]


def get_init_inputs():
    # randomly generate tensors required for initialization based on the model architecture
    return []
```
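The sketch below shows what inline embedding can look like for the same `a + b` model. It assumes PyTorch's `torch.utils.cpp_extension.load_inline` and a working CUDA toolchain (nvcc). The names `elementwise_add_cuda`, `elementwise_add_ext`, and `ModelNew` are hypothetical. The kernel handles only float32 tensors of matching shape on one device. Compilation is deferred to the first CUDA call, so the module still imports on a machine without a GPU.

```python
import torch
import torch.nn as nn
from torch.utils.cpp_extension import load_inline

# C++ declaration; load_inline generates Python bindings for the names in `functions`.
elementwise_add_cpp_source = (
    "torch::Tensor elementwise_add_cuda(torch::Tensor a, torch::Tensor b);"
)

# CUDA source: one thread per element computes out[i] = a[i] + b[i] (float32 only).
elementwise_add_cuda_source = r"""
#include <torch/extension.h>
#include <ATen/cuda/CUDAContext.h>
#include <c10/cuda/CUDAGuard.h>

__global__ void elementwise_add_kernel(const float* a, const float* b,
                                       float* out, int64_t size) {
    int64_t idx = (int64_t)blockIdx.x * blockDim.x + threadIdx.x;
    if (idx < size) {  // guard the tail when size is not a multiple of the block size
        out[idx] = a[idx] + b[idx];
    }
}

torch::Tensor elementwise_add_cuda(torch::Tensor a, torch::Tensor b) {
    TORCH_CHECK(a.is_cuda() && b.is_cuda(), "inputs must be CUDA tensors");
    TORCH_CHECK(a.device() == b.device(), "inputs must be on the same device");
    TORCH_CHECK(a.scalar_type() == torch::kFloat && b.scalar_type() == torch::kFloat,
                "inputs must be float32");
    TORCH_CHECK(a.sizes() == b.sizes(), "inputs must have the same shape");
    const c10::cuda::CUDAGuard device_guard(a.device());  // launch on the inputs' GPU
    auto a_c = a.contiguous();
    auto b_c = b.contiguous();
    auto out = torch::empty_like(a_c);
    const int64_t size = a_c.numel();
    if (size == 0) return out;  // a zero-block launch would be an error
    const int threads = 256;
    const int64_t blocks = (size + threads - 1) / threads;
    elementwise_add_kernel<<<(unsigned int)blocks, threads, 0,
                             at::cuda::getCurrentCUDAStream()>>>(
        a_c.data_ptr<float>(), b_c.data_ptr<float>(), out.data_ptr<float>(), size);
    return out;
}
"""

_ext = None  # compiled module, built lazily on the first CUDA call


def _load_extension():
    """Compile the inline extension once (needs nvcc) and cache it."""
    global _ext
    if _ext is None:
        _ext = load_inline(
            name="elementwise_add_ext",  # hypothetical extension name
            cpp_sources=elementwise_add_cpp_source,
            cuda_sources=elementwise_add_cuda_source,
            functions=["elementwise_add_cuda"],
        )
    return _ext


class ModelNew(nn.Module):
    def __init__(self) -> None:
        super().__init__()

    def forward(self, a, b):
        if a.is_cuda and b.is_cuda:
            return _load_extension().elementwise_add_cuda(a, b)
        return a + b  # CPU fallback so the sketch still runs without a GPU
```

The first CUDA call pays the compilation cost, and later runs can reuse the cached build if the sources are unchanged. For that reason, the steady-state forward pass is the one worth timing.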
You are given the following architecture:
```python
import torch
import torch.nn as nn


class Model(nn.Module):
    """
    Simple model that performs a ReLU activation.
    """

    def __init__(self):
        super(Model, self).__init__()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        """
        Applies ReLU activation to the input tensor.

        Args:
            x (torch.Tensor): Input tensor of any shape.

        Returns:
            torch.Tensor: Output tensor with ReLU applied, same shape as input.
        """
        return torch.relu(x)


batch_size = 16
dim = 16384


def get_inputs():
    x = torch.randn(batch_size, dim)
    return [x]


def get_init_inputs():
    return []  # No special initialization inputs needed
```
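With the given shapes (`batch_size = 16`, `dim = 16384`), an elementwise ReLU kernel touches 262,144 float32 values. The sketch below works out the grid size and the minimum memory traffic of one read-plus-write pass. It assumes a 1-D launch with one thread per element and a hypothetical block size of 256 threads. That traffic is why an elementwise ReLU is memory-bound rather than compute-bound. The helper names `launch_config` and `min_bytes_moved` are illustrative.

```python
from typing import Tuple


def launch_config(numel: int, block_size: int = 256) -> Tuple[int, int]:
    """Return (num_blocks, block_size) for a 1-D launch with one thread per element."""
    num_blocks = (numel + block_size - 1) // block_size  # ceiling division
    return num_blocks, block_size


def min_bytes_moved(numel: int, bytes_per_elem: int = 4) -> int:
    """Lower bound on memory traffic for out = relu(x): read x once, write out once."""
    return 2 * numel * bytes_per_elem


batch_size = 16
dim = 16384
numel = batch_size * dim  # 262144 elements
num_blocks, threads = launch_config(numel)
print(numel, num_blocks, threads)  # 262144 1024 256
print(min_bytes_moved(numel))      # 2097152 bytes (2 MiB) for float32
```

Here 262,144 happens to be an exact multiple of 256. A kernel launched this way should still guard `idx < numel`, because other shapes will not divide evenly.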