* [Dataset and Model](#dataset-and-model)
  * [Motivation](#motivation)
    * [Goals](#goals)
    * [Non\-goals](#non-goals)
  * [Proposal](#proposal)
    * [Use Cases](#use-cases)
  * [Design Details](#design-details)
    * [CRD API Group and Version](#crd-api-group-and-version)
    * [CRDs](#crds)
    * [Type definition](#crd-type-definition)
    * [CRD samples](#crd-samples)
  * [Controller Design](#controller-design)

# Dataset and Model

## Motivation

Currently, the Edge AI features depend on the `dataset` and `model` objects.
This proposal defines dataset and model as first-class Kubernetes resources.

### Goals

* Define the metadata of `dataset` and `model` objects.
* Make these objects usable by the Edge AI features.

### Non-goals

* The actual format of the AI `dataset`, such as `imagenet`, `coco` or `tf-record`.
* The actual format of the AI `model`, such as TensorFlow's `ckpt` or `saved_model`.
* The actual operations on the AI `dataset`, such as `shuffle` or `crop`.
* The actual operations on the AI `model`, such as `train` or `inference`.

## Proposal

We propose using Kubernetes Custom Resource Definitions (CRDs) to describe the specification and status of datasets and models, and a controller to synchronize updates to them between the edge and the cloud.

![](./images/dataset-model-crd.png)

### Use Cases

* Users can create a dataset resource by providing the `dataset url`, the `format` and the `nodeName` of the node that owns the dataset.
* Users can create a model resource by providing the `model url` and the `format`.
* Users can view the information of a dataset or model.
* Users can delete a dataset or model.

## Design Details

### CRD API Group and Version

The `Dataset` and `Model` CRDs will be namespace-scoped.
The tables below summarize the group, kind and API version details for the CRDs.

* Dataset

| Field                 | Description             |
|-----------------------|-------------------------|
| Group                 | neptune.io              |
| APIVersion            | v1alpha1                |
| Kind                  | Dataset                 |

* Model

| Field                 | Description             |
|-----------------------|-------------------------|
| Group                 | neptune.io              |
| APIVersion            | v1alpha1                |
| Kind                  | Model                   |

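Because both CRDs are namespace-scoped, their objects are served under the usual Kubernetes path for namespaced custom resources. As a minimal sketch, the helpers below derive the `apiVersion` string and the collection path from the values in the tables. The function names `apiVersion` and `resourcePath` are illustrative and not part of the proposal, and the plural names `datasets` and `models` are taken from the CRD manifests.

```go
package main

import "fmt"

const (
	group   = "neptune.io"
	version = "v1alpha1"
)

// apiVersion returns the value used in the apiVersion field of a manifest.
func apiVersion() string {
	return group + "/" + version
}

// resourcePath returns the API server path of a namespaced collection,
// following the convention /apis/<group>/<version>/namespaces/<namespace>/<plural>.
func resourcePath(namespace, plural string) string {
	return fmt.Sprintf("/apis/%s/%s/namespaces/%s/%s", group, version, namespace, plural)
}

func main() {
	fmt.Println(apiVersion())
	fmt.Println(resourcePath("default", "datasets"))
	fmt.Println(resourcePath("default", "models"))
}
```
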
### CRDs

#### `Dataset` CRD

[crd source](/build/crds/neptune/dataset_v1alpha1.yaml)

```yaml
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: datasets.neptune.io
spec:
  group: neptune.io
  names:
    kind: Dataset
    plural: datasets
  scope: Namespaced
  versions:
    - name: v1alpha1
      subresources:
        # status enables the status subresource.
        status: {}
      served: true
      storage: true
      schema:
        openAPIV3Schema:
          type: object
          properties:
            spec:
              type: object
              required:
                - url
                - format
              properties:
                url:
                  type: string
                format:
                  type: string
                nodeName:
                  type: string
            status:
              type: object
              properties:
                numberOfSamples:
                  type: integer
                updateTime:
                  type: string
                  format: date-time
      additionalPrinterColumns:
        - name: NumberOfSamples
          type: integer
          description: The number of samples in the dataset
          jsonPath: ".status.numberOfSamples"
        - name: Node
          type: string
          description: The node name of the dataset
          jsonPath: ".spec.nodeName"
        - name: spec
          type: string
          description: The spec of the dataset
          jsonPath: ".spec"
```

1. `format` of dataset

   We use this field to report the number of samples in the dataset and to split the dataset.
   The following formats are currently supported:
   - txt: each non-empty line is one sample

#### `Model` CRD

[crd source](/build/crds/neptune/model_v1alpha1.yaml)

```yaml
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: models.neptune.io
spec:
  group: neptune.io
  names:
    kind: Model
    plural: models
  scope: Namespaced
  versions:
    - name: v1alpha1
      subresources:
        # status enables the status subresource.
        status: {}
      served: true
      storage: true
      schema:
        openAPIV3Schema:
          type: object
          properties:
            spec:
              type: object
              required:
                - url
                - format
              properties:
                url:
                  type: string
                format:
                  type: string
            status:
              type: object
              properties:
                updateTime:
                  type: string
                  format: date-time
                metrics:
                  type: array
                  items:
                    type: object
                    properties:
                      key:
                        type: string
                      value:
                        type: string
      additionalPrinterColumns:
        - name: updateAGE
          type: date
          description: The update age
          jsonPath: ".status.updateTime"
        - name: metrics
          type: string
          description: The metrics
          jsonPath: ".status.metrics"
```

### CRD type definition

- `Dataset`

[go source](cloud/pkg/apis/neptune/v1alpha1/dataset_types.go)

```go
package v1alpha1

import (
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// +genclient
// +k8s:deepcopy-gen:interfaces=k8s.io/apimachinery/pkg/runtime.Object

// Dataset describes the data that a dataset resource should have
type Dataset struct {
	metav1.TypeMeta   `json:",inline"`
	metav1.ObjectMeta `json:"metadata,omitempty"`

	Spec   DatasetSpec   `json:"spec"`
	Status DatasetStatus `json:"status"`
}

// DatasetSpec is a description of a dataset
type DatasetSpec struct {
	URL      string `json:"url"`
	Format   string `json:"format"`
	NodeName string `json:"nodeName"`
}

// DatasetStatus represents information about the status of a dataset
// including the time a dataset updated, and number of samples in a dataset
type DatasetStatus struct {
	UpdateTime      *metav1.Time `json:"updateTime,omitempty" protobuf:"bytes,1,opt,name=updateTime"`
	NumberOfSamples int          `json:"numberOfSamples"`
}

// +k8s:deepcopy-gen:interfaces=k8s.io/apimachinery/pkg/runtime.Object

// DatasetList is a list of Datasets
type DatasetList struct {
	metav1.TypeMeta `json:",inline"`
	metav1.ListMeta `json:"metadata"`
	Items           []Dataset `json:"items"`
}
```

- `Model`

[go source](cloud/pkg/apis/neptune/v1alpha1/model_types.go)

```go
package v1alpha1

import (
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// +genclient
// +k8s:deepcopy-gen:interfaces=k8s.io/apimachinery/pkg/runtime.Object

// Model describes the data that a model resource should have
type Model struct {
	metav1.TypeMeta   `json:",inline"`
	metav1.ObjectMeta `json:"metadata,omitempty"`

	Spec   ModelSpec   `json:"spec"`
	Status ModelStatus `json:"status"`
}

// ModelSpec is a description of a model
type ModelSpec struct {
	URL    string `json:"url"`
	Format string `json:"format"`
}

// ModelStatus represents information about the status of a model
// including the time a model updated, and metrics in a model
type ModelStatus struct {
	UpdateTime *metav1.Time `json:"updateTime,omitempty" protobuf:"bytes,1,opt,name=updateTime"`
	Metrics    []Metric     `json:"metrics,omitempty" protobuf:"bytes,2,rep,name=metrics"`
}

// Metric is a key/value pair reported for a model. Its fields are assumed
// here to mirror the `metrics` items in the Model CRD schema.
type Metric struct {
	Key   string `json:"key"`
	Value string `json:"value"`
}

// +k8s:deepcopy-gen:interfaces=k8s.io/apimachinery/pkg/runtime.Object

// ModelList is a list of Models
type ModelList struct {
	metav1.TypeMeta `json:",inline"`
	metav1.ListMeta `json:"metadata"`
	Items           []Model `json:"items"`
}
```

### CRD samples

- `Dataset`

```yaml
apiVersion: neptune.io/v1alpha1
kind: Dataset
metadata:
  name: "dataset-examp"
spec:
  url: "/code/data"
  format: "txt"
  nodeName: "edge0"
```

- `Model`

```yaml
apiVersion: neptune.io/v1alpha1
kind: Model
metadata:
  name: model-examp
spec:
  url: "/model/frozen.pb"
  format: pb
```

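To illustrate how the `Dataset` sample relates to the `required` fields in the CRD schema, the sketch below builds the same object in Go, checks `url` and `format`, and prints it as JSON. The types `datasetManifest` and `datasetSpec` and the `validate` method are hypothetical stand-ins that use only the standard library; they are not the generated API types, and the real validation is done by the API server against the CRD schema.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// datasetSpec mirrors the spec fields of the Dataset CRD (local stand-in type).
type datasetSpec struct {
	URL      string `json:"url"`
	Format   string `json:"format"`
	NodeName string `json:"nodeName,omitempty"`
}

// datasetManifest is a trimmed-down stand-in for a Dataset object.
type datasetManifest struct {
	APIVersion string            `json:"apiVersion"`
	Kind       string            `json:"kind"`
	Metadata   map[string]string `json:"metadata"`
	Spec       datasetSpec       `json:"spec"`
}

// validate checks the fields that the CRD schema marks as required.
func (d datasetManifest) validate() error {
	if d.Spec.URL == "" {
		return errors.New("spec.url is required")
	}
	if d.Spec.Format == "" {
		return errors.New("spec.format is required")
	}
	return nil
}

func main() {
	d := datasetManifest{
		APIVersion: "neptune.io/v1alpha1",
		Kind:       "Dataset",
		Metadata:   map[string]string{"name": "dataset-examp"},
		Spec:       datasetSpec{URL: "/code/data", Format: "txt", NodeName: "edge0"},
	}
	if err := d.validate(); err != nil {
		fmt.Println("invalid:", err)
		return
	}
	out, err := json.MarshalIndent(d, "", "  ")
	if err != nil {
		fmt.Println("marshal error:", err)
		return
	}
	fmt.Println(string(out))
}
```
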
## Controller Design

In the current design, `dataset` has a downstream and an upstream controller, while `model` has neither.<br/>

The dataset controller synchronizes the dataset between the cloud and the edge.
- downstream: synchronizes the dataset info from the cloud to the edge node.
- upstream: synchronizes the dataset status, such as the number of samples in the dataset, from the edge node to the cloud.
<br/>

Here is the flow of dataset creation:

![](./images/dataset-creation-flow.png)

For the model:
1. The model's info is synced together with the object that uses the model, such as a federated-task.
1. The model's status is updated when the corresponding training or inference work has completed.
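
The downstream/upstream split for datasets can be pictured with a small in-memory simulation. This is a sketch under stated assumptions, not the controller's actual code: the `cloud` and `edgeNode` types, the `syncDownstream` and `syncUpstream` methods and the sample count are all made up for illustration, and a real controller would watch the API server and send messages over the cloud-edge channel instead of writing to maps.

```go
package main

import "fmt"

// datasetSpec and datasetStatus are local stand-ins for the CRD types.
type datasetSpec struct {
	URL, Format, NodeName string
}

type datasetStatus struct {
	NumberOfSamples int
}

// edgeNode holds the dataset specs it has received from the cloud.
type edgeNode struct {
	name     string
	datasets map[string]datasetSpec
}

// cloud holds the desired specs and the statuses reported by edge nodes.
type cloud struct {
	specs    map[string]datasetSpec
	statuses map[string]datasetStatus
	edges    map[string]*edgeNode
}

// syncDownstream pushes a dataset spec to the edge node named in spec.NodeName.
func (c *cloud) syncDownstream(name string) error {
	spec, ok := c.specs[name]
	if !ok {
		return fmt.Errorf("dataset %q not found", name)
	}
	edge, ok := c.edges[spec.NodeName]
	if !ok {
		return fmt.Errorf("edge node %q not found", spec.NodeName)
	}
	edge.datasets[name] = spec
	return nil
}

// syncUpstream records a status reported by an edge node in the cloud.
func (c *cloud) syncUpstream(name string, status datasetStatus) error {
	if _, ok := c.specs[name]; !ok {
		return fmt.Errorf("dataset %q not found", name)
	}
	c.statuses[name] = status
	return nil
}

func main() {
	edge := &edgeNode{name: "edge0", datasets: map[string]datasetSpec{}}
	c := &cloud{
		specs: map[string]datasetSpec{
			"dataset-examp": {URL: "/code/data", Format: "txt", NodeName: "edge0"},
		},
		statuses: map[string]datasetStatus{},
		edges:    map[string]*edgeNode{"edge0": edge},
	}

	// Downstream: the cloud sends the dataset info to the owning edge node.
	if err := c.syncDownstream("dataset-examp"); err != nil {
		fmt.Println("downstream error:", err)
		return
	}

	// Upstream: the edge reports the status; 100 is a made-up sample count.
	if err := c.syncUpstream("dataset-examp", datasetStatus{NumberOfSamples: 100}); err != nil {
		fmt.Println("upstream error:", err)
		return
	}

	fmt.Println("edge url:", edge.datasets["dataset-examp"].URL)
	fmt.Println("cloud numberOfSamples:", c.statuses["dataset-examp"].NumberOfSamples)
}
```
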
