# Quick Start

## Introduction

Sedna provides some examples of running Sedna jobs [here](/examples/README.md).

This is a general guide to quickly starting an incremental learning job.

### Get Sedna

You can find the latest Sedna release [here](https://github.com/kubeedge/sedna/releases).

### Deploying Sedna

Sedna provides two deployment methods; choose the one that fits your situation:

- Install Sedna on a cluster step by step: [guide here](setup/install.md).
- Install Sedna all-in-one: [guide here](setup/local-up.md).

### Components

Sedna consists of the following components:

![Architecture](./proposals/images/framework.png)

#### GlobalManager

* Unified edge-cloud synergy AI task management
* Cross edge-cloud synergy management and collaboration
* Central configuration management

#### LocalController

* Local process control of edge-cloud synergy AI tasks
* Local general management: model, dataset, and status synchronization

#### Worker

* Performs inference or training, based on existing ML frameworks.
* Launched on demand; think of workers as Docker containers.
* Different workers serve different features.
* Can run on the edge or in the cloud.

#### Lib

* Exposes the Edge AI features to applications, i.e. training or inference programs.

### System Design

There are three stages in an [incremental learning job](./proposals/incremental-learning.md): train, eval, and deploy.

![](./proposals/images/incremental-learning-state-machine.png)

## Deployment Guide

### 1. Prepare

#### 1.1 Deployment Planning

In this example, a single host runs two nodes, which form a Kubernetes cluster created with `kind`.

| NAME | ROLES | IP Address | Operating System | Host Configuration | Storage | Deployment Module |
| ---- | ----- | ---------- | ---------------- | ------------------ | ------- | ----------------- |
| edge-node | agent,edge | 192.168.0.233 | Ubuntu 18.04.5 LTS | 8C16G | 500G | LC, lib, inference worker |
| sedna-control-plane | control-plane,master | 172.18.0.2 | Ubuntu 20.10 | 8C16G | 500G | GM, LC, lib, training worker, evaluate worker |

#### 1.2 Network Requirements

In this example, the node **sedna-control-plane** has the internal IP `172.18.0.2`, and **edge-node** must be able to reach it.

### 2. Project Deployment

#### 2.1 (Optional) Create a virtual env

```bash
python3.6 -m venv venv
source venv/bin/activate
pip3 install -U pip
```

#### 2.2 Install the Sedna SDK

```bash
# SEDNA_ROOT is the root of your local Sedna source tree
cd $SEDNA_ROOT/lib
python3.6 setup.py bdist_wheel
pip3 install dist/sedna*.whl
```

#### 2.3 Prepare your machine learning model and datasets

##### 2.3.1 Encapsulate an Estimator

Sedna implements several pre-made Estimators in the [examples](/examples); you can find them in the Python scripts named `interface`.

Sedna supports Estimators built from popular AI frameworks, such as TensorFlow, PyTorch, PaddlePaddle, and MindSpore. Custom Estimators can also be used, as long as they follow the interface document.

All Estimators, whether pre-made or custom, are classes that should encapsulate the following actions:

- Training
- Evaluation
- Prediction
- Export/load

See [here](/lib/sedna/README.md) for more details. A [toy example](/examples/incremental_learning/helmet_detection/training/interface.py) looks like this:

```python
import os

# Select the ML backend before the Estimator is used
os.environ['BACKEND_TYPE'] = 'TENSORFLOW'


class Estimator:

    def __init__(self, **kwargs):
        ...

    def train(self, train_data, valid_data=None, **kwargs):
        ...

    def evaluate(self, data, **kwargs):
        ...

    def predict(self, data, **kwargs):
        ...

    def load(self, model_url, **kwargs):
        ...

    def save(self, model_path, **kwargs):
        ...

    def get_weights(self):
        ...

    def set_weights(self, weights):
        ...
```

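To make the interface above concrete, here is a minimal sketch of a class with the same method names. It is a deliberately trivial model that needs no ML framework: `MeanEstimator`, `ToyData`, the JSON model file, and the mean-absolute-error metric are all illustrative assumptions and not part of Sedna.

```python
import json
import os
import tempfile


class ToyData:
    """Hypothetical stand-in for a data source exposing x (features) and y (labels)."""

    def __init__(self, x, y):
        self.x, self.y = x, y


class MeanEstimator:
    """Toy estimator that always predicts the mean of the training labels."""

    def __init__(self, **kwargs):
        self.mean = 0.0

    def train(self, train_data, valid_data=None, **kwargs):
        labels = list(train_data.y)
        self.mean = sum(labels) / len(labels) if labels else 0.0
        return {"mean": self.mean}

    def evaluate(self, data, **kwargs):
        # Mean absolute error of the constant prediction
        labels = list(data.y)
        if not labels:
            return {"mae": 0.0}
        return {"mae": sum(abs(y - self.mean) for y in labels) / len(labels)}

    def predict(self, data, **kwargs):
        return [self.mean for _ in data]

    def save(self, model_path, **kwargs):
        with open(model_path, "w") as f:
            json.dump(self.get_weights(), f)
        return model_path

    def load(self, model_url, **kwargs):
        with open(model_url) as f:
            self.set_weights(json.load(f))

    def get_weights(self):
        return {"mean": self.mean}

    def set_weights(self, weights):
        self.mean = float(weights["mean"])


# Train, save, and restore the toy model
est = MeanEstimator()
est.train(ToyData(x=[1, 2, 3], y=[2.0, 4.0, 6.0]))
with tempfile.TemporaryDirectory() as tmp:
    path = est.save(os.path.join(tmp, "model.json"))
    restored = MeanEstimator()
    restored.load(path)
```

A real Estimator would wrap a framework model in the same way, with `save`/`load` delegating to the framework's own checkpoint format.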
##### 2.3.2 Prepare the dataset

An incremental learning job requires the following files:

- base model: a TensorFlow object detection checkpoint used as the starting point for fine-tuning.
- deploy model: a TensorFlow object detection model used for inference.
- train data: labeled images used to fine-tune the model.
- test data: a video stream used for model inference.

```bash
# download models, including base model and deploy model.
cd /
wget https://kubeedge.obs.cn-north-1.myhuaweicloud.com/examples/helmet-detection/models.tar.gz
tar -zxvf models.tar.gz
# download train data
mkdir -p /data/helmet_detection
cd /data/helmet_detection  # note: files here are monitored and used to trigger incremental training
wget https://kubeedge.obs.cn-north-1.myhuaweicloud.com/examples/helmet-detection/dataset.tar.gz
tar -zxvf dataset.tar.gz
# download test data
mkdir -p /incremental_learning/video/
cd /incremental_learning/video/
wget https://kubeedge.obs.cn-north-1.myhuaweicloud.com/examples/helmet-detection/video.tar.gz
tar -zxvf video.tar.gz
```

##### 2.3.3 Prepare the scripts

An incremental learning job requires the following scripts:

- train.py: script for model fine-tuning/training.
- eval.py: script for model evaluation.
- inference.py: script for data inference.

You can also find demos [here](/examples/incremental_learning/helmet_detection).

You should be familiar with the following interfaces, which are used in the job pipeline:

- `BaseConfig` provides access to configuration read from the environment

```python
from sedna.common.config import BaseConfig

train_dataset_url = BaseConfig.train_dataset_url
model_url = BaseConfig.model_url
```

- `Context` provides access to the parameters defined in the CRD

```python
from sedna.common.config import Context

obj_threshold = Context.get_parameters("obj_threshold")
nms_threshold = Context.get_parameters("nms_threshold")
input_shape = Context.get_parameters("input_shape")
epochs = Context.get_parameters('epochs')
batch_size = Context.get_parameters('batch_size')
```

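The job definition later in this guide supplies these parameters as quoted strings (for example `"352,640"` for `input_shape`), so a script will most likely need to convert them before use. The sketch below shows one way to do that; the helpers `parse_int_list` and `parse_str_list` are hypothetical, and the raw values are hard-coded here rather than read through `Context` so that the snippet runs on its own.

```python
def parse_int_list(raw, sep=","):
    """Turn a string such as "352,640" into a tuple of ints."""
    return tuple(int(part) for part in raw.split(sep) if part.strip())


def parse_str_list(raw, sep=","):
    """Turn a string such as "person,helmet" into a list of names."""
    return [part.strip() for part in raw.split(sep) if part.strip()]


# Raw values copied from the env section of the job definition (all strings)
raw_params = {
    "batch_size": "32",
    "epochs": "1",
    "input_shape": "352,640",
    "class_names": "person,helmet,helmet-on,helmet-off",
    "obj_threshold": "0.3",
    "nms_threshold": "0.4",
}

# Convert each value to the type the training code expects
batch_size = int(raw_params["batch_size"])
epochs = int(raw_params["epochs"])
input_shape = parse_int_list(raw_params["input_shape"])
class_names = parse_str_list(raw_params["class_names"])
obj_threshold = float(raw_params["obj_threshold"])
nms_threshold = float(raw_params["nms_threshold"])
```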
- `datasources` base class: because Sedna's core features need to identify the features and labels in the input data, the first parameter passed to the ML framework's train/evaluate functions is expected to be a data source object

```python
from sedna.datasources import BaseDataSource

train_data = BaseDataSource(data_type="train")
train_data.x = []
train_data.y = []
# mnist_ds is assumed to be an already-loaded MindSpore dataset
for item in mnist_ds.create_dict_iterator():
    train_data.x.append(item["image"].asnumpy())
    train_data.y.append(item["label"].asnumpy())
```

- `sedna.core` contains all edge-cloud features. Note that each feature has its own parameters.
  - **Hard Example Mining Algorithms**: in `IncrementalLearning`, the corresponding parameter is named `hard_example_mining`

```python
from sedna.core.incremental_learning import IncrementalLearning

# Build the hard example mining configuration
hem_dict = IncrementalLearning.get_hem_algorithm_from_config(
    threshold_img=0.9
)

# Initialize an incremental learning instance
incremental_instance = IncrementalLearning(
    estimator=Estimator,
    hard_example_mining=hem_dict
)

# Call the interface that matches the job stage.
# train.py (train_data, class_names and the other parameters are
# assumed to be defined as in the previous snippets)
incremental_instance.train(train_data=train_data, epochs=epochs,
                           batch_size=batch_size,
                           class_names=class_names,
                           input_shape=input_shape,
                           obj_threshold=obj_threshold,
                           nms_threshold=nms_threshold)

# inference.py
results, _, is_hard_example = incremental_instance.inference(
    data, input_shape=input_shape)
```

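The idea behind hard example mining can be pictured with a small, framework-free sketch. It assumes one plausible rule for the two thresholds that the job definition in this guide configures (`threshold_box` and `threshold_img`): a frame counts as hard when too small a share of its detected boxes is confident. The function `is_hard_example` and this exact rule are assumptions for illustration; the algorithm built into Sedna may differ.

```python
def is_hard_example(box_scores, threshold_img=0.9, threshold_box=0.9):
    """Return True if a frame looks like a hard example.

    box_scores: confidence scores (0..1) of the boxes detected in one frame.
    A box is "confident" when its score is >= threshold_box; the frame is
    hard when the share of confident boxes is below threshold_img.
    """
    if not box_scores:
        # No detections: this sketch does not treat the frame as hard
        return False
    confident = sum(1 for score in box_scores if score >= threshold_box)
    return confident / len(box_scores) < threshold_img


# Made-up scores for three frames
frames = {
    "clear": [0.97, 0.95, 0.99],
    "blurry": [0.97, 0.42, 0.55],
    "empty": [],
}
hard_frames = [name for name, scores in frames.items() if is_hard_example(scores)]
```

Frames selected this way would be saved for labeling and later fed back into training.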
### 3. Configuration

#### 3.1 Prepare Image

This example uses the image:

```
kubeedge/sedna-example-incremental-learning-helmet-detection:v0.4.0
```

This image is generated by the script [build_image.sh](/examples/build_image.sh) and is used to create the training, eval, and inference workers.

#### 3.2 Create Incremental Job

In this example, `$WORKER_NODE` is a user-defined node name; set it to the node on which you actually run the job.

```bash
WORKER_NODE="edge-node"
```

- Create Dataset

```bash
kubectl create -f - <<EOF
apiVersion: sedna.io/v1alpha1
kind: Dataset
metadata:
  name: incremental-dataset
spec:
  url: "/data/helmet_detection/train_data/train_data.txt"
  format: "txt"
  nodeName: $WORKER_NODE
EOF
```

- Create Initial Model to simulate the initial model in an incremental learning scenario.

```bash
kubectl create -f - <<EOF
apiVersion: sedna.io/v1alpha1
kind: Model
metadata:
  name: initial-model
spec:
  url: "/models/base_model"
  format: "ckpt"
EOF
```

- Create Deploy Model

```bash
kubectl create -f - <<EOF
apiVersion: sedna.io/v1alpha1
kind: Model
metadata:
  name: deploy-model
spec:
  url: "/models/deploy_model/saved_model.pb"
  format: "pb"
EOF
```

### 4. Run

* Incremental learning supports both hot and cold model updates. Jobs use cold model updates by default. To use hot model updates, add the following fields:

```yaml
deploySpec:
  model:
    hotUpdateEnabled: true
    pollPeriodSeconds: 60  # default value is 60
```

* Create the job:

```bash
IMAGE=kubeedge/sedna-example-incremental-learning-helmet-detection:v0.4.0

kubectl create -f - <<EOF
apiVersion: sedna.io/v1alpha1
kind: IncrementalLearningJob
metadata:
  name: helmet-detection-demo
spec:
  initialModel:
    name: "initial-model"
  dataset:
    name: "incremental-dataset"
    trainProb: 0.8
  trainSpec:
    template:
      spec:
        nodeName: $WORKER_NODE
        containers:
          - image: $IMAGE
            name: train-worker
            imagePullPolicy: IfNotPresent
            args: ["train.py"]
            env:
              - name: "batch_size"
                value: "32"
              - name: "epochs"
                value: "1"
              - name: "input_shape"
                value: "352,640"
              - name: "class_names"
                value: "person,helmet,helmet-on,helmet-off"
              - name: "nms_threshold"
                value: "0.4"
              - name: "obj_threshold"
                value: "0.3"
    trigger:
      checkPeriodSeconds: 60
      timer:
        start: 02:00
        end: 20:00
      condition:
        operator: ">"
        threshold: 500
        metric: num_of_samples
  evalSpec:
    template:
      spec:
        nodeName: $WORKER_NODE
        containers:
          - image: $IMAGE
            name: eval-worker
            imagePullPolicy: IfNotPresent
            args: ["eval.py"]
            env:
              - name: "input_shape"
                value: "352,640"
              - name: "class_names"
                value: "person,helmet,helmet-on,helmet-off"
  deploySpec:
    model:
      name: "deploy-model"
      hotUpdateEnabled: true
      pollPeriodSeconds: 60
    trigger:
      condition:
        operator: ">"
        threshold: 0.1
        metric: precision_delta
    hardExampleMining:
      name: "IBT"
      parameters:
        - key: "threshold_img"
          value: "0.9"
        - key: "threshold_box"
          value: "0.9"
    template:
      spec:
        nodeName: $WORKER_NODE
        containers:
          - image: $IMAGE
            name: infer-worker
            imagePullPolicy: IfNotPresent
            args: ["inference.py"]
            env:
              - name: "input_shape"
                value: "352,640"
              - name: "video_url"
                value: "file://video/video.mp4"
              - name: "HE_SAVED_URL"
                value: "/he_saved_url"
            volumeMounts:
              - name: localvideo
                mountPath: /video/
              - name: hedir
                mountPath: /he_saved_url
            resources:  # user defined resources
              limits:
                memory: 2Gi
        volumes:  # user defined volumes
          - name: localvideo
            hostPath:
              path: /incremental_learning/video/
              type: DirectoryOrCreate
          - name: hedir
            hostPath:
              path: /incremental_learning/he/
              type: DirectoryOrCreate
  outputDir: "/output"
EOF
```

1. The `Dataset` describes the labeled data, and `HE_SAVED_URL` is the path in the deploy container to which hard examples are uploaded. Users then label the hard examples stored at that path.
2. Ensure that the `outputDir` path in the YAML file exists on your node. This path is mounted directly into the container.

### 5. Monitor

#### Check Incremental Learning Job

Query the job status:

```bash
kubectl get incrementallearningjob helmet-detection-demo
```

In the `IncrementalLearningJob` resource `helmet-detection-demo`, the following training trigger is configured:

```yaml
trigger:
  checkPeriodSeconds: 60
  timer:
    start: 02:00
    end: 20:00
  condition:
    operator: ">"
    threshold: 500
    metric: num_of_samples
```

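One way to read this configuration is that every `checkPeriodSeconds` the controller checks whether the current time falls inside the `timer` window and whether the `condition` holds. The sketch below encodes that reading in plain Python. The function names (`condition_met`, `in_window`, `should_trigger`) and the way the fields are combined are assumptions for illustration, not Sedna's controller code.

```python
import operator
from datetime import time

# Comparison operators a condition might use (only ">" appears in this guide)
OPERATORS = {
    ">": operator.gt,
    ">=": operator.ge,
    "<": operator.lt,
    "<=": operator.le,
    "==": operator.eq,
}


def condition_met(condition, metrics):
    """Compare the named metric against the threshold using the operator."""
    compare = OPERATORS[condition["operator"]]
    return compare(metrics[condition["metric"]], condition["threshold"])


def in_window(now, start, end):
    """True if now lies in [start, end]; also handles windows crossing midnight."""
    if start <= end:
        return start <= now <= end
    return now >= start or now <= end


def should_trigger(trigger, now, metrics):
    """Fire only inside the timer window (if any) and when the condition holds."""
    timer = trigger.get("timer")
    if timer and not in_window(now, timer["start"], timer["end"]):
        return False
    return condition_met(trigger["condition"], metrics)


# The training trigger shown above, expressed as a Python dict
train_trigger = {
    "checkPeriodSeconds": 60,
    "timer": {"start": time(2, 0), "end": time(20, 0)},
    "condition": {"operator": ">", "threshold": 500, "metric": "num_of_samples"},
}

fires = should_trigger(train_trigger, time(3, 0), {"num_of_samples": 501})
```

Under this reading, 501 new samples at 03:00 would start training, while the same count at 23:00 would wait until the window reopens.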
## API

- control-plane: Please refer to this [link](api/crd).
- Lib: Please refer to this [link](api/lib).

## Contributing

Contributions are very welcome!

- control-plane: Please refer to this [link](contributing/control-plane/development.md).
- Lib: Please refer to this [link](contributing/lib/development.md).

## Community

Sedna is an open source project, and in the spirit of openness and freedom, we welcome new contributors to join us.
You can get in touch with the community through the following channels:

* [GitHub Issues](https://github.com/kubeedge/sedna/issues)
* [Regular Community Meeting](https://zoom.us/j/4167237304)
* [Slack channel](https://app.slack.com/client/TDZ5TGXQW/C01EG84REVB/details)