# Using Incremental Learning Job in Helmet Detection Scenario

This document introduces how to use an incremental learning job in a helmet detection scenario.
With the incremental learning job, our application can automatically retrain, evaluate,
and update models based on the data generated at the edge.
## Helmet Detection Experiment

### Install Sedna

Follow the [Sedna installation document](/docs/setup/install.md) to install Sedna.
### Prepare Model

In this example, we need to prepare the base model and the deploy model in advance.
Download the [models](https://kubeedge.obs.cn-north-1.myhuaweicloud.com/examples/helmet-detection/model.tar.gz), which include the base model and the deploy model:

```
cd /
wget https://kubeedge.obs.cn-north-1.myhuaweicloud.com/examples/helmet-detection/models.tar.gz
tar -zxvf models.tar.gz
```
### Prepare for Inference Worker

In this example, we simulate an inference worker for helmet detection. The worker runs inference on a local video and uploads hard examples to `HE_SAVED_URL`. We need to make the following preparations:

* make sure the following local directories exist:

```
mkdir -p /incremental_learning/video/
mkdir -p /incremental_learning/he/
mkdir -p /data/helmet_detection
mkdir /output
```

* download the [video](https://kubeedge.obs.cn-north-1.myhuaweicloud.com/examples/helmet-detection/video.tar.gz), unzip video.tar.gz, and put it into `/incremental_learning/video/`:

```
cd /incremental_learning/video/
wget https://kubeedge.obs.cn-north-1.myhuaweicloud.com/examples/helmet-detection/video.tar.gz
tar -zxvf video.tar.gz
```
### Prepare Image

This example uses the image:

```
kubeedge/sedna-example-incremental-learning-helmet-detection:v0.4.0
```

This image is generated by the script [build_images.sh](/examples/build_image.sh) and is used for creating the training, eval, and inference workers.
### Create Incremental Job

In this example, `$WORKER_NODE` is a custom node; replace it with the name of the node you actually run on.

```
WORKER_NODE="edge-node"
```
Create Dataset:

```
kubectl create -f - <<EOF
apiVersion: sedna.io/v1alpha1
kind: Dataset
metadata:
  name: incremental-dataset
spec:
  url: "/data/helmet_detection/train_data/train_data.txt"
  format: "txt"
  nodeName: $WORKER_NODE
EOF
```
Create Initial Model to simulate the initial model in the incremental learning scenario:

```
kubectl create -f - <<EOF
apiVersion: sedna.io/v1alpha1
kind: Model
metadata:
  name: initial-model
spec:
  url: "/models/base_model"
  format: "ckpt"
EOF
```
Create Deploy Model:

```
kubectl create -f - <<EOF
apiVersion: sedna.io/v1alpha1
kind: Model
metadata:
  name: deploy-model
spec:
  url: "/models/deploy_model/saved_model.pb"
  format: "pb"
EOF
```
Start the Incremental Learning Job

* Incremental learning supports both hot and cold model updates. The job uses cold model updates by default. If you want to use hot model updates, add the following fields:

```yaml
deploySpec:
  model:
    hotUpdateEnabled: true
    pollPeriodSeconds: 60  # default value is 60
```
* create the job:

```
IMAGE=kubeedge/sedna-example-incremental-learning-helmet-detection:v0.4.0

kubectl create -f - <<EOF
apiVersion: sedna.io/v1alpha1
kind: IncrementalLearningJob
metadata:
  name: helmet-detection-demo
spec:
  initialModel:
    name: "initial-model"
  dataset:
    name: "incremental-dataset"
    trainProb: 0.8
  trainSpec:
    template:
      spec:
        nodeName: $WORKER_NODE
        containers:
          - image: $IMAGE
            name: train-worker
            imagePullPolicy: IfNotPresent
            args: ["train.py"]
            env:
              - name: "batch_size"
                value: "32"
              - name: "epochs"
                value: "1"
              - name: "input_shape"
                value: "352,640"
              - name: "class_names"
                value: "person,helmet,helmet-on,helmet-off"
              - name: "nms_threshold"
                value: "0.4"
              - name: "obj_threshold"
                value: "0.3"
    trigger:
      checkPeriodSeconds: 60
      timer:
        start: 02:00
        end: 20:00
      condition:
        operator: ">"
        threshold: 500
        metric: num_of_samples
  evalSpec:
    template:
      spec:
        nodeName: $WORKER_NODE
        containers:
          - image: $IMAGE
            name: eval-worker
            imagePullPolicy: IfNotPresent
            args: ["eval.py"]
            env:
              - name: "input_shape"
                value: "352,640"
              - name: "class_names"
                value: "person,helmet,helmet-on,helmet-off"
  deploySpec:
    model:
      name: "deploy-model"
      hotUpdateEnabled: true
      pollPeriodSeconds: 60
    trigger:
      condition:
        operator: ">"
        threshold: 0.1
        metric: precision_delta
    hardExampleMining:
      name: "IBT"
      parameters:
        - key: "threshold_img"
          value: "0.9"
        - key: "threshold_box"
          value: "0.9"
    template:
      spec:
        nodeName: $WORKER_NODE
        containers:
          - image: $IMAGE
            name: infer-worker
            imagePullPolicy: IfNotPresent
            args: ["inference.py"]
            env:
              - name: "input_shape"
                value: "352,640"
              - name: "video_url"
                value: "file://video/video.mp4"
              - name: "HE_SAVED_URL"
                value: "/he_saved_url"
            volumeMounts:
              - name: localvideo
                mountPath: /video/
              - name: hedir
                mountPath: /he_saved_url
            resources:  # user defined resources
              limits:
                memory: 2Gi
        volumes:  # user defined volumes
          - name: localvideo
            hostPath:
              path: /incremental_learning/video/
              type: DirectoryOrCreate
          - name: hedir
            hostPath:
              path: /incremental_learning/he/
              type: DirectoryOrCreate
  outputDir: "/output"
EOF
```
1. The `Dataset` describes the labeled data, and `HE_SAVED_URL` indicates the address in the deploy container where hard examples are uploaded. Users label the hard examples at this address.
2. Ensure that the `outputDir` path in the YAML file exists on your node; this path is mounted directly into the container.
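The `IBT` hard example mining configured in `deploySpec` uses the two parameters `threshold_img` and `threshold_box`. As we read it (a simplified sketch, not Sedna's actual implementation), an image counts as a hard example when the fraction of its predicted boxes with confidence at or above `threshold_box` falls below `threshold_img`:

```python
def is_hard_example(box_scores, threshold_img=0.9, threshold_box=0.9):
    """Return True if the image should be uploaded as a hard example.

    Simplified reading of IBT ("image-box-threshold") mining: an image
    is hard when too few of its boxes are confidently detected.
    """
    if not box_scores:  # no detections at all: treat as hard
        return True
    confident = [s for s in box_scores if s >= threshold_box]
    return len(confident) / len(box_scores) < threshold_img

# Only 2 of 4 boxes reach 0.9 confidence -> ratio 0.5 < 0.9 -> hard example
print(is_hard_example([0.95, 0.92, 0.40, 0.35]))  # True
# All boxes confident -> ratio 1.0 -> not hard
print(is_hard_example([0.95, 0.93, 0.91]))        # False
```

With both thresholds set to 0.9 as in the job spec, nearly every box must be a confident detection for the frame to be skipped; otherwise it is saved to `HE_SAVED_URL` for labeling.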
### Check Incremental Learning Job

Query the service status:

```
kubectl get incrementallearningjob helmet-detection-demo
```

In the `IncrementalLearningJob` resource helmet-detection-demo, the following trigger is configured:

```
trigger:
  checkPeriodSeconds: 60
  timer:
    start: 02:00
    end: 20:00
  condition:
    operator: ">"
    threshold: 500
    metric: num_of_samples
```
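This trigger can be read as: every `checkPeriodSeconds` (60 s), and only inside the 02:00-20:00 timer window, check whether `num_of_samples` exceeds 500. A minimal sketch of that decision logic (an illustration only, not Sedna's code):

```python
from datetime import time

def should_trigger_training(num_of_samples, now,
                            start=time(2, 0), end=time(20, 0),
                            threshold=500):
    """Mirror the trigger: fire only inside the timer window and when
    num_of_samples is strictly greater than the threshold (operator ">")."""
    in_window = start <= now <= end
    return in_window and num_of_samples > threshold

print(should_trigger_training(600, time(10, 0)))  # True: in window and > 500
print(should_trigger_training(600, time(23, 0)))  # False: outside 02:00-20:00
print(should_trigger_training(500, time(10, 0)))  # False: not strictly greater
```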
### Hard Example Labeling

In the real world, we need to label the hard examples in `HE_SAVED_URL` with annotation tools and then put the labeled examples into the `Dataset`'s url.
You can use open-source annotation tools to label hard examples, such as [MAKE SENSE](https://www.makesense.ai), which has the following main advantages:

* Open source and free to use under the GPLv3 license
* Supports output file formats like YOLO, VOC XML, VGG JSON, and CSV
* No advanced installation required; just open up your browser
* Uses AI to make your work more productive
* Runs offline as a container, ensuring data security

![img.png](image/make-sense.png)

The labeling details are not described here; the main steps in this demo are as follows:

* import unlabeled hard examples into the annotation tool

![img_1.png](image/label-interface.png)

* label and export annotations

![img_2.png](image/export-label.png)![img_3.png](image/label-result.png)

* you will get YOLO-format annotations, so you need to convert them into the format used by your own training code. In this example, the following script is provided for reference:
```
import os

annotation_dir_path = "C:/Users/Administrator/Desktop/labeled_data"
save_path = "C:/Users/Administrator/Desktop/labeled_data/save_label.txt"


def convert_single_line(line):
    # Scale the normalized YOLO coordinates by 1000 and move the
    # class id from the front of the line to the end.
    line_list = []
    line = line.split(" ")
    for i in range(1, len(line)):
        line[i] = float(line[i])
        line[i] = line[i] * 1000
        line_list.append(str(int(line[i])))
    line_list.append(line[0])
    return ",".join(line_list)


if __name__ == '__main__':
    results = []
    for path, dir_list, file_list in os.walk(annotation_dir_path):
        for file_name in file_list:
            file_path = os.path.join(path, file_name)
            # Replace the ".txt" annotation suffix with ".jpg" so each
            # output line starts with the image file name.
            file_name = file_name.split("txt")[0] + 'jpg'
            single_label_string = file_name
            with open(file_path) as f:
                for line in f.readlines():
                    line = line.strip('\n')
                    single_label_string = single_label_string + " " + convert_single_line(line)
            results.append(single_label_string)
    with open(save_path, "w") as save_file:
        for result in results:
            save_file.write(result + "\n")
```
How to use:

* `annotation_dir_path`: location of the labeled annotations exported from MAKE SENSE
* `save_path`: location of the label txt file converted from the annotations
* run the script above to get a txt file that includes all label information
* put the txt file and the examples in the same directory
* you will then have labeled examples that meet the training requirements

![img_1.png](image/reverted_label.png)

* put these examples and annotations into the `Dataset`'s url
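To see what the conversion produces, here is a standalone copy of the script's `convert_single_line` applied to one YOLO-format line (class id followed by normalized center-x, center-y, width, height; the sample values are illustrative):

```python
def convert_single_line(line):
    """Scale the normalized YOLO coordinates by 1000 and move the
    class id from the front of the line to the end."""
    parts = line.split(" ")
    converted = [str(int(float(v) * 1000)) for v in parts[1:]]
    converted.append(parts[0])  # class id goes last
    return ",".join(converted)

# YOLO line: "<class> <cx> <cy> <w> <h>", coordinates normalized to [0, 1]
print(convert_single_line("1 0.5 0.5 0.2 0.3"))  # 500,500,200,300,1
```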
Without annotation tools, we can simulate the `num_of_samples` condition in the following way:

Download the [dataset](https://kubeedge.obs.cn-north-1.myhuaweicloud.com/examples/helmet-detection/dataset.tar.gz) to `$WORKER_NODE`:

```
cd /data/helmet_detection
wget https://kubeedge.obs.cn-north-1.myhuaweicloud.com/examples/helmet-detection/dataset.tar.gz
tar -zxvf dataset.tar.gz
```

The LocalController component checks the number of samples, finds that the trigger condition is met, and notifies the GlobalManager component to start the train worker.
When the train worker finishes, we can view the updated model in the `/output` directory on the `$WORKER_NODE` node.
Then the eval worker starts to evaluate the model that the train worker generated.

If the eval result satisfies the `deploySpec`'s trigger

```
trigger:
  condition:
    operator: ">"
    threshold: 0.1
    metric: precision_delta
```

the deploy worker will load the new model and provide service.
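As we read it, `precision_delta` compares the newly trained model's precision against that of the currently deployed model; the new model is deployed only when the improvement exceeds the threshold. A sketch of this decision under that assumed semantics (not Sedna's actual code):

```python
def should_deploy(new_precision, deployed_precision, threshold=0.1):
    """Deploy only when the new model improves precision by more than
    the threshold (operator ">", metric precision_delta)."""
    precision_delta = new_precision - deployed_precision
    return precision_delta > threshold

print(should_deploy(0.85, 0.70))  # True: delta 0.15 exceeds 0.1
print(should_deploy(0.78, 0.70))  # False: delta 0.08 does not
```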
### Effect Display

In this example, false and missed detections occur during inference before incremental learning; after incremental learning, all targets are correctly detected.

![img_1.png](image/effect_comparison.png)