
Enhance federated learning example

Signed-off-by: JimmyYang <yangjin39@huawei.com>
tags/v0.1.0
JimmyYang 4 years ago
parent
commit
b0b7cb992d
1 changed file with 95 additions and 51 deletions

examples/federated_learning/surface_defect_detection/README.md (+95 -51)

@@ -5,31 +5,51 @@ Using Federated Learning, we can solve the problem. Each place uses its own data


## Surface Defect Detection Experiment
> Assume that there are two edge nodes (edge1 and edge2) and a cloud node. Data on the edge nodes cannot be migrated to the cloud due to privacy issues.
> Assume that there are two edge nodes and a cloud node. Data on the edge nodes cannot be migrated to the cloud due to privacy issues.
> Based on this scenario, we will demonstrate the surface inspection.

### Prepare Nodes
```
CLOUD_NODE="cloud-node-name"
EDGE1_NODE="edge1-node-name"
EDGE2_NODE="edge2-node-name"
```
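
The node names above are placeholders that later steps substitute into the manifests. As a minimal sketch (not part of Sedna), a guard like the one below could confirm that all three variables are set before continuing; `require_node_vars` is a hypothetical helper name.

```shell
# Hypothetical helper (not part of Sedna): fail early if a node variable is unset or empty.
require_node_vars() {
  missing=0
  for name in CLOUD_NODE EDGE1_NODE EDGE2_NODE; do
    # Look up the value of the variable whose name is stored in $name.
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "missing: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

Calling `require_node_vars` after the assignments returns a non-zero status if any name is empty. The real node names can be looked up with `kubectl get nodes`.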

### Install Sedna

Follow the [Sedna installation document](/docs/setup/install.md) to install Sedna.
### Prepare Dataset

Download [dataset](https://github.com/abin24/Magnetic-tile-defect-datasets.) and the [label file](/examples/federated_learning/surface_defect_detection/data/1.txt) to `/data` of edge1.
Download [dataset](https://github.com/abin24/Magnetic-tile-defect-datasets.) and the [label file](/examples/federated_learning/surface_defect_detection/data/2.txt) to `/data` of edge2.

### Prepare Script
Download the [dataset](https://github.com/abin24/Magnetic-tile-defect-datasets.) and the [label file](/examples/federated_learning/surface_defect_detection/data/1.txt) to `/data` on `$EDGE1_NODE`.
```
mkdir -p /data
cd /data
git clone https://github.com/abin24/Magnetic-tile-defect-datasets..git Magnetic-tile-defect-datasets
curl -o 1.txt https://raw.githubusercontent.com/kubeedge/sedna/main/examples/federated_learning/surface_defect_detection/data/1.txt
```

Download the script [aggregate.py](/examples/federated_learning/surface_defect_detection/aggregation_worker/aggregate.py) to the `/code` of cloud node.
Download the [dataset](https://github.com/abin24/Magnetic-tile-defect-datasets.) and the [label file](/examples/federated_learning/surface_defect_detection/data/2.txt) to `/data` on `$EDGE2_NODE`.
```
mkdir -p /data
cd /data
git clone https://github.com/abin24/Magnetic-tile-defect-datasets..git Magnetic-tile-defect-datasets
curl -o 2.txt https://raw.githubusercontent.com/kubeedge/sedna/main/examples/federated_learning/surface_defect_detection/data/2.txt
```
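
Before creating the Dataset resources, it may help to confirm that the clone and the label file landed where the later manifests expect them. The sketch below assumes the layout produced by the commands above; `check_dataset` is a hypothetical helper, and it only checks presence, not the label file's contents.

```shell
# Hypothetical helper: confirm the dataset directory and label file are in place.
check_dataset() {
  data_dir=$1      # e.g. /data
  label_file=$2    # e.g. 1.txt on EDGE1_NODE, 2.txt on EDGE2_NODE
  if [ ! -d "$data_dir/Magnetic-tile-defect-datasets" ]; then
    echo "dataset directory not found under $data_dir" >&2
    return 1
  fi
  # -s is true only when the file exists and is not empty.
  if [ ! -s "$data_dir/$label_file" ]; then
    echo "label file $label_file missing or empty" >&2
    return 1
  fi
  echo "$(wc -l < "$data_dir/$label_file") lines in $label_file"
}
```

For example, `check_dataset /data 1.txt` would be run on `$EDGE1_NODE` and `check_dataset /data 2.txt` on `$EDGE2_NODE`.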

Download the script [training_worker](/examples/federated_learning/surface_defect_detection/training_worker/train.py) to the `/code` of edge1 and edge2.
### Prepare Images
This example uses these images:
1. aggregation worker: ```kubeedge/sedna-example-federated-learning-surface-defect-detection-aggregation:v0.1.0```
2. train worker: ```kubeedge/sedna-example-federated-learning-surface-defect-detection-train:v0.1.0```

These images are generated by the script [build_image.sh](/examples/build_image.sh).

### Create Federated Learning Job

#### Create Dataset

Create the dataset for `$EDGE1_NODE`:
```
# create dataset for edge1
kubectl create -f - <<EOF
apiVersion: sedna.io/v1alpha1
kind: Dataset
@@ -38,10 +58,12 @@ metadata:
spec:
url: "/data/1.txt"
format: "txt"
nodeName: "edge1"
nodeName: $EDGE1_NODE
EOF
```

# create dataset for edge2
Create the dataset for `$EDGE2_NODE`:
```
kubectl create -f - <<EOF
apiVersion: sedna.io/v1alpha1
kind: Dataset
@@ -50,12 +72,22 @@ metadata:
spec:
url: "/data/2.txt"
format: "txt"
nodeName: "edge2"
nodeName: $EDGE2_NODE
EOF
```
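
The new manifests rely on the shell expanding `$EDGE1_NODE` and `$EDGE2_NODE` inside the unquoted here-document before `kubectl` reads it. The sketch below uses `cat` in place of `kubectl create -f -` to preview that behaviour; the function names are hypothetical and the node name is the placeholder from the Prepare Nodes step.

```shell
EDGE1_NODE="edge1-node-name"

# Unquoted delimiter (EOF): the shell substitutes $EDGE1_NODE before the command reads stdin.
render_expanded() {
  cat <<EOF
nodeName: $EDGE1_NODE
EOF
}

# Quoted delimiter ('EOF'): the text passes through literally, so the variable is not replaced.
render_literal() {
  cat <<'EOF'
nodeName: $EDGE1_NODE
EOF
}
```

Swapping `kubectl create -f -` for `cat` in the commands above is a quick way to inspect the rendered manifest without creating anything on the cluster.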

#### Create Model

Create the directory `/model` on the host of `$EDGE1_NODE`:
```
mkdir /model
```
Create the directory `/model` on the host of `$EDGE2_NODE`:
```
mkdir /model
```

Create the model:
```
kubectl create -f - <<EOF
apiVersion: sedna.io/v1alpha1
@@ -80,46 +112,58 @@ spec:
aggregationWorker:
model:
name: "surface-defect-detection-model"
nodeName: "cloud0"
workerSpec:
scriptDir: "/code"
scriptBootFile: "aggregate.py"
frameworkType: "tensorflow"
frameworkVersion: "2.3"
parameters:
- key: "exit_round"
value: "3"
template:
spec:
nodeName: $CLOUD_NODE
containers:
- image: kubeedge/sedna-example-federated-learning-surface-defect-detection-aggregation:v0.1.0
name: agg-worker
imagePullPolicy: IfNotPresent
env: # user defined environments
- name: "exit_round"
value: "3"
resources: # user defined resources
limits:
memory: 2Gi
trainingWorkers:
- nodeName: "edge1"
dataset:
name: "edge-1-surface-defect-detection-dataset"
workerSpec:
scriptDir: "/code"
scriptBootFile: "train.py"
frameworkType: "tensorflow"
frameworkVersion: "2.3"
parameters:
- key: "batch_size"
value: "32"
- key: "learning_rate"
value: "0.001"
- key: "epochs"
value: "1"
- nodeName: "edge2"
dataset:
name: "edge-2-surface-defect-detection-dataset"
workerSpec:
scriptDir: "/code"
scriptBootFile: "train.py"
frameworkType: "tensorflow"
frameworkVersion: "2.3"
parameters:
- key: "batch_size"
value: "32"
- key: "learning_rate"
value: "0.001"
- key: "epochs"
value: "1"
- dataset:
name: "edge1-surface-defect-detection-dataset"
template:
spec:
nodeName: $EDGE1_NODE
containers:
- image: kubeedge/sedna-example-federated-learning-surface-defect-detection-train:v0.1.0
name: train-worker
imagePullPolicy: IfNotPresent
env: # user defined environments
- name: "batch_size"
value: "32"
- name: "learning_rate"
value: "0.001"
- name: "epochs"
value: "2"
resources: # user defined resources
limits:
memory: 2Gi
- dataset:
name: "edge2-surface-defect-detection-dataset"
template:
spec:
nodeName: $EDGE2_NODE
containers:
- image: kubeedge/sedna-example-federated-learning-surface-defect-detection-train:v0.1.0
name: train-worker
imagePullPolicy: IfNotPresent
env: # user defined environments
- name: "batch_size"
value: "32"
- name: "learning_rate"
value: "0.001"
- name: "epochs"
value: "2"
resources: # user defined resources
limits:
memory: 2Gi
EOF
```

@@ -130,4 +174,4 @@ kubectl get federatedlearningjob surface-defect-detection
```

### Check Federated Learning Train Result
After the job completed, you will find the model generated on the path `/model` in edge1 and edge2.
After the job completes, you will find the generated model in the directory `/model` on `$EDGE1_NODE` and `$EDGE2_NODE`.
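
A rough way to confirm this on each edge node is to check that `/model` exists and is no longer empty. The sketch below is an assumption about how one might check; `model_generated` is a hypothetical helper, and it makes no claim about the names of the files the training worker writes.

```shell
# Hypothetical helper: succeed only if the model directory exists and has at least one entry.
model_generated() {
  model_dir=${1:-/model}
  [ -d "$model_dir" ] && [ -n "$(ls -A "$model_dir")" ]
}
```

Running `model_generated && ls -l /model` on an edge node lists the output only when something has been written.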
