In this version of the Sedna lifelong learning framework, we realize the following features:
We propose using Kubernetes Custom Resource Definitions (CRDs) to describe
the lifelong learning specification/status and a controller to synchronize these updates between edge and cloud.
A lifelong learning job has three stages: train, eval, and deploy.
Each stage passes through a sequence of states.
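
As a rough illustration of the stage model, the sketch below encodes the three stages named above and a helper that picks the next stage. The state names and the wrap-around from deploy back to train are assumptions made for this example and are not taken from this section.

```python
from enum import Enum


class Stage(Enum):
    TRAIN = "train"
    EVAL = "eval"
    DEPLOY = "deploy"


class State(Enum):
    # Hypothetical state names, chosen only for illustration.
    WAITING = "Waiting"
    READY = "Ready"
    STARTING = "Starting"
    RUNNING = "Running"
    COMPLETED = "Completed"
    FAILED = "Failed"


STAGE_ORDER = [Stage.TRAIN, Stage.EVAL, Stage.DEPLOY]


def next_stage(stage: Stage) -> Stage:
    """Return the stage after `stage`.

    Assumption: after deploy the job loops back to train.
    """
    index = STAGE_ORDER.index(stage)
    return STAGE_ORDER[(index + 1) % len(STAGE_ORDER)]
```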
The LifelongLearningJob CRD will be namespace-scoped.
The table below summarizes the group, kind, and API version details for the CRD.
| Field | Description |
|---|---|
| Group | sedna.io |
| APIVersion | v1alpha1 |
| Kind | LifelongLearningJob |
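
To show how the group, version, and kind fit together, here is a minimal sketch that builds the skeleton of a namespaced manifest. The helper name `job_manifest` is hypothetical, and the `spec` is left empty because its fields are defined in the CRD source rather than in this section.

```python
from typing import Optional

GROUP = "sedna.io"
VERSION = "v1alpha1"
KIND = "LifelongLearningJob"


def job_manifest(name: str, namespace: str, spec: Optional[dict] = None) -> dict:
    """Build a manifest skeleton for a namespace-scoped LifelongLearningJob."""
    return {
        # Kubernetes combines group and version as "<group>/<version>".
        "apiVersion": f"{GROUP}/{VERSION}",
        "kind": KIND,
        # The CRD is namespace-scoped, so every object carries a namespace.
        "metadata": {"name": name, "namespace": namespace},
        "spec": spec if spec is not None else {},
    }
```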
See the CRD source for details.
See the Go source for details.
OpenAPI v3 schema-based validation can be used to guard against bad requests.
It can reject invalid field values, for example a string value for a boolean field.
Here is a list of validations we need to support:
- The dataset specified in the CRD should exist in Kubernetes.

See the source for an example.
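
The two kinds of checks can be sketched as follows. This is only an illustration: the field names `enabled` and `datasetName`, the schema format, and both function names are assumptions, and a real existence check would query the API server instead of a local set.

```python
def validate_types(spec: dict, schema: dict) -> list:
    """Return one error message per field whose value has the wrong type.

    `schema` maps a field name to the Python type it must have.
    """
    errors = []
    for field, expected in schema.items():
        if field in spec and not isinstance(spec[field], expected):
            actual = type(spec[field]).__name__
            errors.append(f"{field}: expected {expected.__name__}, got {actual}")
    return errors


def validate_dataset_exists(spec: dict, known_datasets: set) -> list:
    """Return an error if the referenced dataset is not among the known ones."""
    name = spec.get("datasetName")  # hypothetical field name
    if name not in known_datasets:
        return [f"dataset {name!r} not found"]
    return []
```

For example, `validate_types({"enabled": "true"}, {"enabled": bool})` reports the string-for-boolean case mentioned above.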
The lifelong learning controller starts three separate goroutines: the upstream controller, the downstream controller, and the lifelong-learning-job controller.
These are not separate controllers as such; they are named here for clarity.
The lifelong-learning controller watches for updates to lifelong-learning jobs and their corresponding pods on the Kubernetes API server.
Updates are categorized below along with the possible actions:
| Update Type | Action |
|---|---|
| New lifelong-learning-job Created | Wait until the train trigger is satisfied. |
| lifelong-learning-job Deleted | N/A. The workers are deleted by Kubernetes garbage collection. |
| The Status of lifelong-learning-job Updated | Create the train/eval worker if it's ready. |
| The corresponding pod created/running/completed/failed | Update the status of lifelong-learning job. |
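
The handlers below are a minimal sketch of the table above, not the controller's actual code. The job is modeled as a plain dict, the state names are hypothetical, and the mapping from pod phase to job state is an assumption; only the pod phase names themselves come from Kubernetes.

```python
# Assumed mapping from Kubernetes pod phases to (hypothetical) job states.
POD_PHASE_TO_STATE = {
    "Pending": "Starting",
    "Running": "Running",
    "Succeeded": "Completed",
    "Failed": "Failed",
}


def on_job_created(job: dict) -> None:
    """New job: record that it is waiting for the train trigger."""
    job["status"] = {"stage": "train", "state": "Waiting"}


def on_job_status_updated(job: dict, create_worker) -> None:
    """Status updated: create the train/eval worker once the stage is ready."""
    status = job["status"]
    if status["stage"] in ("train", "eval") and status["state"] == "Ready":
        create_worker(status["stage"])
        status["state"] = "Starting"


def on_pod_event(job: dict, pod_phase: str) -> None:
    """Pod changed: reflect the pod phase in the job status."""
    state = POD_PHASE_TO_STATE.get(pod_phase)
    if state is not None:
        job["status"]["state"] = state
```

Job deletion needs no handler in this sketch, since the table leaves worker cleanup to Kubernetes garbage collection.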
The downstream controller watches for the lifelong-learning job updates against the K8S API server.
Updates are categorized below along with the possible actions that the downstream controller can take:
| Update Type | Action |
|---|---|
| New Lifelong-learning-job Created | Sends the job information to LCs. |
| Lifelong-learning-job Deleted | The controller sends the delete event to LCs. |
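
A small sketch of the downstream behavior in the table above: created and deleted events are forwarded to the LCs. The `nodes` field, the message layout, and the `send_to_lc` callback are assumptions for illustration; this section does not say how the target LCs are selected or how messages are transported.

```python
def downstream_sync(event_type: str, job: dict, send_to_lc) -> None:
    """Forward job created/deleted events to the LC on each target node."""
    if event_type not in ("created", "deleted"):
        return  # other event types are not listed in the table
    for node in job.get("nodes", []):  # hypothetical list of edge nodes
        send_to_lc(node, {"operation": event_type, "job": job["name"]})
```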
The upstream controller watches for the lifelong-learning job updates from the edge node and applies these updates against the API server in the cloud.
Updates are categorized below along with the possible actions that the upstream controller can take:
| Update Type | Action |
|---|---|
| Lifelong-learning-job Reported State Updated | The controller appends the status reported by the LC to the job's status in the cloud. |
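
The append step can be pictured as follows. This is a sketch under assumptions: the status is modeled as a dict with a `conditions` list, and both that field name and the report layout are hypothetical.

```python
import copy


def apply_reported_status(cloud_status: dict, report: dict) -> dict:
    """Return a copy of the cloud-side status with the LC report appended."""
    updated = copy.deepcopy(cloud_status)  # leave the caller's object untouched
    updated.setdefault("conditions", []).append(dict(report))
    return updated
```

Returning a copy rather than mutating in place keeps the previous status available for comparison before the update is written to the API server.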
train stage:

eval stage:

deploy stage:

No need to communicate between workers.