Merge branch 'dev' into 'doc'

# Conflicts:
#   docs/index.rst
#   docs/workflow/Identify helpful learnwares.rst
tags/v0.3.2
Xiaodong Bi 2 years ago
parent
commit
a5324d762f
12 changed files with 183 additions and 66 deletions
  1. .readthedocs.yaml (+21, -0)
  2. docs/conf.py (+31, -16)
  3. docs/index.rst (+1, -1)
  4. docs/start/get_start_examples.rst (+63, -0)
  5. docs/start/quick.rst (+2, -2)
  6. docs/workflow/Identify helpful learnwares.rst (+1, -5)
  7. examples/workflow_by_code/learnware_example/README.md (+1, -1)
  8. examples/workflow_by_code/learnware_example/example_init.py (+2, -2)
  9. examples/workflow_by_code/main.py (+52, -34)
  10. learnware/learnware/base.py (+3, -2)
  11. learnware/learnware/reuse.py (+3, -3)
  12. learnware/market/easy.py (+3, -0)

.readthedocs.yaml (+21, -0)

@@ -0,0 +1,21 @@
# .readthedocs.yml
# Read the Docs configuration file
# See https://docs.readthedocs.io/en/stable/config-file/v2.html for details

# Required
version: 2

# Set the os and other tools you might need
build:
  os: ubuntu-22.04

# Build documentation in the docs/ directory with Sphinx
sphinx:
  configuration: docs/conf.py

# Build all formats
formats: all

# Optionally set the version of Python and requirements required to build your docs
python:
  version: 3.8

docs/conf.py (+31, -16)

@@ -90,24 +90,39 @@ html_theme = "sphinx_rtd_theme"


html_logo = "_static/img/logo/logo1.png"



# Theme options are theme-specific and customize the look and feel of a theme
# further. For a list of options available for each theme, see the
# documentation.
# html_context = {
# "display_github": False,
# "last_updated": True,
# "commit": True,
# "github_user": "Microsoft",
# "github_repo": "QLib",
# 'github_version': 'master',
# 'conf_py_path': '/docs/',

# }
#
html_theme_options = {
    "logo_only": True,
    "collapse_navigation": False,
    "display_version": False,
    "navigation_depth": 3,
    "navigation_depth": 4,
}


# Custom sidebar templates, must be a dictionary that maps document names
# to template names.
#
# This is required for the alabaster theme
# refs: http://alabaster.readthedocs.io/en/latest/installation.html#sidebars
html_sidebars = {
    "**": [
        "about.html",
        "navigation.html",
        "relations.html",  # needs 'show_related': True theme option to display
        "searchbox.html",
    ]
}


# -- Options for HTMLHelp output ------------------------------------------

# Output file base name for HTML help builder.
htmlhelp_basename = "learnwaredoc"


autodoc_member_order = "bysource"
autodoc_default_flags = ["members"]
autodoc_default_options = {
    "members": True,
    "member-order": "bysource",
    "special-members": "__init__",
}

docs/index.rst (+1, -1)

@@ -4,7 +4,7 @@
contain the root `toctree` directive.

============================================================
``Learnware`` Documentation
``Learnware Market`` Documentation
============================================================


``Learnware`` is a model sharing platform, which gives a basic implementation of the learnware paradigm. A learnware is a well-performing trained machine learning model with a specification that enables it to be adequately identified for reuse according to the requirements of future users who may know nothing about the learnware in advance. The learnware paradigm can solve entangled problems in the current machine learning paradigm, like continual learning and catastrophic forgetting. It also reduces the resources needed to train a well-performing model.


docs/start/get_start_examples.rst (+63, -0)

@@ -0,0 +1,63 @@
.. _examples:
================================
Experiments & Get Start Examples
================================

This chapter presents experiments that illustrate the search and reuse performance of our learnware system.

================
Environment
================
For all experiments, we used a single Linux server. Its specifications are listed in the table below. All processors were used for training and evaluation.

==================== ==================== ===============================
System               GPU                  CPU
==================== ==================== ===============================
Ubuntu 20.04.4 LTS   Nvidia Tesla V100S   Intel(R) Xeon(R) Gold 6240R
==================== ==================== ===============================


================
Experiments
================

Datasets
================
We designed experiments on three publicly available datasets, namely `Predict Future Sales (PFS) <https://www.kaggle.com/c/competitive-data-science-predict-future-sales/data>`_,
`M5 Forecasting (M5) <https://www.kaggle.com/competitions/m5-forecasting-accuracy/data>`_ and `CIFAR 10 <https://www.cs.toronto.edu/~kriz/cifar.html>`_.
For the two sales forecasting datasets, PFS and M5, we split the user data by store and train Ridge and LightGBM models on the corresponding data, respectively.
For the CIFAR10 image classification task, we first randomly pick 6 to 10 categories and then randomly select 800 to 2000 samples per category from the training set, yielding a total of 50 different uploaders.
For test users, we first randomly pick 3 to 6 categories and then randomly select 150 to 350 samples per category from the test set, yielding a total of 20 different users.
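
The CIFAR10 split described above can be sketched as follows. This is only an illustration using the standard library: the helper name ``sample_class_subset`` and the toy label list are assumptions made for this example and are not taken from the experiment code.

```python
import random


def sample_class_subset(labels, n_classes_range, n_per_class_range, rng):
    """Pick a random set of classes, then a random number of sample indices per class."""
    # Group sample indices by their class label.
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)

    # Choose how many classes this participant sees, then which ones.
    n_classes = rng.randint(*n_classes_range)
    chosen = rng.sample(sorted(by_class), n_classes)

    indices = []
    for label in chosen:
        # Draw the per-class sample count, capped at what is available.
        n_samples = min(rng.randint(*n_per_class_range), len(by_class[label]))
        indices.extend(rng.sample(by_class[label], n_samples))
    return chosen, indices


# Toy stand-in for the CIFAR10 training labels: 10 classes, 5000 samples each.
rng = random.Random(0)
toy_labels = [i % 10 for i in range(50000)]

# 50 uploaders, each with 6 to 10 classes and 800 to 2000 samples per class.
uploaders = [sample_class_subset(toy_labels, (6, 10), (800, 2000), rng) for _ in range(50)]
```

Test users would be drawn the same way from the test labels, with the ranges ``(3, 6)`` and ``(150, 350)``.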

We evaluated the efficiency of specification generation and the accuracy of learnware search and reuse.
The evaluation metric on the PFS and M5 data is RMSE, and the evaluation metric on the CIFAR10 classification task is classification accuracy.
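
For reference, the two metrics named above can be written in a few lines. This is a plain-Python sketch of the standard definitions; the function names ``rmse`` and ``accuracy`` are chosen for this example and are not claimed to match the experiment code.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error between two equal-length sequences of numbers."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def accuracy(y_true, y_pred):
    """Fraction of predictions that equal the true label."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Lower RMSE is better (regression); higher accuracy is better (classification).
sales_error = rmse([3.0, 5.0], [1.0, 5.0])
image_accuracy = accuracy([0, 1, 2, 2], [0, 1, 1, 2])
```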

Results
================

The time consumed by specification generation is shown in the table below:

==================== ==================== =================================
Dataset              Data Dimensions      Specification Generation Time (s)
==================== ==================== =================================
PFS
M5
CIFAR10              9000*3*32*32         7~10
==================== ==================== =================================

The accuracy of search and reuse is shown in the table below:

==================== ==================== ================================= =================================
Dataset              Top-1 Performance    Job Selector Reuse                Average Ensemble Reuse
==================== ==================== ================================= =================================
PFS
M5
CIFAR10              0.619 +/- 0.138      0.585 +/- 0.056                   0.715 +/- 0.075
==================== ==================== ================================= =================================

=========================
Get Start Examples
=========================
Examples for `PFS, M5` and `CIFAR10` are available at [xxx]. You can run ``main.py`` directly to reproduce the related experiments.
The test code consists of three parts: data preparation (optional), specification generation and market construction, and search testing.
You can load the data prepared by us and skip the data preparation step.

docs/start/quick.rst (+2, -2)

@@ -25,7 +25,7 @@ Learnware is currently hosted on `PyPI <https://pypi.org/>`__. You can easily in


.. code-block::

    conda install -c pytorch fais
    conda install -c pytorch faiss
    pip install learnware




@@ -153,7 +153,7 @@ For example, the following code is designed to work with Reduced Set Kernel Embe
    user_spec = specification.rkme.RKMEStatSpecification()
    user_spec.load(os.path.join(unzip_path, "rkme.json"))
    user_info = BaseUserInfo(
        id="user", semantic_spec=user_semantic, stat_info={"RKMEStatSpecification": user_spec}
        semantic_spec=user_semantic, stat_info={"RKMEStatSpecification": user_spec}
    )
    (sorted_score_list, single_learnware_list,
        mixture_score, mixture_learnware_list) = easy_market.search_learnware(user_info)


docs/workflow/Identify helpful learnwares.rst (+1, -5)

@@ -23,14 +23,10 @@ The semantic specification ``user_semantic`` is stored in a ``dict``, with keywo
        "Name": {"Values": "digits", "Type": "String"},
    }

.. _semantic_specification:

.. figure: ..\_static\img\semantic_spec.png
   :alt: Semantic Specification
.. image:: ../_static/img/semantic_spec.png
   :align: center

Reference it with :ref:`semantic_specification`.

The user's statistical information ``stat_info`` is stored in a ``json`` file, e.g., ``stat.json``. The generation of this file is described in `这是一个语义规约生成的链接`_.




examples/workflow_by_code/learnware_example/README.md (+1, -1)

@@ -1,5 +1,5 @@
## How to Generate Environment Yaml


## 2 Environment
* create env config for conda: * create env config for conda:
```shell ```shell
conda env export | grep -v "^prefix: " > environment.yml conda env export | grep -v "^prefix: " > environment.yml


examples/workflow_by_code/learnware_example/example_init.py (+2, -2)

@@ -6,7 +6,7 @@ from learnware.model import BaseModel


class SVM(BaseModel):
    def __init__(self):
        super(SVM, self).__init__(input_shape=(20,), output_shape=())
        super(SVM, self).__init__(input_shape=(64,), output_shape=(10,))
        dir_path = os.path.dirname(os.path.abspath(__file__))
        self.model = joblib.load(os.path.join(dir_path, "svm.pkl"))


@@ -14,7 +14,7 @@ class SVM(BaseModel):
        pass

    def predict(self, X: np.ndarray) -> np.ndarray:
        return self.model.predict(X)
        return self.model.predict_proba(X)

    def finetune(self, X: np.ndarray, y: np.ndarray):
        pass

examples/workflow_by_code/main.py (+52, -34)

@@ -1,39 +1,30 @@
import os
import fire
import copy
import joblib
import zipfile
import numpy as np
from sklearn import svm
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from shutil import copyfile, rmtree

import learnware
from learnware.market import EasyMarket, BaseUserInfo
from learnware.market import database_ops
from learnware.learnware import Learnware
from learnware.learnware import JobSelectorReuser, AveragingReuser
import learnware.specification as specification
from learnware.utils import get_module_by_module_path

curr_root = os.path.dirname(os.path.abspath(__file__))

semantic_specs = [
    {
        "Data": {"Values": ["Tabular"], "Type": "Class"},
        "Task": {"Values": ["Classification"], "Type": "Class"},
        "Library": {"Values": ["Scikit-learn"], "Type": "Class"},
        "Scenario": {"Values": ["Business"], "Type": "Tag"},
        "Description": {"Values": "", "Type": "String"},
        "Name": {"Values": "learnware_1", "Type": "String"},
    }
]

user_semantic = {
    "Data": {"Values": ["Tabular"], "Type": "Class"},
    "Task": {
        "Values": ["Classification"],
        "Type": "Class",
    },
    "Library": {"Values": ["Scikit-learn"], "Type": "Tag"},
    "Scenario": {"Values": ["Business"], "Type": "Class"},
    "Library": {"Values": ["Scikit-learn"], "Type": "Class"},
    "Scenario": {"Values": ["Education"], "Type": "Tag"},
    "Description": {"Values": "", "Type": "String"},
    "Name": {"Values": "", "Type": "String"},
}
@@ -44,22 +35,23 @@ class LearnwareMarketWorkflow:
        """initialize learnware market"""
        learnware.init()
        np.random.seed(2023)
        easy_market = EasyMarket(market_id="workflow_by_code", rebuild=True)
        easy_market = EasyMarket(market_id="sklearn_digits", rebuild=True)
        return easy_market

    def prepare_learnware_randomly(self, learnware_num=10):
    def prepare_learnware_randomly(self, learnware_num=5):
        self.zip_path_list = []
        X, y = load_digits(return_X_y=True)

        for i in range(learnware_num):
            dir_path = os.path.join(curr_root, "learnware_pool", "svm_%d" % (i))
            os.makedirs(dir_path, exist_ok=True)

            print("Preparing Learnware: %d" % (i))
            data_X = np.random.randn(5000, 20) * i
            data_y = np.random.randn(5000)
            data_y = np.where(data_y > 0, 1, 0)

            clf = svm.SVC(kernel="linear")
            data_X, _, data_y, _ = train_test_split(X, y, test_size=0.3, shuffle=True)
            clf = svm.SVC(kernel="linear", probability=True)
            clf.fit(data_X, data_y)

            joblib.dump(clf, os.path.join(dir_path, "svm.pkl"))

            spec = specification.utils.generate_rkme_spec(X=data_X, gamma=0.1, cuda_idx=0)
@@ -95,7 +87,7 @@ class LearnwareMarketWorkflow:
        print("Total Item:", len(easy_market))

        for idx, zip_path in enumerate(self.zip_path_list):
            semantic_spec = semantic_specs[0]
            semantic_spec = copy.deepcopy(user_semantic)
            semantic_spec["Name"]["Values"] = "learnware_%d" % (idx)
            semantic_spec["Description"]["Values"] = "test_learnware_number_%d" % (idx)
            easy_market.add_learnware(zip_path, semantic_spec)
@@ -107,7 +99,6 @@ class LearnwareMarketWorkflow:
if delete: if delete:
for learnware_id in curr_inds: for learnware_id in curr_inds:
easy_market.delete_learnware(learnware_id) easy_market.delete_learnware(learnware_id)
easy_market.delete_learnware(learnware_id)
curr_inds = easy_market._get_ids() curr_inds = easy_market._get_ids()
print("Available ids After Deleting Learnwares:", curr_inds) print("Available ids After Deleting Learnwares:", curr_inds)


@@ -119,22 +110,23 @@ class LearnwareMarketWorkflow:


        test_folder = os.path.join(curr_root, "test_semantics")

        idx, zip_path = 1, self.zip_path_list[1]
        unzip_dir = os.path.join(test_folder, f"{idx}")

        # unzip -o -q zip_path -d unzip_dir
        if os.path.exists(unzip_dir):
            rmtree(unzip_dir)
        os.makedirs(unzip_dir, exist_ok=True)
        if os.path.exists(test_folder):
            rmtree(test_folder)
        os.makedirs(test_folder, exist_ok=True)

        with zipfile.ZipFile(self.zip_path_list[0], "r") as zip_obj:
            zip_obj.extractall(path=test_folder)

        with zipfile.ZipFile(zip_path, "r") as zip_obj:
            zip_obj.extractall(path=unzip_dir)
        semantic_spec = copy.deepcopy(user_semantic)
        semantic_spec["Name"]["Values"] = f"learnware_{learnware_num - 1}"
        semantic_spec["Description"]["Values"] = f"test_learnware_number_{learnware_num - 1}"

        user_info = BaseUserInfo(semantic_spec=user_semantic)
        _, single_learnware_list, _ = easy_market.search_learnware(user_info)
        user_info = BaseUserInfo(semantic_spec=semantic_spec)
        _, single_learnware_list, _, _ = easy_market.search_learnware(user_info)

        print("User info:", user_info.get_semantic_spec())
        print(f"search result of user{idx}:")
        print(f"Search result:")
        for learnware in single_learnware_list:
            print("Choose learnware:", learnware.id, learnware.get_specification().get_semantic_spec())


@@ -175,6 +167,32 @@ class LearnwareMarketWorkflow:


rmtree(test_folder) # rm -r test_folder rmtree(test_folder) # rm -r test_folder


    def test_learnware_reuse(self, learnware_num=5):
        easy_market = self.test_upload_delete_learnware(learnware_num, delete=False)
        print("Total Item:", len(easy_market))

        X, y = load_digits(return_X_y=True)
        _, data_X, _, data_y = train_test_split(X, y, test_size=0.3, shuffle=True)

        stat_spec = specification.utils.generate_rkme_spec(X=data_X, gamma=0.1, cuda_idx=0)
        user_info = BaseUserInfo(semantic_spec=user_semantic, stat_info={"RKMEStatSpecification": stat_spec})

        _, _, _, mixture_learnware_list = easy_market.search_learnware(user_info)

        # print("Mixture Learnware:", mixture_learnware_list)

        # Based on user information, the learnware market returns a list of learnwares (learnware_list)
        # Use jobselector reuser to reuse the searched learnwares to make prediction
        reuse_job_selector = JobSelectorReuser(learnware_list=mixture_learnware_list)
        job_selector_predict_y = reuse_job_selector.predict(user_data=data_X)

        # Use averaging ensemble reuser to reuse the searched learnwares to make prediction
        reuse_ensemble = AveragingReuser(learnware_list=mixture_learnware_list)
        ensemble_predict_y = reuse_ensemble.predict(user_data=data_X)

        print("Job Selector Acc:", np.sum(np.argmax(job_selector_predict_y, axis=1) == data_y) / len(data_y))
        print("Averaging Selector Acc:", np.sum(np.argmax(ensemble_predict_y, axis=1) == data_y) / len(data_y))


if __name__ == "__main__":
    fire.Fire(LearnwareMarketWorkflow)
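
The accuracy lines in `test_learnware_reuse` take an argmax over per-class scores, which is why this commit switches the example model from `predict` to `predict_proba`. The sketch below illustrates the general idea of averaging class probabilities from several models and scoring the result. It is a standard-library illustration only and is not the implementation of `AveragingReuser`; the helper names and the toy probabilities are assumptions made for this example.

```python
def average_probabilities(per_model_probs):
    """Average class-probability rows across models.

    per_model_probs[m][i][c] is the probability model m assigns to class c for sample i.
    """
    n_models = len(per_model_probs)
    n_samples = len(per_model_probs[0])
    n_classes = len(per_model_probs[0][0])
    return [
        [sum(model[i][c] for model in per_model_probs) / n_models for c in range(n_classes)]
        for i in range(n_samples)
    ]


def argmax(row):
    """Index of the largest value in a row (first one wins on ties)."""
    return max(range(len(row)), key=row.__getitem__)


# Two hypothetical models, two samples, three classes.
probs_a = [[0.7, 0.2, 0.1], [0.1, 0.5, 0.4]]
probs_b = [[0.5, 0.4, 0.1], [0.2, 0.2, 0.6]]

avg = average_probabilities([probs_a, probs_b])
predictions = [argmax(row) for row in avg]

# Accuracy is the fraction of argmax predictions that match the labels.
labels = [0, 2]
acc = sum(p == y for p, y in zip(predictions, labels)) / len(labels)
```

In this toy case the second sample is classified as class 2 only after averaging, since model A alone would have picked class 1.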

learnware/learnware/base.py (+3, -2)

@@ -11,10 +11,10 @@ logger = get_module_logger("Learnware")




class Learnware:
    """The learnware class, which is the basic components in learnware market."""
    """The learnware class, which is the basic components in learnware market"""

    def __init__(self, id: str, model: Union[BaseModel, dict], specification: Specification):
        """The initialization method for learnware
        """The initialization method for learnware.

        Parameters
        ----------
@@ -22,6 +22,7 @@ class Learnware:
            The learnware id that is generated by market, and is unique
        model : Union[BaseModel, dict]
            The learnware model for prediction, can be BaseModel or dict

            - If the model is BaseModel, it denotes the model instant itself
            - If the model is dict, it must be the following format:
                {


learnware/learnware/reuse.py (+3, -3)

@@ -163,7 +163,7 @@ class JobSelectorReuser(BaseReuser):
        Parameters
        ----------
        user_data : np.ndarray
            User's labeld raw data.
            Raw user data.
        task_rkme_list : List[RKMEStatSpecification]
            The list of learwares' rkmes whose mixture approximates the user's rkme
        task_rkme_matrix : np.ndarray
@@ -272,7 +272,7 @@ class AveragingReuser(BaseReuser):
        Parameters
        ----------
        learnware_list : List[Learnware]
            The learnware list, which should have RKME Specification for each learnweare
            The learnware list
        """
        super(AveragingReuser, self).__init__(learnware_list)
        self.mode = mode
@@ -283,7 +283,7 @@ class AveragingReuser(BaseReuser):
        Parameters
        ----------
        user_data : np.ndarray
            User's labeled raw data.
            Raw user data.

        Returns
        -------


learnware/market/easy.py (+3, -0)

@@ -1,4 +1,5 @@
import os
import copy
from shutil import copyfile, rmtree
import zipfile
import torch
@@ -143,6 +144,8 @@ class EasyMarket(BaseMarket):
            - int indicating what the flag of learnware is added.

        """
        semantic_spec = copy.deepcopy(semantic_spec)

        if not os.path.exists(zip_path):
            logger.warning("Zip Path NOT Found! Fail to add learnware.")
            return None, self.INVALID_LEARNWARE
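
Both `main.py` and `add_learnware` now call `copy.deepcopy` on the semantic specification. The likely motivation is that the specification is a nested `dict`, so reusing or shallow-copying one object means every learnware shares the same inner `"Name"` and `"Description"` entries. The sketch below shows that aliasing effect in isolation; the `template` dict is a trimmed-down stand-in written for this example, not the full specification.

```python
import copy

# Trimmed-down stand-in for a semantic specification (nested dicts).
template = {"Name": {"Values": "", "Type": "String"}}

# Reusing the same object: every entry in the list is the same dict,
# so the last write overwrites the names seen through all of them.
aliased = [template for _ in range(2)]
for idx, spec in enumerate(aliased):
    spec["Name"]["Values"] = "learnware_%d" % idx
aliased_names = [spec["Name"]["Values"] for spec in aliased]

# A shallow copy is not enough either: the inner dict is still shared.
shallow = dict(template)
shares_inner_dict = shallow["Name"] is template["Name"]

# Deep-copying gives each learnware its own independent specification.
template["Name"]["Values"] = ""
copied = []
for idx in range(2):
    spec = copy.deepcopy(template)
    spec["Name"]["Values"] = "learnware_%d" % idx
    copied.append(spec)
copied_names = [spec["Name"]["Values"] for spec in copied]
```

After the deep copies, `template` itself is left unmodified, which is the property the caller of `add_learnware` relies on when it passes the same dict repeatedly.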

