Merge branch 'dev' into 'doc'

# Conflicts:
#	docs/index.rst
#	docs/workflow/Identify helpful learnwares.rst
2 years ago · a5324d762f
--- a/.readthedocs.yaml
+++ b/.readthedocs.yaml
@@ -0,0 +1,21 @@
 # .readthedocs.yml
 # Read the Docs configuration file
 # See https://docs.readthedocs.io/en/stable/config-file/v2.html for details

 # Required
 version: 2

 # Set the os and other tools you might need
 build:
  os: ubuntu-22.04
  tools:
   python: "3.8"

 # Build documentation in the docs/ directory with Sphinx
 sphinx:
  configuration: docs/conf.py

 # Build all formats
 formats: all
--- a/docs/conf.py
+++ b/docs/conf.py
@@ -90,24 +90,39 @@ html_theme = "sphinx_rtd_theme"

 html_logo = "_static/img/logo/logo1.png"


 # Theme options are theme-specific and customize the look and feel of a theme
 # further.  For a list of options available for each theme, see the
 # documentation.
 # html_context = {
 #     "display_github": False,
 #     "last_updated": True,
 #     "commit": True,
 #     "github_user": "Microsoft",
 #     "github_repo": "QLib",
 #     'github_version': 'master',
 #     'conf_py_path': '/docs/',

 # }
 #
 html_theme_options = {
    "logo_only": True,
    "collapse_navigation": False,
    "display_version": False,
    "navigation_depth": 3,
    "navigation_depth": 4,
 }


 # Custom sidebar templates, must be a dictionary that maps document names
 # to template names.
 #
 # This is required for the alabaster theme
 # refs: http://alabaster.readthedocs.io/en/latest/installation.html#sidebars
 html_sidebars = {
    "**": [
        "about.html",
        "navigation.html",
        "relations.html",  # needs 'show_related': True theme option to display
        "searchbox.html",
    ]
 }


 # -- Options for HTMLHelp output ------------------------------------------

 # Output file base name for HTML help builder.
 htmlhelp_basename = "learnwaredoc"


 autodoc_member_order = "bysource"
 autodoc_default_flags = ["members"]
 autodoc_default_options = {
    "members": True,
    "member-order": "bysource",
    "special-members": "__init__",
 }
--- a/docs/index.rst
+++ b/docs/index.rst
@@ -4,7 +4,7 @@
   contain the root `toctree` directive.

 ============================================================
 ``Learnware`` Documentation
 ``Learnware Market`` Documentation
 ============================================================

 ``Learnware`` is a model sharing platform that gives a basic implementation of the learnware paradigm. A learnware is a well-performing trained machine learning model with a specification that enables it to be adequately identified and reused according to the requirements of future users, who may know nothing about the learnware in advance. The learnware paradigm can address entangled problems in the current machine learning paradigm, such as continual learning and catastrophic forgetting. It also reduces the resources needed to train a well-performing model.
--- a/docs/start/get_start_examples.rst
+++ b/docs/start/get_start_examples.rst
@@ -0,0 +1,63 @@
 .. _examples:

 ========================================
 Experiments & Getting Started Examples
 ========================================

 This chapter introduces experiments that illustrate the search and reuse performance of our learnware system.

 ================
 Environment
 ================
 All experiments were run on a single Linux server, whose specifications are listed in the table below. All processors were used for training and evaluation.

 ====================  ====================  ===============================
 System                GPU                   CPU
 ====================  ====================  ===============================
 Ubuntu 20.04.4 LTS    Nvidia Tesla V100S    Intel(R) Xeon(R) Gold 6240R
 ====================  ====================  ===============================


 ================
 Experiments
 ================

 Datasets
 ================
 We conduct experiments on three publicly available datasets, namely `Predict Future Sales (PFS) <https://www.kaggle.com/c/competitive-data-science-predict-future-sales/data>`_,
 `M5 Forecasting (M5) <https://www.kaggle.com/competitions/m5-forecasting-accuracy/data>`_ and `CIFAR-10 <https://www.cs.toronto.edu/~kriz/cifar.html>`_.
 For the two sales forecasting datasets, PFS and M5, we split the user data by store and train Ridge and LightGBM models on the corresponding data, respectively.
 For the CIFAR-10 image classification task, each uploader is constructed by randomly picking 6 to 10 categories and then randomly selecting 800 to 2000 training-set samples from each picked category; 50 different uploaders are constructed in total.
 Each test user is constructed in the same way from the test set, with 3 to 6 randomly picked categories and 150 to 350 randomly selected samples per category; 20 different users are constructed in total.
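The uploader and user construction described above can be sketched as follows. This is a minimal illustration, assuming labels are available as a flat list of class ids; the helper ``sample_task``, the toy label list and the seed are hypothetical. The per-class sample count is drawn independently for each picked class, which is one reading of the description rather than a confirmed detail of the experiment code.

```python
import random
from collections import defaultdict


def sample_task(labels, n_classes_range, n_per_class_range, rng):
    """Return sample indices for one simulated uploader or user (hypothetical helper).

    Picks a random number of classes within n_classes_range, then a random
    number of samples within n_per_class_range from each picked class.
    """
    # Group sample indices by class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    n_classes = rng.randint(*n_classes_range)  # randint bounds are inclusive
    classes = rng.sample(sorted(by_class), n_classes)

    indices = []
    for c in classes:
        # Draw a per-class count, never more than the class actually contains.
        n = min(rng.randint(*n_per_class_range), len(by_class[c]))
        indices.extend(rng.sample(by_class[c], n))
    return indices


rng = random.Random(0)
# Toy stand-in for CIFAR-10 training labels: 10 classes with 5000 samples each.
train_labels = [i % 10 for i in range(50000)]
uploaders = [sample_task(train_labels, (6, 10), (800, 2000), rng) for _ in range(50)]
print(len(uploaders), min(len(u) for u in uploaders), max(len(u) for u in uploaders))
```

Test users would be drawn the same way from the test labels with ranges ``(3, 6)`` and ``(150, 350)``.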

 We evaluate the efficiency of specification generation and the accuracy of search and reuse.
 The evaluation metric is RMSE on PFS and M5, and classification accuracy on CIFAR-10.
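For reference, the two metrics, plus a mean +/- standard deviation summary like the one in the results table, can be computed as below. This is a plain-Python sketch. It assumes that the reported spread is taken across users and that it is the population standard deviation; neither point is confirmed by the text.

```python
import math
import statistics


def rmse(y_true, y_pred):
    """Root mean squared error between two equal-length sequences."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def accuracy(y_true, y_pred):
    """Fraction of predictions equal to the true label."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def mean_std(scores):
    """Summarize per-user scores as (mean, population standard deviation)."""
    return statistics.mean(scores), statistics.pstdev(scores)


print(accuracy([0, 1, 2, 2], [0, 1, 1, 2]))  # 0.75
# Hypothetical per-user accuracies, summarized in the "mean +/- std" style.
mean, std = mean_std([0.5, 0.6, 0.7])
print(f"{mean:.3f} +/- {std:.3f}")
```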

 Results
 ================

 The specification generation time is shown in the table below:

 ====================  ====================  =================================
 Dataset               Data Dimensions       Specification Generation Time (s)
 ====================  ====================  =================================
 PFS
 M5
 CIFAR10               9000*3*32*32          7~10
 ====================  ====================  =================================

 The accuracy of search and reuse is shown in the table below:

 ====================  ==================== ================================= =================================
 Dataset               Top-1 Performance    Job Selector Reuse                Average Ensemble Reuse
 ====================  ==================== ================================= =================================
 PFS
 M5
 CIFAR10                 0.619 +/- 0.138    0.585 +/- 0.056                    0.715 +/- 0.075
 ====================  ==================== ================================= =================================

 =========================
 Getting Started Examples
 =========================
 Examples for PFS, M5, and CIFAR10 are available at [xxx]. You can run ``main.py`` directly to reproduce the related experiments.
 The test code consists of three parts: data preparation (optional), specification generation and market construction, and search testing.
 You can also load the data we have prepared and skip the data preparation step.
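Skipping the optional preparation step amounts to reusing prepared data when it already exists. Below is a minimal sketch of that pattern. It assumes the prepared data is stored as a pickle file; the helper ``load_or_prepare`` and the file layout are hypothetical and are not part of the example code.

```python
import os
import pickle


def load_or_prepare(cache_path, prepare_fn):
    """Load prepared data from cache_path, or build it with prepare_fn and cache it.

    Hypothetical helper illustrating the optional data-preparation step.
    """
    if os.path.exists(cache_path):
        # Only unpickle files you created yourself: pickle can execute code.
        with open(cache_path, "rb") as f:
            return pickle.load(f)
    data = prepare_fn()
    # Create the parent directory if needed, then cache the result for next time.
    os.makedirs(os.path.dirname(cache_path) or ".", exist_ok=True)
    with open(cache_path, "wb") as f:
        pickle.dump(data, f)
    return data
```

Because the cache check happens first, a second run never calls ``prepare_fn``, which is exactly what "skip the data preparation step" means here.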
--- a/docs/start/quick.rst
+++ b/docs/start/quick.rst
@@ -25,7 +25,7 @@ Learnware is currently hosted on `PyPI <https://pypi.org/>`__. You can easily in

    .. code-block::

        conda install -c pytorch fais
        conda install -c pytorch faiss
        pip install learnware


@@ -153,7 +153,7 @@ For example, the following code is designed to work with Reduced Set Kernel Embe
    user_spec = specification.rkme.RKMEStatSpecification()
    user_spec.load(os.path.join(unzip_path, "rkme.json"))
    user_info = BaseUserInfo(
        id="user", semantic_spec=user_semantic, stat_info={"RKMEStatSpecification": user_spec}
        semantic_spec=user_semantic, stat_info={"RKMEStatSpecification": user_spec}
    )
    (sorted_score_list, single_learnware_list,
        mixture_score, mixture_learnware_list) = easy_market.search_learnware(user_info)
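An RKME specification, such as the ``user_spec`` loaded above, summarizes a dataset by a small weighted set of points whose kernel mean embedding approximates that of the data. Search can be understood as comparing such embeddings. As a rough illustration, the squared maximum mean discrepancy (MMD) between two weighted point sets under a Gaussian kernel is computed below. The kernel form ``exp(-gamma * ||x - y||^2)`` and the helper names are assumptions for illustration, not the library's implementation. ``gamma=0.1`` only mirrors the value passed to ``generate_rkme_spec`` in the workflow example.

```python
import math


def gaussian_kernel(x, y, gamma):
    """k(x, y) = exp(-gamma * ||x - y||^2) (assumed kernel form)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


def mmd2(points_a, weights_a, points_b, weights_b, gamma=0.1):
    """Squared MMD between two weighted point sets (kernel mean embeddings).

    MMD^2 = sum_ij a_i a_j k(x_i, x_j) - 2 sum_ij a_i b_j k(x_i, y_j)
            + sum_ij b_i b_j k(y_i, y_j)
    """
    def cross(p, wp, q, wq):
        return sum(
            wi * wj * gaussian_kernel(xi, xj, gamma)
            for xi, wi in zip(p, wp)
            for xj, wj in zip(q, wq)
        )

    return (
        cross(points_a, weights_a, points_a, weights_a)
        - 2 * cross(points_a, weights_a, points_b, weights_b)
        + cross(points_b, weights_b, points_b, weights_b)
    )


# Two toy "reduced sets" with uniform weights.
a = [[0.0, 0.0], [1.0, 0.0]]
b = [[0.0, 0.1], [1.0, 0.1]]
w = [0.5, 0.5]
print(mmd2(a, w, a, w))  # 0.0: identical sets
print(mmd2(a, w, b, w))  # small positive value: slightly shifted set
```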
--- a/docs/workflow/Identify
+++ b/docs/workflow/Identify
@@ -23,14 +23,10 @@ The semantic specification ``user_semantic`` is stored in a ``dict``, with keywo
        "Name": {"Values": "digits", "Type": "String"},
    }

 .. _semantic_specification:

 .. figure: ..\_static\img\semantic_spec.png
   :alt: Semantic Specification
 .. image:: ../_static/img/semantic_spec.png
   :align: center

 Reference this figure with :ref:`semantic_specification`.


 The user's statistical information ``stat_info`` is stored in a ``json`` file, e.g., ``stat.json``. For how to generate this file, see `link to semantic specification generation`_.
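The semantic specification dicts in this commit follow a visible convention. Each entry has ``Values`` and ``Type``. ``Class`` and ``Tag`` entries hold a list of values, while ``String`` entries hold a single string. A small checker for that convention might look like this. The function name and the rule set are inferred from the examples and are not a library API.

```python
def check_semantic_spec(spec):
    """Return a list of problems found in a semantic specification dict.

    Assumed convention (inferred from the examples): every entry is a dict with
    "Values" and "Type"; "Class"/"Tag" values are lists, "String" values are str.
    """
    problems = []
    for key, entry in spec.items():
        if not isinstance(entry, dict) or set(entry) != {"Values", "Type"}:
            problems.append(f"{key}: expected keys 'Values' and 'Type'")
            continue
        kind, values = entry["Type"], entry["Values"]
        if kind in ("Class", "Tag"):
            if not isinstance(values, list):
                problems.append(f"{key}: {kind} values must be a list")
        elif kind == "String":
            if not isinstance(values, str):
                problems.append(f"{key}: String values must be a str")
        else:
            problems.append(f"{key}: unknown Type {kind!r}")
    return problems


user_semantic = {
    "Data": {"Values": ["Tabular"], "Type": "Class"},
    "Task": {"Values": ["Classification"], "Type": "Class"},
    "Scenario": {"Values": ["Education"], "Type": "Tag"},
    "Name": {"Values": "digits", "Type": "String"},
}
print(check_semantic_spec(user_semantic))  # []
```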

--- a/examples/workflow_by_code/learnware_example/README.md
+++ b/examples/workflow_by_code/learnware_example/README.md
@@ -1,5 +1,5 @@
 ## How to Generate Environment Yaml

 ## 2 Environment
 * create env config for conda:
 ```shell
 conda env export | grep -v "^prefix: " > environment.yml
--- a/examples/workflow_by_code/learnware_example/example_init.py
+++ b/examples/workflow_by_code/learnware_example/example_init.py
@@ -6,7 +6,7 @@ from learnware.model import BaseModel

 class SVM(BaseModel):
    def __init__(self):
        super(SVM, self).__init__(input_shape=(20,), output_shape=())
        super(SVM, self).__init__(input_shape=(64,), output_shape=(10,))
        dir_path = os.path.dirname(os.path.abspath(__file__))
        self.model = joblib.load(os.path.join(dir_path, "svm.pkl"))

@@ -14,7 +14,7 @@ class SVM(BaseModel):
        pass

    def predict(self, X: np.ndarray) -> np.ndarray:
        return self.model.predict(X)
        return self.model.predict_proba(X)

    def finetune(self, X: np.ndarray, y: np.ndarray):
        pass
--- a/examples/workflow_by_code/main.py
+++ b/examples/workflow_by_code/main.py
@@ -1,39 +1,30 @@
 import os
 import fire
 import copy
 import joblib
 import zipfile
 import numpy as np
 from sklearn import svm
 from sklearn.datasets import load_digits
 from sklearn.model_selection import train_test_split
 from shutil import copyfile, rmtree

 import learnware
 from learnware.market import EasyMarket, BaseUserInfo
 from learnware.market import database_ops
 from learnware.learnware import Learnware
 from learnware.learnware import JobSelectorReuser, AveragingReuser
 import learnware.specification as specification
 from learnware.utils import get_module_by_module_path

 curr_root = os.path.dirname(os.path.abspath(__file__))

 semantic_specs = [
    {
        "Data": {"Values": ["Tabular"], "Type": "Class"},
        "Task": {"Values": ["Classification"], "Type": "Class"},
        "Library": {"Values": ["Scikit-learn"], "Type": "Class"},
        "Scenario": {"Values": ["Business"], "Type": "Tag"},
        "Description": {"Values": "", "Type": "String"},
        "Name": {"Values": "learnware_1", "Type": "String"},
    }
 ]

 user_semantic = {
    "Data": {"Values": ["Tabular"], "Type": "Class"},
    "Task": {
        "Values": ["Classification"],
        "Type": "Class",
    },
    "Library": {"Values": ["Scikit-learn"], "Type": "Tag"},
    "Scenario": {"Values": ["Business"], "Type": "Class"},
    "Library": {"Values": ["Scikit-learn"], "Type": "Class"},
    "Scenario": {"Values": ["Education"], "Type": "Tag"},
    "Description": {"Values": "", "Type": "String"},
    "Name": {"Values": "", "Type": "String"},
 }
@@ -44,22 +35,23 @@ class LearnwareMarketWorkflow:
        """initialize learnware market"""
        learnware.init()
        np.random.seed(2023)
        easy_market = EasyMarket(market_id="workflow_by_code", rebuild=True)
        easy_market = EasyMarket(market_id="sklearn_digits", rebuild=True)
        return easy_market

    def prepare_learnware_randomly(self, learnware_num=10):
    def prepare_learnware_randomly(self, learnware_num=5):
        self.zip_path_list = []
        X, y = load_digits(return_X_y=True)

        for i in range(learnware_num):
            dir_path = os.path.join(curr_root, "learnware_pool", "svm_%d" % (i))
            os.makedirs(dir_path, exist_ok=True)

            print("Preparing Learnware: %d" % (i))
            data_X = np.random.randn(5000, 20) * i
            data_y = np.random.randn(5000)
            data_y = np.where(data_y > 0, 1, 0)

            clf = svm.SVC(kernel="linear")
            data_X, _, data_y, _ = train_test_split(X, y, test_size=0.3, shuffle=True)
            clf = svm.SVC(kernel="linear", probability=True)
            clf.fit(data_X, data_y)

            joblib.dump(clf, os.path.join(dir_path, "svm.pkl"))

            spec = specification.utils.generate_rkme_spec(X=data_X, gamma=0.1, cuda_idx=0)
@@ -95,7 +87,7 @@ class LearnwareMarketWorkflow:
        print("Total Item:", len(easy_market))

        for idx, zip_path in enumerate(self.zip_path_list):
            semantic_spec = semantic_specs[0]
            semantic_spec = copy.deepcopy(user_semantic)
            semantic_spec["Name"]["Values"] = "learnware_%d" % (idx)
            semantic_spec["Description"]["Values"] = "test_learnware_number_%d" % (idx)
            easy_market.add_learnware(zip_path, semantic_spec)
@@ -107,7 +99,6 @@ class LearnwareMarketWorkflow:
        if delete:
            for learnware_id in curr_inds:
                easy_market.delete_learnware(learnware_id)
                easy_market.delete_learnware(learnware_id)
            curr_inds = easy_market._get_ids()
            print("Available ids After Deleting Learnwares:", curr_inds)

@@ -119,22 +110,23 @@ class LearnwareMarketWorkflow:

        test_folder = os.path.join(curr_root, "test_semantics")

        idx, zip_path = 1, self.zip_path_list[1]
        unzip_dir = os.path.join(test_folder, f"{idx}")

        # unzip -o -q zip_path -d unzip_dir
        if os.path.exists(unzip_dir):
            rmtree(unzip_dir)
        os.makedirs(unzip_dir, exist_ok=True)
        if os.path.exists(test_folder):
            rmtree(test_folder)
        os.makedirs(test_folder, exist_ok=True)

        with zipfile.ZipFile(self.zip_path_list[0], "r") as zip_obj:
            zip_obj.extractall(path=test_folder)

        with zipfile.ZipFile(zip_path, "r") as zip_obj:
            zip_obj.extractall(path=unzip_dir)
        semantic_spec = copy.deepcopy(user_semantic)
        semantic_spec["Name"]["Values"] = f"learnware_{learnware_num - 1}"
        semantic_spec["Description"]["Values"] = f"test_learnware_number_{learnware_num - 1}"

        user_info = BaseUserInfo(semantic_spec=user_semantic)
        _, single_learnware_list, _ = easy_market.search_learnware(user_info)
        user_info = BaseUserInfo(semantic_spec=semantic_spec)
        _, single_learnware_list, _, _ = easy_market.search_learnware(user_info)

        print("User info:", user_info.get_semantic_spec())
        print(f"search result of user{idx}:")
        print(f"Search result:")
        for learnware in single_learnware_list:
            print("Choose learnware:", learnware.id, learnware.get_specification().get_semantic_spec())

@@ -175,6 +167,32 @@ class LearnwareMarketWorkflow:

        rmtree(test_folder)  # rm -r test_folder

    def test_learnware_reuse(self, learnware_num=5):
        easy_market = self.test_upload_delete_learnware(learnware_num, delete=False)
        print("Total Item:", len(easy_market))

        X, y = load_digits(return_X_y=True)
        _, data_X, _, data_y = train_test_split(X, y, test_size=0.3, shuffle=True)

        stat_spec = specification.utils.generate_rkme_spec(X=data_X, gamma=0.1, cuda_idx=0)
        user_info = BaseUserInfo(semantic_spec=user_semantic, stat_info={"RKMEStatSpecification": stat_spec})

        _, _, _, mixture_learnware_list = easy_market.search_learnware(user_info)

        # print("Mixture Learnware:", mixture_learnware_list)

        # Based on user information, the learnware market returns a list of learnwares (mixture_learnware_list)
        # Use the job selector reuser to reuse the searched learnwares to make predictions
        reuse_job_selector = JobSelectorReuser(learnware_list=mixture_learnware_list)
        job_selector_predict_y = reuse_job_selector.predict(user_data=data_X)

        # Use the averaging ensemble reuser to reuse the searched learnwares to make predictions
        reuse_ensemble = AveragingReuser(learnware_list=mixture_learnware_list)
        ensemble_predict_y = reuse_ensemble.predict(user_data=data_X)

        print("Job Selector Acc:", np.sum(np.argmax(job_selector_predict_y, axis=1) == data_y) / len(data_y))
        print("Averaging Ensemble Acc:", np.sum(np.argmax(ensemble_predict_y, axis=1) == data_y) / len(data_y))


 if __name__ == "__main__":
    fire.Fire(LearnwareMarketWorkflow)
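Both accuracy lines at the end of the workflow take the argmax over per-class scores. This works because the SVM learnwares in this example return ``predict_proba`` scores with one column per digit class. The sketch below illustrates the averaging idea in plain Python: it averages the probability matrices of several learnwares and scores the argmax. It is not the ``AveragingReuser`` source, and the helper names and toy scores are hypothetical.

```python
def average_predictions(prob_matrices):
    """Element-wise mean of several (n_samples x n_classes) probability matrices."""
    n_models = len(prob_matrices)
    return [
        [sum(row[c] for row in rows) / n_models for c in range(len(rows[0]))]
        # zip(*...) groups, for each sample, the rows produced by every model.
        for rows in zip(*prob_matrices)
    ]


def argmax_accuracy(prob_matrix, labels):
    """Accuracy of predicting the highest-scoring class (first index wins ties)."""
    preds = [max(range(len(row)), key=row.__getitem__) for row in prob_matrix]
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


# Two hypothetical learnwares scoring 3 samples over 3 classes.
model_a = [[0.6, 0.3, 0.1], [0.2, 0.5, 0.3], [0.4, 0.4, 0.2]]
model_b = [[0.5, 0.4, 0.1], [0.1, 0.2, 0.7], [0.1, 0.2, 0.7]]
labels = [0, 2, 2]
ensemble = average_predictions([model_a, model_b])
print(argmax_accuracy(model_a, labels), argmax_accuracy(ensemble, labels))
```

In this toy case, the single model gets only the first sample right, while the averaged scores classify all three correctly.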
--- a/learnware/learnware/base.py
+++ b/learnware/learnware/base.py
@@ -11,10 +11,10 @@ logger = get_module_logger("Learnware")


 class Learnware:
    """The learnware class, which is the basic component of the learnware market."""
    """The learnware class, which is the basic component of the learnware market"""

    def __init__(self, id: str, model: Union[BaseModel, dict], specification: Specification):
        """The initialization method for learnware
        """The initialization method for learnware.

        Parameters
        ----------
@@ -22,6 +22,7 @@ class Learnware:
            The learnware id, which is generated by the market and is unique
        model : Union[BaseModel, dict]
            The learnware model for prediction, which can be a BaseModel or a dict

            - If the model is a BaseModel, it denotes the model instance itself
            - If the model is dict, it must be the following format:
                {
--- a/learnware/learnware/reuse.py
+++ b/learnware/learnware/reuse.py
@@ -163,7 +163,7 @@ class JobSelectorReuser(BaseReuser):
        Parameters
        ----------
        user_data : np.ndarray
            User's labeled raw data.
            Raw user data.
        task_rkme_list : List[RKMEStatSpecification]
            The list of learnwares' RKMEs whose mixture approximates the user's RKME
        task_rkme_matrix : np.ndarray
@@ -272,7 +272,7 @@ class AveragingReuser(BaseReuser):
        Parameters
        ----------
        learnware_list : List[Learnware]
            The learnware list, which should have an RKME specification for each learnware
            The learnware list
        """
        super(AveragingReuser, self).__init__(learnware_list)
        self.mode = mode
@@ -283,7 +283,7 @@ class AveragingReuser(BaseReuser):
        Parameters
        ----------
        user_data : np.ndarray
            User's labeled raw data.
            Raw user data.

        Returns
        -------
--- a/learnware/market/easy.py
+++ b/learnware/market/easy.py
@@ -1,4 +1,5 @@
 import os
 import copy
 from shutil import copyfile, rmtree
 import zipfile
 import torch
@@ -143,6 +144,8 @@ class EasyMarket(BaseMarket):
            - int flag indicating the result of adding the learnware.

        """
        semantic_spec = copy.deepcopy(semantic_spec)

        if not os.path.exists(zip_path):
            logger.warning("Zip Path NOT Found! Fail to add learnware.")
            return None, self.INVALID_LEARNWARE
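This commit introduces ``copy.deepcopy`` calls, in ``add_learnware`` and when filling per-learnware semantic specs in the workflow. They matter because semantic specifications are nested dicts. A shallow copy still shares the inner ``{"Values": ..., "Type": ...}`` dicts, so editing ``Name`` on the copy would also change the caller's dict. A small standalone demonstration using only the standard ``copy`` module:

```python
import copy

template = {
    "Name": {"Values": "", "Type": "String"},
    "Description": {"Values": "", "Type": "String"},
}

shallow = copy.copy(template)
shallow["Name"]["Values"] = "learnware_0"  # mutates the inner dict shared with template
print(template["Name"]["Values"])  # learnware_0: the template was modified too

template["Name"]["Values"] = ""  # reset the template
deep = copy.deepcopy(template)
deep["Name"]["Values"] = "learnware_1"  # only the deep copy changes
print(template["Name"]["Values"] == "")  # True
```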