
Merge branch 'dev' into 'doc'

# Conflicts:
#   docs/index.rst
#   docs/workflow/Identify helpful learnwares.rst
tags/v0.3.2
Xiaodong Bi 2 years ago
parent
commit
a5324d762f
12 changed files with 183 additions and 66 deletions
  1. .readthedocs.yaml (+21, -0)
  2. docs/conf.py (+31, -16)
  3. docs/index.rst (+1, -1)
  4. docs/start/get_start_examples.rst (+63, -0)
  5. docs/start/quick.rst (+2, -2)
  6. docs/workflow/Identify helpful learnwares.rst (+1, -5)
  7. examples/workflow_by_code/learnware_example/README.md (+1, -1)
  8. examples/workflow_by_code/learnware_example/example_init.py (+2, -2)
  9. examples/workflow_by_code/main.py (+52, -34)
  10. learnware/learnware/base.py (+3, -2)
  11. learnware/learnware/reuse.py (+3, -3)
  12. learnware/market/easy.py (+3, -0)

.readthedocs.yaml (+21, -0)

@@ -0,0 +1,21 @@
# .readthedocs.yml
# Read the Docs configuration file
# See https://docs.readthedocs.io/en/stable/config-file/v2.html for details

# Required
version: 2

# Set the os and other tools you might need
build:
  os: ubuntu-22.04

# Build documentation in the docs/ directory with Sphinx
sphinx:
  configuration: docs/conf.py

# Build all formats
formats: all

# Optionally set the version of Python and requirements required to build your docs
python:
  version: 3.8

docs/conf.py (+31, -16)

@@ -90,24 +90,39 @@ html_theme = "sphinx_rtd_theme"

html_logo = "_static/img/logo/logo1.png"


# Theme options are theme-specific and customize the look and feel of a theme
# further. For a list of options available for each theme, see the
# documentation.
# html_context = {
# "display_github": False,
# "last_updated": True,
# "commit": True,
# "github_user": "Microsoft",
# "github_repo": "QLib",
# 'github_version': 'master',
# 'conf_py_path': '/docs/',

# }
#
html_theme_options = {
"logo_only": True,
"collapse_navigation": False,
"display_version": False,
"navigation_depth": 3,
"navigation_depth": 4,
}


# Custom sidebar templates, must be a dictionary that maps document names
# to template names.
#
# This is required for the alabaster theme
# refs: http://alabaster.readthedocs.io/en/latest/installation.html#sidebars
html_sidebars = {
"**": [
"about.html",
"navigation.html",
"relations.html", # needs 'show_related': True theme option to display
"searchbox.html",
]
}


# -- Options for HTMLHelp output ------------------------------------------

# Output file base name for HTML help builder.
htmlhelp_basename = "learnwaredoc"


autodoc_member_order = "bysource"
autodoc_default_flags = ["members"]
autodoc_default_options = {
"members": True,
"member-order": "bysource",
"special-members": "__init__",
}

docs/index.rst (+1, -1)

@@ -4,7 +4,7 @@
contain the root `toctree` directive.

============================================================
``Learnware`` Documentation
``Learnware Market`` Documentation
============================================================

``Learnware`` is a model sharing platform that gives a basic implementation of the learnware paradigm. A learnware is a well-performing trained machine learning model with a specification that enables it to be adequately identified for reuse according to the requirements of future users who may know nothing about the learnware in advance. The learnware paradigm can address entangled problems in the current machine learning paradigm, such as continual learning and catastrophic forgetting. It also reduces the resources needed to train a well-performing model.


docs/start/get_start_examples.rst (+63, -0)

@@ -0,0 +1,63 @@
.. _examples:

======================================
Experiments & Getting Started Examples
======================================

This chapter presents experiments that illustrate the search and reuse performance of our learnware system.

================
Environment
================
For all experiments, we used a single Linux server. Its specifications are listed in the table below. All processors were used for training and evaluation.

==================== ==================== ===============================
System               GPU                  CPU
==================== ==================== ===============================
Ubuntu 20.04.4 LTS   Nvidia Tesla V100S   Intel(R) Xeon(R) Gold 6240R
==================== ==================== ===============================


================
Experiments
================

Datasets
================
We designed experiments on three publicly available datasets, namely `Predict Future Sales (PFS) <https://www.kaggle.com/c/competitive-data-science-predict-future-sales/data>`_,
`M5 Forecasting (M5) <https://www.kaggle.com/competitions/m5-forecasting-accuracy/data>`_ and `CIFAR 10 <https://www.cs.toronto.edu/~kriz/cifar.html>`_.
For the two sales forecasting datasets, PFS and M5, we divide the user data by store and train a Ridge model and a LightGBM model on the corresponding data, respectively.
For the CIFAR10 image classification task, we build each uploader by randomly picking 6 to 10 categories and then randomly selecting 800 to 2000 samples per picked category from the training set, for a total of 50 different uploaders.
For test users, we randomly pick 3 to 6 categories and then randomly select 150 to 350 samples per picked category from the test set, for a total of 20 different users.
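
The uploader construction described above can be sketched as follows. This is only an illustration on synthetic indices, not the script used for the experiments: the helper name ``sample_uploader``, the pool of 5000 images per class, and the seed are all assumptions.

```python
import random


def sample_uploader(indices_by_class, rng, n_classes=(6, 10), n_per_class=(800, 2000)):
    """Return (picked categories, chosen sample indices) for one hypothetical uploader."""
    # random.Random.randint is inclusive on both ends.
    k = rng.randint(*n_classes)
    picked = rng.sample(sorted(indices_by_class), k)
    chosen = []
    for c in picked:
        pool = indices_by_class[c]
        m = min(rng.randint(*n_per_class), len(pool))
        # Sample without replacement from this category's pool.
        chosen.extend(rng.sample(pool, m))
    return picked, chosen


rng = random.Random(0)
# Synthetic stand-in for the CIFAR10 training set: 10 classes x 5000 image indices.
indices_by_class = {c: list(range(c * 5000, (c + 1) * 5000)) for c in range(10)}
uploaders = [sample_uploader(indices_by_class, rng) for _ in range(50)]
```

The test users could be drawn the same way from the test-set indices by passing ``n_classes=(3, 6)`` and ``n_per_class=(150, 350)``.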

We tested the efficiency of specification generation and the accuracy of search and reuse separately.
The evaluation metric on the PFS and M5 data is RMSE, and the evaluation metric on the CIFAR10 classification task is classification accuracy.
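
For reference, the two metrics can be written in a few lines of plain Python. This is a minimal sketch using the standard definitions; the function names ``rmse`` and ``accuracy`` are illustrative and are not taken from the learnware package.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error between two equal-length sequences."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


print(rmse([3.0, 5.0], [1.0, 5.0]))            # sqrt((4 + 0) / 2) = sqrt(2)
print(accuracy([0, 1, 2, 2], [0, 1, 1, 2]))    # 3 of 4 correct = 0.75
```

Lower RMSE is better for the PFS and M5 regression tasks, while higher accuracy is better for CIFAR10.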

Results
================

The time consumed by specification generation is shown in the table below:

==================== ==================== =================================
Dataset              Data Dimensions      Specification Generation Time (s)
==================== ==================== =================================
PFS
M5
CIFAR10              9000*3*32*32         7~10
==================== ==================== =================================

The accuracy of search and reuse is shown in the table below:

==================== ==================== ================================= =================================
Dataset              Top-1 Performance    Job Selector Reuse                Average Ensemble Reuse
==================== ==================== ================================= =================================
PFS
M5
CIFAR10              0.619 +/- 0.138      0.585 +/- 0.056                   0.715 +/- 0.075
==================== ==================== ================================= =================================

=========================
Getting Started Examples
=========================
Examples for `PFS, M5` and `CIFAR10` are available at [xxx]. You can run ``main.py`` directly to reproduce the related experiments.
The test code consists of three parts: data preparation (optional), specification generation and market construction, and the search test.
You can load the data prepared by us and skip the data preparation step.

docs/start/quick.rst (+2, -2)

@@ -25,7 +25,7 @@ Learnware is currently hosted on `PyPI <https://pypi.org/>`__. You can easily in

.. code-block::

conda install -c pytorch fais
conda install -c pytorch faiss
pip install learnware


@@ -153,7 +153,7 @@ For example, the following code is designed to work with Reduced Set Kernel Embe
user_spec = specification.rkme.RKMEStatSpecification()
user_spec.load(os.path.join(unzip_path, "rkme.json"))
user_info = BaseUserInfo(
id="user", semantic_spec=user_semantic, stat_info={"RKMEStatSpecification": user_spec}
semantic_spec=user_semantic, stat_info={"RKMEStatSpecification": user_spec}
)
(sorted_score_list, single_learnware_list,
mixture_score, mixture_learnware_list) = easy_market.search_learnware(user_info)


docs/workflow/Identify helpful learnwares.rst (+1, -5)

@@ -23,14 +23,10 @@ The semantic specification ``user_semantic`` is stored in a ``dict``, with keywo
"Name": {"Values": "digits", "Type": "String"},
}

.. _semantic_specification:

.. figure: ..\_static\img\semantic_spec.png
:alt: Semantic Specification
.. image:: ../_static/img/semantic_spec.png
:align: center

Reference it with :ref:`semantic_specification`.


The user's statistical information ``stat_info`` is stored in a ``json`` file, e.g., ``stat.json``. The generation of this file is described in `this is a link to semantic specification generation`_.



examples/workflow_by_code/learnware_example/README.md (+1, -1)

@@ -1,5 +1,5 @@
## How to Generate Environment Yaml

## 2 Environment
* create env config for conda:
```shell
conda env export | grep -v "^prefix: " > environment.yml


examples/workflow_by_code/learnware_example/example_init.py (+2, -2)

@@ -6,7 +6,7 @@ from learnware.model import BaseModel

class SVM(BaseModel):
def __init__(self):
super(SVM, self).__init__(input_shape=(20,), output_shape=())
super(SVM, self).__init__(input_shape=(64,), output_shape=(10,))
dir_path = os.path.dirname(os.path.abspath(__file__))
self.model = joblib.load(os.path.join(dir_path, "svm.pkl"))

@@ -14,7 +14,7 @@ class SVM(BaseModel):
pass

def predict(self, X: np.ndarray) -> np.ndarray:
return self.model.predict(X)
return self.model.predict_proba(X)

def finetune(self, X: np.ndarray, y: np.ndarray):
pass

examples/workflow_by_code/main.py (+52, -34)

@@ -1,39 +1,30 @@
import os
import fire
import copy
import joblib
import zipfile
import numpy as np
from sklearn import svm
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from shutil import copyfile, rmtree

import learnware
from learnware.market import EasyMarket, BaseUserInfo
from learnware.market import database_ops
from learnware.learnware import Learnware
from learnware.learnware import JobSelectorReuser, AveragingReuser
import learnware.specification as specification
from learnware.utils import get_module_by_module_path

curr_root = os.path.dirname(os.path.abspath(__file__))

semantic_specs = [
{
"Data": {"Values": ["Tabular"], "Type": "Class"},
"Task": {"Values": ["Classification"], "Type": "Class"},
"Library": {"Values": ["Scikit-learn"], "Type": "Class"},
"Scenario": {"Values": ["Business"], "Type": "Tag"},
"Description": {"Values": "", "Type": "String"},
"Name": {"Values": "learnware_1", "Type": "String"},
}
]

user_semantic = {
"Data": {"Values": ["Tabular"], "Type": "Class"},
"Task": {
"Values": ["Classification"],
"Type": "Class",
},
"Library": {"Values": ["Scikit-learn"], "Type": "Tag"},
"Scenario": {"Values": ["Business"], "Type": "Class"},
"Library": {"Values": ["Scikit-learn"], "Type": "Class"},
"Scenario": {"Values": ["Education"], "Type": "Tag"},
"Description": {"Values": "", "Type": "String"},
"Name": {"Values": "", "Type": "String"},
}
@@ -44,22 +35,23 @@ class LearnwareMarketWorkflow:
"""initialize learnware market"""
learnware.init()
np.random.seed(2023)
easy_market = EasyMarket(market_id="workflow_by_code", rebuild=True)
easy_market = EasyMarket(market_id="sklearn_digits", rebuild=True)
return easy_market

def prepare_learnware_randomly(self, learnware_num=10):
def prepare_learnware_randomly(self, learnware_num=5):
self.zip_path_list = []
X, y = load_digits(return_X_y=True)

for i in range(learnware_num):
dir_path = os.path.join(curr_root, "learnware_pool", "svm_%d" % (i))
os.makedirs(dir_path, exist_ok=True)

print("Preparing Learnware: %d" % (i))
data_X = np.random.randn(5000, 20) * i
data_y = np.random.randn(5000)
data_y = np.where(data_y > 0, 1, 0)

clf = svm.SVC(kernel="linear")
data_X, _, data_y, _ = train_test_split(X, y, test_size=0.3, shuffle=True)
clf = svm.SVC(kernel="linear", probability=True)
clf.fit(data_X, data_y)

joblib.dump(clf, os.path.join(dir_path, "svm.pkl"))

spec = specification.utils.generate_rkme_spec(X=data_X, gamma=0.1, cuda_idx=0)
@@ -95,7 +87,7 @@ class LearnwareMarketWorkflow:
print("Total Item:", len(easy_market))

for idx, zip_path in enumerate(self.zip_path_list):
semantic_spec = semantic_specs[0]
semantic_spec = copy.deepcopy(user_semantic)
semantic_spec["Name"]["Values"] = "learnware_%d" % (idx)
semantic_spec["Description"]["Values"] = "test_learnware_number_%d" % (idx)
easy_market.add_learnware(zip_path, semantic_spec)
@@ -107,7 +99,6 @@ class LearnwareMarketWorkflow:
if delete:
for learnware_id in curr_inds:
easy_market.delete_learnware(learnware_id)
easy_market.delete_learnware(learnware_id)
curr_inds = easy_market._get_ids()
print("Available ids After Deleting Learnwares:", curr_inds)

@@ -119,22 +110,23 @@ class LearnwareMarketWorkflow:

test_folder = os.path.join(curr_root, "test_semantics")

idx, zip_path = 1, self.zip_path_list[1]
unzip_dir = os.path.join(test_folder, f"{idx}")

# unzip -o -q zip_path -d unzip_dir
if os.path.exists(unzip_dir):
rmtree(unzip_dir)
os.makedirs(unzip_dir, exist_ok=True)
if os.path.exists(test_folder):
rmtree(test_folder)
os.makedirs(test_folder, exist_ok=True)

with zipfile.ZipFile(self.zip_path_list[0], "r") as zip_obj:
zip_obj.extractall(path=test_folder)

with zipfile.ZipFile(zip_path, "r") as zip_obj:
zip_obj.extractall(path=unzip_dir)
semantic_spec = copy.deepcopy(user_semantic)
semantic_spec["Name"]["Values"] = f"learnware_{learnware_num - 1}"
semantic_spec["Description"]["Values"] = f"test_learnware_number_{learnware_num - 1}"

user_info = BaseUserInfo(semantic_spec=user_semantic)
_, single_learnware_list, _ = easy_market.search_learnware(user_info)
user_info = BaseUserInfo(semantic_spec=semantic_spec)
_, single_learnware_list, _, _ = easy_market.search_learnware(user_info)

print("User info:", user_info.get_semantic_spec())
print(f"search result of user{idx}:")
print(f"Search result:")
for learnware in single_learnware_list:
print("Choose learnware:", learnware.id, learnware.get_specification().get_semantic_spec())

@@ -175,6 +167,32 @@ class LearnwareMarketWorkflow:

rmtree(test_folder) # rm -r test_folder

def test_learnware_reuse(self, learnware_num=5):
easy_market = self.test_upload_delete_learnware(learnware_num, delete=False)
print("Total Item:", len(easy_market))

X, y = load_digits(return_X_y=True)
_, data_X, _, data_y = train_test_split(X, y, test_size=0.3, shuffle=True)

stat_spec = specification.utils.generate_rkme_spec(X=data_X, gamma=0.1, cuda_idx=0)
user_info = BaseUserInfo(semantic_spec=user_semantic, stat_info={"RKMEStatSpecification": stat_spec})

_, _, _, mixture_learnware_list = easy_market.search_learnware(user_info)

# print("Mixture Learnware:", mixture_learnware_list)

# Based on user information, the learnware market returns a list of learnwares (learnware_list)
# Use jobselector reuser to reuse the searched learnwares to make prediction
reuse_job_selector = JobSelectorReuser(learnware_list=mixture_learnware_list)
job_selector_predict_y = reuse_job_selector.predict(user_data=data_X)

# Use averaging ensemble reuser to reuse the searched learnwares to make prediction
reuse_ensemble = AveragingReuser(learnware_list=mixture_learnware_list)
ensemble_predict_y = reuse_ensemble.predict(user_data=data_X)

print("Job Selector Acc:", np.sum(np.argmax(job_selector_predict_y, axis=1) == data_y) / len(data_y))
print("Averaging Selector Acc:", np.sum(np.argmax(ensemble_predict_y, axis=1) == data_y) / len(data_y))


if __name__ == "__main__":
fire.Fire(LearnwareMarketWorkflow)
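
The accuracy lines in `test_learnware_reuse` take an argmax over per-class scores, which is why the example model now returns `predict_proba` output with `output_shape=(10,)`. The sketch below shows that idea in plain Python: averaging the probability outputs of several models and scoring the argmax. It is an illustration only. It does not call `AveragingReuser`, whose internals are not shown in this diff, and the helper names and the probability values are made up.

```python
def average_probabilities(prob_list):
    """Average per-class probabilities over models.

    prob_list[m][i][c] is model m's probability of class c for sample i.
    """
    n_models = len(prob_list)
    return [
        [sum(per_class) / n_models for per_class in zip(*rows)]
        for rows in zip(*prob_list)
    ]


def argmax(row):
    """Index of the largest entry in a list."""
    return max(range(len(row)), key=row.__getitem__)


# Hypothetical outputs of three models on four samples with three classes.
probs = [
    [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1], [0.3, 0.3, 0.4], [0.5, 0.4, 0.1]],
    [[0.6, 0.3, 0.1], [0.2, 0.6, 0.2], [0.2, 0.5, 0.3], [0.3, 0.6, 0.1]],
    [[0.8, 0.1, 0.1], [0.3, 0.4, 0.3], [0.1, 0.2, 0.7], [0.4, 0.5, 0.1]],
]
labels = [0, 1, 2, 0]

predictions = [argmax(row) for row in average_probabilities(probs)]
acc = sum(p == y for p, y in zip(predictions, labels)) / len(labels)
print(predictions, acc)  # [0, 1, 2, 1] 0.75
```

Hard label outputs from `predict` could not be averaged this way, which is one plausible reason for the switch to probabilities.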

learnware/learnware/base.py (+3, -2)

@@ -11,10 +11,10 @@ logger = get_module_logger("Learnware")


class Learnware:
"""The learnware class, which is the basic components in learnware market."""
"""The learnware class, which is the basic components in learnware market"""

def __init__(self, id: str, model: Union[BaseModel, dict], specification: Specification):
"""The initialization method for learnware
"""The initialization method for learnware.

Parameters
----------
@@ -22,6 +22,7 @@ class Learnware:
The learnware id that is generated by market, and is unique
model : Union[BaseModel, dict]
The learnware model for prediction, can be BaseModel or dict

- If the model is BaseModel, it denotes the model instance itself
- If the model is dict, it must be the following format:
{


learnware/learnware/reuse.py (+3, -3)

@@ -163,7 +163,7 @@ class JobSelectorReuser(BaseReuser):
Parameters
----------
user_data : np.ndarray
User's labeld raw data.
Raw user data.
task_rkme_list : List[RKMEStatSpecification]
The list of learnwares' rkmes whose mixture approximates the user's rkme
task_rkme_matrix : np.ndarray
@@ -272,7 +272,7 @@ class AveragingReuser(BaseReuser):
Parameters
----------
learnware_list : List[Learnware]
The learnware list, which should have RKME Specification for each learnweare
The learnware list
"""
super(AveragingReuser, self).__init__(learnware_list)
self.mode = mode
@@ -283,7 +283,7 @@ class AveragingReuser(BaseReuser):
Parameters
----------
user_data : np.ndarray
User's labeled raw data.
Raw user data.

Returns
-------


learnware/market/easy.py (+3, -0)

@@ -1,4 +1,5 @@
import os
import copy
from shutil import copyfile, rmtree
import zipfile
import torch
@@ -143,6 +144,8 @@ class EasyMarket(BaseMarket):
- int indicating what the flag of learnware is added.

"""
semantic_spec = copy.deepcopy(semantic_spec)

if not os.path.exists(zip_path):
logger.warning("Zip Path NOT Found! Fail to add learnware.")
return None, self.INVALID_LEARNWARE
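
The `copy.deepcopy` calls added in this commit (here and in `main.py`) guard against a shared nested dict being mutated in place. The standalone sketch below shows the difference. It uses a hypothetical `make_template` helper shaped like the semantic specification dicts in this diff and does not call the learnware API.

```python
import copy


def make_template():
    # Hypothetical stand-in shaped like the semantic specification dicts.
    return {
        "Name": {"Values": "", "Type": "String"},
        "Description": {"Values": "", "Type": "String"},
    }


# Aliasing: every loop iteration writes into the same nested dicts.
shared = make_template()
aliased = []
for idx in range(2):
    spec = shared
    spec["Name"]["Values"] = "learnware_%d" % idx
    aliased.append(spec)
aliased_names = [s["Name"]["Values"] for s in aliased]

# Deep copy: each spec owns its nested dicts and the template stays untouched.
template = make_template()
copied = []
for idx in range(2):
    spec = copy.deepcopy(template)
    spec["Name"]["Values"] = "learnware_%d" % idx
    copied.append(spec)
copied_names = [s["Name"]["Values"] for s in copied]

print(aliased_names)                      # ['learnware_1', 'learnware_1']
print(copied_names)                       # ['learnware_0', 'learnware_1']
print(repr(template["Name"]["Values"]))   # ''
```

A shallow copy (`copy.copy` or `dict(...)`) would not be enough here, because the inner `{"Values": ..., "Type": ...}` dicts would still be shared.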

