<div align="center">

[](https://pypi.org/project/learnware/#files)
[](https://pypi.org/project/learnware/#files)
[](https://github.com/Learnware-LAMDA/Learnware/actions)
[](https://pypi.org/project/learnware/#history)
[](https://learnware.readthedocs.io/en/latest/?badge=latest)
[](LICENSE)

</div>
The `learnware` package provides a fundamental implementation of the central concepts and procedures for the learnware paradigm, which is a new paradigm aimed at enabling users to reuse existing well-trained models to solve their AI tasks instead of starting from scratch.

Moreover, the package's well-structured design ensures high scalability and allows for the effortless integration of various new features and techniques in the future.
In addition, the `learnware` package serves as the engine for the [Beimingwu System](https://bmwu.cloud) and can be effectively employed for conducting experiments related to learnware.
# Introduction

## Learnware Paradigm

A learnware consists of a high-performance machine learning model and specifications that characterize the model, i.e., "Learnware = Model + Specification". These specifications, encompassing both semantic and statistical aspects, detail the model's functionality and statistical information, making it easier for future users to identify and reuse these models.
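The "Model + Specification" pairing can be pictured as a simple container holding a trained model alongside its two kinds of specification. The sketch below is purely illustrative (the class and field names are hypothetical, not the package's actual API):

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict

# Hypothetical sketch of "Learnware = Model + Specification".
@dataclass
class SimpleLearnware:
    model: Callable[[Any], Any]  # a trained predict function
    semantic_spec: Dict[str, Any] = field(default_factory=dict)  # human-readable tags
    statistical_spec: Dict[str, Any] = field(default_factory=dict)  # data-distribution summary

lw = SimpleLearnware(
    model=lambda x: x * 2,
    semantic_spec={"Task": "Regression", "Data": "Table"},
    statistical_spec={"n_features": 1},
)
print(lw.semantic_spec["Task"])  # Regression
```

The semantic specification serves search and browsing; the statistical specification summarizes the training-data distribution without exposing the raw data.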
The learnware paradigm consists of two distinct stages:

- `Submitting Stage`: Developers voluntarily submit various learnwares to the learnware market, and the system conducts quality checks and further organization of these learnwares.
- `Deploying Stage`: When users submit task requirements, the learnware market automatically selects whether to recommend a single learnware or a combination of multiple learnwares and provides efficient deployment methods. Whether it's a single learnware or a combination of multiple learnwares, the system offers convenient learnware reuse interfaces.

<div align="center">
  <img src="./docs/_static/img/learnware_market.svg" width="70%" />
</div>

## Benefits of the Learnware Paradigm

The need for learnware arises from challenges in machine learning, such as the need for extensive training data, advanced techniques, continual learning, catastrophic forgetting, and data privacy issues. Although there are many efforts focusing on one of these issues separately, they are entangled, and solving one problem may exacerbate others. The learnware paradigm aims to address many of these challenges through a unified framework. Its benefits are listed as follows.

| Benefit | Description |
| ---- | ---- |
| Unplanned tasks | Open to all legal developers, the learnware market can accommodate helpful learnwares for various tasks. |
| Carbon emission | Assembling small models may offer good-enough performance, reducing interest in training large models and the carbon footprint. |
## Learnware Package Design

<div align="center">
  <img src="./docs/_static/img/learnware_framework.svg" width="70%"/>
</div>
At the workflow level, the `learnware` package consists of the `Submitting Stage` and the `Deploying Stage`.

At the module level, the `learnware` package is a platform that consists of the components shown above. The components are designed as loosely coupled modules, and each component can be used stand-alone.
# Quick Start

## Installation

Learnware is currently hosted on [PyPI](https://pypi.org/project/learnware/). You can easily install `learnware` by following these steps:
```bash
pip install learnware
```
In the `learnware` package, besides the base classes, many core functionalities such as "learnware specification generation" and "learnware deployment" rely on the `torch` library. Users have the option to manually install `torch`, or they can directly use the following command to install the `learnware` package:
```bash
pip install learnware[full]
```
**Note:** Due to the potential complexity of the user's local environment, installing `learnware[full]` does not guarantee that `torch` will successfully invoke `CUDA` in the user's local setting.
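To see which situation you are in, you can probe the optional `torch` dependency before relying on GPU features. This is a small sketch using only the standard library plus `torch` when present (the helper name is ours, not part of the package):

```python
import importlib.util

# Check whether the optional `torch` dependency is importable and,
# if so, whether CUDA is actually usable in this environment.
def check_torch_cuda() -> str:
    if importlib.util.find_spec("torch") is None:
        return "torch not installed"
    import torch  # imported only when present

    return "cuda available" if torch.cuda.is_available() else "cpu only"

print(check_torch_cuda())
```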
## Prepare Learnware

In the `learnware` package, each learnware is encapsulated in a `zip` package, which should contain at least the following four files:
- `learnware.yaml`: learnware configuration file.
- `__init__.py`: methods for using the model.
- `stat.json`: the statistical specification of the learnware. Its filename can be customized and recorded in `learnware.yaml`.
- `environment.yaml` or `requirements.txt`: specifies the environment for the model.
To facilitate the construction of a learnware, we provide a [Learnware Template](https://www.bmwu.cloud/static/learnware-template.zip) that users can use as a basis for building their own learnware. We've also detailed the format of the learnware `zip` package in [Learnware Preparation](docs/workflows/upload:prepare-learnware).
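The four-file layout above can be assembled with nothing more than the standard-library `zipfile` module. The sketch below uses placeholder file contents (they are not a working learnware) just to show the expected package structure:

```python
import zipfile

# Assemble a minimal learnware zip package with the four required files.
# File contents here are placeholders, not a working learnware.
files = {
    "learnware.yaml": "model:\n  class_name: MyModel\n",  # learnware configuration
    "__init__.py": "# methods for using the model\n",
    "stat.json": "{}",                                    # statistical specification
    "requirements.txt": "scikit-learn\n",                 # model environment
}

with zipfile.ZipFile("demo_learnware.zip", "w") as zf:
    for name, content in files.items():
        zf.writestr(name, content)

with zipfile.ZipFile("demo_learnware.zip") as zf:
    print(sorted(zf.namelist()))
```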
## Learnware Package Workflow

Users can start a `learnware` workflow according to the following steps:
### Initialize a Learnware Market

The `EasyMarket` class provides the core functions of a `Learnware Market`. You can initialize a basic `Learnware Market` named "demo" using the code snippet below:
```python
import learnware
from learnware.market import instantiate_learnware_market

learnware.init()

# instantiate a demo market
demo_market = instantiate_learnware_market(market_id="demo", name="easy", rebuild=True)
```
### Upload Learnware

Before uploading your learnware to the `Learnware Market`, you'll need to create a semantic specification, `semantic_spec`. This involves selecting or inputting values for predefined semantic tags to describe the features of your task and model.
For instance, the following code illustrates the semantic specification for a Scikit-Learn type model. This model is tailored for education scenarios and performs classification tasks on tabular data:
```python
from learnware.specification import generate_semantic_spec

semantic_spec = generate_semantic_spec(
    name="demo_learnware",
    data_type="Table",
    task_type="Classification",
    library_type="Scikit-learn",
    scenarios="Education",
    license="MIT",
)
```
After defining the semantic specification, you can upload your learnware using a single line of code:
```python
demo_market.add_learnware(zip_path, semantic_spec)
```

Here, `zip_path` is the path of your learnware `zip` package.
### Semantic Specification Search

To find learnwares that align with your task's purpose, you'll need to provide a semantic specification, `user_semantic`, that outlines your task's characteristics. The `Learnware Market` will then perform an initial search using `user_semantic`, identifying potentially useful learnwares with models that solve tasks similar to your requirements.
```python
from learnware.market import BaseUserInfo

# construct user_info, which includes a semantic specification
user_info = BaseUserInfo(id="user", semantic_spec=semantic_spec)

# search_learnware performs a semantic specification search when user_info doesn't include a statistical specification
search_result = demo_market.search_learnware(user_info)
single_result = search_result.get_single_results()

# single_result: the List of Tuple[Score, Learnware] returned by the semantic specification search
print(single_result)
```
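The intuition behind this first stage is simple tag compatibility: a learnware is a candidate only if its semantic tags agree with the user's requirements. The sketch below is purely illustrative (the `semantic_match` helper and the dict layout are ours; the package's real matching logic is more involved):

```python
# Keep learnwares whose semantic tags are compatible with the user's requirement.
def semantic_match(user_spec: dict, learnware_spec: dict) -> bool:
    for key, required in user_spec.items():
        if required and learnware_spec.get(key) != required:
            return False
    return True

market = [
    {"id": "lw1", "Data": "Table", "Task": "Classification"},
    {"id": "lw2", "Data": "Image", "Task": "Classification"},
]
user = {"Data": "Table", "Task": "Classification"}
hits = [lw["id"] for lw in market if semantic_match(user, lw)]
print(hits)  # ['lw1']
```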
### Statistical Specification Search

If you decide to provide your own statistical specification file, `stat.json`, the `Learnware Market` can further refine the selection of learnwares from the previous step. This second-stage search leverages statistical information to identify one or more learnwares that are most likely to be beneficial for your task.

For example, the code below executes a learnware search using Reduced Set Kernel Embedding as the statistical specification:
```python
import os

import learnware.specification as specification

user_spec = specification.RKMETableSpecification()

# unzip_path: directory of the unzipped learnware zip package
user_spec.load(os.path.join(unzip_path, "rkme.json"))
user_info = BaseUserInfo(
    semantic_spec=user_semantic, stat_info={"RKMETableSpecification": user_spec}
)
search_result = demo_market.search_learnware(user_info)

single_result = search_result.get_single_results()
multiple_result = search_result.get_multiple_results()

# search_item.score: based on MMD distances, sorted in descending order
# search_item.learnware.id: id of the learnware, sorted by score in descending order
for search_item in single_result:
    print(f"score: {search_item.score}, learnware_id: {search_item.learnware.id}")

# mixture_item.learnwares: collection of learnwares whose combined use is beneficial
# mixture_item.score: score assigned to the combined set of learnwares in `mixture_item.learnwares`
for mixture_item in multiple_result:
    print(f"mixture_score: {mixture_item.score}\n")
    mixture_id = " ".join([learnware.id for learnware in mixture_item.learnwares])
    print(f"mixture_learnware: {mixture_id}\n")
```
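The scores above are derived from distribution distances such as MMD between the user's data summary and each learnware's RKME. As a rough illustration of why a smaller distance suggests a more relevant learnware, here is a plain empirical MMD² with a Gaussian kernel on 1-D samples (a sketch of the idea, not the package's implementation):

```python
import math

def gaussian_kernel(a: float, b: float, gamma: float = 1.0) -> float:
    return math.exp(-gamma * (a - b) ** 2)

# Empirical squared Maximum Mean Discrepancy between two sample sets.
def mmd_squared(xs, ys, gamma: float = 1.0) -> float:
    kxx = sum(gaussian_kernel(a, b, gamma) for a in xs for b in xs) / (len(xs) ** 2)
    kyy = sum(gaussian_kernel(a, b, gamma) for a in ys for b in ys) / (len(ys) ** 2)
    kxy = sum(gaussian_kernel(a, b, gamma) for a in xs for b in ys) / (len(xs) * len(ys))
    return kxx + kyy - 2 * kxy

user_data = [0.0, 0.1, 0.2]
close_learnware = [0.05, 0.15, 0.25]
far_learnware = [5.0, 5.1, 5.2]

# the learnware whose data distribution is closer yields the smaller MMD
print(mmd_squared(user_data, close_learnware) < mmd_squared(user_data, far_learnware))  # True
```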
### Reuse Learnwares

With the collection of learnwares, `mixture_item.learnwares`, returned in the previous step, you can readily apply them to make predictions on your own data, bypassing the need to train a model from scratch. We provide two methods for reusing a given list of learnwares: `JobSelectorReuser` and `AveragingReuser`. Substitute `test_x` in the code snippet below with your testing data, and you're all set to reuse learnwares:
```python
from learnware.reuse import JobSelectorReuser, AveragingReuser

# use the job selector reuser to reuse the searched learnwares for prediction
reuse_job_selector = JobSelectorReuser(learnware_list=mixture_item.learnwares)
job_selector_predict_y = reuse_job_selector.predict(user_data=test_x)

# use the averaging ensemble reuser to reuse the searched learnwares for prediction
reuse_ensemble = AveragingReuser(learnware_list=mixture_item.learnwares)
ensemble_predict_y = reuse_ensemble.predict(user_data=test_x)
```
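Conceptually, averaging reuse simply pools the predictions of each searched learnware's model. The minimal sketch below illustrates that idea with plain functions (the helper name is ours, not the package's API):

```python
# Average the predictions of each learnware's model over the inputs.
def averaging_reuse(models, xs):
    preds = [[m(x) for x in xs] for m in models]
    return [sum(col) / len(models) for col in zip(*preds)]

models = [lambda x: x + 1.0, lambda x: x + 3.0]
print(averaging_reuse(models, [0.0, 10.0]))  # [2.0, 12.0]
```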
We also provide two methods for reusing a given list of learnwares when the user has labeled data: `EnsemblePruningReuser` and `FeatureAugmentReuser`. Substitute `test_x` in the code snippet below with your testing data, substitute `train_X, train_y` with your labeled training data, and you're all set to reuse learnwares:
```python
from learnware.reuse import EnsemblePruningReuser, FeatureAugmentReuser

# use the ensemble pruning reuser to reuse the searched learnwares for prediction
reuse_ensemble = EnsemblePruningReuser(learnware_list=mixture_item.learnwares, mode="classification")
reuse_ensemble.fit(train_X, train_y)
ensemble_pruning_predict_y = reuse_ensemble.predict(user_data=test_x)

# use the feature augment reuser to reuse the searched learnwares for prediction
reuse_feature_augment = FeatureAugmentReuser(learnware_list=mixture_item.learnwares, mode="classification")
reuse_feature_augment.fit(train_X, train_y)
feature_augment_predict_y = reuse_feature_augment.predict(user_data=test_x)
```
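The feature-augmentation idea amounts to appending each learnware's output to the original features before fitting a simple model on the augmented data. A minimal sketch (the `augment` helper is ours, purely illustrative of the pattern `FeatureAugmentReuser` wraps):

```python
# Append each learnware model's output to every feature row.
def augment(features, learnware_models):
    return [row + [m(row) for m in learnware_models] for row in features]

learnware_models = [lambda row: sum(row)]  # one learnware's (toy) prediction
train_X = [[1.0, 2.0], [3.0, 4.0]]
augmented = augment(train_X, learnware_models)
print(augmented)  # [[1.0, 2.0, 3.0], [3.0, 4.0, 7.0]]
```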
### Auto Workflow Example

The `learnware` package also offers automated workflow examples, including preparing learnwares, uploading and deleting learnwares from the market, and searching for learnwares using both semantic and statistical specifications. To experience the basic workflow of the `learnware` package, users can run `test/test_workflow/test_workflow.py`.
# Experiments and Examples

## Environment
For all experiments, we used a single Linux server. Details on the specifications are listed in the table below. All processors were used for training and evaluating.

<div align=center>

| System | GPU | CPU |
|----------------------|--------------------|--------------------------|
| Ubuntu 20.04.4 LTS | Nvidia Tesla V100S | Intel(R) Xeon(R) Gold 6240R |

</div>
## Tabular Scenario Experiments

### Datasets

Our study involved three public datasets in the sales forecasting field: [Predict Future Sales (PFS)](https://www.kaggle.com/c/competitive-data-science-predict-future-sales/data), [M5 Forecasting (M5)](https://www.kaggle.com/competitions/m5-forecasting-accuracy/data), and [Corporacion](https://www.kaggle.com/competitions/favorita-grocery-sales-forecasting/data).

We applied various pre-processing methods to these datasets to enhance the richness of the data. After pre-processing, we first divided each dataset by store and then split the data for each store into training and test sets. Specifically:

- For PFS, the test set consisted of the last month's data from each store.
- For M5, we designated the final 28 days' data from each store as the test set.
- For Corporacion, the test set was composed of the last 16 days of data from each store.

In the submitting stage, the Corporacion dataset's 55 stores are regarded as 165 uploaders, each employing one of three different feature engineering methods. For the PFS dataset, 100 uploaders are established, each using one of two feature engineering approaches. These uploaders then utilize their respective stores' training data to develop LightGBM models. As a result, the learnware market comprises 265 learnwares, derived from five types of feature spaces and two types of label spaces.

Based on the specific design of user tasks, our experiments were primarily categorized into two types:

- **Homogeneous experiments** are designed to evaluate performance when users can reuse learnwares in the learnware market that have the same feature space as their tasks (homogeneous learnwares). This contributes to showing the effectiveness of using learnwares that align closely with the user's specific requirements.
- **Heterogeneous experiments** aim to evaluate the performance of identifying and reusing helpful heterogeneous learnwares in situations where no available learnwares match the feature space of the user's task. This helps to highlight the potential of learnwares for applications beyond their original purpose.
### Homogeneous Tabular Scenario

For homogeneous experiments, the 55 stores in the Corporacion dataset act as 55 users, each applying one feature engineering method and using the test data from their respective store as user data. These users can then search for homogeneous learnwares in the market with the same feature spaces as their tasks.
The Mean Squared Error (MSE) of search and reuse across all users is presented in the table below:

<div align=center>

| Setting | MSE |
|-----------------------------------|--------|
| Mean in Market (Single) | 0.331 |
| Best in Market (Single) | 0.151 |
| Top-1 Reuse (Single) | 0.280 |
| Job Selector Reuse (Multiple) | 0.274 |
| Average Ensemble Reuse (Multiple) | 0.267 |

</div>

When users have both test data and limited training data derived from their original data, reusing single or multiple searched learnwares from the market can often yield better results than training models from scratch on limited training data. We present the change curves in MSE for the user's self-trained model, as well as for the Feature Augmentation single learnware reuse method and the Ensemble Pruning multiple learnware reuse method. These curves display their performance on the user's test data as the amount of labeled training data increases. The average results across 55 users are depicted in the figure below:

<div align=center>
  <img src="./docs/_static/img/table_homo_labeled.png" width="50%"/>
</div>

From the figure, it is evident that when users have limited training data, the performance of reusing single/multiple table learnwares is superior to that of the user's own model. This emphasizes the benefit of learnware reuse in significantly reducing the need for extensive training data and achieving enhanced results when available user training data is limited.
### Heterogeneous Tabular Scenario

In heterogeneous experiments, the learnware market recommends helpful heterogeneous learnwares with feature spaces different from the user tasks. Based on whether there are learnwares in the market that handle tasks similar to the user's task, the experiments can be further subdivided into the following two types:
#### Cross Feature Space Experiments

We designate the 41 stores in the PFS dataset as users, creating their user data with an alternative feature engineering approach that differs from the methods employed by learnwares in the market. Consequently, while the market's learnwares from the PFS dataset undertake tasks very similar to our users', the feature spaces do not match exactly. In this experimental configuration, we tested various heterogeneous learnware reuse methods (without using the user's labeled data) and compared them to the user's self-trained model based on a small amount of training data. The average MSE performance across 41 users is as follows:

<div align=center>
| Setting | MSE |
|-----------------------------------|--------|
| Mean in Market (Single) | 1.459 |
| Best in Market (Single) | 1.226 |
| Top-1 Reuse (Single) | 1.407 |
| Average Ensemble Reuse (Multiple) | 1.312 |
| User model with 50 labeled data | 1.267 |
</div>

From the results, it is noticeable that the learnware market still performs quite well even when users lack labeled data, provided it includes learnwares addressing tasks that are similar but not identical to the user's. In these instances, the market's effectiveness can match or even rival scenarios where users have access to a limited quantity of labeled data.
#### Cross Task Experiments
Here we have chosen 10 stores from the M5 dataset to act as users. Although the broad task of sales forecasting is similar to the tasks addressed by the learnwares in the market, no learnwares are available that directly cater to the M5 sales forecasting requirements. All learnwares show variations in both feature and label spaces compared to the tasks of M5 users. We present the change curves in RMSE for the user's self-trained model and several learnware reuse methods. These curves display their performance on the user's test data as the amount of labeled training data increases. The average results across 10 users are depicted in the figure below:

<div align=center>
  <img src="./docs/_static/img/table_hetero_labeled.png" width="50%"/>
</div>

We can observe that heterogeneous learnwares are beneficial when a limited amount of the user's labeled training data is available, aiding in better alignment with the user's specific task. This underscores the potential of learnwares to be applied to tasks beyond their original purpose.
## Image Scenario Experiment

For the CIFAR-10 dataset, we sampled the training set unevenly by category and constructed unbalanced training datasets for the 50 learnwares, each containing only some of the categories. This makes it unlikely that any learnware in the learnware market can accurately handle all categories of data; only the learnware whose training data is closest to the data distribution of the target task is likely to perform well on it. Specifically, the probability of each category being sampled obeys a random multinomial distribution, with a non-zero probability of sampling on only 4 categories, and the sampling ratio is 0.4:0.4:0.1:0.1. Ultimately, the training set for each learnware contains 12,000 samples covering the data of 4 categories in CIFAR-10.

We constructed 50 target tasks using data from the test set of CIFAR-10. Similar to constructing the training set for the learnwares, we sampled the test set unevenly to allow for some variation between tasks. Specifically, the probability of each category being sampled obeys a random multinomial distribution, with non-zero sampling probability on 6 categories, and the sampling ratio is 0.3:0.3:0.1:0.1:0.1:0.1. Ultimately, each target task contains 3000 samples covering the data of 6 categories in CIFAR-10.
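The sampling scheme for the target tasks can be sketched in a few lines of standard-library Python. This is illustrative only (dataset loading is omitted, and the helper name is ours): pick 6 of the 10 CIFAR-10 categories, assign the ratios 0.3:0.3:0.1:0.1:0.1:0.1, and derive per-category sample counts:

```python
import random

# Per-category sample counts for one unbalanced target task.
def sample_counts(n_total=3000, ratios=(0.3, 0.3, 0.1, 0.1, 0.1, 0.1), seed=0):
    rng = random.Random(seed)
    categories = rng.sample(range(10), len(ratios))  # 6 distinct CIFAR-10 categories
    return {c: int(n_total * r) for c, r in zip(categories, ratios)}

counts = sample_counts()
print(sum(counts.values()))  # 3000
```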
| With this experimental setup, we evaluated the performance of RKME Image using 1 - Accuracy as the loss. | |||
| <div align=center> | |||
| Setting                           | 1 - Accuracy |
| |-----------------------------------|----------| | |||
| | Mean in Market (Single) | 0.655 | | |||
| | Best in Market (Single) | 0.304 | | |||
| | Top-1 Reuse (Single) | 0.406 | | |||
| | Job Selector Reuse (Multiple) | 0.406 | | |||
| | Average Ensemble Reuse (Multiple) | 0.310 | | |||
| </div> | |||
| In some specific settings, the user will have a small number of labelled samples. In such settings, learning the weight of selected learnwares on a limited number of labelled samples can result in better performance than training directly on a limited number of labelled samples. | |||
| <div align=center> | |||
| <img src="./docs/_static/img/image_labeled.svg" width="50%"/> | |||
| </div> | |||
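The weighting idea in the labelled setting can be illustrated with a simple stacking sketch: learn a combiner over the learnwares' predicted class probabilities using the few labelled samples. The data below is synthetic and the approach is a generic stacking ensemble, not the package's `EnsemblePruningReuser`:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Hypothetical stand-ins for the predictions of 3 selected learnwares
# on 100 labelled user samples (class-probability vectors, 10 classes).
n_samples, n_classes, n_learnwares = 100, 10, 3
preds = rng.random((n_learnwares, n_samples, n_classes))
y = rng.integers(0, n_classes, size=n_samples)

# Concatenate each learnware's probability vector into one feature vector
# and learn a combination on the labelled data (a simple stacking ensemble).
X = preds.transpose(1, 0, 2).reshape(n_samples, -1)
combiner = LogisticRegression(max_iter=1000).fit(X, y)
weighted_pred = combiner.predict(X)
```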
| ## Text Scenario Experiment | |||
| ### Datasets | |||
| We conducted experiments on the widely used text benchmark dataset: [20-newsgroup](http://qwone.com/~jason/20Newsgroups/). 20-newsgroup is a renowned text classification benchmark with a hierarchical structure, featuring 5 superclasses {comp, rec, sci, talk, misc}. | |||
| In the submitting stage, we enumerated all combinations of three superclasses from the five available, randomly sampling 50% of each combination from the training set to create datasets for 50 uploaders. | |||
In the deploying stage, we considered all combinations of two superclasses out of the five, selecting all data for each combination from the testing set as a test dataset for one user. This resulted in 10 users. The user's own training data was generated using the same sampling procedure as the user test data, though it was drawn from the training dataset.
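The combination counts above follow directly from the binomial coefficients, since C(5, 3) = C(5, 2) = 10; the 50 uploader datasets presumably come from repeating the 50% sampling several times per combination. A quick check:

```python
from itertools import combinations

superclasses = ["comp", "rec", "sci", "talk", "misc"]

uploader_combos = list(combinations(superclasses, 3))  # C(5, 3) = 10
user_combos = list(combinations(superclasses, 2))      # C(5, 2) = 10
```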
Model training comprised two parts: the first part involved training a TF-IDF feature extractor, and the second part used the extracted text feature vectors to train a naive Bayes classifier.
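The two-part training pipeline can be sketched with scikit-learn (the toy corpus below stands in for the 20-newsgroup data; this is an illustrative sketch, not the benchmark's exact code):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Tiny stand-in corpus labelled by superclass.
texts = ["space orbit launch", "hockey game score",
         "disk drive failure", "orbit satellite telescope"]
labels = ["sci", "rec", "comp", "sci"]

# Part 1: TF-IDF feature extractor; Part 2: naive Bayes classifier.
model = make_pipeline(TfidfVectorizer(), MultinomialNB())
model.fit(texts, labels)
pred = model.predict(["satellite launch orbit"])
```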
| Our experiments comprise two components: | |||
| - **unlabeled_text_example** is designed to evaluate performance when users possess only testing data, searching and reusing learnware available in the market. | |||
| - **labeled_text_example** aims to assess performance when users have both testing and limited training data, searching and reusing learnware directly from the market instead of training a model from scratch. This helps determine the amount of training data saved for the user. | |||
| ### Results | |||
| - **unlabeled_text_example**: | |||
| The table below presents the mean accuracy of search and reuse across all users: | |||
| <div align=center> | |||
| | Setting | Accuracy | | |||
| |-----------------------------------|----------| | |||
| | Mean in Market (Single) | 0.507 | | |||
| | Best in Market (Single) | 0.859 | | |||
| | Top-1 Reuse (Single) | 0.846 | | |||
| | Job Selector Reuse (Multiple) | 0.845 | | |||
| | Average Ensemble Reuse (Multiple) | 0.862 | | |||
| </div> | |||
| - **labeled_text_example**: | |||
| We present the change curves in classification error rates for both the user's self-trained model and the multiple learnware reuse (EnsemblePrune), showcasing their performance on the user's test data as the user's training data increases. The average results across 10 users are depicted below: | |||
| <div align=center> | |||
| <img src="./docs/_static/img/text_labeled.svg" width="50%"/> | |||
| </div> | |||
| From the figure above, it is evident that when the user's own training data is limited, the performance of multiple learnware reuse surpasses that of the user's own model. As the user's training data grows, it is expected that the user's model will eventually outperform the learnware reuse. This underscores the value of reusing learnware to significantly conserve training data and achieve superior performance when user training data is limited. | |||
| # About | |||
| ## Contributors | |||
| We appreciate all contributions and thank all the contributors! | |||
| <div align=center> | |||
| <img src="https://github.com/Learnware-LAMDA/Learnware/graphs/contributors"/> | |||
| </div> | |||
| ## About us | |||
| Visit [LAMDA's official website](http://www.lamda.nju.edu.cn/). | |||
| @@ -2,7 +2,10 @@ | |||
| About Us | |||
| ================ | |||
We thank all the contributors for the development of the learnware package:
| Contributors | |||
| ================ | |||
| .. image:: https://github.com/Learnware-LAMDA/Learnware/graphs/contributors | |||
| :align: center | |||
Many other members of the LAMDA Group also participated in the discussions, design, and development of the learnware package.
| For more details about us, please refer to `LAMDA Group <https://www.lamda.nju.edu.cn/>`_. | |||
| @@ -3,6 +3,39 @@ | |||
| For Developer | |||
| ================ | |||
| Install with Dev Mode | |||
| ======================= | |||
As a developer, you often want to make changes to ``Learnware Market`` and hope they will be reflected directly in your environment without reinstalling the package. You can install ``Learnware Market`` in editable mode with the following command.
| .. code-block:: bash | |||
| $ git clone https://github.com/Learnware-LAMDA/Learnware.git && cd Learnware | |||
| $ pip install -e .[dev] | |||
| .. note:: | |||
It's recommended to use anaconda/miniconda to set up the environment. You can also run ``pip install -e .[full,dev]`` to install ``torch`` automatically.
| Commit Format | |||
| ============== | |||
Please format commit messages as ``prefix`` + ``space`` + ``suffix``.
| There are four choices for the prefix, and they can be combined using commas: | |||
| - [ENH]: Represents enhancement, indicating the addition of new features. | |||
| - [DOC]: Indicates modifications to the documentation. | |||
| - [FIX]: Represents bug fixes and typo corrections. | |||
| - [MNT]: Indicates other minor modifications, such as version updates. | |||
| The suffix specifies the specific nature of the modification, with the initial letter capitalized. | |||
For example, the following are all valid:
- [DOC] Fix the document
- [FIX, ENH] Fix the bug and add some feature
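A small regular expression capturing the format above (a sketch for self-checking commit messages, not an official hook):

```python
import re

# "[PREFIX] Suffix" with prefixes ENH/DOC/FIX/MNT, optionally comma-combined,
# followed by a space and a capitalized suffix.
COMMIT_RE = re.compile(r"^\[(ENH|DOC|FIX|MNT)(,\s*(ENH|DOC|FIX|MNT))*\]\s[A-Z].*$")

def is_valid_commit(msg: str) -> bool:
    return COMMIT_RE.match(msg) is not None
```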
| Docstring | |||
| ============ | |||
| Please use the `Numpydoc Style <https://stackoverflow.com/a/24385103>`_. | |||
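For instance, a Numpydoc-style docstring looks like this (the function itself is a placeholder):

```python
def predict(X):
    """Predict labels for the input data.

    Parameters
    ----------
    X : numpy.ndarray
        Input feature matrix of shape (n_samples, n_features).

    Returns
    -------
    numpy.ndarray
        Predicted labels of shape (n_samples,).
    """
    raise NotImplementedError
```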
| @@ -15,7 +48,7 @@ Continuous Integration | |||
| Continuous Integration (CI) tools help you stick to the quality standards by running tests every time you push a new commit and reporting the results to a pull request. | |||
| ``Learnware Market`` will check the following tests when you pull a request: | |||
1. We will check your code length; you can fix your code style with the following commands:
| .. code-block:: bash | |||
| @@ -30,22 +63,34 @@ Continuous Integration (CI) tools help you stick to the quality standards by run | |||
| pip install pytest | |||
| python -m pytest tests | |||
| Development Guidance | |||
| ================================= | |||
| ``pre-commit`` Config | |||
| ======================== | |||
The ``learnware`` package supports configuring ``pre-commit``. Run the following command to install ``pre-commit``:
| .. code-block:: bash | |||
| pip install pre-commit | |||
Run the following command in the root directory of the ``Learnware`` project to enable ``pre-commit``:
| .. code-block:: bash | |||
| pre-commit install | |||
| ``isort`` Config | |||
| =================== | |||
The code in the ``learnware`` package is processed by ``isort`` (``examples`` and ``tests`` are excluded). Run the following command to install ``isort``:
| .. code-block:: bash | |||
| pip install isort | |||
Run the following command in the root directory of the ``Learnware`` project to run ``isort``:
| .. code-block:: bash | |||
| isort learnware --reverse-relative | |||
| @@ -4,7 +4,7 @@ | |||
| Learnware & Reuser | |||
| ========================================== | |||
``Learnware`` is the most basic concept in the ``learnware paradigm``. In this section, we will introduce the concept and design of ``Learnware`` and its extension for ``Hetero Reuse``. Then we will introduce the ``Reuse Methods``, which apply one or several ``Learnware``\ s to solve the user's task.
| Concepts | |||
| =================== | |||
| @@ -16,7 +16,7 @@ In our implementation, the class ``Learnware`` has 3 important member variables: | |||
- ``model``: The model in the learnware; it can be a ``BaseModel`` or a dict including the model name and path. When it is a dict, the function ``Learnware.instantiate_model`` is used to transform it into a ``BaseModel``. The function ``Learnware.predict`` uses the model to predict for an input ``X``. See more in `COMPONENTS: Model <./model.html>`_.
| - ``specification``: The specification including the semantic specification and the statistic specification. | |||
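A rough structural sketch of the members described above (simplified, hypothetical stand-ins; the package's real classes have richer interfaces):

```python
import numpy as np

# Minimal stand-ins for the structure described above: a model exposing
# predict, wrapped together with its specification in a Learnware object.
class BaseModel:
    def predict(self, X):
        return np.zeros(len(X))  # placeholder prediction

class Learnware:
    def __init__(self, learnware_id, model, specification):
        self.id = learnware_id
        self.model = model                  # a BaseModel (or a name/path dict)
        self.specification = specification  # semantic + statistical specs

    def predict(self, X):
        # Delegates to the wrapped model, as described in the text.
        return self.model.predict(X)

lw = Learnware("demo", BaseModel(), {"semantic": {}, "stat": {}})
pred = lw.predict(np.random.rand(4, 3))
```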
| Learnware for Hetero Reuse | |||
| ======================================================================= | |||
In the Hetero Market (see `COMPONENTS: Hetero Market <./market.html#hetero-market>`_ for details), ``HeteroSearcher`` identifies and recommends helpful learnwares among all learnwares in the market,
| @@ -107,7 +107,7 @@ specifies the ensemble method(default is set to ``mean``). | |||
| Reuse Learnware with Labeled Data | |||
| ---------------------------------- | |||
| When users have a small amount of labeled data available, the ``learnware`` package provides two methods: ``EnsemblePruningReuser`` and ``FeatureAugmentReuser`` to help reuse learnwares. | |||
| They are both initialized with a list of ``Learnware`` objects ``learnware_list``, and have different implementations of ``fit`` and ``predict`` methods. | |||
| EnsemblePruningReuser | |||
| @@ -4,20 +4,20 @@ | |||
| Learnware Market | |||
| ================================ | |||
| The ``Learnware Market`` receives high-performance machine learning models from developers, incorporates them into the system, and provides services to users by identifying and reusing learnware to help users solve current tasks. Developers voluntarily submit various learnwares to the learnware market, and the market conducts quality checks and further organization of these learnwares. When users submit task requirements, the learnware market automatically selects whether to recommend a single learnware or a combination of multiple learnwares. | |||
The ``Learnware Market`` will receive various kinds of learnwares, and learnwares from different feature/label spaces form numerous islands of specifications. All these islands together constitute the ``specification world`` in the learnware market. The market should discover and establish connections between different islands, and then merge them into a unified specification world. This further organization supports searching among all learnwares, not just among those that share the same feature space and label space with the user's task requirements.
| Framework | |||
| ====================================== | |||
The ``Learnware Market`` is composed of an ``organizer``, a ``searcher``, and a list of ``checker``\ s.
The ``organizer`` stores and organizes learnwares in the market. It supports ``add``, ``delete``, and ``update`` operations for learnwares, and provides the interface for the ``searcher`` to search learnwares based on user requirements.
The ``searcher`` searches learnwares based on user requirements. Its implementation depends on the concrete implementation and interface of the ``organizer``; usually an ``organizer`` can be compatible with multiple different ``searcher``\ s.
The ``checker`` validates learnwares against certain standards. It checks the utility of a learnware and returns a status along with a message describing the check result. Only learnwares that pass the ``checker`` can be stored and added into the ``Learnware Market``.
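The composition described above can be sketched roughly as follows (simplified, hypothetical class names; not the package's exact API):

```python
# Simplified sketch of the market framework: an organizer stores learnwares,
# checkers gate admission, and a searcher queries the organizer.
class ListOrganizer:
    def __init__(self):
        self.learnwares = []

    def add(self, learnware):
        self.learnwares.append(learnware)

class NameSearcher:
    def search(self, organizer, keyword):
        return [lw for lw in organizer.learnwares if keyword in lw["name"]]

class Market:
    def __init__(self, organizer, searcher, checkers):
        self.organizer, self.searcher, self.checkers = organizer, searcher, checkers

    def add_learnware(self, learnware):
        # Only learnwares that pass every checker are stored.
        if all(check(learnware) for check in self.checkers):
            self.organizer.add(learnware)
            return True
        return False

    def search(self, keyword):
        return self.searcher.search(self.organizer, keyword)

market = Market(ListOrganizer(), NameSearcher(), [lambda lw: "model" in lw])
added = market.add_learnware({"name": "sales_model", "model": object()})
```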
| @@ -37,6 +37,9 @@ Semantic Specification | |||
| The semantic specification consists of a "dict" structure that includes keywords "Data", "Task", "Library", "Scenario", "License", "Description", and "Name". | |||
| In the case of table learnwares, users should additionally provide descriptions for each feature dimension and output dimension through the "Input" and "Output" keywords. | |||
| - If "data_type" is "Table", you need to specify the semantics of each dimension of the model's input data to make the uploaded learnware suitable for tasks with heterogeneous feature spaces. | |||
| - If "task_type" is "Classification", you need to provide the semantics of model output labels (prediction labels start from 0), making the uploaded learnware suitable for classification tasks with heterogeneous output spaces. | |||
| - If "task_type" is "Regression", you need to specify the semantics of each dimension of the model output, making the uploaded learnware suitable for regression tasks with heterogeneous output spaces. | |||
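As an illustration of the keyword structure described above (field values are invented, and the exact nesting follows the package's own schema, which may differ from this sketch):

```python
# Hypothetical semantic specification for a table-based classification
# learnware; the "Input"/"Output" entries give per-dimension semantics.
semantic_spec = {
    "Data": "Table",
    "Task": "Classification",
    "Library": "Scikit-learn",
    "Scenario": "Business",
    "License": "MIT",
    "Description": "Predicts customer churn from tabular features",
    "Name": "churn_demo",
    "Input": {"0": "age", "1": "monthly_income"},  # feature semantics
    "Output": {"0": "stay", "1": "churn"},         # label semantics (from 0)
}
required = {"Data", "Task", "Library", "Scenario", "License", "Description", "Name"}
```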
| Regular Specification | |||
| ====================================== | |||
| @@ -131,7 +134,7 @@ with particular learnware market implementations. | |||
| - Learnware searchers perform helpful learnware recommendations among all table learnwares in the market, leveraging the ``system specification``\ s generated for users. | |||
| The ``learnware`` package now includes a type of ``system specification``, named ``HeteroMapTableSpecification``, made especially for the ``Hetero Market`` implementation. | |||
| This specification is automatically given to all table learnwares when they are added to the ``Hetero Market``. | |||
| It is also set up to be updated periodically, ensuring it remains accurate as the learnware market evolves and builds more precise specification worlds. | |||
| Please refer to `COMPONENTS: Hetero Market <../components/market.html#hetero-market>`_ for implementation details. | |||
| @@ -7,7 +7,9 @@ | |||
| ``Learnware`` Documentation | |||
| ============================================================ | |||
| The ``learnware`` package provides a fundamental implementation of the central concepts and procedures for the learnware paradigm. | |||
| A learnware is a well-performed trained machine learning model with a specification that enables it to be adequately identified to reuse according to the requirement of future users who may know nothing about the learnware in advance. | |||
The learnware paradigm is a new paradigm aimed at enabling users to reuse existing well-trained models to solve their AI tasks instead of starting from scratch.
| .. _user_guide: | |||
| @@ -1,5 +1,12 @@ | |||
| .. _faq: | |||
| ==================== | |||
| FAQ | |||
| Learnware FAQ | |||
| ==================== | |||
| Learnware Frequently Asked Questions | |||
| ===================================== | |||
| .. contents:: | |||
| :depth: 1 | |||
| :local: | |||
| :backlinks: none | |||
| @@ -3,7 +3,7 @@ | |||
| API Reference | |||
| ================================ | |||
| Here you can find high-level ``Learnware`` interfaces. | |||
| Market | |||
| ==================== | |||
| @@ -13,23 +13,96 @@ Market | |||
| .. autoclass:: learnware.market.BaseUserInfo | |||
| :members: | |||
| Organizer | |||
| ------------------ | |||
| .. autoclass:: learnware.market.BaseOrganizer | |||
| :members: | |||
| .. autoclass:: learnware.market.EasyOrganizer | |||
| :members: | |||
| .. autoclass:: learnware.market.HeteroOrganizer | |||
| :members: | |||
| Searcher | |||
| ------------------ | |||
| .. autoclass:: learnware.market.BaseSearcher | |||
| :members: | |||
| .. autoclass:: learnware.market.EasySearcher | |||
| :members: | |||
| .. autoclass:: learnware.market.EasyExactSemanticSearcher | |||
| :members: | |||
| .. autoclass:: learnware.market.EasyFuzzSemanticSearcher | |||
| :members: | |||
| .. autoclass:: learnware.market.EasyStatSearcher | |||
| :members: | |||
| .. autoclass:: learnware.market.HeteroSearcher | |||
| :members: | |||
| Checker | |||
| ------------------ | |||
| .. autoclass:: learnware.market.BaseChecker | |||
| :members: | |||
| .. autoclass:: learnware.market.EasyChecker | |||
| :members: | |||
| .. autoclass:: learnware.market.EasySemanticChecker | |||
| :members: | |||
| .. autoclass:: learnware.market.EasyStatChecker | |||
| :members: | |||
| Learnware | |||
| ==================== | |||
| .. autoclass:: learnware.learnware.Learnware | |||
| :members: | |||
| Reuser | |||
| ==================== | |||
| .. autoclass:: learnware.reuse.BaseReuser | |||
| :members: | |||
| Data Independent Reuser | |||
| ------------------------- | |||
| .. autoclass:: learnware.reuse.JobSelectorReuser | |||
| :members: | |||
| .. autoclass:: learnware.reuse.AveragingReuser | |||
| :members: | |||
| Data Dependent Reuser | |||
| ------------------------- | |||
| .. autoclass:: learnware.reuse.EnsemblePruningReuser | |||
| :members: | |||
| .. autoclass:: learnware.reuse.FeatureAugmentReuser | |||
| :members: | |||
| Aligned Learnware | |||
| -------------------- | |||
| .. autoclass:: learnware.reuse.AlignLearnware | |||
| :members: | |||
| .. autoclass:: learnware.reuse.FeatureAlignLearnware | |||
| :members: | |||
| .. autoclass:: learnware.reuse.HeteroMapAlignLearnware | |||
| :members: | |||
| Specification | |||
| ==================== | |||
| @@ -39,6 +112,12 @@ Specification | |||
| .. autoclass:: learnware.specification.BaseStatSpecification | |||
| :members: | |||
| Regular Specification | |||
| -------------------------- | |||
| .. autoclass:: learnware.specification.RegularStatSpecification | |||
| :members: | |||
| .. autoclass:: learnware.specification.RKMETableSpecification | |||
| :members: | |||
| @@ -48,8 +127,32 @@ Specification | |||
| .. autoclass:: learnware.specification.RKMETextSpecification | |||
| :members: | |||
| System Specification | |||
| -------------------------- | |||
| .. autoclass:: learnware.specification.HeteroMapTableSpecification | |||
| :members: | |||
| Model | |||
| ==================== | |||
| Base Model | |||
| -------------- | |||
| .. autoclass:: learnware.model.BaseModel | |||
| :members: | |||
| Container | |||
| ------------- | |||
| .. autoclass:: learnware.client.ModelContainer | |||
| :members: | |||
| .. autoclass:: learnware.client.ModelCondaContainer | |||
| :members: | |||
| .. autoclass:: learnware.client.ModelDockerContainer | |||
| :members: | |||
| .. autoclass:: learnware.client.LearnwaresContainer | |||
| :members: | |||
| @@ -16,8 +16,8 @@ Ubuntu 20.04.4 LTS Nvidia Tesla V100S Intel(R) Xeon(R) Gold 6240R | |||
| ==================== ==================== =============================== | |||
| Tabular Data Experiments | |||
| =========================== | |||
| Datasets | |||
| ------------------ | |||
| @@ -43,8 +43,8 @@ Based on the specific design of user tasks, our experiments were primarily categ | |||
| - ``heterogeneous experiments`` aim to evaluate the performance of identifying and reusing helpful heterogeneous learnwares in situations where | |||
| no available learnwares match the feature space of the user's task. This helps to highlight the potential of learnwares for applications beyond their original purpose. | |||
| Homogeneous Tabular Dataset | |||
| ----------------------------- | |||
| For homogeneous experiments, the 55 stores in the Corporacion dataset act as 55 users, each applying one feature engineering method, | |||
| and using the test data from their respective store as user data. These users can then search for homogeneous learnwares in the market with the same feature spaces as their tasks. | |||
| @@ -52,17 +52,20 @@ and using the test data from their respective store as user data. These users ca | |||
| The Mean Squared Error (MSE) of search and reuse across all users is presented in the table below: | |||
| +-----------------------------------+---------------------+ | |||
| | Mean in Market (Single) | 0.331 | | |||
| | Setting | MSE | | |||
| +===================================+=====================+ | |||
| | Mean in Market (Single) | 0.331 | | |||
| +-----------------------------------+---------------------+ | |||
| | Best in Market (Single) | 0.151 | | |||
| | Best in Market (Single) | 0.151 | | |||
| +-----------------------------------+---------------------+ | |||
| | Top-1 Reuse (Single) | 0.280 | | |||
| | Top-1 Reuse (Single) | 0.280 | | |||
| +-----------------------------------+---------------------+ | |||
| | Job Selector Reuse (Multiple) | 0.274 | | |||
| | Job Selector Reuse (Multiple) | 0.274 | | |||
| +-----------------------------------+---------------------+ | |||
| | Average Ensemble Reuse (Multiple) | 0.267 | | |||
| | Average Ensemble Reuse (Multiple) | 0.267 | | |||
| +-----------------------------------+---------------------+ | |||
| When users have both test data and limited training data derived from their original data, reusing single or multiple searched learnwares from the market can often yield | |||
| better results than training models from scratch on limited training data. We present the change curves in MSE for the user's self-trained model, as well as for the Feature Augmentation single learnware reuse method and the Ensemble Pruning multiple learnware reuse method. | |||
| These curves display their performance on the user's test data as the amount of labeled training data increases. | |||
| @@ -76,8 +79,8 @@ From the figure, it's evident that when users have limited training data, the pe | |||
| This emphasizes the benefit of learnware reuse in significantly reducing the need for extensive training data and achieving enhanced results when available user training data is limited. | |||
| Heterogeneous Tabular Dataset | |||
| ------------------------------ | |||
In heterogeneous experiments, the learnware market recommends helpful heterogeneous learnwares whose feature spaces differ from
the user tasks. Based on whether there are learnwares in the market that handle tasks similar to the user's task, the experiments can be further subdivided into the following two types:
| @@ -91,6 +94,8 @@ we tested various heterogeneous learnware reuse methods (without using user's la | |||
| The average MSE performance across 41 users are as follows: | |||
| +-----------------------------------+---------------------+ | |||
| | Setting | MSE | | |||
| +===================================+=====================+ | |||
| | Mean in Market (Single) | 1.459 | | |||
| +-----------------------------------+---------------------+ | |||
| | Best in Market (Single) | 1.226 | | |||
| @@ -122,35 +127,36 @@ The average results across 10 users are depicted in the figure below: | |||
| We can observe that heterogeneous learnwares are beneficial when there's a limited amount of the user's labeled training data available, | |||
| aiding in better alignment with the user's specific task. This underscores the potential of learnwares to be applied to tasks beyond their original purpose. | |||
| Image Data Experiment | |||
| ========================= | |||
| For the CIFAR-10 dataset, we sampled the training set unevenly by category and constructed unbalanced training datasets for the 50 learnwares that contained only some of the categories. This makes it unlikely that there exists any learnware in the learnware market that can accurately handle all categories of data; only the learnware whose training data is closest to the data distribution of the target task is likely to perform well on the target task. Specifically, the probability of each category being sampled obeys a random multinomial distribution, with a non-zero probability of sampling on only 4 categories, and the sampling ratio is 0.4: 0.4: 0.1: 0.1. Ultimately, the training set for each learnware contains 12,000 samples covering the data of 4 categories in CIFAR-10. | |||
| We constructed 50 target tasks using data from the test set of CIFAR-10. Similar to constructing the training set for the learnwares, in order to allow for some variation between tasks, we sampled the test set unevenly. Specifically, the probability of each category being sampled obeys a random multinomial distribution, with non-zero sampling probability on 6 categories, and the sampling ratio is 0.3: 0.3: 0.1: 0.1: 0.1: 0.1. Ultimately, each target task contains 3000 samples covering the data of 6 categories in CIFAR-10. | |||
| With this experimental setup, we evaluated the performance of RKME Image using 1 - Accuracy as the loss. | |||
| +-----------------------------------+---------------------+ | |||
| | Mean in Market (Single) | 0.346 | | |||
| | Setting | Accuracy | | |||
| +===================================+=====================+ | |||
| | Mean in Market (Single) | 0.655 | | |||
| +-----------------------------------+---------------------+ | |||
| | Best in Market (Single) | 0.688 | | |||
| | Best in Market (Single) | 0.304 | | |||
| +-----------------------------------+---------------------+ | |||
| | Top-1 Reuse (Single) | 0.534 | | |||
| | Top-1 Reuse (Single) | 0.406 | | |||
| +-----------------------------------+---------------------+ | |||
| | Job Selector Reuse (Multiple) | 0.534 | | |||
| | Job Selector Reuse (Multiple) | 0.406 | | |||
| +-----------------------------------+---------------------+ | |||
| | Average Ensemble Reuse (Multiple) | 0.676 | | |||
| | Average Ensemble Reuse (Multiple) | 0.310 | | |||
| +-----------------------------------+---------------------+ | |||
| In some specific settings, the user will have a small number of labelled samples. In such settings, learning the weight of selected learnwares on a limited number of labelled samples can result in a better performance than training directly on a limited number of labelled samples. | |||
| .. image:: ../_static/img/image_labeled.svg | |||
| :align: center | |||
| Text Data Experiment | |||
| ========================== | |||
| Datasets | |||
| ------------------ | |||
| @@ -177,6 +183,8 @@ Results | |||
| The table below presents the mean accuracy of search and reuse across all users: | |||
| +-----------------------------------+---------------------+ | |||
| | Setting | Accuracy | | |||
| +===================================+=====================+ | |||
| | Mean in Market (Single) | 0.507 | | |||
| +-----------------------------------+---------------------+ | |||
| | Best in Market (Single) | 0.859 | | |||
| @@ -199,17 +207,23 @@ We present the change curves in classification error rates for both the user's s | |||
| From the figure above, it is evident that when the user's own training data is limited, the performance of multiple learnware reuse surpasses that of the user's own model. As the user's training data grows, it is expected that the user's model will eventually outperform the learnware reuse. This underscores the value of reusing learnware to significantly conserve training data and achieve superior performance when user training data is limited. | |||
Get Started Examples
=========================

We utilize the `fire` module to construct our experiments.

Text Examples
------------------

Examples for `Text` are available at [examples/dataset_text_workflow]. You can execute the experiment with the following commands:

* `python workflow.py unlabeled_text_example`: Run the unlabeled_text_example experiment. The results will be printed in the terminal.
* `python workflow.py labeled_text_example`: Run the labeled_text_example experiment. The result curves will be automatically saved in the `figs` directory.

Image Examples
------------------

Examples for `Image` are available at [examples/dataset_image_workflow]. You can execute the experiment with the following command:

* `python workflow.py image_example`: Run both the unlabeled_image_example and labeled_image_example experiments. The results will be printed in the terminal, and the curves will be automatically saved in the `figs` directory.

.. code-block:: bash

    python workflow.py image_example
Installation Guide
========================

``learnware`` Package Installation
===================================

.. note::

    The ``learnware`` package supports `Windows` and `Linux`, and it's recommended to use it on `Linux`. Python 3 versions up to Python 3.11 are supported.

Users can easily install the ``learnware`` package by pip according to the following command:
.. code-block:: bash

    pip install learnware

In the ``learnware`` package, besides the base classes, many core functionalities such as "learnware specification generation" and "learnware deployment" rely on the ``torch`` library. Users have the option to manually install ``torch``, or they can directly use the following command to install the ``learnware`` package:

.. code-block:: bash

    pip install learnware[full]

.. note::

    However, it's crucial to note that due to the potential complexity of the user's local environment, installing ``learnware[full]`` does not guarantee that ``torch`` will successfully invoke ``CUDA`` in the user's local setting.
Install ``learnware`` Package From Source
==========================================

Users can also install the ``learnware`` package from the source code according to the following steps:

- Enter the root directory of ``Learnware``, in which the file ``setup.py`` exists.
- Then, please execute the following commands to install the environment dependencies and the ``learnware`` package:

.. code-block:: bash

    $ git clone https://github.com/Learnware-LAMDA/Learnware.git && cd Learnware
    $ pip install -e .[dev]

.. note::

    It's recommended to use anaconda/miniconda to set up the environment. Also, you can run ``pip install -e .[full, dev]`` to install ``torch`` automatically as well.

Import the ``learnware`` package in Python to make sure the installation is successful.
What is Learnware?
====================

A learnware consists of a high-performance machine learning model and specifications that characterize the model, i.e., "Learnware = Model + Specification".

The learnware specification consists of "semantic specification" and "statistical specification":
The Benefits of Learnware Paradigm
====================================

Machine learning has achieved great success in many fields but still faces various challenges, such as the need for extensive training data and advanced training techniques, the difficulty of continuous learning, the risk of catastrophic forgetting, and the leakage of data privacy.

Although there are many efforts focusing on one of these issues separately, they are entangled, and solving one problem may exacerbate others. The learnware paradigm aims to address many of these challenges through a unified framework.

+-----------------------+-----------------------------------------------------------------------------------------------+
| Benefit               | Description                                                                                   |
+=======================+===============================================================================================+
Learnware Package Design
==========================

.. image:: ../_static/img/learnware_framework.svg
   :align: center

At the workflow level, the ``learnware`` package consists of the ``Submitting Stage`` and the ``Deploying Stage``.

At the module level, the ``learnware`` package is a platform that consists of the above components. The components are designed as loosely coupled modules, and each component can be used stand-alone.
Quick Start
============

Introduction
====================

This ``Quick Start`` guide aims to illustrate the straightforward process of establishing a full ``Learnware`` workflow and utilizing ``Learnware`` to handle user tasks.

Installation
====================

``learnware`` is currently hosted on `PyPI <https://pypi.org/>`_. You can easily install it by following these steps:
.. code-block:: bash

    pip install learnware

In the ``learnware`` package, besides the base classes, many core functionalities such as "learnware specification generation" and "learnware deployment" rely on the ``torch`` library. Users have the option to manually install ``torch``, or they can directly use the following command to install the ``learnware`` package:

.. code-block:: bash

    pip install learnware[full]

.. note::

    However, it's crucial to note that due to the potential complexity of the user's local environment, installing ``learnware[full]`` does not guarantee that ``torch`` will successfully invoke ``CUDA`` in the user's local setting.
Prepare Learnware
====================

In the ``learnware`` package, each learnware is encapsulated in a ``zip`` package, which should contain at least the following four files:

- ``learnware.yaml``: the learnware configuration file.
- ``__init__.py``: methods for using the model.
- ``stat.json``: the statistical specification of the learnware. Its filename can be customized and recorded in ``learnware.yaml``.
- ``environment.yaml`` or ``requirements.txt``: specifies the environment for the model.

To facilitate the construction of a learnware, we provide a `Learnware Template <https://www.bmwu.cloud/static/learnware-template.zip>`_ that users can use as a basis for building their own learnware. We've also detailed the format of the learnware ``zip`` package in :ref:`Learnware Preparation<workflows/upload:Prepare Learnware>`.
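As an illustrative check, the four-file convention above can be verified with nothing but the standard library. The sketch below assumes the default ``stat.json`` filename and is not part of the ``learnware`` API:

```python
import io
import zipfile

REQUIRED = {"__init__.py", "learnware.yaml", "stat.json"}

def check_learnware_zip(zf: zipfile.ZipFile) -> bool:
    """Return True if the package contains the model invocation file, the
    configuration file, the statistical specification (default name), and
    one of the two environment files."""
    names = set(zf.namelist())
    has_env = "environment.yaml" in names or "requirements.txt" in names
    return REQUIRED.issubset(names) and has_env

# Build a minimal in-memory example package and validate it
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    for name in ["__init__.py", "learnware.yaml", "stat.json", "requirements.txt"]:
        zf.writestr(name, "")

with zipfile.ZipFile(buf) as zf:
    print(check_learnware_zip(zf))  # True
```

A real learnware may of course contain additional files (the serialized model, helper modules); the check only covers the mandatory minimum.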
Learnware Package Workflow
============================

Users can start a ``learnware`` workflow according to the following steps:

Initialize a Learnware Market
-------------------------------

You can initialize a basic ``Learnware Market`` named "demo" using the code snippet below:
.. code-block:: python

    import learnware
    from learnware.market import instantiate_learnware_market

    learnware.init()

    # instantiate a demo market
    demo_market = instantiate_learnware_market(market_id="demo", name="easy", rebuild=True)

Upload Learnware
-------------------------------
Before uploading your learnware to the ``Learnware Market``, you'll need to create a semantic specification, ``semantic_spec``. This involves selecting or inputting values for predefined semantic tags to describe the features of your task and model.

For instance, the following code illustrates the semantic specification for a Scikit-Learn type model. This model is tailored for education scenarios and performs classification tasks on tabular data:
.. code-block:: python

    from learnware.specification import generate_semantic_spec

    semantic_spec = generate_semantic_spec(
        name="demo_learnware",
        data_type="Table",
        task_type="Classification",
        library_type="Scikit-learn",
        scenarios="Education",
        license="MIT",
    )
After defining the semantic specification, you can upload your learnware using a single line of code:

.. code-block:: python

    demo_market.add_learnware(zip_path, semantic_spec)

Here, ``zip_path`` is the directory of your learnware ``zip`` package.
Semantic Specification Search
-------------------------------

The ``Learnware Market`` will then perform an initial search using ``user_semantic``:

.. code-block:: python

    user_info = BaseUserInfo(id="user", semantic_spec=semantic_spec)

    # search_learnware: performs semantic specification search when user_info doesn't include a statistical specification
    search_result = demo_market.search_learnware(user_info)
    single_result = search_result.get_single_results()

    # single_result: the List of Tuple[Score, Learnware] returned by semantic specification search
    print(single_result)
Statistical Specification Search
---------------------------------

For example, the code below executes learnware search when using Reduced Kernel Mean Embedding (RKME) as the statistical specification:

.. code-block:: python

    user_info = BaseUserInfo(
        semantic_spec=user_semantic, stat_info={"RKMETableSpecification": user_spec}
    )
    search_result = demo_market.search_learnware(user_info)

    single_result = search_result.get_single_results()
    multiple_result = search_result.get_multiple_results()

    # search_item.score: based on MMD distances, sorted in descending order
    # search_item.learnware.id: id of learnwares, sorted by scores in descending order
    for search_item in single_result:
        print(f"score: {search_item.score}, learnware_id: {search_item.learnware.id}")

    # mixture_item.learnwares: collection of learnwares whose combined use is beneficial
    # mixture_item.score: score assigned to the combined set of learnwares in `mixture_item.learnwares`
    for mixture_item in multiple_result:
        print(f"mixture_score: {mixture_item.score}\n")
        mixture_id = " ".join([learnware.id for learnware in mixture_item.learnwares])
        print(f"mixture_learnware: {mixture_id}\n")
Reuse Learnwares
-------------------------------

With the list of learnwares, ``mixture_item.learnwares``, returned from the previous step, you can readily apply them to make predictions on your own data, bypassing the need to train a model from scratch. We provide two methods for reusing a given list of learnwares: ``JobSelectorReuser`` and ``AveragingReuser``. Just substitute ``test_x`` in the code snippet below with your own testing data, and you're all set to reuse learnwares:
.. code-block:: python

    from learnware.reuse import JobSelectorReuser, AveragingReuser

    # use the job selector reuser to reuse the searched learnwares to make predictions
    reuse_job_selector = JobSelectorReuser(learnware_list=mixture_item.learnwares)
    job_selector_predict_y = reuse_job_selector.predict(user_data=test_x)

    # use the averaging ensemble reuser to reuse the searched learnwares to make predictions
    reuse_ensemble = AveragingReuser(learnware_list=mixture_item.learnwares)
    ensemble_predict_y = reuse_ensemble.predict(user_data=test_x)
We also provide two methods for reusing a given list of learnwares when the user has labeled data: ``EnsemblePruningReuser`` and ``FeatureAugmentReuser``. Just substitute ``test_x`` in the code snippet below with your own testing data, substitute ``train_X, train_y`` with your own labeled training data, and you're all set to reuse learnwares:

.. code-block:: python

    from learnware.reuse import EnsemblePruningReuser, FeatureAugmentReuser

    # use the ensemble pruning reuser to reuse the searched learnwares to make predictions
    reuse_ensemble = EnsemblePruningReuser(learnware_list=mixture_item.learnwares, mode="classification")
    reuse_ensemble.fit(train_X, train_y)
    ensemble_pruning_predict_y = reuse_ensemble.predict(user_data=test_x)

    # use the feature augment reuser to reuse the searched learnwares to make predictions
    reuse_feature_augment = FeatureAugmentReuser(learnware_list=mixture_item.learnwares, mode="classification")
    reuse_feature_augment.fit(train_X, train_y)
    feature_augment_predict_y = reuse_feature_augment.predict(user_data=test_x)
Auto Workflow Example
============================

The ``learnware`` package also offers automated workflow examples. These include preparing learnwares, uploading and deleting learnwares from the market, and searching for learnwares using both semantic and statistical specifications. To experience the basic workflow of the ``learnware`` package, please refer to `Learnware Examples <https://github.com/Learnware-LAMDA/Learnware/tree/main/examples>`_.
You can combine ``HeteroMapAlignLearnware`` with the homogeneous reuse methods ``AveragingReuser`` and ``EnsemblePruningReuser``:

.. code-block:: python

    reuse_ensemble.fit(val_x, val_y)
    ensemble_pruning_predict_y = reuse_ensemble.predict(user_data=test_x)
Reuse with ``Model Container``
================================

The ``learnware`` package provides ``Model Container`` to build execution environments for learnwares according to their runtime dependency files. The learnware's model will be executed in the container, and its environment will be installed and uninstalled automatically.

Run the following code to try running a learnware with ``Model Container``:
.. code-block:: python

    import numpy as np

    from learnware.learnware import Learnware

    # let `learnware` be an instance of the Learnware class, with input shape (20, 204);
    # LearnwaresContainer is assumed to be imported from the learnware package
    with LearnwaresContainer(learnware, mode="conda") as env_container:
        learnware = env_container.get_learnwares_with_container()[0]
        input_array = np.random.random(size=(20, 204))
        print(learnware.predict(input_array))
The ``mode`` parameter has two options, each for a specific learnware environment loading method:

- ``'conda'``: Install a separate conda virtual environment for each learnware (automatically deleted after execution); run each learnware independently within its virtual environment.
- ``'docker'``: Install a conda virtual environment inside a Docker container (automatically destroyed after execution); run each learnware independently within the container (requires Docker privileges).

.. note::

    It's important to note that the "conda" mode is not secure if there are any malicious learnwares. If the user cannot guarantee the security of the learnware they want to load, it's recommended to use the "docker" mode to load the learnware.
Hetero Search
---------------

For table-based user tasks, homogeneous searchers like ``EasySearcher`` fail to recommend learnwares when no table learnware matches the user task's feature dimension, returning empty results. To enhance functionality, the ``learnware`` package includes the heterogeneous learnware search feature, which proceeds as follows:

- Learnware markets such as ``Hetero Market`` integrate different specification islands into a unified "specification world" by assigning system-level specifications to all learnwares. This allows heterogeneous searchers like ``HeteroSearcher`` to find helpful learnwares from all available table learnwares.
- Searchers assign system-level specifications to users based on ``UserInfo``'s statistical specification, using methods provided by corresponding organizers. In ``Hetero Market``, for example, ``HeteroOrganizer.generate_hetero_map_spec`` generates system-level specifications for users.
.. _submit:

==========================================
Learnware Preparation and Uploading
==========================================

In this section, we provide a comprehensive guide on submitting your custom learnware to the ``Learnware Market``. We will first discuss the necessary components of a valid learnware, followed by a detailed explanation of how to upload and remove learnwares within the ``Learnware Market``.

Prepare Learnware
====================================
In the ``learnware`` package, each learnware is encapsulated in a ``zip`` package, which should contain at least the following four files:

- ``learnware.yaml``: the learnware configuration file.
- ``__init__.py``: methods for using the model.
- ``stat.json``: the statistical specification of the learnware. Its filename can be customized and recorded in ``learnware.yaml``.
- ``environment.yaml`` or ``requirements.txt``: specifies the environment for the model.

To facilitate the construction of a learnware, we provide a `Learnware Template <https://www.bmwu.cloud/static/learnware-template.zip>`_ that you can use as a basis for building your own learnware.
Next, we will provide detailed explanations for the content of these four files.

Model Invocation File ``__init__.py``
-------------------------------------

To ensure that the uploaded learnware can be used by subsequent users, you need to provide interfaces for model fitting ``fit(X, y)``, prediction ``predict(X)``, and fine-tuning ``finetune(X, y)`` in ``__init__.py``. Among these interfaces, only the ``predict(X)`` interface is mandatory, while the others depend on the functionality of your model.

Below is a reference template for the ``__init__.py`` file. Please make sure that the input parameter format (the number of parameters and parameter names) for each interface in your model invocation file matches the template below.
.. code-block:: python

    import os
    import pickle

    import numpy as np

    from learnware.model import BaseModel

    class MyModel(BaseModel):
        def __init__(self):
            super(MyModel, self).__init__(input_shape=(37,), output_shape=(1,))
            dir_path = os.path.dirname(os.path.abspath(__file__))
            model_path = os.path.join(dir_path, "model.pkl")
            with open(model_path, "rb") as f:
                self.model = pickle.load(f)

        def fit(self, X: np.ndarray, y: np.ndarray):
            self.model = self.model.fit(X)

        def predict(self, X: np.ndarray) -> np.ndarray:
            return self.model.predict(X)

        def finetune(self, X: np.ndarray, y: np.ndarray):
            pass
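The template above loads a fitted model from ``model.pkl``. A minimal sketch of producing such a file is shown below; the scikit-learn classifier and the 37-feature random data are purely illustrative, and any picklable model works:

```python
import pickle

import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Illustrative training data: 100 samples with 37 features, binary labels
X = np.random.random(size=(100, 37))
y = np.random.randint(0, 2, size=100)

clf = RandomForestClassifier(n_estimators=10).fit(X, y)

# Serialize the fitted model so that MyModel.__init__ can load it with pickle.load
with open("model.pkl", "wb") as f:
    pickle.dump(clf, f)
```

The resulting ``model.pkl`` should be placed in the root of the learnware ``zip`` package, next to ``__init__.py``.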
Please ensure that the ``MyModel`` class inherits from ``BaseModel`` in the ``learnware.model`` module, and specify the class name (e.g., ``MyModel``) in the ``learnware.yaml`` file later.
Input and Output Dimensions
^^^^^^^^^^^^^^^^^^^^^^^^^^^^

``input_shape`` and ``output_shape`` represent the input and output dimensions of the model, respectively. You can refer to the following guidelines when filling them out:

- ``input_shape`` specifies a single input sample's dimension, and ``output_shape`` refers to the model's output dimension for a single sample.
- When the data type being processed is text data, there are no specific requirements for the value of ``input_shape``, and it can be filled in as ``None``.
- When ``output_shape`` corresponds to tasks with variable outputs (such as object detection, text segmentation, etc.), there are no specific requirements for the value of ``output_shape``, and it can be filled in as ``None``.
- For classification tasks, ``output_shape`` should be ``(1,)`` if the model directly outputs predicted labels, and the sample labels need to start from 0. If the model outputs logits, ``output_shape`` should be specified as the number of classes, i.e., ``(class_num,)``.
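The classification convention above can be sketched as follows; the 3-class setting and the random arrays are illustrative stand-ins for real model outputs:

```python
import numpy as np

n_classes = 3

# A model that directly outputs predicted labels: output_shape = (1,),
# and the labels must start from 0
labels = np.array([0, 2, 1])
assert labels.min() >= 0

# A model that outputs logits: output_shape = (n_classes,)
logits = np.random.random(size=(len(labels), n_classes))

# Logits can be mapped back to one label per sample via argmax
predicted_labels = logits.argmax(axis=1)
print(logits.shape, predicted_labels.shape)  # (3, 3) (3,)
```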
File Path
^^^^^^^^^^^^^^^^^^

If you need to load certain files within the ``zip`` package in the ``__init__.py`` file (and any other Python files that may be involved), please follow the method shown in the template above for obtaining the ``model_path``:

- First, obtain the root directory path of the entire package by getting ``dir_path``.
- Then, based on the specific file's relative location within the package, obtain the specific file's path, ``model_path``.
Module Imports
^^^^^^^^^^^^^^^^^^

Please note that module imports between Python files within the ``zip`` package should be done using **relative imports**. For instance:

.. code-block:: python

    from .package_name import *
    from .package_name import module_name
| Learnware Statistical Specification ``stat.json`` | |||
| --------------------------------------------------- | |||
| A learnware consists of a model and a specification. Therefore, after preparing the model, you need to generate a statistical specification for it. Specifically, using the previously installed ``learnware`` package, you can use the training data ``train_x`` (supported types include numpy.ndarray, pandas.DataFrame, and torch.Tensor) as input to generate the statistical specification of the model. | |||
| Here is an example of the code: | |||
| .. code-block:: python | |||
| from learnware.specification import generate_stat_spec | |||
| data_type = "table" # Data types: ["table", "image", "text"] | |||
| spec = generate_stat_spec(type=data_type, X=train_x) | |||
| spec.save("stat.json") | |||
| It's worth noting that the above code only runs on your local computer and does not interact with any cloud servers or leak any local private data. | |||
Additionally, if the model's training data is too large and causes the above code to fail, consider sampling the training data to a suitable size before generating the specification.
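If subsampling is needed, a simple random subsample can be drawn with ``numpy`` before generating the specification (a sketch; the helper name and the default ``sample_size`` are arbitrary choices):

```python
import numpy as np

def subsample(train_x: np.ndarray, sample_size: int = 10000, seed: int = 0) -> np.ndarray:
    """Randomly subsample rows so the data stays a manageable size."""
    if len(train_x) <= sample_size:
        return train_x
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(train_x), size=sample_size, replace=False)
    return train_x[idx]

# sampled_x = subsample(train_x)
# spec = generate_stat_spec(type="table", X=sampled_x)
```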
| Learnware Configuration File ``learnware.yaml`` | |||
| ------------------------------------------------- | |||
| This file is used to specify the class name (``MyModel``) in the model invocation file ``__init__.py``, the module called for generating the statistical specification (``learnware.specification``), the category of the statistical specification (``RKMETableSpecification``), and the specific filename (``stat.json``): | |||
| .. code-block:: yaml | |||
| model: | |||
| class_name: MyModel | |||
| kwargs: {} | |||
| stat_specifications: | |||
| - module_path: learnware.specification | |||
| class_name: RKMETableSpecification | |||
| file_name: stat.json | |||
| kwargs: {} | |||
| Please note that the statistical specification class name for different data types ``['table', 'image', 'text']`` is ``[RKMETableSpecification, RKMEImageSpecification, RKMETextSpecification]``, respectively. | |||
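As a quick local sanity check (a sketch, not part of the official ``learnware`` API; the helper name is hypothetical), you can verify that the specification class name in the parsed configuration matches your data type:

```python
# Mapping from data type to the expected statistical specification class
SPEC_BY_DATA_TYPE = {
    "table": "RKMETableSpecification",
    "image": "RKMEImageSpecification",
    "text": "RKMETextSpecification",
}

def check_stat_spec_class(config: dict, data_type: str) -> bool:
    """Check the stat spec class in a parsed learnware.yaml against the data type."""
    class_name = config["stat_specifications"][0]["class_name"]
    return class_name == SPEC_BY_DATA_TYPE[data_type]

# With PyYAML installed, e.g.:
# config = yaml.safe_load(open("learnware.yaml"))
# check_stat_spec_class(config, "table")
```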
| Model Runtime Dependent File | |||
| -------------------------------------------- | |||
| To ensure that your uploaded learnware can be used by other users, the ``zip`` package of the uploaded learnware should specify the model's runtime dependencies. The Beimingwu System supports the following two ways to specify runtime dependencies: | |||
| - Provide an ``environment.yaml`` file supported by ``conda``. | |||
| - Provide a ``requirements.txt`` file supported by ``pip``. | |||
| You can choose either method, but please try to remove unnecessary dependencies to keep the dependency list as minimal as possible. | |||
| Using ``environment.yaml`` File | |||
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |||
| - For Windows users: | |||
| You can export the `environment.yaml` file directly from the `conda` virtual environment using the following command: | |||
| .. code-block:: | |||
| conda env export | findstr /v "^prefix: " > environment.yaml | |||
| - For macOS and Linux users: | |||
| .. code-block:: | |||
| conda env export | grep -v "^prefix: " > environment.yaml | |||
Note that the ``environment.yaml`` file in the ``zip`` package must be encoded in ``UTF-8``. Please check the file's encoding after running the above command: depending on the ``conda`` version and the operating system, you may not get a ``UTF-8`` encoded file (e.g., you may get a ``UTF-16LE`` encoded file). In that case, manually convert the file to ``UTF-8``, which most text editors support. The following Python code for encoding conversion is also provided for reference:
| .. code-block:: python | |||
| import codecs | |||
| # Read the output file from the 'conda env export' command | |||
| # Assuming the file name is environment.yaml and the export format is UTF-16LE | |||
| with codecs.open('environment.yaml', 'r', encoding='utf-16le') as file: | |||
| content = file.read() | |||
| # Convert the content to UTF-8 encoding | |||
| output_content = content.encode('utf-8') | |||
| # Write to UTF-8 encoded file | |||
| with open('environment.yaml', 'wb') as file: | |||
| file.write(output_content) | |||
| Additionally, due to the complexity of users' local ``conda`` virtual environments, you can execute the following command before uploading to confirm that there are no dependency conflicts in the ``environment.yaml`` file: | |||
| .. code-block:: bash | |||
| conda env create --name test_env --file environment.yaml | |||
| The above command will create a virtual environment based on the ``environment.yaml`` file, and if successful, it indicates that there are no dependency conflicts. You can delete the created virtual environment using the following command: | |||
| .. code-block:: bash | |||
| conda env remove --name test_env | |||
Using ``requirements.txt`` File
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
| The ``requirements.txt`` file should list the packages required for running the ``__init__.py`` file and their specific versions. You can obtain these version details by executing the ``pip show <package_name>`` or ``conda list <package_name>`` command. Here is an example file: | |||
| .. code-block:: text | |||
| numpy==1.23.5 | |||
| scikit-learn==1.2.2 | |||
| Manually listing these dependencies can be cumbersome, so you can also use the ``pipreqs`` package to automatically scan your entire project and export the packages used along with their specific versions (though some manual verification may be required): | |||
| .. code-block:: bash | |||
| pip install pipreqs | |||
| pipreqs ./ # Run this command in the project's root directory | |||
| Please note that if you use the ``requirements.txt`` file to specify runtime dependencies, the system will by default install these dependencies in a ``conda`` virtual environment running ``Python 3.8`` during the learnware deployment. | |||
| Furthermore, for version-sensitive packages like ``torch``, it's essential to specify package versions in the ``requirements.txt`` file to ensure successful deployment of the uploaded learnware on other machines. | |||
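For example, pinned entries can be generated from the currently installed versions with the standard library (a sketch; the helper name and package list are illustrative):

```python
from importlib.metadata import PackageNotFoundError, version

def pinned_requirements(packages):
    """Return 'name==version' lines for installed packages, skipping missing ones."""
    lines = []
    for name in packages:
        try:
            lines.append(f"{name}=={version(name)}")
        except PackageNotFoundError:
            pass
    return lines

# Write the pinned entries for your actual dependencies, e.g.:
# open("requirements.txt", "w").write("\n".join(pinned_requirements(["numpy", "torch"])))
```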
| Upload Learnware | |||
| ================================== | |||
| After preparing the four required files mentioned above, you can bundle them into your own learnware ``zip`` package. | |||
Prepare Semantic Specification
-------------------------------
The semantic specification succinctly describes the features of your task and model. To upload a learnware ``zip`` package, you need to prepare its semantic specification. Here is an example for "Table Data" with a "Classification Task":
| .. code-block:: python | |||
| from learnware.specification import generate_semantic_spec | |||
| # Prepare input description when data_type="Table" | |||
| input_description = { | |||
| "Dimension": 5, | |||
| "Description": { | |||
| "0": "age", | |||
| "1": "weight", | |||
| "2": "body length", | |||
| "3": "animal type", | |||
| "4": "claw length" | |||
| }, | |||
| } | |||
| # Prepare output description when task_type in ["Classification", "Regression"] | |||
| output_description = { | |||
| "Dimension": 3, | |||
| "Description": { | |||
| "0": "cat", | |||
| "1": "dog", | |||
| "2": "bird", | |||
| }, | |||
| } | |||
| # Create semantic specification | |||
| semantic_spec = generate_semantic_spec( | |||
| name="learnware_example", | |||
| description="Just an example for uploading learnware", | |||
| data_type="Table", | |||
| task_type="Classification", | |||
| library_type="Scikit-learn", | |||
| scenarios=["Business", "Financial"], | |||
| input_description=input_description, | |||
| output_description=output_description, | |||
| ) | |||
For more details, please refer to :ref:`semantic specification<components/spec:Semantic Specification>`.
| Uploading | |||
| -------------- | |||
You can effortlessly upload your learnware to the ``Learnware Market`` as follows.
| .. code-block:: python | |||
| from learnware.market import BaseChecker | |||
| from learnware.market import instantiate_learnware_market | |||
| # instantiate a demo market | |||
| demo_market = instantiate_learnware_market(market_id="demo", name="hetero", rebuild=True) | |||
| # upload the learnware into the market | |||
| learnware_id, learnware_status = demo_market.add_learnware(zip_path, semantic_spec) | |||
| # assert whether the learnware passed the check and was uploaded successfully. | |||
| assert learnware_status != BaseChecker.INVALID_LEARNWARE, "Insert learnware failed!" | |||
Here, ``zip_path`` is the file path of your learnware ``zip`` package. ``learnware_id`` is the ID assigned by the ``Learnware Market``, and ``learnware_status`` indicates the learnware's check status.
| .. note:: | |||
The learnware ``zip`` package uploaded into the ``LearnwareMarket`` is checked both semantically and statistically, and ``add_learnware`` returns the concrete check status. The check status ``BaseChecker.INVALID_LEARNWARE`` indicates the learnware did not pass the check. For more details about the learnware checker, please refer to `Learnware Market <../components/market.html#easy-checker>`_.
| Remove Learnware | |||
| ================== | |||
| @@ -39,13 +39,14 @@ python workflow.py labeled_text_example | |||
| The table below presents the mean accuracy of search and reuse across all users: | |||
| | Setting | Accuracy | | |||
| |---------------------------------------|---------------------| | |||
| | Mean in Market (Single) | 0.507 | | |||
| | Best in Market (Single) | 0.859 | | |||
| | Top-1 Reuse (Single) | 0.846 | | |||
| | Job Selector Reuse (Multiple) | 0.845 | | |||
| | Average Ensemble Reuse (Multiple) | 0.862 | | |||
| ### ``labeled_text_example``: | |||
| @@ -1,7 +1,8 @@ | |||
| __version__ = "0.2.0.9" | |||
| import json | |||
| import os | |||
| from .logger import get_module_logger | |||
| from .utils import is_torch_available, setup_seed | |||
| @@ -1,21 +1,19 @@ | |||
| import atexit | |||
| import os | |||
| import pickle | |||
| import tarfile | |||
| import tempfile | |||
| from concurrent.futures import ThreadPoolExecutor | |||
| from typing import List, Optional, Union | |||
| import docker | |||
| import shortuuid | |||
| from .utils import install_environment, remove_enviroment, system_execute | |||
| from ..config import C | |||
| from ..learnware import Learnware | |||
| from .package_utils import filter_nonexist_conda_packages_file, filter_nonexist_pip_packages_file | |||
| from ..logger import get_module_logger | |||
| from ..model.base import BaseModel | |||
| logger = get_module_logger(module_name="client_container") | |||
| @@ -224,7 +222,7 @@ class ModelDockerContainer(ModelContainer): | |||
| } | |||
| container = client.containers.run(**container_config) | |||
| logger.info(f"Docker container {container.id[:12]} is generated.") | |||
| try: | |||
| environment_cmd = [ | |||
| "pip config set global.index-url https://pypi.tuna.tsinghua.edu.cn/simple", | |||
| @@ -265,7 +263,7 @@ class ModelDockerContainer(ModelContainer): | |||
| if isinstance(docker_container, docker.models.containers.Container): | |||
| client = docker.from_env() | |||
| container_ids = [container.id for container in client.containers.list()] | |||
| if docker_container.id in container_ids: | |||
| docker_container.stop() | |||
| docker_container.remove() | |||
| @@ -521,7 +519,7 @@ class LearnwaresContainer: | |||
| except KeyboardInterrupt: | |||
| logger.warning("The KeyboardInterrupt is ignored when removing the container env!") | |||
| self._destroy_docker_container() | |||
| def __enter__(self): | |||
| if self.mode == "conda": | |||
| self.learnware_containers = [ | |||
| @@ -1,24 +1,23 @@ | |||
| import atexit | |||
| import hashlib | |||
| import json | |||
| import os | |||
| import tempfile | |||
| import uuid | |||
| import zipfile | |||
| from enum import Enum | |||
| from typing import List, Optional, Union | |||
| import requests | |||
| import yaml | |||
| from tqdm import tqdm | |||
| from .container import LearnwaresContainer | |||
| from ..config import C | |||
| from ..learnware import get_learnware_from_dirpath | |||
| from ..logger import get_module_logger | |||
| from ..market import BaseChecker, BaseUserInfo | |||
| from ..specification import generate_semantic_spec | |||
| CHUNK_SIZE = 1024 * 1024 | |||
| logger = get_module_logger(module_name="LearnwareClient") | |||
| @@ -413,7 +412,7 @@ class LearnwareClient: | |||
| @staticmethod | |||
| def _check_stat_specification(learnware): | |||
| from ..market import CondaChecker, EasyStatChecker | |||
| stat_checker = CondaChecker(inner_checker=EasyStatChecker()) | |||
| check_status, message = stat_checker(learnware) | |||
| @@ -1,14 +1,14 @@ | |||
| import json | |||
| import os | |||
| import re | |||
| import subprocess | |||
| import tempfile | |||
| from concurrent.futures import ThreadPoolExecutor | |||
| from typing import List, Tuple | |||
| import yaml | |||
| from . import utils | |||
| from ..logger import get_module_logger | |||
| logger = get_module_logger("package_utils") | |||
| @@ -1,6 +1,6 @@ | |||
| import argparse | |||
| from learnware.client.utils import install_environment | |||
| if __name__ == "__main__": | |||
| parser = argparse.ArgumentParser() | |||
| @@ -1,6 +1,7 @@ | |||
| import argparse | |||
| import pickle | |||
| import sys | |||
| from learnware.utils import get_module_by_module_path | |||
| @@ -1,10 +1,9 @@ | |||
| import os | |||
| import subprocess | |||
| import tempfile | |||
| from .package_utils import filter_nonexist_conda_packages_file, filter_nonexist_pip_packages_file | |||
| from ..logger import get_module_logger | |||
| logger = get_module_logger(module_name="client_utils") | |||
| @@ -1,6 +1,6 @@ | |||
| import copy | |||
| import logging | |||
| import os | |||
| from enum import Enum | |||
| @@ -1,14 +1,14 @@ | |||
| import copy | |||
| import os | |||
| import traceback | |||
| from typing import Optional | |||
| from .base import Learnware | |||
| from .utils import get_stat_spec_from_config | |||
| from ..config import C | |||
| from ..logger import get_module_logger | |||
| from ..specification import Specification | |||
| from ..utils import read_yaml_to_dict | |||
| logger = get_module_logger("learnware.learnware") | |||
| @@ -1,12 +1,13 @@ | |||
| import os | |||
| import sys | |||
| from typing import List, Union | |||
| import numpy as np | |||
| from ..logger import get_module_logger | |||
| from ..model import BaseModel | |||
| from ..specification import BaseStatSpecification, Specification | |||
| from ..utils import get_module_by_module_path | |||
| logger = get_module_logger("Learnware") | |||
| @@ -1,9 +1,10 @@ | |||
| from .anchor import AnchoredOrganizer, AnchoredSearcher, AnchoredUserInfo | |||
| from .base import (BaseChecker, BaseOrganizer, BaseSearcher, BaseUserInfo, | |||
| LearnwareMarket) | |||
| from .classes import CondaChecker | |||
| from .easy import (EasyOrganizer, EasySearcher, EasySemanticChecker, | |||
| EasyStatChecker) | |||
| from .evolve import EvolvedOrganizer | |||
| from .evolve_anchor import EvolvedAnchoredOrganizer | |||
| from .heterogeneous import HeteroMapTableOrganizer, HeteroSearcher | |||
| from .module import instantiate_learnware_market | |||
| @@ -1,8 +1,7 @@ | |||
| from .organizer import AnchoredOrganizer | |||
| from .user_info import AnchoredUserInfo | |||
| from ...logger import get_module_logger | |||
| from ...utils import is_torch_available | |||
| logger = get_module_logger("market_anchor") | |||
| @@ -1,8 +1,8 @@ | |||
| from typing import Dict | |||
| from ..easy.organizer import EasyOrganizer | |||
| from ...learnware import Learnware | |||
| from ...logger import get_module_logger | |||
| logger = get_module_logger("anchor_organizer") | |||
| @@ -1,9 +1,9 @@ | |||
| from typing import Any, List, Tuple | |||
| from .user_info import AnchoredUserInfo | |||
| from ..easy.searcher import EasySearcher | |||
| from ...learnware import Learnware | |||
| from ...logger import get_module_logger | |||
| logger = get_module_logger("anchor_searcher") | |||
| @@ -1,4 +1,5 @@ | |||
| from typing import Any, List, Union | |||
| from ..base import BaseUserInfo | |||
| @@ -1,10 +1,11 @@ | |||
| from __future__ import annotations | |||
| import tempfile | |||
| import traceback | |||
| import zipfile | |||
| from dataclasses import dataclass | |||
| from typing import Any, List, Optional, Tuple, Union | |||
| from ..learnware import Learnware, get_learnware_from_dirpath | |||
| from ..logger import get_module_logger | |||
| @@ -1,8 +1,9 @@ | |||
| import traceback | |||
| from typing import Tuple | |||
| from .base import BaseChecker | |||
| from ..client.container import LearnwaresContainer | |||
| from ..learnware import Learnware | |||
| from ..logger import get_module_logger | |||
| logger = get_module_logger("market_classes") | |||
| @@ -1,7 +1,6 @@ | |||
| from .organizer import EasyOrganizer | |||
| from ...logger import get_module_logger | |||
| from ...utils import is_torch_available | |||
| logger = get_module_logger("market_easy") | |||
| @@ -11,5 +10,6 @@ if not is_torch_available(verbose=False): | |||
| EasyStatChecker = None | |||
| logger.error("EasySeacher and EasyChecker are not available because 'torch' is not installed!") | |||
| else: | |||
| from .checker import EasySemanticChecker, EasyStatChecker | |||
| from .searcher import (EasyExactSemanticSearcher, EasyFuzzSemanticSearcher, | |||
| EasySearcher, EasyStatSearcher) | |||
| @@ -1,10 +1,10 @@ | |||
| import random | |||
| import string | |||
| import traceback | |||
| import numpy as np | |||
| import torch | |||
| from ..base import BaseChecker | |||
| from ..utils import parse_specification_type | |||
| from ...config import C | |||
| @@ -1,9 +1,10 @@ | |||
| import json | |||
| import os | |||
| import traceback | |||
| from sqlalchemy import Column, String, Text, create_engine, text | |||
| from sqlalchemy.ext.declarative import declarative_base | |||
| from ...learnware import get_learnware_from_dirpath | |||
| from ...logger import get_module_logger | |||
| @@ -1,14 +1,13 @@ | |||
| import copy | |||
| import os | |||
| import tempfile | |||
| import zipfile | |||
| from shutil import copyfile, rmtree | |||
| from typing import Dict, List, Tuple, Union | |||
| from .database_ops import DatabaseOperations | |||
| from ..base import BaseChecker, BaseOrganizer | |||
| from ...config import C as conf | |||
| from ...learnware import Learnware, get_learnware_from_dirpath | |||
| from ...logger import get_module_logger | |||
| @@ -107,22 +106,22 @@ class EasyOrganizer(BaseOrganizer): | |||
| if new_learnware is None: | |||
| return None, BaseChecker.INVALID_LEARNWARE | |||
| learnware_status = check_status if check_status is not None else BaseChecker.NONUSABLE_LEARNWARE | |||
| self.dbops.add_learnware( | |||
| id=learnware_id, | |||
| semantic_spec=semantic_spec, | |||
| zip_path=target_zip_dir, | |||
| folder_path=target_folder_dir, | |||
| use_flag=learnware_status, | |||
| ) | |||
| self.learnware_list[learnware_id] = new_learnware | |||
| self.learnware_zip_list[learnware_id] = target_zip_dir | |||
| self.learnware_folder_list[learnware_id] = target_folder_dir | |||
| self.use_flags[learnware_id] = learnware_status | |||
| self.count += 1 | |||
| return learnware_id, learnware_status | |||
| def delete_learnware(self, id: str) -> bool: | |||
| """Delete Learnware from market | |||
| @@ -1,15 +1,18 @@ | |||
| import math | |||
| from typing import List, Optional, Tuple, Union | |||
| import numpy as np | |||
| import torch | |||
| from rapidfuzz import fuzz | |||
| from .organizer import EasyOrganizer | |||
| from ..base import (BaseSearcher, BaseUserInfo, MultipleSearchItem, | |||
| SearchResults, SingleSearchItem) | |||
| from ..utils import parse_specification_type | |||
| from ...learnware import Learnware | |||
| from ...logger import get_module_logger | |||
| from ...specification import (RKMEImageSpecification, RKMETableSpecification, | |||
| RKMETextSpecification, rkme_solve_qp) | |||
| logger = get_module_logger("easy_seacher") | |||
| @@ -2,8 +2,8 @@ from typing import List | |||
| from ..easy.organizer import EasyOrganizer | |||
| from ...learnware import Learnware | |||
| from ...logger import get_module_logger | |||
| from ...specification import BaseStatSpecification | |||
| logger = get_module_logger("evolve_organizer") | |||
| @@ -1,7 +1,7 @@ | |||
| from typing import List | |||
| from ..anchor import AnchoredOrganizer, AnchoredUserInfo | |||
| from ..evolve import EvolvedOrganizer | |||
| from ...logger import get_module_logger | |||
| logger = get_module_logger("evolve_anchor_organizer") | |||
| @@ -1,5 +1,5 @@ | |||
| from ...logger import get_module_logger | |||
| from ...utils import is_torch_available | |||
| logger = get_module_logger("market_hetero") | |||
| @@ -1,6 +1,5 @@ | |||
| import os | |||
| import traceback | |||
| import pandas as pd | |||
| from collections import defaultdict | |||
| from typing import List, Tuple, Union | |||
| @@ -14,7 +13,6 @@ from ....learnware import Learnware | |||
| from ....logger import get_module_logger | |||
| from ....specification import HeteroMapTableSpecification | |||
| logger = get_module_logger("hetero_map_table_organizer") | |||
| @@ -6,10 +6,11 @@ import torch | |||
| import torch.nn.functional as F | |||
| from torch import nn | |||
| from .feature_extractor import CLSToken, FeatureProcessor, FeatureTokenizer | |||
| from .trainer import Trainer, TransTabCollatorForCL | |||
| from .....specification import (HeteroMapTableSpecification, | |||
| RKMETableSpecification) | |||
| from .....utils import allocate_cuda_idx, choose_device | |||
| class HeteroMap(nn.Module): | |||
| @@ -10,8 +10,8 @@ from torch import nn | |||
| from torch.utils.data import DataLoader, Dataset | |||
| from tqdm.autonotebook import trange | |||
| from .feature_extractor import FeatureTokenizer | |||
| from .....logger import get_module_logger | |||
| logger = get_module_logger("hetero_mapping_trainer") | |||
| @@ -6,7 +6,6 @@ from ..easy import EasySearcher | |||
| from ..utils import parse_specification_type | |||
| from ...logger import get_module_logger | |||
| logger = get_module_logger("hetero_searcher") | |||
| @@ -1,4 +1,5 @@ | |||
| import traceback | |||
| from ...logger import get_module_logger | |||
| logger = get_module_logger("hetero_utils") | |||
| @@ -1,6 +1,7 @@ | |||
| from .base import LearnwareMarket | |||
| from .classes import CondaChecker | |||
| from .easy import (EasyOrganizer, EasySearcher, EasySemanticChecker, | |||
| EasyStatChecker) | |||
| from .heterogeneous import HeteroMapTableOrganizer, HeteroSearcher | |||
| @@ -1,6 +1,7 @@ | |||
| from typing import Union | |||
| import numpy as np | |||
| class BaseModel: | |||
| """Base interface tor model standard when user want to submit learnware to market.""" | |||
| @@ -1,6 +1,5 @@ | |||
| from .align import AlignLearnware | |||
| from .base import BaseReuser | |||
| from ..logger import get_module_logger | |||
| from ..utils import is_torch_available | |||
| @@ -18,7 +17,7 @@ if not is_torch_available(verbose=False): | |||
| ) | |||
| else: | |||
| from .averaging import AveragingReuser | |||
| from .ensemble_pruning import EnsemblePruningReuser | |||
| from .feature_augment import FeatureAugmentReuser | |||
| from .hetero import FeatureAlignLearnware, HeteroMapAlignLearnware | |||
| from .job_selector import JobSelectorReuser | |||
| @@ -1,11 +1,11 @@ | |||
| from typing import List, Union | |||
| import numpy as np | |||
| import torch | |||
| from scipy.special import softmax | |||
| from .base import BaseReuser | |||
| from ..learnware import Learnware | |||
| from ..logger import get_module_logger | |||
| logger = get_module_logger("avaraging_reuser") | |||
| @@ -1,6 +1,7 @@ | |||
| from typing import List | |||
| import numpy as np | |||
| from ..learnware import Learnware | |||
| from ..logger import get_module_logger | |||
| @@ -1,10 +1,11 @@ | |||
| import random | |||
| from typing import List | |||
| import numpy as np | |||
| import torch | |||
| from .base import BaseReuser | |||
| from ..learnware import Learnware | |||
| from ..logger import get_module_logger | |||
| logger = get_module_logger("ensemble_pruning") | |||
| @@ -1,7 +1,8 @@ | |||
| from typing import List | |||
| import numpy as np | |||
| import torch | |||
| from sklearn.linear_model import LogisticRegressionCV, RidgeCV | |||
| from .base import BaseReuser | |||
| from .utils import fill_data_with_mean | |||
| @@ -1,17 +1,18 @@ | |||
| import time | |||
| import torch | |||
| from typing import List | |||
| import numpy as np | |||
| import torch | |||
| import torch.nn as nn | |||
| from typing import List | |||
| from tqdm import trange | |||
| import torch.nn.functional as F | |||
| from tqdm import trange | |||
| from ..align import AlignLearnware | |||
| from ..utils import fill_data_with_mean | |||
| from ...utils import choose_device, allocate_cuda_idx | |||
| from ...logger import get_module_logger | |||
| from ...learnware import Learnware | |||
| from ...logger import get_module_logger | |||
| from ...specification import RKMETableSpecification | |||
| from ...utils import allocate_cuda_idx, choose_device | |||
| logger = get_module_logger("feature_align") | |||
@@ -1,10 +1,10 @@
import numpy as np
from .feature_align import FeatureAlignLearnware
from ..align import AlignLearnware
from ..feature_augment import FeatureAugmentReuser
from ...learnware import Learnware
from ...logger import get_module_logger
from .feature_align import FeatureAlignLearnware
from ..feature_augment import FeatureAugmentReuser
from ...specification import RKMETableSpecification
logger = get_module_logger("hetero_map_align")
@@ -1,15 +1,15 @@
import torch
import numpy as np
from typing import List, Union
import numpy as np
import torch
from sklearn.metrics import accuracy_score
from .base import BaseReuser
from ..market.utils import parse_specification_type
from ..learnware import Learnware
from ..specification import RKMETableSpecification, RKMETextSpecification
from ..specification import generate_rkme_table_spec, rkme_solve_qp
from ..logger import get_module_logger
from ..market.utils import parse_specification_type
from ..specification import (RKMETableSpecification, RKMETextSpecification,
                             generate_rkme_table_spec, rkme_solve_qp)
logger = get_module_logger("job_selector_reuse")
@@ -1,4 +1,5 @@
import numpy as np
from ..logger import get_module_logger
logger = get_module_logger("reuse_utils")
@@ -1,15 +1,8 @@
from .base import Specification, BaseStatSpecification
from .regular import (
    RegularStatSpecification,
    RKMEStatSpecification,
    RKMETableSpecification,
    RKMEImageSpecification,
    RKMETextSpecification,
    rkme_solve_qp,
)
from .base import BaseStatSpecification, Specification
from .regular import (RegularStatSpecification, RKMEImageSpecification,
                      RKMEStatSpecification, RKMETableSpecification,
                      RKMETextSpecification, rkme_solve_qp)
from .system import HeteroMapTableSpecification
from ..utils import is_torch_available
if not is_torch_available(verbose=False):
@@ -19,10 +12,6 @@ if not is_torch_available(verbose=False):
    generate_rkme_text_spec = None
    generate_semantic_spec = None
else:
    from .module import (
        generate_stat_spec,
        generate_rkme_table_spec,
        generate_rkme_image_spec,
        generate_rkme_text_spec,
        generate_semantic_spec,
    )
    from .module import (generate_rkme_image_spec, generate_rkme_table_spec,
                         generate_rkme_text_spec, generate_semantic_spec,
                         generate_stat_spec)
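The two `__init__` hunks above regroup imports while keeping the `is_torch_available` guard intact. As a rough sketch of that optional-dependency pattern (not the package's exact code; `dependency_available` is a hypothetical stand-in for `is_torch_available`):

```python
import importlib.util


def dependency_available(name: str) -> bool:
    """Probe whether `name` is importable, without actually importing it."""
    try:
        return importlib.util.find_spec(name) is not None
    except ModuleNotFoundError:
        # find_spec raises when a submodule's parent package is missing
        return False


# When the dependency is absent, the package binds its torch-backed names to
# None instead of raising, so a plain `from ... import X` still succeeds and
# only an actual call on the None placeholder fails.
if dependency_available("torch"):
    pass  # the real __init__ would do: from .module import generate_stat_spec, ...
else:
    generate_stat_spec = None
```

This is why downstream code can import the specification package unconditionally and defer the "torch is not installed" error to the point of use.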
@@ -1,9 +1,10 @@
from __future__ import annotations
import copy
import numpy as np
from typing import Dict
import numpy as np
class BaseStatSpecification:
    """The Statistical Specification Interface, which provide save and load method"""
@@ -1,11 +1,13 @@
import torch
from typing import List, Optional, Union
import numpy as np
import pandas as pd
from typing import Union, List, Optional
import torch
from .utils import convert_to_numpy
from .base import BaseStatSpecification
from .regular import RKMETableSpecification, RKMEImageSpecification, RKMETextSpecification
from .regular import (RKMEImageSpecification, RKMETableSpecification,
                      RKMETextSpecification)
from .utils import convert_to_numpy
from ..config import C
@@ -1,4 +1,4 @@
from .base import RegularStatSpecification
from .text import RKMETextSpecification
from .table import RKMETableSpecification, RKMEStatSpecification, rkme_solve_qp
from .image import RKMEImageSpecification
from .table import RKMEStatSpecification, RKMETableSpecification, rkme_solve_qp
from .text import RKMETextSpecification
@@ -1,6 +1,5 @@
from ....utils import is_torch_available
from ....logger import get_module_logger
from ....utils import is_torch_available
logger = get_module_logger("regular_image_spec")
@@ -1,9 +1,9 @@
import math
import numpy as np
import torch as t
import torch.nn as nn
import torch.nn.functional as F
import numpy as np
import math
__all__ = ("NNGPKernel", "Conv2d", "ReLU", "Sequential", "ConvKP", "NonlinKP")
| """ | |||
| @@ -4,23 +4,22 @@ import codecs | |||
| import functools | |||
| import json | |||
| import os | |||
| from typing import Any | |||
| from contextlib import contextmanager | |||
| from typing import Any | |||
| import numpy as np | |||
| import torch | |||
| from numpy.random import RandomState | |||
| from torch import nn | |||
| from torch.utils.data import TensorDataset, DataLoader | |||
| from torch.utils.data import DataLoader, TensorDataset | |||
| from tqdm import tqdm | |||
| from numpy.random import RandomState | |||
| from . import cnn_gp | |||
| from ..base import RegularStatSpecification | |||
| from ..table.rkme import rkme_solve_qp | |||
| from .... import setup_seed | |||
| from ....logger import get_module_logger | |||
| from ....utils import choose_device, allocate_cuda_idx | |||
| from ....utils import allocate_cuda_idx, choose_device | |||
| logger = get_module_logger("image_rkme") | |||
| @@ -1,5 +1,5 @@ | |||
| from ....utils import is_torch_available | |||
| from ....logger import get_module_logger | |||
| from ....utils import is_torch_available | |||
| logger = get_module_logger("regular_table_spec") | |||
| @@ -11,4 +11,5 @@ if not is_torch_available(verbose=False): | |||
| f"RKMETableSpecification, RKMEStatSpecification and rkme_solve_qp are not available because 'torch' is not installed!" | |||
| ) | |||
| else: | |||
| from .rkme import RKMETableSpecification, RKMEStatSpecification, rkme_solve_qp | |||
| from .rkme import (RKMEStatSpecification, RKMETableSpecification, | |||
| rkme_solve_qp) | |||
@@ -1,15 +1,16 @@
from __future__ import annotations
import os
import torch
import json
import codecs
import scipy
import numpy as np
from qpsolvers import Problem, solve_problem
import json
import os
from collections import Counter
from typing import Any, Union
import numpy as np
import scipy
import torch
from qpsolvers import Problem, solve_problem
from ..base import RegularStatSpecification
from ....logger import get_module_logger
from ....utils import allocate_cuda_idx, choose_device
@@ -1,5 +1,5 @@
from ....utils import is_torch_available
from ....logger import get_module_logger
from ....utils import is_torch_available
logger = get_module_logger("regular_text_spec")
@@ -1,10 +1,11 @@
import os
import langdetect
import numpy as np
from ..table import RKMETableSpecification
from ....logger import get_module_logger
from ....config import C
from ....logger import get_module_logger
logger = get_module_logger("RKMETextSpecification", "INFO")
@@ -1,6 +1,6 @@
from .base import SystemStatSpecification
from ...utils import is_torch_available
from ...logger import get_module_logger
from ...utils import is_torch_available
logger = get_module_logger("system_spec")
@@ -1,16 +1,17 @@
from __future__ import annotations
import os
import json
import torch
import codecs
import json
import os
import numpy as np
import torch
from .base import SystemStatSpecification
from ..regular import RKMETableSpecification
from ..regular.table.rkme import torch_rbf_kernel
from ...logger import get_module_logger
from ...utils import choose_device, allocate_cuda_idx
from ...utils import allocate_cuda_idx, choose_device
logger = get_module_logger("hetero_map_table_spec")
@@ -1,7 +1,8 @@
import torch
from typing import Union
import numpy as np
import pandas as pd
from typing import Union
import torch
def convert_to_numpy(data: Union[np.ndarray, pd.DataFrame, torch.Tensor]):
@@ -3,7 +3,7 @@ import pickle
import tempfile
import zipfile
from dataclasses import dataclass
from typing import Tuple, Optional, List, Union
from typing import List, Optional, Tuple, Union
from .config import BenchmarkConfig, benchmark_configs
from ..data import GetData
@@ -1,5 +1,5 @@
from dataclasses import dataclass
from typing import Optional, List
from typing import List, Optional
@dataclass
@@ -1,4 +1,5 @@
import json
import requests
from tqdm import tqdm
@@ -2,10 +2,10 @@ import os
import tempfile
from dataclasses import dataclass, field
from shutil import copyfile
from typing import List, Tuple, Union, Optional
from typing import List, Optional, Tuple, Union
from ...utils import save_dict_to_yaml, convert_folder_to_zipfile
from ...config import C
from ...utils import convert_folder_to_zipfile, save_dict_to_yaml
@dataclass
@@ -1,8 +1,11 @@
import os
import pickle
import numpy as np
from learnware.model.base import BaseModel
class PickleLoadedModel(BaseModel):
    def __init__(
@@ -1,5 +1,6 @@
import unittest
def parametrize(test_class, **kwargs):
    test_loader = unittest.TestLoader()
    test_names = test_loader.getTestCaseNames(test_class)
@@ -1,11 +1,13 @@
import os
import zipfile
from .file import (convert_folder_to_zipfile, read_yaml_to_dict,
                   save_dict_to_yaml)
from .gpu import allocate_cuda_idx, choose_device, setup_seed
from .import_utils import is_torch_available
from .module import get_module_by_module_path
from .file import read_yaml_to_dict, save_dict_to_yaml, convert_folder_to_zipfile
from .gpu import setup_seed, choose_device, allocate_cuda_idx
from ..config import get_platform, SystemType
from ..config import SystemType, get_platform
def zip_learnware_folder(path: str, output_name: str):
    with zipfile.ZipFile(output_name, "w") as zip_ref:
@@ -1,7 +1,9 @@
import os
import yaml
import zipfile
import yaml
def save_dict_to_yaml(dict_value: dict, save_path: str):
    """save dict object into yaml file"""
    with open(save_path, "w") as file:
@@ -1,5 +1,7 @@
import random
import numpy as np
from .import_utils import is_torch_available
@@ -1,9 +1,9 @@
import sys
import re
import importlib
import importlib.util
from typing import Union
import re
import sys
from types import ModuleType
from typing import Union
def get_module_by_module_path(module_path: Union[str, ModuleType]):