
[to #42322933] add/refactor nlp models source code and finetune

1. add sbert, veco, palm, space source code
2. support sbert sequence classification, token classification finetune
3. support veco sequence classification finetune
4. support palm nlg finetune
evaluation results: https://sheet.alibaba-inc.com/#/sheet/f7fdcc7f22bd5105 (sheet: Maas)
5. add ut for finetunes
6. add veco's taskdataset processor
7. add a common trainer for nlp, and a specific trainer for veco (see the usage sketch below)
8. merge some duplicate code of models, preprocessors, pipelines
Link: https://code.alibaba-inc.com/Ali-MaaS/MaaS-lib/codereview/9574105
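A minimal usage sketch of the finetune entry point this change set adds (the trainer name, model id and dataset arguments below are assumptions for illustration, not taken from this change):

from modelscope.msdatasets import MsDataset
from modelscope.trainers import build_trainer

train_dataset = MsDataset.load('clue', subset_name='afqmc', split='train')      # assumed dataset
eval_dataset = MsDataset.load('clue', subset_name='afqmc', split='validation')  # assumed dataset

kwargs = dict(
    model='damo/nlp_structbert_sentence-similarity_chinese-base',  # assumed model id
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    work_dir='./work_dir')

trainer = build_trainer(name='nlp-base-trainer', default_args=kwargs)  # assumed trainer name
trainer.train()
print(trainer.evaluate())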

* add basic classes of hooks & metrics

* pre-commit passed

* change some comments

* pre commit passed

* 1. remove accuracy's groups 2. remove useless hooks 3. simplify priorities

* pre-commit passed

* fix a comment

* Merge branch 'master' into finetune_hooks_metrics

# Conflicts:
#	modelscope/metainfo.py

* pre-commit passed

* add basic classes of hooks & metrics

* pre-commit passed

* change some comments

* pre commit passed

* 1. remove accuracy's groups 2. remove useless hooks 3. simplify priorities

* pre-commit passed

* fix a comment

* Merge branch 'feat/finetune' of gitlab.alibaba-inc.com:Ali-MaaS/MaaS-lib into feat/finetune

* mv hooks related to modelscope/trainers/hooks

* mv priority back

* add torch model base and test

* update hooks, trainer, import_util

* add torch epoch based trainer and dis utils

* add hooks

* fix warmup

* format code style, fix warmup and add warmup unittest

* fix impls

* pre-commit check passed

* update hook and add EpochBasedTrainer

* add trainer unittest

* Merge branch 'feat/add_hooks' into feat/add_task

# Conflicts:
#	modelscope/models/base_torch.py
#	modelscope/trainers/hooks/hook.py
#	modelscope/trainers/trainer.py

* update unittest name

* rewrite taskdataset to trainer

* fix trainer and add unittest

* add unittest

* code: run to forward

* run through... but ugly code

* arrange some cls

* fix some errs

* revert some mistakes

* init check in

* Merge branch 'feat/add_hooks' into feat/add_task

# Conflicts:
#	modelscope/trainers/trainer.py

* test with bigger epoch and size

* add the default metrics class

* move build metrics code to a method

* merge add_task

* merge origin add_task

* add device initialization

* remove preprocessor arg for bool

* add task models

* move metric collect logic to metrics class

* pre-commit passed

* fix cr comments

* precommit passed

* add task models

* Merge remote-tracking branch 'origin/feat/add_task' into feat/backbone_head

* add comment

* change comment formats.

* fix comments

* fix ut bug

* fix comments

* add wrapper check

* fix comments

* pre commit passed

* fix cr comments

* solve a loop import problem

* fix ut bug

* fix ut errors

* change dummydataset to msdataset

* precommit passed

* merge add task

* backbone-head is build, model is not correctly loaded

* model load states matched

* result matched

* lint

* add veco/palm_v2 code

* merge master

* merge master success running

* add repr model name level

* Merge branch 'feat/veco_palm' into feat/finetune_sbert_veco

* model test for training

* add token-classification metric, add formal ut

* fix running bug

* finetune and pipeline are working with backbone-head

* add nli

* add missing code

* finetune and pipeline are working with backbone-head

* Merge branch 'feat/backbone_head' of http://gitlab.alibaba-inc.com/Ali-MaaS/MaaS-lib into feat/backbone_head

* add a test repo for pr

* remove merge conflicted file

* remove merge conflicted file 1

* lint check

* import error

* none type bug fix

* forward input unpacking or dict bug

* move head into models, add build_backbone with registry, no base method

* merge master

* feat: 1. add interleave dataset method 2. support multiple dataset in trainer.build_dataset 3. support 3 sub tasks in sequence_classification task

* unfinished

* update the task model structure in NLP field

* merge master

* update by comments

* keep the default model id as current on production

* unfinished

* unfinished

* veco can run

* Merge remote-tracking branch 'origin/master' into feat/backbone_head

* add taskmodel for module management

* remove forward_input_is_dict

* unfinished

* token classification started

* update base model structure

* move space to backbone

* remove 'type' in build_from_cfg method

* test update

* bug fix

* on tesing, mess code

* Merge branch 'feat/backbone_head' into feat/refactor_nlp_730

# Conflicts:
#	modelscope/metrics/builder.py
#	modelscope/models/__init__.py
#	modelscope/models/nlp/__init__.py
#	modelscope/preprocessors/nlp.py
#	modelscope/trainers/trainer.py
#	requirements/multi-modal.txt

* add missing merge

* add sofa source code

* refactor

* add veco task dataset

* add veco task dataset

* pre-commit passed

* fix bug of log

* add some features

* merge master

* bug fix

* refine nlp models

* fix the training error

* unfinished

* refactor pipeline

* Merge branch 'feat/backbone_head' into feat/refactor_nlp_730

# Conflicts:
#	modelscope/metrics/builder.py
#	modelscope/models/nlp/__init__.py
#	modelscope/models/nlp/backbones/structbert/modeling_sbert.py
#	modelscope/models/nlp/palm_v2/palm_for_text_generation.py
#	modelscope/preprocessors/base.py
#	modelscope/preprocessors/nlp.py
#	modelscope/trainers/trainer.py

* Merge commit 'ab04ceafc5453ce7daa9aa09e37a55f703072a10' into feat/refactor_nlp_730

# Conflicts:
#	modelscope/metainfo.py
#	modelscope/metrics/builder.py
#	modelscope/models/__init__.py
#	modelscope/models/base/base_torch_model.py
#	modelscope/models/nlp/__init__.py
#	modelscope/models/nlp/backbones/space/model/intent_unified_transformer.py
#	modelscope/models/nlp/backbones/space/model/model_base.py
#	modelscope/models/nlp/palm_v2/palm_for_text_generation.py
#	modelscope/models/nlp/sbert_for_sequence_classification.py
#	modelscope/models/nlp/sequence_classification.py
#	modelscope/models/nlp/space/__init__.py
#	modelscope/models/nlp/space_for_dialog_intent_prediction.py
#	modelscope/models/nlp/space_for_dialog_modeling.py
#	modelscope/models/nlp/space_for_dialog_state_tracking.py
#	modelscope/models/nlp/task_model.py
#	modelscope/pipelines/nlp/sentiment_classification_pipeline.py
#	modelscope/preprocessors/base.py
#	modelscope/preprocessors/nlp.py
#	modelscope/trainers/trainer.py

* revert changes

* unify sentence classification postprocess

* revert some changes, move some model files

* pipeline first case run through

* ws pipeline passed

* Merge branch 'feat/refactor_nlp_730' into feat/finetune_sbert_veco

* finetune

* revert code

* revert some code

* ws finetune started, only the accuracy is weird

* Merge branch 'feat/veco_taskdataset' into feat/finetune_sbert_veco

# Conflicts:
#	modelscope/task_datasets/veco_dataset.py
#	tests/taskdataset/test_veco_dataset.py

* veco+nli finetune started

* Merge branch 'master' into feat/finetune_sbert_veco

# Conflicts:
#	modelscope/models/nlp/sbert_for_sequence_classification.py
#	modelscope/models/nlp/sbert_for_token_classification.py
#	modelscope/models/nlp/sbert_for_zero_shot_classification.py
#	modelscope/models/nlp/space/space_for_dialog_intent_prediction.py
#	modelscope/models/nlp/space/space_for_dialog_modeling.py
#	modelscope/trainers/trainer.py

* add trainer for nlp

* trainer: dataset params passed into preprocessor

* test passed by nlptrainer

* fix some bugs

* fix some bugs

* add backbone/head subclass

* fix regression bugs

* fix bug in token-cls finetune

* support cfg modification

* fix bug

* fix bug

* update requirements

* add some comments and fix some t

* add some comments and revert an argument

* split to two test files

* revert code

* fix bug in preprocessor

(cherry picked from commit 7a648d096e)

* fix ut bug

* support sbert models

* unfinished

* Merge branch 'feat/finetune_sbert_veco' into sly_tmp_veco_finetune

# Conflicts:
#	tests/trainers/test_finetune_sequence_classification.py

* fix bug in veco

* fix bug

* fix bug

* correct running params

* remove useless files

* add palm finetuning with cnn_dailymail dataset

* copy space model from sofa

* Merge branch 'feat/finetune_sbert_veco' of gitlab.alibaba-inc.com:Ali-MaaS/MaaS-lib into feat/finetune_sbert_veco

* Merge branch 'master' into feat/finetune_sbert_veco

# Conflicts:
#	modelscope/metrics/__init__.py
#	modelscope/models/__init__.py
#	modelscope/models/nlp/__init__.py
#	modelscope/models/nlp/backbones/__init__.py
#	modelscope/models/nlp/backbones/structbert/modeling_sbert.py
#	modelscope/models/nlp/heads/__init__.py
#	modelscope/models/nlp/masked_language.py
#	modelscope/models/nlp/palm_v2/palm_for_text_generation.py
#	modelscope/models/nlp/sbert_for_nli.py
#	modelscope/models/nlp/sbert_for_sentence_similarity.py
#	modelscope/models/nlp/sbert_for_sentiment_classification.py
#	modelscope/models/nlp/sbert_for_sequence_classification.py
#	modelscope/models/nlp/sbert_for_token_classification.py
#	modelscope/models/nlp/sbert_for_zero_shot_classification.py
#	modelscope/models/nlp/sequence_classification.py
#	modelscope/models/nlp/space/space_for_dialog_intent_prediction.py
#	modelscope/models/nlp/space/space_for_dialog_modeling.py
#	modelscope/models/nlp/space/space_for_dialog_state_tracking.py
#	modelscope/models/nlp/structbert/adv_utils.py
#	modelscope/models/nlp/structbert/configuration_sbert.py
#	modelscope/models/nlp/task_models/task_model.py
#	modelscope/pipelines/__init__.py
#	modelscope/pipelines/nlp/__init__.py
#	modelscope/pipelines/nlp/fill_mask_pipeline.py
#	modelscope/pipelines/nlp/named_entity_recognition_pipeline.py
#	modelscope/pipelines/nlp/nli_pipeline.py
#	modelscope/pipelines/nlp/sentence_similarity_pipeline.py
#	modelscope/pipelines/nlp/sentiment_classification_pipeline.py
#	modelscope/pipelines/nlp/text_generation_pipeline.py
#	modelscope/pipelines/nlp/word_segmentation_pipeline.py
#	modelscope/pipelines/nlp/zero_shot_classification_pipeline.py
#	modelscope/preprocessors/nlp.py
#	modelscope/task_datasets/__init__.py
#	modelscope/trainers/trainer.py
#	modelscope/trainers/utils/inference.py
#	modelscope/utils/file_utils.py
#	requirements/nlp.txt
#	tests/pipelines/test_nli.py
#	tests/pipelines/test_sentence_similarity.py
#	tests/pipelines/test_sentiment_classification.py

* fix imports

* mark backbone in their own modeling

* pre-commit check passed

* pre-commit passed, remove roberta model

* fix a bug in ast import

* skip all finetune uts

* fix bugs

* pre-commit passed

* bug fixed

* bug fixed

* bug fixed

* bug fixed

* fix ut bug

* fix bug

* fix ut bug

* fix bug

* fix bug

* fix bugs

* fix bug

* revert veco

* revert veco because of core dump

* fix palm bug

* revert veco

* revert mistaken code

* add a test print

* pre-commit check

* test exception

* add test code

* for test

* fix bug and test

* remove test code

* remove useless file

* 1. fix some bugs 2. add backbone ut

* Merge branch 'master' into feat/finetune_refactor_730

# Conflicts:
#	modelscope/metainfo.py
#	modelscope/metrics/sequence_classification_metric.py
#	modelscope/models/nlp/__init__.py
#	modelscope/models/nlp/task_models/task_model.py
#	modelscope/preprocessors/__init__.py
#	modelscope/preprocessors/nlp.py
#	modelscope/trainers/trainer.py
#	modelscope/trainers/utils/inference.py
#	modelscope/utils/file_utils.py
#	tests/trainers/test_trainer_with_nlp.py

* pre-commit passed

* revert files

* increase test level

* unregister models

* fix bugs

* fix cr comments

* fix bug in backbone-head

* add sbert backbone

* fix bug

* add test for token-cls-metric

* pre-commit passed

* fix ut comments

* revert normal tokenizer to fast tokenizer

* Merge branch 'master' into feat/finetune_refactor_730

# Conflicts:
#	modelscope/models/nlp/__init__.py
#	modelscope/models/nlp/backbones/__init__.py
#	modelscope/models/nlp/backbones/structbert/__init__.py
#	modelscope/models/nlp/masked_language.py
#	modelscope/models/nlp/palm_v2/palm_for_text_generation.py
#	modelscope/models/nlp/sbert_for_sequence_classification.py
#	modelscope/models/nlp/sbert_for_token_classification.py
#	modelscope/models/nlp/sbert_for_zero_shot_classification.py
#	modelscope/pipelines/nlp/text_generation_pipeline.py
#	modelscope/preprocessors/nlp.py
#	modelscope/trainers/trainer.py
#	modelscope/trainers/utils/inference.py

* fix merge bugs

* pre commit passed

* fix bug

* fix bug

* fix bug

* fix bug from master

* add print

* fix ut bug

* fix bug

* Merge branch 'master' into feat/finetune_refactor_730

* skip task model test
master · yuze.zyz · 3 years ago · commit 21fa71baf0
100 changed files with 7836 additions and 1846 deletions
  1. configs/nlp/sbert_sentence_similarity.json (+1, -1)
  2. modelscope/hub/utils/utils.py (+1, -1)
  3. modelscope/metainfo.py (+8, -2)
  4. modelscope/metrics/__init__.py (+2, -0)
  5. modelscope/metrics/base.py (+3, -0)
  6. modelscope/metrics/builder.py (+2, -0)
  7. modelscope/metrics/sequence_classification_metric.py (+4, -4)
  8. modelscope/metrics/token_classification_metric.py (+123, -0)
  9. modelscope/models/base/base_model.py (+10, -7)
  10. modelscope/models/base/base_torch_model.py (+8, -3)
  11. modelscope/models/nlp/__init__.py (+19, -29)
  12. modelscope/models/nlp/backbones/__init__.py (+0, -4)
  13. modelscope/models/nlp/backbones/space/__init__.py (+0, -2)
  14. modelscope/models/nlp/backbones/space/model/__init__.py (+0, -3)
  15. modelscope/models/nlp/backbones/structbert.py (+54, -0)
  16. modelscope/models/nlp/backbones/structbert/__init__.py (+0, -19)
  17. modelscope/models/nlp/backbones/structbert/modeling_sbert.py (+0, -815)
  18. modelscope/models/nlp/gpt3/__init__.py (+3, -1)
  19. modelscope/models/nlp/gpt3/configuration_gpt3.py (+0, -0)
  20. modelscope/models/nlp/gpt3/gpt3_for_text_generation.py (+1, -1)
  21. modelscope/models/nlp/gpt3/modeling_gpt3.py (+0, -0)
  22. modelscope/models/nlp/heads/__init__.py (+3, -1)
  23. modelscope/models/nlp/heads/sequence_classification_head.py (+1, -2)
  24. modelscope/models/nlp/heads/torch_pretrain_head.py (+26, -0)
  25. modelscope/models/nlp/masked_language.py (+100, -57)
  26. modelscope/models/nlp/palm_v2/__init__.py (+43, -0)
  27. modelscope/models/nlp/palm_v2/configuration_palm.py (+116, -0)
  28. modelscope/models/nlp/palm_v2/dureader_eval.py (+872, -0)
  29. modelscope/models/nlp/palm_v2/modeling_palm.py (+1332, -0)
  30. modelscope/models/nlp/palm_v2/palm_for_text_generation.py (+2, -2)
  31. modelscope/models/nlp/sbert_for_nli.py (+0, -23)
  32. modelscope/models/nlp/sbert_for_sentence_similarity.py (+0, -25)
  33. modelscope/models/nlp/sbert_for_sentiment_classification.py (+0, -22)
  34. modelscope/models/nlp/sbert_for_sequence_classification.py (+0, -82)
  35. modelscope/models/nlp/sbert_for_token_classification.py (+0, -64)
  36. modelscope/models/nlp/sbert_for_zero_shot_classification.py (+0, -50)
  37. modelscope/models/nlp/sequence_classification.py (+155, -66)
  38. modelscope/models/nlp/space/__init__.py (+28, -0)
  39. modelscope/models/nlp/space/model/__init__.py (+10, -0)
  40. modelscope/models/nlp/space/model/configuration_space.py (+32, -0)
  41. modelscope/models/nlp/space/model/gen_unified_transformer.py (+0, -0)
  42. modelscope/models/nlp/space/model/generator.py (+0, -0)
  43. modelscope/models/nlp/space/model/intent_unified_transformer.py (+0, -0)
  44. modelscope/models/nlp/space/model/model_base.py (+0, -0)
  45. modelscope/models/nlp/space/model/modeling_space.py (+268, -0)
  46. modelscope/models/nlp/space/model/tokenization_space.py (+29, -0)
  47. modelscope/models/nlp/space/model/unified_transformer.py (+3, -4)
  48. modelscope/models/nlp/space/modules/__init__.py (+0, -0)
  49. modelscope/models/nlp/space/modules/embedder.py (+0, -0)
  50. modelscope/models/nlp/space/modules/feedforward.py (+0, -0)
  51. modelscope/models/nlp/space/modules/functions.py (+0, -0)
  52. modelscope/models/nlp/space/modules/multihead_attention.py (+0, -0)
  53. modelscope/models/nlp/space/modules/transformer_block.py (+0, -0)
  54. modelscope/models/nlp/space/space_for_dialog_intent_prediction.py (+1, -1)
  55. modelscope/models/nlp/space/space_for_dialog_modeling.py (+1, -1)
  56. modelscope/models/nlp/space/space_for_dialog_state_tracking.py (+1, -1)
  57. modelscope/models/nlp/structbert/__init__.py (+45, -0)
  58. modelscope/models/nlp/structbert/adv_utils.py (+4, -2)
  59. modelscope/models/nlp/structbert/configuration_sbert.py (+7, -4)
  60. modelscope/models/nlp/structbert/modeling_sbert.py (+1964, -0)
  61. modelscope/models/nlp/structbert/tokenization_sbert.py (+516, -0)
  62. modelscope/models/nlp/structbert/tokenization_sbert_fast.py (+200, -0)
  63. modelscope/models/nlp/task_models/__init__.py (+0, -0)
  64. modelscope/models/nlp/task_models/sequence_classification.py (+86, -0)
  65. modelscope/models/nlp/task_models/task_model.py (+7, -4)
  66. modelscope/models/nlp/token_classification.py (+147, -0)
  67. modelscope/models/nlp/veco/__init__.py (+43, -0)
  68. modelscope/models/nlp/veco/configuration_veco.py (+33, -0)
  69. modelscope/models/nlp/veco/modeling_veco.py (+143, -0)
  70. modelscope/models/nlp/veco/tokenization_veco.py (+321, -0)
  71. modelscope/models/nlp/veco/tokenization_veco_fast.py (+213, -0)
  72. modelscope/msdatasets/ms_dataset.py (+7, -0)
  73. modelscope/outputs.py (+1, -0)
  74. modelscope/pipelines/nlp/__init__.py (+6, -7)
  75. modelscope/pipelines/nlp/fill_mask_pipeline.py (+10, -11)
  76. modelscope/pipelines/nlp/named_entity_recognition_pipeline.py (+5, -7)
  77. modelscope/pipelines/nlp/nli_pipeline.py (+0, -73)
  78. modelscope/pipelines/nlp/pair_sentence_classification_pipeline.py (+37, -0)
  79. modelscope/pipelines/nlp/sentence_similarity_pipeline.py (+0, -73)
  80. modelscope/pipelines/nlp/sentiment_classification_pipeline.py (+0, -74)
  81. modelscope/pipelines/nlp/sequence_classification_pipeline_base.py (+60, -0)
  82. modelscope/pipelines/nlp/single_sentence_classification_pipeline.py (+35, -0)
  83. modelscope/pipelines/nlp/text_generation_pipeline.py (+4, -4)
  84. modelscope/pipelines/nlp/translation_pipeline.py (+1, -3)
  85. modelscope/pipelines/nlp/word_segmentation_pipeline.py (+19, -17)
  86. modelscope/pipelines/nlp/zero_shot_classification_pipeline.py (+13, -14)
  87. modelscope/preprocessors/__init__.py (+7, -7)
  88. modelscope/preprocessors/base.py (+3, -1)
  89. modelscope/preprocessors/nlp.py (+302, -204)
  90. modelscope/preprocessors/space/dialog_state_tracking_preprocessor.py (+1, -1)
  91. modelscope/task_datasets/__init__.py (+2, -0)
  92. modelscope/task_datasets/base.py (+3, -3)
  93. modelscope/task_datasets/torch_base_dataset.py (+3, -3)
  94. modelscope/task_datasets/veco_dataset.py (+76, -0)
  95. modelscope/trainers/__init__.py (+1, -0)
  96. modelscope/trainers/hooks/evaluation_hook.py (+1, -0)
  97. modelscope/trainers/hooks/lr_scheduler_hook.py (+5, -3)
  98. modelscope/trainers/nlp_trainer.py (+192, -0)
  99. modelscope/trainers/trainer.py (+34, -24)
  100. modelscope/trainers/utils/inference.py (+17, -14)

configs/nlp/sbert_sentence_similarity.json (+1, -1)

@@ -2,7 +2,7 @@
"framework": "pytorch",
"task": "sentence-similarity",
"preprocessor": {
"type": "bert-seq-cls-tokenizer-finetune",
"type": "sen-sim-tokenizer",
"first_sequence": "sentence1",
"second_sequence": "sentence2"
},


modelscope/hub/utils/utils.py (+1, -1)

@@ -4,7 +4,7 @@ from modelscope.hub.constants import (DEFAULT_MODELSCOPE_DOMAIN,
DEFAULT_MODELSCOPE_GROUP,
MODEL_ID_SEPARATOR,
MODELSCOPE_URL_SCHEME)
- from modelscope.utils.utils import get_default_cache_dir
+ from modelscope.utils.file_utils import get_default_cache_dir


def model_id_to_group_owner_name(model_id):


modelscope/metainfo.py (+8, -2)

@@ -53,6 +53,10 @@ class TaskModels(object):
class Heads(object):
# nlp heads
text_classification = 'text-classification'
+ # mlm
+ bert_mlm = 'bert-mlm'
+ # roberta mlm
+ roberta_mlm = 'roberta-mlm'


class Pipelines(object):
@@ -137,7 +141,7 @@ class Trainers(object):
Holds the standard trainer name to use for identifying different trainer.
This should be used to register trainers.

- For a general Trainer, you can use easynlp-trainer/ofa-trainer/sofa-trainer.
+ For a general Trainer, you can use easynlp-trainer/ofa-trainer.
For a model specific Trainer, you can use ${ModelName}-${Task}-trainer.
"""

@@ -179,6 +183,8 @@ class Preprocessors(object):
sbert_token_cls_tokenizer = 'sbert-token-cls-tokenizer'
zero_shot_cls_tokenizer = 'zero-shot-cls-tokenizer'
text_error_correction = 'text-error-correction'
+ word_segment_text_to_label_preprocessor = 'word-segment-text-to-label-preprocessor'
+ fill_mask = 'fill-mask'

# audio preprocessor
linear_aec_fbank = 'linear-aec-fbank'
@@ -204,7 +210,7 @@ class Metrics(object):
# metric for image instance segmentation task
image_ins_seg_coco_metric = 'image-ins-seg-coco-metric'
# metrics for sequence classification task
- seq_cls_metric = 'seq_cls_metric'
+ seq_cls_metric = 'seq-cls-metric'
# metrics for token-classification task
token_cls_metric = 'token-cls-metric'
# metrics for text-generation task


modelscope/metrics/__init__.py (+2, -0)

@@ -13,6 +13,7 @@ if TYPE_CHECKING:
from .image_portrait_enhancement_metric import ImagePortraitEnhancementMetric
from .sequence_classification_metric import SequenceClassificationMetric
from .text_generation_metric import TextGenerationMetric
+ from .token_classification_metric import TokenClassificationMetric

else:
_import_structure = {
@@ -26,6 +27,7 @@ else:
['ImagePortraitEnhancementMetric'],
'sequence_classification_metric': ['SequenceClassificationMetric'],
'text_generation_metric': ['TextGenerationMetric'],
+ 'token_classification_metric': ['TokenClassificationMetric'],
}

import sys


modelscope/metrics/base.py (+3, -0)

@@ -10,6 +10,9 @@ class Metric(ABC):
complex metrics for a specific task with or without other Metric subclasses.
"""

+ def __init__(self, trainer=None, *args, **kwargs):
+     self.trainer = trainer

@abstractmethod
def add(self, outputs: Dict, inputs: Dict):
""" Append logits and labels within an eval loop.


modelscope/metrics/builder.py (+2, -0)

@@ -20,7 +20,9 @@ class MetricKeys(object):
task_default_metrics = {
Tasks.image_segmentation: [Metrics.image_ins_seg_coco_metric],
Tasks.sentence_similarity: [Metrics.seq_cls_metric],
Tasks.nli: [Metrics.seq_cls_metric],
Tasks.sentiment_classification: [Metrics.seq_cls_metric],
Tasks.token_classification: [Metrics.token_cls_metric],
Tasks.text_generation: [Metrics.text_gen_metric],
Tasks.image_denoising: [Metrics.image_denoise_metric],
Tasks.image_color_enhancement: [Metrics.image_color_enhance_metric],


modelscope/metrics/sequence_classification_metric.py (+4, -4)

@@ -17,14 +17,14 @@ class SequenceClassificationMetric(Metric):
"""The metric computation class for sequence classification classes.
"""

- label_name = 'labels'

- def __init__(self):
+ def __init__(self, *args, **kwargs):
+     super().__init__(*args, **kwargs)
self.preds = []
self.labels = []

def add(self, outputs: Dict, inputs: Dict):
- ground_truths = inputs[self.label_name]
+ label_name = OutputKeys.LABEL if OutputKeys.LABEL in inputs else OutputKeys.LABELS
+ ground_truths = inputs[label_name]
eval_results = outputs[OutputKeys.LOGITS]
self.preds.append(
torch_nested_numpify(torch_nested_detach(eval_results)))


modelscope/metrics/token_classification_metric.py (+123, -0)

@@ -0,0 +1,123 @@
import importlib
from typing import Dict, List, Optional, Union

import numpy as np

from modelscope.outputs import OutputKeys
from ..metainfo import Metrics
from ..utils.registry import default_group
from ..utils.tensor_utils import torch_nested_detach, torch_nested_numpify
from .base import Metric
from .builder import METRICS, MetricKeys


@METRICS.register_module(
group_key=default_group, module_name=Metrics.token_cls_metric)
class TokenClassificationMetric(Metric):
"""
The metric computation class for token-classification task.
Args:
return_entity_level_metrics (bool, *optional*):
Whether to return every label's detail metrics, default False.
"""

def add(self, outputs: Dict, inputs: Dict):
label_name = OutputKeys.LABEL if OutputKeys.LABEL in inputs else OutputKeys.LABELS
ground_truths = inputs[label_name]
eval_results = outputs[OutputKeys.LOGITS]
self.preds.append(
torch_nested_numpify(torch_nested_detach(eval_results)))
self.labels.append(
torch_nested_numpify(torch_nested_detach(ground_truths)))

def __init__(self, return_entity_level_metrics=False, *args, **kwargs):
super().__init__(*args, **kwargs)
self.return_entity_level_metrics = return_entity_level_metrics
self.preds = []
self.labels = []

def evaluate(self):
self.id2label = {
id: label
for label, id in self.trainer.label2id.items()
}
self.preds = np.concatenate(self.preds, axis=0)
self.labels = np.concatenate(self.labels, axis=0)
predictions = np.argmax(self.preds, axis=-1)

true_predictions = [[
self.id2label[p] for (p, lb) in zip(prediction, label)
if lb != -100
] for prediction, label in zip(predictions, self.labels)]
true_labels = [[
self.id2label[lb] for (p, lb) in zip(prediction, label)
if lb != -100
] for prediction, label in zip(predictions, self.labels)]

results = self._compute(
predictions=true_predictions, references=true_labels)
if self.return_entity_level_metrics:
final_results = {}
for key, value in results.items():
if isinstance(value, dict):
for n, v in value.items():
final_results[f'{key}_{n}'] = v
else:
final_results[key] = value
return final_results
else:
return {
MetricKeys.PRECISION: results[MetricKeys.PRECISION],
MetricKeys.RECALL: results[MetricKeys.RECALL],
MetricKeys.F1: results[MetricKeys.F1],
MetricKeys.ACCURACY: results[MetricKeys.ACCURACY],
}

@staticmethod
def _compute(
predictions,
references,
suffix: bool = False,
scheme: Optional[str] = None,
mode: Optional[str] = None,
sample_weight: Optional[List[int]] = None,
zero_division: Union[str, int] = 'warn',
):
from seqeval.metrics import accuracy_score, classification_report
if scheme is not None:
try:
scheme_module = importlib.import_module('seqeval.scheme')
scheme = getattr(scheme_module, scheme)
except AttributeError:
raise ValueError(
f'Scheme should be one of [IOB1, IOB2, IOE1, IOE2, IOBES, BILOU], got {scheme}'
)
report = classification_report(
y_true=references,
y_pred=predictions,
suffix=suffix,
output_dict=True,
scheme=scheme,
mode=mode,
sample_weight=sample_weight,
zero_division=zero_division,
)
report.pop('macro avg')
report.pop('weighted avg')
overall_score = report.pop('micro avg')

scores = {
type_name: {
MetricKeys.PRECISION: score['precision'],
MetricKeys.RECALL: score['recall'],
MetricKeys.F1: score['f1-score'],
'number': score['support'],
}
for type_name, score in report.items()
}
scores[MetricKeys.PRECISION] = overall_score['precision']
scores[MetricKeys.RECALL] = overall_score['recall']
scores[MetricKeys.F1] = overall_score['f1-score']
scores[MetricKeys.ACCURACY] = accuracy_score(
y_true=references, y_pred=predictions)
return scores
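For reference, the seqeval-based _compute helper above can be exercised on its own (a minimal sketch; the label sequences are invented and seqeval must be installed):

from modelscope.metrics.token_classification_metric import TokenClassificationMetric

references = [['B-PER', 'I-PER', 'O'], ['B-LOC', 'O']]  # made-up gold labels
predictions = [['B-PER', 'I-PER', 'O'], ['O', 'O']]     # made-up predictions

scores = TokenClassificationMetric._compute(
    predictions=predictions, references=references)
# overall precision/recall/f1/accuracy plus one sub-dict per entity type
print({k: v for k, v in scores.items() if not isinstance(v, dict)})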

modelscope/models/base/base_model.py (+10, -7)

@@ -10,6 +10,8 @@ from modelscope.hub.snapshot_download import snapshot_download
from modelscope.models.builder import build_model
from modelscope.utils.config import Config
from modelscope.utils.constant import DEFAULT_MODEL_REVISION, ModelFile
+ from modelscope.utils.file_utils import func_receive_dict_inputs
+ from modelscope.utils.hub import parse_label_mapping
from modelscope.utils.logger import get_logger

logger = get_logger()
@@ -69,6 +71,7 @@ class Model(ABC):
def from_pretrained(cls,
model_name_or_path: str,
revision: Optional[str] = DEFAULT_MODEL_REVISION,
+ cfg_dict: Config = None,
*model_args,
**kwargs):
""" Instantiate a model from local directory or remote model repo. Note
@@ -87,25 +90,25 @@ class Model(ABC):
)
local_model_dir = snapshot_download(model_name_or_path, revision)
logger.info(f'initialize model from {local_model_dir}')
- cfg = Config.from_file(
-     osp.join(local_model_dir, ModelFile.CONFIGURATION))
+ if cfg_dict is not None:
+     cfg = cfg_dict
+ else:
+     cfg = Config.from_file(
+         osp.join(local_model_dir, ModelFile.CONFIGURATION))
task_name = cfg.task
model_cfg = cfg.model
assert hasattr(
cfg, 'pipeline'), 'pipeline config is missing from config file.'
pipeline_cfg = cfg.pipeline
# TODO @wenmeng.zwm may should manually initialize model after model building

if hasattr(model_cfg, 'model_type') and not hasattr(model_cfg, 'type'):
model_cfg.type = model_cfg.model_type

model_cfg.model_dir = local_model_dir

for k, v in kwargs.items():
model_cfg[k] = v
model = build_model(
model_cfg, task_name=task_name, default_args=kwargs)

# dynamically add pipeline info to model for pipeline inference
- model.pipeline = pipeline_cfg
+ if hasattr(cfg, 'pipeline'):
+     model.pipeline = cfg.pipeline
return model
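The new cfg_dict argument above lets callers pass an already-built Config instead of reading configuration.json from the downloaded snapshot (a sketch; the model id and file path are placeholders):

from modelscope.models import Model
from modelscope.utils.config import Config

# default: the configuration.json inside the model snapshot is used
model = Model.from_pretrained('damo/nlp_structbert_sentence-similarity_chinese-base')

# override: supply a pre-built Config via cfg_dict
cfg = Config.from_file('path/to/custom_configuration.json')
model = Model.from_pretrained(
    'damo/nlp_structbert_sentence-similarity_chinese-base', cfg_dict=cfg)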

modelscope/models/base/base_torch_model.py (+8, -3)

@@ -5,6 +5,7 @@ from typing import Any, Dict, Optional, Union
import torch
from torch import nn

+ from modelscope.utils.file_utils import func_receive_dict_inputs
from modelscope.utils.logger import get_logger
from .base_model import Model

@@ -20,6 +21,13 @@ class TorchModel(Model, torch.nn.Module):
super().__init__(model_dir, *args, **kwargs)
torch.nn.Module.__init__(self)

+ def __call__(self, input: Dict[str,
+              torch.Tensor]) -> Dict[str, torch.Tensor]:
+     if func_receive_dict_inputs(self.forward):
+         return self.postprocess(self.forward(input))
+     else:
+         return self.postprocess(self.forward(**input))

def forward(self, inputs: Dict[str,
torch.Tensor]) -> Dict[str, torch.Tensor]:
raise NotImplementedError
@@ -50,6 +58,3 @@ class TorchModel(Model, torch.nn.Module):
elif isinstance(module, nn.LayerNorm):
module.bias.data.zero_()
module.weight.data.fill_(1.0)

- def compute_loss(self, outputs: Dict[str, Any], labels):
-     raise NotImplementedError()
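The __call__ added above routes a single input dict either to forward(**input) or to forward(input), depending on whether the subclass declares a dict-style forward, and then applies postprocess. A hypothetical subclass with a keyword-style forward (a sketch, assuming TorchModel is exported from modelscope.models.base and the default postprocess is a pass-through):

import torch

from modelscope.models.base import TorchModel


class ToyClassifier(TorchModel):  # hypothetical, for illustration only

    def __init__(self):
        super().__init__(model_dir=None)
        self.linear = torch.nn.Linear(4, 2)

    # keyword-style forward: __call__ unpacks the input dict into **kwargs
    def forward(self, input_ids, **kwargs):
        return {'logits': self.linear(input_ids)}


model = ToyClassifier()
outputs = model({'input_ids': torch.randn(2, 4)})  # dispatched as forward(**input)
print(outputs['logits'].shape)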

modelscope/models/nlp/__init__.py (+19, -29)

@@ -4,32 +4,26 @@ from typing import TYPE_CHECKING
from modelscope.utils.import_utils import LazyImportModule

if TYPE_CHECKING:
from .backbones import (SbertModel, SpaceGenerator, SpaceModelBase,
GPT3Model)
from .backbones import SbertModel
from .heads import SequenceClassificationHead
from .bert_for_sequence_classification import BertForSequenceClassification
from .csanmt_for_translation import CsanmtForTranslation
from .masked_language import (StructBertForMaskedLM, VecoForMaskedLM,
BertForMaskedLM)
from .nncrf_for_named_entity_recognition import TransformerCRFForNamedEntityRecognition
from .palm_for_text_generation import PalmForTextGeneration
from .sbert_for_nli import SbertForNLI
from .sbert_for_sentence_similarity import SbertForSentenceSimilarity
from .sbert_for_sentiment_classification import SbertForSentimentClassification
from .sbert_for_token_classification import SbertForTokenClassification
from .sbert_for_zero_shot_classification import SbertForZeroShotClassification
from .sequence_classification import SequenceClassificationModel
from .space_for_dialog_intent_prediction import SpaceForDialogIntent
from .space_for_dialog_modeling import SpaceForDialogModeling
from .space_for_dialog_state_tracking import SpaceForDialogStateTracking
from .task_model import SingleBackboneTaskModelBase
from .palm_v2 import PalmForTextGeneration
from .token_classification import SbertForTokenClassification
from .sequence_classification import VecoForSequenceClassification, SbertForSequenceClassification
from .space import SpaceForDialogIntent
from .space import SpaceForDialogModeling
from .space import SpaceForDialogStateTracking
from .task_models.task_model import SingleBackboneTaskModelBase
from .bart_for_text_error_correction import BartForTextErrorCorrection
from .gpt3_for_text_generation import GPT3ForTextGeneration
from .gpt3 import GPT3ForTextGeneration

else:
_import_structure = {
'backbones':
['SbertModel', 'SpaceGenerator', 'SpaceModelBase', 'GPT3Model'],
'backbones': ['SbertModel'],
'heads': ['SequenceClassificationHead'],
'csanmt_for_translation': ['CsanmtForTranslation'],
'bert_for_sequence_classification': ['BertForSequenceClassification'],
@@ -37,21 +31,17 @@ else:
['StructBertForMaskedLM', 'VecoForMaskedLM', 'BertForMaskedLM'],
'nncrf_for_named_entity_recognition':
['TransformerCRFForNamedEntityRecognition'],
'palm_for_text_generation': ['PalmForTextGeneration'],
'sbert_for_nli': ['SbertForNLI'],
'sbert_for_sentence_similarity': ['SbertForSentenceSimilarity'],
'sbert_for_sentiment_classification':
['SbertForSentimentClassification'],
'sbert_for_token_classification': ['SbertForTokenClassification'],
'sbert_for_zero_shot_classification':
['SbertForZeroShotClassification'],
'sequence_classification': ['SequenceClassificationModel'],
'space_for_dialog_intent_prediction': ['SpaceForDialogIntent'],
'space_for_dialog_modeling': ['SpaceForDialogModeling'],
'space_for_dialog_state_tracking': ['SpaceForDialogStateTracking'],
'palm_v2': ['PalmForTextGeneration'],
'token_classification': ['SbertForTokenClassification'],
'sequence_classification':
['VecoForSequenceClassification', 'SbertForSequenceClassification'],
'space': [
'SpaceForDialogIntent', 'SpaceForDialogModeling',
'SpaceForDialogStateTracking'
],
'task_model': ['SingleBackboneTaskModelBase'],
'bart_for_text_error_correction': ['BartForTextErrorCorrection'],
'gpt3_for_text_generation': ['GPT3ForTextGeneration'],
'gpt3': ['GPT3ForTextGeneration'],
}

import sys


modelscope/models/nlp/backbones/__init__.py (+0, -4)

@@ -4,14 +4,10 @@ from typing import TYPE_CHECKING
from modelscope.utils.import_utils import LazyImportModule

if TYPE_CHECKING:
- from .space import SpaceGenerator, SpaceModelBase
from .structbert import SbertModel
- from .gpt3 import GPT3Model
else:
_import_structure = {
- 'space': ['SpaceGenerator', 'SpaceModelBase'],
'structbert': ['SbertModel'],
- 'gpt3': ['GPT3Model']
}

import sys


modelscope/models/nlp/backbones/space/__init__.py (+0, -2)

@@ -1,2 +0,0 @@
from .model.generator import Generator as SpaceGenerator
from .model.model_base import SpaceModelBase

modelscope/models/nlp/backbones/space/model/__init__.py (+0, -3)

@@ -1,3 +0,0 @@
from .gen_unified_transformer import GenUnifiedTransformer
from .intent_unified_transformer import IntentUnifiedTransformer
from .unified_transformer import UnifiedTransformer

modelscope/models/nlp/backbones/structbert.py (+54, -0)

@@ -0,0 +1,54 @@
from transformers import PreTrainedModel

from modelscope.metainfo import Models
from modelscope.models.base import TorchModel
from modelscope.models.builder import BACKBONES
from modelscope.models.nlp.structbert import SbertConfig
from modelscope.models.nlp.structbert import SbertModel as SbertModelTransform
from modelscope.utils.constant import Fields
from modelscope.utils.logger import get_logger

logger = get_logger(__name__)


@BACKBONES.register_module(Fields.nlp, module_name=Models.structbert)
class SbertModel(TorchModel, SbertModelTransform):

def __init__(self, model_dir=None, add_pooling_layer=True, **config):
"""
Args:
model_dir (str, optional): The model checkpoint directory. Defaults to None.
add_pooling_layer (bool, optional): to decide if pool the output from hidden layer. Defaults to True.
"""
config = SbertConfig(**config)
super().__init__(model_dir)
self.config = config
SbertModelTransform.__init__(self, config, add_pooling_layer)

def extract_sequence_outputs(self, outputs):
return outputs['last_hidden_state']

def extract_pooled_outputs(self, outputs):
return outputs['pooler_output']

def forward(
self,
input_ids=None,
attention_mask=None,
token_type_ids=None,
position_ids=None,
head_mask=None,
inputs_embeds=None,
encoder_hidden_states=None,
encoder_attention_mask=None,
past_key_values=None,
use_cache=None,
output_attentions=None,
output_hidden_states=None,
return_dict=None,
):
return SbertModelTransform.forward(
self, input_ids, attention_mask, token_type_ids, position_ids,
head_mask, inputs_embeds, encoder_hidden_states,
encoder_attention_mask, past_key_values, use_cache,
output_attentions, output_hidden_states, return_dict)
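With the registration above, the backbone can also be instantiated directly from config values and queried through the two extract_* helpers (a sketch with tiny made-up config values; in practice the registry builds it from the model's configuration.json):

import torch

from modelscope.models.nlp.backbones import SbertModel

# tiny, made-up config for illustration; real values come from configuration.json
backbone = SbertModel(
    model_dir=None, vocab_size=21128, hidden_size=128,
    num_hidden_layers=2, num_attention_heads=4, intermediate_size=256)

inputs = {
    'input_ids': torch.randint(0, 21128, (1, 16)),
    'attention_mask': torch.ones(1, 16, dtype=torch.long),
}
outputs = backbone(inputs)  # TorchModel.__call__ unpacks the dict into forward(**inputs)
sequence_output = backbone.extract_sequence_outputs(outputs)  # (1, 16, 128)
pooled_output = backbone.extract_pooled_outputs(outputs)      # (1, 128)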

modelscope/models/nlp/backbones/structbert/__init__.py (+0, -19)

@@ -1,19 +0,0 @@
# Copyright (c) Alibaba, Inc. and its affiliates.
from typing import TYPE_CHECKING

from modelscope.utils.import_utils import LazyImportModule

if TYPE_CHECKING:
from .modeling_sbert import SbertModel
else:
_import_structure = {'modeling_sbert': ['SbertModel']}

import sys

sys.modules[__name__] = LazyImportModule(
__name__,
globals()['__file__'],
_import_structure,
module_spec=__spec__,
extra_objects={},
)

modelscope/models/nlp/backbones/structbert/modeling_sbert.py (+0, -815)

@@ -1,815 +0,0 @@
import math
from dataclasses import dataclass
from typing import Optional, Tuple, Union

import torch
import torch.utils.checkpoint
from packaging import version
from torch import nn
from transformers import PreTrainedModel
from transformers.activations import ACT2FN
from transformers.modeling_outputs import (
BaseModelOutputWithPastAndCrossAttentions,
BaseModelOutputWithPoolingAndCrossAttentions, ModelOutput)
from transformers.modeling_utils import (apply_chunking_to_forward,
find_pruneable_heads_and_indices,
prune_linear_layer)

from modelscope.metainfo import Models
from modelscope.models.base import TorchModel
from modelscope.models.builder import BACKBONES
from modelscope.utils.constant import Fields
from modelscope.utils.logger import get_logger
from .configuration_sbert import SbertConfig

logger = get_logger(__name__)


@BACKBONES.register_module(Fields.nlp, module_name=Models.structbert)
class SbertModel(TorchModel, PreTrainedModel):
"""

The model can behave as an encoder (with only self-attention) as well as a decoder, in which case a layer of
cross-attention is added between the self-attention layers, following the architecture described in `Attention is
all you need <https://arxiv.org/abs/1706.03762>`__ by Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit,
Llion Jones, Aidan N. Gomez, Lukasz Kaiser and Illia Polosukhin.

To behave as an decoder the model needs to be initialized with the :obj:`is_decoder` argument of the configuration
set to :obj:`True`. To be used in a Seq2Seq model, the model needs to initialized with both :obj:`is_decoder`
argument and :obj:`add_cross_attention` set to :obj:`True`; an :obj:`encoder_hidden_states` is then expected as an
input to the forward pass.
"""

def __init__(self, model_dir=None, add_pooling_layer=True, **config):
"""
Args:
model_dir (str, optional): The model checkpoint directory. Defaults to None.
add_pooling_layer (bool, optional): to decide if pool the output from hidden layer. Defaults to True.
"""
config = SbertConfig(**config)
super().__init__(model_dir)
self.config = config

self.embeddings = SbertEmbeddings(config)
self.encoder = SbertEncoder(config)

self.pooler = SbertPooler(config) if add_pooling_layer else None
self.init_weights()

def get_input_embeddings(self):
return self.embeddings.word_embeddings

def set_input_embeddings(self, value):
self.embeddings.word_embeddings = value

def _prune_heads(self, heads_to_prune):
"""
Prunes heads of the model. heads_to_prune: dict of {layer_num: list of heads to prune in this layer} See base
class PreTrainedModel
"""
for layer, heads in heads_to_prune.items():
self.encoder.layer[layer].attention.prune_heads(heads)

def forward(self,
input_ids=None,
attention_mask=None,
token_type_ids=None,
position_ids=None,
head_mask=None,
inputs_embeds=None,
encoder_hidden_states=None,
encoder_attention_mask=None,
past_key_values=None,
use_cache=None,
output_attentions=None,
output_hidden_states=None,
return_dict=None,
**kwargs):
r"""
encoder_hidden_states (:obj:`torch.FloatTensor` of shape :obj:`(batch_size, sequence_length, hidden_size)`
, `optional`):
Sequence of hidden-states at the output of the last layer of the encoder. Used in the cross-attention if
the model is configured as a decoder.
encoder_attention_mask (:obj:`torch.FloatTensor` of shape :obj:`(batch_size, sequence_length)`, `optional`):
Mask to avoid performing attention on the padding token indices of the encoder input. This mask is used in
the cross-attention if the model is configured as a decoder. Mask values selected in ``[0, 1]``:

- 1 for tokens that are **not masked**,
- 0 for tokens that are **masked**.
past_key_values (:obj:`tuple(tuple(torch.FloatTensor))` of length :obj:`config.n_layers`
with each tuple having 4 tensors of shape :obj:`(batch_size, num_heads,
sequence_length - 1, embed_size_per_head)`):
Contains precomputed key and value hidden states of the attention blocks. Can be used to speed up decoding.

If :obj:`past_key_values` are used, the user can optionally input only the last :obj:`decoder_input_ids`
(those that don't have their past key value states given to this model) of shape :obj:`(batch_size, 1)`
instead of all :obj:`decoder_input_ids` of shape :obj:`(batch_size, sequence_length)`.
use_cache (:obj:`bool`, `optional`):
If set to :obj:`True`, :obj:`past_key_values` key value states are returned and can be used to speed up
decoding (see :obj:`past_key_values`).
"""

output_attentions = output_attentions if output_attentions is not None else self.config.output_attentions
output_hidden_states = (
output_hidden_states if output_hidden_states is not None else
self.config.output_hidden_states)
return_dict = return_dict if return_dict is not None else self.config.use_return_dict

if self.config.is_decoder:
use_cache = use_cache if use_cache is not None else self.config.use_cache
else:
use_cache = False

if input_ids is not None and inputs_embeds is not None:
raise ValueError(
'You cannot specify both input_ids and inputs_embeds at the same time'
)
elif input_ids is not None:
input_shape = input_ids.size()
elif inputs_embeds is not None:
input_shape = inputs_embeds.size()[:-1]
else:
raise ValueError(
'You have to specify either input_ids or inputs_embeds')

batch_size, seq_length = input_shape
device = input_ids.device if input_ids is not None else inputs_embeds.device

# past_key_values_length
past_key_values_length = past_key_values[0][0].shape[
2] if past_key_values is not None else 0

if attention_mask is None:
attention_mask = torch.ones(
((batch_size, seq_length + past_key_values_length)),
device=device)

if token_type_ids is None:
if hasattr(self.embeddings, 'token_type_ids'):
buffered_token_type_ids = self.embeddings.token_type_ids[:, :
seq_length]
buffered_token_type_ids_expanded = buffered_token_type_ids.expand(
batch_size, seq_length)
token_type_ids = buffered_token_type_ids_expanded
else:
token_type_ids = torch.zeros(
input_shape, dtype=torch.long, device=device)

# We can provide a self-attention mask of dimensions [batch_size, from_seq_length, to_seq_length]
# ourselves in which case we just need to make it broadcastable to all heads.
extended_attention_mask: torch.Tensor = self.get_extended_attention_mask(
attention_mask, input_shape, device)

# If a 2D or 3D attention mask is provided for the cross-attention
# we need to make broadcastable to [batch_size, num_heads, seq_length, seq_length]
if self.config.is_decoder and encoder_hidden_states is not None:
encoder_batch_size, encoder_sequence_length, _ = encoder_hidden_states.size(
)
encoder_hidden_shape = (encoder_batch_size,
encoder_sequence_length)
if encoder_attention_mask is None:
encoder_attention_mask = torch.ones(
encoder_hidden_shape, device=device)
encoder_extended_attention_mask = self.invert_attention_mask(
encoder_attention_mask)
else:
encoder_extended_attention_mask = None

# Prepare head mask if needed
# 1.0 in head_mask indicate we keep the head
# attention_probs has shape bsz x n_heads x N x N
# input head_mask has shape [num_heads] or [num_hidden_layers x num_heads]
# and head_mask is converted to shape [num_hidden_layers x batch x num_heads x seq_length x seq_length]
head_mask = self.get_head_mask(head_mask,
self.config.num_hidden_layers)

embedding_output, orignal_embeds = self.embeddings(
input_ids=input_ids,
position_ids=position_ids,
token_type_ids=token_type_ids,
inputs_embeds=inputs_embeds,
past_key_values_length=past_key_values_length,
return_inputs_embeds=True,
)
encoder_outputs = self.encoder(
embedding_output,
attention_mask=extended_attention_mask,
head_mask=head_mask,
encoder_hidden_states=encoder_hidden_states,
encoder_attention_mask=encoder_extended_attention_mask,
past_key_values=past_key_values,
use_cache=use_cache,
output_attentions=output_attentions,
output_hidden_states=output_hidden_states,
return_dict=return_dict,
)
sequence_output = encoder_outputs[0]
pooled_output = self.pooler(
sequence_output) if self.pooler is not None else None

if not return_dict:
return (sequence_output,
pooled_output) + encoder_outputs[1:] + (orignal_embeds, )

return BaseModelOutputWithPoolingAndCrossAttentionsWithEmbedding(
last_hidden_state=sequence_output,
pooler_output=pooled_output,
past_key_values=encoder_outputs.past_key_values,
hidden_states=encoder_outputs.hidden_states,
attentions=encoder_outputs.attentions,
cross_attentions=encoder_outputs.cross_attentions,
embedding_output=orignal_embeds)

def extract_sequence_outputs(self, outputs):
return outputs['last_hidden_state']

def extract_pooled_outputs(self, outputs):
return outputs['pooler_output']


class SbertEmbeddings(nn.Module):
"""Construct the embeddings from word, position and token_type embeddings."""

def __init__(self, config):
super().__init__()
self.word_embeddings = nn.Embedding(
config.vocab_size,
config.hidden_size,
padding_idx=config.pad_token_id)
self.position_embeddings = nn.Embedding(config.max_position_embeddings,
config.hidden_size)
self.token_type_embeddings = nn.Embedding(config.type_vocab_size,
config.hidden_size)

# self.LayerNorm is not snake-cased to stick with TensorFlow model variable name and be able to load
# any TensorFlow checkpoint file
self.LayerNorm = nn.LayerNorm(
config.hidden_size, eps=config.layer_norm_eps)
self.dropout = nn.Dropout(config.hidden_dropout_prob)
# position_ids (1, len position emb) is contiguous in memory and exported when serialized
self.position_embedding_type = getattr(config,
'position_embedding_type',
'absolute')
self.register_buffer(
'position_ids',
torch.arange(config.max_position_embeddings).expand((1, -1)))
if version.parse(torch.__version__) > version.parse('1.6.0'):
self.register_buffer(
'token_type_ids',
torch.zeros(
self.position_ids.size(),
dtype=torch.long,
device=self.position_ids.device),
persistent=False,
)

def forward(self,
input_ids=None,
token_type_ids=None,
position_ids=None,
inputs_embeds=None,
past_key_values_length=0,
return_inputs_embeds=False):
if input_ids is not None:
input_shape = input_ids.size()
else:
input_shape = inputs_embeds.size()[:-1]

seq_length = input_shape[1]

if position_ids is None:
position_ids = self.position_ids[:,
past_key_values_length:seq_length
+ past_key_values_length]

# Setting the token_type_ids to the registered buffer in constructor where it is all zeros, which usually occurs
# when its auto-generated, registered buffer helps users when tracing the model without passing token_type_ids
# issue #5664
if token_type_ids is None:
if hasattr(self, 'token_type_ids'):
buffered_token_type_ids = self.token_type_ids[:, :seq_length]
buffered_token_type_ids_expanded = buffered_token_type_ids.expand(
input_shape[0], seq_length)
token_type_ids = buffered_token_type_ids_expanded
else:
token_type_ids = torch.zeros(
input_shape,
dtype=torch.long,
device=self.position_ids.device)

if inputs_embeds is None:
inputs_embeds = self.word_embeddings(input_ids)
token_type_embeddings = self.token_type_embeddings(token_type_ids)

embeddings = inputs_embeds + token_type_embeddings
if self.position_embedding_type == 'absolute':
position_embeddings = self.position_embeddings(position_ids)
embeddings += position_embeddings
embeddings = self.LayerNorm(embeddings)
embeddings = self.dropout(embeddings)
if not return_inputs_embeds:
return embeddings
else:
return embeddings, inputs_embeds


class SbertSelfAttention(nn.Module):

def __init__(self, config):
super().__init__()
if config.hidden_size % config.num_attention_heads != 0 and not hasattr(
config, 'embedding_size'):
raise ValueError(
f'The hidden size ({config.hidden_size}) is not a multiple of the number of attention '
f'heads ({config.num_attention_heads})')

self.num_attention_heads = config.num_attention_heads
self.attention_head_size = int(config.hidden_size
/ config.num_attention_heads)
self.all_head_size = self.num_attention_heads * self.attention_head_size

self.query = nn.Linear(config.hidden_size, self.all_head_size)
self.key = nn.Linear(config.hidden_size, self.all_head_size)
self.value = nn.Linear(config.hidden_size, self.all_head_size)

self.dropout = nn.Dropout(config.attention_probs_dropout_prob)
self.position_embedding_type = getattr(config,
'position_embedding_type',
'absolute')
if self.position_embedding_type == 'relative_key' or self.position_embedding_type == 'relative_key_query':
self.max_position_embeddings = config.max_position_embeddings
self.distance_embedding = nn.Embedding(
2 * config.max_position_embeddings - 1,
self.attention_head_size)

self.is_decoder = config.is_decoder

def transpose_for_scores(self, x):
new_x_shape = x.size()[:-1] + (self.num_attention_heads,
self.attention_head_size)
x = x.view(*new_x_shape)
return x.permute(0, 2, 1, 3)

def forward(
self,
hidden_states,
attention_mask=None,
head_mask=None,
encoder_hidden_states=None,
encoder_attention_mask=None,
past_key_value=None,
output_attentions=False,
):
mixed_query_layer = self.query(hidden_states)

# If this is instantiated as a cross-attention module, the keys
# and values come from an encoder; the attention mask needs to be
# such that the encoder's padding tokens are not attended to.
is_cross_attention = encoder_hidden_states is not None

if is_cross_attention and past_key_value is not None:
# reuse k,v, cross_attentions
key_layer = past_key_value[0]
value_layer = past_key_value[1]
attention_mask = encoder_attention_mask
elif is_cross_attention:
key_layer = self.transpose_for_scores(
self.key(encoder_hidden_states))
value_layer = self.transpose_for_scores(
self.value(encoder_hidden_states))
attention_mask = encoder_attention_mask
elif past_key_value is not None:
key_layer = self.transpose_for_scores(self.key(hidden_states))
value_layer = self.transpose_for_scores(self.value(hidden_states))
key_layer = torch.cat([past_key_value[0], key_layer], dim=2)
value_layer = torch.cat([past_key_value[1], value_layer], dim=2)
else:
key_layer = self.transpose_for_scores(self.key(hidden_states))
value_layer = self.transpose_for_scores(self.value(hidden_states))

query_layer = self.transpose_for_scores(mixed_query_layer)

if self.is_decoder:
# if cross_attention save Tuple(torch.Tensor, torch.Tensor) of all cross attention key/value_states.
# Further calls to cross_attention layer can then reuse all cross-attention
# key/value_states (first "if" case)
# if uni-directional self-attention (decoder) save Tuple(torch.Tensor, torch.Tensor) of
# all previous decoder key/value_states. Further calls to uni-directional self-attention
# can concat previous decoder key/value_states to current projected key/value_states (third "elif" case)
# if encoder bi-directional self-attention `past_key_value` is always `None`
past_key_value = (key_layer, value_layer)

# Take the dot product between "query" and "key" to get the raw attention scores.
attention_scores = torch.matmul(query_layer,
key_layer.transpose(-1, -2))

if self.position_embedding_type == 'relative_key' or self.position_embedding_type == 'relative_key_query':
seq_length = hidden_states.size()[1]
position_ids_l = torch.arange(
seq_length, dtype=torch.long,
device=hidden_states.device).view(-1, 1)
position_ids_r = torch.arange(
seq_length, dtype=torch.long,
device=hidden_states.device).view(1, -1)
distance = position_ids_l - position_ids_r
positional_embedding = self.distance_embedding(
distance + self.max_position_embeddings - 1)
positional_embedding = positional_embedding.to(
dtype=query_layer.dtype) # fp16 compatibility

if self.position_embedding_type == 'relative_key':
relative_position_scores = torch.einsum(
'bhld,lrd->bhlr', query_layer, positional_embedding)
attention_scores = attention_scores + relative_position_scores
elif self.position_embedding_type == 'relative_key_query':
relative_position_scores_query = torch.einsum(
'bhld,lrd->bhlr', query_layer, positional_embedding)
relative_position_scores_key = torch.einsum(
'bhrd,lrd->bhlr', key_layer, positional_embedding)
attention_scores = attention_scores + relative_position_scores_query + relative_position_scores_key

attention_scores = attention_scores / math.sqrt(
self.attention_head_size)
if attention_mask is not None:
# Apply the attention mask is (precomputed for all layers in SbertModel forward() function)
attention_scores = attention_scores + attention_mask

# Normalize the attention scores to probabilities.
attention_probs = nn.Softmax(dim=-1)(attention_scores)

# This is actually dropping out entire tokens to attend to, which might
# seem a bit unusual, but is taken from the original Transformer paper.
attention_probs = self.dropout(attention_probs)

# Mask heads if we want to
if head_mask is not None:
attention_probs = attention_probs * head_mask

context_layer = torch.matmul(attention_probs, value_layer)

context_layer = context_layer.permute(0, 2, 1, 3).contiguous()
new_context_layer_shape = context_layer.size()[:-2] + (
self.all_head_size, )
context_layer = context_layer.view(*new_context_layer_shape)

outputs = (context_layer,
attention_probs) if output_attentions else (context_layer, )

if self.is_decoder:
outputs = outputs + (past_key_value, )
return outputs


class SbertSelfOutput(nn.Module):

def __init__(self, config):
super().__init__()
self.dense = nn.Linear(config.hidden_size, config.hidden_size)
self.LayerNorm = nn.LayerNorm(
config.hidden_size, eps=config.layer_norm_eps)
self.dropout = nn.Dropout(config.hidden_dropout_prob)

def forward(self, hidden_states, input_tensor):
hidden_states = self.dense(hidden_states)
hidden_states = self.dropout(hidden_states)
hidden_states = self.LayerNorm(hidden_states + input_tensor)
return hidden_states


class SbertAttention(nn.Module):

def __init__(self, config):
super().__init__()
self.self = SbertSelfAttention(config)
self.output = SbertSelfOutput(config)
self.pruned_heads = set()

def prune_heads(self, heads):
if len(heads) == 0:
return
heads, index = find_pruneable_heads_and_indices(
heads, self.self.num_attention_heads,
self.self.attention_head_size, self.pruned_heads)

# Prune linear layers
self.self.query = prune_linear_layer(self.self.query, index)
self.self.key = prune_linear_layer(self.self.key, index)
self.self.value = prune_linear_layer(self.self.value, index)
self.output.dense = prune_linear_layer(self.output.dense, index, dim=1)

# Update hyper params and store pruned heads
self.self.num_attention_heads = self.self.num_attention_heads - len(
heads)
self.self.all_head_size = self.self.attention_head_size * self.self.num_attention_heads
self.pruned_heads = self.pruned_heads.union(heads)

def forward(
self,
hidden_states,
attention_mask=None,
head_mask=None,
encoder_hidden_states=None,
encoder_attention_mask=None,
past_key_value=None,
output_attentions=False,
):
self_outputs = self.self(
hidden_states,
attention_mask,
head_mask,
encoder_hidden_states,
encoder_attention_mask,
past_key_value,
output_attentions,
)
attention_output = self.output(self_outputs[0], hidden_states)
outputs = (attention_output,
) + self_outputs[1:] # add attentions if we output them
return outputs


class SbertIntermediate(nn.Module):

def __init__(self, config):
super().__init__()
self.dense = nn.Linear(config.hidden_size, config.intermediate_size)
if isinstance(config.hidden_act, str):
self.intermediate_act_fn = ACT2FN[config.hidden_act]
else:
self.intermediate_act_fn = config.hidden_act

def forward(self, hidden_states):
hidden_states = self.dense(hidden_states)
hidden_states = self.intermediate_act_fn(hidden_states)
return hidden_states


class SbertOutput(nn.Module):

def __init__(self, config):
super().__init__()
self.dense = nn.Linear(config.intermediate_size, config.hidden_size)
self.LayerNorm = nn.LayerNorm(
config.hidden_size, eps=config.layer_norm_eps)
self.dropout = nn.Dropout(config.hidden_dropout_prob)

def forward(self, hidden_states, input_tensor):
hidden_states = self.dense(hidden_states)
hidden_states = self.dropout(hidden_states)
hidden_states = self.LayerNorm(hidden_states + input_tensor)
return hidden_states


class SbertLayer(nn.Module):

def __init__(self, config):
super().__init__()
self.chunk_size_feed_forward = config.chunk_size_feed_forward
self.seq_len_dim = 1
self.attention = SbertAttention(config)
self.is_decoder = config.is_decoder
self.add_cross_attention = config.add_cross_attention
if self.add_cross_attention:
if not self.is_decoder:
raise ValueError(
f'{self} should be used as a decoder model if cross attention is added'
)
self.crossattention = SbertAttention(config)
self.intermediate = SbertIntermediate(config)
self.output = SbertOutput(config)

def forward(
self,
hidden_states,
attention_mask=None,
head_mask=None,
encoder_hidden_states=None,
encoder_attention_mask=None,
past_key_value=None,
output_attentions=False,
):
# decoder uni-directional self-attention cached key/values tuple is at positions 1,2
self_attn_past_key_value = past_key_value[:
2] if past_key_value is not None else None
self_attention_outputs = self.attention(
hidden_states,
attention_mask,
head_mask,
output_attentions=output_attentions,
past_key_value=self_attn_past_key_value,
)
attention_output = self_attention_outputs[0]

# if decoder, the last output is tuple of self-attn cache
if self.is_decoder:
outputs = self_attention_outputs[1:-1]
present_key_value = self_attention_outputs[-1]
else:
outputs = self_attention_outputs[
1:] # add self attentions if we output attention weights

cross_attn_present_key_value = None
if self.is_decoder and encoder_hidden_states is not None:
if not hasattr(self, 'crossattention'):
raise ValueError(
f'If `encoder_hidden_states` are passed, {self} has to be instantiated '
f'with cross-attention layers by setting `config.add_cross_attention=True`'
)

# cross_attn cached key/values tuple is at positions 3,4 of past_key_value tuple
cross_attn_past_key_value = past_key_value[
-2:] if past_key_value is not None else None
cross_attention_outputs = self.crossattention(
attention_output,
attention_mask,
head_mask,
encoder_hidden_states,
encoder_attention_mask,
cross_attn_past_key_value,
output_attentions,
)
attention_output = cross_attention_outputs[0]
outputs = outputs + cross_attention_outputs[
1:-1] # add cross attentions if we output attention weights

# add cross-attn cache to positions 3,4 of present_key_value tuple
cross_attn_present_key_value = cross_attention_outputs[-1]
present_key_value = present_key_value + cross_attn_present_key_value

layer_output = apply_chunking_to_forward(self.feed_forward_chunk,
self.chunk_size_feed_forward,
self.seq_len_dim,
attention_output)
outputs = (layer_output, ) + outputs

# if decoder, return the attn key/values as the last output
if self.is_decoder:
outputs = outputs + (present_key_value, )

return outputs

def feed_forward_chunk(self, attention_output):
intermediate_output = self.intermediate(attention_output)
layer_output = self.output(intermediate_output, attention_output)
return layer_output
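
The feed-forward block of SbertLayer is wrapped with apply_chunking_to_forward, which applies feed_forward_chunk to slices of the sequence dimension (of size chunk_size_feed_forward) and concatenates the results, trading a little compute for lower peak memory. A minimal sketch of the same helper on a plain linear layer, assuming the usual transformers utility (its import path varies across transformers versions):

import torch
from torch import nn

try:
    from transformers.pytorch_utils import apply_chunking_to_forward
except ImportError:  # older transformers releases keep it in modeling_utils
    from transformers.modeling_utils import apply_chunking_to_forward

ff = nn.Linear(8, 8)

def feed_forward_chunk(hidden_states):
    # called once per slice of the sequence dimension
    return ff(hidden_states)

x = torch.randn(2, 16, 8)  # (batch, seq_len, hidden)
out = apply_chunking_to_forward(feed_forward_chunk, 4, 1, x)  # chunk_size=4, chunk_dim=1
assert out.shape == (2, 16, 8)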


class SbertEncoder(nn.Module):

def __init__(self, config):
super().__init__()
self.config = config
self.layer = nn.ModuleList(
[SbertLayer(config) for _ in range(config.num_hidden_layers)])
self.gradient_checkpointing = False

def forward(
self,
hidden_states,
attention_mask=None,
head_mask=None,
encoder_hidden_states=None,
encoder_attention_mask=None,
past_key_values=None,
use_cache=None,
output_attentions=False,
output_hidden_states=False,
return_dict=True,
):
all_hidden_states = () if output_hidden_states else None
all_self_attentions = () if output_attentions else None
all_cross_attentions = (
) if output_attentions and self.config.add_cross_attention else None

next_decoder_cache = () if use_cache else None
for i, layer_module in enumerate(self.layer):
if output_hidden_states:
all_hidden_states = all_hidden_states + (hidden_states, )

layer_head_mask = head_mask[i] if head_mask is not None else None
past_key_value = past_key_values[
i] if past_key_values is not None else None

if self.gradient_checkpointing and self.training:

if use_cache:
logger.warning(
'`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...'
)
use_cache = False

def create_custom_forward(module):

def custom_forward(*inputs):
return module(*inputs, past_key_value,
output_attentions)

return custom_forward

layer_outputs = torch.utils.checkpoint.checkpoint(
create_custom_forward(layer_module),
hidden_states,
attention_mask,
layer_head_mask,
encoder_hidden_states,
encoder_attention_mask,
)
else:
layer_outputs = layer_module(
hidden_states,
attention_mask,
layer_head_mask,
encoder_hidden_states,
encoder_attention_mask,
past_key_value,
output_attentions,
)

hidden_states = layer_outputs[0]
if use_cache:
next_decoder_cache += (layer_outputs[-1], )
if output_attentions:
all_self_attentions = all_self_attentions + (
layer_outputs[1], )
if self.config.add_cross_attention:
all_cross_attentions = all_cross_attentions + (
layer_outputs[2], )

if output_hidden_states:
all_hidden_states = all_hidden_states + (hidden_states, )

if not return_dict:
return tuple(v for v in [
hidden_states,
next_decoder_cache,
all_hidden_states,
all_self_attentions,
all_cross_attentions,
] if v is not None)
return BaseModelOutputWithPastAndCrossAttentions(
last_hidden_state=hidden_states,
past_key_values=next_decoder_cache,
hidden_states=all_hidden_states,
attentions=all_self_attentions,
cross_attentions=all_cross_attentions,
)


class SbertPooler(nn.Module):

def __init__(self, config):
super().__init__()
self.dense = nn.Linear(config.hidden_size, config.hidden_size)
self.activation = nn.Tanh()

def forward(self, hidden_states):
# We "pool" the model by simply taking the hidden state corresponding
# to the first token.
first_token_tensor = hidden_states[:, 0]
pooled_output = self.dense(first_token_tensor)
pooled_output = self.activation(pooled_output)
return pooled_output
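
The pooler therefore ignores every position except the first token. The same operation in isolation, with illustrative shapes:

import torch
from torch import nn

hidden_states = torch.randn(2, 16, 768)                  # (batch, seq_len, hidden_size)
dense, activation = nn.Linear(768, 768), nn.Tanh()
pooled_output = activation(dense(hidden_states[:, 0]))   # first ([CLS]-style) token only
assert pooled_output.shape == (2, 768)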


@dataclass
class SbertForPreTrainingOutput(ModelOutput):
"""
Output type of :class:`SbertForPreTraining`.

Args:
loss (`optional`, returned when ``labels`` is provided, ``torch.FloatTensor`` of shape :obj:`(1,)`):
Total loss as the sum of the masked language modeling loss and the next sequence prediction
(classification) loss.
prediction_logits (:obj:`torch.FloatTensor` of shape :obj:`(batch_size, sequence_length, config.vocab_size)`):
Prediction scores of the language modeling head (scores for each vocabulary token before SoftMax).
seq_relationship_logits (:obj:`torch.FloatTensor` of shape :obj:`(batch_size, 2)`):
Prediction scores of the next sequence prediction (classification) head (scores of True/False continuation
before SoftMax).
hidden_states (:obj:`tuple(torch.FloatTensor)`, `optional`, returned when
``output_hidden_states=True`` is passed or when ``config.output_hidden_states=True``):
Tuple of :obj:`torch.FloatTensor` (one for the output of the embeddings + one for the output of each layer)
of shape :obj:`(batch_size, sequence_length, hidden_size)`.

Hidden-states of the model at the output of each layer plus the initial embedding outputs.
attentions (:obj:`tuple(torch.FloatTensor)`, `optional`, returned when
``output_attentions=True`` is passed or when ``config.output_attentions=True``):
Tuple of :obj:`torch.FloatTensor` (one for each layer) of shape :obj:`(batch_size, num_heads,
sequence_length, sequence_length)`.

Attention weights after the attention softmax, used to compute the weighted average in the self-attention
heads.
"""

loss: Optional[torch.FloatTensor] = None
prediction_logits: torch.FloatTensor = None
seq_relationship_logits: torch.FloatTensor = None
hidden_states: Optional[Tuple[torch.FloatTensor]] = None
attentions: Optional[Tuple[torch.FloatTensor]] = None


@dataclass
class BaseModelOutputWithPoolingAndCrossAttentionsWithEmbedding(
BaseModelOutputWithPoolingAndCrossAttentions):
embedding_output: torch.FloatTensor = None
logits: Optional[Union[tuple, torch.FloatTensor]] = None
kwargs: dict = None

modelscope/models/nlp/backbones/gpt3/__init__.py → modelscope/models/nlp/gpt3/__init__.py

@@ -6,10 +6,12 @@ from modelscope.utils.import_utils import LazyImportModule
if TYPE_CHECKING:
from .configuration_gpt3 import GPT3Config
from .modeling_gpt3 import GPT3Model
from .gpt3_for_text_generation import GPT3ForTextGeneration
else:
_import_structure = {
'configuration_gpt3': ['GPT3Config'],
'modeling_gpt3': ['GPT3Model']
'modeling_gpt3': ['GPT3Model'],
'gpt3_for_text_generation': ['GPT3ForTextGeneration'],
}

import sys

modelscope/models/nlp/backbones/gpt3/configuration_gpt3.py → modelscope/models/nlp/gpt3/configuration_gpt3.py


modelscope/models/nlp/gpt3_for_text_generation.py → modelscope/models/nlp/gpt3/gpt3_for_text_generation.py

@@ -20,7 +20,7 @@ class GPT3ForTextGeneration(TorchModel):
"""
super().__init__(model_dir, *args, **kwargs)

from modelscope.models.nlp import GPT3Model
from modelscope.models.nlp.gpt3 import GPT3Model
from transformers import BertTokenizer

self.model = GPT3Model.from_pretrained(model_dir)

modelscope/models/nlp/backbones/gpt3/modeling_gpt3.py → modelscope/models/nlp/gpt3/modeling_gpt3.py


modelscope/models/nlp/heads/__init__.py (+3 -1)

@@ -5,9 +5,11 @@ from modelscope.utils.import_utils import LazyImportModule

if TYPE_CHECKING:
from .sequence_classification_head import SequenceClassificationHead
from .torch_pretrain_head import BertMLMHead, RobertaMLMHead
else:
_import_structure = {
'sequence_classification_head': ['SequenceClassificationHead']
'sequence_classification_head': ['SequenceClassificationHead'],
'torch_pretrain_head': ['BertMLMHead', 'RobertaMLMHead'],
}

import sys


modelscope/models/nlp/heads/sequence_classification_head.py (+1 -2)

@@ -1,5 +1,4 @@
import importlib
from typing import Dict, List, Optional, Union
from typing import Dict

import torch
import torch.nn.functional as F


modelscope/models/nlp/heads/torch_pretrain_head.py (+26 -0)

@@ -0,0 +1,26 @@
from typing import Dict

import torch
from transformers.models.bert.modeling_bert import BertOnlyMLMHead
from transformers.models.roberta.modeling_roberta import RobertaLMHead

from modelscope.metainfo import Heads
from modelscope.models.base import TorchHead
from modelscope.models.builder import HEADS
from modelscope.utils.constant import Tasks


@HEADS.register_module(Tasks.fill_mask, module_name=Heads.bert_mlm)
class BertMLMHead(BertOnlyMLMHead, TorchHead):

def compute_loss(self, outputs: Dict[str, torch.Tensor],
labels) -> Dict[str, torch.Tensor]:
raise NotImplementedError()


@HEADS.register_module(Tasks.fill_mask, module_name=Heads.roberta_mlm)
class RobertaMLMHead(RobertaLMHead, TorchHead):

def compute_loss(self, outputs: Dict[str, torch.Tensor],
labels) -> Dict[str, torch.Tensor]:
raise NotImplementedError()
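
Both heads are thin registry wrappers around the corresponding transformers modules, so the forward pass is the upstream one: encoder hidden states in, vocabulary logits out. A minimal sketch of what the wrapped BertOnlyMLMHead computes (the config values here are illustrative only):

import torch
from transformers import BertConfig
from transformers.models.bert.modeling_bert import BertOnlyMLMHead

config = BertConfig(hidden_size=768, vocab_size=21128)
head = BertOnlyMLMHead(config)
sequence_output = torch.randn(2, 16, config.hidden_size)    # encoder output
prediction_scores = head(sequence_output)                   # (batch, seq_len, vocab_size)
assert prediction_scores.shape == (2, 16, config.vocab_size)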

modelscope/models/nlp/masked_language.py (+100 -57)

@@ -1,72 +1,115 @@
from typing import Dict
from typing import Any, Dict, Optional, Union

import numpy as np
from transformers import BertForMaskedLM as BertForMaskedLMTransformer

from modelscope.metainfo import Models
from modelscope.models import TorchModel
from modelscope.models.base import Tensor
from modelscope.models.base import TorchModel
from modelscope.models.builder import MODELS
from modelscope.models.nlp.structbert import SbertForMaskedLM
from modelscope.models.nlp.veco import \
VecoForMaskedLM as VecoForMaskedLMTransformer
from modelscope.outputs import OutputKeys
from modelscope.utils.constant import Tasks

__all__ = ['BertForMaskedLM', 'StructBertForMaskedLM', 'VecoForMaskedLM']


class MaskedLanguageModelBase(TorchModel):

def __init__(self, model_dir: str, *args, **kwargs):
super().__init__(model_dir, *args, **kwargs)
self.model = self.build_model()

def build_model(self):
raise NotImplementedError()

def train(self):
return self.model.train()

def eval(self):
return self.model.eval()

@property
def config(self):
if hasattr(self.model, 'config'):
return self.model.config
return None

def forward(self, input: Dict[str, Tensor]) -> Dict[str, np.ndarray]:
"""return the result by the model

Args:
input (Dict[str, Any]): the preprocessed data

Returns:
Dict[str, np.ndarray]: results
"""
rst = self.model(
input_ids=input['input_ids'],
attention_mask=input['attention_mask'],
token_type_ids=input['token_type_ids'])
return {'logits': rst['logits'], 'input_ids': input['input_ids']}


@MODELS.register_module(Tasks.fill_mask, module_name=Models.structbert)
class StructBertForMaskedLM(MaskedLanguageModelBase):

def build_model(self):
from sofa import SbertForMaskedLM
return SbertForMaskedLM.from_pretrained(self.model_dir)


@MODELS.register_module(Tasks.fill_mask, module_name=Models.veco)
class VecoForMaskedLM(MaskedLanguageModelBase):

def build_model(self):
from sofa import VecoForMaskedLM
return VecoForMaskedLM.from_pretrained(self.model_dir)
class StructBertForMaskedLM(TorchModel, SbertForMaskedLM):

def __init__(self, config, model_dir):
super(TorchModel, self).__init__(model_dir)
SbertForMaskedLM.__init__(self, config)

def forward(self,
input_ids=None,
attention_mask=None,
token_type_ids=None,
position_ids=None,
head_mask=None,
labels=None):
output = SbertForMaskedLM.forward(
self,
input_ids=input_ids,
attention_mask=attention_mask,
token_type_ids=token_type_ids,
position_ids=position_ids,
head_mask=head_mask,
labels=labels)
output[OutputKeys.INPUT_IDS] = input_ids
return output

@classmethod
def _instantiate(cls, **kwargs):
model_dir = kwargs.get('model_dir')
return super(SbertForMaskedLM, StructBertForMaskedLM).from_pretrained(
pretrained_model_name_or_path=model_dir, model_dir=model_dir)


@MODELS.register_module(Tasks.fill_mask, module_name=Models.bert)
class BertForMaskedLM(MaskedLanguageModelBase):
class BertForMaskedLM(TorchModel, BertForMaskedLMTransformer):

def __init__(self, config, model_dir):
super(TorchModel, self).__init__(model_dir)
BertForMaskedLMTransformer.__init__(self, config)

def forward(self,
input_ids=None,
attention_mask=None,
token_type_ids=None,
position_ids=None,
head_mask=None,
labels=None):
output = BertForMaskedLMTransformer.forward(
self,
input_ids=input_ids,
attention_mask=attention_mask,
token_type_ids=token_type_ids,
position_ids=position_ids,
head_mask=head_mask,
labels=labels)
output[OutputKeys.INPUT_IDS] = input_ids
return output

@classmethod
def _instantiate(cls, **kwargs):
model_dir = kwargs.get('model_dir')
return super(BertForMaskedLMTransformer,
BertForMaskedLM).from_pretrained(
pretrained_model_name_or_path=model_dir,
model_dir=model_dir)

def build_model(self):
from transformers import BertForMaskedLM
return BertForMaskedLM.from_pretrained(self.model_dir)

@MODELS.register_module(Tasks.fill_mask, module_name=Models.veco)
class VecoForMaskedLM(TorchModel, VecoForMaskedLMTransformer):

def __init__(self, config, model_dir):
super(TorchModel, self).__init__(model_dir)
VecoForMaskedLMTransformer.__init__(self, config)

def forward(self,
input_ids=None,
attention_mask=None,
token_type_ids=None,
position_ids=None,
head_mask=None,
labels=None):
output = VecoForMaskedLMTransformer.forward(
self,
input_ids=input_ids,
attention_mask=attention_mask,
token_type_ids=token_type_ids,
position_ids=position_ids,
head_mask=head_mask,
labels=labels)
output[OutputKeys.INPUT_IDS] = input_ids
return output

@classmethod
def _instantiate(cls, **kwargs):
model_dir = kwargs.get('model_dir')
return super(VecoForMaskedLMTransformer,
VecoForMaskedLM).from_pretrained(
pretrained_model_name_or_path=model_dir,
model_dir=model_dir)

modelscope/models/nlp/palm_v2/__init__.py (+43 -0)

@@ -0,0 +1,43 @@
# Copyright 2021-2022 The Alibaba DAMO NLP Team Authors.
# All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from typing import TYPE_CHECKING

from modelscope.utils.import_utils import LazyImportModule

if TYPE_CHECKING:
from .configuration_palm import PalmConfig
from .modeling_palm import (
AbsSummarizer,
PalmForConditionalGeneration,
Translator,
)
from .palm_for_text_generation import PalmForTextGeneration
else:
_import_structure = {
'configuration_palm': ['PalmConfig'],
'modeling_palm':
['AbsSummarizer', 'PalmForConditionalGeneration', 'Translator'],
'palm_for_text_generation': ['PalmForTextGeneration'],
}

import sys

sys.modules[__name__] = LazyImportModule(
__name__,
globals()['__file__'],
_import_structure,
module_spec=__spec__,
extra_objects={},
)

modelscope/models/nlp/palm_v2/configuration_palm.py (+116 -0)

@@ -0,0 +1,116 @@
# Copyright 2021-2022 The Alibaba DAMO NLP Team Authors.
# Copyright 2018 The Google AI Language Team Authors and The HuggingFace Inc. team.
# Copyright (c) 2018, NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
""" PALM model configuration """

from transformers.configuration_utils import PretrainedConfig

from modelscope.utils import logger as logging

logger = logging.get_logger(__name__)


class PalmConfig(PretrainedConfig):
r"""
Configuration objects inherit from :class:`~transformers.PretrainedConfig` and can be used to control the model
outputs. Read the documentation from :class:`~transformers.PretrainedConfig` for more information.


Args:
vocab_size (:obj:`int`, `optional`, defaults to 30522):
Vocabulary size of the BERT model. Defines the number of different tokens that can be represented by the
:obj:`inputs_ids` passed when calling :class:`~transformers.BertModel` or
:class:`~transformers.TFBertModel`.
hidden_size (:obj:`int`, `optional`, defaults to 768):
Dimensionality of the encoder layers and the pooler layer.
num_hidden_layers (:obj:`int`, `optional`, defaults to 12):
Number of hidden layers in the Transformer encoder.
num_attention_heads (:obj:`int`, `optional`, defaults to 12):
Number of attention heads for each attention layer in the Transformer encoder.
intermediate_size (:obj:`int`, `optional`, defaults to 3072):
Dimensionality of the "intermediate" (often named feed-forward) layer in the Transformer encoder.
hidden_act (:obj:`str` or :obj:`Callable`, `optional`, defaults to :obj:`"gelu"`):
The non-linear activation function (function or string) in the encoder and pooler. If string,
:obj:`"gelu"`, :obj:`"relu"`, :obj:`"silu"` and :obj:`"gelu_new"` are supported.
hidden_dropout_prob (:obj:`float`, `optional`, defaults to 0.1):
The dropout probability for all fully connected layers in the embeddings, encoder, and pooler.
attention_probs_dropout_prob (:obj:`float`, `optional`, defaults to 0.1):
The dropout ratio for the attention probabilities.
max_position_embeddings (:obj:`int`, `optional`, defaults to 512):
The maximum sequence length that this model might ever be used with. Typically set this to something large
just in case (e.g., 512 or 1024 or 2048).
type_vocab_size (:obj:`int`, `optional`, defaults to 2):
The vocabulary size of the :obj:`token_type_ids` passed when calling :class:`~transformers.BertModel` or
:class:`~transformers.TFBertModel`.
initializer_range (:obj:`float`, `optional`, defaults to 0.02):
The standard deviation of the truncated_normal_initializer for initializing all weight matrices.
layernorm_epsilon (:obj:`float`, `optional`, defaults to 1e-12):
The epsilon used by the layer normalization layers.
dec_hidden_layers (:obj:`int`, `optional`, defaults to 12):
Number of hidden layers in the Transformer decoder.
attn_separate (:obj:`bool`, `optional`, defaults to false):
Whether or not to separate the q, k, v of attention.

Examples::

>>> from modelscope.models.nlp.palm_v2 import PalmForConditionalGeneration, PalmConfig
>>> configuration = PalmConfig()

>>> # Initializing a model from the configuration
>>> model = PalmForConditionalGeneration(configuration)

>>> # Accessing the model configuration
>>> configuration = model.config
"""
model_type = 'palm'

def __init__(self,
encoder='roberta',
encoder_pth='roberta-base',
max_pos=512,
share_emb=False,
dec_layers=12,
dec_hidden_size=768,
dec_heads=8,
dec_ff_size=3072,
dec_dropout=0.2,
use_bert_emb=True,
label_smoothing=0.1,
alpha=0.95,
beam_size=5,
min_length=40,
max_length=130,
sample_topk=False,
block_trigram=False,
**kwargs):
super().__init__(**kwargs)
self.encoder = encoder
self.encoder_pth = encoder_pth
self.max_pos = max_pos
self.share_emb = share_emb
self.dec_layers = dec_layers
self.dec_hidden_size = dec_hidden_size
self.dec_heads = dec_heads
self.dec_ff_size = dec_ff_size
self.dec_dropout = dec_dropout
self.use_bert_emb = use_bert_emb
self.label_smoothing = label_smoothing
# Translator
self.alpha = alpha
self.beam_size = beam_size
self.min_length = min_length
self.max_length = max_length
self.sample_topk = sample_topk
self.block_trigram = block_trigram
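
Besides the doctest in the class docstring, the constructor shows that decoding behaviour (beam size, length limits, trigram blocking) lives on the same config object as the architecture hyperparameters. A small sketch with a few of them overridden (the values are illustrative only):

from modelscope.models.nlp.palm_v2 import PalmConfig

config = PalmConfig(encoder='roberta', dec_layers=6, dec_heads=8, beam_size=3, max_length=100)
print(config.dec_layers, config.beam_size, config.block_trigram)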

modelscope/models/nlp/palm_v2/dureader_eval.py (+872 -0)

@@ -0,0 +1,872 @@
# ==============================================================================
# Copyright 2017 Baidu.com, Inc. All Rights Reserved
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""
This module computes evaluation metrics for DuReader dataset.
"""

import argparse
import copy
import math
import re
import sys
import zipfile
from collections import Counter, defaultdict

import json
import numpy as np
from rouge import Rouge

EMPTY = ''
YESNO_LABELS = set(['Yes', 'No', 'Depends'])


def my_lcs(string, sub):
"""
Calculates longest common subsequence for a pair of tokenized strings
:param string : list of str : tokens from a string split using whitespace
:param sub : list of str : shorter string, also split using whitespace
:returns: length (int): length of the longest common subsequence between the two strings

Note: my_lcs only gives length of the longest common subsequence, not the actual LCS
"""
if (len(string) < len(sub)):
sub, string = string, sub

lengths = [[0 for i in range(0,
len(sub) + 1)]
for j in range(0,
len(string) + 1)]

for j in range(1, len(sub) + 1):
for i in range(1, len(string) + 1):
if (string[i - 1] == sub[j - 1]):
lengths[i][j] = lengths[i - 1][j - 1] + 1
else:
lengths[i][j] = max(lengths[i - 1][j], lengths[i][j - 1])

return lengths[len(string)][len(sub)]
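
For whitespace-tokenized inputs the function returns only the LCS length, e.g.:

print(my_lcs('the cat sat on the mat'.split(), 'the cat on mat'.split()))  # 4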


class Bleu:

def __init__(self, n=4):
# by default, compute BLEU score up to 4-grams
self._n = n
self._hypo_for_image = {}
self.ref_for_image = {}

def compute_score(self, gts, res):
assert (list(gts.keys()) == list(res.keys()))
imgIds = list(gts.keys())

bleu_scorer = BleuScorer(n=self._n)
for id in imgIds:
hypo = res[id]
ref = gts[id]

# Sanity check.
assert (type(hypo) is list)
assert (len(hypo) == 1)
assert (type(ref) is list)
assert (len(ref) >= 1)

bleu_scorer += (hypo[0], ref)

score, scores = bleu_scorer.compute_score(option='closest', verbose=1)
return score, scores

def method(self):
return 'Bleu'


def precook(s, n=4, out=False):
"""Takes a string as input and returns an object that can be given to
either cook_refs or cook_test. This is optional: cook_refs and cook_test
can take string arguments as well."""
words = s.split()
counts = defaultdict(int)
for k in range(1, n + 1):
for i in range(len(words) - k + 1):
ngram = tuple(words[i:i + k])
counts[ngram] += 1
return (len(words), counts)
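
precook returns the token count together with a counter over all n-grams up to order n, which is what the scorer below consumes, e.g.:

length, counts = precook('the cat the cat', n=2)
print(length)                   # 4
print(counts[('the',)])         # 2 unigram occurrences
print(counts[('the', 'cat')])   # 2 bigram occurrences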


def cook_refs(refs, eff=None, n=4): # lhuang: oracle will call with "average"
'''Takes a list of reference sentences for a single segment
and returns an object that encapsulates everything that BLEU
needs to know about them.'''

reflen = []
maxcounts = {}
for ref in refs:
rl, counts = precook(ref, n)
reflen.append(rl)
for (ngram, count) in counts.items():
maxcounts[ngram] = max(maxcounts.get(ngram, 0), count)

# Calculate effective reference sentence length.
if eff == 'shortest':
reflen = min(reflen)
elif eff == 'average':
reflen = float(sum(reflen)) / len(reflen)

# lhuang: N.B.: leave reflen computation to the very end!!

# lhuang: N.B.: in case of "closest", keep a list of reflens!! (bad design)

return reflen, maxcounts


def cook_test(test, xxx_todo_changeme, eff=None, n=4):
'''Takes a test sentence and returns an object that
encapsulates everything that BLEU needs to know about it.'''
(reflen, refmaxcounts) = xxx_todo_changeme
testlen, counts = precook(test, n, True)

result = {}

# Calculate effective reference sentence length.

if eff == 'closest':
result['reflen'] = min((abs(ref - testlen), ref) for ref in reflen)[1]
else: # i.e., "average" or "shortest" or None
result['reflen'] = reflen

result['testlen'] = testlen

result['guess'] = [max(0, testlen - k + 1) for k in range(1, n + 1)]

result['correct'] = [0] * n
for (ngram, count) in counts.items():
result['correct'][len(ngram) - 1] += min(
refmaxcounts.get(ngram, 0), count)

return result


class BleuScorer(object):
"""Bleu scorer.
"""

__slots__ = 'n', 'crefs', 'ctest', '_score', '_ratio', '_testlen', '_reflen', 'special_reflen'

# special_reflen is used in oracle (proportional effective ref len for a node).

def copy(self):
''' copy the refs.'''
new = BleuScorer(n=self.n)
new.ctest = copy.copy(self.ctest)
new.crefs = copy.copy(self.crefs)
new._score = None
return new

def __init__(self, test=None, refs=None, n=4, special_reflen=None):
''' singular instance '''

self.n = n
self.crefs = []
self.ctest = []
self.cook_append(test, refs)
self.special_reflen = special_reflen

def cook_append(self, test, refs):
'''called by constructor and __iadd__ to avoid creating new instances.'''

if refs is not None:
self.crefs.append(cook_refs(refs))
if test is not None:
cooked_test = cook_test(test, self.crefs[-1])
self.ctest.append(cooked_test) # N.B.: -1
else:
self.ctest.append(
None) # lens of crefs and ctest have to match

self._score = None # need to recompute

def ratio(self, option=None):
self.compute_score(option=option)
return self._ratio

def score_ratio(self, option=None):
'''return (bleu, len_ratio) pair'''
return (self.fscore(option=option), self.ratio(option=option))

def score_ratio_str(self, option=None):
return '%.4f (%.2f)' % self.score_ratio(option)

def reflen(self, option=None):
self.compute_score(option=option)
return self._reflen

def testlen(self, option=None):
self.compute_score(option=option)
return self._testlen

def retest(self, new_test):
if type(new_test) is str:
new_test = [new_test]
assert len(new_test) == len(self.crefs), new_test
self.ctest = []
for t, rs in zip(new_test, self.crefs):
self.ctest.append(cook_test(t, rs))
self._score = None

return self

def rescore(self, new_test):
''' replace test(s) with new test(s), and returns the new score.'''

return self.retest(new_test).compute_score()

def size(self):
assert len(self.crefs) == len(
self.ctest), 'refs/test mismatch! %d<>%d' % (len(
self.crefs), len(self.ctest))
return len(self.crefs)

def __iadd__(self, other):
'''add an instance (e.g., from another sentence).'''

if type(other) is tuple:
# avoid creating new BleuScorer instances
self.cook_append(other[0], other[1])
else:
assert self.compatible(other), 'incompatible BLEUs.'
self.ctest.extend(other.ctest)
self.crefs.extend(other.crefs)
self._score = None # need to recompute

return self

def compatible(self, other):
return isinstance(other, BleuScorer) and self.n == other.n

def single_reflen(self, option='average'):
return self._single_reflen(self.crefs[0][0], option)

def _single_reflen(self, reflens, option=None, testlen=None):

if option == 'shortest':
reflen = min(reflens)
elif option == 'average':
reflen = float(sum(reflens)) / len(reflens)
elif option == 'closest':
reflen = min((abs(ref - testlen), ref) for ref in reflens)[1]
else:
assert False, 'unsupported reflen option %s' % option

return reflen

def recompute_score(self, option=None, verbose=0):
self._score = None
return self.compute_score(option, verbose)

def compute_score(self, option=None, verbose=0):
n = self.n
small = 1e-9
tiny = 1e-15 # so that if guess is 0 still return 0
bleu_list = [[] for _ in range(n)]

if self._score is not None:
return self._score

if option is None:
option = 'average' if len(self.crefs) == 1 else 'closest'

self._testlen = 0
self._reflen = 0
totalcomps = {
'testlen': 0,
'reflen': 0,
'guess': [0] * n,
'correct': [0] * n
}

# for each sentence
for comps in self.ctest:
testlen = comps['testlen']
self._testlen += testlen

if self.special_reflen is None: # need computation
reflen = self._single_reflen(comps['reflen'], option, testlen)
else:
reflen = self.special_reflen

self._reflen += reflen

for key in ['guess', 'correct']:
for k in range(n):
totalcomps[key][k] += comps[key][k]

# append per image bleu score
bleu = 1.
for k in range(n):
bleu *= (float(comps['correct'][k]) + tiny) / (
float(comps['guess'][k]) + small)
bleu_list[k].append(bleu**(1. / (k + 1)))
ratio = (testlen + tiny) / (reflen + small
) # N.B.: avoid zero division
if ratio < 1:
for k in range(n):
bleu_list[k][-1] *= math.exp(1 - 1 / ratio)

if verbose > 1:
print(comps, reflen)

totalcomps['reflen'] = self._reflen
totalcomps['testlen'] = self._testlen

bleus = []
bleu = 1.
for k in range(n):
bleu *= float(totalcomps['correct'][k] + tiny) / (
totalcomps['guess'][k] + small)
bleus.append(bleu**(1. / (k + 1)))
ratio = (self._testlen + tiny) / (self._reflen + small
) # N.B.: avoid zero division
if ratio < 1:
for k in range(n):
bleus[k] *= math.exp(1 - 1 / ratio)

if verbose > 0:
print(totalcomps)
print('ratio:', ratio)

self._score = bleus
return self._score, bleu_list
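
Putting the pieces together, the Bleu wrapper expects both dictionaries to map the same ids to single-element hypothesis lists and non-empty reference lists; the sentences below are made up for illustration:

gts = {'q1': ['the cat sat on the mat'], 'q2': ['hello world']}    # references
res = {'q1': ['the cat sat on mat'], 'q2': ['hello there world']}  # predictions
score, per_order_scores = Bleu(4).compute_score(gts, res)
print(score)  # corpus-level [Bleu-1, Bleu-2, Bleu-3, Bleu-4]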


def normalize(s):
"""
Normalize strings to space joined chars.

Args:
s: a list of strings.

Returns:
A list of normalized strings.
"""
if not s:
return s
normalized = []
for ss in s:
tokens = [c for c in list(ss) if len(c.strip()) != 0]
normalized.append(' '.join(tokens))
return normalized
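
normalize turns each answer into space-joined characters so that the BLEU/Rouge n-grams effectively operate at character level for Chinese text, e.g.:

print(normalize(['今天 天气不错']))  # ['今 天 天 气 不 错']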


def data_check(obj, task):
"""
Check data.

Raises:
Raises AssertionError when data is not legal.
"""
assert 'question_id' in obj, "Missing 'question_id' field."
assert 'question_type' in obj, \
"Missing 'question_type' field. question_id: {}".format(obj['question_type'])

assert 'yesno_answers' in obj, \
"Missing 'yesno_answers' field. question_id: {}".format(obj['question_id'])
assert isinstance(obj['yesno_answers'], list), \
r"""'yesno_answers' field must be a list, if the 'question_type' is not
'YES_NO', then this field should be an empty list.
question_id: {}""".format(obj['question_id'])

assert 'entity_answers' in obj, \
"Missing 'entity_answers' field. question_id: {}".format(obj['question_id'])
assert isinstance(
obj['entity_answers'],
list) and len(obj['entity_answers']) > 0, r"""'entity_answers' field
must be a list and have at least one element, which can be an empty list.
question_id: {}""".format(obj['question_id'])


def read_file(file_name, task, is_ref=False):
"""
Read predict answers or reference answers from file.

Args:
file_name: the name of the file containing predict result or reference
result.

Returns:
A dictionary mapping question_id to the result information. The result
information itself is also a dictionary which has four keys:
- question_type: type of the query.
- yesno_answers: A list of yesno answers corresponding to 'answers'.
- answers: A list of predicted answers.
- entity_answers: A list, each element is also a list containing the entities
tagged out from the corresponding answer string.
"""

def _open(file_name, mode, zip_obj=None):
if zip_obj is not None:
return zip_obj.open(file_name, mode)
return open(file_name, mode)

results = {}
keys = ['answers', 'yesno_answers', 'entity_answers', 'question_type']
if is_ref:
keys += ['source']

zf = zipfile.ZipFile(file_name,
'r') if file_name.endswith('.zip') else None
file_list = [file_name] if zf is None else zf.namelist()

for fn in file_list:
for line in _open(fn, 'r', zip_obj=zf):
try:
obj = json.loads(line.strip())
except ValueError:
raise ValueError('Every line of data should be legal json')
data_check(obj, task)
qid = obj['question_id']
assert qid not in results, 'Duplicate question_id: {}'.format(qid)
results[qid] = {}
for k in keys:
results[qid][k] = obj[k]
return results


def compute_bleu_rouge(pred_dict, ref_dict, bleu_order=4):
"""
Compute bleu and rouge scores.
"""
assert set(pred_dict.keys()) == set(ref_dict.keys()), \
'missing keys: {}'.format(set(ref_dict.keys()) - set(pred_dict.keys()))
scores = {}
bleu_scores, _ = Bleu(bleu_order).compute_score(ref_dict, pred_dict)
for i, bleu_score in enumerate(bleu_scores):
scores['Bleu-%d' % (i + 1)] = bleu_score
# rouge_score, _ = Rouge().compute_score(ref_dict, pred_dict)
rouge_score = Rouge().get_scores(
list(map(lambda x: x[0], pred_dict.values())),
list(map(lambda x: x[0], ref_dict.values())))
rouge_score = sum([d['rouge-l']['f']
for d in rouge_score]) / len(rouge_score)
scores['Rouge-L'] = rouge_score
return scores


def local_prf(pred_list, ref_list):
"""
Compute local precision recall and f1-score,
given only one prediction list and one reference list
"""
common = Counter(pred_list) & Counter(ref_list)
num_same = sum(common.values())
if num_same == 0:
return 0, 0, 0
p = 1.0 * num_same / len(pred_list)
r = 1.0 * num_same / len(ref_list)
f1 = (2 * p * r) / (p + r)
return p, r, f1
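
local_prf compares the two token lists via their multiset intersection, e.g.:

p, r, f1 = local_prf(['北', '京', '大', '学'], ['北', '京', '学'])
print(p, r, f1)  # 0.75 1.0 0.857...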


def compute_prf(pred_dict, ref_dict):
"""
Compute precision recall and f1-score.
"""
# pred_question_ids = set(pred_dict.keys())
ref_question_ids = set(ref_dict.keys())
correct_preds, total_correct, total_preds = 0, 0, 0
for question_id in ref_question_ids:
pred_entity_list = pred_dict.get(question_id, [[]])
assert len(pred_entity_list) == 1, \
'the number of entity list for question_id {} is not 1.'.format(question_id)
pred_entity_list = pred_entity_list[0]
all_ref_entity_lists = ref_dict[question_id]
best_local_f1 = 0
best_ref_entity_list = None
for ref_entity_list in all_ref_entity_lists:
local_f1 = local_prf(pred_entity_list, ref_entity_list)[2]
if local_f1 > best_local_f1:
best_ref_entity_list = ref_entity_list
best_local_f1 = local_f1
if best_ref_entity_list is None:
if len(all_ref_entity_lists) > 0:
best_ref_entity_list = sorted(
all_ref_entity_lists, key=lambda x: len(x))[0]
else:
best_ref_entity_list = []
gold_entities = set(best_ref_entity_list)
pred_entities = set(pred_entity_list)
correct_preds += len(gold_entities & pred_entities)
total_preds += len(pred_entities)
total_correct += len(gold_entities)
p = float(correct_preds) / total_preds if correct_preds > 0 else 0
r = float(correct_preds) / total_correct if correct_preds > 0 else 0
f1 = 2 * p * r / (p + r) if correct_preds > 0 else 0
return {'Precision': p, 'Recall': r, 'F1': f1}


def prepare_prf(pred_dict, ref_dict):
"""
Prepares data for calculation of prf scores.
"""
preds = {k: v['entity_answers'] for k, v in pred_dict.items()}
refs = {k: v['entity_answers'] for k, v in ref_dict.items()}
return preds, refs


def filter_dict(result_dict, key_tag):
"""
Filter a subset of the result_dict, keeping only the keys that end with 'key_tag'.
"""
filtered = {}
for k, v in result_dict.items():
if k.endswith(key_tag):
filtered[k] = v
return filtered


def get_metrics(pred_result, ref_result, task, source):
"""
Computes metrics.
"""
metrics = {}

ref_result_filtered = {}
pred_result_filtered = {}
if source == 'both':
ref_result_filtered = ref_result
pred_result_filtered = pred_result
else:
for question_id, info in ref_result.items():
if info['source'] == source:
ref_result_filtered[question_id] = info
if question_id in pred_result:
pred_result_filtered[question_id] = pred_result[
question_id]

if task == 'main' or task == 'all' \
or task == 'description':
pred_dict, ref_dict = prepare_bleu(pred_result_filtered,
ref_result_filtered, task)
metrics = compute_bleu_rouge(pred_dict, ref_dict)
elif task == 'yesno':
pred_dict, ref_dict = prepare_bleu(pred_result_filtered,
ref_result_filtered, task)
keys = ['Yes', 'No', 'Depends']
preds = [filter_dict(pred_dict, k) for k in keys]
refs = [filter_dict(ref_dict, k) for k in keys]

metrics = compute_bleu_rouge(pred_dict, ref_dict)

for k, pred, ref in zip(keys, preds, refs):
m = compute_bleu_rouge(pred, ref)
k_metric = [(k + '|' + key, v) for key, v in m.items()]
metrics.update(k_metric)

elif task == 'entity':
pred_dict, ref_dict = prepare_prf(pred_result_filtered,
ref_result_filtered)
pred_dict_bleu, ref_dict_bleu = prepare_bleu(pred_result_filtered,
ref_result_filtered, task)
metrics = compute_prf(pred_dict, ref_dict)
metrics.update(compute_bleu_rouge(pred_dict_bleu, ref_dict_bleu))
else:
raise ValueError('Illegal task name: {}'.format(task))

return metrics


def prepare_bleu(pred_result, ref_result, task):
"""
Prepares data for calculation of bleu and rouge scores.
"""
pred_list, ref_list = [], []
qids = ref_result.keys()
for qid in qids:
if task == 'main':
pred, ref = get_main_result(qid, pred_result, ref_result)
elif task == 'yesno':
pred, ref = get_yesno_result(qid, pred_result, ref_result)
elif task == 'all':
pred, ref = get_all_result(qid, pred_result, ref_result)
elif task == 'entity':
pred, ref = get_entity_result(qid, pred_result, ref_result)
elif task == 'description':
pred, ref = get_desc_result(qid, pred_result, ref_result)
else:
raise ValueError('Illegal task name: {}'.format(task))
if pred and ref:
pred_list += pred
ref_list += ref
pred_dict = dict(pred_list)
ref_dict = dict(ref_list)
for qid, ans in ref_dict.items():
ref_dict[qid] = normalize(ref_dict[qid])
pred_dict[qid] = normalize(pred_dict.get(qid, [EMPTY]))
if not ans or ans == [EMPTY]:
del ref_dict[qid]
del pred_dict[qid]

for k, v in pred_dict.items():
assert len(v) == 1, \
'There should be only one predict answer. question_id: {}'.format(k)
return pred_dict, ref_dict


def get_main_result(qid, pred_result, ref_result):
"""
Prepare answers for task 'main'.

Args:
qid: question_id.
pred_result: A dict including all question_id's result information read
from args.pred_file.
ref_result: A dict including all question_id's result information read
from args.ref_file.
Returns:
Two lists, the first one contains predict result, the second
one contains reference result of the same question_id. Each list has
elements of tuple (question_id, answers), 'answers' is a list of strings.
"""
ref_ans = ref_result[qid]['answers']
if not ref_ans:
ref_ans = [EMPTY]
pred_ans = pred_result.get(qid, {}).get('answers', [])[:1]
if not pred_ans:
pred_ans = [EMPTY]

return [(qid, pred_ans)], [(qid, ref_ans)]


def get_entity_result(qid, pred_result, ref_result):
"""
Prepare answers for task 'entity'.

Args:
qid: question_id.
pred_result: A dict including all question_id's result information read
from args.pred_file.
ref_result: A dict including all question_id's result information read
from args.ref_file.
Returns:
Two lists, the first one contains predict result, the second
one contains reference result of the same question_id. Each list has
elements of tuple (question_id, answers), 'answers' is a list of strings.
"""
if ref_result[qid]['question_type'] != 'ENTITY':
return None, None
return get_main_result(qid, pred_result, ref_result)


def get_desc_result(qid, pred_result, ref_result):
"""
Prepare answers for task 'description'.

Args:
qid: question_id.
pred_result: A dict including all question_id's result information read
from args.pred_file.
ref_result: A dict including all question_id's result information read
from args.ref_file.
Returns:
Two lists, the first one contains predict result, the second
one contains reference result of the same question_id. Each list has
elements of tuple (question_id, answers), 'answers' is a list of strings.
"""
if ref_result[qid]['question_type'] != 'DESCRIPTION':
return None, None
return get_main_result(qid, pred_result, ref_result)


def get_yesno_result(qid, pred_result, ref_result):
"""
Prepare answers for task 'yesno'.

Args:
qid: question_id.
pred_result: A dict including all question_id's result information read
from args.pred_file.
ref_result: A dict including all question_id's result information read
from args.ref_file.
Returns:
Two lists, the first one contains predict result, the second
one contains reference result of the same question_id. Each list has
elements of tuple (question_id, answers), 'answers' is a list of strings.
"""

def _uniq(li, is_ref):
uniq_li = []
left = []
keys = set()
for k, v in li:
if k not in keys:
uniq_li.append((k, v))
keys.add(k)
else:
left.append((k, v))

if is_ref:
dict_li = dict(uniq_li)
for k, v in left:
dict_li[k] += v
uniq_li = [(k, v) for k, v in dict_li.items()]
return uniq_li

def _expand_result(uniq_li):
expanded = uniq_li[:]
keys = set([x[0] for x in uniq_li])
for k in YESNO_LABELS - keys:
expanded.append((k, [EMPTY]))
return expanded

def _get_yesno_ans(qid, result_dict, is_ref=False):
if qid not in result_dict:
return [(str(qid) + '_' + k, v) for k, v in _expand_result([])]
yesno_answers = result_dict[qid]['yesno_answers']
answers = result_dict[qid]['answers']
lbl_ans = _uniq([(k, [v]) for k, v in zip(yesno_answers, answers)],
is_ref)
ret = [(str(qid) + '_' + k, v) for k, v in _expand_result(lbl_ans)]
return ret

if ref_result[qid]['question_type'] != 'YES_NO':
return None, None

ref_ans = _get_yesno_ans(qid, ref_result, is_ref=True)
pred_ans = _get_yesno_ans(qid, pred_result)
return pred_ans, ref_ans


def get_all_result(qid, pred_result, ref_result):
"""
Prepare answers for task 'all'.

Args:
qid: question_id.
pred_result: A dict including all question_id's result information read
from args.pred_file.
ref_result: A dict including all question_id's result information read
from args.ref_file.
Returns:
Two lists, the first one contains predict result, the second
one contains reference result of the same question_id. Each list has
elements of tuple (question_id, answers), 'answers' is a list of strings.
"""
if ref_result[qid]['question_type'] == 'YES_NO':
return get_yesno_result(qid, pred_result, ref_result)
return get_main_result(qid, pred_result, ref_result)


def format_metrics(metrics, task, err_msg):
"""
Format metrics. The returned 'errorMsg' field reports any error that occurred during evaluation.

Args:
metrics: A dict object contains metrics for different tasks.
task: Task name.
err_msg: Exception raised during evaluation.
Returns:
Formatted result.
"""
result = {}
sources = ['both', 'search', 'zhidao']
if err_msg is not None:
return {'errorMsg': str(err_msg), 'errorCode': 1, 'data': []}
data = []
if task != 'all' and task != 'main':
sources = ['both']

if task == 'entity':
metric_names = ['Bleu-4', 'Rouge-L']
metric_names_prf = ['F1', 'Precision', 'Recall']
for name in metric_names + metric_names_prf:
for src in sources:
obj = {
'name': name,
'value': round(metrics[src].get(name, 0) * 100, 2),
'type': src,
}
data.append(obj)
elif task == 'yesno':
metric_names = ['Bleu-4', 'Rouge-L']
details = ['Yes', 'No', 'Depends']
src = sources[0]
for name in metric_names:
obj = {
'name': name,
'value': round(metrics[src].get(name, 0) * 100, 2),
'type': 'All',
}
data.append(obj)
for d in details:
obj = {
'name': name,
'value': round(metrics[src].get(d + '|' + name, 0) * 100,
2),
'type': d
}
data.append(obj)
else:
metric_names = ['Bleu-4', 'Rouge-L']
for name in metric_names:
for src in sources:
obj = {
'name': name,
'value': round(metrics[src].get(name, 0) * 100, 2),
'type': src
}
data.append(obj)

result['data'] = data
result['errorCode'] = 0
result['errorMsg'] = 'success'

return result


def main(args):
"""
Do evaluation.
"""
err = None
metrics = {}
try:
pred_result = read_file(args.pred_file, args.task)
ref_result = read_file(args.ref_file, args.task, is_ref=True)
sources = ['both', 'search', 'zhidao']
if args.task not in set(['main', 'all']):
sources = sources[:1]
for source in sources:
metrics[source] = get_metrics(pred_result, ref_result, args.task,
source)
except ValueError as ve:
err = ve
except AssertionError as ae:
err = ae

print(
json.dumps(
format_metrics(metrics, args.task, err),
ensure_ascii=False).encode('utf8'))


if __name__ == '__main__':
parser = argparse.ArgumentParser()
parser.add_argument('pred_file', help='predict file')
parser.add_argument('ref_file', help='reference file')
parser.add_argument(
'task', help='task name: Main|Yes_No|All|Entity|Description')

args = parser.parse_args()
args.task = args.task.lower().replace('_', '')
main(args)
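
Given the argparse setup above, the script can be run directly; the task argument is lowercased and stripped of underscores before dispatch. The file names below are placeholders:

python modelscope/models/nlp/palm_v2/dureader_eval.py pred.json ref.json Yes_No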

modelscope/models/nlp/palm_v2/modeling_palm.py (+1332 -0) (file diff suppressed because it is too large)


modelscope/models/nlp/palm_for_text_generation.py → modelscope/models/nlp/palm_v2/palm_for_text_generation.py

@@ -22,8 +22,8 @@ class PalmForTextGeneration(TorchModel):
"""
super().__init__(model_dir, *args, **kwargs)

from sofa.models.palm_v2 import (PalmForConditionalGeneration,
Translator)
from modelscope.models.nlp.palm_v2 import (
PalmForConditionalGeneration, Translator)
self.model = PalmForConditionalGeneration.from_pretrained(model_dir)
self.tokenizer = self.model.tokenizer
self.generator = Translator(self.model)

modelscope/models/nlp/sbert_for_nli.py (+0 -23)

@@ -1,23 +0,0 @@
from modelscope.metainfo import Models
from modelscope.models.builder import MODELS
from modelscope.utils.constant import Tasks
from .sbert_for_sequence_classification import \
SbertForSequenceClassificationBase

__all__ = ['SbertForNLI']


@MODELS.register_module(Tasks.nli, module_name=Models.structbert)
class SbertForNLI(SbertForSequenceClassificationBase):

def __init__(self, model_dir: str, *args, **kwargs):
"""initialize the text generation model from the `model_dir` path.

Args:
model_dir (str): the model path.
model_cls (Optional[Any], optional): model loader, if None, use the
default loader to load model weights, by default None.
"""
super().__init__(
model_dir, *args, model_args={'num_labels': 3}, **kwargs)
assert self.model.config.num_labels == 3

modelscope/models/nlp/sbert_for_sentence_similarity.py (+0 -25)

@@ -1,25 +0,0 @@
from modelscope.metainfo import Models
from modelscope.models.builder import MODELS
from modelscope.utils.constant import Tasks
from .sbert_for_sequence_classification import \
SbertForSequenceClassificationBase

__all__ = ['SbertForSentenceSimilarity']


@MODELS.register_module(
Tasks.sentence_similarity, module_name=Models.structbert)
class SbertForSentenceSimilarity(SbertForSequenceClassificationBase):

def __init__(self, model_dir: str, *args, **kwargs):
"""initialize the sentence similarity model from the `model_dir` path.

Args:
model_dir (str): the model path.
model_cls (Optional[Any], optional): model loader, if None, use the
default loader to load model weights, by default None.
"""
super().__init__(
model_dir, *args, model_args={'num_labels': 2}, **kwargs)
self.model_dir = model_dir
assert self.model.config.num_labels == 2

modelscope/models/nlp/sbert_for_sentiment_classification.py (+0 -22)

@@ -1,22 +0,0 @@
from modelscope.metainfo import Models
from modelscope.models.builder import MODELS
from modelscope.utils.constant import Tasks
from .sbert_for_sequence_classification import \
SbertForSequenceClassificationBase

__all__ = ['SbertForSentimentClassification']


@MODELS.register_module(
Tasks.sentiment_classification, module_name=Models.structbert)
class SbertForSentimentClassification(SbertForSequenceClassificationBase):

def __init__(self, model_dir: str, *args, **kwargs):
"""initialize the text generation model from the `model_dir` path.

Args:
model_dir (str): the model path.
"""
super().__init__(
model_dir, *args, model_args={'num_labels': 2}, **kwargs)
assert self.model.config.num_labels == 2

modelscope/models/nlp/sbert_for_sequence_classification.py (+0 -82)

@@ -1,82 +0,0 @@
import os
from typing import Any, Dict

import json
import numpy as np
import torch
from sofa.models.sbert.modeling_sbert import SbertModel, SbertPreTrainedModel
from torch import nn

from modelscope.models import TorchModel


class SbertTextClassfier(SbertPreTrainedModel):

def __init__(self, config):
super().__init__(config)
self.num_labels = config.num_labels
self.config = config
self.encoder = SbertModel(config, add_pooling_layer=True)
self.dropout = nn.Dropout(config.hidden_dropout_prob)
self.classifier = nn.Linear(config.hidden_size, config.num_labels)

def forward(self,
input_ids=None,
token_type_ids=None,
labels=None,
**kwargs):
outputs = self.encoder(
input_ids,
token_type_ids=token_type_ids,
return_dict=None,
)
pooled_output = outputs[1]
pooled_output = self.dropout(pooled_output)
logits = self.classifier(pooled_output)
if labels is not None:
loss_fct = nn.CrossEntropyLoss()
loss = loss_fct(logits.view(-1, self.num_labels), labels.view(-1))
return {'logits': logits, 'loss': loss}
return {'logits': logits}

def build(**kwags):
return SbertTextClassfier.from_pretrained(model_dir, **model_args)


class SbertForSequenceClassificationBase(TorchModel):

def __init__(self, model_dir: str, model_args=None, *args, **kwargs):
super().__init__(model_dir, *args, **kwargs)
if model_args is None:
model_args = {}
self.model = SbertTextClassfier.from_pretrained(
model_dir, **model_args)
self.id2label = {}
self.label_path = os.path.join(self.model_dir, 'label_mapping.json')
if os.path.exists(self.label_path):
with open(self.label_path) as f:
self.label_mapping = json.load(f)
self.id2label = {
idx: name
for name, idx in self.label_mapping.items()
}

def train(self):
return self.model.train()

def eval(self):
return self.model.eval()

def forward(self, input: Dict[str, Any]) -> Dict[str, np.ndarray]:
input_ids = torch.tensor(input['input_ids'], dtype=torch.long)
token_type_ids = torch.tensor(
input['token_type_ids'], dtype=torch.long)
return self.model.forward(input_ids, token_type_ids)

def postprocess(self, input, **kwargs):
logits = input['logits']
probs = logits.softmax(-1).cpu().numpy()
pred = logits.argmax(-1).cpu().numpy()
logits = logits.cpu().numpy()
res = {'predictions': pred, 'probabilities': probs, 'logits': logits}
return res

modelscope/models/nlp/sbert_for_token_classification.py (+0 -64)

@@ -1,64 +0,0 @@
from typing import Any, Dict, Union

import numpy as np
import torch

from modelscope.metainfo import Models
from modelscope.models import TorchModel
from modelscope.models.base import Tensor
from modelscope.models.builder import MODELS
from modelscope.utils.constant import Tasks

__all__ = ['SbertForTokenClassification']


@MODELS.register_module(Tasks.word_segmentation, module_name=Models.structbert)
class SbertForTokenClassification(TorchModel):

def __init__(self, model_dir: str, *args, **kwargs):
"""initialize the word segmentation model from the `model_dir` path.

Args:
model_dir (str): the model path.
model_cls (Optional[Any], optional): model loader, if None, use the
default loader to load model weights, by default None.
"""
super().__init__(model_dir, *args, **kwargs)
self.model_dir = model_dir
import sofa
self.model = sofa.SbertForTokenClassification.from_pretrained(
self.model_dir)
self.config = sofa.SbertConfig.from_pretrained(self.model_dir)

def train(self):
return self.model.train()

def eval(self):
return self.model.eval()

def forward(self, input: Dict[str,
Any]) -> Dict[str, Union[str, np.ndarray]]:
"""return the result by the model

Args:
input (Dict[str, Any]): the preprocessed data

Returns:
Dict[str, Union[str,np.ndarray]]: results
Example:
{
'predictions': array([1,4]), # lable 0-negative 1-positive
'logits': array([[-0.53860897, 1.5029076 ]], dtype=float32) # true value
'text': str(今天),
}
"""
input_ids = torch.tensor(input['input_ids']).unsqueeze(0)
return {**self.model(input_ids), 'text': input['text']}

def postprocess(self, input: Dict[str, Tensor],
**kwargs) -> Dict[str, Tensor]:
logits = input['logits']
pred = torch.argmax(logits[0], dim=-1)
pred = pred.cpu().numpy()
rst = {'predictions': pred, 'logits': logits, 'text': input['text']}
return rst

modelscope/models/nlp/sbert_for_zero_shot_classification.py (+0 -50)

@@ -1,50 +0,0 @@
from typing import Any, Dict

import numpy as np

from modelscope.metainfo import Models
from modelscope.models import TorchModel
from modelscope.models.builder import MODELS
from modelscope.utils.constant import Tasks

__all__ = ['SbertForZeroShotClassification']


@MODELS.register_module(
Tasks.zero_shot_classification, module_name=Models.structbert)
class SbertForZeroShotClassification(TorchModel):

def __init__(self, model_dir: str, *args, **kwargs):
"""initialize the zero shot classification model from the `model_dir` path.

Args:
model_dir (str): the model path.
"""

super().__init__(model_dir, *args, **kwargs)
from sofa import SbertForSequenceClassification
self.model = SbertForSequenceClassification.from_pretrained(model_dir)

def train(self):
return self.model.train()

def eval(self):
return self.model.eval()

def forward(self, input: Dict[str, Any]) -> Dict[str, np.ndarray]:
"""return the result by the model

Args:
input (Dict[str, Any]): the preprocessed data

Returns:
Dict[str, np.ndarray]: results
Example:
{
'logits': array([[-0.53860897, 1.5029076 ]], dtype=float32) # true value
}
"""
outputs = self.model(**input)
logits = outputs['logits'].cpu().numpy()
res = {'logits': logits}
return res

modelscope/models/nlp/sequence_classification.py (+155 -66)

@@ -1,85 +1,174 @@
import os
from typing import Any, Dict
from abc import abstractmethod

import json
import numpy as np
from torch import nn

from modelscope.metainfo import TaskModels
from modelscope.metainfo import Models
from modelscope.models.base import TorchModel
from modelscope.models.builder import MODELS
from modelscope.models.nlp.structbert import SbertPreTrainedModel
from modelscope.models.nlp.veco import \
VecoForSequenceClassification as VecoForSequenceClassificationTransform
from modelscope.outputs import OutputKeys
from modelscope.utils.constant import Tasks
from .task_model import SingleBackboneTaskModelBase
from modelscope.utils.hub import parse_label_mapping
from modelscope.utils.tensor_utils import (torch_nested_detach,
torch_nested_numpify)

__all__ = ['SequenceClassificationModel']
__all__ = ['SbertForSequenceClassification', 'VecoForSequenceClassification']


@MODELS.register_module(
Tasks.sentiment_classification, module_name=TaskModels.text_classification)
@MODELS.register_module(
Tasks.text_classification, module_name=TaskModels.text_classification)
class SequenceClassificationModel(SingleBackboneTaskModelBase):
class SequenceClassificationBase(TorchModel):
base_model_prefix: str = 'bert'

def __init__(self, config, model_dir):
super().__init__(model_dir)
self.num_labels = config.num_labels
self.config = config
setattr(self, self.base_model_prefix, self.build_base_model())
self.dropout = nn.Dropout(config.hidden_dropout_prob)
self.classifier = nn.Linear(config.hidden_size, config.num_labels)

def __init__(self, model_dir: str, *args, **kwargs):
"""initialize the sequence classification model from the `model_dir` path.
@abstractmethod
def build_base_model(self):
"""Build the backbone model.

Args:
model_dir (str): the model path.
Returns: the backbone instance.
"""
super().__init__(model_dir, *args, **kwargs)
if 'base_model_prefix' in kwargs:
self._base_model_prefix = kwargs['base_model_prefix']

backbone_cfg = self.cfg.backbone
head_cfg = self.cfg.head

# get the num_labels from label_mapping.json
self.id2label = {}
self.label_path = os.path.join(model_dir, 'label_mapping.json')
if os.path.exists(self.label_path):
with open(self.label_path) as f:
self.label_mapping = json.load(f)
self.id2label = {
idx: name
for name, idx in self.label_mapping.items()
}
head_cfg['num_labels'] = len(self.label_mapping)

self.build_backbone(backbone_cfg)
self.build_head(head_cfg)

def forward(self, input: Dict[str, Any]) -> Dict[str, np.ndarray]:
outputs = super().forward(input)
sequence_output, pooled_output = self.extract_backbone_outputs(outputs)
outputs = self.head.forward(pooled_output)
if 'labels' in input:
loss = self.compute_loss(outputs, input['labels'])
outputs.update(loss)
return outputs

def extract_logits(self, outputs):
return outputs[OutputKeys.LOGITS].cpu().detach()

def extract_backbone_outputs(self, outputs):
sequence_output = None
pooled_output = None
if hasattr(self.backbone, 'extract_sequence_outputs'):
sequence_output = self.backbone.extract_sequence_outputs(outputs)
if hasattr(self.backbone, 'extract_pooled_outputs'):
pooled_output = self.backbone.extract_pooled_outputs(outputs)
return sequence_output, pooled_output

def compute_loss(self, outputs, labels):
loss = self.head.compute_loss(outputs, labels)
return loss
pass

@property
def base_model(self):
return getattr(self, self.base_model_prefix)

def forward(self, **kwargs):
labels = None
if OutputKeys.LABEL in kwargs:
labels = kwargs.pop(OutputKeys.LABEL)
elif OutputKeys.LABELS in kwargs:
labels = kwargs.pop(OutputKeys.LABELS)

outputs = self.base_model.forward(**kwargs)

# backbone model should return pooled_output as its second output
pooled_output = outputs[1]
pooled_output = self.dropout(pooled_output)
logits = self.classifier(pooled_output)
if labels is not None:
loss_fct = nn.CrossEntropyLoss()
loss = loss_fct(logits.view(-1, self.num_labels), labels.view(-1))
return {OutputKeys.LOGITS: logits, OutputKeys.LOSS: loss}
return {OutputKeys.LOGITS: logits}

def postprocess(self, input, **kwargs):
logits = self.extract_logits(input)
probs = logits.softmax(-1).numpy()
pred = logits.argmax(-1).numpy()
logits = logits.numpy()
logits = input[OutputKeys.LOGITS]
probs = torch_nested_numpify(torch_nested_detach(logits.softmax(-1)))
pred = torch_nested_numpify(torch_nested_detach(logits.argmax(-1)))
logits = torch_nested_numpify(torch_nested_detach(logits))
res = {
OutputKeys.PREDICTIONS: pred,
OutputKeys.PROBABILITIES: probs,
OutputKeys.LOGITS: logits
}
return res


@MODELS.register_module(
Tasks.sentence_similarity, module_name=Models.structbert)
@MODELS.register_module(
Tasks.sentiment_classification, module_name=Models.structbert)
@MODELS.register_module(Tasks.nli, module_name=Models.structbert)
@MODELS.register_module(
Tasks.zero_shot_classification, module_name=Models.structbert)
class SbertForSequenceClassification(SequenceClassificationBase,
SbertPreTrainedModel):
base_model_prefix: str = 'bert'
supports_gradient_checkpointing = True
_keys_to_ignore_on_load_missing = [r'position_ids']

def __init__(self, config, model_dir):
if hasattr(config, 'base_model_prefix'):
SbertForSequenceClassification.base_model_prefix = config.base_model_prefix
super().__init__(config, model_dir)

def build_base_model(self):
from .structbert import SbertModel
return SbertModel(self.config, add_pooling_layer=True)

def forward(self,
input_ids=None,
attention_mask=None,
token_type_ids=None,
labels=None,
**kwargs):
return super().forward(
input_ids=input_ids,
attention_mask=attention_mask,
token_type_ids=token_type_ids,
labels=labels)

@classmethod
def _instantiate(cls, **kwargs):
model_dir = kwargs.get('model_dir')
num_labels = kwargs.get('num_labels')
if num_labels is None:
label2id = parse_label_mapping(model_dir)
if label2id is not None and len(label2id) > 0:
num_labels = len(label2id)

model_args = {} if num_labels is None else {'num_labels': num_labels}
return super(SbertPreTrainedModel,
SbertForSequenceClassification).from_pretrained(
pretrained_model_name_or_path=kwargs.get('model_dir'),
model_dir=kwargs.get('model_dir'),
**model_args)


@MODELS.register_module(Tasks.sentence_similarity, module_name=Models.veco)
@MODELS.register_module(
Tasks.sentiment_classification, module_name=Models.veco)
@MODELS.register_module(Tasks.nli, module_name=Models.veco)
class VecoForSequenceClassification(TorchModel,
VecoForSequenceClassificationTransform):

def __init__(self, config, model_dir):
super().__init__(model_dir)
VecoForSequenceClassificationTransform.__init__(self, config)

def forward(self,
input_ids=None,
attention_mask=None,
token_type_ids=None,
position_ids=None,
head_mask=None,
inputs_embeds=None,
labels=None,
output_attentions=None,
output_hidden_states=None,
**kwargs):
return VecoForSequenceClassificationTransform.forward(
self,
input_ids=input_ids,
attention_mask=attention_mask,
token_type_ids=token_type_ids,
position_ids=position_ids,
head_mask=head_mask,
inputs_embeds=inputs_embeds,
output_attentions=output_attentions,
output_hidden_states=output_hidden_states,
labels=labels)

@classmethod
def _instantiate(cls, **kwargs):
model_dir = kwargs.get('model_dir')
num_labels = kwargs.get('num_labels')
if num_labels is None:
label2id = parse_label_mapping(model_dir)
if label2id is not None and len(label2id) > 0:
num_labels = len(label2id)

model_args = {} if num_labels is None else {'num_labels': num_labels}
return super(VecoForSequenceClassificationTransform,
VecoForSequenceClassification).from_pretrained(
pretrained_model_name_or_path=kwargs.get('model_dir'),
model_dir=kwargs.get('model_dir'),
**model_args)
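
Both `_instantiate` overrides above share the same `num_labels` resolution; a condensed sketch of that flow follows. The helper name and the model directory are illustrative; only `parse_label_mapping` comes from the code above.

# Hedged sketch of the num_labels resolution used by both _instantiate methods.
# resolve_num_labels and the directory path are illustrative, not part of the PR.
def resolve_num_labels(model_dir, num_labels=None):
    if num_labels is None:
        label2id = parse_label_mapping(model_dir)   # e.g. {'negative': 0, 'positive': 1}
        if label2id:
            num_labels = len(label2id)
    return {} if num_labels is None else {'num_labels': num_labels}

model = SbertForSequenceClassification._instantiate(
    model_dir='/path/to/structbert_sentiment',   # placeholder path
    num_labels=None)                             # falls back to label_mapping.json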

+ 28
- 0
modelscope/models/nlp/space/__init__.py View File

@@ -0,0 +1,28 @@
from typing import TYPE_CHECKING

from modelscope.utils.import_utils import LazyImportModule

if TYPE_CHECKING:
from .model import SpaceGenerator
from .model import SpaceModelBase, SpaceTokenizer, SpaceConfig
from .space_for_dialog_intent_prediction import SpaceForDialogIntent
from .space_for_dialog_modeling import SpaceForDialogModeling
from .space_for_dialog_state_tracking import SpaceForDialogStateTracking
else:
_import_structure = {
'model':
['SpaceGenerator', 'SpaceModelBase', 'SpaceTokenizer', 'SpaceConfig'],
'space_for_dialog_intent_prediction': ['SpaceForDialogIntent'],
'space_for_dialog_modeling': ['SpaceForDialogModeling'],
'space_for_dialog_state_tracking': ['SpaceForDialogStateTracking'],
}

import sys

sys.modules[__name__] = LazyImportModule(
__name__,
globals()['__file__'],
_import_structure,
module_spec=__spec__,
extra_objects={},
)
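
The `LazyImportModule` wiring above keeps the package import cheap and defers loading of the torch-heavy submodules until first attribute access; a small sketch of the intended effect (behavior assumed from the registration, not verified here).

# Hedged sketch: importing the package does not import the submodules yet;
# the first attribute access triggers the real import.
import modelscope.models.nlp.space as space   # cheap

SpaceTokenizer = space.SpaceTokenizer         # loads .model.tokenization_space on demand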

+ 10
- 0
modelscope/models/nlp/space/model/__init__.py View File

@@ -0,0 +1,10 @@
from .configuration_space import SpaceConfig
from .gen_unified_transformer import GenUnifiedTransformer
from .generator import Generator as SpaceGenerator
from .intent_unified_transformer import IntentUnifiedTransformer
from .model_base import SpaceModelBase
from .modeling_space import (SpaceForDST, SpaceForMaskedLM,
SpaceForPreTraining, SpaceModel)
from .tokenization_space import (BasicTokenizer, SpaceTokenizer,
WordpieceTokenizer)
from .unified_transformer import UnifiedTransformer

+ 32
- 0
modelscope/models/nlp/space/model/configuration_space.py View File

@@ -0,0 +1,32 @@
# Copyright 2021-2022 The Alibaba DAMO NLP Team Authors.
# Copyright 2018 The Google AI Language Team Authors.
# Copyright 2020 The HuggingFace Inc. team.
# Copyright (c) 2018, NVIDIA CORPORATION.
# All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""Space configuration, mainly copied from :class:`~transformers.configuration_xlm_roberta` """

from modelscope.models.nlp.structbert import SbertConfig
from modelscope.utils import logger as logging

logger = logging.get_logger(__name__)


class SpaceConfig(SbertConfig):
"""
This class overrides [`SbertConfig`]. Please check the superclass for the appropriate
documentation alongside usage examples.
"""

model_type = 'space'

modelscope/models/nlp/backbones/space/model/gen_unified_transformer.py → modelscope/models/nlp/space/model/gen_unified_transformer.py View File


modelscope/models/nlp/backbones/space/model/generator.py → modelscope/models/nlp/space/model/generator.py View File


modelscope/models/nlp/backbones/space/model/intent_unified_transformer.py → modelscope/models/nlp/space/model/intent_unified_transformer.py View File


modelscope/models/nlp/backbones/space/model/model_base.py → modelscope/models/nlp/space/model/model_base.py View File


+ 268
- 0
modelscope/models/nlp/space/model/modeling_space.py View File

@@ -0,0 +1,268 @@
# Copyright 2019 Facebook AI Research and the HuggingFace Inc. team.
# Copyright (c) 2018, NVIDIA CORPORATION.
# Copyright 2021-2022 The Alibaba DAMO NLP Team Authors.
# All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""PyTorch Space model. mainly copied from :module:`~transformers.modeling_xlm_roberta`"""

import torch
from torch import nn
from torch.nn import CrossEntropyLoss
from transformers.file_utils import add_start_docstrings

from modelscope.models.nlp.structbert.modeling_sbert import (
SbertForMaskedLM, SbertModel, SbertPreTrainedModel)
from .configuration_space import SpaceConfig

SPACE_START_DOCSTRING = r"""

This model inherits from [`PreTrainedModel`]. Check the superclass documentation for the generic
methods the library implements for all its models (such as downloading or saving, resizing the input embeddings,
pruning heads, etc.)

This model is also a PyTorch [torch.nn.Module](https://pytorch.org/docs/stable/nn.html#torch.nn.Module)
subclass. Use it as a regular PyTorch Module and refer to the PyTorch documentation for all matter related to
general usage and behavior.

Parameters:
config ([`SpaceConfig`]): Model configuration class with all the parameters of the
model. Initializing with a config file does not load the weights associated with the model, only the
configuration. Check out the [`~PreTrainedModel.from_pretrained`] method to load the model
weights.
"""


@add_start_docstrings(
'The bare Space Model transformer outputting raw hidden-states without any specific head on top. '
'It is identical to the Bert Model from Transformers',
SPACE_START_DOCSTRING,
)
class SpaceModel(SbertModel):
"""
This class overrides [`SbertModel`]. Please check the superclass for the appropriate
documentation alongside usage examples.
"""

config_class = SpaceConfig


@add_start_docstrings(
"""
Space Model transformer with dialog state tracking heads on top (an inform projection
layer, a dialog state layer and a set of per-slot heads that use history information from
previous dialog turns), e.g. for MultiWOZ 2.2 tasks.
""",
SPACE_START_DOCSTRING,
)
class SpaceForDST(SbertPreTrainedModel):

def __init__(self, config):
super(SpaceForDST, self).__init__(config)
self.slot_list = config.dst_slot_list
self.class_types = config.dst_class_types
self.class_labels = config.dst_class_labels
self.token_loss_for_nonpointable = config.dst_token_loss_for_nonpointable
self.refer_loss_for_nonpointable = config.dst_refer_loss_for_nonpointable
self.class_aux_feats_inform = config.dst_class_aux_feats_inform
self.class_aux_feats_ds = config.dst_class_aux_feats_ds
self.class_loss_ratio = config.dst_class_loss_ratio

# Only use refer loss if refer class is present in dataset.
if 'refer' in self.class_types:
self.refer_index = self.class_types.index('refer')
else:
self.refer_index = -1

self.bert = SpaceModel(config)
self.dropout = nn.Dropout(config.dst_dropout_rate)
self.dropout_heads = nn.Dropout(config.dst_heads_dropout_rate)

if self.class_aux_feats_inform:
self.add_module(
'inform_projection',
nn.Linear(len(self.slot_list), len(self.slot_list)))
if self.class_aux_feats_ds:
self.add_module(
'ds_projection',
nn.Linear(len(self.slot_list), len(self.slot_list)))

aux_dims = len(self.slot_list) * (
self.class_aux_feats_inform + self.class_aux_feats_ds
) # second term is 0, 1 or 2

for slot in self.slot_list:
self.add_module(
'class_' + slot,
nn.Linear(config.hidden_size + aux_dims, self.class_labels))
self.add_module('token_' + slot, nn.Linear(config.hidden_size, 2))
self.add_module(
'refer_' + slot,
nn.Linear(config.hidden_size + aux_dims,
len(self.slot_list) + 1))

self.init_weights()

def forward(self,
input_ids,
input_mask=None,
segment_ids=None,
position_ids=None,
head_mask=None,
start_pos=None,
end_pos=None,
inform_slot_id=None,
refer_id=None,
class_label_id=None,
diag_state=None):
outputs = self.bert(
input_ids,
attention_mask=input_mask,
token_type_ids=segment_ids,
position_ids=position_ids,
head_mask=head_mask)

sequence_output = outputs[0]
pooled_output = outputs[1]

sequence_output = self.dropout(sequence_output)
pooled_output = self.dropout(pooled_output)

# TODO: establish proper format in labels already?
if inform_slot_id is not None:
inform_labels = torch.stack(list(inform_slot_id.values()),
1).float()
if diag_state is not None:
diag_state_labels = torch.clamp(
torch.stack(list(diag_state.values()), 1).float(), 0.0, 1.0)

total_loss = 0
per_slot_per_example_loss = {}
per_slot_class_logits = {}
per_slot_start_logits = {}
per_slot_end_logits = {}
per_slot_refer_logits = {}
for slot in self.slot_list:
if self.class_aux_feats_inform and self.class_aux_feats_ds:
pooled_output_aux = torch.cat(
(pooled_output, self.inform_projection(inform_labels),
self.ds_projection(diag_state_labels)), 1)
elif self.class_aux_feats_inform:
pooled_output_aux = torch.cat(
(pooled_output, self.inform_projection(inform_labels)), 1)
elif self.class_aux_feats_ds:
pooled_output_aux = torch.cat(
(pooled_output, self.ds_projection(diag_state_labels)), 1)
else:
pooled_output_aux = pooled_output
class_logits = self.dropout_heads(
getattr(self, 'class_' + slot)(pooled_output_aux))

token_logits = self.dropout_heads(
getattr(self, 'token_' + slot)(sequence_output))
start_logits, end_logits = token_logits.split(1, dim=-1)
start_logits = start_logits.squeeze(-1)
end_logits = end_logits.squeeze(-1)

refer_logits = self.dropout_heads(
getattr(self, 'refer_' + slot)(pooled_output_aux))

per_slot_class_logits[slot] = class_logits
per_slot_start_logits[slot] = start_logits
per_slot_end_logits[slot] = end_logits
per_slot_refer_logits[slot] = refer_logits

# If there are no labels, don't compute loss
if class_label_id is not None and start_pos is not None and end_pos is not None and refer_id is not None:
# If we are on multi-GPU, the labels may carry an extra dimension; squeeze it
if len(start_pos[slot].size()) > 1:
start_pos[slot] = start_pos[slot].squeeze(-1)
if len(end_pos[slot].size()) > 1:
end_pos[slot] = end_pos[slot].squeeze(-1)
# Sometimes the start/end positions are outside our model inputs; we ignore these terms
ignored_index = start_logits.size(1) # This is a single index
start_pos[slot].clamp_(0, ignored_index)
end_pos[slot].clamp_(0, ignored_index)

class_loss_fct = CrossEntropyLoss(reduction='none')
token_loss_fct = CrossEntropyLoss(
reduction='none', ignore_index=ignored_index)
refer_loss_fct = CrossEntropyLoss(reduction='none')

start_loss = token_loss_fct(start_logits, start_pos[slot])
end_loss = token_loss_fct(end_logits, end_pos[slot])
token_loss = (start_loss + end_loss) / 2.0

token_is_pointable = (start_pos[slot] > 0).float()
if not self.token_loss_for_nonpointable:
token_loss *= token_is_pointable

refer_loss = refer_loss_fct(refer_logits, refer_id[slot])
token_is_referrable = torch.eq(class_label_id[slot],
self.refer_index).float()
if not self.refer_loss_for_nonpointable:
refer_loss *= token_is_referrable

class_loss = class_loss_fct(class_logits, class_label_id[slot])

if self.refer_index > -1:
per_example_loss = (self.class_loss_ratio) * class_loss + (
(1 - self.class_loss_ratio) / 2) * token_loss + (
(1 - self.class_loss_ratio) / 2) * refer_loss
else:
per_example_loss = self.class_loss_ratio * class_loss + (
1 - self.class_loss_ratio) * token_loss

total_loss += per_example_loss.sum()
per_slot_per_example_loss[slot] = per_example_loss

# add hidden states and attention if they are here
outputs = (total_loss, ) + (
per_slot_per_example_loss,
per_slot_class_logits,
per_slot_start_logits,
per_slot_end_logits,
per_slot_refer_logits,
) + outputs[2:]

return outputs


@add_start_docstrings(
'The Space Model with a `language modeling` head on top',
SPACE_START_DOCSTRING,
)
class SpaceForMaskedLM(SbertForMaskedLM):
"""
This class overrides [`SbertForMaskedLM`]. Please check the superclass for the
appropriate documentation alongside usage examples.
"""

config_class = SpaceConfig


@add_start_docstrings(
"""
Space Model with only one head on top as done during the pretraining: a `masked language modeling` head.
""",
SPACE_START_DOCSTRING,
)
class SpaceForPreTraining(SbertPreTrainedModel):

def __init__(self, model_name_or_path: str):
super(SpaceForPreTraining, self).__init__()
self.bert_model = SpaceForMaskedLM.from_pretrained(model_name_or_path)

def forward(self, input_ids: torch.tensor, mlm_labels: torch.tensor):
outputs = self.bert_model(input_ids, masked_lm_labels=mlm_labels)
return outputs[0]
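
`SpaceForDST.forward` above returns a flat tuple whose leading elements are the aggregate loss and the per-slot dictionaries. A hedged consumption sketch; `batch` is a hypothetical dict produced by the DST preprocessor with the keyword names used above.

# Hedged sketch of consuming SpaceForDST outputs; `batch` is a placeholder.
outputs = model(
    input_ids=batch['input_ids'],
    input_mask=batch['input_mask'],
    segment_ids=batch['segment_ids'],
    start_pos=batch['start_pos'],
    end_pos=batch['end_pos'],
    inform_slot_id=batch['inform_slot_id'],
    refer_id=batch['refer_id'],
    class_label_id=batch['class_label_id'],
    diag_state=batch['diag_state'])

(total_loss, per_slot_loss, class_logits,
 start_logits, end_logits, refer_logits) = outputs[:6]
total_loss.backward()   # standard training step from here on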

+ 29
- 0
modelscope/models/nlp/space/model/tokenization_space.py View File

@@ -0,0 +1,29 @@
# Copyright 2018 Google AI, Google Brain and Carnegie Mellon University Authors and the HuggingFace Inc. team.
# Copyright 2021-2022 The Alibaba DAMO NLP Team Authors.
# All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License
"""Tokenization classes for Space. mainly copied from :module:`~transformers.tokenization_xlm_roberta`"""

from modelscope.models.nlp.structbert import (BasicTokenizer, SbertTokenizer,
WordpieceTokenizer)
from modelscope.utils import logger as logging

logger = logging.get_logger(__name__)


class SpaceTokenizer(SbertTokenizer):
"""
This class overrides [`SbertTokenizer`]. Please check the superclass for the appropriate
documentation alongside usage examples.
"""

modelscope/models/nlp/backbones/space/model/unified_transformer.py → modelscope/models/nlp/space/model/unified_transformer.py View File

@@ -5,10 +5,9 @@ import torch
import torch.nn as nn
import torch.nn.functional as F

from modelscope.models.nlp.backbones.space.model.model_base import \
SpaceModelBase
from modelscope.models.nlp.backbones.space.modules.embedder import Embedder
from modelscope.models.nlp.backbones.space.modules.transformer_block import \
from modelscope.models.nlp.space.model.model_base import SpaceModelBase
from modelscope.models.nlp.space.modules.embedder import Embedder
from modelscope.models.nlp.space.modules.transformer_block import \
TransformerBlock



modelscope/models/nlp/backbones/space/modules/__init__.py → modelscope/models/nlp/space/modules/__init__.py View File


modelscope/models/nlp/backbones/space/modules/embedder.py → modelscope/models/nlp/space/modules/embedder.py View File


modelscope/models/nlp/backbones/space/modules/feedforward.py → modelscope/models/nlp/space/modules/feedforward.py View File


modelscope/models/nlp/backbones/space/modules/functions.py → modelscope/models/nlp/space/modules/functions.py View File


modelscope/models/nlp/backbones/space/modules/multihead_attention.py → modelscope/models/nlp/space/modules/multihead_attention.py View File


modelscope/models/nlp/backbones/space/modules/transformer_block.py → modelscope/models/nlp/space/modules/transformer_block.py View File


modelscope/models/nlp/space_for_dialog_intent_prediction.py → modelscope/models/nlp/space/space_for_dialog_intent_prediction.py View File

@@ -7,7 +7,7 @@ from modelscope.metainfo import Models
from modelscope.models import TorchModel
from modelscope.models.base import Tensor
from modelscope.models.builder import MODELS
from modelscope.models.nlp.backbones import SpaceGenerator, SpaceModelBase
from modelscope.models.nlp.space import SpaceGenerator, SpaceModelBase
from modelscope.preprocessors.space import IntentBPETextField
from modelscope.utils.config import Config
from modelscope.utils.constant import ModelFile, Tasks

modelscope/models/nlp/space_for_dialog_modeling.py → modelscope/models/nlp/space/space_for_dialog_modeling.py View File

@@ -7,7 +7,7 @@ from modelscope.metainfo import Models
from modelscope.models import TorchModel
from modelscope.models.base import Tensor
from modelscope.models.builder import MODELS
from modelscope.models.nlp.backbones import SpaceGenerator, SpaceModelBase
from modelscope.models.nlp.space import SpaceGenerator, SpaceModelBase
from modelscope.preprocessors.space import MultiWOZBPETextField
from modelscope.utils.config import Config
from modelscope.utils.constant import ModelFile, Tasks

modelscope/models/nlp/space_for_dialog_state_tracking.py → modelscope/models/nlp/space/space_for_dialog_state_tracking.py View File

@@ -21,7 +21,7 @@ class SpaceForDialogStateTracking(TorchModel):

super().__init__(model_dir, *args, **kwargs)

from sofa.models.space import SpaceConfig, SpaceForDST
from modelscope.models.nlp.space.model import SpaceForDST, SpaceConfig
self.model_dir = model_dir

self.config = SpaceConfig.from_pretrained(self.model_dir)

+ 45
- 0
modelscope/models/nlp/structbert/__init__.py View File

@@ -0,0 +1,45 @@
# Copyright 2021-2022 The Alibaba DAMO NLP Team Authors.
# All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

from typing import TYPE_CHECKING

from modelscope.utils.import_utils import LazyImportModule

if TYPE_CHECKING:
from .configuration_sbert import SbertConfig
from .modeling_sbert import (SbertForMaskedLM, SbertModel,
SbertPreTrainedModel)
from .tokenization_sbert import (BasicTokenizer, SbertTokenizer,
WordpieceTokenizer)
from .tokenization_sbert_fast import SbertTokenizerFast
else:
_import_structure = {
'configuration_sbert': ['SbertConfig'],
'modeling_sbert':
['SbertForMaskedLM', 'SbertModel', 'SbertPreTrainedModel'],
'tokenization_sbert':
['BasicTokenizer', 'SbertTokenizer', 'WordpieceTokenizer'],
'tokenization_sbert_fast': ['SbertTokenizerFast'],
}

import sys

sys.modules[__name__] = LazyImportModule(
__name__,
globals()['__file__'],
_import_structure,
module_spec=__spec__,
extra_objects={},
)

modelscope/models/nlp/backbones/structbert/adv_utils.py → modelscope/models/nlp/structbert/adv_utils.py View File

@@ -59,7 +59,8 @@ def compute_adv_loss(embedding,
"""
Calculate the adv loss of the model.
:param embedding: Original sentence embedding
:param model: The model or the forward function(including decoder/classifier), accept kwargs as input, output logits
:param model: The model, or the forward function(including decoder/classifier),
accept kwargs as input, output logits
:param ori_logits: The original logits output from the model function
:param ori_loss: The original loss
:param adv_grad_factor: This factor will be multiplied by the KL loss grad and then the result will be added to
@@ -119,7 +120,8 @@ def compute_adv_loss_pair(embedding,
"""
Calculate the adv loss of the model. This function is used in the pair logits scenario.
:param embedding: Original sentence embedding
:param model: The model or the forward function(including decoder/classifier), accept kwargs as input, output logits
:param model: The model, or the forward function(including decoder/classifier),
accept kwargs as input, output logits
:param start_logits: The original start logits output from the model function
:param end_logits: The original end logits output from the model function
:param ori_loss: The original loss

modelscope/models/nlp/backbones/structbert/configuration_sbert.py → modelscope/models/nlp/structbert/configuration_sbert.py View File

@@ -24,11 +24,12 @@ logger = logging.get_logger(__name__)

class SbertConfig(PretrainedConfig):
r"""
This is the configuration class to store the configuration of a :class:`~sofa.models.SbertModel`.
This is the configuration class to store the configuration
of a :class:`~modelscope.models.nlp.structbert.SbertModel`.
It is used to instantiate a SBERT model according to the specified arguments.

Configuration objects inherit from :class:`~sofa.utils.PretrainedConfig` and can be used to control the model
outputs. Read the documentation from :class:`~sofa.utils.PretrainedConfig` for more information.
Configuration objects inherit from :class:`~transformers.PretrainedConfig` and can be used to control the model
outputs. Read the documentation from :class:`~transformers.PretrainedConfig` for more information.


Args:
@@ -99,11 +100,13 @@ class SbertConfig(PretrainedConfig):
type_vocab_size=2,
initializer_range=0.02,
layer_norm_eps=1e-12,
pad_token_id=0,
position_embedding_type='absolute',
use_cache=True,
classifier_dropout=None,
**kwargs):
super().__init__(**kwargs)
super().__init__(pad_token_id=pad_token_id, **kwargs)

self.vocab_size = vocab_size
self.hidden_size = hidden_size
self.num_hidden_layers = num_hidden_layers
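
The second hunk makes `pad_token_id` an explicit argument and forwards it to the transformers `PretrainedConfig` base class; a small sketch of the resulting behavior (argument values are illustrative).

# Hedged sketch: pad_token_id now has an explicit default and is forwarded to
# the transformers PretrainedConfig base class.
from modelscope.models.nlp.structbert import SbertConfig

cfg = SbertConfig(vocab_size=21128, hidden_size=768, pad_token_id=0)
assert cfg.pad_token_id == 0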

+ 1964
- 0
modelscope/models/nlp/structbert/modeling_sbert.py View File (diff suppressed because it is too large)


+ 516
- 0
modelscope/models/nlp/structbert/tokenization_sbert.py View File

@@ -0,0 +1,516 @@
# Copyright 2021-2022 The Alibaba DAMO NLP Team Authors.
# Copyright 2018 The Google AI Language Team Authors and The HuggingFace Inc. team.
# All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""Tokenization classes for Sbert. mainly copied from :module:`~transformers.tokenization_bert`"""

import collections
import os
import unicodedata
from typing import List, Optional, Tuple

from transformers.tokenization_utils import (PreTrainedTokenizer, _is_control,
_is_punctuation, _is_whitespace)

from modelscope.utils.logger import get_logger

logger = get_logger(__name__)

VOCAB_FILES_NAMES = {'vocab_file': 'vocab.txt'}

PRETRAINED_VOCAB_FILES_MAP = {'vocab_file': {}}

PRETRAINED_POSITIONAL_EMBEDDINGS_SIZES = {
'chinese_sbert-large-std-512': 512,
'english_sbert-large-std-512': 512,
}

PRETRAINED_INIT_CONFIGURATION = {
'english_sbert-large-std-512': {
'do_lower_case': True
},
}


def load_vocab(vocab_file):
"""Loads a vocabulary file into a dictionary."""
vocab = collections.OrderedDict()
with open(vocab_file, 'r', encoding='utf-8') as reader:
tokens = reader.readlines()
for index, token in enumerate(tokens):
token = token.rstrip('\n')
vocab[token] = index
return vocab


def whitespace_tokenize(text):
"""Runs basic whitespace cleaning and splitting on a piece of text."""
text = text.strip()
if not text:
return []
tokens = text.split()
return tokens


class SbertTokenizer(PreTrainedTokenizer):
r"""
Construct a SBERT tokenizer. Based on WordPiece.

This tokenizer inherits from :class:`~transformers.PreTrainedTokenizer` which contains most of the main methods.
Users should refer to this superclass for more information regarding those methods.

Args:
vocab_file (:obj:`str`):
File containing the vocabulary.
do_lower_case (:obj:`bool`, `optional`, defaults to :obj:`True`):
Whether or not to lowercase the input when tokenizing.
do_basic_tokenize (:obj:`bool`, `optional`, defaults to :obj:`True`):
Whether or not to do basic tokenization before WordPiece.
never_split (:obj:`Iterable`, `optional`):
Collection of tokens which will never be split during tokenization. Only has an effect when
:obj:`do_basic_tokenize=True`
unk_token (:obj:`str`, `optional`, defaults to :obj:`"[UNK]"`):
The unknown token. A token that is not in the vocabulary cannot be converted to an ID and is set to be this
token instead.
sep_token (:obj:`str`, `optional`, defaults to :obj:`"[SEP]"`):
The separator token, which is used when building a sequence from multiple sequences, e.g. two sequences for
sequence classification or for a text and a question for question answering. It is also used as the last
token of a sequence built with special tokens.
pad_token (:obj:`str`, `optional`, defaults to :obj:`"[PAD]"`):
The token used for padding, for example when batching sequences of different lengths.
cls_token (:obj:`str`, `optional`, defaults to :obj:`"[CLS]"`):
The classifier token which is used when doing sequence classification (classification of the whole sequence
instead of per-token classification). It is the first token of the sequence when built with special tokens.
mask_token (:obj:`str`, `optional`, defaults to :obj:`"[MASK]"`):
The token used for masking values. This is the token used when training this model with masked language
modeling. This is the token which the model will try to predict.
tokenize_chinese_chars (:obj:`bool`, `optional`, defaults to :obj:`True`):
Whether or not to tokenize Chinese characters.

This should likely be deactivated for Japanese (see this `issue
<https://github.com/huggingface/transformers/issues/328>`__).
strip_accents: (:obj:`bool`, `optional`):
Whether or not to strip all accents. If this option is not specified, then it will be determined by the
value for :obj:`lowercase` (as in the original BERT).
"""

vocab_files_names = VOCAB_FILES_NAMES
pretrained_vocab_files_map = PRETRAINED_VOCAB_FILES_MAP
pretrained_init_configuration = PRETRAINED_INIT_CONFIGURATION
max_model_input_sizes = PRETRAINED_POSITIONAL_EMBEDDINGS_SIZES

def __init__(self,
vocab_file,
do_lower_case=True,
do_basic_tokenize=True,
never_split=None,
unk_token='[UNK]',
sep_token='[SEP]',
pad_token='[PAD]',
cls_token='[CLS]',
mask_token='[MASK]',
tokenize_chinese_chars=True,
strip_accents=None,
**kwargs):
super().__init__(
do_lower_case=do_lower_case,
do_basic_tokenize=do_basic_tokenize,
never_split=never_split,
unk_token=unk_token,
sep_token=sep_token,
pad_token=pad_token,
cls_token=cls_token,
mask_token=mask_token,
tokenize_chinese_chars=tokenize_chinese_chars,
strip_accents=strip_accents,
**kwargs,
)

if not os.path.isfile(vocab_file):
raise ValueError(
f"Can't find a vocabulary file at path '{vocab_file}'. To load the vocabulary from a Google pretrained "
'model use `tokenizer = SbertTokenizer.from_pretrained(PRETRAINED_MODEL_NAME)`'
)
self.vocab = load_vocab(vocab_file)
self.ids_to_tokens = collections.OrderedDict([
(ids, tok) for tok, ids in self.vocab.items()
])
self.do_basic_tokenize = do_basic_tokenize
if do_basic_tokenize:
self.basic_tokenizer = BasicTokenizer(
do_lower_case=do_lower_case,
never_split=never_split,
tokenize_chinese_chars=tokenize_chinese_chars,
strip_accents=strip_accents,
)
self.wordpiece_tokenizer = WordpieceTokenizer(
vocab=self.vocab, unk_token=self.unk_token)

@property
def do_lower_case(self):
return self.basic_tokenizer.do_lower_case

@property
def vocab_size(self):
return len(self.vocab)

def get_vocab(self):
return dict(self.vocab, **self.added_tokens_encoder)

def _tokenize(self, text):
split_tokens = []
if self.do_basic_tokenize:
for token in self.basic_tokenizer.tokenize(
text, never_split=self.all_special_tokens):

# If the token is part of the never_split set
if token in self.basic_tokenizer.never_split:
split_tokens.append(token)
else:
split_tokens += self.wordpiece_tokenizer.tokenize(token)
else:
split_tokens = self.wordpiece_tokenizer.tokenize(text)
return split_tokens

def _convert_token_to_id(self, token):
"""Converts a token (str) in an id using the vocab."""
return self.vocab.get(token, self.vocab.get(self.unk_token))

def _convert_id_to_token(self, index):
"""Converts an index (integer) in a token (str) using the vocab."""
return self.ids_to_tokens.get(index, self.unk_token)

def convert_tokens_to_string(self, tokens):
"""Converts a sequence of tokens (string) in a single string."""
out_string = ' '.join(tokens).replace(' ##', '').strip()
return out_string

def build_inputs_with_special_tokens(
self,
token_ids_0: List[int],
token_ids_1: Optional[List[int]] = None) -> List[int]:
"""
Build model inputs from a sequence or a pair of sequences for sequence classification tasks by concatenating and
adding special tokens. A SBERT sequence has the following format:

- single sequence: ``[CLS] X [SEP]``
- pair of sequences: ``[CLS] A [SEP] B [SEP]``

Args:
token_ids_0 (:obj:`List[int]`):
List of IDs to which the special tokens will be added.
token_ids_1 (:obj:`List[int]`, `optional`):
Optional second list of IDs for sequence pairs.

Returns:
:obj:`List[int]`: List of `input IDs <../glossary.html#input-ids>`__ with the appropriate special tokens.
"""
if token_ids_1 is None:
return [self.cls_token_id] + token_ids_0 + [self.sep_token_id]
cls = [self.cls_token_id]
sep = [self.sep_token_id]
return cls + token_ids_0 + sep + token_ids_1 + sep

def get_special_tokens_mask(
self,
token_ids_0: List[int],
token_ids_1: Optional[List[int]] = None,
already_has_special_tokens: bool = False) -> List[int]:
"""
Retrieve sequence ids from a token list that has no special tokens added. This method is called when adding
special tokens using the tokenizer ``prepare_for_model`` method.

Args:
token_ids_0 (:obj:`List[int]`):
List of IDs.
token_ids_1 (:obj:`List[int]`, `optional`):
Optional second list of IDs for sequence pairs.
already_has_special_tokens (:obj:`bool`, `optional`, defaults to :obj:`False`):
Whether or not the token list is already formatted with special tokens for the model.

Returns:
:obj:`List[int]`: A list of integers in the range [0, 1]: 1 for a special token, 0 for a sequence token.
"""

if already_has_special_tokens:
return super().get_special_tokens_mask(
token_ids_0=token_ids_0,
token_ids_1=token_ids_1,
already_has_special_tokens=True)

if token_ids_1 is not None:
return [1] + ([0] * len(token_ids_0)) + [1] + (
[0] * len(token_ids_1)) + [1]
return [1] + ([0] * len(token_ids_0)) + [1]

def create_token_type_ids_from_sequences(
self,
token_ids_0: List[int],
token_ids_1: Optional[List[int]] = None) -> List[int]:
"""
Create a mask from the two sequences passed to be used in a sequence-pair classification task. A SBERT sequence
pair mask has the following format:

::

0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1
| first sequence | second sequence |

If :obj:`token_ids_1` is :obj:`None`, this method only returns the first portion of the mask (0s).

Args:
token_ids_0 (:obj:`List[int]`):
List of IDs.
token_ids_1 (:obj:`List[int]`, `optional`):
Optional second list of IDs for sequence pairs.

Returns:
:obj:`List[int]`: List of `token type IDs <../glossary.html#token-type-ids>`_ according to the given
sequence(s).
"""
sep = [self.sep_token_id]
cls = [self.cls_token_id]
if token_ids_1 is None:
return len(cls + token_ids_0 + sep) * [0]
return len(cls + token_ids_0 + sep) * [0] + len(token_ids_1
+ sep) * [1]

def save_vocabulary(self,
save_directory: str,
filename_prefix: Optional[str] = None) -> Tuple[str]:
index = 0
if os.path.isdir(save_directory):
vocab_file = os.path.join(
save_directory,
(filename_prefix + '-' if filename_prefix else '')
+ VOCAB_FILES_NAMES['vocab_file'])
else:
vocab_file = (filename_prefix
+ '-' if filename_prefix else '') + save_directory
with open(vocab_file, 'w', encoding='utf-8') as writer:
for token, token_index in sorted(
self.vocab.items(), key=lambda kv: kv[1]):
if index != token_index:
logger.warning(
f'Saving vocabulary to {vocab_file}: vocabulary indices are not consecutive.'
' Please check that the vocabulary is not corrupted!')
index = token_index
writer.write(token + '\n')
index += 1
return (vocab_file, )


class BasicTokenizer(object):
"""
Constructs a BasicTokenizer that will run basic tokenization (punctuation splitting, lower casing, etc.).

Args:
do_lower_case (:obj:`bool`, `optional`, defaults to :obj:`True`):
Whether or not to lowercase the input when tokenizing.
never_split (:obj:`Iterable`, `optional`):
Collection of tokens which will never be split during tokenization. Only has an effect when
:obj:`do_basic_tokenize=True`
tokenize_chinese_chars (:obj:`bool`, `optional`, defaults to :obj:`True`):
Whether or not to tokenize Chinese characters.

This should likely be deactivated for Japanese (see this `issue
<https://github.com/huggingface/transformers/issues/328>`__).
strip_accents: (:obj:`bool`, `optional`):
Whether or not to strip all accents. If this option is not specified, then it will be determined by the
value for :obj:`lowercase` (as in the original BERT).
"""

def __init__(self,
do_lower_case=True,
never_split=None,
tokenize_chinese_chars=True,
strip_accents=None):
if never_split is None:
never_split = []
self.do_lower_case = do_lower_case
self.never_split = set(never_split)
self.tokenize_chinese_chars = tokenize_chinese_chars
self.strip_accents = strip_accents

def tokenize(self, text, never_split=None):
"""
Basic tokenization of a piece of text. Splits on whitespace only; for sub-word tokenization, see
WordpieceTokenizer.

Args:
**never_split**: (`optional`) list of str
Kept for backward compatibility purposes. Now implemented directly at the base class level (see
:func:`PreTrainedTokenizer.tokenize`). List of tokens not to split.
"""
# union() returns a new set by concatenating the two sets.
never_split = self.never_split.union(
set(never_split)) if never_split else self.never_split
text = self._clean_text(text)

# This was added on November 1st, 2018 for the multilingual and Chinese
# models. This is also applied to the English models now, but it doesn't
# matter since the English models were not trained on any Chinese data
# and generally don't have any Chinese data in them (there are Chinese
# characters in the vocabulary because Wikipedia does have some Chinese
# words in the English Wikipedia.).
if self.tokenize_chinese_chars:
text = self._tokenize_chinese_chars(text)
orig_tokens = whitespace_tokenize(text)
split_tokens = []
for token in orig_tokens:
if token not in never_split:
if self.do_lower_case:
token = token.lower()
if self.strip_accents is not False:
token = self._run_strip_accents(token)
elif self.strip_accents:
token = self._run_strip_accents(token)
split_tokens.extend(self._run_split_on_punc(token, never_split))

output_tokens = whitespace_tokenize(' '.join(split_tokens))
return output_tokens

def _run_strip_accents(self, text):
"""Strips accents from a piece of text."""
text = unicodedata.normalize('NFD', text)
output = []
for char in text:
cat = unicodedata.category(char)
if cat == 'Mn':
continue
output.append(char)
return ''.join(output)

def _run_split_on_punc(self, text, never_split=None):
"""Splits punctuation on a piece of text."""
if never_split is not None and text in never_split:
return [text]
chars = list(text)
i = 0
start_new_word = True
output = []
while i < len(chars):
char = chars[i]
if _is_punctuation(char):
output.append([char])
start_new_word = True
else:
if start_new_word:
output.append([])
start_new_word = False
output[-1].append(char)
i += 1

return [''.join(x) for x in output]

def _tokenize_chinese_chars(self, text):
"""Adds whitespace around any CJK character."""
output = []
for char in text:
cp = ord(char)
if self._is_chinese_char(cp):
output.append(' ')
output.append(char)
output.append(' ')
else:
output.append(char)
return ''.join(output)

def _is_chinese_char(self, cp):
"""Checks whether CP is the codepoint of a CJK character."""
# This defines a "chinese character" as anything in the CJK Unicode block:
# https://en.wikipedia.org/wiki/CJK_Unified_Ideographs_(Unicode_block)
#
# Note that the CJK Unicode block is NOT all Japanese and Korean characters,
# despite its name. The modern Korean Hangul alphabet is a different block,
# as is Japanese Hiragana and Katakana. Those alphabets are used to write
# space-separated words, so they are not treated specially and handled
# like all of the other languages.
if ((0x4E00 <= cp <= 0x9FFF) or (0x3400 <= cp <= 0x4DBF)
or (0x20000 <= cp <= 0x2A6DF) or (0x2A700 <= cp <= 0x2B73F)
or (0x2B740 <= cp <= 0x2B81F) or (0x2B820 <= cp <= 0x2CEAF)
or (0xF900 <= cp <= 0xFAFF) or (0x2F800 <= cp <= 0x2FA1F)):
return True

return False

def _clean_text(self, text):
"""Performs invalid character removal and whitespace cleanup on text."""
output = []
for char in text:
cp = ord(char)
if cp == 0 or cp == 0xFFFD or _is_control(char):
continue
if _is_whitespace(char):
output.append(' ')
else:
output.append(char)
return ''.join(output)


class WordpieceTokenizer(object):
"""Runs WordPiece tokenization."""

def __init__(self, vocab, unk_token, max_input_chars_per_word=100):
self.vocab = vocab
self.unk_token = unk_token
self.max_input_chars_per_word = max_input_chars_per_word

def tokenize(self, text):
"""
Tokenizes a piece of text into its word pieces. This uses a greedy longest-match-first algorithm to perform
tokenization using the given vocabulary.

For example, :obj:`input = "unaffable"` will return as output :obj:`["un", "##aff", "##able"]`.

Args:
text: A single token or whitespace separated tokens. This should have
already been passed through `BasicTokenizer`.

Returns:
A list of wordpiece tokens.
"""

output_tokens = []
for token in whitespace_tokenize(text):
chars = list(token)
if len(chars) > self.max_input_chars_per_word:
output_tokens.append(self.unk_token)
continue

is_bad = False
start = 0
sub_tokens = []
while start < len(chars):
end = len(chars)
cur_substr = None
while start < end:
substr = ''.join(chars[start:end])
if start > 0:
substr = '##' + substr
if substr in self.vocab:
cur_substr = substr
break
end -= 1
if cur_substr is None:
is_bad = True
break
sub_tokens.append(cur_substr)
start = end

if is_bad:
output_tokens.append(self.unk_token)
else:
output_tokens.extend(sub_tokens)
return output_tokens

+ 200
- 0
modelscope/models/nlp/structbert/tokenization_sbert_fast.py View File

@@ -0,0 +1,200 @@
# Copyright 2021-2022 The Alibaba DAMO NLP Team Authors.
# Copyright 2018 The Google AI Language Team Authors and The HuggingFace Inc. team.
# All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""Fast Tokenization classes for Sbert. mainly copied from :module:`~transformers.tokenization_bert_fast`"""

from typing import List, Optional, Tuple

import json
import transformers
from tokenizers import normalizers
from transformers.tokenization_utils_fast import PreTrainedTokenizerFast

from modelscope.utils.logger import get_logger
from .tokenization_sbert import SbertTokenizer

logger = get_logger(__name__)

VOCAB_FILES_NAMES = {
'vocab_file': 'vocab.txt',
'tokenizer_file': 'tokenizer.json'
}

PRETRAINED_VOCAB_FILES_MAP = {
'vocab_file': {},
'tokenizer_file': {},
}

PRETRAINED_POSITIONAL_EMBEDDINGS_SIZES = {
'chinese_sbert-large-std-512': 512,
'english_sbert-large-std-512': 512,
}

PRETRAINED_INIT_CONFIGURATION = {
'english_sbert-large-std-512': {
'do_lower_case': True
},
}

transformers.SLOW_TO_FAST_CONVERTERS[
'SbertTokenizer'] = transformers.SLOW_TO_FAST_CONVERTERS['BertTokenizer']


class SbertTokenizerFast(PreTrainedTokenizerFast):
r"""
Construct a "fast" SBERT tokenizer (backed by HuggingFace's `tokenizers` library). Based on WordPiece.

This tokenizer inherits from :class:`~transformers.PreTrainedTokenizerFast` which contains most of the main
methods. Users should refer to this superclass for more information regarding those methods.

Args:
vocab_file (:obj:`str`):
File containing the vocabulary.
do_lower_case (:obj:`bool`, `optional`, defaults to :obj:`True`):
Whether or not to lowercase the input when tokenizing.
unk_token (:obj:`str`, `optional`, defaults to :obj:`"[UNK]"`):
The unknown token. A token that is not in the vocabulary cannot be converted to an ID and is set to be this
token instead.
sep_token (:obj:`str`, `optional`, defaults to :obj:`"[SEP]"`):
The separator token, which is used when building a sequence from multiple sequences, e.g. two sequences for
sequence classification or for a text and a question for question answering. It is also used as the last
token of a sequence built with special tokens.
pad_token (:obj:`str`, `optional`, defaults to :obj:`"[PAD]"`):
The token used for padding, for example when batching sequences of different lengths.
cls_token (:obj:`str`, `optional`, defaults to :obj:`"[CLS]"`):
The classifier token which is used when doing sequence classification (classification of the whole sequence
instead of per-token classification). It is the first token of the sequence when built with special tokens.
mask_token (:obj:`str`, `optional`, defaults to :obj:`"[MASK]"`):
The token used for masking values. This is the token used when training this model with masked language
modeling. This is the token which the model will try to predict.
clean_text (:obj:`bool`, `optional`, defaults to :obj:`True`):
Whether or not to clean the text before tokenization by removing any control characters and replacing all
whitespaces by the classic one.
tokenize_chinese_chars (:obj:`bool`, `optional`, defaults to :obj:`True`):
Whether or not to tokenize Chinese characters. This should likely be deactivated for Japanese (see `this
issue <https://github.com/huggingface/transformers/issues/328>`__).
strip_accents: (:obj:`bool`, `optional`):
Whether or not to strip all accents. If this option is not specified, then it will be determined by the
value for :obj:`lowercase` (as in the original BERT).
wordpieces_prefix: (:obj:`str`, `optional`, defaults to :obj:`"##"`):
The prefix for subwords.
"""

vocab_files_names = VOCAB_FILES_NAMES
pretrained_vocab_files_map = PRETRAINED_VOCAB_FILES_MAP
pretrained_init_configuration = PRETRAINED_INIT_CONFIGURATION
max_model_input_sizes = PRETRAINED_POSITIONAL_EMBEDDINGS_SIZES
slow_tokenizer_class = SbertTokenizer

def __init__(self,
vocab_file=None,
tokenizer_file=None,
do_lower_case=True,
unk_token='[UNK]',
sep_token='[SEP]',
pad_token='[PAD]',
cls_token='[CLS]',
mask_token='[MASK]',
tokenize_chinese_chars=True,
strip_accents=None,
**kwargs):
super().__init__(
vocab_file,
tokenizer_file=tokenizer_file,
do_lower_case=do_lower_case,
unk_token=unk_token,
sep_token=sep_token,
pad_token=pad_token,
cls_token=cls_token,
mask_token=mask_token,
tokenize_chinese_chars=tokenize_chinese_chars,
strip_accents=strip_accents,
**kwargs,
)

pre_tok_state = json.loads(
self.backend_tokenizer.normalizer.__getstate__())
if (pre_tok_state.get('lowercase', do_lower_case) != do_lower_case
or pre_tok_state.get('strip_accents',
strip_accents) != strip_accents):
pre_tok_class = getattr(normalizers, pre_tok_state.pop('type'))
pre_tok_state['lowercase'] = do_lower_case
pre_tok_state['strip_accents'] = strip_accents
self.backend_tokenizer.normalizer = pre_tok_class(**pre_tok_state)

self.do_lower_case = do_lower_case

def build_inputs_with_special_tokens(self, token_ids_0, token_ids_1=None):
"""
Build model inputs from a sequence or a pair of sequences for sequence classification tasks by concatenating and
adding special tokens. A SBERT sequence has the following format:

- single sequence: ``[CLS] X [SEP]``
- pair of sequences: ``[CLS] A [SEP] B [SEP]``

Args:
token_ids_0 (:obj:`List[int]`):
List of IDs to which the special tokens will be added.
token_ids_1 (:obj:`List[int]`, `optional`):
Optional second list of IDs for sequence pairs.

Returns:
:obj:`List[int]`: List of `input IDs <../glossary.html#input-ids>`__ with the appropriate special tokens.
"""
output = [self.cls_token_id] + token_ids_0 + [self.sep_token_id]

if token_ids_1:
output += token_ids_1 + [self.sep_token_id]

return output

def create_token_type_ids_from_sequences(
self,
token_ids_0: List[int],
token_ids_1: Optional[List[int]] = None) -> List[int]:
"""
Create a mask from the two sequences passed to be used in a sequence-pair classification task. A SBERT sequence
pair mask has the following format:

::

0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1
| first sequence | second sequence |

If :obj:`token_ids_1` is :obj:`None`, this method only returns the first portion of the mask (0s).

Args:
token_ids_0 (:obj:`List[int]`):
List of IDs.
token_ids_1 (:obj:`List[int]`, `optional`):
Optional second list of IDs for sequence pairs.

Returns:
:obj:`List[int]`: List of `token type IDs <../glossary.html#token-type-ids>`_ according to the given
sequence(s).
"""
sep = [self.sep_token_id]
cls = [self.cls_token_id]
if token_ids_1 is None:
return len(cls + token_ids_0 + sep) * [0]
return len(cls + token_ids_0 + sep) * [0] + len(token_ids_1
+ sep) * [1]

def save_vocabulary(self,
save_directory: str,
filename_prefix: Optional[str] = None) -> Tuple[str]:
files = self._tokenizer.model.save(
save_directory, name=filename_prefix)
return tuple(files)
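
Registering `SbertTokenizer` in `transformers.SLOW_TO_FAST_CONVERTERS` (reusing the BERT converter) is what lets the fast tokenizer be built from a slow vocabulary; a hedged usage sketch with a placeholder model directory.

# Hedged sketch: the fast tokenizer loads from the same directory; the BERT
# converter registered above handles the vocab.txt -> tokenizer.json conversion.
fast_tok = SbertTokenizerFast.from_pretrained('/path/to/sbert_model')  # placeholder

enc = fast_tok('first sentence', 'second sentence', return_token_type_ids=True)
# token_type_ids follow the [CLS] A [SEP] B [SEP] scheme documented above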

+ 0
- 0
modelscope/models/nlp/task_models/__init__.py View File


+ 86
- 0
modelscope/models/nlp/task_models/sequence_classification.py View File

@@ -0,0 +1,86 @@
import os
from typing import Any, Dict

import json
import numpy as np

from modelscope.metainfo import TaskModels
from modelscope.models.builder import MODELS
from modelscope.models.nlp.task_models.task_model import \
SingleBackboneTaskModelBase
from modelscope.outputs import OutputKeys
from modelscope.utils.constant import Tasks

__all__ = ['SequenceClassificationModel']


@MODELS.register_module(
Tasks.sentiment_classification, module_name=TaskModels.text_classification)
@MODELS.register_module(
Tasks.text_classification, module_name=TaskModels.text_classification)
class SequenceClassificationModel(SingleBackboneTaskModelBase):

def __init__(self, model_dir: str, *args, **kwargs):
"""initialize the sequence classification model from the `model_dir` path.

Args:
model_dir (str): the model path.
"""
super().__init__(model_dir, *args, **kwargs)
if 'base_model_prefix' in kwargs:
self._base_model_prefix = kwargs['base_model_prefix']

backbone_cfg = self.cfg.backbone
head_cfg = self.cfg.head

# get the num_labels from label_mapping.json
self.id2label = {}
self.label_path = os.path.join(model_dir, 'label_mapping.json')
if os.path.exists(self.label_path):
with open(self.label_path) as f:
self.label_mapping = json.load(f)
self.id2label = {
idx: name
for name, idx in self.label_mapping.items()
}
head_cfg['num_labels'] = len(self.label_mapping)

self.build_backbone(backbone_cfg)
self.build_head(head_cfg)

def forward(self, input: Dict[str, Any]) -> Dict[str, np.ndarray]:
outputs = super().forward(input)
sequence_output, pooled_output = self.extract_backbone_outputs(outputs)
outputs = self.head.forward(pooled_output)
if 'labels' in input:
loss = self.compute_loss(outputs, input['labels'])
outputs.update(loss)
return outputs

def extract_logits(self, outputs):
return outputs[OutputKeys.LOGITS].cpu().detach()

def extract_backbone_outputs(self, outputs):
sequence_output = None
pooled_output = None
if hasattr(self.backbone, 'extract_sequence_outputs'):
sequence_output = self.backbone.extract_sequence_outputs(outputs)
if hasattr(self.backbone, 'extract_pooled_outputs'):
pooled_output = self.backbone.extract_pooled_outputs(outputs)
return sequence_output, pooled_output

def compute_loss(self, outputs, labels):
loss = self.head.compute_loss(outputs, labels)
return loss

def postprocess(self, input, **kwargs):
logits = self.extract_logits(input)
probs = logits.softmax(-1).numpy()
pred = logits.argmax(-1).numpy()
logits = logits.numpy()
res = {
OutputKeys.PREDICTIONS: pred,
OutputKeys.PROBABILITIES: probs,
OutputKeys.LOGITS: logits
}
return res
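
The task model above is driven by the `backbone` and `head` sections of the model configuration (presumably the model's `configuration.json`) plus an optional `label_mapping.json`; a hedged sketch of that wiring. The inner config keys are assumptions; only `backbone`, `head` and `label_mapping.json` are referenced by the code.

# Hedged sketch of what the task model consumes; the inner keys are illustrative.
#
# configuration.json (hypothetical excerpt):
#   "backbone": {"type": "structbert", ...}
#   "head":     {"type": "text-classification", ...}
# label_mapping.json (hypothetical): {"negative": 0, "positive": 1}
#   -> head_cfg['num_labels'] = 2

model = SequenceClassificationModel('/path/to/task_model')   # placeholder path
# preprocessed_inputs: dict produced by the task preprocessor (placeholder)
outputs = model.forward(preprocessed_inputs)   # backbone forward -> head forward (+ loss if 'labels')
results = model.postprocess(outputs)           # adds predictions / probabilities / logits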

modelscope/models/nlp/task_model.py → modelscope/models/nlp/task_models/task_model.py View File

@@ -11,8 +11,8 @@ from modelscope.models.base import TorchModel
from modelscope.models.builder import build_backbone, build_head
from modelscope.utils.config import ConfigDict
from modelscope.utils.constant import Fields, Tasks
from modelscope.utils.file_utils import func_receive_dict_inputs
from modelscope.utils.logger import get_logger
from modelscope.utils.utils import if_func_receive_dict_inputs

logger = get_logger(__name__)

@@ -424,12 +424,15 @@ class SingleBackboneTaskModelBase(BaseTaskModel):

def forward(self, input: Dict[str, Any]) -> Dict[str, Any]:
"""default forward method is the backbone-only forward"""
if if_func_receive_dict_inputs(self.backbone.forward):
if func_receive_dict_inputs(self.backbone.forward):
outputs = self.backbone.forward(input)
else:
outputs = self.backbone.forward(**input)
return outputs

def compute_loss(self, outputs: Dict[str, Any], labels):
raise NotImplementedError()


class EncoderDecoderTaskModelBase(BaseTaskModel):
"""
@@ -472,13 +475,13 @@ class EncoderDecoderTaskModelBase(BaseTaskModel):
return getattr(self, self._decoder_prefix)

def forward(self, input: Dict[str, Any]) -> Dict[str, Any]:
if if_func_receive_dict_inputs(self.encoder_.forward):
if func_receive_dict_inputs(self.encoder_.forward):
encoder_outputs = self.encoder_.forward(input)
else:
encoder_outputs = self.encoder_.forward(**input)
decoder_inputs = self.project_decoder_inputs_and_mediate(
input, encoder_outputs)
if if_func_receive_dict_inputs(self.decoder_.forward):
if func_receive_dict_inputs(self.decoder_.forward):
outputs = self.decoder_.forward(decoder_inputs)
else:
outputs = self.decoder_.forward(**decoder_inputs)

+ 147
- 0
modelscope/models/nlp/token_classification.py View File

@@ -0,0 +1,147 @@
from abc import abstractmethod
from typing import Dict

import numpy as np
import torch
from torch import nn

from modelscope.metainfo import Models
from modelscope.models.base import TorchModel
from modelscope.models.builder import MODELS
from modelscope.outputs import OutputKeys
from modelscope.utils.constant import Tasks
from modelscope.utils.hub import parse_label_mapping
from modelscope.utils.tensor_utils import (torch_nested_detach,
torch_nested_numpify)
from .structbert import SbertPreTrainedModel

__all__ = ['SbertForTokenClassification']


class TokenClassification(TorchModel):

base_model_prefix: str = 'bert'

def __init__(self, config, model_dir):
super().__init__(model_dir)
self.num_labels = config.num_labels
self.config = config
setattr(self, self.base_model_prefix, self.build_base_model())
classifier_dropout = (
config.classifier_dropout if config.classifier_dropout is not None
else config.hidden_dropout_prob)
self.dropout = nn.Dropout(classifier_dropout)
self.classifier = nn.Linear(config.hidden_size, config.num_labels)

@abstractmethod
def build_base_model(self):
"""Build the backbone model.

Returns: the backbone instance.
"""
pass

@property
def base_model(self):
return getattr(self, self.base_model_prefix)

def compute_loss(self, logits, labels, **kwargs):
"""Compute loss.

For example, if the backbone is a pretrained transformer model, an 'attention_mask' parameter will be passed to skip
padding tokens.

Args:
logits: The logits from the classifier
labels: The labels
**kwargs: Other input params.

Returns: Loss.

"""
pass

def forward(self, **kwargs):
labels = None
if OutputKeys.LABEL in kwargs:
labels = kwargs.pop(OutputKeys.LABEL)
elif OutputKeys.LABELS in kwargs:
labels = kwargs.pop(OutputKeys.LABELS)

outputs = self.base_model(**kwargs)
# base model should return the sequence_output as its first output
sequence_output = outputs[0]
sequence_output = self.dropout(sequence_output)
logits = self.classifier(sequence_output)
if labels is not None:
loss = self.compute_loss(logits, labels, **kwargs)
return {OutputKeys.LOGITS: logits, OutputKeys.LOSS: loss}
return {OutputKeys.LOGITS: logits}

def postprocess(self, input: Dict[str, np.ndarray],
**kwargs) -> Dict[str, np.ndarray]:
logits = input[OutputKeys.LOGITS]
pred = torch.argmax(logits[0], dim=-1)
pred = torch_nested_numpify(torch_nested_detach(pred))
logits = torch_nested_numpify(torch_nested_detach(logits))
rst = {OutputKeys.PREDICTIONS: pred, OutputKeys.LOGITS: logits}
return rst


@MODELS.register_module(Tasks.word_segmentation, module_name=Models.structbert)
@MODELS.register_module(
Tasks.token_classification, module_name=Models.structbert)
class SbertForTokenClassification(TokenClassification, SbertPreTrainedModel):

supports_gradient_checkpointing = True
_keys_to_ignore_on_load_unexpected = [r'pooler']

def __init__(self, config, model_dir):
if hasattr(config, 'base_model_prefix'):
SbertForTokenClassification.base_model_prefix = config.base_model_prefix
super().__init__(config, model_dir)

def build_base_model(self):
from .structbert import SbertModel
return SbertModel(self.config, add_pooling_layer=False)

def forward(self,
input_ids=None,
attention_mask=None,
token_type_ids=None,
labels=None,
**kwargs):
return super().forward(
input_ids=input_ids,
attention_mask=attention_mask,
token_type_ids=token_type_ids,
labels=labels)

def compute_loss(self, logits, labels, attention_mask=None, **kwargs):
loss_fct = nn.CrossEntropyLoss()
# Only keep active parts of the loss
if attention_mask is not None:
active_loss = attention_mask.view(-1) == 1
active_logits = logits.view(-1, self.num_labels)
active_labels = torch.where(
active_loss, labels.view(-1),
torch.tensor(loss_fct.ignore_index).type_as(labels))
return loss_fct(active_logits, active_labels)
else:
return loss_fct(logits.view(-1, self.num_labels), labels.view(-1))

@classmethod
def _instantiate(cls, **kwargs):
model_dir = kwargs.get('model_dir')
num_labels = kwargs.get('num_labels')
if num_labels is None:
label2id = parse_label_mapping(model_dir)
if label2id is not None and len(label2id) > 0:
num_labels = len(label2id)

model_args = {} if num_labels is None else {'num_labels': num_labels}
return super(SbertPreTrainedModel,
SbertForTokenClassification).from_pretrained(
pretrained_model_name_or_path=kwargs.get('model_dir'),
model_dir=kwargs.get('model_dir'),
**model_args)
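
A minimal usage sketch for the new token-classification head. `my_model_dir` is a placeholder for a local StructBERT checkpoint directory, `num_labels=7` and the input ids are dummy values, and the import path simply follows the new file's location (the class may also be exposed through the lazy nlp package).

import torch

from modelscope.models.nlp.token_classification import SbertForTokenClassification

model = SbertForTokenClassification._instantiate(
    model_dir='my_model_dir', num_labels=7)
model.eval()

dummy_inputs = {
    'input_ids': torch.tensor([[101, 2769, 4263, 102]]),
    'attention_mask': torch.tensor([[1, 1, 1, 1]]),
    'token_type_ids': torch.tensor([[0, 0, 0, 0]]),
}
with torch.no_grad():
    outputs = model.forward(**dummy_inputs)   # {'logits': ...}, no loss without labels
results = model.postprocess(outputs)          # adds 'predictions' via per-token argmax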

+43 -0 modelscope/models/nlp/veco/__init__.py

@@ -0,0 +1,43 @@
# Copyright 2021-2022 The Alibaba DAMO NLP Team Authors.
# All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

from typing import TYPE_CHECKING

from modelscope.utils.import_utils import LazyImportModule

if TYPE_CHECKING:
from .configuration_veco import VecoConfig
from .modeling_veco import (VecoForMaskedLM, VecoForSequenceClassification,
VecoModel)
from .tokenization_veco import VecoTokenizer
from .tokenization_veco_fast import VecoTokenizerFast
else:
_import_structure = {
'configuration_veco': ['VecoConfig'],
'modeling_veco':
['VecoForMaskedLM', 'VecoForSequenceClassification', 'VecoModel'],
'tokenization_veco': ['VecoTokenizer'],
'tokenization_veco_fast': ['VecoTokenizerFast'],
}

import sys

sys.modules[__name__] = LazyImportModule(
__name__,
globals()['__file__'],
_import_structure,
module_spec=__spec__,
extra_objects={},
)
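
With the lazy module registered, importing the veco package stays cheap; the transformers and sentencepiece dependencies are only pulled in when one of the declared names is first accessed. A small illustration of the intended behavior:

from modelscope.models.nlp import veco

# Nothing from the veco submodules has been imported at this point.
config_cls = veco.VecoConfig            # loads configuration_veco on first access
tokenizer_cls = veco.VecoTokenizerFast  # loads tokenization_veco_fast on first access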

+33 -0 modelscope/models/nlp/veco/configuration_veco.py

@@ -0,0 +1,33 @@
# Copyright 2021-2022 The Alibaba DAMO NLP Team Authors.
# Copyright 2018 The Google AI Language Team Authors.
# Copyright 2020 The HuggingFace Inc. team.
# Copyright (c) 2018, NVIDIA CORPORATION.
# All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""Veco configuration, mainly copied from :class:`~transformers.configuration_xlm_roberta` """

from transformers import RobertaConfig

from modelscope.utils import logger as logging

logger = logging.get_logger(__name__)


class VecoConfig(RobertaConfig):
"""
This class overrides [`RobertaConfig`]. Please check the superclass for the appropriate
documentation alongside usage examples.
"""

model_type = 'veco'

+143 -0 modelscope/models/nlp/veco/modeling_veco.py

@@ -0,0 +1,143 @@
# Copyright 2019 Facebook AI Research and the HuggingFace Inc. team.
# Copyright (c) 2018, NVIDIA CORPORATION.
# Copyright 2021-2022 The Alibaba DAMO NLP Team Authors.
# All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""PyTorch Veco model. mainly copied from :module:`~transformers.modeling_xlm_roberta`"""

from transformers import (RobertaForMaskedLM, RobertaForMultipleChoice,
RobertaForQuestionAnswering,
RobertaForSequenceClassification,
RobertaForTokenClassification, RobertaModel)
from transformers.file_utils import add_start_docstrings

from modelscope.metainfo import Models
from modelscope.models.builder import BACKBONES
from modelscope.utils import logger as logging
from modelscope.utils.constant import Fields
from .configuration_veco import VecoConfig

logger = logging.get_logger(__name__)

VECO_PRETRAINED_MODEL_ARCHIVE_LIST = []

VECO_START_DOCSTRING = r"""

This model inherits from [`PreTrainedModel`]. Check the superclass documentation for the generic
methods the library implements for all its models (such as downloading or saving, resizing the input embeddings,
pruning heads, etc.).

This model is also a PyTorch [torch.nn.Module](https://pytorch.org/docs/stable/nn.html#torch.nn.Module)
subclass. Use it as a regular PyTorch Module and refer to the PyTorch documentation for all matters related to
general usage and behavior.

Parameters:
config ([`VecoConfig`]): Model configuration class with all the parameters of the
model. Initializing with a config file does not load the weights associated with the model, only the
configuration. Check out the [`~PreTrainedModel.from_pretrained`] method to load the model
weights.
"""


@add_start_docstrings(
'The bare Veco Model transformer outputting raw hidden-states without any specific head on top.',
VECO_START_DOCSTRING,
)
class VecoModel(RobertaModel):
"""
This class overrides [`RobertaModel`]. Please check the superclass for the appropriate
documentation alongside usage examples.
"""

config_class = VecoConfig


@add_start_docstrings(
"""
Veco Model transformer with a sequence classification/regression head on top (a linear layer on top of the
pooled output) e.g. for GLUE tasks.
""",
VECO_START_DOCSTRING,
)
class VecoForSequenceClassification(RobertaForSequenceClassification):
"""
This class overrides [`RobertaForSequenceClassification`]. Please check the superclass for the
appropriate documentation alongside usage examples.
"""

config_class = VecoConfig


@add_start_docstrings(
"""
Veco Model transformer with a masked language modeling head on top (a linear layer on top of the
hidden-states output).
""",
VECO_START_DOCSTRING,
)
class VecoForMaskedLM(RobertaForMaskedLM):
"""
This class overrides [`RobertaForMaskedLM`]. Please check the superclass for the
appropriate documentation alongside usage examples.
"""

config_class = VecoConfig


@add_start_docstrings(
"""
Veco Model with a multiple choice classification head on top (a linear layer on top of the pooled output and
a softmax) e.g. for RocStories/SWAG tasks.
""",
VECO_START_DOCSTRING,
)
class VecoForMultipleChoice(RobertaForMultipleChoice):
"""
This class overrides [`RobertaForMultipleChoice`]. Please check the superclass for the
appropriate documentation alongside usage examples.
"""

config_class = VecoConfig


@add_start_docstrings(
"""
Veco Model with a token classification head on top (a linear layer on top of the hidden-states output) e.g.
for Named-Entity-Recognition (NER) tasks.
""",
VECO_START_DOCSTRING,
)
class VecoForTokenClassification(RobertaForTokenClassification):
"""
This class overrides [`RobertaForTokenClassification`]. Please check the superclass for the
appropriate documentation alongside usage examples.
"""

config_class = VecoConfig


@add_start_docstrings(
"""
Veco Model with a span classification head on top for extractive question-answering tasks like SQuAD (a
linear layers on top of the hidden-states output to compute `span start logits` and `span end logits`).
""",
VECO_START_DOCSTRING,
)
class VecoForQuestionAnswering(RobertaForQuestionAnswering):
"""
This class overrides [`RobertaForQuestionAnswering`]. Please check the superclass for the
appropriate documentation alongside usage examples.
"""

config_class = VecoConfig
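
Since every class above only overrides `config_class`, the models behave exactly like their Roberta counterparts. A small sketch that builds a randomly initialized classifier from a tiny config; the sizes are toy values for illustration, and real checkpoints would instead be loaded with `from_pretrained(model_dir)`.

from modelscope.models.nlp.veco import VecoConfig, VecoForSequenceClassification

config = VecoConfig(
    vocab_size=1000,          # toy sizes, illustration only
    hidden_size=64,
    num_hidden_layers=2,
    num_attention_heads=2,
    intermediate_size=128,
    num_labels=3,
)
model = VecoForSequenceClassification(config)
print(model.config.model_type)  # 'veco'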

+321 -0 modelscope/models/nlp/veco/tokenization_veco.py

@@ -0,0 +1,321 @@
# Copyright 2018 Google AI, Google Brain and Carnegie Mellon University Authors and the HuggingFace Inc. team.
# Copyright 2021-2022 The Alibaba DAMO NLP Team Authors.
# All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License
"""Tokenization classes for Veco. mainly copied from :module:`~transformers.tokenization_xlm_roberta`"""

import os
from shutil import copyfile
from typing import Any, Dict, List, Optional, Tuple

import sentencepiece as spm
from transformers.tokenization_utils import AddedToken, PreTrainedTokenizer

from modelscope.utils import logger as logging

logger = logging.get_logger(__name__)

SPIECE_UNDERLINE = '▁'

VOCAB_FILES_NAMES = {'vocab_file': 'sentencepiece.bpe.model'}

PRETRAINED_VOCAB_FILES_MAP = {'vocab_file': {}}

PRETRAINED_POSITIONAL_EMBEDDINGS_SIZES = {}


class VecoTokenizer(PreTrainedTokenizer):
"""
Adapted from [`RobertaTokenizer`] and [`XLNetTokenizer`]. Based on
[SentencePiece](https://github.com/google/sentencepiece).

This tokenizer inherits from [`PreTrainedTokenizer`] which contains most of the main methods.
Users should refer to this superclass for more information regarding those methods.

Args:
vocab_file (`str`):
Path to the vocabulary file.
bos_token (`str`, *optional*, defaults to `"<s>"`):
The beginning of sequence token that was used during pretraining. Can be used as a sequence classifier token.

<Tip>

When building a sequence using special tokens, this is not the token that is used for the beginning of
sequence. The token used is the `cls_token`.

</Tip>

eos_token (`str`, *optional*, defaults to `"</s>"`):
The end of sequence token.

<Tip>

When building a sequence using special tokens, this is not the token that is used for the end of
sequence. The token used is the `sep_token`.

</Tip>

sep_token (`str`, *optional*, defaults to `"</s>"`):
The separator token, which is used when building a sequence from multiple sequences, e.g. two sequences for
sequence classification or for a text and a question for question answering. It is also used as the last
token of a sequence built with special tokens.
cls_token (`str`, *optional*, defaults to `"<s>"`):
The classifier token which is used when doing sequence classification (classification of the whole sequence
instead of per-token classification). It is the first token of the sequence when built with special tokens.
unk_token (`str`, *optional*, defaults to `"<unk>"`):
The unknown token. A token that is not in the vocabulary cannot be converted to an ID and is set to be this
token instead.
pad_token (`str`, *optional*, defaults to `"<pad>"`):
The token used for padding, for example when batching sequences of different lengths.
mask_token (`str`, *optional*, defaults to `"<mask>"`):
The token used for masking values. This is the token used when training this model with masked language
modeling. This is the token which the model will try to predict.
additional_special_tokens (`List[str]`, *optional*, defaults to `["<s>NOTUSED", "</s>NOTUSED"]`):
Additional special tokens used by the tokenizer.
sp_model_kwargs (`dict`, *optional*):
Will be passed to the `SentencePieceProcessor.__init__()` method.
The [Python wrapper for SentencePiece](https://github.com/google/sentencepiece/tree/master/python)
can be used, among other things, to set:

- `enable_sampling`: Enable subword regularization.
- `nbest_size`: Sampling parameters for unigram. Invalid for BPE-Dropout.

- `nbest_size = {0,1}`: No sampling is performed.
- `nbest_size > 1`: samples from the nbest_size results.
- `nbest_size < 0`: assuming that nbest_size is infinite and samples from the all hypothesis (lattice)
using forward-filtering-and-backward-sampling algorithm.

- `alpha`: Smoothing parameter for unigram sampling, and dropout probability of merge operations for
BPE-dropout.

Attributes:
sp_model (`SentencePieceProcessor`):
The *SentencePiece* processor that is used for every conversion (string, tokens and IDs).
"""

vocab_files_names = VOCAB_FILES_NAMES
pretrained_vocab_files_map = PRETRAINED_VOCAB_FILES_MAP
max_model_input_sizes = PRETRAINED_POSITIONAL_EMBEDDINGS_SIZES
model_input_names = ['input_ids', 'attention_mask']

def __init__(self,
vocab_file,
bos_token='<s>',
eos_token='</s>',
sep_token='</s>',
cls_token='<s>',
unk_token='<unk>',
pad_token='<pad>',
mask_token='<mask>',
sp_model_kwargs: Optional[Dict[str, Any]] = None,
**kwargs) -> None:
# Mask token behaves like a normal word, i.e. it includes the space before it
mask_token = AddedToken(
mask_token, lstrip=True, rstrip=False) if isinstance(
mask_token, str) else mask_token

self.sp_model_kwargs = {} if sp_model_kwargs is None else sp_model_kwargs

super().__init__(
bos_token=bos_token,
eos_token=eos_token,
unk_token=unk_token,
sep_token=sep_token,
cls_token=cls_token,
pad_token=pad_token,
mask_token=mask_token,
sp_model_kwargs=self.sp_model_kwargs,
**kwargs,
)

self.sp_model = spm.SentencePieceProcessor(**self.sp_model_kwargs)
self.sp_model.Load(str(vocab_file))
self.vocab_file = vocab_file

# Original fairseq vocab and spm vocab must be "aligned":
# Vocab | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
# -------- | ------- | ------- | ------ | ------- | --- | --- | --- | ----- | ----- | ----
# fairseq | '<s>' | '<pad>' | '</s>' | '<unk>' | ',' | '.' | '▁' | 's' | '▁de' | '-'
# spm | '<unk>' | '<s>' | '</s>' | ',' | '.' | '▁' | 's' | '▁de' | '-' | '▁a'

# Mimic fairseq token-to-id alignment for the first 4 tokens
self.fairseq_tokens_to_ids = {
'<s>': 0,
'<pad>': 1,
'</s>': 2,
'<unk>': 3
}

# The first "real" token "," has position 4 in the original fairseq vocab and position 3 in the spm vocab
self.fairseq_offset = 1

self.fairseq_tokens_to_ids['<mask>'] = len(
self.sp_model) + self.fairseq_offset
self.fairseq_ids_to_tokens = {
v: k
for k, v in self.fairseq_tokens_to_ids.items()
}

def __getstate__(self):
state = self.__dict__.copy()
state['sp_model'] = None
state['sp_model_proto'] = self.sp_model.serialized_model_proto()
return state

def __setstate__(self, d):
self.__dict__ = d

# for backward compatibility
if not hasattr(self, 'sp_model_kwargs'):
self.sp_model_kwargs = {}

self.sp_model = spm.SentencePieceProcessor(**self.sp_model_kwargs)
self.sp_model.LoadFromSerializedProto(self.sp_model_proto)

def build_inputs_with_special_tokens(
self,
token_ids_0: List[int],
token_ids_1: Optional[List[int]] = None) -> List[int]:
"""
Build model inputs from a sequence or a pair of sequence for sequence classification tasks by concatenating and
adding special tokens. A Veco sequence has the following format:

- single sequence: `<s> X </s>`
- pair of sequences: `<s> A </s></s> B </s>`

Args:
token_ids_0 (`List[int]`):
List of IDs to which the special tokens will be added.
token_ids_1 (`List[int]`, *optional*):
Optional second list of IDs for sequence pairs.

Returns:
`List[int]`: List of [input IDs](../glossary#input-ids) with the appropriate special tokens.
"""

if token_ids_1 is None:
return [self.cls_token_id] + token_ids_0 + [self.sep_token_id]
cls = [self.cls_token_id]
sep = [self.sep_token_id]
return cls + token_ids_0 + sep + sep + token_ids_1 + sep

def get_special_tokens_mask(
self,
token_ids_0: List[int],
token_ids_1: Optional[List[int]] = None,
already_has_special_tokens: bool = False) -> List[int]:
"""
Retrieve sequence ids from a token list that has no special tokens added. This method is called when adding
special tokens using the tokenizer `prepare_for_model` method.

Args:
token_ids_0 (`List[int]`):
List of IDs.
token_ids_1 (`List[int]`, *optional*):
Optional second list of IDs for sequence pairs.
already_has_special_tokens (`bool`, *optional*, defaults to `False`):
Whether or not the token list is already formatted with special tokens for the model.

Returns:
`List[int]`: A list of integers in the range [0, 1]: 1 for a special token, 0 for a sequence token.
"""

if already_has_special_tokens:
return super().get_special_tokens_mask(
token_ids_0=token_ids_0,
token_ids_1=token_ids_1,
already_has_special_tokens=True)

if token_ids_1 is None:
return [1] + ([0] * len(token_ids_0)) + [1]
return [1] + ([0] * len(token_ids_0)) + [1, 1] + (
[0] * len(token_ids_1)) + [1]

def create_token_type_ids_from_sequences(
self,
token_ids_0: List[int],
token_ids_1: Optional[List[int]] = None) -> List[int]:
"""
Create a mask from the two sequences passed to be used in a sequence-pair classification task. Veco does
not make use of token type ids, therefore a list of zeros is returned.

Args:
token_ids_0 (`List[int]`):
List of IDs.
token_ids_1 (`List[int]`, *optional*):
Optional second list of IDs for sequence pairs.

Returns:
`List[int]`: List of zeros.

"""

sep = [self.sep_token_id]
cls = [self.cls_token_id]

if token_ids_1 is None:
return len(cls + token_ids_0 + sep) * [0]
return len(cls + token_ids_0 + sep + sep + token_ids_1 + sep) * [0]

@property
def vocab_size(self):
return len(
self.sp_model) + self.fairseq_offset + 1 # Add the <mask> token

def get_vocab(self):
vocab = {
self.convert_ids_to_tokens(i): i
for i in range(self.vocab_size)
}
vocab.update(self.added_tokens_encoder)
return vocab

def _tokenize(self, text: str) -> List[str]:
return self.sp_model.encode(text, out_type=str)

def _convert_token_to_id(self, token):
"""Converts a token (str) in an id using the vocab."""
if token in self.fairseq_tokens_to_ids:
return self.fairseq_tokens_to_ids[token]
spm_id = self.sp_model.PieceToId(token)

# Need to return unknown token if the SP model returned 0
return spm_id + self.fairseq_offset if spm_id else self.unk_token_id

def _convert_id_to_token(self, index):
"""Converts an index (integer) in a token (str) using the vocab."""
if index in self.fairseq_ids_to_tokens:
return self.fairseq_ids_to_tokens[index]
return self.sp_model.IdToPiece(index - self.fairseq_offset)

def convert_tokens_to_string(self, tokens):
"""Converts a sequence of tokens (strings for sub-words) in a single string."""
out_string = ''.join(tokens).replace(SPIECE_UNDERLINE, ' ').strip()
return out_string

def save_vocabulary(self,
save_directory: str,
filename_prefix: Optional[str] = None) -> Tuple[str]:
if not os.path.isdir(save_directory):
logger.error(
f'Vocabulary path ({save_directory}) should be a directory')
return
out_vocab_file = os.path.join(
save_directory, (filename_prefix + '-' if filename_prefix else '')
+ VOCAB_FILES_NAMES['vocab_file'])

if os.path.abspath(self.vocab_file) != os.path.abspath(out_vocab_file):
copyfile(self.vocab_file, out_vocab_file)

return (out_vocab_file, )
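
A standalone illustration of the special-token layout documented in build_inputs_with_special_tokens, using the fairseq ids from the alignment table in __init__ (`<s>` = 0, `</s>` = 2); the content ids are made up.

def pair_layout(cls_id, sep_id, ids_a, ids_b=None):
    # Same layout as VecoTokenizer.build_inputs_with_special_tokens:
    #   single sequence: <s> A </s>
    #   sentence pair:   <s> A </s></s> B </s>
    if ids_b is None:
        return [cls_id] + ids_a + [sep_id]
    return [cls_id] + ids_a + [sep_id, sep_id] + ids_b + [sep_id]


assert pair_layout(0, 2, [10, 11]) == [0, 10, 11, 2]
assert pair_layout(0, 2, [10, 11], [20, 21]) == [0, 10, 11, 2, 2, 20, 21, 2]

For ordinary pieces, _convert_token_to_id adds the fairseq_offset of 1 to the sentencepiece id, so a piece with spm id 3 maps to id 4, matching the alignment table in __init__.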

+213 -0 modelscope/models/nlp/veco/tokenization_veco_fast.py

@@ -0,0 +1,213 @@
# Copyright 2018 Google AI, Google Brain and Carnegie Mellon University Authors and the HuggingFace Inc. team.
# Copyright 2021-2022 The Alibaba DAMO NLP Team Authors.
# All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License
"""Fast Tokenization classes for Veco. mainly copied from :module:`~transformers.tokenization_xlm_roberta_fast`"""

import os
from shutil import copyfile
from typing import List, Optional, Tuple

import transformers
from transformers.file_utils import is_sentencepiece_available
from transformers.tokenization_utils import AddedToken
from transformers.tokenization_utils_fast import PreTrainedTokenizerFast

from modelscope.utils import logger as logging

if is_sentencepiece_available():
from .tokenization_veco import VecoTokenizer
else:
VecoTokenizer = None

logger = logging.get_logger(__name__)

VOCAB_FILES_NAMES = {
'vocab_file': 'sentencepiece.bpe.model',
'tokenizer_file': 'tokenizer.json'
}

PRETRAINED_VOCAB_FILES_MAP = {
'vocab_file': {},
'tokenizer_file': {},
}

PRETRAINED_POSITIONAL_EMBEDDINGS_SIZES = {}

transformers.SLOW_TO_FAST_CONVERTERS[
'VecoTokenizer'] = transformers.SLOW_TO_FAST_CONVERTERS[
'XLMRobertaTokenizer']


class VecoTokenizerFast(PreTrainedTokenizerFast):
"""
Adapted from [`RobertaTokenizer`] and [`XLNetTokenizer`].
Based on [BPE](https://huggingface.co/docs/tokenizers/python/latest/components.html?highlight=BPE#models).

This tokenizer inherits from [`PreTrainedTokenizerFast`] which contains most of the main
methods. Users should refer to this superclass for more information regarding those methods.

Args:
vocab_file (`str`):
Path to the vocabulary file.
bos_token (`str`, *optional*, defaults to `"<s>"`):
The beginning of sequence token that was used during pretraining. Can be used as a sequence classifier token.

<Tip>

When building a sequence using special tokens, this is not the token that is used for the beginning of
sequence. The token used is the `cls_token`.

</Tip>

eos_token (`str`, *optional*, defaults to `"</s>"`):
The end of sequence token.

<Tip>

When building a sequence using special tokens, this is not the token that is used for the end of
sequence. The token used is the `sep_token`.

</Tip>

sep_token (`str`, *optional*, defaults to `"</s>"`):
The separator token, which is used when building a sequence from multiple sequences, e.g. two sequences for
sequence classification or for a text and a question for question answering. It is also used as the last
token of a sequence built with special tokens.
cls_token (`str`, *optional*, defaults to `"<s>"`):
The classifier token which is used when doing sequence classification (classification of the whole sequence
instead of per-token classification). It is the first token of the sequence when built with special tokens.
unk_token (`str`, *optional*, defaults to `"<unk>"`):
The unknown token. A token that is not in the vocabulary cannot be converted to an ID and is set to be this
token instead.
pad_token (`str`, *optional*, defaults to `"<pad>"`):
The token used for padding, for example when batching sequences of different lengths.
mask_token (`str`, *optional*, defaults to `"<mask>"`):
The token used for masking values. This is the token used when training this model with masked language
modeling. This is the token which the model will try to predict.
additional_special_tokens (`List[str]`, *optional*, defaults to `["<s>NOTUSED", "</s>NOTUSED"]`):
Additional special tokens used by the tokenizer.
"""

vocab_files_names = VOCAB_FILES_NAMES
pretrained_vocab_files_map = PRETRAINED_VOCAB_FILES_MAP
max_model_input_sizes = PRETRAINED_POSITIONAL_EMBEDDINGS_SIZES
model_input_names = ['input_ids', 'attention_mask']
slow_tokenizer_class = VecoTokenizer

def __init__(self,
vocab_file=None,
tokenizer_file=None,
bos_token='<s>',
eos_token='</s>',
sep_token='</s>',
cls_token='<s>',
unk_token='<unk>',
pad_token='<pad>',
mask_token='<mask>',
**kwargs):
# Mask token behaves like a normal word, i.e. it includes the space before it
mask_token = AddedToken(
mask_token, lstrip=True, rstrip=False) if isinstance(
mask_token, str) else mask_token

super().__init__(
vocab_file,
tokenizer_file=tokenizer_file,
bos_token=bos_token,
eos_token=eos_token,
sep_token=sep_token,
cls_token=cls_token,
unk_token=unk_token,
pad_token=pad_token,
mask_token=mask_token,
**kwargs,
)

self.vocab_file = vocab_file
self.can_save_slow_tokenizer = False if not self.vocab_file else True

def build_inputs_with_special_tokens(
self,
token_ids_0: List[int],
token_ids_1: Optional[List[int]] = None) -> List[int]:
"""
Build model inputs from a sequence or a pair of sequence for sequence classification tasks by concatenating and
adding special tokens. A Veco sequence has the following format:

- single sequence: `<s> X </s>`
- pair of sequences: `<s> A </s></s> B </s>`

Args:
token_ids_0 (`List[int]`):
List of IDs to which the special tokens will be added.
token_ids_1 (`List[int]`, *optional*):
Optional second list of IDs for sequence pairs.

Returns:
`List[int]`: List of [input IDs](../glossary#input-ids) with the appropriate special tokens.
"""

if token_ids_1 is None:
return [self.cls_token_id] + token_ids_0 + [self.sep_token_id]
cls = [self.cls_token_id]
sep = [self.sep_token_id]
return cls + token_ids_0 + sep + sep + token_ids_1 + sep

def create_token_type_ids_from_sequences(
self,
token_ids_0: List[int],
token_ids_1: Optional[List[int]] = None) -> List[int]:
"""
Create a mask from the two sequences passed to be used in a sequence-pair classification task. Veco does
not make use of token type ids, therefore a list of zeros is returned.

Args:
token_ids_0 (`List[int]`):
List of IDs.
token_ids_1 (`List[int]`, *optional*):
Optional second list of IDs for sequence pairs.

Returns:
`List[int]`: List of zeros.

"""

sep = [self.sep_token_id]
cls = [self.cls_token_id]

if token_ids_1 is None:
return len(cls + token_ids_0 + sep) * [0]
return len(cls + token_ids_0 + sep + sep + token_ids_1 + sep) * [0]

def save_vocabulary(self,
save_directory: str,
filename_prefix: Optional[str] = None) -> Tuple[str]:
if not self.can_save_slow_tokenizer:
raise ValueError(
'Your fast tokenizer does not have the necessary information to save the vocabulary for a slow '
'tokenizer.')

if not os.path.isdir(save_directory):
logger.error(
f'Vocabulary path ({save_directory}) should be a directory.')
return
out_vocab_file = os.path.join(
save_directory, (filename_prefix + '-' if filename_prefix else '')
+ VOCAB_FILES_NAMES['vocab_file'])

if os.path.abspath(self.vocab_file) != os.path.abspath(out_vocab_file):
copyfile(self.vocab_file, out_vocab_file)

return (out_vocab_file, )

+7 -0 modelscope/msdatasets/ms_dataset.py

@@ -517,3 +517,10 @@ class MsDataset:
def to_hf_dataset(self) -> Dataset:
self._hf_ds.reset_format()
return self._hf_ds

@staticmethod
def interleave_datasets(datasets: List[Any],
probabilities: Optional[List[float]] = None,
seed: Optional[int] = None):
from datasets import interleave_datasets
return interleave_datasets(datasets, probabilities, seed)
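
A usage sketch for the new static helper, building two toy Hugging Face datasets in memory; in a real run these could be per-language splits converted via to_hf_dataset(). The 70/30 mixing probabilities and the seed are arbitrary example values.

from datasets import Dataset

from modelscope.msdatasets.ms_dataset import MsDataset

ds_en = Dataset.from_dict({'text': ['a', 'b', 'c']})
ds_zh = Dataset.from_dict({'text': ['甲', '乙', '丙']})

mixed = MsDataset.interleave_datasets([ds_en, ds_zh],
                                      probabilities=[0.7, 0.3],
                                      seed=42)
print(mixed[0])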

+1 -0 modelscope/outputs.py

@@ -9,6 +9,7 @@ class OutputKeys(object):
SCORES = 'scores'
LABEL = 'label'
LABELS = 'labels'
INPUT_IDS = 'input_ids'
LABEL_POS = 'label_pos'
POSES = 'poses'
CAPTION = 'caption'


+6 -7 modelscope/pipelines/nlp/__init__.py

@@ -9,9 +9,8 @@ if TYPE_CHECKING:
from .dialog_state_tracking_pipeline import DialogStateTrackingPipeline
from .fill_mask_pipeline import FillMaskPipeline
from .named_entity_recognition_pipeline import NamedEntityRecognitionPipeline
from .nli_pipeline import NLIPipeline
from .sentence_similarity_pipeline import SentenceSimilarityPipeline
from .sentiment_classification_pipeline import SentimentClassificationPipeline
from .pair_sentence_classification_pipeline import PairSentenceClassificationPipeline
from .single_sentence_classification_pipeline import SingleSentenceClassificationPipeline
from .sequence_classification_pipeline import SequenceClassificationPipeline
from .text_generation_pipeline import TextGenerationPipeline
from .translation_pipeline import TranslationPipeline
@@ -28,10 +27,10 @@ else:
'dialog_modeling_pipeline': ['DialogModelingPipeline'],
'dialog_state_tracking_pipeline': ['DialogStateTrackingPipeline'],
'fill_mask_pipeline': ['FillMaskPipeline'],
'nli_pipeline': ['NLIPipeline'],
'sentence_similarity_pipeline': ['SentenceSimilarityPipeline'],
'sentiment_classification_pipeline':
['SentimentClassificationPipeline'],
'single_sentence_classification_pipeline':
['SingleSentenceClassificationPipeline'],
'pair_sentence_classification_pipeline':
['PairSentenceClassificationPipeline'],
'sequence_classification_pipeline': ['SequenceClassificationPipeline'],
'text_generation_pipeline': ['TextGenerationPipeline'],
'word_segmentation_pipeline': ['WordSegmentationPipeline'],


+10 -11 modelscope/pipelines/nlp/fill_mask_pipeline.py

@@ -5,11 +5,10 @@ import torch

from modelscope.metainfo import Pipelines
from modelscope.models import Model
from modelscope.models.nlp.masked_language import MaskedLanguageModelBase
from modelscope.outputs import OutputKeys
from modelscope.pipelines.base import Pipeline, Tensor
from modelscope.pipelines.builder import PIPELINES
from modelscope.preprocessors import FillMaskPreprocessor
from modelscope.preprocessors import FillMaskPreprocessor, Preprocessor
from modelscope.utils.config import Config
from modelscope.utils.constant import ModelFile, Tasks

@@ -21,18 +20,18 @@ _type_map = {'veco': 'roberta', 'sbert': 'bert'}
class FillMaskPipeline(Pipeline):

def __init__(self,
model: Union[MaskedLanguageModelBase, str],
preprocessor: Optional[FillMaskPreprocessor] = None,
first_sequence='sentense',
model: Union[Model, str],
preprocessor: Optional[Preprocessor] = None,
first_sequence='sentence',
**kwargs):
"""use `model` and `preprocessor` to create a nlp fill mask pipeline for prediction

Args:
model (MaskedLanguageModelBase): a model instance
preprocessor (FillMaskPreprocessor): a preprocessor instance
model (Model): a model instance
preprocessor (Preprocessor): a preprocessor instance
"""
fill_mask_model = model if isinstance(
model, MaskedLanguageModelBase) else Model.from_pretrained(model)
model, Model) else Model.from_pretrained(model)

if preprocessor is None:
preprocessor = FillMaskPreprocessor(
@@ -73,7 +72,7 @@ class FillMaskPipeline(Pipeline):
def forward(self, inputs: Dict[str, Any],
**forward_params) -> Dict[str, Any]:
with torch.no_grad():
return super().forward(inputs, **forward_params)
return self.model(inputs, **forward_params)

def postprocess(self, inputs: Dict[str, Tensor]) -> Dict[str, Tensor]:
"""process the prediction results
@@ -85,8 +84,8 @@ class FillMaskPipeline(Pipeline):
Dict[str, str]: the prediction results
"""
import numpy as np
logits = inputs['logits'].detach().cpu().numpy()
input_ids = inputs['input_ids'].detach().cpu().numpy()
logits = inputs[OutputKeys.LOGITS].detach().cpu().numpy()
input_ids = inputs[OutputKeys.INPUT_IDS].detach().cpu().numpy()
pred_ids = np.argmax(logits, axis=-1)
model_type = self.model.config.model_type
process_type = model_type if model_type in self.mask_id else _type_map[
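
With the pipeline now accepting any Model, a fill-mask run looks roughly like the sketch below; 'damo/some-fill-mask-model' is a placeholder model id, and the mask token depends on the underlying tokenizer (e.g. [MASK] for StructBERT-style vocabularies, <mask> for Veco).

from modelscope.pipelines import pipeline
from modelscope.utils.constant import Tasks

fill_mask = pipeline(Tasks.fill_mask, model='damo/some-fill-mask-model')
print(fill_mask('Everything is in its right [MASK].'))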


+5 -7 modelscope/pipelines/nlp/named_entity_recognition_pipeline.py

@@ -4,11 +4,10 @@ import torch

from modelscope.metainfo import Pipelines
from modelscope.models import Model
from modelscope.models.nlp import TransformerCRFForNamedEntityRecognition
from modelscope.outputs import OutputKeys
from modelscope.pipelines.base import Pipeline, Tensor
from modelscope.pipelines.base import Pipeline
from modelscope.pipelines.builder import PIPELINES
from modelscope.preprocessors import NERPreprocessor
from modelscope.preprocessors import NERPreprocessor, Preprocessor
from modelscope.utils.constant import Tasks

__all__ = ['NamedEntityRecognitionPipeline']
@@ -20,13 +19,12 @@ __all__ = ['NamedEntityRecognitionPipeline']
class NamedEntityRecognitionPipeline(Pipeline):

def __init__(self,
model: Union[TransformerCRFForNamedEntityRecognition, str],
preprocessor: Optional[NERPreprocessor] = None,
model: Union[Model, str],
preprocessor: Optional[Preprocessor] = None,
**kwargs):

model = model if isinstance(model,
TransformerCRFForNamedEntityRecognition
) else Model.from_pretrained(model)
Model) else Model.from_pretrained(model)
if preprocessor is None:
preprocessor = NERPreprocessor(model.model_dir)
model.eval()


+0 -73 modelscope/pipelines/nlp/nli_pipeline.py

@@ -1,73 +0,0 @@
import uuid
from typing import Any, Dict, Union

import numpy as np
import torch

from modelscope.metainfo import Pipelines
from modelscope.models import Model
from modelscope.models.nlp import SbertForNLI
from modelscope.outputs import OutputKeys
from modelscope.pipelines.base import Pipeline
from modelscope.pipelines.builder import PIPELINES
from modelscope.preprocessors import NLIPreprocessor
from modelscope.utils.constant import Tasks

__all__ = ['NLIPipeline']


@PIPELINES.register_module(Tasks.nli, module_name=Pipelines.nli)
class NLIPipeline(Pipeline):

def __init__(self,
model: Union[SbertForNLI, str],
preprocessor: NLIPreprocessor = None,
first_sequence='first_sequence',
second_sequence='second_sequence',
**kwargs):
"""use `model` and `preprocessor` to create a nlp text classification pipeline for prediction

Args:
model (SbertForNLI): a model instance
preprocessor (NLIPreprocessor): a preprocessor instance
"""
assert isinstance(model, str) or isinstance(model, SbertForNLI), \
'model must be a single str or SbertForNLI'
model = model if isinstance(
model, SbertForNLI) else Model.from_pretrained(model)
if preprocessor is None:
preprocessor = NLIPreprocessor(
model.model_dir,
first_sequence=first_sequence,
second_sequence=second_sequence)
model.eval()
super().__init__(model=model, preprocessor=preprocessor, **kwargs)
assert len(model.id2label) > 0

def forward(self, inputs: Dict[str, Any],
**forward_params) -> Dict[str, Any]:
with torch.no_grad():
return super().forward(inputs, **forward_params)

def postprocess(self,
inputs: Dict[str, Any],
topk: int = 5) -> Dict[str, str]:
"""process the prediction results

Args:
inputs (Dict[str, Any]): _description_

Returns:
Dict[str, str]: the prediction results
"""

probs = inputs['probabilities'][0]
num_classes = probs.shape[0]
topk = min(topk, num_classes)
top_indices = np.argpartition(probs, -topk)[-topk:]
cls_ids = top_indices[np.argsort(probs[top_indices])]
probs = probs[cls_ids].tolist()

cls_names = [self.model.id2label[cid] for cid in cls_ids]

return {OutputKeys.SCORES: probs, OutputKeys.LABELS: cls_names}

+37 -0 modelscope/pipelines/nlp/pair_sentence_classification_pipeline.py

@@ -0,0 +1,37 @@
from typing import Union

from modelscope.models.base import Model
from ...metainfo import Pipelines
from ...preprocessors import (PairSentenceClassificationPreprocessor,
Preprocessor)
from ...utils.constant import Tasks
from ..builder import PIPELINES
from .sequence_classification_pipeline_base import \
SequenceClassificationPipelineBase

__all__ = ['PairSentenceClassificationPipeline']


@PIPELINES.register_module(Tasks.nli, module_name=Pipelines.nli)
@PIPELINES.register_module(
Tasks.sentence_similarity, module_name=Pipelines.sentence_similarity)
class PairSentenceClassificationPipeline(SequenceClassificationPipelineBase):

def __init__(self,
model: Union[Model, str],
preprocessor: Preprocessor = None,
first_sequence='first_sequence',
second_sequence='second_sequence',
**kwargs):
"""use `model` and `preprocessor` to create a nlp pair sentence classification pipeline for prediction

Args:
model (Model): a model instance
preprocessor (Preprocessor): a preprocessor instance
"""
if preprocessor is None:
preprocessor = PairSentenceClassificationPreprocessor(
model.model_dir if isinstance(model, Model) else model,
first_sequence=first_sequence,
second_sequence=second_sequence)
super().__init__(model=model, preprocessor=preprocessor, **kwargs)
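
The same class now backs both the sentence-similarity and nli registrations, so a usage sketch only differs in the task name and checkpoint. The model id below is a placeholder, and the sentence pair is passed as a tuple, matching the pair preprocessor's input format.

from modelscope.pipelines import pipeline
from modelscope.utils.constant import Tasks

similarity = pipeline(Tasks.sentence_similarity,
                      model='damo/some-sbert-similarity-model')
print(similarity(('The weather is nice today.', 'It is sunny outside.')))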

+0 -73 modelscope/pipelines/nlp/sentence_similarity_pipeline.py

@@ -1,73 +0,0 @@
from typing import Any, Dict, Union

import numpy as np
import torch

from modelscope.metainfo import Pipelines
from modelscope.models import Model
from modelscope.models.nlp import SbertForSentenceSimilarity
from modelscope.outputs import OutputKeys
from modelscope.pipelines.base import Input, Pipeline
from modelscope.pipelines.builder import PIPELINES
from modelscope.preprocessors import SentenceSimilarityPreprocessor
from modelscope.utils.constant import Tasks

__all__ = ['SentenceSimilarityPipeline']


@PIPELINES.register_module(
Tasks.sentence_similarity, module_name=Pipelines.sentence_similarity)
class SentenceSimilarityPipeline(Pipeline):

def __init__(self,
model: Union[Model, str],
preprocessor: SentenceSimilarityPreprocessor = None,
first_sequence='first_sequence',
second_sequence='second_sequence',
**kwargs):
"""use `model` and `preprocessor` to create a nlp sentence similarity pipeline for prediction

Args:
model (SbertForSentenceSimilarity): a model instance
preprocessor (SentenceSimilarityPreprocessor): a preprocessor instance
"""
assert isinstance(model, str) or isinstance(model, SbertForSentenceSimilarity), \
'model must be a single str or SbertForSentenceSimilarity'
sc_model = model if isinstance(
model,
SbertForSentenceSimilarity) else Model.from_pretrained(model)
if preprocessor is None:
preprocessor = SentenceSimilarityPreprocessor(
sc_model.model_dir,
first_sequence=first_sequence,
second_sequence=second_sequence)
sc_model.eval()
super().__init__(model=sc_model, preprocessor=preprocessor, **kwargs)

assert hasattr(self.model, 'id2label'), \
'id2label map should be initalizaed in init function.'

def forward(self, inputs: Dict[str, Any],
**forward_params) -> Dict[str, Any]:
with torch.no_grad():
return super().forward(inputs, **forward_params)

def postprocess(self, inputs: Dict[str, Any],
**postprocess_params) -> Dict[str, str]:
"""process the prediction results

Args:
inputs (Dict[str, Any]): _description_

Returns:
Dict[str, str]: the prediction results
"""

probs = inputs['probabilities'][0]
num_classes = probs.shape[0]
top_indices = np.argpartition(probs, -num_classes)[-num_classes:]
cls_ids = top_indices[np.argsort(-probs[top_indices], axis=-1)]
probs = probs[cls_ids].tolist()
cls_names = [self.model.id2label[cid] for cid in cls_ids]
b = 0
return {OutputKeys.SCORES: probs[b], OutputKeys.LABELS: cls_names[b]}

+0 -74 modelscope/pipelines/nlp/sentiment_classification_pipeline.py

@@ -1,74 +0,0 @@
from typing import Any, Dict, Union

import numpy as np
import torch

from modelscope.metainfo import Pipelines
from modelscope.models import Model
from modelscope.models.nlp import SequenceClassificationModel
from modelscope.outputs import OutputKeys
from modelscope.pipelines.base import Pipeline
from modelscope.pipelines.builder import PIPELINES
from modelscope.preprocessors import SentimentClassificationPreprocessor
from modelscope.utils.constant import Tasks

__all__ = ['SentimentClassificationPipeline']


@PIPELINES.register_module(
Tasks.sentiment_classification,
module_name=Pipelines.sentiment_classification)
class SentimentClassificationPipeline(Pipeline):

def __init__(self,
model: Union[SequenceClassificationModel, str],
preprocessor: SentimentClassificationPreprocessor = None,
first_sequence='first_sequence',
second_sequence='second_sequence',
**kwargs):
"""use `model` and `preprocessor` to create a nlp text classification pipeline for prediction

Args:
model (SequenceClassificationModel): a model instance
preprocessor (SentimentClassificationPreprocessor): a preprocessor instance
"""
assert isinstance(model, str) or isinstance(model, SequenceClassificationModel), \
'model must be a single str or SentimentClassification'
model = model if isinstance(
model,
SequenceClassificationModel) else Model.from_pretrained(model)
if preprocessor is None:
preprocessor = SentimentClassificationPreprocessor(
model.model_dir,
first_sequence=first_sequence,
second_sequence=second_sequence)
model.eval()
super().__init__(model=model, preprocessor=preprocessor, **kwargs)
assert len(model.id2label) > 0

def forward(self, inputs: Dict[str, Any],
**forward_params) -> Dict[str, Any]:
with torch.no_grad():
return super().forward(inputs, **forward_params)

def postprocess(self,
inputs: Dict[str, Any],
topk: int = 5) -> Dict[str, str]:
"""process the prediction results

Args:
inputs (Dict[str, Any]): _description_

Returns:
Dict[str, str]: the prediction results
"""

probs = inputs['probabilities'][0]
num_classes = probs.shape[0]
topk = min(topk, num_classes)
top_indices = np.argpartition(probs, -topk)[-topk:]
cls_ids = top_indices[np.argsort(probs[top_indices])]
probs = probs[cls_ids].tolist()

cls_names = [self.model.id2label[cid] for cid in cls_ids]
return {OutputKeys.SCORES: probs, OutputKeys.LABELS: cls_names}

+60 -0 modelscope/pipelines/nlp/sequence_classification_pipeline_base.py

@@ -0,0 +1,60 @@
from typing import Any, Dict, Union

import numpy as np
import torch

from modelscope.models.base import Model
from modelscope.outputs import OutputKeys
from ...preprocessors import Preprocessor
from ..base import Pipeline


class SequenceClassificationPipelineBase(Pipeline):

def __init__(self, model: Union[Model, str], preprocessor: Preprocessor,
**kwargs):
"""use `model` and `preprocessor` to create a nlp text classification pipeline for prediction

Args:
model (str or Model): a model instance
preprocessor (Preprocessor): a preprocessor instance
"""
assert isinstance(model, str) or isinstance(model, Model), \
'model must be a single str or Model'
model = model if isinstance(model,
Model) else Model.from_pretrained(model)
assert preprocessor is not None
model.eval()
super().__init__(model=model, preprocessor=preprocessor, **kwargs)
self.id2label = kwargs.get('id2label')
if self.id2label is None and hasattr(self.preprocessor, 'id2label'):
self.id2label = self.preprocessor.id2label
assert self.id2label is not None, 'Cannot convert id to the original label, please pass in the mapping ' \
'as a parameter or make sure the preprocessor has the attribute.'

def forward(self, inputs: Dict[str, Any],
**forward_params) -> Dict[str, Any]:
with torch.no_grad():
return self.model(inputs, **forward_params)

def postprocess(self,
inputs: Dict[str, Any],
topk: int = 5) -> Dict[str, str]:
"""process the prediction results

Args:
inputs (Dict[str, Any]): the model outputs, containing a 'probabilities' entry
topk (int): The topk probs to take
Returns:
Dict[str, str]: the prediction results
"""

probs = inputs[OutputKeys.PROBABILITIES][0]
num_classes = probs.shape[0]
topk = min(topk, num_classes)
top_indices = np.argpartition(probs, -topk)[-topk:]
cls_ids = top_indices[np.argsort(probs[top_indices])]
probs = probs[cls_ids].tolist()

cls_names = [self.id2label[cid] for cid in cls_ids]
return {OutputKeys.SCORES: probs, OutputKeys.LABELS: cls_names}
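
A standalone walk-through of the top-k selection in postprocess above, with made-up probabilities and labels; note that the argsort leaves the returned scores in ascending order.

import numpy as np

probs = np.array([0.05, 0.7, 0.15, 0.1])
id2label = {0: 'neutral', 1: 'positive', 2: 'negative', 3: 'other'}
topk = 3

top_indices = np.argpartition(probs, -topk)[-topk:]    # unsorted top-k indices
cls_ids = top_indices[np.argsort(probs[top_indices])]  # sorted ascending by prob
print([id2label[i] for i in cls_ids])   # ['other', 'negative', 'positive']
print(probs[cls_ids].tolist())          # [0.1, 0.15, 0.7]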

+35 -0 modelscope/pipelines/nlp/single_sentence_classification_pipeline.py

@@ -0,0 +1,35 @@
from typing import Union

from ...metainfo import Pipelines
from ...models import Model
from ...preprocessors import (Preprocessor,
SingleSentenceClassificationPreprocessor)
from ...utils.constant import Tasks
from ..builder import PIPELINES
from .sequence_classification_pipeline_base import \
SequenceClassificationPipelineBase

__all__ = ['SingleSentenceClassificationPipeline']


@PIPELINES.register_module(
Tasks.sentiment_classification,
module_name=Pipelines.sentiment_classification)
class SingleSentenceClassificationPipeline(SequenceClassificationPipelineBase):

def __init__(self,
model: Union[Model, str],
preprocessor: Preprocessor = None,
first_sequence='first_sequence',
**kwargs):
"""use `model` and `preprocessor` to create a nlp single sentence classification pipeline for prediction

Args:
model (Model): a model instance
preprocessor (Preprocessor): a preprocessor instance
"""
if preprocessor is None:
preprocessor = SingleSentenceClassificationPreprocessor(
model.model_dir if isinstance(model, Model) else model,
first_sequence=first_sequence)
super().__init__(model=model, preprocessor=preprocessor, **kwargs)

+4 -4 modelscope/pipelines/nlp/text_generation_pipeline.py

@@ -3,7 +3,7 @@ from typing import Any, Dict, Optional, Union
import torch

from modelscope.metainfo import Pipelines
from modelscope.models.base import TorchModel
from modelscope.models.base import Model
from modelscope.pipelines.base import Pipeline, Tensor
from modelscope.pipelines.builder import PIPELINES
from modelscope.preprocessors import TextGenerationPreprocessor
@@ -17,7 +17,7 @@ __all__ = ['TextGenerationPipeline']
class TextGenerationPipeline(Pipeline):

def __init__(self,
model: Union[TorchModel, str],
model: Union[Model, str],
preprocessor: Optional[TextGenerationPreprocessor] = None,
**kwargs):
"""use `model` and `preprocessor` to create a nlp text generation pipeline for prediction
@@ -26,8 +26,8 @@ class TextGenerationPipeline(Pipeline):
model (PalmForTextGeneration): a model instance
preprocessor (TextGenerationPreprocessor): a preprocessor instance
"""
model = model if isinstance(
model, TorchModel) else TorchModel.from_pretrained(model)
model = model if isinstance(model,
Model) else Model.from_pretrained(model)
if preprocessor is None:
preprocessor = TextGenerationPreprocessor(
model.model_dir,


+1 -3 modelscope/pipelines/nlp/translation_pipeline.py

@@ -4,11 +4,9 @@ from typing import Any, Dict
import numpy as np
import tensorflow as tf

from modelscope.hub.snapshot_download import snapshot_download
from modelscope.metainfo import Pipelines
from modelscope.models.nlp import CsanmtForTranslation
from modelscope.outputs import OutputKeys
from modelscope.pipelines.base import Pipeline, Tensor
from modelscope.pipelines.base import Pipeline
from modelscope.pipelines.builder import PIPELINES
from modelscope.utils.constant import ModelFile, Tasks
from modelscope.utils.logger import get_logger


+19 -17 modelscope/pipelines/nlp/word_segmentation_pipeline.py

@@ -4,11 +4,11 @@ import torch

from modelscope.metainfo import Pipelines
from modelscope.models import Model
from modelscope.models.nlp import SbertForTokenClassification
from modelscope.outputs import OutputKeys
from modelscope.pipelines.base import Pipeline, Tensor
from modelscope.pipelines.builder import PIPELINES
from modelscope.preprocessors import TokenClassificationPreprocessor
from modelscope.preprocessors import (Preprocessor,
TokenClassificationPreprocessor)
from modelscope.utils.constant import Tasks

__all__ = ['WordSegmentationPipeline']
@@ -18,33 +18,35 @@ __all__ = ['WordSegmentationPipeline']
Tasks.word_segmentation, module_name=Pipelines.word_segmentation)
class WordSegmentationPipeline(Pipeline):

def __init__(
self,
model: Union[SbertForTokenClassification, str],
preprocessor: Optional[TokenClassificationPreprocessor] = None,
**kwargs):
def __init__(self,
model: Union[Model, str],
preprocessor: Optional[Preprocessor] = None,
**kwargs):
"""use `model` and `preprocessor` to create a nlp word segmentation pipeline for prediction

Args:
model (StructBertForTokenClassification): a model instance
preprocessor (TokenClassificationPreprocessor): a preprocessor instance
model (Model): a model instance
preprocessor (Preprocessor): a preprocessor instance
"""
model = model if isinstance(
model,
SbertForTokenClassification) else Model.from_pretrained(model)
model = model if isinstance(model,
Model) else Model.from_pretrained(model)
if preprocessor is None:
preprocessor = TokenClassificationPreprocessor(model.model_dir)
model.eval()
super().__init__(model=model, preprocessor=preprocessor, **kwargs)
self.tokenizer = preprocessor.tokenizer
self.config = model.config
assert len(self.config.id2label) > 0
self.id2label = self.config.id2label
self.id2label = kwargs.get('id2label')
if self.id2label is None and hasattr(self.preprocessor, 'id2label'):
self.id2label = self.preprocessor.id2label
assert self.id2label is not None, 'Cannot convert id to the original label, please pass in the mapping ' \
'as a parameter or make sure the preprocessor has the attribute.'

def forward(self, inputs: Dict[str, Any],
**forward_params) -> Dict[str, Any]:
text = inputs.pop(OutputKeys.TEXT)
with torch.no_grad():
return super().forward(inputs, **forward_params)
return {
**self.model(inputs, **forward_params), OutputKeys.TEXT: text
}

def postprocess(self, inputs: Dict[str, Any],
**postprocess_params) -> Dict[str, str]:


+13 -14 modelscope/pipelines/nlp/zero_shot_classification_pipeline.py

@@ -5,11 +5,11 @@ from scipy.special import softmax

from modelscope.metainfo import Pipelines
from modelscope.models import Model
from modelscope.models.nlp import SbertForZeroShotClassification
from modelscope.outputs import OutputKeys
from modelscope.pipelines.base import Pipeline
from modelscope.pipelines.builder import PIPELINES
from modelscope.preprocessors import ZeroShotClassificationPreprocessor
from modelscope.preprocessors import (Preprocessor,
ZeroShotClassificationPreprocessor)
from modelscope.utils.constant import Tasks

__all__ = ['ZeroShotClassificationPipeline']
@@ -21,19 +21,18 @@ __all__ = ['ZeroShotClassificationPipeline']
class ZeroShotClassificationPipeline(Pipeline):

def __init__(self,
model: Union[SbertForZeroShotClassification, str],
preprocessor: ZeroShotClassificationPreprocessor = None,
model: Union[Model, str],
preprocessor: Preprocessor = None,
**kwargs):
"""use `model` and `preprocessor` to create a nlp text classification pipeline for prediction
"""use `model` and `preprocessor` to create a nlp zero-shot text classification pipeline for prediction
Args:
model (SbertForZeroShotClassification): a model instance
preprocessor (SentimentClassificationPreprocessor): a preprocessor instance
model (Model): a model instance
preprocessor (Preprocessor): a preprocessor instance
"""
assert isinstance(model, str) or isinstance(model, SbertForZeroShotClassification), \
'model must be a single str or SbertForZeroShotClassification'
model = model if isinstance(
model,
SbertForZeroShotClassification) else Model.from_pretrained(model)
assert isinstance(model, str) or isinstance(model, Model), \
'model must be a single str or Model'
model = model if isinstance(model,
Model) else Model.from_pretrained(model)
self.entailment_id = 0
self.contradiction_id = 2
if preprocessor is None:
@@ -58,7 +57,7 @@ class ZeroShotClassificationPipeline(Pipeline):
def forward(self, inputs: Dict[str, Any],
**forward_params) -> Dict[str, Any]:
with torch.no_grad():
return super().forward(inputs, **forward_params)
return self.model(inputs, **forward_params)

def postprocess(self,
inputs: Dict[str, Any],
@@ -70,7 +69,7 @@ class ZeroShotClassificationPipeline(Pipeline):
Returns:
Dict[str, Any]: the prediction results
"""
logits = inputs['logits']
logits = inputs[OutputKeys.LOGITS]
if multi_label or len(candidate_labels) == 1:
logits = logits[..., [self.contradiction_id, self.entailment_id]]
scores = softmax(logits, axis=-1)[..., 1]
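
A standalone illustration of the zero-shot scoring above: for every candidate label only the contradiction and entailment logits are kept, and the entailment probability becomes the score (the multi-label branch). The logits below are made up, with the same entailment_id=0 / contradiction_id=2 convention as the pipeline.

import numpy as np
from scipy.special import softmax

logits = np.array([
    [2.1, 0.3, -1.0],   # candidate 1: [entailment, neutral, contradiction]
    [-0.5, 0.1, 1.7],   # candidate 2
])
entailment_id, contradiction_id = 0, 2

pair_logits = logits[..., [contradiction_id, entailment_id]]
scores = softmax(pair_logits, axis=-1)[..., 1]
print(scores.round(3))   # entailment probability per candidate label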


+7 -7 modelscope/preprocessors/__init__.py

@@ -18,11 +18,11 @@ if TYPE_CHECKING:
MPlugVisualQuestionAnsweringPreprocessor)
from .nlp import (Tokenize, SequenceClassificationPreprocessor,
TextGenerationPreprocessor,
TokenClassificationPreprocessor, NLIPreprocessor,
SentimentClassificationPreprocessor,
SentenceSimilarityPreprocessor, FillMaskPreprocessor,
ZeroShotClassificationPreprocessor, NERPreprocessor,
TextErrorCorrectionPreprocessor)
TokenClassificationPreprocessor,
SingleSentenceClassificationPreprocessor,
PairSentenceClassificationPreprocessor,
FillMaskPreprocessor, ZeroShotClassificationPreprocessor,
NERPreprocessor, TextErrorCorrectionPreprocessor)
from .space import (DialogIntentPredictionPreprocessor,
DialogModelingPreprocessor,
DialogStateTrackingPreprocessor)
@@ -46,8 +46,8 @@ else:
'nlp': [
'Tokenize', 'SequenceClassificationPreprocessor',
'TextGenerationPreprocessor', 'TokenClassificationPreprocessor',
'NLIPreprocessor', 'SentimentClassificationPreprocessor',
'SentenceSimilarityPreprocessor', 'FillMaskPreprocessor',
'SingleSentenceClassificationPreprocessor',
'PairSentenceClassificationPreprocessor', 'FillMaskPreprocessor',
'ZeroShotClassificationPreprocessor', 'NERPreprocessor',
'TextErrorCorrectionPreprocessor'
],


+3 -1 modelscope/preprocessors/base.py

@@ -1,5 +1,5 @@
# Copyright (c) Alibaba, Inc. and its affiliates.
import os
from abc import ABC, abstractmethod
from typing import Any, Dict

@@ -10,6 +10,8 @@ class Preprocessor(ABC):

def __init__(self, *args, **kwargs):
self._mode = ModeKeys.INFERENCE
self.device = int(
os.environ['LOCAL_RANK']) if 'LOCAL_RANK' in os.environ else None
pass

@abstractmethod
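
The base Preprocessor now records a device index from the LOCAL_RANK environment variable, which distributed launchers such as torchrun set per worker. A minimal sketch of how such a value is typically consumed when moving preprocessed tensors onto the right GPU; the names below are illustrative, not part of the library.

import os

import torch

local_rank = int(os.environ['LOCAL_RANK']) if 'LOCAL_RANK' in os.environ else None
device = (torch.device(f'cuda:{local_rank}')
          if local_rank is not None else torch.device('cpu'))

batch = {'input_ids': torch.tensor([[101, 102]])}
batch = {k: v.to(device) for k, v in batch.items()}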


+302 -204 modelscope/preprocessors/nlp.py

@@ -2,14 +2,14 @@

import os.path as osp
import uuid
from typing import Any, Dict, Optional, Union
from typing import Any, Dict, Iterable, Optional, Tuple, Union

from transformers import AutoTokenizer

from modelscope.metainfo import Preprocessors
from modelscope.models import Model
from modelscope.metainfo import Models, Preprocessors
from modelscope.outputs import OutputKeys
from modelscope.utils.constant import Fields, InputFields, ModeKeys
from modelscope.utils.hub import parse_label_mapping
from modelscope.utils.hub import get_model_type, parse_label_mapping
from modelscope.utils.type_assert import type_assert
from .base import Preprocessor
from .builder import PREPROCESSORS
@@ -17,8 +17,8 @@ from .builder import PREPROCESSORS
__all__ = [
'Tokenize', 'SequenceClassificationPreprocessor',
'TextGenerationPreprocessor', 'TokenClassificationPreprocessor',
'NLIPreprocessor', 'SentimentClassificationPreprocessor',
'FillMaskPreprocessor', 'SentenceSimilarityPreprocessor',
'PairSentenceClassificationPreprocessor',
'SingleSentenceClassificationPreprocessor', 'FillMaskPreprocessor',
'ZeroShotClassificationPreprocessor', 'NERPreprocessor',
'TextErrorCorrectionPreprocessor'
]
@@ -38,99 +38,6 @@ class Tokenize(Preprocessor):
return data


class NLPPreprocessorBase(Preprocessor):

def __init__(self, model_dir: str, *args, **kwargs):
"""preprocess the data via the vocab.txt from the `model_dir` path

Args:
model_dir (str): model path
"""

super().__init__(*args, **kwargs)
self.model_dir: str = model_dir
self.first_sequence: str = kwargs.pop('first_sequence',
'first_sequence')
self.second_sequence = kwargs.pop('second_sequence', 'second_sequence')
self.tokenize_kwargs = kwargs
self.tokenizer = self.build_tokenizer(model_dir)
self.label2id = parse_label_mapping(self.model_dir)

def build_tokenizer(self, model_dir):
from sofa import SbertTokenizer
return SbertTokenizer.from_pretrained(model_dir)

@type_assert(object, object)
def __call__(self, data: Union[str, tuple, Dict]) -> Dict[str, Any]:
"""process the raw input data

Args:
data (tuple): [sentence1, sentence2]
sentence1 (str): a sentence
Example:
'you are so handsome.'
sentence2 (str): a sentence
Example:
'you are so beautiful.'
Returns:
Dict[str, Any]: the preprocessed data
"""

text_a, text_b = None, None
if isinstance(data, str):
text_a = data
elif isinstance(data, tuple):
assert len(data) == 2
text_a, text_b = data
elif isinstance(data, dict):
text_a = data.get(self.first_sequence)
text_b = data.get(self.second_sequence, None)

rst = self.tokenizer(text_a, text_b, **self.tokenize_kwargs)
if self._mode == ModeKeys.TRAIN:
rst = {k: v.squeeze() for k, v in rst.items()}
if self.label2id is not None and 'label' in data:
rst['label'] = self.label2id[str(data['label'])]
return rst


@PREPROCESSORS.register_module(
Fields.nlp, module_name=Preprocessors.nli_tokenizer)
class NLIPreprocessor(NLPPreprocessorBase):

def __init__(self, model_dir: str, *args, **kwargs):
kwargs['truncation'] = True
kwargs['padding'] = False
kwargs['return_tensors'] = 'pt'
kwargs['max_length'] = kwargs.pop('sequence_length', 128)
super().__init__(model_dir, *args, **kwargs)


@PREPROCESSORS.register_module(
Fields.nlp, module_name=Preprocessors.sen_cls_tokenizer)
class SentimentClassificationPreprocessor(NLPPreprocessorBase):

def __init__(self, model_dir: str, *args, **kwargs):
kwargs['truncation'] = True
kwargs['padding'] = 'max_length'
kwargs['return_tensors'] = 'pt'
kwargs['max_length'] = kwargs.pop('sequence_length', 128)
super().__init__(model_dir, *args, **kwargs)


@PREPROCESSORS.register_module(
Fields.nlp, module_name=Preprocessors.sen_sim_tokenizer)
class SentenceSimilarityPreprocessor(NLPPreprocessorBase):

def __init__(self, model_dir: str, *args, **kwargs):
kwargs['truncation'] = True
kwargs['padding'] = False if 'padding' not in kwargs else kwargs[
'padding']
kwargs['return_tensors'] = 'pt'
kwargs['max_length'] = kwargs.pop('sequence_length', 128)
super().__init__(model_dir, *args, **kwargs)


@PREPROCESSORS.register_module(
Fields.nlp, module_name=Preprocessors.bert_seq_cls_tokenizer)
class SequenceClassificationPreprocessor(Preprocessor):
@@ -197,32 +104,193 @@ class SequenceClassificationPreprocessor(Preprocessor):
return rst


class NLPTokenizerPreprocessorBase(Preprocessor):

def __init__(self, model_dir: str, pair: bool, mode: str, **kwargs):
"""preprocess the data via the vocab.txt from the `model_dir` path

Args:
model_dir (str): model path
"""

super().__init__(**kwargs)
self.model_dir: str = model_dir
self.first_sequence: str = kwargs.pop('first_sequence',
'first_sequence')
self.second_sequence = kwargs.pop('second_sequence', 'second_sequence')
self.pair = pair
self._mode = mode
self.label = kwargs.pop('label', OutputKeys.LABEL)
self.label2id = None
if 'label2id' in kwargs:
self.label2id = kwargs.pop('label2id')
if self.label2id is None:
self.label2id = parse_label_mapping(self.model_dir)

self.tokenize_kwargs = kwargs
self.tokenizer = self.build_tokenizer(model_dir)

@property
def id2label(self):
if self.label2id is not None:
return {id: label for label, id in self.label2id.items()}
return None

def build_tokenizer(self, model_dir):
model_type = get_model_type(model_dir)
if model_type in (Models.structbert, Models.gpt3, Models.palm):
from modelscope.models.nlp.structbert import SbertTokenizerFast
return SbertTokenizerFast.from_pretrained(model_dir)
elif model_type == Models.veco:
from modelscope.models.nlp.veco import VecoTokenizerFast
return VecoTokenizerFast.from_pretrained(model_dir)
else:
return AutoTokenizer.from_pretrained(model_dir)

def __call__(self, data: Union[str, Tuple, Dict]) -> Dict[str, Any]:
"""process the raw input data

Args:
data (tuple): [sentence1, sentence2]
sentence1 (str): a sentence
Example:
'you are so handsome.'
sentence2 (str): a sentence
Example:
'you are so beautiful.'
Returns:
Dict[str, Any]: the preprocessed data
"""

text_a, text_b, labels = self.parse_text_and_label(data)
output = self.tokenizer(
text_a,
text_b,
return_tensors='pt' if self._mode == ModeKeys.INFERENCE else None,
**self.tokenize_kwargs)
self.labels_to_id(labels, output)
return output

def parse_text_and_label(self, data):
text_a, text_b, labels = None, None, None
if isinstance(data, str):
text_a = data
elif isinstance(data, tuple) or isinstance(data, list):
if len(data) == 3:
text_a, text_b, labels = data
elif len(data) == 2:
if self.pair:
text_a, text_b = data
else:
text_a, labels = data
elif isinstance(data, dict):
text_a = data.get(self.first_sequence)
text_b = data.get(self.second_sequence)
labels = data.get(self.label)

return text_a, text_b, labels

def labels_to_id(self, labels, output):

def label_can_be_mapped(label):
return isinstance(label, str) or isinstance(label, int)

if labels is not None:
if isinstance(labels, Iterable) and all([label_can_be_mapped(label) for label in labels]) \
and self.label2id is not None:
output[OutputKeys.LABEL] = [
self.label2id[str(label)] for label in labels
]
elif label_can_be_mapped(labels) and self.label2id is not None:
output[OutputKeys.LABEL] = self.label2id[str(labels)]
else:
output[OutputKeys.LABEL] = labels
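
A hedged usage sketch of the new tokenizer base class through one of the subclasses registered below; the model directory is a placeholder, and the dict keys follow the default first_sequence/second_sequence/label names:

from modelscope.preprocessors import PairSentenceClassificationPreprocessor
from modelscope.utils.constant import ModeKeys

# '<local-structbert-dir>' is a placeholder for a model directory holding the
# tokenizer files and, optionally, a label mapping.
preprocessor = PairSentenceClassificationPreprocessor(
    '<local-structbert-dir>', mode=ModeKeys.TRAIN)

# str, tuple/list and dict inputs are all accepted by parse_text_and_label;
# in train mode the label value is mapped through label2id and stored under
# OutputKeys.LABEL by labels_to_id.
features = preprocessor({
    'first_sequence': 'you are so handsome.',
    'second_sequence': 'you are so beautiful.',
    'label': '1',
})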


@PREPROCESSORS.register_module(
Fields.nlp, module_name='bert-seq-cls-tokenizer-finetune')
class SentenceSimilarityFinetunePreprocessor(SentenceSimilarityPreprocessor):
"""Sentence similarity preprocessor in the finetune scenario

Mainly added the label mapping procedure.
"""

def __init__(self, model_dir: str, *args, **kwargs):
kwargs['padding'] = 'max_length'
super().__init__(model_dir, *args, **kwargs)

@PREPROCESSORS.register_module(
Fields.nlp, module_name=Preprocessors.nli_tokenizer)
@PREPROCESSORS.register_module(
Fields.nlp, module_name=Preprocessors.sen_sim_tokenizer)
class PairSentenceClassificationPreprocessor(NLPTokenizerPreprocessorBase):

def __init__(self, model_dir: str, mode=ModeKeys.INFERENCE, **kwargs):
kwargs['truncation'] = kwargs.get('truncation', True)
kwargs['padding'] = kwargs.get(
'padding', False if mode == 'inference' else 'max_length')
kwargs['max_length'] = kwargs.pop('sequence_length', 128)
super().__init__(model_dir, pair=True, mode=mode, **kwargs)

@PREPROCESSORS.register_module(
Fields.nlp, module_name=Preprocessors.sen_cls_tokenizer)
class SingleSentenceClassificationPreprocessor(NLPTokenizerPreprocessorBase):

def __init__(self, model_dir: str, mode=ModeKeys.INFERENCE, **kwargs):
kwargs['truncation'] = kwargs.get('truncation', True)
kwargs['padding'] = kwargs.get(
'padding', False if mode == 'inference' else 'max_length')
kwargs['max_length'] = kwargs.pop('sequence_length', 128)
super().__init__(model_dir, pair=False, mode=mode, **kwargs)


@PREPROCESSORS.register_module(
Fields.nlp, module_name=Preprocessors.zero_shot_cls_tokenizer)
class ZeroShotClassificationPreprocessor(NLPTokenizerPreprocessorBase):

def __init__(self, model_dir: str, mode=ModeKeys.INFERENCE, **kwargs):
"""preprocess the data via the vocab.txt from the `model_dir` path

Args:
model_dir (str): model path
"""
self.sequence_length = kwargs.pop('sequence_length', 512)
super().__init__(model_dir, pair=False, mode=mode, **kwargs)

def __call__(self, data: Union[str, Dict], hypothesis_template: str,
candidate_labels: list) -> Dict[str, Any]:
"""process the raw input data

Args:
data (str or dict): a sentence
Example:
'you are so handsome.'

Returns:
Dict[str, Any]: the preprocessed data
"""
if isinstance(data, dict):
data = data.get(self.first_sequence)

pairs = [[data, hypothesis_template.format(label)]
for label in candidate_labels]

features = self.tokenizer(
pairs,
padding=True,
truncation=True,
max_length=self.sequence_length,
truncation_strategy='only_first',
return_tensors='pt' if self._mode == ModeKeys.INFERENCE else None)
return features
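
For the zero-shot preprocessor, each candidate label is combined with the input sentence through the hypothesis template before tokenization. A hedged usage sketch with a placeholder model directory:

from modelscope.preprocessors import ZeroShotClassificationPreprocessor

preprocessor = ZeroShotClassificationPreprocessor('<local-structbert-dir>')
features = preprocessor(
    'you are so handsome.',
    hypothesis_template='This sentence is {}.',
    candidate_labels=['a compliment', 'an insult'])
# features holds a batch of two sentence pairs, padded to a common length.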


@PREPROCESSORS.register_module(
Fields.nlp, module_name=Preprocessors.text_gen_tokenizer)
class TextGenerationPreprocessor(NLPPreprocessorBase):
class TextGenerationPreprocessor(NLPTokenizerPreprocessorBase):

def __init__(self, model_dir: str, tokenizer=None, *args, **kwargs):
def __init__(self,
model_dir: str,
tokenizer=None,
mode=ModeKeys.INFERENCE,
**kwargs):
self.tokenizer = self.build_tokenizer(
model_dir) if tokenizer is None else tokenizer
kwargs['truncation'] = True
kwargs['padding'] = True
kwargs['return_tensors'] = 'pt'
kwargs['return_token_type_ids'] = False
kwargs['truncation'] = kwargs.get('truncation', True)
kwargs['padding'] = kwargs.get('padding', True)
kwargs['return_token_type_ids'] = kwargs.get('return_token_type_ids',
False)
kwargs['max_length'] = kwargs.pop('sequence_length', 128)
super().__init__(model_dir, *args, **kwargs)
super().__init__(model_dir, pair=False, mode=mode, **kwargs)

@staticmethod
def get_roberta_tokenizer_dir(model_dir: str) -> Optional[str]:
@@ -240,19 +308,13 @@ class TextGenerationPreprocessor(NLPPreprocessorBase):
roberta_tokenizer_dir, do_lower_case=False)
return super().build_tokenizer(model_dir)


@PREPROCESSORS.register_module(
Fields.nlp, module_name='palm-text-gen-tokenizer-finetune')
class TextGenerationFinetunePreprocessor(TextGenerationPreprocessor):

@type_assert(object, dict)
def __call__(self, data: dict) -> Dict[str, Any]:
def __call__(self, data: Union[Dict, str]) -> Dict[str, Any]:
if self._mode == 'inference':
return super().__call__(data)
src_txt = data['src_txt']
tgt_txt = data['tgt_txt']
src_rst = super().__call__(src_txt)
tgt_rst = super().__call__(tgt_txt)
src_rst = {k: v.squeeze() for k, v in src_rst.items()}
tgt_rst = {k: v.squeeze() for k, v in tgt_rst.items()}

return {
'src': src_rst['input_ids'],
@@ -261,87 +323,69 @@ class TextGenerationFinetunePreprocessor(TextGenerationPreprocessor):
}


@PREPROCESSORS.register_module(Fields.nlp)
class FillMaskPreprocessor(NLPPreprocessorBase):
@PREPROCESSORS.register_module(Fields.nlp, module_name=Preprocessors.fill_mask)
class FillMaskPreprocessor(NLPTokenizerPreprocessorBase):

def __init__(self, model_dir: str, *args, **kwargs):
kwargs['truncation'] = True
kwargs['padding'] = 'max_length'
kwargs['return_tensors'] = 'pt'
def __init__(self, model_dir: str, mode=ModeKeys.INFERENCE, **kwargs):
kwargs['truncation'] = kwargs.get('truncation', True)
kwargs['padding'] = kwargs.get('padding', 'max_length')
kwargs['max_length'] = kwargs.pop('sequence_length', 128)
kwargs['return_token_type_ids'] = True
super().__init__(model_dir, *args, **kwargs)

def build_tokenizer(self, model_dir):
from modelscope.utils.hub import get_model_type
model_type = get_model_type(model_dir)
if model_type in ['sbert', 'structbert', 'bert']:
from sofa import SbertTokenizer
return SbertTokenizer.from_pretrained(model_dir, use_fast=False)
elif model_type == 'veco':
from sofa import VecoTokenizer
return VecoTokenizer.from_pretrained(model_dir, use_fast=False)
else:
# TODO Only support veco & sbert
raise RuntimeError(f'Unsupported model type: {model_type}')
kwargs['return_token_type_ids'] = kwargs.get('return_token_type_ids',
True)
super().__init__(model_dir, pair=False, mode=mode, **kwargs)


@PREPROCESSORS.register_module(
Fields.nlp, module_name=Preprocessors.token_cls_tokenizer)
class TokenClassificationPreprocessor(NLPPreprocessorBase):

def __init__(self, model_dir: str, *args, **kwargs):
super().__init__(model_dir, *args, **kwargs)

@type_assert(object, str)
def __call__(self, data: Union[str, Dict]) -> Dict[str, Any]:
"""process the raw input data
Fields.nlp,
module_name=Preprocessors.word_segment_text_to_label_preprocessor)
class WordSegmentationBlankSetToLabelPreprocessor(Preprocessor):

Args:
data (str): a sentence
Example:
'you are so handsome.'

Returns:
Dict[str, Any]: the preprocessed data
"""

# preprocess the data for the model input
if isinstance(data, dict):
data = data[self.first_sequence]
text = data.replace(' ', '').strip()
tokens = []
for token in text:
token = self.tokenizer.tokenize(token)
tokens.extend(token)
input_ids = self.tokenizer.convert_tokens_to_ids(tokens)
input_ids = self.tokenizer.build_inputs_with_special_tokens(input_ids)
attention_mask = [1] * len(input_ids)
token_type_ids = [0] * len(input_ids)
def __init__(self, **kwargs):
super().__init__(**kwargs)
self.first_sequence: str = kwargs.pop('first_sequence',
'first_sequence')
self.label = kwargs.pop('label', OutputKeys.LABELS)

def __call__(self, data: str) -> Union[Dict[str, Any], Tuple]:
data = data.split(' ')
data = list(filter(lambda x: len(x) > 0, data))

def produce_train_sample(words):
chars = []
labels = []
for word in words:
chars.extend(list(word))
if len(word) == 1:
labels.append('S-CWS')
else:
labels.extend(['B-CWS'] + ['I-CWS'] * (len(word) - 2)
+ ['E-CWS'])
assert len(chars) == len(labels)
return chars, labels

chars, labels = produce_train_sample(data)
return {
'text': text,
'input_ids': input_ids,
'attention_mask': attention_mask,
'token_type_ids': token_type_ids
self.first_sequence: chars,
self.label: labels,
}


@PREPROCESSORS.register_module(
Fields.nlp, module_name=Preprocessors.zero_shot_cls_tokenizer)
class ZeroShotClassificationPreprocessor(NLPPreprocessorBase):

def __init__(self, model_dir: str, *args, **kwargs):
"""preprocess the data via the vocab.txt from the `model_dir` path
Fields.nlp, module_name=Preprocessors.token_cls_tokenizer)
class TokenClassificationPreprocessor(NLPTokenizerPreprocessorBase):

Args:
model_dir (str): model path
"""
self.sequence_length = kwargs.pop('sequence_length', 512)
super().__init__(model_dir, *args, **kwargs)
def __init__(self, model_dir: str, mode=ModeKeys.INFERENCE, **kwargs):
kwargs['truncation'] = kwargs.get('truncation', True)
kwargs['padding'] = kwargs.get(
'padding', False if mode == ModeKeys.INFERENCE else 'max_length')
kwargs['max_length'] = kwargs.pop('sequence_length', 128)
kwargs['is_split_into_words'] = kwargs.pop(
'is_split_into_words',
False if mode == ModeKeys.INFERENCE else True)
self.label_all_tokens = kwargs.pop('label_all_tokens', False)
super().__init__(model_dir, pair=False, mode=mode, **kwargs)

@type_assert(object, str)
def __call__(self, data, hypothesis_template: str,
candidate_labels: list) -> Dict[str, Any]:
def __call__(self, data: Union[str, Dict]) -> Dict[str, Any]:
"""process the raw input data

Args:
@@ -352,20 +396,74 @@ class ZeroShotClassificationPreprocessor(NLPPreprocessorBase):
Returns:
Dict[str, Any]: the preprocessed data
"""
if isinstance(data, dict):
data = data.get(self.first_sequence)

pairs = [[data, hypothesis_template.format(label)]
for label in candidate_labels]

features = self.tokenizer(
pairs,
padding=True,
truncation=True,
max_length=self.sequence_length,
return_tensors='pt',
truncation_strategy='only_first')
return features
# preprocess the data for the model input
# if isinstance(data, dict):
# data = data[self.first_sequence]
# text = data.replace(' ', '').strip()
# tokens = []
# for token in text:
# token = self.tokenizer.tokenize(token)
# tokens.extend(token)
# input_ids = self.tokenizer.convert_tokens_to_ids(tokens)
# input_ids = self.tokenizer.build_inputs_with_special_tokens(input_ids)
# attention_mask = [1] * len(input_ids)
# token_type_ids = [0] * len(input_ids)

# new code to deal with labels
# tokenized_inputs = self.tokenizer(data, truncation=True, is_split_into_words=True)

text_a = None
labels_list = None
if isinstance(data, str):
text_a = data
elif isinstance(data, dict):
text_a = data.get(self.first_sequence)
labels_list = data.get(self.label)
tokenized_inputs = self.tokenizer(
text_a,
return_tensors='pt' if self._mode == ModeKeys.INFERENCE else None,
**self.tokenize_kwargs)

if labels_list is not None:
assert self.label2id is not None
# Map that sends B-Xxx label to its I-Xxx counterpart
b_to_i_label = []
label_enumerate_values = [
k for k, v in sorted(
self.label2id.items(), key=lambda item: item[1])
]
for idx, label in enumerate(label_enumerate_values):
if label.startswith('B-') and label.replace(
'B-', 'I-') in label_enumerate_values:
b_to_i_label.append(
label_enumerate_values.index(
label.replace('B-', 'I-')))
else:
b_to_i_label.append(idx)

label_row = [self.label2id[lb] for lb in labels_list]
word_ids = tokenized_inputs.word_ids()
previous_word_idx = None
label_ids = []
for word_idx in word_ids:
if word_idx is None:
label_ids.append(-100)
elif word_idx != previous_word_idx:
label_ids.append(label_row[word_idx])
else:
if self.label_all_tokens:
label_ids.append(b_to_i_label[label_row[word_idx]])
else:
label_ids.append(-100)
previous_word_idx = word_idx
labels = label_ids
tokenized_inputs['labels'] = labels
# new code end

if self._mode == ModeKeys.INFERENCE:
tokenized_inputs[OutputKeys.TEXT] = text_a
return tokenized_inputs
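
The alignment loop above spreads word-level labels onto sub-token positions and masks everything else with -100 so it is ignored by the loss. A toy illustration with label_all_tokens=False (the word_ids sequence is made up):

# label ids for three words, and a word_ids() result from a fast tokenizer
label_row = [0, 1, 2]
word_ids = [None, 0, 0, 1, 2, 2, None]
label_ids, previous = [], None
for word_idx in word_ids:
    if word_idx is None:
        label_ids.append(-100)               # special tokens
    elif word_idx != previous:
        label_ids.append(label_row[word_idx])
    else:
        label_ids.append(-100)               # later sub-tokens of the same word
    previous = word_idx
print(label_ids)                             # [-100, 0, -100, 1, 2, -100, -100]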


@PREPROCESSORS.register_module(


+ 1
- 1
modelscope/preprocessors/space/dialog_state_tracking_preprocessor.py View File

@@ -24,7 +24,7 @@ class DialogStateTrackingPreprocessor(Preprocessor):
"""
super().__init__(*args, **kwargs)

from sofa.models.space import SpaceConfig, SpaceTokenizer
from modelscope.models.nlp.space import SpaceConfig, SpaceTokenizer
self.model_dir: str = model_dir
self.config = SpaceConfig.from_pretrained(self.model_dir)
self.tokenizer = SpaceTokenizer.from_pretrained(self.model_dir)


+ 2
- 0
modelscope/task_datasets/__init__.py View File

@@ -7,12 +7,14 @@ if TYPE_CHECKING:
from .base import TaskDataset
from .builder import TASK_DATASETS, build_task_dataset
from .torch_base_dataset import TorchTaskDataset
from .veco_dataset import VecoDataset

else:
_import_structure = {
'base': ['TaskDataset'],
'builder': ['TASK_DATASETS', 'build_task_dataset'],
'torch_base_dataset': ['TorchTaskDataset'],
'veco_dataset': ['VecoDataset'],
}
import sys



+ 3
- 3
modelscope/task_datasets/base.py View File

@@ -1,6 +1,6 @@
# Copyright (c) Alibaba, Inc. and its affiliates.
from abc import ABC, abstractmethod
from typing import Any, List, Tuple
from typing import Any, List, Tuple, Union


class TaskDataset(ABC):
@@ -8,7 +8,7 @@ class TaskDataset(ABC):
"""

def __init__(self,
datasets: Tuple[Any, List[Any]],
datasets: Union[Any, List[Any]],
mode,
preprocessor=None,
**kwargs):
@@ -18,7 +18,7 @@ class TaskDataset(ABC):
self._inner_dataset = self.prepare_dataset(datasets)

@abstractmethod
def prepare_dataset(self, datasets: Tuple[Any, List[Any]]) -> Any:
def prepare_dataset(self, datasets: Union[Any, List[Any]]) -> Any:
"""Prepare a dataset.

Users can process the input datasets from a whole-dataset perspective.


+ 3
- 3
modelscope/task_datasets/torch_base_dataset.py View File

@@ -1,5 +1,5 @@
# Copyright (c) Alibaba, Inc. and its affiliates.
from typing import Any, List, Tuple
from typing import Any, List, Tuple, Union

from torch.utils.data import ConcatDataset, Dataset

@@ -14,7 +14,7 @@ class TorchTaskDataset(TaskDataset, Dataset):
"""

def __init__(self,
datasets: Tuple[Any, List[Any]],
datasets: Union[Any, List[Any]],
mode,
preprocessor=None,
**kwargs):
@@ -26,7 +26,7 @@ class TorchTaskDataset(TaskDataset, Dataset):
def __len__(self):
return len(self._inner_dataset)

def prepare_dataset(self, datasets: Tuple[Any, List[Any]]) -> Any:
def prepare_dataset(self, datasets: Union[Any, List[Any]]) -> Any:
"""Prepare a dataset.

Users can process the input datasets from a whole-dataset perspective.


+ 76
- 0
modelscope/task_datasets/veco_dataset.py View File

@@ -0,0 +1,76 @@
# Copyright (c) Alibaba, Inc. and its affiliates.
from typing import Any, List, Union

import numpy as np
from datasets import Dataset, IterableDataset, concatenate_datasets

from modelscope.metainfo import Models
from modelscope.utils.constant import Tasks
from .builder import TASK_DATASETS
from .torch_base_dataset import TorchTaskDataset


@TASK_DATASETS.register_module(module_name=Models.veco, group_key=Tasks.nli)
class VecoDataset(TorchTaskDataset):

def __init__(self,
datasets: Union[Any, List[Any]],
mode,
preprocessor=None,
**kwargs):
self.seed = kwargs.get('seed', 42)
self.permutation = None
self.datasets = None
super().__init__(datasets, mode, preprocessor, **kwargs)

def switch_dataset(self, idx):
"""Switch dataset in evaluation.

Veco evaluates its datasets one by one.

Args:
idx: The index of the dataset
"""
if self.mode == 'train':
raise ValueError(
'Switching datasets is only supported in the evaluation loop')
if idx >= len(self.datasets):
raise ValueError(
'Index is larger than the number of datasets.')
self._inner_dataset = self.datasets[idx]

def __getitem__(self, item):
if self.permutation is not None:
item = self.permutation[item]
return super().__getitem__(item)

def prepare_dataset(self, datasets: Union[Any, List[Any]]) -> Any:
"""Compose all the datasets.

If the mode is 'train', all datasets will be mixed together; if the mode is 'eval',
the datasets will be kept separate and the first one will be returned.

Args:
datasets: The datasets to be composed.

Returns: The final dataset.
"""
if not isinstance(datasets, (list, tuple)):
datasets = [datasets]
if self.mode == 'train':
if len(datasets) == 1:
return datasets[0]
elif all([
isinstance(dataset, (Dataset, IterableDataset))
for dataset in datasets
]):
dataset = concatenate_datasets(list(datasets))
return dataset.shuffle(seed=self.seed)
else:
generator = np.random.default_rng(self.seed)
_len = sum([len(dataset) for dataset in datasets])
self.permutation = generator.permutation(_len)
return super().prepare_dataset(datasets)
else:
self.datasets = datasets
return self.datasets[0]
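
A hedged sketch of how VecoDataset treats multiple splits: in train mode Hugging Face datasets are concatenated and shuffled, in eval mode they are kept separate and switched one by one. The toy datasets stand in for real NLI subsets:

from datasets import Dataset
from modelscope.task_datasets import VecoDataset
from modelscope.utils.constant import ModeKeys

ds_en = Dataset.from_dict({'premise': ['a'], 'hypothesis': ['b'], 'label': [0]})
ds_fr = Dataset.from_dict({'premise': ['c'], 'hypothesis': ['d'], 'label': [1]})

train_ds = VecoDataset([ds_en, ds_fr], mode=ModeKeys.TRAIN)  # concatenated + shuffled
eval_ds = VecoDataset([ds_en, ds_fr], mode=ModeKeys.EVAL)    # starts on the first split
eval_ds.switch_dataset(1)                                    # move to the second split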

+ 1
- 0
modelscope/trainers/__init__.py View File

@@ -4,4 +4,5 @@ from .cv import (ImageInstanceSegmentationTrainer,
ImagePortraitEnhancementTrainer)
from .multi_modal import CLIPTrainer
from .nlp import SequenceClassificationTrainer
from .nlp_trainer import NlpEpochBasedTrainer, VecoTrainer
from .trainer import EpochBasedTrainer

+ 1
- 0
modelscope/trainers/hooks/evaluation_hook.py View File

@@ -32,6 +32,7 @@ class EvaluationHook(Hook):
def do_evaluate(self, trainer):
"""Evaluate the results."""
eval_res = trainer.evaluate()
trainer.data_loader = trainer.train_dataloader
for name, val in eval_res.items():
trainer.log_buffer.output[name] = val



+ 5
- 3
modelscope/trainers/hooks/lr_scheduler_hook.py View File

@@ -21,9 +21,6 @@ class LrSchedulerHook(Hook):
def __init__(self, by_epoch=True, warmup=None) -> None:
super().__init__()
self.by_epoch = by_epoch
if not self.by_epoch:
raise ValueError('We only support ``by_epoch=True`` now!')

self.warmup = warmup
self.warmup_lr_scheduler = None

@@ -49,6 +46,11 @@ class LrSchedulerHook(Hook):
return lr

def before_train_iter(self, trainer):
if not self.by_epoch:
if self.warmup_lr_scheduler is not None:
self.warmup_lr_scheduler.step()
else:
trainer.lr_scheduler.step()
trainer.log_buffer.output[LogKeys.LR] = self._get_log_lr(trainer)

def before_train_epoch(self, trainer):
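
With by_epoch=False the hook now steps the (warmup) scheduler on every iteration instead of once per epoch. A generic PyTorch sketch of iteration-level stepping with a linear warmup; the optimizer and schedule are illustrative, not taken from this diff:

import torch

params = [torch.nn.Parameter(torch.zeros(1))]
optimizer = torch.optim.SGD(params, lr=0.1)
scheduler = torch.optim.lr_scheduler.LambdaLR(
    optimizer, lr_lambda=lambda step: min(1.0, (step + 1) / 100))  # linear warmup

for step in range(300):
    optimizer.step()
    scheduler.step()  # stepped every training iteration, not every epoch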


+ 192
- 0
modelscope/trainers/nlp_trainer.py View File

@@ -0,0 +1,192 @@
import os
from typing import Callable, Dict, Optional, Tuple, Union

import torch
from torch import nn
from torch.utils.data import Dataset

from modelscope.hub.snapshot_download import snapshot_download
from modelscope.metrics.builder import build_metric
from modelscope.models.base import Model, TorchModel
from modelscope.msdatasets import MsDataset
from modelscope.preprocessors import Preprocessor, build_preprocessor
from modelscope.utils.config import Config, ConfigDict
from modelscope.utils.constant import (DEFAULT_MODEL_REVISION, ModeKeys,
ModelFile, Tasks)
from .base import TRAINERS
from .trainer import EpochBasedTrainer


@TRAINERS.register_module(module_name='NlpEpochBasedTrainer')
class NlpEpochBasedTrainer(EpochBasedTrainer):

def __init__(
self,
model: Optional[Union[TorchModel, nn.Module, str]] = None,
cfg_file: Optional[str] = None,
cfg_modify_fn: Optional[Callable] = None,
arg_parse_fn: Optional[Callable] = None,
data_collator: Optional[Callable] = None,
train_dataset: Optional[Union[MsDataset, Dataset]] = None,
eval_dataset: Optional[Union[MsDataset, Dataset]] = None,
preprocessor: Optional[Preprocessor] = None,
optimizers: Tuple[torch.optim.Optimizer,
torch.optim.lr_scheduler._LRScheduler] = (None,
None),
model_revision: Optional[str] = DEFAULT_MODEL_REVISION,
**kwargs):
"""Add code to adapt with nlp models.

Args:
cfg_modify_fn: A function used to modify the cfg read from the configuration file.
"""

if isinstance(model, str):
if os.path.exists(model):
model_dir = model if os.path.isdir(model) else os.path.dirname(
model)
else:
model_dir = snapshot_download(model, revision=model_revision)
cfg_file = os.path.join(model_dir, ModelFile.CONFIGURATION)
else:
assert cfg_file is not None, 'Config file should not be None if model is an nn.Module class'
model_dir = os.path.dirname(cfg_file)

self.cfg_modify_fn = cfg_modify_fn
self.cfg = self.rebuild_config(Config.from_file(cfg_file))
try:
labels = self.cfg.dataset.train.labels
except AttributeError:
labels = None

self.label2id = None
self.num_labels = None
if labels is not None and len(labels) > 0:
self.label2id = {label: idx for idx, label in enumerate(labels)}
self.id2label = {idx: label for idx, label in enumerate(labels)}
self.num_labels = len(labels)

def build_dataset_keys(cfg):
if cfg is not None:
input_keys = {
'first_sequence': getattr(cfg, 'first_sequence', None),
'second_sequence': getattr(cfg, 'second_sequence', None),
'label': getattr(cfg, 'label', None),
}
else:
input_keys = {}

return {k: v for k, v in input_keys.items() if v is not None}

self.train_keys = build_dataset_keys(
self.cfg.dataset.train if hasattr(self.cfg, 'dataset')
and hasattr(self.cfg.dataset, 'train') else None)
# TODO eval may have special keys, which are not supported yet,
# because there is only one preprocessor in the trainer and it only supports one group of keys.
self.eval_keys = self.train_keys

super().__init__(
model=model_dir,
cfg_file=cfg_file,
arg_parse_fn=arg_parse_fn,
data_collator=data_collator,
preprocessor=preprocessor,
optimizers=optimizers,
model_revision=model_revision,
train_dataset=train_dataset,
eval_dataset=eval_dataset,
**kwargs)

def rebuild_config(self, cfg: Config):
if self.cfg_modify_fn is not None:
return self.cfg_modify_fn(cfg)
return cfg

def build_model(self) -> Union[nn.Module, TorchModel]:
""" Instantiate a pytorch model and return.

By default, we will create a model using config from configuration file. You can
override this method in a subclass.

"""
model_args = {} if self.num_labels is None else {
'num_labels': self.num_labels
}
model = Model.from_pretrained(
self.model_dir, cfg_dict=self.cfg, **model_args)
if not isinstance(model, nn.Module) and hasattr(model, 'model'):
return model.model
elif isinstance(model, nn.Module):
return model

def build_preprocessor(self) -> Preprocessor:
"""Build the preprocessor.

Users can override this method to implement custom logic.

Returns: The preprocessor instance.

"""
model_args = {} if self.label2id is None else {
'label2id': self.label2id
}
cfg = ConfigDict({
**getattr(self.cfg, 'preprocessor'),
'model_dir':
self.model_dir,
**model_args,
'mode':
ModeKeys.TRAIN,
**self.train_keys,
})
return build_preprocessor(cfg, Tasks.find_field_by_task(self.cfg.task))
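
A hedged setup sketch for the new trainer. The model id, label names and the config fields touched in cfg_modify_fn are placeholders, and the fn assumes the model's configuration already contains a dataset.train section:

from modelscope.trainers import NlpEpochBasedTrainer

def cfg_modify_fn(cfg):
    # Illustrative only: fill in the fields read by __init__ above so that
    # label2id/num_labels can be built and passed to preprocessor and model.
    cfg.dataset.train.labels = ['0', '1']
    cfg.dataset.train.first_sequence = 'sentence1'
    cfg.dataset.train.second_sequence = 'sentence2'
    cfg.dataset.train.label = 'label'
    return cfg

trainer = NlpEpochBasedTrainer(
    model='<structbert-model-id-or-local-dir>',  # placeholder
    cfg_modify_fn=cfg_modify_fn,
    train_dataset=None,  # replace with an MsDataset or torch Dataset
    eval_dataset=None)
# trainer.train() would then run the finetune loop configured above.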


@TRAINERS.register_module(module_name='VecoTrainer')
class VecoTrainer(NlpEpochBasedTrainer):

def evaluate(self, checkpoint_path=None):
"""Veco evaluates the datasets one by one.

"""
from modelscope.task_datasets import VecoDataset
self.model.eval()
self._mode = ModeKeys.EVAL
metric_values = {}

if self.eval_dataset is None:
val_data = self.cfg.dataset.val
self.eval_dataset = self.build_dataset(
val_data, mode=ModeKeys.EVAL)

idx = 0
dataset_cnt = 1
if isinstance(self.eval_dataset, VecoDataset):
self.eval_dataset.switch_dataset(idx)
dataset_cnt = len(self.eval_dataset.datasets)

while True:
self.eval_dataloader = self._build_dataloader_with_dataset(
self.eval_dataset, **self.cfg.evaluation.get('dataloader', {}))
self.data_loader = self.eval_dataloader

metric_classes = [
build_metric(metric, default_args={'trainer': self})
for metric in self.metrics
]
self.evaluation_loop(self.eval_dataloader, checkpoint_path,
metric_classes)

for m_idx, metric_cls in enumerate(metric_classes):
if f'eval_dataset[{idx}]' not in metric_values:
metric_values[f'eval_dataset[{idx}]'] = {}
metric_values[f'eval_dataset[{idx}]'][
self.metrics[m_idx]] = metric_cls.evaluate()

idx += 1
if idx < dataset_cnt:
self.eval_dataset.switch_dataset(idx)
else:
break

return metric_values
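
A hedged usage sketch: VecoTrainer.evaluate() walks the validation splits one by one and returns one metric dict per split, keyed eval_dataset[idx]. The model id is a placeholder, and the model's configuration is assumed to define dataset.val:

from modelscope.trainers import VecoTrainer

trainer = VecoTrainer(model='<veco-model-id-or-local-dir>')
results = trainer.evaluate()
for split_name, metrics in results.items():
    print(split_name, metrics)  # e.g. eval_dataset[0] -> per-metric results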

+ 34
- 24
modelscope/trainers/trainer.py View File

@@ -22,7 +22,8 @@ from modelscope.models.base import Model, TorchModel
from modelscope.msdatasets.ms_dataset import MsDataset
from modelscope.preprocessors import build_preprocessor
from modelscope.preprocessors.base import Preprocessor
from modelscope.task_datasets import TorchTaskDataset, build_task_dataset
from modelscope.task_datasets.builder import build_task_dataset
from modelscope.task_datasets.torch_base_dataset import TorchTaskDataset
from modelscope.trainers.hooks.builder import HOOKS
from modelscope.trainers.hooks.priority import Priority, get_priority
from modelscope.trainers.lrscheduler.builder import build_lr_scheduler
@@ -30,12 +31,12 @@ from modelscope.trainers.optimizer.builder import build_optimizer
from modelscope.utils.config import Config, ConfigDict
from modelscope.utils.constant import (DEFAULT_MODEL_REVISION, Hubs, ModeKeys,
ModelFile, Tasks, TrainerStages)
from modelscope.utils.file_utils import func_receive_dict_inputs
from modelscope.utils.logger import get_logger
from modelscope.utils.registry import build_from_cfg
from modelscope.utils.tensor_utils import torch_default_data_collator
from modelscope.utils.torch_utils import (broadcast, create_device,
get_dist_info, init_dist)
from modelscope.utils.utils import if_func_receive_dict_inputs
from .base import BaseTrainer
from .builder import TRAINERS
from .default_config import DEFAULT_CONFIG
@@ -87,6 +88,7 @@ class EpochBasedTrainer(BaseTrainer):
None),
model_revision: Optional[str] = DEFAULT_MODEL_REVISION,
**kwargs):

if isinstance(model, str):
if os.path.exists(model):
self.model_dir = model if os.path.isdir(
@@ -108,9 +110,9 @@ class EpochBasedTrainer(BaseTrainer):
self.model = model

super().__init__(cfg_file, arg_parse_fn)

# add default config
self.cfg.merge_from_dict(self._get_default_config(), force=False)
self.cfg = self.rebuild_config(self.cfg)

if 'work_dir' in kwargs:
self.work_dir = kwargs['work_dir']
@@ -130,9 +132,9 @@ class EpochBasedTrainer(BaseTrainer):
self.device = create_device(device_name == 'cpu')

self.train_dataset = self.to_task_dataset(
train_dataset, mode='train', preprocessor=self.preprocessor)
train_dataset, mode=ModeKeys.TRAIN, preprocessor=self.preprocessor)
self.eval_dataset = self.to_task_dataset(
eval_dataset, mode='eval', preprocessor=self.preprocessor)
eval_dataset, mode=ModeKeys.EVAL, preprocessor=self.preprocessor)

self.data_collator = data_collator if data_collator is not None else torch_default_data_collator
self.metrics = self.get_metrics()
@@ -168,6 +170,14 @@ class EpochBasedTrainer(BaseTrainer):
if not is_parallel(self.model) and self._dist:
self.model = self.to_parallel(self.model)

def rebuild_config(self, cfg: Config):
"""A method used to rebuild the config, any subclass can override this method.

Returns: The rebuilt config

"""
return cfg

@property
def mode(self):
return self._mode
@@ -203,7 +213,7 @@ class EpochBasedTrainer(BaseTrainer):
return self._max_epochs * len(self.data_loader)

def to_task_dataset(self,
datasets: Tuple[Dataset, List[Dataset]],
datasets: Union[Dataset, List[Dataset]],
mode: str,
preprocessor: Optional[Preprocessor] = None):
"""Build the task specific dataset processor for this trainer.
@@ -229,17 +239,13 @@ class EpochBasedTrainer(BaseTrainer):
cfg = ConfigDict(
type=self.cfg.task, mode=mode, datasets=datasets)
return build_task_dataset(cfg, self.cfg.task)
elif isinstance(datasets,
Dataset) or (isinstance(datasets, List)
and isinstance(datasets[0], Dataset)):
else:
cfg = ConfigDict(
type=self.cfg.model.type, mode=mode, datasets=datasets)
type=self.cfg.model.type,
mode=mode,
datasets=datasets,
preprocessor=preprocessor)
return build_task_dataset(cfg, self.cfg.task)
else:
raise ValueError(
f'invalid datasets type: {type(datasets)}, '
f'expected `MsDataset`, `torch.utils.data.Dataset` or list of them.'
)
except Exception:
if isinstance(datasets, (List, Tuple)) or preprocessor is not None:
return TorchTaskDataset(
@@ -262,8 +268,11 @@ class EpochBasedTrainer(BaseTrainer):
# TODO @wenmeng.zwm @jiangnana.jnn add support for different preprocessor
# when they are different ones in training and evaluation
cfg = ConfigDict({
**getattr(self.cfg, 'preprocessor'), 'model_dir':
self.model_dir
**getattr(self.cfg, 'preprocessor'),
'model_dir':
self.model_dir,
'mode':
ModeKeys.TRAIN,
})
return build_preprocessor(cfg, Tasks.find_field_by_task(self.cfg.task))

@@ -324,6 +333,8 @@ class EpochBasedTrainer(BaseTrainer):
**self.cfg.evaluation.get('dataloader', {}))
self.data_loader = self.eval_dataloader
metric_classes = [build_metric(metric) for metric in self.metrics]
for m in metric_classes:
m.trainer = self
metric_values = self.evaluation_loop(self.eval_dataloader,
checkpoint_path, metric_classes)

@@ -338,10 +349,9 @@ class EpochBasedTrainer(BaseTrainer):
""" Instantiate a pytorch model and return.

By default, we will create a model using config from configuration file. You can
subclass and override this method in a subclass.
override this method in a subclass.

"""
# TODO temp implementation, waiting for @zhangzhicheng
model = Model.from_pretrained(self.model_dir)
if not isinstance(model, nn.Module) and hasattr(model, 'model'):
return model.model
@@ -412,9 +422,8 @@ class EpochBasedTrainer(BaseTrainer):
self._mode = ModeKeys.TRAIN
inputs = self.collate_fn(inputs)
# call model forward but not __call__ to skip postprocess
if isinstance(
inputs,
Mapping) and not if_func_receive_dict_inputs(model.forward):
if isinstance(inputs,
Mapping) and not func_receive_dict_inputs(model.forward):
train_outputs = model.forward(**inputs)
else:
train_outputs = model.forward(inputs)
@@ -495,7 +504,7 @@ class EpochBasedTrainer(BaseTrainer):
if self.eval_dataset is None:
val_data = self.cfg.dataset.val
self.eval_dataset = self.build_dataset(
val_data, mode=ModeKeys.TRAIN)
val_data, mode=ModeKeys.EVAL)

batch_size = self.cfg.evaluation.batch_size
workers = self.cfg.evaluation.workers
@@ -523,7 +532,8 @@ class EpochBasedTrainer(BaseTrainer):
)
torch_dataset = dataset.to_torch_dataset(
preprocessors=self.preprocessor, )
return torch_dataset
dataset = self.to_task_dataset(torch_dataset, mode)
return dataset

def create_optimizer_and_scheduler(self):
""" Create optimizer and lr scheduler


+ 17
- 14
modelscope/trainers/utils/inference.py View File

@@ -10,9 +10,9 @@ import torch
from torch import distributed as dist
from tqdm import tqdm

from modelscope.utils.file_utils import func_receive_dict_inputs
from modelscope.utils.torch_utils import (broadcast, get_dist_info, is_master,
make_tmp_dir)
from modelscope.utils.utils import if_func_receive_dict_inputs


def single_gpu_test(model,
@@ -37,18 +37,19 @@ def single_gpu_test(model,
if data_collate_fn is not None:
data = data_collate_fn(data)
with torch.no_grad():
if isinstance(data,
Mapping) and not if_func_receive_dict_inputs(
model.forward):

result = model(**data)
if isinstance(data, Mapping) and not func_receive_dict_inputs(
model.forward):
result = model.forward(**data)
else:
result = model(data)
result = model.forward(data)
if metric_classes is not None:
for metric_cls in metric_classes:
metric_cls.add(result, data)

batch_size = len(result)
if isinstance(data, dict):
batch_size = len(next(iter(data.values())))
else:
batch_size = len(data)
for _ in range(batch_size):
pbar.update()

@@ -101,16 +102,18 @@ def multi_gpu_test(model,
data = data_collate_fn(data)
data_list.append(data)
with torch.no_grad():
if isinstance(data,
Mapping) and not if_func_receive_dict_inputs(
model.forward):
result = model(**data)
if isinstance(data, Mapping) and not func_receive_dict_inputs(
model.forward):
result = model.forward(**data)
else:
result = model(data)
result = model.forward(data)
results.append(result)

if rank == 0:
batch_size = len(result)
if isinstance(data, dict):
batch_size = len(next(iter(data.values())))
else:
batch_size = len(data)
batch_size_all = batch_size * world_size
count += batch_size_all
if count > len(dataset):
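
The batch-size change above matters for dict batches, where len(data) would only count the keys; the size is now read from the leading dimension of the first tensor. A small illustration:

import torch

batch = {'input_ids': torch.zeros(8, 128), 'attention_mask': torch.ones(8, 128)}
print(len(batch))                       # 2, just the number of keys
print(len(next(iter(batch.values()))))  # 8, the actual batch size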


Some files were not shown because too many files changed in this diff
