`Learn the Basics <Basics.html>`_ ||
`Quick Start <Quick-Start.html>`_ ||
**Dataset & Data Structure** ||
`Learning Part <Learning.html>`_ ||
`Reasoning Part <Reasoning.html>`_ ||
`Evaluation Metrics <Evaluation.html>`_ ||
`Bridge <Bridge.html>`_

Dataset & Data Structure
========================

In this section, we will look at the dataset and data structure in ABL Kit.

.. code:: python

    import torch
    from ablkit.data.structures import ListData

Dataset
-------

ABL Kit requires user data to be structured either as a tuple ``(X, gt_pseudo_label, Y)`` or as a ``ListData`` object (the underlying data structure used in ABL Kit, cf. the next section) with ``X``, ``gt_pseudo_label`` and ``Y`` attributes. Regardless of the chosen format, the data should contain three essential components:

- ``X``: List[List[Any]]

  A list of sublists representing the input data. We refer to each sublist in ``X`` as an **example**, and each example may contain several **instances**.

- ``gt_pseudo_label``: List[List[Any]], optional

  A list of sublists, with each sublist holding the ground-truth pseudo-labels of one example. Each pseudo-label in a sublist is the ground truth for the corresponding **instance** within that example.

  .. note::

     ``gt_pseudo_label`` is only used to evaluate the performance of the learning part, not to train the model. If the instances in the dataset have no pseudo-label annotations, ``gt_pseudo_label`` should be ``None``.

- ``Y``: List[Any]

  A list representing the ground-truth reasoning result for each **example** in ``X``.

.. warning::

    ``X``, ``gt_pseudo_label`` (if not ``None``) and ``Y`` should all have the same length. Also, each sublist in ``gt_pseudo_label`` should have the same length as the corresponding sublist in ``X``.

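
The length constraints above can be checked before the data is handed to the toolkit. The following is a minimal sketch, not part of ABL Kit: ``check_dataset`` is a hypothetical helper, and the strings in ``X`` are placeholders standing in for real instances such as images.

.. code-block:: python

    from typing import Any, List, Optional, Tuple


    def check_dataset(
        X: List[List[Any]],
        gt_pseudo_label: Optional[List[List[Any]]],
        Y: List[Any],
    ) -> Tuple[int, int]:
        """Hypothetical helper: verify the length constraints described above.

        Returns (number of examples, total number of instances).
        """
        if len(X) != len(Y):
            raise ValueError(f"X has {len(X)} examples but Y has {len(Y)} entries")
        if gt_pseudo_label is not None:
            if len(gt_pseudo_label) != len(X):
                raise ValueError("gt_pseudo_label and X differ in length")
            for i, (example, labels) in enumerate(zip(X, gt_pseudo_label)):
                if len(example) != len(labels):
                    raise ValueError(
                        f"example {i} has {len(example)} instances "
                        f"but {len(labels)} pseudo-labels"
                    )
        return len(X), sum(len(example) for example in X)


    # Two examples with two instances each; strings stand in for images.
    X = [["img_a", "img_b"], ["img_c", "img_d"]]
    gt_pseudo_label = [[1, 2], [3, 4]]
    Y = [3, 7]

    n_examples, n_instances = check_dataset(X, gt_pseudo_label, Y)

    # Unlabeled instances: pass None instead of pseudo-labels.
    check_dataset(X, None, Y)
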
As an illustration, in the MNIST Addition task, the data are organized as follows:

.. image:: ../_static/img/Datasets_1.png
   :width: 350px
   :align: center

.. |data_example| image:: ../_static/img/data_example.png
   :alt: alternate text
   :scale: 8%

.. |instance| image:: ../_static/img/instance.png
   :alt: alternate text
   :scale: 55%

where each sublist in ``X``, e.g., |data_example|, is a data example and each image in the sublist, e.g., |instance|, is an instance.

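
To make the layout concrete, here is a small sketch that assembles data in this shape. It assumes the usual MNIST Addition setup, in which each example holds two digit images and the reasoning result is the sum of the two digits. ``make_addition_dataset`` and the string placeholders used instead of real images are illustrative only and are not part of ABL Kit.

.. code-block:: python

    import random


    def make_addition_dataset(digit_pool, num_examples, seed=0):
        """Hypothetical helper: pair up (image, digit) items into addition examples."""
        rng = random.Random(seed)
        X, gt_pseudo_label, Y = [], [], []
        for _ in range(num_examples):
            # Each example consists of two instances drawn from the pool.
            pair = [rng.choice(digit_pool) for _ in range(2)]
            X.append([image for image, _ in pair])
            gt_pseudo_label.append([digit for _, digit in pair])
            # The reasoning result of an example is the sum of its two digits.
            Y.append(sum(digit for _, digit in pair))
        return X, gt_pseudo_label, Y


    # Placeholder "images": a real pipeline would use MNIST image tensors here.
    digit_pool = [(f"image_of_{d}", d) for d in range(10)]
    X, gt_pseudo_label, Y = make_addition_dataset(digit_pool, num_examples=4)

Each entry of ``X`` is then one example with two instances, the matching entry of ``gt_pseudo_label`` holds the two digits, and the matching entry of ``Y`` holds their sum.
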
Data Structure
--------------

Besides the user-provided dataset, various forms of data are used and dynamically generated throughout the training and testing process of the ABL framework. Examples include raw data, predicted pseudo-labels, abduced pseudo-labels, and pseudo-label indices. To manage this diversity and ensure a stable, versatile interface, ABL Kit employs `abstract data interfaces <../API/ablkit.data.html#structure>`_ to encapsulate the different forms of data used throughout the learning process.

``ListData`` is the underlying abstract data interface used in ABL Kit. As the fundamental data structure, ``ListData`` implements commonly used data manipulation methods and is responsible for transferring data between the various components of ABL, so that stages such as prediction, abductive reasoning, and training can use ``ListData`` as a unified input format. Before proceeding to other stages, user-provided datasets are first converted into ``ListData``.

Besides accepting a tuple ``(X, gt_pseudo_label, Y)``, ABL Kit also allows users to supply data directly in ``ListData`` format, which likewise requires these three attributes. The following code shows the basic usage of ``ListData``. More information can be found in the `API documentation <../API/ablkit.data.html#structure>`_.

.. code-block:: python

    # Prepare data
    X = [list(torch.randn(3, 28, 28)), list(torch.randn(3, 28, 28))]
    gt_pseudo_label = [[1, 2, 3], [4, 5, 6]]
    Y = [1, 2]

    # Convert data into ListData
    data = ListData(X=X, Y=Y, gt_pseudo_label=gt_pseudo_label)

    # Get data
    X = data.X
    Y = data.Y
    gt_pseudo_label = data.gt_pseudo_label

    # Set data
    data.X = X
    data.Y = Y
    data.gt_pseudo_label = gt_pseudo_label

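
Since both input formats expose the same three components, user code can treat them uniformly. The sketch below illustrates that idea only and does not show how ABL Kit performs its internal conversion: ``unpack_dataset`` is a hypothetical helper, and ``types.SimpleNamespace`` stands in for a ``ListData`` object so that the example runs without ABL Kit installed.

.. code-block:: python

    from types import SimpleNamespace


    def unpack_dataset(data):
        """Hypothetical helper: return (X, gt_pseudo_label, Y) from either format."""
        if isinstance(data, tuple):
            # Tuple format: (X, gt_pseudo_label, Y)
            X, gt_pseudo_label, Y = data
        else:
            # Object format: anything exposing X, gt_pseudo_label and Y attributes
            X, gt_pseudo_label, Y = data.X, data.gt_pseudo_label, data.Y
        return X, gt_pseudo_label, Y


    tuple_data = ([["img_a", "img_b"]], [[1, 2]], [3])

    # Stand-in for a ListData object (assumption: only attribute access is needed).
    object_data = SimpleNamespace(
        X=[["img_a", "img_b"]], gt_pseudo_label=[[1, 2]], Y=[3]
    )

    from_tuple = unpack_dataset(tuple_data)
    from_object = unpack_dataset(object_data)
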
