Zoo
===

.. raw:: html

    <p>For detailed code implementation, please view it on <a class="reference external" href="https://github.com/AbductiveLearning/ABLKit/tree/Dev/examples/zoo" target="_blank">GitHub</a>.</p>
Below is an implementation on the
`Zoo <https://archive.ics.uci.edu/dataset/111/zoo>`__ dataset. In this task,
attributes of animals (such as presence of hair, eggs, etc.) and their
targets (the animal class they belong to) are given, along with a
knowledge base which contains information about the relations between
attributes and targets, e.g., Implies(milk == 1, mammal == 1).

The goal of this task is to develop a learning model that can predict
the targets of animals based on their attributes. In the initial stages,
when the model is under-trained, it may produce incorrect predictions
that conflict with the relations contained in the knowledge base. When
this happens, abductive reasoning can be employed to adjust these
results and retrain the model accordingly, enabling us to further
improve the learning model.
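The adjustment step can be illustrated with a toy sketch (illustrative only, not ABLKit code; the rules and helper names are made up): given a predicted class that violates a rule such as Implies(milk == 1, mammal == 1), abduction searches for a class label that does satisfy the knowledge base.

```python
# Toy sketch of abductive revision (hypothetical rules, not ABLKit code).

def consistent_classes(attributes, num_classes=7):
    """Return the class labels compatible with a tiny hand-made rule set."""
    allowed = set(range(num_classes))
    if attributes.get("milk") == 1:
        allowed &= {0}      # rule: milk == 1 implies class 0 (mammal)
    if attributes.get("feathers") == 1:
        allowed &= {1}      # rule: feathers == 1 implies class 1 (bird)
    return allowed

def abduce(predicted, attributes):
    """If the prediction violates the rules, revise it to a consistent class."""
    allowed = consistent_classes(attributes)
    if predicted in allowed:
        return predicted
    return min(allowed)     # pick any consistent candidate (here: the smallest)

# An under-trained model predicts "bird" (1) for a milk-producing animal:
print(abduce(1, {"milk": 1, "feathers": 0}))  # 0, i.e., revised to mammal
```

The revised label can then be fed back to the learner as a corrected training signal, which is the role abductive reasoning plays in the pipeline below.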
.. code:: ipython3

    # Import necessary libraries and modules
    import os.path as osp

    import numpy as np
    from sklearn.ensemble import RandomForestClassifier

    from ablkit.bridge import SimpleBridge
    from ablkit.data.evaluation import ReasoningMetric, SymbolAccuracy
    from ablkit.learning import ABLModel
    from ablkit.reasoning import Reasoner
    from ablkit.utils import ABLLogger, confidence_dist, print_log, tab_data_to_tuple

    from get_dataset import load_and_preprocess_dataset, split_dataset
    from kb import ZooKB
Working with Data
-----------------

First, we load and preprocess the `Zoo
dataset <https://archive.ics.uci.edu/dataset/111/zoo>`__, and split it
into labeled/unlabeled/test data.

.. code:: ipython3

    X, y = load_and_preprocess_dataset(dataset_id=62)
    X_label, y_label, X_unlabel, y_unlabel, X_test, y_test = split_dataset(X, y, test_size=0.3)
The Zoo dataset consists of tabular data. The attributes contain 16
values (15 boolean attributes such as hair, feathers, eggs, milk,
airborne, aquatic, etc., plus a numeric count of legs), and the target
is an integer in the range [0,6] representing 7 classes (mammal, bird,
reptile, fish, amphibian, insect, and other). Below is an illustration:

.. code:: ipython3

    print("Shape of X and y:", X.shape, y.shape)
    print("First five elements of X:")
    print(X[:5])
    print("First five elements of y:")
    print(y[:5])
Out:

.. code:: none
    :class: code-out

    Shape of X and y: (101, 16) (101,)
    First five elements of X:
    [[True False False True False False True True True True False False 4
      False False True]
     [True False False True False False False True True True False False 4
      True False True]
     [False False True False False True True True True False False True 0
      True False False]
     [True False False True False False True True True True False False 4
      False False True]
     [True False False True False False True True True True False False 4
      True False True]]
    First five elements of y:
    [0 0 3 0 0]
Next, we transform the tabular data to the format required by
ABLKit, which is a tuple of (X, gt_pseudo_label, Y). In this task,
we treat the attributes as X and the targets as gt_pseudo_label (ground
truth pseudo-labels). Y (reasoning results) are expected to be 0,
indicating that no rules are violated.

.. code:: ipython3

    label_data = tab_data_to_tuple(X_label, y_label, reasoning_result=0)
    test_data = tab_data_to_tuple(X_test, y_test, reasoning_result=0)
    train_data = tab_data_to_tuple(X_unlabel, y_unlabel, reasoning_result=0)
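The resulting layout can be mimicked in plain Python (a sketch of the assumed (X, gt_pseudo_label, Y) structure; ``to_abl_tuple`` is a hypothetical helper, not the real ``tab_data_to_tuple``):

```python
# Hypothetical sketch of the (X, gt_pseudo_label, Y) layout assumed above;
# not the real tab_data_to_tuple implementation.

def to_abl_tuple(X_rows, y_values, reasoning_result=0):
    """Pack each tabular row as a one-instance example, mirror the targets
    as ground-truth pseudo-labels, and fix every reasoning result to 0."""
    X = [[row] for row in X_rows]
    gt_pseudo_label = [[label] for label in y_values]
    Y = [reasoning_result] * len(X_rows)
    return X, gt_pseudo_label, Y

X_demo = [[1, 0, 1], [0, 1, 0]]
y_demo = [0, 3]
X_t, gt, Y_t = to_abl_tuple(X_demo, y_demo)
print(gt, Y_t)  # [[0], [3]] [0, 0]
```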
Building the Learning Part
--------------------------

To build the learning part, we first need a machine learning base
model. We use a `Random
Forest <https://en.wikipedia.org/wiki/Random_forest>`__ as the base
model.

.. code:: ipython3

    base_model = RandomForestClassifier()

However, the base model built above deals with instance-level data, and
cannot directly deal with example-level data. Therefore, we wrap the
base model into ``ABLModel``, which enables the learning part to train,
test, and predict on example-level data.

.. code:: ipython3

    model = ABLModel(base_model)
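The wrapping idea can be sketched as follows (``ExampleLevelWrapper`` is a hypothetical stand-in, not the real ``ABLModel``): flatten example-level lists into instance-level arrays before delegating to the scikit-learn estimator, then regroup the predictions by example.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

class ExampleLevelWrapper:
    """Hypothetical stand-in for ABLModel: adapts an instance-level
    scikit-learn estimator to example-level data (lists of instance lists)."""

    def __init__(self, base_model):
        self.base_model = base_model

    def fit(self, examples, pseudo_labels):
        # Flatten example-level lists into instance-level arrays
        X = np.array([inst for ex in examples for inst in ex])
        y = np.array([lab for ex in pseudo_labels for lab in ex])
        self.base_model.fit(X, y)

    def predict(self, examples):
        # Predict per instance, then regroup predictions by example
        flat = np.array([inst for ex in examples for inst in ex])
        preds = self.base_model.predict(flat)
        out, i = [], 0
        for ex in examples:
            out.append(preds[i:i + len(ex)].tolist())
            i += len(ex)
        return out

# Two kinds of one-instance examples with trivially separable features
wrapper = ExampleLevelWrapper(RandomForestClassifier(random_state=0))
wrapper.fit([[[0.0]], [[10.0]]] * 5, [[0], [1]] * 5)
print(wrapper.predict([[[0.0]], [[10.0]]]))  # [[0], [1]]
```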
Building the Reasoning Part
---------------------------

In the reasoning part, we first build a knowledge base which contains
information about the relations between attributes (X) and targets
(pseudo-labels), e.g., Implies(milk == 1, mammal == 1). The knowledge
base is built in the ``ZooKB`` class within the file ``examples/zoo/kb.py``,
and is derived from the ``KBBase`` class.

.. code:: ipython3

    kb = ZooKB()
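Conceptually, ``logic_forward`` checks the pseudo-labels against the rules and returns 0 when nothing is violated. A minimal sketch with made-up rules (the real ones live in ``examples/zoo/kb.py``):

```python
def logic_forward_sketch(pseudo_labels, attribute_rows):
    """Sketch of a KB consistency check: 0 if no rule is violated, else 1.
    Rules here are illustrative, not the actual ZooKB rules."""
    MAMMAL, BIRD = 0, 1
    for label, attrs in zip(pseudo_labels, attribute_rows):
        if attrs["milk"] == 1 and label != MAMMAL:      # Implies(milk == 1, mammal)
            return 1
        if attrs["feathers"] == 1 and label != BIRD:    # Implies(feathers == 1, bird)
            return 1
    return 0

print(logic_forward_sketch([0], [{"milk": 1, "feathers": 0}]))  # 0 (consistent)
print(logic_forward_sketch([3], [{"milk": 1, "feathers": 0}]))  # 1 (violates the milk rule)
```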
As mentioned, for all attributes and targets in the dataset, the
reasoning results are expected to be 0, since there should be no
violations of the established knowledge in real data. As shown below:

.. code:: ipython3

    for idx, (x, y_item) in enumerate(zip(X[:5], y[:5])):
        print(f"Example {idx}: the attributes are: {x}, and the target is {y_item}.")
        print(f"Reasoning result is {kb.logic_forward([y_item], [x])}.")
        print()
Out:

.. code:: none
    :class: code-out

    Example 0: the attributes are: [True False False True False False True True True True False False 4 False
     False True], and the target is 0.
    Reasoning result is 0.

    Example 1: the attributes are: [True False False True False False False True True True False False 4 True
     False True], and the target is 0.
    Reasoning result is 0.

    Example 2: the attributes are: [False False True False False True True True True False False True 0 True
     False False], and the target is 3.
    Reasoning result is 0.

    Example 3: the attributes are: [True False False True False False True True True True False False 4 False
     False True], and the target is 0.
    Reasoning result is 0.

    Example 4: the attributes are: [True False False True False False True True True True False False 4 True
     False True], and the target is 0.
    Reasoning result is 0.
Then, we create a reasoner by instantiating the class ``Reasoner``. Due
to the indeterminism of abductive reasoning, there could be multiple
candidates compatible with the knowledge base. When this happens, the
reasoner can minimize inconsistencies between the knowledge base and
the pseudo-labels predicted by the learning part, and then return only
the candidate with the highest consistency.

.. code:: ipython3

    def consistency(data_example, candidates, candidate_idxs, reasoning_results):
        pred_prob = data_example.pred_prob
        model_scores = confidence_dist(pred_prob, candidate_idxs)
        rule_scores = np.array(reasoning_results)
        scores = model_scores + rule_scores
        return scores

    reasoner = Reasoner(kb, dist_func=consistency)
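The score above combines two parts: a distance between each candidate and the model's predicted probabilities, plus the rule-violation results. Below is a sketch of the behavior we assume for ``confidence_dist`` (1 minus the predicted probability of each candidate's labels, so candidates the model is more confident about get smaller distances; see ``ablkit.utils`` for the actual implementation):

```python
import numpy as np

def confidence_dist_sketch(pred_prob, candidate_idxs):
    """Assumed behavior of confidence_dist, for intuition only:
    distance = 1 - product of the predicted probabilities of a
    candidate's labels across the instances of an example."""
    scores = []
    for cand in candidate_idxs:
        prob = 1.0
        for pos, label in enumerate(cand):
            prob *= pred_prob[pos][label]
        scores.append(1.0 - prob)
    return np.array(scores)

# One instance with class probabilities [0.7, 0.2, 0.1];
# two candidate pseudo-labels: class 0 vs class 2
pred_prob = [np.array([0.7, 0.2, 0.1])]
dists = confidence_dist_sketch(pred_prob, [[0], [2]])
print(dists)  # the confident class 0 gets the smaller distance
```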
Building Evaluation Metrics
---------------------------

Next, we set up evaluation metrics. These metrics will be used to
evaluate the model performance during training and testing.
Specifically, we use ``SymbolAccuracy`` and ``ReasoningMetric``, which
evaluate the accuracy of the machine learning model’s predictions and
the accuracy of the final reasoning results, respectively.

.. code:: ipython3

    metric_list = [SymbolAccuracy(prefix="zoo"), ReasoningMetric(kb=kb, prefix="zoo")]
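For intuition, ``SymbolAccuracy`` amounts to the fraction of predicted pseudo-labels that match the ground truth (a sketch of the assumed behavior, not ablkit's implementation):

```python
def symbol_accuracy_sketch(pred_pseudo_labels, gt_pseudo_labels):
    """Assumed behavior of SymbolAccuracy, for intuition only:
    element-wise accuracy over all pseudo-labels in all examples."""
    correct = total = 0
    for pred_ex, gt_ex in zip(pred_pseudo_labels, gt_pseudo_labels):
        for p, g in zip(pred_ex, gt_ex):
            correct += int(p == g)
            total += 1
    return correct / total

# Three one-instance examples, two of which are predicted correctly
print(symbol_accuracy_sketch([[0], [1], [3]], [[0], [2], [3]]))
```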
Bridging Learning and Reasoning
-------------------------------

Now, the last step is to bridge the learning and reasoning parts. We
proceed with this step by creating an instance of ``SimpleBridge``.

.. code:: ipython3

    bridge = SimpleBridge(model, reasoner, metric_list)
Perform training and testing by invoking the ``train`` and ``test``
methods of ``SimpleBridge``.

.. code:: ipython3

    # Build logger
    print_log("Abductive Learning on the Zoo example.", logger="current")

    # Retrieve the directory for saving model weights
    log_dir = ABLLogger.get_current_instance().log_dir
    weights_dir = osp.join(log_dir, "weights")

    print_log("------- Use labeled data to pretrain the model -----------", logger="current")
    base_model.fit(X_label, y_label)

    print_log("------- Test the initial model -----------", logger="current")
    bridge.test(test_data)

    print_log("------- Use ABL to train the model -----------", logger="current")
    bridge.train(train_data=train_data, label_data=label_data, loops=3, segment_size=len(X_unlabel), save_dir=weights_dir)

    print_log("------- Test the final model -----------", logger="current")
    bridge.test(test_data)
Out:

.. code:: none
    :class: code-out

    abl - INFO - Abductive Learning on the Zoo example.
    abl - INFO - ------- Use labeled data to pretrain the model -----------
    abl - INFO - ------- Test the initial model -----------
    abl - INFO - Evaluation ended, zoo/character_accuracy: 0.903 zoo/reasoning_accuracy: 0.903
    abl - INFO - ------- Use ABL to train the model -----------
    abl - INFO - loop(train) [1/3] segment(train) [1/1]
    abl - INFO - Evaluation start: loop(val) [1]
    abl - INFO - Evaluation ended, zoo/character_accuracy: 1.000 zoo/reasoning_accuracy: 1.000
    abl - INFO - loop(train) [2/3] segment(train) [1/1]
    abl - INFO - Evaluation start: loop(val) [2]
    abl - INFO - Evaluation ended, zoo/character_accuracy: 1.000 zoo/reasoning_accuracy: 1.000
    abl - INFO - loop(train) [3/3] segment(train) [1/1]
    abl - INFO - Evaluation start: loop(val) [3]
    abl - INFO - Evaluation ended, zoo/character_accuracy: 1.000 zoo/reasoning_accuracy: 1.000
    abl - INFO - ------- Test the final model -----------
    abl - INFO - Evaluation ended, zoo/character_accuracy: 0.968 zoo/reasoning_accuracy: 0.968
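Under the hood, each training loop follows a predict-abduce-retrain cycle. A minimal toy sketch of this cycle (with a deliberately trivial learner and a single made-up rule; not ``SimpleBridge``'s actual code):

```python
# Toy sketch of the predict-abduce-retrain cycle behind bridge.train
# (deliberately simplified; not SimpleBridge's actual implementation).

def abduce(pred, x):
    """Illustrative one-rule KB: if the attribute is 1, the label must be 1."""
    return 1 if x == 1 else pred

class ToyModel:
    """A trivially simple 'learner' that memorizes the last labels it saw."""
    def __init__(self):
        self.table = {}
    def fit(self, xs, ys):
        self.table = dict(zip(xs, ys))
    def predict(self, x):
        return self.table.get(x, 0)

X = [0, 1, 0, 1]
model = ToyModel()
for _ in range(3):                                       # training loops
    preds = [model.predict(x) for x in X]                # 1. predict pseudo-labels
    revised = [abduce(p, x) for p, x in zip(preds, X)]   # 2. abductive revision
    model.fit(X, revised)                                # 3. retrain on revised labels

print([model.predict(x) for x in [0, 1]])  # [0, 1] -- the rule has been learned
```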
As we can see from the results, after training with ABL, the model’s
accuracy on the test data has improved from 0.903 to 0.968.
