@@ -0,0 +1,65 @@
Node labels: [symbol]

Node attributes: [chem, charge, x, y]

Edge labels: [valence]

Node labels were converted to integer values using this map:

Component 0:
	0	C
	1	O
	2	N
	3	Cl
	4	F
	5	S
	6	Se
	7	P
	8	Na
	9	I
	10	Co
	11	Br
	12	Li
	13	Si
	14	Mg
	15	Cu
	16	As
	17	B
	18	Pt
	19	Ru
	20	K
	21	Pd
	22	Au
	23	Te
	24	W
	25	Rh
	26	Zn
	27	Bi
	28	Pb
	29	Ge
	30	Sb
	31	Sn
	32	Ga
	33	Hg
	34	Ho
	35	Tl
	36	Ni
	37	Tb

Edge labels were converted to integer values using this map:

Component 0:
	0	1
	1	2
	2	3

Class labels were converted to integer values using this map:

	0	a
	1	i
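For convenience, the maps above can be inverted programmatically to recover the original symbols from the integer labels stored in the data files. A minimal Python sketch, with the maps transcribed from this README as dict literals (they are not shipped as code with the dataset):

node_symbol = {0: 'C', 1: 'O', 2: 'N', 3: 'Cl', 4: 'F', 5: 'S', 6: 'Se', 7: 'P',
               8: 'Na', 9: 'I', 10: 'Co', 11: 'Br', 12: 'Li', 13: 'Si', 14: 'Mg',
               15: 'Cu', 16: 'As', 17: 'B', 18: 'Pt', 19: 'Ru', 20: 'K', 21: 'Pd',
               22: 'Au', 23: 'Te', 24: 'W', 25: 'Rh', 26: 'Zn', 27: 'Bi', 28: 'Pb',
               29: 'Ge', 30: 'Sb', 31: 'Sn', 32: 'Ga', 33: 'Hg', 34: 'Ho', 35: 'Tl',
               36: 'Ni', 37: 'Tb'}
edge_valence = {0: 1, 1: 2, 2: 3}
class_name = {0: 'a', 1: 'i'}

print(node_symbol[10], edge_valence[1], class_name[0])  # -> Co 2 a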
@@ -0,0 +1,75 @@
README for dataset DD

=== Usage ===

This folder contains the following comma-separated text files
(replace DS by the name of the dataset):

n = total number of nodes
m = total number of edges
N = number of graphs

(1) DS_A.txt (m lines)
	sparse (block diagonal) adjacency matrix for all graphs,
	each line corresponds to (row, col) resp. (node_id, node_id)

(2) DS_graph_indicator.txt (n lines)
	column vector of graph identifiers for all nodes of all graphs,
	the value in the i-th line is the graph_id of the node with node_id i

(3) DS_graph_labels.txt (N lines)
	class labels for all graphs in the dataset,
	the value in the i-th line is the class label of the graph with graph_id i

(4) DS_node_labels.txt (n lines)
	column vector of node labels,
	the value in the i-th line corresponds to the node with node_id i

There are OPTIONAL files if the respective information is available:

(5) DS_edge_labels.txt (m lines; same size as DS_A.txt)
	labels for the edges in DS_A.txt

(6) DS_edge_attributes.txt (m lines; same size as DS_A.txt)
	attributes for the edges in DS_A.txt

(7) DS_node_attributes.txt (n lines)
	matrix of node attributes,
	the comma-separated values in the i-th line form the attribute vector of the node with node_id i

(8) DS_graph_attributes.txt (N lines)
	regression values for all graphs in the dataset,
	the value in the i-th line is the attribute of the graph with graph_id i
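The files above combine all graphs into one block-diagonal representation; splitting them back into individual graphs is a few lines of code. A minimal loading sketch in Python (the helper name and the example path are illustrative, not part of the dataset):

import networkx as nx
import numpy as np

def load_tud_dataset(prefix):
    # graph_ids[i] is the graph of node i+1 (node ids are 1-based).
    graph_ids = np.loadtxt(prefix + '_graph_indicator.txt', dtype=int)
    node_labels = np.loadtxt(prefix + '_node_labels.txt', dtype=int)
    class_labels = np.loadtxt(prefix + '_graph_labels.txt', dtype=int)
    graphs = [nx.Graph() for _ in range(graph_ids.max())]
    for node_id, gid in enumerate(graph_ids, start=1):
        graphs[gid - 1].add_node(node_id, label=int(node_labels[node_id - 1]))
    # each line of DS_A.txt is one (row, col) entry of the adjacency matrix.
    for row, col in np.loadtxt(prefix + '_A.txt', delimiter=',', dtype=int):
        graphs[graph_ids[row - 1] - 1].add_edge(row, col)
    return graphs, class_labels

# graphs, y = load_tud_dataset('DD/DD')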
=== Description ===

D&D is a dataset of 1178 protein structures (Dobson and Doig, 2003). Each protein is
represented by a graph, in which the nodes are amino acids and two nodes are connected
by an edge if they are less than 6 Angstroms apart. The prediction task is to classify
the protein structures into enzymes and non-enzymes.
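To make the 6 Angstrom rule concrete: given one 3-D coordinate per amino acid, such a contact graph is built by thresholding pairwise distances. A minimal sketch with hypothetical coordinates (the DD graphs ship precomputed; this only illustrates the construction):

import networkx as nx
import numpy as np

def contact_graph(coords, cutoff=6.0):
    # one node per amino acid; an edge whenever two residues are
    # closer than `cutoff` Angstroms.
    g = nx.Graph()
    g.add_nodes_from(range(len(coords)))
    for i in range(len(coords)):
        for j in range(i + 1, len(coords)):
            if np.linalg.norm(coords[i] - coords[j]) < cutoff:
                g.add_edge(i, j)
    return g

g = contact_graph(np.random.rand(12, 3) * 25)  # hypothetical residue positions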
=== Previous Use of the Dataset ===

Neumann, M., Garnett, R., Bauckhage, Ch., Kersting, K.: Propagation Kernels: Efficient Graph
Kernels from Propagated Information. Under review at MLJ.

Neumann, M., Patricia, N., Garnett, R., Kersting, K.: Efficient Graph Kernels by
Randomization. In: P.A. Flach, T. De Bie, N. Cristianini (eds.) ECML/PKDD, Lecture Notes in
Computer Science, vol. 7523, pp. 378-393. Springer (2012).

Shervashidze, N., Schweitzer, P., van Leeuwen, E., Mehlhorn, K., Borgwardt, K.:
Weisfeiler-Lehman Graph Kernels. Journal of Machine Learning Research 12, 2539-2561 (2011).

=== References ===

P. D. Dobson and A. J. Doig. Distinguishing enzyme structures from non-enzymes without
alignments. J. Mol. Biol., 330(4):771–783, Jul 2003.
@@ -0,0 +1,70 @@
README for dataset NCI1

=== Usage ===

This folder contains the following comma-separated text files
(replace DS by the name of the dataset):

n = total number of nodes
m = total number of edges
N = number of graphs

(1) DS_A.txt (m lines)
	sparse (block diagonal) adjacency matrix for all graphs,
	each line corresponds to (row, col) resp. (node_id, node_id)

(2) DS_graph_indicator.txt (n lines)
	column vector of graph identifiers for all nodes of all graphs,
	the value in the i-th line is the graph_id of the node with node_id i

(3) DS_graph_labels.txt (N lines)
	class labels for all graphs in the dataset,
	the value in the i-th line is the class label of the graph with graph_id i

(4) DS_node_labels.txt (n lines)
	column vector of node labels,
	the value in the i-th line corresponds to the node with node_id i

There are OPTIONAL files if the respective information is available:

(5) DS_edge_labels.txt (m lines; same size as DS_A.txt)
	labels for the edges in DS_A.txt

(6) DS_edge_attributes.txt (m lines; same size as DS_A.txt)
	attributes for the edges in DS_A.txt

(7) DS_node_attributes.txt (n lines)
	matrix of node attributes,
	the comma-separated values in the i-th line form the attribute vector of the node with node_id i

(8) DS_graph_attributes.txt (N lines)
	regression values for all graphs in the dataset,
	the value in the i-th line is the attribute of the graph with graph_id i

=== Description ===

NCI1 and NCI109 represent two balanced subsets of datasets of chemical compounds screened
for activity against non-small cell lung cancer and ovarian cancer cell lines, respectively
(Wale and Karypis (2006) and http://pubchem.ncbi.nlm.nih.gov).

=== Previous Use of the Dataset ===

Neumann, M., Garnett, R., Bauckhage, Ch., Kersting, K.: Propagation Kernels: Efficient Graph
Kernels from Propagated Information. Under review at MLJ.

Neumann, M., Patricia, N., Garnett, R., Kersting, K.: Efficient Graph Kernels by
Randomization. In: P.A. Flach, T. De Bie, N. Cristianini (eds.) ECML/PKDD, Lecture Notes in
Computer Science, vol. 7523, pp. 378-393. Springer (2012).

Shervashidze, N., Schweitzer, P., van Leeuwen, E., Mehlhorn, K., Borgwardt, K.:
Weisfeiler-Lehman Graph Kernels. Journal of Machine Learning Research 12, 2539-2561 (2011).

=== References ===

N. Wale and G. Karypis. Comparison of descriptor spaces for chemical compound retrieval and
classification. In Proc. of ICDM, pages 678–689, Hong Kong, 2006.
@@ -0,0 +1,70 @@
README for dataset NCI109

=== Usage ===

This folder contains the following comma-separated text files
(replace DS by the name of the dataset):

n = total number of nodes
m = total number of edges
N = number of graphs

(1) DS_A.txt (m lines)
	sparse (block diagonal) adjacency matrix for all graphs,
	each line corresponds to (row, col) resp. (node_id, node_id)

(2) DS_graph_indicator.txt (n lines)
	column vector of graph identifiers for all nodes of all graphs,
	the value in the i-th line is the graph_id of the node with node_id i

(3) DS_graph_labels.txt (N lines)
	class labels for all graphs in the dataset,
	the value in the i-th line is the class label of the graph with graph_id i

(4) DS_node_labels.txt (n lines)
	column vector of node labels,
	the value in the i-th line corresponds to the node with node_id i

There are OPTIONAL files if the respective information is available:

(5) DS_edge_labels.txt (m lines; same size as DS_A.txt)
	labels for the edges in DS_A.txt

(6) DS_edge_attributes.txt (m lines; same size as DS_A.txt)
	attributes for the edges in DS_A.txt

(7) DS_node_attributes.txt (n lines)
	matrix of node attributes,
	the comma-separated values in the i-th line form the attribute vector of the node with node_id i

(8) DS_graph_attributes.txt (N lines)
	regression values for all graphs in the dataset,
	the value in the i-th line is the attribute of the graph with graph_id i

=== Description ===

NCI1 and NCI109 represent two balanced subsets of datasets of chemical compounds screened
for activity against non-small cell lung cancer and ovarian cancer cell lines, respectively
(Wale and Karypis (2006) and http://pubchem.ncbi.nlm.nih.gov).

=== Previous Use of the Dataset ===

Neumann, M., Garnett, R., Bauckhage, Ch., Kersting, K.: Propagation Kernels: Efficient Graph
Kernels from Propagated Information. Under review at MLJ.

Neumann, M., Patricia, N., Garnett, R., Kersting, K.: Efficient Graph Kernels by
Randomization. In: P.A. Flach, T. De Bie, N. Cristianini (eds.) ECML/PKDD, Lecture Notes in
Computer Science, vol. 7523, pp. 378-393. Springer (2012).

Shervashidze, N., Schweitzer, P., van Leeuwen, E., Mehlhorn, K., Borgwardt, K.:
Weisfeiler-Lehman Graph Kernels. Journal of Machine Learning Research 12, 2539-2561 (2011).

=== References ===

N. Wale and G. Karypis. Comparison of descriptor spaces for chemical compound retrieval and
classification. In Proc. of ICDM, pages 678–689, Hong Kong, 2006.
@@ -12,21 +12,21 @@ import multiprocessing
from pygraph.kernels.commonWalkKernel import commonwalkkernel
dslist = [
#    {'name': 'Acyclic', 'dataset': '../datasets/acyclic/dataset_bps.ds',
#     'task': 'regression'},  # node symb
#    {'name': 'Alkane', 'dataset': '../datasets/Alkane/dataset.ds', 'task': 'regression',
#     'dataset_y': '../datasets/Alkane/dataset_boiling_point_names.txt'},
#    # contains single node graph, node symb
#    {'name': 'MAO', 'dataset': '../datasets/MAO/dataset.ds'},  # node/edge symb
#    {'name': 'PAH', 'dataset': '../datasets/PAH/dataset.ds'},  # unlabeled
#    {'name': 'MUTAG', 'dataset': '../datasets/MUTAG/MUTAG_A.txt'},  # node/edge symb
#    {'name': 'Letter-med', 'dataset': '../datasets/Letter-med/Letter-med_A.txt'},
#    # node nsymb
#    {'name': 'ENZYMES', 'dataset': '../datasets/ENZYMES_txt/ENZYMES_A_sparse.txt'},
#    # node symb/nsymb
    {'name': 'Alkane', 'dataset': '../datasets/Alkane/dataset.ds', 'task': 'regression',
     'dataset_y': '../datasets/Alkane/dataset_boiling_point_names.txt'},
    # contains single node graph, node symb
    {'name': 'Acyclic', 'dataset': '../datasets/acyclic/dataset_bps.ds',
     'task': 'regression'},  # node symb
    {'name': 'MAO', 'dataset': '../datasets/MAO/dataset.ds'},  # node/edge symb
    {'name': 'PAH', 'dataset': '../datasets/PAH/dataset.ds'},  # unlabeled
    {'name': 'MUTAG', 'dataset': '../datasets/MUTAG/MUTAG_A.txt'},  # node/edge symb
    {'name': 'Letter-med', 'dataset': '../datasets/Letter-med/Letter-med_A.txt'},
    # node nsymb
    {'name': 'AIDS', 'dataset': '../datasets/AIDS/AIDS_A.txt'},  # node symb/nsymb, edge symb
    {'name': 'ENZYMES', 'dataset': '../datasets/ENZYMES_txt/ENZYMES_A_sparse.txt'},
    # node symb/nsymb
#    {'name': 'NCI1', 'dataset': '../datasets/NCI1/NCI1_A.txt'},  # node symb
#    {'name': 'NCI109', 'dataset': '../datasets/NCI109/NCI109_A.txt'},  # node symb
    {'name': 'AIDS', 'dataset': '../datasets/AIDS/AIDS_A.txt'},  # node symb/nsymb, edge symb
#    {'name': 'D&D', 'dataset': '../datasets/DD/DD_A.txt'},  # node symb
#
#    {'name': 'Mutagenicity', 'dataset': '../datasets/Mutagenicity/Mutagenicity_A.txt'},
@@ -12,22 +12,22 @@ import multiprocessing
from pygraph.kernels.marginalizedKernel import marginalizedkernel
dslist = [
#    {'name': 'Acyclic', 'dataset': '../datasets/acyclic/dataset_bps.ds',
#     'task': 'regression'},  # node symb
#    {'name': 'Alkane', 'dataset': '../datasets/Alkane/dataset.ds', 'task': 'regression',
#     'dataset_y': '../datasets/Alkane/dataset_boiling_point_names.txt'},
#    # contains single node graph, node symb
#    {'name': 'MAO', 'dataset': '../datasets/MAO/dataset.ds'},  # node/edge symb
#    {'name': 'PAH', 'dataset': '../datasets/PAH/dataset.ds'},  # unlabeled
#    {'name': 'MUTAG', 'dataset': '../datasets/MUTAG/MUTAG_A.txt'},  # node/edge symb
#    {'name': 'Letter-med', 'dataset': '../datasets/Letter-med/Letter-med_A.txt'},
#    # node nsymb
#    {'name': 'ENZYMES', 'dataset': '../datasets/ENZYMES_txt/ENZYMES_A_sparse.txt'},
#    # node symb/nsymb
    {'name': 'Alkane', 'dataset': '../datasets/Alkane/dataset.ds', 'task': 'regression',
     'dataset_y': '../datasets/Alkane/dataset_boiling_point_names.txt'},
    # contains single node graph, node symb
    {'name': 'Acyclic', 'dataset': '../datasets/acyclic/dataset_bps.ds',
     'task': 'regression'},  # node symb
    {'name': 'MAO', 'dataset': '../datasets/MAO/dataset.ds'},  # node/edge symb
    {'name': 'PAH', 'dataset': '../datasets/PAH/dataset.ds'},  # unlabeled
    {'name': 'MUTAG', 'dataset': '../datasets/MUTAG/MUTAG_A.txt'},  # node/edge symb
    {'name': 'Letter-med', 'dataset': '../datasets/Letter-med/Letter-med_A.txt'},
    # node nsymb
    {'name': 'ENZYMES', 'dataset': '../datasets/ENZYMES_txt/ENZYMES_A_sparse.txt'},
    # node symb/nsymb
#    {'name': 'AIDS', 'dataset': '../datasets/AIDS/AIDS_A.txt'},  # node symb/nsymb, edge symb
#    {'name': 'NCI1', 'dataset': '../datasets/NCI1/NCI1_A.txt'},  # node symb
#    {'name': 'NCI109', 'dataset': '../datasets/NCI109/NCI109_A.txt'},  # node symb
    {'name': 'D&D', 'dataset': '../datasets/DD/DD_A.txt'},  # node symb
#    {'name': 'D&D', 'dataset': '../datasets/DD/DD_A.txt'},  # node symb
#
#    {'name': 'Mutagenicity', 'dataset': '../datasets/Mutagenicity/Mutagenicity_A.txt'},
#    # node/edge symb
@@ -17,22 +17,23 @@ import numpy as np
dslist = [
#    {'name': 'Acyclic', 'dataset': '../datasets/acyclic/dataset_bps.ds',
#     'task': 'regression'},  # node symb
#    {'name': 'Alkane', 'dataset': '../datasets/Alkane/dataset.ds', 'task': 'regression',
#     'dataset_y': '../datasets/Alkane/dataset_boiling_point_names.txt'},
#    # contains single node graph, node symb
#    {'name': 'MAO', 'dataset': '../datasets/MAO/dataset.ds'},  # node/edge symb
#    {'name': 'PAH', 'dataset': '../datasets/PAH/dataset.ds'},  # unlabeled
#    {'name': 'MUTAG', 'dataset': '../datasets/MUTAG/MUTAG_A.txt'},  # node/edge symb
#    {'name': 'ENZYMES', 'dataset': '../datasets/ENZYMES_txt/ENZYMES_A_sparse.txt'},
#    # node symb/nsymb
#    {'name': 'NCI1', 'dataset': '../datasets/NCI1/NCI1_A.txt'},  # node symb
#    {'name': 'NCI109', 'dataset': '../datasets/NCI109/NCI109_A.txt'},  # node symb
    {'name': 'Alkane', 'dataset': '../datasets/Alkane/dataset.ds', 'task': 'regression',
     'dataset_y': '../datasets/Alkane/dataset_boiling_point_names.txt'},
    # contains single node graph, node symb
    {'name': 'Acyclic', 'dataset': '../datasets/acyclic/dataset_bps.ds',
     'task': 'regression'},  # node symb
    {'name': 'MAO', 'dataset': '../datasets/MAO/dataset.ds'},  # node/edge symb
    {'name': 'PAH', 'dataset': '../datasets/PAH/dataset.ds'},  # unlabeled
    {'name': 'MUTAG', 'dataset': '../datasets/MUTAG/MUTAG_A.txt'},  # node/edge symb
    {'name': 'Letter-med', 'dataset': '../datasets/Letter-med/Letter-med_A.txt'},
    # node nsymb
    {'name': 'ENZYMES', 'dataset': '../datasets/ENZYMES_txt/ENZYMES_A_sparse.txt'},
    # node symb/nsymb
    {'name': 'AIDS', 'dataset': '../datasets/AIDS/AIDS_A.txt'},  # node symb/nsymb, edge symb
#    {'name': 'D&D', 'dataset': '../datasets/DD/DD_A.txt'},  # node symb
#    {'name': 'Letter-med', 'dataset': '../datasets/Letter-med/Letter-med_A.txt'},
#    # node nsymb
    {'name': 'NCI1', 'dataset': '../datasets/NCI1/NCI1_A.txt'},  # node symb
    {'name': 'NCI109', 'dataset': '../datasets/NCI109/NCI109_A.txt'},  # node symb
    {'name': 'D&D', 'dataset': '../datasets/DD/DD_A.txt'},  # node symb
#
#    {'name': 'Mutagenicity', 'dataset': '../datasets/Mutagenicity/Mutagenicity_A.txt'},
#    # node/edge symb
@@ -8,14 +8,14 @@ from pygraph.utils.kernels import deltakernel, gaussiankernel, kernelproduct
# datasets
dslist = [
#    {'name': 'Acyclic', 'dataset': '../datasets/acyclic/dataset_bps.ds',
#     'task': 'regression'},  # node symb
#    {'name': 'Alkane', 'dataset': '../datasets/Alkane/dataset.ds', 'task': 'regression',
#     'dataset_y': '../datasets/Alkane/dataset_boiling_point_names.txt'},
#    # contains single node graph, node symb
#    {'name': 'MAO', 'dataset': '../datasets/MAO/dataset.ds'},  # node/edge symb
#    {'name': 'PAH', 'dataset': '../datasets/PAH/dataset.ds'},  # unlabeled
#    {'name': 'MUTAG', 'dataset': '../datasets/MUTAG/MUTAG_A.txt'},  # node/edge symb
    {'name': 'Alkane', 'dataset': '../datasets/Alkane/dataset.ds', 'task': 'regression',
     'dataset_y': '../datasets/Alkane/dataset_boiling_point_names.txt'},
    # contains single node graph, node symb
    {'name': 'Acyclic', 'dataset': '../datasets/acyclic/dataset_bps.ds',
     'task': 'regression'},  # node symb
    {'name': 'MAO', 'dataset': '../datasets/MAO/dataset.ds'},  # node/edge symb
    {'name': 'PAH', 'dataset': '../datasets/PAH/dataset.ds'},  # unlabeled
    {'name': 'MUTAG', 'dataset': '../datasets/MUTAG/MUTAG_A.txt'},  # node/edge symb
    {'name': 'Letter-med', 'dataset': '../datasets/Letter-med/Letter-med_A.txt'},
    # node nsymb
    {'name': 'ENZYMES', 'dataset': '../datasets/ENZYMES_txt/ENZYMES_A_sparse.txt'},
@@ -14,22 +14,22 @@ from pygraph.kernels.structuralspKernel import structuralspkernel
from pygraph.utils.kernels import deltakernel, gaussiankernel, kernelproduct
dslist = [
#    {'name': 'Acyclic', 'dataset': '../datasets/acyclic/dataset_bps.ds',
#     'task': 'regression'},  # node symb
#    {'name': 'Alkane', 'dataset': '../datasets/Alkane/dataset.ds', 'task': 'regression',
#     'dataset_y': '../datasets/Alkane/dataset_boiling_point_names.txt'},
#    # contains single node graph, node symb
#    {'name': 'MAO', 'dataset': '../datasets/MAO/dataset.ds'},  # node/edge symb
#    {'name': 'PAH', 'dataset': '../datasets/PAH/dataset.ds'},  # unlabeled
#    {'name': 'MUTAG', 'dataset': '../datasets/MUTAG/MUTAG_A.txt'},  # node/edge symb
#    {'name': 'Letter-med', 'dataset': '../datasets/Letter-med/Letter-med_A.txt'},
#    # node nsymb
#    {'name': 'AIDS', 'dataset': '../datasets/AIDS/AIDS_A.txt'},  # node symb/nsymb, edge symb
#    {'name': 'NCI1', 'dataset': '../datasets/NCI1/NCI1_A.txt'},  # node symb
#    {'name': 'NCI109', 'dataset': '../datasets/NCI109/NCI109_A.txt'},  # node symb
    {'name': 'Alkane', 'dataset': '../datasets/Alkane/dataset.ds', 'task': 'regression',
     'dataset_y': '../datasets/Alkane/dataset_boiling_point_names.txt'},
    # contains single node graph, node symb
    {'name': 'Acyclic', 'dataset': '../datasets/acyclic/dataset_bps.ds',
     'task': 'regression'},  # node symb
    {'name': 'MAO', 'dataset': '../datasets/MAO/dataset.ds'},  # node/edge symb
    {'name': 'PAH', 'dataset': '../datasets/PAH/dataset.ds'},  # unlabeled
    {'name': 'MUTAG', 'dataset': '../datasets/MUTAG/MUTAG_A.txt'},  # node/edge symb
    {'name': 'Letter-med', 'dataset': '../datasets/Letter-med/Letter-med_A.txt'},
    # node nsymb
    {'name': 'AIDS', 'dataset': '../datasets/AIDS/AIDS_A.txt'},  # node symb/nsymb, edge symb
    {'name': 'NCI1', 'dataset': '../datasets/NCI1/NCI1_A.txt'},  # node symb
    {'name': 'NCI109', 'dataset': '../datasets/NCI109/NCI109_A.txt'},  # node symb
#    {'name': 'ENZYMES', 'dataset': '../datasets/ENZYMES_txt/ENZYMES_A_sparse.txt'},
#    # node symb/nsymb
    {'name': 'D&D', 'dataset': '../datasets/DD/DD_A.txt'},  # node symb
#    {'name': 'D&D', 'dataset': '../datasets/DD/DD_A.txt'},  # node symb
#
#    {'name': 'Mutagenicity', 'dataset': '../datasets/Mutagenicity/Mutagenicity_A.txt'},
#    # node/edge symb
@@ -14,22 +14,22 @@ from pygraph.kernels.treeletKernel import treeletkernel
from pygraph.utils.kernels import gaussiankernel, polynomialkernel
dslist = [
#    {'name': 'Acyclic', 'dataset': '../datasets/acyclic/dataset_bps.ds',
#     'task': 'regression'},  # node symb
    {'name': 'Alkane', 'dataset': '../datasets/Alkane/dataset.ds', 'task': 'regression',
     'dataset_y': '../datasets/Alkane/dataset_boiling_point_names.txt'},
    # contains single node graph, node symb
    {'name': 'Acyclic', 'dataset': '../datasets/acyclic/dataset_bps.ds',
     'task': 'regression'},  # node symb
    {'name': 'MAO', 'dataset': '../datasets/MAO/dataset.ds'},  # node/edge symb
    {'name': 'PAH', 'dataset': '../datasets/PAH/dataset.ds'},  # unlabeled
    {'name': 'MUTAG', 'dataset': '../datasets/MUTAG/MUTAG_A.txt'},  # node/edge symb
    {'name': 'ENZYMES', 'dataset': '../datasets/ENZYMES_txt/ENZYMES_A_sparse.txt'},
    # node symb/nsymb
    {'name': 'NCI1', 'dataset': '../datasets/NCI1/NCI1_A.txt'},  # node symb
    {'name': 'NCI109', 'dataset': '../datasets/NCI109/NCI109_A.txt'},  # node symb
    {'name': 'AIDS', 'dataset': '../datasets/AIDS/AIDS_A.txt'},  # node symb/nsymb, edge symb
    {'name': 'Letter-med', 'dataset': '../datasets/Letter-med/Letter-med_A.txt'},
    # node nsymb
    {'name': 'D&D', 'dataset': '../datasets/DD/DD_A.txt'},  # node symb
    {'name': 'ENZYMES', 'dataset': '../datasets/ENZYMES_txt/ENZYMES_A_sparse.txt'},
    # node symb/nsymb
#    {'name': 'D&D', 'dataset': '../datasets/DD/DD_A.txt'},  # node symb
#    {'name': 'Letter-med', 'dataset': '../datasets/Letter-med/Letter-med_A.txt'},
#    # node nsymb
#
#    {'name': 'Mutagenicity', 'dataset': '../datasets/Mutagenicity/Mutagenicity_A.txt'},
#    # node/edge symb
@@ -12,21 +12,21 @@ import multiprocessing
from pygraph.kernels.untilHPathKernel import untilhpathkernel
dslist = [
#    {'name': 'Acyclic', 'dataset': '../datasets/acyclic/dataset_bps.ds',
#     'task': 'regression'},  # node symb
#    {'name': 'Alkane', 'dataset': '../datasets/Alkane/dataset.ds', 'task': 'regression',
#     'dataset_y': '../datasets/Alkane/dataset_boiling_point_names.txt'},
#    # contains single node graph, node symb
#    {'name': 'MAO', 'dataset': '../datasets/MAO/dataset.ds'},  # node/edge symb
#    {'name': 'PAH', 'dataset': '../datasets/PAH/dataset.ds'},  # unlabeled
#    {'name': 'MUTAG', 'dataset': '../datasets/MUTAG/MUTAG_A.txt'},  # node/edge symb
#    {'name': 'Letter-med', 'dataset': '../datasets/Letter-med/Letter-med_A.txt'},
#    # node nsymb
#    {'name': 'ENZYMES', 'dataset': '../datasets/ENZYMES_txt/ENZYMES_A_sparse.txt'},
#    # node symb/nsymb
#    {'name': 'NCI1', 'dataset': '../datasets/NCI1/NCI1_A.txt'},  # node symb
#    {'name': 'NCI109', 'dataset': '../datasets/NCI109/NCI109_A.txt'},  # node symb
#    {'name': 'AIDS', 'dataset': '../datasets/AIDS/AIDS_A.txt'},  # node symb/nsymb, edge symb
    {'name': 'Alkane', 'dataset': '../datasets/Alkane/dataset.ds', 'task': 'regression',
     'dataset_y': '../datasets/Alkane/dataset_boiling_point_names.txt'},
    # contains single node graph, node symb
    {'name': 'Acyclic', 'dataset': '../datasets/acyclic/dataset_bps.ds',
     'task': 'regression'},  # node symb
    {'name': 'MAO', 'dataset': '../datasets/MAO/dataset.ds'},  # node/edge symb
    {'name': 'PAH', 'dataset': '../datasets/PAH/dataset.ds'},  # unlabeled
    {'name': 'MUTAG', 'dataset': '../datasets/MUTAG/MUTAG_A.txt'},  # node/edge symb
    {'name': 'Letter-med', 'dataset': '../datasets/Letter-med/Letter-med_A.txt'},
    # node nsymb
    {'name': 'ENZYMES', 'dataset': '../datasets/ENZYMES_txt/ENZYMES_A_sparse.txt'},
    # node symb/nsymb
    {'name': 'AIDS', 'dataset': '../datasets/AIDS/AIDS_A.txt'},  # node symb/nsymb, edge symb
    {'name': 'NCI1', 'dataset': '../datasets/NCI1/NCI1_A.txt'},  # node symb
    {'name': 'NCI109', 'dataset': '../datasets/NCI109/NCI109_A.txt'},  # node symb
    {'name': 'D&D', 'dataset': '../datasets/DD/DD_A.txt'},  # node symb
#
#    {'name': 'Mutagenicity', 'dataset': '../datasets/Mutagenicity/Mutagenicity_A.txt'},
@@ -14,22 +14,22 @@ from pygraph.kernels.weisfeilerLehmanKernel import weisfeilerlehmankernel
dslist = [
#    {'name': 'Acyclic', 'dataset': '../datasets/acyclic/dataset_bps.ds',
#     'task': 'regression'},  # node symb
#    {'name': 'Alkane', 'dataset': '../datasets/Alkane/dataset.ds', 'task': 'regression',
#     'dataset_y': '../datasets/Alkane/dataset_boiling_point_names.txt'},
#    # contains single node graph, node symb
#    {'name': 'MAO', 'dataset': '../datasets/MAO/dataset.ds'},  # node/edge symb
#    {'name': 'PAH', 'dataset': '../datasets/PAH/dataset.ds'},  # unlabeled
#    {'name': 'MUTAG', 'dataset': '../datasets/MUTAG/MUTAG_A.txt'},  # node/edge symb
    {'name': 'Alkane', 'dataset': '../datasets/Alkane/dataset.ds', 'task': 'regression',
     'dataset_y': '../datasets/Alkane/dataset_boiling_point_names.txt'},
    # contains single node graph, node symb
    {'name': 'Acyclic', 'dataset': '../datasets/acyclic/dataset_bps.ds',
     'task': 'regression'},  # node symb
    {'name': 'MAO', 'dataset': '../datasets/MAO/dataset.ds'},  # node/edge symb
    {'name': 'PAH', 'dataset': '../datasets/PAH/dataset.ds'},  # unlabeled
    {'name': 'MUTAG', 'dataset': '../datasets/MUTAG/MUTAG_A.txt'},  # node/edge symb
#    {'name': 'Letter-med', 'dataset': '../datasets/Letter-med/Letter-med_A.txt'},
#    # node nsymb
#    {'name': 'ENZYMES', 'dataset': '../datasets/ENZYMES_txt/ENZYMES_A_sparse.txt'},
#    # node symb/nsymb
#    {'name': 'NCI1', 'dataset': '../datasets/NCI1/NCI1_A.txt'},  # node symb
#    {'name': 'NCI109', 'dataset': '../datasets/NCI109/NCI109_A.txt'},  # node symb
#    {'name': 'D&D', 'dataset': '../datasets/DD/DD_A.txt'},  # node symb
    {'name': 'AIDS', 'dataset': '../datasets/AIDS/AIDS_A.txt'},  # node symb/nsymb, edge symb
    {'name': 'ENZYMES', 'dataset': '../datasets/ENZYMES_txt/ENZYMES_A_sparse.txt'},
    # node symb/nsymb
    {'name': 'NCI1', 'dataset': '../datasets/NCI1/NCI1_A.txt'},  # node symb
    {'name': 'NCI109', 'dataset': '../datasets/NCI109/NCI109_A.txt'},  # node symb
    {'name': 'D&D', 'dataset': '../datasets/DD/DD_A.txt'},  # node symb
#
#    {'name': 'Mutagenicity', 'dataset': '../datasets/Mutagenicity/Mutagenicity_A.txt'},
#    # node/edge symb
@@ -277,7 +277,8 @@ def gk_iam_nearest(Gn, alpha, idx_gi, Kmatrix, k, r_max):
#    return dhat, ghat_list
def gk_iam_nearest_multi(Gn_init, Gn_median, alpha, idx_gi, Kmatrix, k, r_max, gkernel):
def gk_iam_nearest_multi(Gn_init, Gn_median, alpha, idx_gi, Kmatrix, k, r_max,
                         gkernel, c_ei=1, c_er=1, c_es=1, epsilon=0.001):
    """This function constructs graph pre-image by the iterative pre-image
    framework in reference [1], algorithm 1, where the step of generating new
    graphs randomly is replaced by the IAM algorithm in reference [2].
@@ -312,37 +313,44 @@ def gk_iam_nearest_multi(Gn_init, Gn_median, alpha, idx_gi, Kmatrix, k, r_max, g
        return 0, g0hat_list
    dhat = dis_gs[0]  # the nearest distance
    ghat_list = [g.copy() for g in g0hat_list]
    for g in ghat_list:
        draw_Letter_graph(g)
#    for g in ghat_list:
#        draw_Letter_graph(g)
#        nx.draw_networkx(g)
#        plt.show()
        print(g.nodes(data=True))
        print(g.edges(data=True))
#        print(g.nodes(data=True))
#        print(g.edges(data=True))
    Gk = [Gn_init[ig].copy() for ig in sort_idx[0:k]]  # the k nearest neighbors
    for gi in Gk:
#        nx.draw_networkx(gi)
#        plt.show()
        draw_Letter_graph(g)
        print(gi.nodes(data=True))
        print(gi.edges(data=True))
#    for gi in Gk:
##        nx.draw_networkx(gi)
##        plt.show()
#        draw_Letter_graph(g)
#        print(gi.nodes(data=True))
#        print(gi.edges(data=True))
    Gs_nearest = Gk.copy()
#    gihat_list = []
#    i = 1
    r = 1
    while r < r_max:
        print('r =', r)
#        found = False
    r = 0
    itr = 0
#    cur_sod = dhat
#    old_sod = cur_sod * 2
    sod_list = [dhat]
    found = False
    nb_updated = 0
    while r < r_max:  # and not found: # @todo: if not found? # and np.abs(old_sod - cur_sod) > epsilon:
        print('\nr =', r)
        print('itr for gk =', itr, '\n')
        found = False
#        Gs_nearest = Gk + gihat_list
#        g_tmp = iam(Gs_nearest)
        g_tmp_list = test_iam_moreGraphsAsInit_tryAllPossibleBestGraphs_deleteNodesInIterations(
            Gn_median, Gs_nearest, c_ei=1, c_er=1, c_es=1)
        for g in g_tmp_list:
        g_tmp_list, _ = test_iam_moreGraphsAsInit_tryAllPossibleBestGraphs_deleteNodesInIterations(
            Gn_median, Gs_nearest, c_ei=c_ei, c_er=c_er, c_es=c_es)
#        for g in g_tmp_list:
#            nx.draw_networkx(g)
#            plt.show()
            draw_Letter_graph(g)
            print(g.nodes(data=True))
            print(g.edges(data=True))
#            draw_Letter_graph(g)
#            print(g.nodes(data=True))
#            print(g.edges(data=True))
        # compute distance between phi and the new generated graphs.
        knew = compute_kernel(g_tmp_list + Gn_median, gkernel, False)
@@ -358,6 +366,7 @@ def gk_iam_nearest_multi(Gn_init, Gn_median, alpha, idx_gi, Kmatrix, k, r_max, g
#                k_g1_list[1] + alpha[1] * alpha[1] * k_list[1])
        # find the new k nearest graphs.
        dnew_best = min(dnew_list)
        dis_gs = dnew_list + dis_gs  # add the new nearest distances.
        Gs_nearest = [g.copy() for g in g_tmp_list] + Gs_nearest  # add the corresponding graphs.
        sort_idx = np.argsort(dis_gs)
@@ -367,21 +376,34 @@ def gk_iam_nearest_multi(Gn_init, Gn_median, alpha, idx_gi, Kmatrix, k, r_max, g
        print(dis_gs[-1])
        Gs_nearest = [Gs_nearest[idx] for idx in sort_idx[0:k]]
        nb_best = len(np.argwhere(dis_gs == dis_gs[0]).flatten().tolist())
        if len([i for i in sort_idx[0:nb_best] if i < len(dnew_list)]) > 0:
            print('I have smaller or equal distance!')
        if dnew_best < dhat and np.abs(dnew_best - dhat) > epsilon:
            print('I have smaller distance!')
            print(str(dhat) + '->' + str(dis_gs[0]))
            dhat = dis_gs[0]
            idx_best_list = np.argwhere(dnew_list == dhat).flatten().tolist()
            ghat_list = [g_tmp_list[idx].copy() for idx in idx_best_list]
            for g in ghat_list:
#                nx.draw_networkx(g)
#                plt.show()
                draw_Letter_graph(g)
                print(g.nodes(data=True))
                print(g.edges(data=True))
            r = 0
        else:
#            for g in ghat_list:
##                nx.draw_networkx(g)
##                plt.show()
#                draw_Letter_graph(g)
#                print(g.nodes(data=True))
#                print(g.edges(data=True))
            r = 0
            found = True
            nb_updated += 1
        elif np.abs(dnew_best - dhat) < epsilon:
            print('I have almost equal distance!')
            print(str(dhat) + '->' + str(dnew_best))
        if not found:
            r += 1
#        old_sod = cur_sod
#        cur_sod = dnew_best
        sod_list.append(dhat)
        itr += 1
    print('\nthe graph is updated', nb_updated, 'times.')
    print('sods in kernel space:', sod_list, '\n')
    return dhat, ghat_list
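The distance being minimized above is the kernel-space distance between a candidate graph g and the weighted mean phi = sum_i alpha_i phi(g_i) of the median graphs, expanded entirely in terms of kernel values (the commented k_list/k_g1_list lines hint at this expansion). A minimal sketch of that computation from a Gram matrix (the function name is illustrative, not this module's API):

import numpy as np

def dis_to_weighted_mean(K, idx_g, idx_medians, alpha):
    # ||phi(g) - sum_i alpha_i phi(g_i)||, read off the Gram matrix K.
    term1 = K[idx_g, idx_g]
    term2 = 2 * sum(a * K[idx_g, i] for a, i in zip(alpha, idx_medians))
    term3 = sum(a1 * a2 * K[i1, i2]
                for a1, i1 in zip(alpha, idx_medians)
                for a2, i2 in zip(alpha, idx_medians))
    return np.sqrt(max(term1 - term2 + term3, 0))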
@@ -9,6 +9,7 @@ Iterative alternate minimizations using GED.
import numpy as np
import random
import networkx as nx
from tqdm import tqdm
import sys
#from Cython_GedLib_2 import librariesImport, script
@@ -181,13 +182,27 @@ def GED(g1, g2, lib='gedlib'):
    return dis, pi_forward, pi_backward

def median_distance(Gn, Gn_median, measure='ged', verbose=False):
    dis_list = []
    pi_forward_list = []
    for idx, G in tqdm(enumerate(Gn), desc='computing median distances',
                       file=sys.stdout) if verbose else enumerate(Gn):
        dis_sum = 0
        pi_forward_list.append([])
        for G_p in Gn_median:
            dis_tmp, pi_tmp_forward, pi_tmp_backward = GED(G, G_p)
            pi_forward_list[idx].append(pi_tmp_forward)
            dis_sum += dis_tmp
        dis_list.append(dis_sum)
    return dis_list, pi_forward_list
# --------------------------- These are tests --------------------------------#

def test_iam_with_more_graphs_as_init(Gn, G_candidate, c_ei=3, c_er=3, c_es=1,
                                      node_label='atom', edge_label='bond_type'):
    """See my name, then you know what I do.
    """
    from tqdm import tqdm
#    Gn = Gn[0:10]
    Gn = [nx.convert_node_labels_to_integers(g) for g in Gn]
@@ -321,7 +336,7 @@ def test_iam_with_more_graphs_as_init(Gn, G_candidate, c_ei=3, c_er=3, c_es=1,
def test_iam_moreGraphsAsInit_tryAllPossibleBestGraphs_deleteNodesInIterations(
        Gn_median, Gn_candidate, c_ei=3, c_er=3, c_es=1, node_label='atom',
        edge_label='bond_type', connected=True):
        edge_label='bond_type', connected=False):
    """See my name, then you know what I do.
    """
    from tqdm import tqdm
@@ -330,8 +345,11 @@ def test_iam_moreGraphsAsInit_tryAllPossibleBestGraphs_deleteNodesInIterations(
    node_ir = np.inf  # corresponding to the node remove and insertion.
    label_r = 'thanksdanny'  # the label for node remove. # @todo: make this label unrepeatable.
    ds_attrs = get_dataset_attributes(Gn_median + Gn_candidate,
                                      attr_names=['edge_labeled', 'node_attr_dim'],
                                      attr_names=['edge_labeled', 'node_attr_dim', 'edge_attr_dim'],
                                      edge_label=edge_label)
    ite_max = 50
    epsilon = 0.001

    def generate_graph(G, pi_p_forward, label_set):
@@ -460,13 +478,15 @@ def test_iam_moreGraphsAsInit_tryAllPossibleBestGraphs_deleteNodesInIterations(
                            g_tmp.remove_edge(nd1, nd2)
                    # do not change anything when equal.

        # find the best graph generated in this iteration and update pi_p.
#        # find the best graph generated in this iteration and update pi_p.
        # @todo: should we update all graphs generated or just the best ones?
        dis_list, pi_forward_list = median_distance(G_new_list, Gn_median)
        # @todo: should we remove the identical and connectivity check?
        # Don't know which is faster.
        G_new_list, idx_list = remove_duplicates(G_new_list)
        pi_forward_list = [pi_forward_list[idx] for idx in idx_list]
        if ds_attrs['node_attr_dim'] == 0 and ds_attrs['edge_attr_dim'] == 0:
            G_new_list, idx_list = remove_duplicates(G_new_list)
            pi_forward_list = [pi_forward_list[idx] for idx in idx_list]
            dis_list = [dis_list[idx] for idx in idx_list]
#        if connected == True:
#            G_new_list, idx_list = remove_disconnected(G_new_list)
#            pi_forward_list = [pi_forward_list[idx] for idx in idx_list]
@@ -482,25 +502,10 @@ def test_iam_moreGraphsAsInit_tryAllPossibleBestGraphs_deleteNodesInIterations(
#            print(g.nodes(data=True))
#            print(g.edges(data=True))

        return G_new_list, pi_forward_list
        return G_new_list, pi_forward_list, dis_list

    def median_distance(Gn, Gn_median, measure='ged', verbose=False):
        dis_list = []
        pi_forward_list = []
        for idx, G in tqdm(enumerate(Gn), desc='computing median distances',
                           file=sys.stdout) if verbose else enumerate(Gn):
            dis_sum = 0
            pi_forward_list.append([])
            for G_p in Gn_median:
                dis_tmp, pi_tmp_forward, pi_tmp_backward = GED(G, G_p)
                pi_forward_list[idx].append(pi_tmp_forward)
                dis_sum += dis_tmp
            dis_list.append(dis_sum)
        return dis_list, pi_forward_list

    def best_median_graphs(Gn_candidate, dis_all, pi_all_forward):
    def best_median_graphs(Gn_candidate, pi_all_forward, dis_all):
        idx_min_list = np.argwhere(dis_all == np.min(dis_all)).flatten().tolist()
        dis_min = dis_all[idx_min_list[0]]
        pi_forward_min_list = [pi_all_forward[idx] for idx in idx_min_list]
@@ -508,25 +513,45 @@ def test_iam_moreGraphsAsInit_tryAllPossibleBestGraphs_deleteNodesInIterations(
        return G_min_list, pi_forward_min_list, dis_min
    def iteration_proc(G, pi_p_forward):
    def iteration_proc(G, pi_p_forward, cur_sod):
        G_list = [G]
        pi_forward_list = [pi_p_forward]
        old_sod = cur_sod * 2
        sod_list = [cur_sod]
        # iterations.
        for itr in range(0, 5):  # @todo: the convergence condition?
#            print('itr is', itr)
        itr = 0
        while itr < ite_max and np.abs(old_sod - cur_sod) > epsilon:
#        for itr in range(0, 5):  # the convergence condition?
            print('itr is', itr)
            G_new_list = []
            pi_forward_new_list = []
            dis_new_list = []
            for idx, G in enumerate(G_list):
                label_set = get_node_labels(Gn_median + [G], node_label)
                G_tmp_list, pi_forward_tmp_list = generate_graph(
                G_tmp_list, pi_forward_tmp_list, dis_tmp_list = generate_graph(
                    G, pi_forward_list[idx], label_set)
                G_new_list += G_tmp_list
                pi_forward_new_list += pi_forward_tmp_list
                dis_new_list += dis_tmp_list
            G_list = G_new_list[:]
            pi_forward_list = pi_forward_new_list[:]
            dis_list = dis_new_list[:]
            old_sod = cur_sod
            cur_sod = np.min(dis_list)
            sod_list.append(cur_sod)
            itr += 1
        G_list, idx_list = remove_duplicates(G_list)
        pi_forward_list = [pi_forward_list[idx] for idx in idx_list]
        # @todo: do we return all graphs or the best ones?
        # get the best ones of the generated graphs.
        G_list, pi_forward_list, dis_min = best_median_graphs(
            G_list, pi_forward_list, dis_list)
        if ds_attrs['node_attr_dim'] == 0 and ds_attrs['edge_attr_dim'] == 0:
            G_list, idx_list = remove_duplicates(G_list)
            pi_forward_list = [pi_forward_list[idx] for idx in idx_list]
#            dis_list = [dis_list[idx] for idx in idx_list]
#        import matplotlib.pyplot as plt
#        for g in G_list:
@@ -535,7 +560,9 @@ def test_iam_moreGraphsAsInit_tryAllPossibleBestGraphs_deleteNodesInIterations(
#            print(g.nodes(data=True))
#            print(g.edges(data=True))

        return G_list, pi_forward_list  # do we return all graphs or the best ones?
        print('\nsods:', sod_list, '\n')
        return G_list, pi_forward_list, dis_min
    def remove_duplicates(Gn):
@@ -570,28 +597,37 @@ def test_iam_moreGraphsAsInit_tryAllPossibleBestGraphs_deleteNodesInIterations(
    # phase 1: initialize.
    # compute set-median.
    dis_min = np.inf
    dis_all, pi_all_forward = median_distance(Gn_candidate, Gn_median)
    dis_list, pi_forward_all = median_distance(Gn_candidate, Gn_median)
    # find all smallest distances.
    idx_min_list = np.argwhere(dis_all == np.min(dis_all)).flatten().tolist()
    dis_min = dis_all[idx_min_list[0]]
    idx_min_list = np.argwhere(dis_list == np.min(dis_list)).flatten().tolist()
    dis_min = dis_list[idx_min_list[0]]

    # phase 2: iteration.
    G_list = []
    for idx_min in idx_min_list[::-1]:
    dis_list = []
    pi_forward_list = []
    for idx_min in idx_min_list:
#        print('idx_min is', idx_min)
        G = Gn_candidate[idx_min].copy()
        # list of edit operations.
        pi_p_forward = pi_all_forward[idx_min]
        pi_p_forward = pi_forward_all[idx_min]
#        pi_p_backward = pi_all_backward[idx_min]
        Gi_list, pi_i_forward_list = iteration_proc(G, pi_p_forward)
        Gi_list, pi_i_forward_list, dis_i_min = iteration_proc(G, pi_p_forward, dis_min)
        G_list += Gi_list
        dis_list.append(dis_i_min)
        pi_forward_list += pi_i_forward_list

    G_list, _ = remove_duplicates(G_list)
    if ds_attrs['node_attr_dim'] == 0 and ds_attrs['edge_attr_dim'] == 0:
        G_list, idx_list = remove_duplicates(G_list)
        dis_list = [dis_list[idx] for idx in idx_list]
        pi_forward_list = [pi_forward_list[idx] for idx in idx_list]
    if connected == True:
        G_list_con, _ = remove_disconnected(G_list)
        # if there are no connected graphs at all, then retain the disconnected ones.
        if len(G_list_con) > 0:  # @todo: ??????????????????????????
            G_list = G_list_con
        G_list_con, idx_list = remove_disconnected(G_list)
        # if there are no connected graphs at all, then retain the disconnected ones.
        if len(G_list_con) > 0:  # @todo: ??????????????????????????
            G_list = G_list_con
            dis_list = [dis_list[idx] for idx in idx_list]
            pi_forward_list = [pi_forward_list[idx] for idx in idx_list]
#    import matplotlib.pyplot as plt
#    for g in G_list:
@@ -601,15 +637,15 @@ def test_iam_moreGraphsAsInit_tryAllPossibleBestGraphs_deleteNodesInIterations(
#        print(g.edges(data=True))

    # get the best median graphs
    dis_all, pi_all_forward = median_distance(G_list, Gn_median)
#    dis_list, pi_forward_list = median_distance(G_list, Gn_median)
    G_min_list, pi_forward_min_list, dis_min = best_median_graphs(
        G_list, dis_all, pi_all_forward)
        G_list, pi_forward_list, dis_list)
#    for g in G_min_list:
#        nx.draw_networkx(g)
#        plt.show()
#        print(g.nodes(data=True))
#        print(g.edges(data=True))
    return G_min_list
    return G_min_list, dis_min

if __name__ == '__main__':
@@ -0,0 +1,218 @@
import sys
sys.path.insert(0, "../")
#import pathlib
import numpy as np
import networkx as nx
import time

#import librariesImport
#import script
#sys.path.insert(0, "/home/bgauzere/dev/optim-graphes/")
#import pygraph
from pygraph.utils.graphfiles import loadDataset

def replace_graph_in_env(script, graph, old_id, label='median'):
    """
    Replace a graph in the script environment.
    If old_id is -1, add a new graph to the environment.
    """
    if(old_id > -1):
        script.PyClearGraph(old_id)
    new_id = script.PyAddGraph(label)
    for i in graph.nodes():
        script.PyAddNode(new_id, str(i), graph.node[i])  # !! strings are required by gedlib
    for e in graph.edges:
        script.PyAddEdge(new_id, str(e[0]), str(e[1]), {})
    script.PyInitEnv()
    script.PySetMethod("IPFP", "")
    script.PyInitMethod()
    return new_id

# Draw the current median
def draw_Letter_graph(graph, savepath=''):
    import numpy as np
    import networkx as nx
    import matplotlib.pyplot as plt
    plt.figure()
    pos = {}
    for n in graph.nodes:
        pos[n] = np.array([float(graph.node[n]['attributes'][0]),
                           float(graph.node[n]['attributes'][1])])
    nx.draw_networkx(graph, pos)
    if savepath != '':
        plt.savefig(savepath + str(time.time()) + '.eps', format='eps', dpi=300)
    plt.show()
    plt.clf()
# compute new mappings
def update_mappings(script, median_id, listID):
    med_distances = {}
    med_mappings = {}
    sod = 0
    for i in range(0, len(listID)):
        script.PyRunMethod(median_id, listID[i])
        med_distances[i] = script.PyGetUpperBound(median_id, listID[i])
        med_mappings[i] = script.PyGetForwardMap(median_id, listID[i])
        sod += med_distances[i]
    return med_distances, med_mappings, sod

def calcul_Sij(all_mappings, all_graphs, i, j):
    s_ij = 0
    for k in range(0, len(all_mappings)):
        cur_graph = all_graphs[k]
        cur_mapping = all_mappings[k]
        size_graph = cur_graph.order()
        if ((cur_mapping[i] < size_graph) and
                (cur_mapping[j] < size_graph) and
                (cur_graph.has_edge(cur_mapping[i], cur_mapping[j]) == True)):
            s_ij += 1
    return s_ij

#def update_median_nodes_L1(median, listIdSet, median_id, dataset, mappings):
#    from scipy.stats.mstats import gmean
#    for i in median.nodes():
#        for k in listIdSet:
#            vectors = []  # np.zeros((len(listIdSet), 2))
#            if(k != median_id):
#                phi_i = mappings[k][i]
#                if(phi_i < dataset[k].order()):
#                    vectors.append([float(dataset[k].node[phi_i]['x']), float(dataset[k].node[phi_i]['y'])])
#        new_labels = gmean(vectors)
#        median.node[i]['x'] = str(new_labels[0])
#        median.node[i]['y'] = str(new_labels[1])
#    return median
def update_median_nodes(median, dataset, mappings):
    # update node attributes: each median node gets the (rescaled) mean of the
    # positions of the dataset nodes it is mapped to.
    for i in median.nodes():
        nb_sub = 0
        mean_label = {'x': 0, 'y': 0}
        for k in range(0, len(mappings)):
            phi_i = mappings[k][i]
            if (phi_i < dataset[k].order()):
                nb_sub += 1
                mean_label['x'] += 0.75 * float(dataset[k].node[phi_i]['x'])
                mean_label['y'] += 0.75 * float(dataset[k].node[phi_i]['y'])
        median.node[i]['x'] = str((1 / 0.75) * (mean_label['x'] / nb_sub))
        median.node[i]['y'] = str((1 / 0.75) * (mean_label['y'] / nb_sub))
    return median

def update_median_edges(dataset, mappings, median, cei=0.425, cer=0.425):
    # for letter high, ceir = 1.7, alpha = 0.75
    size_dataset = len(dataset)
    ratio_cei_cer = cer / (cei + cer)
    threshold = size_dataset * ratio_cei_cer
    order_graph_median = median.order()
    for i in range(0, order_graph_median):
        for j in range(i + 1, order_graph_median):
            s_ij = calcul_Sij(mappings, dataset, i, j)
            # keep edge (i, j) iff enough graphs map it onto an existing edge.
            if(s_ij > threshold):
                median.add_edge(i, j)
            else:
                if(median.has_edge(i, j)):
                    median.remove_edge(i, j)
    return median
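For intuition about the edge rule: with c_ei = c_er the threshold is half the dataset size, i.e. a majority vote over the mappings. A toy check that could be appended to this module (the identity mappings are hypothetical; calcul_Sij is the function defined above):

import networkx as nx

g = nx.path_graph(3)               # nodes 0-1-2 with edges (0, 1) and (1, 2)
dataset_toy = [g, g.copy(), g.copy()]
mappings_toy = [[0, 1, 2]] * 3     # identity node maps (hypothetical)
# every graph maps (0, 1) onto an existing edge, so s_01 = 3;
# threshold = 3 * 0.5 = 1.5, hence the median would keep edge (0, 1).
print(calcul_Sij(mappings_toy, dataset_toy, 0, 1))  # -> 3
# (0, 2) is never an edge, so s_02 = 0 and the edge would be dropped.
print(calcul_Sij(mappings_toy, dataset_toy, 0, 2))  # -> 0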
def compute_median(script, listID, dataset, verbose=False):
    """Compute a graph median of a dataset according to an environment

    Parameters

    script : an initialized gedlib environment
    listID (list): a list of IDs in script; encodes the dataset
    dataset (list): corresponding graphs in networkX format. We assume that graph
    listID[i] corresponds to dataset[i]

    Returns:
    A networkX graph, which is the median, with the corresponding sod
    """
    print(len(listID))
    median_set_index, median_set_sod = compute_median_set(script, listID)
    print(median_set_index)
    print(median_set_sod)
    sods = []
    # Add the median to the environment
    set_median = dataset[median_set_index].copy()
    median = dataset[median_set_index].copy()
    cur_med_id = replace_graph_in_env(script, median, -1)
    med_distances, med_mappings, cur_sod = update_mappings(script, cur_med_id, listID)
    sods.append(cur_sod)
    if(verbose):
        print(cur_sod)
    ite_max = 50
    old_sod = cur_sod * 2
    ite = 0
    epsilon = 0.001
    # iterate until the sod stops improving or the iteration cap is reached.
    while((ite < ite_max) and (np.abs(old_sod - cur_sod) > epsilon)):
        # remember the previous sod so the convergence test compares
        # successive iterations.
        old_sod = cur_sod
        median = update_median_nodes(median, dataset, med_mappings)
        median = update_median_edges(dataset, med_mappings, median)
        cur_med_id = replace_graph_in_env(script, median, cur_med_id)
        med_distances, med_mappings, cur_sod = update_mappings(script, cur_med_id, listID)
        sods.append(cur_sod)
        if(verbose):
            print(cur_sod)
        ite += 1
    return median, cur_sod, sods, set_median
def compute_median_set(script, listID):
    'Returns the index (and sod) of the set-median among the graphs in listID'
    # Compute the median set
    N = len(listID)
    map_id_to_index = {}
    map_index_to_id = {}
    for i in range(0, len(listID)):
        map_id_to_index[listID[i]] = i
        map_index_to_id[i] = listID[i]
    # pairwise GED upper bounds between all graphs in the environment.
    distances = np.zeros((N, N))
    for i in listID:
        for j in listID:
            script.PyRunMethod(i, j)
            distances[map_id_to_index[i], map_id_to_index[j]] = script.PyGetUpperBound(i, j)
    # the set-median minimizes the sum of distances to all other graphs.
    median_set_index = np.argmin(np.sum(distances, 0))
    sod = np.min(np.sum(distances, 0))
    return median_set_index, sod
#if __name__ == "__main__":
#    # Load the dataset
#    script.PyLoadGXLGraph('/home/bgauzere/dev/gedlib/data/datasets/Letter/HIGH/', '/home/bgauzere/dev/gedlib/data/collections/Letter_Z.xml')
#    script.PySetEditCost("LETTER")
#    script.PyInitEnv()
#    script.PySetMethod("IPFP", "")
#    script.PyInitMethod()
#
#    dataset, my_y = pygraph.utils.graphfiles.loadDataset("/home/bgauzere/dev/gedlib/data/datasets/Letter/HIGH/Letter_Z.cxl")
#
#    listID = script.PyGetAllGraphIds()
#    median, sod = compute_median(script, listID, dataset, verbose=True)
#
#    print(sod)
#    draw_Letter_graph(median)
if __name__ == '__main__':
    # test draw_Letter_graph
    ds = {'name': 'Letter-high', 'dataset': '../datasets/Letter-high/Letter-high_A.txt',
          'extra_params': {}}  # node nsymb
    Gn, y_all = loadDataset(ds['dataset'], extra_params=ds['extra_params'])
    print(y_all)
    for g in Gn:
        draw_Letter_graph(g)
@@ -0,0 +1,423 @@
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
"""
Created on Thu Jul 4 12:20:16 2019

@author: ljia
"""
import numpy as np
import networkx as nx
import matplotlib.pyplot as plt
import time
from tqdm import tqdm

import sys
sys.path.insert(0, "../")
from pygraph.utils.graphfiles import loadDataset
from median import draw_Letter_graph

# --------------------------- These are tests --------------------------------#

def test_who_is_the_closest_in_kernel_space(Gn):
    idx_gi = [0, 6]
    g1 = Gn[idx_gi[0]]
    g2 = Gn[idx_gi[1]]
    # create the "median" graph.
    gnew = g2.copy()
    gnew.remove_node(0)
    nx.draw_networkx(gnew)
    plt.show()
    print(gnew.nodes(data=True))
    Gn = [gnew] + Gn
    # compute gram matrix
    Kmatrix = compute_kernel(Gn, 'untilhpathkernel', True)
    # the distance matrix
    dmatrix = gram2distances(Kmatrix)
    print(np.sort(dmatrix[idx_gi[0] + 1]))
    print(np.argsort(dmatrix[idx_gi[0] + 1]))
    print(np.sort(dmatrix[idx_gi[1] + 1]))
    print(np.argsort(dmatrix[idx_gi[1] + 1]))
    # for all g in Gn, compute (d(g1, g) + d(g2, g)) / 2
    dis_median = [(dmatrix[i, idx_gi[0] + 1] + dmatrix[i, idx_gi[1] + 1]) / 2 for i in range(len(Gn))]
    print(np.sort(dis_median))
    print(np.argsort(dis_median))
    return
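gram2distances is used above without being shown; under the standard kernel-induced metric it amounts to d(gi, gj)^2 = K_ii + K_jj - 2 K_ij. A minimal sketch of what it presumably computes (an assumption about the helper, not its actual source):

import numpy as np

def gram2distances(K):
    # d_ij = ||phi(g_i) - phi(g_j)||, recovered from the Gram matrix K.
    d = np.diag(K)[:, None] + np.diag(K)[None, :] - 2 * K
    return np.sqrt(np.maximum(d, 0))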
def test_who_is_the_closest_in_GED_space(Gn):
    from iam import GED
    idx_gi = [0, 6]
    g1 = Gn[idx_gi[0]]
    g2 = Gn[idx_gi[1]]
    # create the "median" graph.
    gnew = g2.copy()
    gnew.remove_node(0)
    nx.draw_networkx(gnew)
    plt.show()
    print(gnew.nodes(data=True))
    Gn = [gnew] + Gn
    # compute GEDs
    ged_matrix = np.zeros((len(Gn), len(Gn)))
    for i1 in tqdm(range(len(Gn)), desc='computing GEDs', file=sys.stdout):
        for i2 in range(len(Gn)):
            dis, _, _ = GED(Gn[i1], Gn[i2], lib='gedlib')
            ged_matrix[i1, i2] = dis
    print(np.sort(ged_matrix[idx_gi[0] + 1]))
    print(np.argsort(ged_matrix[idx_gi[0] + 1]))
    print(np.sort(ged_matrix[idx_gi[1] + 1]))
    print(np.argsort(ged_matrix[idx_gi[1] + 1]))
    # for all g in Gn, compute (GED(g1, g) + GED(g2, g)) / 2
    dis_median = [(ged_matrix[i, idx_gi[0] + 1] + ged_matrix[i, idx_gi[1] + 1]) / 2 for i in range(len(Gn))]
    print(np.sort(dis_median))
    print(np.argsort(dis_median))
    return
def test_will_IAM_give_the_median_graph_we_wanted(Gn):
    idx_gi = [0, 6]
    g1 = Gn[idx_gi[0]].copy()
    g2 = Gn[idx_gi[1]].copy()
#    del Gn[idx_gi[0]]
#    del Gn[idx_gi[1] - 1]
    g_median = test_iam_with_more_graphs_as_init([g1, g2], [g1, g2], c_ei=1, c_er=1, c_es=1)
#    g_median = test_iam_with_more_graphs_as_init(Gn, Gn, c_ei=1, c_er=1, c_es=1)
    nx.draw_networkx(g_median)
    plt.show()
    print(g_median.nodes(data=True))
    print(g_median.edges(data=True))

def test_new_IAM_allGraph_deleteNodes(Gn):
    idx_gi = [0, 6]
#    g1 = Gn[idx_gi[0]].copy()
#    g2 = Gn[idx_gi[1]].copy()

#    g1 = nx.Graph(name='haha')
#    g1.add_nodes_from([(0, {'atom': 'C'}), (1, {'atom': 'O'}), (2, {'atom': 'C'})])
#    g1.add_edges_from([(0, 1, {'bond_type': '1'}), (1, 2, {'bond_type': '1'})])
#    g2 = nx.Graph(name='hahaha')
#    g2.add_nodes_from([(0, {'atom': 'C'}), (1, {'atom': 'O'}), (2, {'atom': 'C'}),
#                       (3, {'atom': 'O'}), (4, {'atom': 'C'})])
#    g2.add_edges_from([(0, 1, {'bond_type': '1'}), (1, 2, {'bond_type': '1'}),
#                       (2, 3, {'bond_type': '1'}), (3, 4, {'bond_type': '1'})])
    g1 = nx.Graph(name='haha')
    g1.add_nodes_from([(0, {'atom': 'C'}), (1, {'atom': 'C'}), (2, {'atom': 'C'}),
                       (3, {'atom': 'S'}), (4, {'atom': 'S'})])
    g1.add_edges_from([(0, 1, {'bond_type': '1'}), (1, 2, {'bond_type': '1'}),
                       (2, 3, {'bond_type': '1'}), (2, 4, {'bond_type': '1'})])
    g2 = nx.Graph(name='hahaha')
    g2.add_nodes_from([(0, {'atom': 'C'}), (1, {'atom': 'C'}), (2, {'atom': 'C'}),
                       (3, {'atom': 'O'}), (4, {'atom': 'O'})])
    g2.add_edges_from([(0, 1, {'bond_type': '1'}), (1, 2, {'bond_type': '1'}),
                       (2, 3, {'bond_type': '1'}), (2, 4, {'bond_type': '1'})])

#    g2 = g1.copy()
#    g2.add_nodes_from([(3, {'atom': 'O'})])
#    g2.add_nodes_from([(4, {'atom': 'C'})])
#    g2.add_edges_from([(1, 3, {'bond_type': '1'})])
#    g2.add_edges_from([(3, 4, {'bond_type': '1'})])

#    del Gn[idx_gi[0]]
#    del Gn[idx_gi[1] - 1]
    nx.draw_networkx(g1)
    plt.show()
    print(g1.nodes(data=True))
    print(g1.edges(data=True))
    nx.draw_networkx(g2)
    plt.show()
    print(g2.nodes(data=True))
    print(g2.edges(data=True))

    g_median = test_iam_moreGraphsAsInit_tryAllPossibleBestGraphs_deleteNodesInIterations([g1, g2], [g1, g2], c_ei=1, c_er=1, c_es=1)
#    g_median = test_iam_moreGraphsAsInit_tryAllPossibleBestGraphs_deleteNodesInIterations(Gn, Gn, c_ei=1, c_er=1, c_es=1)
    nx.draw_networkx(g_median)
    plt.show()
    print(g_median.nodes(data=True))
    print(g_median.edges(data=True))
| def test_the_simple_two(Gn, gkernel): | |||
| from gk_iam import gk_iam_nearest_multi, compute_kernel | |||
| lmbda = 0.03 # termination probalility | |||
| r_max = 10 # recursions | |||
| l = 500 | |||
| alpha_range = np.linspace(0.5, 0.5, 1) | |||
| k = 2 # k nearest neighbors | |||
| # randomly select two molecules | |||
| np.random.seed(1) | |||
| idx_gi = [0, 6] # np.random.randint(0, len(Gn), 2) | |||
| g1 = Gn[idx_gi[0]] | |||
| g2 = Gn[idx_gi[1]] | |||
| Gn_mix = [g.copy() for g in Gn] | |||
| Gn_mix.append(g1.copy()) | |||
| Gn_mix.append(g2.copy()) | |||
| # g_tmp = iam([g1, g2]) | |||
| # nx.draw_networkx(g_tmp) | |||
| # plt.show() | |||
| # compute | |||
| # k_list = [] # kernel between each graph and itself. | |||
| # k_g1_list = [] # kernel between each graph and g1 | |||
| # k_g2_list = [] # kernel between each graph and g2 | |||
| # for ig, g in tqdm(enumerate(Gn), desc='computing self kernels', file=sys.stdout): | |||
| # ktemp = compute_kernel([g, g1, g2], 'marginalizedkernel', False) | |||
| # k_list.append(ktemp[0][0, 0]) | |||
| # k_g1_list.append(ktemp[0][0, 1]) | |||
| # k_g2_list.append(ktemp[0][0, 2]) | |||
| km = compute_kernel(Gn_mix, gkernel, True) | |||
| # k_list = np.diag(km) # kernel between each graph and itself. | |||
| # k_g1_list = km[idx_gi[0]] # kernel between each graph and g1 | |||
| # k_g2_list = km[idx_gi[1]] # kernel between each graph and g2 | |||
| g_best = [] | |||
| dis_best = [] | |||
| # for each alpha | |||
| for alpha in alpha_range: | |||
| print('alpha =', alpha) | |||
| dhat, ghat_list = gk_iam_nearest_multi(Gn, [g1, g2], [alpha, 1 - alpha], | |||
| range(len(Gn), len(Gn) + 2), km, | |||
| k, r_max, gkernel) | |||
| dis_best.append(dhat) | |||
| g_best.append(ghat_list) | |||
| for idx, item in enumerate(alpha_range): | |||
| print('when alpha is', item, 'the shortest distance is', dis_best[idx]) | |||
| print('the corresponding pre-images are') | |||
| for g in g_best[idx]: | |||
| nx.draw_networkx(g) | |||
| plt.show() | |||
| print(g.nodes(data=True)) | |||
| print(g.edges(data=True)) | |||
| def test_remove_bests(Gn, gkernel): | |||
| from gk_iam import gk_iam_nearest_multi, compute_kernel | |||
| lmbda = 0.03 # termination probability | |||
| r_max = 10 # recursions | |||
| l = 500 | |||
| alpha_range = np.linspace(0.5, 0.5, 1) | |||
| k = 20 # k nearest neighbors | |||
| # randomly select two molecules | |||
| np.random.seed(1) | |||
| idx_gi = [0, 6] # np.random.randint(0, len(Gn), 2) | |||
| g1 = Gn[idx_gi[0]] | |||
| g2 = Gn[idx_gi[1]] | |||
| # remove the best 2 graphs. | |||
| del Gn[idx_gi[0]] | |||
| del Gn[idx_gi[1] - 1] | |||
| # del Gn[8] | |||
| Gn_mix = [g.copy() for g in Gn] | |||
| Gn_mix.append(g1.copy()) | |||
| Gn_mix.append(g2.copy()) | |||
| # compute | |||
| km = compute_kernel(Gn_mix, gkernel, True) | |||
| g_best = [] | |||
| dis_best = [] | |||
| # for each alpha | |||
| for alpha in alpha_range: | |||
| print('alpha =', alpha) | |||
| dhat, ghat_list = gk_iam_nearest_multi(Gn, [g1, g2], [alpha, 1 - alpha], | |||
| range(len(Gn), len(Gn) + 2), km, | |||
| k, r_max, gkernel) | |||
| dis_best.append(dhat) | |||
| g_best.append(ghat_list) | |||
| for idx, item in enumerate(alpha_range): | |||
| print('when alpha is', item, 'the shortest distance is', dis_best[idx]) | |||
| print('the corresponding pre-images are') | |||
| for g in g_best[idx]: | |||
| draw_Letter_graph(g) | |||
| # nx.draw_networkx(g) | |||
| # plt.show() | |||
| print(g.nodes(data=True)) | |||
| print(g.edges(data=True)) | |||
| def test_gkiam_letter_h(): | |||
| from gk_iam import gk_iam_nearest_multi, compute_kernel | |||
| from iam import median_distance | |||
| ds = {'name': 'Letter-high', 'dataset': '../datasets/Letter-high/Letter-high_A.txt', | |||
| 'extra_params': {}} # node nsymb | |||
| # ds = {'name': 'Letter-med', 'dataset': '../datasets/Letter-med/Letter-med_A.txt', | |||
| # 'extra_params': {}} # node nsymb | |||
| Gn, y_all = loadDataset(ds['dataset'], extra_params=ds['extra_params']) | |||
| gkernel = 'structuralspkernel' | |||
| lmbda = 0.03 # termination probability | |||
| r_max = 3 # recursions | |||
| # alpha_range = np.linspace(0.5, 0.5, 1) | |||
| k = 10 # k nearest neighbors | |||
| # classify graphs according to letters. | |||
| idx_dict = get_same_item_indices(y_all) | |||
| time_list = [] | |||
| sod_list = [] | |||
| sod_min_list = [] | |||
| for letter in idx_dict: | |||
| print('\n-------------------------------------------------------\n') | |||
| Gn_let = [Gn[i].copy() for i in idx_dict[letter]] | |||
| Gn_mix = Gn_let + [g.copy() for g in Gn_let] | |||
| alpha_range = np.linspace(1 / len(Gn_let), 1 / len(Gn_let), 1) | |||
| # compute | |||
| time0 = time.time() | |||
| km = compute_kernel(Gn_mix, gkernel, True) | |||
| g_best = [] | |||
| dis_best = [] | |||
| # for each alpha | |||
| for alpha in alpha_range: | |||
| print('alpha =', alpha) | |||
| dhat, ghat_list = gk_iam_nearest_multi(Gn_let, Gn_let, [alpha] * len(Gn_let), | |||
| range(len(Gn_let), len(Gn_mix)), km, | |||
| k, r_max, gkernel, c_ei=1.7, | |||
| c_er=1.7, c_es=1.7) | |||
| dis_best.append(dhat) | |||
| g_best.append(ghat_list) | |||
| time_list.append(time.time() - time0) | |||
| # show best graphs and save them to file. | |||
| for idx, item in enumerate(alpha_range): | |||
| print('when alpha is', item, 'the shortest distance is', dis_best[idx]) | |||
| print('the corresponding pre-images are') | |||
| for g in g_best[idx]: | |||
| draw_Letter_graph(g, savepath='results/gk_iam/') | |||
| # nx.draw_networkx(g) | |||
| # plt.show() | |||
| print(g.nodes(data=True)) | |||
| print(g.edges(data=True)) | |||
| # compute the corresponding sod in graph space. (alpha range not considered.) | |||
| sod_tmp, _ = median_distance(g_best[0], Gn_let) | |||
| sod_list.append(sod_tmp) | |||
| sod_min_list.append(np.min(sod_tmp)) | |||
| print('\nsods in graph space: ', sod_list) | |||
| print('\nsmallest sod in graph space for each letter: ', sod_min_list) | |||
| print('\ntimes:', time_list) | |||
| def get_same_item_indices(ls): | |||
| """Get the indices of the same items in a list. Return a dict keyed by items. | |||
| """ | |||
| idx_dict = {} | |||
| for idx, item in enumerate(ls): | |||
| if item in idx_dict: | |||
| idx_dict[item].append(idx) | |||
| else: | |||
| idx_dict[item] = [idx] | |||
| return idx_dict | |||
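| # A quick illustration of what get_same_item_indices returns (labels made up): | |||
| # >>> get_same_item_indices(['A', 'E', 'A', 'F', 'E']) | |||
| # {'A': [0, 2], 'E': [1, 4], 'F': [3]} | |||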
| #def compute_letter_median_by_average(Gn): | |||
| # return g_median | |||
| def test_iam_letter_h(): | |||
| from iam import test_iam_moreGraphsAsInit_tryAllPossibleBestGraphs_deleteNodesInIterations | |||
| from gk_iam import dis_gstar, compute_kernel | |||
| ds = {'name': 'Letter-high', 'dataset': '../datasets/Letter-high/Letter-high_A.txt', | |||
| 'extra_params': {}} # node nsymb | |||
| # ds = {'name': 'Letter-med', 'dataset': '../datasets/Letter-med/Letter-med_A.txt', | |||
| # 'extra_params': {}} # node nsymb | |||
| Gn, y_all = loadDataset(ds['dataset'], extra_params=ds['extra_params']) | |||
| lmbda = 0.03 # termination probability | |||
| # alpha_range = np.linspace(0.5, 0.5, 1) | |||
| # classify graphs according to letters. | |||
| idx_dict = get_same_item_indices(y_all) | |||
| time_list = [] | |||
| sod_list = [] | |||
| sod_min_list = [] | |||
| for letter in idx_dict: | |||
| Gn_let = [Gn[i].copy() for i in idx_dict[letter]] | |||
| alpha_range = np.linspace(1 / len(Gn_let), 1 / len(Gn_let), 1) | |||
| # compute | |||
| g_best = [] | |||
| dis_best = [] | |||
| time0 = time.time() | |||
| # for each alpha | |||
| for alpha in alpha_range: | |||
| print('alpha =', alpha) | |||
| ghat_list, dhat = test_iam_moreGraphsAsInit_tryAllPossibleBestGraphs_deleteNodesInIterations( | |||
| Gn_let, Gn_let, c_ei=1.7, c_er=1.7, c_es=1.7) | |||
| dis_best.append(dhat) | |||
| g_best.append(ghat_list) | |||
| time_list.append(time.time() - time0) | |||
| # show best graphs and save them to file. | |||
| for idx, item in enumerate(alpha_range): | |||
| print('when alpha is', item, 'the shortest distance is', dis_best[idx]) | |||
| print('the corresponding pre-images are') | |||
| for g in g_best[idx]: | |||
| draw_Letter_graph(g, savepath='results/iam/') | |||
| # nx.draw_networkx(g) | |||
| # plt.show() | |||
| print(g.nodes(data=True)) | |||
| print(g.edges(data=True)) | |||
| # compute the corresponding sod in kernel space. (alpha range not considered.) | |||
| gkernel = 'structuralspkernel' | |||
| sod_tmp = [] | |||
| Gn_mix = g_best[0] + Gn_let | |||
| km = compute_kernel(Gn_mix, gkernel, True) | |||
| for ig, g in tqdm(enumerate(g_best[0]), desc='computing kernel sod', file=sys.stdout): | |||
| dtemp = dis_gstar(ig, range(len(g_best[0]), len(Gn_mix)), | |||
| [alpha_range[0]] * len(Gn_let), km, withterm3=False) | |||
| sod_tmp.append(dtemp) | |||
| sod_list.append(sod_tmp) | |||
| sod_min_list.append(np.min(sod_tmp)) | |||
| print('\nsods in kernel space: ', sod_list) | |||
| print('\nsmallest sod in kernel space for each letter: ', sod_min_list) | |||
| print('\ntimes:', time_list) | |||
| if __name__ == '__main__': | |||
| # ds = {'name': 'MUTAG', 'dataset': '../datasets/MUTAG/MUTAG_A.txt', | |||
| # 'extra_params': {}} # node/edge symb | |||
| ds = {'name': 'Letter-high', 'dataset': '../datasets/Letter-high/Letter-high_A.txt', | |||
| 'extra_params': {}} # node nsymb | |||
| # ds = {'name': 'Acyclic', 'dataset': '../datasets/monoterpenoides/trainset_9.ds', | |||
| # 'extra_params': {}} | |||
| # ds = {'name': 'Acyclic', 'dataset': '../datasets/acyclic/dataset_bps.ds', | |||
| # 'extra_params': {}} # node symb | |||
| Gn, y_all = loadDataset(ds['dataset'], extra_params=ds['extra_params']) | |||
| # Gn = Gn[0:20] | |||
| # import networkx.algorithms.isomorphism as iso | |||
| # G1 = nx.MultiDiGraph() | |||
| # G2 = nx.MultiDiGraph() | |||
| # G1.add_nodes_from([1,2,3], fill='red') | |||
| # G2.add_nodes_from([10,20,30,40], fill='red') | |||
| # nx.add_path(G1, [1,2,3,4], weight=3, linewidth=2.5) | |||
| # nx.add_path(G2, [10,20,30,40], weight=3) | |||
| # nm = iso.categorical_node_match('fill', 'red') | |||
| # print(nx.is_isomorphic(G1, G2, node_match=nm)) | |||
| # | |||
| # test_new_IAM_allGraph_deleteNodes(Gn) | |||
| # test_will_IAM_give_the_median_graph_we_wanted(Gn) | |||
| # test_who_is_the_closest_in_GED_space(Gn) | |||
| # test_who_is_the_closest_in_kernel_space(Gn) | |||
| # test_the_simple_two(Gn, 'untilhpathkernel') | |||
| # test_remove_bests(Gn, 'untilhpathkernel') | |||
| test_gkiam_letter_h() | |||
| # test_iam_letter_h() | |||
| @@ -23,7 +23,7 @@ from pygraph.utils.parallel import parallel_gm | |||
| def commonwalkkernel(*args, | |||
| node_label='atom', | |||
| edge_label='bond_type', | |||
| # n=None, | |||
| weight=1, | |||
| compute_method=None, | |||
| n_jobs=None, | |||
| @@ -35,26 +35,28 @@ def commonwalkkernel(*args, | |||
| List of graphs between which the kernels are calculated. | |||
| / | |||
| G1, G2 : NetworkX graphs | |||
| Two graphs between which the kernel is calculated. | |||
| node_label : string | |||
| Node attribute used as symbolic label. The default node label is 'atom'. | |||
| edge_label : string | |||
| Edge attribute used as symbolic label. The default edge label is 'bond_type'. | |||
| # n : integer | |||
| # Longest length of walks. Only useful when applying the 'brute' method. | |||
| weight: integer | |||
| Weight coefficient of different lengths of walks, which represents beta | |||
| in 'exp' method and gamma in 'geo'. | |||
| compute_method : string | |||
| Method used to compute the walk kernel. The following choices are | |||
| available: | |||
| 'exp': method based on exponential series applied on the direct | |||
| product graph, as shown in reference [1]. The time complexity is O(n^6) | |||
| for graphs with n vertices. | |||
| 'geo': method based on geometric series applied on the direct product | |||
| graph, as shown in reference [1]. The time complexity is O(n^6) for | |||
| graphs with n vertices. | |||
| # 'brute': brute force, simply search for all walks and compare them. | |||
| n_jobs : int | |||
| Number of jobs for parallelization. | |||
| Return | |||
| ------ | |||
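| # Not part of this module: a dense numpy sketch of the 'exp' and 'geo' | |||
| # weightings above on an unlabeled direct product graph, with uniform | |||
| # start/stop distributions assumed. Usable only for tiny graphs. | |||
| import numpy as np | |||
| from scipy.linalg import expm | |||
| def common_walk_sketch(A1, A2, weight, compute_method='geo'): | |||
|     W = np.kron(A1, A2)  # adjacency matrix of the direct product graph | |||
|     n = W.shape[0] | |||
|     p = q = np.ones(n) / n  # uniform start/stop distributions | |||
|     if compute_method == 'exp': | |||
|         K = expm(weight * W)  # sum_k (beta^k / k!) W^k | |||
|     else: | |||
|         # sum_k gamma^k W^k = (I - gamma W)^{-1}; needs gamma < 1/rho(W) | |||
|         K = np.linalg.inv(np.eye(n) - weight * W) | |||
|     return p @ K @ q | |||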
| @@ -44,17 +44,20 @@ def marginalizedkernel(*args, | |||
| List of graphs between which the kernels are calculated. | |||
| / | |||
| G1, G2 : NetworkX graphs | |||
| Two graphs between which the kernel is calculated. | |||
| node_label : string | |||
| Node attribute used as symbolic label. The default node label is 'atom'. | |||
| edge_label : string | |||
| Edge attribute used as symbolic label. The default edge label is 'bond_type'. | |||
| p_quit : integer | |||
| The termination probability in the random walk generation step. | |||
| n_iteration : integer | |||
| Number of iterations to calculate R_inf. | |||
| remove_totters : boolean | |||
| Whether to remove totterings by the method introduced in [2]. The | |||
| default value is False. | |||
| n_jobs : int | |||
| Number of jobs for parallelization. | |||
| Return | |||
| ------ | |||
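| # Not this module's R_inf iteration: a Monte Carlo sketch of the idea | |||
| # behind the marginalized kernel. Label sequences of random walks that | |||
| # stop with probability p_quit are sampled and compared with a delta | |||
| # kernel; the names below are illustrative only. | |||
| import random | |||
| def sample_walk_labels(G, p_quit, node_label='atom', edge_label='bond_type'): | |||
|     v = random.choice(list(G.nodes())) | |||
|     seq = [G.nodes[v][node_label]] | |||
|     while random.random() > p_quit and len(G[v]) > 0: | |||
|         w = random.choice(list(G[v])) | |||
|         seq += [G[v][w][edge_label], G.nodes[w][node_label]] | |||
|         v = w | |||
|     return tuple(seq) | |||
| def marginalized_mc_sketch(G1, G2, p_quit=0.3, n_samples=10000): | |||
|     # empirical mean of the delta kernel between sampled label sequences | |||
|     pairs = ((sample_walk_labels(G1, p_quit), sample_walk_labels(G2, p_quit)) | |||
|              for _ in range(n_samples)) | |||
|     return sum(s1 == s2 for s1, s2 in pairs) / n_samples | |||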
| @@ -41,15 +41,62 @@ def randomwalkkernel(*args, | |||
| List of graphs between which the kernels are calculated. | |||
| / | |||
| G1, G2 : NetworkX graphs | |||
| Two graphs between which the kernel is calculated. | |||
| compute_method : string | |||
| Method used to compute the kernel. The following choices are | |||
| available: | |||
| 'sylvester' - Sylvester equation method. | |||
| 'conjugate' - conjugate gradient method. | |||
| 'fp' - fixed-point iterations. | |||
| 'spectral' - spectral decomposition. | |||
| weight : float | |||
| A constant weight set for random walks of length h. | |||
| p : None | |||
| Initial probability distribution on the unlabeled direct product graph | |||
| of two graphs. It is set to be uniform over all vertices in the direct | |||
| product graph. | |||
| q : None | |||
| Stopping probability distribution on the unlabeled direct product graph | |||
| of two graphs. It is set to be uniform over all vertices in the direct | |||
| product graph. | |||
| edge_weight : string | |||
| Edge attribute name corresponding to the edge weight. | |||
| node_kernels : dict | |||
| A dictionary of kernel functions for nodes, including 3 items: 'symb' | |||
| for symbolic node labels, 'nsymb' for non-symbolic node labels, 'mix' | |||
| for both labels. The first 2 functions take two node labels as | |||
| parameters, and the 'mix' function takes 4 parameters, a symbolic and a | |||
| non-symbolic label for each of the two nodes. Each label is in the form | |||
| of a 2-D array (n_samples, n_features). Each function returns a number | |||
| as the kernel value. Ignored when nodes are unlabeled. This argument | |||
| applies only to the conjugate gradient method and fixed-point iterations. | |||
| edge_kernels : dict | |||
| A dictionary of kernel functions for edges, including 3 items: 'symb' | |||
| for symbolic edge labels, 'nsymb' for non-symbolic edge labels, 'mix' | |||
| for both labels. The first 2 functions take two edge labels as | |||
| parameters, and the 'mix' function takes 4 parameters, a symbolic and a | |||
| non-symbolic label for each of the two edges. Each label is in the form | |||
| of a 2-D array (n_samples, n_features). Each function returns a number | |||
| as the kernel value. Ignored when edges are unlabeled. This argument | |||
| applies only to the conjugate gradient method and fixed-point iterations. | |||
| node_label : string | |||
| Node attribute used as label. The default node label is atom. This | |||
| argument applies only to the conjugate gradient method and fixed-point | |||
| iterations. | |||
| edge_label : string | |||
| Edge attribute used as label. The default edge label is bond_type. | |||
| This argument applies only to the conjugate gradient method and | |||
| fixed-point iterations. | |||
| sub_kernel : string | |||
| Method used to compute the walk kernel. The following choices are | |||
| available: | |||
| 'exp': method based on exponential series. | |||
| 'geo': method based on geometric series. | |||
| n_jobs : int | |||
| Number of jobs for parallelization. | |||
| Return | |||
| ------ | |||
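| # A hedged example of the node_kernels / edge_kernels dicts described | |||
| # above: a delta kernel for symbolic labels, a Gaussian for non-symbolic | |||
| # ones, and their product for 'mix'. The helper names and the argument | |||
| # order of the 'mix' function are assumptions for illustration. | |||
| import numpy as np | |||
| def delta_kernel(x, y): | |||
|     return 1.0 if np.array_equal(x, y) else 0.0 | |||
| def gaussian_kernel(x, y, gamma=1.0): | |||
|     d = np.asarray(x, dtype=float) - np.asarray(y, dtype=float) | |||
|     return np.exp(-gamma * np.sum(d * d)) | |||
| def mix_kernel(x_symb, y_symb, x_nsymb, y_nsymb): | |||
|     return delta_kernel(x_symb, y_symb) * gaussian_kernel(x_nsymb, y_nsymb) | |||
| sub_kernels = {'symb': delta_kernel, 'nsymb': gaussian_kernel, 'mix': mix_kernel} | |||
| # e.g. pass node_kernels=sub_kernels and edge_kernels=sub_kernels. | |||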
| @@ -168,7 +215,7 @@ def _sylvester_equation(Gn, lmda, p, q, eweight, n_jobs, verbose=True): | |||
| if q == None: | |||
| # don't normalize adjacency matrices if q is a uniform vector. Note | |||
| # A_wave_list actually contains the transposes of the adjacency matrices. | |||
| A_wave_list = [ | |||
| nx.adjacency_matrix(G, eweight).todense().transpose() for G in | |||
| (tqdm(Gn, desc='compute adjacency matrices', file=sys.stdout) if | |||
| @@ -259,7 +306,7 @@ def _conjugate_gradient(Gn, lmda, p, q, ds_attrs, node_kernels, edge_kernels, | |||
| # # this is faster from unlabeled graphs. @todo: why? | |||
| # if q == None: | |||
| # # don't normalize adjacency matrices if q is a uniform vector. Note | |||
| # # A_wave_list actually contains the transposes of the adjacency matrices. | |||
| # A_wave_list = [ | |||
| # nx.adjacency_matrix(G, eweight).todense().transpose() for G in | |||
| # tqdm(Gn, desc='compute adjacency matrices', file=sys.stdout) | |||
| @@ -376,7 +423,7 @@ def _fixed_point(Gn, lmda, p, q, ds_attrs, node_kernels, edge_kernels, | |||
| # # this is faster from unlabeled graphs. @todo: why? | |||
| # if q == None: | |||
| # # don't normalize adjacency matrices if q is a uniform vector. Note | |||
| # # A_wave_list actually contains the transposes of the adjacency matrices. | |||
| # A_wave_list = [ | |||
| # nx.adjacency_matrix(G, eweight).todense().transpose() for G in | |||
| # tqdm(Gn, desc='compute adjacency matrices', file=sys.stdout) | |||
| @@ -481,7 +528,7 @@ def _spectral_decomposition(Gn, weight, p, q, sub_kernel, eweight, n_jobs, verbo | |||
| for G in (tqdm(Gn, desc='spectral decompose', file=sys.stdout) if | |||
| verbose else Gn): | |||
| # don't normalize adjacency matrices if q is a uniform vector. Note | |||
| # A actually is the transpose of the adjacency matrix. | |||
| A = nx.adjacency_matrix(G, eweight).todense().transpose() | |||
| ew, ev = np.linalg.eig(A) | |||
| D_list.append(ew) | |||
| @@ -33,12 +33,12 @@ def spkernel(*args, | |||
| List of graphs between which the kernels are calculated. | |||
| / | |||
| G1, G2 : NetworkX graphs | |||
| Two graphs between which the kernel is calculated. | |||
| node_label : string | |||
| Node attribute used as label. The default node label is atom. | |||
| edge_weight : string | |||
| Edge attribute name corresponding to the edge weight. | |||
| node_kernels : dict | |||
| A dictionary of kernel functions for nodes, including 3 items: 'symb' | |||
| for symbolic node labels, 'nsymb' for non-symbolic node labels, 'mix' | |||
| for both labels. The first 2 functions take two node labels as | |||
| @@ -46,6 +46,8 @@ def spkernel(*args, | |||
| non-symbolic label for each of the two nodes. Each label is in the form | |||
| of a 2-D array (n_samples, n_features). Each function returns a number | |||
| as the kernel value. Ignored when nodes are unlabeled. | |||
| n_jobs : int | |||
| Number of jobs for parallelization. | |||
| Return | |||
| ------ | |||
| @@ -42,14 +42,15 @@ def structuralspkernel(*args, | |||
| List of graphs between which the kernels are calculated. | |||
| / | |||
| G1, G2 : NetworkX graphs | |||
| Two graphs between which the kernel is calculated. | |||
| node_label : string | |||
| Node attribute used as label. The default node label is atom. | |||
| edge_weight : string | |||
| Edge attribute name corresponding to the edge weight. Used when | |||
| computing the shortest paths. | |||
| edge_label : string | |||
| Edge attribute used as label. The default edge label is bond_type. | |||
| node_kernels : dict | |||
| A dictionary of kernel functions for nodes, including 3 items: 'symb' | |||
| for symbolic node labels, 'nsymb' for non-symbolic node labels, 'mix' | |||
| for both labels. The first 2 functions take two node labels as | |||
| @@ -57,7 +58,7 @@ def structuralspkernel(*args, | |||
| non-symbolic label for each of the two nodes. Each label is in the form | |||
| of a 2-D array (n_samples, n_features). Each function returns a number | |||
| as the kernel value. Ignored when nodes are unlabeled. | |||
| edge_kernels : dict | |||
| A dictionary of kernel functions for edges, including 3 items: 'symb' | |||
| for symbolic edge labels, 'nsymb' for non-symbolic edge labels, 'mix' | |||
| for both labels. The first 2 functions take two edge labels as | |||
| @@ -65,6 +66,13 @@ def structuralspkernel(*args, | |||
| non-symbolic label for each of the two edges. Each label is in the form | |||
| of a 2-D array (n_samples, n_features). Each function returns a number | |||
| as the kernel value. Ignored when edges are unlabeled. | |||
| compute_method : string | |||
| Computation method to store the shortest paths and compute the graph | |||
| kernel. The following choices are available: | |||
| 'trie': store paths as tries. | |||
| 'naive': store paths in lists. | |||
| n_jobs : int | |||
| Number of jobs for parallelization. | |||
| Return | |||
| ------ | |||
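| # An illustrative toy of the 'trie' storage above: shortest-path label | |||
| # sequences share prefixes, so a trie stores them more compactly than | |||
| # the 'naive' list of paths. This TrieSketch is an assumption for | |||
| # illustration, not the Trie class used by this module. | |||
| class TrieSketch: | |||
|     def __init__(self): | |||
|         self.root = {} | |||
|     def insert(self, path): | |||
|         node = self.root | |||
|         for label in path: | |||
|             node = node.setdefault(label, {}) | |||
|         node['#count'] = node.get('#count', 0) + 1  # paths ending here | |||
| trie = TrieSketch() | |||
| for pth in [('C', '1', 'C'), ('C', '1', 'C', '1', 'O'), ('C', '1', 'O')]: | |||
|     trie.insert(pth) | |||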
| @@ -40,11 +40,19 @@ def treeletkernel(*args, | |||
| The sub-kernel between 2 real number vectors. Each vector counts the | |||
| numbers of isomorphic treelets in a graph. | |||
| node_label : string | |||
| Node attribute used as label. The default node label is atom. | |||
| edge_label : string | |||
| Edge attribute used as label. The default edge label is bond_type. | |||
| labeled : boolean | |||
| Whether the graphs are labeled. The default is True. | |||
| parallel : string/None | |||
| Which parallelization method is applied to compute the kernel. The | |||
| following choices are available: | |||
| 'imap_unordered': use Python's multiprocessing.Pool.imap_unordered | |||
| method. | |||
| None: no parallelization is applied. | |||
| n_jobs : int | |||
| Number of jobs for parallelization. The default is to use all | |||
| computational cores. This argument is only valid when one of the | |||
| parallelization methods is applied. | |||
| Return | |||
| ------ | |||
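| # A hedged example of a sub_kernel as described above: a Gaussian kernel | |||
| # between two treelet-count vectors (gamma is an assumed bandwidth; the | |||
| # counts below are made up). | |||
| import numpy as np | |||
| def gaussian_sub_kernel(x, y, gamma=1.0): | |||
|     d = np.asarray(x, dtype=float) - np.asarray(y, dtype=float) | |||
|     return np.exp(-gamma * np.sum(d * d)) | |||
| print(gaussian_sub_kernel([3, 1, 0, 2], [2, 1, 1, 2])) | |||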
| @@ -26,7 +26,7 @@ def untilhpathkernel(*args, | |||
| node_label='atom', | |||
| edge_label='bond_type', | |||
| depth=10, | |||
| k_func='MinMax', | |||
| compute_method='trie', | |||
| n_jobs=None, | |||
| verbose=True): | |||
| @@ -38,7 +38,7 @@ def untilhpathkernel(*args, | |||
| List of graphs between which the kernels are calculated. | |||
| / | |||
| G1, G2 : NetworkX graphs | |||
| Two graphs between which the kernel is calculated. | |||
| node_label : string | |||
| Node attribute used as label. The default node label is atom. | |||
| edge_label : string | |||
| @@ -47,9 +47,17 @@ def untilhpathkernel(*args, | |||
| Depth of search. Longest length of paths. | |||
| k_func : string | |||
| A kernel function applied using different notions of fingerprint | |||
| similarity, defining the type of feature map and normalization method | |||
| applied for the graph kernel. The following choices are available: | |||
| 'MinMax': use the MinMax kernel and counting feature map. | |||
| 'tanimoto': use the Tanimoto kernel and binary feature map. | |||
| compute_method : string | |||
| Computation method to store paths and compute the graph kernel. The | |||
| following choices are available: | |||
| 'trie': store paths as tries. | |||
| 'naive': store paths in lists. | |||
| n_jobs : int | |||
| Number of jobs for parallelization. | |||
| Return | |||
| ------ | |||
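| # A minimal sketch of the two k_func choices above, assuming path-count | |||
| # fingerprints given as dicts mapping a path to its count. | |||
| def minmax_sketch(f1, f2): | |||
|     keys = set(f1) | set(f2) | |||
|     num = sum(min(f1.get(p, 0), f2.get(p, 0)) for p in keys) | |||
|     den = sum(max(f1.get(p, 0), f2.get(p, 0)) for p in keys) | |||
|     return num / den if den else 0.0 | |||
| def tanimoto_sketch(f1, f2): | |||
|     # binary feature map: only the presence of each path counts | |||
|     s1, s2 = set(f1), set(f2) | |||
|     return len(s1 & s2) / len(s1 | s2) if (s1 | s2) else 0.0 | |||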
| @@ -38,15 +38,28 @@ def weisfeilerlehmankernel(*args, | |||
| List of graphs between which the kernels are calculated. | |||
| / | |||
| G1, G2 : NetworkX graphs | |||
| Two graphs between which the kernel is calculated. | |||
| node_label : string | |||
| Node attribute used as label. The default node label is atom. | |||
| edge_label : string | |||
| Edge attribute used as label. The default edge label is bond_type. | |||
| height : int | |||
| Subtree height. | |||
| base_kernel : string | |||
| Base kernel used in each iteration of WL kernel. Only the default | |||
| 'subtree' kernel can be applied for now. | |||
| # The default base kernel is subtree kernel. For user-defined kernel, | |||
| # base_kernel is the name of the base kernel function used in each | |||
| # iteration of WL kernel. This function returns a Numpy matrix, each | |||
| # element of which is the user-defined Weisfeiler-Lehman kernel | |||
| # between 2 graphs. | |||
| parallel : None | |||
| Which parallelization method is applied to compute the kernel. No | |||
| parallelization can be applied for now. | |||
| n_jobs : int | |||
| Number of jobs for parallelization. The default is to use all | |||
| computational cores. This argument is only valid when one of the | |||
| parallelization methods is applied and can be ignored for now. | |||
| Return | |||
| ------ | |||
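| # A compact sketch of the WL subtree idea described above. String | |||
| # concatenation stands in for the label compression a real | |||
| # implementation would use; names and defaults are illustrative. | |||
| from collections import Counter | |||
| def wl_subtree_sketch(G1, G2, height=2, node_label='atom'): | |||
|     labels = [{v: str(G.nodes[v][node_label]) for v in G} for G in (G1, G2)] | |||
|     counts = [Counter(), Counter()] | |||
|     for it in range(height + 1): | |||
|         for i, G in enumerate((G1, G2)): | |||
|             counts[i].update(labels[i].values()) | |||
|             if it < height: | |||
|                 labels[i] = {v: labels[i][v] + '|' + | |||
|                              ','.join(sorted(labels[i][w] for w in G[v])) | |||
|                              for v in G} | |||
|     return sum(counts[0][lab] * counts[1][lab] for lab in counts[0]) | |||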