@@ -0,0 +1,65 @@
Node labels: [symbol]
Node attributes: [chem, charge, x, y]
Edge labels: [valence]

Node labels were converted to integer values using this map:

Component 0:
    0   C
    1   O
    2   N
    3   Cl
    4   F
    5   S
    6   Se
    7   P
    8   Na
    9   I
    10  Co
    11  Br
    12  Li
    13  Si
    14  Mg
    15  Cu
    16  As
    17  B
    18  Pt
    19  Ru
    20  K
    21  Pd
    22  Au
    23  Te
    24  W
    25  Rh
    26  Zn
    27  Bi
    28  Pb
    29  Ge
    30  Sb
    31  Sn
    32  Ga
    33  Hg
    34  Ho
    35  Tl
    36  Ni
    37  Tb

Edge labels were converted to integer values using this map:

Component 0:
    0   1
    1   2
    2   3

Class labels were converted to integer values using this map:

    0   a
    1   i
@@ -0,0 +1,75 @@
README for dataset DD

=== Usage ===

This folder contains the following comma separated text files
(replace DS by the name of the dataset):

n = total number of nodes
m = total number of edges
N = number of graphs

(1) DS_A.txt (m lines)
    sparse (block diagonal) adjacency matrix for all graphs,
    each line corresponds to (row, col), i.e., (node_id, node_id)

(2) DS_graph_indicator.txt (n lines)
    column vector of graph identifiers for all nodes of all graphs,
    the value in the i-th line is the graph_id of the node with node_id i

(3) DS_graph_labels.txt (N lines)
    class labels for all graphs in the dataset,
    the value in the i-th line is the class label of the graph with graph_id i

(4) DS_node_labels.txt (n lines)
    column vector of node labels,
    the value in the i-th line corresponds to the node with node_id i

There are OPTIONAL files if the respective information is available:

(5) DS_edge_labels.txt (m lines; same size as DS_A.txt)
    labels for the edges in DS_A.txt

(6) DS_edge_attributes.txt (m lines; same size as DS_A.txt)
    attributes for the edges in DS_A.txt

(7) DS_node_attributes.txt (n lines)
    matrix of node attributes,
    the comma separated values in the i-th line form the attribute vector of the node with node_id i

(8) DS_graph_attributes.txt (N lines)
    regression values for all graphs in the dataset,
    the value in the i-th line is the attribute of the graph with graph_id i
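For illustration only (not part of the dataset distribution), a minimal Python
loading sketch for this layout, assuming NumPy and NetworkX and the 1-based
ids documented above; the helper name load_tud is hypothetical:

    # Minimal loader sketch for the format described above (assumptions:
    # comma separated files, 1-based node and graph ids).
    import numpy as np
    import networkx as nx

    def load_tud(prefix):
        edges = np.loadtxt(prefix + '_A.txt', delimiter=',', dtype=int)
        graph_ids = np.loadtxt(prefix + '_graph_indicator.txt', dtype=int)
        labels = np.loadtxt(prefix + '_graph_labels.txt', dtype=int)
        graphs = [nx.Graph() for _ in range(graph_ids.max())]
        for node_id, gid in enumerate(graph_ids, start=1):
            graphs[gid - 1].add_node(node_id)
        for row, col in edges:
            graphs[graph_ids[row - 1] - 1].add_edge(int(row), int(col))
        return graphs, labels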
=== Description ===

D&D is a dataset of 1178 protein structures (Dobson and Doig, 2003). Each protein is
represented by a graph, in which the nodes are amino acids and two nodes are connected
by an edge if they are less than 6 Angstroms apart. The prediction task is to classify
the protein structures into enzymes and non-enzymes.
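A hedged sketch of this construction rule (illustrative only, not how the
dataset was generated; it assumes per-residue coordinates are already
available as an (n, 3) array):

    # Sketch: contact graph from residue coordinates; residues closer than
    # the 6 Angstrom cutoff become adjacent (illustrative assumption).
    import numpy as np
    import networkx as nx

    def contact_graph(coords, cutoff=6.0):
        coords = np.asarray(coords, dtype=float)
        g = nx.Graph()
        g.add_nodes_from(range(len(coords)))
        dists = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
        for i, j in zip(*np.triu_indices(len(coords), k=1)):
            if dists[i, j] < cutoff:
                g.add_edge(int(i), int(j))
        return g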
=== Previous Use of the Dataset ===

Neumann, M., Garnett, R., Bauckhage, C., Kersting, K.: Propagation Kernels: Efficient Graph
Kernels from Propagated Information. Under review at MLJ.

Neumann, M., Patricia, N., Garnett, R., Kersting, K.: Efficient Graph Kernels by
Randomization. In: P.A. Flach, T. De Bie, N. Cristianini (eds.) ECML/PKDD, Lecture Notes in
Computer Science, vol. 7523, pp. 378-393. Springer (2012).

Shervashidze, N., Schweitzer, P., van Leeuwen, E., Mehlhorn, K., Borgwardt, K.:
Weisfeiler-Lehman Graph Kernels. Journal of Machine Learning Research 12, 2539-2561 (2011).

=== References ===

P. D. Dobson and A. J. Doig. Distinguishing enzyme structures from non-enzymes without
alignments. J. Mol. Biol., 330(4):771-783, Jul 2003.
@@ -0,0 +1,70 @@
README for dataset NCI1

=== Usage ===

This folder contains the following comma separated text files
(replace DS by the name of the dataset):

n = total number of nodes
m = total number of edges
N = number of graphs

(1) DS_A.txt (m lines)
    sparse (block diagonal) adjacency matrix for all graphs,
    each line corresponds to (row, col), i.e., (node_id, node_id)

(2) DS_graph_indicator.txt (n lines)
    column vector of graph identifiers for all nodes of all graphs,
    the value in the i-th line is the graph_id of the node with node_id i

(3) DS_graph_labels.txt (N lines)
    class labels for all graphs in the dataset,
    the value in the i-th line is the class label of the graph with graph_id i

(4) DS_node_labels.txt (n lines)
    column vector of node labels,
    the value in the i-th line corresponds to the node with node_id i

There are OPTIONAL files if the respective information is available:

(5) DS_edge_labels.txt (m lines; same size as DS_A.txt)
    labels for the edges in DS_A.txt

(6) DS_edge_attributes.txt (m lines; same size as DS_A.txt)
    attributes for the edges in DS_A.txt

(7) DS_node_attributes.txt (n lines)
    matrix of node attributes,
    the comma separated values in the i-th line form the attribute vector of the node with node_id i

(8) DS_graph_attributes.txt (N lines)
    regression values for all graphs in the dataset,
    the value in the i-th line is the attribute of the graph with graph_id i

=== Description ===

NCI1 and NCI109 represent two balanced subsets of datasets of chemical compounds screened
for activity against non-small cell lung cancer and ovarian cancer cell lines, respectively
(Wale and Karypis (2006) and http://pubchem.ncbi.nlm.nih.gov).

=== Previous Use of the Dataset ===

Neumann, M., Garnett, R., Bauckhage, C., Kersting, K.: Propagation Kernels: Efficient Graph
Kernels from Propagated Information. Under review at MLJ.

Neumann, M., Patricia, N., Garnett, R., Kersting, K.: Efficient Graph Kernels by
Randomization. In: P.A. Flach, T. De Bie, N. Cristianini (eds.) ECML/PKDD, Lecture Notes in
Computer Science, vol. 7523, pp. 378-393. Springer (2012).

Shervashidze, N., Schweitzer, P., van Leeuwen, E., Mehlhorn, K., Borgwardt, K.:
Weisfeiler-Lehman Graph Kernels. Journal of Machine Learning Research 12, 2539-2561 (2011).

=== References ===

N. Wale and G. Karypis. Comparison of descriptor spaces for chemical compound retrieval and
classification. In Proc. of ICDM, pages 678-689, Hong Kong, 2006.
@@ -0,0 +1,70 @@
README for dataset NCI109

=== Usage ===

This folder contains the following comma separated text files
(replace DS by the name of the dataset):

n = total number of nodes
m = total number of edges
N = number of graphs

(1) DS_A.txt (m lines)
    sparse (block diagonal) adjacency matrix for all graphs,
    each line corresponds to (row, col), i.e., (node_id, node_id)

(2) DS_graph_indicator.txt (n lines)
    column vector of graph identifiers for all nodes of all graphs,
    the value in the i-th line is the graph_id of the node with node_id i

(3) DS_graph_labels.txt (N lines)
    class labels for all graphs in the dataset,
    the value in the i-th line is the class label of the graph with graph_id i

(4) DS_node_labels.txt (n lines)
    column vector of node labels,
    the value in the i-th line corresponds to the node with node_id i

There are OPTIONAL files if the respective information is available:

(5) DS_edge_labels.txt (m lines; same size as DS_A.txt)
    labels for the edges in DS_A.txt

(6) DS_edge_attributes.txt (m lines; same size as DS_A.txt)
    attributes for the edges in DS_A.txt

(7) DS_node_attributes.txt (n lines)
    matrix of node attributes,
    the comma separated values in the i-th line form the attribute vector of the node with node_id i

(8) DS_graph_attributes.txt (N lines)
    regression values for all graphs in the dataset,
    the value in the i-th line is the attribute of the graph with graph_id i

=== Description ===

NCI1 and NCI109 represent two balanced subsets of datasets of chemical compounds screened
for activity against non-small cell lung cancer and ovarian cancer cell lines, respectively
(Wale and Karypis (2006) and http://pubchem.ncbi.nlm.nih.gov).

=== Previous Use of the Dataset ===

Neumann, M., Garnett, R., Bauckhage, C., Kersting, K.: Propagation Kernels: Efficient Graph
Kernels from Propagated Information. Under review at MLJ.

Neumann, M., Patricia, N., Garnett, R., Kersting, K.: Efficient Graph Kernels by
Randomization. In: P.A. Flach, T. De Bie, N. Cristianini (eds.) ECML/PKDD, Lecture Notes in
Computer Science, vol. 7523, pp. 378-393. Springer (2012).

Shervashidze, N., Schweitzer, P., van Leeuwen, E., Mehlhorn, K., Borgwardt, K.:
Weisfeiler-Lehman Graph Kernels. Journal of Machine Learning Research 12, 2539-2561 (2011).

=== References ===

N. Wale and G. Karypis. Comparison of descriptor spaces for chemical compound retrieval and
classification. In Proc. of ICDM, pages 678-689, Hong Kong, 2006.
@@ -12,21 +12,21 @@ import multiprocessing
from pygraph.kernels.commonWalkKernel import commonwalkkernel

dslist = [
-    # {'name': 'Acyclic', 'dataset': '../datasets/acyclic/dataset_bps.ds',
-    #  'task': 'regression'}, # node symb
-    # {'name': 'Alkane', 'dataset': '../datasets/Alkane/dataset.ds', 'task': 'regression',
-    #  'dataset_y': '../datasets/Alkane/dataset_boiling_point_names.txt'},
-    # # contains single node graph, node symb
-    # {'name': 'MAO', 'dataset': '../datasets/MAO/dataset.ds'}, # node/edge symb
-    # {'name': 'PAH', 'dataset': '../datasets/PAH/dataset.ds'}, # unlabeled
-    # {'name': 'MUTAG', 'dataset': '../datasets/MUTAG/MUTAG_A.txt'}, # node/edge symb
-    # {'name': 'Letter-med', 'dataset': '../datasets/Letter-med/Letter-med_A.txt'},
-    # # node nsymb
-    # {'name': 'ENZYMES', 'dataset': '../datasets/ENZYMES_txt/ENZYMES_A_sparse.txt'},
-    # # node symb/nsymb
+    {'name': 'Alkane', 'dataset': '../datasets/Alkane/dataset.ds', 'task': 'regression',
+     'dataset_y': '../datasets/Alkane/dataset_boiling_point_names.txt'},
+    # contains single node graph, node symb
+    {'name': 'Acyclic', 'dataset': '../datasets/acyclic/dataset_bps.ds',
+     'task': 'regression'}, # node symb
+    {'name': 'MAO', 'dataset': '../datasets/MAO/dataset.ds'}, # node/edge symb
+    {'name': 'PAH', 'dataset': '../datasets/PAH/dataset.ds'}, # unlabeled
+    {'name': 'MUTAG', 'dataset': '../datasets/MUTAG/MUTAG_A.txt'}, # node/edge symb
+    {'name': 'Letter-med', 'dataset': '../datasets/Letter-med/Letter-med_A.txt'},
+    # node nsymb
+    {'name': 'AIDS', 'dataset': '../datasets/AIDS/AIDS_A.txt'}, # node symb/nsymb, edge symb
+    {'name': 'ENZYMES', 'dataset': '../datasets/ENZYMES_txt/ENZYMES_A_sparse.txt'},
+    # node symb/nsymb
    # {'name': 'NCI1', 'dataset': '../datasets/NCI1/NCI1_A.txt'}, # node symb
    # {'name': 'NCI109', 'dataset': '../datasets/NCI109/NCI109_A.txt'}, # node symb
-    {'name': 'AIDS', 'dataset': '../datasets/AIDS/AIDS_A.txt'}, # node symb/nsymb, edge symb
    # {'name': 'D&D', 'dataset': '../datasets/DD/DD_A.txt'}, # node symb
    #
    # {'name': 'Mutagenicity', 'dataset': '../datasets/Mutagenicity/Mutagenicity_A.txt'},
@@ -12,22 +12,22 @@ import multiprocessing
from pygraph.kernels.marginalizedKernel import marginalizedkernel

dslist = [
-    # {'name': 'Acyclic', 'dataset': '../datasets/acyclic/dataset_bps.ds',
-    #  'task': 'regression'}, # node symb
-    # {'name': 'Alkane', 'dataset': '../datasets/Alkane/dataset.ds', 'task': 'regression',
-    #  'dataset_y': '../datasets/Alkane/dataset_boiling_point_names.txt'},
-    # # contains single node graph, node symb
-    # {'name': 'MAO', 'dataset': '../datasets/MAO/dataset.ds'}, # node/edge symb
-    # {'name': 'PAH', 'dataset': '../datasets/PAH/dataset.ds'}, # unlabeled
-    # {'name': 'MUTAG', 'dataset': '../datasets/MUTAG/MUTAG_A.txt'}, # node/edge symb
-    # {'name': 'Letter-med', 'dataset': '../datasets/Letter-med/Letter-med_A.txt'},
-    # # node nsymb
-    # {'name': 'ENZYMES', 'dataset': '../datasets/ENZYMES_txt/ENZYMES_A_sparse.txt'},
-    # # node symb/nsymb
+    {'name': 'Alkane', 'dataset': '../datasets/Alkane/dataset.ds', 'task': 'regression',
+     'dataset_y': '../datasets/Alkane/dataset_boiling_point_names.txt'},
+    # contains single node graph, node symb
+    {'name': 'Acyclic', 'dataset': '../datasets/acyclic/dataset_bps.ds',
+     'task': 'regression'}, # node symb
+    {'name': 'MAO', 'dataset': '../datasets/MAO/dataset.ds'}, # node/edge symb
+    {'name': 'PAH', 'dataset': '../datasets/PAH/dataset.ds'}, # unlabeled
+    {'name': 'MUTAG', 'dataset': '../datasets/MUTAG/MUTAG_A.txt'}, # node/edge symb
+    {'name': 'Letter-med', 'dataset': '../datasets/Letter-med/Letter-med_A.txt'},
+    # node nsymb
+    {'name': 'ENZYMES', 'dataset': '../datasets/ENZYMES_txt/ENZYMES_A_sparse.txt'},
+    # node symb/nsymb
    # {'name': 'AIDS', 'dataset': '../datasets/AIDS/AIDS_A.txt'}, # node symb/nsymb, edge symb
    # {'name': 'NCI1', 'dataset': '../datasets/NCI1/NCI1_A.txt'}, # node symb
    # {'name': 'NCI109', 'dataset': '../datasets/NCI109/NCI109_A.txt'}, # node symb
-    {'name': 'D&D', 'dataset': '../datasets/DD/DD_A.txt'}, # node symb
+    # {'name': 'D&D', 'dataset': '../datasets/DD/DD_A.txt'}, # node symb
    #
    # {'name': 'Mutagenicity', 'dataset': '../datasets/Mutagenicity/Mutagenicity_A.txt'},
    # # node/edge symb
@@ -17,22 +17,23 @@ import numpy as np

dslist = [
-    # {'name': 'Acyclic', 'dataset': '../datasets/acyclic/dataset_bps.ds',
-    #  'task': 'regression'}, # node symb
-    # {'name': 'Alkane', 'dataset': '../datasets/Alkane/dataset.ds', 'task': 'regression',
-    #  'dataset_y': '../datasets/Alkane/dataset_boiling_point_names.txt'},
-    # # contains single node graph, node symb
-    # {'name': 'MAO', 'dataset': '../datasets/MAO/dataset.ds'}, # node/edge symb
-    # {'name': 'PAH', 'dataset': '../datasets/PAH/dataset.ds'}, # unlabeled
-    # {'name': 'MUTAG', 'dataset': '../datasets/MUTAG/MUTAG_A.txt'}, # node/edge symb
-    # {'name': 'ENZYMES', 'dataset': '../datasets/ENZYMES_txt/ENZYMES_A_sparse.txt'},
-    # # node symb/nsymb
-    # {'name': 'NCI1', 'dataset': '../datasets/NCI1/NCI1_A.txt'}, # node symb
-    # {'name': 'NCI109', 'dataset': '../datasets/NCI109/NCI109_A.txt'}, # node symb
+    {'name': 'Alkane', 'dataset': '../datasets/Alkane/dataset.ds', 'task': 'regression',
+     'dataset_y': '../datasets/Alkane/dataset_boiling_point_names.txt'},
+    # contains single node graph, node symb
+    {'name': 'Acyclic', 'dataset': '../datasets/acyclic/dataset_bps.ds',
+     'task': 'regression'}, # node symb
+    {'name': 'MAO', 'dataset': '../datasets/MAO/dataset.ds'}, # node/edge symb
+    {'name': 'PAH', 'dataset': '../datasets/PAH/dataset.ds'}, # unlabeled
+    {'name': 'MUTAG', 'dataset': '../datasets/MUTAG/MUTAG_A.txt'}, # node/edge symb
+    {'name': 'Letter-med', 'dataset': '../datasets/Letter-med/Letter-med_A.txt'},
+    # node nsymb
+    {'name': 'ENZYMES', 'dataset': '../datasets/ENZYMES_txt/ENZYMES_A_sparse.txt'},
+    # node symb/nsymb
    {'name': 'AIDS', 'dataset': '../datasets/AIDS/AIDS_A.txt'}, # node symb/nsymb, edge symb
-    # {'name': 'D&D', 'dataset': '../datasets/DD/DD_A.txt'}, # node symb
-    # {'name': 'Letter-med', 'dataset': '../datasets/Letter-med/Letter-med_A.txt'},
-    # # node nsymb
+    {'name': 'NCI1', 'dataset': '../datasets/NCI1/NCI1_A.txt'}, # node symb
+    {'name': 'NCI109', 'dataset': '../datasets/NCI109/NCI109_A.txt'}, # node symb
+    {'name': 'D&D', 'dataset': '../datasets/DD/DD_A.txt'}, # node symb
    #
    # {'name': 'Mutagenicity', 'dataset': '../datasets/Mutagenicity/Mutagenicity_A.txt'},
    # # node/edge symb
@@ -8,14 +8,14 @@ from pygraph.utils.kernels import deltakernel, gaussiankernel, kernelproduct

# datasets
dslist = [
-    # {'name': 'Acyclic', 'dataset': '../datasets/acyclic/dataset_bps.ds',
-    #  'task': 'regression'}, # node symb
-    # {'name': 'Alkane', 'dataset': '../datasets/Alkane/dataset.ds', 'task': 'regression',
-    #  'dataset_y': '../datasets/Alkane/dataset_boiling_point_names.txt'},
-    # # contains single node graph, node symb
-    # {'name': 'MAO', 'dataset': '../datasets/MAO/dataset.ds'}, # node/edge symb
-    # {'name': 'PAH', 'dataset': '../datasets/PAH/dataset.ds'}, # unlabeled
-    # {'name': 'MUTAG', 'dataset': '../datasets/MUTAG/MUTAG_A.txt'}, # node/edge symb
+    {'name': 'Alkane', 'dataset': '../datasets/Alkane/dataset.ds', 'task': 'regression',
+     'dataset_y': '../datasets/Alkane/dataset_boiling_point_names.txt'},
+    # contains single node graph, node symb
+    {'name': 'Acyclic', 'dataset': '../datasets/acyclic/dataset_bps.ds',
+     'task': 'regression'}, # node symb
+    {'name': 'MAO', 'dataset': '../datasets/MAO/dataset.ds'}, # node/edge symb
+    {'name': 'PAH', 'dataset': '../datasets/PAH/dataset.ds'}, # unlabeled
+    {'name': 'MUTAG', 'dataset': '../datasets/MUTAG/MUTAG_A.txt'}, # node/edge symb
    {'name': 'Letter-med', 'dataset': '../datasets/Letter-med/Letter-med_A.txt'},
    # node nsymb
    {'name': 'ENZYMES', 'dataset': '../datasets/ENZYMES_txt/ENZYMES_A_sparse.txt'},
@@ -14,22 +14,22 @@ from pygraph.kernels.structuralspKernel import structuralspkernel
from pygraph.utils.kernels import deltakernel, gaussiankernel, kernelproduct

dslist = [
-    # {'name': 'Acyclic', 'dataset': '../datasets/acyclic/dataset_bps.ds',
-    #  'task': 'regression'}, # node symb
-    # {'name': 'Alkane', 'dataset': '../datasets/Alkane/dataset.ds', 'task': 'regression',
-    #  'dataset_y': '../datasets/Alkane/dataset_boiling_point_names.txt'},
-    # # contains single node graph, node symb
-    # {'name': 'MAO', 'dataset': '../datasets/MAO/dataset.ds'}, # node/edge symb
-    # {'name': 'PAH', 'dataset': '../datasets/PAH/dataset.ds'}, # unlabeled
-    # {'name': 'MUTAG', 'dataset': '../datasets/MUTAG/MUTAG_A.txt'}, # node/edge symb
-    # {'name': 'Letter-med', 'dataset': '../datasets/Letter-med/Letter-med_A.txt'},
-    # # node nsymb
-    # {'name': 'AIDS', 'dataset': '../datasets/AIDS/AIDS_A.txt'}, # node symb/nsymb, edge symb
-    # {'name': 'NCI1', 'dataset': '../datasets/NCI1/NCI1_A.txt'}, # node symb
-    # {'name': 'NCI109', 'dataset': '../datasets/NCI109/NCI109_A.txt'}, # node symb
+    {'name': 'Alkane', 'dataset': '../datasets/Alkane/dataset.ds', 'task': 'regression',
+     'dataset_y': '../datasets/Alkane/dataset_boiling_point_names.txt'},
+    # contains single node graph, node symb
+    {'name': 'Acyclic', 'dataset': '../datasets/acyclic/dataset_bps.ds',
+     'task': 'regression'}, # node symb
+    {'name': 'MAO', 'dataset': '../datasets/MAO/dataset.ds'}, # node/edge symb
+    {'name': 'PAH', 'dataset': '../datasets/PAH/dataset.ds'}, # unlabeled
+    {'name': 'MUTAG', 'dataset': '../datasets/MUTAG/MUTAG_A.txt'}, # node/edge symb
+    {'name': 'Letter-med', 'dataset': '../datasets/Letter-med/Letter-med_A.txt'},
+    # node nsymb
+    {'name': 'AIDS', 'dataset': '../datasets/AIDS/AIDS_A.txt'}, # node symb/nsymb, edge symb
+    {'name': 'NCI1', 'dataset': '../datasets/NCI1/NCI1_A.txt'}, # node symb
+    {'name': 'NCI109', 'dataset': '../datasets/NCI109/NCI109_A.txt'}, # node symb
    # {'name': 'ENZYMES', 'dataset': '../datasets/ENZYMES_txt/ENZYMES_A_sparse.txt'},
    # # node symb/nsymb
-    {'name': 'D&D', 'dataset': '../datasets/DD/DD_A.txt'}, # node symb
+    # {'name': 'D&D', 'dataset': '../datasets/DD/DD_A.txt'}, # node symb
    #
    # {'name': 'Mutagenicity', 'dataset': '../datasets/Mutagenicity/Mutagenicity_A.txt'},
    # # node/edge symb
@@ -14,22 +14,22 @@ from pygraph.kernels.treeletKernel import treeletkernel
from pygraph.utils.kernels import gaussiankernel, polynomialkernel

dslist = [
-    # {'name': 'Acyclic', 'dataset': '../datasets/acyclic/dataset_bps.ds',
-    #  'task': 'regression'}, # node symb
    {'name': 'Alkane', 'dataset': '../datasets/Alkane/dataset.ds', 'task': 'regression',
     'dataset_y': '../datasets/Alkane/dataset_boiling_point_names.txt'},
    # contains single node graph, node symb
+    {'name': 'Acyclic', 'dataset': '../datasets/acyclic/dataset_bps.ds',
+     'task': 'regression'}, # node symb
    {'name': 'MAO', 'dataset': '../datasets/MAO/dataset.ds'}, # node/edge symb
    {'name': 'PAH', 'dataset': '../datasets/PAH/dataset.ds'}, # unlabeled
    {'name': 'MUTAG', 'dataset': '../datasets/MUTAG/MUTAG_A.txt'}, # node/edge symb
-    {'name': 'ENZYMES', 'dataset': '../datasets/ENZYMES_txt/ENZYMES_A_sparse.txt'},
-    # node symb/nsymb
    {'name': 'NCI1', 'dataset': '../datasets/NCI1/NCI1_A.txt'}, # node symb
    {'name': 'NCI109', 'dataset': '../datasets/NCI109/NCI109_A.txt'}, # node symb
    {'name': 'AIDS', 'dataset': '../datasets/AIDS/AIDS_A.txt'}, # node symb/nsymb, edge symb
-    {'name': 'Letter-med', 'dataset': '../datasets/Letter-med/Letter-med_A.txt'},
-    # node nsymb
-    {'name': 'D&D', 'dataset': '../datasets/DD/DD_A.txt'}, # node symb
+    {'name': 'ENZYMES', 'dataset': '../datasets/ENZYMES_txt/ENZYMES_A_sparse.txt'},
+    # node symb/nsymb
+    # {'name': 'D&D', 'dataset': '../datasets/DD/DD_A.txt'}, # node symb
+    # {'name': 'Letter-med', 'dataset': '../datasets/Letter-med/Letter-med_A.txt'},
+    # # node nsymb
    #
    # {'name': 'Mutagenicity', 'dataset': '../datasets/Mutagenicity/Mutagenicity_A.txt'},
    # # node/edge symb
@@ -12,21 +12,21 @@ import multiprocessing
from pygraph.kernels.untilHPathKernel import untilhpathkernel

dslist = [
-    # {'name': 'Acyclic', 'dataset': '../datasets/acyclic/dataset_bps.ds',
-    #  'task': 'regression'}, # node symb
-    # {'name': 'Alkane', 'dataset': '../datasets/Alkane/dataset.ds', 'task': 'regression',
-    #  'dataset_y': '../datasets/Alkane/dataset_boiling_point_names.txt'},
-    # # contains single node graph, node symb
-    # {'name': 'MAO', 'dataset': '../datasets/MAO/dataset.ds'}, # node/edge symb
-    # {'name': 'PAH', 'dataset': '../datasets/PAH/dataset.ds'}, # unlabeled
-    # {'name': 'MUTAG', 'dataset': '../datasets/MUTAG/MUTAG_A.txt'}, # node/edge symb
-    # {'name': 'Letter-med', 'dataset': '../datasets/Letter-med/Letter-med_A.txt'},
-    # # node nsymb
-    # {'name': 'ENZYMES', 'dataset': '../datasets/ENZYMES_txt/ENZYMES_A_sparse.txt'},
-    # # node symb/nsymb
-    # {'name': 'NCI1', 'dataset': '../datasets/NCI1/NCI1_A.txt'}, # node symb
-    # {'name': 'NCI109', 'dataset': '../datasets/NCI109/NCI109_A.txt'}, # node symb
-    # {'name': 'AIDS', 'dataset': '../datasets/AIDS/AIDS_A.txt'}, # node symb/nsymb, edge symb
+    {'name': 'Alkane', 'dataset': '../datasets/Alkane/dataset.ds', 'task': 'regression',
+     'dataset_y': '../datasets/Alkane/dataset_boiling_point_names.txt'},
+    # contains single node graph, node symb
+    {'name': 'Acyclic', 'dataset': '../datasets/acyclic/dataset_bps.ds',
+     'task': 'regression'}, # node symb
+    {'name': 'MAO', 'dataset': '../datasets/MAO/dataset.ds'}, # node/edge symb
+    {'name': 'PAH', 'dataset': '../datasets/PAH/dataset.ds'}, # unlabeled
+    {'name': 'MUTAG', 'dataset': '../datasets/MUTAG/MUTAG_A.txt'}, # node/edge symb
+    {'name': 'Letter-med', 'dataset': '../datasets/Letter-med/Letter-med_A.txt'},
+    # node nsymb
+    {'name': 'ENZYMES', 'dataset': '../datasets/ENZYMES_txt/ENZYMES_A_sparse.txt'},
+    # node symb/nsymb
+    {'name': 'AIDS', 'dataset': '../datasets/AIDS/AIDS_A.txt'}, # node symb/nsymb, edge symb
+    {'name': 'NCI1', 'dataset': '../datasets/NCI1/NCI1_A.txt'}, # node symb
+    {'name': 'NCI109', 'dataset': '../datasets/NCI109/NCI109_A.txt'}, # node symb
    {'name': 'D&D', 'dataset': '../datasets/DD/DD_A.txt'}, # node symb
    #
    # {'name': 'Mutagenicity', 'dataset': '../datasets/Mutagenicity/Mutagenicity_A.txt'},
@@ -14,22 +14,22 @@ from pygraph.kernels.weisfeilerLehmanKernel import weisfeilerlehmankernel

dslist = [
-    # {'name': 'Acyclic', 'dataset': '../datasets/acyclic/dataset_bps.ds',
-    #  'task': 'regression'}, # node symb
-    # {'name': 'Alkane', 'dataset': '../datasets/Alkane/dataset.ds', 'task': 'regression',
-    #  'dataset_y': '../datasets/Alkane/dataset_boiling_point_names.txt'},
-    # # contains single node graph, node symb
-    # {'name': 'MAO', 'dataset': '../datasets/MAO/dataset.ds'}, # node/edge symb
-    # {'name': 'PAH', 'dataset': '../datasets/PAH/dataset.ds'}, # unlabeled
-    # {'name': 'MUTAG', 'dataset': '../datasets/MUTAG/MUTAG_A.txt'}, # node/edge symb
+    {'name': 'Alkane', 'dataset': '../datasets/Alkane/dataset.ds', 'task': 'regression',
+     'dataset_y': '../datasets/Alkane/dataset_boiling_point_names.txt'},
+    # contains single node graph, node symb
+    {'name': 'Acyclic', 'dataset': '../datasets/acyclic/dataset_bps.ds',
+     'task': 'regression'}, # node symb
+    {'name': 'MAO', 'dataset': '../datasets/MAO/dataset.ds'}, # node/edge symb
+    {'name': 'PAH', 'dataset': '../datasets/PAH/dataset.ds'}, # unlabeled
+    {'name': 'MUTAG', 'dataset': '../datasets/MUTAG/MUTAG_A.txt'}, # node/edge symb
    # {'name': 'Letter-med', 'dataset': '../datasets/Letter-med/Letter-med_A.txt'},
    # # node nsymb
-    # {'name': 'ENZYMES', 'dataset': '../datasets/ENZYMES_txt/ENZYMES_A_sparse.txt'},
-    # # node symb/nsymb
-    # {'name': 'NCI1', 'dataset': '../datasets/NCI1/NCI1_A.txt'}, # node symb
-    # {'name': 'NCI109', 'dataset': '../datasets/NCI109/NCI109_A.txt'}, # node symb
-    # {'name': 'D&D', 'dataset': '../datasets/DD/DD_A.txt'}, # node symb
    {'name': 'AIDS', 'dataset': '../datasets/AIDS/AIDS_A.txt'}, # node symb/nsymb, edge symb
+    {'name': 'ENZYMES', 'dataset': '../datasets/ENZYMES_txt/ENZYMES_A_sparse.txt'},
+    # node symb/nsymb
+    {'name': 'NCI1', 'dataset': '../datasets/NCI1/NCI1_A.txt'}, # node symb
+    {'name': 'NCI109', 'dataset': '../datasets/NCI109/NCI109_A.txt'}, # node symb
+    {'name': 'D&D', 'dataset': '../datasets/DD/DD_A.txt'}, # node symb
    #
    # {'name': 'Mutagenicity', 'dataset': '../datasets/Mutagenicity/Mutagenicity_A.txt'},
    # # node/edge symb
@@ -277,7 +277,8 @@ def gk_iam_nearest(Gn, alpha, idx_gi, Kmatrix, k, r_max):
#        return dhat, ghat_list


-def gk_iam_nearest_multi(Gn_init, Gn_median, alpha, idx_gi, Kmatrix, k, r_max, gkernel):
+def gk_iam_nearest_multi(Gn_init, Gn_median, alpha, idx_gi, Kmatrix, k, r_max,
+                         gkernel, c_ei=1, c_er=1, c_es=1, epsilon=0.001):
    """This function constructs graph pre-image by the iterative pre-image
    framework in reference [1], algorithm 1, where the step of generating new
    graphs randomly is replaced by the IAM algorithm in reference [2].
@@ -312,37 +313,44 @@ def gk_iam_nearest_multi(Gn_init, Gn_median, alpha, idx_gi, Kmatrix, k, r_max, g
        return 0, g0hat_list
    dhat = dis_gs[0] # the nearest distance
    ghat_list = [g.copy() for g in g0hat_list]
-    for g in ghat_list:
-        draw_Letter_graph(g)
+#    for g in ghat_list:
+#        draw_Letter_graph(g)
#        nx.draw_networkx(g)
#        plt.show()
-        print(g.nodes(data=True))
-        print(g.edges(data=True))
+#        print(g.nodes(data=True))
+#        print(g.edges(data=True))
    Gk = [Gn_init[ig].copy() for ig in sort_idx[0:k]] # the k nearest neighbors
-    for gi in Gk:
-#        nx.draw_networkx(gi)
-#        plt.show()
-        draw_Letter_graph(g)
-        print(gi.nodes(data=True))
-        print(gi.edges(data=True))
+#    for gi in Gk:
+##        nx.draw_networkx(gi)
+##        plt.show()
+#        draw_Letter_graph(g)
+#        print(gi.nodes(data=True))
+#        print(gi.edges(data=True))
    Gs_nearest = Gk.copy()
#    gihat_list = []
#    i = 1
-    r = 1
-    while r < r_max:
-        print('r =', r)
-#        found = False
+    r = 0
+    itr = 0
+#    cur_sod = dhat
+#    old_sod = cur_sod * 2
+    sod_list = [dhat]
+    found = False
+    nb_updated = 0
+    while r < r_max: # and not found: # @todo: if not found? # and np.abs(old_sod - cur_sod) > epsilon:
+        print('\nr =', r)
+        print('itr for gk =', itr, '\n')
+        found = False
#        Gs_nearest = Gk + gihat_list
#        g_tmp = iam(Gs_nearest)
-        g_tmp_list = test_iam_moreGraphsAsInit_tryAllPossibleBestGraphs_deleteNodesInIterations(
-            Gn_median, Gs_nearest, c_ei=1, c_er=1, c_es=1)
-        for g in g_tmp_list:
+        g_tmp_list, _ = test_iam_moreGraphsAsInit_tryAllPossibleBestGraphs_deleteNodesInIterations(
+            Gn_median, Gs_nearest, c_ei=c_ei, c_er=c_er, c_es=c_es)
+#        for g in g_tmp_list:
#            nx.draw_networkx(g)
#            plt.show()
-            draw_Letter_graph(g)
-            print(g.nodes(data=True))
-            print(g.edges(data=True))
+#            draw_Letter_graph(g)
+#            print(g.nodes(data=True))
+#            print(g.edges(data=True))

        # compute the distance between phi and the newly generated graphs.
        knew = compute_kernel(g_tmp_list + Gn_median, gkernel, False)

@@ -358,6 +366,7 @@ def gk_iam_nearest_multi(Gn_init, Gn_median, alpha, idx_gi, Kmatrix, k, r_max, g
#                    k_g1_list[1] + alpha[1] * alpha[1] * k_list[1])

        # find the new k nearest graphs.
+        dnew_best = min(dnew_list)
        dis_gs = dnew_list + dis_gs # add the new nearest distances.
        Gs_nearest = [g.copy() for g in g_tmp_list] + Gs_nearest # add the corresponding graphs.
        sort_idx = np.argsort(dis_gs)
@@ -367,21 +376,34 @@ def gk_iam_nearest_multi(Gn_init, Gn_median, alpha, idx_gi, Kmatrix, k, r_max, g
        print(dis_gs[-1])
        Gs_nearest = [Gs_nearest[idx] for idx in sort_idx[0:k]]
        nb_best = len(np.argwhere(dis_gs == dis_gs[0]).flatten().tolist())
-        if len([i for i in sort_idx[0:nb_best] if i < len(dnew_list)]) > 0:
-            print('I have smaller or equal distance!')
+        if dnew_best < dhat and np.abs(dnew_best - dhat) > epsilon:
+            print('I have smaller distance!')
            print(str(dhat) + '->' + str(dis_gs[0]))
            dhat = dis_gs[0]
            idx_best_list = np.argwhere(dnew_list == dhat).flatten().tolist()
            ghat_list = [g_tmp_list[idx].copy() for idx in idx_best_list]
-            for g in ghat_list:
-#                nx.draw_networkx(g)
-#                plt.show()
-                draw_Letter_graph(g)
-                print(g.nodes(data=True))
-                print(g.edges(data=True))
-            r = 0
-        else:
+#            for g in ghat_list:
+##                nx.draw_networkx(g)
+##                plt.show()
+#                draw_Letter_graph(g)
+#                print(g.nodes(data=True))
+#                print(g.edges(data=True))
+            r = 0
+            found = True
+            nb_updated += 1
+        elif np.abs(dnew_best - dhat) < epsilon:
+            print('I have almost equal distance!')
+            print(str(dhat) + '->' + str(dnew_best))
+        if not found:
            r += 1
+#        old_sod = cur_sod
+#        cur_sod = dnew_best
+        sod_list.append(dhat)
+        itr += 1
+    print('\nthe graph is updated', nb_updated, 'times.')
+    print('sods in kernel space:', sod_list, '\n')

    return dhat, ghat_list
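For orientation: the distances collected in dnew_list above are kernel-space
distances between a candidate graph g and the weighted mean
phi = sum_i alpha[i] * phi(g_i) of the median graphs. A minimal, hedged sketch
of that computation (illustrative names, not this module's API), assuming knew
is a Gram matrix with the candidate in row/column 0 and the median graphs
after it:

    import numpy as np

    def dis_to_phi(knew, alpha):
        # d(phi, g)^2 = k(g,g) - 2*sum_i alpha_i*k(g,g_i)
        #              + sum_ij alpha_i*alpha_j*k(g_i,g_j)
        alpha = np.asarray(alpha, dtype=float)
        d2 = knew[0, 0] - 2 * alpha @ knew[0, 1:] + alpha @ knew[1:, 1:] @ alpha
        return np.sqrt(max(float(d2), 0.0))  # clip tiny negatives from rounding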
@@ -9,6 +9,7 @@ Iterative alternate minimizations using GED.
import numpy as np
import random
import networkx as nx
+from tqdm import tqdm
import sys

#from Cython_GedLib_2 import librariesImport, script
@@ -181,13 +182,27 @@ def GED(g1, g2, lib='gedlib'):
    return dis, pi_forward, pi_backward


+def median_distance(Gn, Gn_median, measure='ged', verbose=False):
+    dis_list = []
+    pi_forward_list = []
+    for idx, G in tqdm(enumerate(Gn), desc='computing median distances',
+                       file=sys.stdout) if verbose else enumerate(Gn):
+        dis_sum = 0
+        pi_forward_list.append([])
+        for G_p in Gn_median:
+            dis_tmp, pi_tmp_forward, pi_tmp_backward = GED(G, G_p)
+            pi_forward_list[idx].append(pi_tmp_forward)
+            dis_sum += dis_tmp
+        dis_list.append(dis_sum)
+    return dis_list, pi_forward_list
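+# Usage note (hedged; variable names below are illustrative): median_distance
+# returns, for each graph in Gn, its sum of GED distances (SOD) to all graphs
+# in Gn_median, plus the forward node maps, so a set median can be read off as:
+#     sod_list, pi_forward = median_distance(Gn_candidate, Gn_median)
+#     set_median = Gn_candidate[int(np.argmin(sod_list))]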
# --------------------------- These are tests --------------------------------#

def test_iam_with_more_graphs_as_init(Gn, G_candidate, c_ei=3, c_er=3, c_es=1,
                                      node_label='atom', edge_label='bond_type'):
    """See my name, then you know what I do.
    """
-    from tqdm import tqdm
#    Gn = Gn[0:10]
    Gn = [nx.convert_node_labels_to_integers(g) for g in Gn]
@@ -321,7 +336,7 @@ def test_iam_with_more_graphs_as_init(Gn, G_candidate, c_ei=3, c_er=3, c_es=1,

def test_iam_moreGraphsAsInit_tryAllPossibleBestGraphs_deleteNodesInIterations(
        Gn_median, Gn_candidate, c_ei=3, c_er=3, c_es=1, node_label='atom',
-        edge_label='bond_type', connected=True):
+        edge_label='bond_type', connected=False):
    """See my name, then you know what I do.
    """
    from tqdm import tqdm
@@ -330,8 +345,11 @@ def test_iam_moreGraphsAsInit_tryAllPossibleBestGraphs_deleteNodesInIterations(
    node_ir = np.inf # corresponding to node removal and insertion.
    label_r = 'thanksdanny' # the label for node removal. # @todo: make this label unrepeatable.
    ds_attrs = get_dataset_attributes(Gn_median + Gn_candidate,
-                                      attr_names=['edge_labeled', 'node_attr_dim'],
+                                      attr_names=['edge_labeled', 'node_attr_dim', 'edge_attr_dim'],
                                      edge_label=edge_label)
+    ite_max = 50
+    epsilon = 0.001

    def generate_graph(G, pi_p_forward, label_set):
@@ -460,13 +478,15 @@ def test_iam_moreGraphsAsInit_tryAllPossibleBestGraphs_deleteNodesInIterations(
                            g_tmp.remove_edge(nd1, nd2)
                # do not change anything when equal.

-        # find the best graph generated in this iteration and update pi_p.
+#        # find the best graph generated in this iteration and update pi_p.
        # @todo: should we update all graphs generated or just the best ones?
        dis_list, pi_forward_list = median_distance(G_new_list, Gn_median)

        # @todo: should we remove the identical and connectivity check?
        # Don't know which is faster.
-        G_new_list, idx_list = remove_duplicates(G_new_list)
-        pi_forward_list = [pi_forward_list[idx] for idx in idx_list]
+        if ds_attrs['node_attr_dim'] == 0 and ds_attrs['edge_attr_dim'] == 0:
+            G_new_list, idx_list = remove_duplicates(G_new_list)
+            pi_forward_list = [pi_forward_list[idx] for idx in idx_list]
+            dis_list = [dis_list[idx] for idx in idx_list]
#        if connected == True:
#            G_new_list, idx_list = remove_disconnected(G_new_list)
#            pi_forward_list = [pi_forward_list[idx] for idx in idx_list]
@@ -482,25 +502,10 @@ def test_iam_moreGraphsAsInit_tryAllPossibleBestGraphs_deleteNodesInIterations(
#            print(g.nodes(data=True))
#            print(g.edges(data=True))

-        return G_new_list, pi_forward_list
+        return G_new_list, pi_forward_list, dis_list

-    def median_distance(Gn, Gn_median, measure='ged', verbose=False):
-        dis_list = []
-        pi_forward_list = []
-        for idx, G in tqdm(enumerate(Gn), desc='computing median distances',
-                           file=sys.stdout) if verbose else enumerate(Gn):
-            dis_sum = 0
-            pi_forward_list.append([])
-            for G_p in Gn_median:
-                dis_tmp, pi_tmp_forward, pi_tmp_backward = GED(G, G_p)
-                pi_forward_list[idx].append(pi_tmp_forward)
-                dis_sum += dis_tmp
-            dis_list.append(dis_sum)
-        return dis_list, pi_forward_list

-    def best_median_graphs(Gn_candidate, dis_all, pi_all_forward):
+    def best_median_graphs(Gn_candidate, pi_all_forward, dis_all):
        idx_min_list = np.argwhere(dis_all == np.min(dis_all)).flatten().tolist()
        dis_min = dis_all[idx_min_list[0]]
        pi_forward_min_list = [pi_all_forward[idx] for idx in idx_min_list]
@@ -508,25 +513,45 @@ def test_iam_moreGraphsAsInit_tryAllPossibleBestGraphs_deleteNodesInIterations(
        return G_min_list, pi_forward_min_list, dis_min

-    def iteration_proc(G, pi_p_forward):
+    def iteration_proc(G, pi_p_forward, cur_sod):
        G_list = [G]
        pi_forward_list = [pi_p_forward]
+        old_sod = cur_sod * 2
+        sod_list = [cur_sod]
        # iterations.
-        for itr in range(0, 5): # @todo: the convergence condition?
-#            print('itr is', itr)
+        itr = 0
+        while itr < ite_max and np.abs(old_sod - cur_sod) > epsilon:
+#        for itr in range(0, 5): # the convergence condition?
+            print('itr is', itr)
            G_new_list = []
            pi_forward_new_list = []
+            dis_new_list = []
            for idx, G in enumerate(G_list):
                label_set = get_node_labels(Gn_median + [G], node_label)
-                G_tmp_list, pi_forward_tmp_list = generate_graph(
+                G_tmp_list, pi_forward_tmp_list, dis_tmp_list = generate_graph(
                    G, pi_forward_list[idx], label_set)
                G_new_list += G_tmp_list
                pi_forward_new_list += pi_forward_tmp_list
+                dis_new_list += dis_tmp_list
            G_list = G_new_list[:]
            pi_forward_list = pi_forward_new_list[:]
+            dis_list = dis_new_list[:]
+            old_sod = cur_sod
+            cur_sod = np.min(dis_list)
+            sod_list.append(cur_sod)
+            itr += 1

-        G_list, idx_list = remove_duplicates(G_list)
-        pi_forward_list = [pi_forward_list[idx] for idx in idx_list]
+        # @todo: do we return all graphs or the best ones?
+        # get the best ones of the generated graphs.
+        G_list, pi_forward_list, dis_min = best_median_graphs(
+            G_list, pi_forward_list, dis_list)
+        if ds_attrs['node_attr_dim'] == 0 and ds_attrs['edge_attr_dim'] == 0:
+            G_list, idx_list = remove_duplicates(G_list)
+            pi_forward_list = [pi_forward_list[idx] for idx in idx_list]
+#            dis_list = [dis_list[idx] for idx in idx_list]

#        import matplotlib.pyplot as plt
#        for g in G_list:
@@ -535,7 +560,9 @@ def test_iam_moreGraphsAsInit_tryAllPossibleBestGraphs_deleteNodesInIterations(
#            print(g.nodes(data=True))
#            print(g.edges(data=True))

-        return G_list, pi_forward_list # do we return all graphs or the best ones?
+        print('\nsods:', sod_list, '\n')
+        return G_list, pi_forward_list, dis_min

    def remove_duplicates(Gn):
@@ -570,28 +597,37 @@ def test_iam_moreGraphsAsInit_tryAllPossibleBestGraphs_deleteNodesInIterations(
    # phase 1: initialize.
    # compute set-median.
    dis_min = np.inf
-    dis_all, pi_all_forward = median_distance(Gn_candidate, Gn_median)
+    dis_list, pi_forward_all = median_distance(Gn_candidate, Gn_median)
    # find all smallest distances.
-    idx_min_list = np.argwhere(dis_all == np.min(dis_all)).flatten().tolist()
-    dis_min = dis_all[idx_min_list[0]]
+    idx_min_list = np.argwhere(dis_list == np.min(dis_list)).flatten().tolist()
+    dis_min = dis_list[idx_min_list[0]]

    # phase 2: iteration.
    G_list = []
-    for idx_min in idx_min_list[::-1]:
+    dis_list = []
+    pi_forward_list = []
+    for idx_min in idx_min_list:
#        print('idx_min is', idx_min)
        G = Gn_candidate[idx_min].copy()
        # list of edit operations.
-        pi_p_forward = pi_all_forward[idx_min]
+        pi_p_forward = pi_forward_all[idx_min]
#        pi_p_backward = pi_all_backward[idx_min]
-        Gi_list, pi_i_forward_list = iteration_proc(G, pi_p_forward)
+        Gi_list, pi_i_forward_list, dis_i_min = iteration_proc(G, pi_p_forward, dis_min)
        G_list += Gi_list
+        dis_list.append(dis_i_min)
+        pi_forward_list += pi_i_forward_list

-    G_list, _ = remove_duplicates(G_list)
+    if ds_attrs['node_attr_dim'] == 0 and ds_attrs['edge_attr_dim'] == 0:
+        G_list, idx_list = remove_duplicates(G_list)
+        dis_list = [dis_list[idx] for idx in idx_list]
+        pi_forward_list = [pi_forward_list[idx] for idx in idx_list]
    if connected == True:
-        G_list_con, _ = remove_disconnected(G_list)
-        # if there are no connected graphs at all, keep the disconnected ones.
-        if len(G_list_con) > 0: # @todo: ??????????????????????????
-            G_list = G_list_con
+        G_list_con, idx_list = remove_disconnected(G_list)
+        # if there are no connected graphs at all, keep the disconnected ones.
+        if len(G_list_con) > 0: # @todo: ??????????????????????????
+            G_list = G_list_con
+            dis_list = [dis_list[idx] for idx in idx_list]
+            pi_forward_list = [pi_forward_list[idx] for idx in idx_list]

#    import matplotlib.pyplot as plt
#    for g in G_list:
@@ -601,15 +637,15 @@ def test_iam_moreGraphsAsInit_tryAllPossibleBestGraphs_deleteNodesInIterations(
#        print(g.edges(data=True))

    # get the best median graphs
-    dis_all, pi_all_forward = median_distance(G_list, Gn_median)
+#    dis_list, pi_forward_list = median_distance(G_list, Gn_median)
    G_min_list, pi_forward_min_list, dis_min = best_median_graphs(
-        G_list, dis_all, pi_all_forward)
+        G_list, pi_forward_list, dis_list)
#    for g in G_min_list:
#        nx.draw_networkx(g)
#        plt.show()
#        print(g.nodes(data=True))
#        print(g.edges(data=True))

-    return G_min_list
+    return G_min_list, dis_min


if __name__ == '__main__':
@@ -0,0 +1,218 @@
import sys
sys.path.insert(0, "../")
#import pathlib
import numpy as np
import networkx as nx
import time

#import librariesImport
#import script
#sys.path.insert(0, "/home/bgauzere/dev/optim-graphes/")
#import pygraph
from pygraph.utils.graphfiles import loadDataset


def replace_graph_in_env(script, graph, old_id, label='median'):
    """
    Replace a graph in the gedlib environment.
    If old_id is -1, add a new graph to the environment.
    """
    if(old_id > -1):
        script.PyClearGraph(old_id)
    new_id = script.PyAddGraph(label)
    for i in graph.nodes():
        script.PyAddNode(new_id, str(i), graph.node[i]) # !! strings are required by gedlib
    for e in graph.edges:
        script.PyAddEdge(new_id, str(e[0]), str(e[1]), {})
    script.PyInitEnv()
    script.PySetMethod("IPFP", "")
    script.PyInitMethod()
    return new_id


# Draw the current median.
def draw_Letter_graph(graph, savepath=''):
    import numpy as np
    import networkx as nx
    import matplotlib.pyplot as plt
    plt.figure()
    pos = {}
    for n in graph.nodes:
        pos[n] = np.array([float(graph.node[n]['attributes'][0]),
                           float(graph.node[n]['attributes'][1])])
    nx.draw_networkx(graph, pos)
    if savepath != '':
        plt.savefig(savepath + str(time.time()) + '.eps', format='eps', dpi=300)
    plt.show()
    plt.clf()
# compute new mappings
def update_mappings(script, median_id, listID):
    med_distances = {}
    med_mappings = {}
    sod = 0
    for i in range(0, len(listID)):
        script.PyRunMethod(median_id, listID[i])
        med_distances[i] = script.PyGetUpperBound(median_id, listID[i])
        med_mappings[i] = script.PyGetForwardMap(median_id, listID[i])
        sod += med_distances[i]
    return med_distances, med_mappings, sod


def calcul_Sij(all_mappings, all_graphs, i, j):
    s_ij = 0
    for k in range(0, len(all_mappings)):
        cur_graph = all_graphs[k]
        cur_mapping = all_mappings[k]
        size_graph = cur_graph.order()
        if ((cur_mapping[i] < size_graph) and
                (cur_mapping[j] < size_graph) and
                (cur_graph.has_edge(cur_mapping[i], cur_mapping[j]) == True)):
            s_ij += 1
    return s_ij


# def update_median_nodes_L1(median, listIdSet, median_id, dataset, mappings):
#     from scipy.stats.mstats import gmean
#     for i in median.nodes():
#         for k in listIdSet:
#             vectors = [] # np.zeros((len(listIdSet), 2))
#             if(k != median_id):
#                 phi_i = mappings[k][i]
#                 if(phi_i < dataset[k].order()):
#                     vectors.append([float(dataset[k].node[phi_i]['x']), float(dataset[k].node[phi_i]['y'])])
#         new_labels = gmean(vectors)
#         median.node[i]['x'] = str(new_labels[0])
#         median.node[i]['y'] = str(new_labels[1])
#     return median
def update_median_nodes(median, dataset, mappings):
    # update node attributes
    for i in median.nodes():
        nb_sub = 0
        mean_label = {'x': 0, 'y': 0}
        for k in range(0, len(mappings)):
            phi_i = mappings[k][i]
            if (phi_i < dataset[k].order()):
                nb_sub += 1
                mean_label['x'] += 0.75 * float(dataset[k].node[phi_i]['x'])
                mean_label['y'] += 0.75 * float(dataset[k].node[phi_i]['y'])
        median.node[i]['x'] = str((1 / 0.75) * (mean_label['x'] / nb_sub))
        median.node[i]['y'] = str((1 / 0.75) * (mean_label['y'] / nb_sub))
    return median


def update_median_edges(dataset, mappings, median, cei=0.425, cer=0.425):
    # for Letter-high, ceir = 1.7, alpha = 0.75
    size_dataset = len(dataset)
    ratio_cei_cer = cer / (cei + cer)
    threshold = size_dataset * ratio_cei_cer
    order_graph_median = median.order()
    for i in range(0, order_graph_median):
        for j in range(i + 1, order_graph_median):
            s_ij = calcul_Sij(mappings, dataset, i, j)
            if(s_ij > threshold):
                median.add_edge(i, j)
            else:
                if(median.has_edge(i, j)):
                    median.remove_edge(i, j)
    return median
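# A quick numeric check of the edge-update rule above (illustrative values,
# not from this module): with cei = cer the ratio cer/(cei + cer) is 0.5, so
# for a dataset of 10 graphs the threshold is 5, and an edge (i, j) survives
# only when the mapped pair is adjacent in more than 5 of the graphs:
#     cei, cer, size_dataset = 0.425, 0.425, 10
#     threshold = size_dataset * cer / (cei + cer)   # -> 5.0, keep iff s_ij > 5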
def compute_median(script, listID, dataset, verbose=False):
    """Compute a graph median of a dataset according to an environment.

    Parameters

    script : a gedlib-initialized environment
    listID (list): a list of IDs in script; encodes the dataset
    dataset (list): corresponding graphs in networkX format. We assume that graph
    listID[i] corresponds to dataset[i]

    Returns:
    A networkX graph, which is the median, with the corresponding SOD
    """
    print(len(listID))
    median_set_index, median_set_sod = compute_median_set(script, listID)
    print(median_set_index)
    print(median_set_sod)
    sods = []
    # Add the median to the environment.
    set_median = dataset[median_set_index].copy()
    median = dataset[median_set_index].copy()
    cur_med_id = replace_graph_in_env(script, median, -1)
    med_distances, med_mappings, cur_sod = update_mappings(script, cur_med_id, listID)
    sods.append(cur_sod)
    if(verbose):
        print(cur_sod)
    ite_max = 50
    old_sod = cur_sod * 2
    ite = 0
    epsilon = 0.001
    while((ite < ite_max) and (np.abs(old_sod - cur_sod) > epsilon)):
        old_sod = cur_sod  # remember the previous SOD for the convergence test
        median = update_median_nodes(median, dataset, med_mappings)
        median = update_median_edges(dataset, med_mappings, median)
        cur_med_id = replace_graph_in_env(script, median, cur_med_id)
        med_distances, med_mappings, cur_sod = update_mappings(script, cur_med_id, listID)
        sods.append(cur_sod)
        if(verbose):
            print(cur_sod)
        ite += 1
    return median, cur_sod, sods, set_median
| def compute_median_set(script, listID): | |||||
|     """Return the index (in listID order) of the set median and its sod.""" | |||||
|     # Compute the set median: the dataset graph minimizing the sum of | |||||
|     # distances (sod) to all other graphs. | |||||
|     N = len(listID) | |||||
|     map_id_to_index = {} | |||||
|     map_index_to_id = {} | |||||
|     for i in range(len(listID)): | |||||
|         map_id_to_index[listID[i]] = i | |||||
|         map_index_to_id[i] = listID[i] | |||||
|     distances = np.zeros((N, N)) | |||||
|     for i in listID: | |||||
|         for j in listID: | |||||
|             script.PyRunMethod(i, j) | |||||
|             distances[map_id_to_index[i], map_id_to_index[j]] = script.PyGetUpperBound(i, j) | |||||
|     sum_distances = np.sum(distances, 0) | |||||
|     median_set_index = np.argmin(sum_distances) | |||||
|     sod = np.min(sum_distances) | |||||
|     return median_set_index, sod | |||||
| #if __name__ == "__main__": | |||||
| # # Load the dataset | |||||
| # script.PyLoadGXLGraph('/home/bgauzere/dev/gedlib/data/datasets/Letter/HIGH/', '/home/bgauzere/dev/gedlib/data/collections/Letter_Z.xml') | |||||
| # script.PySetEditCost("LETTER") | |||||
| # script.PyInitEnv() | |||||
| # script.PySetMethod("IPFP", "") | |||||
| # script.PyInitMethod() | |||||
| # | |||||
| # dataset,my_y = pygraph.utils.graphfiles.loadDataset("/home/bgauzere/dev/gedlib/data/datasets/Letter/HIGH/Letter_Z.cxl") | |||||
| # | |||||
| # listID = script.PyGetAllGraphIds() | |||||
| # median, sod = compute_median(script,listID,dataset,verbose=True) | |||||
| # | |||||
| # print(sod) | |||||
| # draw_Letter_graph(median) | |||||
| if __name__ == '__main__': | |||||
| # test draw_Letter_graph | |||||
| ds = {'name': 'Letter-high', 'dataset': '../datasets/Letter-high/Letter-high_A.txt', | |||||
| 'extra_params': {}} # node nsymb | |||||
| Gn, y_all = loadDataset(ds['dataset'], extra_params=ds['extra_params']) | |||||
| print(y_all) | |||||
| for g in Gn: | |||||
| draw_Letter_graph(g) | |||||
| @@ -0,0 +1,423 @@ | |||||
| #!/usr/bin/env python3 | |||||
| # -*- coding: utf-8 -*- | |||||
| """ | |||||
| Created on Thu Jul 4 12:20:16 2019 | |||||
| @author: ljia | |||||
| """ | |||||
| import numpy as np | |||||
| import networkx as nx | |||||
| import matplotlib.pyplot as plt | |||||
| import time | |||||
| from tqdm import tqdm | |||||
| import sys | |||||
| sys.path.insert(0, "../") | |||||
| from pygraph.utils.graphfiles import loadDataset | |||||
| from median import draw_Letter_graph | |||||
| # --------------------------- These are tests --------------------------------# | |||||
| def test_who_is_the_closest_in_kernel_space(Gn): | |||||
|     # compute_kernel and gram2distances are assumed to live in gk_iam, as | |||||
|     # the other tests in this file import compute_kernel from there. | |||||
|     from gk_iam import compute_kernel, gram2distances | |||||
| idx_gi = [0, 6] | |||||
| g1 = Gn[idx_gi[0]] | |||||
| g2 = Gn[idx_gi[1]] | |||||
| # create the "median" graph. | |||||
| gnew = g2.copy() | |||||
| gnew.remove_node(0) | |||||
| nx.draw_networkx(gnew) | |||||
| plt.show() | |||||
| print(gnew.nodes(data=True)) | |||||
| Gn = [gnew] + Gn | |||||
| # compute gram matrix | |||||
| Kmatrix = compute_kernel(Gn, 'untilhpathkernel', True) | |||||
| # the distance matrix | |||||
| dmatrix = gram2distances(Kmatrix) | |||||
| print(np.sort(dmatrix[idx_gi[0] + 1])) | |||||
| print(np.argsort(dmatrix[idx_gi[0] + 1])) | |||||
| print(np.sort(dmatrix[idx_gi[1] + 1])) | |||||
| print(np.argsort(dmatrix[idx_gi[1] + 1])) | |||||
| # for all g in Gn, compute (d(g1, g) + d(g2, g)) / 2 | |||||
| dis_median = [(dmatrix[i, idx_gi[0] + 1] + dmatrix[i, idx_gi[1] + 1]) / 2 for i in range(len(Gn))] | |||||
| print(np.sort(dis_median)) | |||||
| print(np.argsort(dis_median)) | |||||
| return | |||||
| def test_who_is_the_closest_in_GED_space(Gn): | |||||
| from iam import GED | |||||
| idx_gi = [0, 6] | |||||
| g1 = Gn[idx_gi[0]] | |||||
| g2 = Gn[idx_gi[1]] | |||||
| # create the "median" graph. | |||||
| gnew = g2.copy() | |||||
| gnew.remove_node(0) | |||||
| nx.draw_networkx(gnew) | |||||
| plt.show() | |||||
| print(gnew.nodes(data=True)) | |||||
| Gn = [gnew] + Gn | |||||
| # compute GEDs | |||||
| ged_matrix = np.zeros((len(Gn), len(Gn))) | |||||
| for i1 in tqdm(range(len(Gn)), desc='computing GEDs', file=sys.stdout): | |||||
| for i2 in range(len(Gn)): | |||||
| dis, _, _ = GED(Gn[i1], Gn[i2], lib='gedlib') | |||||
| ged_matrix[i1, i2] = dis | |||||
| print(np.sort(ged_matrix[idx_gi[0] + 1])) | |||||
| print(np.argsort(ged_matrix[idx_gi[0] + 1])) | |||||
| print(np.sort(ged_matrix[idx_gi[1] + 1])) | |||||
| print(np.argsort(ged_matrix[idx_gi[1] + 1])) | |||||
| # for all g in Gn, compute (GED(g1, g) + GED(g2, g)) / 2 | |||||
| dis_median = [(ged_matrix[i, idx_gi[0] + 1] + ged_matrix[i, idx_gi[1] + 1]) / 2 for i in range(len(Gn))] | |||||
| print(np.sort(dis_median)) | |||||
| print(np.argsort(dis_median)) | |||||
| return | |||||
| def test_will_IAM_give_the_median_graph_we_wanted(Gn): | |||||
|     # assumed to live in iam, where the other IAM tests import from | |||||
|     from iam import test_iam_with_more_graphs_as_init | |||||
| idx_gi = [0, 6] | |||||
| g1 = Gn[idx_gi[0]].copy() | |||||
| g2 = Gn[idx_gi[1]].copy() | |||||
| # del Gn[idx_gi[0]] | |||||
| # del Gn[idx_gi[1] - 1] | |||||
| g_median = test_iam_with_more_graphs_as_init([g1, g2], [g1, g2], c_ei=1, c_er=1, c_es=1) | |||||
| # g_median = test_iam_with_more_graphs_as_init(Gn, Gn, c_ei=1, c_er=1, c_es=1) | |||||
| nx.draw_networkx(g_median) | |||||
| plt.show() | |||||
| print(g_median.nodes(data=True)) | |||||
| print(g_median.edges(data=True)) | |||||
| def test_new_IAM_allGraph_deleteNodes(Gn): | |||||
|     # imported from iam, as in test_iam_letter_h below | |||||
|     from iam import test_iam_moreGraphsAsInit_tryAllPossibleBestGraphs_deleteNodesInIterations | |||||
| idx_gi = [0, 6] | |||||
| # g1 = Gn[idx_gi[0]].copy() | |||||
| # g2 = Gn[idx_gi[1]].copy() | |||||
| # g1 = nx.Graph(name='haha') | |||||
| # g1.add_nodes_from([(0, {'atom': 'C'}), (1, {'atom': 'O'}), (2, {'atom': 'C'})]) | |||||
| # g1.add_edges_from([(0, 1, {'bond_type': '1'}), (1, 2, {'bond_type': '1'})]) | |||||
| # g2 = nx.Graph(name='hahaha') | |||||
| # g2.add_nodes_from([(0, {'atom': 'C'}), (1, {'atom': 'O'}), (2, {'atom': 'C'}), | |||||
| # (3, {'atom': 'O'}), (4, {'atom': 'C'})]) | |||||
| # g2.add_edges_from([(0, 1, {'bond_type': '1'}), (1, 2, {'bond_type': '1'}), | |||||
| # (2, 3, {'bond_type': '1'}), (3, 4, {'bond_type': '1'})]) | |||||
| g1 = nx.Graph(name='haha') | |||||
| g1.add_nodes_from([(0, {'atom': 'C'}), (1, {'atom': 'C'}), (2, {'atom': 'C'}), | |||||
| (3, {'atom': 'S'}), (4, {'atom': 'S'})]) | |||||
| g1.add_edges_from([(0, 1, {'bond_type': '1'}), (1, 2, {'bond_type': '1'}), | |||||
| (2, 3, {'bond_type': '1'}), (2, 4, {'bond_type': '1'})]) | |||||
| g2 = nx.Graph(name='hahaha') | |||||
| g2.add_nodes_from([(0, {'atom': 'C'}), (1, {'atom': 'C'}), (2, {'atom': 'C'}), | |||||
| (3, {'atom': 'O'}), (4, {'atom': 'O'})]) | |||||
| g2.add_edges_from([(0, 1, {'bond_type': '1'}), (1, 2, {'bond_type': '1'}), | |||||
| (2, 3, {'bond_type': '1'}), (2, 4, {'bond_type': '1'})]) | |||||
| # g2 = g1.copy() | |||||
| # g2.add_nodes_from([(3, {'atom': 'O'})]) | |||||
| # g2.add_nodes_from([(4, {'atom': 'C'})]) | |||||
| # g2.add_edges_from([(1, 3, {'bond_type': '1'})]) | |||||
| # g2.add_edges_from([(3, 4, {'bond_type': '1'})]) | |||||
| # del Gn[idx_gi[0]] | |||||
| # del Gn[idx_gi[1] - 1] | |||||
| nx.draw_networkx(g1) | |||||
| plt.show() | |||||
| print(g1.nodes(data=True)) | |||||
| print(g1.edges(data=True)) | |||||
| nx.draw_networkx(g2) | |||||
| plt.show() | |||||
| print(g2.nodes(data=True)) | |||||
| print(g2.edges(data=True)) | |||||
| g_median = test_iam_moreGraphsAsInit_tryAllPossibleBestGraphs_deleteNodesInIterations([g1, g2], [g1, g2], c_ei=1, c_er=1, c_es=1) | |||||
| # g_median = test_iam_moreGraphsAsInit_tryAllPossibleBestGraphs_deleteNodesInIterations(Gn, Gn, c_ei=1, c_er=1, c_es=1) | |||||
| nx.draw_networkx(g_median) | |||||
| plt.show() | |||||
| print(g_median.nodes(data=True)) | |||||
| print(g_median.edges(data=True)) | |||||
| def test_the_simple_two(Gn, gkernel): | |||||
| from gk_iam import gk_iam_nearest_multi, compute_kernel | |||||
| lmbda = 0.03 # termination probability | |||||
| r_max = 10 # recursions | |||||
| l = 500 | |||||
| alpha_range = np.linspace(0.5, 0.5, 1) | |||||
| k = 2 # k nearest neighbors | |||||
| # randomly select two molecules | |||||
| np.random.seed(1) | |||||
| idx_gi = [0, 6] # np.random.randint(0, len(Gn), 2) | |||||
| g1 = Gn[idx_gi[0]] | |||||
| g2 = Gn[idx_gi[1]] | |||||
| Gn_mix = [g.copy() for g in Gn] | |||||
| Gn_mix.append(g1.copy()) | |||||
| Gn_mix.append(g2.copy()) | |||||
| # g_tmp = iam([g1, g2]) | |||||
| # nx.draw_networkx(g_tmp) | |||||
| # plt.show() | |||||
| # compute | |||||
| # k_list = [] # kernel between each graph and itself. | |||||
| # k_g1_list = [] # kernel between each graph and g1 | |||||
| # k_g2_list = [] # kernel between each graph and g2 | |||||
| # for ig, g in tqdm(enumerate(Gn), desc='computing self kernels', file=sys.stdout): | |||||
| # ktemp = compute_kernel([g, g1, g2], 'marginalizedkernel', False) | |||||
| # k_list.append(ktemp[0][0, 0]) | |||||
| # k_g1_list.append(ktemp[0][0, 1]) | |||||
| # k_g2_list.append(ktemp[0][0, 2]) | |||||
| km = compute_kernel(Gn_mix, gkernel, True) | |||||
| # k_list = np.diag(km) # kernel between each graph and itself. | |||||
| # k_g1_list = km[idx_gi[0]] # kernel between each graph and g1 | |||||
| # k_g2_list = km[idx_gi[1]] # kernel between each graph and g2 | |||||
| g_best = [] | |||||
| dis_best = [] | |||||
| # for each alpha | |||||
| for alpha in alpha_range: | |||||
| print('alpha =', alpha) | |||||
| dhat, ghat_list = gk_iam_nearest_multi(Gn, [g1, g2], [alpha, 1 - alpha], | |||||
| range(len(Gn), len(Gn) + 2), km, | |||||
| k, r_max, gkernel) | |||||
| dis_best.append(dhat) | |||||
| g_best.append(ghat_list) | |||||
| for idx, item in enumerate(alpha_range): | |||||
| print('when alpha is', item, 'the shortest distance is', dis_best[idx]) | |||||
| print('the corresponding pre-images are') | |||||
| for g in g_best[idx]: | |||||
| nx.draw_networkx(g) | |||||
| plt.show() | |||||
| print(g.nodes(data=True)) | |||||
| print(g.edges(data=True)) | |||||
| def test_remove_bests(Gn, gkernel): | |||||
| from gk_iam import gk_iam_nearest_multi, compute_kernel | |||||
| lmbda = 0.03 # termination probability | |||||
| r_max = 10 # recursions | |||||
| l = 500 | |||||
| alpha_range = np.linspace(0.5, 0.5, 1) | |||||
| k = 20 # k nearest neighbors | |||||
| # randomly select two molecules | |||||
| np.random.seed(1) | |||||
| idx_gi = [0, 6] # np.random.randint(0, len(Gn), 2) | |||||
| g1 = Gn[idx_gi[0]] | |||||
| g2 = Gn[idx_gi[1]] | |||||
| # remove the best 2 graphs. | |||||
| del Gn[idx_gi[0]] | |||||
| del Gn[idx_gi[1] - 1] | |||||
| # del Gn[8] | |||||
| Gn_mix = [g.copy() for g in Gn] | |||||
| Gn_mix.append(g1.copy()) | |||||
| Gn_mix.append(g2.copy()) | |||||
| # compute | |||||
| km = compute_kernel(Gn_mix, gkernel, True) | |||||
| g_best = [] | |||||
| dis_best = [] | |||||
| # for each alpha | |||||
| for alpha in alpha_range: | |||||
| print('alpha =', alpha) | |||||
| dhat, ghat_list = gk_iam_nearest_multi(Gn, [g1, g2], [alpha, 1 - alpha], | |||||
| range(len(Gn), len(Gn) + 2), km, | |||||
| k, r_max, gkernel) | |||||
| dis_best.append(dhat) | |||||
| g_best.append(ghat_list) | |||||
| for idx, item in enumerate(alpha_range): | |||||
| print('when alpha is', item, 'the shortest distance is', dis_best[idx]) | |||||
| print('the corresponding pre-images are') | |||||
| for g in g_best[idx]: | |||||
| draw_Letter_graph(g) | |||||
| # nx.draw_networkx(g) | |||||
| # plt.show() | |||||
| print(g.nodes(data=True)) | |||||
| print(g.edges(data=True)) | |||||
| def test_gkiam_letter_h(): | |||||
| from gk_iam import gk_iam_nearest_multi, compute_kernel | |||||
| from iam import median_distance | |||||
| ds = {'name': 'Letter-high', 'dataset': '../datasets/Letter-high/Letter-high_A.txt', | |||||
| 'extra_params': {}} # node nsymb | |||||
| # ds = {'name': 'Letter-med', 'dataset': '../datasets/Letter-med/Letter-med_A.txt', | |||||
| # 'extra_params': {}} # node nsymb | |||||
| Gn, y_all = loadDataset(ds['dataset'], extra_params=ds['extra_params']) | |||||
| gkernel = 'structuralspkernel' | |||||
| lmbda = 0.03 # termination probability | |||||
| r_max = 3 # recursions | |||||
| # alpha_range = np.linspace(0.5, 0.5, 1) | |||||
| k = 10 # k nearest neighbors | |||||
| # classify graphs according to letters. | |||||
| idx_dict = get_same_item_indices(y_all) | |||||
| time_list = [] | |||||
| sod_list = [] | |||||
| sod_min_list = [] | |||||
| for letter in idx_dict: | |||||
| print('\n-------------------------------------------------------\n') | |||||
| Gn_let = [Gn[i].copy() for i in idx_dict[letter]] | |||||
| Gn_mix = Gn_let + [g.copy() for g in Gn_let] | |||||
| alpha_range = np.linspace(1 / len(Gn_let), 1 / len(Gn_let), 1) | |||||
| # compute | |||||
| time0 = time.time() | |||||
| km = compute_kernel(Gn_mix, gkernel, True) | |||||
| g_best = [] | |||||
| dis_best = [] | |||||
| # for each alpha | |||||
| for alpha in alpha_range: | |||||
| print('alpha =', alpha) | |||||
| dhat, ghat_list = gk_iam_nearest_multi(Gn_let, Gn_let, [alpha] * len(Gn_let), | |||||
| range(len(Gn_let), len(Gn_mix)), km, | |||||
| k, r_max, gkernel, c_ei=1.7, | |||||
| c_er=1.7, c_es=1.7) | |||||
| dis_best.append(dhat) | |||||
| g_best.append(ghat_list) | |||||
| time_list.append(time.time() - time0) | |||||
| # show best graphs and save them to file. | |||||
| for idx, item in enumerate(alpha_range): | |||||
| print('when alpha is', item, 'the shortest distance is', dis_best[idx]) | |||||
| print('the corresponding pre-images are') | |||||
| for g in g_best[idx]: | |||||
| draw_Letter_graph(g, savepath='results/gk_iam/') | |||||
| # nx.draw_networkx(g) | |||||
| # plt.show() | |||||
| print(g.nodes(data=True)) | |||||
| print(g.edges(data=True)) | |||||
| # compute the corresponding sod in graph space. (alpha range not considered.) | |||||
| sod_tmp, _ = median_distance(g_best[0], Gn_let) | |||||
| sod_list.append(sod_tmp) | |||||
| sod_min_list.append(np.min(sod_tmp)) | |||||
| print('\nsods in graph space: ', sod_list) | |||||
| print('\nsmallest sod in graph space for each letter: ', sod_min_list) | |||||
| print('\ntimes:', time_list) | |||||
| def get_same_item_indices(ls): | |||||
| """Get the indices of the same items in a list. Return a dict keyed by items. | |||||
| """ | |||||
| idx_dict = {} | |||||
| for idx, item in enumerate(ls): | |||||
| if item in idx_dict: | |||||
| idx_dict[item].append(idx) | |||||
| else: | |||||
| idx_dict[item] = [idx] | |||||
| return idx_dict | |||||
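| # A small usage sketch (hypothetical labels): | |||||
| #   get_same_item_indices(['a', 'i', 'a'])  # -> {'a': [0, 2], 'i': [1]} | |||||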
| #def compute_letter_median_by_average(Gn): | |||||
| # return g_median | |||||
| def test_iam_letter_h(): | |||||
| from iam import test_iam_moreGraphsAsInit_tryAllPossibleBestGraphs_deleteNodesInIterations | |||||
| from gk_iam import dis_gstar, compute_kernel | |||||
| ds = {'name': 'Letter-high', 'dataset': '../datasets/Letter-high/Letter-high_A.txt', | |||||
| 'extra_params': {}} # node nsymb | |||||
| # ds = {'name': 'Letter-med', 'dataset': '../datasets/Letter-med/Letter-med_A.txt', | |||||
| # 'extra_params': {}} # node nsymb | |||||
| Gn, y_all = loadDataset(ds['dataset'], extra_params=ds['extra_params']) | |||||
| lmbda = 0.03 # termination probability | |||||
| # alpha_range = np.linspace(0.5, 0.5, 1) | |||||
| # classify graphs according to letters. | |||||
| idx_dict = get_same_item_indices(y_all) | |||||
| time_list = [] | |||||
| sod_list = [] | |||||
| sod_min_list = [] | |||||
| for letter in idx_dict: | |||||
| Gn_let = [Gn[i].copy() for i in idx_dict[letter]] | |||||
| alpha_range = np.linspace(1 / len(Gn_let), 1 / len(Gn_let), 1) | |||||
| # compute | |||||
| g_best = [] | |||||
| dis_best = [] | |||||
| time0 = time.time() | |||||
| # for each alpha | |||||
| for alpha in alpha_range: | |||||
| print('alpha =', alpha) | |||||
| ghat_list, dhat = test_iam_moreGraphsAsInit_tryAllPossibleBestGraphs_deleteNodesInIterations( | |||||
| Gn_let, Gn_let, c_ei=1.7, c_er=1.7, c_es=1.7) | |||||
| dis_best.append(dhat) | |||||
| g_best.append(ghat_list) | |||||
| time_list.append(time.time() - time0) | |||||
| # show best graphs and save them to file. | |||||
| for idx, item in enumerate(alpha_range): | |||||
| print('when alpha is', item, 'the shortest distance is', dis_best[idx]) | |||||
| print('the corresponding pre-images are') | |||||
| for g in g_best[idx]: | |||||
| draw_Letter_graph(g, savepath='results/iam/') | |||||
| # nx.draw_networkx(g) | |||||
| # plt.show() | |||||
| print(g.nodes(data=True)) | |||||
| print(g.edges(data=True)) | |||||
| # compute the corresponding sod in kernel space. (alpha range not considered.) | |||||
| gkernel = 'structuralspkernel' | |||||
| sod_tmp = [] | |||||
| Gn_mix = g_best[0] + Gn_let | |||||
| km = compute_kernel(Gn_mix, gkernel, True) | |||||
| for ig, g in tqdm(enumerate(g_best[0]), desc='computing kernel sod', file=sys.stdout): | |||||
| dtemp = dis_gstar(ig, range(len(g_best[0]), len(Gn_mix)), | |||||
| [alpha_range[0]] * len(Gn_let), km, withterm3=False) | |||||
| sod_tmp.append(dtemp) | |||||
| sod_list.append(sod_tmp) | |||||
| sod_min_list.append(np.min(sod_tmp)) | |||||
| print('\nsods in kernel space: ', sod_list) | |||||
| print('\nsmallest sod in kernel space for each letter: ', sod_min_list) | |||||
| print('\ntimes:', time_list) | |||||
| if __name__ == '__main__': | |||||
| # ds = {'name': 'MUTAG', 'dataset': '../datasets/MUTAG/MUTAG_A.txt', | |||||
| # 'extra_params': {}} # node/edge symb | |||||
| ds = {'name': 'Letter-high', 'dataset': '../datasets/Letter-high/Letter-high_A.txt', | |||||
| 'extra_params': {}} # node nsymb | |||||
| # ds = {'name': 'Acyclic', 'dataset': '../datasets/monoterpenoides/trainset_9.ds', | |||||
| # 'extra_params': {}} | |||||
| # ds = {'name': 'Acyclic', 'dataset': '../datasets/acyclic/dataset_bps.ds', | |||||
| # 'extra_params': {}} # node symb | |||||
| Gn, y_all = loadDataset(ds['dataset'], extra_params=ds['extra_params']) | |||||
| # Gn = Gn[0:20] | |||||
| # import networkx.algorithms.isomorphism as iso | |||||
| # G1 = nx.MultiDiGraph() | |||||
| # G2 = nx.MultiDiGraph() | |||||
| # G1.add_nodes_from([1,2,3], fill='red') | |||||
| # G2.add_nodes_from([10,20,30,40], fill='red') | |||||
| # nx.add_path(G1, [1,2,3,4], weight=3, linewidth=2.5) | |||||
| # nx.add_path(G2, [10,20,30,40], weight=3) | |||||
| # nm = iso.categorical_node_match('fill', 'red') | |||||
| # print(nx.is_isomorphic(G1, G2, node_match=nm)) | |||||
| # | |||||
| # test_new_IAM_allGraph_deleteNodes(Gn) | |||||
| # test_will_IAM_give_the_median_graph_we_wanted(Gn) | |||||
| # test_who_is_the_closest_in_GED_space(Gn) | |||||
| # test_who_is_the_closest_in_kernel_space(Gn) | |||||
| # test_the_simple_two(Gn, 'untilhpathkernel') | |||||
| # test_remove_bests(Gn, 'untilhpathkernel') | |||||
| test_gkiam_letter_h() | |||||
| # test_iam_letter_h() | |||||
| @@ -23,7 +23,7 @@ from pygraph.utils.parallel import parallel_gm | |||||
| def commonwalkkernel(*args, | def commonwalkkernel(*args, | ||||
| node_label='atom', | node_label='atom', | ||||
| edge_label='bond_type', | edge_label='bond_type', | ||||
| n=None, | |||||
| # n=None, | |||||
| weight=1, | weight=1, | ||||
| compute_method=None, | compute_method=None, | ||||
| n_jobs=None, | n_jobs=None, | ||||
| @@ -35,26 +35,28 @@ def commonwalkkernel(*args, | |||||
| List of graphs between which the kernels are calculated. | List of graphs between which the kernels are calculated. | ||||
| / | / | ||||
| G1, G2 : NetworkX graphs | G1, G2 : NetworkX graphs | ||||
| 2 graphs between which the kernel is calculated. | |||||
| Two graphs between which the kernel is calculated. | |||||
| node_label : string | node_label : string | ||||
| node attribute used as label. The default node label is atom. | |||||
| Node attribute used as symbolic label. The default node label is 'atom'. | |||||
| edge_label : string | edge_label : string | ||||
| edge attribute used as label. The default edge label is bond_type. | |||||
| n : integer | |||||
| Longest length of walks. Only useful when applying the 'brute' method. | |||||
| Edge attribute used as symbolic label. The default edge label is 'bond_type'. | |||||
| # n : integer | |||||
| # Longest length of walks. Only useful when applying the 'brute' method. | |||||
| weight: integer | weight: integer | ||||
| Weight coefficient of different lengths of walks, which represents beta | Weight coefficient of different lengths of walks, which represents beta | ||||
| in 'exp' method and gamma in 'geo'. | in 'exp' method and gamma in 'geo'. | ||||
| compute_method : string | compute_method : string | ||||
| Method used to compute walk kernel. The following choices are | Method used to compute walk kernel. The following choices are | ||||
| available: | available: | ||||
| 'exp' : exponential serial method applied on the direct product graph, | |||||
| as shown in reference [1]. The time complexity is O(n^6) for graphs | |||||
| with n vertices. | |||||
| 'geo' : geometric serial method applied on the direct product graph, as | |||||
| shown in reference [1]. The time complexity is O(n^6) for graphs with n | |||||
| vertices. | |||||
| 'brute' : brute force, simply search for all walks and compare them. | |||||
| 'exp': method based on the exponential series, applied to the direct | |||||
| product graph, as shown in reference [1]. The time complexity is O(n^6) | |||||
| for graphs with n vertices. | |||||
| 'geo': method based on the geometric series, applied to the direct product | |||||
| graph, as shown in reference [1]. The time complexity is O(n^6) for | |||||
| graphs with n vertices. | |||||
| # 'brute': brute force, simply search for all walks and compare them. | |||||
| n_jobs : int | |||||
| Number of jobs for parallelization. | |||||
| Return | Return | ||||
| ------ | ------ | ||||
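| # A hedged usage sketch of the signature documented above; the module path | |||||
| # and the (Kmatrix, run_time) return pair are assumptions based on the | |||||
| # other kernels in this package: | |||||
| # from pygraph.kernels.commonWalkKernel import commonwalkkernel | |||||
| # Kmatrix, run_time = commonwalkkernel(Gn, node_label='atom', | |||||
| #                                      edge_label='bond_type', weight=0.01, | |||||
| #                                      compute_method='exp', n_jobs=4) | |||||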
| @@ -44,17 +44,20 @@ def marginalizedkernel(*args, | |||||
| List of graphs between which the kernels are calculated. | List of graphs between which the kernels are calculated. | ||||
| / | / | ||||
| G1, G2 : NetworkX graphs | G1, G2 : NetworkX graphs | ||||
| 2 graphs between which the kernel is calculated. | |||||
| Two graphs between which the kernel is calculated. | |||||
| node_label : string | node_label : string | ||||
| node attribute used as label. The default node label is atom. | |||||
| Node attribute used as symbolic label. The default node label is 'atom'. | |||||
| edge_label : string | edge_label : string | ||||
| edge attribute used as label. The default edge label is bond_type. | |||||
| Edge attribute used as symbolic label. The default edge label is 'bond_type'. | |||||
| p_quit : integer | p_quit : integer | ||||
| the termination probability in the random walks generating step | |||||
| The termination probability in the random-walk generation step. | |||||
| n_iteration : integer | n_iteration : integer | ||||
| time of iterations to calculate R_inf | |||||
| Number of iterations to calculate R_inf. | |||||
| remove_totters : boolean | remove_totters : boolean | ||||
| whether to remove totters. The default value is True. | |||||
| Whether to remove tottering walks, using the method introduced in [2]. | |||||
| The default value is False. | |||||
| n_jobs : int | |||||
| Number of jobs for parallelization. | |||||
| Return | Return | ||||
| ------ | ------ | ||||
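| # A hedged usage sketch (module path and the (Kmatrix, run_time) return | |||||
| # pair are assumptions based on the other kernels in this package): | |||||
| # from pygraph.kernels.marginalizedKernel import marginalizedkernel | |||||
| # Kmatrix, run_time = marginalizedkernel(Gn, node_label='atom', | |||||
| #                                        edge_label='bond_type', p_quit=0.5, | |||||
| #                                        n_iteration=20, | |||||
| #                                        remove_totters=False, n_jobs=4) | |||||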
| @@ -41,15 +41,62 @@ def randomwalkkernel(*args, | |||||
| List of graphs between which the kernels are calculated. | List of graphs between which the kernels are calculated. | ||||
| / | / | ||||
| G1, G2 : NetworkX graphs | G1, G2 : NetworkX graphs | ||||
| 2 graphs between which the kernel is calculated. | |||||
| node_label : string | |||||
| node attribute used as label. The default node label is atom. | |||||
| Two graphs between which the kernel is calculated. | |||||
| compute_method : string | |||||
| Method used to compute the kernel. The following choices are | |||||
| available: | |||||
| 'sylvester' - Sylvester equation method. | |||||
| 'conjugate' - conjugate gradient method. | |||||
| 'fp' - fixed-point iterations. | |||||
| 'spectral' - spectral decomposition. | |||||
| weight : float | |||||
| A constant weight set for random walks of length h. | |||||
| p : None | |||||
| Initial probability distribution on the unlabeled direct product graph | |||||
| of two graphs. It is set to be uniform over all vertices in the direct | |||||
| product graph. | |||||
| q : None | |||||
| Stopping probability distribution on the unlabeled direct product graph | |||||
| of two graphs. It is set to be uniform over all vertices in the direct | |||||
| product graph. | |||||
| edge_weight : string | |||||
| Edge attribute name corresponding to the edge weight. | |||||
| node_kernels : dict | |||||
| A dictionary of kernel functions for nodes, including 3 items: 'symb' | |||||
| for symbolic node labels, 'nsymb' for non-symbolic node labels, 'mix' | |||||
| for both labels. The first 2 functions take two node labels as | |||||
| parameters, and the 'mix' function takes 4 parameters, a symbolic and a | |||||
| non-symbolic label for each of the two nodes. Each label is in the form | |||||
| of a 2-D array (n_samples, n_features). Each function returns a number | |||||
| as the kernel value. Ignored when nodes are unlabeled. This argument | |||||
| applies only to the conjugate gradient method and fixed-point iterations. | |||||
| edge_kernels : dict | |||||
| A dictionary of kernel functions for edges, including 3 items: 'symb' | |||||
| for symbolic edge labels, 'nsymb' for non-symbolic edge labels, 'mix' | |||||
| for both labels. The first 2 functions take two edge labels as | |||||
| parameters, and the 'mix' function takes 4 parameters, a symbolic and a | |||||
| non-symbolic label for each of the two edges. Each label is in the form | |||||
| of a 2-D array (n_samples, n_features). Each function returns a number | |||||
| as the kernel value. Ignored when edges are unlabeled. This argument | |||||
| applies only to the conjugate gradient method and fixed-point iterations. | |||||
| node_label : string | |||||
| Node attribute used as label. The default node label is atom. This | |||||
| argument applies only to the conjugate gradient method and fixed-point | |||||
| iterations. | |||||
| edge_label : string | edge_label : string | ||||
| edge attribute used as label. The default edge label is bond_type. | |||||
| h : integer | |||||
| Longest length of walks. | |||||
| method : string | |||||
| Method used to compute the random walk kernel. Available methods are 'sylvester', 'conjugate', 'fp', 'spectral' and 'kron'. | |||||
| Edge attribute used as label. The default edge label is bond_type. This | |||||
| argument applies only to the conjugate gradient method and fixed-point | |||||
| iterations. | |||||
| sub_kernel : string | |||||
| Method used to compute the walk kernel. The following choices are | |||||
| available: | |||||
| 'exp': method based on the exponential series. | |||||
| 'geo': method based on the geometric series. | |||||
| n_jobs : int | |||||
| Number of jobs for parallelization. | |||||
| Return | Return | ||||
| ------ | ------ | ||||
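| # To make the node_kernels / edge_kernels contract above concrete, here is | |||||
| # a hedged sketch of a conforming dictionary; deltakernel, gaussiankernel | |||||
| # and mixkernel are hypothetical helpers written out for illustration only. | |||||
| # import numpy as np | |||||
| # def deltakernel(x, y):  # Dirac kernel on two symbolic labels | |||||
| #     return float(x == y) | |||||
| # def gaussiankernel(x, y):  # Gaussian kernel on two non-symbolic vectors | |||||
| #     d = np.array(x, dtype=float) - np.array(y, dtype=float) | |||||
| #     return float(np.exp(-0.5 * np.dot(d, d))) | |||||
| # def mixkernel(sx, sy, vx, vy):  # 4 parameters: symbolic + non-symbolic | |||||
| #     return deltakernel(sx, sy) * gaussiankernel(vx, vy) | |||||
| # sub_kernels = {'symb': deltakernel, 'nsymb': gaussiankernel, 'mix': mixkernel} | |||||
| # randomwalkkernel(Gn, compute_method='conjugate', | |||||
| #                  node_kernels=sub_kernels, edge_kernels=sub_kernels) | |||||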
| @@ -168,7 +215,7 @@ def _sylvester_equation(Gn, lmda, p, q, eweight, n_jobs, verbose=True): | |||||
| if q == None: | if q == None: | ||||
| # don't normalize adjacency matrices if q is a uniform vector. Note | # don't normalize adjacency matrices if q is a uniform vector. Note | ||||
| # A_wave_list accually contains the transposes of the adjacency matrices. | |||||
| # A_wave_list actually contains the transposes of the adjacency matrices. | |||||
| A_wave_list = [ | A_wave_list = [ | ||||
| nx.adjacency_matrix(G, eweight).todense().transpose() for G in | nx.adjacency_matrix(G, eweight).todense().transpose() for G in | ||||
| (tqdm(Gn, desc='compute adjacency matrices', file=sys.stdout) if | (tqdm(Gn, desc='compute adjacency matrices', file=sys.stdout) if | ||||
| @@ -259,7 +306,7 @@ def _conjugate_gradient(Gn, lmda, p, q, ds_attrs, node_kernels, edge_kernels, | |||||
| # # this is faster for unlabeled graphs. @todo: why? | # # this is faster for unlabeled graphs. @todo: why? | ||||
| # if q == None: | # if q == None: | ||||
| # # don't normalize adjacency matrices if q is a uniform vector. Note | # # don't normalize adjacency matrices if q is a uniform vector. Note | ||||
| # # A_wave_list accually contains the transposes of the adjacency matrices. | |||||
| # # A_wave_list actually contains the transposes of the adjacency matrices. | |||||
| # A_wave_list = [ | # A_wave_list = [ | ||||
| # nx.adjacency_matrix(G, eweight).todense().transpose() for G in | # nx.adjacency_matrix(G, eweight).todense().transpose() for G in | ||||
| # tqdm(Gn, desc='compute adjacency matrices', file=sys.stdout) | # tqdm(Gn, desc='compute adjacency matrices', file=sys.stdout) | ||||
| @@ -376,7 +423,7 @@ def _fixed_point(Gn, lmda, p, q, ds_attrs, node_kernels, edge_kernels, | |||||
| # # this is faster for unlabeled graphs. @todo: why? | # # this is faster for unlabeled graphs. @todo: why? | ||||
| # if q == None: | # if q == None: | ||||
| # # don't normalize adjacency matrices if q is a uniform vector. Note | # # don't normalize adjacency matrices if q is a uniform vector. Note | ||||
| # # A_wave_list accually contains the transposes of the adjacency matrices. | |||||
| # # A_wave_list actually contains the transposes of the adjacency matrices. | |||||
| # A_wave_list = [ | # A_wave_list = [ | ||||
| # nx.adjacency_matrix(G, eweight).todense().transpose() for G in | # nx.adjacency_matrix(G, eweight).todense().transpose() for G in | ||||
| # tqdm(Gn, desc='compute adjacency matrices', file=sys.stdout) | # tqdm(Gn, desc='compute adjacency matrices', file=sys.stdout) | ||||
| @@ -481,7 +528,7 @@ def _spectral_decomposition(Gn, weight, p, q, sub_kernel, eweight, n_jobs, verbo | |||||
| for G in (tqdm(Gn, desc='spectral decompose', file=sys.stdout) if | for G in (tqdm(Gn, desc='spectral decompose', file=sys.stdout) if | ||||
| verbose else Gn): | verbose else Gn): | ||||
| # don't normalize adjacency matrices if q is a uniform vector. Note | # don't normalize adjacency matrices if q is a uniform vector. Note | ||||
| # A accually is the transpose of the adjacency matrix. | |||||
| # A actually is the transpose of the adjacency matrix. | |||||
| A = nx.adjacency_matrix(G, eweight).todense().transpose() | A = nx.adjacency_matrix(G, eweight).todense().transpose() | ||||
| ew, ev = np.linalg.eig(A) | ew, ev = np.linalg.eig(A) | ||||
| D_list.append(ew) | D_list.append(ew) | ||||
| @@ -33,12 +33,12 @@ def spkernel(*args, | |||||
| List of graphs between which the kernels are calculated. | List of graphs between which the kernels are calculated. | ||||
| / | / | ||||
| G1, G2 : NetworkX graphs | G1, G2 : NetworkX graphs | ||||
| 2 graphs between which the kernel is calculated. | |||||
| Two graphs between which the kernel is calculated. | |||||
| node_label : string | node_label : string | ||||
| node attribute used as label. The default node label is atom. | |||||
| Node attribute used as label. The default node label is atom. | |||||
| edge_weight : string | edge_weight : string | ||||
| Edge attribute name corresponding to the edge weight. | Edge attribute name corresponding to the edge weight. | ||||
| node_kernels: dict | |||||
| node_kernels : dict | |||||
| A dictionary of kernel functions for nodes, including 3 items: 'symb' | A dictionary of kernel functions for nodes, including 3 items: 'symb' | ||||
| for symbolic node labels, 'nsymb' for non-symbolic node labels, 'mix' | for symbolic node labels, 'nsymb' for non-symbolic node labels, 'mix' | ||||
| for both labels. The first 2 functions take two node labels as | for both labels. The first 2 functions take two node labels as | ||||
| @@ -46,6 +46,8 @@ def spkernel(*args, | |||||
| non-symbolic label for each of the two nodes. Each label is in the form | non-symbolic label for each of the two nodes. Each label is in the form | ||||
| of a 2-D array (n_samples, n_features). Each function returns a | of a 2-D array (n_samples, n_features). Each function returns a | ||||
| number as the kernel value. Ignored when nodes are unlabeled. | number as the kernel value. Ignored when nodes are unlabeled. | ||||
| n_jobs : int | |||||
| Number of jobs for parallelization. | |||||
| Return | Return | ||||
| ------ | ------ | ||||
| @@ -42,14 +42,15 @@ def structuralspkernel(*args, | |||||
| List of graphs between which the kernels are calculated. | List of graphs between which the kernels are calculated. | ||||
| / | / | ||||
| G1, G2 : NetworkX graphs | G1, G2 : NetworkX graphs | ||||
| 2 graphs between which the kernel is calculated. | |||||
| Two graphs between which the kernel is calculated. | |||||
| node_label : string | node_label : string | ||||
| node attribute used as label. The default node label is atom. | |||||
| Node attribute used as label. The default node label is atom. | |||||
| edge_weight : string | edge_weight : string | ||||
| Edge attribute name corresponding to the edge weight. | |||||
| Edge attribute name corresponding to the edge weight. Applied for the | |||||
| computation of the shortest paths. | |||||
| edge_label : string | edge_label : string | ||||
| edge attribute used as label. The default edge label is bond_type. | |||||
| node_kernels: dict | |||||
| Edge attribute used as label. The default edge label is bond_type. | |||||
| node_kernels : dict | |||||
| A dictionary of kernel functions for nodes, including 3 items: 'symb' | A dictionary of kernel functions for nodes, including 3 items: 'symb' | ||||
| for symbolic node labels, 'nsymb' for non-symbolic node labels, 'mix' | for symbolic node labels, 'nsymb' for non-symbolic node labels, 'mix' | ||||
| for both labels. The first 2 functions take two node labels as | for both labels. The first 2 functions take two node labels as | ||||
| @@ -57,7 +58,7 @@ def structuralspkernel(*args, | |||||
| non-symbolic label for each of the two nodes. Each label is in the form | non-symbolic label for each of the two nodes. Each label is in the form | ||||
| of a 2-D array (n_samples, n_features). Each function returns a number | of a 2-D array (n_samples, n_features). Each function returns a number | ||||
| as the kernel value. Ignored when nodes are unlabeled. | as the kernel value. Ignored when nodes are unlabeled. | ||||
| edge_kernels: dict | |||||
| edge_kernels : dict | |||||
| A dictionary of kernel functions for edges, including 3 items: 'symb' | A dictionary of kernel functions for edges, including 3 items: 'symb' | ||||
| for symbolic edge labels, 'nsymb' for non-symbolic edge labels, 'mix' | for symbolic edge labels, 'nsymb' for non-symbolic edge labels, 'mix' | ||||
| for both labels. The first 2 functions take two edge labels as | for both labels. The first 2 functions take two edge labels as | ||||
| @@ -65,6 +66,13 @@ def structuralspkernel(*args, | |||||
| non-symbolic label for each of the two edges. Each label is in the form | non-symbolic label for each of the two edges. Each label is in the form | ||||
| of a 2-D array (n_samples, n_features). Each function returns a number | of a 2-D array (n_samples, n_features). Each function returns a number | ||||
| as the kernel value. Ignored when edges are unlabeled. | as the kernel value. Ignored when edges are unlabeled. | ||||
| compute_method : string | |||||
| Computation method used to store the shortest paths and compute the | |||||
| graph kernel. The following choices are available: | |||||
| 'trie': store paths as tries. | |||||
| 'naive': store paths in lists. | |||||
| n_jobs : int | |||||
| Number of jobs for parallelization. | |||||
| Return | Return | ||||
| ------ | ------ | ||||
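| # A hedged usage sketch (argument values and the (Kmatrix, run_time) return | |||||
| # pair are assumptions; sub_kernels is the dictionary sketched above for | |||||
| # randomwalkkernel): | |||||
| # Kmatrix, run_time = structuralspkernel(Gn, node_label='atom', | |||||
| #                                        edge_weight=None, edge_label='bond_type', | |||||
| #                                        node_kernels=sub_kernels, | |||||
| #                                        edge_kernels=sub_kernels, | |||||
| #                                        compute_method='trie', n_jobs=4) | |||||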
| @@ -40,11 +40,19 @@ def treeletkernel(*args, | |||||
| The sub-kernel between 2 real number vectors. Each vector counts the | The sub-kernel between 2 real number vectors. Each vector counts the | ||||
| numbers of isomorphic treelets in a graph. | numbers of isomorphic treelets in a graph. | ||||
| node_label : string | node_label : string | ||||
| Node attribute used as label. The default node label is atom. | |||||
| Node attribute used as label. The default node label is atom. | |||||
| edge_label : string | edge_label : string | ||||
| Edge attribute used as label. The default edge label is bond_type. | Edge attribute used as label. The default edge label is bond_type. | ||||
| labeled : boolean | |||||
| Whether the graphs are labeled. The default is True. | |||||
| parallel : string/None | |||||
| Which parallelization method is applied to compute the kernel. The | |||||
| following choices are available: | |||||
| 'imap_unordered': use Python's multiprocessing.Pool.imap_unordered | |||||
| method. | |||||
| None: no parallelization is applied. | |||||
| n_jobs : int | |||||
| Number of jobs for parallelization. The default is to use all | |||||
| computational cores. This argument is only valid when one of the | |||||
| parallelization methods is applied. | |||||
| Return | Return | ||||
| ------ | ------ | ||||
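| # A hedged sketch of the two parallel options documented above (argument | |||||
| # list abridged; remaining parameters keep their defaults): | |||||
| # treeletkernel(Gn, node_label='atom', edge_label='bond_type', | |||||
| #               parallel='imap_unordered', n_jobs=4)  # multiprocessing | |||||
| # treeletkernel(Gn, node_label='atom', edge_label='bond_type', | |||||
| #               parallel=None)  # serial computation | |||||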
| @@ -26,7 +26,7 @@ def untilhpathkernel(*args, | |||||
| node_label='atom', | node_label='atom', | ||||
| edge_label='bond_type', | edge_label='bond_type', | ||||
| depth=10, | depth=10, | ||||
| k_func='tanimoto', | |||||
| k_func='MinMax', | |||||
| compute_method='trie', | compute_method='trie', | ||||
| n_jobs=None, | n_jobs=None, | ||||
| verbose=True): | verbose=True): | ||||
| @@ -38,7 +38,7 @@ def untilhpathkernel(*args, | |||||
| List of graphs between which the kernels are calculated. | List of graphs between which the kernels are calculated. | ||||
| / | / | ||||
| G1, G2 : NetworkX graphs | G1, G2 : NetworkX graphs | ||||
| 2 graphs between which the kernel is calculated. | |||||
| Two graphs between which the kernel is calculated. | |||||
| node_label : string | node_label : string | ||||
| Node attribute used as label. The default node label is atom. | Node attribute used as label. The default node label is atom. | ||||
| edge_label : string | edge_label : string | ||||
| @@ -47,9 +47,17 @@ def untilhpathkernel(*args, | |||||
| Depth of search. Longest length of paths. | Depth of search. Longest length of paths. | ||||
| k_func : function | k_func : function | ||||
| A kernel function applied using different notions of fingerprint | A kernel function applied using different notions of fingerprint | ||||
| similarity. | |||||
| compute_method: string | |||||
| Computation method, 'trie' or 'naive'. | |||||
| similarity, defining the type of feature map and the normalization | |||||
| method applied to the graph kernel. The following choices are available: | |||||
| 'MinMax': use the MinMax kernel and a counting feature map. | |||||
| 'tanimoto': use the Tanimoto kernel and a binary feature map. | |||||
| compute_method : string | |||||
| Computation method used to store paths and compute the graph kernel. | |||||
| The following choices are available: | |||||
| 'trie': store paths as tries. | |||||
| 'naive': store paths in lists. | |||||
| n_jobs : int | |||||
| Number of jobs for parallelization. | |||||
| Return | Return | ||||
| ------ | ------ | ||||
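| # A hedged sketch contrasting the two k_func choices above (the | |||||
| # (Kmatrix, run_time) return pair is an assumption based on the other | |||||
| # kernels in this package): | |||||
| # Kmatrix, run_time = untilhpathkernel(Gn, node_label='atom', | |||||
| #                                      edge_label='bond_type', depth=10, | |||||
| #                                      k_func='MinMax',  # counting features | |||||
| #                                      compute_method='trie') | |||||
| # Kmatrix, run_time = untilhpathkernel(Gn, node_label='atom', | |||||
| #                                      edge_label='bond_type', depth=10, | |||||
| #                                      k_func='tanimoto',  # binary features | |||||
| #                                      compute_method='naive') | |||||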
| @@ -38,15 +38,28 @@ def weisfeilerlehmankernel(*args, | |||||
| List of graphs between which the kernels are calculated. | List of graphs between which the kernels are calculated. | ||||
| / | / | ||||
| G1, G2 : NetworkX graphs | G1, G2 : NetworkX graphs | ||||
| 2 graphs between which the kernel is calculated. | |||||
| Two graphs between which the kernel is calculated. | |||||
| node_label : string | node_label : string | ||||
| node attribute used as label. The default node label is atom. | |||||
| Node attribute used as label. The default node label is atom. | |||||
| edge_label : string | edge_label : string | ||||
| edge attribute used as label. The default edge label is bond_type. | |||||
| Edge attribute used as label. The default edge label is bond_type. | |||||
| height : int | height : int | ||||
| subtree height | |||||
| Subtree height. | |||||
| base_kernel : string | base_kernel : string | ||||
| base kernel used in each iteration of WL kernel. The default base kernel is subtree kernel. For user-defined kernel, base_kernel is the name of the base kernel function used in each iteration of WL kernel. This function returns a Numpy matrix, each element of which is the user-defined Weisfeiler-Lehman kernel between 2 praphs. | |||||
| Base kernel used in each iteration of the WL kernel. Only the default | |||||
| 'subtree' kernel can be applied for now. | |||||
| # The default base | |||||
| # kernel is subtree kernel. For user-defined kernel, base_kernel is the | |||||
| # name of the base kernel function used in each iteration of WL kernel. | |||||
| # This function returns a Numpy matrix, each element of which is the | |||||
| # user-defined Weisfeiler-Lehman kernel between 2 graphs. | |||||
| parallel : None | |||||
| Which parallelization method is applied to compute the kernel. No | |||||
| parallelization can be applied for now. | |||||
| n_jobs : int | |||||
| Number of jobs for parallelization. The default is to use all | |||||
| computational cores. This argument is only valid when one of the | |||||
| parallelization methods is applied, and can be ignored for now. | |||||
| Return | Return | ||||
| ------ | ------ | ||||
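| # A hedged usage sketch (only the 'subtree' base kernel is supported, per | |||||
| # the docstring above; the (Kmatrix, run_time) return pair is an assumption): | |||||
| # Kmatrix, run_time = weisfeilerlehmankernel(Gn, node_label='atom', | |||||
| #                                            edge_label='bond_type', height=2, | |||||
| #                                            base_kernel='subtree') | |||||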