| @@ -0,0 +1,367 @@ | |||||
| ---------------------------------- | |||||
| --- Python interface of LIBSVM --- | |||||
| ---------------------------------- | |||||
| Table of Contents | |||||
| ================= | |||||
| - Introduction | |||||
| - Installation | |||||
| - Quick Start | |||||
| - Design Description | |||||
| - Data Structures | |||||
| - Utility Functions | |||||
| - Additional Information | |||||
| Introduction | |||||
| ============ | |||||
| Python (http://www.python.org/) is a programming language suitable for rapid | |||||
| development. This tool provides a simple Python interface to LIBSVM, a library | |||||
| for support vector machines (http://www.csie.ntu.edu.tw/~cjlin/libsvm). The | |||||
| interface is easy to use because its usage mirrors that of LIBSVM. It is | |||||
| built on the Python standard-library module "ctypes". | |||||
| Installation | |||||
| ============ | |||||
| On Unix systems, type | |||||
| > make | |||||
| The interface needs only the LIBSVM shared library, which is generated by | |||||
| the above command. We assume that the shared library is in the LIBSVM | |||||
| main directory or in the system path. | |||||
| For Windows, the shared library libsvm.dll for 32-bit Python is ready | |||||
| in the directory `..\windows'. You can also copy it to the system | |||||
| directory (e.g., `C:\WINDOWS\system32\' for Windows XP). To regenerate | |||||
| the shared library, please follow the instructions for building Windows | |||||
| binaries in the LIBSVM README. | |||||
| Quick Start | |||||
| =========== | |||||
| There are two levels of usage. The high-level one uses the utility functions | |||||
| in svmutil.py, and its usage is the same as that of the LIBSVM MATLAB interface. | |||||
| >>> from svmutil import * | |||||
| # Read data in LIBSVM format | |||||
| >>> y, x = svm_read_problem('../heart_scale') | |||||
| >>> m = svm_train(y[:200], x[:200], '-c 4') | |||||
| >>> p_label, p_acc, p_val = svm_predict(y[200:], x[200:], m) | |||||
| # Construct problem in Python format | |||||
| # Dense data | |||||
| >>> y, x = [1,-1], [[1,0,1], [-1,0,-1]] | |||||
| # Sparse data | |||||
| >>> y, x = [1,-1], [{1:1, 3:1}, {1:-1,3:-1}] | |||||
| >>> prob = svm_problem(y, x) | |||||
| >>> param = svm_parameter('-t 0 -c 4 -b 1') | |||||
| >>> m = svm_train(prob, param) | |||||
| # Precomputed kernel data (-t 4) | |||||
| # Dense data | |||||
| >>> y, x = [1,-1], [[1, 2, -2], [2, -2, 2]] | |||||
| # Sparse data | |||||
| >>> y, x = [1,-1], [{0:1, 1:2, 2:-2}, {0:2, 1:-2, 2:2}] | |||||
| # isKernel=True must be set for precomputed kernel | |||||
| >>> prob = svm_problem(y, x, isKernel=True) | |||||
| >>> param = svm_parameter('-t 4 -c 4 -b 1') | |||||
| >>> m = svm_train(prob, param) | |||||
| # For the format of precomputed kernel, please read LIBSVM README. | |||||
| # Other utility functions | |||||
| >>> svm_save_model('heart_scale.model', m) | |||||
| >>> m = svm_load_model('heart_scale.model') | |||||
| >>> p_label, p_acc, p_val = svm_predict(y, x, m, '-b 1') | |||||
| >>> ACC, MSE, SCC = evaluations(y, p_label) | |||||
| # Getting online help | |||||
| >>> help(svm_train) | |||||
| The low-level usage calls the C interface functions imported by svm.py directly. | |||||
| Note that all arguments and return values are in ctypes format, so you need to | |||||
| handle them carefully. | |||||
| >>> from svm import * | |||||
| >>> prob = svm_problem([1,-1], [{1:1, 3:1}, {1:-1,3:-1}]) | |||||
| >>> param = svm_parameter('-c 4') | |||||
| >>> m = libsvm.svm_train(prob, param) # m is a ctypes pointer to an svm_model | |||||
| # Convert a Python-format instance to svm_nodearray, a ctypes structure | |||||
| >>> x0, max_idx = gen_svm_nodearray({1:1, 3:1}) | |||||
| >>> label = libsvm.svm_predict(m, x0) | |||||
| Design Description | |||||
| ================== | |||||
| There are two files, svm.py and svmutil.py, which correspond respectively to | |||||
| low-level and high-level use of the interface. | |||||
| In svm.py, we adopt the Python built-in library "ctypes" so that | |||||
| Python can directly access the C structures and interface functions defined | |||||
| in svm.h. | |||||
| Advanced users can use the structures and functions in svm.py directly. To | |||||
| avoid handling ctypes structures, svmutil.py provides some easy-to-use | |||||
| functions whose usage is similar to that of the LIBSVM MATLAB interface. | |||||
| Data Structures | |||||
| =============== | |||||
| Four data structures derived from svm.h are svm_node, svm_problem, svm_parameter, | |||||
| and svm_model. They all contain fields with the same names as in svm.h. Access | |||||
| these fields carefully because you are working directly with a C structure rather than a | |||||
| Python object. For svm_model, accessing the fields directly is not recommended. | |||||
| Programmers should use the interface functions or the methods of the svm_model class | |||||
| in Python to get the values. The following description introduces additional | |||||
| fields and methods. | |||||
| Before using the data structures, execute the following command to load the | |||||
| LIBSVM shared library: | |||||
| >>> from svm import * | |||||
| - class svm_node: | |||||
| Construct an svm_node. | |||||
| >>> node = svm_node(idx, val) | |||||
| idx: an integer indicating the feature index. | |||||
| val: a float indicating the feature value. | |||||
| Show the index and the value of a node. | |||||
| >>> print(node) | |||||
| - Function: gen_svm_nodearray(xi [,feature_max=None [,isKernel=False]]) | |||||
| Generate a feature vector from a Python list/tuple or a dictionary: | |||||
| >>> xi, max_idx = gen_svm_nodearray({1:1, 3:1, 5:-2}) | |||||
| xi: the returned svm_nodearray (a ctypes structure) | |||||
| max_idx: the maximal feature index of xi | |||||
| feature_max: if feature_max is assigned, features with indices larger than | |||||
| feature_max are removed. | |||||
| isKernel: if isKernel == True, the list index starts from 0 for precomputed | |||||
| kernel. Otherwise, the list index starts from 1. The default | |||||
| value is False. | |||||
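The index rules above can be illustrated without ctypes. The following is a
minimal pure-Python sketch, not the code in svm.py: the helper name
to_index_value_pairs is hypothetical, and it returns plain (index, value)
tuples instead of an svm_nodearray. It assumes that zero-valued features are
left out of the sparse form unless a precomputed kernel is used.

```python
def to_index_value_pairs(xi, feature_max=None, is_kernel=False):
    """Hypothetical helper: map a Python instance to sorted (index, value) pairs."""
    if isinstance(xi, dict):
        # Dictionary keys are already the feature indices.
        pairs = list(xi.items())
    elif isinstance(xi, (list, tuple)):
        # List positions are numbered from 1, or from 0 for a precomputed kernel.
        start = 0 if is_kernel else 1
        pairs = list(enumerate(xi, start))
    else:
        raise TypeError("xi must be a dict, list, or tuple")

    if feature_max is not None:
        # Drop features whose index is larger than feature_max.
        pairs = [(i, v) for i, v in pairs if i <= feature_max]
    if not is_kernel:
        # Assumption: zero-valued features are omitted from the sparse form.
        pairs = [(i, v) for i, v in pairs if v != 0]

    pairs.sort()
    max_idx = pairs[-1][0] if pairs else 0
    return pairs, max_idx


# The dense list [1, 0, 1] and the dictionary {1: 1, 3: 1} describe the same instance.
dense_pairs, dense_max = to_index_value_pairs([1, 0, 1])
sparse_pairs, sparse_max = to_index_value_pairs({1: 1, 3: 1, 5: -2}, feature_max=3)
kernel_pairs, kernel_max = to_index_value_pairs([1, 2, -2], is_kernel=True)
```

With these inputs, the dense list and the truncated dictionary yield the same
pairs, while the precomputed-kernel list keeps index 0.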
| - class svm_problem: | |||||
| Construct an svm_problem instance | |||||
| >>> prob = svm_problem(y, x) | |||||
| y: a Python list/tuple of l labels (type must be int/double). | |||||
| x: a Python list/tuple of l data instances. Each element of x must be | |||||
| an instance of list/tuple/dictionary type. | |||||
| Note that if your x contains sparse data (i.e., dictionary), the internal | |||||
| ctypes data format is still sparse. | |||||
| For pre-computed kernel, the isKernel flag should be set to True: | |||||
| >>> prob = svm_problem(y, x, isKernel=True) | |||||
| Please read LIBSVM README for more details of pre-computed kernel. | |||||
| - class svm_parameter: | |||||
| Construct an svm_parameter instance | |||||
| >>> param = svm_parameter('training_options') | |||||
| If 'training_options' is empty, LIBSVM default values are applied. | |||||
| Set param to LIBSVM default values. | |||||
| >>> param.set_to_default_values() | |||||
| Parse a string of options. | |||||
| >>> param.parse_options('training_options') | |||||
| Show values of parameters. | |||||
| >>> print(param) | |||||
| - class svm_model: | |||||
| There are two ways to obtain an instance of svm_model: | |||||
| >>> model = svm_train(y, x) | |||||
| >>> model = svm_load_model('model_file_name') | |||||
| Note that the interface functions libsvm.svm_train and | |||||
| libsvm.svm_load_model return a ctypes pointer to an svm_model, | |||||
| which is different from the svm_model object returned | |||||
| by svm_train and svm_load_model in svmutil.py. We provide a | |||||
| function toPyModel for the conversion: | |||||
| >>> model_ptr = libsvm.svm_train(prob, param) | |||||
| >>> model = toPyModel(model_ptr) | |||||
| If you obtain a model in a way other than the above approaches, | |||||
| handle it carefully to avoid memory leaks or segmentation faults. | |||||
| Some interface functions to access LIBSVM models are wrapped as | |||||
| members of the class svm_model: | |||||
| >>> svm_type = model.get_svm_type() | |||||
| >>> nr_class = model.get_nr_class() | |||||
| >>> svr_probability = model.get_svr_probability() | |||||
| >>> class_labels = model.get_labels() | |||||
| >>> sv_indices = model.get_sv_indices() | |||||
| >>> nr_sv = model.get_nr_sv() | |||||
| >>> is_prob_model = model.is_probability_model() | |||||
| >>> support_vector_coefficients = model.get_sv_coef() | |||||
| >>> support_vectors = model.get_SV() | |||||
| Utility Functions | |||||
| ================= | |||||
| To use utility functions, type | |||||
| >>> from svmutil import * | |||||
| The above command loads | |||||
| svm_train() : train an SVM model | |||||
| svm_predict() : predict testing data | |||||
| svm_read_problem() : read the data from a LIBSVM-format file. | |||||
| svm_load_model() : load a LIBSVM model. | |||||
| svm_save_model() : save model to a file. | |||||
| evaluations() : evaluate prediction results. | |||||
| - Function: svm_train | |||||
| There are three ways to call svm_train() | |||||
| >>> model = svm_train(y, x [, 'training_options']) | |||||
| >>> model = svm_train(prob [, 'training_options']) | |||||
| >>> model = svm_train(prob, param) | |||||
| y: a list/tuple of l training labels (type must be int/double). | |||||
| x: a list/tuple of l training instances. The feature vector of | |||||
| each training instance is an instance of list/tuple or dictionary. | |||||
| training_options: a string in the same form as that for the LIBSVM | |||||
| command-line mode. | |||||
| prob: an svm_problem instance generated by calling | |||||
| svm_problem(y, x). | |||||
| For pre-computed kernel, you should use | |||||
| svm_problem(y, x, isKernel=True) | |||||
| param: an svm_parameter instance generated by calling | |||||
| svm_parameter('training_options') | |||||
| model: the returned svm_model instance. See svm.h for details of this | |||||
| structure. If '-v' is specified, cross validation is | |||||
| conducted and the returned model is just a scalar: cross-validation | |||||
| accuracy for classification and mean-squared error for regression. | |||||
| To train the same data many times with different | |||||
| parameters, the second and the third ways should be faster. | |||||
| Examples: | |||||
| >>> y, x = svm_read_problem('../heart_scale') | |||||
| >>> prob = svm_problem(y, x) | |||||
| >>> param = svm_parameter('-s 3 -c 5 -h 0') | |||||
| >>> m = svm_train(y, x, '-c 5') | |||||
| >>> m = svm_train(prob, '-t 2 -c 5') | |||||
| >>> m = svm_train(prob, param) | |||||
| >>> CV_ACC = svm_train(y, x, '-v 3') | |||||
| - Function: svm_predict | |||||
| To predict testing data with a model, use | |||||
| >>> p_labels, p_acc, p_vals = svm_predict(y, x, model [,'predicting_options']) | |||||
| y: a list/tuple of l true labels (type must be int/double). It is used | |||||
| for calculating the accuracy. Use [0]*len(x) if true labels are | |||||
| unavailable. | |||||
| x: a list/tuple of l predicting instances. The feature vector of | |||||
| each predicting instance is an instance of list/tuple or dictionary. | |||||
| predicting_options: a string of predicting options in the same format as | |||||
| that of LIBSVM. | |||||
| model: an svm_model instance. | |||||
| p_labels: a list of predicted labels | |||||
| p_acc: a tuple including accuracy (for classification), mean | |||||
| squared error, and squared correlation coefficient (for | |||||
| regression). | |||||
| p_vals: a list of decision values or probability estimates (if '-b 1' | |||||
| is specified). If k is the number of classes in training data, | |||||
| for decision values, each element includes results of predicting | |||||
| k(k-1)/2 binary-class SVMs. For classification, k = 1 is a | |||||
| special case. Decision value [+1] is returned for each testing | |||||
| instance, instead of an empty list. | |||||
| For probabilities, each element contains k values indicating | |||||
| the probability that the testing instance is in each class. | |||||
| Note that the order of classes is the same as the 'model.label' | |||||
| field in the model structure. | |||||
| Example: | |||||
| >>> m = svm_train(y, x, '-c 5') | |||||
| >>> p_labels, p_acc, p_vals = svm_predict(y, x, m) | |||||
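To make the k(k-1)/2 count of decision values concrete, the sketch below
enumerates the pairwise binary problems for a list of class labels. The helper
name pairwise_classifiers is hypothetical and the label list is a stand-in for
what model.get_labels() would return. The pair order shown (first label against
each later label, then the second label against each later label, and so on)
is our understanding of how LIBSVM orders decision values and should be treated
as an assumption.

```python
from itertools import combinations


def pairwise_classifiers(labels):
    """Hypothetical helper: list the (label_i, label_j) pairs of one-vs-one SVMs."""
    return list(combinations(labels, 2))


labels = [1, 2, 3, 4]  # stand-in for model.get_labels()
pairs = pairwise_classifiers(labels)

# One decision value per pair, so k classes give k*(k-1)/2 values per instance.
n_values = len(pairs)
```

For four classes this gives six pairs, so each element of p_vals would hold six
decision values.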
| - Functions: svm_read_problem/svm_load_model/svm_save_model | |||||
| See the usage by examples: | |||||
| >>> y, x = svm_read_problem('data.txt') | |||||
| >>> m = svm_load_model('model_file') | |||||
| >>> svm_save_model('model_file', m) | |||||
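For readers unfamiliar with the LIBSVM data format, each line of a data file
has the form "<label> <index>:<value> ...". The sketch below parses one such
line into a label and a sparse dictionary, the same shapes that
svm_read_problem returns for y and x. The helper name parse_libsvm_line is
hypothetical and this is not the code used in svmutil.py.

```python
def parse_libsvm_line(line):
    """Hypothetical helper: split '<label> <index>:<value> ...' into (label, dict)."""
    fields = line.split()
    label = float(fields[0])
    features = {}
    for item in fields[1:]:
        # Each remaining field is one sparse feature written as index:value.
        index, value = item.split(":")
        features[int(index)] = float(value)
    return label, features


label, features = parse_libsvm_line("+1 1:0.708333 3:1 5:-0.5")
```

Here label is the float 1.0 and features maps the indices 1, 3, and 5 to their
values. Indices that do not appear in the line are treated as zero.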
| - Function: evaluations | |||||
| Compute evaluation metrics from the true values (ty) and the predicted | |||||
| values (pv): | |||||
| >>> (ACC, MSE, SCC) = evaluations(ty, pv) | |||||
| ty: a list of true values. | |||||
| pv: a list of predicted values. | |||||
| ACC: accuracy. | |||||
| MSE: mean squared error. | |||||
| SCC: squared correlation coefficient. | |||||
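The three metrics can be written out in a few lines of plain Python. This is a
minimal sketch under stated assumptions, not the implementation in svmutil.py:
the function name evaluate is hypothetical, accuracy is reported as a
percentage, and SCC is set to NaN when its denominator is zero.

```python
def evaluate(ty, pv):
    """Hypothetical helper: return (accuracy in percent, MSE, squared correlation)."""
    if len(ty) != len(pv) or not ty:
        raise ValueError("ty and pv must be non-empty lists of the same length")
    n = len(ty)

    correct = sum(1 for y, v in zip(ty, pv) if y == v)
    sq_err = sum((v - y) ** 2 for y, v in zip(ty, pv))

    # Sums needed for the squared Pearson correlation coefficient.
    sum_v, sum_y = sum(pv), sum(ty)
    sum_vv = sum(v * v for v in pv)
    sum_yy = sum(y * y for y in ty)
    sum_vy = sum(v * y for y, v in zip(ty, pv))

    acc = 100.0 * correct / n
    mse = sq_err / n
    denom = (n * sum_vv - sum_v ** 2) * (n * sum_yy - sum_y ** 2)
    # Assumption: report NaN when either list has zero variance.
    scc = (n * sum_vy - sum_v * sum_y) ** 2 / denom if denom != 0 else float("nan")
    return acc, mse, scc


acc, mse, scc = evaluate([1, -1, 1, -1], [1, -1, -1, -1])
```

In this example three of the four predictions are correct, so the accuracy is
75 percent and the single error of size 2 gives a mean squared error of 1.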
| Additional Information | |||||
| ====================== | |||||
| This interface was written by Hsiang-Fu Yu from the Department of Computer | |||||
| Science, National Taiwan University. If you find this tool useful, please | |||||
| cite LIBSVM as follows: | |||||
| Chih-Chung Chang and Chih-Jen Lin, LIBSVM : a library for support | |||||
| vector machines. ACM Transactions on Intelligent Systems and | |||||
| Technology, 2:27:1--27:27, 2011. Software available at | |||||
| http://www.csie.ntu.edu.tw/~cjlin/libsvm | |||||
| For any questions, please contact Chih-Jen Lin <cjlin@csie.ntu.edu.tw>, | |||||
| or check the FAQ page: | |||||
| http://www.csie.ntu.edu.tw/~cjlin/libsvm/faq.html | |||||