| @@ -0,0 +1,367 @@ | |||
| ---------------------------------- | |||
| --- Python interface of LIBSVM --- | |||
| ---------------------------------- | |||
| Table of Contents | |||
| ================= | |||
| - Introduction | |||
| - Installation | |||
| - Quick Start | |||
| - Design Description | |||
| - Data Structures | |||
| - Utility Functions | |||
| - Additional Information | |||
| Introduction | |||
| ============ | |||
| Python (http://www.python.org/) is a programming language suitable for rapid | |||
| development. This tool provides a simple Python interface to LIBSVM, a library | |||
| for support vector machines (http://www.csie.ntu.edu.tw/~cjlin/libsvm). The | |||
| interface is very easy to use as the usage is the same as that of LIBSVM. The | |||
| interface is developed with the built-in Python library "ctypes." | |||
| Installation | |||
| ============ | |||
| On Unix systems, type | |||
| > make | |||
| The interface needs only LIBSVM shared library, which is generated by | |||
| the above command. We assume that the shared library is on the LIBSVM | |||
| main directory or in the system path. | |||
| For windows, the shared library libsvm.dll for 32-bit python is ready | |||
| in the directory `..\windows'. You can also copy it to the system | |||
| directory (e.g., `C:\WINDOWS\system32\' for Windows XP). To regenerate | |||
| the shared library, please follow the instruction of building windows | |||
| binaries in LIBSVM README. | |||
| Quick Start | |||
| =========== | |||
| There are two levels of usage. The high-level one uses utility functions | |||
| in svmutil.py and the usage is the same as the LIBSVM MATLAB interface. | |||
| >>> from svmutil import * | |||
| # Read data in LIBSVM format | |||
| >>> y, x = svm_read_problem('../heart_scale') | |||
| >>> m = svm_train(y[:200], x[:200], '-c 4') | |||
| >>> p_label, p_acc, p_val = svm_predict(y[200:], x[200:], m) | |||
| # Construct problem in python format | |||
| # Dense data | |||
| >>> y, x = [1,-1], [[1,0,1], [-1,0,-1]] | |||
| # Sparse data | |||
| >>> y, x = [1,-1], [{1:1, 3:1}, {1:-1,3:-1}] | |||
| >>> prob = svm_problem(y, x) | |||
| >>> param = svm_parameter('-t 0 -c 4 -b 1') | |||
| >>> m = svm_train(prob, param) | |||
| # Precomputed kernel data (-t 4) | |||
| # Dense data | |||
| >>> y, x = [1,-1], [[1, 2, -2], [2, -2, 2]] | |||
| # Sparse data | |||
| >>> y, x = [1,-1], [{0:1, 1:2, 2:-2}, {0:2, 1:-2, 2:2}] | |||
| # isKernel=True must be set for precomputed kernel | |||
| >>> prob = svm_problem(y, x, isKernel=True) | |||
| >>> param = svm_parameter('-t 4 -c 4 -b 1') | |||
| >>> m = svm_train(prob, param) | |||
| # For the format of precomputed kernel, please read LIBSVM README. | |||
| # Other utility functions | |||
| >>> svm_save_model('heart_scale.model', m) | |||
| >>> m = svm_load_model('heart_scale.model') | |||
| >>> p_label, p_acc, p_val = svm_predict(y, x, m, '-b 1') | |||
| >>> ACC, MSE, SCC = evaluations(y, p_label) | |||
| # Getting online help | |||
| >>> help(svm_train) | |||
| The low-level use directly calls C interfaces imported by svm.py. Note that | |||
| all arguments and return values are in ctypes format. You need to handle them | |||
| carefully. | |||
| >>> from svm import * | |||
| >>> prob = svm_problem([1,-1], [{1:1, 3:1}, {1:-1,3:-1}]) | |||
| >>> param = svm_parameter('-c 4') | |||
| >>> m = libsvm.svm_train(prob, param) # m is a ctype pointer to an svm_model | |||
| # Convert a Python-format instance to svm_nodearray, a ctypes structure | |||
| >>> x0, max_idx = gen_svm_nodearray({1:1, 3:1}) | |||
| >>> label = libsvm.svm_predict(m, x0) | |||
| Design Description | |||
| ================== | |||
| There are two files svm.py and svmutil.py, which respectively correspond to | |||
| low-level and high-level use of the interface. | |||
| In svm.py, we adopt the Python built-in library "ctypes," so that | |||
| Python can directly access C structures and interface functions defined | |||
| in svm.h. | |||
| While advanced users can use structures/functions in svm.py, to | |||
| avoid handling ctypes structures, in svmutil.py we provide some easy-to-use | |||
| functions. The usage is similar to LIBSVM MATLAB interface. | |||
| Data Structures | |||
| =============== | |||
| Four data structures derived from svm.h are svm_node, svm_problem, svm_parameter, | |||
| and svm_model. They all contain fields with the same names in svm.h. Access | |||
| these fields carefully because you directly use a C structure instead of a | |||
| Python object. For svm_model, accessing the field directly is not recommanded. | |||
| Programmers should use the interface functions or methods of svm_model class | |||
| in Python to get the values. The following description introduces additional | |||
| fields and methods. | |||
| Before using the data structures, execute the following command to load the | |||
| LIBSVM shared library: | |||
| >>> from svm import * | |||
| - class svm_node: | |||
| Construct an svm_node. | |||
| >>> node = svm_node(idx, val) | |||
| idx: an integer indicates the feature index. | |||
| val: a float indicates the feature value. | |||
| Show the index and the value of a node. | |||
| >>> print(node) | |||
| - Function: gen_svm_nodearray(xi [,feature_max=None [,isKernel=False]]) | |||
| Generate a feature vector from a Python list/tuple or a dictionary: | |||
| >>> xi, max_idx = gen_svm_nodearray({1:1, 3:1, 5:-2}) | |||
| xi: the returned svm_nodearray (a ctypes structure) | |||
| max_idx: the maximal feature index of xi | |||
| feature_max: if feature_max is assigned, features with indices larger than | |||
| feature_max are removed. | |||
| isKernel: if isKernel == True, the list index starts from 0 for precomputed | |||
| kernel. Otherwise, the list index starts from 1. The default | |||
| value is False. | |||
| - class svm_problem: | |||
| Construct an svm_problem instance | |||
| >>> prob = svm_problem(y, x) | |||
| y: a Python list/tuple of l labels (type must be int/double). | |||
| x: a Python list/tuple of l data instances. Each element of x must be | |||
| an instance of list/tuple/dictionary type. | |||
| Note that if your x contains sparse data (i.e., dictionary), the internal | |||
| ctypes data format is still sparse. | |||
| For pre-computed kernel, the isKernel flag should be set to True: | |||
| >>> prob = svm_problem(y, x, isKernel=True) | |||
| Please read LIBSVM README for more details of pre-computed kernel. | |||
| - class svm_parameter: | |||
| Construct an svm_parameter instance | |||
| >>> param = svm_parameter('training_options') | |||
| If 'training_options' is empty, LIBSVM default values are applied. | |||
| Set param to LIBSVM default values. | |||
| >>> param.set_to_default_values() | |||
| Parse a string of options. | |||
| >>> param.parse_options('training_options') | |||
| Show values of parameters. | |||
| >>> print(param) | |||
| - class svm_model: | |||
| There are two ways to obtain an instance of svm_model: | |||
| >>> model = svm_train(y, x) | |||
| >>> model = svm_load_model('model_file_name') | |||
| Note that the returned structure of interface functions | |||
| libsvm.svm_train and libsvm.svm_load_model is a ctypes pointer of | |||
| svm_model, which is different from the svm_model object returned | |||
| by svm_train and svm_load_model in svmutil.py. We provide a | |||
| function toPyModel for the conversion: | |||
| >>> model_ptr = libsvm.svm_train(prob, param) | |||
| >>> model = toPyModel(model_ptr) | |||
| If you obtain a model in a way other than the above approaches, | |||
| handle it carefully to avoid memory leak or segmentation fault. | |||
| Some interface functions to access LIBSVM models are wrapped as | |||
| members of the class svm_model: | |||
| >>> svm_type = model.get_svm_type() | |||
| >>> nr_class = model.get_nr_class() | |||
| >>> svr_probability = model.get_svr_probability() | |||
| >>> class_labels = model.get_labels() | |||
| >>> sv_indices = model.get_sv_indices() | |||
| >>> nr_sv = model.get_nr_sv() | |||
| >>> is_prob_model = model.is_probability_model() | |||
| >>> support_vector_coefficients = model.get_sv_coef() | |||
| >>> support_vectors = model.get_SV() | |||
| Utility Functions | |||
| ================= | |||
| To use utility functions, type | |||
| >>> from svmutil import * | |||
| The above command loads | |||
| svm_train() : train an SVM model | |||
| svm_predict() : predict testing data | |||
| svm_read_problem() : read the data from a LIBSVM-format file. | |||
| svm_load_model() : load a LIBSVM model. | |||
| svm_save_model() : save model to a file. | |||
| evaluations() : evaluate prediction results. | |||
| - Function: svm_train | |||
| There are three ways to call svm_train() | |||
| >>> model = svm_train(y, x [, 'training_options']) | |||
| >>> model = svm_train(prob [, 'training_options']) | |||
| >>> model = svm_train(prob, param) | |||
| y: a list/tuple of l training labels (type must be int/double). | |||
| x: a list/tuple of l training instances. The feature vector of | |||
| each training instance is an instance of list/tuple or dictionary. | |||
| training_options: a string in the same form as that for LIBSVM command | |||
| mode. | |||
| prob: an svm_problem instance generated by calling | |||
| svm_problem(y, x). | |||
| For pre-computed kernel, you should use | |||
| svm_problem(y, x, isKernel=True) | |||
| param: an svm_parameter instance generated by calling | |||
| svm_parameter('training_options') | |||
| model: the returned svm_model instance. See svm.h for details of this | |||
| structure. If '-v' is specified, cross validation is | |||
| conducted and the returned model is just a scalar: cross-validation | |||
| accuracy for classification and mean-squared error for regression. | |||
| To train the same data many times with different | |||
| parameters, the second and the third ways should be faster.. | |||
| Examples: | |||
| >>> y, x = svm_read_problem('../heart_scale') | |||
| >>> prob = svm_problem(y, x) | |||
| >>> param = svm_parameter('-s 3 -c 5 -h 0') | |||
| >>> m = svm_train(y, x, '-c 5') | |||
| >>> m = svm_train(prob, '-t 2 -c 5') | |||
| >>> m = svm_train(prob, param) | |||
| >>> CV_ACC = svm_train(y, x, '-v 3') | |||
| - Function: svm_predict | |||
| To predict testing data with a model, use | |||
| >>> p_labs, p_acc, p_vals = svm_predict(y, x, model [,'predicting_options']) | |||
| y: a list/tuple of l true labels (type must be int/double). It is used | |||
| for calculating the accuracy. Use [0]*len(x) if true labels are | |||
| unavailable. | |||
| x: a list/tuple of l predicting instances. The feature vector of | |||
| each predicting instance is an instance of list/tuple or dictionary. | |||
| predicting_options: a string of predicting options in the same format as | |||
| that of LIBSVM. | |||
| model: an svm_model instance. | |||
| p_labels: a list of predicted labels | |||
| p_acc: a tuple including accuracy (for classification), mean | |||
| squared error, and squared correlation coefficient (for | |||
| regression). | |||
| p_vals: a list of decision values or probability estimates (if '-b 1' | |||
| is specified). If k is the number of classes in training data, | |||
| for decision values, each element includes results of predicting | |||
| k(k-1)/2 binary-class SVMs. For classification, k = 1 is a | |||
| special case. Decision value [+1] is returned for each testing | |||
| instance, instead of an empty list. | |||
| For probabilities, each element contains k values indicating | |||
| the probability that the testing instance is in each class. | |||
| Note that the order of classes is the same as the 'model.label' | |||
| field in the model structure. | |||
| Example: | |||
| >>> m = svm_train(y, x, '-c 5') | |||
| >>> p_labels, p_acc, p_vals = svm_predict(y, x, m) | |||
| - Functions: svm_read_problem/svm_load_model/svm_save_model | |||
| See the usage by examples: | |||
| >>> y, x = svm_read_problem('data.txt') | |||
| >>> m = svm_load_model('model_file') | |||
| >>> svm_save_model('model_file', m) | |||
| - Function: evaluations | |||
| Calculate some evaluations using the true values (ty) and predicted | |||
| values (pv): | |||
| >>> (ACC, MSE, SCC) = evaluations(ty, pv) | |||
| ty: a list of true values. | |||
| pv: a list of predict values. | |||
| ACC: accuracy. | |||
| MSE: mean squared error. | |||
| SCC: squared correlation coefficient. | |||
| Additional Information | |||
| ====================== | |||
| This interface was written by Hsiang-Fu Yu from Department of Computer | |||
| Science, National Taiwan University. If you find this tool useful, please | |||
| cite LIBSVM as follows | |||
| Chih-Chung Chang and Chih-Jen Lin, LIBSVM : a library for support | |||
| vector machines. ACM Transactions on Intelligent Systems and | |||
| Technology, 2:27:1--27:27, 2011. Software available at | |||
| http://www.csie.ntu.edu.tw/~cjlin/libsvm | |||
| For any question, please contact Chih-Jen Lin <cjlin@csie.ntu.edu.tw>, | |||
| or check the FAQ page: | |||
| http://www.csie.ntu.edu.tw/~cjlin/libsvm/faq.html | |||