
Data Mining Research - dataminingblog.com: dimensionality reduction


Friday, September 28, 2007

Introduction to feature selection (part 2)

This post continues the previous one on feature selection. I now give some examples of wrapper-based techniques.

Wrapper techniques treat the classification algorithm as a black box, so they can be combined with any search strategy, which makes them universal. The accuracy of the classification algorithm is typically used as the objective function of the search. Like any procedure that trains a classifier, wrapper feature selection faces the problem of overfitting during training; one way to reduce it is to use a k-fold cross-validation strategy when scoring candidate subsets. Filter criteria such as the correlation coefficient or mutual information are cheaper to evaluate than cross-validation, but they too become difficult to evaluate when the number of features is high.
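
To make this concrete, here is a minimal sketch (in Python with scikit-learn, which the original post does not use) of how a wrapper scores one candidate feature subset: the k-fold cross-validated accuracy of the black-box classifier is the objective function. The dataset, the classifier and the subset are placeholders chosen only for illustration.

    # Wrapper idea in miniature: a candidate subset is scored by the
    # cross-validated accuracy of the classifier itself. All concrete
    # choices below (wine data, 3-NN, the subset) are assumptions.
    from sklearn.datasets import load_wine
    from sklearn.model_selection import cross_val_score
    from sklearn.neighbors import KNeighborsClassifier

    X, y = load_wine(return_X_y=True)

    def subset_score(feature_idx, k=5):
        """k-fold CV accuracy of the black-box classifier on a feature subset."""
        clf = KNeighborsClassifier(n_neighbors=3)
        return cross_val_score(clf, X[:, feature_idx], y, cv=k).mean()

    print(subset_score([0, 6, 9]))  # score of one candidate subset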

Several search algorithms exist in the literature for selecting a subset of m features out of d. Exhaustive search is guaranteed to find the optimal feature subset, but it is not feasible unless d is small. The branch and bound procedure explores only part of all possible feature combinations, yet it is still guaranteed to find the optimal feature subset in less time than exhaustive search. Its main drawback is the monotonicity assumption it places on the feature selection objective function: adding a variable to the feature set should never decrease the value of the objective function, which is not always the case in practice.
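
As a rough illustration of exhaustive search under the same wrapper objective (again a sketch with an assumed dataset and classifier, not taken from the post), all C(d, m) subsets of size m are scored and the best one is kept. Branch and bound would prune this search using the monotonicity assumption, which is not shown here.

    # Exhaustive search: optimal but only feasible for small d, since the
    # number of candidate subsets C(d, m) grows combinatorially.
    from itertools import combinations
    from sklearn.datasets import load_wine
    from sklearn.model_selection import cross_val_score
    from sklearn.neighbors import KNeighborsClassifier

    X, y = load_wine(return_X_y=True)
    d, m = X.shape[1], 3
    clf = KNeighborsClassifier(n_neighbors=3)

    best = max(combinations(range(d), m),
               key=lambda s: cross_val_score(clf, X[:, list(s)], y, cv=5).mean())
    print("best subset of size", m, ":", best)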

Individual ranking procedures can be considered naive methods: each feature is ranked on its own, according to its predictive power. This is valid only if the features are independent, which is usually not the case in practice. Sequential Forward Selection (SFS) starts with a single feature and iteratively adds the feature that most improves the classification criterion. Sequential Backward Selection (SBS) starts with all features and iteratively removes a single feature to improve the classification accuracy. Although combinations of features are taken into account with SBS, a high number of computations is necessary since it starts from the full feature set, which may not be feasible for very high-dimensional data sets. The plus-l-take-away-r method combines the two techniques by first applying SFS for l features and then SBS for r features; although this method seems promising, values for l and r have to be given by the user. Sequential Forward Floating Search (SFFS) and Sequential Backward Floating Search (SBFS) are generalizations of these methods in which l and r are determined automatically.
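
The greedy mechanics of SFS can be sketched in a few lines (Python, with an assumed dataset, classifier and subset size; SBS would run the same loop in reverse, starting from the full set and removing features one at a time).

    # Sequential Forward Selection: start from the empty set and greedily
    # add the feature that most improves cross-validated accuracy.
    from sklearn.datasets import load_wine
    from sklearn.model_selection import cross_val_score
    from sklearn.neighbors import KNeighborsClassifier

    X, y = load_wine(return_X_y=True)
    clf = KNeighborsClassifier(n_neighbors=3)

    def cv_acc(features):
        return cross_val_score(clf, X[:, features], y, cv=5).mean()

    def sfs(n_select):
        selected, remaining = [], list(range(X.shape[1]))
        while len(selected) < n_select:
            best = max(remaining, key=lambda f: cv_acc(selected + [f]))
            selected.append(best)
            remaining.remove(best)
        return selected

    print(sfs(3))  # indices of the three features chosen greedily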

I hope these two posts have introduced you to feature selection and more specifically to wrapper-based techniques. Feel free to comment.

(1) François, D., High-dimensional data analysis: optimal metrics and feature selection, Université catholique de Louvain, Louvain-la-Neuve, Belgium, 2007.
(2) Reunanen, J., Overfitting in Making Comparisons Between Variable Selection Methods, Journal of Machine Learning Research, 2003, 3, 1371-1382.
(3) Jain, A.K., Duin, R.P.W. and Mao, J., Statistical Pattern Recognition: A Review, IEEE Transactions on Pattern Analysis and Machine Intelligence, 2000, 22, 1, 4-37.


Tuesday, September 25, 2007

Introduction to feature selection (part 1)

Feature selection is a technique used to reduce the number of features before applying a data mining algorithm. Irrelevant features may have negative effects on a prediction task. Moreover, the computational complexity of a classification algorithm may suffer from the curse of dimensionality caused by a large number of features. When a data set has too many irrelevant variables and only a few examples, overfitting is likely to occur. In addition, data are usually better characterized using fewer variables. Feature selection has been applied in fields such as multimedia database search, image classification and biometric recognition. A comprehensive introduction to feature selection can be found in the paper by Guyon et al.

Feature selection techniques can be divided into three main categories: embedded approaches (feature selection is part of the classification algorithm, e.g. a decision tree), filter approaches (features are selected before the classification algorithm is used) and wrapper approaches (the classification algorithm is used as a black box to find the best subset of attributes). By their very definition, embedded approaches are limited, since they only suit a particular classification algorithm. Moreover, a feature that is relevant in general is not necessarily relevant for a given classification algorithm. Filter methods, however, make the assumption that the feature selection process is independent of the classification step. The work by Kohavi et al. (1995) recommends replacing filter approaches with wrappers: the latter usually provide better results, the price being a higher computational complexity. Although already known in statistics and pattern recognition, wrappers are relatively new in the data mining community.
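
As a rough side-by-side sketch of the first two categories (assumed dataset and models, not from the post): a filter ranks features independently of any classifier, while an embedded method obtains feature relevance as a by-product of training a particular model, here a decision tree. A wrapper would instead search subsets with the classifier in the loop, as in the wrapper sketches earlier on this page.

    # Filter vs. embedded in miniature: the filter score (mutual information)
    # ignores the classifier; the embedded score (tree feature importances)
    # comes out of training a particular model. The dataset is a placeholder.
    from sklearn.datasets import load_wine
    from sklearn.feature_selection import mutual_info_classif
    from sklearn.tree import DecisionTreeClassifier

    X, y = load_wine(return_X_y=True)

    filter_scores = mutual_info_classif(X, y, random_state=0)
    embedded_scores = DecisionTreeClassifier(random_state=0).fit(X, y).feature_importances_

    print("filter ranking  :", filter_scores.argsort()[::-1][:5])
    print("embedded ranking:", embedded_scores.argsort()[::-1][:5])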

Selected references:

(1) Kudo, M. and Sklansky, J., Comparison of algorithms that select features for pattern classifiers, Pattern Recognition, 2000, 33, 25-41.
(2) Blum, A.L. and Langley, P., Selection of relevant features and examples in machine learning, Artificial Intelligence, 1997, 97, 245-271.
(3) Kohavi, R. and John, G., Feature Selection for Knowledge Discovery and Data Mining, The Wrapper Approach, Kluwer Academic Publishers, 1998, 33-50.


Wednesday, February 07, 2007

Feature selection

One of the most interesting and well-written papers I have read on data mining is certainly "An Introduction to Variable and Feature Selection" (Guyon and Elisseeff, 2003). It is freely available on the Journal of Machine Learning Research website. After reading this paper, you should have a good view of what feature selection is really about. Although it is not a popularization, the paper is written in a very readable way.

Feature selection may be useful for facilitating data visualization, reducing storage requirements and improving the performance of learning algorithms. The paper starts with a checklist of crucial points to consider before applying any learning algorithm to your data. Topics such as variable ranking and variable subset selection are then covered. A clear distinction is made between three different approaches to variable selection: wrappers, filters and embedded methods.
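
As an illustration of the variable ranking idea discussed in the paper (a sketch only; the dataset and the crude correlation criterion are my own assumptions), each feature is scored on its own and the features are then sorted by that score.

    # Individual variable ranking: score each feature independently, here by
    # its absolute Pearson correlation with the target, then sort. This is a
    # deliberately simple criterion used only for illustration.
    import numpy as np
    from sklearn.datasets import load_wine

    X, y = load_wine(return_X_y=True)

    scores = np.array([abs(np.corrcoef(X[:, j], y)[0, 1]) for j in range(X.shape[1])])
    print("features ranked by |correlation|:", scores.argsort()[::-1])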

The article continues with dimensionality reduction and validation techniques in the context of variable selection. Finally, examples of open problems are outlined. I have read several papers on data mining and related topics, and this is certainly the most comprehensive and readable one. In addition to the paper, and for more details about a Matlab implementation, you can have a look at this post on Will's blog.
