Feature Selection Methods (Filter, Wrapper, Embedded)

Feature selection is a crucial step in the data science process that aims to identify the most relevant and informative features from a given dataset. By selecting the right features, we can improve the performance of our models, reduce overfitting, and gain a deeper understanding of the underlying patterns in the data.

There are three main categories of feature selection methods: filter, wrapper, and embedded. Each category has its strengths and weaknesses, and the choice of the appropriate method depends on the specific problem and dataset at hand.

Filter Methods

Filter methods evaluate the relevance of features based on their intrinsic properties, without involving any specific learning algorithm. These methods measure the statistical association between each feature and the target variable and then rank the features by the resulting scores.

Some commonly used filter methods include:

  • Chi-squared test: This statistical test checks whether a categorical feature is independent of the target variable in a classification task; a high chi-squared statistic (low p-value) suggests the feature is informative.
  • Information Gain: This method calculates the reduction in entropy achieved by each feature to determine its relevance.
  • Correlation Coefficient: It quantifies the strength of the linear relationship between a feature and the target variable; features with near-zero correlation carry little linear information about the target.

One advantage of filter methods is their computational efficiency, as they don't require training a model. However, they only consider the individual predictive power of each feature and may overlook the dependencies among them.
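As a minimal sketch of the filter methods above (assuming scikit-learn and its bundled iris dataset; the choice of k=2 is arbitrary), each feature can be scored and ranked without training a predictive model:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, chi2, mutual_info_classif

X, y = load_iris(return_X_y=True)  # 4 non-negative numeric features, 3 classes

# Chi-squared test: requires non-negative features; keep the 2 highest-scoring.
selector = SelectKBest(score_func=chi2, k=2)
X_selected = selector.fit_transform(X, y)
print("chi2 scores:", np.round(selector.scores_, 2))
print("selected shape:", X_selected.shape)  # (150, 2)

# Information gain: mutual information between each feature and the target.
mi = mutual_info_classif(X, y, random_state=0)
print("mutual information:", np.round(mi, 3))

# Correlation coefficient of each feature with the (label-encoded) target.
corr = [np.corrcoef(X[:, j], y)[0, 1] for j in range(X.shape[1])]
print("correlations:", np.round(corr, 3))
```

Note that each scoring function looks at one feature at a time, which is exactly why filter methods are fast but blind to feature interactions.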

Wrapper Methods

Unlike filter methods, wrapper methods aim to find the optimal subset of features by using a specific learning algorithm to evaluate the performance of different feature subsets. These methods select features based on how well they improve the accuracy or other performance metrics of the chosen model.

Commonly used wrapper methods include:

  • Recursive Feature Elimination (RFE): This method starts with all features and successively eliminates the least significant ones based on coefficients or feature importance scores.
  • Forward Selection: It starts with an empty feature set and iteratively adds the feature that yields the greatest improvement in model performance, stopping when no significant improvement is observed.
  • Backward Elimination: This method starts with all features and successively removes the least valuable one based on statistical tests or model performance.

Wrapper methods provide a more accurate assessment of feature importance by considering the interaction and dependencies among features. However, they can be computationally expensive, especially for large datasets, as they involve repeatedly training the model.
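The wrapper methods above can be sketched with scikit-learn as follows (a minimal example assuming the iris dataset as a stand-in; the logistic regression estimator and the target of 2 features are arbitrary choices):

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import RFE, SequentialFeatureSelector
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=1000)

# Recursive Feature Elimination: fit, drop the weakest feature, repeat.
rfe = RFE(estimator=model, n_features_to_select=2)
rfe.fit(X, y)
print("RFE kept:", rfe.support_)      # boolean mask over the features
print("RFE ranking:", rfe.ranking_)   # 1 = selected, higher = dropped earlier

# Forward selection: greedily add the feature that best improves the
# cross-validated score until the requested number of features is reached.
sfs = SequentialFeatureSelector(model, n_features_to_select=2,
                                direction="forward", cv=3)
sfs.fit(X, y)
print("Forward selection kept:", sfs.get_support())
```

Backward elimination is the same call with `direction="backward"`. Because every candidate subset requires retraining (and here cross-validating) the model, the cost grows quickly with the number of features.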

Embedded Methods

Embedded methods incorporate feature selection as an integral part of the model training process. These methods learn which features to include by optimizing the model's performance during training.

Some common embedded methods are:

  • Lasso Regression: By adding an L1 regularization term to the linear regression model, Lasso regression encourages sparsity, leading to automatic feature selection.
  • Random Forest Importance: Random Forest models provide impurity-based feature importance scores computed during training; a related technique, permutation importance, instead measures the drop in model performance when a feature's values are randomly shuffled.
  • Gradient Boosting Feature Importance: Similar to Random Forest, gradient boosting models assign importance scores to each feature based on their contribution to the model's performance improvement.

Embedded methods combine the advantages of filter and wrapper methods. They consider the interactions between features while being computationally efficient. However, they may not perform as well as wrapper methods when it comes to highly correlated features.
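As a sketch of embedded selection with scikit-learn (assuming the bundled diabetes dataset; the alpha value and forest size are illustrative choices), the selection happens as a by-product of fitting the model itself:

```python
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import Lasso
from sklearn.preprocessing import StandardScaler

X, y = load_diabetes(return_X_y=True)  # 10 features, regression target
X = StandardScaler().fit_transform(X)  # Lasso is sensitive to feature scale

# Lasso: the L1 penalty shrinks coefficients and, with a large enough alpha,
# drives some of them exactly to zero, deselecting those features.
lasso = Lasso(alpha=1.0).fit(X, y)
print("Lasso kept feature indices:", np.flatnonzero(lasso.coef_))

# Random forest: impurity-based importances are computed during training.
rf = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)
print("RF importances:", np.round(rf.feature_importances_, 3))
```

In both cases no separate search over feature subsets is needed: one training run yields both the fitted model and the feature ranking.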


Conclusion

Feature selection is a critical step in any data science project. Filter methods offer computational efficiency but may overlook feature interactions. Wrapper methods account for feature dependencies but can be computationally expensive. Embedded methods strike a balance between the two, though they still have their limitations.

Ultimately, the choice of the appropriate method depends on the specific dataset and the goals of the analysis. Experimenting with different feature selection techniques can help identify the most informative and relevant features, leading to improved models and better insights.
