Univariate selection is a feature selection technique that identifies the most important features in a dataset by scoring each feature individually with a univariate statistical test. You can view it as a preprocessing step to an estimator. Scikit-learn exposes its feature selection routines as objects that implement the transform method:

  • SelectKBest removes all but the k highest scoring features
  • SelectPercentile removes all but a user-specified highest scoring percentage of features
  • SelectFpr, SelectFdr, and SelectFwe apply a common univariate statistical test to each feature and select by false positive rate, false discovery rate, or family-wise error, respectively
  • GenericUnivariateSelect performs univariate feature selection with a configurable strategy, which makes it possible to choose the best selection strategy with a hyper-parameter search estimator
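
As a rough illustration of the selectors listed above, the sketch below applies several of them to the same data. The synthetic dataset from make_classification and the parameter values (k=5, percentile=25, alpha=0.05) are assumptions chosen for the example, not values from this walkthrough.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import (
    GenericUnivariateSelect,
    SelectFpr,
    SelectKBest,
    SelectPercentile,
    f_classif,
)

# Synthetic data (an assumption for illustration): 200 samples, 20 features.
X, y = make_classification(
    n_samples=200, n_features=20, n_informative=5, random_state=0
)

# Keep the 5 highest scoring features.
X_kbest = SelectKBest(score_func=f_classif, k=5).fit_transform(X, y)

# Keep the highest scoring 25% of the features.
X_pct = SelectPercentile(score_func=f_classif, percentile=25).fit_transform(X, y)

# Keep the features whose p-value falls below alpha; the count depends on the data.
X_fpr = SelectFpr(score_func=f_classif, alpha=0.05).fit_transform(X, y)

# Same selection as SelectKBest, but the strategy is an ordinary parameter,
# so it can be tuned by a hyper-parameter search.
X_generic = GenericUnivariateSelect(
    score_func=f_classif, mode="k_best", param=5
).fit_transform(X, y)

print(X_kbest.shape, X_pct.shape, X_fpr.shape, X_generic.shape)
print(np.array_equal(X_kbest, X_generic))
```

Because GenericUnivariateSelect takes the strategy as its mode parameter, a grid search can compare strategies such as "k_best" and "percentile" without swapping the estimator class.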

These objects take as input a scoring function that returns univariate scores and p-values (or only scores for SelectKBest and SelectPercentile).
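
To make the difference between the two kinds of scoring function concrete, here is a small sketch. It assumes synthetic data and uses f_classif (returns scores and p-values) and mutual_info_classif (returns scores only) as examples; the walkthrough does not say which scoring function was actually used.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif, mutual_info_classif

# Synthetic data (an assumption for illustration).
X, y = make_classification(
    n_samples=200, n_features=10, n_informative=3, random_state=0
)

# f_classif returns a pair: one F-score and one p-value per feature.
scores, pvalues = f_classif(X, y)
print(scores.shape, pvalues.shape)

# mutual_info_classif returns scores only, so it works with
# SelectKBest and SelectPercentile but provides no p-values.
mi_scores = mutual_info_classif(X, y, random_state=0)
print(mi_scores.shape)

selector = SelectKBest(score_func=mutual_info_classif, k=3).fit(X, y)
print(selector.pvalues_)  # None, because the scoring function gave no p-values
```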

Load libraries

Read Data

Find target variable

Target variable contains two classes: 1 Malignant and 2 Benign

Convert categorical values into numerical values using LabelEncoder
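
A minimal sketch of this step is shown here. The column name "diagnosis" and the string labels "M" and "B" are hypothetical stand-ins, since the original data frame is not shown.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical stand-in for the target column of the real dataset.
df = pd.DataFrame({"diagnosis": ["M", "B", "B", "M", "B"]})

# LabelEncoder maps each distinct label to an integer,
# assigning the integers in sorted order of the labels.
encoder = LabelEncoder()
df["diagnosis"] = encoder.fit_transform(df["diagnosis"])

print(list(encoder.classes_))    # ['B', 'M']
print(df["diagnosis"].tolist())  # [1, 0, 0, 1, 0]
```

Note that the integers follow the sorted label order, so in this sketch "B" becomes 0 and "M" becomes 1; encoder.inverse_transform recovers the original labels.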

Separate the features and the target variable:

Univariate feature selection using the sklearn.feature_selection module
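
The selection step might look roughly like the sketch below. Two assumptions apply: scikit-learn's bundled breast cancer dataset (30 feature columns) stands in for the file read earlier, and f_classif is used as the scoring function, which the walkthrough does not specify.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import SelectKBest, f_classif

# Assumption: the bundled dataset stands in for the original data file.
data = load_breast_cancer(as_frame=True)
X, y = data.data, data.target

# Score every feature against the target and keep the 5 best.
selector = SelectKBest(score_func=f_classif, k=5)
X_selected = selector.fit_transform(X, y)

# get_support() returns a boolean mask over the original columns.
selected_columns = X.columns[selector.get_support()]

print(X.shape, "->", X_selected.shape)
print(list(selected_columns))
```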

Out of 31 features, the selector has kept the best k (5) features

Accuracy of Logistic Regression Model after reduction
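
One way to measure this is sketched below, again under assumptions: the bundled breast cancer dataset replaces the original file, and the 70/30 split, the scaling step, and f_classif are illustrative choices. The accuracy it prints depends on those choices and will not necessarily match the figure reported in this walkthrough.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Assumption: the bundled dataset stands in for the original data file.
X, y = load_breast_cancer(return_X_y=True)

# Hold out 30% of the samples for evaluation (an illustrative split).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

# Putting the selector inside the pipeline means the 5 features are
# chosen from the training data only, which avoids leaking test data.
model = make_pipeline(
    StandardScaler(),
    SelectKBest(score_func=f_classif, k=5),
    LogisticRegression(max_iter=1000),
)
model.fit(X_train, y_train)

accuracy = model.score(X_test, y_test)
print(f"Accuracy on the held-out set: {accuracy:.3f}")
```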

Accuracy of the model works out to 91.9%