This technique measures the relationship between each feature and the target variable individually. Features with a high absolute correlation with the target variable are considered relevant for further analysis, so the method identifies the features that have a strong relationship with the target. It also takes multicollinearity among the features into account to avoid redundancy. For example, a house price (target variable) may be highly correlated with features such as square footage and number of rooms. If such features are also strongly correlated with each other, keeping all of them adds little information, so the redundant ones are removed rather than included.

Pearson Correlation: Used to measure the linear relationship between a continuous feature and the target variable.

Spearman Rank Correlation: Used when features are ordinal or when the relationship is monotonic but not linear.
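To make the difference between the two coefficients concrete, here is a minimal sketch on made-up data (the columns `x` and `y` are illustrative and not part of the breast cancer dataset). It assumes pandas is available and uses `DataFrame.corr` with the `method` argument.

```python
import numpy as np
import pandas as pd

# Synthetic data for illustration: y always increases with x, but not along a straight line.
toy = pd.DataFrame({"x": np.arange(1, 11, dtype=float)})
toy["y"] = toy["x"] ** 3

# Pearson measures how well a straight line describes the relationship.
pearson = toy.corr(method="pearson").loc["x", "y"]

# Spearman applies the same idea to the ranks, so it measures monotonic association.
spearman = toy.corr(method="spearman").loc["x", "y"]

print(f"Pearson:  {pearson:.3f}")
print(f"Spearman: {spearman:.3f}")
```

Because the ranks of `x` and `y` agree perfectly, the Spearman coefficient reaches 1, while the Pearson coefficient stays below 1 because the relationship is curved.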

Calculate the correlation coefficient between each feature and the target variable individually. Compare the absolute value of each correlation coefficient to a pre-defined threshold. Select the features whose absolute correlation coefficient is above the threshold.
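The three steps above can be sketched as a small helper. The function name `select_by_correlation` and the toy columns `a`, `b` and `c` are hypothetical and chosen only for illustration. The sketch assumes Pearson correlation via pandas `DataFrame.corrwith`.

```python
import pandas as pd


def select_by_correlation(features: pd.DataFrame, target: pd.Series,
                          threshold: float = 0.5) -> list:
    """Return the columns whose absolute Pearson correlation with target exceeds threshold."""
    # Step 1: one correlation coefficient per feature column.
    correlations = features.corrwith(target)
    # Step 2: compare the absolute values with the threshold.
    is_strong = correlations.abs() > threshold
    # Step 3: keep only the feature names that pass.
    return correlations[is_strong].index.tolist()


# Tiny made-up data set, for illustration only.
features = pd.DataFrame({
    "a": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],      # rises with the target
    "b": [6.0, 5.0, 4.0, 3.0, 2.0, 1.0],      # falls with the target (strong negative)
    "c": [1.0, -1.0, 1.0, -1.0, 1.0, -1.0],   # only weakly related to the target
})
target = pd.Series([0, 0, 0, 1, 1, 1])

selected = select_by_correlation(features, target, threshold=0.5)
print(selected)
```

Using the absolute value matters here: column `b` is strongly but negatively correlated with the target and would be dropped if the raw coefficient were compared with the threshold.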

Load Libraries

Let us use the Breast Cancer Wisconsin (Diagnostic) Data Set available on Kaggle and reduce the number of features. This dataset contains one target (dependent) variable and 31 features (independent/explanatory variables).

Read Data

Unique elements of Diagnosis

Malignant (M) and Benign (B)

Convert categorical into numeric using Label encoder

Once the data is loaded, preprocess it with sklearn by importing LabelEncoder. Convert the categorical diagnosis column into a numeric column (1 or 0) using the fit_transform function.
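A minimal sketch of this encoding step follows. It uses a small stand-in data frame rather than the real CSV, so the five rows shown are made up. Only the column name `diagnosis` and the labels M and B come from the dataset.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Stand-in for the diagnosis column; in the tutorial this comes from the Kaggle CSV.
df = pd.DataFrame({"diagnosis": ["M", "B", "B", "M", "B"]})

encoder = LabelEncoder()
df["diagnosis"] = encoder.fit_transform(df["diagnosis"])

# LabelEncoder sorts the classes, so B is mapped to 0 and M to 1.
mapping = {label: int(code)
           for label, code in zip(encoder.classes_, encoder.transform(encoder.classes_))}
print(mapping)
print(df["diagnosis"].tolist())
```

With this mapping a positive correlation between a feature and the encoded target means that larger feature values go together with malignant cases.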

Shape

The data contains 569 records and 32 variables: 31 features and 1 target.

Separate features and target

Convert selected features into a data frame and display

Convert the numeric values into float using numpy
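One way to do this conversion is `DataFrame.astype` with a NumPy float type. The sketch below uses two column names from the dataset (`radius_mean`, `texture_mean`) with made-up integer values, purely to show the change of dtype.

```python
import numpy as np
import pandas as pd

# Made-up integer values; only the column names are taken from the dataset.
X = pd.DataFrame({"radius_mean": [17, 20, 11], "texture_mean": [10, 17, 21]})
print(X.dtypes)

# Cast every column to 64-bit floats so later numeric routines see a uniform type.
X = X.astype(np.float64)
print(X.dtypes)
```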

Find correlation between target and individual features and compare it with threshold

Selected features out of 31 features:

Out of the original 31 features, 15 have been selected based on the correlation metric. Features whose absolute correlation with the target is greater than the threshold of 0.5 have been selected for further analysis.

After reducing the features, let us find the accuracy of a Logistic Regression model trained on the selected features.

Accuracy: using Logistic Regression
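The sketch below shows one way such an accuracy figure could be obtained. As an assumption, it uses the copy of the dataset bundled with scikit-learn (`load_breast_cancer`) instead of the Kaggle CSV. That copy has 30 features and encodes the target as 0 = malignant, 1 = benign, so the selected features and the accuracy need not match the values reported in this article. The 80/20 split, the scaling step and `random_state=42` are also illustrative choices.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Assumption: scikit-learn's bundled copy of the data, not the Kaggle file.
data = load_breast_cancer(as_frame=True)
X_all, y = data.data, data.target

# Keep the features whose absolute correlation with the target exceeds the threshold.
threshold = 0.5
correlations = X_all.corrwith(y)
selected = correlations[correlations.abs() > threshold].index.tolist()
X = X_all[selected]

# Hold out 20% of the records for testing.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Scale the features, then fit a logistic regression classifier.
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
model.fit(X_train, y_train)

accuracy = accuracy_score(y_test, model.predict(X_test))
print(f"{len(selected)} features selected, test accuracy: {accuracy:.3f}")
```

A stricter workflow would compute the correlations on the training split only, so that the test records play no part in choosing the features.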

Correlated Features – heatmap

Heatmap using seaborn
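A correlation heatmap of this kind can be drawn with `seaborn.heatmap` on the output of `DataFrame.corr`. The sketch below uses three synthetic columns (`f1`, `f2`, `f3`, hypothetical names) instead of the selected features, and writes the figure to a file named `correlation_heatmap.png`, which is also an arbitrary choice.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so the script also runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Synthetic features: f2 is built from f1, f3 is independent noise.
rng = np.random.default_rng(0)
base = rng.normal(size=200)
df = pd.DataFrame({
    "f1": base,
    "f2": base + rng.normal(scale=0.3, size=200),
    "f3": rng.normal(size=200),
})

# Pairwise Pearson correlations between all columns.
corr = df.corr()

fig, ax = plt.subplots(figsize=(5, 4))
sns.heatmap(corr, annot=True, fmt=".2f", cmap="coolwarm", vmin=-1, vmax=1, ax=ax)
ax.set_title("Feature correlation matrix")
fig.tight_layout()
fig.savefig("correlation_heatmap.png")
```

Cells far from zero off the diagonal, such as the `f1`/`f2` pair here, point to features that carry largely the same information.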

Correlation for 15 features

Ordinary Least Squares Method (OLS)

OLS Summary