As explained earlier, we have to apply different types of feature scaling depending on the model:

Model | Requirement
Tree-based models (Random Forest, Gradient Boosting) | Do not require feature normalization or scaling
Neural networks | Require feature normalization; the model converges more quickly on scaled features

Selecting an appropriate feature scaling / normalization technique is an important step in pre-processing the dataset at hand.

Z-Score Normalization:

  1. It is a data pre-processing technique.
  2. It rescales features to have mean = 0 and standard deviation = 1.
  3. It is sensitive to outliers, since the mean (µ) and standard deviation (σ) are influenced by extreme values.
  4. It is achieved by centering the data around the mean and scaling it by the feature's standard deviation.
  5. It works well when the data distribution is approximately Gaussian (normal).
  6. Effect on models:
    • Beneficial for algorithms assuming normal-like distribution (e.g., linear regression, logistic regression, LDA).
    • Improves gradient descent convergence for models like neural networks.
    • Distance-based models (KNN, SVM) perform better when features are standardized.

Here we convert each normal variate X into the standard normal variate Z using the formula

Z = (X - µ) / σ

where µ is the mean of the feature and σ its standard deviation.

Function used for this normalization technique:

def z_score_standardization(series):
    return (series - series.mean()) / series.std()

Then call this function on your independent features after dropping the dependent variable.
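A minimal sketch of that call, assuming the dataset is already loaded into a pandas DataFrame named df with the dependent variable in a column named Churn (both names are assumptions; the loading step comes below):

X = df.drop(columns=["Churn"])             # independent features only
X_norm = X.apply(z_score_standardization)  # column-wise z-score scaling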

Let us download the Telco Customer Churn dataset from https://www.kaggle.com/datasets/royjafari/customer-churn

It is a freely available dataset.

You can refer to the following publications:

  • Jafari-Marandi, R., Denton, J., Idris, A., Smith, B. K., & Keramati, A. Optimum profit-driven churn decision making: innovative artificial neural networks in telecom industry. Neural Computing and Applications, 1-34.
  • Keramati, A., Jafari-Marandi, R., Aliannejadi, M., Ahmadian, I., Mozaffari, M., & Abbasi, U. (2014). Improved churn prediction in the telecommunication industry using data mining techniques. Applied Soft Computing, 24, 994-1012.
  • Keramati, A., & Ardabili, S. M. (2011). Churn analysis for an Iranian mobile operator. Telecommunications Policy, 35(4), 344-356.

Or you can download it with the following code:

import kagglehub

# Download latest version
path = kagglehub.dataset_download("royjafari/customer-churn")

print("Path to dataset files:", path)

The data contains 3150 rows (records), 13 features, and one dependent variable (Churn).

There were two more features, FP and FN, which I have removed.

Load libraries and read the data
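A minimal sketch of this step; the exact CSV file name inside the downloaded path is an assumption, so list the directory first if the read fails:

import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# "customer_churn.csv" is an assumed file name; check the contents of `path`
df = pd.read_csv(path + "/customer_churn.csv")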


View the size and data types of the features you are handling:

Check if any null values are present

See the Descriptive Statistics of your data
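The three inspection steps above can be sketched as follows, continuing with the DataFrame df:

print(df.shape)           # rows and columns, e.g. (3150, 14)
print(df.dtypes)          # data type of each feature
print(df.isnull().sum())  # count of null values per column
print(df.describe())      # descriptive statistics of the data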

Correlation heatmap:
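One way to draw the heatmap, using the df from above:

plt.figure(figsize=(12, 8))
sns.heatmap(df.corr(numeric_only=True), annot=True, fmt=".2f", cmap="coolwarm")
plt.title("Correlation heatmap")
plt.show()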

Boxplot before normalization of the features
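A sketch of the pre-normalization boxplot (the column name "Churn" is carried over from the assumptions above):

df.drop(columns=["Churn"]).boxplot(figsize=(12, 6), rot=90)
plt.title("Boxplot of features before normalization")
plt.show()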

Logistic Regression using the scikit-learn library

Predict yhat using features

Find Coefficients and intercept

Find R^2 (R-Square)

R^2 works out to 19.94% when no normalization technique is used
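A sketch covering the four steps above. The source does not show how R^2 was computed for a classifier, so using sklearn's r2_score on the class predictions is an assumption:

from sklearn.linear_model import LogisticRegression
from sklearn.metrics import r2_score

X = df.drop(columns=["Churn"])   # independent features
y = df["Churn"]                  # dependent variable

model = LogisticRegression(max_iter=1000)
model.fit(X, y)
yhat = model.predict(X)          # predicted churn labels

print("Coefficients:", model.coef_)
print("Intercept:", model.intercept_)
print("R^2:", r2_score(y, yhat))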

Arrive at Confusion Matrix

Confusion Matrix heat map using seaborn and Matplotlib
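A sketch of the confusion matrix and its heatmap, using y and yhat from above:

from sklearn.metrics import confusion_matrix

cm = confusion_matrix(y, yhat)
sns.heatmap(cm, annot=True, fmt="d", cmap="Blues")
plt.xlabel("Predicted")
plt.ylabel("Actual")
plt.title("Confusion matrix")
plt.show()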

Find Accuracy:

Find Precision and Recall

Find F1-score

Classification Report
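These metrics and the classification report can be sketched as follows (this assumes Churn is coded 0/1 with 1 as the positive class):

from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, classification_report)

print("Accuracy :", accuracy_score(y, yhat))
print("Precision:", precision_score(y, yhat))
print("Recall   :", recall_score(y, yhat))
print("F1-score :", f1_score(y, yhat))
print(classification_report(y, yhat))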

Regular denotes customers who are not churning.

Default denotes customers who may churn and leave the telephone company.

Accuracy of the model is 89.40%

Out of 3150 customers, 2655 (84.28%) appear to be regular customers who may not churn from the company.

Out of 3150 customers, 495 (15.72%) appear likely to change their mind and churn away from the company.

ROC-AUC calculation:
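ROC-AUC is computed from the predicted probabilities of the positive (churn) class rather than the hard labels; a sketch using the fitted model from above:

from sklearn.metrics import roc_auc_score

y_prob = model.predict_proba(X)[:, 1]  # probability of the churn class
print("ROC-AUC:", roc_auc_score(y, y_prob))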

Another way to calculate accuracy, precision, recall, f1-score and AUC
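One compact way is a small helper function (evaluate_model is a hypothetical name, not from the source):

def evaluate_model(y_true, y_pred, y_prob):
    # Hypothetical helper: prints all five metrics in one call
    from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                                 f1_score, roc_auc_score)
    print("Accuracy :", accuracy_score(y_true, y_pred))
    print("Precision:", precision_score(y_true, y_pred))
    print("Recall   :", recall_score(y_true, y_pred))
    print("F1-score :", f1_score(y_true, y_pred))
    print("AUC      :", roc_auc_score(y_true, y_prob))

evaluate_model(y, yhat, y_prob)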

Z-Score Normalization Technique

Define a function and normalize features
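Reusing the z_score_standardization function defined earlier, a minimal sketch:

X_norm = X.apply(z_score_standardization)  # column-wise z-score normalization
print(X_norm.mean().round(3))  # each feature's mean is now approximately 0
print(X_norm.std().round(3))   # and its standard deviation approximately 1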

Boxplot after z-score normalization

Logistic Regression on the normalized data using the statsmodels.api library
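A sketch with statsmodels, carrying over X_norm and y from above; its coefficient table yields a fitted equation like the one below:

import statsmodels.api as sm

X_const = sm.add_constant(X_norm)    # add the intercept column
logit = sm.Logit(y, X_const).fit()   # fit the logistic regression
print(logit.summary())               # coefficient table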

Yhat(Churn) = -3.5030 + 0.9551*CallFailure + 1.0746*Complains - 0.2563*SubscriptionLength
              - 0.6307*ChargeAmount + 0.4560*SecondsOfUse - 3.1738*FrequencyOfUse
              - 5.2897*FrequencyOfSMS - 0.1901*DistinctCalledNumbers + 0.0764*AgeGroup
              + 0.0617*TariffPlan + 0.6092*Status + 0.0346*Age + 4.2785*CustomerValue

Logistic Regression using the scikit-learn library

Predict churn

Find Coefficients and intercept

Find R-Square for Z-Score Normalized Data:
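The same scikit-learn steps, repeated on the normalized features:

model_norm = LogisticRegression(max_iter=1000)
model_norm.fit(X_norm, y)
yhat_norm = model_norm.predict(X_norm)

print("Coefficients:", model_norm.coef_)
print("Intercept:", model_norm.intercept_)
print("R^2:", r2_score(y, yhat_norm))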

Confusion Matrix

Confusion Matrix heatmap

Classification Report


Accuracy

Precision, Recall and F1-score

ROC-AUC

Accuracy, precision, recall, f1-score and AUC with a single program

Comparison of metrics between the original data and the normalized data

R^2 has increased when we use normalized data.