Scaling To Shape:

Under this, we have four techniques for transforming skewed feature data:

  1. Log Transformation                 log(X)
  2. Square Root Transformation         sqrt(X)
  3. Square Transformation              X^2
  4. Exponential Transformation         X^0.5 (X raised to a chosen exponent, 0.5 by default)
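As a quick reference, each of the four techniques can be expressed as a single NumPy call. This is a minimal sketch assuming NumPy; the `TRANSFORMS` dictionary is a hypothetical name, and the exponent 0.5 for the exponential transformation follows the default used in this document.

```python
import numpy as np

# One NumPy function per technique (illustrative mapping, not a library API)
TRANSFORMS = {
    "log": np.log,                              # log(X), requires X > 0
    "square_root": np.sqrt,                     # sqrt(X), requires X >= 0
    "square": np.square,                        # X^2
    "exponential": lambda x: np.power(x, 0.5),  # X^0.5 via np.power
}

x = np.array([1.0, 4.0, 9.0])
for name, fn in TRANSFORMS.items():
    print(name, fn(x))
```

With an exponent of 0.5, the exponential transformation gives the same result as the square root.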

1. Log Transformation:

  1. It is used to reduce the skewness of the data
  2. It compresses data with a long right tail (e.g., exponential growth)
  3. It is most effective on right-skewed (positively skewed) data

Usage:

  1. Useful for data with exponential growth
  2. Useful for data whose values vary greatly in magnitude
  3. Left untransformed, both kinds of data can hurt the performance of the model

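Formula

$Y = \log_b(X)$

Where:

  • Y is the transformed value
  • X is the original value (must be greater than 0, since the logarithm is undefined for zero and negative values)
  • b is the base (commonly e, 10, or 2)

Applicable to positively skewed data.

Load libraries and read data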
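With NumPy loaded, the formula maps directly onto code. This minimal sketch assumes NumPy; the helper name `log_transform` is hypothetical, and it uses the change-of-base rule log_b(X) = ln(X) / ln(b) so that any base can be passed.

```python
import numpy as np

def log_transform(x, base=np.e):
    """Return log_b(x) using the change-of-base rule log_b(x) = ln(x) / ln(b)."""
    x = np.asarray(x, dtype=float)
    if np.any(x <= 0):
        # The logarithm is undefined for zero and negative values
        raise ValueError("log transformation requires strictly positive values")
    return np.log(x) / np.log(base)

print(log_transform([1, 10, 100], base=10))  # base-10 logarithm
print(log_transform([1, 2, 8], base=2))      # base-2 logarithm
```

If a feature contains zeros, `np.log1p`, which computes log(1 + X), is a common alternative to the plain logarithm.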

HAPPINESS (refer to the Happiness data shown in Skewness Calculation)

Descriptive Statistics
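The descriptive statistics step can be sketched with pandas. The real Happiness data is not reproduced here, so this sketch assumes a hypothetical `Happiness` column filled with synthetic log-normal (right-skewed) values. `describe()` returns the summary table and `skew()` returns the sample skewness, where a positive value indicates right skew.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the Happiness data: synthetic, right-skewed
# values drawn from a log-normal distribution (not the real dataset)
rng = np.random.default_rng(seed=42)
df = pd.DataFrame({"Happiness": rng.lognormal(mean=0.0, sigma=1.0, size=1_000)})

stats = df["Happiness"].describe()  # count, mean, std, min, quartiles, max
skewness = df["Happiness"].skew()   # > 0 indicates right (positive) skew

print(stats)
print("skewness:", round(skewness, 3))
```

For right-skewed data the mean typically exceeds the median (the `50%` row), which is a quick sanity check alongside the skewness value.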

Draw Histogram Plot

Histogram using Original Data: Right Skewed

Transform using Log Transformation Technique
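The transformation step itself is one NumPy call. A minimal sketch, again on hypothetical synthetic log-normal data rather than the real Happiness values: because the log of log-normal data is normally distributed, the skewness should drop to roughly zero after `np.log`.

```python
import numpy as np
import pandas as pd

# Hypothetical right-skewed feature (synthetic log-normal data)
rng = np.random.default_rng(seed=0)
happiness = pd.Series(rng.lognormal(mean=0.0, sigma=1.0, size=10_000), name="Happiness")

# Log transformation: Y = ln(X); every value is positive, so np.log is safe
log_happiness = np.log(happiness)

print("skew before:", round(happiness.skew(), 3))
print("skew after: ", round(log_happiness.skew(), 3))
```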

Draw Histogram using transformed data

Compare Original Histogram and Transformed Histogram

Before normalization: Right Skewed          After normalization: approximately normal
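A side-by-side comparison of this kind can be produced with Matplotlib. This sketch assumes Matplotlib and NumPy are installed, uses synthetic log-normal data in place of the Happiness feature, and selects the non-interactive `Agg` backend so it runs without a display; the titles and bin count are arbitrary choices.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical right-skewed data standing in for the Happiness feature
rng = np.random.default_rng(seed=0)
original = rng.lognormal(mean=0.0, sigma=1.0, size=10_000)
transformed = np.log(original)

fig, (ax_before, ax_after) = plt.subplots(1, 2, figsize=(10, 4))
ax_before.hist(original, bins=50)
ax_before.set_title("Before normalization (right skewed)")
ax_after.hist(transformed, bins=50)
ax_after.set_title("After log transformation")
fig.tight_layout()
# fig.savefig("log_comparison.png")  # or plt.show() in an interactive session
```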

2. Square Root Transformation

Using the same Happiness data

Define a square root function and call it
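A minimal sketch of the square root function, assuming pandas and NumPy. The function name `squareroot_transformation` and the non-negativity check are illustrative additions, and the synthetic exponential data stands in for the Happiness feature. The square root compresses large values less aggressively than the log, so it suits moderately right-skewed data.

```python
import numpy as np
import pandas as pd

def squareroot_transformation(series):
    """Return the square root of each value (hypothetical helper)."""
    if (series < 0).any():
        raise ValueError("square root transformation requires non-negative values")
    return np.sqrt(series)

# Hypothetical right-skewed feature: synthetic exponential data
# (the exponential distribution has a population skewness of 2)
rng = np.random.default_rng(seed=1)
happiness = pd.Series(rng.exponential(scale=1.0, size=10_000))

sqrt_happiness = squareroot_transformation(happiness)
print("skew before:", round(happiness.skew(), 3))
print("skew after: ", round(sqrt_happiness.skew(), 3))
```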

Draw Histogram Plot

Before normalization: Right Skewed          After normalization: approximately normal

3. Square Transformation: This is meant for left-skewed data. It is not applicable to right-skewed data.

Left Skewed Data (negatively skewed data)

This is used to reduce left skewness when data is negatively skewed

Apply it when you want to approach a more symmetrical distribution

By squaring each value, it can help normalize distributions that are mildly skewed to the left

Formula: $Y = X^2$

Define the function, then call it later

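import numpy as np

def Squared_transformation(series):
    # Square every value to reduce left (negative) skewness
    return np.square(series)

squaredtransformedgpa = Squared_transformation(df['GPA'])  # df holds the GPA data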
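The function above expects a `df` with a `GPA` column. A self-contained version is sketched below, assuming pandas and NumPy; the left-skewed GPA values are synthetic (4 × Beta(5, 1), on a 0 to 4 scale), not the real GPA data, and comparing `skew()` before and after is an illustrative check.

```python
import numpy as np
import pandas as pd

def Squared_transformation(series):
    # Squaring stretches the upper range more than the lower range,
    # which shortens a long left tail
    return np.square(series)

# Hypothetical GPA column: synthetic left-skewed values on a 0-4 scale
rng = np.random.default_rng(seed=7)
df = pd.DataFrame({"GPA": 4 * rng.beta(5, 1, size=10_000)})

squaredtransformedgpa = Squared_transformation(df["GPA"])
print("skew before:", round(df["GPA"].skew(), 3))  # negative: left skewed
print("skew after: ", round(squaredtransformedgpa.skew(), 3))
```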

Load libraries and read GPA data

Descriptive Statistics

Draw Histogram plot for the original data

Left tail is long

Normalize using Square Transformation Technique

Draw Histogram Plot using Normalized data

4. Exponential Transformation

a) Applies an exponential (power) function to each element in the feature: each value is raised to a chosen exponent

  1. If the relationship between a feature and the target is exponential in nature, applying an exponential transformation can linearize the data.
  2. It is useful for linear regression models, which assume that the relationship between variables is linear.
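To illustrate point 1, the sketch below builds a hypothetical target that grows exponentially with a feature and compares Pearson correlations (`np.corrcoef`) before and after transforming the feature. Note that this example applies a true exponential function, `np.exp`, whereas the helper in the Formula section uses `np.power`; the data and constants are made up for illustration.

```python
import numpy as np

# Hypothetical feature x and a target that grows exponentially with x
x = np.linspace(0.0, 5.0, 50)
y = 2.0 * np.exp(x) + 1.0

# Pearson correlation measures how close the relationship is to linear
corr_raw = np.corrcoef(x, y)[0, 1]
corr_transformed = np.corrcoef(np.exp(x), y)[0, 1]  # y is exactly linear in exp(x)

print("correlation with x:     ", round(corr_raw, 3))
print("correlation with exp(x):", round(corr_transformed, 3))
```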

Formula: $Y = X^{\lambda}$, where $\lambda$ is the chosen exponent

import numpy as np

def exponential_transformation(series, exponent=0.5):
    # Raise each element to the given power (0.5 by default, i.e. a square root)
    return np.power(series, exponent)
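A short usage sketch of `exponential_transformation` (redefined so the block runs on its own) shows how the exponent controls its behavior: 0.5 matches `np.sqrt`, 2 matches `np.square`, and a fractional exponent applied to a negative value yields `nan`. The input values are arbitrary.

```python
import numpy as np

def exponential_transformation(series, exponent=0.5):
    # Raise every element to the given exponent
    return np.power(series, exponent)

x = np.array([0.0, 1.0, 4.0, 9.0])
print(exponential_transformation(x))              # exponent 0.5 behaves like np.sqrt
print(exponential_transformation(x, exponent=2))  # exponent 2 behaves like np.square

# Fractional powers of negative numbers are not real: NumPy returns nan
with np.errstate(invalid="ignore"):  # silence the RuntimeWarning
    print(exponential_transformation(np.array([-4.0]), exponent=0.5))
```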

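This method is not suitable for every feature in a dataset. With a fractional exponent such as 0.5, the feature must be non-negative, because NumPy returns nan for fractional powers of negative numbers.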