When building machine learning models, we often assume that our data follow a normal distribution. In reality this is rarely the case: real data may contain large outliers that produce a skewed distribution, skewness degrades the performance of the model we create, and so our predictions may go wrong. Scaling-to-shape techniques enable us to reduce the skewness and enhance the performance of our model.
Before going into the details of these techniques, we should first learn about continuous probability distributions.
Continuous Probability Distribution – normal distribution
A continuous probability distribution is a theoretical distribution. A probability distribution with the following probability density function (pdf) is called the normal distribution. Its curve is bell-shaped.

$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}, \qquad -\infty < x < \infty$$
Where:
- x is continuous and is called the normal variate
- µ and σ are the two parameters of the normal distribution
- π ≈ 3.1416 and e ≈ 2.7183
- Mean: E(X) = µ
- Variance: VAR(X) = σ²
- Standard deviation: SD(X) = σ
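As a quick sketch in plain Python (the function name `normal_pdf` is my own), the pdf above can be evaluated directly; the density peaks at the mean, where f(µ) = 1/(σ√(2π)):

```python
import math

def normal_pdf(x, mu, sigma):
    """Probability density function of N(mu, sigma^2)."""
    coeff = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    return coeff * math.exp(-((x - mu) ** 2) / (2.0 * sigma ** 2))

# Evaluate at the mean, where the density is largest
print(round(normal_pdf(0, 0, 1), 4))        # 0.3989 (standard normal at 0)
print(round(normal_pdf(65, 65, 22.36), 4))  # peak height for mu=65, sigma=22.36
```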


Characteristics of Normal Distribution
- It is a continuous probability distribution
- Its probability density function is $f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}$
- µ and σ are its parameters
- It is bell-shaped and symmetric about its mean
- It is symmetrical on both sides (not skewed); skewness = 0
- The mean, median and mode are equal
- The mean divides the curve into two equal parts
- The quartile deviation QD ≈ (2/3)σ
- The mean deviation MD ≈ (4/5)σ
- The X-axis is an asymptote to the curve
- An asymptote is a straight line that touches the curve only at infinity, so the curve never touches the X-axis
Standard Normal Distribution:
A normal variate X with mean µ = 0 and standard deviation σ = 1 is called the standard normal variate, denoted by Z. Its probability density function (p.d.f.) is given by

$$\varphi(z) = \frac{1}{\sqrt{2\pi}}\, e^{-\frac{z^2}{2}}, \qquad -\infty < z < \infty$$

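A small numeric check in plain Python (my own sketch, using the trapezoidal rule over [−8, 8], beyond which the tails are negligible) confirms that this pdf encloses a total area of 1, as every valid density must:

```python
import math

def std_normal_pdf(z):
    """pdf of the standard normal distribution N(0, 1)."""
    return math.exp(-z * z / 2.0) / math.sqrt(2.0 * math.pi)

# Trapezoidal rule over [-8, 8]; the tails beyond that contribute ~1e-15.
a, b, n = -8.0, 8.0, 10_000
h = (b - a) / n
area = (std_normal_pdf(a) + std_normal_pdf(b)) / 2.0
area += sum(std_normal_pdf(a + i * h) for i in range(1, n))
area *= h
print(round(area, 6))  # 1.0
```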
Problem: Our data – the marks of 15 students – are given below.

The normal distribution and the standard normal distribution of this data are drawn below.

Normal Distribution shows mean as 65 and standard deviation as 22.36
Standard normal distribution shows mean as 0 and standard deviation as 1
Calculation of Z

$$Z = \frac{X - \mu}{\sigma} = \frac{X - 65}{22.36}$$
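The marks table itself is not reproduced here, so the sketch below uses a hypothetical sample of 15 marks chosen to match the stated mean (65) and sample standard deviation (22.36), then standardizes each mark via Z = (X − µ)/σ:

```python
import math

# Hypothetical marks (the original table is not shown); this evenly spaced
# sample reproduces mean = 65 and sample SD = 22.36.
marks = [30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, 100]

n = len(marks)
mu = sum(marks) / n
sigma = math.sqrt(sum((x - mu) ** 2 for x in marks) / (n - 1))  # sample SD

# Standardize: Z = (X - mu) / sigma
z = [(x - mu) / sigma for x in marks]

z_mean = sum(z) / n
z_sd = math.sqrt(sum((v - z_mean) ** 2 for v in z) / (n - 1))
print(round(mu, 2), round(sigma, 2))           # 65.0 22.36
print(round(abs(z_mean), 4), round(z_sd, 4))   # 0.0 1.0
```

After standardization the transformed marks have mean 0 and standard deviation 1, matching the standard normal distribution described above.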
Characteristics of Standard Normal Variate
- Z is a standard Normal variate. To find any probability of X we can use standard normal variate (Z)
- Any normal distribution can be converted into a standard normal distribution
- Area can be read from the table of areas under standard normal curve
- Let X be a normal variate with mean µ and standard deviation σ
- Then Z is a standard normal variate, found using

$$Z = \frac{X - \mu}{\sigma}$$
- The standard normal distribution is denoted by N(0, 1)
- Statisticians have tabulated standard normal table values
- Z varies from −∞ to +∞
- The mean of the standard normal distribution is 0 and its SD is 1
Advantage:
After converting X to the standard normal variate Z, you can find the probability of any value of X. Let us illustrate this with an example problem.
Problem:
The weight of Halwa packed by a filling machine follows a normal distribution with a mean weight of 500 gm and a standard deviation of 10 gm. A pack is selected at random.
- What is the probability that the pack's weight will exceed 515 gm?
- What is the probability that the pack's weight lies within 480 and 520 gm?
- What proportion of packs will weigh less than 480 gm or more than 520 gm?
If 10,000 packs are supplied, how many packs will be rejected, given that 480 gm and 520 gm are the lower and upper limits for acceptance?
Solution: X is a normal variate with parameters mean µ = 500 and standard deviation σ = 10. Therefore Z, the standard normal variate, is found using

$$Z = \frac{X - \mu}{\sigma} = \frac{X - 500}{10}$$

a) What is the probability that the pack's weight exceeds 515 gm?

$$P(X > 515) = P\left(Z > \frac{515 - 500}{10}\right) = P(Z > 1.5) = 1 - 0.9332 = 0.0668$$

b) What is the probability that the pack's weight lies within 480 and 520 gm?

$$P(480 < X < 520) = P\left(\frac{480 - 500}{10} < Z < \frac{520 - 500}{10}\right) = P(-2 < Z < 2) = 0.9772 - 0.0228 = 0.9544$$
Probability of Rejection:
If the weight lies outside these limits, the pack will be rejected. The probability of rejection = 1 − 0.9544 = 0.0456. The number of packets that will be rejected is N × P = 10000 × 0.0456 = 456.
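The whole calculation can be verified in plain Python using the error function (my own sketch; note that exact values of Φ give 0.9545 and about 455 rejects, versus the table-rounded 0.9544 and 456):

```python
import math

def phi(z):
    """CDF of the standard normal distribution, via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

mu, sigma = 500.0, 10.0  # filling machine: mean 500 gm, SD 10 gm

p_over_515 = 1.0 - phi((515 - mu) / sigma)                     # a) P(X > 515)
p_within = phi((520 - mu) / sigma) - phi((480 - mu) / sigma)   # b) P(480 < X < 520)
p_reject = 1.0 - p_within                                      # c) rejection probability
expected_rejects = 10000 * p_reject

print(round(p_over_515, 4), round(p_within, 4), round(expected_rejects))  # 0.0668 0.9545 455
```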
Symmetrical and Skewed
Normal Distribution is always symmetrical. If the data follows a normal distribution, then its mean, median and mode will be equal to each other
Mean = Median = Mode
If the distribution is asymmetrical, we call it a skewed distribution. The normal distribution has a skewness of 0. Skewness tells us where most of the values are concentrated on an ascending scale.
Thumb Rule
A common thumb rule for interpreting the skewness value:
- Between −0.5 and 0.5: the distribution is approximately symmetric
- Between −1 and −0.5, or between 0.5 and 1: moderately skewed
- Less than −1 or greater than 1: highly skewed
Left Skewed and Right Skewed
In a left-skewed distribution the longer tail lies to the left of the peak; in a right-skewed distribution the longer tail lies to the right.
Summary

We have two types of Skewness:
Negative Skewness:
- If the skewness is less than 0, the distribution is negatively skewed.
- For negatively skewed data, most of the values are concentrated above the average value and the tail on the left side of the distribution is longer or flatter.
Positive Skewness:
- If the skewness is greater than 0, the distribution is positively skewed.
- For positively skewed data, most of the values are concentrated below the average value and the tail on the right side of the distribution is longer or flatter.
What does skewness tell us?
- Skewness indicates the direction and relative magnitude of a distribution's deviation from the normal distribution.
- Skewness considers the extremes (outliers) of the dataset rather than concentrating only on the average.
- Analysts therefore need to look at the extremes (outliers).
Why is skewed data not used in creating machine learning models?
Many machine learning models assume a normal distribution, but in reality the data points may not be perfectly symmetric. If the data are skewed, such a model will consistently underestimate the skewness risk. Outliers and skewed data undermine the accuracy of the model.
Skewed Data in the real world
Real-world examples with Right Skewed Data
a) Income distribution – Right Skewed
b) Happiness distribution
Real-world examples with Left Skewed Data
a) Retirement age
b) Test scores – GPA
In right-skewed data the mean is typically higher than the median and the mode.
In left-skewed data the mean is typically lower than the median and the mode.
Skewness measures (formulae)
Commonly used measures include Karl Pearson's coefficients of skewness,

$$S_k = \frac{\text{Mean} - \text{Mode}}{\sigma} \quad \text{or} \quad S_k = \frac{3(\text{Mean} - \text{Median})}{\sigma},$$

and the moment-based (Fisher–Pearson) coefficient

$$g_1 = \frac{\tfrac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^3}{\left(\tfrac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^2\right)^{3/2}}$$
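A minimal sketch in plain Python (the function name `skewness` is my own) of the moment-based skewness measure, showing that symmetric data score 0 while right- and left-skewed samples score positive and negative respectively:

```python
def skewness(data):
    """Moment-based (Fisher-Pearson) sample skewness g1."""
    n = len(data)
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data) / n  # second central moment
    m3 = sum((x - mean) ** 3 for x in data) / n  # third central moment
    return m3 / m2 ** 1.5

print(skewness([1, 2, 3, 4, 5]))           # 0.0 (symmetric)
print(skewness([1, 1, 2, 2, 3, 10]) > 0)   # True (right-skewed: long right tail)
print(skewness([1, 8, 9, 9, 10, 10]) < 0)  # True (left-skewed: long left tail)
```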
When our data are skewed either left or right, we can use techniques such as log transformation, square-root transformation, square transformation and exponential transformation (log and square-root for right-skewed data; square and exponential for left-skewed data). These techniques enable you to reduce the skewness and maintain the accuracy of the model.
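As an illustration (my own sketch, using simulated log-normal data as a stand-in for incomes), applying a log transformation to right-skewed data pulls the skewness close to zero:

```python
import math
import random

def skewness(data):
    """Moment-based (Fisher-Pearson) sample skewness."""
    n = len(data)
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data) / n
    m3 = sum((x - mean) ** 3 for x in data) / n
    return m3 / m2 ** 1.5

# Simulated right-skewed data: log-normal values behave like incomes
random.seed(42)
incomes = [random.lognormvariate(10, 0.8) for _ in range(5000)]

# Log transformation pulls in the long right tail
log_incomes = [math.log(x) for x in incomes]

print(skewness(incomes) > 1)             # True: strongly right-skewed
print(abs(skewness(log_incomes)) < 0.2)  # True: roughly symmetric after log
```

The square-root transformation works the same way but more gently; square and exponential transformations do the reverse, stretching the right tail to correct left-skewed data.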
