• Winsorizing is a technique for replacing the extreme values of your data.
  • Its purpose is to limit the effect of outliers on the calculations and results you derive from the data.
  • The mean calculated after such a replacement of the extreme values is called the winsorized mean.
  • For example, what is 90% winsorization?
  • It means replacing the top 5% and the bottom 5% of the data:
    • The top 5% of the data is replaced by the value at the 95th percentile.
    • The bottom 5% of the data is replaced by the value at the 5th percentile.
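The 90% rule can be checked on a small array. This is a minimal NumPy sketch, assuming percentiles are computed with NumPy's default linear interpolation; the data values and the helper name `winsorize_90` are made up for illustration.

```python
import numpy as np

def winsorize_90(values):
    """Hypothetical helper: 90% winsorization of a 1-D array.

    Values below the 5th percentile become the 5th-percentile value and
    values above the 95th percentile become the 95th-percentile value.
    """
    arr = np.asarray(values, dtype=float)
    lower, upper = np.percentile(arr, [5, 95])  # linear interpolation by default
    return np.clip(arr, lower, upper)

# Made-up data: 1..19 plus one extreme value
data = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10,
                 11, 12, 13, 14, 15, 16, 17, 18, 19, 1000])
clipped = winsorize_90(data)
print("original mean:  ", data.mean())
print("winsorized mean:", clipped.mean())
```

Only the smallest and largest values change here. Because NumPy interpolates between neighbouring points, on a sample this small the 95th-percentile cap lies between 19 and 1000 rather than on an observed value.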

We use the same dataset, houseprice.csv.

Load the libraries and read data
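A sketch of this step with pandas, assuming houseprice.csv sits in the working directory and contains the columns named later in this section. So that the snippet runs on its own, it reads three hypothetical rows from an in-memory string instead of the real file; `load_house_data` is an illustrative name.

```python
import io
import pandas as pd

def load_house_data(source="houseprice.csv"):
    """Read the house-price data into a DataFrame (file path is an assumption)."""
    return pd.read_csv(source)

# Hypothetical three-row stand-in for houseprice.csv; values are invented,
# column names follow the features used later in this section.
sample_csv = io.StringIO(
    "price,bedrooms,bathrooms,sqft_living,sqft_lot,floors,"
    "sqft_above,sqft_basement,yr_built,sqft_living15\n"
    "300000,3,1.0,1200,5000,1.0,1200,0,1960,1300\n"
    "550000,4,2.5,2500,7500,2.0,2000,500,1990,2200\n"
    "180000,2,1.0,800,9000,1.0,800,0,1940,1500\n"
)
df = load_house_data(sample_csv)
print(df.shape)
print(df.head())
```

On the real data, `df = load_house_data()` would read the file from disk.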

Descriptive statistics
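Descriptive statistics already hint at where winsorization will bite. A minimal sketch on hypothetical synthetic data (random values, not houseprice.csv): asking `describe` for the 5% and 95% rows shows the caps that 90% winsorization would apply, and `skew` flags columns with a long upper tail.

```python
import numpy as np
import pandas as pd

# Hypothetical synthetic stand-in for houseprice.csv (random values, not real sales)
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "price": rng.lognormal(mean=13.0, sigma=0.5, size=200),
    "bedrooms": rng.integers(1, 7, size=200),
    "sqft_lot": rng.lognormal(mean=8.9, sigma=0.8, size=200),
})

# Include the 5% and 95% rows: these are the 90% winsorization cut-offs
stats = df.describe(percentiles=[0.05, 0.25, 0.5, 0.75, 0.95])
print(stats.round(1))
print(df.skew().round(2))  # positive skew means a long upper tail
```

A large gap between the 95% row and the max row is a quick sign that a column has extreme values worth capping.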

Separate features and target (price)
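One common way to split a pandas DataFrame into features and target, sketched on a few hypothetical rows. It assumes the target column is named `price`; `X` and `y` are conventional names, not taken from the original notebook.

```python
import pandas as pd

# Hypothetical rows; only the column layout matters here
df = pd.DataFrame({
    "price": [300000, 550000, 180000],
    "bedrooms": [3, 4, 2],
    "sqft_living": [1200, 2500, 800],
})

X = df.drop(columns=["price"])  # features: every column except the target
y = df["price"]                  # target
print(X.columns.tolist(), y.name)
```

Winsorization is then applied to `X` only, so the target is left untouched.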

Original data means
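The baseline means are a single `mean` call. This sketch uses hypothetical feature values with one very large lot, to show how a single extreme value drags a mean upward; `numeric_only=True` is a hedge in case the real file also holds non-numeric columns.

```python
import pandas as pd

# Hypothetical feature values with one very large lot
X = pd.DataFrame({
    "bedrooms": [2, 3, 3, 4],
    "sqft_lot": [4000, 5000, 6000, 45000],
})
original_means = X.mean(numeric_only=True)  # skip any non-numeric columns
print(original_means)
```

Here the single 45000 sq ft lot pulls the sqft_lot mean to 15000, well above the other three lots.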

Draw boxplot using original data
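A matplotlib sketch of this step on hypothetical synthetic features (random values, not houseprice.csv). It draws one panel per feature because the columns are on very different scales, and uses the off-screen `Agg` backend so it runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical synthetic features standing in for the houseprice.csv columns
rng = np.random.default_rng(1)
X = pd.DataFrame({
    "sqft_living": rng.lognormal(mean=7.5, sigma=0.4, size=500),
    "sqft_lot": rng.lognormal(mean=8.9, sigma=0.8, size=500),
})

# One panel per feature, because the columns are on very different scales
fig, axes = plt.subplots(1, X.shape[1], figsize=(8, 4))
for ax, col in zip(axes, X.columns):
    ax.boxplot(X[col].to_numpy())
    ax.set_title(col)
fig.suptitle("Original data")
fig.tight_layout()
# In a notebook or interactive session, plt.show() displays the figure
```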

Transform the data using the winsorizing technique
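One way to do 90% winsorization on a whole DataFrame is to clip each column to its own 5th and 95th percentiles with pandas. This is a sketch, not necessarily the notebook's original code: `winsorize_frame` is a hypothetical helper and the data is synthetic.

```python
import numpy as np
import pandas as pd

def winsorize_frame(X, lower=0.05, upper=0.95):
    """Hypothetical helper: clip every numeric column to its own quantiles.

    With the defaults this is 90% winsorization: values below the 5th
    percentile become the 5th-percentile value, values above the 95th
    percentile become the 95th-percentile value.
    """
    num = X.select_dtypes(include="number")
    # axis=1 aligns the per-column bounds (Series indexed by column name) with the columns
    return num.clip(lower=num.quantile(lower), upper=num.quantile(upper), axis=1)

# Hypothetical synthetic features standing in for houseprice.csv
rng = np.random.default_rng(2)
X = pd.DataFrame({
    "sqft_living": rng.lognormal(mean=7.5, sigma=0.4, size=300),
    "sqft_lot": rng.lognormal(mean=8.9, sigma=0.8, size=300),
})
X_w = winsorize_frame(X)
print(X_w.describe().round(1))
```

SciPy's `scipy.stats.mstats.winsorize` is an alternative; it replaces values with observed order statistics rather than interpolated quantiles, so its results can differ slightly from this sketch.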

Transformed data
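Besides printing the first rows, a quick sanity check on the transformed data is to count how many values winsorization changed. A sketch on hypothetical continuous synthetic data: with distinct values, about 10% of each column (5% per tail) should differ from the original.

```python
import numpy as np
import pandas as pd

# Hypothetical continuous synthetic features (no ties)
rng = np.random.default_rng(3)
X = pd.DataFrame({
    "sqft_living": rng.lognormal(mean=7.5, sigma=0.4, size=1000),
    "sqft_lot": rng.lognormal(mean=8.9, sigma=0.8, size=1000),
})
# 90% winsorization: clip each column to its own 5th and 95th percentiles
X_w = X.clip(lower=X.quantile(0.05), upper=X.quantile(0.95), axis=1)

print(X_w.head())
changed_share = (X != X_w).mean()  # fraction of values replaced, per column
print(changed_share)
```

Columns with many repeated values (bedrooms, floors) can show a smaller share, because ties at the cut-off are not changed.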

Draw boxplot using winsorized data
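Placing the original and winsorized versions of each feature side by side makes the effect easy to see. A sketch on hypothetical synthetic data with matplotlib's off-screen backend.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical synthetic features standing in for houseprice.csv
rng = np.random.default_rng(4)
X = pd.DataFrame({
    "sqft_living": rng.lognormal(mean=7.5, sigma=0.4, size=500),
    "sqft_lot": rng.lognormal(mean=8.9, sigma=0.8, size=500),
})
# 90% winsorization: clip each column to its own 5th and 95th percentiles
X_w = X.clip(lower=X.quantile(0.05), upper=X.quantile(0.95), axis=1)

fig, axes = plt.subplots(1, X.shape[1], figsize=(8, 4))
for ax, col in zip(axes, X.columns):
    # Original and winsorized versions of the same feature on a shared axis
    ax.boxplot([X[col].to_numpy(), X_w[col].to_numpy()])
    ax.set_xticks([1, 2])
    ax.set_xticklabels(["original", "winsorized"])
    ax.set_title(col)
fig.tight_layout()
```

A winsorized box can still show outlier points. matplotlib draws whiskers at 1.5 times the IQR by default, and the 95th-percentile cap can lie beyond that fence.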

Find mean of bedrooms
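Each "find mean" step compares one feature's mean before and after winsorization. This sketch uses hypothetical bedroom counts with one unusually small and one unusually large house. `mean_before_after` is an illustrative helper name.

```python
import pandas as pd

def mean_before_after(original, winsorized, column):
    """Hypothetical helper: return (original mean, winsorized mean) for one column."""
    return original[column].mean(), winsorized[column].mean()

# Hypothetical bedroom counts: one 1-bedroom house, eighteen 3-bedroom houses, one 10-bedroom house
X = pd.DataFrame({"bedrooms": [1.0] + [3.0] * 18 + [10.0]})
# 90% winsorization: clip to the 5th and 95th percentiles
X_w = X.clip(lower=X.quantile(0.05), upper=X.quantile(0.95), axis=1)

before, after = mean_before_after(X, X_w, "bedrooms")
print(f"bedrooms mean: original={before:.4f}, winsorized={after:.4f}")
```

With linear interpolation the caps for a count column are not whole numbers (here 2.9 and 3.35). Passing `interpolation="nearest"` to `quantile` keeps them on observed values if that matters.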

Find mean of bathrooms

Find mean of sqft_living

Find mean of sqft_lot

Find mean of floors

Find mean of sqft_above

Find mean of sqft_basement

Find mean of sqft_living15

Find mean of yr_built
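Instead of repeating the comparison feature by feature, the before/after means can be collected in one table. This is a sketch: the feature list is taken from the steps above, while `compare_means` and the synthetic data are hypothetical. On the real data, `X` and `X_w` would be the original and winsorized feature frames.

```python
import numpy as np
import pandas as pd

# Feature names taken from the steps above
FEATURES = ["bedrooms", "bathrooms", "sqft_living", "sqft_lot", "floors",
            "sqft_above", "sqft_basement", "sqft_living15", "yr_built"]

def compare_means(X, X_w, features=FEATURES):
    """Hypothetical helper: tabulate each feature's mean before and after winsorization."""
    cols = [c for c in features if c in X.columns]  # skip names missing from X
    table = pd.DataFrame({
        "original": X[cols].mean(),
        "winsorized": X_w[cols].mean(),
    })
    table["change"] = table["winsorized"] - table["original"]
    return table

# Hypothetical synthetic data with three of the listed features
rng = np.random.default_rng(6)
X = pd.DataFrame({
    "sqft_living": rng.lognormal(mean=7.5, sigma=0.4, size=500),
    "sqft_lot": rng.lognormal(mean=8.9, sigma=0.8, size=500),
    "yr_built": rng.integers(1900, 2016, size=500).astype(float),
})
X_w = X.clip(lower=X.quantile(0.05), upper=X.quantile(0.95), axis=1)
print(compare_means(X, X_w).round(2))
```

On this synthetic data the long-tailed sqft_lot shows a clear drop. The roughly uniform yr_built barely moves, because its two tails are clipped by similar amounts.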

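Winsorization limits the influence of extreme values. When we compare the means of the features before and after winsorization, we notice that the means of all the features have decreased. This is what we expect for right-skewed features: capping the long upper tail at the 95th percentile pulls the mean down more than lifting the bottom 5% to the 5th percentile pushes it up.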
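Whether winsorization lowers or raises a mean depends on which tail is longer. This minimal sketch uses hypothetical synthetic samples: a right-skewed lognormal sample and its mirror image, which is left-skewed. `winsorized_mean` is an illustrative helper name.

```python
import numpy as np

def winsorized_mean(values, lower=5, upper=95):
    """Hypothetical helper: mean after clipping to the given percentiles."""
    lo, hi = np.percentile(values, [lower, upper])
    return np.clip(values, lo, hi).mean()

rng = np.random.default_rng(5)
right_skewed = rng.lognormal(mean=0.0, sigma=0.8, size=2000)  # long upper tail
left_skewed = -right_skewed                                    # mirror image: long lower tail

for name, v in [("right-skewed", right_skewed), ("left-skewed", left_skewed)]:
    print(f"{name}: mean {v.mean():.3f} -> winsorized {winsorized_mean(v):.3f}")
```

The right-skewed mean falls and the left-skewed mean rises by the same amount. A decrease in every feature's mean therefore says the upper tails dominated in this data; it is not a general property of winsorization.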