- Using this technique, you can replace the extreme values in your data with less extreme ones.
- The goal is to limit the effect of outliers on the calculations or results you get from the data.
- The mean calculated after replacing the extreme values is called the winsorized mean.
- For example, what is 90% winsorization?
- It means replacing the top 5% and the bottom 5% of the data.
- The top 5% of the data is replaced by the value at the 95th percentile, and
- the bottom 5% of the data is replaced by the value at the 5th percentile.
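
To make the 90% case concrete, here is a minimal sketch on a made-up sample. The function name `winsorize_90` and the sample values are illustrative assumptions, and the percentiles use NumPy's default linear interpolation, so the cut-off values can fall between observed data points.

```python
import numpy as np

def winsorize_90(values):
    """Clip values to the 5th-95th percentile range (90% winsorization)."""
    values = np.asarray(values, dtype=float)
    # Percentile cut-offs, computed with NumPy's default linear interpolation
    lower, upper = np.percentile(values, [5, 95])
    # Values below `lower` become `lower`; values above `upper` become `upper`
    return np.clip(values, lower, upper)

# Hypothetical sample: the numbers 1..19 plus one extreme value
data = np.array(list(range(1, 20)) + [1000], dtype=float)
winsorized = winsorize_90(data)

print("original mean:  ", data.mean())
print("winsorized mean:", winsorized.mean())
```

The single extreme value dominates the original mean, while the winsorized mean stays close to the bulk of the data.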
We use the same dataset, houseprice.csv.
Load the libraries and read the data
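
A minimal sketch of this step, assuming houseprice.csv sits in the working directory and is a plain comma-separated file. The helper name `load_houseprice` is hypothetical.

```python
import pandas as pd

def load_houseprice(path="houseprice.csv"):
    """Read the house price data into a DataFrame."""
    return pd.read_csv(path)

# Usage (assumes the file exists at the given path):
# df = load_houseprice()
# df.head()       # first few rows
# df.describe()   # descriptive statistics for the numeric columns
```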

Descriptive statistics

Separate features and target (price)
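
One way to do the split is sketched below. The helper `split_features_target` and the tiny DataFrame are illustrative stand-ins for the real data; only the column names come from the dataset.

```python
import pandas as pd

def split_features_target(df, target="price"):
    """Return (features, target), with the target column dropped from the features."""
    X = df.drop(columns=[target])
    y = df[target]
    return X, y

# Tiny hypothetical frame standing in for houseprice.csv
df = pd.DataFrame({
    "price": [221900.0, 538000.0, 180000.0],
    "bedrooms": [3.0, 3.0, 2.0],
    "sqft_living": [1180.0, 2570.0, 770.0],
})

X, y = split_features_target(df)
print(X.columns.tolist())
print(y.name)
```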

Original data means

Draw boxplot using Original Data

Transform the data using Winsorizing Technique
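
The transformation can be sketched with pandas alone by clipping each column at its own 5th and 95th percentiles. This is one possible approach, not necessarily the one used originally. The helper `winsorize_frame` is a hypothetical name, the sample values are made up, and the sketch assumes every feature column is numeric.

```python
import pandas as pd

def winsorize_frame(X, lower=0.05, upper=0.95):
    """Clip every column of X to its own [lower, upper] quantile range."""
    low = X.quantile(lower)    # Series: one lower bound per column
    high = X.quantile(upper)   # Series: one upper bound per column
    # axis=1 aligns the per-column bounds with the DataFrame's columns
    return X.clip(lower=low, upper=high, axis=1)

# Hypothetical feature values, each column with one extreme entry
X = pd.DataFrame({
    "bedrooms": [1, 2, 3, 3, 3, 3, 4, 4, 4, 33],
    "sqft_living": [800, 1200, 1500, 1800, 2000, 2200, 2500, 2800, 3000, 12000],
}, dtype=float)

X_wins = winsorize_frame(X)
print(X_wins)
```

Clipping per column keeps each feature on its own scale, which matters here because bedrooms and sqft_living differ by orders of magnitude.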

Transformed data

Draw boxplot using winsorized data


Find mean of feature bedrooms

Find mean of bathrooms

Find mean of sqft_living

Find mean of sqft_lot

Find mean of floors

Find mean of sqft_above

Find mean of sqft_basement

Find mean of sqft_living15

Find mean of yr_built
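
Rather than inspecting one feature at a time, the per-feature means can be placed side by side. The sketch below uses made-up values for two of the columns and a simple quantile clip as the winsorizing step. The variable names are illustrative assumptions.

```python
import pandas as pd

# Hypothetical values for two of the features, each with one extreme entry
X = pd.DataFrame({
    "bedrooms": [1, 2, 3, 3, 3, 3, 4, 4, 4, 33],
    "sqft_lot": [3000, 4000, 5000, 5500, 6000, 7000, 7500, 8000, 9000, 150000],
}, dtype=float)

# 90% winsorization: clip each column at its 5th and 95th percentiles
X_wins = X.clip(lower=X.quantile(0.05), upper=X.quantile(0.95), axis=1)

# One row per feature, one column per version of the data
comparison = pd.DataFrame({
    "original_mean": X.mean(),
    "winsorized_mean": X_wins.mean(),
})
comparison["change"] = comparison["winsorized_mean"] - comparison["original_mean"]
print(comparison)
```

A negative value in the change column means the winsorized mean is lower than the original mean for that feature.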



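Winsorization caps the extreme values instead of leaving them in the data. So when we compare the feature means before and after winsorization, we notice that the means of all features have decreased, because the large values at the top of each feature no longer pull the mean upward.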
