Leave-one-out encoding is a technique used to encode categorical variables in machine learning. It’s a modification of target encoding where the average target value for a category is calculated excluding the target value of the current data point. This helps to mitigate target leakage and reduce the impact of outliers.

The way how it works:

  1. Identify the category class  of  the data point in the categorical feature
  2. Calculate  the average  target value for that category, but exclude the target value for the current data point(current row)
  3. Replace the original categorical value with this calculated average

 

  • Imagine you have a dataset with a “City” column and a “SalePrice” column (target variable).
  • You have a data point for “Chennai city ” with a “SalePrice” of  Rs65,00,000.
  • To calculate the leave-one-out encoding for “Chennai  City”, you would:
    • Find all other data points in the dataset with “Chennai City”.
    • Calculate the average “SalePrice” for those other data points (excluding the Rs.65,00,000).
    • Replace the original “Chennai City” with this average value.

Example: Imagine you have a dataset with  5 categorical features  like Region, country, item type, Sale Channel, Order Priority and two numerical features like Units sold, unit price and the target variable Total Revenue.  Leave One Out encoding is similar to target encoding.

Load and read data

Separate Target (Total Revenue) and other independent variables; Here total revenue is separated from other features.  Other features have been assigned to X and target variable is assigned to y

Use LeaveOneOut Encoder and convert categorical features into numerical value based on mean of Total Revenue

Advantages:

 

Reduces target leakage:

Here you exclude the current data point of the target value ( first Total revenue is  Rs,3692591.2   relating Sales channel  – Offline) .

For finding out numerical representation of first offline:

Omit the target value and sum up the remaining  9 values of related to Offline category  and divide that by 9,

First offline numerical value = Sum of the remaining 9 values/9= 9712838.44/9 = 1079204.271

Now replace offline with the 1079204.271

you prevent the model from “seeing” the target value during training and thus avoid leakage.

Handles outliers:

Outliers in the target variable can heavily influence target encoding. Leave-one-out encoding mitigates this by excluding the current data point’s value when calculating the average.

More stable than target encoding:

By excluding the current data point’s target value, the model’s predictions are less sensitive to changes in individual data points.

In summary, leave-one-out encoding is a powerful technique for encoding categorical variables that helps to reduce target leakage, handle outliers, and provide a more stable and reliable representation of the data for machine learning models.