Ordinal encoding is a method for converting categorical data into numerical data while preserving the inherent order or ranking of the categories.  It’s useful when working with machine learning models that expect numerical input features, especially when the categories have a natural order (like “low,” “medium,” “high”).

Ordinal encoding assigns each category a unique integer according to its rank, whereas one-hot encoding creates a separate binary column for each category. Because the integer codes keep the natural order of the categories, models that can exploit that order (e.g., linear models, SVMs) can use it, although such models also treat the codes as evenly spaced. Ordinal encoding is also more compact than one-hot encoding, especially for features with many categories, because it produces a single column instead of inflating the dimensionality of the feature space. It suits models that require numerical input, such as neural networks, which cannot consume raw category labels.
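To make the contrast concrete, here is a minimal sketch on a toy feature. The column name priority and its values are hypothetical and are not part of the dataset used later in this article.

```python
import pandas as pd

# Hypothetical toy feature with a natural order: low < medium < high
df = pd.DataFrame({"priority": ["low", "high", "medium", "low"]})

# Ordinal encoding: a single integer column that follows the rank
rank = {"low": 0, "medium": 1, "high": 2}
ordinal = df["priority"].map(rank)

# One-hot encoding: one binary column per category
one_hot = pd.get_dummies(df["priority"])

print(ordinal.tolist())  # [0, 2, 1, 0]
print(one_hot.shape)     # (4, 3): three columns for three categories
```

The ordinal version stays a single column no matter how many categories exist, while the one-hot version grows by one column per category.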

Advantages:

  1. Preserves the order
  2. Reduces dimensionality
  3. Suitable for various models that require numerical input

Download from Kaggle

This is another encoding technique, meant for categories that have an inherent, meaningful order. Say the feature education_level contains ‘Secondary’, ‘Primary’, ‘Postgraduate’, ‘University’, ‘College’ and ‘High School’. Let us download addiction_population_data.csv from the Kaggle page Addiction Population EDA & Visulation.

Store the file on any one of your drives. In my case, I stored it on my D drive as addiction.csv.

Load libraries and read data
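A minimal sketch of this step, assuming the file was saved as D:\addiction.csv as described above. The fallback rows are hypothetical stand-ins (their values are made up) so that the snippet still runs on a machine without the file.

```python
from pathlib import Path

import pandas as pd

# Location used in this article; change it to wherever you saved the file.
CSV_PATH = Path(r"D:\addiction.csv")

if CSV_PATH.exists():
    df = pd.read_csv(CSV_PATH)
else:
    # Hypothetical stand-in rows, used only when the file is not found.
    df = pd.DataFrame({
        "education_level": ["Primary", None, "College", "University"],
        "social_support": ["Weak", "Strong", None, "Moderate"],
        "therapy_history": [None, "Past", "Current", None],
    })

print(df.shape)   # (number of rows, number of columns)
print(df.head())  # first five rows
```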

The dataset contains 25 columns.

Information about data

The following features contain missing values:

  1. education_level
  2. social_support
  3. therapy_history

These features hold text labels, so their Dtype is shown as object, and their missing entries appear as NaN. The first task is to either fill the missing values or remove the affected rows. It is usually better to fill them, so that no rows are lost.
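One possible way to fill the gaps is sketched below on a small hypothetical stand-in for the three columns. Filling education_level with “College” follows what this article does; filling the other two columns with their most frequent value is an assumption made for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the three columns that contain missing values
df = pd.DataFrame({
    "education_level": ["Primary", np.nan, "College", "University"],
    "social_support": ["Weak", "Strong", np.nan, "Strong"],
    "therapy_history": [np.nan, "Past", "Past", "Current"],
})

# education_level: fill with the class "College", as this article does
df["education_level"] = df["education_level"].fillna("College")

# Assumption: fill the remaining columns with their most frequent value
for col in ["social_support", "therapy_history"]:
    df[col] = df[col].fillna(df[col].mode()[0])

print(df)
```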

Now check whether any features still contain missing values.
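A quick check could look like the sketch below. The small dataframe is a hypothetical, already-filled stand-in for the real data.

```python
import pandas as pd

# Hypothetical stand-in: columns after the missing values were filled
df = pd.DataFrame({
    "education_level": ["Primary", "College", "College"],
    "social_support": ["Weak", "Strong", "Strong"],
})

missing_per_column = df.isna().sum()       # missing count for each column
has_missing = bool(df.isna().any().any())  # True if any cell is missing

print(missing_per_column)
print(has_missing)  # False
```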

There are now no missing values.

Categories (classes) as they appear in the dataset:
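The categories can be listed in order of first appearance with unique(). The sketch below uses a hypothetical stand-in column that holds the six education levels named earlier.

```python
import pandas as pd

# Hypothetical stand-in for the education_level column
df = pd.DataFrame({
    "education_level": [
        "Secondary", "Primary", "Postgraduate", "University",
        "College", "High School", "College", "Primary",
    ]
})

# unique() keeps the order in which each category first appears
categories = df["education_level"].unique().tolist()
print(categories)
# ['Secondary', 'Primary', 'Postgraduate', 'University', 'College', 'High School']
```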

Ordinal Encoder

Alphabetical Order

By default, OrdinalEncoder does not encode education_level in the hierarchy we use in real life. It assigns codes in alphabetical (sorted) order of the category labels.
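The default behaviour can be seen in the sketch below. The dataframe is a hypothetical stand-in with one row per education level, and the column name education_code is chosen here for illustration.

```python
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder

# Hypothetical stand-in: one row per education level
df = pd.DataFrame({
    "education_level": [
        "Secondary", "Primary", "Postgraduate",
        "University", "College", "High School",
    ]
})

# With default settings the categories are sorted, i.e. alphabetical here
encoder = OrdinalEncoder()
df["education_code"] = encoder.fit_transform(df[["education_level"]]).ravel()

print(encoder.categories_[0].tolist())
# ['College', 'High School', 'Postgraduate', 'Primary', 'Secondary', 'University']
print(df)
```

Here “College” receives code 0 and “Primary” receives code 3, which does not match the real-life ranking.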

To get codes that follow the real-life hierarchy of education levels:

Once again, load and read the data into another dataframe

Find the categories/class of education_level

Since education_level had many missing values, we filled them with the class “College”.

You have to import the OrdinalEncoder class from the sklearn.preprocessing module.

Manually create the category order as a list. Then use OrdinalEncoder to encode the categories in the order we have defined.
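A minimal sketch of this step follows. The dataframe is a hypothetical stand-in, and the hierarchy in education_order is an assumption made for illustration (in particular the relative position of “Secondary” and “High School”), so adjust it to your own definition.

```python
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder

# Hypothetical stand-in: one row per education level
df = pd.DataFrame({
    "education_level": [
        "Secondary", "Primary", "Postgraduate",
        "University", "College", "High School",
    ]
})

# Assumed hierarchy, from lowest to highest
education_order = [
    "Primary", "Secondary", "High School",
    "College", "University", "Postgraduate",
]

# categories takes one list per encoded column, hence the outer list
encoder = OrdinalEncoder(categories=[education_order])
df["education_code"] = encoder.fit_transform(df[["education_level"]]).ravel()

print(df)
```

Each label is encoded as its position in education_order, so “Primary” becomes 0 and “Postgraduate” becomes 5. Every label present in the column should appear in the list, since with default settings the encoder is expected to raise an error for labels it was not given.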

Programmed Order

Now the categories of the education_level feature are encoded as desired.