Hashing Encoding:

It is also known as feature hashing. This technique converts categorical features into numerical values using a hash function. It is useful for high-cardinality datasets, where one-hot encoding may be impractical: one-hot encoding creates one column per unique category, which can exceed memory limits. This curse of dimensionality rules out one-hot encoding for such features, whereas hashing keeps the number of output columns fixed.
Hash Function: A function that maps a large input (like a string representing a category) to a smaller, fixed-size output (a hash value).
Collision: The possibility that two different input values (categories) map to the same hash value.
High Cardinality: A feature with a large number of unique values.
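The two definitions above can be made concrete with a minimal sketch. It assumes MD5 from Python's hashlib as the hash function (any deterministic hash would do) and a hypothetical list of city names. Python's built-in hash() is avoided because string hashes are randomized between interpreter runs. With 10 categories and only 4 buckets, the pigeonhole principle guarantees at least one collision.

```python
import hashlib

def bucket(category: str, n_buckets: int) -> int:
    """Map a category string to one of n_buckets slots via an MD5 digest."""
    digest = hashlib.md5(category.encode("utf-8")).digest()
    # Read the 16-byte digest as one big integer, then fold it into range.
    return int.from_bytes(digest, byteorder="big") % n_buckets

# Hypothetical categorical values.
cities = ["Paris", "Tokyo", "Lima", "Oslo", "Cairo",
          "Delhi", "Quito", "Seoul", "Rome", "Accra"]

buckets = {city: bucket(city, 4) for city in cities}

# Group cities by bucket so that collisions become visible.
by_bucket: dict[int, list[str]] = {}
for city, b in buckets.items():
    by_bucket.setdefault(b, []).append(city)

# Any bucket holding more than one city is a collision.
collisions = {b: names for b, names in by_bucket.items() if len(names) > 1}
print(collisions)
```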
How it works:
Input: A categorical feature with a large number of unique categories (e.g., different product names, cities).
Hashing: A hash function is applied to each category, producing a hash value.
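Numeric Representation: Each hash value is reduced modulo a fixed number of output columns, and the category is recorded in the column it lands in. The result is a fixed-width numeric vector suitable for machine learning models, however many unique categories the feature has.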
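The steps above can be sketched in plain Python. This is a minimal illustration, not a library implementation: it assumes MD5 as the hash, a hypothetical width of n_components = 8 output columns, and hypothetical product names. Each category becomes a vector with a 1 in the column selected by the hash value modulo n_components.

```python
import hashlib

def hash_encode(category: str, n_components: int = 8) -> list[int]:
    """Encode one category as a fixed-length count vector (the hashing trick)."""
    vector = [0] * n_components
    digest = hashlib.md5(str(category).encode("utf-8")).digest()
    # The hash value, reduced modulo the width, picks the column to increment.
    column = int.from_bytes(digest, byteorder="big") % n_components
    vector[column] += 1
    return vector

# Hypothetical training categories.
train = ["laptop", "phone", "tablet", "monitor", "keyboard"]
encoded = {name: hash_encode(name) for name in train}

# An unseen category needs no vocabulary lookup: it is hashed the same way.
unseen = hash_encode("smartwatch")
print(encoded["laptop"], unseen)
```

Because the vector width is set by n_components rather than by the number of categories, "smartwatch" is encoded even though it never appeared in train, which is the unseen-value advantage listed below.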
Advantages:
Memory Efficiency: Hashing reduces the number of features needed to represent categorical data, especially for high-cardinality features.
Scalable: Allows the model to handle a large number of unique categories without significant memory overhead.
Handles Unseen Values: A category that did not appear during training still hashes to one of the existing columns, so the encoder needs no refitting.
Disadvantages:
Collisions:
Different categories can hash to the same value, so the model cannot tell them apart, which can hurt performance.
Potential for Information Loss:
Collisions can lead to a loss of information about the original categories.
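To gauge how much information collisions destroy, the minimal sketch below counts how many categories lose their identity at several bucket counts. It assumes MD5 hashing and 1,000 hypothetical card identifiers; the bucket counts 16, 256, and 4096 are arbitrary choices.

```python
import hashlib

def bucket(category: str, n_buckets: int) -> int:
    """MD5-based bucket index in the range [0, n_buckets)."""
    digest = hashlib.md5(category.encode("utf-8")).digest()
    return int.from_bytes(digest, byteorder="big") % n_buckets

def merged_categories(categories: list[str], n_buckets: int) -> int:
    """Count categories that become indistinguishable by sharing a bucket."""
    used = {bucket(c, n_buckets) for c in categories}
    # Each occupied bucket keeps one distinguishable identity; the rest are lost.
    return len(categories) - len(used)

# Hypothetical high-cardinality feature: 1,000 synthetic card identifiers.
cards = [f"card_{i}" for i in range(1000)]

for n in (16, 256, 4096):
    print(n, merged_categories(cards, n))
```

Raising the number of buckets trades extra columns for fewer merged categories; choosing that width is the main tuning decision in hashing encoding.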
Load Libraries

Read data

Use BinaryEncoder to convert the categorical feature ‘gender’ into numerical values.
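Binary encoding is a different technique from hashing: each category receives an integer code, and the binary digits of that code become the columns, so k categories need roughly log2(k) columns instead of k. The minimal stdlib sketch below illustrates the idea on a hypothetical gender column. It is not the BinaryEncoder class used in this walkthrough (whose import is not shown); that class may order categories and size columns differently.

```python
def binary_encode(values: list[str]) -> tuple[dict[str, int], list[list[int]]]:
    """Ordinal-encode categories (starting at 1), then spell each code in binary."""
    codes: dict[str, int] = {}
    for v in values:
        if v not in codes:
            codes[v] = len(codes) + 1  # first-seen order; code 0 is left unused
    # Enough bits to write the largest code.
    width = max(codes.values()).bit_length()
    rows = [[int(bit) for bit in format(codes[v], f"0{width}b")] for v in values]
    return codes, rows

gender = ["Male", "Female", "Female", "Male", "Other"]  # hypothetical column
codes, rows = binary_encode(gender)
for value, row in zip(gender, rows):
    print(value, row)
```

Here three categories fit in two columns, whereas one-hot encoding would need three.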


Use hash encoding to convert the categorical feature ‘card’ into hashed numerical columns.

Concatenate hashed_df with the original DataFrame and drop the original feature ‘card’.
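The hash-then-concatenate steps for ‘card’ can be sketched without any third-party package. This minimal version assumes MD5 hashing, a hypothetical width of 4 columns, and hypothetical transaction records stored as dictionaries instead of a DataFrame. Each record keeps its other fields, loses ‘card’, and gains columns card_0 to card_3.

```python
import hashlib

def hash_column(records: list[dict], column: str, n_components: int = 8) -> list[dict]:
    """Replace `column` in each record with n_components hashed indicator columns."""
    out = []
    for record in records:
        digest = hashlib.md5(str(record[column]).encode("utf-8")).digest()
        index = int.from_bytes(digest, byteorder="big") % n_components
        hashed = {f"{column}_{i}": int(i == index) for i in range(n_components)}
        # Keep every other field, drop the original column, append the hashed ones.
        kept = {k: v for k, v in record.items() if k != column}
        out.append({**kept, **hashed})
    return out

rows = [  # hypothetical transactions
    {"card": "card_001", "amount": 12.5},
    {"card": "card_002", "amount": 40.0},
    {"card": "card_003", "amount": 7.25},
]
encoded = hash_column(rows, "card", n_components=4)
print(encoded[0])
```

If df and hashed_df are pandas DataFrames sharing the same index (an assumption about the walkthrough's objects), the corresponding step would be pd.concat([df, hashed_df], axis=1).drop(columns="card").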

The remaining categorical feature is ‘type’, which contains 4 categories; hash-encode it in the same way as ‘card’.

Concatenate hashed_df with the original DataFrame and drop the original feature ‘type’.
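With only four categories, ‘type’ can be given a width at which no two categories collide. The minimal sketch below, assuming MD5 hashing and four hypothetical labels, searches for the smallest such width. It then hashes the records and drops the original field, mirroring the concatenate-and-drop step above.

```python
import hashlib

def bucket(category: str, n_buckets: int) -> int:
    """MD5-based bucket index in the range [0, n_buckets)."""
    digest = hashlib.md5(category.encode("utf-8")).digest()
    return int.from_bytes(digest, byteorder="big") % n_buckets

def smallest_collision_free(categories: list[str], limit: int = 1024) -> int:
    """Smallest bucket count (up to `limit`) that gives every category its own bucket."""
    for n in range(len(categories), limit + 1):
        if len({bucket(c, n) for c in categories}) == len(categories):
            return n
    raise ValueError("no collision-free bucket count within limit")

types = ["alpha", "beta", "gamma", "delta"]  # hypothetical 'type' labels
n = smallest_collision_free(types)

records = [{"type": t, "amount": float(i)} for i, t in enumerate(types, start=1)]
encoded = []
for record in records:
    index = bucket(record["type"], n)
    hashed = {f"type_{i}": int(i == index) for i in range(n)}
    # Drop the original 'type' field and append its hashed columns.
    kept = {k: v for k, v in record.items() if k != "type"}
    encoded.append({**kept, **hashed})

print(n, encoded[0])
```

The collision-free width may exceed the four columns one-hot encoding would need, which is why hashing pays off mainly for high-cardinality features such as ‘card’.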

