One Hot Encoding VS Label Encoding

Prasant Kumar
Aug 8, 2021

Early in my Machine Learning journey, I was really confused about which encoding technique to apply. Two questions kept coming to mind, and I suspect you have asked them yourself at some point:

  1. Why do we apply One Hot Encoding to one feature and Label Encoding to another?
  2. How do we decide which features get One Hot Encoding and which get Label Encoding?

First, let's understand the two types of encoding.

  1. Nominal Encoding

Nominal Encoding is applied to features whose categories are independent, meaning they have no natural order. Examples include Gender, Country, and Blood Group: no value of these features is “greater” or “better” than another, so they cannot be compared with each other.

Creating an example dataset

Here we use One Hot Encoding because it creates a separate column for each category. Each column holds 1 if the entry belongs to that category and 0 otherwise, so no category receives a larger number than another and the model cannot infer an order that does not exist.

One-Hot Encoding on Gender Column
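The same transformation can be sketched in plain Python. This is a minimal illustration, not the author's code: the `one_hot_encode` helper and the `gender` values are hypothetical, and ordering the columns alphabetically is an arbitrary choice. In practice, pandas (`get_dummies`) or scikit-learn (`OneHotEncoder`) handle this for you.

```python
# Minimal pure-Python sketch of one-hot encoding (hypothetical helper, for illustration only).
def one_hot_encode(values):
    """Return the sorted category list and one 0/1 row per input value."""
    categories = sorted(set(values))  # one output column per distinct category
    rows = [[1 if value == category else 0 for category in categories]
            for value in values]
    return categories, rows


# Hypothetical example data for a Gender column
gender = ["Male", "Female", "Female", "Male"]
columns, encoded = one_hot_encode(gender)
print(columns)  # ['Female', 'Male']
for value, row in zip(gender, encoded):
    print(value, row)  # e.g. Male [0, 1]
```

Each row contains exactly one 1, so every category sits at the same "distance" from every other.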

  2. Ordinal Encoding

Ordinal Encoding is applied to features whose categories are comparable, meaning they have a natural order. Examples include Qualifications, Economic Status, and Satisfaction Rating, whose values can be ranked from lower to higher.

Creating an example dataset

Here we use Label Encoding, which replaces each category with an integer label. When the labels follow the order of the categories, they can be compared with each other just like the categories themselves.

Taking Satisfaction Rating as an example, we replace “extremely dislike” with 0, “dislike” with 1, “neutral” with 2, “like” with 3, and “extremely like” with 4. A higher label now means a more satisfied customer, so satisfaction levels can be compared directly.
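The manual mapping for Satisfaction Rating can be sketched as a lookup table built from an explicit order. The `ordinal_encode` helper and the `responses` data are hypothetical; in scikit-learn, `OrdinalEncoder` accepts a `categories` parameter for supplying the order yourself.

```python
# Explicit order for the Satisfaction rating, lowest to highest (the 0..4 mapping above)
SATISFACTION_ORDER = ["extremely dislike", "dislike", "neutral", "like", "extremely like"]


def ordinal_encode(values, order):
    """Replace each category with its position in `order` (hypothetical helper)."""
    mapping = {category: label for label, category in enumerate(order)}
    return [mapping[value] for value in values]  # raises KeyError for unknown categories


# Hypothetical survey responses
responses = ["like", "neutral", "extremely dislike", "extremely like"]
labels = ordinal_encode(responses, SATISFACTION_ORDER)
print(labels)  # [3, 2, 0, 4]
```

Because the labels follow the stated order, comparisons such as `labels[0] > labels[1]` ("like" ranks above "neutral") remain meaningful.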

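Labels can be assigned to the categories automatically, or manually as above. Automatic encoders usually number the categories in sorted (alphabetical) order, which may not match the real ranking, so for ordinal features a manual mapping is often the safer choice.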
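An automatic encoder has no way of knowing the intended ranking. As a minimal sketch, assuming the encoder numbers categories in sorted order (scikit-learn's `LabelEncoder`, for instance, stores its classes sorted), the hypothetical `auto_label_encode` helper below shows what happens to the Satisfaction Rating labels.

```python
def auto_label_encode(values):
    """Assign integers to categories in sorted (alphabetical) order (hypothetical helper)."""
    classes = sorted(set(values))
    mapping = {category: label for label, category in enumerate(classes)}
    return classes, [mapping[value] for value in values]


ratings = ["extremely dislike", "dislike", "neutral", "like", "extremely like"]
classes, labels = auto_label_encode(ratings)
for rating, label in zip(ratings, labels):
    print(f"{rating!r} -> {label}")
# "extremely like" gets 2 while "neutral" gets 4: the alphabetical order breaks the ranking.
```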

What if we apply Label Encoding in place of One Hot Encoding, or vice versa?

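The model will treat an Independent Category feature as if it were a Comparable Category feature, or vice versa. Label-encoding Country, for example, tells the model that one country is “greater” than another, while one-hot encoding Satisfaction Rating throws away the fact that “like” ranks above “dislike”. Either mistake can distort what the model learns and undermine the goal of the use case.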
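One way to see the first mistake is to compare distances, which many models (for example distance-based or linear ones) rely on. The sketch below uses a hypothetical Blood Group feature: label encoding makes some groups look farther apart than others, while one-hot encoding keeps every pair of different groups equally far apart.

```python
import math

# Hypothetical nominal feature: Blood Group has no natural order.
blood_groups = ["A", "AB", "B", "O"]

# Label encoding: arbitrary integers 0..3 in list order
label = {group: i for i, group in enumerate(blood_groups)}

# One-hot encoding: one 0/1 vector per group
one_hot = {group: [1 if g == group else 0 for g in blood_groups] for group in blood_groups}


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


# Under label encoding, "A" looks three times farther from "O" than from "AB".
print(abs(label["A"] - label["AB"]), abs(label["A"] - label["O"]))  # 1 3

# Under one-hot encoding, every pair of different groups is sqrt(2) apart.
print(euclidean(one_hot["A"], one_hot["AB"]), euclidean(one_hot["A"], one_hot["O"]))
```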

So be careful when encoding your features: first identify each one as an Independent or a Comparable Category feature, then choose the encoding technique that fits it.
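That rule of thumb can be captured in a small helper. This is a hypothetical sketch, not a library function: you supply an `order` only for Comparable Category features, and anything without an order is treated as Independent and one-hot encoded. The example countries and qualifications are made up for illustration.

```python
def encode_feature(values, order=None):
    """Ordinal-encode if an order is given (Comparable), else one-hot encode (Independent)."""
    if order is not None:
        # Comparable categories: map each value to its rank in the given order
        rank = {category: i for i, category in enumerate(order)}
        return [rank[value] for value in values]
    # Independent categories: one 0/1 column per distinct category (sorted for stability)
    categories = sorted(set(values))
    return [[1 if value == category else 0 for category in categories] for value in values]


print(encode_feature(["India", "Japan", "India"]))  # [[1, 0], [0, 1], [1, 0]]
print(encode_feature(["Graduate", "High School"],
                     order=["High School", "Graduate", "Postgraduate"]))  # [1, 0]
```

The helper cannot decide for you whether a feature is Independent or Comparable; that judgment still comes from understanding the data.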

I hope this clears up your doubts about which encoding to choose for which feature.
