Difference between Normalization and Standardization: 2 ways of feature scaling your data, How to do it in Excel

Difference between Normalization and Standardization

Normalization and standardization are integral parts of data preprocessing. While processing data, we often encounter variables that are measured on different scales. Using the original scales can give more weight to variables that have a large range. To deal with this problem, we rescale the independent variables (feature scaling) so that all variables are on a comparable scale.

In this article we discuss two feature scaling methods: normalization and standardization. The two terms are sometimes used interchangeably, but they usually refer to different things.

Standardization

Standardization is also known as Z-score normalization. Features are rescaled so that their mean is 0 and their standard deviation is 1. The formula for rescaling is given below.

$$z = \frac{x - \mu}{\sigma}$$

Here x is the original value, \mu is the mean of the feature, and \sigma is its standard deviation.

It is useful for machine learning algorithms that assign weights to their inputs. It is also needed for algorithms that rely on distance measurements.
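As a minimal sketch of the Z-score calculation described above, the Python snippet below standardizes a small list of made-up numbers. The function name `standardize` and the sample values are illustrative assumptions, not part of the original case study. It uses the population standard deviation, which corresponds to Excel's STDEV.P; whether to use the sample version (STDEV.S) instead is a choice the article does not specify.

```python
import statistics

def standardize(values):
    """Return z-scores: (x - mean) / standard deviation for each value."""
    mean = statistics.fmean(values)
    # Population standard deviation (assumption); swap in statistics.stdev
    # if the sample standard deviation is preferred.
    std = statistics.pstdev(values)
    if std == 0:
        raise ValueError("cannot standardize a constant feature")
    return [(x - mean) / std for x in values]

# Illustrative values only: mean is 5 and population std is 2.
data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
z = standardize(data)
print(z)  # [-1.5, -0.5, -0.5, -0.5, 0.0, 0.0, 1.0, 2.0]
```

After the transformation the values have mean 0 and standard deviation 1, while their order is unchanged.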

Normalization

Normalization is also known as min-max normalization. In this method, values are rescaled to lie between 0 and 1. For every feature, the minimum value of that feature is transformed into 0, and the maximum value is transformed into 1. The equation is shown below.

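$$x' = \frac{x - x_{\min}}{x_{\max} - x_{\min}}$$

Here x is the original value, and x_{\min} and x_{\max} are the minimum and maximum values of the feature.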
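The min-max rescaling described in this section can be sketched in a few lines of Python. The function name `min_max_normalize` and the sample values are hypothetical and chosen only for illustration.

```python
def min_max_normalize(values):
    """Rescale values to [0, 1]: (x - min) / (max - min)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("cannot normalize a constant feature")
    return [(x - lo) / (hi - lo) for x in values]

# Illustrative values only.
scores = [10.0, 20.0, 25.0, 50.0]
scaled = min_max_normalize(scores)
print(scaled)  # [0.0, 0.25, 0.375, 1.0]
```

The smallest value becomes 0, the largest becomes 1, and everything else falls proportionally in between.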

Difference between Normalization and Standardization

We use normalization when we know that the distribution of our data does not follow a Gaussian distribution. It is helpful for algorithms that do not assume any particular distribution of the data, such as K-Nearest Neighbors and neural networks.

On the other hand, standardization is useful when the data follow a Gaussian distribution. Also, unlike normalization, standardization has no bounded range, so outliers, if any, are not forced into a fixed interval.
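To make the point about bounded versus unbounded ranges concrete, here is a small sketch that applies both methods to a made-up feature containing one extreme value. The data are hypothetical, and the population standard deviation is an assumption.

```python
import statistics

# Illustrative values only; the last one is a deliberate outlier.
values = [10.0, 12.0, 14.0, 16.0, 18.0, 1000.0]

# Min-max normalization: the result is always confined to [0, 1].
lo, hi = min(values), max(values)
normalized = [(x - lo) / (hi - lo) for x in values]

# Standardization: the result has no fixed lower or upper bound.
mean = statistics.fmean(values)
std = statistics.pstdev(values)  # population standard deviation (assumption)
standardized = [(x - mean) / std for x in values]

for x, n, z in zip(values, normalized, standardized):
    print(f"{x:7.1f}  min-max={n:.4f}  z={z:+.3f}")
```

With min-max scaling the outlier is pinned to exactly 1 and the five ordinary values are squeezed below 0.01. With standardization the outlier gets a z-score of roughly 2.24 rather than a fixed ceiling. Note that the outlier still shifts the mean and standard deviation, so neither method is fully immune to it.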

Example of Normalization and Standardization

In this example, we share a case study in which we have two groups and their KPIs. We want to rank the agents to see who is performing better. But there is a catch: the two groups do different kinds of jobs and have different performance thresholds, so it would be unfair to rank them by simply combining the raw performance of both groups. To make the comparison fair, we first normalize or standardize each group separately, then combine the data, and finally rank everyone on the basis of the new scaled number.

Group_Name Employee_Name KPI_1 RANK_before
Group-A Agent_1 155.9% 14
Group-A Agent_2 158.3% 12
Group-A Agent_3 150.7% 16
Group-A Agent_4 137.9% 25
Group-A Agent_5 114.0% 34
Group-A Agent_6 192.5% 1
Group-A Agent_7 191.2% 2
Group-A Agent_8 181.6% 3
Group-A Agent_9 177.7% 4
Group-A Agent_10 175.5% 5
Group-A Agent_11 171.8% 6
Group-A Agent_12 167.5% 7
Group-A Agent_13 165.5% 8
Sample dataset

We add two more columns, Standardization and Normalization, and calculate them according to the formulas given above. After that, we calculate the rank by both methods.

Group_Name Employee_Name KPI_1 RANK_before Standardization Normalization RANK_after_Standardization RANK_after_Normalization
Group-A Agent_1 155.9% 14 0.3 0.69 17 17
Group-A Agent_2 158.3% 12 0.4 0.71 13 13
Group-A Agent_3 150.7% 16 0.1 0.64 19 19
Group-A Agent_4 137.9% 25 -0.4 0.53 33 33
Group-A Agent_5 114.0% 34 -1.4 0.33 51 51
Group-A Agent_6 192.5% 1 1.9 1.00 1 1
Group-A Agent_7 191.2% 2 1.8 0.99 2 2
Group-A Agent_8 181.6% 3 1.4 0.91 4 4
Group-A Agent_9 177.7% 4 1.3 0.87 5 5
Group-A Agent_10 175.5% 5 1.2 0.85 6 6
Group-A Agent_11 171.8% 6 1.0 0.82 7 7
Group-A Agent_12 167.5% 7 0.8 0.79 8 8
Group-A Agent_13 165.5% 8 0.8 0.77 9 9
Group-A Agent_14 164.0% 9 0.7 0.76 10 10
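The workflow behind a table like this (scale each group separately, combine, then rank) can be sketched in Python as follows. The KPI values, agent names, and helper names are hypothetical and do not reproduce the article's dataset, which is shown only in part. Ties share the same rank, similar to Excel's RANK.EQ, and the population standard deviation is an assumption.

```python
import statistics

# Hypothetical KPI values (percent) for two groups; not the article's data.
kpi_by_group = {
    "Group-A": {"Agent_1": 150.0, "Agent_2": 180.0, "Agent_3": 120.0, "Agent_4": 170.0},
    "Group-B": {"Agent_5": 95.0, "Agent_6": 80.0, "Agent_7": 60.0},
}

def z_scores(values):
    mean, std = statistics.fmean(values), statistics.pstdev(values)
    return [(x - mean) / std for x in values]

def min_max(values):
    lo, hi = min(values), max(values)
    return [(x - lo) / (hi - lo) for x in values]

def scale_within_groups(data, scaler):
    """Apply the scaler to each group on its own, then pool the results."""
    scaled = {}
    for agents in data.values():
        names = list(agents)
        scaled.update(zip(names, scaler([agents[n] for n in names])))
    return scaled

def rank_desc(scores):
    """Rank 1 is the highest score; tied scores receive the same rank."""
    return {name: 1 + sum(other > s for other in scores.values())
            for name, s in scores.items()}

raw = {n: v for agents in kpi_by_group.values() for n, v in agents.items()}
rank_before = rank_desc(raw)
rank_after_std = rank_desc(scale_within_groups(kpi_by_group, z_scores))
rank_after_norm = rank_desc(scale_within_groups(kpi_by_group, min_max))

for name in raw:
    print(name, rank_before[name], rank_after_std[name], rank_after_norm[name])
```

In this made-up data, Agent_5 ranks fifth on the raw KPI but first after standardization, because it is the top performer within its own group. With min-max scaling, the best agent of every group receives exactly 1, so the group leaders tie for first place.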

We can infer the following observations from the table of results above:

  • In this example, the ranks from standardization and normalization are the same.
  • The new ranks differ from the ranks calculated earlier because they are based on the rescaled values rather than the raw KPI.

That is all for now on this topic.

You can read more on this topic in the articles below.

https://www.analyticsvidhya.com/blog/2020/04/feature-scaling-machine-learning-normalization-standardization/

https://www.geeksforgeeks.org/normalization-vs-standardization/
