Understanding and interpreting boxplots: How to create them in Excel


What is a box plot

A box plot, also known as a box and whisker plot, is used in exploratory data analysis. It shows the spread (variability) and the central tendency of a data set, giving a visual summary of how the data are distributed. It also makes it easy to compare two or more data sets.

What a box plot looks like

A box plot is a standardized way of showing the distribution of data based on the five-number summary: minimum, first quartile (Q1), median, third quartile (Q3), and maximum.

[Figure: Box plot structure]

PS: The image above is taken from https://www.simplypsychology.org

The image shows the five summary values: the minimum (min), Q1 (lower quartile), the median (Q2), Q3 (upper quartile), and the maximum (max). One whisker connects min to Q1 and another connects Q3 to max, while Q1 and Q3 form the two ends of a box that is divided by the median (Q2).

You can read this post to learn about the mean and median, and Q1 and Q3 are explained here.

  • Min – the minimum value in the data set.
  • Max – the maximum value in the data set.
  • Q1 – also known as the first quartile or 25th percentile; the middle value between the minimum and the median.
  • Median/Q2 – the middle value of the data set when it is sorted in ascending or descending order (with an even number of values, the average of the two middle values).
  • Q3 – also known as the third quartile or 75th percentile; the middle value between the median and the maximum.
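To make the five values concrete, here is a minimal sketch in Python that computes them with the standard library's statistics.quantiles (Python 3.8 or later). The function name five_number_summary and the sample values are illustrative assumptions, not anything Excel provides. Tools disagree slightly on how to compute quartiles: method="inclusive" interpolates the same way as Excel's QUARTILE.INC function, and method="exclusive" the same way as QUARTILE.EXC. Excel's Box and Whisker chart also offers an inclusive or exclusive median option for its quartile calculation, so it is worth checking which one your chart uses.

```python
import statistics

def five_number_summary(data, method="inclusive"):
    """Return (min, Q1, median, Q3, max) for a sequence of numbers.

    method is passed to statistics.quantiles: "inclusive" interpolates like
    Excel's QUARTILE.INC, "exclusive" like QUARTILE.EXC.
    """
    values = sorted(data)
    # quantiles(..., n=4) returns the three cut points Q1, Q2 (median), Q3
    q1, median, q3 = statistics.quantiles(values, n=4, method=method)
    return values[0], q1, median, q3, values[-1]

# Hypothetical sample values, for illustration only
sample = [2, 4, 4, 5, 7, 8, 9, 11, 12]
print(five_number_summary(sample))                      # (2, 4.0, 7.0, 9.0, 12)
print(five_number_summary(sample, method="exclusive"))  # (2, 4.0, 7.0, 10.0, 12)
```

The two calls differ only in Q3 (9.0 versus 10.0), which is why quartiles read from different tools can disagree slightly for the same data.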

How to create a box plot in Excel

To create a box plot, first select the data, then go to Insert >> Insert Statistic Chart >> Box and Whisker. Excel creates a box plot like the one below. (The Box and Whisker chart is available in Excel 2016 and later.)

[Figure: Box plot in Excel comparing three samples]

How to read a box plot

The chart above compares three samples. sample_1 has the highest maximum, while sample_2 has the lowest minimum. sample_1 also has the highest median, whereas sample_2 and sample_3 have almost the same median, which suggests that sample_1 is likely to differ from the other two groups. The length of the box is the interquartile range (IQR = Q3 - Q1); it is smaller for sample_2 than for sample_1 and sample_3, which indicates that the middle 50% of the values in sample_2 vary less than in the other two samples. Note that the x marker inside each box shows the mean of that data set.
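The comparison above can be sketched in Python: compute the median, IQR and mean for each group, then ask which group has the highest median and which has the narrowest box. The sample values below are made up for illustration (they are not the data behind the chart), the helper name describe is hypothetical, and quartiles use the inclusive method.

```python
import statistics

def describe(values):
    """Return the quantities read off one box: min, max, median, IQR and mean."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "min": min(values),
        "max": max(values),
        "median": median,
        "iqr": q3 - q1,                    # length of the box
        "mean": statistics.fmean(values),  # shown as the x marker in Excel
    }

# Hypothetical values, made up for illustration only;
# they are NOT the data behind the chart above.
samples = {
    "sample_1": [12, 15, 17, 18, 20, 22, 25, 30],
    "sample_2": [5, 9, 10, 10, 11, 11, 12, 14],
    "sample_3": [6, 8, 9, 11, 12, 13, 15, 18],
}

summaries = {name: describe(vals) for name, vals in samples.items()}
for name, s in summaries.items():
    print(f"{name}: median={s['median']}, IQR={s['iqr']}, mean={s['mean']:.2f}")

# Reading the plot: which group sits highest, and which has the narrowest box?
highest_median = max(summaries, key=lambda n: summaries[n]["median"])
narrowest_box = min(summaries, key=lambda n: summaries[n]["iqr"])
print("Highest median:", highest_median)
print("Smallest IQR:", narrowest_box)
```

With these made-up values, sample_1 has the highest median and sample_2 the smallest IQR, the same pattern described for the chart.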

Advantages and disadvantages of box plots

No. | Advantages | Disadvantages
1 | Shows the distribution of the data set | Doesn't show individual values
2 | Gives the five-number summary at a glance | Can be used only for numerical data
3 | Outliers don't distort the box (median and quartiles) | The mode cannot be identified in a box plot
4 | Very good for comparing data sets |
Advantages and disadvantages of box plots
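The table notes that outliers don't distort the box. The sketch below makes that concrete using Tukey's fences, a common convention (assumed here, not confirmed for every tool) in which whiskers extend only to the most extreme points within 1.5 × IQR of the box and anything beyond is treated as a separate outlier point. In Excel's Box and Whisker chart, the Show outlier points option under Format Data Series controls whether such points are drawn. The function name tukey_outliers and the data are hypothetical.

```python
import statistics

def tukey_outliers(values, k=1.5):
    """Return (lower whisker end, upper whisker end, outliers) using Tukey's fences.

    Points below Q1 - k*IQR or above Q3 + k*IQR count as outliers.
    k = 1.5 is the common convention, assumed here.
    """
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence, high_fence = q1 - k * iqr, q3 + k * iqr
    inside = [v for v in values if low_fence <= v <= high_fence]
    outliers = [v for v in values if v < low_fence or v > high_fence]
    # Whiskers stop at the most extreme data points still inside the fences
    return min(inside), max(inside), outliers

# Hypothetical data with one extreme value (95)
data = [10, 12, 13, 14, 15, 16, 18, 95]
print(tukey_outliers(data))  # (10, 18, [95])

# The median stays with the bulk of the data, but the mean
# (the x marker in Excel) is pulled up by the single outlier
print(statistics.median(data))  # 14.5
print(statistics.fmean(data))   # 24.125
```

Here the box and the median ignore the value 95, while the mean moves well above every other data point, which is the sense in which outliers don't affect the box.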

That’s all for now. We will meet again with some new posts.

You can read more about box plots from the link below.


