Box plot
A box plot, also known as a box-and-whisker plot, is a standardized way of displaying the distribution of data based on a five-number summary: minimum, first quartile (Q1), median, third quartile (Q3), and maximum. It is a graphical representation used in statistics to show a distribution's central value, its variability, and its rough shape. Popularized by American statistician John Tukey in his 1977 book Exploratory Data Analysis, the box plot is a useful tool in exploratory data analysis for visually showing a distribution's skewness, spread, and outliers.
Overview
A box plot divides data into quartiles. The "box" spans the interquartile range (IQR), which is the distance between the first and third quartiles, so it contains the middle 50% of the data. The "whiskers" extend from either side of the box to show the range of the remaining data, typically reaching the most extreme data points that lie within 1.5 * IQR above the third quartile and below the first quartile. Data points outside this range are considered outliers and are often plotted as individual points.
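The 1.5 * IQR rule can be written out as a short derivation. The names "lower fence" and "upper fence" are labels introduced here for illustration and do not appear in the article itself.

```latex
\begin{aligned}
\mathrm{IQR} &= Q_3 - Q_1 \\
\text{lower fence} &= Q_1 - 1.5 \cdot \mathrm{IQR} \\
\text{upper fence} &= Q_3 + 1.5 \cdot \mathrm{IQR}
\end{aligned}
```

For example, with Q1 = 4 and Q3 = 10 the IQR is 6, so the lower fence is 4 - 9 = -5 and the upper fence is 10 + 9 = 19. Each whisker ends at the most extreme data point that still lies between these two fences, and any point beyond them is treated as an outlier.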
Components
- Minimum: The lowest data point excluding any outliers.
- First Quartile (Q1): Also known as the lower quartile, it is the median of the lower half of the dataset.
- Median: The middle value of the sorted dataset, or the mean of the two middle values when the number of data points is even.
- Third Quartile (Q3): Also known as the upper quartile, it is the median of the upper half of the dataset.
- Maximum: The highest data point excluding any outliers.
- Outliers: Data points that fall outside of the whiskers.
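The quartile definitions above can be sketched in Python. This is a minimal illustration, not a definitive implementation: the function name `quartiles` is hypothetical, and the sketch assumes that when the number of data points is odd, the middle value is left out of both halves. Other conventions exist and give slightly different quartiles.

```python
from statistics import median


def quartiles(values):
    """Return (Q1, median, Q3) using the median-of-halves convention.

    Assumption: for an odd number of points, the middle value is
    excluded from both the lower and the upper half.
    """
    data = sorted(values)
    if len(data) < 2:
        raise ValueError("need at least two data points")
    half = len(data) // 2
    lower = data[:half]               # values below the middle position
    upper = data[len(data) - half:]   # values above the middle position
    return median(lower), median(data), median(upper)


print(quartiles([7, 15, 36, 39, 40, 41]))  # (15, 37.5, 40)
```

With six values, the lower half is 7, 15, 36 and the upper half is 39, 40, 41, so Q1 is 15 and Q3 is 40, while the median is the mean of 36 and 39.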
Construction
To construct a box plot, the following steps are typically followed:
- Arrange the data in ascending order.
- Calculate the median, first quartile (Q1), and third quartile (Q3).
- Draw a box from Q1 to Q3 with a line at the median.
- Calculate the interquartile range (IQR) by subtracting Q1 from Q3.
- Draw whiskers from the box to the smallest and largest values within 1.5 * IQR from the quartiles.
- Plot any data points that fall outside of the whiskers as outliers.
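The numerical part of these steps can be sketched as follows. The function name `box_plot_stats` and the parameter `k` are hypothetical, and the sketch assumes the linear-interpolation quartiles of Python's `statistics.quantiles` with `method="inclusive"`, which can differ slightly from the median-of-halves definition given under Components. Drawing the box itself is left to a plotting library.

```python
from statistics import quantiles


def box_plot_stats(values, k=1.5):
    """Compute the numbers needed to draw a box plot.

    Assumption: quartiles come from statistics.quantiles with
    method="inclusive" (linear interpolation between data points).
    """
    data = sorted(values)                                   # ascending order
    q1, med, q3 = quantiles(data, n=4, method="inclusive")  # quartiles
    iqr = q3 - q1                                           # interquartile range
    low_fence, high_fence = q1 - k * iqr, q3 + k * iqr
    inside = [x for x in data if low_fence <= x <= high_fence]
    return {
        "q1": q1,
        "median": med,
        "q3": q3,
        "iqr": iqr,
        "whisker_low": inside[0],    # smallest value within the fences
        "whisker_high": inside[-1],  # largest value within the fences
        "outliers": [x for x in data if x < low_fence or x > high_fence],
    }


stats = box_plot_stats([1, 2, 3, 4, 5, 6, 7, 8, 9, 30])
print(stats["whisker_low"], stats["whisker_high"], stats["outliers"])
# prints: 1 9 [30]
```

In this example the quartiles are 3.25 and 7.75, so the IQR is 4.5 and the fences sit at -3.5 and 14.5. The value 30 lies beyond the upper fence and is reported as an outlier, while the upper whisker stops at 9.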
Variations
Several variations of the box plot exist. Notched box plots narrow the box around the median to provide a rough guide to the significance of the difference between medians: if the notches of two boxes do not overlap, the medians are likely to differ. Variable-width box plots make the width of each box proportional to the size of the group.
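The notch idea can be illustrated with a short sketch. The article does not say how notches are sized; the sketch assumes one commonly cited convention, median ± 1.57 * IQR / sqrt(n), and the function names `notch_interval` and `notches_overlap` are hypothetical. Quartiles are taken from `statistics.quantiles` with `method="inclusive"`, which is also an assumption.

```python
import math
from statistics import median, quantiles


def notch_interval(values, scale=1.57):
    """Return the (low, high) ends of the notch around the median.

    Assumption: half-width = scale * IQR / sqrt(n), with scale = 1.57.
    """
    data = sorted(values)
    q1, _, q3 = quantiles(data, n=4, method="inclusive")
    half_width = scale * (q3 - q1) / math.sqrt(len(data))
    m = median(data)
    return m - half_width, m + half_width


def notches_overlap(a, b):
    """True if the notch intervals of the two groups share any point."""
    low_a, high_a = notch_interval(a)
    low_b, high_b = notch_interval(b)
    return low_a <= high_b and low_b <= high_a


group_a = list(range(1, 21))    # 1..20
group_b = list(range(11, 31))   # 11..30, shifted well above group_a
print(notches_overlap(group_a, group_b))  # False
```

Non-overlapping notches are only a rough visual guide and do not replace a formal statistical test.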
Applications
Box plots are widely used in descriptive statistics, data analysis, and research. They are particularly useful for comparing distributions between several groups or sets of data. In fields such as economics, medicine, and engineering, box plots provide a compact and efficient way to understand and communicate the distribution of data.
Advantages and Disadvantages
Advantages:
- Concise representation of data.
- Easy to compare multiple distributions.
- Identifies outliers.
Disadvantages:
- Does not depict the distribution in as much detail as a histogram or kernel density plot.
- Can be misleading when the data has more than one peak, because clusters are hidden behind the same five-number summary.
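The loss of detail can be made concrete with two small datasets that have different shapes but the same five-number summary. This is an illustrative sketch: the function name `five_number_summary` and both datasets are invented for the example, and quartiles are assumed to come from `statistics.quantiles` with `method="inclusive"`. Neither dataset contains outliers under the 1.5 * IQR rule, so the raw minimum and maximum are used directly.

```python
from statistics import quantiles


def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) with inclusive quartiles."""
    data = sorted(values)
    q1, med, q3 = quantiles(data, n=4, method="inclusive")
    return (data[0], q1, med, q3, data[-1])


evenly_spread = [1, 2, 3, 4, 5, 6, 7, 8, 9]   # values spaced uniformly
two_clusters = [1, 3, 3, 3, 5, 7, 7, 7, 9]    # values bunched at 3 and 7

same = five_number_summary(evenly_spread) == five_number_summary(two_clusters)
print(same)  # True
```

Both datasets would produce identical box plots, whereas a histogram would show the two clusters in the second one.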
Credits: Most images are courtesy of Wikimedia Commons, and templates are from Wikipedia, licensed under CC BY-SA or similar.
Contributors: Prab R. Tumpati, MD