Defining Quartiles
Date: 07/20/2002 at 16:37:30
From: Tom
Subject: quartiles
Dear Dr. Math:
We have a project for statistics class where we have to collect a
set of data, then find the mean, median, mode, range, upper
quartile, lower quartile, interquartile range, and standard
deviation. We also have to plot the data in a stem-and-leaf plot,
dot plot, histogram, and box-and-whisker plot.
I decided to collect data on the heights of the players on our
soccer team, and got the following data:
{70", 71", 71", 71", 72", 73", 74", 74", 74", 74", 75", 75", 77",
77", 77", 82"}
I didn't have any problems until I was checking my work using my
calculator and a computer. All the values agreed with my hand
calculations except the upper quartile, lower quartile, and
interquartile range.
When I calculated the quartiles and IQR following the textbook, I
got 77" (UQ), 71" (LQ) and 6" (IQR). But when I plugged the values
into my calculator (a TI-83), it gave the upper quartile as 76", the
lower quartile as 71.5", and the IQR as 4.5". I then tried making an
Excel spreadsheet and it gave the upper quartile as 75.5", the lower
quartile as 71.75", and the IQR as 3.75". Then I went to the
computer lab at school and tried using Minitab. That program gave
the upper quartile as 76.5", the lower quartile as 71.25", and the
IQR as 5.25". If they just disagreed with my calculations, I'd
figure that I made a mistake, or there's some sort of rounding going
on, since we're told to take the nearest data point and these
programs obviously don't. But they don't even agree with each other.
They can't all be right! What's going on?
Please help clear up this mystery.
Thanks,
Tom
Date: 07/20/2002 at 16:45:49
From: Doctor Twe
Subject: Re: quartiles
Hi Tom! Thanks for writing to Dr. Math!
Quartiles are simple in concept but can be complicated in execution.
The concept of quartiles is that you arrange the data in ascending
order and divide it into four roughly equal parts. From top to
bottom: the upper quartile is the part containing the highest data
values, the upper middle quartile is the part containing the
next-highest data values, the lower middle quartile is the part
containing the next-lowest data values, and the lower quartile is
the part containing the lowest data values.
Here's where it starts to get confusing. The terms 'quartile', 'upper
quartile' and 'lower quartile' each have two meanings. One definition
refers to the subset of all data values in each of those parts. For
example, if I say "my score was in the upper quartile on that math
test", I mean that my score was one of the values in the upper
quartile subset (i.e. the top 25% of all scores on that test).
But the terms can also refer to cut-off values between the subsets.
The 'upper quartile' (sometimes labeled Q3 or UQ) can refer to a
cut-off value between the upper quartile subset and the upper middle
quartile subset. Similarly, the 'lower quartile' (sometimes labeled Q1
or LQ) can refer to a cut-off value between the lower quartile subset
and the lower middle quartile subset.
The term 'quartiles' is sometimes used to collectively refer
to these values plus the median (which is the cut-off value between
the upper middle quartile subset and the lower middle quartile
subset). John Tukey, the statistician who invented the box-and-
whisker plot, referred to these cut-off values as 'hinges' to avoid
confusion. Unfortunately, not everyone followed his lead on that.
It gets worse. Statisticians don't agree on whether the quartile
values ('hinges') should be points from the data set itself, or
whether they can fall between the points (as the median can when
there is an even number of data points). Furthermore, if the
quartile value is not required to be a point in the data set itself,
most data sets don't have a unique set of values {Q1, Q2, Q3} that
divides the data into four "roughly equal" portions. The SAS
statistical software package, for example, allows you to choose from
among five different methods for calculating the quartile values.
How then do we choose the "best" value for the quartiles?
The answer to that question depends in part on the statisticians'
objective in finding quartile values. Tukey wanted a method that was
simple to use, "without the aid of calculating machinery." Others
seek to minimize the bias in selecting the quartile values. Still
others want methods that can be extended to other quantiles (for
example, quintiles or percentiles). Thus, different methods have
been developed for calculating the quartile values.
Tukey's method for finding the quartile values is to find the median
of the data set, then find the median of the upper and lower halves
of the data set. If there is an odd number of values in the data
set, include the median value in both halves when finding the
quartile values. For example, if we have the data set:
{1, 4, 9, 16, 25, 36, 49, 64, 81}
we first find the median value, which is 25. Since there is an odd
number of values in the data set (9), we include the median in both
halves. To find the quartile values, we must find the medians of:
{1, 4, 9, 16, 25} and {25, 36, 49, 64, 81}
Since each of these subsets has an odd number of elements (5), we
use the middle value. Thus the lower quartile value is 9 and the
upper quartile value is 49.
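Tukey's rule as described above can be sketched in a few lines of
Python. This is only an illustration; the function name tukey_hinges
is made up for this example and does not come from any package.

```python
from statistics import median

def tukey_hinges(values):
    """Lower and upper quartile values by Tukey's rule (hypothetical helper).

    When the number of values is odd, the median is counted in both halves.
    Assumes at least one value.
    """
    data = sorted(values)
    n = len(data)
    half = (n + 1) // 2        # size of each half; includes the median if n is odd
    lower = data[:half]        # lowest `half` values
    upper = data[n - half:]    # highest `half` values
    return median(lower), median(upper)

print(tukey_hinges([1, 4, 9, 16, 25, 36, 49, 64, 81]))  # (9, 49)
```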
The TI-83 uses a method described by Moore and McCabe (sometimes
referred to as "M-and-M") to find quartile values. Their method is
similar to Tukey's, but you *don't* include the median in either
half when finding the quartile values. Using M-and-M on the data set
above:
{1, 4, 9, 16, 25, 36, 49, 64, 81}
we first find that the median value is 25. This time we'll exclude
the median from each half. To find the quartile values, we must find
the medians of:
{1, 4, 9, 16} and {36, 49, 64, 81}
Since each of these data sets has an even number of elements (4), we
average the middle two values. Thus the lower quartile value is
(4+9)/2 = 6.5 and the upper quartile value is (49+64)/2 = 56.5.
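Here is a rough Python sketch of the M-and-M rule as described above.
The function name mm_quartiles is invented for illustration; it is
not a claim about how the TI-83 is programmed internally.

```python
from statistics import median

def mm_quartiles(values):
    """Lower and upper quartile values by the M-and-M rule (hypothetical helper).

    When the number of values is odd, the median is left out of both halves.
    Assumes at least two values.
    """
    data = sorted(values)
    n = len(data)
    half = n // 2              # size of each half; skips the median if n is odd
    lower = data[:half]        # lowest `half` values
    upper = data[n - half:]    # highest `half` values
    return median(lower), median(upper)

print(mm_quartiles([1, 4, 9, 16, 25, 36, 49, 64, 81]))  # (6.5, 56.5)
```

Applied to Tom's sixteen heights, the same function returns 71.5 and
76, the values his TI-83 reported.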
With each of the above methods, the quartile values are always
either one of the data points, or exactly halfway between two data
points.
Those methods involve only simple arithmetic and are easily
extendable to octiles (eighths), hexadeciles (sixteenths), etc. They
are not, however, extendable to quintiles (fifths) or percentiles
(hundredths), etc. Furthermore, they tend to have a high bias. (That
is, the quartile values calculated on subsets of the data set tend
to vary more, and are not good predictors of the quartile values of
the entire data set.)
Mendenhall and Sincich, in their text _Statistics for Engineering
and the Sciences_, define a different method of finding quartile
values. To apply their method on a data set with n elements, first
calculate:
L = (1/4)(n+1)
and round to the nearest integer. If L falls halfway between two
integers, round up. The Lth element is the lower quartile value.
Next calculate:
U = (3/4)(n+1)
and round to the nearest integer. If U falls halfway between two
integers, round down. The Uth element is the upper quartile value.
So for our example data set:
{1, 4, 9, 16, 25, 36, 49, 64, 81}
n = 9, so
L = (1/4)(9+1) = 2.5
which becomes 3 after rounding up. The lower quartile value is the
3rd data point, 9. Similarly:
U = (3/4)(9+1) = 7.5
which becomes 7 after rounding down. The upper quartile value is the
7th data point, 49.
Using this method, the upper and lower quartile values are always
two of the data points.
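The rounding rules above (halves round up for L and down for U) can
be written out as a small Python sketch. The name ms_quartiles is
hypothetical, and the floor/ceil trick is just one way to get the
rounding described; it is not taken from the textbook.

```python
import math

def ms_quartiles(values):
    """Lower and upper quartile values by the rounding rule above (hypothetical helper).

    Positions are 1-based. Assumes at least one value.
    """
    data = sorted(values)
    n = len(data)
    # L = (n+1)/4, rounded to the nearest integer, halves rounding up.
    lower_pos = math.floor((n + 1) / 4 + 0.5)
    # U = 3(n+1)/4, rounded to the nearest integer, halves rounding down.
    upper_pos = math.ceil(3 * (n + 1) / 4 - 0.5)
    return data[lower_pos - 1], data[upper_pos - 1]

print(ms_quartiles([1, 4, 9, 16, 25, 36, 49, 64, 81]))  # (9, 49)
```

For Tom's sixteen heights, L = 4.25 rounds to 4 and U = 12.75 rounds
to 13, giving 71 and 77, the same values he got by hand.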
Minitab uses the same method, except it doesn't round the values of
L and U. Instead, it uses linear interpolation between the two
closest data points. For our example above, instead of rounding L to
3, Minitab would let L = 2.5 and find the value halfway between the
2nd and 3rd data points. In our example, that would be (4+9)/2 =
6.5. Similarly, with U = 7.5, the upper quartile value would be
halfway between the 7th and 8th data points, which would be
(49+64)/2 = 56.5. If L were 2.25, Minitab would find the value one
fourth of the way between the 2nd and 3rd data points, and if L were
2.75, Minitab would find the value three fourths of the way between
the 2nd and 3rd data points.
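The interpolation step can be sketched in Python as follows. Both
function names are invented for this example, and the sketch simply
follows the description above rather than Minitab's actual code. It
assumes at least three values, so that both positions fall inside
the data.

```python
import math

def value_at(data, pos):
    """Value at a 1-based, possibly fractional, position in sorted data."""
    lo = math.floor(pos)
    frac = pos - lo
    if frac == 0:
        return data[lo - 1]                 # position lands exactly on a data point
    # Move `frac` of the way from the lo-th point toward the next one.
    return data[lo - 1] + frac * (data[lo] - data[lo - 1])

def interpolated_quartiles(values):
    """Quartile values at positions (n+1)/4 and 3(n+1)/4, without rounding."""
    data = sorted(values)
    n = len(data)                           # assumed to be at least 3
    return value_at(data, (n + 1) / 4), value_at(data, 3 * (n + 1) / 4)

print(interpolated_quartiles([1, 4, 9, 16, 25, 36, 49, 64, 81]))  # (6.5, 56.5)
```

With Tom's heights the positions are 4.25 and 12.75, which give
71.25 and 76.5, matching the Minitab output he described.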
Excel uses a method described by Freund and Perles, which almost no
one else uses. To apply this method on a data set with n elements,
Excel first calculates L = (1/4)(n+3). The Lth element is the lower
quartile value. If L is not an integer, Excel uses linear
interpolation. Next it calculates U = (1/4)(3n+1). The Uth element
is the upper quartile value. If U is not an integer, Excel again
uses linear interpolation. So for our example data set:
{1, 4, 9, 16, 25, 36, 49, 64, 81}
n = 9, so
L = (1/4)(9+3) = 3
The lower quartile value is the 3rd data point, 9.
U = (1/4)(3*9+1) = 7
The upper quartile value is the 7th data point, 49.
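The positions L = (1/4)(n+3) and U = (1/4)(3n+1) can be tried out
with a short Python sketch. The function names are hypothetical and
the code only mirrors the description above; it is not Excel's own
implementation.

```python
import math

def value_at(data, pos):
    """Value at a 1-based, possibly fractional, position in sorted data."""
    lo = math.floor(pos)
    frac = pos - lo
    if frac == 0:
        return data[lo - 1]                 # position lands exactly on a data point
    # Move `frac` of the way from the lo-th point toward the next one.
    return data[lo - 1] + frac * (data[lo] - data[lo - 1])

def fp_quartiles(values):
    """Quartile values at positions (n+3)/4 and (3n+1)/4, with interpolation."""
    data = sorted(values)
    n = len(data)                           # assumed to be at least 1
    return value_at(data, (n + 3) / 4), value_at(data, (3 * n + 1) / 4)

print(fp_quartiles([1, 4, 9, 16, 25, 36, 49, 64, 81]))  # (9, 49)
```

For Tom's heights, n = 16 gives L = 4.75 and U = 12.25, so the
sketch returns 71.75 and 75.5, the values his spreadsheet showed.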
As we can see, these methods sometimes (but not always) produce the
same results. To further illustrate, consider the following data
sets:
A = {1, 2, 3, 4, 5, 6, 7, 8}
B = {1, 2, 3, 4, 5, 6, 7, 8, 9}
C = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10}
D = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11}
Here are the upper and lower quartile values, as calculated by each
method described above:
             Tukey    M&M      M&S    Minitab   Excel
             -----    ---      ---    -------   -----
Set A  LQ:    2.5     2.5       2      2.25      2.75
       UQ:    6.5     6.5       7      6.75      6.25
Set B  LQ:    3.0     2.5       3      2.50      3.00
       UQ:    7.0     7.5       7      7.50      7.00
Set C  LQ:    3.0     3.0       3      2.75      3.25
       UQ:    8.0     8.0       8      8.25      7.75
Set D  LQ:    3.5     3.0       3      3.00      3.50
       UQ:    8.5     9.0       9      9.00      8.50
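The same kind of disagreement shows up inside a single tool. As a
quick check on Tom's soccer data, Python's standard library
(statistics.quantiles, Python 3.8 and later) offers two
interpolation rules. This is only a sketch and says nothing about
how Minitab or Excel are written; on this data the 'exclusive' rule
happens to reproduce the Minitab values Tom reported and the
'inclusive' rule reproduces his Excel values.

```python
from statistics import quantiles

# Tom's sixteen player heights, in inches.
heights = [70, 71, 71, 71, 72, 73, 74, 74, 74, 74, 75, 75, 77, 77, 77, 82]

# Each call returns the three cut points [Q1, median, Q3].
exclusive = quantiles(heights, n=4, method='exclusive')
inclusive = quantiles(heights, n=4, method='inclusive')

print(exclusive)  # [71.25, 74.0, 76.5]
print(inclusive)  # [71.75, 74.0, 75.5]

# Interquartile range under each rule.
print(exclusive[2] - exclusive[0])  # 5.25
print(inclusive[2] - inclusive[0])  # 3.75
```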
For more information on how and why different software packages
calculate the quartile values, check out:
Ancillary Notes on Quartiles
http://wwwmaths.murdoch.edu.au/units/c503a/unitnotes/boxhisto/quartilesmore.html
Ticky-Tacky Boxes
http://exploringdata.cqu.edu.au/ticktack.htm
Quartiles: How to calculate them?
(This is a document in Microsoft Word format)
http://www-wl.itss.nerc.ac.uk/products/sas/doc/quartiles.doc
I hope this helps! If you have any more questions, write back!
- Doctor TWE, The Math Forum
http://mathforum.org/dr.math/
Date: 07/20/2002 at 16:51:50
From: Tom
Subject: Thank you (quartiles)
Thanks, Dr. Math! That really helps clear things up!
Tom
© 1994-2008 The Math Forum