
Giving quantitative measures of center (median and/or mean) and variability (interquartile range and/or mean absolute deviation), as well as describing any overall pattern and any striking deviations from the overall pattern with reference to the context in which the data were gathered.


Understanding Center and Variability in Data 🧮

Imagine you and your friends are comparing how many minutes you spend on your favorite video game each day. Some of you might play a lot, some just a little. How can we describe the group’s playing habits in a fair, clear way—not just by looking at one person? That is where the ideas of center and variability of data come in.

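In this lesson, you learn how to find measures of center (the mean and the median) and measures of variability (the interquartile range and the mean absolute deviation), and how to describe the overall pattern of a data set and any striking deviations from it in context.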

These ideas help you make sense of real life: from sports scores 🏀 to test results, from sleep hours to steps on a fitness tracker.

The dot plot in [Figure 1] shows how data values line up on a number line and helps us see center, spread, and unusual points.

Figure 1: A simple dot plot on a number line showing daily hours of video game play for a small group of students, with dots above whole-number values like 0, 1, 2, 3, 4.

1. Data and Distributions

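Data are pieces of information, usually numbers, that tell us about something. For example, the minutes each student reads after school, the steps counted by a fitness tracker, and the scores on a quiz are all data.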

When we collect many data values and display them (for example, in a dot plot, line plot, or histogram), we get a distribution. A distribution shows how often each value appears.
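As a small illustration of how a distribution records how often each value appears, the Python sketch below counts the values in a list and prints a text version of a dot plot. The `hours` list is made-up data for illustration, not data from this lesson.

```python
from collections import Counter

# Hypothetical data: daily hours of video game play for eight students.
hours = [0, 1, 1, 2, 2, 2, 3, 4]

# Count how often each value appears.
counts = Counter(hours)

# Print one row per whole-number value, with one mark per student.
for value in range(min(hours), max(hours) + 1):
    print(value, "*" * counts[value])
```

The longest row of marks shows the most common value, and a row far from the others would show an unusual value.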

The distribution helps us answer questions like: What is a typical value? How spread out are the values? Are any values far away from the rest?

To describe a distribution in a useful way, we usually talk about its center, its variability, its overall pattern, and any striking deviations from that pattern.

2. Measures of Center: Mean and Median

Center gives us a number that describes what is “in the middle” or “typical” for the data.

2.1 Mean (the “fair share” average)

The mean is what most people call the “average.” You find it by:

  1. Adding all the data values.
  2. Dividing by how many values there are.

If the data values are 3, 4, and 7, the sum is \(3 + 4 + 7 = 14\) and there are 3 values, so the mean is \(14 / 3 \approx 4.67\).

You can think of the mean as the fair share. If you took all the “points” or “amounts” and shared them equally among all the data points, each one would have the mean.

Example (concept): If three friends have 2, 4, and 6 pieces of candy, that is 12 pieces total. If they share fairly, each friend gets 4 pieces. So the mean number of candies is 4.
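The two steps for the mean can be sketched in a few lines of Python. The function name `mean` is just a label chosen for this sketch; the data are the candy counts and the values 3, 4, and 7 from the text.

```python
def mean(values):
    """Fair-share average: add all the values, then divide by how many there are."""
    return sum(values) / len(values)

candies = [2, 4, 6]
print(mean(candies))              # 12 pieces shared by 3 friends
print(round(mean([3, 4, 7]), 2))  # 14 divided by 3, rounded to two decimals
```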

2.2 Median (the middle value)

The median is the middle number when the data are lined up in order from smallest to largest.

  1. Put the data values in order.
  2. If there is an odd number of values, the median is the middle one.
  3. If there is an even number of values, the median is the average of the two middle numbers.

The median is good for describing what is “typical” when there are a few really large or really small values that might pull the mean up or down.
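As a rough sketch, the three steps for the median might look like this in Python. The function name and the two small lists are made up for illustration.

```python
def median(values):
    """Middle value of the ordered data (average of the two middle values if the count is even)."""
    ordered = sorted(values)          # Step 1: put the values in order
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:                    # Step 2: odd count, take the middle value
        return ordered[mid]
    # Step 3: even count, average the two middle values
    return (ordered[mid - 1] + ordered[mid]) / 2

print(median([6, 2, 4]))       # ordered: 2, 4, 6
print(median([7, 3, 4, 10]))   # ordered: 3, 4, 7, 10
```

The first list has an odd number of values, so its median is the middle value, 4. The second has an even number, so its median is the average of 4 and 7, which is 5.5.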

The box plot in [Figure 2] shows where the median sits in the middle of the data, along with quartiles and the IQR.

Figure 2: A box-and-whisker plot on a number line labeled with minimum, Q1, median, Q3, and maximum, clearly showing the box and whiskers.

3. Measures of Variability: IQR and MAD

Two data sets can have the same center but be very different. One can be tightly packed; another can be very spread out. Variability tells us how spread out the data are.

3.1 Interquartile Range (IQR)

The interquartile range (IQR) measures the spread of the middle half of the data.

To find the IQR:

  1. Order the data from smallest to largest.
  2. Find the median. If there is an odd number of values, leave the median itself out of both halves.
  3. Find the lower quartile (first quartile, Q1): the median of the lower half of the data.
  4. Find the upper quartile (third quartile, Q3): the median of the upper half of the data.
  5. Compute IQR using the formula: \[\textrm{IQR} = Q3 - Q1\]

The IQR tells you how wide the “middle 50%” of the data is. A small IQR means the middle data are close together. A large IQR means the middle data are more spread out.
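The steps above can be sketched in Python as follows. This sketch assumes the "median of each half" method described in this lesson, with the middle value left out of both halves when the count is odd. Library functions for quartiles may use other conventions and can give slightly different answers. The function names and the sample list are made up for illustration.

```python
from statistics import median

def quartiles(values):
    """Return (Q1, Q3) using the median-of-each-half method."""
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]
    # With an odd count, skip the middle value so it is in neither half.
    upper = ordered[half + 1:] if n % 2 == 1 else ordered[half:]
    return median(lower), median(upper)

def iqr(values):
    """Interquartile range: Q3 minus Q1."""
    q1, q3 = quartiles(values)
    return q3 - q1

data = [1, 3, 4, 6, 8, 9, 12, 15]   # hypothetical data, already in order
print(quartiles(data))
print(iqr(data))
```

For this list, the lower half is 1, 3, 4, 6 and the upper half is 8, 9, 12, 15, so Q1 is 3.5, Q3 is 10.5, and the IQR is 7.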

3.2 Mean Absolute Deviation (MAD)

The mean absolute deviation (MAD) tells us, on average, how far the data values are from the mean.

To find the MAD:

  1. Find the mean of the data.
  2. For each data value, find its distance from the mean. This distance is called the absolute deviation.
  3. Find the mean of all those absolute deviations.

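We can write this idea as a formula. Call the mean of the data \(m\), and call the \(n\) data values \(x_1, x_2, \ldots, x_n\). The distance of each value from the mean is \(|x_i - m|\), and the MAD is the mean of all those distances:

\[\textrm{MAD} = \frac{|x_1 - m| + |x_2 - m| + \cdots + |x_n - m|}{n}\]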

A small MAD means most values are close to the mean. A large MAD means values are often far from the mean.
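A minimal Python sketch of the MAD steps is shown below. The function name and the two short lists are chosen only for illustration.

```python
def mean_absolute_deviation(values):
    """Average distance of the values from their mean."""
    m = sum(values) / len(values)               # Step 1: the mean
    distances = [abs(x - m) for x in values]    # Step 2: distance of each value from the mean
    return sum(distances) / len(distances)      # Step 3: the mean of those distances

print(mean_absolute_deviation([4, 4, 4]))            # every value equals the mean
print(round(mean_absolute_deviation([2, 4, 6]), 2))  # distances are 2, 0, 2
```

When every value equals the mean, the MAD is 0. For 2, 4, 6 the distances are 2, 0, and 2, so the MAD is 4/3, or about 1.33.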

4. Overall Pattern and Striking Deviations

When we look at a graph of data, like a line plot, histogram, or box plot, we want to describe the overall pattern. We also want to notice any striking deviations—values that do not fit the pattern.

Some things to describe in the overall pattern are the shape of the distribution, where its center is, and how spread out the values are.

Striking deviations are unusual values, sometimes called outliers. These are points that are much higher or lower than most of the data.

When we spot a striking deviation, we always ask: What might this mean in the real-world context?

The line plot in [Figure 3] highlights one point far away from the rest, which is a striking deviation from the overall pattern.

Figure 3: A line plot of data clustered between 4 and 8 with a single dot at 15 clearly labeled as an outlier/striking deviation.

5. Worked Example 1: Minutes of Reading

Suppose five students track how many minutes they read after school one day:

Data (in minutes): 10, 20, 20, 30, 60

Step 1: Find the mean

Add the numbers: \(10 + 20 + 20 + 30 + 60 = 140\).

There are 5 data values.

Mean: \[\textrm{mean} = \frac{140}{5} = 28\]

So the mean reading time is 28 minutes.

Step 2: Find the median

The data are already in order: 10, 20, 20, 30, 60.

There are 5 values, so the median is the 3rd value.

Median = 20 minutes.

Notice: the mean (28) is higher than the median (20) because of the 60-minute reading time, which is much larger than the others.

Step 3: Find the IQR
  1. The median is 20, the middle value. Leaving it out, the lower half is 10, 20 and the upper half is 30, 60.
  2. Lower quartile Q1: median of 10 and 20. \[Q1 = \frac{10 + 20}{2} = 15\]
  3. Upper quartile Q3: median of 30 and 60. \[Q3 = \frac{30 + 60}{2} = 45\]
  4. Now find the IQR: \[\textrm{IQR} = Q3 - Q1 = 45 - 15 = 30\]

The middle half of the reading times covers 30 minutes (from 15 to 45).

Step 4: Find the MAD
  1. We already found the mean: 28.
  2. Find the distance from 28 for each value:

\(|10 - 28| = 18\), \(|20 - 28| = 8\), \(|20 - 28| = 8\), \(|30 - 28| = 2\), \(|60 - 28| = 32\)

Now find the mean of these distances:

Add them: \(18 + 8 + 8 + 2 + 32 = 68\)

There are 5 distances.

MAD: \[\textrm{MAD} = \frac{68}{5} = 13.6\]

This tells us that, on average, each student’s reading time is about 13.6 minutes away from the mean of 28 minutes.

Step 5: Describe the overall pattern and striking deviations (in context)

Center: A typical reading time is around 20–28 minutes. The median is 20 minutes, and the mean is 28 minutes.

Variability: The IQR is 30 minutes, and the MAD is about 13.6 minutes, so students’ reading times are pretty spread out.

Striking deviation: The 60-minute reading time is much larger than the others (10, 20, 20, 30). This one value pulls the mean up and makes the spread larger. In context, this could mean one student really loves reading that day 📚, or had extra time.
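The numbers in this example can be checked with a short Python script. This is only a sketch: it uses Python's standard `statistics` module for the mean and median, and the variable names are chosen for illustration. The quartiles follow this lesson's method of leaving the middle value out of both halves.

```python
from statistics import mean, median

minutes = [10, 20, 20, 30, 60]   # reading times from this example, already in order

m = mean(minutes)
med = median(minutes)

# Five values: the 3rd is the median, so the halves are the first two and last two values.
lower_half = minutes[:2]
upper_half = minutes[3:]
q1 = median(lower_half)
q3 = median(upper_half)
iqr = q3 - q1

# MAD: the mean of the distances from the mean.
distances = [abs(x - m) for x in minutes]
mad = mean(distances)

print("mean:", m, "median:", med)
print("Q1:", q1, "Q3:", q3, "IQR:", iqr)
print("distances:", distances, "MAD:", mad)
```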

6. Worked Example 2: Steps per Day

Six students count their steps using fitness trackers:

Data (steps): 4,000; 5,000; 5,000; 6,000; 7,000; 20,000

Write them as: 4000, 5000, 5000, 6000, 7000, 20000.

Step 1: Find the mean

Add the numbers:

4000 + 5000 + 5000 + 6000 + 7000 + 20000 = 46000

There are 6 values.

Mean: \[\textrm{mean} = \frac{46000}{6} \approx 7666.7\]

So the mean is about 7,667 steps.

Step 2: Find the median

The data in order: 4000, 5000, 5000, 6000, 7000, 20000.

With 6 values, the median is the average of the 3rd and 4th values.

3rd value = 5000; 4th value = 6000.

Median: \[\textrm{median} = \frac{5000 + 6000}{2} = 5500\]

A typical number of steps is around 5,500.

Step 3: Describe the pattern and striking deviations

Five of the six data values are between 4,000 and 7,000 steps.

One value, 20,000, is much larger than all the others. This is a striking deviation or possible outlier.

Because of this one large value, the mean (about 7,667) is higher than what most students actually walked. The median (5,500) may better describe a “typical” student’s steps here.

7. Worked Example 3: Test Scores

Seven students take a quiz. Their scores (out of 10) are:

Data: 5, 6, 7, 7, 8, 9, 9

Step 1: Find the mean

Add the scores: \(5 + 6 + 7 + 7 + 8 + 9 + 9 = 51\)

There are 7 scores.

Mean: \[\textrm{mean} = \frac{51}{7} \approx 7.29\]

The mean score is about 7.3 out of 10.

Step 2: Find the median

The scores are already in order: 5, 6, 7, 7, 8, 9, 9.

With 7 values, the median is the 4th value, which is 7.

Median score = 7.

Step 3: Find the IQR
  1. Median is 7 (4th value).
  2. Lower half: 5, 6, 7.
  3. Upper half: 8, 9, 9.

Lower quartile Q1: median of 5, 6, 7 → Q1 = 6.

Upper quartile Q3: median of 8, 9, 9 → Q3 = 9.

IQR: \[\textrm{IQR} = Q3 - Q1 = 9 - 6 = 3\]

The middle half of the scores is from 6 to 9.

Step 4: Describe pattern and variability

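Center: A typical score is about 7. The median is 7, and the mean is about 7.3.

Variability: The IQR is 3 points, so the middle half of the scores is fairly close together.

Striking deviations: There are none. All the scores are between 5 and 9, and no score is far from the rest.

This tells us the class performed fairly consistently on this quiz 🎯.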
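This example does not compute the MAD, but as an extra check on how consistent the scores are, it can be worked out with a few lines of Python. The variable names are chosen for illustration.

```python
scores = [5, 6, 7, 7, 8, 9, 9]   # quiz scores from this example

# Mean: add the scores and divide by how many there are.
m = sum(scores) / len(scores)

# MAD: the mean of the distances from the mean.
distances = [abs(x - m) for x in scores]
mad = sum(distances) / len(distances)

print(round(m, 2))
print(round(mad, 2))
```

The mean is about 7.29 and the MAD is about 1.18, so on average a score is a little more than 1 point away from the mean.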

8. Real-World Applications

These ideas of mean, median, IQR, MAD, overall pattern, and striking deviations appear everywhere in real life, from sports scores and test results to sleep hours and steps on a fitness tracker.

Knowing how to describe center and variability helps you understand when a number is “normal” for your situation or when something is surprising and worth a closer look 🔍.

9. Summary of Key Ideas ⭐

Data and distributions: Data are numbers that describe something. A distribution shows how those numbers are spread out.

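Center: The mean is the fair-share average, and the median is the middle value when the data are in order.

Variability: The IQR measures the spread of the middle half of the data, and the MAD is the average distance of the data values from the mean.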

Overall pattern: When you look at a graph, describe the shape, center, and spread of the data.

Striking deviations: Values that are far from most of the data are important to notice. In context, they might be mistakes, special events, or important discoveries.

By combining measures of center (mean and median) with measures of variability (IQR and MAD), and by paying attention to patterns and unusual values, you can turn raw numbers into clear stories about the world around you 🌍.
