Primer lesson: Giving quantitative measures of center (median and/or mean) and variability (interquartile range and/or mean absolute deviation), as well as describing any overall pattern and any striking deviations from the overall pattern with reference to the context in which the data were gathered.

Understanding Center and Variability in Data 🧮

Imagine you and your friends are comparing how many minutes you spend on your favorite video game each day. Some of you might play a lot, some just a little. How can we describe the group’s playing habits in a fair, clear way—not just by looking at one person? That is where the ideas of center and variability of data come in.

In this lesson, you will learn how to:

Describe the center of a data set using mean and median.
Describe how spread out the data are using interquartile range (IQR) and mean absolute deviation (MAD).
Talk about the overall pattern of a data set.
Notice and describe any striking deviations from that pattern, all while thinking about the real-world context the data come from.

These ideas help you make sense of real life: from sports scores 🏀 to test results, from sleep hours to steps on a fitness tracker.

The dot plot in [Figure 1] shows how data values line up on a number line and helps us see center, spread, and unusual points.

A simple dot plot on a number line showing daily hours of video game play for a small group of students, with dots above whole-number values like 0, 1, 2, 3, 4

1. Data and Distributions

Data are pieces of information, usually numbers, that tell us about something. For example:

The number of text messages you send each day.
The number of minutes you read each night.
The heights of students in your class.

When we collect many data values and display them (for example, in a dot plot, line plot, or histogram), we get a distribution. A distribution shows how often each value appears.

The distribution helps us answer questions like:

What values are common?
What values are rare?
Are the data tightly packed together or spread out?
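To make the idea of "how often each value appears" concrete, here is a minimal Python sketch. The data list `hours` and the helper names `frequency_table` and `text_dot_plot` are made up for illustration; they are not part of the lesson's data.

```python
from collections import Counter

# Hypothetical data: hours of video game play reported by ten students.
hours = [0, 1, 1, 2, 2, 2, 3, 3, 4, 2]


def frequency_table(values):
    """Return a dict mapping each value to how many times it appears."""
    return dict(sorted(Counter(values).items()))


def text_dot_plot(values):
    """Build a text dot plot: one row per value, one star per occurrence."""
    table = frequency_table(values)
    return "\n".join(f"{value}: " + "*" * count for value, count in table.items())


print(frequency_table(hours))
print(text_dot_plot(hours))
```

Rows with many stars are the common values, and rows with few stars are the rare ones.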

To describe a distribution in a useful way, we usually talk about:

Center – a number that represents a “typical” or “middle” value.
Variability – how spread out the data are.
Overall pattern – the general shape and trend.
Striking deviations – values that stand out as very different from the rest.

2. Measures of Center: Mean and Median

Center gives us a number that describes what is “in the middle” or “typical” for the data.

2.1 Mean (the “fair share” average)

The mean is what most people call the “average.” You find it by:

Adding all the data values.
Dividing by how many values there are.

If the data values are 3, 4, and 7, add them to get \(3 + 4 + 7 = 14\). There are 3 values, so the mean is \(14 / 3 \approx 4.67\).

You can think of the mean as the fair share. If you took all the “points” or “amounts” and shared them equally among all the data points, each one would have the mean.

Example (concept): If three friends have 2, 4, and 6 pieces of candy, that is 12 pieces total. If they share fairly, each friend gets 4 pieces. So the mean number of candies is 4.
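The two steps for the mean can be sketched in a few lines of Python. The function name `mean` is just a label chosen here; the candy counts come from the example above.

```python
def mean(values):
    """Add all the values, then divide by how many values there are."""
    return sum(values) / len(values)


candies = [2, 4, 6]      # the three friends' candy counts
print(mean(candies))     # 4.0, the fair share
print(mean([3, 4, 7]))   # 14 divided by 3
```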

2.2 Median (the middle value)

The median is the middle number when the data are lined up in order from smallest to largest.

Put the data values in order.
If there is an odd number of values, the median is the middle one.
If there is an even number of values, the median is the average of the two middle numbers.

The median is good for describing what is “typical” when there are a few really large or really small values that might pull the mean up or down.
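Here is one way to sketch the median steps in Python. The function name `median` and the two small data lists are made up for illustration.

```python
def median(values):
    """Return the middle value of the data once it is sorted."""
    ordered = sorted(values)          # put the data in order
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:                    # odd count: take the middle value
        return ordered[middle]
    # even count: average the two middle values
    return (ordered[middle - 1] + ordered[middle]) / 2


print(median([7, 3, 5]))       # odd number of values
print(median([8, 2, 4, 6]))    # even number of values
```

In the first list the sorted data are 3, 5, 7, so the median is 5. In the second list the two middle values are 4 and 6, so the median is their average, 5.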

The box plot in [Figure 2] shows where the median sits in the middle of the data, along with quartiles and the IQR.

A box-and-whisker plot on a number line labeled with minimum, Q1, median, Q3, and maximum, clearly showing the box and whiskers

3. Measures of Variability: IQR and MAD

Two data sets can have the same center but be very different. One can be tightly packed; another can be very spread out. Variability tells us how spread out the data are.

3.1 Interquartile Range (IQR)

The interquartile range (IQR) measures the spread of the middle half of the data.

To find the IQR:

Order the data from smallest to largest.
Find the median.
Find the lower quartile (first quartile, Q1): the median of the lower half of the data. If the number of values is odd, leave the median itself out of both halves.
Find the upper quartile (third quartile, Q3): the median of the upper half of the data.
Compute IQR using the formula: \[\textrm{IQR} = Q3 - Q1\]

The IQR tells you how wide the “middle 50%” of the data is. A small IQR means the middle data are close together. A large IQR means the middle data are more spread out.
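The IQR steps can be sketched in Python as shown here. This sketch assumes the "median of each half" rule described above, leaving the median out of both halves when the count is odd. Calculators and software sometimes use slightly different quartile rules, so their answers may not match exactly. The names `quartiles` and `iqr` and the sample data are made up for illustration.

```python
def median(values):
    """Return the middle value of the data once it is sorted."""
    ordered = sorted(values)
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:
        return ordered[middle]
    return (ordered[middle - 1] + ordered[middle]) / 2


def quartiles(values):
    """Return (Q1, Q3) as the medians of the lower and upper halves.

    Assumes at least two values. With an odd count, the middle value
    is left out of both halves.
    """
    ordered = sorted(values)
    n = len(ordered)
    lower_half = ordered[: n // 2]
    upper_half = ordered[(n + 1) // 2:]
    return median(lower_half), median(upper_half)


def iqr(values):
    """Return Q3 - Q1, the width of the middle half of the data."""
    q1, q3 = quartiles(values)
    return q3 - q1


sample = [1, 3, 5, 7, 9, 11, 13]   # hypothetical data
print(quartiles(sample))
print(iqr(sample))
```

For this sample the lower half is 1, 3, 5 and the upper half is 9, 11, 13, so Q1 is 3, Q3 is 11, and the IQR is 8.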

3.2 Mean Absolute Deviation (MAD)

The mean absolute deviation (MAD) tells us, on average, how far the data values are from the mean.

To find the MAD:

Find the mean of the data.
For each data value, find its distance from the mean. This distance is called the absolute deviation.
Find the mean of all those absolute deviations.

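We can write this idea as:

First find the mean of the data, call it \(m\). Then for each value \(x\), find its distance \(|x - m|\). The MAD is the mean of all those distances. If there are \(n\) values \(x_1, x_2, \ldots, x_n\), then: \[\textrm{MAD} = \frac{|x_1 - m| + |x_2 - m| + \cdots + |x_n - m|}{n}\]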

A small MAD means most values are close to the mean. A large MAD means values are often far from the mean.
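The three MAD steps translate almost directly into Python. This is a minimal sketch; the function names and the sample data are made up for illustration.

```python
def mean(values):
    """Add all the values, then divide by how many values there are."""
    return sum(values) / len(values)


def mean_absolute_deviation(values):
    """Return the average distance of the values from their mean."""
    m = mean(values)                           # step 1: find the mean
    distances = [abs(x - m) for x in values]   # step 2: distance of each value
    return mean(distances)                     # step 3: mean of the distances


sample = [2, 4, 6, 8]   # hypothetical data
print(mean_absolute_deviation(sample))
```

For this sample the mean is 5, the distances are 3, 1, 1, and 3, and their mean is 2.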

4. Overall Pattern and Striking Deviations

When we look at a graph of data, like a line plot, histogram, or box plot, we want to describe the overall pattern. We also want to notice any striking deviations—values that do not fit the pattern.

Some things to describe in the overall pattern:

Shape – Is it roughly symmetric, or does it lean to one side (skewed)?
Center – What is a typical value (mean or median)?
Spread – How wide are the data (range, IQR, MAD)?

Striking deviations are unusual values, sometimes called outliers. These are points that are much higher or lower than most of the data.

When we spot a striking deviation, we always ask: What might this mean in the real-world context?

Was there an error in measuring or recording?
Did something special happen that day?
Is this showing an important difference?

The line plot in [Figure 3] highlights one point far away from the rest, which is a striking deviation from the overall pattern.

A line plot of data clustered between 4 and 8 with a single dot at 15 clearly labeled as an outlier/striking deviation

5. Worked Example 1: Minutes of Reading

Suppose five students track how many minutes they read after school one day:

Data (in minutes): 10, 20, 20, 30, 60

Step 1: Find the mean

Add the numbers: \(10 + 20 + 20 + 30 + 60 = 140\).

There are 5 data values.

Mean: \[\textrm{mean} = \frac{140}{5} = 28\]

So the mean reading time is 28 minutes.

Step 2: Find the median

The data are already in order: 10, 20, 20, 30, 60.

There are 5 values, so the median is the 3rd value.

Median = 20 minutes.

Notice: the mean (28) is higher than the median (20) because of the 60-minute reading time, which is much larger than the others.

Step 3: Find the IQR

Median is 20. Leaving out the median, the lower half is 10, 20 and the upper half is 30, 60.
Lower quartile Q1: median of 10 and 20. \[Q1 = \frac{10 + 20}{2} = 15\]
Upper quartile Q3: median of 30 and 60. \[Q3 = \frac{30 + 60}{2} = 45\]
Now find the IQR: \[\textrm{IQR} = Q3 - Q1 = 45 - 15 = 30\]

The middle half of the reading times covers 30 minutes (from 15 to 45).

Step 4: Find the MAD

We already found the mean: 28.
Find the distance from 28 for each value:

For 10: distance is \(|10 - 28| = 18\)
For 20: distance is \(|20 - 28| = 8\)
For the second 20: also 8
For 30: distance is \(|30 - 28| = 2\)
For 60: distance is \(|60 - 28| = 32\)

Now find the mean of these distances:

Add them: \(18 + 8 + 8 + 2 + 32 = 68\)

There are 5 distances.

MAD: \[\textrm{MAD} = \frac{68}{5} = 13.6\]

This tells us that, on average, each student’s reading time is about 13.6 minutes away from the mean of 28 minutes.

Step 5: Describe the overall pattern and striking deviations (in context)

Center: A typical reading time is around 20–28 minutes. The median is 20 minutes, and the mean is 28 minutes.

Variability: The IQR is 30 minutes, and the MAD is about 13.6 minutes. Both are large compared with a typical reading time of 20 to 28 minutes, so students’ reading times are quite spread out.

Striking deviation: The 60-minute reading time is much larger than the others (10, 20, 20, 30). This one value pulls the mean up and makes the spread larger. In context, this could mean one student was reading a book they really loved that day 📚, or simply had extra time.
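If you want to check the hand calculations in this example, a short Python sketch like the one here can help. It assumes the same "median of each half" rule for quartiles used in Step 3. All function names are labels chosen for this sketch.

```python
def mean(values):
    return sum(values) / len(values)


def median(values):
    ordered = sorted(values)
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:
        return ordered[middle]
    return (ordered[middle - 1] + ordered[middle]) / 2


def iqr(values):
    """Q3 - Q1, leaving the median out of both halves when the count is odd."""
    ordered = sorted(values)
    n = len(ordered)
    q1 = median(ordered[: n // 2])
    q3 = median(ordered[(n + 1) // 2:])
    return q3 - q1


def mad(values):
    m = mean(values)
    return mean([abs(x - m) for x in values])


reading_minutes = [10, 20, 20, 30, 60]   # data from this worked example

summary = {
    "mean": mean(reading_minutes),
    "median": median(reading_minutes),
    "IQR": iqr(reading_minutes),
    "MAD": mad(reading_minutes),
}
for name, value in summary.items():
    print(f"{name}: {value}")
```

The printed values should agree with Steps 1 to 4: a mean of 28, a median of 20, an IQR of 30, and a MAD of 13.6.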

6. Worked Example 2: Steps per Day

Six students count their steps using fitness trackers:

Data (steps): 4,000; 5,000; 5,000; 6,000; 7,000; 20,000

Write them as: 4000, 5000, 5000, 6000, 7000, 20000.

Step 1: Find the mean

Add the numbers:

4000 + 5000 + 5000 + 6000 + 7000 + 20000 = 46000

There are 6 values.

Mean: \[\textrm{mean} = \frac{46000}{6} \approx 7666.7\]

So the mean is about 7,667 steps.

Step 2: Find the median

The data in order: 4000, 5000, 5000, 6000, 7000, 20000.

With 6 values, the median is the average of the 3rd and 4th values.

3rd value = 5000; 4th value = 6000.

Median: \[\textrm{median} = \frac{5000 + 6000}{2} = 5500\]

A typical number of steps is around 5,500.

Step 3: Describe the pattern and striking deviations

Five of the six data values are between 4,000 and 7,000 steps.

One value, 20,000, is much larger than all the others. This is a striking deviation or possible outlier.

Overall pattern: Most students walk about 4,000 to 7,000 steps.
Striking deviation in context: One student walked 20,000 steps—that might be a day with a long hike or sports tournament.

Because of this one large value, the mean (about 7,667) is higher than what most students actually walked. The median (5,500) may better describe a “typical” student’s steps here.

7. Worked Example 3: Test Scores

Seven students take a quiz. Their scores (out of 10) are:

Data: 5, 6, 7, 7, 8, 9, 9

Step 1: Find the mean

Add the scores: \(5 + 6 + 7 + 7 + 8 + 9 + 9 = 51\)

There are 7 scores.

Mean: \[\textrm{mean} = \frac{51}{7} \approx 7.29\]

The mean score is about 7.3 out of 10.

Step 2: Find the median

The scores are already in order: 5, 6, 7, 7, 8, 9, 9.

With 7 values, the median is the 4th value, which is 7.

Median score = 7.

Step 3: Find the IQR

Median is 7 (4th value).
Lower half: 5, 6, 7.
Upper half: 8, 9, 9.

Lower quartile Q1: median of 5, 6, 7 → Q1 = 6.

Upper quartile Q3: median of 8, 9, 9 → Q3 = 9.

IQR: \[\textrm{IQR} = Q3 - Q1 = 9 - 6 = 3\]

The middle half of the scores is from 6 to 9.

Step 4: Describe pattern and variability

Center: A typical score is around 7 (median) or 7.3 (mean).
Variability: The IQR is 3, which is not very large compared to the scale 0–10. The middle half of the scores fall within 3 points of each other.
Striking deviations: There are no extremely low or high scores that are far away from the rest. So there are no obvious striking deviations here.

This tells us the class performed fairly consistently on this quiz 🎯.
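Python's built-in `statistics` module can double-check these quiz results. One caution: `statistics.quantiles` uses its own default rule for quartiles, which is not the same as the "median of each half" rule in this lesson. For this data set the two rules happen to agree, but that will not always be true, so treat this as a quick check rather than a replacement for the hand method.

```python
import statistics

scores = [5, 6, 7, 7, 8, 9, 9]   # quiz scores from this worked example

mean_score = statistics.mean(scores)
median_score = statistics.median(scores)

# quantiles(..., n=4) returns the three cut points Q1, Q2 (the median), and Q3.
q1, q2, q3 = statistics.quantiles(scores, n=4)
iqr = q3 - q1

print("mean:", round(mean_score, 2))
print("median:", median_score)
print("Q1:", q1, "Q3:", q3, "IQR:", iqr)
```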

8. Real-World Applications

These ideas of mean, median, IQR, MAD, overall pattern, and striking deviations appear everywhere in real life:

Sports: To compare players, we look at average points per game (mean) and how consistent they are (variability). One “striking deviation” might be a super high-scoring game.
School: Teachers may look at test score distributions to understand how the class is doing. If one student has a score very different from everyone else, that is important to notice.
Health and fitness: Apps that track your steps, sleep, or heart rate use averages and spreads to show whether your week was typical or unusual.
Video games: Game designers examine data like average time spent on a level and how widely that time spreads to see if a level is too hard or too easy.
Social media: Companies look at average likes, comments, or watch time and notice outliers (viral posts) that are striking deviations from normal activity.

Knowing how to describe center and variability helps you understand when a number is “normal” for your situation or when something is surprising and worth a closer look 🔍.

9. Summary of Key Ideas ⭐

Data and distributions: Data are numbers that describe something. A distribution shows how those numbers are spread out.

Center:

Mean is the “fair share” average: add all values and divide by how many there are.
Median is the middle value when the data are ordered.

Variability:

IQR is the range of the middle half of the data: \(\textrm{IQR} = Q3 - Q1\).
MAD is the average distance of all data points from the mean.

Overall pattern: When you look at a graph, describe the shape, center, and spread of the data.

Striking deviations: Values that are far from most of the data are important to notice. In context, they might be mistakes, special events, or important discoveries.

By combining measures of center (mean and median) with measures of variability (IQR and MAD), and by paying attention to patterns and unusual values, you can turn raw numbers into clear stories about the world around you 🌍.
