Statistics & Data Analysis: Mean, Median, Mode, and the Correlation Coefficient
Mean, median, and mode each summarize a data set in one number, but they tell different stories. Also covered: correlation vs causation, and how to read a correlation coefficient.
One number to summarize many
A measure of center compresses a list of numbers into a single value that captures the “typical” or “middle” of the data. The Texas Algebra 1 CBE tests three measures of center, one measure of spread, and the basics of correlation.
Three measures of center
- Mean (average): Add all the values, then divide by how many there are. Sensitive to outliers.
- Median: Sort first, then take the middle value (or the average of the two middle values if the count is even). Resistant to outliers.
- Mode: The value that appears most often. A data set can have one mode, several modes, or none.
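The mode is the measure most likely to surprise, because a data set can have several modes or none. A minimal Python sketch of those three cases follows. The helper name `modes` and the sample lists are made up for illustration, and the "no mode when nothing repeats" rule is an assumed classroom convention (some textbooks word it differently).

```python
from collections import Counter

def modes(values):
    """Return the most frequent value(s); an empty list means no mode."""
    counts = Counter(values)          # how many times each value appears
    highest = max(counts.values())    # the largest count
    # Assumed convention: if no value repeats, report "no mode".
    if highest == 1:
        return []
    return sorted(v for v, c in counts.items() if c == highest)

print(modes([2, 3, 3, 5, 7]))   # [3]     one mode
print(modes([1, 1, 4, 6, 6]))   # [1, 6]  two modes
print(modes([2, 4, 6, 8]))      # []      no mode
```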
Worked: mean
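As a small illustration, take five made-up quiz scores (hypothetical numbers, not from any test): 70, 80, 85, 90, 95. Add them, then divide by the count. The same steps in Python, as a sketch:

```python
scores = [70, 80, 85, 90, 95]   # hypothetical quiz scores

total = sum(scores)     # 70 + 80 + 85 + 90 + 95 = 420
count = len(scores)     # 5 values
mean = total / count    # 420 / 5 = 84.0

print(mean)             # 84.0
```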
Worked: median
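The two cases (odd count, even count) can be sketched in Python as below. The function name `median` and both sample lists are invented for illustration. Python's standard library has its own `statistics.median`, and this version only spells out the steps.

```python
def median(values):
    ordered = sorted(values)    # step 1: always sort first
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:              # odd count: one middle value
        return ordered[mid]
    # even count: average the two middle values
    return (ordered[mid - 1] + ordered[mid]) / 2

print(median([9, 3, 7, 1, 5]))  # sorted: 1 3 5 7 9 -> 5
print(median([8, 2, 6, 4]))     # sorted: 2 4 6 8   -> (4 + 6) / 2 = 5.0
```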
Why median sometimes wins
Five salaries: 40k, 42k, 45k, 50k, 2,000k (the boss).
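Mean = 2,177k ÷ 5 ≈ $435k, which is misleading: four of the five people earn $50k or less. Median = $45k, a truer picture of a typical salary.
When data has extreme values (outliers), prefer the median.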
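To check the salary numbers, and to see how far a single outlier moves each measure, here is a short sketch using Python's built-in `statistics` module. The variable names are arbitrary, and the "without the boss" comparison is an extra step added for illustration.

```python
import statistics

salaries_k = [40, 42, 45, 50, 2000]   # thousands of dollars, from the example

mean_k = statistics.mean(salaries_k)       # 2177 / 5 = 435.4
median_k = statistics.median(salaries_k)   # middle of the sorted list = 45

# Remove the boss and recompute both measures.
without_boss = salaries_k[:-1]
mean_wo = statistics.mean(without_boss)      # 177 / 4 = 44.25
median_wo = statistics.median(without_boss)  # (42 + 45) / 2 = 43.5

print(mean_k, median_k)     # 435.4 45
print(mean_wo, median_wo)   # 44.25 43.5
```

One salary shifts the mean by about $391k but moves the median by only $1.5k.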
Correlation coefficient (r)
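The correlation coefficient r measures how closely the points of a scatter plot follow a straight line. It always falls between −1 and +1. The sign gives the direction of the trend (positive: y tends to rise as x rises; negative: y tends to fall as x rises), and |r| gives the strength: values near ±1 mean the points hug a line, and values near 0 mean little or no linear pattern.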
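On the exam, r usually comes from a calculator's linear regression output. To see where the number comes from, here is a minimal sketch of the standard Pearson formula in plain Python. The function name `pearson_r` and the data lists are made up for illustration.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations, and sums of squared deviations.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

hours = [1, 2, 3, 4, 5]            # hypothetical x-values
rising = [52, 60, 71, 79, 90]      # y goes up as x goes up
falling = [90, 79, 71, 60, 52]     # y goes down as x goes up

r_up = pearson_r(hours, rising)
r_down = pearson_r(hours, falling)
print(round(r_up, 3), round(r_down, 3))
```

Both lists lie almost on a straight line, so |r| comes out close to 1 in each case, and only the sign differs.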
Correlation vs causation
Two variables can move together (high correlation) without one causing the other. Ice cream sales and drowning rates both rise in summer, but ice cream doesn't cause drowning. Both go up with hot weather, a third variable that drives each of them. Showing causation generally takes a controlled experiment, not just observation.
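The ice cream example can be imitated with invented numbers. In the sketch below, both lists are built from temperature alone (plus a small fixed wiggle), so neither one depends on the other, yet r between them is still strongly positive. Every number, formula, and name here is hypothetical and chosen only for illustration.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical monthly temperatures (degrees F): the hidden third variable.
temperature = [50, 58, 66, 75, 84, 92]

# Each list depends only on temperature plus a small made-up wiggle.
ice_cream = [3 * t + e for t, e in zip(temperature, [5, -8, 4, -3, 7, -5])]
drowning_rate = [0.1 * t + e
                 for t, e in zip(temperature, [-0.5, 0.4, -0.3, 0.6, -0.4, 0.2])]

r = pearson_r(ice_cream, drowning_rate)
print(round(r, 2))   # strongly positive, even though neither causes the other
```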
3-second recap
- Mean = average. Median = middle (after sorting). Mode = most frequent.
- Outliers? Prefer the median.
- r in [−1, +1]: sign = direction, |r| = strength.
- Correlation ≠ causation.