Decoding the Numbers: How Linear Regression Reveals Hidden Relationships

Have you ever wondered if taller people really do have broader shoulders? Whether consuming more calories at Starbucks means consuming more carbs? Or if more attractive professors actually get better teaching evaluations?

These are more than just idle questions. They touch on fundamental patterns in nutrition, human physiology, and even unconscious bias in education. Beneath these questions lies a powerful statistical tool that can help us uncover meaningful relationships in data: linear regression.

In this article, we’ll explore how linear regression works, when to use it, and what it can tell us about the world around us. Using three real-world examples, we’ll see how this technique allows us to predict one variable from another and quantify just how strong these relationships really are.

What Is Linear Regression, Really?

At its heart, linear regression is a way to describe how one variable changes when another variable changes. It allows us to draw a “line of best fit” through scattered data points, creating a model that can help us make predictions.

Think of it as finding the trend line that fits the data as closely as possible: the least-squares line, which minimizes the sum of the squared vertical distances (the residuals) between each actual data point and the line.

The resulting equation generally looks like this:

ŷ = a + bx

Where:

ŷ is our predicted value
x is our input value
a is the y-intercept (where the line crosses the y-axis)
b is the slope (how much the predicted y changes when x increases by 1)
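To make “line of best fit” concrete: least squares has a closed-form answer for a and b. The slope is the sum of the cross-deviations of x and y divided by the sum of the squared deviations of x, and the intercept makes the line pass through the point of means. The sketch below is a minimal illustration, not code from the article; `fit_line` is a hypothetical helper name and the five data points are made up.

```python
def fit_line(xs, ys):
    """Return (a, b) for the least-squares line y_hat = a + b * x."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length with at least 2 points")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Slope: sum of cross-deviations divided by sum of squared x-deviations.
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    b = sxy / sxx
    # The least-squares line always passes through the point of means (x_bar, y_bar).
    a = y_bar - b * x_bar
    return a, b


# Hypothetical toy data, not taken from any of the case studies.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
a, b = fit_line(xs, ys)
print(f"y_hat = {a:.2f} + {b:.2f}x")
```

For this toy data the fitted line is roughly ŷ = 0.05 + 1.99x: each one-unit step in x raises the prediction by about 2.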

But rather than get lost in formulas, let’s see how this works with real data.

Case Study 1: Calories and Carbs at Starbucks

Next time you’re at Starbucks, look at the menu board. You’ll notice calorie counts prominently displayed — but what about carbohydrates? For people managing conditions like diabetes or following keto diets, this information is crucial.

Using a dataset of Starbucks menu items, researchers found a clear positive relationship between calories and carbohydrates:

Figure: The relationship between the number of calories and the amount of carbohydrates (in grams) in Starbucks food menu items.

The regression analysis revealed that for every additional calorie in a Starbucks item, the model predicts about 0.1 additional grams of carbohydrates. The equation is:

Carbs (g) = 10 + 0.1 × Calories

This means a Starbucks item with 300 calories is predicted to contain about 40 grams of carbs (10 + 0.1 × 300).

How reliable is this prediction? The residual plots (which show the differences between actual and predicted values) indicate that while the relationship is strong, there is still considerable variability around the line. A 300-calorie item might have 30 grams of carbs, or it might have 50, depending on how its calories are split among fat, protein, and carbohydrates.
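The prediction and residual arithmetic in this case study fits in a few lines of Python. This is a sketch using the article’s equation; `predict_carbs` is a hypothetical helper name, and the observed values of 30 g and 50 g are the illustrative items from the paragraph above, not real menu data.

```python
def predict_carbs(calories):
    """Predicted carbohydrates (g) from the article's Starbucks equation."""
    return 10 + 0.1 * calories


predicted = predict_carbs(300)  # 10 + 0.1 * 300
# A residual is observed minus predicted. These two observed values are the
# illustrative 300-calorie items from the text, not real menu data.
residuals = {observed: observed - predicted for observed in (30, 50)}
for observed, residual in residuals.items():
    print(f"observed {observed} g, predicted {predicted:.0f} g, residual {residual:+.0f} g")
```

Both items get the same 40 g prediction, but their residuals are −10 g and +10 g: the line describes the average trend, and the residuals capture how far individual items stray from it.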

This illustrates an important principle: regression gives us trends, not certainties. It’s a powerful tool for making educated predictions, but those predictions always come with a margin of error.

Case Study 2: Height and Shoulder Girth

Is there a relationship between how tall someone is and how broad their shoulders are? Researchers collected data from 507 physically active individuals, measuring both height and shoulder girth (the circumference around the deltoid muscles).

The data revealed a moderately strong positive relationship:

Figure: The relationship between height and shoulder girth (measured over the deltoid muscles), both in centimeters.

The regression equation was:

Height (cm) = 105.96 + 0.608 × Shoulder girth (cm)

This means that for each additional centimeter of shoulder girth, we predict an increase of about 0.61 centimeters in height.
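Here is the slope interpretation worked through with the article’s equation. The helper name `predict_height_cm` and the two girths (100 cm and 110 cm, both inside the data range) are illustrative choices, not values from the study.

```python
def predict_height_cm(shoulder_girth_cm):
    """Predicted height (cm) from the article's shoulder-girth equation."""
    return 105.96 + 0.608 * shoulder_girth_cm


# Two hypothetical adults whose shoulder girths differ by 10 cm.
for girth in (100, 110):
    print(f"shoulder girth {girth} cm -> predicted height {predict_height_cm(girth):.2f} cm")

# The gap between the predictions is the slope times the gap in girth: 0.608 * 10.
gap = predict_height_cm(110) - predict_height_cm(100)
print(f"difference in predicted height: {gap:.2f} cm")
```

The intercept cancels out when comparing two people, so a 10 cm difference in shoulder girth always corresponds to a 6.08 cm difference in predicted height.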

The correlation coefficient (r) was 0.67, which means the relationship is moderately strong. When we square this (R²), we get 0.45 — indicating that about 45% of the variability in height can be explained by shoulder girth.

What about the remaining 55%? That variability comes from factors not captured in our model, such as genetics, the ratio of leg length to torso length, and individual variations in body structure.

This highlights another key insight: partial explanation is still valuable.

Even though our model doesn’t predict height perfectly, using someone’s shoulder girth reduces the total squared prediction error by about 45% compared with simply guessing the average height for everyone.

Can We Use This Model for Anyone?

An important question arose: Could we use this model to predict the height of a one-year-old child with a 56 cm shoulder girth?

The answer is no, and this introduces a crucial concept in regression analysis: a model only applies within the range of data used to create it. Our data came from physically active adults with shoulder girths between 85 and 135 cm. Plugging in 56 cm gives 105.96 + 0.608 × 56 ≈ 140 cm, an absurd height for a one-year-old, because using the model for infants means extrapolating far beyond our data range.

Additionally, children have fundamentally different body proportions than adults, so the relationship between measurements follows entirely different patterns.

This teaches us an important lesson: know the limits of your model. Statistical relationships discovered in one population may not apply to others, and predictions should never be made far outside the range of your original data.
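One simple safeguard is to make a prediction function refuse inputs outside the range the model was fitted on. This is a minimal sketch, assuming the 85–135 cm range stated above is the full observed range; the function and constant names are hypothetical.

```python
# Shoulder-girth range of the adults in the original data, as stated in the text.
GIRTH_MIN_CM, GIRTH_MAX_CM = 85, 135


def predict_height_within_range(shoulder_girth_cm):
    """Predict height (cm), refusing inputs outside the range used to fit the model."""
    if not GIRTH_MIN_CM <= shoulder_girth_cm <= GIRTH_MAX_CM:
        raise ValueError(
            f"shoulder girth {shoulder_girth_cm} cm is outside the fitted range "
            f"{GIRTH_MIN_CM}-{GIRTH_MAX_CM} cm; the model should not be extrapolated"
        )
    return 105.96 + 0.608 * shoulder_girth_cm


print(f"{predict_height_within_range(110):.1f} cm")  # inside the range
try:
    predict_height_within_range(56)  # the one-year-old from the example
except ValueError as err:
    print(err)
```

A range check only catches one kind of misuse. It cannot tell that a person with an in-range measurement comes from a different population, such as older children with adult-sized shoulders, so it complements judgment about where the model applies rather than replacing it.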

Case Study 3: Beauty and Teaching Evaluations

Our final example touches on something more controversial: do students give better teaching evaluations to more attractive professors?

Researchers at the University of Texas at Austin collected data on teaching evaluation scores and standardized beauty scores for 463 professors. The beauty scores were standardized so that 0 represents average attractiveness, negative values are below average, and positive values are above average.

The data showed a statistically significant positive relationship.

The regression equation was:

Teaching evaluation = 4.010 + 0.133 × Beauty score

This means that for each one-unit increase in standardized beauty score (a substantial increase, such as moving from average to clearly above average), the predicted teaching evaluation score increases by 0.133 points.

Is this effect real, or could it be chance? The t-value of 4.13 and a p-value of effectively zero mean that a slope this large would be very unlikely to arise if beauty and evaluations were truly unrelated. The evidence strongly supports the conclusion that more attractive professors receive higher teaching evaluations, on average.
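To see why a t-value of 4.13 translates to a p-value of “effectively zero,” you can convert it directly. This sketch uses a standard normal approximation to the t distribution via `math.erfc`, which is reasonable here because a simple regression on 463 professors has 461 degrees of freedom; the exact t-based p-value would be slightly larger but of the same tiny order.

```python
import math

t_value = 4.13  # t statistic reported for the beauty slope
n = 463         # number of professors in the study
df = n - 2      # degrees of freedom for a simple linear regression

# Two-sided p-value using a standard normal approximation to the t distribution.
# erfc(t / sqrt(2)) equals 2 * P(Z > t) for a standard normal Z. With df this
# large the approximation is close to, and slightly below, the exact t-based value.
p_two_sided = math.erfc(t_value / math.sqrt(2))
print(f"df = {df}, approximate two-sided p-value: {p_two_sided:.1e}")
```

The result is on the order of a few in 100,000, which rounds to 0.0000 at four decimal places.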

However, the scatterplot shows enormous variability. Many professors with below-average beauty scores received excellent evaluations, and some attractive professors received poor ratings. Beauty is just one of many factors affecting evaluations, and presumably actual teaching effectiveness plays a much larger role.

This reveals another important insight about regression: statistical significance doesn’t always mean practical significance.

While there’s strong evidence of a beauty effect, its magnitude is relatively small compared to the overall variability in teaching evaluations.
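One way to gauge the practical size of the beauty effect is to compare predicted scores across a wide swing in beauty. This sketch uses the article’s equation; `predict_evaluation` is a hypothetical helper name.

```python
def predict_evaluation(beauty):
    """Predicted teaching evaluation from the article's beauty equation."""
    return 4.010 + 0.133 * beauty


for beauty in (-1, 0, 1):
    print(f"beauty score {beauty:+d}: predicted evaluation {predict_evaluation(beauty):.3f}")

# Moving from one unit below average to one unit above average:
print(f"gap: {predict_evaluation(1) - predict_evaluation(-1):.3f} points")
```

Even going from one unit below average to one unit above average shifts the predicted score by only 2 × 0.133 = 0.266 points, a gap that is easily swamped by the professor-to-professor scatter visible in the plot.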

What Makes a Good Regression Analysis?

Before using a regression model, we need to check if certain conditions are met:

Linearity: The relationship between variables should be linear.
Independence: Observations should be independent of one another.
Normality: The residuals (errors) should follow a normal distribution.
Equal variance: The spread of residuals should be consistent across the range of predicted values.

We typically check these using diagnostic plots, like those shown for the teaching evaluations study:

Figure: Diagnostic plots for the teaching evaluations model: residuals vs. beauty, histogram of residuals, normal Q-Q plot, and residuals vs. order of data collection.

When these conditions are reasonably satisfied, we can have confidence in our regression results.

From Data to Decisions: Why Regression Matters

Linear regression isn’t just a statistical technique — it’s a way of thinking about relationships in the world around us. By quantifying these relationships, we can:

Make informed predictions: Estimating carbohydrates from calories helps people make dietary choices.
Understand human proportions: The relationship between height and shoulder girth tells us how body measurements tend to scale together.
Reveal unconscious biases: The beauty-evaluation connection highlights how non-teaching factors may influence student ratings.

The real power of regression lies not in the formulas, but in how it helps us see patterns that might otherwise remain hidden.

The Limitations: What Regression Can’t Tell Us

Despite its power, regression has important limitations:

Correlation doesn’t imply causation: Just because two variables are related doesn’t mean one causes the other.
Extrapolation is dangerous: Predictions outside your data range are unreliable.
Models are simplifications: The real world is always more complex than our equations.
Data quality matters: Poor measurements or biased samples lead to misleading results.

Bringing It All Together

Linear regression gives us a lens to examine relationships between variables, whether we’re studying nutrition, human physiology, or educational bias. It allows us to make predictions based on patterns in data and quantify the strength of those relationships.

The next time you see a claim about how one factor relates to another, whether it’s calories and carbs, height and shoulder girth, or beauty and teaching evaluations, ask yourself:

How strong is this relationship?
How much variability exists around the trend?
Does the relationship apply to the specific situation I’m interested in?

By thinking like a statistician, you can move beyond anecdotes and intuition to make decisions based on data. And in a world overflowing with information, that’s a skill worth developing.

Create your own predictions with our Starbucks Carbohydrate Predictor Tool, based on the regression model discussed in this article: https://github.com/olimiemma/Starbucks-Carbohydrate-Predictor


About the Author

Emmanuel is a multidisciplinary expert who bridges data science, statistics, and practical applications in everyday contexts. With over two decades of experience as a software engineer, data analyst, and designer, he brings a unique perspective to statistical analysis and visualization that makes complex concepts accessible to everyone.

His approach to data analysis is distinguished by an ability to uncover meaningful patterns while maintaining a human-centered perspective. Rather than getting lost in technical formulas, Emmanuel focuses on how statistical tools like linear regression can provide practical insights into relationships that affect our daily lives — from nutrition and body measurements to educational assessment.

Emmanuel’s work combines rigorous statistical methods with nuanced interpretations that acknowledge real-world applications. He is passionate about transforming complex data into actionable insights that can inform both professional decisions and personal understanding.

Beyond technical analysis, Emmanuel creates tools like the Starbucks Carbohydrate Predictor that demonstrate how statistical models can be deployed in user-friendly applications.

His writing and podcasting work with institutions like MIT’s OpenCourseWare and Yeshiva University’s Katz School of Science and Health in Manhattan, New York, further extends his mission to bridge technical expertise with practical applications that benefit everyday users.

Discover more of Emmanuel’s work and hobbies: https://linktr.ee/olimiemma

Artificial IntelliTools