Beauty in the Classroom: What Really Drives Professor Evaluations?
A data-driven exploration of how appearance, gender, and other factors influence teaching evaluations

We’ve all been there at the end of a semester: faced with those standardized evaluation forms asking us to rate our professors. Most universities consider these evaluations essential for tenure decisions, promotion, and even annual performance reviews. But what factors are truly being measured in these evaluations? Are they valid indicators of teaching effectiveness, or are they influenced by factors entirely unrelated to a professor’s teaching abilities?
In this article, I’ll dive into an analysis of real evaluation data from the University of Texas at Austin to uncover what’s actually driving student ratings. The results may surprise you, and they might make you think differently about those evaluation forms.
The Dataset: Beauty and the Professor
The data comes from a fascinating study titled “Beauty in the classroom: instructors’ pulchritude and putative pedagogical productivity” by Hamermesh and Parker (2005). The researchers collected end-of-semester evaluations for 463 courses taught by 94 professors. Additionally, they had six students rate the professors’ physical appearance on a scale of 1–10.
The dataset contains numerous variables including professors’ gender, age, ethnicity, educational background, as well as course characteristics like class size and level. The central question: does a professor’s physical attractiveness impact their teaching evaluations?
Let’s start by looking at how students rated their professors.
How Do Students Rate Professors?

The first thing that jumps out is that student ratings are heavily skewed toward the high end of the scale. The evaluation scores range from 1 (very unsatisfactory) to 5 (excellent), but the vast majority of ratings fall between 3.5 and 5.0. This negative skew (or left skew) means that students generally give favorable evaluations, with relatively few giving low scores.
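If you’d like to reproduce this view, here’s a minimal sketch of the plot in R. It isn’t the exact code from the analysis; the data frame name `evals` and the column name `score` are assumptions about how the dataset is loaded.

```r
# Sketch: distribution of evaluation scores.
# Assumes the data are in a data frame `evals` with a numeric column `score`.
library(ggplot2)

ggplot(evals, aes(x = score)) +
  geom_histogram(binwidth = 0.25, fill = "steelblue", color = "white") +
  labs(x = "Evaluation score (1 = very unsatisfactory, 5 = excellent)",
       y = "Number of courses")
```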
This distribution suggests several possibilities:
- Students tend to be generous in their evaluations
- Universities generally hire qualified instructors
- Students who disliked a course might be less likely to complete evaluations
- The 5-point scale might not provide enough discrimination at the high end
This skewed distribution will be important to keep in mind as we explore what factors influence these ratings.
Does Beauty Matter in the Classroom?
Now for the core question: does a professor’s physical attractiveness affect their teaching evaluations? Let’s look at the relationship between beauty ratings and evaluation scores.

The initial scatter plot shows what appears to be a slight positive correlation between beauty ratings and evaluation scores. However, when I first plotted these data, I ran into a common data visualization problem: overplotting. Many professors had identical or very similar scores, so points stacked on top of one another and obscured the full pattern.
Using a technique called “jittering,” which adds small random displacements to each point, we get a better view of the data. Now we can see there’s a positive relationship, but it’s not as strong as we might have initially thought.
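In ggplot2, jittering is a single function call. Here’s a sketch, again assuming columns named `bty_avg` and `score`:

```r
# Sketch: jittered scatter plot of beauty rating vs. evaluation score.
# geom_jitter() adds small random offsets so overlapping points become visible.
library(ggplot2)

ggplot(evals, aes(x = bty_avg, y = score)) +
  geom_jitter(width = 0.1, height = 0.1, alpha = 0.5) +
  labs(x = "Average beauty rating", y = "Evaluation score")
```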

To quantify this relationship, I ran a linear regression model predicting evaluation scores based on beauty ratings. Here’s what I found:
score = 3.88 + 0.067 × beauty_rating
For each one-point increase in beauty rating, a professor’s predicted evaluation score increases by 0.067 points. This effect is statistically significant (p < 0.001), meaning an association this strong is very unlikely to arise from chance alone.
However, the model’s R-squared value is only 0.035, indicating that beauty ratings explain just 3.5% of the variation in teaching evaluation scores. While there is a real relationship, it’s a relatively weak one.
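The model itself is a one-liner in R. A sketch of the fit (variable names are assumptions):

```r
# Sketch: simple linear regression of evaluation score on beauty rating.
m_bty <- lm(score ~ bty_avg, data = evals)
summary(m_bty)  # slope estimate, its p-value, and Multiple R-squared
```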
Is It More Than Just Looks?
Beauty isn’t the only factor that might influence evaluations. Next, I explored how gender affects teaching ratings.
When I added gender to the model, I found something interesting:
score = 3.75 + 0.074 × beauty_rating + 0.172 × male
Not only does beauty remain significant, but gender emerges as an important factor too. Male professors receive ratings that are 0.172 points higher than female professors with equivalent beauty ratings. And interestingly, the beauty coefficient actually increased slightly when controlling for gender.
This suggests gender was acting as a confounding variable: in this dataset, male professors may have lower beauty ratings on average but higher teaching scores, a combination that would partially mask the relationship between beauty and evaluations in the simpler model.
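Adding gender is a small change to the model formula. A sketch, assuming `gender` is a factor with levels “female” and “male”, so the reported coefficient is the male-vs-female gap:

```r
# Sketch: multiple regression with beauty rating and gender as predictors.
m_bty_gen <- lm(score ~ bty_avg + gender, data = evals)
summary(m_bty_gen)  # the "gendermale" coefficient is the gap for otherwise similar professors
```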
The Full Picture: What Really Drives Evaluations?
To get a comprehensive understanding, I built a full model including all potential factors, then used backward selection to identify the most important predictors. The final model revealed seven significant factors:
score = 3.97 + 0.22(male) - 0.28(non-English) - 0.006(age) + 0.004(percent_eval) + 0.44(one_credit) + 0.049(beauty_avg) - 0.22(color_photo)
Let’s break down each of these factors:
- Gender: Male professors receive ratings 0.22 points higher than female professors.
- Language Background: Professors educated at non-English speaking institutions receive ratings 0.28 points lower.
- Age: Younger professors receive higher ratings (scores drop by about 0.006 points for each additional year of age).
- Evaluation Participation: Classes where more students complete evaluations get higher ratings.
- Course Credits: One-credit courses receive ratings 0.44 points higher than multi-credit courses.
- Beauty Rating: More attractive professors receive higher ratings.
- Photo Color: Professors with black and white photos receive higher ratings than those with color photos.
Together, these factors explain about 16.3% of the variation in teaching scores.
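For readers who want to try this themselves, here is a sketch of one way to run backward selection in R, using AIC via `step()`. This is a sketch rather than the exact procedure behind the final model above: the predictor names are assumptions about how the variables are coded, and backward selection can also be done by repeatedly dropping the least significant predictor by p-value.

```r
# Sketch: fit a full model, then prune it with AIC-based backward selection.
# Predictor names (language, cls_perc_eval, cls_credits, pic_color, ...) are assumptions.
m_full <- lm(score ~ gender + language + age + cls_perc_eval + cls_credits +
               bty_avg + pic_color + ethnicity + rank,
             data = evals)

m_final <- step(m_full, direction = "backward", trace = FALSE)
summary(m_final)  # the surviving predictors and their coefficients
```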
Class Size and Evaluation Participation
One particularly interesting relationship I discovered was between class size and evaluation participation.

The data shows a clear pattern: as class size increases, the percentage of students completing evaluations tends to decrease. Small classes (fewer than 50 students) often achieve participation rates above 80%, while very large classes (over 200 students) typically see rates closer to 60%.

This pattern holds regardless of whether courses are upper or lower level.
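A sketch of how this relationship can be visualized, assuming columns for enrollment (`cls_students`), evaluation participation (`cls_perc_eval`), and course level (`cls_level`):

```r
# Sketch: evaluation participation vs. class size, colored by course level.
library(ggplot2)

ggplot(evals, aes(x = cls_students, y = cls_perc_eval, color = cls_level)) +
  geom_point(alpha = 0.6) +
  geom_smooth(method = "loess", se = FALSE) +
  labs(x = "Class size (number of students)",
       y = "Percent of students completing evaluations",
       color = "Course level")
```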

This matters because if evaluations represent the opinions of a smaller proportion of students in larger classes, we might be getting a biased picture of teaching effectiveness in those courses.
Age and Beauty: An Uncomfortable Relationship
Another interesting finding emerged when I examined the relationship between professor age and beauty ratings.

The data reveals a negative correlation between age and beauty ratings. Younger professors tend to receive higher beauty ratings than older professors. This raises important questions about how our cultural beauty standards may be influencing teaching evaluations.
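Quantifying this is straightforward. A sketch, with column names again assumed:

```r
# Sketch: correlation and scatter plot of professor age vs. average beauty rating.
cor(evals$age, evals$bty_avg)  # expect a negative value

library(ggplot2)
ggplot(evals, aes(x = age, y = bty_avg)) +
  geom_jitter(alpha = 0.5) +
  geom_smooth(method = "lm", se = TRUE) +
  labs(x = "Professor age (years)", y = "Average beauty rating")
```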

If younger professors are perceived as more attractive, and more attractive professors receive higher teaching evaluations, this could create a systematic bias against older faculty members — completely unrelated to their teaching ability.
What Does This All Mean?
The analysis reveals several uncomfortable truths about teaching evaluations:
- Bias exists: Factors unrelated to teaching effectiveness — including gender, age, and physical appearance — significantly influence student evaluations.
- Appearance matters: Despite explaining only a small portion of the variance, physical attractiveness consistently predicts higher evaluation scores.
- Gender inequality persists: Male professors receive higher evaluations than female professors with equivalent qualifications and beauty ratings.
- Age plays a role: Younger professors tend to receive higher ratings, potentially due to both direct age effects and indirect effects through beauty perceptions.
- Course characteristics matter: One-credit courses and classes with high evaluation participation receive better ratings.
These findings should give us pause about how we use teaching evaluations in academia. If factors unrelated to teaching quality systematically influence student ratings, can these evaluations be fair measures for promotion, tenure, and salary decisions?
Beyond UT Austin: Can We Generalize?
It’s important to note some limitations of this study. The data comes from a single university, representing a specific institutional culture and student population. The findings might not generalize perfectly to other institutions with different demographics or evaluation systems.
Additionally, the final model explains only about 16.3% of the variation in evaluation scores. This means that over 80% of what determines a professor’s rating isn’t captured by these variables — presumably including actual teaching quality!
Rethinking Teaching Evaluations
Given these findings, how should universities approach teaching evaluations?
- Use multiple measures: Complement student evaluations with peer observations, teaching portfolios, and learning outcome assessments.
- Control for bias: Consider adjusting evaluation scores to account for known biases related to gender, age, and course characteristics.
- Redesign evaluation forms: Create questions that focus specifically on teaching behaviors rather than general satisfaction.
- Educate students: Make students aware of potential biases when completing evaluations.
- Context matters: Interpret evaluations within the context of class size, level, and participation rates.
Conclusion: Beauty is Only Skin Deep (But Its Impact Isn’t)
The data tells a clear story: what we think of as “teaching evaluations” are influenced by factors far beyond teaching effectiveness. Physical appearance, gender, age, and even the color of a professor’s photograph all play measurable roles in how students rate their instructors.
This doesn’t mean we should abandon student evaluations entirely. Students’ perspectives provide valuable feedback on their learning experiences. However, we should be thoughtful about how these evaluations are used, especially for high-stakes decisions like tenure and promotion.
The next time you fill out a teaching evaluation, take a moment to reflect: are you really evaluating the quality of instruction, or might your ratings be influenced by factors that have nothing to do with how well your professor taught?
The analysis in this article was performed using R and a dataset from the study “Beauty in the classroom: instructors’ pulchritude and putative pedagogical productivity” by Hamermesh and Parker (2005). The complete code and analysis can be found on my GitHub repository [ https://github.com/olimiemma/Beauty-in-the-Classroom-Analysis-of-Professor-Evaluation-Data ].
About the Author
For over a decade, Emmanuel has explored the intersection of data science, human behavior, and educational assessment, driven by a curiosity about how we evaluate and measure performance.
With more than twenty years of experience as a software engineer, data analyst, and storyteller, Emmanuel has applied statistical analysis to challenges across sectors including education, tech, marketing, and logistics. His work with global corporations and nonprofits like UNHCR and Right to Play has given him unique perspectives on how data impacts real-world decision-making.
This analysis of professor evaluations represents his ongoing interest in uncovering hidden biases in institutional assessment systems. By applying rigorous data science techniques to educational data, Emmanuel seeks to spark important conversations about how we measure teaching effectiveness and the unconscious factors that influence our judgments.
His passion for exploring the potential of data science and AI, shaped by real-world business acumen and a fascination with human decision-making, drives his writing and podcasting. His work with institutions like MIT’s OpenCourseWare and Yeshiva University’s Katz School of Science and Health in Manhattan, New York, further extends his mission to bridge technical expertise with practical applications that benefit everyday users.
Discover more of Emmanuel’s work and interests: https://linktr.ee/olimiemma
Did you find this analysis interesting? How might these findings change your approach to completing or interpreting teaching evaluations? Share your thoughts in the comments below!