Is there a relationship between weight and success in PhD programs?

TLDR: No. Details on caveats, background, methods and specific results below.

On June 2nd 2013, Geoffrey Miller suggested on Twitter that there might be a strong relationship between excess body fat and adverse outcomes in PhD programs, mediated by willpower.

Predictably, a firestorm of indignation ensued, leading Miller to delete the tweet, lock his account and apologize.

This unfortunate episode raises a whole host of issues, including whether scientists should tweet (a question that many scientists, including Miller, answer in the affirmative). Another one is the state of educational attainment in PhD programs – only slightly more than half of those who enter finish with a PhD within 10 years. A third concern is the inherent communication structure of Twitter itself. It is almost impossible to lay out any argument with the necessary nuance. Not in 140 characters. Thus, there is a clear bias towards absolute statements. Combined with the potential for global (re)sharing of a tweet, this is a recipe for this kind of thing to happen over and over again, particularly if one were to hold non-PC beliefs (defined as those that are likely to inflame passions). The strong form of Miller’s thesis (likely molded by Twitter’s 140 character limit) could be refuted by a single counterexample, a fact that was not lost on those who took exception to the statement.

This leads to the core of the matter, which is the thesis itself. A lot of people found the claim implausible even at face value. Yet, it is not inconceivable to craft a rationale that is internally consistent. For instance, one could argue that grad school is inherently stressful and taxes one’s willpower to the max (as evidenced by the extremely high attrition rates). If one further allows that willpower might be a finite resource (as is indeed suggested by leading figures in willpower research) and also assumes that increased body fat is a good proxy for lack of willpower, one might pose the hypothesis that those with limited willpower resources might be at risk for higher attrition rates.

However, in the complex world of social science, nothing is ever simple or straightforward. This is not physics. For instance, one could plausibly argue that people will anticipate the extremely high willpower demands of grad school and focus on their studies at the expense of everything else, willpower being a limited resource and all (as an aside, this is what happened to me – I gained 50 pounds in grad school and didn’t even really notice it). Indeed, there are so many potential confounds that one would have a hard time coming up with any clear-cut prediction. Are people who enter PhD programs (a highly self-selective population) as athletic as the rest of the population (on average)? Are they outside as much? (There are known links between weight and Vitamin D levels.) In other words, the link between body weight, the probability of entering a PhD program to begin with and other lifestyle choices is extremely complex, even in the absence of the willpower connection. Ironically, recent research suggests that eating those carbs might *help* find the willpower to finish that PhD, but only if you believe in it.

So what is one to make of this? The tried and true way to settle issues of this kind in the past several hundred years has been an empirical one. If the question hasn’t already been answered in the literature, collect some data and see which model of reality is most likely, given the data.

As this issue has – to my knowledge – not been answered within existing literature, one has to collect fresh data. Fortunately, the internet now affords the capability of rapid data collection. I was in the lucky position to obtain a suitable dataset. To be clear, even with the wonders of the internet age, it is much easier to make a sweeping statement than to pursue it scientifically. For instance, obesity (the category in the original tweet) is defined by body mass index (BMI). This is an extremely simple measure that normalizes body weight by height (weight in kilograms divided by the square of height in meters), but has many obvious shortcomings. There is mounting evidence that body fat percentage, not BMI per se, would make a better marker of obesity (an issue that is compounded by the fact that muscle is denser than fat, and thus weighs more at the same volume). However, BMI is a measure that is extremely easy to assess, whereas body fat percentage is not. Most people do not have ready access to a DEXA scan or any of the other semi-reliable measurement methods. So BMI is still widely used. A more serious problem is the fact that the data needed to address the issue is the BMI at the beginning of a program, not the current BMI of people who obtained their PhD. However, I would not trust anyone to reliably recall their weight over such long time periods, so the data would have to be collected prospectively – as people enter a PhD program (which is prohibitively costly in terms of resources). A reasonable response? Perhaps to use the data one has, not the data one might want or wish to have, and see how far one gets. An even more serious issue is the self-selective nature of survey participation. Given the intensity of the controversy, combined with the fact that some people probably feel insecure about their weight as well as educational outcomes in a systematic fashion, this is a serious concern. However, it should be noted that the vast majority of psychological studies suffer from this problem. Samples of convenience are the norm, representative samples rare. This doesn’t keep people from publishing high profile papers. And yes, everyone knows at this point (that’s a good thing!) that surveys are not experiments, so the data will be entirely correlational in nature.
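For readers who want the BMI definition spelled out, here is a minimal sketch. The function names are made up for illustration, and the cut-offs (18.5, 25, 30) are the conventional adult ones; that the survey analysis used exactly these boundaries is an assumption.

```python
def bmi(weight_kg: float, height_m: float) -> float:
    """Body mass index: weight in kilograms divided by height in metres squared."""
    if weight_kg <= 0 or height_m <= 0:
        raise ValueError("weight and height must be positive")
    return weight_kg / height_m ** 2


def bmi_category(value: float) -> str:
    """Map a BMI value to a conventional adult category label.

    The cut-offs below are the commonly used ones; whether the survey
    analysis used exactly these boundaries is an assumption.
    """
    if value < 18.5:
        return "UW"  # underweight
    if value < 25.0:
        return "NW"  # normal weight
    if value < 30.0:
        return "OW"  # overweight
    return "OB"      # obese


# Hypothetical person: 70 kg, 1.75 m tall.
example = bmi(70.0, 1.75)
print(round(example, 1), bmi_category(example))
```

Note that nothing in this formula distinguishes muscle from fat, which is exactly the shortcoming mentioned above.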

Being mindful of all these concerns, what do we have?

167 people who claimed to have started and finished (or dropped out from) a PhD program participated in the survey. Of these, 161 answered questions about both their time to completion and BMI. The data of one participant was excluded because she reported a BMI of 3.8, which is incompatible with life (probably a typo), leaving data from 160 participants in the analysis.

The descriptives (see figure 1) look plausible. This is what I would expect a time to completion histogram for PhDs to look like (although I would like to meet the person who claims to have finished within a single year). Given the existing literature on PhD completion rates, the only notable thing is the relative absence of ABDs. But who can blame them? Perhaps this is a sore subject that they would rather not revisit. BMIs ranged from 18 to 53, which also seems reasonable.


Figure 1. y-axis: Counts. x-axis left panel: Time to completion (PhD). A = Abandoned. Right panel: BMI

So far, so good. Onward to the all-important relational analysis. The correlation between time to completion and BMI is a whopping -0.035, p = 0.67. In other words, there is absolutely no trace of a linear relationship between these variables. More evidence of absence than absence of evidence. Note that one can actually legitimately use a Pearson correlation here, as both scales (time to completion and BMI) have the necessary qualities. However, I will spare us a scatterplot here, as the “time to completion” scale is rather granular and the datapoints basically form a big blob. Put differently, the lack of observed correlation is not due to a few outliers.
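For those who want to check the arithmetic, the following is a rough sketch of a Pearson correlation and its significance test using only the Python standard library. The function names are hypothetical, and the p-value uses a normal approximation to the t distribution, which is a simplification that should be adequate for a sample of this size. It is not necessarily how the numbers above were computed.

```python
import math
from statistics import NormalDist, mean


def pearson_r(xs, ys):
    """Pearson product-moment correlation of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 3:
        raise ValueError("need two sequences of equal length, n >= 3")
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


def approx_two_sided_p(r, n):
    """Approximate two-sided p-value for H0: no linear relationship.

    Uses t = r * sqrt((n - 2) / (1 - r^2)) and, as a simplification,
    a normal approximation to the t distribution (fine for large n).
    """
    t = r * math.sqrt((n - 2) / (1.0 - r * r))
    return 2.0 * (1.0 - NormalDist().cdf(abs(t)))


# Plugging in the values reported in the post (r = -0.035, n = 160).
p = approx_two_sided_p(-0.035, 160)
print(round(p, 2))
```

With the reported r and n, this approximation lands close to the reported p-value; small differences are expected from rounding of r and from the normal approximation.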

Figure 2. y-axis: BMI. x-axis: Time to completion. S: Short (1-3 years, n = 18), N: "Normal" (4-6 years, n = 103), L: Long (7+ years, n = 33), A: Abandoned (n=6). Error bars represent standard error of the mean.

There is no statistically reliable difference in terms of BMI between those who took longer for their PhD and those who finished in a “normal” timeframe. The same goes for those who abandoned their hopes of a PhD (p = 0.38). If anything, one could make an argument that those who finished fastest (within 3 years) were somewhat on the heavier side, but there is a lot of variation in this group. And who knows? Maybe they finished fastest because they knew what they were doing, suggesting that they were older students and statistically likely to be heavier (BMI increases with age, on average).
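The grouping and error bars in figure 2 can be sketched as follows. The records are hypothetical, not the survey data, and the code assumes that time to completion was reported in whole years, with `None` standing in for an abandoned PhD.

```python
import math
from statistics import mean, stdev


def completion_group(years):
    """Bin time to completion as in figure 2; None means abandoned."""
    if years is None:
        return "A"
    if years <= 3:
        return "S"  # short: 1-3 years
    if years <= 6:
        return "N"  # "normal": 4-6 years
    return "L"      # long: 7+ years


def mean_and_sem(values):
    """Return (mean, standard error of the mean) for at least two values."""
    return mean(values), stdev(values) / math.sqrt(len(values))


# Hypothetical (years, BMI) records, NOT the survey data.
records = [(3, 27.0), (5, 22.5), (5, 24.0), (6, 31.0), (8, 23.5),
           (9, 26.0), (2, 29.5), (None, 25.0), (None, 21.0)]

groups = {}
for years, bmi_value in records:
    groups.setdefault(completion_group(years), []).append(bmi_value)

for label in ("S", "N", "L", "A"):
    m, sem = mean_and_sem(groups[label])
    print(f"{label}: n={len(groups[label])} mean={m:.1f} sem={sem:.2f}")
```

The standard error shrinks with the square root of the group size, which is why small groups come with wide error bars.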

Now, one could bemoan the fact that most people (more than half of this sample) finished their PhD within 4-6 years and are thus put into a single group here. Besides, the original claim was about obesity. So let’s revisit the data with flipped axes in figure 3.

Figure 3. y-axis: Time to completion in years. x-axis: UW = Underweight (n = 1), NW = Normal weight (n = 89), OW = Overweight (n = 37), OB = Obese (n = 27). Error bars represent standard error of the mean.

Looking at this, one could spin an exciting yarn about how underweight people – spending all their precious willpower resources on staying thin – take much longer than average to finish their PhD. Alas, the underweight “group” in this sample consisted of a single person, so this would not be a legitimate claim. As above, there is no evidence of a linear relationship. A direct comparison between the normal weight group (mean time to completion 5.3 years) and the obese group (mean time to completion 5.0 years) shows no significant difference (p = 0.35), and the “trend” is in the wrong direction.
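One way to run a two-group comparison like this without distributional assumptions is a permutation test. The sketch below is an illustration only: the post does not say which test produced p = 0.35, the function name is made up, and the data are hypothetical values chosen so that the group means match the reported 5.3 and 5.0 years.

```python
import random
from statistics import mean


def permutation_test(a, b, n_resamples=10_000, seed=0):
    """Two-sided permutation test for a difference in group means.

    Returns (observed difference mean(a) - mean(b), estimated p-value).
    """
    rng = random.Random(seed)
    observed = mean(a) - mean(b)
    pooled = list(a) + list(b)
    n_a = len(a)
    hits = 0
    for _ in range(n_resamples):
        # Reassign group labels at random and recompute the difference.
        rng.shuffle(pooled)
        diff = mean(pooled[:n_a]) - mean(pooled[n_a:])
        if abs(diff) >= abs(observed) - 1e-12:
            hits += 1
    # Add-one correction keeps the estimated p-value strictly above zero.
    return observed, (hits + 1) / (n_resamples + 1)


# Hypothetical times to completion in years, NOT the survey data.
normal_weight = [5, 6, 4, 5, 7, 5, 6, 4, 5, 6]
obese = [5, 4, 6, 5, 5, 4, 6, 5]

diff, p_value = permutation_test(normal_weight, obese)
print(f"difference in means: {diff:.2f} years, p = {p_value:.3f}")
```

The p-value is the share of random relabelings that produce a difference at least as large as the observed one, so it answers the question "how surprising is this gap if group membership did not matter?"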

There were other questions in the survey, such as time since PhD and whether body weight was stable since. If there was any clear trend, this information would allow one to weight more reliable datapoints (from people with recent PhDs and unchanged weights) more heavily. Similarly, one could do an analysis by gender or field. But I don’t see any reason to water down the power of the dataset by such tessellations at this point, as the dataset is not large enough for that kind of thing. But others can explore this, if they want to. Speaking of power: I expect a charge that this investigation is underpowered. It is. Yet, I would have expected *some* trace of some relationship, if it were real. Also, the power issue is easy to remedy. Simply collect more data. Moreover, it should be understood that this is what could be done in a few days. Certainly, people are free to do that, perhaps with refined hypotheses (although I’m not optimistic for strong effects, given all the above).
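To put rough numbers on the power question, here is a sketch based on the Fisher z-transform approximation for a correlation test. The function names and the effect sizes tried are assumptions for illustration; this is not an analysis that was run for this post.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def correlation_power(r, n, alpha=0.05):
    """Approximate power of a two-sided test of H0: rho = 0.

    Based on the Fisher z-transform, where atanh(r) is roughly normal
    with standard error 1 / sqrt(n - 3).
    """
    z_crit = _STD_NORMAL.inv_cdf(1.0 - alpha / 2.0)
    shift = abs(math.atanh(r)) * math.sqrt(n - 3)
    # Probability of landing in either rejection region.
    return _STD_NORMAL.cdf(shift - z_crit) + _STD_NORMAL.cdf(-shift - z_crit)


def required_n(r, power=0.8, alpha=0.05):
    """Approximate sample size needed to detect a true correlation r."""
    z_crit = _STD_NORMAL.inv_cdf(1.0 - alpha / 2.0)
    z_power = _STD_NORMAL.inv_cdf(power)
    return math.ceil(((z_crit + z_power) / math.atanh(abs(r))) ** 2 + 3)


# Hypothetical true correlations, evaluated at this survey's n = 160.
for effect in (0.1, 0.2, 0.3):
    print(effect, round(correlation_power(effect, 160), 2), required_n(effect))
```

Under this approximation, a sample of 160 is well placed to detect a moderate correlation but not a weak one, which is consistent with the admission above that the investigation is underpowered for small effects.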

To summarize, there is not even a hint of a shred of supporting evidence for the original hypothesis. A total non-result. Less exciting? Not necessarily. Just ask Michelson and Morley. Note that a non-result is actually meaningful here. One could argue that the self-selective nature of the survey – presumably full of people with an axe to grind who want to stick it to Miller – would produce a strong negative correlation. But that is not what we see. Also, judging from the colorful comments, not all participants were aware of the controversy to begin with. One thing to note about null results is that just because *this* specific way of probing the relationship didn’t work does *not* mean that there is no possible relationship. This is due to the asymmetric nature of scientific logic and generally true for studies based on inferential statistics that report null results, not just here. However, the lack of relationship found here is so resounding that I will stop here. This should not discourage others who want to explore this further. Another issue is that I – deliberately – stayed away from constructs such as conscientiousness or willpower. Like IQ, their measurement is not without substantial controversy in itself, as there are no objective operationalization criteria. Staying clear of these, I decided to focus on the directly observable quantities BMI and educational attainment, without the need for operationalization, which is legitimate given the initial hypothesis.

Now that we know the empirical side of things, I do not believe we should “break the staff” (as the Germans say) over Geoffrey Miller. We routinely give second chances to people who – by their own admission – lead a parasitic existence, contributing nothing. Miller has contributed important research – and we regrettably all make mistakes on a daily basis, so there is perhaps no reason to single him out. The ethics of casting stones and all that…

Miller aside, there is an even bigger issue at stake here, namely the communication between science and society. Personally, I am only interested in the truth status of statements. I suspect many scientists feel the same way: let the dice fall where they may. That is – however – not how society at large generally tends to react. There is typically a lot of passion, along with vested interests. People *want* things to be true, or *not* to be true. The dialogue between science and society is notoriously tricky. Since the times of Galileo and Darwin, scientists have routinely come under personal attack for proposing extremely unpopular claims. Sadly, the tried and true approach is to attack the source of the unwelcome statements. And it does work, as the sad case of Semmelweis illustrates. In this case, the claim was not true. But what if it had been? It must be possible for scientists to express what they sincerely believe to be the truth, no matter how inconvenient. But how? How should scientists communicate with the public? How should the public treat scientists? What is the mutual responsibility? Besides the obvious need to show mutual sincere respect, I have no ready answers for any of this and would be curious what people think about it. If you do have a position, please feel free to express it in the comments below.

4 Responses to Is there a relationship between weight and success in PhD programs?

  1. John says:

    I think it’s ironic that the mistake Geoffrey Miller is making is largely the same one he makes with his thoughts on eugenics. “Willpower” is probably not a single, defined thing just like “intelligence” is not a single, defined thing. There are many kinds of ‘intelligence’ (as I’ve gone on about) and there’s no reason to think that there is an absolute combination that is “best” for society.

    Similarly, willpower is more than just working memory limits–and we use the term in many places for different things. The willpower to set aside 2 hours for studying is different than the willpower to not buy that bag of chips which is different than not finishing all the chips in one sitting. Sure there might be some correlations, but they’re almost definitely not absolute. Geoffrey Miller seems to think in absolutes.

    Anyway, I’m glad your data came out to be so normal–painting sentient creatures in absolutes bothers my ego.

  2. Joshua Gowin says:

    The data certainly tell the tale. Nice that you were able to get so many responses. Thanks for sharing.

  3. Krista says:

    I was just considering this the other day. I began contemplating if there is a specific type of person who is inclined towards pursuing a PhD and was attempting to mentally construct what this person would look like. As it happens, the majority of my contact with PhD candidates has been through the geology and natural resources departments. As far as I can recall, they have mostly been apparently normal weighted with few exceptions, and I wouldn’t feel comfortable calling those exceptions “obese”. Perhaps this is due to the physically intensive nature of those two departments.

    I am glad to see your data. I would hate for someone to be negatively influenced by Miller’s blanket statement and lose the confidence to pursue a PhD program.

  4. Harold says:

    You’ve given Miller far too much credit. His original claim was that all obese applicants would be unable to complete a PhD. If you had shown a strong statistical tendency for obese students to do badly in PhD programs, but found one obese student who completed a PhD, you would have refuted his claim.

    If you had found a generic tendency for obese students to do better or worse in PhD programs, this would raise a valid social concern. Such a study could predispose to discrimination against obese (or non-obese) students, who should be judged solely on their own individual merits, assuming that there is diversity within each class. Such a study would be cherry-picked by those with an emotional or cultural prejudice (a category to which Miller seems to belong), in order to rationalize their discrimination.

    I don’t have an easy answer to these questions, but one thought is that research on humans that can serve no purpose except to reinforce or refute irrational discrimination against individuals is a waste of time. It isn’t needed to rebut discrimination, which is already wrong enough (for example we all know that there is surely at least one talented obese graduate student out there somewhere, and that scorning a great applicant on the grounds of their weight would be stupid), and it runs the risk of being used to rationalize discrimination.
