thinktoomuch.net

Pondering the South African Memesphere – Looking for the Good in Everything


Diet Coke Makes You Fat — On Correlation and Causality

November 2nd, 2010 · Posted by Hugo · 5 Comments

A piece of wisdom I got from my father:

Diet Coke makes you fat! Easy to prove too, simply look at the people drinking Diet Coke, they’re all fat!

Wise words indeed, ;) because they were said in jest, illustrating a very important point. While discussing this important point, ignore the fact that both thin and fat people drink both “Diet” and normal drinks. With two variables under consideration, (A) people drinking Diet Coke, and (B) people that are fat:

  • Discovery: all people drinking Diet Coke are fat, so (A) and (B) are correlated.
  • Naive conclusion: consuming Diet Coke makes you fat. But correlation doesn’t mean (A) causes (B) (see footnote 1).
  • Rather more likely: fat people order Diet Coke because they want to lose weight. Being fat causes Diet Coke consumption, so maybe (B) causes (A)!

This joke illustrates the fact that correlation does not mean causation. For another example, consider “people that smoke” and “people with lung cancer”. Simply finding that these two are correlated does not mean that smoking causes cancer!

In fact, after the Second World War, cigarette companies made some efforts to dismiss the “smoking causes lung cancer” conclusion by insisting it was unscientific. I haven’t read their claims, but “correlation does not mean causation” could certainly have been one of their arguments.

Of course the argument that having lung cancer causes you to smoke is silly, but there is a third possibility: the presence of a third variable (C) which causes both (A) and (B). Consider, for example, the possibility of a genetic trait that causes someone to smoke and also causes cancer. In such a situation it would be possible that choosing to smoke or not smoke would make no difference to a person’s final fate.
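To make the third-variable scenario concrete, here is a minimal Python sketch of a made-up world in which a hidden trait (C) makes people more likely to smoke and more likely to get cancer, while smoking itself does nothing at all. The function name simulate_confounded and every probability in it are assumptions invented for illustration, not real data.

```python
import random

def simulate_confounded(n=100_000, seed=0):
    """Cancer rate among smokers and non-smokers in a toy world
    where smoking has NO effect on cancer (all numbers are made up)."""
    rng = random.Random(seed)
    people = {True: 0, False: 0}  # keyed by "smokes?"
    cases = {True: 0, False: 0}
    for _ in range(n):
        has_trait = rng.random() < 0.3                           # hidden factor (C)
        smokes = rng.random() < (0.8 if has_trait else 0.2)      # (C) drives smoking
        cancer = rng.random() < (0.30 if has_trait else 0.05)    # (C) drives cancer
        people[smokes] += 1
        cases[smokes] += cancer
    return {group: cases[group] / people[group] for group in people}

rates = simulate_confounded()
print(f"cancer rate among smokers:     {rates[True]:.3f}")
print(f"cancer rate among non-smokers: {rates[False]:.3f}")
```

Under these assumed numbers the smokers should come out with well over double the cancer rate of the non-smokers, even though the code never lets smoking influence cancer. The correlation is real; the causation is not.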

Establishing a causal connection is thus more challenging. The simplest way to do so would be to set up a randomised trial in which you directly manipulate the hypothesised cause: take your group of volunteers, non-smokers, then randomly tell half of them to smoke and the other half not to. Then after a decade or two, see if the smoking half has more cancer. By assigning the groups at random, you spread any unknown third factor (C) roughly evenly across both halves, so (C) cannot explain a difference in cancer rates between them.
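A rough Python sketch of why random assignment helps, again in an invented world: a hidden trait raises cancer risk, but who smokes is decided by a coin flip rather than by the trait. The function name randomised_trial, the smoking_effect parameter, and all the probabilities are assumptions for illustration only.

```python
import random

def randomised_trial(n=100_000, smoking_effect=0.0, seed=1):
    """Cancer rate per trial arm when smoking is assigned by coin flip.
    smoking_effect is the (made-up) extra cancer risk caused by smoking."""
    rng = random.Random(seed)
    people = {True: 0, False: 0}  # keyed by "assigned to smoke?"
    cases = {True: 0, False: 0}
    for _ in range(n):
        has_trait = rng.random() < 0.3   # hidden factor, unknown to the researcher
        smokes = rng.random() < 0.5      # random assignment, independent of the trait
        risk = (0.30 if has_trait else 0.05) + (smoking_effect if smokes else 0.0)
        cancer = rng.random() < risk
        people[smokes] += 1
        cases[smokes] += cancer
    return {arm: cases[arm] / people[arm] for arm in people}

no_effect = randomised_trial(smoking_effect=0.0)
real_effect = randomised_trial(smoking_effect=0.10)
print("no causal effect:  ", no_effect)
print("real causal effect:", real_effect)
```

When smoking_effect is zero, the two arms should show nearly the same cancer rate, because the hidden trait is spread evenly between them. When smoking really does add risk, the gap between the arms shows up, and the hidden trait cannot be blamed for it.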

Darn ethics!

So yes, having a complete lack of an ethical code would make science so much easier. ;) You cannot go around condemning half of a population to suffer cancer simply for the purposes of science! (An assertion of moral values here.) Could you maybe insist half of a group of smokers must stop smoking instead? And what, insist that the other half must continue?

How about volunteers? Well, if people get to voluntarily choose between smoking and not smoking, you will have a problem ruling out self-selection bias: some unknown factor (C), be it a genetic predisposition or something else, could influence which group people choose. Consequently we won’t be able to rule out the possibility of (C) also affecting someone’s chance of getting cancer.

Even simply informing people, prior to them deciding to volunteer, that they must be prepared to either continue smoking, or quit smoking, and stick with it for the duration of the study, creates the possibility of self-selection bias affecting your results.

Luckily we often don’t have to resort to direct experimentation to establish scientific fact: there are numerous other things we could study to determine causality. For instance, showing that lung cancer rates increased greatly with the advent of smoking already points to a likely connection. (See footnote 2.)

So just to be abundantly clear: we do know for a fact that smoking increases the chance of lung cancer. This post merely used the smoking/cancer example as a way to illustrate the challenges that have to be overcome to test scientific hypotheses, such that they may become proven theories.

A contemporary example, and the moderation of skepticism

Just today a friend shared a post titled “Why More Equality?”. It shares the following fact about rich countries:

In rich countries, a smaller gap between rich and poor means a happier, healthier, and more successful population.

That is correlation. From that, the post makes some claims, including:

If the UK were more equal, we’d be better off as a population. For example, the evidence suggests that if we halved inequality here:

  • Murder rates would halve
  • Mental illness would reduce by two thirds
  • Obesity would halve
  • Imprisonment would reduce by 80%
  • Teen births would reduce by 80%
  • Levels of trust would increase by 85%

This claim makes a causal connection: “if we fix equality, the rest will follow”. This is very likely true, and they may have done their research well, but they have not provided the data to back this claim on that page. Ditto for this claim:

More economic growth will NOT lead to a happier, healthier, or more successful population. In fact, there is no relation between income per head and social well-being in rich countries.

Again, I believe that claim is correct, but supporting data has not been provided. The more cautious and clearly correct statement would be:

Data does not show that countries with greater economic growth are happier. In fact, there is no relation between income per head and social well-being in rich countries.

The claim that it has not been demonstrated to lead to a happier life versus the claim that it does not lead to a happier life: nuance. When it comes to science journalism, one of the mistakes journalists and publications make way too often is to assert causality while the study cited makes no such claim.

Ponder these possibilities. Most are ludicrous, but some might be interesting to think about: might having a less obese society contribute to economic equality? Might high crime rates cause economic inequality? (Murder is mentioned, as is imprisonment.) How about high rates of mental illness? ;) Does increased trust in a society aid wealth sharing? Or, is there a (C) that causes both increased trust and increased equality? Might there be a (C), say, a good education for all, that causes fewer teen births as well as greater economic equality?

These thought experiments really do suggest equality would help. A good education aids equality and a lot of the other things. But also, economic equality helps everyone get a good education. (Clearly correlated.) As xkcd points out (see footnote 3):


Correlation doesn’t imply causation, but it does waggle its eyebrows suggestively and gesture furtively while mouthing “look over there”.


Footnote 1: I’m making no statements about the benefits or damage caused by the consumption of Diet drinks. Doing a Google Search for “Diet Coke makes you fat” gives a bunch of results making claims about this statement — this post does not take part in that discussion, which I have not researched. I do dislike artificial sweeteners though, and I do prefer to avoid them due to both personal taste and rumoured or proven health concerns.

Footnote 2: There could of course also be a third factor causing both higher cancer rates and an increased number of smokers. Discounting other correlations for a moment, it could be trade and chocolate: maybe chocolate causes cancer, and increased global trade in industrialised nations causes an increased consumption of both chocolate and tobacco, resulting in more smoking as well as an increased cancer rate in the population affected. ;)

Footnote 3: yea, I’m using an xkcd comic. Because this xkcd comic doesn’t suck. When I discovered it via my bookmarks on causation and correlation, while wrapping up this post, I was surprised by how well it fits this post.

Questions and discussions encouraged! (As are more examples of correlation vs causation.)

Categories: Science

5 responses so far ↓

  • 1 Zach // Nov 24, 2010 at 8:18 am

    Do you think the book on which that page is based (The Spirit Level) demonstrates only correlation as well? Because admittedly that web page is skimpy.
    Have you had a chance to read the book? It’s quite fascinating, but I’m afraid I can’t recall how well it establishes causation.
    It’s written by an expert in epidemiology, so one would have thought this is the first issue they would address.

    http://www.amazon.co.uk/Spirit-Level-Societies-Almost-Always/dp/1846140390

    Naturally there’s a blog dedicated to debunking it. Written by our capitalist overlords, or at the very least a fan of Ayn Rand no doubt:
    spiritleveldelusion.blogspot.com

  • 2 Wim Conradie // Dec 31, 2010 at 4:13 pm

    Hi Hugo

    Amazing fantastic post!

    Thanks!

  • 3 Kenneth Oberlander // Jan 16, 2011 at 9:40 pm

    How’d I miss this last year? Nice post Hugo.

    A canonical (heh) example: the inverse relationship between pirate numbers and global warming, according to the gospels of the Flying Spaghetti Monster.

    Happy New Year, by the way. Pity we couldn’t meet up for a beer while you were in SA, Hugo.

  • 4 Hugo // Jan 26, 2011 at 2:00 am

    Thanks Kenneth! I’ll be back in town towards the end of February actually, better luck then maybe. And before then I should have finished a post about double-blind testing, written in the context of powerbalance bracelets. ;)

  • 5 Wim Conradie // Feb 6, 2011 at 12:09 am

    Looking forward to that post!
