Monday, November 5, 2007

Simpson's Paradox explained...

The apparent paradox arises because taking the drug is correlated with gender.
In the above example, say men are more likely to take the drug than women. Say 75% of men take the drug while only 25% of women take the drug.
Let me explain it with some numbers. Say women and men are equally represented in the population (100 men and 100 women). Then 75 men and 25 women will take the drug. From the statistics mentioned earlier, among those who took the drug, 52.5 men (70% of 75) and 5 women (20% of 25) will be cured. So 57.5 of the 100 people who took the drug, i.e. 57.5%, are cured. By similar reasoning, 20 of the 25 men who did not take the drug are cured (80%) and 30 of the 75 women who did not take the drug are cured (40%), i.e. 50% of those who did not take the drug are cured. This explains the surprising result: within each gender the drug lowers the cure rate, yet in the pooled numbers it appears to help, because most of the people taking it are men, who are more likely to be cured either way.
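The arithmetic above can be checked with a few lines of Python. This is only a sketch using the counts from the example; the names `counts` and `cure_rate` are my own and purely illustrative.

```python
# (gender, treatment) -> (number of people, number cured), from the example above.
counts = {
    ("men", "drug"): (75, 52.5),
    ("women", "drug"): (25, 5),
    ("men", "no drug"): (25, 20),
    ("women", "no drug"): (75, 30),
}

def cure_rate(treatment, gender=None):
    """Cure rate for a treatment, pooled over genders unless one is given."""
    rows = [
        (n, cured)
        for (g, t), (n, cured) in counts.items()
        if t == treatment and (gender is None or g == gender)
    ]
    total = sum(n for n, _ in rows)
    total_cured = sum(cured for _, cured in rows)
    return total_cured / total

for treatment in ("drug", "no drug"):
    print(
        f"{treatment:8s} pooled: {cure_rate(treatment):.3f}  "
        f"men: {cure_rate(treatment, 'men'):.3f}  "
        f"women: {cure_rate(treatment, 'women'):.3f}"
    )
```

The pooled column favours the drug, while the per-gender columns both favour not taking it.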

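The moral of the story is that the right probabilistic query to ask the model is not the observational query P(cure|drug), which mixes the effect of the drug with the effect of who chooses to take it, but the causal query P(cure|do(drug)), i.e. the cure rate we would see if the drug were given to everyone regardless of gender.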
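As a rough illustration of the difference, here is a sketch that computes the causal query by averaging the per-gender cure rates over the population's gender split (the standard adjustment formula). It rests on an assumption that the post does not state explicitly: that gender is the only common cause of taking the drug and being cured. The function name `p_cure_do` is hypothetical.

```python
# P(gender) in the example population: 100 men and 100 women.
p_gender = {"men": 0.5, "women": 0.5}

# P(cure | treatment, gender), from the example's numbers.
p_cure_given = {
    ("drug", "men"): 0.70,
    ("drug", "women"): 0.20,
    ("no drug", "men"): 0.80,
    ("no drug", "women"): 0.40,
}

def p_cure_do(treatment):
    """P(cure | do(treatment)), ASSUMING gender is the only confounder.

    Weights each gender's cure rate by its share of the whole population,
    not by its share among the people who happened to choose the treatment.
    """
    return sum(p_cure_given[(treatment, g)] * p_gender[g] for g in p_gender)

for treatment in ("drug", "no drug"):
    print(f"P(cure | do({treatment})) = {p_cure_do(treatment):.2f}")
```

Under that assumption the causal query comes out at about 45% with the drug against 60% without it, agreeing with the per-gender picture and not with the pooled 57.5% versus 50%.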

It is interesting how Simpson's paradox has at times been used to explain altruism in a Darwinian setting, where natural selection inherently disadvantages individuals who confer benefits on their competitors. The Stanford Encyclopedia explains this fairly well. The summary is that, although it seems counter-intuitive, populations generally sustain altruistic individuals and are not overrun by selfish individuals, thus relieving the lazy/inefficient/incapable individuals of some evolutionary stress to survive ;)