There are many mistakes one can make when setting up a scientific study. Awareness of such mistakes and biases is very important so that we can avoid drawing false conclusions from bad data. Today I’m briefly introducing selection bias.
Suppose we are running a survey for which we need to talk to a large number of people, because we want to collect statistically representative opinions on privacy concerns. We pick up the phone book, choose random names from it and start dialling. That ought to be quick and practical, right?
No, it wouldn't. It would be a bad idea, and for more reasons than just the fact that we'd be making unsolicited calls (“cold calling”): people who are more concerned about their privacy are less likely to be listed in the phone book in the first place, giving our survey a sampling bias — the opinions of the privacy-conscious would be under-represented in our results. On top of that, some subset of the people we phone will refuse to talk to us. This subset might also correlate in some way with the opinions relevant to the survey, biasing our results further.
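To see how this plays out numerically, here is a small, entirely hypothetical simulation: privacy concern is a score in [0, 1], and the more concerned someone is, the less likely they are to appear in the phone book. The population size, the scores and the listing probabilities are all made up for illustration.

```python
import random

random.seed(42)

# Hypothetical population: 100,000 people, each with a privacy-concern
# score drawn uniformly from [0, 1]. The true average is about 0.5.
population = [random.random() for _ in range(100_000)]

# Made-up assumption: a person with concern score c is listed in the
# phone book with probability (1 - 0.8 * c), so the most privacy-conscious
# people are the least likely to be listed.
phone_book = [c for c in population if random.random() < 1 - 0.8 * c]

true_mean = sum(population) / len(population)

# Dialling "random names from the phone book" is a random sample of the
# *listed* people, not of the whole population.
sample = random.sample(phone_book, 1_000)
sample_mean = sum(sample) / len(sample)

print(f"true mean concern:   {true_mean:.3f}")
print(f"phone-book estimate: {sample_mean:.3f}")  # noticeably lower
```

Even though the sampling from the phone book is perfectly random, the estimate is biased low, because the book itself is not a random slice of the population.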
Let’s consider another cause of sampling bias: self-selection bias. Suppose you were to openly look for volunteers to take part in some kind of study of human sexuality. You could end up with a disproportionately large number of liberals and people with exhibitionist tendencies, and too few conservative, shy or repressed participants. Unless you can somehow accurately compensate for this effect, your conclusions would not generalise to the population at large: they would be biased by self-selection, and your study would conclude that humanity is more liberal and exhibitionist than it really is.
If you are ever going to rely on volunteers for a study, you have to be sure that the variables being studied are completely unrelated to — and uncorrelated with — the criteria by which people end up self-selecting. Easier said than done.
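The same effect can be sketched for self-selection. In this made-up simulation, a person's willingness to volunteer is directly proportional to the very trait being studied, so the volunteer sample overstates the population average — the worst-case violation of the rule above.

```python
import random

random.seed(7)

# Hypothetical trait ("openness") scored in [0, 1] for 100,000 people;
# the true population average is about 0.5.
population = [random.random() for _ in range(100_000)]

# Made-up assumption: a person with openness o volunteers with
# probability o, so the most open people are heavily over-represented
# among volunteers.
volunteers = [o for o in population if random.random() < o]

true_mean = sum(population) / len(population)
volunteer_mean = sum(volunteers) / len(volunteers)

print(f"true mean openness: {true_mean:.3f}")
print(f"volunteer estimate: {volunteer_mean:.3f}")  # noticeably higher
```

If instead volunteering depended only on something uncorrelated with the trait (say, free time), the volunteer average would track the true average — which is exactly why the "unrelated and uncorrelated" condition matters.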
There are many more types of selection bias. Wikipedia lists a bunch.