00:01
Selection bias is when an erroneous conclusion can arise
from how we select our subjects. There are
two kinds of selection biases really, in a
broad sense. The first kind is when there's
a systematic difference between individuals
in one group of my study and individuals in
another group of my study, in a way that I
did not intend, and the other kind is when
there is a systematic difference in those that
have been selected for a study versus those
who weren't selected. So consider this example.
Let's say I'm trying to measure the average
height of all Americans, and to do so I'm
going to take a sample of some Americans and
extrapolate that data to the entire country,
and my sample is made up of 100 professional
basketball players. Do you see the problem?
I think you do. Basketball players are typically
much taller than most Americans and so the
sample that I'm using is going to overestimate
the average height of all Americans, that's
a selection bias. Or maybe I'm interested
in measuring the relationship between socioeconomic
status or SES and health and I'm going to
do so by sending out a flyer for people to
come meet me in a church basement at 11 AM
at which point I'm going to give them a questionnaire
to ask about their health and their SES.
01:09
Now what kind of people are going to show
up in a church basement at 11 AM on a weekday,
think about it, what do you think? Well I
think the kinds of people that are going to
show up are those people who typically don't
have jobs, and if you haven't got a job, you're
going to be probably of a lower socioeconomic
status, so I'm selecting an enriched extreme
population from which I'm going to derive
a relationship between SES and health and
I'm going to extrapolate that relationship
to describe the general trend of SES and health,
which may not be appropriate because my sample
is specific and my sample is of those with
low SES, that's a selection bias. Now consider
another example, let's say I'm doing a study
on antibiotic completion rates among different
ethnicities and I'm doing the study in central
Europe. Now I'm going to collect individuals
from a central immobile location, let's say
an office somewhere and I'll keep track of
individuals, how much they're conforming to
their antibiotic schedule. Now nomadic individuals,
the Roma, are going to be lost to follow-up,
they are going to move away, I won’t know
what their antibiotic completion rate is going
to be, so I will lose their data. We call
that a kind of loss to follow-up bias.
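
The basketball example from earlier can be sketched as a small simulation. This is only an illustration: the population mean of 170 cm, the spread of 10 cm, and the 190 cm cutoff standing in for "basketball players" are all made-up assumptions, not real data.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is reproducible

# Hypothetical population: heights in cm, numbers chosen only for illustration.
population = [random.gauss(170, 10) for _ in range(100_000)]

# Biased sample: 100 people drawn only from the tall end of the population,
# standing in for the 100 professional basketball players.
tall_people = [h for h in population if h >= 190]
biased_sample = random.sample(tall_people, 100)

# Unbiased sample: 100 people drawn at random from everyone.
random_sample = random.sample(population, 100)

true_mean = statistics.mean(population)
biased_mean = statistics.mean(biased_sample)
random_mean = statistics.mean(random_sample)

print(f"population mean:    {true_mean:.1f} cm")
print(f"random sample mean: {random_mean:.1f} cm")
print(f"biased sample mean: {biased_mean:.1f} cm")
```

Under these assumptions the random sample lands close to the population mean, while the sample of tall people overestimates it, because everyone in that sample was selected for being tall.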
02:29
Now let's look at a very famous example of
selection bias. In 1981 there was a study
in the New England Journal of Medicine and
it showed an association between drinking
coffee and getting pancreatic cancer, this
was all over the news and it made people very
upset and very scared, because we love our
coffee and we don't like our pancreatic cancer.
02:46
The problem is that the study was fraught
with selection bias. It was a case-control
study. If you remember what a case-control
study is, it begins by ascertaining the outcome
status, so we find people who have a disease,
people who don't have a disease and we look
backwards to see what their exposure was.
So in this case-control study, the cases were
people with pancreatic cancer, the controls
were other people in the hospital, that's
important, we typically choose our controls
to be as similar to the cases as possible
to avoid any extraneous variables getting
in the way, but they're different in the sense
that one group has the outcome we care about,
pancreatic cancer, and the other one doesn't.
03:30
So in this study, they looked backwards to see
how many people in both groups had been exposed
to coffee to an extreme extent. So do you
see the bias here? Well commonly in case-control
studies, the biases tend to arise from how
you select the controls, and in this study
that was definitely the case. The doctors
chose their controls from other gastrointestinal
patients in the same hospital wing, there
are pancreatic cancer patients and other patients
undergoing the same experience in the same
wing, perhaps having the same doctors, but
they had gastrointestinal disease. Because
of that, there is a bias: those with G.I.
disorders were less likely to have drunk
coffee recently. That reduced the exposure
to coffee in the control group, and by
comparison it artificially gave the sense
that the pancreatic cancer group had an
unusually high consumption of coffee. That's
not the case. Compared to the general
population, their coffee consumption is
the same. So that created a spurious association
between coffee and cancer that does not appear
in the general population, so we conclude
today that coffee is probably not associated
with pancreatic cancer. So that study had
a bias that caused us to make a very serious
conclusion that was erroneous.
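
To see how the choice of controls can manufacture an association, here is a minimal sketch of the odds ratio from a case-control study. The counts are hypothetical and are not taken from the 1981 study. They assume that half of the cases and half of the general population drink coffee, while only 30 of 100 G.I. controls do. The function name `odds_ratio` is just a helper for this illustration.

```python
def odds_ratio(exposed_cases, unexposed_cases, exposed_controls, unexposed_controls):
    """Odds of exposure among cases divided by odds of exposure among controls."""
    return (exposed_cases * unexposed_controls) / (unexposed_cases * exposed_controls)

# Hypothetical counts, for illustration only.
cases_coffee, cases_no_coffee = 50, 50   # pancreatic cancer patients
gi_coffee, gi_no_coffee = 30, 70         # G.I. patients used as controls
pop_coffee, pop_no_coffee = 50, 50       # controls drawn from the general population

# Controls who drink less coffee than the general population inflate the odds ratio.
or_gi_controls = odds_ratio(cases_coffee, cases_no_coffee, gi_coffee, gi_no_coffee)

# Controls who resemble the general population show no association.
or_pop_controls = odds_ratio(cases_coffee, cases_no_coffee, pop_coffee, pop_no_coffee)

print(f"odds ratio with G.I. controls:       {or_gi_controls:.2f}")
print(f"odds ratio with population controls: {or_pop_controls:.2f}")
```

With these made-up numbers the cases are identical in both comparisons. Only the control group changes, and that alone moves the odds ratio from 1.0 (no association) to above 2.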