00:01
Hello and welcome to epidemiology. You know,
when a lot of people think about epidemiology,
they tend to conflate it with statistics, and
in fact they think epidemiologists are sometimes
statisticians, and that's a mistake. We use
statistics as a tool in what we do, but we're
not statisticians. We do, however, like to understand
what statistics is all about. Now, I'm not
a fan of statistics. I don't think I'm very
good at it, and yet I teach it, and I've got a
PhD in biostatistics. That should tell you
something: it should tell you that if you
are not confident in this topic, it's okay,
it's okay to stumble through it. You will
be fine, because I'm fine with it as well;
it's not that difficult. Today we're going
to learn about the meaning of P-values and
confidence intervals and the null hypothesis.
00:45
These are the foundation for what we call
frequentist statistics. We're also going to
learn about the differences between data and
information and what makes data into information.
00:56
And we're going to learn about the differences
between different types of variables. The
kinds of variables that exist tell us what
kind of statistical tests we want to apply.
01:07
So, as I mentioned, there is a close relationship
between epidemiology and statistics, but they
are not the same thing: statistics is a tool
used by epidemiologists. What are these numbers:
53, 61 and 62? Absent any context, they
mean nothing to you. You could perform math
on them: you can compute their mean, their
differences, you can add them all up and multiply
the sum by something else. That doesn't give you
any information. But if I tell you that these
are the ages of Barack Obama, Angela Merkel
and Vladimir Putin as of mid-2015, suddenly
you have context. The numbers are data. When
I add context to those numbers, they become
information, and it is the information that I
care about. Suddenly, if I take the average
of these numbers, I get the average age of
three leaders; it means something now. Otherwise
they're numbers on which I perform mathematical
functions, and I know nothing about the context
anymore or what to conclude from that. So
I'm trying to draw a distinction here between
mathematics, research, computer technology
and other kinds of application of numbers.
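As a minimal sketch of the point above, the Python snippet below takes the mean of the three numbers twice: once as bare numbers, and once with labels attached. The variable names are made up for illustration, and the pairing of each age with a leader is an assumption based on the order in which the names were given.

```python
from statistics import mean

# Bare numbers: arithmetic works, but the result has no meaning yet.
numbers = [53, 61, 62]
print(mean(numbers))

# The same numbers with context attached (ages as of mid-2015).
# Assumption: ages are paired with names in the order they were mentioned.
leader_ages = {
    "Barack Obama": 53,
    "Angela Merkel": 61,
    "Vladimir Putin": 62,
}
average_age = mean(list(leader_ages.values()))
print(f"Average age of {len(leader_ages)} leaders: {average_age:.1f} years")
```

The arithmetic is identical in both cases; only the labels turn the result into something we can interpret.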
02:18
When we talk about a variable, a variable
is a place keeper for an idea, and we can
perform mathematical functions on that idea to learn
more things about larger ideas. In mathematics,
a variable is a value that may change within
a problem or scope of a problem. In research
a variable can be a logical set of attributes,
like gender or age, where you live, what state
you live in, where you're born, that sort
of thing. In computers, a variable is a symbol
given to an unknown quantity, like a register,
a space to keep a number in, and I can perform
math on that as well. I could depict these
ideas as symbols: in math, X is the common
place keeper for a variable; in research, X
might code for a patient's age; and in computers,
it could be a string, depending upon what
computer language I'm using. So in mathematics
we write a relationship between two variables
as a function, for example, F(x) = 210 – x.
03:20
Now that tells me that every time I have a
value of X, I subtract that from 210 and I
get a value for F(x). I haven't got any context
yet. On the other hand, if I tell you that
this is a function for measuring the directed
maximum heart rate when you're working out,
suddenly you have context. X is your age,
210 minus your age gives you the heart rate
that you target during your workout. It has
meaning now, it has context. In mathematics
again, your heart rate is the dependent variable,
the X is the independent variable. The X is
free to be whatever it wants, but the heart
rate depends on X, the age. In epidemiology
we rename those variables exposure and outcome.
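To make the function concrete, here is a small Python sketch of F(x) = 210 - x with x read as age. The function name and the sample ages are hypothetical, chosen only for illustration; the constant 210 is the one used in the lecture.

```python
def target_heart_rate(age: float) -> float:
    """Return F(x) = 210 - x, with x interpreted as age in years."""
    return 210 - age


# Age is the independent variable (the exposure);
# heart rate is the dependent variable (the outcome).
for age in (20, 40, 60):  # sample ages, chosen arbitrarily
    print(f"age {age}: target heart rate {target_heart_rate(age)}")
```

Changing the age changes the heart rate, never the other way around, which is what makes age the independent variable here.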
04:07
So X, my age, is an exposure, the heart rate
is the outcome, it's the same idea though.
04:13
I can consider this to be cancer rate and
smoking rate; that has perhaps a bit more
meaning in an epidemiological public health
context. So the smoking rate is the exposure,
the cancer rate is the outcome.