00:01
Hello and welcome to epidemiology. You know,
sometimes my clients or my research partners
bring data they've already collected to me
and ask me to analyze it. The problem is
they've arranged or collected the data
in a way that is not really useful or
amenable to easy analysis. So today we're
going to learn about data and data analysis
and some of the concepts underlying data.
So after today's lecture, you're going to
understand the limitations of quantifying
data. You're going to be able to identify
the different types of measurement that a
variable can embody and again variables make
up our data. You're going to know why the
normal curve is important in statistics. And
you're going to understand the difference
between type I and type II error, also a fundamentally
important concept in statistics and data analysis.
00:49
So let's begin by asking the question, what
is measurement? What do you think measurement
is? You measure things all the time, you measure
your weight, you measure your height, maybe
you measure certain qualities of a patient's
blood sample. Measurement is when we assign
a quantity to a quality. Ultimately there
is a quality we're trying to assess or learn
about, and we quantify it in order to do math
on it. So a value that may change within the
scope of a problem is a variable. That's what
a variable is: something that can take on
different values, as opposed to a constant.
There are constants in life, there are variables
in life. Data analysis is all about working out
the relationships among variables and constants.
So in the world of mathematics
a variable can be written as X. It's a
placeholder, just a slot that we later
fill with a number and perform functions on.
01:43
In research, a variable is a logical set of
attributes like gender, age, something we
want to learn about. And in computer science,
a variable is just a symbolic name given to
an unknown quantity. So the word variable
is used in a variety of contexts depending
upon the discipline that you come from. So
when we're defining a variable, there are
actually two components to consider, there
is a conceptual component and an operational
component. The concept is when we think about
the thing we're trying to measure conceptually,
the operational component is when we define
the variable itself and it's that operational
aspect on which the mathematics is performed.
So consider this: suppose I'm trying to
measure happiness. I devise a scale for
measuring happiness from 1 to 5, and I ask
you, on a scale of 1 to 5, how happy are you,
and you say 3. Okay, I can do math on that
number now, but I've lost some of the nuance
about the thing I was trying to measure, which
is the happiness. That always happens. The
variable's operational characteristics are what
I do my statistics on, but we can't forget there
is an underlying conceptual philosophy that's
important as well. So a variable is a kind of
bucket for containing a quality or information.
02:55
It isn't the quality or information itself;
it's just the container. I perform statistics
or mathematics on the bucket, but ultimately I
want to derive meaning and philosophy and
importance from the quality inside the bucket.
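To make the bucket idea concrete, here is a minimal sketch of the happiness example in Python. The response values are made up for illustration, and the helper name validate_score is hypothetical; the point is only that the math runs on the operational 1-to-5 number, not on happiness itself.

```python
from statistics import mean

# Hypothetical answers to "On a scale of 1 to 5, how happy are you?"
# These numbers are invented for illustration only.
responses = [3, 4, 2, 5, 3]


def validate_score(score: int) -> int:
    """Accept only whole-number scores on the 1-to-5 operational scale."""
    if score not in range(1, 6):
        raise ValueError(f"score {score} is outside the 1-to-5 scale")
    return score


# The operational variable: the numbers we can actually do math on.
scores = [validate_score(r) for r in responses]

# The statistic describes the bucket (the scores), not the nuance of
# each person's happiness, which the scale has already discarded.
average_score = mean(scores)
print(average_score)
```

The average is easy to compute, but interpreting it still means going back to the concept the scale was meant to capture.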
03:09
There are two flavors of variables, and we've
covered this in a previous lecture: the two
flavors are continuous and categorical. So
a continuous variable is something like age
or height or distance or temperature. It's
a measurement that has meaning in between
its values. I can be 25 years old. I can be
25.5 years old. I can be 25.51 years old.
03:31
There's still meaning there. On the other
hand, a categorical or discrete variable doesn't
have meaning in between. Age group, gender,
number of children, number of siblings, where
I was born. There is nothing in between that
gives me meaning. Those are the two general
categories. I can create a dichotomous
categorical variable. Dichotomous means
having two levels, like sex, male or
female, or disease presence, yes or no. I
can also dichotomize an existing continuous
variable. In other words, I can create a
two-level categorical variable from an
existing continuous variable. For example,
I can take age, which is a continuous
flowing concept, and create age group out of
it, maybe under 18 versus 18 and over.
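As a rough sketch of that last example, here is how age could be dichotomized in Python. The function name age_group, the cutoff parameter, and the sample ages are assumptions made for illustration; the cutoff of 18 comes from the example above.

```python
def age_group(age_years: float, cutoff: float = 18.0) -> str:
    """Collapse a continuous age into a two-level (dichotomous) category."""
    return "under 18" if age_years < cutoff else "18 and over"


# Invented ages for illustration; note the fractional values, which
# are meaningful for a continuous variable.
ages = [4.5, 17.99, 18.0, 25.51, 67.0]

# After dichotomizing, only two levels remain and nothing in between.
groups = [age_group(a) for a in ages]
print(groups)
```

Notice that 25.51 and 67.0 land in the same group: dichotomizing makes the variable simpler to work with, but the detail carried by the continuous values is gone.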