Cohort and Case Control Studies

What is a Cohort?
General Term = group of people who share a common experience
-Persons born in the same year, e.g., a birth cohort
-Persons who share a common behavior
e.g., cohort of smokers; employees at the tire manufacturing plant
-Persons in the same class, e.g., MPH cohort of spring 2013

Cohort Studies
Exposed and unexposed individuals
are followed (forward in time) to
determine the incidence of disease
in each group

3 Types of Cohort Study Designs
1.) Prospective
2.) Retrospective
3.) Ambi-directional

Prospective Cohort

Investigator collects information on
the exposure status of study subjects
at the time the study begins and
identifies new cases of disease that
develop from that time on, until the
end of the follow-up interval.

Retrospective Cohort
(Non-concurrent / historical)

Investigator determines exposure status
from information recorded at some time
in the past, and disease status is
determined from that point in the past up
until the present (i.e., the follow-up
period has already occurred)

Ambi-directional Cohort
(Combined nonconcurrent/concurrent)

Cohort and exposure status
identified from past records,
followed into the present, then
followed into the future. Most often
used when additional follow-up time
is needed.

Selection of Study Population
1. General population sample –
Representative sample from the
general population
Makes results highly generalizable
Expensive, labor intensive, and may
have problems with loss to follow-up.
E.g., Framingham; Strong Heart

2. Special cohort – Defined population
based on membership in a particular
subgroup of interest.
Follow-up may be easier
Results may not be as widely generalizable
E.g., Nurses Health Study; U.S.

3. Presence of a Distinctive Exposure –
Selected because they are known to be
exposed to a certain factor
Generally used for occupational cohorts.
Cohorts may be stable and easy to follow.
Results may not be generalizable.
E.g., Atomic bomb survivors; persons
exposed to chemicals on the job; licensed

What is the Reference Group for the Exposed?
1. Internal Comparison – from the same
population as the exposed group
e.g., unexposed in the population
(Framingham, low cholesterol levels)
2. External Comparison – outside of the
exposed group
e.g., general population data; another
study’s data
3. Combined – both internal and external
comparison groups can be used

Cohort Characteristics
Population-based vs. non-population-based
Open cohort (persons enter and leave
over the course of follow-up) vs. Closed
cohort (begin with a fixed study group;
persons may leave but no new members
are added)

Assessment of Exposure
1. Definition of exposure – what will
constitute “exposure”?
2. Sources of exposure data
• Interview
• Existing records
• Physical exam
3. Timing of exposure – onset
4. Quantifying exposure
• Frequency – how often?
Continuous or intermittent?
• Intensity – how much?
• Duration – how long?
5. Changes in exposure status – how to
deal with?
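One common way to combine intensity and duration into a single cumulative dose is to multiply them and sum over exposure periods (for smoking this gives pack-years). A minimal Python sketch, assuming a hypothetical exposure history and an illustrative function name `cumulative_exposure` (not a standard API); note that this summary ignores frequency patterns (continuous vs. intermittent) and timing of exposure.

```python
def cumulative_exposure(periods):
    """Sum intensity x duration over all exposure periods.

    periods: list of (intensity, years) tuples, e.g. (packs per day, years).
    Hypothetical helper for illustration only.
    """
    return sum(intensity * years for intensity, years in periods)

# Hypothetical smoker: 1 pack/day for 10 years, then 0.5 pack/day for 4 years
history = [(1.0, 10), (0.5, 4)]
print(cumulative_exposure(history))  # 12.0 (pack-years)
```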

Measurement of Disease
Determine disease-free status at start of study
Outcomes must be clearly defined and
measurable – who is an incident case?
How will you determine whether or not disease
has occurred?
Procedures should be well-defined and the
same for both exposed and un-exposed groups
Best if assessment of disease incidence is done
blinded to exposure status

Measures of Occurrence in Cohort Studies
Cumulative incidence and incidence density

Outcome Measures in Cohort Studies: Cumulative Incidence
CI: all cases known to have occurred in
the baseline cohort during the follow-up
period divided by the number of people at
risk (disease-free) at baseline; always
stated for a specified period (e.g., 5-year CI).
Numerator is # of incident cases;
Denominator is # people at risk at baseline
Requires a closed cohort
CI is a risk measure (a proportion)
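The CI definition above is a simple proportion. A minimal Python sketch, assuming a hypothetical closed cohort; the function name is illustrative.

```python
def cumulative_incidence(new_cases, at_risk_at_baseline):
    """Proportion of the disease-free baseline (closed) cohort that
    develops disease during the follow-up period."""
    if at_risk_at_baseline <= 0:
        raise ValueError("baseline population must be positive")
    return new_cases / at_risk_at_baseline

# Hypothetical: 1,000 disease-free people followed 5 years, 50 new cases
ci = cumulative_incidence(50, 1000)
print(f"5-year CI = {ci:.3f}")  # 5-year CI = 0.050
```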

ID (incidence density): all cases known to have occurred in the
baseline cohort during the follow-up time,
divided by the amount of “at risk”
experience (usually in units of person-time)
contributed by all members of the cohort
Numerator is # of incident cases;
Denominator is person-time: the sum of each
person's time at risk
Open or closed cohort
ID is a rate measure
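Because the ID denominator is person-time, each member contributes only the time they were actually followed while at risk. A minimal Python sketch, assuming hypothetical follow-up times; the function name is illustrative.

```python
def incidence_density(new_cases, person_time):
    """Incident cases per unit of person-time at risk.

    person_time: list of each member's time at risk (e.g. years); a person
    stops contributing at diagnosis, death, or loss to follow-up.
    """
    return new_cases / sum(person_time)

# Hypothetical cohort of 5 people with unequal follow-up (18 person-years total)
years_at_risk = [5.0, 5.0, 2.5, 4.0, 1.5]
rate = incidence_density(2, years_at_risk)
print(f"{rate:.2f} cases per person-year")        # 0.11 cases per person-year
print(f"{rate * 1000:.0f} per 1,000 person-years")  # 111 per 1,000 person-years
```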

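Risk Ratio (RR) (Relative Risk)
RR = CI(exposed) / CI(unexposed)

RR = (a/(a+b)) / (c/(c+d))
where a = exposed with disease, b = exposed without disease,
c = unexposed with disease, d = unexposed without disease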
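The RR formula translates directly from the 2x2 cells a, b, c, d defined above. A minimal Python sketch with hypothetical counts; the function name is illustrative.

```python
def risk_ratio(a, b, c, d):
    """RR = CI among exposed / CI among unexposed.

    a = exposed with disease,   b = exposed without disease
    c = unexposed with disease, d = unexposed without disease
    """
    ci_exposed = a / (a + b)
    ci_unexposed = c / (c + d)
    return ci_exposed / ci_unexposed

# Hypothetical: 30 of 100 exposed and 10 of 100 unexposed develop disease
print(f"RR = {risk_ratio(30, 70, 10, 90):.1f}")  # RR = 3.0
```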

Interpretation of RR
RR = 1.0 No association between
exposure and outcome
RR > 1.0 Exposed are at a higher risk of
the outcome than the not
exposed (positive association)
RR < 1.0 Exposed are at a lower risk of the outcome than the not exposed (negative association; preventive factor)

Attributable Risk (AR) in the Exposed
How much of the disease in people who
are exposed is due to the exposure?
(AR in the exposed = absolute
difference in incidence:
CI(exposed) - CI(unexposed))
What percentage of the disease in the
exposed is due to the exposure?
(AR% in the exposed = percentage:
[CI(exposed) - CI(unexposed)] / CI(exposed) x 100)
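AR in the exposed is the excess incidence among the exposed, and AR% expresses that excess as a share of the incidence in the exposed (AR divided by the exposed incidence, times 100). A minimal Python sketch, assuming hypothetical cumulative incidences; the same arithmetic applies if incidence densities are used instead.

```python
def attributable_risk(ci_exposed, ci_unexposed):
    """Absolute excess risk in the exposed."""
    return ci_exposed - ci_unexposed

def attributable_risk_percent(ci_exposed, ci_unexposed):
    """Percentage of disease in the exposed attributable to the exposure."""
    return 100 * (ci_exposed - ci_unexposed) / ci_exposed

# Hypothetical incidences: 0.30 in exposed, 0.10 in unexposed
ci_e, ci_u = 0.30, 0.10
print(f"AR  = {attributable_risk(ci_e, ci_u):.2f}")          # AR  = 0.20
print(f"AR% = {attributable_risk_percent(ci_e, ci_u):.1f}%")  # AR% = 66.7%
```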

Population Attributable Risk (PAR)
How much of the disease in the
population is due to the exposure?
(PAR = absolute difference:
CI(total population) - CI(unexposed))
What percentage of the disease in the
population is due to the exposure?
(PAR% = percentage:
PAR / CI(total population) x 100)
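PAR depends on how common the exposure is: the total-population incidence is a weighted average of the exposed and unexposed incidences. A minimal Python sketch, assuming hypothetical incidences and a hypothetical exposure prevalence of 25%; the function name is illustrative.

```python
def population_attributable_risk(ci_exposed, ci_unexposed, prevalence_exposed):
    """Return (PAR, PAR%) given incidences and the proportion exposed."""
    # Incidence in the whole population = weighted mix of the two groups
    ci_total = (prevalence_exposed * ci_exposed
                + (1 - prevalence_exposed) * ci_unexposed)
    par = ci_total - ci_unexposed
    par_percent = 100 * par / ci_total
    return par, par_percent

par, par_pct = population_attributable_risk(0.30, 0.10, 0.25)
print(f"PAR = {par:.2f}, PAR% = {par_pct:.1f}%")  # PAR = 0.05, PAR% = 33.3%
```

With the same incidences, a rarer exposure gives a smaller PAR even though the AR in the exposed is unchanged.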

Advantages of Cohort Design
1. More certain of temporal relationship
between exposure and disease
2. Multiple effects of a single exposure can be studied
3. Bias in ascertainment of exposure is
minimized (i.e., cannot be biased by
knowledge of outcome)
4. Can more easily assess changes in risk
factor status
5. Direct measurement of incidence of disease

Limitations of Cohort Design
1. Should initially include individuals
free of disease, but disease process
may have already begun, but not yet
been detected
2. Expense and time required may
limit feasibility
3. Loss to follow-up
4. Not suitable for rare diseases
(requires too large a cohort)

Summary of Cohort Design
Disease incidence is compared in
exposed and un-exposed
Comparisons for exposed group’s
experience can be internal, external or
combined
Two measures of incidence can be used:
Cumulative incidence (measures risk
of disease)
Incidence density (measures rate of
disease incidence)
Primary measures of association from a
cohort study are:
Relative risk or Rate ratio (RR)
=1.0 indicates no association between
exposure and disease incidence
>1.0 suggests positive association
<1.0 suggests protection (negative association)
Can also calculate the odds ratio (OR)

Case Control Study (What does it do?)
Selects the study population on the
basis of disease status
A case-control study begins with people
who have the disease (cases) and
compares them to people who don’t
have the disease (controls)
Compare the odds of past exposure to a
suspected risk factor between cases and
controls

Case Control Study Example
Suppose we are interested in investigating
an association between childhood
cataracts and exposure to rubella virus in
utero
Cases would be children with cataracts
Controls would be children without cataracts
For each child we would determine
whether or not their mother was exposed
to rubella during her pregnancy with that
child

Issues in Case Selection: Definition
Diagnostic criteria – Clearly defined,
objective, standardized criteria
Objective: to produce a uniform,
homogeneous group of cases
Example: In a case-control study of
preterm delivery, cases would be
identified as babies born < 37 weeks of gestational age, as defined by first-trimester ultrasound measurement of crown-rump length.

Criteria for eligibility - Clearly defined reasons to include/exclude cases
E.g., by age, gender, potential for exposure
Apply equally to cases and controls
Example: A study of recent oral contraceptive (OC) use and MI would exclude males and postmenopausal and surgically sterilized women because they have no risk of recent exposure. Including them would bias the results towards the null.

Case Control: Incident vs Prevalent Cases
Prefer to study incident cases
Why incident cases?
-Reduce potential for incidence-prevalence (I/P) bias
(P ≈ I x D: prevalent cases reflect both etiology (I) and survival/duration (D))
-Diagnoses more likely to be uniform,
using same criteria
-Recall of exposure may be better
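Why P ≈ I x D causes trouble with prevalent cases: if an exposure affects how long people live with the disease but not whether they get it, prevalent cases will still over-represent the exposed. A minimal Python sketch with hypothetical numbers; it uses the steady-state approximation, which is reasonable only when prevalence is low.

```python
def prevalence_approx(incidence_rate, mean_duration):
    """Steady-state approximation P ~ I x D (reasonable when P is small)."""
    return incidence_rate * mean_duration

# Hypothetical: exposure does NOT change incidence (same I in both groups),
# but exposed cases live with the disease twice as long.
incidence = 0.002         # new cases per person-year, both groups
duration_exposed = 10     # mean years lived with disease, exposed cases
duration_unexposed = 5    # mean years lived with disease, unexposed cases

prevalence_ratio = (prevalence_approx(incidence, duration_exposed)
                    / prevalence_approx(incidence, duration_unexposed))
print(f"prevalence ratio = {prevalence_ratio:.1f}")  # prevalence ratio = 2.0
```

Incidence is identical in both groups, yet a study using prevalent cases would see twice as much disease among the exposed: an association with survival, not etiology.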

Case Control Case Selection
Cases are selected without
reference to (without knowledge
of) their exposure history

(Base it on disease)

Case Control Sources of Cases (Hospital Based)
all cases admitted to a
single hospital or group of hospitals
within a given time period who meet
the eligibility criteria

Advantages: easier, cheaper

Disadvantages:
• Potential for biased sample of cases
• Referral patterns
• Only suitable for diseases that are
usually hospitalized

Case Control Sources of Cases (Community or Population Sample)
include all (or random sample)
cases in defined geographical area
within specified time period; use
registry if it exists

Advantages: representative case group

Disadvantages:
• Costly
• Time-consuming
• Difficult to do without registry

Sources of Cases (Other Sources)
Choice of case source depends on the disease/condition, especially if it is one for which people are not uniformly hospitalized

Other Sources:
-Registries which are not population-based
-Large pre-paid insurance plans
-Retirement communities

Selection of Controls
Purpose: Controls provide a comparison
group for cases
Definition: Controls intended to
represent the frequency of exposure in
the population from which the cases arose
Controls are selected from the same
source population as the cases were
Controls are free of disease under study

Controls usually similar to cases with
regard to past potential for exposure,
during same period of risk under study
Usually select controls in same manner
as cases selected in order to select from
the same source population
Eligibility criteria – any exclusion
criteria applied to the cases also applied
to the controls

Selection of Control Example
A case-control study of tonsillectomy and
lymphoma conducted in Iowa would
select cancer cases from the state-wide
SEER cancer registry.
Because cases are identified on a state-wide basis, controls should also be chosen
on a state-wide basis.
The “source population” for cases is the
State of Iowa.

Sources of Controls


Community/General Population Controls
Controls are selected from a random
sample of the general population
Random-digit dialing commonly used
Appropriate if cases are population based

Advantages/ Disadvantages of Community/General Population Controls
Advantages: Highly representative;
can calculate population frequency of
exposure; appropriate for population-based cases
Disadvantages: Costly, problems with
refusal and phone coverage

Hospital-Based Controls
People seeking medical care at same
institution as cases for conditions unrelated
to disease under study
Exclude persons with diseases known or
suspected to be related to exposure under study
The illness of the controls should have the
same referral patterns to the health care
facility as that of cases
May use multiple diagnoses

Advantages/Disadvantages of Hospital-Based Controls
Advantages: Captive population, clearly
identified, economical method, less
recall bias
Disadvantages: Potential for selection
bias, less generalizable

Neighborhood Controls
-Controls selected from the same neighborhood as cases through canvassing or use of phone numbers

Advantages/Disadvantages of Neighborhood Controls
-Advantages: Provides controls of similar socioeconomic status and environment as cases

-Disadvantages: Overmatching possible, low response rates

Multiple Types of Controls
-Used to assess potential biases (e.g. recall) or to compensate for the deficiencies of other types of controls

Advantages / Disadvantages of Multiple Types of Control
-Advantages: Internal replicate of study: assess certain types of bias

-Disadvantages: Costly and time-consuming; if results differ, must be able to explain why

Example of Case Control
A case-control study of brain tumors in
children included two different types of
controls:
Children with other solid tumors (“sick
controls”)
A random sample of healthy children from
the same birth cohorts as cases (community
controls)
Purpose: to assess results for potential
recall bias

Case Control Question
In a case-control study of breast cancer
and pesticide exposure in Oklahoma,
cases of breast cancer are chosen from
the tumor registries of all major
hospitals in Oklahoma County. What is
the “source population” for these
cases? How should controls be picked?

Representativeness Vs. Comparability
Is the goal to select cases and controls
that are representative of those
with and without the disease, or to
select cases and controls so they
are like each other in all ways
except disease status?

Representativeness: Relates to the
generalizability of the results
Advantages: increases generalizability,
reduces potential for certain types of
selection bias
Disadvantages: may be more difficult,
time consuming, resource intensive

Comparability: Internal validity; cases and controls have
equal probability of past exposure if there
is no association between exposure and
disease
Advantages: easier to detect smaller
differences; reduces chances for
unmeasured confounding
Disadvantages: may have select group of
cases, potential for Berkson’s bias
(hospital-based studies only)

Assessment of Exposure
Exposure is prior to disease onset (or
reference date for controls)
Techniques: Information on prior
exposure in cases and controls may be
ascertained through personal interview,
hospital or medical records,
employment records, pharmacy or lab
records, or direct measurement

Timing of exposure – when did it occur
in relation to disease onset or index
date?

Quantification of exposure for dose-response analysis
Includes amount, duration and
frequency of exposure.

Assessment of Exposure Example
Weinmann et al. (1994) conducted a
case control study of renal cell cancer in
relation to antecedent use of
antihypertensive medications within the
membership of Kaiser Permanente
For cases and their matched controls,
outpatient and inpatient medical records
were reviewed for information regarding
medication use up to a date three
months prior to the case’s diagnosis

Measure of Association in Case Control Studies

Odds Ratio (OR)
Ratio of two odds (the odds that
cases were exposed ÷ the odds that
controls were exposed)
What is an “odds”?
The odds of an event is defined as the
probability of an event occurring (P)
divided by the probability of the event
not occurring (1-P)
Odds = P ÷ (1-P)

In other words, odds that cases were
exposed = probability that cases were
exposed ÷ probability that cases were
not exposed. This is one of the two
odds that make up the odds ratio.

Example of Odds
If there are 100 smokers and 60 develop
a chronic cough, the probability of
smokers developing a cough is
60 / 100 = 60%
The probability of smokers not
developing a cough is 100% – 60% (or 40 /
100) = 40%
The odds of developing a cough are 60:40
or 1.5
Note how the odds of developing a cough
differs from the probability of developing
a cough
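The smoking/cough arithmetic above, as a minimal Python sketch; the helper names are illustrative, and the inverse conversion P = odds / (1 + odds) is included to show that odds and probability carry the same information on different scales.

```python
def odds_from_probability(p):
    """Odds = P / (1 - P)."""
    return p / (1 - p)

def probability_from_odds(odds):
    """Inverse conversion: P = odds / (1 + odds)."""
    return odds / (1 + odds)

p_cough = 60 / 100  # 60 of 100 smokers develop a chronic cough
odds_cough = odds_from_probability(p_cough)
print(f"probability = {p_cough:.2f}, odds = {odds_cough:.2f}")
# probability = 0.60, odds = 1.50
```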

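Odds Ratio (How to do it?)
OR = (a/c) / (b/d) = ad/bc
where a = exposed cases, b = exposed controls,
c = unexposed cases, d = unexposed controls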
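With cases and controls in the columns of a 2x2 table (a = exposed cases, b = exposed controls, c = unexposed cases, d = unexposed controls), the odds of exposure among cases is a/c and among controls is b/d, so the OR reduces to the cross-product ad/bc. A minimal Python sketch with hypothetical counts (not the lung cancer data behind the 14.0 example below).

```python
def odds_ratio(a, b, c, d):
    """OR = (a/c) / (b/d) = (a*d) / (b*c)

    a = exposed cases,   b = exposed controls
    c = unexposed cases, d = unexposed controls
    """
    odds_exposure_cases = a / c
    odds_exposure_controls = b / d
    return odds_exposure_cases / odds_exposure_controls

# Hypothetical case-control data: 200 cases, 200 controls
print(f"OR = {odds_ratio(a=80, b=40, c=120, d=160):.2f}")  # OR = 2.67
```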

Interpretation of OR
The odds of having smoked cigarettes
among cases of lung cancer are 14.0
times greater than the odds of having
smoked cigarettes among controls.

OR = 1.0 No association
between exposure
and outcome
OR > 1.0 Positive association
between exposure
and outcome
OR < 1.0 Negative or inverse association between exposure and outcome

Relative Risk?
You cannot, in most circumstances,
use case-control data to calculate
incidence and thus, you cannot
directly calculate relative risks from
case-control data
Study groups are chosen on the
basis of presence of disease, not
exposure
Odds ratios are good
approximations of the relative risk
if your study is properly designed and
the disease is rare
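The "rare disease" condition can be checked numerically: compute both RR and OR from the same full-population 2x2 table. A minimal Python sketch with hypothetical populations of 10,000 exposed and 10,000 unexposed; in both scenarios the true RR is 2.0.

```python
def rr_and_or(a, b, c, d):
    """RR and OR from the same full-population 2x2 table.
    a/b = exposed with/without disease, c/d = unexposed with/without disease."""
    rr = (a / (a + b)) / (c / (c + d))
    or_ = (a * d) / (b * c)
    return rr, or_

for label, a, c in [("rare disease", 20, 10), ("common disease", 4000, 2000)]:
    rr, or_ = rr_and_or(a, 10_000 - a, c, 10_000 - c)
    print(f"{label}: RR = {rr:.2f}, OR = {or_:.2f}")
# rare disease: RR = 2.00, OR = 2.00
# common disease: RR = 2.00, OR = 2.67
```

When the disease is common, the OR is farther from 1.0 than the RR, so it no longer approximates it well.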

Matching of Cases and Controls
Definition: The process of selecting
controls so that they are similar to
cases for characteristics, such as age,
gender, race, or socioeconomic status,
that might be confounding variables.
Matching is one of the ways to deal with
confounding, by making cases and
controls similar across these variables

Types of Matching (Individual)
Individual (pair) matching – selecting one
or more controls for each individual case
e.g., for a case of MI who was a white
male aged 67 years, select one or more
controls without MI who are white,
male and 62-72 years of age – repeat
for each case in the study
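The individual-matching rule above (same sex and race, age within ±5 years) can be sketched as a filter over a pool of disease-free candidates. A minimal Python sketch; the `Person` class, `match_controls` function, and the closest-age tie-breaking are hypothetical choices for illustration (random selection among eligible controls is also common).

```python
from dataclasses import dataclass

@dataclass
class Person:
    id: int
    sex: str
    race: str
    age: int

def match_controls(case, pool, n_controls=1, age_window=5):
    """Select up to n_controls from a pool of disease-free candidates with the
    same sex and race and age within +/- age_window years of the case.
    Chosen controls are removed from the pool so each is used only once."""
    eligible = [p for p in pool
                if p.sex == case.sex and p.race == case.race
                and abs(p.age - case.age) <= age_window]
    # Hypothetical tie-break: prefer candidates closest in age
    chosen = sorted(eligible, key=lambda p: abs(p.age - case.age))[:n_controls]
    for p in chosen:
        pool.remove(p)
    return chosen

# Case: white male aged 67, so eligible controls are white males aged 62-72
case = Person(1, "M", "white", 67)
pool = [Person(10, "M", "white", 70), Person(11, "F", "white", 67),
        Person(12, "M", "white", 75), Person(13, "M", "white", 64)]
controls = match_controls(case, pool, n_controls=2)
print([p.id for p in controls])  # [10, 13]
```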

Types of Matching (Frequency (group) Matching)
Frequency (group) matching – the
control group is selected so that its
distribution is similar to that of cases for
potentially confounding variables such as
age, gender and race. Requires that all of
the cases be selected first or that the
distribution is known.
e.g., if 68% of your MI cases are male,
68% of your controls are chosen to be male
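Frequency matching amounts to stratified sampling of controls: count cases per stratum, then draw the same number (or a fixed multiple) of controls from each stratum. A minimal Python sketch; the function name, the `ratio` parameter, and the pool of control IDs are hypothetical.

```python
import random
from collections import Counter

def frequency_match(case_strata, control_pool, ratio=1, seed=0):
    """Sample controls so each stratum has ratio x the number of cases in it.

    case_strata: one stratum label per case (e.g. "male"/"female")
    control_pool: dict mapping stratum label -> list of candidate control IDs
    """
    rng = random.Random(seed)
    selected = []
    for stratum, n_cases in Counter(case_strata).items():
        candidates = control_pool.get(stratum, [])
        needed = n_cases * ratio
        if len(candidates) < needed:
            raise ValueError(f"not enough controls in stratum {stratum!r}")
        selected.extend(rng.sample(candidates, needed))
    return selected

# Hypothetical: 68 male and 32 female MI cases; male control IDs are 0-499
cases = ["male"] * 68 + ["female"] * 32
pool = {"male": list(range(0, 500)), "female": list(range(500, 1000))}
controls = frequency_match(cases, pool)
print(sum(c < 500 for c in controls) / len(controls))  # 0.68
```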

The odds of having been born at high birth
weight are 2.57 times higher in children
with brain tumors than in children
without brain tumors.

Advantages of Case Control
1. May be quicker and less expensive
compared to cohort studies.
2. Well suited for rare disease and
diseases with long latent periods.
3. Fewer subjects required than for
cohort studies.
4. Multiple etiologic factors can be studied.

Disadvantages of Case Control
1. More potential sources of bias and
error than cohort studies
2. Temporal relationship – did exposure
cause disease or did disease cause exposure?
3. Appropriate control group may be
difficult to identify.

Summary of Case Control
Frequency of past exposure is compared
in persons with the disease (cases) and
persons without (controls)
Cases and controls should be selected
without any knowledge of their exposure status
Controls provide an estimate of the
exposure frequency in the population
from which the cases arose

Measure of association from a case-control study = odds ratio
Ratio of the odds of exposure in cases
to the odds of exposure in the controls
=1.0 suggests no association between
exposure and disease
>1.0 suggests a positive association
<1.0 suggests a protective (negative) association