Research Methods: Chapter 8
much of the value of research depends on how data are collected
critical part is deciding what will be observed and what will not
In selecting a group of subjects for study, social science research most often resorts to some type of sampling. In general, sampling refers to selecting part of a population.
process of selecting observations
is ordinarily used to select observations for one of two related reasons
1) it is often not possible to collect info from all persons or other units we wish to study
2) it is often not necessary to collect data from all persons or other units
goal of all sampling: to reduce, or at least understand, potential biases that may be at work in selecting subjects
The ultimate purpose of sampling is to select a set of elements from a population in such a way that descriptions of those elements (sample statistics) accurately portray the parameters of the total population from which the elements are selected. Probability sampling enhances the likelihood of accomplishing this aim and also provides methods for estimating the degree of probable success
ex. in studying cases in a criminal court, we may not be able to examine all cases so we select a sample to represent that population of all cases processed through some court.
2) we may want to generalize from a sample to an unobserved population the sample is intended to represent
ex. if we interview a sample of community residents, we may want to generalize our findings to all community residents-those we interviewed and those we did not.
ex. our sample of criminal court cases can be generalized to the population of all criminal court cases
a sample of individuals from a population, if it is to provide useful descriptions of the total population, must contain essentially the same variations that exist in the population.
A special type of sampling that enables us to make statistical generalizations to a larger population
a method of selection in which each member of a population has a known chance or probability of being selected
knowing the probability that any individual member of a population could be selected makes it possible for us to make predictions that our sample accurately represents the larger population
if all members of a population are identical in all respects-demographic characteristics, attitudes, experiences, behaviors, and so on-there is no need for careful sampling procedures. any sample will be sufficient. but in reality humans vary in many ways, so any real population is heterogeneous.
ex. flipping a coin
two reasons for using this
1) this procedure serves as a check on conscious or unconscious bias on the part of the researcher. A researcher who selects cases on an intuitive basis might choose cases that support his or her research expectations or hypothesis. Random selection erases this danger
2) we can draw on probability theory, which allows us to estimate population parameters and to estimate how accurate our statistics are likely to be
1) though never perfectly representative, are typically more representative than other types of samples because they avoid the biases
2)permits us to estimate the accuracy or representativeness of the sample. can provide an accurate estimate of success or failure, because they enable us to draw on probability theory
How close the sample statistic is to the population parameter, and access to the math for estimating that
enable us to make relatively few observations and then generalize from those observations to a much wider population
ex. if we are interested in what proportion of high school students have used marijuana, collecting data from a probability sample of a few thousand students will serve just as well as trying to study every high school student in the country.
helps researchers generalize from observed cases to unobserved ones
has its own logic and can provide useful samples for criminal justice inquiry
ex. when selecting lawyers, a researcher who is intimidated by certain lawyers may not choose them for the sample because he or she does not want to approach them.
The researcher might make a conscious effort to interview every 10th lawyer who enters the courthouse, but he or she still cannot be sure of a representative sample, because different types of lawyers visit the courthouse with different frequencies and some never go to the courthouse at all. So the resulting sample will overrepresent lawyers who visit the courthouse more often
polls linked to web blogs, text messages or email cannot be trusted to represent the general population
some techniques can help us avoid bias
ex. if the population contains fifty percent women, a representative sample will include “close to” fifty percent women
samples do not have to be representative in all respects: representativeness is limited to those characteristics that are relevant to the substantive interests of the study
this principle forms the basis of probability sampling
even carefully selected samples are seldom, if ever, perfectly representative of the populations from which they are drawn
in survey research elements are typically people or certain types of people
in CJ research, other kinds of units can be the elements, like correctional facilities, police beats, or court cases.
ex. the vague “delinquents” might be the target for a study; a more precise description of the population includes the definition of the element “delinquent” (a person charged with a delinquent offense) and the time referent for the study (charged with a delinquent offense in the previous six months)
ex. the average income of all families in a city and the age distribution of the city’s population are parameters.
important portion of criminal justice research involves estimating population parameters on the basis of sample observations
are used to make estimates of population parameters
ex. the average income computed from a sample and the age distribution of that sample are statistics, and those statistics are used to estimate income and age parameters
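ex. a minimal Python sketch of the parameter/statistic distinction. The population of family incomes is made up for illustration, and all variable names are hypothetical; in real research the parameter is usually unknown, which is why we estimate it.

```python
import random
import statistics

rng = random.Random(8)  # fixed seed so the sketch is repeatable

# Hypothetical population: incomes for 10,000 families (made-up values).
population_incomes = [rng.gauss(60_000, 15_000) for _ in range(10_000)]

# Parameter: a summary of the whole population (normally unknown to us).
parameter_mean = statistics.mean(population_incomes)

# Statistic: the same summary computed from a sample, used as an estimate.
sample_incomes = rng.sample(population_incomes, 500)
statistic_mean = statistics.mean(sample_incomes)

# How far the estimate landed from the value it is estimating.
estimation_error = statistic_mean - parameter_mean
```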
a list of elements in a population that is used to select a sample
the list or quasi-list of elements from which a probability sample is selected, or the list or quasi-list for our target population (quasi-list because even though an actual list might not exist, we can draw samples as if there were a list)
Once a sampling frame is established, this can be produced by assigning a single number to each element in the frame, not skipping any number in the process. A table of random numbers, or a computer that generates them, is then used to select elements for the sample
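ex. the numbering-then-random-draw procedure could be sketched in Python roughly as follows. The function name and the list of case IDs are hypothetical, and the computer's random number generator stands in for a table of random numbers.

```python
import random

def simple_random_sample(frame, n, seed=None):
    """Number every element 1..N with no gaps, then draw n numbers at random."""
    rng = random.Random(seed)
    numbered = {i: element for i, element in enumerate(frame, start=1)}
    # Draw n distinct element numbers; every element has the same chance.
    chosen_numbers = rng.sample(range(1, len(numbered) + 1), n)
    return [numbered[k] for k in chosen_numbers]

# Hypothetical sampling frame of 1,000 court cases.
frame = [f"case-{i:04d}" for i in range(1, 1001)]
sample = simple_random_sample(frame, 50, seed=1)
```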
rarely used, not most efficient method
the researcher chooses every kth element in the list for inclusion in the sample. if the list contains 10,000 elements and we want a sample of 1,000, we select every 10th element for our sample. to ensure against any possible human bias, we should select the first element at random. we begin by selecting a random number between 1 and 10. the element having that number, plus every 10th element following it, is included in the sample
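ex. a small Python sketch of selection with a random start, using the 10,000-element list and sample of 1,000 from the note above. The function name is hypothetical, and the sketch assumes the list length divides evenly by the sample size.

```python
import random

def systematic_sample(frame, n, seed=None):
    """Pick a random start within the first interval, then every k-th element."""
    k = len(frame) // n            # sampling interval (10 in this example)
    rng = random.Random(seed)
    start = rng.randint(1, k)      # random number between 1 and k, inclusive
    return frame[start - 1::k][:n]

frame = list(range(1, 10_001))     # element numbers 1..10,000
sample = systematic_sample(frame, 1000, seed=3)
```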
danger=if the list of elements is arranged in a cyclical pattern that coincides with the sampling interval, a biased sample may be drawn.
if considering a systematic sample from a list, we have to carefully examine the nature of that list. if the elements are arranged in any particular order, we have to figure out whether that order will bias the sample to be selected and take steps to counteract it
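ex. the cyclical-pattern danger can be seen in a small Python sketch. The list of apartments is made up: every 10th unit is a corner unit, which happens to match a sampling interval of 10.

```python
# Hypothetical list: 1,000 apartments in order, every 10th one a corner unit,
# so corner units are 10% of the population.
units = ["corner" if i % 10 == 0 else "interior" for i in range(1, 1001)]
interval = 10

# Draw the systematic sample for every possible random start (1..10).
samples = {start: units[start - 1::interval] for start in range(1, interval + 1)}

# Share of corner units in each possible sample.
share_corner = {start: s.count("corner") / len(s) for start, s in samples.items()}
```

With this list, a start of 10 yields only corner units and every other start yields none, so no possible sample comes close to the true 10 percent.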
so superior to simple random sampling because of convenience
it is a method for obtaining a greater degree of representativeness-decreasing the probable sampling error
sampling error is reduced by two factors in the sample design: 1) a large sample produces a smaller sampling error than a small sample does and 2) a homogeneous population produces samples with smaller sampling errors than a heterogeneous population does. So if 99% of the population agrees with a certain statement, it is extremely unlikely that any probability sample will greatly misrepresent the extent of agreement. If the population is split 50-50 on the statement, then the sampling error will be much greater
Stratified sampling is based on this second factor in sampling theory
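ex. both factors show up in the usual standard error formula for a sample proportion, sqrt(p * (1 - p) / n). A minimal Python sketch, assuming simple random sampling; the sample sizes of 100 and 1,000 are chosen only for illustration.

```python
import math

def standard_error(p, n):
    """Standard error of a sample proportion: sqrt(p * (1 - p) / n)."""
    return math.sqrt(p * (1 - p) / n)

se_split_small = standard_error(0.50, 100)    # 50-50 split, n = 100: 0.05
se_split_large = standard_error(0.50, 1000)   # 50-50 split, n = 1000: about 0.016
se_lopsided = standard_error(0.99, 100)       # 99-1 split, n = 100: about 0.010
```

The larger sample and the more homogeneous population both give a smaller standard error than the 50-50 split with n = 100.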
Rather than selecting our sample from the total population at large, we select appropriate numbers of elements from homogeneous subsets of that population.
ultimate function is to organize the population into homogeneous subsets and to select the appropriate number of elements from each. The choice of stratification variables typically depends on what variables are available.
ensures the proper representation of the stratification variables to enhance representation of other variables related to them.
it is likely to be more representative on a number of variables than is a simple random sample
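ex. a minimal Python sketch of proportionate selection from homogeneous subsets. The frame of officers by rank, the 10 percent sampling fraction, and the function name are all hypothetical.

```python
import random
from collections import defaultdict

def proportionate_stratified_sample(frame, stratum_of, fraction, seed=None):
    """Group elements into strata, then sample the same fraction from each."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for element in frame:
        strata[stratum_of(element)].append(element)
    sample = []
    for members in strata.values():
        n_h = round(len(members) * fraction)   # sample size for this stratum
        sample.extend(rng.sample(members, n_h))
    return sample

# Hypothetical frame: 600 patrol officers, 300 detectives, 100 supervisors.
frame = (
    [("patrol", i) for i in range(600)]
    + [("detective", i) for i in range(300)]
    + [("supervisor", i) for i in range(100)]
)
sample = proportionate_stratified_sample(frame, lambda e: e[0], 0.10, seed=5)
counts = {
    rank: sum(1 for r, _ in sample if r == rank)
    for rank in ("patrol", "detective", "supervisor")
}
```

Each rank appears in the sample in exactly its population proportion (60, 30, and 10), which a simple random sample of 100 would only approximate.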
if only a small number of people in a population exhibit some attribute or characteristic of interest, then a large sample must be drawn to produce adequate numbers of elements that exhibit the uncommon condition
way of obtaining sufficient numbers of these rare cases by selecting a number disproportionate to their representation in the population
ex. national crime survey in which one goal is to obtain some minimum number of crime victims in a sample. Because crime victimization for certain offenses such as robbery or rape is rare on a national scale, persons who live in large urban areas, where serious crime is more common, are disproportionately sampled
beginning in 2004, the BCS disproportionately oversampled areas served by smaller police forces to produce a large enough number of cases to statistically represent rural areas
is simplified by the existence of a national list of something close to addresses. The Postcode Address File (PAF) lists postal delivery points nationwide and is further subdivided to distinguish small users, those addresses receiving less than fifty items per day.
Postcode sectors, roughly corresponding to five-digit zip codes, are easily defined clusters of addresses from the PAF. Samples of addresses are then selected from within these sectors. In addition, booster samples are used to increase the number of respondents who are ethnic minorities or ages 16-24, because victimization experiences of ethnic minorities were of special interest to police and other public officials.
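ex. a Python sketch of oversampling a high-crime stratum. All counts are made up. The weighting step (stratum population size divided by stratum sample size) is not covered in these notes; it is included as the usual companion to disproportionate selection when estimating a figure for the whole population.

```python
import random

rng = random.Random(11)  # fixed seed so the sketch is repeatable

# Hypothetical population, one True/False victim flag per resident:
# 9,000 rural residents with 180 victims, 1,000 urban residents with 100 victims.
population = {
    "rural": [True] * 180 + [False] * 8820,
    "urban": [True] * 100 + [False] * 900,
}
true_rate = 280 / 10_000

# Disproportionate design: 500 from each stratum, so urban residents
# (10% of the population) make up 50% of the sample.
samples = {area: rng.sample(people, 500) for area, people in population.items()}

# Weight = stratum population size / stratum sample size.
weights = {area: len(population[area]) / len(samples[area]) for area in samples}

# Ignoring the design overstates victimization; weighting corrects for it.
unweighted_rate = sum(sum(s) for s in samples.values()) / 1000
weighted_rate = (
    sum(weights[a] * sum(samples[a]) for a in samples)
    / sum(weights[a] * len(samples[a]) for a in samples)
)
```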
Multistage cluster sampling is used in this
So this samples a disproportionate number of minority and young residents, who are more likely to be victims of crime. sampling procedures for this are simpler than those for the NCVS, because a suitable sampling frame exists at the national level.
ex. the population of a city, state, or nation, or all police officers in the U.S.
such a design to measure this involves the initial sampling of groups of elements-clusters-followed by the selection of elements within each of the selected clusters
may be used when it is either impossible or impractical to compile an exhaustive list of the elements that compose the target population
often though population elements are already grouped into subpopulations and a list of those subpopulations either exists or can be created
ex. U.S. law enforcement officers are employed by individual cities, counties, or states, and it is possible to create lists of those political units. For cluster sampling, then, we could sample the list of cities, counties, and states in some manner, like using a systematic sample. Next, obtain lists of law enforcement officers from agencies in each of the selected jurisdictions. Then sample each of the lists to provide samples of police officers for studying.
Cluster sampling involves the repetition of two basic steps: listing and sampling
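ex. the list-then-sample repetition could be sketched in Python as a two-stage design. The jurisdictions, officer IDs, and function name are hypothetical, and simple random selection is used at both stages for brevity.

```python
import random

def two_stage_cluster_sample(clusters, n_clusters, n_per_cluster, seed=None):
    """clusters maps a cluster name to the list of elements inside it."""
    rng = random.Random(seed)
    # Stage 1: list the clusters, then sample from that list.
    chosen = rng.sample(sorted(clusters), n_clusters)
    sample = []
    for name in chosen:
        # Stage 2: list elements only for the selected clusters, then sample.
        roster = clusters[name]
        sample.extend(rng.sample(roster, min(n_per_cluster, len(roster))))
    return chosen, sample

# Hypothetical population: 40 jurisdictions with 50 officers each.
clusters = {
    f"jurisdiction-{j:02d}": [f"officer-{j:02d}-{i:03d}" for i in range(50)]
    for j in range(40)
}
chosen, officers = two_stage_cluster_sample(
    clusters, n_clusters=8, n_per_cluster=10, seed=2
)
```

Only 8 rosters have to be obtained instead of 40, which is where the savings in listing come from.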
Subject to two sampling errors-
1-the initial sample of clusters represents the population clusters only within a range of sampling error.
2-the sample of elements selected within given cluster represents all the elements in that cluster only within a range of sampling error
general guideline-maximize the number of clusters selected while decreasing the number of elements within each cluster. Efficiency of cluster sampling is based on the ability to minimize the listing of population elements.
stratification can take place at each level of sampling. the elements listed within a selected cluster might be stratified before the next stage of sampling.
only grabbing percentages you need
NCVS seeks to represent the nationwide population of persons age 12 and over who are living in households. “living in households” is important because the NCVS procedures are not designed to sample homeless people or people who live in institutional settings, like the military.
Because there is no national list of households in the U.S., multistage cluster sampling must be used to proceed from larger units to households and their residents.
National sampling frame used in the first stage defines primary sampling units (PSUs) as large metropolitan areas, nonmetropolitan counties, or groups of contiguous counties (to represent rural areas). The largest 93 PSUs are specified as self-representing and are automatically included in the first stage of sampling. The remaining PSUs are stratified by size, population density, reported crimes, and other variables
An additional 152 non-self-representing PSUs are then selected with a probability proportionate to the population of the PSU.
Second stage: involves designating four different sampling frames within each PSU. Each of these frames is used to select different types of subsequent units. First, the housing unit frame lists addresses of housing units from census records. Second, a group quarters frame lists group quarters such as dormitories and rooming houses. Third, a building permit frame lists newly constructed housing units from local government sources. Fourth, an area frame lists census blocks (physical geographic units) from which independent address lists are generated and sampled. These four frames are necessary because up-to-date lists of residential addresses are not available in this country.
So in total, it starts with geographic units and works down to selection of housing units
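ex. selection with probability proportionate to population could be sketched in Python as below. This is one common textbook technique (systematic selection along cumulative population totals), not a description of the actual NCVS procedure; the PSU names, populations, and function name are hypothetical.

```python
import random
from bisect import bisect_left
from itertools import accumulate

def pps_systematic(sizes, n, seed=None):
    """sizes maps unit name to population; returns n names chosen with PPS."""
    rng = random.Random(seed)
    names = list(sizes)
    cumulative = list(accumulate(sizes[name] for name in names))
    interval = cumulative[-1] / n              # total population / sample size
    start = rng.uniform(0, interval)           # random start in first interval
    points = [start + i * interval for i in range(n)]
    # Each point falls inside one unit's stretch of the cumulative total,
    # so bigger units cover more of the line and are hit more often.
    last = len(names) - 1
    return [names[min(bisect_left(cumulative, p), last)] for p in points]

# Hypothetical PSUs with made-up populations.
psu_population = {"psu-a": 50_000, "psu-b": 120_000, "psu-c": 30_000,
                  "psu-d": 200_000, "psu-e": 100_000}
selected = pps_systematic(psu_population, 2, seed=4)
```

A unit larger than the interval can be hit more than once, which is one reason very large units are often set aside and included automatically, as with the self-representing PSUs.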
so first NCVS uses proportionate sampling to select a large number of respondents who may then represent the relatively rare attribute of victimization
could be because probability sampling is inappropriate
ex. there is no list of all auto thieves, nor are we going to be able to create anything other than a partial and highly selective list
the likelihood that any given element will be selected is not known
also use this to represent patterns of complex variation
Pretesting a questionnaire is another situation in which purposive sampling is common
ex. if we plan to study people's attitudes about court-ordered restitution for crime victims, we might want to test the questionnaire on a sample of crime victims. Instead of selecting a probability sample of the general population, we might select some number of known crime victims, perhaps from court records
seldom produces any data of any general value, it may be useful to pretest a questionnaire but it should not be used for a study purportedly describing students as a whole
can be appropriate in some situations, best justified if the researcher wants to study the characteristics of people who are passing the sampling point at some given time.
begins by identifying a single subject or small number of subjects and then asking the subjects to identify others like them who might be willing to participate in the study
CJ research on active criminals or deviants frequently uses this
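ex. the referral chain could be sketched in Python as a breadth-first walk through who names whom. The referral network, the initial subject "A", and the function name are hypothetical.

```python
from collections import deque

def snowball_sample(referrals, seeds, max_size):
    """referrals maps each person to the people they name; seeds start the chain."""
    sample = []
    queue = deque(seeds)
    seen = set(seeds)
    while queue and len(sample) < max_size:
        person = queue.popleft()
        sample.append(person)                  # interview this subject
        for named in referrals.get(person, []):
            if named not in seen:              # do not recruit anyone twice
                seen.add(named)
                queue.append(named)
    return sample

# Hypothetical referral network starting from one initial subject.
referrals = {"A": ["B", "C"], "B": ["D"], "C": ["D", "E"], "D": [], "E": ["F"]}
sample = snowball_sample(referrals, ["A"], max_size=5)
```

Who ends up in the sample depends entirely on the starting subjects and their contacts, so the probability of selecting any given person is not known.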
essentially variations on purposive samples and on samples of available subjects
most appropriate when it is impossible to determine the probability that any given element will be selected in a sample. also may be necessary when the target population is difficult to locate or even identify