How data is collected

how we collect representative data is fundamental to criminal justice research

much of the value of research depends on how data are collected

critical part is deciding what will be observed and what will not

Sampling

selecting some units of a larger population for further study

In selecting a group of subjects for study, social science most often resorts to some type of sampling. In general, sampling refers to selecting part of a population.

process of selecting observations

is ordinarily used to select observations for one of two related reasons

1) often it is not possible to collect information from all persons or other units we wish to study

2) often it is not necessary to collect data from all persons or other units

goal of all sampling: to reduce, or at least understand, potential biases that may be at work in selecting subjects

The ultimate purpose of sampling is to select a set of elements from a population in such a way that descriptions of those elements (sample statistics) accurately portray the parameters of the total population from which the elements are selected. Probability sampling enhances the likelihood of accomplishing this aim and also provides methods for estimating the degree of probable success

Selecting Samples

1) we want to select samples to represent some larger population of people or other things

ex. in studying cases in a criminal court, we may not be able to examine all cases, so we select a sample to represent the population of all cases processed through that court.

2) we may want to generalize from a sample to an unobserved population the sample is intended to represent

ex. if we interview a sample of community residents, we may want to generalize our findings to all community residents-those we interviewed and those we did not.

ex. our sample of criminal court cases can be generalized to the population of all criminal court cases

a sample of individuals from a population, if it is to provide useful descriptions of the total population, must contain essentially the same variations that exist in the population.

Probability Sampling

sampling in which the probability that an element will be included in a sample is known

A special type of sampling that enables us to make statistical generalizations to a larger population

a method of selection in which each member of a population has a known chance or probability of being selected

knowing the probability that any individual member of a population could be selected makes it possible for us to make predictions that our sample accurately represents the larger population

if all members of a population were identical in all respects-demographic characteristics, attitudes, experiences, behaviors, and so on-there would be no need for careful sampling procedures; any sample would be sufficient. but in reality humans vary in many ways, so any real population is heterogeneous.

Probability Sampling; Random Selection

the key to this process is random selection, which ensures that each element has an equal chance of being selected, independent of any other event in the selection process.

ex. flipping a coin

two reasons for using this

1) this procedure serves as a check on conscious or unconscious bias on the part of the researcher. A researcher who selects cases on an intuitive basis might choose cases that will support his or her research expectations or hypothesis. Random selection erases this danger

2) we can draw on probability theory, which allows us to estimate population parameters and to estimate how accurate our statistics are likely to be

Advantages of probability sampling

two special advantages

1) probability samples, though never perfectly representative, are typically more representative than other types of samples because they avoid the biases of casual selection

2) probability sampling permits us to estimate the accuracy or representativeness of the sample. Because probability samples let us draw on probability theory, they can provide an accurate estimate of success or failure

probability theory tells us how close a sample statistic is likely to be to the population parameter, and gives us access to the math for estimating it

enables us to make relatively few observations and then generalize from those observations to a much wider population

ex. if we are interested in what proportion of high school students have used marijuana, collecting data from a probability sample of a few thousand students will serve just as well as trying to study every high school student in the country.

helps researchers generalize from observed cases to unobserved ones

Non-probability sampling techniques

although probability sampling is central to criminal justice research, it cannot be used in many situations of interest

has its own logic and can provide useful samples for criminal justice inquiry

Sampling Bias

simply means that those selected are not typical or representative of the larger population from which they have been chosen. This kind of bias is virtually inevitable when a researcher picks subjects casually.

ex. when selecting lawyers, a researcher who finds some lawyers intimidating may avoid approaching them, so they are left out of the sample.

The researcher might make a conscious effort to interview every 10th lawyer who enters the courthouse, but he still cannot be sure of a representative sample, because different types of lawyers visit the courthouse with different frequencies and some never go to the courthouse at all. So the resulting sample will overrepresent lawyers who visit the courthouse more often

polls linked to web blogs, text messages or email cannot be trusted to represent the general population

some techniques can help us avoid bias
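
The courthouse example can be worked through with a little arithmetic. This is only a sketch: the three lawyer types, their counts, and their visit frequencies are made-up numbers, not figures from the notes. It compares each group's share of the lawyer population with its expected share of an "every 10th entrant" sample.

```python
# Hypothetical lawyer population: (type, number of lawyers, courthouse visits per month).
population = [("trial", 100, 20), ("transactional", 100, 2), ("never visits", 100, 0)]

total_lawyers = sum(count for _, count, _ in population)
total_entries = sum(count * visits for _, count, visits in population)

shares = {}
for kind, count, visits in population:
    share_of_population = count / total_lawyers
    # Every 10th entrant is drawn from the stream of courthouse entries, so the
    # expected sample share tracks a group's share of entries, not of lawyers.
    expected_sample_share = count * visits / total_entries
    shares[kind] = (share_of_population, expected_sample_share)
    print(f"{kind:14s} population {share_of_population:.2f}  "
          f"expected sample {expected_sample_share:.2f}")
```

Under these assumed numbers, each group is one third of the population, yet frequent visitors dominate the sample and lawyers who never visit cannot be selected at all.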

Sample

is representative of the population from which it is selected if the aggregate characteristics of the sample closely approximate those same aggregate characteristics for the population

ex. if the population contains fifty percent women, a representative sample will include “close to” fifty percent women

samples do not have to be representative in all respects: representativeness is limited to those characteristics that are relevant to the substantive interests of the study

Equal probability of selection method

basic principle of probability sampling is that a sample will be representative of the population from which it is selected if all members of the population have this (an equal chance of being selected for the sample)

this principle forms the basis of probability sampling

even carefully selected samples are seldom, if ever, perfectly representative of the populations from which they are drawn

Sample Element

that unit about which information is collected and that provides the basis of analysis

in survey research elements are typically people or certain types of people

in CJ research, other kinds of units can be the elements, such as correctional facilities, police beats, or court cases.

Population

theoretically specified grouping of study elements

ex. the vague “delinquents” might be the target for a study; a more precise description of the population includes the definition of the element “delinquent” (a person charged with a delinquent offense) and the time referent for the study (charged with a delinquent offense in the previous six months)

Population Parameter

value for a given variable in a population.

ex. the average income of all families in a city and the age distribution of the city’s population are parameters.

important portion of criminal justice research involves estimating population parameters on the basis of sample observations

Sample Statistic

summary description of a given variable in the sample

are used to make estimates of population parameters

ex. the average income computed from a sample and the age distribution of that sample are statistics, and those statistics are used to estimate income and age parameters
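
To make the parameter/statistic distinction concrete, here is a small sketch. The income figures are simulated (a log-normal distribution chosen only for illustration), and the population and sample sizes are assumptions, not values from the notes.

```python
import random
import statistics

rng = random.Random(42)

# Hypothetical population: annual income for each of 20,000 families in a city.
population_incomes = [round(rng.lognormvariate(10.8, 0.6)) for _ in range(20_000)]

# Population parameter: the mean over every family (normally unknown to us).
parameter = statistics.mean(population_incomes)

# Sample statistic: the same summary computed from a random sample of 400 families.
sample_incomes = rng.sample(population_incomes, 400)
statistic = statistics.mean(sample_incomes)

print(f"parameter {parameter:,.0f}   statistic {statistic:,.0f}")
```

In real research only the statistic is observed; the parameter is what it is used to estimate.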

Sampling frame

our target population is all adult residents. in order to draw an actual sample, we need some sort of list of elements in our population. such a list is called a sampling frame.

a list of elements in a population that is used to select a sample

the list or quasi-list of elements from which a probability sample is selected, or the list or quasi-list for our target population (quasi-list because, even though an actual list might not exist, we can draw samples as if there were one)

Simple Random Sampling

forms the basis of probability theory and the statistical tools we use to estimate population parameters, standard error, and confidence intervals. Such statistics assume unbiased sampling, and simple random sampling is the foundation of unbiased sampling

Once a sampling frame is established, a simple random sample can be produced by assigning a single number to each element in the frame, not skipping any number in the process. A table of random numbers, or a computer program that generates them, is then used to select elements for the sample

rarely used in practice; not the most efficient method
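
The numbering-and-drawing procedure can be sketched in a few lines. The frame of court case IDs, the sample size, and the function name `simple_random_sample` are hypothetical; Python's standard `random` module stands in for the table of random numbers.

```python
import random

def simple_random_sample(frame, n, seed=None):
    """Draw n elements from the sampling frame without replacement."""
    rng = random.Random(seed)
    # Number every element 1..N with no gaps, as the notes describe.
    numbered = {i: element for i, element in enumerate(frame, start=1)}
    # Let the computer pick n distinct numbers; every number is equally likely.
    chosen_numbers = rng.sample(range(1, len(numbered) + 1), n)
    return [numbered[i] for i in sorted(chosen_numbers)]

# Hypothetical frame of 500 court case IDs.
frame = [f"case-{i:03d}" for i in range(1, 501)]
sample = simple_random_sample(frame, n=50, seed=1)
print(len(sample), sample[:3])
```

Fixing `seed` only makes the draw repeatable for checking; leave it as `None` for an actual selection.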

Systematic Sampling

Simple random sampling typically requires a list of elements. When such a list is available, researchers usually use systematic sampling rather than simple random sampling.

the researcher chooses every kth element in the list for inclusion in the sample. if the list contains 10,000 elements and we want a sample of 1,000, we select every 10th element for our sample. to guard against any possible human bias, we should select the first element at random: we begin by selecting a random number between 1 and 10. the element having that number, plus every 10th element following it, is included in the sample

danger=if the list of elements is arranged in a cyclical pattern that coincides with the sampling interval, a biased sample may be drawn.

if considering a systematic sample from a list, we have to carefully examine the nature of that list. if the elements are arranged in any particular order, we have to figure out whether that order will bias the sample to be selected and take steps to counteract any bias

so it is usually superior to simple random sampling in convenience
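
A minimal sketch of the 10,000-element example, including the cyclical-list danger. The function name `systematic_sample` and the "special"/"ordinary" labels are made up for illustration.

```python
import random

def systematic_sample(frame, n, seed=None):
    """Select every kth element after a random start."""
    rng = random.Random(seed)
    k = len(frame) // n          # sampling interval, e.g. 10,000 // 1,000 = 10
    start = rng.randint(1, k)    # random start between 1 and k, inclusive
    # Positions are 1-based: start, start + k, start + 2k, ...
    return [frame[pos - 1] for pos in range(start, len(frame) + 1, k)][:n]

frame = list(range(1, 10_001))
sample = systematic_sample(frame, 1000, seed=7)
print(len(sample), sample[:3])

# Danger: a list whose cycle equals the interval. Every 10th unit is "special",
# so the sample holds either only special units or none at all.
cyclic = ["special" if pos % 10 == 0 else "ordinary" for pos in range(1, 10_001)]
cyclic_sample = systematic_sample(cyclic, 1000, seed=7)
print(set(cyclic_sample))
```

Shuffling the list, or checking its ordering against the interval before sampling, is one way to counteract the problem.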

Stratified Sampling

not an alternative to random or systematic selection; it represents a possible modification in their use.

it is a method for obtaining a greater degree of representativeness-decreasing the probable sampling error

sampling error is reduced by two factors in the sample design: 1) a large sample produces a smaller sampling error than a small sample does and 2) a homogeneous population produces samples with smaller sampling errors than a heterogeneous population does. So if 99% of the population agrees with a certain statement, it is extremely unlikely that any probability sample will greatly misrepresent the extent of agreement. If the population is split 50-50 on the statement, then the sampling error will be much greater

Stratified sampling is based on this second factor in sampling theory
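
Both factors show up in the usual standard error formula for a proportion under simple random sampling, sqrt(p(1 - p) / n). The formula is not stated in the notes; it is the standard binomial result, used here as an assumption to illustrate the 99% versus 50-50 comparison.

```python
import math

def standard_error(p, n):
    """Standard error of a sample proportion under simple random sampling."""
    return math.sqrt(p * (1 - p) / n)

# Factor 1: larger n shrinks the error. Factor 2: a lopsided (homogeneous)
# population, p = 0.99, gives a smaller error than an even split, p = 0.50.
for n in (100, 400, 1600):
    print(n, round(standard_error(0.99, n), 4), round(standard_error(0.50, n), 4))
```

Quadrupling the sample size halves the standard error, while the 50-50 split gives the largest error at any given n.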

Rather than selecting our sample from the total population at large, we select appropriate numbers of elements from homogeneous subsets of that population.

ultimate function is to organize the population into homogeneous subsets and to select the appropriate number of elements from each. The choice of stratification variables typically depends on what variables are available.

ensures the proper representation of the stratification variables, which enhances representation of other variables related to them.

it is likely to be more representative on a number of variables than is a simple random sample
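
A rough sketch of proportionate stratified sampling: group the frame into strata, then draw the same fraction from each. The court-case frame, the offense-type stratification variable, and the function name `stratified_sample` are assumptions for illustration.

```python
import random
from collections import defaultdict

def stratified_sample(frame, stratum_of, fraction, seed=None):
    """Draw the same fraction from every stratum (proportionate stratification)."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for element in frame:
        strata[stratum_of(element)].append(element)   # homogeneous subsets
    sample = []
    for name in sorted(strata):
        members = strata[name]
        take = round(len(members) * fraction)         # appropriate number per stratum
        sample.extend(rng.sample(members, take))
    return sample

# Hypothetical frame: 600 misdemeanor and 400 felony cases.
frame = [("misdemeanor", i) for i in range(600)] + [("felony", i) for i in range(400)]
sample = stratified_sample(frame, stratum_of=lambda case: case[0], fraction=0.10, seed=3)
counts = {kind: sum(1 for k, _ in sample if k == kind) for kind in ("misdemeanor", "felony")}
print(counts)
```

Because each stratum contributes exactly its share, the sample matches the population on the stratification variable by construction, which a simple random sample would only do approximately.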

Disproportionate stratified sample

purposely produce samples that are not representative of a population on some variable

if only a small number of people in a population exhibit some attribute or characteristic of interest, then a large sample must be drawn to produce adequate numbers of elements that exhibit the uncommon condition

way of obtaining sufficient numbers of these rare cases by selecting a number disproportionate to their representation in the population

ex. a national crime survey in which one goal is to obtain some minimum number of crime victims in a sample. Because crime victimization for certain offenses such as robbery or rape is rare on a national scale, persons who live in large urban areas, where serious crime is more common, are disproportionately sampled
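
A sketch of oversampling one stratum. The population sizes and victimization rates are invented. The weighting step (each sampled element counts for stratum size divided by stratum sample size) is not covered in these notes; it is shown as an assumption about how a disproportionate sample is usually corrected when estimating for the whole population.

```python
import random

def disproportionate_sample(strata, sizes, seed=None):
    """strata: name -> list of elements; sizes: name -> number to draw."""
    rng = random.Random(seed)
    sample = []
    for name, members in strata.items():
        n_h = sizes[name]
        weight = len(members) / n_h   # each sampled element stands for this many
        for element in rng.sample(members, n_h):
            sample.append((name, element, weight))
    return sample

# Hypothetical population (1 = victim): victimization is more common in urban areas.
strata = {
    "urban": [1] * 200 + [0] * 1800,   # 2,000 residents, 10% victims
    "other": [1] * 160 + [0] * 7840,   # 8,000 residents, 2% victims
}
sample = disproportionate_sample(strata, {"urban": 500, "other": 500}, seed=5)

victims_found = sum(v for _, v, _ in sample)
unweighted = victims_found / len(sample)
weighted = sum(v * w for _, v, w in sample) / sum(w for _, _, w in sample)
print(victims_found, round(unweighted, 3), round(weighted, 3))
```

Oversampling the urban stratum yields more victims to analyze, while the weighted figure is the one to compare with the overall population rate (3.6% in this made-up population).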

British Crime Survey (BCS)

nationwide survey of people age 16 and older in England and Wales. Over its first 20 years the BCS selectively oversampled people or areas to yield larger numbers of designated subjects than would result from proportionate random samples of the population

beginning in 2004, the BCS disproportionately oversampled areas served by smaller police forces to produce a large enough number of cases to statistically represent rural areas

sampling is simplified by the existence of a national list of something close to addresses. The Postcode Address File (PAF) lists postal delivery points nationwide and is further subdivided to distinguish small users, those addresses receiving fewer than fifty items per day.

Postcode sectors, roughly corresponding to five-digit zip codes, are easily defined clusters of addresses from the PAF. Samples of addresses are then selected from within these sectors. In addition, booster samples are used to increase the number of respondents who are ethnic minorities or ages 16-24, because the victimization experiences of ethnic minorities were of special interest to police and other public officials.

Multistage cluster sampling is used in this

So this samples a disproportionate number of minority and young residents, who are more likely to be victims of crime. sampling procedures for this are simpler than those for the NCVS, because a suitable sampling frame exists at the national level.

Multistage Cluster Sample

sampling frame not readily available

ex. population of cities, state, nation or all police officers in the U.S

such a design involves the initial sampling of groups of elements-clusters-followed by the selection of elements within each of the selected clusters

may be used when it is either impossible or impractical to compile an exhaustive list of the elements that compose the target population

often though population elements are already grouped into subpopulations and a list of those subpopulations either exists or can be created

ex. U.S. law enforcement officers are employed by individual cities, counties, or states, and it is possible to create lists of those political units. for cluster sampling, then, we could sample the list of cities, counties, and states in some manner, such as a systematic sample. Next, obtain lists of law enforcement officers from agencies in each of the selected jurisdictions. Then sample each of those lists to provide samples of police officers for study.

Cluster sampling involves the repetition of two basic steps: listing and sampling

Subject to two sampling errors-

1-the initial sample of clusters represents the population clusters only within a range of sampling error.

2-the sample of elements selected within given cluster represents all the elements in that cluster only within a range of sampling error

general guideline-maximize the number of clusters selected while decreasing the number of elements within each cluster. Efficiency of cluster sampling is based on the ability to minimize the listing of population elements.
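
The list-then-sample repetition can be sketched as a two-stage design. The agencies, officer rosters, and the function name `two_stage_cluster_sample` are hypothetical; the point is that element lists are requested only for the clusters chosen in stage 1.

```python
import random

def two_stage_cluster_sample(cluster_names, list_elements, n_clusters,
                             n_per_cluster, seed=None):
    rng = random.Random(seed)
    # Stage 1: list the clusters, then sample them.
    chosen = rng.sample(sorted(cluster_names), n_clusters)
    sample = {}
    for name in chosen:
        # Stage 2: list elements only for the selected clusters, then sample them.
        roster = list_elements(name)
        sample[name] = rng.sample(roster, min(n_per_cluster, len(roster)))
    return sample

# Hypothetical data: 40 agencies, each with a roster of 30 to 120 officers.
size_rng = random.Random(0)
rosters = {
    f"agency-{a:02d}": [f"agency-{a:02d}/officer-{o}"
                        for o in range(size_rng.randint(30, 120))]
    for a in range(40)
}
sample = two_stage_cluster_sample(rosters.keys(), rosters.__getitem__,
                                  n_clusters=20, n_per_cluster=5, seed=11)
print(len(sample), sum(len(officers) for officers in sample.values()))
```

Choosing 20 clusters with 5 officers each, rather than 5 clusters with 20 each, follows the guideline of more clusters and fewer elements per cluster.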

Sampling Units

population elements, or aggregates of those elements

Multistage Cluster Sampling with Stratification

we can use stratification techniques to refine and improve the sample being selected.

stratification can take place at each level of sampling. the elements listed within a selected cluster might be stratified before the next stage of sampling.

only grabbing percentages you need

NCVS with sampling

although various parts of the NCVS have been modified since the surveys began in 1972, the basic sampling strategies have remained relatively constant.

NCVS seeks to represent the nationwide population of persons age 12 and over who are living in households. “living in households” is important because the NCVS procedures are not designed to sample homeless people or people who live in institutional settings, such as the military.

Because there is no national list of households in the U.S., multistage cluster sampling must be used to proceed from larger units to households and their residents.

National sampling frame used in the first stage defines primary sampling units (PSUs) as large metropolitan areas, nonmetropolitan counties, or groups of contiguous counties (to represent rural areas). the largest 93 PSUs are specified as self-representing and are automatically included in the first stage of sampling. the remaining PSUs are stratified by size, population density, reported crimes, and other variables

An additional 152 non-self-representing PSUs are then selected with a probability proportionate to the population of the PSU.

Second Stage-involves designating four different sampling frames within each PSU. each of these frames is used to select different types of subsequent units. First, the housing unit frame lists addresses of housing units from census records. Second, a group quarters frame lists group quarters such as dormitories and rooming houses. Third, a building permit frame lists newly constructed housing units from local government sources. Fourth, an area frame lists census blocks (physical geographic units) from which independent address lists are generated and sampled. these four frames are necessary because up-to-date lists of residential addresses are not available in this country.

So in total, it starts with demographic units and works down to selection of housing units

so first, the NCVS uses proportionate sampling to select a large number of respondents who may then represent the relatively rare attribute of victimization
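
Selecting first-stage units "with a probability proportionate to the population" can be sketched as follows. This uses systematic selection along cumulative population totals, which is one common way to do it and is an assumption here, not a description of the actual NCVS procedure. The PSU names and populations are invented.

```python
import random
from itertools import accumulate

def pps_systematic(sizes, n, seed=None):
    """Pick n units with probability proportionate to size (systematic PPS)."""
    rng = random.Random(seed)
    names = sorted(sizes)
    cumulative = list(accumulate(sizes[name] for name in names))
    interval = cumulative[-1] / n      # one selection per `interval` people
    point = rng.random() * interval    # random start inside the first interval
    picks, idx = [], 0
    for _ in range(n):
        # Advance to the unit whose cumulative range contains the selection point.
        while idx < len(names) - 1 and cumulative[idx] <= point:
            idx += 1
        picks.append(names[idx])
        point += interval
    return picks

# Hypothetical first-stage frame: 60 PSUs with populations of 10,000 to 50,000.
size_rng = random.Random(0)
psu_population = {f"PSU-{i:02d}": size_rng.randint(10_000, 50_000) for i in range(60)}
selected = pps_systematic(psu_population, n=10, seed=2)
print(selected)
```

In this scheme a unit larger than the interval would be certain to be picked, which fits the idea of treating the largest PSUs as self-representing and including them automatically.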

Non-probability Sampling

sampling in which the probability that an element will be included in a sample is not known

could be because probability sampling is inappropriate

ex. there is no list of all auto thieves, nor are we going to be able to create anything other than a partial and highly selective list

the likelihood that any given element will be selected is not known

Purposive/Judgemental Sampling

sometimes appropriate to select a sample based on our own knowledge of the population, its elements, and the nature of our research aims-in short based on our judgement and the purpose of the study

also use this to represent patterns of complex variation

Pretesting a questionnaire is another situation in which purposive sampling is common

ex. if we plan to study people's attitudes about court-ordered restitution for crime victims, we might want to test the questionnaire on a sample of crime victims. Instead of selecting a probability sample of the general population, we might select some number of known crime victims, perhaps from court records

Convenience Samples

relying on available subjects-stopping people at a street corner or some other location.

seldom produces data of any general value. it may be useful for pretesting a questionnaire, but it should not be used for a study purportedly describing a population (for example, students) as a whole

can be appropriate in some situations, best justified if the researcher wants to study the characteristics of people who are passing the sampling point at some given time.

Snowball Samples

commonly used in field research studies or qualitative interviewing

begins by identifying a single subject or small number of subjects and then asking the subjects to identify others like them who might be willing to participate in the study

CJ research on active criminals or deviants frequently uses this

essentially variations on purposive samples and on samples of available subjects

most appropriate when it is impossible to determine the probability that any given element will be selected in a sample. also may be necessary when the target population is difficult to locate or even identify
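
The referral process can be pictured as a walk outward from the first subjects. In this sketch the referral network, the letters standing in for people, and the function name `snowball_sample` are all hypothetical.

```python
from collections import deque

def snowball_sample(seeds, referrals, max_size):
    """Start from seed subjects and follow referrals until max_size is reached."""
    sample = []
    seen = set(seeds)
    queue = deque(seeds)
    while queue and len(sample) < max_size:
        person = queue.popleft()
        sample.append(person)
        # Ask this subject to name others who might be willing to participate.
        for contact in referrals.get(person, []):
            if contact not in seen:
                seen.add(contact)
                queue.append(contact)
    return sample

# Hypothetical referral network: each person names the people they know.
referrals = {"A": ["B", "C"], "B": ["D"], "C": ["D", "E"], "E": ["A", "F"]}
print(snowball_sample(["A"], referrals, max_size=5))
# → ['A', 'B', 'C', 'D', 'E']
```

Anyone not connected to the starting subjects can never be reached, which is one reason the probability of selection is unknown for this kind of sample.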
