By definition, a subset of a population selected for study is a

sample

The distinction between descriptive and inferential statistics is that

descriptive statistics describe data sets, inferential statistics involve generalizing to populations

A characteristic whose value may change from one individual to another is a

variable

According to Chebyshev’s Rule, at least what percent of data is within 5 standard deviations of the mean?

96
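
The 96 comes from Chebyshev's bound of 1 - 1/k² with k = 5. A minimal sketch of that arithmetic (the function name is only illustrative, not from the course packet):

```python
def chebyshev_lower_bound(k: float) -> float:
    """Minimum percentage of data within k standard deviations of the mean."""
    if k <= 1:
        # For k <= 1 the bound is zero or negative, so it says nothing useful.
        raise ValueError("Chebyshev's Rule is only informative for k > 1")
    return 100 * (1 - 1 / k**2)


# k = 5 gives 100 * (1 - 1/25)
print(round(chebyshev_lower_bound(5), 2))
```

The same function with k = 2 or k = 3 gives the familiar 75% and roughly 89% bounds.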

The Empirical Rule can be used when assessing a distribution if

the distribution is approximately normal

A treatment that has no active ingredients is a

placebo treatment

A study cannot be an experiment if

a procedure of random assignment to treatments is not performed

The term used to describe the bias that occurs if some segment of a population is systematically excluded from a sample is

selection bias

The z-score and percentile are measures of

relative standing

Jane Doe and Richard Roe are law students. Their scores on the midterm and final exams in the Statistics for Law class, together with the means and standard deviations for these exams, are given below:

Midterm: JD 60, RR 50, Mean 40, Stand. Dev. 10

Final: JD 60, RR 50, Mean 80, Stand. Dev. 15

Jane did better than Richard on both exams

(Compute each z-score: z = (x - mean) / standard deviation, then compare Jane's z-score with Richard's on each exam.)
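
The z-score comparison can be checked with a short sketch using the exam figures given above (the dictionary layout and names are just one way to organize it):

```python
def z_score(x, mean, sd):
    """Standardize a score: how many standard deviations x lies from the mean."""
    return (x - mean) / sd


# (mean, standard deviation) for each exam, taken from the card above
exams = {"Midterm": (40, 10), "Final": (80, 15)}
scores = {
    "Jane": {"Midterm": 60, "Final": 60},
    "Richard": {"Midterm": 50, "Final": 50},
}

z = {
    name: {exam: z_score(s[exam], *exams[exam]) for exam in exams}
    for name, s in scores.items()
}

for exam in exams:
    print(exam, "Jane:", round(z["Jane"][exam], 2), "Richard:", round(z["Richard"][exam], 2))
```

Jane's z-score is higher on both exams, even though both students fall below the mean on the final.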

The bias that occurs because observations were not made of all individuals selected for a sample is

non-response bias

By definition, a simple random sample is one such that

each possible sample of size n has an equal chance of being selected

The sampling frame is the

list of all elements in a population

An experiment is a planned intervention undertaken to observe the effects of

explanatory variables

Two variables are confounded if

their effects on the response variable cannot be distinguished

Randomization, as a strategy in experimental design, would be unsuccessful if

aspects of the experimental conditions, other than the values of the explanatory variable, systematically favor a treatment

Blocking would be unsuccessful if

the blocks were heterogeneous on the blocking factor

The design strategy of making multiple observations for each experimental condition is

replication

In utilizing direct control, which of the following are held constant

values of an extraneous variable

In a study of dexterity, data were taken on 100 individuals. The variables measured for each person were the number of errors in picking up a dime in 50 trials, and the total time it took to pick up the dime 50 times. The data set is, therefore,

bivariate

Suppose I have a set of data with 5 numbers: -6.0, -4.5, 0, 5.0, and an unknown 5th number. For these 5 data points, which of the following statistics can NEVER be greater than zero

the median
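
Why the median: with five values the median is the third smallest, and three of the known values (-6.0, -4.5, 0) are already at or below zero. A quick check over a few trial values for the unknown fifth number (the trial values are made up for illustration):

```python
from statistics import median

known = [-6.0, -4.5, 0, 5.0]

# Hypothetical choices for the unknown fifth number, from very small to very large
candidates = [-1000, -6.0, -5, -4.5, -1, 0, 3, 5.0, 1000]

medians = {x: median(known + [x]) for x in candidates}

for x, m in medians.items():
    print(f"fifth value {x}: median {m}")
```

However large the fifth number is, the median never rises above 0, while the mean and the maximum can.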

Boxplot question (see packet)

The median is 4, Q1 is 3, and Q3 is 6

In a study of hatchling resting metabolism, three species, labeled A, B, and C below, were studied. Below is a pie chart of the sample sizes for each of the species. 36 hatchlings were studied in total. Based on the pie chart, about how many of the hatchlings were Species C hatchlings

12

Suppose that a frequency distribution and a cumulative frequency distribution are constructed from the same set of data, using the same classes. Then, for each class,

the frequency is less than or equal to the cumulative frequency

Which of the following variables yields data that would be suitable for use in a histogram

length of a phone call

Use the following frequency table to determine the proportion of values less than 60

Class interval    Frequency
15 to <30         15
30 to <45         14
45 to <60         16
60 to <75         12
75 to <90         18
Total             75

.600
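
The .600 is (15 + 14 + 16) / 75, the total frequency of the three classes lying entirely below 60. A small sketch with the table's numbers (variable names are illustrative):

```python
from itertools import accumulate

# (lower bound, upper bound) of each class and its frequency, from the table
classes = [(15, 30), (30, 45), (45, 60), (60, 75), (75, 90)]
freqs = [15, 14, 16, 12, 18]

# Running totals: each entry counts the values below that class's upper bound
cumulative = list(accumulate(freqs))
total = cumulative[-1]

# Add up the classes whose upper bound is at most 60
below_60 = sum(f for (low, high), f in zip(classes, freqs) if high <= 60)
proportion = below_60 / total

print(cumulative)
print(round(proportion, 3))
```

The running totals also illustrate the earlier card: each class frequency is never larger than the cumulative frequency for that class.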

Canine problem look at packet

canine problem look at packet. 13

Canine problem look at packet. Considering the graphic displays, the best description of these data would be

Canine problem look at packet. skewed right

Canine problem look at packet. When constructing a modified box plot, one must find the upper and lower mild outlier cutoffs. For these data, the upper mild outlier cutoff would be

Canine problem look at packet. OMIT: no answer choice is correct. To find the cutoff, find Q1 and Q3, calculate the IQR (Q3 - Q1), then compute Q3 + 1.5 * IQR
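
The cutoff rule itself can be sketched as below. The canine data are in the packet and not reproduced here, so the quartiles in this example are hypothetical placeholders, not the values for that problem:

```python
def mild_outlier_cutoffs(q1, q3):
    """Return (lower, upper) mild outlier cutoffs for a modified boxplot."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr


# Hypothetical quartiles, for illustration only (not the canine data)
lower, upper = mild_outlier_cutoffs(q1=10.0, q3=14.0)
print(lower, upper)
```

Any observation above the upper cutoff or below the lower cutoff would be drawn as an outlier rather than included in the whisker.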

A distribution can have more than one

mode

It is possible for a distribution to be

symmetric and normal

A data set consisting of observations on two or more attributes is called a

multivariate data set

By definition, strata are groups of population units that

form well defined subpopulations

Suppose we have the following data: 12, 17, 13, 25, 16, 21, 30, 14, 16, 18

To find the 10% trimmed mean, what numbers should be deleted from the calculation

First put the data in numerical order:

12, 13, 14, 16, 16, 17, 18, 21, 25, 30

Since there are 10 values and 10% of 10 is 1, delete one value from each end, i.e. 12 and 30
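
The trimming steps can be sketched as follows, using the ten values from the question (the helper name and its return shape are assumptions made for this example):

```python
data = [12, 17, 13, 25, 16, 21, 30, 14, 16, 18]


def trimmed_mean(values, proportion):
    """Drop the given proportion of values from each end, then average the rest.

    Returns the trimmed mean and the list of values that were dropped.
    """
    ordered = sorted(values)
    k = round(len(ordered) * proportion)  # number to delete from each end
    kept = ordered[k:len(ordered) - k]
    dropped = ordered[:k] + ordered[len(ordered) - k:]
    return sum(kept) / len(kept), dropped


mean_10, dropped = trimmed_mean(data, 0.10)
print("deleted:", dropped)
print("10% trimmed mean:", mean_10)
```

Sorting first matters: the values deleted are the smallest and largest, not the first and last as listed.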

The percentage of data points falling at or below the upper quartile is

75

For which of the following statistics would one not need to put the data in order from smallest to largest

the range

In terms of sensitivity to outliers, which is the correct ordering of the following statistics from least sensitive to most sensitive? In other words, if the following statistics were ordered like this:

least sensitive < sensitive < most sensitive, what should the ordering be

median < trimmed mean < mean
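
One way to see this ordering is to contaminate a small data set with outliers and measure how far each statistic moves. The data below are made up for illustration, and the trimmed mean uses 10% trimming as in the earlier card:

```python
from statistics import mean, median


def trimmed_mean(values, proportion):
    """Average after dropping the given proportion of values from each end."""
    ordered = sorted(values)
    k = round(len(ordered) * proportion)
    kept = ordered[k:len(ordered) - k]
    return sum(kept) / len(kept)


# Hypothetical data: the two largest values are replaced by outliers
clean = [10, 12, 13, 14, 15, 16, 17, 18, 19, 20]
contaminated = clean[:-2] + [190, 200]

# How far each statistic moves once the outliers are introduced
shift = {
    "median": abs(median(contaminated) - median(clean)),
    "trimmed mean": abs(trimmed_mean(contaminated, 0.10) - trimmed_mean(clean, 0.10)),
    "mean": abs(mean(contaminated) - mean(clean)),
}

for name, amount in shift.items():
    print(f"{name}: moved by {amount:.3f}")
```

With two outliers and only one value trimmed from each end, one outlier survives the trimming, so the trimmed mean moves more than the median but less than the mean.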

Suppose that for a set of numeric data, where the numbers are not all different, the standard deviation is less than 1.0. Then it must be true that

the variance < the standard deviation
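
The reasoning: the variance is the square of the standard deviation, and squaring a number strictly between 0 and 1 makes it smaller. The standard deviation cannot be 0 here because the numbers are not all equal. A quick check on made-up data:

```python
from statistics import stdev, variance

# Hypothetical data with a sample standard deviation below 1
data = [2.0, 2.5, 2.5, 3.0]

s = stdev(data)       # sample standard deviation
s2 = variance(data)   # sample variance, equal to s squared

print(round(s, 4), round(s2, 4))
```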

Tributaries question (see packet). The two points to the lower left of the original plot are the two points where zero species were observed. If these points are judged to be erroneous observations and deleted from the analysis, what would be the effect of the deletion on the sample statistics and best fit line for these data

the standard deviation of pH would decrease and the slope would be smaller

Look at packet for tributaries question

Using the equation of the best fit line above, to the nearest unit what is the predicted number of species observed for a mean pH = 6.0

5

A point is called an influential point if

it plays a large role in determining the slope of the least squares line

From March, 1980, to April, 1981, data were gathered on the amount of lead sold in gasoline (metric tons) in Massachusetts vs. the amount of lead found in umbilical blood in Boston (micrograms per deciliter). A summary of the analysis is presented below, and a least squares regression line has been fit to the data. Approximately what percentage of the variation in umbilical lead concentrations can be explained by the linear model

r² = 0.453

45.3% (r² = 0.453 expressed as a percentage)

Which of the following indicates that an association between x and y is positive

a positive Pearson’s correlation coefficient

The slope of the regression line and the correlation between two variables are related in the following way

the slope and correlation must be of the same sign
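
The signs agree because the least squares slope is b = r * (s_y / s_x), and both standard deviations are positive. A sketch that computes both quantities from scratch on made-up data (the function name and data are illustrative only):

```python
from math import sqrt


def slope_and_r(xs, ys):
    """Return the least squares slope b and Pearson's r for paired data."""
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    sxx = sum((x - xbar) ** 2 for x in xs)
    syy = sum((y - ybar) ** 2 for y in ys)
    # Both share the numerator sxy; the denominators are always positive
    return sxy / sxx, sxy / sqrt(sxx * syy)


# Hypothetical data: one increasing trend, one decreasing trend
xs = [1, 2, 3, 4, 5]
b_up, r_up = slope_and_r(xs, [2, 4, 5, 4, 6])
b_down, r_down = slope_and_r(xs, [6, 4, 5, 4, 2])

print(round(b_up, 3), round(r_up, 3))
print(round(b_down, 3), round(r_down, 3))
```

Since the slope and r share the same numerator, a positive r always comes with a positive slope and a negative r with a negative slope.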

When regressing y on x, y is referred to as the

response variable

A good fit of the simple linear regression model would be characterized by

a relatively large r² and a relatively small s_e

Of the following, which is not true of r

r is always between 0 and 1

Suppose that for two variables, x and y, the least squares line, yhat = a + bx, is found, and r is greater than zero. Which of the following statements is correct

for values of x less than xbar, the residuals must generally be relatively large

Look at packet. The fit of the data indicates that on average the estimates of the logs of the proportion returning are declining as the logarithm of the distance increases. The number in the table that indicates this is

the slope of the regression line

From this analysis, the proportion of bats returning that would be predicted for a release distance of 30 km is in which range below

.21-.25

From March, 1980, to April, 1981, data were gathered on the amount of lead sold in gasoline (metric tons) in Massachusetts vs. the amount of lead found in umbilical blood in Boston (micrograms per deciliter). A summary of the analysis is presented below, and a least squares regression line has been fit to the data. The residual associated with the observation that has a gasoline lead value of 82 metric tons and 4.5 micrograms per deciliter is in which interval below

-1 ≤ residual < -0.5