# Ch. 1-Ch. 5 Summative Review AP Statistics

By definition, a subset of a population selected for study is a
sample

The distinction between descriptive and inferential statistics is that
descriptive statistics describe data sets, inferential statistics involve generalizing to populations

A characteristic whose value may change from one individual to another is a
variable

According to Chebyshev’s Rule, at least what percent of data is within 5 standard deviations of the mean?
96
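
The 96 comes from Chebyshev's bound, 1 - 1/k², with k = 5. A minimal sketch of that arithmetic (the function name is just an illustration, not from the packet):

```python
def chebyshev_min_percent(k):
    """Minimum percent of data within k standard deviations of the mean."""
    if k <= 1:
        raise ValueError("Chebyshev's Rule is only informative for k > 1")
    # 1 - 1/k^2, written as (k^2 - 1)/k^2 and scaled to a percentage
    return 100 * (k**2 - 1) / k**2


print(chebyshev_min_percent(5))  # 96.0
print(chebyshev_min_percent(2))  # 75.0
```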

The Empirical Rule can be used when assessing a distribution if
the distribution is approximately normal

A treatment that has no active ingredients is a
placebo treatment

A study cannot be an experiment if
a procedure of random assignment to treatments is not performed

The term used to describe the bias that occurs if some segment of a population is systematically excluded from a sample is
selection bias

The z-score and percentile are measures of
relative standing

Jane Doe and Richard Roe are law students. Their scores on the midterm and final in the Statistics for Law class, together with the means and standard deviations for these exams, are given below:
Midterm: JD 60, RR 50, Mean 40, Std. Dev. 10
Final: JD 60, RR 50, Mean 80, Std. Dev. 15
Jane did better than Richard on both exams
(compare z-scores: z = (x - mean) / standard deviation)
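
A quick sketch of the z-score comparison for this card, using the numbers given above (the variable names are only illustrative):

```python
def z_score(x, mean, sd):
    """Number of standard deviations x lies above (+) or below (-) the mean."""
    return (x - mean) / sd


# (mean, standard deviation) for each exam, as given in the card
exams = {"Midterm": (40, 10), "Final": (80, 15)}
# Each student earned the same raw score on both exams
scores = {"Jane": 60, "Richard": 50}

for exam, (mean, sd) in exams.items():
    for name, x in scores.items():
        print(f"{exam}: {name} z = {z_score(x, mean, sd):.2f}")
```

On the final both z-scores are negative, but Jane's is closer to zero, so she still did better relative to the class.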

The bias that occurs because observations were not made of all individuals selected for a sample is
non-response bias

By definition, a simple random sample is one such that
each possible sample of size n has an equal chance of being selected

The sampling frame is the
list of all elements in a population

An experiment is a planned intervention undertaken to observe the effects of
explanatory variables

Two variables are confounded if
their effects on the response variable cannot be distinguished

Randomization, as a strategy in experimental design, would be unsuccessful if
aspects of the experimental conditions, other than the values of the explanatory variable, systematically favor a treatment

Blocking would be unsuccessful if
the blocks were heterogeneous on the blocking factor

The design strategy of making multiple observations for each experimental condition is
replication

In utilizing direct control, which of the following are held constant
values of an extraneous variable

In a study of dexterity, data was taken on 100 individuals. The variables measured for each person were the number of errors in picking up a dime in 50 trials, and the total time it took to pick up the dime 50 times. The data set is, therefore,
bivariate

Suppose I have a set of data with 5 numbers: -6.0, -4.5, 0, 5.0, and an unknown 5th number. For these 5 data points, which of the following statistics can NEVER be greater than zero
the median
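
One way to see this: whatever the fifth number is, the middle value of the sorted list is -4.5, 0, or the unknown itself when it lies between them. A small sketch that tries a few candidate values (the candidates are arbitrary picks, not from the packet):

```python
from statistics import median

known = [-6.0, -4.5, 0, 5.0]


def median_with(fifth):
    """Median of the four known values plus a candidate fifth value."""
    return median(known + [fifth])


# Arbitrary candidates for the unknown fifth number
for fifth in (-100, -5, -1, 0, 3, 100):
    print(fifth, median_with(fifth))
```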

Look at packet because boxplot
Median is 4, Q1 is at 3, Q3 is at 6

In a study of hatchling resting metabolism, three species, labeled A, B, and C below, were studied. Below is a pie chart of the sample sizes for each of the species. 36 hatchlings were studied in total. Based on the pie chart, about how many of the hatchlings were Species C hatchlings
12

Suppose that a frequency distribution and a cumulative frequency distribution are constructed from the same set of data, using the same classes. Then, for each class,
the frequency ≤ the cumulative frequency

Which of the following variables yields data that would be suitable for use in a histogram
length of a phone call

Use the following frequency table to determine the proportion of values less than 60
Class interval (frequency): 15-<30 (15), 30-<45 (14), 45-<60 (16), 60-<75 (12), 75-<90 (18); total 75
.600
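
A short sketch of the calculation from the table above: add the frequencies of the classes that end at or below 60 and divide by the total. The cumulative frequencies are included only as a cross-check; the variable names are illustrative.

```python
from itertools import accumulate

# (lower bound, upper bound) of each class and its frequency, from the table
classes = [(15, 30), (30, 45), (45, 60), (60, 75), (75, 90)]
freqs = [15, 14, 16, 12, 18]

total = sum(freqs)
cumulative = list(accumulate(freqs))  # running totals, class by class

# Classes whose upper bound is at most 60 contain only values below 60
below_60 = sum(f for (_, upper), f in zip(classes, freqs) if upper <= 60)
proportion = below_60 / total

print(cumulative)
print(below_60, total, round(proportion, 3))
```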

Canine problem look at packet
canine problem look at packet. 13

Canine problem look at packet. Considering the graphic displays, the best description of these data would be
Canine problem look at packet. skewed right

Canine problem look at packet. When constructing a modified box plot, one must find the upper and lower mild outlier cutoffs. For these data, the upper mild outlier cutoff would be
Canine problem look at packet. OMIT, no right answer. To find the cutoff, use Q1 and Q3 to calculate the IQR (Q3 - Q1), then take Q3 + 1.5 × IQR
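
The cutoff rule can be sketched as below. The quartiles for the canine data are in the packet, so the values used here (Q1 = 10, Q3 = 20) are hypothetical placeholders, not the actual answer.

```python
def mild_outlier_cutoffs(q1, q3):
    """Lower and upper mild outlier cutoffs for a modified boxplot."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr


# Hypothetical quartiles for illustration only
lower, upper = mild_outlier_cutoffs(q1=10, q3=20)
print(lower, upper)  # -5.0 35.0
```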

A distribution can have more than one
mode

It is possible for a distribution to be
symmetric and normal

A data set consisting of observations on two or more attributes is called a
multivariate data set

By definition, strata are groups of population units that
form well defined subpopulations

Suppose we have the following data: 12, 17, 13, 25, 16, 21, 30, 14, 16, 18
To find the 10% trimmed mean, what numbers should be deleted from the calculation
First put the data in numerical order:
12, 13, 14, 16, 16, 17, 18, 21, 25, 30
There are 10 values and 10% of 10 is 1, so delete one value from each end: 12 and 30
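
A minimal sketch of the trimming step on this data set. The helper name and its return shape are just for illustration; it drops the given percentage of values from each end of the sorted data.

```python
def trimmed_mean(data, percent):
    """Return (trimmed mean, deleted values), trimming `percent`% from each end."""
    ordered = sorted(data)
    k = len(ordered) * percent // 100  # values to delete from each end
    kept = ordered[k:len(ordered) - k]
    deleted = ordered[:k] + ordered[len(ordered) - k:]
    return sum(kept) / len(kept), deleted


data = [12, 17, 13, 25, 16, 21, 30, 14, 16, 18]
value, deleted = trimmed_mean(data, 10)
print(deleted)  # [12, 30]
print(value)    # 17.5
```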

The percentage of data points falling at or below the upper quartile is
75

For which of the following statistics would one not need to put the data in order from smallest to largest
the range

In terms of sensitivity to outliers, which is the correct ordering of the following statistics from least sensitive to most sensitive? In other words, if the following statistics were ordered like this: least sensitive < sensitive < most sensitive, what should the ordering be
median < trimmed mean < mean
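
To see the ordering in action, the sketch below swaps two values in a made-up data set for large outliers and measures how far each statistic moves. The data are hypothetical, chosen so that 10% trimming removes only one of the two outliers.

```python
from statistics import mean, median


def trimmed_mean(data, percent):
    """Mean after dropping `percent`% of the values from each end."""
    ordered = sorted(data)
    k = len(ordered) * percent // 100
    return mean(ordered[k:len(ordered) - k])


# Hypothetical data: the last two values are replaced by outliers
clean = [10, 12, 13, 14, 15, 16, 17, 18, 19, 20]
contaminated = clean[:-2] + [190, 200]

stats = {
    "median": median,
    "10% trimmed mean": lambda d: trimmed_mean(d, 10),
    "mean": mean,
}
# How much each statistic shifts when the outliers are introduced
shifts = {name: f(contaminated) - f(clean) for name, f in stats.items()}
for name, shift in shifts.items():
    print(f"{name}: shift = {shift:.3f}")
```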

Suppose that for a set of numeric data, where the numbers are not all the same, the standard deviation is less than 1.0. Then it must be true that
the variance < the standard deviation
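
The reasoning, written out with s for the standard deviation (so the variance is s²) and assuming s is positive:

$$
0 < s < 1 \;\Longrightarrow\; s^2 - s = s\,(s - 1) < 0 \;\Longrightarrow\; s^2 < s
$$

For example, s = 0.5 gives a variance of 0.25.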

Tributaries question see packet. The two points to the lower left of the original plot are the two points where zero species were observed. If these points are judged to be erroneous observations and deleted from the analysis, what would be the effect of the deletion on the sample statistics and best fit line for this data
the standard deviation of pH would decrease and the slope would be smaller

Look at packet for tributaries question
Using the equation of the best fit line above, to the nearest unit what is the predicted number of species observed for a mean pH = 6.0
5

A point is called an influential point if
it plays a large role in determining the slope of the least squares line

From March, 1980, to April, 1981, data were gathered on the amount of lead sold in gasoline (metric tons) in Massachusetts vs. the amount of lead found in umbilical blood in Boston (micrograms per deciliter). A summary of the analysis is presented below, and a least squares regression line has been fit to the data. Approximately what percentage of the variation in umbilical lead concentrations can be explained by the linear model
r² = 0.453
45.3% (r² expressed as a percentage)

Which of the following indicates that an association between x and y is positive
a positive Pearson’s correlation coefficient

The slope of the regression line and the correlation between two variables are related in the following way
the slope and correlation must be of the same sign
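
This follows from the least squares formula for the slope, where s_x and s_y are the sample standard deviations of x and y:

$$
b = r \cdot \frac{s_y}{s_x}
$$

Since s_x and s_y are both positive, b always has the same sign as r.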

When regressing y on x, y is referred to as the
response variable

A good fit of the simple linear regression model would be characterized by
a relatively large r² and a relatively small s_e

Of the following, which is not true of r
r is always between 0 and 1

Suppose that for two variables, x and y, the least squares line, yhat = a + bx, is found, and r is greater than zero. Which of the following statements is correct
for values of x less than xbar, the residuals must generally be relatively large

Look at packet. The fit of the data indicates that on average the estimates of the logs of the proportion returning are declining as the logarithm of the distance increases. The number in the table that indicates this is
the slope of the regression line

From this analysis, the proportion of bats returning that would be predicted for a release distance of 30 km is in which range below
.21-.25

From March, 1980, to April, 1981, data were gathered on the amount of lead sold in gasoline (metric tons) in Massachusetts vs. the amount of lead found in umbilical blood in Boston (micrograms per deciliter). A summary of the analysis is presented below, and a least squares regression line has been fit to the data. The residual associated with the observation that has a gasoline lead value of 82 metric tons and 4.5 micrograms per deciliter is in which interval below
-1 ≤ residual < -0.5
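
A residual is the observed value minus the value predicted by the line, y - yhat. The fitted intercept and slope for the lead data are in the packet, so the sketch below uses hypothetical numbers only to show the arithmetic.

```python
def residual(x, y, intercept, slope):
    """Observed y minus the value predicted by yhat = intercept + slope * x."""
    predicted = intercept + slope * x
    return y - predicted


# Hypothetical line and observation, for illustration only
print(residual(x=4, y=3.0, intercept=2.0, slope=0.5))  # -1.0
```

A negative residual means the observed point lies below the regression line.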