question

Population

answer

The entire group that is the target of interest, not just people. Eg, "the population of 1 bedroom apartments"

question

Sample

answer

A subgroup of the population. Eg, "the 1 bedroom apartments with dishwashers."

question

Steps in the statistics process

answer

1. PRODUCE DATA (by studying a sample of the population) 2. EXPLORATORY DATA ANALYSIS (Summarize data.) 3. PROBABILITY ANALYSIS (Determine how the sample may differ from the population.) 4. INFERENCE (draw conclusions)

question

Data

answer

pieces of info about individuals organized into variables

question

Individual

answer

a particular person or object

question

Variable

answer

a particular characteristic of the individual

question

Dataset

answer

a set of data identified with particular circumstances. Typically displayed in tables with rows as the individuals and columns as the variables

question

Quantitative vs Categorical/Qualitative variables

answer

Quantitative: numerical values that represent a measurement or count. Categorical: category or label values into which individuals are grouped.

question

Three steps in Exploratory Data Analysis

answer

1. Organize and SUMMARIZE raw data 2. DISCOVER important features and patterns and striking deviations. 3. INTERPRET findings in the context of the problem

question

Examining Distributions

answer

exploring data obtained from one variable at a time

question

Examining Relationships

answer

exploring data obtained from two variables at a time

question

Distribution

answer

what values the variable takes, how often

question

Three types of graphical displays of categorical distributions

answer

1. Pie Charts 2. Bar Charts 3. Pictogram

question

Bins

answer

ranges of data to make charting easier, like a bar chart where each bar shows a range like 70-80%

question

Numerical Summaries

answer

category counts and percentages

question

Four types of Graphical displays of Quantitative Variables

answer

1. Histogram 2. Stemplot 3. Dotplot 4. Boxplot

question

Histogram

answer

Like a bar chart, but the x axis is numerical and in order, divided into intervals (bins), and the y axis is the count (or percent) of observations falling into each interval. Eg: the x axis is number of hours studied, and the y axis is the number of students falling into each hours-studied interval.

question

4 ways to interpret a histogram

answer

1. Shape - Symmetry/Skewness, Peakedness (Modality) 2. Center - midpoint 3. Spread - approx range covered by all the data 4. Outliers - observations that fall outside the overall pattern

question

Symmetric distributions (on a histogram)

answer

The left and right halves are approximately mirror images of each other. Can be multi-peaked and still symmetrical.

question

Skewness (on a histogram)

answer

The data is skewed to the right or left when a long tail (often caused by outliers) stretches out in that direction. (Careful: the histogram looks heaviest on the side opposite to the one it is skewed toward. Think of the outliers as pulling a long tail out from the main data, making it not symmetrical.)

question

Peakedness (on a histogram) (three types)

answer

1. Unimodal (single peaked) distribution 2. Bimodal (double peaked) distribution 3. Uniform distribution (no distinct peak; all bars are about the same height)

question

Stemplot (or stem and leaf plot)

answer

1. Write all the "stems" down in a list, in ascending numerical order. (The stem is every digit of a number except the rightmost digit, which is the "leaf". Eg: for the dataset 34 35 36 347 367 the stems are 3, 3, 3, 34, 36, but each identical stem is written only once, so the list is 3, 34, 36.) 2. Draw a line to the right of the list. 3. Write each leaf next to its stem, and arrange the leaves in increasing order.
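
The steps above can be sketched in Python. This is only an illustration, assuming non-negative whole numbers and at least one value; the function name `stemplot` is made up for this example. Like the card, it lists only the stems that occur in the data.

```python
from collections import defaultdict


def stemplot(values):
    """Return stemplot rows as strings, e.g. '34 | 7'."""
    leaves_by_stem = defaultdict(list)
    for value in sorted(values):
        # The stem is every digit except the last; the leaf is the last digit.
        stem, leaf = divmod(value, 10)
        leaves_by_stem[stem].append(leaf)
    width = max(len(str(stem)) for stem in leaves_by_stem)
    return [
        f"{stem:>{width}} | {' '.join(str(leaf) for leaf in leaves)}"
        for stem, leaves in sorted(leaves_by_stem.items())
    ]


# The dataset from the card, deliberately given out of order.
rows = stemplot([36, 347, 34, 367, 35])
print("\n".join(rows))
```

Because the values are sorted before they are split, the leaves on each row come out in increasing order without a separate rearranging step.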

question

two Virtues of a stemplot

answer

1. preserves the data while sorting it 2. when rotated looks like a histogram

question

Dotplot

answer

a stemplot with dots instead of leaves

question

Boxplot

answer

Shows the "five number summary": min, Q1, Median, Q3, Max. Y axis is the range of the data. Drawn box is the interquartile range. Points for outliers, minimum and maximum. Is most useful for showing side by side comparisons.

question

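The Five Number Summary

answer

1. Minimum 2. Q1 = 25th percentile (median of the bottom 50%) 3. Median = 50th percentile 4. Q3 = 75th percentile (median of the top 50%) 5. Maximum. These five values split the data into four quarters (Min through Q1, Q1 through Median, Median through Q3, Q3 through Max), each holding about 25% of the observations.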

question

3 Measures of Center

answer

1. Mode - the value most often found (not sensitive to outliers) 2. Median - the center value (or average of the two center values) (not sensitive to outliers) 3. Mean - the average (sensitive to outliers)
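
A quick way to see the sensitivity to outliers is to add one extreme value and recompute all three measures. The sketch below uses Python's standard `statistics` module; the data values are made up for illustration.

```python
import statistics

data = [1, 2, 2, 3, 4]
with_outlier = data + [100]  # hypothetical outlier added to the same data

for label, values in [("original", data), ("with outlier", with_outlier)]:
    print(
        label,
        "mode:", statistics.mode(values),
        "median:", statistics.median(values),
        "mean:", round(statistics.mean(values), 2),
    )
```

The mode stays at 2 and the median only moves from 2 to 2.5, while the mean jumps from 2.4 to more than 18.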

question

3 measures of spread

answer

1. Range - the distance between max and minimum values 2. Inter-Quartile Range - the range of the middle 50% 3. Standard Deviation - roughly how far the observations typically are from their mean. (Eg: the mean may be 9, with the observations typically about 4 away from 9.)

question

Calculate range

answer

max - min

question

Calculate inter-quartile range

answer

1. Find median (by arranging data in increasing order) 2. Find median of bottom 50% (Q1, "the first quartile") 3. Find median of top 50% (Q3, "the third quartile") 4. Q3-Q1=IQR
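
The four steps can be sketched in Python as below. One assumption to flag: when the number of observations is odd, this sketch leaves the middle value out of both halves, which is a common convention but not the only one. The function name `quartiles` and the data are made up for illustration, and at least two values are needed.

```python
import statistics


def quartiles(values):
    """Return (Q1, median, Q3) using the median-of-halves method."""
    ordered = sorted(values)               # step 1: arrange in increasing order
    half = len(ordered) // 2
    lower = ordered[:half]                 # bottom 50%
    upper = ordered[len(ordered) - half:]  # top 50% (middle value skipped if n is odd)
    return (
        statistics.median(lower),          # step 2: Q1
        statistics.median(ordered),
        statistics.median(upper),          # step 3: Q3
    )


q1, med, q3 = quartiles([7, 1, 3, 9, 5, 11, 13, 15])
iqr = q3 - q1                              # step 4: IQR = Q3 - Q1
print(q1, med, q3, iqr)
```

Software packages use several slightly different quartile definitions, so results from a calculator or spreadsheet may differ a little from this hand method.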

question

The 1.5(IQR) criterion for outliers

answer

1. Compute Q1 - 1.5(IQR) 2. Compute Q3 + 1.5(IQR) 3. Any data points below the first value or above the second value are possible outliers.
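
As a rough sketch, the criterion can be applied in Python once Q1 and Q3 are known. The function name `possible_outliers` and the data are hypothetical; the quartiles are passed in rather than computed so that the example stays focused on the rule itself.

```python
def possible_outliers(values, q1, q3):
    """Return the values that fall outside the 1.5(IQR) cutoffs."""
    iqr = q3 - q1
    low_cutoff = q1 - 1.5 * iqr
    high_cutoff = q3 + 1.5 * iqr
    return [v for v in values if v < low_cutoff or v > high_cutoff]


# Hypothetical data whose quartiles are Q1 = 4 and Q3 = 12 (so IQR = 8).
flagged = possible_outliers([1, 3, 5, 7, 9, 11, 13, 40], q1=4, q3=12)
print(flagged)
```

Here the cutoffs are 4 - 12 = -8 and 12 + 12 = 24, so only the value 40 is flagged.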

question

Outliers - when to keep, when to discard?

answer

1. Keep if could happen again, produced by essentially same process. 2. Discard if produced by a different process and your purpose is to understand the process which produced most of the data. 3. Discard if produced by an error or typo that cannot be fixed.

question

Notations for Standard Deviation

answer

SD, s, Sd, St Dev

question

Calculate Standard Deviation

answer

1. Find the mean 2. Find the deviations (the distances between each observation and the mean) 3. Square each deviation 4. Add up the squared deviations and divide by the number of observations minus 1 (n - 1) 5. Find the square root of the result. EXPLANATION: We can't average the deviations themselves because they add up to zero. The reason we divide by n - 1 instead of n is beyond the scope of this course to explain. The result of step 4 (this "average" of the squared deviations) is called the variance of the data.
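
The five steps translate almost line for line into Python. This is a sketch with made-up observations; the function name `sample_standard_deviation` is hypothetical, and the standard library's `statistics.stdev` is printed alongside as a cross-check.

```python
import math
import statistics


def sample_standard_deviation(values):
    mean = sum(values) / len(values)             # step 1: the mean
    deviations = [v - mean for v in values]      # step 2: deviations from the mean
    squared = [d ** 2 for d in deviations]       # step 3: square each deviation
    variance = sum(squared) / (len(values) - 1)  # step 4: divide by n - 1
    return math.sqrt(variance)                   # step 5: square root


scores = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical observations
sd = sample_standard_deviation(scores)
print(sd, statistics.stdev(scores))
```

For these scores the mean is 5 and the squared deviations add up to 32, so the variance is 32/7 and the standard deviation is its square root.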

question

Is the "standard deviation" or "variance of the data" influenced by outliers?

answer

yes, strongly

question

The "standard deviation rule"

answer

For distributions that are approximately normal (bell-shaped): Approx 68% of observations fall within 1 standard deviation of the mean. Approx 95% of observations fall within 2 standard deviations of the mean. Approx 99.7% of observations fall within 3 standard deviations of the mean. (3 standard deviations = the standard deviation x 3)

question

Notation for mean

answer

an x with a line over it

question

Choose between using mean and standard deviation versus the five number summary

answer

1. use mean and SD for relatively symmetrical distributions with no outliers 2. use five number summary for all others

question

Steps to choose which data display and numerical summary is best

answer

1. Identify the explanatory/independent variable (x) and the response/dependent variable (y) 2. Is the explanatory variable categorical or quantitative? 3. Is the response variable categorical or quantitative? 4. Notate it C-C, C-Q, Q-C, or Q-Q 5. Select approach based on above

question

Select data display and numerical summary approach for case C-C, C-Q, Q-C, or Q-Q

answer

1. Case C-C: Two way table or double bar chart using conditional percents. 2. Case C-Q: Side by side box plots and five number summary 3. Case Q-C: Not covered in the text 4. Case Q-Q: Scatterplot (explanatory on x, response on y) or labelled scatterplot

question

Correlation Coefficient

answer

Measures the strength and direction of a linear relationship between two quantitative variables. Does not tell you IF a relationship is linear. A curvilinear relationship may or may not include a linear component. The correlation coefficient tells you the strength of the linear relationship only, not of the curvilinear relationship.

question

Notation of the correlation coefficient

answer

r

question

Correlation Coefficient and Outliers

answer

Outliers strongly affect the r-value, so the correlation coefficient should only be interpreted after looking at the scatterplot.

question

Range of values in the correlation coefficient

answer

-1 to 1. -1 is the strongest negative linear relationship. +1 is the strongest positive linear relationship. Close to zero is a weaker linear relationship.

question

Regression and Linear Regression

answer

The technique that specifies the dependence of the response variable on the explanatory variable. If it's a linear dependence, then it's linear regression. It's finding the line that best fits the pattern of the linear relationship.

question

Calculate linear regression or the "least squares regression line"

answer

1. y = a + bx 2. b = r(Sy/Sx) 3. a = (y with line over it) - b(x with line over it). KEY: r = the correlation coefficient. Sx = standard deviation of the explanatory variable's values. Sy = standard deviation of the response variable's values. x with line over it = the mean of the explanatory variable's values. y with line over it = the mean of the response variable's values. EXPLANATION: Find the slope of the "least squares regression line" first, then the intercept. Just as b in the standard line equation, y = a + bx, is the slope, or the change in y when x changes by 1, the slope of the "least squares regression line" is the average change in the response variable when the explanatory variable increases by 1 unit. It's called the "least squares regression line" because it's the line which results in the smallest sum of squared vertical deviations.
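
The two formulas can be sketched in Python as below. The function name `regression_line` and all of the summary statistics (hours studied as x, exam score as y) are made up for illustration.

```python
def regression_line(r, s_x, s_y, mean_x, mean_y):
    """Return (a, b) for the least squares line y = a + b*x."""
    b = r * (s_y / s_x)      # slope: b = r(Sy/Sx)
    a = mean_y - b * mean_x  # intercept: a = mean of y - b(mean of x)
    return a, b


# Hypothetical summary statistics: hours studied (x) vs exam score (y).
a, b = regression_line(r=0.8, s_x=2.0, s_y=10.0, mean_x=5.0, mean_y=70.0)
predicted = a + b * 6.0  # predicted score for 6 hours of study
print(a, b, predicted)
```

With these numbers the slope is 0.8 x (10/2) = 4 points per extra hour, the intercept is 70 - 4 x 5 = 50, and the predicted score for 6 hours is 74.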

question

Line

answer

a set of points that obey a particular relationship between x and y

question

Equation of the Line (Algebra Review)

answer

y = a + bx. a = the y-intercept, or the value that y takes when x = zero. b = the slope, or the change in y when x changes by 1.

question

Extrapolation

answer

prediction for ranges of the explanatory variable that are not in the data

question

Causation and lurking variables

answer

Association does not imply causation. Lurking variables are not among the variables in a study but could substantially affect your interpretation of the relationship among those variables.

question

Simpson's Paradox

answer

When including a lurking variable causes us to rethink the direction of an association

question

Correlation between quantitative variables vs. correlation between categorical variables

answer

There can only be correlation between quantitative variables, not categorical variables

question

10 sampling types and terms

answer

(Remember S,V,V,C,S, S,P,C,M,S) "Some very very cute samples. Some pleasing, cute, magnificent samples." 1. Sampling Frame 2. Volunteer Sample 3. Volunteer Response 4. Convenience Sample 5. Systematic Sampling 6. Simple Random Sample 7. Probability Sampling Plan/Technique 8. Cluster Sampling 9. Multi-Stage Sampling 10. Stratified Sampling

question

Sampling Frame

answer

The list of individuals from which the sample is actually drawn. The study should be designed so that the sampling frame matches the entire population being studied.

question

Probability Sampling Plan/Technique

answer

Any sampling plan or technique that relies on random selection

question

Volunteer Sample and Volunteer Response

answer

1. Volunteer Sample: participants include themselves in the study. Biased because only people with strong opinions volunteer, but sometimes it's the only ethical method. (Eg medical) 2. Volunteer Response: participants are not required to respond. Biased because you don't hear from those not interested in responding.

question

Convenience Sample

answer

Individuals happen to be there at researcher's convenience, like standing outside the arts building to catch students to question.

question

Cluster, Multi-Stage, and Stratified sampling.

answer

CLUSTER: Select a random sample of natural clusters (5 out of 40 majors) and use all the individuals within the selected clusters (all the students with those 5 majors). MULTI-STAGE: Select a random sample of clusters (5 out of 40 majors) and select random individuals within each selected cluster (random students within the five majors). STRATIFIED: Use all the clusters/strata (all 40 majors). Randomly select individuals from each of the strata. (Random students within all 40 majors.)
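
The difference between the three plans is easier to see side by side. Below is a small sketch using Python's standard `random` module; the population (40 majors of 30 students each), the function names, and the sample sizes are all made up for illustration.

```python
import random

rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Hypothetical population: 40 majors with 30 students each.
majors = {f"major_{m}": [f"student_{m}_{s}" for s in range(30)] for m in range(40)}


def cluster_sample(groups, n_clusters):
    # Random clusters, then EVERYONE inside the chosen clusters.
    chosen = rng.sample(sorted(groups), n_clusters)
    return [person for g in chosen for person in groups[g]]


def multistage_sample(groups, n_clusters, per_cluster):
    # Random clusters, then a random sample inside each chosen cluster.
    chosen = rng.sample(sorted(groups), n_clusters)
    return [person for g in chosen for person in rng.sample(groups[g], per_cluster)]


def stratified_sample(groups, per_stratum):
    # EVERY stratum is used, with a random sample inside each one.
    return [person for g in sorted(groups) for person in rng.sample(groups[g], per_stratum)]


cluster = cluster_sample(majors, 5)            # 5 majors, all their students
multistage = multistage_sample(majors, 5, 10)  # 5 majors, 10 students each
stratified = stratified_sample(majors, 3)      # all 40 majors, 3 students each
print(len(cluster), len(multistage), len(stratified))
```

Only the stratified sample is guaranteed to contain students from every major; the other two cover just the five majors that happened to be drawn.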

question

Systematic Sampling

answer

Eg: send to every 50th address on a list. (The system itself can introduce bias: with an alphabetical list this would exclude siblings, because they share a last name. There might be other effects that need to be thought of, depending on the system.)

question

Simple Random Sample

answer

Eg: select names out of a hat. Every individual has the same chance of being selected, so the selection process itself introduces no bias.

question

3 Types of studies

answer

1. Observational - no interference 2. Experiment - Researchers control inputs 3. Sample Survey - individuals report (A study can't be both observational and experimental)

question

Prospective vs retrospective studies

answer

forward vs backward in time

question

Factor

answer

the explanatory variable in a study

question

Treatments

answer

Imposed values of the explanatory variable in a study. (Four quitting smoking techniques.)

question

Randomized Controlled Experiment - what is it and can you draw causal conclusions from it?

answer

Researchers control value of explanatory variable with a randomized procedure. (Subjects are randomly assigned to different treatments.) Can draw causal conclusions from this kind of study.

question

Notation for sample size

answer

n

question

Causal Conclusions (when can you draw them?)

answer

You can draw causal conclusions if the researchers randomly assigned the values of the explanatory variable (the treatments) to individuals

question

Control Group

answer

Segment of studied individuals who receive no treatment (or a placebo, such as a sugar pill). Not always necessary, and sometimes ethically questionable.

question

"Blind" and "Double Blind"

answer

Blind - participants don't know what they're getting. Double Blind - neither researchers nor participants know who is getting what. Prevents the "experimenter effect".

question

Experimenter Effect

answer

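Researchers unintentionally influencing the results through their expectations or the way they treat participants. Prevented by double blind studies.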

question

Hawthorne Effect

answer

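Participants behave differently because they know they are being studied. (Related to, but not the same as, lack of realism, or lack of ecological validity, in a study.)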

question

noncompliance

answer

when study participants don't do what they are asked to do which skews the data

question

Blocking

answer

Not imposing complete randomization in a study: first block individuals into groups of similar individuals (like male and female), then randomly assign treatments within each block

question

Matched Pairs

answer

1 individual in a study gets 2 treatments or 2 similar individuals get 2 treatments

question

Open vs Closed Questions on a survey

answer

What is your favorite kind of food vs. Which of these five foods is your favorite?

question

6 types of survey questions to be aware of

answer

1. Open vs. Closed questions 2. Unbalanced response options 3. Leading questions 4. Planting ideas with questions 5. complicated questions 6. sensitive questions

question

Leading questions vs. planting ideas with questions

answer

Leading question: "how long have you been beating your wife?" Planting ideas with questions: "Given the huge deficit, are you in favor of universal health care?"

question

Probability Notation

answer

P(it will rain) or P(it will not rain) P(A) or P(not A) P(B), P(C) and so on

question

Probability Rule #1: MEASUREMENT OF PROBABILITY (Made up term for memory tool. No title given to the rule in the text.)

answer

Between 0 and 1 (which means between a 0% and 100% chance). So if the solution is above 1 (or below 0), it's wrong.

question

Theoretical (Classical) vs. Empirical (Observational) Probability

answer

Theoretical (Classical) : flipping coin, rolling dice. Outcomes can be predicted by the nature of the situation. Empirical (Observational) : series of trials with outcomes that can't be predicted

question

Relative Frequency

answer

The (empirical) probability of an event is estimated by its relative frequency in a series of trials. Relative Frequency of event A = number of times A occurred / total number of repetitions

question

Law of large numbers

answer

As the number of trials increases the empirical probability gets closer and closer to the theoretical probability
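
A simulated coin flip is a handy way to watch this happen. The sketch below uses Python's standard `random` module with a fixed seed; the function name `relative_frequency_of_heads` is made up, and the theoretical probability of heads is assumed to be 0.5.

```python
import random


def relative_frequency_of_heads(n_flips, rng):
    """Relative frequency = number of heads / total number of flips."""
    heads = sum(rng.random() < 0.5 for _ in range(n_flips))
    return heads / n_flips


rng = random.Random(42)  # fixed seed so the run is repeatable
for n in (10, 100, 10_000, 100_000):
    print(n, relative_frequency_of_heads(n, rng))
```

The small runs can land noticeably away from 0.5, while the large runs should sit very close to it.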

question

Sample Space vs. "Possible Outcomes for the Event"

answer

Sample Space: The list of all possible outcomes Possible Outcomes for the Event: outcomes which match the "event" being looked for

question

The complement of event A is

answer

not A: the event that A does not occur. Its probability is written P(not A).

question

Venn Diagram

answer

Overlapping circles to help visualize relationships between probabilities of events

question

Disjoint

answer

mutually exclusive

question

Probability Rule #2 SUM OF PROBABILITIES (Made up term for memory tool. The rule was given no title in the text.)

answer

P(S)=1 The sum of the probabilities of all possible outcomes is 1

question

Probability Rule #3: THE COMPLEMENT RULE

answer

P(not A) = 1 - P(A) or P(A) = 1 - P(not A). The probability that an event does not occur is 1 minus the probability that it does occur, or vice versa. This makes sense when you remember that the sum of all the probabilities is 1. So the likelihood of something not happening is 1 minus the likelihood of it happening. Often, it is easier to find the complement, which is why we can use this formula either way. Use for problems like "at least one of several events occurs".
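
As a worked "at least one" example, consider the chance of rolling at least one six in four rolls of a fair die. The sketch uses Python's `fractions` module for exact arithmetic. It assumes the rolls are independent, so it also relies on the multiplication rule for independent events (Rule #5 in this set).

```python
from fractions import Fraction

p_six = Fraction(1, 6)
p_no_six_one_roll = 1 - p_six                 # complement rule
p_no_six_four_rolls = p_no_six_one_roll ** 4  # rolls assumed independent
p_at_least_one_six = 1 - p_no_six_four_rolls  # complement rule again
print(p_at_least_one_six, float(p_at_least_one_six))
```

Working directly would mean adding up the chances of exactly one, two, three, and four sixes; going through the complement ("no sixes at all") takes one subtraction instead.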

question

Probability Rule #4: THE ADDITION RULE FOR DISJOINT EVENTS

answer

If A and B are disjoint events, then P(A or B) = P(A) + P(B). In other words, for disjoint events, "or" means "+".

question

Probability Rule #5: THE MULTIPLICATION RULE FOR INDEPENDENT EVENTS

answer

If A and B are independent events, then P(A and B) = P(A) x P(B). In other words, for independent events, "and" means "x" (multiply). (This may seem counterintuitive because you're expecting that multiplying will make a larger number, but you're always multiplying numbers between 0 and 1, so it makes a smaller result.)

question

Independent vs Disjoint events

answer

IF TWO EVENTS ARE DISJOINT (and each has a nonzero probability), THEY CAN'T BE INDEPENDENT. There can be all other combos of the two. DISJOINT = mutually exclusive. One happening means the other can't happen. PART OF "OR" QUESTIONS. INDEPENDENT = one happening doesn't affect the probability of the other happening. PART OF "AND" QUESTIONS. (Note: if the group from which individuals are chosen is very large, then one being chosen barely affects the probability that the next one chosen will be of any certain type. In a small set, the first selection does affect the next selection.)

question

In probability, "or" means ________ and "and" means _______.

answer

1. addition (more chance of) 2. multiplication (less chance of)

question

Probability Rule #6: GENERAL ADDITION RULE

answer

P(A or B) = P(A) + P(B) - P(A and B). Think of a Venn diagram with overlapping circles. You subtract the overlapping part because you included it twice, once as part of A and once as part of B. Problems like this can be interpreted as "at least one of two events". Indeed you can use the complement rule for them to get the same results, but the general addition rule is easier. The complement rule is best for "at least ___ of many events".

question

P(A or B) How do you solve?

answer

1. Are the events disjoint? 2. If disjoint, use Addition Rule for Disjoint events: P(A)+P(B). 3. If not disjoint, use general addition rule: P(A)+P(B)-P(A and B)
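
Both branches of this decision fit in one small function, because the disjoint case is just the general rule with P(A and B) = 0. The sketch below uses a standard 52-card deck as the example; the function name `p_a_or_b` is made up for illustration.

```python
from fractions import Fraction


def p_a_or_b(p_a, p_b, p_a_and_b=0):
    """General addition rule; leave p_a_and_b at 0 for disjoint events."""
    return p_a + p_b - p_a_and_b


# Drawing one card from a standard 52-card deck.
# Not disjoint: the king of hearts is both a heart and a king.
p_heart_or_king = p_a_or_b(Fraction(13, 52), Fraction(4, 52), Fraction(1, 52))
# Disjoint: a card cannot be both a heart and a spade.
p_heart_or_spade = p_a_or_b(Fraction(13, 52), Fraction(13, 52))
print(p_heart_or_king, p_heart_or_spade)
```

The first result is 16/52 (the king of hearts is counted only once) and the second is 26/52.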

question

How to solve: Two categorical variables each with two possible values

answer

Two way table

question

Notation of conditional probability

answer

P(B|A) Probability of B, given A or Probability of B on the condition that A happened

question

The "definition of conditional probability" formula.

answer

P(B|A) = P(A and B) / P(A). Similar to how we say something has a 30 out of 100 chance of happening by writing 30/100, to find the probability of B happening given that A has happened, we take the probability of both A and B happening and divide it by the probability of A happening. The most common test question for this is "side effect A, side effect B, and both": what is the probability that a patient who has suffered side effect A will also suffer side effect B? That is P(B|A), so we take the chance of A and B and divide it by the chance of A. You might think you can read the answer straight off a two way table, but if the question is "given that the patient got A, what is the chance he got B?", then it's not a simple matter of using the given figure for the chance of getting both at the same time. You have to take that "both" figure and divide it by the "given side effect" figure. However, it's very useful to make a two way table to get the figures to plug into the "definition" formula.
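
Here is the side effect question worked through in Python with exact fractions. The percentages are made up for illustration.

```python
from fractions import Fraction

# Hypothetical figures: 20% get side effect A, 15% get B, 5% get both.
p_a = Fraction(20, 100)
p_b = Fraction(15, 100)
p_a_and_b = Fraction(5, 100)

p_b_given_a = p_a_and_b / p_a  # P(B|A) = P(A and B) / P(A)
p_a_given_b = p_a_and_b / p_b  # P(A|B) = P(A and B) / P(B)
print(p_b_given_a, p_a_given_b)
```

The "both" figure is 5%, but the answers are 1/4 and 1/3. This shows why the "both" figure has to be divided by the "given" figure, and that P(B|A) and P(A|B) are generally different.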

question

Perform an independence check

answer

Events are independent if: Method 1: P(A|B) = P(A) Method 2: P(B|A) = P(B) Method 3: P(B|A) = P(B|not A) Method 4: P(A and B) = P(A) x P(B)
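
Method 4 is the easiest to sketch in code. The example below checks two pairs of events for a single roll of a fair die, using exact fractions so that the equality test is reliable. The function name `are_independent` is made up for illustration.

```python
from fractions import Fraction


def are_independent(p_a, p_b, p_a_and_b):
    """Method 4: independent exactly when P(A and B) = P(A) x P(B)."""
    return p_a_and_b == p_a * p_b


# One roll of a fair die: A = "even", B = "at most 4", C = "at most 3".
p_even = Fraction(3, 6)
# A and B overlap on {2, 4}, so P(A and B) = 2/6.
print(are_independent(p_even, Fraction(4, 6), Fraction(2, 6)))
# A and C overlap on {2} only, so P(A and C) = 1/6.
print(are_independent(p_even, Fraction(3, 6), Fraction(1, 6)))
```

"Even" and "at most 4" pass the check (1/2 x 2/3 = 1/3), while "even" and "at most 3" fail it (1/2 x 1/2 = 1/4, not 1/6). With decimals from real data, exact equality is too strict, and a small tolerance would be needed instead.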

question

Probability Rule #7: (I gave it a number, text did not. Earlier referred to it as a version of rule #5) THE GENERAL MULTIPLICATION RULE

answer

P(A and B) = P(A) x P(B|A)

question

Probability Tree

answer

Draw diagram where possibilities emerge from events. (My words, not the text)

question

When to use a Probability Tree

answer

For scenarios where there are stages or conditional probabilities.

question

Bayes' Rule or Bayes' Theorem

answer

P(A|B) = [P(A) x P(B|A)] / [P(A) x P(B|A) + P(not A) x P(B|not A)]. The denominator is P(B), written out using "The Law of Total Probability".
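
The standard form of Bayes' rule divides P(A) x P(B|A) by the total probability of B. A minimal Python sketch, with a made-up screening test as the example and a hypothetical function name `bayes`:

```python
from fractions import Fraction


def bayes(p_a, p_b_given_a, p_b_given_not_a):
    """Return P(A|B) from P(A), P(B|A) and P(B|not A)."""
    p_not_a = 1 - p_a
    # Law of total probability: P(B) = P(A)P(B|A) + P(not A)P(B|not A)
    p_b = p_a * p_b_given_a + p_not_a * p_b_given_not_a
    return p_a * p_b_given_a / p_b


# Hypothetical screening test: 1% have the condition (A); the test is
# positive (B) for 90% of those with it and 10% of those without it.
p_condition_given_positive = bayes(Fraction(1, 100), Fraction(9, 10), Fraction(1, 10))
print(p_condition_given_positive)
```

With these numbers the answer is 9/108, or 1/12: even after a positive result the condition is still unlikely, because the 99% without it produce far more positives than the 1% with it.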

question

The "definition" of conditional probability vs. The General Multiplication Rule

answer

Definition: P(B|A) = P(A and B)/P(A) General Multiplication Rule: P(A and B) = P(A) x P(B|A) See how they are the same equation?

question

Linear Regression vs Correlation Coefficient

answer

Linear Regression is finding the line that matches the way the data falls on the scatterplot. (If it's not linear then it's just called regression.) Correlation Coefficient is calculating the strength of the linear relationship. (Can't tell you IF there's a linear relationship though.)

question

The range of the Correlation Coefficient vs. the range of probability

answer

Range of Correlation Coefficient is -1 to 1. Close to zero is a weaker linear relationship. Range of probability is 0-1, which can be translated into 0-100% chance.

question

Calculate the Correlation Coefficient

answer

Text says you don't need to know the formula. (It has lots of symbols I don't know.) But r is part of calculating the linear regression, where the slope is b = r(Sy/Sx).
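
For reference only, one standard way to compute r is to convert each x and y to a z-score (distance from the mean in standard deviations), multiply the pairs, add them up, and divide by n - 1. The Python sketch below follows that recipe. The function name `correlation_coefficient` and the data are made up, and this may not match the notation used in the text.

```python
import statistics


def correlation_coefficient(xs, ys):
    """r = sum of (z-score of x) * (z-score of y), divided by n - 1."""
    n = len(xs)
    mean_x, mean_y = statistics.mean(xs), statistics.mean(ys)
    s_x, s_y = statistics.stdev(xs), statistics.stdev(ys)
    total = sum(((x - mean_x) / s_x) * ((y - mean_y) / s_y) for x, y in zip(xs, ys))
    return total / (n - 1)


hours = [1, 2, 3, 4, 5]        # hypothetical explanatory values
scores = [52, 55, 61, 70, 72]  # hypothetical response values
r = correlation_coefficient(hours, scores)
print(r)
```

Points that fall exactly on an upward sloping line give r = 1, and points exactly on a downward sloping line give r = -1.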

WGU Intro to Probability and Statistics
