Statistics 2023 chapter 1

collections of observations
ex: measurements, genders, survey responses

the science of planning studies/experiments, obtaining data, and organizing, summarizing, presenting, analyzing, interpreting, and drawing conclusions based on the data

the complete collection of all individuals (scores, people, measurements…) to be studied. The collection is complete in the sense that it includes all individuals being studied.

the collection of data from every member of the population

source of data
Is the source objective, or is there some incentive to be biased?
For example, if incentives are offered for favorable results, the results may be biased.

sampling method
Sample data must be collected in an appropriate way, such as through a process of random selection.

If sample data are not collected in an appropriate way, the data may be so completely useless that no amount of statistical torturing can salvage them.

Voluntary response (or self-selected) samples often have bias (those with special interest are more likely to participate). These samples’ results are not necessarily valid.
Other methods are more likely to produce good results.

voluntary response sample
respondents decide for themselves whether to be included
this is often considered to have little validity because people choose to be involved based on what is important to them
a random sample carries greater validity

one in which the respondents themselves decide whether to be included
In this case, valid conclusions can be made only about the specific group of people who agree to participate and not about the population.
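
The bias in a voluntary response sample can be illustrated with a small simulation. This is only a sketch under made-up assumptions: the population size, the 30% true support rate, and the response probabilities (0.5 for supporters, 0.1 for everyone else) are all hypothetical.

```python
import random

rng = random.Random(7)  # fixed seed so the run is repeatable

# Hypothetical population: 3,000 of 10,000 people (30%) support a proposal.
population = [True] * 3000 + [False] * 7000
true_rate = sum(population) / len(population)

# Assumed behavior: people with a special interest (supporters) are far
# more likely to choose to respond than everyone else.
def chooses_to_respond(supports):
    return rng.random() < (0.5 if supports else 0.1)

# Voluntary response sample: respondents decide for themselves.
voluntary = [person for person in population if chooses_to_respond(person)]
voluntary_rate = sum(voluntary) / len(voluntary)

# Random sample: the researcher selects 500 people at random.
random_sample = rng.sample(population, 500)
random_rate = sum(random_sample) / len(random_sample)

print(f"true support rate:         {true_rate:.2f}")
print(f"voluntary response sample: {voluntary_rate:.2f}")
print(f"random sample:             {random_rate:.2f}")
```

Under these assumptions the voluntary response estimate lands well above the true rate, while the random sample stays close to it.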

statistical significance
a study's result is numerically significant (it may be statistically significant without being practically significant)
ex: the Atkins diet reports a 2.1 lb weight loss after a year. Although statistically significant, the result is not practically significant because people would want a greater loss than 2.1 lb after a year of dieting.

if the results could easily happen by chance, they are not statistically significant
if the likelihood of getting the results by chance is very small, they are most likely statistically significant
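
To make "could happen by chance" concrete, here is a minimal sketch that computes how likely a result is under pure chance. The coin-flip scenario and the helper name `prob_at_least` are assumptions chosen for illustration; they are not from the notes.

```python
from math import comb

def prob_at_least(successes, trials, p=0.5):
    """Probability of at least `successes` successes in `trials` independent tries."""
    return sum(
        comb(trials, k) * p**k * (1 - p) ** (trials - k)
        for k in range(successes, trials + 1)
    )

# Hypothetical scenario: 10 flips of a fair coin.
likely = prob_at_least(5, 10)    # 5 or more heads: happens easily by chance
unlikely = prob_at_least(9, 10)  # 9 or more heads: rarely happens by chance

print(f"P(at least 5 heads in 10 flips) = {likely:.3f}")
print(f"P(at least 9 heads in 10 flips) = {unlikely:.3f}")
```

A result like the first one would not be statistically significant; a result like the second one is unlikely enough under chance that it most likely would be.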

practical significance
whether a result is large enough to matter in practice (see the Atkins example under statistical significance)

a numerical measurement describing some characteristic of a population

a subcollection of members selected from a population

numerical measurement describing characteristic of a sample
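
The difference between a population measurement and a sample measurement can be sketched in a few lines. The population of ages below is randomly generated, hypothetical data, and the sample size of 50 is an arbitrary choice.

```python
import random
import statistics

rng = random.Random(0)  # fixed seed so the run is repeatable

# Hypothetical population: ages of 1,000 people (made-up values).
population = [rng.randint(18, 90) for _ in range(1000)]

# Measurement on the whole population (a parameter).
population_mean = statistics.mean(population)

# A subcollection of members selected from the population (a sample).
sample = rng.sample(population, 50)

# The same measurement on the sample only (a statistic).
sample_mean = statistics.mean(sample)

print(f"population mean (parameter): {population_mean:.2f}")
print(f"sample mean (statistic):     {sample_mean:.2f}")
```

The two values are usually close but not identical, which is why a statistic is treated as an estimate of the parameter.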

quantitative data
numbers or counts or measurements
can be further described by distinguishing between discrete and continuous types
ex: weights of models, ages of respondents

categorical or qualitative data
names or labels (categories) that are not numbers representing counts or measurements
ex: shirt numbers on professional athletes' uniforms – substitutes for names
ex: genders of athletes (or of a sample)

discrete data
result when the number of possible values is either a finite number or a ‘countable’ number (i.e., the number of possible values is 0, 1, 2, 3, . . .)

Example: The number of eggs that a hen lays

continuous data
result from infinitely many possible values that correspond to some continuous scale that covers a range of values without gaps, interruptions, or jumps

Example: The amount of milk that a cow produces; e.g. 2.343115 gallons per day

Levels of Measurement
Nominal – categories only
Ordinal – categories with some order
Interval – differences but no natural starting point
Ratio – differences and a natural starting point

nominal level of measurement
names, labels, categories
cannot be arranged in an ordering scheme (such as low to high)

Example: Survey responses yes, no, undecided
Example: Eye colors

ordinal level of measurement
categories can be ordered, but differences between them either cannot be found or are meaningless
Example: Ranks of colleges in a magazine, or course grades A, B, C, D, or F

interval level of measurement
like the ordinal level, with the additional property that the difference between any two data values is meaningful; however, there is no natural zero starting point (where none of the quantity is present)

Example: Years 1000, 2000, 1776, and 1492

Example: Temperatures in degrees Fahrenheit or Celsius

ratio level of measurement
there is a natural 0 starting point and ratios are meaningful
Example: Distances in miles, feet, km, etc.
Example: Prices of college textbooks ($0 represents no cost, a $100 book costs twice as much as a $50 book)
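
A quick check of why ratios are meaningful at the ratio level but not at the interval level. The temperatures below are illustrative values; the prices are the ones from the textbook example. The point is that a meaningful ratio should not change when the unit changes.

```python
def fahrenheit_to_celsius(deg_f):
    """Convert degrees Fahrenheit to degrees Celsius."""
    return (deg_f - 32) * 5 / 9

# Interval level: temperature has no natural zero, so ratios depend on the scale.
hot_f, cold_f = 80.0, 40.0
ratio_in_f = hot_f / cold_f
ratio_in_c = fahrenheit_to_celsius(hot_f) / fahrenheit_to_celsius(cold_f)

# Differences, however, convert consistently (each Fahrenheit degree is 5/9 of a Celsius degree).
diff_f = hot_f - cold_f
diff_c = fahrenheit_to_celsius(hot_f) - fahrenheit_to_celsius(cold_f)

# Ratio level: price has a natural zero ($0 = no cost), so ratios survive a unit change.
price_a, price_b = 100.0, 50.0
ratio_dollars = price_a / price_b
ratio_cents = (price_a * 100) / (price_b * 100)

print(f"80 F vs 40 F as a ratio: {ratio_in_f:.1f} in Fahrenheit, {ratio_in_c:.1f} in Celsius")
print(f"difference: {diff_f:.1f} F degrees = {diff_c:.2f} C degrees")
print(f"$100 vs $50 as a ratio: {ratio_dollars:.1f} in dollars, {ratio_cents:.1f} in cents")
```

Because the temperature "ratio" changes with the scale, saying 80 degrees is "twice as hot" as 40 degrees has no meaning, while "twice as expensive" does.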

to make a generalization

observational study
observe behavior but do not modify the behavior of the subjects being studied

apply a treatment and observe its effect on the subjects

random sample
members of the population are selected in such a way that each individual member has an equal chance of being selected, which increases the validity of any study

probability sample
selecting members from a population in such a way that each member has a known (but not necessarily the same) chance of being selected
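
The two sampling definitions above can be sketched side by side. The population of 20 labeled people and the inclusion probabilities (0.5 and 0.1) are hypothetical; selecting each member independently with its own known probability is just one simple way to build a probability sample, not the only one.

```python
import random

rng = random.Random(42)  # fixed seed so the run is repeatable

# Hypothetical population of 20 labeled members.
population = [f"person_{i}" for i in range(1, 21)]

# Random sample: every member has the same chance (5 of 20) of being selected.
random_sample = rng.sample(population, 5)

# Probability sample: every member has a known chance, but the chances differ.
# Assumed here: the first 10 members are included with probability 0.5,
# the last 10 with probability 0.1.
inclusion_prob = {
    name: (0.5 if index < 10 else 0.1)
    for index, name in enumerate(population)
}
probability_sample = [
    name for name in population if rng.random() < inclusion_prob[name]
]

print("random sample:     ", random_sample)
print("probability sample:", probability_sample)
```

Note that the size of the probability sample varies from run to run with this approach, because each member is included or excluded independently.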

Interpreting Graphs
To correctly interpret a graph, you must analyze the numerical information given in the graph, so as not to be misled by the graph’s shape. READ labels and units on the axes!

Part (b) is designed to exaggerate the difference by increasing each dimension in proportion to the actual amounts of oil consumption.

Bad Samples
Voluntary response sample
(or self-selected sample)
one in which the respondents themselves decide whether to be included
In this case, valid conclusions can be made only about the specific group of people who agree to participate and not about the population.

Correlation and Causality
Concluding that one variable causes the other when in fact the variables are only linked
Two variables may seem linked (for example, smoking and pulse rate); this relationship is called correlation. We cannot conclude that one causes the other. Correlation does not imply causality.

Small Samples
Conclusions should not be based on samples that are far too small.
Example: Basing a school suspension rate on a sample of only three students
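
A small simulation shows why three students are too few. The true suspension rate of 10% and the helper name `simulate_rates` are assumptions made for this sketch only.

```python
import random

def simulate_rates(true_rate, n, trials, rng):
    """Estimate the rate from `trials` independent samples of size `n`."""
    rates = []
    for _ in range(trials):
        suspended = sum(rng.random() < true_rate for _ in range(n))
        rates.append(suspended / n)
    return rates

rng = random.Random(1)  # fixed seed so the run is repeatable
TRUE_RATE = 0.10        # hypothetical true suspension rate

small = simulate_rates(TRUE_RATE, 3, 1000, rng)    # samples of 3 students
large = simulate_rates(TRUE_RATE, 300, 1000, rng)  # samples of 300 students

print("estimates seen with n = 3:  ", sorted(set(small)))
print(f"estimates seen with n = 300: from {min(large):.3f} to {max(large):.3f}")
```

With only three students, the estimate can only be 0, 1/3, 2/3, or 1, so it can never land near the assumed 10%; with 300 students the estimates cluster much more tightly.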

Percentages
Misleading or unclear percentages are sometimes used. For example, if you take 100% of a quantity, you take it all. If you have improved 100%, then are you perfect?! 110% of an effort does not make sense.
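
The arithmetic behind these statements can be checked directly. The test scores below are hypothetical, and the helper names `percent_of` and `percent_change` are introduced only for this sketch.

```python
def percent_of(percent, quantity):
    """Return `percent` percent of `quantity`."""
    return quantity * percent / 100

def percent_change(old, new):
    """Return the change from `old` to `new` as a percentage of `old`."""
    return (new - old) / old * 100

# Taking 100% of a quantity means taking all of it.
all_of_it = percent_of(100, 250)

# Hypothetical scores on a 100-point test: going from 40 to 80 is a
# 100% improvement, yet 80 is still not a perfect score.
improvement = percent_change(40, 80)

print(f"100% of 250 is {all_of_it}")
print(f"40 -> 80 is a {improvement:.0f}% improvement")
```

A 100% improvement means the value doubled, not that it reached the maximum possible.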

Loaded Questions
If survey questions are not worded carefully, the results of a study can be misleading. Survey questions can be “loaded” or intentionally worded to elicit a desired response.
Too little money is being spent on “welfare” versus too little money is being spent on “assistance to the poor.” Results: 19% versus 63%

Order of Questions
Questions are sometimes unintentionally loaded by such factors as the order of the items being considered.
Would you say traffic contributes more or less to air pollution than industry? Results: traffic – 45%; industry – 27%
When the order is reversed, the results are: industry – 57%; traffic – 24%

Nonresponse
Occurs when someone either refuses to respond to a survey question or is unavailable.
People who refuse to talk to pollsters have a view of the world around them that is markedly different than those who will let poll-takers into their homes

Missing Data
Can dramatically affect results.
Subjects may drop out for reasons unrelated to the study.
People with low incomes are less likely to report their incomes.
US Census suffers from missing people (tend to be homeless or low income).

Self-Interest Study
Some parties with interests to promote will sponsor studies.
Be wary of a survey in which the sponsor can enjoy monetary gain from the results.
When assessing validity of a study, always consider whether the sponsor might influence the results.

Precise Numbers
Because a figure is precise, many people incorrectly assume that it is also accurate.
A precise number can be an estimate, and it should be referred to that way.