# Business Statistics [Maymester – Midterm] Alexander Barker
question

Statistics (Definition)

…the science of collecting, organizing, analyzing, interpreting, and presenting data. Some experts prefer to call statistics data science, a trilogy of tasks involving data modeling, analysis, and decision making.
question

Statistic (Definition)

…a single measure, reported as a number, used to summarize a sample data set.
question

Descriptive Statistics (Definition)

…the collection, organization, presentation, and summary of data, using either charts and graphs or numerical summaries.
question

Inferential Statistics (Definition)

…generalizing from a sample to a population, estimating unknown population parameters, drawing conclusions, and making decisions
question

Pitfalls of Statistics (List)

Pitfall 1: Conclusions from Small Samples
Pitfall 2: Conclusions from Nonrandom Samples
Pitfall 3: Conclusions from Rare Events
Pitfall 4: Poor Survey Methods
Pitfall 5: Assuming a Causal Link
Pitfall 6: Generalization to Individuals
Pitfall 7: Unconscious Bias
Pitfall 8: Significance versus Importance
question

Statistics vs. Probability

Statistics summarizes history, while probability quantifies future uncertainty.
question

Observation (Definition)

…a single member of a collection of items that we want to study, such as a person, firm, or region.
question

Variable (Definition)

…a characteristic of the subject or individual, such as an employee’s income or an invoice amount
question

Data Set (Definition)

…consists of all the values of all of the variables for all of the observations we have chosen to observe.
question

Univariate Data Set (Definition, Example and Typical Tasks)

One variable; Ex: Income; Typical Tasks: Histogram, descriptive statistics, frequency tallies.
question

Bivariate Data Set (Definition, Example, and Typical Tasks)

Two variables; Ex: Income and age; Typical Tasks: Scatter plots, correlations, regression modeling…
question

Multivariate Data Set (Definition, Example, and Typical Tasks)

More than two variables; Ex: Income, age and gender; Typical Tasks: multiple regression, data mining, econometric modeling…
question

Categorical Data (Verbal Label, Coding and Binary Values)

(also called qualitative data); have values that are described by words rather than numbers. Verbal Labeling: cars are called small, mid-sized, sedan, etc. Coding: numbers can represent words; e.g., 1=cash, 2=check, 3=credit card. Binary Values: variables have only two values (e.g., employed and unemployed).
question

Numerical Data (Discrete and Continuous Data)

(also called quantitative data – statistics or tables) arises from counting, measuring something, or some kind of mathematical operation; provides insight into characteristics of a data set using mathematics. Discrete Data: takes on a countable number of distinct values, usually whole-number counts that you could tally on your fingers (e.g., number of defects). Continuous Data: can take on any value within an interval, including fractional values (e.g., the percentage of customers out of an entire group surveyed).
question

Time Series Data vs. Cross Sectional Data

Time Series Data: each observation in the sample represents a different, equally spaced point in time (e.g., years, months, days); we are interested in trends and patterns over time. Cross Sectional Data: each observation represents a different individual unit at the same point in time; we are interested in variation among observations. We can combine the two data types to get pooled cross sectional and time series data.
question

Levels of Measurement (List)

Nominal, Ordinal, Interval, Ratio
question

Nominal Data (Definition)

(from the Latin for “name”) identifies categories only; e.g., eye color (blue, brown, green, etc.).
question

Ordinal Data (Definition)

Rank has meaning; no clear meaning to distance; e.g., full-sized, compact, subcompact.
question

Interval Data (Definition)

Distance has meaning, but there is no meaningful zero point; e.g., temperature in degrees Fahrenheit or Celsius.
question

Ratio Data (Definition)

Meaningful zero exists; e.g., accounts payable: $20 is twice as much as $10 (ratio of 2:1), and the zero point means the absence of something.
question

Likert Scales (Describe)

… a survey response scale on which respondents rate how strongly they agree or disagree with a statement. Example: “College-bound high school students should be required to study a foreign language – Check one box.” Box Options: “Strongly Agree,” “Somewhat Agree,” “Neither Agree Nor Disagree,” “Somewhat Disagree,” “Strongly Disagree.”
question

Parameter (Definition)

… a measurement or characteristic of the population (e.g., a mean or proportion). Usually unknown, since we can rarely observe the entire population: a census of a target population is often impossible, so parameters are estimated using a sample.
question

Target Population (Definition)

…the population that we’re interested in.
question

Sampling Frame (Definition)

… the group from which we take the sample; e.g., phone books, directories, or email addresses from a certain online newsletter.
question

Sampling Methods (List)

1. Simple Random Sample
2. Systematic Sample
3. Stratified Sampling
4. Cluster Sampling
5. Judgement Sample
6. Convenience Sample
7. Focus Groups
question

Simple Random Sample (Describe)

… use random numbers to select items from a list.
question

Systematic Sample (Describe)

… select every n-th item from a list or sequence (e.g., restaurant customers); for example, every fifth car gets pulled over.
question

Stratified Sampling (Describe)

… select randomly within defined strata (e.g. by age, occupation, gender)…
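
The first three sampling methods above can be sketched in a few lines of Python. This is only an illustration: the function names, the list of customer numbers, and the odd/even "strata" are made up for the example and do not come from the course.

```python
import random

def simple_random_sample(items, n, seed=0):
    """Use random numbers to pick n items from a list (without replacement)."""
    rng = random.Random(seed)  # fixed seed only so the sketch is repeatable
    return rng.sample(items, n)

def systematic_sample(items, step, start=0):
    """Pick every step-th item, beginning at position `start` (0-based)."""
    return items[start::step]

def stratified_sample(items, stratum_of, n_per_stratum, seed=0):
    """Group items into strata, then sample randomly within each stratum."""
    rng = random.Random(seed)
    strata = {}
    for item in items:
        strata.setdefault(stratum_of(item), []).append(item)
    return {
        label: rng.sample(members, min(n_per_stratum, len(members)))
        for label, members in strata.items()
    }

# Hypothetical sampling frame: customers numbered 1 to 20.
customers = list(range(1, 21))

random_four = simple_random_sample(customers, 4)
every_fifth = systematic_sample(customers, step=5, start=4)
by_parity = stratified_sample(
    customers, lambda c: "odd" if c % 2 else "even", n_per_stratum=2
)
```

Here `every_fifth` picks customers 5, 10, 15, and 20, while the other two results depend on the random seed.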
question

Cluster Sampling (Describe)

… like stratified sampling, except that the strata are geographical areas (e.g., zip codes); useful, for example, when a firm is trying to find locations in its particular market.
question

Judgement Sample (Describe)

… use expert knowledge to choose “typical” items (e.g. which employees to interview for yearly reviews).
question

Convenience Sample (Describe)

… use a sample that happens to be available (e.g., a coworker who just happens to be at lunch with you).
question

Focus Groups (Describe)

… in-depth dialogue with a panel of representative or specific individuals (e.g., iPod users).
question

Basic Steps to Survey Research (List)

Step 1: State the goals of the research.
Step 2: Develop the budget (time, money, staff).
Step 3: Create a research design (target population, frame, sample size).
Step 4: Choose a survey type and method of administration.
Step 5: Design a data collection instrument (questionnaire).
Step 6: Pretest the survey instrument and revise as needed.
Step 7: Administer the survey (follow up if needed).
Step 8: Code the data and analyze it.
question

Visual Data Representation

(charts and graphs) provides insight into characteristics of a data set without using mathematics.
question

Stem and Leaf Plot

…a tool of exploratory data analysis (EDA) that seeks to reveal essential data features in an intuitive way. A stem-and-leaf plot is basically a frequency tally, except that we use digits instead of tally marks. For two-digit or three-digit integer data, the stem is the tens digit of the data, and the leaf is the ones digit.
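
A minimal sketch of the tally described above, assuming non-negative integer data: the stem is everything except the last digit, and the leaf is the ones digit. The function name and the sample values are invented for illustration.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Return {stem: [leaves]} with stems and leaves in ascending order."""
    stems = defaultdict(list)
    for v in sorted(values):
        stems[v // 10].append(v % 10)  # stem = leading digits, leaf = ones digit
    return dict(sorted(stems.items()))

# Hypothetical two-digit data.
data = [12, 15, 21, 23, 23, 30, 47]

# Print one row per stem, e.g. a stem followed by its leaf digits.
for stem, leaves in stem_and_leaf(data).items():
    print(f"{stem} | {''.join(str(d) for d in leaves)}")
```

Each printed row works like a row of tally marks, except that the marks are the actual ones digits.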
question

Dot Plots

A dot plot is the simplest graphical display of n individual values of numerical data – pretty much like a stem-and-leaf plot, but instead of digits, it uses dots. A stacked dot plot compares two dot plots (stacked on top of one another).
question

Left Skewed Histogram

… negatively skewed, with a long left “tail.”
question

Right Skewed Histogram

… positively skewed, with long right “tail.”
question

Symmetric Histogram

… both “tails” are the same length.
question

Pareto Charts

… special type of bar chart used in quality management to display the frequency of defects or errors of different types; categories are displayed in descending order of frequency.
question

Calculate the Mean

… add up all of the numbers, and then divide by how many numbers there were. =AVERAGE(Data)
question

Calculate the Median

… the 50th percentile, or midpoint, of the sorted sample data. =MEDIAN(Data)
question

Calculate the Mode

… the most frequently occurring data value; e.g., in 2 2 5 5 5 6 7 8 8, 5 is the mode. =MODE.SNGL(Data)
question

Calculate the Midrange

… the point halfway between the lowest and highest values of x: (x_min + x_max) / 2. =(MIN(Data)+MAX(Data))/2
question

Trimmed Mean

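… reduces the influence of outliers; often used for economic data. Remove the highest and lowest k percent of the observations, then average the rest. =TRIMMEAN(Data, 0.1) (in Excel, the second argument is the total fraction trimmed, so 0.1 removes 5 percent from each end).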
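
The measures of center on the last few cards can be checked side by side in Python. This is a sketch, not the course's method: `midrange` and `trimmed_mean` are hypothetical helper names, and `trimmed_mean` assumes the Excel-style convention that `percent` is the total fraction trimmed, split evenly between the two ends and rounded down.

```python
import statistics

def midrange(data):
    """Point halfway between the smallest and largest values."""
    return (min(data) + max(data)) / 2

def trimmed_mean(data, percent):
    """Mean after dropping the lowest and highest values.

    `percent` is the TOTAL fraction to trim (assumption: split evenly
    between both tails, with the count per tail rounded down).
    """
    ordered = sorted(data)
    n = len(ordered)
    k = int(n * percent / 2)          # number of points dropped from each end
    kept = ordered[k:n - k]
    return sum(kept) / len(kept)

# The data from the mode card.
data = [2, 2, 5, 5, 5, 6, 7, 8, 8]

mean_value = statistics.mean(data)        # like =AVERAGE(Data)
median_value = statistics.median(data)    # like =MEDIAN(Data)
mode_value = statistics.mode(data)        # like =MODE.SNGL(Data)
midrange_value = midrange(data)
trimmed_value = trimmed_mean(data, 0.25)  # drops one value from each end here
```

For this data set the median, mode, and midrange all equal 5, while the mean and trimmed mean are slightly higher.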
question

Sample Standard Deviation (S)

=STDEV.S(Data); A low standard deviation indicates that the data points tend to be very close to the mean; high standard deviation indicates that the data points are spread out over a large range of values.
question

Mean Absolute Deviation (MAD)

… reveals the average distance from the center. Appealing because of its simple interpretation. =AVEDEV(Data)
question

Range

Max(Data) – Min(Data).
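
The three measures of spread above (sample standard deviation, the average distance from the center computed by AVEDEV, and the range) can be compared in a short Python sketch. The helper names and the data values are made up for illustration.

```python
import statistics

def mean_absolute_deviation(data):
    """Average absolute distance from the mean (what Excel's AVEDEV reports)."""
    center = statistics.mean(data)
    return sum(abs(x - center) for x in data) / len(data)

def data_range(data):
    """Largest value minus smallest value."""
    return max(data) - min(data)

# Hypothetical sample with mean 5.
data = [2, 4, 4, 4, 5, 5, 7, 9]

s = statistics.stdev(data)           # sample standard deviation, like =STDEV.S(Data)
mad = mean_absolute_deviation(data)  # 1.5 for this sample
spread = data_range(data)            # 7 for this sample
```

The sample standard deviation divides by n - 1, which is why `statistics.stdev` (not `statistics.pstdev`) is the counterpart of STDEV.S.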
question

Empirical Rule

The Empirical Rule states that for data from a normal distribution, we expect the interval μ ± kσ to contain a known percentage of data: about 68% for k = 1, about 95% for k = 2, and about 99.7% for k = 3.
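
As a quick check on the Empirical Rule, the share of a normal distribution that falls within k standard deviations of the mean can be computed from the standard normal CDF. The sketch below uses Python's `statistics.NormalDist`; the function name `normal_coverage` is made up for this example.

```python
from statistics import NormalDist

def normal_coverage(k):
    """Fraction of a normal distribution lying within k standard deviations of the mean."""
    z = NormalDist()  # standard normal: mean 0, standard deviation 1
    return z.cdf(k) - z.cdf(-k)

# Coverage for k = 1, 2, 3 standard deviations.
coverage = {k: normal_coverage(k) for k in (1, 2, 3)}
```

The result does not depend on the particular mean or standard deviation, which is why the standard normal is enough.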
question

Method of Medians

… a way to find the quartiles: find the median of all of the data (Q2); then find the medians of the lower and upper halves of the data on either side of it (Q1 and Q3).
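
A minimal sketch of the method of medians in Python. One detail is an assumption here: when the number of observations is odd, the middle value is left out of both halves (textbooks and software differ on this). The function name and data are invented for the example.

```python
import statistics

def quartiles_by_medians(data):
    """Return (Q1, Q2, Q3) using the median of each half of the sorted data."""
    ordered = sorted(data)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]
    upper = ordered[half + (n % 2):]  # skip the middle value when n is odd
    return (
        statistics.median(lower),
        statistics.median(ordered),
        statistics.median(upper),
    )

# Hypothetical data with an even number of observations.
q1, q2, q3 = quartiles_by_medians([1, 3, 5, 7, 9, 11, 13, 15])
```

For these eight values the quartiles are 4, 8, and 12.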
question

Level of Confidence (Definition)

… a measure of how confident we are in a given margin of error; e.g., a 90% level of confidence that an estimate based on a sample will differ by no more than about 1.6 standard errors (more precisely, 1.645) from the “true” population value because of sampling error.
question

Random Experiment (Definition)

… an observational process whose results cannot be known in advance.
question

Sample Space (Definition)

… the set of all outcomes (S) of a random experiment.
question

Discrete Sample Space (Definition)

… a sample space with a countable number of outcomes; e.g., letter grades A to F; the probabilities of all simple events must sum to 1.
question

Continuous Sample Space (Definition)

… the sample space cannot be listed but can be described by a rule; i.e. the sample space for the length of a randomly chosen cell phone call would be S={all X such that X>0}, because you don’t know how long cell phone calls can be.
question

Event (Definition)

… any subset of outcomes in the sample space.
question

Simple Event / Elementary Event (Definition)

(“Elementary, my dear Watson!” – which Sherlock Holmes never ACTUALLY said, by the way); a single outcome.
question

Probability (Definition)

… the probability of an event is a number that measures the likelihood that the event will occur; the probability of event A must lie within the interval from 0 to 1.
question

Empirical Approach (Definition)

…use the empirical or relative frequency approach to assign probabilities by counting the frequency of observed outcomes defined on the experimental sample space – based on HISTORICAL DATA; i.e. default rates on student loans: P(a student defaults)= f/n = (number of defaults / number of loans)
question

The Law of Large Numbers (Definition)

… says that as the number of trials increases, any empirical probability approaches its theoretical limit; e.g., flip a fair coin 50 times and the proportion of heads may stray noticeably from .50, but over thousands of flips we expect it to settle near .50.
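
A small simulation can illustrate the idea: the empirical probability f/n of heads is computed for more and more simulated flips of a fair coin. This is a sketch under stated assumptions; the function name, the seed, and the flip counts are arbitrary choices for the example.

```python
import random

def proportion_of_heads(n_flips, seed=42):
    """Empirical probability of heads: f / n over n simulated fair-coin flips."""
    rng = random.Random(seed)  # fixed seed only so the run is repeatable
    heads = sum(rng.random() < 0.5 for _ in range(n_flips))
    return heads / n_flips

# Empirical proportions for increasingly long runs of flips.
results = {n: proportion_of_heads(n) for n in (50, 5_000, 100_000)}

for n, p in results.items():
    print(f"{n:>7} flips: proportion of heads = {p:.4f}")
```

Short runs can land fairly far from .50, while the longest run should sit very close to it.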
question

Classical Approach [A priori] (Definition)

A priori: the process of assigning probabilities before the event is observed or the experiment is conducted; based on logic not experience; Think “priori = PRIOR.” Instead of performing the experiment, we can use deduction to determine the probability of an event.
question

Subjective Approach (Definition)

… reflects someone’s informed judgement about the likelihood of an event; used when there is no repeatable random experiment; i.e. What is the probability that the price of Ford’s stock will rise within the next 30 days?
question

Complement of an Event (Definition)

… of an event A is denoted by A’ and consists of everything in the sample space S except event A.
question

Union of Two Events (Definition)

… consists of all outcomes in the sample space S that are contained either in event A or in event B, or in both.
question

Intersection of Two Events (Definition)

… the event consisting of all outcomes in the sample space S that are contained in both events A and B.
question

Mutually Exclusive Events (Definition)

… two events are mutually exclusive if their intersection is the null set which contains no elements.
question

Addition Law for Mutually Exclusive Events

… in the case of mutually exclusive events, the addition law reduces to: P(A ∪ B) = P(A) + P(B).
question

Collectively Exhaustive Events (Definition)

… if their union is the entire sample space S; there can be more than two collectively exhaustive events, as long as they take up the entirety of sample space S.
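
The event definitions above map directly onto Python sets. The sketch below assumes a fair six-sided die, so every outcome is equally likely; the events A and B and the helper names are made up for illustration.

```python
# Sample space for one roll of a fair die (assumption: equally likely outcomes).
S = {1, 2, 3, 4, 5, 6}
A = {2, 4, 6}  # hypothetical event: roll is even
B = {1, 2, 3}  # hypothetical event: roll is 3 or less

def prob(event, sample_space=S):
    """Probability of an event when all outcomes are equally likely."""
    return len(event) / len(sample_space)

def mutually_exclusive(a, b):
    """True when the intersection is the null set."""
    return a.isdisjoint(b)

def collectively_exhaustive(sample_space, *events):
    """True when the union of the events is the entire sample space."""
    return set().union(*events) == sample_space

complement_A = S - A      # everything in S except A
union_AB = A | B          # in A or in B, or in both
intersection_AB = A & B   # in both A and B

# General addition law: P(A or B) = P(A) + P(B) - P(A and B).
p_union = prob(A) + prob(B) - prob(intersection_AB)
```

Here A and its complement are both mutually exclusive and collectively exhaustive, while A and B are neither.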
question

Conditional Probability (Definition)

… the probability of an event A given that event B has occurred: P(A | B) = P(A ∩ B) / P(B). E.g., P(A in Physics) = 0.2 and P(A in Calculus) = 0.2, but P(A in Physics | A in Calculus) = 0.8 and P(A in Physics | C in Calculus) = 0.15.
question

Independent Events (Definition)

Event A is independent of event B if the conditional probability P(A | B) is the same as the marginal probability P(A).
question

Contingency Table (Definition)

… also called a cross-tabulation table; used often when gathering empirical data. NOTE: Learn how to create / read / analyze contingency tables – MUY IMPORTANTE.
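
To practice reading a contingency table, here is a small Python sketch that computes a marginal probability, a conditional probability, and an independence check. The counts for 100 students are hypothetical, chosen only so that the marginal and conditional values are easy to verify by hand; the helper names are also invented.

```python
# Hypothetical counts for 100 students: (calculus grade, physics grade) -> count.
counts = {
    ("A_calc", "A_phys"): 16,
    ("A_calc", "other_phys"): 4,
    ("other_calc", "A_phys"): 4,
    ("other_calc", "other_phys"): 76,
}

def marginal_column(table, column):
    """P(column): column total divided by the grand total."""
    total = sum(table.values())
    return sum(n for (_, c), n in table.items() if c == column) / total

def conditional(table, column, given_row):
    """P(column | row): cell count divided by the row total."""
    row_total = sum(n for (r, _), n in table.items() if r == given_row)
    return table[(given_row, column)] / row_total

p_a_phys = marginal_column(counts, "A_phys")                # 20 / 100
p_a_phys_given_a_calc = conditional(counts, "A_phys", "A_calc")  # 16 / 20

# Independent only if the conditional probability equals the marginal probability.
independent = abs(p_a_phys_given_a_calc - p_a_phys) < 1e-9
```

With these counts the conditional probability is much larger than the marginal one, so the two events are not independent.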
question

Random Variable (Definition)

… a function or rule that assigns a numerical value to each outcome in the sample space of a random experiment. Uppercase letters (X, Y, etc…) represent random variables. Lowercase letters (x, y, etc…) represent values of random variables.
question

Discrete Random Variable (Definition)

… a variable that has a countable number of distinct values.
question

Discrete Probability Distribution

… assigns a probability to each value of a discrete random variable X.
question

Calculating the Variance

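… Var(X) = σ² = Σ (x − μ)² P(x), where μ = Σ x P(x) is the mean (expected value) of X. So: take each x minus the mean, square it, weight it by P(x), and add up the results.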
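
A worked sketch of the variance calculation for a discrete random variable, in Python. The distribution used (number of heads in two flips of a fair coin) and the function names are chosen for illustration only.

```python
def expected_value(dist):
    """Mean of a discrete random variable: sum of x * P(x)."""
    return sum(x * p for x, p in dist.items())

def variance(dist):
    """Variance: sum of (x - mean)^2 * P(x)."""
    mu = expected_value(dist)
    return sum((x - mu) ** 2 * p for x, p in dist.items())

# Hypothetical distribution: number of heads in two fair coin flips.
heads_in_two_flips = {0: 0.25, 1: 0.50, 2: 0.25}

mu = expected_value(heads_in_two_flips)   # 1.0
sigma_squared = variance(heads_in_two_flips)  # 0.5
sigma = sigma_squared ** 0.5              # standard deviation
```

The probabilities act as weights, so values of x far from the mean contribute more only if they are also reasonably likely.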
question

Probability Distribution Function (PDF)

… a mathematical function that shows the probability of each X-value.
question

Cumulative Distribution Function (CDF)

… a mathematical function that shows the cumulative sum of probabilities, adding from the smallest to the largest X-value, gradually approaching unity; e.g.:
x | CDF
1 | .2
2 | .2 + .3 = .5
3 | .2 + .3 + .4 = .9
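
The running total behind a CDF can be sketched with `itertools.accumulate`. The first three probabilities below follow the card's example (.2, .3, .4); the fourth value, x = 4 with probability .1, is an assumption added here so that the probabilities sum to 1. The function name is also made up.

```python
from itertools import accumulate

def cdf_table(pdf):
    """Return [(x, P(X <= x))] with x in ascending order."""
    xs = sorted(pdf)
    cumulative = accumulate(pdf[x] for x in xs)  # running sum of probabilities
    return list(zip(xs, cumulative))

# Probabilities for x = 1, 2, 3 follow the card; x = 4 is a hypothetical addition.
pdf = {1: 0.2, 2: 0.3, 3: 0.4, 4: 0.1}

table = cdf_table(pdf)
for x, cumulative_probability in table:
    print(f"{x} | {cumulative_probability:.1f}")
```

The last row of any complete CDF table should equal 1, which is a handy check that no outcome was left out.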
question

Bernoulli Experiments

… a random experiment with only 2 outcomes; one outcome is labeled “success” (x=1) and the other a “failure” (x=0). Success is usually defined as the less likely outcome, though this is a convention rather than a requirement.
question

Binomial Distribution (Definition)