# Business Statistics [Maymester – Midterm] – Flashcards


question
Statistics (Definition)
...the science of collecting, organizing, analyzing, interpreting, and presenting data. Some experts prefer to call statistics data science, a trilogy of tasks involving data modeling, analysis, and decision making.
question
Statistic (Definition)
...a single measure, reported as a number, used to summarize a sample data set.
question
Descriptive Statistics (Definition)
...the collection, organization, presentation, and summary of data, using either charts and graphs or numerical summaries.
question
Inferential Statistics (Definition)
...generalizing from a sample to a population, estimating unknown population parameters, drawing conclusions, and making decisions
question
Pitfalls of Statistics (List)
Pitfall 1: Conclusions from Small Samples; Pitfall 2: Conclusions from Nonrandom Samples; Pitfall 3: Conclusions from Rare Events; Pitfall 4: Poor Survey Methods; Pitfall 5: Assuming a Causal Link; Pitfall 6: Generalization to Individuals; Pitfall 7: Unconscious Bias; Pitfall 8: Significance versus Importance
question
Statistics vs. Probability
Statistics summarizes history, while probability quantifies future uncertainty.
question
Observation (Definition)
...a single member of a collection of items that we want to study, such as a person, firm, or region.
question
Variable (Definition)
...a characteristic of the subject or individual, such as an employee's income or an invoice amount
question
Data Set (Definition)
...consists of all the values of all of the variables for all of the observations we have chosen to observe.
question
Univariate Data Set (Definition, Example and Typical Tasks)
One variable; Ex: Income; Typical Tasks: Histogram, descriptive statistics, frequency tallies.
question
Bivariate Data Set (Definition, Example, and Typical Tasks)
Two variables; Ex: Income and age; Typical Tasks: Scatter plots, correlations, regression modeling...
question
Multivariate Data Set (Definition, Example, and Typical Tasks)
More than two variables; Ex: Income, age and gender; Typical Tasks: multiple regression, data mining, econometric modeling...
question
Categorical Data (Verbal Label, Coding and Binary Values)
(also called qualitative data) Data whose values are described by words rather than numbers. Verbal Labels: cars are called small, mid-sized, sedan, etc. Coding: numbers can represent words, e.g. 1 = cash, 2 = check, 3 = credit card. Binary Values: the variable has only two values (e.g. employed and unemployed).
question
Numerical Data (Discrete and Continuous Data)
(also called quantitative data) arise from counting, measuring something, or some kind of mathematical operation; summarizing them numerically (statistics or tables) provides insight into the characteristics of a data set using mathematics. Discrete Data: a countable number of distinct values, usually counts you could tally on your fingers (non-negative integers). Continuous Data: can take any value within an interval, including fractional values; e.g. a number representing the percentage of customers out of an entire group surveyed.
question
Time Series Data vs. Cross Sectional Data
Time Series Data: each observation in the sample represents a different, equally spaced point in time (e.g. years, months, days); we are interested in trends and patterns over time. Cross Sectional Data: each observation represents a different individual unit at the same point in time; we are interested in variation among observations. Combining the two data types gives pooled cross-sectional and time series data.
question
Levels of Measurement (List)
Nominal, Ordinal, Interval, Ratio
question
Nominal Data (Definition)
(from the Latin for "name") identifies categories only; e.g. eye color (blue, brown, green, etc.)
question
Ordinal Data (Definition)
Rank has meaning, but the distance between ranks has no clear meaning; e.g. car sizes: full-sized, compact, subcompact.
question
Interval Data (Definition)
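Distance has meaning, but there is no meaningful zero, so ratios are not meaningful; e.g. Fahrenheit temperature (60°F is not "twice as hot" as 30°F).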
question
Ratio Data (Definition)
A meaningful zero exists, so ratios have meaning; e.g. accounts payable: \$20 is twice as much as \$10 (ratio of 2:1), and \$0 means the absence of any amount owed.
question
Likert Scales (Describe)
A rating scale with an odd number of ordered response options (often five) and a neutral midpoint. Example: "College-bound high school students should be required to study a foreign language. Check one box." Box Options: "Strongly Agree," "Somewhat Agree," "Neither Agree Nor Disagree," "Somewhat Disagree," "Strongly Disagree."
question
Parameter (Definition)
a measurement or characteristic of the population (e.g. a mean or proportion). Usually unknown, since we can rarely observe the entire population (a census of the target population is often impossible or too costly), so parameters are estimated using a sample.
question
Target Population (Definition)
...the population that we're interested in.
question
Sampling Frame (Definition)
... the list or source from which the sample is actually drawn; e.g. phone books, directories, email addresses from a certain online newsletter, etc.
question
Sampling Methods (List)
1. Simple Random Sample 2. Systematic Sample 3. Stratified Sampling 4. Cluster Sampling 5. Judgement Sample 6. Convenience Sample 7. Focus Groups
question
Simple Random Sample (Describe)
... use random numbers to select items from a list.
question
Systematic Sample (Describe)
... select every k-th item from a list or sequence, usually after a random starting point (e.g. every fifth restaurant customer, or every fifth car at a checkpoint gets pulled over).
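
A minimal sketch contrasting the two sampling methods above, using only Python's standard `random` module. The customer list, the helper names, and the fixed seeds are hypothetical, chosen only for illustration; the systematic sketch assumes a random start among the first k items.

```python
import random

def simple_random_sample(items, n, seed=None):
    """Every item has the same chance of selection; no item is picked twice."""
    rng = random.Random(seed)
    return rng.sample(items, n)

def systematic_sample(items, k, seed=None):
    """Pick a random start among the first k items, then every k-th item after it."""
    rng = random.Random(seed)
    start = rng.randrange(k)
    return items[start::k]

customers = list(range(1, 101))  # hypothetical list of 100 customer IDs
print(simple_random_sample(customers, 5, seed=1))
print(systematic_sample(customers, 10, seed=1))
```

The systematic sample is evenly spread through the list, which is convenient but can mislead if the list has a repeating pattern with period k.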
question
Stratified Sampling (Describe)
... select randomly within defined strata (e.g. by age, occupation, gender)...
question
Cluster Sampling (Describe)
... like stratified sampling except the strata are geographical areas (e.g. zip codes); some areas are chosen at random and then sampled, which suits a firm studying locations within its particular market.
question
Judgement Sample (Describe)
... use expert knowledge to choose "typical" items (e.g. which employees to interview for yearly reviews).
question
Convenience Sample (Describe)
... use whatever sample happens to be available (e.g. the coworkers who happen to be at lunch with you).
question
Focus Groups (Describe)
... in-depth dialogue with a panel of representative or specific individuals (e.g. iPod users).
question
Basic Steps to Survey Research (List)
Step 1: State the goals of the research. Step 2: Develop the budget (time, money, staff). Step 3: Create a research design (target population, frame, sample size). Step 4: Choose a survey type and method of administration. Step 5: Design a data collection instrument (questionnaire). Step 6: Pretest the survey instrument and revise as needed. Step 7: Administer the survey (follow up if needed). Step 8: Code the data and analyze it.
question
Visual Data Representation
(charts and graphs) provides insight into characteristics of a data set without using mathematics.
question
Stem and Leaf Plot
...a tool of exploratory data analysis (EDA) that seeks to reveal essential data features in an intuitive way. A stem-and-leaf plot is basically a frequency tally, except that we use digits instead of tally marks. For two-digit integer data, the stem is the tens digit and the leaf is the ones digit; for three-digit integer data, the stem is the first two digits (e.g. 127 has stem 12 and leaf 7).
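
A minimal sketch of building a stem-and-leaf plot for non-negative integer data, assuming stem = value // 10 and leaf = value % 10. The sample values and function names are hypothetical; unlike a hand-drawn plot, this sketch skips stems that have no leaves.

```python
from collections import defaultdict

def stem_and_leaf(data):
    """Return {stem: sorted leaves}; stem = all but the last digit, leaf = ones digit."""
    plot = defaultdict(list)
    for value in sorted(data):
        plot[value // 10].append(value % 10)
    return dict(plot)

def print_plot(plot):
    for stem in sorted(plot):
        leaves = " ".join(str(leaf) for leaf in plot[stem])
        print(f"{stem:>3} | {leaves}")

sample = [23, 27, 31, 35, 35, 42, 48, 127]  # hypothetical data
print_plot(stem_and_leaf(sample))
```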
question
Dot Plots
A dot plot is the simplest graphical display of n individual values of numerical data - pretty much like a stem-and-leaf plot, but instead of digits, it uses dots. A stacked dot plot compares two dot plots (stacked on top of one another).
question
Left Skewed Histogram
... negatively skewed, with a long left "tail."
question
Right Skewed Histogram
... positively skewed, with a long right "tail."
question
Symmetric Histogram
... both "tails" are the same length.
question
Pareto Charts
... a special type of bar chart used in quality management to display the frequency of defects or errors of different types; categories are displayed in descending order of frequency.
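
A minimal sketch of the table behind a Pareto chart: categories sorted by descending frequency, with the cumulative percentage that is often drawn as a line over the bars. The defect tallies are hypothetical, for illustration only.

```python
def pareto_table(counts):
    """Sort categories by descending frequency and add the cumulative percent."""
    total = sum(counts.values())
    rows = []
    cumulative = 0
    for category, count in sorted(counts.items(), key=lambda kv: kv[1], reverse=True):
        cumulative += count
        rows.append((category, count, round(100 * cumulative / total, 1)))
    return rows

# Hypothetical defect tallies, for illustration only
defects = {"Scratch": 12, "Dent": 30, "Wrong color": 5, "Missing part": 3}
for category, count, cum_pct in pareto_table(defects):
    print(f"{category:<13}{count:>4}{cum_pct:>7}%")
```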
question
Calculate the Mean
... add up all of the numbers, and then divide by how many numbers there were. =AVERAGE(Data)
question
Calculate the Median
... the 50th percentile, or midpoint, of the sorted sample data. =MEDIAN(Data)
question
Calculate the Mode
... the most frequently occurring data value; e.g. in 2 2 5 5 5 6 7 8 8, 5 is the mode. =MODE.SNGL(Data)
question
Calculate the Midrange
the point halfway between the lowest and highest values of x: $(x_{\min} + x_{\max})/2$. In Excel: =(MIN(Data)+MAX(Data))/2
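
A short sketch computing the four measures of center above with Python's standard `statistics` module, using the data from the Mode card. The Excel formulas in the comments are the ones named on the cards; the Python calls are a rough equivalent, not an exact reproduction of Excel's behavior in every edge case.

```python
import statistics

data = [2, 2, 5, 5, 5, 6, 7, 8, 8]  # values from the Mode card

mean = statistics.mean(data)             # like =AVERAGE(Data)
median = statistics.median(data)         # like =MEDIAN(Data)
mode = statistics.mode(data)             # like =MODE.SNGL(Data)
midrange = (min(data) + max(data)) / 2   # like =(MIN(Data)+MAX(Data))/2

print(round(mean, 3), median, mode, midrange)  # 5.333 5 5 5.0
```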
question
Trimmed Mean
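... reduces the influence of outliers (often used for economic data) by removing the highest and lowest k percent of the observations before averaging. In Excel, the second argument is the total fraction trimmed, so =TRIMMEAN(Data, 0.1) drops 5% from each end.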
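
A minimal sketch of a trimmed mean that tries to mimic Excel's TRIMMEAN, assuming (as we understand Excel's documentation) that the number of excluded points is rounded down to an even number and split equally between the two ends. The function name and the data with one outlier are hypothetical.

```python
def trimmed_mean(data, percent):
    """Hypothetical TRIMMEAN-style mean: percent is the TOTAL fraction to drop."""
    if not 0 <= percent < 1:
        raise ValueError("percent must be in [0, 1)")
    values = sorted(data)
    n_drop = int(len(values) * percent)
    n_drop -= n_drop % 2           # round down to a multiple of 2
    k = n_drop // 2                # observations removed from each end
    kept = values[k:len(values) - k]
    return sum(kept) / len(kept)

data = list(range(1, 20)) + [1000]  # 20 values with one hypothetical outlier
print(sum(data) / len(data))        # ordinary mean, pulled up by the outlier: 59.5
print(trimmed_mean(data, 0.1))      # drops the lowest and highest value: 10.5
```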
question
Sample Standard Deviation (S)
=STDEV.S(Data); A low standard deviation indicates that the data points tend to be very close to the mean; high standard deviation indicates that the data points are spread out over a large range of values.
question
Mean Absolute Deviation (MAD)
... reveals the average absolute distance of the data values from the mean. Appealing because of its simple interpretation. =AVEDEV(Data)
question
Range
Max(Data) - Min(Data).
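
A short sketch of the three measures of dispersion above (sample standard deviation, mean absolute deviation, range) on a small hypothetical sample. The sample standard deviation divides by n - 1, which is what STDEV.S and Python's `statistics.stdev` do.

```python
import math
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical sample
n = len(data)
mean = sum(data) / n

# Sample standard deviation: squared deviations divided by n - 1 (=STDEV.S)
s = math.sqrt(sum((x - mean) ** 2 for x in data) / (n - 1))
assert math.isclose(s, statistics.stdev(data))

# Mean absolute deviation: average absolute distance from the mean (=AVEDEV)
mad = sum(abs(x - mean) for x in data) / n

# Range: largest minus smallest value
data_range = max(data) - min(data)

print(round(s, 3), mad, data_range)  # 2.138 1.5 7
```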
question
Empirical Rule
The Empirical Rule states that for data from a normal distribution, we expect the interval $\mu \pm k\sigma$ to contain a known percentage of the data: about 68% for k = 1, 95% for k = 2, and 99.7% for k = 3.
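
A short check of where the Empirical Rule percentages come from, using `statistics.NormalDist` from the Python standard library (3.8+). Because any normal variable can be standardized, the share within $\mu \pm k\sigma$ equals the standard normal share between -k and k.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist(mu=0, sigma=1)

def within_k_sigma(k):
    """Share of a normal distribution inside mu +/- k*sigma (computed in z-score form)."""
    return STANDARD_NORMAL.cdf(k) - STANDARD_NORMAL.cdf(-k)

for k in (1, 2, 3):
    print(f"k = {k}: {within_k_sigma(k):.2%}")  # about 68.27%, 95.45%, 99.73%
```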
question
Method of Medians
... find the median of all of the data (Q2); then find the median of the values below it (Q1) and the median of the values above it (Q3). These three medians are the quartiles.
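
A minimal sketch of the method of medians. It assumes that when n is odd, the overall median is left out of both halves; textbooks and software differ on this convention, so results may not match Excel's QUARTILE functions. The example data are hypothetical.

```python
import statistics

def quartiles_method_of_medians(data):
    """Return (Q1, Q2, Q3); for odd n the middle value is excluded from both halves."""
    values = sorted(data)
    n = len(values)
    lower = values[: n // 2]
    upper = values[(n + 1) // 2 :]
    return statistics.median(lower), statistics.median(values), statistics.median(upper)

print(quartiles_method_of_medians([1, 3, 4, 6, 7, 8, 9, 12]))  # (3.5, 6.5, 8.5)
```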
question
Level of Confidence (Definition)
... a measure of how confident we are in a given margin of error; e.g. with a 90% level of confidence, we expect an estimate based on a sample to differ from the "true" population value by no more than about 1.645 standard errors because of sampling error.
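
A short sketch showing where the "about 1.645 standard errors" for 90% confidence comes from: it is the standard normal value that leaves 5% in each tail. It uses `statistics.NormalDist.inv_cdf` from the Python standard library; the helper name is hypothetical.

```python
from statistics import NormalDist

def z_critical(confidence):
    """Two-tailed z value that captures the middle `confidence` share of a standard normal."""
    tail = (1 - confidence) / 2
    return NormalDist().inv_cdf(1 - tail)

for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%}: +/- {z_critical(level):.3f} standard errors")  # 1.645, 1.960, 2.576
```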
question
Random Experiment (Definition)
... an observational process whose results cannot be known in advance.
question
Sample Space (Definition)
... the set of all outcomes (S) of a random experiment.
question
Discrete Sample Space (Definition)
... a sample space with a countable number of outcomes (e.g. letter grades A to F); the probabilities of all simple events must sum to 1.
question
Continuous Sample Space (Definition)
... a sample space that cannot be listed but can be described by a rule; e.g. the sample space for the length of a randomly chosen cell phone call is S = {all X such that X > 0}, because a call can last any positive length of time.
question
Event (Definition)
... any subset of outcomes in the sample space.
question
Simple Event / Elementary Event (Definition)
("Elementary, my dear Watson!" - which Sherlock Holmes never ACTUALLY said, by the way); a single outcome.
question
Probability (Definition)
... the probability of an event is a number that measures the likelihood that the event will occur; the probability of event A must lie in the interval from 0 to 1, inclusive: $0 \le P(A) \le 1$.
question
Empirical Approach (Definition)
...use the empirical or relative frequency approach to assign probabilities by counting the frequency of observed outcomes defined on the experimental sample space, based on HISTORICAL DATA; e.g. default rates on student loans: P(a student defaults) = f/n = (number of defaults) / (number of loans)
question
The Law of Large Numbers (Definition)
... says that as the number of trials increases, any empirical probability approaches its theoretical limit; e.g. in 50 flips of a fair coin the proportion of heads may stray noticeably from .50, but as the number of flips grows, the proportion gets ever closer to .50.
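
A small simulation sketch of the Law of Large Numbers, assuming a fair coin modeled with Python's `random` module (the seed is arbitrary, so the exact proportions printed depend on it). Each proportion is an empirical probability f/n; the larger runs tend to land much closer to .50.

```python
import random

def proportion_heads(n_flips, rng):
    """Empirical probability f/n of heads in n_flips simulated fair-coin tosses."""
    heads = sum(rng.random() < 0.5 for _ in range(n_flips))
    return heads / n_flips

rng = random.Random(42)  # fixed seed so the run is repeatable
for n in (10, 100, 10_000, 1_000_000):
    print(n, proportion_heads(n, rng))
```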
question
Classical Approach [A priori] (Definition)
A priori: the process of assigning probabilities before the event is observed or the experiment is conducted; based on logic not experience; Think "priori = PRIOR." Instead of performing the experiment, we can use deduction to determine the probability of an event.
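
A short sketch of the classical approach: instead of running an experiment, list the equally likely outcomes and count. Two fair dice are assumed here as an illustration; the helper name is hypothetical. `Fraction` keeps the answer exact.

```python
from fractions import Fraction
from itertools import product

# Sample space for rolling two fair dice: 36 equally likely outcomes
sample_space = list(product(range(1, 7), repeat=2))

def classical_probability(event, space):
    """P(A) = (outcomes in A) / (outcomes in S), assuming equally likely outcomes."""
    favorable = [outcome for outcome in space if event(outcome)]
    return Fraction(len(favorable), len(space))

print(classical_probability(lambda o: sum(o) == 7, sample_space))  # 1/6
```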
question
Subjective Approach (Definition)
... reflects someone's informed judgement about the likelihood of an event; used when there is no repeatable random experiment; e.g. What is the probability that the price of Ford's stock will rise within the next 30 days?
question
Complement of an Event (Definition)
... of an event A is denoted by A' and consists of everything in the sample space S except event A.
question
Union of Two Events (Definition)
... consists of all outcomes in the sample space S that are contained either in event A or in event B, or in both.
question
Intersection of Two Events (Definition)
... the event consisting of all outcomes in the sample space S that are contained in both events A and B.
question
Mutually Exclusive Events (Definition)
... two events are mutually exclusive if their intersection is the null set, which contains no elements.
question
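Special Law of Addition (Definition)
... in the case of mutually exclusive events, the general law of addition, $P(A \cup B) = P(A) + P(B) - P(A \cap B)$, reduces to $P(A \cup B) = P(A) + P(B)$, because $P(A \cap B) = 0$.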
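
A small check of the addition law on one roll of a fair die, an illustrative example not taken from the cards. Python sets model the events: `|` is the union and `&` is the intersection.

```python
from fractions import Fraction

S = set(range(1, 7))  # one roll of a fair die

def P(event):
    return Fraction(len(event), len(S))

A = {2, 4, 6}  # even
B = {4, 5, 6}  # greater than 3 (overlaps A)
C = {1}        # rolled a one (no overlap with A)

# General law of addition
assert P(A | B) == P(A) + P(B) - P(A & B)  # 4/6 == 3/6 + 3/6 - 2/6
# Mutually exclusive: A & C is empty, so the intersection term vanishes
assert A & C == set()
assert P(A | C) == P(A) + P(C)             # 4/6 == 3/6 + 1/6
print(P(A | B), P(A | C))                  # 2/3 2/3
```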
question
Collectively Exhaustive Events (Definition)
... events are collectively exhaustive if their union is the entire sample space S; there can be more than two collectively exhaustive events, as long as together they cover the entire sample space S.
question
Conditional Probability (Definition)
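... the probability of an event A given that event B has occurred: $P(A \mid B) = P(A \cap B) / P(B)$, provided $P(B) > 0$. E.g. P(A in Physics) = 0.20 and P(A in Calculus) = 0.20, but knowing the calculus grade changes the picture: P(A in Physics | A in Calculus) = 0.80, while P(A in Physics | C in Calculus) = 0.15.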
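
A minimal sketch of the conditional probability formula on one roll of a fair die, an illustrative example rather than one from the cards: knowing the roll is even raises the probability of a six from 1/6 to 1/3.

```python
from fractions import Fraction

S = set(range(1, 7))  # one roll of a fair die

def P(event):
    return Fraction(len(event), len(S))

def conditional(a, b):
    """P(A | B) = P(A and B) / P(B), defined only when P(B) > 0."""
    if P(b) == 0:
        raise ValueError("P(B) must be positive")
    return P(a & b) / P(b)

six = {6}
even = {2, 4, 6}
print(P(six), conditional(six, even))  # 1/6 1/3
```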
question
Independent Events (Definition)
Event A is independent of event B if the conditional probability P(A | B) is the same as the marginal probability P(A).
question
Contingency Table (Definition)
... also called a cross-tabulation table; a table of frequencies for two categorical variables, one in the rows and one in the columns, used often when gathering empirical data. NOTE: Learn how to create / read / analyze contingency tables - VERY IMPORTANT.
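
A minimal sketch of reading a contingency table: a marginal probability comes from a column total over the grand total, a conditional probability from a cell over its row total, and comparing the two is the independence test from the card above. The loan counts are hypothetical, for illustration only.

```python
from fractions import Fraction

# Hypothetical counts, for illustration only: rows = student type, columns = outcome
table = {
    "Undergrad": {"Default": 20, "No default": 100},
    "Grad":      {"Default": 10, "No default": 70},
}

total = sum(sum(row.values()) for row in table.values())

def marginal_col(col):
    """Marginal probability of a column event, e.g. P(Default)."""
    return Fraction(sum(row[col] for row in table.values()), total)

def conditional_col_given_row(col, row):
    """P(column event | row event): the cell count divided by the row total."""
    return Fraction(table[row][col], sum(table[row].values()))

p_default = marginal_col("Default")
for student in table:
    p_cond = conditional_col_given_row("Default", student)
    verdict = "same as" if p_cond == p_default else "differs from"
    print(f"P(Default | {student}) = {p_cond} {verdict} P(Default) = {p_default}")
```

Because the conditional probabilities differ from the marginal one, default and student type would not be independent in this hypothetical table.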
question
Random Variable (Definition)
... a function or rule that assigns a numerical value to each outcome in the sample space of a random experiment. Uppercase letters (X, Y, etc...) represent random variables. Lowercase letters (x, y, etc...) represent values of random variables.
question
Discrete Random Variable (Definition)
... a variable that has a countable number of distinct values.
question
Discrete Probability Distribution
... assigns a probability to each value of a discrete random variable X.
question
Calculating the Variance
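... $\sigma^2 = \sum (x - \mu)^2 P(x)$, where $\mu = \sum x P(x)$ is the expected value; i.e. subtract the mean from each X-value, square the result, weight it by P(x), and add everything up.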
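
A short sketch of the mean and variance of a discrete distribution, using a fair six-sided die as an assumed example. `Fraction` keeps the arithmetic exact, so the variance comes out as 35/12 rather than a rounded decimal.

```python
from fractions import Fraction

# Fair six-sided die: P(x) = 1/6 for x = 1..6
dist = {x: Fraction(1, 6) for x in range(1, 7)}
assert sum(dist.values()) == 1  # a valid distribution sums to 1

mu = sum(x * p for x, p in dist.items())                    # E(X) = sum of x P(x)
variance = sum((x - mu) ** 2 * p for x, p in dist.items())  # sum of (x - mu)^2 P(x)

print(mu, variance, float(variance) ** 0.5)  # 7/2 35/12, then sigma of about 1.708
```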
question
Probability Distribution Function (PDF)
... a mathematical function that shows the probability of each X-value.
question
Cumulative Distribution Function (CDF)
... a mathematical function that shows the cumulative sum of probabilities, adding from the smallest to the largest X-value, gradually approaching unity; e.g.:

| x | CDF |
|---|-----|
| 1 | .2 |
| 2 | .2 + .3 |
| 3 | .2 + .3 + .4 |

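
A short sketch of turning PDF values into CDF values with a running sum (`itertools.accumulate`), using the .2, .3, .4 probabilities from the CDF card. Those three values sum to .9, so the remaining .1 must sit on X-values the card does not show; the CDF only reaches 1 at the largest X-value.

```python
from fractions import Fraction
from itertools import accumulate

# PDF values from the CDF card (only the first three X-values are shown there)
xs = [1, 2, 3]
pdf = [Fraction(2, 10), Fraction(3, 10), Fraction(4, 10)]

cdf = list(accumulate(pdf))  # running sum from the smallest to the largest x
for x, F in zip(xs, cdf):
    print(x, F)  # 1 1/5, then 2 1/2, then 3 9/10
```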
question
Bernoulli Experiments