CSUF ISDSA Term Notes – Flashcards
question
Statistics
answer
A way to get information from data
question
Business Analytics
answer
The scientific process of transforming data into insights for making better business decisions
question
The 3 major types of measurement scales
answer
Nominal, Ordinal, Interval
question
Nominal Data
answer
Qualitative or categorical and labels are used to denote the classes/categories (ex. Students are classified by school such as Business, Humanities, Education, and so on)
question
Ordinal Data
answer
Same as nominal data but there is ordering or ranking (ex. Credit ratings such as Excellent, Good, Fair)
question
Interval Data
answer
Quantitative or numerical in nature (ex. SAT scores, exam grades, income, height, weight, etc)
question
All arithmetic operations are possible for interval data but not for nominal and ordinal data types
answer
True
question
Cross-sectional data
answer
Data is collected at the same or approximately the same point in time
question
Time Series Data
answer
Collected over several time periods (ex. price of gas over a period of years)
question
The two types of Statistics
answer
Descriptive, Inferential
question
Descriptive Statistics
answer
Refers to the summary of important aspects of a data set
question
Inferential Statistics
answer
Goes beyond the data at our disposal (more formally, draws conclusions about a large data set, the "population", based on a smaller set of "sample" data)
question
Using a survey of a random sample of 5000 California residents, an economist said over 55% have a positive view of the economy
answer
Inferential Statistics
question
Say, from a survey of a random sample of 5000 students, 80% of those sampled are very excited about stats
answer
Descriptive Statistics
question
Population
answer
A set of items (experimental units) under study
question
Parameter (Variable)
answer
A descriptive measure of the population that is of interest, e.g. the population mean (its value is unknown; denoted by a Greek letter)
question
Statistic (singular term)
answer
A descriptive measure that is calculated from the sample e.g. the sample mean
question
The purpose of Inferential Stats
answer
To make inferences about a parameter of a population based on information obtained from the statistic or sample (with a certain degree of confidence)
question
The goal of data collection
answer
To obtain a "representative sample" that exhibits the characteristics of the entire population; the most common approach is to take random samples
question
Sources of Statistical Data
answer
Extract from a public source, perform a designed experiment, take a survey, perform an observational study
question
Non-Random Sampling Errors
answer
Selection Bias, Non-response Bias, Measurement Errors
question
Three key features to understanding the shape of the data
answer
Symmetry, Skewness, Modality
question
Symmetry
answer
The left and right halves of the distribution mirror each other. The normal distribution is the classic example: a bell-shaped curve with its peak in the middle
question
Skewness, Skewed
answer
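The distribution is not symmetric. With negative skew the long tail is on the left (the bulk of the data sits on the right); with positive skew the long tail is on the right (the bulk of the data sits on the left)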
question
Modality
answer
The number of peaks in the curve. More than one peak (a camel-humped curve) suggests there is more than one group in your data
question
Mean
answer
The simple average
question
Median
answer
The middle observation after the data has been ordered
question
Mode
answer
The observation that occurs most often
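The three measures of center above can be checked on a small data set. This is a minimal sketch using Python's standard `statistics` module; the exam scores are made up for illustration.

```python
import statistics

# Hypothetical exam scores, invented for this example.
scores = [70, 85, 85, 90, 100]

mean_score = statistics.mean(scores)      # sum of the values divided by their count
median_score = statistics.median(scores)  # middle value once the data is ordered
mode_score = statistics.mode(scores)      # value that occurs most often

print(mean_score, median_score, mode_score)
```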
question
Range=
answer
Largest value-smallest value
question
Interquartile Range=
answer
3rd quartile - 1st quartile = Q3 - Q1
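A short sketch of both spread measures follows. Quartile conventions differ between textbooks and software, so the IQR here reflects the default method of `statistics.quantiles`, which is an assumption and may not match the course's hand calculation. The data values are hypothetical.

```python
import statistics

# Hypothetical data, invented for this example.
data = [2, 4, 6, 8, 10, 12, 14]

data_range = max(data) - min(data)  # largest value minus smallest value

# quantiles(..., n=4) returns the three cut points Q1, Q2 (median), Q3.
q1, q2, q3 = statistics.quantiles(data, n=4)
iqr = q3 - q1  # interquartile range

print(data_range, q1, q3, iqr)
```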
question
Z Score
answer
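Used to measure the location of a particular value in the data relative to the mean, in units of standard deviations: z = (value - mean) / standard deviation. The larger the absolute value of the score, the farther the value is from the mean/average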
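As an illustration, a z score can be computed as (value - mean) / standard deviation. The sketch below uses the population standard deviation (`pstdev`) as an assumption; a course may use the sample standard deviation instead. The function name `z_score` and the data are hypothetical.

```python
import statistics

def z_score(value, mean, sd):
    """Number of standard deviations that value lies from the mean."""
    return (value - mean) / sd

# Hypothetical data, invented for this example.
values = [2, 4, 4, 4, 5, 5, 7, 9]
mean = statistics.mean(values)  # 5
sd = statistics.pstdev(values)  # population standard deviation, 2.0

print(z_score(9, mean, sd))  # positive: above the mean
print(z_score(2, mean, sd))  # negative: below the mean
```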
question
Empirical Rule
answer
For a normal distribution, about 68% of the data will fall within 1 standard deviation of the mean, about 95% within 2, and nearly all (about 99.7%) within 3
question
Outliers
answer
Values that fall outside of the normal range, either unusually large or unusually small
question
Chebyshev's Theorem
answer
For any z > 1, at least (1 - 1/z^2) x 100% of the data values must be within z standard deviations of the mean, regardless of the shape of the distribution
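To see how Chebyshev's guarantee compares with the empirical rule, the sketch below evaluates the bound 1 - 1/z^2 next to the exact fraction of a normal distribution within z standard deviations. The function names are hypothetical, chosen for this example.

```python
from statistics import NormalDist

def chebyshev_lower_bound(z):
    """Minimum fraction of any data set within z standard deviations (z > 1)."""
    if z <= 1:
        raise ValueError("Chebyshev's theorem requires z > 1")
    return 1 - 1 / z**2

def normal_fraction_within(z):
    """Fraction of a normal distribution within z standard deviations of the mean."""
    std_normal = NormalDist()  # mean 0, standard deviation 1
    return std_normal.cdf(z) - std_normal.cdf(-z)

for z in (2, 3):
    print(z, chebyshev_lower_bound(z), normal_fraction_within(z))
```

The Chebyshev bound is lower because it has to hold for every distribution, not only for bell-shaped ones.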
question
Random Variable
answer
A numerical description of the outcome of an event (ex. The amount a company pays out on an individual policy, based on the outcome of a random event)
question
Probability
answer
A numerical measure of the likelihood that an event will occur; its value always falls within a particular range, from 0 to 1
question
Probability Distribution
answer
The collection of all possible values of the random variable X and the associated probabilities P(X=x). Each P(X=x) is always between 0 and 1, and the sum of all P(X=x) is always 1
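The two conditions on a discrete probability distribution can be checked directly. This is a minimal sketch; the function name `is_valid_distribution` is hypothetical, and the example distribution is the number of heads in two fair coin flips.

```python
import math

def is_valid_distribution(dist):
    """dist maps each value x to P(X = x)."""
    each_in_range = all(0 <= p <= 1 for p in dist.values())
    sums_to_one = math.isclose(sum(dist.values()), 1.0)
    return each_in_range and sums_to_one

# Number of heads in two fair coin flips.
heads = {0: 0.25, 1: 0.50, 2: 0.25}
print(is_valid_distribution(heads))

# Probabilities that sum to 0.9 do not form a distribution.
print(is_valid_distribution({0: 0.5, 1: 0.4}))
```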
question
Central Limit Theorem
answer
Even if X does not have a normal distribution, the sampling distribution of the sample mean xbar will be approximately normal if the sample size n is large (n=30 is usually large enough)
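A small simulation can make the theorem concrete. The sketch below draws repeated samples of size 30 from a strongly skewed (exponential) population and summarizes the resulting sample means; the population choice, seed, and number of replications are assumptions made for illustration.

```python
import math
import random
import statistics

rng = random.Random(0)  # fixed seed so the run is repeatable
n = 30                  # size of each sample
replications = 2000     # number of samples drawn

# Exponential population with mean 1 and standard deviation 1 (right-skewed).
sample_means = [
    statistics.mean([rng.expovariate(1.0) for _ in range(n)])
    for _ in range(replications)
]

mean_of_means = statistics.mean(sample_means)
sd_of_means = statistics.stdev(sample_means)

# Theory: the sample means center on the population mean (1.0),
# with standard deviation sigma / sqrt(n).
print(mean_of_means, sd_of_means, 1.0 / math.sqrt(n))
```

A histogram of `sample_means` would look roughly bell-shaped even though the individual observations are skewed.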
question
p bar is the point estimator or sample statistic for the population parameter p
answer
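pbar = x/n, where x is the number of sampled items with the characteristic of interest and n is the sample size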
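As a quick illustration of the formula, the sketch below computes a sample proportion. The counts are hypothetical: 2750 positive responses in a sample of 5000.

```python
def sample_proportion(x, n):
    """Point estimate of the population proportion p."""
    if n <= 0 or not 0 <= x <= n:
        raise ValueError("need n > 0 and 0 <= x <= n")
    return x / n

# Hypothetical survey counts, invented for this example.
p_bar = sample_proportion(x=2750, n=5000)
print(p_bar)
```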
question
Point Estimator
answer
The value (obtained from a sample) which is considered a best guess or estimate of a population parameter
question
Parameter
answer
We have a single population "target of interest". From this population, we identify a parameter. This parameter is a fixed numerical value. The problem is that we do not know its value, but wish to know it. Interval estimation will help us estimate it.
question
Sampling distribution of xbar
answer
Tells us (in probability terms) how close a point estimator is to the parameter
question
Margin of Error (ME or MOE)
answer
ME or MOE is a quantification of how close a point estimator is to the parameter.
question
Is it possible that an interval estimate may not capture the parameter?
answer
Yes. This is what we call uncertainty.
question
Is there a way of controlling the uncertainty that the interval captures the parameter?
answer
Yes. Attach a level of desired certainty to the interval estimate.
question
Confidence Level or (1-alpha) (ex. we are 95% confident)
answer
Quantifies how often a confidence interval captures the population parameter
question
Margin of Error
answer
Critical Value x Standard Error
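The product above can be sketched for one common case: a confidence interval for a mean when the population standard deviation sigma is known, so the critical value comes from the standard normal distribution and the standard error is sigma / sqrt(n). That setting, the function name, and the numbers are assumptions made for illustration.

```python
import math
from statistics import NormalDist

def margin_of_error(sigma, n, confidence=0.95):
    """Critical value times standard error, for a mean with known sigma."""
    alpha = 1 - confidence
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    standard_error = sigma / math.sqrt(n)
    return z_crit * standard_error

# Hypothetical inputs: sigma = 15, n = 100, sample mean = 82.
x_bar = 82.0
me_95 = margin_of_error(sigma=15, n=100, confidence=0.95)
me_99 = margin_of_error(sigma=15, n=100, confidence=0.99)

print("95% interval:", x_bar - me_95, x_bar + me_95)
print("99% interval:", x_bar - me_99, x_bar + me_99)
```

The 99% interval is wider than the 95% interval: more confidence comes at the price of a larger margin of error.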
question
Confidence Level and precision (ME) work in opposite directions
answer
If we want to be more confident (a higher confidence level), we have to accept a higher ME. A higher ME means that our interval estimate will be less precise.
question
A large n
answer
Has clear benefits as ME is lower but may cost us time and money
question
A small n
answer
Leads to a higher ME (a less precise estimate), so there is no statistical benefit, but it costs less time and money
question
The compromise
answer
The compromise will be to specify some desired ME or MOE and look for n that can achieve our specified goal
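One way to carry out this compromise, assuming a mean with known population standard deviation sigma, is to solve ME = z x sigma / sqrt(n) for n and round up. The function name and the inputs below are hypothetical.

```python
import math
from statistics import NormalDist

def required_sample_size(sigma, desired_me, confidence=0.95):
    """Smallest n whose margin of error does not exceed desired_me."""
    alpha = 1 - confidence
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    # Solve z * sigma / sqrt(n) <= desired_me for n, then round up.
    return math.ceil((z_crit * sigma / desired_me) ** 2)

# Hypothetical inputs: sigma = 15, desired margin of error = 2.
n_needed = required_sample_size(sigma=15, desired_me=2)
print(n_needed)
```

Halving the desired ME roughly quadruples the required n, which is why precision gets expensive quickly.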
question
Null Hypothesis
answer
The hypothesis that can possibly be disproved using sample information or evidence
question
Type 1 Error
answer
Occurs when we reject the null hypothesis when it is actually true (that is, we claim the alternative when it is not true)
question
Type 2 Error
answer
Occurs when we fail to reject the null hypothesis when it is actually false
question
When using p-value approach
answer
Reject the null if p-value is less than alpha, and do not reject null if p-value is more than alpha
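The decision rule can be sketched for a two-tailed z test. The choice of a z test, the function names, and the test statistic value are assumptions made for illustration.

```python
from statistics import NormalDist

def two_tailed_p_value(z):
    """Probability of a z statistic at least this extreme, in either tail."""
    return 2 * (1 - NormalDist().cdf(abs(z)))

def decide(p_value, alpha=0.05):
    """Apply the rule: reject the null only if p-value < alpha."""
    return "reject H0" if p_value < alpha else "do not reject H0"

# Hypothetical test statistic.
p = two_tailed_p_value(2.1)
print(p, decide(p, alpha=0.05), decide(p, alpha=0.01))
```

The same p-value can lead to different decisions depending on the alpha chosen before looking at the data.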
question
Critical Value Approach
answer
If the critical value (CV) is positive (an upper-tail test), reject the null if the test statistic is greater than the CV. If the CV is negative (a lower-tail test), reject the null if the test statistic is less than the CV.
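The same rule can be written for one-tailed z tests. This is a sketch under the assumption that the test statistic follows a standard normal distribution; the function name and the `tail` argument are hypothetical.

```python
from statistics import NormalDist

def reject_null(test_stat, alpha=0.05, tail="upper"):
    """Critical value approach for a one-tailed z test."""
    std_normal = NormalDist()
    if tail == "upper":
        critical_value = std_normal.inv_cdf(1 - alpha)  # positive CV
        return test_stat > critical_value
    if tail == "lower":
        critical_value = std_normal.inv_cdf(alpha)      # negative CV
        return test_stat < critical_value
    raise ValueError("tail must be 'upper' or 'lower'")

print(reject_null(2.1, alpha=0.05, tail="upper"))
print(reject_null(-1.0, alpha=0.05, tail="lower"))
```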
question
Matched Sample Design
answer
Each sampled item provides a pair of data values. This design often leads to a smaller sampling error than the independent-sample design because variation between sampled items is eliminated as a source of sampling error.
question
Experiment
answer
A study in which the experimenter manipulates attributes of what is being studied and observes the consequences.
question
Factors
answer
These are the attributes that are manipulated by being set to particular levels and then assigned to individuals. An experimenter identifies at least one factor to manipulate. These levels are often called treatments.
question
Observed Response
answer
The quantitative measurement recorded for each experimental unit in ANOVA (the response, or dependent, variable)
question
ANOVA
answer
Analysis of Variance, can be used to test for the equality of three or more population means
question
Data obtained from observational or experimental studies can be used for the analysis
answer
True
question
If Ho is rejected, we cannot conclude that all population means are different
answer
True
question
Rejecting Ho means that at least two population means have different values
answer
True
question
For each population, the response (dependent) variable is normally distributed
answer
True
question
ANOVA can be viewed as the process of partitioning the total sum of squares and the degrees of freedom into their corresponding sources: treatments and error
answer
True
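The partition can be verified numerically: the total sum of squares (SST) equals the treatment sum of squares (SSTR) plus the error sum of squares (SSE), and the degrees of freedom split the same way. The sketch below uses three small hypothetical groups; the variable names are chosen for this example.

```python
import statistics

# Hypothetical responses for three treatments, invented for this example.
groups = [[4, 5, 6], [6, 7, 8], [8, 9, 10]]

all_values = [x for g in groups for x in g]
grand_mean = statistics.mean(all_values)
k = len(groups)            # number of treatments
n_total = len(all_values)  # total number of observations

# Total variation, variation between treatments, variation within treatments.
sst = sum((x - grand_mean) ** 2 for x in all_values)
sstr = sum(len(g) * (statistics.mean(g) - grand_mean) ** 2 for g in groups)
sse = sum((x - statistics.mean(g)) ** 2 for g in groups for x in g)

df_treatments = k - 1
df_error = n_total - k

mstr = sstr / df_treatments  # mean square due to treatments
mse = sse / df_error         # mean square due to error
f_stat = mstr / mse          # test statistic for equality of means

print(sst, sstr, sse, f_stat)
```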
question
Randomized Block Design
answer
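An experimental design in which the experimental units (the objects of interest in the experiment) are grouped into blocks of similar units, and the treatments are randomly assigned within each block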
question
Completely randomized design
answer
An experimental design in which the treatments are randomly assigned to the experimental units
question
Factorial Experiment
answer
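An experimental design that allows conclusions about two or more factors at once. It is called factorial because the experimental conditions include all possible combinations of the factor levels.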
question
Managerial decisions often are based on the relationship between two or more variables
answer
True
question
Regression Analysis
answer
Can be used to develop an equation showing how the variables are related
question
Dependent Variable
answer
Variable being predicted and denoted by "y"
question
Independent Variable
answer
Variable being used to predict value of the dependent variable and denoted by "x"
question
Simple Linear Regression
answer
Involves one independent variable and one dependent variable, approximated by a straight line
question
Regression Model
answer
Equation that describes how "y" is related to "x"
question
Simple linear regression model
answer
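y = Bo + B1x + E, where Bo (beta zero) and B1 (beta one) are the model parameters and E (epsilon) is the random error term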
question
Bo and B1
answer
Parameters of the model
question
Bo
answer
The "y" intercept of the regression line
question
B1
answer
The slope of the regression line
question
Estimated Simple Linear Regression Equation
answer
yhat = b0 + b1x, where b0 and b1 are the sample estimates of the parameters Bo and B1
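The estimated intercept and slope are usually obtained by least squares. The notes do not state the method, so treating it as least squares is an assumption, and the data and the function name `fit_line` are hypothetical.

```python
import statistics

def fit_line(x, y):
    """Least-squares estimates (intercept, slope) for yhat = intercept + slope * x."""
    x_bar = statistics.mean(x)
    y_bar = statistics.mean(y)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    slope = s_xy / s_xx
    intercept = y_bar - slope * x_bar
    return intercept, slope

# Hypothetical data, invented for this example.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

intercept, slope = fit_line(x, y)
y_hat_at_6 = intercept + slope * 6  # predicted y when x = 6
print(intercept, slope, y_hat_at_6)
```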