Data Mining for Business Intelligence (Shmueli) ISDS 474 CSUF Chapter 3 and 4 – Flashcards

question
Data visualization and summary statistics help ______ data
answer
condense
question
DV supports data cleaning. What are examples of data cleaning?
answer
Identify missing values, outliers, incorrect values, duplicates
question
DV supports exploring, which means you ____ some groups.
answer
combine
question
DV helps identify ____ variables
answer
suitable
question
Give 3 examples of basic plots
answer
Line Graphs, Bar Charts, and Scatterplots
question
Give 2 examples of distribution plots
answer
Boxplots and Histograms
question
A bar chart helps you determine ____
answer
differences between subgroups
question
A _____ might replace a category with a 1 or 0
answer
dummy variable
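A minimal sketch of the dummy-variable idea, using only the Python standard library. The helper name `to_dummies` and the sample values are made up for illustration and do not come from the course material.

```python
def to_dummies(values):
    """Return one 0/1 indicator list per distinct category (hypothetical helper)."""
    categories = sorted(set(values))
    return {cat: [1 if v == cat else 0 for v in values] for cat in categories}


# Made-up categorical variable with two levels.
neighborhood = ["urban", "rural", "urban", "urban"]
dummies = to_dummies(neighborhood)
print(dummies)
```

Each category becomes its own 0/1 column, so a record gets a 1 only in the column matching its category.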
question
A _____ displays relationship between two numerical variables
answer
scatterplot. For example, as A decreases, B increases
question
Line Graphs, Bar Charts, and Scatterplots are examples of ___ plots
answer
basic
question
___ plots help determine the potential methods and variable transformations
answer
distribution
question
Boxplots and Histograms are examples of ___ plots
answer
distribution
question
___ graphs are best for time series data
answer
line
question
_____ plots are good for prediction tasks, or supervised learning
answer
distribution
question
A histogram shows the distribution of the ____ variable.
answer
outcome. For instance, the median house value.
question
___ plots are useful for comparing subgroups
answer
Side-by-side boxplots. For example, the distribution of outcome variable for two neighborhoods
question
In a box plot, the top outliers are defined as those above ____
answer
Q3 + 1.5 × (Q3 − Q1), that is, the third quartile plus 1.5 times the interquartile range
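The upper-fence rule can be sketched as follows. This assumes Python's `statistics.quantiles` with its default quartile method; other tools compute quartiles slightly differently, so the exact fence may vary. The sample values and the helper name `upper_fence` are made up.

```python
import statistics


def upper_fence(data):
    """Return Q3 + 1.5 * (Q3 - Q1) for a list of numbers."""
    q1, _median, q3 = statistics.quantiles(data, n=4)
    return q3 + 1.5 * (q3 - q1)


# Made-up sample with one obviously large value.
values = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 40]
fence = upper_fence(values)
top_outliers = [v for v in values if v > fence]
print(fence, top_outliers)
```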
question
The wider the box, the greater the ____.
answer
variation
question
____ are graphical displays where color is used to convey information
answer
Heat Maps
question
Heat Maps are used to visualize ___ and ____.
answer
Correlation and Missing Data
question
The correlation coefficient lies between __ and ___.
answer
+1 and -1
question
The closer the correlation is to +1 or -1, the ___ the association.
answer
stronger
question
A ____ table for p variables has the SAME number of rows and columns
answer
correlation
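To illustrate why a correlation table for p variables is p by p, here is a small sketch that computes Pearson correlations by hand with the standard library. The variable names and numbers are invented, and `pearson` and `correlation_table` are hypothetical helpers.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


def correlation_table(columns):
    """Build a p x p table: one row and one column per variable."""
    names = list(columns)
    return [[pearson(columns[r], columns[c]) for c in names] for r in names]


# Made-up data set: 3 variables, 5 records.
data = {
    "rooms": [4, 5, 6, 7, 8],
    "age": [50, 40, 30, 20, 10],
    "value": [20, 24, 31, 33, 40],
}
table = correlation_table(data)
for name, row in zip(data, table):
    print(name, [round(r, 2) for r in row])
```

The data table here has 5 rows and 3 columns, while the correlation table is 3 by 3 with 1s on the diagonal.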
question
A ____ table can have DIFFERENT numbers of columns (variables) and rows (records)
answer
data
question
How do you build a correlation table that looks like a basic heat map?
answer
Highlight all the data, run Data Analysis to produce the correlation table, then apply Home > Conditional Formatting
question
What are some Common methods of pre-processing of data?
answer
Rescaling, Aggregation, Zooming and Panning, and Filtering
question
What does Rescaling do?
answer
Can often enhance the plot and illuminate relationships
question
What is Filtering?
answer
removing some "noise" from data to focus attention on certain data
question
What is Zooming and Panning?
answer
Zooming in on areas of interest and moving the view across the data to reveal patterns and outliers (as in Google Maps)
question
What is Aggregation?
answer
Summarizing data at a coarser granularity: on a temporal scale (weekly, monthly) or a geographical one (by zip code)
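A rough sketch of temporal aggregation, rolling daily records up to monthly totals with the standard library. The sales figures are invented for illustration.

```python
from collections import defaultdict
from datetime import date

# Made-up daily records: (day, amount).
daily_sales = [
    (date(2024, 1, 5), 10),
    (date(2024, 1, 20), 15),
    (date(2024, 2, 3), 7),
    (date(2024, 2, 28), 8),
    (date(2024, 3, 1), 12),
]

# Aggregate to a coarser granularity: one total per (year, month).
totals = defaultdict(int)
for day, amount in daily_sales:
    totals[(day.year, day.month)] += amount
monthly_totals = dict(totals)
print(monthly_totals)
```

The same pattern applies to geographical aggregation by swapping the key, for example grouping on a zip code field instead of the month.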
question
What are two ways of deriving new variables?
answer
binning and condensing categories
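Both ways of deriving new variables can be sketched in a few lines. The cut points, labels, and category names below are arbitrary assumptions chosen for illustration, and `bin_age` and `condense` are hypothetical helpers.

```python
def bin_age(age):
    """Binning: turn a numerical value into an ordered category."""
    if age < 30:
        return "young"
    if age < 60:
        return "middle"
    return "senior"


def condense(category, keep):
    """Condensing: fold rare categories into a single 'other' level."""
    return category if category in keep else "other"


ages = [22, 45, 61, 30]
age_bins = [bin_age(a) for a in ages]

colors = ["red", "blue", "teal", "red", "mauve"]
condensed = [condense(c, keep={"red", "blue"}) for c in colors]
print(age_bins, condensed)
```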
question
______ removes crowding and allows a better view of the linear relationship between the two log-scaled variables
answer
Rescaling
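As a small illustration of log rescaling, the sketch below uses invented data that follow a power law. On the raw scale the points crowd near the origin, while on the log-log scale consecutive points have a constant slope, which is what a straight line looks like.

```python
import math

# Made-up data following y = 2 * x^1.5 (an assumption for illustration).
x = [1, 10, 100, 1000]
y = [2 * v ** 1.5 for v in x]

# Rescale both variables to a log scale.
log_x = [math.log10(v) for v in x]
log_y = [math.log10(v) for v in y]

# Slopes between consecutive points on the log-log scale.
slopes = [
    (log_y[i + 1] - log_y[i]) / (log_x[i + 1] - log_x[i])
    for i in range(len(x) - 1)
]
print(slopes)
```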
question
A _______ plot helps visualize and identify clusters and outliers and detect patterns.
answer
scatter plot with labeling
question
Scatterplots for ____ can sometimes be ineffective
answer
large numbers of observations
question
Some alternatives to a plain scatterplot for a large number of observations are:
answer
Sampling, reducing marker size, breaking the data down into subsets, aggregation, and jittering
question
What is jittering?
answer
Slightly moving each marker by adding a small amount of noise
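Jittering can be sketched as adding a small random offset to each coordinate. The noise range, seed, and helper name `jitter` are assumptions chosen for illustration.

```python
import random


def jitter(points, amount=0.1, seed=0):
    """Shift each (x, y) point by uniform noise in [-amount, amount]."""
    rng = random.Random(seed)
    return [
        (x + rng.uniform(-amount, amount), y + rng.uniform(-amount, amount))
        for x, y in points
    ]


# Five records that would overlap exactly in a scatterplot.
points = [(1, 1)] * 5
jittered = jitter(points)
print(jittered)
```

After jittering, the five overlapping markers land in slightly different places, so all of them become visible.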
question
___ show actors ("nodes") and the relations between them ("edges")
answer
Network graphs
question
A ____ plot combines multiple scatterplots to show pairwise relationships
answer
Matrix
question
Interactive visualization is often preferred over ___ graphs because all plots are on one screen
answer
"static"
question
____ maps are good for hierarchical large-scale data
answer
Tree
question
In ____ plots, the same record is highlighted in each plot
answer
Linked
question
Bar charts, scatterplots, boxplots, histograms, multiple panels, added color, and aggregation methods are examples of visualization for ___
answer
Prediction and Classification
question
Line charts, temporal and seasonal aggregation, and zooming and panning are examples of visualization for _____ forecasting
answer
Time series
question
Matrix plots, heat maps, aggregation, zooming and panning, map charts, and parallel coordinate plots are examples of visualization for ___ learning
answer
Unsupervised