Business Intelligence – Second Edition 2011 Chapter 4 – Flashcards
question
Adaptive Resonance Theory
answer
An unsupervised learning method created by Stephen Grossberg. ART is a neural network architecture that aims to be brain-like in unsupervised mode.
question
Algorithm
answer
A step-by-step search in which an improvement is made at every step until the best solution is found.
question
Apriori Algorithm
answer
The most commonly used algorithm to discover association rules by recursively identifying frequent item sets.
question
Area Under The ROC Curve
answer
A graphical assessment technique for binary classification models where the true positive rate is plotted on the Y-axis and the false positive rate is plotted on the X-axis. The area under the resulting curve summarizes how well the model separates the two classes.
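As a rough illustration, the area under the ROC curve equals the fraction of (positive, negative) pairs in which the positive case receives the higher score, with ties counted as half. The sketch below uses that equivalence; the function name `auc` and the sample data are assumptions for illustration, not part of the flashcard source.

```python
def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of scores.

    labels: 1 for the positive class, 0 for the negative class.
    scores: model scores, where higher means "more likely positive".
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    # Count pairs where the positive outranks the negative; ties count half.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical example: three of the four positive/negative pairs are ordered correctly.
example_auc = auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
```

A value of 1.0 would mean every positive case outranks every negative case, while 0.5 corresponds to random ordering.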
question
Artificial Neural Network (ANN)
answer
Computer technology that attempts to build computers that operate like a human brain. The machines possess simultaneous memory storage and work with ambiguous information. Sometimes called, simply, a neural network. See neural computing.
question
Associations
answer
A category of data mining algorithm that establishes relationships about items that occur together in a given record.
question
Axons
answer
An outgoing connection (i.e., terminal) from a biological neuron.
question
Back Propagation
answer
The best-known learning algorithm in neural computing where the learning is done by comparing computed outputs to desired outputs of training cases.
question
Bootstrapping
answer
A sampling technique where a fixed number of instances from the original data are sampled (with replacement) for training and the rest of the dataset is used for testing.
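A minimal sketch of the sampling step, assuming the dataset fits in a Python list. The name `bootstrap_split` and the `n_samples` and `seed` parameters are hypothetical and chosen only for illustration.

```python
import random


def bootstrap_split(data, n_samples=None, seed=None):
    """Draw training instances with replacement; unsampled instances form the test set."""
    rng = random.Random(seed)
    n = len(data)
    if n_samples is None:
        n_samples = n  # a common choice: sample as many instances as the dataset holds
    train_idx = [rng.randrange(n) for _ in range(n_samples)]
    chosen = set(train_idx)
    train = [data[i] for i in train_idx]                    # may contain duplicates
    test = [data[i] for i in range(n) if i not in chosen]   # never sampled
    return train, test
```

Because sampling is done with replacement, some instances appear more than once in the training set and others not at all; the latter are what remains for testing.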
question
Business Analyst
answer
An individual whose job is to analyze business data. Business analytics involves using DSS tools, especially models, in assisting decision makers. It is essentially OLAP/DSS. See business intelligence (BI).
question
Categorical Data
answer
Data that represents the labels of multiple classes used to divide a variable into specific groups.
question
Chromosome
answer
A candidate solution for a genetic algorithm.
question
Classification
answer
Supervised induction used to analyze the historical data stored in a database and to automatically generate a model that can predict future behaviors.
question
Clustering
answer
Partitioning a database into segments in which the members of a segment share similar qualities.
question
Confidence
answer
In association rules, the conditional probability of finding the RHS of the rule present in a list of transactions where the LHS of the rule exists.
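The conditional probability above can be sketched as the support of LHS and RHS together divided by the support of the LHS alone. The helper names `support` and `confidence` and the sample baskets are assumptions made for this illustration.

```python
def support(transactions, items):
    """Fraction of transactions that contain every item in `items`."""
    items = set(items)
    return sum(items <= set(t) for t in transactions) / len(transactions)


def confidence(transactions, lhs, rhs):
    """Estimate P(RHS present | LHS present) for the rule LHS -> RHS."""
    lhs_support = support(transactions, lhs)
    if lhs_support == 0:
        return 0.0  # the rule never applies in this data
    return support(transactions, set(lhs) | set(rhs)) / lhs_support


# Hypothetical market baskets.
baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
rule_confidence = confidence(baskets, {"bread"}, {"milk"})
```

Here bread appears in three baskets and bread with milk in two, so the rule "bread -> milk" holds in two of the three baskets where it applies.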
question
Connection Weight
answer
The weight associated with each link in a neural network model. Neural network learning algorithms assess connection weights.
question
CRISP-DM
answer
A cross-industry standard process for conducting data mining projects: a sequence of six steps that starts with a good understanding of the business and the need for the data mining project (i.e., the application domain) and ends with the deployment of the solution that satisfies the specific business need.
question
Data Mining
answer
A process that uses statistical, mathematical, artificial intelligence, and machine-learning techniques to extract and identify useful information and subsequent knowledge from large databases.
question
Decision Trees
answer
A graphical presentation of a sequence of interrelated decisions to be made under assumed risk. This technique classifies specific entities into particular classes based upon the features of the entities; a root is followed by internal nodes, each node (including the root) is labeled with a question, and the arcs associated with each node cover all possible responses.
question
Dendrites
answer
The part of a biological neuron that provides inputs to the cell.
question
Discovery-driven data mining
answer
A form of data mining that finds patterns, associations, and relationships among data in order to uncover facts that were previously unknown or not even contemplated by an organization.
question
Distance Measure
answer
A method used to calculate the closeness between pairs of items in most cluster analysis methods. Popular distance measures include Euclidean distance (the ordinary distance between two points that one would measure with a ruler) and Manhattan distance (also called the rectilinear distance, or taxicab distance, between two points).
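A short sketch of the two measures named above, assuming each item is a sequence of numeric coordinates of equal length. The function names are illustrative.

```python
import math


def euclidean(p, q):
    """Straight-line distance: square root of the summed squared differences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def manhattan(p, q):
    """Rectilinear (taxicab) distance: sum of the absolute differences."""
    return sum(abs(a - b) for a, b in zip(p, q))


# Hypothetical points: a 3-4-5 right triangle.
d_euclid = euclidean((0, 0), (3, 4))   # 5.0
d_taxi = manhattan((0, 0), (3, 4))     # 7
```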
question
Entropy
answer
A metric that measures the extent of uncertainty or randomness in a data set. If all the data in a subset belong to just one class, then there is no uncertainty or randomness in that data set, and therefore the entropy is zero.
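One common formulation is Shannon entropy over the class proportions, measured in bits. The sketch below assumes that formulation and a plain list of class labels; the function name `entropy` is illustrative.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (base 2) of the class distribution in `labels`."""
    n = len(labels)
    proportions = [count / n for count in Counter(labels).values()]
    # Sum of p * log2(1/p); a class with p == 1 contributes exactly zero.
    return sum(p * math.log2(1 / p) for p in proportions)


pure = entropy(["yes", "yes", "yes"])          # one class only: 0.0
mixed = entropy(["yes", "yes", "no", "no"])    # evenly split two classes: 1.0
```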
question
Fuzzy Logic
answer
A logically consistent way of reasoning that can cope with uncertain or partial information. Fuzzy logic is characteristic of human thinking and expert systems.
question
Genetic Algorithm
answer
A software program that learns in an evolutionary manner, similar to the way biological systems evolve.
question
Gini Index
answer
A metric that is used in economics to measure the diversity of a population. The same concept can be used to determine the purity of a specific class as a result of a decision to branch along a particular attribute/variable.
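In decision-tree use, purity is often computed as the Gini impurity, one minus the sum of squared class proportions. The sketch below assumes that form, which may differ in detail from the textbook's presentation; the function name is illustrative.

```python
from collections import Counter


def gini_impurity(labels):
    """Gini impurity of a node: 0.0 when every label belongs to the same class."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


pure_node = gini_impurity(["a", "a", "a", "a"])    # 0.0
mixed_node = gini_impurity(["a", "a", "b", "b"])   # 0.5
```

A split along an attribute is preferred when it produces child nodes with lower impurity than the parent.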
question
Heuristics
answer
Informal, judgmental knowledge of an application area that constitutes the rules of good judgment in the field. Heuristics also encompasses the knowledge of how to solve problems efficiently and effectively, how to improve performance, and so on.
question
Hidden layer
answer
The middle layer of an artificial neural network that has three or more layers.
question
Hypothesis-driven data mining
answer
A form of data mining that begins with a proposition by the user, who then seeks to validate the truthfulness of the proposition.
question
Information Gain
answer
The splitting mechanism used in ID3 (a popular decision-tree algorithm).
question
Interval Data
answer
Variables that can be measured on interval scales.
question
K-Fold Cross-Validation
answer
A popular accuracy assessment technique for prediction models where the complete dataset is randomly split into k mutually exclusive subsets of approximately equal size. The classification model is trained and tested k times. Each time it is trained on all but one fold and then tested on the remaining single fold. The cross-validation estimate of the overall accuracy of a model is calculated by simply averaging the k individual accuracy measures.
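The procedure above can be sketched in a few lines. The names `k_fold_indices` and `cross_validate` are hypothetical, and `train_and_score` stands in for whatever model training and scoring routine is being evaluated; it is assumed to accept training and test indices and return an accuracy.

```python
import random


def k_fold_indices(n, k, seed=None):
    """Randomly split indices 0..n-1 into k mutually exclusive folds of near-equal size."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def cross_validate(n, k, train_and_score, seed=None):
    """Train on k-1 folds, test on the held-out fold, and average the k accuracies."""
    folds = k_fold_indices(n, k, seed)
    scores = []
    for i, test_idx in enumerate(folds):
        # Every fold except the i-th contributes to the training set.
        train_idx = [j for m, fold in enumerate(folds) if m != i for j in fold]
        scores.append(train_and_score(train_idx, test_idx))
    return sum(scores) / k
```

Each instance is used for testing exactly once and for training k - 1 times.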
question
Knowledge Discovery in Databases (KDD)
answer
A machine-learning process that performs rule induction or a related procedure to establish knowledge from large databases.
question
Kohonen's Self-Organizing Feature Map
answer
A type of neural network model for machine learning.
question
Learning Algorithm
answer
The training procedure used by an artificial neural network.
question
Link Analysis
answer
The linkage among many objects of interest is discovered automatically, such as the link between Web pages and referential relationships among groups of academic publication authors.
question
Machine Learning
answer
The process by which a computer learns from experience (e.g., using programs that can learn from historical cases).
question
Microsoft Enterprise Consortium
answer
Worldwide source for access to Microsoft's SQL Server 2008 software suite for academic purposes - teaching and research.
question
Multi-Layered Perceptron (MLP)
answer
A layered artificial neural network structure in which several hidden layers can be placed between the input and output layers.
question
Neural Computing
answer
An experimental computer design aimed at building intelligent computers that operate in a manner modeled on the functioning of the human brain.
question
Neural Network
answer
See Artificial Neural Network (ANN). Computer technology that attempts to build computers that operate like a human brain. The machines possess simultaneous memory storage and work with ambiguous information. Sometimes called, simply, a neural network.
question
Neurons
answer
A cell (i.e., processing element) of a biological or artificial neural network.
question
Nominal Data
answer
A type of data that contains measurements of simple codes assigned to objects as labels, which are not measurements. For example, the variable marital status can be generally categorized as (1) single, (2) married, and (3) divorced.
question
Numeric Data
answer
A type of data that represents the numeric values of specific variables. Examples of numerically valued variables include age, number of children, total household income (in U.S. dollars), travel distance (in miles), and temperature (in Fahrenheit degrees).
question
Ordinal Data
answer
Data that contains codes assigned to objects or events as labels that also represent the rank order among them. For example, the variable credit score can be generally categorized as (1) low, (2) medium, and (3) high.
question
Pattern Recognition
answer
A technique of matching an external pattern to a pattern stored in a computer's memory (i.e., the process of classifying data into predetermined categories). Pattern recognition is used in inference engines, image processing, neural computing, and speech recognition.
question
Prediction
answer
The act of telling about the future.
question
Processing Elements (PE)
answer
A neuron in a neural network.
question
RapidMiner
answer
A popular, open-source, free-of-charge data mining software suite that employs a graphically enhanced user interface, a rather large number of algorithms, and a variety of data visualization features.
question
Ratio Data
answer
Continuous data where both differences and ratios are interpretable. The distinguishing feature of a ratio scale is the possession of a nonarbitrary zero value.
question
Regression
answer
A data mining method for real-world prediction problems where the predicted values (i.e., the output variable or dependent variable) are numeric (e.g., predicting the temperature for tomorrow as 68 degrees Fahrenheit).
question
Result (Outcome) Variable
answer
A variable that expresses the result of a decision (e.g., one concerning profit), usually one of the goals of a decision-making problem.
question
SAS Enterprise Miner
answer
A comprehensive, commercial data mining software tool developed by SAS Institute.
question
SEMMA
answer
An alternative process for data mining projects proposed by the SAS Institute. The acronym "SEMMA" stands for "sample, explore, modify, model, and assess."
question
Sentiment Analysis
answer
The technique used to detect favorable and unfavorable opinions toward specific products and services using a large number of textual data sources (e.g., customer feedback in the form of Web postings).
question
Sequence Mining
answer
A pattern discovery method where relationships among the things are examined in terms of their order of occurrence to identify associations over time.
question
Sigmoid Function
answer
An S-shaped transfer function in the range of 0 to 1.
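A minimal sketch of the standard logistic form of the sigmoid, 1 / (1 + e^(-x)). The two-branch implementation is a design choice made here to avoid floating-point overflow for large negative inputs; it is not taken from the source.

```python
import math


def sigmoid(x):
    """Logistic sigmoid: maps any real input into the range 0 to 1."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    # For negative x, use the equivalent form e^x / (1 + e^x) so exp() cannot overflow.
    z = math.exp(x)
    return z / (1.0 + z)


midpoint = sigmoid(0)   # 0.5, the center of the S-shaped curve
```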
question
Simple Split
answer
Data is partitioned into two mutually exclusive subsets called a training set and a test set (or holdout set). It is common to designate two-thirds of the data as the training set and the remaining one-third as the test set.
question
SPSS PASW Modeler
answer
A very popular, commercially available, comprehensive data, text, and Web mining software suite developed by SPSS (formerly Clementine).
question
Summation Function
answer
A mechanism to add all the inputs coming into a particular neuron.
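In most neural network models the summation is weighted: each input is multiplied by its connection weight before the values are added. The sketch below assumes that weighted form; the function name `weighted_sum` and the optional `bias` term are illustrative.

```python
def weighted_sum(inputs, weights, bias=0.0):
    """Add all inputs arriving at a neuron, each scaled by its connection weight."""
    if len(inputs) != len(weights):
        raise ValueError("each input needs exactly one connection weight")
    return sum(x * w for x, w in zip(inputs, weights)) + bias


# Hypothetical neuron with three incoming connections.
activation = weighted_sum([1.0, 2.0, 3.0], [0.5, 0.5, 1.0])   # 4.5
```

The result is the neuron's internal activation level, which a transfer function then converts into the neuron's output.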
question
Supervised Learning
answer
A method of training artificial neural networks in which sample cases are shown to the network as input, and the weights are adjusted to minimize the error in the outputs.
question
Support
answer
The measure of how often products and/or services appear together in the same transaction, that is, the proportion of transactions in the dataset that contain all of the products and/or services mentioned in a specific rule.
question
Support Vector Machines (SVM)
answer
A family of generalized linear models, which achieve a classification or regression decision based on the value of the linear combination of input features.
question
Synapse
answer
The connection (where the weights are) between processing elements in a neural network.
question
Transformation (transfer) Function
answer
In a neural network, the function that sums and transforms inputs before a neuron fires. It shows the relationship between the internal activation level and the output of a neuron.
question
Unsupervised Learning
answer
A method of training artificial neural networks in which only input stimuli are shown to the network, which is self-organizing.
question
Weka
answer
A popular, free-of-charge, open-source suite of machine-learning software written in Java, developed at the University of Waikato.