Chapter 8: Big Data, Data Warehouses, and Business Intelligence Systems – Flashcards

Unlock all answers in this set

Unlock answers
question
Big Data
answer
The term for the enormous datasets generated by the Web applications such as search tools and social networks.
question
Business Intelligence (BI) systems
answer
Information systems that assist managers and other professionals in the analysis of current and past activities and in the prediction of future events. Unlike transaction processing systems, they do not support operational activities, such as the recording and processing of orders. Instead they are used to support management assessment, analysis, planning, control and ultimately, decision-making.
question
operational systems
answer
A database system in use for the operations of an enterprise, typically an OLTP system.
question
Online Transaction processing (OLTP) System
answer
An operational database system available for, and dedicated to, transaction processing or the ongoing stream of businesses transactions. Also known as transactional system.
question
1. Read and process data existing in the operational database but they use the operational DBMS to obtain this data, but do not insert, modify or delete operational data 2. Process data extracted from operational databases, in this situation, they manage the extracted database using a BI DBMS which may be the same as or different from the operational DBMS 3. BI systems read data purchased from data vendors
answer
From where do BI systems obtain data? And how do they process the obtained data?
question
reporting systems
answer
This type of BI system sorts, filters, groups and makes elementary calculations on operational data. Summarizes current status to past or predicted status. Classify entities (customers, products, employees, etc.) and report delivery crucial.
question
data mining applications
answer
This type of BI system performs sophisticated analyses on data, analyses that usually involve complex statistical and mathematical processing. They are used for: what-if analyses, predictions, decisions. The results are often incorporated into some other report or system.
question
Online Analytical Processing (OLAP)
answer
A technique for analyzing data values, called measures, aganist characteristics associated with those data values, called dimensions. They are sometimes used to ease the task of report production.
question
1. Most data mining applications have only a few users and those users have sophisticated computer skills 2. The results of a data mining analysis are usually incorporated into some other report, analysis or information system
answer
Why is report delivery not as important for data mining systems as it is for reporting systems? (2 main things)
question
1. Querying data for BI applications can place a substantial burden on the DBMS and unacceptably slow the performance of operational applications 2. The creation and maintenance of BI systems require application programs, facilities and expertise that are not normally available from operations 3. Operational data have problems that limit their use for BI applications
answer
Why is operational data difficult to use?
question
data warehouse
answer
A database system that has data, programs and personnel that specialize in the preparation of data for BI processing.
question
extract, transform and load (ETL) system
answer
This system cleans and prepares data for BI processing. Problematic operational data that may have been cleaned in this system can also be used to update the operational system to fix the original data problems.
question
-"dirty data" problematic data such as "G" for a alue of gender, "213" for age, part of a color "rd", etc. -missing values -inconsistent data (data that has changed, such as a customer's phone number or address) -nonintegrated data (data from two or more sources that need to be combined) -incorrect format -too much data
answer
What are some of the problems of using operational data for BI processing?
question
data warehouse metadata database (or simply, "data warehouse")
answer
The place where metadata concerning the data's source, format, assumptions, constraints and other facts are kept.
question
data warehouse
answer
Can be thought of like a distributor in a supply chain. Takes data from the manufacturers (operational systems and purchased data), cleans and processes them and locates that data. The people who work in a data warehouse are experts at data mgmt, cleaning, transformation and the like. They are usually not experts at a given business function.
question
data mart
answer
A collection of data that is smaller than the data warehouse and addresses a particularly component or functional area of the business. Users obtain data from the data warehouse that pertain to a particular business function. They do not have the expertise of data warehouse employees, but they are knowledgable analysts for a given business function.
question
enterprise data warehouse (EDW) architecture
answer
The system design of a corporation's BI data. In one configuration, a data warehouse maintains all enterprise BI data and acts as an authoritative source for data extracts provided to the data marts. Data marts receive all data from the data warehouse.
question
dimensional database
answer
A database design that is used for data warehouses and is designed for efficient queries and analysis. It contains a central fact table connected to one or more dimension tables.
question
1. Used for unstructured analytical data processing 2. Current and historical data are used 3. Data are loaded and updated systematically, not by users
answer
What are the 3 main characteristics of a dimensional database?
question
slowly changing dimension
answer
Rare changes in data information (such as a customer moving to a different city or state, or changed address) that would affect the data in a data mart
question
star schema
answer
Model for dimensional databases where fact table is the center with connected dimension tables, there is usually a date or time dimension to track changes ver time.
question
snowflake schema
answer
A more complex version of the star schema. In this schema, each dimension of the table is normalized, which may created additional tables attached to the dimension table.
question
measures (of business activities)
answer
Quantitative or factual data about the entity represented by the fact table.
question
RFM Analysis
answer
A data collection and processing analyzes and ranks customers according to their purchasing patterns. It is a simple customer classification technique that considers how recently (R) a customer orders, how frequently (F) a customer orders, and how much money (M) the customer spends per order.
question
OLAP reports
answer
Online Analytical Processing report, which usually contains the results of OLAP, which provides the ability to sum, count, average and perform other simple artithmetic operations on groups of data. Also called an OLAP cube.
question
drill down (into data)
answer
To further divide data into more detail
question
server cluster
answer
A group of associated servers
question
distributed database
answer
A database that is stored and processed on more than one computer
question
partitioning
answer
One way of distributing a database which means breaking down the database into pieces and storing the pieces on multiple computers.
question
replication
answer
A means of distributing a database by storing copies of the database on multiple computers;
question
dirty read
answer
A read of data that have been changed but not yet committed to a database. Such changes may later be rolled back and removed from the database.
question
object-oriented programming (OOP)
answer
A technique for designing and writing computer programs. Today most new program development is done using languages Java, C++, C# and Visual Basic.NET
question
objects
answer
Data structures that have both methods and properties
question
methods
answer
Computer programs that perform some task
question
properties
answer
Data items particular to an object
question
object persistence
answer
When using an OOP, the properties of the object are created and stored in main memory. This is the term for storing the values of properties of an object.
question
object-relational database
answer
Basically add-on features and functions for a DBMS product which facilitate object persistence
question
NoSQL (or Not only SQL) Database
answer
A distributed, replicated database used where this type of DBMS is needed to support large datasets.
question
-Key-Value--Dynamo and MemcacheDB -Document--Couchbase and MongoDB -Column Family- Apache Cassandra and HBase -Graph--Neo4J and AllegroGraph
answer
What are the Four Categories of NoSQL Databases?
question
supercolumns
answer
When columns are grouped into sets , i.e Customer Name ______ consists of a FirstName column and a LastName column and which stores the CustomerName "FirstName LastName"
question
column families
answer
The combination of columns and supercolumns that result to form the database storage equivalent of RDBMS tables
question
keyspace
answer
Where all the column families are contained; this provides the set of RowKey values that can be used in the data store.
question
Map Reduce Process
answer
Process used to break a large analytical task into smaller tasks, assign each smaller task to a separate computer in the cluster, gather the results of each of those tasks and combine them in the final product of the original tasks.
question
Hadoop Distributed File System (HDFS)
answer
An Apache software which provides standard file services to clustered servers so that their file systems can function as one distributed file system. It was originally part of Casandra but the project spun off a nonrelational datastore of its own called HBase and query language named Pig.
Get an explanation on any task
Get unstuck with the help of our AI assistant in seconds
New