Chapter 8: Big Data, Data Warehouses, and Business Intelligence Systems

question

Big Data
answer

The term for the enormous datasets generated by Web applications such as search tools and social networks.
question

Business Intelligence (BI) systems
answer

Information systems that assist managers and other professionals in the analysis of current and past activities and in the prediction of future events. Unlike transaction processing systems, they do not support operational activities, such as the recording and processing of orders. Instead they are used to support management assessment, analysis, planning, control and ultimately, decision-making.
question

operational systems
answer

A database system in use for the operations of an enterprise, typically an OLTP system.
question

Online Transaction Processing (OLTP) system
answer

An operational database system available for, and dedicated to, transaction processing, that is, the ongoing stream of business transactions. Also known as a transactional system.
question

From where do BI systems obtain data, and how do they process the data they obtain?
answer

1. They read and process data that exist in the operational database. They use the operational DBMS to obtain the data but do not insert, modify, or delete operational data.
2. They process data extracted from operational databases. In this situation, they manage the extracted database using a BI DBMS, which may be the same as or different from the operational DBMS.
3. They read data purchased from data vendors.
question

reporting systems
answer

This type of BI system sorts, filters, groups, and makes elementary calculations on operational data. It summarizes current status, compares current status to past or predicted status, and classifies entities (customers, products, employees, etc.). Report delivery is crucial.
question

data mining applications
answer

This type of BI system performs sophisticated analyses on data, analyses that usually involve complex statistical and mathematical processing. They are used for: what-if analyses, predictions, decisions. The results are often incorporated into some other report or system.
question

Online Analytical Processing (OLAP)
answer

A technique for analyzing data values, called measures, against characteristics associated with those data values, called dimensions. It is sometimes used to ease the task of report production.
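The idea of analyzing a measure against dimensions can be sketched in a few lines of Python. This is only an illustration: the sales rows, the dimension names (region, product), and the summarize helper are all assumptions made up for the example, not part of any OLAP product.

```python
from collections import defaultdict

# Hypothetical sales rows: two dimensions (region, product) and one measure (units).
sales = [
    {"region": "East", "product": "Widget", "units": 10},
    {"region": "East", "product": "Gadget", "units": 4},
    {"region": "West", "product": "Widget", "units": 7},
    {"region": "West", "product": "Widget", "units": 3},
]

def summarize(rows, dimensions, measure):
    """Sum a measure for every combination of the given dimensions."""
    totals = defaultdict(int)
    for row in rows:
        key = tuple(row[d] for d in dimensions)
        totals[key] += row[measure]
    return dict(totals)

# One dimension gives a coarse summary; adding a second dimension
# divides the same totals into more detail.
by_region = summarize(sales, ["region"], "units")
by_region_product = summarize(sales, ["region", "product"], "units")
```

Here by_region holds one total per region, while by_region_product splits each regional total by product.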
question

Why is report delivery not as important for data mining systems as it is for reporting systems? (2 main reasons)
answer

1. Most data mining applications have only a few users, and those users have sophisticated computer skills.
2. The results of a data mining analysis are usually incorporated into some other report, analysis, or information system.
question

Why are operational data difficult to use for BI applications?
answer

1. Querying data for BI applications can place a substantial burden on the DBMS and unacceptably slow the performance of operational applications.
2. The creation and maintenance of BI systems require application programs, facilities, and expertise that are not normally available from operations.
3. Operational data have problems that limit their use for BI applications.
question

data warehouse
answer

A database system that has data, programs and personnel that specialize in the preparation of data for BI processing.
question

extract, transform and load (ETL) system
answer

This system cleans and prepares data for BI processing. Problematic operational data that have been cleaned in this system can also be used to update the operational system and fix the original data problems.
question

What are some of the problems of using operational data for BI processing?
answer

- “dirty data”: problematic data such as “G” for a value of gender, “213” for age, part of a color such as “rd”, etc.
- missing values
- inconsistent data (data that have changed, such as a customer’s phone number or address)
- nonintegrated data (data from two or more sources that need to be combined)
- incorrect format
- too much data
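A few of the problems named in this card (dirty values, missing values, incorrect format) can be illustrated with a small cleaning step of the kind an ETL system might apply. This is a minimal sketch under stated assumptions: the field names, the set of valid gender codes, and the 0 to 120 age range are hypothetical choices for the example.

```python
VALID_GENDERS = {"M", "F"}  # assumption: the only codes the warehouse accepts

def clean_record(raw):
    """Return a cleaned copy of one customer record; unusable values become None."""
    record = dict(raw)

    # Dirty data: a code such as "G" is not a known gender value.
    gender = str(record.get("gender") or "").strip().upper()
    record["gender"] = gender if gender in VALID_GENDERS else None

    # Dirty data: an age such as 213 is outside any plausible range.
    age = record.get("age")
    record["age"] = age if isinstance(age, int) and 0 <= age <= 120 else None

    # Incorrect format: keep only the digits of the phone number.
    phone = "".join(ch for ch in str(record.get("phone") or "") if ch.isdigit())
    record["phone"] = phone or None  # a missing value stays explicitly missing

    return record

cleaned = clean_record({"gender": "G", "age": 213, "phone": "(555) 010-2000"})
```

Values that cannot be trusted are set to None rather than guessed, so later BI processing can treat them as missing.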
question

data warehouse metadata database (or simply, “data warehouse”)
answer

The place where metadata concerning the data’s source, format, assumptions, constraints and other facts are kept.
question

data warehouse
answer

Can be thought of as a distributor in a supply chain. It takes data from the manufacturers (operational systems and purchased data), cleans and processes the data, and stores the data in the warehouse. The people who work in a data warehouse are experts at data management, cleaning, transformation, and the like. They are usually not experts in a given business function.
question

data mart
answer

A collection of data that is smaller than the data warehouse and addresses a particular component or functional area of the business. Users obtain data from the data warehouse that pertain to a particular business function. They do not have the expertise of data warehouse employees, but they are knowledgeable analysts for a given business function.
question

enterprise data warehouse (EDW) architecture
answer

The system design of a corporation’s BI data. In one configuration, a data warehouse maintains all enterprise BI data and acts as an authoritative source for data extracts provided to the data marts. Data marts receive all data from the data warehouse.
question

dimensional database
answer

A database design that is used for data warehouses and is designed for efficient queries and analysis. It contains a central fact table connected to one or more dimension tables.
question

What are the 3 main characteristics of a dimensional database?
answer

1. Used for unstructured analytical data processing
2. Current and historical data are used
3. Data are loaded and updated systematically, not by users
question

slowly changing dimension
answer

A dimension whose data change only rarely (for example, a customer moving to a different city or state, or changing address); such changes affect the data in a data mart.
question

star schema
answer

A model for dimensional databases in which the fact table is at the center, with dimension tables connected to it. There is usually a date or time dimension to track changes over time.
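A star schema can be sketched with Python's built-in sqlite3 module. The table and column names below (product_sales, timeline, product) and the sample rows are hypothetical, chosen only to show a central fact table joined to a time dimension and one other dimension.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables (hypothetical names and columns)
CREATE TABLE timeline (time_id INTEGER PRIMARY KEY, year INTEGER, quarter INTEGER);
CREATE TABLE product  (product_id INTEGER PRIMARY KEY, name TEXT);

-- Fact table at the center: foreign keys to each dimension plus the measures
CREATE TABLE product_sales (
    time_id    INTEGER REFERENCES timeline(time_id),
    product_id INTEGER REFERENCES product(product_id),
    quantity   INTEGER,
    total      REAL
);

INSERT INTO timeline VALUES (1, 2023, 1), (2, 2023, 2);
INSERT INTO product  VALUES (1, 'Widget'), (2, 'Gadget');
INSERT INTO product_sales VALUES (1, 1, 5, 50.0), (1, 2, 2, 40.0), (2, 1, 3, 30.0);
""")

# Analytical query: quantity sold per quarter and product.
rows = conn.execute("""
    SELECT t.quarter, p.name, SUM(s.quantity)
    FROM product_sales AS s
    JOIN timeline AS t ON t.time_id = s.time_id
    JOIN product  AS p ON p.product_id = s.product_id
    GROUP BY t.quarter, p.name
    ORDER BY t.quarter, p.name
""").fetchall()
```

Every query follows the same shape: start at the fact table and join outward to whichever dimensions the analysis needs.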
question

snowflake schema
answer

A more complex version of the star schema. In this schema, each dimension table is normalized, which may create additional tables attached to the dimension table.
question

measures (of business activities)
answer

Quantitative or factual data about the entity represented by the fact table.
question

RFM Analysis
answer

A technique that analyzes and ranks customers according to their purchasing patterns. It is a simple customer classification technique that considers how recently (R) a customer ordered, how frequently (F) a customer orders, and how much money (M) the customer spends per order.
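One way RFM scoring might be computed is sketched below. The customer data are invented, and the scoring convention (rank the customers on each factor, split them into five equal groups, score 1 for the best group through 5 for the worst) is an assumption of this example; the card itself does not specify one.

```python
# Hypothetical customers: days since last order, orders per year, average order amount.
customers = {
    "Ajax":  {"days_since_order": 3,   "orders": 40, "avg_amount": 20.0},
    "Bloom": {"days_since_order": 200, "orders": 2,  "avg_amount": 900.0},
    "Cora":  {"days_since_order": 30,  "orders": 12, "avg_amount": 150.0},
    "Dune":  {"days_since_order": 90,  "orders": 6,  "avg_amount": 75.0},
    "Elm":   {"days_since_order": 365, "orders": 1,  "avg_amount": 10.0},
}

def rank_scores(values, best_is_low):
    """Score each customer 1 (best group) to 5 (worst group) by rank."""
    ordered = sorted(values, key=values.get, reverse=not best_is_low)
    return {name: (rank * 5) // len(ordered) + 1 for rank, name in enumerate(ordered)}

def factor(field):
    return {name: data[field] for name, data in customers.items()}

r = rank_scores(factor("days_since_order"), best_is_low=True)   # recent is better
f = rank_scores(factor("orders"), best_is_low=False)            # frequent is better
m = rank_scores(factor("avg_amount"), best_is_low=False)        # larger orders are better

rfm = {name: (r[name], f[name], m[name]) for name in customers}
```

Under these assumptions Ajax scores (1, 1, 4): a recent, frequent buyer of inexpensive items, while Bloom scores (4, 4, 1): an infrequent buyer who spends heavily per order.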
question

OLAP reports
answer

An Online Analytical Processing report, which contains the results of OLAP. OLAP provides the ability to sum, count, average, and perform other simple arithmetic operations on groups of data. Also called an OLAP cube.
question

drill down (into data)
answer

To further divide data into more detail
question

server cluster
answer

A group of associated servers
question

distributed database
answer

A database that is stored and processed on more than one computer
question

partitioning
answer

One way of distributing a database which means breaking down the database into pieces and storing the pieces on multiple computers.
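Partitioning can be sketched by routing each row to one of several stores according to its key. Everything here is an assumption for illustration: the node names are hypothetical, the dictionaries stand in for separate computers, and hashing the key is only one of several possible ways to decide where a piece goes.

```python
import zlib

NODES = ["node-a", "node-b", "node-c"]  # hypothetical computers

# Each dictionary stands in for the piece of the database stored on one node.
partitions = {node: {} for node in NODES}

def node_for(key):
    """Pick a node from a stable hash of the key, so the same key always maps to the same node."""
    return NODES[zlib.crc32(key.encode("utf-8")) % len(NODES)]

def put(key, row):
    partitions[node_for(key)][key] = row

def get(key):
    return partitions[node_for(key)].get(key)

for i in range(6):
    put(f"cust-{i}", {"id": i})
```

Each row lives on exactly one node, so a lookup only has to contact the node that the key maps to.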
question

replication
answer

A means of distributing a database by storing copies of the database on multiple computers.
question

dirty read
answer

A read of data that have been changed but not yet committed to a database. Such changes may later be rolled back and removed from the database.
question

object-oriented programming (OOP)
answer

A technique for designing and writing computer programs. Today most new program development is done using OOP languages such as Java, C++, C#, and Visual Basic .NET.
question

objects
answer

Data structures that have both methods and properties
question

methods
answer

Computer programs that perform some task
question

properties
answer

Data items particular to an object
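The three terms above (objects, methods, properties) fit together as in this small Python sketch. The Customer class and its members are hypothetical names chosen for the example.

```python
class Customer:
    """An object type: it bundles properties (data) with methods (behavior)."""

    def __init__(self, name, balance):
        # Properties: data items particular to each Customer object.
        self.name = name
        self.balance = balance

    def charge(self, amount):
        """A method: program logic that performs a task using the object's properties."""
        self.balance += amount
        return self.balance

ann = Customer("Ann", 10.0)   # create an object
new_balance = ann.charge(5.0)  # call a method; it updates a property
```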
question

object persistence
answer

When using OOP, the properties of an object are created and stored in main memory. Object persistence is the term for storing the values of an object's properties permanently, so that they outlive the running program.
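A minimal sketch of object persistence, assuming a JSON file as the permanent store (a DBMS would fill that role in practice). The Order class and the save_object and load_object helpers are hypothetical names for this example.

```python
import json
import os
import tempfile

class Order:
    def __init__(self, order_id, total):
        self.order_id = order_id
        self.total = total

def save_object(obj, path):
    """Persist the object's property values to a file."""
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(vars(obj), fh)

def load_object(path):
    """Rebuild an Order in main memory from the stored property values."""
    with open(path, encoding="utf-8") as fh:
        return Order(**json.load(fh))

path = os.path.join(tempfile.mkdtemp(), "order.json")
save_object(Order(7, 19.5), path)
restored = load_object(path)
```

Only the property values are stored; the methods come from the class definition when the object is rebuilt.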
question

object-relational database
answer

Basically add-on features and functions for a DBMS product which facilitate object persistence
question

NoSQL (or Not only SQL) Database
answer

A nonrelational DBMS, typically a distributed, replicated database, used where this type of DBMS is needed to support large datasets.
question

What are the four categories of NoSQL databases?
answer

- Key-Value: Dynamo and MemcacheDB
- Document: Couchbase and MongoDB
- Column Family: Apache Cassandra and HBase
- Graph: Neo4j and AllegroGraph
question

supercolumns
answer

Columns that are grouped into sets. For example, a CustomerName supercolumn consists of a FirstName column and a LastName column, and stores the customer name as “FirstName LastName”.
question

column families
answer

The combination of columns and supercolumns that forms the database storage equivalent of RDBMS tables.
question

keyspace
answer

Where all the column families are contained; this provides the set of RowKey values that can be used in the data store.
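The nesting of keyspace, column family, RowKey, supercolumn, and column can be pictured with plain Python dictionaries. This is only a conceptual sketch: the names and values are invented, and a real column family store keeps more information per column than shown here.

```python
# keyspace -> column family -> RowKey -> columns and supercolumns
keyspace = {
    "Customer": {                      # column family (roughly, an RDBMS table)
        "C001": {                      # RowKey
            "CustomerName": {          # supercolumn: a named set of columns
                "FirstName": "Ralph",
                "LastName": "Able",
            },
            "Phone": "555-0100",       # ordinary column
        },
    },
}

def get_value(store, family, row_key, *path):
    """Walk from a column family and RowKey down to a column value."""
    value = store[family][row_key]
    for name in path:
        value = value[name]
    return value

last_name = get_value(keyspace, "Customer", "C001", "CustomerName", "LastName")
```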
question

Map Reduce Process
answer

A process used to break a large analytical task into smaller tasks, assign each smaller task to a separate computer in the cluster, gather the results of each of those tasks, and combine them into the final product of the original task.
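The shape of the process can be sketched with a word count, a common teaching example. This sketch runs on one machine: the list of text chunks is hypothetical, and each call to map_task stands in for work that a cluster would assign to a separate computer.

```python
from collections import Counter
from functools import reduce

def map_task(chunk):
    """Smaller task: count the words in one chunk of the input."""
    return Counter(chunk.split())

def reduce_results(partials):
    """Gather the partial counts and combine them into one final result."""
    return reduce(lambda left, right: left + right, partials, Counter())

chunks = ["big data big", "data warehouse", "big warehouse"]

# In a cluster, each chunk would go to a different computer.
partials = [map_task(chunk) for chunk in chunks]
total = reduce_results(partials)
```

Because each chunk is counted independently, the map step can run in parallel; only the final combination needs all the partial results together.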
question

Hadoop Distributed File System (HDFS)
answer

Apache software that provides standard file services to clustered servers so that their file systems can function as one distributed file system. It was originally part of Cassandra, but the project spun off a nonrelational data store of its own called HBase and a query language named Pig.
