Data Warehousing Essay
Data warehousing is a term commonly credited to Bill Inmon, a pioneer of data management systems, who coined it in 1990. It refers to a collection of data used to support management decision making. A data warehouse is defined by four attributes: it is objective (subject-oriented), stable (non-volatile), time-variant, and integrated into corporate systems.
The data is objective in that it addresses the specific activities of an organization. It is time-variant in that it describes the organization over a specific time frame. It is stable in that stored data does not change often: data is added from time to time, without casual or unplanned removal (Chen, 2002, p. 44). This gives managers a consistent picture of the organization. Sometimes the volume of data grows so large that storage becomes a challenge. In that situation, only the data relating to specific periods in the organization's history is provided.
A warehouse that holds data on a single subject is known as a data mart. The data is collected from the different transactions the organization undertakes in its operations. The required information is extracted and assembled logically to form a database. Data warehousing covers the tools, techniques, and methodologies used to build, use, and maintain the data warehouse, including its programs, hardware, and the data itself. The main uses of a data warehouse are web mining, data mining, and decision support systems (Berson & Smith, 1997, p. 31).
Working of data warehousing
Data warehouses are built primarily to enhance analysis and reporting. They usually have a specific organizational structure, called the data warehouse architecture. The reliability of an architecture depends on how well its framework supports constructing, maintaining, and using the data warehouse. The simplest architecture is made up of several interrelated layers. The first is the operational database layer. This layer supplies the raw data from the organization's operations, such as manufacturing, finance, projects, personnel, customer relationship management, and supply chain management systems.
Such information usually resides in an enterprise resource planning (ERP) system, an information system structured to coordinate all the resources needed to complete organizational transactions. ERP is anchored on a common database accessed through modular software, which lets every department of an organization store and retrieve data at any time (Orla, 1996, p. 45).
The second is the information access layer. It consists of the data that is accessible for reporting and analysis, together with the tools used to analyze the raw facts and report on them. Business intelligence falls in this category. Also called decision support, it covers the techniques, applications, and practices used to gather, analyze, present, and integrate an organization's information. It gives a view of the organization's past, current, and anticipated operations (Berson & Smith, 1997, p. 54).
Thirdly, a data warehouse includes the data access layer.
It is the point of linkage between the information access layer and the operational layer. It includes the tools used to get the data from the source, change it, and put it in the warehouse. Technically, this process is called extract, transform, and load (ETL). Data is normally extracted from different outside sources and then consolidated under one program.
Each source may use a different data format. The most common source formats are flat files and relational databases, though sources sometimes include non-relational structures such as Virtual Storage Access Method (VSAM) and Information Management System (IMS). Extraction may also involve getting data from outside sources by means such as screen scraping or web spidering. Extraction converts the data into a format ready for the transformation process. Transformation consists of a chain of functions applied to the extracted data to produce data that can be stored in the data warehouse.
The transformation process makes the extracted data suitable for the needs of the organization. The transformed data is then loaded into the data warehouse, which is conventionally the final target. The loading phase stores the transformed data in the warehouse, and it varies widely depending on the needs of the organization. In some organizations, the existing information in the data warehouse is overwritten at a set interval, such as weekly or monthly, with the most current or cumulative data.
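As a rough illustration of the extract, transform, and load steps described above, the sketch below uses only the Python standard library. The sample CSV text, the `sales` table, and the function names `extract`, `transform`, and `load` are all hypothetical, chosen for illustration; a real warehouse would read from actual operational sources and load into a dedicated database. The `mode` argument shows the difference between overwriting the warehouse contents and adding to them.

```python
import csv
import io
import sqlite3

# Hypothetical flat-file export from an operational sales system (assumed data).
RAW_CSV = """order_id,order_date,region,amount
1,2024-01-05, north ,120.50
2,2024-01-06,SOUTH,80.00
3,2024-01-06,north,not_available
"""


def extract(text):
    """Extract: read raw rows from a flat-file source."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: tidy region names, parse amounts, drop unusable rows."""
    clean = []
    for row in rows:
        try:
            amount = float(row["amount"])
        except ValueError:
            continue  # skip rows whose amount cannot be parsed
        clean.append((int(row["order_id"]), row["order_date"],
                      row["region"].strip().lower(), amount))
    return clean


def load(conn, rows, mode="append"):
    """Load: store transformed rows; mode='overwrite' replaces existing data."""
    conn.execute("CREATE TABLE IF NOT EXISTS sales "
                 "(order_id INTEGER, order_date TEXT, region TEXT, amount REAL)")
    if mode == "overwrite":
        conn.execute("DELETE FROM sales")
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?, ?)", rows)
    conn.commit()


# An in-memory SQLite database stands in for the warehouse (an assumption).
conn = sqlite3.connect(":memory:")
load(conn, transform(extract(RAW_CSV)))
loaded = conn.execute(
    "SELECT region, amount FROM sales ORDER BY order_id").fetchall()
print(loaded)  # [('north', 120.5), ('south', 80.0)]
```

Calling `load(..., mode="overwrite")` on a later run would replace the stored rows, while the default `mode="append"` keeps them and adds the new ones.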
Other organizations add new data on a historical basis, keeping what is already stored. The timing and scope of loading depend on the time available and on the organization's requirements (Orla, 1996, p. 38).
The fourth layer is the metadata layer, also called the data dictionary.
Most of the time, the metadata layer is more detailed than the data dictionary of an operational system. Some dictionaries cover the whole data warehouse, while others are aimed at specific reporting and analysis tools that can access them.
Benefits
There are two major approaches to storing data in a data warehouse. The first is the dimensional approach.
In this approach, transaction data is divided into facts, which are generally simple numeric transaction data, and dimensions, which are the reference information that gives context to the facts. The main benefits of the dimensional approach are that the information in the warehouse is easy for users to understand and simple to use, and that data is retrieved quickly and conveniently. Its demerits are that preserving the integrity of the facts and dimensions when loading data from different sources is complicated, and that the warehouse design is hard to change. The second problem mainly arises when an organization using this approach shifts to a new technology or way of performing its activities (Nong, 2003, p. 43).
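The split between facts and dimensions can be made concrete with a small sketch. The table names `fact_sales` and `dim_product`, their columns, and the sample rows are assumptions made up for illustration, and an in-memory SQLite database stands in for a real warehouse.

```python
import sqlite3

# In-memory SQLite database standing in for the warehouse (an assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Dimension table: reference information that gives context to the facts.
    CREATE TABLE dim_product (
        product_id INTEGER PRIMARY KEY,
        name       TEXT,
        category   TEXT
    );
    -- Fact table: numeric transaction data, keyed to the dimension.
    CREATE TABLE fact_sales (
        sale_id    INTEGER PRIMARY KEY,
        product_id INTEGER REFERENCES dim_product(product_id),
        quantity   INTEGER,
        revenue    REAL
    );
    -- Hypothetical sample rows.
    INSERT INTO dim_product VALUES (1, 'Widget', 'Hardware'), (2, 'Manual', 'Books');
    INSERT INTO fact_sales VALUES
        (1, 1, 3, 30.0),
        (2, 1, 1, 10.0),
        (3, 2, 2, 50.0);
""")

# A typical analytical query: aggregate the facts, grouped by a dimension attribute.
revenue_by_category = conn.execute("""
    SELECT d.category, SUM(f.revenue)
    FROM fact_sales AS f
    JOIN dim_product AS d ON d.product_id = f.product_id
    GROUP BY d.category
    ORDER BY d.category
""").fetchall()
print(revenue_by_category)  # [('Books', 50.0), ('Hardware', 40.0)]
```

A single join from the fact table to one dimension answers the question, which is why this layout tends to be easy for users to understand and query.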
The second approach is known as the normalized approach. Here, the data is loaded into the data warehouse following, to some extent, database normalization rules. Tables are grouped by subject areas that reflect general data categories. The key merit of the normalized approach is that adding information to the data warehouse is very easy. Its demerit, on the other hand, is that it is hard for users to combine data obtained from various sources into meaningful information.
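The trade-off in the normalized approach can be seen in a minimal sketch. The tables `customer`, `product`, `orders`, and `order_line`, their columns, and the sample rows are hypothetical, and an in-memory SQLite database is assumed in place of a real warehouse.

```python
import sqlite3

# In-memory SQLite database standing in for the warehouse (an assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Each subject area gets its own table; no attribute is stored twice.
    CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT, city TEXT);
    CREATE TABLE product  (product_id  INTEGER PRIMARY KEY, name TEXT, unit_price REAL);
    CREATE TABLE orders   (order_id    INTEGER PRIMARY KEY,
                           customer_id INTEGER REFERENCES customer(customer_id));
    CREATE TABLE order_line (order_id   INTEGER REFERENCES orders(order_id),
                             product_id INTEGER REFERENCES product(product_id),
                             quantity   INTEGER);
    -- Hypothetical sample rows.
    INSERT INTO customer VALUES (1, 'Acme', 'Boston'), (2, 'Globex', 'Denver');
    INSERT INTO product  VALUES (1, 'Widget', 10.0), (2, 'Manual', 25.0);
    INSERT INTO orders   VALUES (100, 1), (101, 2);
    INSERT INTO order_line VALUES (100, 1, 3), (100, 2, 1), (101, 1, 2);
""")

# Adding new information is a single-row insert into one table.
conn.execute("INSERT INTO customer VALUES (3, 'Initech', 'Austin')")

# Answering a business question, however, means joining several tables.
revenue_by_city = conn.execute("""
    SELECT c.city, SUM(l.quantity * p.unit_price)
    FROM order_line AS l
    JOIN orders   AS o ON o.order_id    = l.order_id
    JOIN customer AS c ON c.customer_id = o.customer_id
    JOIN product  AS p ON p.product_id  = l.product_id
    GROUP BY c.city
    ORDER BY c.city
""").fetchall()
print(revenue_by_city)  # [('Boston', 55.0), ('Denver', 20.0)]
```

The insert touches one table, while the report needs a four-table join, which reflects both the merit and the demerit described above.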
Additionally, it is difficult for users to retrieve the data if they do not understand the data sources and the data design of the warehouse (Chen, 2002, p. 71).
History of data warehousing
The history of data warehousing can be traced back to the 1960s, when General Mills and Dartmouth College undertook a joint research project and came up with the terms facts and dimensions, which are used in data loading. In the 1970s, ACNielsen and IRI provided dimensional data marts for retail sales.
In 1983, Teradata introduced a database management system specifically designed for decision support. The data warehouse concept took shape in the late 1980s. In 1988, two IBM researchers, Barry Devlin and Paul Murphy, developed the concept of the business data warehouse. The idea aimed to provide a structured model for the flow of information from operational systems to management for informed decision making. It tried to solve various problems related to gaps in the flow of information within organizations, chiefly the high costs they imposed on decision making (Berson & Smith, 1997, p. 63). At the time, it was clear that without a data warehouse, a large amount of information had to be assembled to support decision making in each of the business environments that existed. Within the next two years, Red Brick Systems introduced Red Brick Warehouse, a database management system tailored specifically for data warehousing. In 1991, Prism Solutions introduced Prism Warehouse Manager, software designed for developing a data warehouse. In the same year, Bill Inmon published the book 'Building the Data Warehouse', which set out to teach people how to develop data warehouses. In 1995, The Data Warehousing Institute was founded.
It was a for-profit organization that aimed to promote data warehousing. In 1996, amid increasing demand for data warehousing, Ralph Kimball published another book on the subject, 'The Data Warehouse Toolkit'. In 1997, Oracle 8 was released with support for star queries (Nong, 2003, p. 57).
The evolution of data warehouse use within organizations is important to consider when introducing a data warehousing system to any organization. Many organizations begin with simple uses of the data warehouse that develop over time into more advanced ones. The evolution involves four stages. The first is the offline operational database. At this stage, the warehouse is developed by copying the data from an existing database management system to a different server.
The second stage is the offline data warehouse. At this stage, the warehouse is updated regularly with data from the operational systems. The third stage is the real-time data warehouse. Here, the warehouse is updated every time an operational system performs a transaction. The fourth stage is the integrated data warehouse. Besides being updated after every transaction, the warehouse generates transactions that are passed back to the operational systems (Nong, 2003, p. 49).
Disadvantages
While the data warehouse has proved useful, it has some disadvantages. First, data warehouses are expensive to run and have very high maintenance costs. Second, they become outdated within a short period of time, and updating them is very costly. Third, if the warehouse contains sensitive information, access may be restricted to a few people to safeguard the security of the data.
This limitation on access to the data may have adverse effects on the organization (Orla, 1996, p. 34).
Advantages
The benefits of data warehousing are broad, and most of them come down to the efficiency the process brings. First, a data warehouse offers a common data store that provides all the information needed, irrespective of the source of the data. This makes reporting and analysis more efficient and effective than retrieving data through multiple data models.
Second, it offers an opportunity to clean the data before it is loaded into the warehouse, so that inconsistencies are identified and corrected in advance. This greatly simplifies analysis and reporting. Third, the data in the warehouse is stored safely for a long period of time under the management of data warehouse operators, who make sure there is limited interference from outside. Fourth, since data warehouses are independent of operational systems, information can be retrieved without slowing down the operational systems. Additionally, they facilitate customer relationship management (CRM) systems.
Data warehouses also enhance the decision support applications that managers rely on, such as reports on the operations of the organization, reports on performance trends, and reports on strategies for achieving its goals (Chen, 2002, p. 87).
In general, therefore, efficient decision making in today's competitive and ever-changing world can be attained only through data warehousing, on which informed decisions are based. As a tool for management, data warehousing provides the rationale for strong information exchange and storage within the organization.
Reference
Berson Alex & Smith Stephen (1997) Data Warehousing, Data Mining and OLAP. New York, McGraw-Hill, pp. 31, 54, 63
Chen Zhengxin (2002) Intelligent Data Warehousing: From Data Preparation to Data Mining. New York, CRC Press, pp. 44, 71, 87
Nong Ye (2003) The Handbook of Data Mining. New York, Lawrence Erlbaum Associates, pp. 43, 49, 57
Orla O’Sullivan (1996) Data Warehousing – without the Warehouse. ABA Banking Journal, Vol. 88, pp. 34, 38, 47