Data Warehousing And Data Mining Essay Example

  • Published: April 22, 2017
  • Type: Research Paper

Patterns of Data Mart Development

In the beginning, there were only the islands of information: the operational data stores and legacy systems that required enterprise-wide integration; and the data warehouse: the solution to the problem of integrating diverse and often redundant corporate information assets. Data marts were not initially part of the vision. It soon became apparent, however, that the vision was too ambitious. Many organizations found it too challenging, expensive, politically sensitive, and time-consuming to implement a data warehouse directly.

A data mart is a subset of an enterprise's data that focuses on specific functions or activities. It serves as a decision support system with specific business-related purposes, such as measuring the impact of marketing promotions, forecasting sales performance, analyzing the impact of new product introductions on profits, or evaluating the performance of a new company division. Although data marts can incorporate substantial data, even hundreds of gigabytes, they contain less information than a data warehouse developed for the same company.

Furthermore, as data marts have a narrower focus on specific business purposes, the processes of system planning and requirements analysis become more manageable. This results in lower costs for design, implementation, testing, and installation compared to data warehouses. In summary, data marts can be completed within a few months and at a cost of hundreds of thousands, rather than millions, of dollars. As a result, they can be accommodated within divisional or departmental budgets rather than relying on enterprise-level funding. This raises the issue of project justification and politics.

There are three reasons why data marts are politically easier to navigate. Firstly, because they are cheaper and often do not require accessing organization-wide budgets, they are less likely to cause conflicts between departments. Secondly, because they can be completed quickly, they can swiftly demonstrate successful models and gain support from corporate constituents. Thirdly, as data marts serve specific functions for a division or department that fall under its recognized responsibilities, their political justification remains straightforward. It is evident that managers should have access to the best decision support available as long as it remains affordable for their business unit and the technology meets the requirements.

For the first time in computing history, it is possible for conditions to exist which allow for the development of Decision Support System (DSS) applications. Due to this, data marts have become a popular alternative to data warehouses. However, as data marts have gained popularity, three distinct patterns or models of data mart development have emerged. The first model views data marts as subsets of the data warehouse, located on more affordable computing platforms closer to the end-user. These subsets are often aggregated and regularly updated from the central data warehouse. In this model, the data warehouse serves as the parent of the data mart. The second pattern of development rejects the idea of the data warehouse being the primary source and instead sees data marts as derived independently from existing islands of information that predate both data warehouses and data marts.

In this second pattern, the data mart uses data warehousing techniques and tools to organize its data. It is essentially a smaller version of a data warehouse with a specific business function. Here the data warehouse is derived from multiple data marts, which gives the two an evolutionary relationship.

The third pattern of development aims to combine and resolve the conflict present in the first two patterns. In this approach, data marts are created simultaneously with the data warehouse. While both stem from isolated pockets of information, data marts do not have to wait for the implementation of the data warehouse. Instead, each data mart is guided by the enterprise data model established for the data warehouse and is developed in accordance with this model. As a result, the data marts can be completed rapidly and can be adjusted later once the enterprise data warehouse is finalized.

These three patterns of data mart development share a common perspective that does not explicitly acknowledge the importance of user feedback during the development process. Each perspective assumes that the relationship between data warehouses and data marts remains relatively static. One perspective sees the data mart as a subset of the data warehouse, another sees the data warehouse as a result of the data mart, and the third sees parallel development where the data mart is guided by the data warehouse's data model and eventually replaced by it to address the islands of information issue. Regardless of the perspective taken, the role of users in shaping the dynamics between data warehouses and data marts is disregarded.

This white paper focuses on the dynamics discussed above. It begins by providing a more detailed explanation of the original three models. Following that, it presents three alternative models that take into account user feedback in the development of data warehouses and data marts. Lastly, it evaluates the usefulness of the six development patterns in relation to a specific perspective on organizational reality.

The top-down model, depicted in Figure One, is one of the development models discussed.

The data warehouse is created by extracting, transforming, and transporting (ETT) information islands. It combines all data in a unified format and software environment. The goal is to consolidate an organization's data resources in the data warehouse. Decision support data is stored in the data warehouse. Once implemented, there is no longer a need for consolidation. The remaining tasks are distributing the data to consumers and presenting it as information for them.

The purpose of data marts is to present specialized subsets of the data warehouse to meet the specific needs of consumers. They also assist in organizing the data for it to be transformed into information. Additionally, they provide an interface for front-end reporting and analysis tools, which ultimately enable business intelligence to be generated. It is important to note that the relationship between data marts and the data warehouse is strictly one-way. Data marts are created from the data warehouse and can only contain what the data warehouse contains. The fulfillment of information needs by data marts is limited to what the data warehouse can fulfill.
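The one-way dependency described above can be illustrated with a minimal Python sketch. It is not taken from the white paper: the `WAREHOUSE` rows, the column names, and the `build_data_mart` function are all hypothetical, and a real data mart would be built with database tooling rather than in-memory dictionaries. The sketch only shows that a mart derived top-down is an aggregated subset and cannot contain an attribute the warehouse lacks.

```python
from collections import defaultdict

# Hypothetical warehouse fact rows; the column names are illustrative only.
WAREHOUSE = [
    {"division": "retail", "region": "east", "month": "2001-01", "sales": 120.0},
    {"division": "retail", "region": "east", "month": "2001-02", "sales": 80.0},
    {"division": "retail", "region": "west", "month": "2001-01", "sales": 50.0},
    {"division": "wholesale", "region": "east", "month": "2001-01", "sales": 300.0},
]


def build_data_mart(warehouse, division, group_by, measure):
    """Derive an aggregated, division-specific subset of the warehouse."""
    available = set().union(*(row.keys() for row in warehouse))
    missing = {group_by, measure} - available
    if missing:
        # One-way dependency: the mart cannot hold what the warehouse lacks.
        raise KeyError(f"not in warehouse: {sorted(missing)}")
    totals = defaultdict(float)
    for row in warehouse:
        if row["division"] == division:
            totals[row[group_by]] += row[measure]
    return dict(totals)


# Sales by region for the retail division only.
retail_mart = build_data_mart(WAREHOUSE, "retail", "region", "sales")
```

Asking this sketch for a measure such as `"promotion_cost"`, which the hypothetical warehouse does not carry, raises `KeyError`. That is the situation in which, under the top-down model, users would have to go back to the warehouse managers.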

The data warehouse is responsible for storing all the necessary data for decision support within an enterprise. If users discover any needs that the data warehouse does not meet, they must engage the enterprise's data warehouse managers to modify the warehouse structure and add or modify the necessary data. The model does not provide an explanation of how changing user needs are recognized and fulfilled. However, it is inconsistent with the model to assume that data marts can fulfill changing needs without first making changes to the data warehouse.

The Bottom Up Model, depicted in Figure Two, illustrates the bottom-up development pattern.

The diagram labeled Figure Two shows that data marts are created using existing islands of information. The data warehouse is then constructed from these data marts. In this model, the data marts are designed and implemented independently, meaning they are not related to each other by design. This type of growth is likely to result in redundancy and important information gaps from an enterprise perspective. Each data mart integrates islands of information specifically for its own function, but this integration is limited to the business function that sponsors the data mart. From the enterprise perspective, this process creates new legacy systems, which become new islands of information.

The new islands represent progress in that they employ updated technology. However, they are no more integrated and coherent than the old islands were, nor are they any more capable of supporting enterprise-wide functions. The right-hand side of Figure Two shows the data mart islands, which serve as the foundation for an integrated data warehouse. A further ETT process supports this integration: it removes redundancy across the data marts, identifies gaps left by the isolated data mart creation process, and integrates the old islands of information into the new data warehouse to address these gaps.
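As a rough illustration of what the integrating ETT step has to detect, the following Python sketch compares the attribute sets of independently built data marts. The mart names, attribute names, and the `compare_marts` function are assumptions made for this example only. The white paper does not prescribe a method, and real schema reconciliation also involves naming conflicts and semantics that a set comparison cannot capture.

```python
def compare_marts(mart_schemas, enterprise_needs):
    """Return (redundant, gaps) for independently built data marts.

    redundant maps each attribute held by more than one mart to those marts;
    gaps is the set of needed attributes that no mart provides.
    """
    owners = {}
    for mart, attributes in mart_schemas.items():
        for attribute in attributes:
            owners.setdefault(attribute, []).append(mart)
    redundant = {a: sorted(m) for a, m in owners.items() if len(m) > 1}
    gaps = set(enterprise_needs) - set(owners)
    return redundant, gaps


# Hypothetical schemas for two marts built without reference to each other.
MART_SCHEMAS = {
    "marketing": {"customer_id", "region", "promotion_id"},
    "sales": {"customer_id", "region", "order_total"},
}
# Hypothetical list of attributes the enterprise as a whole requires.
ENTERPRISE_NEEDS = {
    "customer_id", "region", "order_total", "promotion_id", "supplier_id",
}

redundant, gaps = compare_marts(MART_SCHEMAS, ENTERPRISE_NEEDS)
```

In this toy case `customer_id` and `region` are stored redundantly by both marts, while `supplier_id` is a gap that would have to be filled from the old islands of information.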

This model does not consider the possibility of using older islands of information. It incorrectly assumes that the flow from data marts to the data warehouse will be enough to create a comprehensive data warehouse for the enterprise.
The second model is not clear on what happens after the data warehouse is built. Will the data warehouse become the parent of the data marts and follow a top-down approach? Or will the data warehouse still be subordinate to the evolving data marts, leading to periodic adjustments in the enterprise data warehouse to align with the changed data marts? These questions are left unanswered in the second model, which only focuses on the creation of the data warehouse.

Among the first three models, parallel development is the most popular pattern of development.

The parallel model acknowledges that the independence of the data marts is limited in two ways. First, the development of the data marts must adhere to a data warehouse data model that reflects the perspective of the entire enterprise. This same data model will serve as the basis for ongoing development of the data warehouse, ensuring that the data marts and the data warehouse can be compared and that any gaps or redundancies in information will be anticipated and documented as the construction of the data marts progresses. During this process, however, the data marts will retain a significant level of autonomy. In fact, as the data marts evolve, insights may be gained that will lead to modifications in the enterprise data warehouse model, changes that may benefit other data marts being created as well as the data warehouse itself.

Second, the independence of data marts is treated as a necessary but temporary step toward the construction of a data warehouse. Once that goal is achieved, the warehouse will replace the data marts, which will become subsets of the fully integrated warehouse. From then on, the data warehouse will supply established data marts, create subsets for new data marts, and determine the course of data mart creation and evolution. The third pattern addresses some of the complexities in the relationship between the data warehouse and data marts.

The parallel view acknowledges the importance of data marts for providing decision support to organizational departments and divisions in the short-term. It recognizes that data warehouse development projects may take time, so data marts are necessary and valuable applications for organizations to pursue. The parallel view also sees the data marts as contributing to the overall data warehouse by influencing evolution in the enterprise data model. Unlike the second pattern, the parallel view does not allow for unchecked growth in data marts. The contents of data marts should be determined based on the enterprise wide data model.

Redundancies and information gaps will be meticulously monitored. The enterprise data model will track the activities and achievements of data mart projects and be adjusted accordingly. In the parallel development perspective, data mart activities will aid in integrating fragmented information within the data warehouse by forming integrated islands of information aligned with the overall plan established by the enterprise data model. Eventually, these islands will be assimilated into the comprehensive integration of the completed data warehouse.

The third view still remains challenging as it relies on the quick advancement of the enterprise data warehouse model. Decision support consumers are not willing to wait.

When organizations have budgets and can support the creation of data marts, the wait for a data model is considerably shorter than the wait for a full-blown data warehouse. However, in large organizations, the JAD sessions and requirements analyses that precede data model development can take many months. This process must be done carefully. For the enterprise data model to effectively guide data mart development, it must comprehensively address all data needs. Any time the enterprise model fails to identify a necessary table or attribute for a data mart, it loses some legitimacy and reinforces the belief that waiting for the enterprise data warehouse model was not worthwhile and that adjusting it will also not be worthwhile. Additionally, the parallel view assumes that once the data warehouse is built, the data marts will become subsets of the warehouse rather than independent entities.

The assumption that parallel development will end and the data warehouse will meet everyone's needs is flawed. It assumes a centralized approach that is no longer relevant in large enterprises. The first three patterns of development lack consideration for continuous user feedback on data mart and data warehouse activities. While user requirements are considered in the construction of data marts or warehouses, they are dynamic and evolve with exposure to new applications and technologies. Additionally, changes in requirements extend beyond hardware, data mining techniques, database software, and GUI interfaces.

Changes in information and data requirements may involve adding new attributes and tables to data warehouses and data marts, as well as reorganizing existing ones. These new requirements can affect data models at both the data mart and data warehouse levels. How they are handled will depend on the nature and amount of feedback from users. Three further patterns of development take user feedback into account: top-down with feedback, local management with feedback, and a combination of both. Consider first the development model of a pioneer organization that implemented a data warehouse before creating any data marts.

Suppose the requirements analysis process was done carefully, and the enterprise data warehouse now contains all of the data and conceptual domains suggested or implied by that process. You are now tasked with developing an application to measure the performance of your department over the past three years and forecast it three months into the future. What information does the enterprise data warehouse need to include in order for you to complete this assignment? Certainly, it needs to include indicators that track performance outcomes, such as changes in sales, profits, and costs.

Some causal variables may be included in the data warehouse, but it is unlikely that all attributes needed will be provided unless the creation of the data warehouse identified all domains and attributes within a comprehensive conceptual framework. This framework should encompass concepts and attributes from all relevant causal models related to the department's performance. Additionally, the data warehouse is not likely to be constructed with a causal modeling perspective unless there was consideration for a data mart during the requirements analysis for the data warehouse. The department's representative would not have had a reason to think in those terms or undergo preparation necessary for such thinking during the data warehousing JAD sessions or other requirements gathering tasks.

Your representative likely provided essential facts and analytical hierarchies to the data warehouse team, such as company organization, geography, time, and product hierarchies. However, the complete set of causal dimensions necessary to measure performance, distinguish it from accident, and separate it from positive or negative outcomes is probably missing. Now, you have been assigned a task that requires some causal modeling. So, what should you do?

My suggestion is to create a data mart by selecting a subset of the data warehouse. However, data gathering doesn't stop there. You can either gather data yourself if your department supports it or obtain data from external services that offer relevant data for your specific problem. If you can do either of these, you can then supplement the subset of the data warehouse with the new data, undergo a new albeit limited ETT process, and construct a data mart that will be suitable for your analysis task.
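A minimal Python sketch of this supplementing step follows. The row layouts, the `competitor_price_index` attribute, and the `supplement` function are hypothetical and stand in for whatever causal variables a department might gather or buy. The "limited ETT process" is reduced here to a simple left join on a shared key, which is an assumption of the example rather than something the paper specifies.

```python
def supplement(warehouse_subset, external_rows, key):
    """Left-join external attributes onto warehouse rows by a shared key."""
    external_by_key = {row[key]: row for row in external_rows}
    mart = []
    for row in warehouse_subset:
        extra = external_by_key.get(row[key], {})
        # Warehouse values win if the same attribute name appears in both.
        mart.append({**extra, **row})
    return mart


# Hypothetical subset selected from the enterprise data warehouse.
WAREHOUSE_SUBSET = [
    {"month": "2001-01", "sales": 170.0},
    {"month": "2001-02", "sales": 80.0},
]
# Hypothetical causal variable obtained from an external data service.
EXTERNAL = [
    {"month": "2001-01", "competitor_price_index": 1.04},
]

mart = supplement(WAREHOUSE_SUBSET, EXTERNAL, "month")
```

The resulting mart now holds an attribute that the warehouse does not, which is exactly the point at which it grows beyond the warehouse's boundaries.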

The requirements from your boss have caused changes in the data mart, causing it to expand beyond the corporate data warehouse's boundaries. However, not exceeding these boundaries would negatively impact your departmental function and ultimately your job and your boss's job. The integrity of the enterprise data warehouse is not as important as performing your departmental function. This is the initial stage of user feedback as shown in Figure Four. The second stage occurs when the modifications in your data mart are integrated with the enterprise data warehouse.

This integration can occur sooner or later. If a company is wise enough, it will allow continuous feedback from departmental data marts to the data warehouse, as well as the integration of necessary changes at the departmental level. Alternatively, a company may choose to ignore the changes made in the departmental data marts. In this case, the changes may accumulate over time, leading to a problem of fragmented information within the organization. Eventually, all these changes will have to be dealt with at once, with both sides blaming each other for allowing the data warehouse to become so disconnected from reality. Regardless of the pattern that applies, the top-down model of the data warehouse will be influenced by departmental user feedback and the adaptation of departmental data marts.

If the continuous pattern of adjustment to departmental changes is adopted, the data warehouses and data marts will evolve gradually. This evolution will involve continuous feedback from the periphery to the center and continual adjustment of both the periphery and the center to each other. The enterprise data warehouse will not result in a once-and-for-all decision support nirvana, but rather a healthier process of continuous conflict and growth in business intelligence.
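The continuous adjustment described here can be sketched, under heavy simplification, as a loop in which attributes added at the periphery are reported to the center and then folded into the warehouse model. The attribute names and the `collect_feedback` and `integrate` functions below are hypothetical, and treating a data model as a flat set of attribute names is an assumption made only to keep the example short.

```python
def collect_feedback(warehouse_model, mart_models):
    """Map each attribute missing from the warehouse to the marts that added it."""
    feedback = {}
    for mart, attributes in mart_models.items():
        for attribute in sorted(attributes - warehouse_model):
            feedback.setdefault(attribute, []).append(mart)
    return feedback


def integrate(warehouse_model, feedback):
    """Return a new warehouse model extended with mart-originated attributes."""
    return warehouse_model | set(feedback)


# Hypothetical enterprise model and two departmental marts that have drifted.
warehouse_model = {"sales", "profit", "cost"}
mart_models = {
    "marketing": {"sales", "promotion_spend"},
    "finance": {"profit", "cost", "promotion_spend", "exchange_rate"},
}

feedback = collect_feedback(warehouse_model, mart_models)
updated_model = integrate(warehouse_model, feedback)
```

Run frequently, the feedback stays small and each adjustment is cheap. If it is left to accumulate, the same comparison eventually returns a long list of divergences that must be reconciled all at once.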

The three user feedback models have similarities in their ability to adjust to user feedback in the long term. After implementing the data warehouse, all three patterns offer the option of incorporating a continuous adjustment process between the data warehouse and the data marts. Alternatively, they provide the opportunity to centralize DSS development in the data warehouse by migrating to the top-down model. However, there are notable differences between the three patterns in the short term.

In the top-down pattern, user feedback prior to implementing the data warehouse involves participating in various activities throughout the software development process, such as system planning, requirements analysis, system design, prototyping, and system acceptance. However, due to the reasons mentioned earlier, this involvement may result in gaps in coverage for causal domains and attributes or unforeseen side effects of departmental performance activities.

On the other hand, the bottom-up pattern ensures much more comprehensive coverage of causal and side effect dimensions by starting development with data marts.

As a result, once the data warehouse is implemented, there will be little initial gap between user data mart requirements and what is in the data warehouse. This small gap could lead to a decision to migrate to the top-down model for long-term development. However, if this danger is avoided and a continuous adjustment path to development is followed, the initial small gap will result in a less painful adjustment process. The future should involve continuous adjustments between local data marts and the enterprise-wide data warehouse. However, it should not be concluded that the bottom-up model with feedback is perfect (see Figure Five).

When using the bottom-up model to develop multiple data marts, it may not initially cause much pain. However, over time, this approach can result in new islands of information and require dealing with redundancies and information gaps during the construction of the data warehouse. On the other hand, if the top-down model with user feedback is used to create data marts after building the data warehouse, it can lead to excessive pain in adjusting to these data marts. The parallel model, shown in Figure Six, offers the most promising solution. It involves a period of mutual adjustment between the enterprise data model and the data marts during development.

The development of a data warehouse can be smooth if the center remains open to feedback from data marts and adjusts itself to departmental perspectives on causal and side effect dimensions and attributes. While data marts should follow the enterprise data warehouse model, the enterprise-level model should be guided by input from the individual and collective data marts. Although the enterprise data warehouse data model is more than just a collection of data mart models, it must include them in order to perform its long-term coordinating and integrating functions. The danger of the parallel model lies in the initial stages of development. This model assumes that the data warehouse data model is completed before data mart development begins, which requires rapid development of the enterprise-level model and forces the data marts to wait until that development is complete. However, this assumption is not necessary for the parallel model.

The data warehouse data model can be developed simultaneously with the first data marts. The data warehouse should work collaboratively with the data mart development staffs, providing guidance and coordination. It is not essential to have a complete enterprise level data warehouse model in order to identify redundancies between departments and track information gaps. Additionally, coordinating the back-ends of the data marts to ensure compatibility does not require a complete enterprise level data warehouse model.

If data marts are coordinated by a central modeling team and encouraged to complete their data marts quickly, they can provide more effective insights into the enterprise's data warehouse requirements compared to JAD or requirements gathering sessions. The development of data marts should consider user feedback to improve their effectiveness. The decision of whether to have centralized or decentralized DSS development is significant in both the short and long term. All three development patterns must decide on their next steps after the development of the data warehouse.

Will data marts be subject to centralized control or will departments and divisions have the freedom to develop their own data marts? It is evident that a combination of autonomy and coordination is the most practical approach for enterprises in the long term. However, the three development patterns remain distinct choices, even if the same long-term policy of mutual adjustment between data marts and data warehouses is followed. The top-down pattern necessitates a considerable adjustment period to meet the needs of data marts after constructing the data warehouse, in order to mitigate centripetal forces and adapt to the inevitable development of partially independent data marts. Conversely, the bottom-up model requires an additional stage of significant ETT processing to accommodate the development of the data warehouse from the data marts.

The parallel development model necessitates a quick creation of an enterprise level data warehouse data model, unless it is adjusted to only require simultaneous development of data marts and the data warehouse, with coordination from the enterprise team. The parallel development model, which includes feedback and places less or no emphasis on a completed data warehouse data model prior to development, seems to be the recommended choice for a typical developmental pattern. However, organizations often do not have the luxury of making this "rational" choice for development. Therefore, an important question arises: what will be the distribution of different patterns of data mart/data warehouse development in organizations? To begin with, none of the initial three models will be represented since they disregard user feedback, which is an essential empirical factor in the development process. Additionally, the top-down pattern will only be applicable to a small percentage of enterprises due to opposing decentralizing forces prevalent in organizations today.

The popularity of the bottom-up pattern can be enhanced by including coordination from an enterprise-level CIO sponsored data modeling group. This coordination would help avoid the negative effects of uncoordinated bottom-up development, ensuring that the eventual data warehouse incorporates the requirements of the data marts accurately.

Furthermore, the parallel model will also gain popularity due to its ability to provide both coordination and autonomy.

It will become even more popular if the requirement for collaboration in creating data models is added, rather than relying solely on guidance from a finished enterprise-level data model. By combining the bottom-up development approach with coordination from an enterprise level data modeling group, and removing the need for the enterprise-level data model to be completed before starting data mart development, the distinction between these two models will become less clear. Real-life situations will only have slight variations in the level of central coordination needed for their data marts. Ultimately, we will witness a merging of the bottom-up and parallel models of data mart development, resulting in a gradual evolution of data marts and data warehouses through a parallel process of mutual adjustment, change, and adaptation to the new challenges faced by organizations.
