Importance and application of data mining Essay Example

  • Pages: 8 (2112 words)
  • Published: September 27, 2017
  • Type: Research Paper


The country in question is currently experiencing notable and continuing economic growth. Sustaining such growth calls for a consistently applied, systematic approach. Data mining techniques can therefore help improve decision-making within an organization.

This paper explores the significance and implementation of data mining in different fields, depending on the objectives and aims of a study within an organization. To illustrate how data mining can be used, three main areas serve as examples: hotels, libraries, and hospitals.

Keywords: Data Mining, KDD Process, Decision Trees, Ant Colony Clustering Algorithm, Association Rules, Neural Network, Rough Set.


As we all know, organizations that engage in business transactions maintain a vast amount of documents or data in a specific database for future retrieval purposes. This data is gathered from various departments that perform diverse tasks aligned with the organization's mission and vision. According to Imberman (2001), large databases may contain thousands or even tens of thousands of fields.

Thus, it is crucial to use the available data when making strategic decisions, in order to avoid actions that may harm the organization. Moreover, as user requirements change, data quickly becomes obsolete. Data mining techniques enable more efficient analysis than conventional methods, which can be time-consuming. Prompt action is essential for keeping up with competitors and for enhancing service and product quality. Furthermore, a team of individuals can bring creativity and synthesis to interpreting the results and generating solutions for tasks or projects.

Clearly, data mining provides assistance in various fields for different purposes depending on the desired objectives. The rest of this paper is structured as follows: Section 2 provides the definition of data mining. Section 3 establishes the significance of data mining. Section 4 elucidates the application of data mining in diverse fields. Section 5 summarizes the conclusions.

Definition of Data Mining

Researchers and academics have offered several definitions of data mining, each reflecting their own perspective and research. These definitions provide an initial understanding of data mining techniques. Data mining is primarily concerned with manipulating large amounts of data stored in databases, with the goal of identifying the relevant variables that contribute to the accuracy of predictions used for problem-solving (Gargano & Raggad, 1999).

Data mining involves searching for hidden relationships, patterns, correlations, and similarities within large databases. These are things that traditional methods of collecting information, such as creating surveys, generating pie and bar graphs, user querying, and using decision support systems (DSSs), may overlook. Another author shares this view, defining data mining as the uncovering of hidden patterns, trends, and tendencies.

According to Palace (1996), data mining is the process of discovering correlations or patterns among numerous fields in large relational databases. It can also be defined as the process of extracting knowledge or information by using an appropriate model or framework to analyze data and produce relevant outcomes for the research purpose. Imberman (2001) agrees with this statement and adds that data mining is known by various names such as knowledge extraction, data discovery, data harvesting, exploratory data analysis, data archaeology, data pattern processing, and functional dependence analysis. These different terms highlight the use of specific models or frameworks to reveal the actual circumstances. As defined by Ma, Chou, and Yen (2000), data mining involves employing artificial intelligence techniques like advanced modeling and rule induction on a large dataset to uncover patterns within the data. Additionally, the steps taken during data mining analysis vary depending on the chosen methodology.

The available methodologies, however, do not differ greatly from one another. Forcht & Cochran (1999) define data mining as an interactive process of organizing data for analysis. The data should be examined for errors or defects, such as extreme outliers, and any that are found should be removed.
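The cleaning step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the interquartile-range fence, the multiplier `k`, and the `nightly_spend` figures are hypothetical choices made for the example and do not come from Forcht & Cochran (1999).

```python
from statistics import quantiles

def remove_extreme_outliers(values, k=3.0):
    """Drop values lying more than k interquartile ranges outside the quartiles.

    The IQR fence and the default k are assumptions made for this sketch.
    """
    q1, _, q3 = quantiles(values, n=4)   # first quartile, median, third quartile
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if low <= v <= high]

# Hypothetical spending records containing one obviously defective entry.
nightly_spend = [120, 135, 128, 140, 132, 125, 9000]
cleaned = remove_extreme_outliers(nightly_spend)
```

In practice the threshold would be chosen together with the people who know the data, since an extreme value is sometimes a genuine record rather than an error.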

Importance of Data Mining

As discussed earlier, data mining can benefit various parties and levels within an organization. By applying a suitable model, time and cost can be reduced.

The results then allow the responsible knowledge worker to turn information into strategic value by analyzing them critically. The procedure should be carried out carefully, so that useful variables or algorithms are not removed or left out when extracting dependable information. Data mining techniques help select part of the information, using appropriate tools to filter outliers and anomalies from the data set. According to Gargano & Raggad (1999), other important aspects of data mining include the ability to uncover previously hidden information, discover rules, classify, divide, associate, and optimize. Additionally, according to Goebel & Gruenwald (1999), several methods are used to search for patterns in data, clarifying vagueness and identifying relationships between variables within databases. The resulting insights guide decision-making and help predict the impact of actions.

The methodology chosen should suit the requirements of the study and the state of the data being analyzed. The methods include:
- Statistical Methods: primarily focused on testing preconceived hypotheses and fitting theoretical explanations to data.
- Case-Based Reasoning (CBR): a technology that solves a given problem by directly using past experiences and solutions.
- Neural Networks: networks comprised of numerous artificial neurons interconnected in a manner similar to brain cells, enabling the network to "learn".
- Decision Trees: trees in which each non-terminal node represents a test or decision on the data point being considered; a decision tree can also be seen as a specific form of rule set, organized hierarchically.
- Rule Induction: rules establish statistical correlations between the occurrences of certain properties in a data point or between specific data points in a data set.
- Bayesian Belief Networks: graphical representations of probability distributions derived from co-occurrence counts in the dataset.
- Genetic Algorithms/Evolutionary Programming: generate hypotheses about dependencies between variables, typically in the form of association rules or another internal formalism.
- Fuzzy Sets: a powerful approach to dealing with incomplete, noisy, or imprecise data; they can also be useful in constructing uncertain models of the data that offer smarter and smoother performance compared to traditional systems.
- Rough Sets: a mathematical concept that addresses uncertainty in data and can be used as a standalone solution or combined with other methods like rule induction, classification, or clustering.
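To make the link between decision trees and rule sets concrete, the following minimal Python sketch walks a small tree and prints one "if-then" rule per leaf. The tree, its attribute names (`stayed_before`, `rated_service_high`), and its labels are hypothetical and invented purely for illustration; they are not taken from any of the cited studies.

```python
def extract_rules(node, conditions=()):
    """Walk a decision tree and return one if-then rule per leaf."""
    if "label" in node:                      # terminal node: emit the rule
        premise = " and ".join(conditions) or "always"
        return [f"if {premise} then {node['label']}"]
    rules = []
    for outcome, child in node["branches"].items():
        # Each non-terminal node adds one test to the rule's premise.
        test = f"{node['test']} = {outcome}"
        rules.extend(extract_rules(child, conditions + (test,)))
    return rules

# Hypothetical tree: attribute names and labels are assumptions.
tree = {
    "test": "stayed_before",
    "branches": {
        "yes": {
            "test": "rated_service_high",
            "branches": {
                "yes": {"label": "likely_repeat_guest"},
                "no": {"label": "may_switch_hotel"},
            },
        },
        "no": {"label": "unknown_loyalty"},
    },
}

rules = extract_rules(tree)
for rule in rules:
    print(rule)
```

Each path from the root to a leaf becomes one rule, which is why a decision tree can be read as a hierarchically organized rule set.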

Another important aspect is the ability to automate and integrate everyday, repetitive, tedious decision steps without continuous human intervention. Processing or analyzing the selected information involves several steps, including filtering, transforming, testing, modeling, visualizing, and documenting the result or storing it in databases or data warehouses. Each step has its own function and responsibility in the process, making the work easier and producing high-quality assumptions by automatically generating specific conditions. For example, a data warehouse also stores past analyses, which allows redundant output to be eliminated at certain steps.

Ma, Chou & Yen (2000) emphasized the features of data mining that aid in the final analysis process. These features include data form finding, which allows users to retrieve or display specific data using data-access languages or data-manipulation languages (DMLs). It also enables users to enter query specifications or select the desired information from a menu, with the system automatically building the SQL command.
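The idea of a system that builds the SQL command from menu selections can be sketched as follows. This is a minimal illustration, not the mechanism described by Ma, Chou & Yen (2000): the `transactions` table, its columns, the sample rows, and the `build_query` helper are all hypothetical. Only Python's standard `sqlite3` module is used.

```python
import sqlite3

TABLE = "transactions"                                     # hypothetical table
ALLOWED_COLUMNS = {"customer_id", "city", "total_spend"}   # hypothetical menu choices

def build_query(columns, filters):
    """Build a parameterized SELECT statement from menu selections."""
    unknown = (set(columns) | set(filters)) - ALLOWED_COLUMNS
    if unknown:
        raise ValueError(f"unsupported column(s): {sorted(unknown)}")
    sql = f"SELECT {', '.join(columns)} FROM {TABLE}"
    if filters:
        # Values are passed as parameters, never pasted into the SQL text.
        sql += " WHERE " + " AND ".join(f"{col} = ?" for col in filters)
    return sql, list(filters.values())

# Small in-memory database with invented rows, for demonstration only.
conn = sqlite3.connect(":memory:")
conn.execute(f"CREATE TABLE {TABLE} (customer_id INTEGER, city TEXT, total_spend REAL)")
conn.executemany(
    f"INSERT INTO {TABLE} VALUES (?, ?, ?)",
    [(1, "Paris", 250.0), (2, "Rome", 90.0), (3, "Paris", 410.0)],
)

sql, params = build_query(["customer_id", "total_spend"], {"city": "Paris"})
rows = conn.execute(sql, params).fetchall()
```

Restricting selections to a fixed set of column names keeps the generated command safe even though the user never writes SQL directly.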

Data mining also has formatting capability, generating various data formats such as tables, spreadsheets, multidimensional displays, and visualizations. Additionally, it possesses content analysis capability for analyzing user-specified content and synthesis capability for timely execution of data synthesis.

In addition to its analytical features, data mining reduces costs and potential errors in decision-making. By following a well-defined methodology and avoiding delays in decision-making that could impact business operations significantly, it minimizes prediction errors.

When managing information in the strategic planning process, various factors must be considered. These include the objectives of analysis, amount of information, variables and their relationships, and adopted testing methods. It is crucial to consult professionals and incorporate the study during the planning phase. Typically, a specific unit or group within an organization is tasked with discovering hidden patterns in another department. Regular meetings between professionals and researchers are essential to ensure that the final outcome satisfies their needs and enhances worker, department, and organizational performance. This approach helps minimize costs compared to traditional research methods that rely on time-consuming data collection from respondents.

The questionnaire method has the advantage of being quick and time-saving. In contrast, the interviewing method requires more time, because multiple meetings with respondents may be needed when answers are uncertain or inadequate. Additionally, certain studies may require researchers to travel to various locations in order to collect genuine opinions, which incurs expenses such as accommodation, food, and flight tickets. Data mining, on the other hand, makes use of existing data stored in a data warehouse (e.g., customer transactions, student enrollment records, patient operation records), which significantly reduces the cost of obtaining information. When determining their study's objective, researchers start by searching for previous studies within the data warehouse. If they find an appropriate study, certain steps can be skipped or simplified, which shows how data mining also saves cost and time. As stated by Gargano & Raggad (1999), data mining brings long-term benefits by reducing the overall costs associated with system development, implementation, and maintenance.

Application of Data Mining

Data mining is widely used in various fields, particularly by organizations that prioritize consumer orientation. Industries such as retail, finance, communication, and marketing rely heavily on data mining for analysis and decision-making (Palace, 1996). The healthcare sector also benefits from the use of data mining in its daily operations. Each organization in these different fields deals with unique transactions that are stored in databases. These databases allow for analysis aimed at increasing revenue, attracting more customers, improving customer satisfaction, and other purposes. According to Palace (1996), data mining enables the identification of relationships between internal factors (such as price, product placement, or staff skills) and external factors (such as economic indicators, competition, and customer demographics). Three examples of data mining applications can be seen in the hotel industry, library management, and hospitals. The main goal is to address weaknesses or failures by interpreting the data mining results and using them to make informed decisions for the best possible solutions.

The first illustration is a data mining approach to developing profiles of hotel customers, conducted by Min, Min, and Emam (2002). The aim was to identify valued customers for targeted interventions based on their anticipated future profitability to the hotel. The study asked the following questions about customer profiling:
1) Which customers are likely to return as repeat guests?
2) Which customers are most likely to switch to competing hotels?
3) Which service attributes are more important to specific customers?
4) How can the customer population be segmented into profitable and unprofitable customers?
5) Which segment of customers best aligns with the hotel's current service capacities?

To analyze the data, the researchers used decision trees, a data mining method that generates relevant rules that are simple and easy to visualize. The process involves three steps:
1) Data aggregation, which entails selecting data that aligns with the research objectives and filtering unwanted information out of the databases.
2) Data formatting, where all data in the spreadsheet is converted into Statistical Package for the Social Sciences (SPSS) format for the purpose of classification accuracy.
3) Rule induction, the step in which an algorithm is chosen to construct the decision trees; the specific algorithm used in this case is C5.0.

The purpose of this process is to generate sets of rules that give hotel managers important hints for further action. The researchers found that these "if-then" rules are useful in explaining a customer retention strategy, with a predictive accuracy ranging from 80.9% to 93.7%. The accuracy of the predictions depends on the conditions set by the rules.

The second illustration is the use of data mining technology to provide a recommendation service in digital libraries.

Chen & Chen (2006) conducted a study to propose a recommendation system architecture for improving service in digital libraries. The diverse range of digital publication formats, such as sound, picture, and image, makes it difficult to analyze and specify keywords and content in order to extract information from the user and enhance the digital library service. In the methodology, two data mining models were selected, including the Ant Colony Clustering Algorithm. This model is capable of finding the shortest path, or reducing the time needed to find the best-fit solution for the problem an organization faces.

