Web Usage Mining for an Improved Web-Based Learning Environment
Abstract: Web-based education is commonly chosen for remote instruction because resources are easy to access, costs are low, and deployment is simple. Advanced web-based learning environments have been developed globally, and similar web-based environments are popular in e-commerce. However, little attention has been paid to automatically detecting access patterns in web-based distance learning in order to understand learners' behavior. Educators lack support for evaluating learners' activities and differentiating their online behaviors. In this paper, we discuss data mining and machine learning techniques that can enhance web-based learning environments, so that educators can better evaluate the learning process and learners are assisted in their learning.
Keywords: Data Mining, e-learning, Web Usage Mining, Learning Activity Evaluation, Adaptive Web Sites
1. Introduction and background:
The rapid development of the World-Wide Web (WWW) has made it the most important medium for collecting, sharing, and distributing information. Many organizations and corporations now provide online information and services, such as automated customer support, online shopping, and various resources and applications. Web-based applications and environments for e-commerce, distance education, online collaboration, news broadcasts, etc. are becoming more common. The WWW is widely used by people of all ages for a variety of activities, from children sharing music files with friends to seniors receiving photos and messages from grandchildren around the world.
It is not surprising to find web pages dedicated to various courses offered at universities and colleges. These pages offer course materials and related resources even for courses held in traditional classrooms. Due to its quick adoption in the field of education, the web has become the preferred medium for designing modern distance education systems. Web-based technology allows for course delivery and knowledge sharing through platforms like Virtual-U and Web-CT, which offer tools such as content sharing, conferencing systems, quizzes, workspaces, etc.
These virtual classrooms are run by educators who provide resources and facilitate discussions among students. However, tracking student activity on these tools can be challenging for educators, and assessing the effectiveness of course content is difficult as well. Educators need automated, non-intrusive methods for obtaining feedback from students in order to improve the learning process. Students, in turn, would benefit from a system that guides them and recommends online resources to enhance their learning. Currently, no such system is in place for either students or teachers.
In electronic commerce, by contrast, research has focused on capitalizing on customers' preferences and purchasing behavior to improve their buying experience and increase profits. Examples include the recommendation systems of Amazon.com and moviefinder.com. Researchers have also used web access history to personalize websites for visitors, and the web sequence analyzer WUM is a tool designed specifically for improving web page layout based on access history. Conferences and workshops have been dedicated to analyzing web usage for e-commerce benefits.
The comparison with e-commerce seems simple, but it is not straightforward. Both e-commerce and e-learning use web-based systems to enhance user experiences, yet the fundamental concepts in the two domains differ greatly. The purpose of e-commerce is to understand client access behavior in order to increase revenues and net income. In e-commerce, a session is defined as the time from initial website access to a purchase or ordering operation within a short timeframe. In e-learning, on the other hand, a learning session may span multiple access sessions over days or even weeks before the desired learning outcomes are achieved.
Furthermore, while the goals of e-commerce sites are clear and quantifiable - such as promoting product sales and fostering customer loyalty - the goals in e-learning are subjective and challenging to quantify or define.
Web-based course delivery systems rely on web servers to facilitate resource and application access for users. Each request made to a web server is recorded in an access log, capturing details like the source of the request, timestamp, and requested resource. These logs effectively track user navigation and activities on the website. However, before this data can be utilized for analysis or evaluation purposes, it must undergo cleansing and transformation processes to ensure its suitability for data mining algorithms.
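As a rough illustration of what one access log entry holds, the sketch below parses a single line into named fields. It assumes the server writes the widely used Common Log Format; the paper does not say which format its servers used, and the sample line, host address, and script name are invented for the example.

```python
import re
from datetime import datetime

# Assumed layout (Common Log Format):
# host ident user [timestamp] "METHOD resource PROTOCOL" status size
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<resource>\S+)[^"]*" (?P<status>\d{3}) (?P<size>\S+)'
)

def parse_log_line(line):
    """Return a dict of fields for one log line, or None if the line does not match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    entry = match.groupdict()
    # Convert the textual fields that later steps need as typed values.
    entry["time"] = datetime.strptime(entry["time"], "%d/%b/%Y:%H:%M:%S %z")
    entry["status"] = int(entry["status"])
    entry["size"] = 0 if entry["size"] == "-" else int(entry["size"])
    return entry

# Hypothetical log line, made up for illustration only.
sample = ('192.0.2.10 - alice [07/Apr/2001:10:15:32 -0600] '
          '"GET /course/quiz.cgi?id=3 HTTP/1.0" 200 5120')
entry = parse_log_line(sample)
print(entry["user"], entry["method"], entry["resource"], entry["status"])
```

Lines that do not match the assumed format return None, so a caller can count or inspect them rather than silently losing entries.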
The subsequent section will address issues related to log cleansing and transformation. Following that discussion, section 3 will outline important data mining tasks applicable in web usage mining. In addition, section 4 presents examples that demonstrate the ways in which web usage mining can improve web-based learning environments. Finally, section 5 provides concluding remarks.
Furthermore, there are several available web log analysis tools (such as NetTracker, Webtrends, Parallel, and SurfAid). However, these tools primarily offer limited statistical analysis of log data. For example, their reports typically only show the number of clicks on a specific webpage within a given time period. The limitations of these tools prevent them from providing comprehensive insights into implicit information and hidden patterns.
Although newer products utilize more advanced analytics techniques, they still require significant manual intervention and often produce errors because of the large size of web logs [2]. Counting page entries or "hits" is the most common method for measuring access to web resources or user engagement. However, relying solely on this method is insufficient and often leads to inaccuracies.
Web server log files generated by widely used servers do not contain enough data for thorough analysis. Nevertheless, they do contain valuable information that a well-designed data mining system can uncover. Each entry records the domain name or IP address of the requester, the username (if applicable), the date and time, the request method, the requested file name, the outcome of the request, the size of the data sent back, the URL of the referring page, the client agent identification, and a cookie exchanged between client and server. A log entry is automatically added for each resource request made to the web server. However, user behaviors such as backtracking or reloading cached resources are not captured in the log. Entries from all users are recorded in chronological order, and a single page request may generate multiple entries.
Identifying unique users and linking them to their log entries is a challenge in web log mining. In e-learning applications, where users are registered and logged in, this task is simpler, but identifying sessions remains complex. The objective is to identify sequences of activities across log entries and group them into learning sessions that educators can evaluate or that data mining tools can analyze further. The main steps for transforming web log data include removing irrelevant entries, identifying sessions, mapping log entries to learning activities, completing traversal paths, grouping sessions by learner, and integrating the result with other data about learners and groups.
Removing irrelevant entries means filtering out requests such as those for images, as well as non-user requests such as those made by web crawlers.
Identifying sessions means recognizing sequences of events, such as page or script requests, and determining where each sequence starts and ends. Because the HTTP protocol does not track semantic sessions, this task is complex. In e-commerce applications, sessions typically end with a purchase or checkout process. In online learning, idle time may not accurately indicate a session break, because students may visit other sites while using the e-learning site. In addition, learning sessions in e-learning applications can span multiple years, and many entries are generated dynamically by requests to scripts such as quizzes and conference messages.
Mapping access log entries to actual learning activities involves replacing script calls and their parameter values with the activities they represent. This task requires comprehensive knowledge of the application's scripts and their parameters, and it relies on a function table provided by the application designers. The result is a sequence of relevant online activities for each learner, such as login, exercise list submission, quiz completion, and reading conference messages.
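The cleaning and mapping steps described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the file suffixes, crawler markers, script paths, and activity labels in the function table are all hypothetical, and each log entry is assumed to be already parsed into a dict with `user`, `time`, `resource`, and `agent` keys.

```python
from urllib.parse import urlsplit, parse_qs

# Assumed filters; a real deployment would tune these lists.
IRRELEVANT_SUFFIXES = (".gif", ".jpg", ".jpeg", ".png", ".css", ".js", ".ico")
CRAWLER_MARKERS = ("bot", "crawler", "spider")

# Hypothetical function table: script path -> function that names the activity.
FUNCTION_TABLE = {
    "/login.cgi": lambda params: "login",
    "/quiz.cgi": lambda params: "quiz:" + params.get("id", ["?"])[0],
    "/conf.cgi": lambda params: "read_conference_message",
}

def is_relevant(entry):
    """Drop image or stylesheet requests and requests made by crawlers."""
    path = urlsplit(entry["resource"]).path.lower()
    agent = entry.get("agent", "").lower()
    if path.endswith(IRRELEVANT_SUFFIXES):
        return False
    return not any(marker in agent for marker in CRAWLER_MARKERS)

def to_activity(entry):
    """Replace a script call and its parameters with an activity label."""
    parts = urlsplit(entry["resource"])
    builder = FUNCTION_TABLE.get(parts.path)
    if builder is None:
        return None  # script not listed in the function table
    return builder(parse_qs(parts.query))

def activities_by_learner(entries):
    """Group the mapped activities by learner, in chronological order."""
    sequences = {}
    for entry in sorted(entries, key=lambda e: e["time"]):
        if not is_relevant(entry):
            continue
        activity = to_activity(entry)
        if activity is not None:
            sequences.setdefault(entry["user"], []).append(activity)
    return sequences

entries = [
    {"user": "alice", "time": 1, "resource": "/login.cgi", "agent": "Mozilla/4.0"},
    {"user": "alice", "time": 2, "resource": "/img/logo.gif", "agent": "Mozilla/4.0"},
    {"user": "alice", "time": 3, "resource": "/quiz.cgi?id=3", "agent": "Mozilla/4.0"},
    {"user": "bob", "time": 2, "resource": "/conf.cgi?msg=12", "agent": "Mozilla/4.0"},
    {"user": "-", "time": 4, "resource": "/quiz.cgi?id=3", "agent": "ExampleBot/1.0"},
]
sequences = activities_by_learner(entries)
print(sequences)
```

Splitting each learner's sequence into sessions is deliberately left out, since the text notes that idle time alone is not a reliable boundary in e-learning.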
To complete the traversal paths, cache hits and placeholder tampering can be inferred based on the website's structure and the effective linking of pages and activities. Lastly, integrating the cleaned click streams with existing data about learners, such as their profiles and evaluations, can provide valuable insights. For example, combining the grades associated with completed activities with the event sequences leading to these activities can help identify patterns that distinguish between effective and ineffective sequences of events.
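One simple way to look for such distinguishing patterns is to compare how often a given step-to-step transition occurs among learners above and below a grade threshold. The sketch below is only an illustration: the activity names, grades, and pass mark are invented, and comparing adjacent pairs is an assumed stand-in for the richer pattern mining the paper has in mind.

```python
from collections import Counter

def adjacent_pairs(sequence):
    """Set of consecutive (previous, next) activity pairs in one sequence."""
    return set(zip(sequence, sequence[1:]))

def pair_rates(records, pass_mark):
    """For each pair, the share of high-grade and low-grade sequences containing it."""
    high = [seq for seq, grade in records if grade >= pass_mark]
    low = [seq for seq, grade in records if grade < pass_mark]

    def rates(group):
        counts = Counter(pair for seq in group for pair in adjacent_pairs(seq))
        return {pair: n / len(group) for pair, n in counts.items()}

    return rates(high), rates(low)

# Hypothetical (activity sequence, grade) records.
records = [
    (["read_notes", "exercise", "quiz"], 85),
    (["read_notes", "exercise", "quiz"], 78),
    (["quiz"], 40),
    (["exercise", "quiz"], 55),
]
high_rates, low_rates = pair_rates(records, pass_mark=60)
for pair, rate in sorted(high_rates.items()):
    print(pair, "high:", rate, "low:", low_rates.get(pair, 0.0))
```

A transition that is common in one group and rare in the other is a candidate pattern for an educator to examine; it is not evidence of cause on its own.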
The web log cleansing and transformation stage accounts for a significant portion of the effort and resources required for web usage mining. This stage results in a database of relevant activity sequences grouped by user. These sequences are typically stored in flat files, which can be used by current data mining algorithms. Alternatively, the data can be stored in a data warehouse to support ad-hoc online analytical processing.
After data preprocessing, the subsequent stages apply data mining algorithms to discover patterns and then evaluate those patterns. For educators delivering online courses, it is crucial that the trends and patterns uncovered can be interpreted. Because understanding online customer buying behavior is so important in e-commerce, a significant research effort has focused on developing data mining algorithms and systems tailored to e-business web usage data.
Apart from the descriptive statistical analysis offered by most web access log analysis tools, specialized data mining approaches have been adapted for web usage mining. These approaches go beyond basic statistical measures such as hit frequency, averages, session length, and duration. The methods commonly used for personalization, system improvement, site modification, and marketing intelligence include association rule generation, clustering, classification, sequential pattern analysis, and dependency modeling [9].
Although these applications were not originally designed for distance learning, they can benefit e-learning systems as well. Association rule generation finds relationships between items in transactions and is often employed in market basket analysis. Clustering and classification group objects in web mining such as users, events, sessions, and pages. Sequential pattern analysis captures the order in which pages are requested. These techniques were originally developed for knowledge discovery from numerical databases but have been adapted for web mining.
WebSIFT is a comprehensive set of web usage mining tools that can perform data mining tasks and identify patterns from web logs. Another tool, WebLogMiner, uses data warehousing technology to analyze web logs and identify trends. However, these tools are not currently integrated into e-learning systems, which makes it challenging for educators without data mining expertise to use them effectively. The system under development aims to address this issue by allowing educators to assess online learning activities and improve course content based on patterns identified in the activity tracked on the course website.
In the context of e-learning, web data mining algorithms can be used to enhance the learning experience. Students can receive hints from the system based on the behavior of successful peers, shortcuts to frequently visited pages, and suggested activities based on the success of similar learners. The system can also adapt the logical structure of the course content to match the learner's pace, interests, or previous behavior.
The presentation and organization of web-based class content is not always intuitive. However, by analyzing common paths taken through the content or frequent changes in individual paths, the layout of the class can be reorganized to better meet individuals' or groups' needs.
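A minimal sketch of the "common paths" idea is to count how many sessions contain each contiguous run of pages of a given length. The page names, path length, and threshold below are hypothetical, and counting contiguous subpaths is a simplification of full sequential pattern mining.

```python
from collections import Counter

def frequent_paths(sessions, length, min_count):
    """Contiguous page paths of the given length found in at least min_count sessions."""
    counts = Counter()
    for pages in sessions:
        # Use a set so each path is counted at most once per session.
        seen = {tuple(pages[i:i + length]) for i in range(len(pages) - length + 1)}
        counts.update(seen)
    return [(path, n) for path, n in counts.most_common() if n >= min_count]

# Hypothetical page sequences, one list per session.
sessions = [
    ["home", "syllabus", "unit1", "quiz1"],
    ["home", "unit1", "quiz1"],
    ["home", "syllabus", "unit1", "quiz1"],
    ["unit1", "quiz1", "forum"],
]
common = frequent_paths(sessions, length=2, min_count=3)
print(common)
```

A path that nearly every session follows, such as going from a unit page to its quiz in this toy data, suggests that a direct link or a merged page would suit learners better.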
There are two types of data mining in e-learning: offline web usage mining and integrated web usage mining. Offline web usage mining involves discovering patterns using a standalone application. This process allows educators to assess access behavior, validate learning models, evaluate learning activities, compare learners and their access patterns, and more.
To explore relationships between learning activities and identify interesting patterns in online activity sequences, we have developed and implemented a prototype application that uses association rules, sequential analysis, and clustering techniques. The clustering techniques help group similar access behaviors [15].
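To make the association rule technique concrete, the sketch below derives single-antecedent rules of the form "A implies B" from sets of learning activities and filters them by support and confidence. It is not the prototype described above: the activity names and thresholds are invented, and restricting rules to pairs is an assumption made to keep the example short.

```python
from collections import Counter
from itertools import permutations

def pairwise_rules(transactions, min_support, min_confidence):
    """Return (antecedent, consequent, support, confidence) for rules A -> B."""
    n = len(transactions)
    item_counts = Counter()
    pair_counts = Counter()
    for items in transactions:
        unique = set(items)
        item_counts.update(unique)
        # Ordered pairs, so A -> B and B -> A are scored separately.
        pair_counts.update(permutations(sorted(unique), 2))
    rules = []
    for (antecedent, consequent), count in pair_counts.items():
        support = count / n                          # share of sessions with both
        confidence = count / item_counts[antecedent]  # share of A-sessions that also have B
        if support >= min_support and confidence >= min_confidence:
            rules.append((antecedent, consequent, support, confidence))
    return sorted(rules, key=lambda r: (-r[3], -r[2], r[0], r[1]))

# Hypothetical sessions, each reduced to the set of activities it contains.
transactions = [
    {"notes", "exercise", "quiz"},
    {"notes", "quiz"},
    {"exercise", "quiz"},
    {"notes", "exercise", "quiz"},
    {"forum"},
]
rules = pairwise_rules(transactions, min_support=0.5, min_confidence=0.9)
for antecedent, consequent, support, confidence in rules:
    print(f"{antecedent} -> {consequent} (support={support:.2f}, confidence={confidence:.2f})")
```

The support and confidence thresholds are exactly the kind of parameters that educators may find hard to choose, so in practice they would need sensible defaults or automatic tuning.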
In short, web data mining can enhance e-learning both by providing personalized hints and suggestions and by improving content organization based on observed user behavior.
New algorithms have been developed because educators and e-learning site designers find it difficult to understand the specific parameters and threshold values that such algorithms require. These new algorithms need minimal user input and can adapt automatically to web log data [3].
Offline web data mining helps educators analyze and validate learning models and website structure, while integrated web data mining builds pattern discovery into the e-learning application itself. This includes adaptive websites, personalized activities, and automated recommenders based on preferences, activity history, and access patterns from communal inputs. A recommender system based on association rule mining is currently being designed, similar to the text classification method described in [14]. It aims to identify meaningful connections between learning activities and to generate association rules in real time. During a session, when the activities in a rule's antecedent have been observed, the activities in the rule's consequent are suggested as recommended next steps for the learner. Additionally, the text classification algorithm from [14] can be used to automatically categorize messages sent by students on an asynchronous conferencing system. This classification can help educators evaluate information exchange in a course-related forum.
The web is valuable for online courses, especially in distance education scenarios. However, relying solely on web traffic analysis does not exploit the hidden patterns within web logs. Web usage mining aims to extract useful, implicit, and previously unknown patterns from web usage. Research is being conducted to discover these patterns and enhance profitability for e-commerce websites. However, the goals of those applications differ from the goal in e-learning: "turning learners into effective better learners." Data mining techniques have shown potential to enhance online education for both educators and learners. While tools using data mining techniques are being developed, research is still in its early stages. Specialized logs from the application side are needed to supplement the limited information collected by web servers and to provide a better understanding of clickstreams and discovered patterns.
References:
1. R. Cooley, B. Mobasher, and J. Srivastava. "Web Mining: Information and Pattern Discovery on the World Wide Web." 9th IEEE International Conference on Tools with AI, 1997.
2. H. A. Edelstein. "Pan for Gold in the Clickstream." Informationweek, March 2001. http://www.informationweek.com/828/mining.htm
3. Workshop on Web Mining, held in conjunction with the SIAM International Conference on Data Mining, Chicago, IL, USA, April 7, 2001.
4. M. N. Garofalakis, R. Rastogi, S. Seshadri, and K. Shim. "Data Mining and the Web: Past, Present and Future." Proceedings of WIDM99, Kansas City, U.S.A., 1999.
5. C. Groeneboer, D. Stockley, and T. Calvert. "Virtual-U: A collaborative model for online learning environments." Second International Conference on Computer Support for Collaborative Learning, Toronto, Ontario, December 1997.
6. J. Han and M. Kamber. "Data Mining: Concepts and Techniques." 2001.
7. J. Han, J. Pei, B. Mortazavi-Asl, Q. Chen, U. Dayal, and M.-C. Hsu. "FreeSpan: Frequent Pattern-Projected Sequential Pattern Mining." Knowledge Discovery and Data Mining Conference (KDD'00), Boston, MA, August 2000.
M. Spiliopoulou, L. C. Faulstich, and K. Winkler. "A Data Miner analysing the Navigational Behaviour of Web Users." Proceedings of the Workshop on Machine Learning in User Modeling of the ACAI'99, Creta, Greece, July 1999.
J. Srivastava, R. Cooley, M. Deshpande, and P. Tan. "Web Usage Mining: Discovery and Applications of Usage Patterns from Web Data." SIGKDD Explorations, Vol. 1, No. 2, Jan. 2000.
June 21-22, 2001. http://www.chutneytech.com/wecwis2001.html
San Francisco, CA, USA, August 26, 2001. http://robotics.Stanford.EDU/ronnyk/WEBKDD2001/index.html
http://www.webct.com/
Submitted to the Journal of Intelligent Information Systems, Special Issue on Automated Text Categorization, 2001.
Proc. IEEE International Conference on Advanced Learning Technologies (ICALT 2001), Madison, WI, USA, 6-8 August 2001.
O. R. Zaïane, M. Xin, and J. Han. "Discovering Web Access Patterns and Trends by Applying OLAP and Data Mining Technology on Web Logs." Proceedings of ADL'98 - Advances in Digital Libraries, Santa Barbara, 1998.