Web Usage Mining for a Better Web-Based Learning Environment
Web-based technology is often the technology of choice for distance education, given the ease of use of the tools to browse resources on the Web, the relative affordability of accessing the ubiquitous Web, and the simplicity of deploying and maintaining resources on the World Wide Web. Many sophisticated web-based learning environments have been developed and are in use around the world. The same technology is used for electronic commerce, where it has become extremely popular. However, while clever tools have been developed to understand online customers' behaviour in order to increase sales and profit, very little has been done to automatically discover access patterns that reveal learners' behaviour in web-based distance learning. Educators using online learning environments and tools have very little support to evaluate learners' activities and to discriminate between different learners' online behaviours. In this paper, we discuss some data mining and machine learning techniques that could be used to enhance web-based learning environments, both for educators, to better evaluate the learning process, and for learners, to help them in their learning endeavour.
Keywords: Data Mining, e-learning, Web Usage Mining, Learning Activity Evaluation, Adaptive Web Sites
1. Introduction and background
With the rapid development of the World Wide Web (WWW) and the increased popularity and ease of use of its tools, the World Wide Web is becoming the most important medium for collecting, sharing and distributing information. Many organizations and corporations provide information and services on the Web, such as automated customer support, online shopping, and a myriad of resources and applications. Web-based applications and environments for electronic commerce, distance education, online collaboration, news broadcasts, etc., are becoming common practice and widespread. The WWW is becoming ubiquitous, an ordinary tool for the everyday activities of ordinary people, from a child sharing music files with friends to a senior receiving photos and messages from grandchildren across the world. It is typical to see web pages for courses in all fields taught at universities and colleges, providing course notes and related resources even when these courses are delivered in traditional classrooms. It is not surprising that the Web is the medium of choice for engineering modern, advanced distance education systems. Distance education is a field where web-based technology was very quickly adopted and used for course delivery and knowledge sharing. Typical web-based learning environments such as Virtual-U [5] and WebCT [13] include course content delivery tools, synchronous and asynchronous conferencing systems, polling and quiz modules, virtual workspaces for sharing resources, whiteboards, grade reporting systems, logbooks, assignment submission components, etc. In a virtual classroom, educators provide resources such as text, multimedia and simulations, and moderate and animate discussions. Remote learners are encouraged to peruse the resources and participate in activities. However, it is very difficult and time-consuming for educators to thoroughly track and assess all the activities performed by all learners on all these tools.
Furthermore, it is difficult to evaluate the structure of the course content and its effectiveness on the learning process. Resource providers do their best to structure the content, assuming its efficacy. Educators using web-based learning environments are in desperate need of non-intrusive, automatic ways to obtain objective feedback from learners in order to better follow the learning process and evaluate the effectiveness of the online course structure. On the learner's side, it would be very useful if the system could automatically guide the learner's activities and intelligently recommend online activities or resources that would favour and improve learning. These tools do not exist yet, and to the best of our knowledge no distance learning system to date provides such automated facilities on either the learner's side or the educator's side. In the field of electronic commerce, however, given the lucrative opportunities, a significant research effort has been made to devise elaborate methods that take advantage of customers' accesses and purchase behaviour in order to enhance the shopping experience and customer satisfaction through user profiling and smart recommendations, and thus increase profit. For example, recommender systems such as Amazon.com, which suggests books related to a current purchase based on preference information and similar users' purchases, or moviefinder.com, which recommends movies, use collaborative filtering, which predicts a person's preferences as a linear weighted combination of other people's preferences. Recently, researchers have used web access history to make web sites more adaptive and personalized, and thus more attractive to visitors, which is critical for keeping customers loyal. WUM [8] is a dedicated web sequence analyzer for improving web page layout and structure based on the history of access sequences.
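The collaborative filtering idea mentioned above, predicting a preference as a weighted combination of other people's preferences, can be sketched as follows. This is a minimal user-based variant that assumes cosine similarity between users as the weight; the rating data and function names are hypothetical, and the sketch does not claim to describe how Amazon.com or moviefinder.com actually work.

```python
from math import sqrt
from typing import Dict, Optional

Ratings = Dict[str, Dict[str, float]]  # user -> {item -> rating}

def cosine(u: Dict[str, float], v: Dict[str, float]) -> float:
    """Cosine similarity computed over the items both users have rated."""
    common = set(u) & set(v)
    if not common:
        return 0.0
    dot = sum(u[i] * v[i] for i in common)
    norm_u = sqrt(sum(u[i] ** 2 for i in common))
    norm_v = sqrt(sum(v[i] ** 2 for i in common))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def predict(ratings: Ratings, user: str, item: str) -> Optional[float]:
    """Predict `user`'s rating of `item` as a similarity-weighted (linear)
    combination of the ratings other users gave to that item."""
    weighted_sum = 0.0
    total_weight = 0.0
    for other, prefs in ratings.items():
        if other == user or item not in prefs:
            continue  # only users who rated the item contribute
        w = cosine(ratings[user], prefs)
        weighted_sum += w * prefs[item]
        total_weight += abs(w)
    return weighted_sum / total_weight if total_weight else None
```

With `ann` rating two books exactly like `bob` and differently from `cat`, `predict(ratings, "ann", "b3")` lands between their ratings for `b3`, closer to `bob`'s. Other weighting schemes (for example, mean-centred ratings or Pearson correlation) are equally common choices.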
Entire conferences and workshops have been dedicated to web usage analysis for the benefit of e-commerce [10,11,12]. While the analogy with e-commerce seems straightforward, it is certainly not as simple as it appears. It is true that in e-commerce the goal is to increase sales and profit, and that this is achieved by understanding customer access behaviour [2]; in e-learning the goal might be to improve learning, and this could also be achieved by understanding learners' access patterns. However, many of the concepts involved are fundamentally different. For instance, a purchase transaction, and thus a session, which is a fundamental building block for most web usage mining algorithms, is roughly defined as starting from an initial access to the web site and ending with a purchase or order operation, usually within a very short time frame (i.e., the same access session). In e-learning, a learning session can span many access sessions. To learn a concept or achieve a correct result in a quiz, many access sessions, spread over many days or even weeks, may be needed. Furthermore, while the goal of e-commerce sites may be clear, for example encouraging customers to buy more products and keeping them loyal, the goals in e-learning are vague, difficult to qualify or quantify, and subjective.
Web-based course delivery systems, like any web site or web-based application, rely on web servers to provide access to resources and applications. Every single request that a web server receives is recorded in an access log, mainly registering the origin of the request, a time stamp and the resource requested, whether the request is for a web page containing an article from a course chapter, the answer to an online quiz question, or participation in an online conference discussion. The web log provides a raw trace of the learners' navigation and activities on the site. In order to process these log entries and extract valuable patterns that could be used to enhance the learning system or help in learning evaluation, a significant cleansing and transformation phase needs to take place to prepare the data for data mining algorithms. The following section presents the issues related to web log cleansing and transformation. Section 3 enumerates some important data mining tasks that can be adopted in web usage mining. Section 4 illustrates with examples how web usage mining can be useful to enhance web-based learning environments. Finally, Section 5 presents some concluding remarks.
2. Web Log Cleansing
There is an assortment of web log analysis tools available [2]. Most of them, like NetTracker, WebTrends, Analog and SurfAid, provide limited statistical analysis of web log data [16]. For example, a typical report has entries of the form: "during this time period T, there were n clicks occurring for this particular web page p". However, the results provided by these tools are limited in their ability to help understand implicit usage information and hidden trends. Newer products use more sophisticated and complex analytical means, but they are generic, require significant manual intervention and often resort to sampling due to the huge size of web logs [2]. The most commonly used method to evaluate access to web resources, or user interest in resources, is counting page accesses or "hits". However, this is not sufficient and is often not correct. The log files of current common web servers contain insufficient data upon which to base a thorough analysis. Nevertheless, they do contain useful data from which a well-designed data mining system can discover beneficial information. Web server log files customarily contain: the domain name (or IP address) of the request; the user name of the user who generated the request (if applicable); the date and time of the request; the method of the request (GET or POST); the name of the file requested; the result of the request (success, failure, error, etc.); the size of the data sent back; the URL of the referring page; the identification of the client agent; and a cookie, a string of data generated by an application and exchanged between the client and the server. A log entry is automatically added each time a request for a resource reaches the web server.
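To make the listed fields concrete, the following sketch parses one access-log line into them. It assumes the server writes the widely used NCSA Combined Log Format; a cookie, when logged, is usually an extra configured field and is not handled here. The regular expression and the sample line are illustrative assumptions, not part of any particular course delivery system.

```python
import re
from datetime import datetime
from typing import Optional

# Assumed NCSA Combined Log Format:
# host ident authuser [date] "method path protocol" status bytes "referrer" "agent"
LOG_RE = re.compile(
    r'(?P<host>\S+) \S+ (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) (?P<size>\d+|-) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)

def parse_line(line: str) -> Optional[dict]:
    """Turn one access-log line into a dict of fields, or None if it does not match."""
    m = LOG_RE.match(line)
    if m is None:
        return None  # malformed or differently formatted entry
    entry = m.groupdict()
    entry["time"] = datetime.strptime(entry["time"], "%d/%b/%Y:%H:%M:%S %z")
    entry["status"] = int(entry["status"])
    entry["size"] = 0 if entry["size"] == "-" else int(entry["size"])
    entry["user"] = None if entry["user"] == "-" else entry["user"]  # "-" means no login
    return entry
```

In an e-learning site where learners must log in, the `user` field is what later makes it possible to attribute entries to individual learners.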
While this may reflect the actual use of the resources on a site, it does not record reader behaviours such as frequent backtracking or frequent reloading of the same resource when the resource is cached by the browser or a proxy. It is important to note that the entries of all users are mixed in the log, simply ordered chronologically, and that a single page request from a user may generate multiple entries in the server log. One major problem in web log mining is identifying unique users and associating them with their access log entries. In e-learning applications, however, this problem is simplified, since users are not anonymous but must log in to the system as registered learners. Identifying sessions, on the other hand, is a non-trivial task. The goal is to identify sequences of activities in the collection of mixed log entries described above, and to model them as sessions of learning activities to be presented to educators for evaluation and interpretation, or forwarded to advanced data mining tools to further discover intrinsic useful patterns. The major steps of web log data transformation can be summarized as follows:
- Remove irrelevant entries
- Identify access sessions
- Map access log entries to learning activities
- Complete traversal paths
- Group access sessions by learner to identify learning sessions
- Integrate with other data about learners and groups of learners
Removing irrelevant entries is the simple task of weeding out, for example, requests for images, or non-user requests such as those from web crawlers. Identifying sessions is a demanding task. The aim is to recognize sequences of events such as A B C B D…, where A, B, C, D, etc. are page or script requests. The challenge is to recognize the beginning and the end of sessions. The problem comes from the fact that HTTP, the protocol used for information exchange between web servers and browsers, is stateless and does not keep track of semantic sessions. In e-commerce applications, the end of a session is usually the purchase of a product or the check-out of an e-cart, and idle times between requests that exceed 25 to 30 minutes are used to identify cuts between sessions. This heuristic does not necessarily hold in the online learning context, since learners can wander to other sites gathering relevant information while their access session at the e-learning site is still on hold. Furthermore, a learning session can span several days with different accesses. Many pages in e-learning applications, such as quiz pages and conference messages, are dynamically generated by script requests. Mapping access log entries to actual learning activities consists of replacing script calls, together with their assigned parameter values, with concrete activities. This is an arduous task that assumes thorough knowledge of the application scripts and their respective parameters, and it requires a mapping table provided by the application designers.
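The first two steps, removing irrelevant entries and cutting access sessions on idle time, can be sketched as below. The sketch assumes entries have already been parsed into `(learner, timestamp, path, agent)` tuples and that learners are identified by their login; the suffix and crawler filters are hypothetical examples. It deliberately uses the 30-minute e-commerce timeout discussed above, which, as noted, may not suit learners who pause to consult other sites.

```python
from datetime import datetime, timedelta
from typing import Dict, Iterable, List, Tuple

Entry = Tuple[str, datetime, str, str]  # (learner id, timestamp, requested path, user agent)

# Hypothetical filters: static resources and crawler requests are treated as irrelevant.
IRRELEVANT_SUFFIXES = (".gif", ".jpg", ".png", ".css", ".js")
CRAWLER_HINTS = ("bot", "crawler", "spider")

def is_relevant(path: str, agent: str) -> bool:
    if path.lower().endswith(IRRELEVANT_SUFFIXES):
        return False
    return not any(hint in agent.lower() for hint in CRAWLER_HINTS)

def sessionize(entries: Iterable[Entry],
               timeout: timedelta = timedelta(minutes=30)) -> Dict[str, List[List[str]]]:
    """Split each learner's requests into access sessions, opening a new session
    whenever the idle time since that learner's previous relevant request exceeds `timeout`."""
    sessions: Dict[str, List[List[str]]] = {}
    last_seen: Dict[str, datetime] = {}
    for user, ts, path, agent in sorted(entries, key=lambda e: e[1]):
        if not is_relevant(path, agent):
            continue  # step 1: drop irrelevant entries
        if user not in last_seen or ts - last_seen[user] > timeout:
            sessions.setdefault(user, []).append([])  # step 2: open a new access session
        sessions[user][-1].append(path)
        last_seen[user] = ts
    return sessions
```

Because the log interleaves all users chronologically, the state (`last_seen`) is kept per learner rather than for the log as a whole.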
The result is a sequence of each learner's relevant online activities of the form: Login ExerciseList SubmissionQuiz1 ExerciseList ReadConferenceMessage… Completing the traversal paths consists of inferring requests that never reached the server because of browser or proxy caching, based on the structure of the web site and on how pages and activities are effectively linked together. Finally, integrating the cleaned click streams with existing data about learners can be very valuable. Such data could be the learners' profiles, their quantitative and qualitative evaluations, etc. For instance, combining the grades associated with completed activities with the sequences of events leading to these activities can help discover patterns that discriminate between sequences of activities that yield good results and sequences that are not as effective.
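The mapping step that produces activity sequences like the one above can be sketched with a lookup table keyed on the script and its parameter values. The script names, the `action` and `quiz` parameters, and the table itself are hypothetical stand-ins for the mapping table that application designers would have to supply.

```python
from typing import List, Optional
from urllib.parse import parse_qs, urlsplit

# Hypothetical mapping table provided by the application designers:
# (script path, value of its "action" parameter) -> learning activity.
ACTIVITY_MAP = {
    ("/login.cgi", None): "Login",
    ("/exercise.cgi", "list"): "ExerciseList",
    ("/quiz.cgi", "submit"): "SubmissionQuiz",
    ("/conference.cgi", "read"): "ReadConferenceMessage",
}

def to_activity(url: str) -> Optional[str]:
    """Replace a script call and its parameter values with a concrete learning activity."""
    parts = urlsplit(url)
    params = parse_qs(parts.query)
    action = params.get("action", [None])[0]
    activity = ACTIVITY_MAP.get((parts.path, action))
    if activity == "SubmissionQuiz" and "quiz" in params:
        activity += params["quiz"][0]  # keep which quiz was submitted, e.g. SubmissionQuiz1
    return activity

def map_session(urls: List[str]) -> List[str]:
    """Map one access session to its activity sequence, dropping unmapped requests."""
    return [a for a in (to_activity(u) for u in urls) if a is not None]
```

Applied to a session of five script calls, `map_session` yields `Login ExerciseList SubmissionQuiz1 ExerciseList ReadConferenceMessage`; requests missing from the table are silently dropped, which in practice would call for a review of the table rather than silent loss.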
The web log cleansing and transformation phase often consumes 80% to 95% of the effort and resources needed for web usage mining [2]. The result of the pre-processing is a database of sets of pertinent activity sequences grouped by learner. This is usually modeled as sequences of items associated with a user identification, stored in flat files that current data mining algorithms can act upon. The data can also be stored in a data warehouse, as in [16], allowing ad hoc online analytical processing (OLAP). The two phases that follow data pre-processing are pattern discovery using intricate data mining algorithms, and pattern evaluation [9].
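A flat-file representation of activity sequences grouped by learner might look like the sketch below: one line per learner, the learner id, a tab, then that learner's access sessions in order. The tab and `" | "` separators are a hypothetical convention chosen for illustration, not a standard input format of any particular mining tool.

```python
from typing import Dict, List

SESSION_SEP = " | "  # hypothetical separator between a learner's access sessions

def to_flat_records(by_learner: Dict[str, List[List[str]]]) -> List[str]:
    """Serialize activity sequences grouped by learner, one line per learner."""
    return [
        learner + "\t" + SESSION_SEP.join(" ".join(s) for s in by_learner[learner])
        for learner in sorted(by_learner)
    ]

def from_flat_records(lines: List[str]) -> Dict[str, List[List[str]]]:
    """Inverse of to_flat_records (assumes activity names contain no spaces, tabs or '|')."""
    result: Dict[str, List[List[str]]] = {}
    for line in lines:
        learner, rest = line.split("\t", 1)
        result[learner] = [s.split() for s in rest.split(SESSION_SEP)]
    return result
```

Keeping the access-session boundaries inside each learner's line preserves the information needed later to regroup several access sessions into one learning session.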
3. Useful Data Mining Tasks
What is needed are summarized trends and patterns that can be interpreted by educators delivering their courses online. Due to the importance of e-commerce and the lucrative opportunities behind understanding online customers' purchasing behaviour, there is a tremendous research effort in developing data mining algorithms and systems tailored for e-business-related web usage mining [4]. In addition to the descriptive statistical analysis provided by most web access log analysis tools, such as computing hit frequency, mean and median values, the length and duration of sessions, and other limited low-level statistical measures, some data mining approaches have been adapted specifically for web usage mining. The most used methods are association rule mining, clustering, classification, sequential pattern analysis and dependency modeling [9], as well as prediction. These techniques are mainly used for personalization, system improvement (such as web caching and web traffic improvements), site modification, and business intelligence [9]. None of these applications, however, was tailored to distance learning, but the methods are general enough that e-learning systems could benefit from them. Association rule generation is the discovery of relationships between items in transactions. It is typically used for market basket analysis to discover rules of the form "x% of customers who buy items A and B also buy item C." Clustering is an unsupervised grouping of objects, while classification is a supervised grouping into predefined classes. In web mining, the objects could be users, events, sessions, pages, etc. Sequential pattern analysis is similar to association rule mining but takes the order of events into account. In other words, the fact that a page A is requested before another page B is captured in the patterns discovered.
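The difference between association rules and sequential patterns can be made concrete with the measures they rest on. The sketch below only computes support and confidence for given itemsets and support for a given ordered pattern; it is not a mining algorithm such as Apriori or FreeSpan [7], which efficiently enumerate all frequent itemsets or sequences. Treating each session as a transaction of learning activities is an assumption made for illustration.

```python
from typing import List, Sequence, Set

def support(itemset: Set[str], transactions: List[Set[str]]) -> float:
    """Fraction of transactions (sessions viewed as unordered sets) containing every item."""
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(antecedent: Set[str], consequent: Set[str],
               transactions: List[Set[str]]) -> float:
    """Confidence of antecedent -> consequent: support(A u C) / support(A)."""
    base = support(antecedent, transactions)
    return support(antecedent | consequent, transactions) / base if base else 0.0

def contains_in_order(pattern: Sequence[str], sequence: Sequence[str]) -> bool:
    """True if the pattern items appear in the sequence in the same order (gaps allowed)."""
    it = iter(sequence)
    return all(item in it for item in pattern)  # each `in` consumes the iterator

def sequence_support(pattern: Sequence[str], sequences: List[Sequence[str]]) -> float:
    """Fraction of sequences containing the ordered pattern."""
    return sum(contains_in_order(pattern, s) for s in sequences) / len(sequences)
```

For the sessions `A B C`, `B A C`, `A C` and `B D`, the itemset {A, B} has support 0.5, but the ordered pattern "A before B" has support only 0.25, which is exactly the ordering information an association rule ignores.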
All these techniques were designed for knowledge discovery from very large databases of numerical data [6], and were adapted for web mining and applied in online business with relative success.
4. Enhancing Web-Based Learning Environments
WebSIFT [1] is a set of comprehensive web usage tools that can perform many data mining tasks and discover a variety of patterns from web logs. A versatile system, WebLogMiner [16], uses data warehousing technology for pattern discovery and trend summarization from web logs. However, these wide-ranging tools are not integrated into e-learning systems, and it is cumbersome for an educator without extensive knowledge of data mining to use them to improve the effectiveness of web-based learning environments. A new web usage mining system dedicated to e-learning is being developed to allow educators to evaluate online learning activities [15]. For an educator using a web-based course delivery environment, it could be beneficial to track the activities happening on the course web site and extract patterns and behaviours that prompt changes to, improvements in, or adaptations of the course contents. For example, one could identify the paths frequently and regularly visited, the paths never visited, the clusters of learners based on the paths they follow, etc. For a learner using a web-based course delivery environment, it could be beneficial to receive hints from the system on what activity to perform next, based on similar behaviour by other "successful" learners. For example, the system could suggest shortcuts to frequently visited pages based on previous user activities, or suggest activities that made similar learners more "successful". It could also be beneficial if the system adapted the logical structure of the course content to the learner's pace, interests, or previous behaviour. Web-based course content is not always presented and structured in an intuitive way. By analyzing common traversal paths through the course content web pages, or frequent changes in individual traversal paths, the layout of the course can be reorganized or adapted to better fit the needs of a group or an individual.
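Two of the educator-side examples above, frequently followed paths and pages never visited, can be approximated from the session data with simple counting. This sketch reduces "paths" to page-to-page transitions and uses a hypothetical `min_count` threshold and page list; a real analysis would more likely mine longer sequential patterns.

```python
from collections import Counter
from typing import Dict, Iterable, List, Set, Tuple

def frequent_transitions(sessions: List[List[str]],
                         min_count: int = 2) -> Dict[Tuple[str, str], int]:
    """Count page-to-page transitions (A -> B) across sessions; keep the frequent ones."""
    counts: Counter = Counter()
    for s in sessions:
        counts.update(zip(s, s[1:]))  # consecutive pairs within one session
    return {t: c for t, c in counts.items() if c >= min_count}

def never_visited(site_pages: Iterable[str], sessions: List[List[str]]) -> Set[str]:
    """Pages of the course site that appear in no session at all."""
    visited = {p for s in sessions for p in s}
    return set(site_pages) - visited
```

A chapter that shows up in `never_visited`, or a transition that dominates `frequent_transitions` while bypassing an intended intermediate page, is the kind of signal that might prompt an educator to restructure the course layout.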
We see two types of data mining in the context of e-learning: off-line web usage mining and integrated web usage mining. Off-line web usage mining is the discovery of patterns with a standalone application. This pattern discovery process would allow educators to evaluate access behaviour, validate the learning models used, evaluate the learning activities, compare learners and their access patterns, etc. We have designed and implemented a prototype of such an application as a tool for educators; it uses association rules to discover relationships between the learning activities that learners perform, sequential analysis to discover interesting patterns in the sequences of online activities, and clustering to group similar access behaviours [15]. While most data mining algorithms need specific parameters and threshold values to tune the discovery process, the users of web usage mining applications in the context of e-learning, namely educators and e-learning site designers, are not necessarily versed in the intricate complexities of data mining algorithms. For this reason, we have tried to design new algorithms that need minimal input from the user and automatically adjust to the web log data at hand. In [3] we propose a completely non-parametric approach for clustering web sessions. Off-line web usage mining helps educators question and validate the learning models they use, as well as the structure of the web site as it is perused by the learners. In contrast, integrated web usage mining is a pattern discovery process incorporated into the e-learning application itself. This encompasses adaptive web sites, personalization of activities, and automatic recommenders that suggest activities to learners based on their preferences, their history of activities, and the access patterns discovered from the accesses of the whole community.
We are currently designing a recommender based on association rule mining, similar to the text categorization approach we developed in [14]. The idea consists of discovering relevant associations between learning activities and generating association rules that are applied in real time: when, in the current session, the activities in the antecedent of a rule have been performed, the activities in the consequent of the rule are suggested to the learner as the recommended next step in the learning session. The text categorization algorithm presented in [14] can also be used to automatically categorize the messages learners send on an asynchronous conferencing system, in order to help educators better assess the information exchanged in a course-related forum.
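The real-time part of such a recommender, firing rules whose antecedent is satisfied by the current session, can be sketched as follows. The rules are assumed to have been mined off-line; the `Rule` type, the activity names, and the choice to rank suggestions by the highest confidence of any firing rule are illustrative assumptions rather than the design of the recommender described above or of [14].

```python
from typing import FrozenSet, Iterable, List, NamedTuple

class Rule(NamedTuple):
    antecedent: FrozenSet[str]
    consequent: FrozenSet[str]
    confidence: float

def recommend(current_session: Iterable[str], rules: List[Rule], top_k: int = 3) -> List[str]:
    """Fire every rule whose antecedent activities have all been performed in the current
    session, and rank the consequent activities not yet performed by the highest
    confidence of any rule suggesting them."""
    done = set(current_session)
    scores = {}
    for rule in rules:
        if rule.antecedent <= done:
            for activity in rule.consequent - done:
                scores[activity] = max(scores.get(activity, 0.0), rule.confidence)
    return sorted(scores, key=lambda a: (-scores[a], a))[:top_k]
```

With a rule such as {ReadCh1} → {Quiz1}, a learner who has just read chapter 1 would be offered Quiz1; once the quiz is done, a rule like {ReadCh1, Quiz1} → {ReadCh2} takes over. Excluding already-performed activities keeps the suggestions forward-looking.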
5. Conclusions and Future Work
The Web is an excellent tool for delivering online courses in the context of distance education. However, relying only on statistical analysis of web traffic does not take advantage of the potential of the patterns hidden in web logs. Web usage mining is the non-trivial process of extracting useful, implicit and previously unknown patterns from the usage of the Web. Significant research is invested in discovering such patterns to increase the profitability of e-commerce sites. However, the goal of these applications and methods, "turning visitors into buyers", differs from the goal in e-learning: "turning learners into effective, better learners." We have seen some examples of how data mining techniques can enhance online education for educators as well as learners.
While some tools that use data mining techniques to help educators and learners are being developed, the research is still in its infancy. In addition, given the potential advantages of integrated web usage mining and the insufficient data recorded by web servers, there is a need for more specialized logs on the application side to enrich the information already logged by the web server. The added value of recording specific events on the e-learning side will give the clickstreams, and the patterns discovered from them, better meaning and interpretation.
References
[1] R. Cooley, B. Mobasher, J. Srivastava, "Web Mining: Information and Pattern Discovery on the World Wide Web," Proceedings of the 9th IEEE International Conference on Tools with AI, 1997.
[2] H. A. Edelstein, "Pan for Gold in the Clickstream," InformationWeek, March 2001, http://www.informationweek.com/828/mining.htm
[3] A. Foss, W. Wang, O. R. Zaïane, "A Non-Parametric Approach to Web Log Analysis," Proc. Web Mining Workshop, in conjunction with the SIAM International Conference on Data Mining, Chicago, IL, USA, April 7, 2001.
[4] M. N. Garofalakis, R. Rastogi, S. Seshadri, K. Shim, "Data Mining and the Web: Past, Present and Future," Proceedings of WIDM'99, Kansas City, USA, 1999.
[5] C. Groeneboer, D. Stockley, T. Calvert, "Virtual-U: A Collaborative Model for Online Learning Environments," Proceedings of the Second International Conference on Computer Support for Collaborative Learning, Toronto, Ontario, December 1997.
[6] J. Han, M. Kamber, "Data Mining: Concepts and Techniques," Morgan Kaufmann Publishers, 2001.
[7] J. Han, J. Pei, B. Mortazavi-Asl, Q. Chen, U. Dayal, M.-C. Hsu, "FreeSpan: Frequent Pattern-Projected Sequential Pattern Mining," Proc. 2000 Int. Conf. on Knowledge Discovery and Data Mining (KDD'00), Boston, MA, August 2000.
[8] M. Spiliopoulou, L. C. Faulstich, K. Winkler, "A Data Miner Analyzing the Navigational Behaviour of Web Users," Proceedings of the Workshop on Machine Learning in User Modeling of ACAI'99, Creta, Greece, July 1999.
[9] J. Srivastava, R. Cooley, M. Deshpande, P. Tan, "Web Usage Mining: Discovery and Applications of Usage Patterns from Web Data," SIGKDD Explorations, Vol. 1, No. 2, Jan. 2000.
[10] The International Workshop on Web Knowledge Discovery and Data Mining, Kyoto, Japan, April 18, 2000, http://www.ntu.edu.sg/home/awkng/wkddm2000.htm
[11] Third International Workshop on Advanced Issues of E-Commerce and Web-based Information Systems, San Jose, CA, USA, June 21-22, 2001, http://www.chutneytech.com/wecwis2001.html
[12] Third WEBKDD Workshop on Data Mining for Web Applications: Mining Log Data Across All Customer TouchPoints, San Francisco, CA, USA, August 26, 2001, http://robotics.Stanford.EDU/~ronnyk/WEBKDD2001/index.html
[13] WebCT: http://www.webct.com/
[14] O. R. Zaïane, Maria-Luiza Antonie, "Automatic Text Categorization Using Association Rule Mining," submitted to the Journal of Intelligent Information Systems, Special Issue on Automated Text Categorization, 2001.
[15] O. R. Zaïane, J. Luo, "Towards Evaluating Learners' Behaviour in a Web-Based Distance Learning Environment," Proc. IEEE International Conference on Advanced Learning Technologies (ICALT 2001), Madison, WI, USA, August 6-8, 2001.
[16] O. R. Zaïane, M. Xin, J. Han, "Discovering Web Access Patterns and Trends by Applying OLAP and Data Mining Technology on Web Logs," Proceedings of ADL'98: Advances in Digital Libraries, Santa Barbara, 1998.