Cyber Bullying Detection using Machine Learning

Introduction

With easy access to the internet, and social media becoming an integral part of our day-to-day lives, cyberbullying has been an issue for many years, especially among young people. Cyberbullying is an aggressive act that harasses, humiliates or threatens other people via the Internet, social media or other electronic content. The two main factors contributing to this social menace are anonymity and the lack of meaningful supervision in the electronic medium [3]. With anonymity, anyone can create a fake account and post content such as news or articles to attack other people.

Since online material spreads fast and reaches a wide audience [6], cyberbullying has a severe impact on individuals and can result in major problems such as anxiety, depression, low self-worth, and even suicide in some cases. Social scientists such as Danah Boyd have described four aspects of the web that change the very dynamics of bullying and magnify it to new levels: persistence, searchability, replicability and invisible audiences [3].

Based on the previous work and research related to cyberbullying, very little attention has been devoted to its detection beyond regular-expression-driven systems based on keywords [3]. An effective cyberbullying detection system should prevent or decrease cyberbullying incidents in cyberspace. One of the major issues in detecting cyberbullying is judging the severity of an instance, since a victim can be exposed to multiple instances, such as embarrassing pictures or name calling, which can be delivered through various modes and reach a large audience in no time.

Machine learning can help us detect language patterns used by bullies and their victims and develop rules to automatically detect cyberbullying content [1]. To protect victims from these incidents, the texts in messages should be monitored, processed and analyzed as quickly as possible in order to support real-time decisions [34]. Since manual detection is time consuming and hard to implement, automatic cyberbullying detection is preferred. Different classification algorithms such as SVM, J48, the Naive Bayes Multinomial model, and IBk can be used to detect cyberbullying. In this paper, we discuss the sources most likely to contain cyberbullying content, the collection and preprocessing of data based on previous studies, the use of different algorithms to detect cyberbullying, and the performance evaluation of these algorithms.

State of the Art

In the literature on cyberbullying detection, the focus has been directed towards the content of the conversations [5]. In [18], Yin et al. considered only the content of online posts and not the characteristics of their authors. Sentiment, content and contextual features were used to train a classifier for a corpus of online posts. Textual cyberbullying has also been highlighted in paper [3] and is considered the most common form of cyberbullying. By performing experiments with both binary and multiclass classifiers, Dinakar et al. concluded that, compared to multiclass classifiers, individual classifiers can give better performance in cyberbullying detection. Another study on analyzing written text and different features has been done in paper [5], which focuses on determining authorship, for example the authorship of emails.

Instead of the content of the text written by users, papers [6] and [8] mainly focus on users' information and characteristics, such as gender and behavior, to improve the accuracy of cyberbullying detection. Four types of features were applied to train an SVM classifier, first on the posts written by both genders together and then separately for each gender.

To summarize, the related work on detecting cyberbullying through computational means can be grouped into three areas: understanding the cause, text categorization or topic modelling, and application to real-world problems.

Methodology

This section provides an overview of the available literature on the different algorithms and techniques proposed for cyberbullying detection.

Data Collection

With social media as the main source of data for cyberbullying, studies have revealed that YouTube, Twitter, and MySpace are the most common platforms where cyberbullying occurs. Apart from these social networking sites, the website Formspring.me allows users to post content anonymously on other users' pages, making it highly prone to cyberbullying.

Feature Selection

Feature selection is done to choose the most informative features from the data so that further analysis can be carried out more easily. Two well-known methods used for feature selection are Chi-Square (CHI2) and Information Gain (IG). The CHI2 test measures whether the occurrence of a specific term and the class are independent, whereas IG computes how much information the presence or absence of a feature provides about the class distribution. Attributes with high mutual information are preferred over other features.

For example, feature selection in the Naive Bayes Multinomial (NBM) model is done by selecting the words that have the highest average mutual information with the class variable [30].
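As an illustration, the sketch below selects terms from a bag-of-words document-term matrix using both CHI2 and an estimate of mutual information. It assumes scikit-learn; the example posts, labels and the choice of k are made up for illustration only.

```python
# Sketch of CHI2 and mutual-information feature selection on a document-term matrix.
# Assumes scikit-learn; the toy posts and labels are illustrative only.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_selection import SelectKBest, chi2, mutual_info_classif

posts = ["you are so stupid", "great game last night", "nobody likes you loser"]
labels = [1, 0, 1]  # 1 = bullying, 0 = non-bullying (toy labels)

# Bag-of-words document-term matrix
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(posts)

# Keep the k terms with the highest chi-square score with respect to the class
chi2_selector = SelectKBest(chi2, k=5)
X_chi2 = chi2_selector.fit_transform(X, labels)

# Alternatively, rank terms by their estimated mutual information with the class
mi_selector = SelectKBest(mutual_info_classif, k=5)
X_mi = mi_selector.fit_transform(X, labels)
```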

Generally, for cyberbullying detection the features can be categorized into four kinds: content-based, sentiment-based, user-based and network-based. Content-based features are the extractable lexical items of a document such as keywords, pronouns, document length, profanity and punctuation. It is important to avoid a plain bag-of-words approach, since the resulting feature space is large and impractical. Sentiment-based features are those which express the sentiments in a document. User-based features are the characteristics of a user's profile, including age, gender and sexual orientation, while network-based features are usage metrics that can be extracted from the online social network, such as number of friends, number of followers, frequency of posting, etc. [36].

Data Preprocessing

Data preprocessing involves clean-up steps such as the removal of special characters, stop words, repeated character sequences (e.g. 'lolll') and extra white space, so that the data contains only relevant information and further analysis of the text is convenient. Some of the steps involved in data preprocessing are listed below, followed by a short illustrative sketch:

  • Tokenization: Unstructured text is split on white space into a set of tokens, which are further classified into words or sentences.
  • Stop word Removal: Removing stop words such as “a”, “and” and “are” facilitates easier processing of the text.
  • Special Characters: Characters such as “@” are replaced with “at”, and characters such as “#” are removed. It is important to handle these characters since they occur very frequently, especially in tweets [36].
  • Stemming and Lemmatization: Stemming is a method that truncates the suffixes and prefixes of a word, while lemmatization is a dictionary-based approach to obtain the root form of a word, called the lemma.
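A minimal sketch of these preprocessing steps, assuming the NLTK library with its 'punkt', 'stopwords' and 'wordnet' resources downloaded; the helper function and the example tweet are illustrative only.

```python
# Preprocessing sketch: character clean-up, tokenization, stop word removal and
# lemmatization. Assumes NLTK with the required resources downloaded.
import re
from nltk.tokenize import word_tokenize
from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer

def preprocess(text):
    text = text.lower()
    text = text.replace("@", "at ")             # replace "@" with "at"
    text = text.replace("#", "")                # drop "#" symbols
    text = re.sub(r"(.)\1{2,}", r"\1\1", text)  # squeeze repeats: 'lolll' -> 'loll'
    tokens = word_tokenize(text)                # tokenization
    tokens = [t for t in tokens if t.isalpha()]                          # drop punctuation
    tokens = [t for t in tokens if t not in stopwords.words("english")]  # stop word removal
    lemmatizer = WordNetLemmatizer()
    return [lemmatizer.lemmatize(t) for t in tokens]                     # lemmatization

print(preprocess("@john you are such a loser!!! #bully lolll"))
```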

The Model

After the data sets have been collected and preprocessed, the data is annotated with labels and is further classified with the help of a classification algorithm. According to previous research, the NBM (Naive Bayes Multinomial) model, IBk, J48 and SVM (Support Vector Machine) are the most commonly used classifiers for cyberbullying detection. Here we discuss how each of these classification algorithms can be used and compare their performance.

Support Vector Machine

SVM (Support Vector Machine) is a supervised learning algorithm and is one of the most efficient and powerful classifiers in machine learning. It aims to find the maximum margin hyperplane that separates the two classes and selects a small number of boundary instances called support vectors from each class [38]. The process is initialized by getting the data ready to train the classifier. This includes:

  • Labelling of the data [33]
  • Generation of vocabulary [33]
  • Creation of document-term matrix [33]

After the document-term matrix is created, the optimal hyperplane is selected so as to maximize the margin on the training data. The input data is then passed through the trained classifier to classify it as bullying or non-bullying.

SVM can also be used with kernel functions, namely:

  • RBF kernel (Radial basis function) [33]
  • Gaussian kernel [33]
  • Linear kernel [33]

The linear kernel is a special case of the RBF kernel. In paper [33], the linear kernel is applied to datasets obtained from three different social networking sites, namely MySpace, Kongregate and Slashdot, while in paper [3] a poly-2 kernel was used on data collected from YouTube. The datasets are collected in the form of XML files, each containing a thread of multiple posts which are extracted as single data elements. Every data element is treated as a single document and assigned an appropriate weight.

According to the research and experiments done, SVM with a linear kernel gives an accuracy of 79.6% using unigrams and 81.3% using bigrams, leading to the conclusion that bigrams should be used with the linear-kernel SVM [33]. Based on the data collected from Twitter (1,762 tweets) in paper [33], it can be concluded that a linear SVM with bigrams gives better accuracy.
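As a rough illustration of this pipeline, the sketch below labels a handful of posts, builds a bigram document-term matrix and trains a linear-kernel SVM. It assumes scikit-learn; the toy posts, labels and parameters are made up and are not the exact setup of [33].

```python
# SVM pipeline sketch: label data, build a bigram document-term matrix,
# train a linear-kernel SVM, then classify new posts. Assumes scikit-learn.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.svm import SVC
from sklearn.pipeline import make_pipeline

train_posts = ["you are pathetic and everyone hates you",
               "looking forward to the weekend",
               "go away nobody wants you here",
               "thanks for the help with the homework"]
train_labels = [1, 0, 1, 0]  # 1 = bullying, 0 = non-bullying

# Bigram vocabulary + document-term matrix, followed by a linear-kernel SVM
model = make_pipeline(
    CountVectorizer(ngram_range=(2, 2)),  # bigrams, as suggested by [33]
    SVC(kernel="linear"),                 # "rbf" or "poly" kernels could be used instead
)
model.fit(train_posts, train_labels)

print(model.predict(["nobody wants you here"]))
```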

Naive Bayes Multinomial (NBM) Model

NBM is a text classifier based on Bayes' theorem. It is a simple probabilistic classification algorithm that calculates probabilities from a combination of features and the word frequencies captured in documents.

In the multinomial model, a document is an ordered sequence of word events, drawn from the same vocabulary V [30]. This model includes two assumptions: first, the probability of each word event in a document is independent of the word’s context and position in the document [30] and second, the lengths of the documents are independent of the class [30].
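A minimal sketch of a multinomial Naive Bayes text classifier, assuming scikit-learn's MultinomialNB over raw word counts; the toy posts and labels are illustrative only.

```python
# Naive Bayes Multinomial sketch: word counts per document feed a multinomial
# event model. Assumes scikit-learn; the toy data is illustrative only.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

posts = ["you are so dumb and ugly",
         "see you at practice tomorrow",
         "everyone thinks you are a loser",
         "congrats on the new job"]
labels = [1, 0, 1, 0]  # 1 = bullying, 0 = non-bullying

nbm = make_pipeline(CountVectorizer(), MultinomialNB())
nbm.fit(posts, labels)

# Class probabilities for a new post
print(nbm.predict_proba(["you are a loser"]))
```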

J48

J48 is a popular univariate decision tree algorithm that divides the data into classes by passing it through a decision tree. It is an implementation of the C4.5 algorithm and creates a decision tree model from the attributes provided [2]; the resulting tree can then be used to decide the class to which new data should be assigned.

Some of the steps to construct the tree are:

  • First, check whether all cases belong to the same class; if so, the tree is a leaf and is labelled with that class [33].
  • Otherwise, calculate the information and the information gain (IG) for each attribute.
  • Based on the current selection criterion, find the best splitting attribute.

With decision trees, it is important to consider the size of the tree, since large trees may overfit the data. The internal nodes of the decision tree represent the various attributes, the branches represent the values of these attributes, and the leaf nodes give the final classification of the data. The best splitting feature is chosen based on information gain in order to facilitate the classification process.

The input data (for example, tweets collected through the Twitter API) is then passed through the classifier. The advantage of using this classifier is that it can take tables of data with a huge number of columns and reduce them to a simple decision tree.
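J48 itself is Weka's implementation of C4.5; as a rough stand-in, the sketch below uses scikit-learn's decision tree with the entropy (information gain) splitting criterion and a depth limit to keep the tree small. The toy data and parameters are illustrative assumptions, not an exact reproduction of J48.

```python
# Decision-tree sketch approximating J48/C4.5: scikit-learn's tree with the
# entropy criterion splits on information gain. Toy data is illustrative only.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.tree import DecisionTreeClassifier
from sklearn.pipeline import make_pipeline

posts = ["shut up you idiot",
         "nice goal in the match",
         "you idiot nobody likes you",
         "happy birthday to you"]
labels = [1, 0, 1, 0]  # 1 = bullying, 0 = non-bullying

tree = make_pipeline(
    CountVectorizer(),
    DecisionTreeClassifier(criterion="entropy", max_depth=5),  # limit size to avoid overfitting
)
tree.fit(posts, labels)

print(tree.predict(["you idiot"]))
```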

IBk

The instance-based (IBk) algorithm is a k-nearest-neighbor classification algorithm that computes the distance of a new instance to all the other instances in the training set [32]. The k instances nearest to the new instance are then found, and a label is assigned to the new instance based on the majority class label of these k nearest training instances.

Performance is measured with the F-measure value:

F-measure = (2 × recall × precision) / (recall + precision) [38]

Precision for IBk can be defined as the ratio of the number of instances correctly labeled as belonging to the positive class to the total number of instances labeled as positive [38]; it is also called the exactness of the classification algorithm. Recall is the completeness of the classification algorithm: it is the ratio of the number of true positives to the total number of elements that actually belong to the positive class [38].
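A minimal sketch, assuming scikit-learn's k-nearest-neighbour classifier as a stand-in for Weka's IBk, with precision, recall and the F-measure computed on a made-up test split; the value of k and the data are illustrative only.

```python
# IBk sketch: k-nearest-neighbour classification plus precision, recall and F-measure.
# Assumes scikit-learn; the toy split and k value are illustrative only.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import precision_score, recall_score, f1_score
from sklearn.pipeline import make_pipeline

train_posts = ["you are worthless", "great to see you",
               "nobody likes you freak", "well played today"]
train_labels = [1, 0, 1, 0]
test_posts = ["you are a freak", "see you today"]
test_labels = [1, 0]

ibk = make_pipeline(CountVectorizer(), KNeighborsClassifier(n_neighbors=3))
ibk.fit(train_posts, train_labels)
pred = ibk.predict(test_posts)

print("precision:", precision_score(test_labels, pred))
print("recall:   ", recall_score(test_labels, pred))
print("F-measure:", f1_score(test_labels, pred))  # 2 * P * R / (P + R)
```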

Evaluation & Results

Prior work on the assessment of classifiers suggests that accuracy alone is an insufficient metric to gauge reliability [3]. Along with accuracy, kappa statistic (Cohen’s Kappa) has been used in previous work to evaluate classifiers.
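As an illustration, the sketch below computes accuracy together with Cohen's Kappa for a set of predictions, assuming scikit-learn; the label vectors are made up.

```python
# Cohen's Kappa alongside accuracy for a set of predictions. Assumes scikit-learn;
# the true/predicted labels below are made up for illustration.
from sklearn.metrics import accuracy_score, cohen_kappa_score

y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 0, 0, 1, 1]

print("accuracy:", accuracy_score(y_true, y_pred))    # fraction of correct labels
print("kappa:   ", cohen_kappa_score(y_true, y_pred)) # agreement corrected for chance
```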

In terms of Kappa values, SVM proved to be the most reliable, followed by Naive Bayes and J48 [3].

Based on the research done in this area and as stated with figures in paper [38], when compared without any feature selection, NBM is the best classifier. However, when the feature selection is applied the best classifier is IBk. In terms of accuracy, the performance of all the classifiers improves with feature selection except J48.

The poor performance of J48 with feature selection can be attributed to the fact that J48 is a decision tree algorithm; since decision trees can also act as feature selectors, the algorithm already selects the best features while building the tree.

When the classifiers are compared based on running time [38], NBM shows the best performance, followed by SVM. IBk has a high testing time, while J48 has a long training time. Thus, in terms of accuracy and running time, NBM proved to be the best classification algorithm.

Future Work

For future work, topic modelling techniques such as Hidden Topic Markov Models (HTMM) can be used to retrieve information and classify the data into predefined categories, which may yield better results. It would also be beneficial to implement a classifier at the source itself, so that cyberbullying can be detected as early as possible and measures can be taken to control it. Better user interface and experience design patterns can also be used to detect and control textual abuse.

Sentiment mixture models for multiple views of a given post and topic-author-community models that consider social interaction variables would be interesting to pursue [3].

Moreover, a grading system can be introduced which would categorize the bullying instances, thus making it easier to deal with each of the instances [33].

References

  1. K. Reynolds, A. Kontostathis, and L. Edwards, Using Machine Learning to Detect Cyberbullying, in Proc. 10th International Conference on Machine Learning and Applications, 2011.
  2. R. Quinlan, C4.5: Programs for Machine Learning. San Mateo, CA: Morgan Kauffman, 1993.
  3. K. Dinakar, R. Reichart, and H. Lieberman, “Modeling the Detection of Textual Cyberbullying,” in Proc. Fifth International AAAI Conference on Weblogs and Social Media (SWM ’11), 2011.
  4. Boyd, Danah. (2007) “Why Youth (Heart) Social Network Sites: The Role of Networked Publics in Teenage Social Life.” MacArthur Foundation Series on Digital Learning – Youth, Identity, and DigitalMedia Volume (ed. David Buckingham). Cambridge, MA: MIT Press.
  5. P. Galan-Garcia, J. Puerta, C. Gomez, I. Santos, & P. Bringas, Supervised machine learning for the detection of troll profiles in twitter social network: application to a real case of cyberbullying, 2014.
  6. M. Dadvar, F. de Jong, R. Ordelman and R. Trieschnigg, Improved Cyberbullying Detection using Gender Information, 2012.
  7. A. Kontostathis, K. Reynolds, A. Garron, & L. Edwards, Detecting Cyberbullying: Query Terms and Techniques, 2013.
  8. M. Dadvar, F. de Jong, Cyberbullying Detection: A Step Toward a Safer Internet Yard, 2012.
  9. Yin, D., Xue, Z., Hong, L., Davison, B.D., Kontostathis, A., Edwards, L. 2009. Detection of harassment on Web 2.0. Proceedings of the Content Analysis in the WEB 2.0 (CAW2.0) Workshop at WWW2009, Madrid, Spain.
  10. Jerome H. Friedman. On bias, variance, 0/1 - loss, and the curse-of-dimensionality. Data Mining and Knowledge Discovery, 1:55–77, 1997.
  11. A. McCallum, and K. Nigam, “A comparison of event models for Naïve Bayes text classification,” AAAI-98 Workshop on Learning for Text Categorization, pp. 41–48, 1998.
  12. D. Aha and D. Kibler, “Instance-based learning algorithms,” Machine Learning, vol. 6, pp. 37-66, 1991.
