Mosquito Species Detection using Smart Phone
Abstract
The World Health Organization (WHO) declares that mosquitoes are the main disease-transmitting insects and present a substantial danger. In 2015, there were 214 million instances of malaria reported globally. Furthermore, mosquitoes play a crucial role in transmitting the Zika virus, another extremely hazardous disease.
According to a CDC report, the Puerto Rico Department of Health (PRDH) received 62,500 suspected cases of Zika in 2016, with 29,345 confirmed positive cases. Of the roughly 3,500 species of mosquitoes found worldwide, only about 175 are present in the United States, and only a few of these transmit the deadly diseases mentioned above. Hence, it is essential to classify mosquitoes as hazardous or non-hazardous.
In this study, we sought to classify 7 different species of dead mosquitoes using 60 samples obtained from the Hillsborough County Mosquito and Aquatic Weed Control Unit in Tampa, Florida. Our goal was to create a method that would enable both laypeople and experts to identify potential risks and take proactive measures. To achieve this, we used smartphone cameras to capture images of the mosquitoes and applied image preprocessing techniques to remove noise. We then used the random forest classification algorithm to improve accuracy. The results showed favorable precision, recall, and F1 measure, with an overall accuracy of 83.3%.
We have plans to develop a smartphone app that will utilize our learning model. This app aims to assist individuals in identifying different species of mosquitoes, even if they have no prior knowledge in this field.
Introduction
When it comes to spreading diseases, mosquitoes pose one of the greatest dangers among all animals. Malaria, Dengue, West Nile Fever, and Zika Fever are some of the mosquito-borne illnesses that have caused significant harm to humanity. Combatting the proliferation of mosquitoes is considered an essential agenda by the global health sector. Organizations like the American Mosquito Control Association (AMCA) work towards addressing this issue through various programs aimed at educating citizens about mosquito hazards and control methods. The AMCA operates in over 50 countries worldwide. According to the Centers for Disease Control and Prevention (CDC), there are approximately 3,500 mosquito species globally, with around 175 found in the United States.
Among programs designed to combat the spread of mosquitoes, it is vital to identify the types and quantities of species in a given area. Mosquito control organizations across the globe have dedicated personnel who use traps to catch mosquitoes in specific regions. These personnel visually examine each captured sample using magnifying glasses to determine the mosquito species. Identifying each sample takes approximately one minute, so with a large number of samples the identification process can consume hours of manual effort. This paper aims to create a system that combines smartphone camera images with machine learning algorithms to automatically detect the species of mosquitoes from their pictures. To achieve this goal, our contributions are as follows.
a) Constructing a mosquito image database: We collected numerous samples from traps set up by the Hillsborough County Mosquito and Aquatic Weed Control Unit in Tampa during Fall 2016. The unit's personnel then helped us visually identify the species of each sample, yielding 60 samples belonging to seven different species. The table presents our database. Each sample was imaged with a Samsung Galaxy S5 phone from multiple angles under consistent indoor lighting conditions, resulting in a total of 200 images. This collection served as our database for subsequent classification.
b) Designing pre-processing techniques: Images are typically susceptible to various types of noise caused by differing environmental conditions and user expertise. Pre-processing is therefore needed to remove noise and smooth the image while preserving its key information, in particular its edges and boundaries. To achieve this, we used the median filter, a widely used technique in image processing.
c) Designing Random Forest based classifiers: Random Forest is a supervised machine learning algorithm that combines multiple decision trees. Each tree is built using a random subset of the training dataset. Compared with other classification algorithms, Random Forest has demonstrated significant improvements in accuracy, and it is effective in handling outliers and noise. Because each tree sees only a subset of the training set, Random Forest also handles larger datasets efficiently and is less prone to overfitting. We conducted a thorough performance evaluation of our proposed techniques on 60 image samples from seven different species. Using 10-fold cross-validation, we achieved an accuracy of 83.3% with RGB features.
The paper is organized as follows. Section II covers related work, and Section III describes the experimental setup and data collection process. Section IV details the preprocessing of image data, feature extraction and selection, the construction of the learning model using a classification method, and the metrics used to report the results. Section V discusses experimental evaluation and validation. Finally, Sections VI and VII present the discussion and conclusion, respectively.
Related work
There have been numerous studies focused on utilizing smartphone cameras for image recognition. In this section, we highlight several relevant and significant works.
A. Related Work on Image Recognition: One study aimed to assess the effectiveness of soil treatment on plant stress using smartphone cameras. The researchers captured 34 images of plant leaves with a smartphone in two different soils: biosolids and unamended tailings. They applied mean and median filters as preprocessing, segmented the images into pixels, and extracted RGB, R, G, B, HSV, and YCbCr features from the segmented pixels. A Random Forest classifier, a supervised classification method, was then trained to identify leaf stress and achieved an accuracy of 91.24%.
Another work surveyed techniques for detecting skin color at the pixel level, covering color spaces such as RGB, Normalized RGB, HSV, and YCrCb. Among these, RGB is the color space most commonly used for manipulating and storing digital images.
Wen et al. put forth an image-based method for insect identification and classification. Their experiment involved eight selected insect species. The insects were killed non-destructively by freezing and then placed on a white balance panel. The specimens were observed under a Nikon stereoscopic zoom microscope SMZ1000 equipped with a Plan Apochromat 0.5 objective, and a DS-Fi1 color digital camera was used to capture images of the insects through the microscope. The features analyzed in these images include color, texture, invariants, contour, and geometric properties. For color analysis, HSV color space features were used.
Many classification algorithms were used for training and testing the model, including the minimum least square linear classifier (MLSLC), normal densities based linear classifier (NDLC), K nearest neighbor classifier (KNNC), nearest mean classifier (NMC), and decision tree (DT). Of these classifiers, NDLC outperformed the others.
1) Comparing our Work w.r.t.
Our main objective is to collect mosquito images with smartphone cameras and use these images to train and test a learning model. Earlier work was able to identify insect species, but it required a laboratory setup with a microscope and a high-resolution digital camera, which are not typically found in households. To classify the mosquitoes, we extracted RGB features, RGB being a frequently used color space. The physical attributes of dead mosquitoes, such as color and fragility, change over time, so to keep conditions consistent, all images of dead mosquitoes were captured on the same day.
A Samsung Galaxy S5 smartphone was used to capture images under normal daylight conditions, following the guidelines on the Mosquito and Aquatic Weed Control Unit's website. For our study, we employed the knowledge aware fusion technique and took a total of 60 images using the camera settings listed in the Table.
Our Approach
In our approach, we implemented two steps. Firstly, we applied pre-processing on the images to eliminate noise and select relevant features by utilizing filters such as median and mean. Secondly, we constructed a learning model by employing a classification algorithm based on random forest.
Our main objective is to develop a learning model that can identify each species of mosquitoes. The challenge we encountered was the size of the images. The images captured from smartphones are originally 2988 X 5322 pixels in size. To reduce the data dimensionality, we resized them to 256 X 256 pixels.
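The resizing step takes each image from 2988 x 5322 (about 15.9 million pixels) down to 256 x 256 (65,536 pixels). The source does not say which interpolation method was used, so the following is only a minimal sketch that assumes nearest-neighbor sampling; `resize_nearest` and the tiny stand-in image are hypothetical names and data for illustration.

```python
def resize_nearest(image, new_h, new_w):
    """Downsample a 2-D grid of pixels to new_h x new_w.

    Assumption: nearest-neighbor sampling. Each output pixel copies the
    source pixel at the proportionally scaled row and column.
    """
    old_h, old_w = len(image), len(image[0])
    return [
        [image[r * old_h // new_h][c * old_w // new_w] for c in range(new_w)]
        for r in range(new_h)
    ]


# Tiny stand-in image: 4 rows x 6 columns of (R, G, B) tuples.
tiny = [[(r, c, 0) for c in range(6)] for r in range(4)]
small = resize_nearest(tiny, 2, 3)  # a real call would use (256, 256)
```

In practice an image library with an averaging or bicubic resampler would likely give smoother results than nearest-neighbor sampling.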
We applied a median filter to remove noise from each sample, as explained in the following subsection.
As our images were initially dark, it was crucial to maintain contrast between the background and foreground for accurate model building.
Therefore, we decided not to use a segmentation technique, as it would turn the background into black.
To train and test our model, we utilized the Random Forest algorithm, a supervised learning method, alongside a 10-fold cross validation technique.
The flow of our algorithm is illustrated in Figure.
To continue, we require labeled image data for model training.
All images were manually tagged by mosquito experts for proper identification.
Noise Removal
Digital images often experience different types of noise during capture and transmission, which can negatively impact result accuracy.
Several filters can be used to reduce noise or enhance detail in images.
Sharpening Filter: It enhances the edges and line details in an image. The original image is passed through a high pass filter to extract its high frequency components. The scaled output of the high pass filter is then added to the original image, resulting in a sharpened image.
Mean Filter: Also known as an average filter, it replaces each pixel value in an image with the mean of the pixel values of its neighbors within a sliding window of size n*n. A larger window removes more noise, at the cost of blurring edges.
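The mean filter and the sharpening step described above can be sketched on a single-channel image stored as a list of rows. Two points are assumptions rather than details from the source: border pixels are handled by replicating the nearest edge pixel, and the high pass component is taken as the image minus its mean-filtered copy. The function names are hypothetical.

```python
def mean_filter(image, n=3):
    """Replace each pixel by the mean of its n x n neighborhood.

    Assumption: pixels outside the image are replaced by the nearest
    edge pixel (edge replication).
    """
    h, w = len(image), len(image[0])
    k = n // 2
    out = []
    for r in range(h):
        row = []
        for c in range(w):
            window = [
                image[min(max(r + dr, 0), h - 1)][min(max(c + dc, 0), w - 1)]
                for dr in range(-k, k + 1)
                for dc in range(-k, k + 1)
            ]
            row.append(sum(window) / len(window))
        out.append(row)
    return out


def sharpen(image, amount=1.0, n=3):
    """Add the scaled high-frequency part back to the original image.

    Assumption: high pass output = original - mean-filtered copy.
    Results are clamped to the 0..255 intensity range.
    """
    blurred = mean_filter(image, n)
    return [
        [min(255.0, max(0.0, p + amount * (p - b))) for p, b in zip(row, brow)]
        for row, brow in zip(image, blurred)
    ]


# A single bright pixel on a dark background.
spot = [[10, 10, 10], [10, 100, 10], [10, 10, 10]]
smoothed = mean_filter(spot)   # the bright pixel is averaged down
sharpened = sharpen(spot)      # the bright pixel is pushed further up
```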
Median Filter: It is a nonlinear filtering technique used in digital image processing that replaces each pixel value with the median of all pixel values in an n*n window around it. This technique preserves edges while removing noise. We used a 3*3 pixel window to remove noise from our digital images. The output with and without the median filter is shown in Figure.
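As a rough illustration of why the median filter removes isolated noise while keeping edges, here is a minimal sketch of a 3*3 median filter on a single-channel image. It assumes edge replication at the borders and, for a color image, that the filter would be applied to each channel separately; neither detail is stated in the source, and `median_filter` is a hypothetical name.

```python
from statistics import median


def median_filter(image, n=3):
    """Replace each pixel by the median of its n x n neighborhood.

    Assumption: pixels outside the image are replaced by the nearest
    edge pixel (edge replication).
    """
    h, w = len(image), len(image[0])
    k = n // 2
    return [
        [
            median(
                image[min(max(r + dr, 0), h - 1)][min(max(c + dc, 0), w - 1)]
                for dr in range(-k, k + 1)
                for dc in range(-k, k + 1)
            )
            for c in range(w)
        ]
        for r in range(h)
    ]


# A vertical edge (10 -> 200) with one noisy pixel (255) on the dark side.
noisy = [
    [10, 10, 10, 200, 200],
    [10, 255, 10, 200, 200],
    [10, 10, 10, 200, 200],
]
cleaned = median_filter(noisy)
```

In this example the isolated 255 pixel is replaced by 10, while the boundary between the 10 and 200 regions stays exactly where it was.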
Feature Selection
Feature extraction and selection are critical parts of any supervised learning algorithm. Feature extraction reduces the dimensionality of the data. As the size and dimension of the data increase, this becomes difficult to handle manually, so automation becomes necessary.
Feature selection involves choosing the features most relevant to the problem and removing irrelevant or redundant features that do not improve the accuracy of the learning model. In our model, we identify various species of mosquitoes, each with distinct colors.
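One common relevance score for this kind of feature selection is information gain: the reduction in class-label entropy obtained by splitting the samples on a feature. The sketch below assumes features have already been discretized into a small number of values (continuous pixel intensities would need binning first). The function names and toy data are hypothetical, and this is not necessarily the implementation used in the study.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(values, labels):
    """Label entropy minus the weighted entropy after splitting on a feature."""
    total = len(labels)
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def top_k_features(columns, labels, k):
    """Indices of the k feature columns with the highest information gain."""
    scores = [(information_gain(col, labels), i) for i, col in enumerate(columns)]
    return [i for _, i in sorted(scores, key=lambda s: (-s[0], s[1]))[:k]]


# Toy data: the first feature is uninformative, the second separates the classes.
labels = ["species_a", "species_a", "species_b", "species_b"]
uninformative = ["x", "y", "x", "y"]
informative = ["dark", "dark", "light", "light"]
selected = top_k_features([uninformative, informative], labels, k=1)
```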
The figure illustrates that all mosquitoes have similar shapes but differ in body and wing color. Therefore, it is essential to consider the correct color channels or their combination for the features. Two examples of color channels are RGB and HSV. RGB consists of Red, Green, and Blue channels, with each component capable of intensity levels ranging from 0 to 255 (integer).
Here, we extracted RGB features from the mosquito image data. For feature selection, we applied the Information-Gain attribute selection algorithm, which is a good measure of the relevance of an attribute. This feature selection technique generally helps in achieving high accuracy. Using it, we obtained 1000 features, which serve as the input vector x to the Random Forest classification algorithm for species detection. We calculated precision, recall, and F1-measure, which are reported in Table.
Random Forest Algorithm: Random Forests (RF) is an ensemble supervised machine learning algorithm. It consists of a set of decision trees h(x, Θ_k), k = 1, 2, ..., K, where x is a feature vector extracted from the smartphone image data and the Θ_k are independent, identically distributed random vectors. For an example x, the value of attribute a is written x_a ∈ vals(a). Each decision tree predicts a class independently, and a voting mechanism determines the final predicted class: the class that receives the majority vote is chosen.
Randomization is introduced in two ways: first, through the random selection of data for bootstrap samples, as done in bagging; second, through the random selection of input features for constructing individual base decision trees. Each tree grows to its maximum size without any pruning, until the stopping criterion is met. Once the forest has been assembled, each test sample is labeled with the mosquito species class that receives the majority vote across all decision trees in the forest.
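The three ingredients described above (bootstrap samples, a random subset of features at each split, and majority voting over unpruned trees) can be sketched in a few dozen lines. This is an illustrative toy, not the implementation used in the study: the Gini impurity split criterion, the function names, the parameter defaults, and the two-feature toy data are all assumptions.

```python
import random
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    total = len(labels)
    return 1.0 - sum((n / total) ** 2 for n in Counter(labels).values())


def build_tree(rows, labels, n_features, rng):
    """Grow an unpruned tree; each split looks at a random feature subset."""
    if len(set(labels)) == 1:
        return labels[0]  # pure leaf
    best = None
    for f in rng.sample(range(len(rows[0])), n_features):
        for t in sorted({row[f] for row in rows})[:-1]:  # candidate thresholds
            left = [i for i, row in enumerate(rows) if row[f] <= t]
            right = [i for i, row in enumerate(rows) if row[f] > t]
            score = (len(left) * gini([labels[i] for i in left])
                     + len(right) * gini([labels[i] for i in right]))
            if best is None or score < best[0]:
                best = (score, f, t, left, right)
    if best is None:  # the sampled features cannot separate these rows
        return Counter(labels).most_common(1)[0][0]
    _, f, t, left, right = best
    return (f, t,
            build_tree([rows[i] for i in left], [labels[i] for i in left],
                       n_features, rng),
            build_tree([rows[i] for i in right], [labels[i] for i in right],
                       n_features, rng))


def predict_tree(tree, x):
    """Walk from the root to a leaf; leaves are plain (string) labels."""
    while isinstance(tree, tuple):
        f, t, left, right = tree
        tree = left if x[f] <= t else right
    return tree


def train_forest(rows, labels, n_trees=25, n_features=1, seed=0):
    """Train each tree on a bootstrap sample drawn with replacement."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in rows]
        forest.append(build_tree([rows[i] for i in idx],
                                 [labels[i] for i in idx], n_features, rng))
    return forest


def predict_forest(forest, x):
    """Final class = majority vote over all trees."""
    votes = Counter(predict_tree(tree, x) for tree in forest)
    return votes.most_common(1)[0][0]


# Toy data: two well-separated groups described by two color-like features.
rows = [(10, 12), (12, 11), (11, 13), (200, 210), (205, 198), (198, 207)]
labels = ["species_a"] * 3 + ["species_b"] * 3
forest = train_forest(rows, labels)
```

With the toy data, `predict_forest(forest, (9, 10))` falls on the `species_a` side of every learned threshold, and `predict_forest(forest, (210, 205))` on the `species_b` side.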
For evaluation, we used 10-fold cross validation, which is a standard approach for a problem of this scope.
Cross-validation is a model validation technique that helps assess how the results of a classification model will generalize to an independent dataset. In 10-fold cross-validation, the dataset is divided into 10 subsets and the model is evaluated 10 times. In each iteration, one subset is used as the test set while the other 9 subsets form the training set. The error is then averaged across all 10 trials to obtain the final result. This approach helps guard against problems like over-fitting in the classification model. In our case, we used the RGB features mentioned previously to train our classification model.
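The 10-fold procedure can be sketched as follows. With 60 samples, each fold holds 6 samples for testing and the remaining 54 are used for training. The sketch reports average accuracy (one minus the average error) and shuffles without stratifying by species. Both choices, along with the function names and the `train_fn` / `predict_fn` callback interface, are assumptions made for illustration.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Shuffle sample indices and deal them into k nearly equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def cross_validate(samples, labels, train_fn, predict_fn, k=10, seed=0):
    """Average accuracy over k train/test splits.

    train_fn(train_samples, train_labels) returns a model;
    predict_fn(model, sample) returns a predicted label.
    """
    folds = k_fold_indices(len(samples), k, seed)
    accuracies = []
    for i, test_idx in enumerate(folds):
        # Every fold except the i-th one goes into the training set.
        train_idx = [j for m, fold in enumerate(folds) if m != i for j in fold]
        model = train_fn([samples[j] for j in train_idx],
                         [labels[j] for j in train_idx])
        correct = sum(predict_fn(model, samples[j]) == labels[j]
                      for j in test_idx)
        accuracies.append(correct / len(test_idx))
    return sum(accuracies) / k


folds = k_fold_indices(60, k=10)
```

Any classifier can be plugged in through the two callbacks, for example a function that trains a random forest and one that returns its majority-vote prediction.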
To evaluate its accuracy, we utilized a 10-fold cross validation technique and calculated precision, recall, and F1 measure for each species independently. The evaluation measures for the RGB feature are also displayed in Figure (Confusion Matrix).
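Per-species precision, recall, and F1 measure can all be read off a confusion matrix. The sketch below assumes the convention that rows are true classes and columns are predicted classes. The function names are hypothetical, and the two-class matrix is made-up illustrative data, not the study's results.

```python
def per_class_metrics(confusion, classes):
    """Return {class: (precision, recall, f1)}.

    Assumption: confusion[i][j] counts samples of true class i that were
    predicted as class j.
    """
    metrics = {}
    for i, name in enumerate(classes):
        tp = confusion[i][i]
        predicted = sum(row[i] for row in confusion)  # column sum
        actual = sum(confusion[i])                    # row sum
        precision = tp / predicted if predicted else 0.0
        recall = tp / actual if actual else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        metrics[name] = (precision, recall, f1)
    return metrics


def accuracy(confusion):
    """Fraction of all samples that lie on the diagonal."""
    total = sum(sum(row) for row in confusion)
    return sum(confusion[i][i] for i in range(len(confusion))) / total


# Made-up two-class example: one sample of class "a" is mistaken for "b".
example = [[4, 1],
           [0, 5]]
scores = per_class_metrics(example, ["a", "b"])
overall = accuracy(example)
```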
Discussions
We evaluated our experiment on 7 species comprising a total of 60 mosquito samples. We employed the 10-fold cross validation technique for training and testing the model. In the future, we plan to further evaluate our system on a larger dataset, encompassing a greater number of species and utilizing various features such as RGB mean, median, standard deviation, texture, and entropy extracted through convolution of the image into a 3x3 pixel window. Additionally, we intend to implement this experiment as a smartphone application. The Random Forest Classification technique was employed and it took 0.11 seconds to build the model, achieving an accuracy of 83.3%.
The machine used for our experiment had the following configuration: Intel Core i7 CPU @ 2.6 GHz with 16 GB RAM.
Conclusion
The use of smart phone cameras and classification techniques to identify mosquitoes can help distinguish between harmful and non-harmful species. In our study, we applied image filtering techniques to improve the clarity and accuracy of images captured by smart phones. We used the Random Forest classification technique with the information gain attribute selection method to accurately classify various species of mosquitoes.
Our future work is focused on evaluating this model in a real-time scenario through a smartphone application.
References
- Wikipedia, "Mosquito - Wikipedia, the free encyclopedia," 2016, [Online; accessed 30-November-2016]. [Online]. Available:
- Wikipedia, "Median filter - Wikipedia, the free encyclopedia," 2016, [Online; accessed 23-May-2016]. [Online]. Available:
- L. Breiman, "Random forests," Machine learning, vol. 45, no. 1, pp. 5-32, 2001.
- Al-Lami, P. Bharti, S. Chellappan, and J. Burken, "Determining the effectiveness of soil treatment on plant stress using smart-phone cameras," in 2016 International Conference on Selected Topics in Mobile Wireless Networking (MoWNeT), April 2016, pp. 1-8.
- Sazonov and A. Andreeva, "A survey on pixel-based skin color detection techniques," in Proc. Graphicon, vol. 3. Moscow, Russia, pp. 85-92.
- Wen and D. Guyer, "Image-based orchard insect automated identification and classification method," Computers and electronics in agriculture, vol. 89, pp. 110-115, 2012.
[Online]. Available:
165-192.
- Kulkarni, "Efficient learning of random forest classifier using disjoint partitioning approach," in Proceedings of the World Congress on Engineering, vol. 2, 2013, pp. 3-5.
- Wikipedia, "Confusion matrix - Wikipedia, the free encyclopedia," [Online; accessed 11-October-2016]. [Online]. Available: