Multitouch Gesture Generation and Recognition Techniques

Published: August 6, 2018
Abstract

Smartphones are widely used for communication, but this exposes users to various threats. These threats can disrupt the functioning of the device and manipulate user data. Therefore, it is crucial for applications to ensure the privacy and integrity of information. However, single-touch mobile security is not sufficient to protect confidential data efficiently [1].

Therefore, we are focusing on enhancing mobile security by implementing multitouch technology. Multi-touch is a technology that allows a touch screen to detect multiple points of contact [2]. Through the use of multiple touch points, we can verify user identity and grant access to confidential data on mobile devices. Our research focuses on using biometric gestures and multitouch finger points to provide an additional layer of security [1].

Keywords: multitouch, biometric gesture, authentication, security, smartphone, finger tracking, Android operating system.

Introduction: Today's IT admins face the challenging task of managing the numerous mobile devices that connect to enterprise networks daily for communication. This task has become increasingly important as the number of devices in operation and their uses have expanded worldwide. The problem is intensified within the enterprise by the growing trend of users bringing their own devices and connecting them to the corporate network. Authentication is the process of comparing provided credentials against those on file in a database of valid users' information.

If the credentials match, the user is granted authorization to access the system. Permissions and folders define the user's environment and how they can interact with it, including their level of access and rights, such as allocated storage space and other services [1].


Typically, computer authentication involves alphanumeric usernames and text-based passwords. However, this method has notable drawbacks: users often choose passwords that are either easy to guess or hard to remember. To address this issue, researchers have developed authentication techniques that use multitouch biometric gestures as passwords.

Multi-touch is a technology that allows input gestures on multiple points of a device's surface. It is commonly used in touch-screen devices such as smartphones and tablets, as well as touch pads, tables, and walls [2]. In essence, multi-touch means that a touch screen or touchpad can recognize two or more simultaneous points of contact. This ability to track multiple points enables the device to interpret gestures such as pinch-to-zoom, thereby improving its functionality.

Gesture recognition is the use of mathematical algorithms to interpret human gestures, which can originate from various bodily movements, such as those of the face or hands. Gesture recognition also encompasses identifying and classifying posture and other human behaviors. To assess accuracy, we employed the Equal Error Rate (EER), the operating point at which the False Acceptance Rate (FAR) equals the False Rejection Rate (FRR). To evaluate whether employing multiple gestures would enhance system performance, we aggregated the scores of two distinct gestures from the same user in the same sequence and assessed the EER of these combined gestures. This is how our gesture authentication technique was developed.
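As a concrete illustration, the EER can be estimated from samples of genuine and impostor match scores by sweeping a decision threshold until FAR and FRR coincide. This is a minimal sketch, not the cited system's implementation; the function name and the convention that higher scores mean "more likely genuine" are assumptions.

```python
import numpy as np

def equal_error_rate(genuine, impostor):
    """Estimate the EER by sweeping a decision threshold over all observed
    scores and finding the point where FAR and FRR are closest."""
    thresholds = np.sort(np.concatenate([genuine, impostor]))
    best_gap, best_eer = 1.0, 0.0
    for t in thresholds:
        far = np.mean(impostor >= t)   # impostors wrongly accepted
        frr = np.mean(genuine < t)     # genuine users wrongly rejected
        if abs(far - frr) < best_gap:
            best_gap, best_eer = abs(far - frr), (far + frr) / 2
    return best_eer
```

With perfectly separable score distributions the estimate is 0; overlapping distributions yield an EER between 0 and 1.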


Developing a Gesture Authentication Technique

In mobile authentication, biometric systems offer an effective means of verifying legitimate users based on their unique "something they are" characteristic [2].

The objective of biometric identification is to verify a person's identity automatically by utilizing gestures that are unique to them. The biometric authentication system comprises two stages: enrollment and authentication. During the enrollment phase, a new user must record their personal hand signs, performing the chosen hand signs with ample space for movement. There are various types of gestures, including parallel (all fingertips move in the same direction), closed (all fingertips move towards the center of the hand), opened (all fingertips move away from the center of the hand), and circular (all fingertips rotate around the center of the hand). Hidden Markov Models can be used to match touch sequences with individual fingers. These models are statistical representations of an unobserved state within a Markov process [1][3]. An HMM is a collection of finite states connected by transitions, similar to a Bayesian network.
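As a rough sketch of how the gesture types above (parallel, closed, opened) might be distinguished, fingertip start and end positions can be compared against the hand centre. This function and its 0.9 direction-agreement threshold are illustrative assumptions, not the method used in the cited work.

```python
import numpy as np

def classify_gesture(start_pts, end_pts):
    """Classify a multitouch gesture from fingertip start/end positions:
    'parallel' if all fingertips move in the same direction, 'closed' if
    they all move toward the hand centre, 'opened' if they all move away."""
    start = np.asarray(start_pts, dtype=float)
    end = np.asarray(end_pts, dtype=float)
    centre = start.mean(axis=0)                      # approximate hand centre
    disp = end - start
    dirs = disp / np.linalg.norm(disp, axis=1, keepdims=True)
    if np.all(dirs @ dirs[0] > 0.9):                 # all directions agree
        return "parallel"
    radial = (np.linalg.norm(end - centre, axis=1)
              - np.linalg.norm(start - centre, axis=1))
    if np.all(radial < 0):                           # toward the centre
        return "closed"
    if np.all(radial > 0):                           # away from the centre
        return "opened"
    return "other"
```

A circular gesture would additionally require comparing the angular positions of the fingertips around the centre, which this sketch omits.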

The model associates two probabilities with each state: the transition probability and the output probability distribution. The model's parameters are determined from training data [4][5]. Figure 2 displays the hidden states of a Hidden Markov Model together with observable symbols in N dimensions, and Figure 3 showcases multitouch movement. The conventional expression for an HMM is given below [4]. The HMM is a mathematical tool employed in modeling signals and objects whose temporal structure adheres to the Markov process.

The Hidden Markov Model (HMM) is represented as λ = (A, B, π), as shown in Figure 4: Conventional Hidden Markov Model.

The state transition matrix is denoted by A = {a_ij}, where a_ij is the probability P[q_{t+1} = s_j | q_t = s_i] of transitioning from state s_i to state s_j, with i and j ranging from 1 to N.

The distribution of observation symbol probabilities is represented by B = {b_j(k)}, where b_j(k) is the probability P[O_t = v_k | q_t = s_j] of observing symbol v_k in state s_j, with j ranging from 1 to N and k ranging from 1 to M.

The initial state distribution is denoted by π = {π_i}, where π_i is the probability P[q_1 = s_i] that the initial state is s_i.

The set of states is S = {s_1, s_2, ..., s_N}, and the state at time t is denoted q_t. The set of symbols is V = {v_1, v_2, ..., v_M}.

Given the observation sequence O = O_1 O_2 ... O_T and the model λ = (A, B, π), our goal is to efficiently calculate P(O | λ), the probability of the observation sequence given the model. There are two phases: training, where we compute and adjust λ based on input data sequences {O} in order to maximize the likelihood P(O | λ); and recognition, where we assign the class whose model λ_c = (A_c, B_c, π_c) maximizes the likelihood P(O | λ_c). The observation probability distribution P[O_t = v_k | q_t = s_j] can be discrete or continuous, depending on whether the observations are distinct symbols: B(i, k) = P(O_t = v_k | q_t = s_i). If the observations are vectors in R^L, it is common to represent P[O_t | q_t] as a Gaussian: N(y; μ, Σ) = (2π)^(-L/2) |Σ|^(-1/2) exp[-(1/2)(y - μ)^T Σ^(-1) (y - μ)]. Another representation is a mixture of M Gaussians: P[O_t = y | q_t = s_i] = Σ_{m=1..M} P(M_t = m | q_t = s_i) · N(y; μ_{m,i}, Σ_{m,i}), where M_t specifies which mixture component to use and P(M_t = m | q_t = s_i) = C(i, m) is the conditional prior weight of each mixture component.

In our approach, we implement continuous and discrete output variable distributions for the first and second HMM stages, respectively [3][6].
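The evaluation problem, computing P(O | λ) for a discrete-output HMM, is solved efficiently by the standard forward algorithm. The following is a minimal sketch of that algorithm under the notation above; the array shapes and function name are illustrative assumptions.

```python
import numpy as np

def forward_prob(A, B, pi, obs):
    """Forward algorithm: P(O | lambda) for a discrete-output HMM.
    A: (N, N) transition matrix a_ij, B: (N, M) emission matrix b_i(k),
    pi: (N,) initial distribution, obs: list of observed symbol indices."""
    alpha = pi * B[:, obs[0]]          # alpha_1(i) = pi_i * b_i(O_1)
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]  # induction over t = 2..T
    return alpha.sum()                 # P(O | lambda) = sum_i alpha_T(i)
```

By the law of total probability, summing `forward_prob` over every possible observation sequence of a fixed length yields 1, which is a convenient sanity check.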

Dynamic Time Warping

Introduced by Sakoe and Chiba in 1978, Dynamic Time Warping (DTW) is an algorithm that compares two sequences that may vary in time. For instance, if two video clips of different people walking the same path are compared, DTW would detect similarities in their walking patterns despite differences in speed, acceleration, or deceleration. The algorithm starts with a set of template streams that describe each gesture in the system database. This leads to high computation time and limits recognition speed; moreover, storing numerous templates for each gesture consumes considerable space on a constrained device.

Consider a training set of N sequences {S_1, S_2, ..., S_N}, where each S_g is a sample of the same gesture class. Each sequence S_g is composed of a set of feature vectors at each time t, S_g = {s_g^1, ..., s_g^{L_g}}, where L_g is the length in frames of sequence S_g. Assume the sequences are ordered by length, so that L_{g-1} ≤ L_g ≤ L_{g+1} for all g in [2, ..., N-1]; the median-length sequence is then S̄ = S_{⌈N/2⌉}. This sequence S̄ is used as a reference, and the remaining sequences are aligned to it using classical Dynamic Time Warping with Euclidean distance [4], in order to remove the temporal deformations of different samples from the same gesture class. Once the alignment is applied, all sequences have length L_{⌈N/2⌉}. We define the set of warped sequences as S̃ = {S̃_1, S̃_2, ..., S̃_N} [3].


Input:

A gesture C = {c_1, ..., c_n} with corresponding GMM model λ = {λ_1, ..., λ_m}, its similarity threshold value ε, and the testing sequence Q = {q_1, ..., q_n}. A cost matrix M is defined, where N(x), x = (i, t), is the set of the three upper-left neighbours of x in M.


Output:

The warping path of the detected gesture, if any.


Artificial Neural Networks

Artificial Neural Networks (ANNs) are networks of weighted, directed graphs in which the nodes are artificial neurons and the directed edges are connections between them.

The most common structure for an Artificial Neural Network (ANN) is the feed-forward Multi-Layer Perceptron. It is called "feed-forward" because signals travel in only one direction through the network [4][8].

  • The value x_{p,i} for a given input pattern p is held by the i-th node in the input layer.
  • The net input to the j-th node in the hidden layer is net_{p,j} = Σ_i w_{j,i} · x_{p,i} + θ_j, where w_{j,i} are the connection weights and θ_j is a bias term.
  • The output of the j-th node in the hidden layer is y_{p,j} = f(net_{p,j}), where f is an activation function such as the sigmoid f(z) = 1 / (1 + e^(-z)).
  • Similarly, the net input to the k-th node in the output layer is net_{p,k} = Σ_j w_{k,j} · y_{p,j} + θ_k.
  • Finally, the output of the k-th node in the output layer is o_{p,k} = f(net_{p,k}).

To compute the network error for a given input pattern p, neurons are organized in layers, with the outputs of the neurons in each layer connected to the inputs of the neurons in the next layer. Eventually, each neuron in the output layer is assigned a value. Each output neuron represents a specific gesture class, and the input pattern is assigned to the class whose neuron has the highest value. During training, the gesture class for each neuron is known and is used to assign the "correct" value to the output nodes.
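The forward pass described above can be sketched as follows. This is a minimal illustration with sigmoid activations; the weight shapes and names are assumptions rather than the cited implementation.

```python
import numpy as np

def sigmoid(z):
    """Logistic activation f(z) = 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + np.exp(-z))

def mlp_forward(x, W_hidden, b_hidden, W_out, b_out):
    """Feed-forward pass of a Multi-Layer Perceptron: each layer computes
    its net input (weighted sum plus bias) and applies the activation."""
    hidden = sigmoid(W_hidden @ x + b_hidden)   # hidden-layer outputs y_{p,j}
    return sigmoid(W_out @ hidden + b_out)      # output-layer outputs o_{p,k}
```

The predicted gesture class for a pattern `x` is then `np.argmax(mlp_forward(x, ...))`, the output neuron with the highest value.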


Critical Analysis

This section presents a critical analysis of the achieved results. The performance of the ANN, HMM, and DTW algorithms was measured on a mobile phone in terms of recognition speed, accuracy, and training time [3]. Bayesian networks, a superclass of HMMs specifically designed for gesture classification, were not considered. Based on recognition speed, accuracy, and training time, it can be concluded that DTW outperforms both HMM and ANN.


Finger Tracking:

To adjust the finger-tracking parameters, calibration must be activated from the corresponding tab in the on-screen display [5][9].

Projection Signatures

Projection signatures are directly performed on the resulting threshold binary image of the hand [5].

The main step of this algorithm is summing the binary pixels row by row (along the vertical direction); prior knowledge of the hand angle is necessary. A low-pass filter is applied to the signature (the row sums) to smooth out the small variations that create multiple local maxima and hence multiple positive detections (more than one detection per fingertip). The resulting five maxima correspond to the positions of the five fingers.
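The row-sum signature and its smoothing can be sketched as follows; this is a minimal illustration, and the moving-average window size is an assumption.

```python
import numpy as np

def projection_signature(binary_hand, window=9):
    """Row-wise projection signature of a thresholded binary hand image:
    sum the binary pixels in each row, then smooth with a moving average
    (a simple low-pass filter) to suppress spurious local maxima."""
    rows = binary_hand.sum(axis=1).astype(float)   # binary pixel count per row
    kernel = np.ones(window) / window              # moving-average kernel
    return np.convolve(rows, kernel, mode="same")
```

Finger positions are then taken as the five highest local maxima of the smoothed signature.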

Geometric Properties

The second algorithm is based on geometric properties and, as shown in line 3 of Figure 5, utilizes a contour image of the hand with a reference point. This reference point can be determined either by finding the center of mass of the contour (barycenter or centroid) or by selecting a point on the wrist [6].

Euclidean distances from the reference point to every point on the contour are calculated. The five highest distances are assumed to represent the finger ends, while the lowest distances can be used to identify the spaces between fingers. A filtering process is also necessary to minimize false positives.
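This distance ranking can be sketched as below. The sketch is illustrative: on real contours the filtering step mentioned above is still needed, since neighbouring points on the same fingertip would otherwise dominate the top distances.

```python
import numpy as np

def fingertip_candidates(contour, k=5):
    """Rank contour points by Euclidean distance from the centroid and
    return the k farthest points as fingertip candidates."""
    contour = np.asarray(contour, dtype=float)
    centroid = contour.mean(axis=0)               # barycenter reference point
    d = np.linalg.norm(contour - centroid, axis=1)
    idx = np.argsort(d)[::-1][:k]                 # indices of k largest distances
    return contour[idx], d[idx]
```

Conversely, the points with the smallest distances approximate the valleys between fingers.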

Circular Hough Transform: The circular Hough transform is applied to the contour image of the hand, although it can also be performed on an edge image with a complex background, provided there are no circular elements matching the fingertip radius. To efficiently identify finger ends, any points outside the contour image are discarded. However, this discarded set contains a mixture of finger valleys and false positives that are difficult to differentiate.

Color Markers
The marker algorithm, unlike the three previous algorithms, relies on tracking color markers attached to the main joints of the fingers rather than on hand characteristics. Each color is tracked individually using color segmentation and filtering [5], allowing different hand segments to be identified. The marker colors must be easily trackable without affecting the threshold, edge, or contour image of the hand. By respecting these constraints, it becomes possible to apply all the algorithms to the same video images and compare their accuracy and precision against the markers [5].
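Per-marker color segmentation can be sketched as a simple HSV range threshold followed by a centroid computation. This is a minimal illustration; the channel ranges and threshold values are assumptions, not the parameters of the cited system.

```python
import numpy as np

def segment_marker(hsv_img, hue_lo, hue_hi, sat_min=0.4, val_min=0.3):
    """Binary mask of pixels whose hue falls inside one marker's color
    range and whose saturation/value are high enough to be reliable."""
    h, s, v = hsv_img[..., 0], hsv_img[..., 1], hsv_img[..., 2]
    return (h >= hue_lo) & (h <= hue_hi) & (s >= sat_min) & (v >= val_min)

def marker_position(mask):
    """Track the marker as the centroid of its mask pixels."""
    ys, xs = np.nonzero(mask)
    return ys.mean(), xs.mean()
```

Running one such segmentation per marker color yields one tracked position per finger joint, which is what makes the markers usable as ground truth for the other algorithms.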

All the presented algorithms have achieved success, to varying degrees, in detecting each finger. The algorithm for projection signatures can only provide a rough identification of a finger, whereas the algorithms for circular Hough transform and geometric properties can detect both intersections between fingers and endpoints of fingers. It is worth noting that, when fingers are folded, the endpoints do not correspond to the fingertips [5].


Conclusion

We have explored three prominent strategies that comprehensively characterize the gesture recognition possible on smartphones: Artificial Neural Networks, Dynamic Time Warping, and Hidden Markov Models. They were optimized and tested on devices with limited resources, such as cellular phones, and compared in terms of accuracy and computational performance. Among them, ANNs demonstrated the slowest computational performance due to their large network size.

HMMs proved to have better performance; however, the DTW algorithm demonstrated faster speed while maintaining comparable recognition accuracy. Unlike HMMs and ANNs, DTW did not require any training.


References

  1. Kalyani Devidas Deshmane, "Android Software Based Multi-touch Gestures Recognition for Secure Biometric Modality."
  2. N. Sae-Bae, N. Memon, K. Isbister, and K. Ahmed, "Multitouch gesture-based authentication," IEEE Trans. Inf. Forensics Security, vol. 9, no. 4, pp. 568-582, Apr. 2014.
  3. Daniel Wood, "Methods for Multi-touch Gesture Recognition."
  4. Anne-Marie Burns and Barbara Mazzarino, "Finger Tracking Methods Using EyesWeb."
  5. "Probability-based Dynamic Time Warping and Bag-of-Visual-and-Depth-Words for Human Gesture Recognition."
  6. http://whatis.techtarget.com/definition/gesture-recognition