Skinput Technology Essay Example

The utilization of the human body as an input device is attractive for two primary reasons. Firstly, the human body offers a substantial external surface area of approximately two square meters, making it a suitable interface. Additionally, different parts of our body, such as hands, arms, upper legs, and torso, are easily accessible. Our inherent knowledge of our body's positioning in three-dimensional space enables us to interact with it precisely without visual aid. For instance, we can independently move each finger, touch our nose effortlessly, and bring our hands together for clapping.

In this paper, we introduce Skinput, a unique method that enables accurate and eyes-free finger input by utilizing a non-invasive, wearable bio-acoustic sensor. It is one of the few external input devices that offers both this level of precision and a wide interaction area. The paper presents the contributions of this work.

1) A novel, wearable sensor for bio-acoustic signal acquisition is depicted in Figure 1.

2) Our system can determine the location of finger taps on the body by analyzing the resulting bio-acoustic signals.

Skinput is a technology that utilizes the human body for acoustic transmission, allowing the skin to act as an input surface. Our main objective is to detect finger taps on the arm and hand by analyzing mechanical vibrations that travel through the body. To capture these signals, we use a special armband with a unique array of sensors. This approach provides a convenient and portable finger input system that is always accessible on the body. We conducted a two-part user study with twenty participants to evaluate the effectiveness, precision, and limitations of our method.

To showcase the efficiency of our method, we have developed multiple proof-of-concept applications that illustrate its versatility and possibilities. These applications encompass a range of functionalities such as bio-acoustics, finger input, buttons, gestures, on-body interaction, projected displays, and audio interfaces.

Small mobile devices such as smartphones and smartwatches face limitations in interaction space due to their small size. This limitation affects the usability and functionality of these devices, including their screens and buttons. Enlarging these elements would contradict the primary advantage of small mobile devices. Therefore, alternative approaches are being explored to enhance interactions with such devices. One approach involves leveraging the surface area of the environment for interactive purposes whenever feasible. For instance, a technique has been devised that allows a small mobile device to utilize tables as a canvas for finger input gestures.

However, tables are not always present, and in a mobile context, users are unlikely to want to carry a suitable interaction surface with them.

An armband has been developed with a built-in wearable, bio-acoustic sensing array. The sensing elements within the array can detect vibrations transmitted through the body. Each of its two sensor packages consists of five cantilevered piezo films, individually weighted so that each responds to a particular frequency range. To evaluate the performance and constraints of this system, a user study has been conducted. Moreover, prototype applications and further experimentation have been conducted to explore the wider range of bio-acoustic input.

Related Work

Always-available input is attractive for everyday computing tasks, yet many candidate techniques demand levels of focus, training, and concentration that do not align with typical computer interaction.

There has been limited research on the combination of finger input and biological signals. Researchers have exploited the electrical signals generated by muscle activation during hand movement using electromyography (EMG). However, this method typically requires expensive amplification systems and the use of conductive gel for signal acquisition, making it less feasible for most users. The input technology most similar to ours is from Amento et al. [2], who placed contact microphones on a user's wrist to assess finger movement. However, that approach was never formally evaluated and only considered finger motions of one hand. The Hambone system uses a similar setup and achieves around 90% accuracy in classifying four gestures (e.g., raising heels, snapping fingers) through an HMM. Neither system has been tested for false-positive rejection performance. Additionally, both techniques require placing sensors near the interaction area (e.g., the wrist), which increases their invasiveness and visibility.

Bone conduction microphones and headphones, now popular among consumers, detect speech-related frequencies and transmit them effectively through bone. Positioned near the ear, these microphones capture vibrations produced during speech in the mouth and larynx. Sound is transmitted directly to the inner ear via the skull and jaw bones, bypassing the air and outer ear, which leaves the ear canal unobstructed for environmental sounds.

Skinput's main objective is to provide a mobile input system that does not require users to carry or handle a device. There have been various proposed approaches in this field. Computer vision techniques are commonly used, but they can be computationally intensive and prone to errors in mobile scenarios. Speech input is a logical choice for always-available input, but its precision is limited in unpredictable acoustic environments, and it presents privacy and scalability concerns in shared settings. Another approach involves wearable computing, where a physical input device is integrated into clothing. For instance, glove-based input systems let users retain much of their natural hand movement, but they are bulky, uncomfortable, and disrupt tactile sensation.

Post and Orth describe a "smart fabric" system that integrates sensors and conductors into fabric. However, adopting this approach for constant input would require embedding technology in all clothing, which would be too intricate and costly. In contrast, the SixthSense project provides a mobile input/output capability by merging projected information with a vision tracking system based on colored markers. While this approach is workable, it faces challenges with occlusion and accuracy. For instance, distinguishing between a finger tapping a button and a finger hovering over it is extremely challenging.

The current study examines the integration of on-body sensing with on-body projection. Like previous research on using biological signals for computer input, Skinput takes advantage of the natural acoustic conduction properties of the human body to offer an input system. Signals used in diagnostic medicine, such as heart rate and skin resistance, have been adapted to infer a user's emotional state. These features, however, are largely involuntary and cannot be controlled precisely enough to serve as direct input.

Furthermore, brain sensing technologies such as electroencephalography (EEG) and functional near-infrared spectroscopy (fNIR) have been used by researchers in the field of Human-Computer Interaction (HCI) to evaluate cognitive and emotional states. This research has primarily focused on involuntary signals. Conversely, brain signals have also served as a direct input channel for individuals with paralysis. However, the brain-computer interfaces (BCIs) that enable such direct communication currently lack the bandwidth needed for everyday input. In our study, we take inspiration from systems that use acoustic transmission and multiple sensors to localize hand taps on a glass window.

Ishii et al. employed a comparable technique to localize a ball hitting a table in order to augment a real-world game computationally. While we investigated the use of acoustic time-of-flight for localization, we found it inadequate for accurate detection on the human body. As a result, we developed the fingerprinting approach detailed in this paper.

Skinput

Skinput is a new input technique that enables the skin to function as a touch input surface, with the aim of broadening the range of sensing modalities for always-available input systems.

Our prototype system focuses on the arm, although the approach could also be applied to other areas. We chose the arm because it offers a large surface area for interaction and a smooth, uninterrupted surface suitable for projection. Additionally, the forearm and hand contain a diverse arrangement of bones, allowing us to capture distinct acoustic data from different positions. To obtain this data, we created a wearable armband that is easy to use and does not cause discomfort.

This section describes the mechanical phenomena underlying Skinput, with a focus on the mechanical properties of the arm. It also explains the Skinput sensor and the techniques used to segment, analyze, and classify bio-acoustic signals. When a finger taps the skin, transverse waves (ripples) are created, displacing the skin. The sensor is activated when the wave passes underneath it. Tapping the skin produces several forms of acoustic energy, including sound waves radiated into the air, which are not captured by the Skinput system.

The skin displacement caused by a finger impact creates transverse waves, which are the primary form of acoustic energy transmitted through the arm. When recorded with a high-speed camera, these waves manifest as ripples that emanate from the point of contact. The magnitude of these ripples is determined by both the impact force and the properties of the soft tissues in the affected region. Tapping on softer sections of the arm generally produces larger transverse waves than tapping on bony areas, which have limited flexibility.

In addition to the surface energy, there is also energy that travels inward, towards the skeleton. These waves travel through the soft tissues of the arm and stimulate the bone. The bone responds to this mechanical stimulation by rotating and translating as a rigid body. This vibration affects the soft tissues around the entire length of the bone, causing new waves to propagate outward to the skin.

We emphasize two distinct forms of conduction: transverse waves that travel directly along the arm surface, and longitudinal waves that pass through the soft tissues into and out of the bone. These mechanisms transport energy at different frequencies and over different distances. In general, higher frequencies are transmitted better through bone than through soft tissue. Although we do not explicitly model or rely on these conduction mechanisms in our analysis, we believe the effectiveness of our technique rests on the intricate acoustic patterns that arise from the combination of these modalities.

Similarly, we believe that joints play a crucial role in creating unique acoustic qualities at tapped locations. Ligaments hold bones together, and joints can contain other biological structures such as fluid-filled cavities, causing them to act as acoustic filters. Some joints may dampen acoustics broadly, while others may selectively attenuate certain frequencies, resulting in distinctive acoustic signatures. Figure 3 illustrates the propagation of longitudinal waves: finger impacts generate longitudinal waves that cause the internal skeletal structures to vibrate.

The bone in turn creates longitudinal waves that travel back out from the bone to the skin. To capture this acoustic information, various sensing technologies were evaluated, including bone conduction microphones, conventional microphones coupled with stethoscopes, piezo contact microphones, and accelerometers. However, these transducers were designed for purposes other than measuring acoustics transmitted through the human body.

In our research, we found that these off-the-shelf mechanical sensors have several shortcomings. One major issue is that they are designed to have a consistent response across a wide range of frequencies, which is normally desirable for accurately representing an input signal. However, in the case of tap input, only certain frequencies are transmitted through the arm. A flat response curve therefore captures unnecessary frequencies, lowering the signal-to-noise ratio.

Although bone conduction microphones might appear suitable for Skinput, they are typically designed for capturing human voice and discard energy below the frequency range of human speech. As a result, most sensors in this category lack sensitivity to lower-frequency signals, which our empirical pilot studies showed to be crucial for characterizing finger taps. To overcome these obstacles, we moved from a single sensing element with a flat response curve to an array of finely tuned vibration sensors.

We use small, cantilevered piezo films (MiniSense100, Measurement Specialties, Inc.). These films can be tuned by attaching small weights to the end of the cantilever, which changes the resonant frequency. This allows each sensing element to be sensitive to a specific, narrow, low-frequency band of the acoustic spectrum. Figure 4 shows the response curve, illustrating the relative sensitivity of a sensing element with a resonant frequency of 78 Hz. Adding more mass to the cantilever lowers the frequency range of excitation to which the sensor responds, which our studies found useful for characterizing bio-acoustic input.
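
The effect of adding tip mass can be pictured with the standard spring-mass model of a cantilever, in which the resonant frequency scales as the inverse square root of the mass. The sketch below is purely illustrative; the stiffness and mass values are assumptions chosen only to land in the 25-78 Hz band discussed here, not figures from the Skinput hardware.

```java
// Illustrative only: estimates how added tip mass lowers the resonant
// frequency of a cantilevered piezo film, using the ideal spring-mass model
//   f = (1 / (2*pi)) * sqrt(k / m)
// The stiffness and mass values below are hypothetical, not from the paper.
public class CantileverTuning {
    // Assumed effective stiffness of the film, in newtons per metre.
    static final double STIFFNESS_N_PER_M = 60.0;

    static double resonantFrequencyHz(double tipMassKg) {
        return Math.sqrt(STIFFNESS_N_PER_M / tipMassKg) / (2.0 * Math.PI);
    }

    public static void main(String[] args) {
        // Doubling the tip mass lowers the resonance by a factor of sqrt(2).
        for (double massGrams : new double[] {0.25, 0.5, 1.0, 2.0}) {
            double f = resonantFrequencyHz(massGrams / 1000.0);
            System.out.printf("tip mass %.2f g -> ~%.0f Hz%n", massGrams, f);
        }
    }
}
```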

Response Curve

Figure 4 shows the response curve of one of our sensors, tuned to resonate at 78 Hz. The curve shows a sharp drop in sensitivity within a narrow band on either side of the resonant frequency. Because of their cantilevered design, these sensors are naturally insensitive to forces parallel to the skin, minimizing the effect of skin stretch caused by everyday movements, while remaining highly sensitive to motion perpendicular to the skin plane. This makes them well suited for capturing both transverse surface waves and longitudinal waves originating from internal structures.

Finally, our sensor design is inexpensive and can be manufactured in a very small form factor (e.g., using MEMS), making it suitable for integration into future mobile devices (e.g., an arm-mounted audio player). In our prototype system, we use a Mackie Onyx 1200F audio interface to digitally capture data from the ten sensors. This interface is connected to a conventional desktop computer via FireWire. A thin client written in C then interfaces with the device using the Audio Stream Input/Output (ASIO) protocol.

Each channel was sampled at 5.5 kHz, a rate that would be considered too low for speech or environmental audio but captures the relevant spectrum of frequencies transmitted through the arm. This reduced sample rate also makes the technique easy to port to embedded processors: the Arduino platform's ATmega168, for instance, can sample analog readings fast enough to cover Skinput's full sampling requirement (5.5 kHz across ten channels, or 55 kHz in total) without sacrificing precision. The data was then sent from our thin client to our primary application, written in Java, via a local socket.
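
As a rough illustration of that hand-off, the sketch below shows a Java client reading interleaved ten-channel samples from a local socket (Java being the application language named above). The port number and the 16-bit little-endian framing are assumptions made for the example; the paper does not describe the wire format.

```java
// Minimal sketch of the Java side of the pipeline: a local socket client that
// reads interleaved samples for ten channels. Port and framing are assumed.
import java.io.DataInputStream;
import java.net.Socket;

public class SensorStreamReader {
    static final int CHANNELS = 10;

    public static void main(String[] args) throws Exception {
        try (Socket socket = new Socket("localhost", 9000);
             DataInputStream in = new DataInputStream(socket.getInputStream())) {
            double[] frame = new double[CHANNELS];
            while (true) {
                // One frame = one 16-bit little-endian sample per channel.
                for (int ch = 0; ch < CHANNELS; ch++) {
                    int lo = in.readUnsignedByte();
                    int hi = in.readByte();            // signed high byte
                    frame[ch] = ((hi << 8) | lo) / 32768.0;
                }
                // Hand the frame to downstream segmentation and feature extraction.
            }
        }
    }
}
```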

This program had three main functions. Firstly, it displayed real-time data from our ten sensors, helping to identify acoustic features. Secondly, it separated inputs from the data stream into separate instances (taps). Thirdly, it classified these input instances. The audio stream was divided into individual taps by calculating an absolute exponential average across all ten channels (Figure 6, red waveform). If the intensity threshold was surpassed (Figure 6, upper blue line), the program recorded the timestamp as a potential tap start.

If the intensity did not drop below a second, independent "closing" threshold (Figure 6, lower purple line) within a bounded time window after the onset crossing, the event was rejected. If start and end crossings satisfying these criteria were found, the acoustic data within that period (along with a short buffer on each end) was treated as an input event (Figure 6, green regions). The resonant frequencies of the individual sensing elements in the two arrays were:

Upper array: 25 Hz, 27 Hz, 30 Hz, 38 Hz, 78 Hz
Lower array: 25 Hz, 27 Hz, 40 Hz, 44 Hz, 64 Hz
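
A minimal sketch of the two-threshold segmentation heuristic described above is given below, operating on the per-frame absolute exponential average across channels. The smoothing factor, thresholds, and minimum/maximum window lengths are illustrative assumptions, not the values used in the original system.

```java
// Sketch of the segmentation heuristic: an absolute exponential moving average
// across all channels is compared against an onset threshold and a lower,
// independent "closing" threshold. All numeric constants are assumptions.
import java.util.ArrayList;
import java.util.List;

public class TapSegmenter {
    static final double ALPHA = 0.05;           // smoothing factor for the average
    static final double ONSET_THRESHOLD = 0.10;
    static final double CLOSING_THRESHOLD = 0.03;
    static final int MIN_TAP_SAMPLES = 100;     // reject events that close too soon
    static final int MAX_TAP_SAMPLES = 2000;    // reject events that close too late

    private double average = 0.0;
    private int tapStart = -1;
    private int sampleIndex = 0;
    private final List<int[]> taps = new ArrayList<>(); // accepted [start, end] pairs

    /** Feed one frame (one sample per channel). */
    public void addFrame(double[] frame) {
        double absSum = 0.0;
        for (double v : frame) absSum += Math.abs(v);
        average = ALPHA * (absSum / frame.length) + (1 - ALPHA) * average;

        if (tapStart < 0 && average > ONSET_THRESHOLD) {
            tapStart = sampleIndex;                          // potential tap onset
        } else if (tapStart >= 0 && average < CLOSING_THRESHOLD) {
            int length = sampleIndex - tapStart;
            if (length >= MIN_TAP_SAMPLES && length <= MAX_TAP_SAMPLES) {
                taps.add(new int[] {tapStart, sampleIndex}); // accepted tap window
            }
            tapStart = -1;                                   // otherwise rejected
        }
        sampleIndex++;
    }

    public List<int[]> detectedTaps() { return taps; }
}
```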

The decision to place the two sensor packages on the upper arm (above the elbow) was made in order to collect acoustic information from both the fleshy bicep area and the firmer area on the underside of the arm, which provides better acoustic coupling to the Humerus, the main bone running from shoulder to elbow. When the sensor was placed below the elbow, on the forearm, one package was positioned near the Radius, the bone that runs from the lateral side of the elbow to the thumb side of the wrist, and the other near the Ulna, which runs parallel to it on the medial side of the arm, closest to the body.

Each location provided distinct acoustic coverage and information, aiding in distinguishing input location. Through pilot data collection, resonant frequencies were chosen for each sensor package. The upper sensor package was adjusted to be more responsive to lower frequency signals, which were more common in fleshy regions. Conversely, the lower sensor array was tuned to detect higher frequencies, enabling better capture of signals transmitted through denser bones.

Each element in the two sensor packages is tuned to a different resonant frequency. For every segmented input, features are computed across the ten channels and used for classification by the trained SVM. The software uses an event model: an event is created for each classified input, and any interactive features linked to that event are activated. Our video demonstrates that we are able to achieve fast, real-time interactivity.
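
That event model can be pictured as a small listener/dispatch layer: each classified tap becomes an event that is delivered to whatever interface elements are bound to that location. The class and method names below are hypothetical, intended only to illustrate the idea.

```java
// Minimal sketch of the event model: each classified tap is wrapped in an
// event and dispatched to registered listeners. Names are hypothetical.
import java.util.ArrayList;
import java.util.List;

public class TapEventDispatcher {
    public record TapEvent(String location, long timestampMillis) {}

    public interface TapListener { void onTap(TapEvent event); }

    private final List<TapListener> listeners = new ArrayList<>();

    public void addListener(TapListener listener) { listeners.add(listener); }

    /** Called once per classified input instance (e.g., "wrist", "palm"). */
    public void dispatch(String classifiedLocation) {
        TapEvent event = new TapEvent(classifiedLocation, System.currentTimeMillis());
        for (TapListener l : listeners) l.onTap(event);
    }
}
```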

The participants of the experiment were recorded producing ten channels of acoustic data by performing three finger taps on the forearm and then three taps on the wrist. The exponential average of these channels is depicted in red. The segmented input windows are highlighted in green.

Note how different sensing elements are activated by taps at the two locations. To assess the performance of our system, we recruited 13 participants (7 female) from the Greater Seattle area, representing a range of ages and body types. Participants' ages ranged from 20 to 56 (mean 38.3), and their body mass indexes (BMIs) ranged from 20.5 (normal) to 31.9 (obese). Despite its simplicity, the segmentation heuristic described above proved extremely reliable, owing to the strong noise suppression provided by our sensing approach. Once an input is segmented, the waveforms are analyzed.

The discrete nature of taps (i.e., point impacts) means the acoustic signals do not unfold over time the way gestures (e.g., clenching the hand) do; they simply decay in intensity. We therefore compute features over the entire input window and do not attempt to capture temporal dynamics. To do this, we take a brute-force machine learning approach and compute a total of 186 features, many of which are derived combinatorially. For general information, we include the average amplitude, standard deviation, and total (absolute) energy of the waveforms in each channel (30 features).

From these we derive 45 average amplitude ratios between all channel pairs, plus the average of these ratios as one additional feature. For each of the ten channels we compute a 256-point FFT, but keep only the lowest ten values, normalized by the highest-amplitude FFT value across all channels. We also compute the center of mass of the power spectrum within this low-frequency band for each channel, a rough estimate of the fundamental frequency of the signal displacing each sensor (ten features).
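
The sketch below illustrates how such a 186-dimensional feature vector could be assembled from a ten-channel, 256-sample input window. The grouping follows the description above (30 per-channel statistics, 45 pairwise amplitude ratios plus their mean, 100 normalized low-frequency spectral values, and 10 spectral centers of mass); the exact feature ordering and the use of a naive DFT in place of an FFT library are assumptions made to keep the example self-contained.

```java
// Sketch of the 186-feature computation, assuming each channel of a segmented
// tap has been padded/truncated to 256 samples. Ordering is an assumption.
import java.util.ArrayList;
import java.util.List;

public class TapFeatures {
    static final int CHANNELS = 10, WINDOW = 256, LOW_BINS = 10;

    public static double[] compute(double[][] window) { // [channel][sample]
        List<Double> f = new ArrayList<>();
        double[] meanAbs = new double[CHANNELS];

        // 1) Per-channel average amplitude, standard deviation, absolute energy.
        for (int c = 0; c < CHANNELS; c++) {
            double sumAbs = 0, sum = 0, sumSq = 0;
            for (int n = 0; n < WINDOW; n++) {
                double v = window[c][n];
                sumAbs += Math.abs(v); sum += v; sumSq += v * v;
            }
            double mean = sum / WINDOW;
            meanAbs[c] = sumAbs / WINDOW;
            f.add(meanAbs[c]);
            f.add(Math.sqrt(Math.max(0, sumSq / WINDOW - mean * mean)));
            f.add(sumAbs);
        }

        // 2) All-pairs amplitude ratios (45) and their average (1).
        double ratioSum = 0; int pairs = 0;
        for (int a = 0; a < CHANNELS; a++)
            for (int b = a + 1; b < CHANNELS; b++) {
                double r = meanAbs[a] / (meanAbs[b] + 1e-9);
                f.add(r); ratioSum += r; pairs++;
            }
        f.add(ratioSum / pairs);

        // 3) Lowest bins of a 256-point spectrum per channel (naive DFT),
        //    normalized by the largest magnitude on any channel, plus the
        //    spectral center of mass of those bins for each channel.
        double[][] mags = new double[CHANNELS][LOW_BINS];
        double maxMag = 1e-9;
        for (int c = 0; c < CHANNELS; c++)
            for (int k = 0; k < LOW_BINS; k++) {
                double re = 0, im = 0;
                for (int n = 0; n < WINDOW; n++) {
                    double ang = -2 * Math.PI * k * n / WINDOW;
                    re += window[c][n] * Math.cos(ang);
                    im += window[c][n] * Math.sin(ang);
                }
                mags[c][k] = Math.hypot(re, im);
                maxMag = Math.max(maxMag, mags[c][k]);
            }
        for (int c = 0; c < CHANNELS; c++) {
            double weighted = 0, total = 0;
            for (int k = 0; k < LOW_BINS; k++) {
                f.add(mags[c][k] / maxMag);
                weighted += k * mags[c][k]; total += mags[c][k];
            }
            f.add(weighted / (total + 1e-9));        // spectral center of mass
        }
        return f.stream().mapToDouble(Double::doubleValue).toArray(); // 186 values
    }
}
```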

Subsequent feature selection identified the all-pairs amplitude ratios and specific bands of the FFT as the most predictive features. These 186 features are used as input to a Support Vector Machine (SVM) classifier. For a comprehensive introduction to SVMs, see the tutorial in [4]. Our software uses the implementation provided in the Weka machine learning toolkit [28]. It should be noted, however, that other, more sophisticated classification techniques and features could also be used, so the results presented in this paper should be considered a baseline.
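
A minimal sketch of how such feature vectors could be fed to Weka's SMO (an SVM implementation) for per-user training and live classification is given below. The class names, label handling, and data plumbing are illustrative assumptions rather than the authors' actual code.

```java
// Sketch of per-user training and live classification with Weka's SMO.
// The label set and feature-vector plumbing are illustrative assumptions.
import java.util.ArrayList;
import java.util.List;
import weka.classifiers.functions.SMO;
import weka.core.Attribute;
import weka.core.DenseInstance;
import weka.core.Instances;

public class TapClassifier {
    private final Instances dataset;
    private final SMO svm = new SMO();
    private final List<String> locations;

    public TapClassifier(List<String> locations, int numFeatures) {
        this.locations = locations;
        ArrayList<Attribute> attrs = new ArrayList<>();
        for (int i = 0; i < numFeatures; i++) attrs.add(new Attribute("f" + i));
        attrs.add(new Attribute("location", locations));       // nominal class
        dataset = new Instances("taps", attrs, 0);
        dataset.setClassIndex(numFeatures);
    }

    /** Add one training example: acoustic features plus the true tap location. */
    public void addExample(double[] features, String location) {
        double[] row = new double[features.length + 1];
        System.arraycopy(features, 0, row, 0, features.length);
        row[features.length] = locations.indexOf(location);
        dataset.add(new DenseInstance(1.0, row));
    }

    /** Train once all examples for this user and armband placement are collected. */
    public void train() throws Exception {
        svm.buildClassifier(dataset);
    }

    /** Classify a live, segmented tap into one of the trained locations. */
    public String classify(double[] features) throws Exception {
        double[] row = new double[features.length + 1];
        System.arraycopy(features, 0, row, 0, features.length);
        DenseInstance inst = new DenseInstance(1.0, row);
        inst.setDataset(dataset);
        inst.setClassMissing();
        return locations.get((int) svm.classifyInstance(inst));
    }
}
```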

Before the SVM can classify input instances, it must be trained for each user and sensor position. This involves gathering multiple examples for each input location of interest. When recognizing live input, Skinput computes the same 186 acoustic features in real time. From the many possible combinations, we chose three input groupings to test. We believe these groupings are both significant for interface design and challenging for our sensing approach.

Five experimental conditions were derived from these three groupings. One gesture set involved participants tapping the tips of each of their five fingers. The fingers are interesting for input because they provide clearly discrete interaction points that are already well named (e.g., ring finger). In addition to the five fingertips, there are 14 knuckles (five major, nine minor), giving a total of 19 readily identifiable input locations on the fingers alone.

Second, our finger-to-finger dexterity is exceptional, as seen when we count by tapping on our fingers. Additionally, the fingers are linearly ordered, which can be advantageous for tasks such as entering numbers, adjusting magnitude (e.g., volume), and selecting from menus. At the same time, the fingers (aside from the thumb) share a very similar skeletal and muscular structure, making them one of the most uniform parts of the body. This greatly reduces acoustic variation and makes the fingers challenging to distinguish from one another.

In addition, the acoustic information must pass through as many as five joints (in the finger and wrist) to reach the forearm, which weakens the signal. For this experimental condition, we therefore placed the sensor arrays on the forearm, just below the elbow. Despite these obstacles, preliminary experiments showed noticeable acoustic differences between fingers. We hypothesize that this is mainly driven by differences in finger length and thickness, interactions with the intricate structure of the wrist bones, and differences in the acoustic transmission properties of the muscles extending from the fingers to the forearm.

Another gesture set, "Whole Arm" (Figure 7), investigated five input locations on the forearm and hand: arm, wrist, palm, thumb, and middle finger. A within-subjects design was employed, with each participant performing tasks in each of the five conditions in randomized order. The conditions were: sensors below the elbow with the five-finger set; sensors above the elbow with the five whole-arm points; sensors below the elbow with the same points (both sighted and blind); and sensors above the elbow with ten marked points on the forearm.

Participants sat in a conventional office chair, facing a desktop computer that displayed the stimuli. When the sensors were positioned below the elbow, the armband was placed a few centimeters below the elbow joint, with one sensor package near the radius and the other near the ulna. When the sensors were positioned above the elbow, the armband was placed a few centimeters above the elbow joint, with one sensor package resting on the biceps. Right-handed participants wore the armband on the left arm so they could use their dominant hand for finger input.

The system's operation was unaffected when we mirrored the setup for our left-handed participant. The armband's tightness was adjusted to be firm but comfortable. Participants could rest their elbow on the desk, tuck it against their body, or rest it on the chair's adjustable armrest while performing the tasks. The study evaluated three sets of input locations, chosen for two reasons. First, they are distinct, named parts of the body (e.g., "wrist"), so participants could tap them accurately without training or markings.

Furthermore, piloting showed that these locations had unique acoustic characteristics, and their wide spatial distribution introduced additional variation. These locations were used in three different conditions: in one the sensor array was positioned above the elbow, and in another below it. This variation was included to measure the drop in accuracy across this major point of articulation. Participants also repeated the below-elbow placement in an eyes-free condition, in which they were instructed to close their eyes and face forward for both training and testing.

This condition was added to test how accurately users could select input locations on the body without visual feedback, as in a driving scenario. To probe the limits of our sensing approach, our fifth and final experimental condition used ten locations on the forearm alone (Figure 6, "Forearm"). Unlike the whole-arm condition, this set had a very high density of input locations, and unlike the hand, it relied on a surface (the forearm) that is physically fairly uniform. We expected these factors to make acoustic sensing difficult. The forearm was nonetheless chosen because of its large, flat area, its easy accessibility for both visual inspection and finger input, and its suitability as a projection surface for dynamic interfaces. Unlike the previously described conditions, we marked the targets with small colored stickers in order to make use of the full surface area available for input.

The stickers also reduced confusion and ensured input consistency, serving as temporary placeholders for projected buttons on the forearm, which we consider an ideal location for projected interface elements. The experimenter demonstrated finger taps on each location to be tested, and participants practiced duplicating these motions for roughly one minute per gesture set. This familiarized participants with our naming conventions (e.g., "pinky", "wrist") and let them practice tapping their arm and hand with a finger of the other hand. It also helped convey the appropriate tap force, as participants often tapped too hard at first. To train the system, participants were asked to tap each location ten times, at a comfortable force, with a finger of their choosing. This was repeated three times for each input-location set, yielding 30 examples per location, or 150 data points for a five-location set.

An exception was the ten-location forearm condition, for which only two rounds were collected to save time (20 examples per location, totaling 200 data points). Each experimental condition took about three minutes to complete. The training data was used to build an SVM classifier. In the subsequent testing phase, participants were presented with text stimuli (e.g., "tap your wrist") directing them where to tap. The order of stimuli was randomized, with each location presented ten times in total.

The system performed segmentation and classification in real time and gave immediate feedback to the participant (e.g., "you tapped your wrist"). Feedback was provided so that participants could see where the system made errors, as they would with a real application. If an input was not segmented (i.e., the tap was too quiet), participants could see this and tap again. Overall, segmentation error rates were negligible in all conditions and were not included in further analysis. (Figure 8 shows the accuracy of the three whole-arm-centric conditions, with error bars representing standard deviation.)

Results

In this section we present the classification accuracies for the test phases of the five conditions. Overall, classification rates were high, with an average accuracy of 87.6% across all conditions. For the ten-location forearm condition, we also achieved higher accuracies by collapsing the ten locations into larger input groupings: groupings A-E and G were produced using a design-centric strategy, while grouping F was created by analyzing per-location accuracy data.
