Speech and Transform II Computer Science Essay Example

The purpose of this project is to use MATLAB software and its functions to design and implement denoising techniques for a noisy audio signal. The experimenter will conduct a literature review, summarize it, and provide details on its contribution to the field of study.

The text discusses the analysis and study of various techniques used in audio and speech processing. The implementation of these techniques is done using MATLAB version 7.0.

Introduction

Fourier analysis is a powerful tool for obtaining the frequency and amplitude components of a signal. It is effective for analyzing stationary signals, which repeat within the sampled region and are composed of sine and cosine components. However, it is far less efficient for non-stationary signals, which do not repeat in the sampled region. In contrast, the wavelet transform allows the analysis of such signals: it splits the signal into different components that can be studied individually.

In Fourier analysis, a signal is analyzed in terms of its sine and cosine components with respect to frequency and time. A wavelet approach differs: the wavelet algorithm examines the data at various scales and resolutions. A specific prototype function, known as the mother wavelet, serves as the primary wavelet for the analysis, which starts with a compressed, high-frequency version of the mother wavelet and moves toward stretched, lower-frequency versions. Further analysis can then be performed on the coefficients obtained from these wavelet components. Haar wavelets are characterized by their compact support, which means they vanish outside a finite interval. However, an important limitation of Haar wavelets is that they are not continuously differentiable.

The Fourier transform allows the analysis of both the time and frequency content of a given signal using sine and cosine functions. By analyzing a finite set of sampled points with the discrete Fourier transform (DFT), an approximation of the original signal can be obtained. This analysis uses a matrix whose order equals the total number of sample points, so the computation grows quickly as the number of samples increases. However, if the samples are uniformly spaced, the Fourier matrix can be factored into a product of a few sparse matrices, and the result can be applied to a vector in on the order of m log m operations; this is the Fast Fourier Transform (FFT). Both the FFT and the discrete wavelet transform (DWT) are linear transforms. The difference lies in the basis functions that make up the transform matrix: sines and cosines for the FFT, and more complex mother wavelet functions for the DWT.
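As a minimal illustration (using a hypothetical test tone rather than the project's recording), the FFT of a uniformly sampled signal can be computed in MATLAB as follows:

    % Minimal sketch: spectrum of a uniformly sampled test tone via the FFT.
    Fs = 8000;                          % assumed sample rate in Hz
    t  = 0:1/Fs:1-1/Fs;                 % one second of uniformly spaced samples
    x  = sin(2*pi*440*t);               % 440 Hz test tone (illustrative signal)
    X  = fft(x);                        % O(m log m) transform of the m samples
    f  = (0:length(x)-1)*Fs/length(x);  % frequency axis for the spectrum
    plot(f, abs(X));                    % amplitude of each frequency component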

The Fourier transform analyzes a signal in terms of sines and cosines, whereas wavelet analysis uses a more complex basis function, the wavelet. Wavelets are localized functions: they are confined in frequency (scale), which can be observed in their power spectra, and this localization is useful for identifying how frequency and power are distributed. Sine and cosine functions, by contrast, are not localized. This property makes operations using wavelet transforms "sparse," which is beneficial for noise removal.

A significant benefit of employing wavelets is the variability of their windows. This variability is particularly useful for identifying and analyzing discontinuous portions of a signal, for which short wavelet functions are commonly used; for more detailed frequency analysis, longer functions are preferred. An effective compromise is to use short, high-frequency basis functions together with long, low-frequency ones. Unlike Fourier analysis, which is limited to a basis of sine and cosine waves, wavelet analysis offers an unlimited set of basis functions (A. Graps, 1995-2004).

This feature of wavelets is crucial, as it enables them to reveal information in a signal that can remain hidden from other time-frequency methods such as Fourier analysis. Wavelets are grouped into families, each containing subclasses distinguished by their coefficients and levels of iteration. Within a family, wavelets are classified primarily by their number of coefficients, which is mathematically related to their number of vanishing moments (N. Rao, 2001). A valuable aspect of using wavelets is that the user has control over the wavelet coefficients for a specific wavelet type. Certain families of wavelets have been developed to represent polynomial behavior effectively, the Haar wavelet being the simplest among them.

The coefficients act as filters: they are placed in a transformation matrix and applied to a raw data vector. They are arranged in two orderings, one serving as a smoothing filter and the other revealing the detail information of the data (D. Aerts and I. Daubechies, 1979).

The wavelet coefficient matrix is applied hierarchically in an algorithm based on its arrangement: odd rows contain the coefficients acting as smoothing filters, while even rows contain the wavelet coefficients that extract the detail information. The full-length data vector is smoothed and decimated by half, and the process is repeated on the result to smooth and halve the coefficients again. The goal is to extract the highest resolutions from the data source while performing data smoothing. Wavelet applications such as wavelet shrinkage and thresholding have proven efficient at removing noise from data. When data are decomposed using wavelets, some coefficients correspond to details of the dataset; if a detail is small, it can be removed without affecting the major features of the data.

Literature Review

The concept of thresholding is explained by S. Cai and K. Li (2010): coefficients that are equal to or less than a specified threshold are set to zero, and the remaining coefficients are used in an inverse wavelet transform to reconstruct the data set. The review also examines the research of Nikhil Rao (2001), which served as the basis for the development of a novel algorithm for compressing speech signals using discrete wavelet transform techniques.
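As an illustration of this thresholding idea, the following minimal MATLAB sketch (assuming the Wavelet Toolbox and some signal vector x; the wavelet, level, and threshold are arbitrary examples, not values from Cai and Li) zeroes the small detail coefficients before reconstruction:

    [C, L] = wavedec(x, 4, 'haar');             % 4-level Haar decomposition (example)
    thr    = 0.1;                               % illustrative threshold, not a tuned value
    detail = C(L(1)+1:end);                     % detail coefficients only (skip approximation)
    C(L(1)+1:end) = wthresh(detail, 'h', thr);  % hard threshold: zero details with magnitude <= thr
    xden   = waverec(C, L, 'haar');             % inverse transform from the kept coefficients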

The MATLAB software version 6 was utilized for simulating and implementing the codes. The steps taken for achieving compression are as follows:

  • Wavelet function is chosen.
  • Decomposition level is selected.
  • Speech signal is inputted.
  • Speech signal is divided into frames.
  • Each frame is decomposed.

  • Thresholds are calculated.

  • Coefficients are truncated.
  • Zero-valued coefficients are encoded.
  • Data frame is quantized and bit encoded.
  • Data frame is transmitted.
    The steps above are adapted from the work of Nikhil Rao (2001). Haar and Daubechies wavelets were used in the speech coding and synthesis. The MATLAB functions dwt, wavedec, waverec, and idwt were used to compute the wavelet transforms (Nikhil Rao, 2001): wavedec performs multilevel signal decomposition, waverec reconstructs the signal from its coefficients, and idwt performs the single-level inverse transform. All of these functions are available in MATLAB.

    The speech file analyzed was segmented into frames of 20 ms, with each frame consisting of 160 samples. These frames were then decomposed and compressed. The file format utilized was .OD files, allowing for the decomposition of the entire file without dividing it into frames.
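    A minimal sketch of this framing step is given below; the variable name speech, the 8 kHz rate, the Haar wavelet, and the decomposition level are assumptions for illustration rather than Rao's exact settings:

        frameLen = 160;                             % 20 ms at an assumed Fs = 8000 Hz
        nFrames  = floor(length(speech)/frameLen);  % 'speech' is an assumed signal vector
        for k = 1:nFrames
            frame  = speech((k-1)*frameLen+1 : k*frameLen);  % one 20 ms frame
            [C, L] = wavedec(frame, 5, 'haar');              % level and wavelet are examples
            % ... threshold, truncate, and encode C here before transmission ...
        end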

    In the experiment, both global and by-level thresholding techniques were employed. Global thresholding aimed to preserve the largest coefficients, irrespective of the size of the wavelet transform's decomposition tree. On the other hand, level thresholding preserved the approximate coefficients at each decomposition level. During this process, two bytes were used to encode zero values - the first byte indicating the starting points of zeros and the second byte tracking successive zeros.
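    One way to locate the zero runs that such an encoder would store is sketched below; coeffs is an assumed thresholded coefficient vector, and the actual byte-level format of Rao (2001) is not reproduced:

        idx = find(coeffs(:).' == 0);                 % positions of zero-valued coefficients
        if ~isempty(idx)
            runStart = idx([true, diff(idx) > 1]);    % byte 1: where each zero run starts
            runEnd   = idx([diff(idx) > 1, true]);    % last index of each zero run
            runLen   = runEnd - runStart + 1;         % byte 2: count of successive zeros
        end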

    The work of Qiang Fu and Eric A. Wan (2003) was also examined. Their work focused on improving speech quality through a wavelet de-noising framework. In their approach, the noisy speech signal is first processed with a spectral subtraction method, which removes noise from the signal before the wavelet transform is applied. Next, the traditional approach of decomposing the speech into different levels using wavelet transforms is employed, and thresholds are estimated for these levels. In this work, however, a modified version of the Ephraim/Malah suppression rule was used for threshold estimation. Finally, the speech signal is enhanced using the inverse wavelet transform.

    The speech signal was preprocessed to remove low levels of noise while minimizing distortion, using the generalized spectral subtraction algorithm proposed by Bai and Wan. Wavelet packet decomposition was then applied, with a six-stage tree-structured decomposition using a 16-tap FIR filter derived from the Daubechies wavelet; for an 8 kHz speech signal, this decomposition yields 18 levels. A new estimation method was used to calculate the threshold levels, taking into account the noise deviation at different levels and time frames, and soft thresholding was achieved with an altered version of the Ephraim/Malah suppression rule. The signal was re-synthesized using the inverse perceptual wavelet transform as the final stage. S. Manikandan's work in 2006 focused on reducing noise in received wireless signals using special adaptive techniques.

    In this study, the signal of interest was corrupted by white noise. A time-frequency dependent approach was employed to estimate the threshold level for de-noising. Both hard and soft thresholding techniques were used: in hard thresholding, coefficients below a chosen value are set to zero, while soft thresholding additionally shrinks the remaining coefficients. A universal threshold was used for the added Gaussian noise.

    The error criterion used was the mean squared error. However, experiments showed that this approximation is not very efficient for speech, because of the poor relationship between perceived quality and the presence of correlated noise. Hence, a new thresholding technique was implemented: the standard deviation of the noise was first estimated at different levels and time frames, and the threshold was then calculated for the signal as well as for the different sub-bands and their corresponding time frames.
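    A common way to realize this kind of level-dependent estimate is a median-based noise estimate combined with the universal threshold. The sketch below is offered only as an illustration (it assumes a decomposition [C, L] from wavedec and is not Manikandan's exact rule):

        d     = detcoef(C, L, 1);               % finest-level detail coefficients
        sigma = median(abs(d)) / 0.6745;        % robust estimate of the noise standard deviation
        thr   = sigma * sqrt(2*log(length(d))); % universal threshold for this level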

    Soft thresholding with a modified Ephraim/Malah suppression rule, as seen previously in other works in this area, was also implemented. Based on the obtained results, an unnatural voice pattern was identified, and a new technique based on a modification of the Ephraim/Malah rule was implemented.

    Procedure

    • Several voice recordings were performed and the file was read using the 'wavread' function as it was in a .wav format.
    • The entire length of the signal was chosen for analysis in this project.
    • Different MATLAB functions were used to calculate the uncorrupted signal power and signal-to-noise ratio (SNR).
    • Additive White Gaussian Noise (AWGN) was then added to the original recording, corrupting the uncorrupted signal.
    • The average power of the noise-corrupted signal and the signal-to-noise ratio (SNR) were then calculated.
    • Signal analysis followed, which included using the wavedec function in MATLAB to decompose the signal.
    • The detail coefficients and approximated coefficients were extracted and plotted to show the different levels of decomposition.
    • The different levels of coefficients were analyzed and compared, resulting in a detailed analysis of the decomposition.
    • After decomposing the different levels, denoising was performed using the ddencmp function in MATLAB.
    • The denoising process utilized the wdencmp function in MATLAB. A plot comparison was made between the noise-corrupted signal and the denoised signal.
    • The average power and SNR of the denoised signal were calculated and compared to those of the original and corrupted signals. (A MATLAB sketch of these steps follows this list.)
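    The list above can be summarized in a short MATLAB sketch. The file name, wavelet, decomposition level, and SNR value are assumptions for illustration (wavread/wavwrite are the older MATLAB audio functions used throughout this report; newer releases use audioread/audiowrite):

        [y, Fs]  = wavread('recording1.wav');            % read the clean recording (assumed file name)
        y        = y(:).';                               % work with a row vector
        z        = awgn(y, 80, 'measured');              % corrupt a copy with AWGN (assumed SNR value)
        [C, L]   = wavedec(z, 8, 'db4');                 % 8-level wavelet decomposition
        [thr, sorh, keepapp] = ddencmp('den', 'wv', z);  % default de-noising parameters
        den      = wdencmp('gbl', C, L, 'db4', 8, thr, sorh, keepapp);  % de-noised signal
        snrOut   = 10*log10(sum(y.^2) / sum((den - y).^2));  % one way to measure the output SNR in dB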
Implementation/Discussion

    The initial part of the project involved making a recording in MATLAB. This recording featured my own voice at the default sample rate of Fs = 11025 Hz. Code was written to perform the recording, and various variables were adjusted and specified in the code; the submitted m-file contains all the code used for this project. The recording lasted 9 seconds, and the wavplay function was used to replay the recorded audio until a satisfactory recording was achieved. Afterwards, the wavwrite function was used to store the recorded data, initially held in the variable "y", into a WAV file named "recording1". A plot was created to display the waveform of the recorded speech file (Fig. 1), which shows the original recording without any noise corruption. According to Fig. 1, the signal's maximum amplitude is +0.5 and its minimum amplitude is -0.3; visual inspection shows that the majority of the information in the speech signal lies within the amplitude range of +0.15 to -0.15.

    The power of the speech signal was calculated in MATLAB using a periodogram spectrum, which estimates the spectral density of the signal. This calculation uses the Fast Fourier Transform (The MathWorks, 1984-2010) on a finite-length digital sequence. A Hamming window parameter was used; a window function is zero outside a chosen interval, and the Hamming window is a commonly used window that is multiplied point by point with the input of the fast Fourier transform. This helps control the level of spectral artifacts in the magnitude of the Fourier transform results, especially when the input frequencies do not align with the bin centers.

    Windowing is a multiplication in the time domain, which corresponds to a convolution in the frequency domain. The result of this convolution is that energy at frequencies outside a given bin leaks into, and affects, the measured amplitude of that bin. Figure 2 shows the periodogram spectral analysis of the original recording.
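    A minimal sketch of this estimate (assuming the recorded signal y and sample rate Fs from above, and the Signal Processing Toolbox; the exact parameters used in the report are not shown) is:

        [Pxx, f] = periodogram(y, hamming(length(y)), length(y), Fs);  % one-sided PSD, Hamming window
        avgPower = trapz(f, Pxx);    % integrate the PSD as one way to estimate the average power (W)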

    From the spectral analysis, the power of the signal was determined to be 0.0011 watt. After analyzing the signal, noise was added to it; specifically, additive white Gaussian noise (AWGN) was used. AWGN is a random signal with a flat power spectral density (Wikipedia, 2010). The term "white" refers to the continuous and uniform frequency spectrum of the added noise over a fixed bandwidth around a given center frequency, and "additive" means that this impairment is added to, and corrupts, the original speech signal. The MATLAB code used to add the noise to the recording can be found in the m-file.

    The noise-addition code assumed a signal power of 1 watt and an SNR of 80, and it was applied to the signal z, which is a copy of the original recording y. The plot in Fig. 3 shows the analysis of the noise-corrupted original recording.

    Upon examining the plot, it can be seen that the original recording's information is masked by the additive white noise, a negative effect referred to here as aliasing. The noise overpowering the clean information causes distortion, as evident from the graph where the corrupted signal's amplitude surpasses that of the original recording.

    To determine the noise power of the corrupted signal, the signal power was divided by the signal-to-noise ratio. For the first recording, the calculated noise power is 1.37e-005 watt, which is also the noise power of the corrupted signal. The periodogram spectrum was then used to determine the average power of the corrupted signal; MATLAB calculations yielded a power of 0.0033 watt. Fig. 4 displays the periodogram spectral analysis of the corrupted signal. The plot shows that the frequency content of the corrupted signal is broader: the original recording's spectral analysis indicated a value of -20 Hz, while the corrupted signal showed a value of 30 Hz. This increase within the corrupted signal can be attributed to the added noise, which masked the original recording and perpetuated the aliasing described above.
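    The arithmetic described in this paragraph can be reproduced directly from the reported numbers:

        signalPower  = 0.0011;                    % average power of the clean recording (W)
        snrIn        = 80;                        % SNR used when the noise was added
        noisePower   = signalPower / snrIn;       % = 1.375e-005 W, matching the reported value
        corruptPower = 0.0033;                    % measured average power of the corrupted signal
        snrCorrupt   = corruptPower / noisePower; % = 240, the corrupted-signal SNR discussed next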

    The corrupted signal showed a higher average power than the original signal due to the additive noise. The signal-to-noise ratio (SNR) of the corrupted signal was calculated by dividing the corrupted power by the noise power, giving an SNR of 240; in contrast, the de-noised signal had an SNR of 472.72. The lower SNR of the corrupted signal reflects its higher noise level compared with the clean recording, and this decrease, along with the increased SNR of the de-noised signal, is discussed further below. The purpose of comparing SNR values is to measure the corruption caused by noise on a signal, with lower ratios indicating higher levels of corruption. MATLAB was used to calculate the signal and noise powers for this ratio, as shown above. The analysis then proceeded by creating a .wav file for the corrupted signal using the MATLAB command wavwrite, specifying the sample frequency (Fs), the corrupted signal (N), and the name of the noise recording. A file named x1 was then read back using the MATLAB command wavread for further analysis.

    The MATLAB command wavedec was used to perform multilevel wavelet decomposition on the signal x1. This function applies a one-dimensional multilevel decomposition using the discrete wavelet transform (DWT) with a pyramid algorithm. During the decomposition, the signal is filtered by a high-pass and a low-pass filter; the low-pass output is then filtered again by a high-pass and a low-pass filter, and the process repeats. The filters are linear time-invariant. The high-pass filter passes high frequencies and attenuates frequencies below a threshold known as the cutoff frequency, with the rate of attenuation specified by the designer; conversely, the low-pass filter passes only low-frequency signals and attenuates frequencies above the cutoff. This decomposition procedure was performed 8 times, and at each level the signal is downsampled by a factor of 2. The high-pass output at each stage represents the wavelet-transformed data, referred to as the detail coefficients.

    Fig. 5 shows the multilevel decomposition, following The MathWorks (1994-2010): vector C contains the decomposition coefficients, while vector L contains the bookkeeping information. In this representation, a signal X of a given length is decomposed into coefficients.

    During the first stage of decomposition, the signal X is convolved with a low-pass filter to obtain the approximation coefficients cA1, and with a high-pass filter to obtain the detail coefficients cD1.

    In the second stage, cA1 is processed further: it is filtered by the same high-pass and low-pass pair and downsampled by a factor of two to obtain the next level of approximation and detail coefficients.

    The algorithm above represents the first-level decomposition performed in MATLAB: the original signal x(t) is decomposed into approximation and detail coefficients. Passing the signal through the high-pass filter extracts the detail components, giving D2(t) + D1(t), and the analysis can be carried further through a multi-stage filter bank to produce additional detail coefficients, as shown in the algorithm below (The MathWorks, 1994-2010). The MathWorks (1994-2010) states that the coefficients cA_m(k) and cD_m(k), for m = 1, 2, 3, can be calculated by iterating, or cascading, a single-stage filter bank to create a multiple-stage filter bank. A graphical representation of the multilevel decomposition is shown in Fig. 6, where it can be observed that at each level the signal is downsampled by a factor of 2.
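    As a concrete MATLAB counterpart to this cascade, a minimal sketch (assuming the corrupted signal x1 from above and the Wavelet Toolbox; db4 and 8 levels follow the choices described in this report) is:

        [C, L] = wavedec(x1, 8, 'db4');    % pyramid decomposition into coefficient vector C and bookkeeping vector L
        cA8    = appcoef(C, L, 'db4', 8);  % approximation coefficients at level 8
        cD1    = detcoef(C, L, 1);         % detail coefficients at level 1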

    At d8, observation shows that the signal is downsampled by 2^8, i.e. 60,000/2^8 samples. The purpose of this is to improve frequency resolution. Lower frequencies are always present, but my main focus is on the higher frequencies that contain the actual data. I have utilized the Daubechies wavelet of type 4 (db4), which is defined by calculating running averages and differences through scalar products with scaling signals and wavelets (M. I. Mahmoud, M. I. M. Dessouky, S. Deyab, and F. H. Elfouly, 2007). For this specific wavelet, the frequency response is balanced but the phase response is non-linear.

    The Daubechies wavelet types use overlapping windows, so that the higher-frequency coefficients reflect changes in their corresponding frequencies; this makes them efficient for denoising and compressing audio signals. The Daubechies D4 transform has four wavelet coefficients and four scaling function coefficients, shown below. In the wavelet transform, a different scaling of the functions is used at each step: if the analyzed data contain N values, N/2 smoothed values are calculated by applying the scaling function, and these smoothed values are stored in the lower half of the input vector for the ordered wavelet transform. The wavelet function coefficients are g0 = h3, g1 = -h2, g2 = h1, g3 = -h0, where h0...h3 are the scaling function coefficients. The scaling and wavelet function values are calculated as the inner product of these coefficients with four data values.

    The equations are shown below (Ian Kaplan, July 2001). The wavelet transform steps are repeated to calculate the wavelet function value and the scaling function value; each repetition increases the index by two, producing a different wavelet and scaling function value. The diagram above shows the steps involved in the forward transform (The MathWorks, 1994-2010). From the diagram it can be seen that the data are split into even- and odd-indexed elements: the even elements are stored in the even array and the odd elements in the odd array. In practice this split is folded into a single function, even though the diagram shows it as two normalized steps. The input signal in the algorithm above (Ian Kaplan, July 2001) is then broken down into wavelets. One of the most significant benefits of the wavelet transform is its variable window: to identify discontinuous portions of a signal, short basis functions are most desirable.
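    The step described above can be sketched in MATLAB as follows. This is an illustrative implementation, not the report's code: it assumes an even-length row vector s, uses the standard normalized D4 coefficient values (which differ from the values quoted later in this report by a constant normalization factor), and wraps around at the end of the signal.

        % One forward step of the Daubechies D4 transform (sketch).
        h = [0.4829629131, 0.8365163037, 0.2241438680, -0.1294095226];  % scaling coefficients
        g = [h(4), -h(3), h(2), -h(1)];          % wavelet coefficients: g0 = h3, g1 = -h2, g2 = h1, g3 = -h0
        N = length(s);                           % 's' is an assumed even-length row vector
        a = zeros(1, N/2);  d = zeros(1, N/2);   % smoothed (scaling) and detail (wavelet) halves
        for i = 1:N/2
            idx  = mod((2*i-2):(2*i+1), N) + 1;  % four consecutive samples, wrapping at the end
            a(i) = sum(h .* s(idx));             % scaling (smoothing) inner product
            d(i) = sum(g .* s(idx));             % wavelet (detail) inner product
        end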

    To obtain a detailed frequency analysis, it is more effective to use long basis functions; a compromise is achieved by combining short high-frequency functions with long low-frequency ones (Swathi Nibhanupudi, 2003). Wavelet analysis involves an infinite set of basis functions, which allows the analysis of cases that cannot easily be handled by other time-frequency methods such as the Fourier transform. MATLAB code, provided in the m-file, is used to extract the detail coefficients. The wavelets commonly used are the Daubechies orthogonal wavelets D2-D20. The index number gives the number of coefficients, and for each wavelet the number of vanishing moments equals half the number of coefficients: D2 has one vanishing moment, D4 has two, and so on. The vanishing moments of a wavelet describe its ability to represent polynomial behavior or information in a signal. The D2 type, with a single moment, can encode a polynomial with one coefficient, i.e. the constant component of the signal.

    The D4 type encodes a polynomial with two coefficients, and the D6 type a polynomial with three. The scaling and wavelet functions must be normalized by a constant factor. The wavelet coefficients are obtained by reversing the order of the scaling function coefficients and changing the sign of every second one (D4 wavelet = {-0.1830125, -0.3169874, 1.1830128, -0.6830128}). Mathematically, this can be expressed as b_k = (-1)^k * c_(N-1-k), where k is the coefficient index, b is a wavelet coefficient, c is a scaling function coefficient, and N is the wavelet index (e.g. 4 for D4) (M. Bahoura, J. Bouat, 2009). Fig. 12 presents a plot illustrating the coefficient levels and their details.

    The next step in the de-noising process involves removing the noise using the MATLAB functions ddencmp and wdencmp, which perform thresholding to eliminate the noise. De-noising, which aims to eliminate uninformative noise from signals, is crucial in signal and image processing applications. Wavelets are commonly used in this field because of their efficient algorithms and the sparsity of the wavelet representation: the majority of wavelet coefficients have small magnitudes, while only a small subset has large magnitudes. This small subset contains the informative part of the signal, while the other coefficients describe noise and can be discarded for a noise-free reconstruction.

    The best-known methods for wavelet de-noising are thresholding approaches. In hard thresholding, coefficients with magnitudes greater than the threshold are kept unaltered, because they contain useful information, while the remaining coefficients are considered noise and set to zero. However, it is reasonable to assume that the coefficients are a mixture of noise and informative signal, and soft thresholding approaches have been proposed to address this: coefficients smaller than the threshold are set to zero, and the remaining coefficients are shrunk toward zero by an amount equal to the threshold. This reduces the impact of noise on all wavelet coefficients. In my project, I apply the de-noising algorithm after an eight-level decomposition. The decomposition depth reflects the fact that the signal is filtered through eight stages of low-pass filters, so the approximation sequence is not needed for the de-noising process.
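    A small numerical example (values chosen purely for illustration) shows the difference between the two rules:

        c    = [-0.3, -0.05, 0.02, 0.12, 0.6];   % illustrative wavelet coefficients
        thr  = 0.1;                              % example threshold
        hard = c .* (abs(c) > thr);              % hard: keep large coefficients unchanged
        soft = sign(c) .* max(abs(c) - thr, 0);  % soft: also shrink the survivors toward zero
        % hard = [-0.3  0  0  0.12  0.6],  soft = [-0.2  0  0  0.02  0.5]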

    The assumption made is that the added Gaussian noise is already known. The best technique I found for the de-noising process is soft thresholding, which gives better results than hard thresholding because it produces a smoother approximation; for audio de-noising it yields the best results. The threshold value calculated in MATLAB was 0.0825. To de-noise the signal, certain parameters need to be set.

    In hard thresholding, signal elements with values lower than the threshold are set to zero. In soft thresholding, the same procedure is applied, but the nonzero values are also shrunk towards zero to reduce discontinuities. The function ddencmp provides the default threshold level, with soft thresholding selected by default; its 'den' option requests the default values for de-noising.
