Source/Filter Theory and Speech Sound Production – Flashcards
question
For voiced sounds, the larynx produces pulses of air whose glottal spectrum is composed of a series of harmonics that decrease in amplitude as frequency increases. For unvoiced sounds, the source is turbulent noise.
answer
Source
question
The vocal tract filter is a tube that is closed at one end (the glottis) and open at the other (the lips). The resonances of the vocal tract are called formants. The formants act as band-pass filters. Changes in the length and shape of the vocal tract tube alter the resonant frequencies, resulting in production of different speech sounds.
answer
Filter
question
The output spectrum is obtained by passing the glottal spectrum through the vocal tract filter
answer
Output
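The three cards above (source, filter, output) can be sketched numerically. This is a minimal illustration, not a speech synthesizer: it assumes a source whose harmonics fall off at a constant 12 dB per octave and a filter built from simple two-pole resonances. The rolloff value, the formant frequencies and bandwidths, and all function names are assumptions made for the example. Working in decibels, passing the source through the filter amounts to adding the two curves.

```python
import math

def source_level_db(freq_hz, f0_hz, rolloff_db_per_octave=12.0):
    """Harmonic level relative to the fundamental, for an assumed constant rolloff."""
    return -rolloff_db_per_octave * math.log2(freq_hz / f0_hz)

def formant_gain_db(freq_hz, center_hz, bandwidth_hz):
    """Gain in dB of one two-pole resonance, normalized to 0 dB at 0 Hz."""
    half_bw = bandwidth_hz / 2.0
    numerator = center_hz ** 2 + half_bw ** 2
    denominator = math.sqrt(((freq_hz - center_hz) ** 2 + half_bw ** 2)
                            * ((freq_hz + center_hz) ** 2 + half_bw ** 2))
    return 20.0 * math.log10(numerator / denominator)

def output_spectrum(f0_hz, formants, n_harmonics=30):
    """Return (frequency, source dB, filter dB, output dB) for each harmonic."""
    rows = []
    for k in range(1, n_harmonics + 1):
        freq = k * f0_hz
        source_db = source_level_db(freq, f0_hz)
        # Resonances in cascade multiply in linear terms, so their dB gains add.
        filter_db = sum(formant_gain_db(freq, fc, bw) for fc, bw in formants)
        rows.append((freq, source_db, filter_db, source_db + filter_db))
    return rows

# Hypothetical neutral-vowel filter: (center frequency, bandwidth) in Hz.
FORMANTS = [(500.0, 60.0), (1500.0, 90.0), (2500.0, 120.0)]

for freq, source_db, filter_db, output_db in output_spectrum(100.0, FORMANTS, 10):
    print(f"{freq:6.0f} Hz  source {source_db:6.1f}  filter {filter_db:6.1f}  output {output_db:6.1f} dB")
```

The output keeps the harmonic spacing of the source, while the filter column decides which harmonics are boosted. Changing the F0 argument moves the harmonics but leaves the filter gain at any given frequency unchanged.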
question
Energy from the sound source (vocal folds) is modified by the resonance characteristics of the filter (vocal tract). The output spectrum represents the result of the filtering of the vocal tract on the sound source. The output spectrum has a harmonic energy distribution, but the amplitude characteristics of the source harmonics have been shaped by the formants. The source and the filter are independent of each other. Therefore, the harmonic structure of the source has no effect on the resonance frequencies (formants) of the filter (vocal tract). Speech can be produced at many different fundamental frequencies. Sources of sound may include: vocal fold vibration (glottal source), air turbulence through regions of constriction, or external energy sources.
answer
The Acoustic Theory of Speech Production (Source-Filter Theory)
question
Are due to the size and shape of the container/cavity, not the nature of the sound source into that container. Thus, a speaker's fundamental frequency does not affect the formant frequencies.
answer
Resonances
question
The rate of vocal fold vibration (i.e., the fundamental frequency). Thus, changing the size of the vocal tract by moving the articulators affects the formant frequencies, but not the speaker's vocal pitch. Women have higher vocal pitch than men due to shorter vocal folds. Women have higher formant frequencies than men due to shorter vocal tracts.
answer
A speaker's pitch is determined by...
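The independence of pitch and formants can be illustrated with the standard quarter-wavelength formula for a uniform tube closed at one end and open at the other, F_n = (2n - 1) c / (4L). The sketch below is a simplification: a real vocal tract is not a uniform tube, the speed of sound (35,000 cm/s) is an assumed round value, and both tract lengths are illustrative rather than measured.

```python
SPEED_OF_SOUND_CM_PER_S = 35000.0  # assumed round value for warm, humid air

def tube_formants(length_cm, count=3):
    """Resonances (Hz) of a uniform tube closed at one end and open at the other."""
    return [(2 * n - 1) * SPEED_OF_SOUND_CM_PER_S / (4.0 * length_cm)
            for n in range(1, count + 1)]

def harmonics(f0_hz, count=5):
    """Harmonic frequencies (Hz) of a periodic source: integer multiples of F0."""
    return [k * f0_hz for k in range(1, count + 1)]

# Illustrative lengths only: 17.5 cm is a common textbook adult-male value,
# 14.0 cm is a hypothetical shorter tract.
for length_cm in (17.5, 14.0):
    print(length_cm, "cm ->", tube_formants(length_cm))

# Changing F0 changes the harmonics, but the tube function never sees F0.
for f0_hz in (120.0, 220.0):
    print(f0_hz, "Hz ->", harmonics(f0_hz, 3), tube_formants(17.5))
```

The shorter tube yields higher resonances at every formant, which matches the card's point about shorter vocal tracts, while F0 appears only in the source function.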
question
Speech sounds need airflow. Most sounds (including all the sounds of English) are created by modifying a stream of air that is pushed outward from the lungs. Larynx - Periodic complex tone produced by vocal fold vibration. Oral Cavity - Aperiodic noise produced by pressure changes or airflow through a constriction.
answer
Two Sound Sources for Speech
question
Yes. Even though the sound source is not always vocal fold vibration, there is still a SOURCE and a FILTER (resonating chamber).
answer
Can the source-filter theory account for consonant production?
question
The sound source for fricatives is turbulent noise produced by airflow through a tight constriction. The filter is the oral cavity, specifically the area in front of the constriction. The size of the cavity is the primary determinant of the resonant shaping (transfer function) of the noise (sound source). The length of the cavity determines its lowest frequency resonance. So, the source (turbulent noise) can be shaped (filtered) by the resonant characteristics of the filter (oral cavity in front of the constriction) in the same way that the laryngeal sound source is shaped by the resonances of the oral cavity for vowels.
answer
Fricatives
question
If the length of the front cavity is very short (labio-dental & lingua-dental fricatives), the lowest frequency resonance of that cavity will be so high that it will contribute little to the shaping of the source (turbulent noise) spectrum. This will produce a flat spectrum for the radiated fricative. If the length of the front cavity is increased, the lowest frequency resonance of that cavity will be low enough to affect the shaping (filtering) of the noise source. For example, listen to the sound of "s" vs "sh". Which has the lower "pitch"?
answer
Fricatives and cavity size/length
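A rough way to see why a longer front cavity gives a lower "pitch" is to treat the front cavity as a short tube closed at the constriction and open at the lips, so its lowest resonance is about c / (4L). That tube model, the speed of sound, and the three cavity lengths below are all assumptions chosen to show the trend; they are not measurements.

```python
SPEED_OF_SOUND_CM_PER_S = 35000.0  # assumed round value

def lowest_front_cavity_resonance_hz(front_cavity_cm):
    """Lowest quarter-wavelength resonance (Hz) of an idealized front cavity."""
    return SPEED_OF_SOUND_CM_PER_S / (4.0 * front_cavity_cm)

# Hypothetical front-cavity lengths (cm), chosen only to illustrate the trend.
FRONT_CAVITY_CM = {"f": 0.5, "s": 2.0, "sh": 3.5}

for sound, length_cm in FRONT_CAVITY_CM.items():
    resonance_hz = lowest_front_cavity_resonance_hz(length_cm)
    print(f"{sound:>2}: front cavity {length_cm} cm -> lowest resonance {resonance_hz:.0f} Hz")
```

Under these assumed lengths the very short cavity resonates far above the range that matters for speech, while the longer "sh" cavity resonates lower than the "s" cavity.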
question
Under certain conditions, the back cavity can contribute to the resonance shaping - this condition is termed "coupling" between the front and back cavities.
answer
Coupling
question
A. laminar; B. turbulent; C. first laminar, then turbulent. Laminar flow: all the molecules are moving in the same direction. Turbulent flow: some of the molecules are moving sideways as well as downstream.
answer
Laminar and turbulent flow
question
Determines the vocal tract filtering for many consonants. If we think of formants as Band-Pass Filters, Antiformants may be considered Band-Reject Filters. Antiformants weaken (attenuate) the amplitude of the frequency components of the source spectrum. Frequencies of the antiformants are determined by the size of the back cavity and the size of the constriction (for fricatives).
answer
Antiformants
question
When the vocal tract is radically constricted (stops or fricatives).
When the vocal tract is bifurcated (split) into two passages.
-Example: oral & nasal passages formed for nasal sounds.
-Example: oral cavity split left/right by the midline tongue closure produced for an "L".
answer
Antiformants arise...
question
Vowels are produced with a relatively open vocal tract. Vibration of the vocal folds is the sound source for vowel production - Vowels are always voiced. The shape of the vocal tract determines the resonance pattern (formants) for a particular vowel. Vowels have the highest energy and longest duration of the phonemes
answer
Vowels
question
In English, back vowels tend to be produced with lip rounding, front vowels without. The acoustic correlate of lip rounding is lowering of all of the formant frequencies. Lip rounding lengthens the vocal tract.
answer
Lip rounding
question
Differences in the vowel at the beginning, middle, and end
answer
Coarticulation causes what in a vowel?
question
A listener can essentially "hear" the position of the speaker's tongue body. The position of vowels in the vowel quadrilateral is predicted better by formant frequencies than by articulatory shape. F1 is influenced by tongue body height. F2 is influenced by tongue body A-P (anterior-posterior, i.e., frontness/backness) position. An even more accurate indicator of A-P position than F2 is the difference between the first two formants, i.e., F2 - F1.
answer
Vowel quadrilateral
question
(F1 = First Formant Frequency, F2 = Second Formant Frequency)
F1 is associated with tongue height:
/i/ & /u/ (high vowels): low F1
/æ/ & /a/ (low vowels): high F1
F2 is associated with A-P position of the tongue:
/u/ & /a/ (back vowels): low F2
/i/ & /æ/ (front vowels): high F2
No matter how high F1 is, it will always be lower in frequency than F2.
answer
General rules of vowel formants
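These rules can be checked against a small table of formant values. The sketch below uses adult-male averages as commonly cited from Peterson & Barney (1952); the low back vowel written /a/ in this set corresponds to /ɑ/ in that study. The two cutoffs and the function name are hypothetical, picked only so that the four corner vowels separate cleanly, and should not be read as perceptual boundaries.

```python
# Adult-male average formants in Hz, as commonly cited from Peterson & Barney (1952).
VOWEL_FORMANTS_HZ = {
    "i": (270, 2290),
    "æ": (660, 1720),
    "a": (730, 1090),  # low back vowel, /ɑ/ in the original study
    "u": (300, 870),
}

F1_HIGH_VOWEL_MAX_HZ = 450      # hypothetical cutoff: F1 below this counts as a high vowel
F2_MINUS_F1_FRONT_MIN_HZ = 800  # hypothetical cutoff: F2 - F1 above this counts as front

def describe_vowel(f1_hz, f2_hz):
    """Label a vowel by tongue height (from F1) and A-P position (from F2 - F1)."""
    height = "high" if f1_hz < F1_HIGH_VOWEL_MAX_HZ else "low"
    position = "front" if (f2_hz - f1_hz) > F2_MINUS_F1_FRONT_MIN_HZ else "back"
    return height, position

for vowel, (f1_hz, f2_hz) in VOWEL_FORMANTS_HZ.items():
    print(f"/{vowel}/  F1={f1_hz}  F2={f2_hz}  F2-F1={f2_hz - f1_hz}  ->", describe_vowel(f1_hz, f2_hz))
```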
question
Relative values (i.e., how the formant frequencies compare to each other). When a speaker produces the vowel /i/, we hear /i/ because of how F1 compares to F2, F3, etc.
answer
It is not the actual values of formants that allow you to hear an /a/ or an /i/, but the...
question
May be voiced or unvoiced. The sound source may be the vocal folds or the result of vocal tract constriction.
answer
Consonants
question
Bursts are typically evident for stops in word initial and medial position, but don't always occur in word final position. The burst is only about 10-30 msec in duration. Although transient, the burst (release) of the stop consonant may signal information about the place of articulation.
answer
Stops acoustic cue - the burst
question
Diffuse, flat, or falling spectrum
answer
Bilabial stop articulation
question
Diffuse, rising spectrum
answer
Alveolar stop articulation
question
Compact (mid-frequency emphasis) spectrum
answer
Velar stop articulation
question
Bends in the formant pattern that occur during the movement from the closure for the stop to the open vocal tract posture of the following sound (or vice versa). During a stop-to-vowel transition, formants may assume a rising, falling, or relatively flat pattern. This pattern depends on the place of articulation for the stop and the vocal tract configuration for the adjacent vowel.
answer
Stops acoustic cue - formant transitions
question
/b d g/ stops are voiced, /p t k/ are not. Acoustic cue that signals voicing for stops: Voice Onset Time. VOT = the interval between the release of the stop (the burst) and the onset of voicing. Negative, simultaneous, or short VOT (0-30 msec) indicates voiced stops. Long VOT (> 30 msec) indicates voiceless (aspirated) stops.
answer
Stops acoustic cue - voice onset time
question
voice onset precedes the burst release
answer
Prevoicing (negative VOT)
question
voice onset is simultaneous with the burst
answer
Simultaneous voicing
question
voice onset slightly follows the burst release
answer
Short lag
question
voice onset follows burst release at a longer lag
answer
Long lag
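The four VOT categories above can be written as a small decision rule. The sketch uses the 30 msec boundary given on the voice onset time card; real boundaries vary with place of articulation, speaker, and language, so treat the cutoff as this set's rule of thumb. The function names are made up for the example.

```python
LONG_LAG_BOUNDARY_MS = 30.0  # boundary taken from the VOT card above

def classify_vot(vot_ms):
    """Map a voice onset time in msec (voicing onset minus burst release) to a category."""
    if vot_ms < 0:
        return "prevoicing"            # voicing starts before the burst
    if vot_ms == 0:
        return "simultaneous voicing"  # voicing starts with the burst
    if vot_ms <= LONG_LAG_BOUNDARY_MS:
        return "short lag"             # voicing starts slightly after the burst
    return "long lag"                  # voicing starts well after the burst

def is_voiced_stop(vot_ms):
    """Everything except long lag is treated as a cue for a voiced stop."""
    return classify_vot(vot_ms) != "long lag"

for vot_ms in (-80.0, 0.0, 15.0, 70.0):
    print(vot_ms, "msec ->", classify_vot(vot_ms), "| voiced:", is_voiced_stop(vot_ms))
```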
question
Acoustic cue signaling VOICING: Voice Onset Time (VOT).
Acoustic cues signaling PLACE of articulation: Spectrum of Burst, Formant Transitions.
Acoustic cues signaling MANNER of articulation: Silence (Stop Gap), Plosion (Burst), Aspiration (for voiceless stops), Stereotypic rising F1 pattern.
answer
Stops summary
question
Nasals are sonorants - always voiced. Nasal consonants are produced with nasal radiation of sound energy. The vocal tract is bifurcated (split) into two passages, creating antiformants. Although air cannot flow out of the oral cavity during the production of a nasal, the sound waves within the oral cavity and those within the nasal cavity interact in complex ways.
answer
Nasals, antiresonance
question
Nasal Murmur: cue for MANNER of articulation.
-Spectrum dominated by low-frequency energy: low-frequency formant at < 500 Hz (adult male speakers) due to the combinations of resonances, antiresonances, and damping effects of the nasal cavity.
-Energy at higher frequencies is weak compared to this low-frequency resonance.
-The nasal murmur for the three nasal consonants is very similar: not a reliable cue for place of articulation, only MANNER.
Formant Transitions: cue for PLACE of articulation. Follow similar rules as stops.
answer
nasals, acoustic cues
question
Fricatives are obstruents because they involve the partial blocking of the vocal tract. Fricatives may be voiced or voiceless. The opening is so narrow that turbulence (eddy currents), heard as frication, develops as air passes through the constriction. Additionally, an obstacle (teeth, lips) may be present in front of the constriction. Turbulent noise is inharmonic and has a broad (flat) spectrum, with energy at many frequencies.
answer
Fricative phonetic quality
question
A primary feature of fricative production is the location at which turbulence occurs (location of the constriction). The turbulence noise (source) is filtered by the vocal tract, especially the cavity in front of the constriction. The size of the front cavity determines its filtering characteristics (transfer function). -A short front cavity will have little or no filtering effect because its lowest resonant frequency is so high that it isn't perceptually significant. -A long front cavity will have resonant frequencies that are low enough to contribute to the shaping of the source spectrum.
answer
Fricative phonetic quality, spectrum
question
Very short front cavity: little or no filtering effect on the turbulent noise source. Spectrum is of low energy, flat, and diffuse. These consonants sound very similar and appear similar on a spectrogram.
answer
Labiodental /f, v/ & Linguadental /θ, ð/ Fricatives
question
Low-energy, flat, and diffuse spectrum similar to the labiodentals & linguadental fricatives. Turbulence noise is generated in the larynx & pharynx & is filtered by the vocal tract as a whole. Vowel-like formant patterns are often evident in the radiated noise
answer
Glottal Fricative: /h/
question
Labiodental: /f, v/. Linguadental: /θ, ð/. Glottal fricative: /h/.
answer
Weak fricatives
question
Lingua-alveolar fricatives: /s, z/. Linguapalatal fricatives: /ʃ, ʒ/.
answer
Strong fricatives
question
High-energy noise spectra with most energy in the high frequencies (> 4 kHz for men, > 5 kHz for women). Front cavity contributes to shaping of the source spectrum.
answer
Lingua-alveolar fricatives: /s, z/
question
Intense noise spectra with most energy in mid to high frequencies (> 2 kHz for men, > 3 kHz for women). Front cavity has a significant filtering effect on the source.
answer
Linguapalatal fricatives: /ʃ, ʒ/
question
Cues to PLACE of articulation: Frequency range of frication (spectrum), Formant Transitions (more important for the weak fricatives).
Cues to MANNER of articulation: Broad, flat spectrum; frication noise.
Cues to VOICING: Whether or not the vocal folds are vibrating during the period of turbulent noise production. On a spectrogram, a voice bar will be present if voiced.
answer
Fricative acoustic cues
question
Affricate: a sequence of two speech sounds, a stop and a fricative, acting as a single linguistic unit, produced at the same place of articulation. Affricates (/tʃ, dʒ/) have a stop gap (silence) followed by intense frication. Frication noise is generated in a similar way to the fricatives, and the frequency range of frication is the same as for a fricative. Duration of frication is shorter in an affricate than in a fricative. Transition and voicing cues are similar to those of the stops and fricatives.
answer
Affricate acoustic cues
question
Glides are sonorants - always voiced. Glides ( / j w / ) have gradual transitions that appear as slowly changing formant patterns. Formant transitions are typically 75-150 msec, as compared to stops with transition times of approximately 50-75 msec.
answer
Glide acoustic cues
question
Liquids are sonorants - always voiced. The formant pattern (steady state and transition) is the primary acoustic cue. / l /: lateral, / r /: retroflex. The formant pattern for the / l / results from the combination of formants and antiformants due to the oral cavity being split left/right by midline tongue closure. Steady-state formant values for / l / (adult male) are 360 Hz for F1, 1300 Hz for F2, & 2700 Hz for F3. Liquid / r / has similar steady-state frequencies for F1 & F2, but F3 is much lower at ~1600 Hz. The formant frequencies begin at these values and then change to match the frequencies of the following sound. Formant transitions are the most reliable auditory cue for discrimination of the liquids.
answer
Liquid acoustic cues
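Since the card gives F1 and F2 as similar for the two liquids, the steady-state F3 is what separates them in these numbers. The toy rule below picks whichever adult-male reference F3 is closer to a measured value. It is only an illustration of the figures on the card: the nearest-reference rule and the function name are assumptions, and it ignores the formant transitions that the card names as the most reliable cue.

```python
# Adult-male steady-state F3 values (Hz) taken from the card above.
REFERENCE_F3_HZ = {"l": 2700.0, "r": 1600.0}

def closer_liquid(f3_hz):
    """Return the liquid whose reference F3 is nearest to the measured F3."""
    return min(REFERENCE_F3_HZ, key=lambda liquid: abs(f3_hz - REFERENCE_F3_HZ[liquid]))

for f3_hz in (2600.0, 1700.0):
    print(f3_hz, "Hz ->", closer_liquid(f3_hz))
```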