The vocal-tract filter

In general it is the oral cavity which is responsible for the variation in sound. The pharynx and nasal cavity are relatively fixed, but the tongue, lips and jaw can all be used to change the shape of the oral cavity and hence modify the sound. The vocal tract can modify sounds from other sources as well, by operation of the same principle. [Pg.151]

This model, whereby we see speech as being generated by a basic sound source and then further modified by the vocal tract, is known as the source/filter model of speech. The separation into source and filter not only adequately represents the mechanics of production but also corresponds to a reasonable model of perception, in that it is known that listeners separate their perception of the source, in terms of its fundamental frequency, from the modified pattern of its harmonics. Furthermore, we know that the main acoustic dimension of prosody is the fundamental frequency, whereas the main dimensions of verbal distinction come from a combination of the type of sound source (but not its frequency) and the modification by the vocal tract. The mathematics of both the source and the filter will be fully described in Chapter 10. [Pg.151]
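As a concrete illustration of the source/filter idea, here is a minimal sketch in Python: an impulse train standing in for a voiced glottal source is passed through a fixed all-pole filter standing in for the vocal tract. The filter coefficients are arbitrary placeholders chosen only to be stable, not measurements of any real vocal-tract shape.

```python
# Minimal source/filter sketch: an impulse-train "glottal" source passed
# through a fixed all-pole "vocal tract" filter. The coefficients are
# illustrative placeholders, not a model of any particular vowel.
import numpy as np
from scipy.signal import lfilter

fs = 16000                 # sample rate in Hz
f0 = 100                   # fundamental frequency of the source in Hz
n = fs // 2                # half a second of samples

# Source: a unit impulse every fs/f0 samples (a crude periodic pulse train).
source = np.zeros(n)
source[::fs // f0] = 1.0

# Filter: an arbitrary stable all-pole filter standing in for the vocal tract.
b = [1.0]
a = [1.0, -1.3, 0.8]       # complex pole pair inside the unit circle

output = lfilter(b, a, source)   # the "radiated" output of the model
```

Changing a (the shape of the "vocal tract") changes which harmonics of the 100 Hz source are emphasised, while changing f0 changes the harmonic spacing but leaves the filter's resonances where they are; this is exactly the separation the source/filter model relies on.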

The diversity of sound from the source and output is further enriched by the operation of the vocal tract. The vocal tract is the collective term given to the pharynx, the oral cavity and the nasal cavity. These articulators can be used to modify the basic sound source and, in doing so, create a wider variety of sounds than would be possible with the source alone. Recall that all voiced sounds from the glottis comprise a fundamental frequency and its harmonics. The vocal tract functions by modifying these harmonics, which has the effect of changing the timbre of the sound. That is, it does not alter the fundamental frequency, or even the frequency of the harmonics, but it does alter the relative strengths of the harmonics. [Pg.153]


We have informally observed that the vocal-tract filter acts as a resonator; that is, it amplifies certain frequencies and attenuates others. How does this behaviour arise ... [Pg.318]

At integer multiples of the fundamental frequency we have the harmonics. Speech with a low fundamental frequency (say 100 Hz) will have closely spaced harmonics (occurring at 200 Hz, 300 Hz, 400 Hz, ...), whereas speech with a higher fundamental frequency (e.g. 200 Hz) will have widely spaced harmonics (400 Hz, 600 Hz, 800 Hz etc.). The tongue, jaw and lip positions create differently shaped cavities, the effect of which is to amplify certain harmonics while attenuating others. This gives some clue as to why we call this a vocal-tract filter: here the vocal tract filters the harmonics by changing the amplitude of each harmonic. [Pg.159]

Another way to characterize the LPC filter is as an autoregressive (AR) spectral envelope model [Kay, 1988]. The error minimized by LPC (time-waveform prediction error) forces the filter to model parametrically the upper spectral envelope of the speech waveform [Makhoul, 1975]. Since the physical excitation of the vocal tract is not spectrally flat, the filter obtained by whitening the prediction error is not a physical model of the vocal tract. (It would be only if the glottal excitation were an impulse... [Pg.510]
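The claim that the AR model tracks the upper spectral envelope is easy to check numerically. The sketch below (an illustration, not code from the cited papers) fits prediction coefficients to one windowed frame by solving the autocorrelation normal equations directly, then compares the resulting AR magnitude response with the frame's DFT magnitude; frame is assumed to be a one-dimensional array holding a short segment of speech samples.

```python
# Compare a frame's DFT magnitude with its AR (LPC) spectral envelope.
# `frame` is assumed to be a 1-D numpy array holding one speech frame.
import numpy as np
from scipy.signal import freqz

def ar_envelope(frame, order=12, nfft=512):
    x = frame * np.hamming(len(frame))
    # Autocorrelation method: solve the normal equations R a = r directly.
    r = np.correlate(x, x, mode="full")[len(x) - 1:]
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    a = np.linalg.solve(R, r[1:order + 1])              # predictor coefficients
    A = np.concatenate(([1.0], -a))                     # A(z) = 1 - sum a_k z^-k
    gain = np.sqrt(max(r[0] - a @ r[1:order + 1], 1e-12))
    w, H = freqz([gain], A, worN=nfft)                  # AR envelope G / |A(e^jw)|
    X = np.fft.rfft(x, 2 * nfft)[:nfft]                 # frame spectrum, same grid
    return w, np.abs(H), np.abs(X)
```

Plotting the AR magnitude over the DFT magnitude shows the AR curve riding along the harmonic peaks rather than through their average, which is the envelope-fitting behaviour described above.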

A common and popular use of LPC is for speech analysis, synthesis, and compression. The reason for this is that the voice can be viewed as a source-filter model, where a spectrally rich input (pulses from the vocal folds or noise from turbulence) excites a filter (the resonances of the vocal tract). LPC is another form of vocoder (voice coder) as discussed in Chapter 7, but since LPC filters are not fixed in frequency or shape, fewer bands are needed to dynamically model the changing speech spectral shape. [Pg.90]
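To make the source-filter reading of LPC concrete, here is a toy analysis/resynthesis sketch for a single frame: the frame is inverse-filtered with A(z) to obtain a roughly flattened residual, and 1/A(z) is then driven either by that residual or by a synthetic impulse train. The coefficient fit uses the autocorrelation normal equations; frame, f0 and the filter order are illustrative assumptions, not values from this text.

```python
# Toy single-frame LPC analysis and resynthesis (autocorrelation method).
# `frame` is assumed to be a 1-D numpy array of speech samples.
import numpy as np
from scipy.signal import lfilter

def lpc_coeffs(frame, order=12):
    # Predictor coefficients from the autocorrelation normal equations.
    r = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    a = np.linalg.solve(R, r[1:order + 1])
    return np.concatenate(([1.0], -a))        # A(z) coefficients: 1, -a1, ..., -ap

def analyse_resynthesise(frame, f0=100, fs=16000, order=12):
    A = lpc_coeffs(frame, order)
    residual = lfilter(A, [1.0], frame)       # inverse filtering: near-flat source
    recon = lfilter([1.0], A, residual)       # driving 1/A(z) with the residual
    # Replace the residual with an impulse train of matching energy
    # (the classic "buzzy" LPC excitation).
    pulses = np.zeros(len(frame))
    pulses[::max(fs // f0, 1)] = 1.0
    pulses *= np.sqrt(np.sum(residual ** 2) / max(np.sum(pulses ** 2), 1))
    buzzy = lfilter([1.0], A, pulses)
    return recon, buzzy
```

Here recon is essentially the original frame back again, while buzzy keeps the spectral envelope but replaces the fine structure of the source; this substitution is what makes LPC useful for compression.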

An amplification caused by a filter is called a resonance, and in speech these resonances are known as formants. The frequencies at which resonances occur are determined solely by the position of the vocal tract; they are independent of the glottis. So no matter how the harmonics are spaced, for a given vocal-tract position the resonances will always occur at the same frequencies. Different mouth shapes give rise to different patterns of formants, and in this way the production mechanisms of height and loudness give rise to different characteristic acoustic patterns. As each vowel has a different vocal-tract shape, it will have a different formant pattern, and it is these patterns that the listener uses as the main cue to vowel identity. The relationship between mouth shapes and formant patterns is complicated, and is fully examined in Chapter 11. [Pg.161]

We know that the vocal tract has multiple formants. Rather than developing more and more complicated models to relate formant parameters to transfer functions directly, we can instead make use of the factorisation of the polynomial to simplify the problem. Recall from equation 10.66 that any transfer function polynomial can be broken down into its factors. We can therefore build a transfer function of any order by combining simple first and second order filters ... [Pg.310]
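As a quick, hedged illustration of that factorisation (not an example from the book), scipy's tf2sos splits a higher-order transfer function into a cascade of second-order sections, one conjugate pole pair, and hence one potential resonance, per section:

```python
# Factor a higher-order all-pole transfer function into second-order sections.
import numpy as np
from scipy.signal import tf2sos

# An arbitrary stable 8th-order denominator built from four pole pairs.
poles = []
for freq, radius in [(500, 0.97), (1500, 0.95), (2500, 0.94), (3500, 0.90)]:
    theta = 2 * np.pi * freq / 16000      # pole angle at a 16 kHz sample rate
    poles += [radius * np.exp(1j * theta), radius * np.exp(-1j * theta)]

a = np.real(np.poly(poles))               # denominator polynomial A(z)
b = [1.0]                                 # all-pole: constant numerator

sos = tf2sos(b, a)                        # one row [b0 b1 b2 a0 a1 a2] per biquad
print(sos.shape)                          # (4, 6): four second-order factors
```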

By definition, H gives the transfer function and frequency response for a unit impulse. In reality, of course, the vocal-tract input for vowels is the quasi-periodic glottal waveform. For demonstration purposes, we will examine the effect of the /ih/ filter on a square wave, which we will use as a (very) approximate glottal source. We can generate the output waveform y[n] by using the difference equation, and find the frequency response of this vowel from H(e^jω). The input and output in the time domain and frequency domain are shown in figure 10.26. If the transfer function does indeed accurately describe the frequency behaviour of the filter, we should expect the spectrum of y[n], calculated by DFT, to match H(e^jω)X(e^jω). We can see from figure 10.26 that indeed it does. [Pg.311]
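The same consistency check can be sketched in a few lines. The two-resonance filter below is only a stand-in (the actual /ih/ coefficients behind figure 10.26 are not reproduced here), and the square wave is generated so that an integer number of periods fits the analysis frame:

```python
# Check that the DFT of y[n] tracks H(e^jw) X(e^jw) for a square-wave input.
import numpy as np
import matplotlib.pyplot as plt
from scipy.signal import lfilter, square, freqz

fs, f0, N = 16000, 125, 1024              # 125 Hz -> exactly 128 samples per period
t = np.arange(N) / fs
x = square(2 * np.pi * f0 * t)            # crude stand-in for a glottal source

# Placeholder two-resonance all-pole filter (NOT the book's /ih/ coefficients).
poles = [0.95 * np.exp(1j * 2 * np.pi * 300 / fs),
         0.93 * np.exp(1j * 2 * np.pi * 2300 / fs)]
a = np.real(np.poly(poles + [p.conjugate() for p in poles]))
b = [1.0]

y = lfilter(b, a, x)                      # output via the difference equation

w = 2 * np.pi * np.fft.rfftfreq(N)        # DFT bin frequencies in rad/sample
X, Y = np.fft.rfft(x), np.fft.rfft(y)
_, H = freqz(b, a, worN=w)                # frequency response on the same grid

# Apart from the filter's start-up transient, |Y| should lie on top of |H X|.
plt.plot(w, 20 * np.log10(np.abs(Y) + 1e-9), label="|DFT of y[n]|")
plt.plot(w, 20 * np.log10(np.abs(H * X) + 1e-9), "--", label="|H(e^jw) X(e^jw)|")
plt.legend(); plt.xlabel("frequency (rad/sample)"); plt.show()
```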

The speech production process was qualitatively described in Chapter 7. There we showed that speech is produced by a source, such as the glottis, which is subsequently modified by the vocal tract acting as a filter. In this chapter, we turn our attention to developing a more formal quantitative model of speech production, using the techniques of signals and filters described in Chapter 10. [Pg.316]

Our first task is to build a model where the complex vocal apparatus is broken down into a small number of independent components. One way of doing this is shown in Figure 11.1b, where we have modelled the lungs, glottis, pharynx cavity, mouth cavity, nasal cavity, nostrils and lips as a set of discrete, connected systems. If we make the assumption that the entire system is linear (in the sense described in Section 10.4), we can then produce a model for each component separately and determine the behaviour of the overall system from the appropriate combination of the components. While the shape of the vocal tract will of course be continuously varying in time during speech, if we choose a sufficiently short time frame, we can consider the operation of the components to be constant over that short period of time. This, coupled with the linearity assumption, allows us to use the theory of linear time-invariant (LTI) filters (Section 10.4) throughout. Hence we describe the pharynx cavity, mouth cavity and lip radiation as LTI filters, and so the speech production process can be stated as the operation of a series of z-domain transfer functions on the input. [Pg.317]
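Written in the z-domain, the cascade assumption amounts to multiplying the component transfer functions together. The symbols below are illustrative names rather than the book's exact notation: U(z) for the glottal source spectrum, V_p(z) and V_o(z) for the pharyngeal and oral cavities, and R(z) for lip radiation (for nasalised sounds a parallel nasal branch would be added):

$$ Y(z) \;=\; U(z)\,V_p(z)\,V_o(z)\,R(z) $$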

In Section 12.12, we showed that the lossless-tube model was a reasonable approximation for the vocal tract during the production of a vowel. If we assume for now that H(z) can therefore be represented by an all-pole filter, we can write... [Pg.365]
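In the usual all-pole notation, with gain G and p predictor coefficients a_k (the symbols here are the conventional ones and may differ cosmetically from the book's), the filter referred to is

$$ H(z) \;=\; \frac{G}{1 - \sum_{k=1}^{p} a_k z^{-k}} $$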

These steps are repeated until i = p, at which point we have a polynomial, and hence a set of predictor coefficients, of the required order. We have just seen how the minimisation of error over a window can be used to estimate the linear-prediction coefficients. As these are in fact the filter coefficients that define the transfer function of the vocal tract, we can use them in a number of ways to generate other useful representations. [Pg.370]
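The order-by-order recursion described above, stopping at i = p, is the Levinson-Durbin algorithm. A compact, hedged implementation from the frame's autocorrelation sequence is sketched below; variable names are illustrative and the sign convention stores A(z) as [1, a1, ..., ap].

```python
# Levinson-Durbin recursion: predictor polynomial of order p from the
# autocorrelation values r[0..p] of one windowed speech frame.
import numpy as np

def levinson_durbin(r, p):
    a = np.zeros(p + 1)
    a[0] = 1.0                    # A(z) stored as [1, a1, ..., ap]
    e = r[0]                      # prediction-error energy at order 0
    for i in range(1, p + 1):
        # Reflection (PARCOR) coefficient for the step from order i-1 to i.
        k = -(r[i] + np.dot(a[1:i], r[i - 1:0:-1])) / e
        a[1:i] = a[1:i] + k * a[i - 1:0:-1]
        a[i] = k
        e *= 1.0 - k * k          # error energy shrinks with each order
    return a, e

# Example: r = np.correlate(x, x, "full")[len(x) - 1:len(x) + p] for a frame x;
# the returned `a` is usable directly as the denominator of the synthesis
# filter 1/A(z), e.g. with scipy.signal.lfilter([1.0], a, excitation).
```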

That said, formant synthesis does share much in common with the all-pole vocal-tract model. As with the tube model, the formant synthesiser is modular with respect to the source and the vocal-tract filter. The oral-cavity component is formed from the connection of between three and six individual formant resonators in series, as predicted by the vocal-tract model, and each formant resonator is a second-order filter of the type discussed in Section 10.5.3. [Pg.399]

A transfer function that creates multiple formants can be formed by simply multiplying several second order filters together. Hence the transfer function for the vocal tract is given as ... [Pg.401]
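In conventional notation (which may differ cosmetically from the book's), a cascade of K second-order resonators with pole radii r_i and pole angles θ_i has the form

$$ H(z) \;=\; \prod_{i=1}^{K} \frac{G_i}{1 - 2 r_i \cos\theta_i \, z^{-1} + r_i^{2} z^{-2}} $$

A hedged sketch of building such a cascade numerically is shown below: each resonator's pole angle and radius are set from a chosen formant frequency F and bandwidth B via the standard relations θ = 2πF/fs and r = exp(-πB/fs), and each section is normalised for unity gain at DC. The three formant values in the example are illustrative, not targets taken from this text.

```python
# Cascade of second-order formant resonators. Each (F, B) pair in Hz sets one
# conjugate pole pair via theta = 2*pi*F/fs and r = exp(-pi*B/fs).
import numpy as np
from scipy.signal import lfilter

def formant_cascade(signal, formants, fs=16000):
    y = signal
    for F, B in formants:
        r = np.exp(-np.pi * B / fs)
        theta = 2 * np.pi * F / fs
        a = [1.0, -2.0 * r * np.cos(theta), r * r]   # second-order all-pole section
        b = [sum(a)]                                 # unity gain at DC (z = 1)
        y = lfilter(b, a, y)
    return y

# Illustrative use: three formants driven by a 100 Hz impulse train.
fs = 16000
pulses = np.zeros(fs // 2)
pulses[::fs // 100] = 1.0
vowel_like = formant_cascade(pulses, [(700, 130), (1200, 70), (2600, 160)], fs)
```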

Despite its ability to faithfully mimic the target and transition patterns of natural speech, standard LP synthesis has a significant unnatural quality to it, often impressionistically described as buzzy or metallic sounding. Recall that while we measured the vocal-tract model parameters directly from real speech, we still used an explicit impulse/noise model for the source. As we will now see, it is this, and specifically its interaction with the filter, that creates the unnaturalness. [Pg.415]

The second problem concerns just how accurate our model of articulation should be. As we saw in our discussion of tube models, there is always a balance between the desire to mimic the phenomenon accurately and being able to do so with a simple and tractable model. The earliest models were more or less those described in Chapter 11, but since then a wide range of improvements have been made, many along the lines described in Section 11.5. These have included modelling vocal-tract losses, source-filter interaction, radiation from the lips and, of course, improved glottal-source characteristics REFS. In addition, many of these have attempted to be models of both the vocal tract itself and the controls within it, such that many have models for muscle movement and motor control. [Pg.417]

Formant synthesis works by using individually controllable formant filters which can be set to produce accurate estimations of the vocal tract transfer function... [Pg.421]

MFCC synthesis is a technique which attempts to synthesise from a representation that we use because of its statistical modelling properties. A completely accurate synthesis from this representation is not possible, but it is possible to perform fairly accurate vocal-tract filter reconstruction. Basic techniques use an impulse/noise excitation method, while more advanced techniques attempt a complex parameterisation of the source. [Pg.446]
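As a rough illustration of why only approximate reconstruction is possible: MFCCs are a truncated DCT of the log mel spectrum, so one simple, lossy inversion is to zero-pad the cepstral vector, apply the inverse DCT and exponentiate, which recovers a smoothed mel-band envelope but no phase or harmonic fine structure. The sketch assumes the MFCCs were computed with an orthonormal DCT-II over n_mels natural-log mel energies; these are assumptions for illustration, not a description of any particular toolkit.

```python
# Recover a smoothed mel-spectral envelope from one frame of MFCCs by
# inverting the truncated, orthonormal DCT-II assumed to have produced them.
import numpy as np
from scipy.fft import idct

def mfcc_to_mel_envelope(mfcc_frame, n_mels=80):
    c = np.zeros(n_mels)
    c[:len(mfcc_frame)] = mfcc_frame           # zero-pad the truncated cepstrum
    log_mel = idct(c, type=2, norm="ortho")    # inverse of an orthonormal DCT-II
    return np.exp(log_mel)                     # back to (smoothed) mel energies
```

An excitation signal (an impulse train or noise, as in the basic techniques mentioned above) is then still needed to turn this envelope back into a waveform.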

