Formant synthesis

Figure 13.1. Filter view of additive formant synthesis, showing individual impulse responses of parallel formant filters.

Figure 13.3. Wavetable view of additive formant synthesis. [Pg.151]

In a source/filter vocal model such as LPC or parallel/cascade formant synthesis, periodic impulses are used to excite resonant filters to produce vowels. We could construct a simple alternative model using three, four, or more tables storing the impulse responses of the individual vowel formants. Note that it isn't necessary for the tables to contain pure exponentially decaying sinusoids. We could include aspects of the voice source, etc., as long as those effects are periodic. FOFs (originally introduced as Formant Onde Functions in French, which translates to Formant Wave Functions in English) were created for... [Pg.151]
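The table-based model described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation from the source: the function names, the table length, and the formant frequencies, bandwidths, and gains are all hypothetical values chosen for the example. Each table holds a plain exponentially decaying sinusoid, and one copy of every table is added into the output at the start of each pitch period.

```python
import math


def formant_table(freq_hz, bandwidth_hz, sample_rate, length):
    """Impulse response of one formant: an exponentially decaying sinusoid."""
    decay = math.exp(-math.pi * bandwidth_hz / sample_rate)  # per-sample envelope
    omega = 2.0 * math.pi * freq_hz / sample_rate            # radians per sample
    return [(decay ** n) * math.sin(omega * n) for n in range(length)]


def synthesize_vowel(formants, f0_hz, sample_rate, num_samples, table_len=512):
    """Overlap-add one copy of each formant table at every pitch period.

    `formants` is a list of (frequency_hz, bandwidth_hz, gain) tuples.
    """
    tables = [(gain, formant_table(f, bw, sample_rate, table_len))
              for f, bw, gain in formants]
    out = [0.0] * num_samples
    period = sample_rate / f0_hz  # pitch period in samples (may be fractional)
    t = 0.0
    while t < num_samples:
        start = int(round(t))
        for gain, table in tables:
            for n, value in enumerate(table):
                if start + n >= num_samples:
                    break
                out[start + n] += gain * value
        t += period
    return out


# Illustrative (assumed) formant settings, not measured vowel data.
example_formants = [(500.0, 60.0, 1.0), (1500.0, 90.0, 0.5), (2500.0, 120.0, 0.25)]
signal = synthesize_vowel(example_formants, f0_hz=100.0, sample_rate=8000,
                          num_samples=1600)
```

Because the tables are simply added at each period, anything periodic (for example a shaped voice-source pulse) could be baked into the table contents without changing the synthesis loop.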

Formant synthesis was the first genuine synthesis technique to be developed and was the dominant technique until the early 1980s. Formant synthesis is often called synthesis-by-rule, a term invented to make clear that this was synthesis from scratch (at the time, the term synthesis was more commonly used for the process of reconstructing a waveform that had been parameterised for speech coding purposes). As we shall see, most formant synthesis techniques do in fact use rules of the traditional form, but data-driven techniques have also been used. [Pg.398]

The formant synthesis technique just described is of course only half the problem: in addition to generating waveforms from formant parameters, we have to be able to generate formant parameters from the discrete pronunciation representations of the type represented by the synthesis specification. It is therefore useful to split the overall process into separate parameter-to-speech (i.e. the formant synthesiser just described) and specification-to-parameter components. [Pg.406]

Before going into this, we should ask: how good does the speech sound if we give the formant synthesiser perfect input? The specification-to-parameter component may produce errors, and if we are interested in assessing the quality of the formant synthesis itself, it may be difficult to do this from the specification directly. Instead we can use the technique of copy synthesis, where we forget about automatic text-to-speech conversion and instead artificially generate the best possible parameters for the synthesiser. This test is in fact one of the cornerstones of speech synthesis research: it allows us to work on one part of the system in a modular fashion, but more importantly it acts as a proof of concept as to the synthesiser's eventual suitability for inclusion in the full TTS system. The key point is that if the synthesis sounds bad with the best possible input, then it will only sound worse when potentially error-prone input is given instead. In effect, copy synthesis sets the upper limit on expected quality from any system. [Pg.406]

An alternative to using formants as the primary means of control is to use the parameters of the vocal tract transfer function directly. The key here is that if we assume the all-pole tube model, we can in fact determine these parameters automatically by means of linear prediction, performed by the covariance or autocorrelation technique described in Chapter 12. In the following section we will explain in detail the commonality between linear prediction and formant synthesis, where the two techniques diverge, and how linear prediction can be used to generate speech. [Pg.410]
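To make the autocorrelation route concrete, here is a minimal sketch of linear prediction using the Levinson-Durbin recursion. It is an illustration only and is not taken from the Chapter 12 referred to above: the function names and the sign convention (predicting s[n] as the sum of a[k] * s[n-k]) are assumptions made for this example.

```python
def autocorrelation(signal, max_lag):
    """Autocorrelation values r[0..max_lag] of a finite frame."""
    n = len(signal)
    return [sum(signal[i] * signal[i + lag] for i in range(n - lag))
            for lag in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve the normal equations for the predictor coefficients.

    Returns (coeffs, error), where coeffs[k-1] is a[k] in
    s[n] ~ sum_k a[k] * s[n-k], and error is the residual energy.
    """
    a = [0.0] * (order + 1)  # a[0] is unused
    err = r[0]
    for i in range(1, order + 1):
        # Reflection coefficient for this order.
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / err
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        err *= (1.0 - k * k)
    return a[1:], err


# Usage: recover the coefficients of a known two-pole filter from its
# impulse response (the filter coefficients here are arbitrary test values).
h = []
for n in range(400):
    x = 1.0 if n == 0 else 0.0
    prev1 = h[n - 1] if n >= 1 else 0.0
    prev2 = h[n - 2] if n >= 2 else 0.0
    h.append(x + 1.3 * prev1 - 0.8 * prev2)

coeffs, residual = levinson_durbin(autocorrelation(h, 2), 2)
```

In this toy case the estimated coefficients come out very close to the 1.3 and -0.8 used to generate the signal, which shows the sense in which the all-pole parameters can be determined automatically from the waveform.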

Beyond this, the formant synthesiser and the LP model start to diverge. Firstly, with the LP model, we use a single all-pole transfer function for all sounds. In the formant synthesiser, there are separate transfer functions for the oral and nasal cavity. In addition, a further separate resonator is used in formant synthesis to create a voiced source signal from the impulse train; in the LP model the filter that does this is included in the all-pole filter. Hence the formant synthesiser is fundamentally more modular in that it separates these components. This lack of modularity in the LP model adds to the difficulty in providing physical interpretations to the coefficients. [Pg.411]

It should be clear from our exposition that each technique has inherent tradeoffs with respect to the above wish list. For example, we make many assumptions in order to use the lossless all-pole linear prediction model for all speech sounds. In doing so, we achieve a model whose parameters we can measure easily and automatically, but find that these are difficult to interpret in a useful sense. While the general nature of the model is justified, the assumptions we make to achieve automatic analysis mean that we can't modify, manipulate and control the parameters in as direct a way as we can with formant synthesis. Following on from this, it is difficult to produce a simple and elegant phonetics-to-parameter model, as it is difficult to interpret these parameters in higher level phonetic terms. [Pg.418]

On the other hand, with formant synthesis, we can in some sense relate the parameters to the phonetics in that we know for instance that the typical formant values for an /iy/ vowel are 300Hz,... [Pg.418]

Klatt's 1987 article, Review of Text-to-Speech Conversion for English [255], is an excellent source for further reading on first generation systems. Klatt documents the history of the entire TTS field, and then explains the (then) state of the art systems in detail. While his account is heavily biased towards formant synthesis, rather than LP or articulatory synthesis, it nonetheless remains a very solid account of technology before and at the time. [Pg.420]

Formant synthesis works by using individually controllable formant filters which can be set to produce accurate estimations of the vocal tract transfer function... [Pg.421]
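One common way to realise an individually controllable formant filter is a second-order digital resonator whose two coefficients are set from a centre frequency and a bandwidth. The sketch below uses the widely cited Klatt-style resonator form, scaled for unity gain at DC; it is offered as an assumption about how such a filter might look, not as the specific design used in the source, and the function names and formant values are hypothetical.

```python
import math


def resonator_coeffs(freq_hz, bandwidth_hz, sample_rate):
    """Coefficients for y[n] = a*x[n] + b*y[n-1] + c*y[n-2]."""
    t = 1.0 / sample_rate
    c = -math.exp(-2.0 * math.pi * bandwidth_hz * t)
    b = (2.0 * math.exp(-math.pi * bandwidth_hz * t)
         * math.cos(2.0 * math.pi * freq_hz * t))
    a = 1.0 - b - c  # normalises the gain at 0 Hz to 1
    return a, b, c


def resonate(x, freq_hz, bandwidth_hz, sample_rate):
    """Run one formant resonator over the input samples."""
    a, b, c = resonator_coeffs(freq_hz, bandwidth_hz, sample_rate)
    y1 = y2 = 0.0
    out = []
    for sample in x:
        y = a * sample + b * y1 + c * y2
        out.append(y)
        y2, y1 = y1, y
    return out


def cascade(x, formants, sample_rate):
    """Apply several resonators in series; formants is [(freq_hz, bw_hz), ...]."""
    for freq_hz, bandwidth_hz in formants:
        x = resonate(x, freq_hz, bandwidth_hz, sample_rate)
    return x


# Excite a three-formant cascade with a 100 Hz impulse train.
# The formant settings are illustrative values, not measured data.
sample_rate = 8000
excitation = [1.0 if n % 80 == 0 else 0.0 for n in range(1600)]
vowel = cascade(excitation, [(500.0, 60.0), (1500.0, 90.0), (2500.0, 120.0)],
                sample_rate)
```

Changing a single (frequency, bandwidth) pair moves one formant without touching the others, which is the sense in which the filters are individually controllable. A parallel arrangement would instead sum the outputs of the resonators, each with its own gain.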

In general, formant synthesis produces intelligible but not natural-sounding speech. [Pg.421]

In terms of production, it is very similar to formant synthesis with regard to the source and vowels. It differs in that all sounds are generated by an all-pole filter, whereas parallel filters are common in formant synthesis. [Pg.421]

Throughout the book, we have made statements to the effect that statistical text analysis outperforms rule methods, or that unit selection is more natural than formant synthesis. But how do we know this? In one way or another, we have evaluated our systems and come to these conclusions. How we go about this is the topic of this section. [Pg.534]

Hogberg, J. Data driven formant synthesis. In Proceedings of Eurospeech 1997 (1997). [Pg.584]
