
Linear-prediction synthesis

[Atal and Hanauer, 1971] Atal, B. S. and Hanauer, S. L. (1971). Speech analysis and synthesis by linear prediction of the speech wave. J. Acoustical Soc. of America, 50:637-655. [Pg.535]

In prior chapters we found that spectral shape is important to our perception of sounds, such as vowel/consonant distinctions, the different timbres of the vowels eee and ahh, etc. We also discovered that sinusoids are not the only way to look at modeling the spectra of sounds (or sound components), and that sometimes just capturing the spectral shape is the most important thing in parametric sound modeling. Chapters 5 and 6 both centered on the notion of additive synthesis, where sinusoids and other components are added to form a final wave that exhibits the desired spectral properties. In this chapter we will develop and refine the notion of subtractive synthesis and discuss techniques and tools for calibrating the parameters of subtractive synthesis to real sounds. The main technique we will use is called Linear Predictive Coding (LPC), which will allow us to automatically fit a low-order resonant filter to the spectral shape of a sound. [Pg.85]

Subtractive synthesis uses a complex source wave—such as an impulse, a periodic train of impulses, or white noise—to excite a spectral-shaping filter. Linear prediction, or linear predictive coding (LPC), gives us a mathematical technique for automatically decomposing a sound into a source and a filter. For low-order LPC (6-20 poles or so), the filter is fit to the coarse spectral... [Pg.94]
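As a concrete illustration of this source-filter idea, the sketch below (assuming numpy and scipy are available; all parameter values are illustrative, not taken from the text) excites a single two-pole resonance with a periodic impulse train:

```python
import numpy as np
from scipy.signal import lfilter

sr, f0, dur = 16000, 110.0, 0.5           # sample rate, source pitch, duration (illustrative)

# Source: unit impulses one pitch period apart -- a spectrally flat, periodic excitation.
n = int(sr * dur)
source = np.zeros(n)
source[::int(round(sr / f0))] = 1.0

# Filter: one resonance (complex pole pair) at 800 Hz with a 100 Hz bandwidth.
fc, bw = 800.0, 100.0
r = np.exp(-np.pi * bw / sr)              # pole radius set by the bandwidth
theta = 2 * np.pi * fc / sr               # pole angle set by the centre frequency
a = [1.0, -2 * r * np.cos(theta), r * r]  # all-pole denominator coefficients
output = lfilter([1.0], a, source)        # spectrally shaped result
```

A full subtractive synthesiser would use a bank of such resonances (or the higher-order all-pole filter that LPC estimates) and switch between impulse-train and noise excitation for voiced and unvoiced sounds.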

Bishnu Atal. Speech Analysis and Synthesis by Linear Prediction of the Speech Wave. Journal of the Acoustical Society of America, 47, 65 (Abstract) (1970). [Pg.95]

In prior chapters we looked at subtractive synthesis techniques, such as modal synthesis (Chapter 4) and linear predictive coding (Chapter 8). In these methods a complex source is used to excite resonant filters. The source usually has a flat spectrum, or exhibits a simple roll-off pattern like 1/f or 1/f^2 (6 dB or 12 dB per octave). The filters, possibly time-varying, shape the spectrum to model the desired sound. [Pg.149]
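For illustration only, one simple way to build harmonic sources with roughly these two roll-offs is to scale the k-th harmonic by 1/k (about -6 dB/octave) or 1/k^2 (about -12 dB/octave). The sketch below assumes numpy and uses arbitrary pitch and duration values:

```python
import numpy as np

sr, f0, dur = 16000, 110.0, 0.5
t = np.arange(int(sr * dur)) / sr
ks = range(1, int((sr / 2) // f0) + 1)                                # harmonics below Nyquist

roll_6dB = sum(np.sin(2 * np.pi * k * f0 * t) / k for k in ks)        # ~1/f spectrum
roll_12dB = sum(np.sin(2 * np.pi * k * f0 * t) / k ** 2 for k in ks)  # ~1/f^2 spectrum
```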

Currently, mel-scale cepstral coefficients, and perceptual linear prediction coefficients transformed into cepstral coefficients, are popular choices for the above reasons. Specifically, they are chosen because they are robust to noise, can be modelled with diagonal covariance, and with the aid of the perceptual scaling are more discriminative than they would otherwise be. From a speech synthesis point of view, these points are worth making, not because the same requirements exist for synthesis, but rather to make the reader aware that the reason MFCCs and PLPs are so often used in ASR systems is for the above reasons, and not because they are intrinsically better in any general-purpose sort of way. This also helps explain why there are so many speech representations in the first place: each has strengths in certain areas, and will be used as the application demands. In fact, as we shall see in Chapter 16, the application requirements which make, say, MFCCs so suitable for speech recognition are almost entirely absent for our purposes. We shall leave a discussion as to what representations really are suited for speech synthesis purposes until Chapter 16. [Pg.395]
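As a point of reference only (the text does not prescribe any particular toolkit), MFCCs of the kind used in ASR front ends can be computed with the librosa library; the file name and frame settings below are assumptions:

```python
import librosa

y, sr = librosa.load("speech.wav", sr=16000)         # hypothetical input file
mfccs = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)  # 13 mel-cepstral coefficients per frame
print(mfccs.shape)                                   # (13, number_of_frames)
```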

An alternative to using formants as the primary means of control is to use the parameters of the vocal tract transfer function directly. The key here is that if we assume the all-pole tube model, we can in fact determine these parameters automatically by means of linear prediction, performed by the covariance or autocorrelation technique described in Chapter 12. In the following section we will explain in detail the commonality between linear prediction and formant synthesis, where the two techniques diverge, and how linear prediction can be used to generate speech. [Pg.410]
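A minimal sketch of the autocorrelation approach is given below; it uses the textbook Levinson-Durbin recursion rather than the exact formulation of Chapter 12, and the window choice and prediction order are assumptions:

```python
import numpy as np

def lpc_autocorrelation(frame, order=12):
    """Estimate all-pole coefficients [1, a1, ..., ap] for one frame so that
    s[n] is approximated by -(a1*s[n-1] + ... + ap*s[n-p])."""
    frame = frame * np.hamming(len(frame))
    # Autocorrelation at lags 0..order
    r = np.array([np.dot(frame[:len(frame) - k], frame[k:]) for k in range(order + 1)])
    a = np.zeros(order + 1)
    a[0], err = 1.0, r[0]
    for i in range(1, order + 1):
        k = -(r[i] + np.dot(a[1:i], r[i - 1:0:-1])) / err   # reflection coefficient
        a[1:i + 1] = a[1:i + 1] + k * a[i - 1::-1][:i]      # update predictor coefficients
        err *= 1.0 - k * k                                  # remaining prediction error
    return a, err
```

Filtering an excitation through 1/A(z), e.g. scipy.signal.lfilter([1.0], a, excitation), then yields the resynthesised speech for that frame.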

Ease of data acquisition: whether the system is rule-driven or data-driven, some data has to be acquired, even if this is just to help the rule-writer determine appropriate values for the rules. Here linear prediction clearly wins, because its parameters can easily be determined from any real speech waveform. When formant synthesisers were mainly being developed, no fully reliable formant trackers existed, so the formant values had to be determined either manually or semi-manually. While better formant trackers now exist, many other parameters required in formant synthesis (e.g. zero locations or bandwidth values) are still somewhat difficult to determine. Articulatory synthesis is particularly interesting in that in the past it was next to impossible to acquire data. Now, various techniques such as EMA and MRI have made this much easier, and so it should be possible to collect much bigger databases for this purpose. The inability to collect accurate articulatory data is certainly one of the main reasons why articulatory synthesis never really took off. [Pg.418]

It should be clear from our exposition that each technique has inherent tradeoffs with respect to the above wish list. For example, we make many assumptions in order to use the lossless all-pole linear prediction model for all speech sounds. In doing so, we achieve a model whose parameters we can measure easily and automatically, but find that these are difficult to interpret in a useful sense. While the general nature of the model is justified, the assumptions we make to achieve automatic analysis mean that we can't modify, manipulate and control the parameters in as direct a way as we can with formant synthesis. Following on from this, it is difficult to produce a simple and elegant phonetics-to-parameter model, as it is difficult to interpret these parameters in higher-level phonetic terms. [Pg.418]

Atal, B. S., and Hanauer, S. L. Speech analysis and synthesis by linear prediction of the... [Pg.572]

PSOLA operates in the time domain. It separates the original speech into frames pitch-synchronously and performs modification by overlapping and adding these frames onto a new set of epochs, created to match the synthesis specification. Residual-excited linear prediction performs LP analysis, but uses the whole residual in resynthesis rather than an impulse. The residual is modified in a manner very similar to that of PSOLA. [Pg.434]
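A minimal residual-excited LP round trip for a single frame might look like the sketch below; it reuses the hypothetical lpc_autocorrelation() helper sketched earlier and omits epoch detection and the PSOLA-style modification of the residual:

```python
from scipy.signal import lfilter

def relp_roundtrip(frame, order=12):
    a, _ = lpc_autocorrelation(frame, order)   # helper sketched above (assumption)
    residual = lfilter(a, [1.0], frame)        # inverse filter A(z): whiten the frame
    # ... the residual would be pitch-modified here, much as in PSOLA ...
    return lfilter([1.0], a, residual)         # synthesis filter 1/A(z)
```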

Standard cepstral analysis can be used for a number of purposes, for example F0 extraction and spectral envelope determination. One of the main reasons that cepstral coefficients are used for spectral representations is that they are robust and well suited to statistical analysis, because the coefficients are to a large extent statistically independent. In synthesis, however, measuring the spectral envelope accurately is critical to good quality, and many techniques have been proposed for more accurate spectral estimation than classic linear prediction or cepstral analysis. [Pg.465]
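For orientation, a bare-bones real-cepstrum F0 estimator for a single windowed frame is sketched below; the pitch search range and windowing choice are assumptions, and the low-quefrency coefficients of the same cepstrum describe a smoothed spectral envelope:

```python
import numpy as np

def cepstral_f0(frame, sr, fmin=60.0, fmax=400.0):
    windowed = frame * np.hamming(len(frame))
    log_mag = np.log(np.abs(np.fft.rfft(windowed)) + 1e-12)   # log magnitude spectrum
    cepstrum = np.fft.irfft(log_mag)                          # real cepstrum
    lo, hi = int(sr / fmax), int(sr / fmin)                   # candidate pitch-period range
    peak = lo + np.argmax(cepstrum[lo:hi])                    # strongest quefrency peak
    return sr / peak                                          # convert back to Hz
```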


See other pages where Linear-prediction synthesis is mentioned: [Pg.118], [Pg.120], [Pg.225], [Pg.399], [Pg.401], [Pg.403], [Pg.407], [Pg.408], [Pg.409], [Pg.410], [Pg.412], [Pg.413], [Pg.415], [Pg.416], [Pg.421], [Pg.423], [Pg.425], [Pg.434], [Pg.436], [Pg.445], [Pg.485]


Classical linear-prediction synthesis

Linear prediction

Linear prediction diphone concatenative synthesis
