Big Chemical Encyclopedia

Chemical substances, components, reactions, process design ...


Linear-prediction speech analysis

[Atal and Hanauer, 1971] Atal, B. S. and Hanauer, S. L. (1971). Speech analysis and synthesis by linear prediction of the speech wave. J. Acoustical Soc. of America, 50, 637-655. [Pg.535]

Bishnu Atal. Speech Analysis and Synthesis by Linear Prediction of the Speech Wave. Journal of the Acoustical Society of America 47, 65 (Abstract) (1970). [Pg.95]

The preceding sections showed the basic techniques of source-filter separation using first cepstral and then linear-prediction analysis. We now turn to the issue of using these techniques to generate a variety of representations, each of which in some way describes the spectral envelope of the speech. [Pg.371]

Here Y(z) is the speech, U(z) is the source, V(z) is the vocal tract and R(z) is the radiation. Ideally, the transfer function H(z) found by linear-prediction analysis would be V(z), the vocal-tract transfer function. In the course of doing this, we could then find U(z) and R(z). In reality, H(z) is in general a close approximation to V(z) but is not exactly the same. The main reason for this is that the LP minimisation criterion means that the algorithm attempts to find the lowest error for the whole system, not just the vocal-tract component. In fact, H(z) is properly expressed as... [Pg.371]
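In practice H(z) is usually found by the autocorrelation method, solving the normal equations with the Levinson-Durbin recursion. A minimal sketch (the function name and frame handling are illustrative, not taken from the text):

```python
import numpy as np

def lp_coefficients(x, order):
    """Estimate predictor coefficients a_1..a_p for one frame x using the
    autocorrelation method with the Levinson-Durbin recursion."""
    n = len(x)
    # Biased autocorrelation r[0..order]
    r = np.array([np.dot(x[: n - k], x[k:]) for k in range(order + 1)])
    a = np.zeros(order)
    err = r[0]                              # prediction-error energy
    for i in range(order):
        acc = r[i + 1] - np.dot(a[:i], r[i:0:-1])
        k = acc / err                       # reflection coefficient
        a[:i] = a[:i] - k * a[i - 1::-1][:i]
        a[i] = k
        err *= 1.0 - k * k                  # error shrinks at each order
    return a, err
```

Running this on a frame gives the denominator coefficients of the all-pole H(z), which minimises the total prediction error over the whole frame, as discussed above.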

A very popular representation in speech recognition is the mel-frequency cepstral coefficient or MFCC. This is one of the few popular representations that does not use linear prediction. It is formed by first performing a DFT on a frame of speech, then performing a filter-bank analysis (see Section 12.2) in which the frequency-bin locations are defined to lie on the mel scale. This is set up to give, say, 20-30 coefficients. These are then transformed to the cepstral domain by the discrete cosine transform (we use this rather than the DFT as we only require the real part to be calculated)... [Pg.379]
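The pipeline can be sketched as follows; the defaults of 24 filters and 13 coefficients are common choices rather than values prescribed by the text, and the DCT is written out as an explicit cosine matrix to emphasise that only real arithmetic is needed:

```python
import numpy as np

def mfcc(frame, sr=16000, n_filters=24, n_ceps=13):
    """Sketch of MFCC extraction: DFT -> mel filter bank -> log -> DCT."""
    n_fft = len(frame)
    spec = np.abs(np.fft.rfft(frame)) ** 2              # power spectrum

    # Triangular filters with centres equally spaced on the mel scale
    def hz_to_mel(f): return 2595.0 * np.log10(1.0 + f / 700.0)
    def mel_to_hz(m): return 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    mel_pts = np.linspace(0.0, hz_to_mel(sr / 2), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fbank = np.zeros((n_filters, len(spec)))
    for i in range(n_filters):
        l, c, r = bins[i], bins[i + 1], bins[i + 2]
        for k in range(l, c):
            fbank[i, k] = (k - l) / max(c - l, 1)       # rising edge
        for k in range(c, r):
            fbank[i, k] = (r - k) / max(r - c, 1)       # falling edge

    energies = np.log(fbank @ spec + 1e-10)             # log filter-bank energies
    # DCT-II to move to the cepstral domain (real arithmetic only)
    n = np.arange(n_filters)
    dct = np.cos(np.pi * np.outer(np.arange(n_ceps), 2 * n + 1) / (2 * n_filters))
    return dct @ energies
```

A frame of 512 samples at 16 kHz would then yield a 13-dimensional MFCC vector.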

Linear prediction was in fact primarily developed for use in speech coding applications. As we have just seen, performing LP analysis allows us to deconvolve the signal into a source and filter, which can then be used to reconstruct the original signal. Simply separating the source and filter... [Pg.387]
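The deconvolution can be sketched as inverse filtering: running the signal through A(z) = 1 - Σ a_k z^-k yields the residual (the source estimate), and running that residual back through the all-pole filter 1/A(z) reconstructs the signal. The coefficients and the random "frame" below are arbitrary stand-ins:

```python
import numpy as np
from scipy.signal import lfilter

# Hypothetical LP coefficients (a_1, a_2) for illustration
a = np.array([0.6, -0.2])
A = np.concatenate(([1.0], -a))   # A(z) = 1 - 0.6 z^-1 + 0.2 z^-2

rng = np.random.default_rng(0)
x = rng.standard_normal(256)      # stand-in for a speech frame

e = lfilter(A, [1.0], x)          # residual (source estimate): e = A(z) x
y = lfilter([1.0], A, e)          # resynthesis through 1/A(z)
# y reconstructs x up to floating-point error: the separation is lossless
```

This is exactly why LP is attractive for coding: transmitting the coefficients plus a (heavily compressed) residual is enough to rebuild the original waveform.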

We have just seen that closed-phase linear prediction requires that we analyse each pitch period separately. This type of speech analysis is called pitch-synchronous analysis and can only be performed if we are in fact able to find and isolate individual periods of speech. We do this by means of a pitch-marking or epoch-detection algorithm (EDA). [Pg.391]
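As a toy illustration only (this is not a published EDA, just a peak-picking sketch under the assumption that epochs show up as strong excursions in the LP residual):

```python
import numpy as np

def naive_epoch_detector(residual, sr=16000, max_f0=400.0):
    """Toy epoch-detection sketch: treat the strongest local excursions of
    the LP residual, at least one minimum pitch period apart, as epoch
    candidates.  Real EDAs are considerably more robust than this."""
    min_gap = int(sr / max_f0)               # no epochs closer than 1/max_f0
    thresh = 0.3 * np.max(np.abs(residual))  # ignore low-level noise
    order = np.argsort(-np.abs(residual))    # strongest samples first
    epochs = []
    for i in order:
        if abs(residual[i]) < thresh:
            break
        if all(abs(int(i) - e) >= min_gap for e in epochs):
            epochs.append(int(i))
    return np.sort(np.array(epochs))
```

The returned epoch positions are what pitch-synchronous analysis then uses to cut the signal into individual periods.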

Recall that Equation 13.18 is exactly the same as the linear-prediction Equation 12.16, where a1, a2, ..., ap are the predictor coefficients and x[n] is the error signal e[n]. This shows that the result of linear prediction gives us the same type of transfer function as the serial formant synthesiser, and hence LP can produce exactly the same range of frequency responses as the serial formant synthesiser. The significance is of course that we can derive the linear-prediction coefficients automatically from speech and don't have to make manual measurements or perform potentially error-prone automatic formant analysis. This is not, however, a solution to the formant-estimation problem itself: reversing the set of Equations 13.14 to 13.18 is not trivial, meaning that while we can accurately estimate the all-pole transfer function for arbitrary speech, we can't necessarily decompose this into individual formants. [Pg.411]
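Although an exact decomposition into formants is not available, a common heuristic is to factor the all-pole polynomial and treat its complex-root pairs as formant candidates. A sketch, assuming the LP coefficients are already in hand:

```python
import numpy as np

def formant_candidates(a, sr=16000):
    """Heuristic formant estimates from LP coefficients a_1..a_p:
    factor A(z) = 1 - sum a_k z^-k, then convert each complex root pair to
    a frequency (from its angle) and a bandwidth (from its radius)."""
    roots = np.roots(np.concatenate(([1.0], -np.asarray(a))))
    roots = roots[np.imag(roots) > 0]            # one of each conjugate pair
    freqs = np.angle(roots) * sr / (2 * np.pi)   # pole angle -> Hz
    bws = -np.log(np.abs(roots)) * sr / np.pi    # pole radius -> bandwidth
    order = np.argsort(freqs)
    return freqs[order], bws[order]
```

Note that this only yields candidates: spurious poles modelling spectral tilt or merged formants are exactly the cases where the decomposition mentioned above breaks down.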

It should be clear from our exposition that each technique has inherent tradeoffs with respect to the above wish list. For example, we make many assumptions in order to use the lossless all-pole linear-prediction model for all speech sounds. In doing so, we achieve a model whose parameters we can measure easily and automatically, but find that these are difficult to interpret in a useful sense. While the general nature of the model is justified, the assumptions we make to achieve automatic analysis mean that we can't modify, manipulate and control the parameters in as direct a way as we can with formant synthesis. Following on from this, it is difficult to produce a simple and elegant phonetics-to-parameter model, as it is difficult to interpret these parameters in higher-level phonetic terms. [Pg.418]

The above technique bears some similarities to the TD-PSOLA technique in that it uses a pitch-synchronous analysis to isolate individual pitch periods, after which modification and resynthesis is performed. In fact, in a technique called linear-prediction pitch-synchronous overlap and add or LP-PSOLA, we can use PSOLA more or less directly on the residual rather than the waveform. As above, epoch detection is used to find the epochs. The residual is then separated into a number of symmetrical frames centred on the epochs. Pitch modification is performed by moving the residual frames closer together or further apart, and duration modification is performed by duplication or elimination of frames, in just the same way as in TD-PSOLA. The only difference is that these operations are performed on the residual, which is then fed into the LP filter to produce speech. This technique differs from the Hunt technique only in the shape of the frames. Both techniques use window functions with their highest point at the epoch: in Hunt's technique the windows are asymmetrical, with the idea that they are capturing a single impulse; in LP-PSOLA the windows are symmetrical. In listening tests, the two techniques produced virtually identical quality speech. [Pg.435]
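A compact sketch of this residual-domain overlap-add (the frame length, nearest-neighbour mapping and names are simplifications; a real implementation would use pitch-adaptive window lengths):

```python
import numpy as np
from scipy.signal import lfilter

def lp_psola(residual, epochs, new_epochs, a, frame_half=40):
    """Illustrative LP-PSOLA sketch: symmetric Hann-windowed residual
    frames centred on the analysis epochs are overlap-added at the target
    epochs (nearest-neighbour mapping gives duplication/elimination), then
    fed through the all-pole LP filter 1/A(z)."""
    epochs = np.asarray(epochs)
    win = np.hanning(2 * frame_half + 1)
    out = np.zeros(int(new_epochs[-1]) + frame_half + 1)
    for t in new_epochs:
        # map each target epoch to its nearest analysis epoch
        src = int(epochs[np.argmin(np.abs(epochs - t))])
        lo, hi = src - frame_half, src + frame_half + 1
        if lo < 0 or hi > len(residual) or t - frame_half < 0:
            continue                        # skip frames that fall off the ends
        out[t - frame_half:t + frame_half + 1] += win * residual[lo:hi]
    # resynthesis through the all-pole filter 1/A(z)
    return lfilter([1.0], np.concatenate(([1.0], -np.asarray(a))), out)
```

Moving the target epochs closer together raises the pitch; duplicating or dropping them changes duration, exactly as in TD-PSOLA but on the residual.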

For a frame of speech, basic sinusoidal analysis can be performed in a manner similar to linear prediction (Section 12.4), where we use the model to create an artificial signal s(n)... [Pg.437]
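A minimal peak-picking sketch of this idea (window-gain compensation and frequency interpolation, which a real sinusoidal analyser would include, are deliberately omitted):

```python
import numpy as np

def sinusoidal_analysis(frame, n_peaks=5):
    """Sketch of peak-picking sinusoidal analysis: take the DFT of a
    Hann-windowed frame, keep the n_peaks largest local spectral maxima,
    and return (magnitude, bin, phase) triples for each sinusoid."""
    spec = np.fft.rfft(frame * np.hanning(len(frame)))
    mags = np.abs(spec)
    peaks = [k for k in range(1, len(mags) - 1)
             if mags[k] > mags[k - 1] and mags[k] >= mags[k + 1]]
    peaks = sorted(peaks, key=lambda k: -mags[k])[:n_peaks]
    return [(mags[k], k, np.angle(spec[k])) for k in sorted(peaks)]

def sinusoidal_synthesis(params, n):
    """Create the artificial signal s(n) as a sum of sinusoids."""
    t = np.arange(n)
    s = np.zeros(n)
    for mag, k, phase in params:
        s += mag * np.cos(2.0 * np.pi * k * t / n + phase)
    return s
```

The model parameters are then refined by minimising the error between the artificial signal and the real frame, analogously to the error minimisation in linear prediction.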

Atal, B. S., and Hanauer, S. L. Speech analysis and synthesis by linear prediction of the... [Pg.572]

Makhoul, J. Spectral analysis of speech by linear prediction. IEEE Transactions on Audio and Electroacoustics 21, 3 (1973), 140-148. [Pg.589]

We will now turn to the important problem of source-filter separation. In general, we wish to do this because the two components of the speech signal have quite different and independent linguistic functions. The source controls the pitch, which is the acoustic correlate of intonation, while the filter controls the spectral envelope and formant positions, which determine which phones are being produced. There are three popular techniques for performing source-filter separation. First we will examine filter-bank analysis in this section, before turning to cepstral analysis and linear prediction in the next sections. [Pg.352]

Here Y(z) is the speech, U(z) is the source, V(z) is the vocal tract and R(z) is the radiation. Ideally, the transfer function H(z) found by linear-prediction analysis would... [Pg.362]

PSOLA, which operates in the time domain. It separates the original speech into frames pitch-synchronously and performs modification by overlapping and adding these frames onto a new set of epochs, created to match the synthesis specification. Residual-excited linear prediction performs LP analysis, but uses the whole residual in resynthesis rather than an impulse. The residual is modified in a manner very similar to that of PSOLA. [Pg.434]


See other pages where Linear-prediction speech analysis is mentioned: [Pg.118], [Pg.381], [Pg.394], [Pg.378], [Pg.498]



© 2024 chempedia.info