Perceptual linear prediction

It is possible to perform a similar operation with LP coefficients. In the normal calculation of these, spectral representations aren't used, so scaling the frequency domain (as in the case of the mel-scaled cepstrum) isn't possible. Recall, however, that in the autocorrelation technique of LP we used the set of autocorrelation functions to find the predictor coefficients. In Section 10.3.9 we showed that the power spectrum is in fact the Fourier transform of the autocorrelation function, and hence the autocorrelation function can be found from the inverse transform of the power spectrum. [Pg.370]

Instead of calculating the autocorrelation function in the time domain, we can calculate it by first finding the DFT of a frame, taking its squared magnitude, and then performing an inverse DFT. In this operation it is possible to scale the spectrum as described above. The final LP coefficients will therefore be scaled with respect to the mel or Bark scale, such that more poles are used for lower frequencies than for higher ones. [Pg.370]
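The frequency-domain route to the autocorrelation function described above can be sketched as follows. This is a minimal illustration rather than the method of any particular toolkit: it assumes NumPy, the function names autocorr_via_dft and levinson_durbin are hypothetical, and the perceptual (mel or Bark) warping is only marked by a comment at the point where it would be applied, not implemented.

```python
import numpy as np


def autocorr_via_dft(frame, order):
    """Autocorrelation lags 0..order via the inverse DFT of the power spectrum."""
    n = len(frame)
    # Zero-pad to at least 2n-1 points so the circular correlation
    # produced by the DFT equals the ordinary (linear) autocorrelation.
    nfft = 1 << (2 * n - 1).bit_length()
    spectrum = np.fft.rfft(frame, nfft)
    power = np.abs(spectrum) ** 2
    # A perceptual warping of `power` (mel or Bark) would be applied here
    # before transforming back; it is omitted in this sketch.
    r = np.fft.irfft(power, nfft)
    return r[: order + 1]


def levinson_durbin(r, order):
    """Predictor coefficients a[1..order] such that x[n] ~ sum_k a[k] * x[n-k]."""
    a = np.zeros(order + 1)  # a[0] is unused
    err = r[0]               # prediction error energy
    for i in range(1, order + 1):
        # Reflection coefficient for stage i.
        k = (r[i] - np.dot(a[1:i], r[i - 1:0:-1])) / err
        a_new = a.copy()
        a_new[i] = k
        a_new[1:i] = a[1:i] - k * a[i - 1:0:-1]
        a = a_new
        err *= 1.0 - k * k
    return a[1:], err
```

With the warping step left out, the lags returned by autocorr_via_dft should agree with the time-domain autocorrelation of the same frame, so the resulting coefficients are ordinary LP coefficients. Inserting a warped power spectrum at the marked point is what would bias the pole allocation towards the lower frequencies.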

Currently, mel-scale cepstral coefficients, and perceptual linear prediction coefficients transformed into cepstral coefficients, are popular choices for the above reasons. Specifically, they are chosen because they are robust to noise, can be modelled with diagonal covariance, and, with the aid of the perceptual scaling, are more discriminative than they would otherwise be. From a speech synthesis point of view, these points are worth making not because the same requirements exist for synthesis, but to make the reader aware that MFCCs and PLPs are so often used in ASR systems for these specific reasons, and not because they are intrinsically better in any general-purpose sense. This also helps explain why there are so many speech representations in the first place: each has strengths in certain areas, and will be used as the application demands. In fact, as we shall see in Chapter 16, the application requirements which make, say, MFCCs so suitable for speech recognition are almost entirely absent for our purposes. We shall leave a discussion of which representations really are suited to speech synthesis until Chapter 16. [Pg.395]

If the perceptual approach is used for the prediction of subjectively perceived audio quality of the output of a linear, time-invariant system, then the system characterization approach and the perceptual approach must lead to the same answer. In the system characterization approach, one will first characterize the system and then interpret the results using knowledge of both the auditory system and the input signal for which one wants to determine the quality. In the perceptual approach, one will characterize the perceptual quality of the output signals with the input signals as a reference. [Pg.303]
