Signal Processing in HMM synthesis

The output of the HMM synthesis process is a sequence of cepstral vectors and F0 values, so the final task is to convert these into a speech waveform. This can be accomplished in a number of ways (see, for example, Section 14.6). In general, though, the approach is to use the generated cepstral output to create a spectral envelope and the generated F0 output to create an impulse train. The impulses are then fed into a filter whose coefficients are derived from the cepstral parameters. While reasonably effective, this vocoder-style approach is essentially the same as that used in first-generation systems and so can suffer from the buzzy or metallic sound characteristic of those systems (see Section 13.3.5). A major focus of current research is to improve on this. [Pg.464]
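The impulse-train-plus-filter idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the method of any particular system: it treats each cepstral vector as a plain minimum-phase cepstrum (no mel warping), leaves unvoiced frames (F0 <= 0) silent instead of exciting them with noise, and uses the standard recursion from a minimum-phase cepstrum to its impulse response. The function names and the `ir_length` parameter are hypothetical.

```python
import math


def impulse_train(f0_per_frame, frame_shift_s, sample_rate):
    """Place one unit impulse per pitch period; unvoiced frames (F0 <= 0) stay silent."""
    samples_per_frame = int(round(frame_shift_s * sample_rate))
    excitation = [0.0] * (samples_per_frame * len(f0_per_frame))
    next_pulse = 0.0  # sample position at which the next impulse is due
    for i, f0 in enumerate(f0_per_frame):
        for k in range(samples_per_frame):
            n = i * samples_per_frame + k
            if f0 <= 0.0:
                next_pulse = n + 1.0  # restart pulses at the next voiced sample
            elif n >= next_pulse:
                excitation[n] = 1.0
                next_pulse += sample_rate / f0  # advance by one pitch period
    return excitation


def cepstrum_to_impulse_response(cepstrum, length):
    """Assumption: `cepstrum` is a minimum-phase cepstrum c[0..M].

    Uses h[0] = exp(c[0]) and h[n] = sum_{k=1..n} (k / n) * c[k] * h[n - k].
    """
    h = [0.0] * length
    h[0] = math.exp(cepstrum[0])
    for n in range(1, length):
        upper = min(n, len(cepstrum) - 1)
        h[n] = sum(k * cepstrum[k] * h[n - k] for k in range(1, upper + 1)) / n
    return h


def synthesize(f0_per_frame, cepstra, frame_shift_s, sample_rate, ir_length=64):
    """Filter the impulse train by adding one impulse response per pulse."""
    excitation = impulse_train(f0_per_frame, frame_shift_s, sample_rate)
    samples_per_frame = int(round(frame_shift_s * sample_rate))
    waveform = [0.0] * (len(excitation) + ir_length)
    for n, pulse in enumerate(excitation):
        if pulse == 0.0:
            continue
        # Use the filter of the frame in which the pulse falls.
        h = cepstrum_to_impulse_response(cepstra[n // samples_per_frame], ir_length)
        for m, h_m in enumerate(h):
            waveform[n + m] += pulse * h_m
    return waveform
```

Because the excitation consists only of unit impulses, filtering reduces to summing shifted impulse responses, which keeps the sketch short. A strictly periodic excitation of this kind is one source of the buzzy quality described above.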

While cepstral generation via inversion of the cepstral analysis technique is possible, it is more common in HMM synthesis to use the more direct technique of Mel Logarithmic Spectrum Approximation (MLSA) [229]. This technique is quicker (in that it doesn't require the expensive inverse DFT operations) and can be more accurate at modelling spectral envelopes. MLSA uses more sophisticated signal processing techniques than have so far been introduced and so is beyond our present scope. See, however, Imai's original paper for details of how this is performed [229]. [Pg.465]

Standard cepstral analysis can be used for a number of purposes, for example F0 extraction and spectral envelope determination. One of the main reasons that cepstral coefficients are used for spectral representations is that they are robust and well suited to statistical analysis, because the coefficients are to a large extent statistically independent. In synthesis, however, measuring the spectral envelope accurately is critical to good quality, and many techniques have been proposed for more accurate spectral estimation than classic linear prediction or cepstral analysis. [Pg.465]
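As a rough illustration of spectral envelope determination by classic cepstral analysis, the sketch below computes the real cepstrum of one frame, keeps only the low-quefrency coefficients, and transforms back to obtain a smoothed magnitude envelope. The function names, the `n_keep` parameter and the small floor inside the logarithm are assumptions made for this example; a naive DFT is used so that the code needs only the standard library.

```python
import cmath
import math


def dft(x, inverse=False):
    """Naive O(N^2) discrete Fourier transform, adequate for one short frame."""
    n = len(x)
    sign = 1.0 if inverse else -1.0
    out = [
        sum(x[t] * cmath.exp(sign * 2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]
    return [v / n for v in out] if inverse else out


def cepstral_envelope(frame, n_keep):
    """Smoothed magnitude envelope obtained by truncating the real cepstrum."""
    n = len(frame)
    # Log magnitude spectrum; the floor avoids log(0) on exact spectral zeros.
    log_mag = [math.log(max(abs(s), 1e-12)) for s in dft(frame)]
    # Real cepstrum: inverse DFT of the log magnitude spectrum.
    cepstrum = [c.real for c in dft(log_mag, inverse=True)]
    # Keep coefficients 0..n_keep and their mirror images; zero the rest.
    liftered = [
        c if (t <= n_keep or t >= n - n_keep) else 0.0
        for t, c in enumerate(cepstrum)
    ]
    # Back to the frequency domain and out of the log domain.
    return [math.exp(v.real) for v in dft(liftered)]
```

A small `n_keep` gives a very smooth envelope, while `n_keep = len(frame) // 2` keeps every coefficient and simply reproduces the magnitude spectrum. The accuracy limits of this kind of smoothing are part of what motivates the more elaborate estimators mentioned above.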

STRAIGHT extracts the F0 values with fixed-point analysis and carries out F0-adaptive spectral analysis combined with a surface reconstruction method in the time-frequency region to remove signal periodicity. It also extracts aperiodicity measurements in the frequency domain. These are based on a ratio between the lower and upper smoothed spectral envelopes and represent the relative energy distribution of aperiodic components. [Pg.465]

It is not possible to use the STRAIGHT parametrization directly in the HMMs, since estimating statistically reliable acoustic models from high-dimensional observations is very difficult. To avoid this problem, some systems (e.g. [ ]) have used mel-cepstral coefficients converted from the smoothed spectrum with a recursive algorithm [ ]. For the same reason, the aperiodicity measurements must also be averaged, usually over five frequency sub-bands (0-1000, 1000-2000, 2000-4000, 4000-6000 and 6000-8000 Hz). [Pg.465]
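The sub-band averaging can be sketched as follows. The band edges come from the text; everything else is an assumption made for illustration: the aperiodicity values are taken to lie on uniformly spaced frequency bins from 0 Hz to the Nyquist frequency inclusive, a plain arithmetic mean is used (the text does not say whether averaging is done on a linear or a dB scale), and the function name is hypothetical.

```python
# Band edges in Hz, taken from the five sub-bands listed in the text.
BAND_EDGES_HZ = [0.0, 1000.0, 2000.0, 4000.0, 6000.0, 8000.0]


def band_average_aperiodicity(aperiodicity, sample_rate):
    """Reduce one frame of per-bin aperiodicity values to five band means.

    Assumption: bin k sits at k * (sample_rate / 2) / (len(aperiodicity) - 1) Hz.
    Bins above 8000 Hz are ignored; an empty band yields NaN.
    """
    n_bins = len(aperiodicity)
    if n_bins < 2:
        raise ValueError("need at least two frequency bins")
    nyquist = sample_rate / 2.0
    n_bands = len(BAND_EDGES_HZ) - 1
    sums = [0.0] * n_bands
    counts = [0] * n_bands
    for k, value in enumerate(aperiodicity):
        freq = k * nyquist / (n_bins - 1)
        for b in range(n_bands):
            low, high = BAND_EDGES_HZ[b], BAND_EDGES_HZ[b + 1]
            is_last = b == n_bands - 1
            # Bands are half-open [low, high); the last band also includes its upper edge.
            if low <= freq < high or (is_last and freq == high):
                sums[b] += value
                counts[b] += 1
                break
    return [s / c if c else float("nan") for s, c in zip(sums, counts)]
```

Applied to every frame, this replaces a high-dimensional aperiodicity spectrum with a five-dimensional vector, which is far easier to model statistically.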
