Big Chemical Encyclopedia


Mel-scaled cepstra

A very popular representation in speech recognition is the mel-frequency cepstral coefficient, or MFCC. This is one of the few popular representations that does not use linear prediction. It is formed by first performing a DFT on a frame of speech, then performing a filter-bank analysis (see Section 12.2) in which the frequency-bin locations are defined to lie on the mel-scale. This is set up to give, say, 20-30 coefficients. These are then transformed to the cepstral domain by the discrete cosine transform (we use this rather than the DFT since we require only the real part to be calculated). [Pg.370]
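The steps just described (DFT of a frame, mel-spaced filter bank, discrete cosine transform) can be sketched in a few lines of NumPy. This is a minimal illustration rather than a reference implementation, and several details are assumptions not stated in the text: the Hamming window, the triangular filter shape, the 2595 log10(1 + f/700) mel formula (one common variant), the logarithm applied to the filter-bank outputs before the DCT (implied by the move to the cepstral domain), and the unnormalised DCT-II. The function names are hypothetical.

```python
import numpy as np


def hz_to_mel(f):
    # One common mel formula (an assumption; the text does not specify one).
    return 2595.0 * np.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    # Inverse of hz_to_mel.
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filterbank(n_filters, n_fft, sample_rate):
    # Filter edges are equally spaced on the mel scale from 0 Hz to Nyquist.
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0), n_filters + 2)
    hz_points = mel_to_hz(mel_points)
    bin_freqs = np.fft.rfftfreq(n_fft, d=1.0 / sample_rate)
    bank = np.zeros((n_filters, bin_freqs.size))
    for i in range(n_filters):
        lo, mid, hi = hz_points[i], hz_points[i + 1], hz_points[i + 2]
        rising = (bin_freqs - lo) / (mid - lo)
        falling = (hi - bin_freqs) / (hi - mid)
        # Triangular response: zero outside [lo, hi], peak of 1 at mid.
        bank[i] = np.clip(np.minimum(rising, falling), 0.0, None)
    return bank


def dct_ii(x):
    # Unnormalised DCT-II: real-valued, so no imaginary part is ever computed.
    n = x.size
    k = np.arange(n)[:, None]
    j = np.arange(n)[None, :]
    return np.cos(np.pi * k * (2 * j + 1) / (2 * n)) @ x


def mel_cepstrum(frame, sample_rate, n_filters=24):
    # 1. DFT of one (windowed) frame of speech, as a power spectrum.
    power = np.abs(np.fft.rfft(frame * np.hamming(frame.size))) ** 2
    # 2. Filter-bank analysis with mel-spaced filters (20-30 outputs).
    energies = mel_filterbank(n_filters, frame.size, sample_rate) @ power
    # 3. Log, then DCT into the cepstral domain (small floor avoids log(0)).
    return dct_ii(np.log(energies + 1e-10))


# Usage: a 25 ms frame of a 440 Hz tone sampled at 16 kHz.
sample_rate = 16000
t = np.arange(400) / sample_rate
coeffs = mel_cepstrum(np.sin(2 * np.pi * 440.0 * t), sample_rate)
```

With `n_filters=24` the function returns 24 cepstral coefficients for the frame, in line with the 20-30 range mentioned above.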

It is common to ignore the higher cepstral coefficients, and often in ASR only the bottom 12 MFCCs are used. This representation is very popular in ASR for two reasons. Firstly, it has the basic desirable property that the coefficients are largely independent, allowing probability densities to be modelled with diagonal covariance matrices (see Section 15.1.3). Secondly, the mel-scaling has been shown to offer better discrimination between phones, which is an obvious help in recognition. [Pg.370]
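To illustrate why independence matters, the sketch below keeps the lowest 12 coefficients of each frame and scores them with a Gaussian whose covariance matrix is diagonal, so the log-density reduces to a sum of one-dimensional terms. It is only a sketch under stated assumptions: the random matrix is a stand-in for real MFCC frames, the function names are hypothetical, and whether the zeroth coefficient counts among the 12 is not specified in the text (here the first 12 columns are simply kept).

```python
import numpy as np


def keep_lowest(cepstra, n_keep=12):
    # Discard the higher cepstral coefficients of every frame (row).
    return cepstra[:, :n_keep]


def fit_diagonal_gaussian(x):
    # One mean and one variance per coefficient; no cross-covariances.
    # The small floor guards against a zero variance.
    return x.mean(axis=0), x.var(axis=0) + 1e-8


def diag_log_likelihood(x, mean, var):
    # With a diagonal covariance the joint log-density is the sum of
    # independent one-dimensional Gaussian log-densities.
    return -0.5 * np.sum(np.log(2.0 * np.pi * var) + (x - mean) ** 2 / var, axis=-1)


# Stand-in data (an assumption): 200 frames of 24 cepstral coefficients.
rng = np.random.default_rng(0)
cepstra = rng.normal(size=(200, 24))

features = keep_lowest(cepstra)
mean, var = fit_diagonal_gaussian(features)
scores = diag_log_likelihood(features, mean, var)
```

A full covariance matrix over 12 coefficients would need 78 free parameters per Gaussian, against 12 for the diagonal case, which is the practical benefit of largely independent coefficients.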


Transform the representation into a space that has more desirable properties: log magnitude spectra follow the ear's dynamic range; mel-scaled cepstra scale according to the frequency sensitivity of the ear; log area ratios are amenable to simple interpolation; and line-spectral frequencies show the formant patterns robustly. [Pg.386]





© 2024 chempedia.info