Mel-scaled cepstra

A very popular representation in speech recognition is the mel-frequency cepstral coefficient, or MFCC. It is one of the few popular representations that does not use linear prediction. It is formed by first performing a DFT on a frame of speech, then performing a filter-bank analysis (see Section 12.2) in which the frequency-bin locations are defined to lie on the mel-scale. The filter bank is set up to give, say, 20-30 coefficients. These are then transformed to the cepstral domain by the discrete cosine transform (we use this rather than the DFT since we require only the real part to be calculated). [Pg.370]
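The pipeline described above (DFT, mel-spaced filter bank, discrete cosine transform) can be sketched in a few lines of plain Python. This is a minimal illustration rather than a reference implementation, and it rests on several assumptions that the passage does not state: the mel mapping 2595 log10(1 + f/700), triangular filter shapes, a logarithm taken on the filter-bank energies before the DCT (the usual cepstral step), and a default of 24 filters chosen from the 20-30 range. All function names are hypothetical.

```python
import cmath
import math


def hz_to_mel(f):
    # Assumed mel mapping; other variants of the formula exist.
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def power_spectrum(frame):
    """Naive O(N^2) DFT; returns power for bins 0..N/2."""
    n = len(frame)
    spec = []
    for k in range(n // 2 + 1):
        s = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                for t, x in enumerate(frame))
        spec.append(abs(s) ** 2)
    return spec


def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters whose edges are equally spaced on the mel scale."""
    top = hz_to_mel(sample_rate / 2.0)
    edges = [mel_to_hz(top * i / (n_filters + 1))
             for i in range(n_filters + 2)]
    bank = []
    for j in range(1, n_filters + 1):
        lo, mid, hi = edges[j - 1], edges[j], edges[j + 1]
        row = []
        for k in range(n_fft // 2 + 1):
            f = k * sample_rate / n_fft  # centre frequency of DFT bin k
            if lo < f <= mid:
                row.append((f - lo) / (mid - lo))
            elif mid < f < hi:
                row.append((hi - f) / (hi - mid))
            else:
                row.append(0.0)
        bank.append(row)
    return bank


def dct2(values, n_out):
    """Unnormalised DCT-II; real-valued, so no imaginary part is computed."""
    n = len(values)
    return [sum(v * math.cos(math.pi * q * (i + 0.5) / n)
                for i, v in enumerate(values))
            for q in range(n_out)]


def mfcc(frame, sample_rate, n_filters=24):
    spec = power_spectrum(frame)
    bank = mel_filterbank(n_filters, len(frame), sample_rate)
    # Small floor avoids log(0) for filters that capture no energy.
    log_energies = [math.log(sum(w * p for w, p in zip(row, spec)) + 1e-10)
                    for row in bank]
    return dct2(log_energies, n_filters)
```

Calling `mfcc(frame, 16000)` on a 256-sample frame returns 24 cepstral coefficients. A practical front end would also window the frame and use an FFT; both are omitted here to keep the three stages visible.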

It is common to ignore the higher cepstral coefficients, and often in ASR only the lowest 12 MFCCs are used. This representation is very popular in ASR for two reasons. Firstly, it has the basic desirable property that the coefficients are largely independent, allowing probability densities to be modelled with diagonal covariance matrices (see Section 15.1.3). Secondly, the mel-scaling has been shown to offer better discrimination between phones, which is an obvious help in recognition. [Pg.370]
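To illustrate why near-independent coefficients are convenient, the sketch below truncates a cepstral vector to its lowest 12 coefficients and scores it under a Gaussian with a diagonal covariance matrix. With a diagonal covariance the log density is a sum of one-dimensional terms, so only one variance per coefficient is needed. The function names are hypothetical, and whether the zeroth coefficient counts among the 12 is an assumption here; the passage does not say.

```python
import math


def keep_lowest(cepstra, n_keep=12):
    """Discard the higher cepstral coefficients, keeping the first n_keep."""
    return list(cepstra[:n_keep])


def diag_gaussian_logpdf(x, mean, var):
    """Log density of a Gaussian with diagonal covariance.

    Each dimension contributes independently, so the result is a plain sum
    of one-dimensional Gaussian log densities.
    """
    if not (len(x) == len(mean) == len(var)):
        raise ValueError("x, mean and var must have the same length")
    return sum(-0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
               for xi, m, v in zip(x, mean, var))
```

For a 12-dimensional vector this needs 12 means and 12 variances, compared with 12 means and 78 distinct covariance entries for a full covariance matrix.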
