Mel-frequency cepstral coefficients

A very popular representation in speech recognition is the mel-frequency cepstral coefficient, or MFCC. This is one of the few popular representations that does not use linear prediction. It is formed by first performing a DFT on a frame of speech, then performing a filter-bank analysis (see Section 12.2) in which the frequency bin locations are defined to lie on the mel scale. This is set up to give, say, 20-30 coefficients. These are then transformed to the cepstral domain by the discrete cosine transform (we use this rather than the DFT as we only require the real part to be calculated) ... [Pg.379]
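The pipeline just described (DFT of a frame, mel-spaced filter bank, then a DCT into the cepstral domain) can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not a reference implementation: the mel formula (2595·log10(1 + f/700)), the Hamming window, the 26 triangular filters, the 512-point FFT and keeping 13 coefficients are all illustrative choices, and the logarithm of the filter-bank energies is the step the standard MFCC recipe applies before the DCT.

```python
import numpy as np

def hz_to_mel(f):
    # One common mel-scale formula; other variants exist (assumption).
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters whose centres are equally spaced on the mel scale."""
    n_bins = n_fft // 2 + 1
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0), n_filters + 2)
    hz_points = mel_to_hz(mel_points)
    bin_freqs = np.linspace(0.0, sample_rate / 2.0, n_bins)
    fbank = np.zeros((n_filters, n_bins))
    for i in range(n_filters):
        left, centre, right = hz_points[i], hz_points[i + 1], hz_points[i + 2]
        rising = (bin_freqs - left) / (centre - left)
        falling = (right - bin_freqs) / (right - centre)
        fbank[i] = np.maximum(0.0, np.minimum(rising, falling))
    return fbank

def dct_ii_matrix(n_out, n_in):
    """Orthonormal DCT-II basis: purely real, so no imaginary part is computed."""
    k = np.arange(n_out)[:, None]
    n = np.arange(n_in)
    basis = np.sqrt(2.0 / n_in) * np.cos(np.pi * k * (2 * n + 1) / (2 * n_in))
    basis[0] /= np.sqrt(2.0)
    return basis

def mfcc_frame(frame, sample_rate, n_filters=26, n_ceps=13, n_fft=512):
    windowed = frame * np.hamming(len(frame))
    power = np.abs(np.fft.rfft(windowed, n=n_fft)) ** 2          # DFT power spectrum
    energies = mel_filterbank(n_filters, n_fft, sample_rate) @ power  # mel filter bank
    log_energies = np.log(energies + 1e-10)                       # log compression
    return dct_ii_matrix(n_ceps, n_filters) @ log_energies        # to cepstral domain

sr = 16000
t = np.arange(400) / sr                      # one 25 ms frame at 16 kHz
coeffs = mfcc_frame(np.sin(2 * np.pi * 440.0 * t), sr)
print(coeffs.shape)
```

Because the DCT basis is real and orthonormal, it maps the log filter-bank energies to real cepstral coefficients without the complex arithmetic a DFT would involve.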

We now turn to techniques used to synthesize speech from cepstral representations, in particular the mel-frequency cepstral coefficients (MFCCs) commonly used in ASR systems. Synthesis from these is not actually a common second-generation technique, but it is timely to introduce it here, as it effectively performs the same job as pure second-generation techniques. In Chapter 15 we will give a full justification for wanting to synthesize from MFCCs, but the main reason is that they are a representation that is highly amenable to robust statistical analysis, because the coefficients are approximately independent of one another (the DCT largely decorrelates them). [Pg.441]
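The decorrelating effect of the DCT can be illustrated numerically. The sketch below uses synthetic data as a hypothetical stand-in for log filter-bank energies: neighbouring channels are given a covariance of rho**|i - j| (an AR(1)-style assumption, not measured speech). It then compares the average absolute correlation between dimensions before and after applying an orthonormal DCT-II to every frame. Real MFCCs are only approximately decorrelated, so this is a qualitative illustration rather than evidence about any particular corpus.

```python
import numpy as np

def dct_ii_matrix(n):
    """Orthonormal n x n DCT-II basis (rows are basis vectors)."""
    k = np.arange(n)[:, None]
    i = np.arange(n)
    basis = np.sqrt(2.0 / n) * np.cos(np.pi * k * (2 * i + 1) / (2 * n))
    basis[0] /= np.sqrt(2.0)
    return basis

def mean_abs_offdiag_corr(x):
    """Average |correlation| between distinct columns of x (frames x dims)."""
    c = np.corrcoef(x, rowvar=False)
    return float(np.mean(np.abs(c[~np.eye(c.shape[0], dtype=bool)])))

def decorrelation_demo(n_frames=5000, n_channels=24, rho=0.9, seed=0):
    rng = np.random.default_rng(seed)
    # Hypothetical log filter-bank energies with strongly correlated neighbours.
    idx = np.arange(n_channels)
    cov = rho ** np.abs(np.subtract.outer(idx, idx))
    log_fbank = rng.multivariate_normal(np.zeros(n_channels), cov, size=n_frames)
    ceps = log_fbank @ dct_ii_matrix(n_channels).T   # DCT applied to every frame
    return mean_abs_offdiag_corr(log_fbank), mean_abs_offdiag_corr(ceps)

before, after = decorrelation_demo()
print(f"mean |corr| before DCT: {before:.3f}, after DCT: {after:.3f}")
```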

Chasan, D. Speech reconstruction from mel frequency cepstral coefficients and pitch. In Proceedings of the International Conference on Acoustics, Speech, and Signal Processing 2000 (2000). [Pg.576]

It is partly for this reason that we find that mel-frequency cepstral coefficients (MFCCs) are used as the representation of choice in many ASR systems (in addition, they are deemed to have good discrimination properties and are somewhat insensitive to differences between speakers). It is important to note, however, that HMMs themselves are neutral with respect to the type of observation used, and in principle we could use any of... [Pg.439]
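To make the point that an HMM is neutral with respect to its observations concrete, the sketch below scores a feature vector under a single state's output density. It assumes a diagonal-covariance Gaussian, which is a common modelling choice when the features are approximately decorrelated but is not something the text above specifies. Nothing in the function refers to MFCCs: the same code would score any real-valued observation vector.

```python
import numpy as np

def diag_gaussian_loglik(obs, mean, var):
    """Log density of one observation vector under a diagonal-covariance Gaussian.

    obs, mean and var are equal-length real vectors; var holds the per-dimension
    variances. The features themselves can be MFCCs or any other representation.
    """
    obs = np.asarray(obs, dtype=float)
    mean = np.asarray(mean, dtype=float)
    var = np.asarray(var, dtype=float)
    return float(-0.5 * np.sum(np.log(2.0 * np.pi * var) + (obs - mean) ** 2 / var))

# The same call works for a 13-dimensional MFCC-like vector or any other feature type.
print(diag_gaussian_loglik(np.zeros(13), np.zeros(13), np.ones(13)))
```

With a diagonal covariance the log-likelihood is simply a sum of per-dimension terms, which is why approximately independent features suit this kind of state model well.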

It is not possible to use the STRAIGHT parametrization directly in the HMMs, since estimating statistically reliable acoustic models from high-dimensional observations is very difficult. To avoid this problem, some systems (e.g. [ ]) have used mel-cepstral coefficients converted from the smoothed spectrum with a recursive algorithm [ ]. For the same reason, the aperiodicity measurements must also be averaged, usually over five frequency sub-bands (0-1000, 1000-2000, 2000-4000, 4000-6000 and 6000-8000 Hz). [Pg.465]
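The sub-band averaging of aperiodicity can be sketched as below, using the five band edges listed above. Several details are assumptions for illustration: a 16 kHz sampling rate (so 8000 Hz is the Nyquist frequency), aperiodicity values on linearly spaced bins from 0 Hz to Nyquist, half-open bands with the top band including its upper edge, and a plain arithmetic mean. Whether to average in dB or linear units is a design choice the text does not specify. The function name is hypothetical.

```python
import numpy as np

# Band edges (Hz) as listed in the text; assumes a 16 kHz sampling rate.
BAND_EDGES_HZ = [(0, 1000), (1000, 2000), (2000, 4000), (4000, 6000), (6000, 8000)]

def band_average_aperiodicity(aperiodicity, sample_rate=16000, bands=BAND_EDGES_HZ):
    """Average a per-frame aperiodicity spectrum over a few frequency sub-bands.

    aperiodicity: array of shape (n_frames, n_bins) with bins spaced linearly
    from 0 Hz to sample_rate / 2. Returns shape (n_frames, len(bands)).
    """
    ap = np.atleast_2d(np.asarray(aperiodicity, dtype=float))
    freqs = np.linspace(0.0, sample_rate / 2.0, ap.shape[1])
    out = np.empty((ap.shape[0], len(bands)))
    for j, (lo, hi) in enumerate(bands):
        # Half-open bands [lo, hi), except the last one, which includes Nyquist.
        if j == len(bands) - 1:
            mask = (freqs >= lo) & (freqs <= hi)
        else:
            mask = (freqs >= lo) & (freqs < hi)
        out[:, j] = ap[:, mask].mean(axis=1)
    return out

# Example: 100 frames of a 513-bin aperiodicity spectrum reduce to 100 x 5 values.
print(band_average_aperiodicity(np.random.default_rng(0).random((100, 513))).shape)
```

Reducing hundreds of bins to five values per frame is what makes the aperiodicity stream small enough to model alongside the mel-cepstral coefficients.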
