Filter-bank speech analysis

A very popular representation in speech recognition is the mel-frequency cepstral coefficient, or MFCC. It is one of the few popular representations that does not use linear prediction. It is formed by first performing a DFT on a frame of speech, then performing a filter-bank analysis (see Section 12.2) in which the frequency bin locations are defined to lie on the mel scale. This is set up to give, say, 20-30 coefficients. These are then transformed to the cepstral domain by the discrete cosine transform (we use this rather than the DFT as we only require the real part to be calculated) ... [Pg.379]
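The three steps described above (DFT, mel-spaced filter bank, discrete cosine transform) can be sketched as follows. This is a minimal illustration under stated assumptions, not the method of any particular toolkit: the Hamming window, the use of the power spectrum, the triangular filter shape, the mel formula 2595 log10(1 + f/700), the log compression before the DCT, and the defaults of 24 filters and 12 retained coefficients are all choices made here for illustration and do not come from the text. All function names are hypothetical.

```python
import cmath
import math


def hz_to_mel(f):
    # One common mel-scale formula (an assumption; other variants exist).
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    # Inverse of hz_to_mel.
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def power_spectrum(frame):
    """Hamming-window the frame and return |DFT|^2 for bins 0..N/2."""
    n = len(frame)
    windowed = [
        x * (0.54 - 0.46 * math.cos(2.0 * math.pi * i / (n - 1)))
        for i, x in enumerate(frame)
    ]
    spectrum = []
    for k in range(n // 2 + 1):
        # Direct O(N^2) DFT, kept simple for clarity rather than speed.
        s = sum(
            x * cmath.exp(-2j * math.pi * k * i / n)
            for i, x in enumerate(windowed)
        )
        spectrum.append(abs(s) ** 2)
    return spectrum


def mel_filter_bank(num_filters, n_fft, sample_rate):
    """Triangular filters whose centres are equally spaced on the mel scale."""
    n_bins = n_fft // 2 + 1
    top_mel = hz_to_mel(sample_rate / 2.0)
    # num_filters + 2 edge frequencies, uniformly spaced in mel.
    edges_hz = [
        mel_to_hz(top_mel * i / (num_filters + 1))
        for i in range(num_filters + 2)
    ]
    bin_hz = sample_rate / n_fft
    bank = []
    for j in range(1, num_filters + 1):
        lo, mid, hi = edges_hz[j - 1], edges_hz[j], edges_hz[j + 1]
        weights = []
        for k in range(n_bins):
            f = k * bin_hz
            if lo < f <= mid:
                weights.append((f - lo) / (mid - lo))  # rising slope
            elif mid < f < hi:
                weights.append((hi - f) / (hi - mid))  # falling slope
            else:
                weights.append(0.0)
        bank.append(weights)
    return bank


def dct2(values, num_coeffs):
    """Unnormalised DCT-II, keeping the first num_coeffs outputs."""
    n = len(values)
    return [
        sum(v * math.cos(math.pi * c * (i + 0.5) / n)
            for i, v in enumerate(values))
        for c in range(num_coeffs)
    ]


def mfcc(frame, sample_rate, num_filters=24, num_coeffs=12):
    spectrum = power_spectrum(frame)
    bank = mel_filter_bank(num_filters, len(frame), sample_rate)
    energies = [sum(w * p for w, p in zip(weights, spectrum))
                for weights in bank]
    # Small floor avoids log(0) for filters that capture no energy.
    log_energies = [math.log(e + 1e-10) for e in energies]
    return dct2(log_energies, num_coeffs)


# Example: one 256-sample frame of a synthetic two-tone signal at 16 kHz.
sample_rate = 16000
frame = [
    math.sin(2.0 * math.pi * 440.0 * t / sample_rate)
    + 0.5 * math.sin(2.0 * math.pi * 1200.0 * t / sample_rate)
    for t in range(256)
]
coeffs = mfcc(frame, sample_rate)
```

The DCT here operates on real log filter-bank energies and yields real coefficients directly, which is the reason the passage gives for preferring it to a second DFT. A practical implementation would normally replace the direct DFT loop with an FFT routine.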

We will now turn to the important problem of source-filter separation. In general, we wish to do this because the two components of the speech signal have quite different and independent linguistic functions. The source controls the pitch, which is the acoustic correlate of intonation, while the filter controls the spectral envelope and formant positions, which determine which phones are being produced. There are three popular techniques for performing source-filter separation. We will examine filter-bank analysis first, in this section, before turning to cepstral analysis and linear prediction in the next sections. [Pg.352]
