Big Chemical Encyclopedia


Epoch detection: finding the instant of glottal closure


We have just seen that closed-phase LP requires that we analyse each pitch period separately. This type of speech analysis is called pitch-synchronous analysis and can be performed only if we are in fact able to find and isolate individual periods of speech. We do this by means of a pitch-marking or epoch-detection algorithm (EDA). [Pg.381]
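As a minimal sketch of what "isolating individual periods" could look like in practice, the snippet below cuts a signal into one segment per pitch period, given a list of epoch positions. The function name `split_into_periods`, the toy signal and the epoch values are all hypothetical and chosen only for illustration; the epochs are assumed to have been produced already by some EDA.

```python
def split_into_periods(signal, epochs):
    """Cut `signal` into one segment per pitch period.

    `epochs` is a sorted list of sample indices, one per period
    (assumed here to be the output of some epoch-detection algorithm).
    Segment i runs from epochs[i] up to, but not including, epochs[i + 1].
    """
    if any(b <= a for a, b in zip(epochs, epochs[1:])):
        raise ValueError("epochs must be strictly increasing")
    return [signal[a:b] for a, b in zip(epochs, epochs[1:])]


# Toy data: 20 samples with an epoch every 5 samples (illustrative only).
signal = list(range(20))
epochs = [2, 7, 12, 17]
periods = split_into_periods(signal, epochs)
print([len(p) for p in periods])  # one length per isolated period
```

Each returned segment can then be analysed on its own, which is the sense in which the analysis is pitch-synchronous.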

The idea behind an EDA is to locate a single instant in each pitch period that serves as an anchor for further analysis. These positions are often known as pitch marks, pitch epochs or simply epochs. In general, they can refer to any reference point, but they are often described in terms of salient positions on the glottal-flow signal, such as the peak of the flow for each pitch period. For many types of analysis (such as TD-PSOLA, which will be described in Section 14.2) it doesn't really matter where the anchor point is chosen, so long as it is consistent from period to period. Often a time-domain reference point is used, such as the peak of the highest excursion in each period or the trough of the lowest excursion. That said, many analysis techniques focus on one particular point known as [Pg.381]
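The time-domain option mentioned above (taking the peak of the highest excursion in each period) can be sketched as follows. This is a deliberately naive illustration, not the algorithm from the text: it assumes the pitch period in samples is roughly known and constant, and the names `pick_peak_epochs`, `period` and `tolerance` are hypothetical. A practical EDA would also have to handle varying pitch and unvoiced regions.

```python
import math


def pick_peak_epochs(signal, period, tolerance=0.2):
    """Naive pitch marking: one epoch per period, at the local peak.

    Assumes `period` (in samples) is approximately known and constant.
    `tolerance` is the fraction of a period searched either side of the
    expected position of the next epoch; it must be below 1.
    """
    if not 0 <= tolerance < 1:
        raise ValueError("tolerance must be in [0, 1)")
    if period < 2 or len(signal) < period:
        return []
    # Anchor the first epoch at the highest sample in the first period.
    epochs = [max(range(period), key=lambda n: signal[n])]
    slack = max(1, int(period * tolerance))
    while True:
        centre = epochs[-1] + period          # where the next peak is expected
        lo, hi = centre - slack, centre + slack + 1
        if hi > len(signal):                  # search window runs off the end
            break
        epochs.append(max(range(lo, hi), key=lambda n: signal[n]))
    return epochs


# Synthetic "voiced" signal: a cosine with a 50-sample period, peaking at n = 10.
signal = [math.cos(2 * math.pi * (n - 10) / 50) for n in range(300)]
epochs = pick_peak_epochs(signal, period=50)
print(epochs)  # → [10, 60, 110, 160, 210, 260]
```

Because every epoch is placed at the same kind of point (the local maximum), the marks are consistent from period to period, which is the property the text says matters for techniques such as TD-PSOLA.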

Currently, mel-scale cepstral coefficients, and perceptual LP coefficients transformed into cepstral coefficients, are popular choices for the above reasons. Specifically, they are chosen because they are robust with respect to noise, can be modelled with diagonal covariance, and, with the aid of the perceptual scaling, are more discriminative than would otherwise be the case. From a speech-synthesis point of view, these points are worth making, not because the same requirements exist for synthesis, but rather to make the reader aware that the reason why MFCCs and PLPs are so often used in ASR systems is to do with the above factors, not because they are intrinsically better in any general-purpose sort of way. This also helps explain why there are so many speech representations in the first place: each has strengths in certain areas, and will be used as the application demands. In fact, as we shall see in Chapter 16, the application requirements which make, say, MFCCs so suitable for speech recognition are almost entirely absent for our purposes. We shall leave a discussion of what representations really are suited for speech-synthesis purposes until Chapter 16. [Pg.385]




