Determining the phone boundaries

Instead of, or in addition to, this technique it is possible to use a human labeller to listen to the speech. This labeller would either correct the words when the speaker deviates from the [Pg.478]

Given the word sequence, our next job is to determine the phone sequence. In considering this, we should recall the discussions of Sections 7.3.2, 7.3.6 and 8.1.3. There we stated that there was considerable choice in how we mapped from the words to a sound representation, with the main issue being whether to choose a representation that is close to the lexicon (e.g. phonemes) or one that is closer to the signal (e.g. phones with allophonic variation marked). [Pg.479]
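As a rough illustration of the lexicon-close end of that choice, the sketch below maps a word sequence to a flat phone list by dictionary lookup. The lexicon entries, the phone symbols and the function name `words_to_phones` are hypothetical and chosen only for the example; a real system would also need rules for out-of-vocabulary words and for any allophonic detail it wants to mark.

```python
# Toy pronunciation lexicon: all entries and phone symbols are illustrative assumptions.
TOY_LEXICON = {
    "the": ["dh", "ax"],
    "cat": ["k", "ae", "t"],
    "sat": ["s", "ae", "t"],
}


def words_to_phones(words, lexicon):
    """Concatenate the lexicon pronunciation of each word into one linear phone list."""
    phones = []
    for word in words:
        key = word.lower()
        if key not in lexicon:
            # A real labeller would fall back to letter-to-sound rules here.
            raise KeyError(f"no lexicon entry for {word!r}")
        phones.extend(lexicon[key])
    return phones


if __name__ == "__main__":
    print(words_to_phones(["The", "cat", "sat"], TOY_LEXICON))
```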

Another important consideration is that if we are going to use an HMM based labeller, then it makes sense to use a sound representation system which is amenable to HMM labelling. In general this means adopting a system which represents speech sounds as a linear list, and unfortunately precludes some of the more sophisticated non-linear phonologies described in Section 7.4.3. [Pg.479]

Given the word and phone sequence, we can construct an HMM network to recognise just those words. Recognition is obviously performed with perfect accuracy, but in doing the recognition search we also determine the most likely state sequence, and this gives us the phone and word boundaries. Often this operation is called forced alignment. [Pg.479]
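The idea can be sketched with a small Viterbi search over a fixed left-to-right chain of states. This is a minimal illustration under simplifying assumptions, not the procedure of any particular toolkit: each phone is modelled by a single state, the per-frame log-likelihoods are supplied directly rather than computed from acoustic models, and the names `forced_align` and `boundaries` are made up for the example.

```python
import math

NEG_INF = float("-inf")


def forced_align(log_obs, self_loop_logp=math.log(0.5)):
    """Return the most likely state index for every frame.

    log_obs[t][s] is the log-likelihood of frame t under state s, where the
    states are the phones of the known transcription in left-to-right order.
    The path must start in the first state and end in the last one.
    """
    n_frames, n_states = len(log_obs), len(log_obs[0])
    if n_frames < n_states:
        raise ValueError("need at least one frame per state")
    next_logp = math.log(1.0 - math.exp(self_loop_logp))

    delta = [[NEG_INF] * n_states for _ in range(n_frames)]
    back = [[0] * n_states for _ in range(n_frames)]
    delta[0][0] = log_obs[0][0]

    for t in range(1, n_frames):
        for s in range(n_states):
            stay = delta[t - 1][s] + self_loop_logp
            move = delta[t - 1][s - 1] + next_logp if s > 0 else NEG_INF
            if move > stay:
                delta[t][s], back[t][s] = move + log_obs[t][s], s - 1
            else:
                delta[t][s], back[t][s] = stay + log_obs[t][s], s

    # Trace back from the final state to recover the state sequence.
    path = [n_states - 1]
    for t in range(n_frames - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return path


def boundaries(path):
    """Frame indices at which the aligned state (phone) changes."""
    return [t for t in range(1, len(path)) if path[t] != path[t - 1]]
```

Because the state order is fixed by the transcription, the search cannot "misrecognise" anything; the only freedom it has is where to place the transitions, and those transition points are the boundaries.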

Authors of a number of studies have investigated using a state-of-the-art general-purpose speaker-independent speech recogniser to perform the alignment. This works [Pg.468]

This realisation has led to the study of alternative HMM configurations built specifically for the purpose of alignment. Matousek et al. [235] report a number of experiments using the HTK toolkit to label the data, whereby a small amount of hand-labelled data is used to provide initial models, which are then retrained on the full corpus. In Clark et al. [Pg.469]

Multiple pronunciations, caused by pronunciation variants, can easily be dealt with in an HMM framework, no matter whether a specially trained aligner or general-purpose recogniser is used. When a case of multiple pronunciation occurs, the recognition network simply splits, and allows two separate state paths from the start of the word to the end. During alignment, the decoder will pick the path with the highest probability, which can be taken to be the correct pronunciation. [Pg.470]
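A small sketch of that selection step follows. It scores each variant separately and keeps the best, which for whole-word alternatives gives the same winner as a single network that splits into parallel paths. The assumptions are loud ones: one state per phone, transition probabilities ignored (treated as uniform), frame log-likelihoods given as plain dictionaries, and hypothetical names (`viterbi_score`, `pick_pronunciation`) and phone symbols throughout.

```python
NEG_INF = float("-inf")


def viterbi_score(frames, phones):
    """Best log score for aligning the frames to the phones, left to right.

    frames is a list of dicts mapping phone symbol -> log-likelihood of that
    frame under that phone. Transition probabilities are omitted for brevity.
    """
    if len(frames) < len(phones):
        return NEG_INF
    prev = [NEG_INF] * len(phones)
    prev[0] = frames[0][phones[0]]
    for frame in frames[1:]:
        cur = []
        for s, phone in enumerate(phones):
            # Either stay in the same phone or arrive from the previous one.
            best_prev = prev[s] if s == 0 else max(prev[s], prev[s - 1])
            cur.append(best_prev + frame[phone])
        prev = cur
    return prev[-1]  # the path must end in the last phone


def pick_pronunciation(frames, variants):
    """Return the name of the highest-scoring variant and all variant scores."""
    scores = {name: viterbi_score(frames, phones) for name, phones in variants.items()}
    return max(scores, key=scores.get), scores
```

For example, given two candidate pronunciations of one word that differ only in the first vowel, the variant whose phones better match the observed frames receives the higher score and is taken as the pronunciation that was actually spoken.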
