
Recognising with HMMs

Figure 15.4 A schematic illustration of an HMM with three states, each with a separate Gaussian, denoted b1, b2 and b3. The transition probabilities are given by a_ij, where i is the start state and j is the destination state. If i = j then this is the self-transition probability, that is, the probability that the system will stay in the same state. The transition probabilities exiting a state always sum to 1.

If we had only one observation, it would be easy to find the state which gives the highest probability. Instead, of course, we have a sequence of observations, which we assume has been generated by moving through a sequence of states. In principle any one of the possible state sequences could have generated these observations; it's just that some are more likely than others. Because of this, we cannot deterministically find the state sequence that generated the observations.

For a state sequence Q and model M, we can find the total probability that this sequence generated the observations by calculating the probabilities of the observations and transitions of that sequence. For a sequence that moves through states 1, 2 and 3, for example, we would have the product of each state's output probability and each transition probability along the path.

In general, then, for a sequence of states Q = (q1, q2, ..., qT), the probability for that sequence is given by
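The equation itself did not survive extraction. In standard HMM notation the quantity the text describes is the following (a reconstruction, so the symbols may differ slightly from the book's own Equation (15.6); pi_q1 denotes the initial-state probability):

```latex
P(O, Q \mid M) \;=\; \pi_{q_1}\, b_{q_1}(o_1) \prod_{t=2}^{T} a_{q_{t-1} q_t}\, b_{q_t}(o_t)
```

For the 1, 2, 3 example above this product is b1(o1) a12 b2(o2) a23 b3(o3), possibly with extra self-transition terms if the path stays in a state for more than one frame.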

By definition, all probabilities are at most 1, so it should be clear that the final calculated values in Equation (15.6), being products of many such terms, may be very small, even for the single highest-probability sequence. Because of this, it is common to use log probabilities, in which case the product in Equation (15.6) becomes a sum.
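A minimal sketch of this calculation, with made-up numbers for a toy three-state model (the transition matrix and output probabilities here are purely illustrative, not from the book):

```python
import math

# trans[i][j] is the probability of moving from state i to state j;
# each row sums to 1, as the text requires of exiting transitions.
trans = [
    [0.6, 0.4, 0.0],
    [0.0, 0.7, 0.3],
    [0.0, 0.0, 1.0],
]

# obs_prob[t][q] stands in for b_q(o_t), the Gaussian output
# probability of observation o_t under state q (made-up values).
obs_prob = [
    [0.9, 0.1, 0.1],
    [0.2, 0.8, 0.1],
    [0.1, 0.2, 0.7],
]

def log_sequence_prob(states, obs_prob, trans):
    """Log probability that the given state sequence generated the
    observations: we sum logs instead of multiplying tiny numbers."""
    logp = math.log(obs_prob[0][states[0]])
    for t in range(1, len(states)):
        logp += math.log(trans[states[t - 1]][states[t]])
        logp += math.log(obs_prob[t][states[t]])
    return logp

print(log_sequence_prob([0, 1, 2], obs_prob, trans))
```

Summing logs gives exactly the same ranking of state sequences as multiplying probabilities, but avoids numerical underflow when the sequence is long.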


Given the word and phone sequence, we can construct an HMM network to recognise just those words. Recognition is obviously performed with perfect accuracy, but in doing the recognition search we also determine the most likely state sequence, and this gives us the phone and word boundaries. This operation is often called forced alignment.
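The most likely state sequence in this search is typically found with the Viterbi algorithm; a compact sketch in log space follows (the left-to-right topology and toy numbers in the usage example are assumptions for illustration, not taken from the book):

```python
import math

NEG = float("-inf")  # log 0: a disallowed transition

def viterbi(obs_logprobs, log_trans):
    """Return the most likely state path.
    obs_logprobs[t][q] is log b_q(o_t); log_trans[i][j] is log a_ij."""
    T, N = len(obs_logprobs), len(obs_logprobs[0])
    delta = [obs_logprobs[0][:]]  # best log score ending in each state
    back = []                     # backpointers for path recovery
    for t in range(1, T):
        row, ptr = [], []
        for j in range(N):
            best_i = max(range(N), key=lambda i: delta[t - 1][i] + log_trans[i][j])
            row.append(delta[t - 1][best_i] + log_trans[best_i][j] + obs_logprobs[t][j])
            ptr.append(best_i)
        delta.append(row)
        back.append(ptr)
    # Trace back from the best final state.
    q = max(range(N), key=lambda j: delta[-1][j])
    path = [q]
    for ptr in reversed(back):
        q = ptr[q]
        path.append(q)
    return path[::-1]

# Left-to-right two-state sketch: stay or move on, no skips (toy numbers).
log_trans = [[math.log(0.5), math.log(0.5)], [NEG, 0.0]]
obs = [[math.log(p) for p in row]
       for row in [[0.9, 0.1], [0.8, 0.2], [0.2, 0.8], [0.1, 0.9]]]
print(viterbi(obs, log_trans))  # -> [0, 0, 1, 1]
```

Reading the state boundaries off the returned path (here, between t = 1 and t = 2) is precisely how forced alignment yields phone and word boundaries.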

Multiple pronunciations, caused by pronunciation variants, can easily be dealt with in an HMM framework, whether a specially trained aligner or a general-purpose recogniser is used. When a word has more than one pronunciation, the recognition network simply splits, allowing two separate state paths from the start of the word to the end. During alignment, the decoder will pick the path with the highest probability, and this can be taken to be the correct pronunciation.
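In effect, the decoder reduces variant selection to a comparison of path scores; a tiny sketch, where the phone strings and log-probability scores are entirely hypothetical:

```python
# Hypothetical alignment scores: the log probability of the best state
# path through each pronunciation variant's branch of the split network
# (phone strings and numbers are made up for illustration).
variant_log_probs = {
    "iy dh er": -42.7,  # "either" with one vowel onset
    "ay dh er": -39.1,  # "either" with the other
}

# The decoder keeps both branches and simply picks the higher-probability
# path; that variant is taken as the pronunciation actually spoken.
best_variant = max(variant_log_probs, key=variant_log_probs.get)
print(best_variant)  # -> ay dh er
```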



