Pronunciation lexicons

In all modern TTS systems we make extensive use of a lexicon. The issue of lexicons is in fact quite complicated, and we discuss it in full in Chapter 8 when we start to talk about pronunciation in detail. For our current purpose, its main use is that it lists the words that are known to the system and defines their written forms. A word may have more than one written form (labour and labor), and two words may share the same written form (polish, etc.). It is by using a lexicon as the place to define what is a possible word and what its written forms may be that we can use the decoding model we are adopting in this book. [Pg.64]
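The relationship between words and written forms described above can be sketched as a small many-to-many mapping. This is only an illustrative sketch: the word identifiers (LABOUR, POLISH_1, POLISH_2) and the function names are hypothetical, and case is ignored for simplicity.

```python
from collections import defaultdict

# Hypothetical word identifiers mapped to their written forms.
# One word can have several written forms, and two words can share one.
WORD_FORMS = {
    "LABOUR": {"labour", "labor"},
    "POLISH_1": {"polish"},  # assumed label for one of the two words
    "POLISH_2": {"polish"},  # assumed label for the other word
}


def build_form_index(word_forms):
    """Invert the lexicon: written form -> set of words it may stand for."""
    index = defaultdict(set)
    for word, forms in word_forms.items():
        for form in forms:
            index[form].add(word)
    return index


def candidate_words(token, index):
    """Return the known words a written token could represent (sorted)."""
    return sorted(index.get(token.lower(), set()))


FORM_INDEX = build_form_index(WORD_FORMS)
```

A token with a single candidate (labor) is decoded directly, a token with several candidates (polish) needs disambiguation, and a token with none is not a known word.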

At this stage it is worth mentioning two other types of word/token relationships that are often cited as problems for TTS. Acronyms are words formed from sequences of other words, usually (but not always) by taking the first letter of each. Examples include NATO, UNICEF, SCUBA and AIDS. In our approach, these are treated as normal words, so that we have a word NATO in our lexicon and this has a pronunciation /n ey t ow/; the fact that this was historically formed from other words, or is normally found written in upper case, is of no real concern. (In fact, with use many acronyms really do become indistinguishable from normal words, such that radar is nearly always spelled in lower case and few people realise that it is an acronym, and fewer still what the original words were.) [Pg.100]

Most words have a single canonical pronunciation. It is this that we think of as being stored in the lexicon, and it is this which is the definition of how that word sounds. Hence a canonical transcription aims to transcribe the speech in terms of the canonical pronunciations which are given for each word in the lexicon. In such a system, the task can be described as one where we first identify the words, then transcribe the speech with the phonemes for each word, and finally mark the boundaries between the phonemes. Complications lie in the fact that the speaker may say a filled pause (e.g. an um or err), or that they may mispronounce a word (e.g. they might say /n uw k y uw l er/ instead of /n uw k l iy er/ for nuclear). In such cases, one must decide whether to take account of these effects and label them, or whether to stick to the canonical pronunciation. A second type of broad transcription, known as phonemic transcription, takes a slightly more literal approach, whereby the transcriber marks the sounds as he or she thinks they occur, but in doing so only draws from the inventory of defined phonemes. Thus in such a system, the transcriber would be able to describe the differences between the two renditions of NUCLEAR. [Pg.172]

Perhaps unsurprisingly, the consensus as to where this point should be has shifted over the years. When more traditional systems were developed, memory was very tight and hence the number of base types had to be kept low regardless of any errors. In more recent years technological developments have eased the pressure on memory, making more abstract representations possible. Given this, there is more choice over where exactly the ideal representation should lie. In fact, as we shall see in Chapter 16, the most successful systems adopt a quite phonemic representation and avoid any rewriting to a phonetic space if at all possible. Because of this, the pronunciation component in modern systems is in fact much simpler than was perhaps the case in older systems, and quite often the input to the synthesiser is simply the canonical forms themselves, direct from the lexicon. [Pg.196]

It does seem somewhat wasteful that separate pronunciation systems, and therefore lexicons, are required for British English and American English; while differences of course exist, pronunciations in the two accents aren't completely different. Fitt [161], [162], [163] proposed a solution to this in which a more abstract pronunciation, drawn from a large set of phonemes, was used as a base lexicon. From this, filters could be used to generate pronunciations for any accent of English. [Pg.198]
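The idea of deriving accent-specific pronunciations from one abstract base lexicon can be sketched as follows. This is a toy illustration only and not Fitt's actual symbol set or filter rules: the abstract symbol "r*" (an /r/ following a vowel), the two filters and the example entries are all assumptions made for the sketch.

```python
# Hypothetical base lexicon using one abstract symbol, "r*", for an /r/
# that follows a vowel. Rhotic accents keep it; non-rhotic accents drop it.
BASE_LEXICON = {
    "car": ["k", "aa", "r*"],
    "red": ["r", "eh", "d"],
}


def rhotic_filter(base_pron):
    """Realise the abstract r* as an ordinary /r/."""
    return ["r" if p == "r*" else p for p in base_pron]


def non_rhotic_filter(base_pron):
    """Delete the abstract r* altogether."""
    return [p for p in base_pron if p != "r*"]


# Assumed accent names; a real system would have one filter set per accent.
ACCENT_FILTERS = {
    "rhotic": rhotic_filter,
    "non_rhotic": non_rhotic_filter,
}


def accent_lexicon(base_lexicon, accent):
    """Generate an accent-specific lexicon from the shared base lexicon."""
    accent_filter = ACCENT_FILTERS[accent]
    return {word: accent_filter(pron) for word, pron in base_lexicon.items()}
```

Only the filters differ between accents, so a new word is added once to the base lexicon rather than once per accent.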

In speech technology, it is quite common to find existing lexicons in what we can term simple dictionary format. Such lexicons can be stored as simple ASCII files, where each entry starts with the orthography, followed by the pronunciation and possibly other information. The entries are ordered by the orthography such that a few typical entries would look like this ... [Pg.211]

A solution to this is to build the lexicon as a relational database. In this, we have exactly one entry for each word. Each entry contains a number of uniquely identified fields, each of which has a single value. For a simple word, the entry may just contain two fields, ORTHOGRAPHY and PRONUNCIATION. It is a simple matter to add more fields such as POS or SYNCAT. Each entry in a relational database can also be seen as a feature structure of the type we are now familiar with, and because of this similarity, we will use the feature structure terminology for ease of... [Pg.211]
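A minimal sketch of such a lexicon, using an in-memory SQLite table from the Python standard library, might look like this. The table name, the WORD key column, the helper names and the POS values are assumptions made for illustration; only the ORTHOGRAPHY, PRONUNCIATION and POS field names come from the text.

```python
import sqlite3

# Illustrative entries; the POS values are assumed.
ENTRIES = [
    {"word": "NATO", "orthography": "NATO",
     "pronunciation": "n ey t ow", "pos": "noun"},
    {"word": "NUCLEAR", "orthography": "nuclear",
     "pronunciation": "n uw k l iy er", "pos": "adjective"},
]


def build_lexicon(entries):
    """Create an in-memory table with exactly one row per word."""
    conn = sqlite3.connect(":memory:")
    conn.row_factory = sqlite3.Row
    conn.execute(
        "CREATE TABLE lexicon ("
        " word TEXT PRIMARY KEY,"  # primary key enforces one entry per word
        " orthography TEXT NOT NULL,"
        " pronunciation TEXT NOT NULL,"
        " pos TEXT)"
    )
    conn.executemany(
        "INSERT INTO lexicon VALUES"
        " (:word, :orthography, :pronunciation, :pos)",
        entries,
    )
    conn.commit()
    return conn


def entry_for(conn, word):
    """Return one entry as a feature structure (a dict), or None."""
    row = conn.execute(
        "SELECT * FROM lexicon WHERE word = ?", (word,)
    ).fetchone()
    if row is None:
        return None
    return {name.upper(): row[name] for name in row.keys()}
```

Calling `entry_for(build_lexicon(ENTRIES), "NATO")` returns a dictionary whose keys are the field names, which is the feature-structure view of the row. Adding a further field such as SYNCAT would mean adding one more column.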

Given this large offline lexicon, we see then that the real debate about rules vs lexicons is not one of quality, but rather one of balance between run-time and offline resources. If we take the case where we include the entire offline lexicon in the system lexicon, we will have a system which uses a considerable amount of memory, but where the processing time is minimal (simply the small amount of time taken to look up a word). If on the other hand we create a system lexicon that is only a small subset of the offline lexicon, this will result in a smaller footprint, but as the pronunciation of absent words will have to be generated at run-time, the processing costs... [Pg.215]

Pronunciations can be generated by lexicon lookup or by algorithm. These two techniques should be viewed as points on a scale. [Pg.225]
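One way to picture the two techniques working together is a lookup that falls back to an algorithm when a word is absent from the system lexicon. The sketch below is a toy under stated assumptions: the letter-to-sound table is a deliberately naive stand-in for a real grapheme-to-phoneme algorithm, and all names in it are hypothetical.

```python
# Small system lexicon (a subset of some larger offline lexicon).
SYSTEM_LEXICON = {
    "nato": ["n", "ey", "t", "ow"],
    "nuclear": ["n", "uw", "k", "l", "iy", "er"],
}

# Toy one-letter-to-one-phoneme table, purely illustrative.
TOY_LETTER_TO_SOUND = {
    "a": "ae", "b": "b", "d": "d", "e": "eh", "g": "g", "i": "ih",
    "m": "m", "n": "n", "p": "p", "s": "s", "t": "t",
}


def pronounce_by_rule(word):
    """Naive fallback: map each known letter to one phoneme."""
    return [TOY_LETTER_TO_SOUND[ch] for ch in word.lower()
            if ch in TOY_LETTER_TO_SOUND]


def pronounce(word, lexicon=SYSTEM_LEXICON):
    """Try lexicon lookup first; otherwise generate at run-time."""
    key = word.lower()
    if key in lexicon:
        return lexicon[key], "lexicon"
    return pronounce_by_rule(key), "rule"
```

Shrinking `SYSTEM_LEXICON` moves more words onto the rule path, and enlarging it moves them back, which is the sense in which the two techniques sit on a single scale.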
