Canned speech and limited-domain synthesis

The main technical drawback of canned speech is that it can say only a fixed number of things. This can be a severe drawback even in simple applications, for example when an answering machine must read out a telephone number. A common solution is to attempt to splice together recordings of individual words or phrases so as to create new utterances. The result of such operations varies greatly from acceptable (but clearly [Pg.43]

Faced with the choice between fully natural but inflexible canned speech and somewhat unnatural but fully flexible TTS, some researchers have proposed limited-domain synthesis systems that aim to combine their benefits. There are many different approaches to this. Some systems attempt to mix canned speech and TTS. Black and Lenzo [52] proposed a system for cleverly joining words and carrier phrases, for use in applications such as a talking clock. The phrase-splicing approach of Donovan et al. [139] used recorded carrier phrases cleverly spliced with unit-selection synthesis from recordings of the same speaker. A somewhat different approach is to use what is basically the same system as for normal unit-selection synthesis but to load the database of recordings with words and phrases from the required domain [9], [394], [436], [440]. [Pg.44]
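The carrier-phrase idea can be illustrated with a minimal sketch of a talking clock. This is not the method of Black and Lenzo or of Donovan et al.; it only shows how a fixed carrier phrase plus a small set of slot-filling words can cover a whole domain. The unit names (`the_time_is`, `oh`, `oclock`, and so on) and the function names are hypothetical labels for prerecorded audio clips, and the sketch stops at choosing the clip sequence: actually joining the audio smoothly is the hard part and is not shown.

```python
# Hypothetical labels for prerecorded clips; a real system would map each
# label to an audio file recorded by the same speaker.
ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven",
        "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
        "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = {2: "twenty", 3: "thirty", 4: "forty", 5: "fifty"}


def minute_units(minute: int) -> list[str]:
    """Return the clip labels that speak the minutes part of a time."""
    if minute == 0:
        return ["oclock"]
    if minute < 10:
        return ["oh", ONES[minute]]
    if minute < 20:
        return [ONES[minute]]
    tens, ones = divmod(minute, 10)
    units = [TENS[tens]]
    if ones:
        units.append(ONES[ones])
    return units


def clock_units(hour: int, minute: int) -> list[str]:
    """Return the clip sequence for a 24-hour time: carrier phrase + slots."""
    if not (0 <= hour < 24 and 0 <= minute < 60):
        raise ValueError("hour must be 0-23 and minute 0-59")
    hour12 = hour % 12 or 12          # 0 and 12 are both spoken as "twelve"
    suffix = "am" if hour < 12 else "pm"
    return ["the_time_is", ONES[hour12], *minute_units(minute), suffix]


if __name__ == "__main__":
    print(clock_units(14, 5))
```

Under these assumptions, 28 distinct clips are enough to announce all 1440 times of day, which is the appeal of the approach; anything outside the domain, such as a user's name, still cannot be spoken.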

Such systems initially compare unfavorably with TTS in that they require a new set of utterances for each application, whereas a TTS system is deployed once and can be used for any application. Furthermore, the canned speech approach can say only a fixed number of things, which can limit the scope of the application (for instance, it would be very difficult to speak a user's name). Finally, if the application has to be updated in some way and new utterances added, this requires additional recordings, which may incur considerable difficulty if, say, the original speaker is unobtainable. Despite these apparent disadvantages, canned speech is nearly always deployed in commercial systems in place of TTS. Part of the reason behind this is technical, part cultural. Technically, canned speech is perfectly natural, and as users show extreme sensitivity to the naturalness of all speech output, this factor can outweigh all others. In recent years, TTS systems have improved considerably in terms of naturalness, and so it is more common to find TTS systems in these applications. There are other, non-technical reasons: canned speech is seen as a simple, low-tech solution, whereas TTS is seen as complex and hi-tech. The upshot is that most system designers feel that they know where they stand with canned speech, whereas TTS requires some leap of faith. There may be purely business reasons also: while the canned speech approach incurs an up-front cost in an application, it is a one-off cost and does not increase with the size of deployment. TTS systems by contrast can often be sold like normal soft- [Pg.43]
