Concept to speech

Strictly speaking, it is not necessary to start with text as input when creating synthetic speech. As we have stated, our TTS model is really two processes, an analysis process followed by a synthesis process, and these are quite different in nature. As the analysis system will never be perfect, it is quite reasonable to ask whether there are situations where we can do away with this component and generate speech directly. This way of doing things is often called concept-to-speech, in contrast to text-to-speech. We shall stick with the generally used term concept-to-speech, but we should point out that it can mean a number of different things, which we will now explain. [Pg.42]

A final solution is to use a TTS system as before, but augment the input so as to reduce ambiguity and explicitly show the system where to generate prosodic effects. This can be done by a number of means, including the use of XML or other markup (explained in Chapter 17). While this will inevitably lack the power and fine control of the two methods above, it has the advantage that the input is in a more standard form, so that a system developer should be able to get this working and switch from one system to another. It also has the advantage that one can easily vary how close the system is to raw text or clean text. [Pg.42]

The message-to-form and augmented-text approaches are well covered in this book, in the sense that they can be adapted from our main common-form TTS system. Meaning-to-speech is considerably more difficult and has only been researched in a very tentative manner to date. Mainly for this reason, we do not cover this approach in any detail here. [Pg.43]
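
As a rough illustration of the augmented-text approach, the sketch below marks up a sentence with XML and then recovers both the raw text and the list of annotations. The tag names follow the style of SSML (`speak`, `say-as`, `break`, `emphasis`), which is an assumption made here for illustration; the text above only says "XML or other markup". The helper names `strip_markup` and `list_annotations` are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical augmented input. The SSML-style tags are an assumption;
# any XML vocabulary understood by the target TTS system would do.
AUGMENTED = (
    "<speak>"
    'The meeting is on <say-as interpret-as="date">10/11/2024</say-as>.'
    '<break time="300ms"/>'
    ' Please arrive <emphasis level="strong">early</emphasis>.'
    "</speak>"
)


def strip_markup(augmented):
    """Return the raw text that an unaugmented TTS front end would receive."""
    root = ET.fromstring(augmented)
    return "".join(root.itertext())


def list_annotations(augmented):
    """Return (tag, attributes) pairs for every markup element, in document order."""
    root = ET.fromstring(augmented)
    return [(el.tag, dict(el.attrib)) for el in root.iter() if el is not root]


if __name__ == "__main__":
    print(strip_markup(AUGMENTED))
    for tag, attrs in list_annotations(AUGMENTED):
        print(tag, attrs)
```

Removing tags one at a time moves the input back towards raw text, which is one way to picture the point that the amount of augmentation can be varied.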

Before going into this, we should ask: how good does the speech sound if we give the formant synthesiser perfect input? The specification-to-parameter component may produce errors, and if we are interested in assessing the quality of the formant synthesis itself, it may be difficult to do this from the specification directly. Instead we can use the technique of copy synthesis, where we forget about automatic text-to-speech conversion and instead artificially generate the best possible parameters for the synthesiser. This test is in fact one of the cornerstones of speech synthesis research: it allows us to work on one part of the system in a modular fashion, but more importantly it acts as a proof of concept as to the synthesiser's eventual suitability for inclusion in the full TTS system. The key point is that if the synthesis sounds bad with the best possible input, then it will only sound worse when potentially errorful input is given instead. In effect, copy synthesis sets the upper limit on expected quality from any system. [Pg.406]
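
To make the idea of bypassing the specification-to-parameter component concrete, here is a minimal sketch in which the synthesiser parameters are written by hand and fed straight to a toy formant synthesiser. It assumes a cascade of two-pole digital resonators excited by an impulse train, a common textbook formulation and not necessarily the synthesiser the text has in mind. The formant frequencies and bandwidths in `HAND_SET_FRAME` are illustrative values, not measurements from a recording, and all function names are hypothetical.

```python
import math


def resonator_coeffs(freq_hz, bandwidth_hz, sample_rate):
    """Coefficients (a, b, c) of y[n] = a*x[n] + b*y[n-1] + c*y[n-2], unit gain at 0 Hz."""
    t = 1.0 / sample_rate
    c = -math.exp(-2.0 * math.pi * bandwidth_hz * t)
    b = 2.0 * math.exp(-math.pi * bandwidth_hz * t) * math.cos(2.0 * math.pi * freq_hz * t)
    a = 1.0 - b - c
    return a, b, c


def resonate(signal, freq_hz, bandwidth_hz, sample_rate):
    """Filter a list of samples through one two-pole resonator."""
    a, b, c = resonator_coeffs(freq_hz, bandwidth_hz, sample_rate)
    y1 = y2 = 0.0
    out = []
    for x in signal:
        y = a * x + b * y1 + c * y2
        out.append(y)
        y2, y1 = y1, y
    return out


def impulse_train(f0_hz, duration_s, sample_rate):
    """Very crude voiced source: one unit impulse per pitch period."""
    n = int(round(duration_s * sample_rate))
    period = int(round(sample_rate / f0_hz))
    return [1.0 if i % period == 0 else 0.0 for i in range(n)]


def synthesise(frame, sample_rate=16000):
    """Pass the source through each formant resonator in cascade."""
    signal = impulse_train(frame["f0"], frame["duration"], sample_rate)
    for freq_hz, bandwidth_hz in frame["formants"]:
        signal = resonate(signal, freq_hz, bandwidth_hz, sample_rate)
    return signal


# Hand-specified parameters standing in for the "best possible input".
# In real copy synthesis these would be tuned to match a natural recording.
HAND_SET_FRAME = {
    "f0": 120.0,          # fundamental frequency in Hz (assumed)
    "duration": 0.2,      # seconds
    "formants": [(700.0, 60.0), (1200.0, 90.0), (2600.0, 150.0)],  # (Hz, bandwidth Hz)
}

samples = synthesise(HAND_SET_FRAME)
```

Because the parameters are supplied directly, any remaining defects in `samples` can be attributed to the synthesiser rather than to an automatic front end, which is the modular separation the paragraph above describes.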

Frequently, however, you will find it necessary to provide an extended definition, that is, a longer, more detailed explanation that thoroughly defines the subject. Essays of extended definition are quite common; think, for instance, of the articles you've seen on mercy killing or abortion that define life in a variety of ways. Other recent essays have grappled with such complex concepts as free speech, animal rights, pornography, affirmative action, and domestic violence. [Pg.248]
