Synthesis of prosody

The majority of this chapter focuses on the synthesis of intonation. The main acoustic representation of intonation is fundamental frequency (F0), such that intonation is often defined as the manipulation of F0 for communicative or linguistic purposes. As we shall see, techniques for synthesizing F0 contours are inherently linked to the model of intonation used, and so the whole topic of intonation, including theories, models and F0 synthesis, is dealt with here. In addition, we cover the topic of predicting intonation form from text, which was deferred from Chapter 6 as we first require an understanding of intonational phenomena, theories and models before explaining this. [Pg.227]

Timing is considered the second important acoustic representation of prosody. Timing is used to indicate stress (phones are longer than normal), phrasing (phones get noticeably longer immediately prior to a phrase break) and rhythm. [Pg.227]

In this book we have used the terms F0 (fundamental frequency) and pitch interchangeably, as there are few cases where the difference matters. Strictly speaking, pitch is what is perceived, such that some errors or non-linearities of perception may lead this to be slightly different to fundamental frequency. Fundamental frequency is a little harder to define: in a truly periodic signal it is simply the reciprocal of the period, but as speech is never purely periodic this definition does not suffice on its own. An alternative definition is that it is the input, or driving, frequency of the vocal folds. [Pg.228]
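The "reciprocal of the period" definition can be illustrated with a minimal sketch. It assumes a clean synthetic signal and uses autocorrelation to find the period, which is one common estimation approach and not necessarily the one used in the book. The function name `estimate_f0` and its parameters are hypothetical, chosen only for this illustration.

```python
import math

def estimate_f0(samples, sample_rate, f0_min=50.0, f0_max=500.0):
    """Estimate F0 in Hz as the reciprocal of the best-matching period.

    Assumption: the period is taken to be the lag (in samples) at which
    the signal correlates most strongly with a shifted copy of itself.
    """
    min_lag = int(sample_rate / f0_max)  # shortest period considered
    max_lag = int(sample_rate / f0_min)  # longest period considered
    best_lag, best_score = None, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        # Unnormalized autocorrelation at this lag
        score = sum(samples[i] * samples[i + lag]
                    for i in range(len(samples) - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    # F0 is the reciprocal of the period (best_lag / sample_rate seconds)
    return sample_rate / best_lag

# Synthetic periodic signal: 100 Hz fundamental plus a weaker second harmonic
sample_rate = 8000
true_f0 = 100.0
signal = [
    math.sin(2 * math.pi * true_f0 * n / sample_rate)
    + 0.5 * math.sin(2 * math.pi * 2 * true_f0 * n / sample_rate)
    for n in range(800)
]
estimated = estimate_f0(signal, sample_rate)
```

On real speech the waveform is only quasi-periodic, so an estimate like this varies from frame to frame and can fail outright in unvoiced regions, which is why the simple definition does not suffice there.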

In prosody, F0 is seen as the direct expression of intonation, and often intonation is defined as the linguistic use of F0. The relationship between the two is a little more subtle than this, though, as it is clear that listeners do not perceive the F0 contour directly, but rather a processed version of it. The exact mechanism is not known, but it is as if the listener interpolates the contour through the unvoiced regions so as to produce a continuous, unbroken contour. [Pg.228]
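The idea of an unbroken contour can be sketched as follows. Since the perceptual mechanism is not known, the linear interpolation used here is purely an assumption made for illustration, as is the convention of marking unvoiced frames with 0.0. The function name `interpolate_unvoiced` is hypothetical.

```python
def interpolate_unvoiced(f0, unvoiced=0.0):
    """Return a copy of an F0 track with unvoiced frames filled in.

    Assumptions: frames equal to `unvoiced` carry no F0 value, gaps
    between voiced frames are bridged linearly, and leading or trailing
    gaps are filled with the nearest voiced value.
    """
    voiced = [i for i, value in enumerate(f0) if value != unvoiced]
    if not voiced:
        return list(f0)  # nothing to interpolate from
    out = list(f0)
    # Fill the edges with the nearest voiced value
    for i in range(voiced[0]):
        out[i] = f0[voiced[0]]
    for i in range(voiced[-1] + 1, len(f0)):
        out[i] = f0[voiced[-1]]
    # Bridge each interior gap with a straight line
    for left, right in zip(voiced, voiced[1:]):
        gap = right - left
        for i in range(left + 1, right):
            weight = (i - left) / gap
            out[i] = f0[left] + weight * (f0[right] - f0[left])
    return out

# F0 per frame in Hz; 0.0 marks unvoiced frames
contour = [0.0, 120.0, 130.0, 0.0, 0.0, 0.0, 110.0, 100.0, 0.0]
smooth = interpolate_unvoiced(contour)
```

Here the three unvoiced frames between 130 Hz and 110 Hz are replaced by values falling steadily from one to the other, giving a contour with no breaks.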

The models differ significantly in what they take as the primary form of intonation. In the AM model this is quite abstract, while in the Tilt model this is quite literal or acoustic. These differences in primary form should not be taken to mean that the proponents of these models do not believe that there should be more abstract or more concrete representations, just that the best representation happens to lie where they describe it. In the many synthesis schemes based on the AM model there are other, more phonetic or acoustic levels, and in the Tilt model there is always the intention that it should serve as the phonetic description of some more abstract, higher-level representation. [Pg.229]

This chapter is concerned with the issue of synthesising acoustic representations of prosody. The input to the algorithms described here varies, but in general takes the form of the phrasing, stress, prominence and discourse patterns which we introduced in Chapter 6. Hence the complete process of synthesis of prosody can be seen as one whereby we first extract a prosodic form representation from the text, as described in Chapter 6, and then synthesize an acoustic representation of this form, as described here. [Pg.225]
