Big Chemical Encyclopedia


Sentence splitting

Many of the algorithms in TTS work a sentence at a time. This is because most linguistic units smaller than the sentence (words, syllables, etc.) are heavily influenced by their neighbours, which makes autonomous processing difficult. Sentences, on the other hand, don't interact with each other much, and apart from some specific phenomena, we can by and large process each sentence independently without problem. The input to the TTS system is not necessarily in sentence form, however, and in many cases we are presented with a document which contains several sentences. The task of sentence splitting, then, is to take the raw document and segment it into a list of sentences. [Pg.67]

While sentence splitting is not the most complex of tasks in TTS, it is important to get right, mainly because sentence-final prosody is one of the phenomena that listeners are most sensitive to. Generating high-quality sentence-final prosody is hard enough in its own right, but if the sentence boundary is in the wrong place, the system has no chance. [Pg.67]

A basic sentence splitting algorithm for conventional writing can be defined as follows: [Pg.68]

Search forwards through the input to find instances of the possible end-of-sentence characters '.', '!' and '?'. [Pg.68]

For conventional writing we make use of the fact that in most cases the writer has clearly marked the sentence boundaries, and so our job is simply to recover these from the text. Perhaps ironically then, our algorithm is based on the lay notion of finding upper-case characters at the beginnings of sentences and full-stop characters at the end, and marking sentences as what lies between. The situation is not quite that simple, as we know the full-stop character can be used for a variety of purposes, as can upper-case characters. The task is therefore one of finding instances of full stops and related characters, and classifying each according to whether it indicates a sentence boundary or not. [Pg.67]
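The find-then-classify idea described here can be sketched in a few lines of Python. This is only an illustrative sketch, not the algorithm from the cited pages: the candidate set (full stop, exclamation mark, question mark), the rule that a boundary must be followed by whitespace and an upper-case letter, and the tiny `ABBREVIATIONS` list are all assumptions made for the example, as are the function names.

```python
import re

# Candidate end-of-sentence characters (assumed set: full stop, '!' and '?').
CANDIDATE = re.compile(r"[.!?]")

# Hypothetical abbreviation list, for illustration only; a real system
# would need a far larger lexicon or a trained classifier.
ABBREVIATIONS = {"dr", "mr", "mrs", "prof", "etc", "e.g", "i.e"}


def is_boundary(text: str, pos: int) -> bool:
    """Classify the candidate character at text[pos] as a boundary or not."""
    rest = text[pos + 1:]
    if rest.strip() == "":
        return True  # nothing but whitespace follows: end of the document
    # Lay heuristic: a boundary is followed by whitespace and an upper-case letter.
    if not re.match(r"\s+[A-Z]", rest):
        return False
    if text[pos] == ".":
        # A full stop that closes a known abbreviation is not a boundary.
        prev = re.search(r"(\S+)$", text[:pos])
        word = prev.group(1) if prev else ""
        if word.lower() in ABBREVIATIONS:
            return False
    return True


def split_sentences(text: str) -> list[str]:
    """Search forwards for candidates and cut the text at each true boundary."""
    sentences, start = [], 0
    for match in CANDIDATE.finditer(text):
        if is_boundary(text, match.start()):
            sentence = text[start:match.end()].strip()
            if sentence:
                sentences.append(sentence)
            start = match.end()
    tail = text[start:].strip()
    if tail:
        sentences.append(tail)  # trailing text with no closing punctuation
    return sentences
```

For the input "Dr. Smith arrived at 3.15 today. Was he late? No!" this sketch rejects the full stops in "Dr." and "3.15" and keeps the remaining three candidates as boundaries. The heuristic still fails on cases such as an abbreviation that genuinely ends a sentence, which is why the classification step is the hard part of the task.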


Sentence splitting: segmentation of the document into a list of sentences. [Pg.53]

Now that we know what we are looking for (underlying words), we can turn our attention to the problem of how to extract these from text. While in principle this could be achieved in a single process, it is common in TTS to perform it in a number of steps. In this section and the next we deal with the initial steps of tokenisation and sentence splitting, which aim to split the input sequence of characters into units that are more easily handled by later processes which attempt to determine the word identity, subsequent... [Pg.64]

This tag can be used to override the decisions of the sentence splitting algorithm. [Pg.69]

This tag indicates that a sentence break should be placed at this point. It is a good way for the author to override any possible shortcomings in the sentence-splitting algorithm. [Pg.69]
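One way an author-supplied break tag might be honoured is to cut the text at every tag first and run the automatic splitter only inside each piece. The sketch below assumes this design; the excerpt does not name the tag, so `BREAK_TAG` is a hypothetical placeholder, and `naive_split` is a deliberately crude stand-in for whatever splitting algorithm the system really uses.

```python
import re

# Hypothetical placeholder: the excerpt does not give the actual tag name.
BREAK_TAG = "<break/>"


def naive_split(text: str) -> list[str]:
    """Stand-in automatic splitter: cut after '.', '!' or '?' plus whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [part for part in parts if part]


def split_with_overrides(text: str) -> list[str]:
    """Force a break at every author-supplied tag, then split each piece."""
    sentences = []
    for chunk in text.split(BREAK_TAG):
        sentences.extend(naive_split(chunk))
    return sentences
```

With the input "Chapter one<break/>It was late. We left." the tag forces a boundary after the unpunctuated heading, which the automatic splitter on its own would have merged into the following sentence.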


© 2024 chempedia.info