Big Chemical Encyclopedia

Text classification with respect to semiotic systems

As we explained, it is a mistake to think that text is simply or always an encoding of natural language. Rather, we should see text as a common physical signal that can be used to encode many different semiotic systems, of which natural language is just one (rather special) case. [Pg.44]

There are two main ways to deal with this problem. The first is the text-normalisation approach, which sees the text as the input to the synthesiser and tries to rewrite any non-standard text as proper linguistic text. The second is to classify each section of text according to one of the known semiotic classes. A parser specific to each class is then used to analyse that section of text and uncover its underlying form. For natural language the text-analysis job is now done, but for the other systems an additional stage is needed in which the underlying form is translated into words. [Pg.44]
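The three stages of the semiotic-class approach (classify, parse with a class-specific parser, translate the underlying form into words) can be sketched in Python. This is a minimal illustration under stated assumptions, not the author's implementation: tokens arrive already labelled with a class, only two non-linguistic classes (cardinal numbers and dates) are handled, dates are assumed to be in day/month/two-digit-year order, and numbers are limited to 0 to 99. All function names (`analyse`, `parse_date`, `verbalise_date`, and so on) and the `PIPELINES` table are hypothetical.

```python
from typing import List, Tuple

# Hypothetical lookup tables for a tiny English verbaliser (0..99 only).
ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven", "eight",
        "nine", "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen",
        "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy",
        "eighty", "ninety"]
MONTHS = ["January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December"]
IRREGULAR_ORDINALS = {"one": "first", "two": "second", "three": "third",
                      "five": "fifth", "eight": "eighth", "nine": "ninth",
                      "twelve": "twelfth"}


def number_words(n: int) -> str:
    """Spell out 0..99 in English words (the range this sketch supports)."""
    if not 0 <= n <= 99:
        raise ValueError("this sketch only handles 0..99")
    if n < 20:
        return ONES[n]
    tens, ones = divmod(n, 10)
    return TENS[tens] if ones == 0 else f"{TENS[tens]} {ONES[ones]}"


def ordinal_words(n: int) -> str:
    """Turn a cardinal such as 'twenty one' into an ordinal such as 'twenty first'."""
    words = number_words(n).split()
    last = words[-1]
    if last in IRREGULAR_ORDINALS:
        words[-1] = IRREGULAR_ORDINALS[last]
    elif last.endswith("y"):
        words[-1] = last[:-1] + "ieth"  # twenty -> twentieth
    else:
        words[-1] = last + "th"         # ten -> tenth
    return " ".join(words)


# Parsing stage (hypothetical): map a token to its underlying form.
def parse_cardinal(token: str) -> int:
    return int(token)


def parse_date(token: str) -> Tuple[int, int, int]:
    # Assumption: day/month/two-digit-year order, e.g. 10/12/67.
    day, month, year = (int(part) for part in token.split("/"))
    return day, month, year


# Translation stage (hypothetical): turn the underlying form into words.
def verbalise_cardinal(value: int) -> List[str]:
    return number_words(value).split()


def verbalise_date(value: Tuple[int, int, int]) -> List[str]:
    day, month, year = value
    year_words = f"oh {ONES[year]}" if year < 10 else number_words(year)
    return f"the {ordinal_words(day)} of {MONTHS[month - 1]} {year_words}".split()


# One (parser, verbaliser) pair per non-linguistic semiotic class.
PIPELINES = {
    "cardinal": (parse_cardinal, verbalise_cardinal),
    "date": (parse_date, verbalise_date),
}


def analyse(tokens: List[Tuple[str, str]]) -> List[str]:
    """Turn (token, semiotic_class) pairs into a flat word sequence."""
    words: List[str] = []
    for token, semiotic_class in tokens:
        if semiotic_class == "natural_language":
            # Natural language needs no translation stage: the token is the word.
            words.append(token)
            continue
        parse, verbalise = PIPELINES[semiotic_class]
        words.extend(verbalise(parse(token)))
    return words
```

For example, `analyse([("born", "natural_language"), ("on", "natural_language"), ("10/12/67", "date")])` yields `["born", "on", "the", "tenth", "of", "December", "sixty", "seven"]`. Keeping the parser and the verbaliser separate mirrors the description above: parsing recovers the underlying form (a day, month and year), and only the final stage decides how that form is spoken.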

Let us consider the semiotic class approach. Assume for a moment that we can divide an input sentence into a sequence of text tokens, such that the input sentence [Pg.44]

Semiotic classification is therefore a question of assigning the correct class to each of these tokens. This can be done based on the patterns within the tokens themselves (for example, three numbers separated by slashes, as in 10/12/67, are indicative of a date) and, optionally, on the tokens surrounding the one in question (if we find 1967 preceded by "in", there is a good chance that it is a year). [Pg.45]
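A rule-based classifier that combines both kinds of evidence, the token's own pattern and its left-hand neighbour, might look like the following Python sketch. The class labels, the regular expressions (which assume one- or two-digit day and month fields) and the single context rule are illustrative assumptions rather than rules given in the text; a practical system would use many more patterns or a trained classifier. The names `classify_token` and `classify` are hypothetical.

```python
import re
from typing import List, Optional

# Hypothetical patterns: three slash-separated numbers, or a plain digit string.
DATE_PATTERN = re.compile(r"\d{1,2}/\d{1,2}/\d{2,4}")
DIGITS_PATTERN = re.compile(r"\d+")


def classify_token(token: str, previous: Optional[str]) -> str:
    """Assign a semiotic class from the token's own form plus its left neighbour."""
    # Evidence within the token: three numbers separated by slashes suggest a date.
    if DATE_PATTERN.fullmatch(token):
        return "date"
    if DIGITS_PATTERN.fullmatch(token):
        # Contextual evidence: a four-digit number preceded by "in" is probably a year.
        if len(token) == 4 and previous is not None and previous.lower() == "in":
            return "year"
        return "cardinal"
    return "natural_language"


def classify(tokens: List[str]) -> List[str]:
    """Classify every token, giving each one its preceding token as context."""
    classes: List[str] = []
    for i, token in enumerate(tokens):
        previous = tokens[i - 1] if i > 0 else None
        classes.append(classify_token(token, previous))
    return classes
```

Here `classify(["born", "in", "1967"])` returns `["natural_language", "natural_language", "year"]`, whereas `classify(["1967", "people"])` falls back to `["cardinal", "natural_language"]`: the same token pattern receives a different class once its context changes.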

© 2024 chempedia.info