Counting tokens

The basic idea of decision lists is to determine the strength of the link between a token and its trigger, which is done by measuring how often they occur together. In the following table (taken from [507]), we see the raw counts found for the token "bass" ... [Pg.86]
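A minimal sketch of how such raw counts might be gathered and turned into a decision list. It assumes a small, hypothetical hand-labelled set of sentences containing "bass" with two senses, here called `fish` and `music`; these are not the counts from [507]. Every context word is treated as a candidate trigger, its strength is scored with a smoothed log ratio of its counts under the two senses (in the style of Yarowsky-type decision lists), and the list is sorted so the strongest trigger is tested first. The smoothing constant `alpha` and all function names are illustrative choices.

```python
import math
from collections import Counter, defaultdict

# Hypothetical hand-labelled sentences for the ambiguous token "bass";
# these are NOT the counts from [507].
TRAINING = [
    ("he caught a huge bass in the lake", "fish"),
    ("we went fishing for striped bass", "fish"),
    ("the bass swam near the boat", "fish"),
    ("she plays bass guitar in a band", "music"),
    ("turn up the bass on the speaker", "music"),
    ("the bass player tuned his guitar", "music"),
]

def count_triggers(examples, target="bass"):
    """Raw counts: how many sentences of each sense contain each context word."""
    counts = defaultdict(Counter)  # trigger word -> Counter({sense: count})
    for sentence, sense in examples:
        for word in set(sentence.split()):
            if word != target:
                counts[word][sense] += 1
    return counts

def build_decision_list(counts, sense_a="fish", sense_b="music", alpha=0.1):
    """Score each trigger by log(count_a / count_b), smoothed by alpha to avoid
    log(0), and sort so the strongest evidence (largest |score|) comes first."""
    rules = []
    for word, c in counts.items():
        score = math.log((c[sense_a] + alpha) / (c[sense_b] + alpha))
        rules.append((abs(score), word, sense_a if score > 0 else sense_b))
    rules.sort(reverse=True)
    return rules

def classify(sentence, rules, default="fish"):
    """Apply the first (strongest) rule whose trigger occurs in the sentence."""
    words = set(sentence.split())
    for _, word, sense in rules:
        if word in words:
            return sense
    return default

counts = count_triggers(TRAINING)
rules = build_decision_list(counts)
print(rules[0][1:])                                # ('guitar', 'music')
print(classify("a new bass guitar amp", rules))    # music
```

Because only the single strongest matching trigger decides, a decision list never has to combine evidence from several features, which is what makes raw counts sufficient here.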

In this, the expression $P(f_1, f_2, \ldots, f_N \mid w_i)$ is easier to calculate than the expression of Equation 5.1, because all we need to do now is find a word that gives rise to an ambiguous token and then count occurrences of the features $f_1, f_2, \ldots, f_N$ around this word. As we mentioned in Section 5.2.1, in general the interactions between a set of features mean that calculating $P(f_1, f_2, \ldots, f_N)$... [Pg.87]
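One common way to make the feature likelihood tractable, shown here as an illustration rather than as the method the passage goes on to describe, is the naive Bayes assumption that the features are independent given the word, so that $P(f_1, \ldots, f_N \mid w_i) \approx \prod_j P(f_j \mid w_i)$, with each factor estimated by counting. The sketch below assumes hypothetical training pairs for the homograph "lead" and uses add-one smoothing; the word labels `lead_metal` and `lead_verb` and all function names are made up for the example.

```python
import math
from collections import Counter, defaultdict

def train(examples):
    """examples: (context_features, word) pairs; returns the counts needed below."""
    word_counts = Counter()                 # how often each candidate word occurs
    feature_counts = defaultdict(Counter)   # word -> counts of features seen near it
    vocab = set()
    for features, word in examples:
        word_counts[word] += 1
        feature_counts[word].update(features)
        vocab.update(features)
    return word_counts, feature_counts, vocab

def best_word(features, word_counts, feature_counts, vocab):
    """argmax_w log P(w) + sum_j log P(f_j | w), with add-one smoothing.

    P(f_1, ..., f_N) is identical for every candidate w, so it is never computed.
    """
    total = sum(word_counts.values())
    best, best_score = None, -math.inf
    for word, n in word_counts.items():
        denom = sum(feature_counts[word].values()) + len(vocab)
        score = math.log(n / total)
        for f in features:
            score += math.log((feature_counts[word][f] + 1) / denom)
        if score > best_score:
            best, best_score = word, score
    return best

# Hypothetical training data for the homograph "lead" (metal vs. verb).
EXAMPLES = [
    (["pipes", "made", "of"], "lead_metal"),
    (["poisoning", "paint"], "lead_metal"),
    (["will", "the", "team"], "lead_verb"),
    (["to", "victory", "the"], "lead_verb"),
]
model = train(EXAMPLES)
print(best_word(["paint", "pipes"], *model))  # lead_metal
```

Working in log space avoids underflow when many small probabilities are multiplied, and the smoothing keeps an unseen feature from driving a whole product to zero.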

Now we turn to some more problematic cases. In English, the consonants /dh/, /zh/, /ng/ and /h/ are a little more difficult to deal with. There are clearly four unvoiced fricatives in English, /th/, /f/, /s/ and /sh/; of these, /f/ and /s/ have the voiced equivalents /v/ and /z/, and these are unproblematic. The voiced equivalent of /th/ is /dh/, and while /th/ is a common enough phoneme, /dh/ occurs only in very particular patterns. For a start, the number of minimal pairs between /dh/ and /th/ is small: real examples include teeth (noun), /t iy th/, and teethe (verb), /t iy dh/, while most are near minimal pairs such as bath /b ae th/ and bathe /b ey dh/, where the quality of the vowel differs. Even if we accept /dh/ as a phoneme on this basis, it occurs in strange lexical patterns. In an analysis of one dictionary, it occurred in only about 100 words out of a total of 25,000. But if we take token counts into consideration, we find that it is in fact one of the most common phonemes of all, because it occurs in some of the most common function words... [Pg.200]
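The contrast between counting a phoneme over dictionary entries (types) and over running text (tokens) can be sketched as follows. The mini-lexicon and sentence are hypothetical and far too small to reproduce the dictionary figures quoted above; they only show how the two counts are computed and how a phoneme carried by frequent function words gains weight in the token count.

```python
from collections import Counter

# Hypothetical mini-lexicon in the document's phone notation (illustrative only).
LEXICON = {
    "the": "dh ax", "that": "dh ae t", "bath": "b ae th", "teeth": "t iy th",
    "cat": "k ae t", "sat": "s ae t", "on": "aa n", "mat": "m ae t",
    "dog": "d ao g", "big": "b ih g",
}

def type_count(phone, lexicon):
    """Number of distinct words (types) whose pronunciation contains the phone."""
    return sum(phone in pron.split() for pron in lexicon.values())

def token_count(phone, text, lexicon):
    """Occurrences of the phone in running text, so frequent words count many times."""
    phones = Counter()
    for word in text.split():
        phones.update(lexicon[word].split())
    return phones[phone]

TEXT = "the big cat sat on the mat that the dog sat on"
for phone in ("dh", "b", "th"):
    print(phone, type_count(phone, LEXICON), token_count(phone, TEXT, LEXICON))
# prints: dh 2 4, then b 2 1, then th 2 0
```

All three phones appear in two lexicon entries, but /dh/ dominates the token count because "the" recurs in the text.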

This successful mystery observee program illustrates an important principle in incentive/reward programs: you get what you reinforce. Programs that reward employees for handing in a completed CBC will probably increase the number of checklists received, but what about the quality of the CBCs? Will the number of constructive comments on a CBC decrease when a reward is given for quantity? You can count on this for employees who view the reward as a "payoff" for their efforts. That is why it is important to educate people about the rationale and true value of a particular safety effort. Then the big payoff is injury prevention, and the extra reward can be perceived as a "token of appreciation" for heartfelt participation. [Pg.229]

To introduce important concepts, then, it is first necessary to distinguish between a specific use name found in the data base and all occurrences of that specific name. The specific name itself will be called a use type, while any single occurrence of that name will be called a use token. Thus, if the use name narcotic occurs 48 times in the data base, then that set could be described as 48 use tokens or, alternatively, as the use type narcotic having a frequency of 48 in the data base. This usage is very convenient for counting purposes. [Pg.40]
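The type/token distinction maps directly onto a frequency table: each distinct use name is a key (a use type), and the value is the number of use tokens of that type. The sketch below assumes the data base has already been flattened into one list entry per occurrence; the use names and counts in it are hypothetical.

```python
from collections import Counter

def type_token_summary(occurrences):
    """occurrences: one entry per use token, i.e. every time a use name appears."""
    freq = Counter(occurrences)      # use type -> its frequency in the data base
    n_types = len(freq)              # number of distinct use types
    n_tokens = sum(freq.values())    # total number of use tokens
    return freq, n_types, n_tokens

# Hypothetical data base: one use type occurring 48 times, two rarer ones.
records = ["analgesic"] * 48 + ["sedative"] * 5 + ["antiseptic"] * 2
freq, n_types, n_tokens = type_token_summary(records)
print(freq["analgesic"], n_types, n_tokens)  # 48 3 55
```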

The type of data file displayed in Table IV resembles a concordance in the sense that it shows occurrences of different words in different subclasses or texts. In most linguistic studies, the columns represent different texts, and the table entries denote counts of word tokens in those texts. We will use the term "text" here to denote one of these columns even though no text as such has been defined; rather, "text" means a partitioning of the data. The entire table will be called a word distribution table, or use distribution file, when referring to these particular data. [Pg.46]
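A word distribution table of this kind can be sketched as a mapping from each word type to a row of token counts, one cell per "text" column. The partition names and word lists below are hypothetical; any grouping of the data into subclasses would play the same role.

```python
from collections import Counter

def distribution_table(texts):
    """texts: partition name -> list of word tokens in that partition.

    Returns (columns, rows): rows maps each word type to its token count in each column.
    """
    columns = sorted(texts)
    counts = {name: Counter(tokens) for name, tokens in texts.items()}
    word_types = sorted(set().union(*counts.values()))
    rows = {w: [counts[name][w] for name in columns] for w in word_types}
    return columns, rows

# Hypothetical partition of a small data set into two "texts".
texts = {"text1": "a b a c".split(), "text2": "b b d".split()}
columns, rows = distribution_table(texts)
for word, cells in rows.items():
    print(word, cells)
# prints: a [2, 0], b [1, 2], c [1, 0], d [0, 1]
```

The row sums of this table give each word type's total frequency, which is the input to the frequency distributions discussed next.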

Frequency Distribution. The word distribution table provides not only the distribution of word tokens among the various texts but also the total frequency of each word type. The number of types having a given frequency is found simply by counting the types in the distribution table with that total frequency; the number of tokens having that frequency is the sum of those types' frequencies, that is, the frequency multiplied by the number of such types. The functions which give the number of types or tokens of a given frequency are called, respectively, the type and token distribution functions. [Pg.46]
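Both distribution functions follow from the total frequency of each word type: group the types by frequency to get the type distribution, then multiply each frequency by its type count to get the token distribution. The four word types and their frequencies below are hypothetical.

```python
from collections import Counter

def type_and_token_distributions(total_freq):
    """total_freq: word type -> total frequency (the row sums of the table).

    Returns (types, tokens), both keyed by frequency f:
      types[f]  = number of word types that occur exactly f times
      tokens[f] = number of tokens those types account for, i.e. f * types[f]
    """
    types = Counter(total_freq.values())
    tokens = {f: f * n for f, n in types.items()}
    return dict(types), tokens

# Hypothetical total frequencies for four word types.
types, tokens = type_and_token_distributions({"a": 2, "b": 3, "c": 1, "d": 1})
print(types)   # {2: 1, 3: 1, 1: 2}
print(tokens)  # {2: 2, 3: 3, 1: 2}
```

As a consistency check, the token distribution always sums to the total number of tokens in the table (here 7).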

In practical terms, the type and token cumulative frequency distributions may be tested for log-normality by plotting these functions on normal probability paper with the logarithm of frequency as the abscissa. When this test was applied to the medical use type and token frequency distributions, the log-normal model was found to describe both distributions very well over two orders of magnitude. Figure 1 shows this relationship for the use type distribution and also for the cluster type distribution. Cluster types are counted differently from use types: two clusters are the same only if all their components are identical. The total number of cluster types is simply the sum of the last column in Table II. [Pg.48]
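Normal probability paper has a simple numerical stand-in: pair each sorted log-frequency with the standard-normal quantile of its cumulative proportion, and check whether the points lie on a straight line. The sketch below summarizes straightness with a correlation coefficient (a probability-plot correlation check). The synthetic frequencies, the random seed and the (i - 0.5)/n plotting position are assumptions for illustration, not the procedure or data of the study described above.

```python
import math
import random
from statistics import NormalDist, mean

def probability_plot_points(frequencies):
    """Points of a normal probability plot with log(frequency) as the abscissa.

    The i-th smallest log-frequency (1-based rank i) is paired with the
    standard-normal quantile of its cumulative proportion (i - 0.5) / n.
    """
    xs = sorted(math.log(f) for f in frequencies)
    n = len(xs)
    zs = [NormalDist().inv_cdf((i + 0.5) / n) for i in range(n)]
    return xs, zs

def straightness(xs, zs):
    """Pearson correlation of the plotted points; close to 1.0 means a straight line."""
    mx, mz = mean(xs), mean(zs)
    sxz = sum((x - mx) * (z - mz) for x, z in zip(xs, zs))
    sxx = sum((x - mx) ** 2 for x in xs)
    szz = sum((z - mz) ** 2 for z in zs)
    return sxz / math.sqrt(sxx * szz)

# Synthetic, hypothetical frequencies drawn from a log-normal distribution.
rng = random.Random(0)
freqs = [rng.lognormvariate(1.0, 1.2) for _ in range(500)]
xs, zs = probability_plot_points(freqs)
r = straightness(xs, zs)
print(round(r, 4))
```

With real counts, many types share small integer frequencies (especially a frequency of 1), so the low end of such a plot forms horizontal steps rather than a smooth line.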

Big Chemical Encyclopedia
