
KNN-method

In the original kNN method, an unknown object (molecule) is classified according to the majority of the class memberships of its K nearest neighbors in the training set (Fig. 13.4). The nearness is measured by an appropriate distance metric (a molecular similarity measure as applied to the classification of molecular structures). It is implemented simply as follows ... [Pg.314]

When applied to QSAR studies, the activity of molecule u is estimated as the average activity of its K nearest neighbors. An optimal K value is selected by optimizing classification performance on a test set of samples or by leave-one-out cross-validation. Many variations of the kNN method have been proposed in the past, and new and fast algorithms have continued to appear in recent years. The automated variable selection kNN QSAR technique optimizes the selection of descriptors to obtain the best models [20]. [Pg.315]
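As a rough illustration of this scheme (not taken from the source), the sketch below implements kNN activity prediction and leave-one-out selection of K with NumPy; the descriptor matrix X, activity vector y, and the function names are assumptions for the example.

```python
# Sketch of kNN QSAR: the activity of an unknown molecule is the mean
# activity of its K nearest neighbours; K is chosen by leave-one-out CV.
import numpy as np

def knn_activity(X_train, y_train, x, k):
    """Average activity of the k training molecules nearest to x."""
    d = np.linalg.norm(X_train - x, axis=1)   # Euclidean distances in descriptor space
    nearest = np.argsort(d)[:k]               # indices of the k nearest neighbours
    return y_train[nearest].mean()

def select_k_loo(X, y, k_values):
    """Return the K with the smallest leave-one-out squared prediction error."""
    best_k, best_err = None, np.inf
    for k in k_values:
        preds = []
        for i in range(len(X)):
            mask = np.arange(len(X)) != i     # leave molecule i out
            preds.append(knn_activity(X[mask], y[mask], X[i], k))
        err = np.mean((np.asarray(preds) - y) ** 2)
        if err < best_err:
            best_k, best_err = k, err
    return best_k
```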

D. Coomans and D.L. Massart, Alternative K-nearest neighbour rules in supervised pattern recognition. Part 2. Probabilistic classification on the basis of the kNN method modified for direct density estimation. Anal. Chim. Acta, 138 (1982) 153-165. [Pg.240]

The KNN method [77] is probably the simplest classification method to understand. Once the model space and distance measure are defined, its classification rule involves rather simple logic ... [Pg.393]

The KNN method has several advantages aside from its relative simplicity. It can be used in cases where few calibration samples are available. In addition, it does not assume that the classes are separated by linear partitions in the space. As a result, it can be rather effective at handling highly nonlinear separation structures. [Pg.394]

One disadvantage of the KNN method is that it does not provide an assessment of confidence in the class assignment result. In addition, it does not sufficiently handle cases where an unknown sample belongs to none of the classes in the calibration data, or to more than one class. A practical disadvantage is that the user must input the number of nearest neighbors (K) to use in the classifier. In practice, the optimal value of K is influenced by the total number of calibration samples (N), the distribution of calibration samples between classes, and the degree of natural separation of the classes in the sample space. [Pg.394]

The KNN method is probably the simplest classification method to understand. It is most commonly applied to a principal component space. In this case, calibration is achieved by simply constructing a PCA model using the calibration data, and choosing the optimal number of PCs (A) to use in the model. Prediction of an unknown sample is then done by calculating the PC scores for that sample (Equation 8.57), followed by application of the classification rule. [Pg.289]
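The following sketch shows one way this calibrate/predict sequence could look in Python. Equation 8.57 is not reproduced in this excerpt, so the score calculation here is the standard projection of mean-centered data onto the first A loadings; all names are illustrative, not the source's.

```python
# Sketch: KNN classification in a principal component (score) space.
import numpy as np

def pca_fit(X, n_pcs):
    """Mean-center X and return the column means and the first n_pcs loadings."""
    mean = X.mean(axis=0)
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:n_pcs].T                 # loadings P, shape (d, A)

def pca_scores(X, mean, loadings):
    """Project (new) samples into the score space: T = (X - mean) P."""
    return (X - mean) @ loadings

def knn_classify(T_train, labels, t, k):
    """Majority vote among the k nearest calibration scores."""
    d = np.linalg.norm(T_train - t, axis=1)
    nearest = np.argsort(d)[:k]
    classes, counts = np.unique(labels[nearest], return_counts=True)
    return classes[np.argmax(counts)]

# Calibration: mean, P = pca_fit(X_cal, n_pcs); T_cal = pca_scores(X_cal, mean, P)
# Prediction:  t_u = pca_scores(x_unknown[None, :], mean, P)[0]
#              assigned = knn_classify(T_cal, y_cal, t_u, k=3)
```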

One disadvantage of the KNN method is that it does not provide an assessment of confidence in the class assignment result. In addition, it does not sufficiently handle the... [Pg.290]

Li L, Weinberg CR, Darden TA, Pedersen LG. Gene selection for sample classification based on gene expression data: study of sensitivity to choice of parameters of the GA/KNN method. Bioinformatics 2001;17:1131-42. [Pg.423]

The kNN method is illustrated in Figure 13.10, where the data obviously form two clusters, K and L. In the first case (Fig. 13.10a), the unknown sample is situated among the samples of class L, and the kNN method correctly classifies it into that class. When the unknown object is located at the border of, for instance, class L, but also close to the other class (Fig. 13.10b), kNN allocates the object to the class holding the majority of the k nearest objects, in this case class L. In the third case (Fig. 13.10c), the unknown is situated at the border of class K and far from class L. Since all k nearest neighbors belong to class K, the object will be classified in that... [Pg.306]

The KNN method has the advantage that it is applicable in cases where the data are not separable by a linear decision surface, and it performs best on data that contain a high density of points. Since no classifier is stated explicitly, an interpretation of the results in structural terms is not possible. [Pg.75]

Table 2. Classification of compounds from the training set according to the KNN method.
Each pattern is characterized in the usual way (Chapter 1.2) by a set of d components (features, measurements) and can be considered as a point in a d-dimensional space. An additional component as described in Chapter 1.3 is not necessary for the KNN-method. Classification of an unknown pattern x is made by examining those pattern points with known class membership which are closest to x. In order to find the nearest neighbours of the unknown, it is necessary to compute the distances between x and all other pattern points of the available data set. The number of neighbours considered for classification is usually denoted by K. If only one neighbour is used for the classification (K=1, "1NN method"), the class membership of the first (nearest) neighbour gives the class membership of the unknown. If more than one neighbour is used, a voting scheme or some other procedure is applied to determine the class of the unknown. [Pg.62]
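A minimal Python transcription of this procedure, under the assumption that the patterns are rows of a NumPy array and the distance is Euclidean (function and variable names are illustrative):

```python
# Compute the distances from the unknown x to every pattern, take the K
# nearest, and vote (for K=1, simply take the class of the nearest pattern).
import numpy as np
from collections import Counter

def classify_unknown(patterns, classes, x, k=1):
    d = np.linalg.norm(patterns - x, axis=1)        # distances to all patterns
    nearest = np.argsort(d)[:k]                     # K nearest neighbours
    if k == 1:
        return classes[nearest[0]]                  # 1NN: class of nearest neighbour
    return Counter(classes[nearest]).most_common(1)[0][0]   # majority vote
```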

The KNN-method does not require linearly separable clusters (Figure 31). The KNN-method is a multiclass method. Membership in any number of classes is determined from the same neighbour patterns ... [Pg.62]

No training is necessary because the classification procedure works directly with all patterns of the data set. New patterns may be added to the data set without difficulty. The main disadvantage of the original KNN-method is the fact that no data compression is possible; all pattern vectors... [Pg.62]

FIGURE 31. KNN-method. In this linearly inseparable data set, each point - considered as unknown - is classified correctly by the first neighbour. [Pg.63]

Besides the Euclidean distance, other distance measures (Chapter 2.1.8) have been used for chemical applications of the KNN-method. [Pg.64]

In most chemical applications of the KNN-method, only the first (nearest) neighbour (K=1) was used for classification. For K>1, simple voting ("one neighbour, one vote") may be applied. The contributions of the neighbours to the voting can also be weighted by the distances (or the squared distances) between the unknown and the neighbours. [Pg.64]
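A sketch of the distance-weighted variant described above, assuming inverse (squared) distance weights; the small constant guarding against division by zero is an implementation assumption, not part of the source:

```python
# Distance-weighted voting: each of the K neighbours contributes a vote
# weighted by the inverse of its distance (or squared distance).
import numpy as np

def weighted_vote(patterns, classes, x, k, squared=True):
    d = np.linalg.norm(patterns - x, axis=1)
    nearest = np.argsort(d)[:k]
    power = 2 if squared else 1
    w = 1.0 / np.maximum(d[nearest] ** power, 1e-12)   # guard against d = 0
    totals = {}
    for c, wi in zip(classes[nearest], w):
        totals[c] = totals.get(c, 0.0) + wi
    return max(totals, key=totals.get)                 # class with largest weighted vote
```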

An increase in the number of neighbours permits estimation of the probabilities of all classes at the point defined by the unknown pattern. With a large number of neighbours, the KNN-method becomes a modified Bayes classification (Chapter 5) [200]. [Pg.64]
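A brief sketch of this probability estimate, taking the fraction of the K neighbours falling in each class as the class-probability estimate at the query point (the NumPy setting and names are assumptions):

```python
# With a large K, the per-class fraction of the K nearest neighbours
# estimates the class probabilities at the query point.
import numpy as np

def class_probabilities(patterns, classes, x, k):
    d = np.linalg.norm(patterns - x, axis=1)
    nearest = np.argsort(d)[:k]
    labels, counts = np.unique(classes[nearest], return_counts=True)
    return dict(zip(labels, counts / k))               # estimated P(class | x)
```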

A disadvantage of the KNN method is the large amount of time required to classify unknown patterns with a large data set. Because all patterns of the data set must be examined to classify each unknown, the computational requirements may make applications prohibitively expensive. Classification time can be decreased significantly if the training set can be reduced to a smaller number of patterns which lie near the decision boundary. Several strategies have been proposed to find an optimum subset with a minimum number of patterns that correctly classifies all patterns of the original data set. [Pg.69]

The edited KNN-method eliminates patterns that are incorrectly classified by use of the remainder of the data [405, 406]. [Pg.69]
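The excerpt does not give an algorithm, but a Wilson-style editing pass consistent with this description might look as follows (a sketch; the choice of k and the leave-one-out style of editing are assumptions):

```python
# Edited-kNN pass: drop every pattern that is misclassified when the
# remainder of the data set is used to classify it.
import numpy as np
from collections import Counter

def edit_training_set(patterns, classes, k=3):
    n = len(patterns)
    keep = []
    for i in range(n):
        mask = np.arange(n) != i                       # classify pattern i by the rest
        d = np.linalg.norm(patterns[mask] - patterns[i], axis=1)
        nearest = np.argsort(d)[:k]
        vote = Counter(classes[mask][nearest]).most_common(1)[0][0]
        if vote == classes[i]:
            keep.append(i)                             # keep only correctly classified
    return patterns[keep], classes[keep]
```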

Ritter et al. [246] described a selective KNN-method to approximate the decision boundary by an optimum subset of patterns. [Pg.69]

The KNN-method has been used to analyze 2-dimensional projections of d-dimensional clusters in order to evaluate the method of projection [248, 299]. [Pg.71]

The KNN-method is the method of choice if the cluster structure is complex and a linear classifier fails. Because of the large computational requirements of KNN classification, the method is not suitable for a large number of unknowns or a large data set of known patterns. [Pg.71]

