Big Chemical Encyclopedia


K-nearest neighbour

The methods of SIMCA, discriminant analysis and DPLS involve producing statistical models, such as principal components and canonical variates. Nearest neighbour methods are conceptually much simpler, and do not require elaborate statistical computations. [Pg.249]

Calculate the distance of an unknown to all members of the training set (see Table 4.28). Usually a simple Euclidean distance is computed (see Section 4.4.1). [Pg.249]

Take the majority vote among the K nearest neighbours and use this for classification. Note that if K = 5, one of the five closest objects belongs to class A in this case. [Pg.250]
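The two steps above — compute the Euclidean distance from the unknown to every training object, then take a majority vote among the K closest — can be sketched as follows. This is an illustrative sketch, not code from the source; the function name and data layout are hypothetical.

```python
from collections import Counter
import math

def knn_classify(unknown, training, labels, k=3):
    """Classify `unknown` by majority vote among its k nearest training objects."""
    # Rank training objects by Euclidean distance to the unknown
    order = sorted(range(len(training)),
                   key=lambda i: math.dist(unknown, training[i]))
    # Majority vote over the labels of the k closest objects
    votes = Counter(labels[i] for i in order[:k])
    return votes.most_common(1)[0][0]
```

For example, with two well-separated groups, `knn_classify((0.5, 0.5), training, labels, k=3)` returns the label of the group clustered near the origin.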

Sometimes it is useful to perform KNN analysis for a number of values of K, e.g. 3, 5 and 7, and see if the classification changes. This can be used to spot anomalies or artefacts. [Pg.250]
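A minimal way to run this K-scan is sketched below (hypothetical data, not from the source): a point near the class boundary flips its assigned label as K grows, which is exactly the kind of anomaly worth inspecting.

```python
from collections import Counter
import math

def knn_label(u, X, y, k):
    """Majority-vote label of u among its k nearest objects in X."""
    order = sorted(range(len(X)), key=lambda i: math.dist(u, X[i]))
    return Counter(y[i] for i in order[:k]).most_common(1)[0][0]

X = [(0, 0), (1, 0), (0, 1), (4, 4), (5, 4), (4, 5), (5, 5)]
y = ['A', 'A', 'A', 'B', 'B', 'B', 'B']
u = (2.4, 2.4)  # an object lying between the two groups
for k in (3, 5, 7):
    # a flip between K values flags a borderline object
    print(k, knn_label(u, X, y, k))
```

Here the assignment changes from A at K = 3 to B at K = 5 and 7, so the classification of this object should be treated with caution.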

This conceptually simple approach works well in many situations, but it is important to understand the limitations. [Pg.251]


Shen M, Xiao Y, Golbraikh A, Gombar VK, Tropsha A. Development and validation of k-nearest neighbour QSPR models of metabolic stability of drug candidates. J Med Chem 2003;46:3013-20. [Pg.463]

A first distinction which is often made is that between methods focusing on discrimination and those that are directed towards modelling classes. Most methods explicitly or implicitly try to find a boundary between classes. Some methods such as linear discriminant analysis (LDA, Sections 33.2.2 and 33.2.3) are designed to find explicit boundaries between classes while the k-nearest neighbours (k-NN, Section 33.2.4) method does this implicitly. Methods such as SIMCA (Section 33.2.7) put the emphasis more on similarity within a class than on discrimination between classes. Such methods are sometimes called disjoint class modelling methods. While the discrimination-oriented methods build models based on all the classes concerned in the discrimination, the disjoint class modelling methods model each class separately. [Pg.208]

In a more sophisticated version of this technique, called the k-nearest neighbour method (k-NN method), one selects the k nearest objects to u and applies a majority rule: u is classified in the group to which the majority of the k objects belong. Figure 33.12 gives an example of a 3-NN method. One selects the three nearest neighbours (A, B and C) to the unknown u. Since A and B belong to L, one... [Pg.223]

D. Coomans and D.L. Massart, Alternative K-nearest neighbour rules in supervised pattern recognition. Part 2. Probabilistic classification on the basis of the kNN method modified for direct density estimation. Anal. Chim. Acta, 138 (1982) 153-165. [Pg.240]

The K nearest neighbour criterion can also be used for classification. Find the distance of the object in question 3 from the nine objects in the table above. Which are the three closest objects, and does this confirm the conclusions in question 3? [Pg.257]

Identify the top K nearest neighbours for each of the N molecules in the dataset. [Pg.120]

The Jarvis-Patrick method involves the use of a list of the top K nearest neighbours for each molecule in a dataset, i.e., the K molecules that are most similar to it. Once these lists have been produced for each molecule in the dataset that is to be processed, two molecules are clustered together if they are nearest neighbours of each other and if they additionally have some... [Pg.120]
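The excerpt cuts off, but the joining criterion it begins to describe can be sketched as below. The sketch assumes the standard Jarvis-Patrick condition of a minimum number of shared neighbours, parameterised here as `k_min`; the function name and threshold are illustrative assumptions, not from the source.

```python
from itertools import combinations

def jarvis_patrick_pairs(nn_lists, k_min=2):
    """Return pairs (i, j) satisfying a Jarvis-Patrick joining criterion:
    i and j appear in each other's top-K nearest-neighbour lists AND they
    share at least k_min common neighbours (k_min is a user-chosen threshold,
    assumed here since the excerpt is truncated)."""
    pairs = []
    for i, j in combinations(range(len(nn_lists)), 2):
        mutual = j in nn_lists[i] and i in nn_lists[j]          # each in the other's list
        shared = len(set(nn_lists[i]) & set(nn_lists[j]))       # common neighbours
        if mutual and shared >= k_min:
            pairs.append((i, j))
    return pairs
```

Clusters are then formed by linking all molecules connected through such pairs; the neighbour lists themselves come from the similarity calculation described above.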

The discriminant analysis techniques discussed above rely for their effective use on a priori knowledge of the underlying parent distribution function of the variates. In analytical chemistry, the assumption of multivariate normal distribution may not be valid. A wide variety of techniques for pattern recognition not requiring any assumption regarding the distribution of the data have been proposed and employed in analytical spectroscopy. These methods are referred to as non-parametric methods. Most of these schemes are based on attempts to estimate P(x|g) and include histogram techniques, kernel estimates and expansion methods. One of the most common techniques is that of K-nearest neighbours. [Pg.138]

Application of Equation (39) to the K-NN rule serves to define a sphere, or circle for bivariate data, about the unclassified sample point in space, of radius r_K, which is the distance to the Kth nearest neighbour, containing K nearest neighbours (Figure 7). It is the volume of this sphere which is used as an estimate... [Pg.139]
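The density estimate this construction leads to is p̂(x) = K / (N · V), where V is the volume (area, for bivariate data) of the sphere of radius r_K. A minimal sketch of that estimator for 2-D data follows; it is an illustrative reconstruction under that standard formula, not the source's Equation (39) itself.

```python
import math

def knn_density(x, sample, k):
    """K-NN density estimate at point x for bivariate data:
    p_hat = k / (N * V), where V is the area of the circle whose radius
    r_K is the distance from x to its k-th nearest neighbour in `sample`.
    Illustrative sketch; names are not from the source."""
    n = len(sample)
    r_k = sorted(math.dist(x, s) for s in sample)[k - 1]  # distance to Kth NN
    volume = math.pi * r_k ** 2                           # circle area (2-D case)
    return k / (n * volume)
```

With four sample points all at unit distance from x and k = 2, the estimate is 2 / (4π), showing how a larger neighbour radius (sparser data) lowers the estimated density.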

B.K. Alsberg, R. Goodacre, J.J. Rowland and D.B. Kell, Classification of pyrolysis mass spectra by fuzzy multivariate rule induction: comparison with regression, K-nearest neighbour, neural and decision-tree methods. Analytica Chimica Acta, 348(1-3) (1997), 389-407. [Pg.408]






© 2024 chempedia.info