
K-nearest-neighbor classification

Kauffman, G.W. and Jurs, P.C. (2001b) QSAR and k-nearest neighbor classification analysis of selective cyclooxygenase-2 inhibitors using topologically-based numerical descriptors. J. Chem. Inf. Comput. Sci., 41, 1553-1560. [Pg.1087]

In the original kNN method, an unknown object (molecule) is classified according to the majority of the class memberships of its K nearest neighbors in the training set (Fig. 13.4). Nearness is measured by an appropriate distance metric (in the classification of molecular structures, a molecular similarity measure). It is implemented simply as follows ... [Pg.314]
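A minimal sketch of this majority-vote rule, assuming Euclidean distance as the similarity measure (the function and variable names are illustrative, not from the source):

```python
import numpy as np
from collections import Counter

def knn_classify(x, X_train, y_train, k=3):
    """Assign x the majority class of its k nearest training objects.

    X_train: (n_samples, n_descriptors) array of training molecules.
    y_train: length-n_samples sequence of class labels.
    Euclidean distance stands in for the 'appropriate distance metric'.
    """
    # Distance from the unknown object to every training object.
    dists = np.linalg.norm(X_train - x, axis=1)
    # Indices of the k nearest neighbors.
    nearest = np.argsort(dists)[:k]
    # Majority vote over the neighbors' class memberships.
    votes = Counter(y_train[i] for i in nearest)
    return votes.most_common(1)[0][0]

# Toy usage: two well-separated classes in a 2-D descriptor space.
X = np.array([[0.0, 0.1], [0.2, 0.0], [1.0, 1.1], [1.1, 0.9]])
y = np.array(["A", "A", "B", "B"])
print(knn_classify(np.array([0.1, 0.05]), X, y, k=3))  # -> "A"
```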

When applied to QSAR studies, the activity of molecule u is calculated simply as the average activity of the K nearest neighbors of molecule u. An optimal K value is selected by optimizing the classification of a test set of samples or by leave-one-out cross-validation. Many variations of the kNN method have been proposed in the past, and new and fast algorithms have continued to appear in recent years. The automated variable selection kNN QSAR technique optimizes the selection of descriptors to obtain the best models [20]. [Pg.315]
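The following sketch illustrates both ideas, kNN activity prediction as a neighbor average and leave-one-out selection of K. It assumes Euclidean distance and mean squared error as the selection criterion; all names are illustrative:

```python
import numpy as np

def knn_predict_activity(x, X_train, y_train, k):
    """Predict activity of x as the mean activity of its k nearest neighbors."""
    dists = np.linalg.norm(X_train - x, axis=1)
    nearest = np.argsort(dists)[:k]
    return y_train[nearest].mean()

def select_k_loo(X, y, k_values):
    """Pick K by leave-one-out cross-validation (lowest mean squared error)."""
    best_k, best_err = None, np.inf
    for k in k_values:
        errs = []
        for i in range(len(X)):
            mask = np.arange(len(X)) != i          # leave molecule i out
            pred = knn_predict_activity(X[i], X[mask], y[mask], k)
            errs.append((pred - y[i]) ** 2)
        mse = np.mean(errs)
        if mse < best_err:
            best_k, best_err = k, mse
    return best_k

# Toy usage: 20 molecules with 5 descriptors and a synthetic activity.
rng = np.random.default_rng(1)
X = rng.normal(size=(20, 5))
y = X[:, 0] + 0.1 * rng.normal(size=20)
print(select_k_loo(X, y, k_values=range(1, 8)))
```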

This rule can be easily extended so that the classification of the majority of k nearest neighbors is used to assign the pattern class. This extension is called the k-nearest neighbor rule and represents the generalization of the 1-NN rule. Implementation of the k-NN rule is unwieldy at best because it requires calculation of the proximity indices between the input pattern x and all patterns z_i of known classification. [Pg.59]
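The cost the excerpt alludes to is a full proximity computation against every known pattern. A sketch of that brute-force step, assuming SciPy's cdist for the distance matrix (array names and sizes are illustrative):

```python
import numpy as np
from scipy.spatial.distance import cdist

rng = np.random.default_rng(0)
X_new = rng.normal(size=(5, 10))   # 5 unknown patterns, 10 descriptors
Z = rng.normal(size=(200, 10))     # 200 patterns of known classification

# One call computes every proximity index d(x, z_i) at once.
D = cdist(X_new, Z, metric="euclidean")   # shape (5, 200)
nearest = np.argsort(D, axis=1)[:, :5]    # indices of the 5 nearest z_i per x
```

In practice, vectorized distance computation or spatial indexing (e.g., k-d trees) keeps this brute-force step tractable for moderately sized training sets.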

Miller, D. W. (2001) Results of a new classification algorithm combining K nearest neighbors and recursive partitioning. J. Chem. Inf. Comput. Sci., 41, 168-175. [Pg.108]

While assigning a class is the goal of KNN, it is also of interest to determine the confidence to place on the classification. Several approaches can be taken to measure this confidence. For example, more confidence is placed on classifications when all K nearest neighbors are from the same class. Conversely, the confidence in the classification decreases when the K nearest neighbors are drawn from more than one class (e.g., the first nearest neighbor is from class A and the second nearest neighbor is from class B). [Pg.62]

A quantitative approach to placing a confidence on classification is to calculate the proportion of the K nearest neighbors that are members of the respective classes. In the terminology used in this book, this is an attempt to validate... [Pg.62]
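A minimal sketch of this proportion-based confidence measure, again assuming Euclidean distance (names illustrative): the predicted class is returned together with the fraction of the K neighbors that belong to it, which is 1.0 for a unanimous neighborhood.

```python
import numpy as np
from collections import Counter

def knn_confidence(x, X_train, y_train, k):
    """Return (predicted class, fraction of the k neighbors in that class)."""
    dists = np.linalg.norm(X_train - x, axis=1)
    nearest = np.argsort(dists)[:k]
    votes = Counter(y_train[i] for i in nearest)
    label, count = votes.most_common(1)[0]
    return label, count / k   # 1.0 -> all neighbors agree, highest confidence
```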

To have confidence in a classification, nearly all of the K nearest neighbors should be from the same class. [Pg.68]

A sample classification is suspect if the K nearest neighbors are from multiple classes. [Pg.68]

Note that using an 11 nearest-neighbor rule to classify X would result in 10 A votes and 1 D vote. The only reason a D sample receives a vote is because there are only 10 A samples. This does not pose a problem in this example because of the large number of A samples. However, if there were only 1 A sample and 10 D samples, an 11 nearest-neighbor classification would have 10 D votes and 1 A vote even if the unknown is very close to the A class and very far from the D class. Therefore, the maximum value of K that should be considered is equal to the number of samples in the class with the fewest members. [Pg.240]
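A one-line check of this upper bound on K, the size of the smallest class in the training set (a sketch; the helper name is illustrative):

```python
from collections import Counter

def max_usable_k(y_train):
    """Upper bound on K: the number of samples in the smallest class."""
    # With K above this bound, the majority class can outvote a small
    # class even when the unknown sits squarely inside that class.
    return min(Counter(y_train).values())

# The excerpt's hypothetical: 1 A sample and 10 D samples -> K at most 1.
print(max_usable_k(["A"] + ["D"] * 10))  # -> 1
```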

Supervised learning methods: multivariate analysis of variance and discriminant analysis (MVDA), k nearest neighbors (kNN), linear learning machine (LLM), BAYES classification, soft independent modeling of class analogy (SIMCA), and UNEQ classification. Purpose: quantitative demarcation of a priori classes and relationships between class properties and variables. [Pg.7]

K-nearest neighbors (KNN) is a classification method that is unsupervised in the sense that class membership is not used to train the technique, yet it makes predictions of class membership. The principle behind KNN is an appealingly common-sense one: objects that are close together in the descriptor space are expected to show similar behavior in their response. Figure 7.5 shows a two-dimensional representation of the way that KNN operates. [Pg.171]

Besides the classical Discriminant Analysis (DA) and the k-Nearest Neighbor (k-NN), other classification methods widely used in QSAR/QSPR studies are SIMCA, Linear Vector Quantization (LVQ), Partial Least Squares-Discriminant Analysis (PLS-DA), Classification and Regression Trees (CART), and Cluster Significance Analysis (CSA), specifically proposed for asymmetric classification in QSAR. [Pg.1253]

