Big Chemical Encyclopedia


K nearest neighbors

Figure 3-20. Distribution of the dataset of 120 reactions in the Kohonen network: a) the neurons were patterned on the basis of intellectually assigned reaction types; b) in addition, empty neurons were patterned on the basis of their k nearest neighbors.
In the original kNN method, an unknown object (molecule) is classified according to the majority of the class memberships of its K nearest neighbors in the training set (Fig. 13.4). The nearness is measured by an appropriate distance metric (a molecular similarity measure as applied to the classification of molecular structures). It is implemented simply as follows ... [Pg.314]
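The majority-vote step described above can be sketched as follows. This is a minimal illustration, not the book's implementation: the Euclidean metric and the toy two-descriptor data stand in for whatever molecular similarity measure and training set a real application would use.

```python
import numpy as np

def knn_classify(query, X_train, y_train, k=3):
    """Assign the majority class among the k nearest training objects.

    Euclidean distance stands in for the 'appropriate distance metric'
    (a molecular similarity measure when classifying structures).
    """
    dists = np.linalg.norm(X_train - query, axis=1)   # distance to every training object
    nearest = np.argsort(dists)[:k]                   # indices of the k nearest neighbors
    classes, counts = np.unique(y_train[nearest], return_counts=True)
    return classes[np.argmax(counts)]                 # majority class membership

# toy descriptor space: two classes separated along both axes
X = np.array([[0.0, 0.0], [0.1, 0.2], [0.2, 0.1],
              [1.0, 1.0], [1.1, 0.9], [0.9, 1.1]])
y = np.array([0, 0, 0, 1, 1, 1])
print(knn_classify(np.array([0.15, 0.1]), X, y, k=3))  # → 0
```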

When applied to QSAR studies, the activity of molecule u is calculated simply as the average activity of the K nearest neighbors of molecule u. An optimal K value is selected by optimizing classification performance on a test set of samples or by leave-one-out cross-validation. Many variations of the kNN method have been proposed in the past, and new and fast algorithms have continued to appear in recent years. The automated variable selection kNN QSAR technique optimizes the selection of descriptors to obtain the best models [20]. [Pg.315]
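The two steps above — predicting activity as the neighbor average, then choosing K by leave-one-out cross-validation — can be sketched like this. Function names and the mean-squared-error criterion are illustrative assumptions, not the cited papers' code.

```python
import numpy as np

def knn_predict(query, X_train, act_train, k):
    """Predicted activity of a molecule = average activity of its k nearest neighbors."""
    dists = np.linalg.norm(X_train - query, axis=1)
    nearest = np.argsort(dists)[:k]
    return act_train[nearest].mean()

def best_k_loo(X, act, k_values):
    """Choose k by leave-one-out cross-validation (lowest mean squared error)."""
    mse = {}
    for k in k_values:
        sq_errs = []
        for i in range(len(X)):
            mask = np.arange(len(X)) != i          # leave compound i out of the training set
            pred = knn_predict(X[i], X[mask], act[mask], k)
            sq_errs.append((pred - act[i]) ** 2)
        mse[k] = float(np.mean(sq_errs))
    return min(mse, key=mse.get)                   # k with the smallest LOO error
```

A usage example: with a one-descriptor training set `X = [[0.], [1.], [2.], [10.]]` and activities `[0., 1., 2., 10.]`, `knn_predict([0.1], X, act, k=2)` averages the two nearest activities (0 and 1) to give 0.5.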

Zheng W, Tropsha A. Novel variable selection quantitative structure-property relationship approach based on the k-nearest-neighbor principle. J Chem Inf Comput Sci 2000;40(1):185-94. [Pg.317]

Shen M, Xiao Y, Golbraikh A, Gombar VK, Tropsha A. Development and validation of k-nearest-neighbor QSPR models of metabolic stability of drug candidates. J Med Chem 2003;46:3013-20. [Pg.375]

A similarity-related approach is k-nearest neighbor (KNN) analysis, based on the premise that similar compounds have similar properties. Compounds are distributed in multidimensional space according to their values of a number of selected properties; the toxicity of a compound of interest is then taken as the mean of the toxicities of a number (k) of nearest neighbors. Cronin et al. [65] used KNN to model the toxicity of 91 heterogeneous organic chemicals to the alga Chlorella vulgaris, but found it no better than MLR. [Pg.481]

This rule can be easily extended so that the classification of the majority of k nearest neighbors is used to assign the pattern class. This extension is called the k-nearest neighbor rule and represents the generalization of the 1-NN rule. Implementation of the k-NN rule is unwieldy at best because it requires calculation of the proximity indices between the input pattern x and all patterns z of known classification. [Pg.59]

Lazar (http://lazar.in-silico.de/predict) is a k-nearest-neighbor approach to predicting chemical endpoints from a training set based on structural fragments [43]. It derives predictions for query structures from a database of experimentally determined toxicity data [43]. The model provides predictions for four endpoints: acute toxicity to fish (lethality; Fathead Minnow Acute Toxicity, LC50), carcinogenicity, mutagenicity, and repeated-dose toxicity. [Pg.185]

Horton, P., and Nakai, K. (1997). Better prediction of protein cellular localization sites with the k nearest neighbor classifier. Intell. Syst. Mol. Biol. 5, 147-152. [Pg.336]

Next, supervised-learning pattern recognition methods were applied to the data set. The 111 bonds from these 28 molecules were classified as either breakable (36) or non-breakable (75), and a stepwise discriminant analysis showed that three variables, out of the six mentioned above, were particularly significant: resonance effect, R; bond polarity, Qa; and bond dissociation energy, BDE. With these three variables, 97.3% of the non-breakable bonds and 86.1% of the breakable bonds could be correctly classified. This says that chemical reactivity, as given by the ease of heterolysis of a bond, is well defined in the space determined by just those three parameters. The same conclusion can be drawn from the results of a K-nearest neighbor analysis: with k assuming any value between one and ten, 87 to 92% of the bonds could be correctly classified. [Pg.273]

Similarity Distance In the case of a nonlinear method such as the k Nearest Neighbor (kNN) QSAR [41], since the models are based on chemical similarity calculations, a large similarity distance could signal query compounds that are too dissimilar to the... [Pg.442]

Here ȳ is the average and σ is the standard deviation of the Euclidean distances of the k nearest neighbors of each compound in the training set in the chemical descriptor space, and Z is an empirical parameter to control the significance level, with the default value of 0.5. If the distance from an external compound to its nearest neighbor in the training set is above Dc, we label its prediction unreliable. [Pg.443]
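The applicability-domain check described above can be sketched as follows. The cutoff formula Dc = ȳ + Zσ is assumed from the definitions in the excerpt (the equation itself precedes this passage in the original source), and the brute-force pairwise distance matrix is only suitable for small training sets.

```python
import numpy as np

def applicability_cutoff(X_train, k=1, Z=0.5):
    """Dc = ȳ + Z·σ over the k-nearest-neighbor distances within the training set."""
    d = np.linalg.norm(X_train[:, None, :] - X_train[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)                  # exclude each compound's zero self-distance
    knn_d = np.sort(d, axis=1)[:, :k]            # distances to the k nearest neighbors
    return knn_d.mean() + Z * knn_d.std()

def is_reliable(query, X_train, Dc):
    """A prediction is unreliable if the query's nearest training neighbor is beyond Dc."""
    nearest = np.linalg.norm(X_train - query, axis=1).min()
    return nearest <= Dc
```

For example, a query sitting inside a tight training cluster passes the check, while a query far from every training compound is flagged unreliable.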

Shen, M., Letiran, A., Xiao, Y., Golbraikh, A., Kohn, H., Tropsha, A. Quantitative structure-activity relationship analysis of functionalized amino acid anticonvulsant agents using k nearest neighbor and simulated annealing PLS methods. J. Med. Chem. 2002, 45, 2811-2823. [Pg.455]

In the following discussion, three types of air pollutant analytical data will be examined using principal component analysis and the K-Nearest Neighbor (KNN) procedure. A set of interlaboratory comparison data from X-ray emission trace element analysis, data from a comparison of two methods for determining lead in gasoline, and results from gas chromatography/mass spectrometry analysis for volatile organic compounds in ambient air will be used as illustrations. [Pg.108]

In order to check the results of the analysis, K-Nearest Neighbor distances were computed for the scaled data set including the cadmium results. The median of the distances from a given laboratory to the three nearest neighbors ranged from 0.26 to 1.24, with the median distance between members of the cluster (1,2,3,5,6,7) equal to 0.79. The median distances of Laboratories 4 and 8 from members of this cluster were 1.24 and 1.22, respectively, supporting the view that these laboratories are outliers. [Pg.110]
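The outlier screen used above — the median of each object's distances to its three nearest neighbors — can be sketched as a short function. The synthetic cluster-plus-outlier data below is purely illustrative; the excerpt's laboratory data is not reproduced here.

```python
import numpy as np

def median_3nn_distances(X):
    """Median of each object's Euclidean distances to its three nearest neighbors."""
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)            # ignore zero self-distances
    three = np.sort(d, axis=1)[:, :3]      # distances to the three nearest neighbors
    return np.median(three, axis=1)

# illustrative data: a tight cluster plus one distant object
X = np.array([[0.0, 0.0], [0.2, 0.1], [0.1, 0.3], [0.3, 0.2],
              [10.0, 10.0]])               # the last row plays the outlier
m = median_3nn_distances(X)
print(m[-1] > m[:4].max())                 # → True: the outlier's median distance is largest
```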

In the distance methods, class assignment is based on the distance of the unknown to its k nearest neighbors; since the distances of the training set objects from each other are known, one can determine whether an unknown is not a member of the training sets. [Pg.249]


K nearest neighbor algorithm

K nearest-neighbor, KNN

K-nearest neighbor classification

K-nearest neighbor method

Nearest neighbors

Neighbor

© 2024 chempedia.info