Cross-validated correlation coefficient

The data, comprising the values of 324 descriptors for 88 molecules, were given as input to the VSMP program to build models based on three and four descriptors, keeping the interdescriptor correlation below 0.75. The best three-descriptor model, Eq. 80, was based on descriptors 254 (atom-type E-state index), 311 (AlogP98), and 320 (2D van der Waals surface area), with a correlation coefficient, r, of 0.8425 and a cross-validated correlation coefficient, q, of 0.8239. The correlation coefficients of the other two VSMP models, Eqs. 81 and 82, were 0.8411 and 0.8329, respectively. Notably, descriptors 254 and 311 were selected in all the best three-descriptor models of VSMP. The third descriptor in models 80, 81, and 82 was 320, 144 (Kappa shape index of order 1), and 30 (topological Xu index), respectively. [Pg.542]
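The interdescriptor-correlation constraint can be illustrated with a minimal sketch that enumerates k-descriptor subsets and keeps only those in which every descriptor pair has |r| < 0.75. This is a plain brute-force filter written for illustration, not the VSMP search itself; the function names and the toy descriptor table are hypothetical, and fitting and scoring each admissible subset is left out.

```python
import math
from itertools import combinations


def pearson_r(u, v):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    return cov / math.sqrt(sum((a - mu) ** 2 for a in u) * sum((b - mv) ** 2 for b in v))


def admissible_subsets(descriptors, k, max_r=0.75):
    """Yield k-descriptor subsets (name tuples) whose pairwise |r| stays below max_r.

    descriptors: dict mapping a descriptor name to its list of values,
    one value per molecule.
    """
    names = sorted(descriptors)
    for subset in combinations(names, k):
        # reject the subset if any pair of its descriptors is too strongly correlated
        if all(abs(pearson_r(descriptors[a], descriptors[b])) < max_r
               for a, b in combinations(subset, 2)):
            yield subset
```

For example, with a toy table in which "d2" is an exact multiple of "d1", no admissible triple contains both of them.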

The best correlation equations obtained for the Km values in the presence of carboxylesterase (CE) or human plasma (HP) are given below as Eqns. 8.3 and 8.4, respectively. The statistical quality of the equations can be assessed by r², the squared correlation coefficient, and q², the squared cross-validated correlation coefficient (a measure of the predictive power of an equation, considered acceptable when q² > 0.4). Both equations are statistically sound and have acceptable predictive power. [Pg.454]
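For reference, a commonly used form of the two statistics is shown below, where ŷ_i is the fitted value for compound i, ŷ_(i) is its prediction from the model refitted without it (leave-one-out), and ȳ is the mean observed value. This is the standard textbook definition; individual studies may differ in details such as the cross-validation scheme.

```latex
r^2 = 1 - \frac{\sum_i \left(y_i - \hat{y}_i\right)^2}{\sum_i \left(y_i - \bar{y}\right)^2},
\qquad
q^2 = 1 - \frac{\mathrm{PRESS}}{\sum_i \left(y_i - \bar{y}\right)^2},
\qquad
\mathrm{PRESS} = \sum_i \left(y_i - \hat{y}_{(i)}\right)^2
```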

Other methods of calculating the cross-validated correlation coefficient are also used. For example, instead of leaving one point out, 20% of the individual cases may be excluded each time. A correlation equation is derived from the remaining set as before, and the resulting equation is used to calculate predicted values for each of the 20% of points omitted. The deviations are then accumulated, resulting in a q value. ... [Pg.230]
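Leaving out 20% of the cases at a time amounts to leave-group-out cross-validation with five groups. A minimal sketch for a one-descriptor linear model follows; it assumes the accumulated deviations are squared prediction errors (PRESS) converted to q² = 1 − PRESS/SS, and the function names, the random grouping and the seed are illustrative choices, not details from the source.

```python
import random


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a + b*x; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def q2_leave_group_out(xs, ys, n_groups=5, seed=0):
    """q^2 from leave-group-out CV: each of n_groups random groups
    (about 20% of the cases when n_groups=5) is omitted exactly once."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)  # random assignment of cases to groups
    groups = [idx[g::n_groups] for g in range(n_groups)]
    press = 0.0
    for held_out in groups:
        omitted = set(held_out)
        train = [i for i in idx if i not in omitted]
        a, b = fit_line([xs[i] for i in train], [ys[i] for i in train])
        # accumulate squared deviations of the omitted points from their predictions
        press += sum((ys[i] - (a + b * xs[i])) ** 2 for i in held_out)
    my = sum(ys) / len(ys)
    ss_tot = sum((y - my) ** 2 for y in ys)
    return 1.0 - press / ss_tot
```

With more descriptors only `fit_line` would need replacing; the grouping and the accumulation of deviations stay the same.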

Statistical parameters, when available, indicating the significance of each descriptor's contribution to the final regression equation are listed under its corresponding term in the equation. These include the standard errors, written as ± values, the Student t-test values, and the variance inflation factor (VIF). The significance of the equation is indicated by the sample size, n; the variance explained, r²; the standard error of the estimate, s; the Fisher index, F; and the cross-validated correlation coefficient, q². When known, outliers will be mentioned. The equations are followed by a discussion of the physical significance of the descriptor terms. [Pg.232]
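For a single-descriptor equation these statistics have simple closed forms, so a minimal sketch can show how n, r², s, F and the slope's standard error and t value are obtained, together with the two-descriptor VIF, 1/(1 − r₁₂²). The function names are hypothetical; with more descriptors, each VIF would use the R² from regressing that descriptor on all the others.

```python
import math


def simple_regression_stats(xs, ys):
    """Statistics for y = a + b*x fitted by ordinary least squares."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = my - b * mx
    rss = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))  # residual sum of squares
    ss_tot = sum((y - my) ** 2 for y in ys)                     # total sum of squares
    p = 1                      # number of descriptors
    dof = n - p - 1            # residual degrees of freedom
    s = math.sqrt(rss / dof)   # standard error of the estimate
    r2 = 1.0 - rss / ss_tot    # variance explained
    f = ((ss_tot - rss) / p) / (rss / dof)  # Fisher F
    se_b = s / math.sqrt(sxx)  # standard error of the slope (the "±" value)
    t = b / se_b               # Student t value of the slope
    return {"n": n, "r2": r2, "s": s, "F": f, "b": b, "se_b": se_b, "t": t}


def vif_two_descriptors(x1, x2):
    """VIF when each descriptor is regressed on the only other one: 1 / (1 - r12^2)."""
    n = len(x1)
    m1, m2 = sum(x1) / n, sum(x2) / n
    c12 = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
    c11 = sum((a - m1) ** 2 for a in x1)
    c22 = sum((b - m2) ** 2 for b in x2)
    return 1.0 / (1.0 - c12 ** 2 / (c11 * c22))
```

In the one-descriptor case t² equals F, which is a convenient consistency check; a VIF of 1 means the two descriptors are uncorrelated.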

Table 9.3 Statistical data of the models. Three different cross-validated correlation coefficients were calculated using the leave-one-out (LOO), leave-two-out (LTO), and five-random-group (5RG) methods.

In the equation, n is the number of data points, r is the correlation coefficient, r² is the goodness of fit, q² is the leave-one-out cross-validated correlation coefficient expressing the goodness of prediction, s is the standard deviation, and F is the ratio of the variance of the calculated values to that of the observed values. The numbers in parentheses are the 95% confidence intervals. According to equation (20.1), the following structural factors affect the affinity of ligands for hERG ... [Pg.587]

Figure 4. Pictorial representation of 3D-QSAR models. The color code is as follows: sterically favourable and unfavourable interactions, green and red regions, respectively; favourable and unfavourable influence of high electron density, cyan and yellow zones, respectively. To aid interpretation, the template 26, idazoxan, and compounds 35 and 40 have been added to the electrostatic map, whereas clonidine and compounds 5, 8 and 34 are shown in the steric map. n, number of data points; q and r, cross-validated and non-cross-validated correlation coefficients, respectively; s, standard deviation; ONC, optimal number of components.

The next step is the assessment of how well the QSAR model can predict the activity of previously unseen compounds. Several strategies for this validation procedure have been published and discussed [85-87]. Cross-validation is the most common practice in internal model validation. In a single cross-validation step, either one sample (leave-one-out cross-validation, LOO) or several samples (leave-group-out cross-validation, LGO) are omitted from the data set, and their activity is predicted with a model generated from the remaining training samples. After each compound has been predicted once, a cross-validated correlation coefficient between predictions and observations is calculated. In particular, q² values are considered an estimate of the predictive power of a model. [Pg.67]
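A minimal leave-one-out sketch for a one-descriptor linear model shows both quantities mentioned here: the Pearson correlation between the LOO predictions and the observations, and q² = 1 − PRESS/SS computed from the same predictions. The two are not identical in general, and the helper names are hypothetical.

```python
import math


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a + b*x; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def loo_predictions(xs, ys):
    """Predict each compound with a model fitted to all the other compounds."""
    preds = []
    for i in range(len(xs)):
        a, b = fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        preds.append(a + b * xs[i])
    return preds


def pearson_r(u, v):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    return cov / math.sqrt(sum((a - mu) ** 2 for a in u) * sum((b - mv) ** 2 for b in v))


def q2(obs, preds):
    """q^2 = 1 - PRESS / sum of squared deviations of obs from their mean."""
    my = sum(obs) / len(obs)
    press = sum((o - p) ** 2 for o, p in zip(obs, preds))
    return 1.0 - press / sum((o - my) ** 2 for o in obs)
```

Typical use: `preds = loo_predictions(xs, ys)`, then report `q2(ys, preds)` and, if required, `pearson_r(ys, preds)`.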

For each significant descriptor set obtained in the previous step, an additional noncollinear descriptor scale was added, and the appropriate (n + 1)-parameter regression treatment was performed. When the Fisher criterion at the given probability level, F (or the leave-one-out cross-validated correlation coefficient, R²cv), obtained for any of these correlations was smaller than that for the best correlation of the previous rank, the latter was designated as the final result and the search was terminated. Otherwise, the descriptor sets with the highest regression... [Pg.255]
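A simplified greedy version of this forward search can be sketched as follows: each round tries every remaining descriptor that is not too correlated with those already chosen, keeps the one giving the largest Fisher F, and stops when that F falls below the best F of the previous round. This is an illustration under assumptions, not the BMLR code: the collinearity cutoff `max_r=0.8`, starting from single descriptors rather than from the significant sets of the previous step, and all names are hypothetical, and only F (not R²cv) is used as the stopping criterion.

```python
import math


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fisher_f(cols, y):
    """Fit y = b0 + sum(b_j * col_j) by least squares and return the Fisher F."""
    n, p = len(y), len(cols)
    rows = [[1.0] + [col[i] for col in cols] for i in range(n)]
    xtx = [[sum(r[j] * r[k] for r in rows) for k in range(p + 1)] for j in range(p + 1)]
    xty = [sum(r[j] * yi for r, yi in zip(rows, y)) for j in range(p + 1)]
    beta = solve(xtx, xty)  # normal equations
    fitted = [sum(bj * rj for bj, rj in zip(beta, r)) for r in rows]
    my = sum(y) / n
    rss = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    return ((ss_tot - rss) / p) / (rss / (n - p - 1))


def abs_r(u, v):
    """Absolute Pearson correlation, used here as the collinearity test."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    return abs(cov) / math.sqrt(sum((a - mu) ** 2 for a in u) * sum((b - mv) ** 2 for b in v))


def forward_select(descriptors, y, max_r=0.8):
    """Greedy forward selection; descriptors maps a name to its list of values."""
    selected, best_f = [], -math.inf
    while True:
        candidates = []
        for name, col in descriptors.items():
            if name in selected:
                continue
            if any(abs_r(col, descriptors[s]) > max_r for s in selected):
                continue  # skip descriptors collinear with those already chosen
            if len(selected) + 2 >= len(y):
                continue  # the enlarged model needs n > p + 1 data points
            f = fisher_f([descriptors[s] for s in selected] + [col], y)
            candidates.append((f, name))
        if not candidates:
            return selected, best_f
        f, name = max(candidates)
        if f < best_f:
            return selected, best_f  # F dropped: keep the previous-rank model
        selected.append(name)
        best_f = f
```

The sketch returns the chosen descriptor names together with the F value of the final model.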

The final result therefore has the maximum value of the Fisher criterion and the highest value of the cross-validated correlation coefficient. According to these statistical criteria, it was considered the best representation of the property in the given (large) descriptor space. The BMLR approach has a variant that takes care of the noncollinearity of descriptor pairs, called the Heuristic method (1996JPC10400). The advantages of such methods are that they are fast and keep the chance of spurious correlation to a minimum. Both techniques were successfully used by ARK to build models for a very large number of chemical properties of compounds in general and of heterocycles in particular. [Pg.256]

A PM that is often used in jackknife calculations is the cross-validated correlation coefficient, typically denoted by r_cv or q. For one output PE, it is given by... [Pg.119]

One of the successful examples of the use of QSAR in the design and development of an HIV drug is the discovery of Indinavir (L-735,524) (2), one of the first HIVPIs approved by the US FDA. Holloway et al. [163] first reported this compound when they conducted SAR studies on a combined series of isostere derivatives of (43,44). A high correlation was observed between the intermolecular interaction energy (E_int) calculated for HIVPR-inhibitor complexes and the enzyme inhibition activity. QSAR 51-53 were developed for native, acetylpepstatin-inhibited, and L-689,502-inhibited HIVPR, respectively [163]. X-ray coordinates and the force field technique were employed in the calculation of E_int. In these QSAR, r_cv is the cross-validated correlation coefficient. [Pg.227]

Different groups have employed different techniques and (sometimes) different statistical parameters to evaluate the performance of models developed independently for the modeling set (described below). To harmonize the results of this study, the same standard parameters were chosen to describe each model's performance as applied to the modeling and external test set predictions. Thus, we have employed Q²abs (squared leave-one-out cross-validation correlation coefficient) for the modeling set, R²abs (frequently described as the coefficient of determination) for the external validation sets, and MAE (mean absolute error) for the linear correlation between predicted (Ypred) and experimental (Yexp) data (here, Y = pIGC50); these parameters are defined as follows ... [Pg.1333]

Eleven compounds that were not included in the training set were selected as a test set to validate the QSAR models. All of the test compounds were well predicted. The mean and standard deviation of the prediction errors were 0.28 and 0.005 for the CoMFA model, and 0.33 and 0.011 for the CoMSIA model. The predictive r², which is analogous to the cross-validated correlation coefficient q², was 0.883 for the CoMFA model and 0.908 for the CoMSIA model, suggesting that both models are highly reliable. [Pg.330]
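A commonly used definition of the predictive r² in CoMFA/CoMSIA studies is r²pred = 1 − PRESS/SD, where PRESS sums the squared test-set prediction errors and SD sums the squared deviations of the test-set activities from the mean activity of the training set. A minimal sketch of that formula and of the error summary follows; treating "prediction errors" as absolute residuals is an assumption, and the function names and numbers are made up for illustration.

```python
import statistics


def predictive_r2(y_obs_test, y_pred_test, y_train_mean):
    """r2_pred = 1 - PRESS / SD, with SD taken about the training-set mean."""
    press = sum((o - p) ** 2 for o, p in zip(y_obs_test, y_pred_test))
    sd = sum((o - y_train_mean) ** 2 for o in y_obs_test)
    return 1.0 - press / sd


def error_summary(y_obs_test, y_pred_test):
    """Mean and sample standard deviation of the absolute prediction errors."""
    errors = [abs(o - p) for o, p in zip(y_obs_test, y_pred_test)]
    return statistics.mean(errors), statistics.stdev(errors)
```

For three hypothetical test compounds with observed activities 5.0, 6.0 and 7.0, predictions 5.1, 5.9 and 7.2, and a training-set mean of 6.0, PRESS is 0.06, SD is 2.0, and r²pred is therefore 0.97.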
