Cross-validation test

Cross-validation test. The values of q² for these QSAR models range from 0.549 to 0.972. These high q² values validate the QSAR models; according to the literature, q² must be greater than 0.50 [73,74]. [Pg.69]
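To make the criterion above concrete, here is a minimal sketch of how a leave-one-out cross-validated coefficient (commonly written q²) can be computed as 1 - PRESS / SS_total. It assumes a one-descriptor linear model fitted by least squares. The helper names (`fit_line`, `loo_q2`) and the data values are hypothetical and are not taken from the cited QSAR study.

```python
from statistics import mean

def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b

def loo_q2(xs, ys):
    """Leave-one-out cross-validated q2 = 1 - PRESS / SS_total."""
    press = 0.0
    for i in range(len(xs)):
        # Refit the model without sample i, then predict sample i.
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        a, b = fit_line(train_x, train_y)
        press += (ys[i] - (a + b * xs[i])) ** 2
    y_bar = mean(ys)
    ss_total = sum((y - y_bar) ** 2 for y in ys)
    return 1.0 - press / ss_total

# Hypothetical descriptor/activity values, for illustration only.
descriptor = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
activity = [1.1, 1.9, 3.2, 3.9, 5.1, 6.0]
q2 = loo_q2(descriptor, activity)
print(f"q2 = {q2:.3f}, above the 0.50 threshold: {q2 > 0.50}")
```

Some authors compute SS_total against the mean of each training subset rather than the overall mean; the sketch uses the overall mean.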

Claros (1995) released an attractive program, MitoProt. In this program, various sequence features of a potential signal region are reported to assist the user's decision making. Later, an objective prediction method that combines many sequence features by discriminant analysis was proposed (Claros and Vincens, 1996). In a cross-validation test, its accuracy was estimated to be 75%. [Pg.315]

The model was claimed to compute 5000-6000 molecules per min. The predictive ability of the model was validated by four approaches. In the first approach, a set of 20 compounds was randomly selected as an initial validation test set. A model was developed from the remaining 86 compounds with an MAE of 0.33, from which the test set values were then predicted. The results of this test prediction were very good and provided momentum for support of the three structure descriptors. In the second approach, a full cross-validation test of the model was investigated. The data set of 102 compounds was divided... [Pg.530]

Figure 9.23 Prediction results for ammonia, validated with two-segment cross-validation (test set switch). Slope = 0.96. RMSEP = 0.48% ammonia.

The full-scale industrial experiment demonstrated the feasibility of a convenient, nonintrusive acoustic chemometric facility for reliable ammonia concentration prediction. The training experimental design spanned the industrial concentration range of interest (0-8%). Two-segment cross-validation (test set switch) showed good accuracy (slope 0.96) combined with a satisfactory RMSEP. It is fully possible to further develop this pilot study calibration basis until a full industrial model has been achieved. There would appear to be several types of analogous chemical analytes in other process technological contexts, which may be similarly approached by acoustic chemometrics. [Pg.301]
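The two-segment scheme mentioned above can be sketched as follows: the data are split into two halves, each half serves once as calibration set and once as test set, and the pooled predictions give a slope (predicted versus reference) and an RMSEP. This is only an illustration under stated assumptions. It uses a single-variable straight-line calibration rather than the multivariate acoustic model of the study, and the names (`fit_line`, `two_segment_cv`) and the signal values are hypothetical.

```python
from math import sqrt
from statistics import mean

def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b

def two_segment_cv(xs, ys):
    """Test set switch: each segment is predicted by a model fitted on the other."""
    seg_a = list(range(0, len(xs), 2))  # even-indexed samples
    seg_b = list(range(1, len(xs), 2))  # odd-indexed samples
    predicted = [0.0] * len(xs)
    for train, test in ((seg_a, seg_b), (seg_b, seg_a)):
        a, b = fit_line([xs[i] for i in train], [ys[i] for i in train])
        for i in test:
            predicted[i] = a + b * xs[i]
    # Root mean square error of prediction over all held-out predictions.
    rmsep = sqrt(mean([(p - y) ** 2 for p, y in zip(predicted, ys)]))
    # Slope of the predicted-versus-reference line.
    _, slope = fit_line(ys, predicted)
    return predicted, slope, rmsep

# Hypothetical sensor signal and reference ammonia concentration (%), 0-8% range.
signal = [0.1, 1.2, 1.9, 3.1, 4.0, 5.2, 5.9, 7.1, 8.0]
ammonia = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
predicted, slope, rmsep = two_segment_cv(signal, ammonia)
print(f"slope = {slope:.2f}, RMSEP = {rmsep:.2f}% ammonia")
```

Interleaving the samples (even versus odd indices) is one simple way to make both segments span the whole concentration range; the segmentation actually used in the study is not stated in the excerpt.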

Two non-parametric methods for hypothesis testing with PCA and PLS are cross-validation and the jackknife estimate of variance. Both methods are described in some detail in the sections describing the PCA and PLS algorithms. Cross-validation is used to assess the predictive property of a PCA or a PLS model. The distribution function of the cross-validation test statistic cvd-sd under the null hypothesis is not well known. However, for PLS, the distribution of cvd-sd has been determined empirically by computer simulation [24] for some particular types of experimental designs. In particular, the discriminant analysis (or ANOVA-like) PLS analysis has been investigated in some detail, as has the situation with a one-dimensional Y. The reader is referred to this simulation study for detailed information. However, some tables of the critical values of cvd-sd at the 5% level are given in Appendix C. [Pg.312]

The goodness of fit of PLS models is calculated as an error of the prediction, in a manner similar to the description in ordinary least squares methods. Using the so-called cross-validation test one can determine the number of significant vectors in U and T and also the error of prediction. [Pg.200]

According to the cross-validation test, one uses the matrix F and deletes a certain number of values, for example in accordance with the following scheme ... [Pg.200]

Cross-validation test. The values of q² for these QSAR models are from... [Pg.83]

The extraction of preference functions, as the training procedure, is not a very powerful training procedure and it is not expected to lead to overtraining. We shall test this assumption by performing still another two-times cross-validation test in which 168 membrane proteins are divided into 63 proteins used by Rost et al. [9] and 105 proteins used by us. Table 9 lists performance results for different combinations of training and testing procedures... [Pg.429]

It is easily seen that our method is comparable to, if not better than, the method based on the NN algorithm. The prediction method of Rost et al. was trained using a rigorous cross-validation test on only 69 proteins and yielded A2 = 0.7298, Q2 = 94.72%. We can see that the difference between the prediction accuracy on never-seen proteins and that on the training data set of proteins is considerably higher for the method of Rost et al. [Pg.142]

In summary, the support vector machine (SVM) and partial least squares (PLS) methods were used to develop quantitative structure-activity relationship (QSAR) models to predict the inhibitory activity of nonpeptide HIV-1 protease inhibitors. A genetic algorithm (GA) was employed to select the variables that lead to the best-fitted models. A comparison of the results obtained using SVM with those of PLS revealed that the SVM model is much better than the PLS model. The root mean square errors of the training set and the test set for the SVM model were calculated to be 0.2027 and 0.2751, and the coefficients of determination (R²) are 0.9800 and 0.9355, respectively. Furthermore, the statistical parameter obtained from the leave-one-out cross-validation test (Q²) on the SVM model was 0.9672, which supports the reliability of this model. Omar Deeb is thankful to Al-Quds University for financial support. [Pg.79]

Wilks' λ = 0.459). The two monotypic genera in this data set, Eudyptes and Eudyptula, performed poorly in the cross-validation test (Table 13.4), although the direction of failure remained the same as that for the species level. [Pg.230]

A cross-validation test (Table 13.5) showed all E. crestatus were misidentified as P. adeliae, and all of Eu. minor as E. crestatus, P. antarctica, S. demersus or S. humboldti. No species of Spheniscus achieved more than 40.0 per cent correct identifications. These species were mostly mistaken for each other, although 37.5 per cent of S. demersus outlines were misidentified as Eu. minor. [Pg.231]

Here the error denotes the averaged absolute error in LOO cross-validation test. [Pg.70]

By SVC with a Gaussian kernel and C = 100, the samples listed in Table 6.6 can be classified with a rate of correctness of 100%. In the LOO cross-validation test, the rate of correctness of prediction is 97.8%. [Pg.133]
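The gap between the two rates quoted above (classification of the training samples versus LOO prediction) can be illustrated with a small sketch. Note that it swaps in a 1-nearest-neighbour classifier for SVC so that it runs on the standard library alone. The function names and the sample points are hypothetical and unrelated to Table 6.6.

```python
def nearest_label(point, samples):
    """Label of the sample closest to `point` (1-nearest-neighbour rule)."""
    closest = min(
        samples,
        key=lambda s: sum((a - b) ** 2 for a, b in zip(point, s[0])),
    )
    return closest[1]

def resubstitution_rate(samples):
    """Rate of correctness when every sample is also part of the training set."""
    hits = sum(nearest_label(x, samples) == y for x, y in samples)
    return hits / len(samples)

def loo_rate(samples):
    """Rate of correctness when each sample is predicted from all the others."""
    hits = 0
    for i, (x, y) in enumerate(samples):
        rest = samples[:i] + samples[i + 1:]
        hits += nearest_label(x, rest) == y
    return hits / len(samples)

# Hypothetical two-class data; the two points near (2, 2) lie on the class border.
samples = [
    ((1.0, 1.0), "A"), ((1.2, 0.8), "A"), ((0.9, 1.3), "A"), ((2.1, 2.0), "A"),
    ((3.0, 3.0), "B"), ((3.2, 2.9), "B"), ((2.8, 3.3), "B"), ((2.0, 2.2), "B"),
]
print(resubstitution_rate(samples))  # 1.0
print(loo_rate(samples))             # 0.75
```

In this toy case the training-set rate is perfect by construction, while the two border samples are misclassified once they are held out, which is why the LOO figure is the more informative estimate of predictive ability.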

Using support vector classification with Gaussian kernel, the data set listed in Table 6.10 can be classified with clear-cut boundaries in the feature space. The LOO cross-validation test gives the rate of correctness of prediction equal to 98.9%. [Pg.141]

Figure 7.1 illustrates the comparison of the values of experimental enthalpy of formation of polycyclic aromatic hydrocarbons and the predicted values by SVR in LOO cross-validation test. [Pg.150]

It has been found that the support vector regression with polynomial kernel of second degree can make the mathematical model for the thickness control of the semiconductor films. Figure 8.9 shows the comparison between the experimental thickness data and the predicted thickness in LOO cross-validation test [9]. [Pg.179]

LOO cross-validation test is 95.4%. By using support vector classification and selecting the mutually connected good sample points located far away from the optimal plane of separation, we can find an optimal region that avoids the formation of mixed orientation. For example, the following regions can be used as optimal regions ... [Pg.181]

By using LOO cross-validation test, it can be found that the best value of C is 80. [Pg.218]

Table 9.10 lists the MRE by SVR and ANN. It can be seen that the result of LOO cross-validation test of SVR is slightly better than that of ANN. [Pg.219]

SVC method has been used to classify the cigarettes from the factories of Yunnan province and those from Henan province. The kernel type has been selected in SVC computation. After several trials, the linear function has been found to be the best one in this case. By feature selection based on the prediction ability of SVC, a data subset including the contents of Cu, Zn, Mn and Cr is used for the mathematical modeling, with 100% rate of correctness of prediction in LOO cross-validation test for this data set. A criterion for the classification of Yunnan cigarettes from Henan ones can be obtained by SVC (linear kernel, C =100) as follows ... [Pg.225]

Although the classification by the Fisher method is also very good, the LOO cross-validation test of the Fisher method leads to some misclassification. [Pg.225]

The classification by the Fisher method is also clear-cut, but both the LOO cross-validation test of the Fisher method and the K-Nearest Neighbor (KNN) methods (k = 1, 3 or 5) produce some misclassification, while the LOO cross-validation test of SVC (linear kernel, C = 100) shows a 100% rate of correctness. [Pg.226]

The quality of the mathematical model made by SVC depends on the choice of kernel function and of the parameter C. In order to find the best kernel function and the best value of C, the rate of correctness (P ) of the prediction in the LOO cross-validation test is used as the selection criterion. [Pg.227]
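The selection procedure described above amounts to evaluating each candidate setting by its LOO rate of correctness and keeping the best one. The sketch below shows that loop under a loud assumption: since an SVC implementation is not available in the standard library, the number of neighbours k of a k-nearest-neighbour classifier stands in for the parameter C. All names and data are hypothetical.

```python
from collections import Counter

def knn_predict(point, samples, k):
    """Majority label among the k samples closest to `point`."""
    nearest = sorted(
        samples,
        key=lambda s: sum((a - b) ** 2 for a, b in zip(point, s[0])),
    )[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

def loo_rate(samples, k):
    """LOO rate of correctness for a given setting of k."""
    hits = 0
    for i, (x, y) in enumerate(samples):
        rest = samples[:i] + samples[i + 1:]
        hits += knn_predict(x, rest, k) == y
    return hits / len(samples)

def select_by_loo(samples, candidates):
    """Return the candidate with the highest LOO rate, plus all rates."""
    rates = {k: loo_rate(samples, k) for k in candidates}
    best = max(rates, key=rates.get)  # first candidate wins on ties
    return best, rates

# Hypothetical two-class data, for illustration only.
samples = [
    ((1.0, 1.1), "good"), ((1.3, 0.9), "good"), ((0.8, 1.4), "good"),
    ((1.6, 1.7), "good"), ((2.2, 2.0), "good"),
    ((3.1, 3.0), "bad"), ((3.4, 2.8), "bad"), ((2.9, 3.5), "bad"),
    ((2.4, 2.6), "bad"), ((2.0, 2.3), "bad"),
]
best_k, rates = select_by_loo(samples, candidates=(1, 3, 5))
print("LOO rates:", rates, "selected:", best_k)
```

With a real SVC, the same loop would iterate over kernel types and values of C instead of k; only the classifier inside `loo_rate` changes.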

Big Chemical Encyclopedia
