
Cross-validation procedures technique

Oliveri et al. (2009) presented the development of an artificial tongue based on cyclic voltammetry at Pt microdisk electrodes for the classification of olive oils according to their geographical origin. The measurements are made directly in the oil samples, previously mixed with an appropriate quantity of a room-temperature ionic liquid (RTIL). The pattern recognition techniques applied were PCA for data exploration and k-NN for classification, the results being validated by means of a cross-validation procedure with five cancellation groups. [Pg.107]

The simplest and most general cross-validation procedure is the leave-one-out (LOO) technique, where each object is taken away, one at a time. In this case, given n objects, n reduced models have to be calculated. This technique is particularly important because its deletion scheme is unique, so the predictive abilities of different models can be compared on exactly the same basis. However, in several cases the predictive ability obtained is too optimistic, particularly when the number of objects is quite large, because leaving out only one object perturbs the data too little. [Pg.462]
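
A minimal sketch of the LOO scheme described above, assuming NumPy and an ordinary least-squares model; the data, the model choice and the function name are illustrative, not part of the source.

```python
import numpy as np

def loo_predictions(X, y):
    """Leave-one-out cross-validation: given n objects, fit n reduced models,
    each with one object removed, and predict the left-out object."""
    n = len(y)
    y_cv = np.empty(n)
    for i in range(n):
        mask = np.arange(n) != i                      # all objects except the i-th
        Xtr = np.column_stack([np.ones(mask.sum()), X[mask]])  # add intercept column
        b, *_ = np.linalg.lstsq(Xtr, y[mask], rcond=None)      # reduced OLS model
        y_cv[i] = np.concatenate([[1.0], X[i]]) @ b            # predict left-out object
    return y_cv

# illustrative data: 20 objects, 3 variables
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))
y = X @ np.array([1.5, -2.0, 0.5]) + rng.normal(scale=0.3, size=20)
y_cv = loo_predictions(X, y)
print("mean squared prediction error:", np.mean((y - y_cv) ** 2))
```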

When the number of objects is not too small, more realistic predictive abilities are obtained by deleting more than one object at each step. In this cross-validation procedure, called the leave-more-out (LMO) technique, the user defines the number of cancellation groups, i.e. the number of blocks into which the data are divided; at each step, all the objects belonging to one block are left out of the model calculation. [Pg.462]
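
A sketch of how objects might be assigned to a user-defined number of cancellation groups, assuming NumPy; the group count, the shuffling and the function name are illustrative choices.

```python
import numpy as np

def cancellation_groups(n_objects, n_groups, shuffle=True, seed=0):
    """Split object indices into n_groups blocks; at each cross-validation
    step one whole block is left out of the model calculation."""
    idx = np.arange(n_objects)
    if shuffle:
        np.random.default_rng(seed).shuffle(idx)
    return np.array_split(idx, n_groups)

groups = cancellation_groups(n_objects=20, n_groups=5)
for step, left_out in enumerate(groups, start=1):
    kept = np.setdiff1d(np.arange(20), left_out)
    print(f"step {step}: leave out {left_out.tolist()}, fit model on {len(kept)} objects")
```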

A rapid head-space analysis instrument for analysing the volatile fractions of 105 extra virgin olive oils from five different Mediterranean areas was put forward by Cerrato-Oliveros and co-workers. The raw information collected by this system was unravelled and interpreted with well-known multivariate techniques of display (principal component analysis), feature selection (stepwise linear discriminant analysis) and classification (linear discriminant analysis). In all, 93.4% of the samples were correctly classified and 90.5% correctly predicted by the cross-validation procedure, whilst 80.0% of an external test set, aimed at full validation of the classification rule, were correctly assigned. [Pg.177]

Several cross-validation techniques are readily implemented in GOLPE (generating optimal linear PLS estimations). This program performs chemometric analyses on GRID and CoMFA fields, and it can be used to further refine the PLS model. In the two-random-groups cross-validation procedure,... [Pg.154]

Aires-de-Sousa and Gasteiger used four regression techniques [multiple linear regression, perceptron (an MLF ANN with no hidden layer), MLF ANN, and ν-SVM regression] to obtain a quantitative structure-enantioselectivity relationship (QSER). The QSER models the enantiomeric excess in the addition of diethylzinc to benzaldehyde in the presence of a racemic catalyst and an enantiopure chiral additive. A total of 65 reactions constituted the dataset. Using 11 chiral codes as model input and a three-fold cross-validation procedure, a neural network with two hidden neurons gave the best predictions: ANN with two hidden neurons, Rpred = 0.923; ANN with one hidden neuron, Rpred = 0.906; perceptron, Rpred = 0.845; MLR, Rpred = 0.776; and ν-SVM regression with RBF kernel, Rpred = 0.748. [Pg.377]

Equations (24) and (25) are adequate for designing decision trees. The feature that minimizes the information content is selected as a node. This procedure is repeated for every leaf node until adequate classification is obtained. Techniques for preventing overfitting of the training data, such as cross-validation, are then applied. [Pg.263]
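
As one possible illustration of using cross-validation to limit overfitting of a decision tree, the tree complexity (here the maximum depth) can be cross-validated and the best-generalising setting kept; scikit-learn, the toy data set and the depth grid are assumptions, not part of the source text.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# illustrative training data
X, y = make_classification(n_samples=200, n_features=10, random_state=0)

# cross-validate trees of increasing depth and keep the depth that
# generalises best rather than the one that fits the training data best
scores = {}
for depth in range(1, 11):
    tree = DecisionTreeClassifier(max_depth=depth, random_state=0)
    scores[depth] = cross_val_score(tree, X, y, cv=5).mean()

best_depth = max(scores, key=scores.get)
print("cross-validated accuracy by depth:", scores)
print("selected depth:", best_depth)
```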

Cross-validation and bootstrap techniques can be applied for a statistically based estimation of the optimum number of PCA components. The idea is to randomly split the data into training and test data. PCA is then applied to the training data, and the observations from the test data are reconstructed using 1 to m PCs. The prediction error with respect to the real test data can then be computed. Repeating this procedure many times indicates the distribution of the prediction errors when using 1 to m components, which then allows deciding on the optimal number of components. For more details see Section 3.7.1. [Pg.78]
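
A minimal NumPy sketch of this idea: fit PCA on the training data, reconstruct the test observations with 1 to m components, and record the reconstruction (prediction) error. Only one random split is shown, whereas the procedure described above is repeated many times; the data and function name are illustrative.

```python
import numpy as np

def pca_reconstruction_error(X_train, X_test, max_pc):
    """Fit PCA on the training data and reconstruct the test data with
    1..max_pc components, returning the mean squared reconstruction error."""
    mean = X_train.mean(axis=0)
    # principal components from the SVD of the centred training data
    _, _, Vt = np.linalg.svd(X_train - mean, full_matrices=False)
    errors = []
    for a in range(1, max_pc + 1):
        V = Vt[:a].T                      # loadings of the first a PCs
        scores = (X_test - mean) @ V      # project test observations
        X_hat = scores @ V.T + mean       # reconstruct from a PCs
        errors.append(np.mean((X_test - X_hat) ** 2))
    return errors

rng = np.random.default_rng(1)
X = rng.normal(size=(60, 8)) @ rng.normal(size=(8, 8))   # correlated variables
idx = rng.permutation(len(X))
train, test = idx[:40], idx[40:]                          # one random split
print(pca_reconstruction_error(X[train], X[test], max_pc=5))
```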

To avoid models with chance correlation, a check with different validation procedures must be adopted, such as, for example, cross-validation, y-scrambling and the QUIK rule. A general validation procedure [Wold, 1991] is to delete some objects before variable selection, apply the variable selection procedure to the remaining objects, and then predict the responses of the excluded objects. The whole procedure, including variable selection, is then repeated a number of times, depending on the specific validation technique adopted. [Pg.461]
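
A sketch of the y-scrambling check mentioned above: the response is randomly permuted, the model is refitted, and the fit on scrambled data is compared with that of the real model. The least-squares model, the data and the number of permutations are illustrative assumptions; NumPy only.

```python
import numpy as np

def r2_ols(X, y):
    """R^2 of an ordinary least-squares fit (with intercept)."""
    Xb = np.column_stack([np.ones(len(y)), X])
    b, *_ = np.linalg.lstsq(Xb, y, rcond=None)
    resid = y - Xb @ b
    return 1.0 - (resid @ resid) / ((y - y.mean()) @ (y - y.mean()))

rng = np.random.default_rng(0)
X = rng.normal(size=(30, 4))
y = X @ np.array([2.0, -1.0, 0.0, 0.5]) + rng.normal(scale=0.5, size=30)

r2_real = r2_ols(X, y)
# refit on randomly permuted responses; a model that is not due to chance
# correlation should clearly beat the scrambled fits
r2_scrambled = [r2_ols(X, rng.permutation(y)) for _ in range(100)]
print(f"real R2 = {r2_real:.3f}, scrambled R2 mean = {np.mean(r2_scrambled):.3f} "
      f"(max {np.max(r2_scrambled):.3f})")
```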

Accuracy is the ability of any assay to provide the correct result. Ideally, the assay should detect all of the analyte (100% recovery) and nothing else (no interference or cross-reactivity). To estimate the method accuracy, a comparison of method results with true sample concentrations must be completed. A straightforward procedure involves the use of a standard reference material, in which the analyte concentration is known with high accuracy and precision. Standard reference materials are not generally available for biochemical analytes, however. When a reference material is not available, accuracy can be established by comparison with alternative, previously validated analytical techniques or with currently accepted methods. Intralaboratory tests of matrix effects and interferences are also conducted in order to establish the accuracy of a new method. [Pg.332]

SIMCA is a supervised pattern recognition technique, which requires the data to be classified beforehand, either manually or by HCA. SIMCA then performs PCA on each class, with a sufficient number of factors retained to account for most of the variation within classes. The number of factors retained is very important: if too few are selected, the information in the model set can become distorted. Using a procedure called cross-validation, segments of the data are omitted during PCA, and the omitted data are predicted and compared to the actual values. This is repeated for every data element until each point has been excluded once from the determination. The PCA model that yields the minimum prediction error for the omitted data is retained. [Pg.191]

Selection of the Optimal Tree. The optimal (most accurate) tree is the one having the highest predictive ability. Therefore, one has to evaluate the predictive error of the subtrees and choose the optimal one among them. The most common technique for estimating the predictive error is the cross-validation method, especially when the data set is small. The procedure for performing a cross-validation is described earlier (see Section 14.2.2.1). In practice, the optimal tree is chosen as the simplest tree with a predictive error estimate within one standard error of the minimum; that is, the chosen tree is the simplest one with an error estimate comparable to that of the most accurate tree. [Pg.337]
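
A small sketch of the one-standard-error rule described above, applied to a sequence of subtrees ordered from simplest to most complex; the subtree sizes, cross-validated errors and standard errors are invented for illustration.

```python
import numpy as np

# hypothetical subtrees ordered from simplest to most complex,
# with their cross-validated error estimates and standard errors
n_leaves  = np.array([2, 3, 5, 8, 12, 20])
cv_error  = np.array([0.41, 0.33, 0.27, 0.25, 0.26, 0.29])
std_error = np.array([0.04, 0.04, 0.03, 0.03, 0.03, 0.04])

best = np.argmin(cv_error)                        # most accurate subtree
threshold = cv_error[best] + std_error[best]      # within one SE of the minimum
# simplest subtree whose error estimate is comparable to that of the best one
optimal = np.argmax(cv_error <= threshold)
print(f"most accurate tree: {n_leaves[best]} leaves (error {cv_error[best]:.2f})")
print(f"selected tree:      {n_leaves[optimal]} leaves (error {cv_error[optimal]:.2f})")
```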

Note: PRESS and q2 may relate to any property that is being modelled, not just activity. q2 will always be smaller than r2. When q2 > 0.3, a model is considered significant. Although cross-validation may seem a robust validation technique, some difficulties should not be overlooked. Variables that do not contribute to prediction, i.e. that cause noise in the model, may have detrimental effects on CV. This may particularly play a role when many variables have to be considered, such as in a 3D-QSAR CoMFA analysis (see Chapter 25). A procedure for variable selection in the case of many variables has been developed and is named GOLPE (generating optimal linear PLS estimations). [Pg.361]
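
A minimal illustration of PRESS and q2, computed from observed responses and their cross-validated predictions (each prediction coming from a model that did not see that object); the numerical values are invented example data.

```python
import numpy as np

# observed responses and the corresponding cross-validated predictions
y    = np.array([5.1, 6.3, 4.8, 7.0, 5.9, 6.6])
y_cv = np.array([5.4, 6.0, 5.1, 6.6, 6.2, 6.3])

press = np.sum((y - y_cv) ** 2)        # predictive residual sum of squares
tss   = np.sum((y - y.mean()) ** 2)    # total sum of squares about the mean
q2    = 1.0 - press / tss              # cross-validated explained variance
print(f"PRESS = {press:.3f}, q2 = {q2:.3f}")   # q2 > 0.3 -> considered significant
```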

To ensure the reliability of analytical techniques, they need to be validated. Validation provides information on the overall performance of the assay as well as on individual parameters and factors that can be used to estimate the degree of uncertainty associated with an assay (Ellison et al., 2000). An adequate validation procedure assesses, and therefore ensures, that the immunoassay performs within an acceptable range of established criteria. Parameters used to evaluate the performance of the assays may be affected by (1) factors inherent to the analytical technique, such as antibody specificity and antibody cross-reactivity, and (2) external factors such as environmental conditions (temperature) and type of sample (matrix, processed food vs. raw ingredients). A... [Pg.237]

Most current studies report an internal validation accuracy without an independent validation set. When there is a sufficiently large number of samples, the whole dataset can be split into two parts, one for training and one for testing (validation); this method is called hold-out validation. When the number of samples is limited, leave-one-out cross-validation (LOOCV) is a popular technique. Here, the procedure is repeated N times, and each time a different sample is left out and used for testing the model learned from the remaining (N - 1) samples. The accuracy of...
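
A sketch contrasting the two schemes on an illustrative classification data set; scikit-learn, the classifier and the split ratio are assumptions, not part of the source.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import LeaveOneOut, cross_val_score, train_test_split

X, y = make_classification(n_samples=80, n_features=5, random_state=0)
clf = LogisticRegression(max_iter=1000)

# hold-out validation: split once into a training set and a test (validation) set
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
holdout_acc = clf.fit(X_tr, y_tr).score(X_te, y_te)

# LOOCV: the procedure is repeated N times, leaving out one sample each time
loocv_acc = cross_val_score(clf, X, y, cv=LeaveOneOut()).mean()

print(f"hold-out accuracy: {holdout_acc:.3f}")
print(f"LOOCV accuracy:    {loocv_acc:.3f}")
```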

