Sample selection, calibration

Figure 8.39 Three-dimensional scatter plot of the first three PCA scores obtained from a set of original calibration data. The calibration samples selected by the cluster analysis method are marked with an x. ...

There are three rules of thumb to guide us in selecting the number of calibration samples we should include in a training set. They are all based on the number of components in the system with which we are working. Remember that components should be understood in the widest sense as "independent sources of significant variation in the data." For example, a... [Pg.19]

This GLS estimator is akin to the inverse variance-weighted regression discussed in Section 8.2.3. Again there is a limitation: V can be inverted only when the number of calibration samples is larger than the number of predictor variables, i.e. spectral wavelengths. Thus, one either has to work with a limited set of selected wavelengths or apply one of the other solutions that have been proposed for tackling this problem [5]. [Pg.356]

The reliability of multispecies analysis has to be validated according to the usual criteria: selectivity, accuracy (trueness) and precision, confidence and prediction intervals and, calculated from these, multivariate critical values and limits of detection. In multivariate calibration, collinearities of variables caused by correlated concentrations in calibration samples should be avoided. Therefore, the composition of the calibration mixtures should not be varied randomly but according to the principles of experimental design (Deming and Morgan [1993]; Morgan [1991]). [Pg.188]

In MLR, if m is the vector (dimension by 1) of the selected absorbance values obtained from a spectral vector a, and M is the matrix of selected absorbance values for the calibration samples, then the Mahalanobis Distance is defined as equation 74-6a ... [Pg.498]

An important aspect of variable selection that is often overlooked is the hazard brought about through the use of cross-validation for two quite different purposes, namely (1) as an optimization criterion for variable selection and other model optimization tasks (including selection of the optimal number of PLS LVs or PCR PCs) and (2) as an assessment of the quality of the final model built using all samples. In this case, one can get highly optimistic estimates of a model's performance, because the same criterion is used to both optimize and evaluate the model. As a result, when doing variable selection, especially with a limited number of calibration samples, it is advisable to do an additional outer loop cross-validation across the entire model... [Pg.424]
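The outer-loop idea in the excerpt above can be sketched as follows. This is a minimal illustration, not the book's procedure: the one-variable linear model, leave-one-out splits, the pure-noise data set, and all function names (`fit_line`, `loo_rmse`, `select_variable`, `nested_rmse`) are assumptions made for the example.

```python
import math
import random

def fit_line(x, y):
    """Least-squares slope and intercept for a single predictor."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx if sxx else 0.0
    return slope, my - slope * mx

def loo_rmse(x, y):
    """Leave-one-out cross-validated RMSE of a one-variable model."""
    sq = 0.0
    for i in range(len(x)):
        slope, icpt = fit_line(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        sq += (y[i] - (slope * x[i] + icpt)) ** 2
    return math.sqrt(sq / len(x))

def select_variable(X, y):
    """Inner loop: pick the column with the lowest LOO RMSE; return (index, rmse)."""
    errors = [loo_rmse(list(col), y) for col in zip(*X)]
    best = min(range(len(errors)), key=errors.__getitem__)
    return best, errors[best]

def nested_rmse(X, y):
    """Outer loop: redo the selection without sample i, then predict sample i."""
    sq = 0.0
    for i in range(len(y)):
        X_in, y_in = X[:i] + X[i + 1:], y[:i] + y[i + 1:]
        j, _ = select_variable(X_in, y_in)
        slope, icpt = fit_line([row[j] for row in X_in], y_in)
        sq += (y[i] - (slope * X[i][j] + icpt)) ** 2
    return math.sqrt(sq / len(y))

# Hypothetical data: 12 samples, 20 variables, no real relationship to y.
rng = random.Random(0)
X = [[rng.gauss(0, 1) for _ in range(20)] for _ in range(12)]
y = [rng.gauss(0, 1) for _ in range(12)]

_, naive = select_variable(X, y)   # same CV used to optimize and to evaluate
nested = nested_rmse(X, y)         # selection repeated inside every outer split
print(f"inner-loop RMSECV of the selected variable: {naive:.3f}")
print(f"outer-loop (nested) RMSECV: {nested:.3f}")
```

With few samples and many candidate variables, the inner-loop figure typically looks better than the outer-loop one, because the inner loop is free to pick whichever variable happens to fit the noise.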

It is worth noting that these standards could be a subset of the same standards used to develop the calibration model for the property of interest. In this case, there are several sample selection strategies available for identifying the transfer standards from the complete set of calibration samples [107-111]. [Pg.427]

The body of samples selected is split into two subsets, namely the calibration set and the validation set. The former is used to construct the calibration model and the latter to assess its predictive capacity. A number of procedures for selecting the samples to be included in each subset have been reported. Most have been applied to situations of uncontrolled variability spanning much wider ranges than those typically encountered in the pharmaceutical field. One especially effective procedure is that involving the selection of as many samples as required to span the desired calibration range and encompassing the whole possible spectral variability (i.e. the contribution of physical properties). The choice can be made based on a plot of PCA scores obtained from all the samples. [Pg.474]

Once we have generated the forest of trees, the next question is what to do with the information it contains. To explore this, recall that the analysis has two possible purposes: prediction and feature selection. Prediction is the problem that arises if we have a new compound whose activity is unknown and wish to predict its activity on the basis of the relationships seen in the calibration sample. A good way to use the forest to make such a prediction is bagging. ... [Pg.325]

Calibration design: 9 samples, selected using a mixture design. Preprocessing: baseline correction using the average of the first 10 measurement variables. [Pg.295]
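The baseline correction named above amounts to subtracting a per-spectrum offset. A minimal sketch, assuming each spectrum is a plain list of values and that the first 10 variables contain no analyte signal; the function name and the example spectrum are hypothetical.

```python
def baseline_correct(spectrum, n_baseline=10):
    """Subtract the mean of the first n_baseline variables from every variable."""
    if len(spectrum) < n_baseline:
        raise ValueError("spectrum is shorter than the baseline window")
    offset = sum(spectrum[:n_baseline]) / n_baseline
    return [value - offset for value in spectrum]

# Hypothetical spectrum: a constant offset of 0.5 followed by a small peak.
raw = [0.5] * 10 + [0.7, 1.5, 0.7]
corrected = baseline_correct(raw)
```

After correction the baseline region sits at zero and the peak height is measured relative to it.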

In multivariate calibration, selectivity is commonly used to measure the amount of signal that cannot be used for prediction because of the overlap between the signal of the analyte and the signal of the interferences [68,69]. For inverse models, such as PLS, selectivity is usually calculated for each calibration sample as... [Pg.229]

To illustrate the MLR method, the SMLR calibration method is used to build a model for the cis-butadiene content in the polymers. In this case, four variables are specified for selection, based on prior knowledge that there are four major chemical components that are varying independently in the calibration samples. The SMLR method chooses the four X-variables 1706, 1824, 1670, and 1570 nm, in that order. These four selected variables are then used to build an MLR regression model for cis-butadiene content, the fit of which is shown in Figure 8.13. Table 8.5 lists the variables that were chosen by the SMLR method,... [Pg.255]

The selected subset cross-validation method is probably the closest internal validation method to external validation in that a single validation procedure is executed using a single split of subset calibration and validation data. Properly implemented, it can provide the least optimistic assessment of a model's prediction error. Its disadvantages are that it can be rather difficult and cumbersome to set up so that it is properly implemented, and it is difficult to use effectively for a small number of calibration samples. It requires very careful selection of the validation samples such that not only are they sufficiently representative of the samples to be applied to the model during implementation, but also the remaining samples used for subset calibration are sufficiently representative as well. This is the case because there is only one chance given to test a model that is built from the data. [Pg.272]

To illustrate some commonly encountered classification methods, a data set obtained from a series of polyurethane rigid foams will be used.55 In this example, a series of 26 polyurethane foam samples were analyzed by NIR diffuse reflectance spectroscopy. The spectra of these foams are shown in Figure 8.25. Each of these foam samples belongs to one of four known classes, where each class is distinguished by different chemistry in the hard block parts of the polymer chain. Of the 26 samples, 24 are selected as calibration samples and 2 samples are selected as prediction samples. Prediction sample A is known to belong to class number 2, and prediction sample B is known to belong to class number 4. Table 8.8 provides a summary of the samples used to produce this data set. [Pg.289]

There are several methods that can be used to select well-distributed calibration samples from a set of such happenstance data. One simple method, called leverage-based selection, is to run a PCA analysis on the calibration data, and select a subset of calibration samples that have extreme values of the leverage for each of the significant PCs in the model. The selected samples will be those that have extreme responses in their analytical profiles. In order to cover the sample states better, it would also be wise to add samples that have low leverage values for each of the PCs, so that the center samples with more normal analytical responses are well represented as well. Otherwise, it would be very difficult for the predictive model to characterize any non-linear response effects in the analytical data. In PAC, where spectroscopy and chromatography methods are common, it is better to assume that non-linear effects in the analytical responses could be present than to assume that they are not. [Pg.313]
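Leverage-based selection as described above can be sketched as follows. The sketch assumes the PCA has already been run and takes mean-centered scores as input. The per-PC leverage definition (a sample's squared score divided by the column's sum of squares), the function names, and the example scores are assumptions made for illustration.

```python
def per_pc_leverage(scores):
    """Leverage of each sample on each PC: t_ik**2 / sum_j t_jk**2 (assumed definition)."""
    n_pcs = len(scores[0])
    totals = [sum(row[k] ** 2 for row in scores) for k in range(n_pcs)]
    return [[row[k] ** 2 / totals[k] for k in range(n_pcs)] for row in scores]

def select_by_leverage(scores, n_extreme=1, n_center=1):
    """Return sorted indices of the highest- and lowest-leverage samples on each PC."""
    lev = per_pc_leverage(scores)
    n = len(scores)
    chosen = set()
    for k in range(len(scores[0])):
        order = sorted(range(n), key=lambda i: lev[i][k])  # ascending leverage
        chosen.update(order[n - n_extreme:])  # extreme responses on PC k
        chosen.update(order[:n_center])       # near-center responses on PC k
    return sorted(chosen)

# Hypothetical mean-centered PCA scores for six samples on two PCs.
scores = [
    [3.0, 0.1],
    [-2.9, 0.2],
    [0.1, 2.5],
    [0.2, -2.6],
    [0.05, 0.05],
    [-0.45, -0.25],
]
selected = select_by_leverage(scores)
print(selected)
```

Including the `n_center` samples reflects the excerpt's advice to represent the center of the data as well as its extremes, so that non-linear response effects remain detectable.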

Another useful method for sample selection is cluster analysis-based selection.3 4,67 In this method, it is typical to start with a compressed PCA representation of the calibration data. An unsupervised cluster analysis (Section 8.6.3.1) is then performed, where the algorithm is terminated after a specific number of clusters are determined. Then, a single sample is selected from each of the clusters, as its representative in the final calibration data set. This cluster-wise selection is often done on the basis of the maximum distance from the overall data mean, but it can also be done using each of the cluster means instead. [Pg.313]

Figure 8.39 shows a three-dimensional scatter plot of the first three PC scores obtained from a PCA analysis of 987 calibration spectra that were collected for a specific on-line analyzer calibration project. In this case, cluster analysis was done using the first six PCs (all of which cannot be displayed in the plot) in order to select a subset of 100 of these samples for calibration. The three-dimensional score plot shows that the selected samples are well distributed among the calibration samples, at least when the first three PCs are considered. [Pg.313]

The cluster analysis-based method of sample selection is very useful when one wants to ensure that at least one sample from each of a known number of subclasses is selected for calibration. However, one must be careful to specify a sufficient number of clusters, otherwise all of the subgroups might not be represented. It is always better to err on the side of determining too many clusters. The specification of a number of clusters that is much greater than the number of natural groups in the data should still result in well-distributed calibration samples. [Pg.313]
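The cluster analysis-based selection discussed above can be sketched as follows. The excerpts do not say which clustering algorithm was used, so plain k-means with a deterministic farthest-point start is an assumption here, as are the function names and the example scores. The representative of each cluster is the member farthest from the overall data mean, which is the first of the two options the text mentions.

```python
import math

def mean_point(points):
    """Coordinate-wise mean of a list of points."""
    n = len(points)
    return [sum(p[d] for p in points) / n for d in range(len(points[0]))]

def kmeans_labels(points, n_clusters, n_iter=100):
    """Plain k-means with deterministic farthest-point initialization."""
    centers = [points[0]]
    while len(centers) < n_clusters:
        # Next center: the point farthest from its nearest existing center.
        centers.append(max(points, key=lambda p: min(math.dist(p, c) for c in centers)))
    labels = None
    for _ in range(n_iter):
        new_labels = [min(range(n_clusters), key=lambda j: math.dist(p, centers[j]))
                      for p in points]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        for j in range(n_clusters):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centers[j] = mean_point(members)
    return labels

def select_one_per_cluster(scores, n_clusters):
    """Pick, from each cluster, the sample farthest from the overall data mean."""
    labels = kmeans_labels(scores, n_clusters)
    overall = mean_point(scores)
    selected = []
    for j in range(n_clusters):
        members = [i for i, lab in enumerate(labels) if lab == j]
        if members:
            selected.append(max(members, key=lambda i: math.dist(scores[i], overall)))
    return sorted(selected)

# Hypothetical PCA scores: three well-separated groups of two samples each.
scores = [[0.0, 0.0], [0.2, 0.1], [5.0, 5.0], [5.3, 5.2], [10.0, 0.0], [10.4, -0.2]]
selected = select_one_per_cluster(scores, n_clusters=3)
print(selected)
```

Requesting more clusters than there are natural groups only splits groups further, which is consistent with the advice above to err on the side of too many clusters.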

Another important issue that arises in the PDS method, as well as some other standardization methods, is the selection of the samples to use for standardization. It is critical that the standardization samples efficiently convey the magnitude and nature of instrument-to-instrument variability artifacts that are expected to be present in the analyzers while they are operating in the field. Note that this criterion is different than the criterion used for sample selection for calibration, which is to sufficiently cover the compositions of the process samples that the analyzer is expected to see during its operation. Sample selection strategies for instrument standardization have been given by many.73,77-79... [Pg.319]

The protocol must present an uncertainty budget. Its components should be carefully estimated, and may be stated in standard uncertainties, but expanded uncertainties can have great utility, provided the k factor is carefully chosen and indicated [2, 4, 6]13. All supposable uncertainty sources (of types A and B)14 must be considered. Uncertainty components are concerned with contaminations, matrix effects, corrections, lack of stability or of stoichiometry, impurities in reagents, instrument non-linearities and calibrations, inherent uncertainties in standard methods, and uncertainties from subsample selection. Sample selection in the field before submission to the laboratory, and contamination prior to sample submission to the laboratory, may have to be explicitly excluded. The responsibility for adhering to the protocol's procedures, for which the planned complete uncertainty budget applies, rests with the laboratory and the analyst in charge of the measurement. [Pg.21]

Big Chemical Encyclopedia