Prediction samples

This yields an estimate for the bias (intercept) a and slope b needed to correct predictions yg from the new (child) instrument that are based on the old (parent) calibration model, b. The virtue of this approach is its simplicity: one does not need to investigate in any detail how the two sets of spectra compare, only how the two sets of predictions obtained from them are related. The assumption is that the same type of correction applies to all future prediction samples. Variations in conditions that may have a different effect on different samples cannot be corrected for in this manner. [Pg.376]
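A minimal sketch of such a slope and bias correction, assuming the correction is fitted by ordinary least squares of reference values on the child-instrument predictions for a small set of transfer samples. The function names and the numbers are hypothetical and are not taken from the source.

```python
import numpy as np

def fit_slope_bias(y_child_pred, y_reference):
    """Least-squares fit of y_reference ~ a + b * y_child_pred."""
    y_child_pred = np.asarray(y_child_pred, dtype=float)
    y_reference = np.asarray(y_reference, dtype=float)
    # np.polyfit returns coefficients from highest degree down: [slope, intercept]
    b, a = np.polyfit(y_child_pred, y_reference, deg=1)
    return a, b

def correct_predictions(y_child_pred, a, b):
    """Apply the same linear correction to any future child prediction."""
    return a + b * np.asarray(y_child_pred, dtype=float)

# Hypothetical transfer samples: the child predictions are distorted by a
# known slope and bias so that the fit can be checked.
y_ref = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y_child = (y_ref - 0.5) / 1.1

a, b = fit_slope_bias(y_child, y_ref)
y_corrected = correct_predictions(y_child, a, b)
```

Because a single pair (a, b) is applied to every sample, the sketch inherits the limitation stated above: sample-specific effects are not corrected.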

The first one we mention is the question of the validity of a test set. We all know and agree (at least, we hope that we all do) that the best way to test a calibration model, whether it is a quantitative or a qualitative model, is to have some samples in reserve, that are not included among the ones on which the calibration calculations are based, and use those samples as validation samples (sometimes called test samples or prediction samples or known samples). The question is, how can we define a proper validation set? Alternatively, what criteria can we use to ascertain whether a given set of samples constitutes an adequate set for testing the calibration model at hand ... [Pg.135]

The distances of the N calibration samples from the prediction sample in the space are ranked from least to greatest. [Pg.393]

Figure 12.17 Scatter plot of the first two PC scores obtained from PCA analysis of the polyurethane foam spectra shown in Figure 12.16. Different symbols are used to denote samples belonging to the four known classes. The designated prediction samples are denoted as a solid circle (sample A) and a solid triangle (sample B).

If there is a tie between two or more classes, the class whose samples have the lowest combined distance from the prediction sample is selected. [Pg.394]
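The ranking and tie-breaking rules for K-nearest-neighbor classification can be sketched as follows. This is an illustration only: it assumes Euclidean distance, and the function name and data are hypothetical.

```python
import numpy as np
from collections import defaultdict

def knn_classify(X_cal, labels, x_new, k=3):
    """Classify x_new by majority vote among its k nearest calibration samples.

    Ties are broken in favour of the class whose neighbours have the lowest
    combined distance from the prediction sample.
    """
    X_cal = np.asarray(X_cal, dtype=float)
    x_new = np.asarray(x_new, dtype=float)
    # Distances of all calibration samples from the prediction sample
    dists = np.linalg.norm(X_cal - x_new, axis=1)
    # Rank from least to greatest and keep the k nearest
    nearest = np.argsort(dists, kind="stable")[:k]

    votes = defaultdict(int)
    combined = defaultdict(float)
    for i in nearest:
        votes[labels[i]] += 1
        combined[labels[i]] += dists[i]

    top = max(votes.values())
    tied = [c for c in votes if votes[c] == top]
    winner = min(tied, key=lambda c: combined[c])
    neighbour_classes = [labels[i] for i in nearest]
    return winner, neighbour_classes

# Hypothetical one-dimensional example with a 2-2 tie at k = 4
X_cal = [[0.0, 0.0], [1.0, 0.0], [4.0, 0.0], [5.0, 0.0]]
labels = ["A", "A", "B", "B"]
winner, neighbour_classes = knn_classify(X_cal, labels, [2.4, 0.0], k=4)
```

Here both classes receive two votes, and class A wins because its two neighbours lie at a combined distance of 3.8 against 4.2 for class B.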

The T² and Q metrics are the highest-level metrics for abnormality detection. For any given prediction sample that produces a high or interesting T² or Q value, one can also obtain the contributions of each analyzer variable to the T² and Q value, aptly named the t- and q-contributions, which are defined below ... [Pg.431]
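The definitions themselves are elided in this excerpt. As a hedged illustration, the sketch below uses one common convention for a PCA model: Hotelling's T² is the sum of squared scores scaled by their variances, Q is the sum of squared residuals, the q-contributions are the squared residuals per variable, and the t-contributions are the scaled scores projected back onto the variables. The source may define them differently, and all names and data are hypothetical.

```python
import numpy as np

def fit_pca(X, n_components):
    """Mean-centre X and return the mean, loadings and score variances."""
    mean = X.mean(axis=0)
    Xc = X - mean
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    P = Vt[:n_components].T                            # variables x components
    eigvals = s[:n_components] ** 2 / (X.shape[0] - 1)  # variance of each score
    return mean, P, eigvals

def t2_q_with_contributions(x, mean, P, eigvals):
    """Return T2, Q and per-variable contributions for one prediction sample."""
    xc = x - mean
    t = xc @ P                                 # scores of the prediction sample
    t_contrib = (t / np.sqrt(eigvals)) @ P.T   # assumed convention, one value per variable
    residual = xc - t @ P.T
    q_contrib = residual ** 2                  # one value per variable
    T2 = float(np.sum(t ** 2 / eigvals))
    Q = float(np.sum(q_contrib))
    return T2, Q, t_contrib, q_contrib

# Hypothetical data: 30 calibration samples, 5 analyzer variables
rng = np.random.default_rng(0)
X = rng.normal(size=(30, 5))
mean, P, eigvals = fit_pca(X, n_components=2)
x_new = rng.normal(size=5)
T2, Q, t_contrib, q_contrib = t2_q_with_contributions(x_new, mean, P, eigvals)
```

With this convention the squared t-contributions sum to T² and the q-contributions sum to Q, so each variable's share of an alarm can be read off directly.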

Standardizing the predicted values is a simple, useful choice that ensures smooth calibration transfer in situations (a) and (b) above. The procedure involves predicting samples for which spectra have been recorded on the slave using the calibration model constructed for the master. The predicted values, which may be subject to gross errors, are usually highly correlated with the reference values. The ensuing mathematical relation, which is almost always linear, is used to correct the values subsequently obtained with the slave. [Pg.478]

A list of the class(es) of the K nearest neighbors for a given prediction sample. [Pg.68]

Figure 4.92. Plot of prediction samples. Solid, unknown 1; dashed, unknown 2; dotted, unknown 3; dashed-dotted, unknown 4. The values for i1-i7 are the sensor intensities and s1-s7 are the sensor slopes.

In contrast to DCLS, the pure spectra in the indirect approach are not measured directly, but are estimated from mixture spectra. One reason for using ICLS is that it is not possible to physically separate the components (e.g., when one of the components of interest is a gas and future prediction samples are mixtures of the gas dissolved in a liquid). Indirect CLS is also used when the model assumptions do not hold if the pure component is run neat. By preparing mixtures, it is possible to dilute a strongly absorbing component so that the model assumptions hold. [Pg.114]
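A minimal sketch of the indirect approach, assuming the usual linear mixture model R = C K, where R holds the mixture spectra (one per row), C the known concentrations, and K the pure-component spectra. The symbols, function names, and noise-free data are assumptions made for illustration.

```python
import numpy as np

def estimate_pure_spectra(C, R):
    """Estimate pure spectra K from mixtures by least squares (R ~ C @ K)."""
    K_hat, *_ = np.linalg.lstsq(C, R, rcond=None)
    return K_hat

def predict_concentrations(K_hat, r):
    """Estimate concentrations of one prediction sample (r ~ c @ K_hat)."""
    c_hat, *_ = np.linalg.lstsq(K_hat.T, r, rcond=None)
    return c_hat

# Hypothetical two-component system measured at four wavelengths
K_true = np.array([[1.0, 0.5, 0.0, 0.2],
                   [0.0, 0.3, 1.0, 0.4]])
C_cal = np.array([[1.0, 0.0],
                  [0.5, 0.5],
                  [0.2, 0.8],
                  [0.7, 0.1]])
R_cal = C_cal @ K_true                   # noise-free mixture spectra

K_hat = estimate_pure_spectra(C_cal, R_cal)
r_new = np.array([0.3, 0.6]) @ K_true    # spectrum of a prediction sample
c_hat = predict_concentrations(K_hat, r_new)
```

With noise-free data the estimated pure spectra and the predicted concentrations reproduce the true values; with real spectra both steps are least-squares estimates.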

FIGURE 5.60. Statistical prediction errors for the prediction samples, with the maximum from validation indicated by the horizontal line. [Pg.126]

In Equation 5.28, s is a function of the concentration residuals observed during calibration, r is the measurement vector for the prediction sample, and R contains the calibration measurements for the variables used in the model. Because the assumptions of linear regression are often not rigorously obeyed, the statistical prediction error should be used empirically rather than absolutely. It is useful for validating the prediction samples by comparing the values for... [Pg.135]

Statistical Prediction Errors The statistical prediction errors are plotted in Figure 5.86, with the maximum from the model validation denoted by the horizontal line. All prediction samples fall below this line, indicating that there are no unusual samples. [Pg.144]

Raw Measurement Plot The raw data for the prediction samples are not usually plotted given that the diagnostics did not flag any samples as unusual. They are displayed here because of the limited error detection of the other diagnostic tools. The preprocessed data in Figure 5.87 do not reveal any unusual samples. [Pg.144]

For this example, the prediction sample leverages are shown in Figure 5.109 with a horizontal line denoting three times the average leverage of the calibration samples. In this case, all of the samples are below the calibration maximum and are comparable with each other except for sample 13. This is another indication that this sample is outside the calibration range. [Pg.160]

Figure 5.110 displays the versus sample number for the 20 prediction samples. Three samples (1, 10, and 18) have values that are well above...

These residual spectra show clearly why the prediction samples have large ... It has been our experience that the residual spectra can be useful in... [Pg.161]

FIGURE 5.111. Spectral residuals of prediction samples 1, 10, and 18 using a three-factor PLS model, with the range of the calibration residuals shown by the horizontal line. [Pg.161]

Leverage is a measure of the location of a prediction sample in the calibration measurement row space. A high leverage indicates a sample that has an unusual score vector relative to the calibration samples. [Pg.162]

Unlike calibration samples, the leverage values for prediction samples are not constrained to be less than 1. [Pg.162]
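The leverage diagnostic can be sketched as below, assuming leverage is computed as h = t (TᵀT)⁻¹ tᵀ from a calibration score matrix T and the score vector t of a sample, without an added 1/n term. The flagging rule of three times the average calibration leverage follows the excerpts on this page. The function name and the scores are hypothetical.

```python
import numpy as np

def leverages(T_cal, T_new):
    """Leverage of calibration and prediction samples from their score vectors."""
    T_cal = np.asarray(T_cal, dtype=float)
    T_new = np.asarray(T_new, dtype=float)
    G = np.linalg.pinv(T_cal.T @ T_cal)
    # Row-wise quadratic form t @ G @ t.T for every sample
    h_cal = np.einsum("ij,jk,ik->i", T_cal, G, T_cal)
    h_new = np.einsum("ij,jk,ik->i", T_new, G, T_new)
    return h_cal, h_new

# Hypothetical two-factor scores
T_cal = np.array([[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0], [0.0, -1.0]])
T_new = np.array([[0.5, 0.5], [2.0, 0.0]])

h_cal, h_new = leverages(T_cal, T_new)
threshold = 3.0 * h_cal.mean()   # three times the average calibration leverage
flagged = h_new > threshold      # True marks a possible extrapolation
```

In this example every calibration sample has a leverage of 0.5, while the second prediction sample reaches 2.0. That value exceeds both the threshold of 1.5 and the upper bound of 1 that applies only to calibration samples.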

The model was applied to 99 prediction samples after preprocessing and the prediction results validated. [Pg.167]

Sample Leverage A plot of the leverage for the prediction samples is shown in Figure 5.124. All of the samples have leverage values less than three times the average leverage from the calibration (denoted by the horizontal line). This is an indication that the model is not being used to extrapolate. [Pg.167]

The determined in the calibration phase is shown by the horizontal line. All are less, indicating that the prediction samples... [Pg.167]

Measurement Residual Plot The spectral residuals for the prediction samples are shown in Figure 5.126 where the horizontal lines indicate the range of residuals from the model validation phase. Except for one sample, the magnitude and shape of the residuals are similar to the model validation residuals, which is consistent with the observed values. The sample with the unusual residual shape has an of 2.9 (sample 60). Although the shape appears unusual, the magnitude (which is what measures) is not. In practice, we would note this occurrence and accept the predicted value as being accurate because of the small magnitude of the residuals. [Pg.167]

FIGURE 5.125. values for the prediction samples, with F. from calibration shown as a horizontal line. [Pg.168]

FIGURE 5.126. Spectral residuals for all prediction samples, with the horizontal lines indicating the range of the calibration residuals. [Pg.168]

The reason for these misclassifications is that the training sample residuals for the rank two SIMCA model are larger than those for the rank three model. Therefore, prediction samples can have a larger residual and still be considered to be members of class C. [Pg.263]

The scores plot shows the location of the prediction sample relative to the training set samples. The plots are two- or three-dimensional representations of the row space. [Pg.266]

Statistical Prediction Errors (Model and Sample Diagnostic) Uncertainties in the concentrations can be estimated because the predicted concentrations are regression coefficients from a linear regression (see Equations 5.7-5.10). These are referred to as statistical prediction errors to distinguish them from simple concentration residuals (c − ĉ). The statistical prediction errors are calculated for one prediction sample as... [Pg.281]

The statistical prediction errors for the unknowns are compared to the maximum statistical prediction error found from model validation in order to assess the reliability of the prediction. Prediction samples which have statistical prediction errors that are significantly larger than this criterion are investigated further. In the model validation, the maximum error observed for component A is 0.025 (Figure 5.11a) and 0.019 for component B (Figure 5.11b). For unknown 1, the statistical prediction errors are within this range. For the other unknowns, the statistical prediction errors are much larger. Therefore, the predicted concentrations should not be considered valid. [Pg.287]

Suppose that another series of samples is to be predicted using the DCLS model. Figure 5.24 displays the spectral residuals from the predictions. A problem is indicated because the residuals do not resemble those from the validation data (see Figure 5.18). Figure 5.25 displays the spectra of the unknown samples, which reveal a random linear baseline with variable offset. It is not obvious from the residuals that this is the problem, because CLS attempts to fit the baseline feature with the pure spectra. When unexpected features appear in the spectra of the prediction samples, CLS compensates by overestimating the concentration of one or more of the pure components. The result is that the residuals have features that come from the pure spectra as well as some remnant of the original unexpected features. This makes interpretation of the residual spectra difficult. [Pg.288]
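The compensation effect can be reproduced with a small synthetic example. The sketch assumes two non-overlapping pure spectra and a sloping baseline, both chosen only to keep the arithmetic easy to follow; none of the numbers come from the source.

```python
import numpy as np

# Hypothetical pure-component spectra at six wavelengths (rows of K)
K = np.array([[1.0, 2.0, 1.0, 0.0, 0.0, 0.0],
              [0.0, 0.0, 0.0, 1.0, 2.0, 1.0]])
c_true = np.array([0.5, 0.3])

# Unmodelled linear baseline with an offset, added to the prediction spectrum
baseline = 0.10 + 0.02 * np.arange(6)
r = c_true @ K + baseline

# CLS prediction: least-squares fit of r using the pure spectra only
c_hat, *_ = np.linalg.lstsq(K.T, r, rcond=None)

# Spectral residual left over after the fit
residual = r - c_hat @ K
```

Both concentrations are overestimated (0.58 and 0.42 instead of 0.5 and 0.3) because the pure spectra absorb part of the baseline. The residual is then the baseline minus scaled copies of the pure spectra, not the baseline itself, which is why it is hard to interpret.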

The predicted caustic concentrations for 99 prediction samples are plotted in Figure 5.59. [Pg.304]

Summary of Prediction Diagnostic Tools for ICLS, Example 2 Based on the prediction diagnostics, the conclusion is that the predicted values for 98 of 99 prediction samples are reasonable. Based on the range of validation concentration residuals (see Figure 5.56), the errors in the predicted caustic concentrations of the unknowns are expected to be within 0.17 wt.% corresponding to an RMSEP of 0.06 wt.%. [Pg.305]
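For reference, RMSEP (root mean square error of prediction) is conventionally the square root of the mean squared difference between predicted and reference values over a validation set. A short sketch with hypothetical numbers, not the data behind the figures quoted above:

```python
import numpy as np

def rmsep(c_predicted, c_reference):
    """Root mean square error of prediction over a set of validation samples."""
    c_predicted = np.asarray(c_predicted, dtype=float)
    c_reference = np.asarray(c_reference, dtype=float)
    return float(np.sqrt(np.mean((c_predicted - c_reference) ** 2)))

# Hypothetical validation results in wt.%
c_ref = np.array([1.0, 2.0, 3.0])
c_pred = np.array([1.0, 2.0, 5.0])
value = rmsep(c_pred, c_ref)   # sqrt((0 + 0 + 4) / 3)
```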

Because the data are synthetic, it is known that four additional prediction samples are problematic. Samples 3 and 10 have twice and five times the noise level of the other samples, respectively. Additionally, samples 15 and 20 are shifted by 0.4 and 0." units relative to the correct position. To show the... [Pg.317]

The predicted caustic concentrations for 99 prediction samples are plotted in Figure 5.85. The first derivative over the entire wavelength region is calculated before performing a prediction using the three-variable model. [Pg.322]

Big Chemical Encyclopedia
