Model diagnostics

The model diagnostic tools suggest the inherent dimensionality of this data set is 2. With two principal components, the percent variance described is 99.4%, and the residuals appear reasonably random and are small in magnitude. The RMSECV PCA is not revealing, but also does not contradict this conclusion. [Pg.52]

Percent Variance Plot (Model Diagnostic) The percent variance described by each factor shown in Figure 4.35 reveals that more than 92% of the variance is described by the first principal component. Without further investigation it is not clear whether other principal components are also significant. [Pg.57]
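As an illustration of how a percent variance plot can be produced, the sketch below computes the percent variance described by each principal component from the singular values of a mean-centered data matrix. The function name, the synthetic data, and the choice of mean-centering are assumptions made for this example; the source does not state which preprocessing was used.

```python
import numpy as np

def percent_variance(X):
    """Percent of total variance described by each principal component."""
    Xc = X - X.mean(axis=0)                  # mean-center each variable (assumed preprocessing)
    s = np.linalg.svd(Xc, compute_uv=False)  # singular values, largest first
    var = s ** 2                             # proportional to the variance along each PC
    return 100.0 * var / var.sum()

# Hypothetical data: 20 samples, 6 variables, two underlying factors plus small noise
rng = np.random.default_rng(0)
scores = rng.normal(size=(20, 2)) * [10.0, 3.0]
loadings = rng.normal(size=(2, 6))
X = scores @ loadings + 0.01 * rng.normal(size=(20, 6))

pct = percent_variance(X)
cumulative = np.cumsum(pct)  # cumulative percent variance versus number of PCs
```

Plotting `pct` (or `cumulative`) against the component number gives the kind of plot discussed here. A large first value alone does not settle the rank, which is why the other diagnostics are consulted.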

Classification Table (Model Diagnostic) The classification table can aid in determining which classes are not well separated. It summarizes how the samples from each of the classes are classified for a given value of K. Table 4.7 is the Classification Table for this example when considering three nearest neighbors. The first row indicates that the 10 samples in class A are all classified as belonging to class A. Similarly, the second row indicates that all 10 class... [Pg.64]
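A classification table of this kind can be tallied from the true and predicted class labels. The following is a minimal sketch; the function name and the labels are hypothetical and are not the data behind the table in the source.

```python
from collections import Counter

def classification_table(true_labels, predicted_labels):
    """Rows are true classes, columns are predicted classes, entries are counts."""
    classes = sorted(set(true_labels) | set(predicted_labels))
    counts = Counter(zip(true_labels, predicted_labels))
    return {t: {p: counts[(t, p)] for p in classes} for t in classes}

# Hypothetical labels: 10 samples in each of two classes, one class B sample misassigned
true_labels = ["A"] * 10 + ["B"] * 10
predicted = ["A"] * 10 + ["B"] * 9 + ["A"]

table = classification_table(true_labels, predicted)
for true_class, row in table.items():
    print(true_class, row)
```

Off-diagonal counts point to the pairs of classes that are not well separated for the chosen K.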

Habit 4. Examine the Results/Validate the Model. Incorrect Classification Plot (Model Diagnostic) The results from the leave-one-out KNN classification as a function of K are shown in Figure 4.54. The calculations were performed for K = 1-3 because the class with the smallest number of samples (PP) has three members. For any of these values of K, the classifications are always correct for all 29 samples. This implies that the classes are well separated. [Pg.69]
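The leave-one-out KNN calculation behind an incorrect classification plot can be sketched as follows. This is an illustrative version under stated assumptions: Euclidean distance, majority voting with ties broken in favor of the nearer neighbor, and a small hypothetical two-variable data set whose smallest class has three members. None of these details are confirmed by the source.

```python
from collections import Counter
import math

def loo_knn_errors(samples, labels, k_values):
    """Count leave-one-out misclassifications for each value of K."""
    errors = {}
    for k in k_values:
        wrong = 0
        for i, (x, true_label) in enumerate(zip(samples, labels)):
            # Distances from the left-out sample to every other sample
            others = [
                (math.dist(x, samples[j]), labels[j])
                for j in range(len(samples)) if j != i
            ]
            others.sort(key=lambda pair: pair[0])
            votes = Counter(label for _, label in others[:k])
            # most_common keeps first-seen order for equal counts,
            # so a tie goes to the class of the nearer neighbor
            predicted = votes.most_common(1)[0][0]
            wrong += predicted != true_label
        errors[k] = wrong
    return errors

# Hypothetical, well-separated classes; the smallest class ("PP") has three members
samples = [(0, 0), (0, 1), (1, 0), (1, 1),
           (10, 10), (10, 11), (11, 10), (11, 11),
           (20, 0), (20, 1), (21, 0)]
labels = ["A"] * 4 + ["B"] * 4 + ["PP"] * 3

errors = loo_knn_errors(samples, labels, k_values=[1, 2, 3])
print(errors)
```

Plotting the values of `errors` against K gives the incorrect classification plot. Zero errors at every K is the pattern described for well-separated classes.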

Percent Variance Plot (Model Diagnostic) The percent variances described for the entire training set are shown in Table 4.17. These results indicate that the three classes occupy three or four dimensions in row space. Further examination of the other diagnostics will help refine this estimate. [Pg.77]

Root Mean Square Error of Cross Validation for PCA Plot (Model Diagnostic) Figure 4.83 shows that the RMSECV PCA decreases significantly after the first and second PCs are added, but the decrease is much smaller when additional PCs are added. This implies that a two-component PCA model is appropriate. [Pg.89]

Root Mean Square Error of Prediction (RMSEP) (Model Diagnostic) The RMSEP is another diagnostic for examining the errors in the predicted concentrations. While the statistical prediction error discussed earlier quantifies preci-... [Pg.105]

Root Mean Square Error of Prediction (RMSEP) (Model Diagnostic) The RMSEP values for all four components are numerically summarized in Table 5.6. They are large owing to the bias in the predictions. Several reasons for this bias can be proposed, including an inaccurate reference method, transcription errors, poor experimental procedures, changes in density and/or pathlength, light scatter in the instrument or sample, chemical interactions,... [Pg.113]
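To make the link between RMSEP and bias concrete, the sketch below computes both from predicted and reference concentrations. It assumes the common definition in which RMSEP is the square root of the mean squared prediction error; the function name and the numbers are hypothetical.

```python
import math

def rmsep_and_bias(predicted, reference):
    """Return RMSEP, bias (mean error), and the spread of errors about the bias."""
    errors = [p - r for p, r in zip(predicted, reference)]
    n = len(errors)
    bias = sum(errors) / n
    rmsep = math.sqrt(sum(e * e for e in errors) / n)
    # Spread of the errors around the bias (population form, divides by n)
    spread = math.sqrt(sum((e - bias) ** 2 for e in errors) / n)
    return rmsep, bias, spread

# Hypothetical concentrations with a consistent positive offset in the predictions
reference = [1.0, 2.0, 3.0, 4.0]
predicted = [1.6, 2.4, 3.5, 4.5]

rmsep, bias, spread = rmsep_and_bias(predicted, reference)
print(rmsep, bias, spread)
```

With these definitions, the squared RMSEP equals the squared bias plus the squared spread, so a large bias inflates the RMSEP even when the predictions are precise.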

Estimated Pure Spectra Plot (Model Diagnostic) Figure 5.34 shows the three estimated pure spectra for this example. These are evaluated given... [Pg.115]

Calibration Measurement Residual Plot (Model Diagnostic) After the pure spectra are estimated (Ŝ), they are used with the original C matrix to generate estimates of the mixture spectra (R̂ = CŜ). These are then used to calculate a calibration residual matrix, which contains the portion of the mixture spectra that is not fit by the estimated pures (Equation 5.18). [Pg.116]
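The calculation described here can be sketched with ordinary least squares: estimate the pure spectra from the concentration matrix and the mixture spectra, reconstruct the mixture spectra, and subtract. The matrix sizes, the synthetic data, and the variable names (`S_hat`, `R_hat`, `E`) are assumptions for illustration; Equation 5.18 itself is not reproduced in this excerpt.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical calibration set: 8 mixtures, 3 components, 50 spectral variables
C = rng.uniform(0.1, 1.0, size=(8, 3))            # concentration matrix
S_true = np.abs(rng.normal(size=(3, 50)))         # "true" pure spectra (synthetic)
R = C @ S_true + 0.001 * rng.normal(size=(8, 50)) # measured mixture spectra

# Least-squares estimate of the pure spectra: minimizes ||R - C S||
S_hat, *_ = np.linalg.lstsq(C, R, rcond=None)

R_hat = C @ S_hat  # estimated mixture spectra
E = R - R_hat      # calibration residual matrix: the part of R not fit by the pures
```

Plotting the rows of `E` against the spectral variable gives a calibration measurement residual plot. Residuals that are large or structured, rather than small and random, suggest a problem with the model or with the concentration values.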

Estimated Pure Spectra Plot (Model Diagnostic) The estimated pure spectra shown in Figure 5.45 reveal a caustic pure spectrum that has a negative peak. With NIR the pure spectra are expected to have positive bands. However, in this example, this expectation is not reasonable because peak perturbations are being modeled and a first derivative has been used. [Pg.120]

Uncertainty in Pure Spectra (Model Diagnostic) The caustic spectrum and uncertainties plotted in Figure 5.46 reveal large uncertainties throughout the spectral region. The uncertainties in the other pure-component spectra show similar results. [Pg.120]

Calibration Measurement Residual Plot (Model Diagnostic) The magnitude of the calibration spectral residuals shown in Figure 5.47 is large relative to the original preprocessed data (Figure 5.44), and they also have nonrandom features. These observations indicate a potential problem with the... [Pg.120]

Model and Parameter Statistics (Model Diagnostic) Table 5.13 displays the variables selected for a model constructed to predict caustic. The table lists summary statistics for the regression model as well as information about the estimated regression coefficients. Six variables in addition to an intercept are found to be significant at the 95% confidence level. [Pg.140]

Root Mean Square Error of Calibration (RMSEC) Plot (Model Diagnostic) The RMSEC as a function of the number of variables included in the model is shown in Figure 5.77. It decreases as variables are added to the model, and the largest decrease is observed between a one- and two-variable model. The reported error in the reference caustic concentration is approximately 0.033 wt.% (1σ). The tentative conclusion is that four variables are appropriate because the RMSEC is less than the reference concentration error after five variables are included in the model. [Pg.140]

Root Mean Square Error of Prediction (RMSEP) Plot (Model Diagnostic) The validation set is employed to determine the optimum number of variables to use in the model based on prediction (RMSEP) rather than fit (RMSEC). RMSEP as a function of the number of variables is plotted in Figure 5.78 for the prediction of the caustic concentration in the validation set. The curve levels off after three variables and the RMSEP for this model is 0.053. This value is within the requirements of the application (1σ = 0.1) and is not less than the error in the reported concentrations. [Pg.140]

Several diagnostic tools are discussed below and a summary is found at the end of the section in Table 5.18. These tools are used to investigate three aspects of the data set: the model, the samples, and the variables. The headings for each tool indicate the aspects that are studied with that tool. The primary use of the model diagnostic tools is to determine the optimum rank of the model. The sample diagnostic tools are used to study the relationships between the samples and identify unusual samples. The variable diagnostic tools do the same, but for the variables. [Pg.148]

Percent Variance Table (Model Diagnostic) The first diagnostic is the percent variance explained for both the concentration and the measurement data. The results for this example are listed in Table 5.15 as a function of the number of factors included in the model. The percent variance explained is the amount of variance explained by a model with a given number of factors relative to the total variance in the data set. [Pg.148]

Percent Variance Table (Model Diagnostic) The percent variance explained for the corrected data is shown in Table 5.17. Comparing these results to those found in Table 5.15, a more continuous behavior of the percent variance explained is seen for the concentration data as a function of factor number. With three factors, almost all of the variance in the concentration data has been explained and there is just a small amount of spectral variance remaining. The results are reasonable given the fact that three chemical components are known to be varying in the samples. [Pg.154]

Root Mean Square Error of Prediction (RMSEP) Plot (Model Diagnostic) The new RMSEP plot in Figure 5.100 is better behaved than the plot shown in Figure 5.93 (with the incorrect spectrum 3). A minimum is found at 3 factors with a corresponding RMSEP that is almost two orders of magnitude smaller than the minimum in Figure 5.93. The new RMSEP plot shows fairly ideal behavior, with a sharp decrease in RMSEP as factors are added and then a slight increase when more than three factors are included. [Pg.154]

A six-factor PLS model was found to be optimal based on the model diagnostic tools. The measures of performance for the final model are as follows (see Table 5.18 for a description of these figures of merit) ... [Pg.167]

One strength is the excellent model-diagnostic capabilities. These diagnostics help assess the confidence that can be placed in the model. [Pg.173]

Root Mean Square Error of Cross Validation for PCA Plot (Model Diagnostic) As described above, the residuals from a standard PCA calculation indicate how the PCA model fits the samples that were used to construct the PCA model. Specifically, they are the portion of the sample vectors that is not described by the model. Cross-validation residuals are computed in a different manner. A subset of samples is removed from the data set and a PCA model is constructed. Then the residuals for the left-out samples are calculated (cross-validation residuals). The subset of samples is returned to the data set and the process is repeated for different subsets of samples until each sample has been excluded from the data set one time. These cross-validation residuals are the portion of the left-out sample vectors that is not described by the PCA model constructed from an independent sample set. In this sense they are like prediction residuals (vs. fit). [Pg.230]
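The procedure described above can be sketched for the leave-one-out case, where each subset is a single sample. This is one simple reading of the text, not necessarily the exact calculation used in the source: the left-out sample is centered with the mean of the remaining samples, projected onto their loadings, and the squared residuals are pooled and normalized by the number of matrix elements. The function name, the normalization, and the synthetic data are assumptions.

```python
import numpy as np

def rmsecv_pca(X, max_pcs):
    """Leave-one-out RMSECV for PCA models with 1..max_pcs components."""
    n, p = X.shape
    press = np.zeros(max_pcs)  # pooled squared cross-validation residuals
    for i in range(n):
        train = np.delete(X, i, axis=0)  # remove one sample
        mean = train.mean(axis=0)
        # Loadings (rows of Vt) of the PCA model built without the sample
        _, _, Vt = np.linalg.svd(train - mean, full_matrices=False)
        left_out = X[i] - mean
        for a in range(1, max_pcs + 1):
            P = Vt[:a]
            # Portion of the left-out sample not described by the a-component model
            residual = left_out - (left_out @ P.T) @ P
            press[a - 1] += residual @ residual
    return np.sqrt(press / (n * p))

# Hypothetical data: 12 samples, 5 variables, two underlying factors plus small noise
rng = np.random.default_rng(2)
scores = rng.normal(size=(12, 2)) * [5.0, 2.0]
loadings = rng.normal(size=(2, 5))
X = scores @ loadings + 0.001 * rng.normal(size=(12, 5))

rmsecv = rmsecv_pca(X, max_pcs=4)
print(rmsecv)
```

For data like these, the values drop sharply through the second component and then level off near the noise level, which is the pattern used to support a two-component model.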

Incorrect Classification Plot (Model Diagnostic) The first diagnostic to examine is a plot of incorrect classifications versus the number of nearest neighbors, shown in Figure 4.46. [Pg.242]

Classification Table (Model Diagnostic) The classification table also indicates that the classes are clearly distinguishable using these measurements (see Table 4.14). This table is identical whether K = 1, 2, or 3. [Pg.248]

PCA of Class B—Percent Variance Plot (Model Diagnostic) The first principal component describes 99.15% of the variance, the second describes 0.85%, and the third describes less than 0.01%. Assuming the noise in the data is measured to be greater than or equal to 0.01% of the variation, one would infer that these data lie on a two-dimensional plane. [Pg.254]

Root Mean Square Error of Cross-Validation for PCA Plot (Model Diagnostic) Figure 4.63 displays the RMSECV PCA vs. number of principal components for the class B data from a leave-one-out cross-validation calculation. The RMSECV PCA quickly drops and levels off at two principal components, consistent with the choice of a rank two model. [Pg.254]

PCA of TEA—Percent Variance Plot (Model Diagnostic) For the TEA class the first through fourth PCs describe 97.3%, 2.2%, 0.2%, and 0.1% of the variation, respectively. This suggests that a rank of two is appropriate, assuming the noise in the data is more than 0.2% of the variance. [Pg.267]

PCA of MEK—Percent Variance (Model Diagnostic) The first through fourth PCs of the MEK data describe 99.0%, 0.4%, 0.3%, and 0.1% of the variation, respectively. Assuming that the noise is greater than 0.4% of the variation, a one-component PCA model may be appropriate. [Pg.268]
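The noise-threshold reasoning used for the TEA and MEK classes can be written as a simple count: keep the principal components whose percent variance is strictly above the assumed noise level. The function name is hypothetical, and setting the threshold exactly at the stated boundary (0.2% and 0.4%) is an assumption made so that the example reproduces the ranks quoted in the text.

```python
def rank_from_noise(percent_variance, noise_percent):
    """Count the PCs whose percent variance is strictly above the noise level."""
    return sum(1 for value in percent_variance if value > noise_percent)

# Percent variances quoted in the text for the first four PCs of each class
tea = [97.3, 2.2, 0.2, 0.1]
mek = [99.0, 0.4, 0.3, 0.1]

# Noise levels set at the boundaries named in the text (an assumption)
rank_tea = rank_from_noise(tea, noise_percent=0.2)
rank_mek = rank_from_noise(mek, noise_percent=0.4)
print(rank_tea, rank_mek)
```

The answer depends entirely on the assumed noise level. For example, a noise level of 0.15% would raise the MEK estimate from one component to three.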

Uncertainty in Pure Spectra (Model Diagnostic) The pure-component spectra are estimated from a standard multiple linear regression calculation (Equation 5.16) and, therefore, error estimates are available. The error estimates for all pure spectra at variable are shown in Equation 5.17 ... [Pg.294]

Uncertainty in Pure Spectra (Model Diagnostic) The caustic pure spectrum uncertainties shown in Figure 5.52 are smaller than with the previous model (Figure 5.46). [Pg.301]

Calibration Measurement Residuals Plot (Model Diagnostic) The calibration spectral residuals shown in Figure 5.53 are still structured, but are a factor of 4 smaller than the residuals when temperature was not part of the model. Comparing with Figure 5.51, the residual structure resembles the estimated pure spectrum of temperature. Recall that the calibration spectral residuals are a function of model error as well as errors in the concentration matrix (see Equation 5.18). Either of these errors can cause nonrandom features in the spectral residuals. The temperature measurement is less precise than the chemical concentrations and, therefore, the hypothesis is that the structure in the residuals is due to temperature errors rather than an inadequacy in the model. [Pg.301]
