SIMCA model

M. Forina, G. Drava and G. Contarini, Feature selection and validation of SIMCA models: a case study with a typical Italian cheese. Analusis 21 (1993) 133-147. [Pg.241]

A SIMCA model is actually an assembly of J class-specific PCA models, each of which is built using only the calibration samples of a single class. Confidence levels for the Hotelling T² and Q values (recall Equations 12.21 and 12.22) can then be determined independently for each class. A SIMCA model is applied to an unknown sample by applying its analytical profile to each of the J PCA models, which generates J sets of Hotelling T² and Q statistics for that sample. Separate assessments of the unknown sample's membership in each class can then be made, based on the sample's T² and Q values and the previously determined confidence levels. [Pg.396]
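
The per-class construction and the independent T²/Q assessment can be sketched in Python with NumPy. This is a minimal illustration only. The function names (`fit_class_pca`, `t2_q`, `fit_simca`, `assess`) are hypothetical. The confidence limits are assumed to be empirical percentiles of each class's own calibration statistics, not the F- or chi-squared-based limits normally paired with Equations 12.21 and 12.22.

```python
import numpy as np

def fit_class_pca(X, rank):
    """PCA of one class: mean, first `rank` loadings, and score variances."""
    mean = X.mean(axis=0)
    _, s, vt = np.linalg.svd(X - mean, full_matrices=False)
    P = vt[:rank].T                                # (n_vars, rank) loadings
    score_var = s[:rank] ** 2 / (X.shape[0] - 1)   # variance of each score
    return {"mean": mean, "P": P, "score_var": score_var}

def t2_q(model, x):
    """Hotelling T^2 (inside the PC subspace) and Q (squared residual)."""
    xc = np.asarray(x, dtype=float) - model["mean"]
    t = xc @ model["P"]
    residual = xc - t @ model["P"].T
    return float(np.sum(t ** 2 / model["score_var"])), float(residual @ residual)

def fit_simca(classes, ranks, level=95.0):
    """One PCA model per class; limits come from that class's samples only.

    Assumption: limits are empirical percentiles of calibration T^2 and Q.
    """
    models = {}
    for name, X in classes.items():
        model = fit_class_pca(X, ranks[name])
        stats = np.array([t2_q(model, row) for row in X])
        model["t2_lim"], model["q_lim"] = np.percentile(stats, level, axis=0)
        models[name] = model
    return models

def assess(models, x):
    """Independent membership decision for each of the J class models."""
    decisions = {}
    for name, model in models.items():
        t2, q = t2_q(model, x)
        decisions[name] = bool(t2 <= model["t2_lim"] and q <= model["q_lim"])
    return decisions

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    classes = {"A": rng.normal(0.0, 1.0, (30, 5)), "B": rng.normal(10.0, 1.0, (30, 5))}
    models = fit_simca(classes, {"A": 2, "B": 2})
    print(assess(models, classes["A"].mean(axis=0)))
```

Because each class model is fitted and thresholded separately, an unknown can be accepted by several models or by none. Nothing in the code looks at between-class differences.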

Although the development of a SIMCA model can be rather cumbersome, because it involves the development and optimization of J PCA models, the SIMCA method has several distinct advantages over other classification methods. First, it can be more robust in cases where the different classes involve discretely different analytical responses, or where the class responses are not linearly separable. Second, the treatment of each class separately allows SIMCA to better handle cases where the within-class variance structure is... [Pg.396]

Although the SIMCA method is very versatile, and a properly optimized model can be very effective, one must keep in mind that this method does not use, or even calculate, between-class variability. This can be problematic in special cases where there is strong natural clustering of samples that is not relevant to the problem. In such cases, the inherent interclass distance can be rather low compared to the intraclass variation, which makes the classification problem very difficult. Furthermore, from a practical viewpoint, the SIMCA method requires that one obtain sufficient calibration samples to fully represent each of the J classes. Also, the on-line deployment of a SIMCA model requires a fair amount of overhead, due to the relatively large number of parameters and somewhat complex data processing instructions required. However, there are several current software products that facilitate SIMCA deployment. [Pg.397]

SIMCA modeling was used to determine the separability of the samples collected at the three different sites. The results presented in Table IV indicate that the model cannot separate the samples from the West Seattle and Maple Leaf sites. Because both of these sites are located downwind of the major regional emission sources and experience similar meteorology, their rainwater composition is similar. The Tolt reservoir site is separated from the Seattle sites, with 79 percent of the samples collected there correctly classified by the SIMCA model. This site is believed to be influenced by the same emission sources as the other two sites. However, it experiences different meteorological conditions (primarily longer transport times and more frequent and larger quantities of rainfall) due to its location in the foothills of the Cascade Mountains (elevation 550 meters). Considering the uncertainty in the reported concentrations (see Table VII) and the similar air pollution emission sources, the SIMCA results are reasonable. [Pg.42]

A software package is used to calculate the SIMCA models. The steps we use to generate the SIMCA models are listed in Table 4.16. [Pg.75]

Steps 1 and 2 are discussed in detail in Sections 4.2.1 and 4.2.2 (PCA and HCA). In Step 3, the training set is divided into calibration and test sets to facilitate the estimation of the SIMCA models. Typically, we leave more than half of the data in the calibration set. It is also good practice to repeat the calibration procedure in Table 4.16 with different selections of calibration and test sets. An alternative to separate test sets is to implement some form of cross-validation. For example, leave-one-out cross-validation can be performed, where each sample is left out and predicted one at a time. [Pg.75]
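
Both ways of holding out samples can be sketched in plain Python. The split is done within each class so that every class keeps calibration samples, which SIMCA requires. The function names and the 60% default are illustrative assumptions; the text only says that more than half of the data stays in the calibration set.

```python
import random

def split_by_class(samples_by_class, cal_fraction=0.6, seed=None):
    """Randomly split each class into calibration and test subsets."""
    if not 0.5 < cal_fraction < 1.0:
        raise ValueError("keep more than half of each class for calibration")
    rng = random.Random(seed)
    calibration, test = {}, {}
    for name, samples in samples_by_class.items():
        items = list(samples)
        rng.shuffle(items)
        n_cal = max(1, round(cal_fraction * len(items)))
        calibration[name], test[name] = items[:n_cal], items[n_cal:]
    return calibration, test

def leave_one_out(samples):
    """Yield (training subset, held-out sample) pairs, one per sample."""
    items = list(samples)
    for i, held_out in enumerate(items):
        yield items[:i] + items[i + 1:], held_out

if __name__ == "__main__":
    data = {"A": list(range(10)), "B": list(range(10, 15))}
    for seed in (1, 2):  # repeat with different selections, as recommended
        cal, test = split_by_class(data, seed=seed)
        print(seed, cal, test)
```

Calling `split_by_class` with several seeds gives the repeated calibration/test selections described above. `leave_one_out` is the cross-validation alternative.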

Using the calibration set, construct SIMCA models for each class with initial settings for rank and boundary distance. [Pg.75]

Predict the class of the test set samples using the initial SIMCA models. [Pg.75]

Using the combined calibration and test sets, construct final SIMCA models for each class using the rank and boundary settings determined in steps 4-6. [Pg.75]

When the PCA analysis is completed and the calibration and validation sets are chosen, the next step is to create SIMCA models for the calibration set samples. The initial rank estimates from PCA and software default class volumes are used, the performance of the models on the test samples is examined, and the SIMCA settings are adjusted as necessary. [Pg.80]

There are many results to review because SIMCA models are constructed and validated for multiple classes. The order in which to examine the results is a matter of preference, and many approaches are equally appropriate. We will review one SIMCA model at a time and examine the test set predictions for that model against samples from all classes. Ideally, a SIMCA model includes as part of the class those samples that truly belong to the class and excludes those samples that come from all of the other classes. In reality, a number of classification scenarios are possible. Table 4.18 lists the possibilities along with possible root causes for misclassified test samples. [Pg.80]
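
Reviewing one class model against test samples from all classes amounts to sorting each prediction into one of four outcomes. The sketch below is minimal: the outcome labels are illustrative wording, and the root causes tabulated in the text are not reproduced.

```python
from collections import Counter

def tally_outcomes(model_class, true_labels, included):
    """Count the four possible outcomes for one class model on a test set.

    true_labels: known class of each test sample.
    included: True where the model placed that sample inside its class.
    """
    counts = Counter()
    for label, inside in zip(true_labels, included):
        if label == model_class:
            counts["correctly included" if inside else "incorrectly excluded"] += 1
        else:
            counts["incorrectly included" if inside else "correctly excluded"] += 1
    return counts

if __name__ == "__main__":
    print(tally_outcomes("A", ["A", "A", "B", "C"], [True, False, True, False]))
```

Only the two "incorrectly" counts point to test samples whose root cause needs investigating.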

Class A SIMCA Model Validation: Values (Model and Sample Diagnostic). The most direct way to evaluate the performance of the SIMCA models is to examine how well the models classified the samples from all of... [Pg.80]

All of the class C validation samples are predicted to be outside of the class B SIMCA model. The minimum values are on the order of 100, and the maximum value is over 3000. Based on these results, there does not appear to be any overlap of classes C and B. [Pg.82]

Figure 4.75 SIMCA model validation for Class C. The letters indicate the class in which the validation sample is a member.

All of the class A samples are correctly excluded from class C. For this scenario, the size of the box for the class C SIMCA model is increased and the resulting values for the class A test samples are examined. [Pg.84]

For this example, the size of the box is manipulated to illustrate a point. In general, the size of the box is set such that all samples are correctly included and excluded during the validation phase. If the SIMCA model using a default box size includes samples from other classes, smaller boxes can be used in an attempt to exclude these samples,... [Pg.84]
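
The effect of enlarging or shrinking the box can be sketched by scaling the box half-widths in score space. Several assumptions apply:

- The box is built from the range of the calibration scores on each retained PC, centered on the midpoint of that range.
- The default box definition and the residual part of the SIMCA test used by the software in the text are not reproduced.
- `inside_box` is a hypothetical helper.

```python
import numpy as np

def inside_box(scores, cal_scores, scale=1.0):
    """True where a sample's scores fall inside the (scaled) class box.

    Assumption: the box spans the calibration score range on each PC;
    `scale` > 1 enlarges it and `scale` < 1 shrinks it about its midpoint.
    """
    lo, hi = cal_scores.min(axis=0), cal_scores.max(axis=0)
    mid, half = (lo + hi) / 2.0, scale * (hi - lo) / 2.0
    return np.all(np.abs(scores - mid) <= half, axis=1)

if __name__ == "__main__":
    cal = np.array([[-1.0, -1.0], [1.0, 1.0], [0.0, 0.0]])
    test = np.array([[0.5, 0.5], [1.5, 0.0], [3.0, 3.0]])
    for scale in (0.5, 1.0, 2.0, 4.0):
        print(scale, inside_box(test, cal, scale))
```

Sweeping `scale` shows how samples from other classes can be pulled into a class (larger box) or pushed out of it (smaller box).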

The final step of constructing the SIMCA models is to merge the calibration and test samples for each of the classes and reconstruct new SIMCA models using all of the data. The rank and boundary parameters determined in Habit 4 are used for the final models. These models are used to predict the class(es) of unknown samples. Table 4.24 contains the values for three unknown samples, where the empirically determined critical value is 1.6. From these values, the conclusions are that unknown 1 is not a member of any class in the training set, unknown 2 is a member of class B, and unknown 3 is a member of both classes A and B. [Pg.85]
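
The decision rule assigns an unknown to every class whose value falls below the critical value, and it can be sketched in a few lines. The numbers in the example are hypothetical, not the values of Table 4.24. They only reproduce the three kinds of outcome: no class, one class, and two classes.

```python
def class_memberships(values, critical):
    """Map each unknown to the sorted list of classes whose value is below `critical`."""
    return {
        unknown: sorted(cls for cls, v in per_class.items() if v < critical)
        for unknown, per_class in values.items()
    }

if __name__ == "__main__":
    hypothetical = {  # illustrative values only
        "unknown 1": {"A": 4.2, "B": 7.9, "C": 12.0},
        "unknown 2": {"A": 35.0, "B": 0.8, "C": 9.5},
        "unknown 3": {"A": 1.1, "B": 1.4, "C": 6.3},
    }
    for unknown, classes in class_memberships(hypothetical, critical=1.6).items():
        print(unknown, classes or "no class")
```

Because each class is tested independently, an empty list or a multi-class list is a legitimate result, not an error.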

The calculated values have already validated the predictions to some extent. A very large calculated value (e.g., unknown 2 on class A has a calculated value of 11,000) indicates a very large difference between the unknown sample and the calibration samples from that class. Unknown 1 has large calculated values for all of the classes, with most of the contribution coming from the PCA measurement residual. Unknown 2 is within the box of all SIMCA models but has large PCA contributions for classes A and C. Unknown 3 is excluded only from class C, primarily because of a large contribution from the distance term (the expected value is zero). [Pg.85]
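
The split between the PCA measurement residual and the distance from the box can be sketched as two separate terms. This is an illustration under assumptions:

- The box is taken as per-PC score bounds.
- Both terms are squared Euclidean quantities.
- How the software in the text combines and scales them into a single value is not reproduced.

The distance term is zero for any sample whose scores fall inside the box, which matches an expected value of zero for class members.

```python
import numpy as np

def contributions(x, mean, loadings, box_lower, box_upper):
    """Return (residual term, distance term) for one sample and one class model.

    residual term: squared norm of the part of x not described by the PCs.
    distance term: squared distance of the scores from the box (0 inside it).
    """
    xc = np.asarray(x, dtype=float) - mean
    scores = xc @ loadings
    residual = xc - scores @ loadings.T
    below = np.maximum(box_lower - scores, 0.0)   # how far under each lower bound
    above = np.maximum(scores - box_upper, 0.0)   # how far over each upper bound
    outside = below + above
    return float(residual @ residual), float(outside @ outside)

if __name__ == "__main__":
    mean, P = np.zeros(3), np.eye(3)[:, :2]       # toy rank-2 model in 3 variables
    lo, hi = np.array([-1.0, -1.0]), np.array([1.0, 1.0])
    print(contributions([0.5, 0.5, 0.0], mean, P, lo, hi))
    print(contributions([3.0, 0.0, 2.0], mean, P, lo, hi))
```

Reporting the two terms separately is what lets one say whether a rejection comes mainly from the residual (as for unknown 1) or from the box distance (as for unknown 3).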

Measurement Residual Plot. There are residual plots for each unknown sample for every SIMCA model. The residual spectra for samples that belong to a class are expected to resemble, in magnitude and shape, the normally distributed noise found in the training set. Depending on the structure of the residuals, it may be possible to identify failures in the instrument (e.g., excessive noise) or chemical differences between the calibration and unknown samples (e.g., peaks in the residuals). The residual plot may help identify why a sample is not classified into any given class. [Pg.85]
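
Computing a residual spectrum and checking it against the training-set noise can be sketched as below. Two parts of the sketch are assumptions: the threshold of k times the training noise standard deviation (default k = 4) is an arbitrary illustrative choice, and the helper names are hypothetical.

```python
import numpy as np

def residual_spectrum(x, mean, loadings):
    """Part of the sample spectrum not reproduced by the class PCA model."""
    xc = np.asarray(x, dtype=float) - mean
    return xc - (xc @ loadings) @ loadings.T

def noise_ratio(residual, noise_sd):
    """RMS of the residual relative to training-set noise (roughly 1 for members)."""
    return float(np.sqrt(np.mean(residual ** 2)) / noise_sd)

def flag_channels(residual, noise_sd, k=4.0):
    """Indices of channels whose residual exceeds k times the noise SD (possible peaks)."""
    return np.flatnonzero(np.abs(residual) > k * noise_sd)

if __name__ == "__main__":
    mean, P = np.zeros(5), np.eye(5)[:, :1]      # toy rank-1 model in 5 channels
    r = residual_spectrum([2.0, 0.01, -0.01, 0.9, 0.0], mean, P)
    print(r, noise_ratio(r, 0.01), flag_channels(r, 0.01))
```

A uniformly high `noise_ratio` with no flagged channels hints at excessive instrument noise. A few flagged channels suggest a chemical difference, such as a peak absent from the calibration samples.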

The procedure from Table 4.16 was followed to construct the SIMCA models. Habit 4. Examine the Results / Validate the Model... [Pg.88]

The PCA results for the two individual classes are examined first. The PCA of the entire training set is performed in PCA Example 2 in Section 4.2.2.2. In the current PCA analysis, the ranks and boundaries are chosen, and then the SIMCA models are constructed and validated. [Pg.88]

From the PCA analysis, it was concluded that appropriate ranks for the TEA and MEK SIMCA models are two and one, respectively. The next step is to construct SIMCA models and test their performance on validation samples. The ranks determined during the PCA analyses and the default settings for the class volume size are used for the models. [Pg.90]

To test the models, the training set is divided into calibration and validation sets, as shown in Table 4.26. The predictive ability of the TEA and MEK SIMCA models is then evaluated using samples from all 10 classes. [Pg.90]

TEA SIMCA Validation: Values (Model and Sample Diagnostic). A rank two SIMCA model is used to generate the values for the three validation samples known to belong to the TEA class (see Table 4.27). As desired, each of the samples has a value smaller than the critical value and is therefore classified as a TEA sample. [Pg.90]

Table 4.28 displays the values for the three TEA test samples as a function of the rank of the TEA SIMCA model. For a rank of one or two, the validation samples are predicted to be in the class. When the rank is three,... [Pg.91]

Because the number of samples in the calibration set is small, all of the TEA samples (calibration and validation) are used to construct a TEA SIMCA model that is tested against the validation samples from the rest of the training set... [Pg.91]

TABLE 4.29. PCA and Distance Contributions for MEK SIMCA Model and MEK Validation Samples (critical value = 1.84) ... [Pg.92]

Examining F while changing the rank confirms a rank one SIMCA model for MEK. Using rank one, the three validation samples are all predicted to be in the class; using rank two or three, all validation samples are predicted to be outside of the MEK class (see Table 4.30). [Pg.92]
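
Checking the validation predictions as the rank changes can be sketched using only the residual (Q) part of the test. This is a simplification: the limit is an assumed empirical percentile of the calibration residuals, the score-space box is ignored, and the function names are hypothetical. After a rank is chosen, the final model would be refit on the combined calibration and validation samples (for example with `np.vstack`), as in the last step of the procedure.

```python
import numpy as np

def q_values(X_cal, rank, X):
    """Squared residual of each row of X after projecting onto the first
    `rank` PCs of the mean-centered calibration data."""
    mean = X_cal.mean(axis=0)
    _, _, vt = np.linalg.svd(X_cal - mean, full_matrices=False)
    P = vt[:rank].T
    Xc = np.asarray(X, dtype=float) - mean
    E = Xc - Xc @ P @ P.T
    return np.sum(E ** 2, axis=1)

def accepted_by_rank(X_cal, X_val, ranks, level=95.0):
    """For each candidate rank, a boolean array: is each validation sample accepted?"""
    result = {}
    for rank in ranks:
        limit = np.percentile(q_values(X_cal, rank, X_cal), level)  # assumed limit
        result[rank] = q_values(X_cal, rank, X_val) <= limit
    return result

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    direction = np.array([1.0, 1.0, 0.0, 0.0]) / np.sqrt(2.0)
    X_cal = rng.normal(0, 3, (20, 1)) * direction + rng.normal(0, 0.1, (20, 4))
    center = X_cal.mean(axis=0)
    X_val = np.vstack([center, center + [10.0, -10.0, 10.0, 10.0]])
    for rank, accepted in accepted_by_rank(X_cal, X_val, [1, 2, 3]).items():
        print(rank, int(accepted.sum()), "of", len(X_val), "accepted")
```

Comparing the accepted counts across ranks is the same kind of evidence used above to settle on rank one for MEK.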

TABLE 4.30. Values for MEK Validation Samples Using SIMCA Models with Ranks One through Three... [Pg.92]

Summary of Validation Diagnostic Tools for SIMCA. From the validation analysis, it was found that the measurements and SIMCA models for TEA and MEK are adequate to distinguish these materials from the other chemicals considered in this study. The diagnostic tools also indicate that it is more difficult to discriminate between the classes at lower concentrations. [Pg.93]

The results of predictions using the two SIMCA models on four unknown samples are shown in Table 4.31. These preliminary results indicate that unknowns 1 and 4 are not members of either class, unknown 2 is MEK, and unknown 3 is TEA. [Pg.93]

Scores Plot. The scores plot is used to examine the location of the samples in the PC space. A three-dimensional PCA scores plot for the TEA model is shown in Figure 4.93. (Keep in mind that only the first two PCs were used to construct the TEA SIMCA model.) The TEA training set samples and the four unknowns are shown on this plot. Unknown 3, which was predicted to be... [Pg.93]
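
The coordinates for such a scores plot can be sketched by projecting the training samples and the unknowns onto the first three PCs of the class data, even though only two PCs define the SIMCA model. The helper name is hypothetical. Plotting (e.g., a 3-D scatter) is left out so that the sketch depends only on NumPy.

```python
import numpy as np

def pca_scores(X_train, X_new, n_pcs=3):
    """Scores of training samples and new samples on the first n_pcs PCs of X_train."""
    mean = X_train.mean(axis=0)
    _, _, vt = np.linalg.svd(X_train - mean, full_matrices=False)
    P = vt[:n_pcs].T
    return (X_train - mean) @ P, (np.asarray(X_new, dtype=float) - mean) @ P

if __name__ == "__main__":
    rng = np.random.default_rng(3)
    X_train = rng.normal(size=(12, 6))
    train_scores, new_scores = pca_scores(X_train, rng.normal(size=(4, 6)))
    print(train_scores.shape, new_scores.shape)
```

The unknowns are projected with the training mean and loadings, never refitted, so they appear in the same coordinate frame as the library samples.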

Figure 4.93. PCA of TEA SIMCA library samples (O) with unknowns (labeled with numbers). The first two principal components are used to make the TEA SIMCA model.

To obtain a good representation of the size and shape of a class, many samples per class are needed to derive the SIMCA models. [Pg.95]

To construct the multidimensional boxes, a training set of samples with known class identity is obtained. The training set is divided into separate sets, one for each class, and principal components are calculated separately for each of the classes. The number of relevant principal components (rank) is determined for each class, and the SIMCA models are completed by defining boundary regions for each of the PCA models. [Pg.251]
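
The construction described above can be sketched in four steps: separate the training set by class, run PCA on each class, keep the chosen rank, and record the range of the calibration scores as the box. The dictionary layout and the function name are assumptions made for illustration, as is the use of the raw score range (without any enlargement) as the boundary.

```python
import numpy as np

def build_simca_boxes(X, labels, ranks):
    """Per-class PCA models with box boundaries from the calibration score range.

    ranks: {class label: number of PCs to keep for that class}.
    """
    labels = np.asarray(labels)
    models = {}
    for cls, rank in ranks.items():
        Xc = X[labels == cls]                      # samples of this class only
        mean = Xc.mean(axis=0)
        _, _, vt = np.linalg.svd(Xc - mean, full_matrices=False)
        P = vt[:rank].T                            # (n_vars, rank) loadings
        scores = (Xc - mean) @ P
        models[cls] = {"mean": mean, "loadings": P,
                       "lower": scores.min(axis=0), "upper": scores.max(axis=0)}
    return models

if __name__ == "__main__":
    rng = np.random.default_rng(2)
    X = np.vstack([rng.normal(0, 1, (10, 4)), rng.normal(5, 1, (8, 4))])
    labels = ["A"] * 10 + ["B"] * 8
    for cls, m in build_simca_boxes(X, labels, {"A": 2, "B": 1}).items():
        print(cls, m["lower"], m["upper"])
```

Each class gets its own rank and box, so the classes can differ in both dimensionality and size.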
