SIMCA classification

Often the goal of a data analysis problem requires more than simple classification of samples into known categories. It is very often desirable to have a means to detect outliers and to derive an estimate of the level of confidence in a classification result. These are things that go beyond strictly nonparametric pattern recognition procedures. Also of interest is the ability to empirically model each category so that it is possible to make quantitative correlations and predictions with external continuous properties. As a result, a modeling and classification method called SIMCA has been developed to provide these capabilities (29-31). [Pg.425]

This kind of statistical consideration is used to detect outliers, i.e., when a sample does not belong to any known group. It is also the basis of a variation of SIMCA called asymmetric classification, where only one category is modelable and distinguished from all others, which spread randomly through hyperspace. This type of problem is commonly encountered in materials science, product quality, and structure-activity studies. [Pg.426]

A useful tool in the interpretation of SIMCA is the so-called Coomans plot [32]. It is applied to the discrimination of two classes (Fig. 33.18). The distance from the model for class 1 is plotted against that from the model for class 2. The critical distance is marked on each axis. In this way, one defines four zones: class 1, class 2, overlap of class 1 and 2, and neither class 1 nor class 2. By plotting objects in this plot, their classification is immediately clear. It is also easy to visualize how certain a classification is. In Fig. 33.18, object a is very clearly within class 1, object b is on the border of that class but is not close to class 2, and object c clearly belongs to neither class. [Pg.231]
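The four zones of the Coomans plot amount to two threshold comparisons, which can be sketched as follows. This is only an illustration: the function name `coomans_zone`, the critical distances and the example coordinates for objects a, b and c are hypothetical and are not taken from Fig. 33.18.

```python
def coomans_zone(d1, d2, crit1, crit2):
    """Assign an object to one of the four Coomans-plot zones.

    d1, d2       : distances of the object from the class 1 and class 2 models
    crit1, crit2 : critical distances for the two models (assumed to be given)
    """
    in_class1 = d1 <= crit1
    in_class2 = d2 <= crit2
    if in_class1 and in_class2:
        return "overlap of class 1 and 2"
    if in_class1:
        return "class 1"
    if in_class2:
        return "class 2"
    return "neither class"


# Hypothetical distances, loosely mimicking objects a, b and c in the text
objects = {"a": (0.3, 2.5), "b": (1.0, 2.2), "c": (2.4, 2.8)}
for name, (d1, d2) in objects.items():
    print(name, coomans_zone(d1, d2, crit1=1.0, crit2=1.0))
```

How far a point lies from the two critical lines gives the visual impression of how certain the assignment is; object b, which sits exactly on the class 1 limit in this sketch, is the borderline case.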

In SIMCA, we can determine the modelling power of the variables, i.e. we measure the importance of the variables in modelling the class. Moreover, it is possible to determine the discriminating power, i.e. which variables are important to discriminate two classes. The variables with both low discriminating and modelling power are deleted. This is more a variable elimination procedure than a selection procedure: we do not try to select the minimum number of features that will lead to the best classification (or prediction rate), but rather eliminate those that carry no information at all. [Pg.237]
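As a rough illustration of modelling power, the sketch below uses one definition commonly quoted in the SIMCA literature: one minus the ratio of a variable's residual standard deviation (after fitting the class PCA model) to its standard deviation before fitting. The excerpt above does not give a formula, so this definition, the degrees of freedom used, the function name `modelling_power` and the simulated data are all assumptions.

```python
import numpy as np


def modelling_power(X, n_components):
    """Per-variable modelling power for a single class (assumed definition).

    MPOW_j = 1 - s_j(residual) / s_j(raw), with the residuals taken after
    projecting the mean-centred class data onto n_components PCs.
    """
    X = np.asarray(X, dtype=float)
    n, _ = X.shape
    Xc = X - X.mean(axis=0)                       # mean-centre the class data
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    P = Vt[:n_components].T                       # loadings (variables x PCs)
    E = Xc - Xc @ P @ P.T                         # residuals of the PCA model
    # Assumed degrees of freedom: n - A - 1 for residuals, n - 1 for raw data
    s_res = np.sqrt((E ** 2).sum(axis=0) / (n - n_components - 1))
    s_raw = Xc.std(axis=0, ddof=1)
    return 1.0 - s_res / s_raw


# Simulated class: two variables driven by one latent factor, one pure noise
rng = np.random.default_rng(0)
t = rng.normal(size=30)
X = np.column_stack([
    t + 0.05 * rng.normal(size=30),
    2 * t + 0.05 * rng.normal(size=30),
    rng.normal(size=30),
])
mpow = modelling_power(X, n_components=1)
print(np.round(mpow, 2))
```

Variables with values near 1 are well described by the class model, while values near 0 flag variables that contribute little; combined with a low discriminating power, these are the candidates for elimination.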

H. Van der Voet and P.M. Coenegracht, The evaluation of probabilistic classification methods. Part 2. Comparison of SIMCA, ALLOC, CLASSY and LDA. Anal. Chim. Acta, 209 (1988) 1-27. [Pg.240]

H. Van Der Voet, P.M.J. Coenegracht and J.B. Hemel, New probabilistic versions of the Simca and Classy classification methods. Part 1. Theoretical description. Anal. Chim. Acta, 192 (1987) 63-75. [Pg.241]

H. van der Voet and D.A. Doornbos, The improvement of SIMCA classification by using kernel density estimation. Part 1. Anal. Chim. Acta, 161 (1984) 115-123; Part 2. Anal. Chim. Acta, 161 (1984) 125-134. [Pg.241]

In contrast, SIMCA uses principal components analysis to model object classes in the reduced number of dimensions. It calculates multidimensional boxes of varying size and shape to represent the class categories. Unknown samples are classified according to their Euclidean space proximity to the nearest multidimensional box. Kansiz et al. used both KNN and SIMCA for classification of cyanobacteria based on Fourier transform infrared spectroscopy (FTIR).44... [Pg.113]

There are many classification methods apart from linear discriminant analysis (Derde et al. [1987]; Frank and Friedman [1989]; Huberty [1994]). Particularly worth mentioning are the SIMCA method (Soft independent modelling of class analogies) (Wold [1976]; Frank [1989]), ALLOC (Coomans et al. [1981]), UNEQ (Derde and Massart [1986]), PRIMA (Juricskay and Veress [1985]; Derde and Massart [1988]), DASCO (Frank [1988]), etc. [Pg.263]

Frank IE (1989) Classification models: discriminant analysis, SIMCA, CART. Chemom Intell Lab Syst 5 247... [Pg.284]

Use of multivariate approaches relying on classification modelling based on cluster analysis, factor analysis and the SIMCA technique [98,99], and the Kohonen artificial neural network [100]. All these methods, though rarely implemented, lead to very good results not achievable with classical strategies (comparisons, amino acid ratios, flow charts); moreover, it is possible to know the confidence level of the classification carried out. [Pg.251]

R. Checa Moreno, E. Manzano, G. Miron, L.F. Capitan Vallvey, Comparison between Traditional Strategies and Classification Technique (SIMCA) in the Identification of Old Proteinaceous Binders, Talanta, 75 (3), 697-704 (2008). [Pg.258]

Nonetheless, a sub-set belonging to one class may very likely be normally distributed. In this case a PCA calculated on one class cannot work in describing data belonging to another class. In this way, the membership of data to each class can be evaluated. This aspect is used by a classification method called SIMCA (Soft Independent Modelling of Class Analogy). It is a clever exploitation of the limitations of PCA to build a classification methodology [20]. [Pg.156]

Points with a constant Euclidean distance from a reference point (like the center) are located on a hypersphere (in two dimensions on a circle); points with a constant Mahalanobis distance to the center are located on a hyperellipsoid (in two dimensions on an ellipse) that envelops the cluster of object points (Figure 2.11). That means the Mahalanobis distance depends on the direction. Mahalanobis distances are used in classification methods, by measuring the distances of an unknown object to prototypes (centers, centroids) of object classes (Chapter 5). A problem with the Mahalanobis distance is that it needs the inverse of the covariance matrix, which cannot be calculated with highly correlating variables. A similar approach without this drawback is the classification method SIMCA, based on PCA (Section 5.3.1; Brereton 2006; Eriksson et al. 2006). [Pg.60]
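The direction dependence of the Mahalanobis distance can be seen in a small numerical sketch. The four-point data set and the two test points are made up for illustration, and the function name `euclidean_and_mahalanobis` is hypothetical.

```python
import numpy as np


def euclidean_and_mahalanobis(x, data):
    """Return (Euclidean, Mahalanobis) distance of x from the center of data."""
    data = np.asarray(data, dtype=float)
    center = data.mean(axis=0)
    cov = np.cov(data, rowvar=False)              # sample covariance matrix
    diff = np.asarray(x, dtype=float) - center
    d_euc = float(np.sqrt(diff @ diff))
    # Solve cov * z = diff instead of forming the inverse explicitly
    d_mah = float(np.sqrt(diff @ np.linalg.solve(cov, diff)))
    return d_euc, d_mah


# Made-up cluster, elongated along the (1, 1) direction
data = [[-2, -2], [2, 2], [-1, 1], [1, -1]]

along = euclidean_and_mahalanobis([1, 1], data)    # along the long axis
across = euclidean_and_mahalanobis([1, -1], data)  # across the long axis
print("along :", along)
print("across:", across)
```

Both test points have the same Euclidean distance from the center, but the point lying across the elongated direction of the cluster has the larger Mahalanobis distance. With highly correlated variables the covariance matrix becomes nearly singular and the linear solve fails or becomes numerically unreliable, which is the drawback noted above.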

Vanden Branden, K., Hubert, M. Chemom. Intell. Lab. Syst. 79, 2005, 10-21. Robust classification in high dimensions based on the SIMCA method. [Pg.263]

The main classification methods for drug development are discriminant analysis (DA), possibly based on principal components (PLS-DA), and soft independent models for class analogy (SIMCA). SIMCA is based only on PCA: one PCA model is created for each class, and distances between objects and the projection space of the PCA models are evaluated. PLS-DA is for example applied for the prediction of adverse effects by nonsteroidal anti-... [Pg.63]
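The idea of one PCA model per class and a distance from each object to each model's projection space can be sketched as follows. This is a deliberately reduced illustration, not a full SIMCA implementation: it assigns an object to the class with the smallest residual distance and omits the statistical critical distances used in practice. The helper names and the simulated two-class data are assumptions.

```python
import numpy as np


def fit_class_model(X, n_components):
    """Fit a PCA model (mean and loadings) to the samples of one class."""
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:n_components].T              # loadings: variables x PCs


def residual_distance(x, model):
    """Distance from x to the PCA subspace of a class model."""
    mean, P = model
    d = np.asarray(x, dtype=float) - mean
    e = d - P @ (P.T @ d)                         # part not explained by the PCs
    return float(np.sqrt(e @ e))


def classify(x, models):
    """Return the class whose model leaves the smallest residual (simplified)."""
    return min(models, key=lambda name: residual_distance(x, models[name]))


# Simulated classes: A spreads along the x axis, B along the y axis at x = 5
rng = np.random.default_rng(1)
A = np.column_stack([rng.normal(size=20),
                     0.05 * rng.normal(size=20),
                     0.05 * rng.normal(size=20)])
B = np.column_stack([5 + 0.05 * rng.normal(size=20),
                     rng.normal(size=20),
                     0.05 * rng.normal(size=20)])
models = {"A": fit_class_model(A, 1), "B": fit_class_model(B, 1)}

print(classify([1.5, 0.0, 0.0], models))
print(classify([5.0, -2.0, 0.0], models))
```

Because each class has its own model, the residual distances could also be compared with class-specific limits, so that an object far from every model is assigned to no class at all.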

A principal components multivariate statistical approach (SIMCA) was evaluated and applied to interpretation of isomer specific analysis of polychlorinated biphenyls (PCBs) using both a microcomputer and a mainframe computer. Capillary column gas chromatography was employed for separation and detection of 69 individual PCB isomers. Computer programs were written in ANSI MUMPS to provide a laboratory data base for data manipulation. This data base greatly assisted the analysts in calculating isomer concentrations and data management. Applications of SIMCA for quality control, classification, and estimation of the composition of multi-Aroclor mixtures are described for characterization and study of complex environmental residues. [Pg.195]

Another feature of SIMCA that is of considerable utility lies in the assistance the technique provides in selecting relevant variables. Information contained in the residuals, e_ij, can be used to select variables relevant to the classification objective. If a variable is not well predicted by the model, the standard deviation of its residuals is large. An expression termed modeling power has been defined to quantitatively express this relationship. The modeling power (MPOW) is defined as ... [Pg.206]

SIMCA can be applied to the problem of classification when attempting to correlate measurable effect variables with composition of the classified samples. In correlation analyses one may wish to determine how other sample variables, such as sediment composition, organic content, lipid concentration, etc., influence the composition of measured residues or concentrations of PCBs. [Pg.209]

In the discussion that follows, the SIMCA method is illustrated by applying it to three problems: (1) quality assurance of chromatography data, (2) classification of unknowns, and (3) predicting the composition of unknown samples. This third problem is one of deconvolution of a mixture and calculation of the relative concentration of the constituents (25, 38). [Pg.210]

Classification. To illustrate the use of SIMCA in classification problems, we applied the method to the data for 23 samples of Aroclors and their mixtures (samples 1-23 in Appendix I). In this example, the Aroclor content of the three samples of transformer oil was unknown. Samples 1-4, 5-8, 9-12 and 13-16 were Aroclors 1242, 1248, 1254, and 1260, respectively. Samples 17-20 were 1:1:1:1 mixtures of the Aroclors. Application of SIMCA to these data generated a principal components score plot (Figure 12) that shows the transformer oil is similar, but not... [Pg.216]

From the outset, acoustic chemometrics is fully dependent upon the powerful ability of chemometric full spectrum data analysis to elucidate exactly where in the spectral range (which frequencies) the most influential information is found. The complete suite of chemometric approaches, for example PCA, PLS regression and SIMCA (classification/discrimination), is at the disposal of the acoustic spectral data analyst; there is no need here to delve further into this extremely well documented field. (See Chapter 12 for more detail.)... [Pg.284]

Although the development of a SIMCA model can be rather cumbersome, because it involves the development and optimization of J PCA models, the SIMCA method has several distinct advantages over other classification methods. First, it can be more robust in cases where the different classes involve discretely different analytical responses, or where the class responses are not linearly separable. Second, the treatment of each class separately allows SIMCA to better handle cases where the within-class variance structure is... [Pg.396]

Although the SIMCA method is very versatile, and a properly optimized model can be very effective, one must keep in mind that this method does not use, or even calculate, between-class variability. This can be problematic in special cases where there is strong natural clustering of samples that is not relevant to the problem. In such cases, the inherent interclass distance can be rather low compared to the intraclass variation, thus rendering the classification problem very difficult. Furthermore, from a practical viewpoint, the SIMCA method requires that one obtain sufficient calibration samples to fully represent each of the J classes. Also, the on-line deployment of a SIMCA model requires a fair amount of overhead, due to the relatively large number of parameters and somewhat complex data processing instructions required. However, there are several current software products that facilitate SIMCA deployment. [Pg.397]

Distance-based methods possess a superior discriminating power and allow highly similar compounds (e.g. substances with different particle sizes or purity grades, products from different manufacturers) to be distinguished. One other choice for classification purposes is the residual variance, which is a variant of soft independent modeling of class analogy (SIMCA). [Pg.471]

After determining the underlying factors which affect local precipitation composition at an individual site, an analysis of the similarity of factors between different sites can provide valuable information about the regional character of precipitation and its sources of variability over that spatial scale. SIMCA ( ) is a classification method that performs principal component factor analysis for individual classes (sites) and then classifies samples by calculating the distance from each sample to the PCA model that describes the precipitation character at each site. A score of percent samples which are correctly classified by the PCA models provides an indication of the separability of the data by sites and, therefore, the uniqueness of the precipitation at a site as modeled by PCA. [Pg.37]
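The percent-correct score described above can be read off a classification matrix of true site against assigned site. The sketch below shows the bookkeeping only; the site labels and assignments are invented for illustration and do not reproduce the results of the study, and the function name `classification_matrix` is hypothetical.

```python
from collections import Counter


def classification_matrix(true_labels, predicted_labels):
    """Count (true, predicted) pairs and return the matrix and percent correct."""
    counts = Counter(zip(true_labels, predicted_labels))
    classes = sorted(set(true_labels) | set(predicted_labels))
    # Rows are the true classes, columns the classes assigned by the models
    matrix = {t: {p: counts[(t, p)] for p in classes} for t in classes}
    correct = sum(counts[(c, c)] for c in classes)
    percent_correct = 100.0 * correct / len(true_labels)
    return matrix, percent_correct


# Invented example: ten samples from three sites
true_sites = ["site1"] * 4 + ["site2"] * 4 + ["site3"] * 2
assigned = ["site1", "site1", "site1", "site2",
            "site2", "site2", "site2", "site2",
            "site3", "site1"]

matrix, percent_correct = classification_matrix(true_sites, assigned)
for site, row in matrix.items():
    print(site, row)
print("percent correctly classified:", percent_correct)
```

Off-diagonal counts show which sites are confused with one another, which is the information about separability that a single percentage hides.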

Table IV SIMCA results, classification matrix for fractional concentrations at three sites.

Pattern recognition has been applied in many forms to various types of chemical data (1,2). In this paper the use of SIMCA pattern recognition to display data and detect outliers in different types of air pollutant analytical data is illustrated. Pattern recognition is used in the sense of classification of objects into sets with emphasis on graphical representations of data. Basic assumptions which are implied in the use of this method are that objects in a class are similar and that the data examined are somehow related to this similarity. [Pg.106]

Only one class modeling method is commonly applied to analytical data and this is the SIMCA method ( ) of pattern recognition. In this method the class structure (cluster) is approximated by a point, line, plane, or hyperplane. Distances around these geometric functions can be used to define volumes where the classes are located in variable space, and these volumes are the basis for the classification of unknowns. This method allows the development of information beyond class assignment ( ). [Pg.246]

Since SIMCA is a class modeling method, class assignment is based on fit of the unknowns to the class models. This assignment allows the classification result that the unknown is none of the described classes, and has the advantage of providing the relative geometric position of the newly classified object. This makes it possible to assess or quantitate the test sample in terms of external variables that are available for the training sets. [Pg.249]

The SIMCA method of pattern recognition is part of a comprehensive set of programs for classification, and we have discussed how it works in this regard. Classification problems represent only a few of the types of problems that can be solved with this approach. [Pg.249]

SIMCA uses PCA to model the shape and position of the object formed by the samples in row space for class definition. A multidimensional box is constructed for each class and the classification of future samples (prediction) is performed by determining within which box, if any, the sample lies. This is in contrast to KNN, where only the physical closeness of samples in space is used for classification. [Pg.72]

There are many results to be reviewed because there are multiple classes for which SIMCA models are constructed and validated. The order in which to examine the results is a matter of preference, and many approaches are equally appropriate. We will review one SIMCA model at a time, and examine the test set predictions for that one model against samples from all classes. Ideal performance of a SIMCA model means that it includes as part of the class those samples that truly belong to the class and excludes those samples that are from all of the other classes. In reality, a number of classification scenarios are possible. Table A. 18 lists the possibilities along with possible root causes for misclassified test samples. [Pg.80]

Summary of Prediction Diagnostic Tools for SIMCA. From the prediction diagnostics, the conclusion is that unknowns 1 and 4 do not belong to either of the TEA or MEK classes. Sample 3 is a member of the TEA class and sample 2 is a member of the MEK class. There is considerable reliability in the classifications due to the large values for the excluded samples both in the validation and prediction phases. The residuals and score plots are consistent with the values. [Pg.273]

Big Chemical Encyclopedia
