Class-modelling methods SIMCA

A first distinction which is often made is that between methods focusing on discrimination and those that are directed towards modelling classes. Most methods explicitly or implicitly try to find a boundary between classes. Some methods, such as linear discriminant analysis (LDA, Sections 33.2.2 and 33.2.3), are designed to find explicit boundaries between classes, while the k-nearest neighbours (k-NN, Section 33.2.4) method does this implicitly. Methods such as SIMCA (Section 33.2.7) put the emphasis more on similarity within a class than on discrimination between classes. Such methods are sometimes called disjoint class modelling methods. While the discrimination-oriented methods build models based on all the classes concerned in the discrimination, the disjoint class modelling methods model each class separately. [Pg.208]

Only one class modeling method is commonly applied to analytical data, and this is the SIMCA method of pattern recognition. In this method the class structure (cluster) is approximated by a point, line, plane, or hyperplane. Distances around these geometric constructs can be used to define volumes where the classes are located in variable space, and these volumes are the basis for the classification of unknowns. This method allows the development of information beyond class assignment. [Pg.246]
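The geometric idea can be sketched in a few lines, assuming the class model is the class mean plus the first k principal-component loadings (k = 0 gives a point, k = 1 a line, k = 2 a plane). The function names `fit_class_model` and `orthogonal_distance` are hypothetical and only illustrate how the distance of a sample to such a construct is obtained.

```python
import numpy as np

def fit_class_model(X, n_components):
    """Hypothetical SIMCA-style class model: class mean plus n_components loadings.

    n_components = 0 gives a point (the class mean), 1 a line, 2 a plane, etc.
    """
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)
    # The right singular vectors of the mean-centred data are the PCA loadings
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    loadings = vt[:n_components].T          # shape (p, n_components)
    return mean, loadings

def orthogonal_distance(x, mean, loadings):
    """Euclidean distance from x to the point/line/plane spanned by the model."""
    centred = np.asarray(x, dtype=float) - mean
    projection = loadings @ (loadings.T @ centred)   # part of x explained by the model
    return float(np.linalg.norm(centred - projection))
```

For four points on the line y = 2x, a one-component model reproduces the textbook point-to-line distance, e.g. 1/√5 for the point (0, 1).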

In SIMCA, a class modeling method, a parameter called modeling power is used as the basis of feature selection. This parameter is defined in Equation 4, where … is the standard deviation of a vari-... [Pg.247]
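Since Equation 4 itself is missing here, the sketch below assumes the commonly quoted definition of modelling power for variable j, MPOW_j = 1 − s_j(res)/s_j(0), where s_j(res) is the residual standard deviation of variable j after fitting a k-component class model and s_j(0) is its standard deviation in the mean-centred class data. The degrees of freedom (n − k − 1 and n − 1) are also an assumption; the source's equation may differ in detail.

```python
import numpy as np

def modelling_power(X, n_components):
    """Assumed MPOW_j = 1 - s_j(res) / s_j(0) for each variable j of one class."""
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    centred = X - X.mean(axis=0)
    _, _, vt = np.linalg.svd(centred, full_matrices=False)
    P = vt[:n_components].T
    residuals = centred - centred @ P @ P.T          # variation not described by the model
    s_res = np.sqrt((residuals ** 2).sum(axis=0) / (n - n_components - 1))
    s_0 = np.sqrt((centred ** 2).sum(axis=0) / (n - 1))
    return 1.0 - s_res / s_0                          # near 1: well modelled; near 0: noise
```

Variables with modelling power close to zero contribute little to the class model and are the natural candidates for removal in feature selection.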

Since SIMCA is a class modeling method, class assignment is based on the fit of the unknowns to the class models. This assignment allows the classification result that the unknown belongs to none of the described classes, and has the advantage of providing the relative geometric position of the newly classified object. This makes it possible to assess or quantitate the test sample in terms of external variables that are available for the training sets. [Pg.249]
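The "none of the classes" outcome follows directly from testing an unknown against every class model independently. A minimal sketch, assuming one-component models and a hypothetical acceptance limit equal to the 95th percentile of each class's own training residual distances (a simplification of the statistical limits used in practice):

```python
import numpy as np

def fit(X, k):
    X = np.asarray(X, float)
    mean = X.mean(axis=0)
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:k].T

def residual_distance(X, mean, P):
    """Distance of each row of X to the class model (mean + loadings P)."""
    C = np.atleast_2d(np.asarray(X, float)) - mean
    return np.linalg.norm(C - C @ P @ P.T, axis=1)

def build_models(classes, k=1, quantile=0.95):
    """classes: {label: training matrix}; the limit rule is a hypothetical choice."""
    models = {}
    for label, X in classes.items():
        mean, P = fit(X, k)
        limit = np.quantile(residual_distance(X, mean, P), quantile)
        models[label] = (mean, P, limit)
    return models

def assign(x, models):
    """Return every class whose model accepts x: possibly several, possibly none."""
    return [label for label, (mean, P, limit) in models.items()
            if residual_distance(x, mean, P)[0] <= limit]
```

An empty list means the unknown is an outlier to all modelled classes; a list with two labels means it falls in a region where class models overlap.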

Nevertheless, in most of the electronic tongue applications found in the literature, classification techniques like linear discriminant analysis (LDA) and partial least squares discriminant analysis (PLS-DA) have been used in place of more appropriate class-modeling methods. Moreover, in the few cases in which a class-modeling technique such as soft independent modeling of class analogy (SIMCA) is applied, attention is frequently focused only on its classification performance (e.g., correct classification rate). Use of such a restricted focus considerably underutilizes the significant characteristics of the class-modeling approach. [Pg.84]
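Class-modelling performance is usually described per class by sensitivity (fraction of the class's own samples accepted by its model) and specificity (fraction of samples from other classes rejected by it), rather than by a single correct classification rate. A minimal sketch, assuming acceptance decisions are already available as booleans:

```python
import numpy as np

def sensitivity_specificity(accepted, is_member):
    """accepted: True where the class model accepts a sample.
    is_member: True where the sample truly belongs to the modelled class."""
    accepted = np.asarray(accepted, dtype=bool)
    is_member = np.asarray(is_member, dtype=bool)
    sensitivity = accepted[is_member].mean()        # members correctly accepted
    specificity = (~accepted[~is_member]).mean()    # non-members correctly rejected
    return float(sensitivity), float(specificity)
```

A model that accepts everything has sensitivity 1 but specificity 0, which a correct classification rate alone would not reveal.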

The next step consists of applying multivariate statistical methods to find key features relating molecules, descriptors and anticancer activity. The methods include principal component analysis (PCA), hierarchical cluster analysis (HCA), the k-nearest neighbor method (KNN), soft independent modeling of class analogy (SIMCA) and stepwise discriminant analysis (SDA). The analyses were performed on a data matrix of 25 rows (molecules) × 1700 columns (descriptors), which is not shown here. For further study of the methodology applied, standard books are available, such as (Varmuza & Filzmoser, 2009) and (Manly, 2004). [Pg.188]

Historically, SIMCA [43,44], proposed by Wold et al. in 1976, was the first class-modelling method introduced in the literature. Its key assumption is that the main systematic variability characterizing the samples from a category can be captured by a principal component model (see Chapter 4) of appropriate dimensionality, built on training samples from that class. In detail, defining... [Pg.230]

The similarity of samples can be evaluated by using geometrical constructs based on the standard deviation of the objects modeled by SIMCA. By enclosing classes in volume elements in descriptor space, the SIMCA method provides information about the existence of similarities among the members of the defined classes. Relations among samples, when visualized in this way, increase one's ability to formulate questions or hypotheses about the data being examined. The selection of variables on the basis of modeling power (MPOW) also provides clues as to how samples within a class are similar, and the derived class model describes how the objects are similar with regard to the internal variation of these variables. [Pg.208]

Although the development of a SIMCA model can be rather cumbersome, because it involves the development and optimization of J PCA models (one per class), the SIMCA method has several distinct advantages over other classification methods. First, it can be more robust in cases where the different classes involve discretely different analytical responses, or where the class responses are not linearly separable. Second, the treatment of each class separately allows SIMCA to better handle cases where the within-class variance structure is... [Pg.396]

The SIMCA distances of samples from two class models (or from two models of the same category obtained by different methods) are plotted against each other in Coomans diagrams (Fig. 26) to show the results of the modelling-classification analysis. [Pg.124]
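In a Coomans diagram each sample is a point whose x coordinate is its distance to the first class model and whose y coordinate is its distance to the second; the two critical distances, drawn as a vertical and a horizontal line, split the plot into four regions. A minimal sketch of that region logic, assuming the distances and critical values have already been computed (the function name is hypothetical):

```python
def coomans_region(d1, d2, crit1, crit2):
    """Region of a Coomans diagram for one sample.

    d1, d2: distances of the sample to class models 1 and 2.
    crit1, crit2: critical distances of the two class models.
    """
    in1 = d1 <= crit1          # left of the vertical critical line
    in2 = d2 <= crit2          # below the horizontal critical line
    if in1 and in2:
        return "both classes"  # overlap region of the two models
    if in1:
        return "class 1 only"
    if in2:
        return "class 2 only"
    return "neither class"     # outlier to both models
```

Plotting the (d1, d2) pairs with any scatter-plot tool and overlaying the two critical lines gives the diagram itself.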

The SIMCA method has been developed to overcome some of these limitations. The SIMCA model consists of a collection of PCA models, one for each class in the dataset. This is shown graphically in Figure 10. The four graphs show one model for each excipient. Note that these score plots have their origin at the center of the dataset, and the blue dashed line marks the 95% confidence limit calculated from the variability of the data. To use the SIMCA method, a PCA model is built for each class. These class models are built to optimize the description of a particular excipient. Thus, each model contains all the usual parts of a PCA model (mean vector, scaling information, data preprocessing, etc.), and the models can have different numbers of PCs; that is, the number of PCs should be appropriate for each class dataset. In other words, each model is a fully independent PCA model. [Pg.409]
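The independence of the class models can be made explicit by storing, per class, its own centring vector, scaling vector and loadings. A minimal sketch, assuming autoscaling as the preprocessing step; `ClassModel`, `fit_class_model` and `fit_simca` are hypothetical names:

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class ClassModel:
    mean: np.ndarray      # class-specific centring vector
    scale: np.ndarray     # class-specific scaling (standard deviations)
    loadings: np.ndarray  # p x k matrix; k may differ between classes

    def preprocess(self, X):
        """Apply this class's own centring and scaling to new data."""
        return (np.atleast_2d(np.asarray(X, float)) - self.mean) / self.scale

    def scores(self, X):
        return self.preprocess(X) @ self.loadings

def fit_class_model(X, n_pc):
    X = np.asarray(X, float)
    mean = X.mean(axis=0)
    scale = X.std(axis=0, ddof=1)
    scale[scale == 0] = 1.0          # avoid division by zero for constant variables
    _, _, vt = np.linalg.svd((X - mean) / scale, full_matrices=False)
    return ClassModel(mean, scale, vt[:n_pc].T)

def fit_simca(classes, n_pcs):
    """classes: {label: X}; n_pcs: {label: number of PCs chosen for that class}."""
    return {label: fit_class_model(X, n_pcs[label]) for label, X in classes.items()}
```

Because an unknown must be preprocessed with each class's own mean and scale before projection, it is effectively seen in a different coordinate system by every class model.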

When the SIMCA method is applied to the polyurethane data, two PCs are found to be optimal for each of the four local PCA class models. When this SIMCA model is applied to prediction sample A, it correctly assigns it to class 2. When the model is applied to prediction sample B, it indicates that this sample does not belong to any class,... [Pg.295]

Principal component analysis is central to many of the more popular multivariate data analysis methods in chemistry. For example, a classification method based on principal component analysis called SIMCA [69, 70] is by far the most popular method for describing the class structure of a data set. In SIMCA (soft independent modeling of class analogy), a separate principal component analysis is performed on each class in the data set, and a sufficient number of principal components are retained to account for most of the variation within each class. The number of principal components retained for each class is usually determined directly from the data by a method called cross-validation [71] and is often different for each class model. [Pg.353]
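Cross-validation of a PCA model needs care: if a left-out object is simply projected onto the model, its residual keeps shrinking as components are added, so no optimum appears. The sketch below is one possible scheme, assumed here for illustration and not necessarily the one in [71]: each object is left out in turn, loadings are computed from the remaining objects, and every variable of the left-out object is predicted from its other variables only. The number of PCs with the lowest PRESS (or where PRESS stops decreasing appreciably) is then retained.

```python
import numpy as np

def press_by_components(X, max_components):
    """Leave-one-object-out PRESS for PCA models with 1..max_components PCs.

    Loadings come from the remaining objects; each variable of the left-out
    object is predicted from its other variables, so the fit never sees the
    value it is scored on.
    """
    X = np.asarray(X, float)
    n, p = X.shape
    if not 1 <= max_components <= min(n - 1, p - 1):
        raise ValueError("max_components must be between 1 and min(n - 1, p - 1)")
    press = np.zeros(max_components)
    for i in range(n):
        train = np.delete(X, i, axis=0)
        mean = train.mean(axis=0)
        _, _, vt = np.linalg.svd(train - mean, full_matrices=False)
        x = X[i] - mean
        for k in range(1, max_components + 1):
            P = vt[:k].T                                  # p x k loadings
            for j in range(p):
                others = np.arange(p) != j
                # scores of the left-out object estimated without variable j
                t, *_ = np.linalg.lstsq(P[others], x[others], rcond=None)
                press[k - 1] += (x[j] - P[j] @ t) ** 2
    return press
```

`int(np.argmin(press)) + 1` gives the candidate number of PCs for one class; the procedure is repeated independently for every class model.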

Theory. SIMCA is a parametric classification method introduced by Wold (29), which assumes that the objects of a given class are normally distributed. The particularity of this PCA-based method is that one model is built for each class separately; that is, disjoint class modeling is performed. The algorithm starts by determining the optimal number of PCs for each individual model with cross-validation (CV). The resulting PCs are then used to define a hypervolume for each class. The boundary around a group of objects is then the confidence limit for the residuals of all objects, determined by a statistical T-test (30, 31). The direction of the PCs and the limits established for these PCs define the model of a class (Fig. 13.13). [Pg.312]
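A residual-based class boundary can be sketched as follows. The formulation assumed here is the one often given in chemometrics textbooks, which compares residual variances with an F-test rather than the T-test named above: for a class of n objects and p variables modelled with k PCs, the pooled residual variance is s0² = ΣΣ e² / ((n − k − 1)(p − k)), an unknown's residual variance is s² = Σ e² / (p − k), and the unknown is accepted if s²/s0² does not exceed the critical F value with (p − k) and (n − k − 1)(p − k) degrees of freedom. Details differ between SIMCA implementations.

```python
import numpy as np
from scipy.stats import f

def fit_simca_class(X, k, alpha=0.05):
    """Class model plus pooled residual variance and critical F (assumed formulation)."""
    X = np.asarray(X, float)
    n, p = X.shape
    mean = X.mean(axis=0)
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    P = vt[:k].T
    E = (X - mean) - (X - mean) @ P @ P.T          # training residuals
    dof1 = p - k
    dof2 = (n - k - 1) * (p - k)
    s0_sq = (E ** 2).sum() / dof2                   # pooled residual variance of the class
    f_crit = f.ppf(1 - alpha, dof1, dof2)
    return mean, P, s0_sq, f_crit

def within_class(x, model):
    """True if the residual variance of x is compatible with the class."""
    mean, P, s0_sq, f_crit = model
    c = np.asarray(x, float) - mean
    e = c - P @ (P.T @ c)
    s_sq = (e ** 2).sum() / (P.shape[0] - P.shape[1])
    return bool(s_sq / s0_sq <= f_crit)
```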

It often occurs that active compounds cannot be well separated from inactive ones using linear models such as PLS or LDA. This may be because the active compounds cluster together in an area of property space and are surrounded by inactive compounds. Such data are called embedded or asymmetric data. Several methods have been developed to treat such data sets, the best known of which is the SIMCA algorithm. The SIMCA (soft independent modelling of class analogy) method is a tool for pattern... [Pg.362]
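Embedded data show why modelling only the active class helps: no straight line separates a compact cluster from a ring of inactives around it, but a closed region around the actives does. A minimal sketch, assuming a zero-component (point) class model for the actives and a hypothetical radius equal to the largest training distance:

```python
import numpy as np

def distances_to_centre(X, centre):
    """Euclidean distance of each row of X to the class centre."""
    return np.linalg.norm(np.atleast_2d(np.asarray(X, float)) - centre, axis=1)

def fit_point_model(actives):
    """Zero-component class model: centre of the actives plus an acceptance radius."""
    actives = np.asarray(actives, float)
    centre = actives.mean(axis=0)
    radius = distances_to_centre(actives, centre).max()   # hypothetical limit
    return centre, radius

def accepts(x, model):
    centre, radius = model
    return bool(distances_to_centre(x, centre)[0] <= radius)
```

With actives drawn around the origin and inactives placed on a circle of radius 3, the point model accepts every active and rejects every inactive, even though the inactives surround the actives on all sides. Only the active class is modelled; the inactives are never described.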

More often, the SIMCA method is used. This finds a separate principal component model for each class. With SIMCA, the ratio of the number of objects to the number of variables is less critical, and the model is constructed around the projected, rather than the original, data. The basic steps of the principal component calculations needed for SIMCA have been outlined in the chapter on projection methods with the NIPALS algorithm (Example 5.1). [Pg.195]
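NIPALS extracts principal components one at a time by alternating between scores and loadings until the scores stop changing, then removes (deflates) the fitted component before extracting the next. A minimal numpy sketch of that iteration, written for illustration and not reproducing the book's Example 5.1; in SIMCA it would be run separately on each class's data.

```python
import numpy as np

def nipals(X, n_components, tol=1e-10, max_iter=500):
    """Scores T (n x a) and unit-length loadings P (p x a) of mean-centred X."""
    E = np.asarray(X, float)
    E = E - E.mean(axis=0)                          # mean-centre
    n, p = E.shape
    T = np.zeros((n, n_components))
    P = np.zeros((p, n_components))
    for a in range(n_components):
        t = E[:, np.argmax(E.var(axis=0))].copy()   # start from the most variable column
        for _ in range(max_iter):
            loading = E.T @ t / (t @ t)             # regress columns of E on t
            loading /= np.linalg.norm(loading)      # normalise to unit length
            t_new = E @ loading                     # regress rows of E on the loading
            converged = np.linalg.norm(t_new - t) < tol * np.linalg.norm(t_new)
            t = t_new
            if converged:
                break
        T[:, a] = t
        P[:, a] = loading
        E = E - np.outer(t, loading)                # deflate: remove this component
    return T, P
```

The loadings agree, up to sign, with the right singular vectors of the centred data, which is a convenient check of any NIPALS implementation.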

UNEQ is applied only when the number of variables is relatively low. For more variables, one does not work with the original variables, but rather with latent variables. A latent variable model is built for each class separately. The best known such method is SIMCA. [Pg.212]
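The restriction on the number of variables comes from the class covariance matrix: UNEQ evaluates a Mahalanobis distance to the class centroid, which requires inverting that matrix, and with n objects its rank is at most n − 1. When the number of variables p is at least n, the matrix is singular. A minimal sketch, assuming the squared Mahalanobis distance as the UNEQ class distance:

```python
import numpy as np

def uneq_distance(x, X):
    """Squared Mahalanobis distance of x to the centroid of class data X."""
    X = np.asarray(X, float)
    n, p = X.shape
    if n <= p:
        # rank(cov) <= n - 1 < p, so the covariance matrix cannot be inverted
        raise ValueError(f"class covariance is singular: n={n} objects, p={p} variables")
    mean = X.mean(axis=0)
    cov = np.cov(X, rowvar=False)
    d = np.asarray(x, float) - mean
    return float(d @ np.linalg.solve(cov, d))
```

Replacing the original variables by a few latent variables per class, as SIMCA does, avoids this inversion problem.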

Classical supervised pattern recognition methods include k-nearest neighbor (KNN) and soft independent modeling of class analogies (SIMCA). Both... [Pg.112]

There are many classification methods apart from linear discriminant analysis (Derde et al. [1987]; Frank and Friedman [1989]; Huberty [1994]). Particularly worth mentioning are the SIMCA method (Soft independent modelling of class analogies) (Wold [1976]; Frank [1989]), ALLOC (Coomans et al. [1981]), UNEQ (Derde and Massart [1986]), PRIMA (Juricskay and Veress [1985]; Derde and Massart [1988]), DASCO (Frank [1988]), etc. [Pg.263]

Nonetheless, the subset of data belonging to one class may very likely be normally distributed. In that case, a PCA model calculated on one class will generally fail to describe data belonging to another class, and this makes it possible to evaluate the membership of data in each class. This property is exploited by a classification method called SIMCA (Soft Independent Modelling of Class Analogy), a clever exploitation of the limitations of PCA to build a classification methodology [20]. [Pg.156]

The main classification methods for drug development are discriminant analysis (DA), possibly based on partial least squares (PLS-DA), and soft independent modelling of class analogy (SIMCA). SIMCA is based only on PCA: one PCA model is created for each class, and the distances between objects and the projection space of each PCA model are evaluated. PLS-DA is, for example, applied for the prediction of adverse effects by nonsteroidal anti-... [Pg.63]

Unlike the methods discussed above, which strive to find directions in a common space that separate known classes, the SIMCA method [81] works on a quite different principle: define a unique space for each class, define class-specific models using each of these spaces, and then apply any unknown sample to all of these models in order to assess class membership. [Pg.396]

Although the SIMCA method is very versatile, and a properly optimized model can be very effective, one must keep in mind that this method does not use, or even calculate, between-class variability. This can be problematic in special cases where there is strong natural clustering of samples that is not relevant to the problem. In such cases, the inherent interclass distance can be rather low compared to the intraclass variation, rendering the classification problem very difficult. Furthermore, from a practical viewpoint, the SIMCA method requires sufficient calibration samples to fully represent each of the J classes. Also, the on-line deployment of a SIMCA model requires a fair amount of overhead, due to the relatively large number of parameters and the somewhat complex data processing instructions required. However, several current software products facilitate SIMCA deployment. [Pg.397]

Distance-based methods possess superior discriminating power and allow highly similar compounds (e.g. substances with different particle sizes or purity grades, or products from different manufacturers) to be distinguished. Another option for classification purposes is the residual variance approach, a variant of soft independent modeling of class analogy (SIMCA). [Pg.471]

After determining the underlying factors which affect local precipitation composition at an individual site, an analysis of the similarity of factors between different sites can provide valuable information about the regional character of precipitation and its sources of variability over that spatial scale. SIMCA is a classification method that performs principal component factor analysis for individual classes (sites) and then classifies samples by calculating the distance from each sample to the PCA model that describes the precipitation character at each site. The percentage of samples correctly classified by the PCA models provides an indication of the separability of the data by site and, therefore, of the uniqueness of the precipitation at a site as modeled by PCA. [Pg.37]

SIMCA (each class described by a PC model). The basic idea of the SIMCA method is that multivariate data measured on a group of similar objects (a proper class) are well approximated by a simple PC model. [Pg.85]

Big Chemical Encyclopedia
