Scores plots problem

Such a result appears to be of major interest given that neither a classification of the compounds nor any training information was applied to the PCA model. A more detailed inspection of the score plot in Fig. 17.5 indicates that some compounds are misclassified, although experimental evaluation of these compounds revealed problems with their chemical stability or solubility. Thus, it appears that this model can be used to evaluate the false-positive (or false-negative) experiments. Moreover, it can also be used to evaluate the metabolic stability from the 3D structure of a drug candidate prior to experimental measurements. [Pg.418]

Classification. To illustrate the use of SIMCA in classification problems, we applied the method to the data for 23 samples of Aroclors and their mixtures (samples 1-23 in Appendix I). In this example, the Aroclor content of the three samples of transformer oil was unknown. Samples 1-4, 5-8, 9-12 and 13-16 were Aroclors 1242, 1248, 1254, and 1260, respectively. Samples 17-20 were 1:1:1:1 mixtures of the Aroclors. Application of SIMCA to these data generated a principal components score plot (Figure 12) that shows the transformer oil is similar, but not... [Pg.216]

Scores Plot (Sample Diagnostic) The score plots show the relationship of the samples in row space and are examined for consistency with what is known about the data set. Look for unusual or inconsistent patterns which can indicate potential problems with the model and/or samples (see also PCA, Section 4.2.2). In the PCA discussion the scores are referred to as PCs, but in PLS they are referred to as factors. [Pg.153]

The complete design is seen in the score space with replicate center points clearly visible. Note that the interpretation of scores plots is not always as straightforward as in this example. The experimental design is not seen if the experiment is not well designed or if the problem is high dimensional. The level of implicitly modeled components (e.g., component C) also has an effect on the relative position of the samples in score space. For this example, the effect of C on the relative placement of the samples in score space is small. [Pg.156]

One way to resolve these inconsistencies is to plot the spectra of samples 3 and 11 and compare the differences in the raw data. Figure 5.99a displays the two spectra and the difference spectrum. The two spectra are the same except for slight differences which can be attributed to measurement noise. Sample 3 is known to be the problem given the known concentrations and the spatial relationship between the other samples in the scores plots (i.e., it should be in the lower center portion of the graph). This means that either sample 3 was incorrectly prepared to have the same concentration as sample 11, or sample 11 was measured when sample 3 was thought to have been measured. When the spectrum of sample 3 is remeasured, the resulting spectrum is very different from sample 11 (see Figure 5.99b). [Pg.332]

Scores plots can be used to answer many different questions about the relationship between objects and more examples are given in the problems at the end of this chapter. [Pg.206]

Problem 4.5 Certification of NIR Filters Using PC Scores Plots... [Pg.258]

The systematic variation displayed by the score plots can be used in various ways for designing test sets for experimental studies. Which kind of design should be used will depend on the problem to be treated. [Pg.44]

The points of the selected test compounds in the score plots should have a sufficient spread. In this chapter it is discussed how such selections can be made to cope with some common problems. [Pg.429]

If the problem is to determine in which type of solvent the reaction can be run, an evident principle is to select representative test solvents from each class. In this respect, representative members of a class would be items which are not at the extreme ends of the subgroups in the score plot. For example, it would be better to choose isopropanol (17) as a typical alcohol rather than methanol (4). [Pg.433]

A similar problem is encountered when a promising solvent has been found in a screening experiment, e.g. by a "uniform spread" design. It is reasonable to assume that the preferred solvent has properties which are similar to those of a promising candidate. The next step is then to explore the solvents projected in the vicinity of the promising candidate in the score plot. This can be accomplished by a small "uniform spread" selection around the winning candidate, or by a simplex search described below. [Pg.437]
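One way to picture a selection with sufficient spread in the score plot is a greedy maximin rule: start from one point and repeatedly add the candidate that lies farthest from everything already chosen. The sketch below is only an illustration under that assumption; the excerpts do not specify how a "uniform spread" design is computed, and the function name `maximin_select` and the example coordinates are hypothetical.

```python
import numpy as np

def maximin_select(scores, n_select, start=None):
    """Greedy maximin selection of rows from a 2-D array of score coordinates.

    Assumption: "spread" is measured by Euclidean distance in the score plot.
    """
    scores = np.asarray(scores, dtype=float)
    if start is None:
        # Begin with the point closest to the centroid of all candidates.
        centroid = scores.mean(axis=0)
        start = int(np.argmin(np.linalg.norm(scores - centroid, axis=1)))
    chosen = [start]
    # Distance from every candidate to its nearest already-chosen point.
    min_dist = np.linalg.norm(scores - scores[start], axis=1)
    while len(chosen) < n_select:
        nxt = int(np.argmax(min_dist))  # candidate farthest from the chosen set
        chosen.append(nxt)
        min_dist = np.minimum(min_dist, np.linalg.norm(scores - scores[nxt], axis=1))
    return chosen

# Hypothetical (t1, t2) score coordinates for six candidates.
scores = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0],
                   [1.0, 1.0], [0.5, 0.5], [0.1, 0.1]])
chosen = maximin_select(scores, n_select=4)
print(chosen)
```

To explore the neighbourhood of a promising candidate, the same function could be applied to the subset of points near that candidate, passing its index as `start`.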

Scatter plots in PCA have special properties because the scores are plotted on the base P, and the columns of P are orthonormal vectors. Hence, the scores in PCA are plotted on an orthonormal base. This means that Euclidean distances in the space of the original variables, apart from the projection step, are kept intact going to the scores in PCA. Stated otherwise, distances between two points in a score plot can be understood in terms of Euclidean distances in the space of the original variables. This is not the case for score plots in PARAFAC and Tucker models, because they are usually not expressed on an orthonormal base. This issue was studied by Kiers [2000], together with problems of differences in horizontal and vertical scales. The basic conclusion is that careful consideration should be given to the interpretation of scatter plots. This is illustrated in Example 8.3. [Pg.192]
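The distance-preserving property can be checked numerically. The following is a minimal sketch on synthetic data, assuming PCA is computed by an SVD of the mean-centered matrix; the oblique base `B` is a made-up stand-in for a non-orthonormal base and is not a PARAFAC or Tucker model.

```python
import numpy as np

def pairwise_distances(A):
    """Euclidean distance between every pair of rows of A."""
    diff = A[:, None, :] - A[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

rng = np.random.default_rng(1)
X = rng.normal(size=(20, 6))           # synthetic data: 20 objects, 6 variables
Xc = X - X.mean(axis=0)                # mean-center the columns

_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
P = Vt[:2].T                           # loadings: two orthonormal columns
T = Xc @ P                             # scores on the orthonormal base P
X_hat = T @ P.T                        # projected objects in the original variable space

# Distances between scores equal distances between the projected objects.
same = np.allclose(pairwise_distances(T), pairwise_distances(X_hat))

# The same plane described by a non-orthonormal (oblique) base.
A = np.array([[1.0, 0.8],
              [0.0, 0.6]])             # unit-length but non-orthogonal columns
B = P @ A                              # oblique base spanning the same plane
T_oblique = T @ np.linalg.inv(A).T     # coordinates such that X_hat = T_oblique @ B.T
distorted = not np.allclose(pairwise_distances(T_oblique), pairwise_distances(X_hat))

print(same, distorted)
```

With the orthonormal base the two distance matrices agree; with the oblique base the plotted distances no longer match those in the original space, which is why such plots need more careful interpretation.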

If we are interested in studying each sampling, a matrix X is obtained having I × K rows and J columns. This approach is very straightforward in terms of computation, but since I × K is usually a rather large number, the interpretation of the resulting score plot can give some problems. [Pg.231]

The distribution of the observations can be visualized using scatter plots. For obvious reasons, scatter plots are limited to three dimensions at most, and typically to two dimensions. Therefore, the direct observation of the data distribution in data sets with several tens, hundreds or even thousands of variables is not possible. One can always construct scatter plots for selected pairs or triplets of variables, but this is an overwhelming and often misleading approach. Projection models overcome this problem. PCA and PLS can be used straightforwardly to visualize the distribution of the data in the latent subspace, considering only a few latent variables (LVs) which contain most of the variability of interest. Scatter plots of the scores corresponding to the LVs, the so-called score plots, are used for this purpose. [Pg.64]
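As a rough illustration of how a score plot condenses many variables into two axes, the sketch below computes the scores of the first two latent variables by PCA on synthetic data. The data, the function name `pca_scores`, and the choice of an SVD-based PCA are assumptions made for the example; the plotting step itself is left out.

```python
import numpy as np

def pca_scores(X, n_components=2):
    """Return scores T, loadings P and explained-variance fractions for X."""
    Xc = X - X.mean(axis=0)                      # mean-center each variable
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    P = Vt[:n_components].T                      # loadings (orthonormal columns)
    T = Xc @ P                                   # scores: coordinates in the latent subspace
    explained = s ** 2 / np.sum(s ** 2)          # fraction of variance per component
    return T, P, explained[:n_components]

# Synthetic data: 50 observations, 200 variables driven by 2 latent factors plus noise.
rng = np.random.default_rng(0)
latent = rng.normal(size=(50, 2))
mixing = rng.normal(size=(2, 200))
X = latent @ mixing + 0.05 * rng.normal(size=(50, 200))

T, P, explained = pca_scores(X, n_components=2)
print(T.shape, explained.sum())
```

Plotting `T[:, 0]` against `T[:, 1]` (for example with a scatter plot in a plotting library) gives the score plot; here two columns stand in for 200 variables because the two components capture nearly all of the variance in this synthetic set.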

Sometimes measurements are evenly distributed in the scores plot but one or more measurements fall distinctly outside the envelope of the other measurements. Also in this case, the measurements that fall outside the envelope should receive special attention. It might be that these measurements were faulty, for example, owing to problems with the sensor. It is best to eliminate such a measurement from the data set, at least initially. [Pg.295]

As in many such problems, some form of pretreatment of the data is warranted. In all applications discussed here, the analytical data either have been untreated or have been normalized to relative concentration of each peak in the sample. Quality Assurance. Principal components analysis can be used to detect large sample differences that may be due to instrument error, noise, etc. This is illustrated by using samples 17-20 in Appendix I (Figure 6). These samples are replicate assays of a 1:1:1:1 mixture of the standard Aroclors. Fitting these data for the four samples to a 2-component model and plotting the first two principal components (Theta 1 and Theta 2 [scores] in... [Pg.210]

In the preceding description of the Mahalanobis distance, the number of coordinates in the distance metric is equal to the number of spectral frequencies. As discussed earlier in the section on principal component analysis, the intensities at many frequencies are dependent, and by using the full spectrum, we fit the noise in addition to the real information. In recent years, Mahalanobis distance has been defined with PCA or PLS scores instead of the spectral frequencies because these techniques eliminate or at least reduce most of the overfitting problem. The overall application of the Mahalanobis distance metric is the same except that the n intensity values are replaced by the scores from PCA or PLS. An example of a Mahalanobis distance calculation on a set of Raman spectra for 25 carbohydrates is shown in Fig. 5-11. The 25 spectra were first subjected to PCA, and it was found that the first three principal components could account for most of the variance in the spectra. It was first assumed that all 25 spectra belonged to the same class because they were all carbohydrates. However, as shown in the three-dimensional plot in Fig. 5-11, the spectra can be clearly divided into three separate classes, with two of the spectra almost equidistant from each of the three classes. Most of the components in the upper left class in the two-dimensional plot were sugars; however, some sugars were found in the other two classes. For unknowns, scores have to be calculated from the principal components and processed in the same way as the spectral intensities. [Pg.289]
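A Mahalanobis distance on PCA scores can be sketched as follows. This is a minimal illustration on synthetic "spectra", not the Raman calculation of Fig. 5-11; the function name `mahalanobis_on_scores` and all data are assumptions. Because PCA score vectors are mutually uncorrelated, their covariance matrix is diagonal and the distance reduces to a variance-scaled Euclidean distance. Unknowns are centered with the training mean and projected onto the training loadings before the distance is computed.

```python
import numpy as np

def mahalanobis_on_scores(X_train, X_new, n_components=3):
    """Mahalanobis distance of rows of X_new from the training set, in PCA score space."""
    mean = X_train.mean(axis=0)
    Xc = X_train - mean
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    P = Vt[:n_components].T                               # training loadings
    var = s[:n_components] ** 2 / (X_train.shape[0] - 1)  # variance of each score vector
    T_new = (X_new - mean) @ P                            # scores of the new samples
    return np.sqrt((T_new ** 2 / var).sum(axis=1))

# Synthetic training set: 25 spectra, 100 channels, 3 underlying factors plus noise.
rng = np.random.default_rng(2)
mixing = rng.normal(size=(3, 100))
X_train = rng.normal(size=(25, 3)) @ mixing + 0.01 * rng.normal(size=(25, 100))

# Two hypothetical unknowns: one close to the training set, one far from it.
X_typical = np.array([[0.5, -0.5, 0.5]]) @ mixing
X_far = np.array([[8.0, -8.0, 8.0]]) @ mixing

d_train = mahalanobis_on_scores(X_train, X_train)
d_typical = mahalanobis_on_scores(X_train, X_typical)[0]
d_far = mahalanobis_on_scores(X_train, X_far)[0]
print(d_typical, d_far)
```

Using three scores instead of 100 channel intensities keeps the distance from fitting channel-level noise, which is the overfitting concern raised in the text.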

Outliers due to real differences in data will also be detected by plotting the first two score vectors against each other. It depends on the specific problem whether or not they should be included in the final analysis. Nevertheless, they can be detected at an early stage of the investigation. [Pg.370]

The prediction of Y-data of unknown samples is based on a regression method where the X-data are correlated to the Y-data. The multivariate methods usually used for such a calibration are principal component regression (PCR) and partial least squares regression (PLS). Both methods are based on the assumption of linearity and can deal with co-linear data. The problem of co-linearity is solved in the same way as in the formation of a PCA plot: the X-variables are combined into latent variables (score vectors). These vectors are independent since they are orthogonal to each other, and they can therefore be used to create a calibration model. [Pg.7]
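The idea of regressing on orthogonal score vectors can be sketched with a small principal component regression. The example below is an assumption-laden illustration on synthetic, strongly co-linear data; the helper names `pcr_fit` and `pcr_predict` are hypothetical, and PLS would build its latent variables differently.

```python
import numpy as np

def pcr_fit(X, y, n_components):
    """Principal component regression: regress y on the first PCA score vectors of X."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    P = Vt[:n_components].T                    # loadings
    T = Xc @ P                                 # score vectors, mutually orthogonal
    # Orthogonal scores make T'T diagonal, so each coefficient is a simple ratio.
    q = (T.T @ yc) / np.sum(T ** 2, axis=0)
    b = P @ q                                  # coefficients in terms of the original variables
    return x_mean, y_mean, b

def pcr_predict(model, X_new):
    x_mean, y_mean, b = model
    return (X_new - x_mean) @ b + y_mean

# Synthetic calibration set: 50 co-linear X-variables driven by 2 latent factors.
rng = np.random.default_rng(3)
latent = rng.normal(size=(30, 2))
X = latent @ rng.normal(size=(2, 50)) + 0.01 * rng.normal(size=(30, 50))
y = latent @ np.array([1.0, -2.0]) + 0.01 * rng.normal(size=30)

model = pcr_fit(X, y, n_components=2)
rmse = np.sqrt(np.mean((pcr_predict(model, X) - y) ** 2))
print(rmse)
```

An ordinary least-squares fit on the 50 raw variables with only 30 samples would be underdetermined; regressing on two orthogonal score vectors avoids that problem.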
