X-block data

The decision whether or not to variance scale the x-block data is independent from the decision about scaling the y-block data. We can decide to scale either, both, or neither. [Pg.176]
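A minimal sketch of that independence, assuming "variance scaling" means dividing each variable (column) by its sample standard deviation. The data, the helper name `variance_scale`, and the flags `scale_x` and `scale_y` are illustrative and not from the source.

```python
from statistics import stdev

def variance_scale(block):
    """Divide each column of a samples-by-variables block by its standard deviation."""
    columns = list(zip(*block))
    scales = [stdev(col) for col in columns]  # sample standard deviation per variable
    return [[value / s for value, s in zip(row, scales)] for row in block]

# Hypothetical data: 4 samples, 3 x-variables and 1 y-variable.
x_block = [[1.0, 10.0, 100.0],
           [2.0, 30.0, 150.0],
           [3.0, 20.0, 300.0],
           [4.0, 40.0, 250.0]]
y_block = [[0.1], [0.2], [0.3], [0.4]]

# The two choices are made independently: either, both, or neither.
scale_x, scale_y = True, False
x_used = variance_scale(x_block) if scale_x else x_block
y_used = variance_scale(y_block) if scale_y else y_block
```

A column with zero variance would cause a division by zero in this sketch, so constant variables would need to be dropped or handled separately.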


Table IVa. Compositional variables considered as x-block data in the PLS regression analysis, their codes, maximum value of the 12 samples analyzed, and the percentage explained variance from the first two components extracted from...

As we will soon see, the nature of the work makes it extremely convenient to organize our data into matrices. (If you are not familiar with data matrices, please see the explanation of matrices in Appendix A before continuing.) In particular, it is useful to organize the dependent and independent variables into separate matrices. In the case of spectroscopy, if we measure the absorbance spectra of a number of samples of known composition, we assemble all of these spectra into one matrix which we will call the absorbance matrix. We also assemble all of the concentration values for the sample's components into a separate matrix called the concentration matrix. For those who are keeping score, the absorbance matrix contains the independent variables (also known as the x-data or the x-block), and the concentration matrix contains the dependent variables (also called the y-data or the y-block). [Pg.7]
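The two matrices might be laid out as in the sketch below. The one-row-per-sample orientation is an assumption (the excerpt does not state whether spectra are stored as rows or columns), and all numbers and the helper `shape` are hypothetical.

```python
# Hypothetical calibration set: 3 samples, 4 wavelengths, 2 components.
absorbance = [  # x-block: one row per sample, one column per wavelength
    [0.12, 0.45, 0.33, 0.08],
    [0.20, 0.61, 0.40, 0.11],
    [0.05, 0.30, 0.25, 0.04],
]
concentration = [  # y-block: one row per sample, one column per component
    [1.0, 0.5],
    [1.8, 0.7],
    [0.4, 0.3],
]

def shape(matrix):
    """Return (rows, columns), checking that every row has the same length."""
    n_cols = len(matrix[0])
    if any(len(row) != n_cols for row in matrix):
        raise ValueError("ragged matrix")
    return len(matrix), n_cols

n_samples_x, n_wavelengths = shape(absorbance)
n_samples_y, n_components = shape(concentration)
if n_samples_x != n_samples_y:
    raise ValueError("x-block and y-block must describe the same samples")
```

Whatever the orientation, the two blocks must refer to the same samples in the same order.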

In addition to the set of new coordinate axes (basis space) for the spectral data (the x-block), we also find a set of new coordinate axes (basis space) for the concentration data (the y-block). [Pg.131]

PLS is more complex than PCR because we are simultaneously using degrees of freedom in both the x-block and the y-block data. In the absence of a rigorous derivation of the proper number of degrees of freedom to use for PLS, a simple approximation is the number of samples, n, minus the number of factors (latent variables), f, minus 1. [Pg.170]
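The approximation is simple arithmetic; a short sketch follows. The function name and the guard against a non-positive result are additions for illustration, not part of the source.

```python
def pls_degrees_of_freedom(n_samples, n_factors):
    """Approximate degrees of freedom for PLS: n - f - 1."""
    dof = n_samples - n_factors - 1
    if dof <= 0:
        raise ValueError("too few samples for this many factors")
    return dof

# Hypothetical case: 20 calibration samples, 3 latent variables.
print(pls_degrees_of_freedom(20, 3))  # 20 - 3 - 1 = 16
```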

The BLOCK DATA subroutine provides information needed for the optimization to labeled COMMON. This includes the number of variables (NV) to be optimized, their initial values (X), their maximum values (XMAX), their minimum values (XMIN), initial increments to use in varying X values (DELTAX) and an indication of how accurate the optimized variables should be (DELMIN). The parameters NTRACE and MATRIX are output options available from STEPIT. Once STEPIT has been modified for use with CSMP, it can be used without further modification. All information required for an optimization problem is provided by means of the BLOCK DATA subroutine. Although we find this approach satisfactory for batch calculations, individuals with an interactive computer system may wish to modify STEPIT so that this information can be introduced more conveniently. [Pg.300]

In principle, in the absence of noise, the PLS factor should completely reject the nonlinear data by rotating the first factor into orthogonality with the dimensions of the x-data space which are spawned by the nonlinearity. The PLS algorithm is supposed to find the (first) factor which maximizes the linear relationship between the x-block scores and the y-block scores. So clearly, in the absence of noise, a good implementation of PLS should completely reject all of the nonlinearity and return a factor which is exactly linearly related to the y-block variances. (Richard Kramer)... [Pg.153]

The program PLS-2 uses the partial least squares (PLS) method. This method has been proposed by H. Wold (37) and was discussed by S. Wold (25). In such a problem there are two blocks of data, Y and X. It is assumed that Y is related to X by latent variables u and t, where t is derived from the X block and u is derived from the Y block. [Pg.209]

As shown in Fig. 22, the resulting procedure, referred to as a multi-block experiment, produces a two-dimensional data set, such as an array of FIDs (its exact nature depends upon the signal acquisition method). The data of each x-block are then reduced to a single quantity, S(t), which should be proportional either to the total sample magnetization Ma(x) or to one of its components. Since the vertical scale of the relaxation curve is irrelevant, we can identify S(t) with Ma(x) at the exact time of detection (usually just after the first excitation pulse). [Pg.442]

Correlation problems concern the study of data tables of uncategorized objects that are divided vertically; the vertical divisions correspond to two (or more) blocks of variables (block X, block Y, ...). One or more variables may be in a block. [Pg.96]

PLS falls in the category of multivariate data analysis whereby the X-matrix containing the independent variables is related to the Y-matrix, containing the dependent variables, through a process where the variance in the Y-matrix influences the calculation of the components (latent variables) of the X-block and vice versa. It is important that the number of latent variables is correct so that overfitting of the model is avoided; this can be achieved by cross-validation. The relevance of each variable in the PLS method is judged by the modelling power, which indicates how much the variable participates in the model. A value close to zero indicates an irrelevant variable which may be deleted. [Pg.103]

PLS is related to principal components analysis (PCA) (20). This is a method used to project the matrix of the X-block, with the aim of obtaining a general survey of the distribution of the objects in the molecular space. PCA is recommended as an initial step to other multivariate analysis techniques, to help identify outliers and delineate classes. The data are randomly divided into a training set and a test set. Once the principal components model has been calculated on the training set, the test set may be applied to check the validity of the model. PCA differs most obviously from PLS in that it is optimized with respect to the variance of the descriptors. [Pg.104]

Molecular structure elucidation, principally by single crystal X-ray diffraction, has become almost routine and is now available for the majority of the metal amides presently discussed. In the 1980 book, however, such data were provided for just 112 compounds, 54 of which were for d- and f-block metals and 41 for the Group 13 metal amides. The contrast with the developing situation is illustrated by reference to Group 1 metal amides: from four X-ray data sets in 1980, there were more than 200 by the end of 2007. [Pg.5]

Collinearity among the x-variables (e.g. absorbances at consecutive times of the atomic peaks) is not a problem. The latent variables calculated in PLS, like the PCs, summarize the most relevant information of the whole data set by taking linear combinations of the x-variables. The scores of the X-block are orthogonal (the factors are independent of each other) and the corresponding weights are orthonormal (their maximum magnitude is 1). As for PCR, this means that the information explained by... [Pg.190]
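The orthogonality properties can be checked numerically. The sketch below assumes a single y-variable and the common NIPALS formulation of PLS with deflation of both blocks; the excerpt does not say which algorithm its source uses. The data are hypothetical and mean-centred, with the first two x-variables deliberately almost collinear, and the names `dot` and `pls1_nipals` are illustrative.

```python
from math import sqrt

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def pls1_nipals(X, y, n_factors):
    """Return (scores, weights) for a single-y PLS model, via NIPALS deflation."""
    X = [row[:] for row in X]   # work on copies so the inputs are untouched
    y = y[:]
    n_vars = len(X[0])
    scores, weights = [], []
    for _ in range(n_factors):
        # Weight vector: direction in x-space of maximum covariance with y.
        w = [dot([row[j] for row in X], y) for j in range(n_vars)]
        norm = sqrt(dot(w, w))
        w = [v / norm for v in w]
        t = [dot(row, w) for row in X]   # x-block scores
        tt = dot(t, t)
        p = [dot([row[j] for row in X], t) / tt for j in range(n_vars)]  # loadings
        q = dot(y, t) / tt               # regression of y on the scores
        # Deflate both blocks before extracting the next factor.
        X = [[row[j] - ti * p[j] for j in range(n_vars)] for row, ti in zip(X, t)]
        y = [yi - q * ti for yi, ti in zip(y, t)]
        scores.append(t)
        weights.append(w)
    return scores, weights

# Hypothetical mean-centred data: 4 samples, 3 x-variables (columns 1 and 2 nearly collinear).
X = [[-1.0, -0.9, 0.3],
     [0.0, 0.1, -0.6],
     [1.0, 0.8, 0.1],
     [0.0, 0.0, 0.2]]
y = [-1.0, 0.2, 1.1, -0.3]

scores, weights = pls1_nipals(X, y, n_factors=2)
score_overlap = dot(scores[0], scores[1])     # expected to be zero up to rounding
weight_overlap = dot(weights[0], weights[1])  # expected to be zero up to rounding
weight_norms = [sqrt(dot(w, w)) for w in weights]  # each expected to be 1
```

The near-collinear columns cause no difficulty because no matrix inversion of the x-block is involved; each factor is built from inner products and a deflation step.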

Fig. 6 In the case of the oblique unit cell the intensity of the (−11) signal in the XRD pattern is usually higher than the intensity of the (11) signal (left). This indicates that the layer fragments in the crystallographic unit cell are inclined toward the shorter diagonal [the crystallographic plane (−11)] of the primitive unit cell (right). The electron density map was reconstructed from the X-ray data (see Sect. 3); bright regions are filled by the aromatic parts of molecules while the dark regions are filled by the alkyl chains. Dotted lines show midplanes of the blocks...

Note that the x block error depends only on how many PCs have been used in the model, but the error in the c block depends also on the specific compound, there being a different percentage error for each compound in the mixture. For 0 PCs, the estimates of the PCs and concentrations are simply 0 (or the mean if the data have been centred). The graphs of errors for... [Pg.11]

Fig. 11 represents the same data for acenaphthylene. Whereas the x block modelling error is fairly similar to that of pyrene, the concentration is modelled much less well, a consequence of the substantial spectral overlap and lack of significant features. [Pg.15]

By plotting the eigenvalues (or errors in modelling the x block), it is also possible to determine prediction errors for the x data block. However, the main aim of calibration is to predict concentrations rather than spectra, so this information, although useful, is less frequently employed in calibration. [Pg.20]

Many refinements to cross-validation have been proposed in the literature. It is possible to perform cross-validation on the x block to determine the optimum number of components instead of the c block. There are several alternative approaches to cross-validation, a common one involving leaving larger proportions of the data out (e.g. one tenth) at a time, valuable for very large datasets. Some statisticians also propose methods involving removing individual measurements rather than individual objects or spectra, but such approaches are less used in analytical chemistry. The leave one sample out at a time method is a popular, easily implemented, and widespread approach. There tends to be a significant divide between... [Pg.21]
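The two sample-wise schemes mentioned above differ only in how the held-out indices are chosen. The sketch below generates the index splits; the model fitting is omitted. The interleaved group assignment is an assumption (random assignment is equally possible), and the function names are hypothetical.

```python
def leave_one_out(n_samples):
    """Yield (train_indices, test_indices), leaving one sample out at a time."""
    for i in range(n_samples):
        yield [j for j in range(n_samples) if j != i], [i]

def leave_fraction_out(n_samples, n_groups=10):
    """Yield splits that leave roughly 1/n_groups of the samples out each time."""
    for g in range(n_groups):
        test = list(range(g, n_samples, n_groups))  # interleaved assignment
        held_out = set(test)
        train = [j for j in range(n_samples) if j not in held_out]
        yield train, test

# Hypothetical dataset sizes.
loo_splits = list(leave_one_out(5))
tenth_splits = list(leave_fraction_out(50, n_groups=10))
```

For each split, a model would be built on the training indices and used to predict the held-out samples; the prediction errors are then pooled over all splits for each candidate number of components.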
