Multivariate dependent data

The last four chapters of this book have all been concerned with methods that handle multiple independent (descriptor) variables. This has included techniques for displaying multivariate data in lower dimensional space, determining relationships between points in N dimensions and fitting models between multiple descriptors and a single response variable, continuous or discrete. Hopefully, these examples have shown the power of multivariate techniques in data analysis and have demonstrated that the information contained in a data set will often be revealed only by consideration of all of the data at once. What is true for the analysis of multiple descriptor variables is also true for the analysis of multiple [Pg.162]

Equation (8.2) only describes 60 per cent of the variance in PC2 and the high standard error for the shape descriptor term casts some doubt on the predictive ability of the equation. However, it is hoped that these two equations demonstrate the way in which regression models for multivariate dependent data can be generated by means of PCA. [Pg.178]

This chapter has shown how multivariate dependent data, from multiple experiments or multiple results from one experiment, may be analysed by a variety of methods. The output from these analyses should be consistent with the results of the analysis of individual variables and in some circumstances may provide information that is not available from consideration of individual results. In this respect the multivariate treatment of dependent data offers the same advantages as the multivariate treatment of independent data. The simultaneous multivariate analysis of response and descriptor data may also be advantageous but does suffer from complexity in prediction. [Pg.182]

Analytical quality control (QC) efforts usually are at level I or II. Statistical evaluation of multivariate laboratory data is often complicated because the number of dependent variables is greater than the number of samples. In evaluating quality control, the analyst seeks to establish that replicate analyses made on reference material of known composition do not contain excessive systematic or random errors of measurement. In addition, when such problems are detected, it is helpful if remedial measures can be inferred from the QC data. [Pg.2]
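One reason the statistics become awkward when variables outnumber samples is that the sample covariance matrix is then singular, so any statistic that needs its inverse (a Mahalanobis distance, for instance) cannot be computed directly. The following is a minimal sketch on random numbers; the sizes and the random data are assumptions for illustration, not QC results.

```python
import numpy as np

# Hypothetical sizes: 5 replicate samples, 8 measured variables.
rng = np.random.default_rng(3)
n_samples, n_vars = 5, 8
Y = rng.normal(size=(n_samples, n_vars))   # made-up data, rows = samples

# Sample covariance between the 8 variables (8 x 8 matrix).
S = np.cov(Y, rowvar=False)

# After centring, n samples can span at most n - 1 dimensions,
# so the rank of S cannot exceed n_samples - 1 and S has no inverse.
rank = np.linalg.matrix_rank(S)
print("covariance shape:", S.shape, "rank:", rank)
```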

Chapters 6 and 7 described the construction of regression models (MLR, PCR, PLS, and continuum regression) in which a single dependent variable was related to linear combinations of independent variables. Can these procedures be modified to include multiple dependent variables? One fairly obvious way to take account of at least some of the information in a multivariate dependent set is to carry out PCA or FA on the data and use the resulting scores to construct regression models. [Pg.177]
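The approach described above (PCA on the dependent data, then regression of the scores on the descriptors) can be sketched as follows. This is an illustration under stated assumptions: the data are synthetic, the coefficient matrix B is hypothetical, and retaining two components is an arbitrary choice, not the procedure used for the equations in the source text.

```python
import numpy as np

# Synthetic data (assumed): 8 objects, 2 descriptors, 3 responses.
rng = np.random.default_rng(0)
X = rng.normal(size=(8, 2))                   # descriptor (independent) matrix
B = np.array([[1.0, 0.5, -0.3],
              [0.2, -1.0, 0.8]])              # hypothetical coefficients
Y = X @ B + 0.05 * rng.normal(size=(8, 3))    # response (dependent) matrix

# Step 1: PCA of the column-centred response matrix via SVD.
Yc = Y - Y.mean(axis=0)
U, s, Vt = np.linalg.svd(Yc, full_matrices=False)
scores = U * s                                # PC scores, one column per PC
explained = s**2 / np.sum(s**2)               # fraction of variance per PC

# Step 2: least-squares regression of the retained scores on the descriptors.
n_keep = 2                                    # arbitrary choice for this sketch
A = np.column_stack([np.ones(len(X)), X])     # intercept + descriptors
coef, *_ = np.linalg.lstsq(A, scores[:, :n_keep], rcond=None)

# R^2 for each score model (scores have zero mean because Yc is centred).
fitted = A @ coef
ss_res = np.sum((scores[:, :n_keep] - fitted) ** 2, axis=0)
ss_tot = np.sum(scores[:, :n_keep] ** 2, axis=0)
r2 = 1.0 - ss_res / ss_tot
print("variance explained by each PC:", explained)
print("R^2 of each score regression:", r2)
```

Each column of coef is one regression equation, with a PC score as the dependent variable in place of an individual response.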

For example, the objects may be chemical compounds. The individual components of a data vector are called features and may, for example, be molecular descriptors (see Chapter 8) specifying the chemical structure of an object. For statistical data analysis, these objects and features are represented by a matrix X which has a row for each object and a column for each feature. In addition, each object will have one or more properties that are to be investigated, e.g., a biological activity of the structure or a class membership. This property or properties are merged into a matrix Y. Thus, the data matrix X contains the independent variables whereas the matrix Y contains the dependent ones. Figure 9-3 shows a typical multivariate data matrix. [Pg.443]

Analytical results are often represented in a data table, e.g., a table of the fatty acid compositions of a set of olive oils. Such a table is called a two-way multivariate data table. Because some olive oils may originate from the same region and others from a different one, the complete table has to be studied as a whole instead of as a collection of individual samples, i.e., the results of each sample are interpreted in the context of the results obtained for the other samples. For example, one may ask for natural groupings of the samples in clusters with a common property, namely a similar fatty acid composition. This is the objective of cluster analysis (Chapter 30), which is one of the techniques of unsupervised pattern recognition. The results of the clustering do not depend on the way the results have been arranged in the table, i.e., the order of the objects (rows) or the order of the fatty acids (columns). In fact, the order of the variables or objects has no particular meaning. [Pg.1]

We consider an n × n table D of distances between the n row-items of an n × p data table X. Distances can be derived from the data by means of various functions, depending upon the nature of the data and the objective of the analysis. Each of these functions defines a particular metric (or yardstick), and the graphical result of a multivariate analysis may largely depend on the particular choice of distance function. [Pg.146]
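As a small illustration of how the choice of distance function changes the table D, the sketch below computes Euclidean and city-block distances for the same three rows. The function name distance_table and the data are made up for this example.

```python
import numpy as np

def distance_table(X, metric="euclidean"):
    """Return the n x n table of distances between the rows of X."""
    X = np.asarray(X, dtype=float)
    diff = X[:, None, :] - X[None, :, :]      # pairwise differences, (n, n, p)
    if metric == "euclidean":
        return np.sqrt(np.sum(diff ** 2, axis=2))
    if metric == "cityblock":
        return np.sum(np.abs(diff), axis=2)
    raise ValueError(f"unknown metric: {metric}")

# Made-up 3 x 2 data table (n = 3 row-items, p = 2 variables).
X = np.array([[0.0, 0.0],
              [3.0, 4.0],
              [6.0, 0.0]])

D_euc = distance_table(X, "euclidean")
D_city = distance_table(X, "cityblock")
print(D_euc)
print(D_city)
```

With the Euclidean metric the first row is closer to the second row (5 versus 6), whereas with the city-block metric it is closer to the third row (6 versus 7), so a nearest-neighbour grouping would differ between the two.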

Any data matrix can be considered in two spaces: the column or variable space (here, wavelength space) in which a row (here, spectrum) is a vector in the multidimensional space defined by the column variables (here, wavelengths), and the row space (here, retention time space) in which a column (here, chromatogram) is a vector in the multidimensional space defined by the row variables (here, elution times). This duality of the multivariate spaces has been discussed in more detail in Chapter 29. Depending on the chosen space, the PCs of the data matrix... [Pg.246]
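The duality of the two spaces can be illustrated with a singular value decomposition, which yields axes for both spaces at once. This is a sketch on a random matrix standing in for a hypothetical table of 6 elution times by 4 wavelengths; it is not taken from the source, and no centring is applied.

```python
import numpy as np

# Random stand-in for a data matrix: 6 elution times (rows) x 4 wavelengths.
rng = np.random.default_rng(2)
X = rng.normal(size=(6, 4))

# X = U diag(s) Vt
# Rows of Vt: orthonormal axes in column (wavelength) space.
# Columns of U: orthonormal axes in row (retention-time) space.
U, s, Vt = np.linalg.svd(X, full_matrices=False)

# The two spaces share the same non-zero eigenvalues, equal to s**2.
eig_cols = np.linalg.eigvalsh(X.T @ X)[::-1]       # 4 values, descending
eig_rows = np.linalg.eigvalsh(X @ X.T)[::-1][:4]   # leading 4 of 6 values
print(s ** 2)
print(eig_cols)
print(eig_rows)
```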

Partial least squares (PLS) projection to latent structures [40] is a multivariate data analysis tool that has gained much attention during the past decade, especially after the introduction of the 3D-QSAR method CoMFA [41]. PLS is a projection technique that uses latent variables (linear combinations of the original variables) to construct multidimensional projections while focusing on explaining as much as possible of the information in the dependent variable (in this case intestinal absorption) and not among the descriptors used to describe the compounds under investigation (the independent variables). PLS differs from MLR in a number of ways (apart from point 1 in Section 16.5.1) ... [Pg.399]
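The difference between a latent variable that explains the dependent variable and one that explains the descriptors can be sketched by comparing the first PLS weight vector with the first PCA loading. Only the first component of a single-response PLS model is computed here, not a full PLS fit, and the data are synthetic: one hypothetical high-variance descriptor is unrelated to y, and one low-variance descriptor drives it.

```python
import numpy as np

# Synthetic data (assumed): 30 compounds, 4 descriptors, 1 response.
rng = np.random.default_rng(1)
n = 30
X = rng.normal(size=(n, 4))
X[:, 0] *= 5.0                               # large variance, unrelated to y
y = X[:, 3] + 0.1 * rng.normal(size=n)       # y depends on a small-variance column

Xc = X - X.mean(axis=0)
yc = y - y.mean()

# First PLS weight vector: unit vector maximising covariance of X w with y.
w = Xc.T @ yc
w /= np.linalg.norm(w)
t_pls = Xc @ w                               # first PLS latent variable

# First PCA loading: unit vector maximising the variance of X p.
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
p = Vt[0]
t_pca = Xc @ p                               # first principal component score

cov_pls = abs(t_pls @ yc) / (n - 1)
cov_pca = abs(t_pca @ yc) / (n - 1)
print("|cov(t, y)| for PLS:", cov_pls, "for PCA:", cov_pca)
```

By construction the PLS latent variable has at least as large a covariance with y as any other unit-length combination of the descriptors, including the first principal component.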

We now proceed to m observations. The ith observation provides the estimates x_ij of the independent variables X_j and the estimate y_i of the dependent variable Y. The n estimates x_ij of the variables X_j provided by this ith observation are lumped together into the vector x_i. We assume that the set of the (n + 1) data (x_i, y_i) associated with the ith observation represents unbiased estimates of the mean of a random (n + 1)-vector following a multivariate normal distribution. The unbiased character of the estimates is equivalent to... [Pg.294]

Big Chemical Encyclopedia
