Principal component statistical

Figure 10.3 compares the distributions of a dataset containing the 108 most used existing solvents and a dataset of 239 SOLVSAFE solvent candidates in two principal components which account for the structural diversity of both datasets. One of the defining features of chemical spaces is that molecular structures can be represented as points whose coordinates depend on the values of relevant descriptors or variables. To characterize each molecular structure, SOLVSAFE used 52 structural descriptors. The principal component statistical analysis projects the data contained in the 52-dimensional chemical space into a two-dimensional space (plot in Figure 10.3). This approximation provides an overview of the systematic variation and distribution of the structural information and reveals how significant the dissimilarity of the SOLVSAFE dataset is when compared with the traditional solvents dataset.
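
As a rough illustration of the projection described above, the following sketch reduces a 52-column descriptor matrix to two principal components. It is only a sketch under stated assumptions: the data are random stand-ins (not the SOLVSAFE or traditional-solvent descriptors), the descriptors are centred but not scaled, and the function name `pca_project` is hypothetical.

```python
import numpy as np

def pca_project(X, n_components=2):
    """Project the rows of X onto the first n_components principal axes."""
    mean = X.mean(axis=0)
    Xc = X - mean                                  # centre each descriptor
    # SVD of the centred matrix: the rows of Vt are the principal axes
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    axes = Vt[:n_components]
    scores = Xc @ axes.T                           # coordinates in the reduced space
    explained = (s ** 2) / np.sum(s ** 2)          # fraction of variance per component
    return scores, axes, explained[:n_components]

rng = np.random.default_rng(0)
# Synthetic stand-ins (assumption): 108 "existing" and 239 "candidate" rows, 52 descriptors
existing = rng.normal(0.0, 1.0, size=(108, 52))
candidates = rng.normal(0.5, 1.5, size=(239, 52))
X = np.vstack([existing, candidates])

scores, axes, explained = pca_project(X, 2)
existing_2d, candidates_2d = scores[:108], scores[108:]
print(scores.shape, explained)
```

Fitting the axes on the pooled data and then splitting the scores places both datasets in the same two-dimensional plane, which is what allows their distributions to be compared in a single plot.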

The previously mentioned data set with a total of 115 compounds has already been studied by other statistical methods such as Principal Component Analysis (PCA), Linear Discriminant Analysis, and the Partial Least Squares (PLS) method [39]. Thus, the choice and selection of descriptors has already been accomplished. [Pg.508]

Multiple linear regression is strictly a parametric supervised learning technique. A parametric technique is one which assumes that the variables conform to some distribution (often the Gaussian distribution); the properties of the distribution are assumed in the underlying statistical method. A non-parametric technique does not rely upon the assumption of any particular distribution. A supervised learning method is one which uses information about the dependent variable to derive the model. An unsupervised learning method does not. Thus cluster analysis, principal components analysis and factor analysis are all examples of unsupervised learning techniques. [Pg.719]
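
The supervised/unsupervised distinction can be made concrete with a small sketch. It assumes synthetic data and two hypothetical helper functions, `mlr_fit` and `pca_axes`: the regression coefficients change when the dependent variable is shuffled, whereas the principal axes are computed from the independent variables alone and never see it.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(50, 3))                       # independent variables (synthetic)
y = X @ np.array([2.0, -1.0, 0.5]) + rng.normal(scale=0.1, size=50)

def mlr_fit(X, y):
    """Supervised: the least-squares coefficients depend on y."""
    A = np.column_stack([np.ones(len(X)), X])      # add an intercept column
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def pca_axes(X):
    """Unsupervised: the principal axes are derived from X only."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Vt

coef = mlr_fit(X, y)
coef_shuffled = mlr_fit(X, rng.permutation(y))     # break the link between X and y
print(coef, coef_shuffled)
print(pca_axes(X))                                 # no y argument exists to change this
```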

We now consider a type of analysis in which the data (which may consist of solvent properties or of solvent effects on rates, equilibria, and spectra) again are expressed as a linear combination of products as in Eq. (8-81), but now the statistical treatment yields estimates of both a_i and x_i. This method is called principal component analysis or factor analysis. A key difference between multiple linear regression analysis and principal component analysis (in the chemical setting) is that regression analysis adopts chemical models a priori, whereas in factor analysis the chemical significance of the factors emerges (if desired) as a result of the analysis. We will not explore the statistical procedure, but will cite some results. We have already encountered examples in Section 8.2 on the classification of solvents and in the present section in the form of the Swain et al. treatment leading to Eq. (8-74). [Pg.445]
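
Eq. (8-81) is not reproduced in this excerpt, so the following is only a sketch under an assumed bilinear form: each entry of a solvents-by-processes data matrix is taken to be a sum of products of a solvent-dependent quantity and a process-dependent quantity. A truncated singular value decomposition then estimates both sets of quantities at once. All data and variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2)
# Synthetic example (assumption): 12 solvents, 6 measured processes, 2 underlying factors
x_true = rng.normal(size=(12, 2))      # solvent-dependent quantities
a_true = rng.normal(size=(2, 6))       # process-dependent sensitivities
D = x_true @ a_true + rng.normal(scale=0.01, size=(12, 6))

# Factor the data matrix without supplying either set of quantities in advance
U, s, Vt = np.linalg.svd(D, full_matrices=False)
k = 2                                  # number of factors retained
x_est = U[:, :k] * s[:k]               # estimated solvent factor scores
a_est = Vt[:k]                         # estimated sensitivities (loadings)
D_hat = x_est @ a_est                  # data reconstructed from k factors
rel_err = np.linalg.norm(D - D_hat) / np.linalg.norm(D)
print(rel_err)
```

The recovered factors reproduce the data but are determined only up to an invertible linear transformation, which is why any chemical meaning has to be assigned after the analysis rather than before it.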

More detailed statistical analyses (chemical element balance, principal component analysis and factor analysis) demonstrate that soil contributes >50% to street dust; iron materials, concrete/cement and tire wear contribute 5-7% each, with smaller contributions from salt spray, de-icing salt and motor vehicle emissions (5,93-100). A list is given in Table VII of the main sources of the elements which contribute to street dust. [Pg.130]

Since that time thousands of QSARs, covering a wide and diverse range of end points, have been published [9]; most of these have used MLR, but numerous other statistical techniques have also been used, such as partial least squares, principal component analysis, artificial neural networks, decision trees, and discriminant analysis [f4]. [Pg.472]

The data from sensory evaluation and texture profile analysis of the jellies made with amidated pectin and sunflower pectin were subjected to principal component analysis (PC) using statistical software based on the Jacobi method (Univac, 1973). The results of the PC analysis are shown in Figure 7. The plane of the two principal components (F1, F2) explains 89.75% of the variance contained in the original data. The attributes related to textural evaluation are highly correlated with the first principal component (Had.=0.95, Spr.=0.97, Che.=0.98, Gum.=0.95, Coe=0.98, HS=0.82 and SP=-0.93). As could be expected, spreadability increases along the negative side of the axis, unlike the other textural parameters. [Pg.937]
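
The excerpt does not describe the cited software further. The sketch below assumes that "Jacobi method" refers to the classical Jacobi eigenvalue algorithm, which diagonalizes a symmetric matrix by repeated plane rotations, and shows how the fraction of variance explained by two components follows from the eigenvalues. The 3x3 correlation matrix `R` is hypothetical and is not the jelly data.

```python
import math

def jacobi_eigen(A, tol=1e-20, max_sweeps=50):
    """Eigenvalues and eigenvectors of a symmetric matrix by cyclic Jacobi rotations."""
    n = len(A)
    A = [[float(v) for v in row] for row in A]      # work on a copy
    V = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(max_sweeps):
        off = sum(A[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:                               # off-diagonal part is negligible
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if A[p][q] == 0.0:
                    continue
                # Rotation angle (|theta| <= pi/4) that zeroes A[p][q]
                diff = A[q][q] - A[p][p]
                theta = math.pi / 4 if diff == 0.0 else 0.5 * math.atan(2.0 * A[p][q] / diff)
                c, s = math.cos(theta), math.sin(theta)
                for k in range(n):                  # A <- A G (columns p and q)
                    akp, akq = A[k][p], A[k][q]
                    A[k][p], A[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):                  # A <- G^T A (rows p and q)
                    apk, aqk = A[p][k], A[q][k]
                    A[p][k], A[q][k] = c * apk - s * aqk, s * apk + c * aqk
                for k in range(n):                  # accumulate eigenvectors: V <- V G
                    vkp, vkq = V[k][p], V[k][q]
                    V[k][p], V[k][q] = c * vkp - s * vkq, s * vkp + c * vkq
    return [A[i][i] for i in range(n)], V           # eigenvectors are the columns of V

# Hypothetical correlation matrix of three attributes (assumption, not the source data)
R = [[1.0, 0.9, 0.8],
     [0.9, 1.0, 0.7],
     [0.8, 0.7, 1.0]]

eigvals, V = jacobi_eigen(R)
sorted_vals = sorted(eigvals, reverse=True)
explained_two = sum(sorted_vals[:2]) / sum(sorted_vals)
print(sorted_vals, explained_two)
```

A figure such as the 89.75% quoted above is this kind of ratio: the sum of the two largest eigenvalues divided by the sum of all eigenvalues.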

Principal component analysis (PCA) is a statistical method whose main purpose is to represent economically the location of the objects in a reduced coordinate system where only p axes are used instead of the n axes corresponding to n variables (p... [Pg.94]

H. Hotelling, Analysis of a complex of statistical variables into principal components. J. Educ. [Pg.159]

According to Andersen [12] early applications of LLM are attributed to the Danish sociologist Rasch in 1963 and to Andersen himself. Later on, the approach has been described under many different names, such as spectral map analysis [13,14] in studies of drug specificity, as logarithmic analysis in the French statistical literature [15] and as the saturated RC association model [16]. The term log-bilinear model has been used by Escoufier and Junca [17]. In Chapter 31 on the analysis of measurement tables we have described the method under the name of log double-centred principal components analysis. [Pg.201]

Here xik is an estimated value of a variable at a given point in time. Given that the estimate is calculated based on a model of variability, i.e., PCA, Qi can reflect error relative to principal components for known data. A given pattern of data, x, can be classified based on a threshold value of Qi determined from analyzing the variability of the known data patterns. In this way, the Q-statistic will detect changes that violate the model used to estimate x. The Q-statistic threshold for methods based on linear projection such as PCA and PLS for Gaussian distributed data can be determined from the eigenvalues of the components not included in the model (Jackson, 1992). [Pg.55]
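
The defining equation for Qi is not included in this excerpt. The sketch below assumes the common form in which Qi is the sum over variables of the squared difference between each measured value and its estimate from the retained principal components. The threshold is an empirical 99th percentile of the known data, used here as a simple stand-in for the eigenvalue-based limit attributed to Jackson. Data and function names are hypothetical.

```python
import numpy as np

def fit_pca(X, n_components):
    """Return the column means and the loading matrix P (n_vars x n_components)."""
    mean = X.mean(axis=0)
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:n_components].T

def q_statistic(X, mean, P):
    """Squared prediction error of each row relative to the PCA model."""
    Xc = X - mean
    X_hat = Xc @ P @ P.T                    # estimate from the retained components
    return np.sum((Xc - X_hat) ** 2, axis=1)

rng = np.random.default_rng(3)
# Synthetic "known" patterns lying close to a 2-D plane in a 5-D space (assumption)
latent = rng.normal(size=(200, 2))
W = rng.normal(size=(2, 5))
X_known = latent @ W + rng.normal(scale=0.05, size=(200, 5))

mean, P = fit_pca(X_known, 2)
Q_known = q_statistic(X_known, mean, P)
threshold = np.percentile(Q_known, 99)      # empirical limit, not the Jackson formula

# A new pattern whose third variable departs from the usual correlation structure
x_new = np.array([[1.0, -1.0]]) @ W + np.array([[0.0, 0.0, 2.0, 0.0, 0.0]])
Q_new = q_statistic(x_new, mean, P)[0]
is_fault = Q_new > threshold
print(threshold, Q_new, is_fault)
```

Because Q measures only the part of a pattern that the retained components cannot reproduce, a large value signals a change in the relationships among variables rather than an unusually large excursion within the model plane.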

However, there is a mathematical method for selecting those variables that best distinguish between formulations—those variables that change most drastically from one formulation to another and that should be the criteria on which one selects constraints. A multivariate statistical technique called principal component analysis (PCA) can effectively be used to answer these questions. PCA utilizes a variance-covariance matrix for the responses involved to determine their interrelationships. It has been applied successfully to this same tablet system by Bohidar et al. [18]. [Pg.618]

The multivariate statistical data analysis, using principal component analysis (PCA), of this historical data revealed three main contamination profiles. A first contamination profile was identified as mostly loaded with PAHs. A samples group which includes sampling sites R1 (Ebro river in Miranda de Ebro, La Rioja), T3 (Zadorra river in Villodas, Alava) and T9 (Arga river in Puente la Reina, Navarra), all located in the upper Ebro river basin and close to Pamplona and Vitoria cities,... [Pg.146]

To understand these processes and correlate residue profiles with specific toxic responses required congener-specific methods of analysis and complex statistical techniques (principal component analysis). Using these techniques, it was established that eggs of Forster's terns of two colonies differed significantly in PCB composition (Schwartz and Stalling 1991). Similar techniques were used to identify various PCB-contaminated populations of harbor seals (Phoca vitulina) in Denmark (Storr-Hansen and Spliid 1993). [Pg.1318]

The authors wanted to select indicators that specifically tap melancholic depression. To evaluate this construct, a principal components analysis of the joint pool of K-SADS and BDI items was performed. Two independent statistical tests suggested a two-component solution, but the resulting components appeared to reflect method factors, rather than substantive factors. Specifically, all of the BDI items loaded on the first component (except for three items that did not load on either component) and nearly all of the K-SADS items loaded on the second component. In fact, the first component correlated .98 with the BDI and the second component correlated .93 with the K-SADS. Ambrosini et al., however, concluded that the first component reflected depression severity and the second component reflected melancholic depression. This interpretation was somewhat at odds with the data. Specifically, the second component included some K-SADS items that did not tap symptoms of melancholia (e.g., irritability and anger) and did not include some BDI items that measure symptoms of melancholia (e.g., loss of appetite). [Pg.158]

Principal Component Analysis (PCA) is a complex statistical approach for highlighting the variance in the image using multiplication of original data with eigenvectors. (NASA Remote Sensing... [Pg.486]

Big Chemical Encyclopedia
