Principal component statistical analysis

Figure 10.3 compares the distributions of a dataset containing the 108 most-used existing solvents and a dataset of 239 SOLVSAFE solvent candidates in two principal components which account for the structural diversity of both datasets. One of the defining features of chemical spaces is that molecular structures can be represented as points whose coordinates depend on the values of relevant descriptors or variables. To characterize each molecular structure, SOLVSAFE used 52 structural descriptors. The principal component statistical analysis projects the data contained in the 52-dimensional chemical space into a two-dimensional space (plot in Figure 10.3). This approximation provides an overview of the systematic variation and distribution of the structural information and reveals how strongly the SOLVSAFE dataset differs from the traditional solvents dataset.
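The projection described above can be sketched in a few lines of dependency-free Python. This is only an illustration under stated assumptions: the data are synthetic stand-ins (not the SOLVSAFE descriptors), the helper names (`center`, `covariance`, `leading_eigvec`, `project_2d`, `fake_molecule`) are hypothetical, and power iteration with deflation is just one simple way to obtain the first two principal components.

```python
import math
import random

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def center(rows):
    """Subtract each column's mean so the point cloud is centered at the origin."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    return [[r[j] - means[j] for j in range(d)] for r in rows]

def covariance(centered):
    """Sample variance-covariance matrix of already-centered rows."""
    n, d = len(centered), len(centered[0])
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
            for i in range(d)]

def leading_eigvec(mat, iters=200):
    """Power iteration: unit eigenvector for the largest eigenvalue of a symmetric matrix."""
    v = [float(i + 1) for i in range(len(mat))]  # arbitrary non-zero start vector
    for _ in range(iters):
        w = [dot(row, v) for row in mat]
        norm = math.sqrt(dot(w, w))
        v = [x / norm for x in w]
    return v

def project_2d(rows):
    """Return (scores, pc1, pc2): coordinates of every row on the first two components."""
    xc = center(rows)
    cov = covariance(xc)
    pc1 = leading_eigvec(cov)
    lam1 = dot(pc1, [dot(row, pc1) for row in cov])  # eigenvalue of PC1
    d = len(cov)
    # Deflation: remove the PC1 direction so power iteration converges to PC2.
    deflated = [[cov[i][j] - lam1 * pc1[i] * pc1[j] for j in range(d)]
                for i in range(d)]
    pc2 = leading_eigvec(deflated)
    scores = [(dot(r, pc1), dot(r, pc2)) for r in xc]
    return scores, pc1, pc2

# Synthetic data (an assumption): 52 descriptors driven by two hidden factors.
rng = random.Random(0)

def fake_molecule(shift):
    t1, t2 = rng.gauss(0, 3), rng.gauss(shift, 1)
    return [(t1 if j < 26 else t2) + rng.gauss(0, 0.1) for j in range(52)]

existing = [fake_molecule(0.0) for _ in range(108)]    # stand-in for existing solvents
candidates = [fake_molecule(2.0) for _ in range(239)]  # stand-in for candidates
scores, pc1, pc2 = project_2d(existing + candidates)
print(len(scores), "points projected from 52 dimensions to 2")
```

Plotting the first 108 score pairs against the remaining 239 would give a picture analogous to the one the excerpt describes, with any separation between the two clouds indicating structural dissimilarity.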

Table V. Statistical Summary for A-3 Principal Components SIMCA Analysis of Aroclor Samples.

Leonard and Roy [194] recently reported QSAR 70-73 on the HIV protease inhibitory data of 1,2,5,6-tetra-O-benzyl-D-mannitols (62) studied by Bouzide et al. [195]. Several statistical techniques such as stepwise regression, multiple linear regression with factor analysis as the data preprocessing step (FA-MLR), principal component regression analysis (PCRA) and partial least squares (PLS) analysis were applied to identify the structural and physicochemical requirements for HIV protease inhibitory activity. [Pg.240]

Because we were not confident in the 20 factors proposed by the Minnesota team, we conducted a principal-components factor analysis of the database. We engaged a mathematical statistician, Marilyn Monda, to assist us with this process. A factor analysis is a statistical process that looks in the database for groups of questions that correlate. This process allows us to define sets of questions that are related to each other. We call these question sets factors. Our factor analysis indicated that the original Minnesota survey measured six factors. Rather than name the factors immediately, we decided to conduct focus-group research with survey respondents in order to better understand the factors. [Pg.149]

The previously mentioned data set with a total of 115 compounds has already been studied by other statistical methods such as Principal Component Analysis (PCA), Linear Discriminant Analysis, and the Partial Least Squares (PLS) method [39]. Thus, the choice and selection of descriptors has already been accomplished. [Pg.508]

Multiple linear regression is strictly a parametric supervised learning technique. A parametric technique is one which assumes that the variables conform to some distribution (often the Gaussian distribution); the properties of the distribution are assumed in the underlying statistical method. A non-parametric technique does not rely upon the assumption of any particular distribution. A supervised learning method is one which uses information about the dependent variable to derive the model. An unsupervised learning method does not. Thus cluster analysis, principal components analysis and factor analysis are all examples of unsupervised learning techniques. [Pg.719]
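The supervised/unsupervised distinction can be made concrete with a small sketch. Everything here is illustrative: the data are randomly generated, and `ols_slope` and `first_pc_angle` are hypothetical helper names. The regression slope needs the dependent variable y, so scrambling y changes it; the principal axis is computed from the descriptors alone and never sees y.

```python
import math
import random

def mean(xs):
    return sum(xs) / len(xs)

def ols_slope(x, y):
    """Supervised: least-squares slope of y on x (uses the dependent variable y)."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

def first_pc_angle(x1, x2):
    """Unsupervised: angle (radians) of the first principal axis of two descriptors."""
    m1, m2 = mean(x1), mean(x2)
    s11 = sum((a - m1) ** 2 for a in x1)
    s22 = sum((b - m2) ** 2 for b in x2)
    s12 = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
    # Closed form for the major axis of a 2x2 covariance matrix.
    return 0.5 * math.atan2(2 * s12, s11 - s22)

rng = random.Random(1)
x1 = [rng.gauss(0, 1) for _ in range(50)]            # descriptor 1
x2 = [0.5 * a + rng.gauss(0, 0.2) for a in x1]       # descriptor 2, correlated with x1
y = [2.0 * a + rng.gauss(0, 0.1) for a in x1]        # dependent variable
y_shuffled = y[:]
rng.shuffle(y_shuffled)                              # destroy the x-y relationship

slope = ols_slope(x1, y)
slope_shuffled = ols_slope(x1, y_shuffled)
angle = first_pc_angle(x1, x2)                       # identical whichever y is used

print("OLS slope with true y:     ", round(slope, 3))
print("OLS slope with shuffled y: ", round(slope_shuffled, 3))
print("First principal axis angle:", round(angle, 3))
```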

We now consider a type of analysis in which the data (which may consist of solvent properties or of solvent effects on rates, equilibria, and spectra) again are expressed as a linear combination of products as in Eq. (8-81), but now the statistical treatment yields estimates of both a_i and x_i. This method is called principal component analysis or factor analysis. A key difference between multiple linear regression analysis and principal component analysis (in the chemical setting) is that regression analysis adopts chemical models a priori, whereas in factor analysis the chemical significance of the factors emerges (if desired) as a result of the analysis. We will not explore the statistical procedure, but will cite some results. We have already encountered examples in Section 8.2 on the classification of solvents and in the present section in the form of the Swain et al. treatment leading to Eq. (8-74). [Pg.445]

More detailed statistical analyses (chemical element balance, principal component analysis and factor analysis) demonstrate that soil contributes >50% to street dust, iron materials, concrete/cement and tire wear contribute 5-7% each, with smaller contributions from salt spray, de-icing salt and motor vehicle emissions (5,93-100). A list is given in Table VII of the main sources of the elements which contribute to street dust. [Pg.130]

Since that time thousands of QSARs, covering a wide and diverse range of end points, have been published [9]; most of these have used MLR, but numerous other statistical techniques have also been used, such as partial least squares, principal component analysis, artificial neural networks, decision trees, and discriminant analysis [f4]. [Pg.472]

The data from sensory evaluation and texture profile analysis of the jellies made with amidated pectin and sunflower pectin were subjected to principal component (PC) analysis using statistical software based on the Jacobi method (Univac, 1973). The results of the PC analysis are shown in Figure 7. The plane of the two principal components (F1, F2) explains 89.75% of the variance contained in the original data. The attributes related to textural evaluation are highly correlated with the first principal component (Had.=0.95, Spr.=0.97, Che.=0.98, Gum.=0.95, Coe=0.98, HS=0.82 and SP=-0.93). As might be expected, spreadability increases along the negative side of the axis, unlike the other textural parameters. [Pg.937]
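For readers unfamiliar with the Jacobi method mentioned above, the following is a minimal sketch of a cyclic Jacobi eigenvalue routine applied to a small correlation matrix, followed by the explained-variance fraction of the first two components and the variable-to-component correlations (loadings). The 3x3 matrix is made up for illustration and is not the jelly data; `jacobi_eigen` is a hypothetical name, and this is not a reconstruction of the 1973 Univac software.

```python
import math

def jacobi_eigen(a, tol=1e-20, max_sweeps=50):
    """Cyclic Jacobi method for a symmetric matrix.
    Returns (eigenvalues, v) where column k of v is the eigenvector of eigenvalue k."""
    n = len(a)
    a = [row[:] for row in a]                       # work on a copy
    v = [[float(i == j) for j in range(n)] for i in range(n)]
    for _ in range(max_sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                diff = a[q][q] - a[p][p]
                if diff == 0.0:
                    theta = math.copysign(math.pi / 4, a[p][q])
                else:
                    theta = 0.5 * math.atan(2 * a[p][q] / diff)
                c, s = math.cos(theta), math.sin(theta)
                for k in range(n):                  # A <- A G and V <- V G (columns p, q)
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                    vkp, vkq = v[k][p], v[k][q]
                    v[k][p], v[k][q] = c * vkp - s * vkq, s * vkp + c * vkq
                for k in range(n):                  # A <- G^T A (rows p, q)
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
    return [a[i][i] for i in range(n)], v

# Hypothetical correlation matrix of three attributes (made-up values).
R = [[1.0, 0.9, -0.8],
     [0.9, 1.0, -0.7],
     [-0.8, -0.7, 1.0]]

eigvals, vecs = jacobi_eigen(R)
order = sorted(range(len(eigvals)), key=lambda k: eigvals[k], reverse=True)
explained_two = (eigvals[order[0]] + eigvals[order[1]]) / sum(eigvals)

# For a correlation matrix, the correlation of variable j with component k
# is the eigenvector entry scaled by the square root of the eigenvalue.
k1 = order[0]
loadings_pc1 = [vecs[j][k1] * math.sqrt(eigvals[k1]) for j in range(len(R))]

print("Fraction of variance in first two components:", round(explained_two, 4))
print("Correlations of the variables with PC1:", [round(x, 3) for x in loadings_pc1])
```

The sign of each loading is arbitrary up to a common flip of the whole component, which is why an attribute such as spreadability can sit on the opposite side of the axis from the others.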

Principal component analysis (PCA) is a statistical method whose main purpose is to represent the location of the objects economically in a reduced coordinate system, in which only p axes are used instead of the n axes corresponding to the n variables (p < n). [Pg.94]

Each oil-dispersant combination shows a unique threshold or onset of dispersion [589]. A statistical analysis showed that the principal factors involved are the oil composition, dispersant formulation, sea surface turbulence, and dispersant quantity [588]. The composition of the oil is very important. The effectiveness of the dispersant formulation correlates strongly with the amount of the saturate components in the oil. The other components of the oil (i.e., asphaltenes, resins, or polar substances and aromatic fractions) show a negative correlation with the dispersant effectiveness. The viscosity of the oil is determined by the composition of the oil. Therefore viscosity and composition are responsible for the effectiveness of a dispersant. The dispersant composition is significant and interacts with the oil composition. Sea turbulence strongly affects dispersant effectiveness. The effectiveness rises with increasing turbulence to a maximal value. The effectiveness of commercial dispersants follows a Gaussian distribution around a certain salinity value. [Pg.305]

H. Hotelling, Analysis of a complex of statistical variables into principal components. J. Educ. [Pg.159]

According to Andersen [12] early applications of LLM are attributed to the Danish sociologist Rasch in 1963 and to Andersen himself. Later on, the approach has been described under many different names, such as spectral map analysis [13,14] in studies of drug specificity, as logarithmic analysis in the French statistical literature [15] and as the saturated RC association model [16]. The term log-bilinear model has been used by Escoufier and Junca [17]. In Chapter 31 on the analysis of measurement tables we have described the method under the name of log double-centred principal components analysis. [Pg.201]

However, there is a mathematical method for selecting those variables that best distinguish between formulations—those variables that change most drastically from one formulation to another and that should be the criteria on which one selects constraints. A multivariate statistical technique called principal component analysis (PCA) can effectively be used to answer these questions. PCA utilizes a variance-covariance matrix for the responses involved to determine their interrelationships. It has been applied successfully to this same tablet system by Bohidar et al. [18]. [Pg.618]
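As a rough illustration of the variance-covariance matrix that PCA starts from, the sketch below computes it for a handful of responses and rescales it to correlations so that the interrelationships are unit-free. The response values are invented for this example (they are not the data of Bohidar et al.), and `covariance_matrix` and `correlation_matrix` are hypothetical helper names.

```python
import math

def covariance_matrix(rows):
    """Sample variance-covariance matrix; rows are formulations, columns are responses."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    return [[sum((r[i] - means[i]) * (r[j] - means[j]) for r in rows) / (n - 1)
             for j in range(d)] for i in range(d)]

def correlation_matrix(cov):
    """Divide each covariance by the two standard deviations to get correlations."""
    d = len(cov)
    sd = [math.sqrt(cov[i][i]) for i in range(d)]
    return [[cov[i][j] / (sd[i] * sd[j]) for j in range(d)] for i in range(d)]

# Made-up responses for five tablet formulations (assumption, for illustration only):
# columns = hardness (kg), disintegration time (min), friability (%)
responses = [
    [6.0, 4.0, 0.80],
    [7.5, 5.5, 0.60],
    [9.0, 7.0, 0.45],
    [10.5, 9.0, 0.30],
    [12.0, 10.0, 0.20],
]

cov = covariance_matrix(responses)
corr = correlation_matrix(cov)

for name, row in zip(["hardness", "disintegration", "friability"], corr):
    print(name.ljust(15), [round(x, 3) for x in row])
```

In this toy table the three responses move almost in lockstep, so a single principal component would carry most of the variation; responses that load heavily on the leading components are the natural candidates for constraints.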

The multivariate statistical data analysis, using principal component analysis (PCA), of this historical data revealed three main contamination profiles. A first contamination profile was identified as mostly loaded with PAHs. A samples group which includes sampling sites R1 (Ebro river in Miranda de Ebro, La Rioja), T3 (Zadorra river in Villodas, Alava) and T9 (Arga river in Puente la Reina, Navarra), all located in the upper Ebro river basin and close to Pamplona and Vitoria cities,... [Pg.146]

To understand these processes and correlate residue profiles with specific toxic responses required congener-specific methods of analysis and complex statistical techniques (principal component analysis). Using these techniques, it was established that eggs of Forster's terns of two colonies differed significantly in PCB composition (Schwartz and Stalling 1991). Similar techniques were used to identify various PCB-contaminated populations of harbor seals (Phoca vitulina) in Denmark (Storr-Hansen and Spliid 1993). [Pg.1318]
