Multivariate dataset

Principal component analysis (PCA) has been used to simplify and explore multivariate datasets that arise in many areas of human activity: social sciences, biostatistics, economics, etc. It has also been used effectively in analytical chemistry and in chemometrics [55]. PCA was introduced as a valuable tool in structure correlation work by Murray-Rust and Bland [15]. The first application [56] involved the analysis and classification of the conformations adopted in crystal structures by the β-1′-aminoribofuranosyl fragment. Since then, PCA has been applied to studies of substituent-induced deformations of benzene rings [57], configurational distortions from ideal symmetry at metal centres [7, 8, 9 and references therein], and to analyses of ring conformations in terms of archetypal forms and their interconversions (see e.g. [58]). [Pg.138]

The object of the various cluster analysis algorithms, then, is to perform a numerical dissection of the multivariate dataset G Nf,N, such that each individual cluster contains fragments that have closely similar geometries. The numerical results are best summarized in terms of an archetypal or mean fragment geometry for each ma-... [Pg.155]
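
The idea of dissecting a dataset into clusters and summarizing each by a mean fragment geometry can be sketched as follows. This is only an illustration under stated assumptions: the fragments are hypothetical two-parameter geometries (an angle in degrees and a bond length in angstroms), the merging rule is a simple centroid-distance agglomeration chosen for brevity, and no scaling of the variables is applied. It is not the algorithm of the source text.

```python
import math

def mean_geometry(fragments):
    """Average each geometric parameter over the fragments of one cluster."""
    n = len(fragments)
    return [sum(column) / n for column in zip(*fragments)]

def agglomerate(fragments, n_clusters):
    """Repeatedly merge the two clusters whose mean geometries are closest
    (Euclidean distance) until only n_clusters remain."""
    clusters = [[f] for f in fragments]
    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = math.dist(mean_geometry(clusters[i]),
                              mean_geometry(clusters[j]))
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters

# Hypothetical fragments: [angle / degrees, bond length / angstrom].
fragments = [
    [109.0, 1.54],
    [110.0, 1.53],
    [108.5, 1.55],
    [120.0, 1.40],
    [119.5, 1.41],
]

clusters = agglomerate(fragments, n_clusters=2)
for members in clusters:
    print(len(members), "fragments, mean geometry:", mean_geometry(members))
```

In practice the variables would usually be scaled first, because here the angle dominates the distance simply through its larger numerical range.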

All metabonomics studies result in complex multivariate datasets that require a variety of chemometric, bioinformatic, and visualization tools for effective interpretation. The aim of these procedures is to produce biochemically based fingerprints that are of diagnostic or other classification value. A second stage, crucial in such studies, is to identify the substances causing the diagnosis or classification, and these become the combination of biomarkers that reflects actual biological events. Thus, metabonomics studies allow real-world or biomedical endpoint observations to be obtained. [Pg.1505]

PCA or similar ordination methods are commonly used to reduce the number of variables in a multivariate dataset with minimal information loss (Wackernagel, 2003). Canonical correlation analysis (CCA) (Goovaerts, 1994; Wackernagel, 2003) is another method suited to multivariate indicator analysis; its aim is to analyze relationships between sets of variables. [Pg.591]
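
A minimal sketch of how PCA reduces the number of variables while keeping most of the information is given below. It assumes a small hypothetical dataset of five samples and three strongly correlated variables, and it finds only the first principal component by power iteration on the covariance matrix, which is a simplification chosen to keep the example dependency-free. The function names are illustrative, not taken from any library.

```python
import math

def center_columns(rows):
    """Subtract each column mean so that every variable has mean zero."""
    n = len(rows)
    means = [sum(column) / n for column in zip(*rows)]
    return [[x - m for x, m in zip(row, means)] for row in rows]

def covariance_matrix(centered):
    """Sample covariance matrix (variables x variables) of centered data."""
    n = len(centered)
    p = len(centered[0])
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(p)]
            for i in range(p)]

def first_principal_component(cov, iterations=500):
    """Leading eigenvector and eigenvalue of cov by power iteration.
    Caveat: the fixed start vector fails if it happens to be orthogonal
    to the leading eigenvector."""
    p = len(cov)
    v = [1.0 / math.sqrt(p)] * p
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    eigenvalue = sum(v[i] * sum(cov[i][j] * v[j] for j in range(p))
                     for i in range(p))
    return v, eigenvalue

# Hypothetical data: 5 samples x 3 correlated variables.
data = [
    [1.0, 2.1, 0.4],
    [2.0, 3.9, 1.1],
    [3.0, 6.2, 1.4],
    [4.0, 7.8, 2.1],
    [5.0, 10.1, 2.4],
]

centered = center_columns(data)
cov = covariance_matrix(centered)
direction, eigenvalue = first_principal_component(cov)

# Fraction of the total variance carried by the first component.
total_variance = sum(cov[i][i] for i in range(len(cov)))
explained = eigenvalue / total_variance

# One score per sample replaces the three original variables.
scores = [sum(x * d for x, d in zip(row, direction)) for row in centered]
print("explained variance fraction:", explained)
print("scores:", scores)
```

For real work a full eigendecomposition or singular value decomposition from a numerical library would normally be used, so that all components are obtained at once.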

If the data are to be used by chemists for comparing similarities or differences between compounds or perhaps to see if different groupings of compounds have different types of biological response, then what is needed is Dimension Reduction. Dimension reduction is the name given to a process that reduces the dimensionality of a multivariate dataset, while retaining most of the information that it contains. Dimension reduction is not to be... [Pg.290]

Supervised variable elimination might also be regarded as variable selection. Whether we consider this to be the third major section of how to treat multivariate datasets is a matter of semantics, however. It is possible to eliminate variables in a supervised manner rather than to select them. One obvious way is to eliminate variables that have a zero or very low correlation with the response variable or variables. In the case of classified response data, this means eliminating those descriptors that have the same distribution (mean and standard deviation) for the two or more classes. The danger in this selection process is the possibility that a variable might have a low correlation with the response but contribute to a multivariate correlation. Although this is possible, in practice, it is unlikely. [Pg.309]
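
Eliminating variables with low correlation to the response can be sketched as a simple filter. The descriptor names, the data, and the threshold of 0.3 below are all hypothetical and serve only to make the example concrete; the source text does not prescribe a cut-off.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0.0 or syy == 0.0:
        # A constant variable carries no linear information; treat as r = 0.
        return 0.0
    return sxy / math.sqrt(sxx * syy)

def eliminate_low_correlation(descriptors, response, threshold=0.3):
    """Split descriptor names into (kept, dropped) by |r| with the response.
    The threshold is an assumption, not a recommended value."""
    kept, dropped = [], []
    for name, values in descriptors.items():
        r = pearson(values, response)
        (kept if abs(r) >= threshold else dropped).append(name)
    return kept, dropped

# Hypothetical response and descriptors for six compounds.
response = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
descriptors = {
    "logp": [1.1, 1.9, 3.2, 3.8, 5.1, 6.0],         # tracks the response
    "noise": [1.0, -1.0, -1.0, 1.0, 1.0, -1.0],     # nearly uncorrelated
    "constant": [2.0, 2.0, 2.0, 2.0, 2.0, 2.0],     # no variance at all
}

kept, dropped = eliminate_low_correlation(descriptors, response)
print("kept:", kept)
print("dropped:", dropped)
```

A filter of this kind looks at one variable at a time, so it cannot detect the case mentioned above in which a variable with a low individual correlation still contributes to a multivariate correlation.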

The %HIA, on a scale between 0 and 100%, for the same dataset was modeled by Deconinck et al. with multivariate adaptive regression splines (MARS) and a derived method, two-step MARS (TMARS) [38]. Among other Dragon descriptors, the TMARS model included the TIE E-state topological parameter [25], and MARS included the maximal E-state negative variation. The average prediction error, which is 15.4% for MARS and 20.03% for TMARS, shows that the MARS model is more robust in modeling %HIA. [Pg.98]

Display methods (EP, NLM) can be considered as clustering techniques when no a priori information is given about the subdivision of the dataset into categories. However, with the name of cluster analysis, we will denote the techniques working with the whole multivariate information in the following way. [Pg.130]

Preliminary data analyses carried out on the spectral datasets were functional group mapping and/or hierarchical cluster analysis (HCA). This latter method, which is well described in the literature,4,9 is an unsupervised approach that does not require any reference datasets. Like most of the multivariate methods, HCA is based on the correlation matrix Cut for all spectra in the dataset. This matrix, defined by Equation (9.1),... [Pg.193]

In this review, we demonstrate that excellent IR spectra from microscopic regions of cells and tissue can be collected. These spectra are extremely sensitive to variations in the biochemical composition of the pixels from which the spectra were acquired. Multivariate analyses of the spectral datasets of cells, cell smears and tissue sections produce pseudocolor maps in a totally unsupervised fashion that reproduce the histopathology of tissue sections and cytological features of cells and cell smears. [Pg.202]

Another method that can be used to quickly extract useful chemical information from an infrared image dataset is MCR.50-52,54,56-58 In some cases, this method can be used to obtain the concentration and absorbance spectra for each constituent in the original dataset. However, if the goal is not necessarily to resolve the constituents' spectra, but rather to empirically classify them, a regression method may be more appropriate.53,59,60 The most prominent multivariate regression methods include PLS and ANNs. [Pg.271]

FIGURE 1 Multivariate approaches for omics data integration. (A) The RV coefficient is a correlation measure between datasets that can be used as a distance metric. (B) The O2PLS method dissects gene expression and metabolomics datasets for shared and data type-specific variation. (C) The N-way approach accommodates experimental factors in a multidimensional block. Tucker3 is used to study intradataset covariation and NPLS analyzes between-block covariation. Panel (B) Reproduced from Bylesjö et al. (23). Panel (C) Reproduced from Conesa et al. (24). [Pg.449]
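
The RV coefficient mentioned in panel (A) can be sketched as below, using the standard definition RV(X, Y) = tr(XX'YY') / sqrt(tr((XX')^2) tr((YY')^2)) for column-centered blocks X and Y measured on the same samples. The two small blocks are hypothetical stand-ins for gene expression and metabolite data, and using 1 - RV as a dissimilarity is one common convention assumed here, not something specified in the caption.

```python
import math

def center_columns(rows):
    """Subtract each column mean so that every variable has mean zero."""
    n = len(rows)
    means = [sum(column) / n for column in zip(*rows)]
    return [[x - m for x, m in zip(row, means)] for row in rows]

def gram(rows):
    """Sample-by-sample cross-product matrix X X^T."""
    return [[sum(a * b for a, b in zip(ri, rj)) for rj in rows] for ri in rows]

def trace_of_product(a, b):
    """tr(A B) for two square matrices of the same size."""
    n = len(a)
    return sum(a[i][j] * b[j][i] for i in range(n) for j in range(n))

def rv_coefficient(x, y):
    """RV coefficient between two blocks with the same samples (rows).
    Each block needs at least one non-constant column."""
    sx = gram(center_columns(x))
    sy = gram(center_columns(y))
    return trace_of_product(sx, sy) / math.sqrt(
        trace_of_product(sx, sx) * trace_of_product(sy, sy))

# Hypothetical blocks: 4 samples, 3 "genes" and 2 "metabolites".
genes = [
    [1.0, 0.5, 2.0],
    [2.0, 0.7, 1.5],
    [3.0, 1.2, 1.1],
    [4.0, 1.4, 0.4],
]
metabolites = [
    [0.9, 3.1],
    [2.2, 2.8],
    [2.9, 3.3],
    [4.1, 2.9],
]

rv = rv_coefficient(genes, metabolites)
print("RV:", rv)
print("dissimilarity (1 - RV):", 1.0 - rv)
```

The coefficient lies between 0 and 1, equals 1 when a block is compared with itself, and does not change when a block is multiplied by a constant.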

Fig. 2 NMR-based metabolomics can be used to quickly identify changes in the global NMR pattern. In this case, the red peaks between 2.5-0.5 ppm are indicative of metabolic differences that are specific to the disease state. Actual data is not nearly as clear as this schematic. The analysis of typical NMR metabolomics datasets requires the use of multivariate analysis methods, such as principal components analysis (PCA), in order to use the metabolome to classify samples...

Multivariate approaches. The methods of Section 2.2.2 could be extended to all ten PAHs in the dataset of case study 1, and with appropriate choice of ten wavelengths may give reasonable estimates of concentrations. However, all the original wavelengths contain some information and there is no reason why most of the spectrum cannot be employed. [Pg.8]

This led to the concept of fragmentation of the total molecular surface area in combination with multivariate analysis (Stenberg et al. 2001) towards predictive models of drug permeability for more complex datasets. Permeability models were established based on so-called partitioned total surface area (PTSA) descriptors. Each of the PTSA descriptors corresponds to the surface of a certain atom type, differentiated by hybridisation, which results in individual descriptors for e.g. sp3, sp2, and sp carbon atoms. The resulting permeability model based on 19 descriptors finally consisted of oxygen, nitrogen and polar hydrogen surfaces, while the main contribution for prediction of Caco-2 permeability was attributed to PSA. In addition some more lipophilic contributions... [Pg.414]

While models to understand a single chemotype were reported to include only a few informative descriptors from this surface-area family, multivariate analysis of multiple descriptors might be more useful for larger and more diverse datasets. [Pg.414]
