Multivariate similarity

USA, using multivariate similarity among congener profiles in sediment samples. Environmental Toxicology and Chemistry 21 1591-1599. [Pg.1707]

Multivariate data analysis usually starts with generating a set of spectra and the corresponding chemical structures as a result of a spectrum similarity search in a spectrum database. The peak data are transformed into a set of spectral features and the chemical structures are encoded into molecular descriptors [80]. A spectral feature is a property that can be automatically computed from a mass spectrum. Typical spectral features are the peak intensity at a particular mass/charge value, or logarithmic intensity ratios. The goal of transformation of peak data into spectral features is to obtain descriptors of spectral properties that are more suitable than the original peak list data. [Pg.534]
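The transformation from peak list to spectral features can be sketched as follows. This is a minimal illustration, not the procedure of the cited work: the function name `spectral_features`, the base-peak normalisation to 100, and the `floor` value added before taking logarithms are all assumptions made here.

```python
import math

def spectral_features(peaks, mz_values, ratio_pairs, floor=1.0):
    """Turn a peak list {m/z: intensity} into a flat dict of features.

    floor is an assumed offset added before the logarithm so that
    absent peaks do not lead to log(0).
    """
    base = max(peaks.values())
    # Normalise to the base peak (= 100); an assumed convention.
    norm = {mz: 100.0 * inten / base for mz, inten in peaks.items()}

    features = {}
    # Feature type 1: peak intensity at a particular m/z value.
    for mz in mz_values:
        features[f"I_{mz}"] = norm.get(mz, 0.0)
    # Feature type 2: logarithmic intensity ratio of two m/z values.
    for a, b in ratio_pairs:
        ia = norm.get(a, 0.0) + floor
        ib = norm.get(b, 0.0) + floor
        features[f"logratio_{a}_{b}"] = math.log10(ia / ib)
    return features

# Invented peak list, for illustration only.
peaks = {43: 500.0, 57: 1000.0, 71: 250.0}
feats = spectral_features(peaks, mz_values=[43, 57, 77], ratio_pairs=[(57, 43)])
print(feats)
```

Each spectrum then becomes one row of a feature table, which is the form that methods such as PCA expect.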

Spectral features and their corresponding molecular descriptors are then analyzed with mathematical techniques of multivariate data analysis, such as principal component analysis (PCA) for exploratory data analysis or multivariate classification for the development of spectral classifiers [84-87]. Principal component analysis results in a scatter plot that exhibits spectra-structure relationships by clustering spectra with similar spectral and/or structural features [88, 89]. [Pg.534]
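How PCA turns a feature table into scatter-plot coordinates can be sketched as follows. This is a hedged illustration using only the standard library: the function name `pca`, the power-iteration approach, and the small table are assumptions made here, and a real analysis would normally use an SVD routine from a numerical library.

```python
import math

def pca(rows, n_components=2, n_iter=500):
    """PCA of a small samples-by-features table by power iteration.

    Returns (scores, loadings). Assumes each requested component has
    non-zero variance and a distinct eigenvalue.
    """
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    X = [[r[j] - means[j] for j in range(p)] for r in rows]  # column-centred
    # Sample covariance matrix of the features (p x p).
    C = [[sum(x[a] * x[b] for x in X) / (n - 1) for b in range(p)]
         for a in range(p)]

    loadings = []
    for k in range(n_components):
        v = [1.0 / (j + 1) for j in range(p)]  # arbitrary start vector
        size = math.sqrt(sum(c * c for c in v))
        v = [c / size for c in v]
        for _ in range(n_iter):
            w = [sum(C[a][b] * v[b] for b in range(p)) for a in range(p)]
            size = math.sqrt(sum(c * c for c in w))
            if size < 1e-12:  # no variance left to explain
                break
            v = [c / size for c in w]
        # Eigenvalue (variance along v), then deflate the covariance matrix.
        lam = sum(v[a] * C[a][b] * v[b] for a in range(p) for b in range(p))
        C = [[C[a][b] - lam * v[a] * v[b] for b in range(p)] for a in range(p)]
        loadings.append(v)

    # Scores: projection of each centred sample onto each loading vector.
    scores = [[sum(x[j] * v[j] for j in range(p)) for v in loadings] for x in X]
    return scores, loadings

# Invented feature table (rows = spectra, columns = spectral features).
table = [[2.0, 0.0], [-2.0, 0.0], [0.0, 1.0], [0.0, -1.0]]
scores, loadings = pca(table)
for row in scores:
    print(row)  # first two columns are the scatter-plot coordinates
```

Plotting the first score column against the second gives the kind of scatter plot described above, where spectra with similar features fall close together.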

The current widespread interest in MPC techniques was initiated by pioneering research performed by two industrial groups in the 1970s. Shell Oil (Houston, TX) reported their Dynamic Matrix Control (DMC) approach in 1979, while a similar technique, marketed as IDCOM, was published by a small French company, ADERSA, in 1978. Since then, there have been over one thousand applications of these and related MPC techniques in oil refineries and petrochemical plants around the world. Thus, MPC has had a substantial impact and is currently the method of choice for difficult multivariable control problems in these industries. However, relatively few applications have been reported in other process industries, even though MPC is a very general approach that is not limited to a particular industry. [Pg.739]

Referring to the discussion of the fundamental concepts regarding half cells and the Nernst equation in Chapter 5 (Section 5.3.1) it is possible to briefly summarize the similarities and differences of these two sets of systems. It is important to recognize the ways in which they are different when considering the behavior of complex multivariate systems such as the oceans and clouds, or a lake-river system. [Pg.421]

Analytical results are often represented in a data table, e.g., a table of the fatty acid compositions of a set of olive oils. Such a table is called a two-way multivariate data table. Because some olive oils may originate from the same region and others from a different one, the complete table has to be studied as a whole rather than as a collection of individual samples, i.e., the results of each sample are interpreted in the context of the results obtained for the other samples. For example, one may ask for natural groupings of the samples in clusters with a common property, namely a similar fatty acid composition. This is the objective of cluster analysis (Chapter 30), which is one of the techniques of unsupervised pattern recognition. The results of the clustering do not depend on the way the results have been arranged in the table, i.e., the order of the objects (rows) or the order of the fatty acids (columns). In fact, the order of the variables or objects has no particular meaning. [Pg.1]
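The grouping idea and its independence from row order can be sketched as follows. This is a minimal illustration, not the method of Chapter 30: the function `single_linkage_groups`, the distance threshold, and the fatty acid percentages are all invented here. It corresponds to cutting a single-linkage dendrogram at a fixed Euclidean distance.

```python
import math
from itertools import combinations

def single_linkage_groups(samples, threshold):
    """Group samples linked by chains of pairwise distances <= threshold."""
    names = list(samples)
    parent = {name: name for name in names}

    def find(name):
        # Follow parent links up to the representative of the group.
        while parent[name] != name:
            name = parent[name]
        return name

    for a, b in combinations(names, 2):
        if math.dist(samples[a], samples[b]) <= threshold:
            parent[find(a)] = find(b)  # merge the two groups

    groups = {}
    for name in names:
        groups.setdefault(find(name), set()).add(name)
    return {frozenset(g) for g in groups.values()}

# Invented compositions (% palmitic, % oleic, % linoleic), illustration only.
oils = {
    "A": (11.0, 78.0, 7.0),
    "B": (12.0, 77.0, 8.0),
    "C": (14.0, 68.0, 14.0),
    "D": (15.0, 67.0, 15.0),
}
groups = single_linkage_groups(oils, threshold=3.0)

# Same table with rows and columns rearranged: the grouping is unchanged.
shuffled = {k: (oils[k][2], oils[k][0], oils[k][1]) for k in ["D", "B", "A", "C"]}
groups_shuffled = single_linkage_groups(shuffled, threshold=3.0)
print(groups == groups_shuffled)
```

The comparison holds because the Euclidean distance between two samples does not change when the columns are permuted consistently, and the set of pairwise distances does not depend on the row order.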

A. Thielemans, P.J. Lewi and D.L. Massart, Similarities and differences among multivariate display techniques illustrated by Belgian cancer mortality distribution data. Chemom. Intell. Lab. Syst., 3 (1988) 277-300. [Pg.206]

UNEQ can be applied when only a few variables must be considered. It is based on the Mahalanobis distance from the centroid of the class. When this distance exceeds a critical distance, the object is an outlier and therefore not part of the class. Since for each class one uses its own covariance matrix, it is somewhat related to QDA (Section 33.2.3). The situation described here is very similar to that discussed for multivariate quality control in Chapter 20. In eq. (20.10) the original variables are used. This equation can therefore also be used for UNEQ. For convenience it is repeated here. [Pg.228]
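The distance test described here can be sketched as follows. Eq. (20.10) is not reproduced in this excerpt, so the code below is only a generic squared Mahalanobis distance from a class centroid, using the class's own covariance matrix. The function names, the training data, and the critical value are assumptions; 5.99 is the 95% chi-square quantile for two variables, used purely for illustration.

```python
def class_model(rows):
    """Centroid and sample covariance matrix of one class."""
    n, p = len(rows), len(rows[0])
    centroid = [sum(r[j] for r in rows) / n for j in range(p)]
    cov = [[sum((r[a] - centroid[a]) * (r[b] - centroid[b]) for r in rows) / (n - 1)
            for b in range(p)] for a in range(p)]
    return centroid, cov

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("covariance matrix is singular")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x

def mahalanobis_sq(x, centroid, cov):
    """Squared Mahalanobis distance d' S^-1 d, with d = x - centroid."""
    d = [xi - ci for xi, ci in zip(x, centroid)]
    z = solve(cov, d)  # z = S^-1 d without forming the inverse
    return sum(di * zi for di, zi in zip(d, z))

def is_outlier(x, centroid, cov, critical_sq):
    return mahalanobis_sq(x, centroid, cov) > critical_sq

# Invented two-variable training set for one class.
training = [[2.0, 0.0], [-2.0, 0.0], [0.0, 1.0], [0.0, -1.0]]
centroid, cov = class_model(training)
CRITICAL_SQ = 5.99  # assumed critical value, see the note above
print(is_outlier([2.0, 0.0], centroid, cov, CRITICAL_SQ))
print(is_outlier([0.0, 3.0], centroid, cov, CRITICAL_SQ))
```

Because every class gets its own centroid and covariance matrix, the same Euclidean offset can be acceptable along a direction of large class variance and an outlier along a direction of small variance.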

As explained already, SIMCA can be applied as an outlier test, similarly to the multivariate QC tests referred to earlier. Fearn et al. [44] have described certain properties of SIMCA in this respect and compared it with some alternatives. [Pg.232]

Given these tables of multivariate data one might be interested in various relationships. For example, do the two panels have a similar perception of the different olive oils (Tables 35.1 and 35.2)? Are the oils more or less similarly scattered in the two multidimensional spaces formed by the Dutch and by the British attributes? How are the two sets of sensory attributes related? Does the... [Pg.308]

Procrustes analysis is a method for relating two sets of multivariate observations, say X and Y. For example, one may wish to compare the results in Table 35.1 and Table 35.2 in order to find out to what extent the results from both panels agree, e.g., regarding the similarity of certain olive oils and the dissimilarity of others. Procrustes analysis has a strong geometric interpretation. The... [Pg.310]
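The geometric idea can be sketched for the simplest case: two configurations of the same objects in two dimensions, matched by translation and rotation only. This is an assumption-laden illustration, not the procedure applied to Tables 35.1 and 35.2. The function names and coordinates are invented, reflection and scaling are left out, and the general multidimensional case is usually solved with a singular value decomposition instead of the closed-form angle used here.

```python
import math

def center(points):
    """Translate a 2-D configuration so that its centroid is at the origin."""
    n = len(points)
    cx = sum(p[0] for p in points) / n
    cy = sum(p[1] for p in points) / n
    return [(p[0] - cx, p[1] - cy) for p in points]

def procrustes_2d(X, Y):
    """Rotate centred X onto centred Y; return (angle, fitted X, residual SS)."""
    Xc, Yc = center(X), center(Y)
    # The rotation maximising sum(y . R x) has angle atan2(b, a).
    a = sum(x[0] * y[0] + x[1] * y[1] for x, y in zip(Xc, Yc))
    b = sum(x[0] * y[1] - x[1] * y[0] for x, y in zip(Xc, Yc))
    theta = math.atan2(b, a)
    c, s = math.cos(theta), math.sin(theta)
    fitted = [(c * x[0] - s * x[1], s * x[0] + c * x[1]) for x in Xc]
    # Residual sum of squares: how much the two configurations still disagree.
    rss = sum((f[0] - y[0]) ** 2 + (f[1] - y[1]) ** 2 for f, y in zip(fitted, Yc))
    return theta, fitted, rss

# Invented configurations: Y is X rotated by 90 degrees and shifted by (5, 5).
X = [(1.0, 0.0), (0.0, 1.0), (-1.0, 0.0), (0.0, -1.0)]
Y = [(5.0, 6.0), (4.0, 5.0), (5.0, 4.0), (6.0, 5.0)]
theta, fitted, rss = procrustes_2d(X, Y)
print(math.degrees(theta), rss)
```

A residual sum of squares near zero means the two configurations agree up to the allowed transformations; a large residual points to objects that are positioned differently in the two sets.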

Thus, we see that CCA forms a canonical analysis, namely a decomposition of each data set into a set of mutually orthogonal components. A similar type of decomposition is at the heart of many types of multivariate analysis, e.g. PCA and PLS. Under the assumption of multivariate normality for both populations the canonical correlations can be tested for significance [6]. Retaining only the significant canonical correlations may allow for a considerable dimension reduction. [Pg.320]

DOSY is a technique that may prove successful in the determination of additives in mixtures [279]. Using different field gradients it is possible to distinguish components in a mixture on the basis of their diffusion coefficients. Morris and Johnson [271] have developed diffusion-ordered 2D NMR experiments for the analysis of mixtures. PFG-NMR can thus be used to identify those components in a mixture that have similar (or overlapping) chemical shifts but different diffusional properties. Multivariate curve resolution (MCR) analysis of DOSY data allows generation of pure spectra of the individual components for identification. The pure spin-echo diffusion decays that are obtained for the individual components may be used to determine the diffusion coefficient/distribution [281]. Mixtures of molecules of very similar sizes can readily be analysed by DOSY. Diffusion-ordered spectroscopy [273,282], which does not require prior separation, is a viable competitor for techniques such as HPLC-NMR that are based on chemical separation. [Pg.340]

Another possible advantage of MolSurf descriptors (and also other multiparameter descriptors) is that they describe the investigated compounds not with a single value, as in the case of PSA and log P descriptors, but in a multivariate way. This approach provides a more balanced description of the requirements that a structure must have in order to be well absorbed and may, in turn, provide additional insight on how to develop compounds having favorable absorption properties. However, as will be described in Section 16.4.10, simpler (i.e., less computationally demanding) parameters carrying similar information content with equal interpretability may be used to derive models for intestinal absorption at the same level of statistical quality. [Pg.391]

The MND can be mathematically described by an expression that is similar in form, but has the characteristic that each of the individual parts of the expression represents the multivariate analog of the corresponding part of equation 1-1. [Pg.5]
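Equation 1-1 is not reproduced in this excerpt. Assuming it is the usual univariate normal density, the correspondence can be written out as below. The symbols are notation chosen here: $p$ is the number of variables, $\boldsymbol{\mu}$ the mean vector, and $\boldsymbol{\Sigma}$ the covariance matrix.

```latex
% Univariate normal density (assumed form of equation 1-1)
f(x) = \frac{1}{\sigma\sqrt{2\pi}}
       \exp\!\left( -\frac{(x-\mu)^{2}}{2\sigma^{2}} \right)

% Multivariate normal density (MND) for p variables
f(\mathbf{x}) = \frac{1}{(2\pi)^{p/2}\,\lvert\boldsymbol{\Sigma}\rvert^{1/2}}
       \exp\!\left( -\tfrac{1}{2}\,
       (\mathbf{x}-\boldsymbol{\mu})^{\mathsf{T}}
       \boldsymbol{\Sigma}^{-1}
       (\mathbf{x}-\boldsymbol{\mu}) \right)
```

Part by part, the mean $\mu$ becomes the mean vector $\boldsymbol{\mu}$, the variance $\sigma^{2}$ becomes the covariance matrix $\boldsymbol{\Sigma}$, the normalising factor $\sigma$ becomes $\lvert\boldsymbol{\Sigma}\rvert^{1/2}$, and the squared standardised deviation $(x-\mu)^{2}/\sigma^{2}$ becomes the squared Mahalanobis distance. For $p = 1$ the second expression reduces to the first.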

The results show that DE-MS alone provides evidence of the presence of the most abundant components in samples. Because DE-MS mass spectra are relatively difficult to interpret, multivariate analysis of the DE-MS mass spectral data by principal component analysis (PCA) was used to rapidly differentiate triterpene resinous materials and to compare reference samples with archaeological ones. This method classifies the spectra and indicates the level of similarity of the samples. The output is a two- or three-dimensional scatter plot in which the geometric distances among the various points, representing the samples, reflect the differences in the distribution of ion peaks in the mass spectra, which in turn point to differences in chemical composition of... [Pg.90]
