Multivariate data exploration

Given a set of data from a number of measurements, and after proper feature extraction, pre-processing and normalization have been applied, exploratory techniques study the intrinsic characteristics of the data in order to discover any internal properties they may have. [Pg.153]
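
Autoscaling is one common normalization applied before exploration. The excerpt does not say which pre-processing it assumes, so the following is only a minimal sketch that assumes autoscaling: each variable is mean-centred and divided by its sample standard deviation. The function name `autoscale` and the data are hypothetical.

```python
from statistics import mean, stdev

def autoscale(data):
    """Autoscale a samples-by-variables matrix (list of rows): centre each
    column on its mean and divide by its sample standard deviation (n - 1)."""
    n_vars = len(data[0])
    columns = [[row[j] for row in data] for j in range(n_vars)]
    means = [mean(col) for col in columns]
    sds = [stdev(col) for col in columns]
    scaled = []
    for row in data:
        scaled.append([
            # A constant variable carries no information, so it is set to 0.
            (x - m) / s if s > 0 else 0.0
            for x, m, s in zip(row, means, sds)
        ])
    return scaled

# Hypothetical responses: 4 samples x 2 variables on very different scales.
X = [[1.0, 100.0], [2.0, 300.0], [3.0, 200.0], [4.0, 400.0]]
Z = autoscale(X)
```

After autoscaling every variable has zero mean and unit variance, so a variable recorded on a large numerical scale does not dominate distances or principal components merely because of its units.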

Exploratory data analysis shows whether an ensemble of chemical sensors is suitable for a given application, leaving to supervised classification the task of building a model that predicts the class membership of unknown samples. [Pg.153]
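
As an illustration of the supervised step, here is a minimal sketch of one possible classifier, k-nearest neighbours with a Euclidean distance. The excerpt does not name a method; the function `knn_predict`, the default k = 3 and the training data are hypothetical.

```python
import math
from collections import Counter

def euclidean(a, b):
    """Euclidean distance between two sample vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_predict(train_X, train_y, sample, k=3):
    """Predict the class of `sample` by majority vote among its k nearest
    training samples."""
    neighbours = sorted(zip(train_X, train_y),
                        key=lambda pair: euclidean(pair[0], sample))
    votes = Counter(label for _, label in neighbours[:k])
    return votes.most_common(1)[0][0]

# Hypothetical, already pre-processed sensor-array responses with known classes.
train_X = [[0.1, 0.2], [0.2, 0.1], [0.15, 0.25], [0.9, 0.8], [0.8, 0.9], [0.85, 0.95]]
train_y = ["A", "A", "A", "B", "B", "B"]

print(knn_predict(train_X, train_y, [0.2, 0.2]))   # -> A
print(knn_predict(train_X, train_y, [0.8, 0.85]))  # -> B
```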

Two main groups of exploratory techniques may be identified: representation techniques and clustering techniques. [Pg.153]

Clustering techniques are mostly based on the concept of similarity, expressed through the definition of a metric (a rule for calculating distances) in... [Pg.153]
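
A minimal sketch of how a metric drives clustering, assuming Euclidean distance and single-linkage agglomerative clustering. These are only one possible pair of choices; the excerpt specifies neither. The function `single_linkage`, the target number of clusters and the data are hypothetical.

```python
import math
from itertools import combinations

def euclidean(a, b):
    """The metric: the rule used to compute the distance between two samples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def single_linkage(samples, n_clusters, metric=euclidean):
    """Agglomerative clustering: start with one cluster per sample and repeatedly
    merge the two clusters whose closest members are nearest to each other."""
    clusters = [[i] for i in range(len(samples))]
    while len(clusters) > n_clusters:
        # Single-linkage distance between clusters = minimum pairwise sample distance.
        a, b = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: min(metric(samples[i], samples[j])
                              for i in clusters[p[0]] for j in clusters[p[1]]),
        )
        clusters[a].extend(clusters[b])
        del clusters[b]  # safe: combinations() guarantees a < b
    return sorted(sorted(c) for c in clusters)

# Hypothetical 2-D samples forming two groups.
X = [[0.0, 0.0], [0.1, 0.2], [0.2, 0.1], [5.0, 5.0], [5.1, 4.9]]
print(single_linkage(X, n_clusters=2))  # -> [[0, 1, 2], [3, 4]]
```

Passing a different `metric` changes which samples are considered similar, and therefore which clusters are obtained.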

Always start pattern recognition with a preliminary multivariate data exploration. PCA is a perfect tool for this purpose, since it helps visualize structures (groupings, correlations) within the data and guides decisions about the subsequent processing steps. [Pg.108]

Evidence of the application of computers and expert systems to the interpretation of instrumental data is found in the new discipline of chemometrics (qv), where the relationship between the data and the information sought is treated as a problem of mathematics and statistics (7-10). One of the most useful insights provided by chemometrics is the realization that a set of measurements of quantities only remotely related to the information actually sought can be used in combination to infer the desired information. For example, viscosity, boiling point, and specific gravity data can be combined to characterize the chemical composition of a mixture of solvents (11). The complexity of such a procedure is handled by performing a multivariate data analysis. [Pg.394]

On the other hand, factor analysis involves other manipulations of the eigenvectors and aims to gain insight into the structure of a multidimensional data set. The use of this technique was first proposed for biological structure-activity relationships (SAR) and illustrated with an analysis of the activities of 21 diphenylaminopropanol derivatives in 11 biological tests [116-119, 289]. The method has more commonly been used to determine the intrinsic dimensionality of certain experimentally determined chemical properties, that is, the number of fundamental factors required to account for the variance. One of the best FA techniques is Q-mode factor analysis, which groups a multivariate data set according to the data structure defined by the similarity between samples [1, 313-316]. It is devoted exclusively to interpreting the inter-object relationships in a data set, rather than the inter-variable (or covariance) relationships explored by R-mode factor analysis. The similarity measure used is the cosine theta matrix, i.e., the matrix whose elements are the cosines of the angles between all pairs of samples [1, 313-316]. [Pg.269]
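
The cosine theta matrix can be computed directly from its definition, cos θ_ij = x_i · x_j / (|x_i| |x_j|). This sketch only builds the similarity matrix; the subsequent factor extraction of Q-mode analysis is not shown. The function name and sample values are hypothetical, and every sample vector is assumed to be non-zero.

```python
import math

def cosine_theta_matrix(samples):
    """Matrix whose (i, j) element is the cosine of the angle between sample
    vectors i and j. Assumes no sample is the zero vector."""
    norms = [math.sqrt(sum(v * v for v in s)) for s in samples]
    n = len(samples)
    return [
        [sum(a * b for a, b in zip(samples[i], samples[j])) / (norms[i] * norms[j])
         for j in range(n)]
        for i in range(n)
    ]

# Hypothetical samples described by three variables.
X = [[1.0, 0.0, 0.0], [2.0, 0.0, 0.0], [0.0, 1.0, 0.0], [1.0, 1.0, 0.0]]
C = cosine_theta_matrix(X)
```

Because the cosine depends only on the angle between vectors, samples 0 and 1, whose variables are in the same proportions, have a similarity of exactly 1 despite their different magnitudes.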

Thousands of chemical compounds have been identified in oils and fats, although only a few hundred are used in authentication. This means that each object (food sample) may occupy a unique position in an abstract n-dimensional hyperspace, a concept that analysts find difficult to interpret, since a data matrix with more than three features can no longer be visualized directly. The art of extracting chemically relevant information from data produced in chemical experiments by means of statistical and mathematical tools is called chemometrics. It is an indirect approach to studying the effects of multivariate factors (or variables) and the hidden patterns in complex data sets. Chemometrics is routinely used for (a) exploring patterns of association in data, and (b) preparing and using multivariate classification models. The advent of chemometric techniques has allowed both quantitative and qualitative analysis of multivariate data and, in consequence, the analysis and modelling of many different types of experiments. [Pg.156]

A data matrix produced by compositional analysis commonly contains 10 or more metric variables (elemental concentrations) determined for an even greater number of observations. The bridge between this multidimensional data matrix and the desired archaeological interpretation is multivariate analysis. The purposes of multivariate analysis are data exploration, hypothesis generation, hypothesis testing, and data reduction. Application of multivariate techniques to data for these purposes entails an assumption that some form of structure exists within the data matrix. The notion of structure is therefore fundamental to compositional investigations. [Pg.63]

The development of analytical techniques involves implementing new and advanced computational methods that enable the preprocessing and exploration of multivariate data. The metabolic profiles, previously determined using an adequate... [Pg.246]

Tel. 206-441-4696, fax 206-441-0841, e-mail infomtrx@halcyon.com. Chemometric analysis based in part on the ARTHUR pattern recognition program. InStep for routine application of statistical models. EinSight for multivariate and visual data exploration. PCs (DOS). [Pg.335]

Generally, two types of questions are asked when applying multivariate data analysis techniques: one aims to explore the gathered data without any preconceived assumptions or notions, while the other relates to sample classification and to finding valid and powerful models for prediction purposes. [Pg.213]

PCA is a method based on the Karhunen-Loeve transformation (KL transformation) of the data points in the feature space. In the KL transformation, the data points in the feature space are rotated so that the new coordinates of the sample points become linear combinations of the original coordinates. The first principal component is chosen along the direction of largest variance in the distribution of sample points. After the KL transformation, neglecting the components along which the sample coordinates vary only slightly reduces the dimensionality without significant loss of information about the distribution of sample points in the feature space. To date, PCA is probably the most widespread multivariate statistical technique used in chemometrics. Within the chemical community, the first major applications of PCA were reported in the 1970s, and PCA forms the foundation of many modern chemometric methods. Conventional approaches are univariate, using only one independent variable per sample, but this misses much of the information in multivariate problems such as SAR, where many descriptors are available for a number of candidate compounds. PCA is one of several multivariate methods that allow us to explore patterns in multivariate data, answering questions about the similarity and classification of samples on the basis of their projections onto the principal components. [Pg.192]
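
A minimal, dependency-free sketch of PCA as described above: the data are mean-centred, the covariance matrix is formed, and its leading eigenvectors (the directions of largest variance) are found by power iteration with deflation. Power iteration is used here only to keep the example self-contained; practical implementations usually rely on a library eigendecomposition or SVD. The function `pca` and the data set are hypothetical.

```python
import math
import random

def pca(data, k, iters=500):
    """PCA as a KL rotation: eigenvectors of the covariance matrix, found by
    power iteration with deflation (adequate for small examples)."""
    n, p = len(data), len(data[0])
    means = [sum(row[j] for row in data) / n for j in range(p)]
    centred = [[row[j] - means[j] for j in range(p)] for row in data]
    A = [[sum(r[a] * r[b] for r in centred) / (n - 1) for b in range(p)] for a in range(p)]
    total_var = sum(A[i][i] for i in range(p))
    rng = random.Random(0)  # fixed seed so the start vectors are reproducible
    loadings, eigvals = [], []
    for _ in range(k):
        v = [rng.random() + 0.1 for _ in range(p)]
        for _ in range(iters):
            w = [sum(A[i][j] * v[j] for j in range(p)) for i in range(p)]
            norm = math.sqrt(sum(x * x for x in w))
            v = [x / norm for x in w]
        # Rayleigh quotient: the variance along this direction (the eigenvalue).
        lam = sum(v[i] * A[i][j] * v[j] for i in range(p) for j in range(p))
        loadings.append(v)
        eigvals.append(lam)
        # Deflation: remove this direction so the next pass finds the next component.
        A = [[A[i][j] - lam * v[i] * v[j] for j in range(p)] for i in range(p)]
    # Scores: coordinates of each sample along the principal component axes.
    scores = [[sum(r[j] * comp[j] for j in range(p)) for comp in loadings] for r in centred]
    explained = [lam / total_var for lam in eigvals]
    return scores, loadings, explained

X = [[2.0, 2.1], [0.0, -0.1], [1.0, 1.0], [-1.0, -0.9], [-2.0, -2.1]]
scores, loadings, explained = pca(X, k=2)
```

Here `explained[0]` exceeds 0.999, so the second component could be neglected and each sample represented by its first score alone, which is the dimension reduction described in the text.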

The first multivariate operation to be performed on a data matrix (see Figure 2) is exploration of the data. Multivariate space is explored with factor analysis methods, of which principal component analysis (PCA) is the easiest choice. The data are projected onto a matrix with fewer, more meaningful variables called scores (latent variables) (see... [Pg.344]

Which food area would require explorative multivariate data analysis tools? We have seen in the introduction section that food science today has a wide multidisciplinary scope, involving chemistry, biology/microbiology, genetics, medicine, agriculture, technology and environmental science, as well as sensory and consumer analysis and economics. [Pg.78]

We examine grain-size normalization by means of multivariate log-ratio techniques. Specifically, published data are reanalyzed to explore the relationship between normalizing agents and grain-size composition, and to compare the log-ratio method with the traditional approach of heavy metal normalization. [Pg.133]
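
The excerpt does not say which log-ratio transform was used. As an assumption, this sketch shows two standard log-ratio transforms for compositional data: the centred log-ratio (clr), which divides each part by the geometric mean of all parts, and the additive log-ratio (alr), which divides each part by a chosen reference part such as a normalizing element. The function names and the composition below are invented for illustration.

```python
import math

def clr(composition):
    """Centred log-ratio: log of each part over the geometric mean of all parts.
    All parts must be strictly positive."""
    logs = [math.log(x) for x in composition]
    g = sum(logs) / len(logs)  # log of the geometric mean
    return [lx - g for lx in logs]

def alr(composition, ref):
    """Additive log-ratio: log of each part relative to the reference part
    at index `ref` (for instance a normalizing element); the reference is dropped."""
    r = composition[ref]
    return [math.log(x / r) for i, x in enumerate(composition) if i != ref]

# Hypothetical sediment sample: [Zn, Cu, Al] in mg/kg (values invented).
sample = [120.0, 30.0, 60000.0]
print(clr(sample))
print(alr(sample, ref=2))
```

Both transforms are unaffected by multiplying the whole composition by a constant, and clr values always sum to zero.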

During the last two or three decades, chemists have become accustomed to using computers to control their instruments, develop analytical methods and analyse data and, consequently, to applying different statistical methods to explore multivariate correlations between one or more outputs (e.g. the concentration of an analyte) and a set of input variables (e.g. atomic intensities, absorbances). [Pg.244]
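
A minimal sketch of exploring such an input-output relationship with multiple linear regression, assuming a linear model c = b0 + b1·A1 + b2·A2 fitted by ordinary least squares through the normal equations. The calibration data are synthetic, generated from known coefficients so that the fit can be checked; the function names are hypothetical.

```python
def solve(A, b):
    """Solve the square system A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x

def fit_mlr(X, y):
    """Least-squares coefficients [b0, b1, ..., bp] of y = b0 + sum_j b_j x_j,
    from the normal equations (D^T D) b = D^T y."""
    D = [[1.0] + list(row) for row in X]  # prepend an intercept column
    p = len(D[0])
    DtD = [[sum(r[i] * r[j] for r in D) for j in range(p)] for i in range(p)]
    Dty = [sum(r[i] * yi for r, yi in zip(D, y)) for i in range(p)]
    return solve(DtD, Dty)

def predict(b, x):
    return b[0] + sum(bj * xj for bj, xj in zip(b[1:], x))

# Synthetic calibration set: absorbances at two wavelengths vs concentration,
# generated from c = 0.5 + 2*A1 + 3*A2, so the fit should recover [0.5, 2, 3].
X = [[0.1, 0.2], [0.3, 0.1], [0.5, 0.4], [0.2, 0.6], [0.4, 0.3]]
y = [0.5 + 2 * a1 + 3 * a2 for a1, a2 in X]
b = fit_mlr(X, y)
```

The normal equations are used here for brevity. With many strongly correlated inputs (for example absorbances at neighbouring wavelengths) they become ill-conditioned, and latent-variable methods such as PCR or PLS are commonly preferred.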

CONTENTS
1. Chemometrics and the Analytical Process.
2. Precision and Accuracy.
3. Evaluation of Precision and Accuracy. Comparison of Two Procedures.
4. Evaluation of Sources of Variation in Data. Analysis of Variance.
5. Calibration.
6. Reliability and Drift.
7. Sensitivity and Limit of Detection.
8. Selectivity and Specificity.
9. Information.
10. Costs.
11. The Time Constant.
12. Signals and Data.
13. Regression Methods.
14. Correlation Methods.
15. Signal Processing.
16. Response Surfaces and Models.
17. Exploration of Response Surfaces.
18. Optimization of Analytical Chemical Methods.
19. Optimization of Chromatographic Methods.
20. The Multivariate Approach.
21. Principal Components and Factor Analysis.
22. Clustering Techniques.
23. Supervised Pattern Recognition.
24. Decisions in the Analytical Laboratory. [Pg.215]

Big Chemical Encyclopedia