Statistical methods

Since the accuracy of experimental data is frequently not high, and since experimental data are hardly ever plentiful, it is important to reduce the available data with care using a suitable statistical method and using a model for the excess Gibbs energy which contains only a minimum of binary parameters. Rarely are experimental data of sufficient quality and quantity to justify more than three binary parameters and, all too often, the data justify no more than two such parameters. When data sources (5) or (6) or (7) are used alone, it is not possible to use a three- (or more)-parameter model without making additional arbitrary assumptions. For typical engineering calculations, therefore, it is desirable to use a two-parameter model such as UNIQUAC. [Pg.43]

The primary purpose for expressing experimental data through model equations is to obtain a representation that can be used confidently for systematic interpolations and extrapolations, especially to multicomponent systems. The confidence placed in the calculations depends on the confidence placed in the data and in the model. Therefore, the method of parameter estimation should also provide measures of reliability for the calculated results. This reliability depends on the uncertainties in the parameters, which, with the statistical method of data reduction used here, are estimated from the parameter variance-covariance matrix. This matrix is obtained as a last step in the iterative calculation of the parameters. [Pg.102]
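As a rough illustration of how a parameter variance-covariance matrix comes out of a fit, the sketch below uses ordinary least squares on a straight line y = a + b·x. This is a deliberately simpler stand-in for the maximum-likelihood data reduction the excerpt describes, and the function name and the straight-line model are assumptions made for this example only.

```python
def fit_line_with_covariance(x, y):
    """Ordinary least-squares fit of y = a + b*x (illustrative model only).

    Returns the parameters (a, b) and their 2x2 variance-covariance matrix,
    estimated as s^2 * (X^T X)^-1 with s^2 the residual variance.
    """
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxx = sum(xi * xi for xi in x)
    sxy = sum(xi * yi for xi, yi in zip(x, y))

    # Determinant of X^T X = [[n, sx], [sx, sxx]]
    det = n * sxx - sx * sx
    a = (sxx * sy - sx * sxy) / det
    b = (n * sxy - sx * sy) / det

    # Residual variance with n - 2 degrees of freedom (two fitted parameters)
    residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]
    s2 = sum(r * r for r in residuals) / (n - 2)

    # Diagonal entries are parameter variances; off-diagonal is their covariance
    cov = [
        [s2 * sxx / det, -s2 * sx / det],
        [-s2 * sx / det, s2 * n / det],
    ]
    return (a, b), cov
```

The square roots of the diagonal entries give standard errors for a and b, and a large off-diagonal entry relative to them indicates strongly correlated parameters.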

In the maximum-likelihood method used here, the "true" value of each measured variable is also found in the course of parameter estimation. The differences between these "true" values and the corresponding experimentally measured values are the residuals (also called deviations). When there are many data points, the residuals can be analyzed by standard statistical methods (Draper and Smith, 1966). If, however, there are only a few data points, examination of the residuals for trends, when plotted versus other system variables, may provide valuable information. Often these plots can indicate at a glance excessive experimental error, systematic error, or "lack of fit." Data points which are obviously bad can also be readily detected. If the model is suitable and if there are no systematic errors, such a plot shows the residuals randomly distributed with zero means. This behavior is shown in Figure 3 for the ethyl-acetate-n-propanol data of Murti and Van Winkle (1958), fitted with the van Laar equation. [Pg.105]
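The residual check described above can be sketched in a few lines. The sign convention (measured minus estimated), the function name, and the sign-change count used as a crude stand-in for "looking at a plot for trends" are all assumptions of this example, not part of the cited method.

```python
from statistics import mean


def residual_summary(measured, estimated):
    """Summarize residuals between measured values and fitted 'true' values.

    Returns (residuals, mean_residual, sign_changes). A mean far from zero
    or very few sign changes along the data sequence hints at systematic
    error or lack of fit; this is a heuristic, not a formal test.
    """
    residuals = [m - e for m, e in zip(measured, estimated)]
    # Ignore exact zeros when looking at the sign pattern
    signs = [r > 0 for r in residuals if r != 0]
    sign_changes = sum(1 for s0, s1 in zip(signs, signs[1:]) if s0 != s1)
    return residuals, mean(residuals), sign_changes
```

Residuals that alternate in sign around zero are consistent with random scatter, whereas a long run of same-signed residuals suggests a trend worth plotting against the other system variables.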

The range of uncertainty in the UR may be too large to commit to a particular development plan, and field appraisal may be required to reduce the uncertainty and allow a more suitable development plan to be formed. Unless the range of uncertainty is quantified using statistical techniques and representations, the need for appraisal cannot be determined. Statistical methods are used to express ranges of values of STOMP, GIIP, UR, and reserves. [Pg.158]

The calculation of characteristic values produces a large number of values that contain redundant information. The fourth partial step will therefore be to reduce this number of values using extraction methods. This can be realized with statistical methods such as cross-correlation analysis. [Pg.16]
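One simple reading of this reduction step is a pairwise correlation filter: a characteristic value is dropped when it is almost perfectly correlated with one already kept. The sketch below assumes Pearson correlation and a threshold of 0.95; both choices, and the function names, are illustrative assumptions rather than the procedure of the cited work.

```python
from math import sqrt


def pearson(u, v):
    """Pearson correlation coefficient of two equal-length sequences.

    Assumes neither sequence is constant (otherwise the denominator is zero).
    """
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = sqrt(sum((a - mu) ** 2 for a in u))
    sv = sqrt(sum((b - mv) ** 2 for b in v))
    return cov / (su * sv)


def drop_redundant(features, threshold=0.95):
    """Keep a feature only if |r| with every already-kept feature is below threshold.

    `features` maps a feature name to its list of values; insertion order
    decides which member of a correlated pair survives.
    """
    kept = []
    for name, values in features.items():
        if all(abs(pearson(values, features[k])) < threshold for k in kept):
            kept.append(name)
    return kept
```

Because the result depends on the order in which features are visited, it can help to list the most interpretable characteristic values first.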

B. Widom, Structure and Thermodynamics of Interfaces, in Statistical Mechanics and Statistical Methods in Theory and Application, Plenum, New York, 1977, pp. 33-71. [Pg.97]

Reiss H 1977 Scaled particle theory of hard sphere fluids Statistical Mechanics and Statistical Methods in Theory and Application ed U Landman (New York Plenum) pp 99-140... [Pg.552]

Quack M and Troe J 1981 Statistical methods in scattering Theor. Chem. Adv. Perspect. B 6 199-276... [Pg.1092]

Pratt L R 1986 A statistical method for identifying transition states in high dimensional problems J. Chem. Phys. 85 5045-8... [Pg.2288]

Previous studies with a variety of datasets had shown the importance of charge distribution (inductive effect), of π-electronegativity (resonance effect), and of effective polarizability, αeff (polarizability effect) (for details on these methods see Section 7.1). All four of these descriptors on all three carbon atoms were calculated. However, in the final study, a reduced set of descriptors, shown in Table 3-4, was chosen that was obtained both by statistical methods and by chemical intuition. [Pg.194]

To understand the recommendations for structure descriptors in order to be able to apply them in QSAR or drug design in conjunction with statistical methods or machine learning techniques. [Pg.401]

A structure descriptor is a mathematical representation of a molecule resulting from a procedure transforming the structural information encoded within a symbolic representation of a molecule. This mathematical representation has to be invariant to the molecule's size and number of atoms, to allow model building with statistical methods and artificial neural networks. [Pg.403]

Chirality codes are used to represent molecular chirality by a fixed number of descriptors. These descriptors can then be correlated with molecular properties by way of statistical methods or artificial neural networks, for example. The importance of using descriptors that take different values for opposite enantiomers resides in the fact that observable properties are often different for opposite enantiomers. [Pg.420]

After an alignment of a set of molecules known to bind to the same receptor, a comparative molecular field analysis (CoMFA) makes it possible to determine and visualize molecular interaction regions involved in ligand-receptor binding [51]. Further on, statistical methods such as partial least squares regression (PLS) are applied to search for a correlation between CoMFA descriptors and biological activity. The CoMFA descriptors have been one of the most widely used sets of descriptors. However, their apex has been reached. [Pg.428]

The abbreviation QSAR stands for quantitative structure-activity relationships. QSPR means quantitative structure-property relationships. As the properties of an organic compound usually cannot be predicted directly from its molecular structure, an indirect approach is used to overcome this problem. In the first step, numerical descriptors encoding information about the molecular structure are calculated for a set of compounds. Secondly, statistical methods and artificial neural network models are used to predict the property or activity of interest, based on these descriptors or a suitable subset. A typical QSAR/QSPR study comprises the following steps: structure entry (or start from an existing structure database), descriptor calculation, descriptor selection, model building, model validation. [Pg.432]

The data analysis module of ELECTRAS is twofold. One part was designed for general statistical data analysis of numerical data. The second part offers a module for analyzing chemical data. The difference between the two modules is that the module for mere statistics applies the statistical methods or neural networks directly to the input data, while the module for chemical data analysis also contains methods for the calculation of descriptors for chemical structures (cf. Chapter 8). Descriptors, and thus structure codes, are calculated for the input structures and then the statistical methods and neural networks can be applied to the codes. [Pg.450]

It extends the usage of statistical methods and combines it with machine learning methods and the application of expert systems. The visualization of the results of data mining is an important task as it facilitates an interpretation of the results. Figure 9-32 plots the different disciplines which contribute to data mining. [Pg.472]

Classification describes the process of assigning an instance or property to one of several given classes. The classes are defined beforehand and this class assignment is used in the learning process, which is therefore supervised. Statistical methods and decision trees (cf. Section 9.3) are also widely used for classification tasks. [Pg.473]

A very important data mining task is the discovery of characteristic descriptions for subsets of data, which characterize its members and distinguish it from other subsets. Descriptions can, for example, be the output of statistical methods like average or variance. [Pg.474]

The previously mentioned data set with a total of 115 compounds has already been studied by other statistical methods such as Principal Component Analysis (PCA), Linear Discriminant Analysis, and the Partial Least Squares (PLS) method [39]. Thus, the choice and selection of descriptors has already been accomplished. [Pg.508]

Multiple linear regression is strictly a parametric supervised learning technique. A parametric technique is one which assumes that the variables conform to some distribution (often the Gaussian distribution); the properties of the distribution are assumed in the underlying statistical method. A non-parametric technique does not rely upon the assumption of any particular distribution. A supervised learning method is one which uses information about the dependent variable to derive the model. An unsupervised learning method does not. Thus cluster analysis, principal components analysis and factor analysis are all examples of unsupervised learning techniques. [Pg.719]

The value of the torsional energy increment has been variously estimated, but TORS = 0.42 kcal mol⁻¹ was settled on for the bond contribution method in MM3. In the full statistical method (see below), low-frequency torsional motion should be calculated along with all the others, so the empirical TORS increment should be zero. In fact, TORS is not zero (Allinger, 1996). It appears that the TORS increment is a repository for an energy error or errors in the method that are as yet unknown. [Pg.154]

Linnig, F. J., and J. Mandel, Which Measure of Precision? Anal. Chem., 36 25A (1964). Mark, H., and J. Workman, Statistics in Spectroscopy, Academic Press, San Diego, CA, 1991. Meier, P. C., and R. E. Zund, Statistical Methods in Analytical Chemistry, Wiley, New York, 1993. [Pg.212]

The probabilistic nature of a confidence interval provides an opportunity to ask and answer questions comparing a sample's mean or variance to either the accepted values for its population or similar values obtained for other samples. For example, confidence intervals can be used to answer questions such as "Does a newly developed method for the analysis of cholesterol in blood give results that are significantly different from those obtained when using a standard method?" or "Is there a significant variation in the chemical composition of rainwater collected at different sites downwind from a coal-burning utility plant?" In this section we introduce a general approach to the statistical analysis of data. Specific statistical methods of analysis are covered in Section 4F. [Pg.82]
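A minimal sketch of such a comparison follows: it builds a confidence interval for a sample mean and checks whether an accepted value falls outside it. The critical t value must be supplied by the caller from a t-table (for example 2.776 for a two-sided 95% interval with 4 degrees of freedom); the function names and this interface are assumptions made for illustration.

```python
from math import sqrt
from statistics import mean, stdev


def confidence_interval(sample, t_critical):
    """Confidence interval for the mean: mean +/- t * s / sqrt(n).

    `t_critical` is the tabulated t value for the chosen confidence level
    and n - 1 degrees of freedom; it is not computed here.
    """
    n = len(sample)
    m = mean(sample)
    half_width = t_critical * stdev(sample) / sqrt(n)
    return m - half_width, m + half_width


def differs_from(sample, accepted_value, t_critical):
    """True if the accepted value lies outside the sample's confidence interval."""
    low, high = confidence_interval(sample, t_critical)
    return not (low <= accepted_value <= high)
```

With five replicate results, `differs_from(results, accepted, 2.776)` answers the first kind of question in the text at the 95% level, under the usual assumption of normally distributed indeterminate error.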

Richardson, T. H. Reproducible Bad Data for Instruction in Statistical Methods, J. Chem. Educ. 1991, 68, 310-311. [Pg.97]

A variety of statistical methods may be used to compare three or more sets of data. The most commonly used method is an analysis of variance (ANOVA). In its simplest form, a one-way ANOVA allows the importance of a single variable, such as the identity of the analyst, to be determined. The importance of this variable is evaluated by comparing its variance with the variance explained by indeterminate sources of error inherent to the analytical method. [Pg.693]
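The variance comparison at the heart of a one-way ANOVA can be written out directly. The sketch below computes only the F ratio (between-group mean square over within-group mean square); judging significance still requires comparing it with a tabulated critical F value, which is not included. The function name is an assumption of this example.

```python
from statistics import mean


def one_way_anova_f(groups):
    """F ratio for a one-way ANOVA over a list of groups of measurements.

    Between-group variance reflects the factor being studied (e.g. the
    analyst); within-group variance reflects indeterminate error.
    """
    all_values = [v for g in groups for v in g]
    grand_mean = mean(all_values)
    k = len(groups)       # number of groups
    n = len(all_values)   # total number of measurements

    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)

    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n - k)
    return ms_between / ms_within
```

An F ratio near 1 suggests the factor contributes little beyond random error, while a ratio well above the critical value indicates that at least one group mean differs.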

A statistical method for comparing three or more sets of data. [Pg.693]

The extension of these ideas to random coils can proceed along two lines. In one analysis the coil domain is visualized as a sphere, as in the case above, with r taking the place of R. Alternatively, statistical methods can be employed... [Pg.647]

O. L. Davies and O. L. Goldsmith, Statistical Methods in Research with Special Reference to the Chemical Industry, Longman, New York, 1976. [Pg.43]

Big Chemical Encyclopedia
