Methods, statistical

Many of the standard methods of statistics have been applied to quality control for many years. The computer has merely made their routine application easier and enables the results to be presented in a more useful form. For example, it is possible to program the computer to take the analytical output and combine it with other data so that the output is in the form of a decision, e.g. "this sample is acceptable", "this sample is suitable for internal use, but not for outside customers" or "this sample is not to specification, being deficient in the following respects...". This is time-saving and reduces error at the decision-making stage. [Pg.13]

Because analytical results can be accumulated over long periods on easily accessible disks (a big improvement on laboratory notebooks), it is easier to evaluate the performance of a method or process over a period of time. It is easy to determine the frequency of defects in performance, both by inspection and by more formal methods. One of these which is being used more widely is the cusum procedure. A reference value, k, is selected; typically it is the mean value of the expected result. Then the cumulative sum, S_i, is calculated sequentially [Pg.13]
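The excerpt breaks off before giving the formula. As a minimal sketch, assuming the usual cusum definition S_i = (x_1 - k) + (x_2 - k) + ... + (x_i - k), the running sum could be computed as follows. The function name and the sample results are illustrative assumptions, not taken from the source.

```python
from itertools import accumulate

def cusum(results, k):
    """Return the running sums S_i = sum over j <= i of (x_j - k)."""
    return list(accumulate(x - k for x in results))

# Hypothetical analytical results with an expected mean of 10.0
results = [10.1, 9.9, 10.0, 10.4, 10.5, 10.6]
for i, s in enumerate(cusum(results, k=10.0), start=1):
    print(i, round(s, 2))
```

A steady drift of S_i away from zero suggests that the mean of the results has moved away from the reference value k.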

It is also a form of time series analysis, which is a growth point in statistics. It is to be expected that as improved and novel statistical methods are developed, accumulated data will be used increasingly for the purposes of quality control and forecasting performance. [Pg.14]

For the formal description of relationships between activity measures and structural descriptors of compounds various statistical techniques can be used, such as [Pg.63]

Classification methods: pattern recognition, cluster analysis, discriminant analysis. [Pg.63]

The method of choice depends on the type of activity to be modelled and on the quality and quantity of the data. The derivation of meaningful QSARs always consists of three crucial steps [Pg.63]

Sound QSARs for predictive purposes have to satisfy two criteria [Pg.63]

The model provides an accurate description of the activity of all compounds on which it is based. [Pg.63]

In the setting up of the statistical theory there are several alternative procedures, and the newcomer to the subject often finds it difficult to see the connexion between them. The essential problem, of course, is that of averaging, and the various methods differ principally in the following respects [Pg.339]

Most introductory accounts use the combination (1a), (2b) and (3a), which is one of the easiest to grasp. However, the disadvantage of averaging over the quantum states of molecules, as in (1a), is that the statistical formulae which are obtained are applicable only to systems in which the particles are independent, as in perfect gases. In many elementary accounts of this method it is also necessary to adopt a subterfuge in order to introduce an important term, n , into the formulae. The method (2b) may also lead to an erroneous impression of the nature of equilibrium, for it might be taken to imply that the system stays always in the particular distribution known as the most probable distribution. This would be to preclude fluctuation phenomena. [Pg.340]

For these reasons we shall put forward an elementary treatment based on the combination (1b), (2a) and either (3a) or (3b) according to circumstances. This treatment will apply only to equilibrium situations; a statistical mechanics which would take account of the approach to equilibrium would require a much more elaborate treatment involving Liouville's theorem. It may be remarked too that (3a), (3b) and (3c) signify three different choices of the independent variables in the corresponding thermodynamic analysis of the problem: these are (U, V, n_i), (T, V, n_i) and (T, V, μ_i), respectively. [Pg.340]

Let it be supposed that we wish to calculate the value of some property x of a macroscopic system which is in equilibrium. Now the experimental measurement of this property always requires a certain interval of time for its execution, and if the calculated value of x is to be relevant, it must clearly be some sort of average over all the quantum states through which the system is likely to pass during the period of measurement. [Pg.340]

During a virtually infinite interval of time, A, the system may be expected to pass through each of its accessible quantum states a great many times. Let P_0, P_1, P_2, etc., be the fractions of this interval which the system spends in the quantum states 0, 1, 2, etc. P_i may be called the probability of the system being in the quantum state i. Let x_0, x_1, x_2, etc., be the values of the property x which occur when [Pg.340]
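The excerpt stops before the averaging formula is stated. A minimal sketch, assuming that the measured value is the time-weighted average, i.e. the sum of P_i x_i over all quantum states i, is given below. The function name and the numbers are hypothetical and serve only to illustrate the weighting.

```python
def time_average(fractions, values):
    """Average of x weighted by the fraction of time spent in each state."""
    if abs(sum(fractions) - 1.0) > 1e-9:
        raise ValueError("the fractions P_i must sum to 1")
    return sum(p * x for p, x in zip(fractions, values))

# Hypothetical three-state system
P = [0.5, 0.3, 0.2]   # fractions of time spent in states 0, 1, 2
x = [1.0, 2.0, 4.0]   # value of the property in each state
print(time_average(P, x))
```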

The advent of relatively inexpensive computers has enabled the accumulation and rapid analysis of large sets of data. By this means patterns and trends not always apparent from visual inspection of chromatograms or tables of data can be discriminated by being sorted into recognizable patterns. This approach is essential for some techniques such as pyrolysis where the quantity of data produced would otherwise be overwhelming. Several statistical approaches to exploit the information content of fatty acid and triacylglycerol patterns for the detection and quantification of CBEs in cocoa butter have been reported (Lipp et al., 2001; Simoneau et al., 1999). [Pg.87]

Principal components analysis (PCA) reduces the volume of large data sets by combining correlated variables and maximizing variances to show patterns in the data. Usually, analysis of variance (ANOVA) is used to test whether the null hypothesis, that there is no difference between the data sets, can be rejected. The test statistic is compared with tabulated values at a chosen confidence level (normally 95%). Data are plotted in such ways that different populations are visibly separate and the clustering within each set illustrates the degree of repeatability. [Pg.87]

Discriminant analysis has been used in many of the analyses described in this chapter, in particular the classification of cocoa butters by origin and processing from pyrolysis MS data (Radovic et al., 1998), from triacylglycerol profiles obtained by HPLC (Hernandez et al., 1991) and from analysis of volatiles (Pino, 1992). Data from the analysis of mixtures of CBEs with cocoa butter, which model techniques for measuring CBEs in chocolate, have been treated by similar means (Anklam et al., 1996). [Pg.87]

Multivariate statistical techniques can be used to combine data from several classical analytical methods. In this way Simoneau et al. (2000) tested the quantification of CBEs in mixtures with cocoa butter by analysis of fatty acids by GC and triacylglycerols with both GC and HPLC. [Pg.87]

Since the scales of the TIs vary by several orders of magnitude, each TI was transformed by the natural log of the index plus one. [Pg.107]
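The transformation described above is ln(x + 1). A minimal sketch follows; the function name and the raw index values are hypothetical.

```python
import math

def log_transform(index_values):
    """Apply ln(x + 1) to each index value; log1p stays accurate for small x."""
    return [math.log1p(x) for x in index_values]

# Hypothetical index values spanning several orders of magnitude
raw = [0.0, 9.0, 999.0, 99999.0]
print([round(v, 3) for v in log_transform(raw)])
```

After the transformation the values differ by roughly a factor of ten rather than by several orders of magnitude, and an index of zero maps to zero.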

The large number of TIs, and the fact that many of them are highly correlated, confounds the development of predictive models. Therefore, we attempted to reduce the number of TIs to a smaller set of relatively independent variables. Variable clustering was used to divide the TIs into disjoint subsets (clusters) that are essentially unidimensional. These clusters form new variables which are the first principal component derived from the members of the cluster. From each cluster of indexes, a single index was selected. The index chosen was the one most correlated with the cluster variable. In some cases, a member of a cluster showed poor group membership relative to the other members of the cluster, i.e., the correlation of an index with the cluster variable was much lower than the other members. Any variable showing poor cluster membership was selected for further studies as well. A correlation of a TI with the cluster variable less than 0.7 was used as the definition of poor cluster membership. [Pg.107]
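The selection rule for one cluster can be sketched as follows. This is an illustration under stated assumptions, not the authors' procedure: the clusters are taken as already formed, the first principal component is obtained by power iteration on the correlation matrix (a stand-in for whatever PCA routine was actually used), and "most correlated" is read as the largest absolute correlation, since the sign of a principal component is arbitrary. All function names and data are hypothetical.

```python
import math

def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)

def first_pc_scores(columns, iterations=200):
    """Scores on the first principal component of the standardized columns."""
    n = len(columns[0])
    z = []
    for col in columns:
        m = sum(col) / n
        s = math.sqrt(sum((x - m) ** 2 for x in col) / (n - 1))
        z.append([(x - m) / s for x in col])
    corr = [[pearson(a, b) for b in z] for a in z]
    # Power iteration; the all-ones start assumes mostly positive correlations.
    w = [1.0] * len(z)
    for _ in range(iterations):
        w = [sum(r * v for r, v in zip(row, w)) for row in corr]
        norm = math.sqrt(sum(v * v for v in w))
        w = [v / norm for v in w]
    return [sum(w[j] * z[j][i] for j in range(len(z))) for i in range(n)]

def pick_indices(cluster, threshold=0.7):
    """Return (representative, poor members, correlations) for one cluster."""
    names = list(cluster)
    pc = first_pc_scores([cluster[name] for name in names])
    r = {name: abs(pearson(cluster[name], pc)) for name in names}
    best = max(r, key=r.get)
    poor = [name for name in names if r[name] < threshold]
    return best, poor, r

# Hypothetical cluster of three indices measured on six compounds
cluster = {
    "a": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "b": [2.1, 3.9, 6.2, 8.1, 9.8, 12.2],
    "c": [3.0, 1.0, 4.0, 1.0, 5.0, 2.0],
}
best, poor, r = pick_indices(cluster)
print("representative:", best, "poor membership:", poor)
```

Both the representative and any poor members would be carried forward, following the rule in the text that a correlation below 0.7 with the cluster variable marks poor membership.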

The monomer units (the building blocks from which the molecules are built up) differ in the number of reacted functional groups, i.e., in the number of bonds in which they take part, or, in other words, in the number of bonds they issue (the term which will be used later on). For a single-component system, this distribution is sufficient for the build-up of trees (for multicomponent systems the types of bonds are to be specified, too), if the so-called first-shell substitution effect (fsse) is operative. Fsse means that the reactivity of a group in a unit is independent of the state of the groups in the neighbouring units. [Pg.14]

The distribution of monomer units according to the number of bonds they issue is conveniently expressed through a probability generating function, which is a simple tool used in the generation of the trees. Thus, the probability generating function (pgf) for the number of bonds issuing from an f-functional monomer in the root, F_0(z), is defined as [Pg.14]

In F_0(z), p_i is the probability of finding a monomer unit in the root which issues i bonds. This probability is equal to the fraction of units with i reacted functional groups; z is an auxiliary (dummy) variable of the generating function through which the operations with the pgf are performed. It is important to note that p_i is just the coefficient at z^i. By operations, differentiations or, rarely, integrations are meant. In the derived statistical averages, z is put equal to 1 or 0. Thus [Pg.14]

2 In earlier papers, the letter θ was used for the pgf variable. For typographical convenience, it has been replaced here by z. [Pg.14]
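The defining equation is missing from the excerpt, but the statement that the probability of issuing i bonds is the coefficient at the i-th power of z implies the polynomial form F_0(z) = p_0 + p_1 z + ... + p_f z^f. A minimal sketch under that reading follows; the function names and the distribution are hypothetical. Setting z = 1 in the first derivative gives the mean number of bonds issued per unit, and setting z = 0 in the function itself returns p_0.

```python
def pgf(p, z):
    """F_0(z) = sum of p[i] * z**i, with p[i] the fraction of units issuing i bonds."""
    return sum(p_i * z ** i for i, p_i in enumerate(p))

def pgf_derivative(p, z):
    """First derivative of F_0: sum of i * p[i] * z**(i - 1)."""
    return sum(i * p_i * z ** (i - 1) for i, p_i in enumerate(p) if i > 0)

# Hypothetical distribution for a trifunctional monomer (f = 3)
p = [0.1, 0.2, 0.3, 0.4]
print(pgf(p, 1))             # normalization of the distribution
print(pgf(p, 0))             # fraction of units with no reacted group
print(pgf_derivative(p, 1))  # mean number of bonds issued per unit
```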

While F_0(z) represents the distribution of units in the root, it does not apply to units in generations g > 0. The monomer with no reacted group (fraction p_0) cannot appear in generation g > 0 because such a unit must be bound at least to the unit in the preceding generation. The pgf for units in generations g > 0, F(z), is in the case of fsse [Pg.15]

Prior to analyses, all calculated descriptors were transformed by ln(x + c), where x represents the original descriptor value and c is a constant added [Pg.51]

With respect to the applied regression methodologies, RR is similar to PCR in that the independent variables are transformed to their principal components (PCs). However, while PCR utilizes only a subset of the PCs, RR retains them all but downweights them based on their eigenvalues. With PLS, a subset of the PCs is also used, but the PCs are selected by considering both the independent and dependent variables. For each model developed, the cross-validated R² was obtained using the leave-one-out (LOO) approach and can be calculated as follows [Pg.52]
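The formula itself is cut off in the excerpt. A common definition, assumed here, is cross-validated R² = 1 - PRESS / SS_total, where PRESS is the sum of squared errors of the predictions made for each compound by a model fitted without it. The sketch below uses ordinary least squares with a single descriptor purely for brevity; it is a stand-in for the RR, PCR and PLS models of the text, and all names and data are hypothetical.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept for one predictor."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, my - slope * mx

def loo_r2(xs, ys):
    """Leave-one-out cross-validated R^2, assumed to be 1 - PRESS / SS_total."""
    press = 0.0
    for i in range(len(xs)):
        # Refit the model without observation i, then predict it.
        slope, intercept = fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        press += (ys[i] - (slope * xs[i] + intercept)) ** 2
    mean_y = sum(ys) / len(ys)
    ss_total = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - press / ss_total

# Hypothetical descriptor values and activities
descriptor = [1.0, 2.0, 3.0, 4.0, 5.0]
activity = [1.1, 1.9, 3.2, 3.8, 5.1]
print(round(loo_r2(descriptor, activity), 3))
```

Unlike the ordinary R², this quantity can become negative when the model predicts left-out compounds worse than the mean of the data would.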

Since the accuracy of experimental data is frequently not high, and since experimental data are hardly ever plentiful, it is important to reduce the available data with care using a suitable statistical method and using a model for the excess Gibbs energy which contains only a minimum of binary parameters. Rarely are experimental data of sufficient quality and quantity to justify more than three binary parameters and, all too often, the data justify no more than two such parameters. When data sources (5) or (6) or (7) are used alone, it is not possible to use a three- (or more)-parameter model without making additional arbitrary assumptions. For typical engineering calculations, therefore, it is desirable to use a two-parameter model such as UNIQUAC. [Pg.43]

The primary purpose for expressing experimental data through model equations is to obtain a representation that can be used confidently for systematic interpolations and extrapolations, especially to multicomponent systems. The confidence placed in the calculations depends on the confidence placed in the data and in the model. Therefore, the method of parameter estimation should also provide measures of reliability for the calculated results. This reliability depends on the uncertainties in the parameters, which, with the statistical method of data reduction used here, are estimated from the parameter variance-covariance matrix. This matrix is obtained as a last step in the iterative calculation of the parameters. [Pg.102]

In the maximum-likelihood method used here, the "true" value of each measured variable is also found in the course of parameter estimation. The differences between these "true" values and the corresponding experimentally measured values are the residuals (also called deviations). When there are many data points, the residuals can be analyzed by standard statistical methods (Draper and Smith, 1966). If, however, there are only a few data points, examination of the residuals for trends, when plotted versus other system variables, may provide valuable information. Often these plots can indicate at a glance excessive experimental error, systematic error, or "lack of fit." Data points which are obviously bad can also be readily detected. If the model is suitable and if there are no systematic errors, such a plot shows the residuals randomly distributed with zero means. This behavior is shown in Figure 3 for the ethyl-acetate-n-propanol data of Murti and Van Winkle (1958), fitted with the van Laar equation. [Pg.105]

The range of uncertainty in the UR may be too large to commit to a particular development plan, and field appraisal may be required to reduce the uncertainty and allow a more suitable development plan to be formed. Unless the range of uncertainty is quantified using statistical techniques and representations, the need for appraisal cannot be determined. Statistical methods are used to express ranges of values of STOIIP, GIIP, UR, and reserves. [Pg.158]

The calculation of characteristic values produces a large number of values which contain redundant information. The fourth partial step is therefore to reduce this number of values using extraction methods. This can be done with statistical methods such as cross-correlation analysis. [Pg.16]

B. Widom, Structure and Thermodynamics of Interfaces, in Statistical Mechanics and Statistical Methods in Theory and Application, Plenum, New York, 1977, pp. 33-71. [Pg.97]

Reiss H 1977 Scaled particle theory of hard sphere fluids Statistical Mechanics and Statistical Methods in Theory and Application ed U Landman (New York Plenum) pp 99-140... [Pg.552]

Quack M and Troe J 1981 Statistical methods in scattering Theor. Chem. Adv. Perspect. B 6 199-276... [Pg.1092]

Pratt L R 1986 A statistical method for identifying transition states in high dimensional problems J. Chem. Phys. 85 5045-8... [Pg.2288]

Previous studies with a variety of datasets had shown the importance of charge distribution, of σ-electronegativity (inductive effect), of π-electronegativity (resonance effect), and of effective polarizability, αeff (polarizability effect) (for details on these methods see Section 7.1). All four of these descriptors on all three carbon atoms were calculated. However, in the final study, a reduced set of descriptors, shown in Table 3-4, was chosen that was obtained both by statistical methods and by chemical intuition. [Pg.194]

To understand the recommendations for structure descriptors in order to be able to apply them in QSAR or drug design in conjunction with statistical methods or machine learning techniques. [Pg.401]

A structure descriptor is a mathematical representation of a molecule resulting from a procedure transforming the structural information encoded within a symbolic representation of a molecule. This mathematical representation has to be invariant to the molecule's size and number of atoms, to allow model building with statistical methods and artificial neural networks. [Pg.403]

Chirality codes are used to represent molecular chirality by a fixed number of descriptors. These descriptors can then be correlated with molecular properties by way of statistical methods or artificial neural networks, for example. The importance of using descriptors that take different values for opposite enantiomers resides in the fact that observable properties are often different for opposite enantiomers. [Pg.420]

After an alignment of a set of molecules known to bind to the same receptor, a comparative molecular field analysis (CoMFA) makes it possible to determine and visualize molecular interaction regions involved in ligand-receptor binding [51]. Further on, statistical methods such as partial least squares regression (PLS) are applied to search for a correlation between CoMFA descriptors and biological activity. The CoMFA descriptors have been one of the most widely used sets of descriptors. However, their apex has been reached. [Pg.428]

The abbreviation QSAR stands for quantitative structure-activity relationships. QSPR means quantitative structure-property relationships. As the properties of an organic compound usually cannot be predicted directly from its molecular structure, an indirect approach is used to overcome this problem. In the first step, numerical descriptors encoding information about the molecular structure are calculated for a set of compounds. Secondly, statistical methods and artificial neural network models are used to predict the property or activity of interest, based on these descriptors or a suitable subset. A typical QSAR/QSPR study comprises the following steps: structure entry (or start from an existing structure database), descriptor calculation, descriptor selection, model building, model validation. [Pg.432]

The data analysis module of ELECTRAS is twofold. One part was designed for general statistical data analysis of numerical data. The second part offers a module for analyzing chemical data. The difference between the two modules is that the module for mere statistics applies the statistical methods or neural networks directly to the input data, while the module for chemical data analysis also contains methods for the calculation of descriptors for chemical structures (cf. Chapter 8). Descriptors, and thus structure codes, are calculated for the input structures and then the statistical methods and neural networks can be applied to the codes. [Pg.450]

It extends the usage of statistical methods and combines it with machine learning methods and the application of expert systems. The visualization of the results of data mining is an important task as it facilitates an interpretation of the results. Figure 9-32 plots the different disciplines which contribute to data mining. [Pg.472]

Classification describes the process of assigning an instance or property to one of several given classes. The classes are defined beforehand and this class assignment is used in the learning process, which is therefore supervised. Statistical methods and decision trees (cf. Section 9.3) are also widely used for classification tasks. [Pg.473]

A very important data mining task is the discovery of characteristic descriptions for subsets of data, which characterize its members and distinguish it from other subsets. Descriptions can, for example, be the output of statistical methods like average or variance. [Pg.474]

The previously mentioned data set with a total of 115 compounds has already been studied by other statistical methods such as Principal Component Analysis (PCA), Linear Discriminant Analysis, and the Partial Least Squares (PLS) method [39]. Thus, the choice and selection of descriptors has already been accomplished. [Pg.508]

Multiple linear regression is strictly a parametric supervised learning technique. A parametric technique is one which assumes that the variables conform to some distribution (often the Gaussian distribution); the properties of the distribution are assumed in the underlying statistical method. A non-parametric technique does not rely upon the assumption of any particular distribution. A supervised learning method is one which uses information about the dependent variable to derive the model. An unsupervised learning method does not. Thus cluster analysis, principal components analysis and factor analysis are all examples of unsupervised learning techniques. [Pg.719]

The value of the torsional energy increment has been variously estimated, but TORS = 0.42 kcal mol⁻¹ was settled on for the bond contribution method in MM3. In the full statistical method (see below), low-frequency torsional motion should be calculated along with all the others, so the empirical TORS increment should be zero. In fact, TORS is not zero (Allinger, 1996). It appears that the TORS increment is a repository for an energy error or errors in the method that are as yet unknown. [Pg.154]

Linnig, F. J., and J. Mandel, Which Measure of Precision? Anal. Chem., 36 25A (1964). Mark, H., and J. Workman, Statistics in Spectroscopy, Academic Press, San Diego, CA, 1991. Meier, P. C., and R. E. Zund, Statistical Methods in Analytical Chemistry, Wiley, New York, 1993. [Pg.212]

The probabilistic nature of a confidence interval provides an opportunity to ask and answer questions comparing a sample's mean or variance to either the accepted values for its population or similar values obtained for other samples. For example, confidence intervals can be used to answer questions such as "Does a newly developed method for the analysis of cholesterol in blood give results that are significantly different from those obtained when using a standard method?" or "Is there a significant variation in the chemical composition of rainwater collected at different sites downwind from a coal-burning utility plant?" In this section we introduce a general approach to the statistical analysis of data. Specific statistical methods of analysis are covered in Section 4F. [Pg.82]

Richardson, T. H. Reproducible Bad Data for Instruction in Statistical Methods, J. Chem. Educ. 1991, 68, 310-311. [Pg.97]

A variety of statistical methods may be used to compare three or more sets of data. The most commonly used method is an analysis of variance (ANOVA). In its simplest form, a one-way ANOVA allows the importance of a single variable, such as the identity of the analyst, to be determined. The importance of this variable is evaluated by comparing its variance with the variance explained by indeterminate sources of error inherent to the analytical method. [Pg.693]
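The comparison of variances described above can be sketched with the standard one-way ANOVA F ratio, the between-group mean square divided by the within-group mean square. The function name and the replicate results for three analysts are hypothetical.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of groups of measurements."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    # Variation attributable to the variable under study (e.g. the analyst)
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Variation from indeterminate error within each group
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between, df_within = k - 1, n_total - k
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    return f_ratio, df_between, df_within

# Hypothetical replicate results reported by three analysts
analysts = [
    [10.1, 10.3, 10.2, 10.4],
    [10.6, 10.5, 10.8, 10.7],
    [10.2, 10.1, 10.3, 10.2],
]
f_ratio, df_b, df_w = one_way_anova_f(analysts)
print(round(f_ratio, 2), df_b, df_w)
```

The F ratio would then be compared with the tabulated critical value for the two degrees of freedom at the chosen significance level; a larger computed value indicates that the variable has a significant effect.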

A statistical method for comparing three or more sets of data. [Pg.693]

The extension of these ideas to random coils can proceed along two lines. In one analysis the coil domain is visualized as a sphere, as in the case above, with r taking the place of R. Alternatively, statistical methods can be employed... [Pg.647]

O. L. Davies and O. L. Goldsmith, Statistical Methods in Research with Special Reference to the Chemical Industry, Longman, New York, 1976. [Pg.43]

