Statistical methods model building

Reilly, P., and Blau, G., The Use of Statistical Methods to Build Mathematical Models of Chemical Reacting Systems, Canadian Journal of Chemical Engineering, 52, 289-299 (1974). [Pg.116]

A structure descriptor is a mathematical representation of a molecule, obtained by a procedure that transforms the structural information encoded in a symbolic representation of the molecule. This mathematical representation has to be invariant to the molecule's size and number of atoms, to allow model building with statistical methods and artificial neural networks. [Pg.403]

The abbreviation QSAR stands for quantitative structure-activity relationships. QSPR means quantitative structure-property relationships. As the properties of an organic compound usually cannot be predicted directly from its molecular structure, an indirect approach is used to overcome this problem. In the first step, numerical descriptors encoding information about the molecular structure are calculated for a set of compounds. In the second step, statistical methods and artificial neural network models are used to predict the property or activity of interest, based on these descriptors or a suitable subset. A typical QSAR/QSPR study comprises the following steps: structure entry (or start from an existing structure database), descriptor calculation, descriptor selection, model building, and model validation. [Pg.432]
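
The steps listed above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the descriptor matrix and property values are synthetic random numbers standing in for real structure entry and descriptor calculation, descriptor selection is a simple correlation ranking, and the model is ordinary least squares. The names `select_descriptors`, `fit_linear`, and `predict_linear` are hypothetical helpers, not part of any QSAR package.

```python
import numpy as np

rng = np.random.default_rng(42)

# Steps 1-2 (structure entry, descriptor calculation) are replaced here by
# synthetic data: 200 hypothetical compounds with 8 hypothetical descriptors.
n_compounds, n_descriptors = 200, 8
descriptors = rng.normal(size=(n_compounds, n_descriptors))

# Assumed "true" relationship: the property depends on descriptors 0 and 3 only.
prop = (2.0 * descriptors[:, 0] - 1.5 * descriptors[:, 3]
        + rng.normal(scale=0.2, size=n_compounds))

# Hold out the last 50 compounds for validation.
train = np.arange(150)
test = np.arange(150, n_compounds)

def select_descriptors(X, y, k):
    """Step 3: keep the k descriptors with the largest |correlation| to y."""
    corr = np.array([abs(np.corrcoef(X[:, j], y)[0, 1])
                     for j in range(X.shape[1])])
    return np.sort(np.argsort(corr)[-k:])

def fit_linear(X, y):
    """Step 4: ordinary least squares with an intercept term."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def predict_linear(coef, X):
    """Apply the fitted intercept and slopes to new descriptor rows."""
    return coef[0] + X @ coef[1:]

# Selection and fitting use the training compounds only.
selected = select_descriptors(descriptors[train], prop[train], k=2)
coef = fit_linear(descriptors[train][:, selected], prop[train])

# Step 5: validation on the held-out compounds.
pred = predict_linear(coef, descriptors[test][:, selected])
ss_res = np.sum((prop[test] - pred) ** 2)
ss_tot = np.sum((prop[test] - prop[test].mean()) ** 2)
r2_test = 1.0 - ss_res / ss_tot

print("selected descriptors:", selected)
print("test-set R^2:", round(r2_test, 3))
```

Selecting descriptors on the training compounds alone keeps the held-out compounds untouched until the validation step, so the test-set statistic is not inflated by the selection.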

A. Dijkstra, and L. Kaufman, Evaluation and Optimization of Laboratory Methods and Analytical Procedures (Amsterdam: Elsevier, 1978); G. E. P. Box, W. G. Hunter, and J. S. Hunter, Statistics for Experimenters: An Introduction to Design, Data Analysis, and Model Building (New York: Wiley, 1978); R. S. Strange, Introduction to Experimental Design for Chemists, J. Chem. Ed. 1990, 67, 113. [Pg.666]

To compare and evaluate the different kinds of models created during the model development process, graphical and statistical methods should be applied. A good description of the model building process can be found elsewhere [14]. [Pg.461]

The terms bioinformatics and cheminformatics refer to the use of computational methods in the study of biology and chemistry. Information from DNA or protein sequences, protein structure, and chemical structure is used to build models of biochemical systems or models of the interaction of a biochemical system with a small molecule (e.g., a drug). There are mathematical and statistical methods for analysis, public databases, and literature associated with each of these disciplines. However, there is substantial value in considering the interaction between these areas and in building computational models that integrate data from both sources. In the most... [Pg.282]

Model Building Method". Paper presented at Princeton Symposium on Statistics, 1966. [Pg.247]

Model building and current statistical methods applied to interpretation of rate data are presented in... [Pg.436]

Validation without an independent test set. In each application, the adaptive wavelet algorithm (AWA) has been applied to a training set and validated using an independent test set. If there are too few observations to allow for independent training and test sets, then cross-validation could be used to assess the prediction performance of the statistical method. In that situation, it should be noted that implementing a full cross-validation routine for the AWA would be extremely computationally demanding. That is, it would be too time-consuming to leave out one observation, build the AWA model, predict the deleted observation, and then repeat this leave-one-out procedure for each observation in turn. In the absence of an independent test set, a more realistic approach would be to perform cross-validation using the wavelet produced at termination of the AWA, but it is important to note that this would not be a full validation. [Pg.200]
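
The leave-one-out procedure described above (leave out one observation, build the model, predict the deleted observation, repeat) can be sketched as follows. This is a minimal sketch, not the AWA: an ordinary least-squares fit stands in for the model, and the data are synthetic. The function name `leave_one_out_predictions` is a hypothetical helper introduced for illustration.

```python
import numpy as np

def leave_one_out_predictions(X, y):
    """Refit the model n times, each time predicting the one omitted row.

    Assumption: ordinary least squares with an intercept is used as a cheap
    stand-in for the (much more expensive) model discussed in the text.
    """
    n = len(y)
    preds = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i                       # drop observation i
        A = np.column_stack([np.ones(n - 1), X[keep]])
        coef, *_ = np.linalg.lstsq(A, y[keep], rcond=None)
        preds[i] = np.concatenate([[1.0], X[i]]) @ coef  # predict the deleted row
    return preds

# Synthetic example: 20 observations, 2 predictors (hypothetical values).
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 2))
y = 1.5 * X[:, 0] - 0.7 * X[:, 1] + rng.normal(scale=0.1, size=20)

loo_pred = leave_one_out_predictions(X, y)
rmse_loo = np.sqrt(np.mean((y - loo_pred) ** 2))
print("leave-one-out RMSE:", round(rmse_loo, 3))
```

The loop refits the model once per observation, which is why the cost grows quickly when a single fit is expensive. Fixing part of the model in advance, as the passage suggests for the wavelet, reduces the cost but means the fixed part has already seen every observation.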

Chapter 1 is an overview of statistical methods and elementary concepts for statistical model building. [Pg.511]

Ligand- and structure-based approaches are valuable tools for the identification and optimization of lead compounds. Each strategy needs special prerequisites and has strengths and weaknesses. In some cases only the strengths of both methods may be combined for a joint approach, called structure-based pharmacophore alignment. Here, the receptor site serves as a complement to build the pharmacophore model and sophisticated statistical methods from 3D-QSAR (PCA and PLS) are applied for the prediction of activity [19, 20]. [Pg.1187]

The final result therefore has the maximum value of the Fisher criterion and the highest value of the cross-validated correlation coefficient. According to these statistical criteria, it was considered the best representation of the property in the given (large) descriptor space. The BMLR approach has a variation that takes care of the noncollinearity of descriptor pairs, called the Heuristics method (1996JPC10400). The advantages of such methods are that they are fast and limit chance correlation to a minimum. Both techniques were successfully used by ARK for model building for a tremendous number of chemical properties of compounds, and of heterocycles in particular. [Pg.256]
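
For reference, the two selection criteria named above are commonly written as follows for a multilinear regression. The excerpt does not give its exact definitions, so this assumes the usual forms, with $n$ compounds, $k$ descriptors, squared correlation coefficient $R^2$, and $\hat{y}_{(i)}$ the prediction for compound $i$ from a model fitted without it:

```latex
F = \frac{R^2 / k}{(1 - R^2) / (n - k - 1)},
\qquad
q^2 = 1 - \frac{\sum_{i=1}^{n} \bigl( y_i - \hat{y}_{(i)} \bigr)^2}
               {\sum_{i=1}^{n} \bigl( y_i - \bar{y} \bigr)^2}
```

A model that merely fits chance correlations tends to show a high $R^2$ but a much lower $q^2$, which is why the two criteria are considered together.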

Cheminformatics is a relatively new discipline that encompasses a number of different fields. Many people come to cheminformatics from other fields, or use cheminformatic methods as an adjunct to experimental work. Standards for how cheminformatics professionals should be trained are still emerging. Hopefully, as the field matures, practitioners will begin to appreciate the importance of a firm grounding in statistics. Without an appreciation for the statistical foundations of our methods, it is difficult to see how our field will progress. This paper should not be considered a comprehensive guide to model building or evaluation. The objective here was just to point out a few commonly encountered pitfalls. The interested reader is urged to consult any of a number of excellent biostatistics or data analysis texts [31,49]. Another excellent source of information on model evaluation and comparison is the recent work of Anthony Nicholls [50-52]. [Pg.18]

Compound pairs detected as informative activity cliffs often illustrate key chemical features for activity. These pairs, however, may also often be detected as apparent statistical outliers in quantitative SAR analysis methods [56], since the assumption of SAR continuity is fundamental for QSAR model building and affinity prediction. [Pg.210]

Barnett V, Lewis T (1994) Outliers in statistical data, 3rd edn. Wiley, Chichester
Bloomfield P (2000) Fourier analysis of time series: an introduction, 2nd edn. Wiley, New York
Box GE, Jenkins GM (1970) Time series analysis, forecasting, and control. Holden-Day, Oakland
Box GE, Hunter WG, Hunter JS (1978) Statistics for experimenters: an introduction to design, data analysis, and model building. Wiley, New York
Brillo J (2007) Excel for scientists and engineers: numerical methods. Wiley, Hoboken
Chambers JM, Cleveland WS, Kleiner B, Tukey P (1983) Graphical methods for data analysis. [Pg.404]
