Outliers, statistical structural analysis

The goal of exploratory data analysis (EDA) is to reveal structures, peculiarities, and relationships in data, so EDA can be seen as a kind of detective work by the data analyst. Its results guide the choice of methods for data preprocessing, outlier selection, and statistical data analysis. EDA is especially suited to interactive work with computers (Buja et al. [1996]). Although graphical methods cannot substitute for statistical methods, they can play an essential role in recognizing relationships. An informative example regarding bivariate relationships was given by Anscombe [1973] (see also Danzer et al. [2001], p. 99). [Pg.268]
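Anscombe's example can be reproduced in a few lines. The sketch below is illustrative only: the function name summary and the dictionary QUARTET are choices made here, not part of the cited works. It computes the usual bivariate summary statistics for the four data sets; they should come out nearly identical, even though plots of the four sets look completely different (a linear trend, a curve, a line with one outlier, and a vertical cluster with one leverage point).

```python
from math import sqrt

# Anscombe's four data sets (Anscombe, 1973); sets I-III share the same x values.
X123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
QUARTET = {
    "I": (X123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (X123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def summary(x, y):
    """Return means, Pearson correlation, and least-squares line for (x, y)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    return {
        "mean_x": mean_x,
        "mean_y": mean_y,
        "r": sxy / sqrt(sxx * syy),
        "slope": slope,
        "intercept": mean_y - slope * mean_x,
    }


for name, (x, y) in QUARTET.items():
    stats = summary(x, y)
    print(name, {key: round(value, 2) for key, value in stats.items()})
```

Because the statistics alone cannot distinguish the four sets, a scatter plot of each is the natural next step before choosing a model or an outlier treatment.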

Structure-based clustering is used to group related compounds for the purpose of HTS data analysis, identification of SAR series, and detection of potential outliers (Engels et al., 2002). In one example, researchers at GNF reported a statistical approach to dynamically score each scaffold family (obtained by prior clustering of screened structures) based on the family members' HTS activities. This method identifies compounds that share structural similarities and similarly high HTS activities; it yielded greatly improved confirmation rates compared with using a static (scaffold-independent) activity cut-off (Yan et al., 2005). [Pg.253]
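The passage does not give the scoring statistic, so the following is only a sketch of one way a family-specific cut-off could be derived, not the published method. It assumes a hypergeometric enrichment test: compounds are ranked by activity, and for each scaffold family the cut-off is placed where the family's members are most over-represented among the top-ranked compounds. The names hypergeom_tail and score_families, and the tuple layout of the input, are hypothetical.

```python
from math import comb


def hypergeom_tail(k, population, members, drawn):
    """P(X >= k): chance of at least k family members among `drawn` compounds
    taken at random from `population` compounds containing `members` members."""
    upper = min(members, drawn)
    hits = sum(
        comb(members, i) * comb(population - members, drawn - i)
        for i in range(k, upper + 1)
    )
    return hits / comb(population, drawn)


def score_families(compounds):
    """compounds: list of (compound_id, family, activity) tuples (assumed layout).

    Returns {family: (best_p_value, activity_cutoff, members_above_cutoff)}.
    Ties in activity are not treated specially in this sketch.
    """
    ranked = sorted(compounds, key=lambda c: c[2], reverse=True)
    population = len(ranked)
    sizes = {}
    for _, family, _ in ranked:
        sizes[family] = sizes.get(family, 0) + 1

    seen, best = {}, {}
    for rank, (_, family, activity) in enumerate(ranked, start=1):
        seen[family] = seen.get(family, 0) + 1
        p = hypergeom_tail(seen[family], population, sizes[family], rank)
        # Keep the cut-off at which this family looks most enriched.
        if family not in best or p < best[family][0]:
            best[family] = (p, activity, seen[family])
    return best


# Made-up screening data: family "A" is consistently active, "B" is not.
example = [("a1", "A", 95), ("a2", "A", 90), ("a3", "A", 85),
           ("b1", "B", 80), ("b2", "B", 50), ("b3", "B", 40), ("b4", "B", 30),
           ("b5", "B", 20), ("b6", "B", 10), ("b7", "B", 5)]
for family, (p, cutoff, count) in sorted(score_families(example).items()):
    print(family, round(p, 4), cutoff, count)
```

The design choice illustrated here is that each family receives its own activity threshold, in contrast to a single scaffold-independent cut-off applied to every compound.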

Exploratory data analysis is a collection of techniques that search for structure in a data set before calculating any statistical model [Krzanowski, 1988]. Its purpose is to obtain information about the data distribution, about the presence of outliers and clusters, and to disclose relationships and correlations between objects and/or variables. Principal component analysis and cluster analysis are the most well-known techniques for data exploration [Jolliffe, 1986; Jackson, 1991; Basilevsky, 1994]. [Pg.61]
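As a minimal illustration of principal component analysis used for exploration, the sketch below handles the two-variable case, where the eigen-decomposition of the covariance matrix has a closed form. The function name pca_2d and the sample data are assumptions made for this example; real data sets with many variables would normally be handled with a numerical linear algebra library.

```python
import math


def pca_2d(points):
    """PCA for two variables.

    Returns ((lambda1, lambda2), first_axis, scores), where the lambdas are the
    eigenvalues of the sample covariance matrix, first_axis is the unit vector
    of the first principal component, and scores are the projections onto it.
    """
    n = len(points)
    mean_x = sum(p[0] for p in points) / n
    mean_y = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mean_x) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - mean_y) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mean_x) * (p[1] - mean_y) for p in points) / (n - 1)

    # Eigenvalues of the symmetric matrix [[sxx, sxy], [sxy, syy]].
    half_trace = (sxx + syy) / 2
    det = sxx * syy - sxy * sxy
    disc = math.sqrt(max(half_trace * half_trace - det, 0.0))
    lam1, lam2 = half_trace + disc, half_trace - disc

    # Eigenvector belonging to the larger eigenvalue.
    if abs(sxy) > 1e-12:
        axis = (lam1 - syy, sxy)
    else:
        axis = (1.0, 0.0) if sxx >= syy else (0.0, 1.0)
    norm = math.hypot(axis[0], axis[1])
    axis = (axis[0] / norm, axis[1] / norm)

    scores = [(p[0] - mean_x) * axis[0] + (p[1] - mean_y) * axis[1] for p in points]
    return (lam1, lam2), axis, scores


# Made-up, strongly correlated measurements.
data = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2), (4.0, 7.8), (5.0, 10.1)]
(lam1, lam2), axis, scores = pca_2d(data)
print("explained by PC1:", round(lam1 / (lam1 + lam2), 3))
print("PC1 direction:", tuple(round(a, 3) for a in axis))
```

Plotting or listing the scores along the first component is then a simple way to look for clusters and for objects that lie far from the rest.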

The methodology for answering these parameter estimation and set-based questions relies on different mathematical approaches. In principle, the parameter identification of chemical kinetic models can be posed as classical statistical inference [17,19-21]: given a mathematical model and a set of experimental observations for the model responses, determine the best-fit parameter values, usually those that produce the smallest deviations of the model predictions from the measurements. The validity of the model and the identification of outliers are then determined using analysis of variance. The general optimizations are computationally intensive even for well-behaved, well-parameterized algebraic functions. Further complications arise from the highly ill-structured character... [Pg.255]
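The best-fit idea can be sketched on a deliberately simple case. The example below assumes a first-order decay model c(t) = c0 * exp(-k*t), which is not taken from the source, and fits it by ordinary least squares on ln(c). Taking logarithms is a simplification that turns the problem into a straight-line fit. Outliers are then screened with a plain residual test (residual larger than a hypothetical cut-off of two residual standard deviations), which is cruder than a full analysis of variance. The function names, the data, and the cut-off are all illustrative.

```python
from math import exp, log, sqrt


def fit_first_order(times, concs):
    """Least-squares fit of ln(c) = ln(c0) - k*t.

    Returns (c0, k, residuals), with residuals in log-concentration units.
    """
    n = len(times)
    ys = [log(c) for c in concs]
    t_mean = sum(times) / n
    y_mean = sum(ys) / n
    sxx = sum((t - t_mean) ** 2 for t in times)
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in zip(times, ys))
    slope = sxy / sxx
    intercept = y_mean - slope * t_mean
    residuals = [y - (intercept + slope * t) for t, y in zip(times, ys)]
    return exp(intercept), -slope, residuals


def flag_outliers(residuals, n_params=2, cutoff=2.0):
    """Indices whose |residual| exceeds cutoff * residual standard deviation."""
    dof = len(residuals) - n_params
    s = sqrt(sum(r * r for r in residuals) / dof)
    if s == 0.0:
        return []
    return [i for i, r in enumerate(residuals) if abs(r) > cutoff * s]


# Synthetic measurements from c0 = 2.0, k = 0.5, with one corrupted point.
times = list(range(10))
concs = [2.0 * exp(-0.5 * t) for t in times]
concs[4] *= 1.5  # simulated measurement error

c0, k, residuals = fit_first_order(times, concs)
print("c0 =", round(c0, 3), "k =", round(k, 3))
print("suspect points:", flag_outliers(residuals))
```

For realistic kinetic mechanisms the model is nonlinear in many parameters, so the minimization has to be done numerically, which is where the computational cost mentioned above comes from.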
