Variance biological data

A variety of statistical parameters have been reported in the QSAR literature to reflect the quality of a model. These measures indicate how well the model fits the existing data, i.e., they measure the explained variance of the target parameter y in the biological data. Some of the most common regression measures are the root mean square error (rmse), the standard error of estimate (s), and the coefficient of determination (R²). [Pg.200]
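As a minimal sketch, assuming the usual textbook definitions (conventions vary between sources, particularly for the denominator of rmse), the three measures can be computed from observed and model-predicted activities. Here rmse divides the residual sum of squares by n, s divides it by the residual degrees of freedom n − k − 1 for k descriptors, and R² compares it with the total sum of squares about the mean. The function name `fit_statistics` is hypothetical.

```python
import math

def fit_statistics(y_obs, y_pred, n_descriptors):
    """Return (rmse, s, R^2) for a fitted regression model (textbook definitions assumed)."""
    n = len(y_obs)
    dof = n - n_descriptors - 1  # residual degrees of freedom
    if dof <= 0:
        raise ValueError("need more observations than descriptors + 1")
    mean_y = sum(y_obs) / n
    sse = sum((o - p) ** 2 for o, p in zip(y_obs, y_pred))  # residual sum of squares
    sst = sum((o - mean_y) ** 2 for o in y_obs)              # total sum of squares
    rmse = math.sqrt(sse / n)        # root mean square error
    s = math.sqrt(sse / dof)         # standard error of estimate
    r2 = 1.0 - sse / sst             # coefficient of determination
    return rmse, s, r2
```

For hypothetical observed activities [5.0, 6.0, 7.0, 8.0] predicted as [5.1, 5.9, 7.2, 7.8] by a one-descriptor model, this gives R² = 0.98, with s (about 0.224) larger than rmse (about 0.158) because s is penalized for the fitted parameters.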

The correlation coefficient r is a measure of the quality of fit of the model; its square, r², is the fraction of the variance in the data that the model explains. Ideally, r would equal or approach 1, but because of the complexity of biological data, in practice any value above 0.90 is adequate. The standard deviation s is an absolute measure of the quality of fit. Ideally s should approach zero, but in experimental situations this is not so. It should be small, but it cannot be lower than the standard deviation of the experimental data. The magnitude of s may be attributed to experimental error in the data as well as to imperfections in the biological model. A larger data set and a smaller number of variables generally lead to lower values of s. The F value is often used as a measure of the statistical significance of the regression model. It is defined in Equation 1.27. [Pg.10]
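Equation 1.27 itself is not reproduced in this excerpt. For a regression with k descriptors fitted to n compounds, the overall F value is commonly written F = (r²/k) / ((1 − r²)/(n − k − 1)); the sketch below assumes that form, which may differ in notation from Equation 1.27. The resulting F would be compared with tabulated critical F values for (k, n − k − 1) degrees of freedom. The function name `f_value` is hypothetical.

```python
def f_value(r2, n, k):
    """Overall F statistic of a multiple linear regression (assumed standard form).

    r2: squared correlation coefficient
    n:  number of compounds
    k:  number of descriptors (independent variables)
    Degrees of freedom: k (numerator) and n - k - 1 (denominator).
    """
    if not 0.0 <= r2 < 1.0:
        raise ValueError("r2 must lie in [0, 1)")
    if n - k - 1 <= 0:
        raise ValueError("need more compounds than descriptors + 1")
    return (r2 / k) / ((1.0 - r2) / (n - k - 1))

# Same fit quality (r = 0.95, 2 descriptors): F grows with the number of compounds
for n in (10, 20, 40):
    print(n, round(f_value(0.95 ** 2, n, 2), 1))
```

The loop illustrates the point made above about data set size: at a fixed r, more compounds per descriptor yield a larger F and hence stronger evidence that the regression is significant.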

Fig. (14). (A) Experimental vs. calculated pIC50 values from a QSAR model for the inhibitory effect of 28 STLs on NF-κB activation (biological data from [59]; for structures, see Fig. (12)). The model was generated by GA-PLS analysis (3 latent variables) from the 8 descriptors shown in the loading weights plot (B). This plot illustrates the impact of each descriptor on the first two latent variables (PC1 and PC2), which explain 54% and 27% of the variance in the Y data (pIC50), respectively.

Ensures that the model for the population response is correctly specified; reasonable for population pharmacokinetics. Serial concentrations measured from an individual are likely to be correlated. Constant intraindividual variance is frequently violated and is typically accounted for with error models that specify the G vs. concentration relationship; the distribution of G over (time) is defined by the underlying structural model. Historical requirement for inference; unrealistic for nonlinear models, particularly with biologic data... [Pg.324]
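The fragment notes that non-constant intraindividual variance is usually handled with an error model linking residual variance to concentration. One widely used form, assumed here because the source does not say which model it means, is the combined additive-plus-proportional model Var = σ_add² + (σ_prop·C)². The function and parameter names below are hypothetical, and the inverse-variance weights show how such a model down-weights the noisier high-concentration observations in a fit.

```python
def residual_variance(conc, sigma_add, sigma_prop):
    """Residual variance at concentration conc under an assumed combined error model.

    Var(eps) = sigma_add**2 + (sigma_prop * conc)**2
    sigma_add:  additive SD (concentration units)
    sigma_prop: proportional SD (fraction of the concentration)
    """
    return sigma_add ** 2 + (sigma_prop * conc) ** 2

def weights(concs, sigma_add, sigma_prop):
    """Inverse-variance weights: larger predicted variance -> smaller weight."""
    return [1.0 / residual_variance(c, sigma_add, sigma_prop) for c in concs]

# Hypothetical values: additive SD 0.5, 10% proportional error
print(weights([1.0, 10.0, 100.0], sigma_add=0.5, sigma_prop=0.1))
```

Setting `sigma_prop` to zero recovers the constant-variance assumption that the fragment describes as frequently violated.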

In general, a regression equation can be accepted in QSAR studies if the correlation coefficient r is around or better than 0.9 for in vitro data and 0.8 for whole-animal data (as already discussed, its value depends not only on the quality of fit but also on the overall variance of the biological data; compare eqs. 124–126, chapter 5.1),... [Pg.95]
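The dependence of r on the overall variance of the biological data can be shown with a small constructed example (hypothetical numbers, not from the cited equations). Two data sets receive identical residuals, i.e., the same "experimental error"; only the spread of the activities differs. The narrow-range set ends up with a much lower r even though the fit is, in absolute terms, exactly as good.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical residuals, identical in both data sets
noise = [0.3, -0.2, -0.3, 0.2, 0.1, -0.1]

# Predicted activities spanning 5 log units vs. only 1 log unit
wide_pred = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
narrow_pred = [1.0, 1.2, 1.4, 1.6, 1.8, 2.0]

wide_obs = [p + e for p, e in zip(wide_pred, noise)]
narrow_obs = [p + e for p, e in zip(narrow_pred, noise)]

r_wide = pearson_r(wide_pred, wide_obs)        # about 0.99
r_narrow = pearson_r(narrow_pred, narrow_obs)  # about 0.82
```

With the same residuals, the wide-range set clears the 0.9 guideline while the narrow-range set falls below it, which is why r alone should be read together with the variance of the activities.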

Guaianolides have also been intensively examined against protozoa such as members of the genera Trypanosoma (sleeping sickness), Leishmania (leishmaniasis), and Plasmodium (malaria); only studies that also tested cytotoxic activity are considered here. In all cases, antiprotozoal activity correlates positively with cytotoxicity, and the major determinants of activity are α,β-unsaturated carbonyl residues. Certain compounds are considerably more toxic to protozoa than to mammalian cells, and vice versa. A comparative QSAR analysis has been undertaken, and both activities were found to depend mainly on the same structural elements and molecular properties. The observed variance in the biological data might be explained by the positioning of the various molecules in the active site [63-65]. [Pg.3093]

If the aim of a 3D-QSAR analysis is to quantitate the features of ligand-receptor recognition, the biological data should reflect the affinity of the ligand for its receptor without being complicated by distribution to the site or by metabolism. Because QSAR models are based on the differences in bioactivities of the compounds, there must be adequate variance in bioactivity within the data set. Additionally, the spread in bioactivity should exceed the precision of the bioactivity measurements. [Pg.186]
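One way to make "adequate variance in bioactivity" concrete, sketched here under stated assumptions, is to compare the spread of activities across the data set with the experimental error estimated from replicate measurements. The function name and the replicate layout are hypothetical, and the source implies no particular cut-off for the ratio.

```python
import statistics

def activity_spread_ratio(activities, replicate_sets):
    """Ratio of between-compound SD to pooled replicate (experimental) SD.

    activities:     one (mean) activity value per compound, e.g. pIC50
    replicate_sets: lists of repeated measurements used to estimate assay error
    """
    spread = statistics.stdev(activities)
    # Pool the within-set sums of squares and degrees of freedom
    ss, df = 0.0, 0
    for reps in replicate_sets:
        m = statistics.mean(reps)
        ss += sum((v - m) ** 2 for v in reps)
        df += len(reps) - 1
    pooled_sd = (ss / df) ** 0.5
    return spread / pooled_sd
```

For hypothetical activities [4, 5, 6, 7, 8] and duplicate measurements [[5.0, 5.2], [6.0, 5.8]], the ratio is about 11, meaning the differences between compounds are roughly an order of magnitude larger than the assay noise. A ratio near 1 would leave a model with little genuine signal to explain.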

Statistics in general is a discipline dealing with the description of data, the implications of data (their relation to general pharmacological models), and questions such as which effects are real and which effects are different. Biological systems are variable. Moreover, they are often living. What this means is that they are collections of biochemical reactions going on in synchrony. Such systems will have an intrinsic variation in their output due to the variances in the... [Pg.225]

Fig. 37.3. Principal components biplot showing the positions of 6 substituted oxazepine (O) and 6 substituted thiazepine (S) neuroleptics with respect to three physicochemical parameters and two biological activities [41,43]. The data are shown in Table 37.6. The thiazepine analogs are represented by means of filled symbols. The horizontal and vertical components represent 50 and 39%, respectively, of the variance in the data.
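The percentages in the caption are the fractions of total variance carried by each principal component, i.e., each eigenvalue of the covariance matrix divided by the matrix trace. This is a minimal stdlib sketch that assumes plain column-centring (the scaling actually used for the biplot is not stated here). It uses power iteration with deflation in place of a full eigen-solver, and the toy matrix below is hypothetical, not the data of Table 37.6.

```python
import math

def covariance(rows):
    """Covariance matrix of column-centred data (rows = objects, columns = variables).

    The divisor n is used; it cancels when fractions of variance are taken.
    """
    n, p = len(rows), len(rows[0])
    means = [sum(col) / n for col in zip(*rows)]
    x = [[v - m for v, m in zip(row, means)] for row in rows]
    return [[sum(r[i] * r[j] for r in x) / n for j in range(p)] for i in range(p)]

def leading_eigen(m, iterations=500):
    """Largest eigenvalue and eigenvector of a symmetric PSD matrix by power iteration.

    Assumes the start vector is not orthogonal to the leading eigenvector and
    that the top eigenvalue is not tied.
    """
    p = len(m)
    v = [1.0] * p
    for _ in range(iterations):
        w = [sum(m[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(c * c for c in w))
        if norm == 0.0:
            return 0.0, v
        v = [c / norm for c in w]
    lam = sum(v[i] * sum(m[i][j] * v[j] for j in range(p)) for i in range(p))
    return lam, v

def explained_variance(rows, n_components=2):
    """Fraction of total variance carried by each of the first principal components."""
    c = covariance(rows)
    p = len(c)
    total = sum(c[i][i] for i in range(p))  # total variance = trace
    fractions = []
    for _ in range(n_components):
        lam, v = leading_eigen(c)
        fractions.append(lam / total)
        # Deflate: subtract the component just found before searching for the next
        c = [[c[i][j] - lam * v[i] * v[j] for j in range(p)] for i in range(p)]
    return fractions
```

For the four hypothetical objects [[1, 0], [-1, 0], [0, 0.5], [0, -0.5]], the two components carry 80% and 20% of the variance. In practice a library eigen-solver or SVD would replace the hand-written power iteration.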

On the other hand, factor analysis (FA) involves other manipulations of the eigenvectors and aims to gain insight into the structure of a multidimensional data set. The use of this technique was first proposed for biological structure-activity relationships (i.e., SAR) and illustrated with an analysis of the activities of 21 diphenylaminopropanol derivatives in 11 biological tests [116-119, 289]. The method has more commonly been used to determine the intrinsic dimensionality of certain experimentally determined chemical properties, i.e., the number of fundamental factors required to account for the variance. One of the best FA techniques is Q-mode FA, which groups a multivariate data set according to the data structure defined by the similarity between samples [1, 313-316]. It is devoted exclusively to the interpretation of inter-object relationships in a data set, rather than to the inter-variable (or covariance) relationships explored with R-mode factor analysis. The similarity measure used is the cosine theta matrix, i.e., the matrix whose elements are the cosines of the angles between all sample pairs [1, 313-316]. [Pg.269]
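The cosine theta matrix described above is straightforward to build: each element is the dot product of two sample vectors divided by the product of their lengths. A minimal sketch with a hypothetical function name follows. It covers only the similarity matrix; the subsequent Q-mode factor extraction (e.g., an eigen-decomposition of this matrix) is not shown.

```python
import math

def cosine_theta_matrix(samples):
    """Matrix of cosines of the angles between all pairs of sample vectors.

    samples: list of equal-length numeric vectors (one per object); none may be all zeros.
    """
    norms = [math.sqrt(sum(x * x for x in s)) for s in samples]
    return [
        [sum(a * b for a, b in zip(si, sj)) / (ni * nj) for sj, nj in zip(samples, norms)]
        for si, ni in zip(samples, norms)
    ]
```

Because the cosine ignores vector length, two samples with proportional compositions (e.g., one twice as concentrated as the other) have a similarity of 1. That is why the measure suits the inter-object grouping that Q-mode analysis aims at.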

Normalization of cDNA microarray data is a very important step in data analysis. With current technology, systematic bias is unavoidable and must be dealt with in a sensible manner. Furthermore, normalization methods need to be applied consistently to all raw data: using different normalization methods on different datasets may introduce bias and thereby decrease the validity of the data. Normalized data should be free of systematic bias and should thereby provide a truer representation of the biological variance. Normalized data also increase the validity of slide-to-slide comparisons. [Pg.399]
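The passage stresses consistent normalization but does not name a method. This minimal sketch uses one simple option, global median centring of log2 ratios (assumed here; intensity-dependent methods such as loess are frequent alternatives). The median log ratio of each array is shifted to zero, on the assumption that most genes are not differentially expressed. The inputs are hypothetical background-corrected red and green channel intensities, and the function name is illustrative.

```python
import math
import statistics

def median_normalize(red, green):
    """Global median normalization of one two-channel array.

    red, green: background-corrected channel intensities (must be > 0).
    Returns log2(red / green) ratios shifted so that their median is 0.
    """
    log_ratios = [math.log2(r / g) for r, g in zip(red, green)]
    centre = statistics.median(log_ratios)  # estimate of the array-wide systematic bias
    return [m - centre for m in log_ratios]
```

For example, `median_normalize([200, 400, 800], [100, 100, 100])` returns `[-1.0, 0.0, 1.0]`. To honour the consistency requirement, the same function would be applied to every array in the study rather than mixing methods between datasets.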

Current practice in microarray experimentation suggests that a balanced design with adequate replication be used. Good experimental design and execution will produce data that minimize technical variance, allowing the statistical analyses to evaluate biological variance more effectively. Still, the nature of the data requires that an estimate of the false discovery rate (FDR) be included in the statistical analysis. This enables the researcher to assess the reliability and validity of the results of the statistical analysis. As discussed earlier, cDNA microarray... [Pg.400]
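The passage does not say how the FDR should be estimated. One common choice, assumed here, is the Benjamini-Hochberg step-up procedure, which converts per-gene p-values into adjusted p-values. Genes whose adjusted value is at or below the chosen FDR level (e.g., 0.05) are declared significant. The function name is hypothetical.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order.

    Each sorted p-value p_(i) is scaled by m / i, then a running minimum is
    taken from the largest p-value downwards so the adjusted values are monotone.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        q = pvalues[idx] * m / rank
        running_min = min(running_min, q)
        adjusted[idx] = running_min
    return adjusted
```

For hypothetical p-values [0.01, 0.04, 0.03, 0.5], the adjusted values are about [0.04, 0.053, 0.053, 0.5], so only the first gene passes at an FDR of 0.05 even though three raw p-values are below 0.05.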

The drawback of these approaches is their failure to compartmentalize sources of variation. Thus, in studies where both biological and technical replicates exist, the normalization method may inappropriately remove biological variance, such as the treatment effect itself. This can lead to a reduction in the overall treatment-related response. Other sources of variation include replicate effects, where the timing of hybridization and scanning may cause persistent trends in the data. [Pg.539]
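Compartmentalizing sources of variation means estimating them separately. As an illustration (not the method of the cited text), a balanced nested design with several technical replicates per biological replicate allows a classic one-way random-effects ANOVA. That ANOVA splits the variance of one gene's log ratios into a technical component (within biological replicates) and a biological component (between them). Names and data are hypothetical.

```python
import statistics

def variance_components(groups):
    """Biological and technical variance components from a balanced nested design.

    groups: one list per biological replicate, each holding the same number of
            technical-replicate measurements (e.g. log ratios for one gene).
    Returns (biological_variance, technical_variance).
    """
    a, n = len(groups), len(groups[0])
    grand = statistics.mean([v for g in groups for v in g])
    means = [statistics.mean(g) for g in groups]
    # Mean squares of a one-way random-effects ANOVA
    ms_between = n * sum((m - grand) ** 2 for m in means) / (a - 1)
    ms_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g) / (a * (n - 1))
    technical = ms_within
    # Expected MS_between = technical + n * biological; truncate negative estimates at 0
    biological = max(0.0, (ms_between - ms_within) / n)
    return biological, technical
```

For three hypothetical biological replicates measured in duplicate, [[1.0, 1.2], [2.0, 2.2], [3.0, 3.2]], the technical variance is 0.02 and the biological variance 0.99. A normalization step that treated all of that spread as noise would remove the biological signal the experiment was designed to detect.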
