Big Chemical Encyclopedia


Statistical methods mean square error

Additional examination of the model's fit is performed through the comparison of the experimental and predicted bioactivities and is needed to statistically ensure that the models are sound. The chi (χ) and root mean squared error (RMSE) methods are performed to determine whether the model possesses the predictive quality reflected in the R2. The RMSE shows the error between the mean of the experimental values and the predicted activities. The chi value exhibits the difference between the experimental and predicted bioactivities ... [Pg.186]
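The two diagnostics above can be sketched roughly as follows. The bioactivity data are hypothetical, and the scaling convention in `chi` is an assumption, since the excerpt does not give the exact formula used in the cited work:

```python
import math

def rmse(observed, predicted):
    """Root mean squared error between experimental and predicted activities."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)

def chi(observed, predicted):
    """One common chi-type statistic: squared residuals scaled by the
    predictions. The exact convention in the cited work may differ."""
    return sum((o - p) ** 2 / abs(p) for o, p in zip(observed, predicted))

# hypothetical experimental vs. model-predicted bioactivities
obs = [5.2, 6.1, 4.8, 7.0]
pred = [5.0, 6.3, 4.9, 6.8]
print(round(rmse(obs, pred), 4))  # → 0.1803
```

A small RMSE relative to the spread of the observed activities is what "predictive quality reflected in the R2" amounts to in practice.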

The residuals may also be obtained from cross-validation. With the methods described in this book, both the X part and the y part of Equation (7.7) are modeled and hence have associated residuals. The residual ey of y is usually summarized in a number of regression statistics, such as the percentage of variance explained, the coefficient of determination, the root mean squared error of prediction, etc. Diagnostics based on y-residuals are well covered in the standard regression literature [Atkinson 1985, Beebe et al. 1998, Cook 1996, Cook & Weisberg 1980, Cook & Weisberg 1982, Martens & Næs 1989, Weisberg 1985]. [Pg.170]

The experimental data were processed with a commonly used variational statistics method [14] and by the KINS program given in [15]. The experimental data are presented in the form of arithmetic means with an indication of the mean square errors of the arithmetic means (M ± m). [Pg.243]

There are three statistics often employed for comparing the performances of multivariate calibration models root mean squared error of calibration (RMSEC), root mean squared error of cross validation (RMSECV), and root mean squared error of prediction (RMSEP). All three methods are based on the calculated root mean squared error (RMSE)... [Pg.221]
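Of the three statistics, RMSECV is the least obvious to compute. A minimal leave-one-out sketch for a univariate linear calibration is shown below; the calibration data are hypothetical, and a real multivariate model would substitute its own fit-and-predict step inside the loop:

```python
import math

def fit_line(x, y):
    """Ordinary least squares fit y = a + b*x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / \
        sum((xi - mx) ** 2 for xi in x)
    return my - b * mx, b

def rmsecv(x, y):
    """Leave-one-out cross-validation RMSE: each sample is predicted by a
    model calibrated on all the other samples."""
    sq_errs = []
    for i in range(len(x)):
        a, b = fit_line(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        sq_errs.append((y[i] - (a + b * x[i])) ** 2)
    return math.sqrt(sum(sq_errs) / len(sq_errs))

# hypothetical calibration data (reference value vs. instrument response)
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [1.1, 1.9, 3.2, 3.9, 5.1]
print(round(rmsecv(x, y), 3))  # → 0.182
```

RMSEC uses the same RMSE formula on the calibration residuals themselves, and RMSEP uses it on an independently held-out prediction set.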

In summary, the support vector machine (SVM) and partial least squares (PLS) methods were used to develop quantitative structure-activity relationship (QSAR) models to predict the inhibitory activity of nonpeptide HIV-1 protease inhibitors. A genetic algorithm (GA) was employed to select the variables that lead to the best-fitted models. A comparison of the results obtained using SVM with those of PLS revealed that the SVM model is much better than that of PLS. The root mean square errors of the training set and the test set for the SVM model were calculated to be 0.2027 and 0.2751, and the coefficients of determination (R2) are 0.9800 and 0.9355, respectively. Furthermore, the statistical parameter obtained from the leave-one-out cross-validation test (Q²) on the SVM model was 0.9672, which proves the reliability of this model. Omar Deeb thanks Al-Quds University for financial support. [Pg.79]

One asterisk indicates significance at the 95% level, two asterisks at the 99% level. NS, not significant at the 95% level. Calculated by dividing the mean square of the line by the mean square for error; in this case, deviations from the double regression are used as an estimate of error. Significance determined from tables; cf., e.g., G. W. Snedecor, Statistical Methods, 4th Edn., Iowa State College Press, Ames, 1946. [Pg.260]

In some cases when estimates of the pure-error mean square are unavailable owing to lack of replicated data, more approximate methods of testing lack of fit may be used. Here, quadratic terms would be added to the models of Eqs. (32) and (33), the complete model would be fitted to the data, and a residual mean square calculated. Assuming this quadratic model will adequately fit the data (lack of fit unimportant), this quadratic residual mean square may be used in Eq. (68) in place of the pure-error mean square. The lack-of-fit mean square in this equation would be the difference between the linear residual mean square [i.e., using Eqs. (32) and (33)] and the quadratic residual mean square. A model should be rejected only if the ratio is very much greater than the F statistic, however, since these two mean squares are no longer independent. [Pg.135]
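The approximate test described above reduces to a simple ratio once the residual sums of squares are in hand. In this sketch the sums of squares and degrees of freedom are illustrative numbers, not values from the text:

```python
def lack_of_fit_F(ss_linear, df_linear, ss_quadratic, df_quadratic):
    """Approximate lack-of-fit F ratio using the quadratic residual mean
    square in place of an (unavailable) pure-error mean square."""
    # lack-of-fit mean square: difference of the two residual SS over the
    # difference in their degrees of freedom
    ms_lof = (ss_linear - ss_quadratic) / (df_linear - df_quadratic)
    ms_quad = ss_quadratic / df_quadratic
    return ms_lof / ms_quad

F = lack_of_fit_F(ss_linear=12.0, df_linear=10,
                  ss_quadratic=4.0, df_quadratic=8)
print(F)  # → 8.0
```

As the text cautions, because the two mean squares are not independent, the linear model should be rejected only if this ratio greatly exceeds the tabulated F statistic.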

Thus, when a property of the sample (which exists as a large volume of material) is to be measured, there usually will be differences between the analytical data derived from application of the test methods to a gross lot or gross consignment and the data from the sample lot. This difference (the sampling error) has a frequency distribution with a mean value and a variance. Variance is a statistical term defined as the mean square of errors; the square root of the variance is more generally known as the standard deviation or the standard error of sampling. [Pg.167]
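The mean, variance, and standard error of sampling can be computed directly from a set of sampling errors. The error values below are hypothetical:

```python
import math

# hypothetical sampling errors: gross-lot value minus each sample-lot value
errors = [0.2, -0.1, 0.3, -0.2, 0.1, -0.3]

mean_error = sum(errors) / len(errors)
# sample variance: mean square of the deviations from the mean error
variance = sum((e - mean_error) ** 2 for e in errors) / (len(errors) - 1)
standard_error = math.sqrt(variance)  # standard error of sampling
print(round(standard_error, 4))  # → 0.2366
```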

A quarter of a century ago, Behnken [224] as well as Tidwell and Mortimer [225] pointed out that linearization transforms the error structure in the observed copolymer composition, with the result that such errors after transformation no longer have zero mean and constant variance. This means that the transformed variables do not meet the requirements of the least-squares procedure. The only statistically accurate means of estimating the reactivity ratios from the experimental data is based on the non-linear least-squares procedure. An effective computing program for this purpose has been published by Tidwell and Mortimer (TM) [225]. Their method is a modification of the curve-fitting procedure in which the sum of the squares of the differences between the observed and computed polymer compositions is minimized. [Pg.60]
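The idea of minimizing the squared composition residuals directly can be sketched with the Mayo-Lewis copolymer composition equation. Here the "observed" data are synthetic, and a crude grid search stands in for the Tidwell-Mortimer nonlinear least-squares routine:

```python
def copolymer_F1(f1, r1, r2):
    """Mayo-Lewis instantaneous copolymer composition F1 for monomer feed
    fraction f1 and reactivity ratios r1, r2."""
    f2 = 1.0 - f1
    return (r1 * f1 * f1 + f1 * f2) / (r1 * f1 * f1 + 2 * f1 * f2 + r2 * f2 * f2)

# synthetic "observed" compositions generated from known ratios r1=2.0, r2=0.5
feeds = [0.2, 0.4, 0.6, 0.8]
observed = [copolymer_F1(f, 2.0, 0.5) for f in feeds]

# minimize the sum of squared composition residuals over a coarse (r1, r2)
# grid; a real analysis would use a proper nonlinear least-squares solver
best = None
for i in range(1, 61):
    for j in range(1, 61):
        r1, r2 = i * 0.05, j * 0.05
        sse = sum((o - copolymer_F1(f, r1, r2)) ** 2
                  for f, o in zip(feeds, observed))
        if best is None or sse < best[0]:
            best = (sse, r1, r2)
print(best[1], best[2])  # recovers the generating ratios
```

Because the residuals are formed in the original composition variable, not a linearized one, the least-squares assumptions the text discusses are not violated by the transformation.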

Table IV shows the overall analysis of variance (ANOVA) and lists some miscellaneous statistics. The ANOVA table breaks down the total sum of squares for the response variable into the portion attributable to the model, Equation 3, and the portion the model does not account for, which is attributed to error. The mean square for error is an estimate of the variance of the residuals (the differences between observed values of suspensibility and those predicted by the empirical equation). The F-value provides a method for testing how well the model as a whole, after adjusting for the mean, accounts for the variation in suspensibility. A small value for the significance probability, labelled PR > F and equal to 0.0006 in this case, indicates that the correlation is significant. The R2 (coefficient of determination) value of 0.9055 indicates that Equation 3 accounts for 91% of the experimental variation in suspensibility. The coefficient of variation (C.V.) is a measure of the amount of variation in suspensibility. It is equal to the standard deviation of the response variable (STD DEV) expressed as a percentage of the mean of the response variable (SUSP MEAN). Since the coefficient of variation is unitless, it is often preferred for estimating the goodness of fit.
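The ANOVA decomposition and the derived R2 and C.V. statistics can be sketched as follows. The observed and predicted suspensibility values and the residual degrees of freedom here are hypothetical, not the Table IV data:

```python
import math

# hypothetical observed vs. model-predicted suspensibility values
obs = [80.0, 85.0, 90.0, 95.0]
pred = [81.0, 84.0, 91.0, 94.0]
df_error = 2  # assumed residual degrees of freedom

mean_y = sum(obs) / len(obs)
ss_total = sum((y - mean_y) ** 2 for y in obs)           # total SS about the mean
ss_error = sum((y - p) ** 2 for y, p in zip(obs, pred))  # residual (error) SS
r2 = 1 - ss_error / ss_total                   # fraction of variation explained
std_dev = math.sqrt(ss_error / df_error)       # root of the error mean square
cv = 100 * std_dev / mean_y                    # coefficient of variation, %
print(round(r2, 3), round(cv, 2))  # → 0.968 1.62
```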
The quantities AUMC and AUSC can be regarded as the first and second statistical moments of the plasma concentration curve. These two moments have an equivalent in descriptive statistics, where they define the mean and variance, respectively, in the case of a stochastic distribution of frequencies (Section 3.2). From the above considerations it appears that the statistical moment method strongly depends on numerical integration of the plasma concentration curve Cp(t) and of its products with t and (t − MRT)². Multiplication by t and (t − MRT)² tends to amplify the errors in the plasma concentration Cp(t) at larger values of t. As a consequence, the estimation of the statistical moments critically depends on the precision of the measurement process that is used in the determination of the plasma concentration values. This contrasts with compartmental analysis, where the parameters of the model are estimated by means of least squares regression. [Pg.498]
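A minimal sketch of the moment calculation, using the trapezoidal rule on hypothetical plasma concentration data (truncated at the last sampling time, with no extrapolation to infinity):

```python
def trapezoid(t, y):
    """Numerical integration of y(t) by the trapezoidal rule."""
    return sum((t[i + 1] - t[i]) * (y[i] + y[i + 1]) / 2
               for i in range(len(t) - 1))

# hypothetical plasma concentration curve Cp(t)
t = [0.0, 1.0, 2.0, 4.0, 8.0]      # sampling times, h
cp = [10.0, 6.1, 3.7, 1.4, 0.2]    # concentrations, mg/L

auc = trapezoid(t, cp)                                     # zeroth moment
aumc = trapezoid(t, [ti * ci for ti, ci in zip(t, cp)])    # first moment
mrt = aumc / auc                                           # mean residence time
print(round(mrt, 2))  # → 1.75
```

The multiplication by t in the AUMC integrand is exactly the error-amplifying step the text warns about: late, low, imprecise concentrations carry large weights.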

Statistical Analysis. Analysis of variance (ANOVA) of toxicity data was conducted using SAS/STAT software (version 8.2; SAS Institute, Cary, NC). All toxicity data were transformed (square root, log, or rank) before ANOVA. Comparisons among multiple treatment means were made by Fisher's LSD procedure, and differences between individual treatments and controls were determined by one-tailed Dunnett's or Wilcoxon tests. Statements of statistical significance refer to a probability of type I error of 5% or less (p ≤ 0.05). Median lethal concentrations (LC50) were determined by the Trimmed Spearman-Karber method using TOXSTAT software (version 3.5; Lincoln Software Associates, Bisbee, AZ). [Pg.96]

Given two estimates of a statistic, one from a sample of size n and the other from a sample of size 2n, one might expect that the estimate from the larger sample would be more reliable than that from the smaller sample. This is, in fact, supported by statistical theory. If the variance in the population is σ², then the variance of the sample mean for samples of size n is σ²/n. The square root of this is the standard error of the mean. Consistent with the variance of the sample mean being 1/n times that of a single determination (σ²), the standard deviation and the CV% of the sample mean are reduced by the square root of n. As a direct consequence, an assay method that relies on the mean of two independent concentration determinations has a CV 1/√2 that of the same method based on a single determination. This provides an easy way to increase the precision (reduce variability) of a method. An example of this is found in radioimmunoassay, in which it is common for a concentration estimate to be calculated from the mean response of two determinations of a specimen. [Pg.3484]
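The √n reduction is easy to verify numerically. The single-determination standard deviation below is a hypothetical value:

```python
import math

sigma = 4.0  # standard deviation of a single determination (hypothetical)
for n in (1, 2, 4):
    se = sigma / math.sqrt(n)  # standard error of the mean of n determinations
    print(n, round(se, 3))

# duplicate determinations (n = 2) cut the standard error, and hence the CV,
# by a factor of 1/sqrt(2) ≈ 0.707 relative to a single determination
```

This is the quantitative basis for the radioimmunoassay practice of averaging two determinations per specimen.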


See other pages where Statistical methods mean square error is mentioned: [Pg.104]    [Pg.476]    [Pg.185]    [Pg.290]    [Pg.59]    [Pg.705]    [Pg.88]    [Pg.335]    [Pg.3]    [Pg.142]    [Pg.204]    [Pg.397]    [Pg.142]    [Pg.275]    [Pg.93]    [Pg.3496]    [Pg.562]    [Pg.544]    [Pg.215]    [Pg.381]    [Pg.4]    [Pg.296]    [Pg.348]    [Pg.181]    [Pg.426]    [Pg.383]    [Pg.158]    [Pg.202]    [Pg.217]    [Pg.366]    [Pg.18]    [Pg.309]    [Pg.255]    [Pg.383]    [Pg.40]   
See also in source #XX -- [Pg.185]



