Variance unexplained

Fisher statistic (F). The Fisher statistic (F) is the ratio between explained and unexplained variance for a given number of degrees of freedom. The larger the F value, the greater the probability that the QSAR equation is significant. The F values obtained for these QSAR models range from 17.622 to 283.714, which are statistically significant at the 95% level. [Pg.69]
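The ratio described above can be sketched in a few lines of Python. This is a minimal illustration, not the procedure of the cited work: the function name `f_statistic`, the argument `n_params` (number of descriptors in the model), and the sample data are assumptions, and the explained sum of squares is taken as total minus residual, which holds for a least-squares fit with an intercept.

```python
def f_statistic(y_obs, y_calc, n_params):
    """F = (explained SS / k) / (unexplained SS / (n - k - 1)).

    Assumes a least-squares model with an intercept and k = n_params
    descriptors, so that explained SS = total SS - residual SS.
    """
    n = len(y_obs)
    mean_y = sum(y_obs) / n
    ss_total = sum((y - mean_y) ** 2 for y in y_obs)
    ss_resid = sum((yo - yc) ** 2 for yo, yc in zip(y_obs, y_calc))
    ss_expl = ss_total - ss_resid
    df_model = n_params            # degrees of freedom of the model
    df_resid = n - n_params - 1    # degrees of freedom of the residuals
    return (ss_expl / df_model) / (ss_resid / df_resid)


# Hypothetical observed and calculated activities for five compounds.
y_obs = [1.0, 2.0, 3.0, 4.0, 5.0]
y_calc = [1.2, 1.9, 3.0, 4.1, 4.8]
print(f_statistic(y_obs, y_calc, n_params=1))
```

Judging significance at a given level still requires comparing the result with the tabulated F value for the same degrees of freedom.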

This measure can be related to the sum of squared elements of the columns of X to obtain a proportion of unexplained variance for each variable. Subtraction from 1 results in a measure Qj of explained variance for each variable using a PCs... [Pg.91]

In Equation 4.1, C is the n × k matrix of pure chromatograms (k independently varying components), P is the m × k matrix of pure-component spectra, and the matrix e contains unexplained variance, e.g., measurement error. Figure 4.1 shows an example of such a data matrix having two overlapped peaks. [Pg.71]
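Equation 4.1 is not reproduced in this excerpt. Given the stated dimensions, a plausible reading is the bilinear model X = C Pᵀ + e, with X an n × m data matrix. The sketch below assumes that form; the function name `bilinear_model` and the toy numbers are illustrative only.

```python
def bilinear_model(C, P, E=None):
    """Return X = C P^T + E as nested lists (n rows, m columns).

    C: n x k pure chromatograms, P: m x k pure spectra,
    E: optional n x m matrix of unexplained variance (noise).
    """
    n, k, m = len(C), len(C[0]), len(P)
    X = [[sum(C[i][l] * P[j][l] for l in range(k)) for j in range(m)]
         for i in range(n)]
    if E is not None:
        X = [[X[i][j] + E[i][j] for j in range(m)] for i in range(n)]
    return X


# Two overlapping components (k = 2), three time points, two wavelengths.
C = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]
P = [[1.0, 2.0], [3.0, 4.0]]
print(bilinear_model(C, P))
```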

The correlation coefficient r, the total variance SS, the unexplained variance SSQ, and the standard deviation are defined as follows ... [Pg.10]

The statistically focused methods for defining ADs relate to the information content of the investigated descriptors, for example the variance of the descriptor matrix. They calculate the amount of unexplained variance for the training set objects (the model) and compare it with the corresponding amount for the new objects to be predicted. If the amount of unexplained variance for the new objects is much greater, typically more than two standard deviations from the training set compounds (95% confidence interval), the new objects are designated to be outside the AD of the model. [Pg.397]
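The two-standard-deviation rule above can be sketched as follows. This is an illustration under stated assumptions: each object is summarised by a single unexplained-variance value, the cutoff is the training mean plus `n_sd` sample standard deviations, and the function name `outside_domain` and the numbers are hypothetical.

```python
import statistics


def outside_domain(train_resid_var, new_resid_var, n_sd=2.0):
    """Flag new objects whose unexplained variance exceeds
    mean + n_sd * SD of the training-set values."""
    mean_train = statistics.mean(train_resid_var)
    sd_train = statistics.stdev(train_resid_var)  # sample standard deviation
    cutoff = mean_train + n_sd * sd_train
    return [value > cutoff for value in new_resid_var]


# Hypothetical unexplained-variance values per object.
train = [0.10, 0.12, 0.08, 0.11, 0.09]
new = [0.11, 0.20]
print(outside_domain(train, new))
```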

Relevance of descriptor variables. The perpendicular distance from a data point O_i to the principal component p_j is given by the residual d_ij, see Fig. 15.15. The sum of squared deviations with respect to p_j over all data points is Σ d_ij² (i = 1 to N). This sum of squares is the variation not accounted for by p_j. After correction for the number of degrees of freedom this can be expressed as "unexplained variance". [Pg.366]

The sum of squares Σ d_ij² over all objects and all components is the residual sum of squares not accounted for by the model. This can be partitioned into components, SC_k, which show how much the "unexplained variance" for each descriptor, "k", contributes to the total sum of squares, see Fig. 15.16. It is seen in Fig. 15.16 that... [Pg.367]

This partitioning of "unexplained variance" offers a means of determining the relevance of each descriptor in the principal components model. Descriptor... [Pg.367]
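The partitioning described above can be illustrated with a small sketch. It assumes the residual matrix of the principal components model is already available (one row per object, one column per descriptor); the function name `residual_share_per_descriptor` and the residual values are hypothetical, and no correction for degrees of freedom is applied.

```python
def residual_share_per_descriptor(residuals):
    """Split the residual sum of squares by descriptor (column).

    Returns (ss_per_descriptor, share_of_total) for a residual
    matrix given as a list of rows.
    """
    n_desc = len(residuals[0])
    ss_k = [sum(row[k] ** 2 for row in residuals) for k in range(n_desc)]
    total = sum(ss_k)
    return ss_k, [s / total for s in ss_k]


# Hypothetical residuals for three objects and two descriptors.
E = [[0.1, -0.2], [0.3, 0.0], [-0.1, 0.2]]
ss_k, shares = residual_share_per_descriptor(E)
print(ss_k, shares)
```

A descriptor with a large share of the residual sum of squares is poorly described by the model, which is one way to read its relevance.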

The value of the indicator variable I corresponds to the biological activity contribution of a mesylamido group, based on the benzamido group as the reference substituent. While not too much information can be derived from this value, there is no other way to combine eqs. 58 and 59 into one equation. The large increase in the value of the correlation coefficient r (from 0.935 and 0.971, respectively, to 0.990) results from the fact that the overall variance of the data increases by combining both subsets, while the unexplained variance remains constant, as can be seen from a comparison of the standard deviations s of all equations (compare chapter 5.1). [Pg.55]

The correlation coefficient r (eq. 124) is a relative measure of the quality of fit of the model because its value depends on the overall variance of the dependent variable (this is illustrated by eqs. 58-60, chapter 3.8: while the correlation coefficients r of the two subsets are relatively small, the correlation coefficient derived from the combined set is much larger, due to the increase in the overall variance). The squared correlation coefficient r² is a measure of the explained variance, most often presented as a percentage value. The overall (total) variance is defined by eq. 125, the unexplained variance (SSQ = sum of squared errors = residual variance = variance not explained by the model) by eq. 126. [Pg.93]
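Eqs. 124 to 126 are not reproduced in this excerpt. The sketch below assumes the usual definitions: total variance as the sum of squared deviations from the mean, SSQ as the sum of squared residuals, r² = 1 − SSQ / total, and s = sqrt(SSQ / (n − k − 1)). The function name `fit_statistics` and all data are hypothetical. The data are built so that the residual scatter per compound is identical in both subsets.

```python
import math


def fit_statistics(y_obs, y_calc, k):
    """Return (total SS, SSQ, r squared, s) for a model with k variables."""
    n = len(y_obs)
    mean_y = sum(y_obs) / n
    ss_total = sum((y - mean_y) ** 2 for y in y_obs)
    ssq = sum((yo - yc) ** 2 for yo, yc in zip(y_obs, y_calc))
    r2 = 1.0 - ssq / ss_total
    s = math.sqrt(ssq / (n - k - 1))
    return ss_total, ssq, r2, s


# Two hypothetical subsets with the same residual scatter.
obs_a, calc_a = [1.0, 2.0, 3.0, 4.0], [1.2, 1.8, 3.2, 3.8]
obs_b, calc_b = [7.0, 8.0, 9.0, 10.0], [7.2, 7.8, 9.2, 9.8]

print(fit_statistics(obs_a, calc_a, k=1))
print(fit_statistics(obs_b, calc_b, k=1))
# Combined set: one extra variable (an indicator) is assumed, so k = 2.
print(fit_statistics(obs_a + obs_b, calc_a + calc_b, k=2))
```

Combining the subsets leaves SSQ per compound unchanged but raises the total variance sharply, so r² rises even though the fit is no better in absolute terms.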

The cross-validation coefficient is the complement to the fraction of unexplained variance over the total variance. [Pg.360]

These limitations can be overcome by constructing models to predict folding behavior and then quantifying their accuracy. For the latter step, the Pearson linear correlation coefficient can be used with x_i as the observed values and y_i as the predicted ones (for which we introduce the shorthand notations r_fit and r_cv, described below). Alternatively, one can calculate the root-mean-square error or the closely related fraction of unexplained variance ... [Pg.4]
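The two alternative measures can be sketched as below. The definitions are assumptions, since the excerpt is truncated before giving them: root-mean-square error as the square root of the mean squared difference, and fraction of unexplained variance as the residual sum of squares divided by the sum of squared deviations of the observed values from their mean. The function names and data are hypothetical.

```python
import math


def rmse(observed, predicted):
    """Root-mean-square error between observed and predicted values."""
    n = len(observed)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(observed, predicted)) / n)


def fraction_unexplained_variance(observed, predicted):
    """Residual sum of squares divided by total sum of squares."""
    mean_obs = sum(observed) / len(observed)
    ss_resid = sum((x - y) ** 2 for x, y in zip(observed, predicted))
    ss_total = sum((x - mean_obs) ** 2 for x in observed)
    return ss_resid / ss_total


observed = [1.0, 2.0, 3.0, 4.0, 5.0]
predicted = [1.2, 1.9, 3.0, 4.1, 4.8]
print(rmse(observed, predicted))
print(fraction_unexplained_variance(observed, predicted))
```

Under these definitions the two are linked: the fraction of unexplained variance equals the squared RMSE divided by the population variance of the observed values.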

In some papers, other criteria are used. For example, sometimes the standard error of prediction is used instead of (or together with) R². The standard error of prediction itself makes no sense until we compare it with the standard deviation of the activities of the test set, which brings us back to the correlation coefficients. If used, the mean absolute error (MAE) should be compared with the mean absolute deviation from the mean. Sometimes the F-ratio is calculated, which is the variance explained by the model divided by the unexplained variance. It is believed that the higher the F-ratio, the better the model. We suppose that when the F-ratio is used, it must always be accompanied by the corresponding p-value. [Pg.1319]
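The comparison of MAE with the mean absolute deviation from the mean can be sketched as a simple ratio. The function name `mae_to_mad_ratio` and the test-set values are hypothetical; a ratio well below 1 suggests the model predicts better than the mean of the test set does.

```python
def mae_to_mad_ratio(observed, predicted):
    """Return (MAE, mean absolute deviation from the mean, MAE / MAD)."""
    n = len(observed)
    mean_obs = sum(observed) / n
    mae = sum(abs(o - p) for o, p in zip(observed, predicted)) / n
    mad = sum(abs(o - mean_obs) for o in observed) / n
    return mae, mad, mae / mad


observed = [1.0, 2.0, 3.0, 4.0, 5.0]
predicted = [1.2, 1.9, 3.0, 4.1, 4.8]
print(mae_to_mad_ratio(observed, predicted))
```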

In cross-validation, an r²_PRESS value is defined like r² in regression and PLS analysis, using PRESS instead of the unexplained variance Σ(y_calc − y_obs)². Cross-validated values are always smaller than the r² values including all objects (r²_FIT). As long as only significant PLS vectors are... [Pg.454]
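A leave-one-out version of this comparison can be sketched with a one-variable least-squares line. This is an assumption-laden illustration: the source discusses regression and PLS in general, whereas the code uses simple linear regression, leave-one-out cross-validation, and the total sum of squares of all objects as the denominator in both cases. All names and data are hypothetical.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept for y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def r2_fit_and_press(xs, ys):
    """Return (fitted r squared, leave-one-out cross-validated r squared)."""
    n = len(ys)
    mean_y = sum(ys) / n
    ss_total = sum((y - mean_y) ** 2 for y in ys)

    # Unexplained variance of the model fitted to all objects.
    slope, intercept = fit_line(xs, ys)
    ssq = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))

    # PRESS: each object is predicted by a model fitted without it.
    press = 0.0
    for i in range(n):
        s, b = fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        press += (ys[i] - (s * xs[i] + b)) ** 2

    return 1.0 - ssq / ss_total, 1.0 - press / ss_total


xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [1.1, 1.9, 3.2, 3.9, 5.1, 5.8]
print(r2_fit_and_press(xs, ys))
```

For ordinary least squares each left-out residual is at least as large in magnitude as the fitted one, so PRESS cannot fall below the unexplained sum of squares and the cross-validated value cannot exceed the fitted one.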

An identical strategy was followed for assessing the effect of lead on physical outcome variables in the presence of control variables. After constructing least unexplained variance models for physical outcome measures with the control variables, lead variables were separately entered into the model. Table 5 shows that infant's weight at birth, chest circumference and trunk length were all affected by lead even after control variables were taken into account. Additionally, umbilical cord lead alone, absent in the bivariate correlations, now shows some association with trunk length. [Pg.392]

Big Chemical Encyclopedia
