Big Chemical Encyclopedia

Chemical substances, components, reactions, process design ...


Over-fitting

Quality factor or quality ratio (Q): The high values of Q (2.259-14.646) for these QSAR models suggest high predictive power as well as the absence of over-fitting. [Pg.69]

Figure 5-67 displays the results of the cross-validation computations for the corn data with PCR and PLS. The graph is fairly typical: PLS is consistently better at small numbers of factors, and predictions are very similar at the optimal number of factors ne, which is 10 for PLS and 12 for PCR. Experience has shown that it is dangerous to use an excessive number of factors (over-fitting) for the prediction of new unknown samples. This is why we selected ne = 12 rather than 23 for PCR. [Pg.310]
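The idea of choosing the number of factors by cross validation, and then preferring fewer factors than the raw error minimum, can be sketched as follows. This is a minimal illustration and not the computation behind Figure 5-67: the data are synthetic, the PLS1 routine is a plain NIPALS implementation, and the function names and the 5 % tolerance rule in `pick_factors` are assumptions introduced here.

```python
import random

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def pls1_fit(X, y, max_factors):
    """NIPALS PLS1 on mean-centered data; returns means and per-factor (w, p, q)."""
    n = len(X)
    x_mean = [sum(col) / n for col in zip(*X)]
    y_mean = sum(y) / n
    E = [[v - m for v, m in zip(row, x_mean)] for row in X]  # X residuals
    f = [v - y_mean for v in y]                               # y residuals
    factors = []
    for _ in range(max_factors):
        w = [dot(col, f) for col in zip(*E)]                  # weights, proportional to E'f
        norm = dot(w, w) ** 0.5
        if norm < 1e-12:                                      # nothing left to model
            break
        w = [v / norm for v in w]
        t = [dot(row, w) for row in E]                        # scores
        tt = dot(t, t)
        if tt < 1e-12:
            break
        p = [dot(col, t) / tt for col in zip(*E)]             # X loadings
        q = dot(f, t) / tt                                    # y loading
        E = [[e - ti * pj for e, pj in zip(row, p)] for row, ti in zip(E, t)]
        f = [fi - q * ti for fi, ti in zip(f, t)]
        factors.append((w, p, q))
    return x_mean, y_mean, factors

def predict_path(model, x):
    """Predictions for one sample using 1, 2, ... factors (the models are nested)."""
    x_mean, y_mean, factors = model
    e = [v - m for v, m in zip(x, x_mean)]
    y_hat, path = y_mean, []
    for w, p, q in factors:
        t = dot(e, w)
        y_hat += q * t
        e = [ei - t * pj for ei, pj in zip(e, p)]
        path.append(y_hat)
    return path

def rmsecv(X, y, max_factors, k=5):
    """Root mean squared error of k-fold cross validation for each factor count."""
    n = len(X)
    press = [0.0] * max_factors
    for fold in range(k):
        test = set(range(fold, n, k))                         # interleaved segments
        X_tr = [X[i] for i in range(n) if i not in test]
        y_tr = [y[i] for i in range(n) if i not in test]
        model = pls1_fit(X_tr, y_tr, max_factors)
        for i in test:
            path = predict_path(model, X[i])
            last = path[-1] if path else model[1]
            path += [last] * (max_factors - len(path))        # pad if fitting stopped early
            for a, pred in enumerate(path):
                press[a] += (y[i] - pred) ** 2
    return [(s / n) ** 0.5 for s in press]

def pick_factors(errors, tol=0.05):
    """Smallest factor count within tol of the minimum error (assumed parsimony rule)."""
    best = min(errors)
    return next(a + 1 for a, e in enumerate(errors) if e <= best * (1 + tol))

# Synthetic data (assumption): 8 variables driven by 2 latent factors plus noise.
random.seed(0)
X, y = [], []
for _ in range(30):
    t1, t2 = random.gauss(0, 1), random.gauss(0, 1)
    X.append([t1 * (j + 1) / 8 + t2 * (8 - j) / 8 + random.gauss(0, 0.05)
              for j in range(8)])
    y.append(2 * t1 - t2 + random.gauss(0, 0.1))

errors = rmsecv(X, y, max_factors=6)
n_e = pick_factors(errors)
print(n_e, [round(e, 3) for e in errors])
```

The tolerance rule is only one way to encode a preference for fewer factors; other selection rules could be substituted in `pick_factors` without changing the rest of the sketch.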

NB: This pilot study included only six concentration levels; with a five-component PLS model, over-fitting is a virtual certainty. Even a two-segment cross validation is no absolute guarantee [2], but it does supply the only possible validation basis with any credibility. However, the results in Figure 9.23 indicate and substantiate satisfactory possibilities for continuing to the type of extended calibration work needed in a full-fledged industrial calibration setting. [Pg.299]

To avoid over-fitting, a commonly used approach is to select a subset of descriptors to build models. GAs are widely used to select descriptors prior to using other statistical tools, such as MLR, to build models. Certainly, principal component analysis and PLS fitting are also widely used in reducing the dimensions of descriptors. Traditionally, stepwise linear regression is used to select certain descriptors to enter the regression equations. [Pg.120]
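Descriptor selection by stepwise regression can be sketched as a forward search. This is a hedged illustration only: the entry criterion used here (leave-one-out cross-validated error must drop by at least `min_gain`) is an assumption chosen to make the link to over-fitting explicit, whereas traditional stepwise regression typically uses significance tests. The data, function names, and the 5 % threshold are all hypothetical.

```python
import random

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("singular system")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def ols_fit(rows, y, cols):
    """Least-squares coefficients (intercept first) using only the chosen columns."""
    Z = [[1.0] + [row[c] for c in cols] for row in rows]
    k = len(cols) + 1
    A = [[sum(z[i] * z[j] for z in Z) for j in range(k)] for i in range(k)]
    b = [sum(z[i] * yi for z, yi in zip(Z, y)) for i in range(k)]
    return solve(A, b)

def loo_mse(rows, y, cols):
    """Leave-one-out cross-validated mean squared error for a descriptor subset."""
    press = 0.0
    for i in range(len(rows)):
        beta = ols_fit(rows[:i] + rows[i + 1:], y[:i] + y[i + 1:], cols)
        pred = beta[0] + sum(bc * rows[i][c] for bc, c in zip(beta[1:], cols))
        press += (y[i] - pred) ** 2
    return press / len(rows)

def forward_select(rows, y, min_gain=0.05):
    """Add one descriptor at a time while the cross-validated error keeps improving."""
    remaining = list(range(len(rows[0])))
    selected = []
    best = loo_mse(rows, y, [])                    # intercept-only baseline
    while remaining:
        score, cand = min((loo_mse(rows, y, selected + [c]), c) for c in remaining)
        if score > best * (1 - min_gain):          # gain too small: stop
            break
        selected.append(cand)
        remaining.remove(cand)
        best = score
    return selected, best

# Synthetic data (assumption): 6 descriptors, only columns 0 and 3 carry signal.
random.seed(1)
rows = [[random.gauss(0, 1) for _ in range(6)] for _ in range(40)]
y = [3 * r[0] - 2 * r[3] + random.gauss(0, 0.1) for r in rows]

selected, cv_mse = forward_select(rows, y)
print(selected, round(cv_mse, 4))
```

A genetic algorithm would explore descriptor subsets differently, but the same cross-validated score could serve as its fitness function.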

Many types of classifiers are based on linear discriminants of the form shown in (1). They differ with regard to how the weights are determined. The oldest form of linear discriminant is Fisher's linear discriminant. To compute the weights for the Fisher linear discriminant, one must estimate the correlation between all pairs of genes that were selected in the feature selection step. The study by Dudoit et al. indicated that Fisher's linear discriminant did not perform well unless the number of selected genes was small relative to the number of samples. The reason is that in other cases there are too many correlations to estimate and the method tends to be unstable and over-fit the data. [Pg.330]
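Why the number of features matters can be seen in a small sketch of the two-class Fisher discriminant. It assumes the standard textbook form, with weights obtained by solving S_w w = m1 - m0, where S_w is the pooled within-class scatter matrix; equation (1) of the source is not reproduced here, and the function names and toy data are hypothetical.

```python
def mean_vec(rows):
    return [sum(col) / len(rows) for col in zip(*rows)]

def within_scatter(classes):
    """Pooled within-class scatter matrix; it has p*(p+1)/2 distinct entries."""
    p = len(classes[0][0])
    S = [[0.0] * p for _ in range(p)]
    for rows in classes:
        m = mean_vec(rows)
        for row in rows:
            d = [v - mi for v, mi in zip(row, m)]
            for i in range(p):
                for j in range(p):
                    S[i][j] += d[i] * d[j]
    return S

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("singular scatter matrix: too few samples per feature")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def fisher_lda(class0, class1):
    """Weights w solving S_w w = m1 - m0, plus a midpoint decision threshold."""
    m0, m1 = mean_vec(class0), mean_vec(class1)
    w = solve(within_scatter([class0, class1]), [b - a for a, b in zip(m0, m1)])
    threshold = sum(wi * (a + b) / 2 for wi, a, b in zip(w, m0, m1))
    return w, threshold

def classify(w, threshold, x):
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) > threshold else 0

# Two features, four samples per class: the scatter matrix is well determined.
class0 = [[1, 2], [2, 1], [2, 3], [3, 2]]
class1 = [[5, 6], [6, 5], [6, 7], [7, 6]]
w, threshold = fisher_lda(class0, class1)
print(w, threshold)  # [1.0, 1.0] 8.0

# Three features but only two samples per class: the scatter matrix is singular.
try:
    fisher_lda([[0, 0, 0], [1, 0, 0]], [[0, 1, 0], [1, 1, 0]])
    singular = False
except ValueError:
    singular = True
print(singular)  # True
```

The second call fails outright because the scatter matrix cannot be inverted. With somewhat more samples the matrix becomes invertible but remains poorly estimated, which is the unstable regime the passage describes.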

N. M. Faber and R. Rajkó, How to avoid over-fitting in multivariate calibration: the conventional validation approach and an alternative. Anal. Chim. Acta, 595, 2007, 98-106. [Pg.238]

Fig. 1. Resonant molecular formation rate in μt + D2 collisions calculated for a 3 K target [7,8,9]. The rates are normalized to the liquid hydrogen density and averaged over fit hyperfine states. Also shown is the μt elastic scattering rate on the d nucleus [12]...

Kelly et al. [13], when describing their work on the geographical origin of rice, also addressed the issue of over-fitting of data, and proposed a stepwise approach, selecting the minimum number of variables (from the 52 measured) in order to maximize the separation whilst ensuring that statistical over-fitting was minimized. [Pg.129]

DT (decision tree). Advantages: does not make any assumption about the type of relationship between target property and molecular descriptors; models are easy to interpret; fast classification speed; multi-class classification. Disadvantages: may over-fit when the training set is small and the number of molecular descriptors is large; ranks molecular descriptors using information gain, which may not be the best for some problems... [Pg.231]

The desirable number of knots and degrees of polynomial pieces can be estimated using cross-validation. An initial value for s can be n/7 or / n) for n > 100 where n is the number of data points. Quadratic splines can be used for data without inflection points, while cubic splines provide a general approximation for most continuous data. To prevent over-fitting data with... [Pg.82]

In equation (3.15), A refers to a hydroxyl group at position 12, FO to a formyl ester, AC to an acetyl ester, PR to a propionyl ester, and the number to the position of substitution. The low value of x suggests that this relationship might over-fit the data because it is much lower than the standard deviation of replicate measurements. [Pg.74]

The distinction between physics-based and empirically-based models concerns confidence. Physics-based models are characterised by our confidence in the mathematical description of the system. Empirical models are characterised by a lack of confidence in our ability to extrapolate a pattern observed in the training set to a relevant test set. The literature abounds with examples of over-fitted models, over-optimistic assessments of model quality, inappropriate use of statistical testing and models that are no more significant than random chance. Even so, the promise of empirical models, such as QSAR models, to guide compound design and testing means this is an increasingly important area of research and justifies continued effort and focus. [Pg.243]







© 2024 chempedia.info