Big Chemical Encyclopedia


Over-fitting data

The desirable number of knots and the degrees of the polynomial pieces can be estimated using cross-validation. An initial value for s can be n/7, or √n for n > 100, where n is the number of data points. Quadratic splines can be used for data without inflection points, while cubic splines provide a general approximation for most continuous data. To prevent over-fitting data with... [Pg.82]
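Choosing the number of knots by cross-validation can be sketched as follows. This is a minimal illustration with NumPy using a truncated-power cubic-spline basis and K-fold splits; the basis construction, quantile-based knot placement, and the specific knot counts compared are our assumptions, not a procedure given in the text.

```python
import numpy as np

def spline_design(x, knots):
    """Truncated-power-basis design matrix for a cubic spline."""
    cols = [np.ones_like(x), x, x**2, x**3]
    cols += [np.clip(x - t, 0, None)**3 for t in knots]
    return np.stack(cols, axis=1)

def cv_score(x, y, n_knots, n_folds=5, seed=0):
    """Mean squared cross-validation error for a given interior-knot count."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(x))
    # Place interior knots at equally spaced quantiles of x
    knots = np.quantile(x, np.linspace(0, 1, n_knots + 2)[1:-1])
    errs = []
    for fold in np.array_split(idx, n_folds):
        train = np.setdiff1d(idx, fold)
        A = spline_design(x[train], knots)
        coef, *_ = np.linalg.lstsq(A, y[train], rcond=None)
        pred = spline_design(x[fold], knots) @ coef
        errs.append(np.mean((y[fold] - pred) ** 2))
    return float(np.mean(errs))

# Noisy sine data: compare a few candidate knot counts and keep the
# one with the lowest cross-validation error.
rng = np.random.default_rng(0)
x = np.sort(rng.uniform(0, 10, 120))
y = np.sin(x) + rng.normal(0, 0.2, x.size)
scores = {k: cv_score(x, y, k) for k in (2, 5, 20)}
best = min(scores, key=scores.get)
```

In practice one would scan a fuller grid of knot counts; too many knots drives the training error down while the cross-validation error stalls or rises, which is the over-fitting this section warns against.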

The Antoine equation does not fit data accurately much above the normal boiling point. Thus, as regression by computer is now standard, more accurate expressions applicable up to the critical point have come into use. The entire DIPPR Compilation is regressed with the modified Riedel equation (2-28), with constants available for over 1500 compounds. [Pg.389]
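The modified Riedel form used in the DIPPR compilation (its equation 101) is ln P = A + B/T + C ln T + D·T^E. A minimal evaluation sketch, using DIPPR-style constants for water quoted from memory (treat them as illustrative and verify against the compilation before relying on them):

```python
import math

def riedel_vapor_pressure(T, A, B, C, D, E):
    """Modified Riedel (DIPPR eq. 101): ln P = A + B/T + C*ln(T) + D*T**E."""
    return math.exp(A + B / T + C * math.log(T) + D * T**E)

# Illustrative constants for water (P in Pa, T in K); quoted from memory,
# not taken from the text above.
P_boil = riedel_vapor_pressure(373.15, 73.649, -7258.2, -7.3037, 4.1653e-6, 2.0)
# Near the normal boiling point this should recover roughly 1 atm (~101 kPa).
```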

Both Riedel and Wagner regressions usually fit data within a few tenths of a percent over the entire range between the triple point and the critical point. [Pg.390]

If estimates of distribution parameters are desired from data plotted on hazard paper, then the straight line drawn through the data should be based primarily on a fit to the data points near the center of the distribution the sample is from, and should not be influenced overly by data points in the tails of the distribution. This is suggested because the smallest and largest times to failure in a sample tend to vary considerably from the true cumulative hazard function, while the middle times tend to lie close to it. Similar comments apply to probability plotting. [Pg.1053]
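One computational way to follow this advice is a weighted least-squares line that down-weights the extreme order statistics. This is only a sketch of the idea, assuming triangular rank-based weights of our own choosing; it is not a standard hazard-paper procedure from the text.

```python
import numpy as np

def center_weighted_line(t, h):
    """Fit h ~ a + b*t, down-weighting points far from the middle of the
    ordered sample (the unreliable tail observations)."""
    n = len(t)
    ranks = np.argsort(np.argsort(t))
    # Triangular weights: 1 at the sample median position, falling toward 0
    # at the extremes; clipped so tail points still contribute a little.
    w = 1.0 - np.abs(ranks - (n - 1) / 2) / ((n - 1) / 2)
    w = np.clip(w, 0.05, None)
    sw = np.sqrt(w)
    A = np.stack([np.ones(n), t], axis=1)
    coef, *_ = np.linalg.lstsq(A * sw[:, None], h * sw, rcond=None)
    return coef  # (intercept, slope)

# Exactly linear cumulative-hazard data recovers the line regardless of weights.
t = np.linspace(1, 10, 30)
h = 2.0 + 0.5 * t
intercept, slope = center_weighted_line(t, h)
```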

The authors of Ref. 18 fitted data from over 175 experiments to the scaled vented pressure parameters, using total heats of explosion for W. Graphs from that paper will be shown later. [Pg.17]

Figure 5-67 displays the results of the cross-validation computations on the corn data with PCR and PLS. The graph is fairly typical: PLS is consistently better at small numbers of factors, and predictions are very similar at the optimal number of factors ne, which is 10 for PLS and 12 for PCR. Experience has shown that it is dangerous to use an excessive number of factors (over-fitting) for the prediction of new unknown samples. This is why we selected ne = 12 rather than 23 for PCR. [Pg.310]
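Selecting the number of factors by cross-validation can be illustrated for PCR with NumPy alone (PLS is omitted for brevity). The synthetic "spectra", the factor counts compared, and the fold scheme are our assumptions, not the corn data from the figure.

```python
import numpy as np

def pcr_predict(Xtr, ytr, Xte, n_factors):
    """Principal-component regression: project onto the leading principal
    components of the training spectra, then regress y on the scores."""
    mu, ybar = Xtr.mean(axis=0), ytr.mean()
    Xc = Xtr - mu
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    P = Vt[:n_factors].T                      # loadings of leading PCs
    b, *_ = np.linalg.lstsq(Xc @ P, ytr - ybar, rcond=None)
    return ybar + (Xte - mu) @ P @ b

def cv_rmse(X, y, n_factors, n_folds=5, seed=0):
    """Root-mean-square cross-validation error for a given factor count."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    sq = []
    for fold in np.array_split(idx, n_folds):
        train = np.setdiff1d(idx, fold)
        pred = pcr_predict(X[train], y[train], X[fold], n_factors)
        sq.extend((y[fold] - pred) ** 2)
    return float(np.sqrt(np.mean(sq)))

# Synthetic data: y depends on 3 latent directions, so 1 factor underfits
# and far too many factors mostly fit noise.
rng = np.random.default_rng(1)
scores_true = rng.normal(size=(80, 3))
loadings = rng.normal(size=(3, 50))
X = scores_true @ loadings + 0.05 * rng.normal(size=(80, 50))
y = scores_true @ np.array([1.0, -2.0, 0.5]) + 0.05 * rng.normal(size=80)
errs = {k: cv_rmse(X, y, k) for k in (1, 3, 20)}
```

The shape of `errs` over a grid of factor counts is what a plot like Figure 5-67 displays: a sharp drop to the optimum, then a flat or rising tail where extra factors only add over-fitting risk.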

Figure 6.17 Reaction composition followed over time: ( ) aldehyde (49), (O) nitroalcohol intermediate (55), and (A) product (57) (modified from Reference 6). Lines represent fitted data from the kinetic model [49].
Measured normal-incidence reflectances of α-SiC for incident electric field perpendicular to the hexagonal axis are shown in Fig. 9.6; these are unpublished measurements made in the authors' laboratory, but they are similar to those published by Spitzer et al. (1959). Also included in this figure are both sets of optical constants (n, k and ε′, ε″) calculated from the best fit of a one-oscillator model to the experimental data. Note that the model curve is almost a perfect representation of the data over the entire range shown; for this solid, the technique of fitting data with a one-oscillator model is both a simple and an accurate method for extracting optical constants. [Pg.242]
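The one-oscillator (Lorentz) model and the reflectance it implies can be sketched as below. The dielectric-function form and the normal-incidence reflectance formula are standard; the numerical parameters are illustrative values loosely in the range quoted for SiC's lattice oscillator, not the fitted constants from the text.

```python
import numpy as np

def lorentz_eps(omega, eps_inf, S, omega0, gamma):
    """One-oscillator Lorentz dielectric function:
    eps = eps_inf + S*omega0^2 / (omega0^2 - omega^2 - i*gamma*omega)."""
    return eps_inf + S * omega0**2 / (omega0**2 - omega**2 - 1j * gamma * omega)

def reflectance(omega, eps_inf, S, omega0, gamma):
    """Normal-incidence reflectance from the complex index m = n + ik."""
    m = np.sqrt(lorentz_eps(omega, eps_inf, S, omega0, gamma))
    return np.abs((m - 1) / (m + 1)) ** 2

# Illustrative parameters (wavenumbers in cm^-1); near the reststrahlen
# band between the TO and LO frequencies the reflectance approaches 1.
w = np.linspace(600, 1100, 500)
R = reflectance(w, eps_inf=6.7, S=3.3, omega0=793.0, gamma=4.8)
```

Fitting such a model to a measured R(ω) curve, then reading n, k (and ε′, ε″) off the fitted ε(ω), is the extraction procedure the passage describes.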

Many types of classifiers are based on linear discriminants of the form shown in (1); they differ with regard to how the weights are determined. The oldest form of linear discriminant is Fisher's linear discriminant. To compute the weights for the Fisher linear discriminant, one must estimate the correlation between all pairs of genes that were selected in the feature-selection step. The study by Dudoit et al. indicated that Fisher's linear discriminant did not perform well unless the number of selected genes was small relative to the number of samples. The reason is that otherwise there are too many correlations to estimate, and the method tends to be unstable and to over-fit the data. [Pg.330]
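The weight computation behind this is w = S⁻¹(μ₁ − μ₀), with S the pooled within-class covariance; it is estimating S (all pairwise gene covariances) that breaks down when genes outnumber samples. A minimal sketch on simulated data; the optional ridge term is a common stabiliser we add for illustration, not part of Fisher's original method.

```python
import numpy as np

def fisher_weights(X0, X1, ridge=0.0):
    """Fisher linear discriminant weights: w = S^{-1} (mu1 - mu0).
    S is the pooled within-class covariance, which requires estimating
    every pairwise gene covariance; a small ridge term can stabilise it
    when features are numerous relative to samples."""
    mu0, mu1 = X0.mean(axis=0), X1.mean(axis=0)
    S = np.cov(np.vstack([X0 - mu0, X1 - mu1]).T)
    S = S + ridge * np.eye(S.shape[0])
    return np.linalg.solve(S, mu1 - mu0)

rng = np.random.default_rng(0)
X0 = rng.normal(0.0, 1.0, size=(40, 5))   # class 0: 40 samples, 5 genes
X1 = rng.normal(1.0, 1.0, size=(40, 5))   # class 1: shifted mean
w = fisher_weights(X0, X1)
s0, s1 = X0 @ w, X1 @ w                   # discriminant scores per class
```

With 5 genes and 80 samples the covariance estimate is well conditioned; rerunning with, say, 500 genes and the same 80 samples makes S singular, which is exactly the instability the study observed.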

Pharmacokinetic models. An important advance in risk assessment for hazardous chemicals has been the application of pharmacokinetic models to interpret dose-response data in rodents and humans (EPA, 1996a; Leung and Paustenbach, 1995; NAS/NRC, 1989; Ramsey and Andersen, 1984). Pharmacokinetic models can be divided into two categories: compartmental or physiological. A compartmental model attempts to fit data on the concentration of a parent chemical or its metabolite in blood over time to a nonlinear exponential model that is a function of the administered dose of the parent. The model can be rationalized to correspond to different compartments within the body (Gibaldi and Perrier, 1982). [Pg.117]
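The simplest compartmental case, a one-compartment model after an intravenous bolus, gives C(t) = C₀·e^(−kt). A minimal fitting sketch using a log-linear least-squares fit; the simulated dose, rate constant, and sampling times are our assumptions, not data from any of the cited studies.

```python
import numpy as np

def fit_one_compartment(t, conc):
    """Fit C(t) = C0 * exp(-k*t) by least squares on log-concentration.
    A minimal sketch of the compartmental idea, not a full PK package."""
    A = np.stack([np.ones_like(t), t], axis=1)
    coef, *_ = np.linalg.lstsq(A, np.log(conc), rcond=None)
    C0, k = np.exp(coef[0]), -coef[1]
    return C0, k

# Simulated blood concentrations after a bolus: C0 = 10 mg/L, k = 0.3 /h
t = np.linspace(0.5, 12, 12)
conc = 10.0 * np.exp(-0.3 * t)
C0_hat, k_hat = fit_one_compartment(t, conc)
```

Multi-compartment models replace the single exponential with a sum of exponentials, one term per compartment, which is where the nonlinear fitting the passage mentions becomes necessary.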

We will discuss only the problem of fitting data points to a straight line. In earlier days, this was accomplished by placing a transparent piece of plastic with a straight edge over the data so as to minimize the sum of the magnitudes of the deviations of the points from the edge. [Pg.386]
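The computational replacement for the straightedge is a least-squares line fit. Note one difference: least squares minimises the squared deviations, whereas the straightedge procedure described above minimises their magnitudes (an L1 criterion). The data points below are illustrative.

```python
import numpy as np

# Least-squares straight-line fit to illustrative data scattered about y = 2x.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])
slope, intercept = np.polyfit(x, y, 1)
```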


See other pages where Over-fitting data is mentioned: [Pg.1640]    [Pg.2521]    [Pg.139]    [Pg.630]    [Pg.717]    [Pg.176]    [Pg.236]    [Pg.226]    [Pg.301]    [Pg.542]    [Pg.550]    [Pg.396]    [Pg.355]    [Pg.440]    [Pg.153]    [Pg.34]    [Pg.158]    [Pg.192]    [Pg.107]    [Pg.124]    [Pg.362]    [Pg.458]    [Pg.489]    [Pg.258]    [Pg.209]    [Pg.195]    [Pg.137]    [Pg.177]    [Pg.448]    [Pg.454]    [Pg.16]    [Pg.32]    [Pg.176]    [Pg.327]    [Pg.120]
See also in source #XX -- [Pg.209]



