Small data set

Fast and accurate predictions of ¹H NMR chemical shifts of organic compounds are of great interest for automatic structure elucidation, for the analysis of combinatorial libraries, and, of course, for assisting experimental chemists in the structural characterization of small data sets of compounds. [Pg.524]

The performance of the neural network method is remarkable considering the relatively small data set on which it was based. [Pg.528]

Both and are easy to determine for small data sets. For larger data sets, however, calculating becomes tedious. Its calculation is simplified by taking advantage... [Pg.695]

Notice that by the inclusion of Tg, the mean is much more strongly influenced than the median. The value of such comparisons lies in the automatic processing of large numbers of small data sets, in order to pick out the suspicious ones for manual inspection. (See also the next Section.)... [Pg.15]
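
The comparison described above can be sketched in a few lines. This is only an illustration: the measurement values, the helper names `mean_median_gap` and `flag_suspicious`, and the threshold of 0.1 are all assumptions made for the example and do not come from the source.

```python
from statistics import mean, median


def mean_median_gap(values):
    """Absolute difference between the mean and the median of one data set."""
    return abs(mean(values) - median(values))


def flag_suspicious(data_sets, threshold):
    """Return indices of data sets whose mean-median gap exceeds the threshold.

    The threshold is a hypothetical tuning value; it has to be chosen
    for the measurement scale at hand.
    """
    return [i for i, values in enumerate(data_sets)
            if mean_median_gap(values) > threshold]


# Hypothetical measurements: seven consistent values, then one extra outlier.
clean = [3.1, 3.2, 3.0, 3.3, 3.1, 3.2, 3.1]
with_outlier = clean + [4.5]

# The single added value moves the mean further than it moves the median.
mean_shift = mean(with_outlier) - mean(clean)
median_shift = median(with_outlier) - median(clean)

# Screening many small data sets: only the second one is picked out.
flagged = flag_suspicious([clean, with_outlier], threshold=0.1)
```

Flagged data sets are candidates for manual inspection only; the gap by itself does not prove that a value is an outlier.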

With small data sets or if there is reason to suspect deviations from the Gaussian distribution, a robust outlier test should be used. [Pg.243]

Hydrophobicity is found to be the single most important parameter for this data set, which shows that at all the parts where substituents have been entered, hydrophobic contacts have been made. The linear Clog P model suggests that the highly hydrophobic molecules will be more active. Although this is a very small data set, it is the best model and explains 97.5% of the variance in log 1/C. [Pg.52]

Examples of the application of correlation analysis to diene and polyene data sets are considered below. Both data sets in which the diene or polyene is directly substituted and those in which a phenylene lies between the substituent and diene or polyene group have been considered. In that best of all possible worlds known only to Voltaire's Dr. Pangloss, all data sets have a sufficient number of substituents and cover a wide enough range of substituent electronic demand, steric effect and intermolecular forces to provide a clear, reliable description of structural effects on the property of interest. In the real world this is not often the case. We will therefore try to demonstrate how the maximum amount of information can be extracted from small data sets. [Pg.714]

Nansen, C., Campbell, J.F., Phillips, T.W., and Mullen, M.A. 2003. The impact of spatial structure on the accuracy of contour maps of small data sets. J. Econ. Entomol. 96, 1617-1625. [Pg.290]

The methods CV (Section 4.2.5) and bootstrap (Section 4.2.6), which are necessary for small data sets, are called resampling strategies. They are applied to obtain a reasonably high number of predictions, even with small data sets. It is evident that the larger the data set and the more reliable (more friendly in terms of modeling) the data are, the better the prediction performance can be estimated. Thus a qualitative uncertainty relation of the general form is presumed ... [Pg.123]
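
The two resampling strategies named above differ mainly in how they pick the objects used for fitting and for prediction. The following is a minimal sketch of the index bookkeeping only, assuming k-fold cross-validation and the ordinary bootstrap (sampling with replacement, predicting the out-of-bag objects). The function names `kfold_indices` and `bootstrap_indices` are hypothetical and are not taken from the cited sections.

```python
import random


def kfold_indices(n, k, seed=0):
    """Split the object indices 0..n-1 into k (train, test) pairs.

    Every object appears in exactly one test fold, so each object is
    predicted once by a model that did not see it.
    """
    rng = random.Random(seed)
    order = list(range(n))
    rng.shuffle(order)
    splits = []
    for i in range(k):
        test = order[i::k]
        held_out = set(test)
        train = [j for j in order if j not in held_out]
        splits.append((train, test))
    return splits


def bootstrap_indices(n, seed=0):
    """Draw n indices with replacement; the objects never drawn are out-of-bag."""
    rng = random.Random(seed)
    in_bag = [rng.randrange(n) for _ in range(n)]
    out_of_bag = sorted(set(range(n)) - set(in_bag))
    return in_bag, out_of_bag


# Example with a small data set of 12 objects.
splits = kfold_indices(12, 4)
in_bag, out_of_bag = bootstrap_indices(12)
```

A model would be fitted on each `train` (or `in_bag`) list and evaluated on the matching `test` (or `out_of_bag`) list; repeating the bootstrap with different seeds yields many predictions even when the data set is small.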

Quite intriguingly, the model lacked a volume term. Although this new model outperformed the previously reported models on the small data set, its serious drawback was the use of experimental values for the various partition coefficients. [Pg.468]

Admittedly there do exist a few, rare situations in which no option for test set validation is possible (historical data, very small data sets, other ...). In such cases cross-validation finds its only legitimate application area (NB: none of these situations must result from voluntary decisions made by the data analyst, however). In historical data there simply does not exist the option to make any resampling, etc. In small data sets, this option might have existed, but perhaps went unused because of negligence - or this small sample case may be fully legitimate. [Pg.77]

Small data sets (< 20 objects) + Relatively quick + Relatively quick - Can be slow, if m or number of iterations large - Selection of subsets unknown + OK, if many iterations done - Avoid using if N > 20 + Good choice. -. unless designed/DOE data - Requires time to determine/ construct cross validation array + often needed to avoid the external subset selection trap... [Pg.412]

Small Data Set. Low-temperature (plasma) ashes (LTAs) were obtained from ten diverse coal samples (Table I), ranging in rank from lignite to lvb. Infrared spectra were obtained of duplicate samples of each LTA. A separate set of duplicates generated (for other purposes) for the first four LTAs listed also was analyzed by FTIR. [Pg.45]

Infrared data in the 1575-400 cm⁻¹ region (1218 points/spectrum) from LTAs from 50 coals (large data set) were used as input data to both PLS and PCR routines. This is the same spectral region used in the classical least-squares analysis of the small data set. Calibrations were developed for the eight ASTM ash fusion temperatures and the four major ash elements as oxides (determined by ICP-AES). The program uses PLS1 models, in which only one variable at a time is modeled. Cross-validation was used to select the optimum number of factors in the model. In this technique, a subset of the data (in this case five spectra) is omitted from the calibration, but predictions are made for it. The sum-of-squares residuals are computed from those samples left out. A new subset is then omitted, the first set is included in the new calibration, and additional residual errors are tallied. This process is repeated until predictions have been made and the errors summed for all 50 samples (in this case, 10 calibrations are made). This entire set of... [Pg.55]
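
The leave-five-out bookkeeping described above can be sketched as follows. To keep the example self-contained, a one-variable least-squares line stands in for the PLS1 model; that substitution, the synthetic data, and the names `fit_line` and `press_leave_group_out` are assumptions made for illustration, not the program used in the source.

```python
def fit_line(x, y):
    """Ordinary least-squares straight line (a stand-in for the PLS1 model)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def press_leave_group_out(x, y, group_size):
    """Sum the squared prediction residuals over all left-out subsets.

    Each consecutive block of `group_size` samples is omitted once, the
    model is calibrated on the remaining samples, and predictions are
    made for the omitted block. Returns (PRESS, number of calibrations).
    """
    n = len(x)
    press = 0.0
    n_calibrations = 0
    for start in range(0, n, group_size):
        held_out = set(range(start, min(start + group_size, n)))
        train_x = [x[i] for i in range(n) if i not in held_out]
        train_y = [y[i] for i in range(n) if i not in held_out]
        slope, intercept = fit_line(train_x, train_y)
        press += sum((y[i] - (slope * x[i] + intercept)) ** 2 for i in held_out)
        n_calibrations += 1
    return press, n_calibrations


# Synthetic example: 50 samples omitted five at a time gives 10 calibrations.
x = [float(i) for i in range(50)]
y = [2.0 * xi + 1.0 for xi in x]
press, n_calibrations = press_leave_group_out(x, y, group_size=5)
```

With a real PLS routine, the same loop would be repeated for each candidate number of factors, and the number giving the smallest summed residual would be a natural choice.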

Nonparametric statistics (NPS) differs primarily from its traditional, distribution-based counterpart by dealing with data of unknown probability distributions. Its principal attractiveness lies, in fact, in not requiring the knowledge of a probability distribution. NPS is especially inviting when the assumption of normal distribution of small data sets is hazardous (if at all admissible), even if NPS-based calculations are more time consuming than in traditional statistics. The steadily growing importance of NPS has been amply demonstrated by numerous textbooks and monographs published within the last few decades, e.g. [1-7],... [Pg.94]

For small data sets (n < 10), which are often encountered in chemical analysis, a simple method to determine if an outlier is rejectable is the Q test. In this test, a value for Q is calculated and compared to a table of Q values that represent a certain percentage of confidence that the proposed rejection is valid. If the calculated Q value is greater than the value from the table, then the suspect value is rejected and the mean recalculated. If the Q value is less than or equal to the value from the table, then the calculated mean is reported. Q is defined as follows ... [Pg.27]
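
A minimal sketch of the decision rule described above, assuming the common form of Dixon's Q (the gap between the suspect value and its nearest neighbour, divided by the range of the data). The excerpt's own definition is truncated, so this form is an assumption, as are the function names and the sample values. The tabulated critical value `q_table` is supplied by the caller; the 0.7 and 0.9 used in the example are placeholders, not values from a real Q table.

```python
def dixon_q(values):
    """Return (Q, suspect) for the most extreme value of a small data set.

    Q = gap / range, where gap is the distance from the suspect value to
    its nearest neighbour in the sorted data.
    """
    if len(values) < 3:
        raise ValueError("the Q test needs at least three values")
    ordered = sorted(values)
    data_range = ordered[-1] - ordered[0]
    if data_range == 0:
        raise ValueError("all values are identical")
    gap_low = ordered[1] - ordered[0]
    gap_high = ordered[-1] - ordered[-2]
    if gap_high >= gap_low:
        return gap_high / data_range, ordered[-1]
    return gap_low / data_range, ordered[0]


def q_test(values, q_table):
    """Apply the rule: reject the suspect and recalculate the mean if Q > q_table."""
    q, suspect = dixon_q(values)
    kept = list(values)
    rejected = q > q_table
    if rejected:
        kept.remove(suspect)
    return q, suspect, rejected, sum(kept) / len(kept)


# Hypothetical replicate results with one doubtful high value.
data = [10.1, 10.2, 10.3, 10.2, 11.5]
q, suspect, rejected, reported_mean = q_test(data, q_table=0.7)  # placeholder critical value
```

In practice `q_table` is looked up for the actual number of values and the chosen confidence level.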

A commonly used invalid estimate is called the re-substitution estimate. You use all the samples to develop a model. Then you predict the class of each sample using that model. The predicted class labels are compared to the true class labels and the errors are totaled. It is well known that the re-substitution estimate of error is highly biased for small data sets and the simulation of Simon et al. (14) confirmed that, with 98.2% of the simulated data sets resulting in zero misclassifications even when no true underlying difference existed between the two groups. [Pg.334]
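
The bias of the re-substitution estimate can be illustrated with a toy experiment. This is not the simulation of Simon et al.: a 1-nearest-neighbour classifier is swapped in because its re-substitution error is zero by construction, and the data dimensions, random seed, and function names are assumptions made for the example.

```python
import random


def nearest_label(point, points, labels, skip=None):
    """Label of the nearest stored point (squared Euclidean distance)."""
    best_label, best_dist = None, float("inf")
    for i, (other, label) in enumerate(zip(points, labels)):
        if i == skip:
            continue
        dist = sum((a - b) ** 2 for a, b in zip(point, other))
        if dist < best_dist:
            best_dist, best_label = dist, label
    return best_label


def error_rates(points, labels):
    """Return (re-substitution error, leave-one-out error) for 1-NN."""
    n = len(points)
    # Re-substitution: every sample is predicted by a model that contains it.
    resub = sum(nearest_label(points[i], points, labels) != labels[i]
                for i in range(n)) / n
    # Leave-one-out: sample i is excluded before it is predicted.
    loo = sum(nearest_label(points[i], points, labels, skip=i) != labels[i]
              for i in range(n)) / n
    return resub, loo


# 20 samples, 50 pure-noise features, two arbitrary groups of 10:
# there is no true difference between the groups.
rng = random.Random(0)
points = [[rng.gauss(0.0, 1.0) for _ in range(50)] for _ in range(20)]
labels = [0] * 10 + [1] * 10
resub_error, loo_error = error_rates(points, labels)
```

Each sample is its own nearest neighbour, so the re-substitution error is zero although the labels carry no information; the leave-one-out error is the more honest figure and would be expected to sit much closer to chance.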

In addition, the chemist may infer the structural characteristics that reduce toxic potency, thereby providing a rational basis to design new, less toxic analogs. Typically, in larger data sets the relationship between structure and activity is more apparent, but small data sets can nonetheless be fairly useful. The application of SARs for the design of safer chemicals is demonstrated below using aliphatic carboxylic acids and organonitriles, two classes of important commercial chemical substances. [Pg.86]

D. Grientschnig, Relation Between Prediction Errors of Inverse and Classical Calibration, Fresenius J. Anal. Chem. 2000, 367, 497; J. Tellinghuisen, Inverse vs Classical Calibration for Small Data Sets, Fresenius J. Anal. Chem. 2000, 368, 585. [Pg.665]

Big Chemical Encyclopedia