Explanation of the Data Set

The adaptive wavelet algorithm is applied to three spectral data sets. The dimensionality of each data set is p = 512 variables. The data sets will be referred to as the seagrass, paraxylene and butanol data. The number of training and testing spectra in the group categories is listed in Table 1 for each set of data. [Pg.442]

The training seagrass data comprise 55 spectra in each group and the testing data have 34 spectra in each class. Fig. 1 shows five sample spectra from each of the classes. To the naked eye, there appear to be some striking similarities between the spectra from the different seagrass species. [Pg.443]

In this section, we design our own task-specific filter coefficients using the adaptive wavelet algorithm of Chapter 8. The idea behind the adaptive wavelet algorithm is to avoid having to decide which set of filter coefficients, and hence which wavelet family, would be best suited to our data. Instead, we design our own wavelet basis or, more specifically, the filter coefficients which define the wavelet and scaling function. This is done to suit the task at hand, which in this case is discriminant analysis. [Pg.444]

The discriminant criterion function implemented by the adaptive wavelet algorithm is the CVQPM criterion function discussed in Section 1.4. The adaptive wavelet algorithm is applied using several settings of m, q and j0 [Pg.444]

Since log(p)/log(m) is not an integer for the case m = 4, we would like to clarify our definition of J, the highest level in the DWT (which is the original data). We let J = ceil(log(512)/log(m)). For the case m = 4, the highest level in the DWT is 5, as demonstrated in Fig. 4. At the highest level there are 512 coefficients; at level 4 there are 512/4 = 128 coefficients in each band; at level 3 there are 128/4 = 32 coefficients in each band; and, for the level which we consider, there are 32/4 = 8 coefficients in each band. [Pg.446]
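The level arithmetic above can be checked with a short sketch. The function name dwt_band_sizes is hypothetical and not part of the source. J is found with integer arithmetic as the smallest integer with m**J >= p, which is the same quantity as ceil(log(p)/log(m)) but avoids floating-point rounding.

```python
def dwt_band_sizes(p, m):
    """Return (J, sizes) for an m-band DWT applied to p data points.

    J is the smallest integer with m**J >= p, i.e. ceil(log(p)/log(m)).
    sizes maps each level to the number of coefficients per band,
    with level J holding the original p data points.
    """
    J = 0
    while m ** J < p:
        J += 1

    sizes = {J: p}
    n = p
    # Each step down divides the coefficients per band by m;
    # stop once the length is no longer divisible by m.
    for level in range(J - 1, -1, -1):
        if n % m != 0:
            break
        n //= m
        sizes[level] = n
    return J, sizes


J, sizes = dwt_band_sizes(512, 4)
print(J)         # 5
print(sizes[4])  # 128
print(sizes[3])  # 32
print(sizes[2])  # 8
```

For p = 512 and m = 4 this reproduces the values quoted in the text: J = 5, with 128, 32 and 8 coefficients per band at levels 4, 3 and 2.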

Further experimental data and an explanation of the experimental setup may be found in earlier papers of Lange et al. [8-10]. [Pg.81]

Less straightforward is the problem of rogue data points, which appear to satisfy the usual criteria of quality, but fall well outside the norms established by the data set or by reliable precedent. If there is no apparent reason for the discrepancy (and a careful search for an explanation may reveal a perturbing effect of interest), such data points may be treated as outliers, and may legitimately be omitted from the correlation. [Pg.92]

Figure 12.25 provides a graphical explanation of the phenomena of overfitting and underfitting [1]. It shows that the overall prediction error of a model has contributions from two sources: (1) the interference error and (2) the estimation error. The interference error continually decreases as the complexity of the calibration model increases, as the added complexity enables the model to explain more interferences in the analyzer data. At the same time, however, the estimation error of the model increases with the model complexity, because there are more independent model parameters that need to be estimated from the same limited set of data. These competing forces result in a conceptual minimum in the overall prediction error of a model, where the sum of the interference error and the estimation error is minimized. It should be noted that this explanation of a model's prediction error assumes that the calibration data are sufficiently representative of the data that will be obtained when the model is applied. [Pg.408]
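The trade-off can be illustrated with a toy calculation. The functional forms below are assumptions chosen only for illustration and do not come from the source: the interference error is taken as a/k, falling with model complexity k, and the estimation error as b*k/n, growing with the number of parameters relative to n calibration samples. The names total_error and best_complexity are hypothetical.

```python
def total_error(k, n, a=1.0, b=1.0):
    """Toy prediction error for a model of complexity k fitted to n samples."""
    interference = a / k    # assumed form: decreases as complexity grows
    estimation = b * k / n  # assumed form: increases as complexity grows
    return interference + estimation


def best_complexity(n, max_k, a=1.0, b=1.0):
    """Return the complexity in 1..max_k with the smallest toy total error."""
    return min(range(1, max_k + 1), key=lambda k: total_error(k, n, a, b))


print(best_complexity(100, 50))  # 10
print(best_complexity(400, 50))  # 20
```

Under these assumed forms the minimum sits where the two contributions balance, and a larger calibration set moves it toward higher complexity. Real calibration models will not follow these exact curves; only the qualitative shape is meant to match the description above.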
