Training set construction

ANN to memorize that case. The test set should also contain a representative sampling of cases to realistically assess how the ANN responds to new situations. A word about autoassociation problems is in order here. If your goal is to use an ANN simply to store patterns or to compress data, you really do not need a test set, because all you care about are the cases with which you train the network. If you want to pass corrupt data through the ANN to see if the network will output a clean version of the input, you may want to construct a test set to see how well the network can do this. Training set construction is presumably trivial here: you know what data you want to store or compress, and this data is the training set. [Pg.108]

Neural network classifiers. Neural network and other statistical classifiers impose strong requirements on the data and the inspection; however, when these are fulfilled, good fully automatic classification systems can be developed within a short period of time. This is, for example, the case if the inspection is part of a manufacturing process where the inspected pieces and the possible defect mechanisms are well known and the whole NDT inspection is done under repeatable conditions. In such cases it is possible to collect (or manufacture) a set of defect pieces, which can be used to obtain a training set. There are some commercially available tools (like ICEPAK [Chan, et al., 1988]) which can construct classifiers without any a priori information, based only on the training sets of data. One must, however, always remember the limitations of this technique; otherwise serious misclassifications may go unnoticed. [Pg.100]

This requirement is pretty easy to accept. It makes sense that, if we are going to generate a calibration, we must construct a training set that exhibits all the forms of variation that we expect to encounter in the unknown samples. We certainly would not expect a calibration to produce accurate results if an unknown sample contained a spectral peak that was never present in any of the calibration samples. [Pg.14]

When we plot the sample concentrations in this way, we begin to see that each sample with a unique combination of component concentrations occupies a unique point in this concentration space. (Since this is the concentration space of a training set, it is sometimes called the calibration space.) If we want to construct a training set that spans this concentration space, we can see that we must do it in the multivariate sense by including samples that, taken as a set, will occupy all the relevant portions of the concentration space. [Pg.29]
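
One simple way to obtain samples that occupy all portions of a concentration space is a full factorial design, in which every combination of a few concentration levels is included. The sketch below is only an illustration of that idea: the number of components, the three levels, and the name C_train are assumptions, not the design used in the excerpted text.

```python
import itertools
import numpy as np

# Assumed design: 3 components, each at a low, middle and high concentration.
levels = [0.0, 0.5, 1.0]   # hypothetical concentration levels (arbitrary units)
n_components = 3

# Full factorial design: every combination of levels, one row per sample.
C_train = np.array(list(itertools.product(levels, repeat=n_components)))

print(C_train.shape)  # (27, 3)
```

A full factorial design grows as levels**components, so for many components a random or reduced design may be more practical.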

We will now construct the concentration matrices for our training sets. Remember, we will simulate a 4-component system for which we have concentration values available for only 3 of the components. A random amount of the 4th component will be present in every sample, but when it comes time to generate the calibrations, we will not utilize any information about the concentration of the 4th component. Nonetheless, we must generate concentration values for the 4th component if we are to synthesize the spectra of the samples. We will simply ignore or discard the 4th component concentration values after we have created the spectra. [Pg.35]
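
The procedure described above can be sketched in a few lines of NumPy. This is a minimal sketch under stated assumptions: the sample count, the wavelength grid, the Gaussian pure-component spectra in K, the uniform concentration range, and the rows-as-samples layout are all hypothetical choices made for illustration, and the synthesis assumes a simple linear (Beer-Lambert-style) mixing model without noise.

```python
import numpy as np

rng = np.random.default_rng(0)

n_samples = 15        # assumed training-set size
n_components = 4      # 4-component system, as in the text
n_wavelengths = 100   # assumed number of spectral points

# Random concentrations for all 4 components, one row per sample.
C_full = rng.uniform(0.0, 1.0, size=(n_samples, n_components))

# Hypothetical pure-component spectra: one Gaussian band per component.
wavelengths = np.arange(n_wavelengths)
centers = np.array([20.0, 40.0, 60.0, 80.0])
K = np.exp(-0.5 * ((wavelengths[None, :] - centers[:, None]) / 6.0) ** 2)

# Synthesize the absorbance spectra; the 4th component contributes here.
A = C_full @ K

# Discard the 4th component's concentrations before calibration.
C_known = C_full[:, :3]
```

The spectra in A still carry the 4th component's signal, while C_known no longer records it, which is the situation the calibration has to cope with.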

Now, we are ready to apply PCR to our simulated data set. For each training set absorbance matrix, A1 and A2, we will find all of the possible eigenvectors. Then, we will decide how many to keep as our basis set. Next, we will construct calibrations by using ILS in the new coordinate system defined by the basis set. Finally, we will use the calibrations to predict the concentrations for our validation sets. [Pg.111]
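
The four steps listed above can be sketched as follows. This is an illustrative sketch, not the book's implementation: the eigenvectors are obtained through a singular value decomposition, no mean-centering is applied, samples are stored as rows, and the synthetic, noise-free data (K, C_train, C_val, A_val) are assumptions made so the example runs on its own. The function names pcr_fit and pcr_predict are hypothetical.

```python
import numpy as np

def pcr_fit(A, C, n_factors):
    """Fit a PCR calibration. A: (samples, wavelengths), C: (samples, analytes)."""
    # Rows of Vt are the eigenvectors of A.T @ A, ordered by decreasing eigenvalue.
    _, _, Vt = np.linalg.svd(A, full_matrices=False)
    V = Vt[:n_factors].T                  # basis set we decide to keep
    T = A @ V                             # spectra projected onto the basis
    # ILS in the new coordinate system: least-squares solution of T @ F = C.
    F, *_ = np.linalg.lstsq(T, C, rcond=None)
    return V, F

def pcr_predict(A_new, V, F):
    """Predict concentrations for new spectra with a fitted calibration."""
    return (A_new @ V) @ F

# Synthetic noise-free data: 4 components, only 3 with known concentrations.
rng = np.random.default_rng(1)
wavelengths = np.arange(100)
centers = np.array([20.0, 40.0, 60.0, 80.0])
K = np.exp(-0.5 * ((wavelengths[None, :] - centers[:, None]) / 6.0) ** 2)
C_train = rng.uniform(0.0, 1.0, size=(15, 4))
C_val = rng.uniform(0.0, 1.0, size=(5, 4))
A1 = C_train @ K
A_val = C_val @ K

V, F = pcr_fit(A1, C_train[:, :3], n_factors=4)
C_pred = pcr_predict(A_val, V, F)
max_err = np.abs(C_pred - C_val[:, :3]).max()
```

Because this toy data is noise-free and has exactly four independent sources of variation, four factors reproduce the validation concentrations almost exactly. With real, noisy spectra the number of factors to keep has to be chosen with more care.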

As in isocratic mode, the estimate of log P is indirect and based on the construction of a linear retention model between a retention property characteristic of the solute (logkw) and a training set with known logP ci values. To identify the best-performing procedures, the three hydrophobicity indexes (( )o, CHI and logkw) were compared on the basis of the solvation equation [41]. These parameters were significantly interrelated, but not identical. Each parameter was related to log P with values between 0.76 and 0.88 for the 55 tested compounds, with fitting quality associated with the compound nature. [Pg.343]

Most Radial code is constructed by RuleMaker from training sets... [Pg.21]

More recently, another linear discriminant analysis (LDA) model was constructed for a set of 157 compounds for which Pcaco-2 was measured [43]. This model, which applied DRAGON descriptors, achieved a classification accuracy of 91% for the training set and 84% for the test set. When this model was applied to predict a set of 241 drugs for which HIA data were available, good correlation (>81%) was achieved between the two ADME-Tox properties. [Pg.109]

Once a QSAR model is constructed, it must be validated using the external test set. The data points in the test set should not appear in the training set. There are two approaches to improve the prediction accuracy for a given QSAR model. The first approach utilizes the concept of "the domain of applicability," which is used to estimate the uncertainty in the prediction for a particular molecule based on how similar it is to the compounds used to build the model. To make a more accurate prediction for a given molecule in the test set, the structurally similar compounds in the training set are used to construct a model, and that model is used to make the prediction. In some cases, the domain similarity is measured using molecular descriptor similarity, rather than structural similarity. The... [Pg.120]
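
The idea of predicting a test molecule from a model built only on similar training compounds can be sketched as below. Everything specific here is an assumption made for illustration: similarity is taken as Euclidean distance in descriptor space, the local model is an ordinary least-squares line with an intercept, k = 10 neighbours are used, and the descriptors and activities are synthetic. The function name local_predict is hypothetical.

```python
import numpy as np

def local_predict(X_train, y_train, x_query, k=10):
    """Predict y for x_query from a linear model fitted to its k nearest
    training compounds. Also returns the mean neighbour distance, which can
    serve as a rough indicator of how far the query lies from the training data."""
    d = np.linalg.norm(X_train - x_query, axis=1)   # descriptor-space distances
    idx = np.argsort(d)[:k]                         # k most similar compounds
    Xk = np.column_stack([np.ones(len(idx)), X_train[idx]])
    coef, *_ = np.linalg.lstsq(Xk, y_train[idx], rcond=None)
    y_hat = np.concatenate([[1.0], x_query]) @ coef
    return y_hat, d[idx].mean()

# Synthetic descriptors and activities (exactly linear, for illustration only).
rng = np.random.default_rng(2)
X = rng.normal(size=(60, 3))
y = X @ np.array([0.5, -1.0, 2.0]) + 0.3

# External test set: indices are disjoint from the training set.
perm = rng.permutation(60)
train_idx, test_idx = perm[:45], perm[45:]

preds = np.array([
    local_predict(X[train_idx], y[train_idx], x, k=10)[0]
    for x in X[test_idx]
])
```

A large mean neighbour distance suggests the query molecule lies outside the region covered by the training set, so its prediction deserves less trust.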

Another way to form a VHTS from several 4D-QSAR models is to use all the distinct grid cell occupancy descriptors (GCODs) and the bioactivity (ΔG) values of the training set. This simple method of constructing a VHTS-QSAR model is likely to suffer from overfitting the data, but is useful in a VHTS... [Pg.167]
