Big Chemical Encyclopedia

Reduced data set

Two regression cases have been run on the Cimetidine data: the first uses all of the data points, whilst the second uses a subset of the data to confirm the predictive power of the model. The reduced data set consists of the single solvents in Table 5, which adequately cover the range of conceptual segment types. Where solubility data are supplied in mixed solvents, it is necessary to enter the data directly into the Aspen Properties interface before regression. [Pg.61]

In phase III of the data collection step, companies which produce or import existing substances in quantities between 10 and 1000 tons per year (LPVCs or Low Production Volume Chemicals) were required to submit a reduced data set by 4 June 1998. [Pg.35]

The chosen trace can differ in appearance from the initially displayed one, since the selections proceed on the basis of a reduced data set while the confirmed traces are derived from the full data set. If the confirmed trace is not suitable, use the Prev.Trace and Next Trace buttons to select an appropriate trace. [Pg.164]

A portion of the elements is held out (Figure 6.28) and the PC model is calculated for the reduced data set. Because the PC model of X is the product of t and p, the model predicts the held-out elements (the element x_ik is predicted as t_i p_k). Hence, by comparing the predictions of the held-out elements with their actual values, an estimate of the predictive power of the model is obtained. The usual estimator of the predictive power in PCA and PLS is the prediction error sum of squares (PRESS), defined as the sum of squared differences between the held-out elements and their predictions, PRESS = Σ (x_ik − t_i p_k)², the sum running over the held-out elements. [Pg.328]
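As a concrete illustration of this element-wise cross-validation, here is a minimal NumPy sketch (our own code, not from the source): it holds out a mask of elements, fits a rank-1 PC model on the remaining data using a simple mean imputation, and accumulates PRESS over the held-out elements. Production implementations typically refine the imputation iteratively.

    import numpy as np

    def press_one_component(X, holdout_mask):
        """PRESS for a rank-1 PC model: hold out the masked elements,
        fit t and p on the remaining data, and compare the predictions
        t_i * p_k against the held-out values."""
        X = np.asarray(X, dtype=float)
        X_train = np.where(holdout_mask, np.nan, X)

        # Replace held-out elements by column means so the SVD can run
        # (a simple imputation; practical codes refine this iteratively).
        col_means = np.nanmean(X_train, axis=0)
        X_filled = np.where(np.isnan(X_train), col_means, X_train)

        # Rank-1 model X ~ t p' from the leading singular triplet.
        U, s, Vt = np.linalg.svd(X_filled, full_matrices=False)
        t = U[:, 0] * s[0]          # scores
        p = Vt[0, :]                # loadings

        # Predict each held-out element x_ik as t_i * p_k.
        resid = (X - np.outer(t, p))[holdout_mask]
        return np.sum(resid ** 2)   # PRESS over the held-out elements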

It is important to note that the abstract components are not physical spectra present in our data set and therefore cannot be used to identify chemical components in the sample, and the eigenimages are not their thickness maps. In fact the principal components may contain parts of various spectral signatures. Components with lower eigenvalues usually contain noise, which we eliminate by forming a reduced data set. [Pg.752]
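A minimal sketch of this truncation step (hypothetical NumPy code, assuming each row of X is one spectrum): reconstructing the data from only the k leading components drops the low-eigenvalue directions that mostly carry noise.

    import numpy as np

    def denoise_by_truncation(X, k):
        """Reconstruct the data from the k leading principal components,
        discarding the noise-dominated components with small eigenvalues."""
        X = np.asarray(X, dtype=float)
        mean = X.mean(axis=0)
        U, s, Vt = np.linalg.svd(X - mean, full_matrices=False)
        # Keep only the first k singular triplets in the reconstruction.
        return mean + (U[:, :k] * s[:k]) @ Vt[:k, :]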

Cross-link density and parameters relating to the network structure can be measured by NMR by analysis of the transverse relaxation decay (cf. Section 1.3) and the longitudinal relaxation in the rotating frame [67]. Combined with spatial resolution, the model-based analysis of relaxation yields maps of cross-link density and related parameters [68]. Often the statistical distribution of relaxation parameters over all pixels provides a reduced data set with sufficient information for sample characterization and discrimination [68]. [Pg.271]

Another criterion that is based on the predictive ability of PCA is the predicted sum of squares (PRESS) statistic. To compute the (cross-validated) PRESS value at a certain k, we remove the ith observation from the original data set (for i = 1, ..., n), estimate the center and the k loadings of the reduced data set, and then compute the fitted value of the ith observation following Equation 6.16, now denoted as x̂_{-i}. Finally, we set PRESS_k = Σ_{i=1}^{n} ||x_i − x̂_{-i}||². [Pg.193]
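A straightforward (if slow) implementation of this leave-one-out PRESS might look as follows. This is our own sketch: the fitted value is computed by projecting the deleted observation onto the loadings of the reduced data set, rather than by the specific procedure of Equation 6.16.

    import numpy as np

    def loo_press(X, k):
        """Cross-validated PRESS at k components: delete observation i,
        re-estimate the center and k loadings from the reduced data set,
        and accumulate the squared error of the fitted value of x_i."""
        X = np.asarray(X, dtype=float)
        press = 0.0
        for i in range(X.shape[0]):
            X_red = np.delete(X, i, axis=0)            # reduced data set
            center = X_red.mean(axis=0)
            _, _, Vt = np.linalg.svd(X_red - center, full_matrices=False)
            P = Vt[:k].T                               # the k loadings
            x_fit = center + P @ (P.T @ (X[i] - center))
            press += np.sum((X[i] - x_fit) ** 2)
        return press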

With the help of appropriate filters it is also possible to create a partial data set out of the standard data set. Especially when a huge number of analyses have to be calculated - as with a coupled model (transport plus reaction) - CPU-time can be saved with a reduced data set. However, it must be verified that the partial data set yields comparable results to the original data set. [Pg.76]

If you have imported a data file with a large number of data records (e.g., 8000 data points), you may wish to work with a reduced data set, say, every fiftieth point. The following sections describe three ways you can create such a list: by using AutoFill to Fill Down a pattern, by using Excel's Sampling tool, or by using a worksheet formula. [Pg.154]
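Outside Excel, the same every-fiftieth-point reduction is a one-liner; a minimal pandas sketch (the file names are hypothetical):

    import pandas as pd

    df = pd.read_csv("data.csv")        # e.g., 8000 data records
    reduced = df.iloc[::50]             # keep every fiftieth row
    reduced.to_csv("reduced.csv", index=False)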

For each reduced data set, the model is calculated, and responses for the deleted objects are predicted from the model. The squared differences between the true response and the predicted response for each object left out are added to PRESS (predictive residual sum of squares). From the final PRESS, the Q² (or R²cv) and SDEP (standard deviation error of prediction) values are usually calculated [Cruciani et al., 1992]. [Pg.462]
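Under the usual definitions (our notation: y_i are the observed responses, ȳ their mean, and n the number of objects):

    Q^2 = 1 - \frac{\mathrm{PRESS}}{\sum_{i=1}^{n} (y_i - \bar{y})^2},
    \qquad
    \mathrm{SDEP} = \sqrt{\mathrm{PRESS}/n}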

It is important to know how many principal components (factors) should be retained to accurately describe the data matrix D in Eq. (15) while still reducing the amount of noise. A common method is the cross-validation technique, which provides a pseudo-predictive way to estimate the number of factors to retain. The cross-validation technique leaves a percentage of the data (y%) out at a time. Using this reduced data set, PCA is again carried out to provide new loadings and scores. These are then used to predict the deleted data and to calculate the ensuing error defined by ... [Pg.56]
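In practice the retained number of factors is chosen where this cross-validation error stops improving; a short usage sketch, reusing the hypothetical loo_press() defined earlier:

    # Assumes numpy is imported as np and X holds the data matrix D.
    press_curve = [loo_press(X, k) for k in range(1, 11)]
    k_opt = 1 + int(np.argmin(press_curve))   # factors to retain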

Note that LL(i) is not the log-likelihood with the ith case deleted. The log-likelihood of the complete case is based on, say, n observations. The log-likelihood of the reduced data set will be based on fewer than n observations, so that comparing LL to LL(i) is not valid. Hence the need to determine the log-likelihood of the complete case data set using the parameter estimates of the reduced data set. [Pg.196]
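A minimal sketch of the point for a simple normal model (hypothetical code; the actual model in the source is not specified): the parameters are estimated on the reduced data set, but the log-likelihood is then evaluated over the complete data set, so LL and LL(i) are based on the same n observations.

    import numpy as np
    from scipy.stats import norm

    def ll_case_deleted(y, i):
        """Fit mu and sigma with case i deleted, then return the
        log-likelihood of the COMPLETE data set at those estimates."""
        y = np.asarray(y, dtype=float)
        y_red = np.delete(y, i)                  # reduced data set
        mu, sigma = y_red.mean(), y_red.std()
        # Evaluate over all n observations, not the n - 1 used for
        # fitting, so the result is comparable to the full-data LL.
        return norm.logpdf(y, loc=mu, scale=sigma).sum()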

Notified substances are listed in the DSL, as a supplement published in the Canada Gazette as necessary. After listing they can be manufactured or imported by other suppliers for unrestricted use. Hence substances notified with a reduced data set, because of limited use or exposure or with data waivers, are not listed in the DSL. Also, substances suspected of being toxic can only be listed in the DSL after they are regulated under CEPA to ensure their safe use. [Pg.560]

Protein Composition of Peas (Reduced Data Set): (a) 1 = Smooth Pea Cultivars, 2 = Wrinkled Pea Cultivars; (b) Laurell's Technique; (c) Ultracentrifugation [Pg.227]

Tables A.1 and A.2 (see pp. 182 and 184) summarize the continuous variables used in the analysis. In order to facilitate a comparative interpretation of odds ratios from different factors, normalizing transformations were applied to the remaining factors. Continuous variables, with the exception of impact speed in PCDS, were transformed by subtracting the mean and dividing by the standard deviation (SD) computed from the full sample (i.e., including cases with missing impact speed or with pedestrian age <4). Impact speed in PCDS was scaled by the mean, i.e., divided by the mean. The most important variables were tested to ensure that this procedure did not result in a significant change in the mean of those variables compared with the reduced data set.
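A sketch of the two transformations described (hypothetical pandas code; the column name is our own):

    import pandas as pd

    def normalize(df, speed_col="impact_speed"):
        """Z-score all continuous variables except PCDS impact speed,
        which is scaled by (divided by) its mean."""
        out = df.copy()
        for col in out.columns:
            if col == speed_col:
                out[col] = out[col] / out[col].mean()
            else:
                out[col] = (out[col] - out[col].mean()) / out[col].std()
        return out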
Table 7.6 Variable loadings for the first five PCs derived from the reduced data set of 11 variables (from Livingstone et al. 1992, with permission of the Royal Society of Chemistry).
Recursively call FuzzyOrthogonalPCA() on this reduced data set after... [Pg.282]

Combining the information from both plots, one would start by removing batch 84, create new figures for the reduced data set, and repeat the elimination process. [Pg.294]

For the sub-model of q, which depends on two input variables, T and the H2 composition, the minimum number of data points required for accurate fuzzy sub-model development was investigated. The data were obtained by adding random signals to the input variables in the reactor simulation. There are two requirements: the data should be evenly distributed over the entire set, and the data density should be an independent variable. To obtain an evenly distributed data set, the number of data points was reduced by sequentially removing data points within a certain distance of a randomly chosen data point. By varying this distance it is possible to produce different reduced data sets; a sketch of this procedure is given below. The base case contained 200 data points. [Pg.427]
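A sketch of that thinning procedure (our own implementation, not the authors' code): repeatedly keep a randomly chosen point and delete every other point lying within a distance d of it; varying d yields reduced data sets of different densities.

    import numpy as np

    def thin_by_distance(X, d, rng=None):
        """Greedy thinning: keep a randomly chosen point, remove all
        points within distance d of it, and repeat until none remain."""
        rng = np.random.default_rng() if rng is None else rng
        remaining = np.asarray(X, dtype=float)
        kept = []
        while len(remaining):
            center = remaining[rng.integers(len(remaining))]
            kept.append(center)
            dist = np.linalg.norm(remaining - center, axis=1)
            remaining = remaining[dist > d]   # drops the center as well
        return np.array(kept)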

The reduced data sets showed that an accurate hybrid model could still be designed, although it was more difficult to find a good set of clusters. However, the model's extrapolation capabilities became less reliable. [Pg.428]

