Big Chemical Encyclopedia


Data splitting

The data collected at contact (sample data, Figure 7-8) were compared with similar device types for contact resistance and comb leakage (note the circled test lot). All the data for each split were combined in order to compare the distribution against the other lots. As shown in Figure 7-8, the data splits did not have a large effect on the overall electrical process control results. [Pg.113]

Very often a test population of data is not available or would be prohibitively expensive to obtain. When a test population cannot be obtained, internal validation must be considered. The methods of internal PM model validation include data splitting, resampling techniques (cross-validation and bootstrapping) (9,26-30), and the posterior predictive check (PPC) (31-33). Of note, the jackknife is not considered a model validation technique; it may only be used to correct for bias in parameter estimates and to compute the uncertainty associated with parameter estimation. Cross-validation, bootstrapping, and the posterior predictive check are addressed in detail in Chapter 15. [Pg.237]
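As a concrete illustration, the sketch below shows one way a data split for internal validation might be set up, assuming the data sit in a pandas DataFrame with a subject-identifier column; the column name ID, the 75/25 split fraction, and the function name are hypothetical. Population data are normally split by subject so that all records from one individual fall into the same portion.

    # Minimal sketch of data splitting for internal validation (hypothetical
    # column name "ID"); subjects, not individual observations, are assigned
    # to the development (index) and validation (test) sets.
    import numpy as np
    import pandas as pd

    def split_by_subject(data: pd.DataFrame, id_col: str = "ID",
                         dev_fraction: float = 0.75, seed: int = 42):
        """Return (development, validation) sub-DataFrames split by subject."""
        rng = np.random.default_rng(seed)
        subjects = data[id_col].unique()
        rng.shuffle(subjects)
        n_dev = int(round(dev_fraction * len(subjects)))
        dev_ids = set(subjects[:n_dev])
        return (data[data[id_col].isin(dev_ids)],
                data[~data[id_col].isin(dev_ids)])

The development portion is used to build the model and the reserved portion to check its predictive performance; the drawback, noted in the next excerpt, is that the model is then developed on only part of the available data.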

The resampling approaches of cross-validation (CV) and bootstrapping do not have the drawback of data splitting, in that all available data are used for model development, so the model provides an adequate description of the information contained in the gathered data. Cross-validation and bootstrapping are addressed in Chapter 15. One problem with CV deserves attention: repeated CV has been shown to be inconsistent. If one validates a model by CV and then randomly shuffles the data, the model may no longer be validated after the shuffle. [Pg.238]
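The toy sketch below illustrates why this can happen; the simulated data, the straight-line model, and the choice of k are all hypothetical. Because the CV error estimate depends on the random shuffle used to form the folds, a pass/fail criterion evaluated close to its threshold can flip when the data are reshuffled.

    # Toy illustration: the same data and model give different k-fold CV
    # error estimates under different random shuffles of the records.
    import numpy as np

    rng = np.random.default_rng(0)
    x = rng.uniform(0, 10, size=40)
    y = 2.0 + 0.5 * x + rng.normal(0.0, 1.0, size=40)   # simulated data

    def cv_rmse(x, y, k=5, seed=0):
        """k-fold CV root-mean-square prediction error of a straight-line fit."""
        idx = np.random.default_rng(seed).permutation(len(x))
        errors = []
        for fold in np.array_split(idx, k):
            train = np.setdiff1d(idx, fold)
            slope, intercept = np.polyfit(x[train], y[train], deg=1)
            errors.append((y[fold] - (intercept + slope * x[fold])) ** 2)
        return float(np.sqrt(np.mean(np.concatenate(errors))))

    for seed in range(3):   # three different shuffles, three different answers
        print(f"shuffle {seed}: CV RMSE = {cv_rmse(x, y, seed=seed):.3f}")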

Since a PM model may be used not only for the explanation of variability but also for predictions (28), being certain about the covariates that are retained in the model and about its predictive accuracy is important. Thus, the stability of the PM model (in terms of the covariates) and its predictive performance are essential. Stability is used in the sense of replication stability for inclusion of covariates in a model (29). Sample sizes are usually too small (especially in pediatric studies) to apply the well known and often recommended method of data splitting (30). With better computer facilities, a computer-intensive method such as the related bootstrap method has proved to be a practicable alternative (31) (see Chapter 15 of this text). The method proposed by Ette (31) for stability testing, to ensure that appropriate covariates are selected to build a PM model, is described below. [Pg.392]
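A minimal sketch of the bootstrap replication-stability idea follows; the data layout, the fit_and_select routine, and any inclusion-frequency cutoff applied to the result are hypothetical placeholders, since in practice each replicate would require a complete covariate-selection run.

    # Hedged sketch of bootstrap replication stability for covariate selection.
    # fit_and_select is a hypothetical callable that reruns covariate selection
    # on one bootstrap replicate and returns the names of retained covariates.
    from collections import Counter

    import numpy as np
    import pandas as pd

    def covariate_inclusion_frequency(data: pd.DataFrame, fit_and_select,
                                      id_col: str = "ID", n_boot: int = 200,
                                      seed: int = 1):
        """Resample subjects with replacement and tally how often each
        covariate is retained across the bootstrap replicates."""
        rng = np.random.default_rng(seed)
        subjects = data[id_col].unique()
        counts = Counter()
        for _ in range(n_boot):
            sampled = rng.choice(subjects, size=len(subjects), replace=True)
            replicate = pd.concat([data[data[id_col] == s] for s in sampled],
                                  ignore_index=True)
            counts.update(fit_and_select(replicate))
        return {cov: n / n_boot for cov, n in counts.items()}

Covariates retained in a large fraction of replicates would be regarded as replication-stable; the exact cutoff is a modelling decision rather than part of the technique.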

When an external data set is not available, internal data validation is done instead. This method, usually called data-splitting, splits the available data into two... [Pg.254]

Probabilistic QSAR models, Binomial PLS and local similarity assessment; N = 560 CYP3A4 and N = 526 CYP2D6 substrates; data split 70/30 into training and test sets; 14.2% false positives (CYP3A4), 6.6% false positives (CYP2D6) (ref. 204). [Pg.326]

Data splitting is fairly straightforward and is covered in detail in the next section on validation. It simply implies that the data to be modeled are partitioned based on differences in sampling (i.e., windows within which the suspect parameters θ are believed to be constant). The most common data splits used to explore pharmacokinetic time dependencies are single-dose, chronic non-steady-state, and steady-state conditions. The data subsets are modeled individually, with all parameter and variability estimates, along with any relevant covariate expressions, compared in a manner similar to a validation procedure (see next section). Data can also be combined in a leave-one-out strategy (see the cross-validation description) to examine the uniformity of the data windows. [Pg.335]
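A minimal sketch of such a partition is shown below; the flag column name REGIMEN and its labels are hypothetical, and any indicator that marks single-dose, chronic non-steady-state, and steady-state records would serve equally well.

    # Sketch of partitioning a data set by sampling window before modelling
    # each subset separately; "REGIMEN" is a hypothetical flag column.
    import pandas as pd

    def split_by_regimen(data: pd.DataFrame, flag_col: str = "REGIMEN"):
        """Return a dict mapping each regimen label to its subset of records."""
        return {label: subset.copy() for label, subset in data.groupby(flag_col)}

    # Each subset would then be fitted on its own and the parameter,
    # variability, and covariate estimates compared across windows,
    # much as in a validation exercise:
    # for label, subset in split_by_regimen(pk_data).items():
    #     fit_model(subset)   # hypothetical modelling call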

Hence, models with and without time dependencies can be compared to evaluate their predictive performance and ability to explain sources of variation. Rostami-Hodjegan et al. compared induction models and data-splitting techniques in their evaluation of adaptive changes in methadone kinetics. [Pg.336]

Cross-validation is a leave-one-out or leave-some-out validation technique in which part of the data set is reserved for validation. Essentially, it is a data-splitting technique. The distinction lies in the manner of the split and the number of data sets evaluated. In the strict sense, a k-fold cross-validation involves the division of the available data into k subsets of approximately equal size. Models are built k times, each time leaving out one of the subsets from the build. The k models are evaluated and compared as described previously, and a final model is defined based on the complete data set. Again, this technique, like all validation strategies, offers flexibility in its application. Mandema et al. successfully utilized a cross-validation strategy for a population pharmacokinetic analysis of oxycodone in which a portion of the data was reserved for an evaluation of predictive performance. Although not strictly a cross-validation, it does illustrate the spirit of the approach. [Pg.341]
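The sketch below illustrates the k-fold partition just described; only the index bookkeeping is shown, and build_model and evaluate are hypothetical placeholders for the actual model-building and comparison steps.

    # Minimal sketch of a k-fold partition: the data are divided into k
    # roughly equal subsets and the model is built k times, each time
    # leaving one subset out for evaluation.
    import numpy as np

    def k_fold_indices(n_records: int, k: int = 5, seed: int = 0):
        """Yield (build, holdout) index arrays for each of the k folds."""
        idx = np.random.default_rng(seed).permutation(n_records)
        for holdout in np.array_split(idx, k):
            yield np.setdiff1d(idx, holdout), holdout

    # for build_idx, holdout_idx in k_fold_indices(len(data), k=5):
    #     model = build_model(data.iloc[build_idx])        # hypothetical
    #     score = evaluate(model, data.iloc[holdout_idx])  # hypothetical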

Figure 20 Difference between the degree of polymerization of the adsorbing segments in System I (PS) and System II (4-BrS) in PBr 3-CUD and PBr 3 NB as a function of the mole fraction of 4-BrS in the PBr S copolymer. For each system, the data split into two groups depending on the bromination temperature. Bromination of PS below T = 32.8 °C produces PBr RCPs with a random-blocky distribution of the 4-BrS segments (b-PBr 3). Bromination of PS above T = 32.8 °C produces PBr RCPs with a more random distribution of the 4-BrS segments (r-PBr ).
X-ray single-crystal diffraction data. A split model, in which the Ce and oxygen atoms are shifted from their ideal positions, is also proposed in the paper. Synchrotron powder diffraction data. [Pg.130]

In Fig. 16.5 we can examine these data split by core raw material (largest component only). [Pg.426]

The use of a new data set or of data splitting for the purpose of cross-validation may not always be applicable or desirable. In addition, the results depend on the location of the split. An alternative is to define the prediction... [Pg.63]

Almoy T, Haugland E. Calibration methods for NIRS instruments: a theoretical evaluation and comparisons by data splitting and simulations. Appl Spectrosc 1994;48:327-332. [Pg.129]


