Analysis Dataset Models

Analysis Dataset Models (ADaM). The CDISC ADaM team provides guidance on defining the analysis data structures. These data sets are designed for creating statistical summaries and analyses. [Pg.5]

Statistical Analysis Dataset Model Change from Baseline Version 0.2. Prepared by the CDISC Analysis Dataset Modeling Team (ADaM), 2005. http //www.cdisc. [Pg.924]

The type of data standards that will be implemented should be stated. For example, datasets will be created using Study Data Tabulation Model (SDTM) and Analysis Data Model (ADaM) standards (as specified by the Clinical Data Interchange Standards Consortium at http //... [Pg.60]

FIGURE 11.3 Analysis of model performance over time. Every month the analysis is repeated with all newly obtained data from the in-house microsomal lability assay. (a) The model performance is given by the correlation coefficient between experimental and predicted values for each new dataset. The light bar shows the performance of the original model; the dark bars show the performance of a model that was updated on a monthly basis. Model update is done by adding all previously determined data to the training dataset. (b) The initial model was derived from a combined training and test set of 18,000 samples. After 36 months, the number of available samples has increased by 12,500 samples. [Pg.255]

Analysis of high-throughput data, seamlessness of large datasets, modeling of high-throughput datasets, information extraction, component architectures for visualization and computing... [Pg.189]

Another problem is to determine the optimal number of descriptors for the objects (patterns), such as for the structure of the molecule. A widespread observation is that one has to keep the number of descriptors as low as 20 % of the number of the objects in the dataset. However, this is correct only in the case of ordinary Multilinear Regression Analysis. Some more advanced methods, such as Projection of Latent Structures (or Partial Least Squares, PLS), use so-called latent variables to achieve both modeling and predictions. [Pg.205]
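The 20 % rule of thumb quoted above amounts to simple arithmetic, which the following minimal sketch illustrates. The function name and the integer-percent parameter are hypothetical choices made for this example; only the 20 % figure comes from the text, and the check is meaningful only for ordinary multilinear regression.

```python
def max_descriptors(n_objects: int, percent: int = 20) -> int:
    """Largest descriptor count allowed by the rule of thumb.

    Hypothetical helper: `percent` defaults to the 20 % quoted in the text.
    Integer arithmetic is used so the result is always rounded down.
    """
    if n_objects < 0 or percent < 0:
        raise ValueError("n_objects and percent must be non-negative")
    return n_objects * percent // 100


def within_rule_of_thumb(n_descriptors: int, n_objects: int) -> bool:
    """True if the descriptor count does not exceed 20 % of the objects."""
    return n_descriptors <= max_descriptors(n_objects)
```

For a dataset of 100 objects this gives an upper bound of 20 descriptors. Latent-variable methods such as PLS are not bound by this limit in the same way.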

Five percent random error was added to the error-free dataset to make the simulation more realistic. Data for kinetic analysis are presented in Table 6.4.3 (Berty 1989), and were given to the participants to develop a kinetic model for design purposes. For a more practical comparison, participants were asked to simulate the performance of a well defined shell and tube reactor of industrial size at well defined process conditions. Participants came from 8 countries and a total of 19 working groups. Some submitted more than one model. The explicit models are listed in loc.cit. and here only those results that can be graphically presented are given. [Pg.133]

Once soil samples have been analyzed and it is certain that the corresponding results reflect the proper depths and time intervals, the selection of a method to calculate dissipation times may begin. Many equations and approaches have been used to help describe dissipation kinetics of organic compounds in soil. Selection of the equation or model is important, but it is equally important to be sure that the selected model is appropriate for the dataset that is being described. To determine if the selected model properly described the data, it is necessary to examine the statistical assumptions for valid regression analysis. [Pg.880]
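As one possible illustration of fitting a dissipation model and then inspecting it, the sketch below assumes simple first-order kinetics, C(t) = C0 exp(-kt). The text does not say which equation was used, so this model choice, the function names, and the synthetic data are all assumptions. The fit is an ordinary least-squares regression of ln C against time, and the returned residuals are what one would examine when checking the regression assumptions.

```python
import math


def fit_first_order(times, concentrations):
    """Least-squares fit of ln(C) = ln(C0) - k*t.

    Returns (c0, k, residuals), where the residuals are on the ln(C) scale.
    Assumes first-order kinetics and strictly positive concentrations.
    """
    if len(times) != len(concentrations) or len(times) < 3:
        raise ValueError("need at least three paired observations")
    logs = [math.log(c) for c in concentrations]
    n = len(times)
    t_mean = sum(times) / n
    y_mean = sum(logs) / n
    sxx = sum((t - t_mean) ** 2 for t in times)
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in zip(times, logs))
    slope = sxy / sxx
    intercept = y_mean - slope * t_mean
    residuals = [y - (intercept + slope * t) for t, y in zip(times, logs)]
    return math.exp(intercept), -slope, residuals


def dt50(k):
    """Time for 50 % dissipation under first-order kinetics."""
    return math.log(2) / k


# Synthetic, error-free data generated from C0 = 10 and k = 0.05 per day.
times = [0, 7, 14, 28, 56]
conc = [10.0 * math.exp(-0.05 * t) for t in times]
c0, k, residuals = fit_first_order(times, conc)
```

With real data, a systematic pattern in the residuals (for example, consistently positive values at late sampling times) would suggest that the first-order model is not appropriate for the dataset.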

Using our dataset, which includes all of the descriptors mentioned so far, we conducted a PLS analysis using SIMCA software [34]. In the initial PLS model, MW, V, and α (Alpha) were removed because they are in each case highly correlated with CMR (r > 0.95). SIMCA's VIP function selected only qmin (Qnegmin) for removal on the basis of it making no important contribution to the model. In the second model, Σq+/α (SQpos A) and ΣCa/α (SCa A) coincided nearly exactly in the three-component space; of these two, we decided to keep only ΣCa/α in the third and final model. This model consisted of three components and accounted for 75% of the variance in log SQ; the Q2 value was 0.66. [Pg.238]

A simple protocol was used to build the compounds: compounds were modeled with the corresponding net charges, after which 2D-3D structure conversion was carried out using the program Concord [21]. The 3D dataset obtained was submitted to the VolSurf program, and principal component analysis (PCA) was applied for chemometric interpretation. No metabolic stability information was applied to the model. [Pg.417]

The modeler first encounters basis swapping in setting up a model, when it may be necessary to swap the basis to constrain the calculation. The thermodynamic dataset contains reactions written in terms of a preset basis that includes water and certain aqueous species (Na+, Ca++, K+, Cl-, HCO3-, SO4--, H+, and so on) normally encountered in a chemical analysis. Some of the members of the original basis are likely to be appropriate for a calculation. When a mineral appears at equilibrium or a gas at known fugacity appears as a constraint, however, the modeler needs to swap the mineral or gas in question into the basis in place of one of these species. [Pg.71]

The model that utilized regression analysis was one that built upon previous work by the same authors [36,39]. In this case, the dataset was expanded to 125-129 drugs and the number of assessed descriptors increased to 210. Models for acidic and basic compounds were developed separately as well as a model using all compounds, and the advantages of analyzing acids and bases separately were minimal. Mean-fold errors were generally around 1.8. Descriptors that dominated the models included lipophilicity, fraction anionic or cationic, surface electrostatic potential, and parameters specific to aliphatic carbons and fluorine. [Pg.484]

The residuals (log 1/C observed - log 1/C predicted) for the entire dataset were examined in detail for each key regression equation. Analysis of major residuals not only allowed perception of true "outliers" but also provided insights for improving the model, i.e., sharpening hypotheses and concepts. [Pg.326]
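The residual calculation described above can be sketched in a few lines. The residual definition (observed minus predicted) follows the text. The cutoff of two sample standard deviations for a "major" residual, the function names, and the example values are assumptions made for this illustration only.

```python
import statistics


def compute_residuals(observed, predicted):
    """Residual = observed - predicted, e.g. on the log 1/C scale."""
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have equal length")
    return [o - p for o, p in zip(observed, predicted)]


def flag_major_residuals(residuals, n_sd=2.0):
    """Indices whose absolute residual exceeds n_sd sample standard deviations.

    The 2-SD default is an assumed cutoff, not one taken from the source.
    """
    sd = statistics.stdev(residuals)
    return [i for i, r in enumerate(residuals) if abs(r) > n_sd * sd]


# Made-up log 1/C values; the fifth compound is deliberately mispredicted.
observed = [5.1, 4.8, 6.0, 5.5, 7.9, 5.2]
predicted = [5.0, 4.9, 6.1, 5.4, 6.0, 5.3]
res = compute_residuals(observed, predicted)
major = flag_major_residuals(res)
```

A flagged compound is only a candidate outlier. As the passage notes, inspecting why it deviates is what yields insight into the model.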

Contrary to the practical results reviewed above, statistics from correlation work revealed a serious deficiency in the accuracy of the Phase I Equations 3 and 4 predictions for the Phase II dataset: r for Equation 3 predictions for the 103-compound Phase II data was only 0.45; r for Equation 4 predictions for the Phase II dataset was only 0.44. An analysis of the residuals for the Phase II dataset [Potency(observed) - Potency(predicted by Phase I models)] immediately identified the source of the problem: of the 26 Phase II compounds having DICARB > 4, 17 had observed adult potency more than one log unit better than predicted, and 15 had observed egg potency more than one log unit better than predicted. As schematically shown in Figure 2B, the parabolic functions for DICARB for the Phase I models underpredict at values of DICARB extrapolated beyond those represented in the Phase I dataset. [Pg.335]

Big Chemical Encyclopedia
