Big Chemical Encyclopedia


Variable selection methods

Variable selection methods have also been adopted for region selection in the area of 3D QSAR. For example, GOLPE [31] was developed from chemometric principles, and q2-GRS [32] is based on independent CoMFA analyses of small areas of near-molecular space, both addressing the issue of optimal region selection in CoMFA analysis. Both of these methods have been shown to improve QSAR models compared with the original CoMFA technique. [Pg.313]

Narayanan R, Gunturi SB. In silico ADME modelling: prediction models for blood-brain barrier permeation using a systematic variable selection method. Bioorg Med Chem 2005;13:3017-28. [Pg.510]

Coupling Fast Variable Selection Methods to Neural Network-Based Classifiers Application to Multi-Sensor Systems. [Pg.388]

We may suppose that not all 600 wavelengths are useful for the prediction of nitrogen content. A variable selection method, the genetic algorithm (GA, Section 4.5.6), has been applied, resulting in a subset of only five variables (wavelengths). Figure 1.3c and d shows that models with these five variables are better than models... [Pg.23]
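The idea of a GA over variable-inclusion bitmasks can be sketched as follows. This is a toy illustration with 8 synthetic variables rather than 600 wavelengths; the AIC-style fitness, population size, and mutation rate are arbitrary choices for demonstration, not those of Section 4.5.6:

```python
import numpy as np

rng = np.random.default_rng(1)

def fitness(mask, X, y):
    """AIC-like score: reward low residual error, penalize subset size."""
    k = int(mask.sum())
    if k == 0:
        return -np.inf
    Xs = X[:, mask]
    beta, *_ = np.linalg.lstsq(Xs, y, rcond=None)
    r = y - Xs @ beta
    return -(len(y) * np.log(float(r @ r) / len(y)) + 2 * k)

def ga_select(X, y, pop_size=30, n_gen=40, p_mut=0.05):
    """Toy genetic algorithm over variable-inclusion bitmasks."""
    n_var = X.shape[1]
    pop = rng.random((pop_size, n_var)) < 0.3                     # random initial subsets
    for _ in range(n_gen):
        scores = np.array([fitness(m, X, y) for m in pop])
        parents = pop[np.argsort(scores)[::-1][: pop_size // 2]]  # keep the best half
        a = parents[rng.integers(len(parents), size=pop_size)]
        b = parents[rng.integers(len(parents), size=pop_size)]
        cut = rng.integers(1, n_var, size=pop_size)[:, None]      # one-point crossover
        children = np.where(np.arange(n_var) < cut, a, b)
        children ^= rng.random(children.shape) < p_mut            # bit-flip mutation
        pop = children
    scores = np.array([fitness(m, X, y) for m in pop])
    return pop[int(np.argmax(scores))]

X = rng.normal(size=(60, 8))
y = 3.0 * X[:, 2] - 2.0 * X[:, 5] + 0.1 * rng.normal(size=60)
best = ga_select(X, y)
print(np.flatnonzero(best))   # should contain the informative columns 2 and 5
```

Because fitness penalizes subset size, the search is driven toward small subsets that still fit well, mirroring the five-of-600 result described above.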

A stepwise variable selection method adds or drops one variable at a time. Basically, there are three possible procedures (Miller 2002) ... [Pg.154]
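One such procedure, forward selection, can be sketched as follows. This is a minimal illustration on synthetic data with two informative columns, not Miller's formulation:

```python
import numpy as np

def rss(X, y):
    """Residual sum of squares of a least-squares fit of y on X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    r = y - X @ beta
    return float(r @ r)

def forward_stepwise(X, y, n_select):
    """Add one variable at a time, picking the column that most reduces RSS."""
    selected, remaining = [], list(range(X.shape[1]))
    for _ in range(n_select):
        best = min(remaining, key=lambda j: rss(X[:, selected + [j]], y))
        selected.append(best)
        remaining.remove(best)
    return selected

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 8))
y = 3.0 * X[:, 2] - 2.0 * X[:, 5] + 0.1 * rng.normal(size=50)
print(forward_stepwise(X, y, 2))  # → [2, 5]
```

Backward elimination works the same way in reverse, starting from the full model and dropping the variable whose removal increases RSS least.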

In Section 4.8.2, we will describe a method called Lasso regression. Depending on a tuning parameter, this regression technique forces some of the regression coefficients to be exactly zero. Thus the method can be viewed as a variable selection method where all variables with coefficients different from zero are in the final regression model. [Pg.157]
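For intuition about why coefficients become exactly zero, consider the well-known orthonormal-design special case, where the Lasso solution is a soft-thresholding of the OLS coefficients. This is a sketch of that special case, not the general algorithm of Section 4.8.2; the threshold value 1.0 stands in for the tuning parameter:

```python
import numpy as np

def soft_threshold(b, lam):
    """Lasso solution per coefficient under an orthonormal design:
    shrink the OLS estimate toward zero and clip small ones at zero."""
    return np.sign(b) * np.maximum(np.abs(b) - lam, 0.0)

b_ols = np.array([2.5, -0.3, 0.0, 1.1, -1.8])
print(soft_threshold(b_ols, 1.0))
# coefficients with |b| <= 1 are set exactly to zero; the rest shrink by 1
```

The variables whose coefficients survive the thresholding are precisely those "in the final regression model" in the sense described above.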

Since the value of H depends on the choice of , modifications of this procedure have been proposed (Fernández Pierna and Massart 2000). Another modification of the Hopkins statistic—published in the chemometrics literature—concerns the distributions of the values of the variables used (Hodes 1992; Jurs and Lawson 1991; Lawson and Jurs 1990). The Hopkins statistic has been suggested for evaluating variable selection methods with the aim of finding a variable set (for instance, molecular descriptors) that gives distinct clustering of the objects (for instance, chemical structures)—hoping that the clusters reflect, for instance, different biological activities (Lawson and Jurs 1990). [Pg.286]
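One common form of the Hopkins statistic can be sketched as follows. The sample size m and the seed are arbitrary, and conventions for H vary across the literature; in this form, H near 0.5 suggests uniform scatter and H near 1 suggests clustering:

```python
import numpy as np

def hopkins(X, m=20, seed=7):
    """Hopkins statistic (one common form): compare nearest-neighbour
    distances of m uniform points in the data's bounding box (u) with
    those of m sampled data points (w); return u / (u + w)."""
    rng = np.random.default_rng(seed)
    lo, hi = X.min(axis=0), X.max(axis=0)
    uniform_pts = rng.uniform(lo, hi, size=(m, X.shape[1]))
    sample_idx = rng.choice(len(X), size=m, replace=False)

    def nn_dist(p, exclude=None):
        d = np.linalg.norm(X - p, axis=1)
        if exclude is not None:
            d[exclude] = np.inf      # a data point is not its own neighbour
        return d.min()

    u = sum(nn_dist(p) for p in uniform_pts)
    w = sum(nn_dist(X[i], exclude=i) for i in sample_idx)
    return u / (u + w)

rng = np.random.default_rng(0)
clustered = np.vstack([rng.normal(0.0, 0.05, (100, 2)),
                       rng.normal(5.0, 0.05, (100, 2))])
scattered = rng.uniform(0.0, 5.0, (200, 2))
print(hopkins(clustered), hopkins(scattered))  # high vs. roughly 0.5
```

Evaluating H over candidate descriptor subsets, as suggested above, amounts to preferring the subset in which the objects cluster most distinctly.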

The rather time- and cost-intensive preparation of primary brain microvessel endothelial cells, as well as the limited number of experiments which can be performed with intact brain capillaries, has led to attempts to predict the blood-brain barrier permeability of new chemical entities in silico. Artificial neural networks have been developed to predict the ratios of the steady-state concentrations of drugs in the brain to those in the blood from their structural parameters [117, 118]. A summary of the current efforts is given in Chap. 25. Quantitative structure-property relationship models based on in vivo blood-brain permeation data and systematic variable selection methods led to success rates of prediction of over 80% for barrier-permeant and nonpermeant compounds, thus offering a tool for virtual screening of substances of interest [119]. [Pg.410]

Narayanan and Gunturi [33] developed QSPR models based on in vivo blood-brain permeation data of 88 diverse compounds, 324 descriptors, and a systematic variable selection method called Variable Selection and Modeling method based on the Prediction (VSMP). VSMP efficiently explored all... [Pg.541]

Cruciani, G. and Watson, K.A. Comparative molecular field analysis using GRID force-field and GOLPE variable selection methods in a study of inhibitors of glycogen phosphorylase b. [Pg.139]

With these arguments as a backdrop, I will review some empirical variable selection methods in addition to the prior knowledge-based, stepwise, and all-possible-combinations methods discussed earlier in the MLR section (Section 12.3.2). [Pg.423]

The variable selection methods discussed above certainly do not cover all selection methods that have been proposed, and there are several other methods that could be quite effective for PAT applications. These include a modified version of a PLS algorithm that includes interactive variable selection [102], and a combination of GA selection with wavelet transform data compression [25]. [Pg.424]

Candidate variables were chosen using a mixed-variable selection method and validated based on prediction ability. Separate models (with different measurement variables) were estimated for each of the components. The final models and measures of performance are as follows (see Table 5.11 for a description of these figures of merit)... [Pg.136]

A mixed variable-selection method is used to select variables with the Probability to Enter and Probability to Leave values set to 0.05. Separate models are constructed to predict components A and B. The calibration data are used to select the variables to include in the model, and the validation data are used to further refine the model to optimize the predictive ability. [Pg.310]

The mixed-variable selection method is used with the probability to enter and probability to leave values set to 0.03. Only a model for the prediction of caustic concentration is developed, which means that the values for the salt, water, and temperatures are not used in the calculations (even though in this case they are known). The calibration data (X in Figure 5.42) consisting... [Pg.318]

Methods that select covariates by choosing those that are most strongly associated with the primary outcome (often called "variable selection methods") should be avoided. The clinical and statistical relevance of a covariate should be assessed and justified from a source other than the current dataset. [Pg.108]

It should be noted that there are other multivariate variable selection methods that one could consider. For example, the interactive variable selection (IVS) method71 is a modification of the PLS method itself, in which different sets of X-variables are removed from the PLS weights (W, see Equation 8.37) of each latent variable in order to assess the usefulness of each X-variable in the final PLS model. [Pg.316]

Cruciani, G., Watson, K.A. Comparative Molecular Field Analysis Using GRID Force Field and GOLPE Variable Selection Methods in a Study of Inhibitors of... [Pg.245]

A partially Bayesian approach was suggested by Chipman et al. (1997). They used independent prior distributions for each main effect being active. The prior distribution selected for βj was a mixture of normals, namely, N(0, τj²) with prior probability 1 − πj and N(0, cj²τj²) with prior probability πj, where cj greatly exceeds 1. The prior distribution for σ² was a scaled inverse-χ². They then used the Gibbs-sampling-based stochastic search variable selection method of George and McCulloch (1993) to obtain approximate posterior probabilities for βj, that is, for each factor they obtained the posterior probability that βj is from N(0, cj²τj²) rather than from N(0, τj²). They treated this as a posterior probability that the corresponding factor is active and used these probabilities to evaluate the posterior probability of each model. [Pg.182]
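The key conditional quantity in such a Gibbs sampler, the probability that a given coefficient draw comes from the wide "active" component rather than the narrow one, can be sketched as follows. The parameter values here are hypothetical illustrations, not those of the Chipman et al. implementation:

```python
import math

def normal_pdf(x, sd):
    """Density of N(0, sd^2) at x."""
    return math.exp(-0.5 * (x / sd) ** 2) / (sd * math.sqrt(2 * math.pi))

def prob_active(beta_j, tau, c, pi_j):
    """P(factor j active | beta_j) under the spike-and-slab mixture prior:
    beta_j ~ (1 - pi_j) N(0, tau^2) + pi_j N(0, (c*tau)^2), with c >> 1."""
    slab = pi_j * normal_pdf(beta_j, c * tau)
    spike = (1 - pi_j) * normal_pdf(beta_j, tau)
    return slab / (slab + spike)

# a draw far outside the spike's scale is almost surely from the active component
print(prob_active(2.0, tau=0.1, c=10, pi_j=0.25))
# a draw near zero is most likely from the inactive (spike) component
print(prob_active(0.05, tau=0.1, c=10, pi_j=0.25))
```

Averaging these indicator probabilities over the Gibbs draws gives the approximate posterior activity probabilities described above.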

One of the most common problems encountered in data modeling is the choice of independent variables for inclusion in the model. Perhaps one of the major disadvantages of any variable selection method based on an underlying linear model is the fact that the form of the model is specified in advance. If some nonlinear model better fits the data, then this may be used as the underlying model for the variable selection process. A neural network properly fitted to a data set should make use of the best nonlinear model as dictated by the data itself. Thus a variable selection procedure applied to the network model might be expected to extract the most relevant set of variables, at least in terms of modeling the output (response) variable. [Pg.154]

Manavalan P., Johnson W.C. Jr. (1987) Variable Selection Method Improves the Prediction of Protein Secondary Structure from Circular Dichroism Spectra, Anal. Biochem. 167, 76-85. [Pg.293]

For illustration, we shall consider here one of the nonlinear variable selection methods, which adapts the k-Nearest Neighbor (kNN) principle to QSAR [kNN-QSAR (49)]. Formally, this method implements the active analog principle that lies at the foundation of modern medicinal chemistry. The kNN-QSAR method employs multiple topological (2D) or topographical (3D) descriptors of chemical structures and predicts the biological activity of any compound as the average activity of its k most similar molecules. This method can be used to analyze the structure-activity relationships (SAR) of large numbers of compounds where a nonlinear SAR may predominate. [Pg.62]
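The prediction step of this approach can be sketched as follows. The descriptors and activities below are made up for illustration, and the published method's selection of k and of the descriptor subset is omitted:

```python
import numpy as np

def knn_qsar_predict(query, descriptors, activities, k=3):
    """Predict activity as the mean activity of the k most similar
    training compounds (Euclidean distance in descriptor space)."""
    d = np.linalg.norm(descriptors - query, axis=1)
    nearest = np.argsort(d)[:k]
    return activities[nearest].mean()

# toy descriptor matrix (rows = compounds) and measured activities
descriptors = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
                        [5.0, 5.0], [5.1, 5.0]])
activities = np.array([1.0, 1.2, 1.1, 8.0, 8.2])
print(knn_qsar_predict(np.array([0.05, 0.05]), descriptors, activities, k=3))
# ≈ 1.1, the mean activity of the three nearby compounds
```

Because the prediction is a local average rather than a global fit, no functional form is imposed, which is what allows a nonlinear SAR to be captured.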

In order to avoid some drawbacks of the stepwise approaches, the i-fold stepwise variable selection method was recently proposed [Lucic et al., 1999b]. This technique is based on descriptor orthogonalization and, at each subsequent step, adds the set of the best i descriptors. [Pg.468]

A variable selection method proposed to decrease the huge number of models that must be evaluated by all subset model searching. [Pg.470]

Among these methods, Generating Optimal Linear PLS Estimations (GOLPE) is a variable selection method that selects, by experimental design, a limited number of interaction energy values, aimed at obtaining the best predictive PLS models. [Pg.473]

An exploratory analysis performed by FSIW-EFA provides an estimate of the number of components in each pixel. For resolution purposes, only those pixels in the partial local rank map will be potentially constrained, because these are the pixels for which a robust estimation of the number of missing components can be obtained. However, the FSIW-EFA information is not sufficient to identify which components are absent from the constrained pixels. For identification purposes, the local rank information should be combined with reference spectral information, the ideal reference being the pure spectra of the constituents, although in most images not all of these are known. For the image components with no pure spectrum available, the reference taken is an approximation of this pure spectrum. These approximate pure spectra can be obtained by pure variable selection methods, or they may be the result of a simpler MCR-ALS analysis where only non-negativity constraints have been applied. [Pg.92]

Other specific PLS-based variable selection methods have been proposed and are presented below. [Pg.854]

