Published Variable Selection Methods

The number of published accounts of variable selection methods in the general literature is enormous. To provide a focus, this section will concentrate just on applications to computer-aided drug design. Variable selection was identified as an important requirement at about the same time as the need for variable elimination techniques. The simplest method of variable selection is to choose those variables that have a large correlation with the response and, for simple datasets, that method is probably not a bad choice. As we have shown in this chapter, variable selection may be an integral part of a modeling technique, but not all modeling methods lend themselves to variable selection, and in these cases, other techniques need to be applied. [Pg.339]
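The simplest approach mentioned above, choosing the variables most strongly correlated with the response, can be sketched as follows. This is a minimal illustration, not a method taken from the source: the function name `rank_by_correlation`, the use of the absolute Pearson correlation, and the `top_k` cutoff are all assumptions made for the example.

```python
import numpy as np

def rank_by_correlation(X, y, top_k=None):
    """Rank the columns of X by absolute Pearson correlation with y.

    Hypothetical helper for illustration; X is (n_samples, n_variables).
    Returns the column indices (best first) and the signed correlations.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    Xc = X - X.mean(axis=0)          # center each variable
    yc = y - y.mean()                # center the response
    denom = np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum())
    # Constant columns have zero variance; report their correlation as 0.
    with np.errstate(divide="ignore", invalid="ignore"):
        r = np.where(denom > 0, (Xc.T @ yc) / denom, 0.0)
    order = np.argsort(-np.abs(r))   # largest |r| first
    if top_k is not None:
        order = order[:top_k]
    return order, r
```

A ranking of this kind looks at each variable in isolation, so it ignores correlations among the variables themselves, which is one reason it is best suited to simple datasets.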

Genetic algorithm (GA)-based approaches have optimized the production of many different types of models (artificial neural network architectures in particular) and have simultaneously selected variables and optimized neural network models. GAs coupled with well-known and less well-known modeling methods have also been used for variable selection. The combination of a GA with multiple linear regression was shown to perform well on datasets containing 15, 26, and 35 descriptors. PLS coupled with a GA has also been shown to be useful in variable selection. Spline fitting [Pg.340]

A great deal of interest exists in methods for variable selection as well as for model evaluation, which are actually two sides of the same coin. Bayesian neural networks include a procedure called automatic relevance determination (ARD), which allows important variables to be identified. A k-nearest neighbor method for variable selection has been applied successfully to problems of biological activity and metabolic stability. Other [Pg.340]

Given these different approaches to variable selection, is it possible to recommend a particular procedure? The answer is no, unfortunately, because these reports involve many different datasets, model building strategies, and intended aims, which makes direct comparisons difficult. Reports comparing variable selection strategies are beginning to appear, however, and no doubt in the fullness of time a superior approach or sequence of approaches will be identified. [Pg.341]

The development of computational chemistry software and techniques, coupled with the increasing speed and decreasing costs of computing machinery, has transformed computer-aided material design, particularly drug design. The calculation of many chemical descriptors for almost any kind of molecule is now a trivial problem. Variable selection, however, is not trivial and becomes necessary when a dataset contains many variables. What constitutes "many" depends both on the use that the scientist will make of the data and on the ratio of data points (cases) to variables. [Pg.341]

Since the value of H depends on the choice of , modifications of this procedure have been proposed (Fernandez Pierna and Massart 2000). Another modification of the Hopkins statistic, published in the chemometrics literature, concerns the distributions of the values of the variables used (Hodes 1992; Jurs and Lawson 1991; Lawson and Jurs 1990). The Hopkins statistic has been suggested for evaluating variable selection methods, with the aim of finding a variable set (for instance, molecular descriptors) that gives distinct clustering of the objects (for instance, chemical structures), in the hope that the clusters reflect, for instance, different biological activities (Lawson and Jurs 1990). [Pg.286]
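To make the idea of a clustering-tendency score concrete, the sketch below computes one common form of the Hopkins statistic. The excerpt does not give the formula, so the convention used here is an assumption: H = sum(u) / (sum(u) + sum(w)), where u are nearest-neighbor distances from artificial uniform points to the data and w are nearest-neighbor distances from sampled real objects to the other objects. The function name, the default sample size `m`, and the rectangular sampling window are likewise illustrative choices.

```python
import numpy as np

def hopkins_statistic(X, m=None, seed=0):
    """Hopkins-type clustering tendency of the rows of X (assumed convention).

    Values near 0.5 suggest no clustering tendency under this convention;
    values approaching 1 suggest clustered data.
    """
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    if m is None:
        m = max(1, n // 10)          # assumed default: 10% of the objects
    rng = np.random.default_rng(seed)

    # m real objects, sampled without replacement
    idx = rng.choice(n, size=m, replace=False)
    # m artificial points, uniform within the per-variable data range
    lo, hi = X.min(axis=0), X.max(axis=0)
    U = rng.uniform(lo, hi, size=(m, p))

    def nn_dist(points, exclude=None):
        # Euclidean distance from each point to every object in X
        d = np.linalg.norm(points[:, None, :] - X[None, :, :], axis=2)
        if exclude is not None:
            # a sampled object must not be matched with itself
            d[np.arange(len(points)), exclude] = np.inf
        return d.min(axis=1)

    w = nn_dist(X[idx], exclude=idx)  # real object -> nearest other object
    u = nn_dist(U)                    # artificial point -> nearest object
    return u.sum() / (u.sum() + w.sum())
```

Because the result depends on the random sample, on `m`, and on the sampling window, comparing candidate variable sets is more meaningful when these settings are held fixed or the statistic is averaged over several seeds.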

Several methods have been published for detecting multicollinearity. Because the focus of this chapter is on variable selection methods, we will merely list some of the earlier popular methods and discuss in more detail a method developed specifically to assess the amount of multicollinearity present, together with the group or groups of variables involved. A more detailed account of these methods can be found elsewhere... [Pg.304]
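The excerpt does not name the methods it goes on to list. As one widely used diagnostic, offered here only as an illustration and not as the method the chapter describes, the variance inflation factor (VIF) regresses each variable on all the others and reports 1 / (1 - R^2). The function name and the handling of exactly collinear columns are assumptions.

```python
import numpy as np

def variance_inflation_factors(X):
    """VIF for each column of X (n_samples, n_variables).

    Illustrative helper: each variable is regressed on the remaining ones
    (with an intercept), and VIF = 1 / (1 - R^2) = SS_total / SS_residual.
    Assumes no column is constant.
    """
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    vifs = np.empty(p)
    for j in range(p):
        target = X[:, j]
        others = np.delete(X, j, axis=1)
        A = np.column_stack([np.ones(n), others])   # intercept + other variables
        coef, *_ = np.linalg.lstsq(A, target, rcond=None)
        resid = target - A @ coef
        ss_res = resid @ resid
        ss_tot = ((target - target.mean()) ** 2).sum()
        # An exact linear dependence leaves no residual: report infinity.
        vifs[j] = ss_tot / ss_res if ss_res > 0 else np.inf
    return vifs
```

Large values flag variables that are nearly linear combinations of the others, although a VIF alone does not show which group of variables is involved.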

The number of papers published in the literature relating to supervised variable selection methods is vast. Its obvious association with the ubiquitous publication of regression modeling means that trying to provide an exhaustive overview of the literature is an impossible task. Although this chapter aims to review variable selection methods, all that we can hope to achieve realistically is a broad coverage of the more obvious source methods and report here techniques that are sound from a statistical point of view and for which the software is available... [Pg.309]

Many current multidimensional methods are based on instruments that combine measurements of several luminescence variables and present a multiparameter data set. The challenge of analyzing such complex data has stimulated the application of special mathematical methods (80-85) that are made practical only with the aid of computers. It is to be expected that future analytical strategies will rely heavily on computerized pattern recognition methods (79, 86) applied to libraries of standardized multidimensional spectra, a development that will require that published luminescence spectra be routinely corrected for instrumental artifacts. Warner et al. (84) have discussed the multiparameter nature of luminescence measurements in detail and list fourteen different parameters that can be combined in various ways for simultaneous measurement, thereby maximizing luminescence selectivity with multidimensional measurements. Table II is adapted from their paper with the inclusion of a few additional parameters. [Pg.12]

Ridge regression (RR) is similar to principal components regression (PCR) in that the independent variables are transformed to their principal components (PCs). However, while PCR utilizes only a subset of the PCs, RR retains them all but downweights them based on their eigenvalues. With partial least squares (PLS), a subset of the PCs is also used, but the PCs are selected by considering both the independent and dependent variables. Statistical theory suggests that RR is the best of the three methods, and this has been generally borne out in multiple comparative studies [30,36-38]. Thus, some of our published studies report RR results only. [Pg.486]
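The contrast between RR and PCR described above can be made explicit in the PC basis. In the sketch below, which assumes centered data of full column rank, ridge multiplies the least-squares coefficient along each PC by s^2 / (s^2 + lam), where s^2 is the corresponding eigenvalue of X'X, while PCR multiplies it by 1 for the first k PCs and by 0 for the rest. The function name and the parameters `lam` and `k` are illustrative assumptions, not notation from the source.

```python
import numpy as np

def ridge_and_pcr(X, y, lam, k):
    """Ridge and PCR coefficients written as shrinkage in the PC basis.

    Illustrative sketch; assumes X (n_samples, n_variables) has full column
    rank after centering. Returns both coefficient vectors and the per-PC
    shrinkage factors each method applies.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    # Rows of Vt are the PC directions; s**2 are the eigenvalues of Xc.T @ Xc,
    # returned in descending order.
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    ols_in_pc = (U.T @ yc) / s                     # least-squares coefficients per PC
    ridge_factor = s ** 2 / (s ** 2 + lam)         # smooth downweighting of every PC
    pcr_factor = (np.arange(len(s)) < k).astype(float)  # keep first k PCs, drop the rest
    b_ridge = Vt.T @ (ridge_factor * ols_in_pc)
    b_pcr = Vt.T @ (pcr_factor * ols_in_pc)
    return b_ridge, b_pcr, ridge_factor, pcr_factor
```

Seen this way, the two methods differ only in the weighting function: PCR applies a hard cutoff, whereas ridge shrinks low-eigenvalue directions most strongly without discarding any of them.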

Propene on HY was, therefore, selected for the first in situ variable-temperature study using the CAVERN method. These experiments were carried out in early 1988 and published in 1989 (93). The central features of the CAVERN experiments were that the propene was introduced into the zeolite at cryogenic temperature and the sample was manipulated so that spectral acquisition could commence with an unreacted sample. Additional spectra were then acquired as the sample was slowly raised to room temperature. Detailed experiments of this sort were carried out for propene-2-13C and propene-1-13C, and less extensive experiments were performed for propene-3-13C. These experiments showed, among other things, that the 250 ppm peak was formed coincident with a second peak at ca. 156 ppm and that the relative intensities of these peaks were 2:1. A careful study of the literature of carbenium ion chemistry in sulfuric acid and superacid solution media suggested the assignment of these resonances (250 and 156 ppm) to alkyl-substituted cyclopentenyl cations similar to 4. [Pg.141]

Influence of catalyst preparation, composition, and structure on activity and selectivity. There is an extensive literature, some of which has been covered incidentally already. An excellent review has been published by Ripperger and Saum. Much of the work relates to the catalysts as oxides and so will not be covered in detail in this Report. Variables that influence catalytic properties include preparation (e.g., method and order of addition of active components), pretreatment (drying and calcination, pre-reduction, sulphiding), composition (e.g., concentration and ratio of active components and the type of support), distribution of active components in catalyst particles, particle size, and surface area and pore size distribution. [Pg.200]

Published performance data on various pieces of equipment and different materials are still relatively scarce. More of this work would help in the selection of equipment and in the understanding of the effects of equipment and particle variables. Examples that clearly illustrate sound methods of measurement, calculation of product uniformity, and control of product quality, together with photographs of mixing and particle movements, would also be helpful in developing this area. [Pg.319]

A review of the important aspects of current reliability theory has been published by the British Construction Industry Research and Information Association [61]. Only an outline of the basic ideas will be reviewed here. Methods of safety analysis grouped under the general heading of reliability theory have been categorised into three levels as follows: level 1 includes methods in which appropriate levels of structural reliability are provided on a structural element (member) basis, by the specification of partial safety factors and characteristic values of basic variables; level 2 includes methods which check probabilities of failure at selected points on a failure boundary defined by a given limit state equation; this is distinct from level 3, which includes methods of exact probabilistic analysis for a whole structural system, using full probability distributions with probabilities of failure interpreted as relative frequencies. [Pg.79]

In the last thirty years, data which have been published on the level of serum uric acid in man have varied considerably. The main reason for this is a considerable increase in the uric acid values. Furthermore, different laboratory methods (mostly improved laboratory techniques), the selection of the treatment groups, the different definition of the critical value, which is often mistaken for the middle or average value, and last but not least the different conditions for blood taking have all been responsible for these variable data. [Pg.1]
