Partition-Based Methods

Partition-based modeling methods are also called subset selection methods because they select a smaller subset of the most relevant inputs. The resulting model is often physically interpretable because the model is developed by explicitly selecting the input variable that is most relevant to approximating the output. This approach works best when the variables are independent (De Veaux et al., 1993). The variables selected by these methods can be used as the analyzed inputs for the interpretation step. [Pg.41]

The general empirical model given by Eq. (22) can be specialized to that determined by partition-based methods as [Pg.41]

The basis functions in a CART or inductive decision tree model are given by [Pg.41]

Here, P_m is the number of partitions or splits; s_pm = ±1 indicates the right or left of the associated step function; v(p, m) indicates the selected input variable in each partition; and t_pm represents the location of the split in the corresponding input space. The indices p and m are used for the split and the node or basis function, respectively. The basis functions given by Eq. (34) are of a fixed, piecewise constant shape. [Pg.42]
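
The equation itself is not reproduced in this excerpt. As a rough sketch only, the following assumes the product-of-step-functions form common in the CART and MARS literature: a node's basis function is the product, over its splits, of a unit step applied to s * (x[v] - t), with the sign s taking the value +1 or -1. The function names and the example node are hypothetical.

```python
from typing import Sequence, Tuple

# One split is (s, v, t): sign s (+1 or -1), input index v, threshold t.
Split = Tuple[int, int, float]


def step(z: float) -> float:
    """Unit step function: 1.0 when z >= 0, otherwise 0.0."""
    return 1.0 if z >= 0 else 0.0


def cart_basis(x: Sequence[float], splits: Sequence[Split]) -> float:
    """Evaluate an assumed CART-style basis function for one node.

    The value is the product of step functions over the node's splits,
    so it is 1.0 inside the node's region and 0.0 everywhere else.
    """
    value = 1.0
    for s, v, t in splits:
        value *= step(s * (x[v] - t))
    return value


# Hypothetical node covering the region x[0] >= 2.0 and x[1] <= 5.0.
node = [(+1, 0, 2.0), (-1, 1, 5.0)]

inside = cart_basis([3.0, 4.0], node)   # point inside the region
outside = cart_basis([1.0, 4.0], node)  # fails the first split
```

Because every factor is either 0 or 1, the result is piecewise constant over axis-aligned boxes, which matches the fixed shape described above.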

Empirical comparison of MARS with BPN has shown that the perfor- [Pg.42]

Partition-based methods address dimensionality by selecting input variables that are most relevant to efficient empirical modeling. The input space is partitioned by hyperplanes that are perpendicular to at least one of the input axes, as depicted in Fig. 6d. [Pg.11]
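
One way to picture this selection is a greedy search for the single axis-perpendicular split that best explains the output. The sketch below is an illustration under assumed choices (squared-error criterion, midpoint thresholds, hypothetical function names), not the procedure of any particular method cited here.

```python
def sse(values):
    """Sum of squared deviations from the mean (0.0 for an empty list)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_axis_split(X, y):
    """Return (variable index, threshold, error) of the best single split.

    Each candidate split is a hyperplane x[v] = t perpendicular to input
    axis v. The error is the total squared error after fitting a constant
    on each side. Returns None if no variable has two distinct values.
    """
    best = None
    for v in range(len(X[0])):
        levels = sorted({row[v] for row in X})
        for lo, hi in zip(levels, levels[1:]):
            t = (lo + hi) / 2  # midpoint between neighbouring values
            left = [yi for row, yi in zip(X, y) if row[v] < t]
            right = [yi for row, yi in zip(X, y) if row[v] >= t]
            err = sse(left) + sse(right)
            if best is None or err < best[2]:
                best = (v, t, err)
    return best


# Toy data: the output depends only on the first input variable.
X = [[0.0, 1.0], [1.0, 0.0], [2.0, 1.0], [3.0, 0.0]]
y = [0.0, 0.0, 1.0, 1.0]
split = best_axis_split(X, y)
```

On this toy data the search picks input 0, so the irrelevant second input never enters the model; applying the same step recursively to each side would grow a tree.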

Fig. 6. Input transformation in (a) methods based on linear projection, (b) methods based on nonlinear projection, nonlocal transformation, (c) methods based on nonlinear projection, local transformation, and (d) partition-based methods. (From Bakshi and Utojo, 1998.)...

Following the same generalized model given by Eq. (6), input-output methods may be broadly classified as either projection-based or partition-based methods, as listed in Fig. 2. [Pg.33]

Dissimilarity and clustering methods only describe the compounds that are in the input set; voids in diversity space are not obvious, and if compounds are added then the set must be re-analyzed. Cell-based partitioning methods address these problems by dividing descriptor space into cells, and then populating those cells with compounds [67, 68]. The library is chosen to contain representatives from each cell. The use of a partition-based method with BCUT descriptors [69] to design an NMR screening library has recently been described [70]. [Pg.401]
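
The cell-based idea can be sketched in a few lines. The example below assumes equal-width bins on each descriptor axis and keeps the first compound seen in each cell as its representative; the descriptor values, ranges, and function names are hypothetical, and real applications may choose representatives differently.

```python
from itertools import product


def cell_index(descriptors, lows, highs, bins_per_axis):
    """Map a descriptor vector to the tuple of bin indices of its cell."""
    index = []
    for d, lo, hi in zip(descriptors, lows, highs):
        b = int((d - lo) / (hi - lo) * bins_per_axis)
        # Clamp so values on or beyond the range edge stay in a valid bin.
        index.append(min(max(b, 0), bins_per_axis - 1))
    return tuple(index)


def select_representatives(compounds, lows, highs, bins_per_axis):
    """Return {cell: name}, keeping the first compound found in each cell."""
    chosen = {}
    for name, desc in compounds.items():
        cell = cell_index(desc, lows, highs, bins_per_axis)
        chosen.setdefault(cell, name)
    return chosen


# Hypothetical two-descriptor compounds on a 2 x 2 grid over [0, 1]^2.
compounds = {
    "A": [0.1, 0.1],
    "B": [0.2, 0.15],
    "C": [0.9, 0.8],
    "D": [0.6, 0.1],
}
lows, highs, bins = [0.0, 0.0], [1.0, 1.0], 2

chosen = select_representatives(compounds, lows, highs, bins)
all_cells = set(product(range(bins), repeat=len(lows)))
empty_cells = all_cells - set(chosen)
```

Here `empty_cells` lists the voids explicitly, and a new compound can be placed in its cell without re-analyzing the existing set.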

The choice of representation, of similarity measure and of selection method are not independent of each other. For example, some types of similarity measure (specifically the association coefficients, as exemplified by the well-known Tanimoto coefficient) seem better suited than others (such as Euclidean distance) to the processing of fingerprint data [12]. Again, the partition-based methods for compound selection that are discussed below can only be used with low-dimensionality representations, thus precluding the use of fingerprint representations (unless some drastic form of dimensionality reduction is performed, as advocated by Agrafiotis [13]). Thus, while this chapter focuses upon selection methods, the reader should keep in mind the representations and the similarity measures that are being used; recent, extended reviews of these two important components of diversity analysis are provided by Brown [14] and by Willett et al. [15]. [Pg.116]
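
For readers unfamiliar with the two measures named above, the following minimal sketch computes both on binary fingerprints stored as sets of "on" bit positions. The fingerprints are made up, and returning 1.0 for two empty fingerprints is a convention assumed here.

```python
import math


def tanimoto(a, b):
    """Tanimoto coefficient c / (|a| + |b| - c) for sets of on-bits."""
    common = len(a & b)
    union = len(a) + len(b) - common
    # Assumed convention: two empty fingerprints count as identical.
    return common / union if union else 1.0


def euclidean(a, b):
    """Euclidean distance between the corresponding binary vectors.

    For 0/1 vectors the squared distance equals the number of bit
    positions in which the two fingerprints differ.
    """
    return math.sqrt(len(a ^ b))


# Hypothetical fingerprints given as sets of on-bit positions.
fp1 = {1, 4, 7, 9}
fp2 = {1, 4, 8}

similarity = tanimoto(fp1, fp2)  # 2 shared bits out of 5 distinct bits
distance = euclidean(fp1, fp2)   # 3 differing bits
```

The Tanimoto coefficient ignores positions where both bits are off, whereas the Euclidean distance depends only on the count of mismatched bits, which is one way the two measures treat sparse fingerprints differently.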

There is already an extensive literature relating to compound-selection methods, from which it is possible to identify four major classes of method (although, as we shall see, there is some degree of overlap between these four classes), viz. cluster-based methods, dissimilarity-based methods, partition-based methods and optimisation methods. The next four sections of this chapter present the various algorithms that have been suggested for each approach; we then discuss comparisons and applications of these algorithms, and the chapter concludes with some thoughts on further developments in the field. [Pg.117]

An alternative two-part classification has been proposed by Pearlman et al. [90], who characterise methods as either cell-based or distance-based, these classes corresponding to partition-based methods and to all the other types of method, respectively. As Pearlman et al. note, distance-based methods can be used with any type of structural representation but are most effective when the need is to identify subsets (of whatever sort) cell-based... [Pg.134]

Clustering algorithms can be classified into four major approaches: hierarchical methods, partitioning-based methods, density-based methods, and grid-based methods. Here, we will focus on the hierarchical cluster approach because it is often used in the context of structure-activity analysis. Recent research has suggested that hierarchical methods perform better than the more commonly used nonhierarchical methods in separating known actives and inactives [41]. [Pg.681]

However, the fingerprint-driven diversity methods suffer from an inability to describe a bounded chemical space: novel molecules can be added with a concurrent increase in diversity, but with little indication of how evenly the space is sampled. In this context, partition-based methods promise much. The BCUT descriptors are particularly suitable for the definition of a bounded chemical space and have the added bonus that they are quickly calculated. Partition-based methods also scale very well, in that it is only necessary to calculate which bin a molecule falls into, not to compute all pairwise similarities with the other molecules in the set. Absolute and relative diversity may be computed from bin occupancy, for example from the number of bins covered by a compound set. [Pg.373]
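
As an illustration of occupancy-based diversity, the sketch below counts the bins covered by a set (an absolute measure) and the bins a candidate set would newly cover (a relative measure). It assumes equal-width bins over a known bounded range; the descriptor values are invented and are not BCUT values.

```python
def cell_of(vec, lows, highs, n_bins):
    """Bin a descriptor vector; cost is per molecule, with no pairwise work."""
    return tuple(
        min(n_bins - 1, max(0, int((x - lo) / (hi - lo) * n_bins)))
        for x, lo, hi in zip(vec, lows, highs)
    )


def covered_cells(vectors, lows, highs, n_bins):
    """Set of cells occupied by at least one molecule."""
    return {cell_of(v, lows, highs, n_bins) for v in vectors}


# Hypothetical bounded 2-D descriptor space split into a 4 x 4 grid.
lows, highs, n_bins = [0.0, 0.0], [1.0, 1.0], 4
library = [[0.05, 0.05], [0.10, 0.20], [0.60, 0.70]]
candidates = [[0.12, 0.22], [0.90, 0.10]]

occupied = covered_cells(library, lows, highs, n_bins)
absolute = len(occupied)                        # bins covered
coverage = absolute / n_bins ** len(lows)       # fraction of the space
gained = covered_cells(candidates, lows, highs, n_bins) - occupied
```

Only the second candidate adds coverage here; the first lands in a cell the library already occupies, which is exactly the kind of redundancy that is hard to see without a bounded, binned space.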
