SVM regression models

...Eden Prairie, MN), DICKEY-john OmegAnalyzerG (DICKEY-john Corp., Auburn, IL), Perten DA 7200 (Perten Instruments Inc., Springfield, IL), Bruker Optics/Cognis QTA (Bruker Optics Inc., Billerica, MA), and an ASD LabSpec Pro (Analytical Spectral Devices Inc., Boulder, CO) for 18 amino acids. Partial least squares (PLS) and support vector machine (SVM) regression models performed significantly better than artificial neural networks (ANN). They used a calibration data set of 526 samples... [Pg.181]

Table 2 Data for the Angiotensin II Antagonists QSAR and for the SVM Regression Models from Figures 8-11...
Figure 8 SVM regression models with a degree 2 polynomial kernel (Eq. [65]) for the dataset from Table 2: (a) ε = 0.05; (b) ε = 0.1.
Figure 10 SVM regression models with ε = 0.1 for the dataset of Table 2: (a) polynomial kernel, degree 10, Eq. [65]; (b) exponential radial basis function kernel, σ = 1, Eq. [67].
Similarly to Eq. [101], the kernel SVM regression model has w given by... [Pg.343]

Table 6 Patterns Used for the SVM Regression Models in Figures 45-48...
Figure 45 SVM regression models for the dataset from Table 6, with ε = 0.1: (a) degree 10 polynomial kernel; (b) spline kernel.
In Figure 45, we present two SVM regression models, the first one obtained with a degree 10 polynomial kernel and the second one computed with a spline kernel. The polynomial kernel has some oscillations on both ends of the curve, whereas the spline kernel is observed to be inadequate for modeling the two spikes. The RBF kernel was also unable to offer an acceptable solution for this regression dataset (data not shown). [Pg.345]

By further increasing ε to 0.5 (Figure 47a), the shape of the SVM regression model becomes even less similar to the dataset. The regression tube is now... [Pg.345]

In 2008, Yan et al. [54] reported predictions for a data set of 552 compounds for which experimental HIA data are available. Molecular descriptors were calculated with both ADRIANA.Code and Cerius2. A set of models was constructed with PLS and SVM regression. The best model, developed with SVM regression, had a correlation coefficient of 0.89 and a standard error of 16.35%. [Pg.113]

First, a percentage, say 80%, of the m samples with variables in V1 and V2 is randomly selected to build two regression models using a preselected modeling method such as PLS [11] or support vector machines (SVMs) [12], respectively. Then an RMSEP value can be computed for each model by using the remaining 20% of the samples as the test set. Denote the two RMSEP values as RMSEP1 and RMSEP2, whose difference can be calculated as... [Pg.9]
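
A minimal sketch of this comparison step, assuming scikit-learn and two hypothetical variable subsets V1 and V2 given as column indices; the random 80/20 split, the two PLS models, and the RMSEP difference follow the description above, while the data, the number of PLS components, and the function name rmsep_difference are illustrative assumptions.

```python
import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.model_selection import train_test_split

def rmsep_difference(X, y, v1, v2, n_components=2, train_fraction=0.8, seed=0):
    """Fit one PLS model per variable subset on a random 80% split and
    return RMSEP1 - RMSEP2 computed on the remaining 20% of samples."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, train_size=train_fraction, random_state=seed)
    rmseps = []
    for cols in (v1, v2):
        pls = PLSRegression(n_components=n_components)
        pls.fit(X_train[:, cols], y_train)
        y_pred = pls.predict(X_test[:, cols]).ravel()
        rmseps.append(np.sqrt(np.mean((y_test - y_pred) ** 2)))
    rmsep1, rmsep2 = rmseps
    return rmsep1 - rmsep2

# Hypothetical data: 100 samples, 20 variables; V1 and V2 are column index lists.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 20))
y = X[:, :3].sum(axis=1) + 0.1 * rng.normal(size=100)
print(rmsep_difference(X, y, v1=[0, 1, 2], v2=[10, 11, 12]))
```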

These classification methods use different principles and rules for learning and prediction of class membership, but they will usually produce comparable results. Several comparisons of the methods have been published (e.g., Kotsiantis, 2007; Rani et al., 2006). Although modern methods such as SVM have demonstrated very good performance, the drawback is that the model becomes an incomprehensible black box that removes the explanatory information provided by, for example, a logistic regression model. However, classification performance usually outweighs the need for a comprehensible model. PCA has been used for classification based on bioimpedance measurements. Technically, PCA is not a classification method but rather a data-reduction method, more suitable as a parameterization step before the classification analysis. [Pg.386]
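
As a hedged illustration of the last point, the sketch below uses PCA purely as a data-reduction (parameterization) step in front of a logistic regression classifier; the data, component count, and classifier choice are assumptions for demonstration, not the setup of the cited bioimpedance studies.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical measurements: 80 samples x 50 correlated variables, two classes.
rng = np.random.default_rng(1)
X = rng.normal(size=(80, 50))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

# PCA compresses the variables into a few scores; the classifier then
# operates on those scores rather than on the raw measurements.
clf = make_pipeline(StandardScaler(), PCA(n_components=5), LogisticRegression())
clf.fit(X, y)
print(clf.score(X, y))
```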

...SVR) maximizes the prediction accuracy of the classifier (regression) model while simultaneously avoiding overfitting of the data. In SVM, the inputs are first nonlinearly mapped into a high-dimensional feature space (Φ), wherein they are classified using a linear hyperplane (Fig. 3.4). [Pg.138]
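
A short sketch of this idea with scikit-learn: the RBF kernel performs the implicit nonlinear mapping into the feature space, and the classifier fits a linear separating hyperplane there. The toy data and parameter values are assumptions, not the example of Fig. 3.4.

```python
import numpy as np
from sklearn.svm import SVC

# Toy two-class data that is not linearly separable in the input space.
rng = np.random.default_rng(2)
X = rng.normal(size=(200, 2))
y = (X[:, 0] ** 2 + X[:, 1] ** 2 > 1.0).astype(int)

# The RBF kernel implicitly maps the inputs into a high-dimensional feature
# space, where the SVM fits a linear hyperplane that separates the classes.
clf = SVC(kernel="rbf", C=1.0, gamma="scale")
clf.fit(X, y)
print(clf.score(X, y))
```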

We validated the CMF approach in two case studies and obtained preliminary results, which have been published as short communications [7, 13]. The first case study dealt with the use of CMF to build 3D-QSAR regression models [7]. In the second case study [13], the performance of a new method for virtual screening of organic compounds, based on combining the CMF methodology with the one-class SVM method (1-SVM), was assessed. In both cases CMF not only proved its efficiency but also demonstrated some advantages over state-of-the-art approaches in chemoinformatics. [Pg.441]
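
As a rough sketch of the screening idea (not the CMF-based kernel used in the cited work), a one-class SVM can be trained on known actives only and its decision score used to rank a screening library; the descriptor matrices, ν value, and RBF kernel below are illustrative assumptions.

```python
import numpy as np
from sklearn.svm import OneClassSVM

# Hypothetical descriptor matrices: rows are compounds, columns are descriptors.
rng = np.random.default_rng(3)
X_actives = rng.normal(loc=1.0, size=(50, 10))   # known actives (training set)
X_library = rng.normal(loc=0.0, size=(500, 10))  # screening library

# Train on actives only; higher decision scores indicate greater similarity
# to the active class and can be used to rank the screening library.
oc_svm = OneClassSVM(kernel="rbf", nu=0.1, gamma="scale").fit(X_actives)
scores = oc_svm.decision_function(X_library)
top_hits = np.argsort(scores)[::-1][:10]
print(top_hits)
```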

Support vector machines were extended by Vapnik for regression by using an ε-insensitive loss function (Figure 7). The learning set of patterns is used to obtain a regression model that can be represented as a tube with radius ε fitted to the data. In the ideal case, SVM regression finds a function that maps... [Pg.296]

We will use this dataset later to demonstrate the influence of the kernel on the SVM regression, as well as the effect of modifying the tube radius ε. However, we will not present QSAR statistics for the SVM model. Comparative QSAR models are shown in the section on SVM applications in chemistry. [Pg.297]

The ε-insensitive loss function used in SVM regression adds a new parameter ε that significantly influences the model and its prediction capacity. Besides the ε-insensitive loss, other loss functions can be used with SVM regression, such as the quadratic, Laplace, or Huber loss functions (Figure 44). [Pg.344]
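
For reference, standard textbook forms of these loss functions, written for the residual r = y − f(x); the exact expressions used in Figure 44 follow the chapter's own equations, and the Huber threshold δ is a conventional choice of symbol.

```latex
% Residual: r = y - f(x)
L_{\varepsilon}(r) = \max\!\bigl(0,\ |r| - \varepsilon\bigr)        % epsilon-insensitive
L_{\mathrm{quad}}(r) = r^{2}                                        % quadratic
L_{\mathrm{Laplace}}(r) = |r|                                       % Laplace (absolute value)
L_{\mathrm{Huber}}(r) =
  \begin{cases}
    \tfrac{1}{2}\,r^{2}, & |r| \le \delta \\
    \delta\bigl(|r| - \tfrac{\delta}{2}\bigr), & |r| > \delta
  \end{cases}                                                       % Huber, threshold delta
```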

We now present an illustrative example of a one-dimensional nonlinear SVM regression using the dataset in Table 6. This dataset has two spikes, which makes it difficult to model with the common kernels. [Pg.344]
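
Because Table 6 is not reproduced here, the sketch below uses a synthetic one-dimensional series with two spikes as a stand-in and fits scikit-learn's SVR with a degree 10 polynomial kernel and an RBF kernel at two tube radii ε; scikit-learn has no spline kernel, and all parameter values are illustrative assumptions.

```python
import numpy as np
from sklearn.svm import SVR

# Synthetic stand-in for the Table 6 pattern: a flat baseline with two spikes.
x = np.linspace(0.0, 1.0, 30).reshape(-1, 1)
y = np.zeros(30)
y[7], y[21] = 1.0, 1.0  # the two spikes

# Compare a high-degree polynomial kernel with an RBF kernel at two tube radii.
for eps in (0.1, 0.5):
    for kernel, params in (("poly", {"degree": 10, "coef0": 1.0}),
                           ("rbf", {"gamma": 20.0})):
        model = SVR(kernel=kernel, C=100.0, epsilon=eps, **params)
        y_fit = model.fit(x, y).predict(x)
        print(kernel, "eps =", eps, "max abs error =", np.abs(y - y_fit).max())
```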

Aptula et al. used multiple linear regression to investigate the toxicity of 200 phenols to the ciliated protozoan Tetrahymena pyriformis. Using their MLR model, they then predicted the toxicity of another 50 phenols. Here we present a comparative study for the entire set of 250 phenols using multiple linear regression, artificial neural networks, and SVM regression. Before computing the SVM model, the input vectors were scaled to zero mean and unit variance. The predictive power of the QSAR models was tested with complete cross-validation: leave-5%-out (L5%O), leave-10%-out (L10%O), leave-20%-out (L20%O), and leave-25%-out (L25%O). The capacity parameter C was optimized for each SVM model. [Pg.363]
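
A hedged sketch of this general workflow with scikit-learn: descriptors are scaled to zero mean and unit variance, the capacity parameter C is optimized by grid search, and the model is scored by k-fold cross-validation (20-fold corresponds roughly to leave-5%-out). The descriptor matrix, kernel, and C grid are assumptions, not the phenol data or kernels of the original study.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV, KFold
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Hypothetical descriptor matrix and toxicity values, for illustration only.
rng = np.random.default_rng(4)
X = rng.normal(size=(250, 6))
y = X @ rng.normal(size=6) + 0.2 * rng.normal(size=250)

# Scale inputs to zero mean and unit variance, then optimize C for the SVR.
pipe = make_pipeline(StandardScaler(), SVR(kernel="rbf", epsilon=0.1))
grid = GridSearchCV(
    pipe,
    param_grid={"svr__C": [0.1, 1.0, 10.0, 100.0]},
    cv=KFold(n_splits=20, shuffle=True, random_state=0),  # ~leave-5%-out
    scoring="r2",
)
grid.fit(X, y)
print(grid.best_params_, grid.best_score_)
```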

Table 15 contains the best SVM regression results for each kernel. The cross-validation results show that the correlation coefficient decreases in the following order of kernels: linear > degree 2 polynomial > neural > RBF > anova. The MLR and linear SVMR models are very similar, and both are significantly better than the SVM models obtained with nonlinear kernels. The inability of the nonlinear models to outperform the linear ones can be attributed to the large experimental errors in determining the BCF. [Pg.370]

Selecting an optimal group of descriptors is both an important and a time-consuming phase in developing a predictive QSAR model. Frohlich, Wegner, and Zell introduced the incremental regularized risk minimization procedure for SVM classification and regression models and compared it with recursive feature elimination and with the mutual information procedure. Their first experiment considered 164 compounds that had been tested for human intestinal absorption, whereas the second experiment modeled aqueous solubility prediction for 1297 compounds. Structural descriptors were computed by those authors with JOELib and MOE, and full cross-validation was performed to compare the descriptor selection methods. The incremental... [Pg.374]
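
The incremental regularized risk minimization procedure itself is not reproduced here; as a hedged illustration of the baseline it was compared against, the sketch below runs recursive feature elimination with a linear SVM regressor in scikit-learn on a hypothetical descriptor matrix.

```python
import numpy as np
from sklearn.feature_selection import RFE
from sklearn.svm import SVR

# Hypothetical descriptor matrix: 164 compounds x 30 descriptors, 5 informative.
rng = np.random.default_rng(5)
X = rng.normal(size=(164, 30))
y = X[:, :5] @ np.array([1.0, -0.5, 0.8, 0.3, -1.2]) + 0.1 * rng.normal(size=164)

# RFE repeatedly drops the descriptors with the smallest |coefficient| of a
# linear SVM regressor until the requested number of descriptors remains.
selector = RFE(estimator=SVR(kernel="linear", C=1.0), n_features_to_select=5, step=1)
selector.fit(X, y)
print(np.where(selector.support_)[0])  # indices of the retained descriptors
```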

Aires-de-Sousa and Gasteiger used four regression techniques [multiple linear regression, perceptron (an MLF ANN with no hidden layer), MLF ANN, and ν-SVM regression] to obtain a quantitative structure-enantioselectivity relationship (QSER). The QSER models the enantiomeric excess in the addition of diethyl zinc to benzaldehyde in the presence of a racemic catalyst and an enantiopure chiral additive. A total of 65 reactions constituted the dataset. Using 11 chiral codes as model input and a three-fold cross-validation procedure, a neural network with two hidden neurons gave the best predictions: ANN with 2 hidden neurons, R²pred = 0.923; ANN with 1 hidden neuron, R²pred = 0.906; perceptron, R²pred = 0.845; MLR, R²pred = 0.776; and ν-SVM regression with RBF kernel, R²pred = 0.748. [Pg.377]
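
A minimal sketch of the ν-SVM regression variant mentioned above, using scikit-learn's NuSVR with an RBF kernel and three-fold cross-validation; the 65 × 11 input matrix and the surrogate response are stand-ins for the chiral codes and enantiomeric excess values, which are not reproduced here.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.svm import NuSVR

# Hypothetical stand-in for the 65-reaction dataset with 11 chiral-code inputs.
rng = np.random.default_rng(6)
X = rng.normal(size=(65, 11))
y = np.tanh(X[:, 0]) + 0.1 * rng.normal(size=65)  # surrogate enantiomeric excess

# nu-SVR with an RBF kernel, scored by three-fold cross-validation.
model = NuSVR(kernel="rbf", nu=0.5, C=1.0, gamma="scale")
print(cross_val_score(model, X, y, cv=3, scoring="r2"))
```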

A molecular similarity kernel, the Tanimoto similarity kernel, was used by Lind and Maltseva in SVM regression to predict the aqueous solubility of three sets of organic compounds. The Tanimoto similarity kernel was computed from molecular fingerprints. The RMSE and q² cross-validation statistics for the three sets show a good performance of SVMR with the Tanimoto kernel: set 1 (883 compounds), RMSE = 0.62 and q² = 0.88; set 2 (412 compounds), RMSE = 0.77 and q² = 0.86; and set 3 (411 compounds), RMSE = 0.57 and q² = 0.88. An SVMR model was trained on set 1 and then tested on set 2 with good results, i.e., RMSE = 0.68 and q² = 0.89. [Pg.377]
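
A hedged sketch of SVM regression with a precomputed Tanimoto kernel on binary fingerprints, in the spirit of the approach described above; the fingerprints, solubility values, and parameter settings are illustrative assumptions, and the kernel function shown is the standard Tanimoto (Jaccard) similarity.

```python
import numpy as np
from sklearn.svm import SVR

def tanimoto_kernel(A, B):
    """Tanimoto similarity between the rows of two binary fingerprint matrices."""
    A = A.astype(float)
    B = B.astype(float)
    ab = A @ B.T                 # bits set in both fingerprints
    aa = A.sum(axis=1)[:, None]  # bits set in each row of A
    bb = B.sum(axis=1)[None, :]  # bits set in each row of B
    return ab / (aa + bb - ab)

# Hypothetical binary fingerprints and log-solubility values.
rng = np.random.default_rng(7)
fps = (rng.random((120, 64)) < 0.3).astype(int)
log_s = fps[:, :8].sum(axis=1) * 0.2 + 0.1 * rng.normal(size=120)

# Train on the first 100 compounds with a precomputed kernel, test on the rest.
train, test = np.arange(0, 100), np.arange(100, 120)
model = SVR(kernel="precomputed", C=10.0, epsilon=0.1)
model.fit(tanimoto_kernel(fps[train], fps[train]), log_s[train])
pred = model.predict(tanimoto_kernel(fps[test], fps[train]))
rmse = np.sqrt(np.mean((log_s[test] - pred) ** 2))
print(rmse)
```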

