Big Chemical Encyclopedia


Random forest methods

Li, S., Fedorowicz, A., Singh, H. and Soderholm, S.C. (2005) Application of the random forest method in studies of local lymph node assay based skin sensitization data. J. Chem. Inf. Model., 45, 952-964. [Pg.1103]

Zhang, N., Li, B.-Q., Gao, S., Ruan, J.-S. and Cai, Y.-D. (2012) Computational prediction and analysis of protein γ-carboxylation sites based on a random forest method. Mol. BioSyst., 8, 2946-2955. [Pg.315]

Random Forest methods (Breiman 2001) construct ensembles of trees from multiple random selections of descriptor subsets and bootstrap samples of compounds. The compounds not selected in a particular bootstrap sample form the so-called out-of-bag set, which is used as the test set. The trees are not pruned. The best trees in the forest are chosen for consensus prediction of external compounds. The method can include bagging (Berk 2008; Breiman 1996) and boosting (Berk 2008; Breiman 1998) approaches. [Pg.1318]
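The scheme described above can be sketched with scikit-learn, which exposes each ingredient directly: unpruned trees grown on bootstrap samples of compounds, a random descriptor subset considered at each split, and the out-of-bag compounds used as an internal test set. The descriptor matrix here is synthetic, purely for illustration.

```python
# Minimal sketch of the Random Forest scheme: bootstrap sampling of
# compounds, random descriptor subsets per split, and out-of-bag (OOB)
# validation. The toy data stand in for real compound descriptors.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Rows = compounds, columns = descriptors (synthetic, for illustration)
X, y = make_classification(n_samples=200, n_features=20,
                           n_informative=8, random_state=0)

forest = RandomForestClassifier(
    n_estimators=100,     # number of unpruned trees in the forest
    max_features="sqrt",  # random subset of descriptors at each split
    bootstrap=True,       # bootstrap sampling of compounds per tree
    oob_score=True,       # score each tree on its out-of-bag compounds
    random_state=0,
)
forest.fit(X, y)
print(round(forest.oob_score_, 2))  # OOB estimate of accuracy
```

Because each tree leaves out roughly a third of the compounds, the OOB score is an essentially free validation estimate and avoids a separate hold-out split.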

Of the physicochemical descriptors, lipophilicity (as described by clogP) and Topological Polar Surface Area (TPSA) gave the strongest overall correlations with the incidence of adverse in vivo outcomes, whether analyzed in terms of free or total drug threshold concentrations. In the case of the free drug threshold analysis, a Random Forest statistical method indicated that there was a higher chance of a compound with TPSA <70... [Pg.383]

In this study, a machine learning model system was developed to classify cell line chemosensitivity exclusively on the basis of proteomic profiling. Using reverse-phase protein lysate microarrays, protein expression levels were measured with 52 antibodies in a panel of 60 human cancer cell lines (NCI-60). The model system combined several well-known algorithms, including Random Forests, Relief, and nearest neighbor methods, to construct the protein expression-based chemosensitivity classifiers. [Pg.293]

It is interesting to note that QSAR/QSPR models from an array of methods can differ greatly in both complexity and predictivity. For example, a simple QSPR equation with three parameters can predict logP within one unit of measured values (43), while a complex hybrid mixture discriminant analysis-random forest model with 31 computed descriptors can only predict the volume of distribution of drugs in humans to within about twofold of experimental values (44). The volume of distribution is a more complex property than the partition coefficient: it is a physiological property with much higher uncertainty in its experimental measurement, whereas logP is a much simpler physicochemical property that can be measured more accurately. These and other factors can dictate whether a good predictive model can be built. [Pg.41]
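The kind of "within one unit" assessment mentioned above is easy to express in code. The sketch below fits a hypothetical three-parameter linear QSPR-style model on synthetic data (a stand-in for logP; the descriptors, coefficients, and noise level are invented, not taken from refs. 43-44) and reports the fraction of test predictions within one unit of the "measured" values.

```python
# Hedged illustration: does a simple three-parameter linear model predict
# a property (synthetic stand-in for logP) to within one unit? All data
# here are random and purely illustrative, not real measurements.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 3))  # three hypothetical descriptors
# Assumed linear relationship plus experimental-style noise
y = X @ np.array([1.2, -0.8, 0.5]) + rng.normal(scale=0.4, size=300)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
model = LinearRegression().fit(X_tr, y_tr)
within_one = np.mean(np.abs(model.predict(X_te) - y_te) < 1.0)
print(f"fraction of predictions within one unit: {within_one:.2f}")
```

The same error-threshold metric (with a twofold window on a log scale) applies to the volume-of-distribution comparison; the harder property simply leaves more residual error for any model to absorb.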

Figure 10. Model selection and assessment diagnostic performance measure S for random forest and partial least squares (PLS) methods applied to the BBB data for various percentages of the data (Ptrain) in the training set.
We previously applied Random Forests (RF), a tree-based classification and regression method, to pathway analysis of gene expression data [46]. The proposed methods allow researchers to rank the significance of biological pathways as well as discover important genes in the same process. [Pg.296]
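The gene-ranking side of such an analysis rests on the importance scores a fitted forest assigns to each feature. A minimal sketch, assuming synthetic expression data and invented gene labels (a real pathway analysis would use per-pathway expression matrices):

```python
# Sketch of ranking genes by Random Forest importance scores. The data
# and gene names are invented for illustration; scikit-learn's
# feature_importances_ gives the mean impurity-based importance per gene.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# shuffle=False keeps the informative columns first, for readability
X, y = make_classification(n_samples=150, n_features=10,
                           n_informative=3, shuffle=False, random_state=1)
genes = [f"gene_{i}" for i in range(10)]  # hypothetical gene labels

rf = RandomForestClassifier(n_estimators=200, random_state=1).fit(X, y)
ranking = sorted(zip(genes, rf.feature_importances_),
                 key=lambda t: t[1], reverse=True)
for gene, score in ranking[:3]:  # top-ranked genes
    print(gene, round(score, 3))
```

Summing the importances of the genes belonging to a pathway is one simple way to extend this per-gene ranking to per-pathway significance.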

Alternatively, multiple models can be developed using different sets of descriptors [57]. One popular decision tree (DT) consensus method, called Random Forests, has recently demonstrated improved performance over bagging [58]. The DT method determines a chemical's activity through a... [Pg.160]

Ho, T.K. (1998) The random subspace method for constructing decision forests. IEEE Trans. Pattern Anal. Mach. Intell., 20, 832-844. [Pg.155]

Prediction of Drug-Induced PT Toxicity and Injury Mechanisms with an hiPSC-Based Model and Machine Learning Methods. The weak points of the HPTC- and hESC-based models described previously (Sections 23.3.2.1 and 23.3.3.1) were the data analysis procedures. To improve result classification, the raw data obtained with three batches of HPTC and the IL6/IL8-based model (Li et al., 2013) were reanalyzed by machine learning (Su et al., 2014). Random forest (RF), support vector machine (SVM), k-NN, and Naive Bayes classifiers were tested. The best results were obtained with the RF classifier: the mean values (three batches of HPTC) ranged between 0.99 and 1.00 with respect to sensitivity, specificity, balanced accuracy, and AUC/ROC (Su et al., 2014). Thus, excellent predictivity could be obtained by combining the IL6/IL8-based model with automated classification by machine learning. [Pg.378]
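A classifier bake-off of the kind described above can be sketched as follows: the same four classifier families compared by cross-validated ROC AUC. The dataset is synthetic, standing in for the HPTC IL6/IL8 readouts, which are not reproduced here; the reported 0.99-1.00 values come from the cited study, not from this toy data.

```python
# Sketch of comparing RF, SVM, k-NN, and Naive Bayes classifiers by
# 5-fold cross-validated ROC AUC on a synthetic binary "toxic/non-toxic"
# dataset (illustrative only; not the real HPTC data).
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

X, y = make_classification(n_samples=200, n_features=15, random_state=2)
classifiers = {
    "RF": RandomForestClassifier(random_state=2),
    "SVM": SVC(random_state=2),
    "k-NN": KNeighborsClassifier(),
    "NB": GaussianNB(),
}
scores = {name: cross_val_score(clf, X, y, cv=5, scoring="roc_auc").mean()
          for name, clf in classifiers.items()}
for name, auc in sorted(scores.items(), key=lambda t: -t[1]):
    print(f"{name}: mean AUC = {auc:.2f}")
```

Reporting sensitivity, specificity, and balanced accuracy alongside AUC, as the study did, only requires swapping the `scoring` argument (e.g. `"balanced_accuracy"`).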



© 2024 chempedia.info