Machine learning method

3 Machine Learning Methods in Chemoinformatics for Drug Discovery [Pg.136]

ML is a branch of artificial intelligence concerned with the construction and study of computational systems that can learn from data [9]. An ML system can be trained on properties and features, and predictions can then be made on the basis of that information. The aim of ML is to teach a machine to learn from experience, i.e. to feed it a set of example objects and, based on their information content, to build a classifier or a predictive model [10] (Fig. 3.3). [Pg.136]

The ML-based classifiers can be divided into the following types [Pg.136]

Supervised methods work on a labelled training data set and analyse it to learn relationships between data elements, producing an inferred function. They involve algorithms such as Bayesian statistics, decision tree (DT) learning, support vector machines (SVM), random forests (RF) and nearest neighbour algorithms. [Pg.136]
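As a rough illustration of learning an inferred function from a training set, the sketch below implements a nearest neighbour classifier, one of the algorithm families listed above. The two-descriptor data set, the labels and the function name `knn_predict` are invented for illustration and do not come from the cited text.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, query, k=3):
    """Predict the label of `query` by majority vote among its k nearest training points."""
    # Euclidean distance from the query to every training example
    distances = [(math.dist(x, query), label) for x, label in zip(train_X, train_y)]
    distances.sort(key=lambda pair: pair[0])
    # Count the labels of the k closest examples and return the most common one
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]


# Hypothetical training set: two descriptors per compound (values invented)
train_X = [(1.0, 1.2), (0.8, 1.0), (1.1, 0.9), (3.0, 3.2), (3.1, 2.9), (2.8, 3.0)]
train_y = ["inactive", "inactive", "inactive", "active", "active", "active"]

prediction = knn_predict(train_X, train_y, (2.9, 3.1))
print(prediction)
```

The training labels are what makes this supervised: the prediction for a new point is derived entirely from examples whose class is already known.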

In unsupervised learning, the training set contains no supervising label data (unlike supervised learning), so the algorithm has to figure out the hidden structure within the unlabelled data. [Pg.136]
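Finding hidden structure in unlabelled data can be sketched with k-means clustering (Lloyd's algorithm). The excerpt does not name a specific algorithm, so k-means is chosen here only as a common example; the points and the starting centroids are invented.

```python
import math


def kmeans(points, centroids, iterations=10):
    """Lloyd's algorithm: alternate between assigning points and updating centroids."""
    for _ in range(iterations):
        # Assignment step: put each point into the cluster of its nearest centroid
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its members
        # (an empty cluster keeps its previous centroid)
        centroids = [
            tuple(sum(coord) / len(members) for coord in zip(*members)) if members else old
            for members, old in zip(clusters, centroids)
        ]
    return centroids, clusters


# Hypothetical unlabelled data: no class information is supplied
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]

# Assumption: start from two of the data points as initial centroids
centroids, clusters = kmeans(points, [points[0], points[3]])
print(clusters)
```

No labels are passed in; the two groups emerge only from the distances between the points.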

Data mining extends the use of statistical methods and combines them with machine learning methods and the application of expert systems. Visualizing the results of data mining is an important task, as it facilitates their interpretation. Figure 9-32 plots the different disciplines which contribute to data mining. [Pg.472]

A machine-learning method was proposed by Klon et al. [104] as an alternative form of consensus scoring. The method proved unsuccessful for PKB, but showed promise for the phosphatase PTP1B (protein tyrosine phosphatase 1B). In this approach, compounds were first docked into the receptor and scored using conventional means. The top scoring compounds were then assumed to be active and used to build a naive Bayes classification model; all compounds were subsequently re-scored and ranked using the model. The method is heavily dependent upon predicting accurate binding... [Pg.47]
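The workflow described above (dock, treat the top scorers as active, train a naive Bayes model, re-rank everything) can be sketched as follows. This is not the implementation of Klon et al.: the binary fingerprints, docking scores, compound names, the cutoff `top_n` and the choice of a Bernoulli model with Laplace smoothing are all assumptions made for illustration.

```python
import math


def train_bernoulli_nb(fingerprints, labels):
    """Fit a Bernoulli naive Bayes model with Laplace smoothing."""
    n_bits = len(fingerprints[0])
    model = {}
    for cls in (0, 1):
        rows = [fp for fp, y in zip(fingerprints, labels) if y == cls]
        prior = (len(rows) + 1) / (len(labels) + 2)
        # Smoothed estimate of P(bit = 1 | class) for every fingerprint bit
        bit_probs = [(sum(fp[j] for fp in rows) + 1) / (len(rows) + 2) for j in range(n_bits)]
        model[cls] = (prior, bit_probs)
    return model


def log_odds_active(model, fp):
    """Return log P(active, fp) - log P(inactive, fp) under the model."""
    def log_joint(cls):
        prior, bit_probs = model[cls]
        return math.log(prior) + sum(
            math.log(p if bit else 1.0 - p) for bit, p in zip(fp, bit_probs)
        )
    return log_joint(1) - log_joint(0)


# Hypothetical compounds: (binary fingerprint, docking score); all values invented.
# Assumption: a more negative docking score means a better predicted binder.
compounds = {
    "cpd_a": ((1, 1, 0, 0), -9.5),
    "cpd_b": ((1, 1, 0, 1), -9.1),
    "cpd_c": ((0, 0, 1, 1), -6.0),
    "cpd_d": ((0, 1, 1, 0), -5.5),
    "cpd_e": ((1, 1, 1, 0), -5.0),
    "cpd_f": ((0, 0, 1, 0), -4.2),
}

# Step 1: rank by docking score and presume the top scorers to be active
top_n = 2
by_dock = sorted(compounds, key=lambda name: compounds[name][1])
presumed_active = set(by_dock[:top_n])

# Step 2: train the naive Bayes model on these presumed labels
names = list(compounds)
fps = [compounds[n][0] for n in names]
labels = [1 if n in presumed_active else 0 for n in names]
model = train_bernoulli_nb(fps, labels)

# Step 3: re-score every compound with the model and re-rank
reranked = sorted(names, key=lambda n: log_odds_active(model, compounds[n][0]), reverse=True)
print(reranked)
```

In this toy data, `cpd_e` shares fingerprint bits with the presumed actives and therefore moves up relative to its docking rank, which is the kind of effect the re-scoring is meant to produce. It also shows why the outcome depends on the initial docking step: the model only ever sees labels derived from the docking scores.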

Neural Nets (NNs) relate a set of input neurons with an output neuron (providing the prediction label of a data point) by a network of layers of neurons in the interior. They are certainly among the most frequently used Machine Learning methods in the field [148] and allow for a high degree of customization since the architecture of the network itself is part of the parameters the user may define. [Pg.75]
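To make the layered structure concrete, here is a minimal forward pass through a small network with two input neurons, one hidden layer and one output neuron. The weights are set by hand (not learned) so that the network computes XOR; the step activation, the weights and the names `forward` and `xor_net` are illustrative assumptions, not taken from the cited text.

```python
def step(z):
    """Threshold activation: the neuron fires (1) when its net input is positive."""
    return 1 if z > 0 else 0


def forward(layers, inputs):
    """Propagate `inputs` through `layers`, a list of (weights, biases) pairs."""
    activations = list(inputs)
    for weights, biases in layers:
        # Each neuron computes a weighted sum of the previous layer plus its bias
        activations = [
            step(sum(w * a for w, a in zip(neuron_weights, activations)) + b)
            for neuron_weights, b in zip(weights, biases)
        ]
    return activations


# Architecture 2-2-1 with hand-chosen weights:
# hidden neuron 1 acts as OR, hidden neuron 2 as AND, output = OR and not AND
xor_net = [
    ([[1.0, 1.0], [1.0, 1.0]], [-0.5, -1.5]),  # hidden layer
    ([[1.0, -1.0]], [-0.5]),                   # output layer
]

outputs = {x: forward(xor_net, x)[0] for x in [(0, 0), (0, 1), (1, 0), (1, 1)]}
print(outputs)
```

Changing the number of entries in `xor_net`, or the number of neurons per layer, changes the architecture, which is the customization the passage refers to.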

Support Vector Machines (SVMs) generate either linear or nonlinear classifiers depending on the so-called kernel [149]. The kernel is a function (represented as a matrix over the training points) that implicitly transforms the data into an arbitrarily high-dimensional feature space, where a linear classifier corresponds to a nonlinear classifier in the original space of the input data. SVMs are quite a recent Machine Learning method that received a lot of attention because of their superiority on a number of hard problems [150]. [Pg.75]
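The link between a kernel and a higher-dimensional feature space can be checked numerically. The sketch below uses a degree-2 polynomial kernel on two-dimensional inputs, chosen only because its explicit feature map is small enough to write out; the sample points are invented. It shows the kernel part only, not the SVM training itself.

```python
import math


def poly2_kernel(x, z):
    """Degree-2 polynomial kernel k(x, z) = (x . z)^2 for 2-D inputs."""
    return (x[0] * z[0] + x[1] * z[1]) ** 2


def explicit_map(x):
    """Explicit 3-D feature map whose inner product reproduces poly2_kernel."""
    return (x[0] ** 2, math.sqrt(2) * x[0] * x[1], x[1] ** 2)


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


# Hypothetical 2-D data points
points = [(1.0, 2.0), (3.0, -1.0), (0.5, 0.5)]

# Kernel (Gram) matrix: the kernel evaluated on every pair of points
gram = [[poly2_kernel(x, z) for z in points] for x in points]

# The same values obtained by mapping to feature space first, then taking dot products
gram_explicit = [[dot(explicit_map(x), explicit_map(z)) for z in points] for x in points]

for row_k, row_e in zip(gram, gram_explicit):
    print(row_k, row_e)
```

Both matrices agree up to rounding, so a linear classifier built from these inner products in the 3-D feature space acts as a quadratic (nonlinear) classifier on the original 2-D inputs, without the mapping ever having to be computed during training.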

Fig. 4. Application of bioinformatics tools to 2D-DIGE data analysis. Proteome data consisting of the normalized spot intensity values are exported from the image analysis software and their correlation with clinicopathological data examined. Using informatics tools including clustering algorithms and machine-learning methods, a novel cancer classification based on proteome data is established, and key proteomic features and proteins corresponding to biomarker candidates are identified.

The efficiency of several popular machine-learning methods (ANN, SVM, NN, Maximal Margin Linear Programming, RBFNN, and MLR) to build predictive... [Pg.340]

To conclude, we discuss a couple of approaches to the integration of genome variation and gene expression data that fall within the realm of what is known as machine learning methods. [Pg.452]

Prometheus is essentially a Java application capable of doing proteomics, genomics and chemometric computations, and it uses machine learning methods. The complexity of the whole application is due to the complex computational algorithms involved. The need for converting it to an executable arises due to the following reasons ... [Pg.225]

Simplistic and heuristic similarity-based approaches can hardly produce predictive models as good as those from modern statistical and machine learning methods, which are able to assess biological or physicochemical properties quantitatively. QSAR-based virtual screening consists of direct assessment of activity values (numerical or binary) of all compounds in the database, followed by selection of hits possessing the desired activity. Mathematical methods used for model preparation can be subdivided into classification and regression approaches. The former decide whether a given compound is active, whereas the latter numerically evaluate the activity values. Classification approaches that assess the probability of decisions are called probabilistic. [Pg.25]
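The difference between the two families can be shown on one toy data set: a regression model returns a numerical activity, while a classifier returns an active/inactive decision. The single descriptor, the activity values, the activity cutoff of 6.0 and the two simple models (a least-squares line and a one-threshold decision stump) are assumptions chosen for brevity, not methods named in the cited text.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept with one descriptor."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    return slope, mean_y - slope * mean_x


def fit_stump(xs, labels):
    """Pick the threshold t that minimises training errors for the rule 'active if x >= t'."""
    best_threshold, best_errors = None, None
    for t in sorted(set(xs)):
        errors = sum((x >= t) != bool(label) for x, label in zip(xs, labels))
        if best_errors is None or errors < best_errors:
            best_threshold, best_errors = t, errors
    return best_threshold


# Hypothetical data: one descriptor value and one measured activity per compound
descriptor = [1.0, 2.0, 3.0, 4.0, 5.0]
activity = [4.1, 4.9, 6.2, 6.8, 8.0]

# Regression: numerically evaluate the activity of a new compound
slope, intercept = fit_line(descriptor, activity)
predicted_activity = slope * 3.5 + intercept

# Classification: decide whether a compound is active
# (assumption: activity >= 6.0 counts as active)
labels = [1 if a >= 6.0 else 0 for a in activity]
threshold = fit_stump(descriptor, labels)
predicted_active = 3.5 >= threshold

print(predicted_activity, predicted_active)
```

In a virtual screening setting, the regression output would be used to rank database compounds by predicted activity, while the classifier output would be used to filter them.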
