Supervised classification problems

There are two different approaches to the protein sequence classification problem. One can use an unsupervised neural network to group proteins when there is no prior knowledge of the number and composition of the final clusters (e.g., Ferrán and Ferrara, 1992), or one can use supervised networks to classify sequences into known (existing) protein families (e.g., Wu et al., 1992). [Pg.136]
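To make the two routes concrete, here is a minimal Python/scikit-learn sketch: k-means stands in for the Kohonen-type unsupervised network of the cited work, and a small multilayer perceptron stands in for the supervised network. The composition features, family labels, and model settings are illustrative assumptions, not taken from the cited studies.

```python
# Hypothetical sketch: unsupervised grouping vs. supervised family assignment
# on amino-acid composition vectors. All data below are random placeholders.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
X = rng.random((120, 20))          # 120 sequences x 20 amino-acid frequencies
y = rng.integers(0, 3, size=120)   # known family labels (used only in the supervised case)

# Unsupervised route: group sequences without using any family labels
clusters = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)

# Supervised route: learn a mapping from composition to the known families
clf = MLPClassifier(hidden_layer_sizes=(16,), max_iter=2000, random_state=0)
clf.fit(X, y)
print(clusters[:5], clf.predict(X[:5]))
```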

Once the data are prepared, they can be explored chemometrically with techniques such as PCA, rPCA, PP, and clustering. These enable visualization of the structure of the data set; more specifically, they detect outliers and group similar samples. For several applications, it was confirmed that this approach outperforms the visual comparison of electropherograms. Chemometric techniques can also be applied to classify samples based on their CE profile. When the classes in the data set are known a priori, supervised classification techniques such as LDA, QDA, kNN, CART, PLS-DA, SIMCA, and SVM can be used. The choice of technique will often depend on the preference of the analyst and the complexity of the data. However, when nonlinear classification problems occur, a more complex technique such as, for instance, SVM will be outper-... [Pg.318]
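A hedged sketch of this workflow in Python with scikit-learn is given below: PCA scores for exploratory visualization and outlier detection, followed by a linear (PCA + LDA) and a nonlinear (RBF-kernel SVM) classifier compared by cross-validation. The simulated electropherogram matrix, class labels, and model parameters are placeholders, not values from the cited application.

```python
# Exploration (PCA) followed by supervised classification of CE profiles.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(1)
X = rng.random((60, 300))           # 60 CE profiles x 300 time points (dummy data)
y = rng.integers(0, 2, size=60)     # a priori known sample classes

# Exploratory step: the first two PC scores would be plotted to spot
# outliers and groupings before any classification is attempted.
scores = PCA(n_components=2).fit_transform(StandardScaler().fit_transform(X))

# Supervised step: a linear and a nonlinear classifier, compared by 5-fold CV.
lda = make_pipeline(StandardScaler(), PCA(n_components=5), LinearDiscriminantAnalysis())
svm = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0, gamma="scale"))
print(cross_val_score(lda, X, y, cv=5).mean())   # linear baseline
print(cross_val_score(svm, X, y, cv=5).mean())   # nonlinear alternative
```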

In chemometrics we are very often dealing not with individual signals, but with sets of signals. Sets of signals are used for calibration purposes, to solve supervised and unsupervised classification problems, in mixture analysis, etc. All these chemometric techniques require a uniform data presentation. The data must be organized in a matrix, i.e., for the different objects the same variables have to be considered or, in other words, each object is characterized by a signal (e.g., a spectrum). Only if all the objects are represented in the same parameter space is it possible to apply chemometric techniques to compare them. Consider, for instance, two samples of mineral water. If for one of them the calcium and sulphate concentrations are measured, but for the second one the pH value and the PAH concentrations are available, there is no way of comparing these two samples. The comparison can only be made when the same set of measurements is performed for both samples, e.g., for both samples the pH value and the calcium and sulphate concentrations are determined. Only in that case can each sample be represented as a point in the three-dimensional parameter space and their mutual distance be considered a measure of similarity. [Pg.165]
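A tiny numerical sketch of the same idea (the concentration values below are invented): only because both samples are described by the same three variables can they be placed in one parameter space and a distance between them be computed.

```python
import numpy as np

# Each sample is characterised by the SAME three variables, so both live in
# one 3-D parameter space and can be compared directly.
variables = ["pH", "Ca (mg/L)", "SO4 (mg/L)"]   # shared parameter space
sample_1 = np.array([7.4, 80.0, 25.0])
sample_2 = np.array([6.9, 55.0, 40.0])

X = np.vstack([sample_1, sample_2])             # objects-by-variables data matrix
distance = np.linalg.norm(sample_1 - sample_2)  # Euclidean distance as dissimilarity
print(X.shape, distance)
# In practice the columns would be autoscaled over a larger sample set before
# distances are interpreted, since the variables have very different units.
```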

As supervised methods are considered in this chapter (see [13] for applying ETD to unsupervised classification problems), it is assumed that time series are available for calibrating the static classification method. For these data the class affiliation of each measurement is known, and it is further assumed that no transitions occur during the calibration measurements. [Pg.314]

There are two main classification systems for organizing proteins based on their structure: CATH [174] and SCOP [175]. These systems are used to label training data for a number of supervised learning problems found in protein structure prediction. This problem is divided into three subproblems depending on the data... [Pg.53]

Technometrics, Vol. 17, pp. 103-109. ISSN 0040-1706. Melssen, W., Wehrens, R. & Buydens, L. (2006). Supervised Kohonen networks for classification problems. Chemometrics and Intelligent Laboratory Systems, Vol. 83, pp. 99-113. ISSN 0169-7439

The support vector machine (SVM) is originally a binary supervised classification algorithm, introduced by Vapnik and his co-workers [13, 32] and based on statistical learning theory. Instead of the traditional empirical risk minimization (ERM) performed by artificial neural networks, the SVM algorithm is based on the structural risk minimization (SRM) principle. In its simplest form, a linear SVM for a two-class problem finds an optimal hyperplane that maximizes the separation between the two classes. The optimal separating hyperplane can be obtained by solving the following quadratic optimization problem ... [Pg.145]
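The excerpt cuts off before the optimization problem itself. For reference, the standard hard-margin primal for a linear two-class SVM with labels y_i ∈ {−1, +1} is (the excerpt's own notation may differ):

```latex
\min_{\mathbf{w},\,b} \; \frac{1}{2}\lVert \mathbf{w} \rVert^{2}
\quad \text{subject to} \quad
y_i\bigl(\mathbf{w}^{\top}\mathbf{x}_i + b\bigr) \ge 1, \qquad i = 1,\dots,n .
```

New samples are then classified by the sign of w^T x + b; the soft-margin variant adds slack variables ξ_i ≥ 0 and a penalty term C Σ ξ_i to the objective so that overlapping classes can be tolerated.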

Weighting methods determine the importance of the scaled features for a given classification problem. Weighting can only be used for supervised learning, because the weights are derived from the known class memberships. Weighting factors are calculated or estimated by... [Pg.103]
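The excerpt truncates before the actual formula. As an illustration only, one common chemometric choice (an assumption here, not necessarily the source's) is a Fisher-type variance-ratio weight, sketched below in Python: features that separate the known classes well receive larger weights.

```python
import numpy as np

def fisher_weights(X, y):
    """Per-feature between-class / pooled within-class variance ratio (needs class labels)."""
    classes = np.unique(y)
    grand_mean = X.mean(axis=0)
    between = np.zeros(X.shape[1])
    within = np.zeros(X.shape[1])
    for c in classes:
        Xc = X[y == c]
        between += len(Xc) * (Xc.mean(axis=0) - grand_mean) ** 2
        within += ((Xc - Xc.mean(axis=0)) ** 2).sum(axis=0)
    return between / np.maximum(within, 1e-12)

rng = np.random.default_rng(2)
X = rng.random((40, 6))          # 40 autoscaled objects x 6 features (dummy data)
y = rng.integers(0, 2, size=40)  # known class memberships
print(fisher_weights(X, y))      # one weight per feature
```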

With two exceptions, nonclassification supervised learning problems comprise any supervised learning problem with continuous-valued outputs, i.e., anything other than classification. The exceptions are heteroassociative and autoassociative binary-output problems such as mapping, data compression, and dimension reduction. [Pg.118]

Classification methods, i.e., methods for dividing data into groups, can be broadly of two types: supervised and unsupervised. The primary difference is that for supervised methods, prior information about the classes into which the data fall is known and representative samples from these classes are available. The supervised and unsupervised approaches loosely lend themselves to problems that have prior hypotheses and to those in which discovery of the classes in the data may be needed, respectively. The division is purely for organizational purposes; in many applications, a combination of both methods can be very powerful. In general, biomedical data analysis will require multiple spectral features and will have stochastic variations. Hence, the field of statistical pattern recognition [88] is of primary importance, and we use the term recognition with our learning and classification method descriptions below. [Pg.191]
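As a toy illustration of combining the two approaches (all data, the cluster count, and the models below are assumptions): clustering first proposes candidate groups in unlabeled spectra, and a supervised classifier trained on those provisional labels then assigns new measurements.

```python
# Sketch: unsupervised discovery of classes, followed by supervised assignment.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(3)
spectra = rng.random((200, 50))                    # 200 spectra x 50 spectral features

discovered = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(spectra)
clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(spectra, discovered)

new_spectrum = rng.random((1, 50))
print(clf.predict(new_spectrum))                   # class assignment for a new measurement
```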

