Regression and classification

Multiscale regression, or wavelet regression [60], is based on the simple idea that the mapping between the independent and dependent variables may involve several resolution levels. Most approaches to multivariate regression and classification use only the original resolution of the data when forming models. The multiscale approach enables the investigator to zoom in and out of the detail structures in the data. [Pg.375]

Let us now consider regression in general in terms of a matrix formulation of the fast wavelet transform. [Pg.375]

The FWT basis matrix. The fast wavelet transform (FWT) can be formulated in terms of matrix algebra by storing each of the wavelet functions in the time/wavelength domain in a matrix B. This matrix contains all the translations and dilations of the wavelet necessary to perform a full transform. One common way to organise this matrix is to sort the sets of shifted basis functions... [Pg.375]
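As a concrete illustration of such a basis matrix, the following sketch builds an orthonormal Haar basis B for a signal of dyadic length, with one row per scaling function or shifted/dilated wavelet. The function name haar_basis and the coarse-to-fine row ordering are illustrative choices, not taken from the source.

```python
import numpy as np

def haar_basis(n):
    """Orthonormal Haar wavelet basis matrix B (n x n, n a power of 2).

    Rows hold the scaling function followed by every dilation/translation
    of the Haar wavelet, so a full transform of a signal x is w = B @ x.
    """
    assert n > 0 and (n & (n - 1)) == 0, "n must be a power of 2"
    B = np.zeros((n, n))
    B[0, :] = 1.0 / np.sqrt(n)              # coarsest scaling function
    row, scale = 1, 1
    while scale < n:                         # one block of rows per dilation
        width = n // scale                   # support at this dilation level
        amp = 1.0 / np.sqrt(width)
        for k in range(scale):               # translations within the level
            s = k * width
            B[row, s : s + width // 2] = amp
            B[row, s + width // 2 : s + width] = -amp
            row += 1
        scale *= 2
    return B

B = haar_basis(8)
x = np.array([4.0, 6.0, 10.0, 12.0, 8.0, 6.0, 5.0, 5.0])
w = B @ x                                    # wavelet coefficients of x
print(np.allclose(B @ B.T, np.eye(8)))      # True: B is orthonormal
print(np.allclose(B.T @ w, x))              # True: perfect reconstruction
```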

In a typical chemical regression problem the X matrix contains the N spectra (as rows) measured at M wavelengths (as columns), and y is a column vector of the concentrations of one component. Assuming that Beer's law holds, we have that... [Pg.376]

Here X+ denotes the generalised inverse obtained by some regression method (e.g. partial least squares regression). Inserting for X... [Pg.376]
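The equations in these two excerpts did not survive extraction. A plausible reconstruction, assuming the usual linear (Beer's law) model and writing B for the orthonormal FWT basis matrix introduced above (the book's exact equations may differ), is:

\[
\mathbf{y} = \mathbf{X}\mathbf{b}, \qquad \hat{\mathbf{b}} = \mathbf{X}^{+}\mathbf{y},
\]

and, since the wavelet coefficients of the spectra are \(\mathbf{W} = \mathbf{X}\mathbf{B}^{\mathsf{T}}\) so that \(\mathbf{X} = \mathbf{W}\mathbf{B}\),

\[
\hat{\mathbf{y}} = \mathbf{X}\hat{\mathbf{b}} = \mathbf{W}\left(\mathbf{B}\hat{\mathbf{b}}\right),
\]

i.e. the regression may equivalently be carried out on the wavelet coefficients W rather than on the raw spectra.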

If the target variable is continuous, a method leading to a predicting function for Y is sought... [Pg.222]

If the target variable is discrete and assumes values from a finite set G, we search for a predicting function... [Pg.222]

This is called classification. The predicted values f(x_j) should agree with the known class variable value for as many observations as possible. To express this quantitatively, we introduce a cost function... [Pg.223]

The function value L(k, l) indicates how to penalize a misclassification of an observation from class k as class l. Of course, the cost of a correct classification should be L(k, k) = 0. Unless given otherwise, in this work we will use the zero-one cost function L(k, l) = 1 - δ(k, l), where δ(k, l) is the Kronecker delta. [Pg.223]

Using the zero-one cost function, TCE is just the number of misclassifications. [Pg.223]
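A minimal sketch of this cost function and of the total error it induces; the names zero_one_cost and total_cost, and the reading of TCE as the cost summed over all observations, are assumptions made for illustration.

```python
import numpy as np

def zero_one_cost(k, l):
    """L(k, l) = 1 - delta(k, l): zero for a correct class, one otherwise."""
    return 0 if k == l else 1

def total_cost(y_true, y_pred, cost=zero_one_cost):
    """Total cost of errors, summed over all observations."""
    return sum(cost(k, l) for k, l in zip(y_true, y_pred))

y_true = np.array([0, 1, 1, 2, 0, 2])
y_pred = np.array([0, 1, 2, 2, 1, 2])
print(total_cost(y_true, y_pred))  # 2: the number of misclassifications
```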


Worth AP, Cronin MTD. The use of discriminant analysis, logistic regression and classification tree analysis in the development of classification models for human health effects. J Mol Struct (Theochem) 2003;622:97-111. [Pg.492]

The similarity in approach to LDA (Section 33.2.2) and PLS (Section 33.2.8) should be pointed out. Neural classification networks are related to neural regression networks in the same way that PLS can be applied both for regression and classification, and that LDA can be described as a regression application. This can be generalized: all regression methods can be applied in pattern recognition. One must expect, for instance, that methods such as ACE and MARS (see Chapter 11) will be used for this purpose in chemometrics. [Pg.235]

Other widely employed types of ANN are the Kohonen self-organizing maps (SOMs), used for unsupervised exploratory analysis, and the counterpropagation (CP) neural networks, used for nonlinear regression and classification (Marini, 2009). These tools also require a considerable number of objects, and rigorous validation, to build reliable models. [Pg.92]

Validation techniques constitute a fundamental tool for assessing the validity of models obtained from a data set by multivariate regression and classification methods. Validation techniques are used to check the predictive power of the models, i.e. to give a measure of their capability to perform reliable predictions of the modelled response for new cases where the response is unknown [Diaconis and Efron, 1983; Myers, 1986; Cramer III et al., 1988a; Rawlings, 1988]. [Pg.461]
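A minimal sketch of one such validation technique, k-fold cross-validation yielding a Q² statistic. The helper names kfold_q2, fit and predict are illustrative, and the ordinary-least-squares example (without intercept, for simplicity) merely stands in for any regression method.

```python
import numpy as np

def kfold_q2(X, y, fit, predict, k=5, seed=0):
    """k-fold cross-validated Q^2, a common measure of predictive power."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    press = 0.0                              # predictive residual sum of squares
    for test in np.array_split(idx, k):
        train = np.setdiff1d(idx, test)
        model = fit(X[train], y[train])
        press += np.sum((y[test] - predict(model, X[test])) ** 2)
    return 1.0 - press / np.sum((y - y.mean()) ** 2)

# Stand-in regression method: ordinary least squares without intercept.
fit = lambda X, y: np.linalg.lstsq(X, y, rcond=None)[0]
predict = lambda b, X: X @ b

rng = np.random.default_rng(1)
X = rng.normal(size=(40, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=40)
print(kfold_q2(X, y, fit, predict))          # close to 1: good predictions
```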

Todeschini, R., Consonni, V., Mauri, A. and Pavan, M. (2003) MOBYDIGS: software for regression and classification models by genetic algorithms, in Chemometrics: Genetic Algorithms and Artificial Neural Networks (ed. R. Leardi), Elsevier, Amsterdam, The Netherlands, pp. 141-167. [Pg.1183]

The majority of QSAR strategies aimed at building models are based on regression and classification methods, depending on the problem studied. For continuous properties, such as most biological activities and physico-chemical properties, the typical QSAR/QSPR model is defined as... [Pg.1252]
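The defining equation is cut off in the excerpt; the typical QSAR/QSPR regression model it refers to is presumably the familiar linear form (a hedged reconstruction, with \(x_j\) the molecular descriptors, \(b_j\) the regression coefficients and \(p\) the number of descriptors):

\[
\hat{y} = b_0 + \sum_{j=1}^{p} b_j x_j .
\]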

A ranking model is a relationship between a set of dependent attributes, experimentally investigated, and a set of independent attributes, i.e. model variables. As in regression and classification models, variable selection is one of the main steps in finding predictive models. In the present work, the genetic algorithm variable subset selection (GA-VSS) approach is proposed as the variable selection method to search for the best ranking models within a wide set of predictor variables. The ranking based on the selected subsets of variables is... [Pg.181]
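A compact sketch of the GA-VSS idea described above: chromosomes are 0/1 masks over the candidate variables, and fitness is the cross-validated quality of a model built on the selected columns. Everything here (ga_vss, q2_fitness, the one-point crossover, the elitist selection, the OLS inner model) is a generic illustration, not the MobyDigs implementation.

```python
import numpy as np

def q2_fitness(X, y, mask):
    """Leave-one-out Q^2 of ordinary least squares on the selected columns."""
    if mask.sum() == 0:
        return -np.inf
    Xs = X[:, mask.astype(bool)]
    press = 0.0
    for i in range(len(y)):
        tr = np.arange(len(y)) != i          # leave observation i out
        b = np.linalg.lstsq(Xs[tr], y[tr], rcond=None)[0]
        press += (y[i] - Xs[i] @ b) ** 2
    return 1.0 - press / np.sum((y - y.mean()) ** 2)

def ga_vss(X, y, fitness=q2_fitness, pop=20, n_gen=30, p_mut=0.05, seed=0):
    """Genetic-algorithm variable subset selection over the columns of X."""
    rng = np.random.default_rng(seed)
    m = X.shape[1]
    popn = rng.integers(0, 2, size=(pop, m))             # random 0/1 chromosomes
    for _ in range(n_gen):
        scores = np.array([fitness(X, y, c) for c in popn])
        elite = popn[np.argsort(scores)[::-1][: pop // 2]]   # keep better half
        children = []
        for _ in range(pop - len(elite)):
            a, b = elite[rng.integers(len(elite), size=2)]   # two parents
            cut = rng.integers(1, m)                         # one-point crossover
            child = np.concatenate([a[:cut], b[cut:]])
            flip = rng.random(m) < p_mut                     # bit-flip mutation
            child[flip] = 1 - child[flip]
            children.append(child)
        popn = np.vstack([elite, children])
    scores = np.array([fitness(X, y, c) for c in popn])
    return popn[np.argmax(scores)]                           # best mask found
```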

Todeschini, R., Consonni, V., Mauri, A. and Pavan, M. (2004) MobyDigs: software for regression and classification models by genetic algorithms, in Nature-Inspired Methods in Chemometrics: Genetic Algorithms and Artificial Neural Networks (ed. R. Leardi), Chapter 5, Elsevier, pp. 141-167. [Pg.217]

From these formulas, an algorithm for the construction of more parsimonious regression and classification models can be derived... [Pg.363]

A model population analysis approach for statistical model comparison is developed in this work. From our case studies, we have found strong evidence supporting the use of model population analysis for the comparison of different variable sets or different modeling methods in both regression and classification. The P values resulting from the proposed method, in combination with the sign of the mean of the D values, clearly show whether two models have the same performance or which model is significantly better. [Pg.18]
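A minimal sketch of that comparison logic: repeatedly sub-sample the data, build both models on each sub-sample, record D = error_A - error_B, then test whether mean(D) differs from zero. The function name mpa_compare, the 80% sub-sampling fraction and the one-sample t-test are illustrative assumptions; the cited work's exact sampling scheme and test statistic may differ.

```python
import numpy as np
from scipy import stats

def mpa_compare(X, y, err_a, err_b, n_sub=1000, frac=0.8, seed=0):
    """Model-population-style comparison of two modelling methods.

    err_a / err_b are user-supplied: given (X, y, train, test) they build
    one model and return its prediction error on the held-out objects.
    """
    rng = np.random.default_rng(seed)
    n = len(y)
    D = np.empty(n_sub)
    for i in range(n_sub):
        train = rng.choice(n, size=int(frac * n), replace=False)
        test = np.setdiff1d(np.arange(n), train)
        D[i] = err_a(X, y, train, test) - err_b(X, y, train, test)
    t, p = stats.ttest_1samp(D, 0.0)     # is mean(D) significantly non-zero?
    return D.mean(), p                   # sign of mean(D): which model wins
```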

LeBlanc, M. and Tibshirani, R.J. Combining estimates in regression and classification. J Am Stat Assoc, 91:1641-1650, 1996. [Pg.190]

BP networks tend to get stuck in local minima. Although the algorithms mentioned earlier that speed up learning help to overcome this tendency, it may be best to use a different network if you think this is a serious problem. The networks listed in the second-to-last paragraph for regression and classification problems are possible options. [Pg.91]

Steinfath, M., Groth, D., Lisec, J. and Selbig, J. (2008) Metabolite profile analysis: from raw data to regression and classification. Physiologia Plantarum, 132, 150-161. [Pg.556]

ReNDeR. The ReNDeR program may be used to produce non-linear displays and principal components plots of data sets (as well as regression and classification). Further details are available from Andy Lewcock, AEA Technology, Applied Neurocomputing Centre, 8.12 Harwell, Didcot, Oxon OX11 0RA, UK. [Pg.236]

Preprocessing of multivariate and multiway data sets prior to regression and discriminant analysis follows the general principles outlined earlier, with few exceptions. In general, the response matrix (i.e., the data that are to be predicted) should be mean centered, because this serves an additional purpose in regression and classification models: by centering both the dependent and independent variables, any possible differences in offsets are removed. Row normalization can be implemented if the priority is to establish a relationship between variables, rather than to estimate the magnitude of the response, or to stabilize the impact of differently concentrated samples on models, as previously described. For example, if the calibration model is intended to predict a concentration from data that follow the Beer-Lambert law (e.g., fluorescence), then it is crucial not to normalize, as this would cause the loss of concentration information. If, on the other hand, the model is intended to classify samples, then normalization may help the model focus on patterns rather than on concentration-induced variations. [Pg.344]
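A short sketch of the two preprocessing steps just described; mean_center and row_normalize are illustrative helper names.

```python
import numpy as np

def mean_center(X, y):
    """Column-centre X and centre y, removing any offsets before regression."""
    return X - X.mean(axis=0), y - y.mean()

def row_normalize(X):
    """Scale each row (sample) to unit length: stabilizes differently
    concentrated samples, but destroys absolute concentration information,
    so avoid it for Beer-Lambert calibration models."""
    return X / np.linalg.norm(X, axis=1, keepdims=True)
```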


See other pages where Regression and classification is mentioned: [Pg.238]    [Pg.18]    [Pg.21]    [Pg.227]    [Pg.67]    [Pg.466]    [Pg.600]    [Pg.848]    [Pg.335]    [Pg.177]    [Pg.360]    [Pg.371]    [Pg.375]    [Pg.405]    [Pg.463]    [Pg.497]    [Pg.152]    [Pg.249]    [Pg.222]    [Pg.203]    [Pg.467]    [Pg.748]   





Classification And Regression Trees (CART)

Classification and regression tree fixed cluster approaches

Classification and regression trees
