Regression creating models

The underlying notion in bilinear modeling is that something causes the systematic variabilities in the X data. But we may not know exactly what it is: there may be surprises in the data due to unexpected interferents, chemical interactions, nonlinear responses, etc. An approximate model of the subspace spanned by these phenomena in X is created. This X model is used for stabilizing the calibration modeling. The PLS regression primarily models the most dominant and most y-relevant of these X phenomena. Thus neither the manifest measured variables nor our causal assumptions about physical laws are taken for granted. Instead we tentatively look for systematic patterns in the data, and if they seem reasonable, we use them in the final calibration model. [Pg.197]

Principal component regression (PCR) is an extension of PCA with the purpose of creating a predictive model of the Y-data using the X or measurement data. For example, if X is composed of temperatures and pressures, Y may be the set of compositions that results from thermodynamic considerations. Piovoso and Kosanovich (1994) used PCR and a priori process knowledge to correlate routine pressure and temperature measurements with laboratory composition measurements to develop a predictive model of the volatile bottoms composition on a vacuum tower. [Pg.35]

All regression methods aim at the minimization of residuals, for instance minimization of the sum of the squared residuals. It is essential to focus on minimal prediction errors for new cases (the test set), not only for the calibration set from which the model has been created. It is relatively easy to create a model, especially with many variables and possibly nonlinear features, that fits the calibration data very well; however, it may be useless for new cases. This effect of overfitting is a crucial topic in model creation. Definition of appropriate criteria for the performance of regression models is not trivial. About a dozen different criteria, sometimes under different names, are used in chemometrics, and some others are waiting in the statistical literature to be discovered by chemometricians; a basic treatment of the criteria and of the methods for estimating them is given in Section 4.2. [Pg.118]
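The gap between calibration error and test error can be illustrated with a small numerical sketch. Everything here is an assumption made for illustration and is not taken from the excerpt: the synthetic linear response, the noise level, the polynomial degrees, and the use of RMSE as the performance criterion.

```python
import numpy as np

rng = np.random.default_rng(0)


def true_response(x):
    # Hypothetical underlying relationship (assumed linear for this sketch).
    return 2.0 * x + 1.0


# Small calibration set and a separate test set, both with measurement noise.
x_cal = np.linspace(-1.0, 1.0, 10)
y_cal = true_response(x_cal) + rng.normal(scale=0.2, size=x_cal.size)
x_test = np.linspace(-0.95, 0.95, 50)
y_test = true_response(x_test) + rng.normal(scale=0.2, size=x_test.size)


def rmse(y, y_hat):
    """Root mean squared error between observed and predicted values."""
    return float(np.sqrt(np.mean((y - y_hat) ** 2)))


# Fit a simple and a flexible polynomial model to the calibration data only.
results = {}
for degree in (1, 7):
    coeffs = np.polyfit(x_cal, y_cal, degree)
    results[degree] = (
        rmse(y_cal, np.polyval(coeffs, x_cal)),    # calibration error
        rmse(y_test, np.polyval(coeffs, x_test)),  # test error
    )
    print(f"degree {degree}: RMSE cal = {results[degree][0]:.3f}, "
          f"RMSE test = {results[degree][1]:.3f}")
```

The flexible model always fits the calibration data at least as well as the simple one, because the simple model is a special case of it. Whether it also predicts the test set well is a separate question, and this is why the test error is the quantity to watch.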

For each chromosome (variable subset), a so-called fitness (response, objective function) has to be determined, which in the case of variable selection is a performance measure of the model created from this variable subset. In most GA applications, only fit-criteria that consider the number of variables are used (AIC, BIC, adjusted R2, etc.) together with fast OLS regression and fast leave-one-out CV (see Section 4.3.2). Rarely, more powerful evaluation schemes are applied (Leardi 1994). [Pg.157]
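As a rough sketch of such a fitness evaluation, the function below scores one chromosome by fitting OLS on the selected variables and returning the adjusted R². The function name, the boolean-mask encoding of the chromosome, and the synthetic data are assumptions for illustration; a real GA would call this function for every chromosome in the population.

```python
import numpy as np


def adjusted_r2_fitness(X, y, chromosome):
    """Fitness of a variable subset: adjusted R^2 of an OLS model.

    chromosome is a 0/1 (or boolean) sequence with one entry per column of X.
    """
    mask = np.asarray(chromosome, dtype=bool)
    n = len(y)
    k = int(mask.sum())
    if k == 0 or n - k - 1 <= 0:
        return -np.inf  # empty or over-parameterized subsets are rejected
    A = np.column_stack([np.ones(n), X[:, mask]])  # intercept + selected variables
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    residuals = y - A @ coef
    ss_res = float(residuals @ residuals)
    ss_tot = float(np.sum((y - y.mean()) ** 2))
    r2 = 1.0 - ss_res / ss_tot
    # The adjustment penalizes subsets that use more variables.
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)


# Synthetic example: only the first of four variables is related to y.
rng = np.random.default_rng(1)
X = rng.normal(size=(30, 4))
y = 2.0 * X[:, 0] + rng.normal(scale=0.1, size=30)

fit_relevant = adjusted_r2_fitness(X, y, [1, 0, 0, 0])
fit_irrelevant = adjusted_r2_fitness(X, y, [0, 1, 0, 0])
print(fit_relevant, fit_irrelevant)
```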

FIGURE 4.24 PLS as a multiple linear regression method for prediction of a property y from variables x1,..., xm, applying regression coefficients b1,...,bm (mean-centered data). From a calibration set, the PLS model is created and applied to the calibration data and to test data. [Pg.165]
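The workflow in this caption can be sketched with a minimal single-response PLS (PLS1) routine. This is one common NIPALS-style formulation, not necessarily the algorithm used in the cited book; the function names `pls1_fit` and `pls1_predict`, the synthetic data, and the choice of two components are all assumptions.

```python
import numpy as np


def pls1_fit(X, y, n_components):
    """Fit PLS1 and return regression coefficients for mean-centered data."""
    x_mean = X.mean(axis=0)
    y_mean = y.mean()
    Xk = X - x_mean            # mean-centered X, deflated step by step
    yk = y - y_mean            # mean-centered y, deflated step by step
    W, P, q = [], [], []
    for _ in range(n_components):
        w = Xk.T @ yk
        w = w / np.linalg.norm(w)      # weight vector
        t = Xk @ w                     # scores
        tt = t @ t
        p = Xk.T @ t / tt              # X loadings
        c = yk @ t / tt                # y loading
        Xk = Xk - np.outer(t, p)       # remove the modeled part of X
        yk = yk - c * t                # remove the modeled part of y
        W.append(w)
        P.append(p)
        q.append(c)
    W = np.array(W).T
    P = np.array(P).T
    q = np.array(q)
    b = W @ np.linalg.solve(P.T @ W, q)  # coefficients b1..bm
    return b, x_mean, y_mean


def pls1_predict(X, b, x_mean, y_mean):
    """Apply the coefficients to (new) data using the calibration means."""
    return y_mean + (X - x_mean) @ b


rng = np.random.default_rng(2)
X_cal = rng.normal(size=(20, 3))
y_cal = X_cal @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.1, size=20)
X_test = rng.normal(size=(5, 3))

b, x_mean, y_mean = pls1_fit(X_cal, y_cal, n_components=2)
y_hat_cal = pls1_predict(X_cal, b, x_mean, y_mean)    # calibration data
y_hat_test = pls1_predict(X_test, b, x_mean, y_mean)  # test data
```

Note that the test data are centered with the calibration means, not with their own means, so the model is applied exactly as it was created.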

In this paper the PLS method was introduced as a new tool in calculating statistical receptor models. It was compared with the two most popular methods currently applied to aerosol data: the Chemical Mass Balance Model and Target Transformation Factor Analysis. The characteristics of the PLS solution were discussed and its advantages over the other methods were pointed out. PLS is especially useful when both the predictor and response variables are measured with noise and there is high correlation in both blocks. It has been shown in several other chemical applications that its performance is equal to or better than that of multiple, stepwise, principal component and ridge regression. Our goal was to create a basis for its environmental chemical application. [Pg.295]

Utilization of Partial Least Squares (PLS) (22) regression to reduce the data set and create the QSAR model. [Pg.176]

In either case, reaching this point indicates whether or not the drug is beneficial and is at least a qualitative endpoint. Last observation carried forward (LOCF), a standard method of data analysis, carries the last data point forward week by week. Random regression models can estimate what would happen at a later time point, assuming that patients change in a linear fashion. Improvement, however, often levels off. Thus, creating data points based on questionable assumptions can potentially introduce substantial bias. [Pg.24]
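A minimal sketch of LOCF for one patient's weekly scores is shown below. The function name, the use of `None` for a missing visit, and the example scores are illustrative assumptions, not data from the source.

```python
def locf(values):
    """Last observation carried forward: fill each missing entry (None)
    with the most recent observed value before it."""
    filled = []
    last = None
    for v in values:
        if v is not None:
            last = v
        filled.append(last)  # stays None until a first observation exists
    return filled


# Hypothetical weekly rating-scale scores; the patient drops out after week 2.
weekly_scores = [20, 18, None, None]
print(locf(weekly_scores))  # [20, 18, 18, 18]
```

The imputed values for weeks 3 and 4 are not measurements. They encode the assumption that nothing changed after dropout, which is the kind of assumption the passage warns about.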

PCR creates a quantitative model in a two-step process: (1) the so-called principal components analysis (PCA) scores (they are described just below), T, of the I calibration samples are calculated for A factors, and then (2) the scores are regressed against the analyte concentration. [Pg.174]
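The two steps can be sketched as follows, assuming mean-centered data and PCA computed through the singular value decomposition. The function name `pcr_fit`, the synthetic data, and the choice A = 2 are assumptions for illustration.

```python
import numpy as np


def pcr_fit(X, y, n_factors):
    """Principal component regression in two steps."""
    x_mean = X.mean(axis=0)
    y_mean = y.mean()
    Xc = X - x_mean
    # Step 1: PCA scores T (I samples x A factors) via the SVD of centered X.
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    T = U[:, :n_factors] * s[:n_factors]
    P = Vt[:n_factors].T                       # loadings (variables x A)
    # Step 2: regress the centered analyte concentration on the scores.
    q, *_ = np.linalg.lstsq(T, y - y_mean, rcond=None)
    b = P @ q                                  # coefficients in the original variables
    return T, b, x_mean, y_mean


rng = np.random.default_rng(5)
X = rng.normal(size=(15, 4))                   # I = 15 calibration samples
y = X @ np.array([0.5, 1.5, -1.0, 0.0]) + rng.normal(scale=0.05, size=15)

T, b, x_mean, y_mean = pcr_fit(X, y, n_factors=2)   # A = 2 factors
y_hat = y_mean + (X - x_mean) @ b
```

If all factors are retained and X has full column rank, the result coincides with ordinary least squares. The benefit of PCR comes from keeping only the first few factors.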

The trouble is that you often have too many descriptors, and/or insufficient information on the reaction mechanism. This creates two problems: building a regression model requires the calculation of the inverse of XᵀX, which cannot be done for a matrix X that contains more variables than experiments. Moreover, if you... [Pg.257]
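The singularity of XᵀX when there are more variables than experiments is easy to check numerically. The matrix sizes and random data below are assumptions chosen only to illustrate the point.

```python
import numpy as np

rng = np.random.default_rng(3)
n_experiments, n_descriptors = 5, 8          # more variables than experiments
X = rng.normal(size=(n_experiments, n_descriptors))

XtX = X.T @ X                                # 8 x 8 matrix
rank_X = np.linalg.matrix_rank(X)            # cannot exceed the number of experiments
eigenvalues = np.linalg.eigvalsh(XtX)        # ascending order

print("shape of X^T X:", XtX.shape)
print("rank of X:", rank_X)
print("smallest / largest eigenvalue of X^T X:", eigenvalues[0], eigenvalues[-1])
```

Because the rank of XᵀX equals the rank of X, at least three of the eight eigenvalues are zero up to rounding error. The matrix therefore has no inverse and the ordinary least-squares formula cannot be applied directly.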

Sometimes the question arises whether it is possible to find an optimum regression model by a feature selection procedure. The usual way is to select the model which gives the minimum predictive residual error sum of squares, PRESS (see Section 5.7.2) from a series of calibration sets. Commonly these series are created by so-called cross-validation procedures applied to one and the same set of calibration experiments. In the same way PRESS may be calculated for different sets of features, which enables one to find the optimum set. [Pg.197]
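A minimal sketch of this procedure is given below. It assumes leave-one-out cross-validation with an OLS model and an exhaustive search over all feature subsets, which is feasible only for a handful of features. The function name `press_loo` and the synthetic data are hypothetical.

```python
import itertools

import numpy as np


def press_loo(X, y):
    """PRESS from leave-one-out cross-validation of an OLS model with intercept."""
    n = len(y)
    press = 0.0
    for i in range(n):
        keep = np.arange(n) != i                       # leave sample i out
        A = np.column_stack([np.ones(n - 1), X[keep]])
        coef, *_ = np.linalg.lstsq(A, y[keep], rcond=None)
        prediction = np.concatenate([[1.0], X[i]]) @ coef
        press += (y[i] - prediction) ** 2              # squared prediction error
    return float(press)


# Synthetic calibration experiments: only features 0 and 2 influence y.
rng = np.random.default_rng(4)
X = rng.normal(size=(25, 4))
y = 3.0 * X[:, 0] - 2.0 * X[:, 2] + rng.normal(scale=0.1, size=25)

# PRESS for every non-empty feature subset, computed on the same experiments.
press_by_subset = {}
for size in range(1, X.shape[1] + 1):
    for subset in itertools.combinations(range(X.shape[1]), size):
        press_by_subset[subset] = press_loo(X[:, list(subset)], y)

best_subset = min(press_by_subset, key=press_by_subset.get)
print("feature set with minimum PRESS:", best_subset)
```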

The lack of uniqueness of circuit models creates ambiguity when interpreting impedance response using regression analysis. A good fit does not, in itself, validate the model used. As discussed in Chapter 23, impedance spectroscopy is not a standalone technique. Additional observations are needed to validate a model. [Pg.72]

Big Chemical Encyclopedia
