Dimensionality of the Model

Dimensionality (or complexity) of the regression model refers to the number of factors included in the regression model. This is probably one of the most critical parameters that have to be optimised in order to obtain good predictions. Unfortunately, there is no unique criterion on how to set it. In this introductory book, only the most common procedure is explained in detail. The reader should be aware, however, that it is just a procedure, not the procedure. We will also mention two other recent possibilities that, in our experience, yield good results and are intuitive. [Pg.200]

Before going into the details on how to set the dimensionality, we first recommend taking a look at the sample sets to verify that no clear outlying samples will immediately influence (negatively) the models. One can start by examining all the spectra overlaid in order to see rapidly whether some are different. A histogram of the concentration values may help to assess their [Pg.200]

Another important concept is parsimony. If several models perform more or less equally, the preferred model is the one with the fewest factors, because the fewer factors a model uses, the less sensitive its predictions are to non-relevant spectral phenomena. Hence the PLS model will still be useful for future unknowns even when slight differences appear (e.g. some smooth instrumental drift, slightly higher noise, etc.). Such a model is said to be more parsimonious than the others. [Pg.204]
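The parsimony rule can be sketched in a few lines of Python. This is only an illustration: the function name `most_parsimonious`, the 5% relative tolerance used to decide that two models "perform more or less equally", and the error values are all assumptions made for the example, not taken from the source.

```python
def most_parsimonious(errors, tolerance=0.05):
    """Return the smallest number of factors whose prediction error is
    within `tolerance` (relative) of the lowest error observed.

    errors[i] is the prediction error of the model with i + 1 factors.
    The tolerance is an assumed, user-chosen threshold.
    """
    if not errors:
        raise ValueError("errors must be non-empty")
    threshold = min(errors) * (1.0 + tolerance)
    # Scan from the simplest model upwards and stop at the first one
    # that performs about as well as the best model.
    for n_factors, err in enumerate(errors, start=1):
        if err <= threshold:
            return n_factors


# Hypothetical prediction errors for models with 1 to 6 factors.
example_errors = [0.90, 0.41, 0.22, 0.215, 0.21, 0.23]
print(most_parsimonious(example_errors))
```

In this made-up series the lowest error occurs at five factors, but the three-factor model is within the tolerance and would be preferred.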

A way to start evaluating how a model performs when different numbers of factors are considered is to evaluate how much of the information (i.e. variance) in the X- and Y-blocks is explained. We expect that, whatever the optimum number of factors turns out to be, no further relevant information enters the model beyond that value. Almost any software will calculate the amount of variance explained by the model. A typical output will appear as in Table 4.1. There, it is clear that not all information in X is useful to predict the concentration of Sb in the standards, probably because of the interfering phenomena caused by the concomitants. It is worth noting that only around 68% of the information in X is related to around 98% of the information in Y ([Sb]). This type of table is not always so clear, and a fairly large number of factors may be required to model a large percentage of the information in X and, more importantly, in Y. As a first rough approximation, one can say that the optimal dimensionality should be [Pg.204]

Number of factors Variance explained in X-block (%) Variance explained in Y-block (%) [Pg.205]
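A table with the columns above can be produced by deflating the X- and Y-blocks one factor at a time and recording how much of the original sum of squares has been removed. The following is a minimal sketch, assuming a single response (PLS1) fitted with the NIPALS algorithm and mean-centred data; the function name and the small data set are invented for illustration and are not the Sb data of Table 4.1.

```python
import math


def pls1_explained_variance(X, y, max_factors):
    """Cumulative % variance explained in X and y per PLS1 factor."""
    n, p = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    E = [[row[j] - x_mean[j] for j in range(p)] for row in X]  # X residuals
    f = [v - y_mean for v in y]                                # y residuals
    ss_x0 = sum(v * v for row in E for v in row)
    ss_y0 = sum(v * v for v in f)
    table = []
    for a in range(1, max_factors + 1):
        # Weight vector: direction in X with maximum covariance with y.
        w = [sum(E[i][j] * f[i] for i in range(n)) for j in range(p)]
        norm = math.sqrt(sum(v * v for v in w))
        if norm < 1e-12:  # nothing left in y to model
            break
        w = [v / norm for v in w]
        t = [sum(E[i][j] * w[j] for j in range(p)) for i in range(n)]
        tt = sum(v * v for v in t)
        load = [sum(E[i][j] * t[i] for i in range(n)) / tt for j in range(p)]
        q = sum(f[i] * t[i] for i in range(n)) / tt
        # Deflate both blocks by the part described by this factor.
        E = [[E[i][j] - t[i] * load[j] for j in range(p)] for i in range(n)]
        f = [f[i] - q * t[i] for i in range(n)]
        x_pct = 100.0 * (1.0 - sum(v * v for row in E for v in row) / ss_x0)
        y_pct = 100.0 * (1.0 - sum(v * v for v in f) / ss_y0)
        table.append((a, x_pct, y_pct))
    return table


# Synthetic illustration data (6 samples, 4 variables), not real spectra.
X_demo = [[1.0, 2.0, 0.5, 3.0], [2.0, 1.5, 1.0, 2.5], [3.0, 3.5, 1.4, 2.0],
          [4.0, 2.5, 2.1, 3.5], [5.0, 4.5, 2.4, 1.0], [6.0, 3.0, 3.1, 2.2]]
y_demo = [1.1, 1.9, 3.2, 3.9, 5.1, 6.0]

for a, x_pct, y_pct in pls1_explained_variance(X_demo, y_demo, 3):
    print(f"{a:>2d}  {x_pct:6.2f}  {y_pct:6.2f}")
```

Reading such a table from top to bottom, one would look for the number of factors after which the Y-block percentage stops increasing appreciably.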

The dimensionality of the model, a, is estimated so as to give the model as good predictive properties as possible. Geometrically, this corresponds to the fitting of an a-dimensional hyperplane to the object points in the measurement space. The fitting is made using the least squares criterion, i.e. the sum of squared residuals is minimized for the class data set. [Pg.85]
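The least squares criterion described above can be written out as follows. The notation is an assumption made for this sketch and is not taken from the source: $x_i$ is the measurement vector of object $i$ in the class, $\bar{x}$ the class mean, $p_k$ the $k$-th direction spanning the hyperplane, $t_{ik}$ the coordinate of object $i$ along that direction, and $e_i$ the residual.

```latex
x_i = \bar{x} + \sum_{k=1}^{a} t_{ik}\, p_k + e_i ,
\qquad i = 1, \dots, n
\\[6pt]
\min_{\{t_{ik}\},\,\{p_k\}} \; \sum_{i=1}^{n} \lVert e_i \rVert^{2}
```

With $a = 0$ the model reduces to the class mean; each additional component adds one dimension to the hyperplane and can only lower the residual sum of squares on the class data set, which is why the fit alone cannot be used to choose $a$.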

In SIMCA and CLASSY, the inner model space is formed from all the significant components. In practice, because it is difficult to determine the number of significant components, the dimensionality of the model is often examined systematically to find the best number of components on the basis of predictive performance. [Pg.126]
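A systematic examination of this kind can be sketched as a loop over candidate dimensionalities, scoring each one by its prediction error on left-out samples. The sketch below uses a PLS1 regression with leave-one-out cross-validation as a stand-in; it is not the SIMCA or CLASSY procedure itself, and the function names and data are hypothetical.

```python
import math


def fit_pls1(X, y, n_factors):
    """Fit PLS1 by NIPALS deflation; return what is needed to predict."""
    n, p = len(X), len(X[0])
    x_mean = [sum(r[j] for r in X) / n for j in range(p)]
    y_mean = sum(y) / n
    E = [[r[j] - x_mean[j] for j in range(p)] for r in X]
    f = [v - y_mean for v in y]
    factors = []  # one (weights, loadings, q) triple per factor
    for _ in range(n_factors):
        w = [sum(E[i][j] * f[i] for i in range(n)) for j in range(p)]
        norm = math.sqrt(sum(v * v for v in w))
        if norm < 1e-12:  # y residual already fully modelled
            break
        w = [v / norm for v in w]
        t = [sum(E[i][j] * w[j] for j in range(p)) for i in range(n)]
        tt = sum(v * v for v in t)
        load = [sum(E[i][j] * t[i] for i in range(n)) / tt for j in range(p)]
        q = sum(f[i] * t[i] for i in range(n)) / tt
        E = [[E[i][j] - t[i] * load[j] for j in range(p)] for i in range(n)]
        f = [f[i] - q * t[i] for i in range(n)]
        factors.append((w, load, q))
    return x_mean, y_mean, factors


def predict_pls1(model, x):
    """Predict y for one sample by repeating the deflation on its row."""
    x_mean, y_mean, factors = model
    e = [x[j] - x_mean[j] for j in range(len(x))]
    y_hat = y_mean
    for w, load, q in factors:
        t = sum(e[j] * w[j] for j in range(len(e)))
        y_hat += q * t
        e = [e[j] - t * load[j] for j in range(len(e))]
    return y_hat


def loo_rmsecv(X, y, max_factors):
    """Leave-one-out RMSECV for models with 1..max_factors factors."""
    n = len(X)
    result = []
    for a in range(1, max_factors + 1):
        press = 0.0  # predictive residual error sum of squares
        for i in range(n):
            model = fit_pls1(X[:i] + X[i + 1:], y[:i] + y[i + 1:], a)
            press += (y[i] - predict_pls1(model, X[i])) ** 2
        result.append(math.sqrt(press / n))
    return result


# Synthetic illustration data, not taken from the source.
X_cv = [[1.0, 2.0, 0.5, 3.0], [2.0, 1.5, 1.0, 2.5], [3.0, 3.5, 1.4, 2.0],
        [4.0, 2.5, 2.1, 3.5], [5.0, 4.5, 2.4, 1.0], [6.0, 3.0, 3.1, 2.2]]
y_cv = [1.1, 1.9, 3.2, 3.9, 5.1, 6.0]
print(loo_rmsecv(X_cv, y_cv, 3))
```

The resulting list gives one cross-validated error per candidate dimensionality; the number of components is then chosen from where that error levels off or reaches its minimum.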

In addition to the dimensionality of the model, one must also consider what one can call the dimensionality of the graph. All of the problems that are endemic to this phase of traditional nomenclature will be evaded in the nomenclature being proposed because the focus shall be strictly on the set of edges. The set of faces, in this system, has no significance. Consequently, the fact that Euler's Polyhedron Formula is applicable only to heuristically simple polytopes (of any dimension) is a problem that does NOT arise. [Pg.28]

The dimensionality of the model parameters is expressed through L, using the condition of the Hamiltonian 02 being dimensionless... [Pg.597]

To provide an overview of cell-level models, in this chapter the dimensionality of the models is used as the criterion. On the cell level, zero-dimensional to fully three-dimensional approaches are known. These dimensions are illustrated in Figure 15.2. Whereas zero-dimensional models are single equations and one-dimensional approaches describe processes orthogonal to the electrolyte, simulations in two and more dimensions also include the mass, heat, and charge transport in the plane of the flow field. [Pg.269]

Using amplitude ratios makes inverting for source mechanisms more difficult, however, because a ratio is a nonlinear function of its denominator. Systematic searching methods still work, but because the dimensionality of the model space is increased by two over that for a DC mechanism, the computational labor is greatly increased (typically by a factor of more than 100). [Pg.1576]

Re-sampling methods are widely used to estimate parameters and/or their uncertainty in a model [28,48]. The simplest case is the estimation of the mean of a population. In a multivariate context, re-sampling methods are applied to estimate the parameters and their uncertainty with two objectives: (a) to estimate the dimensionality of the model in terms of latent variables and (b) to estimate the uncertainty of individual variables to find the relevant ones (out of many). [Pg.182]
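The simplest case mentioned above, estimating a population mean and its uncertainty, can be illustrated with a bootstrap, one common re-sampling scheme. This is a minimal sketch: the source does not say which re-sampling method it has in mind, and the function name, the number of resamples, the seed and the data values are assumptions made for the example.

```python
import random
import statistics


def bootstrap_mean(values, n_resamples=1000, seed=0):
    """Estimate the mean and its standard error by bootstrapping.

    Each resample draws len(values) items with replacement; the spread
    of the resample means approximates the uncertainty of the mean.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    n = len(values)
    means = []
    for _ in range(n_resamples):
        resample = [values[rng.randrange(n)] for _ in range(n)]
        means.append(sum(resample) / n)
    return statistics.mean(means), statistics.stdev(means)


# Hypothetical replicate measurements.
measurements = [4.8, 5.1, 5.0, 4.7, 5.3, 5.2, 4.9, 5.0]
estimate, std_error = bootstrap_mean(measurements)
print(estimate, std_error)
```

The same idea carries over to the multivariate case: the model is refitted on each resample, and the spread of the refitted quantities (prediction errors per number of latent variables, or individual coefficients) is inspected.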
