Variable selection problem

Garthwaite, P., Dickey, J. M. 1996. Quantifying and using expert opinion for variable-selection problems in regression. [Pg.1706]

In terms of the derived general relationships (3-1) and (3-2), x, y, and h are independent variables; cost and volume are dependent variables. That is, the cost and volume become fixed once the dimensions are specified. However, under the given restriction on volume, the function g(x, y, h) = xyh becomes a constraint function. In place of three independent and two dependent variables, the problem reduces to two independent variables (volume has been constrained) and two dependent variables, as in functions (3-3) and (3-4). Further, the requirement of minimum cost reduces the problem to three dependent variables (x, y, h) and no degrees of freedom, that is, no freedom of independent selection. [Pg.441]

Continuous variables can assume any value within an interval; discrete variables can take only distinct values. An example of a discrete variable is one that assumes integer values only. Often in chemical engineering, discrete variables and continuous variables occur simultaneously in a problem. If you wish to optimize a compressor system, for example, you must select the number of compressor stages (an integer) in addition to the suction and production pressure of each stage (positive continuous variables). Optimization problems without discrete variables are far easier to solve than those with even one discrete variable. Refer to Chapter 9 for more information about the effect of discrete variables in optimization. [Pg.45]
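The compressor example can be sketched as an enumeration over the integer variable. This is only an illustration under stated assumptions that are not in the source: ideal-gas isentropic compression with perfect intercooling, an equal pressure ratio in every stage (which fixes the continuous interstage pressures), and a hypothetical fixed cost per stage expressed in the same units as the work. The names stage_work, best_stage_count, and stage_cost are invented for this sketch.

```python
def stage_work(p_in, p_out, n_stages, k=1.4, rt=1.0):
    """Total isentropic work for n equal-ratio stages with perfect intercooling.

    Assumes an ideal gas; rt stands for R*T at the suction of every stage
    (set to 1.0 so the result is dimensionless).
    """
    ratio = (p_out / p_in) ** (1.0 / n_stages)
    return n_stages * rt * k / (k - 1.0) * (ratio ** ((k - 1.0) / k) - 1.0)


def best_stage_count(p_in, p_out, stage_cost, max_stages=10):
    """Enumerate the integer variable; return (n_stages, total_cost)."""
    best = None
    for n in range(1, max_stages + 1):
        # stage_cost is a hypothetical per-stage penalty, not a real cost model
        total = stage_work(p_in, p_out, n) + stage_cost * n
        if best is None or total < best[1]:
            best = (n, total)
    return best
```

With no per-stage cost the work keeps falling as stages are added, so the search runs to max_stages; a large stage_cost pushes the answer back to a single stage. The integer variable is handled by enumeration here only because the range is tiny.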

This problem is very small, however, with only two decision variables. As the number of decision variables increases, the number of iterations required by evolutionary solvers to achieve high accuracy increases rapidly. To illustrate this, consider the linear project selection problem shown in Table 10.9. The optimal solution is also shown there, found by the LP solver. This problem involves determining the optimal level of investment for each of eight projects, labeled A through H, for which fractional levels are allowed. Each project has an associated net present value (NPV) of its projected net profits over the next 5 years and a different cost in each of the 5 years, both of which scale proportionately to the fractional level of investment. Total costs in each year are limited by forecasted budgets (funds available in... [Pg.405]

Closely related to the creation of regression models by OLS is the problem of variable selection (feature selection). This topic is therefore presented in Section 4.5, although variable selection is also highly relevant for other regression methods and for classification. [Pg.119]

Variable selection is an optimization problem. One optimization method that combines randomness with a strategy borrowed from biology is the genetic algorithm (GA), a so-called natural computation method (Massart et al. 1997). The basic structure of GAs is well suited to selection (Davis 1991; Hibbert 1993; Leardi 2003), and various applications of GAs for variable selection in chemometrics have been reported (Broadhurst et al. 1997; Jouan-Rimbaud et al. 1995; Leardi 1994, 2001, 2007). Only a brief introduction to GAs is given here, and only from the point of view of variable selection. [Pg.157]
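A minimal sketch of how a GA can encode variable selection: each chromosome is a list of booleans, one per candidate variable, and the fitness function scores the subset it encodes. The operators chosen here (truncation selection, single-point crossover, bit-flip mutation) and the names ga_select and toy_fitness are assumptions made for illustration, not the scheme of any of the cited authors. In practice the fitness would be a validated model-quality measure; toy_fitness is a made-up stand-in.

```python
import random


def ga_select(n_vars, fitness, pop_size=20, generations=40, p_mut=0.05, seed=0):
    """Return the fittest chromosome (list of bools) found by a simple GA."""
    rng = random.Random(seed)
    pop = [[rng.random() < 0.5 for _ in range(n_vars)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        parents = pop[: pop_size // 2]        # truncation selection: keep fitter half
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n_vars)    # single-point crossover
            child = a[:cut] + b[cut:]
            # bit-flip mutation: toggle each gene with probability p_mut
            child = [(not g) if rng.random() < p_mut else g for g in child]
            children.append(child)
        pop = parents + children
    return max(pop, key=fitness)


def toy_fitness(chrom):
    """Hypothetical score: +2 for each 'relevant' variable (0 and 3), -1 per extra."""
    relevant = {0, 3}
    return sum(2 if i in relevant else -1 for i, g in enumerate(chrom) if g)
```

Calling ga_select(8, toy_fitness) returns one boolean mask of length 8. Because the parents are carried over unchanged, the best fitness in the population never decreases from one generation to the next.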

A general alternative to stepwise-type searching methods for variable selection would be methods that attempt to explore as much of the solution space as possible. An exhaustive search of all possible combinations of variables is feasible only for problems that involve relatively few x variables. However, it... [Pg.423]
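For a handful of x variables, the exhaustive search can be written directly. The sketch below assumes ordinary least squares with an intercept and uses the residual sum of squares on the fitting data as the criterion, which is a simplification (a validation-based criterion would normally be preferred). The helper names and the small data set are made up for illustration.

```python
from itertools import combinations


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting.

    Fails with ZeroDivisionError if the matrix is singular.
    """
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def rss(X, y, cols):
    """Residual sum of squares of an OLS fit (with intercept) on the given columns."""
    rows = [[1.0] + [row[c] for c in cols] for row in X]
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    beta = solve(xtx, xty)
    return sum((yi - sum(b * v for b, v in zip(beta, r))) ** 2
               for r, yi in zip(rows, y))


def best_subset(X, y, size):
    """Try every combination of `size` columns and return the one with lowest RSS."""
    return min(combinations(range(len(X[0])), size), key=lambda c: rss(X, y, c))


# Made-up data: y is built exactly as 1 + 2*x0 - 3*x2
X_demo = [[1, 2, 0, 5], [2, 1, 1, 3], [3, 5, 0, 1],
          [4, 3, 2, 4], [5, 4, 1, 0], [6, 0, 3, 2]]
y_demo = [1 + 2 * r[0] - 3 * r[2] for r in X_demo]
```

Calling best_subset(X_demo, y_demo, 2) should pick out columns 0 and 2, since y_demo was constructed from them. The number of combinations grows combinatorially with the number of variables, which is why this approach is limited to small problems.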

As discussed in the introduction, the solution of the inverse model equation for the regression vector involves the inversion of RᵀR (see Equation 5.23). In many analytical chemistry experiments, a large number of variables are measured and RᵀR cannot be inverted (i.e., it is singular). One approach to solving this problem is called stepwise MLR, where a subset of variables is selected such that RᵀR is not singular. There must be at least as many variables selected as there are chemical components in the system, and these variables must represent different sources of variation. Additional variables are required if there are other sources of variation (chemical or physical) that need to be modeled. It may also be the case that a sufficiently small number of variables are measured so that MLR can be used without variable selection. [Pg.130]
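The singularity can be seen on a tiny made-up example: with two samples and three measured variables, the 3 x 3 matrix RᵀR has rank at most 2, so its determinant is zero, while the matrix built from two selected variables is invertible. The matrix R and the helper names below are illustrative assumptions, not data from the source.

```python
def gram(R, cols):
    """Return R^T R restricted to the chosen columns of R (rows are samples)."""
    return [[sum(row[i] * row[j] for row in R) for j in cols] for i in cols]


def det(m):
    """Determinant by cofactor expansion; adequate for the small matrices used here."""
    if len(m) == 1:
        return m[0][0]
    total = 0
    for j, v in enumerate(m[0]):
        minor = [row[:j] + row[j + 1:] for row in m[1:]]
        total += (-1) ** j * v * det(minor)
    return total


R = [[1, 2, 3],
     [4, 5, 6]]                     # 2 samples, 3 measured variables (made-up integers)

full = det(gram(R, [0, 1, 2]))      # all three variables: determinant is 0 (singular)
subset = det(gram(R, [0, 1]))       # two selected variables: determinant is 9
```

Integer data are used so that the zero determinant is exact rather than a rounding artifact.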

The problem with keeping variables that appear to be significant but are only modeling noise is that this overfitting of the data degrades the prediction ability of the model. It is, therefore, important to only add variables that improve prediction of future samples, not just improve the fit. The approach we take in this section is to use the statistical output as the first pass for variable selection. We then further refine the model (which usually means reducing the number of variables) by examining results from a validation set. [Pg.311]

Recent work by Weber [86, 87] has shown that it is possible to determine the activity of catalysts from a completely practical point of view by correlating the more important actions of the catalyst as a function of the process variables (temperature, pressure, nature of catalyst) by means of simple equations, the validity of which has been proved for a large number of catalytic processes. These equations are especially useful in the study of simultaneous chemical reactions in which selectivity problems play a part. They are of great value for more systematic research on catalysts. [Pg.104]

Remark 4. The presented optimization model is an MINLP problem. The binary variables select the process stream matches, while the continuous variables represent the utility loads, the heat loads of the heat exchangers, the heat residuals, the flow rates and temperatures of the interconnecting streams in the hyperstructure, and the area of each exchanger. Note that by substituting the areas from the constraints (B) into the objective function we eliminate them from the variable set. The nonlinearities in the proposed model arise because of the objective function and the energy balances in the mixers and heat exchangers. As a result we have nonconvexities present in both the objective function and constraints. The solution of the MINLP model will provide simultaneously the... [Pg.355]

Besides Tikhonov regularization, there are numerous other regularization methods with properties appropriate to distinct problems [42, 53, 73]. For example, an iterated form of Tikhonov regularization was proposed in 1955 [77]. Other situations include using different norms instead of the Euclidean norm in Equation 5.25 to obtain variable-selected models [53, 79, 80] and different basis sets such as wavelets [81]... [Pg.153]

These IF statements are really a form of discrete decision making embedded within the model. One possible approach to removing the difficulties they cause is to move the discrete decisions outside the model and the continuous variable optimizer. For example, the friction factor equation can be selected to be the laminar one irrespective of the Reynolds number that is computed later. Constraints can be added to forbid movement outside the laminar region or to forbid movement too far outside the laminar region. If the solution to the well-behaved continuous variable optimization problem (it is solved with few iterations) is on such a constraint boundary, tests can be made to see if crossing the constraint boundary can improve the objective function. If so, the boundary is crossed, i.e., a new value is given to the discrete decision, and so on. [Pg.520]
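The strategy described above can be sketched as an outer discrete choice wrapped around a smooth inner optimization. Everything specific in this sketch is an assumption made for illustration: the regime boundary at Re = 2100, the Darcy friction factor correlations (64/Re for laminar flow, the Blasius form 0.316 Re^-0.25 for turbulent flow), the hypothetical objective (a pumping-cost term proportional to f Re^3 minus a throughput credit proportional to Re), and the use of golden-section search as the continuous optimizer.

```python
import math

RE_CRIT = 2100.0  # assumed laminar/turbulent boundary


def friction(re, regime):
    """Darcy friction factor with the regime fixed from outside (no IF on Re)."""
    return 64.0 / re if regime == "laminar" else 0.316 * re ** -0.25


def cost(re, regime, a=1e-9, b=1e-3):
    """Hypothetical objective: pumping cost ~ f*Re^3 minus throughput credit ~ Re."""
    return a * friction(re, regime) * re ** 3 - b * re


def golden_min(f, lo, hi, tol=1e-6):
    """Golden-section search for the minimum of a unimodal f on [lo, hi]."""
    g = (math.sqrt(5.0) - 1.0) / 2.0
    while hi - lo > tol:
        x1 = hi - g * (hi - lo)
        x2 = lo + g * (hi - lo)
        if f(x1) < f(x2):
            hi = x2
        else:
            lo = x1
    return 0.5 * (lo + hi)


def optimize(re_min=100.0, re_max=1.0e5):
    """Solve with the laminar equation first; cross the boundary only if it helps."""
    re = golden_min(lambda r: cost(r, "laminar"), re_min, RE_CRIT)
    best = ("laminar", re, cost(re, "laminar"))
    if RE_CRIT - re < 1e-3:  # solution sits on the constraint boundary
        trial = golden_min(lambda r: cost(r, "turbulent"), RE_CRIT, re_max)
        trial_cost = cost(trial, "turbulent")
        if trial_cost < best[2]:
            best = ("turbulent", trial, trial_cost)  # new value for the discrete decision
    return best
```

With the default (made-up) coefficients, the laminar subproblem ends on the Re = 2100 bound, so the turbulent branch is tested and accepted because it gives a lower objective. Each inner problem sees a single smooth friction factor equation, which is the point of moving the decision outside the model.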

It is desired to perform PLS calibration on this dataset, but first to standardise the data. Explain why there may be problems with this approach. Why is it desirable to reduce the number of variables from 200, and why was this variable selection less important in the PLS1 calculations ... [Pg.338]

Big Chemical Encyclopedia
