Selection of variables

A measure of the linear interrelation of two non-constant variables X and Z is the correlation coefficient [Pg.228]
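For reference, the usual (Pearson) form of the correlation coefficient is sketched below in the excerpt's X and Z. The excerpt is cut off before its own formula, so the sample notation ($x_i$, $z_i$, $\bar{x}$, $\bar{z}$) is assumed here and may differ from the source's.

```latex
r_{XZ} \;=\; \frac{\sum_{i=1}^{m} (x_i - \bar{x})(z_i - \bar{z})}
                 {\sqrt{\sum_{i=1}^{m} (x_i - \bar{x})^2}\;\sqrt{\sum_{i=1}^{m} (z_i - \bar{z})^2}},
\qquad -1 \le r_{XZ} \le 1 .
```

The requirement that X and Z be non-constant keeps both sums in the denominator strictly positive.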

If X and Z are completely correlated, then there is a representation Z = aX + b with a ≠ 0. Linear algebra has a term for this situation: affine dependence. Complete correlation is an equivalence relation on the set of variables. [Pg.229]

If a single predictor X is chosen from a larger set of Xj, j ≤ n, to form a regression for the target variable Y, then the X with the highest absolute correlation coefficient with Y should be selected. This is the optimal choice, particularly in linear regression (see Subsection 6.2.1), since the coefficient of determination in a simple linear re- [Pg.229]
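A minimal sketch of this selection rule, using only the Python standard library. The data and the names `pearson_r` and `best_single_predictor` are hypothetical and not taken from the source. It relies on the standard result that, in simple linear regression, the coefficient of determination equals the squared correlation coefficient.

```python
from math import sqrt

def pearson_r(x, z):
    """Pearson correlation coefficient of two equally long, non-constant sequences."""
    n = len(x)
    mx, mz = sum(x) / n, sum(z) / n
    sxz = sum((a - mx) * (b - mz) for a, b in zip(x, z))
    sxx = sum((a - mx) ** 2 for a in x)
    szz = sum((b - mz) ** 2 for b in z)
    return sxz / sqrt(sxx * szz)

def best_single_predictor(predictors, y):
    """Return the name of the predictor with the largest |r| against y."""
    return max(predictors, key=lambda name: abs(pearson_r(predictors[name], y)))

# Hypothetical data: y rises almost linearly as x2 falls.
y = [1.0, 2.1, 2.9, 4.2, 5.0]
predictors = {
    "x1": [2.0, 1.0, 4.0, 3.0, 5.0],
    "x2": [-1.0, -2.0, -3.0, -4.0, -5.0],
    "x3": [5.0, 1.0, 4.0, 2.0, 3.0],
}
best = best_single_predictor(predictors, y)
r2 = pearson_r(predictors[best], y) ** 2  # R^2 of the simple linear fit on `best`
```

Taking the absolute value matters here: `x2` is negatively correlated with `y`, yet it is the most informative single predictor.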

The situation is more complex in multiple and/or nonlinear regression. For these cases, predictors with maximal correlation coefficients to Y do not necessarily form [Pg.229]

When dealing with binary classification problems using a single descriptor, Fisher ratios (FR) can be used to select the best predictor [60,315,324]. Such ratios are defined as [Pg.230]
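The excerpt breaks off before its definition, so the sketch below uses one common form of the Fisher ratio: the squared difference of the class means divided by the sum of the class variances. This is an assumption and may not match the source's formula exactly. The descriptor values are hypothetical.

```python
from statistics import mean, variance

def fisher_ratio(class_a, class_b):
    """Squared difference of class means over the sum of class sample variances.

    Assumed definition; the source's own formula is not available.
    """
    return (mean(class_a) - mean(class_b)) ** 2 / (variance(class_a) + variance(class_b))

# Hypothetical descriptor values, given as (class A samples, class B samples).
descriptors = {
    "d1": ([1.0, 1.2, 0.9, 1.1], [3.0, 3.1, 2.9, 3.2]),  # well separated classes
    "d2": ([1.0, 3.0, 2.0, 4.0], [2.0, 4.0, 3.0, 5.0]),  # heavily overlapping classes
}
ratios = {name: fisher_ratio(a, b) for name, (a, b) in descriptors.items()}
best_descriptor = max(ratios, key=ratios.get)
```

A large ratio means the class means lie far apart relative to the within-class scatter, so the descriptor with the largest ratio is preferred.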

For all the engineering branches that use dimensional analysis as a methodology, the pertinent variables of one process can be classified into four groups [Pg.488]

Fluid flow, heating and composition, which change by reaction or by transfer at one interface, represent the specificity of the chemical engineering processes. The response of a system to the applied effects that generate the mentioned cases depends on the nature of the materials involved in the process. All the properties of the materials, such as density, viscosity, thermal capacity, conductivity, species diffusivity or others relating the external effects to the process response, must be included as variables. The identification of these variables is not always an easy task. A typical case concerns material properties that depend nonlinearly on the operating variables. For example, when studying the flow of complex non-Newtonian fluids such as melted polymers in an externally heated conduit, their non-classical properties and their state regarding the effect of temperature make it difficult to select the properties of the materials. [Pg.488]

Feature selection, i.e. the selection of variables that are meaningful for the classification and elimination of those that have no discriminating (or, for certain techniques, no modelling) power. This step is discussed further in Section 33.3. [Pg.207]

Sets of spectroscopic data (IR, MS, NMR, UV-Vis) or other data are often subjected to one of the multivariate methods discussed in this book. One of the issues in this type of calculation is the reduction of the number of variables by selecting a set of variables to be included in the data analysis. The opinion is gaining support that a selection of variables prior to the data analysis improves the results. For instance, variables which are little or not correlated to the property to be modeled are disregarded. Another approach is to compress all variables into a few features, e.g. by a principal components analysis (see Section 31.1). This is called... [Pg.550]
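The compression approach can be illustrated with a small sketch that extracts only the first principal component. It uses power iteration on the covariance matrix, a technique chosen here to keep the example free of dependencies; it is not a method named by the source. The function name and the data are hypothetical.

```python
from math import sqrt

def first_principal_component(rows, iterations=200):
    """Return (loading vector, scores) of the first PC of mean-centered data."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    centered = [[r[j] - means[j] for j in range(p)] for r in rows]
    # Sample covariance matrix (p x p).
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(p)]
           for i in range(p)]
    v = [1.0] * p  # start vector, assumed not orthogonal to the leading eigenvector
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Scores: projection of each centered sample onto the loading vector.
    scores = [sum(c[j] * v[j] for j in range(p)) for c in centered]
    return v, scores

# Hypothetical data: the second variable is twice the first, the third is small noise.
rows = [
    [1.0, 2.0, 0.1],
    [2.0, 4.0, -0.1],
    [3.0, 6.0, 0.0],
    [4.0, 8.0, 0.1],
    [5.0, 10.0, -0.1],
]
loading, scores = first_principal_component(rows)
```

Here three original variables are replaced by a single score per sample, and the loading vector shows that the first two variables carry essentially the same information.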

The similarity of samples can be evaluated by using geometrical constructs based on the standard deviation of the objects modeled by SIMCA. By enclosing classes in volume elements in descriptor space, the SIMCA method provides information about the existence of similarities among the members of the defined classes. Relations among samples, when visualized in this way, increase one's ability to formulate questions or hypotheses about the data being examined. The selection of variables on the basis of MPOW also provides clues as to how samples within a class are similar, and the derived class model describes how the objects are similar, with regard to the internal variation of these variables. [Pg.208]

A state-space model of the reactor can be easily deduced by appropriate selection of variables. Eqs. (12)-(14) can be rewritten as follows ... [Pg.9]

The selection of variables is of central importance for the outcome of a system comparison on environmental and resource use impacts. The ideal variable or set of variables respectively provides information and describes the state of environmental phenomena with certain significance. Thus, applying a set of variables should make it possible to monitor and assess the state of the environment, to identify changes and trends, to transmit scientific data to become relevant for policy, and to evaluate already implemented policy measures. The concept of environmental indicators is broadly accepted as an adequate tool. Accordingly, an indicator is defined as a parameter or a value derived from parameters, which indicates the state of the environment with significance extending beyond that which is directly associated with a parameter value. A parameter's definition in this context is a property that is measured or observed (OECD 1994). Fieri et al. (1996) state that the purposes of indicators are as follows ... [Pg.6]

No straightforward and efficient method for optimal selection of variables for predictive models. [Pg.351]

The factor-based methods described in this chapter (PLS/PCR) are inverse methods that do not rely on the selection of variables to solve the inversion... [Pg.352]

R.R. Hocking, The analysis and selection of variables in linear regression. Biometrics 32 (1976) 1-49. [Pg.264]

If plotting cannot be used in the selection of a proper geometric representation (and the corresponding equation form) of the influence of each variable factor being studied, then accepted theory and past research results may aid the researcher in selection of variable and equation forms to be used. Within limits, the results of t-tests of regression coefficients may be used to select the most appropriate variables and variable transformations for use in the final... [Pg.303]

Design is the selection of variables (e.g., heat exchanger areas, maximum heater and cooler loads) which lead a given design structure (HEN topology or general process flow sheet) to have specified properties. [Pg.9]

One particular challenge in the effective use of MLR is the selection of appropriate X-variables to use in the model. The stepwise and APC methods are some of the most common empirical methods for variable selection. Prior knowledge of process chemistry and dynamics, as well as the process analytical measurement technology itself, can be used to enable a priori selection of variables or to provide some degree of added confidence in variables that are selected empirically. If a priori selection is done, one must be careful to select variables that are not highly correlated with one another, or else the matrix inversion that is done to calculate the MLR regression coefficients (Equation 8.24) can become unstable and introduce noise into the model. [Pg.255]
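The instability caused by correlated X-variables can be made concrete with the variance inflation factor (VIF). For a model with exactly two predictors, the VIF reduces to 1 / (1 - r²), where r is their correlation coefficient. The sketch below is an illustration under that two-predictor assumption; the VIF is a standard diagnostic but is not named in the source, and the data are hypothetical.

```python
from math import sqrt

def pearson_r(x, z):
    """Pearson correlation coefficient of two equally long, non-constant sequences."""
    n = len(x)
    mx, mz = sum(x) / n, sum(z) / n
    sxz = sum((a - mx) * (b - mz) for a, b in zip(x, z))
    sxx = sum((a - mx) ** 2 for a in x)
    szz = sum((b - mz) ** 2 for b in z)
    return sxz / sqrt(sxx * szz)

def vif_two_predictors(x1, x2):
    """Variance inflation factor of either predictor in a two-predictor model."""
    r = pearson_r(x1, x2)
    return 1.0 / (1.0 - r * r)

# Hypothetical X-variables.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0]
x2 = [1.1, 1.9, 3.2, 3.9, 5.1]  # almost a copy of x1
x3 = [2.0, 1.0, 3.0, 1.0, 2.0]  # uncorrelated with x1

vif_collinear = vif_two_predictors(x1, x2)
vif_independent = vif_two_predictors(x1, x3)
```

A VIF near 1 indicates that the regression coefficients are estimated about as precisely as they would be with uncorrelated predictors. A very large value signals that the matrix being inverted is close to singular, so small measurement noise produces large swings in the coefficients.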

The initial selection of variables can be further reduced automatically using a selection algorithm (often backward elimination or forward selection). Such an automated procedure sounds as though it should produce the optimal choice of predictive variables, but it is often necessary in practice to use clinical knowledge to over-ride the statistical process, either to ensure inclusion of a variable that is known from previous studies to be highly predictive or to eliminate variables that might lead to overfitting (i.e. overestimation of the predictive value of the model by inclusion of variables that appear to be predictive in the derivation cohort, probably by chance, but are unlikely to be predictive in other cohorts). [Pg.187]
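A forward selection pass can be sketched as follows. This is an illustrative greedy procedure for ordinary least squares, not the procedure of any particular study: at each step it adds the candidate that most reduces the residual sum of squares. The stopping threshold `min_gain`, the function name, and the data are hypothetical.

```python
def forward_selection(predictors, y, min_gain=0.01):
    """Greedy forward selection; returns predictor names in the order added.

    min_gain is a hypothetical stopping rule: stop when the best remaining
    candidate explains less than this fraction of the total variation in y.
    """
    def center(v):
        m = sum(v) / len(v)
        return [a - m for a in v]

    def dot(u, v):
        return sum(a * b for a, b in zip(u, v))

    residual = center(y)
    total_ss = dot(residual, residual)
    candidates = {name: center(col) for name, col in predictors.items()}
    selected = []
    while candidates:
        # Reduction in residual sum of squares if each candidate were added.
        gains = {name: dot(x, residual) ** 2 / dot(x, x)
                 for name, x in candidates.items() if dot(x, x) > 1e-12}
        if not gains:
            break
        best = max(gains, key=gains.get)
        if gains[best] / total_ss < min_gain:
            break
        xb = candidates.pop(best)
        selected.append(best)
        scale = dot(xb, xb)
        coef = dot(xb, residual) / scale
        residual = [r - coef * a for r, a in zip(residual, xb)]
        # Remove from every remaining candidate the part already explained by xb.
        candidates = {name: [a - dot(x, xb) / scale * b for a, b in zip(x, xb)]
                      for name, x in candidates.items()}
    return selected

# Hypothetical data constructed so that y = 3*x1 + 2*x2; x3 is unrelated.
predictors = {
    "x1": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "x2": [1.0, -1.0, 1.0, -1.0, 1.0, -1.0],
    "x3": [3.0, 1.0, 4.0, 1.0, 5.0, 9.0],
}
y = [5.0, 4.0, 11.0, 10.0, 17.0, 16.0]
selected = forward_selection(predictors, y)
```

In this example `x3` looks useful on its own because it happens to correlate with `x1`, but it adds nothing once `x1` and `x2` are in the model. On real data the same mechanism can work in the other direction and admit chance predictors, which is why the selected list is usually reviewed against domain knowledge.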

The mixture fraction Z could have been introduced and employed directly in earlier examples. Since it is a coupling function, it could have been used in place of β in equations (9) and (38), with equation (70) employed to recover β from the solution for Z. In fact, it could have been introduced in Section 1.3, to replace β in equation (1-49). Although such selections of variables basically are matters of personal taste, the replacement of β by Z achieves a convenient normalization and also can help to clarify aspects of physical interpretations. For example, in equation (25) the flame-sheet condition, β = 0, becomes Z = Z, a condition of mixture-fraction stoichiometry. For the droplet-burning problem, when all the assumptions that underlie equation (58) are introduced, it is found that equation (42), interpreted for Z, becomes simply Z = 1 - (1 + where B is defined at... [Pg.76]

A stepwise selection procedure is performed to search for QSPR/QSAR models after the preliminary exclusion of constant and near-constant variables. The pair correlation cutoff selection of variables is then performed to avoid highly correlated descriptor variables within the model. [Pg.75]
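The two filtering steps can be sketched as below; the stepwise model search itself is omitted. The thresholds `min_variance` and `max_pair_r`, the function names, and the descriptor values are hypothetical and would need tuning for real descriptor sets.

```python
from math import sqrt
from statistics import pvariance

def pearson_r(x, z):
    """Pearson correlation coefficient of two equally long, non-constant sequences."""
    n = len(x)
    mx, mz = sum(x) / n, sum(z) / n
    sxz = sum((a - mx) * (b - mz) for a, b in zip(x, z))
    sxx = sum((a - mx) ** 2 for a in x)
    szz = sum((b - mz) ** 2 for b in z)
    return sxz / sqrt(sxx * szz)

def prefilter(descriptors, min_variance=1e-8, max_pair_r=0.95):
    """Drop (near-)constant descriptors, then one of each highly correlated pair."""
    kept = {name: v for name, v in descriptors.items() if pvariance(v) > min_variance}
    selected = []
    for name in kept:  # dicts keep insertion order, so earlier descriptors win
        if all(abs(pearson_r(kept[name], kept[s])) < max_pair_r for s in selected):
            selected.append(name)
    return selected

# Hypothetical descriptor table.
descriptors = {
    "d1": [1.0, 2.0, 3.0, 4.0, 5.0],
    "d2": [2.1, 3.9, 6.0, 8.1, 9.9],  # nearly 2 * d1
    "d3": [7.0, 7.0, 7.0, 7.0, 7.0],  # constant
    "d4": [2.0, 1.0, 3.0, 1.0, 2.0],  # uncorrelated with d1
}
remaining = prefilter(descriptors)
```

Which member of a correlated pair survives depends on the ordering of the descriptors; this sketch keeps the earlier one, whereas a real implementation might instead keep the descriptor that correlates better with the target property.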

Modeling is also a requirement for the design space. However, what constitutes a model can vary from an almost totally empirical model to a first principles model. All may be valid if the assumptions upon which the model was created are clear and adhered to. For example, the model presented above is an empirical model based upon selection of variables that seem logical based upon the science and statistical analysis of the data collected. If we had a physics equation (constitutive relationship) and the ability to predict all of the variables, it would be a first principles model. In between empirical and first principles are so-called "hybrid" models that may have known relationships between variables but require calibration or determination of coefficients. The differences are that ... [Pg.330]

Of course, the reason for the improvement in the calibration model when the second term is included is that A21 serves to compensate for the absorbance due to the tyrosine, since λ21 is in the spectral region of a tyrosine absorption band with little interference from tryptophan (Figure 6). In general, the selection of variables for multivariate regression analysis may not be so obvious. [Pg.174]

In order to optimise the in vitro profile, we focused our attention on the nature of the substituent at N-1 and a quantitative structure-activity study was performed on a series of N-1 alkyl derivatives. After selection of variables, the affinity for the CCK-B receptor was related to the calculated values of both lipophilicity [26] and molar refractivity [27] of the substituent and the following equation was derived using PLS analysis implemented in program GOLPE [28] (all parameters are referred to the substituents at N-1) ... [Pg.382]

However, in this case the relationships obtained were only valid during the first 40 minutes of reaction since they were determined from the initial conversion values, although at 550°C the difference between the calculated and experimentally determined values was only 2.5%. A further model could be derived where the effects of catalyst stability and deactivation were also considered in order to define the best selection of variables in the catalyst preparation. [Pg.414]
