Centering and Scaling

This section will review the basic operations for centering, weighting, and scaling data sets. We will simply review the mechanics of each operation. For a more detailed treatment of the topics, please refer to the bibliography. [Pg.173]

There are two basic kinds of centering and scaling. Data can be treated variable by variable, or they can be treated sample by sample. For example, if we are dealing with a system of absorbance spectra measured on samples each containing two components, a variable by variable operation would deal with one component at a time, or one wavelength at a time, while a sample by sample operation would deal with one spectrum or one sample at a time. [Pg.173]

Variance scaling is performed on a variable by variable basis. In other words, we would variance scale the concentration values of a data set on a component by component basis. Starting with the first component, we compute the total variance of the concentrations of that component. There are several variations on variance scaling. First, we will consider the method which adjusts all the variables to exactly unit variance. To do this we compute the variance of the variable, and then use the variance to scale all the concentrations of all the samples so that the new variance for the component is equal to unity. [Pg.175]

To compute the variance, we first find the mean concentration for that component over all of the samples. We then subtract this mean value from the concentration value of this component for each sample and square this difference. We then sum all of these squares and divide by the degrees of freedom (number of samples minus 1). The square root of the variance is the standard deviation. We adjust the variance to unity by dividing the mean-centered concentration value of this component for each sample by the standard deviation. Finally, if we do not wish mean-centered data, we add back the mean concentrations that were initially subtracted. Equations [C1] and [C2] show this procedure algebraically for component k, held in a column-wise data matrix. [Pg.175]

Then scale each point by the standard deviation [Pg.176]
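The variance-scaling steps described above can be sketched in NumPy. This is only an illustration under stated assumptions: the function name `variance_scale`, the `keep_mean` flag, and the small concentration matrix `C` are hypothetical and do not come from the source; the matrix is assumed to be column-wise (one column per component, one row per sample).

```python
import numpy as np

def variance_scale(C, keep_mean=True):
    """Scale each column (component) of C to unit variance.

    Hypothetical helper: rows are samples, columns are components.
    """
    C = np.asarray(C, dtype=float)
    mean = C.mean(axis=0)                      # mean concentration per component
    centered = C - mean                        # subtract the mean from every sample
    # Sum of squared differences divided by the degrees of freedom (n - 1)
    var = (centered ** 2).sum(axis=0) / (C.shape[0] - 1)
    std = np.sqrt(var)                         # standard deviation per component
    scaled = centered / std                    # now exactly unit variance
    if keep_mean:
        scaled = scaled + mean                 # add the means back if centering is not wanted
    return scaled

# Made-up concentrations: 4 samples, 2 components
C = np.array([[1.0, 10.0],
              [2.0, 30.0],
              [3.0, 20.0],
              [6.0, 40.0]])
C_scaled = variance_scale(C)
```

Adding the means back shifts each column by a constant, so the unit variance obtained in the scaling step is unchanged.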

The two main ways of data pre-processing are mean-centering and scaling. Mean-centering is a procedure by which one computes the mean of each column (variable), and then subtracts it from each element of that column. One can do the same with the rows (i.e., for each object). Scaling is a slightly more sophisticated procedure. Let us consider unit-variance scaling. First we calculate the standard deviation of each column, and then we divide each element of the column by that standard deviation. [Pg.206]
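To make the difference between column-wise and row-wise treatment concrete, here is a short NumPy sketch. The matrix `X` is made up, and the use of `ddof=1` (division by n - 1) for the standard deviation is an assumption; the excerpt does not say which divisor it uses.

```python
import numpy as np

# Made-up data: 3 objects (rows) by 3 variables (columns)
X = np.array([[1.0, 2.0, 3.0],
              [4.0, 6.0, 8.0],
              [7.0, 7.0, 10.0]])

# Column (variable) mean-centering: subtract each column's mean
col_centered = X - X.mean(axis=0)

# Row (object) mean-centering: subtract each row's mean
row_centered = X - X.mean(axis=1, keepdims=True)

# Unit-variance scaling of the column-centered data
autoscaled = col_centered / X.std(axis=0, ddof=1)
```

The only difference between the two centering variants is the axis along which the mean is taken; `keepdims=True` keeps the row means as a column so that broadcasting subtracts them row by row.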

If the data include outliers, it is advisable to use robust versions of centering and scaling. The simplest possibility is to replace the arithmetic means of the columns by the column medians, and the standard deviations of the columns by the median absolute deviations (MAD), see Sections 1.6.3 and 1.6.4, as shown in the following M-code for a matrix X. [Pg.50]
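The code listing that this excerpt refers to is not part of the page. As a stand-in, the following NumPy sketch shows one way robust centering and scaling could look. The function name `robust_autoscale` is hypothetical, and the consistency factor 1.4826 (which makes the MAD comparable to the standard deviation for normally distributed data) is an assumption, since the excerpt does not say whether its MAD is rescaled.

```python
import numpy as np

def robust_autoscale(X, consistency=1.4826):
    """Center columns by their median and scale them by their MAD.

    Hypothetical helper; `consistency` is an assumed rescaling factor.
    """
    X = np.asarray(X, dtype=float)
    med = np.median(X, axis=0)                               # column medians
    mad = consistency * np.median(np.abs(X - med), axis=0)   # median absolute deviations
    return (X - med) / mad

# Made-up data with one gross outlier in the first column
X = np.array([[1.0, 10.0],
              [2.0, 12.0],
              [3.0, 11.0],
              [4.0, 13.0],
              [100.0, 14.0]])
Xr = robust_autoscale(X)
```

Because the median and the MAD are barely influenced by the single extreme value, the outlier remains clearly visible after scaling instead of inflating the scale of the whole column.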

The PCA model gives a representation of the centered (and scaled) data matrix... [Pg.91]

Suppose that a response surface design has been run with n design variables, x_1, x_2, x_3, ..., x_n, and m environmental variables, z_1, z_2, z_3, ..., z_m. During the experiment the environmental variables are controlled at fixed levels and can be regarded as fixed effects. Suppose that the x's and z's are centered and scaled around 0. In this section, several alternative models for the relationship between the design and environmental variables and the response will be considered. [Pg.48]

The procedure by which the factorization described above is carried out on a data matrix obtained after centering and scaling each descriptor to unit variance is often called Factor Analysis. (The term "factor analysis" in mathematical statistics has a slightly different meaning which will not be discussed here. For details of this, see [9].) It can be effected by computing the eigenvectors of (X - X̄)^T (X - X̄). Another, more efficient method is to use a procedure called singular value decomposition (SVD), see Appendix 15B. [Pg.360]
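The two routes mentioned above, eigendecomposition of the cross-product matrix and SVD of the centered and scaled data, can be compared with a short NumPy sketch. The random matrix `X` is placeholder data, and scaling with `ddof=1` is an assumption.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 4))          # placeholder data: 20 objects, 4 descriptors

# Center each descriptor and scale it to unit variance
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

# Route 1: eigenvectors of the cross-product matrix Z^T Z
eigvals, eigvecs = np.linalg.eigh(Z.T @ Z)
order = np.argsort(eigvals)[::-1]     # eigh returns ascending order; sort descending
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Route 2: singular value decomposition of Z itself
U, s, Vt = np.linalg.svd(Z, full_matrices=False)
```

The squared singular values equal the eigenvalues, and the rows of `Vt` match the eigenvectors up to sign. The SVD route avoids forming the cross-product matrix explicitly.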

Data from 23 normal operating batch runs are available. The variables measured during the run of the batch process are the added amount of hydrogen, the pressure, and the temperature. The process data are collected at 101 equidistant points in time. The point of the melting curve related to a temperature of 35 °C is chosen as the quality variable. Hence, the problem is one of relating a quality variable y (23 x 1) to a three-way array of process variables X (23 x 3 x 101). Prior to analysis, X and y are centered and scaled across the batch direction (see Chapter 9). Subsequently, both X and y are scaled to unit sum of squares. [Pg.78]
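A rough NumPy sketch of preprocessing arrays with these shapes is given below. The data are random placeholders, not the batch data from the source. The sketch covers only centering across the batch mode and the final scaling to unit sum of squares; the intermediate scaling step is left out because the excerpt does not specify how it is carried out.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(loc=5.0, size=(23, 3, 101))   # placeholder: batches x variables x time points
y = rng.normal(loc=35.0, size=(23, 1))       # placeholder quality variable

# Center across the batch mode: subtract the mean over the 23 batches
# for every (variable, time point) combination
Xc = X - X.mean(axis=0, keepdims=True)
yc = y - y.mean(axis=0, keepdims=True)

# Scale each array as a whole to unit sum of squares
Xs = Xc / np.sqrt((Xc ** 2).sum())
ys = yc / np.sqrt((yc ** 2).sum())
```

Dividing a whole array by a single scalar does not disturb the centering, so `Xs` and `ys` still have zero mean across the batch direction.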

The purpose and use of centering and scaling are discussed in this chapter. The background is explained using two-way bilinear data analysis for simplicity, and the results are then generalized to three-way data analysis. Other types of preprocessing are also relevant for three-way models, but these will only be lightly touched upon in Appendix 1 as the basic use of these is very similar to their use in standard two-way analysis. [Pg.221]

Important notation, definitions and main results are given in Section 9.1. The derivation of the main results is based on a quite mathematical exposition in the following sections. For those less interested in these mathematical derivations, it is important to know at least the rules of centering and scaling given in Section 9.1. [Pg.222]

In Section 9.5 three-way centering and scaling is described based on the two-way results. The main overall results are described in Section 9.6. [Pg.222]

Already at this point, it is useful to have an overview of the centering and scaling properties. A number of results are therefore listed here. These results are derived and substantiated in the following sections. [Pg.227]

Figure 9.8. A two-way array showing the relation between centering and scaling.

A number of important features of the common preprocessing steps centering and scaling... [Pg.244]

Proper combinations of centering and scaling can be expressed similarly to the two-way case. That is, scaling does not affect centering across other modes, but centering affects scaling within all modes. [Pg.245]

The appropriate centering and scaling procedures can most easily be summarized as in Figure 9.13. Centering must be done by subtracting scalars from individual vectors of the array, while scaling must be performed by multiplying individual slabs. [Pg.245]
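The interplay between centering and scaling described above can be checked numerically. The sketch below is an illustration on a random three-way array; the helper names `center_across_mode1` and `scale_within_mode2` are hypothetical, and scaling each slab to unit mean square is an assumed choice of scaling constant.

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(loc=3.0, size=(6, 4, 5))     # hypothetical I x J x K array

def center_across_mode1(A):
    # Subtract one scalar (the mean) from each vector running along mode 1
    return A - A.mean(axis=0, keepdims=True)

def scale_within_mode2(A):
    # Multiply each mode-2 slab A[:, j, :] by one constant (unit mean square)
    rms = np.sqrt((A ** 2).mean(axis=(0, 2), keepdims=True))
    return A / rms

# Center first, then scale: both properties hold afterwards
A = scale_within_mode2(center_across_mode1(X))

# Scale first, then center: the centering holds, but the scaling is disturbed
B = center_across_mode1(scale_within_mode2(X))
```

In `A` every mode-1 vector has zero mean and every mode-2 slab has unit mean square. In `B` the mode-1 vectors are centered, but the slabs no longer have unit mean square, which illustrates why centering is usually applied before scaling.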

Big Chemical Encyclopedia
