Big Chemical Encyclopedia


Robust Covariance Estimator

A comparison with the target covariance shows that the conventional indirect approach gives a very good estimation of the covariance in this case. [Pg.189]

The performance of the indirect conventional methods described previously is very sensitive to outliers, so they are not robust. The main reason for this is that they use a direct method to calculate the covariance matrix of the residuals. If outliers are present in the sampling data, the assumption about the error distribution will be violated. [Pg.189]

An estimator is called robust if it is insensitive to mild departures from the underlying assumptions and loses only a little efficiency relative to conventional approaches when those assumptions are satisfied. [Pg.190]

One common type of robust estimator is the so-called M-estimator, or generalized maximum likelihood estimator, originally proposed by Huber (1964). The basic idea of an M-estimator is to assign a weight to each observation vector based on its Mahalanobis distance, so that the influence of a given point decreases as it becomes less and less characteristic. [Pg.190]

Let z1, . . ., zn be samples of dimension p. Then the M-estimators of location (vector m) and scatter (matrix V) are obtained by solving the following set of simultaneous equations (Maronna, 1976; Huber, 1981, p. 212). [Pg.190]
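The equations themselves did not survive extraction. In the standard textbook form (the weight-function names u1, u2 and the sample-size symbol n are the usual conventions, not necessarily the source's notation), they read:

```latex
\frac{1}{n}\sum_{i=1}^{n} u_1(d_i)\,(\mathbf{z}_i - \mathbf{m}) = \mathbf{0},
\qquad
\frac{1}{n}\sum_{i=1}^{n} u_2(d_i^2)\,(\mathbf{z}_i - \mathbf{m})(\mathbf{z}_i - \mathbf{m})^{T} = \mathbf{V},
```

where \(d_i^2 = (\mathbf{z}_i - \mathbf{m})^{T}\mathbf{V}^{-1}(\mathbf{z}_i - \mathbf{m})\) is the squared Mahalanobis distance of sample \(\mathbf{z}_i\).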


The covariance matrix of measurement errors is a very useful statistical property. Indirect methods can deal with unsteady sampling data, but they are very sensitive to outliers: the presence of even one or two outliers can produce misleading results. This drawback can be eliminated by using robust approaches based on M-estimators. The robust covariance estimator performs better than the indirect methods when outliers are present in the data set. [Pg.214]

In practice, the location m and the variance have to be estimated from real data. An iterative algorithm, similar to the one used in Chapter 10 for robust covariance estimation, is used to calculate the trust function. The main advantage of this algorithm is that convergence is guaranteed. [Pg.235]

A more robust correlation measure can be derived from a robust covariance estimator such as the minimum covariance determinant (MCD) estimator. The MCD estimator searches for a subset of h observations having the smallest determinant of their classical sample covariance matrix. The robust location estimator, a robust alternative to the mean vector, is then defined as the arithmetic mean of these h observations, and the robust covariance estimator is given by the sample covariance matrix of the h observations, multiplied by a consistency factor. The choice of h determines the robustness of the estimators: taking about half of the observations for h yields the most robust version (because the other half of the observations could be outliers). Increasing h leads to less robustness but higher efficiency (precision of the estimators). The value 0.75n for h is a good compromise between robustness and efficiency. [Pg.57]
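As a concrete sketch of the MCD idea (the function name and structure below are illustrative: random elemental starts refined by C-steps, the core of the FAST-MCD algorithm of Rousseeuw and Van Driessen, with the consistency factor omitted):

```python
import numpy as np

def mcd_estimate(X, h=None, n_starts=50, n_csteps=10, rng=None):
    """Sketch of the Minimum Covariance Determinant (MCD) estimator.

    Draws random elemental starts and refines each with C-steps
    (keep the h points with smallest Mahalanobis distance), returning
    the h-subset whose covariance matrix has the smallest determinant.
    """
    rng = np.random.default_rng(rng)
    n, p = X.shape
    if h is None:
        h = int(0.75 * n)   # compromise between robustness and efficiency
    best_det = np.inf
    best_m, best_S = X.mean(axis=0), np.cov(X, rowvar=False)  # classical fallback
    for _ in range(n_starts):
        idx = rng.choice(n, size=p + 1, replace=False)        # elemental start
        for _ in range(n_csteps):
            m = X[idx].mean(axis=0)
            S = np.cov(X[idx], rowvar=False)
            if np.linalg.det(S) < 1e-12:                      # degenerate subset
                break
            diff = X - m
            d2 = np.einsum('ij,jk,ik->i', diff, np.linalg.inv(S), diff)
            idx = np.argsort(d2)[:h]      # C-step: keep h smallest distances
        if idx.size < h:
            continue
        m = X[idx].mean(axis=0)
        S = np.cov(X[idx], rowvar=False)
        det = np.linalg.det(S)
        if det < best_det:
            best_det, best_m, best_S = det, m, S
    return best_m, best_S   # in practice best_S is rescaled by a consistency factor
```

Each C-step replaces the current subset by the h observations closest in Mahalanobis distance to the current estimates, which provably never increases the determinant, so each start converges quickly.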

It is a distance measure that accounts for the covariance structure, here estimated by the sample covariance matrix C. Clearly, one could also take a robust covariance estimator. The Mahalanobis distance can also be computed from each observation to the data center, and the formula changes to... [Pg.60]
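A minimal sketch of this distance from each observation to the data center (names are illustrative; a robust center and covariance, e.g. from the MCD, can be supplied in place of the classical defaults):

```python
import numpy as np

def mahalanobis_distances(X, center=None, cov=None):
    """Mahalanobis distance of each row of X to the data center.

    Defaults to the classical mean and sample covariance; robust
    estimates can be passed in via center= and cov= instead.
    """
    center = X.mean(axis=0) if center is None else center
    cov = np.cov(X, rowvar=False) if cov is None else cov
    diff = X - center
    # d_i^2 = (x_i - center)^T C^{-1} (x_i - center)
    d2 = np.einsum('ij,jk,ik->i', diff, np.linalg.inv(cov), diff)
    return np.sqrt(d2)
```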

The Mahalanobis distance used for multivariate outlier detection relies on the estimation of a covariance matrix (see Section 2.3.2), in this case preferably a robust covariance matrix. However, robust covariance estimators like the MCD estimator need more objects than variables, and thus for many applications with m>n this approach is not possible. For this situation, other multivariate outlier detection techniques can be used like a method based on robustified principal components (Filzmoser et al. 2008). The R code to apply this method on a data set X is as follows ... [Pg.64]

The goal of robust PCA methods is to obtain principal components that are not influenced much by outliers. A first group of methods is obtained by replacing the classical covariance matrix with a robust covariance estimator, such as the reweighted MCD estimator [45] (Section 6.3.2). Let us reconsider the Hawkins-Bradu-Kass data in p = 4 dimensions. Robust PCA using the reweighted MCD estimator yields the score plot in Figure 6.7b. We now see that the center is correctly estimated in the... [Pg.187]

Another approach to robust PCA has been proposed by Hubert et al. [52] and is called ROBPCA. This method combines ideas of both projection pursuit and robust covariance estimation. The projection pursuit part is used for the initial dimension reduction. Some ideas based on the MCD estimator are then applied to this lower-dimensional data space. Simulations have shown that this combined approach yields more accurate estimates than the raw projection pursuit algorithm RAPCA. The complete description of the ROBPCA method is quite involved, so here we will only outline the main stages of the algorithm. [Pg.189]

A = (a_ij), an m x g matrix, is the matrix for the constraint equations. The algorithm for robust covariance estimation can be implemented as follows. [Pg.191]
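The algorithm itself is not reproduced in this excerpt. As a hedged sketch, an iteratively reweighted M-estimation scheme in the spirit of Campbell (1980) looks as follows; the Huber-type weight function and the chi-square cutoff are common textbook choices, not necessarily those of the source:

```python
import numpy as np
from scipy.stats import chi2

def m_estimate_covariance(Z, alpha=0.975, tol=1e-6, max_iter=100):
    """Iteratively reweighted M-estimate of location and covariance.

    Observations with large Mahalanobis distance d get Huber-type
    weights w = c/d < 1; the cutoff c is a chi-square quantile.
    """
    n, p = Z.shape
    c = np.sqrt(chi2.ppf(alpha, df=p))   # distance cutoff (common choice)
    m = np.median(Z, axis=0)             # robust starting location
    V = np.cov(Z, rowvar=False)
    for _ in range(max_iter):
        diff = Z - m
        d = np.sqrt(np.einsum('ij,jk,ik->i', diff, np.linalg.inv(V), diff))
        w = np.where(d <= c, 1.0, c / d)             # Huber-type weights
        m_new = (w[:, None] * Z).sum(axis=0) / w.sum()
        diff = Z - m_new
        # Campbell-style weighted covariance (weights enter squared)
        V_new = (diff * (w**2)[:, None]).T @ diff / (np.sum(w**2) - 1)
        converged = (np.max(np.abs(m_new - m)) < tol
                     and np.max(np.abs(V_new - V)) < tol)
        m, V = m_new, V_new
        if converged:
            break
    return m, V
```

On clean data nearly all weights equal 1 and the estimates reduce to (approximately) the classical mean and covariance, which is the efficiency property discussed above.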

Campbell NA. Robust procedures in multivariate analysis I: robust covariance estimation. Appl Stat 1980;29:231-7. [Pg.353]

Pena D, Prieto FJ. Multivariate outlier detection and robust covariance estimation. Technometrics 2001;41:286-300. [Pg.353]

Only a few publications in the literature have dealt with this problem. Almasy and Mah (1984) presented a method for estimating the covariance matrix of measured errors by using the constraint residuals calculated from available process data. Darouach et al. (1989) and Keller et al. (1992) have extended this approach to deal with correlated measurements. Chen et al. (1997) extended the procedure further, developing a robust strategy for covariance estimation, which is insensitive to the presence of outliers in the data set. [Pg.203]

Other estimators of robust covariance or correlation exist, such as S-estimators (Maronna et al. 2006). In general, there are restrictions for robust estimation of the... [Pg.57]

FIGURE 2.13 Concentrations of MgO and Cl in glass vessels samples (Janssen et al. 1998). The plots show the Mahalanobis distances versus the object number; the distances are computed using classical (left) and robust (right) estimates for location and covariance. The horizontal lines correspond to the cutoff value sqrt(chi^2_{2;0.975}) = 2.72. Using the robust estimates, several outliers are identified. [Pg.63]

Unfortunately, the use of these affine equivariant covariance estimators is limited to small to moderate dimensions. To see why, let us again consider the MCD estimator. As explained in Section 6.3.2, if p denotes the number of variables in our data set, the MCD estimator can only be computed if p < h; otherwise, the covariance matrix of any h-subset has zero determinant. Because h < n, the number of variables p may never be larger than n. A second problem is the computation of these robust estimators in high dimensions. Today's fastest algorithms can handle up to about 100 dimensions, whereas there are fields like chemometrics that need to analyze data with dimensions in the thousands. Moreover, the accuracy of the algorithms decreases with the dimension p, so for small data sets it is recommended not to use the MCD in more than 10 dimensions. [Pg.188]

Note that a PCA often starts by prestandardizing the data to obtain variables that all have the same spread; otherwise, variables with a large variance compared with the others will dominate the first principal components. Standardizing by the mean and the standard deviation of each variable yields a PCA based on the correlation matrix instead of the covariance matrix. We can also standardize each variable j in a robust way, e.g., by first subtracting its median, med(x_1j, . . ., x_nj), and then dividing by a robust scale estimate such as Qn(x_1j, . . ., x_nj). [Pg.189]
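A small sketch of such robust columnwise standardization; note that the MAD (with its normal-consistency factor 1.4826) is used here as a simpler stand-in for the Qn scale estimator named in the text:

```python
import numpy as np

def robust_standardize(X):
    """Columnwise robust standardization: subtract the median and
    divide by a robust scale.  The MAD, scaled to be consistent at
    the normal distribution, stands in for the Qn estimator here."""
    med = np.median(X, axis=0)
    mad = np.median(np.abs(X - med), axis=0) * 1.4826  # consistency factor
    return (X - med) / mad
```

Because median and MAD are barely affected by extreme values, an outlying cell inflates neither the center nor the scale of its column, unlike mean/standard-deviation scaling.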

Based on the estimated robust covariance matrix, the Mahalanobis distances for all residuals are calculated. All residuals whose Mahalanobis distances exceed a critical value based on the 97.5% quantile of the chi-square distribution are treated as outliers. For each identified outlier, a dummy variable is constructed which has entry 1 at the time index of the identified outlier (and is zero otherwise). Figure 2.17 shows the pairs of residuals at each time. Grey-coloured marked points refer to identified outliers, whose time indices are superimposed. [Pg.48]
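The flagging rule and the dummy-variable construction can be sketched as follows (names are illustrative; the robust center and covariance would come from an estimator such as those discussed above):

```python
import numpy as np
from scipy.stats import chi2

def flag_outliers_and_dummies(residuals, center, cov, alpha=0.975):
    """Flag residuals whose squared Mahalanobis distance exceeds the
    alpha quantile of the chi-square distribution, and build one dummy
    regressor per flagged time index (1 at that index, 0 elsewhere)."""
    n, p = residuals.shape
    diff = residuals - center
    d2 = np.einsum('ij,jk,ik->i', diff, np.linalg.inv(cov), diff)
    cutoff = chi2.ppf(alpha, df=p)           # e.g. 7.38 for p = 2
    out_idx = np.flatnonzero(d2 > cutoff)
    dummies = np.zeros((n, out_idx.size))
    dummies[out_idx, np.arange(out_idx.size)] = 1.0
    return out_idx, dummies
```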

There are essentially two approaches to robust PCA. The first is based on PCA of a robust covariance matrix, which is rather straightforward since the PCs are the eigenvectors of the covariance matrix; different robust estimators of the covariance matrix may be adopted (MVT [92], MVE and MCD [93]), but the decomposition algorithm is the same. The second approach is based on projection pursuit (PP), using a projection aimed at maximizing a robust measure of scale; that is, in a PP algorithm, the direction with maximum robust variance of the projected data is pursued, and different search algorithms have been proposed. [Pg.122]
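The first approach amounts to an eigendecomposition: given any (robust) covariance estimate C, the loadings are its eigenvectors sorted by decreasing eigenvalue. A minimal sketch (names illustrative):

```python
import numpy as np

def pca_from_covariance(C, k=None):
    """PCA loadings and variances from a (possibly robust) covariance
    matrix: the principal components are its eigenvectors, ordered by
    decreasing eigenvalue."""
    evals, evecs = np.linalg.eigh(C)        # eigh: C is symmetric
    order = np.argsort(evals)[::-1]         # descending variance
    evals, evecs = evals[order], evecs[:, order]
    if k is not None:                       # optionally keep k components
        evals, evecs = evals[:k], evecs[:, :k]
    return evals, evecs
```

Passing an MCD or M-estimate covariance instead of the classical sample covariance yields robust PCs with exactly the same decomposition code, which is why this first approach is called straightforward.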

One of the first concepts to increase the robustness of PLS was introduced by Gil and Romera [35]. They used robust covariance and cross-covariance between X and the dependent variable, y, adopting the Stahel-Donoho estimator of data scatter with Huber's weighting function [36] for this purpose. Another attempt was made by Cummins and Andrews [37]. They introduced an iterative approach based on down-weighting the influence of samples that have large residuals from the constructed PLS model. The sample weights are modified iteratively during the construction of the PLS model. [Pg.344]

Most techniques for process data reconciliation start with the assumption that the measurement errors are random variables obeying a known statistical distribution, and that the covariance matrix of measurement errors is given. In Chapter 10 direct and indirect approaches for estimating the variances of measurement errors are discussed, as well as a robust strategy for dealing with the presence of outliers in the data set. [Pg.26]

Chen, J., Bandoni, A., and Romagnoli, J. A. (1997). Robust estimation of measurement error variance/ covariance from process sampling data. Comput. Chem. Eng. 21, 593-600. [Pg.27]

Without outliers. The estimates of P from both the conventional indirect approach (Pc) and the robust approach (Pr) gave similar results when compared with the target covariance matrix P. [Pg.212]

Pr from the robust estimator still gives the correct answer, as expected. However, the conventional approach fails to provide a good estimate of the covariance even when only one outlier is present in the sampling data. [Pg.212]

The robust estimator still provides a correct estimate of the covariance matrix; the estimate Pc provided by the conventional approach, on the other hand, is incorrect, and the signs of the correlation coefficients have been changed by the outliers. [Pg.214]

